Google Drive connector
Read files recursively from shared Google Drive folders via a service account. Supports MIME-type filtering and auto-exports Google Docs, Sheets, and Slides to text or CSV.
The google_drive connector provides utilities for reading files from Google Drive using a service account.
from cocoindex.connectors import google_drive
This connector requires additional dependencies. Install with:
pip install cocoindex[google_drive]
As source
The connector provides two ways to read from Google Drive:
- GoogleDriveSource: high-level source class with async iteration
- list_files(): lower-level function returning a sync iterator
Both require a Google service account with access to the target Drive folders.
Setting up a service account
- Create a service account in the Google Cloud Console
- Download the JSON credential file
- Share the target Drive folders with the service account’s email address
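To find the address to share folders with, the credential file itself can be inspected. The sketch below assumes the standard service account key layout, in which the JSON contains a `type` field set to `service_account` and a `client_email` field. The helper names `email_from_key` and `load_service_account_email` are hypothetical and not part of CocoIndex.

```python
import json
from pathlib import Path


def email_from_key(key: dict) -> str:
    """Return the service account email from a parsed key file.

    Hypothetical helper; assumes the standard key layout with
    "type" and "client_email" fields.
    """
    if key.get("type") != "service_account":
        raise ValueError("not a service account key (unexpected 'type' field)")
    email = key.get("client_email")
    if not email:
        raise ValueError("key file has no 'client_email' field")
    return email


def load_service_account_email(credential_path: str) -> str:
    """Read the JSON credential file and return the email to share folders with."""
    key = json.loads(Path(credential_path).read_text(encoding="utf-8"))
    return email_from_key(key)
```

Calling `load_service_account_email("./credentials.json")` before configuring the connector is a quick way to catch a wrong or truncated credential file early.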
gws is an optional, unofficial Google Workspace CLI. It is actively developed and subject to change, but can be useful for exploring or validating Drive API access before configuring CocoIndex’s service-account flow. For example:
gws auth setup
gws auth login
gws drive files list
In headless or agent workflows, gws can also read credentials from GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE. CocoIndex still expects the service account JSON path in service_account_credential_path; the gws credentials setting applies only to gws commands themselves.
GoogleDriveSource
The primary source class for iterating over Google Drive files.
class GoogleDriveSource(
    *,
    service_account_credential_path: str,
    root_folder_ids: Sequence[str],
    mime_types: Sequence[str] | None = None,
)
Parameters:
- service_account_credential_path: Path to the service account JSON credential file.
- root_folder_ids: List of Google Drive folder IDs to scan. Subfolders are traversed recursively.
- mime_types: Optional list of MIME types to include. If None, all file types are included.
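Folder IDs are usually copied out of a browser URL. The sketch below assumes the common Drive folder URL shape, where the ID is the path segment following `folders` (for example `https://drive.google.com/drive/folders/<id>`). The helper `folder_id_from_url` is hypothetical and not part of the connector.

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a Google Drive folder URL.

    Hypothetical helper; assumes the ID is the path segment that
    follows "folders", e.g. /drive/folders/<id> or /drive/u/0/folders/<id>.
    """
    # urlparse drops the query string (such as ?usp=sharing) from .path
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "folders" in segments:
        index = segments.index("folders")
        if index + 1 < len(segments):
            return segments[index + 1]
    raise ValueError(f"no folder ID found in {url!r}")
```

The returned string can then be passed in `root_folder_ids`.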
Iterating files
GoogleDriveSource provides async iteration via files(), yielding DriveFile objects (implementing the FileLike base class):
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
)
async for file in source.files():
    text = await file.read_text()
    ...
Keyed iteration with items()
items() yields (str, DriveFile) pairs, where the key is the file’s name path. This is useful with mount_each():
async for key, file in source.items():
    content = await file.read()
Filtering by MIME type
Use mime_types to restrict which files are returned:
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
    mime_types=["application/pdf", "text/plain"],
)
Google Workspace files (Docs, Sheets, Slides) are automatically exported:
| Google Workspace type | Exported as |
|---|---|
| Google Docs | Plain text |
| Google Sheets | CSV |
| Google Slides | Plain text |
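The table above can be written down as a small lookup, which may help when deciding what content type downstream code should expect. The `application/vnd.google-apps.*` strings are the Drive-native MIME types for Docs, Sheets, and Slides. Mapping "Plain text" to `text/plain` and "CSV" to `text/csv` is an assumption, as are the names `EXPORT_FORMAT` and `exported_mime_type`. Whether `mime_types` is matched against the native type or the exported type is not stated here, so that is worth verifying before filtering on Workspace files.

```python
# Drive-native MIME types for Google Workspace files.
GOOGLE_DOC = "application/vnd.google-apps.document"
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"
GOOGLE_SLIDES = "application/vnd.google-apps.presentation"

# Assumed export targets, following the table above
# ("Plain text" -> text/plain, "CSV" -> text/csv).
EXPORT_FORMAT = {
    GOOGLE_DOC: "text/plain",
    GOOGLE_SHEET: "text/csv",
    GOOGLE_SLIDES: "text/plain",
}


def exported_mime_type(mime_type: str) -> str:
    """Return the content type a reader should expect for a Drive MIME type.

    Hypothetical helper: Workspace types map to their export format,
    every other type is returned unchanged.
    """
    return EXPORT_FORMAT.get(mime_type, mime_type)
```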
list_files
A lower-level sync iterator for listing files:
def list_files(spec: GoogleDriveSourceSpec) -> Iterator[DriveFile]
Parameters:
- spec: A GoogleDriveSourceSpec with the same fields as the GoogleDriveSource constructor parameters.
Returns: A sync iterator of DriveFile objects.
DriveFile
DriveFile implements FileLike with Google Drive-specific behavior:
- file_path: A DriveFilePath where resolve() returns the Google Drive file ID.
- read() / read_text(): Downloads file content via the Google Drive API. Partial reads (size parameter) are not supported.
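Because partial reads are not supported, code that only needs the start of a file has to download the whole content and slice it locally. The sketch below shows that pattern. It assumes read() takes no arguments and returns bytes. `InMemoryFile` is a hypothetical stand-in used so the example runs without Drive access, and `read_prefix` is not part of the connector.

```python
import asyncio


class InMemoryFile:
    """Hypothetical stand-in for a file object exposing an async read()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    async def read(self) -> bytes:
        # A real DriveFile would download the content here.
        return self._data


async def read_prefix(file, size: int) -> bytes:
    """Return the first `size` bytes by reading everything, then slicing.

    The full content is still downloaded; only the returned value is trimmed.
    """
    content = await file.read()
    return content[:size]


async def main() -> None:
    head = await read_prefix(InMemoryFile(b"hello world"), 5)
    print(head)


if __name__ == "__main__":
    asyncio.run(main())
```

Since the whole file is transferred either way, this saves memory downstream but not bandwidth.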
Example
import cocoindex as coco
from cocoindex.connectors import google_drive
from cocoindex.resources.file import FileLike


@coco.fn(memo=True)
async def process_file(file: FileLike) -> None:
    text = await file.read_text()
    # ... process the file content ...


@coco.fn
async def app_main(credential_path: str, folder_ids: list[str]) -> None:
    source = google_drive.GoogleDriveSource(
        service_account_credential_path=credential_path,
        root_folder_ids=folder_ids,
    )
    with coco.component_subpath("file"):
        async for key, file in source.items():
            await coco.mount(
                coco.component_subpath(key),
                process_file,
                file,
            )


app = coco.App(
    "GoogleDriveIngestion",
    app_main,
    credential_path="./credentials.json",
    folder_ids=["1abc...xyz"],
)