Google Drive connector
Read files recursively from shared Google Drive folders via a service account. Supports MIME-type filtering and auto-exports Google Docs, Sheets, and Slides to text or CSV.
The google_drive connector provides utilities for reading files from Google Drive using a service account.
```python
from cocoindex.connectors import google_drive
```
This connector requires additional dependencies. Install with:
```bash
pip install "cocoindex[google_drive]"
```
As source
The connector provides two ways to read from Google Drive:
- `GoogleDriveSource`: high-level source class with async iteration
- `list_files()`: lower-level function returning a sync iterator
Both require a Google service account with access to the target Drive folders.
Setting up a service account
- Create a service account in the Google Cloud Console
- Download the JSON credential file
- Share the target Drive folders with the service account’s email address
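The address to share folders with is stored in the `client_email` field of the downloaded JSON key. The minimal standard-library sketch below reads it; `service_account_email` is a hypothetical helper for illustration, not part of the connector.

```python
import json
from pathlib import Path


def service_account_email(credential_path: str) -> str:
    """Return the service account's email from a JSON key file (hypothetical helper)."""
    data = json.loads(Path(credential_path).read_text(encoding="utf-8"))
    # Service account keys are tagged with "type": "service_account".
    if data.get("type") != "service_account":
        raise ValueError(f"{credential_path} is not a service account key")
    # This is the address to share the target Drive folders with.
    return data["client_email"]
```

For example, `print(service_account_email("./credentials.json"))` prints the address to enter in Drive's sharing dialog.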
GoogleDriveSource
The primary source class for iterating over Google Drive files.
```python
class GoogleDriveSource(
    *,
    service_account_credential_path: str,
    root_folder_ids: Sequence[str],
    mime_types: Sequence[str] | None = None,
)
```
Parameters:
- `service_account_credential_path`: Path to the service account JSON credential file.
- `root_folder_ids`: List of Google Drive folder IDs to scan. Subfolders are traversed recursively.
- `mime_types`: Optional list of MIME types to include. If `None`, all file types are included.
Iterating files
GoogleDriveSource provides async iteration via files(), yielding DriveFile objects (implementing the FileLike base class):
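```python
import asyncio
from cocoindex.connectors import google_drive

async def main() -> None:
    source = google_drive.GoogleDriveSource(
        service_account_credential_path="./credentials.json",
        root_folder_ids=["1abc...xyz"],
    )
    # files() yields DriveFile objects asynchronously
    async for file in source.files():
        text = await file.read_text()
        ...  # process the text

asyncio.run(main())
```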
Keyed iteration with items()
items() yields (str, DriveFile) pairs, where the key is the file’s name path. This is useful with mount_each():
```python
# Inside an async function, with `source` constructed as above
async for key, file in source.items():
    content = await file.read()
```
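The key format is not specified beyond "name path". Assuming it is the slash-joined chain of display names from just below the root folder down to the file, a key could be derived as in this sketch. `Node` and `name_path` are hypothetical illustrations, not connector API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Minimal stand-in for a Drive item: an ID, a display name, a parent ID."""
    id: str
    name: str
    parent: str | None


def name_path(file_id: str, nodes: dict[str, Node], root_id: str) -> str:
    """Join display names from just below `root_id` down to `file_id`.

    Assumption: keys are slash-separated names relative to the root folder.
    """
    parts: list[str] = []
    current = nodes[file_id]
    while current.id != root_id:
        parts.append(current.name)
        if current.parent is None:
            raise ValueError(f"{file_id} is not under root {root_id}")
        current = nodes[current.parent]
    return "/".join(reversed(parts))
```

Under that assumption, keys follow names rather than Drive file IDs, so renaming or moving a file would produce a different key.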
Filtering by MIME type
Use mime_types to restrict which files are returned:
```python
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
    mime_types=["application/pdf", "text/plain"],
)
```
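The documented rule is simple: `None` keeps every file, otherwise only listed types pass. The sketch below expresses that rule over `(name, mime_type)` pairs. It assumes exact string matching (no wildcards such as `text/*`) and does not model whether Workspace files are matched by their native or exported type; `filter_by_mime` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence


def filter_by_mime(
    files: Iterable[tuple[str, str]],
    mime_types: Sequence[str] | None,
) -> list[str]:
    """Return the names of (name, mime_type) pairs that pass the filter."""
    if mime_types is None:
        # No filter configured: every file is included.
        return [name for name, _ in files]
    allowed = set(mime_types)
    # Assumption: exact matches against the allow-list only.
    return [name for name, mime in files if mime in allowed]
```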
Google Workspace files (Docs, Sheets, Slides) are automatically exported:
| Google Workspace type | Exported as |
|---|---|
| Google Docs | Plain text |
| Google Sheets | CSV |
| Google Slides | Plain text |
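In terms of Drive MIME types, the table above corresponds to the following mapping. The native Workspace types are the ones the Drive API reports for Docs, Sheets, and Slides; the dictionary and the `export_mime_type` helper are an illustrative sketch, not the connector's internal code.

```python
from __future__ import annotations

# Native Google Workspace MIME types mapped to the export formats in the table.
WORKSPACE_EXPORTS: dict[str, str] = {
    "application/vnd.google-apps.document": "text/plain",      # Google Docs
    "application/vnd.google-apps.spreadsheet": "text/csv",     # Google Sheets
    "application/vnd.google-apps.presentation": "text/plain",  # Google Slides
}


def export_mime_type(mime_type: str) -> str | None:
    """Return the export format for a Workspace file, or None for a regular
    file, which would be downloaded as-is rather than exported."""
    return WORKSPACE_EXPORTS.get(mime_type)
```

In the Drive v3 API, this kind of conversion is done with the `files.export` method, which takes a file ID and a target `mimeType`; regular files are downloaded directly instead.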
list_files
A lower-level sync iterator for listing files:
```python
def list_files(spec: GoogleDriveSourceSpec) -> Iterator[DriveFile]
```
Parameters:
- `spec`: A `GoogleDriveSourceSpec` with the same fields as the `GoogleDriveSource` constructor parameters.
Returns: A sync iterator of DriveFile objects.
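The documentation does not describe how `list_files()` walks the folder tree internally. As a rough mental model, a lazy, stack-based traversal that starts from the root folders, descends into subfolders, and applies the optional MIME filter could look like this. `Item`, `walk_drive`, and the `list_children` callback are hypothetical; a real implementation would page through Drive API results instead of calling an in-memory function. The folder MIME type is the one Drive uses for folders.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator, Sequence
from dataclasses import dataclass

FOLDER_MIME = "application/vnd.google-apps.folder"  # Drive's folder MIME type


@dataclass(frozen=True)
class Item:
    id: str
    name: str
    mime_type: str


def walk_drive(
    root_folder_ids: Sequence[str],
    list_children: Callable[[str], list[Item]],
    mime_types: Sequence[str] | None = None,
) -> Iterator[Item]:
    """Lazily yield non-folder items under the given roots, recursively.

    `list_children` is a hypothetical stand-in for a Drive API call that
    lists the direct children of a folder ID.
    """
    allowed = None if mime_types is None else set(mime_types)
    stack = list(reversed(root_folder_ids))
    seen: set[str] = set()
    while stack:
        folder_id = stack.pop()
        if folder_id in seen:  # guard against visiting a folder twice
            continue
        seen.add(folder_id)
        children = list_children(folder_id)
        # Push subfolders in reverse so siblings are visited in listing order.
        subfolders = [c.id for c in children if c.mime_type == FOLDER_MIME]
        stack.extend(reversed(subfolders))
        for child in children:
            if child.mime_type == FOLDER_MIME:
                continue
            if allowed is None or child.mime_type in allowed:
                yield child
```

Because it is a generator, nothing is listed until the caller starts iterating, which matches the sync-iterator return type documented above.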
DriveFile
DriveFile implements FileLike with Google Drive-specific behavior:
- `file_path`: A `DriveFilePath` whose `resolve()` returns the Google Drive file ID.
- `read()` / `read_text()`: Download file content via the Google Drive API. Partial reads (the `size` parameter) are not supported.
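Because `size` cannot be used, code that only needs the start of a file has to download it completely and slice locally. The sketch below shows that pattern against `InMemoryDriveFile`, a hypothetical stand-in for `DriveFile`; raising `NotImplementedError` for partial reads and decoding as UTF-8 are assumptions, not documented connector behavior.

```python
from __future__ import annotations


class InMemoryDriveFile:
    """Hypothetical stand-in for DriveFile, for illustration only."""

    def __init__(self, file_id: str, content: bytes) -> None:
        self.file_id = file_id  # plays the role of DriveFilePath.resolve()
        self._content = content

    async def read(self, size: int | None = None) -> bytes:
        # Assumption: a partial read request is rejected outright.
        if size is not None:
            raise NotImplementedError("partial reads are not supported")
        return self._content

    async def read_text(self) -> str:
        return (await self.read()).decode("utf-8")  # assumes UTF-8 content


async def read_prefix(file: InMemoryDriveFile, n: int) -> bytes:
    """Download the whole file, then keep only the first n bytes."""
    return (await file.read())[:n]


if __name__ == "__main__":
    import asyncio

    demo = InMemoryDriveFile("1abc", b"hello drive")
    print(asyncio.run(read_prefix(demo, 5)))  # b'hello'
```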
Example
```python
import cocoindex as coco
from cocoindex.connectors import google_drive
from cocoindex.resources.file import FileLike


@coco.fn(memo=True)
async def process_file(file: FileLike) -> None:
    text = await file.read_text()
    # ... process the file content ...


@coco.fn
async def app_main(credential_path: str, folder_ids: list[str]) -> None:
    source = google_drive.GoogleDriveSource(
        service_account_credential_path=credential_path,
        root_folder_ids=folder_ids,
    )
    with coco.component_subpath("file"):
        async for key, file in source.items():
            await coco.mount(
                coco.component_subpath(key),
                process_file,
                file,
            )


app = coco.App(
    "GoogleDriveIngestion",
    app_main,
    credential_path="./credentials.json",
    folder_ids=["1abc...xyz"],
)
```