Google Drive connector

Read files recursively from shared Google Drive folders via a service account. Supports MIME-type filtering and auto-exports Google Docs, Sheets, and Slides to text or CSV.

Version: v1.0.0-alpha48
Last reviewed: Apr 19, 2026

The google_drive connector provides utilities for reading files from Google Drive using a service account.

```python
from cocoindex.connectors import google_drive
```

Dependencies

This connector requires additional dependencies. Install with:

```bash
# Quoting the extra keeps shells such as zsh from expanding the brackets
pip install "cocoindex[google_drive]"
```

As source

The connector provides two ways to read from Google Drive:

  • GoogleDriveSource — high-level source class with async iteration
  • list_files() — lower-level function returning a sync iterator

Both require a Google service account with access to the target Drive folders.

Setting up a service account

  1. Create a service account in the Google Cloud Console
  2. Download the JSON credential file
  3. Share the target Drive folders with the service account’s email address
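
Step 3 needs the service account's email address. A service account key file is JSON, and its `client_email` field holds that address. The minimal sketch below reads it with the standard library. The helper name `service_account_email` is hypothetical and is not part of the connector.

```python
import json
from pathlib import Path


def service_account_email(credential_path: str) -> str:
    """Return the address to share Drive folders with.

    Hypothetical helper: reads a service account JSON key file and
    returns its `client_email` field.
    """
    data = json.loads(Path(credential_path).read_text(encoding="utf-8"))
    # Keys created for a service account carry "type": "service_account".
    if data.get("type") != "service_account":
        raise ValueError(f"{credential_path} is not a service account key file")
    email = data.get("client_email")
    if not email:
        raise ValueError(f"{credential_path} has no client_email field")
    return email
```

For example, `service_account_email("./credentials.json")` returns the address to paste into the Drive sharing dialog.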

GoogleDriveSource

The primary source class for iterating over Google Drive files.

```python
class GoogleDriveSource(
    *,
    service_account_credential_path: str,
    root_folder_ids: Sequence[str],
    mime_types: Sequence[str] | None = None,
)
```

Parameters:

  • service_account_credential_path — Path to the service account JSON credential file.
  • root_folder_ids — List of Google Drive folder IDs to scan. Subfolders are traversed recursively.
  • mime_types — Optional list of MIME types to include. If None, all file types are included.

Iterating files

GoogleDriveSource provides async iteration via files(), yielding DriveFile objects (implementing the FileLike base class):

```python
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
)

async for file in source.files():
    text = await file.read_text()
    ...
```
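
Because `files()` is an async iterator, consumer code can be tested offline with any object of the same shape. The sketch below uses hypothetical `StubFile` and `StubSource` test doubles that mimic only `files()` and `read_text()` as described above. They are not part of the connector.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class StubFile:
    """Hypothetical test double exposing only read_text()."""
    name: str
    text: str

    async def read_text(self) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a network read would
        return self.text


class StubSource:
    """Hypothetical test double for GoogleDriveSource.files()."""

    def __init__(self, files: list[StubFile]) -> None:
        self._files = files

    async def files(self) -> AsyncIterator[StubFile]:
        for f in self._files:
            yield f


async def count_words(source) -> int:
    """Consumer written against the documented files()/read_text() shape."""
    total = 0
    async for file in source.files():
        text = await file.read_text()
        total += len(text.split())
    return total
```

For example, `asyncio.run(count_words(StubSource([StubFile("a.txt", "hello drive world")])))` returns `3`. In production, the same coroutine would receive a real `GoogleDriveSource`.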

Keyed iteration with items()

items() yields (str, DriveFile) pairs, where the key is the file’s name path. Keyed iteration is useful with mount_each(), or with coco.mount() inside a loop as in the Example section:

```python
async for key, file in source.items():
    content = await file.read()
```
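
Google Drive allows two files with the same name in one folder, so name-path keys are not guaranteed to be unique. This page does not say how the connector handles such collisions. As a precaution, the minimal sketch below collects `items()` into a dict and raises on a repeated key instead of silently overwriting it. The stub classes and `collect_by_key` are hypothetical.

```python
import asyncio
from collections.abc import AsyncIterator


class StubBytesFile:
    """Hypothetical test double exposing only read()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    async def read(self) -> bytes:
        return self._data


class StubKeyedSource:
    """Hypothetical test double for GoogleDriveSource.items()."""

    def __init__(self, pairs: list[tuple[str, StubBytesFile]]) -> None:
        self._pairs = pairs

    async def items(self) -> AsyncIterator[tuple[str, StubBytesFile]]:
        for key, file in self._pairs:
            yield key, file


async def collect_by_key(source) -> dict[str, bytes]:
    """Read every file into a dict keyed by name path; reject duplicate keys."""
    contents: dict[str, bytes] = {}
    async for key, file in source.items():
        if key in contents:
            raise KeyError(f"duplicate name path: {key!r}")
        contents[key] = await file.read()
    return contents
```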

Filtering by MIME type

Use mime_types to restrict which files are returned:

```python
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
    mime_types=["application/pdf", "text/plain"],
)
```

Google Workspace files (Docs, Sheets, Slides) are automatically exported:

| Google Workspace type | Exported as |
| --- | --- |
| Google Docs | Plain text |
| Google Sheets | CSV |
| Google Slides | Plain text |

list_files

A lower-level function that returns a synchronous iterator over files:

```python
def list_files(spec: GoogleDriveSourceSpec) -> Iterator[DriveFile]
```

Parameters:

  • spec — A GoogleDriveSourceSpec with the same fields as the GoogleDriveSource constructor parameters.

Returns: A sync iterator of DriveFile objects.

DriveFile

DriveFile implements FileLike with Google Drive-specific behavior:

  • file_path — A DriveFilePath where resolve() returns the Google Drive file ID.
  • read() / read_text() — Downloads file content via the Google Drive API. Partial reads (size parameter) are not supported.
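
Partial reads are not supported. Code that needs only the beginning of a file, for example to sniff a header, can read the full content and slice it locally. The minimal sketch below uses a hypothetical `StubDriveFile` in place of a real `DriveFile`. Note that this approach still transfers the entire file.

```python
import asyncio


class StubDriveFile:
    """Hypothetical stand-in: read() returns the full content, no size argument."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    async def read(self) -> bytes:
        return self._data


async def read_prefix(file, n: int) -> bytes:
    """Return the first n bytes by reading everything, then slicing locally."""
    if n < 0:
        raise ValueError("n must be non-negative")
    data = await file.read()  # full download; partial reads are unsupported
    return data[:n]
```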

Example

```python
import cocoindex as coco
from cocoindex.connectors import google_drive
from cocoindex.resources.file import FileLike

@coco.fn(memo=True)
async def process_file(file: FileLike) -> None:
    text = await file.read_text()
    # ... process the file content ...

@coco.fn
async def app_main(credential_path: str, folder_ids: list[str]) -> None:
    source = google_drive.GoogleDriveSource(
        service_account_credential_path=credential_path,
        root_folder_ids=folder_ids,
    )

    with coco.component_subpath("file"):
        async for key, file in source.items():
            await coco.mount(
                coco.component_subpath(key),
                process_file,
                file,
            )

app = coco.App(
    "GoogleDriveIngestion",
    app_main,
    credential_path="./credentials.json",
    folder_ids=["1abc...xyz"],
)
```