Python SDK overview

A tour of the CocoIndex Python SDK — package layout (core, connectors, resources, ops), common types like StableKey, and how async orchestration composes with sync leaf functions.

Version: v1.0.0-alpha48
Last reviewed: Apr 19, 2026

This document provides an overview of the CocoIndex Python SDK organization and how async and sync APIs work together.

Package organization

The CocoIndex SDK is organized into several modules:

Core package

| Package | Description |
|---|---|
| `cocoindex` | All core APIs: async by default; sync variants have a `_blocking` suffix |

Sub-packages

| Package | Description |
|---|---|
| `cocoindex.connectors` | Connectors for data sources and targets |
| `cocoindex.resources` | Common resources: shared data types, vector schema annotations, and ID generation utilities |
| `cocoindex.ops` | Built-in operations for common data processing tasks (e.g., text splitting, embedding with SentenceTransformers) |

Import connectors and extras by their specific sub-module:

```python
from cocoindex.connectors import localfs, postgres
from cocoindex.ops.text import RecursiveSplitter
from cocoindex.ops.sentence_transformers import SentenceTransformerEmbedder
from cocoindex.resources.file import FileLike, PatternFilePathMatcher
from cocoindex.resources.chunk import Chunk
```

Common types

StableKey

StableKey is a type alias defining what values can be used when creating component paths via coco.component_subpath():

```python
StableKey = None | bool | int | str | bytes | uuid.UUID | Symbol | tuple[StableKey, ...]
```

Common examples include strings (like "setup" or "table"), integers, and UUIDs. Tuples allow composite keys when needed. Symbol provides predefined names that will never conflict with strings (which typically come from runtime data).
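To make the alias concrete, here is a minimal, stdlib-only sketch of a checker that accepts exactly the shapes listed in the alias, recursing into tuples. `is_stable_key` and the `Symbol` class below are hypothetical stand-ins written for illustration; they are not CocoIndex APIs, and CocoIndex's real `Symbol` type is not reproduced here.

```python
import uuid


class Symbol:
    """Hypothetical stand-in for CocoIndex's Symbol type (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


# Scalar members of the alias (bool is already an int subclass; listed for clarity).
_SCALARS = (bool, int, str, bytes, uuid.UUID, Symbol)


def is_stable_key(value: object) -> bool:
    """Return True if value fits the StableKey alias, recursing into tuples."""
    if value is None or isinstance(value, _SCALARS):
        return True
    if isinstance(value, tuple):
        return all(is_stable_key(part) for part in value)
    return False  # lists, dicts, floats, etc. are outside the alias


print(is_stable_key("setup"))                      # True
print(is_stable_key(("table", 42, uuid.uuid4())))  # True (composite key)
print(is_stable_key(Symbol("setup")))              # True
print(is_stable_key(["table", 42]))                # False (list, not tuple)
print(is_stable_key(1.5))                          # False (float)
```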

Each processing component must be mounted at a unique path. See Processing Component for how the component path tree affects target states and ownership.
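The uniqueness rule can be pictured with a toy registry that stores each mounted path as a tuple of StableKey parts and refuses a second mount at the same path. `ComponentRegistry` is hypothetical; it only illustrates the invariant and does not model how CocoIndex detects or reports a path collision.

```python
class ComponentRegistry:
    """Hypothetical toy registry: one component per path, duplicates rejected."""

    def __init__(self) -> None:
        self._mounted: set[tuple] = set()

    def mount(self, *path_parts) -> tuple:
        # A path is a tuple of StableKey parts, e.g. ("files", "a.pdf").
        path = tuple(path_parts)
        if path in self._mounted:
            # The sketch simply raises; CocoIndex's own handling is not modeled.
            raise ValueError(f"component path already mounted: {path!r}")
        self._mounted.add(path)
        return path


registry = ComponentRegistry()
registry.mount("setup")
registry.mount("files", "a.pdf")
registry.mount("files", "b.pdf")  # siblings under "files" are fine
try:
    registry.mount("files", "a.pdf")  # same path twice
except ValueError as err:
    print(err)  # component path already mounted: ('files', 'a.pdf')
```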

Async vs sync APIs

CocoIndex’s API is async-first. The APIs fall into three categories:

Orchestration APIs (async only)

The APIs that shape your pipeline are async:

mount(), use_mount(), mount_each(), mount_target(), map()

Entry-point APIs (async + sync)

APIs for starting and running your pipeline have both async and sync variants. Sync variants use a _blocking suffix:

| Async | Sync (blocking) |
|---|---|
| `await app.update(...)` | `app.update_blocking(...)` |
| `await app.drop(...)` | `app.drop_blocking(...)` |
| `await coco.start()` | `coco.start_blocking()` |
| `await coco.stop()` | `coco.stop_blocking()` |
| `async with coco.runtime():` | `with coco.runtime():` |
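The `_blocking` naming follows a common Python pattern: the async method holds the real logic, and the sync twin drives it to completion on an event loop. The sketch below shows that pattern with a hypothetical `ToyApp`; it makes no claim about how CocoIndex's own `_blocking` variants are implemented.

```python
import asyncio


class ToyApp:
    """Hypothetical app: async-first method plus a `_blocking` twin."""

    def __init__(self) -> None:
        self.runs = 0

    async def update(self) -> int:
        # The async method holds the real logic.
        await asyncio.sleep(0)  # stand-in for I/O-bound work
        self.runs += 1
        return self.runs

    def update_blocking(self) -> int:
        # The sync twin runs the coroutine to completion on a fresh event loop.
        # asyncio.run raises RuntimeError if a loop is already running in this thread.
        return asyncio.run(self.update())


toy_app = ToyApp()
print(toy_app.update_blocking())  # 1 (script / CLI style)


async def main() -> int:
    return await toy_app.update()  # async style, inside your own event loop


print(asyncio.run(main()))  # 2
```

Because `asyncio.run` refuses to start while another loop is running in the same thread, a `_blocking` call written in this style belongs in scripts and CLI code, not inside async functions.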

app.update() returns an UpdateHandle that is also awaitable — await app.update() returns the result directly, or you can use the handle for progress monitoring. Use the _blocking variants for scripts and CLI usage. See App for details.
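The "awaitable handle" idea can be sketched with the standard library alone: an object that starts its work as a task, exposes a progress counter, and implements `__await__` by delegating to that task. `ToyUpdateHandle` and `toy_update` are hypothetical; the real `UpdateHandle` API, including how it reports progress, is described on the App page and is not reproduced here.

```python
import asyncio


class ToyUpdateHandle:
    """Hypothetical handle: `await` it for the result, or poll it for progress."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0
        # Start the work right away; this requires a running event loop.
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def _run(self) -> str:
        for _ in range(self.total):
            await asyncio.sleep(0)  # stand-in for processing one item
            self.done += 1
        return f"processed {self.done} items"

    def progress(self) -> str:
        return f"{self.done}/{self.total}"

    def __await__(self):
        # Delegating to the task makes `await handle` evaluate to the task's result.
        return self._task.__await__()


def toy_update(total: int) -> ToyUpdateHandle:
    # Hypothetical stand-in for app.update(): returns a handle without blocking.
    return ToyUpdateHandle(total)


async def main() -> tuple[str, str, list[str]]:
    # Style 1: await the call directly and get the result.
    direct = await toy_update(3)

    # Style 2: keep the handle, sample its progress, then await it.
    handle = toy_update(3)
    snapshots = []
    while handle.done < handle.total:
        snapshots.append(handle.progress())
        await asyncio.sleep(0)  # let the handle's task make progress
    monitored = await handle
    return direct, monitored, snapshots


direct, monitored, snapshots = asyncio.run(main())
print(direct)     # processed 3 items
print(monitored)  # processed 3 items
print(snapshots)  # progress samples; exact values depend on task scheduling
```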

Processing functions (your choice)

The @coco.fn decorator preserves the sync/async nature of your function — your processing functions can be sync or async. See Function for details.
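A decorator can preserve the sync/async nature of what it wraps by checking `inspect.iscoroutinefunction` and returning a wrapper of the same kind. The `traced` decorator below is a hypothetical illustration of that technique only; it is not how `@coco.fn` is implemented, and it ignores options such as `memo=True`.

```python
import asyncio
import functools
import inspect


def traced(fn):
    """Hypothetical decorator that keeps sync functions sync and async ones async."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            print(f"calling {fn.__name__} (async)")
            return await fn(*args, **kwargs)

        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        print(f"calling {fn.__name__} (sync)")
        return fn(*args, **kwargs)

    return sync_wrapper


@traced
def word_count(text: str) -> int:
    return len(text.split())


@traced
async def fetch_title(doc_id: int) -> str:
    await asyncio.sleep(0)
    return f"doc-{doc_id}"


print(inspect.iscoroutinefunction(word_count))   # False: still a plain function
print(inspect.iscoroutinefunction(fetch_title))  # True: still a coroutine function
print(word_count("a b c"))                       # prints the trace line, then 3
print(asyncio.run(fetch_title(7)))               # prints the trace line, then doc-7
```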

How sync and async work together

As in any async Python program, async functions can call sync code directly, but sync functions cannot await async ones. In practice, this means higher-level functions (orchestration) tend to be async, while leaf functions (the actual computation) can be sync.
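The asymmetry is easy to reproduce with plain `asyncio`. A sync function cannot use `await`; its only way to run a coroutine is to start a new event loop with `asyncio.run`, which raises `RuntimeError` when a loop is already running in the same thread. The function names below are illustrative only.

```python
import asyncio


async def fetch_value() -> int:
    await asyncio.sleep(0)
    return 42


def sync_helper() -> int:
    # A plain function cannot `await`; its only option is to start a new loop.
    coro = fetch_value()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def async_caller() -> str:
    # Calling the sync function from async code is an ordinary call...
    try:
        sync_helper()
    except RuntimeError as err:
        # ...but that sync function cannot start a loop inside the running one.
        return f"sync -> async failed: {err}"
    return "unexpected success"


print(sync_helper())                # 42: no loop is running yet
print(asyncio.run(async_caller()))  # sync -> async failed: asyncio.run() cannot be called from a running event loop
```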

CocoIndex provides two ways for async code to call into sync functions:

  • Mounting — When you mount a processing component, the function is scheduled on CocoIndex’s runtime, not called directly. So an async function can mount a sync processing function.
  • @coco.fn.as_async — Wraps a sync function with an async interface (runs on a thread pool). Useful for compute-intensive leaf functions. See Function for details.
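For the second option, the core idea of putting a sync function behind an async interface can be sketched with `asyncio.to_thread` (Python 3.9+). The `as_async` decorator here is a hypothetical analogue written for illustration; it is not `@coco.fn.as_async` (see Function for the real decorator).

```python
import asyncio
import functools
import threading


def as_async(fn):
    """Hypothetical analogue of an as_async wrapper: sync body, async interface."""

    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        # asyncio.to_thread runs fn in the loop's default thread pool,
        # so the event loop stays free while the sync work runs.
        return await asyncio.to_thread(fn, *args, **kwargs)

    return wrapper


@as_async
def checksum(data: bytes) -> tuple[int, str]:
    # Sync leaf function; also reports which thread executed it.
    return sum(data) % 256, threading.current_thread().name


async def main() -> list[tuple[int, str]]:
    # Both calls are awaited concurrently; each runs in a worker thread.
    return await asyncio.gather(checksum(b"abc"), checksum(b"xyz"))


for value, thread_name in asyncio.run(main()):
    print(value, thread_name)  # 38, then 107, each with a worker-thread name
```

In CPython, a thread pool speeds up compute-heavy leaf functions mainly when the underlying library releases the GIL, as many native extensions do; pure-Python loops in different threads still execute one at a time.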

Example: async orchestration mounting sync leaf functions

A typical pipeline has an async main function that orchestrates the pipeline, while leaf functions that do the actual computation can be sync:

```python
import pathlib

import cocoindex as coco
from cocoindex.connectors import localfs
from cocoindex.resources.file import PatternFilePathMatcher
from docling.document_converter import DocumentConverter

_converter = DocumentConverter()

@coco.fn(memo=True)
def process_file(file: localfs.File, outdir: pathlib.Path) -> None:
    # Sync leaf function: does the actual computation
    markdown = _converter.convert(
        file.file_path.resolve()
    ).document.export_to_markdown()
    outname = file.file_path.path.stem + ".md"
    localfs.declare_file(outdir / outname, markdown, create_parent_dirs=True)

@coco.fn
async def app_main(sourcedir: pathlib.Path, outdir: pathlib.Path) -> None:
    # Async: orchestrates the pipeline, mounts child components
    files = localfs.walk_dir(
        sourcedir,
        recursive=True,
        path_matcher=PatternFilePathMatcher(included_patterns=["**/*.pdf"]),
    )
    await coco.mount_each(process_file, files.items(), outdir)

app = coco.App(
    "PdfToMarkdown",
    app_main,
    sourcedir=pathlib.Path("./pdf_files"),
    outdir=pathlib.Path("./out"),
)
```

Here app_main is async because it uses mounting APIs (mount_each), while process_file is sync because it only does computation. The sync process_file is mounted as a child component — mounting schedules it on CocoIndex’s runtime, so the async parent can mount a sync child without issues.
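Conceptually, mounting lets the async parent hand each sync child to a scheduler instead of calling it inline. The hypothetical `toy_mount_each` below imitates that with a thread pool, assuming (as the `files.items()` call suggests) that each item is a `(key, value)` pair whose key names the child component. It is an illustration only; CocoIndex's runtime also deals with component paths, memoization, and target states, none of which is modeled here.

```python
import asyncio
import pathlib


def process_file(path: pathlib.Path, outdir: pathlib.Path) -> pathlib.Path:
    # Sync leaf: a real pipeline would convert the file here; the sketch
    # only computes the output path, mirroring the PDF example.
    return outdir / (path.stem + ".md")


async def toy_mount_each(fn, keyed_items, *args) -> dict:
    """Hypothetical stand-in for mounting: schedule sync `fn` once per (key, item)."""
    keys, jobs = [], []
    for key, item in keyed_items:
        keys.append(key)  # the key would name the child component
        jobs.append(asyncio.to_thread(fn, item, *args))  # runs off the event loop
    results = await asyncio.gather(*jobs)
    return dict(zip(keys, results))


async def app_main(outdir: pathlib.Path) -> dict:
    files = {
        "a.pdf": pathlib.Path("pdf_files/a.pdf"),
        "sub/b.pdf": pathlib.Path("pdf_files/sub/b.pdf"),
    }
    # The async parent never calls the sync leaf inline; it hands it off.
    return await toy_mount_each(process_file, files.items(), outdir)


print(asyncio.run(app_main(pathlib.Path("out"))))
```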

Running an app

Run the app with either an async or sync entry point:

```python
# Async entry point
import asyncio

async def main():
    await app.update(report_to_stdout=True)

asyncio.run(main())
```

```python
# Sync entry point (scripts, CLI)
app.update_blocking(report_to_stdout=True)
```