Processing Component

Most apps process many independent source items — files, rows, or entities. A Processing Component is the unit of execution for one: it runs that item's transformation logic and declares the set of target states produced for it.

Component path

A component path is the stable identifier for a processing component across runs (think of it like a path in a tree). CocoIndex uses it to match a component to its previous run, detect what changed for that item, and sync that component's target states atomically when it finishes. This sync happens per component; CocoIndex does not wait for other components in the same app to complete.

Component paths are hierarchical and form a tree structure. You specify child paths using coco.component_subpath() with stable identifiers like string literals, file names, row keys, or entity IDs:

coco.component_subpath(filename)           # e.g., coco.component_subpath("hello.pdf")
coco.component_subpath("user", user_id)    # e.g., coco.component_subpath("user", 12345)

Choose paths that are stable for the "same" item (e.g., file path, primary key). If an item disappears and its path is no longer present, CocoIndex cleans up the target states owned by that path (and its sub-paths).

Here's an example component path tree (from the Quickstart):

(root)                         ← app_main component
├── "setup"                    ← declare_dir_target component
└── "process"
    ├── "hello.pdf"            ← process_file component
    └── "world.pdf"            ← process_file component

See StableKey in the SDK Overview for details on what values can be used in component paths.

Mount

Mounting is how you declare (instantiate) a processing component within an app at a specific path, so CocoIndex knows that component exists, should run, and owns a set of target states. CocoIndex provides two APIs: mount() and mount_run().

mount() sets up a processing component in a child path without depending on data from it. This allows the component to refresh independently in live mode.
mount_run() returns a value from the component's execution to the caller. The component at that path cannot refresh independently without re-executing the caller.

Usually, only use mount_run() when you need the return value.

`mount()`

Use mount() when you don't need a return value from the processing component. It schedules the processing component to run and returns a handle:

handle = coco_aio.mount(
    coco.component_subpath("process", filename),
    process_file,
    file,
    target,
)

The handle provides a method you can call if you need to wait until the processing component is fully ready — meaning all its target states have been applied to external systems and all components in its sub-paths are ready:

await handle.ready()  # Async API

The corresponding sync API:

handle = coco.mount(
    coco.component_subpath("process", filename),
    process_file,
    file,
    target,
)
handle.wait_until_ready()  # Blocks until ready

You usually only need to call ready() (or wait_until_ready() in sync) when you have logic that depends on the processing component's target states being applied — for example, querying the latest data from a target table after syncing it.

`mount_run()`

Use mount_run() when you need the processing component's return value. It returns a handle with a result() method:

handle = coco_aio.mount_run(
    coco.component_subpath("setup"),
    setup_table,
    table_name="docs",
)
table = await handle.result()  # Waits until ready, then returns the value

Calling result() waits until the processing component is ready and then returns the value.

The corresponding sync API:

handle = coco.mount_run(
    coco.component_subpath("setup"),
    setup_table,
    table_name="docs",
)
table = handle.result()  # Blocks until ready, then returns the value

A common use of mount_run() is to obtain a target after its container target state is applied.

Using `component_subpath` as a context manager

You can also use component_subpath() as a context manager to create nested paths without repeating common prefixes:

with coco.component_subpath("process"):
    for f in files:
        coco.mount(
            coco.component_subpath(str(f.relative_path)),
            process_file,
            f,
            target,
        )

This is equivalent to:

for f in files:
    coco.mount(
        coco.component_subpath("process", str(f.relative_path)),
        process_file,
        f,
        target,
    )

How target states sync

The component path tree determines ownership. When a component is no longer mounted at a path (e.g., a source file is deleted), CocoIndex automatically cleans up its target states — and recursively for all its sub-paths.

Sync Mechanism

After a processing component finishes, CocoIndex syncs its target states:

Compares the target states declared in this run against those from the previous run at the same path
Applies changes as a unit — creating, updating, or deleting target states as needed
Recursively cleans up sub-paths where components are no longer mounted

This provides atomic updates per component. For example, if a source file changes, its component's target states are applied atomically.

How big should a processing component be?

When defining processing components, think about granularity — what one path represents — because it determines the sync boundary for target states.

For example, if you're processing files:

Coarse: one component for "all files" (coco.component_subpath("process"))
Medium: one component per file (coco.component_subpath("process", file_path))
Fine: one component per chunk (coco.component_subpath("process", file_path, chunk_id))

In general:

Coarse-grained (fewer, larger components): More target states sync together as a unit, but you only see updates after the larger component finishes.
Fine-grained (more, smaller components): Each component syncs its target states as soon as it finishes, but target states owned by different components do not sync atomically together.

For small datasets, a single processing component that owns all target states is simple and ensures all target states sync atomically. As data grows, consider breaking it down into one component per source item (e.g., one per file) to reduce latency: you see each item's target states synced as soon as it's processed, without waiting for the full dataset to complete. This also helps isolate failures to that item.

Explicit context management

CocoIndex automatically propagates component context through Python's contextvars, which works for ordinary function calls (both sync and async). However, in situations where context variables are not preserved (for example, when using concurrent.futures.ThreadPoolExecutor), you need to explicitly capture and attach the context.

Use coco.get_component_context() to capture the current context, and context.attach() to restore it:

from concurrent.futures import ThreadPoolExecutor

@coco.function
def app_main() -> None:
    # Capture the current context
    ctx = coco.get_component_context()

    def worker(item):
        # Attach the context in the worker thread
        with ctx.attach():
            # Now CocoIndex APIs work correctly
            process_item(item)

    with ThreadPoolExecutor() as executor:
        executor.map(worker, items)

This pattern ensures that CocoIndex can track component relationships and target state ownership even across thread boundaries.

Component path​

Mount​

mount()​

mount_run()​

Using component_subpath as a context manager​

How target states sync​

How big should a processing component be?​

Explicit context management​