The CocoIndex App

Apps are the top-level runnable unit in CocoIndex. Learn how to create an App, trigger updates, configure the database path, and wire up lifespan-managed resources.

Version: v1.0.2
Last reviewed: May 2, 2026

An App is the top-level runnable unit in CocoIndex. It names your pipeline and binds a main function to its arguments. When you call app.update(), CocoIndex runs that main function as the root processing component, which can mount child processing components that do the work and declare target states.

Creating an app

To create an App, provide:

An AppConfig (or just a name string) — identifies the pipeline
A main function — the entry point for your pipeline
Arguments — any additional arguments to pass to the main function

python

import pathlib

import cocoindex as coco

@coco.fn
async def app_main(sourcedir: pathlib.Path) -> None:
    # ... your pipeline logic ...
    pass

app = coco.App(
    coco.AppConfig(name="MyPipeline"),
    app_main,
    sourcedir=pathlib.Path("./data"),
)

You can also pass just a name string instead of AppConfig:

python

app = coco.App("MyPipeline", app_main, sourcedir=pathlib.Path("./data"))
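Conceptually, an App is a pipeline name, a main function, and the arguments to call it with. The sketch below is a rough illustration only. MiniApp and MiniAppConfig are hypothetical classes, not part of CocoIndex. The sketch shows two things: how a constructor can accept either a config object or a bare name string and normalize it, and how the bound arguments are replayed on each update.

```python
import asyncio
import pathlib
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class MiniAppConfig:
    """Hypothetical stand-in for an app config: identifies the pipeline."""
    name: str


class MiniApp:
    """Hypothetical toy App: binds a name, a main function, and its arguments."""

    def __init__(self, config: "MiniAppConfig | str",
                 main_fn: Callable[..., Awaitable[Any]], **kwargs: Any) -> None:
        # Accept either a config object or just a name string.
        if isinstance(config, MiniAppConfig):
            self.config = config
        else:
            self.config = MiniAppConfig(name=config)
        self.main_fn = main_fn
        self.kwargs = kwargs  # bound once, passed on every update

    async def update(self) -> Any:
        # Call the main function with the arguments bound at construction time.
        return await self.main_fn(**self.kwargs)


async def app_main(sourcedir: pathlib.Path) -> str:
    return f"processing {sourcedir}"


app = MiniApp("MyPipeline", app_main, sourcedir=pathlib.Path("./data"))
print(app.config.name, asyncio.run(app.update()))
```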

Tip

The main function is usually async. See How Sync and Async Work Together for details.

Updating an app

Call update() to execute the pipeline. It returns an UpdateHandle, which is also awaitable, so in async code you can simply await it to get the result:

python

# Async — await the result directly (backward-compatible)
result = await app.update()

python

# Sync (blocking) API
result = app.update_blocking()

Parameters:

live: keeps the app running after the initial scan so that live components can continue watching for changes. See Live Mode.
report_to_stdout: prints periodic progress updates during execution. Pass True for the default refresh interval, or a timedelta to set a custom interval.
full_reprocess: reprocesses everything and invalidates existing caches. This forces all components to re-execute and all target states to be re-applied, even if they haven’t changed.
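Because report_to_stdout accepts either True or a timedelta, the value has to be normalized into a refresh interval, or None when reporting is off. resolve_report_interval below is a hypothetical helper that sketches this normalization. The one-second default is an assumption for illustration, not CocoIndex's actual default.

```python
from datetime import timedelta
from typing import Optional, Union

# Assumed default for illustration only; CocoIndex's real default may differ.
ASSUMED_DEFAULT_INTERVAL = timedelta(seconds=1)


def resolve_report_interval(
        report_to_stdout: Union[bool, timedelta]) -> Optional[timedelta]:
    """Hypothetical helper: map the option value to a refresh interval."""
    if report_to_stdout is False:
        return None                      # reporting disabled
    if report_to_stdout is True:
        return ASSUMED_DEFAULT_INTERVAL  # True means "use the default interval"
    if isinstance(report_to_stdout, timedelta):
        return report_to_stdout          # explicit custom interval
    raise TypeError("report_to_stdout must be a bool or a timedelta")


print(resolve_report_interval(True))                  # 0:00:01
print(resolve_report_interval(timedelta(seconds=5)))  # 0:00:05
```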

When you update an App, CocoIndex:

Runs the lifespan setup (if not already done)
Executes the main function (the root processing component), which mounts child processing components
Compares the declared target states with the previous run and applies only the necessary changes to external systems

Given the same logic and inputs, updates are repeatable. When logic or inputs change, only the affected parts re-execute.
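The third step, comparing declared target states with the previous run, is essentially a keyed diff. The sketch below is a simplified illustration, not CocoIndex's internal algorithm. It uses plain dicts keyed by a target-state key and a hypothetical diff_target_states helper. The helper returns only the upserts and deletes that are needed, and a second run with unchanged declarations returns no changes at all, which mirrors why repeated updates are safe.

```python
from typing import Dict, List, Tuple

TargetStates = Dict[str, str]  # key -> desired value (simplified model)


def diff_target_states(previous: TargetStates, declared: TargetStates
                       ) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: compute the minimal changes to apply."""
    # Upsert keys that are new or whose desired value changed.
    upserts = {k: v for k, v in declared.items() if previous.get(k) != v}
    # Delete keys that were declared last run but are no longer declared.
    deletes = sorted(k for k in previous if k not in declared)
    return upserts, deletes


previous = {"hello.md": "v1", "world.md": "v1", "old.md": "v1"}
declared = {"hello.md": "v1", "world.md": "v2", "new.md": "v1"}
upserts, deletes = diff_target_states(previous, declared)
print(upserts)  # {'world.md': 'v2', 'new.md': 'v1'}
print(deletes)  # ['old.md']

# Re-running with unchanged declarations applies nothing.
print(diff_target_states(declared, declared))  # ({}, [])
```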

To watch progress beyond the report_to_stdout flag, the UpdateHandle returned by app.update() also exposes stats programmatically — poll with handle.stats() or stream with handle.watch(). For those structured APIs, and for splitting a run into separately-reported scopes with coco.stats_group(), see Progress monitoring.
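The handle behavior described above can be illustrated in plain asyncio. The handle is awaitable for the simple case, exposes stats() for polling, and exposes watch() for streaming. ToyUpdateHandle below is a hypothetical sketch, and its counter names are made up. It is not CocoIndex's UpdateHandle; the real stats APIs are documented in Progress monitoring.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Generator, List, Tuple


class ToyUpdateHandle:
    """Hypothetical handle: awaitable, with poll (stats) and stream (watch) views."""

    def __init__(self, total: int) -> None:
        self._stats = {"processed": 0, "total": total}
        self._done = asyncio.Event()
        # Start the work right away; this requires a running event loop.
        self._task = asyncio.ensure_future(self._run())

    async def _run(self) -> str:
        for _ in range(self._stats["total"]):
            await asyncio.sleep(0)          # stand-in for processing one item
            self._stats["processed"] += 1
        self._done.set()
        return "ok"

    def __await__(self) -> Generator[Any, None, str]:
        # Awaiting the handle awaits the underlying task and returns its result.
        return self._task.__await__()

    def stats(self) -> Dict[str, int]:
        # Poll: return a copy so callers get a stable snapshot.
        return dict(self._stats)

    async def watch(self) -> AsyncIterator[Dict[str, int]]:
        # Stream: yield a snapshot per tick until the run has finished.
        while not self._done.is_set():
            yield self.stats()
            await asyncio.sleep(0)
        yield self.stats()                  # final snapshot


async def demo() -> Tuple[str, List[Dict[str, int]], Dict[str, int]]:
    result = await ToyUpdateHandle(total=3)          # simple usage: just await it
    handle = ToyUpdateHandle(total=3)
    snapshots = [s async for s in handle.watch()]    # or stream progress
    await handle
    return result, snapshots, handle.stats()


result, snapshots, final = asyncio.run(demo())
print(result, final)  # ok {'processed': 3, 'total': 3}
```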

How an app runs

An App is the top-level runner and entry point. A processing component is the unit of incremental execution within an app.

Your app’s main function runs as the root processing component at the root path.
Each call to mount() or use_mount() declares a child processing component at a child path. Sugar APIs like mount_each() and mount_target() also create child components.
Each processing component declares a set of target states, and CocoIndex syncs them as a unit when that component finishes — all writes happen after processing completes, and each target backend applies its batch atomically when supported.

This is why app.update() does not “run everything from scratch”: CocoIndex uses the component path tree to decide what can be reused and what must re-run.
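Writes are deferred until a component finishes, so a component's declared target states can be applied as one batch. The sketch below models this with the standard-library sqlite3 module as a stand-in for a target backend that supports transactions. process_component only collects rows. apply_batch then writes them in a single transaction, so a failing batch leaves the table unchanged. The docs table and both helpers are hypothetical, and this is not how CocoIndex's target connectors work internally.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[str]]


def process_component(files: List[Tuple[str, str]]) -> List[Row]:
    """Hypothetical component body: declares rows, performs no writes."""
    declared: List[Row] = []
    for name, text in files:
        declared.append((name, text.upper()))  # stand-in transformation
    return declared


def apply_batch(conn: sqlite3.Connection, rows: List[Row]) -> None:
    # Apply all declared rows atomically once processing has completed.
    with conn:  # commits on success, rolls back if any statement fails
        conn.executemany(
            "INSERT OR REPLACE INTO docs(name, body) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs(name TEXT PRIMARY KEY, body TEXT NOT NULL)")

apply_batch(conn, process_component([("hello.md", "hi"), ("world.md", "earth")]))

# A batch containing an invalid row (NULL body) is rolled back as a whole.
try:
    apply_batch(conn, [("new.md", "ok"), ("bad.md", None)])
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT name FROM docs ORDER BY name").fetchall())
# [('hello.md',), ('world.md',)]
```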

For example, an app that processes files might mount one component per file:

text

(root)                         ← app_main component
├── "setup"                    ← declare_dir_target component
└── "process"
    ├── "hello.md"             ← process_file component
    └── "world.md"             ← process_file component

See Processing Component for how mounting and component paths define these boundaries.
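One rough way to picture how component paths enable reuse is to key each child component's memoized result by its path plus a fingerprint of its input. A component is then skipped when both match the previous run. The sketch below models the file example above under that assumption. run_app, the SHA-256 fingerprint, and the in-memory cache are hypothetical, and this is not CocoIndex's actual memoization logic.

```python
import hashlib
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # a component path, e.g. ("process", "hello.md")


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def run_app(files: Dict[str, str],
            cache: Dict[Path, Tuple[str, str]]) -> List[Path]:
    """Hypothetical root component; returns the paths that actually re-ran."""
    executed: List[Path] = []
    for name, content in sorted(files.items()):
        path: Path = ("process", name)       # child path under "process"
        fp = fingerprint(content)
        cached = cache.get(path)
        if cached is not None and cached[0] == fp:
            continue                         # same path, same input: reuse
        cache[path] = (fp, content.upper())  # stand-in for process_file work
        executed.append(path)
    # Forget components whose paths no longer exist.
    for stale in [p for p in cache if p[1] not in files]:
        del cache[stale]
    return executed


cache: Dict[Path, Tuple[str, str]] = {}
print(run_app({"hello.md": "hi", "world.md": "earth"}, cache))
# [('process', 'hello.md'), ('process', 'world.md')]
print(run_app({"hello.md": "hi", "world.md": "mars"}, cache))
# [('process', 'world.md')]
```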

Database path

CocoIndex needs a database path (db_path) to store its internal state. This database tracks target states and memoized results from previous runs, enabling CocoIndex to compute what changed and apply only the necessary updates.

The simplest way to configure the database path is via the COCOINDEX_DB environment variable:

bash

export COCOINDEX_DB=./cocoindex.db

With COCOINDEX_DB set, you can create and run apps without any additional configuration:

python

import cocoindex as coco

@coco.fn
def app_main() -> None:
    # ... your pipeline logic ...
    pass

app = coco.App("MyPipeline", app_main)
app.update_blocking()  # Uses COCOINDEX_DB for storage

For details on what the internal database stores and how to tune its LMDB settings (e.g., increasing the maximum database size beyond 4 GiB), see Internal Storage.

Lifespan (optional)

A lifespan function defines the CocoIndex runtime lifecycle: its setup runs when the runtime starts (automatically before the first app.update()), and its cleanup runs when the runtime stops. Use it to configure CocoIndex settings programmatically or to initialize shared resources that processing components can reuse.
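The lifecycle described here can be modeled with a generator: setup runs once on the first update, and cleanup runs when the runtime stops. Code before yield is the setup and code after yield is the cleanup. ToyRuntime below is a hypothetical, sync-only sketch of that model, not CocoIndex's runtime.

```python
from typing import Callable, Iterator, List, Optional

events: List[str] = []


def my_lifespan() -> Iterator[None]:
    events.append("setup")    # e.g. configure settings, open connections
    yield
    events.append("cleanup")  # e.g. close connections


class ToyRuntime:
    """Hypothetical runtime that drives a generator-based lifespan."""

    def __init__(self, lifespan: Callable[[], Iterator[None]]) -> None:
        self._lifespan = lifespan
        self._gen: Optional[Iterator[None]] = None

    def start(self) -> None:
        if self._gen is None:       # idempotent: setup runs only once
            self._gen = self._lifespan()
            next(self._gen)         # run code up to the yield

    def stop(self) -> None:
        if self._gen is not None:
            next(self._gen, None)   # resume after the yield: cleanup
            self._gen = None

    def update(self, app_name: str) -> None:
        self.start()                # setup happens automatically on first update
        events.append(f"update {app_name}")


rt = ToyRuntime(my_lifespan)
rt.update("A")
rt.update("B")
rt.stop()
print(events)  # ['setup', 'update A', 'update B', 'cleanup']
```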

Tip

If you only need to set the database path, using the COCOINDEX_DB environment variable is simpler than defining a lifespan function.

Defining a lifespan

Use the @lifespan decorator to register a lifespan function. By default, all apps share the same lifespan, unless you explicitly place an app in a different Environment. The function receives an EnvironmentBuilder for configuration and uses yield to separate setup from cleanup:

python

import pathlib
from typing import AsyncIterator
import cocoindex as coco

@coco.lifespan
async def coco_lifespan(builder: coco.EnvironmentBuilder) -> AsyncIterator[None]:
    # Configure CocoIndex's internal database location (overrides COCOINDEX_DB if set)
    builder.settings.db_path = pathlib.Path("./cocoindex.db")
    # Setup: initialize resources here
    yield
    # Cleanup: release resources here; code after yield runs when the runtime stops

Setting db_path in the lifespan takes precedence over the COCOINDEX_DB environment variable. If neither is provided, CocoIndex will raise an error.
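The precedence rule stated above can be written as a small resolver. resolve_db_path is a hypothetical helper for illustration. Treating an empty COCOINDEX_DB value as unset, and the error type and message, are assumptions, not CocoIndex's actual behavior.

```python
import os
import pathlib
from typing import Mapping, Optional


def resolve_db_path(configured: Optional[pathlib.Path],
                    environ: Mapping[str, str] = os.environ) -> pathlib.Path:
    """Hypothetical: lifespan setting wins, then COCOINDEX_DB, else error."""
    if configured is not None:
        return configured                   # set via builder.settings.db_path
    env_value = environ.get("COCOINDEX_DB")
    if env_value:                           # empty string treated as unset here
        return pathlib.Path(env_value)
    raise RuntimeError(
        "no database path: set COCOINDEX_DB or db_path in a lifespan")


print(resolve_db_path(pathlib.Path("./cocoindex.db"), {"COCOINDEX_DB": "other.db"}))
print(resolve_db_path(None, {"COCOINDEX_DB": "other.db"}))
```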

The lifespan function can be sync or async:

python

import pathlib
from typing import Iterator

import cocoindex as coco

@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
    builder.settings.db_path = pathlib.Path("./cocoindex.db")
    yield

You can also use the lifespan to provide resources (like database connections) that processing components can access. See Context for details on sharing resources across your pipeline.
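Accepting both sync and async lifespans mostly comes down to detecting which kind of generator function was registered and driving it accordingly. drive_lifespan below is a hypothetical helper built on inspect.isasyncgenfunction, with a plain dict standing in for the EnvironmentBuilder. It sketches the idea and is not CocoIndex's code.

```python
import asyncio
import inspect
from typing import Any, AsyncIterator, Callable, Dict, Iterator, List


async def drive_lifespan(fn: Callable[[Dict[str, Any]], Any],
                         settings: Dict[str, Any],
                         work: Callable[[], None]) -> None:
    """Hypothetical: run setup, then `work`, then cleanup, for either kind."""
    if inspect.isasyncgenfunction(fn):
        agen = fn(settings)
        await agen.__anext__()          # async setup: run up to the yield
        try:
            work()
        finally:
            try:
                await agen.__anext__()  # resume after the yield: cleanup
            except StopAsyncIteration:
                pass
    else:
        gen = fn(settings)
        next(gen)                       # sync setup: run up to the yield
        try:
            work()
        finally:
            next(gen, None)             # resume after the yield: cleanup


log: List[str] = []


def sync_lifespan(settings: Dict[str, Any]) -> Iterator[None]:
    settings["db_path"] = "./cocoindex.db"
    log.append("sync setup")
    yield
    log.append("sync cleanup")


async def async_lifespan(settings: Dict[str, Any]) -> AsyncIterator[None]:
    await asyncio.sleep(0)              # e.g. await opening a connection pool
    settings["db_path"] = "./cocoindex.db"
    log.append("async setup")
    yield
    log.append("async cleanup")


for lifespan in (sync_lifespan, async_lifespan):
    asyncio.run(drive_lifespan(lifespan, {}, lambda: log.append("work")))

print(log)
# ['sync setup', 'work', 'sync cleanup', 'async setup', 'work', 'async cleanup']
```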

Explicit lifecycle control (optional)

The lifespan runs automatically the first time any App updates — most users don’t need to do anything beyond defining the lifespan and calling app.update().

If you need more explicit control — for example, to know when startup completes for health checks, or to explicitly trigger shutdown — you can manage the lifecycle directly:

python

# Async API
await coco.start()   # Run lifespan setup
# ... run apps or other operations ...
await coco.stop()    # Run lifespan cleanup

python

# Sync (blocking) API
coco.start_blocking()   # Run lifespan setup
# ... run apps or other operations ...
coco.stop_blocking()    # Run lifespan cleanup

Or use the runtime() context manager, which supports both sync and async usage:

python

# Async
async with coco.runtime():
    await app.update()

python

# Sync (blocking)
with coco.runtime():
    app.update_blocking()
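For runtime() to work with both with and async with, one object must implement both context-manager protocols: the sync one (__enter__/__exit__) and the async one (__aenter__/__aexit__). ToyRuntimeContext below is a hypothetical sketch of that shape. Appending "start" and "stop" to a log stands in for lifespan setup and cleanup. It is not the real coco.runtime().

```python
import asyncio
from typing import Any, List


class ToyRuntimeContext:
    """Hypothetical context manager usable with both `with` and `async with`."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    # Sync protocol: setup on entry, cleanup on exit (even if the body raises).
    def __enter__(self) -> "ToyRuntimeContext":
        self.log.append("start")
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.log.append("stop")

    # Async protocol: the same lifecycle, with awaitable hooks.
    async def __aenter__(self) -> "ToyRuntimeContext":
        await asyncio.sleep(0)
        self.log.append("start")
        return self

    async def __aexit__(self, *exc_info: Any) -> None:
        await asyncio.sleep(0)
        self.log.append("stop")


sync_log: List[str] = []
with ToyRuntimeContext(sync_log):
    sync_log.append("update")


async def main() -> List[str]:
    async_log: List[str] = []
    async with ToyRuntimeContext(async_log):
        async_log.append("update")
    return async_log


print(sync_log, asyncio.run(main()))
```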

Managing apps with CLI

CocoIndex provides a CLI for managing your apps without writing additional code.

Update an app

Run your app once to sync all target states:

bash

cocoindex update main.py

This executes your pipeline and applies all declared target states to external systems. Add --live (or -L) to keep the app running and react to source changes continuously — see Live Mode.

Drop an app

Remove an app and revert all its target states:

bash

cocoindex drop main.py

This deletes all target states created by the app (e.g., dropping tables or deleting rows) and clears its internal state.

drop is an explicit, foreground operation — any failure during the recursive delete (root or any descendant) raises rather than being silently logged. The internal tracking record for a component whose delete failed is preserved so the next drop (with the underlying problem fixed) can complete the cleanup. See Error Handling for the general principle.
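The drop behavior described above can be sketched as follows: descendants are deleted recursively, any failure raises, and the tracking record of a component whose delete failed is kept so a later drop can finish the job. Everything here is hypothetical: the tracking dict, the delete callback, deleting deeper paths before their parents, and continuing past a failure to report all failures at the end. These are assumptions for illustration, not CocoIndex's implementation.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]


def drop(tracking: Dict[Path, str], delete: Callable[[Path, str], None]) -> None:
    """Hypothetical drop: revert every tracked component, deepest paths first."""
    errors: List[Tuple[Path, Exception]] = []
    for path in sorted(tracking, key=len, reverse=True):
        try:
            delete(path, tracking[path])
        except Exception as exc:  # keep the record so a later drop can retry
            errors.append((path, exc))
            continue
        del tracking[path]        # forget only successfully deleted components
    if errors:
        failed = ", ".join("/".join(p) or "(root)" for p, _ in errors)
        raise RuntimeError(f"drop failed for: {failed}")


tracking = {(): "root", ("process",): "dir", ("process", "hello.md"): "row"}
broken = {("process", "hello.md")}


def flaky_delete(path: Path, target: str) -> None:
    if path in broken:
        raise OSError("backend unavailable")


try:
    drop(tracking, flaky_delete)
except RuntimeError as err:
    print(err)           # drop failed for: process/hello.md
print(sorted(tracking))  # [('process', 'hello.md')]

broken.clear()           # fix the underlying problem, then drop again
drop(tracking, flaky_delete)
print(tracking)          # {}
```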

See CLI Reference for more commands and options.