The mental model

React — for data engineering.

A persistent-state-driven model. You declare the desired state of your target. The engine keeps it in sync with the latest source data and code, across long time horizons, with low latency and low cost.

Figure: React and CocoIndex follow the same pattern. React: state ({ count: 3 } + props, context…) → your code (function App({ state }) { return <UI {...}/> }) → diff · patch → DOM, 1 node patched. CocoIndex: source (a.py, b.md, c.pdf, Δ d.ts) → your code (@coco.fn def process(src) -> target: return transform(src)) → diff · sync → target rows, 1 row upserted · 3 cached. In both, a change triggers your code and only the delta flows through.

React re-renders only the nodes that changed. CocoIndex re-syncs only the rows that changed. Same reactive loop — one drives pixels, the other drives data.

Your code is as simple as the one-off version.

Write the transformation. Declare the sink. That’s the job. CocoIndex figures out what to rerun, what to cache, and what’s already fresh.

01

Transforms the input data — you write Python, not a DAG configuration.

In classical pipelines you describe how to move data: stages, operators, schedules, retries. In CocoIndex you describe what the target should be — a pure function of the source — and the engine derives the graph from your code.

The function you write looks like any other Python function. You set breakpoints in it, call it in tests, import it into a notebook. The decorator (@coco.fn) is opt-in metadata that lets the engine memoize, fingerprint, and re-run it incrementally — but the semantics stay the same: input in, output out.
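To make the "opt-in metadata" point concrete, here is a minimal sketch in plain Python. memo_fn is a hypothetical decorator written for illustration, not CocoIndex's @coco.fn, and it assumes the arguments are picklable. It caches results on a fingerprint of the function's own bytecode plus a fingerprint of the inputs, while the decorated function still returns exactly what the undecorated one would.

```python
import functools
import hashlib
import pickle
from typing import Any, Callable, Dict


def memo_fn(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical opt-in decorator (illustration only, not CocoIndex's @coco.fn).

    Caches results keyed on a fingerprint of the function's own code plus a
    fingerprint of its arguments. Calling the wrapped function returns exactly
    what the undecorated function would.
    """
    code = func.__code__
    # Editing the function body changes its bytecode or constants, and so the key.
    code_fp = hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()
    cache: Dict[str, Any] = {}

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Assumes arguments are picklable; kwargs are sorted for a stable key.
        input_fp = hashlib.sha256(pickle.dumps((args, sorted(kwargs.items())))).hexdigest()
        key = f"{code_fp}:{input_fp}"
        if key not in cache:
            wrapper.misses += 1  # count real executions, for illustration
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.misses = 0
    return wrapper


@memo_fn
def process(src: str) -> str:
    # An ordinary transformation: input in, output out.
    return src.strip().upper()


# Still a plain function: callable from tests or a notebook.
print(process("  hello "))  # HELLO
print(process("  hello "))  # HELLO (served from the cache)
print(process.misses)       # 1
```

One limitation of this sketch: it fingerprints only the function's own bytecode, so editing a helper that process calls would not invalidate its cache.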

02

Declares desired state for the target — we compute the minimum work to reach it.

A target state is a declaration: “this row should exist in this table with these values,” “this vector should live under this id,” “this Kafka topic should carry this message.” You describe the end state once. You never write insert / update / delete branches.

When inputs change, the engine diffs declared target state against what’s already in the store and applies the smallest set of mutations that reconciles the two. New rows get upserted, stale rows get removed, unchanged rows are skipped. The same pattern React uses to patch the DOM, applied to your data store.
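The diff-and-patch step can be sketched in a few lines of plain Python. This illustrates the idea under simple assumptions (target state is a dict keyed by row id, and rows compare by value); the names reconcile and sync are hypothetical and are not part of CocoIndex.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def reconcile(desired: Dict[str, Row], current: Dict[str, Row]) -> Tuple[List[str], List[str], List[str]]:
    """Diff declared target state against what is already in the store.

    Returns (upserts, deletes, unchanged) as sorted lists of row keys.
    """
    upserts = sorted(k for k, row in desired.items() if current.get(k) != row)
    deletes = sorted(k for k in current if k not in desired)
    unchanged = sorted(k for k, row in desired.items() if current.get(k) == row)
    return upserts, deletes, unchanged


def sync(store: Dict[str, Row], desired: Dict[str, Row]) -> Tuple[List[str], List[str], List[str]]:
    """Apply only the mutations needed to make `store` equal `desired`."""
    upserts, deletes, unchanged = reconcile(desired, store)
    for key in upserts:
        store[key] = desired[key]  # new or changed rows
    for key in deletes:
        del store[key]  # rows no longer declared
    return upserts, deletes, unchanged


store = {"a.py": {"chunks": 3}, "b.md": {"chunks": 1}, "old.txt": {"chunks": 2}}
desired = {"a.py": {"chunks": 3}, "b.md": {"chunks": 2}, "d.ts": {"chunks": 1}}
print(sync(store, desired))
# (['b.md', 'd.ts'], ['old.txt'], ['a.py'])
```

The caller never writes an insert, update, or delete branch: it only hands over the desired dict, and running sync a second time with the same declaration is a no-op.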

03

Tracks lineage end-to-end — every byte in the target can be traced to a source.

Every declared target state is tagged with the source item(s) and function version(s) that produced it. When you ask “where did this chunk come from,” you get the file, the byte range, the code commit, and the run timestamp — without adding audit columns yourself.

Lineage is the same mechanism that powers incrementality: because the engine knows which source fingerprints produced which outputs, it knows exactly what to invalidate when any of those fingerprints change.
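A minimal sketch of how one lineage record can serve both purposes, answering "where did this chunk come from" and deciding what to invalidate. It assumes each output records its source id, byte range, source fingerprint, and function version; Lineage, stale_outputs, and the version strings are hypothetical names for illustration only, not CocoIndex's data model.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


def fingerprint(content: bytes) -> str:
    # Content-addressed: identical bytes always yield the same fingerprint.
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class Lineage:
    source_id: str               # which source item produced the output
    byte_range: Tuple[int, int]  # where in that source the output came from
    source_fp: str               # fingerprint of the source when processed
    fn_version: str              # version of the function that produced it


def stale_outputs(lineage: Dict[str, Lineage], sources: Dict[str, bytes], fn_version: str) -> List[str]:
    """Outputs whose source changed or vanished, or whose producing code changed."""
    stale = []
    for out_id, rec in lineage.items():
        content = sources.get(rec.source_id)
        if (
            content is None
            or fingerprint(content) != rec.source_fp
            or rec.fn_version != fn_version
        ):
            stale.append(out_id)
    return sorted(stale)


sources = {"a.py": b"def f(): pass\n", "b.md": b"# Notes\n"}
lineage = {
    "a.py#0": Lineage("a.py", (0, 14), fingerprint(sources["a.py"]), "v1"),
    "b.md#0": Lineage("b.md", (0, 8), fingerprint(sources["b.md"]), "v1"),
}

# "Where did this chunk come from?" is a lookup, not an audit query.
print(lineage["b.md#0"].source_id, lineage["b.md#0"].byte_range)  # b.md (0, 8)

sources["b.md"] = b"# Notes, edited\n"
print(stale_outputs(lineage, sources, "v1"))  # ['b.md#0']
print(stale_outputs(lineage, sources, "v2"))  # ['a.py#0', 'b.md#0']
```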

04

Runs incrementally at any scale — only the delta, never the full recompute.

The same code runs on a laptop against a toy repo, and on a shared daemon against a petabyte corpus. You do not choose between “batch” and “streaming” — the engine runs continuously and only touches what changed.

Memoization at the function level, component-path identity across runs, and content-addressed fingerprints mean unchanged work is always skipped — whether a developer edits one file on their laptop or a CI pipeline swaps out a helper function across a million files. Your bill scales with delta, not with corpus size.
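To illustrate why cost tracks the delta rather than corpus size, here is a hypothetical run loop in plain Python; it is a sketch, not CocoIndex's implementation. It assumes each source item is keyed by path and that code_version is a string that changes whenever the transformation changes. The fingerprint combines content and code version, so an unchanged file under unchanged code is skipped, and a path that disappears has its target row removed.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(content: bytes, code_version: str) -> str:
    # Content-addressed key: same bytes + same code version -> same key.
    return hashlib.sha256(code_version.encode() + b"\0" + content).hexdigest()


def incremental_run(
    sources: Dict[str, bytes],
    target: Dict[str, str],
    seen: Dict[str, str],
    transform: Callable[[bytes], str],
    code_version: str,
) -> Tuple[int, int, int]:
    """Bring `target` up to date; return (processed, cached, removed)."""
    processed = cached = 0
    for path, content in sources.items():
        fp = fingerprint(content, code_version)
        if seen.get(path) == fp:
            cached += 1  # unchanged input and code: skip the work
            continue
        target[path] = transform(content)
        seen[path] = fp
        processed += 1
    # Retired source items clear their target rows.
    removed_paths = [p for p in seen if p not in sources]
    for p in removed_paths:
        del seen[p]
        target.pop(p, None)
    return processed, cached, len(removed_paths)


def transform(content: bytes) -> str:
    return content.decode().upper()


sources = {"a.py": b"x = 1", "b.md": b"# title", "c.pdf": b"%PDF", "d.ts": b"let y = 2"}
target: Dict[str, str] = {}
seen: Dict[str, str] = {}
print(incremental_run(sources, target, seen, transform, "v1"))  # (4, 0, 0)
sources["d.ts"] = b"let y = 3"
print(incremental_run(sources, target, seen, transform, "v1"))  # (1, 3, 0)
print(incremental_run(sources, target, seen, transform, "v2"))  # (4, 0, 0)
```

Editing one of four files reprocesses one item and reuses three; bumping the code version invalidates every fingerprint, so all four rerun. This sketch works at file granularity only; memoizing at the function level, as described above, is what would let an engine skip more work when just part of the code changes.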

The full analogy, row by row.

If React taught the frontend to stop thinking about DOM mutations, CocoIndex is teaching backend pipelines to stop thinking about ETL steps. The mechanics map almost 1:1.

| Concept | React | CocoIndex |
| --- | --- | --- |
| Virtual state | React state + props + context | Source items + input fingerprints |
| Your function | function App(state) → ReactElement | @coco.fn def process(src) → TargetState |
| Diff engine | Reconciler diffs element trees | Engine diffs declared target state vs. last run |
| Commit | Patch only the DOM nodes that changed | Upsert / delete only the rows that changed |
| Persistence | DOM is the persisted surface | Your target store is the persisted surface |
| Re-render trigger | State setter / prop change | Source fingerprint / code fingerprint change |
| Cache key | useMemo / React.memo on props | @coco.fn(memo=True) on input + code hash |
| Cleanup | Unmounted components leave the DOM | Retired source items clear their target rows |

Declare the target. Skip the plumbing.

The easiest way to feel the model is to write a tiny flow against a local folder. Five minutes end-to-end.