CocoIndex
Open source ETL framework designed for AI workloads.
CocoIndex is an ultra-performant data transformation framework with a core engine written in Rust. It makes it easy to transform data for AI workloads and keeps source data and targets in sync effortlessly.
Whether you are creating embeddings, building knowledge graphs, or doing any other data transformation, CocoIndex goes beyond traditional SQL.
Exceptional Velocity
Just declare transformations as a dataflow; minimal code is needed. Get started with ~100 lines of Python.
Assemble building blocks
Native building blocks for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change, as easy as assembling building blocks.
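To illustrate the idea of a one-line switch, here is a minimal sketch in plain Python. It does not use CocoIndex's actual API: the `Target` protocol, `InMemoryTarget`, `ListTarget`, and `run_flow` are hypothetical names invented for this example, assuming only that every component of a given kind exposes the same interface.

```python
from __future__ import annotations

from typing import Callable, Iterable, Protocol


class Target(Protocol):
    """Hypothetical common interface that every target implements."""

    def write(self, key: str, value: str) -> None: ...


class InMemoryTarget:
    """Stores the latest value per key in a dict."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


class ListTarget:
    """Appends every write to a list, in order."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.rows.append((key, value))


def run_flow(
    source: Iterable[tuple[str, str]],
    transform: Callable[[str], str],
    target: Target,
) -> None:
    """Apply the transform to each source row and write it to the target."""
    for key, text in source:
        target.write(key, transform(text))


docs = [("a.md", "hello"), ("b.md", "world")]
target = InMemoryTarget()  # swap in ListTarget() here; nothing else changes
run_flow(docs, str.upper, target)
```

Because `run_flow` depends only on the shared `write` method, changing the target touches a single line.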
Incremental Processing
Out-of-the-box support for incremental indexing:
- Minimal recomputation on source or logic change.
- (Re-)processes only the necessary portions and reuses the cache when possible.
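The general technique behind incremental processing can be sketched as follows. This is an illustrative assumption, not CocoIndex's implementation: the `IncrementalIndexer` class and its `logic_version` parameter are hypothetical. Each row is fingerprinted from its content plus a version string for the transformation logic, and the transform runs only when that fingerprint changes.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class IncrementalIndexer:
    """Hypothetical sketch: recompute a row only if its source content
    or the transformation logic has changed since the last run."""

    def __init__(self, transform: Callable[[str], str], logic_version: str) -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.cache: dict[str, tuple[str, str]] = {}  # key -> (fingerprint, output)
        self.calls = 0  # number of times the transform actually ran

    def _fingerprint(self, content: str) -> str:
        h = hashlib.sha256()
        h.update(self.logic_version.encode())
        h.update(b"\0")
        h.update(content.encode())
        return h.hexdigest()

    def update(self, source: dict[str, str]) -> dict[str, str]:
        # Drop outputs whose source rows no longer exist.
        for key in list(self.cache):
            if key not in source:
                del self.cache[key]
        for key, content in source.items():
            fingerprint = self._fingerprint(content)
            cached = self.cache.get(key)
            if cached is not None and cached[0] == fingerprint:
                continue  # unchanged: reuse the cached output
            self.calls += 1
            self.cache[key] = (fingerprint, self.transform(content))
        return {key: output for key, (_, output) in self.cache.items()}


indexer = IncrementalIndexer(str.upper, logic_version="v1")
indexer.update({"a": "hello", "b": "world"})              # both rows are new
result = indexer.update({"a": "hello", "b": "world!"})    # only "b" changed
```

In this sketch, bumping `logic_version` invalidates every fingerprint, so a logic change reprocesses all rows, while a source change reprocesses only the rows that differ.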
Flow is the Single Source of Truth
Define once, run in multiple modes
- batch
- long-running job (live update that watches your source)
- sample-based fast preview run (CocoInsight)
Automatic schema setup based on logic and data
- data processing and schema management
CocoInsight
Your Data Lineage and Observability Companion
You don't need to be a data expert. CocoInsight provides you with best-in-class tools to understand your pipeline step by step, explains the process, and helps you choose the best indexing strategy.
One of the most loved features among users building ETL with Coco, it significantly boosts developer velocity and lowers the barrier to entry for data engineering.

Loved by builders

At Unity, we use CocoIndex to incrementally index critical unstructured assets. With it, we've managed to dramatically reduce unnecessary data processing computation and remote LLM embedding calls.
CocoIndex is our “Kubernetes moment” – it empowers us to index and operate data with exceptional efficiency, keeping everything always current and context-ready for AI.
Wenlan Yang, Staff Engineer @ Unity