Extract, Transform, Index Data. Easy and Fresh.
CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing.
Heavy Transformation
Parse, Chunk, Embedding...
Incremental indexing when data or logic changes
Custom transformation logic
Coco Does the Rest
CocoIndex takes care of the following work for you, which is normally error-prone and tricky to handle in conventional indexing pipelines:
Set up the table schema for indexing and maintain it when the logic changes
(Re-)process only the necessary portions; reuse the cache when possible
Keep data fresh; clear stale derived data/versions
Re-index data based on tracked data/logic changes or data TTL settings
Resume from terminated execution without recomputing everything
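As a rough illustration of the bookkeeping listed above, here is a minimal Python sketch of incremental reprocessing. It is not CocoIndex's API or implementation: the names `IncrementalIndexer`, `fingerprint`, `transform`, and `logic_version` are hypothetical. The sketch assumes each source row is fingerprinted by its content plus a logic version, so a row is recomputed only when one of the two changes, and derived entries for deleted rows are cleared.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(content: str, logic_version: str) -> str:
    """Hash of the source content plus the version of the transformation logic."""
    return hashlib.sha256(f"{logic_version}\n{content}".encode("utf-8")).hexdigest()


class IncrementalIndexer:
    """Hypothetical in-memory index that recomputes only what changed."""

    def __init__(self, transform: Callable[[str], str], logic_version: str):
        self.transform = transform
        self.logic_version = logic_version
        # key -> (fingerprint, derived value)
        self.index: Dict[str, Tuple[str, str]] = {}

    def update(self, source: Dict[str, str]) -> Dict[str, int]:
        stats = {"processed": 0, "reused": 0, "removed": 0}
        for key, content in source.items():
            fp = fingerprint(content, self.logic_version)
            cached = self.index.get(key)
            if cached is not None and cached[0] == fp:
                # Same data and same logic: reuse the cached derived value.
                stats["reused"] += 1
                continue
            self.index[key] = (fp, self.transform(content))
            stats["processed"] += 1
        # Clear derived data whose source row no longer exists.
        for key in list(self.index):
            if key not in source:
                del self.index[key]
                stats["removed"] += 1
        return stats


indexer = IncrementalIndexer(transform=str.upper, logic_version="v1")
first = indexer.update({"a.md": "hello", "b.md": "world"})   # both rows are new
second = indexer.update({"a.md": "hello", "b.md": "world!"})  # only b.md changed
third = indexer.update({"a.md": "hello"})                     # b.md was deleted
```

Because the fingerprint includes the logic version, bumping `logic_version` in this sketch would cause every row to be reprocessed on the next `update` call, which mirrors re-indexing on a logic change.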
Defined once, run in different scenarios
CocoIndex handles the scalability you need and makes your pipeline robust. Once you are ready to deploy to production, CocoIndex saves you time and cost.
Sample Preview
Fast, sample-based preview run for understanding and debugging during development
Batch Processing
Batch run over the entire data source, at large scale
Continuous Updates
Continuously apply incremental source changes to keep the index up to date, with low latency
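To make the "defined once, run in different scenarios" idea concrete, the following Python sketch runs one transformation in three ways: on a small sample, over the whole source, and on individual changes. It is an illustration under stated assumptions, not CocoIndex's API: the functions `transform`, `preview`, `batch`, and `apply_change` are hypothetical, and the index is a plain dictionary.

```python
from itertools import islice
from typing import Dict, Iterable, List, Optional, Tuple


def transform(text: str) -> List[str]:
    """The single pipeline definition: split a document into lowercase words."""
    return text.lower().split()


def preview(source: Iterable[Tuple[str, str]], sample_size: int = 2) -> Dict[str, List[str]]:
    """Sample preview: run the pipeline on the first few rows only."""
    return {key: transform(text) for key, text in islice(source, sample_size)}


def batch(source: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Batch processing: run the pipeline over the entire source."""
    return {key: transform(text) for key, text in source}


def apply_change(index: Dict[str, List[str]], key: str, text: Optional[str]) -> None:
    """Continuous update: apply one source change (None means the row was deleted)."""
    if text is None:
        index.pop(key, None)
    else:
        index[key] = transform(text)


docs = [("a", "Hello World"), ("b", "Fresh Index"), ("c", "Coco Does The Rest")]
sample = preview(docs)                    # only the first two rows are processed
index = batch(docs)                       # every row is processed
apply_change(index, "b", "Fresher Index")  # an updated row replaces its old entry
apply_change(index, "c", None)             # a deleted row is removed from the index
```

The point of the sketch is that `transform` is written once; only the driver around it differs between the three modes.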
CocoIndex Components
CocoIndex can help you connect to all your data sources, identify the best indexing strategy, and set up the most robust pipeline: chunking, embedding model, deduping/reconciling, vector stores, knowledge graph, etc. It then provides a standard API to access the index.
Your Input
Web Pages, Documents, Databases
Applications
Search, RAG, Analytics
Indexing Stack
Ingestion Connectors
Web, Cloud Storage, Ingestion API
Parse, Convert
PDF, Markdown, HTML, XML, Slides, Google Doc, Docs, JSON
Extract, Split
Flat chunks, Hierarchical chunks, Knowledge Graph triple extraction
Align, Reconcile
Dedupe, Cross-doc reference, Entity alignment
Query Stack
Query API
Query Understanding
Planning
Rerank
Related Discover
Related Retrieval
Retrieve
Index
Graph Store, Relational Store, Object Store, Vector Store
CocoInsight
You don't need to be a data expert. CocoInsight gives you best-in-class tools to understand your pipeline step by step, and it explains and helps you choose the best indexing strategy.
Chunking
Observe, understand, and compare different chunking configurations to quickly iterate on your strategy.
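To show what comparing chunking configurations can look like in practice, here is a small Python sketch that splits the same text with different settings and reports basic statistics. It is not how CocoInsight works internally: `chunk_text`, `compare_configs`, and the `chunk_size` / `overlap` parameters are hypothetical names, and the sketch assumes simple fixed-size character chunks with overlap.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is never fully contained in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def compare_configs(text: str, configs: List[Dict[str, int]]) -> List[Dict[str, float]]:
    """Return one summary row per configuration: chunk count and average chunk length."""
    rows = []
    for cfg in configs:
        chunks = chunk_text(text, cfg["chunk_size"], cfg["overlap"])
        avg_len = sum(len(c) for c in chunks) / len(chunks) if chunks else 0.0
        rows.append({**cfg, "num_chunks": len(chunks), "avg_len": avg_len})
    return rows


text = "abcdefghij"
report = compare_configs(
    text,
    [
        {"chunk_size": 4, "overlap": 0},
        {"chunk_size": 4, "overlap": 2},
    ],
)
for row in report:
    print(row)
```

Printing the rows side by side makes the trade-off visible: more overlap yields more chunks for the same text, which typically raises embedding and storage cost in exchange for more context per chunk.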
Pricing
CocoIndex Open Source
- Self-hosted
- Free
- Apache 2.0 license
CocoInsight
Free during beta. Join our Discord group to get started!
Enterprise
- VPC / On Prem Deployments
- Guaranteed customer support and SLA
- Enterprise source connectors
- Data governance - PII
- Cost and usage optimization
- CocoInsight
- Support and control plane for distributed computing