Incremental updates, change detection, and always-fresh derived data.
Index a codebase for RAG and AI coding agents with CocoIndex V1 and Tree-sitter: language-aware chunking, embedding, and a live vector index in async Python.
Walk through a live CocoIndex pipeline that watches a folder of CSV files and incrementally publishes each row as JSON to a Kafka topic, with no glue code.
CocoIndex V1 is live: a ground-up redesign of incremental data pipelines, built for AI engineers and agent builders shipping RAG, memory, and knowledge graphs.
Build a pipeline that turns YouTube podcasts into a knowledge graph: extract speakers, statements, and entities with an LLM, then dedupe them with embeddings.
Featuring five new target connectors, filesystem-level change detection, Python 3.14 free-threading, and smarter pipeline lifecycle management.
Build a CocoIndex pipeline that generates a wiki page for each project in your codebase using an LLM, and keeps it fresh with incremental processing.
Build a self-updating knowledge graph from meeting notes: extract decisions, tasks, owners, and relationships from your documents with CocoIndex and an LLM.
Build a real-time HackerNews trending topics detector with CocoIndex: a deep dive into Custom Sources and AI-powered topic extraction.
Production-ready upgrades: durable execution, faster incremental processing over large datasets, GPU isolation, and richer native building blocks.
Extract invoice fields from PDFs in Azure Blob Storage and load them into Snowflake with an incremental CocoIndex + GPT-4o pipeline: open-source unstructured ETL.
Build unified, incrementally updated semantic + structured search over PostgreSQL data with CocoIndex: read a table, transform with AI and non-AI ops, and write pgvector embeddings back to Postgres.
CocoIndex now supports custom targets. Export indexed data to any destination: a local file, cloud storage, a REST API, or your own bespoke system.
CocoIndex updates: in-process setup/drop API, the EmbedText building block, major SplitRecursively codebase-indexing improvements, union and NumPy type support, more LLM APIs, and the Kuzu graph target.
CocoIndex updates: Amazon S3 as a data source, improved query handling, a standalone runtime mode, and more connector and performance improvements.
Build a real-time data transformation pipeline with Amazon S3 and SQS using CocoIndex: incremental indexing on object storage with live updates that reprocess only changed files.
The story of CocoIndex at 1,000 GitHub stars: the open-source engine that combines custom transformation logic with incremental processing for data indexing.
CocoIndex updates: incremental processing with live update mode, evaluation utilities, date/time types, a Google Drive source, and core performance improvements.
CocoIndex continuously watches source changes and applies incremental updates to keep derived data in sync, with low latency and no full reindexing.
CocoIndex keeps the index up to date as sources change, efficiently and with low latency, using incremental processing.
CocoIndex is now open source: the first engine to combine custom transformation logic with incremental processing built specifically for data indexing.
What makes indexing pipelines different from other data systems, and why they need special handling for incremental processing and persistence.