CocoIndex V1 is now live. It is a fundamental redesign of how you write incremental data pipelines, built from a year of watching what people actually wanted to do with CocoIndex. CocoIndex V1 is built for AI engineers and agent builders: people building the coding intelligence, context, RAG, memory, and knowledge graphs that live agents depend on.
CocoIndex joined the GitHub Secure Open Source Fund — strengthening security for the AI data infrastructure developers depend on.
A multi-source pipeline that ingests SEC filings (TXT, JSON, PDF), scrubs PII, extracts topics, and powers hybrid search with CocoIndex + Apache Doris.
Automatically generates a wiki page for each project in your codebase, and keeps it fresh with incremental processing.
Turn slide decks into a continuously updated multimodal dataset — extract speaker notes, synthesize narration, keep LanceDB in sync.
Extract Pydantic-typed structured data from patient intake forms using DSPy and CocoIndex — OCR vision models with incremental processing.
Most companies sit on an ocean of meeting notes. Inside those documents are decisions, tasks, owners, and relationships: an untapped knowledge graph that is constantly changing.
Build a real-time HackerNews trending topics detector with CocoIndex — a deep dive into Custom Sources and AI-powered topic extraction.
Build a custom incremental HackerNews connector with CocoIndex's Custom Source API and export to Postgres for semantic search and analytics.
How to use BAML and CocoIndex to extract structured data from patient intake forms in PDF/Word with LLMs, continuously and in production.
Why the next wave of AI needs open source, scalable, and AI-native data infrastructure, and how CocoIndex is building the foundation for the future of intelligent data pipelines.
Extract, embed, and store multimodal PDF elements — text with SentenceTransformers, images with CLIP — for unified semantic search with traceable metadata.
CocoIndex now supports custom sources — read data from any system and keep it incrementally fresh as knowledge for AI agents.
Define query handlers in CocoIndex and trace search results back to source data in CocoInsight — close the loop on indexing strategy.
Build unified, incrementally updated search and analytics over structured + unstructured data in PostgreSQL with CocoIndex.
Build a unified visual document index from multiple file formats, including PDFs, images, and slides, using CocoIndex and ColPali. No OCR needed.
Featuring production readiness, scalability, and reliability. More flexibility with customization and native integrations. Extended features for multimodal pipelines and more.
CocoIndex now supports native integration with ColPali — enabling multi-vector, patch-level image indexing.
CocoIndex natively handles typed multi-dimensional vectors, from simple arrays to multi-vector embeddings, unlocking multimodal AI pipelines at scale.
CocoIndex now officially supports custom targets — giving you the power to export data to any destination, whether it's a local file, cloud storage, a REST API, or your own bespoke system.
Build a scalable face detection and recognition pipeline with CocoIndex — embed faces, structure for search, and export to a vector DB.
How to index academic research papers by extracting metadata (e.g., title, authors, abstract) for AI agents and AI workflows using LLMs and CocoIndex.
CocoInsight is a platform for data lineage and data observability.
CocoIndex now sets up Qdrant collections automatically by inferring the target schema from your indexing flow — no manual config.
Build a real-time knowledge graph with Kuzu as a native CocoIndex target — incremental updates, high-performance graph queries.
Build a real-time data transformation pipeline with Amazon S3 and SQS using CocoIndex — incremental indexing on object storage.
Index images in real time with CocoIndex and a vision model: generate multimodal embeddings and build a vector index for efficient retrieval.
Index text with CocoIndex and text embeddings, and query it with natural language.
CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental processing specialized for data indexing. We just crossed 1k stars. Thank you so much!
Build a real-time product recommendation engine with an LLM and a graph database, based on understanding product categories (taxonomy).
CocoIndex now supports knowledge graphs with incremental processing. Building live knowledge for agents is super easy with CocoIndex!