The CocoIndex Blog.

Tutorials, deep dives, and notes from the CocoIndex team. Incremental data infrastructure, Rust internals, knowledge graphs, and stories from production.


Featured Apr 22, 2026

CocoIndex V1 is Live!

CocoIndex V1 is live: a ground-up redesign of incremental data pipelines, built for AI engineers and agent builders shipping RAG, memory, and knowledge graphs.

Announcement Feature Incremental Processing Architecture AI Agents

Jun 10, 2026

Index Your Codebase for AI Agents with CocoIndex V1

Index a codebase for RAG and AI coding agents with CocoIndex V1 and Tree-sitter: language-aware chunking, embedding, and a live vector index in async Python.

Examples RAG Embeddings AI Agents Incremental Processing
Jun 1, 2026

CocoIndex Changelog 1.0.1 - 1.0.7

CocoIndex's first post-v1 releases: stable memoization keys, scheduled live refresh, scoped stats, safer SQL connectors, and more integrations.

Changelog Announcement Connectors Performance
Apr 28, 2026

Live CSV → Kafka with CocoIndex's New Kafka Target Connector

Walk through a live CocoIndex pipeline that watches a folder of CSV files and publishes each row as JSON to a Kafka topic incrementally, with no glue code.

Feature Examples Connectors Incremental Processing AI Agents
Apr 2, 2026

Turn Podcasts into a Knowledge Graph with LLM and CocoIndex

Build a pipeline that turns YouTube podcasts into a knowledge graph: extract speakers, statements, and entities with an LLM, then dedupe them with embeddings.

Examples Knowledge Graph LLM Structured Extraction Incremental Processing
Mar 27, 2026

From pickle to type-guided, safer Python serialization

How CocoIndex moved from pickle to type-guided serialization that uses Python type hints to pick the right serializer, no decorators or registration needed.

Insight Architecture Best Practices
Mar 24, 2026

Invisible Daemon: architecture patterns for local dev tools

Five patterns for a Python CLI background daemon that auto-starts, upgrades transparently, and shuts down fast, from the daemon behind cocoindex-code.

Best Practices Architecture AI Agents
Mar 10, 2026

CocoIndex Changelog 0.3.27 - 0.3.34

Featuring five new target connectors, filesystem-level change detection, Python 3.14 free-threading, and smarter pipeline lifecycle management.

Changelog Connectors Performance Incremental Processing
Feb 17, 2026

CocoIndex joins the GitHub Secure Open Source Fund

CocoIndex joined the GitHub Secure Open Source Fund, hardening the AI data infrastructure developers depend on with threat modeling, CodeQL, and audits.

Announcement Security Best Practices Community
Feb 9, 2026

SEC EDGAR financial analytics with Apache Doris

A multi-source pipeline that ingests SEC filings (TXT, JSON, PDF), scrubs PII, extracts topics, and powers hybrid search with CocoIndex + Apache Doris.

Examples Structured Extraction Vector Search Embeddings Connectors
Feb 5, 2026

Build a Self-Updating Wiki for Your Codebases with an LLM

Auto-generate documentation for every project in your codebase: a CocoIndex pipeline writes a wiki page per repo with an LLM, kept fresh as code changes.

Examples LLM Structured Extraction Incremental Processing Tutorial
Jan 22, 2026

Slides-to-speech: narrate presentations automatically

Turn slide decks into a live multimodal dataset with CocoIndex: extract notes with Gemini Vision, narrate with Piper TTS, and keep LanceDB in sync.

Examples Multimodal Embeddings Structured Extraction Vector Search
Jan 18, 2026

CocoIndex Changelog 0.3.11 - 0.3.26

CocoIndex updates: production-ready resilience, a structured error system, expanded integrations, and always-fresh context for agents.

Changelog Connectors Structured Extraction Knowledge Graph
Dec 15, 2025

Extract patient intake forms with DSPy and CocoIndex

Extract Pydantic-typed structured data from patient intake forms using DSPy and CocoIndex: OCR vision models with incremental processing.

Examples Tutorial Structured Extraction Multimodal LLM
Dec 8, 2025

A knowledge graph from meeting notes that auto-updates

Build a self-updating knowledge graph from meeting notes: extract decisions, tasks, owners, and relationships from your documents with CocoIndex and an LLM.

Examples Connectors Knowledge Graph LLM Incremental Processing
Dec 2, 2025

Real-time HackerNews trending topics detector with CocoIndex

Build a real-time HackerNews trending topics detector with CocoIndex: a deep dive into Custom Sources and AI-powered topic extraction.

Examples Custom Source LLM Structured Extraction Incremental Processing
Nov 25, 2025

CocoIndex Changelog 0.2.21 - 0.3.10

Featuring batching support for CocoIndex functions, execution robustness, schema & type system improvements, custom source support, and more.

Changelog Feature Performance Custom Source Connectors
Nov 25, 2025

Extract HackerNews into Postgres with a custom source

Build a custom incremental HackerNews connector with CocoIndex's Custom Source API and export to Postgres for semantic search and analytics.

Examples Custom Source Feature Postgres Tutorial
Nov 21, 2025

Extracting Intake Forms with BAML and CocoIndex

How to use BAML and CocoIndex to continuously extract structured data from PDF and Word patient intake forms with LLMs in production.

Examples Tutorial Structured Extraction LLM
Nov 10, 2025

Adaptive Batching - 5x throughput on your data pipelines

CocoIndex now batches GPU and ML workloads automatically: 5x throughput on text embeddings and AI ops, with zero configuration required.

Feature Performance Embeddings Best Practices
Oct 29, 2025

AI-Native Data Pipeline - Why We Made It

Why the next wave of AI needs open-source, scalable, AI-native data infrastructure, and how CocoIndex is building the foundation for intelligent data pipelines.

Insight Architecture AI Agents Data Indexing
Oct 27, 2025

Index PDF elements with mixed embedding models

Extract, embed, and store multimodal PDF elements (text with SentenceTransformers, images with CLIP) for unified semantic search with traceable metadata.

Examples Feature Multimodal Embeddings Vector Search
Oct 21, 2025

Bring your own data: Index any data with Custom Sources

CocoIndex now supports custom sources: read data from any system and keep it incrementally fresh as knowledge for AI agents.

Feature Custom Source Connectors Data Indexing Architecture
Oct 19, 2025

CocoIndex Changelog 2025-10-19

Production-ready upgrades: durable execution, faster incremental processing over large datasets, GPU isolation, and richer native building blocks.

Changelog Incremental Processing Postgres Structured Extraction Connectors
Oct 11, 2025

Automated invoice processing with AI and Snowflake

Extract invoice fields from PDFs in Azure Blob Storage into Snowflake with an incremental CocoIndex + GPT-4o pipeline: open-source unstructured ETL.

Examples Tutorial Structured Extraction Connectors Incremental Processing
Oct 10, 2025

Thinking in Rust: Ownership, Access, and Memory Safety

A mental framework for Rust's memory safety concepts. Think systematically about ownership, references, Send, Sync, and Rc, Arc, RefCell, Mutex, etc.

Insight Architecture Best Practices
Sep 21, 2025

Iterate faster: trace queries back to source data

Define query handlers in CocoIndex and trace search results back to source data in CocoInsight to close the loop on indexing strategy.

Examples Feature RAG Vector Search
Sep 1, 2025

Incrementally transform Postgres data with AI

Build incrementally updated semantic and structured search over PostgreSQL with CocoIndex, writing pgvector embeddings back to Postgres.

Examples Postgres Incremental Processing Embeddings Vector Search
Aug 20, 2025

Index PDFs, images, and slides with ColPali, no OCR

Build a unified visual document index from multiple file formats (including PDFs, images, and slides) using CocoIndex and ColPali. No OCR needed.

Examples Multimodal Embeddings Vector Search RAG
Aug 18, 2025

CocoIndex Changelog 2025-08-18

CocoIndex updates: production readiness, scalability, and reliability, plus more customization, native integrations, and multi-modal pipeline features.

Changelog Performance Multimodal Connectors Vector Search
Aug 13, 2025

Control Processing Concurrency in CocoIndex

How CocoIndex's layered concurrency controls optimize data-processing performance, prevent system overload, and keep pipelines stable and efficient at scale.

Feature Performance Best Practices Architecture
Aug 12, 2025

Index Images with ColPali: Multi-Modal Context Engineering

CocoIndex now natively integrates ColPali for multi-vector, patch-level image indexing: multi-modal context engineering for visually rich documents and PDFs.

Examples Feature Multimodal Embeddings Vector Search
Aug 10, 2025

Multi-Dimensional Vector Support in CocoIndex

CocoIndex natively handles typed multi-dimensional vectors, from simple arrays to multi-vector embeddings, unlocking multimodal AI pipelines at scale.

Feature Embeddings Vector Search Multimodal
Aug 3, 2025

Custom Targets: export your data anywhere

CocoIndex now supports custom targets. Export indexed data to any destination: a local file, cloud storage, a REST API, or your own bespoke system.

Examples Feature Connectors Incremental Processing Tutorial
Jul 24, 2025

Index faces for visual search: your own Google Photos

Build a scalable face detection and recognition pipeline with CocoIndex: embed faces, structure for search, and export to a vector DB.

Examples Tutorial Multimodal Embeddings Vector Search
Jul 9, 2025

Index academic papers and extract metadata for AI agents

How to index academic research papers by extracting metadata (e.g., title, authors, abstract) for AI agents and AI workflows using LLMs and CocoIndex.

Examples Structured Extraction Embeddings RAG Tutorial
Jul 7, 2025

CocoIndex Changelog 2025-07-07

CocoIndex updates: in-process setup/drop API, EmbedText building block, SplitRecursively improvements, union/NumPy types, and the Kuzu graph target.

Changelog Embeddings LLM Knowledge Graph Incremental Processing
Jun 24, 2025

Introducing CocoInsight

Introducing CocoInsight, a data lineage and observability tool that lets you inspect, trace, and debug every step of a CocoIndex pipeline in real time.

Feature Announcement Insight Architecture
Jun 8, 2025

Flow-based schema inference for Qdrant

CocoIndex sets up Qdrant collections automatically by inferring the target schema from your indexing flow: no manual config, vector sizes kept in sync.

Feature Vector Search Connectors Data Indexing
Jun 3, 2025

CocoIndex + Kuzu: Real-time knowledge graph with Kuzu

Build a real-time knowledge graph with Kuzu as a native CocoIndex target: incremental updates, high-performance graph queries.

Examples Feature Knowledge Graph Connectors
May 31, 2025

CocoIndex Changelog 2025-05-31

CocoIndex updates: Amazon S3 as a data source, improved query handling, a standalone runtime mode, and more connector and performance improvements.

Changelog Connectors Incremental Processing Embeddings Vector Search
May 29, 2025

Real-time data pipeline with S3, SQS, and CocoIndex

Build a real-time data pipeline on Amazon S3 and SQS with CocoIndex: incremental indexing on object storage that reprocesses only changed files.

Examples Incremental Processing Connectors Data Indexing
May 20, 2025

Image search in natural language with CLIP

Index images in real time with CocoIndex and a vision model: create multimodal embeddings and build a vector index for efficient retrieval.

Examples Multimodal Embeddings Vector Search Tutorial
May 19, 2025

How to build an index with text embeddings

Build a semantic text index with CocoIndex and text embeddings, then query it with natural language: a beginner's guide to embeddings and vector search.

Examples Embeddings Vector Search RAG Tutorial
May 8, 2025

Story of CocoIndex, at 1k stars 🎉

The story of CocoIndex at 1,000 GitHub stars: the open-source engine that combines custom transformation logic with incremental processing for data indexing.

Announcement Changelog Data Indexing Incremental Processing Architecture
May 7, 2025

Real-time product recommendations with LLM + graph DB

Build a real-time product recommendation engine with an LLM and a graph database, based on understanding product categories (taxonomy).

Examples Knowledge Graph LLM Structured Extraction
Apr 30, 2025

CocoIndex Changelog 2025-04-30

CocoIndex updates: knowledge graph support, Qdrant and Supabase targets, KTable and LTable data types, additional LLM providers, and more.

Changelog Knowledge Graph Connectors Vector Search LLM
Apr 29, 2025

Build Real-Time Knowledge Graph For Documents with LLM

CocoIndex now supports knowledge graphs with incremental processing. Building live knowledge for agents is super easy with CocoIndex!

Examples Knowledge Graph LLM Structured Extraction Tutorial
Apr 7, 2025

CocoIndex Changelog 2025-04-07

CocoIndex updates: incremental live update mode, evaluation utilities, date/time types, a Google Drive source, and core performance improvements.

Changelog Incremental Processing Connectors Structured Extraction
Apr 7, 2025

Continuous updates: derive data when sources change

CocoIndex continuously watches source changes and applies incremental updates to keep derived data in sync, with low latency and no full reindexing.

Feature Incremental Processing Data Indexing Connectors
Apr 6, 2025

Incremental Processing with CocoIndex

CocoIndex uses incremental processing to keep the index up to date with source changes, efficiently and with low latency.

Feature Incremental Processing Performance Architecture
Mar 26, 2025

Structured Extraction from Patient Intake Form with LLM

Extract structured data from patient intake forms in PDF and Word documents using an LLM and CocoIndex: a practical healthcare document extraction example.

Examples Structured Extraction LLM Multimodal
Mar 23, 2025

Build text embeddings from Google Drive for RAG

Step-by-step tutorial to build text embeddings from Google Drive docs with CocoIndex and store them in Postgres for semantic search and RAG.

Examples Embeddings RAG Vector Search Connectors
Mar 20, 2025

CocoIndex Changelog 2025-03-20

First release of the CocoIndex Changelog: LLM support, codebase indexing, custom functions, and assorted core/performance improvements.

Changelog LLM Structured Extraction RAG
Mar 18, 2025

Build Real-Time Codebase Indexing for AI Code Generation

Index a codebase for RAG in real time with CocoIndex and Tree-sitter: chunking, embedding, semantic search, and a vector index for efficient retrieval.

Examples RAG Embeddings Vector Search Tutorial
Mar 17, 2025

On-premise structured extraction with LLM using Ollama

Extract structured data from PDF/Markdown with CocoIndex and Ollama's local LLMs, running on premise without sending data to external APIs.

Examples Tutorial Structured Extraction LLM Postgres
Mar 3, 2025

We are officially open sourced! 🎉

CocoIndex is now open source: the first engine to combine custom transformation logic with incremental processing built specifically for data indexing.

Announcement Changelog Incremental Processing Data Indexing RAG
Feb 20, 2025

Customizable Data Indexing Pipelines

What customizable data indexing pipelines are and why custom transformation logic matters, with practical CocoIndex examples.

Data Indexing Insight RAG Embeddings Vector Search
Jan 30, 2025

What Makes Indexing Pipelines Different?

What makes indexing pipelines different from other data systems, and why they need special handling for incremental processing and persistence.

Insight Incremental Processing Architecture Data Indexing
Jan 20, 2025

System updates and automatic schema inference

How CocoIndex handles system updates in indexing flows: automatic schema inference and managing data + logic evolution without downtime.

Data Indexing Best Practices Feature Architecture
Jan 10, 2025

Processing Large Files in Data Indexing Systems

Handle large files in data indexing: processing granularity, fan-in/fan-out, and memory pressure, walked through with a patent XML example in CocoIndex.

Data Indexing Best Practices Performance Architecture
Jan 6, 2025

Data Consistency in Indexing Pipelines

Data consistency in indexing pipelines: concurrent updates, exposure risks, and how CocoIndex's data-driven approach keeps indexes converging.

Data Indexing Best Practices Insight Architecture
Jan 5, 2025

Data Indexing and Common Challenges

Fundamentals of data indexing pipelines for RAG: what makes a good one, common production pitfalls, and how CocoIndex addresses them.

Data Indexing Best Practices Insight
Jan 4, 2025

CocoIndex - A Data Indexing Platform for AI Applications

CocoIndex is a data indexing platform for AI: ingestion, chunking, embedding, and pipeline management for RAG, semantic search, and knowledge graphs.

Data Indexing RAG Embeddings Vector Search Knowledge Graph
Jan 2, 2025

Welcome to CocoIndex

Welcome to the official CocoIndex blog! We're excited to share our journey in building high-performance indexing infrastructure for AI applications.

Announcement Community Data Indexing
