# CocoIndex Changelog 0.3.11 - 0.3.26

> CocoIndex updates: production-ready resilience, a structured error system, expanded integrations, and always-fresh context for agents.

Published: 2026-01-18 · Canonical: https://cocoindex.io/blogs/changelog-0311-0326/

[CocoIndex](https://github.com/cocoindex-io/cocoindex) is moving at insane speed! 🚀
Since the last release, CocoIndex shipped **15 releases (0.3.11–0.3.26)** and crossed a major milestone: **#1 GitHub Global Trending** across all languages and #1 on Rust. This cycle was focused on one clear goal: making **fresh, structured, programmable context** reliable enough for **agents running in production**.

Agents are only as good as the data they reason over. Thinking about codebases, meeting notes, emails, relationships, commerce ... anything that may be subject to change. If you have ever done index-swap or had to keep multiple copies of the target to serve live agents on data or code change, you'll never have to do it again.

**CocoIndex helps you ace the data for agents without heavy data engineering work.** Continuous fresh, structured and programmable context for AI, from PDFs, Codebase, Emails, Screenshots, Meeting Notes, ... Always up-to-date, at any scale, ultra-performant, with a smart incremental engine. ⭐ [Star the repo](https://github.com/cocoindex-io/cocoindex)

Glad this project resonates with the community! 2026 will see more autonomous agents going into production, and we're excited to see [CocoIndex](https://github.com/cocoindex-io/cocoindex) help in that frontier.

Here's what we shipped since the last release. **0.3.11–0.3.26 was about production trust**: resilient pipelines, structured errors, better observability, and always-fresh structured context for agents operating in the real world.

## Core capability

CocoIndex just got a significant upgrade in runtime resilience and developer ergonomics. The core engine now also features a fully structured error system, enabling unified error tracing from Rust to Python so you can finally see human‑readable stack traces across boundaries. Combined with smarter runtime warnings for slow live updates, simplified progress bar rendering, and cleaner internal API boundaries, CocoIndex continues to evolve into the most transparent and trustworthy data‑processing core in its class.

### Failure-tolerant target deletion

Target drop failures no longer block pipeline setup updates: an environment toggle **`COCOINDEX_IGNORE_TARGET_DROP_FAILURES`** lets you log and continue gracefully ([#1454](https://github.com/cocoindex-io/cocoindex/pull/1454)).

### Simplified progress experience

Streamlined progress bar updates for a more predictable runtime visualization experience ([#1467](https://github.com/cocoindex-io/cocoindex/pull/1467)).

### Structured error system overhaul

A new foundation for error definition and propagation:

- Introduced **structured error framework** with host exception tunneling
- End-to-end **Python stack trace propagation** now supported
- New unified **error types** improve exception messaging and logging at both Rust and Python layers ([#1380](https://github.com/cocoindex-io/cocoindex/pull/1380), [#1413](https://github.com/cocoindex-io/cocoindex/pull/1413)).

### Smart runtime alerts

CocoIndex now warns when live updates exceed the configured refresh interval, ideal for maintaining freshness guarantees ([#1408](https://github.com/cocoindex-io/cocoindex/pull/1408)).

### Cleaner internal API boundaries

Refactored **`datatype.py`** with a dedicated **`_internal`** module, decluttering the public API surface ([#1402](https://github.com/cocoindex-io/cocoindex/pull/1402)).

### Force reprocessing option for updates

Added explicit control for full recomputation during incremental updates for safer state resets. Run `cocoindex update --full-reprocess ...` to force a reprocessing ([#1385](https://github.com/cocoindex-io/cocoindex/pull/1385)).

### Flow correctness improvements

- Fixed target diffing logic for user-managed setup states
- General FTS fixes for LanceDB ([#1369](https://github.com/cocoindex-io/cocoindex/pull/1369), [#1403](https://github.com/cocoindex-io/cocoindex/pull/1403))

Together, these changes strengthen the core runtime's fault tolerance, improve observability, and lay the groundwork for a richer developer debugging experience.

## Integrations

This release massively expands what CocoIndex can plug into.

### Sources & targets

#### Qdrant

Full support for **`VectorIndexMethod`** HNSW configuration in the [Qdrant connector](https://cocoindex.io/docs/connectors/qdrant/) ([#1410](https://github.com/cocoindex-io/cocoindex/pull/1410)).

#### Postgres

Native support for **`pgvector`** types in [Postgres source](https://cocoindex.io/docs/connectors/postgres/) ([#1387](https://github.com/cocoindex-io/cocoindex/pull/1387)).

#### Local files

Automatic symlink resolution for easier graph ingestion ([#1463](https://github.com/cocoindex-io/cocoindex/pull/1463)).

#### LanceDB

- Added **`optimize()`** for post-ingest table compaction
- Extended full-text search integration in the [LanceDB connector](https://cocoindex.io/docs/connectors/lancedb/) ([#1466](https://github.com/cocoindex-io/cocoindex/pull/1466), [#1358](https://github.com/cocoindex-io/cocoindex/pull/1358))

### Functions & LLM providers

#### Unified OpenAI/Azure architecture

Shared logic between **`openai.rs`** and **`azureopenai.rs`** for cleaner code reuse ([#1376](https://github.com/cocoindex-io/cocoindex/pull/1376)).

#### Azure OpenAI integration

Bring your Azure LLM accounts seamlessly into CocoIndex pipelines ([#1362](https://github.com/cocoindex-io/cocoindex/pull/1362)).

#### OpenRouter embeddings

Added embedding model support across OpenRouter's expanding provider network ([#1359](https://github.com/cocoindex-io/cocoindex/pull/1359)).

#### Embedding dimension validation

Introduced **`expected_output_dimension`** param to enforce vector consistency ([#1373](https://github.com/cocoindex-io/cocoindex/pull/1373)).

#### Direct JSON output in functions

**`GeneratedOutput`** enum now supports structured JSON returns for downstream consumption ([#1395](https://github.com/cocoindex-io/cocoindex/pull/1395)).

#### Gemini API key fixes

Cleaner, more consistent propagation of credentials across function calls ([#1375](https://github.com/cocoindex-io/cocoindex/pull/1375)).

These integrations make CocoIndex a more open, extensible, and multi-LLM-ready data processing cortex.

## Build with CocoIndex

We are seeing more projects built with CocoIndex.

### Build fresh data with LanceDB and CocoIndex

🚀 If agents are going to search, reason, and act over long periods of time, we need infrastructure that treats **data freshness and multimodal as a first-class primitive**.

Our friends at [LanceDB](https://lancedb.com) just published a full walkthrough with us showing how to build a **live-updating multimodal search index** using LanceDB and CocoIndex, where text, images, embeddings, and LLM-derived features stay continuously in sync as data and logic evolve.

Check out the [tutorial](https://lancedb.com/blog/keep-your-data-fresh-with-cocoindex-and-lancedb/) for more details.

🔥 **What's cool about this example:**

- **Incremental-by-design processing**: Only new or changed records are reprocessed. Add one recipe? Update one image? Fix one ingredient? CocoIndex touches only what changed, not the other 100k rows.
- **True multimodal indexing**: Text, images, embeddings, and metadata live together in LanceDB, optimized for both search and analytics, without splitting data across systems.
- **Schema evolution without rewrites**: We add new LLM-extracted feature columns (vegetarian, dairy, category, etc.) after data is already live. CocoIndex updates the schema and backfills safely, with no table rebuilds.
- **Structured LLM extraction (no brittle prompts)**: DSPY signatures define input/output types, so LLMs behave like programs, not JSON-generating slot machines.
- **Upserts + deletes handled automatically**: Primary keys + incremental flows mean no duplicates, no stale rows, no "did this record change?" logic to maintain.
- **End-to-end lineage & observability**: With CocoInsight, you can see exactly why a record changed and which transformation produced each field.
- **Live mode for agent-ready systems**: Run the pipeline continuously. As data arrives or logic changes, your search layer updates in near real time.

Think PDFs, slides, screenshots, images, videos, and support tickets across massive enterprise data lakes, all kept fresh, indexed, searchable, and generating real insights. 

### Building a real-time HackerNews trending topics detector with CocoIndex

In the age of information overload, understanding what's trending (and why) is crucial for developers, researchers, and data engineers. HackerNews is one of the most influential tech communities, but manually tracking emerging topics across thousands of threads and comments is practically impossible.

In this post, we'll explore how to build a **production-ready, real-time trending topics pipeline** on top of HackerNews using **CocoIndex**. It shows how unstructured, high-volume community data can be incrementally ingested, enriched with LLMs, and queried as structured, up-to-date knowledge.

Each HackerNews thread and comment is then processed through an **LLM-powered extraction step** to identify normalized topics (technologies, products, companies, people, etc.). These topics, along with the original messages, are indexed into **Postgres** as queryable tables. Weighted scoring distinguishes primary topics in threads from secondary mentions in comments, enabling higher-quality trend rankings.

The result is a continuously updating system that turns raw HackerNews activity into **structured, queryable intelligence**, ready for dashboards, analytics, or AI agents.

Check out the [blog](https://cocoindex.io/blogs/hackernews-trending-topics) for more details.

### Building a knowledge graph from meeting notes that automatically updates

Most companies sit on an ocean of meeting notes and treat them like static text files. But inside those documents are decisions, tasks, owners, and relationships: basically an untapped knowledge graph that is constantly changing. We just published a full walkthrough showing how to turn meeting notes in Drive into a **live-updating knowledge graph** using [CocoIndex](https://github.com/cocoindex-io/cocoindex) + LLM extraction.

Check out the [tutorial](https://cocoindex.io/examples/meeting_notes_graph) for more details.

**What's cool about this example:**

- **Incremental processing**: Only changed documents get reprocessed. If you have thousands of meeting notes, but only 1% change each day, CocoIndex only touches that 1%, saving 99% of LLM cost and compute.
- **Structured extraction with LLMs**: We use a typed Python dataclass as the schema, so the LLM returns real structured objects, not brittle JSON prompts.
- **Graph-native export**: CocoIndex maps nodes (Meeting, Person, Task) and relationships (ATTENDED, DECIDED, ASSIGNED_TO) without writing Cypher, directly into [Neo4j](https://www.linkedin.com/company/neo4j/).
- **Real-time updates**: If a meeting note changes (task reassigned, typo fixed, new discussion added), the graph updates automatically.

### Extracting structured data from patient intake forms with DSPy and CocoIndex

Patient intake forms are indeed a rich source of structured clinical data, but traditional OCR + regex pipelines fail to reliably capture their nested, conditional, and variable structure, leaving most of that value locked in unstructured text or manual entry.

This project demonstrates a production-ready approach to extracting **clean, validated, structured patient data directly from PDF intake forms** using **DSPy** and **CocoIndex**, without brittle OCR pipelines, regex rules, or Markdown intermediates.

The result is a scalable, explainable, and compliant pipeline well-suited for healthcare and other regulated domains, turning unstructured intake forms into always-fresh, queryable patient models ready for downstream systems, analytics, or AI agents.

Check out the [tutorial](https://cocoindex.io/examples/patient_form_extraction_dspy) for more details.

## Thanks to the community 🤗🎉

Welcome new contributors to the CocoIndex community! We are so excited to have you!

### @1fanwang

Thanks [@1fanwang](https://github.com/1fanwang) for adding Azure OpenAI support as an LLM provider in [#1362](https://github.com/cocoindex-io/cocoindex/pull/1362), enabling enterprise users to leverage CocoIndex with their Azure deployments.

### @vipulaSD

Thanks [@vipulaSD](https://github.com/vipulaSD) for adding field-level descriptions for data classes in the Product Recommendation example in [#1327](https://github.com/cocoindex-io/cocoindex/pull/1327), making the example more informative and easier to understand.

### @prrao87

Thanks [@prrao87](https://github.com/prrao87) for adding support for embedding models in OpenRouter in [#1359](https://github.com/cocoindex-io/cocoindex/pull/1359), expanding LLM provider options, and fixing typos and consistency issues in the docs in [#1360](https://github.com/cocoindex-io/cocoindex/pull/1360), making the documentation clearer and more polished.

### @Haleshot

Thanks [@Haleshot](https://github.com/Haleshot) for improving the README note for better GitHub rendering in [#1288](https://github.com/cocoindex-io/cocoindex/pull/1288), shifting to the tracing crate for logging in [#1363](https://github.com/cocoindex-io/cocoindex/pull/1363), adding Postgres server reminder to quickstart docs in [#1368](https://github.com/cocoindex-io/cocoindex/pull/1368), fixing the broken workflow badge in README in [#1370](https://github.com/cocoindex-io/cocoindex/pull/1370), introducing a structured error system with host exception tunneling in [#1380](https://github.com/cocoindex-io/cocoindex/pull/1380), adding llms.txt file to docs in [#1455](https://github.com/cocoindex-io/cocoindex/pull/1455), adding short flag aliases for common CLI options in [#1458](https://github.com/cocoindex-io/cocoindex/pull/1458), and adding env file sourcing tip to meeting_notes_graph example in [#1470](https://github.com/cocoindex-io/cocoindex/pull/1470).

### @thisisharsh7

Thanks [@thisisharsh7](https://github.com/thisisharsh7) for making <code>openai.rs</code> generic to share logic with <code>azureopenai.rs</code> in [#1376](https://github.com/cocoindex-io/cocoindex/pull/1376), including mypy type checking for the examples directory in [#1383](https://github.com/cocoindex-io/cocoindex/pull/1383), and adding the <code>GeneratedOutput</code> enum for direct JSON returns in [#1395](https://github.com/cocoindex-io/cocoindex/pull/1395), enhancing type safety and code reusability.

### @prithvi-moonshot

Thanks [@prithvi-moonshot](https://github.com/prithvi-moonshot) for adding <code>onBrokenAnchors</code> config to report documentation errors in [#1325](https://github.com/cocoindex-io/cocoindex/pull/1325), providing an option to force reprocessing during update in [#1385](https://github.com/cocoindex-io/cocoindex/pull/1385), and making target deletion (drop) tolerate failures in [#1454](https://github.com/cocoindex-io/cocoindex/pull/1454), improving robustness of the data pipeline.

### @skprasadu

Thanks [@skprasadu](https://github.com/skprasadu) for wrapping cargo test to avoid embedded Python stdlib discovery crash in [#1398](https://github.com/cocoindex-io/cocoindex/pull/1398), moving Query Support above Built-in Sources in the docs for better organization in [#1397](https://github.com/cocoindex-io/cocoindex/pull/1397), and adding a warning when live update pass exceeds <code>refresh_interval</code> in [#1408](https://github.com/cocoindex-io/cocoindex/pull/1408).

### @Gohlub

Thanks [@Gohlub](https://github.com/Gohlub) for adding <code>with_context</code> to user‑configured operations in [#1275](https://github.com/cocoindex-io/cocoindex/pull/1275), enabling more powerful context‑aware flows for sources, functions, and targets.

### @wxh06

Thanks [@wxh06](https://github.com/wxh06) for adding <code>expected_output_dimension</code> parameter to <code>embed_text</code> in [#1373](https://github.com/cocoindex-io/cocoindex/pull/1373), providing more control over embedding output dimensions.

### @samojavo

Thanks [@samojavo](https://github.com/samojavo) for fixing mypy errors in example entry points and adding a local type-check script in [#1248](https://github.com/cocoindex-io/cocoindex/pull/1248), improving code quality and developer experience.

### @aryasoni98

Thanks [@aryasoni98](https://github.com/aryasoni98) for enabling programmatic API key passing besides reading from environment variables in [#1134](https://github.com/cocoindex-io/cocoindex/pull/1134), providing more flexibility in configuration.

### @shinobiwanshin

Thanks [@shinobiwanshin](https://github.com/shinobiwanshin) for enhancing the Button component with hover effects and styling improvements in [#1393](https://github.com/cocoindex-io/cocoindex/pull/1393), making the UI more polished and interactive.

### @majiayu000

Thanks [@majiayu000](https://github.com/majiayu000) for adding <code>VectorIndexMethod</code> support for HNSW configuration in Qdrant in [#1410](https://github.com/cocoindex-io/cocoindex/pull/1410), enabling more fine-grained control over vector indexing parameters.

## Summary

[CocoIndex](https://github.com/cocoindex-io/cocoindex) continues evolving into a **resilient, intelligent data engine** for the modern AI stack.

This cycle focuses on **error transparency, LLM integration depth, and frictionless developer operations**, ensuring your pipelines stay alive, observable, and blazing fast even under adverse conditions.

If you like this project, please support us with a star ⭐ at https://github.com/cocoindex-io/cocoindex !

🔥🔥 We are cooking something big! v1 is coming soon. Stay tuned! If you read all the way here, you are a true fan!
