Introducing CocoInsight

June 24, 2025 · 4 min read

From day zero, we envisioned CocoInsight as a fundamental companion to CocoIndex: not just a tool, but a philosophy of making data explainable, auditable, and actionable at every stage of a data pipeline for AI workloads. CocoInsight has been in private beta for a while. It is one of the most loved features among users building ETL with CocoIndex, because it significantly boosts developer velocity and lowers the barrier to entry for data engineering.

We are officially launching CocoInsight today. It has zero pipeline data retention and connects to your on-premise CocoIndex server for pipeline insights. This makes your data directly visible and makes ETL pipelines easier to develop.

Getting Started

From your CocoIndex project, start using it by running:

cocoindex server -ci main.py

Overview

This is an example view for CocoInsight:

The right panel shows the dataflow, and the left panel shows a step-by-step data preview. Each field is tied to an input or output of a step in the dataflow transformation.

CocoInsight Panels

Inspect lineage

You can click on any field (in either the dataflow or the data preview) or any transformation step in the dataflow to inspect its lineage and understand where the data comes from.

Inspect Lineage

The clicked element is highlighted in purple to mark it as the element being inspected.

Visibility:
- Data/ops with a direct or transitive dependency on the selected element (upstream or downstream) stay in view.
- Data/ops unrelated to the currently selected element are dimmed.
Color:
- Direct upstream data dependency (exact fields) will be colored blue.
- Direct downstream data output (exact fields) will be colored green.
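The visibility and color rules above can be sketched as a small graph traversal. This is only an illustration of the rules as described here, not CocoInsight's actual implementation; the function name `classify`, the state labels, and the sample field names are all hypothetical.

```python
from collections import defaultdict

def classify(edges, selected):
    """Return a display state for every node relative to `selected`.

    `edges` holds (source, target) pairs meaning "target is derived from source".
    """
    parents, children = defaultdict(set), defaultdict(set)
    nodes = set()
    for src, dst in edges:
        children[src].add(dst)
        parents[dst].add(src)
        nodes.update((src, dst))

    def reach(start, neighbors):
        # Collect everything reachable from `start` by following `neighbors`.
        seen, stack = set(), [start]
        while stack:
            for nxt in neighbors[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    upstream = reach(selected, parents)
    downstream = reach(selected, children)

    state = {}
    for node in nodes:
        if node == selected:
            state[node] = "purple"    # the element being inspected
        elif node in parents[selected]:
            state[node] = "blue"      # direct upstream dependency
        elif node in children[selected]:
            state[node] = "green"     # direct downstream output
        elif node in upstream or node in downstream:
            state[node] = "visible"   # transitive dependency, stays in view
        else:
            state[node] = "dimmed"    # unrelated to the selection
    return state

# Hypothetical fields, loosely modeled on a file-indexing flow.
edges = [
    ("filename", "extension"),
    ("content", "chunks"),
    ("extension", "chunks"),
    ("chunks", "embedding"),
    ("other_source", "other_field"),
]
state = classify(edges, "chunks")
```

Selecting `chunks` here marks `content` and `extension` as direct upstream, `embedding` as direct downstream, keeps `filename` in view as a transitive dependency, and dims the unrelated fields.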

Let's walk through some simple examples of how these AI pipelines work. You don't need to know how to write code; you just need to be comfortable reading a spreadsheet 😊.

Codebase Indexing Example

Ingest files, which outputs file names and contents.
Take the filename and extract the extension.
Take the content (source code) and the extension (which identifies the language, e.g., .py) and split the code along code boundaries with Tree-sitter.
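To make the three steps concrete, here is a rough plain-Python sketch of the same shape of pipeline. It does not use the CocoIndex API, and the splitter is a naive stand-in that only approximates what Tree-sitter does; `extract_extension`, `split_code`, and the sample file are assumptions made for illustration.

```python
import os
import re

def extract_extension(filename):
    """Step 2: derive the extension from the filename, e.g. 'util.py' -> '.py'."""
    return os.path.splitext(filename)[1]

def split_code(content, extension):
    """Step 3 stand-in: a naive splitter, NOT Tree-sitter."""
    if extension == ".py":
        # Start a new chunk before each top-level `def` or `class`.
        parts = re.split(r"(?m)^(?=(?:def|class)\s)", content)
    else:
        # Fall back to splitting on blank lines.
        parts = re.split(r"\n\s*\n", content)
    return [p.strip() for p in parts if p.strip()]

# Step 1 stand-in: "ingested" files as a mapping of filename -> content.
files = {"util.py": "import os\n\ndef a():\n    return 1\n\nclass B:\n    pass\n"}

# One row per file, with each derived field next to its inputs.
rows = [
    {
        "filename": name,
        "extension": extract_extension(name),
        "chunks": split_code(content, extract_extension(name)),
    }
    for name, content in files.items()
]
```

Each row is roughly what one line of the data preview represents: source fields (`filename`, content) alongside fields derived from them (`extension`, `chunks`).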

You can click on each chunk of a document to expand its details.

Knowledge Graph Example

In this example, we process a list of files and generate a knowledge graph with documents and entities as nodes, and relationships between document/entity and entity/entity.

Some key steps:

Use an LLM to summarize a document.
Use an LLM to extract entities and relationships between entities.

Click on any relationship row to drill into its child table.
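As a rough sketch of the resulting graph shape, the snippet below assembles nodes and edges from a summary and a list of extracted relationships. The LLM calls are left out entirely: the summary and relationship are hand-written sample values, and `Relationship`, `build_graph`, and the `MENTIONS` label are hypothetical names, not CocoIndex's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    subject: str
    predicate: str
    object: str

def build_graph(doc_id, summary, relationships):
    """Build document/entity nodes plus document-entity and entity-entity edges."""
    doc = ("Document", doc_id)
    nodes = {doc: {"summary": summary}}
    edges = set()
    for rel in relationships:
        subj, obj = ("Entity", rel.subject), ("Entity", rel.object)
        nodes.setdefault(subj, {})
        nodes.setdefault(obj, {})
        edges.add((subj, rel.predicate, obj))   # entity/entity relationship
        edges.add((doc, "MENTIONS", subj))      # document/entity relationship
        edges.add((doc, "MENTIONS", obj))
    return nodes, edges

# Hand-written stand-ins for what an LLM might return for one document.
extracted = [Relationship("CocoInsight", "connects_to", "CocoIndex server")]
nodes, edges = build_graph("intro.md", "Launch announcement for CocoInsight.", extracted)
```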

How it works

At the core of CocoIndex, both data and data operations are first-class citizens.

Because of this pure dataflow foundation, CocoIndex offers full observability by default:

The data before and after each transformation is available at every transformation node.
Every output field can be traced back to the exact set of input fields and operations that created it.
Lineage is first-class — not as metadata bolted on afterward, but as a structural property of how data is defined and transformed in the system.

This lineage model is not just useful for debugging — it enables features like incremental processing, intelligent caching, and transformation-level explainability, all out of the box.

While CocoIndex is architecturally a dataflow engine, its user experience is deeply inspired by spreadsheets. Just like in a spreadsheet:

Values of cells are derived from others through clearly visible formulas or expressions.
You can visually inspect how data looks before and after each transformation, cell by cell.
There’s no implicit global state, and every value can be explained in terms of its formula and input values.
Once the value of a source cell changes, derived cell values are automatically updated from their formulas with minimal reprocessing.
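The spreadsheet analogy above can be sketched in a few lines of Python. This toy `Sheet` class is a hypothetical illustration of formula-derived cells with minimal recomputation, not how CocoIndex is implemented; it assumes each derived cell is defined after its inputs.

```python
class Sheet:
    """Toy spreadsheet: source cells plus derived cells defined by formulas."""

    def __init__(self):
        self.values = {}
        self.formulas = {}    # cell name -> (function, input cell names)
        self.recomputed = []  # log of derived cells that were recomputed

    def define(self, name, func, inputs):
        """Define a derived cell; its inputs must already exist."""
        self.formulas[name] = (func, tuple(inputs))
        self._compute(name)

    def set_source(self, name, value):
        """Set a source cell and refresh only the cells that depend on it."""
        if name in self.values and self.values[name] == value:
            return  # nothing changed, nothing to reprocess
        self.values[name] = value
        dirty = {name}
        # Formulas are stored in definition order, so inputs come before dependents.
        for cell, (_, inputs) in self.formulas.items():
            if dirty.intersection(inputs):
                old = self.values.get(cell)
                self._compute(cell)
                if self.values[cell] != old:
                    dirty.add(cell)

    def _compute(self, name):
        func, inputs = self.formulas[name]
        self.values[name] = func(*(self.values[i] for i in inputs))
        self.recomputed.append(name)

sheet = Sheet()
sheet.set_source("filename", "main.py")
sheet.set_source("content", "print('hi')")
sheet.define("extension", lambda f: f.rsplit(".", 1)[-1], ["filename"])
sheet.define("length", len, ["content"])

sheet.recomputed.clear()
sheet.set_source("content", "print('hello')")
# Only `length` depends on `content`, so `extension` is left untouched.
```

After the last call, `sheet.recomputed` contains only `length`: every value is explained by its formula and inputs, and a change reprocesses only what depends on it.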

This spreadsheet-inspired paradigm is more than a UI choice — it’s a cognitive model. It bridges the gap between low-code users and developers, allowing anyone familiar with spreadsheets to reason about data transformations intuitively.

We have lots of features planned for CocoInsight 😎, including query debugging, stats, and more. Stay tuned and join our Discord for any questions.
