---
title: "System updates and automatic schema inference"
description: "How CocoIndex handles system updates in indexing flows: automatic schema inference and managing data + logic evolution without downtime."
last_updated: 2025-01-20
doc_version: "2025-01-20"
canonical: https://cocoindex.io/blogs/handle-system-update-for-indexing-flow/
---
# System updates and automatic schema inference

> How CocoIndex handles system updates in indexing flows: automatic schema inference and managing data + logic evolution without downtime.

Published: 2025-01-20 · Canonical: https://cocoindex.io/blogs/handle-system-update-for-indexing-flow/

When building [data processing and indexing systems](https://cocoindex.io/blogs/data-indexing-and-common-challenges), one of the key challenges is handling system updates gracefully. These systems maintain state across multiple components (such as Pinecone and PostgreSQL) and need to evolve over time. Let's explore the challenges and potential solutions.

## The two dimensions of change

### 1. Data evolution
Source data changes constantly: new records are added, and existing ones are updated or deleted.

### 2. Logic evolution
The business logic and processing rules also evolve. For example:
- New fields need to be indexed
- Transformation logic changes
- New analysis requirements emerge

This is similar to how spreadsheets work: a change in either the source data or the formulas triggers updates to the target data.
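To make the spreadsheet analogy concrete, here is a minimal Python sketch. It is not CocoIndex code; `derive_target`, `double`, and `triple` are hypothetical names chosen for illustration. The target is a pure function of the source data and the formula, so a change to either one produces a different target.

```python
def derive_target(source, formula):
    """Recompute every target value from the source rows and the current formula."""
    return {key: formula(value) for key, value in source.items()}


def double(x):
    return x * 2


def triple(x):
    return x * 3


source = {"a": 1, "b": 2}
target_v1 = derive_target(source, double)

# Data evolution: a new source record appears, the formula is unchanged.
source["c"] = 3
target_v2 = derive_target(source, double)

# Logic evolution: the formula changes, the source data is unchanged.
target_v3 = derive_target(source, triple)
```

This naive version recomputes everything on each change; the point of an incremental system is to reach the same result while touching only what changed.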

## Infrastructure and schema management challenges

When setting up a new indexing flow, there are multiple moving parts to configure:

1. Internal data storage
2. Target storage systems (PostgreSQL, Pinecone, Milvus, etc.)
3. Pipeline logic that matches the component setup. For example, the fields carried into the index must be kept in sync with the schema of the target storage.

Currently, this often requires manual setup and careful coordination. Small mismatches in schema or field definitions can cause subtle bugs that are hard to debug.

## CocoIndex approach: reduce manual setup and infer from the indexing flow

CocoIndex aims to simplify this by making infrastructure setup and schema management automatic and inference-based:

### Flow-driven setup
- Users define their [indexing flow logic](https://cocoindex.io/docs/programming_guide/core_concepts/)
- CocoIndex automatically infers required storage and schema configurations
- [Internal](https://cocoindex.io/docs/advanced_topics/internal_storage/) and [target storage](https://cocoindex.io/docs/programming_guide/target_state/) are provisioned with correct schemas automatically

### Benefits of inference
Modern programming languages such as Java and TypeScript use type inference so that developers don't need to declare data types at every single step. In the same way, CocoIndex can derive the necessary infrastructure setup from the flow definition. This:
- Reduces manual configuration
- Prevents schema mismatches
- Makes updates more reliable
- Allows the system to evolve more easily
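As a rough illustration of the idea, the sketch below derives a table definition from the type of the rows a flow produces. It is not the CocoIndex API: `DocChunk`, `PG_TYPES`, and `infer_create_table` are hypothetical names, and the Python-to-PostgreSQL type mapping is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import get_type_hints

# Assumed mapping from Python types to PostgreSQL column types (illustrative only).
PG_TYPES = {int: "BIGINT", str: "TEXT", float: "DOUBLE PRECISION", bool: "BOOLEAN"}


@dataclass
class DocChunk:  # hypothetical row type produced by an indexing flow
    doc_id: str
    chunk_index: int
    text: str


def infer_create_table(row_type, table_name, primary_key):
    """Build a CREATE TABLE statement from the row type's annotations."""
    columns = [
        f"{name} {PG_TYPES[py_type]}"
        for name, py_type in get_type_hints(row_type).items()
    ]
    columns.append(f"PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table_name} ({', '.join(columns)})"


ddl = infer_create_table(DocChunk, "doc_chunks", ["doc_id", "chunk_index"])
print(ddl)
```

Because the schema is computed from the same definition the pipeline uses, adding a field to `DocChunk` changes the derived table definition in the same place, which is what rules out the mismatches described earlier.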

## How updates are actually applied

The inference above is what makes updates safe to run. Because CocoIndex knows the schema up front and persists what it created, an update is a reconciliation rather than a rebuild.

### Declared state vs. previous state
A [target state](https://cocoindex.io/docs/programming_guide/target_state/) is what you declare should exist in an external system — a table, a row, a file, an embedding. CocoIndex treats your declarations as the source of truth and records them in its [internal storage](https://cocoindex.io/docs/advanced_topics/internal_storage/) (an LMDB database that tracks target states and memoization results from previous runs). On the next run it compares what you now declare against what it stored last time and applies only the minimal changes needed:

| Target state | On first declaration | When declared differently | When no longer declared |
| --- | --- | --- | --- |
| A database table | Create the table | Alter the table | Drop the table |
| A row in a table | Insert the row | Update the row | Delete the row |
| A file in a directory | Create the file | Update the file | Delete the file |

This is the same mechanism for both dimensions of change. A new or deleted *source* record shows up as a row that is now (or no longer) declared. An edit to your *logic* — adding a field, changing a transformation — shows up as a row or table that is declared differently. Either way, CocoIndex computes the delta instead of reprocessing everything. Memoized components are skipped when their inputs haven't changed, so unchanged work isn't redone.
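The row-level case in the table can be sketched as a diff between two dictionaries. This is a simplified illustration rather than CocoIndex's implementation; `reconcile` and the sample keys are hypothetical.

```python
def reconcile(previous, declared):
    """Return the minimal list of actions that turns `previous` into `declared`."""
    actions = []
    for key, value in declared.items():
        if key not in previous:
            actions.append(("insert", key, value))   # first declaration
        elif previous[key] != value:
            actions.append(("update", key, value))   # declared differently
    for key in previous:
        if key not in declared:
            actions.append(("delete", key, None))    # no longer declared
    return actions


# State recorded after the last run vs. what the current run declares.
previous = {"doc1": "hello", "doc2": "world", "doc4": "stale"}
declared = {"doc1": "hello", "doc2": "world!", "doc3": "new"}

actions = reconcile(previous, declared)
```

Note that `doc1` produces no action at all: a row that is declared the same way as before costs nothing on the target side.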

### What setup and drop do
When a *container* target state changes (for instance, you add a column or change a primary key), CocoIndex detects it and does its best to alter the target in place. If the change is too large to alter (changing primary keys is the canonical example), the target is **dropped and recreated**. Crucially, when that happens CocoIndex automatically reprocesses the affected components to backfill the data; you don't have to manually trigger a full reprocess. This is driven by the target connector's child-invalidation mechanism, which tells the engine whether a change is destructive (all children lost) or merely lossy (some data may be lost).
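The decision between altering in place and recreating can be sketched as follows. This is a hypothetical model, not CocoIndex's connector interface: `TableSpec` and `plan_table_change` are invented for the example, and it assumes that a primary-key change is the only change that cannot be altered in place.

```python
from dataclasses import dataclass


@dataclass
class TableSpec:  # hypothetical description of a table target state
    columns: dict        # column name -> column type
    primary_key: tuple   # names of the key columns


def plan_table_change(previous, declared):
    """Return (action, needs_backfill) for a container-level change."""
    if previous is None:
        return ("create", False)
    if declared is None:
        return ("drop", False)
    if previous == declared:
        return ("noop", False)
    if previous.primary_key != declared.primary_key:
        # Too large to alter in place: all child rows are lost, so the
        # affected components have to be reprocessed to backfill them.
        return ("drop_and_recreate", True)
    return ("alter", False)


old = TableSpec({"id": "TEXT", "body": "TEXT"}, ("id",))
added_column = TableSpec({"id": "TEXT", "body": "TEXT", "lang": "TEXT"}, ("id",))
new_key = TableSpec({"id": "TEXT", "body": "TEXT"}, ("id", "body"))

plan_add = plan_table_change(old, added_column)
plan_key = plan_table_change(old, new_key)
```

Returning the backfill flag alongside the action mirrors the behavior described above, where the engine rather than the user decides when reprocessing is required.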

You can also reach for these transitions explicitly through the [CLI](https://cocoindex.io/docs/cli_commands/). `cocoindex update` runs the app in catch-up mode and applies the reconciliation above. Passing `--reset` drops the existing setup before updating (equivalent to running `cocoindex drop` first), while `--full-reprocess` reprocesses everything and invalidates existing caches. The standalone `cocoindex drop` command reverts all target states an app created — dropping tables, deleting rows — and clears the app's internal state database.

### Keeping the index fresh
Catch-up mode is already incremental, but each `update()` call still has to scan sources to discover what changed, and changes are only picked up when you trigger a run. For near-real-time indexes, [live mode](https://cocoindex.io/docs/programming_guide/live_mode/) keeps the app running after the initial catch-up and lets change-aware sources (a filesystem watcher, a database change feed, a Kafka consumer) stream updates continuously into the same target-state reconciliation. New or modified items re-mount the affected component; deletions remove it and its target states.
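The difference between the two modes can be sketched in a few lines of plain Python. None of these names come from CocoIndex: `catch_up`, `apply_event`, and `run_live` are hypothetical, the target is modeled as a dictionary, and the event stream is a simple list standing in for a watcher or change feed.

```python
def catch_up(source, target):
    """One-shot pass: scan the whole source and make the target match it."""
    for key, value in source.items():
        if target.get(key) != value:
            target[key] = value
    for key in list(target):
        if key not in source:
            del target[key]


def apply_event(event, target):
    """Apply a single change event without rescanning the source."""
    kind, key, value = event
    if kind == "upsert":
        target[key] = value
    elif kind == "delete":
        target.pop(key, None)


def run_live(source, events, target):
    """Catch up once, then keep applying streamed changes."""
    catch_up(source, target)
    # In a real system this iterator would block on a watcher or change feed.
    for event in events:
        apply_event(event, target)


target = {"stale": "x"}
source = {"a": "1"}
events = [("upsert", "b", "2"), ("delete", "a", None)]
run_live(source, events, target)
```

The catch-up pass costs a full scan of the source, while each streamed event costs only the work for the item it names, which is why live mode suits near-real-time indexes.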

## Looking forward
The future of data processing systems lies in smart automation that can:
- Infer infrastructure needs from processing logic
- Handle schema evolution gracefully
- Maintain consistency across distributed storage
- Make updates and changes reliable and predictable

By building these capabilities into [CocoIndex](https://cocoindex.dev), we can significantly reduce the operational burden on users while making systems more reliable and maintainable.
