CocoIndex Changelog 2025-07-07

· 9 min read

In the past weeks, we've added support for an in-process API and convenient CLI options for setup / drop, native support for EmbedText as a building block, major improvements to codebase indexing, and many core improvements across 10+ releases.

CocoIndex Changelog 2025-05-31

· 7 min read

In the past weeks, we've added support for Amazon S3 as a native data source, updated query handling, and made many core improvements in performance and stability over 15 releases. We're all in on building the best real-time incremental data framework, and we couldn’t be more excited.

Continuously update derived data on source updates, automatically

· 5 min read

Today, we are excited to announce support for continuous updates for long-running pipelines in CocoIndex. This feature automatically applies incremental source changes to keep your index up to date with minimal latency.

With continuous updates, your indexes remain synchronized with your source data in real-time, ensuring that your applications always have access to the most current information without the performance overhead of full reindexing.

Incremental Processing with CocoIndex

· 9 min read

Incremental processing is one of the core values provided by CocoIndex. In CocoIndex, users declare the transformation and don't need to worry about the work of keeping the index and source in sync.

CocoIndex creates and maintains an index, and keeps the derived index up to date as the source changes, with minimal computation and changes. That makes it suitable for ETL, RAG, or any transformation task that needs low latency between source and index updates, while also minimizing computation cost.
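To make the idea concrete, here is a minimal sketch of incremental processing in plain Python. It is not CocoIndex's API or implementation; the names `fingerprint` and `incremental_update`, and the use of a content hash to detect changes, are assumptions made purely for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect whether a source entry changed (an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_update(source, index, fingerprints, transform):
    """Sync `index` with `source`, recomputing only new or changed entries.

    Returns the list of keys that were actually recomputed.
    """
    recomputed = []
    for key, text in source.items():
        fp = fingerprint(text)
        if fingerprints.get(key) != fp:  # new or changed entry
            index[key] = transform(text)
            fingerprints[key] = fp
            recomputed.append(key)
    # Drop derived entries whose source no longer exists.
    for key in list(index):
        if key not in source:
            del index[key]
            fingerprints.pop(key, None)
    return recomputed


# Hypothetical usage: str.upper stands in for an expensive transformation.
source = {"a.md": "hello", "b.md": "world"}
index, fingerprints = {}, {}
first = incremental_update(source, index, fingerprints, str.upper)

source["b.md"] = "world!"  # one entry changes
del source["a.md"]         # one entry is removed
second = incremental_update(source, index, fingerprints, str.upper)
```

On the second run only the changed entry is recomputed, and the removed entry disappears from the index.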

Build text embeddings from Google Drive for RAG

· 9 min read

In this blog, we will show you, step by step, how to use CocoIndex to build text embeddings from Google Drive for RAG, including how to set up a Google Cloud Service Account for Google Drive. CocoIndex is an open source framework for building fresh indexes from your data for AI. It is designed to be easy to use and extend.

CocoIndex Changelog 2025-03-20

· 8 min read

We're excited to share our progress with you! We'll be publishing these updates weekly, but since this is our first one, we're covering highlights from the last two weeks. We had 9 releases in the last 2 weeks, with 100+ PRs merged (yes, we shipped a lot!). Here are the highlights.

What Makes Indexing Pipelines Different?

· 3 min read

When building data processing systems, it's easy to think all pipelines are similar - they take data in, transform it, and produce outputs. However, indexing pipelines have unique characteristics that set them apart from traditional ETL, analytics, or transactional systems. Let's explore what makes indexing special.

Data Consistency in Indexing Pipelines

· 7 min read

An indexing pipeline builds indexes derived from source data. The index should always converge to the current version of the source data. In other words, once a new version of the source data is processed by the pipeline, all data derived from previous versions should no longer exist in the target index storage. This is called the data consistency requirement for an indexing pipeline.
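As a rough illustration of this requirement, the sketch below reprocesses a document and removes any target rows left over from its previous version. It is a toy model, not how CocoIndex implements consistency: the in-memory `target` dict, the `(source_key, chunk_id)` row key, and the `chunk` and `apply_version` helpers are all hypothetical.

```python
def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-size pieces (a stand-in for a real chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def apply_version(target: dict, source_key: str, text: str, size: int = 4) -> list:
    """Write rows derived from the current version and delete stale ones.

    Rows are keyed by (source_key, chunk_id). Returns the stale keys removed.
    """
    new_rows = {(source_key, i): c for i, c in enumerate(chunk(text, size))}
    # Rows derived from an earlier version that the new version no longer produces.
    stale = [k for k in target if k[0] == source_key and k not in new_rows]
    for k in stale:
        del target[k]
    target.update(new_rows)
    return stale


target = {}
apply_version(target, "doc1", "abcdefghij")     # version 1 yields three chunks
stale = apply_version(target, "doc1", "abcde")  # version 2 yields two chunks
```

After the second call, the third chunk from version 1 is gone, so the target holds only rows derived from the current version of `doc1`.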

Data Indexing and Common Challenges

· 5 min read

At its core, data indexing is the process of transforming raw data into a format that's optimized for retrieval. Unlike an arbitrary application that may generate new source-of-truth data, indexing pipelines process existing data in various ways while maintaining trackability back to the original source. This intrinsic nature - being a derivative rather than source of truth - creates unique challenges and requirements.

CocoIndex - A Data Indexing Platform for AI Applications

· 4 min read

High-quality data tailored for specific use cases is essential for successful AI applications in production. The old adage "garbage in, garbage out" rings especially true for modern AI systems - when a RAG pipeline or agent workflow is built on poorly processed, inconsistent, or irrelevant data, no amount of prompt engineering or model sophistication can fully compensate. Even the most advanced AI models can't magically make sense of low-quality or improperly structured data.

Welcome to CocoIndex

· 2 min read

Welcome to the official CocoIndex blog! We're excited to share our journey in building high-performance indexing infrastructure for AI applications.

CocoIndex is designed to provide exceptional velocity for AI systems that need fast, reliable access to their data. Whether you're building large language models, recommendation systems, or other AI applications, our goal is to make data indexing and retrieval as efficient as possible.