
We are officially open sourced! 🎉
CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing. It is now officially open source!
CocoIndex updates: incremental processing with live update mode, evaluation utilities, support for date/time types and Google Drive, and assorted core/performance improvements.
CocoIndex continuously watches source changes and keeps derived data in sync, with low latency and minimal performance overhead.
CocoIndex keeps your index up to date with source changes, efficiently and with low latency, through incremental processing.
Extract structured data from patient intake forms in PDF or Word format with an LLM, using CocoIndex.
A tutorial on creating text embeddings from documents on Google Drive and saving them in a vector store for semantic search and RAG, using CocoIndex.
First release of the CocoIndex changelog: LLM support, codebase indexing, custom functions, and assorted core/performance improvements.
A tutorial on indexing a codebase for RAG with CocoIndex and Tree-sitter: chunking, embedding, semantic search, and building a vector index for efficient retrieval.
Learn to use CocoIndex to extract structured data from PDF/Markdown with local LLMs running on Ollama, all on-premises without sending data to external APIs.
An explanation of what customizable data indexing pipelines are, through comparisons and examples.
Understanding the unique characteristics of indexing pipelines compared to other data processing systems. Learn why indexing requires special handling for incremental updates and persistence.
Explore how CocoIndex handles system updates in indexing flows and our approach to automatic schema inference. Learn about the challenges of managing data and logic evolution, infrastructure setup, and how CocoIndex simplifies these processes through smart automation.
Learn best practices for handling large files in data indexing systems. Understand processing granularity, fan-in/fan-out scenarios, and strategies for efficient processing of large datasets like patent XML files. Discover how CocoIndex helps manage memory pressure and ensures reliable processing.
Explore the challenges and solutions for maintaining data consistency in indexing pipelines. Learn about concurrent updates, data exposure risks, and best practices for ensuring reliable, up-to-date indexes using CocoIndex's data-driven approach.
Explore the fundamentals of data indexing pipelines for RAG applications. Learn about key characteristics of effective indexing systems, common challenges in production, and how CocoIndex solves them. Discover best practices for building maintainable, cost-effective, and scalable data indexing infrastructure.
Discover how CocoIndex simplifies data indexing for AI applications. Learn about our comprehensive platform that handles data ingestion, processing, and management for RAG, semantic search, and other AI use cases.
Welcome to the official CocoIndex blog! We're excited to share our journey in building high-performance indexing infrastructure for AI applications.