CocoIndex updates: in-process API, CLI improvements, EmbedText support, codebase indexing enhancements, and more.
CocoIndex updates: Amazon S3 as a data source, improved query handling, a standalone runtime mode, and more connector and performance improvements.
CocoIndex updates: knowledge graph support, Qdrant and Supabase targets, KTable and LTable data types, additional LLM providers, and more.
CocoIndex updates: incremental processing with live update mode, evaluation utilities, date/time types, a Google Drive source, and core performance improvements.
CocoIndex continuously watches source changes and keeps derived data in sync, with low latency and minimal performance overhead.
CocoIndex keeps the index up to date with source changes, efficiently and with low latency, using incremental processing.
Extract structured data from patient intake forms in PDF and Word documents using an LLM and CocoIndex: a practical healthcare document extraction example.
A tutorial on creating text embeddings from documents on Google Drive and saving them in a vector store for semantic search and RAG, using CocoIndex.
First release of CocoIndex Changelog: LLM support, codebase indexing, custom functions, and assorted core/performance improvements.
Index a codebase for RAG in real time with CocoIndex and Tree-sitter: chunking, embedding, semantic search, and building a vector index for efficient retrieval.
Learn to use CocoIndex to extract structured data from PDF and Markdown files with local LLMs served by Ollama, all running on-premises without sending data to external APIs.
CocoIndex is now open source: the first engine to combine custom transformation logic with incremental processing, built specifically for data indexing.
What customizable data indexing pipelines are, and why custom transformation logic matters, explained through clear comparisons and practical CocoIndex examples.
What makes indexing pipelines different from other data systems, and why they need special handling for incremental processing and persistence.
How CocoIndex handles system updates in indexing flows: automatic schema inference and managing data and logic evolution without downtime.
Handling large files in data indexing: processing granularity, fan-in/fan-out, and memory pressure, walked through with a patent XML example in CocoIndex.
Data consistency in indexing pipelines: concurrent updates, exposure risks, and how CocoIndex's data-driven approach keeps indexes converging.
Fundamentals of data indexing pipelines for RAG: what makes a good one, common production pitfalls, and how CocoIndex addresses them.
CocoIndex is a data indexing platform for AI applications: ingestion, processing, and management for RAG and semantic search.
Welcome to the official CocoIndex blog! We're excited to share our journey in building high-performance indexing infrastructure for AI applications.