Generating, storing, and serving text and multimodal embeddings.
Index a codebase for RAG and AI coding agents with CocoIndex V1 and Tree-sitter: language-aware chunking, embedding, and a live vector index in async Python.
A multi-source pipeline that ingests SEC filings (TXT, JSON, PDF), scrubs PII, extracts topics, and powers hybrid search with CocoIndex + Apache Doris.
Turn slide decks into a continuously updated multimodal dataset with CocoIndex: extract speaker notes with Gemini Vision, synthesize narration with Piper TTS, and keep LanceDB in sync.
CocoIndex now batches GPU and ML workloads automatically: 5x throughput on text embeddings and AI ops, with zero configuration required.
Extract, embed, and store multimodal PDF elements (text with SentenceTransformers, images with CLIP) for unified semantic search with traceable metadata.
Build unified, incrementally updated semantic + structured search over PostgreSQL data with CocoIndex: read a table, transform with AI and non-AI ops, and write pgvector embeddings back to Postgres.
Build a unified visual document index from multiple file formats (including PDFs, images, and slides) using CocoIndex and ColPali. No OCR needed.
CocoIndex now natively integrates ColPali for multi-vector, patch-level image indexing: multimodal context engineering for visually rich documents and PDFs.
CocoIndex natively handles typed multi-dimensional vectors, from simple arrays to multi-vector embeddings, unlocking multimodal AI pipelines at scale.
Build a scalable face detection and recognition pipeline with CocoIndex: embed faces, structure them for search, and export to a vector DB.
How to index academic research papers for AI agents and workflows by extracting metadata (e.g., title, authors, abstract) with LLMs and CocoIndex.
CocoIndex updates: in-process setup/drop API, the EmbedText building block, major SplitRecursively codebase-indexing improvements, union and NumPy type support, more LLM APIs, and the Kuzu graph target.
CocoIndex updates: Amazon S3 as a data source, improved query handling, a standalone runtime mode, and more connector and performance improvements.
Index images in real time with CocoIndex and a vision model: generate multimodal embeddings and build a vector index for efficient retrieval.
Build a semantic text index with CocoIndex and text embeddings, then query it with natural language: a beginner's guide to embeddings and vector search.
Step-by-step tutorial to build text embeddings from Google Drive docs with CocoIndex, including service-account setup, and store them in Postgres for semantic search and RAG.
Index a codebase for RAG in real time with CocoIndex and Tree-sitter: chunking, embedding, semantic search, and a vector index for efficient retrieval.
What customizable data indexing pipelines are, and why custom transformation logic matters, explained through clear comparisons and practical CocoIndex examples.
CocoIndex is a data indexing platform for AI applications, handling ingestion, chunking, embedding, and pipeline management for RAG, semantic search, and knowledge graphs with built-in lineage and observability.