Vector indexes, similarity search, and the databases that back them.
A multi-source pipeline that ingests SEC filings (TXT, JSON, PDF), scrubs PII, extracts topics, and powers hybrid search with CocoIndex + Apache Doris.
Turn slide decks into a continuously updated multimodal dataset with CocoIndex: extract speaker notes with Gemini Vision, synthesize narration with Piper TTS, and keep LanceDB in sync.
Extract, embed, and store multimodal PDF elements (text with SentenceTransformers, images with CLIP) for unified semantic search with traceable metadata.
Define query handlers in CocoIndex and trace search results back to source data in CocoInsight to close the loop on indexing strategy.
Build unified, incrementally updated semantic + structured search over PostgreSQL data with CocoIndex: read a table, transform with AI and non-AI ops, and write pgvector embeddings back to Postgres.
Build a unified visual document index from multiple file formats (including PDFs, images, and slides) using CocoIndex and ColPali. No OCR needed.
CocoIndex updates: production readiness, scalability, and reliability, plus more customization, native integrations, and multi-modal pipeline features.
CocoIndex now natively integrates ColPali for multi-vector, patch-level image indexing: multi-modal context engineering for visually rich documents and PDFs.
CocoIndex natively handles typed multi-dimensional vectors, from simple arrays to multi-vector embeddings, unlocking multimodal AI pipelines at scale.
Build a scalable face detection and recognition pipeline with CocoIndex: embed faces, structure the results for search, and export them to a vector DB.
CocoIndex now sets up Qdrant collections automatically by inferring the target schema from your indexing flow: no manual config, vector sizes derived from the embedding model and kept in sync.
CocoIndex updates: Amazon S3 as a data source, improved query handling, a standalone runtime mode, and more connector and performance improvements.
Index images in real time with CocoIndex and a vision model: generate multimodal embeddings and build a vector index for efficient retrieval.
Build a semantic text index with CocoIndex and text embeddings, then query it with natural language: a beginner's guide to embeddings and vector search.
CocoIndex updates: knowledge graph support, Qdrant and Supabase targets, KTable and LTable data types, additional LLM providers, and more.
Step-by-step tutorial to build text embeddings from Google Drive docs with CocoIndex, including service-account setup, and store them in Postgres for semantic search and RAG.
Index a codebase for RAG in real time with CocoIndex and Tree-sitter: chunk the code, embed it, and build a vector index for semantic search and efficient retrieval.
What customizable data indexing pipelines are, and why custom transformation logic matters, explained through clear comparisons and practical CocoIndex examples.
CocoIndex is a data indexing platform for AI applications, handling ingestion, chunking, embedding, and pipeline management for RAG, semantic search, and knowledge graphs with built-in lineage and observability.