Worked examples and implementation guides for CocoIndex.
CocoIndex now ships a Kafka target connector. We walk through a tiny live pipeline that watches a folder of CSV files and publishes each row as a JSON message to a StreamNative-hosted Kafka topic — incrementally, with no glue code.
Build a pipeline that converts YouTube podcasts into a structured knowledge graph: extract speakers, statements, and entities with an LLM, then resolve duplicates with embeddings.
A multi-source pipeline that ingests SEC filings (TXT, JSON, PDF), scrubs PII, extracts topics, and powers hybrid search with CocoIndex + Apache Doris.
Automatically generates a wiki page for each project in your codebase, and keeps it fresh with incremental processing.
Turn slide decks into a continuously updated multimodal dataset — extract speaker notes, synthesize narration, keep LanceDB in sync.
Extract Pydantic-typed structured data from patient intake forms using DSPy and CocoIndex, with vision-model OCR and incremental processing.
Most companies sit on an ocean of meeting notes. Inside those documents are decisions, tasks, owners, and relationships: an untapped knowledge graph that is constantly changing.
Build a real-time HackerNews trending topics detector with CocoIndex — a deep dive into Custom Sources and AI-powered topic extraction.
Build a custom incremental HackerNews connector with CocoIndex's Custom Source API and export to Postgres for semantic search and analytics.
How to use BAML and CocoIndex to extract structured data from patient intake forms in PDF/Word with an LLM, continuously, for production.
Extract, embed, and store multimodal PDF elements — text with SentenceTransformers, images with CLIP — for unified semantic search with traceable metadata.
Build an incremental AI pipeline that extracts invoice fields from PDFs in Azure Blob Storage and loads them into Snowflake — with CocoIndex, OpenAI GPT-4o, and a ~50-line custom Snowflake target. Open-source alternative to Snowflake Openflow and Cortex Document AI for unstructured ETL.
Define query handlers in CocoIndex and trace search results back to source data in CocoInsight — close the loop on indexing strategy.
Build unified, incrementally updated search and analytics over structured + unstructured data in PostgreSQL with CocoIndex.
Build a unified visual document index from multiple file formats, including PDFs, images, and slides, using CocoIndex and ColPali. No OCR needed.
CocoIndex now supports native integration with ColPali — enabling multi-vector, patch-level image indexing.
CocoIndex now officially supports custom targets — giving you the power to export data to any destination, whether it's a local file, cloud storage, a REST API, or your own bespoke system.
Build a scalable face detection and recognition pipeline with CocoIndex — embed faces, structure for search, and export to a vector DB.
How to index academic research papers by extracting metadata (e.g., title, authors, abstract) for AI agents and AI workflows using LLMs and CocoIndex.
Build a real-time knowledge graph with Kuzu as a native CocoIndex target — incremental updates, high-performance graph queries.
Build a real-time data transformation pipeline with Amazon S3 and SQS using CocoIndex — incremental indexing on object storage.
Index images in real time with CocoIndex and a vision model: create multimodal embeddings and build a vector index for efficient retrieval.
Index text with CocoIndex and text embeddings, then query it in natural language.
Build a real-time product recommendation engine with an LLM and a graph database, based on understanding product categories (taxonomy).
CocoIndex now supports knowledge graphs with incremental processing. Building live knowledge for agents is super easy with CocoIndex!
Extract structured data from patient intake forms in PDF/Word using an LLM with CocoIndex.
A tutorial on creating text embeddings from documents on Google Drive and saving them in a vector store for semantic search / RAG, using CocoIndex.
Index a codebase for RAG in real time with CocoIndex and Tree-sitter: chunking, embedding, semantic search, and building a vector index for efficient retrieval.
Learn to use CocoIndex to extract structured data from PDF/Markdown with Ollama's local LLMs, all running on-premises without sending data to external APIs.