Turning documents, PDFs, and free text into typed, structured data.
Build a pipeline that turns YouTube podcasts into a knowledge graph: extract speakers, statements, and entities with an LLM, then dedupe them with embeddings.
A multi-source pipeline that ingests SEC filings (TXT, JSON, PDF), scrubs PII, extracts topics, and powers hybrid search with CocoIndex + Apache Doris.
Build a CocoIndex pipeline that generates a wiki page for each project in your codebase using an LLM, and keeps it fresh with incremental processing.
Turn slide decks into a continuously updated multimodal dataset with CocoIndex: extract speaker notes with Gemini Vision, synthesize narration with Piper TTS, and keep LanceDB in sync.
Featuring production-ready resilience, a structured error system, expanded integrations, and always-fresh structured context for agents operating in the real world.
Extract Pydantic-typed structured data from patient intake forms using DSPy and CocoIndex: OCR vision models with incremental processing.
Build a real-time HackerNews trending topics detector with CocoIndex: a deep dive into Custom Sources and AI-powered topic extraction.
How to use BAML and CocoIndex to continuously extract structured data from patient intake forms (PDF/Word) with LLMs in production.
Production-ready upgrades: durable execution, faster incremental processing over large datasets, GPU isolation, and richer native building blocks.
Extract invoice fields from PDFs in Azure Blob Storage and load them into Snowflake with an incremental CocoIndex + GPT-4o pipeline: open-source unstructured ETL.
How to index academic research papers by extracting metadata (e.g., title, authors, abstract) for AI agents and AI workflows using LLMs and CocoIndex.
Build a real-time product recommendation engine with an LLM and a graph database, based on an understanding of product categories (taxonomy).
CocoIndex now supports knowledge graphs with incremental processing. Building live knowledge for agents is super easy with CocoIndex!
CocoIndex updates: incremental processing with live update mode, evaluation utilities, date/time types, a Google Drive source, and core performance improvements.
Extract structured data from patient intake forms in PDF and Word documents using an LLM and CocoIndex: a practical healthcare document extraction example.
First release of CocoIndex Changelog: LLM support, codebase indexing, custom functions, and assorted core/performance improvements.
Learn to use CocoIndex to extract structured data from PDF/Markdown with Ollama's local LLMs, all running on-premises without sending data to external APIs.