Academic Papers Indexing
Build a real-time academic papers index. Extract metadata, chunk and embed abstracts, and enable semantic and author-based search over academic PDFs.
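The author-based half of that search can be pictured as a plain lookup table from author name to papers. The following is a minimal sketch under stated assumptions, not the example's actual code: the `Paper` record, `normalize`, `build_author_index`, and `papers_by_author` are hypothetical names introduced for illustration, and metadata extraction from the PDFs is assumed to have already happened.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record holding metadata already extracted from a PDF.
    filename: str
    title: str
    authors: list = field(default_factory=list)


def normalize(name):
    # Lowercase and collapse whitespace so "Ada  Lovelace" matches "ada lovelace".
    return " ".join(name.lower().split())


def build_author_index(papers):
    # Map each normalized author name to the filenames of their papers.
    index = defaultdict(list)
    for paper in papers:
        for author in paper.authors:
            index[normalize(author)].append(paper.filename)
    return index


def papers_by_author(index, name):
    # Return the filenames for an author, or an empty list if unknown.
    return index.get(normalize(name), [])
```

A production index would usually keep this mapping in a database table next to the abstract embeddings, so a single query can combine an author filter with semantic ranking.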
Use Google Document AI to parse documents, embed the resulting text, and store it in a vector database for semantic search.
Build image search index with ColPali and FastAPI
Build a visual document indexing pipeline using ColPali to index scanned documents, PDFs, academic papers, presentation slides, and standalone images, all mixed together with charts, tables, and figures, into the same vector space.
Extract and embed faces from images, structure the data for visual search, and export it to a vector database for face similarity queries.
Build a real-time codebase index for retrieval-augmented generation (RAG) using CocoIndex and Tree-sitter. Chunk, embed, and search code with semantic understanding.
Index text with CocoIndex and text embeddings, and query it with natural language.
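The chunk, embed, and query flow can be sketched in a few lines of plain Python. This is an illustrative sketch only and does not use the CocoIndex API: `chunk_text`, `toy_embed`, `build_index`, and `search` are hypothetical helpers, and the bag-of-words `toy_embed` is a stand-in for a real text embedding model, so it matches on shared words rather than on meaning.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    # Split text into overlapping windows of words.
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def toy_embed(text):
    # Stand-in for an embedding model: a sparse word-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    # docs maps a document name to its text; one index row per chunk.
    return [
        (name, chunk, toy_embed(chunk))
        for name, text in docs.items()
        for chunk in chunk_text(text)
    ]


def search(index, query, top_k=3):
    # Rank chunks by similarity to the query and return the best matches.
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, vec), name, chunk) for name, chunk, vec in index]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]
```

Swapping `toy_embed` for a sentence embedding model and the in-memory list for a vector store would keep the same overall shape: chunk, embed, store, then embed the query and rank by similarity.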
Use a PostgreSQL table as the source, transform its rows with both AI models and non-AI data mappings, and write the results to PostgreSQL/PgVector for semantic and structured search.