Customizable Data Indexing Pipelines
CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing. So, what is custom transformation logic?
When building data processing systems, it's easy to think all pipelines are similar - they take data in, transform it, and produce outputs. However, indexing pipelines have unique characteristics that set them apart from traditional ETL, analytics, or transactional systems. Let's explore what makes indexing special.
An indexing pipeline builds indexes derived from source data. The index should always converge to the current version of the source data: once a new version of the source data is processed by the pipeline, all data derived from previous versions should no longer exist in the target index storage. This is the data consistency requirement for an indexing pipeline.
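To make the requirement concrete, here is a minimal Python sketch - not CocoIndex's actual API; the `reindex`, `delete`, and `transform` names are hypothetical - in which reprocessing a source item replaces everything previously derived from it, so nothing from an older version can linger in the index:

```python
# Minimal sketch (hypothetical, not CocoIndex's API) of the consistency
# requirement: reprocessing a source item must remove everything derived
# from its previous version before the new derived rows become visible.

from typing import Callable

# target_index maps source_id -> derived records (e.g. chunks + embeddings)
target_index: dict[str, list[dict]] = {}

def reindex(source_id: str, source_text: str,
            transform: Callable[[str], list[dict]]) -> None:
    """Rebuild all derived records for one source item.

    Replacing the entry wholesale guarantees that rows derived from an
    older version of the source never survive an update.
    """
    derived = transform(source_text)   # e.g. chunking + embedding
    target_index[source_id] = derived  # swap old -> new in one step

def delete(source_id: str) -> None:
    """When a source item disappears, its derived rows must disappear too."""
    target_index.pop(source_id, None)

# Usage: updating doc-1 discards the chunks built from its previous version.
chunker = lambda text: [{"chunk": c} for c in text.split(". ") if c]
reindex("doc-1", "Old sentence one. Old sentence two.", chunker)
reindex("doc-1", "New sentence.", chunker)
assert all("Old" not in r["chunk"] for r in target_index["doc-1"])
```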
At its core, data indexing is the process of transforming raw data into a format that's optimized for retrieval. Unlike arbitrary applications that may generate new source-of-truth data, an indexing pipeline processes existing data in various ways while maintaining trackability back to the original source. This intrinsic nature - being a derivative rather than a source of truth - creates unique challenges and requirements.
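As an illustration of that trackability, here is a small, hypothetical sketch (again not CocoIndex-specific) where each derived record carries lineage metadata, so every entry in the index can be traced back to the exact source item and version it was computed from:

```python
# Hypothetical sketch: derived records carry lineage metadata so they can
# always be traced back to the source item and version they came from.

from dataclasses import dataclass

@dataclass
class DerivedRecord:
    source_id: str       # which source item this record was derived from
    source_version: str  # e.g. a content hash or modification timestamp
    chunk_text: str      # the retrieval-optimized payload

def derive(source_id: str, source_version: str, text: str) -> list[DerivedRecord]:
    """Transform one source item into retrieval-ready records with lineage."""
    return [
        DerivedRecord(source_id, source_version, sentence)
        for sentence in text.split(". ") if sentence
    ]

# Any record can be tracked back to its origin, and records whose
# source_version no longer matches the current source are known to be stale.
records = derive("doc-1", "v2", "Indexing derives data. It never owns it.")
stale = [r for r in records if r.source_version != "v2"]  # -> empty here
```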