
5 posts tagged with "Insight"

Articles sharing insights and observations on data indexing and pipelines


Thinking in Rust: Ownership, Access, and Memory Safety

9 min read


I'm an experienced C++ programmer, but I still feel I need a mindset shift when starting to work with Rust.

  • References can be immutable (&) or mutable (&mut), which is straightforward in the simplest cases. Things get more complex with data types that have interior mutability (e.g. RefCell) and with multi-level references (e.g. & &mut).
  • Rust also models memory safety in multi-threaded scenarios through the Send and Sync traits. Their rules interact intricately with different kinds of references: when is &T Send or Sync? What about &mut T?
  • There are various data types, like Rc, Arc, Cell, RefCell, Mutex, RwLock, and Cow. When should you pick which?
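
A small, self-contained sketch can make some of these rules concrete. It uses only the Rust standard library, and the helper names (`assert_send`, `assert_sync`, `push_via_shared`, `len_through_double_ref`) are illustrative, not taken from the post. It shows interior mutability through `RefCell`, the limits of a `& &mut` reference, compile-time checks of `Send` for references, and `Arc<Mutex<T>>` for sharing mutable state across threads.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

// Compile-time checks: each call only compiles if T implements the trait.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Interior mutability: mutate through a shared `&` reference via RefCell,
// which moves the borrow check from compile time to run time.
fn push_via_shared(log: &RefCell<Vec<String>>, msg: &str) {
    log.borrow_mut().push(msg.to_string());
}

// Multi-level reference: through `& &mut T` the value can be read,
// but not mutated, because the outer `&` only grants shared access.
fn len_through_double_ref(r: &&mut Vec<i32>) -> usize {
    // r.push(4); // does not compile: `**r` is behind a `&` reference
    r.len()
}

fn main() {
    // `&T: Send` requires `T: Sync`; `&mut T: Send` requires `T: Send`.
    assert_send::<&i32>();
    assert_send::<&mut Vec<i32>>();
    // RefCell<T> is Send (when T is) but never Sync, so `&RefCell<T>` is not Send.
    assert_send::<RefCell<i32>>();
    // assert_send::<&RefCell<i32>>(); // would fail to compile
    // Mutex<T> is Sync when T: Send, which lets Arc<Mutex<T>> be shared across threads.
    assert_sync::<Mutex<i32>>();

    let log = RefCell::new(Vec::new());
    push_via_shared(&log, "hello");
    push_via_shared(&log, "world");
    println!("{:?}", log.borrow()); // ["hello", "world"]

    let mut v = vec![1, 2, 3];
    let m = &mut v;
    println!("len = {}", len_through_double_ref(&m)); // len = 3

    let counter = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                *c.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("counter = {}", *counter.lock().unwrap()); // counter = 4
}
```

The commented-out lines mark cases the compiler rejects; uncommenting them is a quick way to see the corresponding error messages.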

What Makes Indexing Pipelines Different?

3 min read


When building data processing systems, it's easy to assume all pipelines are alike: they take data in, transform it, and produce outputs. However, indexing pipelines have unique characteristics that set them apart from traditional ETL, analytics, or transactional systems. Let's explore what makes indexing special.

Data Consistency in Indexing Pipelines

7 min read


An indexing pipeline builds indexes derived from source data. The index should always converge to the current version of the source data. In other words, once a new version of the source data has been processed by the pipeline, no data derived from previous versions should remain in the target index storage. This is called the data consistency requirement for an indexing pipeline.
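
One way to picture this requirement is a minimal in-memory sketch, assuming the pipeline keeps a tracking table from each source key to the index entries its latest version produced. `IndexState`, `derive`, and `process` are hypothetical names, and splitting a document into one entry per non-empty line stands in for a real transformation. When a new version (or a deletion) of a source arrives, entries derived from the previous version that the new version no longer produces are deleted, then the new entries are upserted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model of a target index plus a tracking table
/// recording which index entries each source key produced.
#[derive(Default)]
struct IndexState {
    // Target index: entry key -> indexed value.
    target: HashMap<String, String>,
    // Tracking table: source key -> keys of entries derived from its latest version.
    derived: HashMap<String, HashSet<String>>,
}

impl IndexState {
    /// Hypothetical transformation: one entry per non-empty line,
    /// keyed by source key and line position.
    fn derive(source_key: &str, content: &str) -> Vec<(String, String)> {
        content
            .lines()
            .filter(|l| !l.trim().is_empty())
            .enumerate()
            .map(|(i, l)| (format!("{source_key}#{i}"), l.trim().to_string()))
            .collect()
    }

    /// Process a new version of a source, or its deletion when `content` is None.
    fn process(&mut self, source_key: &str, content: Option<&str>) {
        let new_entries = content
            .map(|c| Self::derive(source_key, c))
            .unwrap_or_default();
        let new_keys: HashSet<String> = new_entries.iter().map(|(k, _)| k.clone()).collect();

        // Delete entries derived from the previous version that the new version no longer produces.
        if let Some(old_keys) = self.derived.remove(source_key) {
            for stale in old_keys.difference(&new_keys) {
                self.target.remove(stale);
            }
        }
        // Upsert entries derived from the new version.
        for (k, v) in new_entries {
            self.target.insert(k, v);
        }
        if !new_keys.is_empty() {
            self.derived.insert(source_key.to_string(), new_keys);
        }
    }
}

fn main() {
    let mut index = IndexState::default();
    index.process("doc1", Some("alpha\nbeta\ngamma"));
    assert_eq!(index.target.len(), 3);

    // The new version is shorter, so the entry derived from "gamma" must disappear.
    index.process("doc1", Some("alpha\nbeta v2"));
    assert_eq!(index.target.len(), 2);
    assert_eq!(index.target.get("doc1#1").map(String::as_str), Some("beta v2"));
    assert!(!index.target.contains_key("doc1#2"));

    // Deleting the source removes everything derived from it.
    index.process("doc1", None);
    assert!(index.target.is_empty());
}
```

A production pipeline would likely keep the tracking table in durable storage and coordinate the deletes and upserts with the target, so that a crash between the two steps does not leave stale entries behind; this sketch ignores those concerns.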

Data Indexing and Common Challenges

5 min read


At its core, data indexing is the process of transforming raw data into a format that's optimized for retrieval. Unlike an arbitrary application that may generate new source-of-truth data, an indexing pipeline processes existing data in various ways while maintaining traceability back to the original source. This intrinsic nature, being a derivative rather than a source of truth, creates unique challenges and requirements.
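
As an illustration of a derived, traceable index, consider a tiny inverted index built with only the standard library. `SourceDoc`, `build_index`, and `search` are hypothetical names; the source documents play the role of the source of truth, and every posting stores the ID of the document it came from, so each hit can be resolved back to its origin and the whole index can be rebuilt from source data at any time.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical source record: the source of truth, owned elsewhere.
struct SourceDoc {
    id: u32,
    text: String,
}

/// Inverted index: term -> IDs of the source documents containing it.
/// Every posting points back to the source record it was derived from.
type InvertedIndex = BTreeMap<String, BTreeSet<u32>>;

/// Build the index purely from source data, so it can always be rebuilt.
fn build_index(docs: &[SourceDoc]) -> InvertedIndex {
    let mut index = InvertedIndex::new();
    for doc in docs {
        for term in doc.text.split_whitespace() {
            index.entry(term.to_lowercase()).or_default().insert(doc.id);
        }
    }
    index
}

/// Retrieval: look up a term and resolve the hits back to their source documents.
fn search<'a>(index: &InvertedIndex, docs: &'a [SourceDoc], term: &str) -> Vec<&'a SourceDoc> {
    match index.get(term.to_lowercase().as_str()) {
        Some(ids) => docs.iter().filter(|d| ids.contains(&d.id)).collect(),
        None => Vec::new(),
    }
}

fn main() {
    let docs = vec![
        SourceDoc { id: 1, text: "Rust ownership rules".into() },
        SourceDoc { id: 2, text: "indexing pipelines in Rust".into() },
    ];
    let index = build_index(&docs);
    for doc in search(&index, &docs, "rust") {
        println!("{}: {}", doc.id, doc.text);
    }
}
```

Because `build_index` is a pure function of the sources, calling it again on the same documents yields an identical index, which reflects the index's role as a derivative rather than a source of truth.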