LanceDB

Exports data to a LanceDB table.

Data Mapping

Here's how CocoIndex data elements map to LanceDB elements during export:

CocoIndex Element | LanceDB Element
an export target  | a unique table
a collected row   | a row
a field           | a column

Installation and import

This target is provided via an optional dependency [lancedb]:

pip install "cocoindex[lancedb]"

To use it, import the submodule cocoindex.targets.lancedb:

import cocoindex.targets.lancedb as coco_lancedb

Spec

The spec coco_lancedb.LanceDB takes the following fields:

  • db_uri (str, required): The LanceDB database location (e.g. ./lancedb_data).
  • table_name (str, required): The name of the table to export the data to.
  • db_options (coco_lancedb.DatabaseOptions, optional): Advanced database options.
    • storage_options (dict[str, Any], optional): Passed through to LanceDB when connecting.
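
As a rough sketch of how these fields are typically supplied, the helper below exports a collector to a LanceDB table. The helper name `export_to_lancedb`, the target name `doc_embeddings`, the table name `TextEmbedding`, and the primary key field `id` are illustrative assumptions; `collector` is assumed to be a CocoIndex data collector created inside a flow definition.

```python
def export_to_lancedb(collector, db_uri: str = "./lancedb_data") -> None:
    """Export a collector's rows to a LanceDB table (illustrative helper)."""
    # Imported lazily so this module loads even when the optional
    # `cocoindex[lancedb]` dependency is not installed.
    import cocoindex.targets.lancedb as coco_lancedb

    collector.export(
        "doc_embeddings",  # export target name (assumed for illustration)
        coco_lancedb.LanceDB(
            db_uri=db_uri,               # LanceDB database location
            table_name="TextEmbedding",  # table to write to (assumed name)
        ),
        # LanceDB targets require exactly one primary key field;
        # this assumes every collected row carries an `id` field.
        primary_key_fields=["id"],
    )
```

Each field collected on `collector` becomes a column of the target table, and each collected row becomes a table row.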

Additional notes:

  • Exactly one primary key field is required for LanceDB targets. CocoIndex creates a B-Tree index on this key column.

Note: LanceDB cannot build a vector index on an empty table (see LanceDB issue #4034). To use vector indexes, run the flow once to populate the target table with data, and then create the vector indexes.
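
The populate-then-index workflow can be sketched as a small helper that only requests an index once the table has rows. This is a minimal sketch, assuming `table` is a `lancedb.AsyncTable` (for example, one opened through a LanceDB async connection); the helper name is hypothetical, and passing `config=None` is assumed to let LanceDB pick its default index type for the column.

```python
async def create_vector_index_if_populated(table, column: str, config=None) -> bool:
    """Create an index on `column` only if `table` already holds rows.

    Returns True if index creation was requested, False if the table is empty.
    """
    # LanceDB cannot build a vector index on an empty table, so check first.
    if await table.count_rows() == 0:
        return False
    # `config` may be an index configuration object (e.g. from `lancedb.index`);
    # None is assumed to let LanceDB choose a default for the column type.
    await table.create_index(column, config=config)
    return True
```

After the flow has run once, this could be called as `await create_vector_index_if_populated(table, "embedding")`, where `embedding` stands in for the name of your vector column.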

You can find an end-to-end example here: examples/text_embedding_lancedb.

connect_async() helper

We provide a helper to obtain a shared AsyncConnection that is reused across your process and shared with CocoIndex's writer for strong read-after-write consistency:

from cocoindex.targets import lancedb as coco_lancedb

db = await coco_lancedb.connect_async("./lancedb_data")
table = await db.open_table("TextEmbedding")

Signature:

def connect_async(
    db_uri: str,
    *,
    db_options: coco_lancedb.DatabaseOptions | None = None,
    read_consistency_interval: datetime.timedelta | None = None,
) -> lancedb.AsyncConnection

When connect_async() is called again with the same db_uri, it returns the existing connection instance instead of establishing a new one. This gives strong consistency between your indexing and querying logic when they run in the same process.
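
To illustrate the reuse behavior, here is a minimal sketch of a per-URI connection cache. It is not CocoIndex's actual implementation: `get_shared_connection`, `_connections`, and the `open_connection` callback are hypothetical names, and a real helper would also have to account for differing connection options.

```python
from typing import Any, Awaitable, Callable

# Process-wide cache: one connection object per database URI (hypothetical).
_connections: dict[str, Any] = {}


async def get_shared_connection(
    db_uri: str,
    open_connection: Callable[[str], Awaitable[Any]],
) -> Any:
    """Return the cached connection for `db_uri`, opening it on first use."""
    conn = _connections.get(db_uri)
    if conn is None:
        # First request for this URI: open a connection and remember it.
        # (Simplified: concurrent first calls are not guarded against here.)
        conn = await open_connection(db_uri)
        _connections[db_uri] = conn
    return conn
```

With a cache like this, two calls with the same `db_uri` return the same object, so the writer and the query code in one process work through a single connection.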

Example

Text Embedding LanceDB Example