LanceDB
Exports data to a LanceDB table.
Data Mapping
Here's how CocoIndex data elements map to LanceDB elements during export:
| CocoIndex Element | LanceDB Element |
|---|---|
| an export target | a unique table |
| a collected row | a row |
| a field | a column |
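To make the mapping concrete, here is a minimal pure-Python sketch. It is an illustration, not CocoIndex code. Each collected row is treated as a dict, and the rows are pivoted into one list per field, which is how a columnar table such as LanceDB lays them out. The field names id, title, and content are hypothetical.

```python
from typing import Any

# Hypothetical collected rows: each dict is one collected row, each key a field.
collected_rows: list[dict[str, Any]] = [
    {"id": "doc-1", "title": "Intro", "content": "Hello LanceDB"},
    {"id": "doc-2", "title": "Usage", "content": "Export rows"},
]


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per field (i.e. per column).

    Assumes every row has the same set of fields, as rows of one target do.
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


table = rows_to_columns(collected_rows)
print(sorted(table))  # ['content', 'id', 'title']
print(table["id"])    # ['doc-1', 'doc-2']
```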
This target is provided via an optional dependency [lancedb]:
```shell
pip install "cocoindex[lancedb]"
```
To use it, you need to import the submodule cocoindex.targets.lancedb:
```python
import cocoindex.targets.lancedb as coco_lancedb
```
Spec
The spec coco_lancedb.LanceDB takes the following fields:
- db_uri (str, required): The LanceDB database location (e.g. ./lancedb_data).
- table_name (str, required): The name of the table to export the data to.
- db_options (coco_lancedb.DatabaseOptions, optional): Advanced database options.
  - storage_options (dict[str, Any], optional): Passed through to LanceDB when connecting.
Additional notes:
- Exactly one primary key field is required for LanceDB targets. CocoIndex creates a B-Tree index on this key column.
- Full-Text Search (FTS) indexes are supported via the fts_indexes parameter. Note that FTS functionality requires LanceDB Enterprise. You can pass any parameters supported by the target's FTS index creation API (e.g. tokenizer_name for LanceDB). See the LanceDB FTS documentation for full parameter details.
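The single primary key is what lets a row be found, replaced, or removed when its source changes. CocoIndex keeps targets in sync incrementally, which roughly amounts to applying upserts and deletions keyed by that column. The sketch below models this behavior in memory. It uses a Python dict as a stand-in for the key index, and apply_changes is a hypothetical helper. This is not how CocoIndex or LanceDB implement it.

```python
from typing import Any

# Hypothetical stand-in for a target table keyed by its single primary key.
# The dict plays the role of the key index: key -> full row.
table: dict[str, dict[str, Any]] = {}


def apply_changes(
    upserts: list[dict[str, Any]],
    deletes: list[str],
    key_field: str = "id",
) -> None:
    """Apply one batch of changes: insert-or-replace rows, then delete keys."""
    for row in upserts:
        table[row[key_field]] = row  # same key -> row is replaced, not duplicated
    for key in deletes:
        table.pop(key, None)  # deleting a missing key is a no-op


apply_changes([{"id": "a", "text": "v1"}, {"id": "b", "text": "x"}], [])
apply_changes([{"id": "a", "text": "v2"}], ["b"])
print(table)  # {'a': {'id': 'a', 'text': 'v2'}}
```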
LanceDB cannot build a vector index on an empty table (see LanceDB issue #4034). If you want to use vector indexes, run the flow once to populate the target table with data, and then create the vector indexes.
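One way to follow this advice is to create the index only after checking the row count. This sketch assumes LanceDB's async table API, specifically AsyncTable.count_rows() and AsyncTable.create_index(). It also assumes a table opened through coco_lancedb.connect_async() and a hypothetical vector column named embedding. When no config is passed, LanceDB is assumed to pick a default index type for the column. A small fake table stands in for a real AsyncTable so the logic can run without a database.

```python
import asyncio
from typing import Any


async def create_vector_index_if_populated(table: Any, column: str = "embedding") -> bool:
    """Create an index on `column` only if the table already has rows.

    `table` is expected to behave like a LanceDB AsyncTable; `column` is a
    hypothetical vector column name.
    """
    if await table.count_rows() == 0:
        # LanceDB cannot build a vector index on an empty table; run the flow first.
        return False
    await table.create_index(column)  # no config: LanceDB chooses a default
    return True


class _FakeTable:
    """Minimal stand-in for an AsyncTable, used only to exercise the helper."""

    def __init__(self, rows: int) -> None:
        self.rows = rows
        self.indexed: list[str] = []

    async def count_rows(self) -> int:
        return self.rows

    async def create_index(self, column: str) -> None:
        self.indexed.append(column)


empty, populated = _FakeTable(0), _FakeTable(3)
print(asyncio.run(create_vector_index_if_populated(empty)))      # False
print(asyncio.run(create_vector_index_if_populated(populated)))  # True
```

Against a real database, you would pass the table returned by await db.open_table(...) instead of the fake.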
You can find an end-to-end example here: examples/text_embedding_lancedb.
FTS Index Example
```python
import cocoindex
import cocoindex.targets.lancedb as coco_lancedb

@cocoindex.flow_def(name="DocumentSearchFlow")
def document_search_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # ... source and transformations ...
    doc_collector = data_scope.add_collector()
    # ... collect document data ...
    doc_collector.export(
        "documents",
        coco_lancedb.LanceDB(
            db_uri="./lancedb_data",
            table_name="documents",
        ),
        primary_key_fields=["id"],
        # Add FTS indexes for full-text search
        fts_indexes=[
            # Basic FTS index with default tokenizer
            cocoindex.FtsIndexDef("content"),
            # FTS index with stemming for better search recall
            cocoindex.FtsIndexDef("description", parameters={"tokenizer_name": "en_stem"}),
            # FTS index with position tracking for phrase searches
            cocoindex.FtsIndexDef("title", parameters={"tokenizer_name": "default", "with_position": True}),
        ],
    )
```
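The with_position option in the example above stores token positions. Positions are what make phrase queries possible: a bag-of-words index can only tell you that every word occurs somewhere in a document, not that they occur in order. The toy positional inverted index below illustrates the idea. It is a conceptual sketch, not LanceDB's implementation, and a lowercase whitespace tokenizer stands in for LanceDB's tokenizers.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each token to {doc_id: [positions]} (a positional inverted index)."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def phrase_search(index: dict[str, dict[str, list[int]]], phrase: str) -> set[str]:
    """Return doc ids where the phrase tokens appear at consecutive positions."""
    tokens = phrase.lower().split()
    if not tokens:
        return set()
    # Candidate docs must contain every token (this alone is bag-of-words matching).
    candidates = set(index.get(tokens[0], {}))
    for tok in tokens[1:]:
        candidates &= set(index.get(tok, {}))
    hits = set()
    for doc_id in candidates:
        # Positions let us require token i to sit exactly i places after the start.
        for start in index[tokens[0]][doc_id]:
            if all(start + i in index[tok][doc_id] for i, tok in enumerate(tokens)):
                hits.add(doc_id)
                break
    return hits


docs = {
    "d1": "vector search with LanceDB",
    "d2": "search vector data quickly",
}
idx = build_index(docs)
print(phrase_search(idx, "vector search"))  # {'d1'}
```

Both documents contain "vector" and "search", so only the position check separates them.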
connect_async() helper
We provide a helper to obtain a shared AsyncConnection that is reused across your process and shared with CocoIndex's writer for strong read-after-write consistency:
```python
import asyncio
from cocoindex.targets import lancedb as coco_lancedb

async def main():
    db = await coco_lancedb.connect_async("./lancedb_data")
    table = await db.open_table("TextEmbedding")

asyncio.run(main())
```
Signature:
```python
async def connect_async(
    db_uri: str,
    *,
    db_options: coco_lancedb.DatabaseOptions | None = None,
    read_consistency_interval: datetime.timedelta | None = None,
) -> lancedb.AsyncConnection
```
When called again with the same db_uri, connect_async() reuses the existing connection instance instead of establishing a new one.
If your indexing and querying logic run in the same process, this gives strong consistency between them.