ChromaDB
Exports data to a ChromaDB collection with vector search support.
Data Mapping
Here's how CocoIndex data elements map to ChromaDB elements during export:
| CocoIndex Element | ChromaDB Element |
|---|---|
| an export target | a unique collection |
| a collected row | a document |
| a vector field | the embedding |
| a field matching document_field | the document content |
| other fields | metadata key-value pairs |
ChromaDB supports one embedding per document. Exactly one vector field must be present in the value schema — it becomes the embedding. Non-vector fields become metadata, except the field named by document_field (if set), which is stored as ChromaDB's document content and enables its built-in text search.
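The mapping above can be sketched in plain Python. The helper below is hypothetical (`split_row` is not part of CocoIndex or ChromaDB) and only illustrates how one collected row would be divided into the pieces ChromaDB stores. It assumes the primary key becomes the document ID as a string, and the field names are borrowed from the example later on this page.

```python
from typing import Any, Dict, Optional


def split_row(
    row: Dict[str, Any],
    primary_key: str,
    vector_field: str,
    document_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Illustrative only: split one collected row into ChromaDB's parts."""
    # Everything that is not the key, the vector, or the document text
    # ends up as a metadata key-value pair.
    reserved = {primary_key, vector_field, document_field}
    metadata = {name: value for name, value in row.items() if name not in reserved}
    return {
        "id": str(row[primary_key]),  # assumption: IDs are stored as strings
        "embedding": row[vector_field],
        "document": row[document_field] if document_field else None,
        "metadata": metadata,
    }


row = {"id": "doc-1", "filename": "a.md", "text": "hello", "text_embedding": [0.1, 0.2]}
parts = split_row(row, "id", "text_embedding", document_field="text")
```

With document_field set to "text", only filename remains in the metadata. Without it, text would be stored as metadata as well.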
This target is provided via an optional dependency [chromadb]:
pip install "cocoindex[chromadb]"
To use it, import the submodule cocoindex.targets.chromadb:
import cocoindex.targets.chromadb as coco_chromadb
Spec
The spec coco_chromadb.ChromaDB takes the following fields:
Collection
- collection_name (str, required): The name of the collection to export data to.
- document_field (str, optional): Name of the value field to pass as ChromaDB's documents parameter instead of metadata. Enables ChromaDB's built-in text search on that field.
Client
- client_type (coco_chromadb.ClientType, optional, default: PERSISTENT): Which ChromaDB client to use:
  - PERSISTENT: local on-disk storage via PersistentClient.
  - HTTP: connects to a remote ChromaDB server via HttpClient.
  - CLOUD: connects to Chroma Cloud via CloudClient.
- path (str, optional, default: "./chromadb_data"): Data directory. Used with the PERSISTENT client.
- host (str, optional, default: "localhost"): Server host. Used with the HTTP client.
- port (int, optional, default: 8000): Server port. Used with the HTTP client.
- ssl (bool, optional, default: False): Whether to use SSL. Used with the HTTP client.
- api_key (str, optional): API key for authentication. Required when using the CLOUD client.
- tenant (str, optional): Chroma tenant (defaults to Chroma's default tenant).
- database (str, optional): Chroma database (defaults to Chroma's default database).
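As a rough guide to which fields matter for which client type, the sketch below resolves a set of spec values into the Chroma constructor name and the keyword arguments it would receive. It is a plain-Python illustration, not CocoIndex's implementation: `resolve_client`, `DEFAULTS`, and `FIELDS_BY_CLIENT` are hypothetical names, and the grouping of tenant and database under every client type is an assumption.

```python
from typing import Any, Dict, Tuple

# Defaults copied from the field list above.
DEFAULTS = {"path": "./chromadb_data", "host": "localhost", "port": 8000, "ssl": False}

# Assumed grouping of spec fields per client type (illustrative only).
FIELDS_BY_CLIENT = {
    "PERSISTENT": ("PersistentClient", ["path", "tenant", "database"]),
    "HTTP": ("HttpClient", ["host", "port", "ssl", "tenant", "database"]),
    "CLOUD": ("CloudClient", ["api_key", "tenant", "database"]),
}


def resolve_client(client_type: str = "PERSISTENT", **spec: Any) -> Tuple[str, Dict[str, Any]]:
    """Return the Chroma constructor name and the keyword arguments for it."""
    constructor, fields = FIELDS_BY_CLIENT[client_type]
    if client_type == "CLOUD" and not spec.get("api_key"):
        raise ValueError("api_key is required when client_type is CLOUD")
    kwargs = {}
    for name in fields:
        value = spec.get(name, DEFAULTS.get(name))
        if value is not None:  # unset tenant/database fall back to Chroma's defaults
            kwargs[name] = value
    return constructor, kwargs


local = resolve_client()
remote = resolve_client("HTTP", host="chroma.internal", port=8443, ssl=True)
```

Here `local` resolves to PersistentClient with only the default path, while `remote` carries the three HTTP-specific settings. The host name is a placeholder.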
HNSW Index
- hnsw_config (coco_chromadb.HnswConfig, optional): HNSW index tuning parameters.
  - m (int, optional): Number of bi-directional links per element.
  - ef_construction (int, optional): Size of the dynamic candidate list during index construction.
  - ef_search (int, optional): Size of the dynamic candidate list during search.
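For orientation, Chroma has accepted HNSW settings as `hnsw:`-prefixed collection metadata keys. The sketch below assumes that convention to show how the three parameters might be translated. Whether CocoIndex passes them this way is not stated here, and `hnsw_metadata` is a hypothetical helper.

```python
from typing import Dict, Optional


def hnsw_metadata(
    m: Optional[int] = None,
    ef_construction: Optional[int] = None,
    ef_search: Optional[int] = None,
) -> Dict[str, int]:
    """Illustrative only: build Chroma-style collection metadata for HNSW tuning."""
    candidates = {
        "hnsw:M": m,
        "hnsw:construction_ef": ef_construction,
        "hnsw:search_ef": ef_search,
    }
    # Unset parameters are omitted so Chroma's own defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}


tuned = hnsw_metadata(m=32, ef_construction=200)
```

Larger values of m and ef_construction generally trade index build time and memory for recall, and ef_search makes the same trade at query time. The numbers above are arbitrary.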
Additional notes:
- Exactly one primary key field is required.
- Exactly one vector field is required — ChromaDB stores a single embedding per document.
- Supported distance metrics: COSINE_SIMILARITY, L2_DISTANCE, INNER_PRODUCT.
- Complex metadata values (lists, dicts, etc.) are JSON-serialized automatically.
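The last two notes can be illustrated with a small sketch. It assumes the three metrics correspond to Chroma's "cosine", "l2", and "ip" distance spaces, and that any value other than a string, number, or boolean is serialized with `json.dumps`. Both `SPACE_BY_METRIC` and `to_chroma_metadata` are hypothetical names, and the exact serialization CocoIndex applies may differ.

```python
import json
from typing import Any, Dict

# Assumed correspondence between CocoIndex metrics and Chroma distance spaces.
SPACE_BY_METRIC = {
    "COSINE_SIMILARITY": "cosine",
    "L2_DISTANCE": "l2",
    "INNER_PRODUCT": "ip",
}


def to_chroma_metadata(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative only: keep scalars as-is, JSON-encode everything else."""
    scalar_types = (str, int, float, bool)
    return {
        name: value if isinstance(value, scalar_types) else json.dumps(value)
        for name, value in fields.items()
    }


meta = to_chroma_metadata({"filename": "a.md", "tags": ["x", "y"], "page": 3})
```

A consumer reading the collection back would need to call `json.loads` on fields such as tags to recover the original list.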
Example
import cocoindex
import cocoindex.targets.chromadb as coco_chromadb

@cocoindex.flow_def(name="TextEmbeddingWithChromaDB")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # ... source and transformations ...
    doc_embeddings = data_scope.add_collector()
    # ... collect fields: id, filename, text, text_embedding ...
    doc_embeddings.export(
        "doc_embeddings",
        coco_chromadb.ChromaDB(
            collection_name="text_embedding",
            path="./chromadb_data",
            document_field="text",
        ),
        primary_key_fields=["id"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                "text_embedding",
                cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
            )
        ],
    )