Qdrant
The qdrant connector provides utilities for writing points to Qdrant vector databases, with support for both single and named vectors, as well as multi-vector configurations.
from cocoindex.connectors import qdrant
This connector requires additional dependencies. Install with:
pip install cocoindex[qdrant]
Connection setup
create_client() creates a Qdrant client connection with optional gRPC support.
def create_client(
    url: str,
    *,
    prefer_grpc: bool = True,
    **kwargs: Any,
) -> QdrantClient
Parameters:
- url: Qdrant server URL (e.g., "http://localhost:6333").
- prefer_grpc: Whether to prefer gRPC over HTTP (default: True).
- **kwargs: Additional arguments passed directly to QdrantClient.
Returns: A Qdrant client instance.
Example:
client = qdrant.create_client("http://localhost:6333")
As target
The qdrant connector provides target state APIs for writing points to collections. CocoIndex tracks what points should exist and automatically handles upserts and deletions.
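To make "tracks what points should exist" more concrete, here is a minimal conceptual sketch of target-state reconciliation. It is not CocoIndex's actual implementation: the helper name `plan_changes` and the dict-based representation of points are assumptions made purely for illustration.

```python
from typing import Any


def plan_changes(
    previous: dict[str, Any],
    desired: dict[str, Any],
) -> tuple[dict[str, Any], set[str]]:
    """Return (points to upsert, point IDs to delete).

    Hypothetical helper: `previous` and `desired` map point IDs to the
    point content declared in the last run and in the current run.
    """
    # Upsert anything that is new or whose content changed.
    upserts = {
        point_id: content
        for point_id, content in desired.items()
        if point_id not in previous or previous[point_id] != content
    }
    # Delete anything that was declared before but is no longer declared.
    deletes = set(previous) - set(desired)
    return upserts, deletes


previous = {"a": [0.1, 0.2], "b": [0.3, 0.4]}
desired = {"b": [0.3, 0.5], "c": [0.9, 0.1]}
upserts, deletes = plan_changes(previous, desired)
print(sorted(upserts))  # ['b', 'c']
print(sorted(deletes))  # ['a']
```

The practical consequence is that application code only declares the points that should exist; it never issues explicit delete calls.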
Declaring target states
Database registration
Before declaring target states, register the Qdrant client with a stable key that identifies the logical database. This key allows CocoIndex to recognize the same database even when connection details change.
def register_db(key: str, client: QdrantClient) -> QdrantDatabase
Parameters:
- key: A stable identifier for this database (e.g., "vector_db"). Must be unique.
- client: A Qdrant client instance.
Returns: A QdrantDatabase handle for declaring target states.
Example:
client = qdrant.create_client("http://localhost:6333")
db = qdrant.register_db("my_vectors", client)
Collections (parent state)
Declares a collection as a target state. Returns a CollectionTarget for declaring points.
def QdrantDatabase.declare_collection_target(
    self,
    collection_name: str,
    schema: CollectionSchema,
    *,
    managed_by: Literal["system", "user"] = "system",
) -> CollectionTarget[coco.PendingS]
Parameters:
- collection_name: Name of the collection.
- schema: Schema definition specifying vector configurations (see Collection Schema).
- managed_by: Whether CocoIndex manages the collection lifecycle ("system") or assumes it exists ("user").
Returns: A pending CollectionTarget. Use mount_run(...).result() to wait for resolution.
Points (child states)
Once a CollectionTarget is resolved, declare points to be upserted using qdrant.PointStruct, which is an alias of qdrant_client.http.models.PointStruct:
def CollectionTarget.declare_point(
    self,
    point: qdrant.PointStruct,
) -> None
Parameters:
- point: A qdrant.PointStruct (alias of qdrant_client.http.models.PointStruct) containing:
  - id: Point ID (str, int, or UUID)
  - vector: Vector data (single vector or dict of named vectors)
  - payload: Optional metadata as a JSON-serializable dict
Collection schema
Define vector configurations for a collection using CollectionSchema. Unlike row-oriented databases, Qdrant uses a point-oriented model where each point has schemaless payload and one or more vectors with predefined dimensions.
class CollectionSchema:
    def __init__(
        self,
        vectors: QdrantVectorDef | dict[str, QdrantVectorDef],
    ) -> None
Parameters:
vectors: Either:
- A single QdrantVectorDef for an unnamed vector
- A dict mapping vector names to QdrantVectorDef for named vectors
QdrantVectorDef
Specifies vector configuration including dimension, distance metric, and multi-vector settings:
class QdrantVectorDef(NamedTuple):
    schema: VectorSchemaProvider | MultiVectorSchemaProvider
    distance: Literal["cosine", "dot", "euclid"] = "cosine"
    multivector_comparator: Literal["max_sim"] = "max_sim"
Parameters:
- schema: A VectorSchemaProvider or MultiVectorSchemaProvider that defines vector dimensions
- distance: Distance metric for similarity search (default: "cosine")
- multivector_comparator: Comparator for multi-vector fields (only applies to MultiVectorSchemaProvider)
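To give a feel for what the three `distance` options measure, the following sketch computes each of them for a pair of small vectors in plain Python. The function names are hypothetical and the code is independent of the connector; Qdrant performs these computations server-side.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Dot product: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of the length-normalized vectors: higher means more similar."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line (L2) distance: lower means more similar."""
    return math.dist(a, b)


a = [3.0, 4.0]
b = [4.0, 3.0]
print(dot_product(a, b))         # 24.0
print(cosine_similarity(a, b))   # 0.96
print(euclidean_distance(a, b))  # square root of 2, about 1.414
```

Note the direction of each score: for cosine and dot product a larger value is a closer match, whereas for Euclidean distance a smaller value is closer.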
Single (unnamed) vector
For collections with a single unnamed vector:
from cocoindex.ops.sentence_transformers import SentenceTransformerEmbedder
embedder = SentenceTransformerEmbedder("sentence-transformers/all-MiniLM-L6-v2")
schema = qdrant.CollectionSchema(
    vectors=qdrant.QdrantVectorDef(schema=embedder)
)
Points use the vector directly:
point = qdrant.PointStruct(
    id="doc-123",
    vector=embedding.tolist(),  # Single vector
    payload={"text": "...", "metadata": {...}},
)
Named vectors
For collections with multiple named vectors:
from cocoindex.resources.schema import VectorSchema
import numpy as np
schema = qdrant.CollectionSchema(
    vectors={
        "text_embedding": qdrant.QdrantVectorDef(
            schema=VectorSchema(dtype=np.float32, size=384),
            distance="cosine",
        ),
        "image_embedding": qdrant.QdrantVectorDef(
            schema=VectorSchema(dtype=np.float32, size=512),
            distance="dot",
        ),
    }
)
Points use a dict of vectors:
point = qdrant.PointStruct(
    id="doc-123",
    vector={
        "text_embedding": text_vec.tolist(),
        "image_embedding": image_vec.tolist(),
    },
    payload={"text": "...", "metadata": {...}},
)
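Because each named vector has its own predefined size, it can be useful to check a point's vectors against the schema before declaring it. The following is a minimal sketch in plain Python; `check_named_vectors` and the `expected_sizes` mapping are hypothetical helpers for illustration, not part of the connector.

```python
def check_named_vectors(
    vectors: dict[str, list[float]],
    expected_sizes: dict[str, int],
) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems: list[str] = []
    for name, values in vectors.items():
        if name not in expected_sizes:
            problems.append(f"unknown vector name: {name!r}")
        elif len(values) != expected_sizes[name]:
            problems.append(
                f"{name!r} has {len(values)} dimensions, "
                f"expected {expected_sizes[name]}"
            )
    return problems


# Sizes mirror the named-vector schema shown above (384 and 512).
expected = {"text_embedding": 384, "image_embedding": 512}
vectors = {"text_embedding": [0.0] * 384, "image_embedding": [0.0] * 100}
problems = check_named_vectors(vectors, expected)
print(problems)  # ["'image_embedding' has 100 dimensions, expected 512"]
```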
VectorSchemaProvider
Vector dimensions are typically determined by the embedding model. By using a VectorSchemaProvider, the dimension is derived automatically from the source configuration.
A VectorSchemaProvider can be:
- An embedding model (e.g., SentenceTransformerEmbedder): the dimension is inferred from the model
- A VectorSchema: for explicit size and dtype when not using an embedder
from cocoindex.ops.sentence_transformers import SentenceTransformerEmbedder
embedder = SentenceTransformerEmbedder("sentence-transformers/all-MiniLM-L6-v2")
schema = qdrant.CollectionSchema(
    vectors=qdrant.QdrantVectorDef(schema=embedder)  # dimension inferred (384)
)
Or with explicit configuration:
from cocoindex.resources.schema import VectorSchema
import numpy as np
schema = qdrant.CollectionSchema(
    vectors=qdrant.QdrantVectorDef(
        schema=VectorSchema(dtype=np.float32, size=384)
    )
)
Multi-vector support
For multi-vector configurations (multiple vectors per point stored together):
from cocoindex.resources.schema import MultiVectorSchema, VectorSchema
import numpy as np
schema = qdrant.CollectionSchema(
    vectors=qdrant.QdrantVectorDef(
        schema=MultiVectorSchema(
            vector_schema=VectorSchema(dtype=np.float32, size=384)
        ),
        multivector_comparator="max_sim",
    )
)
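As a rough illustration of what a "max_sim" comparator computes, the sketch below scores two multi-vectors in plain Python: for each query vector it takes the best match among the document vectors and sums those maxima, following the usual late-interaction (ColBERT-style) definition. Using the dot product as the per-vector similarity is an assumption made for simplicity, and the function names are hypothetical.

```python
def dot_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim(query: list[list[float]], document: list[list[float]]) -> float:
    """Sum, over query vectors, of the best similarity to any document vector."""
    return sum(
        max(dot_product(q, d) for d in document)
        for q in query
    )


query = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
score = max_sim(query, document)
print(score)  # 3.0  (best match 1.0 for the first query vector, 2.0 for the second)
```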
Distance metrics
The distance parameter in QdrantVectorDef specifies the similarity metric:
- "cosine": Cosine similarity (default, normalized dot product)
- "dot": Dot product similarity
- "euclid": Euclidean distance (L2)
Example: single vector
import cocoindex as coco
import cocoindex.asyncio as coco_aio
from cocoindex.connectors import qdrant
from cocoindex.ops.sentence_transformers import SentenceTransformerEmbedder
from typing import AsyncIterator
QDRANT_URL = "http://localhost:6333"
QDRANT_DB = coco.ContextKey[qdrant.QdrantDatabase]("qdrant_db")
embedder = SentenceTransformerEmbedder("sentence-transformers/all-MiniLM-L6-v2")
@coco_aio.lifespan
async def coco_lifespan(builder: coco_aio.EnvironmentBuilder) -> AsyncIterator[None]:
    client = qdrant.create_client(QDRANT_URL)
    builder.provide(QDRANT_DB, qdrant.register_db("main_vectors", client))
    yield
@coco.function
async def process_document(
    doc_id: str,
    text: str,
    target: qdrant.CollectionTarget,
) -> None:
    embedding = await embedder.embed_async(text)
    point = qdrant.PointStruct(
        id=doc_id,
        vector=embedding.tolist(),
        payload={"text": text},
    )
    target.declare_point(point)
@coco.function
async def app_main() -> None:
    db = coco.use_context(QDRANT_DB)
    # Declare collection target state
    collection = await coco_aio.mount_run(
        coco.component_subpath("setup", "collection"),
        db.declare_collection_target,
        collection_name="documents",
        schema=qdrant.CollectionSchema(
            vectors=qdrant.QdrantVectorDef(schema=embedder)
        ),
    ).result()
    # Declare points
    # `documents` is assumed to be an iterable of (doc_id, text) pairs defined elsewhere
    for doc_id, text in documents:
        await coco_aio.mount_run(
            coco.component_subpath("doc", doc_id),
            process_document,
            doc_id,
            text,
            collection,
        )
Example: named vectors
from cocoindex.resources.schema import VectorSchema
import numpy as np
@coco.function
async def app_main() -> None:
    db = coco.use_context(QDRANT_DB)
    # `text_embedder` is assumed to be an embedder instance defined elsewhere
    collection = await coco_aio.mount_run(
        coco.component_subpath("setup", "collection"),
        db.declare_collection_target,
        collection_name="multimodal_docs",
        schema=qdrant.CollectionSchema(
            vectors={
                "text": qdrant.QdrantVectorDef(
                    schema=text_embedder,
                    distance="cosine",
                ),
                "image": qdrant.QdrantVectorDef(
                    schema=VectorSchema(dtype=np.float32, size=512),
                    distance="dot",
                ),
            }
        ),
    ).result()
    # Declare points with named vectors
    # `documents` is assumed to be an iterable of objects with precomputed embeddings
    for doc in documents:
        point = qdrant.PointStruct(
            id=doc.id,
            vector={
                "text": doc.text_embedding.tolist(),
                "image": doc.image_embedding.tolist(),
            },
            payload={"title": doc.title, "url": doc.url},
        )
        collection.declare_point(point)
Point IDs
Qdrant supports the following point ID types:
- str: String identifiers
- int: Integer identifiers (unsigned 64-bit)
- uuid.UUID: UUID identifiers (converted to string)
All other types are converted to strings automatically.
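If you prefer to control the ID format yourself rather than rely on automatic conversion, one common pattern is to derive a deterministic UUID from an application-level key before building the point. The sketch below uses `uuid.uuid5` from the standard library; the namespace URL and the helper name `stable_point_id` are assumptions for illustration, and whether this step is needed depends on how the connector handles your IDs.

```python
import uuid

# Hypothetical namespace for this index; any fixed UUID works as long as it never changes.
POINT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-index")


def stable_point_id(key: str) -> str:
    """Map an arbitrary string key to the same UUID string on every run."""
    return str(uuid.uuid5(POINT_NAMESPACE, key))


first = stable_point_id("doc-123")
second = stable_point_id("doc-123")
other = stable_point_id("doc-124")
print(first == second)  # True: the same key always yields the same ID
print(first == other)   # False: different keys yield different IDs
```

Because the mapping is deterministic, re-running the pipeline declares the same point IDs, so unchanged documents are recognized as the same points.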
Payloads
Point payloads are schemaless JSON objects. Any JSON-serializable Python data structure can be used:
payload = {
    "text": "Document content",
    "metadata": {
        "author": "Alice",
        "tags": ["machine-learning", "nlp"],
        "published": "2024-01-15",
    },
    "stats": {
        "views": 1500,
        "likes": 42,
    },
}
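Values such as dates or sets are not JSON-serializable as they stand, so they need converting before they go into a payload. A minimal sketch using only the standard library follows; `to_payload` and `json_default` are hypothetical helper names, and the conversion rules (ISO strings for dates, sorted lists for sets) are just one reasonable choice.

```python
import json
from datetime import date
from typing import Any


def json_default(obj: Any) -> Any:
    """Fallback used by json.dumps for types it cannot serialize itself."""
    if isinstance(obj, date):  # also covers datetime, a subclass of date
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Not JSON-serializable: {type(obj).__name__}")


def to_payload(data: dict[str, Any]) -> dict[str, Any]:
    """Round-trip through JSON so the result contains only JSON-safe types."""
    return json.loads(json.dumps(data, default=json_default))


safe_payload = to_payload({"published": date(2024, 1, 15), "tags": {"nlp", "ml"}})
print(safe_payload)  # {'published': '2024-01-15', 'tags': ['ml', 'nlp']}
```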
Vector search
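The connector focuses on writing points to Qdrant. For vector search, use the Qdrant client directly. Newer qdrant-client releases also offer query_points as the general-purpose query method, so check which search API your installed version supports: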
from qdrant_client.http import models as qdrant_models
# Create a client (or reuse the one passed to register_db)
client = qdrant.create_client("http://localhost:6333")
# Perform search
results = client.search(
    collection_name="documents",
    query_vector=query_embedding.tolist(),
    limit=10,
)
for result in results:
    print(f"Score: {result.score}, ID: {result.id}")
    print(f"Payload: {result.payload}")
For named vectors:
results = client.search(
    collection_name="documents",
    query_vector=("text", query_embedding.tolist()),  # Search using "text" vector
    limit=10,
)