CocoIndex Query Support
The main functionality of CocoIndex is indexing. The goal of indexing is to enable efficient querying against your data. You can use any libraries or frameworks of your choice to perform queries. At the same time, CocoIndex provides seamless integration between indexing and querying workflows. For example, you can share transformations between indexing and querying, and easily retrieve table names when using CocoIndex's default naming conventions.
Transform Flow
Sometimes part of the transformation logic needs to be shared between indexing and querying. For example, when you build a vector index and query against it, the embedding computation must be the same on both sides.
In this case, you can:
- Extract a sub-flow with the shared transformation logic into a standalone function.
  - It takes one or more data slices as input.
  - It returns one data slice as output.
  - You need to annotate the data types of both inputs and outputs as the type parameter of cocoindex.DataSlice[T]. See data types for more details about supported data types.
- When you're defining your indexing flow, you can call the function directly. Its body is executed, so the transformation logic is added as part of the indexing flow.
- At query time, you usually want to run the function directly on specific input data, instead of having it called as part of a long-lived indexing flow. To do this, declare the function as a transform flow by decorating it with @cocoindex.transform_flow(). This adds eval() and eval_async() methods to the function, so that you can call it directly with specific input data.
The quickstart shows an example:
import cocoindex

# Shared transformation: turns a text data slice into an embedding data slice.
@cocoindex.transform_flow()
def text_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[list[float]]:
    return text.transform(
        cocoindex.functions.SentenceTransformerEmbed(
            model="sentence-transformers/all-MiniLM-L6-v2"))
When you're defining your indexing flow, you can directly call the function:
with doc["chunks"].row() as chunk:
    chunk["embedding"] = text_to_embedding(chunk["text"])
Or, call the call() method on the first argument and pass the transform flow to it, which makes operations chainable:
with doc["chunks"].row() as chunk:
    chunk["embedding"] = chunk["text"].call(text_to_embedding)
At any time, you can call the eval() method with a specific string, which returns a list[float]:
print(text_to_embedding.eval("Hello, world!"))
If you're in an async context, call the eval_async() method instead:
print(await text_to_embedding.eval_async("Hello, world!"))
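To illustrate why the embedding step has to be shared, here is a minimal sketch in plain Python that uses no CocoIndex APIs. The names toy_embedding, build_index, and query_index are hypothetical, and the trigram-hashing "embedding" is only a stand-in for a real model such as the one used above. The point is that the indexing side and the query side call the same function, so their vectors live in the same space and can be compared.

```python
import math
import zlib


def toy_embedding(text: str, dims: int = 32) -> list[float]:
    """Hypothetical stand-in for a real embedding model (not a CocoIndex API).

    Hashes character trigrams into `dims` buckets and L2-normalises the counts.
    """
    vec = [0.0] * dims
    lowered = text.lower()
    for i in range(len(lowered) - 2):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(lowered[i:i + 3].encode("utf-8")) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_distance(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length (or all zeros), so this is 1 - dot product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def build_index(docs: dict[str, str]) -> dict[str, list[float]]:
    # Indexing side: embed every document with the shared function.
    return {name: toy_embedding(text) for name, text in docs.items()}


def query_index(index: dict[str, list[float]], query: str, top_k: int = 2) -> list[str]:
    # Query side: embed the query with the *same* function, then rank
    # documents by ascending distance (closest first).
    query_vector = toy_embedding(query)
    ranked = sorted(index, key=lambda name: cosine_distance(index[name], query_vector))
    return ranked[:top_k]


docs = {
    "notes.md": "vector index and embeddings",
    "recipe.md": "cooking pasta with tomato sauce",
}
index = build_index(docs)
print(query_index(index, "vector index and embeddings", top_k=1))
```

If the query side used a different function (or the same function with different settings), the distances would no longer be meaningful, which is the situation a transform flow is meant to avoid.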
Get Target Native Names
In your indexing flow, when you export data to a target, you can specify the target name (e.g. a database table name, a collection name, or the node label in a property graph database) explicitly. For some backends, you can also omit it and let CocoIndex generate a default name for you. For the latter case, CocoIndex provides a utility function, cocoindex.utils.get_target_storage_default_name(), to get the default name.
It takes the following arguments:
- flow (type: cocoindex.Flow): The flow to get the default name for.
- target_name (type: str): The export target name, as it appears in the export() call.
For example:
table_name = cocoindex.utils.get_target_storage_default_name(text_embedding_flow, "doc_embeddings")
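# Assuming a pgvector-backed table: <=> is a distance, so ascending order returns the closest rows first.
query = f"SELECT filename, text FROM {table_name} ORDER BY embedding <=> %s::vector LIMIT 5"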
...
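Putting the two pieces of this page together, a query helper might look like the sketch below. It is not part of CocoIndex: the function name search and its parameters are hypothetical, and it assumes a Postgres target with the pgvector extension plus a DB-API style connection (such as one from psycopg) whose cursor works as a context manager and whose driver can pass a Python list as a value castable to vector. The connection and the embedding function are passed in, so the helper itself depends on neither library.

```python
from typing import Any, Callable, Sequence


def search(
    conn: Any,
    table_name: str,
    embed: Callable[[str], Sequence[float]],
    query_text: str,
    top_k: int = 5,
) -> list:
    """Return the top_k rows closest to query_text (hypothetical helper)."""
    # Embed the query with the same function the indexing flow used.
    query_vector = list(embed(query_text))
    # table_name is interpolated as an identifier, so it must come from a
    # trusted source (here: the default name generated for the target),
    # never from user input. Values go through bound parameters.
    sql = (
        "SELECT filename, text, embedding <=> %s::vector AS distance "
        f"FROM {table_name} ORDER BY distance LIMIT %s"
    )
    with conn.cursor() as cur:
        cur.execute(sql, (query_vector, top_k))
        return cur.fetchall()
```

With the names used on this page, table_name would be the value returned by cocoindex.utils.get_target_storage_default_name(text_embedding_flow, "doc_embeddings"), and embed would be text_to_embedding.eval. Rows come back ordered by ascending distance, so the closest match is first.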