Query Support
Share embedding logic between indexing and search via transform flows, then register query handlers so CocoInsight and your app can hit the same code path.
CocoIndex’s main job is indexing, and the purpose of an index is to make querying your data efficient. You can run queries with any library or framework you choose; CocoIndex also connects the indexing and querying workflows in three ways:
- You can share transformations between indexing and querying.
- You can define query handlers, so that you can easily run queries in tools like CocoInsight.
- You can easily retrieve table names when using CocoIndex’s default naming conventions.
Transform Flow
Sometimes part of the transformation logic needs to be shared between indexing and querying. For example, when you build a vector index and query against it, the query must be embedded exactly the way the indexed data was (same model, same preprocessing); otherwise the similarity scores are not meaningful.
In this case, you can:
- Extract a sub-flow with the shared transformation logic into a standalone function.
  - It takes one or more data slices as input.
  - It returns one data slice as output.
  - You need to annotate the data types of both inputs and outputs as the type parameter of cocoindex.DataSlice[T]. See data types for more details about supported data types.
- When you’re defining your indexing flow, you can call the function directly. Its body is executed, so the transformation logic becomes part of the indexing flow.
- At query time, you usually want to run the function directly on specific input data, instead of having it called as part of a long-lived indexing flow. To do this, declare the function as a transform flow by decorating it with @cocoindex.transform_flow(). This adds eval() and eval_async() methods to the function, so you can call it directly with specific input data.
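To see why a single shared function matters, here is a plain-Python sketch that does not use CocoIndex at all. toy_embed is a hypothetical stand-in for a real embedding model (it hashes tokens into buckets); build_index plays the role of the indexing flow and search plays the role of the query path. Because both call the same function, query vectors and indexed vectors live in the same space. In CocoIndex, the equivalent would be a function decorated with @cocoindex.transform_flow(), called inside the indexing flow and through eval() at query time.

```python
import hashlib
import math

DIM = 64  # embedding size of the toy model (arbitrary)


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: bag of hashed tokens, L2-normalized.

    Stands in for a real model; NOT a CocoIndex API.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        # hashlib yields the same bucket in every process, unlike built-in hash().
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
    return [v / norm for v in vec]


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    # Indexing side: every chunk goes through the shared embedding function.
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def search(index: list[tuple[str, list[float]]], query: str, top_k: int = 3) -> list[tuple[str, float]]:
    # Query side: the query goes through the *same* function, so vectors are comparable.
    q = toy_embed(query)
    # Vectors are unit-length (or all zeros), so the dot product is the cosine similarity.
    scored = [(chunk, sum(a * b for a, b in zip(q, vec))) for chunk, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


index = build_index(["vector search with embeddings", "cooking pasta at home"])
print(search(index, "embeddings for vector search", top_k=1))
```

The sketch deliberately uses hashlib rather than Python's built-in hash(): string hashing is randomized per process, and indexing and querying usually run in different processes, so the two sides would silently disagree. That silent mismatch is exactly the kind of inconsistency a shared transform flow is meant to prevent.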
Query Handler
Query handlers let you expose a simple function that takes a query string and returns structured results. They are discoverable by tools like CocoInsight so you can query your indexes without writing extra glue code.
- What you write: a plain Python function def search(query: str) -> cocoindex.QueryOutput.
- How you register: decorate it with @<your_flow>.query_handler(...) or call flow.add_query_handler(...) directly.
- What you return: a cocoindex.QueryOutput(results=[...], query_info=...).
- Optional metadata: QueryHandlerResultFields tells tools which fields contain the embedding vector and score.
Minimum Query Handler
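A minimum query handler is a function, decorated with @<your_flow>.query_handler(...), that takes the query string and returns a cocoindex.QueryOutput listing the matching results.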
Notes about the decorator:
- The handler can be sync or async.
- The decorator registers the handler as a query handler for the flow. It doesn’t change the function signature: you can still call the function directly.
Your function returns a cocoindex.QueryOutput whose results field is a list of dicts (or dataclass instances), one per query result. Any JSON-convertible data type is supported as a field value; embeddings can be a list[float] or a NumPy array.
Even a minimal query handler is enough for CocoInsight to display its results for you to inspect.
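The sketch below shows the shape of a minimal handler body in plain Python, with a sync and an async variant. SearchResult, CORPUS and the word-overlap scoring are hypothetical stand-ins for your target table and your real query; CocoIndex itself is not imported. A real handler would be decorated with @<your_flow>.query_handler(...) and would return cocoindex.QueryOutput(results=...) instead of the bare list. The json.dumps call checks that the results are JSON-convertible, as the text above requires.

```python
import asyncio
import json
from dataclasses import asdict, dataclass


@dataclass
class SearchResult:
    # Hypothetical result type; results may be dicts or dataclass instances.
    filename: str
    text: str
    score: float


# Hypothetical in-memory data standing in for the indexed target.
CORPUS = {
    "a.md": "CocoIndex builds indexes",
    "b.md": "Query handlers return structured results",
}


def search(query: str) -> list[SearchResult]:
    # Toy relevance: fraction of query words found in the document (larger = more relevant).
    tokens = query.lower().split()
    results = []
    for filename, text in CORPUS.items():
        words = set(text.lower().split())
        hits = sum(1 for t in tokens if t in words)
        if hits:
            results.append(SearchResult(filename, text, hits / len(tokens)))
    results.sort(key=lambda r: r.score, reverse=True)
    # A real handler would return cocoindex.QueryOutput(results=results).
    return results


async def search_async(query: str) -> list[SearchResult]:
    # Async variant, e.g. when your database client is async; the logic is the same.
    return search(query)


# Call the handlers directly; the JSON round trip confirms the results are serializable.
print(json.dumps([asdict(r) for r in search("query results")]))
print(asyncio.run(search_async("cocoindex indexes")))
```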
Query Handler with Additional Information
You can provide additional information through two optional settings:
- result_fields within query_handler specifies field names in the query results returned by the query handler. This provides metadata that lets tools like CocoInsight recognize the structure of the query results, through the following fields (all optional):
  - embedding is a list of keys that navigates to the embedding in each result (use multiple in case of multiple embeddings, e.g. using different models).
  - score should point to a numeric field where larger means more relevant.
- QueryOutput.query_info specifies information about the query itself, with the following fields (all optional):
  - embedding is the embedding of the query.
  - similarity_metric is the similarity metric used to query the index.
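As an illustration of how the two kinds of metadata line up with the data, here is a plain-Python sketch (no CocoIndex import) of a handler body that scores results by cosine similarity. Many vector stores, pgvector's <=> operator for instance, report cosine distance, where smaller means closer; since score must grow with relevance, the sketch converts it with 1 - distance. ROWS, search_with_metadata and the string "cosine_similarity" are hypothetical placeholders. In a real handler you would declare result_fields=cocoindex.QueryHandlerResultFields(embedding=["embedding"], score="score") and return a cocoindex.QueryOutput with the same results and query information.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - cosine similarity; smaller means closer, as many vector stores report it.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / (na * nb)


# Hypothetical rows as returned by a vector store, each with its stored embedding.
ROWS = [
    {"filename": "a.md", "embedding": [1.0, 0.0]},
    {"filename": "b.md", "embedding": [0.6, 0.8]},
]


def search_with_metadata(query_vector: list[float], top_k: int = 2) -> dict:
    results = []
    for row in ROWS:
        distance = cosine_distance(query_vector, row["embedding"])
        # "score" must be larger-is-more-relevant, so flip the distance back into a similarity.
        results.append({**row, "score": 1.0 - distance})
    results.sort(key=lambda r: r["score"], reverse=True)
    return {
        # Each result keeps its "embedding" field, which result_fields.embedding would name.
        "results": results[:top_k],
        "query_info": {
            "embedding": query_vector,  # the query's own embedding
            "similarity_metric": "cosine_similarity",  # placeholder label, not a CocoIndex value
        },
    }


print(search_with_metadata([1.0, 0.0]))
```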
Directly Register without Decorator
You can also register a handler without the decorator, by calling add_query_handler() on the flow:
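import cocoindex

def my_search(query: str) -> cocoindex.QueryOutput:
    # Embed the query, search your target, and return cocoindex.QueryOutput(results=[...]).
    ...

# my_flow is the cocoindex.Flow you defined for indexing.
my_flow.add_query_handler(
    name="run_query",
    handler=my_search,
    result_fields=cocoindex.QueryHandlerResultFields(embedding=["embedding"], score="score"),
)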
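Registering directly can be more flexible, for example when the handler is created at runtime or defined in a module that doesn't have access to the flow object.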
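To make the equivalence of the two registration styles concrete, here is a toy model in plain Python. ToyFlow is hypothetical and is not how CocoIndex implements flows; it only mirrors the behavior described on this page: registration records the handler under a name with optional result-field metadata, and the decorator hands back the original function unchanged, so it stays directly callable. Defaulting the name to the function's __name__ is an assumption of the toy.

```python
from typing import Callable, Optional


class ToyFlow:
    """Hypothetical stand-in for a flow object; NOT CocoIndex's implementation."""

    def __init__(self) -> None:
        # name -> (handler function, result-field metadata or None)
        self.handlers: dict[str, tuple[Callable, Optional[dict]]] = {}

    def add_query_handler(
        self, name: str, handler: Callable, result_fields: Optional[dict] = None
    ) -> None:
        # Direct registration: just record the handler.
        self.handlers[name] = (handler, result_fields)

    def query_handler(self, name: Optional[str] = None, result_fields: Optional[dict] = None):
        # Decorator form: delegates to add_query_handler, then returns fn untouched.
        def decorator(fn: Callable) -> Callable:
            self.add_query_handler(name or fn.__name__, fn, result_fields)
            return fn

        return decorator


flow = ToyFlow()


@flow.query_handler(result_fields={"embedding": ["embedding"], "score": "score"})
def search(query: str) -> list[dict]:
    return [{"text": query, "score": 1.0}]


def my_search(query: str) -> list[dict]:
    return []


# Equivalent registration without the decorator.
flow.add_query_handler(name="run_query", handler=my_search)

print(sorted(flow.handlers))  # ['run_query', 'search']
print(search("hello"))  # [{'text': 'hello', 'score': 1.0}]
```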
Examples
See the examples in the CocoIndex repository for complete query handlers.
Get Target Native Names
In your indexing flow, when you export data to a target, you can specify the target name (e.g. a database table name, a collection name, the node label in property graph databases, etc.) explicitly,
or for some backends you can also omit it and let CocoIndex generate a default name for you.
In the latter case, use the utility function cocoindex.utils.get_target_default_name() to look up the generated name.
It takes the following arguments:
- flow (type: cocoindex.Flow): The flow to get the default name for.
- target_name (type: str): The export target name, as it appears in the export() call.
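For example, if my_flow exports to a target named "doc_embeddings" without an explicit table name, cocoindex.utils.get_target_default_name(my_flow, "doc_embeddings") returns the table name to use in your queries.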