Data Types in CocoIndex
In CocoIndex, all data processed by the flow have a type determined when the flow is defined, before any actual data is processed at runtime.
This makes schema of data processed by CocoIndex clear, and easily determine the schema of your index.
Data Types
Basic Types
This is the list of all basic types supported by CocoIndex:
Type | Type in Python | Original Type in Python |
---|---|---|
bytes | bytes | bytes |
str | str | str |
bool | bool | bool |
int64 | int | int |
float32 | cocoindex.typing.Float32 | float |
float64 | cocoindex.typing.Float64 | float |
range | cocoindex.typing.Range | tuple[int, int] |
vector[type, N?] | Annotated[list[type], cocoindex.typing.Vector(dim=N)] | list[type] |
json | cocoindex.typing.Json | Any type convertible to JSON by json package |
For some types, CocoIndex Python SDK provides annotated types with finer granularity than Python's original type, e.g. Float32
and Float64
for float
, and vector
has dimension information.
When defining custom functions, use the specific types as type annotations for arguments and return values. So CocoIndex will have information about the specific type.
Struct Type
A struct has a bunch of fields, each with a name and a type.
In Python, a struct type is represented by a dataclass, and all fields must be annotated with a specific type. For example:
from dataclasses import dataclass
@dataclass
class Order:
order_id: str
name: str
price: float
Collection Types
A collection type models a collection of rows, each of which is a struct with specific schema.
We have two specific types of collection:
Type | Description | Type in Python | Original Type in Python |
---|---|---|---|
Table[type] | The first field is the key, and CocoIndex enforces its uniqueness | cocoindex.typing.Table[type] | list[type] |
List[type] | No key field; row order is preserved | cocoindex.typing.List[type] | list[type] |
For example, we can use cocoindex.typing.Table[Order]
to represent a table of orders, and the first field order_id
will be taken as the key field.
Types to Create Indexes
Key Types
Currently, the following types are supported as types for key fields:
bytes
str
bool
int64
range
- Struct with all fields being key types
Vector Type
Users can create vector index on fields with vector
types.
A vector index also needs to be configured with a similarity metric, and the index is only effective when this metric is used during retrieval.
Following metrics are supported:
Metric Name | Description | Similarity Order |
---|---|---|
CosineSimilarity | Cosine similarity | Larger is more similar |
L2Distance | L2 distance (a.k.a. Euclidean distance) | Smaller is more similar |
InnerProduct | Inner product | Larger is more similar |