Data Types in CocoIndex

In CocoIndex, all data processed by the flow have a type determined when the flow is defined, before any actual data is processed at runtime.

This makes the schema of the data processed by CocoIndex explicit, and lets you easily determine the schema of your index.

Data Types

Basic Types

This is the list of all basic types supported by CocoIndex:

| Type | Type in Python | Original Type in Python |
| --- | --- | --- |
| bytes | bytes | bytes |
| str | str | str |
| bool | bool | bool |
| int64 | int | int |
| float32 | cocoindex.typing.Float32 | float |
| float64 | cocoindex.typing.Float64 | float |
| range | cocoindex.typing.Range | tuple[int, int] |
| vector[type, N?] | Annotated[list[type], cocoindex.typing.Vector(dim=N)] | list[type] |
| json | cocoindex.typing.Json | Any type convertible to JSON by json package |

For some types, the CocoIndex Python SDK provides annotated types with finer granularity than Python's original types. For example, Float32 and Float64 both refine float, and the vector annotation carries dimension information.

When defining custom functions, use these specific types as type annotations for arguments and return values, so that CocoIndex has information about the specific type.
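To illustrate why the annotated types matter, here is a minimal sketch of how metadata attached with Annotated stays visible on a function signature before any data is processed. It runs on the standard library alone: the Vector class and the Float32 alias below are hypothetical stand-ins for the cocoindex.typing names in the table, and the helper functions scale and describe are invented for this example. How CocoIndex itself reads annotations is not shown here.

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Vector:
    """Hypothetical stand-in for cocoindex.typing.Vector; it only records a dimension."""
    dim: Optional[int] = None


# Hypothetical stand-in for cocoindex.typing.Float32: a float plus a marker.
Float32 = Annotated[float, "float32"]


def scale(
    vec: Annotated[list[float], Vector(dim=3)], factor: Float32
) -> Annotated[list[float], Vector(dim=3)]:
    """Example custom function whose annotations carry the specific types."""
    return [x * factor for x in vec]


def describe(func):
    """Map each parameter (and 'return') to (base type, annotation metadata)."""
    hints = get_type_hints(func, include_extras=True)
    described = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            described[name] = (base, tuple(metadata))
        else:
            described[name] = (hint, ())
    return described


if __name__ == "__main__":
    for name, (base, metadata) in describe(scale).items():
        print(name, base, metadata)
```

With a plain float or list[float] annotation, the metadata tuple would be empty, which is the information loss the specific types are meant to avoid.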

Struct Type

A struct has a fixed set of fields, each with a name and a type.

In Python, a struct type is represented by a dataclass, and all fields must be annotated with a specific type. For example:

from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    name: str
    price: float
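Because every field is annotated, the field names and types of a struct can be read from the class definition alone, without creating any instance. The sketch below shows this with the standard library's dataclasses.fields; the helper field_schema is a hypothetical name used only for illustration and is not part of CocoIndex.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    order_id: str
    name: str
    price: float


def field_schema(struct_type):
    """Return (field name, type name) pairs in declaration order."""
    return [(f.name, f.type.__name__) for f in fields(struct_type)]


if __name__ == "__main__":
    print(field_schema(Order))
    # prints [('order_id', 'str'), ('name', 'str'), ('price', 'float')]
```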

Collection Types

A collection type models a collection of rows, each of which is a struct with a specific schema.

There are two collection types:

| Type | Description | Type in Python | Original Type in Python |
| --- | --- | --- | --- |
| Table[type] | The first field is the key, and CocoIndex enforces its uniqueness | cocoindex.typing.Table[type] | list[type] |
| List[type] | No key field; row order is preserved | cocoindex.typing.List[type] | list[type] |

For example, we can use cocoindex.typing.Table[Order] to represent a table of orders, and the first field order_id will be taken as the key field.
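The difference between the two collection types can be sketched in plain Python. This is only an illustration of the rule "the first field is the key and must be unique"; the helpers key_of and check_unique_keys are hypothetical, and how CocoIndex enforces uniqueness internally is not described here.

```python
from dataclasses import dataclass, fields


@dataclass
class Order:
    order_id: str
    name: str
    price: float


def key_of(row):
    """Return the value of the first declared field, treated as the row key."""
    first_field = fields(row)[0].name
    return getattr(row, first_field)


def check_unique_keys(rows):
    """Raise ValueError if two rows share a key; otherwise return the rows unchanged."""
    seen = set()
    for row in rows:
        key = key_of(row)
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen.add(key)
    return rows


if __name__ == "__main__":
    orders = [Order("o1", "pen", 1.5), Order("o2", "ink", 4.0)]
    check_unique_keys(orders)  # valid as a table: keys "o1" and "o2" differ
    # A plain list of rows has no such constraint: duplicates are allowed
    # and only the row order carries meaning.
```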

Types to Create Indexes

Key Types

Currently, the following types are supported as types for key fields:

  • bytes
  • str
  • bool
  • int64
  • range
  • Struct with all fields being key types
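The recursive rule for structs can be sketched as a small checker over Python types. It is an assumption-laden illustration: is_key_type and the two example dataclasses are hypothetical, int stands for int64, and range is left out because its Python representation (cocoindex.typing.Range) is not modeled here.

```python
from dataclasses import dataclass, fields, is_dataclass

# Python-side counterparts of bytes, str, bool and int64 (range is omitted).
BASIC_KEY_TYPES = (bytes, str, bool, int)


def is_key_type(tp) -> bool:
    """Return True if tp is a basic key type or a dataclass made only of key types."""
    if tp in BASIC_KEY_TYPES:
        return True
    if isinstance(tp, type) and is_dataclass(tp):
        return all(is_key_type(f.type) for f in fields(tp))
    return False


@dataclass(frozen=True)
class OrderKey:
    region: str
    order_no: int


@dataclass
class PricedItem:
    item_id: str
    price: float  # float is not a key type, so this struct cannot be a key


if __name__ == "__main__":
    print(is_key_type(OrderKey))    # prints True
    print(is_key_type(PricedItem))  # prints False
```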

Vector Type

Users can create a vector index on fields with vector types. A vector index must also be configured with a similarity metric, and the index is only effective when the same metric is used during retrieval.

The following metrics are supported:

| Metric Name | Description | Similarity Order |
| --- | --- | --- |
| CosineSimilarity | Cosine similarity | Larger is more similar |
| L2Distance | L2 distance (a.k.a. Euclidean distance) | Smaller is more similar |
| InnerProduct | Inner product | Larger is more similar |
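For reference, the three metrics can be computed directly with the standard textbook formulas. The pure-Python sketch below is not CocoIndex code, and its function names are chosen for this example only. It shows why the similarity order differs: for the two similarity scores a larger value means a closer match, while for the distance a smaller value does.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product divided by the product of the norms (undefined for zero vectors)."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


if __name__ == "__main__":
    query = [1.0, 0.0]
    same_direction = [2.0, 0.0]
    orthogonal = [0.0, 1.0]
    for name, metric in [
        ("CosineSimilarity", cosine_similarity),
        ("L2Distance", l2_distance),
        ("InnerProduct", inner_product),
    ]:
        print(name, metric(query, same_direction), metric(query, orthogonal))
```

Here same_direction scores higher than orthogonal under cosine similarity and inner product, and lower under L2 distance, so all three metrics rank it as the closer match to query.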