SQLite
The sqlite connector provides utilities for writing rows to SQLite databases, with optional vector support via the sqlite-vec extension.
from cocoindex.connectors import sqlite
For vector operations, install the sqlite-vec extension:
pip install cocoindex[sqlite]
Note: The default SQLite library bundled with macOS does not support extensions. Use Homebrew Python (brew install python) or build SQLite with extension support.
Connection Setup
connect
connect() creates a managed SQLite connection with sensible defaults, including automatic sqlite-vec loading and thread-safe access.
def connect(
database: str | Path,
*,
timeout: float = 5.0,
load_vec: bool | Literal["auto"] = "auto",
**kwargs: Any,
) -> ManagedConnection
Parameters:
database— Path to the SQLite database file, or":memory:"for an in-memory database.timeout— How long to wait for locks before raising an error.load_vec— Whether to load the sqlite-vec extension for vector support:"auto"(default): Try to load, silently ignore if unavailable.True: Load and raise an error if unavailable.False: Don't attempt to load.
**kwargs— Additional arguments passed directly tosqlite3.connect().
Returns: A ManagedConnection with thread-safe access and extension tracking.
Example:
managed_conn = sqlite.connect("mydb.sqlite") # Auto-loads sqlite-vec if available
# Or for in-memory:
managed_conn = sqlite.connect(":memory:")
# Or explicitly require vector support:
managed_conn = sqlite.connect("mydb.sqlite", load_vec=True)
# Or disable auto-loading:
managed_conn = sqlite.connect("mydb.sqlite", load_vec=False)
ManagedConnection
A wrapper around sqlite3.Connection that provides thread-safe access and tracks loaded extensions. The connection uses autocommit mode internally.
Methods:
transaction()— Context manager that acquires a lock and executes within a transaction (BEGIN...COMMIT/ROLLBACK). Use for write operations that should be atomic.readonly()— Context manager that acquires a lock for read-only operations. No transaction is started since the connection uses autocommit mode.close()— Closes the underlying connection.
Properties:
loaded_extensions— A read-onlySet[str]of loaded extension names (e.g.,"sqlite-vec").
As Target
The sqlite connector provides target state APIs for writing rows to tables. With it, CocoIndex tracks what rows should exist and automatically handles upserts and deletions.
Declaring Target States
Setting up a connection
Create a ContextKey[sqlite.ManagedConnection] (with tracked=False) to identify your SQLite connection, then provide it in your lifespan using sqlite.managed_connection():
import cocoindex as coco
SQLITE_DB = coco.ContextKey[sqlite.ManagedConnection]("main_db", tracked=False)
@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
with sqlite.managed_connection("mydb.sqlite") as conn:
builder.provide(SQLITE_DB, conn)
yield
Tables (Parent State)
Declares a table as a target state. Returns a TableTarget for declaring rows.
def declare_table_target(
db: ContextKey[ManagedConnection],
table_name: str,
table_schema: TableSchema[RowT],
*,
managed_by: Literal["system", "user"] = "system",
virtual_table_def: Vec0TableDef | None = None,
) -> TableTarget[RowT, coco.PendingS]
Parameters:
db— AContextKey[ManagedConnection]identifying the connection to use.table_name— Name of the table.table_schema— Schema definition including columns and primary key (see Table Schema).managed_by— Whether CocoIndex manages the table lifecycle ("system") or assumes it exists ("user").virtual_table_def— OptionalVec0TableDefto create a vec0 virtual table instead of a regular table.
Returns: A pending TableTarget. Use the convenience wrapper await sqlite.mount_table_target(SQLITE_DB, table_name, table_schema) to resolve.
Rows (Child States)
Once a TableTarget is resolved, declare rows to be upserted:
def TableTarget.declare_row(
self,
*,
row: RowT,
) -> None
Parameters:
row— A row object (dict, dataclass, NamedTuple, or Pydantic model). Must include all primary key columns.
Table Schema: From Python Class
Define the table structure using a Python class (dataclass, NamedTuple, or Pydantic model):
@classmethod
async def TableSchema.from_class(
cls,
record_type: type[RowT],
primary_key: list[str],
*,
column_overrides: dict[str, SqliteType | VectorSchemaProvider] | None = None,
) -> TableSchema[RowT]
Parameters:
record_type— A record type whose fields define table columns.primary_key— List of column names forming the primary key.column_overrides— Optional per-column overrides for type mapping or vector configuration.
Example:
@dataclass
class OutputProduct:
category: str
name: str
price: float
embedding: Annotated[NDArray, embedder]
schema = await sqlite.TableSchema.from_class(
OutputProduct,
primary_key=["category", "name"],
)
Python types are automatically mapped to SQLite type affinities:
| Python Type | SQLite Type |
|---|---|
bool | INTEGER (0/1) |
int | INTEGER |
float | REAL |
decimal.Decimal | TEXT |
str | TEXT |
bytes | BLOB |
uuid.UUID | TEXT |
datetime.date | TEXT (ISO format) |
datetime.time | TEXT (ISO format) |
datetime.datetime | TEXT (ISO format) |
datetime.timedelta | REAL (total seconds) |
list, dict, nested structs | TEXT (JSON) |
NDArray (with vector schema) | float[N] (sqlite-vec type, e.g., float[384]) |
To override the default mapping, provide a SqliteType or VectorSchemaProvider via:
- Type annotation — using
typing.Annotatedon the field column_overrides— passing overrides when constructingTableSchema
SqliteType
Use SqliteType to specify a custom SQLite type and optional encoder:
from typing import Annotated
from cocoindex.connectors.sqlite import SqliteType
@dataclass
class MyRow:
id: int
value: Annotated[float, SqliteType("REAL")]
data: Annotated[dict, SqliteType("TEXT", encoder=lambda v: json.dumps(v))]
Or via column_overrides:
schema = sqlite.TableSchema(
MyRow,
primary_key=["id"],
column_overrides={
"data": sqlite.SqliteType("TEXT", encoder=lambda v: json.dumps(v)),
},
)
VectorSchemaProvider
For NDArray fields, a VectorSchemaProvider annotation specifies the vector dimension and dtype. Vectors are stored as BLOBs in sqlite-vec compatible float32 format. See Vector Schema for the full list of annotation options (ContextKey, embedder instance, or explicit VectorSchema).
Table Schema: Explicit Column Definitions
Define columns directly using ColumnDef:
def TableSchema.__init__(
self,
columns: dict[str, ColumnDef],
primary_key: list[str],
) -> None
Example:
schema = sqlite.TableSchema(
{
"category": sqlite.ColumnDef(type="TEXT", nullable=False),
"name": sqlite.ColumnDef(type="TEXT", nullable=False),
"price": sqlite.ColumnDef(type="REAL"),
"embedding": sqlite.ColumnDef(type="float[384]"), # sqlite-vec vector type
},
primary_key=["category", "name"],
)
Virtual Tables
SQLite virtual tables allow custom storage backends and specialized functionality. The sqlite connector supports creating virtual tables through the same declare_table_target() API used for regular tables.
Vec0 Virtual Tables
The vec0 module from sqlite-vec provides optimized vector storage for similarity search. Use vec0 virtual tables when:
- You need efficient vector similarity search with built-in indexing
- You want to partition vectors by categories for faster queries
- You're working with large vector datasets
Requirements:
- Exactly one
INTEGERprimary key column - At least one
float[N]vector column - The sqlite-vec extension must be loaded (
load_vec=True)
Vec0TableDef
Configure vec0-specific features using Vec0TableDef:
from cocoindex.connectors.sqlite import Vec0TableDef
virtual_table_def = Vec0TableDef(
partition_key_columns=["category"], # Optional: partition index by these columns
auxiliary_columns=["metadata"], # Optional: columns excluded from KNN filters
)
Parameters:
partition_key_columns— List of column names used to partition the vector index. Queries can filter by partition keys efficiently. Multiple partition keys create a composite partition.auxiliary_columns— List of column names to mark as auxiliary (stored but not usable in KNN filters). Useful for metadata that doesn't need to participate in similarity search.
Creating Vec0 Virtual Tables
Pass virtual_table_def to declare_table_target():
@dataclass
class VectorDocument:
id: int
category: str
content: str
embedding: Annotated[NDArray, embedder] # e.g., float[384]
metadata: str
# Create vec0 virtual table with partition key and auxiliary column
table = await sqlite.mount_table_target(
SQLITE_DB,
"documents",
await sqlite.TableSchema.from_class(
VectorDocument,
primary_key=["id"],
),
virtual_table_def=sqlite.Vec0TableDef(
partition_key_columns=["category"],
auxiliary_columns=["metadata"],
),
)
Vec0 virtual tables have the following limitations:
- Schema changes are not supported incrementally. When you modify the table schema (add/remove columns, change
virtual_table_defsettings), the table will be recreated and existing data will be lost. - Switching between regular and virtual tables will also recreate the table and clear existing data.
A future schema versioning mechanism will allow preserving row data across table recreations.
Example
import cocoindex as coco
from cocoindex.connectors import sqlite
DATABASE_PATH = "mydb.sqlite"
SQLITE_DB = coco.ContextKey[sqlite.ManagedConnection]("main_db", tracked=False)
@dataclass
class OutputProduct:
category: str
name: str
description: str
embedding: Annotated[NDArray, embedder]
@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
with sqlite.managed_connection(DATABASE_PATH, load_vec=True) as conn: # Enable vector support
builder.provide(SQLITE_DB, conn)
yield
@coco.fn
async def app_main() -> None:
# Declare table target state
table = await sqlite.mount_table_target(
SQLITE_DB,
"products",
await sqlite.TableSchema.from_class(
OutputProduct,
primary_key=["category", "name"],
),
)
# Declare rows
for product in products:
table.declare_row(row=product)
Example: Vec0 Virtual Table
import cocoindex as coco
from cocoindex.connectors import sqlite
from cocoindex.ops.sentence_transformers import SentenceTransformerEmbedder
from dataclasses import dataclass
from typing import Annotated
from numpy.typing import NDArray
DATABASE_PATH = "vectors.sqlite"
SQLITE_DB = coco.ContextKey[sqlite.ManagedConnection]("vec_db", tracked=False)
embedder = SentenceTransformerEmbedder("sentence-transformers/all-MiniLM-L6-v2")
@dataclass
class VectorDocument:
id: int
category: str
title: str
content: str
embedding: Annotated[NDArray, embedder] # float[384]
metadata: str # Will be marked as auxiliary
@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
with sqlite.managed_connection(DATABASE_PATH, load_vec=True) as conn:
builder.provide(SQLITE_DB, conn)
yield
@coco.fn
async def app_main() -> None:
# Create vec0 virtual table with partition key and auxiliary column
table = await sqlite.mount_table_target(
SQLITE_DB,
"documents",
await sqlite.TableSchema.from_class(
VectorDocument,
primary_key=["id"],
),
virtual_table_def=sqlite.Vec0TableDef(
partition_key_columns=["category"], # Partition index by category
auxiliary_columns=["metadata"], # Store but don't index for KNN
),
)
# Declare document rows
docs = [
VectorDocument(
id=1,
category="tech",
title="Introduction to AI",
content="Artificial intelligence is...",
embedding=await embedder.embed("Artificial intelligence is..."),
metadata='{"source": "blog", "date": "2025-01-15"}',
),
# ... more documents
]
for doc in docs:
table.declare_row(row=doc)