Postgres
Exports data to Postgres database (with pgvector extension).
Data Mapping
Here's how CocoIndex data elements map to Postgres elements during export:
| CocoIndex Element | Postgres Element |
|---|---|
| an export target | a unique table |
| a collected row | a row |
| a field | a column |
For example, if you have a data collector that collects rows with fields id, title, and embedding, it will be exported to a Postgres table with corresponding columns.
It should be a unique table, meaning that no other export target should export to the same table.
Since vectors in pgvector must have fixed dimension, we only map vectors of number types with fixed dimension (i.e. Vector[cocoindex.Float32, N], Vector[cocoindex.Float64, N], and Vector[cocoindex.Int64, N]) to vector(N) columns.
For all other vector types, we map them to jsonb columns.
U+0000 (NUL) is a valid character in Unicode, but Postgres has a limitation that strings (including text-like types and strings in jsonb) cannot contain them.
CocoIndex automatically strips U+0000 (NUL) characters from strings before exporting to Postgres. For example, if you have a string "Hello\0World", it will be exported as "HelloWorld".
Spec
The spec takes the following fields:
-
database(auth reference toDatabaseConnectionSpec, optional): The connection to the Postgres database. See DatabaseConnectionSpec for its specific fields. If not provided, will use the same database as the internal storage. -
table_name(str, optional): The name of the table to store to. If unspecified, will use the table name[${AppNamespace}__]${FlowName}__${TargetName}, e.g.DemoFlow__doc_embeddingsorStaging__DemoFlow__doc_embeddings. -
schema(str, optional): The PostgreSQL schema to create the table in. If unspecified, the table will be created in the default schema (usuallypublic). When specified,table_namemust also be explicitly specified. CocoIndex will automatically create the schema if it doesn't exist. -
column_options(dict[str, PostgresColumnOptions], optional): Options for the columns in the table. Key is the column name, and value is the specific option, with the following fields:type(str, optional): The specific type of the column in Postgres. Currently only supports"vector"and"halfvec". By default, we use"vector"for vector columns, and you can use this field to override it to"halfvec"for some columns.
Attachments
PostgresSqlCommand
Execute arbitrary Postgres SQL during flow setup, with an optional SQL to undo it when the attachment or target is removed.
This attachment is useful for capabilities not natively modeled by the target spec, such as creating specialized indexes, triggers, or grants.
Fields:
name(str, required): A identifier for this attachment on the target. Unique within the target.setup_sql(str, required): SQL to execute during setup.teardown_sql(str, optional): SQL to execute on removal/drop.
Notes about setup_sql and teardown_sql:
- Multiple statements are allowed in both
setup_sqlandteardown_sql. Use;to separate them. - Both
setup_sqlandteardown_sqlare expected to be idempotent, e.g. use statements likeCREATE ... IF NOT EXISTSandDROP ... IF EXISTS. - The
setup_sqlis expected to have an "upsert" behavior. If you updatesetup_sql, the updatedsetup_sqlwill be executed during setup. - The
teardown_sqlis saved by CocoIndex, so it'll be executed when the attachment no longer exists. If you updateteardown_sql, the updatedteardown_sqlwill be saved and executed (instead of the previous one) during teardown.
Example (create a custom index):
collector.export(
"doc_embeddings",
cocoindex.targets.Postgres(table_name="doc_embeddings"),
primary_key_fields=["id"],
attachments=[
cocoindex.targets.PostgresSqlCommand(
name="fts",
setup_sql=(
"CREATE INDEX IF NOT EXISTS doc_embeddings_text_fts "
"ON doc_embeddings USING GIN (to_tsvector('english', text));"
),
teardown_sql= "DROP INDEX IF EXISTS doc_embeddings_text_fts;",
)
],
)