Postgres
Exports data to a Postgres database (with the pgvector extension).
Data Mapping
Here's how CocoIndex data elements map to Postgres elements during export:
| CocoIndex Element | Postgres Element |
|---|---|
| an export target | a unique table |
| a collected row | a row |
| a field | a column |
For example, if you have a data collector that collects rows with fields `id`, `title`, and `embedding`, it will be exported to a Postgres table with corresponding columns.
The target table should be unique, meaning that no other export target should export to the same table.
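As a minimal sketch, such a collector might be wired up like this inside a flow definition (the `data_scope`, the `documents` source, and the `doc` fields are assumptions for illustration):

```python
# Hypothetical flow snippet: each collected field becomes a column
# in the exported Postgres table.
doc_embeddings = data_scope.add_collector()
with data_scope["documents"].row() as doc:
    doc_embeddings.collect(
        id=doc["filename"],          # -> column "id"
        title=doc["title"],          # -> column "title"
        embedding=doc["embedding"],  # -> column "embedding"
    )
doc_embeddings.export(
    "doc_embeddings",
    cocoindex.targets.Postgres(),
    primary_key_fields=["id"],
)
```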
Since vectors in pgvector must have a fixed dimension, we only map vectors of number types with fixed dimension (i.e. `Vector[cocoindex.Float32, N]`, `Vector[cocoindex.Float64, N]`, and `Vector[cocoindex.Int64, N]`) to `vector(N)` columns.
For all other vector types, we map them to `jsonb` columns.
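As a sketch, the distinction shows up in the Python type annotations (the dimension 384 here is just an illustrative value):

```python
from typing import Literal

import cocoindex

# Fixed dimension with a supported number type: maps to a `vector(384)` column.
FixedDimEmbedding = cocoindex.Vector[cocoindex.Float32, Literal[384]]

# No fixed dimension: maps to a `jsonb` column.
VariableDimEmbedding = cocoindex.Vector[cocoindex.Float32]
```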
U+0000 (NUL) is a valid character in Unicode, but Postgres has a limitation that strings (including `text`-like types and strings in `jsonb`) cannot contain it.
CocoIndex automatically strips U+0000 (NUL) characters from strings before exporting to Postgres. For example, the string `"Hello\0World"` will be exported as `"HelloWorld"`.
Spec
The spec takes the following fields:
- `database` (auth reference to `DatabaseConnectionSpec`, optional): The connection to the Postgres database. See DatabaseConnectionSpec for its specific fields. If not provided, it will use the same database as the internal storage.
- `table_name` (`str`, optional): The name of the table to store to. If unspecified, the table name `[${AppNamespace}__]${FlowName}__${TargetName}` will be used, e.g. `DemoFlow__doc_embeddings` or `Staging__DemoFlow__doc_embeddings`.
- `schema` (`str`, optional): The PostgreSQL schema to create the table in. If unspecified, the table will be created in the default schema (usually `public`). When specified, `table_name` must also be explicitly specified. CocoIndex will automatically create the schema if it doesn't exist.
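Here's a sketch of an export using these spec fields (the connection URL, the auth entry name `"target_db"`, the schema name `"search"`, and the collector itself are assumptions for illustration):

```python
import cocoindex

# Auth reference to a DatabaseConnectionSpec for the target database.
target_db = cocoindex.add_auth_entry(
    "target_db",
    cocoindex.DatabaseConnectionSpec(
        url="postgres://user:password@localhost:5432/mydb",
    ),
)

collector.export(
    "doc_embeddings",
    cocoindex.targets.Postgres(
        database=target_db,
        schema="search",
        # `table_name` must be explicitly specified when `schema` is set.
        table_name="doc_embeddings",
    ),
    primary_key_fields=["id"],
)
```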
Attachments
PostgresSqlCommand
Execute arbitrary Postgres SQL during flow setup, with an optional SQL command to undo it when the attachment or target is removed.
This attachment is useful for capabilities not natively modeled by the target spec, such as creating specialized indexes, triggers, or grants.
Fields:
- `name` (`str`, required): An identifier for this attachment on the target. Unique within the target.
- `setup_sql` (`str`, required): SQL to execute during setup.
- `teardown_sql` (`str`, optional): SQL to execute on removal/drop.
Notes about `setup_sql` and `teardown_sql`:

- Multiple statements are allowed in both `setup_sql` and `teardown_sql`. Use `;` to separate them.
- Both `setup_sql` and `teardown_sql` are expected to be idempotent, e.g. use statements like `CREATE ... IF NOT EXISTS` and `DROP ... IF EXISTS`.
- The `setup_sql` is expected to have an "upsert" behavior. If you update `setup_sql`, the updated `setup_sql` will be executed during setup.
- The `teardown_sql` is saved by CocoIndex, so it will be executed when the attachment no longer exists. If you update `teardown_sql`, the updated `teardown_sql` will be saved and executed (instead of the previous one) during teardown.
Example (create a custom index):
```python
collector.export(
    "doc_embeddings",
    cocoindex.targets.Postgres(table_name="doc_embeddings"),
    primary_key_fields=["id"],
    attachments=[
        cocoindex.targets.PostgresSqlCommand(
            name="fts",
            setup_sql=(
                "CREATE INDEX IF NOT EXISTS doc_embeddings_text_fts "
                "ON doc_embeddings USING GIN (to_tsvector('english', text));"
            ),
            teardown_sql="DROP INDEX IF EXISTS doc_embeddings_text_fts;",
        )
    ],
)
```