Write to Neo4j — a property graph database — with support for node tables, relationship tables (edges), per-database multitenancy, atomic-batch writes via Bolt transactions, real CREATE CONSTRAINT uniqueness, and vector indexes with cosine / euclidean similarity.
Version
v1.0.2
The neo4j connector writes records to Neo4j, a property graph database. It supports node tables (labels), relationship tables (edge types), per-database multitenancy (one Neo4j cluster, many isolated databases), real Cypher uniqueness constraints, and vector indexes via the CREATE VECTOR INDEX DDL form.
```python
from cocoindex.connectors import neo4j
```
Dependencies
This connector requires additional dependencies. Install with:
```bash
pip install cocoindex[neo4j]
```
Targets Neo4j 5.18+. Vector-index DDL (CREATE VECTOR INDEX … OPTIONS { indexConfig: { … } }) shipped in 5.18 — older 5.x servers will reject the DDL the connector emits.
Connection setup
Create a ConnectionFactory and provide it via a ContextKey. The factory holds the Bolt URI, optional auth, and the target database name; it lazily opens a Neo4j async driver and returns a graph handle on demand.
Note
The key name is load-bearing across runs — it’s the stable identity CocoIndex uses to track managed rows. See ContextKey as stable identity before renaming.
auth is optional — omit it for unauthenticated dev instances. database defaults to "neo4j" (the default db that ships with every Neo4j 5 installation).
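As a minimal sketch of this setup (only database is documented above; the ContextKey import path and the uri/auth keyword names are assumptions):

```python
import cocoindex
from cocoindex.connectors import neo4j

# The key name is the stable identity CocoIndex uses to track managed rows,
# so keep it unchanged across runs (see the note above).
KG_DB: "cocoindex.ContextKey[neo4j.ConnectionFactory]" = cocoindex.ContextKey("kg_db")  # assumed import path

factory = neo4j.ConnectionFactory(
    uri="bolt://localhost:7687",   # assumed keyword name for the Bolt URI
    auth=("neo4j", "password"),    # optional; omit for unauthenticated dev instances
    database="neo4j",              # defaults to "neo4j"
)
```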
Multitenancy
A single Neo4j cluster can host many isolated databases. Pair each database with its own ContextKey and ConnectionFactory(database=...):
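For example, two isolated tenants (same hypothetical ContextKey construction as in the connection sketch above):

```python
import cocoindex
from cocoindex.connectors import neo4j

# One key + factory per tenant database; each pair gets its own target-state tree.
TENANT_A_DB = cocoindex.ContextKey("tenant_a_db")
TENANT_B_DB = cocoindex.ContextKey("tenant_b_db")

tenant_a = neo4j.ConnectionFactory(uri="bolt://localhost:7687", database="tenant_a")
tenant_b = neo4j.ConnectionFactory(uri="bolt://localhost:7687", database="tenant_b")
```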
Different ContextKeys with different database names produce fully separate target-state trees — changes to one never spill into the other.
As target
The neo4j connector provides target state APIs for writing records to node tables and relation tables. CocoIndex tracks what records should exist and automatically handles upserts and deletions.
Each apply batch is wrapped in a single Neo4j transaction (tx.commit() on success, rollback on exception), so partial writes never leak into the database. Within a batch, writes are ordered as node upserts → relation upserts → relation deletes → node deletes so dependent edges always see their endpoints.
Declaring target states
Node tables (parent state)
Declares a node label as a target state. Returns a TableTarget for declaring records.
db — A ContextKey[neo4j.ConnectionFactory] for the Neo4j connection.
table_name — The Cypher node label (e.g. "Document").
table_schema — Optional schema definition (see Table Schema). The schema participates in CocoIndex’s fingerprint (so two flows declaring the same label must agree); per-property type DDL is not emitted in v1.
primary_key — Single property name used as the node’s primary key. Defaults to "id". Compound primary keys are not supported in v1.0.
managed_by — Whether CocoIndex manages the table lifecycle ("system") or assumes it exists ("user").
Returns: A pending TableTarget. Use await neo4j.mount_table_target(KG_DB, ...) to get a resolved target.
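Putting it together, a sketch of mounting a node table; mount_table_target and its first positional ContextKey argument appear above, and the keyword names follow the parameter list:

```python
async def mount_documents():
    # Resolved TableTarget for the "Document" node label.
    doc_table = await neo4j.mount_table_target(
        KG_DB,                  # ContextKey[neo4j.ConnectionFactory] from the connection setup
        table_name="Document",  # Cypher node label
        primary_key="id",       # single property; compound keys unsupported in v1.0
        managed_by="system",    # CocoIndex creates/drops the constraint and indexes
    )
    return doc_table
```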
Records (child states)
Once a TableTarget is resolved, declare records to be upserted (translated to MERGE (n:Label {pk: $key_0}) SET n += $props):
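A sketch of declaring node records, assuming the resolved target exposes a declare(...) method (the exact method name is not documented here):

```python
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    title: str

# doc_table: resolved TableTarget from the mounting sketch above.
# Each record becomes MERGE (n:Document {id: $key_0}) SET n += $props.
doc_table.declare(Document(id="doc-1", title="Getting started"))
doc_table.declare(Document(id="doc-2", title="Graph writes"))
```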
For relation tables (edges), each record declares its endpoints and optional properties:
from_id — The source node’s primary-key value. The connector MERGEs (s:FromLabel {pk: $from_id}) so endpoints are auto-created if absent.
to_id — The target node’s primary-key value. Same MERGE behavior.
record — Optional row object whose fields populate the relationship’s properties. Must include the relationship’s primary_key field if provided.
If record is omitted, the connector derives a deterministic edge id of the form {from_label}_{from_id}_{to_label}_{to_id}. Convenient when an edge has no properties of its own.
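For an edge, a sketch assuming a resolved relation-table target named rel_table with an analogous declare(...) method:

```python
# rel_table: hypothetical resolved target for a relation table (edge type).
# With record omitted, the edge id derives as "Document_doc-1_Document_doc-2".
rel_table.declare(from_id="doc-1", to_id="doc-2")
```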
Vector indexes (attachment)
Declares a vector index on a column of a node table. Vector indexes are an attachment to a TableTarget:
name — Optional logical name for the index. Defaults to f"vec_{table_name}__{field}".
field — The node property holding the vector.
metric — Similarity metric: "cosine" or "euclidean". Translated to Neo4j’s vector.similarity_function option.
dimension — The vector’s dimension. Required.
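A sketch of the attachment, assuming a hypothetical vector_index helper on the connector module; the keyword names match the parameters above:

```python
# Hypothetical attachment call; field, metric, and dimension are documented above,
# and name falls back to f"vec_{table_name}__{field}" when omitted.
neo4j.vector_index(
    doc_table,           # resolved TableTarget for the node label
    field="embedding",
    metric="cosine",
    dimension=768,
)
```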
The connector emits:
```cypher
CREATE VECTOR INDEX `coco_vec_<Label>__<field>` IF NOT EXISTS
FOR (n:`<Label>`) ON n.`<field>`
OPTIONS { indexConfig: {
  `vector.dimensions`: <N>,
  `vector.similarity_function`: '<metric>'
} }
```
Vectors are float32 only.
Table schema: from Python class
Build a TableSchema by introspecting a record type:
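For instance, a sketch assuming a hypothetical from_type builder on TableSchema (only the introspection behavior, not the builder's name, is documented here):

```python
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    title: str
    page_count: int

# Hypothetical entry point: derive column names, neo4j_type strings, and
# nullability from the dataclass fields.
schema = neo4j.TableSchema.from_type(Document)
```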
The neo4j_type string is metadata-only — it participates in the schema fingerprint (so two flows declaring the same table must agree) but no per-property type DDL is emitted from it.
VectorSchemaProvider
For NumPy ndarray columns, attach a VectorSchema annotation to specify dtype + dimension. See VectorSchema for details.
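For example (the dtype and dimension keywords follow the description above; the import path is an assumption):

```python
from dataclasses import dataclass
from typing import Annotated

import numpy as np
from cocoindex.connectors.neo4j import VectorSchema  # assumed import path

@dataclass
class Chunk:
    id: str
    # dtype + dimension drive the vector-index DDL for this property.
    embedding: Annotated[np.ndarray, VectorSchema(dtype=np.float32, dimension=768)]
```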
Table schema: explicit column definitions
Build a TableSchema directly from a dict of column definitions when the row type is dynamic:
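A sketch of the dict form; the constructor shape is an assumption, and the per-column keys are described below:

```python
def to_iso(value):
    # encoder: applied to non-None values before they are sent to Neo4j.
    return value.isoformat()

# Hypothetical constructor shape; each column definition uses the keys below.
schema = neo4j.TableSchema(
    columns={
        "id": {"type": "STRING", "nullable": False},
        "title": {"type": "STRING"},  # nullable defaults to True
        "updated_at": {"type": "DATETIME", "encoder": to_iso},
    }
)
```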
type — The Neo4j type string (metadata only; see table above).
nullable — Whether the column may be None. Defaults to True.
encoder — Optional Callable[[Any], Any] applied to non-None values before they’re sent to Neo4j.
DDL: indexes and constraints
For each managed table, the connector creates supporting Cypher artifacts on first run:
For node tables: a uniqueness constraint on the primary key —
```cypher
CREATE CONSTRAINT `coco_uniq_<Label>__<pk>` IF NOT EXISTS
FOR (n:`<Label>`) REQUIRE n.`<pk>` IS UNIQUE
```
Neo4j auto-creates a backing index for each constraint, so a separate CREATE INDEX is redundant on nodes.
For relation tables:
```cypher
CREATE INDEX `coco_idx_rel_<RelType>__<pk>` IF NOT EXISTS
FOR ()-[r:`<RelType>`]-() ON (r.`<pk>`)
```
Indexes and constraints are dropped on cocoindex drop or when the table is no longer declared.
When managed_by="user" is set, the connector skips DDL entirely — you’re responsible for creating and dropping the schema. Record-level upserts and deletes still work.
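For example, mounting an existing label without any DDL (a sketch; keyword names as in the node-table parameters above):

```python
async def mount_existing():
    # No constraints or indexes are created or dropped for this label;
    # you own the schema, but record upserts and deletes still apply.
    return await neo4j.mount_table_target(
        KG_DB,
        table_name="LegacyDocument",
        primary_key="id",
        managed_by="user",
    )
```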
The Entity table is declared up-front (via mount_table_target) so its uniqueness constraint is reconciled before any RELATIONSHIP edge MERGEs entity endpoints. The relationship’s three-MERGE pattern (source endpoint → target endpoint → edge) means missing endpoints are auto-created — but it’s good practice to declare them explicitly so deletion-cascade behavior stays predictable.
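A sketch of that ordering; mount_table_target is documented above, while the relation-table mount helper's name is an assumption:

```python
async def mount_graph():
    # Mount the Entity node table first so its uniqueness constraint is reconciled
    # before any RELATIONSHIP edge MERGEs its endpoints.
    entity = await neo4j.mount_table_target(KG_DB, table_name="Entity", primary_key="id")

    # Hypothetical helper for relation tables; the real name may differ.
    relationship = await neo4j.mount_relation_target(
        KG_DB, table_name="RELATIONSHIP", primary_key="id"
    )
    return entity, relationship
```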