CocoIndex Built-in Targets

Each target receives data exported from a data collector. A collector holds multiple entries (rows), each with multiple fields. How collector data maps to a target depends on the target's data model.

Targets Overview

Target Type | Description
Postgres | Relational Database, Vector Search (PGVector)
Qdrant | Vector Database, Keyword Search
LanceDB | Vector Database, Keyword Search
Neo4j | Property graph
Kuzu | Property graph

If you need a target that is not listed here, you can use custom targets as building blocks.

Property Graph Targets

The property graph is a widely adopted model for knowledge graphs, in which both nodes and relationships can have properties.

The "Graph database concepts" guide has a good introduction to the basic concepts of property graphs.

The following concepts will be used in the following sections:

Data Mapping

Data from collectors is mapped to two kinds of graph elements:

  1. Rows from collectors → Nodes in the graph
  2. Rows from collectors → Relationships in the graph (including source and target nodes of the relationship)

This is what you need to provide to define these mappings:

In addition, the same node may appear multiple times, both among exported nodes and as the source or target of various relationships. All of these appearances should end up as a single node in the target graph database. CocoIndex automatically matches and deduplicates nodes based on their primary key values.

Nodes to Export

Here's how CocoIndex data elements map to nodes in the graph:

CocoIndex Element | Graph Element
an export target | nodes with a unique label
a collected row | a node
a field | a property of node

Note that each Nodes mapping must use its own unique label.

cocoindex.targets.Nodes describes the mapping to nodes. It has the following fields:

  • label (str): The label of the node.

For example, suppose we have collected the following rows:

filename | summary
chapter1.md | At the beginning, ...
chapter2.md | In the second day, ...

We can export them to nodes under label Document like this:

document_collector.export(
    ...
    cocoindex.targets.Neo4j(
        ...
        mapping=cocoindex.targets.Nodes(label="Document"),
    ),
    primary_key_fields=["filename"],
)

Each collected row is mapped to one Document node in the graph database, with filename and summary as its properties.
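As a rough illustration of the row-to-node mapping described above, here is a plain-Python sketch. It does not use CocoIndex; `rows_to_nodes` and the dictionary layout are hypothetical names chosen only to show that one row yields one node and each field yields one property.

```python
def rows_to_nodes(rows, label, primary_key_fields):
    """Turn each collected row into one node description under `label`.

    Hypothetical helper for illustration only, not a CocoIndex API.
    """
    nodes = []
    for row in rows:
        # The primary key identifies the node; every field becomes a property.
        key = tuple(row[field] for field in primary_key_fields)
        nodes.append({"label": label, "key": key, "properties": dict(row)})
    return nodes


rows = [
    {"filename": "chapter1.md", "summary": "At the beginning, ..."},
    {"filename": "chapter2.md", "summary": "In the second day, ..."},
]

document_nodes = rows_to_nodes(rows, label="Document", primary_key_fields=["filename"])
for node in document_nodes:
    print(node["label"], node["key"], node["properties"])
```

The sketch yields two Document nodes, keyed by chapter1.md and chapter2.md respectively.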

Declare Extra Node Labels

If a node label appears as the source or target of a relationship but is not itself exported as nodes, you need to declare the label with the necessary configuration.

The dataclass that describes the declaration is specific to each target (e.g. cocoindex.targets.Neo4jDeclarations), but all of them share the following common fields:

  • nodes_label (required): The label of the node.
  • Options for storage indexes.
    • primary_key_fields (required)
    • vector_indexes (optional)

Continuing the example above, suppose we want to extract relationships from Document to Place later (i.e. a document mentions a place). Because the Place label isn't exported as nodes, we need to declare it:

flow_builder.declare(
    cocoindex.targets.Neo4jDeclarations(
        connection=...,
        nodes_label="Place",
        primary_key_fields=["name"],
    ),
)
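To make the rule concrete, the following plain-Python sketch lists the labels that relationships refer to but that are neither exported nor declared. It is not CocoIndex's own validation; `undeclared_labels` and its arguments are hypothetical names used only for illustration.

```python
def undeclared_labels(exported_labels, declared_labels, relationship_endpoints):
    """Return labels used by relationships that are neither exported nor declared.

    `relationship_endpoints` is a list of (source_label, target_label) pairs.
    Hypothetical helper for illustration only, not a CocoIndex API.
    """
    known = set(exported_labels) | set(declared_labels)
    used = {label for endpoints in relationship_endpoints for label in endpoints}
    return sorted(used - known)


exported = ["Document"]                  # labels exported through a Nodes mapping
relationships = [("Document", "Place")]  # a document mentions a place

missing_before = undeclared_labels(exported, [], relationships)
missing_after = undeclared_labels(exported, ["Place"], relationships)

print(missing_before)  # ['Place']
print(missing_after)   # []
```

Before the declaration, Place is the only label without configuration; once it is declared, nothing is missing.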

Relationships to Export

Here's how CocoIndex data elements map to relationships in the graph:

CocoIndex Element | Graph Element
an export target | relationships with a unique type
a collected row | a relationship
a field | a property of relationship, or a property of source/target node, based on configuration

Note that each Relationships mapping must use its own unique type.

cocoindex.targets.Relationships describes the mapping to relationships. It has the following fields:

  • rel_type (str): The type of the relationship.
  • source/target (cocoindex.targets.NodeFromFields): Specifies how to extract source/target node information from specific fields in the collected row. It has the following fields:
    • label (str): The label of the node.
    • fields (Sequence[cocoindex.targets.TargetFieldMapping]): Field mappings from the collected rows to node properties. Each mapping has the following fields:
      • source (str): The name of the field in the collected row.
      • target (str, optional): The name of the node property to write to. If unspecified, the source name is used.

Map necessary fields for nodes of relationships

You need to map the following fields for the nodes of each relationship:

  • Make sure all primary key fields for the label are mapped.
  • Optionally, you can also map non-key fields (value fields). If you do, make sure all value fields are mapped.

All fields in the collector that are not used in mappings for source or target node fields will be mapped to relationship properties.
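The two mapping rules above (all primary key fields mapped; value fields mapped either all or not at all) can be sketched as a small check in plain Python. This is only an illustration under assumed inputs: `check_node_mapping` is a hypothetical helper, and the lists of key and value fields per label are supplied by hand here rather than read from CocoIndex.

```python
def check_node_mapping(mapped_targets, primary_key_fields, value_fields):
    """Return a list of problems found in a source/target node field mapping.

    `mapped_targets` are the node property names the mapping writes to.
    Hypothetical helper for illustration only, not a CocoIndex API.
    """
    mapped = set(mapped_targets)
    problems = []

    # Rule 1: every primary key field of the label must be mapped.
    missing_keys = [f for f in primary_key_fields if f not in mapped]
    if missing_keys:
        problems.append(f"unmapped primary key fields: {missing_keys}")

    # Rule 2: value fields are optional, but mapping only some of them is not allowed.
    mapped_values = [f for f in value_fields if f in mapped]
    missing_values = [f for f in value_fields if f not in mapped]
    if mapped_values and missing_values:
        problems.append(f"value fields only partly mapped, missing: {missing_values}")

    return problems


# Place: key field "name", value field "embedding" (both mapped).
place_ok = check_node_mapping(["name", "embedding"], ["name"], ["embedding"])
# Document: only the key field is mapped, which is allowed.
document_ok = check_node_mapping(["filename"], ["filename"], ["summary"])
# Place mapped without its primary key.
place_bad = check_node_mapping(["embedding"], ["name"], ["embedding"])

print(place_ok, document_ok, place_bad)
```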

For example, suppose we have collected the following rows, which describe the places mentioned in each file along with embeddings of the places:

doc_filename | place_name | place_embedding | location
chapter1.md | Crystal Palace | [0.1, 0.5, ...] | 12
chapter2.md | Magic Forest | [0.4, 0.2, ...] | 23
chapter2.md | Crystal Palace | [0.1, 0.5, ...] | 56

We can export them to relationships under type MENTION like this:

doc_place_collector.export(
    ...
    cocoindex.targets.Neo4j(
        ...
        mapping=cocoindex.targets.Relationships(
            rel_type="MENTION",
            source=cocoindex.targets.NodeFromFields(
                label="Document",
                fields=[cocoindex.targets.TargetFieldMapping(source="doc_filename", target="filename")],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Place",
                fields=[
                    cocoindex.targets.TargetFieldMapping(source="place_name", target="name"),
                    cocoindex.targets.TargetFieldMapping(source="place_embedding", target="embedding"),
                ],
            ),
        ),
    ),
    ...
)

The doc_filename field is mapped to the Document.filename property of the source node, while place_name and place_embedding are mapped to the Place.name and Place.embedding properties of the target node. The remaining field, location, becomes a property of the relationship. For the data above, we get three MENTION relationships, one per collected row, each connecting a Document node to a Place node.
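The field split described above can be sketched in plain Python. This is an illustration only: `split_row` is a hypothetical helper, the dictionaries stand in for the TargetFieldMapping entries, and the embedding is shortened to two numbers because the full vectors are elided in the table.

```python
def split_row(row, source_fields, target_fields):
    """Split one collected row into source node, target node, and relationship properties.

    `source_fields` / `target_fields` map collector field names to node property names.
    Hypothetical helper for illustration only, not a CocoIndex API.
    """
    source_props = {prop: row[field] for field, prop in source_fields.items()}
    target_props = {prop: row[field] for field, prop in target_fields.items()}
    # Fields not consumed by either node mapping stay on the relationship.
    used = set(source_fields) | set(target_fields)
    rel_props = {field: value for field, value in row.items() if field not in used}
    return source_props, target_props, rel_props


row = {
    "doc_filename": "chapter1.md",
    "place_name": "Crystal Palace",
    "place_embedding": [0.1, 0.5],  # shortened placeholder vector
    "location": 12,
}

source, target, rel = split_row(
    row,
    source_fields={"doc_filename": "filename"},
    target_fields={"place_name": "name", "place_embedding": "embedding"},
)
print(source)  # {'filename': 'chapter1.md'}
print(target)  # {'name': 'Crystal Palace', 'embedding': [0.1, 0.5]}
print(rel)     # {'location': 12}
```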

Nodes Matching and Deduplicating

The nodes and relationships we got above are discrete elements. To fit them into a connected property graph, CocoIndex will match and deduplicate nodes automatically:

  • Match nodes based on their primary key values. Nodes with the same primary key values are considered the same node.
  • For non-primary-key fields (a.k.a. value fields), if multiple nodes (before deduplication) with the same primary key provide values, CocoIndex picks the values from an arbitrary one of them.
Note: The best practice is to keep the value fields consistent across different appearances of the same node, to avoid non-determinism in the exported graph.

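After matching and deduplication, we get the final graph: two Document nodes (chapter1.md and chapter2.md), two Place nodes (Crystal Palace and Magic Forest), and three MENTION relationships connecting them.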
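As a rough illustration of the matching and deduplication described above, the following plain-Python sketch merges node appearances that share a label and primary key. It is not how CocoIndex is implemented: `deduplicate_nodes` is a hypothetical helper, the embeddings are shortened placeholders, and the sketch keeps the first appearance that carries value fields, whereas CocoIndex only promises an arbitrary pick.

```python
def deduplicate_nodes(appearances):
    """Merge node appearances that share (label, primary key).

    Each appearance is a (label, key, value_fields) tuple.
    Hypothetical helper for illustration only, not a CocoIndex API.
    """
    merged = {}
    for label, key, values in appearances:
        current = merged.setdefault((label, key), {})
        # Keep values from the first appearance that provides any (an arbitrary choice).
        if not current and values:
            merged[(label, key)] = dict(values)
    return merged


appearances = [
    # From the exported Document nodes.
    ("Document", "chapter1.md", {"summary": "At the beginning, ..."}),
    ("Document", "chapter2.md", {"summary": "In the second day, ..."}),
    # From the source and target nodes of the three MENTION relationships.
    ("Document", "chapter1.md", {}),
    ("Place", "Crystal Palace", {"embedding": [0.1, 0.5]}),
    ("Document", "chapter2.md", {}),
    ("Place", "Magic Forest", {"embedding": [0.4, 0.2]}),
    ("Document", "chapter2.md", {}),
    ("Place", "Crystal Palace", {"embedding": [0.1, 0.5]}),
]

graph_nodes = deduplicate_nodes(appearances)
for (label, key), values in sorted(graph_nodes.items()):
    print(label, key, values)
```

Eight appearances collapse into four nodes. Because both appearances of Crystal Palace carry the same embedding, the result does not depend on which one is picked, which is the consistency the best practice above asks for.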

Examples

You can find end-to-end examples for each of the supported property graph targets in the following directories: