Docker + pgvector setup

A step-by-step Docker setup for pgvector, a Python virtualenv, a text-embedding flow, and a semantic-search query, with the common port and extension gotchas called out.

Time: ~30 minutes
Language: Python 3.11+
Requires: Docker
Version: v0.3.37
Last reviewed: Mar 18, 2026

This tutorial walks through setting up CocoIndex with Docker-based PostgreSQL and pgvector, building a text embedding pipeline, and querying it with semantic search. It covers common gotchas and is written to be easy for both humans and AI coding assistants to follow.

Prerequisites

  • Python 3.11+
  • Docker

Step 1: Start PostgreSQL with pgvector

CocoIndex requires the PostgreSQL vector extension (provided by pgvector) for embedding storage and HNSW indexes. You must use a pgvector-enabled image, not the plain postgres image.

Using the project’s docker compose config:

docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml) up -d

Or manually:

docker run -d --name cocoindex-postgres \
  -e POSTGRES_USER=cocoindex \
  -e POSTGRES_PASSWORD=cocoindex \
  -e POSTGRES_DB=cocoindex \
  -p 5432:5432 \
  pgvector/pgvector:pg17

Warning: Use the pgvector image

Using plain postgres:16 or postgres:17 will fail with the error 'extension "vector" is not available' when CocoIndex tries to create the vector index.

Port conflict tip: If you get unexpected “password authentication failed” errors, check that no other process (such as an SSH tunnel) is listening on your chosen port:

lsof -i :5432

You should only see Docker’s process. If another process is listed, choose a different port (e.g., -p 5450:5432).
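
If lsof is not available, the same check can be approximated from Python. The sketch below uses only the standard library; the helper name port_in_use is hypothetical, and it only reports whether something accepts TCP connections on the port, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(5432) before starting the container should return False; if it returns True, something is already bound to that port and a different host port is the safer choice.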

Running alongside other PostgreSQL instances

If port 5432 is already in use, map to a different host port:

docker run -d --name cocoindex-postgres \
  -e POSTGRES_USER=cocoindex \
  -e POSTGRES_PASSWORD=cocoindex \
  -e POSTGRES_DB=cocoindex \
  -p 5450:5432 \
  pgvector/pgvector:pg17

Then adjust the port in your database URL accordingly.
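
As an illustration of that adjustment, here is a minimal sketch that swaps the port in a connection URL using the standard library. The helper name with_port is hypothetical, and the sketch assumes a plain hostname or IPv4 address (no bracketed IPv6 host).

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(db_url: str, port: int) -> str:
    """Return db_url with its port replaced, leaving everything else intact."""
    parts = urlsplit(db_url)
    # netloc looks like "user:password@host:port"; keep the credentials part.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{parts.hostname}:{port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(with_port("postgresql://cocoindex:cocoindex@localhost:5432/cocoindex", 5450))
```

In practice, editing the port number by hand in the URL is just as good; the point is that only the host port changes, while the user, password, and database name stay the same.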

Step 2: Create a Python environment

mkdir cocoindex-quickstart && cd cocoindex-quickstart
python3 -m venv .venv
source .venv/bin/activate
pip install -U 'cocoindex[embeddings]'

The [embeddings] extra installs sentence-transformers for local embedding generation (no API key required).

Step 3: Configure the database connection

Create a .env file in your project directory. CocoIndex loads it automatically:

COCOINDEX_DATABASE_URL=postgresql://cocoindex:cocoindex@localhost:5432/cocoindex

Info

CocoIndex uses python-dotenv and loads .env from the current directory. The .env value takes precedence over shell environment variables.

Step 4: Define the pipeline

Create main.py:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files"))

    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        with doc["chunks"].row() as chunk:
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"
                )
            )
            doc_embeddings.collect(
                filename=doc["filename"],
                location=chunk["location"],
                text=chunk["text"],
                embedding=chunk["embedding"],
            )

    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
            )
        ],
    )
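
To build intuition for the chunk_size and chunk_overlap arguments above, here is a deliberately simplified sketch. It is not how SplitRecursively works internally (the real function chooses its own split boundaries); it cuts fixed-size character windows so that neighbouring chunks share chunk_overlap characters. The function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows whose neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Illustration only: a 5000-character text with the tutorial's settings.
chunks = sliding_chunks("x" * 5000, chunk_size=2000, chunk_overlap=500)
print(len(chunks))
```

The overlap means text near a chunk boundary appears in two chunks, so a query that matches a passage straddling the boundary can still retrieve it.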

Step 5: Add source files and run

Create a markdown_files/ directory with some markdown content, then build the index:

mkdir markdown_files
# Add your .md files to markdown_files/

cocoindex update main.py

CocoIndex will show the tables it needs to create and ask for confirmation. Type yes to proceed.

Step 6: Query the index

Install psycopg2 for direct database queries:

pip install psycopg2-binary

Create query.py:

import sys
from sentence_transformers import SentenceTransformer
import psycopg2

DB_URL = "postgresql://cocoindex:cocoindex@localhost:5432/cocoindex"

def search(query: str, top_k: int = 3):
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embedding = model.encode(query)
    vec_str = "[" + ",".join(str(x) for x in embedding) + "]"

    conn = psycopg2.connect(DB_URL)
    cur = conn.cursor()
    cur.execute("""
        SELECT filename, left(text, 200),
               1 - (embedding <=> %s::vector) as similarity
        FROM textembedding__doc_embeddings
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (vec_str, vec_str, top_k))

    results = cur.fetchall()
    cur.close()
    conn.close()
    return results

if __name__ == "__main__":
    query = " ".join(sys.argv[1:]) or "What is CocoIndex?"
    print(f"\nQuery: {query}\n")
    for filename, text, score in search(query):
        print(f"Score: {score:.4f} | {filename}")
        print(f"  {text.strip()[:150]}...\n")

Run the query:

python query.py "how do vector embeddings work?"
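
The SQL in query.py relies on pgvector's <=> operator, which returns cosine distance, and reports 1 minus that distance as the similarity score. The pure-Python sketch below shows the same arithmetic on small vectors; the function names are illustrative and nothing here is part of CocoIndex or pgvector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (raises ZeroDivisionError for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """The quantity pgvector's <=> operator computes: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


same = cosine_distance([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # unrelated direction
print(same, orthogonal)
```

Ordering by distance ascending, as the query does, puts the most similar chunks first, and a score close to 1 indicates a chunk whose embedding points in nearly the same direction as the query's.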

Common issues

Table naming

CocoIndex lowercases flow names when creating tables. A flow named TextEmbedding with an export named doc_embeddings creates the table textembedding__doc_embeddings.
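
The rule as described here can be written as a one-line helper, which is handy when a query script needs to derive the table name from the flow definition. This is a sketch based only on the convention stated above; the function name export_table_name is hypothetical, and any custom table name or namespace setting is not accounted for.

```python
def export_table_name(flow_name: str, export_name: str) -> str:
    """Lowercase the flow name and join it to the export name with a double underscore."""
    return f"{flow_name.lower()}__{export_name}"


print(export_table_name("TextEmbedding", "doc_embeddings"))
```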

Docker volume persistence

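If you change Postgres environment variables (user, password) but reuse the same data volume, the old credentials persist, because Postgres only applies them when it initializes an empty data directory. Remove the container together with its volume before recreating it (stop the container first, or add -f):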

docker rm -v cocoindex-postgres

Deprecated APIs

If you see examples using cocoindex.main_fn(), note that this API is no longer available in v0.3.36 and later. Use the cocoindex CLI directly instead:

cocoindex update main.py

Using with Claude Code

If you’re using Claude Code, install the CocoIndex skill for up-to-date API knowledge and workflow support:

/plugin marketplace add cocoindex-io/cocoindex-claude
/plugin install cocoindex-skills@cocoindex

This helps Claude Code generate correct CocoIndex pipeline code and avoid deprecated APIs.

Next steps

  • Explore Query Support for CocoIndex’s built-in query handlers
  • Try other targets like Qdrant, Pinecone, or LanceDB
  • Set up Live Updates for continuous indexing