Docker + pgvector setup

A step-by-step Docker setup for pgvector, a Python virtualenv, a text-embedding flow, and a semantic-search query, with the common port and extension gotchas called out.

Time: ~30 minutes
Language: Python 3.11+
Requires: Docker
Version: v 0.3.37
Last reviewed: Mar 18, 2026

This tutorial walks through setting up CocoIndex with Docker-based PostgreSQL and pgvector, building a text embedding pipeline, and querying it with semantic search. It covers common gotchas and is written to be easy for both humans and AI coding assistants to follow.

Prerequisites

Python 3.11+
Docker

Step 1: Start PostgreSQL with pgvector

CocoIndex requires the vector PostgreSQL extension for embedding storage and HNSW indexes. You must use a pgvector-enabled image, not plain postgres.

Using the project’s docker compose config:

docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml) up -d

Or manually:

docker run -d --name cocoindex-postgres \
  -e POSTGRES_USER=cocoindex \
  -e POSTGRES_PASSWORD=cocoindex \
  -e POSTGRES_DB=cocoindex \
  -p 5432:5432 \
  pgvector/pgvector:pg17

Use pgvector image

Using plain postgres:16 or postgres:17 will fail with extension "vector" is not available when CocoIndex tries to create the vector index.

Port conflict tip: If you get unexpected “password authentication failed” errors, check that no other process (such as an SSH tunnel) is listening on your chosen port:

lsof -i :5432

You should only see Docker’s process. If another process is listed, choose a different port (e.g., -p 5450:5432).

Running alongside other PostgreSQL instances

If port 5432 is already in use, map to a different host port:

docker run -d --name cocoindex-postgres \
  -e POSTGRES_USER=cocoindex \
  -e POSTGRES_PASSWORD=cocoindex \
  -e POSTGRES_DB=cocoindex \
  -p 5450:5432 \
  pgvector/pgvector:pg17

Then adjust the port in your database URL accordingly.

Step 2: Create a Python environment

mkdir cocoindex-quickstart && cd cocoindex-quickstart
python3 -m venv .venv
source .venv/bin/activate
pip install -U 'cocoindex[embeddings]'

The [embeddings] extra installs sentence-transformers for local embedding generation (no API key required).

Step 3: Configure the database connection

Create a .env file in your project directory. CocoIndex loads it automatically:

COCOINDEX_DATABASE_URL=postgresql://cocoindex:cocoindex@localhost:5432/cocoindex

Info

CocoIndex uses python-dotenv and loads .env from the current directory. The .env value takes precedence over shell environment variables.

Step 4: Define the pipeline

Create main.py:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files"))

    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        with doc["chunks"].row() as chunk:
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"
                )
            )
            doc_embeddings.collect(
                filename=doc["filename"],
                location=chunk["location"],
                text=chunk["text"],
                embedding=chunk["embedding"],
            )

    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
            )
        ],
    )

Step 5: Add source files and run

Create a markdown_files/ directory with some markdown content, then build the index:

mkdir markdown_files
# Add your .md files to markdown_files/

cocoindex update main.py

CocoIndex will show the tables it needs to create and ask for confirmation. Type yes to proceed.

Step 6: Query with semantic search

Install psycopg2 for direct database queries:

pip install psycopg2-binary

Create query.py:

import sys
from sentence_transformers import SentenceTransformer
import psycopg2

DB_URL = "postgresql://cocoindex:cocoindex@localhost:5432/cocoindex"

def search(query: str, top_k: int = 3):
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    embedding = model.encode(query)
    vec_str = "[" + ",".join(str(x) for x in embedding) + "]"

    conn = psycopg2.connect(DB_URL)
    cur = conn.cursor()
    cur.execute("""
        SELECT filename, left(text, 200),
               1 - (embedding <=> %s::vector) as similarity
        FROM textembedding__doc_embeddings
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (vec_str, vec_str, top_k))

    results = cur.fetchall()
    cur.close()
    conn.close()
    return results

if __name__ == "__main__":
    query = " ".join(sys.argv[1:]) or "What is CocoIndex?"
    print(f"\nQuery: {query}\n")
    for filename, text, score in search(query):
        print(f"Score: {score:.4f} | {filename}")
        print(f"  {text.strip()[:150]}...\n")

python query.py "how do vector embeddings work?"

Common issues

Table naming

CocoIndex lowercases flow names when creating tables. A flow named TextEmbedding with an export named doc_embeddings creates the table textembedding__doc_embeddings.

Docker volume persistence

If you change Postgres environment variables (user, password) but reuse the same container volume, the old credentials persist. Remove the volume when recreating:

docker rm -v cocoindex-postgres

Deprecated APIs

If you see examples using cocoindex.main_fn(), that API was removed in v0.3.36+. Use the cocoindex CLI directly instead:

cocoindex update main.py

Using with Claude Code

If you’re using Claude Code, install the CocoIndex skill for up-to-date API knowledge and workflow support:

/plugin marketplace add cocoindex-io/cocoindex-claude
/plugin install cocoindex-skills@cocoindex

This helps Claude Code generate correct CocoIndex pipeline code and avoid deprecated APIs.

Next steps

Explore Query Support for CocoIndex’s built-in query handlers
Try other targets like Qdrant, Pinecone, or LanceDB
Set up Live Updates for continuous indexing