# CocoIndex quickstart

> **CocoIndex v1.** This page documents CocoIndex **v1** — a ground-up redesign from v0. When writing code, ignore any v0 flow-builder DSL or deprecated decorators.
>
> Source: https://cocoindex.io/docs/getting_started/quickstart/ · Docs index: https://cocoindex.io/docs/llms.txt · Agent skill: https://cocoindex.io/docs/skill.md
>
> v0→v1 quick map — if you reach for these v0 symbols, stop and use the v1 form: `@cocoindex.flow_def`/`FlowBuilder` → `coco.App` + a `@coco.fn` main function; `add_collector()`/`collect()`/`export()` → declare target states (`declare_row`, `declare_file`); `cocoindex.sources/functions/targets.*` → connector APIs (`localfs.walk_dir`, `coco.ops.*`, `postgres.declare_table_target`). Full mapping + API reference: https://cocoindex.io/docs/skill.md.

In this tutorial, we'll build a simple app that converts PDF files to Markdown and saves them to a local directory.

## Overview

1. Read PDF files from a local directory
2. Convert each file to Markdown using Docling
3. Save the Markdown files to an output directory (as **target states**)

You write the transformation logic in plain Python and leave change tracking to CocoIndex.

Think: **target_state = transformation(source_state)**

When your source data or your processing logic changes (for example, you switch parsers or tweak conversion settings), CocoIndex processes incrementally: it reprocesses only what is affected and keeps your Markdown files up to date.
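
The relation above can be pictured with a small plain-Python sketch. It does not use the CocoIndex API, and the real engine differs in detail; `transformation`, `desired_targets`, and `reconcile` are hypothetical names used only for illustration. The idea is to derive the desired targets from the current sources, then compare them with what already exists to find what to write and what to delete.

```python
def transformation(source_text: str) -> str:
    """Stand-in for the PDF-to-Markdown conversion (hypothetical)."""
    return "# " + source_text.strip()


def desired_targets(sources: dict[str, str]) -> dict[str, str]:
    """Apply target_state = transformation(source_state) to each source."""
    return {name + ".md": transformation(text) for name, text in sources.items()}


def reconcile(
    existing: dict[str, str], desired: dict[str, str]
) -> tuple[dict[str, str], set[str]]:
    """Return (targets to write, names of targets to delete)."""
    # Write targets that are new or whose content differs from what exists.
    to_write = {k: v for k, v in desired.items() if existing.get(k) != v}
    # Delete targets whose source is no longer present.
    to_delete = set(existing) - set(desired)
    return to_write, to_delete
```

Calling `reconcile(existing, desired_targets(sources))` with an unchanged source, a new source, and a stale target yields one write and one delete; the unchanged target is left alone.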

## Setup

1. Install CocoIndex (see [Installation](./installation) for other package managers) and the Docling dependency:

    ```bash
    pip install -U cocoindex docling
    ```

2. Create a new directory for your project:

    ```bash
    mkdir cocoindex-quickstart
    cd cocoindex-quickstart
    ```

3. Create a `pdf_files/` directory and add your PDF files:

    ```bash
    mkdir pdf_files
    ```
    You can download sample PDF files from the [git repo](https://github.com/cocoindex-io/cocoindex/tree/main/examples/pdf_to_markdown).

4. Create a `.env` file to configure the database path:

    ```bash
    echo "COCOINDEX_DB=./cocoindex.db" > .env
    ```

## Define the app

At a high level, the app has three layers:

1. **App** — binds the pipeline function to concrete input and output paths
2. **Main function** — finds PDF files and mounts one processing component per file
3. **File processing** — converts one PDF to Markdown and declares the output file

We'll write the code in the opposite order so each Python symbol exists before it is referenced: the processing functions first, then the App that wires them together.

Create a new file `main.py`.

### Define file processing

This function converts a single PDF to Markdown:

```python title="main.py"
import pathlib

import cocoindex as coco
from cocoindex.connectors import localfs
from cocoindex.resources.file import PatternFilePathMatcher
from docling.datamodel.accelerator_options import AcceleratorDevice, AcceleratorOptions
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

_pipeline_options = PdfPipelineOptions(
    accelerator_options=AcceleratorOptions(device=AcceleratorDevice.CPU)
)
_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=_pipeline_options)
    }
)

@coco.fn(memo=True)
def process_file(
    file: localfs.File,
    outdir: pathlib.Path,
) -> None:
    markdown = _converter.convert(
        file.file_path.resolve()
    ).document.export_to_markdown()
    outname = file.file_path.path.stem + ".md"
    localfs.declare_file(outdir / outname, markdown, create_parent_dirs=True)
```

- **`localfs.File`** — A file object returned by `localfs.walk_dir()`, implementing the [`FileLike`](../common_resources/data_types#filelike) base class. See the [localfs connector](../connectors/localfs) for full details.
- **`memo=True`** — Caches results; unchanged files are skipped on re-runs
- **`localfs.declare_file()`** — Declares a file [target state](../programming_guide/target_state); auto-deleted if source is removed. See [localfs as target](../connectors/localfs#as-target) for the full API.
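
To make the idea behind `memo=True` more concrete, here is a minimal sketch of content-based memoization in plain Python. It illustrates the general technique only and is not how CocoIndex implements it: `memoize_by_content`, `convert`, and the in-memory `cache` are hypothetical, and a real cache would need to survive across runs, which a dict does not.

```python
import hashlib
from typing import Callable


def memoize_by_content(fn: Callable[[bytes], str]) -> Callable[[bytes], str]:
    """Cache fn's result, keyed by a SHA-256 fingerprint of the input bytes."""
    cache: dict[str, str] = {}

    def wrapper(content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key not in cache:
            cache[key] = fn(content)  # runs only for content not seen before
        return cache[key]

    return wrapper


calls: list[bytes] = []  # records each real invocation, for demonstration


@memoize_by_content
def convert(content: bytes) -> str:
    """Stand-in for an expensive conversion (hypothetical)."""
    calls.append(content)
    return content.decode("utf-8").upper()
```

Calling `convert` twice with the same bytes invokes the underlying function once. A production-grade version would also have to invalidate entries when the function's own logic changes, which this sketch ignores.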

### Define the main function

```python title="main.py"
@coco.fn
async def app_main(sourcedir: pathlib.Path, outdir: pathlib.Path) -> None:
    files = localfs.walk_dir(
        sourcedir,
        recursive=True,
        path_matcher=PatternFilePathMatcher(included_patterns=["**/*.pdf"]),
    )
    await coco.mount_each(process_file, files.items(), outdir)
```

`mount_each()` mounts one processing component per file. Each item from `files.items()` is a `(key, file)` pair — the key (the file's relative path) becomes the component subpath automatically.
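
As a rough illustration of the `(key, file)` pairs, the following standard-library sketch walks a directory and yields each PDF's relative path as its key. It only mimics the shape described above: `iter_pdf_items` and `colliding_output_names` are hypothetical helpers, not part of `localfs`, and the real key type may differ. It also shows one thing worth checking in your own data. Assuming `file.file_path.path` behaves like a `pathlib` path, `process_file` names each output by `stem` alone, so two PDFs with the same file name in different subdirectories would map to the same output name.

```python
import pathlib
from collections import Counter
from typing import Iterator


def iter_pdf_items(sourcedir: pathlib.Path) -> Iterator[tuple[str, pathlib.Path]]:
    """Yield (relative path as key, full path) for every PDF under sourcedir."""
    for path in sorted(sourcedir.rglob("*.pdf")):
        if path.is_file():
            yield path.relative_to(sourcedir).as_posix(), path


def colliding_output_names(keys: list[str]) -> list[str]:
    """Return output names that more than one key maps to via stem + '.md'."""
    counts = Counter(pathlib.PurePosixPath(key).stem + ".md" for key in keys)
    return sorted(name for name, n in counts.items() if n > 1)
```

The keys stay unique because they include the subdirectory, while the stem-based output names may not be. If `colliding_output_names` returns anything for your input directory, consider deriving the output name from the full relative path instead.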

It's up to you to pick the process granularity — it can be at directory level, at file level, or at page level. In this example, because we want to independently convert each file to Markdown, the file level is the most natural choice.

### Create the App

```python title="main.py"
app = coco.App(
    "PdfToMarkdown",
    app_main,
    sourcedir=pathlib.Path("./pdf_files"),
    outdir=pathlib.Path("./out"),
)
```
This defines a CocoIndex App, the top-level runnable unit in CocoIndex. It binds the main function to its arguments.

## Run the pipeline

From the project directory, run:

```bash
cocoindex update main.py
```

CocoIndex will:

1. Create the `out/` directory
2. Convert each PDF in `pdf_files/` to Markdown in `out/`

Check the output:

```bash
ls out/
# example.md (one .md file for each input PDF)
```
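
If you prefer to verify the result programmatically, a small standard-library check can list the PDFs that have no matching Markdown file. This is a sketch under the assumption that outputs are named `<stem>.md` directly under the output directory, as in `process_file` above; `missing_outputs` is a hypothetical helper, not a CocoIndex function.

```python
import pathlib


def missing_outputs(sourcedir: pathlib.Path, outdir: pathlib.Path) -> list[str]:
    """Return relative paths of PDFs that have no <stem>.md file in outdir."""
    missing = []
    for pdf in sorted(sourcedir.rglob("*.pdf")):
        if not (outdir / (pdf.stem + ".md")).is_file():
            missing.append(pdf.relative_to(sourcedir).as_posix())
    return missing
```

For this tutorial's layout you would call `missing_outputs(pathlib.Path("./pdf_files"), pathlib.Path("./out"))`; an empty list means every PDF has an output file.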

## Incremental updates

CocoIndex's main strength is **incremental processing**. Try these scenarios:

**Add a new file:**

Add a new PDF to `pdf_files/`, then run:

```bash
cocoindex update main.py
```

Only the new file is processed.

**Modify a file:**

Replace a PDF in `pdf_files/` with an updated version, then run:

```bash
cocoindex update main.py
```

Only the changed file is reprocessed.

**Delete a file:**

```bash
rm pdf_files/example.pdf
cocoindex update main.py
```

The corresponding Markdown file is automatically removed.

## Next steps

- Read [Core Concepts](../programming_guide/core_concepts) to understand the mental model — state-driven programming, processing components, and memoization
- Dive into the [Programming Guide](../programming_guide/app), starting with Apps, to learn how to build more complex pipelines
- Browse more [examples](https://github.com/cocoindex-io/cocoindex/tree/main/examples) for real-world patterns (text embedding, RAG, knowledge graphs)
