# Built-in operations

> **CocoIndex v1.** This page documents CocoIndex **v1** — a ground-up redesign from v0. When writing code, ignore any v0 flow-builder DSL or deprecated decorators.
>
> Source: https://cocoindex.io/docs/ops/ · Docs index: https://cocoindex.io/docs/llms.txt · Agent skill: https://cocoindex.io/docs/skill.md
>
> v0→v1 quick map — if you reach for these v0 symbols, stop and use the v1 form: `@cocoindex.flow_def`/`FlowBuilder` → `coco.App` + a `@coco.fn` main function; `add_collector()`/`collect()`/`export()` → declare target states (`declare_row`, `declare_file`); `cocoindex.sources/functions/targets.*` → connector APIs (`localfs.walk_dir`, `coco.ops.*`, `postgres.declare_table_target`). Full mapping + API reference: https://cocoindex.io/docs/skill.md.

CocoIndex ships a set of built-in operations under the `cocoindex.ops` package.
Each module is independently importable and composes with the rest of a pipeline.

```python
from cocoindex.ops import text, litellm, sentence_transformers, entity_resolution
```

## Operation modules

| Module | What it provides |
|---|---|
| [Text operations](/docs/ops/text/) | Code language detection, regex-based `SeparatorSplitter`, and syntax-aware `RecursiveSplitter` (tree-sitter) returning position-tracked [`Chunk`](/docs/common_resources/data_types/#chunk)s. |
| [Sentence Transformers](/docs/ops/sentence_transformers/) | Local text embeddings via `sentence-transformers`, with model caching, thread-safe GPU access, and optional normalization. |
| [LiteLLM](/docs/ops/litellm/) | Embeddings and audio transcription through LiteLLM's unified API across 100+ providers (OpenAI, Azure, Vertex AI, Bedrock, Cohere, and more). |
| [Entity resolution](/docs/ops/entity_resolution/) | Deduplicate entity names via FAISS embedding similarity plus a pluggable LLM pair-resolver, with PINNED / PREFERRED canonical policies. |
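Entity resolution has two stages. Embedding similarity proposes candidate pairs, and a pluggable resolver (an LLM, in CocoIndex) confirms or rejects each pair. The sketch below shows that shape in plain Python under loud assumptions. It uses brute-force cosine similarity instead of FAISS and a caller-supplied `same_entity` callback instead of an LLM, and it groups confirmed pairs with union-find. `resolve_entities` and its canonical-name rule (the first-listed member of each group) are hypothetical. The sketch does not model the PINNED / PREFERRED policies or CocoIndex's actual API.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def resolve_entities(
    names: list[str],
    embed: Callable[[str], list[float]],
    same_entity: Callable[[str, str], bool],
    threshold: float = 0.9,
) -> dict[str, str]:
    """Hypothetical helper: map each name to a canonical name (brute-force, not FAISS)."""
    vectors = [embed(n) for n in names]
    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Embedding similarity proposes the pair; the resolver callback confirms it.
            if cosine(vectors[i], vectors[j]) >= threshold and same_entity(names[i], names[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root so each group's root is its first-listed name.
                    parent[max(ri, rj)] = min(ri, rj)

    return {name: names[find(i)] for i, name in enumerate(names)}


# Toy embeddings; a real pipeline would call an embedding model here.
vecs = {"IBM": [1.0, 0.0], "I.B.M.": [0.99, 0.1], "Apple": [0.0, 1.0]}
print(resolve_entities(list(vecs), vecs.__getitem__, lambda a, b: True))
# → {'IBM': 'IBM', 'I.B.M.': 'IBM', 'Apple': 'Apple'}
```

Separating cheap candidate generation from the expensive pair check means the resolver only sees pairs that are already close in embedding space.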

## Most common: chunking code and text

The most-used built-in is `RecursiveSplitter`. It does syntax-aware splitting that
respects language structure (functions, classes, blocks) via tree-sitter, falling
back to separator-based splitting for unsupported languages.

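```python
from pathlib import Path

from cocoindex.ops.text import RecursiveSplitter

# Text to split; "example.py" is a placeholder path.
python_code = Path("example.py").read_text()

splitter = RecursiveSplitter()

chunks = splitter.split(
    python_code,
    chunk_size=1000,      # target maximum chunk size
    min_chunk_size=300,   # avoid emitting very small chunks
    chunk_overlap=300,    # maximum overlap between adjacent chunks
    language="python",    # selects the tree-sitter grammar
)
```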

The splitter tries to keep each output chunk between `min_chunk_size` and `chunk_size`. It
splits at the highest-level boundary that fits (for example, between top-level functions or
classes) and descends to finer boundaries only when a piece is still larger than `chunk_size`.
See [Text operations](/docs/ops/text/) for the full parameter reference, supported languages,
and custom-language configuration.

## Embeddings

Two interchangeable embedding backends are available. Both provide a
`VectorSchemaProvider`, so connectors configure vector columns automatically:

- [Sentence Transformers](/docs/ops/sentence_transformers/) runs models locally and needs no API key.
- [LiteLLM](/docs/ops/litellm/) calls hosted providers through a single API.
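The design idea behind the shared schema provider is that a connector asks the embedder for its vector shape instead of hard-coding it, so swapping backends does not require editing the table definition. The sketch below illustrates only that idea. `ProvidesVectorDim`, `LocalEmbedder`, `HostedEmbedder`, and `vector_column_ddl` are hypothetical names that do not reproduce CocoIndex's actual `VectorSchemaProvider` interface, and the dimensions are illustrative values.

```python
from typing import Protocol


class ProvidesVectorDim(Protocol):
    """Hypothetical stand-in for a schema provider: reports the embedding size."""

    def vector_dim(self) -> int: ...


class LocalEmbedder:
    def vector_dim(self) -> int:
        return 384  # illustrative dimension for a small local model


class HostedEmbedder:
    def vector_dim(self) -> int:
        return 1536  # illustrative dimension for a hosted model


def vector_column_ddl(name: str, embedder: ProvidesVectorDim) -> str:
    # The (hypothetical) connector derives the column type from the embedder,
    # using pgvector-style syntax, instead of hard-coding the dimension.
    return f"{name} vector({embedder.vector_dim()})"


print(vector_column_ddl("embedding", LocalEmbedder()))   # → embedding vector(384)
print(vector_column_ddl("embedding", HostedEmbedder()))  # → embedding vector(1536)
```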
