Custom functions

Define a custom function as a standalone Python function for simple cases, or as a spec paired with an executor when you need caching, GPU resources, or per-operation setup.

Language: Python 3.11+
Version: v0.3.37
Last reviewed: Dec 22, 2025

A custom function can be defined in one of the following ways:

  • A standalone function. This is simpler, but doesn’t allow additional configuration or setup logic.
  • A function spec and an executor. This is more powerful and allows additional configuration and setup logic.

Option 1: By a standalone function

This option fits simple cases where the function needs neither additional configuration nor extra setup logic.

Examples

The cocoindex repository contains the following examples of custom functions defined in this way:

  • In the code_embedding example, extract_extension is a custom function to extract the extension of a file name.
  • In the manuals_llm_extraction example, summarize_manuals is a custom function to summarize structured information of a manual page.
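As a rough sketch of the standalone style, the body of a function like extract_extension could look like the following. The decorator is left out so that the snippet runs as plain Python without the package installed. In a real flow the function would be registered with CocoIndex's function decorator (assumed here to be `@cocoindex.op.function()`), and the actual implementation in the code_embedding example may differ.

```python
import os


# Assumption: in a real flow this function would carry CocoIndex's function
# decorator (e.g. `@cocoindex.op.function()`). It is omitted here so the
# sketch runs as plain Python.
def extract_extension(filename: str) -> str:
    """Return the extension of a file name, including the leading dot."""
    return os.path.splitext(filename)[1]


print(extract_extension("src/main.py"))
```

The type annotations on the argument and the return value matter in this style, since they are the only description of the function's inputs and output.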

Option 2: By a function spec and an executor

This is a more advanced and flexible way to define a custom function. The function can be configured through the function spec, and the executor can run preparation logic before execution, e.g. initializing a model based on the spec.

Function Spec

The function spec configures the behavior of a specific instance of the function. When you use the function in a flow (typically through transform()), you instantiate the function spec with specific parameter values.

Function Executor

A function executor defines the behavior of a function. It’s instantiated once for each operation that uses the function.

The function executor is responsible for:

  • Preparing for execution, based on the spec. This happens exactly once, before any execution. For example, if the function calls a machine learning model, the model name can be a field of the spec, and the model can be loaded in this phase.
  • Running the function for each set of input arguments. This happens multiple times, once for each row of data.
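The two responsibilities above can be sketched in plain Python as follows. WordCountSpec, WordCountExecutor, and the min_length parameter are hypothetical names made up for illustration. In CocoIndex itself, the spec would derive from the library's spec base class and the executor would be registered through the library's executor decorator (assumed here to be `cocoindex.op.FunctionSpec` and `@cocoindex.op.executor_class()`); both are omitted so that the snippet runs on its own.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordCountSpec:
    """Hypothetical spec: configuration for one use of the function."""
    min_length: int = 1


class WordCountExecutor:
    """Hypothetical executor: one instance per operation using the function."""

    def __init__(self, spec: WordCountSpec) -> None:
        self.spec = spec
        self.prepare_calls = 0
        self._pattern: Optional[re.Pattern] = None

    def prepare(self) -> None:
        # One-time setup driven by the spec (a stand-in for loading a model).
        self.prepare_calls += 1
        self._pattern = re.compile(r"\w{%d,}" % self.spec.min_length)

    def __call__(self, text: str) -> int:
        # Runs once per row of data and reuses the state built in prepare().
        if self._pattern is None:
            raise RuntimeError("prepare() must run before the first call")
        return len(self._pattern.findall(text))


executor = WordCountExecutor(WordCountSpec(min_length=4))
executor.prepare()  # once, before any row is processed
rows = ["a quick brown fox", "hi", "lazy dogs sleep"]
counts = [executor(row) for row in rows]
print(counts)  # [2, 0, 3]
```

Keeping expensive setup in prepare() rather than in `__call__` means that its cost is paid once per operation instead of once per row.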

Examples

The cocoindex repository contains the following examples of custom functions defined in this way:

  • In the pdf_embedding example, we define a custom function PdfToMarkdown.
  • The SentenceTransformerEmbed function shipped with the CocoIndex Python package is defined with the Python SDK. Search for SentenceTransformerEmbedExecutor to see the code.

Parameters for custom functions

Custom functions take the following additional parameters:

  • gpu: bool: Whether the executor uses a GPU. This affects how the function is scheduled.

  • cache: bool: Whether the executor enables caching for this function. When True, the executor caches the result of the function for reuse during reprocessing. We recommend setting this to True for any function that is computationally intensive.

  • batching: bool: Whether the executor will consume requests in batch. See the Batching section below for details.

  • max_batch_size: int | None: The maximum batch size for the executor.

  • timeout: datetime.timedelta | None: Timeout for this function execution. None means use the default (1800 seconds).

  • behavior_version: int: The version of the function’s behavior. When the version changes, the function is re-executed even if cache is enabled. It must be set if cache is True.

  • arg_relationship: tuple[ArgRelationship, str]: Specifies the relationship between an input argument and the output. For example, (ArgRelationship.CHUNKS_BASE_TEXT, "content") means the output is chunks of the text represented by the input argument named content. This provides metadata for tools such as CocoInsight. The following relationships are currently supported:

    • ArgRelationship.CHUNKS_BASE_TEXT: The output is chunks of the text represented by the input argument. The output is expected to be a Table in which each row represents a text chunk and the first column has type Range, representing the range of the text chunk.
    • ArgRelationship.EMBEDDING_ORIGIN_TEXT: The output is an embedding vector for the text represented by the input argument. The output is expected to be a Vector.
    • ArgRelationship.RECTS_BASE_IMAGE: The output is rectangles within the image represented by the input argument. The output is expected to be a Table in which each row represents a rectangle and the first column has type Struct, with fields min_x, min_y, max_x, max_y representing the coordinates of the rectangle.

For example:
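A toy model of how cache and behavior_version interact, written in plain Python. The toy_function decorator and the in-memory _STORE are hypothetical stand-ins and not the CocoIndex implementation. They only illustrate the two rules stated above: behavior_version is required when cache is True, and changing it forces re-execution.

```python
from typing import Callable, Optional

# Hypothetical cache shared across "runs", keyed by name, version and input.
_STORE: dict = {}
calls: list = []


def toy_function(*, cache: bool = False, behavior_version: Optional[int] = None):
    """Toy stand-in for a function decorator with cache parameters."""
    if cache and behavior_version is None:
        raise ValueError("behavior_version must be set when cache is True")

    def decorate(fn: Callable[[str], str]) -> Callable[[str], str]:
        def wrapper(arg: str) -> str:
            if not cache:
                return fn(arg)
            key = (fn.__name__, behavior_version, arg)
            if key not in _STORE:
                _STORE[key] = fn(arg)  # computed only on a cache miss
            return _STORE[key]
        return wrapper
    return decorate


@toy_function(cache=True, behavior_version=1)
def summarize(text: str) -> str:
    calls.append("v1")
    return text[:5]


first = summarize("hello world")
second = summarize("hello world")  # cache hit: the body does not run again


# The logic changed, so the version is bumped to invalidate old results.
@toy_function(cache=True, behavior_version=2)
def summarize(text: str) -> str:
    calls.append("v2")
    return text[:5].upper()


third = summarize("hello world")  # version changed: recomputed
print(first, second, third, calls)
```

If the version were left at 1 after the logic change, the toy cache would keep returning the stale "hello" result, which is the situation behavior_version exists to prevent.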

Batching

Batching lets a function executor process multiple function calls together. This is sometimes more efficient than processing them one by one, e.g. when running inference on a GPU or calling remote APIs with quota limits.

Batching can be enabled by setting the batching parameter to True in the custom function parameters. Once it’s set to True, the types of the argument and the return value must both be lists. Currently, batching is only supported for functions that take a single argument.
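The shape of a batched function can be sketched in plain Python as below. text_lengths, run_in_batches, and seen_batch_sizes are hypothetical names, and the scheduler is a toy, not how CocoIndex groups calls internally. The sketch also assumes that the returned list has one result per input, in the same order.

```python
from typing import Callable, Optional

seen_batch_sizes: list = []


def text_lengths(texts: list) -> list:
    """Batched function: a single list argument in, a list of results out."""
    seen_batch_sizes.append(len(texts))
    return [len(t) for t in texts]


def run_in_batches(
    fn: Callable[[list], list],
    inputs: list,
    max_batch_size: Optional[int] = None,
) -> list:
    """Toy scheduler: split pending calls into batches and flatten the results."""
    size = max_batch_size or len(inputs) or 1
    results: list = []
    for start in range(0, len(inputs), size):
        batch = inputs[start:start + size]
        outputs = fn(batch)
        # Assumption: one output per input, in the same order.
        if len(outputs) != len(batch):
            raise ValueError("batched function must return one result per input")
        results.extend(outputs)
    return results


results = run_in_batches(text_lengths, ["a", "bb", "ccc", "dddd", "eeeee"], max_batch_size=2)
print(results)           # [1, 2, 3, 4, 5]
print(seen_batch_sizes)  # [2, 2, 1]
```

With max_batch_size set to 2, the five pending calls are served by three invocations of the function instead of five.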
