Custom Functions

A custom function can be defined in one of the following ways:

A standalone function. This is simpler, but it doesn't allow additional configuration or setup logic.
A function spec and an executor. This is more powerful: it allows additional configuration and setup logic.

Option 1: By a standalone function

This option fits simple cases where the function doesn't need additional configuration or extra setup logic.

Python

The standalone function needs to be decorated with @cocoindex.op.function(), like this:

@cocoindex.op.function(...)
def compute_something(arg1: str, arg2: int | None = None) -> str:
    """
    Documentation for the function.
    """
    ...

Notes:

The cocoindex.op.function() function decorator also takes optional parameters. See Parameters for custom functions for details.
Types of arguments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields. See Data Types for supported types.
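To illustrate why the annotations matter, here is a minimal sketch in plain Python of how type information can be read from a function's signature. It does not use CocoIndex and is not meant to show how CocoIndex inspects functions internally. The names file_extension and describe_signature are hypothetical and exist only for this illustration.

```python
import inspect
import os
import typing


def file_extension(filename: str) -> str:
    """Return the extension of a file name, including the leading dot."""
    return os.path.splitext(filename)[1]


def describe_signature(fn) -> dict:
    """Collect the annotated argument and return types of `fn`.

    Hypothetical helper: it only shows what a framework can learn from
    annotations, and raises if any of them are missing.
    """
    hints = typing.get_type_hints(fn)
    missing = [
        name for name in inspect.signature(fn).parameters if name not in hints
    ]
    if missing or "return" not in hints:
        raise TypeError(f"{fn.__name__} is missing type annotations")
    return hints


signature_info = describe_signature(file_extension)
print(signature_info)
```

Without the annotations, describe_signature would have nothing to report, which is the same reason an unannotated custom function cannot tell the framework what its output type is.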

Examples

The cocoindex repository contains the following examples of custom functions defined in this way:

In the code_embedding example, extract_extension is a custom function to extract the extension of a file name.
In the manuals_llm_extraction example, summarize_manuals is a custom function to summarize structured information of a manual page.

Option 2: By a function spec and an executor

This is a more advanced and flexible way to define a custom function. It allows a function to be configured through the function spec, and allows preparation logic to run before execution, e.g. initializing a model based on the spec.

Function Spec

The function spec configures the behavior of a specific instance of the function. When you use the function in a flow (typically via transform()), you instantiate its function spec with specific parameter values.

Python

A function spec is defined as a class that inherits from cocoindex.op.FunctionSpec.

class ComputeSomething(cocoindex.op.FunctionSpec):
    """
    Documentation for the function.
    """
    param1: str
    param2: int | None = None
    ...

Notes:

All fields of the spec must have a type serializable / deserializable by the json module.
All subclasses of FunctionSpec can be instantiated in the same way as a dataclass, i.e. ClassName(param1=value1, param2=value2, ...).
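The two notes above can be sketched with a standard-library dataclass standing in for a spec class. This is an assumption made for illustration: ComputeSomethingSpec below is a plain dataclass, not a real cocoindex.op.FunctionSpec subclass, and the round trip through json only demonstrates what "serializable / deserializable by the json module" means for the field types.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ComputeSomethingSpec:
    """Stand-in for a function spec; NOT a cocoindex.op.FunctionSpec."""

    param1: str
    param2: Optional[int] = None


# Instantiate with keyword arguments, as with any dataclass.
spec = ComputeSomethingSpec(param1="example", param2=3)

# Every field is a JSON-compatible type, so the spec survives a round trip.
encoded = json.dumps(asdict(spec))
restored = ComputeSomethingSpec(**json.loads(encoded))
```

A field holding something like an open file handle or a loaded model would fail at json.dumps, which is why such objects belong in the executor rather than in the spec.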

Function Executor

A function executor defines the behavior of a function. It's instantiated for each operation that uses this function.

The function executor is responsible for:

Preparing for the function execution, based on the spec. This happens exactly once, before execution. For example, if the function calls a machine learning model, the model name can be a field of the spec, and the model can be loaded in this phase.
Running the function for each set of input arguments. This happens multiple times, once for each row of data.

Python

A function executor is defined as a class decorated with @cocoindex.op.executor_class().

@cocoindex.op.executor_class(...)
class ComputeSomethingExecutor:
    spec: ComputeSomething
    ...

    def prepare(self) -> None:
        ...

    def __call__(self, arg1: str, arg2: int | None = None) -> str:
        ...

Notes:

The cocoindex.op.executor_class() class decorator also takes optional parameters. See Parameters for custom functions for details.
A spec field must be present in the class, and must be annotated with the spec class name.
The prepare() method is optional. It's executed exactly once, before any __call__ execution, to prepare for function execution.
The __call__() method is required. It's executed for each row of data. Types of arguments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields. See Data Types for supported types.
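The prepare-once, call-per-row lifecycle described above can be simulated in plain Python. This is only a sketch under stated assumptions: PrefixSpec, PrefixExecutor and run_operation are hypothetical names, the explicit __init__ and the driver function stand in for work that the framework would do, and nothing here uses the CocoIndex API.

```python
from dataclasses import dataclass


@dataclass
class PrefixSpec:
    """Hypothetical spec: the label to put in front of each row."""

    prefix: str


class PrefixExecutor:
    """Hypothetical executor following the prepare() / __call__() shape."""

    spec: PrefixSpec

    def __init__(self, spec: PrefixSpec) -> None:
        # Assumption: in this sketch we attach the spec ourselves.
        self.spec = spec
        self.prepare_calls = 0

    def prepare(self) -> None:
        # One-time setup derived from the spec (e.g. loading a model).
        self.prepare_calls += 1
        self._label = self.spec.prefix.upper()

    def __call__(self, text: str) -> str:
        # Per-row work, reusing what prepare() set up.
        return f"{self._label}: {text}"


def run_operation(executor: PrefixExecutor, rows: list) -> list:
    """Simulated driver: prepare once, then call once per row."""
    executor.prepare()
    return [executor(row) for row in rows]


executor = PrefixExecutor(PrefixSpec(prefix="doc"))
results = run_operation(executor, ["a", "b", "c"])
```

Keeping expensive setup in prepare() means its cost is paid once per operation rather than once per row.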

Examples

The cocoindex repository contains the following examples of custom functions defined in this way:

In the pdf_embedding example, we define a custom function PdfToMarkdown.
The SentenceTransformerEmbed function shipped with the CocoIndex Python package is defined using the Python SDK. Search for SentenceTransformerEmbedExecutor to see the code.

Parameters for custom functions

Custom functions take the following additional parameters:

gpu: bool: Whether the executor will use the GPU. This affects the way the function is scheduled.
cache: bool: Whether the executor will enable caching for this function. When True, the executor will cache the result of the function for reuse during reprocessing. We recommend setting this to True for any function that is computationally intensive.
behavior_version: int: The version of the behavior of the function. When the version changes, the function will be re-executed even if cache is enabled. It must be set if cache is True.

For example:

Python

This enables cache for a standalone function:

@cocoindex.op.function(cache=True, behavior_version=1)
def compute_something(arg1: str, arg2: int | None = None) -> str:
    ...

This enables cache for a function defined by a spec and an executor:

class ComputeSomething(cocoindex.op.FunctionSpec):
    ...

@cocoindex.op.executor_class(cache=True, behavior_version=1)
class ComputeSomethingExecutor:
    spec: ComputeSomething

    ...
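The interaction between cache and behavior_version can be pictured with a small in-memory cache whose key includes the version. This is a conceptual sketch only, not CocoIndex's caching implementation: VersionedCache, get_or_compute and slow_upper are hypothetical names, and the key layout is an assumption chosen to make the effect of a version bump visible.

```python
class VersionedCache:
    """Toy cache keyed by function name, behavior version and arguments."""

    def __init__(self) -> None:
        self._store = {}
        self.misses = 0

    def get_or_compute(self, fn, behavior_version: int, *args):
        key = (fn.__name__, behavior_version, args)
        if key not in self._store:
            # Cache miss: run the function and remember the result.
            self.misses += 1
            self._store[key] = fn(*args)
        return self._store[key]


def slow_upper(text: str) -> str:
    """Placeholder for a computationally intensive function."""
    return text.upper()


cache = VersionedCache()
first = cache.get_or_compute(slow_upper, 1, "hello")   # computed
second = cache.get_or_compute(slow_upper, 1, "hello")  # served from cache
misses_at_v1 = cache.misses
third = cache.get_or_compute(slow_upper, 2, "hello")   # new version: recomputed
```

Bumping the version after changing a function's logic is what prevents results produced by the old logic from being reused.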
