Memoization keys & states

Customize how CocoIndex fingerprints memoized inputs via __coco_memo_key__ or registered key functions, and layer state validation (e.g. mtime, then content hash) on top. Use NotMemoKeyable to opt stateful types out of memoization.

Version
v 1.0.0-alpha48
Last reviewed
Apr 19, 2026

As described in Function — Change detection, CocoIndex detects logic, input, and context changes to decide whether a memo can be reused. Function arguments, deps values, and context values with detect_change=True are all fingerprinted through the same data fingerprinting pipeline. By default, most types are fingerprinted automatically. This page covers how to customize that pipeline — how objects are fingerprinted and validated:

  • Memoization keys — how to control what CocoIndex uses as the fingerprint for your objects.
  • Memo states — how to add post-fingerprint validation to check freshness beyond simple equality.

How data fingerprinting works

For each data value (function argument, deps value, or context value), CocoIndex derives a canonical form with this precedence:

  1. If the object implements __coco_memo_key__(), CocoIndex uses its return value.
  2. Otherwise, if you registered a memo key function for the object’s type, CocoIndex uses that.
  3. Otherwise, CocoIndex falls back to structural canonicalization for a limited set of primitives/containers.

The following types are handled automatically (no custom key needed):

  • Primitives: None, bool, int, float, str, bytes, bytearray, memoryview
  • Containers: list, tuple, dict, set, frozenset (recursively canonicalized)
  • Dataclass instances: all fields included in definition order
  • Pydantic v2 models: all fields included
  • Class objects (type): identified by module and qualified name
  • Other picklable objects: used as a fallback via pickle

The canonical forms are combined into a deterministic fingerprint. If the fingerprint matches a cached entry, the cached result is reused — unless memo states indicate it’s stale (see Memo state validation below).
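
The precedence and fallback rules in this section can be pictured with a small, self-contained sketch. It is not CocoIndex's implementation: `KEY_FUNCTIONS`, `canonicalize`, and `fingerprint` are hypothetical names, only a few primitive and container types are covered, and hashing the `repr` of the canonical form with SHA-256 is an assumption standing in for whatever the real pipeline does.

```python
import hashlib
from typing import Any, Callable

# Hypothetical registry; stands in for what register_memo_key_function fills.
KEY_FUNCTIONS: dict[type, Callable[[Any], object]] = {}


def canonicalize(value: Any) -> object:
    """Toy model of the precedence rules; not the real pipeline."""
    # 1. A __coco_memo_key__ method on the object wins.
    hook = getattr(value, "__coco_memo_key__", None)
    if callable(hook):
        return canonicalize(hook())
    # 2. Next, a key function registered for the object's type.
    for cls in type(value).__mro__:
        if cls in KEY_FUNCTIONS:
            return canonicalize(KEY_FUNCTIONS[cls](value))
    # 3. Finally, structural canonicalization of primitives and containers.
    if value is None or isinstance(value, (bool, int, float, str, bytes)):
        return (type(value).__name__, value)
    if isinstance(value, (list, tuple)):
        return (type(value).__name__, tuple(canonicalize(v) for v in value))
    if isinstance(value, dict):
        pairs = [(canonicalize(k), canonicalize(v)) for k, v in value.items()]
        # Sort so that insertion order does not affect the result.
        return ("dict", tuple(sorted(pairs, key=repr)))
    raise TypeError(f"no canonical form for {type(value).__name__}")


def fingerprint(*values: Any) -> str:
    """Combine canonical forms into one deterministic digest."""
    canonical = tuple(canonicalize(v) for v in values)
    return hashlib.sha256(repr(canonical).encode("utf-8")).hexdigest()


class UserRow:
    def __init__(self, user_id: int, updated_at: int) -> None:
        self.user_id = user_id
        self.updated_at = updated_at

    def __coco_memo_key__(self) -> object:
        return ("users", self.user_id, self.updated_at)


# Same key, same fingerprint; a changed updated_at changes the fingerprint.
unchanged = fingerprint(UserRow(1, 100)) == fingerprint(UserRow(1, 100))
changed = fingerprint(UserRow(1, 100)) != fingerprint(UserRow(1, 101))
```

In this model the fingerprint depends only on what `__coco_memo_key__` returns, so two distinct `UserRow` instances with the same fields are interchangeable for caching purposes.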

Customizing the memoization key

Define __coco_memo_key__ (when you control the type)

Implement a method on your class that returns a stable, deterministic value:

python
class MyType:
    def __coco_memo_key__(self) -> object:
        # Return small primitives / tuples.
        return (...)

Return something that uniquely identifies the semantic content your function depends on:

  • Good: small tuples of primitives, e.g. (stable_id, version)
  • Bad: memory addresses, unstable UUIDs, open file handles, datetime.now(), or large raw payloads

Example — DB row:

python
class UserRow:
    def __init__(self, user_id: int, updated_at: int) -> None:
        self.user_id = user_id
        self.updated_at = updated_at

    def __coco_memo_key__(self) -> object:
        return ("users", self.user_id, self.updated_at)

Register a key function (when you don’t control the type)

If you can’t add __coco_memo_key__ (stdlib / third-party types), register a handler:

python
from pathlib import Path
from cocoindex import register_memo_key_function

def path_key(p: Path) -> object:
    p = p.resolve()
    st = p.stat()
    return (str(p), st.st_mtime_ns, st.st_size)

register_memo_key_function(Path, path_key)

  • Registration is MRO-aware: if you register both a base class and a subclass, the most specific match wins.
  • Your key function must return the same kinds of stable objects as __coco_memo_key__ (small primitives/tuples).
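
To illustrate what "most specific match wins" means, here is a minimal sketch of an MRO-aware lookup. The registry, `register`, `key_for`, and the three resource classes are all hypothetical; the sketch assumes, as the MRO wording suggests, that a subclass with no registration of its own falls back to the handler of its nearest registered base class.

```python
from typing import Any, Callable

# Hypothetical registry; the real one lives inside CocoIndex.
_registry: dict[type, Callable[[Any], object]] = {}


def register(cls: type, key_fn: Callable[[Any], object]) -> None:
    _registry[cls] = key_fn


def key_for(obj: Any) -> object:
    """Walk the MRO from most to least specific and use the first handler."""
    for cls in type(obj).__mro__:
        key_fn = _registry.get(cls)
        if key_fn is not None:
            return key_fn(obj)
    raise TypeError(f"no key function registered for {type(obj).__name__}")


class Resource:
    def __init__(self, name: str) -> None:
        self.name = name


class VersionedResource(Resource):
    def __init__(self, name: str, version: int) -> None:
        super().__init__(name)
        self.version = version


class CachedResource(Resource):
    """Subclass with no handler of its own."""


register(Resource, lambda r: ("resource", r.name))
register(VersionedResource, lambda r: ("versioned", r.name, r.version))

specific_key = key_for(VersionedResource("a", 2))  # subclass handler is used
inherited_key = key_for(CachedResource("b"))       # base-class handler is used
```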

Memo state validation

Sometimes fingerprint matching alone isn’t enough to decide whether a cached result is valid. For example:

  • Multi-level validation: for files, check the modified time first (cheap), and only read the file for a content fingerprint when the time doesn’t match.
  • Async validation: for an S3 object, send a HEAD request to check freshness — an inherently async operation.
  • Stateful validation: for HTTP resources, store the last fetch time and use If-Modified-Since on the next run.

Memo state validation addresses these by letting you attach a state function to your objects. It runs after a fingerprint match, giving you a chance to check freshness before the cached result is reused.

How it works

When CocoIndex finds a fingerprint match, it calls each state function with the stored state from the previous run:

  1. First run (no previous state): prev_state is coco.NON_EXISTENCE. Use coco.is_non_existence(prev_state) to detect this.
  2. Subsequent runs: prev_state is whatever you returned last time.

Your state function returns a coco.MemoStateOutcome(state=..., memo_valid=...):

  • state — the current state value. CocoIndex stores it for the next run.
  • memo_valid (bool, defaults to False) — whether the cached result is still valid.

This decouples “has the state changed?” from “can we reuse the memo?”:

  • MemoStateOutcome(state=new_state) → cache is invalid (default). Function re-executes, new state is stored. On the first run (no previous cache), simply return the initial state without setting memo_valid.
  • MemoStateOutcome(state=same_state, memo_valid=True) → nothing changed, cached result reused, no state update needed.
  • MemoStateOutcome(state=new_state, memo_valid=True) → state changed but cached result is still valid (e.g. mtime changed but content hash unchanged). The new state is persisted so the next run uses the updated state.
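
The three cases can be summarized as a small decision table. The sketch below uses a hypothetical `Outcome` dataclass as a stand-in for `coco.MemoStateOutcome`, and `interpret` is an illustrative helper rather than anything CocoIndex exposes; it only models the two independent questions (reuse the memo? persist a new state?).

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Outcome:
    """Hypothetical stand-in for coco.MemoStateOutcome."""
    state: Any
    memo_valid: bool = False


def interpret(prev_state: Any, outcome: Outcome) -> tuple[str, str]:
    """Return (what happens to the function, what happens to the state)."""
    action = "reuse" if outcome.memo_valid else "re-execute"
    storage = "keep state" if outcome.state == prev_state else "store new state"
    return action, storage


prev = (100, "abc")  # (mtime, content hash) stored by the previous run

# Content changed: memo_valid is left at its default of False.
content_changed = interpret(prev, Outcome(state=(200, "xyz")))
# Nothing changed at all.
nothing_changed = interpret(prev, Outcome(state=prev, memo_valid=True))
# mtime changed but the content hash is the same.
touched_only = interpret(prev, Outcome(state=(200, "abc"), memo_valid=True))
```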

Define __coco_memo_state__ (when you control the type)

Type annotations

Annotate the prev_state parameter with its expected type (matching what you return in MemoStateOutcome(state=...)) so CocoIndex can properly reconstruct stored state values. See Serialization for details on supported types.

Add a __coco_memo_state__ method alongside __coco_memo_key__:

python
import os
import hashlib
from pathlib import Path
import cocoindex as coco

class LocalFile:
    def __init__(self, path: Path) -> None:
        self.path = path

    def __coco_memo_key__(self) -> object:
        # Identity only — which file is it?
        return str(self.path.resolve())

    def __coco_memo_state__(self, prev_state: tuple[int, str] | coco.NonExistenceType) -> coco.MemoStateOutcome:
        st = os.stat(self.path)
        new_mtime = st.st_mtime_ns
        if coco.is_non_existence(prev_state):
            # First run — compute initial state (memo_valid defaults to False,
            # which is fine since there's no previous cache to reuse)
            content_hash = hashlib.sha256(self.path.read_bytes()).hexdigest()
            return coco.MemoStateOutcome(state=(new_mtime, content_hash))

        prev_mtime, prev_hash = prev_state
        if new_mtime == prev_mtime:
            # mtime unchanged — definitely reusable, no content read needed
            return coco.MemoStateOutcome(state=prev_state, memo_valid=True)
        # mtime changed — read content and check hash
        # Use the same hash function as the first run so the values are comparable.
        content_hash = hashlib.sha256(self.path.read_bytes()).hexdigest()
        return coco.MemoStateOutcome(state=(new_mtime, content_hash), memo_valid=content_hash == prev_hash)

Keys vs states for files

Without state validation, you’d include mtime and size directly in the memo key:

python
def __coco_memo_key__(self):
    st = os.stat(self.path)
    return (str(self.path.resolve()), st.st_mtime_ns, st.st_size)

This works for simple cases. State validation becomes useful when you need multi-level checks (e.g. check mtime first, then content hash only if it differs), async operations, or stored metadata like ETags. With the MemoStateOutcome return, you can update the state (e.g. new mtime) without invalidating the cache when the content hasn’t actually changed.

Register a state function (when you don’t control the type)

Pass a state_fn keyword argument to register_memo_key_function. The state function receives the object as its first argument and prev_state as its second. Annotate prev_state with the expected type:

python
from pathlib import Path
import cocoindex as coco
from cocoindex import register_memo_key_function

def path_key(p: Path) -> object:
    return str(p.resolve())

def path_state(p: Path, prev_state: tuple[int, int] | coco.NonExistenceType) -> coco.MemoStateOutcome:
    st = p.stat()
    new_state = (st.st_mtime_ns, st.st_size)
    memo_valid = not coco.is_non_existence(prev_state) and new_state == prev_state
    return coco.MemoStateOutcome(state=new_state, memo_valid=memo_valid)

register_memo_key_function(Path, path_key, state_fn=path_state)

Async state methods

A state method can return an Awaitable. CocoIndex handles this automatically:

  • In an async CocoIndex function: awaitables from all state methods are gathered concurrently.
  • In a sync CocoIndex function: if no event loop is running, CocoIndex uses asyncio.run(). If a loop is already running, it raises an error; switch to an async function or use @coco.fn.as_async.

python
import cocoindex as coco

class S3Object:
    def __init__(self, bucket: str, key: str) -> None:
        self.bucket = bucket
        self.key = key

    def __coco_memo_key__(self) -> object:
        return (self.bucket, self.key)

    async def __coco_memo_state__(self, prev_state: str | coco.NonExistenceType) -> coco.MemoStateOutcome:
        etag = await self._head_object()
        memo_valid = not coco.is_non_existence(prev_state) and etag == prev_state
        return coco.MemoStateOutcome(state=etag, memo_valid=memo_valid)

    async def _head_object(self) -> str:
        ...  # boto3 / aioboto3 HEAD call
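
The sync/async handling described in this section can be approximated with plain asyncio. This is a sketch under assumptions, not CocoIndex's code: `gather_states`, `gather_states_sync`, `head_object`, and `stat_file` are hypothetical names, and each state method is modeled as a zero-argument callable that returns either a value or an awaitable.

```python
import asyncio
import inspect
from typing import Any, Callable


async def gather_states(calls: list[Callable[[], Any]]) -> list[Any]:
    """Async path: run every state call, awaiting the async ones concurrently."""
    async def resolve(call: Callable[[], Any]) -> Any:
        result = call()
        if inspect.isawaitable(result):
            return await result
        return result

    return await asyncio.gather(*(resolve(call) for call in calls))


def gather_states_sync(calls: list[Callable[[], Any]]) -> list[Any]:
    """Sync path: start a loop if none is running, otherwise refuse."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running, so it is safe to start one.
        return asyncio.run(gather_states(calls))
    raise RuntimeError("an event loop is already running; use an async function")


async def head_object() -> str:
    await asyncio.sleep(0)  # placeholder for a network round trip
    return "etag-1"


def stat_file() -> tuple[int, int]:
    return (1700000000, 42)


# Mixed sync and async state calls, resolved from synchronous code.
states = gather_states_sync([head_object, stat_file])
```

Checking for a running loop before invoking any state call means the error path never leaves an un-awaited coroutine behind.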

Preventing memoization

Some types maintain internal state that makes memoization semantically incorrect. For example, a generator that tracks call counts would produce wrong results if memoized.
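
A minimal illustration of the problem, using a plain dictionary as a naive cache (nothing here is CocoIndex API; `CallCounter` and `memoized_next` are hypothetical):

```python
class CallCounter:
    """Stateful object: every call to next_value changes the result."""

    def __init__(self) -> None:
        self._counter = 0

    def next_value(self) -> int:
        self._counter += 1
        return self._counter


_cache: dict[str, int] = {}


def memoized_next(counter: CallCounter, key: str) -> int:
    """Naive memoization keyed on a caller-supplied string."""
    if key not in _cache:
        _cache[key] = counter.next_value()
    return _cache[key]


counter = CallCounter()
first = memoized_next(counter, "counter")   # executes next_value()
second = memoized_next(counter, "counter")  # served from cache, counter not advanced
```

Both calls return the same value even though an unmemoized second call would have advanced the counter, which is the kind of silent mismatch that opting out prevents.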

Inherit from NotMemoKeyable (when you control the type)

python
import cocoindex as coco

class MyStatefulGenerator(coco.NotMemoKeyable):
    def __init__(self) -> None:
        self._counter = 0

    def next_value(self) -> int:
        self._counter += 1
        return self._counter

Register as not memo-keyable (when you don’t control the type)

python
import cocoindex as coco
from some_library import StatefulGenerator

coco.register_not_memo_keyable(StatefulGenerator)

In either case, attempting to use the type as a memo key raises a clear error.

Best practices

  • Keep keys small and deterministic: use identifiers and versions, not full payloads. No id(obj), pointer addresses, or random values.
  • Separate identity from freshness: put stable identifiers (file path, URL, primary key) in the key. Put freshness checks (mtime, ETag, version) in the state.
  • Use state validation for expensive checks: if freshness validation is costly (content hashing, network calls), a state function lets you do it only when the fingerprint matches, and only when a cheap pre-check (mtime) fails.
  • Use MemoStateOutcome(state=new_state, memo_valid=True) for cheap state updates: when a cheap property changes (mtime) but the expensive check (content hash) confirms nothing meaningful changed, return memo_valid=True while updating the state. This avoids re-executing the function and avoids re-checking the expensive property next time.
  • Mark stateful types as NotMemoKeyable: prevent subtle bugs from incorrect memoization of types with side effects.