LocalFile source

Point at a directory, set glob include and exclude patterns, optionally enable watch_changes, and CocoIndex turns the filesystem into a KTable of filename and content.

Language
Python 3.11+
Version
v 0.3.37
Last reviewed
Feb 17, 2026

The LocalFile source imports files from a local file system.

Spec

The spec takes the following fields:

  • path (str): full path of the root directory to import files from

  • binary (bool, optional): whether reading files as binary (instead of text)

  • included_patterns (list[str], optional): a list of glob patterns to include files, e.g. ["*.txt", "docs/**/*.md"]. If not specified, all files will be included.

  • excluded_patterns (list[str], optional): a list of glob patterns to exclude files, e.g. ["tmp", "**/node_modules"]. Any file or directory matching these patterns will be excluded even if they match included_patterns. If not specified, no files will be excluded.

    i
    Info

    included_patterns and excluded_patterns are using Unix-style glob syntax. See globset syntax for the details.

  • max_file_size (int, optional): if provided, files exceeding this size in bytes will be treated as non-existent and skipped during processing. This is useful to avoid processing large files that are not relevant to your use case, such as videos or backups. If not specified, no size limit is applied.

  • watch_changes (bool, optional): if True, the source watches the file system for changes and automatically triggers updates when files are added, modified, or deleted. Defaults to False.

Schema

The output is a KTable with the following sub fields:

  • filename (Str, key): the filename of the file, including the path, relative to the root directory, e.g. "dir1/file1.md"
  • content (Str if binary is False, Bytes otherwise): the content of the file
CocoIndex Docs Edit this page Report issue