Skip to main content

Export markdown files to local Html with Custom Targets

View on GitHub

Overview

Let’s walk through a simple example—exporting .md files as .html using a custom file-based target. This project monitors folder changes and continuously converts markdown to HTML incrementally. Check out the full source code.

The overall flow is simple: This example focuses on

  • how to configure your custom target
  • the flow effortless picks up the changes in the source, recomputes only what's changed and export to the target

Ingest files

Ingest a list of markdown files:

@cocoindex.flow_def(name="CustomOutputFiles")
def custom_output_files(
flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
"""
Define an example flow that exports markdown files to HTML files.
"""
data_scope["documents"] = flow_builder.add_source(
cocoindex.sources.LocalFile(path="data", included_patterns=["*.md"]),
refresh_interval=timedelta(seconds=5),
)

This ingestion creates a table with filename and content fields.

Process each file and collect

Define custom function that converts markdown to HTML

@cocoindex.op.function()

def markdown_to_html(text: str) -> str:
return _markdown_it.render(text)

Define data collector and transform each document to html.

output_html = data_scope.add_collector()
with data_scope["documents"].row() as doc:
doc["html"] = doc["content"].transform(markdown_to_html)
output_html.collect(filename=doc["filename"], html=doc["html"])

Define the custom target

Define the target spec

The target spec contains a directory for output files:

class LocalFileTarget(cocoindex.op.TargetSpec):
directory: str

Implement the connector

get_persistent_key() defines the persistent key, which uniquely identifies the target for change tracking and incremental updates. Here, we simply use the target directory as the key (e.g., ./data/output).

@cocoindex.op.target_connector(spec_cls=LocalFileTarget)
class LocalFileTargetConnector:
@staticmethod
def get_persistent_key(spec: LocalFileTarget, target_name: str) -> str:
"""Use the directory path as the persistent key for this target."""
return spec.directory

The describe() method returns a human-readable string that describes the target, which is displayed in the CLI logs. For example, it prints:

Target: Local directory ./data/output

@staticmethod
def describe(key: str) -> str:
"""(Optional) Return a human-readable description of the target."""
return f"Local directory {key}"

apply_setup_change() applies setup changes to the backend. The previous and current specs are passed as arguments, and the method is expected to update the backend setup to match the current state.

A None spec indicates non-existence, so when previous is None, we need to create it, and when current is None, we need to delete it.

@staticmethod
def apply_setup_change(
key: str, previous: LocalFileTarget | None, current: LocalFileTarget | None
) -> None:
"""
Apply setup changes to the target.

Best practice: keep all actions idempotent.
"""

# Create the directory if it didn't exist.
if previous is None and current is not None:
os.makedirs(current.directory, exist_ok=True)

# Delete the directory with its contents if it no longer exists.
if previous is not None and current is None:
if os.path.isdir(previous.directory):
for filename in os.listdir(previous.directory):
if filename.endswith(".html"):
os.remove(os.path.join(previous.directory, filename))
os.rmdir(previous.directory)

The mutate() method is called by CocoIndex to apply data changes to the target, batching mutations to potentially multiple targets of the same type. This allows the target connector flexibility in implementation (e.g., atomic commits, or processing items with dependencies in a specific order).

Each element in the batch corresponds to a specific target and is represented by a tuple containing:

  • the target specification
  • all mutations for the target, represented by a dict mapping primary keys to value fields. Value fields can be represented by a dataclass—LocalFileTargetValues in this case:
@dataclasses.dataclass
class LocalFileTargetValues:
"""Represents value fields of exported data. Used in `mutate` method below."""

html: str

The value type of the dict is LocalFileTargetValues | None, where a non-None value means an upsert and None value means a delete. Similar to apply_setup_changes(), idempotency is expected here.

@staticmethod
def mutate(
*all_mutations: tuple[LocalFileTarget, dict[str, LocalFileTargetValues | None]],
) -> None:
"""
Mutate the target.
"""
for spec, mutations in all_mutations:
for filename, mutation in mutations.items():
full_path = os.path.join(spec.directory, filename) + ".html"
if mutation is None:
# Delete the file
try:
os.remove(full_path)
except FileNotFoundError:
pass
else:
# Create/update the file
with open(full_path, "w") as f:
f.write(mutation.html)

Use it in the Flow

    output_html.export(
"OutputHtml",
LocalFileTarget(directory="output_html"),
primary_key_fields=["filename"],
)

Run the example

Once your pipeline is set up, keeping your knowledge graph updated is simple:

pip install -e .
cocoindex update --setup main.py

You can add, modify, or remove files in the data/ directory — CocoIndex will only reprocess the changed files and update the target accordingly.

For real-time updates, run in live mode:

cocoindex update --setup -L main.py

This keeps your knowledge graph continuously synchronized with your document source — perfect for fast-changing environments like internal wikis or technical documentation.

Best Practices

  • Idempotency matters: apply_setup_change() and mutate() should be safe to run multiple times without unintended effects.
  • Prepare once, mutate many: If you need setup (such as establishing a connection), use prepare() to avoid repeating work.
  • Use structured types: For primary keys or values, CocoIndex supports simple types as well as dataclasses and NamedTuples.

Why Custom Targets?

Integration with internal system

Sometimes there may be an internal/homegrown tool or API (e.g. within a company) that's not publicly available. These can only be connected through custom targets.

Faster adoption of new export logic

When a new tool, database, or API joins your stack, simply define a Target Spec and Target Connector — start exporting right away, with no pipeline refactoring required.