Skip to main content

A Leap Forward in Security: CocoIndex Joins GitHub Secure Open Source Fund

· 7 min read
Linghua Jin
CocoIndex Maintainer

A Leap Forward in Security: CocoIndex Joins GitHub Secure Open Source Fund

At CocoIndex, we've always believed that the strength of open source stays in its community and the shared goal of building transparent, robust, and reliable infrastructure for everyone. As an data transformation framework that sits between raw data and the AI systems that depend on it, security isn't just a feature -- it's a responsibility.

When developers trust CocoIndex to process sensitive documents, codebases, and enterprise data, we owe them the highest standards. We are honored to be selected to join GitHub Secure Open Source Fund, to formalize our commitment and accelerate our security practices.

What is GitHub's Secure Open Source Fund?

GitHub Secure Open Source Fund

When the Log4j "Log4Shell" vulnerability hit in December 2021, it exposed a harsh truth about modern software: one small, underfunded open-source project can ripple through the entire global tech stack. Most apps today rely on countless dependencies, many maintained by unpaid volunteers, and the security of those components is no longer a “nice-to-have” — it’s a shared responsibility.

Recognizing this, GitHub launched the Secure Open Source (SOS) Fund in late 2024 to strengthen the foundation of open-source security. The program pairs direct funding with a focused three-week security sprint that combines expert guidance, better tooling, and a community of security-conscious maintainers.

In its first two cohorts, 125 maintainers from 71 projects participated, collectively fixing over 1,100 vulnerabilities, disclosing 50+ new CVEs, and resolving nearly 270 secret exposures.

Beyond the numbers, the program built a culture of shared security practice -- participants exchanged knowledge, leveraged AI-assisted tools like GitHub Copilot for security tasks, and developed incident response plans that many have since published for others to reuse.

What is CocoIndex?

CocoIndex Dynamic context engineering for AI.

As AI systems evolve from chatbots to autonomous agents, one fundamental challenge remains unsolved: keeping context accurate and up to date in a world that never stops changing.

Enterprises sit on massive, dynamic datasets — documents, codebases, emails, APIs — yet most data infra is still batch-oriented and blind to change. Every shift in data or logic forces engineers to rebuild from scratch, breaking the feedback loop between real-time world state and AI decision-making. This is where dynamic context engineering comes in: maintaining an AI system's "mental model" of the world — continuously, reliably, and incrementally. CocoIndex is our take on making this simple, with a declarative Python-native data transformation engine designed for AI workloads — and you will never need to handle change.

Think of it as React for data processing.

The Cohort & Thanks to our batchmates

We were honored to be in the program alongside legendary projects from Pandas, Apache Airflow, Fabric.js, PyPI, and others. We heard from teams dealing with similar challenges -- how they handle dependency updates, how they structure security reviews, how they think about trust boundaries.

Open source can be isolating. You're deep in your own codebase, solving your own problems. This program reminded us that security is a shared challenge, and the solutions are better when they're shared too.

Strengthening CocoIndex: Security Improvements Implemented

Strengthening CocoIndex During the program, the CocoIndex team ran a focused security sprint to harden both our codebase and the way we build and ship it. With guidance from GitHub Security Lab and community experts, we concentrated on four areas that matter most for a framework sitting directly in AI data paths.

Threat Modeling & AI-Specific Risk Assessment. We systematically threat-modeled the CocoIndex framework -- examining how an attacker might exploit a data transformation pipeline that feeds directly into AI systems. We mapped out attack vectors including adversarial document content that could poison embeddings, prompt injection through source data that could manipulate LLM-based extraction, and compromised dependencies that could silently corrupt downstream indexes. Resources like the Adversarial AI Reading List and Embrace The Red were particularly useful in shaping this analysis. By identifying these failure points upfront, we developed mitigation strategies before they could be exploited in production pipelines.

Secure Coding & Automated Checks. We wired GitHub’s security tooling directly into CocoIndex’s development loop. CodeQL now runs on every pull request, scanning both our Rust core and Python SDK for vulnerability patterns. For a system where data flows from sources through transformations to targets, CodeQL’s data flow analysis maps naturally onto our architecture, tracing untrusted input from ingestion all the way to exported artifacts. We also enabled secret scanning to keep credentials and API keys out of the repository, and we started using GitHub Copilot as a guardrail during development to surface potential issues before code is even committed. Together, these checks act as a continuous security layer in our CI pipeline, catching problems early instead of after a release.

Dependency & Supply Chain Hardening. CocoIndex integrates with embedding models, vector databases, document parsers, and cloud storage backends -- each one pulling in its own dependency chain. We audited our dependencies across both the Rust core and Python SDK, going beyond direct dependencies to examine the transitive tree underneath. We tightened our dependency review process with more systematic tracking of what we depend on and why, and adopted the OpenSSF Scorecard to continuously benchmark our project's security posture across branch protection, dependency updates, CI/CD practices, and more. Following OpenSSF standards, we're working toward a comprehensive software bill of materials (SBOM) for each release, giving users visibility into exactly what ships with CocoIndex.

Policy, Documentation & Vulnerability Response. We formalized our SECURITY.md with clear instructions for responsible disclosure and defined response commitments. Vulnerabilities should be reported to [email protected], not filed as public issues. We also established an internal process for triaging and patching reported vulnerabilities -- who gets involved, how we communicate, and how quickly we ship fixes. Every new connector or transformation function now gets reviewed through the lens of: what's the worst-case input, and what happens if a dependency in this path is compromised?

Each of these improvements directly strengthens CocoIndex and the ecosystem built on top of it. CodeQL scans are already catching issues that might have been overlooked in manual review. Tightened dependency management and secret scanning reduce the risk of supply chain attacks through our integration points. And with a clear vulnerability disclosure and response process in place, developers building AI pipelines with CocoIndex can trust that security issues will be handled responsibly and transparently.

These changes don't just benefit CocoIndex in isolation -- they benefit every application downstream. By shipping these improvements now, we're helping protect the growing number of AI systems that rely on CocoIndex to keep their context accurate, up to date, and trustworthy.

Thank You to Our Contributors

None of this would have been possible without the people who build CocoIndex with us every day. Everyone who has submitted a PR, reported a bug, reviewed code, or helped another user in Discord -- you're the reason this project exists and the reason it was recognized.

We especially want to thank those doing the work that doesn't show up in commit logs: writing documentation, answering questions, and building examples that help new users get started. That work compounds quietly, and it matters more than most people realize.

Looking Forward

Security is a practice, not a milestone. It's something we build into every change, every new integration, every line of code.

For CocoIndex, the GitHub Secure Open Source Fund was a catalyst. It gave us dedicated time, expert guidance, and a community of peers to learn from. The improvements we've made will benefit every developer who builds an AI pipeline with CocoIndex.

We're grateful to GitHub and the entire cohort for this experience -- and we're committed to carrying these practices forward, not just in our own project, but by sharing what we've learned with the broader AI infrastructure community.


🌟 Star us on GitHub | Join our Discord