How GitHub Leverages eBPF to Fortify Deployment Safety

From Mbkuae Stack, the free encyclopedia of technology

The Challenge of Circular Dependencies

GitHub, like many tech companies, practices dogfooding: we host our own source code on github.com, making us our own biggest customer. While this approach helps us rigorously test new features before they reach external users, it introduces a unique vulnerability. If github.com were to experience an outage, we would lose access to our own repositories, creating a critical circular dependency that could hinder our ability to fix the very problem causing the downtime.

How GitHub Leverages eBPF to Fortify Deployment Safety
Source: github.blog

To mitigate this, we maintain a mirror of our code for forward-fix scenarios and keep built assets for rollbacks. However, this only addresses the obvious dependency. More subtle circular dependencies lurk within deployment scripts themselves—scripts that might inadvertently require github.com or other internal services to run. This article explores how we tackled these hidden risks using eBPF.

Types of Circular Dependencies

Consider a hypothetical MySQL outage: GitHub becomes unable to serve release data from its repositories. To resolve the incident, we need to deploy a configuration change to the affected stateful MySQL nodes by executing a deploy script on each node. This scenario reveals three categories of circular dependencies:

Direct Dependencies

The deploy script tries to pull the latest release of an open-source tool from GitHub. Because GitHub cannot serve that data during the outage, the script fails immediately. This is a straightforward, direct circular dependency.

Hidden Dependencies

Even if the script itself doesn't reach out to GitHub, a servicing tool already on the machine might check for updates upon execution. If that update check fails due to the outage, the script could hang or error out, depending on how the tool handles update failures. This dependency is hidden because it's not obvious from the script's code.

Transient Dependencies

The deploy script calls another internal service (like a migrations service) via an API. That service, in turn, attempts to fetch a binary from GitHub. The failure propagates back to the deploy script, causing it to stall. This transient dependency is indirect and often hard to detect.
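One way to reason about these three categories is as reachability in a dependency graph: a deploy script has a circular dependency whenever github.com can be reached from it, whether in one hop or several. The sketch below is purely illustrative. The service names and edges are hypothetical and do not describe our real infrastructure.

```python
from collections import deque

# Hypothetical dependency graph: each key depends on the services it lists.
DEPENDS_ON = {
    "deploy-script": ["migrations-service", "servicing-tool"],
    "migrations-service": ["github.com"],
    "servicing-tool": ["update-server"],
    "update-server": [],
    "github.com": ["mysql"],
    "mysql": [],
}


def path_to(graph, start, target):
    """Return the shortest dependency chain from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


chain = path_to(DEPENDS_ON, "deploy-script", "github.com")
print(" -> ".join(chain))  # deploy-script -> migrations-service -> github.com
```

In practice nobody has a complete graph like this, which is why the hidden and transient cases are hard to find by reading scripts alone.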

Solving with eBPF

Previously, the responsibility fell on each team that owns stateful hosts to manually audit their deployment scripts for circular dependencies—a tedious, error-prone process. When we began designing our new host-based deployment system, we sought a more robust solution. eBPF (extended Berkeley Packet Filter) emerged as the answer.

eBPF allows us to safely and efficiently monitor system calls made by deployment scripts. By writing small eBPF programs, we can intercept network calls and block those that would create circular dependencies, for example DNS lookups for, or HTTP requests to, internal services or github.com itself. This selective blocking keeps deployment scripts from depending on the very services that may be down during an incident, so a deployment does not widen the outage.

How eBPF Works in This Context

We attach eBPF programs to kernel hooks on the connection path, such as the syscall tracepoints sys_enter_connect and sys_enter_sendto. When a deployment script attempts to make an outbound connection, the eBPF program inspects the destination IP address and port. If it matches a known circular dependency pattern (e.g., an internal service or github.com), the call can be either logged for audit or blocked entirely. Tracepoint programs can only observe, so blocking has to happen at a hook that can return a verdict, such as a cgroup socket-address program (cgroup/connect4) or an LSM hook. This gives us fine-grained control without modifying the application code.

For example, we can allow connections to essential services like DNS or package mirrors while blocking everything else. This approach prevents not only the direct, hidden, and transient dependencies mentioned earlier but also future dependencies that might be added inadvertently.
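To make the decision logic concrete, here is a user-space Python model of the check described above. It is not an eBPF program and not our production code. It only mirrors the steps such a program would take: decode the destination from the raw sockaddr_in bytes passed to connect(), compare it with an allowlist, and return a verdict. The ALLOWED entries, the function names, and the "log" audit mode are assumptions made for illustration.

```python
import ipaddress
import socket
import struct

# Hypothetical allowlist of (network, port) pairs. The addresses come from
# the RFC 5737 documentation ranges and do not refer to real services.
ALLOWED = [
    (ipaddress.ip_network("192.0.2.53/32"), 53),     # DNS resolver
    (ipaddress.ip_network("198.51.100.0/24"), 443),  # package mirror
]


def make_sockaddr_in(ip, port):
    """Build the first 8 bytes of a Linux struct sockaddr_in."""
    return (
        struct.pack("=H", socket.AF_INET)  # sin_family, host byte order
        + struct.pack("!H", port)          # sin_port, network byte order
        + socket.inet_aton(ip)             # sin_addr, network byte order
    )


def parse_sockaddr_in(raw):
    """Decode family, address, and port from raw sockaddr_in bytes."""
    (family,) = struct.unpack_from("=H", raw, 0)
    (port,) = struct.unpack_from("!H", raw, 2)
    return family, ipaddress.IPv4Address(raw[4:8]), port


def verdict(raw, enforce=True):
    """Return "allow", "block", or "log" for one outbound connection."""
    family, addr, port = parse_sockaddr_in(raw)
    if family != socket.AF_INET:
        return "allow"  # non-IPv4 traffic is out of scope for this sketch
    for network, allowed_port in ALLOWED:
        if addr in network and port == allowed_port:
            return "allow"
    # Default deny: anything not on the allowlist is blocked, or only
    # logged when running in audit mode.
    return "block" if enforce else "log"


print(verdict(make_sockaddr_in("198.51.100.7", 443)))                # allow
print(verdict(make_sockaddr_in("203.0.113.9", 443)))                 # block
print(verdict(make_sockaddr_in("203.0.113.9", 443), enforce=False))  # log
```

The default-deny shape is what covers future dependencies: a destination nobody thought about is rejected without anyone having to add a rule for it.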

Implementation Details

We developed a small daemon that loads the eBPF program at boot time, so protection is in place before any deployment runs. The daemon uses the bpf() syscall to load and attach the program to the appropriate hooks. Cilium's eBPF libraries were instrumental in simplifying this process. For performance, we rely on eBPF maps, in-kernel key-value stores that both the eBPF program and user space can access, to maintain a dynamic allowlist of safe destinations. Because the daemon updates the map in place, the list can change without reloading the program or restarting the daemon.
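The allowlist lookup can be pictured with a small user-space stand-in. The class below is a Python model, not a real eBPF map. It assumes the allowlist is keyed by IP prefix and resolved by longest-prefix match, which is how the kernel's LPM trie map type behaves; whether that map type matches a given deployment is an assumption, and the class and label names are hypothetical.

```python
import ipaddress


class AllowlistMap:
    """User-space stand-in for a prefix-keyed eBPF map."""

    def __init__(self):
        self._entries = {}  # ip_network -> label

    def update(self, cidr, label):
        """Insert or overwrite an entry while lookups keep working."""
        self._entries[ipaddress.ip_network(cidr)] = label

    def delete(self, cidr):
        """Remove an entry if it exists."""
        self._entries.pop(ipaddress.ip_network(cidr), None)

    def lookup(self, ip):
        """Return the label of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(ip)
        matches = [net for net in self._entries if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._entries[best]


allow = AllowlistMap()
allow.update("198.51.100.0/24", "package-mirror")
print(allow.lookup("198.51.100.7"))  # package-mirror

# A later update takes effect on the next lookup, with no restart.
allow.update("198.51.100.7/32", "mirror-primary")
print(allow.lookup("198.51.100.7"))  # mirror-primary
print(allow.lookup("203.0.113.9"))   # None
```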

Results and Benefits

By deploying eBPF on our host-based systems, we have virtually eliminated the risk of circular dependencies in deployment scripts. The system is lightweight (minimal CPU and memory overhead) and transparent to developers—no changes to scripts are required. If a script triggers a blocked call, it receives a clear error message, making debugging straightforward.
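What a blocked call looks like from inside a script depends on the hook that rejects it. As a sketch, assume the connect() call is denied with EPERM, which is what the kernel returns when a cgroup connect hook rejects a connection. A deployment helper could then turn that errno into a message that names the likely cause. The function name and the wording of the messages are hypothetical, and EPERM can have other causes, so the message is a hint rather than a diagnosis.

```python
import errno


def explain_connect_error(exc, dest):
    """Turn a failed connect() into a message that names the likely cause."""
    if exc.errno in (errno.EPERM, errno.EACCES):
        return f"{dest}: denied by deployment network policy"
    return f"{dest}: connection failed ({exc.strerror})"


# Simulate the error a blocked connect() would raise.
err = OSError(errno.EPERM, "Operation not permitted")
print(explain_connect_error(err, "github.com:443"))
# github.com:443: denied by deployment network policy
```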

Moreover, the same eBPF programs are used to monitor and log all outbound calls during deployments, providing a rich audit trail. This helps us refine our allowlist and better understand dependency patterns across our infrastructure.
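As a rough illustration of how such an audit trail can feed back into the allowlist, the sketch below groups blocked calls by destination. The record layout (script, dest, action) and the sample events are assumptions. The real log format is not described here.

```python
from collections import Counter

# Hypothetical audit records, one per outbound call seen during a deploy.
EVENTS = [
    {"script": "deploy-mysql.sh", "dest": "198.51.100.7:443", "action": "allow"},
    {"script": "deploy-mysql.sh", "dest": "203.0.113.9:443", "action": "block"},
    {"script": "deploy-cache.sh", "dest": "203.0.113.9:443", "action": "block"},
    {"script": "deploy-cache.sh", "dest": "203.0.113.20:8080", "action": "block"},
]


def top_blocked(events):
    """Return (destination, count) pairs for blocked calls, most frequent first."""
    counts = Counter(e["dest"] for e in events if e["action"] == "block")
    return counts.most_common()


def scripts_by_blocked_dest(events):
    """Map each blocked destination to the set of scripts that tried it."""
    result = {}
    for e in events:
        if e["action"] == "block":
            result.setdefault(e["dest"], set()).add(e["script"])
    return result


print(top_blocked(EVENTS))
# [('203.0.113.9:443', 2), ('203.0.113.20:8080', 1)]
```

A destination that several scripts hit is either a shared circular dependency worth fixing or a legitimate service that belongs on the allowlist. The grouping shows where to look, but the decision still needs a human.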

Getting Started with eBPF for Deployment Safety

If you want to implement a similar solution, here are the basic steps:

  1. Identify dependency patterns: Map out all services your deployment scripts might contact (internal APIs, package repositories, update checks).
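  2. Write eBPF programs: Write the kernel-side programs in C (or another language that compiles to eBPF bytecode) to inspect network-related syscalls, and load them from user space with a library such as Cilium's ebpf-go.
  3. Attach to kernel hooks: Use tracepoints or kprobes on network-related syscalls for observation, and hooks that can return a verdict (cgroup socket-address or LSM programs) for blocking.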
  4. Deploy a daemon: Load the eBPF programs on your host-based systems at boot.
  5. Iterate: Monitor logs and adjust your rules as needed.
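As a starting point for step 1, a simple static scan can list the hosts a script names explicitly. The sketch below is a minimal example that assumes dependencies appear as http or https URLs. The sample script and hostnames are hypothetical.

```python
import re

# Matches the host part of http:// and https:// URLs.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)")


def hosts_in_script(text):
    """Return the sorted, de-duplicated hosts referenced by URL in a script."""
    return sorted({host.lower() for host in URL_HOST.findall(text)})


SCRIPT = """\
#!/bin/sh
curl -fsSL https://github.com/example/tool/releases/latest -o /tmp/tool
curl -fsS http://migrations.internal.example/api/run
"""

print(hosts_in_script(SCRIPT))
# ['github.com', 'migrations.internal.example']
```

A scan like this only finds direct dependencies. Hidden and transient ones never appear in the script text, which is why steps 2 through 5 observe what the script actually does at runtime.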

For a detailed guide, refer to the eBPF documentation and the Cilium project.

Conclusion

Circular dependencies are a subtle and dangerous risk in any deployment system, especially when you rely on your own platform. By leveraging eBPF, GitHub has found a powerful, low-overhead way to automatically prevent such dependencies, ensuring that even during an outage, our deployment scripts can run safely. This approach has become an integral part of our host-based deployment architecture and reflects our commitment to reliability.

If you face similar challenges, we encourage you to explore eBPF—it's not just for networking and security; it's a versatile tool for building safer systems.