eBPF: Tracing Kernel Events for IRIS Workloads

I attended Cloud Native Security Con in Seattle fully intending to crush OTEL day and then peruse the subject of security applied to Cloud Native workloads over the following days, leading up to the CTF as a professional exercise. This was happily upended by a new understanding of eBPF, which gave my screens, career, workloads, and attitude a much-needed upgrade, along with new approaches to solving workload problems.

So I made it to the eBPF party and have been attending clinic after clinic on the subject ever since. Here I would like to “unbox” eBPF as a technical solution, mapped directly to what we do in practice, and step through my experimentation on supporting InterSystems IRIS Workloads, particularly on Kubernetes.

eBee Steps with eBPF and InterSystems IRIS Workloads

What is eBPF?

eBPF (extended Berkeley Packet Filter) is a killer Linux kernel feature that implements a VM within kernel space and makes it possible to run sandboxed programs safely, with guardrails. These programs share data with user land through “maps” for observability, tracing, security, and networking. I think of it as a “sniffer” of the OS. While the original BPF was associated with packet filtering and networking, the extended version also “sniffs” tracepoints, processes, scheduling, execution, and block device access.

“What JavaScript is to the browser, eBPF is to the Linux Kernel”

The VM

eBPF VM Architecture

Programs are written in a restricted subset of C, then compiled into eBPF bytecode. This bytecode is loaded into the kernel, where the verifier checks that it cannot crash or hang the kernel (rejecting unbounded loops, out-of-bounds memory access, and so on). Once verified, it is JIT-compiled into native machine code and attached to a specific “hook.”

The Hooks: How we see IRIS

To get visibility into IRIS, we can attach eBPF programs to various hooks in the kernel or user space.

Tracepoints

Tracepoints are static hooks placed in the kernel code by developers. They are stable and provide a reliable way to hook into common events like system calls.

Kernel Tracepoints
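To make the tracepoint idea concrete, here is a minimal sketch using the BCC Python front-end. It assumes BCC is installed, that you run it as root, and that your kernel exposes the `syscalls:sys_enter_openat` tracepoint. The names `PROGRAM`, `opens`, and `count_opens` are my own for illustration. It counts `openat()` calls per PID in a BPF map and reads the map back from user land.

```python
import time

# Restricted-C program text; BCC compiles it to eBPF bytecode at load time.
PROGRAM = r"""
BPF_HASH(opens, u32, u64);          // map: PID -> number of openat() calls

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper 32 bits hold the PID
    opens.increment(pid);
    return 0;
}
"""


def count_opens(seconds=10):
    """Load PROGRAM, wait, then return {pid: openat_count}.

    Needs root and the BCC Python bindings; the import is kept inside the
    function so this file can be loaded on machines without BCC.
    """
    from bcc import BPF

    b = BPF(text=PROGRAM)  # compiles, verifies, and attaches the tracepoint
    time.sleep(seconds)
    return {k.value: v.value for k, v in b["opens"].items()}
```

Calling `count_opens(30)` while IRIS starts up should return a dictionary you can match against the irisdb PIDs. Filtering by process name inside the kernel program is possible too, but I left it out to keep the sketch short.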

Kprobes

Kprobes (Kernel Probes) allow you to dynamically break into almost any kernel routine and collect information. Unlike tracepoints, they are not static: they attach to kernel functions by name, so a probe can stop working when a function is renamed or inlined in a newer kernel.

Kprobes Illustration
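A kprobe version could look like the sketch below, again with BCC and again as root. It assumes `vfs_read` is probeable on your kernel (kernel function names are not a stable interface). `PROGRAM`, `sizes`, `on_vfs_read`, and `read_size_histogram` are illustrative names. It builds a power-of-two histogram of requested read sizes across the whole system.

```python
import time

PROGRAM = r"""
#include <uapi/linux/ptrace.h>
#include <linux/fs.h>

BPF_HISTOGRAM(sizes);               // log2 histogram of requested read sizes

// BCC maps the probed function's arguments onto the parameters after ctx.
int on_vfs_read(struct pt_regs *ctx, struct file *file,
                char __user *buf, size_t count) {
    sizes.increment(bpf_log2l(count));
    return 0;
}
"""


def read_size_histogram(seconds=10):
    """Attach to vfs_read for a while, then print the histogram.

    Needs root and BCC; the import is deferred so the file loads without it.
    """
    from bcc import BPF

    b = BPF(text=PROGRAM)
    b.attach_kprobe(event="vfs_read", fn_name="on_vfs_read")
    time.sleep(seconds)
    b["sizes"].print_log2_hist("bytes")
```

The difference from the tracepoint case is the explicit `attach_kprobe` call: you name the kernel function yourself, which is exactly why this breaks more easily across kernel versions.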

Uprobes

Uprobes (User Probes) are the user-space equivalent of kprobes. They allow you to hook into functions within a user-space application, such as irisdb or the Web Gateway. This is incredibly powerful for tracing custom application logic without modifying the binary.

Uprobes Illustration
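For uprobes, a hedged sketch with BCC follows. The binary path and symbol are parameters because I am not assuming any particular function name inside irisdb, and a stripped binary may not expose the symbol you want. `PROGRAM`, `calls`, `on_entry`, and `count_calls` are illustrative names.

```python
import time

PROGRAM = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(calls, u32, u64);          // map: PID -> times the symbol was entered

int on_entry(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    calls.increment(pid);
    return 0;
}
"""


def count_calls(binary, symbol, seconds=10):
    """Count entries into `symbol` of `binary`, per PID.

    `binary` is a path to an executable or a library name that BCC can
    resolve; `symbol` must exist in its symbol table. Needs root and BCC.
    """
    from bcc import BPF

    b = BPF(text=PROGRAM)
    b.attach_uprobe(name=binary, sym=symbol, fn_name="on_entry")
    time.sleep(seconds)
    return {k.value: v.value for k, v in b["calls"].items()}
```

A safe first try is `count_calls("c", "malloc")`, which probes libc. Pointing it at the irisdb binary means passing its full path and a symbol you have confirmed is present, for example with `nm` or `objdump`.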

The BCC Toolkit

Writing raw eBPF can be tough. The BCC (BPF Compiler Collection) toolkit makes it easier by providing a framework for writing BPF programs with Python or Lua front-ends.

Example: Tracing `open()` syscalls on IRIS

Using opensnoop from the BCC toolkit, we can see every file InterSystems IRIS tries to open, in real time. The -n flag filters on process name, and an ERR of 2 is ENOENT (no such file or directory):

sudo opensnoop-bpfcc -n irisdb

PID    COMM               FD ERR PATH
3017691 irisdb              0   0 /data/IRIS/iris.cpf
3017691 irisdb             -1   2 /data/IRIS/_LastGood_.cpf_20240908
3017691 irisdb             -1   2 /data/IRIS/_LastGood_.cpf
3017691 irisdb              0   0 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb             -1   2 /data/IRIS/_LastGood_.cpf
3017756 irisdb             -1   2 /data/IRIS/mgr/journal/20240908.002
3017756 irisdb              0   0 /data/IRIS/mgr/journal/20240908.002z
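If you capture that output to a file, a few lines of Python can summarize the failed opens. This is a sketch that assumes the five-column layout shown above and a COMM value without spaces; `SAMPLE` and `failed_opens` are illustrative names, and the sample rows are copied from the trace.

```python
import errno
from collections import Counter

SAMPLE = """\
PID    COMM               FD ERR PATH
3017691 irisdb              0   0 /data/IRIS/iris.cpf
3017691 irisdb             -1   2 /data/IRIS/_LastGood_.cpf_20240908
3017691 irisdb             -1   2 /data/IRIS/_LastGood_.cpf
3017691 irisdb              0   0 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb             -1   2 /data/IRIS/_LastGood_.cpf
3017756 irisdb             -1   2 /data/IRIS/mgr/journal/20240908.002
3017756 irisdb              0   0 /data/IRIS/mgr/journal/20240908.002z
"""


def failed_opens(text):
    """Yield (pid, errno_name, path) for rows whose ERR column is non-zero."""
    for line in text.splitlines():
        fields = line.split(None, 4)  # PID, COMM, FD, ERR, PATH
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip the header and blank or malformed lines
        pid, _comm, _fd, err, path = fields
        err = int(err)
        if err != 0:
            yield int(pid), errno.errorcode.get(err, str(err)), path


failures = list(failed_opens(SAMPLE))
by_error = Counter(name for _pid, name, _path in failures)
for pid, name, path in failures:
    print(pid, name, path)
```

Here the failures are all lookups for files that do not exist, such as the `_LastGood_.cpf` candidates.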

Flamegraphs

FlameGraph Repository

One of the coolest things I stumbled upon was Brendan Gregg’s flame graphs. They visualize sampled stack traces, whether collected with BPF tools or with perf, to help understand where time is being spent.

The following perf recording samples stacks on all CPUs (-a) at 99 Hz (-F 99) with call graphs (-g) for 60 seconds, during a start/stop of IRIS:

sudo perf record -F 99 -a -g -- sleep 60
[ perf record: Captured and wrote 3.701 MB perf.data (15013 samples) ]

We can generate a flame graph with:

sudo perf script > out.perf
./stackcollapse-perf.pl out.perf > /tmp/gar.thing
./flamegraph.pl /tmp/gar.thing > flamegraph.svg

IRIS Flamegraph
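The middle step, stack collapsing, is easier to picture with a toy version. perf prints each sampled stack with the innermost frame first; the folded format puts the process name and frames root-first on one line, joined by semicolons, followed by a count. The sketch below is not the FlameGraph script itself, and the function names in `samples` are made up for illustration.

```python
from collections import Counter


def fold(samples):
    """Collapse (comm, leaf_first_frames) samples into folded-stack lines."""
    counts = Counter(
        ";".join([comm] + frames[::-1])  # reverse so the root comes first
        for comm, frames in samples
    )
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: frame names are invented, not real irisdb symbols.
samples = [
    ("irisdb", ["write_journal", "main"]),
    ("irisdb", ["write_journal", "main"]),
    ("irisdb", ["read_block", "main"]),
]

folded = fold(samples)
for line in folded:
    print(line)
```

Identical stacks merge into one line with a higher count, which is what lets flamegraph.pl draw them as a single wider box.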

Flamegraphs are interactive (when viewed as SVG) and allow you to drill down into stack traces.

X-axis: Represents the population of samples, sorted alphabetically by function name. It does not show the passage of time.
Y-axis: Shows stack depth. The top box of each tower is the function that was on CPU; everything beneath it is its ancestry.
Width: Indicates how often that function (and its children) appeared in the samples.
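The width rule can be worked out by hand from folded stacks. In this sketch each box is identified by its full path from the root, and its width is the share of all samples whose stack starts with that path. The `FOLDED` data and the frame names in it are invented for illustration.

```python
from collections import Counter

# Hypothetical folded stacks: "frame;frame;frame count"
FOLDED = """\
irisdb;main;read_block 30
irisdb;main;write_journal 50
irisdb;main 20
"""


def frame_widths(folded_text):
    """Return {frame_path_tuple: fraction_of_all_samples}."""
    totals = Counter()
    grand_total = 0
    for line in folded_text.splitlines():
        if not line.strip():
            continue
        stack, count = line.rsplit(" ", 1)
        count = int(count)
        grand_total += count
        frames = stack.split(";")
        # Every prefix of the stack is a box that this sample contributes to.
        for depth in range(1, len(frames) + 1):
            totals[tuple(frames[:depth])] += count
    return {path: n / grand_total for path, n in totals.items()}


widths = frame_widths(FOLDED)
for path, share in sorted(widths.items()):
    print(f"{share:6.1%}  {';'.join(path)}")
```

In this toy data `main` spans the full width because it is in every sample, while `write_journal` covers half of it, so that is where I would look first.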

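“High and Wide” is what you look for when hunting performance bottlenecks: tall towers mean deep call stacks, and wide boxes, especially near the top, mean a lot of samples landed there.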

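The color key below applies to the java palette (flamegraph.pl --colors=java). With the default palette, warm colors are assigned at random and carry no meaning.

Red == User-level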

Orange == Kernel

Yellow == C++

Green == JIT/Java

This is especially powerful for those running Python in IRIS or productions with complex user-land code.

Onward and upward! I hope this piqued your interest in the world of eBPF.