vivarium 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CONTEXT.md +535 -0
- data/README.md +2 -2
- data/examples/raise_demo.rb +42 -0
- data/examples/sudo_attempt_demo.rb +18 -0
- data/exe/vivarium +6 -0
- data/image.png +0 -0
- data/lib/vivarium/cli.rb +40 -0
- data/lib/vivarium/correlator.rb +137 -0
- data/lib/vivarium/tree_renderer.rb +543 -0
- data/lib/vivarium/version.rb +1 -1
- data/lib/vivarium.rb +314 -171
- data/logo-simple.png +0 -0
- metadata +28 -5
- data/lib/vivarium/logger.rb +0 -80
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a8bbb6affa1c85f3c5af82f751021cc6d0230741545bcb090a3441ae6285ddae
+  data.tar.gz: 0a51c1fd7065cf030363679f8f1c2370937365b0d22115326948cabb6053ded4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4efc19c34686b652e07213be2c6d865fa9b02fbe0a207b1f9e391cfd8591269f8dafbe6ad7c444091541bd1cb10e2c8a6ae07564caa5fe5deb2ed38226b3a38c
+  data.tar.gz: e0f15247ae4ed3ff159935686affd554626eb079bda7cf1db9e5f577df93db5ab57134c87ec926cab96da8302990c7524f187bbf120559b386cc40fe8399807f
```
data/CONTEXT.md
ADDED
@@ -0,0 +1,535 @@

# Vivarium Glossary

This file defines canonical terms used in Vivarium discussions and code.
When a term here conflicts with a term used in conversation or in code,
**this file wins** unless we explicitly update it. Refine the glossary as
the design evolves; do not drift the vocabulary.

Sections §1–§2 describe roles and events. Sections §3–§4 describe
**the original ArrayMap-based transport** and are now **superseded by
§5**, which has been implemented (as of 2026-05-27). §6 lists what
remains open. §7 captures the implementation outcome and the few
v1 deviations from §5.

---

## 1. Roles (today)

- **vivariumd** — The privileged background daemon that loads the BPF
  program and pins shared maps under `bpf_pin_dir`. One per host.
- **Observer** — A Ruby process that calls `Vivarium.observe` (or
  `Vivarium.top_observe`).
- **Target** — A process whose syscalls/LSM hooks vivariumd is currently
  emitting events for. Union of:
  - **Root target** — PID explicitly registered via `register_pid`
    (pinned map `config_root_targets`).
  - **Spawned target** — TID inserted by the `sched_process_fork`
    tracepoint whenever a target forks (pinned map
    `config_spawned_targets`).

> "PID" in `config_root_targets` is the userland `Process.pid` (kernel
> tgid); "TID" in `config_spawned_targets` is the kernel-level task id.
> Keep this distinction visible.
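The PID/TID split can be made concrete with a small membership check. This is a minimal sketch, assuming the two pinned maps have been mirrored into Ruby `Set`s and that every thread of a registered process counts as a target through its tgid; `target?` is a hypothetical helper, not vivarium's API (the real filtering happens BPF-side via `target_enabled`).

```ruby
require "set"

# Hypothetical userland mirror of the two pinned maps.
# Root targets are keyed by tgid (Ruby's Process.pid);
# spawned targets are keyed by kernel task id.
def target?(tgid:, tid:, root_targets:, spawned_targets:)
  root_targets.include?(tgid) || spawned_targets.include?(tid)
end

root_targets    = Set[100]      # registered via register_pid
spawned_targets = Set[101, 102] # inserted by the sched_process_fork tracepoint

# In this sketch, any thread of a registered process matches through its tgid...
p target?(tgid: 100, tid: 105, root_targets: root_targets, spawned_targets: spawned_targets) # => true
# ...while a forked child matches through its own task id.
p target?(tgid: 101, tid: 101, root_targets: root_targets, spawned_targets: spawned_targets) # => true
```

Keeping the two lookups keyed differently is exactly the distinction the note above asks to keep visible: mixing them up would silently miss either sibling threads or forked children.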

---

## 2. Events (today)

- **Event (`event_t`)** — A fixed 288-byte struct emitted by the BPF
  program: `{ u64 ktime_ns; u32 pid; char event_name[16]; char payload[256]; }`.
- **Event name** — Short string identifying the hook source (`path_open`,
  `proc_exec`, `dns_req`, `task_kill`, ...). See [README.md](README.md).
- **Severity** — `high` or `medium`, derived from event name.
- **Drain** — Reading all currently-pending events out of the shared
  transport and clearing it. Under the historical ArrayMap transport (§3)
  this was `MapStore#drain_events`; in v1 the Correlator Thread consumes
  the ringbuf instead (§5.2, §7.1).

---

## 3. Transport (historical — superseded by §5, removed 2026-05-27)

- **Pinned maps** under bpffs (`/sys/fs/bpf/vivarium/`).
- **`event_invoked`** — 1024-slot `BPF_ARRAY` of `event_t`. **Wraps
  silently on overflow.**
- **`event_write_pos`** — 1-slot `BPF_ARRAY` of `u32`, atomically
  incremented BPF-side; reset to 0 on observer-side drain.

---

## 4. Correlation (historical — superseded by §5, removed 2026-05-27)

- **TracePoint correlation** — On every Ruby `:return` / `:c_return`,
  the observer drains the buffer and attributes everything currently
  sitting in it to the just-returned method. Positional, not
  identity-based. Known to break when (a) a method returns before its
  events are drained, (b) another thread emits events into the same
  buffer, or (c) the buffer wraps.

---

## 5. v1 architecture (implemented 2026-05-27)

The following terms and components describe the **implemented**
ringbuf + USDT + Correlator architecture. See §7 for the file/symbol
map and v1 deviations.

### 5.1 Vocabulary

- **Span** — One activation of a Ruby method on one thread. Identified
  by `(tid, method_id, start_ktime_ns)`. Delimited by:
  - **`span_start`** — emitted via `Vivarium::Usdt` `start_probe`.
  - **`span_stop`** — emitted via `stop_probe`. Fired on **both** normal
    and exceptional returns (Ruby's `:return` / `:c_return` TracePoint
    fires even when a method raises), so every Span is closed by exactly
    one `span_stop`.
  - **`span_raise`** — emitted via `raise_probe` on Ruby `:raise`. This
    is an **event within a Span**, not a Span terminator: it sets a
    `raised` flag on the innermost open Span on that tid and is also
    rendered as an `EXCP` event line under that Span (§7.1).

  Spans may nest within a single tid.

- **method_id** — 64-bit hash of `"#{defined_class}##{method_name}"`,
  produced by `Vivarium::Usdt.register_or_resolve_method`.

- **System event** — Any non-Span event captured by vivariumd's BPF
  program (LSM hooks, tracepoints, uprobes other than Span probes).

- **Span event** — `span_start` / `span_stop` / `span_raise`, captured
  by vivariumd attaching to the USDT probe sites in the observer's
  process. From vivariumd's point of view this is also "just" a uprobe.

- **Correlator** — The component that joins System events to Spans
  using `tid` (or, for fork descendants, `pid`; see §5.9) and
  `ktime_ns ∈ [span.start, span.stop]`, and renders the result as a
  Process Tree (see §5.4).

- **Process Tree** — The rendered output. A textual tree whose primary
  axis is process lineage (parent → child via execve/fork). Spans and
  events hang off process nodes. See §5.4 for the exact format.
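The Span lifecycle above (a start/stop pair per activation, `span_raise` as an in-Span flag, nesting per tid) maps naturally onto a per-tid stack. This is a minimal sketch, assuming events arrive already decoded and in `ktime_ns` order; `TrackedSpan` and `SpanTracker` are hypothetical names, not the Correlator's actual classes.

```ruby
# Hypothetical in-memory Span; field names are illustrative.
TrackedSpan = Struct.new(:tid, :method_id, :start_ns, :stop_ns, :raised, :children)

class SpanTracker
  attr_reader :roots

  def initialize
    @open = Hash.new { |hash, tid| hash[tid] = [] } # per-tid stack of open Spans
    @roots = []                                     # top-level Spans, in start order
  end

  # span_start pushes a new Span, nested under the innermost open Span on that tid.
  def span_start(tid, method_id, ktime_ns)
    span = TrackedSpan.new(tid, method_id, ktime_ns, nil, false, [])
    parent = @open[tid].last
    (parent ? parent.children : @roots) << span
    @open[tid].push(span)
    span
  end

  # span_raise only flags the innermost open Span; it never closes it.
  def span_raise(tid)
    span = @open[tid].last
    span.raised = true if span
    span
  end

  # span_stop closes the innermost open Span on normal and exceptional returns alike.
  def span_stop(tid, ktime_ns)
    span = @open[tid].pop
    span.stop_ns = ktime_ns if span
    span
  end
end
```

Because `span_raise` never pops the stack, a raising method is still closed by its own `span_stop`, which is what keeps the "exactly one `span_stop` per Span" invariant.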

### 5.2 Transport

- **Ringbuf** — `BPF_RINGBUF_OUTPUT` from BPF to userland, pinned on
  bpffs. Replaces the former `event_invoked` ArrayMap (§3).
- **Single ringbuf for v1** — Span events and System events flow into
  the **same** ringbuf so they share a consistent ordering by
  `ktime_ns` and can be reordered in one consumer loop.
- **Single consumer for v1** — Exactly one Observer per host is
  supported in v1. Ringbuf is single-consumer by nature; multi-observer
  is deferred (§6).
- **Pin path is parameterizable** — The ringbuf pin path is exposed
  through the same `bpf_pin_dir` mechanism, so a future control
  protocol can introduce per-observer ringbufs without changing the
  consumer API.
- **Event schema (`event_t` v2)** — Field-reordered version of the §2
  `event_t` that adds a `tid` field without changing the 288-byte
  total size (the §2 layout has 284 bytes of fields, padded to 288 for
  `u64` alignment, so the new 4-byte field fits in the old tail padding):

  ```c
  struct event_t {
      u64  ktime_ns;        // 0..7
      u32  pid;             // 8..11   tgid
      u32  tid;             // 12..15  task id (NEW)
      char event_name[16];  // 16..31
      char payload[256];    // 32..287
  };
  ```

  The added `tid` is consumed by the Correlator's join algorithm
  (§5.9).
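On the Ruby side, a record with this layout can be decoded with `String#unpack`. This is a minimal sketch, assuming little-endian byte order and that each ringbuf record reaches Ruby as a 288-byte binary `String`; `decode_event` is a hypothetical name (per §7.1 the real offsets live in `EVENT_TID_OFFSET` / `EVENT_PAYLOAD_OFFSET` in lib/vivarium.rb).

```ruby
EVENT_SIZE = 288
PAYLOAD_OFFSET = 32

# Decodes one event_t v2 record laid out as in the struct above.
# Assumes little-endian byte order (x86_64 / arm64 Linux).
def decode_event(bytes)
  unless bytes.bytesize == EVENT_SIZE
    raise ArgumentError, "expected #{EVENT_SIZE} bytes, got #{bytes.bytesize}"
  end

  # Q< = u64 ktime_ns, L< = u32 pid (tgid), L< = u32 tid, Z16 = NUL-terminated name
  ktime_ns, pid, tid, name = bytes.unpack("Q<L<L<Z16")
  payload = bytes.byteslice(PAYLOAD_OFFSET, EVENT_SIZE - PAYLOAD_OFFSET)
  { ktime_ns: ktime_ns, pid: pid, tid: tid, event_name: name, payload: payload }
end

# Round trip: build a proc_fork record whose payload is two u32s
# (child_pid, child_tid) followed by zeros, then decode it.
record = [5_000, 100, 100, "proc_fork", [101, 101].pack("L<L<")].pack("Q<L<L<a16a256")
event = decode_event(record)
child_pid, child_tid = event[:payload].unpack("L<L<")
```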

### 5.3 Correlator placement

- **Lives in the Observer process** as a dedicated Ruby Thread (the
  **Correlator Thread**), separate from the **Main Thread** that runs
  user Ruby code and fires Span USDTs.
- **Must be extractable to a separate process in the future.** This
  constrains the in-process design: the Main Thread and Correlator
  Thread communicate only through a narrow message interface (§5.5),
  never by sharing mutable state directly. Replacing the in-process
  Queue with a Unix-socket transport must be a localized change.

### 5.4 Process Tree output format (Format A)

The canonical rendering, readable by both humans and machines.

```
[PROC pid=100 comm=ruby]
└─ [SPAN tid=100 Net::HTTP#request dur=12.3ms]
   ├─ LSM socket_connect → tcp/192.168.1.1:443 @+0.4ms
   ├─ TP execve → /bin/sh ["-c","id"] @+8.0ms
   │  └─ [PROC pid=101 comm=sh parent=100]
   │     └─ USDT ssl_write → "POST /bad-endpoint" @+9.2ms
   └─ [SPAN tid=100 SubCall#go dur=0.5ms]
      └─ LSM file_open → /tmp/x @+10.0ms
```

**Line kinds:**

| Kind  | Form                                        |
|-------|---------------------------------------------|
| PROC  | `[PROC pid=N comm=STR (parent=N)?]`         |
| SPAN  | `[SPAN tid=N FQNAME dur=Xms]` (closed form) |
| EVENT | `KIND name → target @+Xms`                  |

**Attribute conventions (load-bearing — keep stable):**

- Every attribute is `key=value` with no quoting unless the value
  contains spaces; in that case use double quotes (`"..."`).
- Numeric durations are `Xms` or `Xus`. Timestamps in events use the
  prefix `@+` and are relative to the **enclosing Span's start**.
- Bracketed `[...]` denotes a **container** (PROC or SPAN). Unbracketed
  lines are **events** (leaves).
- Event KIND is one of: `LSM`, `TP`, `USDT`, plus `EXCP` for
  `span_raise` lines (§5.6).
- The arrow `→` (U+2192) separates the event from its target.
- Box-drawing characters (`├ └ │ ─`) are decoration only; structure
  must remain reconstructible from indentation depth alone, so AI/grep
  parsers may ignore them.

**Process Tree edges:**

- A child PROC appears nested under the EVENT (`TP execve` or
  `TP clone`/`TP fork`) that spawned it. The line above a nested PROC
  is the causal event; the PROC's `parent=` attribute is redundant but
  load-bearing for machine parsers that read PROC lines in isolation.
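Because structure must be recoverable from indentation alone, a consumer can treat the box-drawing characters as whitespace and divide the prefix width by the per-level step. This is a minimal sketch, assuming the 3-column step used in the example above (the conventions fix only that depth is carried by indentation, not the step width); `parse_format_a` is a hypothetical helper, not part of vivarium.

```ruby
BOX_DRAWING = "├└│─"

# Parses one Format A line, treating box-drawing characters as whitespace.
# Assumes a fixed 3-column step per tree level.
def parse_format_a(line, step: 3)
  prefix = line[/\A[#{BOX_DRAWING} ]*/]
  body = line[prefix.length..]
  kind =
    case body
    when /\A\[PROC / then :proc
    when /\A\[SPAN / then :span
    when /\A#/ then :comment
    else :event
    end
  # key=value pairs; a value is either "double quoted" or runs to a space or ']'.
  attrs = body.scan(/(\w+)=("[^"]*"|[^\s\]]+)/).to_h
  { depth: prefix.length / step, kind: kind, attrs: attrs, text: body }
end
```

Reading `parent=` from a PROC line this way works without any surrounding context, which is the point of keeping that attribute even though the tree position already implies it.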

### 5.5 Main Thread ↔ Correlator Thread interface

For v1, the Main Thread publishes only **one** piece of context that
USDT cannot carry inline:

1. **method_id table** — when a new `method_id` is registered via
   `Vivarium::Usdt.register_or_resolve_method`, the Main Thread sends
   `(method_id, "Class#method")` to the Correlator.

All other Span data (`tid`, `ktime_ns`, exit status) arrives through
the ringbuf. This minimizes coupling and matches what a future
out-of-process Correlator will receive over IPC.

The transport used today is a `Thread::Queue`. Tomorrow it can become
a Unix socket without changing message semantics.
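The one-message interface is small enough to sketch directly. This is a minimal illustration, assuming a stand-in hash (this document does not say how `method_id` is computed, so the sketch takes the first 8 bytes of SHA-256) and hypothetical names (`method_id_for`, `MethodRegistry`, `drain_registrations`); only the use of `Thread::Queue` matches what §7.1 describes.

```ruby
require "digest"

# Hypothetical stand-in for the id that register_or_resolve_method produces:
# first 8 bytes of SHA-256, read as an unsigned 64-bit big-endian integer.
def method_id_for(signature)
  Digest::SHA256.digest(signature).unpack1("Q>")
end

# Main Thread side: hash the signature and publish (method_id, "Class#method")
# the first time an id is seen.
class MethodRegistry
  def initialize(queue)
    @queue = queue
    @known = {}
  end

  def register_or_resolve(klass, name)
    signature = "#{klass}##{name}"
    id = method_id_for(signature)
    unless @known.key?(id)
      @known[id] = signature
      @queue << [id, signature]
    end
    id
  end
end

# Correlator side: fold in whatever has arrived, never blocking.
def drain_registrations(queue, table)
  loop do
    id, signature = queue.pop(true) # non-blocking pop raises ThreadError when empty
    table[id] = signature
  end
rescue ThreadError
  table
end

queue = Thread::Queue.new
registry = MethodRegistry.new(queue)
id = registry.register_or_resolve("Kernel", "system")
registry.register_or_resolve("Kernel", "system") # already known: nothing queued
table = drain_registrations(queue, {})
```

Because the Correlator only ever uses `pop(true)`, a registration that has not arrived yet never stalls it; in this sketch, swapping the queue for a socket would touch only the `<<` and `pop` call sites.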

### 5.6 Stack trace handling

**Multi-frame stack traces are dropped in v1.** The pre-v1
`caller_locations(...)`-per-drain mechanism
(see [lib/vivarium/logger.rb](lib/vivarium/logger.rb)) is intentionally
not used; method context is preserved through Span nesting in the
Process Tree.

**Per-Span `file:lineno` is implemented** via the originally proposed
option **(b)**: the USDT probes carry a `(file_id, lineno)` argument
pair (24-byte payload for `span_start`/`span_stop`, 32-byte for
`span_raise`, which adds `error_id`/`message_id`). `file_id` is
hash-resolved by `Vivarium::Usdt.register_or_resolve_file`, parallel
to `method_id`. Renderings:

- Span headers include `at=basename.rb:N` when the file is known
  (`TreeRenderer#span_file_info`).
- `EXCP` event lines include `error=Class message="..." at=basename.rb:N`
  (`TreeRenderer#render_raise_target`).

Option (a) (Queue-based delivery of file/line) was not adopted.
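A minimal sketch of the two renderings, combined with the `key=value` quoting rule from §5.4. The helper names (`format_attr`, `location`, `span_header`, `excp_target`) are hypothetical stand-ins for `TreeRenderer#span_file_info` / `#render_raise_target`; placing `at=` after `dur=`, always rendering durations in ms, and escaping embedded quotes via `String#inspect` are assumptions this document does not pin down.

```ruby
# key=value, double-quoted only when the value contains spaces. String#inspect
# also escapes any embedded quotes, which the quoting rule leaves unspecified.
def format_attr(key, value)
  value = value.to_s
  value.include?(" ") ? "#{key}=#{value.inspect}" : "#{key}=#{value}"
end

def location(file, lineno)
  "#{File.basename(file)}:#{lineno}"
end

# Hypothetical Span header: at= follows dur= (an assumption), durations in ms.
def span_header(tid:, name:, dur_ns:, file: nil, lineno: nil)
  parts = ["SPAN", format_attr("tid", tid), name, format_attr("dur", format("%.1fms", dur_ns / 1e6))]
  parts << format_attr("at", location(file, lineno)) if file && lineno
  "[#{parts.join(' ')}]"
end

# Hypothetical EXCP target text: error=Class message=... at=basename.rb:N
def excp_target(error_class, message, file, lineno)
  [format_attr("error", error_class), format_attr("message", message),
   format_attr("at", location(file, lineno))].join(" ")
end

puts span_header(tid: 100, name: "Net::HTTP#request", dur_ns: 12_300_000,
                 file: "/app/lib/client.rb", lineno: 42)
# [SPAN tid=100 Net::HTTP#request dur=12.3ms at=client.rb:42]
puts excp_target(Errno::ENOENT, "No such file or directory", "/app/raise_demo.rb", 40)
# error=Errno::ENOENT message="No such file or directory" at=raise_demo.rb:40
```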

### 5.7 Threading scope (v1)

**v1 is single-Thread, single-Ractor on the Observer side.** This is
a deliberate PoC narrowing — we aim for a working end-to-end pipeline
before tackling concurrency.

Concretely:

- The Observer process has exactly two Ruby Threads: the **Main
  Thread** (user code + TracePoint + USDT firing) and the
  **Correlator Thread** (ringbuf consumer + tree renderer). No user
  Ractors are supported.
- The Main Thread's kernel task id is obtained by calling
  `gettid(2)` via Fiddle once at Observer start. This value is the
  canonical `tid` for **every** Span emitted in v1.
- The Correlator Thread has a different kernel tid, but it does not
  emit Spans. It is, however, part of the same tgid as the Main
  Thread, so its syscalls will be picked up by vivariumd's BPF
  program. For v1 we accept the minor self-noise this causes
  (a handful of `file_open` events at startup). A future iteration
  may add a BPF-side exclusion map keyed by tid.
- Thread/Ractor support is explicitly deferred. See §6 for the
  shape of the unanswered questions when we revisit it.
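A minimal sketch of the Fiddle call, assuming Linux with a libc that exports the `gettid` wrapper (glibc 2.30 or later). `Vivarium.gettid` (§7.1) may be implemented differently, for example through the raw syscall; the `GETTID` constant and `gettid` method here are illustrative.

```ruby
require "fiddle"

# gettid(2) through the libc wrapper. Linux-only: looking the symbol up in
# Fiddle::Handle::DEFAULT raises Fiddle::DLError where it does not exist.
GETTID = Fiddle::Function.new(Fiddle::Handle::DEFAULT["gettid"], [], Fiddle::TYPE_INT)

def gettid
  GETTID.call
end

main_tid  = gettid                      # captured once, on the Main Thread
other_tid = Thread.new { gettid }.value # a second Ruby Thread has its own task id
```

On Linux the main thread's task id equals the process id, so `main_tid == Process.pid`; `other_tid` plays the role of the Correlator Thread's tid, which shares the tgid and is therefore the source of the self-noise described above.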

### 5.8 Span boundary mechanism (v1)

The Main Thread fires Span USDTs from inside a `TracePoint` callback.

- **TracePoint events listened for:** `:call`, `:return`, `:c_call`,
  `:c_return`, `:raise`. Both Ruby- and C-implemented methods are
  eligible, so allowlist entries are not constrained by implementation
  language.
- **Per-method allowlist (call/return only):** for the call/return
  events, the callback fires USDT probes only when the method matches
  either:
  - `SPAN_ALLOWLIST` — exact `"#{tp.defined_class}##{tp.method_id}"`
    string match, or
  - `SPAN_ALLOWCLASSES` — `tp.defined_class` is one of the listed
    classes (or its singleton class, covering class-method calls).

  This is **mandatory** for call/return — no match means no Span.
- **v1 allowlist contents** (see [lib/vivarium.rb](lib/vivarium.rb)
  `SPAN_ALLOWCLASSES` / `SPAN_ALLOWLIST`):
  - Classes: `Socket`, `BasicSocket`, `IPSocket`, `TCPSocket`,
    `UDPSocket`, `UNIXSocket`, `File`, `Dir`, `Signal`, `Process`,
    `Process::UID`, `Process::GID`.
  - Methods: `Kernel#system`, `Kernel#require`,
    `Kernel#require_relative`, `Kernel#load`, `Kernel#eval`,
    `Object#instance_eval`, `Object#instance_exec`.
- **Mapping events → probes:**
  - `:call` / `:c_call` → `Vivarium::Usdt.start(defined_class, method_id, file:, lineno:)`
  - `:return` / `:c_return` → `Vivarium::Usdt.stop(defined_class, method_id, file:, lineno:)`
  - `:raise` → `Vivarium::Usdt.raise(exception.class, exception.message, file:, lineno:)`
- **`:raise` is unfiltered.** Unlike call/return, the `:raise` branch
  does **not** consult the allowlist — every Ruby-level raise inside
  the Observer process fires `raise_probe`. The probe is documented as
  exception-safe at the `vivarium_usdt` layer, so a misbehaving
  raise handler will not re-enter the TracePoint. Noise from
  third-party libraries (e.g. internal `rescue`d exceptions) is accepted
  for v1; filtering, if needed, will be added later.
- **`:raise` does not close the Span.** Ruby's TracePoint fires both
  `:raise` and the subsequent `:return` / `:c_return` for the raising
  frame, so the span is still closed by `span_stop`; `span_raise` only
  flags the Span with `raised=true` (rendered as `(raise)` suffix) and
  appears as an `EXCP` event line within it.
- **Allowlist configuration mechanism:** hardcoded constants in v1.
  A configuration API (`Vivarium.observe(methods: [...])`) is out of
  scope for v1.
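The gating rules can be exercised in plain Ruby by recording what would be sent to the probes instead of firing them. This is a minimal sketch, assuming trimmed-down allowlists (`ALLOW_METHODS` / `ALLOW_CLASSES` are illustrative, not the real `SPAN_ALLOWLIST` / `SPAN_ALLOWCLASSES` contents) and recording tuples rather than calling `Vivarium::Usdt`. One CRuby detail worth knowing when writing exact-string entries: `tp.defined_class` reports the class that owns the method, so singleton methods appear as `#<Class:File>` and `instance_eval` / `instance_exec`, which are defined on `BasicObject`, appear as `BasicObject#...` rather than `Object#...`.

```ruby
# Illustrative, trimmed-down allowlists.
ALLOW_METHODS = ["Kernel#system", "Kernel#eval"].freeze
ALLOW_CLASSES = [File, Dir].freeze

def span_allowed?(klass, method_id)
  return true if ALLOW_METHODS.include?("#{klass}##{method_id}")

  # A singleton-class match covers class-method calls such as File.basename.
  ALLOW_CLASSES.any? { |c| klass == c || klass == c.singleton_class }
end

# Runs the block under a TracePoint and records the probe calls it would make.
def record_span_probes
  fired = []
  tp = TracePoint.new(:call, :return, :c_call, :c_return, :raise) do |t|
    case t.event
    when :call, :c_call
      fired << [:start, "#{t.defined_class}##{t.method_id}"] if span_allowed?(t.defined_class, t.method_id)
    when :return, :c_return
      fired << [:stop, "#{t.defined_class}##{t.method_id}"] if span_allowed?(t.defined_class, t.method_id)
    when :raise # unfiltered, and it does not close any Span
      fired << [:raise, t.raised_exception.class.name, t.raised_exception.message]
    end
  end
  tp.enable { yield }
  fired
end

fired = record_span_probes do
  File.basename("/tmp/app.rb")
  begin
    raise ArgumentError, "boom"
  rescue ArgumentError
    nil
  end
end
```

`fired` ends up containing a start/stop pair for `#<Class:File>#basename` plus a `:raise` entry for the `ArgumentError`, while the non-allowlisted calls made by `raise` and `rescue` are skipped.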

### 5.9 Correlator join algorithm (fork/exec handling)

**New BPF event.** vivariumd must emit a `proc_fork` ringbuf event
whenever `sched_process_fork` fires for a target. Before v1 this
tracepoint only updated `config_spawned_targets`
(see [vivarium.rb:695-718](lib/vivarium.rb#L695)); v1 keeps that
behavior and additionally submits:

- `event_name = "proc_fork"`
- `pid`, `tid` — the **parent** tgid and tid (the thread that called
  fork; in the v1 `Kernel#system` case this is the Spanning thread
  itself).
- `payload = { u32 child_pid; u32 child_tid; }` (8 bytes; rest zero).

**Join rule.** For each event `E` (consumed in `ktime_ns` order),
locate the innermost open Span `S` such that
`S.start_ktime_ns ≤ E.ktime_ns ≤ S.stop_ktime_ns` AND either:

- **(i)** `E.tid == S.tid` — event from the Spanning thread, OR
- **(ii)** `E.pid ∈ S.descendant_pids` — event from a fork descendant.

`S.descendant_pids` is the closure under `proc_fork` events rooted
at `S.tid`'s process: when a `proc_fork` `F` is matched into `S`
via rule (i) or (ii), `F.payload.child_pid` is added to
`S.descendant_pids`.
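The join rule and the `descendant_pids` closure can be sketched as an offline pass over already-closed Spans, which fits v1's render-once-at-session-end model (§5.10). This assumes decoded events and Spans with known stop times, and reads "innermost" as the latest-starting Span that contains the event (true when Spans on a tid nest properly); `JoinSpan`, `JoinEvent`, `innermost_span` and `join_events` are hypothetical names.

```ruby
require "set"

JoinSpan  = Struct.new(:tid, :name, :start_ns, :stop_ns, :descendant_pids, :events)
JoinEvent = Struct.new(:ktime_ns, :pid, :tid, :name, :child_pid)

# Innermost Span whose [start, stop] window contains the event and that
# matches by rule (i) tid or rule (ii) descendant pid.
def innermost_span(spans, event)
  spans
    .select { |s| s.start_ns <= event.ktime_ns && event.ktime_ns <= s.stop_ns }
    .select { |s| event.tid == s.tid || s.descendant_pids.include?(event.pid) }
    .max_by(&:start_ns) # for properly nested Spans the latest start is innermost
end

# Attaches each event to a Span in ktime order and returns the events that
# matched nothing (candidates for a synthetic <no-span> gap, §5.10).
def join_events(spans, events)
  unmatched = []
  events.sort_by(&:ktime_ns).each do |event|
    span = innermost_span(spans, event)
    if span.nil?
      unmatched << event
      next
    end
    span.events << event
    # Closure step: a matched proc_fork makes its child a descendant of this Span.
    span.descendant_pids << event.child_pid if event.name == "proc_fork"
  end
  unmatched
end

outer = JoinSpan.new(100, "Kernel#system", 0, 100, Set.new, [])
inner = JoinSpan.new(100, "Kernel#eval", 10, 50, Set.new, [])
events = [
  JoinEvent.new(5, 100, 100, "file_open", nil),  # rule (i)  -> outer
  JoinEvent.new(20, 100, 100, "proc_fork", 101), # rule (i)  -> inner; 101 becomes a descendant
  JoinEvent.new(30, 101, 101, "execve", nil),    # rule (ii) -> inner
  JoinEvent.new(60, 101, 101, "file_open", nil), # inner closed; outer never saw the fork
  JoinEvent.new(200, 100, 100, "file_open", nil) # after every Span
]
unmatched = join_events([outer, inner], events)
```

Taken literally, the rule records a child pid only on the Span that matched the fork, so the child's event at t=60 (after the inner Span closed but while the outer one is still open) comes back unmatched together with the t=200 event.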

**Rendering.**

- Events matched by rule (i) are placed directly under the `[SPAN]`
  line.
- Events matched by rule (ii) are placed under the `[PROC pid=X ...]`
  node that was materialized beneath the `proc_fork` event line
  that birthed `X`.

**`comm` in `[PROC ...]` headers** is rendered as the **most recently
observed comm** for that pid. Initialize from the
`sched_process_fork` event's `child_comm`; update on each subsequent
`sys_enter_execve` under that PROC using the exec'd program's
basename. So `Kernel#system "sh -c 'id'"` renders as
`[PROC pid=101 comm=sh parent=100]`, even though the process started
life as a `ruby` clone.

### 5.10 Process Tree root and output strategy

**Single shared root.** The Process Tree has exactly one root per
Observer session: the Observer's own `[PROC pid=N comm=ruby]`. All
Spans accumulate beneath it as siblings, in the order they closed:

```
[PROC pid=100 comm=ruby]
├─ [SPAN tid=100 Kernel#system dur=2.3ms]
│  └─ ...
└─ [SPAN tid=100 Kernel#system dur=5.1ms]
   └─ ...
```

**Output timing.** The Correlator emits the full tree **once** when
the Observer session ends:

- `Vivarium.observe { ... }` (scoped) — at block exit.
- `Vivarium.top_observe` — at `at_exit` / explicit `session.stop`.

v1 does not stream partial output mid-session. A future iteration
may add per-Span streaming output.

**State retention.** The Correlator holds all closed-Span subtrees
in memory until session end. Memory cost is proportional to the
total event count across the session. For PoC scope (a handful of
`Kernel#system` calls), this is negligible.

**Out-of-Span events: grouped into synthetic `<no-span>` Spans.**
Events whose `(pid, tid, ktime_ns)` match no real Span are collected
into per-gap synthetic Spans. The Correlator materializes one
synthetic Span per time gap:

- one for the interval `[session_start, firstRealSpan.start]`,
- one for each interval `[prevRealSpan.stop, nextRealSpan.start]`
  between consecutive real Spans,
- one for the interval `[lastRealSpan.stop, session_end]`.

A synthetic Span uses the literal sentinel name **`<no-span>`** in
its Format A header — the angle brackets distinguish it from real
method names — and otherwise renders identically to a real Span:

```
[SPAN tid=100 <no-span> dur=10.0ms]
└─ LSM file_open → /tmp/whatever @+3.0ms
```

The same join rule (§5.9) applies: events match a synthetic Span by
either `tid == MainThread.tid` or `pid ∈ descendant_pids` (forks
that happen outside any real Span are tracked the same way).

**Empty gaps are not rendered.** A synthetic Span with zero matched
events is skipped, so a stretch of pure idle does not pollute the
tree.
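The gap bookkeeping can be sketched in a few lines. This is a minimal sketch, assuming real Spans are given as `[start_ns, stop_ns]` pairs and that only time not covered by any real Span counts as a gap, so nested or overlapping Spans never create one; the events placed into a gap would be the ones the join (§5.9) left unmatched, reduced here to bare timestamps. `gap_intervals` and `non_empty_gaps` are hypothetical names.

```ruby
# Computes the synthetic <no-span> intervals: session start to the first Span,
# between consecutive covered stretches, and the last Span to session end.
def gap_intervals(session_start, session_end, spans)
  gaps = []
  cursor = session_start
  spans.sort_by(&:first).each do |start, stop|
    gaps << [cursor, start] if start > cursor
    cursor = stop if stop > cursor # nested/overlapping Spans just extend coverage
  end
  gaps << [cursor, session_end] if session_end > cursor
  gaps
end

# Keeps only gaps that received at least one event ("Empty gaps are not rendered").
def non_empty_gaps(gaps, event_times)
  gaps.select { |from, to| event_times.any? { |t| t >= from && t <= to } }
end

gaps = gap_intervals(0, 100, [[10, 20], [12, 15], [30, 40]])
# => [[0, 10], [20, 30], [40, 100]]
non_empty_gaps(gaps, [5, 50])
# => [[0, 10], [40, 100]]
```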

**Late-arriving child events.** For `Kernel#system`, which `wait(2)`s
for its child, every child process's events arrive within
`[Span.start, Span.stop]`, so the join rule (§5.9) places them
correctly. The case where a Span spawns a long-lived background child
(e.g. `Kernel#spawn` with no wait) is out of v1 scope; see §6 item 3.

### 5.11 Time anchoring and header format

The Correlator captures a session anchor at startup and another at
shutdown:

- `session_start_iso` / `session_stop_iso` — wall clock at
  Correlator Thread start / stop (ISO 8601 with millisecond
  precision, UTC).
- `session_start_ktime` / `session_stop_ktime` —
  `bpf_ktime_get_ns()` values sampled at the same instants.

These are emitted at the top of the rendered output as a multi-line
comment header (lines beginning with `#`):

```
# vivarium session
# started iso=2026-05-27T19:00:00.000Z ktime=12345678900
# stopped iso=2026-05-27T19:00:30.250Z ktime=42595678900
# duration 30.250s
[PROC pid=100 comm=ruby]
...
```

The `ktime` values are the absolute anchor for the entire session.
All Format A timestamps (§5.4) are `@+Xms` offsets relative to their
enclosing Span's `start_ktime`. Span `start_ktime` itself can be
recovered as `session_start_ktime + (span offset from session start)`
if needed, but is not exposed in the default rendering.

Comment lines (`#`-prefixed) are out of band; renderers may emit
additional `#` lines for warnings (§5.12).
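A minimal sketch of producing this header, assuming each anchor is passed in as a `Time` plus an integer nanosecond counter; `session_header` and `capture_anchor` are hypothetical names. In this sketch the `# duration` line is derived from the ktime difference rather than the wall clock, so a wall-clock step during the session cannot distort it.

```ruby
require "time"

# Builds the `#` comment header from two (wall clock, ktime) anchor pairs.
def session_header(start_time, start_ktime, stop_time, stop_ktime)
  duration_s = (stop_ktime - start_ktime) / 1e9
  [
    "# vivarium session",
    "# started iso=#{start_time.getutc.iso8601(3)} ktime=#{start_ktime}",
    "# stopped iso=#{stop_time.getutc.iso8601(3)} ktime=#{stop_ktime}",
    format("# duration %.3fs", duration_s)
  ]
end

# One anchor pair; §7.2 notes that v1 reads CLOCK_MONOTONIC in userspace as
# the stand-in for bpf_ktime_get_ns().
def capture_anchor
  [Time.now, Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)]
end

t0 = Time.utc(2026, 5, 27, 19, 0, 0)
puts session_header(t0, 12_345_678_900, t0 + 30.25, 42_595_678_900)
# # vivarium session
# # started iso=2026-05-27T19:00:00.000Z ktime=12345678900
# # stopped iso=2026-05-27T19:00:30.250Z ktime=42595678900
# # duration 30.250s
```

`getutc` is used instead of `utc` because `Time#utc` converts its receiver in place.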

### 5.12 method_id resolution and ordering

`method_id` registrations from the Main Thread arrive on the Queue
out of order relative to ringbuf events bearing the same
`method_id` (the ringbuf `span_start` may be observed before the
Queue registration is processed). v1 handles this **lazily**:

1. The Correlator updates its local `method_id → signature` table
   as Queue messages arrive, but never blocks on it.
2. Span name resolution is **deferred until rendering** (which
   happens once at session end per §5.10). By that point the Queue
   should be drained and the table should be complete.
3. Any `method_id` still unresolved at render time is rendered with
   the placeholder name **`<method_id=0x{hex}>`**, and a warning
   line is emitted immediately after the session header:

   ```
   # warning method_id=0xABCD1234EF567890 unresolved at render time
   ```

4. If unresolved warnings become frequent in practice, revisit the
   registration delivery (e.g., push registrations through the same
   ringbuf, or block the Main Thread until ack). v1 does not.
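Step 3 amounts to a lookup with a fallback. This is a minimal sketch, assuming the table is a plain `Hash` from Integer ids to signatures and that the renderer prints the returned warnings right after the session header; the zero-padded uppercase hex mirrors the example warning line, and `resolve_names` is a hypothetical name.

```ruby
# Resolves method_ids at render time, falling back to a placeholder and
# collecting one warning line per distinct unresolved id.
def resolve_names(method_ids, table)
  warnings = []
  names = method_ids.map do |id|
    table.fetch(id) do
      hex = format("0x%016X", id)
      warnings << "# warning method_id=#{hex} unresolved at render time"
      "<method_id=#{hex}>"
    end
  end
  [names, warnings.uniq]
end

names, warnings = resolve_names([1, 0xABCD1234EF567890], { 1 => "Kernel#system" })
# names    => ["Kernel#system", "<method_id=0xABCD1234EF567890>"]
# warnings => ["# warning method_id=0xABCD1234EF567890 unresolved at render time"]
```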

---

## 6. Decisions still open

1. **Backpressure.** What happens when `bpf_ringbuf_reserve` returns
   NULL? Need a drop counter (a separate small map) and the Correlator
   should render `[DROPPED n]` markers in-tree at the point of loss.

2. **`method_id` collisions.** Hash space is 64-bit; collisions are
   astronomically unlikely but not impossible. Proposal: ignore for
   v1, document.

3. **Long-lived background children.** §5.10 assumes the allowlisted
   method waits for its child (true for `Kernel#system`). With the
   broader v1 allowlist (Process spawn, etc.) a Span may close while
   its child is still alive, causing child events to land outside any
   real Span and be absorbed by a synthetic `<no-span>` gap. Acceptable
   for now; revisit if this becomes a usability issue.

4. **`:raise` noise filtering.** `:raise` is currently unfiltered
   (§5.8). If library-internal exceptions become a rendering nuisance
   beyond what the `vivarium_usdt` exception-safety guarantee buffers,
   reintroduce a defined-class denylist or an allowlist.

---

## 7. Implementation map (v1)

### 7.1 Where the §5 design lives in code

| §5 concept | Code location |
|---|---|
| `event_t` v2 (with `tid`) | [lib/vivarium.rb](lib/vivarium.rb) — `EVENT_TID_OFFSET=12`, `EVENT_PAYLOAD_OFFSET=32` + BPF `struct event_t` |
| `BPF_RINGBUF_OUTPUT(events, ...)` | [lib/vivarium.rb](lib/vivarium.rb) — replaces former `event_invoked` / `event_write_pos` array maps |
| Auto-`tid`/`ktime_ns` fill | [lib/vivarium.rb](lib/vivarium.rb) — `submit_event()` BPF helper sets both from `bpf_get_current_pid_tgid` |
| `proc_fork` ringbuf event | [lib/vivarium.rb](lib/vivarium.rb) — `TRACEPOINT_PROBE(sched, sched_process_fork)` emits when target |
| USDT uprobe handlers | [lib/vivarium.rb](lib/vivarium.rb) — `on_span_start` / `on_span_stop` / `on_span_raise` (BPF C) |
| USDT attach via `.so` path | [lib/vivarium.rb](lib/vivarium.rb) — `Daemon#run` calls `RbBCC::USDT.new(path: ...)` |
| `.so` path discovery | [lib/vivarium.rb](lib/vivarium.rb) — `Vivarium.locate_vivarium_usdt_so` |
| `MapStore` (slim, registration only) | [lib/vivarium.rb](lib/vivarium.rb) — `register_pid` / `unregister_pid` only |
| Correlator Thread | [lib/vivarium/correlator.rb](lib/vivarium/correlator.rb) |
| Format A renderer | [lib/vivarium/tree_renderer.rb](lib/vivarium/tree_renderer.rb) |
| `gettid(2)` via Fiddle | [lib/vivarium.rb](lib/vivarium.rb) — `Vivarium.gettid` |
| `method_id` Queue + lazy resolve | `Thread::Queue` created in `top_observe` / `scoped_observe`, drained by Correlator, resolved at render |
| Span allowlist (class + method) | [lib/vivarium.rb](lib/vivarium.rb) — `SPAN_ALLOWCLASSES` (Socket/File/Dir/Signal/Process families) + `SPAN_ALLOWLIST` (Kernel#system/require/load/eval, Object#instance_eval/instance_exec) |
| TracePoint USDT firing | [lib/vivarium.rb](lib/vivarium.rb) — `build_observe_tracepoint` (`:call`/`:c_call`/`:return`/`:c_return` gated by allowlist; `:raise` unfiltered → `raise_probe`) |
| `span_raise` payload + event render | [lib/vivarium.rb](lib/vivarium.rb) `decode_span_raise_payload` + [lib/vivarium/tree_renderer.rb](lib/vivarium/tree_renderer.rb) `render_raise_target` (EXCP kind, `(raise)` Span suffix) |
| Per-Span `file:lineno` | USDT probe args (24B start/stop, 32B raise) resolved via `Vivarium::Usdt.register_or_resolve_file`; rendered by `TreeRenderer#span_file_info` |

### 7.2 v1 deviations and pragmatic choices

- **USDT attach is binary-path-based at daemon startup** (chosen over
  per-PID dynamic attach). vivariumd loads `vivarium_usdt` once,
  resolves its `.so` via `$LOADED_FEATURES`, and passes it as a single
  `usdt_contexts:` to `RbBCC::BCC.new`. Per-PID filtering still happens
  BPF-side via `target_enabled` (unchanged).
- **Logger (`lib/vivarium/logger.rb`) is orphaned but retained.** All
  Observer-side rendering goes through the Correlator / TreeRenderer.
  The kprint diagnostic thread in `Daemon` continues to use `puts` /
  `warn` directly, not Logger. The file is left intact to avoid
  collateral churn; it can be deleted in a follow-up.
- **`render_event_payload` is reused inside TreeRenderer** as the
  `target` text for event lines. This couples Format A target rendering
  to the existing decoder set; v1 accepts the coupling rather than
  duplicate the decoders.
- **`Process.clock_gettime(CLOCK_MONOTONIC, :nanosecond)`** is used as
  the userspace anchor for `session_start_ktime` / `session_stop_ktime`.
  This is treated as equivalent to BPF `bpf_ktime_get_ns()`; any small
  divergence (suspend handling on older kernels) is accepted for v1.
- **method_id resolution is fully lazy at render time** per §5.12,
  using `<method_id=0xHEX>` placeholder + `# warning ...` header line
  if unresolved. No back-fill, no Main-Thread blocking, no second
  registration channel.
- **Synthetic `<no-span>` gap rendering is emitted only when non-empty**
  (matches §5.10's "Empty gaps are not rendered" rule).
data/README.md
CHANGED
```diff
@@ -2,10 +2,10 @@
 
 [](https://rubygems.org/gems/vivarium)
 
-RubyGems: https://rubygems.org/gems/vivarium
-
 Vivarium is an observation and sandbox helper for Ruby.
 
+<img src="./image.png" alt="Logo - generated by Nano Banana 2" width="450">
+
 It combines:
 
 - eBPF LSM monitoring via RbBCC (`vivariumd`)
```
data/examples/raise_demo.rb
ADDED
@@ -0,0 +1,42 @@
```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

require "vivarium"

def try_step(title)
  puts "[priv-demo] #{title}"
  yield
rescue StandardError => e
  puts "[priv-demo] #{title} failed: #{e.class}: #{e.message}"
end

Vivarium.observe do
  try_step("raise in main") do
    raise "error in main"
  end

  try_step("raise in eval") do
    eval("raise 'error in eval'")
  end

  try_step("raise in nested eval") do
    eval(<<~RUBY)
      eval(<<~INNER_RUBY)
        begin
          eval(<<~INNER_INNER_RUBY)
            puts "Hi"
            raise "error in nested nested eval"
          INNER_INNER_RUBY
        rescue StandardError => _
          puts "Rescued in nested eval"
        end
        File.open("/etc/hosts")
      INNER_RUBY
    RUBY
  end


  try_step("raise in method") do
    File.open("notfound")
  end
end
```
data/examples/sudo_attempt_demo.rb
ADDED
@@ -0,0 +1,18 @@
```ruby
# frozen_string_literal: true

def debug_output(msg)
  $stderr.puts("[DEBUG] #{msg}") if ENV["VIVARIUM_DEBUG"]
end

debug_output "=== sudo attempt demo ==="

debug_output "[1] Attempting: sudo id"
system("sudo", "-n", "id")

debug_output "[2] Attempting: sudo cat /etc/shadow"
system("sudo", "-n", "cat", "/etc/shadow")

debug_output "[3] Attempting: sudo cat /proc/1/environ"
system("sudo", "-n", "cat", "/proc/1/environ")

debug_output "=== done ==="
```
data/exe/vivarium
ADDED
data/image.png
ADDED
Binary file
data/lib/vivarium/cli.rb
ADDED
@@ -0,0 +1,40 @@
```ruby
# frozen_string_literal: true

require "optparse"

module Vivarium
  module CLI
    def self.run!(argv = ARGV)
      options = { pin_dir: Vivarium.bpf_pin_dir, dest: $stdout }
      parser = OptionParser.new do |opts|
        opts.banner = "Usage: vivarium [options] <command> [args]"
        opts.separator ""
        opts.separator "Commands:"
        opts.separator " load <script> Load and observe a Ruby script"
        opts.separator ""
        opts.separator "Options:"
        opts.on("--pin-dir PATH", "Pinned map directory") { |v| options[:pin_dir] = v }
        opts.on("-o", "--output PATH", "Log output file (default: stdout)") { |v| options[:dest] = File.open(v, "a") }
      end
      parser.order!(argv)

      command = argv.shift
      case command
      when "load"
        run_load!(argv, options)
      else
        abort parser.help
      end
    end

    def self.run_load!(argv, options)
      script = argv.shift
      abort "Usage: vivarium load <script>" unless script
      abort "File not found: #{script}" unless File.exist?(script)

      Vivarium.observe(pin_dir: options[:pin_dir], dest: options[:dest]) do
        Kernel.load(File.expand_path(script))
      end
    end
  end
end
```