rperf 0.8.0 → 0.10.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 520b1f5fd883bd68232c2b714aa1a66dbf811a3f3d2b2b54c212233f4a97d1c4
4
- data.tar.gz: 11f5a6f52444abebc28a41055726eab246d623457c85001ba737fcb790b20cea
3
+ metadata.gz: 0dd9ddc0ea22d51ff2ebcdae6ddfa2f5e531c4a7814552b16470668d04d1d617
4
+ data.tar.gz: fce76e35ea6ccbca8e8b429da4b55574349f98b8731a2db3826efd908082dda6
5
5
  SHA512:
6
- metadata.gz: 88c50af83f66f569739bd37377cc2dfc51f7bc6970243cb14bf5b2a7defc9ab6fee60f0e43f85e30f5acf0a315c2774ff0fd88b36c6ed913a5e9d6ef4aa63352
7
- data.tar.gz: 63359d42f26529e726ef070f335e726b62bef23b0c897092b6010385ae91d01790039da35e300e734c4dfebb796c8b687759bdce395e13fcb778dca89c0318a1
6
+ metadata.gz: dceedb899f6b9249ae342e5701ef3280253c82d317ad73dea0bdd0c0c040a576740bb747c05c79b1f6118c82b05ad4756207ac046d9cc4805982a2f24ad09e1b
7
+ data.tar.gz: 4353afbe9ca80126c25884e837aacdd902a1e9d20756d543d235415492e17932490711b348ee49eb9e5144f977027a1284876cbd4dea4de31c342a16db16d931
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Koichi Sasada
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md CHANGED
@@ -69,14 +69,23 @@ rperf stat ruby app.rb
69
69
  rperf record ruby app.rb # → rperf.json.gz (cpu mode, default)
70
70
  rperf record -m wall ruby server.rb # wall mode
71
71
 
72
- # View results in browser (no external tools needed)
72
+ # View results in browser
73
73
  rperf report # open rperf.json.gz in viewer
74
74
  rperf report --top profile.json.gz # print top functions to terminal
75
75
 
76
76
  # Compare two profiles (requires Go)
77
77
  rperf diff before.json.gz after.json.gz # open diff in browser
78
+
79
+ # Track performance across commits (time-travel viewer)
80
+ rperf record --snapshot-dir ./profiles ruby app.rb # → profiles/rperf-<sha7>-<ts>.json.gz
81
+ rperf report ./profiles/ # sidebar: per-commit list, diff, sparkline
82
+
83
+ # Flat tables for AI analysis (no Go required)
84
+ rperf diff --format table base.json.gz head.json.gz | claude -p "analyze the regression"
78
85
  ```
79
86
 
87
+ On `rperf report`, you can see the profile result like this page: [rperf viewer](https://ko1.github.io/rperf/examples/cpu_intensive_profile.html)
88
+
80
89
  ### Ruby API
81
90
 
82
91
  ```ruby
@@ -117,10 +126,12 @@ Thread.new { loop { sleep 3600; Rperf::Viewer.instance&.take_snapshot! } }
117
126
  Profile without code changes (e.g., Rails):
118
127
 
119
128
  ```bash
120
- RPERF_ENABLED=1 RPERF_MODE=wall ruby app.rb # → rperf.json.gz
121
- rperf report # open in viewer
129
+ RPERF_ENABLED=1 RPERF_MODE=wall RUBYOPT=-rrperf ruby app.rb # → rperf.json.gz
130
+ rperf report # open in viewer
122
131
  ```
123
132
 
133
+ `RPERF_ENABLED` takes effect when rperf is loaded — if rperf is already in your Gemfile (e.g., Rails), `RUBYOPT=-rrperf` is unnecessary.
134
+
124
135
  Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/docs/manual/).
125
136
 
126
137
  ## Subcommands
@@ -132,8 +143,8 @@ Inspired by Linux `perf` — familiar subcommand interface for profiling workflo
132
143
  | `rperf record` | Profile a command and save to file (default: `.json.gz`) |
133
144
  | `rperf stat` | Profile a command and print summary to stderr |
134
145
  | `rperf exec` | Profile a command and print full report to stderr |
135
- | `rperf report` | Open viewer for `.json.gz`; wraps `go tool pprof` for `.pb.gz` (requires Go) |
136
- | `rperf diff` | Compare two profiles (requires Go) |
146
+ | `rperf report` | Open viewer for `.json.gz` (or a directory for time-travel mode); wraps `go tool pprof` for `.pb.gz` (requires Go) |
147
+ | `rperf diff` | Compare two profiles (requires Go, except `--format table`) |
137
148
  | `rperf help` | Show full reference documentation |
138
149
 
139
150
  ## How It Works
@@ -156,7 +167,7 @@ Timer (signal or thread) VM thread (postponed job)
156
167
  record(backtrace, weight)
157
168
  ```
158
169
 
159
- On Linux, the timer uses `timer_create` + signal delivery (no extra thread).
170
+ On Linux, the timer uses `timer_create` + signal delivery to a dedicated worker thread.
160
171
  On other platforms, a dedicated pthread with `nanosleep` is used.
161
172
 
162
173
  If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
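The time-delta weighting described above can be illustrated with a small Ruby sketch. This is not rperf's implementation (the real sampler is a C extension); `weigh_samples` and the timestamps below are hypothetical and only show why the weights sum to the elapsed time even when a safepoint runs late.

```ruby
# Hypothetical illustration of time-delta weighting; not rperf's actual code.
# Each sample is [stack, timestamp_ns], where the timestamp is the moment the
# safepoint actually ran (possibly later than the timer tick).
def weigh_samples(start_ns, samples)
  last = start_ns
  samples.map do |stack, at_ns|
    weight = at_ns - last # time elapsed since the previous sample
    last = at_ns
    [stack, weight]
  end
end

# 1 ms timer, but the second safepoint was delayed by 2 ms.
samples = [
  ["A#fast", 1_000_000],
  ["B#slow", 4_000_000], # delayed: carries 3 ms of weight
  ["A#fast", 5_000_000],
]

weighted = weigh_samples(0, samples)
totals = weighted.each_with_object(Hash.new(0)) { |(stack, w), h| h[stack] += w }
```

With these made-up inputs the delayed sample is charged 3 ms instead of 1 ms, and the per-stack totals add up to the full 5 ms window.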
@@ -170,32 +181,32 @@ If a safepoint is delayed, the sample carries proportionally more weight. The to
170
181
 
171
182
  Use `cpu` to find what consumes CPU. Use `wall` to find what makes things slow (I/O, GVL contention, GC).
172
183
 
173
- ### GVL and GC Labels (wall mode)
184
+ ### GVL and GC Labels
174
185
 
175
186
  rperf hooks GVL and GC events to attribute non-CPU time. These are recorded as labels on samples rather than synthetic stack frames:
176
187
 
177
- | Label | Meaning |
178
- |-------|---------|
179
- | `%GVL: blocked` | Off-GVL time (I/O, sleep, C extension releasing GVL) |
180
- | `%GVL: wait` | Waiting to reacquire the GVL (contention) |
181
- | `%GC: mark` | Time in GC mark phase |
182
- | `%GC: sweep` | Time in GC sweep phase |
188
+ | Label (key=value) | Mode | Meaning |
189
+ |-------|------|---------|
190
+ | `%GVL=blocked` | wall only | Off-GVL time (I/O, sleep, C extension releasing GVL) |
191
+ | `%GVL=wait` | wall only | Waiting to reacquire the GVL (contention) |
192
+ | `%GC=mark` | cpu and wall | Time in GC mark phase (wall time) |
193
+ | `%GC=sweep` | cpu and wall | Time in GC sweep phase (wall time) |
183
194
 
184
195
  ## Why rperf?
185
196
 
186
197
  - **Accurate despite safepoints** — Safepoint sampling is *safer* (no async-signal-safety issues), but normally *inaccurate*. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
187
198
  - **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via sample labels.
188
199
  - **Built-in viewer** — Flamegraph, Top, Tags tabs with interactive tag filtering. No external tools needed to analyze profiles.
189
- - **Low overhead** — Signal-based timer on Linux (no extra thread). ~1–5 us per sample.
200
+ - **Low overhead** — Signal-based timer on Linux (signals delivered to a dedicated worker thread — Ruby threads are never interrupted). ~1–5 us per sample.
190
201
  - **Zero code changes** — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
191
202
  - **`perf`-like CLI** — `record`, `stat`, `report`, `diff` — if you know Linux perf, you already know rperf.
203
+ - **Multi-process** — automatically profiles forked/spawned Ruby child processes (e.g., Unicorn/Puma workers). Use `--no-inherit` to disable.
192
204
 
193
205
  ### Limitations
194
206
 
195
207
  - **Method-level only** — no line-level granularity.
196
208
  - **Ruby >= 3.4.0** — uses recent VM internals (postponed jobs, thread event hooks).
197
209
  - **POSIX only** — Linux, macOS. No Windows.
198
- - **No fork following** — profiling stops in fork(2) child processes (the child can start a new session).
199
210
 
200
211
 
201
212
  ## Output Formats
data/docs/help.md CHANGED
@@ -12,7 +12,9 @@ POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
12
12
  rperf stat [options] command [args...]
13
13
  rperf exec [options] command [args...]
14
14
  rperf report [options] [file]
15
+ rperf diff [options] base target
15
16
  rperf help
17
+ rperf -v / --version
16
18
 
17
19
  ### record: Profile and save to file.
18
20
 
@@ -24,8 +26,17 @@ POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
24
26
  (same as --format=text --output=/dev/stdout)
25
27
  --signal VALUE Timer signal (Linux only): signal number, or 'false'
26
28
  for nanosleep thread (default: auto)
29
+ --snapshot-dir DIR Save as rperf-<sha7>-<timestamp>.json.gz in DIR
30
+ (rperf-nogit-<timestamp>-<pid>.json.gz outside git)
31
+ --label KEY=VALUE Add a label to profile metadata (repeatable)
32
+ --no-inherit Do not profile forked/spawned child processes
33
+ --no-aggregate Disable C-level sample aggregation (raw per-sample data)
27
34
  -v, --verbose Print sampling statistics to stderr
28
35
 
36
+ JSON output embeds `meta` (git commit, host, Ruby/rperf versions, labels)
37
+ and `summary` (time, GC, allocation, top methods) — see "Profile metadata"
38
+ under OUTPUT FORMATS.
39
+
29
40
  ### stat: Run command and print performance summary to stderr.
30
41
 
31
42
  Uses wall mode by default. No file output by default.
@@ -33,9 +44,12 @@ Uses wall mode by default. No file output by default.
33
44
  -o, --output PATH Also save profile to file (default: none)
34
45
  -f, --frequency HZ Sampling frequency in Hz (default: 1000)
35
46
  -m, --mode MODE cpu or wall (default: wall)
47
+ --label KEY=VALUE Add a label to profile metadata (repeatable)
36
48
  --report Include flat/cumulative profile tables in output
37
49
  --signal VALUE Timer signal (Linux only): signal number, or 'false'
38
50
  for nanosleep thread (default: auto)
51
+ --no-inherit Do not profile forked/spawned child processes
52
+ --no-aggregate Disable C-level sample aggregation (raw per-sample data)
39
53
  -v, --verbose Print additional sampling statistics
40
54
 
41
55
  Shows: user/sys/real time, time breakdown (CPU execution, GVL blocked,
@@ -44,6 +58,10 @@ Lines are prefixed: `[Rperf]` for sampling-derived data, `[Ruby ]` for
44
58
  runtime info, `[OS ]` for OS-level info.
45
59
  Use --report to add flat and cumulative top-50 function tables.
46
60
 
61
+ When child processes are profiled (default), the stat output shows
62
+ aggregated data from all processes and includes a "Ruby processes profiled"
63
+ count. Use --no-inherit to disable child process tracking.
64
+
47
65
  ### exec: Run command and print full profile report to stderr.
48
66
 
49
67
  Like `stat --report`. Uses wall mode by default. No file output by default.
@@ -51,8 +69,11 @@ Like `stat --report`. Uses wall mode by default. No file output by default.
51
69
  -o, --output PATH Also save profile to file (default: none)
52
70
  -f, --frequency HZ Sampling frequency in Hz (default: 1000)
53
71
  -m, --mode MODE cpu or wall (default: wall)
72
+ --label KEY=VALUE Add a label to profile metadata (repeatable)
54
73
  --signal VALUE Timer signal (Linux only): signal number, or 'false'
55
74
  for nanosleep thread (default: auto)
75
+ --no-inherit Do not profile forked/spawned child processes
76
+ --no-aggregate Disable C-level sample aggregation (raw per-sample data)
56
77
  -v, --verbose Print additional sampling statistics
57
78
 
58
79
  Shows: user/sys/real time, time breakdown, GC/memory/OS stats, profiler overhead,
@@ -62,17 +83,136 @@ and flat/cumulative top-50 function tables.
62
83
 
63
84
  --top Print top functions by flat time
64
85
  --text Print text report
86
+ --format FORMAT Flat table for AI/machine consumption:
87
+ 'table' (TSV) or 'table-json' (JSON array)
88
+ --html Output static HTML viewer to stdout
89
+ --port PORT Port for the web UI (default: auto).
90
+ Useful for SSH port forwarding
91
+ --host HOST Bind address for the web UI (default: localhost).
92
+ 0.0.0.0 allows external access — the viewer has
93
+ NO authentication; prefer SSH port forwarding
94
+
95
+ The browser is auto-opened only when a GUI is available (DISPLAY /
96
+ WAYLAND_DISPLAY on Linux) and the bind address is local; otherwise the URL
97
+ is printed for manual opening (no terminal-browser fallback).
65
98
 
66
99
  Default (no flag): opens interactive web UI in browser.
67
100
  Default file: rperf.json.gz
68
101
 
69
- ### diff: Compare two pprof profiles (target - base). Requires Go.
102
+ #### Time-travel mode (directory input)
103
+
104
+ rperf report ./profiles/
105
+
106
+ Passing a directory lists all `*.json(.gz)` profiles in a sidebar — one row
107
+ per snapshot with commit SHA (a `*` marks a dirty working tree), commit
108
+ subject, date, and alloc/GC badges versus the previous snapshot (⚠️ when
109
+ allocation changed more than ±15%). Rows are grouped by git branch
110
+ (main/master expanded by default). Only meta/summary heads are read for the
111
+ listing; profile bodies are lazy-loaded on selection, so directories with
112
+ 100+ snapshots open instantly. Works well with `rperf record --snapshot-dir`.
113
+
114
+ - Click a row to view that snapshot; j / k move to the newer / older one.
115
+ - ⇄ on a row diffs the current snapshot against it: the flamegraph is
116
+ recolored by share change (red = increased, blue = decreased, neutral
117
+ below ±0.4pt; direction is base (older) → current).
118
+ - Shift+click a frame to pin that method: a sparkline of its share across
119
+ all snapshots appears at the top of the sidebar (points are filled in as
120
+ bodies load; click a point to jump). The pinned frame is highlighted and
121
+ others are dimmed.
122
+ - Files without meta (saved by older rperf) appear as unknown snapshots.
123
+
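The thresholds mentioned in the time-travel description (neutral below ±0.4pt, warning above ±15% allocation change) can be sketched as follows. The helper names are hypothetical, the viewer itself runs in the browser, and the handling of values exactly at a threshold is an assumption.

```ruby
# Hypothetical helpers illustrating the documented thresholds; the real
# viewer logic may differ, especially at the exact boundary values.
NEUTRAL_PT = 0.4        # percentage points of share
ALLOC_WARN_RATIO = 0.15 # relative change in allocated objects

# Direction is base (older) -> current, as in the help text.
def share_color(base_pct, current_pct)
  delta = current_pct - base_pct
  return :neutral if delta.abs < NEUTRAL_PT
  delta.positive? ? :red : :blue
end

def alloc_warning?(prev_allocated, current_allocated)
  return false if prev_allocated.zero? # no baseline to compare against
  change = (current_allocated - prev_allocated).abs.to_f / prev_allocated
  change > ALLOC_WARN_RATIO
end
```

For example, a method whose share moves from 10.0% to 12.5% would be colored red under these assumptions, while a move to 10.2% stays neutral.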
124
+ `--html` generates an HTML file with profile data embedded inline.
125
+ No server is needed — open it directly in a browser. d3 and
126
+ d3-flamegraph are loaded from CDN, so an internet connection is
127
+ required on first viewing. Useful for sharing or hosting on static
128
+ sites (e.g., GitHub Pages).
129
+
130
+ rperf report --html profile.json.gz > report.html
131
+
132
+ ### diff: Compare two profiles (target - base). Requires Go (except --format table).
133
+
134
+ Accepts `.json.gz` (auto-converted to pprof) or `.pb.gz` files.
70
135
 
71
136
  --top Print top functions by diff
72
137
  --text Print text diff report
138
+ --format FORMAT Flat diff table for AI/machine consumption:
139
+ 'table' (TSV) or 'table-json' (JSON array).
140
+ Computed in Ruby — no Go required.
141
+ .json.gz / .json files only.
142
+ --port PORT Port for the web UI (default: auto)
143
+ --host HOST Bind address for the web UI (default: localhost)
73
144
 
74
145
  Default (no flag): opens diff in browser.
75
146
 
147
+ ### Table output for AI analysis (--format table / table-json)
148
+
149
+ Aggregation, diffing, and cutoff happen on the rperf side; the output is a
150
+ flat table that an LLM can analyze directly — no tree walking required.
151
+
152
+ `rperf report --format table FILE` columns (self_pct descending, top 50
153
+ plus an `(other)` aggregate row):
154
+
155
+ method self_pct total_pct self_ms
156
+
157
+ `rperf diff --format table BASE HEAD` columns (|delta_pt| descending,
158
+ top 50; delta_pt = self_pct_head - self_pct_base in percentage points):
159
+
160
+ method self_pct_base self_pct_head delta_pt
161
+
162
+ Per-method allocation data does not exist in sampling profiles, so
163
+ allocation counts appear only in the summary (whole-profile delta).
164
+
165
+ The last TSV line is `# summary` with tab-separated key=value pairs
166
+ (total_ms / allocated_objects / GC counts; base/head/delta for diff).
167
+ With `table-json`, the output is a JSON array of row objects whose last
168
+ element is `{"summary": {...}}`.
169
+
170
+ Feed the result to an LLM:
171
+
172
+ rperf diff --format table base.json.gz head.json.gz | claude -p "analyze the cause of the regression"
173
+
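The flat table is also easy to consume from a script. Below is a minimal Ruby sketch of a parser for the diff table. The column names come from the help text above; the presence of a header row, the exact layout of the `# summary` line, and the summary key names in the sample are assumptions, and `parse_diff_table` is a hypothetical helper rather than part of rperf.

```ruby
# Hypothetical parser for the TSV emitted by `rperf diff --format table`.
# Column names come from the help text; everything else is an assumption.
DIFF_COLUMNS = %w[method self_pct_base self_pct_head delta_pt].freeze

def parse_diff_table(tsv)
  rows = []
  summary = {}
  tsv.each_line(chomp: true) do |line|
    next if line.empty?
    fields = line.split("\t")
    if fields.first == "# summary"
      # Assumed layout: "# summary<TAB>key=value<TAB>key=value..."
      fields.drop(1).each do |pair|
        key, value = pair.split("=", 2)
        summary[key] = value
      end
    elsif fields != DIFF_COLUMNS # skip a header row, if one is emitted
      method_name, *numbers = fields
      row = { "method" => method_name }
      DIFF_COLUMNS.drop(1).zip(numbers) { |col, num| row[col] = num.to_f }
      rows << row
    end
  end
  { rows: rows, summary: summary }
end

# Made-up sample input; the summary key names are placeholders.
sample = [
  "method\tself_pct_base\tself_pct_head\tdelta_pt",
  "Object#fibonacci\t40.0\t55.5\t15.5",
  "Array#each\t30.0\t20.0\t-10.0",
  "# summary\ttotal_ms_base=2001.8\ttotal_ms_head=2500.0",
].join("\n")

result = parse_diff_table(sample)
regressions = result[:rows].select { |row| row["delta_pt"] > 0 }
```

Keeping summary values as strings avoids guessing their types; a caller can convert the keys it cares about.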
174
+ ### Multi-process profiling
175
+
176
+ By default, rperf profiles forked and spawned Ruby child processes.
177
+ Profiles from all processes are merged into a single output. Each child
178
+ process's samples are tagged with a `%pid` label for per-process filtering.
179
+
180
+ # Profile a preforking server (Unicorn, Puma, etc.)
181
+ rperf stat -m wall bundle exec unicorn
182
+ rperf record -m wall -o profile.json.gz bundle exec unicorn
183
+
184
+ # Profile with fork
185
+ rperf stat ruby -e '4.times { fork { work } }; Process.waitall'
186
+
187
+ # Disable child process tracking
188
+ rperf stat --no-inherit ruby app.rb
189
+
190
+ How it works:
191
+
192
+ - On fork: `Process._fork` hook restarts profiling in the child and sets
193
+ a `%pid` label. When the child exits, its profile is saved to a
194
+ temporary session directory.
195
+ - On spawn/system: The spawned Ruby process inherits `RUBYLIB` (pointing
196
+ to rperf's lib directory) and `RUBYOPT=-rrperf`, plus `RPERF_SESSION_DIR`.
197
+ It auto-starts profiling and writes its profile to the session directory.
198
+ - When the root process exits, it aggregates all profiles from the
199
+ session directory into a single output (stat report or file).
200
+ - The session directory is cleaned up after aggregation.
201
+
202
+ Limitations:
203
+
204
+ - Daemon children (Process.daemon) that outlive the parent will have
205
+ their profiles lost, since the parent aggregates and cleans up the
206
+ session directory at exit.
207
+ - Cross-process snapshots (Rperf.snapshot) are not supported; snapshots
208
+ only cover the current process.
209
+ - Only Ruby child processes are profiled; non-Ruby children (shell
210
+ scripts, Python, etc.) are not affected.
211
+ - Child processes that use rperf independently (Rperf.start in their
212
+ own code) will conflict with the inherited auto-start session.
213
+ Such programs should clear RPERF_ENABLED from their environment
214
+ before requiring rperf.
215
+
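Because each child's samples carry a `%pid` label, a merged profile can be split per process after the fact. The sketch below works on a hash shaped like the `Rperf.stop` return value documented in this file (`aggregated_samples` plus `label_sets`). The helper `weight_by_label`, the sample data, the `%pid` value type, and the idea that root-process samples have no `%pid` label are all assumptions made for illustration.

```ruby
# Hypothetical helper: sum sample weights (nanoseconds) per value of one label.
# Samples whose label set lacks the key are grouped under nil.
def weight_by_label(data, key)
  totals = Hash.new(0)
  data[:aggregated_samples].each do |_frames, weight, _seq, label_set_id|
    value = data[:label_sets][label_set_id][key]
    totals[value] += weight
  end
  totals
end

# Made-up data in the documented [frames, weight, seq, label_set_id] layout.
data = {
  aggregated_samples: [
    [[["app.rb", "Object#work"]], 3_000_000, 1, 0],
    [[["app.rb", "Object#work"]], 2_000_000, 1, 1],
    [[["app.rb", "Object#idle"]], 1_000_000, 1, 1],
    [[["app.rb", "Object#work"]], 4_000_000, 1, 2],
  ],
  label_sets: [{}, { "%pid": "101" }, { "%pid": "102" }],
}

by_pid = weight_by_label(data, :"%pid")
```

The same helper works for any other label key, for example `:"%GVL"` or `:"%GC"`. Data loaded from JSON may use string keys instead of symbols, so the key passed in has to match.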
76
216
  ### Examples
77
217
 
78
218
  rperf record ruby app.rb
@@ -81,15 +221,22 @@ Default (no flag): opens diff in browser.
81
221
  rperf record -o profile.collapsed ruby app.rb
82
222
  rperf record -o profile.txt ruby app.rb
83
223
  rperf record -p ruby app.rb
224
+ rperf record --snapshot-dir ./profiles ruby app.rb
225
+ rperf record --label ci=github-actions --label pr=123 ruby app.rb
84
226
  rperf stat ruby app.rb
85
227
  rperf stat --report ruby app.rb
86
228
  rperf stat -o profile.pb.gz ruby app.rb
229
+ rperf stat -m wall bundle exec unicorn
230
+ rperf stat --no-inherit ruby app.rb
87
231
  rperf exec ruby app.rb
88
232
  rperf exec -m cpu ruby app.rb
89
233
  rperf report
90
234
  rperf report --top profile.pb.gz
235
+ rperf report --format table profile.json.gz
236
+ rperf report ./profiles/
91
237
  rperf diff before.pb.gz after.pb.gz
92
238
  rperf diff --top before.pb.gz after.pb.gz
239
+ rperf diff --format table before.json.gz after.json.gz
93
240
 
94
241
  ## RUBY API
95
242
 
@@ -114,12 +261,17 @@ Rperf.save("profile.txt", data)
114
261
 
115
262
  ### Rperf.start parameters
116
263
 
117
- frequency: Sampling frequency in Hz (Integer, default: 1000)
264
+ frequency: Sampling frequency in Hz (Integer, 1..10000, default: 1000)
118
265
  mode: :cpu or :wall (Symbol, default: :cpu)
119
266
  output: File path to write on stop (String or nil)
120
267
  verbose: Print statistics to stderr (true/false, default: false)
121
268
  format: :json, :pprof, :collapsed, :text, or nil for auto-detect (Symbol or nil)
122
269
  defer: Start with timer paused; use Rperf.profile to activate (default: false)
270
+ inherit: Child process tracking: :fork (default), true (fork+spawn), false (none)
271
+ Note: CLI defaults to true (--no-inherit to disable)
272
+ signal: Timer signal (Linux only): nil (default, auto), false (use nanosleep),
273
+ or a signal number (Integer)
274
+ aggregate: Aggregate samples in C (default: true). false returns raw per-sample data
123
275
 
124
276
  ### Rperf.stop return value
125
277
 
@@ -128,24 +280,30 @@ nil if profiler was not running; otherwise a Hash:
128
280
  ```ruby
129
281
  { mode: :cpu, # or :wall
130
282
  frequency: 500,
131
- sampling_count: 1234,
283
+ trigger_count: 1300, # number of timer triggers
284
+ sampling_count: 1234, # number of timer callbacks (may differ from trigger_count)
132
285
  sampling_time_ns: 56789,
133
286
  detected_thread_count: 4, # threads seen during profiling
134
287
  start_time_ns: 17740..., # CLOCK_REALTIME epoch nanos
135
288
  duration_ns: 10000000, # profiling duration in nanos
136
- aggregated_samples: [ # when aggregate: true (default)
289
+ aggregated_samples: [ # always present
137
290
  [frames, weight, seq, label_set_id], # frames: [[path, label], ...] deepest-first
138
291
  ... # weight: Integer (nanoseconds, merged per unique stack)
139
292
  ], # seq: Integer (thread sequence, 1-based)
140
293
  # label_set_id: Integer (0 = no labels)
141
294
  label_sets: [{}, {request: "abc"}, ...], # label set table (index = label_set_id)
142
- # --- OR ---
143
- raw_samples: [ # when aggregate: false
144
- [frames, weight, seq, label_set_id], # one entry per timer sample (not merged)
145
- ...
146
- ] }
295
+ # additionally, when aggregate: false:
296
+ raw_samples: [ # one entry per timer sample (not merged)
297
+ [frames, weight, seq, label_set_id, vm_state],
298
+ ... # vm_state: Integer (raw VM state; NOT
299
+ ] } # converted to %GVL/%GC labels — only
300
+ # aggregated_samples gets that conversion)
147
301
  ```
148
302
 
303
+ With `aggregate: false`, BOTH keys are present: `aggregated_samples` is built
304
+ in Ruby from the raw samples (so encoders always work), and `raw_samples`
305
+ preserves the unmerged per-sample data.
306
+
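As a rough picture of how merged samples relate to raw ones, the sketch below folds raw entries into the aggregated layout by summing weights. It is only an illustration: `aggregate_raw` is a hypothetical helper, the merge key (frames, seq, label_set_id) is an assumption, and the conversion of `vm_state` into `%GVL` / `%GC` labels that rperf performs is deliberately left out.

```ruby
# Hypothetical sketch: merge raw samples that share the same stack, thread
# sequence, and label set. vm_state is ignored here, whereas rperf converts
# it into %GVL/%GC labels when building aggregated_samples.
def aggregate_raw(raw_samples)
  merged = Hash.new(0)
  raw_samples.each do |frames, weight, seq, label_set_id, _vm_state|
    merged[[frames, seq, label_set_id]] += weight
  end
  merged.map do |(frames, seq, label_set_id), weight|
    [frames, weight, seq, label_set_id]
  end
end

# Made-up raw samples in the documented five-element layout.
work = [["app.rb", "Object#work"]]
idle = [["app.rb", "Object#idle"]]
raw = [
  [work, 1_000_000, 1, 0, 0],
  [work, 1_500_000, 1, 0, 0],
  [idle, 2_000_000, 1, 0, 0],
  [work, 1_000_000, 2, 0, 0], # same stack, different thread: kept separate
]

aggregated = aggregate_raw(raw)
```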
149
307
  ### Rperf.snapshot(clear: false)
150
308
 
151
309
  Returns a snapshot of the current profiling data without stopping.
@@ -153,7 +311,9 @@ Only works in aggregate mode (the default). Returns nil if not profiling.
153
311
 
154
312
  When `clear: true` is given, resets aggregated data after taking the snapshot.
155
313
  This enables interval-based profiling where each snapshot covers only the
156
- period since the last clear.
314
+ period since the last clear. Note: the frame table is intentionally retained
315
+ (frame IDs must stay stable for GC safety and thread data consistency), so
316
+ `unique_frames` may accumulate across intervals.
157
317
 
158
318
  ```ruby
159
319
  Rperf.start(frequency: 1000)
@@ -250,6 +410,22 @@ running). Raises `RuntimeError` if not started, `ArgumentError` without block.
250
410
 
251
411
  Returns the current thread's labels as a Hash. Empty hash if none set.
252
412
 
413
+ ### Rperf.running?
414
+
415
+ Returns true while a profiling session is active (between start and stop).
416
+
417
+ ### Rperf.load(path)
418
+
419
+ Loads a `.json.gz` or `.json` profile file (saved by `rperf record` or `Rperf.save`)
420
+ and returns the parsed data hash (same format as `Rperf.stop` / `Rperf.snapshot`).
421
+ Gzip is auto-detected by magic bytes, so both compressed and plain files work.
422
+ Warns to stderr if the file was saved by a different rperf version.
423
+
424
+ ```ruby
425
+ data = Rperf.load("rperf.json.gz") # gzip compressed
426
+ data = Rperf.load("profile.json") # plain text JSON
427
+ ```
428
+
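Detecting gzip by magic bytes, as described above, is straightforward with Ruby's standard library. The following sketch shows the general technique for tools that want to read profile files without depending on rperf. `read_profile_json` is a hypothetical helper, not rperf's own `Rperf.load`, and it skips the version check that `Rperf.load` performs.

```ruby
require "json"
require "zlib"

# Gzip streams start with the two bytes 0x1f 0x8b (RFC 1952).
GZIP_MAGIC = "\x1f\x8b".b

# Hypothetical helper: read a .json or .json.gz file regardless of extension.
def read_profile_json(path)
  bytes = File.binread(path)
  bytes = Zlib.gunzip(bytes) if bytes.byteslice(0, 2) == GZIP_MAGIC
  # Both binread and gunzip return binary strings; JSON text is UTF-8.
  JSON.parse(bytes.force_encoding(Encoding::UTF_8), symbolize_names: true)
end
```

Checking the content rather than the file name means a gzip file saved with a `.json` extension, or the reverse, still loads.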
253
429
  ### Rperf.save(path, data, format: nil)
254
430
 
255
431
  Writes data to path. format: :json, :pprof, :collapsed, or :text.
@@ -269,6 +445,8 @@ use Rperf::RackMiddleware
269
445
 
270
446
  The middleware uses `Rperf.profile` to activate timer and set labels.
271
447
  Start profiling separately. Option: `label_key:` (default: `:endpoint`).
448
+ When the profiler is not running, the middleware is a no-op (passes the
449
+ request straight through).
272
450
 
273
451
  ### Rperf::ActiveJobMiddleware
274
452
 
@@ -312,7 +490,19 @@ use Rperf::Viewer, max_snapshots: 12 # keep fewer snapshots (default: 24)
312
490
  ```
313
491
 
314
492
  Take snapshots via `Rperf::Viewer.instance.take_snapshot!` or
315
- `Rperf::Viewer.instance.add_snapshot(data)`.
493
+ `Rperf::Viewer.instance.add_snapshot(data)`. Snapshots carry the same
494
+ meta/summary as saved profiles, so when more than one snapshot exists the
495
+ UI shows the time-travel sidebar (list, diff, pin/sparkline, j/k) — see
496
+ "Time-travel mode" under the report subcommand.
497
+ `add_snapshot_dir(dir)` loads a directory of saved profiles (lazy-loaded;
498
+ `max_snapshots` does not apply to directory entries).
499
+
500
+ The UI fetches data from `<path>/snapshots` (list) and
501
+ `<path>/snapshots/<id>` (body). The URLs are replaceable at runtime:
502
+ define `window.RPERF_DATA_SOURCE` (with `listUrl()` / `snapshotUrl(id)`,
503
+ and optionally an async `onAuthError(url)` hook that returns a fresh URL
504
+ when a fetch hits HTTP 403, e.g. an expired signed URL) before the viewer
505
+ script runs to read snapshots from another source.
316
506
 
317
507
  #### Typical setup with RackMiddleware and periodic snapshots
318
508
 
@@ -358,7 +548,8 @@ end
358
548
 
359
549
  #### UI tabs
360
550
 
361
- - **Flamegraph** — Interactive flamegraph (d3-flame-graph). Click to zoom.
551
+ - **Flamegraph** — Interactive flamegraph (d3-flame-graph). Click to zoom;
552
+ Shift+click to pin a method (sparkline across snapshots in the sidebar).
362
553
  - **Top** — Flat/cumulative weight table. Click column headers to sort.
363
554
  - **Tags** — Label key/value breakdown with weight bars. Click a row to
364
555
  set tagfocus and switch to Flamegraph.
@@ -389,15 +580,80 @@ Tag keys are sorted alphabetically (`%`-prefixed VM state keys appear first).
389
580
 
390
581
  ### json (default) — rperf native format
391
582
 
392
- Gzip-compressed JSON representation of the internal data hash
583
+ JSON representation of the internal data hash
393
584
  (the same hash returned by `Rperf.stop` / `Rperf.snapshot` — see
394
585
  "Return value" above for the full structure).
395
586
  Preserves all data including labels, VM state, thread info, and statistics.
396
587
  Readable by non-Ruby tools (Python, jq, etc.).
397
- Extension convention: `.json.gz`
588
+ Extension convention: `.json.gz` (gzip-compressed, default) or `.json` (plain text).
398
589
  View with: `rperf report` (opens rperf viewer in browser, no Go required).
399
590
  Load programmatically: `data = Rperf.load("rperf.json.gz")`
400
591
 
592
+ #### Profile metadata (meta / summary)
593
+
594
+ JSON profiles embed two extra top-level keys, written FIRST in the file so
595
+ tools can list profiles by decompressing only the head (`Rperf.read_meta`):
596
+
597
+ ```json
598
+ {
599
+ "meta": {
600
+ "format_version": 1,
601
+ "created_at": "2026-06-12T10:00:00Z",
602
+ "ruby_version": "3.5.0",
603
+ "rperf_version": "0.10.0",
604
+ "mode": "cpu",
605
+ "hostname": "...",
606
+ "git": {
607
+ "sha": "88e1a40...",
608
+ "branch": "main",
609
+ "subject": "Add nested includes support",
610
+ "committed_at": "2026-06-09T...",
611
+ "dirty": false
612
+ },
613
+ "labels": { "ci": "github-actions", "pr": "123" }
614
+ },
615
+ "summary": {
616
+ "total_ms": 2001.8,
617
+ "cpu_ms": 2023.3,
618
+ "gc_count_minor": 2,
619
+ "gc_count_major": 2,
620
+ "gc_ms": 3.0,
621
+ "allocated_objects": 48741,
622
+ "freed_objects": 27034,
623
+ "maxrss_mb": 16,
624
+ "samples": 1999,
625
+ "top_methods": [
626
+ { "name": "Object#fibonacci", "self_pct": 99.9, "total_pct": 99.9 }
627
+ ]
628
+ }
629
+ }
630
+ ```
631
+
632
+ Notes:
633
+
634
+ - `git` is omitted outside a git repository (never an error). In GitHub
635
+ Actions, `GITHUB_SHA` / `GITHUB_REF_NAME` take priority over git commands.
636
+ The CLI collects git info before launching the profiled command, so a
637
+ `chdir` in the app cannot point git at the wrong repository.
638
+ - `git.dirty` is true when the working tree has uncommitted changes.
639
+ - GC counts and allocation counts are deltas over the profiled period
640
+ (`GC.stat` baselines are captured at `Rperf.start`). `maxrss_mb` is the
641
+ process-lifetime peak (no period delta is possible; on macOS the value
642
+ comes from `ps -o rss=` and is the current RSS, not the peak).
643
+ - `summary.top_methods` lists up to 50 methods by self time.
644
+ - In multi-process profiling, GC/memory stats come from the root process
645
+ only (same policy as `rperf stat`); time and sample stats are aggregated.
646
+ - Files saved by older rperf versions (no `meta`) remain loadable; viewers
647
+ treat them as unknown snapshots.
648
+ - pprof / collapsed / text exports do not contain meta.
649
+
650
+ ### Rperf.read_meta(path)
651
+
652
+ Reads only `meta` / `summary` from a `.json(.gz)` profile without parsing
653
+ the sample body (fast even for large files). Returns
654
+ `{ meta: Hash|nil, summary: Hash|nil }`, or nil for pre-meta files and
655
+ unreadable files.
656
+
401
657
  ### pprof
402
658
 
403
659
  Gzip-compressed protobuf. Standard pprof format.
@@ -459,7 +715,8 @@ Example output:
459
715
 
460
716
  Format is auto-detected from the output file extension:
461
717
 
462
- .json.gz → json (rperf native, default)
718
+ .json.gz → json (rperf native, gzip compressed, default)
719
+ .json → json (plain text, readable by jq)
463
720
  .pb.gz → pprof
464
721
  .collapsed → collapsed
465
722
  .txt → text
@@ -485,8 +742,8 @@ In both modes, GC state labels are recorded:
485
742
  - **%GC=mark** — Time spent in GC marking phase (wall time).
486
743
  - **%GC=sweep** — Time spent in GC sweeping phase (wall time).
487
744
 
488
- These labels appear in `label_sets` (e.g., `{"%GVL" => "blocked"}`,
489
- `{"%GC" => "mark"}`) and are written into pprof sample labels.
745
+ These labels appear in `label_sets` (e.g., `{:"%GVL" => "blocked"}`,
746
+ `{:"%GC" => "mark"}`) and are written into pprof sample labels.
490
747
 
491
748
  To add VM state as frames in flamegraphs, use pprof tag options:
492
749
 
@@ -566,7 +823,7 @@ Or convert to text with pprof CLI:
566
823
 
567
824
  go tool pprof -text profile.pb.gz
568
825
  go tool pprof -top profile.pb.gz
569
- go tool pprof -flame profile.pb.gz
826
+ go tool pprof -http=:8080 profile.pb.gz # web UI (includes flame graph view)
570
827
 
571
828
  ## ENVIRONMENT VARIABLES
572
829
 
@@ -581,6 +838,15 @@ Used internally by the CLI to pass options to the auto-started profiler:
581
838
  RPERF_SIGNAL=N|false Timer signal number or 'false' for nanosleep (Linux only)
582
839
  RPERF_STAT=1 Enable stat mode (used by rperf stat)
583
840
  RPERF_STAT_REPORT=1 Include profile tables in stat output
841
+ RPERF_AGGREGATE=0 Disable C-level sample aggregation (raw mode)
842
+ RPERF_DEFER=1 Start with timer paused; use Rperf.profile to activate
843
+ RPERF_TMPDIR=path Base directory for session directories (overrides default tmpdir)
844
+
845
+ Internal variables (set automatically by the CLI — not for manual use):
846
+
847
+ RPERF_SESSION_DIR=path Session directory for multi-process profiling
848
+ RPERF_ROOT_PROCESS=pid Marks the root aggregating process
849
+ RPERF_STAT_COMMAND=str Command string displayed in stat output
584
850
 
585
851
  ## TIPS
586
852