rperf 0.9.0 → 0.10.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7182c353301aa38afde2d65928219c46cee3e777b49842bf60331dd40e7b3ab2
- data.tar.gz: ebb30b807d9b86a7ff48090bc5990d6bdcc0c9951f80b21758df2413bdbd39f2
+ metadata.gz: 0dd9ddc0ea22d51ff2ebcdae6ddfa2f5e531c4a7814552b16470668d04d1d617
+ data.tar.gz: fce76e35ea6ccbca8e8b429da4b55574349f98b8731a2db3826efd908082dda6
  SHA512:
- metadata.gz: d70dde1af1a4c3c9cec02e20981038c465facea4d3da32c32ce26e16cfc402de01f594c0b41386e3095f53603631f930fb2bd133d23b8a5757662d40f117d2a5
- data.tar.gz: b7868f0237f84a47bda6286c99b7056f47738fb235ede0d21ac8f950feb9b891b1677dff38f25294393766cbe4f742b2eeffe73a43dded92562926db8d5bd392
+ metadata.gz: dceedb899f6b9249ae342e5701ef3280253c82d317ad73dea0bdd0c0c040a576740bb747c05c79b1f6118c82b05ad4756207ac046d9cc4805982a2f24ad09e1b
+ data.tar.gz: 4353afbe9ca80126c25884e837aacdd902a1e9d20756d543d235415492e17932490711b348ee49eb9e5144f977027a1284876cbd4dea4de31c342a16db16d931
data/README.md CHANGED
@@ -75,9 +75,16 @@ rperf report --top profile.json.gz # print top functions to terminal
 
  # Compare two profiles (requires Go)
  rperf diff before.json.gz after.json.gz # open diff in browser
+
+ # Track performance across commits (time-travel viewer)
+ rperf record --snapshot-dir ./profiles ruby app.rb # → profiles/rperf-<sha7>-<ts>.json.gz
+ rperf report ./profiles/ # sidebar: per-commit list, diff, sparkline
+
+ # Flat tables for AI analysis (no Go required)
+ rperf diff --format table base.json.gz head.json.gz | claude -p "analyze the regression"
  ```
 
- On `rperf report`, you can see the profile result like this page: [rprof viewer](https://ko1.github.io/rperf/examples/cpu_intensive_profile.html)
+ On `rperf report`, you can see the profile result like this page: [rperf viewer](https://ko1.github.io/rperf/examples/cpu_intensive_profile.html)
 
  ### Ruby API
 
@@ -119,10 +126,12 @@ Thread.new { loop { sleep 3600; Rperf::Viewer.instance&.take_snapshot! } }
  Profile without code changes (e.g., Rails):
 
  ```bash
- RPERF_ENABLED=1 RPERF_MODE=wall ruby app.rb # → rperf.json.gz
- rperf report # open in viewer
+ RPERF_ENABLED=1 RPERF_MODE=wall RUBYOPT=-rrperf ruby app.rb # → rperf.json.gz
+ rperf report # open in viewer
  ```
 
+ `RPERF_ENABLED` takes effect when rperf is loaded — if rperf is already in your Gemfile (e.g., Rails), `RUBYOPT=-rrperf` is unnecessary.
+
  Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/docs/manual/).
 
  ## Subcommands
@@ -134,8 +143,8 @@ Inspired by Linux `perf` — familiar subcommand interface for profiling workflo
  | `rperf record` | Profile a command and save to file (default: `.json.gz`) |
  | `rperf stat` | Profile a command and print summary to stderr |
  | `rperf exec` | Profile a command and print full report to stderr |
- | `rperf report` | Open viewer for `.json.gz`; wraps `go tool pprof` for `.pb.gz` (requires Go) |
- | `rperf diff` | Compare two profiles (requires Go) |
+ | `rperf report` | Open viewer for `.json.gz` (or a directory for time-travel mode); wraps `go tool pprof` for `.pb.gz` (requires Go) |
+ | `rperf diff` | Compare two profiles (requires Go, except `--format table`) |
  | `rperf help` | Show full reference documentation |
 
  ## How It Works
@@ -188,7 +197,7 @@ rperf hooks GVL and GC events to attribute non-CPU time. These are recorded as l
  - **Accurate despite safepoints** — Safepoint sampling is *safer* (no async-signal-safety issues), but normally *inaccurate*. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
  - **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via sample labels.
  - **Built-in viewer** — Flamegraph, Top, Tags tabs with interactive tag filtering. No external tools needed to analyze profiles.
- - **Low overhead** — Signal-based timer on Linux (no extra thread). ~1–5 us per sample.
+ - **Low overhead** — Signal-based timer on Linux (signals delivered to a dedicated worker thread — Ruby threads are never interrupted). ~1–5 us per sample.
  - **Zero code changes** — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
  - **`perf`-like CLI** — `record`, `stat`, `report`, `diff` — if you know Linux perf, you already know rperf.
  - **Multi-process** — automatically profiles forked/spawned Ruby child processes (e.g., Unicorn/Puma workers). Use `--no-inherit` to disable.
data/docs/help.md CHANGED
@@ -26,10 +26,17 @@ POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
  (same as --format=text --output=/dev/stdout)
  --signal VALUE Timer signal (Linux only): signal number, or 'false'
  for nanosleep thread (default: auto)
+ --snapshot-dir DIR Save as rperf-<sha7>-<timestamp>.json.gz in DIR
+ (rperf-nogit-<timestamp>-<pid>.json.gz outside git)
+ --label KEY=VALUE Add a label to profile metadata (repeatable)
  --no-inherit Do not profile forked/spawned child processes
  --no-aggregate Disable C-level sample aggregation (raw per-sample data)
  -v, --verbose Print sampling statistics to stderr
 
+ JSON output embeds `meta` (git commit, host, Ruby/rperf versions, labels)
+ and `summary` (time, GC, allocation, top methods) — see "Profile metadata"
+ under OUTPUT FORMATS.
+
  ### stat: Run command and print performance summary to stderr.
 
  Uses wall mode by default. No file output by default.
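The `--snapshot-dir` option added to `record` in the hunk above uses two file-name patterns. Inside a git checkout the name is `rperf-<sha7>-<timestamp>.json.gz`, and outside one it is `rperf-nogit-<timestamp>-<pid>.json.gz`. The Ruby sketch below shows one way a script might group such a directory by commit. It assumes only those two patterns. The help text does not specify the timestamp format, so everything after the SHA (or `nogit`) is kept as an opaque `stamp` string. `parse_snapshot_name` and `snapshots_by_commit` are hypothetical helpers, not part of rperf.

```ruby
# Hypothetical helpers (not part of rperf) for --snapshot-dir file names.
# Documented patterns:
#   rperf-<sha7>-<timestamp>.json.gz        (inside git)
#   rperf-nogit-<timestamp>-<pid>.json.gz   (outside git)
# The timestamp/pid suffix is kept opaque because its format is not specified.
SNAPSHOT_NAME = /\Arperf-(?:(?<sha>[0-9a-f]{7})|nogit)-(?<stamp>.+)\.json(?:\.gz)?\z/

def parse_snapshot_name(filename)
  m = SNAPSHOT_NAME.match(File.basename(filename))
  return nil unless m

  # sha is nil for "nogit" snapshots taken outside a git repository
  { sha: m[:sha], stamp: m[:stamp], git: !m[:sha].nil? }
end

# Group snapshot paths in `dir` by commit SHA ("nogit" files land under nil).
# Paths are sorted by name; that order is chronological only if the
# timestamp format sorts lexically (an assumption).
def snapshots_by_commit(dir)
  Dir.glob(File.join(dir, "rperf-*.json*"))
     .filter_map { |path| (info = parse_snapshot_name(path)) && [info[:sha], path] }
     .group_by(&:first)
     .transform_values { |pairs| pairs.map(&:last).sort }
end
```

For anything beyond the file name, such as branch, subject or the dirty flag, reading the embedded `meta` is more reliable than parsing names.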
@@ -37,6 +44,7 @@ Uses wall mode by default. No file output by default.
  -o, --output PATH Also save profile to file (default: none)
  -f, --frequency HZ Sampling frequency in Hz (default: 1000)
  -m, --mode MODE cpu or wall (default: wall)
+ --label KEY=VALUE Add a label to profile metadata (repeatable)
  --report Include flat/cumulative profile tables in output
  --signal VALUE Timer signal (Linux only): signal number, or 'false'
  for nanosleep thread (default: auto)
@@ -61,6 +69,7 @@ Like `stat --report`. Uses wall mode by default. No file output by default.
  -o, --output PATH Also save profile to file (default: none)
  -f, --frequency HZ Sampling frequency in Hz (default: 1000)
  -m, --mode MODE cpu or wall (default: wall)
+ --label KEY=VALUE Add a label to profile metadata (repeatable)
  --signal VALUE Timer signal (Linux only): signal number, or 'false'
  for nanosleep thread (default: auto)
  --no-inherit Do not profile forked/spawned child processes
@@ -74,11 +83,44 @@ and flat/cumulative top-50 function tables.
 
  --top Print top functions by flat time
  --text Print text report
+ --format FORMAT Flat table for AI/machine consumption:
+ 'table' (TSV) or 'table-json' (JSON array)
  --html Output static HTML viewer to stdout
+ --port PORT Port for the web UI (default: auto).
+ Useful for SSH port forwarding
+ --host HOST Bind address for the web UI (default: localhost).
+ 0.0.0.0 allows external access — the viewer has
+ NO authentication; prefer SSH port forwarding
+
+ The browser is auto-opened only when a GUI is available (DISPLAY /
+ WAYLAND_DISPLAY on Linux) and the bind address is local; otherwise the URL
+ is printed for manual opening (no terminal-browser fallback).
 
  Default (no flag): opens interactive web UI in browser.
  Default file: rperf.json.gz
 
+ #### Time-travel mode (directory input)
+
+ rperf report ./profiles/
+
+ Passing a directory lists all `*.json(.gz)` profiles in a sidebar — one row
+ per snapshot with commit SHA (a `*` marks a dirty working tree), commit
+ subject, date, and alloc/GC badges versus the previous snapshot (⚠️ when
+ allocation changed more than ±15%). Rows are grouped by git branch
+ (main/master expanded by default). Only meta/summary heads are read for the
+ listing; profile bodies are lazy-loaded on selection, so directories with
+ 100+ snapshots open instantly. Works well with `rperf record --snapshot-dir`.
+
+ - Click a row to view that snapshot; j / k move to the newer / older one.
+ - ⇄ on a row diffs the current snapshot against it: the flamegraph is
+ recolored by share change (red = increased, blue = decreased, neutral
+ below ±0.4pt; direction is base (older) → current).
+ - Shift+click a frame to pin that method: a sparkline of its share across
+ all snapshots appears at the top of the sidebar (points are filled in as
+ bodies load; click a point to jump). The pinned frame is highlighted and
+ others are dimmed.
+ - Files without meta (saved by older rperf) appear as unknown snapshots.
+
  `--html` generates an HTML file with profile data embedded inline.
  No server is needed — open it directly in a browser. d3 and
  d3-flamegraph are loaded from CDN, so an internet connection is
@@ -87,15 +129,48 @@ sites (e.g., GitHub Pages).
 
  rperf report --html profile.json.gz > report.html
 
- ### diff: Compare two profiles (target - base). Requires Go.
+ ### diff: Compare two profiles (target - base). Requires Go (except --format table).
 
  Accepts `.json.gz` (auto-converted to pprof) or `.pb.gz` files.
 
  --top Print top functions by diff
  --text Print text diff report
+ --format FORMAT Flat diff table for AI/machine consumption:
+ 'table' (TSV) or 'table-json' (JSON array).
+ Computed in Ruby — no Go required.
+ .json.gz / .json files only.
+ --port PORT Port for the web UI (default: auto)
+ --host HOST Bind address for the web UI (default: localhost)
 
  Default (no flag): opens diff in browser.
 
+ ### Table output for AI analysis (--format table / table-json)
+
+ Aggregation, diffing, and cutoff happen on the rperf side; the output is a
+ flat table that an LLM can analyze directly — no tree walking required.
+
+ `rperf report --format table FILE` columns (self_pct descending, top 50
+ plus an `(other)` aggregate row):
+
+ method self_pct total_pct self_ms
+
+ `rperf diff --format table BASE HEAD` columns (|delta_pt| descending,
+ top 50; delta_pt = self_pct_head - self_pct_base in percentage points):
+
+ method self_pct_base self_pct_head delta_pt
+
+ Per-method allocation data does not exist in sampling profiles, so
+ allocation counts appear only in the summary (whole-profile delta).
+
+ The last TSV line is `# summary` with tab-separated key=value pairs
+ (total_ms / allocated_objects / GC counts; base/head/delta for diff).
+ With `table-json`, the output is a JSON array of row objects whose last
+ element is `{"summary": {...}}`.
+
+ Feed the result to an LLM:
+
+ rperf diff --format table base.json.gz head.json.gz | claude -p "analyze the cause of the regression"
+
  ### Multi-process profiling
 
  By default, rperf profiles forked and spawned Ruby child processes.
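The table formats above are meant to be read by tools as well as by LLMs. The Ruby sketch below is not rperf's code. It illustrates the documented rules under stated assumptions:

- `parse_table` splits the TSV into row hashes and the trailing `# summary` pairs. It assumes a header row naming the columns may or may not be present.
- `diff_rows` recomputes `delta_pt` as head minus base `self_pct`, sorts by |delta_pt| and cuts at 50. Counting a method missing from one profile as 0% is an assumption.
- `share_color` and `alloc_warning?` restate the time-travel viewer's thresholds: neutral inside ±0.4pt, and ⚠️ when allocations change by more than ±15%. How values exactly on a boundary are treated is a guess.

```ruby
# Sketch only: not rperf's implementation of --format table or the viewer.

REPORT_COLUMNS = %w[method self_pct total_pct self_ms].freeze

# Convert "12.5" to 12.5; leave non-numeric strings (and nil) unchanged.
def numeric(value)
  Float(value)
rescue ArgumentError, TypeError
  value
end

# Parse `--format table` TSV into [rows, summary]. If the first line starts
# with "method\t" it is used as the header; otherwise `columns` is assumed.
def parse_table(tsv, columns: REPORT_COLUMNS)
  lines = tsv.lines.map(&:chomp).reject(&:empty?)
  summary = {}
  if lines.last&.start_with?("# summary")
    lines.pop.split("\t").drop(1).each do |pair|
      key, value = pair.split("=", 2)
      summary[key] = numeric(value)
    end
  end
  columns = lines.shift.split("\t") if lines.first&.start_with?("method\t")
  rows = lines.map do |line|
    columns.zip(line.split("\t")).to_h { |k, v| [k, k == "method" ? v : numeric(v)] }
  end
  [rows, summary]
end

# delta_pt = self_pct_head - self_pct_base (percentage points), sorted by
# |delta_pt| descending and cut at `limit`. A method absent from one side
# counts as 0% there (assumption).
def diff_rows(base_pct, head_pct, limit: 50)
  (base_pct.keys | head_pct.keys).map { |name|
    b = base_pct.fetch(name, 0.0)
    h = head_pct.fetch(name, 0.0)
    { "method" => name, "self_pct_base" => b, "self_pct_head" => h, "delta_pt" => (h - b).round(2) }
  }.sort_by { |row| -row["delta_pt"].abs }.first(limit)
end

# Time-travel flamegraph coloring: red = share increased, blue = decreased,
# neutral when |delta| < 0.4pt (treating exactly 0.4 as colored is a guess).
def share_color(delta_pt, neutral_band: 0.4)
  return :neutral if delta_pt.abs < neutral_band

  delta_pt.positive? ? :red : :blue
end

# Sidebar badge: warn when allocations moved more than 15% versus the
# previous snapshot. No baseline (nil or zero) means no warning here.
def alloc_warning?(prev_allocated, cur_allocated, threshold_pct: 15.0)
  return false if prev_allocated.nil? || prev_allocated.zero?

  (cur_allocated - prev_allocated).abs * 100.0 / prev_allocated > threshold_pct
end
```

For `table-json`, `JSON.parse` already yields the row objects, with the summary as the final `{"summary": ...}` element.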
@@ -146,6 +221,8 @@ Limitations:
  rperf record -o profile.collapsed ruby app.rb
  rperf record -o profile.txt ruby app.rb
  rperf record -p ruby app.rb
+ rperf record --snapshot-dir ./profiles ruby app.rb
+ rperf record --label ci=github-actions --label pr=123 ruby app.rb
  rperf stat ruby app.rb
  rperf stat --report ruby app.rb
  rperf stat -o profile.pb.gz ruby app.rb
@@ -155,8 +232,11 @@ Limitations:
  rperf exec -m cpu ruby app.rb
  rperf report
  rperf report --top profile.pb.gz
+ rperf report --format table profile.json.gz
+ rperf report ./profiles/
  rperf diff before.pb.gz after.pb.gz
  rperf diff --top before.pb.gz after.pb.gz
+ rperf diff --format table before.json.gz after.json.gz
 
  ## RUBY API
 
@@ -206,19 +286,24 @@ nil if profiler was not running; otherwise a Hash:
  detected_thread_count: 4, # threads seen during profiling
  start_time_ns: 17740..., # CLOCK_REALTIME epoch nanos
  duration_ns: 10000000, # profiling duration in nanos
- aggregated_samples: [ # when aggregate: true (default)
+ aggregated_samples: [ # always present
  [frames, weight, seq, label_set_id], # frames: [[path, label], ...] deepest-first
  ... # weight: Integer (nanoseconds, merged per unique stack)
  ], # seq: Integer (thread sequence, 1-based)
  # label_set_id: Integer (0 = no labels)
  label_sets: [{}, {request: "abc"}, ...], # label set table (index = label_set_id)
- # --- OR ---
- raw_samples: [ # when aggregate: false
- [frames, weight, seq, label_set_id], # one entry per timer sample (not merged)
- ...
- ] }
+ # additionally, when aggregate: false:
+ raw_samples: [ # one entry per timer sample (not merged)
+ [frames, weight, seq, label_set_id, vm_state],
+ ... # vm_state: Integer (raw VM state; NOT
+ ] } # converted to %GVL/%GC labels — only
+ # aggregated_samples gets that conversion)
  ```
 
+ With `aggregate: false`, BOTH keys are present: `aggregated_samples` is built
+ in Ruby from the raw samples (so encoders always work), and `raw_samples`
+ preserves the unmerged per-sample data.
+
  ### Rperf.snapshot(clear: false)
 
  Returns a snapshot of the current profiling data without stopping.
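With `aggregate: false`, `aggregated_samples` is now built in Ruby from `raw_samples`. The sketch below shows that merge in its simplest form. It assumes that a "unique stack" means the same frames, thread `seq` and `label_set_id`, which are the fields each aggregated entry keeps. It drops `vm_state`, so unlike rperf it does not turn VM state into `%GVL`/`%GC` labels. Two raw samples that differ only in `vm_state` are therefore merged here, but might not be in rperf. `aggregate_raw_samples` and `self_weight_by_label` are hypothetical names. The second helper relies on frames being deepest-first, so `frames.first` is the leaf frame.

```ruby
# Sketch (not rperf's code): merge raw per-sample entries
#   [frames, weight, seq, label_set_id, vm_state]
# into aggregated entries [frames, weight, seq, label_set_id].
def aggregate_raw_samples(raw_samples)
  totals = Hash.new(0)
  raw_samples.each do |frames, weight, seq, label_set_id, _vm_state|
    # vm_state is ignored; rperf itself derives %GVL/%GC labels from it.
    totals[[frames, seq, label_set_id]] += weight
  end
  totals.map { |(frames, seq, label_set_id), weight| [frames, weight, seq, label_set_id] }
end

# frames are deepest-first ([[path, label], ...]), so frames.first is the
# leaf; summing weight per leaf label gives self time in nanoseconds.
def self_weight_by_label(aggregated_samples)
  aggregated_samples.each_with_object(Hash.new(0)) do |(frames, weight, _seq, _label_set_id), acc|
    acc[frames.first&.last] += weight
  end
end
```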
@@ -325,6 +410,10 @@ running). Raises `RuntimeError` if not started, `ArgumentError` without block.
 
  Returns the current thread's labels as a Hash. Empty hash if none set.
 
+ ### Rperf.running?
+
+ Returns true while a profiling session is active (between start and stop).
+
  ### Rperf.load(path)
 
  Loads a `.json.gz` or `.json` profile file (saved by `rperf record` or `Rperf.save`)
@@ -356,6 +445,8 @@ use Rperf::RackMiddleware
 
  The middleware uses `Rperf.profile` to activate timer and set labels.
  Start profiling separately. Option: `label_key:` (default: `:endpoint`).
+ When the profiler is not running, the middleware is a no-op (passes the
+ request straight through).
 
  ### Rperf::ActiveJobMiddleware
 
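The `Rperf::RackMiddleware` change above makes it a pass-through when the profiler is not running. A minimal sketch of that guard pattern follows. The `profiler` object is duck-typed and hypothetical. It only needs `running?`, mirroring the new `Rperf.running?`, and a block-taking `profile(**labels)`. Whether `Rperf.profile` accepts labels exactly this way is an assumption, as is the `"METHOD /path"` label format. `ProfilingLabelMiddleware` is an illustration, not `Rperf::RackMiddleware` itself.

```ruby
# Sketch of the "pass through unless profiling" guard. `profiler` is any
# object with running? and profile(**labels) { ... } (assumed interface).
class ProfilingLabelMiddleware
  def initialize(app, profiler:, label_key: :endpoint)
    @app = app
    @profiler = profiler
    @label_key = label_key
  end

  def call(env)
    # No active session: behave as if the middleware were not installed.
    return @app.call(env) unless @profiler.running?

    # Hypothetical label format, e.g. "GET /users".
    endpoint = "#{env["REQUEST_METHOD"]} #{env["PATH_INFO"]}"
    @profiler.profile(@label_key => endpoint) { @app.call(env) }
  end
end
```

In an application, the built-in `Rperf::RackMiddleware` already provides this behavior. The sketch only isolates the guard.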
@@ -399,7 +490,19 @@ use Rperf::Viewer, max_snapshots: 12 # keep fewer snapshots (default: 24)
  ```
 
  Take snapshots via `Rperf::Viewer.instance.take_snapshot!` or
- `Rperf::Viewer.instance.add_snapshot(data)`.
+ `Rperf::Viewer.instance.add_snapshot(data)`. Snapshots carry the same
+ meta/summary as saved profiles, so when more than one snapshot exists the
+ UI shows the time-travel sidebar (list, diff, pin/sparkline, j/k) — see
+ "Time-travel mode" under the report subcommand.
+ `add_snapshot_dir(dir)` loads a directory of saved profiles (lazy-loaded;
+ `max_snapshots` does not apply to directory entries).
+
+ The UI fetches data from `<path>/snapshots` (list) and
+ `<path>/snapshots/<id>` (body). The URLs are replaceable at runtime:
+ define `window.RPERF_DATA_SOURCE` (with `listUrl()` / `snapshotUrl(id)`,
+ and optionally an async `onAuthError(url)` hook that returns a fresh URL
+ when a fetch hits HTTP 403, e.g. an expired signed URL) before the viewer
+ script runs to read snapshots from another source.
 
  #### Typical setup with RackMiddleware and periodic snapshots
 
@@ -445,7 +548,8 @@ end
 
  #### UI tabs
 
- - **Flamegraph** — Interactive flamegraph (d3-flame-graph). Click to zoom.
+ - **Flamegraph** — Interactive flamegraph (d3-flame-graph). Click to zoom;
+ Shift+click to pin a method (sparkline across snapshots in the sidebar).
  - **Top** — Flat/cumulative weight table. Click column headers to sort.
  - **Tags** — Label key/value breakdown with weight bars. Click a row to
  set tagfocus and switch to Flamegraph.
@@ -485,6 +589,71 @@ Extension convention: `.json.gz` (gzip-compressed, default) or `.json` (plain te
  View with: `rperf report` (opens rperf viewer in browser, no Go required).
  Load programmatically: `data = Rperf.load("rperf.json.gz")`
 
+ #### Profile metadata (meta / summary)
+
+ JSON profiles embed two extra top-level keys, written FIRST in the file so
+ tools can list profiles by decompressing only the head (`Rperf.read_meta`):
+
+ ```json
+ {
+   "meta": {
+     "format_version": 1,
+     "created_at": "2026-06-12T10:00:00Z",
+     "ruby_version": "3.5.0",
+     "rperf_version": "0.10.0",
+     "mode": "cpu",
+     "hostname": "...",
+     "git": {
+       "sha": "88e1a40...",
+       "branch": "main",
+       "subject": "Add nested includes support",
+       "committed_at": "2026-06-09T...",
+       "dirty": false
+     },
+     "labels": { "ci": "github-actions", "pr": "123" }
+   },
+   "summary": {
+     "total_ms": 2001.8,
+     "cpu_ms": 2023.3,
+     "gc_count_minor": 2,
+     "gc_count_major": 2,
+     "gc_ms": 3.0,
+     "allocated_objects": 48741,
+     "freed_objects": 27034,
+     "maxrss_mb": 16,
+     "samples": 1999,
+     "top_methods": [
+       { "name": "Object#fibonacci", "self_pct": 99.9, "total_pct": 99.9 }
+     ]
+   }
+ }
+ ```
+
+ Notes:
+
+ - `git` is omitted outside a git repository (never an error). In GitHub
+ Actions, `GITHUB_SHA` / `GITHUB_REF_NAME` take priority over git commands.
+ The CLI collects git info before launching the profiled command, so a
+ `chdir` in the app cannot point git at the wrong repository.
+ - `git.dirty` is true when the working tree has uncommitted changes.
+ - GC counts and allocation counts are deltas over the profiled period
+ (`GC.stat` baselines are captured at `Rperf.start`). `maxrss_mb` is the
+ process-lifetime peak (no period delta is possible; on macOS the value
+ comes from `ps -o rss=` and is the current RSS, not the peak).
+ - `summary.top_methods` lists up to 50 methods by self time.
+ - In multi-process profiling, GC/memory stats come from the root process
+ only (same policy as `rperf stat`); time and sample stats are aggregated.
+ - Files saved by older rperf versions (no `meta`) remain loadable; viewers
+ treat them as unknown snapshots.
+ - pprof / collapsed / text exports do not contain meta.
+
+ ### Rperf.read_meta(path)
+
+ Reads only `meta` / `summary` from a `.json(.gz)` profile without parsing
+ the sample body (fast even for large files). Returns
+ `{ meta: Hash|nil, summary: Hash|nil }`, or nil for pre-meta files and
+ unreadable files.
+
  ### pprof
 
  Gzip-compressed protobuf. Standard pprof format.
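`Rperf.read_meta`, added in the hunk above, is the documented way to read `meta` and `summary`. For scripts that cannot load rperf, the fallback sketch below uses only Ruby's standard library. It decompresses and parses the whole file, so it lacks `read_meta`'s head-only speed, and its inner hashes keep string keys. Like `read_meta`, it returns nil for pre-meta or unreadable files.

`with_gc_delta` illustrates the delta semantics described in the notes: take a `GC.stat` baseline, run the work, then subtract. The mapping from CRuby's `GC.stat` keys to rperf's summary field names is an assumption for illustration. `read_profile_meta` and `with_gc_delta` are hypothetical names.

```ruby
require "json"
require "zlib"

# Fallback sketch (not Rperf.read_meta): parses the WHOLE file, so it is
# slower than rperf's head-only reader, but needs only the standard library.
def read_profile_meta(path)
  text = path.end_with?(".gz") ? Zlib::GzipReader.open(path, &:read) : File.read(path)
  data = JSON.parse(text)
  return nil unless data.is_a?(Hash) && data.key?("meta")

  { meta: data["meta"], summary: data["summary"] }
rescue SystemCallError, Zlib::Error, JSON::ParserError
  nil # like read_meta: unreadable files yield nil
end

# Assumed mapping from summary-style names to CRuby GC.stat keys.
GC_SUMMARY_KEYS = {
  gc_count_minor: :minor_gc_count,
  gc_count_major: :major_gc_count,
  allocated_objects: :total_allocated_objects,
  freed_objects: :total_freed_objects
}.freeze

# Delta semantics: capture a baseline before the work, subtract afterwards.
def with_gc_delta
  before = GC.stat
  result = yield
  after = GC.stat
  [result, GC_SUMMARY_KEYS.transform_values { |key| after[key] - before[key] }]
end
```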
@@ -654,7 +823,7 @@ Or convert to text with pprof CLI:
 
  go tool pprof -text profile.pb.gz
  go tool pprof -top profile.pb.gz
- go tool pprof -flame profile.pb.gz
+ go tool pprof -http=:8080 profile.pb.gz # web UI (includes flame graph view)
 
  ## ENVIRONMENT VARIABLES