rperf 0.8.0 → 0.10.0
- checksums.yaml +4 -4
- data/LICENSE +21 -0
- data/README.md +26 -15
- data/docs/help.md +284 -18
- data/exe/rperf +278 -55
- data/ext/rperf/rperf.c +220 -81
- data/lib/rperf/active_job.rb +1 -0
- data/lib/rperf/meta.rb +343 -0
- data/lib/rperf/rack.rb +7 -2
- data/lib/rperf/table.rb +156 -0
- data/lib/rperf/version.rb +1 -1
- data/lib/rperf/viewer/viewer.html +1148 -0
- data/lib/rperf/viewer.rb +158 -661
- data/lib/rperf.rb +682 -89
- metadata +8 -4
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 0dd9ddc0ea22d51ff2ebcdae6ddfa2f5e531c4a7814552b16470668d04d1d617
|
|
4
|
+
data.tar.gz: fce76e35ea6ccbca8e8b429da4b55574349f98b8731a2db3826efd908082dda6
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: dceedb899f6b9249ae342e5701ef3280253c82d317ad73dea0bdd0c0c040a576740bb747c05c79b1f6118c82b05ad4756207ac046d9cc4805982a2f24ad09e1b
|
|
7
|
+
data.tar.gz: 4353afbe9ca80126c25884e837aacdd902a1e9d20756d543d235415492e17932490711b348ee49eb9e5144f977027a1284876cbd4dea4de31c342a16db16d931
|
data/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Koichi Sasada
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
data/README.md
CHANGED
|
@@ -69,14 +69,23 @@ rperf stat ruby app.rb
|
|
|
69
69
|
rperf record ruby app.rb # → rperf.json.gz (cpu mode, default)
|
|
70
70
|
rperf record -m wall ruby server.rb # wall mode
|
|
71
71
|
|
|
72
|
-
# View results in browser
|
|
72
|
+
# View results in browser
|
|
73
73
|
rperf report # open rperf.json.gz in viewer
|
|
74
74
|
rperf report --top profile.json.gz # print top functions to terminal
|
|
75
75
|
|
|
76
76
|
# Compare two profiles (requires Go)
|
|
77
77
|
rperf diff before.json.gz after.json.gz # open diff in browser
|
|
78
|
+
|
|
79
|
+
# Track performance across commits (time-travel viewer)
|
|
80
|
+
rperf record --snapshot-dir ./profiles ruby app.rb # → profiles/rperf-<sha7>-<ts>.json.gz
|
|
81
|
+
rperf report ./profiles/ # sidebar: per-commit list, diff, sparkline
|
|
82
|
+
|
|
83
|
+
# Flat tables for AI analysis (no Go required)
|
|
84
|
+
rperf diff --format table base.json.gz head.json.gz | claude -p "analyze the regression"
|
|
78
85
|
```
|
|
79
86
|
|
|
87
|
+
`rperf report` opens the profile in a viewer like this example page: [rperf viewer](https://ko1.github.io/rperf/examples/cpu_intensive_profile.html)
|
|
88
|
+
|
|
80
89
|
### Ruby API
|
|
81
90
|
|
|
82
91
|
```ruby
|
|
@@ -117,10 +126,12 @@ Thread.new { loop { sleep 3600; Rperf::Viewer.instance&.take_snapshot! } }
|
|
|
117
126
|
Profile without code changes (e.g., Rails):
|
|
118
127
|
|
|
119
128
|
```bash
|
|
120
|
-
RPERF_ENABLED=1 RPERF_MODE=wall ruby app.rb # → rperf.json.gz
|
|
121
|
-
rperf report
|
|
129
|
+
RPERF_ENABLED=1 RPERF_MODE=wall RUBYOPT=-rrperf ruby app.rb # → rperf.json.gz
|
|
130
|
+
rperf report # open in viewer
|
|
122
131
|
```
|
|
123
132
|
|
|
133
|
+
`RPERF_ENABLED` takes effect when rperf is loaded — if rperf is already in your Gemfile (e.g., Rails), `RUBYOPT=-rrperf` is unnecessary.
|
|
134
|
+
|
|
124
135
|
Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/docs/manual/).
|
|
125
136
|
|
|
126
137
|
## Subcommands
|
|
@@ -132,8 +143,8 @@ Inspired by Linux `perf` — familiar subcommand interface for profiling workflo
|
|
|
132
143
|
| `rperf record` | Profile a command and save to file (default: `.json.gz`) |
|
|
133
144
|
| `rperf stat` | Profile a command and print summary to stderr |
|
|
134
145
|
| `rperf exec` | Profile a command and print full report to stderr |
|
|
135
|
-
| `rperf report` | Open viewer for `.json.gz
|
|
136
|
-
| `rperf diff` | Compare two profiles (requires Go) |
|
|
146
|
+
| `rperf report` | Open viewer for `.json.gz` (or a directory for time-travel mode); wraps `go tool pprof` for `.pb.gz` (requires Go) |
|
|
147
|
+
| `rperf diff` | Compare two profiles (requires Go, except `--format table`) |
|
|
137
148
|
| `rperf help` | Show full reference documentation |
|
|
138
149
|
|
|
139
150
|
## How It Works
|
|
@@ -156,7 +167,7 @@ Timer (signal or thread) VM thread (postponed job)
|
|
|
156
167
|
record(backtrace, weight)
|
|
157
168
|
```
|
|
158
169
|
|
|
159
|
-
On Linux, the timer uses `timer_create` + signal delivery
|
|
170
|
+
On Linux, the timer uses `timer_create` + signal delivery to a dedicated worker thread.
|
|
160
171
|
On other platforms, a dedicated pthread with `nanosleep` is used.
|
|
161
172
|
|
|
162
173
|
If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
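The weighting rule above can be illustrated with a small simulation. This is only a sketch, not rperf's C implementation: the `weigh_samples` helper and the timestamps are hypothetical. Each sample's weight is the time elapsed since the previous sample, so a sample taken late (after a delayed safepoint) is heavier, and the weights add up to the total elapsed time.

```ruby
# Hypothetical sketch of time-delta weighting (not rperf's actual code).
# sample_times_ns: timestamps (ns) at which samples were actually taken,
# possibly later than the timer tick because a safepoint was delayed.
def weigh_samples(start_ns, sample_times_ns)
  previous = start_ns
  sample_times_ns.map do |now|
    weight = now - previous # time since the previous sample
    previous = now
    weight
  end
end

# 1 ms timer; the third sample was delayed by 2 ms at a safepoint.
times = [1_000_000, 2_000_000, 5_000_000, 6_000_000]
weights = weigh_samples(0, times)
p weights     # the delayed sample carries three ticks' worth of weight
p weights.sum # equals the elapsed time between start and the last sample
```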
|
|
@@ -170,32 +181,32 @@ If a safepoint is delayed, the sample carries proportionally more weight. The to
|
|
|
170
181
|
|
|
171
182
|
Use `cpu` to find what consumes CPU. Use `wall` to find what makes things slow (I/O, GVL contention, GC).
|
|
172
183
|
|
|
173
|
-
### GVL and GC Labels
|
|
184
|
+
### GVL and GC Labels
|
|
174
185
|
|
|
175
186
|
rperf hooks GVL and GC events to attribute non-CPU time. These are recorded as labels on samples rather than synthetic stack frames:
|
|
176
187
|
|
|
177
|
-
| Label | Meaning |
|
|
178
|
-
|
|
179
|
-
| `%GVL
|
|
180
|
-
| `%GVL
|
|
181
|
-
| `%GC
|
|
182
|
-
| `%GC
|
|
188
|
+
| Label (key=value) | Mode | Meaning |
|
|
189
|
+
|-------|------|---------|
|
|
190
|
+
| `%GVL=blocked` | wall only | Off-GVL time (I/O, sleep, C extension releasing GVL) |
|
|
191
|
+
| `%GVL=wait` | wall only | Waiting to reacquire the GVL (contention) |
|
|
192
|
+
| `%GC=mark` | cpu and wall | Time in GC mark phase (wall time) |
|
|
193
|
+
| `%GC=sweep` | cpu and wall | Time in GC sweep phase (wall time) |
|
|
183
194
|
|
|
184
195
|
## Why rperf?
|
|
185
196
|
|
|
186
197
|
- **Accurate despite safepoints** — Safepoint sampling is *safer* (no async-signal-safety issues), but normally *inaccurate*. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
|
|
187
198
|
- **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via sample labels.
|
|
188
199
|
- **Built-in viewer** — Flamegraph, Top, Tags tabs with interactive tag filtering. No external tools needed to analyze profiles.
|
|
189
|
-
- **Low overhead** — Signal-based timer on Linux (
|
|
200
|
+
- **Low overhead** — Signal-based timer on Linux (signals delivered to a dedicated worker thread — Ruby threads are never interrupted). ~1–5 us per sample.
|
|
190
201
|
- **Zero code changes** — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
|
|
191
202
|
- **`perf`-like CLI** — `record`, `stat`, `report`, `diff` — if you know Linux perf, you already know rperf.
|
|
203
|
+
- **Multi-process** — automatically profiles forked/spawned Ruby child processes (e.g., Unicorn/Puma workers). Use `--no-inherit` to disable.
|
|
192
204
|
|
|
193
205
|
### Limitations
|
|
194
206
|
|
|
195
207
|
- **Method-level only** — no line-level granularity.
|
|
196
208
|
- **Ruby >= 3.4.0** — uses recent VM internals (postponed jobs, thread event hooks).
|
|
197
209
|
- **POSIX only** — Linux, macOS. No Windows.
|
|
198
|
-
- **No fork following** — profiling stops in fork(2) child processes (the child can start a new session).
|
|
199
210
|
|
|
200
211
|
|
|
201
212
|
## Output Formats
|
data/docs/help.md
CHANGED
|
@@ -12,7 +12,9 @@ POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
|
|
|
12
12
|
rperf stat [options] command [args...]
|
|
13
13
|
rperf exec [options] command [args...]
|
|
14
14
|
rperf report [options] [file]
|
|
15
|
+
rperf diff [options] base target
|
|
15
16
|
rperf help
|
|
17
|
+
rperf -v / --version
|
|
16
18
|
|
|
17
19
|
### record: Profile and save to file.
|
|
18
20
|
|
|
@@ -24,8 +26,17 @@ POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
|
|
|
24
26
|
(same as --format=text --output=/dev/stdout)
|
|
25
27
|
--signal VALUE Timer signal (Linux only): signal number, or 'false'
|
|
26
28
|
for nanosleep thread (default: auto)
|
|
29
|
+
--snapshot-dir DIR Save as rperf-<sha7>-<timestamp>.json.gz in DIR
|
|
30
|
+
(rperf-nogit-<timestamp>-<pid>.json.gz outside git)
|
|
31
|
+
--label KEY=VALUE Add a label to profile metadata (repeatable)
|
|
32
|
+
--no-inherit Do not profile forked/spawned child processes
|
|
33
|
+
--no-aggregate Disable C-level sample aggregation (raw per-sample data)
|
|
27
34
|
-v, --verbose Print sampling statistics to stderr
|
|
28
35
|
|
|
36
|
+
JSON output embeds `meta` (git commit, host, Ruby/rperf versions, labels)
|
|
37
|
+
and `summary` (time, GC, allocation, top methods) — see "Profile metadata"
|
|
38
|
+
under OUTPUT FORMATS.
|
|
39
|
+
|
|
29
40
|
### stat: Run command and print performance summary to stderr.
|
|
30
41
|
|
|
31
42
|
Uses wall mode by default. No file output by default.
|
|
@@ -33,9 +44,12 @@ Uses wall mode by default. No file output by default.
|
|
|
33
44
|
-o, --output PATH Also save profile to file (default: none)
|
|
34
45
|
-f, --frequency HZ Sampling frequency in Hz (default: 1000)
|
|
35
46
|
-m, --mode MODE cpu or wall (default: wall)
|
|
47
|
+
--label KEY=VALUE Add a label to profile metadata (repeatable)
|
|
36
48
|
--report Include flat/cumulative profile tables in output
|
|
37
49
|
--signal VALUE Timer signal (Linux only): signal number, or 'false'
|
|
38
50
|
for nanosleep thread (default: auto)
|
|
51
|
+
--no-inherit Do not profile forked/spawned child processes
|
|
52
|
+
--no-aggregate Disable C-level sample aggregation (raw per-sample data)
|
|
39
53
|
-v, --verbose Print additional sampling statistics
|
|
40
54
|
|
|
41
55
|
Shows: user/sys/real time, time breakdown (CPU execution, GVL blocked,
|
|
@@ -44,6 +58,10 @@ Lines are prefixed: `[Rperf]` for sampling-derived data, `[Ruby ]` for
|
|
|
44
58
|
runtime info, `[OS ]` for OS-level info.
|
|
45
59
|
Use --report to add flat and cumulative top-50 function tables.
|
|
46
60
|
|
|
61
|
+
When child processes are profiled (default), the stat output shows
|
|
62
|
+
aggregated data from all processes and includes a "Ruby processes profiled"
|
|
63
|
+
count. Use --no-inherit to disable child process tracking.
|
|
64
|
+
|
|
47
65
|
### exec: Run command and print full profile report to stderr.
|
|
48
66
|
|
|
49
67
|
Like `stat --report`. Uses wall mode by default. No file output by default.
|
|
@@ -51,8 +69,11 @@ Like `stat --report`. Uses wall mode by default. No file output by default.
|
|
|
51
69
|
-o, --output PATH Also save profile to file (default: none)
|
|
52
70
|
-f, --frequency HZ Sampling frequency in Hz (default: 1000)
|
|
53
71
|
-m, --mode MODE cpu or wall (default: wall)
|
|
72
|
+
--label KEY=VALUE Add a label to profile metadata (repeatable)
|
|
54
73
|
--signal VALUE Timer signal (Linux only): signal number, or 'false'
|
|
55
74
|
for nanosleep thread (default: auto)
|
|
75
|
+
--no-inherit Do not profile forked/spawned child processes
|
|
76
|
+
--no-aggregate Disable C-level sample aggregation (raw per-sample data)
|
|
56
77
|
-v, --verbose Print additional sampling statistics
|
|
57
78
|
|
|
58
79
|
Shows: user/sys/real time, time breakdown, GC/memory/OS stats, profiler overhead,
|
|
@@ -62,17 +83,136 @@ and flat/cumulative top-50 function tables.
|
|
|
62
83
|
|
|
63
84
|
--top Print top functions by flat time
|
|
64
85
|
--text Print text report
|
|
86
|
+
--format FORMAT Flat table for AI/machine consumption:
|
|
87
|
+
'table' (TSV) or 'table-json' (JSON array)
|
|
88
|
+
--html Output static HTML viewer to stdout
|
|
89
|
+
--port PORT Port for the web UI (default: auto).
|
|
90
|
+
Useful for SSH port forwarding
|
|
91
|
+
--host HOST Bind address for the web UI (default: localhost).
|
|
92
|
+
0.0.0.0 allows external access — the viewer has
|
|
93
|
+
NO authentication; prefer SSH port forwarding
|
|
94
|
+
|
|
95
|
+
The browser is auto-opened only when a GUI is available (DISPLAY /
|
|
96
|
+
WAYLAND_DISPLAY on Linux) and the bind address is local; otherwise the URL
|
|
97
|
+
is printed for manual opening (no terminal-browser fallback).
|
|
65
98
|
|
|
66
99
|
Default (no flag): opens interactive web UI in browser.
|
|
67
100
|
Default file: rperf.json.gz
|
|
68
101
|
|
|
69
|
-
|
|
102
|
+
#### Time-travel mode (directory input)
|
|
103
|
+
|
|
104
|
+
rperf report ./profiles/
|
|
105
|
+
|
|
106
|
+
Passing a directory lists all `*.json(.gz)` profiles in a sidebar — one row
|
|
107
|
+
per snapshot with commit SHA (a `*` marks a dirty working tree), commit
|
|
108
|
+
subject, date, and alloc/GC badges versus the previous snapshot (⚠️ when
|
|
109
|
+
allocation changed more than ±15%). Rows are grouped by git branch
|
|
110
|
+
(main/master expanded by default). Only meta/summary heads are read for the
|
|
111
|
+
listing; profile bodies are lazy-loaded on selection, so directories with
|
|
112
|
+
100+ snapshots open instantly. Works well with `rperf record --snapshot-dir`.
|
|
113
|
+
|
|
114
|
+
- Click a row to view that snapshot; j / k move to the newer / older one.
|
|
115
|
+
- ⇄ on a row diffs the current snapshot against it: the flamegraph is
|
|
116
|
+
recolored by share change (red = increased, blue = decreased, neutral
|
|
117
|
+
below ±0.4pt; direction is base (older) → current).
|
|
118
|
+
- Shift+click a frame to pin that method: a sparkline of its share across
|
|
119
|
+
all snapshots appears at the top of the sidebar (points are filled in as
|
|
120
|
+
bodies load; click a point to jump). The pinned frame is highlighted and
|
|
121
|
+
others are dimmed.
|
|
122
|
+
- Files without meta (saved by older rperf) appear as unknown snapshots.
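The two thresholds mentioned above (the ±15% allocation badge and the ±0.4pt neutral band for diff coloring) can be sketched as follows. The helper names are hypothetical and the boundary handling (strictly more than 15%, strictly below 0.4pt) is an assumption read from the wording; the real viewer logic lives in the viewer's JavaScript and may differ.

```ruby
# Hypothetical helpers mirroring the thresholds described above.
ALLOC_WARN_PCT = 15.0 # warn when allocation changes by more than 15% either way
NEUTRAL_PT     = 0.4  # share changes smaller than 0.4pt are drawn neutral

# Percentage change in allocated objects versus the previous snapshot.
# Returns nil when there is no usable baseline.
def alloc_change_pct(previous, current)
  return nil if previous.nil? || previous.zero?
  (current - previous) * 100.0 / previous
end

def alloc_warning?(previous, current)
  pct = alloc_change_pct(previous, current)
  !pct.nil? && pct.abs > ALLOC_WARN_PCT
end

# Color for a frame given its share (in percent) in the base (older)
# and current snapshots.
def diff_color(base_share, current_share)
  delta = current_share - base_share
  return :neutral if delta.abs < NEUTRAL_PT
  delta.positive? ? :red : :blue
end
```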
|
|
123
|
+
|
|
124
|
+
`--html` generates an HTML file with profile data embedded inline.
|
|
125
|
+
No server is needed — open it directly in a browser. d3 and
|
|
126
|
+
d3-flamegraph are loaded from CDN, so an internet connection is
|
|
127
|
+
required on first viewing. Useful for sharing or hosting on static
|
|
128
|
+
sites (e.g., GitHub Pages).
|
|
129
|
+
|
|
130
|
+
rperf report --html profile.json.gz > report.html
|
|
131
|
+
|
|
132
|
+
### diff: Compare two profiles (target - base). Requires Go (except --format table).
|
|
133
|
+
|
|
134
|
+
Accepts `.json.gz` (auto-converted to pprof) or `.pb.gz` files.
|
|
70
135
|
|
|
71
136
|
--top Print top functions by diff
|
|
72
137
|
--text Print text diff report
|
|
138
|
+
--format FORMAT Flat diff table for AI/machine consumption:
|
|
139
|
+
'table' (TSV) or 'table-json' (JSON array).
|
|
140
|
+
Computed in Ruby — no Go required.
|
|
141
|
+
.json.gz / .json files only.
|
|
142
|
+
--port PORT Port for the web UI (default: auto)
|
|
143
|
+
--host HOST Bind address for the web UI (default: localhost)
|
|
73
144
|
|
|
74
145
|
Default (no flag): opens diff in browser.
|
|
75
146
|
|
|
147
|
+
### Table output for AI analysis (--format table / table-json)
|
|
148
|
+
|
|
149
|
+
Aggregation, diffing, and cutoff happen on the rperf side; the output is a
|
|
150
|
+
flat table that an LLM can analyze directly — no tree walking required.
|
|
151
|
+
|
|
152
|
+
`rperf report --format table FILE` columns (self_pct descending, top 50
|
|
153
|
+
plus an `(other)` aggregate row):
|
|
154
|
+
|
|
155
|
+
method self_pct total_pct self_ms
|
|
156
|
+
|
|
157
|
+
`rperf diff --format table BASE HEAD` columns (|delta_pt| descending,
|
|
158
|
+
top 50; delta_pt = self_pct_head - self_pct_base in percentage points):
|
|
159
|
+
|
|
160
|
+
method self_pct_base self_pct_head delta_pt
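As a worked example of the `delta_pt` definition and the sort order, here is a minimal sketch. The `diff_rows` helper and the sample percentages are made up, and treating a method that is missing on one side as 0% is an assumption, not documented rperf behavior.

```ruby
# Hypothetical sketch: build diff rows from two {method => self_pct} hashes.
def diff_rows(base, head, limit: 50)
  names = base.keys | head.keys
  rows = names.map do |name|
    b = base.fetch(name, 0.0) # assumption: absent on one side counts as 0%
    h = head.fetch(name, 0.0)
    { method: name, self_pct_base: b, self_pct_head: h, delta_pt: h - b }
  end
  # Largest absolute change first, capped at `limit` rows.
  rows.sort_by { |row| -row[:delta_pt].abs }.first(limit)
end

base = { "Foo#parse" => 40.0, "Bar#render" => 35.0, "Baz#io" => 25.0 }
head = { "Foo#parse" => 50.0, "Bar#render" => 30.0, "Qux#new" => 20.0 }
rows = diff_rows(base, head)
rows.each { |row| puts row.values.join("\t") }
```

A method that disappears entirely (here `Baz#io`) ranks first because its share dropped by the full 25 points.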
|
|
161
|
+
|
|
162
|
+
Per-method allocation data does not exist in sampling profiles, so
|
|
163
|
+
allocation counts appear only in the summary (whole-profile delta).
|
|
164
|
+
|
|
165
|
+
The last TSV line is `# summary` with tab-separated key=value pairs
|
|
166
|
+
(total_ms / allocated_objects / GC counts; base/head/delta for diff).
|
|
167
|
+
With `table-json`, the output is a JSON array of row objects whose last
|
|
168
|
+
element is `{"summary": {...}}`.
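For consumers other than an LLM, the TSV form can be parsed in a few lines. The sketch below rests on assumptions that the text above does not spell out: that an optional header row repeats the column names, and that the `# summary` marker and its key=value pairs are separated by tabs on one line. `parse_table` and the sample data are hypothetical, not real rperf output.

```ruby
# Hypothetical parser for the TSV layout described above.
def parse_table(tsv, columns)
  rows = []
  summary = {}
  tsv.each_line(chomp: true) do |line|
    next if line.empty?
    fields = line.split("\t")
    if fields.first == "# summary"
      # Trailing summary line: key=value pairs after the marker.
      fields.drop(1).each do |pair|
        key, value = pair.split("=", 2)
        summary[key] = value
      end
    elsif fields == columns
      next # header row, if present
    else
      rows << columns.zip(fields).to_h
    end
  end
  { rows: rows, summary: summary }
end

# Made-up sample in the assumed layout.
sample = [
  %w[method self_pct total_pct self_ms].join("\t"),
  ["Object#fibonacci", "99.9", "99.9", "1999.8"].join("\t"),
  ["(other)", "0.1", "0.1", "2.0"].join("\t"),
  ["# summary", "total_ms=2001.8", "allocated_objects=48741"].join("\t"),
].join("\n")

result = parse_table(sample, %w[method self_pct total_pct self_ms])
p result[:rows].first
p result[:summary]
```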
|
|
169
|
+
|
|
170
|
+
Feed the result to an LLM:
|
|
171
|
+
|
|
172
|
+
rperf diff --format table base.json.gz head.json.gz | claude -p "analyze the cause of the regression"
|
|
173
|
+
|
|
174
|
+
### Multi-process profiling
|
|
175
|
+
|
|
176
|
+
By default, rperf profiles forked and spawned Ruby child processes.
|
|
177
|
+
Profiles from all processes are merged into a single output. Each child
|
|
178
|
+
process's samples are tagged with a `%pid` label for per-process filtering.
|
|
179
|
+
|
|
180
|
+
# Profile a preforking server (Unicorn, Puma, etc.)
|
|
181
|
+
rperf stat -m wall bundle exec unicorn
|
|
182
|
+
rperf record -m wall -o profile.json.gz bundle exec unicorn
|
|
183
|
+
|
|
184
|
+
# Profile with fork
|
|
185
|
+
rperf stat ruby -e '4.times { fork { work } }; Process.waitall'
|
|
186
|
+
|
|
187
|
+
# Disable child process tracking
|
|
188
|
+
rperf stat --no-inherit ruby app.rb
|
|
189
|
+
|
|
190
|
+
How it works:
|
|
191
|
+
|
|
192
|
+
- On fork: `Process._fork` hook restarts profiling in the child and sets
|
|
193
|
+
a `%pid` label. When the child exits, its profile is saved to a
|
|
194
|
+
temporary session directory.
|
|
195
|
+
- On spawn/system: The spawned Ruby process inherits `RUBYLIB` (pointing
|
|
196
|
+
to rperf's lib directory) and `RUBYOPT=-rrperf`, plus `RPERF_SESSION_DIR`.
|
|
197
|
+
It auto-starts profiling and writes its profile to the session directory.
|
|
198
|
+
- When the root process exits, it aggregates all profiles from the
|
|
199
|
+
session directory into a single output (stat report or file).
|
|
200
|
+
- The session directory is cleaned up after aggregation.
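The fork step above relies on `Process._fork`, the hook point Ruby (3.1 and later) provides for code that must run on both sides of a fork. The following is a minimal sketch of that mechanism only; `ChildRestartHook` and its `pid_label` accessor are hypothetical, and where rperf restarts its profiler this sketch merely records the child's pid.

```ruby
# Hypothetical sketch of a Process._fork hook (Ruby >= 3.1); not rperf's code.
module ChildRestartHook
  class << self
    attr_accessor :pid_label
  end

  def _fork
    pid = super # performs the actual fork
    if pid == 0
      # Child side: a profiler would restart here and tag its samples.
      ChildRestartHook.pid_label = Process.pid
    end
    pid
  end
end
Process.singleton_class.prepend(ChildRestartHook)

# Fork a child and read back the label it recorded inside the hook.
def label_seen_by_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(ChildRestartHook.pid_label.to_s)
    writer.close
  end
  writer.close
  label = reader.read.to_i
  reader.close
  Process.wait(pid)
  [pid, label]
end
```

Because `Kernel#fork` and `Process.fork` both go through `Process._fork`, prepending one module covers both entry points.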
|
|
201
|
+
|
|
202
|
+
Limitations:
|
|
203
|
+
|
|
204
|
+
- Daemon children (Process.daemon) that outlive the parent will have
|
|
205
|
+
their profiles lost, since the parent aggregates and cleans up the
|
|
206
|
+
session directory at exit.
|
|
207
|
+
- Cross-process snapshots (Rperf.snapshot) are not supported; snapshots
|
|
208
|
+
only cover the current process.
|
|
209
|
+
- Only Ruby child processes are profiled; non-Ruby children (shell
|
|
210
|
+
scripts, Python, etc.) are not affected.
|
|
211
|
+
- Child processes that use rperf independently (Rperf.start in their
|
|
212
|
+
own code) will conflict with the inherited auto-start session.
|
|
213
|
+
Such programs should clear RPERF_ENABLED from their environment
|
|
214
|
+
before requiring rperf.
|
|
215
|
+
|
|
76
216
|
### Examples
|
|
77
217
|
|
|
78
218
|
rperf record ruby app.rb
|
|
@@ -81,15 +221,22 @@ Default (no flag): opens diff in browser.
|
|
|
81
221
|
rperf record -o profile.collapsed ruby app.rb
|
|
82
222
|
rperf record -o profile.txt ruby app.rb
|
|
83
223
|
rperf record -p ruby app.rb
|
|
224
|
+
rperf record --snapshot-dir ./profiles ruby app.rb
|
|
225
|
+
rperf record --label ci=github-actions --label pr=123 ruby app.rb
|
|
84
226
|
rperf stat ruby app.rb
|
|
85
227
|
rperf stat --report ruby app.rb
|
|
86
228
|
rperf stat -o profile.pb.gz ruby app.rb
|
|
229
|
+
rperf stat -m wall bundle exec unicorn
|
|
230
|
+
rperf stat --no-inherit ruby app.rb
|
|
87
231
|
rperf exec ruby app.rb
|
|
88
232
|
rperf exec -m cpu ruby app.rb
|
|
89
233
|
rperf report
|
|
90
234
|
rperf report --top profile.pb.gz
|
|
235
|
+
rperf report --format table profile.json.gz
|
|
236
|
+
rperf report ./profiles/
|
|
91
237
|
rperf diff before.pb.gz after.pb.gz
|
|
92
238
|
rperf diff --top before.pb.gz after.pb.gz
|
|
239
|
+
rperf diff --format table before.json.gz after.json.gz
|
|
93
240
|
|
|
94
241
|
## RUBY API
|
|
95
242
|
|
|
@@ -114,12 +261,17 @@ Rperf.save("profile.txt", data)
|
|
|
114
261
|
|
|
115
262
|
### Rperf.start parameters
|
|
116
263
|
|
|
117
|
-
frequency: Sampling frequency in Hz (Integer, default: 1000)
|
|
264
|
+
frequency: Sampling frequency in Hz (Integer, 1..10000, default: 1000)
|
|
118
265
|
mode: :cpu or :wall (Symbol, default: :cpu)
|
|
119
266
|
output: File path to write on stop (String or nil)
|
|
120
267
|
verbose: Print statistics to stderr (true/false, default: false)
|
|
121
268
|
format: :json, :pprof, :collapsed, :text, or nil for auto-detect (Symbol or nil)
|
|
122
269
|
defer: Start with timer paused; use Rperf.profile to activate (default: false)
|
|
270
|
+
inherit: Child process tracking: :fork (default), true (fork+spawn), false (none)
|
|
271
|
+
Note: CLI defaults to true (--no-inherit to disable)
|
|
272
|
+
signal: Timer signal (Linux only): nil (default, auto), false (use nanosleep),
|
|
273
|
+
or a signal number (Integer)
|
|
274
|
+
aggregate: Aggregate samples in C (default: true). false returns raw per-sample data
|
|
123
275
|
|
|
124
276
|
### Rperf.stop return value
|
|
125
277
|
|
|
@@ -128,24 +280,30 @@ nil if profiler was not running; otherwise a Hash:
|
|
|
128
280
|
```ruby
|
|
129
281
|
{ mode: :cpu, # or :wall
|
|
130
282
|
frequency: 500,
|
|
131
|
-
|
|
283
|
+
trigger_count: 1300, # number of timer triggers
|
|
284
|
+
sampling_count: 1234, # number of timer callbacks (may differ from trigger_count)
|
|
132
285
|
sampling_time_ns: 56789,
|
|
133
286
|
detected_thread_count: 4, # threads seen during profiling
|
|
134
287
|
start_time_ns: 17740..., # CLOCK_REALTIME epoch nanos
|
|
135
288
|
duration_ns: 10000000, # profiling duration in nanos
|
|
136
|
-
aggregated_samples: [ #
|
|
289
|
+
aggregated_samples: [ # always present
|
|
137
290
|
[frames, weight, seq, label_set_id], # frames: [[path, label], ...] deepest-first
|
|
138
291
|
... # weight: Integer (nanoseconds, merged per unique stack)
|
|
139
292
|
], # seq: Integer (thread sequence, 1-based)
|
|
140
293
|
# label_set_id: Integer (0 = no labels)
|
|
141
294
|
label_sets: [{}, {request: "abc"}, ...], # label set table (index = label_set_id)
|
|
142
|
-
#
|
|
143
|
-
raw_samples: [
|
|
144
|
-
[frames, weight, seq, label_set_id],
|
|
145
|
-
...
|
|
146
|
-
] }
|
|
295
|
+
# additionally, when aggregate: false:
|
|
296
|
+
raw_samples: [ # one entry per timer sample (not merged)
|
|
297
|
+
[frames, weight, seq, label_set_id, vm_state],
|
|
298
|
+
... # vm_state: Integer (raw VM state; NOT
|
|
299
|
+
] } # converted to %GVL/%GC labels — only
|
|
300
|
+
# aggregated_samples gets that conversion)
|
|
147
301
|
```
|
|
148
302
|
|
|
303
|
+
With `aggregate: false`, BOTH keys are present: `aggregated_samples` is built
|
|
304
|
+
in Ruby from the raw samples (so encoders always work), and `raw_samples`
|
|
305
|
+
preserves the unmerged per-sample data.
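To show how the `aggregated_samples` structure can be consumed, here is a small sketch that computes self time per method. The data hash is hand-built in the documented shape with made-up values, and `self_time_by_method` is a hypothetical helper, not part of rperf.

```ruby
# Hand-built data in the shape documented above (values are made up).
data = {
  mode: :cpu,
  aggregated_samples: [
    # [frames (deepest-first, [path, label]), weight_ns, seq, label_set_id]
    [[["app.rb", "Object#fib"], ["app.rb", "<main>"]], 3_000_000, 1, 0],
    [[["app.rb", "Object#fib"], ["app.rb", "Object#fib"], ["app.rb", "<main>"]], 5_000_000, 1, 0],
    [[["app.rb", "Array#sum"], ["app.rb", "<main>"]], 2_000_000, 1, 1],
  ],
  label_sets: [{}, { request: "abc" }],
}

# Self time: each sample's weight is credited to its deepest frame.
def self_time_by_method(data)
  totals = Hash.new(0)
  data[:aggregated_samples].each do |frames, weight, _seq, _label_set_id|
    _path, label = frames.first
    totals[label] += weight
  end
  totals.sort_by { |_label, ns| -ns }.to_h
end

p self_time_by_method(data)
```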
|
|
306
|
+
|
|
149
307
|
### Rperf.snapshot(clear: false)
|
|
150
308
|
|
|
151
309
|
Returns a snapshot of the current profiling data without stopping.
|
|
@@ -153,7 +311,9 @@ Only works in aggregate mode (the default). Returns nil if not profiling.
|
|
|
153
311
|
|
|
154
312
|
When `clear: true` is given, resets aggregated data after taking the snapshot.
|
|
155
313
|
This enables interval-based profiling where each snapshot covers only the
|
|
156
|
-
period since the last clear.
|
|
314
|
+
period since the last clear. Note: the frame table is intentionally retained
|
|
315
|
+
(frame IDs must stay stable for GC safety and thread data consistency), so
|
|
316
|
+
`unique_frames` may accumulate across intervals.
|
|
157
317
|
|
|
158
318
|
```ruby
|
|
159
319
|
Rperf.start(frequency: 1000)
|
|
@@ -250,6 +410,22 @@ running). Raises `RuntimeError` if not started, `ArgumentError` without block.
|
|
|
250
410
|
|
|
251
411
|
Returns the current thread's labels as a Hash. Empty hash if none set.
|
|
252
412
|
|
|
413
|
+
### Rperf.running?
|
|
414
|
+
|
|
415
|
+
Returns true while a profiling session is active (between start and stop).
|
|
416
|
+
|
|
417
|
+
### Rperf.load(path)
|
|
418
|
+
|
|
419
|
+
Loads a `.json.gz` or `.json` profile file (saved by `rperf record` or `Rperf.save`)
|
|
420
|
+
and returns the parsed data hash (same format as `Rperf.stop` / `Rperf.snapshot`).
|
|
421
|
+
Gzip is auto-detected by magic bytes, so both compressed and plain files work.
|
|
422
|
+
Warns to stderr if the file was saved by a different rperf version.
|
|
423
|
+
|
|
424
|
+
```ruby
|
|
425
|
+
data = Rperf.load("rperf.json.gz") # gzip compressed
|
|
426
|
+
data = Rperf.load("profile.json") # plain text JSON
|
|
427
|
+
```
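Magic-byte detection of the kind described above can be sketched with the standard library alone. This is an illustration of the technique, not rperf's implementation: `read_profile_json` is a hypothetical stand-in, and it skips the version check that `Rperf.load` performs. Every gzip stream starts with the bytes `0x1f 0x8b`, so the file extension does not need to be trusted.

```ruby
require "json"
require "zlib"

GZIP_MAGIC = "\x1f\x8b".b # first two bytes of every gzip stream

# Hypothetical stand-in for the detection step; not rperf's implementation.
def read_profile_json(path)
  bytes = File.binread(path)
  bytes = Zlib.gunzip(bytes) if bytes.start_with?(GZIP_MAGIC)
  JSON.parse(bytes.force_encoding(Encoding::UTF_8), symbolize_names: true)
end
```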
|
|
428
|
+
|
|
253
429
|
### Rperf.save(path, data, format: nil)
|
|
254
430
|
|
|
255
431
|
Writes data to path. format: :json, :pprof, :collapsed, or :text.
|
|
@@ -269,6 +445,8 @@ use Rperf::RackMiddleware
|
|
|
269
445
|
|
|
270
446
|
The middleware uses `Rperf.profile` to activate timer and set labels.
|
|
271
447
|
Start profiling separately. Option: `label_key:` (default: `:endpoint`).
|
|
448
|
+
When the profiler is not running, the middleware is a no-op (passes the
|
|
449
|
+
request straight through).
|
|
272
450
|
|
|
273
451
|
### Rperf::ActiveJobMiddleware
|
|
274
452
|
|
|
@@ -312,7 +490,19 @@ use Rperf::Viewer, max_snapshots: 12 # keep fewer snapshots (default: 24)
|
|
|
312
490
|
```
|
|
313
491
|
|
|
314
492
|
Take snapshots via `Rperf::Viewer.instance.take_snapshot!` or
|
|
315
|
-
`Rperf::Viewer.instance.add_snapshot(data)`.
|
|
493
|
+
`Rperf::Viewer.instance.add_snapshot(data)`. Snapshots carry the same
|
|
494
|
+
meta/summary as saved profiles, so when more than one snapshot exists the
|
|
495
|
+
UI shows the time-travel sidebar (list, diff, pin/sparkline, j/k) — see
|
|
496
|
+
"Time-travel mode" under the report subcommand.
|
|
497
|
+
`add_snapshot_dir(dir)` loads a directory of saved profiles (lazy-loaded;
|
|
498
|
+
`max_snapshots` does not apply to directory entries).
|
|
499
|
+
|
|
500
|
+
The UI fetches data from `<path>/snapshots` (list) and
|
|
501
|
+
`<path>/snapshots/<id>` (body). The URLs are replaceable at runtime:
|
|
502
|
+
define `window.RPERF_DATA_SOURCE` (with `listUrl()` / `snapshotUrl(id)`,
|
|
503
|
+
and optionally an async `onAuthError(url)` hook that returns a fresh URL
|
|
504
|
+
when a fetch hits HTTP 403, e.g. an expired signed URL) before the viewer
|
|
505
|
+
script runs to read snapshots from another source.
|
|
316
506
|
|
|
317
507
|
#### Typical setup with RackMiddleware and periodic snapshots
|
|
318
508
|
|
|
@@ -358,7 +548,8 @@ end
|
|
|
358
548
|
|
|
359
549
|
#### UI tabs
|
|
360
550
|
|
|
361
|
-
- **Flamegraph** — Interactive flamegraph (d3-flame-graph). Click to zoom
|
|
551
|
+
- **Flamegraph** — Interactive flamegraph (d3-flame-graph). Click to zoom;
|
|
552
|
+
Shift+click to pin a method (sparkline across snapshots in the sidebar).
|
|
362
553
|
- **Top** — Flat/cumulative weight table. Click column headers to sort.
|
|
363
554
|
- **Tags** — Label key/value breakdown with weight bars. Click a row to
|
|
364
555
|
set tagfocus and switch to Flamegraph.
|
|
@@ -389,15 +580,80 @@ Tag keys are sorted alphabetically (`%`-prefixed VM state keys appear first).
|
|
|
389
580
|
|
|
390
581
|
### json (default) — rperf native format
|
|
391
582
|
|
|
392
|
-
|
|
583
|
+
JSON representation of the internal data hash
|
|
393
584
|
(the same hash returned by `Rperf.stop` / `Rperf.snapshot` — see
|
|
394
585
|
"Return value" above for the full structure).
|
|
395
586
|
Preserves all data including labels, VM state, thread info, and statistics.
|
|
396
587
|
Readable by non-Ruby tools (Python, jq, etc.).
|
|
397
|
-
Extension convention: `.json.gz`
|
|
588
|
+
Extension convention: `.json.gz` (gzip-compressed, default) or `.json` (plain text).
|
|
398
589
|
View with: `rperf report` (opens rperf viewer in browser, no Go required).
|
|
399
590
|
Load programmatically: `data = Rperf.load("rperf.json.gz")`
|
|
400
591
|
|
|
592
|
+
#### Profile metadata (meta / summary)
|
|
593
|
+
|
|
594
|
+
JSON profiles embed two extra top-level keys, written FIRST in the file so
|
|
595
|
+
tools can list profiles by decompressing only the head (`Rperf.read_meta`):
|
|
596
|
+
|
|
597
|
+
```json
|
|
598
|
+
{
|
|
599
|
+
"meta": {
|
|
600
|
+
"format_version": 1,
|
|
601
|
+
"created_at": "2026-06-12T10:00:00Z",
|
|
602
|
+
"ruby_version": "3.5.0",
|
|
603
|
+
"rperf_version": "0.10.0",
|
|
604
|
+
"mode": "cpu",
|
|
605
|
+
"hostname": "...",
|
|
606
|
+
"git": {
|
|
607
|
+
"sha": "88e1a40...",
|
|
608
|
+
"branch": "main",
|
|
609
|
+
"subject": "Add nested includes support",
|
|
610
|
+
"committed_at": "2026-06-09T...",
|
|
611
|
+
"dirty": false
|
|
612
|
+
},
|
|
613
|
+
"labels": { "ci": "github-actions", "pr": "123" }
|
|
614
|
+
},
|
|
615
|
+
"summary": {
|
|
616
|
+
"total_ms": 2001.8,
|
|
617
|
+
"cpu_ms": 2023.3,
|
|
618
|
+
"gc_count_minor": 2,
|
|
619
|
+
"gc_count_major": 2,
|
|
620
|
+
"gc_ms": 3.0,
|
|
621
|
+
"allocated_objects": 48741,
|
|
622
|
+
"freed_objects": 27034,
|
|
623
|
+
"maxrss_mb": 16,
|
|
624
|
+
"samples": 1999,
|
|
625
|
+
"top_methods": [
|
|
626
|
+
{ "name": "Object#fibonacci", "self_pct": 99.9, "total_pct": 99.9 }
|
|
627
|
+
]
|
|
628
|
+
}
|
|
629
|
+
}
|
|
630
|
+
```
|
|
631
|
+
|
|
632
|
+
Notes:
|
|
633
|
+
|
|
634
|
+
- `git` is omitted outside a git repository (never an error). In GitHub
|
|
635
|
+
Actions, `GITHUB_SHA` / `GITHUB_REF_NAME` take priority over git commands.
|
|
636
|
+
The CLI collects git info before launching the profiled command, so a
|
|
637
|
+
`chdir` in the app cannot point git at the wrong repository.
|
|
638
|
+
- `git.dirty` is true when the working tree has uncommitted changes.
|
|
639
|
+
- GC counts and allocation counts are deltas over the profiled period
|
|
640
|
+
(`GC.stat` baselines are captured at `Rperf.start`). `maxrss_mb` is the
|
|
641
|
+
process-lifetime peak (no period delta is possible; on macOS the value
|
|
642
|
+
comes from `ps -o rss=` and is the current RSS, not the peak).
|
|
643
|
+
- `summary.top_methods` lists up to 50 methods by self time.
|
|
644
|
+
- In multi-process profiling, GC/memory stats come from the root process
|
|
645
|
+
only (same policy as `rperf stat`); time and sample stats are aggregated.
|
|
646
|
+
- Files saved by older rperf versions (no `meta`) remain loadable; viewers
|
|
647
|
+
treat them as unknown snapshots.
|
|
648
|
+
- pprof / collapsed / text exports do not contain meta.
|
|
649
|
+
|
|
650
|
+
### Rperf.read_meta(path)
|
|
651
|
+
|
|
652
|
+
Reads only `meta` / `summary` from a `.json(.gz)` profile without parsing
|
|
653
|
+
the sample body (fast even for large files). Returns
|
|
654
|
+
`{ meta: Hash|nil, summary: Hash|nil }`, or nil for pre-meta files and
|
|
655
|
+
unreadable files.
|
|
656
|
+
|
|
401
657
|
### pprof
|
|
402
658
|
|
|
403
659
|
Gzip-compressed protobuf. Standard pprof format.
|
|
@@ -459,7 +715,8 @@ Example output:
|
|
|
459
715
|
|
|
460
716
|
Format is auto-detected from the output file extension:
|
|
461
717
|
|
|
462
|
-
.json.gz → json (rperf native, default)
|
|
718
|
+
.json.gz → json (rperf native, gzip compressed, default)
|
|
719
|
+
.json → json (plain text, readable by jq)
|
|
463
720
|
.pb.gz → pprof
|
|
464
721
|
.collapsed → collapsed
|
|
465
722
|
.txt → text
|
|
@@ -485,8 +742,8 @@ In both modes, GC state labels are recorded:
|
|
|
485
742
|
- **%GC=mark** — Time spent in GC marking phase (wall time).
|
|
486
743
|
- **%GC=sweep** — Time spent in GC sweeping phase (wall time).
|
|
487
744
|
|
|
488
|
-
These labels appear in `label_sets` (e.g., `{"%GVL" => "blocked"}`,
|
|
489
|
-
`{"%GC" => "mark"}`) and are written into pprof sample labels.
|
|
745
|
+
These labels appear in `label_sets` (e.g., `{:"%GVL" => "blocked"}`,
|
|
746
|
+
`{:"%GC" => "mark"}`) and are written into pprof sample labels.
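Since the VM state lives in `label_sets` rather than in the stack, totals per state can be computed by following each sample's `label_set_id`. The sketch below uses made-up data in the documented shape; `weight_by_label` is a hypothetical helper, and symbol keys are assumed as in the example above.

```ruby
# Made-up data in the documented shape; label_set_id indexes into label_sets.
label_sets = [
  {},
  { :"%GVL" => "blocked" },
  { :"%GVL" => "wait" },
  { :"%GC" => "mark" },
]
samples = [
  # [frames, weight_ns, seq, label_set_id]
  [[["app.rb", "IO#read"]], 6_000_000, 1, 1],
  [[["app.rb", "IO#read"]], 1_000_000, 1, 2],
  [[["app.rb", "Object#work"]], 4_000_000, 1, 0],
  [[["app.rb", "Object#work"]], 500_000, 1, 3],
  [[["app.rb", "Kernel#sleep"]], 2_000_000, 2, 1],
]

# Sum sample weights per value of one label key (e.g. :"%GVL").
def weight_by_label(samples, label_sets, key)
  totals = Hash.new(0)
  samples.each do |_frames, weight, _seq, label_set_id|
    value = label_sets[label_set_id][key]
    totals[value] += weight if value
  end
  totals
end

p weight_by_label(samples, label_sets, :"%GVL")
p weight_by_label(samples, label_sets, :"%GC")
```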
|
|
490
747
|
|
|
491
748
|
To add VM state as frames in flamegraphs, use pprof tag options:
|
|
492
749
|
|
|
@@ -566,7 +823,7 @@ Or convert to text with pprof CLI:
|
|
|
566
823
|
|
|
567
824
|
go tool pprof -text profile.pb.gz
|
|
568
825
|
go tool pprof -top profile.pb.gz
|
|
569
|
-
go tool pprof -
|
|
826
|
+
go tool pprof -http=:8080 profile.pb.gz # web UI (includes flame graph view)
|
|
570
827
|
|
|
571
828
|
## ENVIRONMENT VARIABLES
|
|
572
829
|
|
|
@@ -581,6 +838,15 @@ Used internally by the CLI to pass options to the auto-started profiler:
|
|
|
581
838
|
RPERF_SIGNAL=N|false Timer signal number or 'false' for nanosleep (Linux only)
|
|
582
839
|
RPERF_STAT=1 Enable stat mode (used by rperf stat)
|
|
583
840
|
RPERF_STAT_REPORT=1 Include profile tables in stat output
|
|
841
|
+
RPERF_AGGREGATE=0 Disable C-level sample aggregation (raw mode)
|
|
842
|
+
RPERF_DEFER=1 Start with timer paused; use Rperf.profile to activate
|
|
843
|
+
RPERF_TMPDIR=path Base directory for session directories (overrides default tmpdir)
|
|
844
|
+
|
|
845
|
+
Internal variables (set automatically by the CLI — not for manual use):
|
|
846
|
+
|
|
847
|
+
RPERF_SESSION_DIR=path Session directory for multi-process profiling
|
|
848
|
+
RPERF_ROOT_PROCESS=pid Marks the root aggregating process
|
|
849
|
+
RPERF_STAT_COMMAND=str Command string displayed in stat output
|
|
584
850
|
|
|
585
851
|
## TIPS
|
|
586
852
|
|