rperf 0.7.0 → 0.9.0
- checksums.yaml +4 -4
- data/LICENSE +21 -0
- data/README.md +75 -49
- data/docs/help.md +255 -36
- data/docs/logo.svg +25 -0
- data/exe/rperf +154 -30
- data/ext/rperf/rperf.c +235 -121
- data/lib/rperf/active_job.rb +1 -0
- data/lib/rperf/rack.rb +25 -3
- data/lib/rperf/version.rb +1 -1
- data/lib/rperf/viewer.rb +847 -0
- data/lib/rperf.rb +663 -92
- metadata +7 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 7182c353301aa38afde2d65928219c46cee3e777b49842bf60331dd40e7b3ab2
+data.tar.gz: ebb30b807d9b86a7ff48090bc5990d6bdcc0c9951f80b21758df2413bdbd39f2
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: d70dde1af1a4c3c9cec02e20981038c465facea4d3da32c32ce26e16cfc402de01f594c0b41386e3095f53603631f930fb2bd133d23b8a5757662d40f117d2a5
+data.tar.gz: b7868f0237f84a47bda6286c99b7056f47738fb235ede0d21ac8f950feb9b891b1677dff38f25294393766cbe4f742b2eeffe73a43dded92562926db8d5bd392
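The checksums.yaml entries above are hex digests of the `metadata.gz` and `data.tar.gz` members of the gem archive. As a minimal sketch of how such a digest could be checked with Ruby's standard `digest` library: the helper name `sha256_matches?` is hypothetical (it is not part of RubyGems or rperf), and the demonstration hashes a throwaway file rather than a real gem member.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper (not part of RubyGems or rperf): compares a file's
# SHA256 hex digest with an expected value such as one from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Self-contained demonstration on a throwaway file containing "abc".
Tempfile.create("sample") do |f|
  f.write("abc")
  f.flush
  expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
  puts sha256_matches?(f.path, expected)
end
```

To check a real release, one would first extract `data.tar.gz` from the downloaded `.gem` file (a plain tar archive) and pass its path together with the SHA256 value listed above.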
data/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Koichi Sasada
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/README.md
CHANGED
@@ -17,7 +17,7 @@
 </p>
 
 <p align="center">
-
+Built-in flamegraph viewer · CPU mode & wall mode (GVL + GC tracking)
 </p>
 
 <p align="center">
@@ -29,29 +29,34 @@
 ## See It in Action
 
 ```bash
-$ gem install rperf
 $ rperf exec ruby fib.rb
 
 Performance stats for 'ruby fib.rb':
 
-2,
-
-2,
+2,023.3 ms user
+4.3 ms sys
+2,001.8 ms real
 
-2,
-
-
-
-
+2,000.3 ms 100.0% [Rperf] CPU execution
+3.0 ms [Ruby ] GC time (4 count: 2 minor, 2 major)
+48,741 [Ruby ] allocated objects
+27,034 [Ruby ] freed objects
+1 [Ruby ] detected threads
+16 MB [OS ] peak memory (maxrss)
+5,784 [OS ] page faults (5,783 minor, 1 major)
+22 [OS ] context switches (13 voluntary, 9 involuntary)
+0 MB [OS ] disk I/O (0 MB read, 0 MB write)
 
 Flat:
-
+1,998.4 ms 99.9% Object#fibonacci (fib.rb)
+1.9 ms 0.1% Module#method_added (<C method>)
 
 Cumulative:
-2,
-
+2,000.3 ms 100.0% <main> (fib.rb)
+1,998.4 ms 99.9% Object#fibonacci (fib.rb)
+1.9 ms 0.1% Module#method_added (<C method>)
 
-
+1999 samples / 1999 triggers, 0.1% profiler overhead
 ```
 
 ## Quick Start
@@ -60,26 +65,27 @@ $ rperf exec ruby fib.rb
 # Performance summary (wall mode, prints to stderr)
 rperf stat ruby app.rb
 
-# Record a
-rperf record ruby app.rb
-rperf record -m wall
+# Record a profile to file
+rperf record ruby app.rb # → rperf.json.gz (cpu mode, default)
+rperf record -m wall ruby server.rb # wall mode
 
-# View results
-rperf report
-rperf report --top profile.
+# View results in browser
+rperf report # open rperf.json.gz in viewer
+rperf report --top profile.json.gz # print top functions to terminal
 
-# Compare two profiles
-rperf diff before.
-rperf diff --top before.pb.gz after.pb.gz # print diff to terminal
+# Compare two profiles (requires Go)
+rperf diff before.json.gz after.json.gz # open diff in browser
 ```
 
+On `rperf report`, you can see the profile result like this page: [rprof viewer](https://ko1.github.io/rperf/examples/cpu_intensive_profile.html)
+
 ### Ruby API
 
 ```ruby
 require "rperf"
 
 # Block form — profiles and saves to file
-Rperf.start(output: "profile.
+Rperf.start(output: "profile.json.gz", frequency: 500, mode: :cpu) do
 # code to profile
 end
 
@@ -87,18 +93,37 @@ end
 Rperf.start(frequency: 1000, mode: :wall)
 # ...
 data = Rperf.stop
-Rperf.save("profile.
+Rperf.save("profile.json.gz", data)
+```
+
+### In-browser Viewer
+
+```ruby
+# config.ru
+require "rperf/viewer"
+require "rperf/rack"
+
+Rperf.start(mode: :wall, defer: true)
+use Rperf::Viewer # visit /rperf/ for flamegraph UI
+use Rperf::RackMiddleware # labels each request
+run MyApp
+
+# Snapshot every 60 minutes
+Thread.new { loop { sleep 3600; Rperf::Viewer.instance&.take_snapshot! } }
 ```
 
+> **Note:** `Rperf::Viewer` has no built-in authentication. In production, restrict access with your framework's auth mechanisms (e.g., route constraints in Rails). See the [manual](https://ko1.github.io/rperf/docs/manual/) for examples.
+
 ### Environment Variables
 
 Profile without code changes (e.g., Rails):
 
 ```bash
-RPERF_ENABLED=1 RPERF_MODE=wall
+RPERF_ENABLED=1 RPERF_MODE=wall ruby app.rb # → rperf.json.gz
+rperf report # open in viewer
 ```
 
-Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/).
+Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/docs/manual/).
 
 ## Subcommands
 
@@ -106,11 +131,11 @@ Inspired by Linux `perf` — familiar subcommand interface for profiling workflo
 
 | Command | Description |
 |---------|-------------|
-| `rperf record` | Profile a command and save to file |
+| `rperf record` | Profile a command and save to file (default: `.json.gz`) |
 | `rperf stat` | Profile a command and print summary to stderr |
 | `rperf exec` | Profile a command and print full report to stderr |
-| `rperf report` | Open
-| `rperf diff` | Compare two
+| `rperf report` | Open viewer for `.json.gz`; wraps `go tool pprof` for `.pb.gz` (requires Go) |
+| `rperf diff` | Compare two profiles (requires Go) |
 | `rperf help` | Show full reference documentation |
 
 ## How It Works
@@ -133,7 +158,7 @@ Timer (signal or thread) VM thread (postponed job)
 record(backtrace, weight)
 ```
 
-On Linux, the timer uses `timer_create` + signal delivery
+On Linux, the timer uses `timer_create` + signal delivery to a dedicated worker thread.
 On other platforms, a dedicated pthread with `nanosleep` is used.
 
 If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
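The weighting rule in the README text above (a delayed sample carries proportionally more weight, and the weights sum to the total time) can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `weigh_samples`, the `[timestamp_ns, stack]` sample shape, and the string stack names are hypothetical, and the real logic lives in the C extension (`ext/rperf/rperf.c`), which is not shown in this part of the diff.

```ruby
# Toy model of time-delta weighting. Each sample is [timestamp_ns, stack];
# its weight is the time elapsed since the previous sample (or since start).
def weigh_samples(start_ns, samples)
  prev = start_ns
  samples.map do |timestamp_ns, stack|
    weight = timestamp_ns - prev
    prev = timestamp_ns
    [stack, weight]
  end
end

# The third sample arrives late (3 ms after the previous one instead of 1 ms),
# so it is credited with 3 ms rather than a fixed 1 ms tick.
samples = [
  [1_000_000, "a"],
  [2_000_000, "a"],
  [5_000_000, "b"],
  [6_000_000, "a"],
]

totals = Hash.new(0)
weigh_samples(0, samples).each { |stack, weight| totals[stack] += weight }
total_weight = totals.values.sum
```

In this model the per-stack totals add up to the time between the start and the last sample, which is the property the README describes.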
@@ -147,44 +172,45 @@ If a safepoint is delayed, the sample carries proportionally more weight. The to
 
 Use `cpu` to find what consumes CPU. Use `wall` to find what makes things slow (I/O, GVL contention, GC).
 
-###
+### GVL and GC Labels
 
-rperf hooks GVL and GC events to attribute non-CPU time:
+rperf hooks GVL and GC events to attribute non-CPU time. These are recorded as labels on samples rather than synthetic stack frames:
 
-|
-
-|
-|
-|
-|
+| Label (key=value) | Mode | Meaning |
+|-------|------|---------|
+| `%GVL=blocked` | wall only | Off-GVL time (I/O, sleep, C extension releasing GVL) |
+| `%GVL=wait` | wall only | Waiting to reacquire the GVL (contention) |
+| `%GC=mark` | cpu and wall | Time in GC mark phase (wall time) |
+| `%GC=sweep` | cpu and wall | Time in GC sweep phase (wall time) |
 
 ## Why rperf?
 
 - **Accurate despite safepoints** — Safepoint sampling is *safer* (no async-signal-safety issues), but normally *inaccurate*. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
-- **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via
-- **
-- **
+- **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via sample labels.
+- **Built-in viewer** — Flamegraph, Top, Tags tabs with interactive tag filtering. No external tools needed to analyze profiles.
+- **Low overhead** — Signal-based timer on Linux (no extra thread). ~1–5 us per sample.
 - **Zero code changes** — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
 - **`perf`-like CLI** — `record`, `stat`, `report`, `diff` — if you know Linux perf, you already know rperf.
+- **Multi-process** — automatically profiles forked/spawned Ruby child processes (e.g., Unicorn/Puma workers). Use `--no-inherit` to disable.
 
 ### Limitations
 
 - **Method-level only** — no line-level granularity.
 - **Ruby >= 3.4.0** — uses recent VM internals (postponed jobs, thread event hooks).
 - **POSIX only** — Linux, macOS. No Windows.
-- **No fork support** — profiling does not follow fork(2) child processes.
 
 
 ## Output Formats
 
-| Format | Extension |
-
-|
-|
-|
+| Format | Extension | Viewer |
+|--------|-----------|--------|
+| JSON (default) | `.json.gz` | `rperf report` (built-in viewer), `Rperf.load`, any JSON tool |
+| pprof | `.pb.gz` | `go tool pprof` (requires Go), speedscope |
+| collapsed | `.collapsed` | FlameGraph, speedscope |
+| text | `.txt` | any text viewer |
 
 Format is auto-detected from extension, or set explicitly with `--format`.
 
 ## License
 
-MIT
+MIT
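The Output Formats section in the hunk above says the format is auto-detected from the file extension unless `--format` is given. A minimal sketch of that kind of lookup follows, assuming only the extension table from the README; the constant `FORMATS_BY_EXTENSION`, the method `detect_format`, the symbol names, and the `nil` result for unknown extensions are all illustrative and not taken from rperf's source.

```ruby
# Extension-to-format table copied from the README's Output Formats table.
# The symbol names are illustrative, not rperf's internal identifiers.
FORMATS_BY_EXTENSION = {
  ".json.gz"   => :json,
  ".pb.gz"     => :pprof,
  ".collapsed" => :collapsed,
  ".txt"       => :text,
}.freeze

# Returns the explicit format when one is given (the analogue of --format),
# otherwise the format implied by the file name, or nil when the extension
# is not in the table. How rperf itself treats unknown extensions is not
# stated in this diff.
def detect_format(path, explicit: nil)
  return explicit if explicit

  _ext, format = FORMATS_BY_EXTENSION.find { |ext, _| path.end_with?(ext) }
  format
end
```

For example, `detect_format("profile.pb.gz")` resolves to the pprof entry, while `detect_format("profile.pb.gz", explicit: :text)` lets the explicit choice win, mirroring the precedence the README describes.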