rperf 0.5.0 → 0.7.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 3413c4c6ed0cdc0897428bf01fc0fec17a4d14f1c2883e9e5afa0cff110247dc
-   data.tar.gz: '097b06203ce4648a860f2816635d6dfac52f8e5987aa381653cec874d52abf7c'
+   metadata.gz: f19061984f2ea33bbcd569c43e8a7ece03071b8fbb442ebe108468eb07d96a14
+   data.tar.gz: 66bee438bd8459db8ce89129cef39bdaaba3ad82e348012c5d220fb5b6f3963f
  SHA512:
-   metadata.gz: 37065071f049a27eb1bab9f859ed39499022489a19aa8ecd91b3dc35cb6052ffb6b2fbc02c67ea46a94e8dba7644f2b23760d72d2dda7b998ccf3c61c304e225
-   data.tar.gz: 686ab430d58e5dd5163ae65a2bd330a76e57cf0dd72e7eac2b7c61621a03007bd724cac20b1e452766870ff33de325855e199bc3d873d004344f7b26b9b6614f
+   metadata.gz: 3a9468eadacbb41afbc751bd767141a3db785a2eaa51e33549503fe160a8adb25f6612c0cc4c61381b8f8442836a970a23d29cda4fe696488ca85d2b048518a2
+   data.tar.gz: 5e8e6c6c24fbb264f352c98511481e5d448b6a07e390c72cc31beeab2aa03699cde22f2f04268531ab50572e7cfd54cab7235359915caa6c6b3c9eb40f26d4e9
data/README.md CHANGED
@@ -2,25 +2,66 @@
  <img src="docs/logo.svg" alt="rperf logo" width="260">
  </p>
 
- # rperf
+ <h1 align="center">rperf</h1>
 
- A safepoint-based sampling performance profiler for Ruby. Uses actual time deltas as sample weights to correct safepoint bias.
+ <p align="center">
+ <strong>Know where your Ruby spends its time — accurately.</strong><br>
+ A sampling profiler that corrects safepoint bias using real time deltas.
+ </p>
 
- - Requires Ruby >= 3.4.0
- - Output: pprof protobuf, collapsed stacks, or text report
- - Modes: CPU time (per-thread) and wall time (with GVL/GC tracking)
- - [Online manual](https://ko1.github.io/rperf/docs/manual/) | [GitHub](https://github.com/ko1/rperf)
+ <p align="center">
+ <a href="https://rubygems.org/gems/rperf"><img src="https://img.shields.io/gem/v/rperf.svg" alt="Gem Version"></a>
+ <img src="https://img.shields.io/badge/Ruby-%3E%3D%203.4.0-cc342d" alt="Ruby >= 3.4.0">
+ <a href="https://ko1.github.io/rperf/docs/manual/"><img src="https://img.shields.io/badge/docs-manual-blue" alt="Manual"></a>
+ <img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License">
+ </p>
 
- ## Quick Start
+ <p align="center">
+ pprof / collapsed stacks / text report &nbsp;·&nbsp; CPU mode & wall mode (GVL + GC tracking)
+ </p>
+
+ <p align="center">
+ <a href='https://ko1.github.io/rperf/'>Web site</a>,
+ <a href='https://ko1.github.io/rperf/docs/manual/'>Online manual</a>,
+ <a href='https://github.com/ko1/rperf'>GitHub repository</a>
+ </p>
+
+ ## See It in Action
 
  ```bash
- gem install rperf
+ $ gem install rperf
+ $ rperf exec ruby fib.rb
 
+ Performance stats for 'ruby fib.rb':
+
+ 2,326.0 ms user
+ 64.5 ms sys
+ 2,035.5 ms real
+
+ 2,034.2 ms 100.0% CPU execution
+ 1 [Ruby] detected threads
+ 7.0 ms [Ruby] GC time (7 count: 5 minor, 2 major)
+ 106,078 [Ruby] allocated objects
+ 22 MB [OS] peak memory (maxrss)
+
+ Flat:
+ 2,034.2 ms 100.0% Object#fibonacci (fib.rb)
+
+ Cumulative:
+ 2,034.2 ms 100.0% Object#fibonacci (fib.rb)
+ 2,034.2 ms 100.0% <main> (fib.rb)
+
+ 2034 samples / 2034 triggers, 0.1% profiler overhead
+ ```
+
+ ## Quick Start
+
+ ```bash
  # Performance summary (wall mode, prints to stderr)
  rperf stat ruby app.rb
 
- # Profile to file
- rperf record ruby app.rb # → rperf.data (pprof, cpu mode)
+ # Record a pprof profile to file
+ rperf record ruby app.rb # → rperf.data (cpu mode)
  rperf record -m wall -o profile.pb.gz ruby server.rb # wall mode, custom output
 
  # View results (report/diff require Go: https://go.dev/dl/)
@@ -67,19 +108,20 @@ Inspired by Linux `perf` — familiar subcommand interface for profiling workflo
  |---------|-------------|
  | `rperf record` | Profile a command and save to file |
  | `rperf stat` | Profile a command and print summary to stderr |
+ | `rperf exec` | Profile a command and print full report to stderr |
  | `rperf report` | Open pprof profile with `go tool pprof` (requires Go) |
  | `rperf diff` | Compare two pprof profiles (requires Go) |
  | `rperf help` | Show full reference documentation |
 
  ## How It Works
 
- ### The Problem
+ ### The Challenge: Safepoint Sampling
 
- Ruby's sampling profilers collect stack traces at **safepoints**, not at the exact timer tick. Traditional profilers assign equal weight to every sample, so if a safepoint is delayed 5ms, that delay is invisible.
+ Most Ruby profilers (e.g., stackprof) use signal handlers to capture stack traces at the exact moment the timer fires. rperf takes a different approach: it samples at **safepoints** (VM checkpoints), which is safer (no async-signal-safety concerns, reliable access to VM state) but means the sample timing can be delayed. Without correction, this delay would skew the results.
 
- ### The Solution
+ ### The Fix: Weight = Real Time
 
- rperf uses **time deltas as sample weights**:
+ rperf uses **actual elapsed time as sample weights** — so delayed samples carry proportionally more weight, and the profile matches reality:
 
  ```
  Timer (signal or thread) VM thread (postponed job)
@@ -116,23 +158,22 @@ rperf hooks GVL and GC events to attribute non-CPU time:
  | `[GC marking]` | Time in GC mark phase |
  | `[GC sweeping]` | Time in GC sweep phase |
 
- ## Pros & Cons
+ ## Why rperf?
 
- ### Pros
+ - **Accurate despite safepoints** — Safepoint sampling is *safer* (no async-signal-safety issues), but normally *inaccurate*. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
+ - **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via synthetic frames.
+ - **Low overhead** — Signal-based timer on Linux (no extra thread). ~1–5 µs per sample.
+ - **pprof compatible** — Works with `go tool pprof`, speedscope, and other standard tools out of the box.
+ - **Zero code changes** — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
+ - **`perf`-like CLI** — `record`, `stat`, `report`, `diff` — if you know Linux perf, you already know rperf.
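
To make the time-delta weighting in the first bullet concrete, here is a toy calculation in plain Ruby. It is a sketch only and does not use rperf: the sample data, the method names (`share_by_count`, `share_by_elapsed`), and the frame names are all invented for illustration. Each sample records the frame seen at the safepoint and the milliseconds elapsed since the previous sample.

```ruby
# Toy model, not rperf's implementation. Each sample is [frame, elapsed_ms],
# where elapsed_ms is the time since the previous sample was taken.
SAMPLES = [
  ["method_a", 1],
  ["method_a", 1],
  ["method_b", 6], # the safepoint was reached late, so 6 ms had passed
  ["method_a", 1],
  ["method_a", 1],
].freeze

# Equal weighting: every sample counts as one tick, however late it was taken.
def share_by_count(samples)
  counts = samples.map(&:first).tally
  counts.transform_values { |n| (100.0 * n / samples.size).round(1) }
end

# Time-delta weighting: every sample counts for the time it actually covers.
def share_by_elapsed(samples)
  total = samples.sum { |_frame, ms| ms }
  samples
    .group_by(&:first)
    .transform_values { |rows| (100.0 * rows.sum { |_frame, ms| ms } / total).round(1) }
end

p share_by_count(SAMPLES)   # method_b gets 20% of the profile
p share_by_elapsed(SAMPLES) # method_b gets 60% of the profile
```

With equal weights the delayed sample looks like one tick among five; with elapsed-time weights it accounts for 6 of the 10 ms that actually passed.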
 
- - **Safepoint-based, but accurate**: Unlike signal-based profilers (e.g., stackprof), rperf samples at safepoints. Safepoint sampling is safer — no async-signal-safety constraints, so backtraces and VM state (GC phase, GVL ownership) can be inspected reliably. The downside is less precise sampling timing, but rperf compensates by using actual time deltas as sample weights — so the profiling results faithfully reflect where time is actually spent.
- - **GVL & GC visibility** (wall mode): Attributes off-GVL time, GVL contention, and GC phases to the responsible call stacks with synthetic frames.
- - **Low overhead**: No extra thread on Linux (signal-based timer). Sampling overhead is ~1-5 us per sample.
- - **pprof compatible**: Output works with `go tool pprof`, speedscope, and other standard tools.
- - **No code changes required**: Profile any Ruby program via CLI (`rperf stat ruby app.rb`) or environment variables (`RPERF_ENABLED=1`).
- - **perf-like CLI**: Familiar subcommand interface — `record`, `stat`, `report`, `diff` — inspired by Linux perf.
+ ### Limitations
 
- ### Cons
+ - **Method-level only** — no line-level granularity.
+ - **Ruby >= 3.4.0** — uses recent VM internals (postponed jobs, thread event hooks).
+ - **POSIX only** — Linux, macOS. No Windows.
+ - **No fork support** — profiling does not follow fork(2) child processes.
 
- - **Method-level only**: Profiles at the method level, not the line level. You can see which method is slow, but not which line within it.
- - **Ruby >= 3.4.0**: Requires recent Ruby for the internal APIs used (postponed jobs, thread event hooks).
- - **POSIX only**: Linux, macOS, etc. No Windows support.
- - **Safepoint sampling**: Cannot sample inside C extensions or during long-running C calls that don't reach a safepoint. Time spent there is attributed to the next sample.
 
  ## Output Formats
 
@@ -146,4 +187,4 @@ Format is auto-detected from extension, or set explicitly with `--format`.
 
  ## License
 
- MIT
+ MIT
data/docs/help.md CHANGED
@@ -117,6 +117,7 @@ Rperf.save("profile.txt", data)
  output: File path to write on stop (String or nil)
  verbose: Print statistics to stderr (true/false, default: false)
  format: :pprof, :collapsed, :text, or nil for auto-detect (Symbol or nil)
+ defer: Start with timer paused; use Rperf.profile to activate (default: false)
 
  ### Rperf.stop return value
 
@@ -130,22 +131,159 @@ nil if profiler was not running; otherwise a Hash:
  detected_thread_count: 4, # threads seen during profiling
  start_time_ns: 17740..., # CLOCK_REALTIME epoch nanos
  duration_ns: 10000000, # profiling duration in nanos
- aggregated_samples: [ # when aggregate: true (default)
-   [frames, weight, seq], # frames: [[path, label], ...] deepest-first
-   ... # weight: Integer (nanoseconds, merged per unique stack)
- ], # seq: Integer (thread sequence, 1-based)
+ aggregated_samples: [ # when aggregate: true (default)
+   [frames, weight, seq, label_set_id], # frames: [[path, label], ...] deepest-first
+   ... # weight: Integer (nanoseconds, merged per unique stack)
+ ], # seq: Integer (thread sequence, 1-based)
+ # label_set_id: Integer (0 = no labels)
+ label_sets: [{}, {request: "abc"}, ...], # label set table (index = label_set_id)
  # --- OR ---
- raw_samples: [ # when aggregate: false
-   [frames, weight, seq], # one entry per timer sample (not merged)
+ raw_samples: [ # when aggregate: false
+   [frames, weight, seq, label_set_id], # one entry per timer sample (not merged)
    ...
  ] }
  ```
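
As a sketch of how the new `label_set_id` column relates to the `label_sets` table, the snippet below sums sample weight per label set. It runs on a hand-written hash in the documented shape rather than on real profiler output; the constant `STOP_RESULT`, the helper `weight_by_labels`, and every value in the data are assumptions made for illustration.

```ruby
# Hand-written data in the documented shape; a real hash would come from
# Rperf.stop or Rperf.snapshot. All values here are invented.
STOP_RESULT = {
  aggregated_samples: [
    # [frames (deepest-first, each [path, label]), weight_ns, seq, label_set_id]
    [[["app.rb", "Object#query"], ["app.rb", "<main>"]], 3_000_000, 1, 1],
    [[["app.rb", "Object#render"], ["app.rb", "<main>"]], 1_000_000, 1, 1],
    [[["app.rb", "Object#idle"], ["app.rb", "<main>"]], 2_000_000, 2, 0],
  ],
  label_sets: [{}, { request: "abc" }],
}.freeze

# Sum nanoseconds per label set, resolving each label_set_id through the
# label_sets table (index 0 is the empty set, meaning "no labels").
def weight_by_labels(result)
  totals = Hash.new(0)
  result[:aggregated_samples].each do |_frames, weight, _seq, label_set_id|
    totals[result[:label_sets][label_set_id]] += weight
  end
  totals
end

weight_by_labels(STOP_RESULT).each do |labels, ns|
  puts format("%-24s %8.1f ms", labels.inspect, ns / 1e6)
end
```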
 
+ ### Rperf.snapshot(clear: false)
+
+ Returns a snapshot of the current profiling data without stopping.
+ Only works in aggregate mode (the default). Returns nil if not profiling.
+
+ When `clear: true` is given, resets aggregated data after taking the snapshot.
+ This enables interval-based profiling where each snapshot covers only the
+ period since the last clear.
+
+ ```ruby
+ Rperf.start(frequency: 1000)
+ # ... work ...
+ snap = Rperf.snapshot # read data without stopping
+ Rperf.save("snap.pb.gz", snap)
+ # ... more work ...
+ data = Rperf.stop
+ ```
+
+ Interval-based usage:
+
+ ```ruby
+ Rperf.start(frequency: 1000)
+ loop do
+   sleep 10
+   snap = Rperf.snapshot(clear: true) # each snapshot covers the last 10s
+   Rperf.save("profile-#{Time.now.to_i}.pb.gz", snap)
+ end
+ ```
+
+ ### Rperf.label(**labels, &block)
+
+ Attaches key-value labels to the current thread's samples. Labels appear
+ in pprof sample labels, enabling per-context filtering (e.g., per-request).
+ If profiling is not running, labels are silently ignored (no error).
+
+ ```ruby
+ # Block form — labels are restored when the block exits
+ Rperf.label(request: "abc-123", endpoint: "/api/users") do
+   handle_request # samples inside get these labels
+ end
+ # labels are restored to previous state here
+
+ # Without block — labels persist until changed
+ Rperf.label(request: "abc-123")
+
+ # Merge — new labels merge with existing ones
+ Rperf.label(phase: "db") # adds phase, keeps request
+
+ # Delete a key — set value to nil
+ Rperf.label(request: nil) # removes request key
+
+ # Nested blocks — each block restores its entry state
+ Rperf.label(request: "abc") do
+   Rperf.label(phase: "db") do
+     Rperf.labels #=> {request: "abc", phase: "db"}
+   end
+   Rperf.labels #=> {request: "abc"}
+ end
+ Rperf.labels #=> {}
+ ```
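
The merge, delete, and restore rules documented above can be modelled in a few lines of plain Ruby. This is a hypothetical re-implementation for illustration, not rperf's code: the module name `LabelSketch` and its storage key are invented, and it only tracks the label hash without attaching anything to samples.

```ruby
# Toy model of the documented label semantics; not rperf's implementation.
module LabelSketch
  KEY = :label_sketch_labels # hypothetical thread-variable name

  # Current thread's labels; an empty hash if none are set.
  def self.labels
    Thread.current.thread_variable_get(KEY) || {}
  end

  def self.label(**new_labels)
    previous = labels
    # Merge new labels into the existing ones; a nil value deletes that key.
    merged = previous.merge(new_labels).compact.freeze
    Thread.current.thread_variable_set(KEY, merged)
    return merged unless block_given?

    begin
      yield
    ensure
      # Block form: restore whatever was in effect when the block was entered.
      Thread.current.thread_variable_set(KEY, previous)
    end
  end
end

LabelSketch.label(request: "abc") do
  LabelSketch.label(phase: "db") do
    p LabelSketch.labels # both keys are visible here
  end
  p LabelSketch.labels   # only request remains
end
p LabelSketch.labels     # back to the state before the outer block
```

Restoring the saved hash in `ensure` means an exception raised inside the block still leaves the labels as they were on entry.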
+
+ In pprof output, use labels for filtering and grouping:
+
+     go tool pprof -tagfocus=request=abc-123 profile.pb.gz
+     go tool pprof -tagroot=request profile.pb.gz
+     go tool pprof -tagleaf=request profile.pb.gz
+
+ ### Rperf.start with defer: true
+
+ With `defer: true`, the profiler infrastructure is set up but the sampling
+ timer does not start. Use `Rperf.profile` to activate the timer for specific
+ sections. Outside `profile` blocks, overhead is zero.
+
+ ### Rperf.profile(**labels, &block)
+
+ Activates the sampling timer for the block duration and applies labels.
+ Designed for use with `start(defer: true)` to profile only specific
+ code paths.
+
+ ```ruby
+ Rperf.start(defer: true, mode: :wall)
+
+ Rperf.profile(endpoint: "/users") do
+   handle_request # sampled with endpoint="/users"
+ end
+ # timer paused — zero overhead
+
+ data = Rperf.stop
+ ```
+
+ Nesting is supported: timer stays active until the outermost block exits.
+ Also works with `start(defer: false)` — applies labels only (timer already
+ running). Raises `RuntimeError` if not started, `ArgumentError` without block.
+
+ ### Rperf.labels
+
+ Returns the current thread's labels as a Hash. Empty hash if none set.
+
  ### Rperf.save(path, data, format: nil)
 
  Writes data to path. format: :pprof, :collapsed, or :text.
  nil auto-detects from extension.
 
+ ### Rperf::RackMiddleware (Rack)
+
+ Labels samples with the request endpoint. Requires `require "rperf/rack"`.
+
+ ```ruby
+ # Rails
+ Rails.application.config.middleware.use Rperf::RackMiddleware
+
+ # Sinatra
+ use Rperf::RackMiddleware
+ ```
+
+ The middleware uses `Rperf.profile` to activate the timer and set labels.
+ Start profiling separately. Option: `label_key:` (default: `:endpoint`).
+
+ ### Rperf::ActiveJobMiddleware
+
+ Labels samples with the job class name. Requires `require "rperf/active_job"`.
+
+ ```ruby
+ class ApplicationJob < ActiveJob::Base
+   include Rperf::ActiveJobMiddleware
+ end
+ ```
+
+ ### Rperf::SidekiqMiddleware
+
+ Labels samples with the worker class name. Requires `require "rperf/sidekiq"`.
+
+ ```ruby
+ Sidekiq.configure_server do |config|
+   config.server_middleware do |chain|
+     chain.add Rperf::SidekiqMiddleware
+   end
+ end
+ ```
+
  ## PROFILING MODES
 
  - **cpu** — Measures per-thread CPU time via Linux thread clock.
@@ -175,11 +313,20 @@ Embedded metadata:
  Sample labels:
 
  thread_seq thread sequence number (1-based, assigned per profiling session)
+ <user labels> custom key-value labels set via Rperf.label()
 
  View comments: `go tool pprof -comments profile.pb.gz`
 
  Group by thread: `go tool pprof -tagroot=thread_seq profile.pb.gz`
 
+ Filter by label: `go tool pprof -tagfocus=request=abc-123 profile.pb.gz`
+
+ Group by label (root): `go tool pprof -tagroot=request profile.pb.gz`
+
+ Group by label (leaf): `go tool pprof -tagleaf=request profile.pb.gz`
+
+ Exclude by label: `go tool pprof -tagignore=request=healthcheck profile.pb.gz`
+
  ### collapsed
 
  Plain text. One line per unique stack: `frame1;frame2;...;leaf weight`
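
As a rough illustration of the `frame1;frame2;...;leaf weight` layout, the sketch below turns sample tuples in the shape documented for `Rperf.stop` into collapsed lines. It is not rperf's writer: the helper `to_collapsed` and the sample data are invented, and it assumes that only the frame label (not the path) is emitted and that the weight is written in nanoseconds.

```ruby
# Hand-written samples in the shape documented for Rperf.stop:
# [frames (deepest-first, each [path, label]), weight_ns, thread_seq, label_set_id]
COLLAPSED_INPUT = [
  [[["fib.rb", "Object#fibonacci"], ["fib.rb", "<main>"]], 2_000_000, 1, 0],
  [[["fib.rb", "<main>"]], 500_000, 1, 0],
].freeze

# Build one "root;...;leaf weight" line per sample.
def to_collapsed(samples)
  samples.map do |frames, weight, _seq, _label_set_id|
    # Frames arrive deepest-first, so reverse them to put the root first
    # and the leaf last, then join the labels with ";".
    stack = frames.reverse.map { |_path, label| label }.join(";")
    "#{stack} #{weight}"
  end
end

puts to_collapsed(COLLAPSED_INPUT)
```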
data/exe/rperf CHANGED
@@ -80,7 +80,7 @@ USAGE = "Usage: rperf record [options] command [args...]\n" \
  # Handle top-level flags before subcommand parsing
  case ARGV.first
  when "-v", "--version"
-   require "rperf"
+   require_relative "../lib/rperf"
    puts "rperf #{Rperf::VERSION}"
    exit
  when "-h", "--help"