rperf 0.3.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: a5e10797e7670051bb82e49f32a80bac5371c9bd7652809ece4894a7d508c4bf
4
+ data.tar.gz: b577d93730398a5b91ab89df80e0cec422300839ec3d9879043b711285d4c4c2
5
+ SHA512:
6
+ metadata.gz: 2b5eb6e2125e2155937af009e084b43ff4ea4a5599a4b9d2015f3d6cd13a86f6644ecf05b58a383867853bb87e2017a4097d0f9c34662622dfaafba21efdd98c
7
+ data.tar.gz: e3585af44f4cfbb5bace10a7ea127d801035006123a45a54aa1c2b095aeb93f8b98a41041f39cda62a5bda7e16ce4ee5acb6a88290224db5b3162e08c969f6da
data/README.md ADDED
@@ -0,0 +1,149 @@
1
+ <p align="center">
2
+ <img src="docs/logo.svg" alt="rperf logo" width="260">
3
+ </p>
4
+
5
+ # rperf
6
+
7
+ A safepoint-based sampling performance profiler for Ruby. Uses actual time deltas as sample weights to correct safepoint bias.
8
+
9
+ - Requires Ruby >= 3.4.0
10
+ - Output: pprof protobuf, collapsed stacks, or text report
11
+ - Modes: CPU time (per-thread) and wall time (with GVL/GC tracking)
12
+ - [Online manual](https://ko1.github.io/rperf/docs/manual/) | [GitHub](https://github.com/ko1/rperf)
13
+
14
+ ## Quick Start
15
+
16
+ ```bash
17
+ gem install rperf
18
+
19
+ # Performance summary (wall mode, prints to stderr)
20
+ rperf stat ruby app.rb
21
+
22
+ # Profile to file
23
+ rperf record ruby app.rb # → rperf.data (pprof, cpu mode)
24
+ rperf record -m wall -o profile.pb.gz ruby server.rb # wall mode, custom output
25
+
26
+ # View results (report/diff require Go: https://go.dev/dl/)
27
+ rperf report # open rperf.data in browser
28
+ rperf report --top profile.pb.gz # print top functions to terminal
29
+
30
+ # Compare two profiles
31
+ rperf diff before.pb.gz after.pb.gz # open diff in browser
32
+ rperf diff --top before.pb.gz after.pb.gz # print diff to terminal
33
+ ```
34
+
35
+ ### Ruby API
36
+
37
+ ```ruby
38
+ require "rperf"
39
+
40
+ # Block form — profiles and saves to file
41
+ Rperf.start(output: "profile.pb.gz", frequency: 500, mode: :cpu) do
42
+ # code to profile
43
+ end
44
+
45
+ # Manual start/stop
46
+ Rperf.start(frequency: 1000, mode: :wall)
47
+ # ...
48
+ data = Rperf.stop
49
+ Rperf.save("profile.pb.gz", data)
50
+ ```
51
+
52
+ ### Environment Variables
53
+
54
+ Profile without code changes (e.g., Rails):
55
+
56
+ ```bash
57
+ RPERF_ENABLED=1 RPERF_MODE=wall RPERF_OUTPUT=profile.pb.gz ruby app.rb
58
+ ```
59
+
60
+ Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/).
61
+
62
+ ## Subcommands
63
+
64
+ Inspired by Linux `perf` — familiar subcommand interface for profiling workflows.
65
+
66
+ | Command | Description |
67
+ |---------|-------------|
68
+ | `rperf record` | Profile a command and save to file |
69
+ | `rperf stat` | Profile a command and print summary to stderr |
70
+ | `rperf report` | Open pprof profile with `go tool pprof` (requires Go) |
71
+ | `rperf diff` | Compare two pprof profiles (requires Go) |
72
+ | `rperf help` | Show full reference documentation |
73
+
74
+ ## How It Works
75
+
76
+ ### The Problem
77
+
78
+ Ruby's sampling profilers collect stack traces at **safepoints**, not at the exact timer tick. Traditional profilers assign equal weight to every sample, so if a safepoint is delayed 5ms, that delay is invisible.
79
+
80
+ ### The Solution
81
+
82
+ rperf uses **time deltas as sample weights**:
83
+
84
+ ```
85
+ Timer (signal or thread) VM thread (postponed job)
86
+ ──────────────────────── ────────────────────────
87
+ every 1/frequency sec: at next safepoint:
88
+ rb_postponed_job_trigger() → rperf_sample_job()
89
+ time_now = read_clock()
90
+ weight = time_now - prev_time
91
+ record(backtrace, weight)
92
+ ```
93
+
94
+ On Linux, the timer uses `timer_create` + signal delivery (no extra thread).
95
+ On other platforms, a dedicated pthread with `nanosleep` is used.
96
+
97
+ If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
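The weighting rule described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the timeline, the method names (`parse`, `render`), and the timestamps are hypothetical, and the code does not use rperf itself. It compares counting samples equally against weighting each sample by the time elapsed since the previous one.

```ruby
# Hypothetical timeline: the timer fires every 1 ms, but one safepoint is
# reached late. Each entry is [method_name, time_ns when the sample ran].
samples = [
  ["parse",  1_000_000],
  ["parse",  2_000_000],
  ["render", 8_000_000], # safepoint delayed by about 5 ms
  ["parse",  9_000_000],
]

equal_weight = Hash.new(0) # traditional: every sample counts the same
delta_weight = Hash.new(0) # delta-based: weight = now - previous sample time

prev_time = 0
samples.each do |name, time_now|
  equal_weight[name] += 1
  delta_weight[name] += time_now - prev_time
  prev_time = time_now
end

total = delta_weight.values.sum
delta_weight.each do |name, ns|
  puts format("%-8s %5.1f%% by time delta, %5.1f%% by sample count",
              name, 100.0 * ns / total, 100.0 * equal_weight[name] / samples.size)
end
```

With equal weights the delayed sample looks like a quarter of the run; with time deltas it accounts for two thirds, and the weights sum to the full 9 ms of the timeline.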
98
+
99
+ ### Modes
100
+
101
+ | Mode | Clock | What it measures |
102
+ |------|-------|------------------|
103
+ | `cpu` (default) | `CLOCK_THREAD_CPUTIME_ID` | CPU time consumed (excludes sleep/I/O) |
104
+ | `wall` | `CLOCK_MONOTONIC` | Real elapsed time (includes everything) |
105
+
106
+ Use `cpu` to find what consumes CPU. Use `wall` to find what makes things slow (I/O, GVL contention, GC).
107
+
108
+ ### Synthetic Frames (wall mode)
109
+
110
+ rperf hooks GVL and GC events to attribute non-CPU time:
111
+
112
+ | Frame | Meaning |
113
+ |-------|---------|
114
+ | `[GVL blocked]` | Off-GVL time (I/O, sleep, C extension releasing GVL) |
115
+ | `[GVL wait]` | Waiting to reacquire the GVL (contention) |
116
+ | `[GC marking]` | Time in GC mark phase |
117
+ | `[GC sweeping]` | Time in GC sweep phase |
118
+
119
+ ## Pros & Cons
120
+
121
+ ### Pros
122
+
123
+ - **Safepoint-based, but accurate**: Unlike signal-based profilers (e.g., stackprof), rperf samples at safepoints. Safepoint sampling is safer — no async-signal-safety constraints, so backtraces and VM state (GC phase, GVL ownership) can be inspected reliably. The downside is less precise sampling timing, but rperf compensates by using actual time deltas as sample weights — so the profiling results faithfully reflect where time is actually spent.
124
+ - **GVL & GC visibility** (wall mode): Attributes off-GVL time, GVL contention, and GC phases to the responsible call stacks with synthetic frames.
125
+ - **Low overhead**: No extra thread on Linux (signal-based timer). Sampling overhead is ~1-5 us per sample.
126
+ - **pprof compatible**: Output works with `go tool pprof`, speedscope, and other standard tools.
127
+ - **No code changes required**: Profile any Ruby program via CLI (`rperf stat ruby app.rb`) or environment variables (`RPERF_ENABLED=1`).
128
+ - **perf-like CLI**: Familiar subcommand interface — `record`, `stat`, `report`, `diff` — inspired by Linux perf.
129
+
130
+ ### Cons
131
+
132
+ - **Method-level only**: Profiles at the method level, not the line level. You can see which method is slow, but not which line within it.
133
+ - **Ruby >= 3.4.0**: Requires recent Ruby for the internal APIs used (postponed jobs, thread event hooks).
134
+ - **POSIX only**: Linux, macOS, etc. No Windows support.
135
+ - **Safepoint sampling**: Cannot sample inside C extensions or during long-running C calls that don't reach a safepoint. Time spent there is attributed to the next sample.
136
+
137
+ ## Output Formats
138
+
139
+ | Format | Extension | Use case |
140
+ |--------|-----------|----------|
141
+ | pprof (default) | `.pb.gz` | `rperf report`, `go tool pprof`, speedscope |
142
+ | collapsed | `.collapsed` | FlameGraph (`flamegraph.pl`), speedscope |
143
+ | text | `.txt` | Human/AI-readable flat + cumulative report |
144
+
145
+ Format is auto-detected from extension, or set explicitly with `--format`.
146
+
147
+ ## License
148
+
149
+ MIT
data/docs/help.md ADDED
@@ -0,0 +1,291 @@
1
+ # rperf - safepoint-based sampling performance profiler for Ruby
2
+
3
+ ## OVERVIEW
4
+
5
+ rperf profiles Ruby programs by sampling at safepoints and using actual
6
+ time deltas (nanoseconds) as weights to correct safepoint bias.
7
+ Runs on POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
8
+
9
+ ## CLI USAGE
10
+
11
+ rperf record [options] command [args...]
12
+ rperf stat [options] command [args...]
13
+ rperf report [options] [file]
14
+ rperf help
15
+
16
+ ### record: Profile and save to file.
17
+
18
+ -o, --output PATH Output file (default: rperf.data)
19
+ -f, --frequency HZ Sampling frequency in Hz (default: 1000)
20
+ -m, --mode MODE cpu or wall (default: cpu)
21
+ --format FORMAT pprof, collapsed, or text (default: auto from extension)
22
+ --signal VALUE Timer signal (Linux only): signal number, or 'false'
23
+ for nanosleep thread (default: auto)
24
+ -v, --verbose Print sampling statistics to stderr
25
+
26
+ ### stat: Run command and print performance summary to stderr.
27
+
28
+ Always uses wall mode. No file output by default.
29
+
30
+ -o, --output PATH Also save profile to file (default: none)
31
+ -f, --frequency HZ Sampling frequency in Hz (default: 1000)
32
+ --signal VALUE Timer signal (Linux only): signal number, or 'false'
33
+ for nanosleep thread (default: auto)
34
+ -v, --verbose Print additional sampling statistics
35
+
36
+ Shows: user/sys/real time, time breakdown (CPU execution, GVL blocked,
37
+ GVL wait, GC marking, GC sweeping), and top 5 hot functions.
38
+
39
+ ### report: Open pprof profile with go tool pprof. Requires Go.
40
+
41
+ --top Print top functions by flat time
42
+ --text Print text report
43
+
44
+ Default (no flag): opens interactive web UI in browser.
45
+ Default file: rperf.data
46
+
47
+ ### diff: Compare two pprof profiles (target - base). Requires Go.
48
+
49
+ --top Print top functions by diff
50
+ --text Print text diff report
51
+
52
+ Default (no flag): opens diff in browser.
53
+
54
+ ### Examples
55
+
56
+ rperf record ruby app.rb
57
+ rperf record -o profile.pb.gz ruby app.rb
58
+ rperf record -m wall -f 500 -o profile.pb.gz ruby server.rb
59
+ rperf record -o profile.collapsed ruby app.rb
60
+ rperf record -o profile.txt ruby app.rb
61
+ rperf stat ruby app.rb
62
+ rperf stat -o profile.pb.gz ruby app.rb
63
+ rperf report
64
+ rperf report --top profile.pb.gz
65
+ rperf diff before.pb.gz after.pb.gz
66
+ rperf diff --top before.pb.gz after.pb.gz
67
+
68
+ ## RUBY API
69
+
70
+ ```ruby
71
+ require "rperf"
72
+
73
+ # Block form (recommended) — profiles the block and writes to file
74
+ Rperf.start(output: "profile.pb.gz", frequency: 500, mode: :cpu) do
75
+ # code to profile
76
+ end
77
+
78
+ # Manual start/stop — returns data hash for programmatic use
79
+ Rperf.start(frequency: 1000, mode: :wall)
80
+ # ... code to profile ...
81
+ data = Rperf.stop
82
+
83
+ # Save data to file later
84
+ Rperf.save("profile.pb.gz", data)
85
+ Rperf.save("profile.collapsed", data)
86
+ Rperf.save("profile.txt", data)
87
+ ```
88
+
89
+ ### Rperf.start parameters
90
+
91
+ frequency: Sampling frequency in Hz (Integer, default: 1000)
92
+ mode: :cpu or :wall (Symbol, default: :cpu)
93
+ output: File path to write on stop (String or nil)
94
+ verbose: Print statistics to stderr (true/false, default: false)
95
+ format: :pprof, :collapsed, :text, or nil for auto-detect (Symbol or nil)
96
+
97
+ ### Rperf.stop return value
98
+
99
+ nil if profiler was not running; otherwise a Hash:
100
+
101
+ ```ruby
102
+ { mode: :cpu, # or :wall
103
+ frequency: 500,
104
+ sampling_count: 1234,
105
+ sampling_time_ns: 56789,
106
+ start_time_ns: 17740..., # CLOCK_REALTIME epoch nanos
107
+ duration_ns: 10000000, # profiling duration in nanos
108
+ samples: [ # Array of [frames, weight, thread_seq]
109
+ [frames, weight, seq], # frames: [[path, label], ...] deepest-first
110
+ ... # weight: Integer (nanoseconds)
111
+ ] } # seq: Integer (thread sequence, 1-based)
112
+ ```
113
+
114
+ ### Rperf.save(path, data, format: nil)
115
+
116
+ Writes data to path. format: :pprof, :collapsed, or :text.
117
+ nil auto-detects from extension.
118
+
119
+ ## PROFILING MODES
120
+
121
+ - **cpu** — Measures per-thread CPU time via the per-thread CPU clock (CLOCK_THREAD_CPUTIME_ID).
122
+ Use for: finding functions that consume CPU cycles.
123
+ Ignores time spent sleeping, in I/O, or waiting for GVL.
124
+
125
+ - **wall** — Measures wall-clock time (CLOCK_MONOTONIC).
126
+ Use for: finding where wall time goes, including I/O, sleep, GVL
127
+ contention, and off-CPU waits.
128
+ Includes synthetic frames (see below).
129
+
130
+ ## OUTPUT FORMATS
131
+
132
+ ### pprof (default)
133
+
134
+ Gzip-compressed protobuf. Standard pprof format.
135
+ Extension convention: `.pb.gz`
136
+ View with: `go tool pprof`, pprof-rs, or speedscope (via import).
137
+
138
+ Embedded metadata:
139
+
140
+ comment rperf version, mode, frequency, Ruby version
141
+ time_nanos profile collection start time (epoch nanoseconds)
142
+ duration_nanos profile duration (nanoseconds)
143
+ doc_url link to this documentation
144
+
145
+ Sample labels:
146
+
147
+ thread_seq thread sequence number (1-based, assigned per profiling session)
148
+
149
+ View comments: `go tool pprof -comments profile.pb.gz`
150
+
151
+ Group by thread: `go tool pprof -tagroot=thread_seq profile.pb.gz`
152
+
153
+ ### collapsed
154
+
155
+ Plain text. One line per unique stack: `frame1;frame2;...;leaf weight`
156
+ Frames are semicolon-separated, bottom-to-top. Weight in nanoseconds.
157
+ Extension convention: `.collapsed`
158
+ Compatible with: FlameGraph (flamegraph.pl), speedscope.
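As an illustration of the collapsed layout, the sketch below turns sample data in the shape documented for `Rperf.stop` (`[frames, weight, thread_seq]`, frames deepest-first) into collapsed lines. The sample values are hypothetical, and using only the frame label as the frame text is an assumption; rperf's own writer may format frames differently.

```ruby
# Hypothetical samples in the documented shape:
# [frames, weight_ns, thread_seq], with frames listed deepest-first.
samples = [
  [[["app.rb", "Integer#times"], ["app.rb", "Object#work"], ["app.rb", "<main>"]], 3_000_000, 1],
  [[["app.rb", "Object#work"], ["app.rb", "<main>"]], 1_000_000, 1],
  [[["app.rb", "Integer#times"], ["app.rb", "Object#work"], ["app.rb", "<main>"]], 2_000_000, 1],
]

totals = Hash.new(0)
samples.each do |frames, weight, _seq|
  # Collapsed stacks run bottom-to-top, so reverse the deepest-first frames.
  # Assumption: the label alone is used as the frame text.
  stack = frames.reverse.map { |_path, label| label }.join(";")
  totals[stack] += weight # one line per unique stack, weights summed
end

collapsed = totals.map { |stack, weight| "#{stack} #{weight}" }
puts collapsed
```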
159
+
160
+ ### text
161
+
162
+ Human/AI-readable report. Shows total time, then flat and cumulative
163
+ top-N tables sorted by weight descending. No parsing needed.
164
+ Extension convention: `.txt`
165
+
166
+ Example output:
167
+
168
+ Total: 1523.4ms (cpu)
169
+ Samples: 4820, Frequency: 500Hz
170
+
171
+ Flat:
172
+ 820.3ms 53.8% Array#each (app/models/user.rb)
173
+ 312.1ms 20.5% JSON.parse (lib/json/parser.rb)
174
+ ...
175
+
176
+ Cumulative:
177
+ 1401.2ms 92.0% UsersController#index (app/controllers/users_controller.rb)
178
+ ...
179
+
180
+ ### Format auto-detection
181
+
182
+ Format is auto-detected from the output file extension:
183
+
184
+ .collapsed → collapsed
185
+ .txt → text
186
+ anything else → pprof
187
+
188
+ The `--format` flag (CLI) or `format:` parameter (API) overrides auto-detect.
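The detection rule above is simple enough to restate as code. The helper name `detect_format` is hypothetical; this is a sketch of the documented behavior, not rperf's actual implementation.

```ruby
# Sketch of the documented rule: an explicit format wins, otherwise the
# file extension decides, and anything unrecognized falls back to pprof.
def detect_format(path, format: nil)
  return format if format

  case File.extname(path)
  when ".collapsed" then :collapsed
  when ".txt"       then :text
  else                   :pprof # covers .pb.gz and the default rperf.data
  end
end

puts detect_format("profile.collapsed")
puts detect_format("profile.pb.gz")
puts detect_format("report.txt", format: :pprof)
```

Note that `File.extname("profile.pb.gz")` returns `".gz"`, which is why the pprof case is handled by the fallback branch rather than by matching `.pb.gz` explicitly.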
189
+
190
+ ## SYNTHETIC FRAMES
191
+
192
+ In wall mode, rperf adds synthetic frames that represent non-CPU time:
193
+
194
+ - **[GVL blocked]** — Time the thread spent off-GVL (I/O, sleep, C extension
195
+ releasing GVL). Attributed to the stack at SUSPENDED.
196
+ - **[GVL wait]** — Time the thread spent waiting to reacquire the GVL after
197
+ becoming ready. Indicates GVL contention. Same stack.
198
+
199
+ In both modes, GC time is tracked:
200
+
201
+ - **[GC marking]** — Time spent in GC marking phase (wall time).
202
+ - **[GC sweeping]** — Time spent in GC sweeping phase (wall time).
203
+
204
+ These always appear as the leaf (deepest) frame in a sample.
205
+
206
+ ## INTERPRETING RESULTS
207
+
208
+ Weight unit is always nanoseconds regardless of mode.
209
+
210
+ - **Flat time**: weight attributed directly to a function (it was the leaf).
211
+ - **Cumulative time**: weight for all samples where the function appears
212
+ anywhere in the stack.
213
+
214
+ High flat time → the function itself is expensive.
215
+ High cum but low flat → the function calls expensive children.
216
+
217
+ To convert: 1,000,000 ns = 1 ms, 1,000,000,000 ns = 1 s.
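The flat and cumulative definitions above can be computed directly from sample data in the documented `[frames, weight, thread_seq]` shape. The sample values and method names below are hypothetical; the sketch only shows the arithmetic, not how rperf or pprof implement their reports.

```ruby
# Hypothetical samples: frames are [path, label] pairs, deepest-first.
samples = [
  [[["app.rb", "Parser#parse"], ["app.rb", "App#index"], ["app.rb", "<main>"]], 6_000_000, 1],
  [[["app.rb", "App#index"], ["app.rb", "<main>"]], 1_000_000, 1],
  [[["app.rb", "View#render"], ["app.rb", "App#index"], ["app.rb", "<main>"]], 3_000_000, 1],
]

flat = Hash.new(0)
cum  = Hash.new(0)

samples.each do |frames, weight, _seq|
  labels = frames.map { |_path, label| label }
  flat[labels.first] += weight                      # deepest frame is the leaf
  labels.uniq.each { |label| cum[label] += weight } # count recursion once per sample
end

cum.sort_by { |_label, ns| -ns }.each do |label, ns|
  puts format("%8.1fms flat %8.1fms cum  %s", flat[label] / 1e6, ns / 1e6, label)
end
```

Here `App#index` has high cumulative time (10 ms) but low flat time (1 ms), the pattern that points at expensive children rather than the function itself.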
218
+
219
+ ## DIAGNOSING COMMON PERFORMANCE PROBLEMS
220
+
221
+ **Problem: high CPU usage**
222
+ - Mode: cpu
223
+ - Look for: functions with high flat cpu time.
224
+ - Action: optimize the hot function or call it less.
225
+
226
+ **Problem: slow request / high latency**
227
+ - Mode: wall
228
+ - Look for: functions with high cum wall time.
229
+ - If [GVL blocked] is dominant → I/O or sleep is the bottleneck.
230
+ - If [GVL wait] is dominant → GVL contention; reduce GVL-holding work
231
+ or move work to Ractors / child processes.
232
+
233
+ **Problem: GC pauses**
234
+ - Mode: cpu or wall
235
+ - Look for: [GC marking] and [GC sweeping] samples.
236
+ - High [GC marking] → too many live objects; reduce allocations.
237
+ - High [GC sweeping] → too many short-lived objects; reuse or pool.
238
+
239
+ **Problem: multithreaded app slower than expected**
240
+ - Mode: wall
241
+ - Look for: [GVL wait] time across threads.
242
+ - High [GVL wait] means threads are serialized on the GVL.
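For the multithreaded case, one way to look at `[GVL wait]` across threads is to group wall-mode samples by their thread sequence number. The sketch below assumes the documented sample shape and that a synthetic frame appears as the leaf with the label `"[GVL wait]"`; the empty path for synthetic frames and all sample values are hypothetical.

```ruby
# Hypothetical wall-mode samples: [frames, weight_ns, thread_seq],
# frames deepest-first, synthetic frames as the leaf.
samples = [
  [[["", "[GVL wait]"], ["app.rb", "Object#work"]], 4_000_000, 1],
  [[["app.rb", "Object#work"]], 6_000_000, 1],
  [[["", "[GVL wait]"], ["app.rb", "Object#work"]], 7_000_000, 2],
  [[["", "[GVL blocked]"], ["app.rb", "IO#read"]], 1_000_000, 2],
  [[["app.rb", "Object#work"]], 2_000_000, 2],
]

total = Hash.new(0) # wall time per thread
wait  = Hash.new(0) # time per thread spent waiting for the GVL

samples.each do |frames, weight, seq|
  total[seq] += weight
  leaf_label = frames.first[1]
  wait[seq] += weight if leaf_label == "[GVL wait]"
end

wait_share = total.to_h { |seq, ns| [seq, 100.0 * wait[seq] / ns] }
wait_share.each do |seq, pct|
  puts format("thread %d: %.1f%% of wall time in [GVL wait]", seq, pct)
end
```

A high share on most threads suggests the threads are serialized on the GVL, as described above.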
243
+
244
+ ## READING COLLAPSED STACKS PROGRAMMATICALLY
245
+
246
+ Each line: `bottom_frame;...;top_frame weight_ns`
247
+
248
+ ```ruby
249
+ File.readlines("profile.collapsed").each do |line|
250
+ stack, weight = line.rpartition(" ").then { |s, _, w| [s, w.to_i] }
251
+ frames = stack.split(";")
252
+ # frames[0] is bottom (main), frames[-1] is leaf (hot)
253
+ end
254
+ ```
255
+
256
+ ## READING PPROF PROGRAMMATICALLY
257
+
258
+ Decompress + parse protobuf:
259
+
260
+ ```ruby
261
+ require "zlib"; require "stringio"
262
+ raw = Zlib::GzipReader.new(StringIO.new(File.binread("profile.pb.gz"))).read
263
+ # raw is a protobuf binary; use google-protobuf gem or pprof tooling.
264
+ ```
265
+
266
+ Or convert to text with pprof CLI:
267
+
268
+ go tool pprof -text profile.pb.gz
269
+ go tool pprof -top profile.pb.gz
270
+ go tool pprof -tree profile.pb.gz
271
+
272
+ ## ENVIRONMENT VARIABLES
273
+
274
+ Used internally by the CLI to pass options to the auto-started profiler; they can also be set directly to profile without code changes:
275
+
276
+ RPERF_ENABLED=1 Enable auto-start on require
277
+ RPERF_OUTPUT=path Output file path
278
+ RPERF_FREQUENCY=hz Sampling frequency
279
+ RPERF_MODE=cpu|wall Profiling mode
280
+ RPERF_FORMAT=fmt pprof, collapsed, or text
281
+ RPERF_VERBOSE=1 Print statistics
282
+ RPERF_SIGNAL=N|false Timer signal number or 'false' for nanosleep (Linux only)
283
+ RPERF_STAT=1 Enable stat mode (used by rperf stat)
284
+
285
+ ## TIPS
286
+
287
+ - Default frequency (1000 Hz) works well for most cases; overhead is < 0.2%.
288
+ - For long-running production profiling, lower frequency (100-500) reduces overhead further.
289
+ - Profile representative workloads, not micro-benchmarks.
290
+ - Compare cpu and wall profiles to distinguish CPU-bound from I/O-bound.
291
+ - The verbose flag (-v) shows sampling overhead and top functions on stderr.
data/exe/rperf ADDED
@@ -0,0 +1,207 @@
1
+ #!/usr/bin/env ruby
2
+ require "optparse"
3
+ require "socket"
4
+
5
+ def find_available_port
6
+ server = TCPServer.new("localhost", 0)
7
+ port = server.addr[1]
8
+ server.close
9
+ port
10
+ end
11
+
12
+ def run_pprof_subcommand(name, banner, min_files:)
13
+ mode = :http
14
+
15
+ parser = OptionParser.new do |opts|
16
+ opts.banner = banner
17
+
18
+ opts.on("--top", "Print top functions by #{min_files > 1 ? 'diff' : 'flat time'}") do
19
+ mode = :top
20
+ end
21
+
22
+ opts.on("--text", "Print text #{min_files > 1 ? 'diff report' : 'report'}") do
23
+ mode = :text
24
+ end
25
+
26
+ opts.on("-h", "--help", "Show this help") do
27
+ puts opts
28
+ exit
29
+ end
30
+ end
31
+
32
+ begin
33
+ parser.order!(ARGV)
34
+ rescue OptionParser::InvalidOption => e
35
+ $stderr.puts e.message
36
+ $stderr.puts parser
37
+ exit 1
38
+ end
39
+
40
+ if min_files > 1 && ARGV.size < min_files
41
+ $stderr.puts "Two profile files required."
42
+ $stderr.puts parser
43
+ exit 1
44
+ end
45
+
46
+ files = ARGV.shift(min_files > 1 ? [ARGV.size, min_files].min : 1)
47
+ files = ["rperf.data"] if files.empty? && min_files == 1
48
+
49
+ if min_files > 1 && files.size < min_files
50
+ $stderr.puts "Two profile files required."
51
+ $stderr.puts parser
52
+ exit 1
53
+ end
54
+
55
+ files.each do |f|
56
+ unless File.exist?(f)
57
+ $stderr.puts "File not found: #{f}"
58
+ exit 1
59
+ end
60
+ end
61
+
62
+ unless system("go", "version", out: File::NULL, err: File::NULL)
63
+ $stderr.puts "'go' command not found. Install Go to use 'rperf #{name}'."
64
+ $stderr.puts " https://go.dev/dl/"
65
+ exit 1
66
+ end
67
+
68
+ yield mode, files
69
+ end
70
+
71
+ HELP_TEXT = File.read(File.expand_path("../docs/help.md", __dir__))
72
+
73
+ USAGE = "Usage: rperf record [options] command [args...]\n" \
74
+ " rperf stat [options] command [args...]\n" \
75
+ " rperf report [options] [file]\n" \
76
+ " rperf diff [options] base.pb.gz target.pb.gz\n" \
77
+ " rperf help\n"
78
+
79
+ # Handle top-level flags before subcommand parsing
80
+ case ARGV.first
81
+ when "-v", "--version"
82
+ require "rperf"
83
+ puts "rperf #{Rperf::VERSION}"
84
+ exit
85
+ when "-h", "--help"
86
+ puts USAGE
87
+ puts
88
+ puts "Run 'rperf help' for full documentation"
89
+ exit
90
+ end
91
+
92
+ subcommand = ARGV.shift
93
+
94
+ case subcommand
95
+ when "help"
96
+ puts HELP_TEXT
97
+ exit
98
+ when "report"
99
+ run_pprof_subcommand("report",
100
+ "Usage: rperf report [options] [file]\n" \
101
+ " Opens pprof profile in browser (default) or prints summary.\n" \
102
+ " Default file: rperf.data",
103
+ min_files: 1) do |mode, files|
104
+ report_file = files[0]
105
+ case mode
106
+ when :top then exec("go", "tool", "pprof", "-top", report_file)
107
+ when :text then exec("go", "tool", "pprof", "-text", report_file)
108
+ else exec("go", "tool", "pprof", "-http=localhost:#{find_available_port}", report_file)
109
+ end
110
+ end
111
+ when "diff"
112
+ run_pprof_subcommand("diff",
113
+ "Usage: rperf diff [options] base.pb.gz target.pb.gz\n" \
114
+ " Compare two pprof profiles (shows target - base).",
115
+ min_files: 2) do |mode, files|
116
+ base_file, target_file = files
117
+ case mode
118
+ when :top then exec("go", "tool", "pprof", "-top", "-diff_base=#{base_file}", target_file)
119
+ when :text then exec("go", "tool", "pprof", "-text", "-diff_base=#{base_file}", target_file)
120
+ else exec("go", "tool", "pprof", "-http=localhost:#{find_available_port}", "-diff_base=#{base_file}", target_file)
121
+ end
122
+ end
123
+ when "record", "stat"
124
+ # continue below
125
+ else
126
+ $stderr.puts "Unknown subcommand: #{subcommand.inspect}" if subcommand
127
+ $stderr.puts USAGE
128
+ exit 1
129
+ end
130
+
131
+ output = (subcommand == "stat") ? nil : "rperf.data"
132
+ frequency = 1000
133
+ mode = (subcommand == "stat") ? "wall" : "cpu"
134
+ format = nil
135
+ signal = nil
136
+ verbose = false
137
+
138
+ parser = OptionParser.new do |opts|
139
+ opts.banner = USAGE
140
+
141
+ opts.on("-o", "--output PATH", "Output file#{subcommand == 'stat' ? ' (default: none)' : ' (default: rperf.data)'}") do |v|
142
+ output = v
143
+ end
144
+
145
+ opts.on("-f", "--frequency HZ", Integer, "Sampling frequency in Hz (default: 1000)") do |v|
146
+ frequency = v
147
+ end
148
+
149
+ if subcommand == "record"
150
+ opts.on("-m", "--mode MODE", %w[cpu wall], "Profiling mode: cpu or wall (default: cpu)") do |v|
151
+ mode = v
152
+ end
153
+
154
+ opts.on("--format FORMAT", %w[pprof collapsed text],
155
+ "Output format: pprof, collapsed, or text (default: auto from extension)") do |v|
156
+ format = v
157
+ end
158
+ end
159
+
160
+ opts.on("--signal VALUE", "Timer signal (Linux only): signal number, or 'false' for nanosleep thread") do |v|
161
+ signal = v
162
+ end
163
+
164
+ opts.on("-v", "--verbose", "Print sampling statistics to stderr") do
165
+ verbose = true
166
+ end
167
+
168
+ opts.on("-h", "--help", "Show this help") do
169
+ puts opts
170
+ puts
171
+ puts "Run 'rperf help' for full documentation (modes, formats, diagnostics guide, etc.)"
172
+ exit
173
+ end
174
+ end
175
+
176
+ begin
177
+ parser.order!(ARGV)
178
+ rescue OptionParser::InvalidOption => e
179
+ $stderr.puts e.message
180
+ $stderr.puts parser
181
+ exit 1
182
+ end
183
+
184
+ if ARGV.empty?
185
+ $stderr.puts "No command specified."
186
+ $stderr.puts parser
187
+ exit 1
188
+ end
189
+
190
+ # Add lib dir to RUBYLIB so -rrperf can find the extension
191
+ lib_dir = File.expand_path("../lib", __dir__)
192
+ ENV["RUBYLIB"] = [lib_dir, ENV["RUBYLIB"]].compact.join(File::PATH_SEPARATOR)
193
+ ENV["RUBYOPT"] = "-rrperf #{ENV['RUBYOPT']}".strip
194
+ ENV["RPERF_ENABLED"] = "1"
195
+ ENV["RPERF_OUTPUT"] = output if output
196
+ ENV["RPERF_FREQUENCY"] = frequency.to_s
197
+ ENV["RPERF_MODE"] = mode
198
+ ENV["RPERF_FORMAT"] = format if format
199
+ ENV["RPERF_VERBOSE"] = "1" if verbose
200
+ ENV["RPERF_SIGNAL"] = signal if signal
201
+
202
+ if subcommand == "stat"
203
+ ENV["RPERF_STAT"] = "1"
204
+ ENV["RPERF_STAT_COMMAND"] = ARGV.join(" ")
205
+ end
206
+
207
+ exec(*ARGV)
@@ -0,0 +1,6 @@
1
+ require "mkmf"
2
+
3
+ have_header("pthread.h") or abort "pthread.h not found"
4
+ have_library("pthread") or abort "libpthread not found"
5
+
6
+ create_makefile("rperf")