RubyGems - rperf - Versions diffs - 0.5.0 → 0.7.0 - Mend

rperf 0.5.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3413c4c6ed0cdc0897428bf01fc0fec17a4d14f1c2883e9e5afa0cff110247dc
-  data.tar.gz: '097b06203ce4648a860f2816635d6dfac52f8e5987aa381653cec874d52abf7c'
+  metadata.gz: f19061984f2ea33bbcd569c43e8a7ece03071b8fbb442ebe108468eb07d96a14
+  data.tar.gz: 66bee438bd8459db8ce89129cef39bdaaba3ad82e348012c5d220fb5b6f3963f
 SHA512:
-  metadata.gz: 37065071f049a27eb1bab9f859ed39499022489a19aa8ecd91b3dc35cb6052ffb6b2fbc02c67ea46a94e8dba7644f2b23760d72d2dda7b998ccf3c61c304e225
-  data.tar.gz: 686ab430d58e5dd5163ae65a2bd330a76e57cf0dd72e7eac2b7c61621a03007bd724cac20b1e452766870ff33de325855e199bc3d873d004344f7b26b9b6614f
+  metadata.gz: 3a9468eadacbb41afbc751bd767141a3db785a2eaa51e33549503fe160a8adb25f6612c0cc4c61381b8f8442836a970a23d29cda4fe696488ca85d2b048518a2
+  data.tar.gz: 5e8e6c6c24fbb264f352c98511481e5d448b6a07e390c72cc31beeab2aa03699cde22f2f04268531ab50572e7cfd54cab7235359915caa6c6b3c9eb40f26d4e9

data/README.md CHANGED

@@ -2,25 +2,66 @@
   <img src="docs/logo.svg" alt="rperf logo" width="260">
 </p>
-# rperf
+<h1 align="center">rperf</h1>
-A safepoint-based sampling performance profiler for Ruby. Uses actual time deltas as sample weights to correct safepoint bias.
+<p align="center">
+  <strong>Know where your Ruby spends its time — accurately.</strong><br>
+  A sampling profiler that corrects safepoint bias using real time deltas.
+</p>
-- Requires Ruby >= 3.4.0
-- Output: pprof protobuf, collapsed stacks, or text report
-- Modes: CPU time (per-thread) and wall time (with GVL/GC tracking)
-- [Online manual](https://ko1.github.io/rperf/docs/manual/) | [GitHub](https://github.com/ko1/rperf)
+<p align="center">
+  <a href="https://rubygems.org/gems/rperf"><img src="https://img.shields.io/gem/v/rperf.svg" alt="Gem Version"></a>
+  <img src="https://img.shields.io/badge/Ruby-%3E%3D%203.4.0-cc342d" alt="Ruby >= 3.4.0">
+  <a href="https://ko1.github.io/rperf/docs/manual/"><img src="https://img.shields.io/badge/docs-manual-blue" alt="Manual"></a>
+  <img src="https://img.shields.io/badge/license-MIT-green" alt="MIT License">
+</p>
-## Quick Start
+<p align="center">
+  pprof / collapsed stacks / text report &nbsp;·&nbsp; CPU mode & wall mode (GVL + GC tracking)
+</p>
+<p align="center">
+  <a href='https://ko1.github.io/rperf/'>Web site</a>,
+  <a href='https://ko1.github.io/rperf/docs/manual/'>Online manual</a>,
+  <a href='https://github.com/ko1/rperf'>GitHub repository</a>
+</p>
+## See It in Action
 ```bash
-gem install rperf
+$ gem install rperf
+$ rperf exec ruby fib.rb
+ Performance stats for 'ruby fib.rb':
+         2,326.0 ms   user
+            64.5 ms   sys
+         2,035.5 ms   real
+         2,034.2 ms 100.0%  CPU execution
+               1            [Ruby] detected threads
+             7.0 ms         [Ruby] GC time (7 count: 5 minor, 2 major)
+         106,078            [Ruby] allocated objects
+              22 MB         [OS] peak memory (maxrss)
+ Flat:
+         2,034.2 ms 100.0%  Object#fibonacci (fib.rb)
+ Cumulative:
+         2,034.2 ms 100.0%  Object#fibonacci (fib.rb)
+         2,034.2 ms 100.0%  <main> (fib.rb)
+  2034 samples / 2034 triggers, 0.1% profiler overhead
+```
+## Quick Start
+```bash
 # Performance summary (wall mode, prints to stderr)
 rperf stat ruby app.rb
-# Profile to file
-rperf record ruby app.rb                              # → rperf.data (pprof, cpu mode)
+# Record a pprof profile to file
+rperf record ruby app.rb                              # → rperf.data (cpu mode)
 rperf record -m wall -o profile.pb.gz ruby server.rb   # wall mode, custom output
 # View results (report/diff require Go: https://go.dev/dl/)
@@ -67,19 +108,20 @@ Inspired by Linux `perf` — familiar subcommand interface for profiling workflo
 |---------|-------------|
 | `rperf record` | Profile a command and save to file |
 | `rperf stat` | Profile a command and print summary to stderr |
+| `rperf exec` | Profile a command and print full report to stderr |
 | `rperf report` | Open pprof profile with `go tool pprof` (requires Go) |
 | `rperf diff` | Compare two pprof profiles (requires Go) |
 | `rperf help` | Show full reference documentation |
 ## How It Works
-### The Problem
+### The Challenge: Safepoint Sampling
-Ruby's sampling profilers collect stack traces at **safepoints**, not at the exact timer tick. Traditional profilers assign equal weight to every sample, so if a safepoint is delayed 5ms, that delay is invisible.
+Most Ruby profilers (e.g., stackprof) use signal handlers to capture stack traces at the exact moment the timer fires. rperf takes a different approach — it samples at **safepoints** (VM checkpoints), which is safer (no async-signal-safety concerns, reliable access to VM state) but means the sample timing can be delayed. Without correction, this delay would skew the results.
-### The Solution
+### The Fix: Weight = Real Time
-rperf uses **time deltas as sample weights**:
+rperf uses **actual elapsed time as sample weights** — so delayed samples carry proportionally more weight, and the profile matches reality:
 ```
 Timer (signal or thread)         VM thread (postponed job)
@@ -116,23 +158,22 @@ rperf hooks GVL and GC events to attribute non-CPU time:
 | `[GC marking]` | Time in GC mark phase |
 | `[GC sweeping]` | Time in GC sweep phase |
-## Pros & Cons
+## Why rperf?
-### Pros
+- **Accurate despite safepoints** — Safepoint sampling is *safer* (no async-signal-safety issues), but normally *inaccurate*. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
+- **See the whole picture** (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via synthetic frames.
+- **Low overhead** — Signal-based timer on Linux (no extra thread). ~1–5 µs per sample.
+- **pprof compatible** — Works with `go tool pprof`, speedscope, and other standard tools out of the box.
+- **Zero code changes** — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
+- **`perf`-like CLI** — `record`, `stat`, `report`, `diff` — if you know Linux perf, you already know rperf.
-- **Safepoint-based, but accurate**: Unlike signal-based profilers (e.g., stackprof), rperf samples at safepoints. Safepoint sampling is safer — no async-signal-safety constraints, so backtraces and VM state (GC phase, GVL ownership) can be inspected reliably. The downside is less precise sampling timing, but rperf compensates by using actual time deltas as sample weights — so the profiling results faithfully reflect where time is actually spent.
-- **GVL & GC visibility** (wall mode): Attributes off-GVL time, GVL contention, and GC phases to the responsible call stacks with synthetic frames.
-- **Low overhead**: No extra thread on Linux (signal-based timer). Sampling overhead is ~1-5 us per sample.
-- **pprof compatible**: Output works with `go tool pprof`, speedscope, and other standard tools.
-- **No code changes required**: Profile any Ruby program via CLI (`rperf stat ruby app.rb`) or environment variables (`RPERF_ENABLED=1`).
-- **perf-like CLI**: Familiar subcommand interface — `record`, `stat`, `report`, `diff` — inspired by Linux perf.
+### Limitations
-### Cons
+- **Method-level only** — no line-level granularity.
+- **Ruby >= 3.4.0** — uses recent VM internals (postponed jobs, thread event hooks).
+- **POSIX only** — Linux, macOS. No Windows.
+- **No fork support** — profiling does not follow fork(2) child processes.
-- **Method-level only**: Profiles at the method level, not the line level. You can see which method is slow, but not which line within it.
-- **Ruby >= 3.4.0**: Requires recent Ruby for the internal APIs used (postponed jobs, thread event hooks).
-- **POSIX only**: Linux, macOS, etc. No Windows support.
-- **Safepoint sampling**: Cannot sample inside C extensions or during long-running C calls that don't reach a safepoint. Time spent there is attributed to the next sample.
 ## Output Formats
@@ -146,4 +187,4 @@ Format is auto-detected from extension, or set explicitly with `--format`.
 ## License
-MIT
+MIT
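
The README's "Weight = Real Time" section says rperf weights each sample by the time that actually elapsed, so a sample delayed until the next safepoint carries proportionally more weight. The pure-Ruby sketch below illustrates that idea using hypothetical timestamps and stack strings. It is not rperf's implementation, and the helper names (`Sample`, `count_weights`, `time_delta_weights`, `collapsed_lines`) are illustrative only. It prints collapsed-stack lines (`frame1;frame2;...;leaf weight`) so you can compare the two weighting schemes side by side.

```ruby
# Sketch: equal-weight counting vs. time-delta weighting of safepoint samples.
# Not rperf's implementation; the sample data below is hypothetical.

# Each sample is [taken_at_ns, stack]. taken_at_ns is when the sample was
# actually recorded at a safepoint, which may lag behind the timer tick.
Sample = Struct.new(:taken_at_ns, :stack)

# Traditional approach: every sample counts as 1, regardless of delay.
def count_weights(samples)
  samples.each_with_object(Hash.new(0)) { |s, acc| acc[s.stack] += 1 }
end

# Time-delta approach: each sample is weighted by the nanoseconds elapsed
# since the previous sample (or since profiling started, for the first one).
def time_delta_weights(start_ns, samples)
  prev = start_ns
  samples.each_with_object(Hash.new(0)) do |s, acc|
    acc[s.stack] += s.taken_at_ns - prev
    prev = s.taken_at_ns
  end
end

# One collapsed-stack line per unique stack, sorted by stack name.
def collapsed_lines(weights)
  weights.sort.map { |stack, weight| "#{stack} #{weight}" }
end

# Hypothetical run: the timer fires every 1 ms, but the third sample is only
# taken at 7 ms because no safepoint was reached during a long-running call.
samples = [
  Sample.new(1_000_000, "<main>;Object#fast"),
  Sample.new(2_000_000, "<main>;Object#fast"),
  Sample.new(7_000_000, "<main>;Object#slow"),
  Sample.new(8_000_000, "<main>;Object#fast"),
]

puts collapsed_lines(count_weights(samples))         # weights are sample counts
puts collapsed_lines(time_delta_weights(0, samples)) # weights are nanoseconds
```

With equal weights, the delayed `Object#slow` sample looks like 1 of 4 samples (25%). With time-delta weights, it accounts for 5 of the 8 ms (62.5%). This is the kind of distortion the README says rperf's weighting corrects.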

data/docs/help.md CHANGED

@@ -117,6 +117,7 @@ Rperf.save("profile.txt", data)
     output:     File path to write on stop (String or nil)
     verbose:    Print statistics to stderr (true/false, default: false)
     format:     :pprof, :collapsed, :text, or nil for auto-detect (Symbol or nil)
+    defer:      Start with timer paused; use Rperf.profile to activate (default: false)
 ### Rperf.stop return value
@@ -130,22 +131,159 @@ nil if profiler was not running; otherwise a Hash:
   detected_thread_count: 4,        # threads seen during profiling
   start_time_ns: 17740...,         # CLOCK_REALTIME epoch nanos
   duration_ns: 10000000,           # profiling duration in nanos
-  aggregated_samples: [            # when aggregate: true (default)
-    [frames, weight, seq],         #   frames: [[path, label], ...] deepest-first
-    ...                            #   weight: Integer (nanoseconds, merged per unique stack)
-  ],                               #   seq: Integer (thread sequence, 1-based)
+  aggregated_samples: [                  # when aggregate: true (default)
+    [frames, weight, seq, label_set_id], #   frames: [[path, label], ...] deepest-first
+    ...                                  #   weight: Integer (nanoseconds, merged per unique stack)
+  ],                                     #   seq: Integer (thread sequence, 1-based)
+                                         #   label_set_id: Integer (0 = no labels)
+  label_sets: [{}, {request: "abc"}, ...], # label set table (index = label_set_id)
   # --- OR ---
-  raw_samples: [           # when aggregate: false
-    [frames, weight, seq], #   one entry per timer sample (not merged)
+  raw_samples: [                   # when aggregate: false
+    [frames, weight, seq, label_set_id], # one entry per timer sample (not merged)
     ...
   ] }
 ```
+### Rperf.snapshot(clear: false)
+Returns a snapshot of the current profiling data without stopping.
+Only works in aggregate mode (the default). Returns nil if not profiling.
+When `clear: true` is given, resets aggregated data after taking the snapshot.
+This enables interval-based profiling where each snapshot covers only the
+period since the last clear.
+```ruby
+Rperf.start(frequency: 1000)
+# ... work ...
+snap = Rperf.snapshot         # read data without stopping
+Rperf.save("snap.pb.gz", snap)
+# ... more work ...
+data = Rperf.stop
+```
+Interval-based usage:
+```ruby
+Rperf.start(frequency: 1000)
+loop do
+  sleep 10
+  snap = Rperf.snapshot(clear: true)  # each snapshot covers the last 10s
+  Rperf.save("profile-#{Time.now.to_i}.pb.gz", snap)
+end
+```
+### Rperf.label(**labels, &block)
+Attaches key-value labels to the current thread's samples. Labels appear
+in pprof sample labels, enabling per-context filtering (e.g., per-request).
+If profiling is not running, labels are silently ignored (no error).
+```ruby
+# Block form — labels are restored when the block exits
+Rperf.label(request: "abc-123", endpoint: "/api/users") do
+  handle_request   # samples inside get these labels
+end
+# labels are restored to previous state here
+# Without block — labels persist until changed
+Rperf.label(request: "abc-123")
+# Merge — new labels merge with existing ones
+Rperf.label(phase: "db")      # adds phase, keeps request
+# Delete a key — set value to nil
+Rperf.label(request: nil)     # removes request key
+# Nested blocks — each block restores its entry state
+Rperf.label(request: "abc") do
+  Rperf.label(phase: "db") do
+    Rperf.labels  #=> {request: "abc", phase: "db"}
+  end
+  Rperf.labels    #=> {request: "abc"}
+end
+Rperf.labels      #=> {}
+```
+In pprof output, use labels for filtering and grouping:
+    go tool pprof -tagfocus=request=abc-123 profile.pb.gz
+    go tool pprof -tagroot=request profile.pb.gz
+    go tool pprof -tagleaf=request profile.pb.gz
+### Rperf.start with defer: true
+With `defer: true`, the profiler infrastructure is set up but the sampling
+timer does not start. Use `Rperf.profile` to activate the timer for specific
+sections. Outside `profile` blocks, overhead is zero.
+### Rperf.profile(**labels, &block)
+Activates the sampling timer for the block duration and applies labels.
+Designed for use with `start(defer: true)` to profile only specific
+code paths.
+```ruby
+Rperf.start(defer: true, mode: :wall)
+Rperf.profile(endpoint: "/users") do
+  handle_request   # sampled with endpoint="/users"
+end
+# timer paused — zero overhead
+data = Rperf.stop
+```
+Nesting is supported: timer stays active until the outermost block exits.
+Also works with `start(defer: false)` — applies labels only (timer already
+running). Raises `RuntimeError` if not started, `ArgumentError` without block.
+### Rperf.labels
+Returns the current thread's labels as a Hash. Empty hash if none set.
 ### Rperf.save(path, data, format: nil)
 Writes data to path. format: :pprof, :collapsed, or :text.
 nil auto-detects from extension.
+### Rperf::RackMiddleware (Rack)
+Labels samples with the request endpoint. Requires `require "rperf/rack"`.
+```ruby
+# Rails
+Rails.application.config.middleware.use Rperf::RackMiddleware
+# Sinatra
+use Rperf::RackMiddleware
+```
+The middleware uses `Rperf.profile` to activate timer and set labels.
+Start profiling separately. Option: `label_key:` (default: `:endpoint`).
+### Rperf::ActiveJobMiddleware
+Labels samples with the job class name. Requires `require "rperf/active_job"`.
+```ruby
+class ApplicationJob < ActiveJob::Base
+  include Rperf::ActiveJobMiddleware
+end
+```
+### Rperf::SidekiqMiddleware
+Labels samples with the worker class name. Requires `require "rperf/sidekiq"`.
+```ruby
+Sidekiq.configure_server do |config|
+  config.server_middleware do |chain|
+    chain.add Rperf::SidekiqMiddleware
+  end
+end
+```
 ## PROFILING MODES
 - **cpu** — Measures per-thread CPU time via Linux thread clock.
@@ -175,11 +313,20 @@ Embedded metadata:
 Sample labels:
     thread_seq      thread sequence number (1-based, assigned per profiling session)
+    <user labels>   custom key-value labels set via Rperf.label()
 View comments: `go tool pprof -comments profile.pb.gz`
 Group by thread: `go tool pprof -tagroot=thread_seq profile.pb.gz`
+Filter by label: `go tool pprof -tagfocus=request=abc-123 profile.pb.gz`
+Group by label (root): `go tool pprof -tagroot=request profile.pb.gz`
+Group by label (leaf): `go tool pprof -tagleaf=request profile.pb.gz`
+Exclude by label: `go tool pprof -tagignore=request=healthcheck profile.pb.gz`
 ### collapsed
 Plain text. One line per unique stack: `frame1;frame2;...;leaf weight`
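
The `Rperf.label` documentation in this hunk defines three behaviors: new labels merge with existing ones, a `nil` value deletes a key, and the block form restores the entry state when the block exits. `Rperf.stop` also reports each sample's `label_set_id` as an index into a `label_sets` table, where 0 means no labels. The sketch below models those rules in plain Ruby so you can try them without the gem. `LabelContext` and its methods are hypothetical stand-ins, not rperf's API. The sketch ignores per-thread storage and the "silently ignored when not profiling" behavior. It also assumes ids are assigned in first-seen order, which may not match how rperf builds its table.

```ruby
# Hypothetical model of the Rperf.label merge/restore rules plus label-set
# interning. Not rperf's code; single-threaded for simplicity.
class LabelContext
  attr_reader :label_sets

  def initialize
    @current = {}
    @label_sets = [{}]   # index 0 = "no labels"
    @ids = { {} => 0 }   # label set (Hash) => label_set_id
  end

  # Current labels as a Hash (empty when none are set).
  def labels
    @current.dup
  end

  # Merge new labels into the current set; a nil value deletes that key.
  # With a block, restore the labels that were active on entry when it exits.
  def label(**new_labels)
    saved = @current
    @current = @current.merge(new_labels).reject { |_key, value| value.nil? }
    return unless block_given?

    begin
      yield
    ensure
      @current = saved
    end
  end

  # Map the current label set to a stable integer id. Hash equality ignores
  # key order, so identical sets always share one id.
  def current_label_set_id
    key = @current.dup.freeze
    @ids.fetch(key) do
      @label_sets << key
      @ids[key] = @label_sets.size - 1
    end
  end
end

ctx = LabelContext.new
ids = []
ctx.label(request: "abc") do
  ctx.label(phase: "db") do
    ids << ctx.current_label_set_id   # request + phase: new set
  end
  ids << ctx.current_label_set_id     # restored to request only: new set
end
ids << ctx.current_label_set_id       # restored to no labels: id 0
ctx.label(request: "abc", phase: "db") # no block: labels persist
ctx.label(phase: nil)                  # nil removes the phase key
ids << ctx.current_label_set_id       # request only again: reuses its id
p ids                                  # => [1, 2, 0, 2]
```

Because identical label sets are interned, each aggregated sample only needs to carry a small integer instead of its own copy of the labels Hash.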

data/exe/rperf CHANGED

@@ -80,7 +80,7 @@ USAGE = "Usage: rperf record [options] command [args...]\n" \
 # Handle top-level flags before subcommand parsing
 case ARGV.first
 when "-v", "--version"
-  require "rperf"
+  require_relative "../lib/rperf"
   puts "rperf #{Rperf::VERSION}"
   exit
 when "-h", "--help"