RubyGems - rperf - Versions diffs - 0.3.0 - Mend

rperf 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: a5e10797e7670051bb82e49f32a80bac5371c9bd7652809ece4894a7d508c4bf
+  data.tar.gz: b577d93730398a5b91ab89df80e0cec422300839ec3d9879043b711285d4c4c2
+SHA512:
+  metadata.gz: 2b5eb6e2125e2155937af009e084b43ff4ea4a5599a4b9d2015f3d6cd13a86f6644ecf05b58a383867853bb87e2017a4097d0f9c34662622dfaafba21efdd98c
+  data.tar.gz: e3585af44f4cfbb5bace10a7ea127d801035006123a45a54aa1c2b095aeb93f8b98a41041f39cda62a5bda7e16ce4ee5acb6a88290224db5b3162e08c969f6da

data/README.md ADDED

@@ -0,0 +1,149 @@
+<p align="center">
+  <img src="docs/logo.svg" alt="rperf logo" width="260">
+</p>
+# rperf
+A safepoint-based sampling performance profiler for Ruby. Uses actual time deltas as sample weights to correct safepoint bias.
+- Requires Ruby >= 3.4.0
+- Output: pprof protobuf, collapsed stacks, or text report
+- Modes: CPU time (per-thread) and wall time (with GVL/GC tracking)
+- [Online manual](https://ko1.github.io/rperf/docs/manual/) | [GitHub](https://github.com/ko1/rperf)
+## Quick Start
+```bash
+gem install rperf
+# Performance summary (wall mode, prints to stderr)
+rperf stat ruby app.rb
+# Profile to file
+rperf record ruby app.rb                              # → rperf.data (pprof, cpu mode)
+rperf record -m wall -o profile.pb.gz ruby server.rb   # wall mode, custom output
+# View results (report/diff require Go: https://go.dev/dl/)
+rperf report                      # open rperf.data in browser
+rperf report --top profile.pb.gz  # print top functions to terminal
+# Compare two profiles
+rperf diff before.pb.gz after.pb.gz        # open diff in browser
+rperf diff --top before.pb.gz after.pb.gz  # print diff to terminal
+```
+### Ruby API
+```ruby
+require "rperf"
+# Block form — profiles and saves to file
+Rperf.start(output: "profile.pb.gz", frequency: 500, mode: :cpu) do
+  # code to profile
+end
+# Manual start/stop
+Rperf.start(frequency: 1000, mode: :wall)
+# ...
+data = Rperf.stop
+Rperf.save("profile.pb.gz", data)
+```
+### Environment Variables
+Profile without code changes (e.g., Rails):
+```bash
+RPERF_ENABLED=1 RPERF_MODE=wall RPERF_OUTPUT=profile.pb.gz ruby app.rb
+```
+Run `rperf help` for full documentation, or see the [online manual](https://ko1.github.io/rperf/).
+## Subcommands
+Inspired by Linux `perf` — familiar subcommand interface for profiling workflows.
+| Command | Description |
+|---------|-------------|
+| `rperf record` | Profile a command and save to file |
+| `rperf stat` | Profile a command and print summary to stderr |
+| `rperf report` | Open pprof profile with `go tool pprof` (requires Go) |
+| `rperf diff` | Compare two pprof profiles (requires Go) |
+| `rperf help` | Show full reference documentation |
+## How It Works
+### The Problem
+Ruby's sampling profilers collect stack traces at **safepoints**, not at the exact timer tick. Traditional profilers assign equal weight to every sample, so if a safepoint is delayed 5ms, that delay is invisible.
+### The Solution
+rperf uses **time deltas as sample weights**:
+```
+Timer (signal or thread)         VM thread (postponed job)
+────────────────────────         ────────────────────────
+  every 1/frequency sec:          at next safepoint:
+    rb_postponed_job_trigger()  →   rperf_sample_job()
+                                      time_now = read_clock()
+                                      weight = time_now - prev_time
+                                      record(backtrace, weight)
+```
+On Linux, the timer uses `timer_create` + signal delivery (no extra thread).
+On other platforms, a dedicated pthread with `nanosleep` is used.
+If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
+### Modes
+| Mode | Clock | What it measures |
+|------|-------|------------------|
+| `cpu` (default) | `CLOCK_THREAD_CPUTIME_ID` | CPU time consumed (excludes sleep/I/O) |
+| `wall` | `CLOCK_MONOTONIC` | Real elapsed time (includes everything) |
+Use `cpu` to find what consumes CPU. Use `wall` to find what makes things slow (I/O, GVL contention, GC).
+### Synthetic Frames (wall mode)
+rperf hooks GVL and GC events to attribute non-CPU time:
+| Frame | Meaning |
+|-------|---------|
+| `[GVL blocked]` | Off-GVL time (I/O, sleep, C extension releasing GVL) |
+| `[GVL wait]` | Waiting to reacquire the GVL (contention) |
+| `[GC marking]` | Time in GC mark phase |
+| `[GC sweeping]` | Time in GC sweep phase |
+## Pros & Cons
+### Pros
+- **Safepoint-based, but accurate**: Unlike signal-based profilers (e.g., stackprof), rperf samples at safepoints. Safepoint sampling is safer — no async-signal-safety constraints, so backtraces and VM state (GC phase, GVL ownership) can be inspected reliably. The downside is less precise sampling timing, but rperf compensates by using actual time deltas as sample weights — so the profiling results faithfully reflect where time is actually spent.
+- **GVL & GC visibility** (wall mode): Attributes off-GVL time, GVL contention, and GC phases to the responsible call stacks with synthetic frames.
+- **Low overhead**: No extra thread on Linux (signal-based timer). Sampling overhead is ~1-5 us per sample.
+- **pprof compatible**: Output works with `go tool pprof`, speedscope, and other standard tools.
+- **No code changes required**: Profile any Ruby program via CLI (`rperf stat ruby app.rb`) or environment variables (`RPERF_ENABLED=1`).
+- **perf-like CLI**: Familiar subcommand interface — `record`, `stat`, `report`, `diff` — inspired by Linux perf.
+### Cons
+- **Method-level only**: Profiles at the method level, not the line level. You can see which method is slow, but not which line within it.
+- **Ruby >= 3.4.0**: Requires recent Ruby for the internal APIs used (postponed jobs, thread event hooks).
+- **POSIX only**: Linux, macOS, etc. No Windows support.
+- **Safepoint sampling**: Cannot sample inside C extensions or during long-running C calls that don't reach a safepoint. Time spent there is attributed to the next sample.
+## Output Formats
+| Format | Extension | Use case |
+|--------|-----------|----------|
+| pprof (default) | `.pb.gz` | `rperf report`, `go tool pprof`, speedscope |
+| collapsed | `.collapsed` | FlameGraph (`flamegraph.pl`), speedscope |
+| text | `.txt` | Human/AI-readable flat + cumulative report |
+Format is auto-detected from extension, or set explicitly with `--format`.
+## License
+MIT
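
Note on the README above: the time-delta weighting it describes can be illustrated with a small pure-Ruby simulation. This is only a sketch under stated assumptions, not rperf's implementation (the README's own diagram shows the sampling job inside the VM). The method names `weigh_by_time_delta` and `total_weight_per_stack`, and the event data, are hypothetical and chosen for illustration.

```ruby
# Hypothetical simulation of "time delta as sample weight".
# Each event is [stack_label, nanoseconds elapsed until the safepoint was reached].
def weigh_by_time_delta(events)
  prev_time = 0
  now = 0
  events.map do |stack, elapsed_ns|
    now += elapsed_ns          # simulated clock reading at the safepoint
    weight = now - prev_time   # weight = time since the previous sample
    prev_time = now
    [stack, weight]
  end
end

# Sum the weights per stack label.
def total_weight_per_stack(samples)
  samples.each_with_object(Hash.new(0)) { |(stack, weight), acc| acc[stack] += weight }
end

events = [
  ["fast_method", 1_000_000],
  ["fast_method", 1_000_000],
  ["slow_c_call", 5_000_000], # safepoint delayed by about 5 ms
  ["fast_method", 1_000_000],
]

samples  = weigh_by_time_delta(events)
by_time  = total_weight_per_stack(samples) # time-delta weighting
by_count = samples.map(&:first).tally      # equal weight per sample

puts "by time:  #{by_time}"
puts "by count: #{by_count}"
```

With equal weights, `slow_c_call` accounts for one sample out of four. With time-delta weights it accounts for 5 ms out of 8 ms, and the weights sum to the total elapsed time.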

data/docs/help.md ADDED

@@ -0,0 +1,291 @@
+# rperf - safepoint-based sampling performance profiler for Ruby
+## OVERVIEW
+rperf profiles Ruby programs by sampling at safepoints and using actual
+time deltas (nanoseconds) as weights to correct safepoint bias.
+POSIX systems (Linux, macOS). Requires Ruby >= 3.4.0.
+## CLI USAGE
+    rperf record [options] command [args...]
+    rperf stat [options] command [args...]
+    rperf report [options] [file]
+    rperf diff [options] base.pb.gz target.pb.gz
+    rperf help
+### record: Profile and save to file.
+    -o, --output PATH       Output file (default: rperf.data)
+    -f, --frequency HZ      Sampling frequency in Hz (default: 1000)
+    -m, --mode MODE         cpu or wall (default: cpu)
+    --format FORMAT         pprof, collapsed, or text (default: auto from extension)
+    --signal VALUE          Timer signal (Linux only): signal number, or 'false'
+                            for nanosleep thread (default: auto)
+    -v, --verbose           Print sampling statistics to stderr
+### stat: Run command and print performance summary to stderr.
+Always uses wall mode. No file output by default.
+    -o, --output PATH       Also save profile to file (default: none)
+    -f, --frequency HZ      Sampling frequency in Hz (default: 1000)
+    --signal VALUE          Timer signal (Linux only): signal number, or 'false'
+                            for nanosleep thread (default: auto)
+    -v, --verbose           Print additional sampling statistics
+Shows: user/sys/real time, time breakdown (CPU execution, GVL blocked,
+GVL wait, GC marking, GC sweeping), and top 5 hot functions.
+### report: Open pprof profile with go tool pprof. Requires Go.
+    --top                   Print top functions by flat time
+    --text                  Print text report
+Default (no flag): opens interactive web UI in browser.
+Default file: rperf.data
+### diff: Compare two pprof profiles (target - base). Requires Go.
+    --top                   Print top functions by diff
+    --text                  Print text diff report
+Default (no flag): opens diff in browser.
+### Examples
+    rperf record ruby app.rb
+    rperf record -o profile.pb.gz ruby app.rb
+    rperf record -m wall -f 500 -o profile.pb.gz ruby server.rb
+    rperf record -o profile.collapsed ruby app.rb
+    rperf record -o profile.txt ruby app.rb
+    rperf stat ruby app.rb
+    rperf stat -o profile.pb.gz ruby app.rb
+    rperf report
+    rperf report --top profile.pb.gz
+    rperf diff before.pb.gz after.pb.gz
+    rperf diff --top before.pb.gz after.pb.gz
+## RUBY API
+```ruby
+require "rperf"
+# Block form (recommended) — profiles the block and writes to file
+Rperf.start(output: "profile.pb.gz", frequency: 500, mode: :cpu) do
+  # code to profile
+end
+# Manual start/stop — returns data hash for programmatic use
+Rperf.start(frequency: 1000, mode: :wall)
+# ... code to profile ...
+data = Rperf.stop
+# Save data to file later
+Rperf.save("profile.pb.gz", data)
+Rperf.save("profile.collapsed", data)
+Rperf.save("profile.txt", data)
+```
+### Rperf.start parameters
+    frequency:  Sampling frequency in Hz (Integer, default: 1000)
+    mode:       :cpu or :wall (Symbol, default: :cpu)
+    output:     File path to write on stop (String or nil)
+    verbose:    Print statistics to stderr (true/false, default: false)
+    format:     :pprof, :collapsed, :text, or nil for auto-detect (Symbol or nil)
+### Rperf.stop return value
+nil if profiler was not running; otherwise a Hash:
+```ruby
+{ mode: :cpu,             # or :wall
+  frequency: 500,
+  sampling_count: 1234,
+  sampling_time_ns: 56789,
+  start_time_ns: 17740..., # CLOCK_REALTIME epoch nanos
+  duration_ns: 10000000,   # profiling duration in nanos
+  samples: [               # Array of [frames, weight, thread_seq]
+    [frames, weight, seq], #   frames: [[path, label], ...] deepest-first
+    ...                    #   weight: Integer (nanoseconds)
+  ] }                      #   seq: Integer (thread sequence, 1-based)
+```
+### Rperf.save(path, data, format: nil)
+Writes data to path. format: :pprof, :collapsed, or :text.
+nil auto-detects from extension.
+## PROFILING MODES
+- **cpu** — Measures per-thread CPU time via the thread CPU clock (CLOCK_THREAD_CPUTIME_ID).
+  Use for: finding functions that consume CPU cycles.
+  Ignores time spent sleeping, in I/O, or waiting for GVL.
+- **wall** — Measures wall-clock time (CLOCK_MONOTONIC).
+  Use for: finding where wall time goes, including I/O, sleep, GVL
+  contention, and off-CPU waits.
+  Includes synthetic frames (see below).
+## OUTPUT FORMATS
+### pprof (default)
+Gzip-compressed protobuf. Standard pprof format.
+Extension convention: `.pb.gz`
+View with: `go tool pprof`, pprof-rs, or speedscope (via import).
+Embedded metadata:
+    comment         rperf version, mode, frequency, Ruby version
+    time_nanos      profile collection start time (epoch nanoseconds)
+    duration_nanos  profile duration (nanoseconds)
+    doc_url         link to this documentation
+Sample labels:
+    thread_seq      thread sequence number (1-based, assigned per profiling session)
+View comments: `go tool pprof -comments profile.pb.gz`
+Group by thread: `go tool pprof -tagroot=thread_seq profile.pb.gz`
+### collapsed
+Plain text. One line per unique stack: `frame1;frame2;...;leaf weight`
+Frames are semicolon-separated, bottom-to-top. Weight in nanoseconds.
+Extension convention: `.collapsed`
+Compatible with: FlameGraph (flamegraph.pl), speedscope.
+### text
+Human/AI-readable report. Shows total time, then flat and cumulative
+top-N tables sorted by weight descending. No parsing needed.
+Extension convention: `.txt`
+Example output:
+    Total: 1523.4ms (cpu)
+    Samples: 4820, Frequency: 500Hz
+    Flat:
+         820.3ms  53.8%  Array#each (app/models/user.rb)
+         312.1ms  20.5%  JSON.parse (lib/json/parser.rb)
+         ...
+    Cumulative:
+        1401.2ms  92.0%  UsersController#index (app/controllers/users_controller.rb)
+         ...
+### Format auto-detection
+Format is auto-detected from the output file extension:
+    .collapsed → collapsed
+    .txt       → text
+    anything else → pprof
+The `--format` flag (CLI) or `format:` parameter (API) overrides auto-detect.
+## SYNTHETIC FRAMES
+In wall mode, rperf adds synthetic frames that represent non-CPU time:
+- **[GVL blocked]** — Time the thread spent off-GVL (I/O, sleep, C extension
+  releasing GVL). Attributed to the stack at SUSPENDED.
+- **[GVL wait]** — Time the thread spent waiting to reacquire the GVL after
+  becoming ready. Indicates GVL contention. Same stack.
+In both modes, GC time is tracked:
+- **[GC marking]** — Time spent in GC marking phase (wall time).
+- **[GC sweeping]** — Time spent in GC sweeping phase (wall time).
+These always appear as the leaf (deepest) frame in a sample.
+## INTERPRETING RESULTS
+Weight unit is always nanoseconds regardless of mode.
+- **Flat time**: weight attributed directly to a function (it was the leaf).
+- **Cumulative time**: weight for all samples where the function appears
+  anywhere in the stack.
+High flat time → the function itself is expensive.
+High cum but low flat → the function calls expensive children.
+To convert: 1,000,000 ns = 1 ms, 1,000,000,000 ns = 1 s.
+## DIAGNOSING COMMON PERFORMANCE PROBLEMS
+**Problem: high CPU usage**
+- Mode: cpu
+- Look for: functions with high flat cpu time.
+- Action: optimize the hot function or call it less.
+**Problem: slow request / high latency**
+- Mode: wall
+- Look for: functions with high cum wall time.
+- If [GVL blocked] is dominant → I/O or sleep is the bottleneck.
+- If [GVL wait] is dominant → GVL contention; reduce GVL-holding work
+  or move work to Ractors / child processes.
+**Problem: GC pauses**
+- Mode: cpu or wall
+- Look for: [GC marking] and [GC sweeping] samples.
+- High [GC marking] → too many live objects; reduce allocations.
+- High [GC sweeping] → too many short-lived objects; reuse or pool.
+**Problem: multithreaded app slower than expected**
+- Mode: wall
+- Look for: [GVL wait] time across threads.
+- High [GVL wait] means threads are serialized on the GVL.
+## READING COLLAPSED STACKS PROGRAMMATICALLY
+Each line: `bottom_frame;...;top_frame weight_ns`
+```ruby
+File.readlines("profile.collapsed").each do |line|
+  stack, weight = line.rpartition(" ").then { |s, _, w| [s, w.to_i] }
+  frames = stack.split(";")
+  # frames[0] is bottom (main), frames[-1] is leaf (hot)
+end
+```
+## READING PPROF PROGRAMMATICALLY
+Decompress + parse protobuf:
+```ruby
+require "zlib"; require "stringio"
+raw = Zlib::GzipReader.new(StringIO.new(File.binread("profile.pb.gz"))).read
+# raw is a protobuf binary; use google-protobuf gem or pprof tooling.
+```
+Or convert to text with pprof CLI:
+    go tool pprof -text profile.pb.gz
+    go tool pprof -top profile.pb.gz
+    go tool pprof -tree profile.pb.gz
+## ENVIRONMENT VARIABLES
+Used internally by the CLI to pass options to the auto-started profiler:
+    RPERF_ENABLED=1       Enable auto-start on require
+    RPERF_OUTPUT=path     Output file path
+    RPERF_FREQUENCY=hz    Sampling frequency
+    RPERF_MODE=cpu|wall   Profiling mode
+    RPERF_FORMAT=fmt      pprof, collapsed, or text
+    RPERF_VERBOSE=1       Print statistics
+    RPERF_SIGNAL=N|false  Timer signal number or 'false' for nanosleep (Linux only)
+    RPERF_STAT=1          Enable stat mode (used by rperf stat)
+## TIPS
+- Default frequency (1000 Hz) works well for most cases; overhead is < 0.2%.
+- For long-running production profiling, lower frequency (100-500) reduces overhead further.
+- Profile representative workloads, not micro-benchmarks.
+- Compare cpu and wall profiles to distinguish CPU-bound from I/O-bound.
+- The verbose flag (-v) shows sampling overhead and top functions on stderr.
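
Note on help.md above: its definitions of flat and cumulative time can be checked against the collapsed format with a short script. This is a sketch, not part of rperf. The sample data is invented, and the variable names (`collapsed`, `flat`, `cum`, `report`) are illustrative. It assumes the documented line layout `bottom;...;leaf weight_ns`.

```ruby
# Invented sample data in the collapsed format: "bottom;...;leaf weight_ns".
# Real input would be read from a .collapsed file instead.
collapsed = <<~DATA
  main;UsersController#index;Array#each 820300000
  main;UsersController#index;JSON.parse 312100000
  main;UsersController#index 268800000
DATA

flat = Hash.new(0) # weight of samples where the frame was the leaf
cum  = Hash.new(0) # weight of all samples whose stack contains the frame

collapsed.each_line(chomp: true) do |line|
  stack, _, weight = line.rpartition(" ")
  weight_ns = weight.to_i
  frames = stack.split(";")
  flat[frames.last] += weight_ns
  # uniq so that a recursive frame is counted once per sample
  frames.uniq.each { |frame| cum[frame] += weight_ns }
end

total_ns = flat.values.sum

# Print a table sorted by weight, descending, in milliseconds and percent.
report = lambda do |title, table|
  puts "#{title}:"
  table.sort_by { |_, ns| -ns }.each do |name, ns|
    puts format("%10.1fms %5.1f%%  %s", ns / 1_000_000.0, 100.0 * ns / total_ns, name)
  end
end

report.call("Flat", flat)
report.call("Cumulative", cum)
```

In this data `UsersController#index` has low flat time but the full cumulative time. That is the "calls expensive children" pattern described under INTERPRETING RESULTS.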

data/exe/rperf ADDED

@@ -0,0 +1,207 @@
+#!/usr/bin/env ruby
+require "optparse"
+require "socket"
+def find_available_port
+  server = TCPServer.new("localhost", 0)
+  port = server.addr[1]
+  server.close
+  port
+end
+def run_pprof_subcommand(name, banner, min_files:)
+  mode = :http
+  parser = OptionParser.new do |opts|
+    opts.banner = banner
+    opts.on("--top", "Print top functions by #{min_files > 1 ? 'diff' : 'flat time'}") do
+      mode = :top
+    end
+    opts.on("--text", "Print text #{min_files > 1 ? 'diff report' : 'report'}") do
+      mode = :text
+    end
+    opts.on("-h", "--help", "Show this help") do
+      puts opts
+      exit
+    end
+  end
+  begin
+    parser.order!(ARGV)
+  rescue OptionParser::InvalidOption => e
+    $stderr.puts e.message
+    $stderr.puts parser
+    exit 1
+  end
+  # report falls back to the default file (rperf.data) when none is given;
+  # diff requires two files, which is checked once below.
+  files = ARGV.shift(min_files > 1 ? [ARGV.size, min_files].min : 1)
+  files = ["rperf.data"] if files.empty? && min_files == 1
+  if min_files > 1 && files.size < min_files
+    $stderr.puts "Two profile files required."
+    $stderr.puts parser
+    exit 1
+  end
+  files.each do |f|
+    unless File.exist?(f)
+      $stderr.puts "File not found: #{f}"
+      exit 1
+    end
+  end
+  unless system("go", "version", out: File::NULL, err: File::NULL)
+    $stderr.puts "'go' command not found. Install Go to use 'rperf #{name}'."
+    $stderr.puts "  https://go.dev/dl/"
+    exit 1
+  end
+  yield mode, files
+end
+HELP_TEXT = File.read(File.expand_path("../docs/help.md", __dir__))
+USAGE = "Usage: rperf record [options] command [args...]\n" \
+       "       rperf stat [options] command [args...]\n" \
+       "       rperf report [options] [file]\n" \
+       "       rperf diff [options] base.pb.gz target.pb.gz\n" \
+       "       rperf help\n"
+# Handle top-level flags before subcommand parsing
+case ARGV.first
+when "-v", "--version"
+  require "rperf"
+  puts "rperf #{Rperf::VERSION}"
+  exit
+when "-h", "--help"
+  puts USAGE
+  puts
+  puts "Run 'rperf help' for full documentation"
+  exit
+end
+subcommand = ARGV.shift
+case subcommand
+when "help"
+  puts HELP_TEXT
+  exit
+when "report"
+  run_pprof_subcommand("report",
+    "Usage: rperf report [options] [file]\n" \
+    "       Opens pprof profile in browser (default) or prints summary.\n" \
+    "       Default file: rperf.data",
+    min_files: 1) do |mode, files|
+    report_file = files[0]
+    case mode
+    when :top  then exec("go", "tool", "pprof", "-top", report_file)
+    when :text then exec("go", "tool", "pprof", "-text", report_file)
+    else            exec("go", "tool", "pprof", "-http=localhost:#{find_available_port}", report_file)
+    end
+  end
+when "diff"
+  run_pprof_subcommand("diff",
+    "Usage: rperf diff [options] base.pb.gz target.pb.gz\n" \
+    "       Compare two pprof profiles (shows target - base).",
+    min_files: 2) do |mode, files|
+    base_file, target_file = files
+    case mode
+    when :top  then exec("go", "tool", "pprof", "-top", "-diff_base=#{base_file}", target_file)
+    when :text then exec("go", "tool", "pprof", "-text", "-diff_base=#{base_file}", target_file)
+    else            exec("go", "tool", "pprof", "-http=localhost:#{find_available_port}", "-diff_base=#{base_file}", target_file)
+    end
+  end
+when "record", "stat"
+  # continue below
+else
+  $stderr.puts "Unknown subcommand: #{subcommand.inspect}" if subcommand
+  $stderr.puts USAGE
+  exit 1
+end
+output = (subcommand == "stat") ? nil : "rperf.data"
+frequency = 1000
+mode = (subcommand == "stat") ? "wall" : "cpu"
+format = nil
+signal = nil
+verbose = false
+parser = OptionParser.new do |opts|
+  opts.banner = USAGE
+  opts.on("-o", "--output PATH", "Output file#{subcommand == 'stat' ? ' (default: none)' : ' (default: rperf.data)'}") do |v|
+    output = v
+  end
+  opts.on("-f", "--frequency HZ", Integer, "Sampling frequency in Hz (default: 1000)") do |v|
+    frequency = v
+  end
+  if subcommand == "record"
+    opts.on("-m", "--mode MODE", %w[cpu wall], "Profiling mode: cpu or wall (default: cpu)") do |v|
+      mode = v
+    end
+    opts.on("--format FORMAT", %w[pprof collapsed text],
+            "Output format: pprof, collapsed, or text (default: auto from extension)") do |v|
+      format = v
+    end
+  end
+  opts.on("--signal VALUE", "Timer signal (Linux only): signal number, or 'false' for nanosleep thread") do |v|
+    signal = (v == "false") ? "false" : v
+  end
+  opts.on("-v", "--verbose", "Print sampling statistics to stderr") do
+    verbose = true
+  end
+  opts.on("-h", "--help", "Show this help") do
+    puts opts
+    puts
+    puts "Run 'rperf help' for full documentation (modes, formats, diagnostics guide, etc.)"
+    exit
+  end
+end
+begin
+  parser.order!(ARGV)
+rescue OptionParser::InvalidOption => e
+  $stderr.puts e.message
+  $stderr.puts parser
+  exit 1
+end
+if ARGV.empty?
+  $stderr.puts "No command specified."
+  $stderr.puts parser
+  exit 1
+end
+# Add lib dir to RUBYLIB so -rrperf can find the extension
+lib_dir = File.expand_path("../lib", __dir__)
+ENV["RUBYLIB"] = [lib_dir, ENV["RUBYLIB"]].compact.join(File::PATH_SEPARATOR)
+ENV["RUBYOPT"] = "-rrperf #{ENV['RUBYOPT']}".strip
+ENV["RPERF_ENABLED"] = "1"
+ENV["RPERF_OUTPUT"] = output if output
+ENV["RPERF_FREQUENCY"] = frequency.to_s
+ENV["RPERF_MODE"] = mode
+ENV["RPERF_FORMAT"] = format if format
+ENV["RPERF_VERBOSE"] = "1" if verbose
+ENV["RPERF_SIGNAL"] = signal if signal
+if subcommand == "stat"
+  ENV["RPERF_STAT"] = "1"
+  ENV["RPERF_STAT_COMMAND"] = ARGV.join(" ")
+end
+exec(*ARGV)
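
Note on exe/rperf above: the script passes its options to the child process only through `RPERF_*` environment variables, then `exec`s the command with `-rrperf` in `RUBYOPT`. The receiving side is not part of this excerpt. The sketch below is a hypothetical illustration of how such variables could be turned into `Rperf.start` keyword arguments. The method name `profiler_options_from_env` is invented, and the defaults are taken from the documented CLI defaults (1000 Hz, cpu mode), not from the gem's actual loader.

```ruby
# Hypothetical helper: maps RPERF_* variables to Rperf.start-style options.
# Returns nil when auto-start is not enabled.
def profiler_options_from_env(env)
  return nil unless env["RPERF_ENABLED"] == "1"

  {
    output:    env["RPERF_OUTPUT"],                            # nil means no file
    frequency: Integer(env.fetch("RPERF_FREQUENCY", "1000")),  # Hz
    mode:      env.fetch("RPERF_MODE", "cpu").to_sym,          # :cpu or :wall
    format:    env["RPERF_FORMAT"]&.to_sym,                    # nil means auto-detect
    verbose:   env["RPERF_VERBOSE"] == "1",
  }
end

# Mirrors the README example:
#   RPERF_ENABLED=1 RPERF_MODE=wall RPERF_OUTPUT=profile.pb.gz ruby app.rb
example_env = {
  "RPERF_ENABLED" => "1",
  "RPERF_MODE"    => "wall",
  "RPERF_OUTPUT"  => "profile.pb.gz",
}

options = profiler_options_from_env(example_env)
puts options.inspect
```

Taking a plain hash rather than reading `ENV` directly keeps the helper easy to test. In real use, `ENV` itself could be passed, since it also responds to `[]` and `fetch`.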

data/ext/rperf/extconf.rb ADDED

@@ -0,0 +1,6 @@
+require "mkmf"
+have_header("pthread.h") or abort "pthread.h not found"
+have_library("pthread") or abort "libpthread not found"
+create_makefile("rperf")