sperf 0.1.0 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: aecc23432b8dba72018c524b1fb6653a0c83a44e9e8b2c9af17c87cde20b648b
- data.tar.gz: fbdf1dddcb5b8fef86e358f315a801478bfcad7768d8f28cbcf82726c67d529c
+ metadata.gz: 125090cb17cacbc9157402fcb9db54117011ccde812da775ea00b6c1f6bee535
+ data.tar.gz: 1430d997988a6538059a0c63be28d4f05f43681d1f2efbd84020490308a5c279
  SHA512:
- metadata.gz: ff265b78dd2e237dac4d07a32b7459e07e73d77504347556945878e4addc1ccf89c47c409824591ff800cbe5605a863ae536b200c7f7c5eca5d06bbbd4ca6c05
- data.tar.gz: 950c3860737f37a9d57d15dae9db259635ebe94494096ad06c0489ab6c028827ea2b24d7e0046462e3b57ea546173993b2ed3e0c910bc677828b23b01ed718e9
+ metadata.gz: 17a52cfc856ae47f254bf0df3450ff76cd471703c97e029907abbf639aa9dd8d6971ef8b18acf6a9784b87b295981533776c147b0fc31993e33dcf528cf15a3d
+ data.tar.gz: 6bbad7152c998859f4ec1f07d42eeca8824918ad635b0454fbe13635d9b913ad309b09f8e7c760f5cdd4cc1d32d1316ba50a5fb7c1863e7718cb1df01a2761ad
data/README.md CHANGED
@@ -9,6 +9,7 @@ A safepoint-based sampling performance profiler for Ruby. Uses actual time delta
  - Requires Ruby >= 3.4.0
  - Output: pprof protobuf, collapsed stacks, or text report
  - Modes: CPU time (per-thread) and wall time (with GVL/GC tracking)
+ - [Online manual](https://ko1.github.io/sperf/docs/manual/) | [GitHub](https://github.com/ko1/sperf)
 
  ## Quick Start
 
@@ -56,10 +57,12 @@ Profile without code changes (e.g., Rails):
  SPERF_ENABLED=1 SPERF_MODE=wall SPERF_OUTPUT=profile.pb.gz ruby app.rb
  ```
 
- Run `sperf help` for full documentation (all options, output interpretation, diagnostics guide).
+ Run `sperf help` for full documentation, or see the [online manual](https://ko1.github.io/sperf/).
 
  ## Subcommands
 
+ Inspired by Linux `perf` — familiar subcommand interface for profiling workflows.
+
  | Command | Description |
  |---------|-------------|
  | `sperf record` | Profile a command and save to file |
@@ -79,8 +82,8 @@ Ruby's sampling profilers collect stack traces at **safepoints**, not at the exa
  sperf uses **time deltas as sample weights**:
 
  ```
- Timer thread (pthread) VM thread (postponed job)
- ───────────────────── ────────────────────────
+ Timer (signal or thread) VM thread (postponed job)
+ ──────────────────────── ────────────────────────
  every 1/frequency sec: at next safepoint:
  rb_postponed_job_trigger() → sperf_sample_job()
  time_now = read_clock()
@@ -88,6 +91,9 @@ Timer thread (pthread) VM thread (postponed job)
  record(backtrace, weight)
  ```
 
+ On Linux, the timer uses `timer_create` + signal delivery (no extra thread).
+ On other platforms, a dedicated pthread with `nanosleep` is used.
+
  If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.
 
  ### Modes
@@ -110,6 +116,24 @@ sperf hooks GVL and GC events to attribute non-CPU time:
  | `[GC marking]` | Time in GC mark phase |
  | `[GC sweeping]` | Time in GC sweep phase |
 
+ ## Pros & Cons
+
+ ### Pros
+
+ - **Safepoint-based, but accurate**: Unlike signal-based profilers (e.g., stackprof), sperf samples at safepoints. Safepoint sampling is safer — no async-signal-safety constraints, so backtraces and VM state (GC phase, GVL ownership) can be inspected reliably. The downside is less precise sampling timing, but sperf compensates by using actual time deltas as sample weights — so the profiling results faithfully reflect where time is actually spent.
+ - **GVL & GC visibility** (wall mode): Attributes off-GVL time, GVL contention, and GC phases to the responsible call stacks with synthetic frames.
+ - **Low overhead**: No extra thread on Linux (signal-based timer). Sampling overhead is ~1-5 us per sample.
+ - **pprof compatible**: Output works with `go tool pprof`, speedscope, and other standard tools.
+ - **No code changes required**: Profile any Ruby program via CLI (`sperf stat ruby app.rb`) or environment variables (`SPERF_ENABLED=1`).
+ - **perf-like CLI**: Familiar subcommand interface — `record`, `stat`, `report`, `diff` — inspired by Linux perf.
+
+ ### Cons
+
+ - **Method-level only**: Profiles at the method level, not the line level. You can see which method is slow, but not which line within it.
+ - **Ruby >= 3.4.0**: Requires recent Ruby for the internal APIs used (postponed jobs, thread event hooks).
+ - **POSIX only**: Linux, macOS, etc. No Windows support.
+ - **Safepoint sampling**: Cannot sample inside C extensions or during long-running C calls that don't reach a safepoint. Time spent there is attributed to the next sample.
+
  ## Output Formats
 
  | Format | Extension | Use case |
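Reviewer note: the README hunks above describe weighting each sample by the time elapsed since the previous sample, so that a delayed safepoint carries proportionally more weight. The following is a minimal Ruby sketch of that idea only. It is not sperf's implementation (which lives in the C extension), and `WeightedSampler` with its methods is a hypothetical name introduced for illustration.

```ruby
# Hypothetical illustration of time-delta sample weighting; not sperf's actual code.
class WeightedSampler
  attr_reader :weights

  def initialize(start_ns)
    @last_ns = start_ns
    @weights = Hash.new(0) # stack (Array of frame names) => accumulated nanoseconds
  end

  # Record a sample taken at `now_ns`. Its weight is the time since the
  # previous sample, so a late sample accounts for the whole gap.
  def record(stack, now_ns)
    @weights[stack] += now_ns - @last_ns
    @last_ns = now_ns
  end

  # Sum of all weights; equals the time between the start and the last sample.
  def total
    @weights.values.sum
  end
end

sampler = WeightedSampler.new(0)
sampler.record(["main", "fast"], 1_000_000)   # sample arrives on time: 1 ms
sampler.record(["main", "slow_c"], 6_000_000) # safepoint was delayed: 5 ms
sampler.record(["main", "fast"], 7_000_000)   # back on schedule: 1 ms
```

In this sketch the three samples account for all 7 ms of elapsed time, with the delayed sample taking the 5 ms gap rather than counting as a single tick.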
data/exe/sperf CHANGED
@@ -1,5 +1,13 @@
  #!/usr/bin/env ruby
  require "optparse"
+ require "socket"
+
+ def find_available_port
+ server = TCPServer.new("localhost", 0)
+ port = server.addr[1]
+ server.close
+ port
+ end
 
  HELP_TEXT = <<'HELP'
  sperf - safepoint-based sampling performance profiler for Ruby
@@ -22,12 +30,16 @@ CLI USAGE
  -f, --frequency HZ Sampling frequency in Hz (default: 1000)
  -m, --mode MODE cpu or wall (default: cpu)
  --format FORMAT pprof, collapsed, or text (default: auto from extension)
+ --signal VALUE Timer signal (Linux only): signal number, or 'false'
+ for nanosleep thread (default: auto)
  -v, --verbose Print sampling statistics to stderr
 
  stat: Run command and print performance summary to stderr.
  Always uses wall mode. No file output by default.
  -o, --output PATH Also save profile to file (default: none)
  -f, --frequency HZ Sampling frequency in Hz (default: 1000)
+ --signal VALUE Timer signal (Linux only): signal number, or 'false'
+ for nanosleep thread (default: auto)
  -v, --verbose Print additional sampling statistics
 
  Shows: user/sys/real time, time breakdown (CPU execution, GVL blocked,
@@ -230,6 +242,7 @@ ENVIRONMENT VARIABLES
  SPERF_MODE=cpu|wall Profiling mode
  SPERF_FORMAT=fmt pprof, collapsed, or text
  SPERF_VERBOSE=1 Print statistics
+ SPERF_SIGNAL=N|false Timer signal number or 'false' for nanosleep (Linux only)
 
  TIPS
 
@@ -316,7 +329,7 @@ when "report"
  when :text
  exec("go", "tool", "pprof", "-text", report_file)
  else
- exec("go", "tool", "pprof", "-http=:0", report_file)
+ exec("go", "tool", "pprof", "-http=localhost:#{find_available_port}", report_file)
  end
  when "diff"
  # sperf diff: compare two pprof profiles via go tool pprof -diff_base
@@ -374,7 +387,7 @@ when "diff"
  when :text
  exec("go", "tool", "pprof", "-text", "-diff_base=#{base_file}", target_file)
  else
- exec("go", "tool", "pprof", "-http=:0", "-diff_base=#{base_file}", target_file)
+ exec("go", "tool", "pprof", "-http=localhost:#{find_available_port}", "-diff_base=#{base_file}", target_file)
  end
  when "record", "stat"
  # continue below
@@ -388,6 +401,7 @@ output = (subcommand == "stat") ? nil : "sperf.data"
  frequency = 1000
  mode = (subcommand == "stat") ? "wall" : "cpu"
  format = nil
+ signal = nil
  verbose = false
 
  parser = OptionParser.new do |opts|
@@ -412,6 +426,10 @@ parser = OptionParser.new do |opts|
  end
  end
 
+ opts.on("--signal VALUE", "Timer signal (Linux only): signal number, or 'false' for nanosleep thread") do |v|
+ signal = (v == "false") ? "false" : v
+ end
+
  opts.on("-v", "--verbose", "Print sampling statistics to stderr") do
  verbose = true
  end
@@ -448,6 +466,7 @@ ENV["SPERF_FREQUENCY"] = frequency.to_s
  ENV["SPERF_MODE"] = mode
  ENV["SPERF_FORMAT"] = format if format
  ENV["SPERF_VERBOSE"] = "1" if verbose
+ ENV["SPERF_SIGNAL"] = signal if signal
 
  if subcommand == "stat"
  ENV["SPERF_STAT"] = "1"
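Reviewer note: the `report` and `diff` hunks above replace `-http=:0` with an explicit `localhost:<port>`, where the port comes from the new `find_available_port` helper. A standalone sketch of the same bind-to-port-0 technique follows. The `ensure` clause is an addition of this sketch rather than part of the diff, and the `url` variable is purely illustrative.

```ruby
require "socket"

# Ask the OS for a free TCP port by binding to port 0, read the assigned
# port back, then release the socket so another program can bind to it.
def find_available_port
  server = TCPServer.new("localhost", 0)
  server.addr[1] # addr is [family, port, hostname, address]
ensure
  server&.close  # sketch-only addition: close even if addr raises
end

port = find_available_port
url = "localhost:#{port}" # the host:port form passed to `-http=` in the diff
```

Because the socket is closed before `go tool pprof` binds, another process could in principle take the port in between. The sketch does not guard against that.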
data/ext/sperf/sperf.c CHANGED
@@ -6,6 +6,14 @@
  #include <string.h>
  #include <stdlib.h>
  #include <unistd.h>
+ #include <signal.h>
+
+ #ifdef __linux__
+ #define SPERF_USE_TIMER_SIGNAL 1
+ #define SPERF_TIMER_SIGNAL_DEFAULT (SIGRTMIN + 8)
+ #else
+ #define SPERF_USE_TIMER_SIGNAL 0
+ #endif
 
  #define SPERF_MAX_STACK_DEPTH 512
  #define SPERF_INITIAL_SAMPLES 1024
@@ -49,6 +57,10 @@ typedef struct sperf_profiler {
  int mode; /* 0 = cpu, 1 = wall */
  volatile int running;
  pthread_t timer_thread;
+ #if SPERF_USE_TIMER_SIGNAL
+ timer_t timer_id;
+ int timer_signal; /* >0: use timer signal, 0: use nanosleep thread */
+ #endif
  rb_postponed_job_handle_t pj_handle;
  sperf_sample_t *samples;
  size_t sample_count;
@@ -407,7 +419,15 @@ sperf_sample_job(void *arg)
  (ts_end.tv_nsec - ts_start.tv_nsec);
  }
 
- /* ---- Timer thread ---- */
+ /* ---- Timer ---- */
+
+ #if SPERF_USE_TIMER_SIGNAL
+ static void
+ sperf_signal_handler(int sig)
+ {
+ rb_postponed_job_trigger(g_profiler.pj_handle);
+ }
+ #endif
 
  static void *
  sperf_timer_func(void *arg)
@@ -448,6 +468,9 @@ rb_sperf_start(int argc, VALUE *argv, VALUE self)
  VALUE opts;
  int frequency = 1000;
  int mode = 0; /* 0 = cpu, 1 = wall */
+ #if SPERF_USE_TIMER_SIGNAL
+ int timer_signal = SPERF_TIMER_SIGNAL_DEFAULT;
+ #endif
 
  rb_scan_args(argc, argv, ":", &opts);
  if (!NIL_P(opts)) {
@@ -469,6 +492,21 @@ rb_sperf_start(int argc, VALUE *argv, VALUE self)
  rb_raise(rb_eArgError, "mode must be :cpu or :wall");
  }
  }
+ #if SPERF_USE_TIMER_SIGNAL
+ VALUE vsig = rb_hash_aref(opts, ID2SYM(rb_intern("signal")));
+ if (!NIL_P(vsig)) {
+ if (RTEST(vsig)) {
+ timer_signal = NUM2INT(vsig);
+ if (timer_signal < SIGRTMIN || timer_signal > SIGRTMAX) {
+ rb_raise(rb_eArgError, "signal must be between SIGRTMIN(%d) and SIGRTMAX(%d)",
+ SIGRTMIN, SIGRTMAX);
+ }
+ } else {
+ /* signal: false or signal: 0 → use nanosleep thread */
+ timer_signal = 0;
+ }
+ }
+ #endif
  }
 
  if (g_profiler.running) {
@@ -534,8 +572,43 @@ rb_sperf_start(int argc, VALUE *argv, VALUE self)
 
  g_profiler.running = 1;
 
- if (pthread_create(&g_profiler.timer_thread, NULL, sperf_timer_func, &g_profiler) != 0) {
- g_profiler.running = 0;
+ #if SPERF_USE_TIMER_SIGNAL
+ g_profiler.timer_signal = timer_signal;
+
+ if (timer_signal > 0) {
+ struct sigaction sa;
+ struct sigevent sev;
+ struct itimerspec its;
+
+ memset(&sa, 0, sizeof(sa));
+ sa.sa_handler = sperf_signal_handler;
+ sa.sa_flags = SA_RESTART;
+ sigaction(g_profiler.timer_signal, &sa, NULL);
+
+ memset(&sev, 0, sizeof(sev));
+ sev.sigev_notify = SIGEV_SIGNAL;
+ sev.sigev_signo = g_profiler.timer_signal;
+ if (timer_create(CLOCK_MONOTONIC, &sev, &g_profiler.timer_id) != 0) {
+ g_profiler.running = 0;
+ signal(g_profiler.timer_signal, SIG_DFL);
+ goto timer_fail;
+ }
+
+ its.it_value.tv_sec = 0;
+ its.it_value.tv_nsec = 1000000000L / g_profiler.frequency;
+ its.it_interval = its.it_value;
+ timer_settime(g_profiler.timer_id, 0, &its, NULL);
+ } else
+ #endif
+ {
+ if (pthread_create(&g_profiler.timer_thread, NULL, sperf_timer_func, &g_profiler) != 0) {
+ g_profiler.running = 0;
+ goto timer_fail;
+ }
+ }
+
+ if (0) {
+ timer_fail:
  {
  VALUE cur = rb_thread_current();
  sperf_thread_data_t *td = (sperf_thread_data_t *)rb_internal_thread_specific_get(cur, g_profiler.ts_key);
@@ -550,7 +623,7 @@ rb_sperf_start(int argc, VALUE *argv, VALUE self)
  g_profiler.samples = NULL;
  free(g_profiler.frame_pool);
  g_profiler.frame_pool = NULL;
- rb_raise(rb_eRuntimeError, "sperf: failed to create timer thread");
+ rb_raise(rb_eRuntimeError, "sperf: failed to create timer");
  }
 
  return Qtrue;
@@ -568,7 +641,15 @@ rb_sperf_stop(VALUE self)
  }
 
  g_profiler.running = 0;
- pthread_join(g_profiler.timer_thread, NULL);
+ #if SPERF_USE_TIMER_SIGNAL
+ if (g_profiler.timer_signal > 0) {
+ timer_delete(g_profiler.timer_id);
+ signal(g_profiler.timer_signal, SIG_DFL);
+ } else
+ #endif
+ {
+ pthread_join(g_profiler.timer_thread, NULL);
+ }
 
  if (g_profiler.thread_hook) {
  rb_internal_thread_remove_event_hook(g_profiler.thread_hook);
@@ -657,9 +738,16 @@ sperf_after_fork_child(void)
  {
  if (!g_profiler.running) return;
 
- /* Mark as not running — timer thread doesn't exist in child */
+ /* Mark as not running — timer doesn't exist in child */
  g_profiler.running = 0;
 
+ #if SPERF_USE_TIMER_SIGNAL
+ /* timer_create timers are not inherited across fork; reset signal handler */
+ if (g_profiler.timer_signal > 0) {
+ signal(g_profiler.timer_signal, SIG_DFL);
+ }
+ #endif
+
  /* Remove hooks so they don't fire with stale state */
  if (g_profiler.thread_hook) {
  rb_internal_thread_remove_event_hook(g_profiler.thread_hook);
data/lib/sperf/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Sperf
+ VERSION = "0.2.1"
+ end
data/lib/sperf.rb CHANGED
@@ -1,9 +1,9 @@
  require "sperf.so"
+ require "sperf/version"
  require "zlib"
  require "stringio"
 
  module Sperf
- VERSION = "0.1.0"
 
  @verbose = false
  @output = nil
@@ -17,13 +17,15 @@ module Sperf
  # .collapsed → collapsed stacks (FlameGraph / speedscope compatible)
  # .txt → text report (human/AI readable flat + cumulative table)
  # otherwise (.pb.gz etc) → pprof protobuf (gzip compressed)
- def self.start(frequency: 1000, mode: :cpu, output: nil, verbose: false, format: nil, stat: false)
+ def self.start(frequency: 1000, mode: :cpu, output: nil, verbose: false, format: nil, stat: false, signal: nil)
  @verbose = verbose || ENV["SPERF_VERBOSE"] == "1"
  @output = output
  @format = format
  @stat = stat
  @stat_start_mono = Process.clock_gettime(Process::CLOCK_MONOTONIC) if @stat
- _c_start(frequency: frequency, mode: mode)
+ c_opts = { frequency: frequency, mode: mode }
+ c_opts[:signal] = signal unless signal.nil?
+ _c_start(**c_opts)
 
  if block_given?
  begin
@@ -357,11 +359,18 @@ module Sperf
  _sperf_mode = _sperf_mode_str == "wall" ? :wall : :cpu
  _sperf_format = ENV["SPERF_FORMAT"] ? ENV["SPERF_FORMAT"].to_sym : nil
  _sperf_stat = ENV["SPERF_STAT"] == "1"
- start(frequency: (ENV["SPERF_FREQUENCY"] || 1000).to_i, mode: _sperf_mode,
- output: _sperf_stat ? ENV["SPERF_OUTPUT"] : (ENV["SPERF_OUTPUT"] || "sperf.data"),
- verbose: ENV["SPERF_VERBOSE"] == "1",
- format: _sperf_format,
- stat: _sperf_stat)
+ _sperf_signal = case ENV["SPERF_SIGNAL"]
+ when nil then nil
+ when "false" then false
+ else ENV["SPERF_SIGNAL"].to_i
+ end
+ _sperf_start_opts = { frequency: (ENV["SPERF_FREQUENCY"] || 1000).to_i, mode: _sperf_mode,
+ output: _sperf_stat ? ENV["SPERF_OUTPUT"] : (ENV["SPERF_OUTPUT"] || "sperf.data"),
+ verbose: ENV["SPERF_VERBOSE"] == "1",
+ format: _sperf_format,
+ stat: _sperf_stat }
+ _sperf_start_opts[:signal] = _sperf_signal unless _sperf_signal.nil?
+ start(**_sperf_start_opts)
  at_exit { stop }
  end
 
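Reviewer note: the last hunk above maps the `SPERF_SIGNAL` environment variable onto the new `signal:` keyword. The sketch below isolates that mapping in a hypothetical helper, `parse_sperf_signal`, which does not exist in the gem, so the edge cases are easier to see.

```ruby
# Hypothetical helper mirroring the `case ENV["SPERF_SIGNAL"]` expression in the diff.
def parse_sperf_signal(value)
  case value
  when nil then nil       # variable unset: omit signal: so the extension uses its default
  when "false" then false # request the nanosleep timer thread
  else value.to_i         # signal number; String#to_i yields 0 for non-numeric input
  end
end

unset    = parse_sperf_signal(nil)
disabled = parse_sperf_signal("false")
number   = parse_sperf_signal("42")
garbage  = parse_sperf_signal("abc")
```

One observation, based only on reading the C hunk above: an Integer `0` is truthy for `RTEST`, so `SPERF_SIGNAL=0` or a non-numeric value appears to reach the `SIGRTMIN`..`SIGRTMAX` range check and raise `ArgumentError` rather than select the nanosleep thread, which the `signal: 0` comment in that hunk suggests was intended. This has not been verified against a built extension.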
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: sperf
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.1
  platform: ruby
  authors:
  - Koichi Sasada
@@ -53,6 +53,7 @@ files:
  - ext/sperf/extconf.rb
  - ext/sperf/sperf.c
  - lib/sperf.rb
+ - lib/sperf/version.rb
  homepage: "https://github.com/ko1/sperf"
  licenses:
  - MIT