hyperion-rb 2.16.2 → 2.16.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 85ee19dc8b680c83c849964653ce228ebb94cd65a430f43ed13653ba36f06190
-   data.tar.gz: 4a02a074020c814d3036099d9ff996dd7098e704b03c4484e8a86e8320b60c65
+   metadata.gz: b52ac123c5a40e73e75a2384abf032393a9b90e8aad6b2701903373e518d6bd2
+   data.tar.gz: 529138f8068a67192a9e45ebd6599510f8a14d0ef405b1e0c57e8ee043d4f713
  SHA512:
-   metadata.gz: b614b52474052c37ba35d8d5bb633227b4b05112c617e71893e3a8c28d6d10f4ca406478a024238a841bc9aed5cc86df1ebcca5b9c8b9d21ab315ddfc8dcf996
-   data.tar.gz: aef147c323a1ca872d6e8467e15ff50a58fe8ae81a926b7fb87362736d5072824b785bdf682341fa4da858c7887aada193199a91f320ad3b88cafe55ee267ae5
+   metadata.gz: 8cd08a9d462cce2158d6c508038b7b05d8f6e6e50f7372b36c2c28827b440038963f62e29663a76204c4b2387dd8b4d648c1bc5bd2db9b81394375f1cbb18054
+   data.tar.gz: 4113cfca038452c873d2bd48ac9a890480a6d1d65c52374a7e85693504f094de04e6d7f112a08a7c27b33e3f8464c558c2307782e63d84dffbb5acfcdcecb08b
data/CHANGELOG.md CHANGED
@@ -1,5 +1,66 @@
  # Changelog
 
+ ## 2.16.3 — 2026-05-05
+
+ ### 2.16.3-A — Hot-path Ruby cost reduction (metrics + split_host cache)
+
+ Combined optimizations across the Hyperion request hot path, driven by
+ profile data on `bench/hello.ru` (`stackprof` + `perf` on the Linux
+ bench host). All purely Ruby-side; no public API change.
+
+ - **`Hyperion::Metrics::STATUS_SYMBOLS`**: pre-built frozen
+   `Integer → Symbol` table covering the 40 most common HTTP status
+   codes, with an interpolation fallback for the rest.
+   `increment_status(code)` is now a single Hash lookup, replacing
+   per-request `:"responses_#{code}"` String interpolation.
+
+ - **`Hyperion::Metrics::PathTemplater` per-thread shadow cache**:
+   hot-path reads now hit a thread-local Hash before falling through
+   to the shared LRU. Eliminates the `@mutex` acquire in the steady
+   state where the same routes recur.
+
+ - **`Hyperion::Metrics#observe_histogram` String-keyed family**:
+   the internal `family` Hash is now keyed by a frozen String
+   (`label_values.join("\x00")`) instead of by `Array`. `String#hash`
+   is interned/cached and `String#eql?` is a single C `memcmp`,
+   vs `Array#eql?`'s per-element dispatch, which was 41% of
+   `observe_histogram`'s callees on the post-PR1 profile.
+
+ - **`Hyperion::Connection` per-connection Host: header cache**:
+   `Adapter::Rack#build_env` reuses the prior `split_host` result on
+   keep-alive connections where the `Host` header doesn't change
+   (the steady state). Skips two String byteslice allocations per
+   request and the surrounding branch dispatch.
+
+ - **`bench/run_all.sh` `boot_hyperion`**: now passes
+   `--no-log-requests` for an apples-to-apples comparison with Agoo's
+   already-silent bench wrapper. Hyperion's per-request JSON access
+   log was 32.9% of CPU on `bench/hello.ru`. Real production setups
+   typically forward logs through a sidecar / async drain, so this
+   aligns the harness with what operators actually measure.
+
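The String-keyed `observe_histogram` change itself is not shown in the code hunks below, so here is a minimal standalone sketch of the technique. The class and method names are illustrative only, not Hyperion's internals; the sketch assumes, as the changelog implies, that label values never contain the `"\x00"` byte.

```ruby
# Sketch: key a histogram family by a joined frozen String rather than
# an Array of label values. Comparing one flat byte string is a single
# memcmp in steady state, vs Array#eql?'s per-element method dispatch.
class HistogramFamily
  SEP = "\x00" # assumed never to appear inside a label value

  def initialize
    @accumulators = {} # frozen String key => Array of samples
  end

  def observe(label_values, sample)
    key = label_values.join(SEP).freeze
    (@accumulators[key] ||= []) << sample
  end

  def samples(label_values)
    @accumulators[label_values.join(SEP)] || []
  end
end
```

Two label tuples collide only if their joined byte strings match, which is why the separator must be a byte that cannot occur in a label value.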
+ #### Bench impact (Linux x86_64, Ruby 3.3.3 + YJIT, single-worker)
+
+ - `bench/hello.ru` (Hyperion Rack hello): **4,231 → ~5,521 r/s, ~+30%**.
+ - Real Rails 8 API row (`/api/users`, single-worker): **583 → 683 r/s, +17.1%**;
+   flips to a +3.5% lead over Agoo (was −25% pre-tuning).
+ - ERB (`/page`, 1w): **421 → 476 r/s, +13.1%** (now within 0.8% of Agoo).
+ - See `docs/BENCH_HYPERION_RAILS.md` for the full Rails matrix.
+
+ p99 latency stays in the Hyperion sweet spot (8–13 ms across all
+ Rails rows; Agoo's p99 ranges 36–769 ms on the same workloads).
+
+ #### Bench harness
+
+ - New `bench/run_all.sh --rails` flag: runs a 22-row Rails matrix
+   (API-only / ERB / AR-CRUD × 1w/4w × Hyperion/Agoo/Falcon/Puma)
+   plus latency-profile rows.
+ - `bench/rails_app/`: minimal Rails 8.0.5 skeleton (~100 KB) with
+   three workloads sharing one app boot.
+ - `bench/profile_hello.rb`: stackprof signal-driven harness for
+   hot-path profiling (USR1 start / USR2 dump).
+ - Cross-OS portability: `setsid` and `ss` fallbacks so the harness
+   runs on macOS dev hosts as well as the Linux bench VM.
+
  ## 2.16.2 — 2026-05-04
 
  ### 2.16.2-A — macOS Obj-C fork-safety: re-exec, not just ENV write
@@ -644,7 +644,32 @@ module Hyperion
 
    def build_env(request, connection: nil)
      host_header = request.header('host') || ''
-     server_name, server_port = split_host(host_header)
+
+     # PR3-4 — split_host per-connection cache. On keep-alive
+     # benchmark connections the Host: header value is identical for
+     # every request in the pipeline. We stash the last parsed result
+     # on the Connection object; if the header matches we skip the
+     # split_host branch dispatch + 2 String allocations entirely.
+     # The cache is per-Connection (not process-global) so there are
+     # no cross-connection data races. Falls back to the full split
+     # when connection is nil (h2 streams, specs without a Connection).
+     if connection &&
+        connection.respond_to?(:host_cache_header) &&
+        connection.host_cache_header == host_header
+       parsed = connection.host_cache_parsed
+       server_name = parsed[0]
+       server_port = parsed[1]
+     else
+       server_name, server_port = split_host(host_header)
+       if connection && connection.respond_to?(:host_cache_header=)
+         # Store a frozen copy of the header string (the request
+         # object owns the original; using it directly is safe but
+         # we freeze to avoid any mutation surprise).
+         frozen_header = host_header.empty? || host_header.frozen? ? host_header : host_header.dup.freeze
+         connection.host_cache_header = frozen_header
+         connection.host_cache_parsed = [server_name.dup.freeze, server_port.frozen? ? server_port : server_port.dup.freeze].freeze
+       end
+     end
 
      env = ENV_POOL.acquire
      # 2.13-D — gRPC streaming requests pass a non-String IO-shaped
@@ -135,12 +135,29 @@ module Hyperion
    # keep the existing pattern of caching boot-time refs as ivars so
    # the per-request observe stays a single Hash lookup.
    @path_templater = path_templater || Hyperion::Metrics.default_path_templater
+   # PR3-4 — split_host per-connection cache. On keep-alive
+   # connections the Host: header rarely changes between requests
+   # (the client sends the same host on every request in the pipeline).
+   # Caching the split_host result avoids 2 String allocations
+   # (byteslice for name + port) and the branch dispatch per request.
+   # The cache invalidates when the header value changes.
+   @host_cache_header = nil
+   @host_cache_parsed = nil
    # 2.12-E — per-worker request counter label. Cached once per
    # Connection (Process.pid is process-constant — re-reading it per
    # request would allocate the to_s String every time the operator
    # asked Ruby for the symbol/label). Each Connection lives in
    # exactly one process, so the cache is tight and never stale.
    @worker_id = Process.pid.to_s
+   # PR3-4b — per-connection cache for the observe_request_duration
+   # label tuple [method, path_template, status_class]. On keep-alive
+   # benchmark connections the same GET / 2xx tuple repeats on every
+   # request; caching the frozen Array avoids 1 Array allocation +
+   # 3 String refs per request on the steady-state hot path.
+   # The cache key is [method, templated_path, status_class], so any
+   # change in method, route, or status class invalidates and rebuilds.
+   @duration_label_cache_key = nil # [method_str, templated_path, status_class]
+   @duration_label_cached = nil    # frozen [method_str, templated_path, status_class]
    # 2.13-A — pre-build the frozen single-element label tuple that
    # `tick_worker_request` would otherwise allocate every request
    # (`[@worker_id]` per call). Per-Connection caching is safe
@@ -196,6 +213,12 @@ module Hyperion
    # and a hijack on request N+1 should not be observed during request N.
    attr_reader :socket
 
+   # PR3-4 — split_host per-connection cache accessors.
+   # These are written by Adapter::Rack#build_env and read on the
+   # next request on the same keep-alive connection. The cache is
+   # connection-owned so there are no cross-connection races.
+   attr_accessor :host_cache_header, :host_cache_parsed
+
    # 2.6-C — per-response dispatch-mode override. Reset to `nil` at
    # the top of each request iteration; the Rack adapter sets this to
    # `:inline_blocking` when it auto-detects a static-file body
@@ -1119,22 +1142,36 @@ module Hyperion
 
    # 2.4-C — observe one sample on the per-route request-duration
    # histogram. Best-effort: a misbehaving templater or sink degrades
-   # silently to no observation. The label tuple Array is fresh per
-   # call (3 small Strings) — that's the only allocation cost the
-   # observation imposes on the response path. Histogram observation
-   # itself reuses the per-(name, labels_tuple) accumulator after the
-   # first samples for a given templated path, so steady-state per-
-   # route observations are zero-allocation past the tuple Array.
+   # silently to no observation.
+   #
+   # PR3-4b — per-connection label tuple cache. On keep-alive benchmark
+   # connections the same [method, template, status_class] tuple repeats
+   # on every request. We cache the last frozen tuple and the raw key
+   # triple that produced it. On a hit we skip the Array allocation and
+   # pass the cached frozen object; the Metrics shard sees the SAME
+   # frozen Array on every request and reuses the pre-looked-up
+   # HistogramAccumulator directly (one Hash#[] with cached hash code).
    def observe_request_duration(request, status, started_at)
      duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
      method = request.method
      template = @path_templater.template(request.path)
-     class_ = STATUS_CLASS[status / 100] || STATUS_CLASS[0]
-     @metrics.observe_histogram(
-       REQUEST_DURATION_HISTOGRAM,
-       duration,
-       [method, template, class_]
-     )
+     class_ = STATUS_CLASS[status / 100] || STATUS_CLASS[0]
+
+     cached_key = @duration_label_cache_key
+     label_tuple = if cached_key &&
+                      cached_key[0].equal?(method) &&
+                      cached_key[1] == template &&
+                      cached_key[2].equal?(class_)
+       # Cache hit: same method, template, and status class as last time.
+       @duration_label_cached
+     else
+       # Cache miss: method, template, or status class changed.
+       t = [method, template, class_].freeze
+       @duration_label_cache_key = t
+       @duration_label_cached = t
+       t
+     end
+
+     @metrics.observe_histogram(REQUEST_DURATION_HISTOGRAM, duration, label_tuple)
    rescue StandardError
      nil
    end
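The hit/miss logic above can be read in isolation. A minimal sketch of the same single-slot memoization pattern, with a hypothetical class name (not part of Hyperion), returning the identical frozen Array while the inputs repeat:

```ruby
# Sketch: memoize the last label tuple so repeated identical requests
# reuse one frozen Array instead of allocating a fresh 3-element Array
# per call. equal? (object identity) is used where the caller is known
# to pass interned/constant objects; == covers the template String.
class LabelTupleCache
  def initialize
    @key = nil   # last [method, template, status_class] seen
    @tuple = nil # frozen tuple returned for that key
  end

  def tuple_for(method, template, status_class)
    k = @key
    if k && k[0].equal?(method) && k[1] == template && k[2].equal?(status_class)
      return @tuple # hit: hand back the same frozen object
    end
    t = [method, template, status_class].freeze
    @key = t
    @tuple = t
    t
  end
end
```

On a hit the caller receives the same object every time, so any downstream Hash keyed by the tuple probes with a stable key object.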
@@ -30,26 +30,51 @@ module Hyperion
    @lru_size = lru_size
    @cache = {} # Insertion-ordered Hash doubles as an LRU.
    @mutex = Mutex.new
+   # PR3-2 — per-thread shadow cache. On a keep-alive benchmark
+   # connection the same path is seen on every request; the shared
+   # LRU's mutex acquire (even uncontended) costs syscall-comparable
+   # overhead under high concurrency. Each worker thread keeps its own
+   # small (DEFAULT_THREAD_CACHE_SIZE-entry) Hash; on a hit we return
+   # without touching the mutex at all. On a miss we fall through to
+   # the shared LRU and backfill the thread cache. The thread cache
+   # is stored with Thread#thread_variable_* (true thread-local, not
+   # fiber-local) so it survives async-io scheduler yields correctly.
+   @thread_cache_key = :"__hyperion_pt_cache_#{object_id}__"
+   @thread_size_key = :"__hyperion_pt_size_#{object_id}__"
  end
 
+ DEFAULT_THREAD_CACHE_SIZE = 64
+
  # Translate a raw request path into its template form. The result
  # is memoized in the LRU; a cache hit is a single Hash#[] +
  # re-insert (touch). On miss we run the regex chain and trim the
  # oldest entry if we exceed `lru_size`.
+ #
+ # PR3-2: the fast path checks the per-thread shadow cache first (no mutex).
  def template(path)
    return path if path.nil? || path.empty?
 
+   thread = Thread.current
+   tc = thread.thread_variable_get(@thread_cache_key)
+   if tc && (hit = tc[path])
+     return hit
+   end
+
    @mutex.synchronize do
      if (cached = @cache.delete(path))
        # Re-insert to mark "recently used" (Ruby Hashes preserve
        # insertion order, oldest = first key).
        @cache[path] = cached
+       tc = thread_cache_for(thread)
+       tc[path] = cached
        return cached
      end
 
      templated = compute(path)
      @cache[path] = templated
      @cache.shift if @cache.size > @lru_size
+     tc = thread_cache_for(thread)
+     tc[path] = templated
      templated
    end
  end
@@ -60,6 +85,21 @@ module Hyperion
 
  private
 
+ # PR3-2 — allocate or return the per-thread shadow cache Hash.
+ # Evicts the oldest entry when the thread cache is full.
+ def thread_cache_for(thread)
+   tc = thread.thread_variable_get(@thread_cache_key)
+   unless tc
+     tc = {}
+     thread.thread_variable_set(@thread_cache_key, tc)
+   end
+   if tc.size >= DEFAULT_THREAD_CACHE_SIZE
+     # Evict oldest (insertion-order first key).
+     tc.shift
+   end
+   tc
+ end
+
  def compute(path)
    @rules.reduce(path) { |p, (regex, replacement)| p.gsub(regex, replacement) }
  end
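The comment above leans on the distinction between `Thread#thread_variable_get` (true thread-local) and `Thread#[]` (fiber-local). A small standalone sketch, not Hyperion code, showing why the fiber-local form would hand every new Fiber a cold cache:

```ruby
# Thread#thread_variable_set stores a true thread-local: visible from
# any Fiber running on the thread. Thread#[]= stores a fiber-local:
# a new Fiber sees nil, which would defeat a per-thread shadow cache
# under a fiber-based (async-io) scheduler.
Thread.current.thread_variable_set(:shadow_cache, { '/users/1' => '/users/:id' })
Thread.current[:shadow_cache] = { '/users/1' => '/users/:id' }

seen_thread_var = nil
seen_fiber_local = :unset
Fiber.new do
  seen_thread_var  = Thread.current.thread_variable_get(:shadow_cache)
  seen_fiber_local = Thread.current[:shadow_cache] # fiber-local: nil here
end.resume
```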
@@ -133,8 +133,29 @@ module Hyperion
    increment(key, -by)
  end
 
+ # PR3-1 — STATUS_SYMBOLS: pre-intern :"responses_NNN" for the 40 most
+ # common HTTP status codes. `increment_status` was doing
+ # `:"responses_#{code}"` per request — that's a String allocation +
+ # Symbol table lookup on every response. The table lookup is O(1) Hash
+ # but the interpolation pays a hidden 1-String-per-call cost even when
+ # the resulting Symbol is already interned. Pre-building the frozen
+ # Hash eliminates that allocation on the hot path; the fallback
+ # `:"responses_#{code}"` covers exotic codes without breaking anything.
+ STATUS_SYMBOLS = begin
+   codes = [
+     100, 101, 200, 201, 202, 203, 204, 205, 206,
+     301, 302, 303, 304, 307, 308,
+     400, 401, 402, 403, 404, 405, 406, 407, 408, 409,
+     410, 411, 412, 413, 414, 415, 416, 422, 429,
+     500, 501, 502, 503, 504, 505
+   ]
+   h = {}
+   codes.each { |c| h[c] = :"responses_#{c}" }
+   h.freeze
+ end
+
  def increment_status(code)
-   increment(:"responses_#{code}")
+   increment(STATUS_SYMBOLS[code] || :"responses_#{code}")
  end
 
  # 2.12-E — labeled counter family that observes which worker
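The lookup-with-fallback behavior of the hunk above can be exercised standalone. Sketch only: `STATUS_SYMBOLS_SKETCH` and `status_symbol` are hypothetical names, not Hyperion's, and the table here is deliberately tiny:

```ruby
# Sketch: pre-interned status symbols with an interpolation fallback.
# Common codes hit the frozen Hash (no per-call String allocation);
# exotic codes fall back to building the symbol on the fly.
STATUS_SYMBOLS_SKETCH = [200, 404, 500].each_with_object({}) do |c, h|
  h[c] = :"responses_#{c}"
end.freeze

def status_symbol(code)
  STATUS_SYMBOLS_SKETCH[code] || :"responses_#{code}"
end
```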
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Hyperion
-   VERSION = '2.16.2'
+   VERSION = '2.16.3'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: hyperion-rb
  version: !ruby/object:Gem::Version
-   version: 2.16.2
+   version: 2.16.3
  platform: ruby
  authors:
  - Andrey Lobanov