hyperion-rb 2.16.2 → 2.16.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 85ee19dc8b680c83c849964653ce228ebb94cd65a430f43ed13653ba36f06190
-   data.tar.gz: 4a02a074020c814d3036099d9ff996dd7098e704b03c4484e8a86e8320b60c65
+   metadata.gz: b52ac123c5a40e73e75a2384abf032393a9b90e8aad6b2701903373e518d6bd2
+   data.tar.gz: 529138f8068a67192a9e45ebd6599510f8a14d0ef405b1e0c57e8ee043d4f713
  SHA512:
-   metadata.gz: b614b52474052c37ba35d8d5bb633227b4b05112c617e71893e3a8c28d6d10f4ca406478a024238a841bc9aed5cc86df1ebcca5b9c8b9d21ab315ddfc8dcf996
-   data.tar.gz: aef147c323a1ca872d6e8467e15ff50a58fe8ae81a926b7fb87362736d5072824b785bdf682341fa4da858c7887aada193199a91f320ad3b88cafe55ee267ae5
+   metadata.gz: 8cd08a9d462cce2158d6c508038b7b05d8f6e6e50f7372b36c2c28827b440038963f62e29663a76204c4b2387dd8b4d648c1bc5bd2db9b81394375f1cbb18054
+   data.tar.gz: 4113cfca038452c873d2bd48ac9a890480a6d1d65c52374a7e85693504f094de04e6d7f112a08a7c27b33e3f8464c558c2307782e63d84dffbb5acfcdcecb08b
data/CHANGELOG.md CHANGED
@@ -1,5 +1,66 @@
  # Changelog
 
+ ## 2.16.3 — 2026-05-05
+
+ ### 2.16.3-A — Hot-path Ruby cost reduction (metrics + split_host cache)
+
+ Combined optimizations across the Hyperion request hot path, driven by
+ profile data on `bench/hello.ru` (`stackprof` + `perf` on the Linux
+ bench host). All purely Ruby-side; no public API change.
+
+ - **`Hyperion::Metrics::STATUS_SYMBOLS`**: pre-built frozen
+   `Integer → Symbol` table covering the 40 most common HTTP status
+   codes, with an interpolation fallback for the rest.
+   `increment_status(code)` is now a single Hash lookup, replacing
+   per-request `:"responses_#{code}"` String interpolation.
+
+ - **`Hyperion::Metrics::PathTemplater` per-thread shadow cache**:
+   hot-path reads now hit a thread-local Hash before falling through
+   to the shared LRU. Eliminates the `@mutex` acquire in the steady
+   state where the same routes recur.
+
+ - **`Hyperion::Metrics#observe_histogram` String-keyed family**:
+   the internal `family` Hash is now keyed by a frozen String
+   (`label_values.join("\x00")`) instead of by `Array`. `String#hash`
+   is interned/cached and `String#eql?` is a single C `memcmp`,
+   vs `Array#eql?`'s per-element dispatch, which was 41% of
+   `observe_histogram`'s callees on the post-PR1 profile.
+
+ - **`Hyperion::Connection` per-connection Host: header cache**:
+   `Adapter::Rack#build_env` reuses the prior `split_host` result on
+   keep-alive connections where the `Host` header doesn't change
+   (the steady state). Skips two String byteslice allocations per
+   request and the surrounding branch dispatch.
+
+ - **`bench/run_all.sh` `boot_hyperion`**: now passes
+   `--no-log-requests` for an apples-to-apples comparison with Agoo's
+   already-silent bench wrapper. Hyperion's per-request JSON access
+   log was 32.9% of CPU on `bench/hello.ru`. Real production setups
+   typically forward logs through a sidecar / async drain, so this
+   aligns the harness with what operators actually measure.
+
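The String-keyed `observe_histogram` change itself is not shown in the code hunks below, so here is a minimal standalone sketch of the technique. The class and method names are illustrative only, not Hyperion's internals; the sketch assumes, as the changelog implies, that label values never contain the `"\x00"` byte.

```ruby
# Sketch: key a histogram family by a joined frozen String rather than
# an Array of label values. Comparing one flat byte string is a single
# memcmp in steady state, vs Array#eql?'s per-element method dispatch.
class HistogramFamily
  SEP = "\x00" # assumed never to appear inside a label value

  def initialize
    @accumulators = {} # frozen String key => Array of samples
  end

  def observe(label_values, sample)
    key = label_values.join(SEP).freeze
    (@accumulators[key] ||= []) << sample
  end

  def samples(label_values)
    @accumulators[label_values.join(SEP)] || []
  end
end
```

Two label tuples collide only if their joined byte strings match, which is why the separator must be a byte that cannot occur in a label value.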
+ #### Bench impact (Linux x86_64, Ruby 3.3.3 + YJIT, single-worker)
+
+ - `bench/hello.ru` (Hyperion Rack hello): **4,231 → ~5,521 r/s, ~+30%**.
+ - Real Rails 8 API row (`/api/users`, single-worker): **583 → 683 r/s, +17.1%**;
+   flips to a +3.5% lead over Agoo (was −25% pre-tuning).
+ - ERB (`/page`, 1w): **421 → 476 r/s, +13.1%** (now within 0.8% of Agoo).
+ - See `docs/BENCH_HYPERION_RAILS.md` for the full Rails matrix.
+
+ p99 latency stays in the Hyperion sweet spot (8–13 ms across all
+ Rails rows; Agoo's p99 ranges 36–769 ms on the same workloads).
+
+ #### Bench harness
+
+ - New `bench/run_all.sh --rails` flag: runs a 22-row Rails matrix
+   (API-only / ERB / AR-CRUD × 1w/4w × Hyperion/Agoo/Falcon/Puma)
+   plus latency-profile rows.
+ - `bench/rails_app/`: minimal Rails 8.0.5 skeleton (~100 KB) with
+   three workloads sharing one app boot.
+ - `bench/profile_hello.rb`: stackprof signal-driven harness for
+   hot-path profiling (USR1 start / USR2 dump).
+ - Cross-OS portability: `setsid` and `ss` fallbacks so the harness
+   runs on macOS dev hosts as well as the Linux bench VM.
+
  ## 2.16.2 — 2026-05-04
 
  ### 2.16.2-A — macOS Obj-C fork-safety: re-exec, not just ENV write
@@ -644,7 +644,32 @@ module Hyperion
 
    def build_env(request, connection: nil)
      host_header = request.header('host') || ''
-     server_name, server_port = split_host(host_header)
+
+     # PR3-4 — split_host per-connection cache. On keep-alive
+     # benchmark connections the Host: header value is identical for
+     # every request in the pipeline. We stash the last parsed result
+     # on the Connection object; if the header matches we skip the
+     # split_host branch dispatch + 2 String allocations entirely.
+     # The cache is per-Connection (not process-global) so there are
+     # no cross-connection data races. Falls back to the full split
+     # when connection is nil (h2 streams, specs without a Connection).
+     if connection &&
+        connection.respond_to?(:host_cache_header) &&
+        connection.host_cache_header == host_header
+       parsed = connection.host_cache_parsed
+       server_name = parsed[0]
+       server_port = parsed[1]
+     else
+       server_name, server_port = split_host(host_header)
+       if connection && connection.respond_to?(:host_cache_header=)
+         # Store a frozen copy of the header string (the request
+         # object owns the original; using it directly is safe but
+         # we freeze to avoid any mutation surprise).
+         frozen_header = host_header.empty? || host_header.frozen? ? host_header : host_header.dup.freeze
+         connection.host_cache_header = frozen_header
+         connection.host_cache_parsed = [server_name.dup.freeze, server_port.frozen? ? server_port : server_port.dup.freeze].freeze
+       end
+     end
 
      env = ENV_POOL.acquire
      # 2.13-D — gRPC streaming requests pass a non-String IO-shaped
@@ -135,12 +135,29 @@ module Hyperion
    # keep the existing pattern of caching boot-time refs as ivars so
    # the per-request observe stays a single Hash lookup.
    @path_templater = path_templater || Hyperion::Metrics.default_path_templater
+   # PR3-4 — split_host per-connection cache. On keep-alive
+   # connections the Host: header rarely changes between requests
+   # (the client sends the same host on every request in the pipeline).
+   # Caching the split_host result avoids 2 String allocations
+   # (byteslice for name + port) and the branch dispatch per request.
+   # The cache invalidates when the header value changes.
+   @host_cache_header = nil
+   @host_cache_parsed = nil
    # 2.12-E — per-worker request counter label. Cached once per
    # Connection (Process.pid is process-constant — re-reading it per
    # request would allocate the to_s String every time the operator
    # asked Ruby for the symbol/label). Each Connection lives in
    # exactly one process, so the cache is tight and never stale.
    @worker_id = Process.pid.to_s
+   # PR3-4b — per-connection cache for the observe_request_duration
+   # label tuple [method, path_template, status_class]. On keep-alive
+   # benchmark connections the same GET / 2xx tuple repeats on every
+   # request; caching the frozen Array avoids 1 Array allocation +
+   # 3 String refs per request on the steady-state hot path.
+   # The cache key is [method, templated_path, status_class], so any
+   # change in method, route, or status class invalidates and rebuilds.
+   @duration_label_cache_key = nil # [method_str, templated_path, status_class]
+   @duration_label_cached = nil    # frozen [method_str, templated_path, status_class]
    # 2.13-A — pre-build the frozen single-element label tuple that
    # `tick_worker_request` would otherwise allocate every request
    # (`[@worker_id]` per call). Per-Connection caching is safe
@@ -196,6 +213,12 @@ module Hyperion
    # and a hijack on request N+1 should not be observed during request N.
    attr_reader :socket
 
+   # PR3-4 — split_host per-connection cache accessors.
+   # These are written by Adapter::Rack#build_env and read on the
+   # next request on the same keep-alive connection. The cache is
+   # connection-owned so there are no cross-connection races.
+   attr_accessor :host_cache_header, :host_cache_parsed
+
    # 2.6-C — per-response dispatch-mode override. Reset to `nil` at
    # the top of each request iteration; the Rack adapter sets this to
    # `:inline_blocking` when it auto-detects a static-file body
@@ -1119,22 +1142,36 @@ module Hyperion
 
    # 2.4-C — observe one sample on the per-route request-duration
    # histogram. Best-effort: a misbehaving templater or sink degrades
-   # silently to no observation. The label tuple Array is fresh per
-   # call (3 small Strings) — that's the only allocation cost the
-   # observation imposes on the response path. Histogram observation
-   # itself reuses the per-(name, labels_tuple) accumulator after the
-   # first samples for a given templated path, so steady-state per-
-   # route observations are zero-allocation past the tuple Array.
+   # silently to no observation.
+   #
+   # PR3-4b — per-connection label tuple cache. On keep-alive benchmark
+   # connections the same [method, template, status_class] tuple repeats
+   # on every request. We cache the last frozen tuple and the raw key
+   # triple that produced it. On a hit we skip the Array allocation and
+   # pass the cached frozen object; the Metrics shard sees the SAME
+   # frozen Array on every request and reuses the pre-looked-up
+   # HistogramAccumulator directly (one Hash#[] with cached hash code).
    def observe_request_duration(request, status, started_at)
      duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
      method = request.method
      template = @path_templater.template(request.path)
-     class_ = STATUS_CLASS[status / 100] || STATUS_CLASS[0]
-     @metrics.observe_histogram(
-       REQUEST_DURATION_HISTOGRAM,
-       duration,
-       [method, template, class_]
-     )
+     class_ = STATUS_CLASS[status / 100] || STATUS_CLASS[0]
+
+     cached_key = @duration_label_cache_key
+     label_tuple = if cached_key &&
+                      cached_key[0].equal?(method) &&
+                      cached_key[1] == template &&
+                      cached_key[2].equal?(class_)
+       # Cache hit: same method, template, and status class as last time.
+       @duration_label_cached
+     else
+       # Cache miss: method, template, or status class changed.
+       t = [method, template, class_].freeze
+       @duration_label_cache_key = t
+       @duration_label_cached = t
+       t
+     end
+
+     @metrics.observe_histogram(REQUEST_DURATION_HISTOGRAM, duration, label_tuple)
    rescue StandardError
      nil
    end
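The hit/miss logic above can be read in isolation. A minimal sketch of the same single-slot memoization pattern, with a hypothetical class name (not part of Hyperion), returning the identical frozen Array while the inputs repeat:

```ruby
# Sketch: memoize the last label tuple so repeated identical requests
# reuse one frozen Array instead of allocating a fresh 3-element Array
# per call. equal? (object identity) is used where the caller is known
# to pass interned/constant objects; == covers the template String.
class LabelTupleCache
  def initialize
    @key = nil   # last [method, template, status_class] seen
    @tuple = nil # frozen tuple returned for that key
  end

  def tuple_for(method, template, status_class)
    k = @key
    if k && k[0].equal?(method) && k[1] == template && k[2].equal?(status_class)
      return @tuple # hit: hand back the same frozen object
    end
    t = [method, template, status_class].freeze
    @key = t
    @tuple = t
    t
  end
end
```

On a hit the caller receives the same object every time, so any downstream Hash keyed by the tuple probes with a stable key object.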
@@ -30,26 +30,51 @@ module Hyperion
    @lru_size = lru_size
    @cache = {} # Insertion-ordered Hash doubles as an LRU.
    @mutex = Mutex.new
+   # PR3-2 — per-thread shadow cache. On a keep-alive benchmark
+   # connection the same path is seen on every request; the shared
+   # LRU's mutex acquire (even uncontended) costs syscall-comparable
+   # overhead under high concurrency. Each worker thread keeps its own
+   # small (DEFAULT_THREAD_CACHE_SIZE-entry) Hash; on a hit we return
+   # without touching the mutex at all. On a miss we fall through to
+   # the shared LRU and backfill the thread cache. The thread cache
+   # is stored with Thread#thread_variable_* (true thread-local, not
+   # fiber-local) so it survives async-io scheduler yields correctly.
+   @thread_cache_key = :"__hyperion_pt_cache_#{object_id}__"
+   @thread_size_key = :"__hyperion_pt_size_#{object_id}__"
  end
 
+ DEFAULT_THREAD_CACHE_SIZE = 64
+
  # Translate a raw request path into its template form. The result
  # is memoized in the LRU; a cache hit is a single Hash#[] +
  # re-insert (touch). On miss we run the regex chain and trim the
  # oldest entry if we exceed `lru_size`.
+ #
+ # PR3-2: the fast path checks the per-thread shadow cache first (no mutex).
  def template(path)
    return path if path.nil? || path.empty?
 
+   thread = Thread.current
+   tc = thread.thread_variable_get(@thread_cache_key)
+   if tc && (hit = tc[path])
+     return hit
+   end
+
    @mutex.synchronize do
      if (cached = @cache.delete(path))
        # Re-insert to mark "recently used" (Ruby Hashes preserve
        # insertion order, oldest = first key).
        @cache[path] = cached
+       tc = thread_cache_for(thread)
+       tc[path] = cached
        return cached
      end
 
      templated = compute(path)
      @cache[path] = templated
      @cache.shift if @cache.size > @lru_size
+     tc = thread_cache_for(thread)
+     tc[path] = templated
      templated
    end
  end
@@ -60,6 +85,21 @@ module Hyperion
 
  private
 
+ # PR3-2 — allocate or return the per-thread shadow cache Hash.
+ # Evicts the oldest entry when the thread cache is full.
+ def thread_cache_for(thread)
+   tc = thread.thread_variable_get(@thread_cache_key)
+   unless tc
+     tc = {}
+     thread.thread_variable_set(@thread_cache_key, tc)
+   end
+   if tc.size >= DEFAULT_THREAD_CACHE_SIZE
+     # Evict oldest (insertion-order first key).
+     tc.shift
+   end
+   tc
+ end
+
  def compute(path)
    @rules.reduce(path) { |p, (regex, replacement)| p.gsub(regex, replacement) }
  end
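The comment above leans on the distinction between `Thread#thread_variable_get` (true thread-local) and `Thread#[]` (fiber-local). A small standalone sketch, not Hyperion code, showing why the fiber-local form would hand every new Fiber a cold cache:

```ruby
# Thread#thread_variable_set stores a true thread-local: visible from
# any Fiber running on the thread. Thread#[]= stores a fiber-local:
# a new Fiber sees nil, which would defeat a per-thread shadow cache
# under a fiber-based (async-io) scheduler.
Thread.current.thread_variable_set(:shadow_cache, { '/users/1' => '/users/:id' })
Thread.current[:shadow_cache] = { '/users/1' => '/users/:id' }

seen_thread_var = nil
seen_fiber_local = :unset
Fiber.new do
  seen_thread_var  = Thread.current.thread_variable_get(:shadow_cache)
  seen_fiber_local = Thread.current[:shadow_cache] # fiber-local: nil here
end.resume
```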
@@ -133,8 +133,29 @@ module Hyperion
    increment(key, -by)
  end
 
+ # PR3-1 — STATUS_SYMBOLS: pre-intern :"responses_NNN" for the 40 most
+ # common HTTP status codes. `increment_status` was doing
+ # `:"responses_#{code}"` per request — that's a String allocation +
+ # Symbol table lookup on every response. The table lookup is O(1) Hash
+ # but the interpolation pays a hidden 1-String-per-call cost even when
+ # the resulting Symbol is already interned. Pre-building the frozen
+ # Hash eliminates that allocation on the hot path; the fallback
+ # `:"responses_#{code}"` covers exotic codes without breaking anything.
+ STATUS_SYMBOLS = begin
+   codes = [
+     100, 101, 200, 201, 202, 203, 204, 205, 206,
+     301, 302, 303, 304, 307, 308,
+     400, 401, 402, 403, 404, 405, 406, 407, 408, 409,
+     410, 411, 412, 413, 414, 415, 416, 422, 429,
+     500, 501, 502, 503, 504, 505
+   ]
+   h = {}
+   codes.each { |c| h[c] = :"responses_#{c}" }
+   h.freeze
+ end
+
  def increment_status(code)
-   increment(:"responses_#{code}")
+   increment(STATUS_SYMBOLS[code] || :"responses_#{code}")
  end
 
  # 2.12-E — labeled counter family that observes which worker
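The lookup-with-fallback behavior of the hunk above can be exercised standalone. Sketch only: `STATUS_SYMBOLS_SKETCH` and `status_symbol` are hypothetical names, not Hyperion's, and the table here is deliberately tiny:

```ruby
# Sketch: pre-interned status symbols with an interpolation fallback.
# Common codes hit the frozen Hash (no per-call String allocation);
# exotic codes fall back to building the symbol on the fly.
STATUS_SYMBOLS_SKETCH = [200, 404, 500].each_with_object({}) do |c, h|
  h[c] = :"responses_#{c}"
end.freeze

def status_symbol(code)
  STATUS_SYMBOLS_SKETCH[code] || :"responses_#{code}"
end
```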
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Hyperion
-   VERSION = '2.16.2'
+   VERSION = '2.16.3'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: hyperion-rb
  version: !ruby/object:Gem::Version
-   version: 2.16.2
+   version: 2.16.3
  platform: ruby
  authors:
  - Andrey Lobanov