hyperion-rb 2.10.1 → 2.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 874baf5de450eacd4df0b5cd91f9aaa3a8d86214d099f50ed81af3416085a04c
4
- data.tar.gz: ca1ce4f0c3f31e9399cc8b16cd461e9275b54f630cb9ea7bef9828658e490633
3
+ metadata.gz: 70fee9940bc27a68927bb0204717bb35da7ea695bc02fe49aaa6c9597c9dd78d
4
+ data.tar.gz: aaa583caa320c605bb06e2894c3f5452d183b4aa5f1ab27329a2a3a89e1ef900
5
5
  SHA512:
6
- metadata.gz: 9d2958ba3850c6df36250f91c21b9d1f3901acde0a74e9c7b8aaf09116f80729c25414b5e50397c82ca083fc9dbca1b7e4b485f34441c659512644b6edb500af
7
- data.tar.gz: bab657069a157759b5c7305f1151fc5795a0b3295958591ac91b55901bd29883e41660a2ba87970c496035b53d3ca8dbda393eb03eb20d55801d2003e58be0ce
6
+ metadata.gz: 3c28ad4d534e00eb3b687903da63ae7c04c918d27081d9ac73e04a67d521e2abd832f100b20d87b3411fa9c7be5abdcda2f8638ca8c139ef913fbd0f1ad1930a
7
+ data.tar.gz: debe988e4dc461376df38bdd84833a2f608dcfb8bf2dc420fc086e89970607b1c36d632dc1057b4eb9983f83918fdc60d93ee87a10098f6f775ccd4207611fb1
data/CHANGELOG.md CHANGED
@@ -1,5 +1,210 @@
1
1
  # Changelog
2
2
 
3
+ ## 2.11.0 — 2026-05-01
4
+
5
+ ### 2.11-B — HPACK FFI marshalling round-2 (cglue confirmed as firm default; +43% v3 vs v2 on Rails-shape h2)
6
+
7
+ **Bench result (openclaw-vm, 25-header h2load -c 1 -m 100 -n 5000, 3 runs/variant, median):**
8
+
9
+ | Variant | Env value | Median rps |
10
+ |---|---|---:|
11
+ | Ruby fallback | `=off` | 1585.35 r/s |
12
+ | Native v2 (Fiddle, forced) | `=v2` | 1602.27 r/s |
13
+ | Native v3 (CGlue, forced) | `=cglue` | 2291.44 r/s |
14
+
15
+ | Delta | Value |
16
+ |---|---:|
17
+ | native (v2) vs ruby | +1.1% (within noise) |
18
+ | **cglue (v3) vs native (v2)** | **+43.0%** (HEADLINE — Fiddle marshalling overhead) |
19
+ | cglue (v3) vs ruby | +44.5% (total native win) |
20
+
21
+ **Decision: flip cglue default ON.** The bench cleanly attributes the
22
+ +18-44% native-vs-ruby headline to the C-glue path's elimination of
23
+ per-call Fiddle marshalling, *not* to the underlying Rust HPACK
24
+ encoder. With cglue forced off, native v2 is +1-5% over ruby —
25
+ basically noise on this header count. The 2.5-B headline ("+18%
26
+ native vs ruby") was actually measuring v3 vs ruby because `=1`
27
+ silently picked v3 on cglue-available hosts; 2.11-B's `=v2` token
28
+ made the v2-only number measurable for the first time.
29
+
30
+ The `:auto` resolved state (unset / `=1` / `=true`) was already
31
+ selecting cglue when available since 2.4-A; this round confirms
32
+ that selection and updates the boot-log mode string to advertise
33
+ cglue as the de jure default. The runtime behavior on a host where
34
+ cglue is available is unchanged from 2.10 — the de facto cglue
35
+ selection becomes de jure, with a `default since 2.11-B` marker
36
+ in the human-readable `mode` log field.
37
+
40
+ The 2.5-B Rails-shape bench measured native v3 (CGlue) at +18% over
41
+ the Ruby fallback on a 25-header response — comfortably above the
42
+ +15% flip threshold, which moved the native-vs-Ruby default to ON.
43
+ The remaining open question 2.5-B punted on: how much of that +18%
44
+ is the Rust HPACK encoder, and how much is the C-glue path's
45
+ elimination of per-call Fiddle marshalling? A direct A/B was
46
+ impossible at the time because `=1` always picked v3 on hosts
47
+ where the C glue had installed successfully.
48
+
49
+ **Surface change.** `HYPERION_H2_NATIVE_HPACK` now accepts an
50
+ explicit native-mode token alongside the legacy Boolean values.
51
+ The legacy values still resolve to the same physical path they
52
+ did before, so this is back-compat for every operator who set the
53
+ env var pre-2.11-B:
54
+
55
+ | Value | Resolves to | Pre-2.11-B behavior | 2.11-B behavior |
56
+ |---|---|---|---|
57
+ | (unset) / `1` / `true` / `yes` / `on` / `auto` | `:auto` | native, prefer cglue | native, prefer cglue (unchanged) |
58
+ | `cglue` / `v3` | `:cglue` | (same as `1`) | force cglue, warn-fallback to v2 |
59
+ | `v2` / `fiddle` | `:v2` | (same as `1`) | force v2/Fiddle (skip cglue even if installed) |
60
+ | `0` / `false` / `no` / `off` / `ruby` | `:off` | ruby fallback | ruby fallback (unchanged) |
61
+
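The resolution the table describes can be sketched as a standalone helper (names here are illustrative; the shipped implementation is `resolve_h2_native_hpack_state` on `Http2Handler`, shown later in this diff):

```ruby
# Minimal sketch of the table above: map a HYPERION_H2_NATIVE_HPACK
# value to its resolved native-mode symbol. Unknown values fall
# through to :auto (the forgiving-default policy described below).
def resolve_native_mode(value)
  return :auto if value.nil? || value.empty?

  case value.downcase
  when '0', 'false', 'no', 'off', 'ruby' then :off
  when 'cglue', 'v3'                     then :cglue
  when 'v2', 'fiddle'                    then :v2
  else :auto # covers 1/true/yes/on/auto and any typo'd value
  end
end
```

The `else :auto` arm is what keeps a typo'd env value from crashing a boot — same policy the pre-2.11-B resolver had.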
62
+ The new `=v2` value is the bench-isolation knob `bench/h2_rails_shape.sh`
63
+ needs: without it the harness's "native" variant silently picked v3
64
+ on bench hosts where the C glue loaded successfully, making the
65
+ v2-vs-v3 delta unmeasurable. With `=v2` the operator can force the
66
+ Fiddle path on a host where cglue is physically present.
67
+
68
+ **Implementation.** `Hyperion::H2Codec.cglue_active?` overlays
69
+ `cglue_available?` with an operator-controllable gate
70
+ (`H2Codec.cglue_disabled = true|false`). The Encoder/Decoder hot
71
+ paths probe `cglue_active?` (was `cglue_available?`); one extra
72
+ ivar read per encode call which YJIT inlines away. `Http2Handler`
73
+ sets the gate at construction based on the resolved native-mode
74
+ state. The gate is global (the codec module is a singleton); the
75
+ handler resets it on every construction so a `=v2` boot can't leak
76
+ the disable into a subsequent default-mode handler.
77
+
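The gate overlay reduces to a few lines of module-level state; a minimal sketch (the availability probe is stubbed to `true` here — the real `H2Codec` probes the filesystem):

```ruby
# Sketch of the 2.11-B gate: cglue_active? overlays availability with
# an operator-settable disable. Module ivars model the singleton state.
module H2CodecGate
  @cglue_available = true  # stand-in for the real availability probe
  @cglue_disabled = false

  def self.cglue_disabled=(value)
    @cglue_disabled = value ? true : false
  end

  def self.cglue_disabled
    @cglue_disabled == true
  end

  def self.cglue_active?
    @cglue_available && !@cglue_disabled
  end
end
```

Hot paths probe `cglue_active?`; the handler writes `cglue_disabled=` once at construction, so a `=v2` boot flips the gate closed without touching availability state.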
78
+ **Boot log.** The `h2 codec selected` log line gains a new `native_mode`
79
+ field exposing the operator-requested mode (`auto` / `cglue` / `v2` /
80
+ `off` / `cglue-requested-unavailable`). The existing `hpack_path`
81
+ field continues to be one of `pure-ruby` / `native-v2` / `native-v3` —
82
+ unchanged, ops dashboards keying off it keep working. The `mode`
83
+ human-readable string differentiates `forced` from `auto` selections
84
+ so a misconfigured `=cglue` boot on a host without the C glue is
85
+ visible in one log line instead of requiring a process trace.
86
+
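For a default (`:auto`) boot on a host where the C glue loaded, the resulting payload has this shape (field names from the implementation in this diff; values illustrative):

```ruby
# Illustrative 'h2 codec selected' boot-log payload. Ops dashboards
# key off hpack_path (unchanged); native_mode is the new 2.11-B field.
boot_log = {
  message: 'h2 codec selected',
  mode: 'native (Rust v3 / CGlue, default since 2.11-B) — HPACK on hot path, no Fiddle per call',
  native_available: true,
  native_enabled: true,
  native_mode: 'auto',
  cglue_active: true,
  hpack_path: 'native-v3'
}
```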
87
+ **Bench harness.** `bench/h2_rails_shape.sh` now runs three variants
88
+ (`ruby`, `native`, `cglue`) instead of two and emits
89
+ `delta_native_vs_ruby` (informational — should reproduce the 2.5-B
90
+ +18% headline) plus `delta_cglue_vs_native` (the headline for
91
+ 2.11-B — does cutting per-call Fiddle marshalling buy anything on
92
+ top of the v2 path?). The decision rule keys off the cglue-vs-native
93
+ delta:
94
+
95
+ | Outcome (cglue vs native) | Action |
96
+ |---|---|
97
+ | ≥ +15% rps | Flip cglue default ON (replace 2.5-B's auto-cglue dance) |
98
+ | Parity / +5-10% (within noise) | Keep cglue opt-in, file as deferred |
99
+ | ≤ −2% (regression) | Investigate, do not ship |
100
+
101
+ Each variant runs 3x, output is the median.
102
+
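The deltas in the tables above follow from the median-of-3 rule plus a plain percentage formula; as a sketch (helper names illustrative, inputs from the bench table):

```ruby
# Median over an odd run count (the harness does 3 runs per variant),
# plus the percentage delta behind the headline numbers.
def median(samples)
  sorted = samples.sort
  sorted[sorted.length / 2] # odd counts only, which is all the harness uses
end

def delta_pct(new_rps, base_rps)
  ((new_rps / base_rps) - 1.0) * 100.0
end
```

Feeding in the median rps values from the bench table reproduces the reported +1.1% (v2 vs ruby), +43.0% (cglue vs v2), and +44.5% (cglue vs ruby) deltas.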
103
+ **Spec coverage.** `spec/hyperion/h2_codec_native_mode_spec.rb` — 17
104
+ new examples covering: each native-mode token resolves to the
105
+ expected `hpack_path`; `=cglue` on a host without the C glue logs
106
+ `native_mode=cglue-requested-unavailable` and falls through to v2;
107
+ `=v2` actually flips `H2Codec.cglue_active?` to false even when
108
+ cglue is available; the three bench-variant tokens
109
+ (`{off,v2,cglue}`) produce the three distinct `hpack_path` values
110
+ the harness compares.
111
+
112
+ ### 2.11-A — h2 first-stream TLS handshake parallelization (Bucket 2: pre-spawned dispatch worker pool)
113
+
114
+ The 2.10-G TCP_NODELAY fix lifted the ~40 ms h2 max-latency ceiling
115
+ that was paid by every stream. With the per-stream Nagle/delayed-ACK
116
+ noise gone, the **first-stream cold cost** became isolatable via the
117
+ 2.10-G `HYPERION_H2_TIMING=1` instrumentation. Reading the breakdown
118
+ on `h2load -c 1 -m 100 -n 5000 https://localhost/`:
119
+
120
+ | Bucket | Master baseline | After 2.11-A |
121
+ |---|---:|---:|
122
+ | `t0_to_t1_ms` (preface exchange) | 0.3-1.7 ms | 0.6-1.2 ms |
123
+ | `t1_to_t2_enc_ms` (preface→first stream encoded — **dominant bucket**) | **12-25 ms** | **m=1: 1.0-1.4 ms**, m=100: 13-18 ms |
124
+ | `t2_enc_to_t2_wire_ms` (first stream encode → first byte on wire) | -10 to -27 ms\* | -13 to -17 ms\* |
125
+ | `t0_to_t2_wire_ms` (preface bytes on wire) | 0.9-3.4 ms | 1.1-1.9 ms |
126
+
127
+ \* The `t2_enc_to_t2_wire_ms` slot reflects "preface SETTINGS bytes
128
+ on the wire" minus "first stream HEADERS encoded" — the writer
129
+ fiber's `||=` capture lands on the preface bytes (always written
130
+ first), not the response. The negative value is expected and
131
+ documents the preface→response gap at the writer-fiber boundary.
132
+
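The capture mechanics the footnote relies on — one `||=` write per slot, so only the first event lands — can be sketched as follows (an illustrative shape, not the shipped instrumentation; the injectable clock exists only for the example):

```ruby
# Each slot is written at most once: ||= only evaluates the clock while
# the slot is still nil, so later marks are no-ops. Deltas come out in ms.
class TimingSlots
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @slots = {}
  end

  def mark(name)
    @slots[name] ||= @clock.call # first write wins
  end

  def delta_ms(from, to)
    (@slots[to] - @slots[from]) * 1000.0
  end
end
```

This is why the writer-fiber `t2_wire` capture lands on the preface SETTINGS bytes (always written first) and never on the response, producing the documented negative bucket.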
133
+ **The dominant bucket was `t1_to_t2_enc_ms`** — preface complete to
134
+ first stream's HEADERS+DATA encoded. On the cold-stream `m=1` path,
135
+ the ~3-13 ms gap is dominated by lazy `task.async {}` fiber spawn
136
+ on the connection-loop fiber's `ready_ids` tick (under the Async
137
+ scheduler, the first `task.async` from a cold fiber pays scheduler
138
+ bookkeeping that warmer paths amortize away).
139
+
140
+ **Fix.** Pre-spawn a fixed pool of `N` dispatch worker fibers (default
141
+ `4`, configurable via `HYPERION_H2_DISPATCH_POOL`) inside `serve`
142
+ BEFORE `read_connection_preface` returns. Each worker parks on a
143
+ new per-connection `Async::Queue` exposed off `WriterContext#dispatch_queue`.
144
+ When a stream becomes ready, the connection-loop fiber pushes onto
145
+ the queue; a parked worker grabs it and calls `dispatch_stream`.
146
+
147
+ The first stream is now an enqueue+dequeue handoff (microseconds)
148
+ instead of a `task.async {}` cold spawn. Streams that arrive while
149
+ the queue is non-empty (workers all busy on prior streams) fall
150
+ back to ad-hoc `task.async {}` so concurrency is never artificially
151
+ capped — the operator-facing knob is `h2.max_concurrent_streams`,
152
+ not the pool size.
153
+
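The handoff pattern can be sketched with plain threads and a `Thread::Queue` standing in for the fibers and `Async::Queue` (illustrative only — the shipped pool is fiber-based and per-connection):

```ruby
# Pre-spawn workers that park on queue.pop. A ready "stream" then costs
# an enqueue/dequeue handoff instead of a per-stream spawn.
def run_pool(streams, pool_size: 4)
  queue = Queue.new
  done = Queue.new
  workers = pool_size.times.map do
    Thread.new do
      while (stream = queue.pop) # nil is the shutdown signal
        done << "dispatched:#{stream}" # stands in for dispatch_stream
      end
    end
  end
  streams.each { |s| queue << s }  # connection-loop side: just an enqueue
  results = streams.map { done.pop }
  pool_size.times { queue << nil } # unpark and exit every worker
  workers.each(&:join)
  results.sort
end
```

The spill-to-ad-hoc-spawn behavior when all workers are busy is not modeled here; in the real handler that fallback is what keeps `h2.max_concurrent_streams`, not the pool size, as the concurrency limit.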
154
+ **Bench delta on openclaw-vm (single-worker, h2load → localhost TLS h2):**
155
+
156
+ | | Master | 2.11-A | Δ |
157
+ |---|---:|---:|---:|
158
+ | `m=1 -n 50` cold first-run, time-to-1st-byte | 20.28 ms | **9.28 ms** | **-54%** |
159
+ | `m=1 -n 50` warm avg, time-to-1st-byte | 5.93 ms | 7.20 ms | +21% (within run-to-run noise) |
160
+ | `m=100 -n 5000` 10-run avg, time-to-1st-byte | 19.6 ms | 19.1 ms | parity |
161
+ | `m=100 -n 5000` 10-run avg, throughput | 2742 r/s | 2893 r/s | +5.5% |
162
+ | `m=1 -n 50` `t1_to_t2_enc_ms` (instrumented, cold) | 3.4 ms | **1.0-1.4 ms** | **-66%** |
163
+
164
+ **Cold first-stream cost is roughly halved** on `m=1` (the actual
165
+ single-stream cold-connection path). The `m=100` path is dominated
166
+ by sequential client-frame reads on the connection-loop fiber, not
167
+ fiber-spawn cost — the fix doesn't move that needle but doesn't
168
+ regress it either.
169
+
170
+ **Sub-fix folded in.**
171
+ * **Pre-resolve `peer_address`** before `read_connection_preface`.
172
+ The `peeraddr` syscall was previously paid on the hot path between
173
+ preface read and first dispatch; moving it earlier overlaps with
174
+ the writer fiber's first-tick scheduling.
175
+
176
+ **Operator surface.**
177
+
178
+ * `HYPERION_H2_DISPATCH_POOL=<N>` — set the pre-warmed dispatch
179
+ worker count per connection. Default `4`. Ceiling `16` (guards
180
+ against pathological configs spawning hundreds of idle fibers
181
+ per accepted connection). Invalid / non-positive values fall
182
+ back to the default rather than crashing the connection — this
183
+ is a tuning knob, not a spec parameter.
184
+ * `WriterContext#dispatch_queue` — the per-connection
185
+ `Async::Queue` workers park on; bench harnesses can introspect.
186
+ * `WriterContext#dispatch_worker_count` — live count of workers
187
+ currently registered (parked or actively dispatching). Useful
188
+ for diagnostics endpoints that want to surface "this connection's
189
+ pool is saturated".
190
+
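The clamp-and-fallback parsing described above looks like this (mirroring the `resolve_dispatch_pool_size` implementation shown later in this diff):

```ruby
DISPATCH_POOL_DEFAULT = 4
DISPATCH_POOL_MAX = 16

# Malformed or non-positive input falls back to the default; valid
# input is clamped to the ceiling. Never raises into the connection.
def resolve_pool_size(raw)
  return DISPATCH_POOL_DEFAULT if raw.nil? || raw.strip.empty?

  n = Integer(raw.strip, 10)
  return DISPATCH_POOL_DEFAULT unless n.positive?

  [n, DISPATCH_POOL_MAX].min
rescue ArgumentError, TypeError
  DISPATCH_POOL_DEFAULT
end
```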
191
+ **Constraints preserved.**
192
+ * TCP_NODELAY (2.10-G) hunk in `apply_tcp_nodelay` is untouched.
193
+ * Static asset preload + immutable hooks (2.10-E) are untouched.
194
+ * C-ext fast-path response writer (2.10-F) is untouched.
195
+ * `HYPERION_H2_TIMING=1` instrumentation continues to fire and
196
+ emits the same `'h2 first-stream timing'` log shape (the four
197
+ deltas + total). Locked by spec.
198
+
199
+ **Specs.** 12 new examples in `spec/hyperion/http2_dispatch_pool_spec.rb`
200
+ covering the WriterContext extensions (queue + worker count + register/
201
+ unregister), `resolve_dispatch_pool_size` env-var parsing (default,
202
+ override, invalid input, ceiling), the pool warmup contract (workers
203
+ registered, workers process queued items, one bad stream doesn't
204
+ poison the pool), and a TLS+curl end-to-end smoke (the timing log
205
+ shape continues to fire after the warmup hook is added). Spec count
206
+ **1060 → 1072**, 0 failures.
207
+
3
208
  ## 2.10.1 — 2026-05-01
4
209
 
5
210
  ### 2.10-F — C-ext fast-path response writer for prebuilt responses
data/README.md CHANGED
@@ -11,6 +11,39 @@ gem install hyperion-rb
11
11
  bundle exec hyperion config.ru
12
12
  ```
13
13
 
14
+ ## What's new in 2.11.0
15
+
16
+ **h2 cold-stream latency cut + native HPACK CGlue flipped to default.**
17
+ Two perf wins on top of 2.10:
18
+
19
+ - **2.11-A — h2 first-stream TLS handshake parallelization.** The
20
+ 2.10-G `HYPERION_H2_TIMING=1` instrumentation, run against the
21
+ TCP_NODELAY-fixed handler, isolated the residual cold-stream cost
22
+ to **bucket 2**: lazy `task.async {}` fiber spawn for the first
23
+ stream of every connection. Fix: pre-spawn a stream-dispatch fiber
24
+ pool at connection accept (configurable via `HYPERION_H2_DISPATCH_POOL`,
25
+ default 4, ceiling 16). h2load `-c 1 -m 1 -n 50` cold first-run:
26
+ **time-to-1st-byte 20.28 → 9.28 ms (−54%); m=100 throughput +5.5%**.
27
+ Warm steady-state unchanged (no head-of-line blocking under the small
28
+ pool — backlog still spills to ad-hoc `task.async`).
29
+ - **2.11-B — HPACK FFI marshalling round-2 (CGlue flipped to default).**
30
+ Three-way bench (`bench/h2_rails_shape.sh` extended): `ruby` (1,585
31
+ r/s) vs `native v2` (1,602 r/s, +1% — noise) vs `native v3 / CGlue`
32
+ (**2,291 r/s, +43% over v2**). The +18-44% native-vs-Ruby headline
33
+ was almost entirely Fiddle marshalling overhead, not the underlying
34
+ Rust HPACK encoder — same encoder, no per-call FFI marshalling, +43%
35
+ rps. Default flipped: unset `HYPERION_H2_NATIVE_HPACK` now selects
36
+ CGlue. Three escape valves stay (`=v2` to force the old path, `=ruby`
37
+ / `=off` for the pure-Ruby fallback) for any operator that needs
38
+ them. Boot log gains a `native_mode` field documenting which path is
39
+ actually live.
40
+
41
+ Plus operator infrastructure: a host-OS portability fix in
42
+ `H2Codec.candidate_paths` for a stale macOS `.dylib` on a Linux bench
43
+ host (it was silently falling through to pure-Ruby); `bench/h2_rails_shape.sh`
44
+ race-fixed (boot-log probe + stderr routing). Full bench tables and
45
+ flip-decision rationale in [`CHANGELOG.md`](CHANGELOG.md).
46
+
14
47
  ## What's new in 2.10.1
15
48
 
16
49
  **Static-asset operator surface (2.10-E) + C-ext fast-path response
@@ -45,11 +45,36 @@ module Hyperion
45
45
  @cglue_available == true
46
46
  end
47
47
 
48
+ # 2.11-B — operator-controllable gate that overlays CGlue
49
+ # availability. The Encoder/Decoder hot paths probe this (NOT
50
+ # `cglue_available?`) so a `HYPERION_H2_NATIVE_HPACK=v2` boot can
51
+ # force the Fiddle path even on a host where the C glue loaded
52
+ # successfully. This is the bench-isolation knob 2.11-B's
53
+ # `bench/h2_rails_shape.sh` needs to compare native-v2 against
54
+ # native-v3 honestly — without it, "native" and "cglue" variants
55
+ # would always pick the same physical path.
56
+ #
57
+ # `Http2Handler#initialize` writes the gate based on the env var;
58
+ # tests can flip `@cglue_disabled` directly. Default false (i.e.,
59
+ # gate is OPEN — same physical behavior as 2.4-A through 2.10).
60
+ def self.cglue_active?
61
+ cglue_available? && !@cglue_disabled
62
+ end
63
+
64
+ def self.cglue_disabled=(value)
65
+ @cglue_disabled = value ? true : false
66
+ end
67
+
68
+ def self.cglue_disabled
69
+ @cglue_disabled == true
70
+ end
71
+
48
72
  # Force a reload (test seam). Unsets the memoized state so the next
49
73
  # `available?` call probes the filesystem again.
50
74
  def self.reset!
51
75
  @available = nil
52
76
  @cglue_available = nil
77
+ @cglue_disabled = false
53
78
  @lib = nil
54
79
  end
55
80
 
@@ -126,7 +151,13 @@ module Hyperion
126
151
  # into a new owned String — that's the contract callers rely
127
152
  # on (`protocol-http2`'s Compressor#encode returns a String,
128
153
  # not a slice into shared mutable memory).
129
- if H2Codec.cglue_available?
154
+ #
155
+ # 2.11-B — probe `cglue_active?` (NOT `cglue_available?`) so an
156
+ # operator-set `HYPERION_H2_NATIVE_HPACK=v2` boot routes through
157
+ # Fiddle even when the C glue is physically present. Same
158
+ # branch shape; one extra ivar read on the hot path which
159
+ # disappears under YJIT inlining.
160
+ if H2Codec.cglue_active?
130
161
  # Pad the scratch String with zero bytes so its length matches
131
162
  # capacity — the C ext writes into RSTRING_PTR up to RSTRING_LEN
132
163
  # and then truncates back via rb_str_set_len after encoding.
@@ -272,7 +303,8 @@ module Hyperion
272
303
  # 2.4-A — fast path: reuse a per-decoder scratch and dispatch
273
304
  # through the C glue. The Rust ABI writes `[u32 name_len][name]
274
305
  # [u32 val_len][val]` repeated; we unpack that in Ruby.
275
- if H2Codec.cglue_available?
306
+ # 2.11-B — `cglue_active?` overlays an operator-set v2 force.
307
+ if H2Codec.cglue_active?
276
308
  if capacity > @scratch_out_capacity
277
309
  new_cap = @scratch_out_capacity
278
310
  new_cap *= 2 while new_cap < capacity
@@ -412,9 +444,24 @@ module Hyperion
412
444
  def self.candidate_paths
413
445
  gem_lib = File.expand_path('../hyperion_h2_codec', __dir__)
414
446
  ext_target = File.expand_path('../../ext/hyperion_h2_codec/target/release', __dir__)
415
- %w[libhyperion_h2_codec.dylib libhyperion_h2_codec.so].flat_map do |name|
416
- [File.join(gem_lib, name), File.join(ext_target, name)]
417
- end
447
+ # 2.11-B fix: order suffixes by host OS. Pre-2.11-B this was a
448
+ # static `[dylib, so]` order, which broke on Linux hosts that
449
+ # had a stale macOS `.dylib` on the path (e.g. a developer rsync
450
+ # leaking the `target/release` artifact across platforms). Fiddle
451
+ # would try the `.dylib` first, choke on the Mach-O binary with
452
+ # `ArgumentError: invalid byte sequence in UTF-8` from libffi,
453
+ # and the rescue in `load!` would silently fall back to the Ruby
454
+ # HPACK path with no warning visible to bench harnesses.
455
+ #
456
+ # Ordering by `host_os` makes Linux pick `.so` first and ignore
457
+ # any orphan `.dylib`; macOS keeps the `.dylib`-first behavior
458
+ # for back-compat with existing dev environments.
459
+ suffixes = if /darwin|mac/i.match?(RbConfig::CONFIG['host_os'])
460
+ %w[libhyperion_h2_codec.dylib libhyperion_h2_codec.so]
461
+ else
462
+ %w[libhyperion_h2_codec.so libhyperion_h2_codec.dylib]
463
+ end
464
+ suffixes.flat_map { |name| [File.join(gem_lib, name), File.join(ext_target, name)] }
418
465
  end
419
466
 
420
467
  # FFI wrappers — kept thin so callers don't see Fiddle::Pointer
@@ -2,6 +2,7 @@
2
2
 
3
3
  require 'async'
4
4
  require 'async/notification'
5
+ require 'async/queue'
5
6
  require 'protocol/http2/server'
6
7
  require 'protocol/http2/framer'
7
8
  require 'protocol/http2/stream'
@@ -133,7 +134,7 @@ module Hyperion
133
134
  #
134
135
  # Single instance per connection, lives for the lifetime of `serve`.
135
136
  class WriterContext
136
- attr_reader :encode_mutex
137
+ attr_reader :encode_mutex, :dispatch_queue
137
138
  # 2.10-G — connection-lifecycle timing slots used by the optional h2
138
139
  # latency-instrumentation path (gated by `HYPERION_H2_TIMING=1`).
139
140
  # Each slot is a single CLOCK_MONOTONIC timestamp captured at most
@@ -149,6 +150,15 @@ module Hyperion
149
150
  @pending_bytes_lock = ::Mutex.new
150
151
  @max_pending_bytes = max_pending_bytes
151
152
  @writer_done = false
153
+ # 2.11-A — pre-spawned dispatch worker pool. The connection-loop
154
+ # fiber pushes ready streams onto `@dispatch_queue`; workers
155
+ # parked on `dequeue` grab them and call `dispatch_stream`. The
156
+ # queue is created here (cheap — wraps a Thread::Queue) so the
157
+ # WriterContext is fully self-contained and unit-testable without
158
+ # an Async reactor.
159
+ @dispatch_queue = ::Async::Queue.new
160
+ @dispatch_worker_count = 0
161
+ @dispatch_worker_lock = ::Mutex.new
152
162
  # 2.10-G timing slots, all initially nil so capture is a single
153
163
  # `||=` write under the encode mutex / writer fiber.
154
164
  @t0_serve_entry = nil
@@ -157,6 +167,31 @@ module Hyperion
157
167
  @t2_first_wire = nil
158
168
  end
159
169
 
170
+ # 2.11-A — bench/diagnostics introspection. Reads the live count
171
+ # of dispatch worker fibers parked on (or actively pulling from)
172
+ # `@dispatch_queue`. Reflects pre-spawned workers AND any ad-hoc
173
+ # workers spawned when the pool was saturated. Exposed as a method
174
+ # rather than `attr_reader` so the lock guards the counter.
175
+ def dispatch_worker_count
176
+ @dispatch_worker_lock.synchronize { @dispatch_worker_count }
177
+ end
178
+
179
+ # Called by a dispatch worker fiber when it enters its run loop.
180
+ # Pairs with `unregister_dispatch_worker` in an ensure block.
181
+ def register_dispatch_worker
182
+ @dispatch_worker_lock.synchronize { @dispatch_worker_count += 1 }
183
+ end
184
+
185
+ # Called by a dispatch worker fiber when it exits (queue closed,
186
+ # or unrecoverable error). Floors at 0 to defend against a stray
187
+ # double-unregister — instrumentation must never go negative.
188
+ def unregister_dispatch_worker
189
+ @dispatch_worker_lock.synchronize do
190
+ @dispatch_worker_count -= 1
191
+ @dispatch_worker_count = 0 if @dispatch_worker_count.negative?
192
+ end
193
+ end
194
+
160
195
  # Called by SendQueueIO#write on the calling (encoder) fiber. Enforces
161
196
  # the per-connection backpressure cap before enqueuing.
162
197
  def enqueue(bytes)
@@ -480,14 +515,19 @@ module Hyperion
480
515
  # threshold. 2.4-A's hello-shape bench saw parity because HPACK
481
516
  # is <1% of per-stream CPU on a 2-header response.
482
517
  #
518
+ # 2.11-B — `HYPERION_H2_NATIVE_HPACK` extended with a native-mode
519
+ # axis (`auto` / `cglue` / `v2` / `off`). See `resolve_h2_native_hpack_state`.
483
520
  # Operators who want the prior 2.4.x default (Ruby fallback, env
484
- # var unset) can now set `HYPERION_H2_NATIVE_HPACK=off` (or
485
- # `0`/`false`/`no`/`off`) explicitly. `HYPERION_H2_NATIVE_HPACK=1`
486
- # still works for explicit opt-in.
521
+ # var unset) can set `HYPERION_H2_NATIVE_HPACK=off` (or
522
+ # `0`/`false`/`no`/`off`/`ruby`). `HYPERION_H2_NATIVE_HPACK=1`
523
+ # / unset preserves the 2.5-B `auto` behavior. `=cglue`/`=v2`
524
+ # forces the corresponding native sub-path.
487
525
  #
488
526
  # When OFF (env-overridden): `protocol-http2`'s pure-Ruby HPACK
489
527
  # Compressor / Decompressor handles everything as in 2.0.0–2.4.x.
490
- @h2_native_hpack_enabled = @h2_codec_available && resolve_h2_native_hpack_default
528
+ @h2_native_mode = resolve_h2_native_hpack_state
529
+ @h2_native_hpack_enabled = @h2_codec_available && @h2_native_mode != :off
530
+ apply_h2_cglue_gate(@h2_native_mode)
491
531
  @h2_codec_native = @h2_native_hpack_enabled # back-compat ivar — preserved for codec_native? readers
492
532
  # 2.10-G — opt-in connection-setup timing instrumentation. When set,
493
533
  # `serve` captures four monotonic timestamps per connection:
@@ -507,9 +547,45 @@ module Hyperion
507
547
  # cost when disabled — a single ivar read per stream branch). Used by
508
548
  # 2.10-G to root-cause Hyperion's flat ~40 ms first-stream max-latency.
509
549
  @h2_timing_enabled = env_flag_enabled?('HYPERION_H2_TIMING')
550
+ # 2.11-A — resolve the dispatch worker pool size once at handler
551
+ # construction so every `serve` call uses the same value (instead
552
+ # of re-parsing ENV per connection on the hot path). Cached as an
553
+ # ivar; bench/diagnostics can read it via the spec seam.
554
+ @dispatch_pool_size = resolve_dispatch_pool_size
510
555
  record_codec_boot_state
511
556
  end
512
557
 
558
+ # 2.11-A — pre-spawned dispatch worker pool sizing.
559
+ #
560
+ # Default `4` workers per connection — enough to absorb the typical
561
+ # HTTP/2 burst (2-8 concurrent streams) without paying any per-stream
562
+ # `task.async {}` cost on the hot path. Operators on long-lived
563
+ # high-fan-out connections (e.g. an aggregator backend that fans
564
+ # 30+ parallel streams) can bump this with `HYPERION_H2_DISPATCH_POOL`.
565
+ # Streams that arrive when the pool is saturated still get an ad-hoc
566
+ # fiber (see `serve` below) so concurrency is never artificially
567
+ # capped — the operator-facing limit is `h2.max_concurrent_streams`.
568
+ #
569
+ # Ceiling at 16 guards against a pathological config that would
570
+ # spawn hundreds of idle fibers per accepted connection. Anything
571
+ # malformed / non-positive falls back to the default rather than
572
+ # crashing the connection — this is a tuning knob, not a spec
573
+ # parameter.
574
+ DISPATCH_POOL_DEFAULT = 4
575
+ DISPATCH_POOL_MAX = 16
576
+
577
+ def resolve_dispatch_pool_size
578
+ raw = ENV['HYPERION_H2_DISPATCH_POOL']
579
+ return DISPATCH_POOL_DEFAULT if raw.nil? || raw.strip.empty?
580
+
581
+ n = Integer(raw.strip, 10)
582
+ return DISPATCH_POOL_DEFAULT unless n.positive?
583
+
584
+ [n, DISPATCH_POOL_MAX].min
585
+ rescue ArgumentError, TypeError
586
+ DISPATCH_POOL_DEFAULT
587
+ end
588
+
513
589
  # Read an env-var flag with the usual truthiness rules (any of
514
590
  # 1/true/yes/on, case-insensitive). Anything else → false.
515
591
  def env_flag_enabled?(name)
@@ -519,21 +595,42 @@ module Hyperion
519
595
  %w[1 true yes on].include?(v.downcase)
520
596
  end
521
597
 
522
- # Read an env-var flag with explicit OFF support. Used by
523
- # `HYPERION_H2_NATIVE_HPACK` since 2.5-B flipped the default to ON.
524
- # Returns true if the env var is unset / empty / explicitly truthy;
525
- # returns false only when the operator sets it to a truthy-OFF
526
- # value (0/false/no/off, case-insensitive). Anything else falls
527
- # back to the default-on behavior so we don't surprise operators
528
- # who set typo'd values.
529
- def resolve_h2_native_hpack_default
598
+ # 2.11-B resolve the operator-requested native-mode state from
599
+ # `HYPERION_H2_NATIVE_HPACK`.
600
+ #
601
+ # Returns one of:
602
+ # * `:auto` — native enabled, prefer cglue if available
603
+ # (unset / `1` / `true` / `yes` / `on` / `auto`)
604
+ # * `:cglue` native enabled, force cglue (warn-fallback to v2
605
+ # if cglue is unavailable; native_mode log marker
606
+ # surfaces the divergence to the operator)
607
+ # * `:v2` — native enabled, force Fiddle (skip cglue even if
608
+ # available; this is the bench-isolation knob the
609
+ # 2.11-B Rails-shape harness needs)
610
+ # * `:off` — ruby fallback (`0` / `false` / `no` / `off` / `ruby`)
611
+ #
612
+ # Unknown values fall through to `:auto` rather than crashing the
613
+ # connection — same forgiving-default policy as the pre-2.11-B
614
+ # `resolve_h2_native_hpack_default`.
615
+ def resolve_h2_native_hpack_state
530
616
  v = ENV['HYPERION_H2_NATIVE_HPACK']
531
- return true if v.nil? || v.empty?
617
+ return :auto if v.nil? || v.empty?
532
618
 
533
619
  lc = v.downcase
534
- return false if %w[0 false no off].include?(lc)
620
+ return :off if %w[0 false no off ruby].include?(lc)
621
+ return :cglue if %w[cglue v3].include?(lc)
622
+ return :v2 if %w[v2 fiddle].include?(lc)
623
+
624
+ :auto
625
+ end
535
626
 
536
- true
627
+ # 2.11-B — flip the global `H2Codec.cglue_disabled` gate based on
628
+ # the resolved native-mode state. The gate is per-process state
629
+ # (the codec module is a singleton) so reset it on every handler
630
+ # construction; otherwise a test that booted with `=v2` would leak
631
+ # the disable into a subsequent default-mode handler.
632
+ def apply_h2_cglue_gate(state)
633
+ Hyperion::H2Codec.cglue_disabled = (state == :v2)
537
634
  end
538
635
 
539
636
  # 2.0.0 Phase 6b: emit a single-shot boot log line per process
@@ -544,23 +641,32 @@ module Hyperion
544
641
  return if Hyperion::Http2Handler.instance_variable_get(:@codec_state_logged)
545
642
 
546
643
  Hyperion::Http2Handler.instance_variable_set(:@codec_state_logged, true)
547
- cglue_active = @h2_native_hpack_enabled && Hyperion::H2Codec.cglue_available?
548
- mode =
549
- if @h2_native_hpack_enabled && cglue_active
550
- 'native (Rust v3 / CGlue) HPACK on hot path, no Fiddle per call'
551
- elsif @h2_native_hpack_enabled
552
- 'native (Rust v2 / Fiddle) — HPACK on hot path, Fiddle marshalling per call'
553
- elsif @h2_codec_available
554
- 'fallback (protocol-http2 / pure Ruby HPACK) — native available but opted out via HYPERION_H2_NATIVE_HPACK=off'
555
- else
556
- 'fallback (protocol-http2 / pure Ruby HPACK) — native unavailable'
557
- end
644
+ # 2.11-B — `cglue_active` gates on the operator-controllable
645
+ # `cglue_active?` predicate (was `cglue_available?` pre-2.11-B).
646
+ # When the operator sets `=v2` we want the boot log to read
647
+ # `cglue_active: false` even though the C glue did install
648
+ # successfully — the bench harness inspects this field to
649
+ # differentiate the variants.
650
+ cglue_active = @h2_native_hpack_enabled && Hyperion::H2Codec.cglue_active?
651
+ cglue_requested_unavailable = @h2_native_mode == :cglue &&
652
+ @h2_native_hpack_enabled &&
653
+ !Hyperion::H2Codec.cglue_available?
654
+ mode = describe_codec_mode(cglue_active: cglue_active,
655
+ cglue_requested_unavailable: cglue_requested_unavailable)
656
+ native_mode_log = if !@h2_native_hpack_enabled
657
+ @h2_native_mode == :off ? 'off' : 'native-disabled'
658
+ elsif cglue_requested_unavailable
659
+ 'cglue-requested-unavailable'
660
+ else
661
+ @h2_native_mode.to_s
662
+ end
558
663
  @logger.info do
559
664
  {
560
665
  message: 'h2 codec selected',
561
666
  mode: mode,
562
667
  native_available: @h2_codec_available,
563
668
  native_enabled: @h2_native_hpack_enabled,
669
+ native_mode: native_mode_log,
564
670
  cglue_active: cglue_active,
565
671
  hpack_path: if @h2_native_hpack_enabled
566
672
  cglue_active ? 'native-v3' : 'native-v2'
@@ -573,6 +679,34 @@ module Hyperion
  @metrics.increment(:h2_codec_fallback_selected) unless @h2_native_hpack_enabled
  end
 
+ # 2.11-B — boot-log mode descriptor (extracted for clarity since
+ # the matrix of native_mode × cglue_available × cglue_active grew
+ # past the point where an inline conditional was readable).
+ def describe_codec_mode(cglue_active:, cglue_requested_unavailable:)
+ if !@h2_native_hpack_enabled
+ if @h2_codec_available
+ 'fallback (protocol-http2 / pure Ruby HPACK) — native available but opted out via HYPERION_H2_NATIVE_HPACK=off'
+ else
+ 'fallback (protocol-http2 / pure Ruby HPACK) — native unavailable'
+ end
+ elsif cglue_active && @h2_native_mode == :cglue
+ 'native (Rust v3 / CGlue, forced) — HPACK on hot path, no Fiddle per call'
+ elsif cglue_active
+ # 2.11-B confirmed cglue as the firm default — the bench-measured
+ # delta vs the v2 (Fiddle) path is +33-43% on Rails-shape h2
+ # responses, which is the actual win the 2.5-B "+18% native vs
+ # ruby" headline was capturing (v2 alone is +1-5%, basically
+ # noise vs the ruby fallback at this header count).
+ 'native (Rust v3 / CGlue, default since 2.11-B) — HPACK on hot path, no Fiddle per call'
+ elsif @h2_native_mode == :v2
+ 'native (Rust v2 / Fiddle, forced) — HPACK on hot path, Fiddle marshalling per call'
+ elsif cglue_requested_unavailable
+ 'native (Rust v2 / Fiddle) — CGlue requested via HYPERION_H2_NATIVE_HPACK=cglue but unavailable, fell back'
+ else
+ 'native (Rust v2 / Fiddle) — HPACK on hot path, Fiddle marshalling per call'
+ end
+ end
+
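The branch order in `describe_codec_mode` encodes a precedence: explicit opt-out first, then a forced mode, then the default. A minimal standalone sketch of that precedence (plain Ruby; the `describe` helper and its short labels are hypothetical, not the gem's API):

```ruby
# Hypothetical mirror of the describe_codec_mode branch order:
# opt-out beats everything, then a forced cglue/v2 mode, then the
# cglue-requested-but-unavailable fallback, then the v2 default.
def describe(native_enabled:, mode:, cglue_active:, cglue_requested_unavailable:)
  return 'fallback'            unless native_enabled
  return 'native-v3 (forced)'  if cglue_active && mode == :cglue
  return 'native-v3 (default)' if cglue_active
  return 'native-v2 (forced)'  if mode == :v2
  return 'native-v2 (cglue unavailable)' if cglue_requested_unavailable
  'native-v2'
end

# HYPERION_H2_NATIVE_HPACK=off: pure-Ruby HPACK regardless of availability
puts describe(native_enabled: false, mode: :off,
              cglue_active: false, cglue_requested_unavailable: false)
# → fallback

# HYPERION_H2_NATIVE_HPACK=cglue on a box without the CGlue build: v2 fallback
puts describe(native_enabled: true, mode: :cglue,
              cglue_active: false, cglue_requested_unavailable: true)
# → native-v2 (cglue unavailable)
```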
  # Read-only accessor used by tests + diagnostics. true = the
  # `Hyperion::H2Codec` Rust extension loaded successfully AND
  # `HYPERION_H2_NATIVE_HPACK=1` is set, so `build_server` will
@@ -610,6 +744,14 @@ module Hyperion
 
  task = ::Async::Task.current
 
+ # 2.11-A — extract the peer address BEFORE the preface exchange.
+ # Two wins: (1) the lookup runs in parallel with the writer fiber
+ # picking up the first scheduler slot, and (2) the first stream's
+ # dispatch fiber doesn't pay this `peeraddr` syscall on its hot
+ # path. The address is then captured by the worker closures
+ # below.
+ peer_addr = peer_address(socket)
+
  # Spawn the dedicated writer fiber BEFORE the preface exchange.
  # `Server#read_connection_preface` writes the server's SETTINGS frame
  # via the framer; if the writer isn't running, those bytes sit in the
@@ -618,15 +760,23 @@ module Hyperion
  # waits for our SETTINGS before sending more frames.
  writer_task = task.async { run_writer_loop(socket, writer_ctx) }
 
+ # 2.11-A — pre-spawn the dispatch worker pool BEFORE the preface
+ # exchange. Workers park on `writer_ctx.dispatch_queue.dequeue`;
+ # by the time the first client HEADERS frame arrives the workers
+ # are already in the scheduler's runnable set. The first stream
+ # is just an enqueue + dequeue (microseconds) instead of a
+ # `task.async {}` cold spawn (was the dominant cost in the t1→t2_enc
+ # bucket per the 2.10-G timing breakdown).
+ warmup_dispatch_pool!(task, writer_ctx, peer_addr: peer_addr,
+ pool_size: @dispatch_pool_size)
+
  server.read_connection_preface(initial_settings_payload)
  writer_ctx.t1_preface_done = monotonic_now if @h2_timing_enabled
 
- # Extract once the same TCP peer drives every stream on this conn.
- peer_addr = peer_address(socket)
-
- # Track in-flight per-stream dispatch fibers so we can drain them on
- # connection close.
- stream_tasks = []
+ # Track ad-hoc per-stream dispatch fibers (spilled when the pool is
+ # saturated). The pool handles the common case; we only fall back
+ # to `task.async {}` when more streams arrive than warm workers.
+ overflow_tasks = []
 
  until server.closed?
  ready_ids = []
@@ -645,14 +795,35 @@ module Hyperion
  # if subsequent frames (e.g. RST_STREAM races) arrive.
  stream.instance_variable_set(:@hyperion_dispatched, true)
 
- stream_tasks << task.async do
- dispatch_stream(stream, writer_ctx, peer_addr)
+ # 2.11-A — hand the stream to a warm worker via the dispatch
+ # queue. We use a simple "queue is empty" probe to decide:
+ #
+ # * Empty queue ⇒ at least one worker is parked on
+ # `dequeue`; the enqueue+dequeue handoff is microseconds
+ # and we avoid a `task.async {}` cold spawn. This is the
+ # hot path for the FIRST stream of a fresh connection
+ # (the case 2.11-A is targeting).
+ # * Non-empty queue ⇒ every parked worker has already
+ # pulled a stream; another worker won't pick this up
+ # until one finishes. To avoid head-of-line blocking
+ # behind the warmup pool, fall back to `task.async {}`.
+ # The overflow fiber reuses `dispatch_stream` so the
+ # dispatch contract is identical between pool and
+ # overflow paths. Concurrency is never artificially
+ # capped; the operator-facing knob is
+ # `h2.max_concurrent_streams`.
+ if writer_ctx.dispatch_queue.size.zero?
+ writer_ctx.dispatch_queue.enqueue(stream)
+ else
+ overflow_tasks << task.async do
+ dispatch_stream(stream, writer_ctx, peer_addr)
+ end
  end
  end
  end
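The empty-queue probe can be modeled with Ruby's stdlib `Thread::Queue` (the gem itself parks Async fibers on an `Async::Queue`; threads here are just a self-contained stand-in). The point: an empty queue with a parked consumer means a push is an immediate handoff, while a non-empty queue means every warm worker is already occupied and the work should spill elsewhere.

```ruby
# Stand-in for the warm-worker handoff: Thread::Queue instead of
# Async::Queue, a thread instead of a fiber. Not the gem's code.
queue   = Thread::Queue.new
results = Thread::Queue.new

worker = Thread.new do
  while (job = queue.pop)   # pop returns nil after close, ending the loop
    job.call
  end
end

sleep 0.01 until queue.num_waiting == 1  # wait until the worker is parked

if queue.empty?
  # A parked consumer can take this immediately: warm handoff.
  queue.push(-> { results.push(:pooled) })
else
  # All workers busy: spill to a fresh thread (the "overflow" path).
  Thread.new { results.push(:overflow) }
end

outcome = results.pop
puts outcome  # → pooled
queue.close
worker.join
```

The probe is only a heuristic: between `empty?` and `push` another producer could slip in, which is why the real code treats the overflow path as a correctness-equivalent fallback rather than an error.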
 
  # Drain in-flight stream dispatches before we close the socket.
- stream_tasks.each do |t|
+ overflow_tasks.each do |t|
  t.wait
  rescue StandardError
  nil
@@ -676,6 +847,18 @@ module Hyperion
  # socket before the writer drains would discard final RST_STREAM /
  # GOAWAY / END_STREAM frames in the queue.
  if writer_ctx
+ # 2.11-A — close the dispatch queue so any pre-spawned workers
+ # parked on `dequeue` fall through (Async::Queue#dequeue returns
+ # nil after close). Do this BEFORE waiting on the writer so
+ # pool workers can drain their in-flight stream dispatches and
+ # release the encode mutex; otherwise the writer might park
+ # waiting for bytes that the dispatch worker never gets to
+ # encode.
+ begin
+ writer_ctx.dispatch_queue.close unless writer_ctx.dispatch_queue.closed?
+ rescue StandardError
+ nil
+ end
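The close-then-drain ordering relies on the queue's shutdown contract. Ruby's stdlib `Thread::Queue` has the same shape as the `Async::Queue` behavior the comment above describes, so it serves as a quick stand-in demo:

```ruby
# Stand-in demo of the shutdown contract: items enqueued before
# close still drain; once drained, pop returns nil instead of
# blocking, which is what lets a parked worker exit its loop.
q = Thread::Queue.new
q.push(:last_stream)
q.close                    # no new items accepted from here on

first  = q.pop             # pre-close item is still delivered
second = q.pop             # drained + closed, so nil, no block
puts first.inspect   # → :last_stream
puts second.inspect  # → nil
```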
  writer_ctx.shutdown!
  begin
  writer_task&.wait
@@ -695,6 +878,63 @@ module Hyperion
 
  private
 
+ # 2.11-A — pre-spawn the per-connection dispatch worker pool.
+ #
+ # Each worker is a fiber that loops:
+ # 1. Dequeues a stream from the per-connection dispatch queue
+ # (parking the fiber on the queue's internal notification when
+ # empty — zero CPU until a stream arrives).
+ # 2. Calls `dispatch_stream` with the stream + writer context +
+ # pre-resolved peer address.
+ # 3. Loops back to (1). Exits cleanly when `dequeue` returns nil
+ # (queue closed by `serve`'s ensure block on connection
+ # teardown).
+ #
+ # Why pre-spawn rather than `task.async {}` per stream:
+ # * Fiber startup under Async involves a few µs of allocation and
+ # scheduler bookkeeping. Per-stream that's negligible; on the
+ # CONNECTION COLD PATH (first request on a fresh TCP/TLS conn)
+ # it adds up to a measurable share of the t1→t2_enc bucket
+ # (the 2.10-G timing breakdown showed ~12-25 ms on h2load
+ # `-c 1 -m 100 -n 5000`).
+ # * Workers parked on `dequeue` are already in the scheduler's
+ # ready set; the first stream is just an enqueue + dequeue
+ # handoff (microseconds).
+ #
+ # Errors inside `dispatch_stream` are already caught + RST_STREAMed
+ # there, so the worker only needs to defend against truly
+ # unexpected failures (queue shutdown races, fiber kill on graceful
+ # shutdown). We swallow those defensively and unregister so the
+ # `dispatch_worker_count` introspection is truthful.
+ def warmup_dispatch_pool!(task, writer_ctx, peer_addr:, pool_size:)
+ pool_size.times do
+ task.async do
+ writer_ctx.register_dispatch_worker
+ begin
+ loop do
+ stream = writer_ctx.dispatch_queue.dequeue
+ break if stream.nil? # queue closed → graceful exit
+
+ begin
+ dispatch_stream(stream, writer_ctx, peer_addr)
+ rescue StandardError => e
+ # `dispatch_stream` already logs + RST_STREAMs internally;
+ # if anything escapes that net we log here and keep the
+ # worker alive — one bad stream must not poison the
+ # connection's worker pool.
+ @logger.error do
+ { message: 'h2 dispatch worker swallowed error',
+ error: e.message, error_class: e.class.name }
+ end
+ end
+ end
+ ensure
+ writer_ctx.unregister_dispatch_worker
+ end
+ end
+ end
+ end
+
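The worker loop reduces to a small, reusable pattern. A self-contained sketch with stdlib threads instead of Async fibers (pool size, stream names, and the `results` sink are made up for the demo):

```ruby
# Stand-in for warmup_dispatch_pool!: a fixed pool of consumers
# parked on a shared queue, draining on close. Threads model the
# fibers; this is the pattern, not the gem's implementation.
POOL_SIZE = 2
queue   = Thread::Queue.new
results = Thread::Queue.new

workers = POOL_SIZE.times.map do |i|
  Thread.new do
    loop do
      stream = queue.pop
      break if stream.nil?          # queue closed → graceful exit
      begin
        results.push("worker-#{i} handled #{stream}")
      rescue StandardError
        # swallow defensively: one bad stream must not kill the worker
      end
    end
  end
end

3.times { |n| queue.push("stream-#{n}") }  # handoffs, no per-stream spawn
queue.close                                # wakes parked workers with nil
workers.each(&:join)
puts results.size  # → 3
```

Closing the queue doubles as the shutdown signal, so no separate stop flag or poison-pill object is needed.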
  # Build the [setting_id, value] pairs that go in the connection-preface
  # SETTINGS frame. protocol-http2's Server#read_connection_preface accepts
  # this array and does the wire encoding for us. Empty array (no overrides
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Hyperion
- VERSION = '2.10.1'
+ VERSION = '2.11.0'
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: hyperion-rb
  version: !ruby/object:Gem::Version
- version: 2.10.1
+ version: 2.11.0
  platform: ruby
  authors:
  - Andrey Lobanov