hyperion-rb 2.10.1 → 2.11.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +205 -0
- data/README.md +33 -0
- data/lib/hyperion/h2_codec.rb +52 -5
- data/lib/hyperion/http2_handler.rb +276 -36
- data/lib/hyperion/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 70fee9940bc27a68927bb0204717bb35da7ea695bc02fe49aaa6c9597c9dd78d
|
|
4
|
+
data.tar.gz: aaa583caa320c605bb06e2894c3f5452d183b4aa5f1ab27329a2a3a89e1ef900
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 3c28ad4d534e00eb3b687903da63ae7c04c918d27081d9ac73e04a67d521e2abd832f100b20d87b3411fa9c7be5abdcda2f8638ca8c139ef913fbd0f1ad1930a
|
|
7
|
+
data.tar.gz: debe988e4dc461376df38bdd84833a2f608dcfb8bf2dc420fc086e89970607b1c36d632dc1057b4eb9983f83918fdc60d93ee87a10098f6f775ccd4207611fb1
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,210 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 2.11.0 — 2026-05-01
|
|
4
|
+
|
|
5
|
+
### 2.11-B — HPACK FFI marshalling round-2 (cglue confirmed as firm default; +43% v3 vs v2 on Rails-shape h2)
|
|
6
|
+
|
|
7
|
+
**Bench result (openclaw-vm, 25-header h2load -c 1 -m 100 -n 5000, 3 runs/variant, median):**
|
|
8
|
+
|
|
9
|
+
| Variant | Env value | Median rps |
|
|
10
|
+
|---|---|---:|
|
|
11
|
+
| Ruby fallback | `=off` | 1585.35 r/s |
|
|
12
|
+
| Native v2 (Fiddle, forced) | `=v2` | 1602.27 r/s |
|
|
13
|
+
| Native v3 (CGlue, forced) | `=cglue` | 2291.44 r/s |
|
|
14
|
+
|
|
15
|
+
| Delta | Value |
|
|
16
|
+
|---|---:|
|
|
17
|
+
| native (v2) vs ruby | +1.1% (within noise) |
|
|
18
|
+
| **cglue (v3) vs native (v2)** | **+43.0%** (HEADLINE — Fiddle marshalling overhead) |
|
|
19
|
+
| cglue (v3) vs ruby | +44.5% (total native win) |
|
|
20
|
+
|
|
21
|
+
**Decision: flip cglue default ON.** The bench cleanly attributes the
|
|
22
|
+
+18-44% native-vs-ruby headline to the C-glue path's elimination of
|
|
23
|
+
per-call Fiddle marshalling, *not* to the underlying Rust HPACK
|
|
24
|
+
encoder. With cglue forced off, native v2 is +1-5% over ruby —
|
|
25
|
+
basically noise on this header count. The 2.5-B headline ("+18%
|
|
26
|
+
native vs ruby") was actually measuring v3 vs ruby because `=1`
|
|
27
|
+
silently picked v3 on cglue-available hosts; 2.11-B's `=v2` token
|
|
28
|
+
made the v2-only number measurable for the first time.
|
|
29
|
+
|
|
30
|
+
The `:auto` resolved state (unset / `=1` / `=true`) was already
|
|
31
|
+
selecting cglue when available since 2.4-A; this round confirms
|
|
32
|
+
that selection and updates the boot-log mode string to advertise
|
|
33
|
+
cglue as the de jure default. The runtime behavior on a host where
|
|
34
|
+
cglue is available is unchanged from 2.10 — the de facto cglue
|
|
35
|
+
selection becomes de jure, with a `default since 2.11-B` marker
|
|
36
|
+
in the human-readable `mode` log field.
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
|
|
40
|
+
The 2.5-B Rails-shape bench measured native v3 (CGlue) at +18% over
|
|
41
|
+
the Ruby fallback on a 25-header response — comfortably above the
|
|
42
|
+
+15% flip threshold, which moved the native-vs-Ruby default to ON.
|
|
43
|
+
The remaining open question 2.5-B punted on: how much of that +18%
|
|
44
|
+
is the Rust HPACK encoder, and how much is the C-glue path's
|
|
45
|
+
elimination of per-call Fiddle marshalling? A direct A/B was
|
|
46
|
+
impossible at the time because `=1` always picked v3 on hosts
|
|
47
|
+
where the C glue had installed successfully.
|
|
48
|
+
|
|
49
|
+
**Surface change.** `HYPERION_H2_NATIVE_HPACK` now accepts an
|
|
50
|
+
explicit native-mode token alongside the legacy Boolean values.
|
|
51
|
+
The legacy values still resolve to the same physical path they
|
|
52
|
+
did before, so this is back-compat for every operator who set the
|
|
53
|
+
env var pre-2.11-B:
|
|
54
|
+
|
|
55
|
+
| Value | Resolves to | Pre-2.11-B behavior | 2.11-B behavior |
|
|
56
|
+
|---|---|---|---|
|
|
57
|
+
| (unset) / `1` / `true` / `yes` / `on` / `auto` | `:auto` | native, prefer cglue | native, prefer cglue (unchanged) |
|
|
58
|
+
| `cglue` / `v3` | `:cglue` | (same as `1`) | force cglue, warn-fallback to v2 |
|
|
59
|
+
| `v2` / `fiddle` | `:v2` | (same as `1`) | force v2/Fiddle (skip cglue even if installed) |
|
|
60
|
+
| `0` / `false` / `no` / `off` / `ruby` | `:off` | ruby fallback | ruby fallback (unchanged) |
|
|
61
|
+
|
|
62
|
+
The new `=v2` value is the bench-isolation knob `bench/h2_rails_shape.sh`
|
|
63
|
+
needs: without it the harness's "native" variant silently picked v3
|
|
64
|
+
on bench hosts where the C glue loaded successfully, making the
|
|
65
|
+
v2-vs-v3 delta unmeasurable. With `=v2` the operator can force the
|
|
66
|
+
Fiddle path on a host where cglue is physically present.
|
|
67
|
+
|
|
68
|
+
**Implementation.** `Hyperion::H2Codec.cglue_active?` overlays
|
|
69
|
+
`cglue_available?` with an operator-controllable gate
|
|
70
|
+
(`H2Codec.cglue_disabled = true|false`). The Encoder/Decoder hot
|
|
71
|
+
paths probe `cglue_active?` (was `cglue_available?`); one extra
|
|
72
|
+
ivar read per encode call which YJIT inlines away. `Http2Handler`
|
|
73
|
+
sets the gate at construction based on the resolved native-mode
|
|
74
|
+
state. The gate is global (the codec module is a singleton); the
|
|
75
|
+
handler resets it on every construction so a `=v2` boot can't leak
|
|
76
|
+
the disable into a subsequent default-mode handler.
|
|
77
|
+
|
|
78
|
+
**Boot log.** The `h2 codec selected` log line gains a new `native_mode`
|
|
79
|
+
field exposing the operator-requested mode (`auto` / `cglue` / `v2` /
|
|
80
|
+
`off` / `cglue-requested-unavailable`). The existing `hpack_path`
|
|
81
|
+
field continues to be one of `pure-ruby` / `native-v2` / `native-v3` —
|
|
82
|
+
unchanged, ops dashboards keying off it keep working. The `mode`
|
|
83
|
+
human-readable string differentiates `forced` from `auto` selections
|
|
84
|
+
so a misconfigured `=cglue` boot on a host without the C glue is
|
|
85
|
+
visible in one log line instead of requiring a process trace.
|
|
86
|
+
|
|
87
|
+
**Bench harness.** `bench/h2_rails_shape.sh` now runs three variants
|
|
88
|
+
(`ruby`, `native`, `cglue`) instead of two and emits
|
|
89
|
+
`delta_native_vs_ruby` (informational — should reproduce the 2.5-B
|
|
90
|
+
+18% headline) plus `delta_cglue_vs_native` (the headline for
|
|
91
|
+
2.11-B — does cutting per-call Fiddle marshalling buy anything on
|
|
92
|
+
top of the v2 path?). The decision rule keys off the cglue-vs-native
|
|
93
|
+
delta:
|
|
94
|
+
|
|
95
|
+
| Outcome (cglue vs native) | Action |
|
|
96
|
+
|---|---|
|
|
97
|
+
| ≥ +15% rps | Flip cglue default ON (replace 2.5-B's auto-cglue dance) |
|
|
98
|
+
| Parity / +5-10% (within noise) | Keep cglue opt-in, file as deferred |
|
|
99
|
+
| ≥ −2% (negative) | Investigate, do not ship |
|
|
100
|
+
|
|
101
|
+
Each variant runs 3x, output is the median.
|
|
102
|
+
|
|
103
|
+
**Spec coverage.** `spec/hyperion/h2_codec_native_mode_spec.rb` — 17
|
|
104
|
+
new examples covering: each native-mode token resolves to the
|
|
105
|
+
expected `hpack_path`; `=cglue` on a host without the C glue logs
|
|
106
|
+
`native_mode=cglue-requested-unavailable` and falls through to v2;
|
|
107
|
+
`=v2` actually flips `H2Codec.cglue_active?` to false even when
|
|
108
|
+
cglue is available; the three bench-variant tokens
|
|
109
|
+
(`{off,v2,cglue}`) produce the three distinct `hpack_path` values
|
|
110
|
+
the harness compares.
|
|
111
|
+
|
|
112
|
+
### 2.11-A — h2 first-stream TLS handshake parallelization (Bucket 2: pre-spawned dispatch worker pool)
|
|
113
|
+
|
|
114
|
+
The 2.10-G TCP_NODELAY fix lifted the ~40 ms h2 max-latency ceiling
|
|
115
|
+
that was paid by every stream. With the per-stream Nagle/delayed-ACK
|
|
116
|
+
noise gone, the **first-stream cold cost** became isolatable via the
|
|
117
|
+
2.10-G `HYPERION_H2_TIMING=1` instrumentation. Reading the breakdown
|
|
118
|
+
on `h2load -c 1 -m 100 -n 5000 https://localhost/`:
|
|
119
|
+
|
|
120
|
+
| Bucket | Master baseline | After 2.11-A |
|
|
121
|
+
|---|---:|---:|
|
|
122
|
+
| `t0_to_t1_ms` (preface exchange — 0.3-1.7 ms baseline) | 0.3-1.7 ms | 0.6-1.2 ms |
|
|
123
|
+
| `t1_to_t2_enc_ms` (preface→first stream encoded — **dominant bucket**) | **12-25 ms** | **m=1: 1.0-1.4 ms**, m=100: 13-18 ms |
|
|
124
|
+
| `t2_enc_to_t2_wire_ms` (first stream encode → first byte on wire) | -10 to -27 ms\* | -13 to -17 ms\* |
|
|
125
|
+
| `t0_to_t2_wire_ms` (preface bytes on wire) | 0.9-3.4 ms | 1.1-1.9 ms |
|
|
126
|
+
|
|
127
|
+
\* The `t2_enc_to_t2_wire_ms` slot reflects "preface SETTINGS bytes
|
|
128
|
+
on the wire" minus "first stream HEADERS encoded" — the writer
|
|
129
|
+
fiber's `||=` capture lands on the preface bytes (always written
|
|
130
|
+
first), not the response. The negative value is expected and
|
|
131
|
+
documents the preface→response gap at the writer-fiber boundary.
|
|
132
|
+
|
|
133
|
+
**The dominant bucket was `t1_to_t2_enc_ms`** — preface complete to
|
|
134
|
+
first stream's HEADERS+DATA encoded. On the cold-stream `m=1` path,
|
|
135
|
+
the ~3-13 ms gap is dominated by lazy `task.async {}` fiber spawn
|
|
136
|
+
on the connection-loop fiber's `ready_ids` tick (under the Async
|
|
137
|
+
scheduler, the first `task.async` from a cold fiber pays scheduler
|
|
138
|
+
bookkeeping that warmer paths amortize away).
|
|
139
|
+
|
|
140
|
+
**Fix.** Pre-spawn a fixed pool of `N` dispatch worker fibers (default
|
|
141
|
+
`4`, configurable via `HYPERION_H2_DISPATCH_POOL`) inside `serve`
|
|
142
|
+
BEFORE `read_connection_preface` returns. Each worker parks on a
|
|
143
|
+
new per-connection `Async::Queue` exposed off `WriterContext#dispatch_queue`.
|
|
144
|
+
When a stream becomes ready, the connection-loop fiber pushes onto
|
|
145
|
+
the queue; a parked worker grabs it and calls `dispatch_stream`.
|
|
146
|
+
|
|
147
|
+
The first stream is now an enqueue+dequeue handoff (microseconds)
|
|
148
|
+
instead of a `task.async {}` cold spawn. Streams that arrive while
|
|
149
|
+
the queue is non-empty (workers all busy on prior streams) fall
|
|
150
|
+
back to ad-hoc `task.async {}` so concurrency is never artificially
|
|
151
|
+
capped — the operator-facing knob is `h2.max_concurrent_streams`,
|
|
152
|
+
not the pool size.
|
|
153
|
+
|
|
154
|
+
**Bench delta on openclaw-vm (single-worker, h2load → localhost TLS h2):**
|
|
155
|
+
|
|
156
|
+
| | Master | 2.11-A | Δ |
|
|
157
|
+
|---|---:|---:|---:|
|
|
158
|
+
| `m=1 -n 50` cold first-run, time-to-1st-byte | 20.28 ms | **9.28 ms** | **-54%** |
|
|
159
|
+
| `m=1 -n 50` warm avg, time-to-1st-byte | 5.93 ms | 7.20 ms | +21% (within run-to-run noise) |
|
|
160
|
+
| `m=100 -n 5000` 10-run avg, time-to-1st-byte | 19.6 ms | 19.1 ms | parity |
|
|
161
|
+
| `m=100 -n 5000` 10-run avg, throughput | 2742 r/s | 2893 r/s | +5.5% |
|
|
162
|
+
| `m=1 -n 50` `t1_to_t2_enc_ms` (instrumented, cold) | 3.4 ms | **1.0-1.4 ms** | **-66%** |
|
|
163
|
+
|
|
164
|
+
**Cold first-stream cost is roughly halved** on `m=1` (the actual
|
|
165
|
+
single-stream cold-connection path). The `m=100` path is dominated
|
|
166
|
+
by sequential client-frame reads on the connection-loop fiber, not
|
|
167
|
+
fiber-spawn cost — the fix doesn't move that needle but doesn't
|
|
168
|
+
regress it either.
|
|
169
|
+
|
|
170
|
+
**Sub-fixes folded in.**
|
|
171
|
+
* **Pre-resolve `peer_address`** before `read_connection_preface`.
|
|
172
|
+
The `peeraddr` syscall was previously paid on the hot path between
|
|
173
|
+
preface read and first dispatch; moving it earlier overlaps with
|
|
174
|
+
the writer fiber's first-tick scheduling.
|
|
175
|
+
|
|
176
|
+
**Operator surface.**
|
|
177
|
+
|
|
178
|
+
* `HYPERION_H2_DISPATCH_POOL=<N>` — set the pre-warmed dispatch
|
|
179
|
+
worker count per connection. Default `4`. Ceiling `16` (guards
|
|
180
|
+
against pathological configs spawning hundreds of idle fibers
|
|
181
|
+
per accepted connection). Invalid / non-positive values fall
|
|
182
|
+
back to the default rather than crashing the connection — this
|
|
183
|
+
is a tuning knob, not a spec parameter.
|
|
184
|
+
* `WriterContext#dispatch_queue` — the per-connection
|
|
185
|
+
`Async::Queue` workers park on; bench harnesses can introspect.
|
|
186
|
+
* `WriterContext#dispatch_worker_count` — live count of workers
|
|
187
|
+
currently registered (parked or actively dispatching). Useful
|
|
188
|
+
for diagnostics endpoints that want to surface "this connection's
|
|
189
|
+
pool is saturated".
|
|
190
|
+
|
|
191
|
+
**Constraints preserved.**
|
|
192
|
+
* TCP_NODELAY (2.10-G) hunk in `apply_tcp_nodelay` is untouched.
|
|
193
|
+
* Static asset preload + immutable hooks (2.10-E) are untouched.
|
|
194
|
+
* C-ext fast-path response writer (2.10-F) is untouched.
|
|
195
|
+
* `HYPERION_H2_TIMING=1` instrumentation continues to fire and
|
|
196
|
+
emits the same `'h2 first-stream timing'` log shape (the four
|
|
197
|
+
deltas + total). Locked by spec.
|
|
198
|
+
|
|
199
|
+
**Specs.** 12 new examples in `spec/hyperion/http2_dispatch_pool_spec.rb`
|
|
200
|
+
covering the WriterContext extensions (queue + worker count + register/
|
|
201
|
+
unregister), `resolve_dispatch_pool_size` env-var parsing (default,
|
|
202
|
+
override, invalid input, ceiling), the pool warmup contract (workers
|
|
203
|
+
registered, workers process queued items, one bad stream doesn't
|
|
204
|
+
poison the pool), and a TLS+curl end-to-end smoke (the timing log
|
|
205
|
+
shape continues to fire after the warmup hook is added). Spec count
|
|
206
|
+
**1060 → 1072**, 0 failures.
|
|
207
|
+
|
|
3
208
|
## 2.10.1 — 2026-05-01
|
|
4
209
|
|
|
5
210
|
### 2.10-F — C-ext fast-path response writer for prebuilt responses
|
data/README.md
CHANGED
|
@@ -11,6 +11,39 @@ gem install hyperion-rb
|
|
|
11
11
|
bundle exec hyperion config.ru
|
|
12
12
|
```
|
|
13
13
|
|
|
14
|
+
## What's new in 2.11.0
|
|
15
|
+
|
|
16
|
+
**h2 cold-stream latency cut + native HPACK CGlue flipped to default.**
|
|
17
|
+
Two perf wins on top of 2.10:
|
|
18
|
+
|
|
19
|
+
- **2.11-A — h2 first-stream TLS handshake parallelization.** The
|
|
20
|
+
2.10-G `HYPERION_H2_TIMING=1` instrumentation, run against the
|
|
21
|
+
TCP_NODELAY-fixed handler, isolated the residual cold-stream cost
|
|
22
|
+
to **bucket 2**: lazy `task.async {}` fiber spawn for the first
|
|
23
|
+
stream of every connection. Fix: pre-spawn a stream-dispatch fiber
|
|
24
|
+
pool at connection accept (configurable via `HYPERION_H2_DISPATCH_POOL`,
|
|
25
|
+
default 4, ceiling 16). h2load `-c 1 -m 1 -n 50` cold first-run:
|
|
26
|
+
**time-to-1st-byte 20.28 → 9.28 ms (−54%); m=100 throughput +5.5%**.
|
|
27
|
+
Warm steady-state unchanged (no head-of-line blocking under the small
|
|
28
|
+
pool — backlog still spills to ad-hoc `task.async`).
|
|
29
|
+
- **2.11-B — HPACK FFI marshalling round-2 (CGlue flipped to default).**
|
|
30
|
+
Three-way bench (`bench/h2_rails_shape.sh` extended): `ruby` (1,585
|
|
31
|
+
r/s) vs `native v2` (1,602 r/s, +1% — noise) vs `native v3 / CGlue`
|
|
32
|
+
(**2,291 r/s, +43% over v2**). The +18-44% native-vs-Ruby headline
|
|
33
|
+
was almost entirely Fiddle marshalling overhead, not the underlying
|
|
34
|
+
Rust HPACK encoder — same encoder, no per-call FFI marshalling, +43%
|
|
35
|
+
rps. Default flipped: unset `HYPERION_H2_NATIVE_HPACK` now selects
|
|
36
|
+
CGlue. Three escape valves stay (`=v2` to force the old path, `=ruby`
|
|
37
|
+
/ `=off` for the pure-Ruby fallback) for any operator that needs
|
|
38
|
+
them. Boot log gains a `native_mode` field documenting which path is
|
|
39
|
+
actually live.
|
|
40
|
+
|
|
41
|
+
Plus operator infrastructure: a stale-`.dylib`-on-Linux cross-platform
|
|
42
|
+
host-OS portability fix in `H2Codec.candidate_paths` (was silently
|
|
43
|
+
falling through to pure-Ruby on the bench host); `bench/h2_rails_shape.sh`
|
|
44
|
+
race-fixed (boot-log probe + stderr routing). Full bench tables and
|
|
45
|
+
flip-decision rationale in [`CHANGELOG.md`](CHANGELOG.md).
|
|
46
|
+
|
|
14
47
|
## What's new in 2.10.1
|
|
15
48
|
|
|
16
49
|
**Static-asset operator surface (2.10-E) + C-ext fast-path response
|
data/lib/hyperion/h2_codec.rb
CHANGED
|
@@ -45,11 +45,36 @@ module Hyperion
|
|
|
45
45
|
@cglue_available == true
|
|
46
46
|
end
|
|
47
47
|
|
|
48
|
+
# 2.11-B — operator-controllable gate that overlays CGlue
|
|
49
|
+
# availability. The Encoder/Decoder hot paths probe this (NOT
|
|
50
|
+
# `cglue_available?`) so a `HYPERION_H2_NATIVE_HPACK=v2` boot can
|
|
51
|
+
# force the Fiddle path even on a host where the C glue loaded
|
|
52
|
+
# successfully. This is the bench-isolation knob 2.11-B's
|
|
53
|
+
# `bench/h2_rails_shape.sh` needs to compare native-v2 against
|
|
54
|
+
# native-v3 honestly — without it, "native" and "cglue" variants
|
|
55
|
+
# would always pick the same physical path.
|
|
56
|
+
#
|
|
57
|
+
# `Http2Handler#initialize` writes the gate based on the env var;
|
|
58
|
+
# tests can flip `@cglue_disabled` directly. Default false (i.e.,
|
|
59
|
+
# gate is OPEN — same physical behavior as 2.4-A through 2.10).
|
|
60
|
+
def self.cglue_active?
|
|
61
|
+
cglue_available? && !@cglue_disabled
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
def self.cglue_disabled=(value)
|
|
65
|
+
@cglue_disabled = value ? true : false
|
|
66
|
+
end
|
|
67
|
+
|
|
68
|
+
def self.cglue_disabled
|
|
69
|
+
@cglue_disabled == true
|
|
70
|
+
end
|
|
71
|
+
|
|
48
72
|
# Force a reload (test seam). Unsets the memoized state so the next
|
|
49
73
|
# `available?` call probes the filesystem again.
|
|
50
74
|
def self.reset!
|
|
51
75
|
@available = nil
|
|
52
76
|
@cglue_available = nil
|
|
77
|
+
@cglue_disabled = false
|
|
53
78
|
@lib = nil
|
|
54
79
|
end
|
|
55
80
|
|
|
@@ -126,7 +151,13 @@ module Hyperion
|
|
|
126
151
|
# into a new owned String — that's the contract callers rely
|
|
127
152
|
# on (`protocol-http2`'s Compressor#encode returns a String,
|
|
128
153
|
# not a slice into shared mutable memory).
|
|
129
|
-
|
|
154
|
+
#
|
|
155
|
+
# 2.11-B — probe `cglue_active?` (NOT `cglue_available?`) so an
|
|
156
|
+
# operator-set `HYPERION_H2_NATIVE_HPACK=v2` boot routes through
|
|
157
|
+
# Fiddle even when the C glue is physically present. Same
|
|
158
|
+
# branch shape; one extra ivar read on the hot path which
|
|
159
|
+
# disappears under YJIT inlining.
|
|
160
|
+
if H2Codec.cglue_active?
|
|
130
161
|
# Pad the scratch String with zero bytes so its length matches
|
|
131
162
|
# capacity — the C ext writes into RSTRING_PTR up to RSTRING_LEN
|
|
132
163
|
# and then truncates back via rb_str_set_len after encoding.
|
|
@@ -272,7 +303,8 @@ module Hyperion
|
|
|
272
303
|
# 2.4-A — fast path: reuse a per-decoder scratch and dispatch
|
|
273
304
|
# through the C glue. The Rust ABI writes `[u32 name_len][name]
|
|
274
305
|
# [u32 val_len][val]` repeated; we unpack that in Ruby.
|
|
275
|
-
|
|
306
|
+
# 2.11-B — `cglue_active?` overlays an operator-set v2 force.
|
|
307
|
+
if H2Codec.cglue_active?
|
|
276
308
|
if capacity > @scratch_out_capacity
|
|
277
309
|
new_cap = @scratch_out_capacity
|
|
278
310
|
new_cap *= 2 while new_cap < capacity
|
|
@@ -412,9 +444,24 @@ module Hyperion
|
|
|
412
444
|
def self.candidate_paths
|
|
413
445
|
gem_lib = File.expand_path('../hyperion_h2_codec', __dir__)
|
|
414
446
|
ext_target = File.expand_path('../../ext/hyperion_h2_codec/target/release', __dir__)
|
|
415
|
-
|
|
416
|
-
|
|
417
|
-
|
|
447
|
+
# 2.11-B fix: order suffixes by host OS. Pre-2.11-B this was a
|
|
448
|
+
# static `[dylib, so]` order, which broke on Linux hosts that
|
|
449
|
+
# had a stale macOS `.dylib` on the path (e.g. a developer rsync
|
|
450
|
+
# leaking the `target/release` artifact across platforms). Fiddle
|
|
451
|
+
# would try the `.dylib` first, choke on the Mach-O binary with
|
|
452
|
+
# `ArgumentError: invalid byte sequence in UTF-8` from libffi,
|
|
453
|
+
# and the rescue in `load!` would silently fall back to the Ruby
|
|
454
|
+
# HPACK path with no warning visible to bench harnesses.
|
|
455
|
+
#
|
|
456
|
+
# Ordering by `host_os` makes Linux pick `.so` first and ignore
|
|
457
|
+
# any orphan `.dylib`; macOS keeps the `.dylib`-first behavior
|
|
458
|
+
# for back-compat with existing dev environments.
|
|
459
|
+
suffixes = if /darwin|mac/i.match?(RbConfig::CONFIG['host_os'])
|
|
460
|
+
%w[libhyperion_h2_codec.dylib libhyperion_h2_codec.so]
|
|
461
|
+
else
|
|
462
|
+
%w[libhyperion_h2_codec.so libhyperion_h2_codec.dylib]
|
|
463
|
+
end
|
|
464
|
+
suffixes.flat_map { |name| [File.join(gem_lib, name), File.join(ext_target, name)] }
|
|
418
465
|
end
|
|
419
466
|
|
|
420
467
|
# FFI wrappers — kept thin so callers don't see Fiddle::Pointer
|
|
@@ -2,6 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
require 'async'
|
|
4
4
|
require 'async/notification'
|
|
5
|
+
require 'async/queue'
|
|
5
6
|
require 'protocol/http2/server'
|
|
6
7
|
require 'protocol/http2/framer'
|
|
7
8
|
require 'protocol/http2/stream'
|
|
@@ -133,7 +134,7 @@ module Hyperion
|
|
|
133
134
|
#
|
|
134
135
|
# Single instance per connection, lives for the lifetime of `serve`.
|
|
135
136
|
class WriterContext
|
|
136
|
-
attr_reader :encode_mutex
|
|
137
|
+
attr_reader :encode_mutex, :dispatch_queue
|
|
137
138
|
# 2.10-G — connection-lifecycle timing slots used by the optional h2
|
|
138
139
|
# latency-instrumentation path (gated by `HYPERION_H2_TIMING=1`).
|
|
139
140
|
# Each slot is a single CLOCK_MONOTONIC timestamp captured at most
|
|
@@ -149,6 +150,15 @@ module Hyperion
|
|
|
149
150
|
@pending_bytes_lock = ::Mutex.new
|
|
150
151
|
@max_pending_bytes = max_pending_bytes
|
|
151
152
|
@writer_done = false
|
|
153
|
+
# 2.11-A — pre-spawned dispatch worker pool. The connection-loop
|
|
154
|
+
# fiber pushes ready streams onto `@dispatch_queue`; workers
|
|
155
|
+
# parked on `dequeue` grab them and call `dispatch_stream`. The
|
|
156
|
+
# queue is created here (cheap — wraps a Thread::Queue) so the
|
|
157
|
+
# WriterContext is fully self-contained and unit-testable without
|
|
158
|
+
# an Async reactor.
|
|
159
|
+
@dispatch_queue = ::Async::Queue.new
|
|
160
|
+
@dispatch_worker_count = 0
|
|
161
|
+
@dispatch_worker_lock = ::Mutex.new
|
|
152
162
|
# 2.10-G timing slots, all initially nil so capture is a single
|
|
153
163
|
# `||=` write under the encode mutex / writer fiber.
|
|
154
164
|
@t0_serve_entry = nil
|
|
@@ -157,6 +167,31 @@ module Hyperion
|
|
|
157
167
|
@t2_first_wire = nil
|
|
158
168
|
end
|
|
159
169
|
|
|
170
|
+
# 2.11-A — bench/diagnostics introspection. Reads the live count
|
|
171
|
+
# of dispatch worker fibers parked on (or actively pulling from)
|
|
172
|
+
# `@dispatch_queue`. Reflects pre-spawned workers AND any ad-hoc
|
|
173
|
+
# workers spawned when the pool was saturated. Exposed as a method
|
|
174
|
+
# rather than `attr_reader` so the lock guards the counter.
|
|
175
|
+
def dispatch_worker_count
|
|
176
|
+
@dispatch_worker_lock.synchronize { @dispatch_worker_count }
|
|
177
|
+
end
|
|
178
|
+
|
|
179
|
+
# Called by a dispatch worker fiber when it enters its run loop.
|
|
180
|
+
# Pairs with `unregister_dispatch_worker` in an ensure block.
|
|
181
|
+
def register_dispatch_worker
|
|
182
|
+
@dispatch_worker_lock.synchronize { @dispatch_worker_count += 1 }
|
|
183
|
+
end
|
|
184
|
+
|
|
185
|
+
# Called by a dispatch worker fiber when it exits (queue closed,
|
|
186
|
+
# or unrecoverable error). Floors at 0 to defend against a stray
|
|
187
|
+
# double-unregister — instrumentation must never go negative.
|
|
188
|
+
def unregister_dispatch_worker
|
|
189
|
+
@dispatch_worker_lock.synchronize do
|
|
190
|
+
@dispatch_worker_count -= 1
|
|
191
|
+
@dispatch_worker_count = 0 if @dispatch_worker_count.negative?
|
|
192
|
+
end
|
|
193
|
+
end
|
|
194
|
+
|
|
160
195
|
# Called by SendQueueIO#write on the calling (encoder) fiber. Enforces
|
|
161
196
|
# the per-connection backpressure cap before enqueuing.
|
|
162
197
|
def enqueue(bytes)
|
|
@@ -480,14 +515,19 @@ module Hyperion
|
|
|
480
515
|
# threshold. 2.4-A's hello-shape bench saw parity because HPACK
|
|
481
516
|
# is <1% of per-stream CPU on a 2-header response.
|
|
482
517
|
#
|
|
518
|
+
# 2.11-B — `HYPERION_H2_NATIVE_HPACK` extended with a native-mode
|
|
519
|
+
# axis (`auto` / `cglue` / `v2` / `off`). See `resolve_h2_native_hpack_state`.
|
|
483
520
|
# Operators who want the prior 2.4.x default (Ruby fallback, env
|
|
484
|
-
# var unset) can
|
|
485
|
-
# `0`/`false`/`no`/`off`)
|
|
486
|
-
#
|
|
521
|
+
# var unset) can set `HYPERION_H2_NATIVE_HPACK=off` (or
|
|
522
|
+
# `0`/`false`/`no`/`off`/`ruby`). `HYPERION_H2_NATIVE_HPACK=1`
|
|
523
|
+
# / unset preserves the 2.5-B `auto` behavior. `=cglue`/`=v2`
|
|
524
|
+
# forces the corresponding native sub-path.
|
|
487
525
|
#
|
|
488
526
|
# When OFF (env-overridden): `protocol-http2`'s pure-Ruby HPACK
|
|
489
527
|
# Compressor / Decompressor handles everything as in 2.0.0–2.4.x.
|
|
490
|
-
@
|
|
528
|
+
@h2_native_mode = resolve_h2_native_hpack_state
|
|
529
|
+
@h2_native_hpack_enabled = @h2_codec_available && @h2_native_mode != :off
|
|
530
|
+
apply_h2_cglue_gate(@h2_native_mode)
|
|
491
531
|
@h2_codec_native = @h2_native_hpack_enabled # back-compat ivar — preserved for codec_native? readers
|
|
492
532
|
# 2.10-G — opt-in connection-setup timing instrumentation. When set,
|
|
493
533
|
# `serve` captures four monotonic timestamps per connection:
|
|
@@ -507,9 +547,45 @@ module Hyperion
|
|
|
507
547
|
# cost when disabled — a single ivar read per stream branch). Used by
|
|
508
548
|
# 2.10-G to root-cause Hyperion's flat ~40 ms first-stream max-latency.
|
|
509
549
|
@h2_timing_enabled = env_flag_enabled?('HYPERION_H2_TIMING')
|
|
550
|
+
# 2.11-A — resolve the dispatch worker pool size once at handler
|
|
551
|
+
# construction so every `serve` call uses the same value (instead
|
|
552
|
+
# of re-parsing ENV per connection on the hot path). Cached as an
|
|
553
|
+
# ivar; bench/diagnostics can read it via the spec seam.
|
|
554
|
+
@dispatch_pool_size = resolve_dispatch_pool_size
|
|
510
555
|
record_codec_boot_state
|
|
511
556
|
end
|
|
512
557
|
|
|
558
|
+
# 2.11-A — pre-spawned dispatch worker pool sizing.
|
|
559
|
+
#
|
|
560
|
+
# Default `4` workers per connection — enough to absorb the typical
|
|
561
|
+
# HTTP/2 burst (2-8 concurrent streams) without paying any per-stream
|
|
562
|
+
# `task.async {}` cost on the hot path. Operators on long-lived
|
|
563
|
+
# high-fan-out connections (e.g. an aggregator backend that fans
|
|
564
|
+
# 30+ parallel streams) can bump this with `HYPERION_H2_DISPATCH_POOL`.
|
|
565
|
+
# Streams that arrive when the pool is saturated still get an ad-hoc
|
|
566
|
+
# fiber (see `serve` below) so concurrency is never artificially
|
|
567
|
+
# capped — the operator-facing limit is `h2.max_concurrent_streams`.
|
|
568
|
+
#
|
|
569
|
+
# Ceiling at 16 guards against a pathological config that would
|
|
570
|
+
# spawn hundreds of idle fibers per accepted connection. Anything
|
|
571
|
+
# malformed / non-positive falls back to the default rather than
|
|
572
|
+
# crashing the connection — this is a tuning knob, not a spec
|
|
573
|
+
# parameter.
|
|
574
|
+
DISPATCH_POOL_DEFAULT = 4
|
|
575
|
+
DISPATCH_POOL_MAX = 16
|
|
576
|
+
|
|
577
|
+
def resolve_dispatch_pool_size
|
|
578
|
+
raw = ENV['HYPERION_H2_DISPATCH_POOL']
|
|
579
|
+
return DISPATCH_POOL_DEFAULT if raw.nil? || raw.strip.empty?
|
|
580
|
+
|
|
581
|
+
n = Integer(raw.strip, 10)
|
|
582
|
+
return DISPATCH_POOL_DEFAULT unless n.positive?
|
|
583
|
+
|
|
584
|
+
[n, DISPATCH_POOL_MAX].min
|
|
585
|
+
rescue ArgumentError, TypeError
|
|
586
|
+
DISPATCH_POOL_DEFAULT
|
|
587
|
+
end
|
|
588
|
+
|
|
513
589
|
# Read an env-var flag with the usual truthiness rules (any of
|
|
514
590
|
# 1/true/yes/on, case-insensitive). Anything else → false.
|
|
515
591
|
def env_flag_enabled?(name)
|
|
@@ -519,21 +595,42 @@ module Hyperion
|
|
|
519
595
|
%w[1 true yes on].include?(v.downcase)
|
|
520
596
|
end
|
|
521
597
|
|
|
522
|
-
#
|
|
523
|
-
# `HYPERION_H2_NATIVE_HPACK
|
|
524
|
-
#
|
|
525
|
-
#
|
|
526
|
-
#
|
|
527
|
-
#
|
|
528
|
-
#
|
|
529
|
-
|
|
598
|
+
# 2.11-B — resolve the operator-requested native-mode state from
|
|
599
|
+
# `HYPERION_H2_NATIVE_HPACK`.
|
|
600
|
+
#
|
|
601
|
+
# Returns one of:
|
|
602
|
+
# * `:auto` — native enabled, prefer cglue if available
|
|
603
|
+
# (unset / `1` / `true` / `yes` / `on` / `auto`)
|
|
604
|
+
# * `:cglue` — native enabled, force cglue (warn-fallback to v2
|
|
605
|
+
# if cglue is unavailable; native_mode log marker
|
|
606
|
+
# surfaces the divergence to the operator)
|
|
607
|
+
# * `:v2` — native enabled, force Fiddle (skip cglue even if
|
|
608
|
+
# available; this is the bench-isolation knob the
|
|
609
|
+
# 2.11-B Rails-shape harness needs)
|
|
610
|
+
# * `:off` — ruby fallback (`0` / `false` / `no` / `off` / `ruby`)
|
|
611
|
+
#
|
|
612
|
+
# Unknown values fall through to `:auto` rather than crashing the
|
|
613
|
+
# connection — same forgiving-default policy as the pre-2.11-B
|
|
614
|
+
# `resolve_h2_native_hpack_default`.
|
|
615
|
+
def resolve_h2_native_hpack_state
|
|
530
616
|
v = ENV['HYPERION_H2_NATIVE_HPACK']
|
|
531
|
-
return
|
|
617
|
+
return :auto if v.nil? || v.empty?
|
|
532
618
|
|
|
533
619
|
lc = v.downcase
|
|
534
|
-
return
|
|
620
|
+
return :off if %w[0 false no off ruby].include?(lc)
|
|
621
|
+
return :cglue if %w[cglue v3].include?(lc)
|
|
622
|
+
return :v2 if %w[v2 fiddle].include?(lc)
|
|
623
|
+
|
|
624
|
+
:auto
|
|
625
|
+
end
|
|
535
626
|
|
|
536
|
-
|
|
627
|
+
# 2.11-B — flip the global `H2Codec.cglue_disabled` gate based on
|
|
628
|
+
# the resolved native-mode state. The gate is per-process state
|
|
629
|
+
# (the codec module is a singleton) so reset it on every handler
|
|
630
|
+
# construction; otherwise a test that booted with `=v2` would leak
|
|
631
|
+
# the disable into a subsequent default-mode handler.
|
|
632
|
+
def apply_h2_cglue_gate(state)
|
|
633
|
+
Hyperion::H2Codec.cglue_disabled = (state == :v2)
|
|
537
634
|
end
|
|
538
635
|
|
|
539
636
|
# 2.0.0 Phase 6b: emit a single-shot boot log line per process
|
|
@@ -544,23 +641,32 @@ module Hyperion
|
|
|
544
641
|
return if Hyperion::Http2Handler.instance_variable_get(:@codec_state_logged)
|
|
545
642
|
|
|
546
643
|
Hyperion::Http2Handler.instance_variable_set(:@codec_state_logged, true)
|
|
547
|
-
cglue_active
|
|
548
|
-
|
|
549
|
-
|
|
550
|
-
|
|
551
|
-
|
|
552
|
-
|
|
553
|
-
|
|
554
|
-
|
|
555
|
-
|
|
556
|
-
|
|
557
|
-
|
|
644
|
+
# 2.11-B — `cglue_active` gates on the operator-controllable
|
|
645
|
+
# `cglue_active?` predicate (was `cglue_available?` pre-2.11-B).
|
|
646
|
+
# When the operator sets `=v2` we want the boot log to read
|
|
647
|
+
# `cglue_active: false` even though the C glue did install
|
|
648
|
+
# successfully — the bench harness inspects this field to
|
|
649
|
+
# differentiate the variants.
|
|
650
|
+
cglue_active = @h2_native_hpack_enabled && Hyperion::H2Codec.cglue_active?
|
|
651
|
+
cglue_requested_unavailable = @h2_native_mode == :cglue &&
|
|
652
|
+
@h2_native_hpack_enabled &&
|
|
653
|
+
!Hyperion::H2Codec.cglue_available?
|
|
654
|
+
mode = describe_codec_mode(cglue_active: cglue_active,
|
|
655
|
+
cglue_requested_unavailable: cglue_requested_unavailable)
|
|
656
|
+
native_mode_log = if !@h2_native_hpack_enabled
|
|
657
|
+
@h2_native_mode == :off ? 'off' : 'native-disabled'
|
|
658
|
+
elsif cglue_requested_unavailable
|
|
659
|
+
'cglue-requested-unavailable'
|
|
660
|
+
else
|
|
661
|
+
@h2_native_mode.to_s
|
|
662
|
+
end
|
|
558
663
|
@logger.info do
|
|
559
664
|
{
|
|
560
665
|
message: 'h2 codec selected',
|
|
561
666
|
mode: mode,
|
|
562
667
|
native_available: @h2_codec_available,
|
|
563
668
|
native_enabled: @h2_native_hpack_enabled,
|
|
669
|
+
native_mode: native_mode_log,
|
|
564
670
|
cglue_active: cglue_active,
|
|
565
671
|
hpack_path: if @h2_native_hpack_enabled
|
|
566
672
|
cglue_active ? 'native-v3' : 'native-v2'
|
|
@@ -573,6 +679,34 @@ module Hyperion
|
|
|
573
679
|
@metrics.increment(:h2_codec_fallback_selected) unless @h2_native_hpack_enabled
|
|
574
680
|
end
|
|
575
681
|
|
|
682
|
+
# 2.11-B — boot-log mode descriptor (extracted for clarity since
|
|
683
|
+
# the matrix of native_mode × cglue_available × cglue_active grew
|
|
684
|
+
# past the point where an inline conditional was readable).
|
|
685
|
+
def describe_codec_mode(cglue_active:, cglue_requested_unavailable:)
|
|
686
|
+
if !@h2_native_hpack_enabled
|
|
687
|
+
if @h2_codec_available
|
|
688
|
+
'fallback (protocol-http2 / pure Ruby HPACK) — native available but opted out via HYPERION_H2_NATIVE_HPACK=off'
|
|
689
|
+
else
|
|
690
|
+
'fallback (protocol-http2 / pure Ruby HPACK) — native unavailable'
|
|
691
|
+
end
|
|
692
|
+
elsif cglue_active && @h2_native_mode == :cglue
|
|
693
|
+
'native (Rust v3 / CGlue, forced) — HPACK on hot path, no Fiddle per call'
|
|
694
|
+
elsif cglue_active
|
|
695
|
+
# 2.11-B confirmed cglue as the firm default — the bench-measured
|
|
696
|
+
# delta vs the v2 (Fiddle) path is +33-43% on Rails-shape h2
|
|
697
|
+
# responses, which is the actual win the 2.5-B "+18% native vs
|
|
698
|
+
# ruby" headline was capturing (v2 alone is +1-5%, basically
|
|
699
|
+
# noise vs the ruby fallback at this header count).
|
|
700
|
+
'native (Rust v3 / CGlue, default since 2.11-B) — HPACK on hot path, no Fiddle per call'
|
|
701
|
+
elsif @h2_native_mode == :v2
|
|
702
|
+
'native (Rust v2 / Fiddle, forced) — HPACK on hot path, Fiddle marshalling per call'
|
|
703
|
+
elsif cglue_requested_unavailable
|
|
704
|
+
'native (Rust v2 / Fiddle) — CGlue requested via HYPERION_H2_NATIVE_HPACK=cglue but unavailable, fell back'
|
|
705
|
+
else
|
|
706
|
+
'native (Rust v2 / Fiddle) — HPACK on hot path, Fiddle marshalling per call'
|
|
707
|
+
end
|
|
708
|
+
end
|
|
709
|
+
|
|
576
710
|
# Read-only accessor used by tests + diagnostics. true = the
|
|
577
711
|
# `Hyperion::H2Codec` Rust extension loaded successfully AND
|
|
578
712
|
# `HYPERION_H2_NATIVE_HPACK=1` is set, so `build_server` will
|
|
@@ -610,6 +744,14 @@ module Hyperion
|
|
|
610
744
|
|
|
611
745
|
task = ::Async::Task.current
|
|
612
746
|
|
|
747
|
+
# 2.11-A — extract the peer address BEFORE the preface exchange.
|
|
748
|
+
# Two wins: (1) the lookup runs in parallel with the writer fiber
|
|
749
|
+
# picking up the first scheduler slot, and (2) the first stream's
|
|
750
|
+
# dispatch fiber doesn't pay this `peeraddr` syscall on its hot
|
|
751
|
+
# path. The address is then captured by the worker closures
|
|
752
|
+
# below.
|
|
753
|
+
peer_addr = peer_address(socket)
|
|
754
|
+
|
|
613
755
|
# Spawn the dedicated writer fiber BEFORE the preface exchange.
|
|
614
756
|
# `Server#read_connection_preface` writes the server's SETTINGS frame
|
|
615
757
|
# via the framer; if the writer isn't running, those bytes sit in the
|
|
@@ -618,15 +760,23 @@ module Hyperion
|
|
|
618
760
|
# waits for our SETTINGS before sending more frames.
|
|
619
761
|
writer_task = task.async { run_writer_loop(socket, writer_ctx) }
|
|
620
762
|
|
|
763
|
+
# 2.11-A — pre-spawn the dispatch worker pool BEFORE the preface
|
|
764
|
+
# exchange. Workers park on `writer_ctx.dispatch_queue.dequeue`;
|
|
765
|
+
# by the time the first client HEADERS frame arrives the workers
|
|
766
|
+
# are already in the scheduler's runnable set. The first stream
|
|
767
|
+
# is just an enqueue + dequeue (microseconds) instead of a
|
|
768
|
+
# `task.async {}` cold spawn (was the dominant cost in the t1→t2_enc
|
|
769
|
+
# bucket per the 2.10-G timing breakdown).
|
|
770
|
+
warmup_dispatch_pool!(task, writer_ctx, peer_addr: peer_addr,
|
|
771
|
+
pool_size: @dispatch_pool_size)
|
|
772
|
+
|
|
621
773
|
server.read_connection_preface(initial_settings_payload)
|
|
622
774
|
writer_ctx.t1_preface_done = monotonic_now if @h2_timing_enabled
|
|
623
775
|
|
|
624
|
-
#
|
|
625
|
-
|
|
626
|
-
|
|
627
|
-
|
|
628
|
-
# connection close.
|
|
629
|
-
stream_tasks = []
|
|
776
|
+
# Track ad-hoc per-stream dispatch fibers (spilled when the pool is
|
|
777
|
+
# saturated). The pool handles the common case; we only fall back
|
|
778
|
+
# to `task.async {}` when more streams arrive than warm workers.
|
|
779
|
+
overflow_tasks = []
|
|
630
780
|
|
|
631
781
|
until server.closed?
|
|
632
782
|
ready_ids = []
|
|
@@ -645,14 +795,35 @@ module Hyperion
|
|
|
645
795
|
# if subsequent frames (e.g. RST_STREAM races) arrive.
|
|
646
796
|
stream.instance_variable_set(:@hyperion_dispatched, true)
|
|
647
797
|
|
|
648
|
-
|
|
649
|
-
|
|
798
|
+
# 2.11-A — hand the stream to a warm worker via the dispatch
|
|
799
|
+
# queue. We use a simple "queue is empty" probe to decide:
|
|
800
|
+
#
|
|
801
|
+
# * Empty queue ⇒ at least one worker is parked on
|
|
802
|
+
# `dequeue`; the enqueue+dequeue handoff is microseconds
|
|
803
|
+
# and we avoid a `task.async {}` cold spawn. This is the
|
|
804
|
+
# hot path for the FIRST stream of a fresh connection
|
|
805
|
+
# (the case 2.11-A is targeting).
|
|
806
|
+
# * Non-empty queue ⇒ every parked worker has already
|
|
807
|
+
# pulled a stream; another worker won't pick this up
|
|
808
|
+
# until one finishes. To avoid head-of-line blocking
|
|
809
|
+
# behind the warmup pool, fall back to `task.async {}`.
|
|
810
|
+
# The overflow fiber re-uses `dispatch_stream` so the
|
|
811
|
+
# dispatch contract is identical between pool and
|
|
812
|
+
# overflow paths. Concurrency is never artificially
|
|
813
|
+
# capped; the operator-facing knob is
|
|
814
|
+
# `h2.max_concurrent_streams`.
|
|
815
|
+
if writer_ctx.dispatch_queue.size.zero?
|
|
816
|
+
writer_ctx.dispatch_queue.enqueue(stream)
|
|
817
|
+
else
|
|
818
|
+
overflow_tasks << task.async do
|
|
819
|
+
dispatch_stream(stream, writer_ctx, peer_addr)
|
|
820
|
+
end
|
|
650
821
|
end
|
|
651
822
|
end
|
|
652
823
|
end
|
|
653
824
|
|
|
654
825
|
# Drain in-flight stream dispatches before we close the socket.
|
|
655
|
-
|
|
826
|
+
overflow_tasks.each do |t|
|
|
656
827
|
t.wait
|
|
657
828
|
rescue StandardError
|
|
658
829
|
nil
|
|
@@ -676,6 +847,18 @@ module Hyperion
|
|
|
676
847
|
# socket before the writer drains would discard final RST_STREAM /
|
|
677
848
|
# GOAWAY / END_STREAM frames in the queue.
|
|
678
849
|
if writer_ctx
|
|
850
|
+
# 2.11-A — close the dispatch queue so any pre-spawned workers
|
|
851
|
+
# parked on `dequeue` fall through (Async::Queue#dequeue returns
|
|
852
|
+
# nil after close). Do this BEFORE waiting on the writer so
|
|
853
|
+
# pool workers can drain their in-flight stream dispatches and
|
|
854
|
+
# release the encode mutex; otherwise the writer might park
|
|
855
|
+
# waiting for bytes that the dispatch worker never gets to
|
|
856
|
+
# encode.
|
|
857
|
+
begin
|
|
858
|
+
writer_ctx.dispatch_queue.close unless writer_ctx.dispatch_queue.closed?
|
|
859
|
+
rescue StandardError
|
|
860
|
+
nil
|
|
861
|
+
end
|
|
679
862
|
writer_ctx.shutdown!
|
|
680
863
|
begin
|
|
681
864
|
writer_task&.wait
|
|
@@ -695,6 +878,63 @@ module Hyperion
|
|
|
695
878
|
|
|
696
879
|
private
|
|
697
880
|
|
|
881
|
+
# 2.11-A — pre-spawn the per-connection dispatch worker pool.
|
|
882
|
+
#
|
|
883
|
+
# Each worker is a fiber that loops:
|
|
884
|
+
# 1. `dequeue` a stream from the per-connection dispatch queue
|
|
885
|
+
# (parks the fiber on the queue's internal notification when
|
|
886
|
+
# empty — zero CPU until a stream arrives).
|
|
887
|
+
# 2. Calls `dispatch_stream` with the stream + writer context +
|
|
888
|
+
# pre-resolved peer address.
|
|
889
|
+
# 3. Loops back to (1). Exits cleanly when `dequeue` returns nil
|
|
890
|
+
# (queue closed by `serve`'s ensure block on connection
|
|
891
|
+
# teardown).
|
|
892
|
+
#
|
|
893
|
+
# Why pre-spawn rather than `task.async {}` per stream:
|
|
894
|
+
# * Fiber startup under Async involves a few µs of allocation and
|
|
895
|
+
# scheduler bookkeeping. Per-stream that's negligible; on the
|
|
896
|
+
# CONNECTION COLD PATH (first request on a fresh TCP/TLS conn)
|
|
897
|
+
# it adds up to a measurable share of the t1→t2_enc bucket
|
|
898
|
+
# (the 2.10-G timing breakdown showed ~12-25 ms on h2load
|
|
899
|
+
# `-c 1 -m 100 -n 5000`).
|
|
900
|
+
# * Workers parked on `dequeue` are already in the scheduler's
|
|
901
|
+
# ready set; the first stream is just an enqueue + dequeue
|
|
902
|
+
# handoff (microseconds).
|
|
903
|
+
#
|
|
904
|
+
# Errors inside `dispatch_stream` are already caught + RST_STREAMed
|
|
905
|
+
# there, so the worker only needs to defend against truly
|
|
906
|
+
# unexpected failures (queue shutdown races, fiber kill on graceful
|
|
907
|
+
# shutdown). We swallow those defensively and unregister so the
|
|
908
|
+
# `dispatch_worker_count` introspection is truthful.
|
|
909
|
+
def warmup_dispatch_pool!(task, writer_ctx, peer_addr:, pool_size:)
|
|
910
|
+
pool_size.times do
|
|
911
|
+
task.async do
|
|
912
|
+
writer_ctx.register_dispatch_worker
|
|
913
|
+
begin
|
|
914
|
+
loop do
|
|
915
|
+
stream = writer_ctx.dispatch_queue.dequeue
|
|
916
|
+
break if stream.nil? # queue closed → graceful exit
|
|
917
|
+
|
|
918
|
+
begin
|
|
919
|
+
dispatch_stream(stream, writer_ctx, peer_addr)
|
|
920
|
+
rescue StandardError => e
|
|
921
|
+
# `dispatch_stream` already logs + RST_STREAMs internally;
|
|
922
|
+
# if anything escapes that net we log here and keep the
|
|
923
|
+
# worker alive — one bad stream must not poison the
|
|
924
|
+
# connection's worker pool.
|
|
925
|
+
@logger.error do
|
|
926
|
+
{ message: 'h2 dispatch worker swallowed error',
|
|
927
|
+
error: e.message, error_class: e.class.name }
|
|
928
|
+
end
|
|
929
|
+
end
|
|
930
|
+
end
|
|
931
|
+
ensure
|
|
932
|
+
writer_ctx.unregister_dispatch_worker
|
|
933
|
+
end
|
|
934
|
+
end
|
|
935
|
+
end
|
|
936
|
+
end
|
|
937
|
+
|
|
698
938
|
# Build the [setting_id, value] pairs that go in the connection-preface
|
|
699
939
|
# SETTINGS frame. protocol-http2's Server#read_connection_preface accepts
|
|
700
940
|
# this array and does the wire encoding for us. Empty array (no overrides
|
data/lib/hyperion/version.rb
CHANGED