hyperion-rb 2.11.0 → 2.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 70fee9940bc27a68927bb0204717bb35da7ea695bc02fe49aaa6c9597c9dd78d
- data.tar.gz: aaa583caa320c605bb06e2894c3f5452d183b4aa5f1ab27329a2a3a89e1ef900
+ metadata.gz: ae613d6efdf92072a2076a95a3656687e6bc0b5a9302700c56d75bcc0a900b56
+ data.tar.gz: f4f3f26cff7f4ebd08d35757726219ee53e65418bacdd0061d05626d7b354dba
  SHA512:
- metadata.gz: 3c28ad4d534e00eb3b687903da63ae7c04c918d27081d9ac73e04a67d521e2abd832f100b20d87b3411fa9c7be5abdcda2f8638ca8c139ef913fbd0f1ad1930a
- data.tar.gz: debe988e4dc461376df38bdd84833a2f608dcfb8bf2dc420fc086e89970607b1c36d632dc1057b4eb9983f83918fdc60d93ee87a10098f6f775ccd4207611fb1
+ metadata.gz: 5e0cbc3aa81bbe246dbf670642cae5fd2e1f0275f36030729bb68fe23b00dd54bbc51fb5f2e62c0a306fd825f73ab35e86cc49dd3bb6a27f38269885067de62c
+ data.tar.gz: f749e0b6e05c52695bc4698d847b92e88c10a17dc21a2d6d93deb9c9a013c5c712f778dfa13aa6dab6f5dd0ebf398c69a5eb56255031aaf9539ca561a38bca6d
data/CHANGELOG.md CHANGED
@@ -1,5 +1,571 @@
  # Changelog
 
+ ## 2.12.0 — 2026-05-01
+
+ ### 2.12-F — gRPC support on h2
+
+ **Background.** Hyperion has shipped a working HTTP/2 stack since 1.6
+ (per-stream fiber multiplexing, WINDOW_UPDATE-aware flow control, ALPN
+ auto-negotiation, native HPACK via the Rust v3/CGlue codec since 2.5-B).
+ What was missing for gRPC over Hyperion was three small things:
+
+ 1. **Trailing headers.** gRPC carries its protocol-level status
+ (`grpc-status: 0` / `grpc-message: OK`) as **trailers** — a
+ final HEADERS frame sent AFTER the body's DATA frames, with
+ END_STREAM=1. Hyperion's pre-2.12-F dispatch path always
+ folded END_STREAM onto the LAST DATA frame, so there was no
+ hook for emitting trailers.
+ 2. **Binary-clean request body.** The h2 RequestStream initialised
+ `@request_body = +''` (UTF-8). Valid gRPC request bodies
+ (`[1-byte compressed flag][4-byte length-prefix][protobuf bytes]` —
+ see the framing sketch after this list) contain non-UTF-8 byte
+ sequences, which broke `.bytesize` / `valid_encoding?` checks and
+ corrupted bodies that downstream code interpolated into a UTF-8
+ String.
+ 3. **TE: trailers preservation.** Hyperion's RFC 7540 §8.1.2.2
+ validator already accepted `te: trailers` (any other TE value
+ is rejected as a protocol error per the spec). What was needed
+ was a non-regression spec confirming the header makes it to
+ `env['HTTP_TE']` for the Rack app.
+
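+ For orientation, here is the standard gRPC length-prefixed message
+ framing as a plain-Ruby sketch (hypothetical helper, not Hyperion
+ API):
+
+ ```ruby
+ # Unpack one gRPC length-prefixed message from a binary request body.
+ # Illustrative only — names and error handling are not Hyperion's.
+ def read_grpc_message(body)
+   body       = body.b                    # ensure an ASCII-8BIT view
+   compressed = body.getbyte(0)           # 1-byte compressed flag
+   length     = body[1, 4].unpack1('N')   # 4-byte big-endian length prefix
+   payload    = body[5, length]           # raw protobuf bytes
+   [compressed == 1, payload]
+ end
+ ```
+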
+ **What's new.** The h2 dispatch path (`Http2Handler#dispatch_stream`)
+ now checks if the Rack response body responds to `:trailers` AFTER
+ iterating the body. When it does, the wire shape becomes:
+
+ ```
+ HEADERS (no END_STREAM)
+ DATA (no END_STREAM on last DATA)
+ HEADERS (END_STREAM=1) ← trailers, e.g. grpc-status: 0
+ ```
+
+ The trailer Hash is filtered defensively before encoding: pseudo-headers
+ (`:status` / `:method` / etc.) and connection-specific names
+ (`connection`, `transfer-encoding`, …) are stripped — same allow-list
+ the regular response-header path uses. A misbehaving app that returns
+ non-Hash trailers (or whose `body.trailers` raises) gets a warn log and
+ falls back to the no-trailers path; the connection never crashes.
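+
+ A sketch of what that defensive collection looks like (shape only —
+ Hyperion's actual `collect_response_trailers` may differ in detail):
+
+ ```ruby
+ # Connection-specific names that must never appear in trailers.
+ FORBIDDEN_TRAILERS = %w[connection transfer-encoding keep-alive upgrade te].freeze
+
+ def collect_response_trailers(body)
+   return nil unless body.respond_to?(:trailers)
+   trailers = body.trailers
+   return nil unless trailers.is_a?(Hash) && !trailers.empty?
+   trailers.reject do |name, _value|
+     n = name.to_s
+     n.start_with?(':') || FORBIDDEN_TRAILERS.include?(n.downcase)
+   end
+ rescue StandardError => e
+   warn "hyperion: body.trailers raised #{e.class}; sending no trailers"
+   nil
+ end
+ ```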
+
+ The `@request_body` ivar on `Http2Handler::RequestStream` is now an
+ `Encoding::ASCII_8BIT` String, so binary bytes survive verbatim through
+ `<<` accumulation and `env['rack.input'].read`.
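+
+ Illustrated in plain Ruby (nothing Hyperion-specific):
+
+ ```ruby
+ buf = String.new(encoding: Encoding::ASCII_8BIT)  # the 2.12-F initialisation
+ buf << "\xFF\x00\x01\x02\xC3\x28\x80".b           # h2 DATA payload bytes
+ buf.bytesize   # => 7 — a byte count, no UTF-8 validity assumptions
+ buf.encoding   # => #<Encoding:BINARY (ASCII-8BIT)>
+ ```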
+
+ **What's NOT changed.** The existing wire shape for non-gRPC h2 traffic
+ is identical to 2.11.x. A Rack body that doesn't define `:trailers` (or
+ where `body.trailers` is `nil` / empty) takes the pre-2.12-F path:
+ HEADERS → DATA-with-END_STREAM-on-last. Verified by three non-regression
+ specs in `spec/hyperion/grpc_trailers_spec.rb`.
+
+ The HTTP/1.1 dispatch path is untouched. gRPC is HTTP/2-only by spec
+ (the `Trailer:` mechanism on h1 has interop gaps with Go / Java clients;
+ gRPC mandates h2 since v1.0). Hyperion follows suit.
+
+ **What's NOT in scope for this stream.** gRPC server streaming /
+ client streaming / bidi streaming require Rack 3 streaming bodies AND
+ a way for the Rack app to read incoming DATA frames after sending
+ response HEADERS. That's a separate stream-control problem (`rack.hijack`
+ on h2 needs RFC 8441 Extended CONNECT, same blocker as WebSocket-over-h2).
+ 2.12-F lands the **unary** (request-response) and **server-side trailers**
+ half — which covers ~80% of production gRPC traffic shapes by message
+ volume per the public Google / Netflix talks. Streaming is a 2.13
+ candidate.
+
+ **Opt-in via Rack 3 `body.trailers`.** Apps don't need to know about
+ Hyperion. Any Rack 3 body — hand-rolled, `grpc-server-rack`, a future
+ `Rack::Trailers` middleware — that exposes `body.trailers` returning
+ a Hash gets the trailing HEADERS frame on the wire. Bodies that don't
+ expose it (the overwhelming majority of HTTP/2 traffic — Rails,
+ Sinatra, asset servers, JSON APIs) take the legacy path with no
+ behaviour change.
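+
+ A minimal hand-rolled opt-in, as a sketch (class name and payload here
+ are illustrative, not a Hyperion or `grpc-server-rack` API):
+
+ ```ruby
+ # A Rack 3 response body that carries gRPC-style trailers.
+ # Hyperion only cares that #trailers exists and returns a Hash.
+ class GrpcUnaryBody
+   def initialize(encoded_message)
+     @chunks = [encoded_message]
+   end
+
+   def each(&block)   # standard Rack body iteration
+     @chunks.each(&block)
+   end
+
+   def trailers       # probed after the body has been iterated
+     { 'grpc-status' => '0', 'grpc-message' => 'OK' }
+   end
+ end
+
+ run ->(env) {
+   message = "\x00\x00\x00\x00\x05hello".b   # placeholder unary payload
+   [200, { 'content-type' => 'application/grpc' }, GrpcUnaryBody.new(message)]
+ }
+ ```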
+
+ **Spec coverage.** `spec/hyperion/grpc_trailers_spec.rb` — 11 specs
+ covering:
+
+ - End-to-end TLS+h2 client driving a real Hyperion server through
+ `Protocol::HTTP2::Client`, asserting the trailing HEADERS frame
+ decodes to the expected `grpc-status` / `grpc-message` map.
+ - `te: trailers` request header reaches `env['HTTP_TE']`.
+ - Binary request body bytes (`\xFF\x00\x01\x02\xC3\x28\x80`)
+ round-trip verbatim through `env['rack.input'].read`.
+ - Non-regression: bodies without `:trailers` keep the
+ DATA-with-END_STREAM-on-last shape (no extra HEADERS frame).
+ - Defensive: `body.trailers = nil` and `body.trailers = {}` both
+ take the no-trailers path.
+ - Unit tests for `collect_response_trailers` covering nil /
+ raises / Hash-coercible / non-responding bodies.
+
+ `spec/hyperion/grpc_smoke_spec.rb` — opt-in (`RUN_GRPC_SMOKE=1`)
+ end-to-end smoke against the real `grpc` Ruby gem. Skipped by default
+ because the gem pulls protobuf C bindings (~50 MB build); the unit
+ specs above are the durable coverage.
+
+ **Performance.** Zero hot-path cost when no app uses trailers — the
+ `body.respond_to?(:trailers)` probe is one method dispatch, and the
+ trailers-Hash branch only runs when the probe returns truthy. The
+ trailers-emitting path costs one extra encoder-mutex acquisition + one
+ extra HEADERS frame per response — negligible against the existing
+ HEADERS+DATA sequence. No bench numbers here because gRPC throughput
+ on Hyperion is dominated by the application's protobuf marshalling,
+ not the framing layer; a meaningful gRPC bench is a 2.13 follow-up
+ once we have a streaming story to compare against Falcon's
+ `async-grpc`.
+
+ ### 2.12-E — SO_REUSEPORT load balancing audit
+
+ **Background.** Hyperion's cluster mode (`-w N`) uses two listener-FD
+ models depending on platform:
+
+ - **Linux:** `SO_REUSEPORT`. Every worker calls `bind()` on the same
+ port; the kernel load-balances incoming connections across the
+ workers' sockets based on a 4-tuple hash. Documented as the
+ production-recommended setup since rc15.
+ - **Darwin / BSD:** master-bind + worker-fd-share. The master process
+ binds the listener and passes the fd to workers via `fork()`;
+ workers race to `accept()`. Darwin's `SO_REUSEPORT` doesn't
+ load-balance — it just lets multiple sockets bind without
+ `EADDRINUSE` — so we deliberately do NOT use it on Darwin.
+
+ The Linux path was built on the assumption that the kernel hash
+ distributes uniformly across workers under sustained load. We had no
+ **measurement** of that — only theory. 2.12-E fixes the gap.
+
+ **New per-worker request counter.** `Hyperion::Metrics#tick_worker_request`
+ ticks the labeled counter family
+ `hyperion_requests_dispatch_total{worker_id="<pid>"}` once per
+ dispatched request, regardless of which dispatch shape served it:
+
+ - Rack-via-`Connection#serve` (the threadpool / inline / async-io
+ paths).
+ - HTTP/2 stream dispatch in `Http2Handler`.
+ - The 2.12-C `accept4` C accept loop.
+ - The 2.12-D io_uring accept loop.
+
+ The C-loop variants tick a process-global atomic counter
+ (`Hyperion::Http::PageCache.c_loop_requests_total`) that the
+ `PrometheusExporter.render_full` pass folds into the
+ per-worker series at scrape time — so operators see one consistent
+ per-PID count even on the C-only fast path that bypasses
+ `Connection#serve`. Hot-path cost on the C loops is one
+ `__atomic_add_fetch` (relaxed) — negligible against the 134k r/s
+ 2.12-D bench peak.
+
+ The label value is `Process.pid.to_s` — matches the 2.4-C
+ `hyperion_io_uring_workers_active` and
+ `hyperion_per_conn_rejections_total` labeling convention; lets
+ operators correlate cluster-mode rows with `ps`/`/proc` data without
+ a separate worker_id ↔ pid mapping table.
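+
+ On scrape, the series reads like this (illustrative output — pids and
+ counts are made up):
+
+ ```
+ hyperion_requests_dispatch_total{worker_id="41213"} 124876
+ hyperion_requests_dispatch_total{worker_id="41214"} 125102
+ hyperion_requests_dispatch_total{worker_id="41215"} 124990
+ hyperion_requests_dispatch_total{worker_id="41216"} 125333
+ ```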
+
+ **New bench harness.** `bench/cluster_distribution.sh` boots
+ Hyperion `-w 4 -t 1 -p 9292 bench/hello_static.ru` (4 workers,
+ one thread each, so any imbalance shows up sharply), runs
+ `wrk -t8 -c200 -d30s` against `/`,
+ then scrapes `/-/metrics` repeatedly until all 4 workers have
+ responded. Reports the per-worker request distribution
+ (pid + count + share-%), mean / stddev / max-vs-min ratio, and a
+ verdict (classification sketched below):
+
+ - `balanced` (max/min ≤ 1.10): kernel hash is doing its job.
+ - `mild` (1.10 < max/min ≤ 1.50): noted in the report, no fix filed.
+ - `severe` (max/min > 1.50): file follow-up; do not ship as-is.
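+
+ The classification, as a plain-Ruby sketch (thresholds from the list
+ above; the helper name is illustrative):
+
+ ```ruby
+ # counts: Hash of worker pid => requests served, read from /-/metrics.
+ def distribution_verdict(counts)
+   ratio = counts.values.max.to_f / counts.values.min
+   if    ratio <= 1.10 then :balanced
+   elsif ratio <= 1.50 then :mild
+   else                     :severe
+   end
+ end
+
+ distribution_verdict(41213 => 124_876, 41214 => 125_102,
+                      41215 => 124_990, 41216 => 125_333) # => :balanced
+ ```
+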
+ Three runs back-to-back so noise is visible; aggregate verdict is
+ the worst per-run verdict (one severe run is enough to fail the
+ audit).
+
+ **Bench result.** See `docs/BENCH_HYPERION_2_11.md`'s "Cluster
+ distribution audit" section. Headline number lands in the
+ `[bench][2.12.0] 2.12-E — bench result: …` follow-up commit.
+
+ **Darwin caveat.** The Darwin master-bind/worker-fd-share path is
+ documented as known-imbalanced and is NOT covered by this audit
+ (no kernel SO_REUSEPORT distributor to measure). The metric
+ infrastructure is platform-independent; any operator running on
+ Darwin can read their own per-worker imbalance via `/-/metrics` and
+ the same `hyperion_requests_dispatch_total{worker_id}` series.
+
+ **Spec coverage.** `spec/hyperion/metrics_per_worker_request_count_spec.rb`
+ asserts:
+
+ - `Metrics#tick_worker_request` registers the labeled family with
+ `worker_id` label and increments the right series.
+ - `Connection#serve` ticks the counter once per request from any
+ dispatch shape (regular Rack, direct dispatch, StaticEntry fast
+ path).
+ - `PrometheusExporter.render_full` emits the merged
+ `hyperion_requests_dispatch_total{worker_id="<pid>"}` line.
+ - The C accept4 loop's atomic counter ticks per request and is
+ reset on `run_static_accept_loop` entry. (Gated on the C ext
+ being available; same gating pattern 2.12-D specs use.)
+
+ ### 2.12-D — io_uring accept loop (Linux 5.x)
+
+ The 2.12-C `accept4` loop closed most of the gap to Agoo on the
+ `handle_static` hello row (5,502 → 15,685 r/s, 1.21× from parity).
+ Looking at the remaining cost on Linux: the worker thread does
+ **three** syscalls per request — `accept4` + `recv` + `write`. On a
+ host that delivers ~150 ns of syscall entry/exit overhead each, that's
+ ~450 ns of pure kernel-mode bookkeeping per request, before any I/O
+ actually happens.
+
+ io_uring lets us collapse the train. We submit ACCEPT/RECV/WRITE/CLOSE
+ SQEs to a per-worker ring; the worker thread does **one**
+ `io_uring_enter` per cycle and reaps N completions in a single
+ syscall round-trip. Steady-state cost drops from 3 syscalls per
+ request to roughly one syscall per K requests (where K is the
+ burst-batch size the kernel naturally delivers).
+
+ **New C exports** on `Hyperion::Http::PageCache`:
+
+ - `run_static_io_uring_loop(listen_fd) -> Integer | :crashed | :unavailable`
+ drives the io_uring accept-and-serve loop. Same wire contract as
+ `run_static_accept_loop`: only `handle_static` routes, plain TCP,
+ GET/HEAD without a body, HTTP/1.1 only. Returns `:unavailable` if
+ the C ext was built without `liburing` OR the runtime
+ `io_uring_queue_init` probe failed (seccomp / locked-down
+ container / kernel < 5.5). The Ruby caller treats `:unavailable`
+ as "fall through to the 2.12-C `accept4` path" — operator gets a
+ one-line warn-level log, no surprise downtime.
+ - `io_uring_loop_compiled?` — boolean, true when the C ext was
+ built with `HAVE_LIBURING`. Cheap eligibility check that lets
+ `ConnectionLoop#io_uring_eligible?` skip the env-var read on
+ builds where the path can't engage anyway.
+
+ **Build.** `extconf.rb` probes for `liburing` in two passes (a
+ minimal mkmf sketch follows the list):
+
+ 1. `pkg-config --exists liburing` — picks up Debian/Ubuntu's
+ pkg-config metadata and adds the right `-L`/`-l` flags.
+ 2. `have_header('liburing.h')` + `have_library('uring', ...)` —
+ covers the no-pkg-config path.
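+
+ The two-pass probe in mkmf terms (a sketch; the extension path and
+ exact probe arguments are assumptions, not necessarily Hyperion's):
+
+ ```ruby
+ require 'mkmf'
+
+ # Pass 1: pkg-config metadata (Debian/Ubuntu liburing-dev ships it).
+ found = pkg_config('liburing')
+
+ # Pass 2: raw header + library probe for hosts without pkg-config.
+ # have_library('uring', ...) defines HAVE_LIBURING on success.
+ found ||= have_header('liburing.h') &&
+           have_library('uring', 'io_uring_queue_init', 'liburing.h')
+
+ # Either way the same translation unit compiles; without the macro it
+ # builds the thin stub that returns :unavailable at runtime.
+ create_makefile('hyperion/page_cache')
+ ```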
+
+ On success, `-DHAVE_LIBURING` lands and the io_uring loop compiles.
+ On failure, the same `io_uring_loop.c` translation unit compiles to
+ a thin stub that returns `:unavailable`. Soft-optional dependency:
+ hosts without liburing-dev still build cleanly and ship the 2.12-C
+ behaviour.
+
+ **Wiring.** `Hyperion::Server::ConnectionLoop.io_uring_eligible?`
+ returns true when ALL hold:
+
+ 1. The C accept-loop path is available
+ (`ConnectionLoop.available?`).
+ 2. The C ext was compiled with `HAVE_LIBURING`.
+ 3. `HYPERION_IO_URING_ACCEPT=1` is set (default OFF in 2.12 — the
+ soak window before flipping the default to ON is 2.13 or later).
+
+ When eligible, `Server#run_c_accept_loop` calls
+ `run_static_io_uring_loop` first; on `:unavailable` (runtime probe
+ fail) it falls through to `run_static_accept_loop`. A new
+ `:c_accept_loop_io_uring_h1` `DispatchMode` symbol counts engagements
+ under `requests_dispatch_c_accept_loop_io_uring_h1`.
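+
+ The gate and fallback, sketched in Ruby (shape only — paraphrased
+ from the description above, not the verbatim implementation):
+
+ ```ruby
+ module Hyperion
+   module Server
+     module ConnectionLoop
+       def self.io_uring_eligible?
+         available? &&                                  # 2.12-C path present
+           Hyperion::Http::PageCache.io_uring_loop_compiled? &&
+           ENV['HYPERION_IO_URING_ACCEPT'] == '1'       # explicit opt-in
+       end
+     end
+   end
+ end
+
+ # Inside Server#run_c_accept_loop (paraphrased):
+ result = if ConnectionLoop.io_uring_eligible?
+   Hyperion::Http::PageCache.run_static_io_uring_loop(listen_fd)
+ end
+
+ if result.nil? || result == :unavailable   # ineligible, or runtime probe failed
+   Hyperion::Http::PageCache.run_static_accept_loop(listen_fd)
+ end
+ ```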
+
+ **Lifecycle hooks.** Same contract as 2.12-C: per-request
+ `Runtime#fire_request_start` / `#fire_request_end` fire on every
+ request the io_uring loop served, with `env=nil` and a minimal
+ `Hyperion::Request` passed. The C-side `lifecycle_active?` flag gates
+ the GVL re-acquisition (`rb_thread_call_with_gvl`); the no-hook hot
+ path stays GVL-free for the whole `submit_and_wait` cycle.
+
+ **Handoff.** Same set of "send to Ruby" triggers as 2.12-C: HTTP/1.0,
+ `Content-Length` / `Transfer-Encoding`, `Upgrade`, `HTTP2-Settings`,
+ `Connection: upgrade`, malformed framing, header section >64 KiB,
+ unknown method, path miss. Ruby resumes ownership via the same
+ `dispatch_handed_off` path the 2.12-C loop already uses.
+
+ **TCP_NODELAY.** Applied per-accept via the shared
+ `pc_internal_apply_tcp_nodelay` wrapper — preserves the 2.10-G hunk.
+
+ **Tests.** `spec/hyperion/io_uring_loop_spec.rb` covers the Ruby
+ surface (always exposed), the stub `:unavailable` return on
+ non-Linux / no-liburing builds, eligibility-gate semantics
+ (`HYPERION_IO_URING_ACCEPT` env var honoured, compile-time flag
+ honoured), and — gated on `HAVE_LIBURING` — a smoke test (5 GETs,
+ assert served count grows), lifecycle-hook firing parity with the
+ 2.12-C contract, and Server-level engagement (the
+ `:c_accept_loop_io_uring_h1` dispatch mode is recorded).
+
+ Total: 1065 specs / 0 failures / 15 pending on macOS — +10 specs and
+ +4 pending over the 2.12-C macOS baseline (1055 / 0 / 11). Linux:
+ 1065 / 1 / 14 (the one failure is a pre-existing flake on the 2.12-C
+ smoke spec — reproduces on unmodified master at e526ef3 on the bench
+ host, NOT a 2.12-D regression).
+
+ **Bench (3 trials, median, openclaw-vm 16 vCPU Ubuntu 24.04 Linux 6.8.0,
+ liburing 2.5, wrk -t4 -c100 -d20s, `bench/hello_static.ru`):**
+
+ | Variant | r/s (median) | p99 |
+ |---|---:|---:|
+ | 2.12-C `handle_static` Hyperion (C accept loop, accept4) | 15,532 | 101 µs |
+ | **2.12-D `handle_static` Hyperion (C accept loop, io_uring)** | **134,084** | **0.99 ms** |
+ | Agoo 2.15.14 (reference, prior bench) | 19,024 | 10.47 ms |
+
+ `handle_static` hello: **8.6× over the 2.12-C accept4 baseline**
+ (15,532 → 134,084 r/s) and **7.0× over Agoo** (134,084 vs. 19,024).
+ The accept4 path is unchanged and within bench noise of the prior
+ 2.12-C 15,685 r/s baseline (15,532 r/s today is -1%, well inside the
+ ±5% target).
+
+ The p99 climb (101 µs accept4 → 0.99 ms io_uring) is the cost of
+ batching: with one `io_uring_enter` reaping K completions, the
+ last-enqueued request waits for the kernel to produce all K CQEs
+ before our worker drains. At 134k r/s the per-request median is
+ ~7.4 µs; the p99 of ~990 µs is roughly the steady-state batch-tail
+ (~130 requests' worth of completions). For the smoke-class workload
+ this is a clear win — sub-millisecond p99 is below most real-world
+ network jitter floors and well below Agoo's 10.47 ms p99.
+
+ The Rack-style fallback (`bench/hello.ru`, no `handle_static`) is
+ unaffected: io_uring engagement requires `Server.handle_static`
+ registration; the regular dynamic-Rack path is unchanged.
+
+ **Production rollout.** Default OFF for 2.12.0. Operators opt in
+ with `HYPERION_IO_URING_ACCEPT=1` per worker. Soak window for
+ flipping the default to ON is 2.13 — kernel io_uring has known
+ sharp edges around fork-shared rings and `seccomp` policies that
+ this default-off posture lets us discover under operator A/B
+ before forcing the path on every install.
+
+ ### 2.12-C — Connection lifecycle in C
+
+ The 2.12-B re-bench made it clear that Hyperion's `handle_static`
+ hot path was already doing the **response side** in C
+ (`PageCache.serve_request`, 2.10-F), but the **connection
+ lifecycle around it** — accept loop, route lookup, per-`Connection`
+ ivar init — was still in Ruby. The 2.10-D / 2.10-F bench analysis
+ flagged that as the next dominant cost: `accept4` + `clone3`
+ (worker thread wakeup) + `futex` (GVL handoff) + Ruby ivar init
+ on every connection.
+
+ This stream pushes the entire accept→read→route-lookup→write
+ loop into C for `handle_static`-routed paths.
+
+ **New C exports** on `Hyperion::Http::PageCache`:
+
+ - `run_static_accept_loop(listen_fd) -> Integer | :crashed`
+ drives the accept-and-serve loop. Releases the GVL during
+ `accept(2)`, `recv(2)`, and `write(2)`; re-acquires only for
+ the registered lifecycle / handoff Ruby callbacks. Returns
+ the count of requests served when the listener closes
+ cleanly, or `:crashed` if an unrecoverable accept error
+ happened (Ruby falls back to its own accept loop on this
+ sentinel).
+ - `set_lifecycle_callback(callable)` and
+ `set_lifecycle_active(bool)` — register / gate the per-request
+ lifecycle hook. Decoupled so flipping the gate at runtime
+ doesn't re-allocate the callback. The hot path checks a
+ single `int` flag; the no-hook path stays one syscall (recv
+ or write) per request.
+ - `set_handoff_callback(callable)` — register the callback the
+ C loop invokes when a connection's first request can't be
+ served from the static cache (path miss, malformed request,
+ body present, h2 upgrade requested, HTTP/1.0). Receives
+ `(fd_int, partial_buffer_or_nil)`; Ruby owns the fd from
+ that point on and resumes ownership via the regular
+ `Connection` path.
+ - `stop_accept_loop` and `lifecycle_active?` — operator /
+ spec helpers.
+ - `handoff_to_ruby(client_fd, partial_buffer, partial_len)` —
+ echo helper exposing the 2.12-C contract for spec-time
+ introspection (the actual handoff happens inside
+ `run_static_accept_loop` via the registered callback).
+
+ **Wiring.** `Hyperion::Server::ConnectionLoop` (new module) holds
+ the engagement check + callback factories. `Server#start_raw_loop`
+ engages the C loop when:
+
+ 1. The listener is plain TCP (no TLS, no h2 ALPN dance).
+ 2. The route table has at least one
+ `RouteTable::StaticEntry` registration.
+ 3. The route table has NO non-StaticEntry registrations
+ (any dynamic handler disables the C path; the C loop
+ only knows how to write prebuilt responses).
+ 4. The C ext is available (the `Hyperion::Http::PageCache`
+ module responds to `:run_static_accept_loop`) and the
+ `HYPERION_C_ACCEPT_LOOP=0` escape hatch is not set.
+
+ A new `:c_accept_loop_h1` `DispatchMode` symbol ships under
+ `requests_dispatch_c_accept_loop_h1` (one bump per worker boot
+ that engages the path) plus `c_accept_loop_requests` (count of
+ requests the C loop served on this worker, bumped at loop exit).
+
+ **Lifecycle hooks.** The 2.10-D contract holds: per-request
+ `Runtime#fire_request_start` / `#fire_request_end` fire on every
+ request the C loop served, with `env=nil` and `path` set to the
+ matched path string — same surface as the Ruby direct-dispatch
+ path. The Ruby-side bridge in `ConnectionLoop.build_lifecycle_callback`
+ builds a minimal `Hyperion::Request` for the hooks to consume.
+ The C-side `lifecycle_active?` flag gates the `rb_funcall`;
+ when no hooks are registered, the no-hook hot path is one
+ syscall (recv) + one syscall (write) per request, with **zero
+ Ruby method invocations**.
+
+ **Handoff.** When the C loop sees a request it can't serve from
+ the static cache (POST, path miss, malformed framing,
+ `Content-Length`, `Transfer-Encoding`, `Upgrade`,
+ `HTTP2-Settings`, `Connection: upgrade`, HTTP/1.0), it invokes
+ the handoff callback with `(fd_int, partial_buffer_str)` and
+ continues to the next accept. Ruby resumes ownership of the fd
+ via `dispatch_handed_off`, which wraps the fd in `Socket.for_fd`,
+ applies the read timeout (matches the regular Ruby accept
+ path), pre-loads the partial buffer onto the Connection's
+ `@inbuf` ivar so the parser sees the bytes the C loop already
+ consumed, and dispatches through the existing thread-pool /
+ inline path.
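+
+ The receiving side, in rough Ruby terms (a sketch of the contract
+ described above; helper names other than `Socket.for_fd` are
+ paraphrases, not the verbatim implementation):
+
+ ```ruby
+ require 'socket'
+
+ # Invoked with the raw fd and whatever bytes the C loop already read.
+ def dispatch_handed_off(fd_int, partial_buffer)
+   io = Socket.for_fd(fd_int)        # Ruby takes ownership of the fd
+   conn = Connection.new(io, read_timeout: @read_timeout)
+   conn.preload_inbuf(partial_buffer) if partial_buffer  # bytes C consumed
+   @thread_pool.dispatch(conn)       # existing thread-pool / inline path
+ end
+ ```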
+
+ **Tests.** `spec/hyperion/connection_loop_spec.rb` covers
+ smoke (registered route → C loop served-count grows), mixed
+ registered + unregistered (C loop hands off to the Ruby callback
+ spy with the partial buffer + fd), body-present requests
+ (`POST /hello` hands off), lifecycle hooks
+ (`set_lifecycle_active(true)` fires the registered callback
+ once per served request; `false` is silent — no callback
+ invocation, even with one set), GVL release (a slow C-loop
+ parked on `accept(2)` doesn't block another Ruby thread doing
+ arithmetic), keep-alive on a single connection (two pipelined
+ requests on the same socket → two responses), Server-level
+ engagement (only-static-routes engages the C loop;
+ mixed-with-dynamic-handlers refuses to engage and falls back
+ to the regular Ruby loop).
+
+ Total: 1112 specs / 0 failures / 11 pending — +10 specs over
+ the 2.12-B baseline (1102 / 0 / 11).
+
+ **Bench (3 trials, median, openclaw-vm 16 vCPU Ubuntu 24.04 Linux 6.8.0,
+ wrk -t4 -c100 -d20s, `bench/hello_static.ru`):**
+
+ | Variant | r/s (median) | p99 |
+ |---|---:|---:|
+ | 2.12-B `handle_static` Hyperion (Ruby accept loop + C `serve_request`) | 5,502 | 1.59 ms |
+ | **2.12-C `handle_static` Hyperion (C accept loop)** | **15,685** | **107 µs** |
+ | Agoo 2.15.14 (reference) | 19,024 | 10.47 ms |
+
+ `handle_static` hello: **2.85× over the 2.12-B baseline** (5,502 → 15,685).
+ Gap to Agoo's 19,024 r/s closed from **3.46×** to **1.21×** — the
+ 2.12-C path is now within striking distance of Agoo on this row.
+ Tail latency (p99) is now **107 µs**, a **15× improvement** over
+ the 2.12-B 1.59 ms p99 and **97× better** than Agoo's 10.47 ms p99.
+
+ The Rack-style fallback (`bench/hello.ru`, no `handle_static`) was
+ re-checked: 4,648 r/s median (vs 4,477 r/s 2.12-B baseline — within
+ bench noise, no regression). The C-loop path is only engaged when
+ the operator opts in via `Server.handle_static` registration on
+ every route; the regular dynamic-Rack path is unchanged.
+
+ ### 2.12-B — Fresh 4-way re-bench (post-2.10/2.11 wins)
+
+ The 4-way head-to-head in `docs/BENCH_HYPERION_2_0.md`
+ § "4-way head-to-head (2.10-B baseline, 2026-05-01)" was
+ captured before the 2.10-C / 2.10-D / 2.10-E / 2.10-F /
+ 2.10-G / 2.11-A / 2.11-B wins landed. Every Hyperion column
+ in that doc was therefore stale; the Puma / Falcon / Agoo
+ columns were unchanged (no version bumps on the bench host).
+ This stream re-runs the entire 6-row matrix on the 2.11.0
+ codebase — same host (`openclaw-vm`, 16 vCPU, Ubuntu 24.04,
+ Linux 6.8.0), same tools (`wrk` + `perfer`), 3 trials per
+ row, median reported — and writes the results to a new
+ `docs/BENCH_HYPERION_2_11.md` (pointed at from the README's
+ Benchmarks section; the 2.0 doc is now marked "historical
+ baseline (pre-2.10/2.11 wins)").
+
+ The harness at `bench/4way_compare.sh` gains a fifth server
+ label, `hyperion_handle_static`, which boots Hyperion against
+ an alternate rackup whose hot path is the
+ `Hyperion::Server.handle_static` direct route + 2.10-F C-ext
+ fast-path response writer + 2.10-C PageCache fold. This
+ exposes two Hyperion columns on the rows where it applies
+ (hello + small static): "Rack-style" (the legacy generic-Rack
+ path most apps run unchanged) and "`handle_static`" (the peak
+ path operators can opt into by registering one pre-built
+ route at boot). New rackup `bench/static_handle_static.ru`
+ preloads the 1 KB static asset bytes at boot and serves them
+ through the same direct-route fast path; the existing
+ `bench/hello_static.ru` (added in 2.10-F) plays the same role
+ for the hello row.
+
+ **Headline shifts (medians, vs the 2.10-B baseline at the
+ same workload):**
+
+ | # | Workload | 2.10-B Hyperion | 2.11.0 Rack-style | 2.11.0 `handle_static` | Gap-vs-leader shift |
+ |---:|---|---:|---:|---:|---|
+ | 1 | hello | 4,587 | 4,477 (-2.4%, noise) | **5,502** (+19.9%) | gap to Agoo: 4.22× → **3.46×** with `handle_static` |
+ | 2 | static 1 KB | 1,380 | 1,687 (**+22.2%**, PageCache 2.10-C auto-engage) | **5,935** (+330%) | gap to Agoo: 1.89× behind → **flipped, Hyperion +127%** |
+ | 3 | static 1 MiB | 1,378 | 1,513 (+9.8%) | n/a (handle_static buffers in memory; defeats sendfile) | Hyperion lead vs Agoo: 9.07× → 9.74× (held) |
+ | 4 | CPU JSON 50-key | 3,450 | 3,659 (+6.0%) | n/a (per-request `JSON.generate`) | gap to Agoo: 1.85× → **2.05× (widened)**; Agoo +17.5% in same window |
+ | 5 | PG-bound async (`pg_sleep 50ms`) | 1,564 | 1,565 (+0.1%) | n/a | Hyperion-only; identical |
+ | 6 | SSE 1000 events × 50 B | 500 | 472 (-5.6%, noise) | n/a (single fixed response) | Hyperion lead vs Puma: 3.65× → 3.58× (flat) |
+
+ **Reading the deltas.**
+
+ - **The 2.10-D + 2.10-F + 2.10-C win-stack lands cleanly on
+ static 1 KB.** Hyperion's `handle_static` row at 5,935 r/s
+ wins the column outright by +127% (2.27×) over Agoo, +208%
+ over Falcon, +282% over Puma. The Rack-style row also moved
+ up +22.2% from the PageCache auto-engage even without
+ explicit `handle_static` registration. Operators who can
+ register one pre-built route via `handle_static` pick up a
+ 3.5× lift over the generic Rack-style path on this exact
+ shape.
+ - **Hello narrowed but did not close.** The gap to Agoo went
+ from 4.22× to 3.46× with `handle_static`; the Rack-style
+ row stayed flat (within bench noise) — meaning the 2.10-D
+ direct-route win **only manifests when the rackup actually
+ opts in via `handle_static`**.
+ - **CPU JSON widened.** Agoo gained +17.5% in the same window
+ in which Hyperion gained +6.0%, taking the gap from 1.85× to
+ 2.05×. This is the one row the 2.10/2.11 streams didn't move;
+ closing it is the obvious 2.12 follow-on.
+ - **Large static, PG-bound async, SSE — Hyperion's existing
+ wins held.** Sendfile (9.7× over Agoo on 1 MiB), fiber-
+ cooperative I/O (Hyperion-only on PG-bound, identical to
+ 2.10-B), ChunkedCoalescer (3.58× over Puma, 16.7× over
+ Falcon on SSE) all stayed clean. Agoo's SSE behavior
+ regressed to a hard segfault at boot on `bench/sse_generic.ru`
+ (different shape from 2.10-B's "buffers entire response,
+ takes ~5 s to flush"; same conclusion either way — Agoo is
+ not a viable SSE server on this rackup at any throughput).
+ - **Tail latency is still Hyperion's clean win** across every
+ row with a non-trivial p99: hello 1.73 ms vs Agoo 10.47 ms
+ / Puma 29 ms / Falcon 408 ms; 1 KB 1.69 ms vs 57–86 ms;
+ 1 MiB 4.63 ms vs 82–720 ms; CPU JSON 2.60 ms vs 17–411 ms;
+ SSE 2.85 ms vs 11–42 ms.
+
+ **Harness changes** (`bench/4way_compare.sh`):
+
+ - New server label `hyperion_handle_static` — boots Hyperion
+ against an alternate rackup specified by the new
+ `HYPERION_STATIC_RACKUP` env var (falls back to the same
+ `RACKUP` as the legacy `hyperion` label if unset, which
+ becomes a harmless no-op duplicate). The four legacy
+ labels (`hyperion`, `puma`, `falcon`, `agoo`) keep their
+ byte-identical 2.10-B shape so the older doc stays
+ reproducible.
+
+ **New rackup**: `bench/static_handle_static.ru` — preloads
+ the 1 KB static asset (`/tmp/hyperion_bench_1k.bin`) at boot,
+ registers it via `Hyperion::Server.handle_static`, and falls
+ through to a 404 lambda for any other path. Mirrors the role
+ `bench/hello_static.ru` plays for the hello row.
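+
+ The shape of that rackup, sketched (route path, option names, and the
+ 404 body are illustrative; `handle_static`'s exact signature may
+ differ):
+
+ ```ruby
+ # bench/static_handle_static.ru — sketch
+ bytes = File.binread('/tmp/hyperion_bench_1k.bin') # raises if missing: fail fast
+
+ Hyperion::Server.handle_static('/asset.bin', body: bytes,
+                                content_type: 'application/octet-stream')
+
+ run ->(env) { [404, { 'content-type' => 'text/plain' }, ['not found']] }
+ ```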
+
+ **Spec coverage** (`spec/hyperion/bench_handle_static_rackups_spec.rb`,
+ 9 examples):
+
+ - `bench/hello_static.ru` parses cleanly and registers a `/`
+ StaticEntry with the expected response bytes.
+ - `bench/static_handle_static.ru` preloads the asset and
+ registers a route at boot; fails fast if the asset is
+ missing rather than booting empty (prevents a silently-
+ bound 404 from masking a misconfigured bench run).
+ - `bench/4way_compare.sh` knows how to boot all five variant
+ labels (hyperion, hyperion_handle_static, puma, falcon,
+ agoo) — a typo in the harness's case statement or the
+ new env var name fails this spec at unit-level instead
+ of waiting for a 41-minute bench sweep to surface the
+ regression.
+
+ Spec count: 1093 → 1102 (+9). 0 failures, 11 pending —
+ invariant preserved.
+
+ **Reproducing.** See
+ `docs/BENCH_HYPERION_2_11.md` § "Reproducing 4-way" for the
+ six per-row commands. Bench host: `openclaw-vm`, sweep dir
+ `/home/ubuntu/bench-2.12-B/`. Total wall time ~41 min.
+
  ## 2.11.0 — 2026-05-01
 
  ### 2.11-B — HPACK FFI marshalling round-2 (cglue confirmed as firm default; +43% v3 vs v2 on Rails-shape h2)