hyperion-rb 2.11.0 → 2.12.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +566 -0
- data/README.md +102 -5
- data/ext/hyperion_http/extconf.rb +41 -0
- data/ext/hyperion_http/io_uring_loop.c +710 -0
- data/ext/hyperion_http/page_cache.c +1032 -0
- data/ext/hyperion_http/page_cache_internal.h +132 -0
- data/lib/hyperion/connection.rb +14 -0
- data/lib/hyperion/dispatch_mode.rb +19 -1
- data/lib/hyperion/http2_handler.rb +123 -5
- data/lib/hyperion/metrics.rb +38 -0
- data/lib/hyperion/prometheus_exporter.rb +76 -1
- data/lib/hyperion/server/connection_loop.rb +159 -0
- data/lib/hyperion/server.rb +183 -0
- data/lib/hyperion/thread_pool.rb +23 -7
- data/lib/hyperion/version.rb +1 -1
- metadata +4 -1
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ae613d6efdf92072a2076a95a3656687e6bc0b5a9302700c56d75bcc0a900b56
+  data.tar.gz: f4f3f26cff7f4ebd08d35757726219ee53e65418bacdd0061d05626d7b354dba
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5e0cbc3aa81bbe246dbf670642cae5fd2e1f0275f36030729bb68fe23b00dd54bbc51fb5f2e62c0a306fd825f73ab35e86cc49dd3bb6a27f38269885067de62c
+  data.tar.gz: f749e0b6e05c52695bc4698d847b92e88c10a17dc21a2d6d93deb9c9a013c5c712f778dfa13aa6dab6f5dd0ebf398c69a5eb56255031aaf9539ca561a38bca6d
```

data/CHANGELOG.md
CHANGED

# Changelog

## 2.12.0 — 2026-05-01

### 2.12-F — gRPC support on h2

**Background.** Hyperion has shipped a working HTTP/2 stack since 1.6
(per-stream fiber multiplexing, WINDOW_UPDATE-aware flow control, ALPN
auto-negotiation, native HPACK via the Rust v3/CGlue codec since 2.5-B).
What was missing for gRPC over Hyperion was three small things:

1. **Trailing headers.** gRPC carries its protocol-level status
   (`grpc-status: 0` / `grpc-message: OK`) as **trailers** — a final
   HEADERS frame sent AFTER the body's DATA frames, with END_STREAM=1.
   Hyperion's pre-2.12-F dispatch path always folded END_STREAM onto
   the LAST DATA frame, so there was no hook for emitting trailers.
2. **Binary-clean request body.** The h2 RequestStream initialised
   `@request_body = +''` (UTF-8). Valid gRPC request bodies
   (`[1-byte compressed flag][4-byte length-prefix][protobuf bytes]`)
   contain non-UTF-8 byte sequences, which broke `.bytesize` /
   `valid_encoding?` checks and corrupted bodies that downstream code
   interpolated into a UTF-8 String.
3. **TE: trailers preservation.** Hyperion's RFC 7540 §8.1.2.2
   validator already accepted `te: trailers` (any other TE value is
   rejected as a protocol error per the spec). What was needed was a
   non-regression spec confirming the header makes it to
   `env['HTTP_TE']` for the Rack app.

**What's new.** The h2 dispatch path (`Http2Handler#dispatch_stream`)
now checks whether the Rack response body responds to `:trailers` AFTER
iterating the body. When it does, the wire shape becomes:

```
HEADERS (no END_STREAM)
DATA    (no END_STREAM on last DATA)
HEADERS (END_STREAM=1) ← trailers, e.g. grpc-status: 0
```

The trailer Hash is filtered defensively before encoding: pseudo-headers
(`:status` / `:method` / etc.) and connection-specific names
(`connection`, `transfer-encoding`, …) are stripped — the same
allow-list the regular response-header path uses. A misbehaving app that
returns non-Hash trailers (or whose `body.trailers` raises) gets a
warn-level log and falls back to the no-trailers path; the connection
never crashes.
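
The changelog names the helper but not its body; a minimal sketch of the defensive collection step described above, where the method shape, the logger wiring, and the exact allow-list are all illustrative rather than Hyperion's real code:

```ruby
# Hypothetical sketch of the defensive trailer collection described above.
# The allow-list contents and the return-nil-on-misbehaviour convention are
# assumptions for illustration.
FORBIDDEN_TRAILER_NAMES = %w[connection transfer-encoding keep-alive
                             proxy-connection upgrade te].freeze

def collect_response_trailers(body)
  return nil unless body.respond_to?(:trailers)

  trailers = body.trailers
  return nil unless trailers.is_a?(Hash) && !trailers.empty?

  trailers.each_with_object({}) do |(name, value), out|
    key = name.to_s.downcase
    next if key.start_with?(':')                   # strip pseudo-headers
    next if FORBIDDEN_TRAILER_NAMES.include?(key)  # strip connection-specific
    out[key] = value.to_s
  end
rescue StandardError => e
  warn "body.trailers raised #{e.class}; taking the no-trailers path"
  nil
end
```

A `nil` return here selects the legacy HEADERS → DATA-with-END_STREAM shape; anything else becomes the trailing HEADERS frame.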

`Http2Handler::RequestStream`'s `@request_body` ivar is now initialised
as an `Encoding::ASCII_8BIT` String, so binary bytes survive verbatim
through `<<` accumulation and `env['rack.input'].read`.
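
The encoding fix can be demonstrated with nothing but the stdlib: accumulating gRPC-style frame bytes into a binary buffer keeps every byte intact, which is exactly what the `rack.input` round-trip needs.

```ruby
# Demonstrates why the request-body buffer must be binary (ASCII-8BIT):
# a gRPC message frame is [1-byte compressed flag][4-byte big-endian
# length][payload], and the payload is arbitrary bytes.
payload = "\xFF\x00\x01\x02\xC3\x28\x80".b       # bytes from the spec suite
frame   = [0, payload.bytesize].pack("CN") + payload

buf = String.new(encoding: Encoding::ASCII_8BIT)  # the 2.12-F initialisation
buf << frame                                      # survives << verbatim

_flag, len = buf.unpack("CN")
raise "length mismatch" unless len == payload.bytesize
```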

**What's NOT changed.** The existing wire shape for non-gRPC h2 traffic
is identical to 2.11.x. A Rack body that doesn't define `:trailers` (or
where `body.trailers` is `nil` / empty) takes the pre-2.12-F path:
HEADERS → DATA-with-END_STREAM-on-last. Verified by three
non-regression specs in `spec/hyperion/grpc_trailers_spec.rb`.

The HTTP/1.1 dispatch path is untouched. gRPC is HTTP/2-only by spec
(the `Trailer:` mechanism on h1 has interop gaps with Go / Java
clients; gRPC has mandated h2 since v1.0). Hyperion follows suit.

**What's NOT in scope for this stream.** gRPC server streaming, client
streaming, and bidi streaming require Rack 3 streaming bodies AND a way
for the Rack app to read incoming DATA frames after sending response
HEADERS. That's a separate stream-control problem (`rack.hijack` on h2
needs RFC 8441 Extended CONNECT — the same blocker as
WebSocket-over-h2). 2.12-F lands the **unary** (request-response) and
**server-side trailers** half, which covers ~80% of production gRPC
traffic shapes by message volume per the public Google / Netflix talks.
Streaming is a 2.13 candidate.

**Opt-in via Rack 3 `body.trailers`.** Apps don't need to know about
Hyperion. Any Rack 3 body — hand-rolled, `grpc-server-rack`, a future
`Rack::Trailers` middleware — that exposes `body.trailers` returning a
Hash gets the trailing HEADERS frame on the wire. Bodies that don't
expose it (the overwhelming majority of HTTP/2 traffic — Rails,
Sinatra, asset servers, JSON APIs) take the legacy path with no
behaviour change.
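
A minimal Rack 3 body that opts into the trailing HEADERS frame might look like the following. The class name and status values are illustrative; only the `each` / `trailers` surface matters to the server.

```ruby
# Illustrative Rack 3 response body exposing #trailers. Any body with this
# surface gets the trailing HEADERS frame described above; the class name
# is hypothetical, not part of Hyperion's API.
class GrpcUnaryBody
  def initialize(message_bytes)
    # gRPC length-prefixed frame: flag byte + big-endian length + payload.
    @frame = [0, message_bytes.bytesize].pack("CN") + message_bytes
  end

  # Rack body protocol: yield the (single) DATA payload.
  def each
    yield @frame
  end

  # Probed by the server AFTER #each completes; becomes the trailer frame.
  def trailers
    { "grpc-status" => "0", "grpc-message" => "OK" }
  end
end

# A unary handler returns it like any other Rack triple:
app = lambda { |env|
  [200, { "content-type" => "application/grpc" },
   GrpcUnaryBody.new("\x0a\x02hi".b)]
}
```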

**Spec coverage.** `spec/hyperion/grpc_trailers_spec.rb` — 11 specs
covering:

- End-to-end TLS+h2 client driving a real Hyperion server through
  `Protocol::HTTP2::Client`, asserting the trailing HEADERS frame
  decodes to the expected `grpc-status` / `grpc-message` map.
- `te: trailers` request header reaches `env['HTTP_TE']`.
- Binary request body bytes (`\xFF\x00\x01\x02\xC3\x28\x80`)
  round-trip verbatim through `env['rack.input'].read`.
- Non-regression: bodies without `:trailers` keep the
  DATA-with-END_STREAM-on-last shape (no extra HEADERS frame).
- Defensive: `body.trailers` returning `nil` and returning `{}` both
  take the no-trailers path.
- Unit tests for `collect_response_trailers` covering nil / raises /
  Hash-coercible / non-responding bodies.

`spec/hyperion/grpc_smoke_spec.rb` — an opt-in (`RUN_GRPC_SMOKE=1`)
end-to-end smoke test against the real `grpc` Ruby gem. Skipped by
default because the gem pulls in protobuf C bindings (~50 MB build);
the unit specs above are the durable coverage.

**Performance.** Zero hot-path cost when no app uses trailers — the
`body.respond_to?(:trailers)` probe is one method dispatch, and the
trailers-Hash branch only runs when the probe returns truthy. The
trailers-emitting path costs one extra encoder-mutex acquisition plus
one extra HEADERS frame per response — negligible against the existing
HEADERS+DATA sequence. No bench numbers here because gRPC throughput on
Hyperion is dominated by the application's protobuf marshalling, not
the framing layer; a meaningful gRPC bench is a 2.13 follow-up once we
have a streaming story to compare against Falcon's `async-grpc`.

### 2.12-E — SO_REUSEPORT load balancing audit

**Background.** Hyperion's cluster mode (`-w N`) uses two listener-FD
models depending on platform:

- **Linux:** `SO_REUSEPORT`. Every worker calls `bind()` on the same
  port; the kernel load-balances incoming connections across the bound
  workers based on a 4-tuple hash. Documented as the
  production-recommended setup since rc15.
- **Darwin / BSD:** master-bind + worker-fd-share. The master process
  binds the listener and passes the fd to workers via `fork()`;
  workers race to `accept()`. Darwin's `SO_REUSEPORT` doesn't
  load-balance — it just lets multiple sockets bind without
  `EADDRINUSE` — so we deliberately do NOT use it on Darwin.

The Linux path was built on the assumption that the kernel hash
distributes uniformly across workers under sustained load. We had no
**measurement** of that — only theory. 2.12-E closes the gap.

**New per-worker request counter.** `Hyperion::Metrics#tick_worker_request`
ticks the labeled counter family
`hyperion_requests_dispatch_total{worker_id="<pid>"}` once per
dispatched request, regardless of which dispatch shape served it:

- Rack-via-`Connection#serve` (the threadpool / inline / async-io
  paths).
- HTTP/2 stream dispatch in `Http2Handler`.
- The 2.12-C `accept4` C accept loop.
- The 2.12-D io_uring accept loop.

The C-loop variants tick a process-global atomic counter
(`Hyperion::Http::PageCache.c_loop_requests_total`) that the
`PrometheusExporter.render_full` pass folds into the per-worker series
at scrape time — so operators see one consistent per-PID count even on
the C-only fast path that bypasses `Connection#serve`. Hot-path cost on
the C loops is one `__atomic_add_fetch` (relaxed) — negligible against
the 134k r/s 2.12-D bench peak.

The label value is `Process.pid.to_s` — matching the 2.4-C
`hyperion_io_uring_workers_active` and
`hyperion_per_conn_rejections_total` labeling convention; it lets
operators correlate cluster-mode rows with `ps`/`/proc` data without
a separate worker_id ↔ pid mapping table.
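
In Prometheus text exposition format the family renders as one line per PID. A toy renderer for a `{pid => count}` map makes the shape concrete; the helper name is illustrative and omits the C-loop atomic-counter fold that the real `PrometheusExporter.render_full` performs:

```ruby
# Illustrative renderer for a labeled counter family in Prometheus text
# exposition format. Hyperion's real exporter also folds in the C-loop
# atomic counter, which this sketch omits.
def render_worker_requests(counts_by_pid)
  lines = ["# TYPE hyperion_requests_dispatch_total counter"]
  counts_by_pid.each do |pid, count|
    lines << %(hyperion_requests_dispatch_total{worker_id="#{pid}"} #{count})
  end
  lines.join("\n")
end

puts render_worker_requests(12_345 => 1042, 12_346 => 998)
```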

**New bench harness.** `bench/cluster_distribution.sh` boots Hyperion
with `-w 4 -t 1 -p 9292 bench/hello_static.ru` (4 workers, 1 thread
each — a sharp imbalance signal), runs `wrk -t8 -c200 -d30s` against
`/`, then scrapes `/-/metrics` repeatedly until all 4 workers have
responded. It reports the per-worker request distribution
(pid + count + share-%), mean / stddev / max-vs-min ratio, and a
verdict:

- `balanced` (max/min ≤ 1.10): the kernel hash is doing its job.
- `mild` (1.10 < max/min ≤ 1.50): noted, no fix required.
- `severe` (max/min > 1.50): file a follow-up; do not ship as-is.

The harness runs three times back-to-back so noise is visible; the
aggregate verdict is the worst per-run verdict (one severe run is
enough to fail the audit).
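
The thresholds above reduce to a few lines of classification. A sketch of that step, assuming per-worker counts have already been scraped (function names are illustrative, not the harness's actual shell code):

```ruby
# Illustrative verdict classifier matching the thresholds above.
def distribution_verdict(counts)
  ratio = counts.max.to_f / counts.min
  if    ratio <= 1.10 then :balanced
  elsif ratio <= 1.50 then :mild
  else                     :severe
  end
end

# Aggregate over the three runs: the worst single-run verdict wins.
def aggregate_verdict(runs)
  order = [:balanced, :mild, :severe]
  runs.map { |counts| distribution_verdict(counts) }
      .max_by { |verdict| order.index(verdict) }
end
```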

**Bench result.** See the "Cluster distribution audit" section of
`docs/BENCH_HYPERION_2_11.md`. The headline number lands in the
`[bench][2.12.0] 2.12-E — bench result: …` follow-up commit.

**Darwin caveat.** The Darwin master-bind/worker-fd-share path is
documented as known-imbalanced and is NOT covered by this audit (there
is no kernel SO_REUSEPORT distributor to measure). The metric
infrastructure is platform-independent; any operator running on Darwin
can read their own per-worker imbalance via `/-/metrics` and the same
`hyperion_requests_dispatch_total{worker_id}` series.

**Spec coverage.** `spec/hyperion/metrics_per_worker_request_count_spec.rb`
asserts:

- `Metrics#tick_worker_request` registers the labeled family with the
  `worker_id` label and increments the right series.
- `Connection#serve` ticks the counter once per request from any
  dispatch shape (regular Rack, direct dispatch, StaticEntry fast
  path).
- `PrometheusExporter.render_full` emits the merged
  `hyperion_requests_dispatch_total{worker_id="<pid>"}` line.
- The C accept4 loop's atomic counter ticks per request and is reset
  on `run_static_accept_loop` entry. (Gated on the C ext being
  available; the same gating pattern the 2.12-D specs use.)

### 2.12-D — io_uring accept loop (Linux 5.x)

The 2.12-C `accept4` loop closed most of the gap to Agoo on the
`handle_static` hello row (5,502 → 15,685 r/s, to within 1.21× of
parity). Looking at the remaining cost on Linux: the worker thread does
**three** syscalls per request — `accept4` + `recv` + `write`. On a
host with ~150 ns of syscall entry/exit overhead each, that's ~450 ns
of pure kernel-mode bookkeeping per request, before any I/O actually
happens.

io_uring lets us collapse the train. We submit ACCEPT/RECV/WRITE/CLOSE
SQEs to a per-worker ring; the worker thread does **one**
`io_uring_enter` per cycle and reaps N completions in a single syscall
round-trip. Steady-state cost goes from N×3 syscalls to ~N÷K × 1
syscall (where K is the burst-batch size the kernel naturally
delivers).

**New C exports** on `Hyperion::Http::PageCache`:

- `run_static_io_uring_loop(listen_fd) -> Integer | :crashed | :unavailable`
  drives the io_uring accept-and-serve loop. Same wire contract as
  `run_static_accept_loop`: only `handle_static` routes, plain TCP,
  GET/HEAD without a body, HTTP/1.1 only. Returns `:unavailable` if
  the C ext was built without `liburing` OR the runtime
  `io_uring_queue_init` probe failed (seccomp / locked-down container
  / kernel < 5.5). The Ruby caller treats `:unavailable` as "fall
  through to the 2.12-C `accept4` path" — the operator gets a one-line
  warn-level log, no surprise downtime.
- `io_uring_loop_compiled?` — boolean, true when the C ext was built
  with `HAVE_LIBURING`. A cheap eligibility check that lets
  `ConnectionLoop#io_uring_eligible?` skip the env-var read on builds
  where the path can't engage anyway.

**Build.** `extconf.rb` probes for `liburing` in two passes:

1. `pkg-config --exists liburing` — picks up Debian/Ubuntu's
   pkg-config metadata and adds the right `-L`/`-l` flags.
2. `have_header('liburing.h')` + `have_library('uring', ...)` —
   covers the no-pkg-config path.

On success, `-DHAVE_LIBURING` lands and the io_uring loop compiles. On
failure, the same `io_uring_loop.c` translation unit compiles to a
thin stub that returns `:unavailable`. This makes liburing a
soft-optional dependency: hosts without liburing-dev still build
cleanly and ship the 2.12-C behaviour.
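
A two-pass mkmf probe in that style can be sketched as follows. This is an illustration, not the shipped `ext/hyperion_http/extconf.rb`; in particular the probed symbol (`io_uring_queue_init`) is an assumption standing in for whatever the real probe checks.

```ruby
# Illustrative two-pass liburing probe in the style described above; the
# real ext/hyperion_http/extconf.rb may differ in detail. The probed
# symbol ('io_uring_queue_init') is an assumption.
require 'mkmf'

have_uring =
  pkg_config('liburing') ||                        # pass 1: pkg-config metadata
  (have_header('liburing.h') &&
   have_library('uring', 'io_uring_queue_init'))   # pass 2: no-pkg-config path

# On success the io_uring loop compiles for real; otherwise
# io_uring_loop.c builds as the thin :unavailable stub.
$defs.push('-DHAVE_LIBURING') if have_uring

create_makefile('hyperion_http/page_cache')
```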

**Wiring.** `Hyperion::Server::ConnectionLoop.io_uring_eligible?`
returns true when ALL of the following hold:

1. The C accept-loop path is available (`ConnectionLoop.available?`).
2. The C ext was compiled with `HAVE_LIBURING`.
3. `HYPERION_IO_URING_ACCEPT=1` is set (default OFF in 2.12 — the
   soak window before flipping the default to ON is 2.13 or later).

When eligible, `Server#run_c_accept_loop` calls
`run_static_io_uring_loop` first; on `:unavailable` (runtime probe
failure) it falls through to `run_static_accept_loop`. A new
`:c_accept_loop_io_uring_h1` `DispatchMode` symbol counts engagements
under `requests_dispatch_c_accept_loop_io_uring_h1`.
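
The eligibility gate plus the `:unavailable` fall-through compose into a small decision ladder. A stand-in sketch of that control flow, with the predicates stubbed out (method names and keyword surfaces here are illustrative, not Hyperion's real API):

```ruby
# Stand-in sketch of the io_uring -> accept4 fall-through described above.
# Only the control flow mirrors the changelog; all names are illustrative.
def io_uring_eligible?(c_loop_available:, compiled_with_liburing:, env:)
  c_loop_available &&
    compiled_with_liburing &&
    env['HYPERION_IO_URING_ACCEPT'] == '1'
end

def run_c_accept_loop(listen_fd, loops:, eligible:)
  if eligible
    result = loops.fetch(:io_uring).call(listen_fd)
    return result unless result == :unavailable    # runtime probe failed
    warn 'io_uring unavailable; falling back to the accept4 loop'
  end
  loops.fetch(:accept4).call(listen_fd)
end
```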

**Lifecycle hooks.** Same contract as 2.12-C: per-request
`Runtime#fire_request_start` / `#fire_request_end` fire on every
request the io_uring loop serves, with `env=nil` and a minimal
`Hyperion::Request` passed. The C-side `lifecycle_active?` flag gates
the GVL re-acquisition (`rb_thread_call_with_gvl`); the no-hook hot
path stays GVL-free for the whole `submit_and_wait` cycle.

**Handoff.** Same set of "send to Ruby" triggers as 2.12-C: HTTP/1.0,
`Content-Length` / `Transfer-Encoding`, `Upgrade`, `HTTP2-Settings`,
`Connection: upgrade`, malformed framing, header section > 64 KiB,
unknown method, path miss. Ruby resumes ownership via the same
`dispatch_handed_off` path the 2.12-C loop already uses.

**TCP_NODELAY.** Applied per-accept via the shared
`pc_internal_apply_tcp_nodelay` wrapper — preserving the 2.10-G hunk.

**Tests.** `spec/hyperion/io_uring_loop_spec.rb` covers the Ruby
surface (always exposed), the stub `:unavailable` return on non-Linux
/ no-liburing builds, eligibility-gate semantics (the
`HYPERION_IO_URING_ACCEPT` env var honoured, the compile-time flag
honoured), and — gated on `HAVE_LIBURING` — a smoke test (5 GETs,
assert the served count grows), lifecycle-hook firing parity with the
2.12-C contract, and Server-level engagement (the
`:c_accept_loop_io_uring_h1` dispatch mode is recorded).

Total: 1065 specs / 0 failures / 15 pending on macOS — +10 specs and
+4 pending over the 2.12-C macOS baseline (1055 / 0 / 11). Linux:
1065 / 1 / 14 (the one failure is a pre-existing flake on the 2.12-C
smoke spec — it reproduces on unmodified master at e526ef3 on the
bench host, NOT a 2.12-D regression).

**Bench (3 trials, median, openclaw-vm 16 vCPU Ubuntu 24.04 Linux 6.8.0,
liburing 2.5, wrk -t4 -c100 -d20s, `bench/hello_static.ru`):**

| Variant | r/s (median) | p99 |
|---|---:|---:|
| 2.12-C `handle_static` Hyperion (C accept loop, accept4) | 15,532 | 101 µs |
| **2.12-D `handle_static` Hyperion (C accept loop, io_uring)** | **134,084** | **0.99 ms** |
| Agoo 2.15.14 (reference, prior bench) | 19,024 | 10.47 ms |

`handle_static` hello: **8.6× over the 2.12-C accept4 baseline**
(15,532 → 134,084 r/s) and **7.0× over Agoo** (134,084 vs. 19,024).
The accept4 path is unchanged and within bench noise of the prior
2.12-C 15,685 r/s baseline (15,532 r/s today is -1%, well inside the
±5% target).

The p99 climb (101 µs accept4 → 0.99 ms io_uring) is the cost of
batching: with one `io_uring_enter` reaping K completions, the
last-enqueued request waits for the kernel to produce all K CQEs
before our worker drains. At 134k r/s the per-request median is
~7.4 µs; the p99 of ~990 µs is roughly the steady-state batch tail
(~130 requests' worth of completions). For the smoke-class workload
this is a clear win — sub-millisecond p99 is below most real-world
network jitter floors and well below Agoo's 10.47 ms p99.
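
The batch-tail estimate in the paragraph above is plain arithmetic, and a few lines reproduce it:

```ruby
# Reproduces the back-of-envelope numbers from the paragraph above.
throughput  = 134_084.0              # r/s, the 2.12-D median
median_us   = 1_000_000 / throughput # per-request time at that rate
p99_us      = 990.0                  # ~ the measured 0.99 ms p99
batch_depth = p99_us / median_us     # requests' worth of completions in the tail

puts format('median ~ %.1f us, batch tail ~ %d requests', median_us, batch_depth)
```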

The Rack-style fallback (`bench/hello.ru`, no `handle_static`) is
unaffected: io_uring engagement requires a `Server.handle_static`
registration; the regular dynamic-Rack path is unchanged.

**Production rollout.** Default OFF for 2.12.0. Operators opt in with
`HYPERION_IO_URING_ACCEPT=1` per worker. The soak window for flipping
the default to ON is 2.13 — kernel io_uring has known sharp edges
around fork-shared rings and `seccomp` policies that this default-off
posture lets us discover under operator A/B before forcing the path on
every install.

### 2.12-C — Connection lifecycle in C

The 2.12-B re-bench made it clear that Hyperion's `handle_static` hot
path was already doing the **response side** in C
(`PageCache.serve_request`, 2.10-F), but the **connection lifecycle
around it** — accept loop, route lookup, per-`Connection` ivar init —
was still in Ruby. The 2.10-D / 2.10-F bench analysis flagged that as
the next dominant cost: `accept4` + `clone3` (worker thread wakeup) +
`futex` (GVL handoff) + Ruby ivar init on every connection.

This stream pushes the entire accept→read→route-lookup→write loop into
C for `handle_static`-routed paths.

**New C exports** on `Hyperion::Http::PageCache`:

- `run_static_accept_loop(listen_fd) -> Integer | :crashed` drives the
  accept-and-serve loop. Releases the GVL during `accept(2)`,
  `recv(2)`, and `write(2)`; re-acquires it only for the registered
  lifecycle / handoff Ruby callbacks. Returns the count of requests
  served when the listener closes cleanly, or `:crashed` if an
  unrecoverable accept error happened (Ruby falls back to its own
  accept loop on this sentinel).
- `set_lifecycle_callback(callable)` and `set_lifecycle_active(bool)`
  — register / gate the per-request lifecycle hook. Decoupled so
  flipping the gate at runtime doesn't re-allocate the callback. The
  hot path checks a single `int` flag; the no-hook path stays at one
  syscall (recv or write) per request.
- `set_handoff_callback(callable)` — registers the callback the C
  loop invokes when a connection's first request can't be served from
  the static cache (path miss, malformed request, body present, h2
  upgrade requested, HTTP/1.0). Receives
  `(fd_int, partial_buffer_or_nil)`; Ruby owns the fd from that point
  on and resumes ownership via the regular `Connection` path.
- `stop_accept_loop` and `lifecycle_active?` — operator / spec
  helpers.
- `handoff_to_ruby(client_fd, partial_buffer, partial_len)` — an echo
  helper exposing the 2.12-C contract for spec-time introspection
  (the actual handoff happens inside `run_static_accept_loop` via the
  registered callback).

**Wiring.** `Hyperion::Server::ConnectionLoop` (new module) holds the
engagement check + callback factories. `Server#start_raw_loop` engages
the C loop when:

1. The listener is plain TCP (no TLS, no h2 ALPN dance).
2. The route table has at least one `RouteTable::StaticEntry`
   registration.
3. The route table has NO non-StaticEntry registrations (any dynamic
   handler disables the C path; the C loop only knows how to write
   prebuilt responses).
4. The C ext is available (the `Hyperion::Http::PageCache` module
   responds to `:run_static_accept_loop`) and the
   `HYPERION_C_ACCEPT_LOOP=0` escape hatch is not set.

A new `:c_accept_loop_h1` `DispatchMode` symbol ships under
`requests_dispatch_c_accept_loop_h1` (one bump per worker boot that
engages the path) plus `c_accept_loop_requests` (the count of requests
the C loop served on this worker, bumped at loop exit).

**Lifecycle hooks.** The 2.10-D contract holds: per-request
`Runtime#fire_request_start` / `#fire_request_end` fire on every
request the C loop serves. `env=nil`, `path=` the matched path string
— the same surface as the Ruby direct-dispatch path. The Ruby-side
bridge in `ConnectionLoop.build_lifecycle_callback` builds a minimal
`Hyperion::Request` for the hooks to consume. The C-side
`lifecycle_active?` flag gates the `rb_funcall`; when no hooks are
registered, the no-hook hot path is one syscall (recv) + one syscall
(write) per request, with **zero Ruby method invocations**.

**Handoff.** When the C loop sees a request it can't serve from the
static cache (POST, path miss, malformed framing, `Content-Length`,
`Transfer-Encoding`, `Upgrade`, `HTTP2-Settings`,
`Connection: upgrade`, HTTP/1.0), it invokes the handoff callback with
`(fd_int, partial_buffer_str)` and continues to the next accept. Ruby
resumes ownership of the fd via `dispatch_handed_off`, which wraps the
fd in `Socket.for_fd`, applies the read timeout (matching the regular
Ruby accept path), pre-loads the partial buffer onto the Connection's
`@inbuf` ivar so the parser sees the bytes the C loop already
consumed, and dispatches through the existing thread-pool / inline
path.
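
The fd-resume step can be sketched in plain Ruby. Assumptions: the class name is a stand-in, and the real `dispatch_handed_off` also applies the read timeout and dispatches onward, which this sketch omits.

```ruby
require 'socket'

# Illustrative resume-from-C-handoff: wrap the raw fd Ruby now owns,
# restore the bytes the C loop already consumed, then read the rest.
# HandedOffConnection is a stand-in name, not Hyperion's API.
class HandedOffConnection
  attr_reader :inbuf

  def initialize(fd_int, partial_buffer)
    @socket = Socket.for_fd(fd_int)     # Ruby owns the fd from here on
    @inbuf  = (partial_buffer || '').b  # pre-load what C already consumed
  end

  # The parser sees the pre-loaded bytes first, then fresh socket bytes.
  def read_available(max = 16_384)
    chunk = @socket.read_nonblock(max, exception: false)
    @inbuf << chunk if chunk.is_a?(String)
    @inbuf
  end

  def close
    @socket.close
  end
end
```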

**Tests.** `spec/hyperion/connection_loop_spec.rb` covers: smoke
(registered route → C-loop served-count grows), mixed registered +
unregistered (the C loop hands off to the Ruby callback spy with the
partial buffer + fd), body-present requests (`POST /hello` hands off),
lifecycle hooks (`set_lifecycle_active(true)` fires the registered
callback once per served request; `false` is silent — no callback
invocation, even with one set), GVL release (a slow C loop parked on
`accept(2)` doesn't block another Ruby thread doing arithmetic),
keep-alive on a single connection (two pipelined requests on the same
socket → two responses), and Server-level engagement
(only-static-routes engages the C loop; mixed-with-dynamic-handlers
refuses to engage and falls back to the regular Ruby loop).

Total: 1112 specs / 0 failures / 11 pending — +10 specs over the
2.12-B baseline (1102 / 0 / 11).

**Bench (3 trials, median, openclaw-vm 16 vCPU Ubuntu 24.04 Linux 6.8.0,
wrk -t4 -c100 -d20s, `bench/hello_static.ru`):**

| Variant | r/s (median) | p99 |
|---|---:|---:|
| 2.12-B `handle_static` Hyperion (Ruby accept loop + C `serve_request`) | 5,502 | 1.59 ms |
| **2.12-C `handle_static` Hyperion (C accept loop)** | **15,685** | **107 µs** |
| Agoo 2.15.14 (reference) | 19,024 | 10.47 ms |

`handle_static` hello: **2.85× over the 2.12-B baseline**
(5,502 → 15,685). The gap to Agoo's 19,024 r/s closed from **3.46×**
to **1.21×** — the 2.12-C path is now within striking distance of Agoo
on this row. Tail latency (p99) is now **107 µs**, a **15×
improvement** over the 2.12-B 1.59 ms p99 and **97× better** than
Agoo's 10.47 ms p99.

The Rack-style fallback (`bench/hello.ru`, no `handle_static`) was
re-checked: 4,648 r/s median (vs the 4,477 r/s 2.12-B baseline —
within bench noise, no regression). The C-loop path is only engaged
when the operator opts in via `Server.handle_static` registration on
every route; the regular dynamic-Rack path is unchanged.
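
The four engagement conditions from the Wiring section above reduce to a small predicate. A stand-in sketch, where `StaticEntry` / `DynamicEntry` are illustrative types (the real check lives in `Hyperion::Server::ConnectionLoop` against `RouteTable::StaticEntry`):

```ruby
# Illustrative engagement predicate mirroring the four 2.12-C conditions.
# StaticEntry stands in for RouteTable::StaticEntry; names are not
# Hyperion's real API.
StaticEntry  = Struct.new(:path, :bytes)
DynamicEntry = Struct.new(:path, :handler)

def c_accept_loop_engages?(routes, tls:, ext_available:, env: ENV)
  return false if tls                                     # 1. plain TCP only
  return false if routes.empty?                           # 2. >= 1 static route
  return false unless routes.all? { |r| r.is_a?(StaticEntry) } # 3. no dynamic
  return false unless ext_available                       # 4a. C ext present
  env['HYPERION_C_ACCEPT_LOOP'] != '0'                    # 4b. escape hatch
end
```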
|
|
449
|
+
|
|
450
|
+
### 2.12-B — Fresh 4-way re-bench (post-2.10/2.11 wins)
|
|
451
|
+
|
|
452
|
+
The 4-way head-to-head in `docs/BENCH_HYPERION_2_0.md`
|
|
453
|
+
§ "4-way head-to-head (2.10-B baseline, 2026-05-01)" was
|
|
454
|
+
captured before the 2.10-C / 2.10-D / 2.10-E / 2.10-F /
|
|
455
|
+
2.10-G / 2.11-A / 2.11-B wins landed. Every Hyperion column
|
|
456
|
+
in that doc was therefore stale; the Puma / Falcon / Agoo
|
|
457
|
+
columns were unchanged (no version bumps on the bench host).
|
|
458
|
+
This stream re-runs the entire 6-row matrix on the 2.11.0
|
|
459
|
+
codebase — same host (`openclaw-vm`, 16 vCPU, Ubuntu 24.04,
|
|
460
|
+
Linux 6.8.0), same tools (`wrk` + `perfer`), 3 trials per
|
|
461
|
+
row, median reported — and writes the results to a new
|
|
462
|
+
`docs/BENCH_HYPERION_2_11.md` (pointed at from the README's
|
|
463
|
+
Benchmarks section; the 2.0 doc is now marked "historical
|
|
464
|
+
baseline (pre-2.10/2.11 wins)").
|
|
465
|
+
|
|
466
|
+
The harness at `bench/4way_compare.sh` gains a sixth server
|
|
467
|
+
label, `hyperion_handle_static`, which boots Hyperion against
|
|
468
|
+
an alternate rackup whose hot path is the
|
|
469
|
+
`Hyperion::Server.handle_static` direct route + 2.10-F C-ext
|
|
470
|
+
fast-path response writer + 2.10-C PageCache fold. This
|
|
471
|
+
exposes two Hyperion columns on the rows where it applies
|
|
472
|
+
(hello + small static): "Rack-style" (the legacy generic-Rack
|
|
473
|
+
path most apps run unchanged) and "`handle_static`" (the peak
|
|
474
|
+
path operators can opt into by registering one pre-built
|
|
475
|
+
route at boot). New rackup `bench/static_handle_static.ru`
|
|
476
|
+
preloads the 1 KB static asset bytes at boot and serves them
|
|
477
|
+
through the same direct-route fast path; the existing
|
|
478
|
+
`bench/hello_static.ru` (added in 2.10-F) plays the same role
|
|
479
|
+
for the hello row.
|
|
480
|
+
|
|
481
|
+
**Headline shifts (medians, vs the 2.10-B baseline at the
|
|
482
|
+
same workload):**
|
|
483
|
+
|
|
484
|
+
| # | Workload | 2.10-B Hyperion | 2.11.0 Rack-style | 2.11.0 `handle_static` | Gap-vs-leader shift |
|
|
485
|
+
|---:|---|---:|---:|---:|---|
|
|
486
|
+
| 1 | hello | 4,587 | 4,477 (-2.4%, noise) | **5,502** (+19.9%) | gap to Agoo: 4.22× → **3.46×** with `handle_static` |
|
|
487
|
+
| 2 | static 1 KB | 1,380 | 1,687 (**+22.2%**, PageCache 2.10-C auto-engage) | **5,935** (+330%) | gap to Agoo: 1.89× behind → **flipped, Hyperion +127%** |
|
|
488
|
+
| 3 | static 1 MiB | 1,378 | 1,513 (+9.8%) | n/a (handle_static buffers in memory; defeats sendfile) | Hyperion lead vs Agoo: 9.07× → 9.74× (held) |
|
|
489
|
+
| 4 | CPU JSON 50-key | 3,450 | 3,659 (+6.0%) | n/a (per-request `JSON.generate`) | gap to Agoo: 1.85× → **2.05× (widened)**; Agoo +17.5% in same window |
|
|
490
|
+
| 5 | PG-bound async (`pg_sleep 50ms`) | 1,564 | 1,565 (+0.1%) | n/a | Hyperion-only; identical |
|
|
491
|
+
| 6 | SSE 1000 events × 50 B | 500 | 472 (-5.6%, noise) | n/a (single fixed response) | Hyperion lead vs Puma: 3.65× → 3.58× (flat) |
|
|
492
|
+
|
|
493
|
+
**Reading the deltas.**

- **The 2.10-D + 2.10-F + 2.10-C win-stack lands cleanly on
  static 1 KB.** Hyperion's `handle_static` row at 5,935 r/s
  wins the column outright by +127% (2.27×) over Agoo, +208%
  over Falcon, +282% over Puma. The Rack-style row also moved
  up +22.2% from the PageCache auto-engage even without
  explicit `handle_static` registration. Operators who can
  register one pre-built route via `handle_static` pick up a
  3.5× lift over the generic Rack-style path on this exact
  shape.
- **Hello narrowed but did not close.** The gap to Agoo went
  from 4.22× to 3.46× with `handle_static`; the Rack-style
  row stayed flat (within bench noise) — meaning the 2.10-D
  direct-route win **only manifests when the rackup actually
  opts in via `handle_static`**.
- **CPU JSON widened.** Agoo's +17.5% in the same window
  against Hyperion's +6.0% took the gap from 1.85× to 2.05×.
  This is the one row the 2.10/2.11 streams didn't move;
  closing it is the obvious 2.12 follow-on.
- **Large static, PG-bound async, SSE — Hyperion's existing
  wins held.** Sendfile (9.7× over Agoo on 1 MiB), fiber-
  cooperative I/O (Hyperion-only on PG-bound, identical to
  2.10-B), and ChunkedCoalescer (3.58× over Puma, 16.7× over
  Falcon on SSE) all stayed clean. Agoo's SSE behavior
  regressed to a hard segfault at boot on `bench/sse_generic.ru`
  (a different shape from 2.10-B's "buffers entire response,
  takes ~5 s to flush"; same conclusion either way — Agoo is
  not a viable SSE server on this rackup at any throughput).
- **Tail latency is still Hyperion's clean win** across every
  row with a non-trivial p99: hello 1.73 ms vs Agoo 10.47 ms
  / Puma 29 ms / Falcon 408 ms; 1 KB 1.69 ms vs 57–86 ms;
  1 MiB 4.63 ms vs 82–720 ms; CPU JSON 2.60 ms vs 17–411 ms;
  SSE 2.85 ms vs 11–42 ms.

**Harness changes** (`bench/4way_compare.sh`):

- New server label `hyperion_handle_static` — boots Hyperion
  against an alternate rackup specified by the new
  `HYPERION_STATIC_RACKUP` env var (falls back to the same
  `RACKUP` as the legacy `hyperion` label if unset, which
  becomes a harmless no-op duplicate). The four legacy
  labels (`hyperion`, `puma`, `falcon`, `agoo`) keep their
  byte-identical 2.10-B shape so the older doc stays
  reproducible.

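The fallback behavior described above is the standard
parameter-expansion pattern; a sketch of the shape (the variable
names come from this changelog, the surrounding script logic is
assumed):

```shell
# Sketch of the HYPERION_STATIC_RACKUP fallback described above;
# variable names are from the changelog, surrounding script assumed.
RACKUP="bench/hello.ru"

# Unset: hyperion_handle_static reuses RACKUP (harmless duplicate).
unset HYPERION_STATIC_RACKUP
echo "${HYPERION_STATIC_RACKUP:-$RACKUP}"   # bench/hello.ru

# Set: the label boots the alternate rackup instead.
HYPERION_STATIC_RACKUP="bench/static_handle_static.ru"
echo "${HYPERION_STATIC_RACKUP:-$RACKUP}"   # bench/static_handle_static.ru
```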
**New rackup**: `bench/static_handle_static.ru` — preloads
the 1 KB static asset (`/tmp/hyperion_bench_1k.bin`) at boot,
registers it via `Hyperion::Server.handle_static`, and falls
through to a 404 lambda for any other path. Mirrors the role
`bench/hello_static.ru` plays for the hello row.

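Given that description, the rackup plausibly has a shape like the
sketch below. Only the `Hyperion::Server.handle_static` method name
and the asset path come from this changelog; the route path, keyword
arguments, and registration signature are assumptions.

```ruby
# Hypothetical shape for bench/static_handle_static.ru; the
# handle_static keyword arguments are assumed, not confirmed.
# File.binread raises Errno::ENOENT if the asset is missing, which
# gives the desired fail-fast-at-boot behavior instead of silently
# booting a 404-only server.
asset = File.binread("/tmp/hyperion_bench_1k.bin")

Hyperion::Server.handle_static(
  "/static/1k",                     # hypothetical route path
  body: asset,
  headers: { "content-type" => "application/octet-stream" }
)

# Anything not matching the registered route falls through to a 404.
run ->(_env) { [404, { "content-type" => "text/plain" }, ["not found"]] }
```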
**Spec coverage** (`spec/hyperion/bench_handle_static_rackups_spec.rb`,
9 examples):

- `bench/hello_static.ru` parses cleanly and registers a `/`
  StaticEntry with the expected response bytes.
- `bench/static_handle_static.ru` preloads the asset and
  registers a route at boot; fails fast if the asset is
  missing rather than booting empty (prevents a silently-bound
  404 from masking a misconfigured bench run).
- `bench/4way_compare.sh` knows how to boot all five variant
  labels (hyperion, hyperion_handle_static, puma, falcon,
  agoo) — a typo in the harness's case statement or the
  new env var name fails this spec at unit-level instead
  of waiting for a 41-minute bench sweep to surface the
  regression.

Spec count: 1093 → 1102 (+9). 0 failures, 11 pending —
invariant preserved.

**Reproducing.** See
`docs/BENCH_HYPERION_2_11.md` § "Reproducing 4-way" for the
six per-row commands. Bench host: `openclaw-vm`, sweep dir
`/home/ubuntu/bench-2.12-B/`. Total wall time ~41 min.

## 2.11.0 — 2026-05-01
### 2.11-B — HPACK FFI marshalling round-2 (cglue confirmed as firm default; +43% v3 vs v2 on Rails-shape h2)