hyperion-rb 2.11.0 → 2.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -11,6 +11,114 @@ gem install hyperion-rb
  bundle exec hyperion config.ru
  ```

+ ## What's new in 2.13.0
+
+ **Hardening sprint: profile-driven CPU work, durable infrastructure,
+ and gRPC streaming.** 2.13 follows up the 2.12 perf jump with the
+ work that wasn't structural enough to need its own major release —
+ each stream is small on its own, but they add up:
+
+ - **2.13-A — Generic Rack hot-path wins.** Per-thread shards for
+ hot-path metrics (no more cross-worker mutex on `observe_histogram`),
+ a cached `worker_id` label tuple, and a Rack-3 keepalive fast path.
+ Generic Rack hello bench: **+3.6% on `-c100` no-log**. Honest
+ finding: env-pool + body-coalesce already shipped; the remaining
+ generic-Rack gap to Agoo is the single-thread Ruby ceiling, and
+ closing it means moving `app.call` into the C accept loop — a
+ 2.14 lift.
+ - **2.13-B — Response head builder rewritten in C.** Pre-baked
+ status-line table, `rb_hash_foreach` replacing `rb_funcall(:keys)`,
+ per-key downcase and per-(key, value) full-line caches, and a custom
+ `itoa` replacing `snprintf`. **+7.7% single-thread synthetic;
+ multi-thread neutral (GVL-bound).** Profiling confirms Hyperion's
+ own C-ext code is **<1%** of wall-clock; the rest is libruby + the
+ JSON gem.
+ - **2.13-C — Spec flake hunt.** Two long-standing flakes fixed:
+ the `tls_ktls_spec` macOS skip leak (now an unconditional
+ `RUBY_PLATFORM` guard), and the `connection_loop_spec:79` Linux
+ port-bind flake (root cause: Linux `close()` doesn't wake a parked
+ `accept(2)` — fixed with a `stop_loop_and_wake` helper). Failure
+ rate dropped from 5/5 runs to 0/10; spec-suite runtime 46 s → 1.3 s.
+ - **2.13-D — gRPC streaming RPCs.** Server-streaming, client-streaming,
+ and bidirectional RPCs on top of 2.12-F's unary trailers foundation.
+ New `bench/grpc_stream.{proto,ru}` + `grpc_stream_bench.sh` ghz
+ harness for operator-side comparison vs Falcon's `async-grpc`.
+ - **2.13-E — io_uring soak harness + CI smoke.** New
+ `bench/io_uring_soak.sh` runs a 24 h soak against the 2.12-D
+ io_uring loop with `/proc/$PID` + `/-/metrics` sampling, and emits
+ a CSV + verdict (PASS / SOAK FAIL / borderline). New
+ `spec/hyperion/io_uring_soak_smoke_spec.rb` runs a 1000-request
+ mini-soak in CI to catch leak regressions before any 24 h run.
+ **`HYPERION_IO_URING_ACCEPT` stays opt-in for 2.13** — operators
+ with their own staging environments can now collect signal; the
+ default-flip decision moves to 2.14.
+
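The 2.13-A shard idea in miniature — a hedged Ruby sketch, not Hyperion's internal API (`ShardedCounter` and the fiber-local key are illustrative): the hot path touches only a thread-owned hash, and only shard creation and the scrape path pay the mutex.

```ruby
# Illustrative only — not Hyperion's internals. Each thread gets its own
# shard, so the hot path never takes a cross-thread lock; the (rare)
# scrape path locks and sums across all shards.
class ShardedCounter
  def initialize
    @shards = {}        # Thread => Hash[label => count]
    @mutex  = Mutex.new # taken only on shard creation and scrape
  end

  def increment(label)
    shard = Thread.current[:metric_shard] ||=
      @mutex.synchronize { @shards[Thread.current] = Hash.new(0) }
    shard[label] += 1   # lock-free hot path
  end

  def scrape
    @mutex.synchronize do
      @shards.values.each_with_object(Hash.new(0)) do |shard, out|
        shard.each { |label, n| out[label] += n }
      end
    end
  end
end
```

The same shape extends to histograms: each shard keeps its own bucket array, and the scrape merges buckets element-wise.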
+ Plus all previous wins are preserved and verified by the 1183-spec
+ suite (2.10-G TCP_NODELAY at accept, 2.10-E preload hooks, 2.10-F
+ C-ext fast-path response writer, 2.11-A dispatch pool warmup, 2.11-B
+ cglue HPACK default, 2.12-C accept4 connection loop, 2.12-D io_uring
+ loop, 2.12-E per-worker request counter, 2.12-F gRPC unary trailers).
+
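The 2.13-C root cause deserves a sketch: on Linux, `close()` on a listening socket does not wake a thread parked in `accept(2)`, so the fix pokes the listener with a throwaway connection before closing it. A toy version of the idea (the shape of the real `stop_loop_and_wake` helper in the spec suite is assumed here):

```ruby
require 'socket'

# Toy model of the 2.13-C fix (shape assumed). Set a stop flag, then
# self-connect so the parked accept(2) returns and observes the flag;
# only then close the listener.
class ToyAcceptLoop
  def initialize
    @listener = TCPServer.new('127.0.0.1', 0)
    @stopping = false
    @thread = Thread.new do
      loop do
        conn = @listener.accept   # parks here between connections
        conn.close
        break if @stopping        # flag is visible after the wake-up poke
      end
    end
  end

  def stop_loop_and_wake
    @stopping = true
    # The poke: a throwaway connection makes the parked accept(2) return.
    TCPSocket.new('127.0.0.1', @listener.addr[1]).close
    @thread.join
    @listener.close
  end

  def running?
    @thread.alive?
  end
end
```

Closing the listener only after the join avoids the `EBADF`/port-rebind race the original flake exposed.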
+ Full per-stream details, bench tables, and follow-up items in
+ [`CHANGELOG.md`](CHANGELOG.md).
+
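For intuition on the 2.13-B caches, here is the per-(key, value) full-line cache modeled in Ruby — the shipped builder is C, and `render_head` plus the two-entry status table below are simplified assumptions, not Hyperion's API:

```ruby
# Illustrative Ruby model of 2.13-B. Most apps emit the same few header
# pairs on every response, so render "key: value\r\n" once and reuse the
# string byte-for-byte afterwards. A pre-baked status-line table stands
# in for the C version's.
STATUS_LINES = {
  200 => "HTTP/1.1 200 OK\r\n".freeze,
  404 => "HTTP/1.1 404 Not Found\r\n".freeze,
}.freeze

HEADER_LINE_CACHE = Hash.new do |cache, key_value|
  key, value = key_value
  cache[key_value] = "#{key.downcase}: #{value}\r\n"
end

def render_head(status, headers)
  head = STATUS_LINES.fetch(status).dup
  headers.each { |k, v| head << HEADER_LINE_CACHE[[k, v]] }
  head << "\r\n"
end
```

After warm-up, building a head is a handful of memcpy-style appends with zero per-request formatting — which is why the win shows up single-threaded and washes out under the GVL.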
+ ## What's new in 2.12.0
+
+ **The hot path moves into C — and gRPC ships.** The headline win:
+ `Server.handle_static` routes now serve from a C accept→read→route→write
+ loop with optional **io_uring** (Linux 5.x+) backing it. The `wrk -t4
+ -c100 -d20s` hello bench moved from **5,502 r/s** (2.11.0
+ `Server.handle_static` via Ruby accept loop) to **15,685 r/s** (2.12-C
+ C accept4 loop) to **134,084 r/s** (2.12-D io_uring loop) — that's
+ **24× over 2.11.0's `handle_static` and 7× over Agoo 2.15.14's
+ 19,024 r/s** on the same workload. p99 stays sub-millisecond
+ throughout. Plus durable foundation work and one big new feature:
+
+ - **2.12-B — Fresh 4-way re-bench.** New
+ [`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) re-runs
+ Hyperion / Puma / Falcon / Agoo on the 6 workloads with all 2.10/2.11
+ wins enabled. Headline shifts: static 1 KB Hyperion `handle_static`
+ flipped from 1.89× behind Agoo to **+127% ahead**; the CPU JSON gap
+ widened (the one row 2.10/2.11 didn't touch — flagged for follow-up).
+ - **2.12-C — Connection lifecycle in C.** New
+ `Hyperion::Http::PageCache.run_static_accept_loop` does
+ `accept4` + `recv` + path lookup + `write` entirely in a C tight
+ loop, returning to Ruby only on a route miss / TLS / h2 / WebSocket
+ upgrade. The GVL is released across syscalls. Auto-engages when the
+ listener is plain TCP and the route table contains only `StaticEntry`
+ registrations. **5,502 → 15,685 r/s (+185%, 2.85×) on `handle_static`
+ hello; p99 1.59 ms → 107 µs (15× tighter).** Falls through to the
+ existing Ruby accept loop on miss with no regression.
+ - **2.12-D — io_uring accept loop (Linux 5.x+).** A multishot accept +
+ per-conn RECV/WRITE/CLOSE state machine on top of liburing. One
+ `io_uring_enter` per N requests instead of N×3 syscalls. Opt-in via
+ `HYPERION_IO_URING_ACCEPT=1` (default off until the 2.13 production
+ soak). **15,685 → 134,084 r/s (+755%, 8.6×) on the same bench.**
+ Compiles out cleanly without liburing — the `accept4` path stays
+ the fallback. macOS keeps using `accept4` (no liburing).
+ - **2.12-E — SO_REUSEPORT cluster-mode audit.** New per-worker request
+ metric (`requests_dispatch_total{worker_id="N"}`) ticks under every
+ dispatch mode (Rack, `handle_static`, h2, the C accept loops). New
+ audit harness `bench/cluster_distribution.sh` and a 4-worker, 30 s
+ sustained-load bench: under steady state the SO_REUSEPORT hash
+ distributes within a **1.004–1.011 max/min ratio** — production-grade,
+ measured. The cold-start swing (1.16× during the first second of a
+ fresh boot) is documented as expected `SO_REUSEPORT + keep-alive`
+ behavior and matches what production L4 LBs already exhibit.
+ - **2.12-F — gRPC support on h2.** Trailers (the `grpc-status` /
+ `grpc-message` final HEADERS frame), `TE: trailers` handling, and h2
+ request half-close semantics. Rack 3 contract: a Rack body that
+ defines `#trailers` triggers the trailers wire shape automatically;
+ bodies that don't are byte-identical to 2.11.x h2. A smoke test against
+ the real `grpc` Ruby gem ships gated by `RUN_GRPC_SMOKE=1`; the
+ durable coverage is 11 unit specs driving real `protocol-http2`
+ framer + HPACK encode/decode + TLS.
+
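The 2.12-E verdict boils down to one ratio. A hypothetical helper (not part of Hyperion) showing the arithmetic behind the "1.004–1.011 max/min" claim — scrape `requests_dispatch_total` per `worker_id`, then compare the busiest worker against the least busy one:

```ruby
# Hypothetical audit helper: counts_by_worker is the scraped
# requests_dispatch_total, keyed by worker_id. The 1.05 threshold is an
# illustrative cutoff, not a number Hyperion ships.
def reuseport_balance(counts_by_worker, threshold: 1.05)
  counts = counts_by_worker.values
  ratio = counts.max.to_f / counts.min
  { max_min_ratio: ratio.round(3), balanced: ratio <= threshold }
end
```

On a steady-state 4-worker run the input looks like `{'0' => 25_113, '1' => 25_034, ...}`; the cold-start 1.16× swing would trip the same check if you sampled only the first second.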
+ The 2.10-G TCP_NODELAY hunk, 2.10-E preload hooks, 2.10-F C-ext
+ `rb_pc_serve_request`, 2.11-A dispatch pool warmup, and 2.11-B cglue
+ HPACK default are all preserved and verified by the 1143-spec suite.
+
+ Full per-stream details, bench tables, and follow-up items in
+ [`CHANGELOG.md`](CHANGELOG.md).
+
  ## What's new in 2.11.0
  **h2 cold-stream latency cut + native HPACK CGlue flipped to default.**
@@ -156,6 +264,107 @@ container required. HTTP/1.1 only this release; WS-over-HTTP/2 (RFC 8441
  Extended CONNECT) and permessage-deflate (RFC 7692) defer to 2.2.x.
  See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).

+ ## gRPC on Hyperion (2.12-F+)
+
+ Hyperion's HTTP/2 path supports gRPC unary calls via the Rack 3 trailers
+ contract: any response body that exposes `:trailers` gets a final
+ HEADERS frame (with END_STREAM=1) carrying the trailer map after the
+ DATA frames. That's the wire shape gRPC clients expect for the
+ `grpc-status` / `grpc-message` map.
+
+ A minimal Rack-shaped gRPC handler:
+
+ ```ruby
+ class GrpcBody
+   def initialize(reply); @reply = reply; end
+   def each; yield @reply; end
+   def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
+   def close; end
+ end
+
+ run ->(env) {
+   request = env['rack.input'].read  # gRPC-framed protobuf bytes
+   reply   = handle(request)         # your service implementation
+   [200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply)]
+ }
+ ```
+
+ What Hyperion handles for you: ALPN negotiation, HTTP/2 framing, HPACK,
+ per-stream flow control, the trailer-frame emit, binary-clean
+ `env['rack.input']` (gRPC bodies are non-UTF-8), and `te: trailers`
+ preserved into `env['HTTP_TE']`. What you handle: protobuf
+ marshalling and the `grpc-status` semantics.
+
+ ### Streaming RPCs (2.13-D+)
+
+ All four gRPC call shapes work on Hyperion since 2.13-D — unary,
+ server-streaming, client-streaming, and bidirectional. The detection
+ trigger is the gRPC content-type plus `te: trailers`; any HTTP/2
+ request that carries both is dispatched to the Rack app on HEADERS
+ arrival (rather than after END_STREAM), and `env['rack.input']`
+ becomes a streaming IO that blocks reads until the next DATA frame
+ lands. Plain HTTP/2 traffic (without those headers) keeps the unary
+ buffered semantics — no behaviour change for non-gRPC clients.
+
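That detection rule fits in one predicate. A sketch under assumptions (lowercased HTTP/2-style header hash; the name and signature are ours, not Hyperion's internal check):

```ruby
# Sketch of the 2.13-D dispatch trigger: a gRPC content-type AND a
# `trailers` token in `te`. Header names assumed lowercased, h2 style.
def grpc_streaming_dispatch?(headers)
  headers['content-type'].to_s.start_with?('application/grpc') &&
    headers['te'].to_s.split(',').any? { |t| t.strip.casecmp?('trailers') }
end
```

`application/grpc+proto` and `application/grpc+json` match the prefix test, so content-type subtypes dispatch the same way.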
309
+ **Server-streaming.** Yield one gRPC-framed message per `each`
310
+ iteration; Hyperion writes each yield as its own DATA frame:
311
+
312
+ ```ruby
313
+ class StreamReply
314
+ def initialize(messages); @messages = messages; end
315
+ def each; @messages.each { |m| yield m }; end # one DATA frame each
316
+ def trailers; { 'grpc-status' => '0' }; end
317
+ def close; end
318
+ end
319
+
320
+ run ->(env) {
321
+ env['rack.input'].read # the unary request message
322
+ [200, { 'content-type' => 'application/grpc' }, StreamReply.new(messages)]
323
+ }
324
+ ```
325
+
326
+ **Client-streaming.** Read messages off `env['rack.input']` as the peer
327
+ sends them. Reads block until a DATA frame arrives:
328
+
329
+ ```ruby
330
+ run ->(env) {
331
+ io = env['rack.input']
332
+ count = 0
333
+ while (prefix = io.read(5)) && prefix.bytesize == 5
334
+ length = prefix.byteslice(1, 4).unpack1('N')
335
+ msg = io.read(length)
336
+ count += 1
337
+ end
338
+ [200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply_for(count))]
339
+ }
340
+ ```
341
+
342
+ **Bidirectional.** Interleave reads and writes. The response body is
343
+ iterated lazily, so you can read one request message, yield one reply,
344
+ read the next, yield the next:
345
+
346
+ ```ruby
347
+ class BidiReplies
348
+ def initialize(io); @io = io; end
349
+ def each
350
+ while (prefix = @io.read(5)) && prefix.bytesize == 5
351
+ len = prefix.byteslice(1, 4).unpack1('N')
352
+ msg = @io.read(len)
353
+ yield grpc_frame(handle(msg)) # sent immediately
354
+ end
355
+ end
356
+ def trailers; { 'grpc-status' => '0' }; end
357
+ def close; end
358
+ end
359
+
360
+ run ->(env) {
361
+ [200, { 'content-type' => 'application/grpc' }, BidiReplies.new(env['rack.input'])]
362
+ }
363
+ ```
364
+
365
+ The streaming-input path runs each stream on its own fiber, so
366
+ concurrent read+write on the same stream is safe.
367
+
  ## Highlights

  - **HTTP/1.1 + HTTP/2 + TLS** out of the box (HTTP/2 with per-stream fiber multiplexing, WINDOW_UPDATE-aware flow control, ALPN auto-negotiation).
@@ -174,11 +383,17 @@ See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).
  All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version + bench host it was measured against — bench-host drift over time is real (see "Bench-host drift" note below).

  **Headline doc**: the most recent comprehensive sweep is
- [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md) (Hyperion
- 2.0.0 vs Puma 8.0.1, 16-vCPU Ubuntu 24.04, 12 workloads). The 1.6.0
- matrix at [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md) covers
- 9 workloads × 25+ configs against hyperion-async-pg 0.5.0; both docs
- include caveats and per-row reproduction commands.
+ [`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) — the
+ 2.12-B 4-way re-bench (Hyperion 2.11.0 vs Puma 8.0.1 / Falcon 0.55.3 /
+ Agoo 2.15.14, 16-vCPU Ubuntu 24.04, 6 workloads). It's the post-
+ 2.10/2.11-wins re-baseline of the four-server matrix that originally
+ shipped in [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md)
+ § "4-way head-to-head (2.10-B baseline)" — the older doc is the
+ **historical baseline (pre-2.10/2.11 wins)** and is preserved
+ unchanged for archaeology. The 1.6.0 matrix at
+ [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md) covers 9
+ workloads × 25+ configs against hyperion-async-pg 0.5.0; all three
+ docs include caveats and per-row reproduction commands.

  > **Bench-host drift note (2026-05-01).** A spot-check rerun on
  > `openclaw-vm` 5 days after the 2.0.0 sweep showed Puma 8.0.1 and
@@ -17,6 +17,7 @@ $srcs = %w[
  parser.c
  sendfile.c
  page_cache.c
+ io_uring_loop.c
  websocket.c
  h2_codec_glue.c
  llhttp.c
@@ -44,4 +45,44 @@ have_header('sys/socket.h')
  have_header('dlfcn.h')
  have_library('dl', 'dlopen')

+ # 2.12-D — io_uring accept loop (Linux 5.x).
+ #
+ # Soft-optional dependency: if `liburing` is installed at compile time
+ # (Ubuntu/Debian: `apt install liburing-dev`; Fedora: `dnf install
+ # liburing-devel`; Arch: `pacman -S liburing`), we build the io_uring
+ # accept-loop variant. If it's not, the C ext compiles cleanly without
+ # it and the Ruby caller falls through to the 2.12-C `accept4` loop.
+ #
+ # We probe in two passes:
+ #   1. `pkg-config --exists liburing` lets us pick up Debian/Ubuntu's
+ #      pkg-config metadata and add the right -L/-l flags. Quiet failure
+ #      is fine — the second pass catches header-only setups (vendored
+ #      installs, distros without pkg-config metadata).
+ #   2. `have_header('liburing.h')` + `have_library('uring', ...)` covers
+ #      the no-pkg-config path.
+ #
+ # On success: `-DHAVE_LIBURING` lands in $defs (mkmf-managed) and
+ # `io_uring_loop.c` compiles its real loop. On failure: the file
+ # compiles to a thin stub that returns `:unavailable`.
+ #
+ # Linux-only — the loop is `#ifdef __linux__` guarded too, so a
+ # liburing-on-FreeBSD setup (technically possible) still picks the
+ # stub. Worth-it cost: portability + zero surprise on the bench host.
+ if RbConfig::CONFIG['target_os'] =~ /linux/
+   pkg_ok = system('pkg-config --exists liburing 2>/dev/null')
+   if pkg_ok
+     $CFLAGS  << ' ' + `pkg-config --cflags liburing`.strip
+     $LDFLAGS << ' ' + `pkg-config --libs liburing`.strip
+     $defs << '-DHAVE_LIBURING' if have_header('liburing.h')
+     puts '[hyperion] liburing detected via pkg-config — building 2.12-D io_uring accept loop'
+   elsif have_header('liburing.h') && have_library('uring', 'io_uring_queue_init')
+     $defs << '-DHAVE_LIBURING'
+     puts '[hyperion] liburing detected via header probe — building 2.12-D io_uring accept loop'
+   else
+     puts '[hyperion] liburing not found — 2.12-D io_uring accept loop will return :unavailable; ' \
+          'install `liburing-dev` (Debian/Ubuntu) / `liburing-devel` (Fedora) for the io_uring path'
+   end
+ end
+
  create_makefile('hyperion_http/hyperion_http')