hyperion-rb 2.10.1 → 2.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -11,6 +11,98 @@ gem install hyperion-rb
  bundle exec hyperion config.ru
  ```
 
+ ## What's new in 2.12.0
+
+ **The hot path moves into C — and gRPC ships.** The headline win:
+ `Server.handle_static` routes now serve from a C accept→read→route→write
+ loop with optional **io_uring** (Linux 5.x+) backing it. The `wrk -t4
+ -c100 -d20s` hello bench moved from **5,502 r/s** (2.11.0
+ `Server.handle_static` via the Ruby accept loop) to **15,685 r/s**
+ (2.12-C `accept4` loop) to **134,084 r/s** (2.12-D io_uring loop) —
+ that's **24× over 2.11.0's `handle_static` and 7× over Agoo 2.15.14's
+ 19,024 r/s** on the same workload. p99 stays sub-millisecond
+ throughout. Plus durable foundation work and one big new feature:
+
+ - **2.12-B — Fresh 4-way re-bench.** New
+   [`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) re-runs
+   Hyperion / Puma / Falcon / Agoo on the 6 workloads with all 2.10/2.11
+   wins enabled. Headline shifts: on static 1 KB, Hyperion `handle_static`
+   flipped from 1.89× behind Agoo to **+127% ahead**; the CPU JSON gap
+   widened (the one row 2.10/2.11 didn't touch — flagged for follow-up).
+ - **2.12-C — Connection lifecycle in C.** New
+   `Hyperion::Http::PageCache.run_static_accept_loop` does
+   `accept4` + `recv` + path lookup + `write` entirely in a tight C
+   loop, returning to Ruby only on a route miss / TLS / h2 / WebSocket
+   upgrade. The GVL is released across syscalls. Auto-engages when the
+   listener is plain TCP and the route table contains only `StaticEntry`
+   registrations. **5,502 → 15,685 r/s (+185%, 2.85×) on `handle_static`
+   hello; p99 1.59 ms → 107 µs (15× tighter).** Falls through to the
+   existing Ruby accept loop on a miss with no regression.
+ - **2.12-D — io_uring accept loop (Linux 5.x+).** A multishot accept +
+   per-conn RECV/WRITE/CLOSE state machine on top of liburing. One
+   `io_uring_enter` per N requests instead of N×3 syscalls. Opt-in via
+   `HYPERION_IO_URING_ACCEPT=1` (default off until the 2.13 production
+   soak). **15,685 → 134,084 r/s (+755%, 8.6×) on the same bench.**
+   Compiles out cleanly without liburing — the `accept4` path stays
+   the fallback. macOS keeps using `accept4` (no liburing).
+ - **2.12-E — SO_REUSEPORT cluster-mode audit.** New per-worker request
+   metric (`requests_dispatch_total{worker_id="N"}`) ticks under every
+   dispatch mode (Rack, `handle_static`, h2, the C accept loops). New
+   audit harness `bench/cluster_distribution.sh` plus a 4-worker, 30 s
+   sustained-load bench: under steady state the SO_REUSEPORT hash
+   distributes within a **1.004-1.011 max/min ratio** — production-grade,
+   measured. The cold-start swing (1.16× during the first second of a
+   fresh boot) is documented as expected `SO_REUSEPORT + keep-alive`
+   behavior and matches what production L4 LBs already exhibit.
+ - **2.12-F — gRPC support on h2.** Trailers (the `grpc-status` /
+   `grpc-message` final HEADERS frame), `TE: trailers` handling, and h2
+   request half-close semantics. Rack 3 contract: a Rack body that
+   defines `#trailers` triggers the trailers wire shape automatically;
+   bodies that don't are byte-identical to 2.11.x h2. A smoke test
+   against the real `grpc` Ruby gem ships gated behind `RUN_GRPC_SMOKE=1`;
+   the durable coverage is 11 unit specs driving the real `protocol-http2`
+   framer + HPACK encode/decode + TLS.
+
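The 2.12-C/2.12-D activation rules can be modeled in a few lines. This is an illustrative sketch, not Hyperion's code — `StaticEntry` and `HYPERION_IO_URING_ACCEPT` are the only names taken from the notes above; `DynamicEntry` and the two predicates are hypothetical stand-ins that restate the documented rule (plain-TCP listener, all-static route table, io_uring only behind the opt-in flag):

```ruby
# Illustrative model of the 2.12-C auto-engage rule and the 2.12-D opt-in.
StaticEntry  = Struct.new(:path, :body)
DynamicEntry = Struct.new(:path, :app)   # stand-in for any non-static route

def c_accept_loop?(tls:, routes:)
  # 2.12-C engages only on a plain-TCP listener whose route table is all-static.
  !tls && routes.all? { |r| r.is_a?(StaticEntry) }
end

def io_uring_loop?(tls:, routes:)
  # 2.12-D additionally requires the explicit opt-in flag (default off until 2.13).
  c_accept_loop?(tls: tls, routes: routes) && ENV['HYPERION_IO_URING_ACCEPT'] == '1'
end

static_only = [StaticEntry.new('/', 'hello'), StaticEntry.new('/healthz', 'ok')]
c_accept_loop?(tls: false, routes: static_only)                                   # => true
c_accept_loop?(tls: false, routes: static_only + [DynamicEntry.new('/app', nil)]) # => false
```

Any route table that mixes in a non-static entry keeps the whole listener on the Ruby accept loop, which matches the "route miss falls through" behavior described above.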
+ The 2.10-G TCP_NODELAY hunk, 2.10-E preload hooks, 2.10-F C-ext
+ `rb_pc_serve_request`, 2.11-A dispatch pool warmup, and 2.11-B CGlue
+ HPACK default are all preserved and verified by the 1143-spec suite.
+
+ Full per-stream details, bench tables, and follow-up items are in
+ [`CHANGELOG.md`](CHANGELOG.md).
+
+ ## What's new in 2.11.0
+
+ **h2 cold-stream latency cut + native HPACK CGlue flipped to default.**
+ Two perf wins on top of 2.10:
+
+ - **2.11-A — h2 first-stream TLS handshake parallelization.** The
+   2.10-G `HYPERION_H2_TIMING=1` instrumentation, run against the
+   TCP_NODELAY-fixed handler, isolated the residual cold-stream cost
+   to **bucket 2**: the lazy `task.async {}` fiber spawn for the first
+   stream of every connection. Fix: pre-spawn a stream-dispatch fiber
+   pool at connection accept (configurable via `HYPERION_H2_DISPATCH_POOL`,
+   default 4, ceiling 16). h2load `-c 1 -m 1 -n 50` cold first-run:
+   **time-to-1st-byte 20.28 → 9.28 ms (−54%); m=100 throughput +5.5%**.
+   Warm steady state is unchanged (no head-of-line blocking under the
+   small pool — backlog still spills to ad-hoc `task.async`).
+ - **2.11-B — HPACK FFI marshalling round 2 (CGlue flipped to default).**
+   Three-way bench (`bench/h2_rails_shape.sh` extended): `ruby` (1,585
+   r/s) vs `native v2` (1,602 r/s, +1% — noise) vs `native v3 / CGlue`
+   (**2,291 r/s, +43% over v2**). The +18-44% native-vs-Ruby headline
+   was almost entirely Fiddle marshalling overhead, not the underlying
+   Rust HPACK encoder — same encoder, no per-call FFI marshalling, +43%
+   rps. Default flipped: an unset `HYPERION_H2_NATIVE_HPACK` now selects
+   CGlue. Three escape valves stay (`=v2` to force the old path, `=ruby`
+   / `=off` for the pure-Ruby fallback) for any operator who needs
+   them. The boot log gains a `native_mode` field documenting which
+   path is actually live.
+
+ Plus operator infrastructure: a host-OS portability fix in
+ `H2Codec.candidate_paths`, which was probing a stale `.dylib` on Linux
+ and silently falling through to pure-Ruby on the bench host; and
+ `bench/h2_rails_shape.sh` race-fixed (boot-log probe + stderr routing).
+ Full bench tables and flip-decision rationale in
+ [`CHANGELOG.md`](CHANGELOG.md).
+
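A hedged config sketch tying the 2.11 knobs together — the variable names and the quoted values (`v2`, `ruby`, `off`, the default-4 / ceiling-16 pool bounds) come from the notes above; the clamp line is illustrative, not Hyperion's parsing code:

```ruby
# Escape valves from 2.11 (set before boot; values as documented above).
ENV['HYPERION_H2_DISPATCH_POOL'] = '8'    # 2.11-A fiber pool size (default 4, ceiling 16)
ENV['HYPERION_H2_NATIVE_HPACK']  = 'v2'   # 2.11-B: force the pre-CGlue native path
                                          # ('ruby' / 'off' = pure-Ruby; unset = CGlue default)

# Illustrative clamp mirroring the documented default-4 / ceiling-16 bounds.
pool = ENV.fetch('HYPERION_H2_DISPATCH_POOL', '4').to_i.clamp(1, 16)
```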
  ## What's new in 2.10.1
 
  **Static-asset operator surface (2.10-E) + C-ext fast-path response
@@ -123,6 +215,38 @@ container required. HTTP/1.1 only this release; WS-over-HTTP/2 (RFC 8441
  Extended CONNECT) and permessage-deflate (RFC 7692) defer to 2.2.x.
  See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).
 
+ ## gRPC on Hyperion (2.12-F+)
+
+ Hyperion's HTTP/2 path supports gRPC unary calls via the Rack 3 trailers
+ contract: any response body that responds to `#trailers` gets a final
+ HEADERS frame (with END_STREAM=1) carrying the trailer map after the
+ DATA frames. That's the wire shape gRPC clients expect for the
+ `grpc-status` / `grpc-message` map.
+
+ A minimal Rack-shaped gRPC handler:
+
+ ```ruby
+ class GrpcBody
+   def initialize(reply); @reply = reply; end
+   def each; yield @reply; end
+   def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
+   def close; end
+ end
+
+ run ->(env) {
+   request = env['rack.input'].read   # gRPC-framed protobuf bytes
+   reply   = handle(request)          # your service implementation
+   [200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply)]
+ }
+ ```
+
+ What Hyperion handles for you: ALPN negotiation, HTTP/2 framing, HPACK,
+ per-stream flow control, trailer-frame emission, a binary-clean
+ `env['rack.input']` (gRPC bodies are non-UTF-8), and `te: trailers`
+ preserved into `env['HTTP_TE']`. What you handle: protobuf marshalling
+ and the `grpc-status` semantics. Streaming RPCs (server / client /
+ bidi) are 2.13 candidates — pin to unary for now.
+
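The handler above reads "gRPC-framed protobuf bytes" from `rack.input`. That framing is the standard gRPC length-prefixed message format (a 1-byte compressed flag, a 4-byte big-endian length, then the payload). These helpers are an illustrative sketch of the unframing your service code would do — they are not part of Hyperion's API:

```ruby
# gRPC length-prefixed message framing: flag(1) + length(4, big-endian) + payload.
def grpc_frame(payload)
  [0, payload.bytesize].pack('CN') + payload     # flag 0 = uncompressed
end

def grpc_unframe(bytes)
  compressed, len = bytes.unpack('CN')
  raise 'compressed frames need grpc-encoding handling' if compressed == 1
  bytes.byteslice(5, len)                        # payload starts after the 5-byte prefix
end
```

Round-trip: `grpc_unframe(grpc_frame(req_bytes))` recovers the raw protobuf message, which is what you would feed to your generated message class's decode.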
 
  ## Highlights
 
  - **HTTP/1.1 + HTTP/2 + TLS** out of the box (HTTP/2 with per-stream fiber multiplexing, WINDOW_UPDATE-aware flow control, ALPN auto-negotiation).
@@ -141,11 +265,17 @@ See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).
  All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version + bench host it was measured against — bench-host drift over time is real (see "Bench-host drift" note below).
 
  **Headline doc**: the most recent comprehensive sweep is
- [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md) (Hyperion
- 2.0.0 vs Puma 8.0.1, 16-vCPU Ubuntu 24.04, 12 workloads). The 1.6.0
- matrix at [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md) covers
- 9 workloads × 25+ configs against hyperion-async-pg 0.5.0; both docs
- include caveats and per-row reproduction commands.
+ [`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) — the
+ 2.12-B 4-way re-bench (Hyperion 2.11.0 vs Puma 8.0.1 / Falcon 0.55.3 /
+ Agoo 2.15.14, 16-vCPU Ubuntu 24.04, 6 workloads). It's the
+ post-2.10/2.11-wins re-baseline of the four-server matrix that
+ originally shipped in [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md)
+ § "4-way head-to-head (2.10-B baseline)" — the older doc is the
+ **historical baseline (pre-2.10/2.11 wins)** and is preserved
+ unchanged for archaeology. The 1.6.0 matrix at
+ [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md) covers 9
+ workloads × 25+ configs against hyperion-async-pg 0.5.0; all three
+ docs include caveats and per-row reproduction commands.
 
  > **Bench-host drift note (2026-05-01).** A spot-check rerun on
  > `openclaw-vm` 5 days after the 2.0.0 sweep showed Puma 8.0.1 and
ext/hyperion_http/extconf.rb CHANGED
@@ -17,6 +17,7 @@ $srcs = %w[
  parser.c
  sendfile.c
  page_cache.c
+ io_uring_loop.c
  websocket.c
  h2_codec_glue.c
  llhttp.c
@@ -44,4 +45,44 @@ have_header('sys/socket.h')
  have_header('dlfcn.h')
  have_library('dl', 'dlopen')
 
+ # 2.12-D — io_uring accept loop (Linux 5.x+).
+ #
+ # Soft-optional dependency: if `liburing` is installed at compile time
+ # (Ubuntu/Debian: `apt install liburing-dev`; Fedora: `dnf install
+ # liburing-devel`; Arch: `pacman -S liburing`), we build the io_uring
+ # accept-loop variant. If it's not, the C ext compiles cleanly without
+ # it and the Ruby caller falls through to the 2.12-C `accept4` loop.
+ #
+ # We probe in two passes:
+ #   1. `pkg-config --exists liburing` picks up Debian/Ubuntu's
+ #      pkg-config metadata and adds the right -L/-l flags. Quiet
+ #      failure is fine — pass 2 still runs and catches header-only
+ #      setups (vendored installs, distros without pkg-config metadata).
+ #   2. `have_header('liburing.h')` + `have_library('uring', ...)`
+ #      verifies the header and symbol are actually usable before we
+ #      commit to the define.
+ #
+ # On success: `-DHAVE_LIBURING` lands in $defs (mkmf-managed) and
+ # `io_uring_loop.c` compiles its real loop. On failure: the file
+ # compiles to a thin stub that returns `:unavailable`.
+ #
+ # Linux-only — the loop is `#ifdef __linux__` guarded too, so a
+ # liburing-on-FreeBSD setup (technically possible) still picks the
+ # stub. The cost is worth it: portability and zero surprises on the
+ # bench host.
+ if RbConfig::CONFIG['target_os'] =~ /linux/
+   pkg_ok = system('pkg-config --exists liburing 2>/dev/null')
+   if pkg_ok
+     $CFLAGS << ' ' + `pkg-config --cflags liburing`.strip
+     $LDFLAGS << ' ' + `pkg-config --libs liburing`.strip
+   end
+   if have_header('liburing.h') && have_library('uring', 'io_uring_queue_init')
+     $defs << '-DHAVE_LIBURING'
+     puts "[hyperion] liburing detected via #{pkg_ok ? 'pkg-config' : 'header probe'} — " \
+          'building 2.12-D io_uring accept loop'
+   else
+     puts '[hyperion] liburing not found — 2.12-D io_uring accept loop will return :unavailable; ' \
+          'install `liburing-dev` (Debian/Ubuntu) / `liburing-devel` (Fedora) for the io_uring path'
+   end
+ end
+
 
  create_makefile('hyperion_http/hyperion_http')
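The stub-vs-real split implies a simple runtime fallback shape on the Ruby side. A hypothetical model — only the `:unavailable` sentinel and the fall-through to the 2.12-C `accept4` loop are documented above; the method names and the stub object here are illustrative, not Hyperion's API:

```ruby
require 'ostruct'

# Hypothetical caller: try the io_uring loop first; a build without
# liburing compiled the stub, which returns :unavailable, so fall
# through to the 2.12-C accept4 loop.
def run_accept_loop(ext)
  result = ext.run_io_uring_accept_loop
  return result unless result == :unavailable
  ext.run_static_accept_loop   # 2.12-C accept4 path
end

# Stand-in for an extension built WITHOUT liburing.
stub = OpenStruct.new(run_io_uring_accept_loop: :unavailable,
                      run_static_accept_loop:   :accept4_loop)
run_accept_loop(stub)  # => :accept4_loop
```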