hyperion-rb 2.11.0 → 2.13.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +1079 -0
- data/README.md +220 -5
- data/ext/hyperion_http/extconf.rb +41 -0
- data/ext/hyperion_http/io_uring_loop.c +710 -0
- data/ext/hyperion_http/page_cache.c +1032 -0
- data/ext/hyperion_http/page_cache_internal.h +132 -0
- data/ext/hyperion_http/parser.c +382 -51
- data/lib/hyperion/adapter/rack.rb +18 -4
- data/lib/hyperion/connection.rb +78 -3
- data/lib/hyperion/dispatch_mode.rb +19 -1
- data/lib/hyperion/http2_handler.rb +458 -13
- data/lib/hyperion/metrics.rb +212 -38
- data/lib/hyperion/prometheus_exporter.rb +76 -1
- data/lib/hyperion/server/connection_loop.rb +159 -0
- data/lib/hyperion/server.rb +183 -0
- data/lib/hyperion/thread_pool.rb +23 -7
- data/lib/hyperion/version.rb +1 -1
- metadata +4 -1
data/README.md
CHANGED
|
@@ -11,6 +11,114 @@ gem install hyperion-rb
|
|
|
11
11
|
bundle exec hyperion config.ru
|
|
12
12
|
```
|
|
13
13
|
|
|
14
|
+
## What's new in 2.13.0
|
|
15
|
+
|
|
16
|
+
**Hardening sprint: profile-driven CPU work, durable infrastructure,
|
|
17
|
+
and gRPC streaming.** 2.13 follows up the 2.12 perf jump with the
|
|
18
|
+
work that wasn't structural enough to need its own major release —
|
|
19
|
+
each stream is small but adds up:
|
|
20
|
+
|
|
21
|
+
- **2.13-A — Generic Rack hot-path wins.** Per-thread shards for
|
|
22
|
+
hot-path metrics (no more cross-worker mutex on `observe_histogram`),
|
|
23
|
+
cached `worker_id` label tuple, and a Rack-3 keepalive fast-path.
|
|
24
|
+
Generic Rack hello bench: **+3.6% on `-c100` no-log**. Honest
|
|
25
|
+
finding: env-pool + body-coalesce already shipped; the deeper
|
|
26
|
+
generic-Rack gap to Agoo is single-thread Ruby ceiling that closing
|
|
27
|
+
needs moving `app.call` into the C accept loop — a 2.14 lift.
|
|
28
|
+
- **2.13-B — Response head builder rewritten in C.** Pre-baked
|
|
29
|
+
status-line table, `rb_hash_foreach` replacing `rb_funcall(:keys)`,
|
|
30
|
+
per-key downcase + per-(key, value) full-line caches, custom `itoa`
|
|
31
|
+
replacing `snprintf`. **+7.7% single-thread synthetic; multi-thread
|
|
32
|
+
neutral (GVL-bound).** Profile confirms Hyperion's own C-ext code is
|
|
33
|
+
**<1%** of wall-clock; the rest is libruby + JSON gem.
|
|
34
|
+
- **2.13-C — Spec flake hunt.** Two long-standing flakes fixed:
|
|
35
|
+
`tls_ktls_spec` macOS skip leak (unconditional `RUBY_PLATFORM`
|
|
36
|
+
guard), and `connection_loop_spec:79` Linux port-bind flake (root
|
|
37
|
+
cause: Linux `close()` doesn't wake a parked `accept(2)` — fixed
|
|
38
|
+
with a `stop_loop_and_wake` helper). 5/5 → 0/10 failure rate;
|
|
39
|
+
spec-suite runtime 46 s → 1.3 s.
|
|
40
|
+
- **2.13-D — gRPC streaming RPCs.** Server-streaming, client-streaming,
|
|
41
|
+
and bidirectional RPCs on top of 2.12-F's unary trailers foundation.
|
|
42
|
+
New `bench/grpc_stream.{proto,ru}` + `grpc_stream_bench.sh` ghz
|
|
43
|
+
harness for operator-side comparison vs Falcon's `async-grpc`.
|
|
44
|
+
- **2.13-E — io_uring soak harness + CI smoke.** New
|
|
45
|
+
`bench/io_uring_soak.sh` runs a 24h soak against the 2.12-D
|
|
46
|
+
io_uring loop with `/proc/$PID` + `/-/metrics` sampling, emits a
|
|
47
|
+
CSV + verdict (PASS / SOAK FAIL / borderline). New
|
|
48
|
+
`spec/hyperion/io_uring_soak_smoke_spec.rb` runs a 1000-request
|
|
49
|
+
mini-soak in CI to catch leak regressions before any 24h run.
|
|
50
|
+
**`HYPERION_IO_URING_ACCEPT` stays opt-in for 2.13** — operators
|
|
51
|
+
with their own staging environments can now collect signal; the
|
|
52
|
+
default-flip decision moves to 2.14.
|
|
53
|
+
|
|
54
|
+
Plus all previous wins are preserved and verified by the 1183-spec
|
|
55
|
+
suite (2.10-G TCP_NODELAY at accept, 2.10-E preload hooks, 2.10-F
|
|
56
|
+
C-ext fast-path response writer, 2.11-A dispatch pool warmup, 2.11-B
|
|
57
|
+
cglue HPACK default, 2.12-C accept4 connection loop, 2.12-D io_uring
|
|
58
|
+
loop, 2.12-E per-worker request counter, 2.12-F gRPC unary trailers).
|
|
59
|
+
|
|
60
|
+
Full per-stream details, bench tables, and follow-up items in
|
|
61
|
+
[`CHANGELOG.md`](CHANGELOG.md).
|
|
62
|
+
|
|
63
|
+
## What's new in 2.12.0
|
|
64
|
+
|
|
65
|
+
**The hot path moves into C — and gRPC ships.** The headline win:
|
|
66
|
+
`Server.handle_static` routes now serve from a C accept→read→route→write
|
|
67
|
+
loop with optional **io_uring** (Linux 5.x+) backing it. The `wrk -t4
|
|
68
|
+
-c100 -d20s` hello bench moved from **5,502 r/s** (2.11.0
|
|
69
|
+
`Server.handle_static` via Ruby accept loop) to **15,685 r/s** (2.12-C
|
|
70
|
+
C accept4 loop) to **134,084 r/s** (2.12-D io_uring loop) — that's
|
|
71
|
+
**24× over 2.11.0's `handle_static` and 7× over Agoo 2.15.14's
|
|
72
|
+
19,024 r/s** on the same workload. p99 stays sub-millisecond
|
|
73
|
+
throughout. Plus durable foundation work and one big new feature:
|
|
74
|
+
|
|
75
|
+
- **2.12-B — Fresh 4-way re-bench.** New
|
|
76
|
+
[`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) re-runs
|
|
77
|
+
Hyperion / Puma / Falcon / Agoo on the 6 workloads with all 2.10/2.11
|
|
78
|
+
wins enabled. Headline shifts: static 1 KB Hyperion `handle_static`
|
|
79
|
+
flipped from 1.89× behind Agoo to **+127% ahead**; CPU JSON gap
|
|
80
|
+
widened (the one row 2.10/2.11 didn't touch — flagged for follow-up).
|
|
81
|
+
- **2.12-C — Connection lifecycle in C.** New
|
|
82
|
+
`Hyperion::Http::PageCache.run_static_accept_loop` does
|
|
83
|
+
`accept4` + `recv` + path lookup + `write` entirely in a C tight
|
|
84
|
+
loop, returning to Ruby only on a route miss / TLS / h2 / WebSocket
|
|
85
|
+
upgrade. GVL released across syscalls. Auto-engages when the listener
|
|
86
|
+
is plain TCP and the route table contains only `StaticEntry`
|
|
87
|
+
registrations. **5,502 → 15,685 r/s (+185%, 2.85×) on `handle_static`
|
|
88
|
+
hello; p99 1.59 ms → 107 µs (15× tighter).** Falls through to the
|
|
89
|
+
existing Ruby accept loop on miss with no regression.
|
|
90
|
+
- **2.12-D — io_uring accept loop (Linux 5.x+).** A multishot accept +
|
|
91
|
+
per-conn RECV/WRITE/CLOSE state machine on top of liburing. One
|
|
92
|
+
`io_uring_enter` per N requests instead of N×3 syscalls. Opt-in via
|
|
93
|
+
`HYPERION_IO_URING_ACCEPT=1` (default off until 2.13 production
|
|
94
|
+
soak). **15,685 → 134,084 r/s (+755%, 8.6×) on the same bench.**
|
|
95
|
+
Compiles out cleanly without liburing — the `accept4` path stays
|
|
96
|
+
the fallback. macOS keeps using `accept4` (no liburing).
|
|
97
|
+
- **2.12-E — SO_REUSEPORT cluster-mode audit.** New per-worker request
|
|
98
|
+
metric (`requests_dispatch_total{worker_id="N"}`) ticks under every
|
|
99
|
+
dispatch mode (Rack, `handle_static`, h2, the C accept loops). New
|
|
100
|
+
audit harness `bench/cluster_distribution.sh` and a 4-worker, 30s
|
|
101
|
+
sustained-load bench: under steady state the SO_REUSEPORT hash
|
|
102
|
+
distributes within **1.004-1.011 max/min ratio** — production-grade,
|
|
103
|
+
measured. The cold-start swing (1.16× during the first second of
|
|
104
|
+
fresh boot) is documented as expected `SO_REUSEPORT + keep-alive`
|
|
105
|
+
behavior and matches what production L4 LBs already exhibit.
|
|
106
|
+
- **2.12-F — gRPC support on h2.** Trailers (the `grpc-status` /
|
|
107
|
+
`grpc-message` final HEADERS frame), `TE: trailers` handling, h2
|
|
108
|
+
request half-close semantics. Rack 3 contract: a Rack body that
|
|
109
|
+
defines `#trailers` triggers the trailers wire shape automatically;
|
|
110
|
+
bodies that don't are byte-identical to 2.11.x h2. Smoke test against
|
|
111
|
+
the real `grpc` Ruby gem ships gated by `RUN_GRPC_SMOKE=1`; the
|
|
112
|
+
durable coverage is 11 unit specs driving real `protocol-http2`
|
|
113
|
+
framer + HPACK encode/decode + TLS.
|
|
114
|
+
|
|
115
|
+
The 2.10-G TCP_NODELAY hunk, 2.10-E preload hooks, 2.10-F C-ext
|
|
116
|
+
`rb_pc_serve_request`, 2.11-A dispatch pool warmup, and 2.11-B cglue
|
|
117
|
+
HPACK default all preserved and verified by the 1143-spec suite.
|
|
118
|
+
|
|
119
|
+
Full per-stream details, bench tables, and follow-up items in
|
|
120
|
+
[`CHANGELOG.md`](CHANGELOG.md).
|
|
121
|
+
|
|
14
122
|
## What's new in 2.11.0
|
|
15
123
|
|
|
16
124
|
**h2 cold-stream latency cut + native HPACK CGlue flipped to default.**
|
|
@@ -156,6 +264,107 @@ container required. HTTP/1.1 only this release; WS-over-HTTP/2 (RFC 8441
|
|
|
156
264
|
Extended CONNECT) and permessage-deflate (RFC 7692) defer to 2.2.x.
|
|
157
265
|
See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).
|
|
158
266
|
|
|
267
|
+
## gRPC on Hyperion (2.12-F+)
|
|
268
|
+
|
|
269
|
+
Hyperion's HTTP/2 path supports gRPC unary calls via the Rack 3 trailers
|
|
270
|
+
contract: any response body that exposes `:trailers` gets a final
|
|
271
|
+
HEADERS frame (with END_STREAM=1) carrying the trailer map after the
|
|
272
|
+
DATA frames. That's the wire shape gRPC clients expect for the
|
|
273
|
+
`grpc-status` / `grpc-message` map.
|
|
274
|
+
|
|
275
|
+
A minimal Rack-shaped gRPC handler:
|
|
276
|
+
|
|
277
|
+
```ruby
|
|
278
|
+
class GrpcBody
|
|
279
|
+
def initialize(reply); @reply = reply; end
|
|
280
|
+
def each; yield @reply; end
|
|
281
|
+
def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
|
|
282
|
+
def close; end
|
|
283
|
+
end
|
|
284
|
+
|
|
285
|
+
run ->(env) {
|
|
286
|
+
request = env['rack.input'].read # gRPC-framed protobuf bytes
|
|
287
|
+
reply = handle(request) # your service implementation
|
|
288
|
+
[200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply)]
|
|
289
|
+
}
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
What Hyperion handles for you: ALPN negotiation, HTTP/2 framing, HPACK,
|
|
293
|
+
per-stream flow control, the trailer-frame emit, binary-clean
|
|
294
|
+
`env['rack.input']` (gRPC bodies are non-UTF-8), and `te: trailers`
|
|
295
|
+
preserved into `env['HTTP_TE']`. What you handle: protobuf
|
|
296
|
+
marshalling and the `grpc-status` semantics.
|
|
297
|
+
|
|
298
|
+
### Streaming RPCs (2.13-D+)
|
|
299
|
+
|
|
300
|
+
All four gRPC call shapes work on Hyperion since 2.13-D — unary,
|
|
301
|
+
server-streaming, client-streaming, and bidirectional. The detection
|
|
302
|
+
trigger is the gRPC content-type plus `te: trailers`; any HTTP/2
|
|
303
|
+
request that carries both is dispatched to the Rack app on HEADERS
|
|
304
|
+
arrival (rather than after END_STREAM), and `env['rack.input']`
|
|
305
|
+
becomes a streaming IO that blocks reads until the next DATA frame
|
|
306
|
+
lands. Plain HTTP/2 traffic (without those headers) keeps the unary
|
|
307
|
+
buffered semantics — no behaviour change for non-gRPC clients.
|
|
308
|
+
|
|
309
|
+
**Server-streaming.** Yield one gRPC-framed message per `each`
|
|
310
|
+
iteration; Hyperion writes each yield as its own DATA frame:
|
|
311
|
+
|
|
312
|
+
```ruby
|
|
313
|
+
class StreamReply
|
|
314
|
+
def initialize(messages); @messages = messages; end
|
|
315
|
+
def each; @messages.each { |m| yield m }; end # one DATA frame each
|
|
316
|
+
def trailers; { 'grpc-status' => '0' }; end
|
|
317
|
+
def close; end
|
|
318
|
+
end
|
|
319
|
+
|
|
320
|
+
run ->(env) {
|
|
321
|
+
env['rack.input'].read # the unary request message
|
|
322
|
+
[200, { 'content-type' => 'application/grpc' }, StreamReply.new(messages)]
|
|
323
|
+
}
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
**Client-streaming.** Read messages off `env['rack.input']` as the peer
|
|
327
|
+
sends them. Reads block until a DATA frame arrives:
|
|
328
|
+
|
|
329
|
+
```ruby
|
|
330
|
+
run ->(env) {
|
|
331
|
+
io = env['rack.input']
|
|
332
|
+
count = 0
|
|
333
|
+
while (prefix = io.read(5)) && prefix.bytesize == 5
|
|
334
|
+
length = prefix.byteslice(1, 4).unpack1('N')
|
|
335
|
+
msg = io.read(length)
|
|
336
|
+
count += 1
|
|
337
|
+
end
|
|
338
|
+
[200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply_for(count))]
|
|
339
|
+
}
|
|
340
|
+
```
|
|
341
|
+
|
|
342
|
+
**Bidirectional.** Interleave reads and writes. The response body is
|
|
343
|
+
iterated lazily, so you can read one request message, yield one reply,
|
|
344
|
+
read the next, yield the next:
|
|
345
|
+
|
|
346
|
+
```ruby
|
|
347
|
+
class BidiReplies
|
|
348
|
+
def initialize(io); @io = io; end
|
|
349
|
+
def each
|
|
350
|
+
while (prefix = @io.read(5)) && prefix.bytesize == 5
|
|
351
|
+
len = prefix.byteslice(1, 4).unpack1('N')
|
|
352
|
+
msg = @io.read(len)
|
|
353
|
+
yield grpc_frame(handle(msg)) # sent immediately
|
|
354
|
+
end
|
|
355
|
+
end
|
|
356
|
+
def trailers; { 'grpc-status' => '0' }; end
|
|
357
|
+
def close; end
|
|
358
|
+
end
|
|
359
|
+
|
|
360
|
+
run ->(env) {
|
|
361
|
+
[200, { 'content-type' => 'application/grpc' }, BidiReplies.new(env['rack.input'])]
|
|
362
|
+
}
|
|
363
|
+
```
|
|
364
|
+
|
|
365
|
+
The streaming-input path runs each stream on its own fiber, so
|
|
366
|
+
concurrent read+write on the same stream is safe.
|
|
367
|
+
|
|
159
368
|
## Highlights
|
|
160
369
|
|
|
161
370
|
- **HTTP/1.1 + HTTP/2 + TLS** out of the box (HTTP/2 with per-stream fiber multiplexing, WINDOW_UPDATE-aware flow control, ALPN auto-negotiation).
|
|
@@ -174,11 +383,17 @@ See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).
|
|
|
174
383
|
All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version + bench host it was measured against — bench-host drift over time is real (see "Bench-host drift" note below).
|
|
175
384
|
|
|
176
385
|
**Headline doc**: the most recent comprehensive sweep is
|
|
177
|
-
[`docs/
|
|
178
|
-
2.
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
386
|
+
[`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) — the
|
|
387
|
+
2.12-B 4-way re-bench (Hyperion 2.11.0 vs Puma 8.0.1 / Falcon 0.55.3 /
|
|
388
|
+
Agoo 2.15.14, 16-vCPU Ubuntu 24.04, 6 workloads). It's the post-
|
|
389
|
+
2.10/2.11-wins re-baseline of the four-server matrix that originally
|
|
390
|
+
shipped in [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md)
|
|
391
|
+
§ "4-way head-to-head (2.10-B baseline)" — the older doc is the
|
|
392
|
+
**historical baseline (pre-2.10/2.11 wins)** and is preserved
|
|
393
|
+
unchanged for archaeology. The 1.6.0 matrix at
|
|
394
|
+
[`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md) covers 9
|
|
395
|
+
workloads × 25+ configs against hyperion-async-pg 0.5.0; all three
|
|
396
|
+
docs include caveats and per-row reproduction commands.
|
|
182
397
|
|
|
183
398
|
> **Bench-host drift note (2026-05-01).** A spot-check rerun on
|
|
184
399
|
> `openclaw-vm` 5 days after the 2.0.0 sweep showed Puma 8.0.1 and
|
|
@@ -17,6 +17,7 @@ $srcs = %w[
|
|
|
17
17
|
parser.c
|
|
18
18
|
sendfile.c
|
|
19
19
|
page_cache.c
|
|
20
|
+
io_uring_loop.c
|
|
20
21
|
websocket.c
|
|
21
22
|
h2_codec_glue.c
|
|
22
23
|
llhttp.c
|
|
@@ -44,4 +45,44 @@ have_header('sys/socket.h')
|
|
|
44
45
|
have_header('dlfcn.h')
|
|
45
46
|
have_library('dl', 'dlopen')
|
|
46
47
|
|
|
48
|
+
# 2.12-D — io_uring accept loop (Linux 5.x).
|
|
49
|
+
#
|
|
50
|
+
# Soft-optional dependency: if `liburing` is installed at compile time
|
|
51
|
+
# (Ubuntu/Debian: `apt install liburing-dev`; Fedora: `dnf install
|
|
52
|
+
# liburing-devel`; Arch: `pacman -S liburing`), we build the io_uring
|
|
53
|
+
# accept-loop variant. If it's not, the C ext compiles cleanly without
|
|
54
|
+
# it and the Ruby caller falls through to the 2.12-C `accept4` loop.
|
|
55
|
+
#
|
|
56
|
+
# We probe in two passes:
|
|
57
|
+
# 1. `pkg-config --exists liburing` lets us pick up Debian/Ubuntu's
|
|
58
|
+
# pkg-config metadata and add the right -L/-l flags. Quiet failure
|
|
59
|
+
# is fine — the second pass catches header-only setups (vendored
|
|
60
|
+
# installs, distros without pkg-config metadata).
|
|
61
|
+
# 2. `have_header('liburing.h')` + `have_library('uring', ...)` covers
|
|
62
|
+
# the no-pkg-config path.
|
|
63
|
+
#
|
|
64
|
+
# On success: `-DHAVE_LIBURING` lands in $defs (mkmf-managed) and
|
|
65
|
+
# `io_uring_loop.c` compiles its real loop. On failure: the file
|
|
66
|
+
# compiles to a thin stub that returns `:unavailable`.
|
|
67
|
+
#
|
|
68
|
+
# Linux-only — the loop is `#ifdef __linux__` guarded too, so a
|
|
69
|
+
# liburing-on-FreeBSD setup (technically possible) still picks the
|
|
70
|
+
# stub. Worth-it cost: portability + zero surprise on the bench host.
|
|
71
|
+
RbConfig::CONFIG['target_os'] =~ /linux/ && begin
|
|
72
|
+
pkg_ok = system('pkg-config --exists liburing 2>/dev/null')
|
|
73
|
+
if pkg_ok
|
|
74
|
+
$CFLAGS << ' ' + `pkg-config --cflags liburing`.strip
|
|
75
|
+
$LDFLAGS << ' ' + `pkg-config --libs liburing`.strip
|
|
76
|
+
have_header('liburing.h')
|
|
77
|
+
$defs << '-DHAVE_LIBURING'
|
|
78
|
+
puts '[hyperion] liburing detected via pkg-config — building 2.12-D io_uring accept loop'
|
|
79
|
+
elsif have_header('liburing.h') && have_library('uring', 'io_uring_queue_init')
|
|
80
|
+
$defs << '-DHAVE_LIBURING'
|
|
81
|
+
puts '[hyperion] liburing detected via header probe — building 2.12-D io_uring accept loop'
|
|
82
|
+
else
|
|
83
|
+
puts '[hyperion] liburing not found — 2.12-D io_uring accept loop will return :unavailable; ' \
|
|
84
|
+
'install `liburing-dev` (Debian/Ubuntu) / `liburing-devel` (Fedora) for the io_uring path'
|
|
85
|
+
end
|
|
86
|
+
end
|
|
87
|
+
|
|
47
88
|
create_makefile('hyperion_http/hyperion_http')
|