hyperion-rb 2.12.0 → 2.14.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +1117 -0
- data/README.md +301 -674
- data/ext/hyperion_http/page_cache.c +538 -43
- data/ext/hyperion_http/parser.c +382 -51
- data/lib/hyperion/adapter/rack.rb +303 -4
- data/lib/hyperion/connection.rb +65 -4
- data/lib/hyperion/http2_handler.rb +348 -21
- data/lib/hyperion/metrics.rb +174 -38
- data/lib/hyperion/server/connection_loop.rb +104 -6
- data/lib/hyperion/server/route_table.rb +64 -0
- data/lib/hyperion/server.rb +202 -2
- data/lib/hyperion/version.rb +1 -1
- metadata +1 -1
data/README.md
CHANGED
@@ -1,589 +1,218 @@
 # Hyperion

-High-performance Ruby HTTP server.
+High-performance Ruby HTTP server. Rack 3 + HTTP/2 + WebSockets + gRPC on a single binary.

 [](https://github.com/andrew-woblavobla/hyperion/actions/workflows/ci.yml)
 [](https://rubygems.org/gems/hyperion-rb)
 [](https://github.com/andrew-woblavobla/hyperion/blob/master/LICENSE)

+Hyperion serves a hello-world Rack response at **134,084 r/s with a 1.14 ms p99**
+on a single worker (Linux 6.x, io_uring accept loop, `Server.handle_static`),
+**7×** Agoo's 19,024 r/s on the same hardware. Beyond the C-side fast path
+it's a complete Rack 3 server: HTTP/1.1 + HTTP/2 with ALPN, WebSockets
+(RFC 6455), gRPC unary + streaming on the Rack 3 trailers contract, native
+fiber concurrency for PG-bound apps, and pre-fork cluster mode with
+SO_REUSEPORT-balanced workers.
+
 ```sh
 gem install hyperion-rb
-bundle exec hyperion config.ru
+bundle exec hyperion config.ru # http://127.0.0.1:9292
 ```

-##
-
-**The hot path moves into C — and gRPC ships.** The headline win:
-`Server.handle_static` routes now serve from a C accept→read→route→write
-loop with optional **io_uring** (Linux 5.x+) backing it. The `wrk -t4
--c100 -d20s` hello bench moved from **5,502 r/s** (2.11.0
-`Server.handle_static` via Ruby accept loop) to **15,685 r/s** (2.12-C
-C accept4 loop) to **134,084 r/s** (2.12-D io_uring loop) — that's
-**24× over 2.11.0's `handle_static` and 7× over Agoo 2.15.14's
-19,024 r/s** on the same workload. p99 stays sub-millisecond
-throughout. Plus durable foundation work and one big new feature:
-
-- **2.12-B — Fresh 4-way re-bench.** New
-[`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) re-runs
-Hyperion / Puma / Falcon / Agoo on the 6 workloads with all 2.10/2.11
-wins enabled. Headline shifts: static 1 KB Hyperion `handle_static`
-flipped from 1.89× behind Agoo to **+127% ahead**; CPU JSON gap
-widened (the one row 2.10/2.11 didn't touch — flagged for follow-up).
-- **2.12-C — Connection lifecycle in C.** New
-`Hyperion::Http::PageCache.run_static_accept_loop` does
-`accept4` + `recv` + path lookup + `write` entirely in a C tight
-loop, returning to Ruby only on a route miss / TLS / h2 / WebSocket
-upgrade. GVL released across syscalls. Auto-engages when the listener
-is plain TCP and the route table contains only `StaticEntry`
-registrations. **5,502 → 15,685 r/s (+185%, 2.85×) on `handle_static`
-hello; p99 1.59 ms → 107 µs (15× tighter).** Falls through to the
-existing Ruby accept loop on miss with no regression.
-- **2.12-D — io_uring accept loop (Linux 5.x+).** A multishot accept +
-per-conn RECV/WRITE/CLOSE state machine on top of liburing. One
-`io_uring_enter` per N requests instead of N×3 syscalls. Opt-in via
-`HYPERION_IO_URING_ACCEPT=1` (default off until 2.13 production
-soak). **15,685 → 134,084 r/s (+755%, 8.6×) on the same bench.**
-Compiles out cleanly without liburing — the `accept4` path stays
-the fallback. macOS keeps using `accept4` (no liburing).
-- **2.12-E — SO_REUSEPORT cluster-mode audit.** New per-worker request
-metric (`requests_dispatch_total{worker_id="N"}`) ticks under every
-dispatch mode (Rack, `handle_static`, h2, the C accept loops). New
-audit harness `bench/cluster_distribution.sh` and a 4-worker, 30s
-sustained-load bench: under steady state the SO_REUSEPORT hash
-distributes within **1.004-1.011 max/min ratio** — production-grade,
-measured. The cold-start swing (1.16× during the first second of
-fresh boot) is documented as expected `SO_REUSEPORT + keep-alive`
-behavior and matches what production L4 LBs already exhibit.
-- **2.12-F — gRPC support on h2.** Trailers (the `grpc-status` /
-`grpc-message` final HEADERS frame), `TE: trailers` handling, h2
-request half-close semantics. Rack 3 contract: a Rack body that
-defines `#trailers` triggers the trailers wire shape automatically;
-bodies that don't are byte-identical to 2.11.x h2. Smoke test against
-the real `grpc` Ruby gem ships gated by `RUN_GRPC_SMOKE=1`; the
-durable coverage is 11 unit specs driving real `protocol-http2`
-framer + HPACK encode/decode + TLS.
-
-The 2.10-G TCP_NODELAY hunk, 2.10-E preload hooks, 2.10-F C-ext
-`rb_pc_serve_request`, 2.11-A dispatch pool warmup, and 2.11-B cglue
-HPACK default all preserved and verified by the 1143-spec suite.
-
-Full per-stream details, bench tables, and follow-up items in
-[`CHANGELOG.md`](CHANGELOG.md).
-
-## What's new in 2.11.0
-
-**h2 cold-stream latency cut + native HPACK CGlue flipped to default.**
-Two perf wins on top of 2.10:
-
-- **2.11-A — h2 first-stream TLS handshake parallelization.** The
-2.10-G `HYPERION_H2_TIMING=1` instrumentation, run against the
-TCP_NODELAY-fixed handler, isolated the residual cold-stream cost
-to **bucket 2**: lazy `task.async {}` fiber spawn for the first
-stream of every connection. Fix: pre-spawn a stream-dispatch fiber
-pool at connection accept (configurable via `HYPERION_H2_DISPATCH_POOL`,
-default 4, ceiling 16). h2load `-c 1 -m 1 -n 50` cold first-run:
-**time-to-1st-byte 20.28 → 9.28 ms (−54%); m=100 throughput +5.5%**.
-Warm steady-state unchanged (no head-of-line blocking under the small
-pool — backlog still spills to ad-hoc `task.async`).
-- **2.11-B — HPACK FFI marshalling round-2 (CGlue flipped to default).**
-Three-way bench (`bench/h2_rails_shape.sh` extended): `ruby` (1,585
-r/s) vs `native v2` (1,602 r/s, +1% — noise) vs `native v3 / CGlue`
-(**2,291 r/s, +43% over v2**). The +18-44% native-vs-Ruby headline
-was almost entirely Fiddle marshalling overhead, not the underlying
-Rust HPACK encoder — same encoder, no per-call FFI marshalling, +43%
-rps. Default flipped: unset `HYPERION_H2_NATIVE_HPACK` now selects
-CGlue. Three escape valves stay (`=v2` to force the old path, `=ruby`
-/ `=off` for the pure-Ruby fallback) for any operator that needs
-them. Boot log gains a `native_mode` field documenting which path is
-actually live.
-
-Plus operator infrastructure: a stale-`.dylib`-on-Linux cross-platform
-host-OS portability fix in `H2Codec.candidate_paths` (was silently
-falling through to pure-Ruby on the bench host); `bench/h2_rails_shape.sh`
-race-fixed (boot-log probe + stderr routing). Full bench tables and
-flip-decision rationale in [`CHANGELOG.md`](CHANGELOG.md).
-
-## What's new in 2.10.1
-
-**Static-asset operator surface (2.10-E) + C-ext fast-path response
-writer (2.10-F).** Two follow-on streams to 2.10's static / direct-route
-work:
-
-- **2.10-E — Static asset preload + immutable flag.** Boot-time hook
-warms `Hyperion::Http::PageCache` over a tree of files and marks
-every cached entry immutable. Surface: `--preload-static <dir>` (and
-`--no-preload-static`) CLI flags, `preload_static "/path", immutable:
-true` config DSL key, and zero-config Rails auto-detect that pulls
-`Rails.configuration.assets.paths.first(8)` when present. Hyperion
-never `require`s Rails — purely defensive `defined?(::Rails)`
-probing keeps the generic Rack server path clean. **Operator value:
-predictable first-request latency** (the asset is in cache before
-the first request arrives) and the `recheck_seconds` mtime poll is
-skipped on immutable entries. Sustained-load throughput on the
-static-1-KB bench did *not* move (cold 1,929 r/s vs warm 1,886 r/s,
-inside trial noise) because `ResponseWriter` already auto-caches
-Rack::Files responses on the first hit; preload moves that one
-`cache_file` call from request 1 to boot.
-- **2.10-F — C-ext fast-path response writer for prebuilt responses.**
-`Server.handle_static`-routed requests now serve from a single
-C function (`rb_pc_serve_request` in `ext/hyperion_http/page_cache.c`)
-that does route lookup → header build → `write()` syscall without
-re-entering Ruby on the response side. GVL is released across the
-`write()` so slow clients no longer block other Ruby work on the
-same VM. Automatic HEAD support (HTTP-mandated) lights up on every
-GET registered via `handle_static` — same buffer, body stripped.
-Bench (3-trial median, `wrk -t4 -c100 -d20s`): **5,768 r/s vs
-2.10-D's 5,619 r/s (+2.6% — inside noise) and p99 1.93 → 1.67 ms
-(−14% — outside noise, reproducible).** The throughput needle didn't
-move because the per-connection lifecycle (accept4 + clone3 + futex
-on GVL handoff) dominates at 100 concurrent connections; 2.10-F
-shrinks the response phase, but the response phase isn't the
-bottleneck on this profile. Durable infrastructure for 2.11+ when
-the accept-loop work closes.
-
-Full per-stream details and bench tables in
-[`CHANGELOG.md`](CHANGELOG.md).
-
-## What's new in 2.10.0
-
-**4-way bench harness, page cache, direct routes, and the h2 40 ms
-ceiling killed.** This sprint widens the comparison matrix to all four
-major Ruby web servers (Hyperion + Puma + Falcon + Agoo) and ships
-four substantive perf streams against that backdrop:
-
-- **2.10-A / 2.10-B — 4-way bench harness + honest baseline.**
-`bench/4way_compare.sh` runs the same 6 workloads (hello, static
-1 KB / 1 MiB, CPU JSON, PG-bound, SSE) against all four servers from
-one script. Baseline numbers committed *before* any code changes:
-Agoo wins the static-asset and JSON columns by ~2-4×, Hyperion wins
-the static 1 MiB column by 9× and the SSE column by 3.6-17×.
-- **2.10-C — `Hyperion::Http::PageCache` (pre-built static response
-cache).** Open-addressed bucket table behind a pthread mutex
-(GVL-released for writes), engages automatically on `Rack::Files`
-responses. **Static 1 KB: 1,380 → 1,880 r/s (+36%), p99 3.7 → 2.7
-ms.** Closes the Agoo gap from −47% to −28% on that column.
-- **2.10-D — `Hyperion::Server.handle` direct route registration.**
-New API for hot Rack-bypass paths (`Server.handle '/health' do …
-end`, `Server.handle_static '/robots.txt', body: '...'`). Skips Rack
-adapter + env-build for matched routes. **`hello` via
-`handle_static`: 4,408 → 5,619 r/s (+27%), p99 1.93 ms** — the
-cleanest p99 in the 4-way matrix.
-- **2.10-G — h2 max-latency ceiling at ~40 ms: fixed.** Filed by 2.9-B
-as a "first-stream cost" hypothesis, the instrumentation revealed
-it was paid by *every* h2 stream — the canonical Linux delayed-ACK
-+ Nagle interaction on small framer writes. One-line fix:
-TCP_NODELAY at accept time. **h2load `-c 1 -m 1 -n 200`: min
-40.62 → 0.54 ms (−98.7%), throughput 24 → 1,142 r/s (+47.6×).** The
-`HYPERION_H2_TIMING=1` instrumentation stays in place as durable
-diagnostic infrastructure.
-
-Full per-stream details, bench numbers, and follow-up items live in
-[`CHANGELOG.md`](CHANGELOG.md).
-
-## What's new in 2.5.0
-
-**Native HPACK ON by default + autobahn 100% conformance + request
-hooks.** The Rust HPACK encoder (added in 2.0.0, opt-in until 2.4.x)
-flips ON by default in 2.5.0 — verified **+18% rps on Rails-shape h2
-workloads** (25-header responses, the bench harness lives at
-`bench/h2_rails_shape.ru` + `bench/h2_rails_shape.sh`). RFC 6455
-WebSocket conformance hit **463/463 autobahn-testsuite cases passing**
-(2.5-A, host openclaw-vm). Request lifecycle hooks
-(`Runtime#on_request_start` / `on_request_end`) shipped in 2.5-C —
-recipes in [`docs/OBSERVABILITY.md`](docs/OBSERVABILITY.md).
-
-## What's new in 2.4.0
-
-**Production observability.** The `/-/metrics` endpoint now exposes
-per-route latency histograms, per-conn fairness rejections, WebSocket
-permessage-deflate compression ratio, kTLS active connections,
-io_uring-active workers, and ThreadPool queue depth — operators can
-finally see whether the 2.x knobs are firing and how effective they
-are. A pre-built Grafana dashboard ships at
-[`docs/grafana/hyperion-2.4-dashboard.json`](docs/grafana/hyperion-2.4-dashboard.json).
-Full metric reference + operator playbook in
-[`docs/OBSERVABILITY.md`](docs/OBSERVABILITY.md).
-
-## What's new in 2.1.0
-
-**WebSockets.** RFC 6455 over Rack 3 full hijack, native frame codec,
-per-connection wrapper with auto-pong / close handshake / UTF-8 validation /
-per-message size cap. **ActionCable on Hyperion is now a single-binary
-deployment** — one `hyperion -w 4 -t 10 config.ru` process serves HTTP,
-HTTP/2, TLS, **and** `/cable` from the same listener; no separate cable
-container required. HTTP/1.1 only this release; WS-over-HTTP/2 (RFC 8441
-Extended CONNECT) and permessage-deflate (RFC 7692) defer to 2.2.x.
-See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md).
-
-## gRPC on Hyperion (2.12-F+)
-
-Hyperion's HTTP/2 path supports gRPC unary calls via the Rack 3 trailers
-contract: any response body that exposes `:trailers` gets a final
-HEADERS frame (with END_STREAM=1) carrying the trailer map after the
-DATA frames. That's the wire shape gRPC clients expect for the
-`grpc-status` / `grpc-message` map.
-
-A minimal Rack-shaped gRPC handler:
-
-```ruby
-class GrpcBody
-  def initialize(reply); @reply = reply; end
-  def each; yield @reply; end
-  def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
-  def close; end
-end
-
-run ->(env) {
-  request = env['rack.input'].read # gRPC-framed protobuf bytes
-  reply = handle(request) # your service implementation
-  [200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply)]
-}
-```
+## Headline benchmarks

-
-
-
-
-marshalling and the `grpc-status` semantics. Streaming RPCs (server /
-client / bidi) are 2.13 candidates — pin to unary for now.
-
-## Highlights
-
-- **HTTP/1.1 + HTTP/2 + TLS** out of the box (HTTP/2 with per-stream fiber multiplexing, WINDOW_UPDATE-aware flow control, ALPN auto-negotiation).
-- **WebSockets (RFC 6455)** — full handshake, native frame codec, per-connection wrapper. ActionCable + faye-websocket work on a single-binary deploy. See [`docs/WEBSOCKETS.md`](docs/WEBSOCKETS.md). (2.1.0+, HTTP/1.1 only.)
-- **Pre-fork cluster** with per-OS worker model: `SO_REUSEPORT` on Linux, master-bind + worker-fd-share on macOS/BSD (Darwin's `SO_REUSEPORT` doesn't load-balance).
-- **Hybrid concurrency**: fiber-per-connection for I/O, OS-thread pool for `app.call(env)` — synchronous Rack handlers (Rails, ActiveRecord, anything holding a global mutex) get true OS-thread concurrency.
-- **Vendored llhttp 9.3.0** C parser; pure-Ruby fallback for non-MRI runtimes.
-- **Default-ON structured access logs** (one JSON or text line per request) with hot-path optimisations: per-thread cached timestamp, hand-rolled line builder, lock-free per-thread write buffer.
-- **12-factor logger split**: info/debug → stdout, warn/error/fatal → stderr.
-- **Ruby DSL config file** (`config/hyperion.rb`) with lifecycle hooks (`before_fork`, `on_worker_boot`, `on_worker_shutdown`).
-- **Object pooling** for the Rack `env` hash and `rack.input` IO — amortizes per-request allocations across the worker's lifetime.
-- **`Hyperion::FiberLocal`** opt-in shim for older Rails idioms that store request-scoped data via `Thread.current.thread_variable_*`.
-
-## Benchmarks
-
-All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version + bench host it was measured against — bench-host drift over time is real (see "Bench-host drift" note below).
-
-**Headline doc**: the most recent comprehensive sweep is
-[`docs/BENCH_HYPERION_2_11.md`](docs/BENCH_HYPERION_2_11.md) — the
-2.12-B 4-way re-bench (Hyperion 2.11.0 vs Puma 8.0.1 / Falcon 0.55.3 /
-Agoo 2.15.14, 16-vCPU Ubuntu 24.04, 6 workloads). It's the post-
-2.10/2.11-wins re-baseline of the four-server matrix that originally
-shipped in [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md)
-§ "4-way head-to-head (2.10-B baseline)" — the older doc is the
-**historical baseline (pre-2.10/2.11 wins)** and is preserved
-unchanged for archaeology. The 1.6.0 matrix at
-[`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md) covers 9
-workloads × 25+ configs against hyperion-async-pg 0.5.0; all three
-docs include caveats and per-row reproduction commands.
-
-> **Bench-host drift note (2026-05-01).** A spot-check rerun on
-> `openclaw-vm` 5 days after the 2.0.0 sweep showed Puma 8.0.1 and
-> Hyperion 2.0.0 baseline numbers had drifted 14-32% downward from the
-> 2026-04-29 sweep with no code changes — the bench host runs other
-> workloads in the background and is a single VM (KVM CPU). Numbers in
-> this README and BENCH docs are snapshots; expect ±10-30% absolute
-> drift between sweep dates. **The relative position (Hyperion vs Puma
-> at matched config) is the durable signal**; e.g. Hyperion `-w 16 -t 5`
-> hello-world today is 76,593 r/s vs Puma 8.0.1 `-w 16 -t 5:5` at 55,609
-> r/s, **+37.7% over Puma** — wider than the 2.0.0 sweep's +27.8% even
-> though absolute rps is lower. Reproduce: `bundle exec bin/hyperion
-> -p 9501 -w 16 -t 5 bench/hello.ru` then `wrk -t4 -c200 -d20s
-> http://127.0.0.1:9501/`.
-
-> **Topology relevance.** Hyperion is built to run **fronted by nginx
-> or an L7 load balancer** in most production deployments — plaintext
-> HTTP/1.1 upstream, TLS terminated at the LB. The benches in this
-> README that match that topology are: hello-world, CPU JSON, static,
-> SSE, PG, WebSocket. Benches that are **bench-only for nginx-fronted
-> ops** (the LB → upstream hop is plaintext h1 regardless): TLS h1,
-> HTTP/2, kTLS_TX. Those rows still ship for operators who terminate
-> TLS / h2 at Hyperion directly (small static fleets, edge boxes), but
-> don't chase the +60% TLS-h1 win unless you actually terminate TLS at
-> Hyperion.
-
-### Hello-world Rack app
-
-`bench/hello.ru`, single worker, parity threads (`-t 5` vs Puma `-t 5:5`), 4 wrk threads / 100 connections / 15s, macOS arm64 / Ruby 3.3.3, Hyperion 1.2.0. **macOS dev numbers; the headline Linux 2.0.0 bench is in [`docs/BENCH_HYPERION_2_0.md`](docs/BENCH_HYPERION_2_0.md)**:
-
-| | r/s | p99 | tail vs Hyperion |
-|---|---:|---:|---:|
-| **Hyperion 1.2.0** (default, logs ON) | **22,496** | **502 µs** | **1×** |
-| Falcon 0.55.3 `--count 1` | 22,199 | 5.36 ms | 11× worse |
-| Puma 7.1.0 `-t 5:5` | 20,400 | 422.85 ms | 845× worse |
-
-**Hyperion: 1.10× Puma throughput, parity with Falcon on throughput, ~10× lower p99 than Falcon and ~845× lower than Puma — while emitting structured JSON access logs the others don't.**
-
-### Production cluster config (`-w 4`)
-
-Same bench app, `-w 4` cluster, parity threads (`-t 5` everywhere), 4 wrk threads / 200 connections / 15s, macOS arm64:
-
-| | r/s | p99 | tail vs Hyperion |
-|---|---:|---:|---:|
-| Falcon `--count 4` | 48,197 | 4.84 ms | 5.9× worse |
-| **Hyperion `-w 4 -t 5`** | **40,137** | **825 µs** | **1×** |
-| Puma `-w 4 -t 5:5` | 34,793 | 177.76 ms | 215× worse (1 timeout) |
-
-Falcon edges Hyperion ~20% on raw rps at `-w 4` on macOS hello-world. **Hyperion still leads on tail latency by 5.9× over Falcon and 215× over Puma**, and beats Puma on throughput by 1.15×. On Linux production-config and DB-backed workloads (below) Hyperion takes the rps lead too — the macOS hello-world advantage to Falcon disappears once the workload includes any actual work or the kernel is Linux.
-
-### Linux production-config (DB-backed Rack)
-
-`-w 4 -t 10` on Ubuntu 24.04 / Ruby 3.3.3. Rack app does one Postgres `SELECT 1` + one Redis `GET` per request, real network round-trip. wrk `-t4 -c50 -d10s` × 3 runs (median):
-
-| | r/s (median) | vs Puma default |
-|---|---:|---:|
-| **Hyperion default (rc17, logs ON)** | **5,786** | **1.012×** |
-| Hyperion `--no-log-requests` | 6,364 | 1.114× |
-| Puma `-w 4 -t 10:10` (no per-req logs) | 5,715 | 1.000× |
-
-Bench is **wait-bound** — ~3-4 ms median is the PG + Redis round-trip, dwarfing the per-request CPU work where Hyperion's optimisations live. With a synchronous `pg` driver, fibers don't help: every in-flight DB call still parks an OS thread, and both servers max out at `workers × threads` concurrent queries. To widen this gap requires either an async PG driver — see [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (companion gem; pair with `--async-io` and a fiber-aware pool, see "Async I/O — fiber concurrency on PG-bound apps" below) — or a CPU-bound workload, where Hyperion's lead becomes visible (next section).
-
-### Async I/O — fiber concurrency on PG-bound apps
-
-Ubuntu 24.04 / 16 vCPU / Ruby 3.3.3, Postgres 17 over WAN, `wrk -t4 -c200 -d20s`. Single worker (`-w 1`) unless noted. All configs returned 0 non-2xx and 0 timeouts. RSS sampled mid-run via `ps -o rss`.
-
-**Wait-bound workload** (`pg_concurrent.ru`: `SELECT pg_sleep(0.05)` + tiny JSON; rackup lives in the [hyperion-async-pg companion repo](https://github.com/andrew-woblavobla/hyperion-async-pg) and on the bench host at `~/bench/pg_concurrent.ru`, not in this repo):
-
-| | r/s | p99 | RSS | vs Puma `-t 5` |
-|---|---:|---:|---:|---:|
-| Puma 8.0 `-t 5` pool=5 | 56.5 | 3.88 s | 87 MB | 1.0× |
-| Puma 8.0 `-t 30` pool=30 | 402.1 | 880 ms | 99 MB | 7.1× |
-| Puma 8.0 `-t 100` pool=100 | 1067.4 | 557 ms | 121 MB | 18.9× |
-| **Hyperion `--async-io -t 5`** pool=32 | 400.4 | 878 ms | 123 MB | 7.1× |
-| **Hyperion `--async-io -t 5`** pool=64 | 778.9 | 638 ms | 133 MB | 13.8× |
-| **Hyperion `--async-io -t 5`** pool=128 | 1344.2 | 536 ms | 148 MB | 23.8× |
-| **Hyperion `--async-io -t 5` pool=200** | **2381.4** | **471 ms** | **164 MB** | **42.2×** |
-| Hyperion `--async-io -w 4 -t 5` pool=64 | 1937.5 | 4.84 s | 416 MB | 34.3× (cold-start p99 — see note) |
-| Falcon 0.55.3 `--count 1` pool=128 | 1665.7 | 516 ms | 141 MB | 29.5× |
-
-**Mixed CPU+wait** (`pg_mixed.ru`: same query + 50-key JSON serialization, ~5 ms CPU; rackup lives in hyperion-async-pg + on the bench host at `~/bench/pg_mixed.ru`, not in this repo):
-
-| | r/s | p99 | RSS | vs Puma `-t 30` |
-|---|---:|---:|---:|---:|
-| Puma 8.0 `-t 30` pool=30 | 351.7 | 963 ms | 127 MB | 1.0× |
-| Hyperion `--async-io -t 5` pool=32 | 371.2 | 919 ms | 151 MB | 1.05× |
-| Hyperion `--async-io -t 5` pool=64 | 741.5 | 681 ms | 161 MB | 2.1× |
-| **Hyperion `--async-io -t 5` pool=128** | **1739.9** | **512 ms** | **201 MB** | **4.9×** |
-| Falcon `--count 1` pool=128 | 1642.1 | 531 ms | 213 MB | 4.7× |
-
-**Takeaways:**
-1. **Linear scaling with pool size** under `--async-io` — `r/s ≈ pool × 12` on this WAN bench. Single-worker pool=200 hits 2381 r/s. The "**42× Puma `-t 5`**" and "**5.9× Puma's best**" framings above use Puma's pool=5 (timeout-floor) and pool=30 (mid-tier) rows respectively — fair comparisons on the *same* bench fixture, but a Puma operator who sizes their pool to match (`-t 100 pool=100` row above) lands at 1,067 r/s, so the **honest "Puma at its own best vs Hyperion at its own best" ratio is 2,381 / 1,067 ≈ 2.2×**, not 42×. The architectural win — fiber-pool grows to pool=200 without OS-thread cost — is real; the 42× headline is a configuration-difference effect, not a steady-state gap on matched configs.
-2. **Mixed workload doesn't kill the win** — Hyperion `--async-io` pool=128 actually goes *up* on mixed (1740 vs 1344 r/s) because CPU work overlaps other fibers' PG-wait windows. This is the honest "what happens to a real Rails handler" answer.
-3. **Hyperion ≈ Falcon within 3-7%** across pool sizes; both fiber-native architectures extract similar value from `hyperion-async-pg`.
-4. **RSS at single-worker scale isn't the architectural moat** — Linux thread stacks are demand-paged; PG connection buffers dominate RSS at pool sizes ≤ 200. The architectural win is **handler concurrency under load**, not idle memory: Hyperion's fiber path runs thousands of in-flight handler invocations per OS thread, so wait-bound handlers don't queue at `max_threads`. See [Concurrency at scale](#concurrency-at-scale-architectural-advantages) for both the throughput-under-load row and a measured 10k-idle-keepalive RSS sweep against Puma and Falcon.
-5. **`-w 4` cold-start caveat** — multi-worker p99 inflates because the bench rackup uses lazy per-process pool init (each worker pays full pool fill on its first request). Production apps avoid this with `on_worker_boot { Hyperion::AsyncPg::FiberPool.new(...).fill }`.
-6. **Apples-to-apples PG note**: the row above uses `pg.wobla.space` WAN PG with `max_connections=500`. Earlier sweeps that compared Hyperion (WAN, max_conn=500) against Puma (local, max_conn=100) overstated the ratio because the Puma side timed out at the local pool ceiling. The 2.0.0 bench doc carries this caveat in the row 7 verification section; treat any "Hyperion 4× Puma on PG" headline as **indicative**, not precisely calibrated, until rerun against matched-pool PG.
-
-Three things must all be true to get this win:
-1. **`async_io: true`** in your Hyperion config (or `--async-io` CLI flag). Default is off to keep 1.2.0's raw-loop perf for fiber-unaware apps.
-2. **`hyperion-async-pg`** installed: `gem 'hyperion-async-pg', require: 'hyperion/async_pg'` + `Hyperion::AsyncPg.install!` at boot.
-3. **Fiber-aware connection pool.** The popular `connection_pool` gem is NOT — its Mutex blocks the OS thread. Use `Hyperion::AsyncPg::FiberPool` (ships with hyperion-async-pg 0.3.0+), [`async-pool`](https://github.com/socketry/async-pool), or `Async::Semaphore`.
-
-Skip any of these and you get parity with Puma at the same `-t`. Run the bench yourself: `MODE=async DATABASE_URL=... PG_POOL_SIZE=200 bundle exec hyperion --async-io -t 5 bench/pg_concurrent.ru` (in the [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) repo).
-
-> **TLS + async-pg note (1.4.0+).** TLS / HTTPS already runs each connection on a fiber under `Async::Scheduler` (the TLS path always uses `start_async_loop` for the ALPN handshake). **As of 1.4.0, the post-handshake `app.call` for HTTP/1.1-over-TLS dispatches inline on the calling fiber by default** — so fiber-cooperative libraries (`hyperion-async-pg`, `async-redis`) work on the TLS h1 path without needing `--async-io`. The Async-loop cost is already paid for the handshake; running the handler under the existing scheduler just preserves that context instead of stripping it on a thread-pool hop. h2 streams are always fiber-dispatched and benefit from async-pg without the flag.
->
-> Operators who specifically want **TLS + threadpool dispatch** (e.g. CPU-heavy handlers competing for OS threads, where you'd rather not pay fiber yields and want true OS-thread parallelism on a synchronous handler) can pass `async_io: false` in the config to force the pool branch back on. The three-way `async_io` setting:
-> - `nil` (default): plain HTTP/1.1 → pool, TLS h1 → inline.
-> - `true`: plain HTTP/1.1 → inline, TLS h1 → inline (force fiber dispatch everywhere; needed for `hyperion-async-pg` on plain HTTP).
-> - `false`: plain HTTP/1.1 → pool, TLS h1 → pool (explicit opt-out for TLS+threadpool).
+Linux 6.8 / 16-vCPU Ubuntu 24.04 / Ruby 3.3.3, single worker, `wrk -t4 -c100 -d20s`
+unless noted. Reproduction commands and the full 6-row 4-way matrix
+(Hyperion / Puma / Falcon / Agoo) live in
+[docs/BENCH_HYPERION_2_11.md](docs/BENCH_HYPERION_2_11.md).

-
+| Workload | Hyperion r/s | Hyperion p99 | Reference |
+|-------------------------------------------------------|-------------:|-------------:|----------------------|
+| Static hello, `handle_static` + io_uring (2.12-D) | **134,084** | 1.14 ms | Agoo 2.15.14: 19,024 |
+| Static hello, `handle_static` + accept4 fallback | 15,685 | 107 µs | Agoo 2.15.14: 19,024 |
+| Dynamic block, `Server.handle { \|env\| ... }` (2.14-A) | 9,422 | 166 µs | Agoo 2.15.14: 19,024 |
+| CPU JSON via block (`bench/work.ru`, 2.14-A) | 5,897 | 256 µs | Falcon: 4,226 |
+| Generic Rack hello (no `Server.handle`) | 4,752 | 2.02 ms | Agoo 2.15.14: 19,024 |
+| gRPC unary, h2/TLS, ghz `-c50` (2.14-D) | 1,618 | 33.3 ms | Falcon `async-grpc`: 1,512 (+7%) |

-
+The 134,084 r/s row is sustained over a 4-hour soak at **120,684 r/s**
+with RSS variance 2.71% and `wrk-truth` p99 1.14 ms (2.14-C). The
+io_uring loop is opt-in via `HYPERION_IO_URING_ACCEPT=1` until 2.15;
+the `accept4` row is the default on Linux.

-
-|---|---:|---:|---:|
-| Falcon `--count 4` | 46,166 | 20.17 ms | 24× worse |
-| **Hyperion `-w 4 -t 5`** | **43,924** | **824 µs** | **1×** |
-| Puma `-w 4 -t 5:5` | 36,383 | 166.30 ms (47 socket errors) | 200× worse |
-
-**1.21× Puma throughput, 200× lower p99.** This is the gap that hides behind PG-round-trip noise on the DB bench. Hyperion's per-request CPU savings (lock-free per-thread metrics, frozen header keys in the Rack adapter, C-ext response head builder, cached iso8601 timestamps, cached HTTP Date header) land on the wire when the workload is CPU-bound. Falcon edges us 5% on raw r/s but with 24× worse tail — a different tradeoff curve. Reproduce: `bundle exec bin/hyperion -w 4 -t 5 -p 9292 bench/work.ru`.
-
-### Real Rails 8.1 app (single worker, parity threads `-t 16`)
-
-Health endpoint that traverses the full middleware chain (rack-attack, locale redirect, structured tagger, geo-location, etc.). Plus a Grape API endpoint reading cached data, and a Rails controller doing a Redis GET + an ActiveRecord query.
-
-| endpoint | server | r/s | p99 | wrk timeouts |
-|---|---|---:|---:|---:|
-| `/up` (health) | **Hyperion** | **19.03** | **1.12 s** | **0** |
-| `/up` (health) | Puma `-t 16:16` | 16.64 | 1.95 s | **138** |
-| Grape `/api/v1/cached_data` | **Hyperion** | **16.15** | **779 ms** | 16 |
-| Grape `/api/v1/cached_data` | Puma `-t 16:16` | 10.90 | (>2 s, censored) | **110** |
-| Rails `/api/v1/health` | **Hyperion** | **15.95** | **992 ms** | 16 |
-| Rails `/api/v1/health` | Puma `-t 16:16` | 11.29 | (>2 s, censored) | **114** |
-
-On Grape and Rails-controller workloads Puma hits wrk's 2 s timeout cap on ~⅔ of requests — its real p99 is censored above 2 s. Hyperion serves all of its requests under 1.2 s with 0 to 16 timeouts. **1.14–1.48× Puma throughput** depending on endpoint.
-
-### Static-asset serving (sendfile zero-copy path, 1.2.0+)
-
-`bench/static.ru` (`Rack::Files` over a 1 MiB asset), `-w 1`, `wrk -t4 -c100 -d15s`, macOS arm64 / Ruby 3.3.3:
-
-| | r/s | p99 | transferred | tail vs winner |
-|---|---:|---:|---:|---:|
-| **Hyperion (sendfile path)** | **2,069** | **3.10 ms** | 30.4 GB | **1×** |
-| Puma `-w 1 -t 5:5` | 2,109 | 566.16 ms | 31.0 GB | 183× worse |
-| Falcon `--count 1` | 1,269 | 801.01 ms | 18.7 GB | 258× worse (28 timeouts) |
-
-Throughput is bandwidth-bound on localhost (≈2 GB/s = the loopback memory ceiling), so the throughput column looks like parity. The actual win is in the **tail latency** column: Hyperion's `IO.copy_stream` → `sendfile(2)` path skips userspace entirely, while Puma allocates a String per response and Falcon serializes more aggressively. On real network paths sendfile widens the gap further (kernel-to-NIC zero-copy).
+## Quick start

-Reproduce:
 ```sh
-
-bundle exec
-
+bundle exec hyperion config.ru # single process
+bundle exec hyperion -w 4 -t 10 config.ru # 4 workers × 10 threads
+bundle exec hyperion -w 0 config.ru # one worker per CPU
+bundle exec hyperion --tls-cert cert.pem --tls-key key.pem -p 9443 config.ru
 ```

-
-
-These workloads demonstrate structural differences between Hyperion's fiber-per-connection / fiber-per-stream model and Puma's thread-pool model. Numbers are illustrative; the architecture is what matters. Run on Ubuntu 24.04 / Ruby 3.3.3, single worker, h2load `-c <conns> -n 100000 --rps 1000 --h1`.
-
-**5,000 concurrent keep-alive connections (50,000 requests):**
-
-| | succeeded | r/s | wall | master RSS |
-|---|---:|---:|---:|---:|
-| Hyperion `-w 1 -t 10` | 50,000 / 50,000 | 3,460 | 14.45 s | 53.5 MB |
-| Puma `-w 1 -t 10:10` | 50,000 / 50,000 | 1,762 | 28.37 s | 36.9 MB |
+`bundle exec rake spec` (and the default task) auto-invoke `compile`, so a
+fresh checkout just needs `bundle install && bundle exec rake` for a green run.

-
+Migrating from Puma? `hyperion -t N -w M` matching your current Puma
+`-t N:N -w M` is the recommended drop-in. See
+[docs/MIGRATING_FROM_PUMA.md](docs/MIGRATING_FROM_PUMA.md).

-
-|---|---:|---:|---:|---:|
-| Hyperion `-w 1 -t 10` | 93,090 | 6,910 | 3,446 | 27.01 s |
-| Puma `-w 1 -t 10:10` | 77,340 | 22,660 | 706 | 109.59 s |
+## Features

-
+### HTTP/1.1 + HTTP/2 + TLS

-
+ALPN auto-negotiates `h2` or `http/1.1` per connection. HTTP/2 multiplexes
+streams onto fibers within a single connection — slow handlers don't
+head-of-line-block other streams. Cluster-mode TLS works (`-w N` +
+`--tls-cert` / `--tls-key`).
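The ALPN negotiation described above is easy to spot-check from the command line. A minimal sketch, assuming a TLS listener on port 9443 as in the quick start (`-k` skips verification for a self-signed dev cert; `-v` prints which protocol ALPN selected):

```sh
# HTTP/2 over TLS, ALPN-negotiated
curl -v --http2 -k https://127.0.0.1:9443/

# Force HTTP/1.1 on the same listener for comparison
curl -v --http1.1 -k https://127.0.0.1:9443/
```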

-
+Smuggling defenses for HTTP/1.1: `Content-Length` + `Transfer-Encoding`
+together → 400; non-chunked `Transfer-Encoding` → 501; CRLF in response
+header values → `ArgumentError` (response-splitting guard).
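The first of those defenses is easy to reproduce with a raw request; a hedged sketch assuming a plain-HTTP listener on the default port 9292 (only the 400 status is asserted here, not the exact error body):

```sh
# Conflicting framing headers (Content-Length + Transfer-Encoding together):
# the request is rejected with 400 rather than re-interpreted.
printf 'POST / HTTP/1.1\r\nHost: localhost\r\nContent-Length: 5\r\nTransfer-Encoding: chunked\r\n\r\nhello' \
  | nc 127.0.0.1 9292
```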

-
-|---|---:|---:|---:|---:|
-| Hyperion `-w 1 -t 5 --async-io` | 10,000 / 10,000 | 0 | 173 MB | 155 MB |
-| Puma `-w 0 -t 100` | 10,000 / 10,000 | 0 | 101 MB | 104 MB |
-| Falcon `--count 1` | 10,000 / 10,000 | 0 | 429 MB | 440 MB |
+### WebSockets (2.1.0+)

-
+RFC 6455 over Rack 3 full hijack, native frame codec, per-connection
+wrapper with auto-pong, close handshake, UTF-8 validation, and per-message
+size cap. **ActionCable + faye-websocket on a single binary** — one
+`hyperion -w 4 -t 10 config.ru` serves HTTP, HTTP/2, TLS, and `/cable`
+from the same listener. Conformance: 463/463 autobahn-testsuite cases
+pass. See [docs/WEBSOCKETS.md](docs/WEBSOCKETS.md).

-
+### gRPC (2.12-F+)

-
+Hyperion's HTTP/2 path supports gRPC unary, server-streaming,
+client-streaming, and bidirectional RPCs via the Rack 3 trailers contract:
+any response body that defines `#trailers` gets a final HEADERS frame
+(with `END_STREAM=1`) carrying the trailer map after the DATA frames.
+Plain HTTP/2 traffic without the gRPC content-type keeps the unary
+buffered semantics — no behaviour change for non-gRPC clients.

-
-|---|---:|
-| Hyperion (per-stream fiber dispatch) | **1.04 s** |
-| Serial baseline (100 × 50 ms) | 5.00 s |
+A minimal unary handler:

-
-
-
-
-
-
-
-
-```sh
-# Setup (once)
-bundle install
-bundle exec rake compile
-
-# Hello-world (rps + p99 ceiling, no I/O)
-bundle exec bin/hyperion -p 9292 -w 16 -t 5 bench/hello.ru &
-wrk -t4 -c200 -d20s --latency http://127.0.0.1:9292/
-
-# CPU-bound JSON (per-request CPU savings visible)
-bundle exec bin/hyperion -p 9292 -w 4 -t 5 bench/work.ru &
-wrk -t4 -c200 -d15s --latency http://127.0.0.1:9292/
-
-# Static 1 MiB sendfile path
-ruby -e 'File.binwrite("/tmp/hyperion_bench_asset_1m.bin", "x" * (1024*1024))'
-bundle exec bin/hyperion -p 9292 -w 1 -t 5 bench/static.ru &
-wrk -t4 -c100 -d15s --latency http://127.0.0.1:9292/hyperion_bench_asset_1m.bin
-
-# SSE streaming (Hyperion-shaped rackup with explicit flush sentinel — see caveat in BENCH doc)
-bundle exec bin/hyperion -p 9292 -w 1 -t 5 bench/sse.ru &
-wrk -t1 -c1 -d10s http://127.0.0.1:9292/
-
-# WebSocket multi-process throughput
-bundle exec bin/hyperion -p 9888 -w 4 -t 64 bench/ws_echo.ru &
-ruby bench/ws_bench_client_multi.rb --port 9888 --procs 4 --conns 200 --msgs 1000 --bytes 1024 --json
-
-# h2 native HPACK (Rails-shape, 25-header response)
-./bench/h2_rails_shape.sh
-
-# Idle keep-alive RSS sweep (1k / 5k / 10k conns, 30s hold per server)
-./bench/keepalive_memory.sh
+```ruby
+class GrpcBody
+  def initialize(reply); @reply = reply; end
+  def each; yield @reply; end
+  def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
+  def close; end
+end

-
-
-
+run ->(env) {
+  request = env['rack.input'].read
+  reply = handle(request)
+  [200, { 'content-type' => 'application/grpc' }, GrpcBody.new(reply)]
+}
 ```

-
+Server-streaming yields one DATA frame per `each`; client-streaming
+reads incoming frames off `env['rack.input']` (a streaming IO that
+blocks until the next DATA frame lands); bidirectional interleaves
+both. Reproducible bench at `bench/grpc_stream.{proto,ru}` +
+`bench/grpc_stream_bench.sh` (ghz). Numbers in
+[docs/BENCH_HYPERION_2_11.md](docs/BENCH_HYPERION_2_11.md#grpc-ghz-bench--hyperion-vs-falcon-async-grpc-214-d).
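For illustration only (not code shipped in this gem), the server-streaming shape described above is just a Rack body whose `each` yields more than once; the sketch below assumes the caller supplies already-framed gRPC messages and uses a hypothetical `build_reply_frames` helper:

```ruby
# Each yield becomes one DATA frame; #trailers becomes the final HEADERS frame.
class StreamingGrpcBody
  def initialize(frames); @frames = frames; end   # pre-framed protobuf messages
  def each; @frames.each { |frame| yield frame }; end
  def trailers; { 'grpc-status' => '0', 'grpc-message' => 'OK' }; end
  def close; end
end

run ->(env) {
  frames = build_reply_frames(env['rack.input'].read)   # hypothetical service code
  [200, { 'content-type' => 'application/grpc' }, StreamingGrpcBody.new(frames)]
}
```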

-
+### `Server.handle` direct routes

-
+Bypass the Rack adapter for hot paths:

-```
-
-
-bundle exec hyperion config.ru # single-process default
-bundle exec hyperion -w 4 -t 10 config.ru # 4-worker cluster, 10 threads each
-bundle exec hyperion -w 0 config.ru # 1 worker per CPU
-bundle exec hyperion --tls-cert cert.pem --tls-key key.pem -p 9443 config.ru # HTTPS
-curl http://127.0.0.1:9292/ # => hello
-
-# Chunked POST works:
-curl -X POST -H "Transfer-Encoding: chunked" --data-binary @file http://127.0.0.1:9292/
-
-# HTTP/2 (over TLS, ALPN-negotiated):
-curl --http2 -k https://127.0.0.1:9443/
+```ruby
+Hyperion::Server.handle_static '/health', body: 'ok'
+Hyperion::Server.handle(:GET, '/v1/ping') { |env| [200, {}, ['pong']] }
 ```

-`
-
-
+`handle_static` bakes the response at boot and serves from the C accept
+loop (134k r/s with io_uring, 16k r/s on accept4). The dynamic block
+form (2.14-A) runs `app.call(env)` on the C accept loop too — accept +
+recv + parse + write release the GVL while the block holds it, so
+multi-threaded workers actually parallelise.
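With those two registrations in place before boot, the routes answer like any other endpoint; for example, against the default bind:

```sh
curl http://127.0.0.1:9292/health    # => ok
curl http://127.0.0.1:9292/v1/ping   # => pong
```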
+
+### Pre-fork cluster
+
+Per-OS worker model: `SO_REUSEPORT` on Linux (kernel-balanced accept,
+1.004–1.011 max/min ratio across workers under steady load — 2.12-E
+audit), master-bind + worker-fd-share on macOS/BSD where Darwin's
+`SO_REUSEPORT` doesn't load-balance. Lifecycle hooks (`before_fork`,
+`on_worker_boot`, `on_worker_shutdown`) for AR / Redis / pool init.
+
+### Async I/O (PG-bound apps)
+
+`--async-io` runs plain HTTP/1.1 connections under `Async::Scheduler`,
+turning one OS thread into thousands of in-flight handler invocations.
+Paired with [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg)
+on a `pg_sleep(50ms)` workload, single-worker `pool=200` hits **2,381 r/s**
+vs Puma `-t 5` at 56 r/s (architectural ceiling: pool size, not thread
+count). Three things must all be true: `--async-io`, `hyperion-async-pg`
+loaded, and a fiber-aware pool (`Hyperion::AsyncPg::FiberPool`,
+`async-pool`, or `Async::Semaphore` — **not** the `connection_pool` gem,
+whose `Mutex` blocks the OS thread). Skip any one and you get parity
+with Puma.
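Putting the three requirements together, a minimal `config.ru` sketch — the pool size mirrors the bench above, but the `FiberPool` constructor arguments and checkout method shown here are assumptions, not the gem's documented API:

```ruby
# Boot with: bundle exec hyperion --async-io -t 5 config.ru
require 'pg'
require 'hyperion/async_pg'
Hyperion::AsyncPg.install!   # make pg yield the fiber instead of blocking the thread

# Fiber-aware pool: checkout parks the fiber, not the OS thread (API assumed).
POOL = Hyperion::AsyncPg::FiberPool.new(size: 200) { PG.connect(ENV.fetch('DATABASE_URL')) }

run ->(env) {
  POOL.with { |conn| conn.exec('SELECT pg_sleep(0.05)') }   # wait-bound work overlaps across fibers
  [200, { 'content-type' => 'text/plain' }, ['ok']]
}
```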
+
+### Observability
+
+`/-/metrics` Prometheus endpoint (admin-token guarded), per-route
+latency histograms, per-conn fairness rejections, WebSocket
+permessage-deflate ratio, kTLS active connections, ThreadPool queue
+depth, dispatch-mode counters (Rack / `handle_static` / dynamic block /
+h2 / async-io). Pre-built Grafana dashboard at
+[docs/grafana/hyperion-2.4-dashboard.json](docs/grafana/hyperion-2.4-dashboard.json).
+Full reference: [docs/OBSERVABILITY.md](docs/OBSERVABILITY.md).
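A quick way to eyeball the endpoint in development (sketch; assumes no admin token is configured — production deployments guard it with `--admin-token-file`):

```sh
# The per-worker dispatch counter added in 2.12-E appears as
# requests_dispatch_total{worker_id="N"} in the scrape output.
curl -s http://127.0.0.1:9292/-/metrics | grep requests_dispatch_total
```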
+
+Default-ON structured access logs (one JSON or text line per request)
+with hot-path optimisations: per-thread cached iso8601 timestamp,
+hand-rolled line builder, lock-free per-thread 4 KiB write buffer.
+12-factor logger split: `info`/`debug` → stdout, `warn`/`error`/`fatal`
+→ stderr.
+
+### Optional io_uring accept loop
+
+Linux 5.x+, opt-in via `HYPERION_IO_URING_ACCEPT=1`. Multishot accept
++ per-conn RECV/WRITE/CLOSE state machine on top of liburing. One
+`io_uring_enter` per N requests instead of N×3 syscalls. Compiles out
+cleanly without liburing — the `accept4` path stays the fallback.
+macOS keeps using `accept4`. Default-flip moves to 2.15 with a fresh
+24h soak.
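Enabling it is just the environment variable plus a normal boot, for example:

```sh
# Linux 5.x+ with liburing available at build time; anywhere else this
# silently keeps the accept4 loop.
HYPERION_IO_URING_ACCEPT=1 bundle exec hyperion -w 4 -t 10 config.ru
```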

 ## Configuration

-Three layers, in precedence order: explicit CLI flag > environment
+Three layers, in precedence order: explicit CLI flag > environment
+variable > `config/hyperion.rb` > built-in default.

-### CLI flags
+### Most-used CLI flags

 | Flag | Default | Notes |
 |---|---|---|
 | `-b, --bind HOST` | `127.0.0.1` | |
 | `-p, --port PORT` | `9292` | |
 | `-w, --workers N` | `1` | `0` → `Etc.nprocessors` |
-| `-t, --threads N` | `5` | OS-thread Rack handler pool per worker. `0` → run inline (
+| `-t, --threads N` | `5` | OS-thread Rack handler pool per worker. `0` → run inline (debugging). |
 | `-C, --config PATH` | `config/hyperion.rb` if present | Ruby DSL file. |
-| `--tls-cert PATH` | nil | PEM
-| `--
-| `--
-| `--
-| `--
-| `--
-| `--
-| `--
-
-
-
-
-
-
-| `--worker-max-rss-mb MB` | unset | Master gracefully recycles a worker once its RSS exceeds this many megabytes. nil = disabled. |
-| `--idle-keepalive SECONDS` | `5` | Keep-alive idle timeout. Connection closes after this many seconds of inactivity. |
-| `--graceful-timeout SECONDS` | `30` | Shutdown deadline before SIGKILL is delivered to a worker that hasn't drained. |
+| `--tls-cert PATH` / `--tls-key PATH` | nil | PEM cert + key for HTTPS. |
+| `--[no-]async-io` | off | Run plain HTTP/1.1 under `Async::Scheduler`. Required for `hyperion-async-pg` on plain HTTP. |
+| `--preload-static DIR` | nil | Preload static assets from DIR at boot (repeatable, immutable). Rails apps auto-detect from `Rails.configuration.assets.paths`. |
+| `--admin-token-file PATH` | unset | Auth file for `/-/quit` and `/-/metrics`. Refuses world-readable files. |
+| `--worker-max-rss-mb MB` | unset | Master gracefully recycles a worker exceeding MB RSS. |
+| `--max-pending COUNT` | unbounded | Per-worker accept-queue cap before HTTP 503 + `Retry-After: 1`. |
+| `--idle-keepalive SECONDS` | `5` | Keep-alive idle timeout. |
+| `--graceful-timeout SECONDS` | `30` | Shutdown deadline before SIGKILL. |
+
+`bin/hyperion --help` prints the full set, including `--max-body-bytes`,
+`--max-header-bytes`, `--max-request-read-seconds` (slowloris defence),
+`--h2-max-total-streams`, `--max-in-flight-per-conn`,
+`--tls-handshake-rate-limit`, and the `--[no-]yjit` /
+`--[no-]log-requests` toggles.

 ### Environment variables

-`HYPERION_LOG_LEVEL`, `HYPERION_LOG_FORMAT`, `HYPERION_LOG_REQUESTS`
+`HYPERION_LOG_LEVEL`, `HYPERION_LOG_FORMAT`, `HYPERION_LOG_REQUESTS`
+(`0|1|true|false|yes|no|on|off`), `HYPERION_ENV`,
+`HYPERION_WORKER_MODEL` (`share|reuseport`), `HYPERION_IO_URING_ACCEPT`
+(`0|1`), `HYPERION_H2_DISPATCH_POOL`, `HYPERION_H2_NATIVE_HPACK`
+(`v2|ruby|off`), `HYPERION_H2_TIMING`.
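They compose with the CLI in the usual 12-factor way; for example, using only the documented values:

```sh
HYPERION_WORKER_MODEL=reuseport \
HYPERION_LOG_REQUESTS=off \
HYPERION_H2_TIMING=1 \
  bundle exec hyperion -w 4 -t 10 config.ru
```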
|
|
583
211
|
|
|
584
212
|
### Config file
|
|
585
213
|
|
|
586
|
-
`config/hyperion.rb` — same shape as Puma's `puma.rb`. Auto-loaded if
|
|
214
|
+
`config/hyperion.rb` — same shape as Puma's `puma.rb`. Auto-loaded if
|
|
215
|
+
present. Strict DSL: unknown methods raise `NoMethodError` at boot.
|
|
587
216
|
|
|
588
217
|
```ruby
|
|
589
218
|
# config/hyperion.rb
|
|
@@ -600,16 +229,11 @@ read_timeout 30
|
|
|
600
229
|
idle_keepalive 5
|
|
601
230
|
graceful_timeout 30
|
|
602
231
|
|
|
603
|
-
max_header_bytes 64 * 1024
|
|
604
|
-
max_body_bytes 16 * 1024 * 1024
|
|
605
|
-
|
|
606
232
|
log_level :info
|
|
607
233
|
log_format :auto
|
|
608
234
|
log_requests true
|
|
609
235
|
|
|
610
|
-
|
|
611
|
-
|
|
612
|
-
async_io nil # Three-way (1.4.0+): nil (default, auto: inline-on-fiber for TLS h1, pool hop for plain HTTP/1.1), true (force inline-on-fiber everywhere — required for hyperion-async-pg on plain HTTP/1.1), false (force pool hop everywhere — explicit opt-out for TLS+threadpool with CPU-heavy handlers). ~5% throughput hit on hello-world when inline; in exchange one OS thread serves N concurrent in-flight DB queries on wait-bound workloads. TLS / HTTP/2 accept loops always run under Async::Scheduler regardless of this flag.
|
|
236
|
+
async_io nil # nil = auto (1.4.0+), true = inline-on-fiber everywhere, false = pool everywhere
|
|
613
237
|
|
|
614
238
|
before_fork do
|
|
615
239
|
ActiveRecord::Base.connection_handler.clear_all_connections! if defined?(ActiveRecord)
|
|
@@ -618,199 +242,202 @@ end
|
|
|
618
242
|
on_worker_boot do |worker_index|
|
|
619
243
|
ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
|
|
620
244
|
end
|
|
621
|
-
|
|
622
|
-
on_worker_shutdown do |worker_index|
|
|
623
|
-
ActiveRecord::Base.connection_handler.clear_all_connections! if defined?(ActiveRecord)
|
|
624
|
-
end
|
|
625
245
|
```
|
|
626
246
|
|
|
627
|
-
|
|
628
|
-
|
|
629
|
-
A documented sample lives at [`config/hyperion.example.rb`](config/hyperion.example.rb).
|
|
247
|
+
A documented sample lives at
|
|
248
|
+
[`config/hyperion.example.rb`](config/hyperion.example.rb).
|
|
630
249
|
|
|
631
250
|
## Operator guidance
|
|
632
251
|
|
|
633
|
-
|
|
634
|
-
|
|
635
|
-
|
|
636
|
-
|
|
637
|
-
| Workload shape | Recommended | Why |
|
|
638
|
-
|---|---|---|
|
|
639
|
-
| **Pure I/O-bound** (PG / Redis / external HTTP, no significant CPU) | `-w 1` + larger pool | Bench: `-w 1 pool=200` = 87 MB / 2,180 r/s vs `-w 4 pool=64` = 224 MB / 1,680 r/s. **2.6× more memory, 0.77× rps** if you pick multi-worker on a wait-bound workload. |
|
|
640
|
-
| **Pure CPU-bound** (heavy JSON / template render / image processing) | `-w N` matching CPU count | Each worker's accept loop is single-threaded under `--async-io`; multi-worker gives CPU-parallelism. Bench: `-w 16 -t 5` hits 98,818 r/s on a 16-vCPU box, 4.7× a `-w 1` ceiling on the same hardware. |
|
|
641
|
-
| **Mixed** (Rails-shaped: ~5 ms CPU + 50 ms PG wait per request) | `-w N/2` (half cores) + medium pool | Lets CPU work parallelise while keeping per-worker memory tractable. Bench `pg_mixed.ru` (in hyperion-async-pg repo / `~/bench/`) at `-w 4 -t 5 pool=128` = 1,740 r/s with no cold-start spike (ForkSafe `prefill_in_child: true`). |
|
|
642
|
-
|
|
643
|
-
Multi-worker on PG-wait workloads is the **wrong** default for most apps — the headline rps doesn't justify the memory and PG-connection cost. Verify your shape with the bench before scaling out.
|
|
252
|
+
Distilled from [docs/BENCH_2026_04_27.md](docs/BENCH_2026_04_27.md)
|
|
253
|
+
(Rails 8.1 real-app sweep). Headline finding: **the simplest drop-in
|
|
254
|
+
is the right answer.**
|
|
644
255
|
|
|
645
|
-
###
|
|
256
|
+
### Migrating from Puma
|
|
646
257
|
|
|
647
|
-
|
|
648
|
-
|
|
649
|
-
|
|
650
|
-
|
|
651
|
-
|
|
652
|
-
|
|
653
|
-
│ │
|
|
654
|
-
Pair with a fiber-aware Leave --async-io OFF.
|
|
655
|
-
connection pool Default thread-pool dispatch
|
|
656
|
-
(FiberPool, async-pool — is faster for synchronous
|
|
657
|
-
NOT connection_pool gem, Rails apps. Bench: --async-io
|
|
658
|
-
which uses non-fiber Mutex). on hello-world = 47% rps
|
|
659
|
-
│ regression + p99 spike to
|
|
660
|
-
Set --async-io. 3.65 s under no-yield workloads.
|
|
661
|
-
Pool size is the real No reason to flip the flag.
|
|
662
|
-
concurrency knob; -t is
|
|
663
|
-
decorative for wait-bound.
|
|
664
|
-
```
|
|
258
|
+
`hyperion -t N -w M` matching your current Puma `-t N:N -w M`. No other
|
|
259
|
+
flags. Versus Puma at the same `-t/-w` shape on real Rails endpoints:
|
|
260
|
+
**+9% rps on lightweight endpoints, 28× lower p99 on health-style
|
|
261
|
+
endpoints, 3.8× lower p99 on PG-touching endpoints.** Same RSS, same
|
|
262
|
+
operator surface — keep all your existing config, monitoring, deploy
|
|
263
|
+
scripts.
|
|
665
264
|
|
|
666
|
-
|
|
265
|
+
### Knobs that help on synthetic benches but **not** on real Rails
|
|
667
266
|
|
|
668
|
-
|
|
267
|
+
| Knob | Synthetic | Real Rails | Recommendation |
|
|
268
|
+
|---|---|---|---|
|
|
269
|
+
| `-t 30` | +5–10% on hello-world | **Hurts** p99 vs `-t 10` (3.51 s vs 148 ms on `/up`) — GVL + middleware Mutex contention | Stay at `-t 10`. |
|
|
270
|
+
| `--yjit` | +5–10% on CPU-bound | Wash on dev-mode Rails | Skip until you bench production-mode. |
|
|
271
|
+
| `RAILS_POOL > 25` | n/a | No improvement at 50 or 100 | Keep your existing AR pool. |
|
|
272
|
+
| `--async-io` | 33–42× rps on PG-bound | **Worse** than drop-in (4.14 s p99 on `/up`) until your full I/O stack is fiber-cooperative | Don't enable until `redis-rb` → `async-redis`. |
|
|
669
273
|
|
|
670
|
-
|
|
671
|
-
- **With `--async-io` + a fiber-aware pool**: pool size is the concurrency knob. `-t` is decorative for wait-bound workloads; one accept-loop fiber serves all in-flight queries via the pool. Linear scaling: pool=64 → ~780 r/s, pool=128 → ~1,344 r/s, pool=200 → ~2,180 r/s on 50 ms PG queries.
|
|
672
|
-
- **Pool over WAN**: if `PG.connect` round-trip is >50 ms, expect pool fill at startup to take `pool_size / parallel_fill_threads × RTT`. `hyperion-async-pg 0.5.1+` auto-scales `parallel_fill_threads` so pool=200 fills in ~1-2 s.
|
|
274
|
+
### When `-w N` helps
|
|
673
275
|
|
|
674
|
-
|
|
276
|
+
| Workload | Recommended | Why |
|
|
277
|
+
|---|---|---|
|
|
278
|
+
| Pure I/O-bound (PG / Redis / external HTTP) | `-w 1` + larger pool | `-w 1 pool=200` = 87 MB / 2,180 r/s vs `-w 4 pool=64` = 224 MB / 1,680 r/s. **2.6× memory, 0.77× rps** if you pick multi-worker on wait-bound. |
|
|
279
|
+
| Pure CPU-bound | `-w N` matching CPU count | Bench: `-w 16 -t 5` hits 98,818 r/s on a 16-vCPU box. |
|
|
280
|
+
| Mixed (Rails-shaped, ~5 ms CPU + 50 ms wait) | `-w N/2` (half cores) + medium pool | `-w 4 -t 5 pool=128` = 1,740 r/s on `pg_mixed.ru`, no cold-start spike. |
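
To sanity-check the mixed-shape row on your own hardware without a database, a stand-in Rack app that burns ~5 ms of CPU and then fakes ~50 ms of wait per request is enough; this is a hypothetical `mixed_fake.ru`, not the real `pg_mixed.ru` (which lives in the companion repo and issues actual PG queries):

```ruby
# mixed_fake.ru -- hypothetical stand-in; sleep plays the role of the PG wait.
require 'json'

run lambda { |env|
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 0.005
  spins = 0
  spins += 1 while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline  # ~5 ms CPU
  sleep 0.05                                                                   # ~50 ms simulated PG wait
  [200, { 'content-type' => 'application/json' }, [JSON.dump(spins: spins)]]
}
```

Run it at each candidate `-w` value and compare against a `-w 1` baseline before committing to a worker count.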
|
|
675
281
|
|
|
676
|
-
|
|
282
|
+
### Read p99, not the mean
|
|
677
283
|
|
|
678
284
|
| Workload | Hyperion rps / p99 | Closest competitor | rps ratio | p99 ratio |
|
|
679
285
|
|---|---|---|---:|---:|
|
|
680
|
-
| Hello `-w 4` | 21,215
|
|
681
|
-
| CPU JSON `-w 4` | 15,582
|
|
682
|
-
| Static 1 MiB | 1,919
|
|
683
|
-
| PG-wait `-w 1` pool=200 | 2,180
|
|
684
|
-
|
|
685
|
-
**Size capacity by p99, not by mean.** Throughput peaks are easy to fake under controlled bench conditions; tail latency reflects what your slowest user actually experiences when the load balancer fans them onto a busy worker.
|
|
686
|
-
|
|
687
|
-
### Production tuning (real Rails apps)
|
|
688
|
-
|
|
689
|
-
Distilled from a real-app bench against the [Exodus platform](https://github.com/andrew-woblavobla/hyperion/blob/master/docs/BENCH_2026_04_27.md) (Rails 8.1, on-LAN PG + Redis at ~0.3 ms RTT, `-w 4 -t 10`, `wrk -t8 -c200 -d30s`). The headline finding: the **simplest drop-in is the right answer**, and the additional knobs operators reach for first don't help on real Rails.
|
|
690
|
-
|
|
691
|
-
**Recommended for migrating from Puma**: `hyperion -t N -w M` matching your current Puma `-t N:N -w M`. No other flags. That gives you (vs Puma at the same `-t/-w`):
|
|
692
|
-
|
|
693
|
-
- **+9% rps on lightweight endpoints** (matches the 5-10% per-request CPU savings the rest of the bench section documents).
|
|
694
|
-
- **28× lower p99 on health-style endpoints** — the queue-of-doom shape Puma exhibits under sustained 200-conn load doesn't reproduce on Hyperion's worker-owns-connection model.
|
|
695
|
-
- **3.8× lower p99 on PG-touching endpoints**.
|
|
696
|
-
- **Same RSS, same operator surface** — you keep all your existing config, monitoring, and deploy scripts.
|
|
697
|
-
|
|
698
|
-
**Knobs that help on synthetic benches but NOT on real Rails — leave them off:**
|
|
699
|
-
|
|
700
|
-
| Knob | Synthetic bench result | Real Rails result | Recommendation |
|
|
701
|
-
|---|---|---|---|
|
|
702
|
-
| `-t 30` (more threads/worker) | Helped Hyperion 5-10% on hello-world | **Hurt** p99 vs `-t 10` on real Rails (3.51 s vs 148 ms on /up) — GVL + middleware Mutex contention dominates past `-t 10` | Stay at `-t 10`. Match Puma's recommended `RAILS_MAX_THREADS`. |
|
|
703
|
-
| `--yjit` | 5-10% on synthetic CPU-bound | Wash on dev-mode Rails (312 vs 328 rps, p99 worse with YJIT) | Skip for now. Production-mode Rails may behave differently — verify with your own bench before flipping. |
|
|
704
|
-
| `RAILS_POOL` > 25 | n/a | No improvement at pool=50 or pool=100 on real Rails (rps within 3%, p99 within noise). Pool starvation is rarely the bottleneck on a `-w 4 -t 10` config | Keep your existing AR pool size. |
|
|
705
|
-
| `--async-io` | 33-42× rps on PG-bound (with `hyperion-async-pg`) | **Worse** than drop-in on real Rails (4.14 s p99 on /up vs 148 ms drop-in) | **Don't enable** until your full I/O stack is fiber-cooperative. The synchronous Redis client (`redis-rb`) blocks the OS thread before async-pg can yield, so fibers can't compound. Migrate to `async-redis` *first*, then revisit. |
|
|
706
|
-
| `--async-io` + `hyperion-async-pg` AR adapter | Verified 48× rps lift on a single-PG-query bench | Marginal-or-negative on real Rails (similar reason: Redis-first handlers don't yield) | Same — wait for a full-async I/O stack. |
|
|
286
|
+
| Hello `-w 4` | 21,215 / 1.87 ms | Falcon 24,061 / 9.78 ms | 0.88× | **5.2× lower** |
|
|
287
|
+
| CPU JSON `-w 4` | 15,582 / 2.47 ms | Falcon 18,643 / 13.51 ms | 0.84× | **5.5× lower** |
|
|
288
|
+
| Static 1 MiB | 1,919 / 4.22 ms | Puma 2,074 / 55 ms | 0.93× | **13× lower** |
|
|
289
|
+
| PG-wait `-w 1` pool=200 | 2,180 / 668 ms | Puma 530 + 200 timeouts | **4.1×** | qualitative crush |
|
|
707
290
|
|
|
708
|
-
|
|
291
|
+
Throughput peaks are easy to fake under controlled conditions; tail
|
|
292
|
+
latency reflects what your slowest user actually experiences when the
|
|
293
|
+
load balancer fans them onto a busy worker.
|
|
709
294
|
|
|
710
295
|
## Logging
|
|
711
296
|
|
|
712
|
-
Default behaviour
|
|
713
|
-
|
|
714
|
-
- **`info` / `debug` → stdout**, **`warn` / `error` / `fatal` → stderr** (12-factor).
|
|
715
|
-
- **One structured access-log line per response**, info level, on stdout. Disable with `--no-log-requests` or `HYPERION_LOG_REQUESTS=0`.
|
|
716
|
-
- **Format auto-selects**: production envs → JSON (line-delimited, parseable by every log aggregator); TTY → coloured text; piped output without env hint → JSON.
|
|
297
|
+
Default behaviour:
|
|
717
298
|
|
|
718
|
-
|
|
299
|
+
- `info`/`debug` → stdout, `warn`/`error`/`fatal` → stderr (12-factor).
|
|
300
|
+
- One structured access-log line per response, `info` level. Disable
|
|
301
|
+
with `--no-log-requests` or `HYPERION_LOG_REQUESTS=0`.
|
|
302
|
+
- Format auto-selects: `RAILS_ENV=production`/`staging` → JSON; TTY →
|
|
303
|
+
coloured text; piped output without env hint → JSON.
|
|
719
304
|
|
|
720
|
-
|
|
305
|
+
Sample text (TTY default):
|
|
721
306
|
|
|
722
307
|
```
|
|
723
308
|
2026-04-26T18:40:04.112Z INFO [hyperion] message=request method=GET path=/api/v1/health status=200 duration_ms=46.63 remote_addr=127.0.0.1 http_version=HTTP/1.1
|
|
724
|
-
2026-04-26T18:40:04.123Z INFO [hyperion] message=request method=GET path=/api/v1/cached_data query="currency=USD" status=200 duration_ms=43.87 remote_addr=127.0.0.1 http_version=HTTP/1.1
|
|
725
309
|
```
|
|
726
310
|
|
|
727
|
-
JSON
|
|
311
|
+
Sample JSON (production / piped):
|
|
728
312
|
|
|
729
313
|
```json
|
|
730
314
|
{"ts":"2026-04-26T18:38:49.405Z","level":"info","source":"hyperion","message":"request","method":"GET","path":"/api/v1/health","status":200,"duration_ms":46.63,"remote_addr":"127.0.0.1","http_version":"HTTP/1.1"}
|
|
731
|
-
{"ts":"2026-04-26T18:38:49.411Z","level":"info","source":"hyperion","message":"request","method":"GET","path":"/api/v1/cached_data","query":"currency=USD","status":200,"duration_ms":40.64,"remote_addr":"127.0.0.1","http_version":"HTTP/1.1"}
|
|
732
315
|
```
|
|
733
316
|
|
|
734
|
-
### Hot-path optimisations
|
|
735
|
-
|
|
736
|
-
The default-ON access log path is engineered to stay near-zero cost:
|
|
737
|
-
|
|
738
|
-
- **Per-thread cached `iso8601(3)` timestamp** — one allocation per millisecond per thread, reused across all requests in that millisecond.
|
|
739
|
-
- **Hand-rolled single-interpolation line builder** — bypasses generic `Hash#map.join`.
|
|
740
|
-
- **Per-thread 4 KiB write buffer** — flushes to stdout when full or on connection close. Cuts syscalls by roughly 32× under load.
|
|
741
|
-
- **Lock-free emit** — POSIX `write(2)` is atomic for writes ≤ PIPE_BUF (4096 B); a log line is ~200 B. No logger mutex.
|
|
742
|
-
|
|
743
317
|
## Metrics
|
|
744
318
|
|
|
745
|
-
`Hyperion.stats` returns a snapshot Hash with
|
|
319
|
+
`Hyperion.stats` returns a snapshot Hash with lock-free per-thread
|
|
320
|
+
counters (`connections_accepted`, `connections_active`, `requests_total`,
|
|
321
|
+
`requests_in_flight`, `responses_<code>`, `parse_errors`, `app_errors`,
|
|
322
|
+
`read_timeouts`, `requests_threadpool_dispatched`,
|
|
323
|
+
`requests_async_dispatched`, `c_loop_requests_total`).
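
For example, from a console attached to a running server (values are illustrative):

```ruby
require 'hyperion'

Hyperion.stats
# => {connections_accepted: 1234, connections_active: 7, requests_total: 8910, …}
```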
|
|
746
324
|
|
|
747
|
-
|
|
748
|
-
|
|
749
|
-
|
|
750
|
-
| `connections_active` | Currently in-flight connections. |
|
|
751
|
-
| `requests_total` | Lifetime request count. |
|
|
752
|
-
| `requests_in_flight` | Currently in-flight requests. |
|
|
753
|
-
| `responses_<code>` | One counter per status code emitted (`responses_200`, `responses_400`, …). |
|
|
754
|
-
| `parse_errors` | HTTP parse failures → 400. |
|
|
755
|
-
| `app_errors` | Rack app raised → 500. |
|
|
756
|
-
| `read_timeouts` | Per-connection read deadline hit. |
|
|
757
|
-
| `requests_threadpool_dispatched` | HTTP/1.1 connection handed to the worker pool (or served inline in `start_raw_loop` when `thread_count: 0`). The default dispatch path. |
|
|
758
|
-
| `requests_async_dispatched` | HTTP/1.1 connection served inline on the accept-loop fiber under `--async-io`. Operators can use the ratio against `requests_threadpool_dispatched` to verify fiber-cooperative I/O is actually engaged. |
|
|
759
|
-
|
|
760
|
-
```ruby
|
|
761
|
-
require 'hyperion'
|
|
762
|
-
Hyperion.stats
|
|
763
|
-
# => {connections_accepted: 1234, connections_active: 7, requests_total: 8910, …}
|
|
764
|
-
```
|
|
765
|
-
|
|
766
|
-
### Prometheus exporter
|
|
767
|
-
|
|
768
|
-
When `admin_token` is set in your config, Hyperion mounts a `/-/metrics` endpoint that emits Prometheus text-format v0.0.4. Same token guards both `/-/metrics` (GET) and `/-/quit` (POST); auth is via the `X-Hyperion-Admin-Token` header.
|
|
325
|
+
When `admin_token` is set, `/-/metrics` emits Prometheus text-format
|
|
326
|
+
v0.0.4. Auth is via the `X-Hyperion-Admin-Token` header (same token
|
|
327
|
+
guards `POST /-/quit`):
|
|
769
328
|
|
|
770
329
|
```sh
|
|
771
330
|
$ curl -s -H 'X-Hyperion-Admin-Token: secret' http://127.0.0.1:9292/-/metrics
|
|
772
331
|
# HELP hyperion_requests_total Total HTTP requests handled
|
|
773
332
|
# TYPE hyperion_requests_total counter
|
|
774
333
|
hyperion_requests_total 8910
|
|
775
|
-
# HELP hyperion_bytes_written_total Total bytes written to response sockets
|
|
776
|
-
# TYPE hyperion_bytes_written_total counter
|
|
777
|
-
hyperion_bytes_written_total 2351023
|
|
778
|
-
# HELP hyperion_responses_status_total Responses by HTTP status code
|
|
779
|
-
# TYPE hyperion_responses_status_total counter
|
|
780
334
|
hyperion_responses_status_total{status="200"} 8521
|
|
781
335
|
hyperion_responses_status_total{status="404"} 12
|
|
782
|
-
hyperion_responses_status_total{status="500"} 3
|
|
783
|
-
# … and so on for sendfile_responses_total, rejected_connections_total,
|
|
784
|
-
# slow_request_aborts_total, requests_async_dispatched_total, etc.
|
|
785
336
|
```
|
|
786
337
|
|
|
787
|
-
Any counter not in the known set (added
|
|
338
|
+
Any counter not in the known set (added via
|
|
339
|
+
`Hyperion.metrics.increment(:custom_thing)`) is auto-exported as
|
|
340
|
+
`hyperion_custom_thing` with a generic HELP line. Network-isolate the
|
|
341
|
+
admin endpoints if the listener is internet-facing — see
|
|
342
|
+
[docs/REVERSE_PROXY.md](docs/REVERSE_PROXY.md) for the nginx
|
|
343
|
+
`location /-/ { return 404; }` recipe.
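
A sketch of a custom counter wired through ordinary Rack middleware; `Hyperion.metrics.increment` is the call named above, while the middleware around it is illustrative and not part of Hyperion:

```ruby
# Illustrative only: counts checkout requests. The counter surfaces at
# /-/metrics as hyperion_checkout_requests with a generic HELP line.
class CheckoutCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    Hyperion.metrics.increment(:checkout_requests) if env['PATH_INFO'].start_with?('/checkout')
    @app.call(env)
  end
end

# config.ru
#   use CheckoutCounter
#   run MyApp
```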
|
|
344
|
+
|
|
345
|
+
## Compatibility
|
|
346
|
+
|
|
347
|
+
| Component | Version |
|
|
348
|
+
|---|---|
|
|
349
|
+
| Ruby | 3.3+ (transitive `protocol-http2 ~> 0.26` floor) |
|
|
350
|
+
| Rack | 3.x |
|
|
351
|
+
| Rails | verified up to 8.1 |
|
|
352
|
+
| Linux kernel | 5.x+ for io_uring opt-in; 4.x+ otherwise |
|
|
353
|
+
| macOS | works (TLS, h2, WebSockets, `accept4` fallback path) |
|
|
788
354
|
|
|
789
|
-
|
|
355
|
+
Per the Rack 3 spec, Hyperion auto-sets `SERVER_SOFTWARE`, `rack.version`,
|
|
356
|
+
`REMOTE_ADDR`, IPv6-safe `Host` parsing, CRLF guard. The
|
|
357
|
+
`Hyperion::FiberLocal.install!` opt-in shim handles the residual
|
|
358
|
+
`Thread.current.thread_variable_*` footgun in older Rails idioms;
|
|
359
|
+
modern Rails 7.1+ already uses Fiber storage natively.
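
If an older gem in the stack still leans on `Thread.current.thread_variable_get`/`set`, the shim is a one-liner; the call is the one named above, and placing it in a Rails initializer is an assumption about where it fits your app:

```ruby
# config/initializers/hyperion.rb
# Opt-in shim for legacy thread_variable_* callers; harmless on Rails 7.1+,
# which already uses Fiber storage natively.
Hyperion::FiberLocal.install!
```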
|
|
790
360
|
|
|
791
|
-
##
|
|
361
|
+
## Reproducing benchmarks
|
|
792
362
|
|
|
793
|
-
|
|
363
|
+
Every number in this README is reproducible. Per-row commands:
|
|
794
364
|
|
|
795
365
|
```sh
|
|
796
|
-
|
|
797
|
-
|
|
366
|
+
# Setup (once)
|
|
367
|
+
bundle install
|
|
368
|
+
bundle exec rake compile
|
|
798
369
|
|
|
799
|
-
|
|
370
|
+
# Hello via Server.handle_static + io_uring (134k r/s row)
|
|
371
|
+
HYPERION_IO_URING_ACCEPT=1 bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/hello_static.ru &
|
|
372
|
+
wrk -t4 -c100 -d20s --latency http://127.0.0.1:9292/
|
|
800
373
|
|
|
801
|
-
|
|
374
|
+
# Dynamic block via Server.handle (9.4k r/s row)
|
|
375
|
+
bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/hello_handle_block.ru &
|
|
376
|
+
wrk -t4 -c100 -d20s --latency http://127.0.0.1:9292/
|
|
802
377
|
|
|
803
|
-
|
|
378
|
+
# Generic Rack hello (4.7k r/s row)
|
|
379
|
+
bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/hello.ru &
|
|
380
|
+
wrk -t4 -c100 -d20s --latency http://127.0.0.1:9292/
|
|
381
|
+
|
|
382
|
+
# CPU JSON via block form (5.9k r/s row)
|
|
383
|
+
bundle exec bin/hyperion -w 1 -t 5 -p 9292 bench/work.ru &
|
|
384
|
+
wrk -t4 -c200 -d15s --latency http://127.0.0.1:9292/
|
|
385
|
+
|
|
386
|
+
# 4-way comparator (Hyperion vs Puma vs Falcon vs Agoo)
|
|
387
|
+
bash bench/4way_compare.sh
|
|
388
|
+
|
|
389
|
+
# gRPC unary + streaming (Hyperion side)
|
|
390
|
+
GHZ=/tmp/ghz TRIALS=3 DURATION=15s WARMUP_DURATION=3s bash bench/grpc_stream_bench.sh
|
|
391
|
+
|
|
392
|
+
# Idle keep-alive RSS sweep (10k conns × 30s hold)
|
|
393
|
+
bash bench/keepalive_memory.sh
|
|
394
|
+
```
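
The `.ru` files referenced above live under `bench/` in the repo. If you only want a quick local reproduction of the generic Rack hello row, a minimal stand-in works (the in-repo `bench/hello.ru` may differ):

```ruby
# hello.ru -- minimal stand-in for bench/hello.ru; the repo's file may differ.
run ->(env) { [200, { 'content-type' => 'text/plain' }, ['Hello, World!']] }
```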
|
|
804
395
|
|
|
805
|
-
|
|
806
|
-
-
|
|
807
|
-
|
|
808
|
-
|
|
396
|
+
PG benches (`pg_concurrent.ru`, `pg_mixed.ru`) live in the
|
|
397
|
+
[hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg)
|
|
398
|
+
companion repo — they require a running Postgres and the companion
|
|
399
|
+
gem.
|
|
400
|
+
|
|
401
|
+
When numbers from your host don't match the published numbers, the
|
|
402
|
+
most likely explanations (in order): (1) bench-host noise — single-VM
|
|
403
|
+
benches drift 10–30% over days; (2) Puma version mismatch (sweep used
|
|
404
|
+
Puma 8.0.1; the in-repo Gemfile pins `~> 6.4`); (3) different kernel
|
|
405
|
+
or Ruby; (4) different `-t` / `-c` (apples-to-apples requires
|
|
406
|
+
identical worker count, thread count, wrk concurrency, payload, and
|
|
407
|
+
TLS cipher).
|
|
408
|
+
|
|
409
|
+
## Release history
|
|
410
|
+
|
|
411
|
+
See [CHANGELOG.md](CHANGELOG.md). Recent: 2.14.0 (gRPC streaming ghz
|
|
412
|
+
numbers; dynamic-block C dispatch — `Server.handle { |env| ... }` lifts
|
|
413
|
+
hello to 9,422 r/s and CPU JSON to 5,897 r/s; `Server#stop` accept-wake
|
|
414
|
+
on Linux; io_uring 4h soak), 2.13.0 (response head builder C-rewrite;
|
|
415
|
+
gRPC streaming RPCs; soak harness), 2.12.0 (C connection lifecycle;
|
|
416
|
+
io_uring loop hits 134k r/s; gRPC unary trailers; SO_REUSEPORT
|
|
417
|
+
audit), 2.11.0 (HPACK CGlue default; h2 dispatch-pool warmup), 2.10.x
|
|
418
|
+
(`PageCache`, `Server.handle` direct routes, TCP_NODELAY at accept).
|
|
419
|
+
|
|
420
|
+
## Links
|
|
421
|
+
|
|
422
|
+
- [CHANGELOG.md](CHANGELOG.md) — per-stream releases.
|
|
423
|
+
- [docs/BENCH_HYPERION_2_11.md](docs/BENCH_HYPERION_2_11.md) — current
|
|
424
|
+
4-way matrix + 2.14-D gRPC numbers.
|
|
425
|
+
- [docs/BENCH_HYPERION_2_0.md](docs/BENCH_HYPERION_2_0.md) — historical
|
|
426
|
+
2.10-B baseline (preserved for archaeology).
|
|
427
|
+
- [docs/BENCH_2026_04_27.md](docs/BENCH_2026_04_27.md) — real Rails 8.1
|
|
428
|
+
app sweep (Exodus platform).
|
|
429
|
+
- [docs/OBSERVABILITY.md](docs/OBSERVABILITY.md) — metrics + Grafana.
|
|
430
|
+
- [docs/WEBSOCKETS.md](docs/WEBSOCKETS.md) — RFC 6455 surface.
|
|
431
|
+
- [docs/MIGRATING_FROM_PUMA.md](docs/MIGRATING_FROM_PUMA.md) — drop-in
|
|
432
|
+
guide.
|
|
433
|
+
- [docs/REVERSE_PROXY.md](docs/REVERSE_PROXY.md) — nginx fronting.
|
|
809
434
|
|
|
810
435
|
## Credits
|
|
811
436
|
|
|
812
|
-
- Vendored [llhttp](https://github.com/nodejs/llhttp) (Node.js's HTTP
|
|
813
|
-
|
|
437
|
+
- Vendored [llhttp](https://github.com/nodejs/llhttp) (Node.js's HTTP
|
|
438
|
+
parser, MIT) under `ext/hyperion_http/llhttp/`.
|
|
439
|
+
- HTTP/2 framing and HPACK via
|
|
440
|
+
[`protocol-http2`](https://github.com/socketry/protocol-http2).
|
|
814
441
|
- Fiber scheduler via [`async`](https://github.com/socketry/async).
|
|
815
442
|
|
|
816
443
|
## License
|