hyperion-rb 1.2.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4174d7143559b6bd05bdc78acf4377add8aca32f885e933786c50f31c956e9ba
- data.tar.gz: f163a7f5bd2b363f37205e1f1ba845fb0324c329cc15b4c1144e6d519a1bc60a
+ metadata.gz: 53235f97fb1e384507f62373cd12180ade1eecc8df6f9ce75145ba60403a983e
+ data.tar.gz: 1c58ede296c54a098d26cae2b69837f23bf60fb5ce4239c033446f583f12a4df
  SHA512:
- metadata.gz: ea61b5e3298ae50b9b6530d51e1f9a5299b0ccfea3b99248230a601a96ebaf764b5d7978215e09a7d73ed7e85ee3f8b5f7d13d40a830ca5c4482a9d192b2919a
- data.tar.gz: ed8e125b2ff0c9aab53f3178d0f31d1b0db028f8ebf3a40d09ba11e86c3a62756a3c15c7eb3b288faf8dee6f1062159d372cb0c08997a76fedcb97d485d87283
+ metadata.gz: 8484e7168d8ba27312edece5c86af770ef0604bf85d13cdde37c5f4c87b9de0417216f241e073340bedeabef292fc4ec032a7379a76a836e236f6f129c97bcd3
+ data.tar.gz: 89ac23881d0ddd4beff79d08551fa6f7e8399948c1607a3a32475202780170dc9c036f1ef93bd1f17dc5a34e1d91d9901bf3e4119e9efea05d5aa528b22271ff
data/CHANGELOG.md CHANGED
@@ -1,5 +1,24 @@
  # Changelog

+ ## [1.3.0] - 2026-04-27
+
+ Adds the structural moat for fiber-cooperative I/O. No breaking changes.
+
+ ### Added
+ - **`async_io: true` config flag** (also `--async-io` CLI flag) — when enabled, the plain HTTP/1.1 accept loop runs each connection on a fiber under `Async::Scheduler` instead of handing it to a worker thread. This is what makes [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (and other Async-aware libraries) actually cooperate: each fiber yields the OS thread on socket waits, so one thread can serve N concurrent in-flight DB queries instead of 1. **Default off** to keep the 1.2.0 raw-loop perf for fiber-unaware apps. Trade-off: ~5% throughput hit on hello-world; 5–10× throughput on PG-bound workloads when paired with hyperion-async-pg + a fiber-aware connection pool.
+ - **Bench validation (macOS, 50 ms PG round-trip, 200 concurrent wrk conns):**
+
+ | | r/s | p99 |
+ |---|---:|---:|
+ | Puma 7.2 `-t 5` + plain pg (pool=5) | 88.9 | 2.31 s |
+ | **Hyperion 1.3.0 `--async-io -t 5` + hyperion-async-pg (FiberPool=64)** | **1,103.7** | **237 ms** |
+
+ **12.4× throughput, 9.7× lower p99.** Theoretical ceiling at pool=64 + 50 ms query is ~1,280 r/s; achieved 86% of it. Linux numbers will land in a follow-up bench section.
+
+ ### Changed
+ - TLS / HTTP/2 paths still always use the Async accept loop (unchanged); they ignore the `async_io` flag because they need the scheduler for ALPN handshake yields and per-stream fiber dispatch anyway.
+ - When `async_io: true`, plain HTTP/1.1 dispatch bypasses the thread pool and serves the connection inline on the calling fiber. The pool stays in use for the TLS path's `app.call` hops on each h2 stream.
+
  ## [1.2.0] - 2026-04-27

  Production hardening + perf round 2. No breaking changes.
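
Pulling the 1.3.0 entry's moving parts together, the opt-in boils down to roughly the following sketch. Only the gem names, `Hyperion::AsyncPg.install!`, and the `--async-io` flag come from this diff; the version constraint and file layout are illustrative.

```ruby
# Gemfile — pair the server with the companion async PG gem
gem 'hyperion-rb', '~> 1.3'
gem 'hyperion-async-pg', require: 'hyperion/async_pg'

# At boot (e.g. top of config.ru): install the async PG hooks, then start the
# server with the new flag so plain HTTP/1.1 connections run on fibers:
#   bundle exec hyperion --async-io -t 5 config.ru
Hyperion::AsyncPg.install!
```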
data/README.md CHANGED
@@ -29,26 +29,27 @@ All numbers are real wrk runs against published Hyperion configs. Hyperion ships

  ### Hello-world Rack app

- `bench/hello.ru`, single worker, parity threads (`-t 16` vs Puma `-t 16:16`), 4 wrk threads / 50 connections / 10s, macOS arm64 / Ruby 3.3.3:
+ `bench/hello.ru`, single worker, parity threads (`-t 5` vs Puma `-t 5:5`), 4 wrk threads / 100 connections / 15s, macOS arm64 / Ruby 3.3.3, Hyperion 1.2.0:

- | | r/s | p99 |
- |---|---:|---:|
- | **Hyperion default (logs ON)** | **23,885** | **1.05 ms** |
- | Hyperion `--no-log-requests` | 24,222 | 1.00 ms |
- | Puma `-t 16:16` | 18,794 | 30.89 ms |
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | **Hyperion 1.2.0** (default, logs ON) | **22,496** | **502 µs** | **1×** |
+ | Falcon 0.55.3 `--count 1` | 22,199 | 5.36 ms | 11× worse |
+ | Puma 7.1.0 `-t 5:5` | 20,400 | 422.85 ms | 845× worse |

- **1.27× Puma throughput, ~30× lower p99 — while emitting structured JSON access logs Puma doesn't.**
+ **Hyperion: 1.10× Puma throughput, parity with Falcon on throughput, ~10× lower p99 than Falcon and ~845× lower than Puma — while emitting structured JSON access logs the others don't.**

  ### Production cluster config (`-w 4`)

- Same bench app, `-w 4` cluster, parity threads. macOS arm64:
+ Same bench app, `-w 4` cluster, parity threads (`-t 5` everywhere), 4 wrk threads / 200 connections / 15s, macOS arm64:

- | | r/s | p99 |
- |---|---:|---:|
- | **Hyperion `-w 4 -t 10`** | **44,221** | **1.15 ms** |
- | Puma `-w 4 -t 10:10` | 37,929 | 17.06 ms |
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | Falcon `--count 4` | 48,197 | 4.84 ms | 5.9× worse |
+ | **Hyperion `-w 4 -t 5`** | **40,137** | **825 µs** | **1×** |
+ | Puma `-w 4 -t 5:5` | 34,793 | 177.76 ms | 215× worse (1 timeout) |

- **1.17× Puma throughput, ~15× lower p99.**
+ Falcon edges Hyperion by ~20% on raw r/s at `-w 4` on macOS hello-world. **Hyperion still leads on tail latency by 5.9× over Falcon and 215× over Puma**, and beats Puma on throughput by 1.15×. On Linux production-config and DB-backed workloads (below) Hyperion takes the r/s lead too — Falcon's macOS hello-world advantage disappears once the workload includes any actual work or the kernel is Linux.

  ### Linux production-config (DB-backed Rack)

@@ -60,7 +61,37 @@ Same bench app, `-w 4` cluster, parity threads. macOS arm64:
  | Hyperion `--no-log-requests` | 6,364 | 1.114× |
  | Puma `-w 4 -t 10:10` (no per-req logs) | 5,715 | 1.000× |

- Bench is network-bound (~3-4 ms median is the PG + Redis round-trip). Hyperion's lead comes from cheaper per-request CPU: lock-free per-thread metrics, per-thread cached iso8601 timestamps in the access log, hand-rolled single-interpolation log line builder, no logger mutex (POSIX `write(2)` atomicity), C-extension response-head builder.
+ Bench is **wait-bound**: the ~3-4 ms median is the PG + Redis round-trip, dwarfing the per-request CPU work where Hyperion's optimisations live. With a synchronous `pg` driver, fibers don't help: every in-flight DB call still parks an OS thread, and both servers max out at `workers × threads` concurrent queries. Widening this gap requires either an async PG driver — see [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (companion gem; pair with `--async-io` and a fiber-aware pool, see "Async I/O — fiber concurrency on PG-bound apps" below) — or a CPU-bound workload, where Hyperion's lead becomes visible (next section).
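
To make "parks an OS thread" concrete, here is a sketch of the two modes using only the `pg` gem's public API. It illustrates the libpq-level mechanism that fiber cooperation relies on; it is not hyperion-async-pg's actual implementation, and the query is a stand-in for the bench's 50 ms round-trip.

```ruby
require 'pg'
require 'io/wait'

conn = PG.connect(ENV.fetch('DATABASE_URL'))

# Blocking style, the behaviour the paragraph above describes: the OS thread
# waits inside the driver until the round-trip completes, serving nothing else.
conn.exec('SELECT pg_sleep(0.05)')

# Cooperative style: issue the query, then wait on the connection's socket.
# Under a fiber scheduler (what `--async-io` provides), IO#wait_readable
# suspends only this fiber; the thread keeps serving other requests.
conn.send_query('SELECT pg_sleep(0.05)')
while conn.is_busy
  conn.socket_io.wait_readable # parks the fiber, not the thread, under a scheduler
  conn.consume_input           # feed newly arrived bytes to libpq
end
result = conn.get_result
conn.get_result                # drain the trailing nil so the connection is reusable
```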
+
+ ### Async I/O — fiber concurrency on PG-bound apps
+
+ `bench/pg_concurrent.ru` (50 ms PG query per request, pool sized for the server's concurrency model). macOS, Postgres over WAN, wrk `-t4 -c200 -d20s`:
+
+ | | r/s | p99 |
+ |---|---:|---:|
+ | Puma 7.2 `-t 5` + plain pg (pool=5) | 88.9 | 2.31 s |
+ | **Hyperion 1.3.0 `--async-io -t 5` + hyperion-async-pg (FiberPool=64)** | **1,103.7** | **237 ms** |
+
+ **12.4× throughput, 9.7× lower p99.** Puma is bottlenecked at `threads × 1 in-flight query` because plain `pg` blocks the OS thread on `recv()`. Hyperion + async-pg + a fiber-aware pool decouples concurrency from threads: 5 OS threads serve 64 concurrent in-flight queries via fiber cooperation. Theoretical ceiling at pool=64 + 50 ms query is 64 × (1 s / 0.05 s) = 1,280 r/s; achieved 1,103 r/s = 86% of it.
+
+ Three things must all be true to get this win:
+ 1. **`async_io: true`** in your Hyperion config (or `--async-io` CLI flag). Default is off to keep 1.2.0's raw-loop perf for fiber-unaware apps.
+ 2. **`hyperion-async-pg`** installed: `gem 'hyperion-async-pg', require: 'hyperion/async_pg'` + `Hyperion::AsyncPg.install!` at boot.
+ 3. **Fiber-aware connection pool.** The popular `connection_pool` gem is NOT — its Mutex blocks the OS thread. Use [`async-pool`](https://github.com/socketry/async-pool), `Async::Semaphore`, or hand-roll one (see `bench/pg_concurrent.ru` for a 30-line FiberPool example, and the sketch after this list).
+
+ Skip any of these and you get parity with Puma at the same `-t`. Run the bench yourself: `MODE=async DATABASE_URL=... PG_POOL_SIZE=64 bundle exec hyperion --async-io -t 5 bench/pg_concurrent.ru` (in the [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) repo).
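
Requirement 3 is the one most apps trip over, so here is a minimal sketch of a fiber-aware pool in the spirit of the hand-rolled FiberPool the bench mentions. It uses `Async::Semaphore` (one of the options listed above); the class name and wiring are invented for illustration, and it assumes all checkouts happen on one scheduler thread, as under `--async-io`.

```ruby
require 'async/semaphore'
require 'pg'

# Minimal fiber-aware pool: checkout is guarded by Async::Semaphore, so a fiber
# that has to wait for a free connection suspends itself instead of blocking
# the OS thread (which is what a Mutex-based pool does while saturated).
class TinyFiberPool
  def initialize(size:, db_url:)
    @semaphore   = Async::Semaphore.new(size)
    @connections = Array.new(size) { PG.connect(db_url) }
  end

  # Check a connection out for the duration of the block, then return it.
  def with
    @semaphore.acquire do      # suspends the calling fiber if all permits are taken
      conn = @connections.pop  # holding a permit guarantees one is available
      begin
        yield conn
      ensure
        @connections.push(conn)
      end
    end
  end
end

# POOL = TinyFiberPool.new(size: 64, db_url: ENV.fetch('DATABASE_URL'))
# POOL.with { |conn| conn.exec('SELECT pg_sleep(0.05)') }
```

The semaphore's only job over a plain Mutex is scheduling behaviour: a saturated `acquire` parks the waiting fiber with the scheduler, which is what lets the 64-deep in-flight query pipeline in the table above exist without parking threads.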
+
+ ### CPU-bound JSON workload
+
+ `bench/work.ru` — handler builds a 50-key fixture, JSON-encodes a fresh response per request (~8 KB body), processes a 6-cookie header chain. wrk `-t4 -c200 -d15s`, macOS arm64 / Ruby 3.3.3, 1.2.0:
+
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | Falcon `--count 4` | 46,166 | 20.17 ms | 24× worse |
+ | **Hyperion `-w 4 -t 5`** | **43,924** | **824 µs** | **1×** |
+ | Puma `-w 4 -t 5:5` | 36,383 | 166.30 ms (47 socket errors) | 200× worse |
+
+ **1.21× Puma throughput, 200× lower p99.** This is the gap that hides behind PG-round-trip noise on the DB bench. Hyperion's per-request CPU savings (lock-free per-thread metrics, frozen header keys in the Rack adapter, C-ext response head builder, cached iso8601 timestamps, cached HTTP Date header) land on the wire when the workload is CPU-bound. Falcon edges us 5% on raw r/s but with 24× worse tail — a different tradeoff curve. Reproduce: `bundle exec bin/hyperion -w 4 -t 5 -p 9292 bench/work.ru`.
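
`bench/work.ru` itself is not part of this diff; going only by the one-line description above, a hypothetical reconstruction of that kind of handler looks like this (key names, value sizes, and padding are invented to match the described shape).

```ruby
# config.ru sketch in the spirit of bench/work.ru (hypothetical, see note above)
require 'json'

# 50-key fixture that gets re-encoded into a fresh JSON body on every request.
FIXTURE = (1..50).to_h { |i| ["key_#{i}", "value_#{i}_" * 10] }.freeze

run lambda { |env|
  cookies = (env['HTTP_COOKIE'] || '').split('; ')  # walk the cookie header chain
  body = JSON.generate(data: FIXTURE, cookies: cookies.size, pad: 'x' * 2048)
  [200, { 'content-type' => 'application/json' }, [body]]
}
```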

  ### Real Rails 8.1 app (single worker, parity threads `-t 16`)

@@ -77,6 +108,25 @@ Health endpoint that traverses the full middleware chain (rack-attack, locale re

  On Grape and Rails-controller workloads Puma hits wrk's 2 s timeout cap on ~⅔ of requests — its real p99 is censored above 2 s. Hyperion serves all of its requests under 1.2 s with 0 to 16 timeouts. **1.14–1.48× Puma throughput** depending on endpoint.

+ ### Static-asset serving (sendfile zero-copy path, 1.2.0+)
+
+ `bench/static.ru` (`Rack::Files` over a 1 MiB asset), `-w 1`, `wrk -t4 -c100 -d15s`, macOS arm64 / Ruby 3.3.3:
+
+ | | r/s | p99 | transferred | tail vs winner |
+ |---|---:|---:|---:|---:|
+ | **Hyperion (sendfile path)** | **2,069** | **3.10 ms** | 30.4 GB | **1×** |
+ | Puma `-w 1 -t 5:5` | 2,109 | 566.16 ms | 31.0 GB | 183× worse |
+ | Falcon `--count 1` | 1,269 | 801.01 ms | 18.7 GB | 258× worse (28 timeouts) |
+
+ Throughput is bandwidth-bound on localhost (≈2 GB/s = the loopback memory ceiling), so the throughput column looks like parity. The actual win is in the **tail latency** column: Hyperion's `IO.copy_stream` → `sendfile(2)` path skips userspace entirely, while Puma allocates a String per response and Falcon serializes more aggressively. On real network paths sendfile widens the gap further (kernel-to-NIC zero-copy).
+
+ Reproduce:
+ ```sh
+ ruby -e 'File.binwrite("/tmp/hyperion_bench_asset_1m.bin", "x" * (1024*1024))'
+ bundle exec bin/hyperion -p 9292 bench/static.ru
+ wrk --latency -t4 -c100 -d15s http://127.0.0.1:9292/hyperion_bench_asset_1m.bin
+ ```
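
`bench/static.ru` is likewise not included in the diff; a minimal rackup file that exercises the same `Rack::Files` path and matches the asset location created above would be something like:

```ruby
# config.ru sketch standing in for bench/static.ru (hypothetical)
require 'rack/files'

# Serves /tmp/hyperion_bench_asset_1m.bin at /hyperion_bench_asset_1m.bin.
# Rack::Files returns a file-backed body, which lets the server stream it with
# IO.copy_stream and hit the sendfile(2) path described above.
run Rack::Files.new('/tmp')
```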
+

  ### Concurrency at scale (architectural advantages)

  These workloads demonstrate structural differences between Hyperion's fiber-per-connection / fiber-per-stream model and Puma's thread-pool model. Numbers are illustrative; the architecture is what matters. Run on Ubuntu 24.04 / Ruby 3.3.3, single worker, h2load `-c <conns> -n 100000 --rps 1000 --h1`.
data/lib/hyperion/cli.rb CHANGED
@@ -57,6 +57,10 @@ module Hyperion
  'Enable Ruby YJIT (default: auto on RAILS_ENV/RACK_ENV=production/staging)') do |v|
  cli_opts[:yjit] = v
  end
+ o.on('--[no-]async-io',
+ 'Run plain HTTP/1.1 connections under Async::Scheduler (required for hyperion-async-pg and other fiber-cooperative I/O; default off)') do |v|
+ cli_opts[:async_io] = v
+ end
  o.on('-h', '--help', 'show help') do
  puts o
  exit 0
@@ -114,7 +118,8 @@ module Hyperion
  read_timeout: config.read_timeout,
  max_pending: config.max_pending,
  max_request_read_seconds: config.max_request_read_seconds,
- h2_settings: Master.build_h2_settings(config))
+ h2_settings: Master.build_h2_settings(config),
+ async_io: config.async_io)
  server.listen
  scheme = tls ? 'https' : 'http'
  Hyperion.logger.info { { message: 'listening', url: "#{scheme}://#{server.host}:#{server.port}" } }
@@ -31,6 +31,7 @@ module Hyperion
  admin_token: nil, # String. When set, exposes admin endpoints (POST /-/quit triggers graceful drain; GET /-/metrics returns Prometheus-format Hyperion.stats). Same token guards both. nil disables admin entirely (paths fall through to the app).
  max_pending: nil, # Integer, e.g. 256. When the per-worker accept inbox has this many queued connections, additional accepts are rejected with HTTP 503 + Retry-After:1 instead of being queued. nil disables (current behaviour: unbounded queue).
  max_request_read_seconds: 60, # Numeric. Total wallclock budget (seconds) for reading the request line + headers + body for ONE request. Defends against slowloris-style drips that satisfy the per-recv read_timeout but never finish the request. Resets between requests on a keep-alive connection. nil disables.
+ async_io: false, # When true, the plain HTTP/1.1 accept loop runs each connection on a fiber under Async::Scheduler instead of handing it to a worker thread. Required for fiber-cooperative I/O (e.g. hyperion-async-pg). Costs ~5% throughput on hello-world; in exchange one OS thread can serve N concurrent in-flight DB queries on wait-bound workloads. TLS / HTTP/2 paths always use the async loop and ignore this flag.
  h2_max_concurrent_streams: 128, # HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS — cap on simultaneously-open streams per connection. Falcon: 64. nil leaves protocol-http2 default (0xFFFFFFFF).
  h2_initial_window_size: 1_048_576, # HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE (octets) — flow-control window per stream at open. Bigger = fewer WINDOW_UPDATE round-trips on large bodies. Spec default is 65535. nil → leave protocol default.
  h2_max_frame_size: 1_048_576, # HTTP/2 SETTINGS_MAX_FRAME_SIZE (octets) — biggest DATA/HEADERS frame we'll accept. Spec floor 16384, ceiling 16777215. We pick 1 MiB to match common CDNs without unbounded buffer growth. nil → leave protocol default (16384).
@@ -166,7 +166,8 @@ module Hyperion
  worker_index: worker_index,
  max_pending: @config.max_pending,
  max_request_read_seconds: @config.max_request_read_seconds,
- h2_settings: Master.build_h2_settings(@config)
+ h2_settings: Master.build_h2_settings(@config),
+ async_io: @config.async_io
  }
  # Hand the inherited socket to the worker in :share mode. In
  # :reuseport mode the worker binds its own with SO_REUSEPORT.
@@ -42,7 +42,7 @@ module Hyperion

  def initialize(app:, host: '127.0.0.1', port: 9292, read_timeout: DEFAULT_READ_TIMEOUT_SECONDS,
  tls: nil, thread_count: DEFAULT_THREAD_COUNT, max_pending: nil,
- max_request_read_seconds: 60, h2_settings: nil)
+ max_request_read_seconds: 60, h2_settings: nil, async_io: false)
  @host = host
  @port = port
  @app = app
@@ -52,6 +52,7 @@ module Hyperion
  @max_pending = max_pending
  @max_request_read_seconds = max_request_read_seconds
  @h2_settings = h2_settings
+ @async_io = async_io
  @thread_pool = nil
  @stopped = false
  end
@@ -107,16 +108,23 @@ module Hyperion
  listen unless @server
  @thread_pool = ThreadPool.new(size: @thread_count, max_pending: @max_pending) if @thread_count.positive?

- if @tls
+ if @tls || @async_io
  # TLS path: ALPN may pick `h2`, and h2 spawns one fiber per stream
  # inside Http2Handler. Keep the Async wrapper so the scheduler is
  # available for those fibers and for handshake yields.
+ #
+ # async_io: true: operator opt-in for plain HTTP/1.1. The Async wrap
+ # is required when callers want fiber cooperative I/O — e.g.
+ # `hyperion-async-pg` yielding while a Postgres query is in flight.
+ # Pays ~5% throughput vs the raw-loop fast path; in exchange one
+ # OS thread can serve N concurrent in-flight DB queries instead of 1.
  start_async_loop
  else
- # Plain HTTP/1.1: the worker thread owns each connection for its
- # lifetime, so the Async wrapper adds zero value (no fibers ever
- # run on this loop's task). Skip it — pure IO.select + accept_nonblock
- # shaves measurable overhead off the accept hot path.
+ # Plain HTTP/1.1, async_io: false (default): the worker thread owns
+ # each connection for its lifetime, so the Async wrapper adds zero
+ # value (no fibers ever run on this loop's task). Skip it — pure
+ # IO.select + accept_nonblock shaves measurable overhead off the
+ # accept hot path.
  start_raw_loop
  end
  ensure
@@ -174,6 +182,15 @@ module Hyperion
  # handler still uses the pool's `#call` for app.call hops on each
  # stream (one per stream, not one per connection).
  Http2Handler.new(app: @app, thread_pool: @thread_pool, h2_settings: @h2_settings).serve(socket)
+ elsif @async_io
+ # async_io plain HTTP/1.1: serve inline on the calling fiber so the
+ # request runs *under* Async::Scheduler. This is what makes
+ # hyperion-async-pg (and other Async-aware libraries) actually
+ # cooperate — each fiber yields the OS thread on socket waits, so
+ # one thread can serve N concurrent in-flight DB queries. The
+ # thread pool is intentionally bypassed here: handing the socket
+ # to a worker thread strips the scheduler context.
+ Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
  elsif @thread_pool
  # HTTP/1.1 (e.g. TLS-wrapped after ALPN picked http/1.1): hand the
  # connection to a worker thread. The fiber that called dispatch
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Hyperion
- VERSION = '1.2.0'
+ VERSION = '1.3.0'
  end
@@ -20,7 +20,7 @@ module Hyperion
  thread_count: Server::DEFAULT_THREAD_COUNT,
  config: nil, worker_index: 0, listener: nil,
  max_pending: nil, max_request_read_seconds: 60,
- h2_settings: nil)
+ h2_settings: nil, async_io: false)
  @host = host
  @port = port
  @app = app
@@ -33,6 +33,7 @@ module Hyperion
  @max_pending = max_pending
  @max_request_read_seconds = max_request_read_seconds
  @h2_settings = h2_settings
+ @async_io = async_io
  end

  def run
@@ -51,7 +52,8 @@ module Hyperion
  thread_count: @thread_count,
  max_pending: @max_pending,
  max_request_read_seconds: @max_request_read_seconds,
- h2_settings: @h2_settings)
+ h2_settings: @h2_settings,
+ async_io: @async_io)
  tcp_server = @listener || build_reuseport_listener
  server.adopt_listener(tcp_server)

metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: hyperion-rb
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.0
4
+ version: 1.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrey Lobanov