hyperion-rb 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5670da7700c48436d0e3ded790cf5df090deeebd38bc0b7024b9b6e95c20b5c8
- data.tar.gz: 10bedef6e02717511eb83bea0044e71978d7e13b9d93d0ac37310a89f6581e9a
+ metadata.gz: 53235f97fb1e384507f62373cd12180ade1eecc8df6f9ce75145ba60403a983e
+ data.tar.gz: 1c58ede296c54a098d26cae2b69837f23bf60fb5ce4239c033446f583f12a4df
  SHA512:
- metadata.gz: e791cdd9271cb954ddc11ee037ced8c182fffa4c8b27ded1d0c5672cada1d62fb4095d9e4c440136ce8eeed746eca6e4d99ebb3b1e42a2bc9bbd7bce5c1d9615
- data.tar.gz: 4728b4bf159583fc6f46bd8c33dbcf916b74dddd49dd685159d39950112f5716cdc8108903d0ca312b31eef397d2237fab9d2f34d51e90822a7d3cab9c1b6691
+ metadata.gz: 8484e7168d8ba27312edece5c86af770ef0604bf85d13cdde37c5f4c87b9de0417216f241e073340bedeabef292fc4ec032a7379a76a836e236f6f129c97bcd3
+ data.tar.gz: 89ac23881d0ddd4beff79d08551fa6f7e8399948c1607a3a32475202780170dc9c036f1ef93bd1f17dc5a34e1d91d9901bf3e4119e9efea05d5aa528b22271ff
data/CHANGELOG.md CHANGED
@@ -1,5 +1,45 @@
  # Changelog
 
+ ## [1.3.0] - 2026-04-27
+
+ Adds the structural moat for fiber-cooperative I/O. No breaking changes.
+
+ ### Added
+ - **`async_io: true` config flag** (also `--async-io` CLI flag) — when enabled, the plain HTTP/1.1 accept loop runs each connection on a fiber under `Async::Scheduler` instead of handing it to a worker thread. This is what makes [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (and other Async-aware libraries) actually cooperate: each fiber yields the OS thread on socket waits, so one thread can serve N concurrent in-flight DB queries instead of 1. **Default off** to keep the 1.2.0 raw-loop perf for fiber-unaware apps. Trade-off: ~5% throughput hit on hello-world; 5–10× throughput on PG-bound workloads when paired with hyperion-async-pg + a fiber-aware connection pool.
+ - **Bench validation (macOS, 50ms PG round-trip, 200 concurrent wrk conns):**
+
+ | | r/s | p99 |
+ |---|---:|---:|
+ | Puma 7.2 `-t 5` + plain pg (pool=5) | 88.9 | 2.31 s |
+ | **Hyperion 1.3.0 `--async-io -t 5` + hyperion-async-pg (FiberPool=64)** | **1,103.7** | **237 ms** |
+
+ **12.4× throughput, 9.7× lower p99.** Theoretical ceiling at pool=64 + 50ms query is ~1280 r/s; achieved 86% of it. Linux numbers will land in a follow-up bench section.
+
+ ### Changed
+ - TLS / HTTP/2 paths still always use the Async accept loop (unchanged); they ignore the `async_io` flag because they need the scheduler for ALPN handshake yields and per-stream fiber dispatch anyway.
+ - When `async_io: true`, plain HTTP/1.1 dispatch bypasses the thread pool and serves the connection inline on the calling fiber. The pool stays in use for the TLS path's `app.call` hops on each h2 stream.
+
+ ## [1.2.0] - 2026-04-27
+
+ Production hardening + perf round 2. No breaking changes.
+
+ ### Added
+ - **Zero-copy sendfile path** — when a Rack body responds to `#to_path` (e.g. `Rack::Files`, asset uploads), `ResponseWriter` uses `IO.copy_stream(file, socket)` which triggers `sendfile(2)` on Linux for plain TCP. Eliminates the ~MB-sized String allocation per static-asset response. Falls back to userspace copy on TLS / non-Linux but still avoids the userspace String build. New metrics: `:sendfile_responses`, `:tls_zerobuf_responses`.
+ - **Hot fork warmup (`Hyperion.warmup!`)** — master pre-allocates the Rack env Hash pool, primes the C extension's lazy state, and touches commonly-resolved constants before `before_fork`. Workers inherit the warm pools via Copy-on-Write. Removes the first-N-requests-after-fork allocation tax.
+ - **Backpressure (`max_pending`)** — when the thread pool's inbox queue exceeds the configured depth, new accepts get HTTP 503 + `Retry-After: 1` and the socket is closed immediately (no Rack dispatch, no access-log line). Default off (nil); opt in by setting an Integer. New metric: `:rejected_connections`.
+ - **Prometheus exporter** — `AdminMiddleware` now serves `GET /-/metrics` in addition to `POST /-/quit` (same token). Renders `Hyperion.stats` as Prometheus text exposition v0.0.4. Counter names follow the `hyperion_<key>_total` convention; `:responses_<code>` keys are grouped under `hyperion_responses_status_total{status="<code>"}`.
+ - **Slow-client total-deadline (`max_request_read_seconds`)** — per-request wallclock cap on the request-line + headers read phase (default 60s). Defense-in-depth against slowloris: a malicious client can no longer dribble 1 byte per `read_timeout` window indefinitely. On overrun, Hyperion writes 408 + closes. Resets per request on keep-alive sessions. New metric: `:slow_request_aborts`.
+ - **HTTP/2 SETTINGS tuning** — Falcon-class defaults shipped: `MAX_CONCURRENT_STREAMS=128`, `INITIAL_WINDOW_SIZE=1MiB`, `MAX_FRAME_SIZE=1MiB`, `MAX_HEADER_LIST_SIZE=64KiB`. All four overridable via Config DSL (`h2_max_concurrent_streams` etc). Out-of-spec values are clamped + warned, not crashed.
+ - **`docs/REVERSE_PROXY.md`** — nginx + AWS ALB samples, X-Forwarded-* semantics, admin-endpoint hardening at the edge. Includes the documented gotcha that ALB-to-target HTTP/2 strips WebSocket upgrade headers (use HTTP/1.1 upstream).
+
+ ### Changed
+ - **`ResponseWriter` Date header now uses `cached_date`** — the per-thread, per-second cache that landed in 1.1.0 was never wired into the hot path. It is now. Eliminates ~3 String allocations per response (`Time.now.httpdate` → cached String reuse).
+ - **`AdminMiddleware`** refactored: shared `authorize` helper between `/-/quit` and `/-/metrics`; `PATH` constant split into `PATH_QUIT` + `PATH_METRICS`.
+ - **`Hyperion::Logger` per-thread access buffer key** is now namespaced per Logger instance (already shipped as a 1.1.0 follow-up fix; documented here for completeness).
+
+ ### Fixed
+ - N/A — no regressions discovered between 1.1.0 and 1.2.0.
+
  ## [1.1.0] - 2026-04-27
 
  First minor release after 1.0.0. Production hardening + perf wins, no breaking changes.
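The counter-naming convention described in the 1.2.0 Prometheus exporter note (plain counters as `hyperion_<key>_total`, `:responses_<code>` keys grouped under one labelled family) can be sketched roughly as follows. This is a hypothetical illustration of the convention only, not the gem's actual `PrometheusExporter` source; the stats keys used below are assumptions.

```ruby
# Sketch of the documented naming convention: every stats key becomes a
# hyperion_<key>_total counter, except responses_<code> keys, which are
# collapsed into one labelled hyperion_responses_status_total family.
def render_prometheus(stats)
  stats.flat_map { |key, value|
    k = key.to_s
    if (code = k[/\Aresponses_(\d{3})\z/, 1])
      [%(hyperion_responses_status_total{status="#{code}"} #{value})]
    else
      ["# TYPE hyperion_#{k}_total counter", "hyperion_#{k}_total #{value}"]
    end
  }.join("\n") << "\n"
end
```

Grouping the per-status counters under one metric family with a `status` label (rather than one family per code) is what lets PromQL sum or filter across status codes in a single expression.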
data/README.md CHANGED
@@ -29,26 +29,27 @@ All numbers are real wrk runs against published Hyperion configs. Hyperion ships
 
  ### Hello-world Rack app
 
- `bench/hello.ru`, single worker, parity threads (`-t 16` vs Puma `-t 16:16`), 4 wrk threads / 50 connections / 10s, macOS arm64 / Ruby 3.3.3:
+ `bench/hello.ru`, single worker, parity threads (`-t 5` vs Puma `-t 5:5`), 4 wrk threads / 100 connections / 15s, macOS arm64 / Ruby 3.3.3, Hyperion 1.2.0:
 
- | | r/s | p99 |
- |---|---:|---:|
- | **Hyperion default (logs ON)** | **23,885** | **1.05 ms** |
- | Hyperion `--no-log-requests` | 24,222 | 1.00 ms |
- | Puma `-t 16:16` | 18,794 | 30.89 ms |
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | **Hyperion 1.2.0** (default, logs ON) | **22,496** | **502 µs** | **1×** |
+ | Falcon 0.55.3 `--count 1` | 22,199 | 5.36 ms | 11× worse |
+ | Puma 7.1.0 `-t 5:5` | 20,400 | 422.85 ms | 845× worse |
 
- **1.27× Puma throughput, ~30× lower p99 — while emitting structured JSON access logs Puma doesn't.**
+ **Hyperion: 1.10× Puma throughput, parity with Falcon on throughput, ~10× lower p99 than Falcon and ~845× lower than Puma — while emitting structured JSON access logs the others don't.**
 
  ### Production cluster config (`-w 4`)
 
- Same bench app, `-w 4` cluster, parity threads. macOS arm64:
+ Same bench app, `-w 4` cluster, parity threads (`-t 5` everywhere), 4 wrk threads / 200 connections / 15s, macOS arm64:
 
- | | r/s | p99 |
- |---|---:|---:|
- | **Hyperion `-w 4 -t 10`** | **44,221** | **1.15 ms** |
- | Puma `-w 4 -t 10:10` | 37,929 | 17.06 ms |
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | Falcon `--count 4` | 48,197 | 4.84 ms | 5.9× worse |
+ | **Hyperion `-w 4 -t 5`** | **40,137** | **825 µs** | **1×** |
+ | Puma `-w 4 -t 5:5` | 34,793 | 177.76 ms | 215× worse (1 timeout) |
 
- **1.17× Puma throughput, ~15× lower p99.**
+ Falcon edges Hyperion by ~20% on raw rps at `-w 4` on macOS hello-world. **Hyperion still leads on tail latency: 5.9× over Falcon and 215× over Puma**, and beats Puma on throughput by 1.15×. On Linux production-config and DB-backed workloads (below) Hyperion takes the rps lead too — Falcon's macOS hello-world advantage disappears once the workload includes any actual work or the kernel is Linux.
 
  ### Linux production-config (DB-backed Rack)
 
@@ -60,7 +61,37 @@ Same bench app, `-w 4` cluster, parity threads. macOS arm64:
  | Hyperion `--no-log-requests` | 6,364 | 1.114× |
  | Puma `-w 4 -t 10:10` (no per-req logs) | 5,715 | 1.000× |
 
- Bench is network-bound (~3-4 ms median is the PG + Redis round-trip). Hyperion's lead comes from cheaper per-request CPU: lock-free per-thread metrics, per-thread cached iso8601 timestamps in the access log, hand-rolled single-interpolation log line builder, no logger mutex (POSIX `write(2)` atomicity), C-extension response-head builder.
+ Bench is **wait-bound**: the ~3-4 ms median is the PG + Redis round-trip, dwarfing the per-request CPU work where Hyperion's optimisations live. With a synchronous `pg` driver, fibers don't help: every in-flight DB call still parks an OS thread, and both servers max out at `workers × threads` concurrent queries. Widening this gap requires either an async PG driver — see [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (companion gem; pair with `--async-io` and a fiber-aware pool, see "Async I/O — fiber concurrency on PG-bound apps" below) — or a CPU-bound workload, where Hyperion's lead becomes visible (see the CPU-bound JSON workload section).
+
+ ### Async I/O — fiber concurrency on PG-bound apps
+
+ `bench/pg_concurrent.ru` (50 ms PG query per request, pool sized for the server's concurrency model). macOS, Postgres over WAN, wrk `-t4 -c200 -d20s`:
+
+ | | r/s | p99 |
+ |---|---:|---:|
+ | Puma 7.2 `-t 5` + plain pg (pool=5) | 88.9 | 2.31 s |
+ | **Hyperion 1.3.0 `--async-io -t 5` + hyperion-async-pg (FiberPool=64)** | **1,103.7** | **237 ms** |
+
+ **12.4× throughput, 9.7× lower p99.** Puma is bottlenecked at `threads × 1 in-flight query` because plain `pg` blocks the OS thread on `recv()`. Hyperion + async-pg + a fiber-aware pool decouples concurrency from threads: 5 OS threads serve 64 concurrent in-flight queries via fiber cooperation. Theoretical ceiling at pool=64 + 50 ms query = 1280 r/s; achieved 1103 r/s = 86% of it.
+
+ Three things must all be true to get this win:
+ 1. **`async_io: true`** in your Hyperion config (or `--async-io` CLI flag). Default is off to keep 1.2.0's raw-loop perf for fiber-unaware apps.
+ 2. **`hyperion-async-pg`** installed: `gem 'hyperion-async-pg', require: 'hyperion/async_pg'` + `Hyperion::AsyncPg.install!` at boot.
+ 3. **Fiber-aware connection pool.** The popular `connection_pool` gem is NOT — its Mutex blocks the OS thread. Use [`async-pool`](https://github.com/socketry/async-pool), `Async::Semaphore`, or hand-roll one (see `bench/pg_concurrent.ru` for a 30-line FiberPool example).
+
+ Skip any of these and you get parity with Puma at the same `-t`. Run the bench yourself: `MODE=async DATABASE_URL=... PG_POOL_SIZE=64 bundle exec hyperion --async-io -t 5 bench/pg_concurrent.ru` (in the [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) repo).
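The 30-line FiberPool that requirement 3 references isn't reproduced in this README. The shape of the idea is roughly the following: a hypothetical, single-threaded sketch using plain `Fiber` objects so it runs without the `async` gem; the real bench pool parks and wakes fibers through `Async::Scheduler` instead of resuming them directly.

```ruby
# Hypothetical fiber-aware pool sketch. A fiber that finds the pool empty
# parks itself with Fiber.yield instead of blocking on a Mutex, and is
# resumed by whichever fiber releases a connection next — the OS thread
# is never blocked waiting for a checkout.
class FiberPool
  def initialize(items)
    @free = items.dup
    @waiting = [] # fibers parked until a release
  end

  def acquire
    while @free.empty?
      @waiting << Fiber.current
      Fiber.yield # suspend this fiber only; the thread keeps working
    end
    @free.pop
  end

  def release(item)
    @free.push(item)
    waiter = @waiting.shift
    waiter.resume if waiter
  end

  def with
    item = acquire
    yield item
  ensure
    release(item) if item
  end
end

# Demo: four "requests" share two connections on one thread. Each fiber
# checks out a connection, simulates an in-flight query (Fiber.yield),
# and releases; parked fibers wake as connections come back.
pool = FiberPool.new(%i[conn_a conn_b])
order = []
workers = Array.new(4) do |i|
  Fiber.new do
    pool.with do |conn|
      order << [i, conn]
      Fiber.yield # pretend the query is in flight
    end
  end
end
workers.each(&:resume)                    # 0 and 1 check out; 2 and 3 park
workers.each { |f| f.resume if f.alive? } # finish; parked fibers get served
```

The `@waiting` list is the whole point: an empty pool suspends only the calling fiber, so a handful of OS threads can keep dozens of checkouts in flight.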
+
+ ### CPU-bound JSON workload
+
+ `bench/work.ru` — handler builds a 50-key fixture, JSON-encodes a fresh response per request (~8 KB body), processes a 6-cookie header chain. wrk `-t4 -c200 -d15s`, macOS arm64 / Ruby 3.3.3, 1.2.0:
+
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | Falcon `--count 4` | 46,166 | 20.17 ms | 24× worse |
+ | **Hyperion `-w 4 -t 5`** | **43,924** | **824 µs** | **1×** |
+ | Puma `-w 4 -t 5:5` | 36,383 | 166.30 ms (47 socket errors) | 200× worse |
+
+ **1.21× Puma throughput, 200× lower p99.** This is the gap that hides behind PG-round-trip noise on the DB bench. Hyperion's per-request CPU savings (lock-free per-thread metrics, frozen header keys in the Rack adapter, C-ext response head builder, cached iso8601 timestamps, cached HTTP Date header) land on the wire when the workload is CPU-bound. Falcon edges us 5% on raw r/s but with 24× worse tail — a different tradeoff curve. Reproduce: `bundle exec bin/hyperion -w 4 -t 5 -p 9292 bench/work.ru`.
 
  ### Real Rails 8.1 app (single worker, parity threads `-t 16`)
 
@@ -77,6 +108,25 @@ Health endpoint that traverses the full middleware chain (rack-attack, locale re
 
  On Grape and Rails-controller workloads Puma hits wrk's 2 s timeout cap on ~⅔ of requests — its real p99 is censored above 2 s. Hyperion serves all of its requests under 1.2 s with 0 to 16 timeouts. **1.14–1.48× Puma throughput** depending on endpoint.
 
+ ### Static-asset serving (sendfile zero-copy path, 1.2.0+)
+
+ `bench/static.ru` (`Rack::Files` over a 1 MiB asset), `-w 1`, `wrk -t4 -c100 -d15s`, macOS arm64 / Ruby 3.3.3:
+
+ | | r/s | p99 | transferred | tail vs winner |
+ |---|---:|---:|---:|---:|
+ | **Hyperion (sendfile path)** | **2,069** | **3.10 ms** | 30.4 GB | **1×** |
+ | Puma `-w 1 -t 5:5` | 2,109 | 566.16 ms | 31.0 GB | 183× worse |
+ | Falcon `--count 1` | 1,269 | 801.01 ms | 18.7 GB | 258× worse (28 timeouts) |
+
+ Throughput is bandwidth-bound on localhost (≈2 GB/s = the loopback memory ceiling), so the throughput column looks like parity. The actual win is in the **tail latency** column: Hyperion's `IO.copy_stream` → `sendfile(2)` path skips userspace entirely, while Puma allocates a String per response and Falcon serializes more aggressively. On real network paths sendfile widens the gap further (kernel-to-NIC zero-copy).
+
+ Reproduce:
+ ```sh
+ ruby -e 'File.binwrite("/tmp/hyperion_bench_asset_1m.bin", "x" * (1024*1024))'
+ bundle exec bin/hyperion -p 9292 bench/static.ru
+ wrk --latency -t4 -c100 -d15s http://127.0.0.1:9292/hyperion_bench_asset_1m.bin
+ ```
+
  ### Concurrency at scale (architectural advantages)
 
  These workloads demonstrate structural differences between Hyperion's fiber-per-connection / fiber-per-stream model and Puma's thread-pool model. Numbers are illustrative; the architecture is what matters. Run on Ubuntu 24.04 / Ruby 3.3.3, single worker, h2load `-c <conns> -n 100000 --rps 1000 --h1`.
@@ -49,6 +49,20 @@ module Hyperion
    )
 
    class << self
+     # Pre-allocate `count` env-hash and rack-input objects in master before
+     # fork. Children inherit the populated free-list via copy-on-write —
+     # the hash slots stay shared until a request mutates them. Eliminates
+     # the first-N-requests allocation tax that every fresh worker would
+     # otherwise pay on cold start. Idempotent: safe to call multiple
+     # times; the pool simply caps at its configured `max_size`.
+     def warmup_pool(count = 8)
+       warmed_envs = Array.new(count) { ENV_POOL.acquire }
+       warmed_inputs = Array.new(count) { INPUT_POOL.acquire }
+       warmed_envs.each { |e| ENV_POOL.release(e) }
+       warmed_inputs.each { |i| INPUT_POOL.release(i) }
+       nil
+     end
+
      def call(app, request)
        env, input = build_env(request)
        status, headers, body = app.call(env)
@@ -7,7 +7,8 @@ module Hyperion
  # listener as the application. Disabled by default — only mounted when
  # `admin_token` is configured. Currently provides:
  #
- #   POST /-/quit    → triggers graceful master drain (SIGTERM to ppid)
+ #   POST /-/quit    → triggers graceful master drain (SIGTERM to ppid)
+ #   GET  /-/metrics → returns Hyperion.stats in Prometheus text format
  #
  # Auth: the request must include `X-Hyperion-Admin-Token: <token>`.
  # Mismatch → 401. Path/method mismatch → falls through to the app
@@ -18,9 +19,17 @@ module Hyperion
  # SECURITY: the bearer token is defense-in-depth, not a substitute for
  # network isolation. Operators MUST keep the listener on a private
  # network or behind TLS + an authenticating reverse proxy. Anyone who
- # can reach the listener AND knows the token can drain the server.
+ # can reach the listener AND knows the token can drain the server or
+ # scrape its metrics. See docs/REVERSE_PROXY.md for nginx/ALB recipes
+ # that block /-/* at the edge.
  class AdminMiddleware
-   PATH = '/-/quit'
+   PATH_QUIT    = '/-/quit'
+   PATH_METRICS = '/-/metrics'
+
+   METRICS_CONTENT_TYPE = 'text/plain; version=0.0.4; charset=utf-8'
+   JSON_CONTENT_TYPE    = 'application/json'
+
+   UNAUTHORIZED_BODY = %({"error":"unauthorized"}\n)
 
    def initialize(app, token:, signal_target: nil)
      raise ArgumentError, 'admin_token must be a non-empty String' if token.nil? || token.to_s.empty?
@@ -33,38 +42,59 @@ module Hyperion
    end
 
    def call(env)
-     return @app.call(env) unless admin_request?(env)
+     path = env['PATH_INFO']
+     method = env['REQUEST_METHOD']
 
-     provided = env['HTTP_X_HYPERION_ADMIN_TOKEN'].to_s
-     # Constant-time comparison. Rack::Utils.secure_compare requires same
-     # length, so prefix-pad first to avoid a length-leak side channel.
-     unless secure_match?(provided)
-       return [401, { 'content-type' => 'application/json' },
-               [%({"error":"unauthorized"}\n)]]
+     if path == PATH_QUIT && method == 'POST'
+       authorize(env) { handle_quit(env) }
+     elsif path == PATH_METRICS && method == 'GET'
+       authorize(env) { handle_metrics }
+     else
+       @app.call(env)
      end
+   end
+
+   private
+
+   # Wrap a handler in the shared bearer-token check. Yields only when the
+   # token matches; returns the canonical 401 response otherwise.
+   def authorize(env)
+     provided = env['HTTP_X_HYPERION_ADMIN_TOKEN'].to_s
+     return unauthorized unless secure_match?(provided)
 
+     yield
+   end
+
+   def unauthorized
+     [401, { 'content-type' => JSON_CONTENT_TYPE }, [UNAUTHORIZED_BODY]]
+   end
+
+   def handle_quit(env)
      target = resolve_signal_target
-     Hyperion.logger.info { { message: 'admin drain requested', remote_addr: env['REMOTE_ADDR'], target_pid: target } }
+     Hyperion.logger.info do
+       { message: 'admin drain requested', remote_addr: env['REMOTE_ADDR'], target_pid: target }
+     end
      begin
        Process.kill('TERM', target)
      rescue StandardError => e
        Hyperion.logger.warn { { message: 'admin drain signal failed', error: e.message } }
-       return [500, { 'content-type' => 'application/json' }, [%({"error":"signal_failed"}\n)]]
+       return [500, { 'content-type' => JSON_CONTENT_TYPE }, [%({"error":"signal_failed"}\n)]]
      end
 
-     [202, { 'content-type' => 'application/json' }, [%({"status":"draining"}\n)]]
+     [202, { 'content-type' => JSON_CONTENT_TYPE }, [%({"status":"draining"}\n)]]
    end
 
-   private
-
-   def admin_request?(env)
-     env['PATH_INFO'] == PATH && env['REQUEST_METHOD'] == 'POST'
+   def handle_metrics
+     body = PrometheusExporter.render(Hyperion.stats)
+     [200, { 'content-type' => METRICS_CONTENT_TYPE }, [body]]
    end
 
    def secure_match?(provided)
      return false if provided.empty?
      return false unless provided.bytesize == @token.bytesize
 
+     # Constant-time comparison. Rack::Utils.secure_compare requires equal
+     # lengths, so unequal lengths are rejected up front before comparing.
      Rack::Utils.secure_compare(provided, @token)
    end
 
data/lib/hyperion/cli.rb CHANGED
@@ -57,6 +57,10 @@ module Hyperion
            'Enable Ruby YJIT (default: auto on RAILS_ENV/RACK_ENV=production/staging)') do |v|
        cli_opts[:yjit] = v
      end
+     o.on('--[no-]async-io',
+          'Run plain HTTP/1.1 connections under Async::Scheduler (required for hyperion-async-pg and other fiber-cooperative I/O; default off)') do |v|
+       cli_opts[:async_io] = v
+     end
      o.on('-h', '--help', 'show help') do
        puts o
        exit 0
@@ -111,12 +115,22 @@ module Hyperion
      tls = build_tls_from_config(config)
      server = Server.new(host: config.host, port: config.port, app: app,
                          tls: tls, thread_count: config.thread_count,
-                         read_timeout: config.read_timeout)
+                         read_timeout: config.read_timeout,
+                         max_pending: config.max_pending,
+                         max_request_read_seconds: config.max_request_read_seconds,
+                         h2_settings: Master.build_h2_settings(config),
+                         async_io: config.async_io)
      server.listen
      scheme = tls ? 'https' : 'http'
      Hyperion.logger.info { { message: 'listening', url: "#{scheme}://#{server.host}:#{server.port}" } }
      warn_c_parser_unavailable
 
+     # Pre-allocate Rack env-pool entries and eager-touch lazy constants.
+     # In single-mode there's no fork, but the warmup still pays for itself
+     # by frontloading the first-N-request allocation cost off the first
+     # real client. Idempotent — safe to call more than once per process.
+     Hyperion.warmup!
+
      # Single-worker mode reuses the lifecycle hooks: before_fork is a no-op
      # here (no fork happens), and on_worker_boot/on_worker_shutdown fire
      # for the lone in-process "worker" so app code that opens DB pools etc.
@@ -199,13 +213,16 @@ module Hyperion
      private_class_method :maybe_enable_yjit
 
      # When admin_token is configured, wrap the app in AdminMiddleware so
-     # POST /-/quit becomes a token-protected drain endpoint. Skipped when
-     # the token is unset — the path falls through to the app, so apps may
-     # still own /-/anything if Hyperion's admin is off.
+     # POST /-/quit and GET /-/metrics become token-protected admin endpoints.
+     # Skipped when the token is unset — those paths fall through to the app,
+     # so apps may still own /-/anything if Hyperion's admin is off.
      def self.wrap_admin_middleware(app, config)
        return app if config.admin_token.nil? || config.admin_token.to_s.empty?
 
-       Hyperion.logger.info { { message: 'admin endpoint enabled', path: AdminMiddleware::PATH } }
+       Hyperion.logger.info do
+         { message: 'admin endpoint enabled',
+           paths: [AdminMiddleware::PATH_QUIT, AdminMiddleware::PATH_METRICS] }
+       end
        AdminMiddleware.new(app, token: config.admin_token)
      end
      private_class_method :wrap_admin_middleware
@@ -28,7 +28,14 @@ module Hyperion
    yjit: nil, # nil → auto: enable on production/staging; true/false to force.
    worker_max_rss_mb: nil, # Integer, e.g. 1024. When a worker exceeds this RSS in MB, master gracefully cycles it. nil disables.
    worker_check_interval: 30, # Seconds between RSS polls. Tradeoff: tighter = faster recycle, more ps calls. 30s matches Puma WorkerKiller.
-   admin_token: nil # String. When set, POST /-/quit triggers graceful drain. nil disables endpoint entirely (returns 404).
+   admin_token: nil, # String. When set, exposes admin endpoints (POST /-/quit triggers graceful drain; GET /-/metrics returns Prometheus-format Hyperion.stats). Same token guards both. nil disables admin entirely (paths fall through to the app).
+   max_pending: nil, # Integer, e.g. 256. When the per-worker accept inbox has this many queued connections, additional accepts are rejected with HTTP 503 + Retry-After:1 instead of being queued. nil disables (current behaviour: unbounded queue).
+   max_request_read_seconds: 60, # Numeric. Total wallclock budget (seconds) for reading the request line + headers + body for ONE request. Defends against slowloris-style drips that satisfy the per-recv read_timeout but never finish the request. Resets between requests on a keep-alive connection. nil disables.
+   async_io: false, # When true, the plain HTTP/1.1 accept loop runs each connection on a fiber under Async::Scheduler instead of handing it to a worker thread. Required for fiber-cooperative I/O (e.g. hyperion-async-pg). Costs ~5% throughput on hello-world; in exchange one OS thread can serve N concurrent in-flight DB queries on wait-bound workloads. TLS / HTTP/2 paths always use the async loop and ignore this flag.
+   h2_max_concurrent_streams: 128, # HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS — cap on simultaneously-open streams per connection. Falcon: 64. nil leaves protocol-http2 default (0xFFFFFFFF).
+   h2_initial_window_size: 1_048_576, # HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE (octets) — flow-control window per stream at open. Bigger = fewer WINDOW_UPDATE round-trips on large bodies. Spec default is 65535. nil → leave protocol default.
+   h2_max_frame_size: 1_048_576, # HTTP/2 SETTINGS_MAX_FRAME_SIZE (octets) — biggest DATA/HEADERS frame we'll accept. Spec floor 16384, ceiling 16777215. We pick 1 MiB to match common CDNs without unbounded buffer growth. nil → leave protocol default (16384).
+   h2_max_header_list_size: 65_536 # HTTP/2 SETTINGS_MAX_HEADER_LIST_SIZE (octets) — advisory cap on the decompressed header block. Bounds memory of pathological client headers. nil → leave protocol default (unbounded).
  }.freeze
 
  HOOKS = %i[before_fork on_worker_boot on_worker_shutdown].freeze
@@ -17,6 +17,7 @@ module Hyperion
    MAX_BODY_BYTES = 16 * 1024 * 1024 # 16 MB cap. Phase 5 introduces streaming bodies.
    HEADER_TERM = "\r\n\r\n"
    TIMEOUT_SENTINEL = :__hyperion_read_timeout__
+   DEADLINE_SENTINEL = :__hyperion_request_deadline__
    IDLE_KEEPALIVE_TIMEOUT_SECONDS = 5
 
    # Default parser is the C-extension `CParser` when the extension built;
@@ -44,14 +45,20 @@ module Hyperion
      @log_requests = log_requests.nil? ? Hyperion.log_requests? : log_requests
    end
 
-   def serve(socket, app)
+   def serve(socket, app, max_request_read_seconds: 60)
      request_count = 0
      carry = +'' # bytes already pulled off the socket but past the prev request boundary
      peer_addr = peer_address(socket)
      @metrics.increment(:connections_accepted)
      @metrics.increment(:connections_active)
      loop do
-       buffer = read_request(socket, carry)
+       # Per-request wallclock deadline. Captured fresh for every request so
+       # long-lived keep-alive sessions with many small requests don't
+       # falsely trip after the cumulative budget elapses.
+       request_started_clock = Process.clock_gettime(Process::CLOCK_MONOTONIC) if max_request_read_seconds
+       buffer = read_request(socket, carry, deadline_started_at: request_started_clock,
+                                            max_request_read_seconds: max_request_read_seconds,
+                                            peer_addr: peer_addr)
        return unless buffer
 
        if buffer == TIMEOUT_SENTINEL
@@ -65,6 +72,10 @@ module Hyperion
          return
        end
 
+       # Slowloris-style abort: deadline tripped during read. We've already
+       # written the 408 (best-effort) inside read_request; close out here.
+       return if buffer == DEADLINE_SENTINEL
+
        request, body_end = @parser.parse(buffer)
        carry = +(buffer.byteslice(body_end, buffer.bytesize - body_end) || '')
        request = enrich_with_peer(request, peer_addr) if peer_addr && request.peer_address.nil?
@@ -193,10 +204,16 @@ module Hyperion
    # pipelining). Returns the full buffer (with any trailing pipelined
    # bytes intact); the parser's returned end_offset tells the caller
    # where this request ends. On EOF returns nil; on read timeout returns
-   # TIMEOUT_SENTINEL.
-   def read_request(socket, carry = +'')
+   # TIMEOUT_SENTINEL; on per-request wallclock deadline trip returns
+   # DEADLINE_SENTINEL (and emits a best-effort 408 + close).
+   def read_request(socket, carry = +'', deadline_started_at: nil, max_request_read_seconds: nil,
+                    peer_addr: nil)
      buffer = carry
      until buffer.include?(HEADER_TERM)
+       if deadline_exceeded?(deadline_started_at, max_request_read_seconds)
+         return abort_for_deadline(socket, deadline_started_at, peer_addr)
+       end
+
        chunk = read_chunk(socket)
        return chunk if chunk.nil? || chunk == TIMEOUT_SENTINEL
        return nil if chunk.empty?
@@ -211,6 +228,9 @@ module Hyperion
      if chunked?(headers_part)
        until chunked_body_complete?(buffer, header_end)
          raise ParseError, 'chunked body exceeds limit' if buffer.bytesize - header_end > MAX_BODY_BYTES
+         if deadline_exceeded?(deadline_started_at, max_request_read_seconds)
+           return abort_for_deadline(socket, deadline_started_at, peer_addr)
+         end
 
          chunk = read_chunk(socket)
          break if chunk.nil? || chunk.empty? || chunk == TIMEOUT_SENTINEL
@@ -220,6 +240,10 @@ module Hyperion
      else
        content_length = headers_part[/^content-length:\s*(\d+)/i, 1].to_i
        while buffer.bytesize < header_end + content_length
+         if deadline_exceeded?(deadline_started_at, max_request_read_seconds)
+           return abort_for_deadline(socket, deadline_started_at, peer_addr)
+         end
+
          chunk = read_chunk(socket)
          break if chunk.nil? || chunk.empty? || chunk == TIMEOUT_SENTINEL
 
@@ -230,6 +254,33 @@ module Hyperion
      buffer
    end
 
+   # nil-disabled or budget-untripped → false. Otherwise the wallclock cap
+   # has been exceeded and the caller should abort.
+   def deadline_exceeded?(started_at, max_seconds)
+     return false unless started_at && max_seconds
+
+     (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at) > max_seconds
+   end
+
+   # Slowloris fallback: log a structured warn, bump :slow_request_aborts,
+   # write a best-effort 408, and let the caller close the socket. We don't
+   # wait on the 408 write — a dribbling client may never read it, and
+   # that's the failure mode we're protecting against anyway.
+   def abort_for_deadline(socket, started_at, peer_addr)
+     elapsed = started_at ? (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at).round(3) : nil
+     @metrics.increment(:slow_request_aborts)
+     @logger.warn do
+       { message: 'request read deadline exceeded', remote_addr: peer_addr, elapsed_seconds: elapsed }
+     end
+     begin
+       socket.write("HTTP/1.1 408 Request Timeout\r\nconnection: close\r\ncontent-length: 0\r\n\r\n")
+     rescue StandardError
+       # Peer may have already gone — nothing to do.
+     end
+     @metrics.increment_status(408)
+     DEADLINE_SENTINEL
+   end
+
    def chunked?(headers_part)
      headers_part.match?(/^transfer-encoding:[ \t]*[^\r\n]*chunked\b/i)
    end
@@ -212,9 +212,34 @@ module Hyperion
       end
     end
 
-    def initialize(app:, thread_pool: nil)
+    # Maps Hyperion-friendly setting names to the integer SETTINGS_* identifiers
+    # protocol-http2 uses on the wire. See RFC 7540 §6.5.2 — these are the
+    # only four parameters Hyperion exposes; the rest of the SETTINGS frame
+    # (HEADER_TABLE_SIZE, ENABLE_PUSH, etc.) keeps protocol-http2's default.
+    SETTINGS_KEY_MAP = {
+      max_concurrent_streams: ::Protocol::HTTP2::Settings::MAXIMUM_CONCURRENT_STREAMS,
+      initial_window_size: ::Protocol::HTTP2::Settings::INITIAL_WINDOW_SIZE,
+      max_frame_size: ::Protocol::HTTP2::Settings::MAXIMUM_FRAME_SIZE,
+      max_header_list_size: ::Protocol::HTTP2::Settings::MAXIMUM_HEADER_LIST_SIZE
+    }.freeze
+
+    # RFC 7540 §6.5.2 floor for SETTINGS_MAX_FRAME_SIZE. protocol-http2 raises
+    # ProtocolError on values below this; we clamp + warn instead so a
+    # misconfigured operator gets a working server, not a boot-time crash.
+    H2_MIN_FRAME_SIZE = 0x4000 # 16384
+
+    # RFC 7540 §6.5.2 ceiling for SETTINGS_MAX_FRAME_SIZE.
+    H2_MAX_FRAME_SIZE = 0xFFFFFF # 16777215
+
+    # RFC 7540 §6.9.2 — INITIAL_WINDOW_SIZE has the same 31-bit max as the
+    # WINDOW_UPDATE frame's Window Size Increment (see protocol-http2's
+    # MAXIMUM_ALLOWED_WINDOW_SIZE).
+    H2_MAX_WINDOW_SIZE = 0x7FFFFFFF
+
+    def initialize(app:, thread_pool: nil, h2_settings: nil)
       @app = app
       @thread_pool = thread_pool
+      @h2_settings = h2_settings
       @metrics = Hyperion.metrics
       @logger = Hyperion.logger
     end
@@ -224,7 +249,7 @@ module Hyperion
       @metrics.increment(:connections_active)
       framer = ::Protocol::HTTP2::Framer.new(socket)
       server = build_server(framer)
-      server.read_connection_preface
+      server.read_connection_preface(initial_settings_payload)
 
       # Extract once — the same TCP peer drives every stream on this conn.
       peer_addr = peer_address(socket)
@@ -290,6 +315,69 @@ module Hyperion
 
     private
 
+    # Build the [setting_id, value] pairs that go in the connection-preface
+    # SETTINGS frame. protocol-http2's Server#read_connection_preface accepts
+    # this array and does the wire encoding for us. Empty array (no overrides
+    # configured) → the SETTINGS frame still goes out, just with no entries,
+    # which the spec permits.
+    #
+    # We clamp out-of-range values (max_frame_size below the spec floor or
+    # above its ceiling, initial_window_size above the 31-bit max) instead of
+    # letting protocol-http2 raise ProtocolError at handshake time — a
+    # crashing handshake leaks the connection. The operator gets a warn so the
+    # misconfiguration surfaces in logs.
+    def initial_settings_payload
+      return [] unless @h2_settings
+
+      payload = []
+      @h2_settings.each do |key, value|
+        next if value.nil?
+
+        setting_id = SETTINGS_KEY_MAP[key]
+        unless setting_id
+          @logger.warn { { message: 'unknown h2 setting; skipping', setting: key } }
+          next
+        end
+
+        clamped = clamp_h2_setting(key, value)
+        payload << [setting_id, clamped]
+      end
+      payload
+    end
+
+    def clamp_h2_setting(key, value)
+      case key
+      when :max_frame_size
+        if value < H2_MIN_FRAME_SIZE
+          @logger.warn do
+            { message: 'h2 max_frame_size below spec minimum; clamping',
+              configured: value, clamped_to: H2_MIN_FRAME_SIZE }
+          end
+          H2_MIN_FRAME_SIZE
+        elsif value > H2_MAX_FRAME_SIZE
+          @logger.warn do
+            { message: 'h2 max_frame_size above spec maximum; clamping',
+              configured: value, clamped_to: H2_MAX_FRAME_SIZE }
+          end
+          H2_MAX_FRAME_SIZE
+        else
+          value
+        end
+      when :initial_window_size
+        if value > H2_MAX_WINDOW_SIZE
+          @logger.warn do
+            { message: 'h2 initial_window_size above spec maximum; clamping',
+              configured: value, clamped_to: H2_MAX_WINDOW_SIZE }
+          end
+          H2_MAX_WINDOW_SIZE
+        else
+          value
+        end
+      else
+        value
+      end
+    end
+
     def build_server(framer)
       server = ::Protocol::HTTP2::Server.new(framer)
       server.define_singleton_method(:accept_stream) do |stream_id, &block|
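The clamp-and-warn logic above can be condensed with `Integer#clamp`. A standalone sketch under the RFC 7540 §6.5.2 bounds quoted in the diff (the `clamp_frame_size` name is an illustrative stand-in, not the gem's API):

```ruby
# RFC 7540 §6.5.2 bounds for SETTINGS_MAX_FRAME_SIZE, as in the diff above.
H2_MIN_FRAME_SIZE = 0x4000   # 16_384
H2_MAX_FRAME_SIZE = 0xFFFFFF # 16_777_215

# Fold both the below-floor and above-ceiling branches into one call.
# A real implementation would also emit the structured warn the diff shows.
def clamp_frame_size(value)
  value.clamp(H2_MIN_FRAME_SIZE, H2_MAX_FRAME_SIZE)
end

clamp_frame_size(1_024)       # => 16384    (below floor, clamped up)
clamp_frame_size(65_536)      # => 65536    (in range, untouched)
clamp_frame_size(0x1_000_000) # => 16777215 (above ceiling, clamped down)
```

The gem's version keeps the explicit if/elsif chain because each branch logs a different message; the arithmetic is the same.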
@@ -47,6 +47,20 @@ module Hyperion
       end
     end
 
+    # Pulls the four configurable HTTP/2 SETTINGS values out of the Config
+    # and returns them as a Hash. Nils are stripped so an operator who
+    # explicitly sets one to `nil` (meaning "leave the protocol-http2
+    # default in place") doesn't accidentally send a SETTINGS entry with a
+    # nil value. Empty hash → no overrides for Http2Handler to push.
+    def self.build_h2_settings(config)
+      {
+        max_concurrent_streams: config.h2_max_concurrent_streams,
+        initial_window_size: config.h2_initial_window_size,
+        max_frame_size: config.h2_max_frame_size,
+        max_header_list_size: config.h2_max_header_list_size
+      }.compact
+    end
+
     def initialize(host:, port:, app:, workers: DEFAULT_WORKER_COUNT,
                    read_timeout: Server::DEFAULT_READ_TIMEOUT_SECONDS, tls: nil,
                    thread_count: Server::DEFAULT_THREAD_COUNT, config: nil)
@@ -84,6 +98,12 @@
       }
     end
 
+    # Pre-allocate Rack env-pool entries and eager-touch lazy constants
+    # BEFORE we fork. Children inherit the warm memory via copy-on-write
+    # so the first batch of requests on each fresh worker doesn't pay
+    # the allocation/autoload tax.
+    Hyperion.warmup!
+
     # `before_fork` runs ONCE in the master before any worker is forked.
     # Operators use it to close shared resources (DB pools, Redis sockets)
     # so each child gets fresh connections rather than inheriting the
@@ -143,7 +163,11 @@
       host: @host, port: @port, app: @app,
       read_timeout: @read_timeout, tls: @tls,
       thread_count: @thread_count, config: @config,
-      worker_index: worker_index
+      worker_index: worker_index,
+      max_pending: @config.max_pending,
+      max_request_read_seconds: @config.max_request_read_seconds,
+      h2_settings: Master.build_h2_settings(@config),
+      async_io: @config.async_io
     }
     # Hand the inherited socket to the worker in :share mode. In
     # :reuseport mode the worker binds its own with SO_REUSEPORT.
@@ -0,0 +1,96 @@
+# frozen_string_literal: true
+
+module Hyperion
+  # Renders Hyperion.stats as Prometheus text exposition format (v0.0.4).
+  # Mounted by AdminMiddleware on GET /-/metrics; the returned content-type
+  # is `text/plain; version=0.0.4; charset=utf-8`.
+  #
+  # Mapping rules:
+  # - keys listed in KNOWN_METRICS get their canonical name + curated HELP/TYPE
+  # - keys matching `responses_<3-digit>` are grouped under a single
+  #   `hyperion_responses_status_total` family with a `status` label
+  # - any other key is auto-exported as `hyperion_<key>` with a generic HELP
+  #   line, so newly-added counters surface in Prometheus without code changes
+  #   here (the curated-name path is just nicer presentation, not gating)
+  #
+  # Output ordering is deterministic for stable scrape diffs:
+  # - known metrics in KNOWN_METRICS declaration order
+  # - status codes ascending
+  # - other keys alphabetically
+  module PrometheusExporter
+    module_function
+
+    KNOWN_METRICS = {
+      requests: { name: 'hyperion_requests_total',
+                  help: 'Total HTTP requests handled',
+                  type: 'counter' },
+      bytes_read: { name: 'hyperion_bytes_read_total',
+                    help: 'Total bytes read from request sockets',
+                    type: 'counter' },
+      bytes_written: { name: 'hyperion_bytes_written_total',
+                       help: 'Total bytes written to response sockets',
+                       type: 'counter' },
+      rejected_connections: { name: 'hyperion_rejected_connections_total',
+                              help: 'Connections rejected due to backpressure (max_pending)',
+                              type: 'counter' },
+      sendfile_responses: { name: 'hyperion_sendfile_responses_total',
+                            help: 'Responses sent via plain-TCP sendfile(2) zero-copy path',
+                            type: 'counter' },
+      tls_zerobuf_responses: { name: 'hyperion_tls_zerobuf_responses_total',
+                               help: 'Responses sent via TLS IO.copy_stream (avoids userspace String build, but TLS encryption forces copy)',
+                               type: 'counter' }
+    }.freeze
+
+    STATUS_KEY_PATTERN = /\Aresponses_(\d{3})\z/
+
+    STATUS_FAMILY_NAME = 'hyperion_responses_status_total'
+    STATUS_FAMILY_HELP = 'Responses by HTTP status code'
+
+    def render(stats)
+      buf = +''
+      grouped_status = {}
+      other = {}
+      known = {}
+
+      stats.each do |key, value|
+        if (match = key.to_s.match(STATUS_KEY_PATTERN))
+          grouped_status[match[1]] = value
+        elsif KNOWN_METRICS.key?(key)
+          known[key] = value
+        else
+          other[key] = value
+        end
+      end
+
+      # Known metrics first, in declaration order — gives the scrape a stable,
+      # human-friendly preamble regardless of hash insertion order.
+      KNOWN_METRICS.each do |key, meta|
+        next unless known.key?(key)
+
+        append_metric(buf, meta[:name], meta[:help], meta[:type], known[key])
+      end
+
+      unless grouped_status.empty?
+        buf << "# HELP #{STATUS_FAMILY_NAME} #{STATUS_FAMILY_HELP}\n"
+        buf << "# TYPE #{STATUS_FAMILY_NAME} counter\n"
+        grouped_status.sort.each do |status, value|
+          buf << %(#{STATUS_FAMILY_NAME}{status="#{status}"} #{value}\n)
+        end
+      end
+
+      other.sort_by { |k, _| k.to_s }.each do |key, value|
+        name = "hyperion_#{key}"
+        append_metric(buf, name, 'Hyperion internal counter (auto-exported)', 'counter', value)
+      end
+
+      buf
+    end
+
+    def append_metric(buf, name, help, type, value)
+      buf << "# HELP #{name} #{help}\n"
+      buf << "# TYPE #{name} #{type}\n"
+      buf << "#{name} #{value}\n"
+    end
+    private_class_method :append_metric
+  end
+end
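The HELP/TYPE/sample shape the new exporter emits can be shown in miniature. A toy sketch of the same v0.0.4 text format (the `prom_counter`/`prom_status_family` helpers are illustrative, not the gem's API):

```ruby
# One un-labelled counter: HELP line, TYPE line, then the sample line.
def prom_counter(name, help, value)
  "# HELP #{name} #{help}\n# TYPE #{name} counter\n#{name} #{value}\n"
end

# A labelled family, like the status grouping above: one HELP/TYPE pair
# shared by all samples, statuses sorted ascending for stable scrape diffs.
def prom_status_family(name, help, by_status)
  out = +"# HELP #{name} #{help}\n# TYPE #{name} counter\n"
  by_status.sort.each { |status, v| out << %(#{name}{status="#{status}"} #{v}\n) }
  out
end

puts prom_counter('hyperion_requests_total', 'Total HTTP requests handled', 42)
puts prom_status_family('hyperion_responses_status_total',
                        'Responses by HTTP status code',
                        '404' => 3, '200' => 120)
```

Grouping all statuses under one family with a `status` label (rather than one metric name per code) is what lets PromQL aggregate across codes with a single selector.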
@@ -36,6 +36,21 @@ module Hyperion
     CRLF_HEADER_VALUE = /[\r\n]/
 
     def write(io, status, headers, body, keep_alive: false)
+      # Zero-copy fast path: bodies that point at an on-disk file (Rack::Files,
+      # asset servers, signed-download responders) get streamed via
+      # IO.copy_stream, which delegates to sendfile(2) on Linux for plain TCP
+      # sockets — bytes go from the file's page cache straight to the socket
+      # buffer with no userspace allocation. For TLS sockets we still avoid the
+      # multi-MB String build, but encryption forces a userspace round-trip so
+      # we count that path separately.
+      return write_sendfile(io, status, headers, body, keep_alive: keep_alive) if body.respond_to?(:to_path)
+
+      write_buffered(io, status, headers, body, keep_alive: keep_alive)
+    end
+
+    private
+
+    def write_buffered(io, status, headers, body, keep_alive:)
       # Phase 1 buffers the full body so Content-Length is exact.
       # Phase 2 introduces chunked transfer-encoding for streaming bodies;
       # Phase 5 batches via IO::Buffer to avoid this intermediate String.
@@ -43,7 +58,7 @@
       body.each { |chunk| buffered << chunk }
 
       reason = REASONS[status] || 'Unknown'
-      date_str = Time.now.httpdate
+      date_str = cached_date
 
       head = build_head(status, reason, headers, buffered.bytesize, keep_alive, date_str)
 
@@ -67,7 +82,52 @@
       body.close if body.respond_to?(:close)
     end
 
-    private
+    def write_sendfile(io, status, headers, body, keep_alive:)
+      path = body.to_path
+      file = File.open(path, 'rb')
+      file_size = file.size
+
+      # If the app explicitly set content-length, respect it; otherwise use the
+      # real file size. Rack::Files does not pre-set content-length, so the
+      # common case is the File#size branch.
+      content_length = explicit_content_length(headers) || file_size
+
+      reason = REASONS[status] || 'Unknown'
+      date_str = cached_date
+      head = build_head(status, reason, headers, content_length, keep_alive, date_str)
+
+      io.write(head)
+      # IO.copy_stream copies up to file_size bytes from the file to the socket.
+      # On Linux + plain TCPSocket this triggers sendfile(2) — kernel-level
+      # zero-copy. On TLS sockets and non-Linux platforms it falls back to
+      # internal read+write loops, but we still avoid building a String the
+      # size of the file in Ruby.
+      copied = IO.copy_stream(file, io, file_size)
+
+      record_zero_copy_metric(io)
+      Hyperion.metrics.increment(:bytes_written, head.bytesize + copied)
+    ensure
+      file&.close
+      body.close if body.respond_to?(:close)
+    end
+
+    def explicit_content_length(headers)
+      headers.each do |k, v|
+        return v.to_i if k.to_s.casecmp('content-length').zero?
+      end
+      nil
+    end
+
+    # Plain TCPSocket → real sendfile(2). TLS-wrapped sockets cannot use
+    # sendfile (the kernel can't encrypt) but still avoid the per-response
+    # String allocation, so we track them under a separate counter.
+    def record_zero_copy_metric(io)
+      if defined?(::OpenSSL::SSL::SSLSocket) && io.is_a?(::OpenSSL::SSL::SSLSocket)
+        Hyperion.metrics.increment(:tls_zerobuf_responses)
+      else
+        Hyperion.metrics.increment(:sendfile_responses)
+      end
+    end
 
     # rc17: prefer the C extension when available — eliminates the per-response
     # status-line interpolation, normalized hash, and per-header String#<<
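The `IO.copy_stream` behaviour the sendfile path relies on can be exercised without a server. A self-contained sketch (StringIO stands in for the socket, so this demonstrates the API contract, not the actual sendfile(2) fast path):

```ruby
require 'tempfile'
require 'stringio'

# IO.copy_stream from a file to a writable IO. With a real TCPSocket
# destination on Linux, Ruby delegates to sendfile(2); with StringIO it
# falls back to internal read/write loops. Either way our code never
# materializes a file-sized Ruby String.
file = Tempfile.new('body')
file.write('x' * 1024)
file.rewind

dest = StringIO.new
copied = IO.copy_stream(file, dest, file.size) # third arg caps the byte count
copied # => 1024

file.close!
```

The return value is the number of bytes actually copied, which is what the diff adds to the `:bytes_written` counter alongside the head size.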
@@ -20,18 +20,41 @@ module Hyperion
     DEFAULT_READ_TIMEOUT_SECONDS = 30
     DEFAULT_THREAD_COUNT = 5
 
+    # Pre-built minimal 503 response for the backpressure path. We bypass
+    # ResponseWriter / Rack entirely — no env build, no app dispatch, no
+    # access-log line. The bytes are frozen and reused across every
+    # rejection so the overload path stays allocation-free. Body is JSON
+    # so JSON-only API consumers don't have to special-case the format.
+    REJECT_503 = lambda {
+      body = +%({"error":"server_busy","retry_after_seconds":1}\n)
+      body.force_encoding(Encoding::ASCII_8BIT)
+      head = +"HTTP/1.1 503 Service Unavailable\r\n" \
+             "content-type: application/json\r\n" \
+             "content-length: #{body.bytesize}\r\n" \
+             "retry-after: 1\r\n" \
+             "connection: close\r\n" \
+             "\r\n"
+      head.force_encoding(Encoding::ASCII_8BIT)
+      (head + body).freeze
+    }.call
+
     attr_reader :host, :port
 
     def initialize(app:, host: '127.0.0.1', port: 9292, read_timeout: DEFAULT_READ_TIMEOUT_SECONDS,
-                   tls: nil, thread_count: DEFAULT_THREAD_COUNT)
-      @host = host
-      @port = port
-      @app = app
-      @read_timeout = read_timeout
-      @tls = tls
-      @thread_count = thread_count
-      @thread_pool = nil
-      @stopped = false
+                   tls: nil, thread_count: DEFAULT_THREAD_COUNT, max_pending: nil,
+                   max_request_read_seconds: 60, h2_settings: nil, async_io: false)
+      @host = host
+      @port = port
+      @app = app
+      @read_timeout = read_timeout
+      @tls = tls
+      @thread_count = thread_count
+      @max_pending = max_pending
+      @max_request_read_seconds = max_request_read_seconds
+      @h2_settings = h2_settings
+      @async_io = async_io
+      @thread_pool = nil
+      @stopped = false
     end
 
     def listen
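The pre-built 503 pattern above (assemble once at load time, freeze, reuse) can be reproduced in isolation. A sketch with illustrative constant names, not the gem's own:

```ruby
# Body first, so its bytesize can be interpolated into content-length.
# .b returns a binary (ASCII-8BIT) copy, matching the force_encoding
# calls in the diff: HTTP bytes on the wire are encoding-agnostic.
REJECT_BODY = %({"error":"server_busy","retry_after_seconds":1}\n).b

REJECT_RESPONSE = (
  "HTTP/1.1 503 Service Unavailable\r\n" \
  "content-type: application/json\r\n" \
  "content-length: #{REJECT_BODY.bytesize}\r\n" \
  "retry-after: 1\r\n" \
  "connection: close\r\n" \
  "\r\n".b + REJECT_BODY
).freeze

# Every rejection is then a single socket.write(REJECT_RESPONSE):
# no per-request String allocation on the overload path.
```

Freezing also makes the constant safe to share across worker threads without synchronization, since no caller can mutate it.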
@@ -83,18 +106,25 @@
 
     def start
       listen unless @server
-      @thread_pool = ThreadPool.new(size: @thread_count) if @thread_count.positive?
+      @thread_pool = ThreadPool.new(size: @thread_count, max_pending: @max_pending) if @thread_count.positive?
 
-      if @tls
+      if @tls || @async_io
         # TLS path: ALPN may pick `h2`, and h2 spawns one fiber per stream
         # inside Http2Handler. Keep the Async wrapper so the scheduler is
         # available for those fibers and for handshake yields.
+        #
+        # async_io: true is the operator opt-in for plain HTTP/1.1. The Async
+        # wrap is required when callers want fiber-cooperative I/O — e.g.
+        # `hyperion-async-pg` yielding while a Postgres query is in flight.
+        # Pays ~5% throughput vs the raw-loop fast path; in exchange one
+        # OS thread can serve N concurrent in-flight DB queries instead of 1.
         start_async_loop
       else
-        # Plain HTTP/1.1: the worker thread owns each connection for its
-        # lifetime, so the Async wrapper adds zero value (no fibers ever
-        # run on this loop's task). Skip it — pure IO.select + accept_nonblock
-        # shaves measurable overhead off the accept hot path.
+        # Plain HTTP/1.1, async_io: false (default): the worker thread owns
+        # each connection for its lifetime, so the Async wrapper adds zero
+        # value (no fibers ever run on this loop's task). Skip it — pure
+        # IO.select + accept_nonblock shaves measurable overhead off the
+        # accept hot path.
         start_raw_loop
       end
     ensure
@@ -121,9 +151,12 @@
 
         apply_timeout(socket)
         if @thread_pool
-          @thread_pool.submit_connection(socket, @app)
+          unless @thread_pool.submit_connection(socket, @app,
+                                                max_request_read_seconds: @max_request_read_seconds)
+            reject_connection(socket)
+          end
         else
-          Connection.new.serve(socket, @app)
+          Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
         end
       end
     end
@@ -148,15 +181,47 @@
         # HTTP/2: each stream runs on a fiber inside Http2Handler. The
         # handler still uses the pool's `#call` for app.call hops on each
         # stream (one per stream, not one per connection).
-        Http2Handler.new(app: @app, thread_pool: @thread_pool).serve(socket)
+        Http2Handler.new(app: @app, thread_pool: @thread_pool, h2_settings: @h2_settings).serve(socket)
+      elsif @async_io
+        # async_io plain HTTP/1.1: serve inline on the calling fiber so the
+        # request runs *under* Async::Scheduler. This is what makes
+        # hyperion-async-pg (and other Async-aware libraries) actually
+        # cooperate — each fiber yields the OS thread on socket waits, so
+        # one thread can serve N concurrent in-flight DB queries. The
+        # thread pool is intentionally bypassed here: handing the socket
+        # to a worker thread strips the scheduler context.
+        Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
       elsif @thread_pool
         # HTTP/1.1 (e.g. TLS-wrapped after ALPN picked http/1.1): hand the
         # connection to a worker thread. The fiber that called dispatch
-        # returns immediately.
-        @thread_pool.submit_connection(socket, @app)
+        # returns immediately. On overflow, reject with 503 + close.
+        unless @thread_pool.submit_connection(socket, @app,
+                                              max_request_read_seconds: @max_request_read_seconds)
+          reject_connection(socket)
+        end
       else
         # No pool (thread_count: 0): inline on the calling fiber.
-        Connection.new.serve(socket, @app)
+        Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
+      end
+    end
+
+    # Backpressure rejection. Emits a pre-built 503 + closes the socket.
+    # No Rack env, no app dispatch, no access-log line — the overload
+    # path must stay cheap so we don't pile rejection cost on top of the
+    # already-saturated workers. Bumps :rejected_connections so operators
+    # can alert on sustained overload.
+    def reject_connection(socket)
+      socket.write(REJECT_503)
+      Hyperion.metrics.increment(:rejected_connections)
+    rescue StandardError
+      # Client may have hung up between accept and our 503 write — that's
+      # the failure mode we're protecting them from anyway, so swallow.
+      nil
+    ensure
+      begin
+        socket.close
+      rescue StandardError
+        nil
       end
     end
 
@@ -26,11 +26,12 @@ module Hyperion
   class ThreadPool
     SHUTDOWN = :__hyperion_thread_pool_shutdown__
 
-    attr_reader :size
+    attr_reader :size, :max_pending
 
-    def initialize(size:)
-      @size = size
-      @inbox = Queue.new # multiplexes both kinds of jobs
+    def initialize(size:, max_pending: nil)
+      @size = size
+      @max_pending = max_pending
+      @inbox = Queue.new # multiplexes both kinds of jobs
       # Pre-allocate one reply queue per in-flight slot for the legacy `#call`
       # path. Bounded by `size`: if all workers are busy, all reply queues are
       # checked out, and the next caller blocks on `@reply_pool.pop` until a
@@ -43,8 +44,23 @@
     # HTTP/1.1 path: hand the whole socket to a worker thread. The worker
     # runs `Connection#serve(socket, app)` directly. No per-request hop.
     # Returns immediately — caller does not wait.
-    def submit_connection(socket, app)
-      @inbox << [:connection, socket, app]
+    #
+    # Returns true on enqueue, false on rejection. When `max_pending` is set
+    # and the inbox already has at least that many entries, the connection
+    # is rejected and the decision is handed back to the caller (Server
+    # emits a 503 and closes the socket). Without `max_pending` (default
+    # nil) the queue is unbounded and we always return true — preserves
+    # pre-1.2 behaviour.
+    #
+    # The check is inherently racy with worker drain — workers may pop
+    # between our `size` read and the `<<`. Backpressure is statistical,
+    # not strict. An off-by-one over the configured cap during a thundering
+    # accept burst is acceptable; the cost of stricter sync would be a
+    # mutex on every enqueue, which we won't pay on the hot path.
+    def submit_connection(socket, app, max_request_read_seconds: 60)
+      return false if @max_pending && @inbox.size >= @max_pending
+
+      @inbox << [:connection, socket, app, max_request_read_seconds]
+      true
     end
 
     # HTTP/2 + sub-call path: hop one `app.call` from the calling fiber to a
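The "statistical, not strict" cap described above reduces to one racy `Queue#size` read before the push. A self-contained sketch of the same shape (the `BoundedInbox` class is illustrative, not the gem's API):

```ruby
# Bounded enqueue over Ruby's thread-safe Queue. The size check and the
# push are NOT atomic, so concurrent producers can overshoot the cap by
# a few entries — the same accepted trade-off as in the diff above.
class BoundedInbox
  def initialize(max_pending)
    @max_pending = max_pending
    @inbox = Queue.new
  end

  # true on enqueue, false when the (racy) cap is already met and the
  # caller should shed load instead.
  def offer(job)
    return false if @max_pending && @inbox.size >= @max_pending

    @inbox << job
    true
  end

  def size
    @inbox.size
  end
end

inbox = BoundedInbox.new(2)
inbox.offer(:a) # => true
inbox.offer(:b) # => true
inbox.offer(:c) # => false (cap reached; in the server this becomes a 503)
```

Ruby's core `Queue` also supports a hard bound via `SizedQueue`, but that blocks the producer on overflow, which is exactly what an accept loop must never do.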
@@ -78,12 +94,12 @@
 
       case job[0]
       when :connection
-        _, socket, app = job
+        _, socket, app, max_request_read_seconds = job
         # Worker thread owns the connection for its full lifetime. Pass
         # thread_pool: nil so Connection#call_app inlines Adapter::Rack.call
         # — the worker IS the pool, no further hop required.
         begin
-          Hyperion::Connection.new.serve(socket, app)
+          Hyperion::Connection.new.serve(socket, app, max_request_read_seconds: max_request_read_seconds)
        rescue StandardError => e
          Hyperion.logger.error do
            {
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Hyperion
-  VERSION = '1.1.0'
+  VERSION = '1.3.0'
 end
@@ -18,16 +18,22 @@ module Hyperion
   class Worker
     def initialize(host:, port:, app:, read_timeout:, tls: nil,
                    thread_count: Server::DEFAULT_THREAD_COUNT,
-                   config: nil, worker_index: 0, listener: nil)
-      @host = host
-      @port = port
-      @app = app
-      @read_timeout = read_timeout
-      @tls = tls
-      @thread_count = thread_count
-      @config = config || Hyperion::Config.new
-      @worker_index = worker_index
-      @listener = listener
+                   config: nil, worker_index: 0, listener: nil,
+                   max_pending: nil, max_request_read_seconds: 60,
+                   h2_settings: nil, async_io: false)
+      @host = host
+      @port = port
+      @app = app
+      @read_timeout = read_timeout
+      @tls = tls
+      @thread_count = thread_count
+      @config = config || Hyperion::Config.new
+      @worker_index = worker_index
+      @listener = listener
+      @max_pending = max_pending
+      @max_request_read_seconds = max_request_read_seconds
+      @h2_settings = h2_settings
+      @async_io = async_io
     end
 
     def run
@@ -43,7 +49,11 @@
 
       server = Server.new(host: @host, port: @port, app: @app,
                          read_timeout: @read_timeout, tls: @tls,
-                         thread_count: @thread_count)
+                         thread_count: @thread_count,
+                         max_pending: @max_pending,
+                         max_request_read_seconds: @max_request_read_seconds,
+                         h2_settings: @h2_settings,
+                         async_io: @async_io)
       tcp_server = @listener || build_reuseport_listener
       server.adopt_listener(tcp_server)
 
data/lib/hyperion.rb CHANGED
@@ -63,6 +63,44 @@ module Hyperion
       else true # default ON
       end
     end
+
+    # Pre-fork warmup. Run by Master and CLI single-mode BEFORE children are
+    # forked (or before the lone worker starts accepting). Pre-allocates the
+    # Rack adapter's object pools and eager-touches lazily-resolved constants
+    # so each forked child inherits warm memory via copy-on-write — the first
+    # N requests on a fresh worker no longer pay the allocation / autoload
+    # tax that would otherwise serialize behind the GVL on cold start.
+    #
+    # Idempotent — second and later calls are no-ops. Failures are swallowed
+    # with a warn log: warmup is an optimization, not a correctness gate.
+    # If, for instance, OpenSSL can't be required in some odd environment,
+    # we'd rather start cold than refuse to boot.
+    def warmup!
+      return if @warmed
+
+      @warmed = true
+
+      if defined?(::Hyperion::Adapter::Rack) && ::Hyperion::Adapter::Rack.respond_to?(:warmup_pool)
+        ::Hyperion::Adapter::Rack.warmup_pool(8)
+      end
+
+      # Touch the C extension's response-head builder so its lazily-initialized
+      # internal state runs in the master, not in every child after fork.
+      ::Hyperion::CParser.respond_to?(:build_response_head) if defined?(::Hyperion::CParser)
+
+      # Eager-load TLS / SSLSocket. The sendfile path's `is_a?` check would
+      # otherwise trigger autoload in the worker on the first TLS response.
+      require 'openssl'
+      defined?(::OpenSSL::SSL::SSLSocket) && ::OpenSSL::SSL::SSLSocket.name
+
+      # Force the `time` stdlib's date-formatting machinery to load by
+      # emitting one httpdate. Subsequent calls hit the per-thread
+      # `cached_date` slot in response_writer.
+      Time.now.httpdate
+      nil
+    rescue StandardError => e
+      Hyperion.logger.warn { { message: 'warmup failed (non-fatal)', error: e.message } }
+      nil
+    end
   end
 end
 
@@ -89,6 +127,7 @@ require_relative 'hyperion/request'
 require_relative 'hyperion/parser'
 require_relative 'hyperion/c_parser'
 require_relative 'hyperion/adapter/rack'
+require_relative 'hyperion/prometheus_exporter'
 require_relative 'hyperion/admin_middleware'
 require_relative 'hyperion/response_writer'
 require_relative 'hyperion/thread_pool'
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: hyperion-rb
 version: !ruby/object:Gem::Version
-  version: 1.1.0
+  version: 1.3.0
 platform: ruby
 authors:
 - Andrey Lobanov
@@ -160,6 +160,7 @@ files:
 - lib/hyperion/metrics.rb
 - lib/hyperion/parser.rb
 - lib/hyperion/pool.rb
+- lib/hyperion/prometheus_exporter.rb
 - lib/hyperion/request.rb
 - lib/hyperion/response_writer.rb
 - lib/hyperion/server.rb