hyperion-rb 1.1.0 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5670da7700c48436d0e3ded790cf5df090deeebd38bc0b7024b9b6e95c20b5c8
- data.tar.gz: 10bedef6e02717511eb83bea0044e71978d7e13b9d93d0ac37310a89f6581e9a
+ metadata.gz: 53235f97fb1e384507f62373cd12180ade1eecc8df6f9ce75145ba60403a983e
+ data.tar.gz: 1c58ede296c54a098d26cae2b69837f23bf60fb5ce4239c033446f583f12a4df
  SHA512:
- metadata.gz: e791cdd9271cb954ddc11ee037ced8c182fffa4c8b27ded1d0c5672cada1d62fb4095d9e4c440136ce8eeed746eca6e4d99ebb3b1e42a2bc9bbd7bce5c1d9615
- data.tar.gz: 4728b4bf159583fc6f46bd8c33dbcf916b74dddd49dd685159d39950112f5716cdc8108903d0ca312b31eef397d2237fab9d2f34d51e90822a7d3cab9c1b6691
+ metadata.gz: 8484e7168d8ba27312edece5c86af770ef0604bf85d13cdde37c5f4c87b9de0417216f241e073340bedeabef292fc4ec032a7379a76a836e236f6f129c97bcd3
+ data.tar.gz: 89ac23881d0ddd4beff79d08551fa6f7e8399948c1607a3a32475202780170dc9c036f1ef93bd1f17dc5a34e1d91d9901bf3e4119e9efea05d5aa528b22271ff
data/CHANGELOG.md CHANGED
@@ -1,5 +1,45 @@
  # Changelog
 
+ ## [1.3.0] - 2026-04-27
+
+ Adds the structural moat for fiber-cooperative I/O. No breaking changes.
+
+ ### Added
+ - **`async_io: true` config flag** (also `--async-io` CLI flag) — when enabled, the plain HTTP/1.1 accept loop runs each connection on a fiber under `Async::Scheduler` instead of handing it to a worker thread. This is what makes [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (and other Async-aware libraries) actually cooperate: each fiber yields the OS thread on socket waits, so one thread can serve N concurrent in-flight DB queries instead of 1. **Default off** to keep the 1.2.0 raw-loop perf for fiber-unaware apps. Trade-off: ~5% throughput hit on hello-world; 5–10× throughput on PG-bound workloads when paired with hyperion-async-pg + a fiber-aware connection pool.
+ - **Bench validation (macOS, 50ms PG round-trip, 200 concurrent wrk conns):**
+
+ | | r/s | p99 |
+ |---|---:|---:|
+ | Puma 7.2 `-t 5` + plain pg (pool=5) | 88.9 | 2.31 s |
+ | **Hyperion 1.3.0 `--async-io -t 5` + hyperion-async-pg (FiberPool=64)** | **1,103.7** | **237 ms** |
+
+ **12.4× throughput, 9.7× lower p99.** Theoretical ceiling at pool=64 + 50ms query is ~1280 r/s; achieved 86% of it. Linux numbers will land in a follow-up bench section.
+
+ ### Changed
+ - TLS / HTTP/2 paths still always use the Async accept loop (unchanged); they ignore the `async_io` flag because they need the scheduler for ALPN handshake yields and per-stream fiber dispatch anyway.
+ - When `async_io: true`, plain HTTP/1.1 dispatch bypasses the thread pool and serves the connection inline on the calling fiber. The pool stays in use for the TLS path's `app.call` hops on each h2 stream.
+
+ ## [1.2.0] - 2026-04-27
+
+ Production hardening + perf round 2. No breaking changes.
+
+ ### Added
+ - **Zero-copy sendfile path** — when a Rack body responds to `#to_path` (e.g. `Rack::Files`, asset uploads), `ResponseWriter` uses `IO.copy_stream(file, socket)` which triggers `sendfile(2)` on Linux for plain TCP. Eliminates the ~MB-sized String allocation per static-asset response. Falls back to userspace copy on TLS / non-Linux but still avoids the userspace String build. New metrics: `:sendfile_responses`, `:tls_zerobuf_responses`.
+ - **Hot fork warmup (`Hyperion.warmup!`)** — master pre-allocates the Rack env Hash pool, primes the C extension's lazy state, and touches commonly-resolved constants before `before_fork`. Workers inherit the warm pools via Copy-on-Write. Removes the first-N-requests-after-fork allocation tax.
+ - **Backpressure (`max_pending`)** — when the thread pool's inbox queue exceeds the configured depth, new accepts get HTTP 503 + `Retry-After: 1` and the socket is closed immediately (no Rack dispatch, no access-log line). Default off (nil); opt in by setting an Integer. New metric: `:rejected_connections`.
+ - **Prometheus exporter** — `AdminMiddleware` now serves `GET /-/metrics` in addition to `POST /-/quit` (same token). Renders `Hyperion.stats` as Prometheus text exposition v0.0.4. Counter names follow the `hyperion_<key>_total` convention; `:responses_<code>` keys are grouped under `hyperion_responses_status_total{status="<code>"}`.
+ - **Slow-client total-deadline (`max_request_read_seconds`)** — per-request wallclock cap on the request-line + headers read phase (default 60s). Defense-in-depth against slowloris: a malicious client can no longer dribble 1 byte per `read_timeout` window indefinitely. On overrun, Hyperion writes 408 + closes. Resets per request on keep-alive sessions. New metric: `:slow_request_aborts`.
+ - **HTTP/2 SETTINGS tuning** — Falcon-class defaults shipped: `MAX_CONCURRENT_STREAMS=128`, `INITIAL_WINDOW_SIZE=1MiB`, `MAX_FRAME_SIZE=1MiB`, `MAX_HEADER_LIST_SIZE=64KiB`. All four overridable via Config DSL (`h2_max_concurrent_streams` etc). Out-of-spec values are clamped + warned, not crashed.
+ - **`docs/REVERSE_PROXY.md`** — nginx + AWS ALB samples, X-Forwarded-* semantics, admin-endpoint hardening at the edge. Includes the documented gotcha that ALB-to-target HTTP/2 strips WebSocket upgrade headers (use HTTP/1.1 upstream).
+
+ ### Changed
+ - **`ResponseWriter` Date header now uses `cached_date`** — the per-thread, per-second cache that landed in 1.1.0 was never wired into the hot path. It is now. Eliminates ~3 String allocations per response (`Time.now.httpdate` → cached String reuse).
+ - **`AdminMiddleware`** refactored: shared `authorize` helper between `/-/quit` and `/-/metrics`; `PATH` constant split into `PATH_QUIT` + `PATH_METRICS`.
+ - **`Hyperion::Logger` per-thread access buffer key** is now namespaced per Logger instance (already shipped as a 1.1.0 follow-up fix; documented here for completeness).
+
+ ### Fixed
+ - N/A — no regressions discovered between 1.1.0 and 1.2.0.
+
  ## [1.1.0] - 2026-04-27
 
  First minor release after 1.0.0. Production hardening + perf wins, no breaking changes.
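The counter-naming convention described in the 1.2.0 Prometheus exporter note (plain counters as `hyperion_<key>_total`, `:responses_<code>` keys grouped under one labelled family) can be sketched roughly as follows. This is a hypothetical illustration of the convention only, not the gem's actual `PrometheusExporter` source; the stats keys used below are assumptions.

```ruby
# Sketch of the documented naming convention: every stats key becomes a
# hyperion_<key>_total counter, except responses_<code> keys, which are
# collapsed into one labelled hyperion_responses_status_total family.
def render_prometheus(stats)
  stats.flat_map { |key, value|
    k = key.to_s
    if (code = k[/\Aresponses_(\d{3})\z/, 1])
      [%(hyperion_responses_status_total{status="#{code}"} #{value})]
    else
      ["# TYPE hyperion_#{k}_total counter", "hyperion_#{k}_total #{value}"]
    end
  }.join("\n") << "\n"
end
```

Grouping the per-status counters under one metric family with a `status` label (rather than one family per code) is what lets PromQL sum or filter across status codes in a single expression.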
data/README.md CHANGED
@@ -29,26 +29,27 @@ All numbers are real wrk runs against published Hyperion configs. Hyperion ships
 
  ### Hello-world Rack app
 
- `bench/hello.ru`, single worker, parity threads (`-t 16` vs Puma `-t 16:16`), 4 wrk threads / 50 connections / 10s, macOS arm64 / Ruby 3.3.3:
+ `bench/hello.ru`, single worker, parity threads (`-t 5` vs Puma `-t 5:5`), 4 wrk threads / 100 connections / 15s, macOS arm64 / Ruby 3.3.3, Hyperion 1.2.0:
 
- | | r/s | p99 |
- |---|---:|---:|
- | **Hyperion default (logs ON)** | **23,885** | **1.05 ms** |
- | Hyperion `--no-log-requests` | 24,222 | 1.00 ms |
- | Puma `-t 16:16` | 18,794 | 30.89 ms |
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | **Hyperion 1.2.0** (default, logs ON) | **22,496** | **502 µs** | **1×** |
+ | Falcon 0.55.3 `--count 1` | 22,199 | 5.36 ms | 11× worse |
+ | Puma 7.1.0 `-t 5:5` | 20,400 | 422.85 ms | 845× worse |
 
- **1.27× Puma throughput, ~30× lower p99 — while emitting structured JSON access logs Puma doesn't.**
+ **Hyperion: 1.10× Puma throughput, parity with Falcon on throughput, ~10× lower p99 than Falcon and ~845× lower than Puma — while emitting structured JSON access logs the others don't.**
 
  ### Production cluster config (`-w 4`)
 
- Same bench app, `-w 4` cluster, parity threads. macOS arm64:
+ Same bench app, `-w 4` cluster, parity threads (`-t 5` everywhere), 4 wrk threads / 200 connections / 15s, macOS arm64:
 
- | | r/s | p99 |
- |---|---:|---:|
- | **Hyperion `-w 4 -t 10`** | **44,221** | **1.15 ms** |
- | Puma `-w 4 -t 10:10` | 37,929 | 17.06 ms |
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | Falcon `--count 4` | 48,197 | 4.84 ms | 5.9× worse |
+ | **Hyperion `-w 4 -t 5`** | **40,137** | **825 µs** | **1×** |
+ | Puma `-w 4 -t 5:5` | 34,793 | 177.76 ms | 215× worse (1 timeout) |
 
- **1.17× Puma throughput, ~15× lower p99.**
+ Falcon edges Hyperion by ~20% on raw rps at `-w 4` on macOS hello-world. **Hyperion still leads on tail latency: 5.9× over Falcon and 215× over Puma**, and beats Puma on throughput by 1.15×. On Linux production-config and DB-backed workloads (below) Hyperion takes the rps lead too — Falcon's macOS hello-world advantage disappears once the workload includes any actual work or the kernel is Linux.
 
  ### Linux production-config (DB-backed Rack)
 
@@ -60,7 +61,37 @@ Same bench app, `-w 4` cluster, parity threads. macOS arm64:
  | Hyperion `--no-log-requests` | 6,364 | 1.114× |
  | Puma `-w 4 -t 10:10` (no per-req logs) | 5,715 | 1.000× |
 
- Bench is network-bound (~3-4 ms median is the PG + Redis round-trip). Hyperion's lead comes from cheaper per-request CPU: lock-free per-thread metrics, per-thread cached iso8601 timestamps in the access log, hand-rolled single-interpolation log line builder, no logger mutex (POSIX `write(2)` atomicity), C-extension response-head builder.
+ Bench is **wait-bound**: the ~3-4 ms median is the PG + Redis round-trip, dwarfing the per-request CPU work where Hyperion's optimisations live. With a synchronous `pg` driver, fibers don't help: every in-flight DB call still parks an OS thread, and both servers max out at `workers × threads` concurrent queries. Widening this gap requires either an async PG driver — see [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) (companion gem; pair with `--async-io` and a fiber-aware pool, see "Async I/O — fiber concurrency on PG-bound apps" below) — or a CPU-bound workload, where Hyperion's lead becomes visible (see the CPU-bound JSON workload section).
+
+ ### Async I/O — fiber concurrency on PG-bound apps
+
+ `bench/pg_concurrent.ru` (50 ms PG query per request, pool sized for the server's concurrency model). macOS, Postgres over WAN, wrk `-t4 -c200 -d20s`:
+
+ | | r/s | p99 |
+ |---|---:|---:|
+ | Puma 7.2 `-t 5` + plain pg (pool=5) | 88.9 | 2.31 s |
+ | **Hyperion 1.3.0 `--async-io -t 5` + hyperion-async-pg (FiberPool=64)** | **1,103.7** | **237 ms** |
+
+ **12.4× throughput, 9.7× lower p99.** Puma is bottlenecked at `threads × 1 in-flight query` because plain `pg` blocks the OS thread on `recv()`. Hyperion + async-pg + a fiber-aware pool decouples concurrency from threads: 5 OS threads serve 64 concurrent in-flight queries via fiber cooperation. Theoretical ceiling at pool=64 + 50 ms query = 1280 r/s; achieved 1103 r/s = 86% of it.
+
+ Three things must all be true to get this win:
+ 1. **`async_io: true`** in your Hyperion config (or `--async-io` CLI flag). Default is off to keep 1.2.0's raw-loop perf for fiber-unaware apps.
+ 2. **`hyperion-async-pg`** installed: `gem 'hyperion-async-pg', require: 'hyperion/async_pg'` + `Hyperion::AsyncPg.install!` at boot.
+ 3. **Fiber-aware connection pool.** The popular `connection_pool` gem is NOT — its Mutex blocks the OS thread. Use [`async-pool`](https://github.com/socketry/async-pool), `Async::Semaphore`, or hand-roll one (see `bench/pg_concurrent.ru` for a 30-line FiberPool example).
+
+ Skip any of these and you get parity with Puma at the same `-t`. Run the bench yourself: `MODE=async DATABASE_URL=... PG_POOL_SIZE=64 bundle exec hyperion --async-io -t 5 bench/pg_concurrent.ru` (in the [hyperion-async-pg](https://github.com/andrew-woblavobla/hyperion-async-pg) repo).
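The 30-line FiberPool that requirement 3 references isn't reproduced in this README. The shape of the idea is roughly the following: a hypothetical, single-threaded sketch using plain `Fiber` objects so it runs without the `async` gem; the real bench pool parks and wakes fibers through `Async::Scheduler` instead of resuming them directly.

```ruby
# Hypothetical fiber-aware pool sketch. A fiber that finds the pool empty
# parks itself with Fiber.yield instead of blocking on a Mutex, and is
# resumed by whichever fiber releases a connection next — the OS thread
# is never blocked waiting for a checkout.
class FiberPool
  def initialize(items)
    @free = items.dup
    @waiting = [] # fibers parked until a release
  end

  def acquire
    while @free.empty?
      @waiting << Fiber.current
      Fiber.yield # suspend this fiber only; the thread keeps working
    end
    @free.pop
  end

  def release(item)
    @free.push(item)
    waiter = @waiting.shift
    waiter.resume if waiter
  end

  def with
    item = acquire
    yield item
  ensure
    release(item) if item
  end
end

# Demo: four "requests" share two connections on one thread. Each fiber
# checks out a connection, simulates an in-flight query (Fiber.yield),
# and releases; parked fibers wake as connections come back.
pool = FiberPool.new(%i[conn_a conn_b])
order = []
workers = Array.new(4) do |i|
  Fiber.new do
    pool.with do |conn|
      order << [i, conn]
      Fiber.yield # pretend the query is in flight
    end
  end
end
workers.each(&:resume)                    # 0 and 1 check out; 2 and 3 park
workers.each { |f| f.resume if f.alive? } # finish; parked fibers get served
```

The `@waiting` list is the whole point: an empty pool suspends only the calling fiber, so a handful of OS threads can keep dozens of checkouts in flight.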
+
+ ### CPU-bound JSON workload
+
+ `bench/work.ru` — handler builds a 50-key fixture, JSON-encodes a fresh response per request (~8 KB body), processes a 6-cookie header chain. wrk `-t4 -c200 -d15s`, macOS arm64 / Ruby 3.3.3, 1.2.0:
+
+ | | r/s | p99 | tail vs Hyperion |
+ |---|---:|---:|---:|
+ | Falcon `--count 4` | 46,166 | 20.17 ms | 24× worse |
+ | **Hyperion `-w 4 -t 5`** | **43,924** | **824 µs** | **1×** |
+ | Puma `-w 4 -t 5:5` | 36,383 | 166.30 ms (47 socket errors) | 200× worse |
+
+ **1.21× Puma throughput, 200× lower p99.** This is the gap that hides behind PG-round-trip noise on the DB bench. Hyperion's per-request CPU savings (lock-free per-thread metrics, frozen header keys in the Rack adapter, C-ext response head builder, cached iso8601 timestamps, cached HTTP Date header) land on the wire when the workload is CPU-bound. Falcon edges us 5% on raw r/s but with 24× worse tail — a different tradeoff curve. Reproduce: `bundle exec bin/hyperion -w 4 -t 5 -p 9292 bench/work.ru`.
 
  ### Real Rails 8.1 app (single worker, parity threads `-t 16`)
 
@@ -77,6 +108,25 @@ Health endpoint that traverses the full middleware chain (rack-attack, locale re
 
  On Grape and Rails-controller workloads Puma hits wrk's 2 s timeout cap on ~⅔ of requests — its real p99 is censored above 2 s. Hyperion serves all of its requests under 1.2 s with 0 to 16 timeouts. **1.14–1.48× Puma throughput** depending on endpoint.
 
+ ### Static-asset serving (sendfile zero-copy path, 1.2.0+)
+
+ `bench/static.ru` (`Rack::Files` over a 1 MiB asset), `-w 1`, `wrk -t4 -c100 -d15s`, macOS arm64 / Ruby 3.3.3:
+
+ | | r/s | p99 | transferred | tail vs winner |
+ |---|---:|---:|---:|---:|
+ | **Hyperion (sendfile path)** | **2,069** | **3.10 ms** | 30.4 GB | **1×** |
+ | Puma `-w 1 -t 5:5` | 2,109 | 566.16 ms | 31.0 GB | 183× worse |
+ | Falcon `--count 1` | 1,269 | 801.01 ms | 18.7 GB | 258× worse (28 timeouts) |
+
+ Throughput is bandwidth-bound on localhost (≈2 GB/s = the loopback memory ceiling), so the throughput column looks like parity. The actual win is in the **tail latency** column: Hyperion's `IO.copy_stream` → `sendfile(2)` path skips userspace entirely, while Puma allocates a String per response and Falcon serializes more aggressively. On real network paths sendfile widens the gap further (kernel-to-NIC zero-copy).
+
+ Reproduce:
+ ```sh
+ ruby -e 'File.binwrite("/tmp/hyperion_bench_asset_1m.bin", "x" * (1024*1024))'
+ bundle exec bin/hyperion -p 9292 bench/static.ru
+ wrk --latency -t4 -c100 -d15s http://127.0.0.1:9292/hyperion_bench_asset_1m.bin
+ ```
+
  ### Concurrency at scale (architectural advantages)
 
  These workloads demonstrate structural differences between Hyperion's fiber-per-connection / fiber-per-stream model and Puma's thread-pool model. Numbers are illustrative; the architecture is what matters. Run on Ubuntu 24.04 / Ruby 3.3.3, single worker, h2load `-c <conns> -n 100000 --rps 1000 --h1`.
@@ -49,6 +49,20 @@ module Hyperion
    )
 
    class << self
+     # Pre-allocate `count` env-hash and rack-input objects in master before
+     # fork. Children inherit the populated free-list via copy-on-write —
+     # the hash slots stay shared until a request mutates them. Eliminates
+     # the first-N-requests allocation tax that every fresh worker would
+     # otherwise pay on cold start. Idempotent: safe to call multiple
+     # times; the pool simply caps at its configured `max_size`.
+     def warmup_pool(count = 8)
+       warmed_envs = Array.new(count) { ENV_POOL.acquire }
+       warmed_inputs = Array.new(count) { INPUT_POOL.acquire }
+       warmed_envs.each { |e| ENV_POOL.release(e) }
+       warmed_inputs.each { |i| INPUT_POOL.release(i) }
+       nil
+     end
+
      def call(app, request)
        env, input = build_env(request)
        status, headers, body = app.call(env)
@@ -7,7 +7,8 @@ module Hyperion
  # listener as the application. Disabled by default — only mounted when
  # `admin_token` is configured. Currently provides:
  #
- #   POST /-/quit    → triggers graceful master drain (SIGTERM to ppid)
+ #   POST /-/quit    → triggers graceful master drain (SIGTERM to ppid)
+ #   GET  /-/metrics → returns Hyperion.stats in Prometheus text format
  #
  # Auth: the request must include `X-Hyperion-Admin-Token: <token>`.
  # Mismatch → 401. Path/method mismatch → falls through to the app
@@ -18,9 +19,17 @@ module Hyperion
  # SECURITY: the bearer token is defense-in-depth, not a substitute for
  # network isolation. Operators MUST keep the listener on a private
  # network or behind TLS + an authenticating reverse proxy. Anyone who
- # can reach the listener AND knows the token can drain the server.
+ # can reach the listener AND knows the token can drain the server or
+ # scrape its metrics. See docs/REVERSE_PROXY.md for nginx/ALB recipes
+ # that block /-/* at the edge.
  class AdminMiddleware
-   PATH = '/-/quit'
+   PATH_QUIT    = '/-/quit'
+   PATH_METRICS = '/-/metrics'
+
+   METRICS_CONTENT_TYPE = 'text/plain; version=0.0.4; charset=utf-8'
+   JSON_CONTENT_TYPE    = 'application/json'
+
+   UNAUTHORIZED_BODY = %({"error":"unauthorized"}\n)
 
    def initialize(app, token:, signal_target: nil)
      raise ArgumentError, 'admin_token must be a non-empty String' if token.nil? || token.to_s.empty?
@@ -33,38 +42,59 @@ module Hyperion
    end
 
    def call(env)
-     return @app.call(env) unless admin_request?(env)
+     path = env['PATH_INFO']
+     method = env['REQUEST_METHOD']
 
-     provided = env['HTTP_X_HYPERION_ADMIN_TOKEN'].to_s
-     # Constant-time comparison. Rack::Utils.secure_compare requires same
-     # length, so prefix-pad first to avoid a length-leak side channel.
-     unless secure_match?(provided)
-       return [401, { 'content-type' => 'application/json' },
-               [%({"error":"unauthorized"}\n)]]
+     if path == PATH_QUIT && method == 'POST'
+       authorize(env) { handle_quit(env) }
+     elsif path == PATH_METRICS && method == 'GET'
+       authorize(env) { handle_metrics }
+     else
+       @app.call(env)
      end
+   end
+
+   private
+
+   # Wrap a handler in the shared bearer-token check. Yields only when the
+   # token matches; returns the canonical 401 response otherwise.
+   def authorize(env)
+     provided = env['HTTP_X_HYPERION_ADMIN_TOKEN'].to_s
+     return unauthorized unless secure_match?(provided)
 
+     yield
+   end
+
+   def unauthorized
+     [401, { 'content-type' => JSON_CONTENT_TYPE }, [UNAUTHORIZED_BODY]]
+   end
+
+   def handle_quit(env)
      target = resolve_signal_target
-     Hyperion.logger.info { { message: 'admin drain requested', remote_addr: env['REMOTE_ADDR'], target_pid: target } }
+     Hyperion.logger.info do
+       { message: 'admin drain requested', remote_addr: env['REMOTE_ADDR'], target_pid: target }
+     end
      begin
        Process.kill('TERM', target)
      rescue StandardError => e
        Hyperion.logger.warn { { message: 'admin drain signal failed', error: e.message } }
-       return [500, { 'content-type' => 'application/json' }, [%({"error":"signal_failed"}\n)]]
+       return [500, { 'content-type' => JSON_CONTENT_TYPE }, [%({"error":"signal_failed"}\n)]]
      end
 
-     [202, { 'content-type' => 'application/json' }, [%({"status":"draining"}\n)]]
+     [202, { 'content-type' => JSON_CONTENT_TYPE }, [%({"status":"draining"}\n)]]
    end
 
-   private
-
-   def admin_request?(env)
-     env['PATH_INFO'] == PATH && env['REQUEST_METHOD'] == 'POST'
+   def handle_metrics
+     body = PrometheusExporter.render(Hyperion.stats)
+     [200, { 'content-type' => METRICS_CONTENT_TYPE }, [body]]
    end
 
    def secure_match?(provided)
      return false if provided.empty?
      return false unless provided.bytesize == @token.bytesize
 
+     # Constant-time comparison. Rack::Utils.secure_compare requires equal
+     # lengths, so unequal lengths are rejected up front before comparing.
      Rack::Utils.secure_compare(provided, @token)
    end
 
data/lib/hyperion/cli.rb CHANGED
@@ -57,6 +57,10 @@ module Hyperion
            'Enable Ruby YJIT (default: auto on RAILS_ENV/RACK_ENV=production/staging)') do |v|
        cli_opts[:yjit] = v
      end
+     o.on('--[no-]async-io',
+          'Run plain HTTP/1.1 connections under Async::Scheduler (required for hyperion-async-pg and other fiber-cooperative I/O; default off)') do |v|
+       cli_opts[:async_io] = v
+     end
      o.on('-h', '--help', 'show help') do
        puts o
        exit 0
@@ -111,12 +115,22 @@ module Hyperion
      tls = build_tls_from_config(config)
      server = Server.new(host: config.host, port: config.port, app: app,
                          tls: tls, thread_count: config.thread_count,
-                         read_timeout: config.read_timeout)
+                         read_timeout: config.read_timeout,
+                         max_pending: config.max_pending,
+                         max_request_read_seconds: config.max_request_read_seconds,
+                         h2_settings: Master.build_h2_settings(config),
+                         async_io: config.async_io)
      server.listen
      scheme = tls ? 'https' : 'http'
      Hyperion.logger.info { { message: 'listening', url: "#{scheme}://#{server.host}:#{server.port}" } }
      warn_c_parser_unavailable
 
+     # Pre-allocate Rack env-pool entries and eager-touch lazy constants.
+     # In single-mode there's no fork, but the warmup still pays for itself
+     # by frontloading the first-N-request allocation cost off the first
+     # real client. Idempotent — safe to call more than once per process.
+     Hyperion.warmup!
+
      # Single-worker mode reuses the lifecycle hooks: before_fork is a no-op
      # here (no fork happens), and on_worker_boot/on_worker_shutdown fire
      # for the lone in-process "worker" so app code that opens DB pools etc.
@@ -199,13 +213,16 @@ module Hyperion
      private_class_method :maybe_enable_yjit
 
      # When admin_token is configured, wrap the app in AdminMiddleware so
-     # POST /-/quit becomes a token-protected drain endpoint. Skipped when
-     # the token is unset — the path falls through to the app, so apps may
-     # still own /-/anything if Hyperion's admin is off.
+     # POST /-/quit and GET /-/metrics become token-protected admin endpoints.
+     # Skipped when the token is unset — those paths fall through to the app,
+     # so apps may still own /-/anything if Hyperion's admin is off.
      def self.wrap_admin_middleware(app, config)
        return app if config.admin_token.nil? || config.admin_token.to_s.empty?
 
-       Hyperion.logger.info { { message: 'admin endpoint enabled', path: AdminMiddleware::PATH } }
+       Hyperion.logger.info do
+         { message: 'admin endpoint enabled',
+           paths: [AdminMiddleware::PATH_QUIT, AdminMiddleware::PATH_METRICS] }
+       end
        AdminMiddleware.new(app, token: config.admin_token)
      end
      private_class_method :wrap_admin_middleware
@@ -28,7 +28,14 @@ module Hyperion
    yjit: nil, # nil → auto: enable on production/staging; true/false to force.
    worker_max_rss_mb: nil, # Integer, e.g. 1024. When a worker exceeds this RSS in MB, master gracefully cycles it. nil disables.
    worker_check_interval: 30, # Seconds between RSS polls. Tradeoff: tighter = faster recycle, more ps calls. 30s matches Puma WorkerKiller.
-   admin_token: nil # String. When set, POST /-/quit triggers graceful drain. nil disables endpoint entirely (returns 404).
+   admin_token: nil, # String. When set, exposes admin endpoints (POST /-/quit triggers graceful drain; GET /-/metrics returns Prometheus-format Hyperion.stats). Same token guards both. nil disables admin entirely (paths fall through to the app).
+   max_pending: nil, # Integer, e.g. 256. When the per-worker accept inbox has this many queued connections, additional accepts are rejected with HTTP 503 + Retry-After:1 instead of being queued. nil disables (current behaviour: unbounded queue).
+   max_request_read_seconds: 60, # Numeric. Total wallclock budget (seconds) for reading the request line + headers + body for ONE request. Defends against slowloris-style drips that satisfy the per-recv read_timeout but never finish the request. Resets between requests on a keep-alive connection. nil disables.
+   async_io: false, # When true, the plain HTTP/1.1 accept loop runs each connection on a fiber under Async::Scheduler instead of handing it to a worker thread. Required for fiber-cooperative I/O (e.g. hyperion-async-pg). Costs ~5% throughput on hello-world; in exchange one OS thread can serve N concurrent in-flight DB queries on wait-bound workloads. TLS / HTTP/2 paths always use the async loop and ignore this flag.
+   h2_max_concurrent_streams: 128, # HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS — cap on simultaneously-open streams per connection. Falcon: 64. nil leaves protocol-http2 default (0xFFFFFFFF).
+   h2_initial_window_size: 1_048_576, # HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE (octets) — flow-control window per stream at open. Bigger = fewer WINDOW_UPDATE round-trips on large bodies. Spec default is 65535. nil → leave protocol default.
+   h2_max_frame_size: 1_048_576, # HTTP/2 SETTINGS_MAX_FRAME_SIZE (octets) — biggest DATA/HEADERS frame we'll accept. Spec floor 16384, ceiling 16777215. We pick 1 MiB to match common CDNs without unbounded buffer growth. nil → leave protocol default (16384).
+   h2_max_header_list_size: 65_536 # HTTP/2 SETTINGS_MAX_HEADER_LIST_SIZE (octets) — advisory cap on the decompressed header block. Bounds memory of pathological client headers. nil → leave protocol default (unbounded).
  }.freeze
 
  HOOKS = %i[before_fork on_worker_boot on_worker_shutdown].freeze
@@ -17,6 +17,7 @@ module Hyperion
    MAX_BODY_BYTES = 16 * 1024 * 1024 # 16 MB cap. Phase 5 introduces streaming bodies.
    HEADER_TERM = "\r\n\r\n"
    TIMEOUT_SENTINEL = :__hyperion_read_timeout__
+   DEADLINE_SENTINEL = :__hyperion_request_deadline__
    IDLE_KEEPALIVE_TIMEOUT_SECONDS = 5
 
    # Default parser is the C-extension `CParser` when the extension built;
@@ -44,14 +45,20 @@ module Hyperion
      @log_requests = log_requests.nil? ? Hyperion.log_requests? : log_requests
    end
 
-   def serve(socket, app)
+   def serve(socket, app, max_request_read_seconds: 60)
      request_count = 0
      carry = +'' # bytes already pulled off the socket but past the prev request boundary
      peer_addr = peer_address(socket)
      @metrics.increment(:connections_accepted)
      @metrics.increment(:connections_active)
      loop do
-       buffer = read_request(socket, carry)
+       # Per-request wallclock deadline. Captured fresh for every request so
+       # long-lived keep-alive sessions with many small requests don't
+       # falsely trip after the cumulative budget elapses.
+       request_started_clock = Process.clock_gettime(Process::CLOCK_MONOTONIC) if max_request_read_seconds
+       buffer = read_request(socket, carry, deadline_started_at: request_started_clock,
+                                            max_request_read_seconds: max_request_read_seconds,
+                                            peer_addr: peer_addr)
        return unless buffer
 
        if buffer == TIMEOUT_SENTINEL
@@ -65,6 +72,10 @@ module Hyperion
          return
        end
 
+       # Slowloris-style abort: deadline tripped during read. We've already
+       # written the 408 (best-effort) inside read_request; close out here.
+       return if buffer == DEADLINE_SENTINEL
+
        request, body_end = @parser.parse(buffer)
        carry = +(buffer.byteslice(body_end, buffer.bytesize - body_end) || '')
        request = enrich_with_peer(request, peer_addr) if peer_addr && request.peer_address.nil?
@@ -193,10 +204,16 @@ module Hyperion
    # pipelining). Returns the full buffer (with any trailing pipelined
    # bytes intact); the parser's returned end_offset tells the caller
    # where this request ends. On EOF returns nil; on read timeout returns
-   # TIMEOUT_SENTINEL.
-   def read_request(socket, carry = +'')
+   # TIMEOUT_SENTINEL; on per-request wallclock deadline trip returns
+   # DEADLINE_SENTINEL (and emits a best-effort 408 + close).
+   def read_request(socket, carry = +'', deadline_started_at: nil, max_request_read_seconds: nil,
+                    peer_addr: nil)
      buffer = carry
      until buffer.include?(HEADER_TERM)
+       if deadline_exceeded?(deadline_started_at, max_request_read_seconds)
+         return abort_for_deadline(socket, deadline_started_at, peer_addr)
+       end
+
        chunk = read_chunk(socket)
        return chunk if chunk.nil? || chunk == TIMEOUT_SENTINEL
        return nil if chunk.empty?
@@ -211,6 +228,9 @@ module Hyperion
      if chunked?(headers_part)
        until chunked_body_complete?(buffer, header_end)
          raise ParseError, 'chunked body exceeds limit' if buffer.bytesize - header_end > MAX_BODY_BYTES
+         if deadline_exceeded?(deadline_started_at, max_request_read_seconds)
+           return abort_for_deadline(socket, deadline_started_at, peer_addr)
+         end
 
          chunk = read_chunk(socket)
          break if chunk.nil? || chunk.empty? || chunk == TIMEOUT_SENTINEL
@@ -220,6 +240,10 @@ module Hyperion
      else
        content_length = headers_part[/^content-length:\s*(\d+)/i, 1].to_i
        while buffer.bytesize < header_end + content_length
+         if deadline_exceeded?(deadline_started_at, max_request_read_seconds)
+           return abort_for_deadline(socket, deadline_started_at, peer_addr)
+         end
+
          chunk = read_chunk(socket)
          break if chunk.nil? || chunk.empty? || chunk == TIMEOUT_SENTINEL
 
@@ -230,6 +254,33 @@ module Hyperion
      buffer
    end
 
+   # nil-disabled or budget-untripped → false. Otherwise the wallclock cap
+   # has been exceeded and the caller should abort.
+   def deadline_exceeded?(started_at, max_seconds)
+     return false unless started_at && max_seconds
+
+     (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at) > max_seconds
+   end
+
+   # Slowloris fallback: log a structured warn, bump :slow_request_aborts,
+   # write a best-effort 408, and let the caller close the socket. We don't
+   # wait on the 408 write — a dribbling client may never read it, and
+   # that's the failure mode we're protecting against anyway.
+   def abort_for_deadline(socket, started_at, peer_addr)
+     elapsed = started_at ? (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at).round(3) : nil
+     @metrics.increment(:slow_request_aborts)
+     @logger.warn do
+       { message: 'request read deadline exceeded', remote_addr: peer_addr, elapsed_seconds: elapsed }
+     end
+     begin
+       socket.write("HTTP/1.1 408 Request Timeout\r\nconnection: close\r\ncontent-length: 0\r\n\r\n")
+     rescue StandardError
+       # Peer may have already gone — nothing to do.
+     end
+     @metrics.increment_status(408)
+     DEADLINE_SENTINEL
+   end
+
    def chunked?(headers_part)
      headers_part.match?(/^transfer-encoding:[ \t]*[^\r\n]*chunked\b/i)
    end
@@ -212,9 +212,34 @@ module Hyperion
       end
     end
 
-    def initialize(app:, thread_pool: nil)
+    # Maps Hyperion-friendly setting names to the integer SETTINGS_* identifiers
+    # protocol-http2 uses on the wire. See RFC 7540 §6.5.2 — these are the
+    # only four parameters Hyperion exposes; the rest of the SETTINGS frame
+    # (HEADER_TABLE_SIZE, ENABLE_PUSH, etc.) keeps protocol-http2's default.
+    SETTINGS_KEY_MAP = {
+      max_concurrent_streams: ::Protocol::HTTP2::Settings::MAXIMUM_CONCURRENT_STREAMS,
+      initial_window_size: ::Protocol::HTTP2::Settings::INITIAL_WINDOW_SIZE,
+      max_frame_size: ::Protocol::HTTP2::Settings::MAXIMUM_FRAME_SIZE,
+      max_header_list_size: ::Protocol::HTTP2::Settings::MAXIMUM_HEADER_LIST_SIZE
+    }.freeze
+
+    # RFC 7540 §6.5.2 floor for SETTINGS_MAX_FRAME_SIZE. protocol-http2 raises
+    # ProtocolError on values below this; we clamp + warn instead so a
+    # misconfigured operator gets a working server, not a boot-time crash.
+    H2_MIN_FRAME_SIZE = 0x4000 # 16384
+
+    # RFC 7540 §6.5.2 ceiling for SETTINGS_MAX_FRAME_SIZE.
+    H2_MAX_FRAME_SIZE = 0xFFFFFF # 16777215
+
+    # RFC 7540 §6.9.2 — INITIAL_WINDOW_SIZE has the same 31-bit max as the
+    # WINDOW_UPDATE frame's Window Size Increment (see protocol-http2's
+    # MAXIMUM_ALLOWED_WINDOW_SIZE).
+    H2_MAX_WINDOW_SIZE = 0x7FFFFFFF
+
+    def initialize(app:, thread_pool: nil, h2_settings: nil)
       @app = app
       @thread_pool = thread_pool
+      @h2_settings = h2_settings
       @metrics = Hyperion.metrics
       @logger = Hyperion.logger
     end
@@ -224,7 +249,7 @@ module Hyperion
       @metrics.increment(:connections_active)
       framer = ::Protocol::HTTP2::Framer.new(socket)
       server = build_server(framer)
-      server.read_connection_preface
+      server.read_connection_preface(initial_settings_payload)
 
       # Extract once — the same TCP peer drives every stream on this conn.
       peer_addr = peer_address(socket)
@@ -290,6 +315,69 @@ module Hyperion
 
     private
 
+    # Build the [setting_id, value] pairs that go in the connection-preface
+    # SETTINGS frame. protocol-http2's Server#read_connection_preface accepts
+    # this array and does the wire encoding for us. Empty array (no overrides
+    # configured) → the SETTINGS frame still goes out, just with no entries,
+    # which the spec permits.
+    #
+    # We clamp out-of-range values (max_frame_size below the spec floor or
+    # above its ceiling, initial_window_size above the 31-bit max) instead of
+    # letting protocol-http2 raise ProtocolError at handshake time — a
+    # crashing handshake leaks the connection. The operator gets a warn so the
+    # misconfiguration surfaces in logs.
+    def initial_settings_payload
+      return [] unless @h2_settings
+
+      payload = []
+      @h2_settings.each do |key, value|
+        next if value.nil?
+
+        setting_id = SETTINGS_KEY_MAP[key]
+        unless setting_id
+          @logger.warn { { message: 'unknown h2 setting; skipping', setting: key } }
+          next
+        end
+
+        clamped = clamp_h2_setting(key, value)
+        payload << [setting_id, clamped]
+      end
+      payload
+    end
+
+    def clamp_h2_setting(key, value)
+      case key
+      when :max_frame_size
+        if value < H2_MIN_FRAME_SIZE
+          @logger.warn do
+            { message: 'h2 max_frame_size below spec minimum; clamping',
+              configured: value, clamped_to: H2_MIN_FRAME_SIZE }
+          end
+          H2_MIN_FRAME_SIZE
+        elsif value > H2_MAX_FRAME_SIZE
+          @logger.warn do
+            { message: 'h2 max_frame_size above spec maximum; clamping',
+              configured: value, clamped_to: H2_MAX_FRAME_SIZE }
+          end
+          H2_MAX_FRAME_SIZE
+        else
+          value
+        end
+      when :initial_window_size
+        if value > H2_MAX_WINDOW_SIZE
+          @logger.warn do
+            { message: 'h2 initial_window_size above spec maximum; clamping',
+              configured: value, clamped_to: H2_MAX_WINDOW_SIZE }
+          end
+          H2_MAX_WINDOW_SIZE
+        else
+          value
+        end
+      else
+        value
+      end
+    end
+
     def build_server(framer)
       server = ::Protocol::HTTP2::Server.new(framer)
       server.define_singleton_method(:accept_stream) do |stream_id, &block|
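The clamp-and-warn logic above can be condensed with `Integer#clamp`. A standalone sketch under the RFC 7540 §6.5.2 bounds quoted in the diff (the `clamp_frame_size` name is an illustrative stand-in, not the gem's API):

```ruby
# RFC 7540 §6.5.2 bounds for SETTINGS_MAX_FRAME_SIZE, as in the diff above.
H2_MIN_FRAME_SIZE = 0x4000   # 16_384
H2_MAX_FRAME_SIZE = 0xFFFFFF # 16_777_215

# Fold both the below-floor and above-ceiling branches into one call.
# A real implementation would also emit the structured warn the diff shows.
def clamp_frame_size(value)
  value.clamp(H2_MIN_FRAME_SIZE, H2_MAX_FRAME_SIZE)
end

clamp_frame_size(1_024)       # => 16384    (below floor, clamped up)
clamp_frame_size(65_536)      # => 65536    (in range, untouched)
clamp_frame_size(0x1_000_000) # => 16777215 (above ceiling, clamped down)
```

The gem's version keeps the explicit if/elsif chain because each branch logs a different message; the arithmetic is the same.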
@@ -47,6 +47,20 @@ module Hyperion
       end
     end
 
+    # Pulls the four configurable HTTP/2 SETTINGS values out of the Config
+    # and returns them as a Hash. Nils are stripped so an operator who
+    # explicitly sets one to `nil` (meaning "leave the protocol-http2
+    # default in place") doesn't accidentally send a SETTINGS entry with a
+    # nil value. Empty hash → no overrides for Http2Handler to push.
+    def self.build_h2_settings(config)
+      {
+        max_concurrent_streams: config.h2_max_concurrent_streams,
+        initial_window_size: config.h2_initial_window_size,
+        max_frame_size: config.h2_max_frame_size,
+        max_header_list_size: config.h2_max_header_list_size
+      }.compact
+    end
+
     def initialize(host:, port:, app:, workers: DEFAULT_WORKER_COUNT,
                    read_timeout: Server::DEFAULT_READ_TIMEOUT_SECONDS, tls: nil,
                    thread_count: Server::DEFAULT_THREAD_COUNT, config: nil)
@@ -84,6 +98,12 @@
       }
     end
 
+    # Pre-allocate Rack env-pool entries and eager-touch lazy constants
+    # BEFORE we fork. Children inherit the warm memory via copy-on-write
+    # so the first batch of requests on each fresh worker doesn't pay
+    # the allocation/autoload tax.
+    Hyperion.warmup!
+
     # `before_fork` runs ONCE in the master before any worker is forked.
     # Operators use it to close shared resources (DB pools, Redis sockets)
     # so each child gets fresh connections rather than inheriting the
@@ -143,7 +163,11 @@
       host: @host, port: @port, app: @app,
       read_timeout: @read_timeout, tls: @tls,
       thread_count: @thread_count, config: @config,
-      worker_index: worker_index
+      worker_index: worker_index,
+      max_pending: @config.max_pending,
+      max_request_read_seconds: @config.max_request_read_seconds,
+      h2_settings: Master.build_h2_settings(@config),
+      async_io: @config.async_io
     }
     # Hand the inherited socket to the worker in :share mode. In
     # :reuseport mode the worker binds its own with SO_REUSEPORT.
@@ -0,0 +1,96 @@
+# frozen_string_literal: true
+
+module Hyperion
+  # Renders Hyperion.stats as Prometheus text exposition format (v0.0.4).
+  # Mounted by AdminMiddleware on GET /-/metrics; the returned content-type
+  # is `text/plain; version=0.0.4; charset=utf-8`.
+  #
+  # Mapping rules:
+  # - keys listed in KNOWN_METRICS get their canonical name + curated HELP/TYPE
+  # - keys matching `responses_<3-digit>` are grouped under a single
+  #   `hyperion_responses_status_total` family with a `status` label
+  # - any other key is auto-exported as `hyperion_<key>` with a generic HELP
+  #   line, so newly-added counters surface in Prometheus without code changes
+  #   here (the curated-name path is just nicer presentation, not gating)
+  #
+  # Output ordering is deterministic for stable scrape diffs:
+  # - known metrics in KNOWN_METRICS declaration order
+  # - status codes ascending
+  # - other keys alphabetically
+  module PrometheusExporter
+    module_function
+
+    KNOWN_METRICS = {
+      requests: { name: 'hyperion_requests_total',
+                  help: 'Total HTTP requests handled',
+                  type: 'counter' },
+      bytes_read: { name: 'hyperion_bytes_read_total',
+                    help: 'Total bytes read from request sockets',
+                    type: 'counter' },
+      bytes_written: { name: 'hyperion_bytes_written_total',
+                       help: 'Total bytes written to response sockets',
+                       type: 'counter' },
+      rejected_connections: { name: 'hyperion_rejected_connections_total',
+                              help: 'Connections rejected due to backpressure (max_pending)',
+                              type: 'counter' },
+      sendfile_responses: { name: 'hyperion_sendfile_responses_total',
+                            help: 'Responses sent via plain-TCP sendfile(2) zero-copy path',
+                            type: 'counter' },
+      tls_zerobuf_responses: { name: 'hyperion_tls_zerobuf_responses_total',
+                               help: 'Responses sent via TLS IO.copy_stream (avoids userspace String build, but TLS encryption forces copy)',
+                               type: 'counter' }
+    }.freeze
+
+    STATUS_KEY_PATTERN = /\Aresponses_(\d{3})\z/
+
+    STATUS_FAMILY_NAME = 'hyperion_responses_status_total'
+    STATUS_FAMILY_HELP = 'Responses by HTTP status code'
+
+    def render(stats)
+      buf = +''
+      grouped_status = {}
+      other = {}
+      known = {}
+
+      stats.each do |key, value|
+        if (match = key.to_s.match(STATUS_KEY_PATTERN))
+          grouped_status[match[1]] = value
+        elsif KNOWN_METRICS.key?(key)
+          known[key] = value
+        else
+          other[key] = value
+        end
+      end
+
+      # Known metrics first, in declaration order — gives the scrape a stable,
+      # human-friendly preamble regardless of hash insertion order.
+      KNOWN_METRICS.each do |key, meta|
+        next unless known.key?(key)
+
+        append_metric(buf, meta[:name], meta[:help], meta[:type], known[key])
+      end
+
+      unless grouped_status.empty?
+        buf << "# HELP #{STATUS_FAMILY_NAME} #{STATUS_FAMILY_HELP}\n"
+        buf << "# TYPE #{STATUS_FAMILY_NAME} counter\n"
+        grouped_status.sort.each do |status, value|
+          buf << %(#{STATUS_FAMILY_NAME}{status="#{status}"} #{value}\n)
+        end
+      end
+
+      other.sort_by { |k, _| k.to_s }.each do |key, value|
+        name = "hyperion_#{key}"
+        append_metric(buf, name, 'Hyperion internal counter (auto-exported)', 'counter', value)
+      end
+
+      buf
+    end
+
+    def append_metric(buf, name, help, type, value)
+      buf << "# HELP #{name} #{help}\n"
+      buf << "# TYPE #{name} #{type}\n"
+      buf << "#{name} #{value}\n"
+    end
+    private_class_method :append_metric
+  end
+end
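The HELP/TYPE/sample shape the new exporter emits can be shown in miniature. A toy sketch of the same v0.0.4 text format (the `prom_counter`/`prom_status_family` helpers are illustrative, not the gem's API):

```ruby
# One un-labelled counter: HELP line, TYPE line, then the sample line.
def prom_counter(name, help, value)
  "# HELP #{name} #{help}\n# TYPE #{name} counter\n#{name} #{value}\n"
end

# A labelled family, like the status grouping above: one HELP/TYPE pair
# shared by all samples, statuses sorted ascending for stable scrape diffs.
def prom_status_family(name, help, by_status)
  out = +"# HELP #{name} #{help}\n# TYPE #{name} counter\n"
  by_status.sort.each { |status, v| out << %(#{name}{status="#{status}"} #{v}\n) }
  out
end

puts prom_counter('hyperion_requests_total', 'Total HTTP requests handled', 42)
puts prom_status_family('hyperion_responses_status_total',
                        'Responses by HTTP status code',
                        '404' => 3, '200' => 120)
```

Grouping all statuses under one family with a `status` label (rather than one metric name per code) is what lets PromQL aggregate across codes with a single selector.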
@@ -36,6 +36,21 @@ module Hyperion
     CRLF_HEADER_VALUE = /[\r\n]/
 
     def write(io, status, headers, body, keep_alive: false)
+      # Zero-copy fast path: bodies that point at an on-disk file (Rack::Files,
+      # asset servers, signed-download responders) get streamed via
+      # IO.copy_stream, which delegates to sendfile(2) on Linux for plain TCP
+      # sockets — bytes go from the file's page cache straight to the socket
+      # buffer with no userspace allocation. For TLS sockets we still avoid the
+      # multi-MB String build, but encryption forces a userspace round-trip so
+      # we count that path separately.
+      return write_sendfile(io, status, headers, body, keep_alive: keep_alive) if body.respond_to?(:to_path)
+
+      write_buffered(io, status, headers, body, keep_alive: keep_alive)
+    end
+
+    private
+
+    def write_buffered(io, status, headers, body, keep_alive:)
       # Phase 1 buffers the full body so Content-Length is exact.
       # Phase 2 introduces chunked transfer-encoding for streaming bodies;
       # Phase 5 batches via IO::Buffer to avoid this intermediate String.
@@ -43,7 +58,7 @@
       body.each { |chunk| buffered << chunk }
 
       reason = REASONS[status] || 'Unknown'
-      date_str = Time.now.httpdate
+      date_str = cached_date
 
       head = build_head(status, reason, headers, buffered.bytesize, keep_alive, date_str)
 
@@ -67,7 +82,52 @@
       body.close if body.respond_to?(:close)
     end
 
-    private
+    def write_sendfile(io, status, headers, body, keep_alive:)
+      path = body.to_path
+      file = File.open(path, 'rb')
+      file_size = file.size
+
+      # If the app explicitly set content-length, respect it; otherwise use the
+      # real file size. Rack::Files does not pre-set content-length, so the
+      # common case is the File#size branch.
+      content_length = explicit_content_length(headers) || file_size
+
+      reason = REASONS[status] || 'Unknown'
+      date_str = cached_date
+      head = build_head(status, reason, headers, content_length, keep_alive, date_str)
+
+      io.write(head)
+      # IO.copy_stream copies up to file_size bytes from the file to the socket.
+      # On Linux + plain TCPSocket this triggers sendfile(2) — kernel-level
+      # zero-copy. On TLS sockets and non-Linux platforms it falls back to
+      # internal read+write loops, but we still avoid building a String the
+      # size of the file in Ruby.
+      copied = IO.copy_stream(file, io, file_size)
+
+      record_zero_copy_metric(io)
+      Hyperion.metrics.increment(:bytes_written, head.bytesize + copied)
+    ensure
+      file&.close
+      body.close if body.respond_to?(:close)
+    end
+
+    def explicit_content_length(headers)
+      headers.each do |k, v|
+        return v.to_i if k.to_s.casecmp('content-length').zero?
+      end
+      nil
+    end
+
+    # Plain TCPSocket → real sendfile(2). TLS-wrapped sockets cannot use
+    # sendfile (the kernel can't encrypt) but still avoid the per-response
+    # String allocation, so we track them under a separate counter.
+    def record_zero_copy_metric(io)
+      if defined?(::OpenSSL::SSL::SSLSocket) && io.is_a?(::OpenSSL::SSL::SSLSocket)
+        Hyperion.metrics.increment(:tls_zerobuf_responses)
+      else
+        Hyperion.metrics.increment(:sendfile_responses)
+      end
+    end
 
     # rc17: prefer the C extension when available — eliminates the per-response
     # status-line interpolation, normalized hash, and per-header String#<<
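The `IO.copy_stream` behaviour the sendfile path relies on can be exercised without a server. A self-contained sketch (StringIO stands in for the socket, so this demonstrates the API contract, not the actual sendfile(2) fast path):

```ruby
require 'tempfile'
require 'stringio'

# IO.copy_stream from a file to a writable IO. With a real TCPSocket
# destination on Linux, Ruby delegates to sendfile(2); with StringIO it
# falls back to internal read/write loops. Either way our code never
# materializes a file-sized Ruby String.
file = Tempfile.new('body')
file.write('x' * 1024)
file.rewind

dest = StringIO.new
copied = IO.copy_stream(file, dest, file.size) # third arg caps the byte count
copied # => 1024

file.close!
```

The return value is the number of bytes actually copied, which is what the diff adds to the `:bytes_written` counter alongside the head size.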
@@ -20,18 +20,41 @@ module Hyperion
     DEFAULT_READ_TIMEOUT_SECONDS = 30
     DEFAULT_THREAD_COUNT = 5
 
+    # Pre-built minimal 503 response for the backpressure path. We bypass
+    # ResponseWriter / Rack entirely — no env build, no app dispatch, no
+    # access-log line. The bytes are frozen and reused across every
+    # rejection so the overload path stays allocation-free. Body is JSON
+    # so JSON-only API consumers don't have to special-case the format.
+    REJECT_503 = lambda {
+      body = +%({"error":"server_busy","retry_after_seconds":1}\n)
+      body.force_encoding(Encoding::ASCII_8BIT)
+      head = +"HTTP/1.1 503 Service Unavailable\r\n" \
+             "content-type: application/json\r\n" \
+             "content-length: #{body.bytesize}\r\n" \
+             "retry-after: 1\r\n" \
+             "connection: close\r\n" \
+             "\r\n"
+      head.force_encoding(Encoding::ASCII_8BIT)
+      (head + body).freeze
+    }.call
+
     attr_reader :host, :port
 
     def initialize(app:, host: '127.0.0.1', port: 9292, read_timeout: DEFAULT_READ_TIMEOUT_SECONDS,
-                   tls: nil, thread_count: DEFAULT_THREAD_COUNT)
-      @host = host
-      @port = port
-      @app = app
-      @read_timeout = read_timeout
-      @tls = tls
-      @thread_count = thread_count
-      @thread_pool = nil
-      @stopped = false
+                   tls: nil, thread_count: DEFAULT_THREAD_COUNT, max_pending: nil,
+                   max_request_read_seconds: 60, h2_settings: nil, async_io: false)
+      @host = host
+      @port = port
+      @app = app
+      @read_timeout = read_timeout
+      @tls = tls
+      @thread_count = thread_count
+      @max_pending = max_pending
+      @max_request_read_seconds = max_request_read_seconds
+      @h2_settings = h2_settings
+      @async_io = async_io
+      @thread_pool = nil
+      @stopped = false
     end
 
     def listen
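The pre-built 503 pattern above (assemble once at load time, freeze, reuse) can be reproduced in isolation. A sketch with illustrative constant names, not the gem's own:

```ruby
# Body first, so its bytesize can be interpolated into content-length.
# .b returns a binary (ASCII-8BIT) copy, matching the force_encoding
# calls in the diff: HTTP bytes on the wire are encoding-agnostic.
REJECT_BODY = %({"error":"server_busy","retry_after_seconds":1}\n).b

REJECT_RESPONSE = (
  "HTTP/1.1 503 Service Unavailable\r\n" \
  "content-type: application/json\r\n" \
  "content-length: #{REJECT_BODY.bytesize}\r\n" \
  "retry-after: 1\r\n" \
  "connection: close\r\n" \
  "\r\n".b + REJECT_BODY
).freeze

# Every rejection is then a single socket.write(REJECT_RESPONSE):
# no per-request String allocation on the overload path.
```

Freezing also makes the constant safe to share across worker threads without synchronization, since no caller can mutate it.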
@@ -83,18 +106,25 @@
 
     def start
       listen unless @server
-      @thread_pool = ThreadPool.new(size: @thread_count) if @thread_count.positive?
+      @thread_pool = ThreadPool.new(size: @thread_count, max_pending: @max_pending) if @thread_count.positive?
 
-      if @tls
+      if @tls || @async_io
         # TLS path: ALPN may pick `h2`, and h2 spawns one fiber per stream
         # inside Http2Handler. Keep the Async wrapper so the scheduler is
         # available for those fibers and for handshake yields.
+        #
+        # async_io: true is the operator opt-in for plain HTTP/1.1. The Async
+        # wrap is required when callers want fiber-cooperative I/O — e.g.
+        # `hyperion-async-pg` yielding while a Postgres query is in flight.
+        # Pays ~5% throughput vs the raw-loop fast path; in exchange one
+        # OS thread can serve N concurrent in-flight DB queries instead of 1.
         start_async_loop
       else
-        # Plain HTTP/1.1: the worker thread owns each connection for its
-        # lifetime, so the Async wrapper adds zero value (no fibers ever
-        # run on this loop's task). Skip it — pure IO.select + accept_nonblock
-        # shaves measurable overhead off the accept hot path.
+        # Plain HTTP/1.1, async_io: false (default): the worker thread owns
+        # each connection for its lifetime, so the Async wrapper adds zero
+        # value (no fibers ever run on this loop's task). Skip it — pure
+        # IO.select + accept_nonblock shaves measurable overhead off the
+        # accept hot path.
         start_raw_loop
       end
     ensure
@@ -121,9 +151,12 @@
 
         apply_timeout(socket)
         if @thread_pool
-          @thread_pool.submit_connection(socket, @app)
+          unless @thread_pool.submit_connection(socket, @app,
+                                                max_request_read_seconds: @max_request_read_seconds)
+            reject_connection(socket)
+          end
         else
-          Connection.new.serve(socket, @app)
+          Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
         end
       end
     end
@@ -148,15 +181,47 @@
         # HTTP/2: each stream runs on a fiber inside Http2Handler. The
         # handler still uses the pool's `#call` for app.call hops on each
         # stream (one per stream, not one per connection).
-        Http2Handler.new(app: @app, thread_pool: @thread_pool).serve(socket)
+        Http2Handler.new(app: @app, thread_pool: @thread_pool, h2_settings: @h2_settings).serve(socket)
+      elsif @async_io
+        # async_io plain HTTP/1.1: serve inline on the calling fiber so the
+        # request runs *under* Async::Scheduler. This is what makes
+        # hyperion-async-pg (and other Async-aware libraries) actually
+        # cooperate — each fiber yields the OS thread on socket waits, so
+        # one thread can serve N concurrent in-flight DB queries. The
+        # thread pool is intentionally bypassed here: handing the socket
+        # to a worker thread strips the scheduler context.
+        Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
       elsif @thread_pool
         # HTTP/1.1 (e.g. TLS-wrapped after ALPN picked http/1.1): hand the
         # connection to a worker thread. The fiber that called dispatch
-        # returns immediately.
-        @thread_pool.submit_connection(socket, @app)
+        # returns immediately. On overflow, reject with 503 + close.
+        unless @thread_pool.submit_connection(socket, @app,
+                                              max_request_read_seconds: @max_request_read_seconds)
+          reject_connection(socket)
+        end
       else
         # No pool (thread_count: 0): inline on the calling fiber.
-        Connection.new.serve(socket, @app)
+        Connection.new.serve(socket, @app, max_request_read_seconds: @max_request_read_seconds)
+      end
+    end
+
+    # Backpressure rejection. Emits a pre-built 503 + closes the socket.
+    # No Rack env, no app dispatch, no access-log line — the overload
+    # path must stay cheap so we don't pile rejection cost on top of the
+    # already-saturated workers. Bumps :rejected_connections so operators
+    # can alert on sustained overload.
+    def reject_connection(socket)
+      socket.write(REJECT_503)
+      Hyperion.metrics.increment(:rejected_connections)
+    rescue StandardError
+      # Client may have hung up between accept and our 503 write — that's
+      # the failure mode we're protecting them from anyway, so swallow.
+      nil
+    ensure
+      begin
+        socket.close
+      rescue StandardError
+        nil
       end
     end
 
@@ -26,11 +26,12 @@ module Hyperion
   class ThreadPool
     SHUTDOWN = :__hyperion_thread_pool_shutdown__
 
-    attr_reader :size
+    attr_reader :size, :max_pending
 
-    def initialize(size:)
-      @size = size
-      @inbox = Queue.new # multiplexes both kinds of jobs
+    def initialize(size:, max_pending: nil)
+      @size = size
+      @max_pending = max_pending
+      @inbox = Queue.new # multiplexes both kinds of jobs
       # Pre-allocate one reply queue per in-flight slot for the legacy `#call`
       # path. Bounded by `size`: if all workers are busy, all reply queues are
       # checked out, and the next caller blocks on `@reply_pool.pop` until a
@@ -43,8 +44,23 @@
     # HTTP/1.1 path: hand the whole socket to a worker thread. The worker
     # runs `Connection#serve(socket, app)` directly. No per-request hop.
     # Returns immediately — caller does not wait.
-    def submit_connection(socket, app)
-      @inbox << [:connection, socket, app]
+    #
+    # Returns true on enqueue, false on rejection. When `max_pending` is set
+    # and the inbox already has at least that many entries, the connection
+    # is rejected and the decision is handed back to the caller (Server
+    # emits a 503 and closes the socket). Without `max_pending` (default
+    # nil) the queue is unbounded and we always return true — preserves
+    # pre-1.2 behaviour.
+    #
+    # The check is inherently racy with worker drain — workers may pop
+    # between our `size` read and the `<<`. Backpressure is statistical,
+    # not strict. An off-by-one over the configured cap during a thundering
+    # accept burst is acceptable; the cost of stricter sync would be a
+    # mutex on every enqueue, which we won't pay on the hot path.
+    def submit_connection(socket, app, max_request_read_seconds: 60)
+      return false if @max_pending && @inbox.size >= @max_pending
+
+      @inbox << [:connection, socket, app, max_request_read_seconds]
+      true
     end
 
     # HTTP/2 + sub-call path: hop one `app.call` from the calling fiber to a
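The "statistical, not strict" cap described above reduces to one racy `Queue#size` read before the push. A self-contained sketch of the same shape (the `BoundedInbox` class is illustrative, not the gem's API):

```ruby
# Bounded enqueue over Ruby's thread-safe Queue. The size check and the
# push are NOT atomic, so concurrent producers can overshoot the cap by
# a few entries — the same accepted trade-off as in the diff above.
class BoundedInbox
  def initialize(max_pending)
    @max_pending = max_pending
    @inbox = Queue.new
  end

  # true on enqueue, false when the (racy) cap is already met and the
  # caller should shed load instead.
  def offer(job)
    return false if @max_pending && @inbox.size >= @max_pending

    @inbox << job
    true
  end

  def size
    @inbox.size
  end
end

inbox = BoundedInbox.new(2)
inbox.offer(:a) # => true
inbox.offer(:b) # => true
inbox.offer(:c) # => false (cap reached; in the server this becomes a 503)
```

Ruby's core `Queue` also supports a hard bound via `SizedQueue`, but that blocks the producer on overflow, which is exactly what an accept loop must never do.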
@@ -78,12 +94,12 @@
 
       case job[0]
       when :connection
-        _, socket, app = job
+        _, socket, app, max_request_read_seconds = job
         # Worker thread owns the connection for its full lifetime. Pass
         # thread_pool: nil so Connection#call_app inlines Adapter::Rack.call
         # — the worker IS the pool, no further hop required.
         begin
-          Hyperion::Connection.new.serve(socket, app)
+          Hyperion::Connection.new.serve(socket, app, max_request_read_seconds: max_request_read_seconds)
        rescue StandardError => e
          Hyperion.logger.error do
            {
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Hyperion
-  VERSION = '1.1.0'
+  VERSION = '1.3.0'
 end
@@ -18,16 +18,22 @@ module Hyperion
   class Worker
     def initialize(host:, port:, app:, read_timeout:, tls: nil,
                    thread_count: Server::DEFAULT_THREAD_COUNT,
-                   config: nil, worker_index: 0, listener: nil)
-      @host = host
-      @port = port
-      @app = app
-      @read_timeout = read_timeout
-      @tls = tls
-      @thread_count = thread_count
-      @config = config || Hyperion::Config.new
-      @worker_index = worker_index
-      @listener = listener
+                   config: nil, worker_index: 0, listener: nil,
+                   max_pending: nil, max_request_read_seconds: 60,
+                   h2_settings: nil, async_io: false)
+      @host = host
+      @port = port
+      @app = app
+      @read_timeout = read_timeout
+      @tls = tls
+      @thread_count = thread_count
+      @config = config || Hyperion::Config.new
+      @worker_index = worker_index
+      @listener = listener
+      @max_pending = max_pending
+      @max_request_read_seconds = max_request_read_seconds
+      @h2_settings = h2_settings
+      @async_io = async_io
     end
 
     def run
@@ -43,7 +49,11 @@
 
       server = Server.new(host: @host, port: @port, app: @app,
                          read_timeout: @read_timeout, tls: @tls,
-                         thread_count: @thread_count)
+                         thread_count: @thread_count,
+                         max_pending: @max_pending,
+                         max_request_read_seconds: @max_request_read_seconds,
+                         h2_settings: @h2_settings,
+                         async_io: @async_io)
       tcp_server = @listener || build_reuseport_listener
       server.adopt_listener(tcp_server)
 
data/lib/hyperion.rb CHANGED
@@ -63,6 +63,44 @@ module Hyperion
       else true # default ON
       end
     end
+
+    # Pre-fork warmup. Run by Master and CLI single-mode BEFORE children are
+    # forked (or before the lone worker starts accepting). Pre-allocates the
+    # Rack adapter's object pools and eager-touches lazily-resolved constants
+    # so each forked child inherits warm memory via copy-on-write — the first
+    # N requests on a fresh worker no longer pay the allocation / autoload
+    # tax that would otherwise serialize behind the GVL on cold start.
+    #
+    # Idempotent — second and later calls are no-ops. Failures are swallowed
+    # with a warn log: warmup is an optimization, not a correctness gate.
+    # If, for instance, OpenSSL can't be required in some odd environment,
+    # we'd rather start cold than refuse to boot.
+    def warmup!
+      return if @warmed
+
+      @warmed = true
+
+      if defined?(::Hyperion::Adapter::Rack) && ::Hyperion::Adapter::Rack.respond_to?(:warmup_pool)
+        ::Hyperion::Adapter::Rack.warmup_pool(8)
+      end
+
+      # Touch the C extension's response-head builder so its lazily-initialized
+      # internal state runs in the master, not in every child after fork.
+      ::Hyperion::CParser.respond_to?(:build_response_head) if defined?(::Hyperion::CParser)
+
+      # Eager-load TLS / SSLSocket. The sendfile path's `is_a?` check would
+      # otherwise trigger autoload in the worker on the first TLS response.
+      require 'openssl'
+      defined?(::OpenSSL::SSL::SSLSocket) && ::OpenSSL::SSL::SSLSocket.name
+
+      # Force the `time` stdlib's date-formatting machinery to load by
+      # emitting one httpdate. Subsequent calls hit the per-thread
+      # `cached_date` slot in response_writer.
+      Time.now.httpdate
+      nil
+    rescue StandardError => e
+      Hyperion.logger.warn { { message: 'warmup failed (non-fatal)', error: e.message } }
+      nil
+    end
   end
 end
 
@@ -89,6 +127,7 @@ require_relative 'hyperion/request'
 require_relative 'hyperion/parser'
 require_relative 'hyperion/c_parser'
 require_relative 'hyperion/adapter/rack'
+require_relative 'hyperion/prometheus_exporter'
 require_relative 'hyperion/admin_middleware'
 require_relative 'hyperion/response_writer'
 require_relative 'hyperion/thread_pool'
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: hyperion-rb
 version: !ruby/object:Gem::Version
-  version: 1.1.0
+  version: 1.3.0
 platform: ruby
 authors:
 - Andrey Lobanov
@@ -160,6 +160,7 @@ files:
 - lib/hyperion/metrics.rb
 - lib/hyperion/parser.rb
 - lib/hyperion/pool.rb
+- lib/hyperion/prometheus_exporter.rb
 - lib/hyperion/request.rb
 - lib/hyperion/response_writer.rb
 - lib/hyperion/server.rb