hyperion-rb 1.5.0 → 1.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ab7691ac6671b0e0c9606c281c55659c76675b71f3d461d1fb5bf6a03680861b
4
- data.tar.gz: b7ad35585d56e59d4a7b5c9fcb6d4e016e72b4c3f99496ba675ca7e871865718
3
+ metadata.gz: 388377a54507d370411ae4b229ff575e191742ba6e3dc044c9c8990552bff5ff
4
+ data.tar.gz: 8cc9cd083c9450948ba3a710cb5514f16bc31b8a421ebd405c4064129b0b031c
5
5
  SHA512:
6
- metadata.gz: 8911a91c7932b332a9d5f069099c7f6ded94d9b5978dffd259881ab482066d5328c508ca5983101c6d9d04b18c1353664766bf759ec66cb034d3bcdf84f01a89
7
- data.tar.gz: 7c948c98eb9aea2cb31595e08deca0c4e98c2281105a18fc0419678da25a04ac9b04f9defe564ea660104cf935399582506eb87769f6f9fbbca74f568c8f904b
6
+ metadata.gz: 389098362215d01ce8fa08add90d29871390e0c9c5e38d384caa50ee2605210005c252ecbaea191f379d59580e2c4d4573b94ef8c9308259e1959abee81e4397
7
+ data.tar.gz: f3d2664e553a2b3c24f8518ed9b65e73a49bbefd0dee03f0c602d7867fdd37ed0a8bf7030abcee68b9020c54ee80fe3cc039c93e4e004fbd372cea02313592bd
data/CHANGELOG.md CHANGED
@@ -1,5 +1,58 @@
1
1
  # Changelog
2
2
 
3
+ ## [1.6.1] - 2026-04-27
4
+
5
+ Audit follow-up from the [BENCH_2026_04_27.md](docs/BENCH_2026_04_27.md) sweep. No request-path code changes; documentation surface and operator-UX polish (the boot-time advisory warn below is the only code addition).
6
+
7
+ ### Added
8
+ - **`## Operator guidance` README section** — concrete "when do I pick which config?" tables. Translates the bench numbers into decisions: `-w 1 + larger pool` vs `-w N + smaller pool` for I/O-bound (multi-worker is 2.6× memory for 0.77× rps if you pick wrong on PG-wait); the `--async-io` decision tree (default OFF unless you're paired with a fiber-cooperative library); how to read p50 vs p99 (tail wins are 5-200× larger than the rps story suggests — size capacity by p99).
9
+ - **Boot-time advisory warn for orphan `--async-io`** — if `async_io: true` is set but no fiber-cooperative library is loaded (`hyperion-async-pg`, `async-redis`, `async-http`), Hyperion logs a single advisory warn at boot pointing at the operator-guidance docs. The setting is still honoured; the warn just helps operators who flipped the flag expecting a free perf bump (bench showed `--async-io` on hello-world = 47% rps regression + 3.65 s p99 spike).
10
+ - **4 new specs in `spec/hyperion/cli_async_io_warn_spec.rb`** covering all four fire/no-fire decision cases (true + no library, false, nil, true + library detected via stub_const).
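
The advisory's decision logic is simple enough to sketch. This is an illustrative approximation, not the gem's source: the constant `FIBER_COOP_LIBS`, the method name, and the probed constant names are all invented here.

```ruby
# Hypothetical sketch of the orphan --async-io advisory. All names here
# are illustrative, not lifted from hyperion-rb.
FIBER_COOP_LIBS = %w[Hyperion::AsyncPG Async::Redis Async::HTTP].freeze

def warn_orphan_async_io(config)
  return unless config.async_io
  # Fire only when no fiber-cooperative I/O library is loaded; the
  # setting itself is still honoured either way.
  return if FIBER_COOP_LIBS.any? { |name| Object.const_defined?(name) }

  warn "hyperion: --async-io is set but no fiber-cooperative I/O library " \
       "is loaded; see the README's Operator guidance section."
end
```

The real implementation logs through Hyperion's logger at boot; `Kernel#warn` stands in here to keep the sketch self-contained.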
11
+
12
+ ## [1.6.0] - 2026-04-27
13
+
14
+ Two parallel improvements landing in 1.6.0:
15
+ 1. Three small C-extension additions on the request hot path (sibling commit — see "Performance" below).
16
+ 2. Architectural rewrite of the HTTP/2 outbound write path — per-stream send queue + dedicated writer fiber replace the global `@send_mutex` (see "HTTP/2 writer architecture" below).
17
+
18
+ These are independent and can be reviewed / reverted separately. The CHANGELOG sub-sections will be merged before the tag is cut.
19
+
20
+ ### HTTP/2 writer architecture (Changed)
21
+ - **`Hyperion::Http2Handler` now uses a per-connection writer fiber instead of a single send Mutex.** Pre-1.6.0 every framer write — HEADERS, DATA, RST_STREAM, GOAWAY — ran inside one `@send_mutex.synchronize { socket.write(...) }`. That capped per-connection h2 throughput at "one socket-write at a time" regardless of how many streams were concurrently in flight: a slow socket (kernel send buffer full, peer reading slowly) blocked every other stream's writes too. 1.6.0 splits the path:
22
+ - **Encode + frame format** (HPACK encoding, frame layout) is fast (microseconds, in-memory) and stays serialized on the calling fiber via `WriterContext#encode_mutex`. HPACK state is connection-scoped and stateful across HEADERS frames; per-stream wire order (HEADERS → DATA → END_STREAM) must also be preserved. Holding the encode mutex across a `stream.send_*` call satisfies both.
23
+ - **Bytes-to-socket** is owned by a dedicated `run_writer_loop` fiber spawned per connection. Encoder fibers hand bytes off via `WriterContext#enqueue` (non-blocking, signals an `Async::Notification`); the writer pops chunks from the queue and writes them. Only this fiber ever calls `socket.write`, satisfying SSLSocket's "no concurrent writes from different fibers" constraint.
24
+ - **Net effect**: a stream that has bytes ready can encode and enqueue while the writer is mid-flush of an earlier chunk — the slow-socket case no longer serializes encode work across streams. Mutex hold time drops from "until the kernel accepts the write" to "until the bytes are appended to the in-memory queue."
25
+ - **Per-connection backpressure cap** (`MAX_PER_CONN_PENDING_BYTES = 16 MiB`). Pathological clients that read very slowly could otherwise let the queue grow without bound. `WriterContext#enqueue` parks the encoder on `@drained_notify` once `@pending_bytes` exceeds the cap; the writer signals `@drained_notify` after each drain pass.
26
+ - **Coordinated shutdown**: when `Http2Handler#serve` exits (clean close, peer disconnect, or protocol error), the `ensure` block sets `WriterContext#shutdown!` and `writer_task.wait`s for the final drain BEFORE closing the socket. Order matters — closing the socket first would discard final RST_STREAM / GOAWAY / END_STREAM frames sitting in the queue.
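
The split described above (serialized in-memory encode, a single writer owning the socket, a byte-budget backpressure cap, drain-before-close) can be approximated in plain threads. This sketch invents its names and uses `Mutex`/`ConditionVariable` where the gem uses fibers and `Async::Notification`:

```ruby
# Thread-based approximation of the per-connection writer-loop pattern.
# Not the gem's source: a sketch of the queue / backpressure / shutdown
# mechanics only.
class WriterContext
  def initialize(max_pending_bytes: 16 * 1024 * 1024)
    @queue         = []
    @pending_bytes = 0
    @max           = max_pending_bytes
    @mutex         = Mutex.new
    @cond          = ConditionVariable.new
    @shutdown      = false
  end

  # Called by encoder threads. Mutex hold time is "append to queue",
  # never "wait for the kernel to accept a write".
  def enqueue(bytes)
    @mutex.synchronize do
      # Backpressure: park until the writer drains below the cap.
      @cond.wait(@mutex) while @pending_bytes > @max && !@shutdown
      @queue << bytes
      @pending_bytes += bytes.bytesize
      @cond.broadcast
    end
  end

  def shutdown!
    @mutex.synchronize do
      @shutdown = true
      @cond.broadcast
    end
  end

  # Only this thread ever touches the socket. Exits only once the queue
  # is drained after shutdown!, so final frames are never discarded.
  def run_writer_loop(io)
    loop do
      chunk = @mutex.synchronize do
        @cond.wait(@mutex) while @queue.empty? && !@shutdown
        return if @queue.empty? && @shutdown
        c = @queue.shift
        @pending_bytes -= c.bytesize
        @cond.broadcast  # wake any encoder parked on backpressure
        c
      end
      io.write(chunk)    # the slow part, done outside the lock
    end
  end
end
```

Note how the loop re-checks "empty and shut down" under the lock before parking, the same race-avoidance the Notes section below describes for `Async::Notification`.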
27
+
28
+ ### HTTP/2 writer architecture (Added)
29
+ - **`Hyperion::Http2Handler::SendQueueIO`** — IO-shaped wrapper passed to `Protocol::HTTP2::Framer` in place of the raw socket. `read` is a passthrough (single-reader on the connection fiber); `write` enqueues onto the connection-wide queue. Reports `closed?` from the underlying socket so framer EOF detection still works.
30
+ - **`Hyperion::Http2Handler::WriterContext`** — holds the per-connection queue, the encode mutex, the send/drained notifications, and the byte-budget counters. One instance per connection; lives for the lifetime of `Http2Handler#serve`.
31
+ - **9 new specs in `spec/hyperion/http2_writer_loop_spec.rb`**:
32
+ - `SendQueueIO#write` returns bytesize, enqueues without writing the socket, no-ops on empty/nil, reports the underlying socket's `closed?` state (4).
33
+ - Writer loop drains a single encoder's frames in enqueue order (1).
34
+ - Two encoder fibers pushing concurrently — bytes for both streams reach the wire and per-stream order (HEADERS → DATA → END) is preserved (1).
35
+ - Backpressure parks the encoder when `@pending_bytes` exceeds `max_pending_bytes`; encoder resumes after the writer drains (1).
36
+ - Shutdown drains all queued frames before the writer fiber exits; shutdown with an empty queue exits cleanly (2).
37
+ - **`bench/h2_streams.sh`** — `h2load`-driven recipe (`-c 1 -m 100 -n 5000`) for measuring per-connection multi-stream rps. Skips with a clear message if `h2load` isn't on PATH; emits a one-line JSON summary so cross-version diffs are easy.
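
The `SendQueueIO` wrapper described at the top of this list has roughly the following shape. A sketch reconstructed from the description, not the gem's source; the `writer_context` collaborator is assumed to expose `enqueue`:

```ruby
# IO-shaped facade handed to the framer in place of the raw socket:
# reads pass through, writes become enqueues. Illustrative only.
class SendQueueIO
  def initialize(socket, writer_context)
    @socket = socket
    @ctx    = writer_context
  end

  def read(*args)
    @socket.read(*args)   # single reader on the connection fiber
  end

  def closed?
    @socket.closed?       # framer EOF detection still sees the socket
  end

  def write(bytes)
    return 0 if bytes.nil? || bytes.empty?  # no-op on empty/nil
    @ctx.enqueue(bytes)                     # to the connection-wide queue
    bytes.bytesize                          # IO#write "bytes written" contract
  end
end
```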
38
+
39
+ ### HTTP/2 writer architecture (Migration)
40
+ - No public-API changes. Operators do not need to touch config or restart with new flags. The architectural change is internal to `Http2Handler`.
41
+
42
+ ### HTTP/2 writer architecture (Notes)
43
+ - HPACK's dynamic-table state is shared across all streams on a connection (per RFC 7541 §2.3.2). That is why we still serialize encode work — two fibers calling `stream.send_headers` concurrently would corrupt the encoder's table state. The mutex is now microseconds-of-CPU rather than "however long the socket takes to drain N MB."
44
+ - `Async::Notification#signal` is a no-op when there are no waiters (signals are not buffered). The writer loop accordingly re-checks `writer_done? && queue_empty?` before parking, so a `shutdown!` call that races a `wait_for_signal` doesn't deadlock.
45
+
46
+ ### Performance
47
+ - **`Hyperion::CParser.upcase_underscore(name)` — C-level Rack header-name normalizer.** Replaces the per-uncached-header `"HTTP_#{name.upcase.tr('-', '_')}"` allocation in `Adapter::Rack#build_env`. Single allocation (5 prefix bytes + N source bytes), single byte loop, no Ruby intermediates. Microbench (5 typical X-* names per call): 460k i/s Ruby → 2.21M i/s C, **4.80×** faster (2.17 μs → 452 ns/iter). On a header-heavy hello-world rackup with 8 X-Custom-* request headers + 9 response headers, headline throughput went from ~16.6k r/s to ~18.0k r/s wrk-driven (~+8.5%, averaged across 3 trials). The 16-name `HTTP_KEY_CACHE` still short-circuits the common headers; this only fires on uncached customs.
48
+ - **`Hyperion::CParser.chunked_body_complete?(buffer, body_start)` — chunked-transfer body completion check in C.** Replaces the pure-Ruby walker in `Connection#chunked_body_complete?` with a C-level loop that scans CRLF boundaries, decodes hex sizes, and advances the cursor without per-iteration `String#index` / `byteslice` / `split` allocations. Returns `[complete?, last_safe_offset]` so the caller can persist parse progress across read boundaries (handy for pipelined / streaming buffers, even though Connection currently only consults the boolean). Microbench (3 mixed buffers per iter): 283k i/s Ruby → 3.73M i/s C, **13.19×** faster (3.54 μs → 268 ns/iter). Profit is small in production because chunked uploads are rare, but the path now matches the rest of the parser in cost shape.
49
+ - **`Hyperion::CParser.build_access_line_colored(...)` — TTY-coloured access-log builder in C.** Mirrors `build_access_line` with the green ANSI escape pair `\e[32mINFO \e[0m` baked into the level label. Ten extra bytes per line, single allocation. The pre-1.6.0 `Logger#access` path fell back to the slower Ruby builder whenever `@colorize` was on (i.e. local TTY / dev runs); now the C builder fires there too. Microbench: 1.78M i/s Ruby → 2.90M i/s C, **1.63×** faster (561 ns → 345 ns per line). Smaller win than the others — the Ruby builder was already a single interpolation — but closes the parity gap so dev-loop `tail -f` doesn't pay an avoidable Ruby tax.
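
Two of the three C additions above have compact pure-Ruby references, which is what the fallback branches and the parity specs compare against. Sketched here under stated assumptions (the method names are invented; the walker mirrors the return contract described for the C version, and assumes a binary/ASCII buffer so char and byte offsets coincide):

```ruby
# 1. The exact Ruby expression upcase_underscore replaces.
def upcase_underscore_ruby(name)
  "HTTP_#{name.upcase.tr('-', '_')}"
end

# 2. Chunked-framing walker with the described [complete?, offset] contract:
#    [true, end_offset] when fully buffered, [false, last_safe] otherwise.
def chunked_body_complete?(buffer, body_start)
  cursor    = body_start
  last_safe = cursor
  loop do
    line_end = buffer.index("\r\n", cursor)
    return [false, last_safe] unless line_end

    # Size token: hex digits; ";ext" ignored; surrounding whitespace tolerated.
    tok = buffer[cursor...line_end].split(";", 2).first.to_s.strip
    return [false, last_safe] unless tok.match?(/\A\h+\z/)

    size   = tok.to_i(16)
    cursor = line_end + 2
    if size.zero?
      # Final chunk: walk trailer lines until the empty line.
      loop do
        nl = buffer.index("\r\n", cursor)
        return [false, last_safe] unless nl
        return [true, nl + 2] if nl == cursor
        cursor = nl + 2
      end
    end
    # Need the chunk data plus its trailing CRLF fully buffered.
    return [false, last_safe] if buffer.bytesize - cursor < size + 2
    cursor    += size + 2
    last_safe  = cursor
  end
end
```

For example, `chunked_body_complete?("5\r\nhello\r\n0\r\n\r\n".b, 0)` returns `[true, 15]`, while truncating the buffer after `hel` yields `[false, 0]`.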
50
+
51
+ ### Added
52
+ - **9 new specs in `spec/hyperion/c_upcase_underscore_spec.rb`** plus a fallback-parity assertion that flips the `@c_upcase_available` class-level ivar on `Hyperion::Adapter::Rack` to walk both the C and Ruby branches in one process. Covers lowercase / uppercase / multi-dash / empty / single-byte / non-ASCII byte-pass-through / digit-preservation / Ruby-equivalence on a panel of canonical custom names / encoding (US-ASCII).
53
+ - **13 new specs in `spec/hyperion/c_chunked_body_complete_spec.rb`** including a fallback-parity assertion against the original Ruby walker. Covers single chunk, multi-chunk, trailers, partial CRLF, partial size token, partial chunk data, chunk extensions, body_start offset, last-safe-cursor reporting on partial buffers, ArgumentError on out-of-range body_start, and a panel of mixed inputs that must agree byte-for-byte with the Ruby walker.
54
+ - **9 new specs in `spec/hyperion/c_access_line_colored_spec.rb`** plus a Logger#access integration test that constructs a TTY-faking IO and asserts the green INFO label appears in the emitted line. Covers text + json formats, query nil/empty/quote-trigger, remote_addr nil, ANSI absence in JSON, and byte-for-byte parity against a hand-rolled Ruby colored builder.
55
+
3
56
  ## [1.5.0] - 2026-04-27
4
57
 
5
58
  Audit-driven CLI + adapter polish. No breaking changes; pure additions to the operator surface and a hardening of the host-header parser.
data/README.md CHANGED
@@ -25,7 +25,9 @@ bundle exec hyperion config.ru
25
25
 
26
26
  ## Benchmarks
27
27
 
28
- All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version it was measured against — newer versions (1.3.0+ `--async-io`, 1.4.0+ TLS h1 inline, 1.4.1+ Metrics fiber-key fix) preserve or improve these numbers; we re-run the headline configs each release and have not seen regressions on these workloads.
28
+ All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version it was measured against — newer versions (1.3.0+ `--async-io`, 1.4.0+ TLS h1 inline, 1.4.1+ Metrics fiber-key fix, 1.6.0+ HTTP/2 writer fiber + 3 C-ext additions) preserve or improve these numbers; we re-run the headline configs each release and have not seen regressions on these workloads.
29
+
30
+ > **Comprehensive matrix for 1.6.0 + hyperion-async-pg 0.5.0 (16-vCPU Linux, 9 workloads × 25+ configs)**: see [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md). Headline: 98,818 r/s on hello `-w 16`, 21,215 r/s `-w 4` at p99 < 2 ms, 2,180 r/s on a 50 ms-waiting PG workload (4.1× the best Puma), 1,667 req/s HTTP/2 multiplexed at 0 errors, 155 MB RSS for 10k idle keep-alive connections.
29
31
 
30
32
  ### Hello-world Rack app
31
33
 
@@ -201,6 +203,8 @@ The architectural difference shows up under **load**, not at idle: Puma can only
201
203
 
202
204
  Hyperion fans 100 in-flight streams across separate fibers within a single TCP connection. A serial server would take 5 s; the fiber-multiplexed result (1.04 s, ~96 req/s on one socket) is bounded by single-handler sleep time plus framing overhead. Puma has no native HTTP/2 path — production deployments terminate h2 at nginx and forward h1 to the worker pool, which serializes again.
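
The arithmetic behind those numbers, assuming the 50 ms per-stream handler sleep implied by "a serial server would take 5 s" for 100 streams:

```ruby
streams        = 100
handler_sleep  = 0.05                       # assumed 50 ms per handler
serial_seconds = streams * handler_sleep    # 5.0 s if streams ran back-to-back
measured       = 1.04                       # fiber-multiplexed wall time
per_socket_rps = (streams / measured).round # about 96 req/s on one socket
```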
203
205
 
206
+ > **1.6.0 outbound write path** — `Http2Handler` no longer serializes every framer write through one `Mutex#synchronize { socket.write(...) }`. HPACK encoding (microseconds, in-memory) still serializes on a fast encode mutex, but the actual `socket.write` is owned by a dedicated per-connection writer fiber draining a queue. On per-connection multi-stream workloads where the kernel send buffer or peer reads are slow, encode work for ready streams overlaps the writer's flush of earlier chunks, instead of stacking up behind it. See `bench/h2_streams.sh` (`h2load -c 1 -m 100 -n 5000`) for a recipe to compare 1.5.0 vs 1.6.0 on a workload of your choice.
207
+
204
208
  ### Reproduce
205
209
 
206
210
  ```sh
@@ -318,6 +322,62 @@ Strict DSL: unknown methods raise `NoMethodError` at boot — typos surface imme
318
322
 
319
323
  A documented sample lives at [`config/hyperion.example.rb`](config/hyperion.example.rb).
320
324
 
325
+ ## Operator guidance
326
+
327
+ Concrete tradeoffs distilled from [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md). If the bench numbers cited below feel surprising, check that doc for the full matrix + caveats.
328
+
329
+ ### When to use `-w N`
330
+
331
+ | Workload shape | Recommended | Why |
332
+ |---|---|---|
333
+ | **Pure I/O-bound** (PG / Redis / external HTTP, no significant CPU) | `-w 1` + larger pool | Bench: `-w 1 pool=200` = 87 MB / 2,180 r/s vs `-w 4 pool=64` = 224 MB / 1,680 r/s. **2.6× more memory, 0.77× rps** if you pick multi-worker on a wait-bound workload. |
334
+ | **Pure CPU-bound** (heavy JSON / template render / image processing) | `-w N` matching CPU count | Each worker's accept loop is single-threaded under `--async-io`; multi-worker gives CPU-parallelism. Bench: `-w 16 -t 5` hits 98,818 r/s on a 16-vCPU box, 4.7× a `-w 1` ceiling on the same hardware. |
335
+ | **Mixed** (Rails-shaped: ~5 ms CPU + 50 ms PG wait per request) | `-w N/2` (half cores) + medium pool | Lets CPU work parallelise while keeping per-worker memory tractable. Bench `pg_mixed.ru` at `-w 4 -t 5 pool=128` = 1,740 r/s with no cold-start spike (ForkSafe `prefill_in_child: true`). |
336
+
337
+ Multi-worker on PG-wait workloads is the **wrong** default for most apps — the headline rps doesn't justify the memory and PG-connection cost. Verify your shape with the bench before scaling out.
338
+
339
+ ### When to use `--async-io`
340
+
341
+ ```
342
+ Are you using a fiber-cooperative I/O library?
343
+ (hyperion-async-pg, async-redis, async-http)
344
+
345
+ ┌─────────────┴─────────────┐
346
+ yes no
347
+ │ │
348
+ Pair with a fiber-aware Leave --async-io OFF.
349
+ connection pool Default thread-pool dispatch
350
+ (FiberPool, async-pool — is faster for synchronous
351
+ NOT connection_pool gem, Rails apps. Bench: --async-io
352
+ which uses non-fiber Mutex). on hello-world = 47% rps
353
+ │ regression + p99 spike to
354
+ Set --async-io. 3.65 s under no-yield workloads.
355
+ Pool size is the real No reason to flip the flag.
356
+ concurrency knob; -t is
357
+ decorative for wait-bound.
358
+ ```
359
+
360
+ Hyperion warns at boot if you set `--async-io` without any fiber-cooperative library loaded. The setting is still honoured; the warn just nudges operators who flipped it expecting a free perf bump.
361
+
362
+ ### Tuning `-t` and pool sizes
363
+
364
+ - **Without `--async-io`** (sync server, default): `-t` is the concurrency knob. Each in-flight request holds an OS thread; pool size should match `-t`. Bench shows Puma-style behaviour — at 200 wrk conns hitting a 5-thread server, queue depth dominates p99 (Hyperion `-t 5 -w 1` p50 = 0.95 ms vs Puma's same shape at 59.5 ms — Hyperion's queueing is cheaper but the model still serializes at `-t`).
365
+ - **With `--async-io` + a fiber-aware pool**: pool size is the concurrency knob. `-t` is decorative for wait-bound workloads; one accept-loop fiber serves all in-flight queries via the pool. Linear scaling: pool=64 → ~780 r/s, pool=128 → ~1,344 r/s, pool=200 → ~2,180 r/s on 50 ms PG queries.
366
+ - **Pool over WAN**: if `PG.connect` round-trip is >50 ms, expect pool fill at startup to take `pool_size / parallel_fill_threads × RTT`. `hyperion-async-pg 0.5.1+` auto-scales `parallel_fill_threads` so pool=200 fills in ~1-2 s.
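
The fill-time formula in the last bullet, as arithmetic. The thread count of 8 is a hypothetical value for illustration, not a documented default:

```ruby
# pool_size / parallel_fill_threads * RTT, from the bullet above.
def pool_fill_seconds(pool_size, parallel_fill_threads, rtt_seconds)
  (pool_size.to_f / parallel_fill_threads) * rtt_seconds
end

pool_fill_seconds(200, 8, 0.05)  # 1.25 s: pool=200, 8 fill threads, 50 ms RTT
```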
367
+
368
+ ### How to read p50 vs p99
369
+
370
+ Tail latency tells the queueing story; rps tells the throughput story. Hyperion's tail wins are **always** bigger than its rps wins — sometimes the rps numbers look close to a competitor while p99 is 5-200× lower:
371
+
372
+ | Workload | Hyperion rps / p99 | Closest competitor | rps ratio | p99 ratio |
373
+ |---|---|---|---:|---:|
374
+ | Hello `-w 4` | 21,215 r/s / 1.87 ms | Falcon 24,061 / 9.78 ms | 0.88× | **5.2× lower** |
375
+ | CPU JSON `-w 4` | 15,582 r/s / 2.47 ms | Falcon 18,643 / 13.51 ms | 0.84× | **5.5× lower** |
376
+ | Static 1 MiB | 1,919 r/s / 4.22 ms | Puma 2,074 / 55 ms | 0.93× | **13× lower** |
377
+ | PG-wait `-w 1` pool=200 | 2,180 r/s / 668 ms | Puma 530 r/s + 200 timeouts | **4.1×** | qualitative crush |
378
+
379
+ **Size capacity by p99, not by mean.** Throughput peaks are easy to fake under controlled bench conditions; tail latency reflects what your slowest user actually experiences when the load balancer fans them onto a busy worker.
380
+
321
381
  ## Logging
322
382
 
323
383
  Default behaviour (rc16+):
@@ -543,6 +543,322 @@ static VALUE cbuild_access_line(VALUE self,
543
543
  }
544
544
  #undef CAT_LIT
545
545
 
546
+ /* Hyperion::CParser.build_access_line_colored(format, ts, method, path, query,
547
+ * status, duration_ms, remote_addr,
548
+ * http_version) -> String
549
+ *
550
+ * TTY-coloured variant of build_access_line. The text path wraps the level
551
+ * label with ANSI escape "\e[32mINFO \e[0m" so a developer running Hyperion
552
+ * in a terminal sees a green INFO tag. The :json branch is identical to the
553
+ * non-coloured builder — JSON access lines are machine-readable and never
554
+ * carry ANSI escapes.
555
+ *
556
+ * Lifted from cbuild_access_line above; the only divergence is the level
557
+ * label injection in the text branch. We deliberately duplicate the text
558
+ * format rather than templating, because the text body is short and a
559
+ * single function with a colour flag would compile to the same code with an
560
+ * extra branch in the hot loop.
561
+ */
562
+ static VALUE cbuild_access_line_colored(VALUE self,
563
+ VALUE format_sym, VALUE rb_ts,
564
+ VALUE rb_method, VALUE rb_path,
565
+ VALUE rb_query, VALUE rb_status,
566
+ VALUE rb_duration, VALUE rb_remote,
567
+ VALUE rb_http_version) {
568
+ (void)self;
569
+ Check_Type(rb_ts, T_STRING);
570
+ Check_Type(rb_method, T_STRING);
571
+ Check_Type(rb_path, T_STRING);
572
+ Check_Type(rb_http_version, T_STRING);
573
+
574
+ int is_json = (TYPE(format_sym) == T_SYMBOL) &&
575
+ (SYM2ID(format_sym) == rb_intern("json"));
576
+
577
+ int status = NUM2INT(rb_status);
578
+ double dur_ms = NUM2DBL(rb_duration);
579
+
580
+ int has_query = !NIL_P(rb_query) && RSTRING_LEN(rb_query) > 0;
581
+ int has_remote = !NIL_P(rb_remote) && RSTRING_LEN(rb_remote) > 0;
582
+
583
+ #define CAT_LIT(b, s) rb_str_cat((b), (s), (long)(sizeof(s) - 1))
584
+
585
+ VALUE buf = rb_str_buf_new(512);
586
+
587
+ if (is_json) {
588
+ /* JSON output is identical to the non-coloured path — ANSI escapes
589
+ * have no place in a structured log record. */
590
+ CAT_LIT(buf, "{\"ts\":\"");
591
+ rb_str_cat(buf, RSTRING_PTR(rb_ts), RSTRING_LEN(rb_ts));
592
+ CAT_LIT(buf, "\",\"level\":\"info\",\"source\":\"hyperion\",\"message\":\"request\",");
593
+ CAT_LIT(buf, "\"method\":\"");
594
+ rb_str_cat(buf, RSTRING_PTR(rb_method), RSTRING_LEN(rb_method));
595
+ CAT_LIT(buf, "\",\"path\":\"");
596
+ rb_str_cat(buf, RSTRING_PTR(rb_path), RSTRING_LEN(rb_path));
597
+ CAT_LIT(buf, "\"");
598
+
599
+ if (has_query) {
600
+ CAT_LIT(buf, ",\"query\":\"");
601
+ rb_str_cat(buf, RSTRING_PTR(rb_query), RSTRING_LEN(rb_query));
602
+ CAT_LIT(buf, "\"");
603
+ }
604
+
605
+ char num[64];
606
+ int n = snprintf(num, sizeof(num), ",\"status\":%d,\"duration_ms\":%g,",
607
+ status, dur_ms);
608
+ rb_str_cat(buf, num, n);
609
+
610
+ if (has_remote) {
611
+ CAT_LIT(buf, "\"remote_addr\":\"");
612
+ rb_str_cat(buf, RSTRING_PTR(rb_remote), RSTRING_LEN(rb_remote));
613
+ CAT_LIT(buf, "\",");
614
+ } else {
615
+ CAT_LIT(buf, "\"remote_addr\":null,");
616
+ }
617
+
618
+ CAT_LIT(buf, "\"http_version\":\"");
619
+ rb_str_cat(buf, RSTRING_PTR(rb_http_version), RSTRING_LEN(rb_http_version));
620
+ CAT_LIT(buf, "\"}\n");
621
+ } else {
622
+ /* text: "<ts> \e[32mINFO \e[0m [hyperion] message=request method=..." */
623
+ rb_str_cat(buf, RSTRING_PTR(rb_ts), RSTRING_LEN(rb_ts));
624
+ CAT_LIT(buf, " \x1b[32mINFO \x1b[0m [hyperion] message=request method=");
625
+ rb_str_cat(buf, RSTRING_PTR(rb_method), RSTRING_LEN(rb_method));
626
+ CAT_LIT(buf, " path=");
627
+ rb_str_cat(buf, RSTRING_PTR(rb_path), RSTRING_LEN(rb_path));
628
+
629
+ if (has_query) {
630
+ const char *q_ptr = RSTRING_PTR(rb_query);
631
+ long q_len = RSTRING_LEN(rb_query);
632
+ int need_quote = 0;
633
+ for (long j = 0; j < q_len; j++) {
634
+ char c = q_ptr[j];
635
+ if (c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
636
+ c == '"' || c == '=') {
637
+ need_quote = 1;
638
+ break;
639
+ }
640
+ }
641
+ if (need_quote) {
642
+ VALUE quoted = rb_funcall(rb_query, rb_intern("inspect"), 0);
643
+ CAT_LIT(buf, " query=");
644
+ rb_str_cat(buf, RSTRING_PTR(quoted), RSTRING_LEN(quoted));
645
+ } else {
646
+ CAT_LIT(buf, " query=");
647
+ rb_str_cat(buf, q_ptr, q_len);
648
+ }
649
+ }
650
+
651
+ char num[80];
652
+ int n = snprintf(num, sizeof(num), " status=%d duration_ms=%g remote_addr=",
653
+ status, dur_ms);
654
+ rb_str_cat(buf, num, n);
655
+
656
+ if (has_remote) {
657
+ rb_str_cat(buf, RSTRING_PTR(rb_remote), RSTRING_LEN(rb_remote));
658
+ } else {
659
+ CAT_LIT(buf, "nil");
660
+ }
661
+
662
+ CAT_LIT(buf, " http_version=");
663
+ rb_str_cat(buf, RSTRING_PTR(rb_http_version), RSTRING_LEN(rb_http_version));
664
+ CAT_LIT(buf, "\n");
665
+ }
666
+
667
+ return buf;
668
+ }
669
+ #undef CAT_LIT
670
+
671
+ /* Hyperion::CParser.upcase_underscore(name) -> "HTTP_<UPCASED_UNDERSCORED>"
672
+ *
673
+ * Single-allocation replacement for `"HTTP_#{name.upcase.tr('-', '_')}"`.
674
+ * Hot path on the Rack adapter: every uncached request header (any
675
+ * `X-*` custom header) hits this on every request, and the Ruby version
676
+ * spawns three String allocations (the upcase result, the tr result, and the
677
+ * "HTTP_..." interpolation) plus a per-byte loop in tr.
678
+ *
679
+ * We allocate one Ruby String of length 5 + name.bytesize, fill it in a
680
+ * single byte loop, return it. ASCII lowercase letters are upcased by
681
+ * subtracting 0x20 (clearing bit 5, 'a'..'z' -> 'A'..'Z'); '-' becomes '_'; everything else
682
+ * passes through (header names are ASCII per RFC 9110, but multi-byte UTF-8
683
+ * bytes pass through bytewise unmolested rather than crashing).
684
+ *
685
+ * Encoding is set to US-ASCII because Ruby's String#upcase on an ASCII-only
686
+ * input returns a US-ASCII string, and the env-key lookup downstream is
687
+ * encoding-agnostic anyway.
688
+ */
689
+ static VALUE cupcase_underscore(VALUE self, VALUE rb_name) {
690
+ (void)self;
691
+ Check_Type(rb_name, T_STRING);
692
+
693
+ const char *src = RSTRING_PTR(rb_name);
694
+ long src_len = RSTRING_LEN(rb_name);
695
+
696
+ /* Single allocation: 5 prefix bytes + N source bytes. */
697
+ VALUE out = rb_str_new(NULL, 5 + src_len);
698
+ char *dst = RSTRING_PTR(out);
699
+
700
+ dst[0] = 'H';
701
+ dst[1] = 'T';
702
+ dst[2] = 'T';
703
+ dst[3] = 'P';
704
+ dst[4] = '_';
705
+
706
+ for (long i = 0; i < src_len; i++) {
707
+ unsigned char c = (unsigned char)src[i];
708
+ if (c >= 'a' && c <= 'z') {
709
+ dst[5 + i] = (char)(c - 32);
710
+ } else if (c == '-') {
711
+ dst[5 + i] = '_';
712
+ } else {
713
+ dst[5 + i] = (char)c;
714
+ }
715
+ }
716
+
717
+ rb_enc_associate(out, rb_usascii_encoding());
718
+ /* Keep rb_name live across the loop above. RSTRING_PTR returns an
719
+ * interior pointer that becomes invalid if the GC moves the source
720
+ * String — unlikely on this tight path, but cheap insurance. */
721
+ RB_GC_GUARD(rb_name);
722
+ return out;
723
+ }
724
+
725
+ /* Hyperion::CParser.chunked_body_complete?(buffer, body_start)
726
+ * -> [complete?, end_offset]
727
+ *
728
+ * Walks chunked-transfer framing in `buffer` starting at byte offset
729
+ * `body_start`. Returns a 2-element array:
730
+ * [true, end_offset] — chunked body fully buffered; end_offset is the
731
+ * byte just after the trailer CRLF (where pipelined
732
+ * bytes from a follow-on request would begin).
733
+ * [false, last_safe] — body is not yet complete; last_safe is the
734
+ * furthest cursor we successfully advanced to,
735
+ * useful as a hint for incremental parsing.
736
+ *
737
+ * Mirrors Connection#chunked_body_complete? in pure Ruby — see lib/hyperion/
738
+ * connection.rb. Trailing whitespace after the size token (e.g. "5 ; ext\r\n")
739
+ * is permitted as a permissive parse to match the upstream Ruby `.strip`.
740
+ */
741
+ static VALUE cchunked_body_complete(VALUE self, VALUE rb_buffer, VALUE rb_body_start) {
742
+ (void)self;
743
+ Check_Type(rb_buffer, T_STRING);
744
+
745
+ const char *data = RSTRING_PTR(rb_buffer);
746
+ long len = RSTRING_LEN(rb_buffer);
747
+ long cursor = NUM2LONG(rb_body_start);
748
+
749
+ if (cursor < 0 || cursor > len) {
750
+ rb_raise(rb_eArgError, "body_start out of range");
751
+ }
752
+
753
+ long last_safe = cursor;
754
+ VALUE result = rb_ary_new_capa(2);
755
+
756
+ while (1) {
757
+ /* Find the next CRLF starting at cursor. */
758
+ long line_end = -1;
759
+ for (long i = cursor; i + 1 < len; i++) {
760
+ if (data[i] == '\r' && data[i + 1] == '\n') {
761
+ line_end = i;
762
+ break;
763
+ }
764
+ }
765
+ if (line_end < 0) {
766
+ rb_ary_push(result, Qfalse);
767
+ rb_ary_push(result, LONG2NUM(last_safe));
768
+ RB_GC_GUARD(rb_buffer);
769
+ return result;
770
+ }
771
+
772
+ /* Parse the size token: hex digits up to ';' or whitespace, optional
773
+ * chunk extension after ';' which we ignore wholesale. */
774
+ long tok_start = cursor;
775
+ long tok_end = line_end;
776
+ for (long i = cursor; i < line_end; i++) {
777
+ if (data[i] == ';') { tok_end = i; break; }
778
+ }
779
+ /* Trim leading/trailing ASCII whitespace from the token. */
780
+ while (tok_start < tok_end &&
781
+ (data[tok_start] == ' ' || data[tok_start] == '\t')) {
782
+ tok_start++;
783
+ }
784
+ while (tok_end > tok_start &&
785
+ (data[tok_end - 1] == ' ' || data[tok_end - 1] == '\t')) {
786
+ tok_end--;
787
+ }
788
+ if (tok_end <= tok_start) {
789
+ /* Empty size token — incomplete frame. */
790
+ rb_ary_push(result, Qfalse);
791
+ rb_ary_push(result, LONG2NUM(last_safe));
792
+ RB_GC_GUARD(rb_buffer);
793
+ return result;
794
+ }
795
+
796
+ /* Validate + decode hex. */
797
+ unsigned long size = 0;
798
+ for (long i = tok_start; i < tok_end; i++) {
799
+ unsigned char c = (unsigned char)data[i];
800
+ unsigned int digit;
801
+ if (c >= '0' && c <= '9') {
802
+ digit = c - '0';
803
+ } else if (c >= 'a' && c <= 'f') {
804
+ digit = 10 + (c - 'a');
805
+ } else if (c >= 'A' && c <= 'F') {
806
+ digit = 10 + (c - 'A');
807
+ } else {
808
+ /* Non-hex byte: incomplete/malformed. Match the Ruby
809
+ * regex `/\A\h+\z/` semantics — return false, advance no
810
+ * further. The caller will read more bytes and retry. */
811
+ rb_ary_push(result, Qfalse);
812
+ rb_ary_push(result, LONG2NUM(last_safe));
813
+ RB_GC_GUARD(rb_buffer);
814
+ return result;
815
+ }
816
+ size = (size << 4) | digit;
817
+ }
818
+
819
+ cursor = line_end + 2;
820
+
821
+ if (size == 0) {
822
+ /* Final chunk — walk trailer headers until we hit "\r\n\r\n"
823
+ * (i.e. an empty trailer line directly after the size line). */
824
+ while (1) {
825
+ long nl = -1;
826
+ for (long i = cursor; i + 1 < len; i++) {
827
+ if (data[i] == '\r' && data[i + 1] == '\n') {
828
+ nl = i;
829
+ break;
830
+ }
831
+ }
832
+ if (nl < 0) {
833
+ rb_ary_push(result, Qfalse);
834
+ rb_ary_push(result, LONG2NUM(last_safe));
835
+ RB_GC_GUARD(rb_buffer);
836
+ return result;
837
+ }
838
+ if (nl == cursor) {
839
+ /* Empty line — body complete. */
840
+ rb_ary_push(result, Qtrue);
841
+ rb_ary_push(result, LONG2NUM(nl + 2));
842
+ RB_GC_GUARD(rb_buffer);
843
+ return result;
844
+ }
845
+ cursor = nl + 2;
846
+ }
847
+ }
848
+
849
+ /* Need cursor + size + 2 bytes (chunk data + trailing CRLF). */
850
+ if ((unsigned long)(len - cursor) < size + 2) {
851
+ rb_ary_push(result, Qfalse);
852
+ rb_ary_push(result, LONG2NUM(last_safe));
853
+ RB_GC_GUARD(rb_buffer);
854
+ return result;
855
+ }
856
+
857
+ cursor += (long)size + 2;
858
+ last_safe = cursor;
859
+ }
860
+ }
861
+
546
862
  void Init_hyperion_http(void) {
547
863
  install_settings();
548
864
 
@@ -557,6 +873,12 @@ void Init_hyperion_http(void) {
557
873
  cbuild_response_head, 6);
558
874
  rb_define_singleton_method(rb_cCParser, "build_access_line",
559
875
  cbuild_access_line, 9);
876
+ rb_define_singleton_method(rb_cCParser, "build_access_line_colored",
877
+ cbuild_access_line_colored, 9);
878
+ rb_define_singleton_method(rb_cCParser, "upcase_underscore",
879
+ cupcase_underscore, 1);
880
+ rb_define_singleton_method(rb_cCParser, "chunked_body_complete?",
881
+ cchunked_body_complete, 2);
560
882
 
561
883
  id_new = rb_intern("new");
562
884
  id_downcase = rb_intern("downcase");
@@ -48,6 +48,17 @@ module Hyperion
48
48
  }
49
49
  )
50
50
 
51
+ # Whether Hyperion::CParser.upcase_underscore is available. Probed lazily
52
+ # at first use (CParser is required after this file, so an eager check
53
+ # at load time would always be false). Memoised in a class-level ivar to
54
+ # keep the per-request cost to one cached-boolean check.
55
+ def self.c_upcase_available?
56
+ return @c_upcase_available unless @c_upcase_available.nil?
57
+
58
+ @c_upcase_available = defined?(::Hyperion::CParser) &&
59
+ ::Hyperion::CParser.respond_to?(:upcase_underscore)
60
+ end
61
+
51
62
  class << self
52
63
  # Pre-allocate `n` env-hash and rack-input objects in master before
53
64
  # fork. Children inherit the populated free-list via copy-on-write —
@@ -122,8 +133,14 @@ module Hyperion
122
133
  env['rack.run_once'] = false
123
134
  env['SCRIPT_NAME'] = ''
124
135
 
136
+ # Header-name → Rack env-key conversion. Cache covers the 16 most
137
+ # common names; uncached headers (X-* customs, vendor-specific) flow
138
+ # through CParser.upcase_underscore (single C-level allocation) when
139
+ # the extension is built, else the pure-Ruby triple-allocation path.
140
+ c_upcase = Rack.c_upcase_available?
125
141
  request.headers.each do |name, value|
126
- key = HTTP_KEY_CACHE[name] || "HTTP_#{name.upcase.tr('-', '_')}"
142
+ key = HTTP_KEY_CACHE[name] ||
143
+ (c_upcase ? ::Hyperion::CParser.upcase_underscore(name) : "HTTP_#{name.upcase.tr('-', '_')}")
127
144
  env[key] = value
128
145
  end
129
146
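The fallback conversion the comment references is small enough to verify standalone. A sketch (the helper name `env_key` is illustrative, not the gem's API; `HTTP_KEY_CACHE` is elided):

```ruby
# Pure-Ruby header-name → Rack env-key conversion — the path taken for
# uncached headers when the C extension isn't built. Each call allocates
# an upcased copy, a tr'd copy, and the final interpolated string; this
# is the triple allocation CParser.upcase_underscore collapses to one.
def env_key(name)
  "HTTP_#{name.upcase.tr('-', '_')}"
end

env_key('X-Request-Id')    # => "HTTP_X_REQUEST_ID"
env_key('Accept-Language') # => "HTTP_ACCEPT_LANGUAGE"
```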
 
data/lib/hyperion/cli.rb CHANGED
@@ -26,6 +26,14 @@ module Hyperion
26
26
  Hyperion.logger = Hyperion::Logger.new(level: config.log_level, format: config.log_format)
27
27
  end
28
28
 
29
+ # Advisory: operators frequently flip --async-io expecting "fast mode"
30
+ # without installing a fiber-cooperative I/O library. That costs ~5-47% rps
31
+ # (worst on no-I/O shapes like hello-world). The flag only pays off when
32
+ # paired with `hyperion-async-pg` / `async-redis` / `async-http`. We log
33
+ # once at boot pointing at the operator-guidance docs; the operator's
34
+ # setting is still honoured.
35
+ warn_orphan_async_io(config)
36
+
29
37
  # Propagate log_requests so every Connection picks it up via
30
38
  # `Hyperion.log_requests?` without needing to thread it through
31
39
  # Server/ThreadPool/Master plumbing. Default is ON; nil means "don't
@@ -261,6 +269,35 @@ WARNING: argv is visible via `ps`; prefer --admin-token-file PATH for production
261
269
  end
262
270
  private_class_method :maybe_enable_yjit
263
271
 
272
+ # Probe table for fiber-cooperative I/O libraries. If `async_io: true` is
273
+ # set but none of these are loaded, the operator has likely flipped the
274
+ # flag without reading the bench numbers — `--async-io` adds Async-loop
275
+ # overhead and only pays off when paired with a library whose I/O calls
276
+ # yield to the scheduler. Hello-world bench (BENCH_2026_04_27.md) showed
277
+ # a 47% rps regression + 3.65 s p99 spike on this shape.
278
+ ASYNC_IO_PROBE_LIBS = {
279
+ 'hyperion-async-pg' => -> { defined?(::Hyperion::AsyncPg) },
280
+ 'async-redis' => -> { defined?(::Async::Redis) },
281
+ 'async-http' => -> { defined?(::Async::HTTP) }
282
+ }.freeze
283
+
284
+ def self.warn_orphan_async_io(config)
285
+ return unless config.async_io == true # nil and false are both no-ops here
286
+
287
+ detected = ASYNC_IO_PROBE_LIBS.select { |_name, probe| probe.call }.keys
288
+ return unless detected.empty?
289
+
290
+ Hyperion.logger.warn do
291
+ {
292
+ message: 'async_io enabled but no fiber-cooperative I/O library detected',
293
+ libraries_checked: ASYNC_IO_PROBE_LIBS.keys,
294
+ impact: 'async_io adds Async-loop overhead (~5-47% rps depending on workload) and only pays off when paired with a library that yields to the Async scheduler on socket waits.',
295
+ docs: 'https://github.com/andrew-woblavobla/hyperion#operator-guidance'
296
+ }
297
+ end
298
+ end
299
+ private_class_method :warn_orphan_async_io
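The probe table's use of lambdas matters: `defined?` is evaluated when the probe is *called* (at warn time, after the app's requires have run), not when the constant table is built. A standalone illustration with trimmed, hypothetical entries (the real table also probes `hyperion-async-pg`):

```ruby
# Each probe is a lambda so `defined?` runs at call time. `defined?`
# never raises on missing constants — unloaded libraries simply yield
# nil, which is falsy, so select drops them.
probes = {
  'async-redis' => -> { defined?(::Async::Redis) },
  'async-http'  => -> { defined?(::Async::HTTP) }
}.freeze

detected = probes.select { |_name, probe| probe.call }.keys
detected # => [] in a process where neither gem has been required
```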
300
+
264
301
  # When admin_token is configured, wrap the app in AdminMiddleware so
265
302
  # POST /-/quit and GET /-/metrics become token-protected admin endpoints.
266
303
  # Skipped when the token is unset — those paths fall through to the app,
@@ -287,9 +287,29 @@ module Hyperion
287
287
 
288
288
  # Walks chunked framing in `buffer` starting at `body_start` and
289
289
  # returns true once the final 0-sized chunk (and trailer terminator)
290
- # is fully buffered. Mirrors the parser's dechunk walk; Phase 4's C
291
- # parser folds these together via incremental parsing.
290
+ # is fully buffered. The C extension folds the size-line scan + hex
291
+ # decode + chunk advance into a single tight loop with no per-iteration
292
+ # Ruby allocation; the pure-Ruby fallback below preserves the original
293
+ # semantics for environments where the C extension didn't build.
292
294
  def chunked_body_complete?(buffer, body_start)
295
+ if self.class.c_chunked_available?
296
+ ::Hyperion::CParser.chunked_body_complete?(buffer, body_start).first
297
+ else
298
+ chunked_body_complete_ruby?(buffer, body_start)
299
+ end
300
+ end
301
+
302
+ # Whether Hyperion::CParser.chunked_body_complete? is available. Probed
303
+ # lazily at first use; memoised in a class-level ivar to keep the
304
+ # per-request hot path branchless.
305
+ def self.c_chunked_available?
306
+ return @c_chunked_available unless @c_chunked_available.nil?
307
+
308
+ @c_chunked_available = !!(defined?(::Hyperion::CParser) &&
309
+ ::Hyperion::CParser.respond_to?(:chunked_body_complete?))
310
+ end
311
+
312
+ def chunked_body_complete_ruby?(buffer, body_start)
293
313
  cursor = body_start
294
314
  loop do
295
315
  line_end = buffer.index("\r\n", cursor)
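The walk both implementations perform can be sketched end to end in plain Ruby (standalone; `chunked_complete?` here is illustrative rather than the gem's method, assumes a binary/ASCII buffer as read off a socket, and treats the first CRLF after the 0-size chunk as the trailer terminator):

```ruby
# Chunked-framing completeness check: walk size-line / data / CRLF
# triples; complete once the 0-size chunk plus one trailing CRLF (the
# trailer terminator) are buffered. Sizes are hex per RFC 7230 §4.1;
# String#to_i(16) tolerates chunk extensions ("5;name=val") by
# stopping at the first non-hex byte.
def chunked_complete?(buffer, body_start = 0)
  cursor = body_start
  loop do
    line_end = buffer.index("\r\n", cursor)
    return false unless line_end # size line not fully buffered yet

    size = buffer[cursor...line_end].to_i(16)
    cursor = line_end + 2
    # Final chunk: done once the trailer-terminator CRLF is present.
    return !buffer.index("\r\n", cursor).nil? if size.zero?

    # Need `size` data bytes plus the chunk's own trailing CRLF.
    return false if buffer.bytesize - cursor < size + 2

    cursor += size + 2
  end
end

chunked_complete?("5\r\nhello\r\n0\r\n\r\n") # => true
chunked_complete?("5\r\nhel")                # => false (data truncated)
```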
@@ -18,11 +18,40 @@ module Hyperion
18
18
  # dispatch — slow handlers no longer block other streams on the same
19
19
  # connection.
20
20
  #
21
- # All framer writes (HEADERS, DATA, RST_STREAM) are serialized through a
22
- # single connection-scoped Mutex (`@send_mutex`). The OpenSSL::SSL::SSLSocket
23
- # underneath is not safe to drive from two fibers concurrently, and
24
- # protocol-http2's HPACK encoder is also stateful across HEADERS frames,
25
- # so all sends must be serialized.
21
+ # ## Outbound write architecture (1.6.0+)
22
+ #
23
+ # Pre-1.6.0 every framer write (HEADERS / DATA / RST_STREAM / GOAWAY) ran
24
+ # under one connection-scoped `Mutex#synchronize { socket.write(...) }`.
25
+ # That capped per-connection h2 throughput to "one socket-write at a time"
26
+ # regardless of stream count: a slow socket (kernel send buffer full,
27
+ # remote peer reading slowly) blocked every other stream's writes too.
28
+ #
29
+ # 1.6.0 splits the path:
30
+ # * The HPACK encode + frame format step is fast (microseconds, in-memory)
31
+ # and remains serialized on the calling fiber via `@encode_mutex`. HPACK
32
+ # state is stateful across HEADERS frames per connection, and frames for
33
+ # a single stream must be wire-ordered (HEADERS → DATA → END_STREAM).
34
+ # Holding the encode mutex across a `send_*` call accomplishes both.
35
+ # * The framer writes through a `SendQueueIO` wrapper (wraps the real
36
+ # socket). `SendQueueIO#write(bytes)` enqueues onto a connection-wide
37
+ # `@send_queue` and signals `@send_notify`; it never touches the real
38
+ # socket.
39
+ # * A dedicated **writer fiber** owns the real socket. It pops byte chunks
40
+ # off the queue, writes them, and parks on `@send_notify` when empty.
41
+ # Only this fiber ever calls `socket.write` — the SSLSocket cross-fiber
42
+ # unsafety constraint is satisfied.
43
+ #
44
+ # Net effect: the slow-socket case no longer serializes encode work across
45
+ # streams. A stream that has bytes ready to encode can encode and enqueue
46
+ # while the writer is mid-flush of an earlier chunk. The mutex hold time
47
+ # drops from "until the kernel accepts the write" to "until the bytes are
48
+ # appended to the in-memory queue."
49
+ #
50
+ # Backpressure: pathological clients (slow-read h2) could otherwise let the
51
+ # queue grow without bound. We track `@pending_bytes`; once it exceeds
52
+ # `MAX_PER_CONN_PENDING_BYTES`, encoding fibers wait on `@drained_notify`
53
+ # before enqueueing more. The writer signals `@drained_notify` after each
54
+ # drain pass.
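The encode-then-enqueue split described above is the classic single-writer pattern. A thread-based analogue, with `Thread` and `Thread::Queue` standing in for the writer fiber and `Async::Notification` machinery so the sketch runs without the async gem:

```ruby
# Single-writer sketch: producers enqueue byte chunks; exactly one
# writer drains them onto the shared sink, so the sink never sees
# concurrent writes. Queue#close makes pop return nil once the queue
# is drained, which doubles as the coordinated-shutdown signal.
queue = Thread::Queue.new
sink  = +'' # stands in for the real socket

writer = Thread.new do
  while (chunk = queue.pop) # blocks until a chunk arrives or close
    sink << chunk           # only this thread touches the sink
  end
end

3.times { |i| queue << "frame-#{i};" } # "encoder" side: cheap enqueue
queue.close                            # flag shutdown; writer drains
writer.join                            # wait for the final drain

sink # => "frame-0;frame-1;frame-2;"
```

The producer's cost is an in-memory append, mirroring the "mutex hold time drops to queue-append time" claim above; only the writer ever blocks on the slow sink.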
26
55
  #
27
56
  # Flow control: `RequestStream#window_updated` overrides the protocol-http2
28
57
  # default to fan a notification out to any fiber blocked in `send_body`
@@ -31,6 +60,153 @@ module Hyperion
31
60
  # size and yields on the notification when the window is exhausted, so
32
61
  # large bodies never trip a FlowControlError.
33
62
  class Http2Handler
63
+ # Cap on bytes that may sit in a connection's send queue waiting for the
64
+ # writer fiber to drain. Slow-read h2 clients can otherwise let an
65
+ # encoder fiber pile arbitrary bytes into RAM. 16 MiB matches the upper
66
+ # bound a well-behaved peer will buffer — anything beyond that is the
67
+ # writer being starved, and the right answer is to backpressure the
68
+ # encoder rather than allocate more.
69
+ MAX_PER_CONN_PENDING_BYTES = 16 * 1024 * 1024
70
+
71
+ # IO-shaped wrapper passed to `Protocol::HTTP2::Framer` in place of the
72
+ # real socket. Reads are direct passthroughs (the read loop runs on the
73
+ # connection fiber and there's only one reader). Writes are enqueued
74
+ # onto the connection-wide `WriterContext#queue`; the writer fiber owns
75
+ # the real socket and drains the queue.
76
+ #
77
+ # We deliberately do NOT delegate `flush` to the real socket: writes
78
+ # don't reach it from this object — the writer fiber does that. `flush`
79
+ # here is a no-op (the writer flushes after each batch).
80
+ #
81
+ # `closed?` reports the real socket's state so protocol-http2's read
82
+ # loop sees EOF the same way it always has.
83
+ class SendQueueIO
84
+ attr_reader :real_socket
85
+
86
+ def initialize(real_socket, writer_ctx)
87
+ @real_socket = real_socket
88
+ @writer_ctx = writer_ctx
89
+ end
90
+
91
+ # Framer's read path — direct delegation. Single-reader (the conn
92
+ # fiber), so no contention here.
93
+ def read(*args)
94
+ @real_socket.read(*args)
95
+ end
96
+
97
+ # Framer's write path — non-blocking handoff into the send queue.
98
+ # Backpressure is applied here: if pending bytes exceed the cap, the
99
+ # calling fiber parks on the drained notification until the writer
100
+ # has flushed enough to bring us below the threshold.
101
+ def write(bytes)
102
+ return 0 if bytes.nil? || bytes.empty?
103
+
104
+ @writer_ctx.enqueue(bytes)
105
+ bytes.bytesize
106
+ end
107
+
108
+ def flush
109
+ # No-op: bytes don't live in this object, they live in the queue.
110
+ # The writer fiber flushes the real socket as it drains.
111
+ nil
112
+ end
113
+
114
+ def close
115
+ @real_socket.close unless @real_socket.closed?
116
+ end
117
+
118
+ # Multi-line on purpose: a single-line `def closed?; @real_socket.closed?; end`
119
+ # gets autocorrected to `delegate :closed?, to: :@real_socket` by Rails-aware
120
+ # ruby-lsp formatters, which is wrong here (this is a plain gem, no
121
+ # ActiveSupport on the dependency graph).
122
+ def closed?
123
+ socket = @real_socket
124
+ socket.closed?
125
+ end
126
+ end
127
+
128
+ # Holds the per-connection outbound coordination state (queue,
129
+ # notifications, byte counters, shutdown flag) plus the encode mutex
130
+ # that protects HPACK state and per-stream frame ordering.
131
+ #
132
+ # Single instance per connection, lives for the lifetime of `serve`.
133
+ class WriterContext
134
+ attr_reader :encode_mutex
135
+
136
+ def initialize(max_pending_bytes: MAX_PER_CONN_PENDING_BYTES)
137
+ @queue = ::Thread::Queue.new
138
+ @send_notify = ::Async::Notification.new
139
+ @drained_notify = ::Async::Notification.new
140
+ @encode_mutex = ::Mutex.new
141
+ @pending_bytes = 0
142
+ @pending_bytes_lock = ::Mutex.new
143
+ @max_pending_bytes = max_pending_bytes
144
+ @writer_done = false
145
+ end
146
+
147
+ # Called by SendQueueIO#write on the calling (encoder) fiber. Enforces
148
+ # the per-connection backpressure cap before enqueuing.
149
+ def enqueue(bytes)
150
+ wait_for_drain_if_full(bytes.bytesize)
151
+ @pending_bytes_lock.synchronize { @pending_bytes += bytes.bytesize }
152
+ @queue << bytes
153
+ @send_notify.signal
154
+ end
155
+
156
+ # Pops a single chunk; returns nil if the queue is empty (non-blocking).
157
+ def try_pop
158
+ @queue.pop(true)
159
+ rescue ::ThreadError
160
+ nil
161
+ end
162
+
163
+ # Called by the writer fiber after each successful drain to release
164
+ # any encoders blocked on the cap.
165
+ def note_drained(bytesize)
166
+ @pending_bytes_lock.synchronize do
167
+ @pending_bytes -= bytesize
168
+ @pending_bytes = 0 if @pending_bytes.negative? # paranoia
169
+ end
170
+ @drained_notify.signal
171
+ end
172
+
173
+ def wait_for_signal
174
+ @send_notify.wait
175
+ end
176
+
177
+ def shutdown!
178
+ @writer_done = true
179
+ # Wake the writer if it's parked, and any encoder waiting on drain.
180
+ @send_notify.signal
181
+ @drained_notify.signal
182
+ end
183
+
184
+ def writer_done?
185
+ @writer_done
186
+ end
187
+
188
+ def queue_empty?
189
+ @queue.empty?
190
+ end
191
+
192
+ def pending_bytes
193
+ @pending_bytes_lock.synchronize { @pending_bytes }
194
+ end
195
+
196
+ private
197
+
198
+ def wait_for_drain_if_full(incoming_bytes)
199
+ # If we're already at/above the cap, park until the writer has
200
+ # drained. We re-check after every signal because multiple encoders
201
+ # can wake on a single drain notification.
202
+ while !@writer_done &&
203
+ @pending_bytes_lock.synchronize { @pending_bytes + incoming_bytes > @max_pending_bytes } &&
204
+ !@queue.empty?
205
+ @drained_notify.wait
206
+ end
207
+ end
208
+ end
209
+
34
210
  # Per-stream subclass that captures decoded request pseudo-headers,
35
211
  # regular headers, and any DATA frame body bytes for later dispatch.
36
212
  # Also exposes a `window_available` notification fan-out so the
@@ -247,21 +423,29 @@ module Hyperion
247
423
  def serve(socket)
248
424
  @metrics.increment(:connections_accepted)
249
425
  @metrics.increment(:connections_active)
250
- framer = ::Protocol::HTTP2::Framer.new(socket)
251
- server = build_server(framer)
426
+
427
+ # Per-connection outbound coordination. Encoder fibers enqueue bytes;
428
+ # the writer fiber owns the real socket and drains. See class docstring.
429
+ writer_ctx = WriterContext.new
430
+ send_io = SendQueueIO.new(socket, writer_ctx)
431
+ framer = ::Protocol::HTTP2::Framer.new(send_io)
432
+ server = build_server(framer)
433
+
434
+ task = ::Async::Task.current
435
+
436
+ # Spawn the dedicated writer fiber BEFORE the preface exchange.
437
+ # `Server#read_connection_preface` writes the server's SETTINGS frame
438
+ # via the framer; if the writer isn't running, those bytes sit in the
439
+ # queue. Spawning first guarantees they flush as soon as the scheduler
440
+ # ticks, avoiding any pathological deadlock where a client implementation
441
+ # waits for our SETTINGS before sending more frames.
442
+ writer_task = task.async { run_writer_loop(socket, writer_ctx) }
443
+
252
444
  server.read_connection_preface(initial_settings_payload)
253
445
 
254
446
  # Extract once — the same TCP peer drives every stream on this conn.
255
447
  peer_addr = peer_address(socket)
256
448
 
257
- # All framer writes (HEADERS / DATA / RST_STREAM / GOAWAY) must be
258
- # serialized: the underlying SSLSocket is not safe across fibers, and
259
- # the HPACK encoder is also stateful. The connection's own frame loop
260
- # uses this mutex too — see `dispatch_stream` and `send_body`.
261
- send_mutex = ::Mutex.new
262
-
263
- task = ::Async::Task.current
264
-
265
449
  # Track in-flight per-stream dispatch fibers so we can drain them on
266
450
  # connection close.
267
451
  stream_tasks = []
@@ -284,7 +468,7 @@ module Hyperion
284
468
  stream.instance_variable_set(:@hyperion_dispatched, true)
285
469
 
286
470
  stream_tasks << task.async do
287
- dispatch_stream(stream, send_mutex, peer_addr)
471
+ dispatch_stream(stream, writer_ctx, peer_addr)
288
472
  end
289
473
  end
290
474
  end
@@ -309,6 +493,18 @@ module Hyperion
309
493
  }
310
494
  end
311
495
  ensure
496
+ # Coordinated shutdown: flag the writer, signal it, wait for the final
497
+ # drain, then close the real socket. Order matters — closing the
498
+ # socket before the writer drains would discard final RST_STREAM /
499
+ # GOAWAY / END_STREAM frames in the queue.
500
+ if writer_ctx
501
+ writer_ctx.shutdown!
502
+ begin
503
+ writer_task&.wait
504
+ rescue StandardError
505
+ nil
506
+ end
507
+ end
312
508
  @metrics.decrement(:connections_active)
313
509
  socket.close unless socket.closed?
314
510
  end
@@ -394,7 +590,7 @@ module Hyperion
394
590
  server
395
591
  end
396
592
 
397
- def dispatch_stream(stream, send_mutex, peer_addr = nil)
593
+ def dispatch_stream(stream, writer_ctx, peer_addr = nil)
398
594
  # RFC 7540 §8.1.2 — header validation flagged this stream as malformed.
399
595
  # Send RST_STREAM PROTOCOL_ERROR instead of invoking the app.
400
596
  if stream.protocol_error?
@@ -403,7 +599,7 @@ module Hyperion
403
599
  end
404
600
  @metrics.increment(:requests_rejected)
405
601
  begin
406
- send_mutex.synchronize do
602
+ writer_ctx.encode_mutex.synchronize do
407
603
  stream.send_reset_stream(::Protocol::HTTP2::Error::PROTOCOL_ERROR) unless stream.closed?
408
604
  end
409
605
  rescue StandardError
@@ -459,8 +655,8 @@ module Hyperion
459
655
  body_chunks.each { |c| payload << c.to_s }
460
656
  body_chunks.close if body_chunks.respond_to?(:close)
461
657
 
462
- send_mutex.synchronize { stream.send_headers(out_headers) }
463
- send_body(stream, payload, send_mutex)
658
+ writer_ctx.encode_mutex.synchronize { stream.send_headers(out_headers) }
659
+ send_body(stream, payload, writer_ctx)
464
660
  @metrics.increment_status(status)
465
661
  rescue StandardError => e
466
662
  @metrics.increment(:app_errors)
@@ -473,7 +669,9 @@ module Hyperion
473
669
  }
474
670
  end
475
671
  begin
476
- send_mutex.synchronize { stream.send_reset_stream(::Protocol::HTTP2::Error::INTERNAL_ERROR) }
672
+ writer_ctx.encode_mutex.synchronize do
673
+ stream.send_reset_stream(::Protocol::HTTP2::Error::INTERNAL_ERROR)
674
+ end
477
675
  rescue StandardError
478
676
  nil
479
677
  end
@@ -485,9 +683,12 @@ module Hyperion
485
683
  # notification — protocol-http2 calls `window_updated` on every active
486
684
  # stream when WINDOW_UPDATE frames arrive (either stream- or
487
685
  # connection-scoped), which signals the notification.
488
- def send_body(stream, payload, send_mutex)
686
+ #
687
+ # The encode_mutex protects HPACK state and per-stream frame ordering;
688
+ # the actual socket write happens off-fiber via the writer task.
689
+ def send_body(stream, payload, writer_ctx)
489
690
  if payload.empty?
490
- send_mutex.synchronize { stream.send_data('', ::Protocol::HTTP2::END_STREAM) }
691
+ writer_ctx.encode_mutex.synchronize { stream.send_data('', ::Protocol::HTTP2::END_STREAM) }
491
692
  return
492
693
  end
493
694
 
@@ -508,7 +709,69 @@ module Hyperion
508
709
  offset += chunk.bytesize
509
710
  flags = offset >= bytesize ? ::Protocol::HTTP2::END_STREAM : 0
510
711
 
511
- send_mutex.synchronize { stream.send_data(chunk, flags) }
712
+ writer_ctx.encode_mutex.synchronize { stream.send_data(chunk, flags) }
713
+ end
714
+ end
715
+
716
+ # Drain bytes off the per-connection send queue onto the real socket.
717
+ # This fiber is the SOLE writer to `socket` for the connection's
718
+ # lifetime, which satisfies SSLSocket's "no concurrent writes from
719
+ # different fibers" constraint.
720
+ #
721
+ # The loop:
722
+ # 1. Drain everything currently enqueued (non-blocking pops).
723
+ # 2. If we drained anything, signal `@drained_notify` so backpressured
724
+ # encoders can resume, then loop again — more bytes may have been
725
+ # enqueued while we were writing.
726
+ # 3. If shutdown was requested AND the queue is empty, exit.
727
+ # 4. Otherwise park on the send notification until an encoder pokes us.
728
+ def run_writer_loop(socket, writer_ctx)
729
+ loop do
730
+ drained_bytes = 0
731
+ while (chunk = writer_ctx.try_pop)
732
+ begin
733
+ socket.write(chunk)
734
+ rescue EOFError, Errno::ECONNRESET, Errno::EPIPE, IOError, OpenSSL::SSL::SSLError
735
+ # Peer hung up. Release THIS chunk's byte budget, then drain the
736
+ # rest of the queue (without writing) so backpressured encoders
737
+ # don't stall waiting on a writer that's about to exit. Any
738
+ # remaining queued bytes are dropped — the connection is dead.
739
+ writer_ctx.note_drained(chunk.bytesize)
740
+ drain_and_discard_queue(writer_ctx)
741
+ return
742
+ end
743
+ drained_bytes += chunk.bytesize
744
+ writer_ctx.note_drained(chunk.bytesize)
745
+ end
746
+
747
+ # Some sockets (SSLSocket on a TCPSocket whose Nagle is off) need an
748
+ # explicit flush to push small final frames (END_STREAM data, GOAWAY)
749
+ # without waiting for the next write. Cheap when there's nothing
750
+ # buffered.
751
+ socket.flush if drained_bytes.positive? && socket.respond_to?(:flush) && !socket.closed?
752
+
753
+ return if writer_ctx.writer_done? && writer_ctx.queue_empty?
754
+
755
+ writer_ctx.wait_for_signal
756
+ end
757
+ rescue StandardError => e
758
+ @logger.error do
759
+ {
760
+ message: 'h2 writer loop error',
761
+ error: e.message,
762
+ error_class: e.class.name,
763
+ backtrace: (e.backtrace || []).first(10).join(' | ')
764
+ }
765
+ end
766
+ end
767
+
768
+ # On peer-disconnect we discard any queued bytes (we can't write them),
769
+ # but we MUST still decrement the byte counter for each one or
770
+ # backpressured encoder fibers will park forever on the drain
771
+ # notification.
772
+ def drain_and_discard_queue(writer_ctx)
773
+ while (chunk = writer_ctx.try_pop)
774
+ writer_ctx.note_drained(chunk.bytesize)
512
775
  end
513
776
  end
514
777
 
@@ -65,6 +65,7 @@ module Hyperion
65
65
  # check the regular stream here — colored text is for humans.
66
66
  @colorize = @format == :text && tty?(@out)
67
67
  @c_access_available = nil # lazy-computed on first access — see below.
68
+ @c_access_colored_available = nil # ditto for the colored TTY variant.
68
69
  # Registry of every per-thread access buffer ever allocated through
69
70
  # this Logger instance. Walked by #flush_all on shutdown so SIGTERM
70
71
  # doesn't strand buffered lines in dying threads. The Mutex guards
@@ -94,6 +95,16 @@ module Hyperion
94
95
  ::Hyperion::CParser.respond_to?(:build_access_line)
95
96
  end
96
97
 
98
+ # Whether Hyperion::CParser.build_access_line_colored is available. Same
99
+ # lazy-probe pattern as #c_access_available?; lets a colored-TTY run pick
100
+ # up the C path instead of the Ruby fallback.
101
+ def c_access_colored_available?
102
+ return @c_access_colored_available unless @c_access_colored_available.nil?
103
+
104
+ @c_access_colored_available = !!(defined?(::Hyperion::CParser) &&
105
+ ::Hyperion::CParser.respond_to?(:build_access_line_colored))
106
+ end
107
+
97
108
  LEVELS.each_key do |lvl|
98
109
  define_method(lvl) do |payload = nil, &block|
99
110
  next unless emit?(lvl)
@@ -140,7 +151,12 @@ module Hyperion
140
151
  # which the C builder doesn't emit. Production deploys (non-TTY,
141
152
  # log-aggregator destinations) take the C path; local TTY runs keep the
142
153
  # colored Ruby fallback.
143
- line = if !@colorize && c_access_available?
154
+ line = if @colorize && c_access_colored_available?
155
+ # Colored TTY path: green INFO label baked into the C builder.
156
+ ::Hyperion::CParser.build_access_line_colored(@format, ts, method, path,
157
+ query, status, duration_ms,
158
+ remote_addr, http_version)
159
+ elsif !@colorize && c_access_available?
144
160
  ::Hyperion::CParser.build_access_line(@format, ts, method, path,
145
161
  query, status, duration_ms,
146
162
  remote_addr, http_version)
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Hyperion
4
- VERSION = '1.5.0'
4
+ VERSION = '1.6.1'
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: hyperion-rb
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.5.0
4
+ version: 1.6.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrey Lobanov