hyperion-rb 1.5.0 → 1.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ab7691ac6671b0e0c9606c281c55659c76675b71f3d461d1fb5bf6a03680861b
4
- data.tar.gz: b7ad35585d56e59d4a7b5c9fcb6d4e016e72b4c3f99496ba675ca7e871865718
3
+ metadata.gz: 388377a54507d370411ae4b229ff575e191742ba6e3dc044c9c8990552bff5ff
4
+ data.tar.gz: 8cc9cd083c9450948ba3a710cb5514f16bc31b8a421ebd405c4064129b0b031c
5
5
  SHA512:
6
- metadata.gz: 8911a91c7932b332a9d5f069099c7f6ded94d9b5978dffd259881ab482066d5328c508ca5983101c6d9d04b18c1353664766bf759ec66cb034d3bcdf84f01a89
7
- data.tar.gz: 7c948c98eb9aea2cb31595e08deca0c4e98c2281105a18fc0419678da25a04ac9b04f9defe564ea660104cf935399582506eb87769f6f9fbbca74f568c8f904b
6
+ metadata.gz: 389098362215d01ce8fa08add90d29871390e0c9c5e38d384caa50ee2605210005c252ecbaea191f379d59580e2c4d4573b94ef8c9308259e1959abee81e4397
7
+ data.tar.gz: f3d2664e553a2b3c24f8518ed9b65e73a49bbefd0dee03f0c602d7867fdd37ed0a8bf7030abcee68b9020c54ee80fe3cc039c93e4e004fbd372cea02313592bd
data/CHANGELOG.md CHANGED
@@ -1,5 +1,58 @@
1
1
  # Changelog
2
2
 
3
+ ## [1.6.1] - 2026-04-27
4
+
5
+ Audit follow-up from the [BENCH_2026_04_27.md](docs/BENCH_2026_04_27.md) sweep. No request-path code changes; documentation surface and operator-UX polish (the boot-time advisory warn below is the only code addition).
6
+
7
+ ### Added
8
+ - **`## Operator guidance` README section** — concrete "when do I pick which config?" tables. Translates the bench numbers into decisions: `-w 1 + larger pool` vs `-w N + smaller pool` for I/O-bound (multi-worker is 2.6× memory for 0.77× rps if you pick wrong on PG-wait); the `--async-io` decision tree (default OFF unless you're paired with a fiber-cooperative library); how to read p50 vs p99 (tail wins are 5-200× larger than the rps story suggests — size capacity by p99).
9
+ - **Boot-time advisory warn for orphan `--async-io`** — if `async_io: true` is set but no fiber-cooperative library is loaded (`hyperion-async-pg`, `async-redis`, `async-http`), Hyperion logs a single advisory warn at boot pointing at the operator-guidance docs. The setting is still honoured; the warn just helps operators who flipped the flag expecting a free perf bump (bench showed `--async-io` on hello-world = 47% rps regression + 3.65 s p99 spike).
10
+ - **4 new specs in `spec/hyperion/cli_async_io_warn_spec.rb`** covering all four fire/no-fire decision cases (true + no library, false, nil, true + library detected via stub_const).
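
The advisory's decision logic is simple enough to sketch. This is an illustrative approximation, not the gem's source: the constant `FIBER_COOP_LIBS`, the method name, and the probed constant names are all invented here.

```ruby
# Hypothetical sketch of the orphan --async-io advisory. All names here
# are illustrative, not lifted from hyperion-rb.
FIBER_COOP_LIBS = %w[Hyperion::AsyncPG Async::Redis Async::HTTP].freeze

def warn_orphan_async_io(config)
  return unless config.async_io
  # Fire only when no fiber-cooperative I/O library is loaded; the
  # setting itself is still honoured either way.
  return if FIBER_COOP_LIBS.any? { |name| Object.const_defined?(name) }

  warn "hyperion: --async-io is set but no fiber-cooperative I/O library " \
       "is loaded; see the README's Operator guidance section."
end
```

The real implementation logs through Hyperion's logger at boot; `Kernel#warn` stands in here to keep the sketch self-contained.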
11
+
12
+ ## [1.6.0] - 2026-04-27
13
+
14
+ Two parallel improvements landing in 1.6.0:
15
+ 1. Three small C-extension additions on the request hot path (sibling commit — see "Performance" below).
16
+ 2. Architectural rewrite of the HTTP/2 outbound write path — per-stream send queue + dedicated writer fiber replace the global `@send_mutex` (see "HTTP/2 writer architecture" below).
17
+
18
+ These are independent and can be reviewed / reverted separately. The CHANGELOG sub-sections will be merged before the tag is cut.
19
+
20
+ ### HTTP/2 writer architecture (Changed)
21
+ - **`Hyperion::Http2Handler` now uses a per-connection writer fiber instead of a single send Mutex.** Pre-1.6.0 every framer write — HEADERS, DATA, RST_STREAM, GOAWAY — ran inside one `@send_mutex.synchronize { socket.write(...) }`. That capped per-connection h2 throughput at "one socket-write at a time" regardless of how many streams were concurrently in flight: a slow socket (kernel send buffer full, peer reading slowly) blocked every other stream's writes too. 1.6.0 splits the path:
22
+ - **Encode + frame format** (HPACK encoding, frame layout) is fast (microseconds, in-memory) and stays serialized on the calling fiber via `WriterContext#encode_mutex`. HPACK state is connection-scoped and stateful across HEADERS frames; per-stream wire order (HEADERS → DATA → END_STREAM) must also be preserved. Holding the encode mutex across a `stream.send_*` call satisfies both.
23
+ - **Bytes-to-socket** is owned by a dedicated `run_writer_loop` fiber spawned per connection. Encoder fibers hand bytes off via `WriterContext#enqueue` (non-blocking, signals an `Async::Notification`); the writer pops chunks from the queue and writes them. Only this fiber ever calls `socket.write`, satisfying SSLSocket's "no concurrent writes from different fibers" constraint.
24
+ - **Net effect**: a stream that has bytes ready can encode and enqueue while the writer is mid-flush of an earlier chunk — the slow-socket case no longer serializes encode work across streams. Mutex hold time drops from "until the kernel accepts the write" to "until the bytes are appended to the in-memory queue."
25
+ - **Per-connection backpressure cap** (`MAX_PER_CONN_PENDING_BYTES = 16 MiB`). Pathological clients that read very slowly could otherwise let the queue grow without bound. `WriterContext#enqueue` parks the encoder on `@drained_notify` once `@pending_bytes` exceeds the cap; the writer signals `@drained_notify` after each drain pass.
26
+ - **Coordinated shutdown**: when `Http2Handler#serve` exits (clean close, peer disconnect, or protocol error), the `ensure` block sets `WriterContext#shutdown!` and `writer_task.wait`s for the final drain BEFORE closing the socket. Order matters — closing the socket first would discard final RST_STREAM / GOAWAY / END_STREAM frames sitting in the queue.
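
The split described above (serialized in-memory encode, a single writer owning the socket, a byte-budget backpressure cap, drain-before-close) can be approximated in plain threads. This sketch invents its names and uses `Mutex`/`ConditionVariable` where the gem uses fibers and `Async::Notification`:

```ruby
# Thread-based approximation of the per-connection writer-loop pattern.
# Not the gem's source: a sketch of the queue / backpressure / shutdown
# mechanics only.
class WriterContext
  def initialize(max_pending_bytes: 16 * 1024 * 1024)
    @queue         = []
    @pending_bytes = 0
    @max           = max_pending_bytes
    @mutex         = Mutex.new
    @cond          = ConditionVariable.new
    @shutdown      = false
  end

  # Called by encoder threads. Mutex hold time is "append to queue",
  # never "wait for the kernel to accept a write".
  def enqueue(bytes)
    @mutex.synchronize do
      # Backpressure: park until the writer drains below the cap.
      @cond.wait(@mutex) while @pending_bytes > @max && !@shutdown
      @queue << bytes
      @pending_bytes += bytes.bytesize
      @cond.broadcast
    end
  end

  def shutdown!
    @mutex.synchronize do
      @shutdown = true
      @cond.broadcast
    end
  end

  # Only this thread ever touches the socket. Exits only once the queue
  # is drained after shutdown!, so final frames are never discarded.
  def run_writer_loop(io)
    loop do
      chunk = @mutex.synchronize do
        @cond.wait(@mutex) while @queue.empty? && !@shutdown
        return if @queue.empty? && @shutdown
        c = @queue.shift
        @pending_bytes -= c.bytesize
        @cond.broadcast  # wake any encoder parked on backpressure
        c
      end
      io.write(chunk)    # the slow part, done outside the lock
    end
  end
end
```

Note how the loop re-checks "empty and shut down" under the lock before parking, the same race-avoidance the Notes section below describes for `Async::Notification`.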
27
+
28
+ ### HTTP/2 writer architecture (Added)
29
+ - **`Hyperion::Http2Handler::SendQueueIO`** — IO-shaped wrapper passed to `Protocol::HTTP2::Framer` in place of the raw socket. `read` is a passthrough (single-reader on the connection fiber); `write` enqueues onto the connection-wide queue. Reports `closed?` from the underlying socket so framer EOF detection still works.
30
+ - **`Hyperion::Http2Handler::WriterContext`** — holds the per-connection queue, the encode mutex, the send/drained notifications, and the byte-budget counters. One instance per connection; lives for the lifetime of `Http2Handler#serve`.
31
+ - **9 new specs in `spec/hyperion/http2_writer_loop_spec.rb`**:
32
+ - `SendQueueIO#write` returns bytesize, enqueues without writing the socket, no-ops on empty/nil, reports the underlying socket's `closed?` state (4).
33
+ - Writer loop drains a single encoder's frames in enqueue order (1).
34
+ - Two encoder fibers pushing concurrently — bytes for both streams reach the wire and per-stream order (HEADERS → DATA → END) is preserved (1).
35
+ - Backpressure parks the encoder when `@pending_bytes` exceeds `max_pending_bytes`; encoder resumes after the writer drains (1).
36
+ - Shutdown drains all queued frames before the writer fiber exits; shutdown with an empty queue exits cleanly (2).
37
+ - **`bench/h2_streams.sh`** — `h2load`-driven recipe (`-c 1 -m 100 -n 5000`) for measuring per-connection multi-stream rps. Skips with a clear message if `h2load` isn't on PATH; emits a one-line JSON summary so cross-version diffs are easy.
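
The `SendQueueIO` wrapper described at the top of this list has roughly the following shape. A sketch reconstructed from the description, not the gem's source; the `writer_context` collaborator is assumed to expose `enqueue`:

```ruby
# IO-shaped facade handed to the framer in place of the raw socket:
# reads pass through, writes become enqueues. Illustrative only.
class SendQueueIO
  def initialize(socket, writer_context)
    @socket = socket
    @ctx    = writer_context
  end

  def read(*args)
    @socket.read(*args)   # single reader on the connection fiber
  end

  def closed?
    @socket.closed?       # framer EOF detection still sees the socket
  end

  def write(bytes)
    return 0 if bytes.nil? || bytes.empty?  # no-op on empty/nil
    @ctx.enqueue(bytes)                     # to the connection-wide queue
    bytes.bytesize                          # IO#write "bytes written" contract
  end
end
```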
38
+
39
+ ### HTTP/2 writer architecture (Migration)
40
+ - No public-API changes. Operators do not need to touch config or restart with new flags. The architectural change is internal to `Http2Handler`.
41
+
42
+ ### HTTP/2 writer architecture (Notes)
43
+ - HPACK's dynamic-table state is shared across all streams on a connection (per RFC 7541 §2.3.2). That is why we still serialize encode work — two fibers calling `stream.send_headers` concurrently would corrupt the encoder's table state. The mutex is now microseconds-of-CPU rather than "however long the socket takes to drain N MB."
44
+ - `Async::Notification#signal` is a no-op when there are no waiters (signals are not buffered). The writer loop accordingly re-checks `writer_done? && queue_empty?` before parking, so a `shutdown!` call that races a `wait_for_signal` doesn't deadlock.
45
+
46
+ ### Performance
47
+ - **`Hyperion::CParser.upcase_underscore(name)` — C-level Rack header-name normalizer.** Replaces the per-uncached-header `"HTTP_#{name.upcase.tr('-', '_')}"` allocation in `Adapter::Rack#build_env`. Single allocation (5 prefix bytes + N source bytes), single byte loop, no Ruby intermediates. Microbench (5 typical X-* names per call): 460k i/s Ruby → 2.21M i/s C, **4.80×** faster (2.17 μs → 452 ns/iter). On a header-heavy hello-world rackup with 8 X-Custom-* request headers + 9 response headers, headline throughput went from ~16.6k r/s to ~18.0k r/s wrk-driven (~+8.5%, averaged across 3 trials). The 16-name `HTTP_KEY_CACHE` still short-circuits the common headers; this only fires on uncached customs.
48
+ - **`Hyperion::CParser.chunked_body_complete?(buffer, body_start)` — chunked-transfer body completion check in C.** Replaces the pure-Ruby walker in `Connection#chunked_body_complete?` with a C-level loop that scans CRLF boundaries, decodes hex sizes, and advances the cursor without per-iteration `String#index` / `byteslice` / `split` allocations. Returns `[complete?, last_safe_offset]` so the caller can persist parse progress across read boundaries (handy for pipelined / streaming buffers, even though Connection currently only consults the boolean). Microbench (3 mixed buffers per iter): 283k i/s Ruby → 3.73M i/s C, **13.19×** faster (3.54 μs → 268 ns/iter). Profit is small in production because chunked uploads are rare, but the path now matches the rest of the parser in cost shape.
49
+ - **`Hyperion::CParser.build_access_line_colored(...)` — TTY-coloured access-log builder in C.** Mirrors `build_access_line` with the green ANSI escape pair `\e[32mINFO \e[0m` baked into the level label. Ten extra bytes per line, single allocation. The pre-1.6.0 `Logger#access` path fell back to the slower Ruby builder whenever `@colorize` was on (i.e. local TTY / dev runs); now the C builder fires there too. Microbench: 1.78M i/s Ruby → 2.90M i/s C, **1.63×** faster (561 ns → 345 ns per line). Smaller win than the others — the Ruby builder was already a single interpolation — but closes the parity gap so dev-loop `tail -f` doesn't pay an avoidable Ruby tax.
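
Two of the three C additions above have compact pure-Ruby references, which is what the fallback branches and the parity specs compare against. Sketched here under stated assumptions (the method names are invented; the walker mirrors the return contract described for the C version, and assumes a binary/ASCII buffer so char and byte offsets coincide):

```ruby
# 1. The exact Ruby expression upcase_underscore replaces.
def upcase_underscore_ruby(name)
  "HTTP_#{name.upcase.tr('-', '_')}"
end

# 2. Chunked-framing walker with the described [complete?, offset] contract:
#    [true, end_offset] when fully buffered, [false, last_safe] otherwise.
def chunked_body_complete?(buffer, body_start)
  cursor    = body_start
  last_safe = cursor
  loop do
    line_end = buffer.index("\r\n", cursor)
    return [false, last_safe] unless line_end

    # Size token: hex digits; ";ext" ignored; surrounding whitespace tolerated.
    tok = buffer[cursor...line_end].split(";", 2).first.to_s.strip
    return [false, last_safe] unless tok.match?(/\A\h+\z/)

    size   = tok.to_i(16)
    cursor = line_end + 2
    if size.zero?
      # Final chunk: walk trailer lines until the empty line.
      loop do
        nl = buffer.index("\r\n", cursor)
        return [false, last_safe] unless nl
        return [true, nl + 2] if nl == cursor
        cursor = nl + 2
      end
    end
    # Need the chunk data plus its trailing CRLF fully buffered.
    return [false, last_safe] if buffer.bytesize - cursor < size + 2
    cursor    += size + 2
    last_safe  = cursor
  end
end
```

For example, `chunked_body_complete?("5\r\nhello\r\n0\r\n\r\n".b, 0)` returns `[true, 15]`, while truncating the buffer after `hel` yields `[false, 0]`.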
50
+
51
+ ### Added
52
+ - **9 new specs in `spec/hyperion/c_upcase_underscore_spec.rb`** plus a fallback-parity assertion that flips the `@c_upcase_available` class-level ivar on `Hyperion::Adapter::Rack` to walk both the C and Ruby branches in one process. Covers lowercase / uppercase / multi-dash / empty / single-byte / non-ASCII byte-pass-through / digit-preservation / Ruby-equivalence on a panel of canonical custom names / encoding (US-ASCII).
53
+ - **13 new specs in `spec/hyperion/c_chunked_body_complete_spec.rb`** including a fallback-parity assertion against the original Ruby walker. Covers single chunk, multi-chunk, trailers, partial CRLF, partial size token, partial chunk data, chunk extensions, body_start offset, last-safe-cursor reporting on partial buffers, ArgumentError on out-of-range body_start, and a panel of mixed inputs that must agree byte-for-byte with the Ruby walker.
54
+ - **9 new specs in `spec/hyperion/c_access_line_colored_spec.rb`** plus a Logger#access integration test that constructs a TTY-faking IO and asserts the green INFO label appears in the emitted line. Covers text + json formats, query nil/empty/quote-trigger, remote_addr nil, ANSI absence in JSON, and byte-for-byte parity against a hand-rolled Ruby colored builder.
55
+
3
56
  ## [1.5.0] - 2026-04-27
4
57
 
5
58
  Audit-driven CLI + adapter polish. No breaking changes; pure additions to the operator surface and a hardening of the host-header parser.
data/README.md CHANGED
@@ -25,7 +25,9 @@ bundle exec hyperion config.ru
25
25
 
26
26
  ## Benchmarks
27
27
 
28
- All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version it was measured against — newer versions (1.3.0+ `--async-io`, 1.4.0+ TLS h1 inline, 1.4.1+ Metrics fiber-key fix) preserve or improve these numbers; we re-run the headline configs each release and have not seen regressions on these workloads.
28
+ All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version it was measured against — newer versions (1.3.0+ `--async-io`, 1.4.0+ TLS h1 inline, 1.4.1+ Metrics fiber-key fix, 1.6.0+ HTTP/2 writer fiber + 3 C-ext additions) preserve or improve these numbers; we re-run the headline configs each release and have not seen regressions on these workloads.
29
+
30
+ > **Comprehensive matrix for 1.6.0 + hyperion-async-pg 0.5.0 (16-vCPU Linux, 9 workloads × 25+ configs)**: see [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md). Headline: 98,818 r/s on hello `-w 16`, 21,215 r/s `-w 4` at p99 < 2 ms, 2,180 r/s on a 50 ms-waiting PG workload (4.1× the best Puma), 1,667 req/s HTTP/2 multiplexed at 0 errors, 155 MB RSS for 10k idle keep-alive connections.
29
31
 
30
32
  ### Hello-world Rack app
31
33
 
@@ -201,6 +203,8 @@ The architectural difference shows up under **load**, not at idle: Puma can only
201
203
 
202
204
  Hyperion fans 100 in-flight streams across separate fibers within a single TCP connection. A serial server would take 5 s; the fiber-multiplexed result (1.04 s, ~96 req/s on one socket) is bounded by single-handler sleep time plus framing overhead. Puma has no native HTTP/2 path — production deployments terminate h2 at nginx and forward h1 to the worker pool, which serializes again.
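
The arithmetic behind those numbers, assuming the 50 ms per-stream handler sleep implied by "a serial server would take 5 s" for 100 streams:

```ruby
streams        = 100
handler_sleep  = 0.05                       # assumed 50 ms per handler
serial_seconds = streams * handler_sleep    # 5.0 s if streams ran back-to-back
measured       = 1.04                       # fiber-multiplexed wall time
per_socket_rps = (streams / measured).round # about 96 req/s on one socket
```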
203
205
 
206
+ > **1.6.0 outbound write path** — `Http2Handler` no longer serializes every framer write through one `Mutex#synchronize { socket.write(...) }`. HPACK encoding (microseconds, in-memory) still serializes on a fast encode mutex, but the actual `socket.write` is owned by a dedicated per-connection writer fiber draining a queue. On per-connection multi-stream workloads where the kernel send buffer or peer reads are slow, encode work for ready streams overlaps the writer's flush of earlier chunks, instead of stacking up behind it. See `bench/h2_streams.sh` (`h2load -c 1 -m 100 -n 5000`) for a recipe to compare 1.5.0 vs 1.6.0 on a workload of your choice.
207
+
204
208
  ### Reproduce
205
209
 
206
210
  ```sh
@@ -318,6 +322,62 @@ Strict DSL: unknown methods raise `NoMethodError` at boot — typos surface imme
318
322
 
319
323
  A documented sample lives at [`config/hyperion.example.rb`](config/hyperion.example.rb).
320
324
 
325
+ ## Operator guidance
326
+
327
+ Concrete tradeoffs distilled from [`docs/BENCH_2026_04_27.md`](docs/BENCH_2026_04_27.md). If the bench numbers cited below feel surprising, check that doc for the full matrix + caveats.
328
+
329
+ ### When to use `-w N`
330
+
331
+ | Workload shape | Recommended | Why |
332
+ |---|---|---|
333
+ | **Pure I/O-bound** (PG / Redis / external HTTP, no significant CPU) | `-w 1` + larger pool | Bench: `-w 1 pool=200` = 87 MB / 2,180 r/s vs `-w 4 pool=64` = 224 MB / 1,680 r/s. **2.6× more memory, 0.77× rps** if you pick multi-worker on a wait-bound workload. |
334
+ | **Pure CPU-bound** (heavy JSON / template render / image processing) | `-w N` matching CPU count | Each worker's accept loop is single-threaded under `--async-io`; multi-worker gives CPU-parallelism. Bench: `-w 16 -t 5` hits 98,818 r/s on a 16-vCPU box, 4.7× a `-w 1` ceiling on the same hardware. |
335
+ | **Mixed** (Rails-shaped: ~5 ms CPU + 50 ms PG wait per request) | `-w N/2` (half cores) + medium pool | Lets CPU work parallelise while keeping per-worker memory tractable. Bench `pg_mixed.ru` at `-w 4 -t 5 pool=128` = 1,740 r/s with no cold-start spike (ForkSafe `prefill_in_child: true`). |
336
+
337
+ Multi-worker on PG-wait workloads is the **wrong** default for most apps — the headline rps doesn't justify the memory and PG-connection cost. Verify your shape with the bench before scaling out.
338
+
339
+ ### When to use `--async-io`
340
+
341
+ ```
342
+ Are you using a fiber-cooperative I/O library?
343
+ (hyperion-async-pg, async-redis, async-http)
344
+
345
+ ┌─────────────┴─────────────┐
346
+ yes no
347
+ │ │
348
+ Pair with a fiber-aware Leave --async-io OFF.
349
+ connection pool Default thread-pool dispatch
350
+ (FiberPool, async-pool — is faster for synchronous
351
+ NOT connection_pool gem, Rails apps. Bench: --async-io
352
+ which uses non-fiber Mutex). on hello-world = 47% rps
353
+ │ regression + p99 spike to
354
+ Set --async-io. 3.65 s under no-yield workloads.
355
+ Pool size is the real No reason to flip the flag.
356
+ concurrency knob; -t is
357
+ decorative for wait-bound.
358
+ ```
359
+
360
+ Hyperion warns at boot if you set `--async-io` without any fiber-cooperative library loaded. The setting is still honoured; the warn just nudges operators who flipped it expecting a free perf bump.
361
+
362
+ ### Tuning `-t` and pool sizes
363
+
364
+ - **Without `--async-io`** (sync server, default): `-t` is the concurrency knob. Each in-flight request holds an OS thread; pool size should match `-t`. Bench shows Puma-style behaviour — at 200 wrk conns hitting a 5-thread server, queue depth dominates p99 (Hyperion `-t 5 -w 1` p50 = 0.95 ms vs Puma's same shape at 59.5 ms — Hyperion's queueing is cheaper but the model still serializes at `-t`).
365
+ - **With `--async-io` + a fiber-aware pool**: pool size is the concurrency knob. `-t` is decorative for wait-bound workloads; one accept-loop fiber serves all in-flight queries via the pool. Linear scaling: pool=64 → ~780 r/s, pool=128 → ~1,344 r/s, pool=200 → ~2,180 r/s on 50 ms PG queries.
366
+ - **Pool over WAN**: if `PG.connect` round-trip is >50 ms, expect pool fill at startup to take `pool_size / parallel_fill_threads × RTT`. `hyperion-async-pg 0.5.1+` auto-scales `parallel_fill_threads` so pool=200 fills in ~1-2 s.
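
The fill-time formula in the last bullet, as arithmetic. The thread count of 8 is a hypothetical value for illustration, not a documented default:

```ruby
# pool_size / parallel_fill_threads * RTT, from the bullet above.
def pool_fill_seconds(pool_size, parallel_fill_threads, rtt_seconds)
  (pool_size.to_f / parallel_fill_threads) * rtt_seconds
end

pool_fill_seconds(200, 8, 0.05)  # 1.25 s: pool=200, 8 fill threads, 50 ms RTT
```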
367
+
368
+ ### How to read p50 vs p99
369
+
370
+ Tail latency tells the queueing story; rps tells the throughput story. Hyperion's tail wins are **always** bigger than its rps wins — sometimes the rps numbers look close to a competitor while p99 is 5-200× lower:
371
+
372
+ | Workload | Hyperion rps / p99 | Closest competitor | rps ratio | p99 ratio |
373
+ |---|---|---|---:|---:|
374
+ | Hello `-w 4` | 21,215 r/s / 1.87 ms | Falcon 24,061 / 9.78 ms | 0.88× | **5.2× lower** |
375
+ | CPU JSON `-w 4` | 15,582 r/s / 2.47 ms | Falcon 18,643 / 13.51 ms | 0.84× | **5.5× lower** |
376
+ | Static 1 MiB | 1,919 r/s / 4.22 ms | Puma 2,074 / 55 ms | 0.93× | **13× lower** |
377
+ | PG-wait `-w 1` pool=200 | 2,180 r/s / 668 ms | Puma 530 r/s + 200 timeouts | **4.1×** | qualitative crush |
378
+
379
+ **Size capacity by p99, not by mean.** Throughput peaks are easy to fake under controlled bench conditions; tail latency reflects what your slowest user actually experiences when the load balancer fans them onto a busy worker.
380
+
321
381
  ## Logging
322
382
 
323
383
  Default behaviour (rc16+):
@@ -543,6 +543,322 @@ static VALUE cbuild_access_line(VALUE self,
543
543
  }
544
544
  #undef CAT_LIT
545
545
 
546
+ /* Hyperion::CParser.build_access_line_colored(format, ts, method, path, query,
547
+ * status, duration_ms, remote_addr,
548
+ * http_version) -> String
549
+ *
550
+ * TTY-coloured variant of build_access_line. The text path wraps the level
551
+ * label with ANSI escape "\e[32mINFO \e[0m" so a developer running Hyperion
552
+ * in a terminal sees a green INFO tag. The :json branch is identical to the
553
+ * non-coloured builder — JSON access lines are machine-readable and never
554
+ * carry ANSI escapes.
555
+ *
556
+ * Lifted from cbuild_access_line above; the only divergence is the level
557
+ * label injection in the text branch. We deliberately duplicate the text
558
+ * format rather than templating, because the text body is short and a
559
+ * single function with a colour flag would compile to the same code with an
560
+ * extra branch in the hot loop.
561
+ */
562
+ static VALUE cbuild_access_line_colored(VALUE self,
563
+ VALUE format_sym, VALUE rb_ts,
564
+ VALUE rb_method, VALUE rb_path,
565
+ VALUE rb_query, VALUE rb_status,
566
+ VALUE rb_duration, VALUE rb_remote,
567
+ VALUE rb_http_version) {
568
+ (void)self;
569
+ Check_Type(rb_ts, T_STRING);
570
+ Check_Type(rb_method, T_STRING);
571
+ Check_Type(rb_path, T_STRING);
572
+ Check_Type(rb_http_version, T_STRING);
573
+
574
+ int is_json = (TYPE(format_sym) == T_SYMBOL) &&
575
+ (SYM2ID(format_sym) == rb_intern("json"));
576
+
577
+ int status = NUM2INT(rb_status);
578
+ double dur_ms = NUM2DBL(rb_duration);
579
+
580
+ int has_query = !NIL_P(rb_query) && RSTRING_LEN(rb_query) > 0;
581
+ int has_remote = !NIL_P(rb_remote) && RSTRING_LEN(rb_remote) > 0;
582
+
583
+ #define CAT_LIT(b, s) rb_str_cat((b), (s), (long)(sizeof(s) - 1))
584
+
585
+ VALUE buf = rb_str_buf_new(512);
586
+
587
+ if (is_json) {
588
+ /* JSON output is identical to the non-coloured path — ANSI escapes
589
+ * have no place in a structured log record. */
590
+ CAT_LIT(buf, "{\"ts\":\"");
591
+ rb_str_cat(buf, RSTRING_PTR(rb_ts), RSTRING_LEN(rb_ts));
592
+ CAT_LIT(buf, "\",\"level\":\"info\",\"source\":\"hyperion\",\"message\":\"request\",");
593
+ CAT_LIT(buf, "\"method\":\"");
594
+ rb_str_cat(buf, RSTRING_PTR(rb_method), RSTRING_LEN(rb_method));
595
+ CAT_LIT(buf, "\",\"path\":\"");
596
+ rb_str_cat(buf, RSTRING_PTR(rb_path), RSTRING_LEN(rb_path));
597
+ CAT_LIT(buf, "\"");
598
+
599
+ if (has_query) {
600
+ CAT_LIT(buf, ",\"query\":\"");
601
+ rb_str_cat(buf, RSTRING_PTR(rb_query), RSTRING_LEN(rb_query));
602
+ CAT_LIT(buf, "\"");
603
+ }
604
+
605
+ char num[64];
606
+ int n = snprintf(num, sizeof(num), ",\"status\":%d,\"duration_ms\":%g,",
607
+ status, dur_ms);
608
+ rb_str_cat(buf, num, n);
609
+
610
+ if (has_remote) {
611
+ CAT_LIT(buf, "\"remote_addr\":\"");
612
+ rb_str_cat(buf, RSTRING_PTR(rb_remote), RSTRING_LEN(rb_remote));
613
+ CAT_LIT(buf, "\",");
614
+ } else {
615
+ CAT_LIT(buf, "\"remote_addr\":null,");
616
+ }
617
+
618
+ CAT_LIT(buf, "\"http_version\":\"");
619
+ rb_str_cat(buf, RSTRING_PTR(rb_http_version), RSTRING_LEN(rb_http_version));
620
+ CAT_LIT(buf, "\"}\n");
621
+ } else {
622
+ /* text: "<ts> \e[32mINFO \e[0m [hyperion] message=request method=..." */
623
+ rb_str_cat(buf, RSTRING_PTR(rb_ts), RSTRING_LEN(rb_ts));
624
+ CAT_LIT(buf, " \x1b[32mINFO \x1b[0m [hyperion] message=request method=");
625
+ rb_str_cat(buf, RSTRING_PTR(rb_method), RSTRING_LEN(rb_method));
626
+ CAT_LIT(buf, " path=");
627
+ rb_str_cat(buf, RSTRING_PTR(rb_path), RSTRING_LEN(rb_path));
628
+
629
+ if (has_query) {
630
+ const char *q_ptr = RSTRING_PTR(rb_query);
631
+ long q_len = RSTRING_LEN(rb_query);
632
+ int need_quote = 0;
633
+ for (long j = 0; j < q_len; j++) {
634
+ char c = q_ptr[j];
635
+ if (c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
636
+ c == '"' || c == '=') {
637
+ need_quote = 1;
638
+ break;
639
+ }
640
+ }
641
+ if (need_quote) {
642
+ VALUE quoted = rb_funcall(rb_query, rb_intern("inspect"), 0);
643
+ CAT_LIT(buf, " query=");
644
+ rb_str_cat(buf, RSTRING_PTR(quoted), RSTRING_LEN(quoted));
645
+ } else {
646
+ CAT_LIT(buf, " query=");
647
+ rb_str_cat(buf, q_ptr, q_len);
648
+ }
649
+ }
650
+
651
+ char num[80];
652
+ int n = snprintf(num, sizeof(num), " status=%d duration_ms=%g remote_addr=",
653
+ status, dur_ms);
654
+ rb_str_cat(buf, num, n);
655
+
656
+ if (has_remote) {
657
+ rb_str_cat(buf, RSTRING_PTR(rb_remote), RSTRING_LEN(rb_remote));
658
+ } else {
659
+ CAT_LIT(buf, "nil");
660
+ }
661
+
662
+ CAT_LIT(buf, " http_version=");
663
+ rb_str_cat(buf, RSTRING_PTR(rb_http_version), RSTRING_LEN(rb_http_version));
664
+ CAT_LIT(buf, "\n");
665
+ }
666
+
667
+ return buf;
668
+ }
669
+ #undef CAT_LIT
670
+
671
+ /* Hyperion::CParser.upcase_underscore(name) -> "HTTP_<UPCASED_UNDERSCORED>"
672
+ *
673
+ * Single-allocation replacement for `"HTTP_#{name.upcase.tr('-', '_')}"`.
674
+ * Hot path on the Rack adapter: every uncached request header (any
675
+ * `X-*` custom header) hits this on every request, and the Ruby version
676
+ * spawns three String allocations (the upcase result, the tr result, and the
677
+ * "HTTP_..." interpolation) plus a per-byte loop in tr.
678
+ *
679
+ * We allocate one Ruby String of length 5 + name.bytesize, fill it in a
680
+ * single byte loop, return it. ASCII lowercase letters are upcased by
681
+ * subtracting 0x20 (clearing bit 5, 'a'..'z' -> 'A'..'Z'); '-' becomes '_'; everything else
682
+ * passes through (header names are ASCII per RFC 9110, but multi-byte UTF-8
683
+ * bytes pass through bytewise unmolested rather than crashing).
684
+ *
685
+ * Encoding is set to US-ASCII because Ruby's String#upcase on an ASCII-only
686
+ * input returns a US-ASCII string, and the env-key lookup downstream is
687
+ * encoding-agnostic anyway.
688
+ */
689
+ static VALUE cupcase_underscore(VALUE self, VALUE rb_name) {
690
+ (void)self;
691
+ Check_Type(rb_name, T_STRING);
692
+
693
+ const char *src = RSTRING_PTR(rb_name);
694
+ long src_len = RSTRING_LEN(rb_name);
695
+
696
+ /* Single allocation: 5 prefix bytes + N source bytes. */
697
+ VALUE out = rb_str_new(NULL, 5 + src_len);
698
+ char *dst = RSTRING_PTR(out);
699
+
700
+ dst[0] = 'H';
701
+ dst[1] = 'T';
702
+ dst[2] = 'T';
703
+ dst[3] = 'P';
704
+ dst[4] = '_';
705
+
706
+ for (long i = 0; i < src_len; i++) {
707
+ unsigned char c = (unsigned char)src[i];
708
+ if (c >= 'a' && c <= 'z') {
709
+ dst[5 + i] = (char)(c - 32);
710
+ } else if (c == '-') {
711
+ dst[5 + i] = '_';
712
+ } else {
713
+ dst[5 + i] = (char)c;
714
+ }
715
+ }
716
+
717
+ rb_enc_associate(out, rb_usascii_encoding());
718
+ /* Keep rb_name live across the loop above. RSTRING_PTR returns an
719
+ * interior pointer that becomes invalid if the GC moves the source
720
+ * String — unlikely on this tight path, but cheap insurance. */
721
+ RB_GC_GUARD(rb_name);
722
+ return out;
723
+ }
724
+
725
+ /* Hyperion::CParser.chunked_body_complete?(buffer, body_start)
726
+ * -> [complete?, end_offset]
727
+ *
728
+ * Walks chunked-transfer framing in `buffer` starting at byte offset
729
+ * `body_start`. Returns a 2-element array:
730
+ * [true, end_offset] — chunked body fully buffered; end_offset is the
731
+ * byte just after the trailer CRLF (where pipelined
732
+ * bytes from a follow-on request would begin).
733
+ * [false, last_safe] — body is not yet complete; last_safe is the
734
+ * furthest cursor we successfully advanced to,
735
+ * useful as a hint for incremental parsing.
736
+ *
737
+ * Mirrors Connection#chunked_body_complete? in pure Ruby — see lib/hyperion/
738
+ * connection.rb. Trailing whitespace after the size token (e.g. "5 ; ext\r\n")
739
+ * is permitted as a permissive parse to match the upstream Ruby `.strip`.
740
+ */
741
+ static VALUE cchunked_body_complete(VALUE self, VALUE rb_buffer, VALUE rb_body_start) {
742
+ (void)self;
743
+ Check_Type(rb_buffer, T_STRING);
744
+
745
+ const char *data = RSTRING_PTR(rb_buffer);
746
+ long len = RSTRING_LEN(rb_buffer);
747
+ long cursor = NUM2LONG(rb_body_start);
748
+
749
+ if (cursor < 0 || cursor > len) {
750
+ rb_raise(rb_eArgError, "body_start out of range");
751
+ }
752
+
753
+ long last_safe = cursor;
754
+ VALUE result = rb_ary_new_capa(2);
755
+
756
+ while (1) {
757
+ /* Find the next CRLF starting at cursor. */
758
+ long line_end = -1;
759
+ for (long i = cursor; i + 1 < len; i++) {
760
+ if (data[i] == '\r' && data[i + 1] == '\n') {
761
+ line_end = i;
762
+ break;
763
+ }
764
+ }
765
+ if (line_end < 0) {
766
+ rb_ary_push(result, Qfalse);
767
+ rb_ary_push(result, LONG2NUM(last_safe));
768
+ RB_GC_GUARD(rb_buffer);
769
+ return result;
770
+ }
771
+
772
+ /* Parse the size token: hex digits up to ';' or whitespace, optional
773
+ * chunk extension after ';' which we ignore wholesale. */
774
+ long tok_start = cursor;
775
+ long tok_end = line_end;
776
+ for (long i = cursor; i < line_end; i++) {
777
+ if (data[i] == ';') { tok_end = i; break; }
778
+ }
779
+ /* Trim leading/trailing ASCII whitespace from the token. */
780
+ while (tok_start < tok_end &&
781
+ (data[tok_start] == ' ' || data[tok_start] == '\t')) {
782
+ tok_start++;
783
+ }
784
+ while (tok_end > tok_start &&
785
+ (data[tok_end - 1] == ' ' || data[tok_end - 1] == '\t')) {
786
+ tok_end--;
787
+ }
788
+ if (tok_end <= tok_start) {
789
+ /* Empty size token — incomplete frame. */
790
+ rb_ary_push(result, Qfalse);
791
+ rb_ary_push(result, LONG2NUM(last_safe));
792
+ RB_GC_GUARD(rb_buffer);
793
+ return result;
794
+ }
795
+
796
+ /* Validate + decode hex. */
797
+ unsigned long size = 0;
798
+ for (long i = tok_start; i < tok_end; i++) {
799
+ unsigned char c = (unsigned char)data[i];
800
+ unsigned int digit;
801
+ if (c >= '0' && c <= '9') {
802
+ digit = c - '0';
803
+ } else if (c >= 'a' && c <= 'f') {
804
+ digit = 10 + (c - 'a');
805
+ } else if (c >= 'A' && c <= 'F') {
806
+ digit = 10 + (c - 'A');
807
+ } else {
808
+ /* Non-hex byte: incomplete/malformed. Match the Ruby
809
+ * regex `/\A\h+\z/` semantics — return false, advance no
810
+ * further. The caller will read more bytes and retry. */
811
+ rb_ary_push(result, Qfalse);
812
+ rb_ary_push(result, LONG2NUM(last_safe));
813
+ RB_GC_GUARD(rb_buffer);
814
+ return result;
815
+ }
816
+ size = (size << 4) | digit;
817
+ }
818
+
819
+ cursor = line_end + 2;
820
+
821
+ if (size == 0) {
822
+ /* Final chunk — walk trailer headers until we hit "\r\n\r\n"
823
+ * (i.e. an empty trailer line directly after the size line). */
824
+ while (1) {
825
+ long nl = -1;
826
+ for (long i = cursor; i + 1 < len; i++) {
827
+ if (data[i] == '\r' && data[i + 1] == '\n') {
828
+ nl = i;
829
+ break;
830
+ }
831
+ }
832
+ if (nl < 0) {
833
+ rb_ary_push(result, Qfalse);
834
+ rb_ary_push(result, LONG2NUM(last_safe));
835
+ RB_GC_GUARD(rb_buffer);
836
+ return result;
837
+ }
838
+ if (nl == cursor) {
839
+ /* Empty line — body complete. */
840
+ rb_ary_push(result, Qtrue);
841
+ rb_ary_push(result, LONG2NUM(nl + 2));
842
+ RB_GC_GUARD(rb_buffer);
843
+ return result;
844
+ }
845
+ cursor = nl + 2;
846
+ }
847
+ }
848
+
849
+ /* Need cursor + size + 2 bytes (chunk data + trailing CRLF). */
850
+ if ((unsigned long)(len - cursor) < size + 2) {
851
+ rb_ary_push(result, Qfalse);
852
+ rb_ary_push(result, LONG2NUM(last_safe));
853
+ RB_GC_GUARD(rb_buffer);
854
+ return result;
855
+ }
856
+
857
+ cursor += (long)size + 2;
858
+ last_safe = cursor;
859
+ }
860
+ }
861
+
546
862
  void Init_hyperion_http(void) {
547
863
  install_settings();
548
864
 
@@ -557,6 +873,12 @@ void Init_hyperion_http(void) {
557
873
  cbuild_response_head, 6);
558
874
  rb_define_singleton_method(rb_cCParser, "build_access_line",
559
875
  cbuild_access_line, 9);
876
+ rb_define_singleton_method(rb_cCParser, "build_access_line_colored",
877
+ cbuild_access_line_colored, 9);
878
+ rb_define_singleton_method(rb_cCParser, "upcase_underscore",
879
+ cupcase_underscore, 1);
880
+ rb_define_singleton_method(rb_cCParser, "chunked_body_complete?",
881
+ cchunked_body_complete, 2);
560
882
 
561
883
  id_new = rb_intern("new");
562
884
  id_downcase = rb_intern("downcase");
@@ -48,6 +48,17 @@ module Hyperion
48
48
  }
49
49
  )
50
50
 
51
+ # Whether Hyperion::CParser.upcase_underscore is available. Probed lazily
52
+ # at first use (CParser is required after this file, so an eager check
53
+ # at load time would always be false). Memoised in a class-level ivar to
54
+ # keep the per-request cost to one cached-boolean check.
55
+ def self.c_upcase_available?
56
+ return @c_upcase_available unless @c_upcase_available.nil?
57
+
58
+ @c_upcase_available = defined?(::Hyperion::CParser) &&
59
+ ::Hyperion::CParser.respond_to?(:upcase_underscore)
60
+ end
61
+
51
62
  class << self
52
63
  # Pre-allocate `n` env-hash and rack-input objects in master before
53
64
  # fork. Children inherit the populated free-list via copy-on-write —
@@ -122,8 +133,14 @@ module Hyperion
122
133
  env['rack.run_once'] = false
123
134
  env['SCRIPT_NAME'] = ''
124
135
 
136
+ # Header-name → Rack env-key conversion. Cache covers the 16 most
137
+ # common names; uncached headers (X-* customs, vendor-specific) flow
138
+ # through CParser.upcase_underscore (single C-level allocation) when
139
+ # the extension is built, else the pure-Ruby triple-allocation path.
140
+ c_upcase = Rack.c_upcase_available?
125
141
  request.headers.each do |name, value|
126
- key = HTTP_KEY_CACHE[name] || "HTTP_#{name.upcase.tr('-', '_')}"
142
+ key = HTTP_KEY_CACHE[name] ||
143
+ (c_upcase ? ::Hyperion::CParser.upcase_underscore(name) : "HTTP_#{name.upcase.tr('-', '_')}")
127
144
  env[key] = value
128
145
  end
129
146
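The fallback conversion the comment references is small enough to verify standalone. A sketch (the helper name `env_key` is illustrative, not the gem's API; `HTTP_KEY_CACHE` is elided):

```ruby
# Pure-Ruby header-name → Rack env-key conversion — the path taken for
# uncached headers when the C extension isn't built. Each call allocates
# an upcased copy, a tr'd copy, and the final interpolated string; this
# is the triple allocation CParser.upcase_underscore collapses to one.
def env_key(name)
  "HTTP_#{name.upcase.tr('-', '_')}"
end

env_key('X-Request-Id')    # => "HTTP_X_REQUEST_ID"
env_key('Accept-Language') # => "HTTP_ACCEPT_LANGUAGE"
```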
 
data/lib/hyperion/cli.rb CHANGED
@@ -26,6 +26,14 @@ module Hyperion
26
26
  Hyperion.logger = Hyperion::Logger.new(level: config.log_level, format: config.log_format)
27
27
  end
28
28
 
29
+ # Advisory: operators frequently flip --async-io expecting "fast mode"
30
+ # without installing a fiber-cooperative I/O library. That costs ~5-47% rps
31
+ # (worst on no-I/O shapes like hello-world). The flag only pays off when
32
+ # paired with `hyperion-async-pg` / `async-redis` / `async-http`. We log
33
+ # once at boot pointing at the operator-guidance docs; the operator's
34
+ # setting is still honoured.
35
+ warn_orphan_async_io(config)
36
+
29
37
  # Propagate log_requests so every Connection picks it up via
30
38
  # `Hyperion.log_requests?` without needing to thread it through
31
39
  # Server/ThreadPool/Master plumbing. Default is ON; nil means "don't
@@ -261,6 +269,35 @@ WARNING: argv is visible via `ps`; prefer --admin-token-file PATH for production
261
269
  end
262
270
  private_class_method :maybe_enable_yjit
263
271
 
272
+ # Probe table for fiber-cooperative I/O libraries. If `async_io: true` is
273
+ # set but none of these are loaded, the operator has likely flipped the
274
+ # flag without reading the bench numbers — `--async-io` adds Async-loop
275
+ # overhead and only pays off when paired with a library whose I/O calls
276
+ # yield to the scheduler. Hello-world bench (BENCH_2026_04_27.md) showed
277
+ # a 47% rps regression + 3.65 s p99 spike on this shape.
278
+ ASYNC_IO_PROBE_LIBS = {
279
+ 'hyperion-async-pg' => -> { defined?(::Hyperion::AsyncPg) },
280
+ 'async-redis' => -> { defined?(::Async::Redis) },
281
+ 'async-http' => -> { defined?(::Async::HTTP) }
282
+ }.freeze
283
+
284
+ def self.warn_orphan_async_io(config)
285
+ return unless config.async_io == true # nil and false are both no-ops here
286
+
287
+ detected = ASYNC_IO_PROBE_LIBS.select { |_name, probe| probe.call }.keys
288
+ return unless detected.empty?
289
+
290
+ Hyperion.logger.warn do
291
+ {
292
+ message: 'async_io enabled but no fiber-cooperative I/O library detected',
293
+ libraries_checked: ASYNC_IO_PROBE_LIBS.keys,
294
+ impact: 'async_io adds Async-loop overhead (~5-47% rps depending on workload) and only pays off when paired with a library that yields to the Async scheduler on socket waits.',
295
+ docs: 'https://github.com/andrew-woblavobla/hyperion#operator-guidance'
296
+ }
297
+ end
298
+ end
299
+ private_class_method :warn_orphan_async_io
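The probe table's use of lambdas matters: `defined?` is evaluated when the probe is *called* (at warn time, after the app's requires have run), not when the constant table is built. A standalone illustration with trimmed, hypothetical entries (the real table also probes `hyperion-async-pg`):

```ruby
# Each probe is a lambda so `defined?` runs at call time. `defined?`
# never raises on missing constants — unloaded libraries simply yield
# nil, which is falsy, so select drops them.
probes = {
  'async-redis' => -> { defined?(::Async::Redis) },
  'async-http'  => -> { defined?(::Async::HTTP) }
}.freeze

detected = probes.select { |_name, probe| probe.call }.keys
detected # => [] in a process where neither gem has been required
```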
300
+
264
301
  # When admin_token is configured, wrap the app in AdminMiddleware so
265
302
  # POST /-/quit and GET /-/metrics become token-protected admin endpoints.
266
303
  # Skipped when the token is unset — those paths fall through to the app,
@@ -287,9 +287,29 @@ module Hyperion
287
287
 
288
288
  # Walks chunked framing in `buffer` starting at `body_start` and
289
289
  # returns true once the final 0-sized chunk (and trailer terminator)
290
- # is fully buffered. Mirrors the parser's dechunk walk; Phase 4's C
291
- # parser folds these together via incremental parsing.
290
+ # is fully buffered. The C extension folds the size-line scan + hex
291
+ # decode + chunk advance into a single tight loop with no per-iteration
292
+ # Ruby allocation; the pure-Ruby fallback below preserves the original
293
+ # semantics for environments where the C extension didn't build.
292
294
  def chunked_body_complete?(buffer, body_start)
295
+ if self.class.c_chunked_available?
296
+ ::Hyperion::CParser.chunked_body_complete?(buffer, body_start).first
297
+ else
298
+ chunked_body_complete_ruby?(buffer, body_start)
299
+ end
300
+ end
301
+
302
+ # Whether Hyperion::CParser.chunked_body_complete? is available. Probed
303
+ # lazily at first use; memoised in a class-level ivar to keep the
304
+ # per-request hot path branchless.
305
+ def self.c_chunked_available?
306
+ return @c_chunked_available unless @c_chunked_available.nil?
307
+
308
+ @c_chunked_available = !!(defined?(::Hyperion::CParser) &&
309
+ ::Hyperion::CParser.respond_to?(:chunked_body_complete?))
310
+ end
311
+
312
+ def chunked_body_complete_ruby?(buffer, body_start)
293
313
  cursor = body_start
294
314
  loop do
295
315
  line_end = buffer.index("\r\n", cursor)
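The walk both implementations perform can be sketched end to end in plain Ruby (standalone; `chunked_complete?` here is illustrative rather than the gem's method, assumes a binary/ASCII buffer as read off a socket, and treats the first CRLF after the 0-size chunk as the trailer terminator):

```ruby
# Chunked-framing completeness check: walk size-line / data / CRLF
# triples; complete once the 0-size chunk plus one trailing CRLF (the
# trailer terminator) are buffered. Sizes are hex per RFC 7230 §4.1;
# String#to_i(16) tolerates chunk extensions ("5;name=val") by
# stopping at the first non-hex byte.
def chunked_complete?(buffer, body_start = 0)
  cursor = body_start
  loop do
    line_end = buffer.index("\r\n", cursor)
    return false unless line_end # size line not fully buffered yet

    size = buffer[cursor...line_end].to_i(16)
    cursor = line_end + 2
    # Final chunk: done once the trailer-terminator CRLF is present.
    return !buffer.index("\r\n", cursor).nil? if size.zero?

    # Need `size` data bytes plus the chunk's own trailing CRLF.
    return false if buffer.bytesize - cursor < size + 2

    cursor += size + 2
  end
end

chunked_complete?("5\r\nhello\r\n0\r\n\r\n") # => true
chunked_complete?("5\r\nhel")                # => false (data truncated)
```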
@@ -18,11 +18,40 @@ module Hyperion
18
18
  # dispatch — slow handlers no longer block other streams on the same
19
19
  # connection.
20
20
  #
21
- # All framer writes (HEADERS, DATA, RST_STREAM) are serialized through a
22
- # single connection-scoped Mutex (`@send_mutex`). The OpenSSL::SSL::SSLSocket
23
- # underneath is not safe to drive from two fibers concurrently, and
24
- # protocol-http2's HPACK encoder is also stateful across HEADERS frames,
25
- # so all sends must be serialized.
21
+ # ## Outbound write architecture (1.6.0+)
22
+ #
23
+ # Pre-1.6.0 every framer write (HEADERS / DATA / RST_STREAM / GOAWAY) ran
24
+ # under one connection-scoped `Mutex#synchronize { socket.write(...) }`.
25
+ # That capped per-connection h2 throughput to "one socket-write at a time"
26
+ # regardless of stream count: a slow socket (kernel send buffer full,
27
+ # remote peer reading slowly) blocked every other stream's writes too.
28
+ #
29
+ # 1.6.0 splits the path:
30
+ # * The HPACK encode + frame format step is fast (microseconds, in-memory)
31
+ # and remains serialized on the calling fiber via `@encode_mutex`. HPACK
32
+ # state is stateful across HEADERS frames per connection, and frames for
33
+ # a single stream must be wire-ordered (HEADERS → DATA → END_STREAM).
34
+ # Holding the encode mutex across a `send_*` call accomplishes both.
35
+ # * The framer writes through a `SendQueueIO` wrapper (wraps the real
36
+ # socket). `SendQueueIO#write(bytes)` enqueues onto a connection-wide
37
+ # `@send_queue` and signals `@send_notify`; it never touches the real
38
+ # socket.
39
+ # * A dedicated **writer fiber** owns the real socket. It pops byte chunks
40
+ # off the queue, writes them, and parks on `@send_notify` when empty.
41
+ # Only this fiber ever calls `socket.write` — the SSLSocket cross-fiber
42
+ # unsafety constraint is satisfied.
43
+ #
44
+ # Net effect: the slow-socket case no longer serializes encode work across
45
+ # streams. A stream that has bytes ready to encode can encode and enqueue
46
+ # while the writer is mid-flush of an earlier chunk. The mutex hold time
47
+ # drops from "until the kernel accepts the write" to "until the bytes are
48
+ # appended to the in-memory queue."
49
+ #
50
+ # Backpressure: pathological clients (slow-read h2) could otherwise let the
51
+ # queue grow without bound. We track `@pending_bytes`; once it exceeds
52
+ # `MAX_PER_CONN_PENDING_BYTES`, encoding fibers wait on `@drained_notify`
53
+ # before enqueueing more. The writer signals `@drained_notify` after each
54
+ # drain pass.
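The encode-then-enqueue split described above is the classic single-writer pattern. A thread-based analogue, with `Thread` and `Thread::Queue` standing in for the writer fiber and `Async::Notification` machinery so the sketch runs without the async gem:

```ruby
# Single-writer sketch: producers enqueue byte chunks; exactly one
# writer drains them onto the shared sink, so the sink never sees
# concurrent writes. Queue#close makes pop return nil once the queue
# is drained, which doubles as the coordinated-shutdown signal.
queue = Thread::Queue.new
sink  = +'' # stands in for the real socket

writer = Thread.new do
  while (chunk = queue.pop) # blocks until a chunk arrives or close
    sink << chunk           # only this thread touches the sink
  end
end

3.times { |i| queue << "frame-#{i};" } # "encoder" side: cheap enqueue
queue.close                            # flag shutdown; writer drains
writer.join                            # wait for the final drain

sink # => "frame-0;frame-1;frame-2;"
```

The producer's cost is an in-memory append, mirroring the "mutex hold time drops to queue-append time" claim above; only the writer ever blocks on the slow sink.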
26
55
  #
27
56
  # Flow control: `RequestStream#window_updated` overrides the protocol-http2
28
57
  # default to fan a notification out to any fiber blocked in `send_body`
@@ -31,6 +60,153 @@ module Hyperion
31
60
  # size and yields on the notification when the window is exhausted, so
32
61
  # large bodies never trip a FlowControlError.
33
62
  class Http2Handler
63
+ # Cap on bytes that may sit in a connection's send queue waiting for the
64
+ # writer fiber to drain. Slow-read h2 clients can otherwise let an
65
+ # encoder fiber pile arbitrary bytes into RAM. 16 MiB matches the upper
66
+ # bound a well-behaved peer will buffer — anything beyond that is the
67
+ # writer being starved, and the right answer is to backpressure the
68
+ # encoder rather than allocate more.
69
+ MAX_PER_CONN_PENDING_BYTES = 16 * 1024 * 1024
70
+
71
+ # IO-shaped wrapper passed to `Protocol::HTTP2::Framer` in place of the
72
+ # real socket. Reads are direct passthroughs (the read loop runs on the
73
+ # connection fiber and there's only one reader). Writes are enqueued
74
+ # onto the connection-wide `WriterContext#queue`; the writer fiber owns
75
+ # the real socket and drains the queue.
76
+ #
77
+ # We deliberately do NOT delegate `flush` to the real socket: writes
78
+ # don't reach it from this object — the writer fiber does that. `flush`
79
+ # here is a no-op (the writer flushes after each batch).
80
+ #
81
+ # `closed?` reports the real socket's state so protocol-http2's read
82
+ # loop sees EOF the same way it always has.
83
+ class SendQueueIO
84
+ attr_reader :real_socket
85
+
86
+ def initialize(real_socket, writer_ctx)
87
+ @real_socket = real_socket
88
+ @writer_ctx = writer_ctx
89
+ end
90
+
91
+ # Framer's read path — direct delegation. Single-reader (the conn
92
+ # fiber), so no contention here.
93
+ def read(*args)
94
+ @real_socket.read(*args)
95
+ end
96
+
97
+ # Framer's write path — non-blocking handoff into the send queue.
98
+ # Backpressure is applied here: if pending bytes exceed the cap, the
99
+ # calling fiber parks on the drained notification until the writer
100
+ # has flushed enough to bring us below the threshold.
101
+ def write(bytes)
102
+ return 0 if bytes.nil? || bytes.empty?
103
+
104
+ @writer_ctx.enqueue(bytes)
105
+ bytes.bytesize
106
+ end
107
+
108
+ def flush
109
+ # No-op: bytes don't live in this object, they live in the queue.
110
+ # The writer fiber flushes the real socket as it drains.
111
+ nil
112
+ end
113
+
114
+ def close
115
+ @real_socket.close unless @real_socket.closed?
116
+ end
117
+
118
+ # Multi-line on purpose: a single-line `def closed?; @real_socket.closed?; end`
119
+ # gets autocorrected to `delegate :closed?, to: :@real_socket` by Rails-aware
120
+ # ruby-lsp formatters, which is wrong here (this is a plain gem, no
121
+ # ActiveSupport on the dependency graph).
122
+ def closed?
123
+ socket = @real_socket
124
+ socket.closed?
125
+ end
126
+ end
127
+
128
+ # Holds the per-connection outbound coordination state (queue,
129
+ # notifications, byte counters, shutdown flag) plus the encode mutex
130
+ # that protects HPACK state and per-stream frame ordering.
131
+ #
132
+ # Single instance per connection, lives for the lifetime of `serve`.
133
+ class WriterContext
134
+ attr_reader :encode_mutex
135
+
136
+ def initialize(max_pending_bytes: MAX_PER_CONN_PENDING_BYTES)
137
+ @queue = ::Thread::Queue.new
138
+ @send_notify = ::Async::Notification.new
139
+ @drained_notify = ::Async::Notification.new
140
+ @encode_mutex = ::Mutex.new
141
+ @pending_bytes = 0
142
+ @pending_bytes_lock = ::Mutex.new
143
+ @max_pending_bytes = max_pending_bytes
144
+ @writer_done = false
145
+ end
146
+
147
+ # Called by SendQueueIO#write on the calling (encoder) fiber. Enforces
148
+ # the per-connection backpressure cap before enqueuing.
149
+ def enqueue(bytes)
150
+ wait_for_drain_if_full(bytes.bytesize)
151
+ @pending_bytes_lock.synchronize { @pending_bytes += bytes.bytesize }
152
+ @queue << bytes
153
+ @send_notify.signal
154
+ end
155
+
156
+ # Pops a single chunk; returns nil if the queue is empty (non-blocking).
157
+ def try_pop
158
+ @queue.pop(true)
159
+ rescue ::ThreadError
160
+ nil
161
+ end
162
+
163
+ # Called by the writer fiber after each successful drain to release
164
+ # any encoders blocked on the cap.
165
+ def note_drained(bytesize)
166
+ @pending_bytes_lock.synchronize do
167
+ @pending_bytes -= bytesize
168
+ @pending_bytes = 0 if @pending_bytes.negative? # paranoia
169
+ end
170
+ @drained_notify.signal
171
+ end
172
+
173
+ def wait_for_signal
174
+ @send_notify.wait
175
+ end
176
+
177
+ def shutdown!
178
+ @writer_done = true
179
+ # Wake the writer if it's parked, and any encoder waiting on drain.
180
+ @send_notify.signal
181
+ @drained_notify.signal
182
+ end
183
+
184
+ def writer_done?
185
+ @writer_done
186
+ end
187
+
188
+ def queue_empty?
189
+ @queue.empty?
190
+ end
191
+
192
+ def pending_bytes
193
+ @pending_bytes_lock.synchronize { @pending_bytes }
194
+ end
195
+
196
+ private
197
+
198
+ def wait_for_drain_if_full(incoming_bytes)
199
+ # If we're already at/above the cap, park until the writer has
200
+ # drained. We re-check after every signal because multiple encoders
201
+ # can wake on a single drain notification.
202
+ while !@writer_done &&
203
+ @pending_bytes_lock.synchronize { @pending_bytes + incoming_bytes > @max_pending_bytes } &&
204
+ !@queue.empty?
205
+ @drained_notify.wait
206
+ end
207
+ end
208
+ end
209
+
34
210
  # Per-stream subclass that captures decoded request pseudo-headers,
35
211
  # regular headers, and any DATA frame body bytes for later dispatch.
36
212
  # Also exposes a `window_available` notification fan-out so the
@@ -247,21 +423,29 @@ module Hyperion
247
423
  def serve(socket)
248
424
  @metrics.increment(:connections_accepted)
249
425
  @metrics.increment(:connections_active)
250
- framer = ::Protocol::HTTP2::Framer.new(socket)
251
- server = build_server(framer)
426
+
427
+ # Per-connection outbound coordination. Encoder fibers enqueue bytes;
428
+ # the writer fiber owns the real socket and drains. See class docstring.
429
+ writer_ctx = WriterContext.new
430
+ send_io = SendQueueIO.new(socket, writer_ctx)
431
+ framer = ::Protocol::HTTP2::Framer.new(send_io)
432
+ server = build_server(framer)
433
+
434
+ task = ::Async::Task.current
435
+
436
+ # Spawn the dedicated writer fiber BEFORE the preface exchange.
437
+ # `Server#read_connection_preface` writes the server's SETTINGS frame
438
+ # via the framer; if the writer isn't running, those bytes sit in the
439
+ # queue. Spawning first guarantees they flush as soon as the scheduler
440
+ # ticks, avoiding any pathological deadlock where a client implementation
441
+ # waits for our SETTINGS before sending more frames.
442
+ writer_task = task.async { run_writer_loop(socket, writer_ctx) }
443
+
252
444
  server.read_connection_preface(initial_settings_payload)
253
445
 
254
446
  # Extract once — the same TCP peer drives every stream on this conn.
255
447
  peer_addr = peer_address(socket)
256
448
 
257
- # All framer writes (HEADERS / DATA / RST_STREAM / GOAWAY) must be
258
- # serialized: the underlying SSLSocket is not safe across fibers, and
259
- # the HPACK encoder is also stateful. The connection's own frame loop
260
- # uses this mutex too — see `dispatch_stream` and `send_body`.
261
- send_mutex = ::Mutex.new
262
-
263
- task = ::Async::Task.current
264
-
265
449
  # Track in-flight per-stream dispatch fibers so we can drain them on
266
450
  # connection close.
267
451
  stream_tasks = []
@@ -284,7 +468,7 @@ module Hyperion
284
468
  stream.instance_variable_set(:@hyperion_dispatched, true)
285
469
 
286
470
  stream_tasks << task.async do
287
- dispatch_stream(stream, send_mutex, peer_addr)
471
+ dispatch_stream(stream, writer_ctx, peer_addr)
288
472
  end
289
473
  end
290
474
  end
@@ -309,6 +493,18 @@ module Hyperion
309
493
  }
310
494
  end
311
495
  ensure
496
+ # Coordinated shutdown: flag the writer, signal it, wait for the final
497
+ # drain, then close the real socket. Order matters — closing the
498
+ # socket before the writer drains would discard final RST_STREAM /
499
+ # GOAWAY / END_STREAM frames in the queue.
500
+ if writer_ctx
501
+ writer_ctx.shutdown!
502
+ begin
503
+ writer_task&.wait
504
+ rescue StandardError
505
+ nil
506
+ end
507
+ end
312
508
  @metrics.decrement(:connections_active)
313
509
  socket.close unless socket.closed?
314
510
  end
@@ -394,7 +590,7 @@ module Hyperion
394
590
  server
395
591
  end
396
592
 
397
- def dispatch_stream(stream, send_mutex, peer_addr = nil)
593
+ def dispatch_stream(stream, writer_ctx, peer_addr = nil)
398
594
  # RFC 7540 §8.1.2 — header validation flagged this stream as malformed.
399
595
  # Send RST_STREAM PROTOCOL_ERROR instead of invoking the app.
400
596
  if stream.protocol_error?
@@ -403,7 +599,7 @@ module Hyperion
403
599
  end
404
600
  @metrics.increment(:requests_rejected)
405
601
  begin
406
- send_mutex.synchronize do
602
+ writer_ctx.encode_mutex.synchronize do
407
603
  stream.send_reset_stream(::Protocol::HTTP2::Error::PROTOCOL_ERROR) unless stream.closed?
408
604
  end
409
605
  rescue StandardError
@@ -459,8 +655,8 @@ module Hyperion
459
655
  body_chunks.each { |c| payload << c.to_s }
460
656
  body_chunks.close if body_chunks.respond_to?(:close)
461
657
 
462
- send_mutex.synchronize { stream.send_headers(out_headers) }
463
- send_body(stream, payload, send_mutex)
658
+ writer_ctx.encode_mutex.synchronize { stream.send_headers(out_headers) }
659
+ send_body(stream, payload, writer_ctx)
464
660
  @metrics.increment_status(status)
465
661
  rescue StandardError => e
466
662
  @metrics.increment(:app_errors)
@@ -473,7 +669,9 @@ module Hyperion
473
669
  }
474
670
  end
475
671
  begin
476
- send_mutex.synchronize { stream.send_reset_stream(::Protocol::HTTP2::Error::INTERNAL_ERROR) }
672
+ writer_ctx.encode_mutex.synchronize do
673
+ stream.send_reset_stream(::Protocol::HTTP2::Error::INTERNAL_ERROR)
674
+ end
477
675
  rescue StandardError
478
676
  nil
479
677
  end
@@ -485,9 +683,12 @@ module Hyperion
485
683
  # notification — protocol-http2 calls `window_updated` on every active
486
684
  # stream when WINDOW_UPDATE frames arrive (either stream- or
487
685
  # connection-scoped), which signals the notification.
488
- def send_body(stream, payload, send_mutex)
686
+ #
687
+ # The encode_mutex protects HPACK state and per-stream frame ordering;
688
+ # the actual socket write happens off-fiber via the writer task.
689
+ def send_body(stream, payload, writer_ctx)
489
690
  if payload.empty?
490
- send_mutex.synchronize { stream.send_data('', ::Protocol::HTTP2::END_STREAM) }
691
+ writer_ctx.encode_mutex.synchronize { stream.send_data('', ::Protocol::HTTP2::END_STREAM) }
491
692
  return
492
693
  end
493
694
 
@@ -508,7 +709,69 @@ module Hyperion
508
709
  offset += chunk.bytesize
509
710
  flags = offset >= bytesize ? ::Protocol::HTTP2::END_STREAM : 0
510
711
 
511
- send_mutex.synchronize { stream.send_data(chunk, flags) }
712
+ writer_ctx.encode_mutex.synchronize { stream.send_data(chunk, flags) }
713
+ end
714
+ end
715
+
716
+ # Drain bytes off the per-connection send queue onto the real socket.
717
+ # This fiber is the SOLE writer to `socket` for the connection's
718
+ # lifetime, which satisfies SSLSocket's "no concurrent writes from
719
+ # different fibers" constraint.
720
+ #
721
+ # The loop:
722
+ # 1. Drain everything currently enqueued (non-blocking pops).
723
+ # 2. If we drained anything, signal `@drained_notify` so backpressured
724
+ # encoders can resume, then loop again — more bytes may have been
725
+ # enqueued while we were writing.
726
+ # 3. If shutdown was requested AND the queue is empty, exit.
727
+ # 4. Otherwise park on the send notification until an encoder pokes us.
728
+ def run_writer_loop(socket, writer_ctx)
729
+ loop do
730
+ drained_bytes = 0
731
+ while (chunk = writer_ctx.try_pop)
732
+ begin
733
+ socket.write(chunk)
734
+ rescue EOFError, Errno::ECONNRESET, Errno::EPIPE, IOError, OpenSSL::SSL::SSLError
735
+ # Peer hung up. Release THIS chunk's byte budget, then drain the
736
+ # rest of the queue (without writing) so backpressured encoders
737
+ # don't stall waiting on a writer that's about to exit. Any
738
+ # remaining queued bytes are dropped — the connection is dead.
739
+ writer_ctx.note_drained(chunk.bytesize)
740
+ drain_and_discard_queue(writer_ctx)
741
+ return
742
+ end
743
+ drained_bytes += chunk.bytesize
744
+ writer_ctx.note_drained(chunk.bytesize)
745
+ end
746
+
747
+ # Some sockets (SSLSocket on a TCPSocket whose Nagle is off) need an
748
+ # explicit flush to push small final frames (END_STREAM data, GOAWAY)
749
+ # without waiting for the next write. Cheap when there's nothing
750
+ # buffered.
751
+ socket.flush if drained_bytes.positive? && socket.respond_to?(:flush) && !socket.closed?
752
+
753
+ return if writer_ctx.writer_done? && writer_ctx.queue_empty?
754
+
755
+ writer_ctx.wait_for_signal
756
+ end
757
+ rescue StandardError => e
758
+ @logger.error do
759
+ {
760
+ message: 'h2 writer loop error',
761
+ error: e.message,
762
+ error_class: e.class.name,
763
+ backtrace: (e.backtrace || []).first(10).join(' | ')
764
+ }
765
+ end
766
+ end
767
+
768
+ # On peer-disconnect we discard any queued bytes (we can't write them),
769
+ # but we MUST still decrement the byte counter for each one or
770
+ # backpressured encoder fibers will park forever on the drain
771
+ # notification.
772
+ def drain_and_discard_queue(writer_ctx)
773
+ while (chunk = writer_ctx.try_pop)
774
+ writer_ctx.note_drained(chunk.bytesize)
512
775
  end
513
776
  end
514
777
 
@@ -65,6 +65,7 @@ module Hyperion
65
65
  # check the regular stream here — colored text is for humans.
66
66
  @colorize = @format == :text && tty?(@out)
67
67
  @c_access_available = nil # lazy-computed on first access — see below.
68
+ @c_access_colored_available = nil # ditto for the colored TTY variant.
68
69
  # Registry of every per-thread access buffer ever allocated through
69
70
  # this Logger instance. Walked by #flush_all on shutdown so SIGTERM
70
71
  # doesn't strand buffered lines in dying threads. The Mutex guards
@@ -94,6 +95,16 @@ module Hyperion
94
95
  ::Hyperion::CParser.respond_to?(:build_access_line)
95
96
  end
96
97
 
98
+ # Whether Hyperion::CParser.build_access_line_colored is available. Same
99
+ # lazy-probe pattern as #c_access_available?; lets a colored-TTY run pick
100
+ # up the C path instead of the Ruby fallback.
101
+ def c_access_colored_available?
102
+ return @c_access_colored_available unless @c_access_colored_available.nil?
103
+
104
+ @c_access_colored_available = !!(defined?(::Hyperion::CParser) &&
105
+ ::Hyperion::CParser.respond_to?(:build_access_line_colored))
106
+ end
107
+
97
108
  LEVELS.each_key do |lvl|
98
109
  define_method(lvl) do |payload = nil, &block|
99
110
  next unless emit?(lvl)
@@ -140,7 +151,12 @@ module Hyperion
140
151
  # which the C builder doesn't emit. Production deploys (non-TTY,
141
152
  # log-aggregator destinations) take the C path; local TTY runs keep the
142
153
  # colored Ruby fallback.
143
- line = if !@colorize && c_access_available?
154
+ line = if @colorize && c_access_colored_available?
155
+ # Colored TTY path: green INFO label baked into the C builder.
156
+ ::Hyperion::CParser.build_access_line_colored(@format, ts, method, path,
157
+ query, status, duration_ms,
158
+ remote_addr, http_version)
159
+ elsif !@colorize && c_access_available?
144
160
  ::Hyperion::CParser.build_access_line(@format, ts, method, path,
145
161
  query, status, duration_ms,
146
162
  remote_addr, http_version)
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Hyperion
4
- VERSION = '1.5.0'
4
+ VERSION = '1.6.1'
5
5
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: hyperion-rb
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.5.0
4
+ version: 1.6.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrey Lobanov