hyperion-rb 2.16.3 → 2.16.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b52ac123c5a40e73e75a2384abf032393a9b90e8aad6b2701903373e518d6bd2
- data.tar.gz: 529138f8068a67192a9e45ebd6599510f8a14d0ef405b1e0c57e8ee043d4f713
+ metadata.gz: 1052090a1cfa42b3ba8807ff239b88f1b5700b0efd2df58a84166b0f754d6496
+ data.tar.gz: '028fa894acb855151cd2901aa1964fb2fd4be68908060401ba64a80d43987bd1'
  SHA512:
- metadata.gz: 8cd08a9d462cce2158d6c508038b7b05d8f6e6e50f7372b36c2c28827b440038963f62e29663a76204c4b2387dd8b4d648c1bc5bd2db9b81394375f1cbb18054
- data.tar.gz: 4113cfca038452c873d2bd48ac9a890480a6d1d65c52374a7e85693504f094de04e6d7f112a08a7c27b33e3f8464c558c2307782e63d84dffbb5acfcdcecb08b
+ metadata.gz: b080a73c39cbaa284594bda8b2a0dab21f2e737d8d4fd0b6316026bf9fb72b913f3d001117f4291166a6436fd1af0dae11766c47141aa54ca84c759d718c516d
+ data.tar.gz: 24fd2ff23e4ace9e29e9b43ef5c1becaaa1d732e4cff0ac6cafd1fa3fce317029afcf4cc26d3ee5cdad19ccfe7b7252f6da8b3d3a731824a1a10d74f043e4107
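
Reviewer note (illustrative, not part of the package): the digests above are plain hex SHA256 / SHA512 values for the gem's `metadata.gz` and `data.tar.gz` members. A minimal sketch of checking bytes against such entries could look like the following. The helper name `digests_match?` and the shape of the `expected` hash are assumptions made for this note; only Ruby's standard `digest` library is used.

```ruby
require 'digest'

# Hypothetical helper (not part of RubyGems or hyperion-rb): returns true
# when every expected digest matches the digest computed over `bytes`.
# `expected` maps an algorithm name ('SHA256' / 'SHA512') to a hex string,
# mirroring the two sections of checksums.yaml.
def digests_match?(bytes, expected)
  actual = {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
  expected.all? { |algo, hex| actual.fetch(algo) == hex.downcase }
end
```

In practice `bytes` would come from something like `File.binread` on the extracted archive member; that step is left out here because the archive layout is outside what this diff shows.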
data/CHANGELOG.md CHANGED
@@ -1,5 +1,80 @@
  # Changelog
  
+ ## 2.16.4 — 2026-05-07
+
+ ### io_uring hotpath accept fiber: row-19 BOOT-FAIL fix
+
+ Bug fix only; no public API change. Restores the
+ `--async-io + io_uring_hotpath=on + 1w` boot path on Linux 5.19+.
+
+ - **`Hyperion::Server#run_accept_fiber_io_uring_hotpath`** used
+ `@metrics.increment(:connections_accepted)` /
+ `@metrics.increment(:connections_active)` on every accept
+ completion, but `Server` has no `@metrics` ivar — sibling write
+ paths route through `runtime_metrics` (which resolves to
+ `@runtime.metrics` or `Hyperion.metrics`). Reading the unset
+ ivar yielded `nil`, so the first `OP_ACCEPT` completion raised
+ `NoMethodError: undefined method 'increment' for nil`. The
+ `rescue StandardError` block caught it, logged
+ `"io_uring hotpath accept fiber error; falling back to epoll"`,
+ and bailed to the epoll path. The fault only manifested on the
+ `--async-io` boot shape because `start_async_loop` is the only
+ call path that drives the hotpath fiber — `start_raw_loop`
+ short-circuits to the C accept loop and bypasses the bug
+ entirely (which is why the hotpath-on synthetic rows kept
+ passing while the AR-CRUD `--async-io` row took down `wait_for_bind`
+ and registered as `BOOT-FAIL`). Replaced with `runtime_metrics`.
+
+ - **`Hyperion::Connection#feed_read_bytes`** and
+ **`#close_for_eof`** were defined under `private` (line 536) but
+ called externally from the accept fiber per their own doc
+ comments. Each `OP_RECV` completion would have re-tripped the
+ fallback with `NoMethodError: private method 'feed_read_bytes'
+ called …`. Moved both to `public`.
+
+ - **`Hyperion::Server#run_accept_fiber_io_uring_hotpath` rescue
+ ordering**: the `rescue` blocks called `run_accept_fiber_epoll(task)`
+ *before* the `ensure` closed the hotpath ring. While the fallback
+ ran, the multishot-accept SQE stayed armed against the listener
+ fd and competed with the fallback's `accept_or_nil` for incoming
+ connections — defeating the recovery. Extracted
+ `close_hotpath_ring_for_fallback` and call it before the fallback
+ invocation in every rescue path.
+
+ - **`spec/hyperion/connection_io_uring_hotpath_spec.rb`** had a
+ `rescue StandardError => e ; pending "in-process boot is fragile"`
+ block that masked the `NoMethodError` above as a flake for the
+ bench gate to chase. Removed the `pending` rescue and added
+ `async_io: true` to `boot_server_thread` (without it the spec
+ exercised `start_raw_loop`, not the hotpath fiber it claimed to
+ test — which is exactly how the bug slipped past CI).
+
+ - **`spec/hyperion/server_metrics_ivar_regression_spec.rb`** new —
+ cross-platform structural guard so a future re-introduction of
+ an `@metrics` ivar reference in `Server` trips on macOS / Linux
+ CI before reaching the bench gate.
+
+ Bench evidence on the project's Linux 6.8 + io_uring 5.19+ host
+ against PG 17 (canonical `--async-io` + `hyperion-async-pg`):
+
+ | Trial | Requests/sec | p99 latency |
+ |---:|---:|---:|
+ | 1 | 468.46 | 237.95 ms |
+ | 2 | 474.61 | 220.60 ms |
+ | 3 | 492.93 | 234.79 ms |
+ | **Median** | **474.61** | **234.79 ms** |
+
+ Boot succeeded in 5 s (vs `BOOT-FAIL` after 45 s on 2026-05-07).
+ Zero `io_uring hotpath accept fiber error` warn lines and zero
+ `io_uring hotpath ring unhealthy` lines across the 60 s wrk window:
+ the hotpath served every request end-to-end with no fallback
+ engagement. p99 is ~2× better than the 2026-05-05 first pass
+ (234 ms vs 497 ms) — the 2026-05-05 run used the epoll-style
+ accept loop; this run exercises multishot-accept + multishot-recv
+ + send-SQE on the unified ring as designed (shallower queues,
+ trades some throughput for tail latency). CSV in
+ `docs/BENCH_ROW19_HOTPATH_FIX_2026_05_07.csv`.
+
  ## 2.16.3 — 2026-05-05
  
  ### 2.16.3-A — Hot-path Ruby cost reduction (metrics + split_host cache)
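
Reviewer note (illustrative, not part of the package): the first and third changelog bullets describe two plain-Ruby behaviours, namely that an unassigned instance variable reads as `nil`, and that an `ensure` clause runs only after the `rescue` body has finished. The sketch below reproduces both in isolation. `SketchServer`, `SketchMetrics`, `run_old_shape`, `run_new_shape`, and the event strings are hypothetical stand-ins, not Hyperion's code; only `runtime_metrics` borrows its name from the changelog.

```ruby
# Hypothetical metrics sink used only by this sketch.
class SketchMetrics
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def increment(key)
    @counts[key] += 1
  end
end

# Hypothetical stand-in for a server with a "ring" that must be closed
# before a fallback accept loop takes over.
class SketchServer
  attr_reader :events

  def initialize
    @events = []
    @ring_open = false
    @sink = SketchMetrics.new
  end

  # Accessor-style lookup, analogous in spirit to `runtime_metrics`.
  def runtime_metrics
    @sink
  end

  # Old shape: @metrics is never assigned, so it reads as nil and the call
  # raises NoMethodError. The fallback runs inside `rescue`, i.e. BEFORE
  # `ensure` closes the ring.
  def run_old_shape
    @ring_open = true
    @metrics.increment(:connections_accepted)
    @events << 'hotpath ok'
  rescue StandardError => e
    @events << "rescued #{e.class}"
    run_fallback
  ensure
    close_ring
  end

  # New shape: metrics go through the accessor, and the ring is closed
  # explicitly before the fallback starts. `ensure` stays as a safety net.
  def run_new_shape(simulate_failure: false)
    @ring_open = true
    raise IOError, 'simulated ring failure' if simulate_failure

    runtime_metrics.increment(:connections_accepted)
    @events << 'hotpath ok'
  rescue StandardError => e
    @events << "rescued #{e.class}"
    close_ring
    run_fallback
  ensure
    close_ring
  end

  private

  def run_fallback
    @events << "fallback started (ring_open=#{@ring_open})"
  end

  # Idempotent so that calling it from both `rescue` and `ensure` is safe.
  def close_ring
    return unless @ring_open

    @ring_open = false
    @events << 'ring closed'
  end
end
```

In the old shape the recorded events show the fallback starting while the ring is still open; in the new shape the ring is closed first. Making the close idempotent is one way to keep the `ensure` clause without closing twice; whether Hyperion's `close_hotpath_ring_for_fallback` works this way is not shown in this diff.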
@@ -15,6 +15,7 @@ require 'mkmf'
  # is available (the C source raises NotImplementedError at call time).
  $srcs = %w[
  parser.c
+ response_writer.c
  sendfile.c
  page_cache.c
  io_uring_loop.c
@@ -37,6 +38,14 @@ have_header('sys/sendfile.h') # Linux: sendfile64
  have_header('sys/uio.h') # BSD/Darwin sendfile + Linux iovec plumbing
  have_header('sys/socket.h')
  
+ # Plan #1 (perf roadmap) — direct-syscall response writer probes.
+ # All POSIX-shaped; on macOS MSG_NOSIGNAL doesn't exist so the C
+ # source falls back to writev with #ifdef MSG_NOSIGNAL guards.
+ have_func('writev', 'sys/uio.h')
+ have_func('sendmsg', 'sys/socket.h')
+ have_macro('MSG_NOSIGNAL', 'sys/socket.h')
+ have_macro('TCP_CORK', 'netinet/tcp.h')
+
  # 2.4-A: h2_codec_glue.c calls dlopen/dlsym to wire the Rust HPACK
  # cdylib without going through Fiddle on the per-call hot path. macOS
  # ships dlopen in libSystem (no extra link flag needed); Linux glibc
@@ -3,6 +3,7 @@
  #include <ruby/st.h>
  #include <string.h>
  #include "llhttp.h"
+ #include "response_writer.h"
  
  /* ----------------------------------------------------------------------
  * Hyperion::CParser — C extension wrapping llhttp.
@@ -568,9 +569,10 @@ typedef struct {
  /* Two-word key: input key VALUE bits + value VALUE bits. */
  VALUE key_v;
  VALUE val_v;
- VALUE line; /* "<lc-key>: <value>\r\n" buffer */
+ VALUE line; /* "<lc-key>: <value>\r\n" buffer */
  long line_len;
- int is_date; /* 1 if lc-key == "date" — caller skips the date tail */
+ int is_date; /* 1 if lc-key == "date" — caller skips the date tail */
+ int is_framing; /* 1 if lc-key == "transfer-encoding" — skip in chunked mode */
  } header_line_cache_entry_t;
  
  static st_table *g_header_line_cache = NULL;
@@ -601,7 +603,7 @@ static const struct st_hash_type header_line_cache_type = {
  * and st_insert. */
  static const header_line_cache_entry_t *header_line_cache_lookup(VALUE key, VALUE val) {
  if (g_header_line_cache == NULL) return NULL;
- header_line_cache_entry_t probe = { key, val, Qnil, 0, 0 };
+ header_line_cache_entry_t probe = { key, val, Qnil, 0, 0, 0 };
  st_data_t found_data;
  if (st_lookup(g_header_line_cache, (st_data_t)&probe, &found_data)) {
  return (const header_line_cache_entry_t *)found_data;
@@ -664,6 +666,8 @@ static const header_key_cache_entry_t *header_key_cache_lookup(VALUE key_v) {
  typedef struct {
  VALUE buf;
  int has_date;
+ int is_chunked; /* 1 → skip content-length + transfer-encoding from user headers;
+ * emit transfer-encoding: chunked instead (set by sentinel -1). */
  } build_head_state_t;
  
  static int build_head_each(VALUE k, VALUE v, VALUE arg) {
@@ -671,11 +675,16 @@ static int build_head_each(VALUE k, VALUE v, VALUE arg) {
  
  /* Full-line cache fast path: BOTH key AND value are frozen-literal
  * Strings AND the (key, value) pair is already cached. ONE rb_str_cat
- * consumes the entire prebuilt "<lc-key>: <value>\r\n" line. */
+ * consumes the entire prebuilt "<lc-key>: <value>\r\n" line.
+ * In chunked mode, skip any framing headers (transfer-encoding) that
+ * may have been cached from a previous non-chunked call on the same
+ * connection. content-length is never in the cache because the slow
+ * path drops it before the cache-populate step. */
  if (TYPE(k) == T_STRING && TYPE(v) == T_STRING &&
  OBJ_FROZEN_RAW(k) && OBJ_FROZEN_RAW(v)) {
  const header_line_cache_entry_t *line_e = header_line_cache_lookup(k, v);
  if (line_e != NULL) {
+ if (st->is_chunked && line_e->is_framing) return ST_CONTINUE;
  rb_str_cat(st->buf, RSTRING_PTR(line_e->line), line_e->line_len);
  if (line_e->is_date) st->has_date = 1;
  return ST_CONTINUE;
@@ -717,9 +726,14 @@ static int build_head_each(VALUE k, VALUE v, VALUE arg) {
  }
  
  /* Drop user-supplied content-length / connection — we always set
- * these unconditionally below. */
+ * these unconditionally below.
+ * In chunked mode also drop transfer-encoding — we emit our own
+ * "transfer-encoding: chunked\r\n" (RFC 7230 §3.3.3: content-length
+ * and transfer-encoding are mutually exclusive). */
  if (lc_len == 14 && memcmp(lc_ptr, "content-length", 14) == 0) return ST_CONTINUE;
  if (lc_len == 10 && memcmp(lc_ptr, "connection", 10) == 0) return ST_CONTINUE;
+ if (st->is_chunked &&
+ lc_len == 17 && memcmp(lc_ptr, "transfer-encoding", 17) == 0) return ST_CONTINUE;
  
  if (lc_len == 4 && memcmp(lc_ptr, "date", 4) == 0) st->has_date = 1;
  
@@ -733,7 +747,8 @@ static int build_head_each(VALUE k, VALUE v, VALUE arg) {
  rb_str_cat(st->buf, "\r\n", 2);
  
  /* Populate the line cache for next time when both sides are frozen
- * literals and we have room. */
+ * literals and we have room. Mark transfer-encoding entries as
+ * is_framing so the full-line fast path can skip them in chunked mode. */
  if (g_header_line_cache != NULL &&
  TYPE(k) == T_STRING && TYPE(v) == T_STRING &&
  OBJ_FROZEN_RAW(k) && OBJ_FROZEN_RAW(v) &&
@@ -747,11 +762,12 @@ static int build_head_each(VALUE k, VALUE v, VALUE arg) {
  rb_obj_freeze(line);
  
  header_line_cache_entry_t *ne = ALLOC(header_line_cache_entry_t);
- ne->key_v = k;
- ne->val_v = v;
- ne->line = line;
- ne->line_len = line_len;
- ne->is_date = (lc_len == 4 && memcmp(lc_ptr, "date", 4) == 0) ? 1 : 0;
+ ne->key_v = k;
+ ne->val_v = v;
+ ne->line = line;
+ ne->line_len = line_len;
+ ne->is_date = (lc_len == 4 && memcmp(lc_ptr, "date", 4) == 0) ? 1 : 0;
+ ne->is_framing = (lc_len == 17 && memcmp(lc_ptr, "transfer-encoding", 17) == 0) ? 1 : 0;
  
  rb_ary_push(rb_aHeaderLineAnchor, k);
  rb_ary_push(rb_aHeaderLineAnchor, v);
@@ -811,6 +827,23 @@ static VALUE cbuild_response_head(VALUE self, VALUE rb_status, VALUE rb_reason,
  long body_size = NUM2LONG(rb_body_size);
  int keep_alive = RTEST(rb_keep_alive);
  
+ /* body_size == -1 is the chunked-encoding sentinel; any other
+ * negative value is a programming error (likely an integer
+ * underflow in a caller). Reject early with a clear message
+ * rather than silently treating -2 / -42 as chunked. */
+ if (body_size < -1) {
+ rb_raise(rb_eArgError,
+ "body_size must be >= 0 (or -1 for chunked sentinel), got %ld",
+ body_size);
+ }
+
+ /* body_size == -1 is the chunked-encoding sentinel (from
+ * hyperion_build_response_head_chunked). In this mode we emit
+ * "transfer-encoding: chunked\r\n" instead of "content-length: N\r\n"
+ * and suppress any user-supplied content-length / transfer-encoding
+ * headers (RFC 7230 §3.3.3 — they are mutually exclusive). */
+ int is_chunked = (body_size == -1);
+
  /* Most heads fit in 1 KiB; rb_str_cat grows on demand. */
  VALUE buf = rb_str_buf_new(1024);
  
@@ -831,18 +864,25 @@ static VALUE cbuild_response_head(VALUE self, VALUE rb_status, VALUE rb_reason,
  
  /* Iterate user headers — lowercase key, validate value, skip framing.
  * Threaded through rb_hash_foreach so we can reuse the per-key
- * downcase cache and skip the per-call `keys` Array allocation. */
- build_head_state_t state = { buf, 0 };
+ * downcase cache and skip the per-call `keys` Array allocation.
+ * is_chunked is threaded through state so build_head_each can drop
+ * user-supplied transfer-encoding and content-length in chunked mode. */
+ build_head_state_t state = { buf, 0, is_chunked };
  rb_hash_foreach(rb_headers, build_head_each, (VALUE)&state);
  
- /* Framing headers — always emitted. content-length uses a hand-rolled
- * itoa rather than snprintf (vfprintf was 1 % of CPU on the
- * CPU-JSON profile). */
- char itoa_scratch[24];
- int cl_off = itoa_positive_decimal(body_size, itoa_scratch, (int)sizeof(itoa_scratch));
- rb_str_cat(buf, "content-length: ", 16);
- rb_str_cat(buf, itoa_scratch + cl_off, sizeof(itoa_scratch) - cl_off);
- rb_str_cat(buf, "\r\n", 2);
+ /* Framing headers — always emitted.
+ * Non-chunked: content-length uses a hand-rolled itoa rather than
+ * snprintf (vfprintf was 1 % of CPU on the CPU-JSON profile).
+ * Chunked: transfer-encoding: chunked (no content-length — RFC 7230 §3.3.3). */
+ if (is_chunked) {
+ rb_str_cat(buf, "transfer-encoding: chunked\r\n", 28);
+ } else {
+ char itoa_scratch[24];
+ int cl_off = itoa_positive_decimal(body_size, itoa_scratch, (int)sizeof(itoa_scratch));
+ rb_str_cat(buf, "content-length: ", 16);
+ rb_str_cat(buf, itoa_scratch + cl_off, sizeof(itoa_scratch) - cl_off);
+ rb_str_cat(buf, "\r\n", 2);
+ }
  
  if (keep_alive) {
  rb_str_cat(buf, "connection: keep-alive\r\n", 24);
@@ -862,6 +902,34 @@ static VALUE cbuild_response_head(VALUE self, VALUE rb_status, VALUE rb_reason,
  return buf;
  }
  
+ /* response_writer.h surface — called from response_writer.c to reuse the
+ * head-build logic without going through Ruby method dispatch on the hot path.
+ * Both wrappers delegate directly to cbuild_response_head; they are NOT static
+ * so the linker can resolve them from response_writer.c in the same .bundle. */
+
+ /* hyperion_build_response_head — non-chunked path.
+ * Calling convention matches the Ruby-side method:
+ * status / reason / headers / body_size / keep_alive / date_str. */
+ VALUE hyperion_build_response_head(VALUE status, VALUE reason, VALUE headers,
+ VALUE body_size, VALUE keep_alive,
+ VALUE date_str) {
+ return cbuild_response_head(Qnil, status, reason, headers,
+ body_size, keep_alive, date_str);
+ }
+
+ /* hyperion_build_response_head_chunked — chunked-encoding path.
+ * Same byte shape as ResponseWriter#build_head_chunked in response_writer.rb
+ * but native, allocating one Ruby String. Drops any caller-supplied
+ * content-length and transfer-encoding headers (mutually exclusive per
+ * RFC 7230 §3.3.3) and always emits "transfer-encoding: chunked\r\n".
+ * Implemented as cbuild_response_head with the body_size = -1 sentinel. */
+ VALUE hyperion_build_response_head_chunked(VALUE status, VALUE reason,
+ VALUE headers, VALUE keep_alive,
+ VALUE date_str) {
+ return cbuild_response_head(Qnil, status, reason, headers,
+ LL2NUM(-1), keep_alive, date_str);
+ }
+
  /* Hyperion::CParser.build_access_line(format, ts, method, path, query,
  * status, duration_ms, remote_addr,
  * http_version) -> String
@@ -1654,4 +1722,8 @@ void Init_hyperion_http(void) {
  * fail-fast crash here would break parser.c entirely. */
  extern void Init_hyperion_h2_codec_glue(void);
  Init_hyperion_h2_codec_glue();
+
+ /* Plan #1 (perf roadmap) — Hyperion::Http::ResponseWriter. */
+ extern void Init_hyperion_response_writer(void);
+ Init_hyperion_response_writer();
  }
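
Reviewer note (illustrative, not part of the package): the C changes above make `body_size == -1` a chunked-encoding sentinel, reject anything below `-1`, and drop user-supplied `content-length` / `connection` headers (plus `transfer-encoding` in chunked mode) before emitting the framing headers. A rough Ruby sketch of that selection logic follows. `sketch_build_head` is a hypothetical name, and the status-line format, the `connection: close` branch, and the omission of the `date` header and the line caches are assumptions of this note rather than things the diff shows.

```ruby
# Sentinel value taken from the diff: -1 means "use chunked encoding".
CHUNKED_SENTINEL = -1

# Hypothetical pure-Ruby approximation of the framing rules in the C diff.
# Not the gem's API; the date header and header-line caches are left out.
def sketch_build_head(status, reason, headers, body_size, keep_alive)
  if body_size < CHUNKED_SENTINEL
    raise ArgumentError,
          "body_size must be >= 0 (or -1 for chunked sentinel), got #{body_size}"
  end

  chunked = body_size == CHUNKED_SENTINEL
  buf = +"HTTP/1.1 #{status} #{reason}\r\n" # status-line shape is assumed

  headers.each do |key, value|
    lc = key.to_s.downcase
    # Framing headers are always emitted by the builder itself.
    next if lc == 'content-length' || lc == 'connection'
    # In chunked mode the builder emits its own transfer-encoding line.
    next if chunked && lc == 'transfer-encoding'

    buf << lc << ': ' << value.to_s << "\r\n"
  end

  buf << (chunked ? "transfer-encoding: chunked\r\n" : "content-length: #{body_size}\r\n")
  # The diff only shows the keep-alive branch; "close" is an assumption.
  buf << (keep_alive ? "connection: keep-alive\r\n" : "connection: close\r\n")
  buf << "\r\n"
end
```

For example, calling `sketch_build_head(200, 'OK', { 'Content-Length' => '5' }, -1, true)` yields a head with `transfer-encoding: chunked` and no `content-length` line, which matches the mutual-exclusion rule the C comments cite.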