hyperion-rb 1.4.0 → 1.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 94ad9628c5683d6b9c9d2d8359066e4bda37d29c7ec2e4fb4e48ae726f8ff939
-  data.tar.gz: 7ebf3c87d10a384e388cf7f2c28d276c0975044a49fe09e1174ffa331cab76b4
+  metadata.gz: fd27fc15210a2436de9b88664413eeb7cd7cf129e0921f88d7d15edd87b76d11
+  data.tar.gz: 8dd98630e170ee1467197542d69d034b15aa1d1a1bc2441e780d3290c2c59f1d
 SHA512:
-  metadata.gz: 9fb844690d5a06aa55335211bdaf3a5276b48f7cfd8a248bb714258ae02549a7e595bc1088987ae9c77d3f28aa1efec91fcd8cc401c26264620f48d78d85faf3
-  data.tar.gz: d050bffe3c8fed2ae1ce434fb7ccf8904a418e3fa18c6afd3e4314b841215ca0dec621edb7bc5f80350e4807e04503b1e785786fb57503b716e52afaf2e30a20
+  metadata.gz: a2ae7959ab5fe99207c2c53db22060385a6eeb3aad355d05b4b6f14b16c16d1057592eda2f63b2a72cbfeb62ec7ebae5ec5945f317e84a27c5bac9c6fdfbe614
+  data.tar.gz: 6fb3d77d7c631b98e366ef921859dcc9af1f5f2e3f6eb6e97bebfd9453f6403c6846d46c66628204785bd9c0253e01984696b59eab690c53a79f3238c6f1f15a
data/CHANGELOG.md CHANGED
@@ -1,5 +1,26 @@
 # Changelog
 
+## [1.4.2] - 2026-04-27
+
+Audit-driven cleanup. No behaviour changes; fiber-correctness + docs polish.
+
+### Fixed
+- **`Hyperion::Logger` access buffer was fiber-local, not thread-local** — pre-1.4.2 the access-log write buffer was stored via `Thread.current[@buffer_key]`. Under an `Async::Scheduler` (TLS / h2 / `--async-io` plain HTTP/1.1) every handler fiber got its own private buffer, so the 4 KiB `ACCESS_FLUSH_BYTES` batching never fired — each fiber's buffer accumulated 1-3 lines before its connection closed and `flush_access_buffer` wrote them. At 24k r/s this meant ~12-24k `write(2)` syscalls/sec instead of the designed ~750/sec. Switched to `Thread#thread_variable_*` so all fibers on the same OS thread share one buffer and the batching actually fires. Same root cause as the 1.4.1 Metrics fix; surfaced by a code-audit grep for residual `Thread.current[:key]` patterns.
+- **`Logger#cached_timestamp` and `ResponseWriter#cached_date`** — same fix. Pre-1.4.2 the per-second / per-millisecond Time-formatting caches were per-fiber, so under Async every fiber rebuilt the iso8601 / httpdate String on its first call after a tick. Now per-OS-thread, shared across fibers; one allocation per second per thread total.
+
+### Added
+- **Prometheus exporter example output** in the README's Metrics section — shows what `curl -H 'X-Hyperion-Admin-Token: ...' /-/metrics` actually returns (HELP/TYPE lines, status-code labels, auto-export of unknown counters), plus the Prometheus scraper config sketch.
+- **Regression spec** for the access-buffer cross-fiber bug — two fibers on the same OS thread write through one logger; verifies a single buffer is registered (not one per fiber) and both lines land via `flush_all`.
+- **4 new Metrics specs** (already shipped in 1.4.1; called out here for coverage tracking) — cross-fiber on same thread, cross-thread, cross-fiber-on-different-thread, many-fibers-on-same-thread.
+
+### Changed
+- **README benchmark section** version-stamped: clarifies that the headline numbers were measured against the noted Hyperion version (most are 1.2.0 hello-world / 1.3.0 PG-bound) and that 1.3.0+ `--async-io` + 1.4.0+ TLS-inline + 1.4.1+ Metrics fix preserve or improve these numbers. We re-run the headline configs each release.
+
+## [1.4.1] - 2026-04-27
+
+### Fixed
+- **`Hyperion::Metrics` fiber-key bug** — pre-1.4.1 the metrics module stored counters via `Thread.current[:key]`, which is FIBER-local in Ruby 1.9+. Under an `Async::Scheduler` (TLS / h2 / `--async-io` plain HTTP/1.1) every handler fiber got its own private counters Hash that `Hyperion.stats` could never see — increments were stranded, and the dispatch counters, `:bytes_written`, etc. read as zero from any non-handler-fiber observer (including the Prometheus `/-/metrics` exporter when scraped from a different fiber). Switched to `Thread#thread_variable_*` (truly thread-local across fibers) plus direct counter-Hash list storage so snapshots also survive thread death. Verified via 4 new specs: cross-fiber on same thread, cross-thread, cross-fiber-on-different-thread, many-fibers-on-same-thread (210 increments aggregated correctly). Surfaced by hyperion-async-pg 0.4.0's bench round, which couldn't read `:requests_async_dispatched` from spec assertions even though the increments were firing.
+
 ## [1.4.0] - 2026-04-27
 
 Default-behaviour change for TLS users: HTTP/1.1-over-TLS now dispatches inline on the calling fiber instead of hopping through the worker thread pool. Fiber-cooperative libraries (`hyperion-async-pg`, `async-redis`) work on the TLS h1 path without `--async-io`. No code-path changes for plain HTTP/1.1 default behaviour.
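The fiber-local vs. thread-local distinction driving the 1.4.1 and 1.4.2 fixes can be reproduced in a few lines of plain Ruby, independent of Hyperion (`:k` and `:tv` are arbitrary illustrative keys):

```ruby
# Thread.current[:key] is FIBER-local: a child fiber gets its own empty
# storage, and its writes are invisible to the parent fiber.
Thread.current[:k] = 'parent'
Fiber.new { Thread.current[:k] = 'child' }.resume
puts Thread.current[:k]
# => parent

# Thread#thread_variable_* is truly thread-local: every fiber on the
# same OS thread reads and writes the same slot.
Thread.current.thread_variable_set(:tv, 'parent')
Fiber.new { Thread.current.thread_variable_set(:tv, 'child') }.resume
puts Thread.current.thread_variable_get(:tv)
# => child
```

This is exactly why per-fiber `Thread.current[@buffer_key]` buffers never reached `ACCESS_FLUSH_BYTES` under an `Async::Scheduler`: every handler fiber was filling its own private buffer.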
data/README.md CHANGED
@@ -25,7 +25,7 @@ bundle exec hyperion config.ru
 
 ## Benchmarks
 
-All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission).
+All numbers are real wrk runs against published Hyperion configs. Hyperion ships **with default-ON structured access logs**; Puma comparisons use Puma defaults (no per-request log emission). Each section is stamped with the Hyperion version it was measured against — newer versions (1.3.0+ `--async-io`, 1.4.0+ TLS h1 inline, 1.4.1+ Metrics fiber-key fix) preserve or improve these numbers; we re-run the headline configs each release and have not seen regressions on these workloads.
 
 ### Hello-world Rack app
 
@@ -95,7 +95,7 @@ Ubuntu 24.04 / 16 vCPU / Ruby 3.3.3, Postgres 17 over WAN, `wrk -t4 -c200 -d20s`
 1. **Linear scaling with pool size** under `--async-io` — `r/s ≈ pool × 12` on this WAN bench. Single-worker pool=200 hits 2381 r/s, **42× Puma `-t 5`** and **5.9× Puma's best** (`-t 30`).
 2. **Mixed workload doesn't kill the win** — Hyperion `--async-io` pool=128 actually goes *up* on mixed (1740 vs 1344 r/s) because CPU work overlaps other fibers' PG-wait windows. This is the honest "what happens to a real Rails handler" answer.
 3. **Hyperion ≈ Falcon within 3-7%** across pool sizes; both fiber-native architectures extract similar value from `hyperion-async-pg`.
-4. **RSS at single-worker scale isn't the architectural moat** — Linux thread stacks are demand-paged; PG connection buffers dominate RSS at pool sizes ≤ 200. The MB-vs-GB story shows up at **idle keep-alive connection scale** (10k+ conns), not in this PG-bound throughput bench. See [Concurrency at scale](#concurrency-at-scale-architectural-advantages) for the connection-count win.
+4. **RSS at single-worker scale isn't the architectural moat** — Linux thread stacks are demand-paged; PG connection buffers dominate RSS at pool sizes ≤ 200. The architectural win is **handler concurrency under load**, not idle memory: Hyperion's fiber path runs thousands of in-flight handler invocations per OS thread, so wait-bound handlers don't queue at `max_threads`. See [Concurrency at scale](#concurrency-at-scale-architectural-advantages) for both the throughput-under-load row and a measured 10k-idle-keepalive RSS sweep against Puma and Falcon.
 5. **`-w 4` cold-start caveat** — multi-worker p99 inflates because the bench rackup uses lazy per-process pool init (each worker pays full pool fill on its first request). Production apps avoid this with `on_worker_boot { Hyperion::AsyncPg::FiberPool.new(...).fill }`.
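The `r/s ≈ pool × 12` rule of thumb above is worth unpacking: ~12 requests/sec per pooled connection implies each PG query holds its connection for roughly 1000/12 ≈ 83 ms, i.e. one WAN round-trip. A throwaway sketch of the arithmetic (2381 is the measured pool=200 figure from the table; everything else is derived):

```ruby
# Each pooled PG connection sustains ~12 r/s on this WAN bench, so
# throughput scales linearly with pool size until CPU saturates.
RPS_PER_CONN = 12

per_query_ms = 1000.0 / RPS_PER_CONN  # implied WAN round-trip per query
predicted    = 200 * RPS_PER_CONN     # single-worker pool=200 prediction
error_pct    = (predicted - 2381) * 100.0 / 2381

puts format('~%.1f ms per query; predicted %d r/s vs measured 2381 (%.1f%% off)',
            per_query_ms, predicted, error_pct)
# => ~83.3 ms per query; predicted 2400 r/s vs measured 2381 (0.8% off)
```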
 
  Three things must all be true to get this win:
@@ -176,7 +176,21 @@ These workloads demonstrate structural differences between Hyperion's fiber-per-
 | Hyperion `-w 1 -t 10` | 93,090 | 6,910 | 3,446 | 27.01 s |
 | Puma `-w 1 -t 10:10` | 77,340 | 22,660 | 706 | 109.59 s |
 
-Hyperion holds each connection in a ~1 KB fiber stack; Puma needs an OS thread (~1–8 MB each, capped at `max_threads`). At 10k concurrent connections Hyperion serves **~5× the throughput** of Puma with **~20% fewer dropped requests**, while the per-connection bookkeeping cost is bounded by fiber size, not by `max_threads`.
+At 10k concurrent connections under load Hyperion serves **~5× the throughput** of Puma with **~20% fewer dropped requests**. The per-connection bookkeeping cost is bounded by fiber size, not by `max_threads` — workers don't get pinned to long-lived sockets, so a slow handler doesn't starve other connections.
+
+**Memory at idle keep-alive scale — 10,000 idle HTTP/1.1 keep-alive connections:**
+
+Each client opens a TCP connection, sends one keep-alive GET, drains the response, then holds the socket open without sending a follow-up request. RSS is sampled once a second across a 30s idle hold. Same hello-world rackup, single worker, no TLS. Hyperion runs with `async_io true` (fiber-per-connection on the plain HTTP/1.1 path).
+
+| | held | dropped | peak RSS | RSS after drain |
+|---|---:|---:|---:|---:|
+| Hyperion `-w 1 -t 5 --async-io` | 10,000 / 10,000 | 0 | 173 MB | 155 MB |
+| Puma `-w 0 -t 100` | 10,000 / 10,000 | 0 | 101 MB | 104 MB |
+| Falcon `--count 1` | 10,000 / 10,000 | 0 | 429 MB | 440 MB |
+
+All three hold 10k idle conns without OOMing or dropping — the "MB-per-thread" intuition that thread-based servers can't reach this scale doesn't survive contact with Linux's demand-paged thread stacks plus Puma's reactor-based keep-alive handling. Per-conn RSS lands at ~14 KB (Hyperion fiber + parser state), ~7 KB (Puma reactor entry + tiny thread share), ~36 KB (Falcon Async::Task + protocol-http stack). Bounded, not unbounded — for all three.
+
+The architectural difference shows up under **load**, not at idle: Puma can only run `max_threads` handler invocations concurrently, so wait-bound handlers (DB, HTTP, Redis) starve at higher request concurrency than `max_threads`. Hyperion's fiber-per-connection model + `--async-io` gives one OS thread thousands of in-flight handler executions, paired with [hyperion-async-pg](https://github.com/exodusgaming-io/hyperion-async-pg) for non-blocking DB. The 10k-conn throughput row above (5× Puma) is the consequence — same idle RSS shape, very different behaviour once the handlers actually do work.
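The idle-hold methodology is simple enough to sketch in stdlib Ruby. Below is a scaled-down loopback version — a stand-in server replaces Hyperion, and `HOLD_CONNS` replaces the bench's 10,000 — showing the client side: one keep-alive GET, drain the response, then hold the socket open idle:

```ruby
require 'socket'

HOLD_CONNS = 50 # scaled down from the bench's 10,000
RESPONSE = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n" \
           "Connection: keep-alive\r\n\r\nok"

# Stand-in server: accept, read one request, reply, keep the socket open.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
accepted = []
acceptor = Thread.new do
  HOLD_CONNS.times do
    sock = server.accept
    sock.readpartial(4096) # one keep-alive GET, not parsed
    sock.write(RESPONSE)
    accepted << sock
  end
end

# Client side: open, send one keep-alive GET, drain, then idle-hold.
held = Array.new(HOLD_CONNS) do
  c = TCPSocket.new('127.0.0.1', port)
  c.write "GET / HTTP/1.1\r\nHost: bench\r\nConnection: keep-alive\r\n\r\n"
  c.read(RESPONSE.bytesize) # drain exactly one response
  c
end
acceptor.join
held_count = held.size # the real bench samples RSS once a second here

(held + accepted).each(&:close)
server.close
puts "held #{held_count} idle keep-alive connections"
```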
 
 **HTTP/2 multiplexing — 1 connection × 100 concurrent streams (handler sleeps 50 ms):**
 
@@ -194,6 +208,9 @@ Hyperion fans 100 in-flight streams across separate fibers within a single TCP c
 bundle exec ruby bench/compare.rb
 HYPERION_WORKERS=4 PUMA_WORKERS=4 FALCON_COUNT=4 bundle exec ruby bench/compare.rb
 
+# Idle keep-alive RSS sweep (1k / 5k / 10k conns, 30s hold per server)
+./bench/keepalive_memory.sh
+
 # Real Rails / Grape: see bench/db.ru for the schema
 ```
 
@@ -346,6 +363,31 @@ Hyperion.stats
 # => {connections_accepted: 1234, connections_active: 7, requests_total: 8910, …}
 ```
 
+### Prometheus exporter
+
+When `admin_token` is set in your config, Hyperion mounts a `/-/metrics` endpoint that emits Prometheus text-format v0.0.4. The same token guards both `/-/metrics` (GET) and `/-/quit` (POST); auth is via the `X-Hyperion-Admin-Token` header.
+
+```sh
+$ curl -s -H 'X-Hyperion-Admin-Token: secret' http://127.0.0.1:9292/-/metrics
+# HELP hyperion_requests_total Total HTTP requests handled
+# TYPE hyperion_requests_total counter
+hyperion_requests_total 8910
+# HELP hyperion_bytes_written_total Total bytes written to response sockets
+# TYPE hyperion_bytes_written_total counter
+hyperion_bytes_written_total 2351023
+# HELP hyperion_responses_status_total Responses by HTTP status code
+# TYPE hyperion_responses_status_total counter
+hyperion_responses_status_total{status="200"} 8521
+hyperion_responses_status_total{status="404"} 12
+hyperion_responses_status_total{status="500"} 3
+# … and so on for sendfile_responses_total, rejected_connections_total,
+# slow_request_aborts_total, requests_async_dispatched_total, etc.
+```
+
+Any counter not in the known set (added by app middleware via `Hyperion.metrics.increment(:custom_thing)`) is auto-exported as `hyperion_custom_thing` with a generic HELP line — no Hyperion config change required.
+
+Point your scraper at it: in Prometheus' `scrape_configs`, set `metrics_path: /-/metrics` and `bearer_token` (or use a custom header relabel — Prometheus 2.42+ supports `authorization.credentials_file` paired with a custom `header` block). Network-isolate the admin endpoints if the listener is internet-facing — see [docs/REVERSE_PROXY.md](docs/REVERSE_PROXY.md) for the nginx `location /-/ { return 404; }` recipe.
+
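The same scrape works from Ruby — handy in a health-check or canary script. A minimal stdlib-only sketch (the host, port, and token are illustrative placeholders, and `parse_counter` / `fetch_counter` are hypothetical helper names, not Hyperion API):

```ruby
require 'net/http'

# Pull one un-labelled counter value out of Prometheus text-format
# exposition: comment lines start with '#', samples are "name value".
def parse_counter(body, name)
  line = body.each_line.find { |l| l.start_with?("#{name} ") }
  line && line.split.last.to_f
end

# GET /-/metrics with the admin token header (placeholder values).
def fetch_counter(name, host: '127.0.0.1', port: 9292, token: 'secret')
  req = Net::HTTP::Get.new('/-/metrics')
  req['X-Hyperion-Admin-Token'] = token
  res = Net::HTTP.start(host, port) { |http| http.request(req) }
  parse_counter(res.body, name)
end

# e.g. fetch_counter('hyperion_requests_total') against the sample
# output above would return 8910.0
```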
 ## TLS + HTTP/2
 
 Provide a PEM cert + key:
@@ -150,7 +150,7 @@ module Hyperion
     build_access_text(ts, method, path, query, status, duration_ms, remote_addr, http_version)
   end
 
-  buf = Thread.current[@buffer_key] || allocate_access_buffer
+  buf = Thread.current.thread_variable_get(@buffer_key) || allocate_access_buffer
   buf << line
   return if buf.bytesize < ACCESS_FLUSH_BYTES
 
@@ -164,7 +164,7 @@ module Hyperion
 # loop when a connection closes (so log lines from a closing keep-alive
 # session don't get stuck behind the buffer until the next connection).
 def flush_access_buffer
-  buf = Thread.current[@buffer_key]
+  buf = Thread.current.thread_variable_get(@buffer_key)
   return if buf.nil? || buf.empty?
 
   @out.write(buf)
@@ -215,7 +215,7 @@ module Hyperion
 # Mutex is taken once per thread (not per request).
 def allocate_access_buffer
   buf = +''
-  Thread.current[@buffer_key] = buf
+  Thread.current.thread_variable_set(@buffer_key, buf)
   @access_buffers_mutex.synchronize { @access_buffers << buf }
   buf
 end
@@ -229,11 +229,21 @@ module Hyperion
 end
 
 # Cached UTC iso8601(3) timestamp, refreshed at most once per millisecond
-# per thread. At 24k r/s with 16 threads we render ~1500 r/s/thread; only
-# ~1000 of those allocate a new String. The other 500 reuse the cached one.
+# per OS thread. At 24k r/s with 16 threads we render ~1500 r/s/thread;
+# only ~1000 of those allocate a new String. The other 500 reuse the
+# cached one. Stored as a thread variable (truly thread-local across
+# fibers) so under Async every fiber on this thread shares the same
+# cache and the per-millisecond amortisation actually fires; with the
+# prior `Thread.current[:k]` storage each fiber would re-build the
+# iso8601 String on its first call after a millisecond tick.
 def cached_timestamp
   now_ms = Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)
-  cache = (Thread.current[:__hyperion_ts_cache__] ||= [-1, ''])
+  thread = Thread.current
+  cache = thread.thread_variable_get(:__hyperion_ts_cache__)
+  if cache.nil?
+    cache = [-1, '']
+    thread.thread_variable_set(:__hyperion_ts_cache__, cache)
+  end
   return cache[1] if cache[0] == now_ms
 
   cache[0] = now_ms
@@ -7,6 +7,22 @@ module Hyperion
 # all threads that have ever incremented (one short mutex section, only
 # taken when the operator asks for stats).
 #
+# Storage: counters live behind `Thread#thread_variable_*`, which is the
+# only TRUE thread-local in Ruby 1.9+ — `Thread.current[:key]` is in fact
+# FIBER-local, so under an `Async::Scheduler` (TLS path, h2 streams, the
+# 1.3.0+ `--async-io` plain HTTP/1.1 path) every handler fiber would get
+# its own private counters Hash that `snapshot` could never find.
+# Verified with hyperion-async-pg 0.4.0's bench round; before the fix
+# the dispatch counters dropped requests entirely under `--async-io` and
+# an external scrape (Prometheus exporter on a different fiber than the
+# handler) saw the dispatch buckets at zero.
+#
+# Cross-fiber races on the same OS thread: the `+=` is technically read-
+# modify-write, but Ruby's fiber scheduler only preempts at IO boundaries
+# (Fiber.scheduler-aware system calls), and `Hash#[]=` is purely Ruby —
+# no preemption mid-increment, no torn writes. Two fibers cannot
+# interleave a single `+=` on the same OS thread.
+#
 # Reset semantics: counters monotonically increase. Operators that want
 # rate-of-change should snapshot, sleep, snapshot, diff.
 #
@@ -14,16 +30,40 @@ module Hyperion
 # Hyperion.stats -> Hash with all current values across all threads.
 class Metrics
   def initialize
-    @threads = Set.new
-    @threads_mutex = Mutex.new
-    # Each Metrics instance has its own thread-local key so spec runs that
-    # build fresh Metrics objects don't share state across examples.
+    # Direct list of every per-thread counters Hash ever allocated through
+    # this Metrics instance. We hold the Hash refs ourselves (instead of
+    # holding Thread refs and looking the Hash up via thread-local
+    # storage) so snapshot survives thread death: counters from a
+    # short-lived worker that already exited still aggregate. Tiny per-
+    # thread footprint (one Hash + one slot in this Array).
+    @thread_counters = []
+    @counters_mutex = Mutex.new
+    # Per-instance thread-local key so spec runs that build fresh Metrics
+    # objects don't share state across examples.
     @thread_key = :"__hyperion_metrics_#{object_id}__"
   end
 
-  # Hot path: one TLS lookup + one hash op. No mutex.
+  # Hot path: one thread-variable lookup + one hash op. No mutex on the
+  # increment fast path; the mutex is taken only on first allocation per
+  # OS thread (very rare) and on snapshot.
+  #
+  # Storage uses Thread#thread_variable_*, which is the only TRUE thread-
+  # local in Ruby 1.9+ — Thread.current[:key] is in fact FIBER-local, so
+  # under an Async::Scheduler (TLS path, h2 streams, the 1.3.0+ --async-io
+  # plain HTTP/1.1 path) every handler fiber would get its own private
+  # counters Hash that snapshot could never aggregate. Verified with
+  # hyperion-async-pg 0.4.0's bench round; before the fix the dispatch
+  # counters dropped requests under --async-io.
+  #
+  # Cross-fiber races on the same OS thread: the `+=` is read-modify-write,
+  # but Ruby's fiber scheduler only preempts at IO boundaries
+  # (Fiber.scheduler-aware system calls). Hash#[]= is purely Ruby — no
+  # preemption mid-increment, no torn writes. Two fibers cannot
+  # interleave a single `+=` on the same OS thread.
  def increment(key, by = 1)
-    counters = Thread.current[@thread_key] ||= register_thread_counters
+    thread = Thread.current
+    counters = thread.thread_variable_get(@thread_key)
+    counters = register_thread_counters(thread) if counters.nil?
    counters[key] += by
  end
 
@@ -37,14 +77,9 @@
 
 def snapshot
   result = Hash.new(0)
-  @threads_mutex.synchronize do
-    @threads.delete_if { |t| !t.alive? }
-    @threads.each do |t|
-      counters = t[@thread_key]
-      next unless counters
-
-      counters.each { |k, v| result[k] += v }
-    end
+  counters_snapshot = @counters_mutex.synchronize { @thread_counters.dup }
+  counters_snapshot.each do |counters|
+    counters.each { |k, v| result[k] += v }
   end
   result.default = nil
   result
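Counters being monotonic (per the Reset-semantics comment above), rate-of-change is the observer's job: snapshot, sleep, snapshot, diff. A self-contained sketch of that recipe — the two hashes stand in for successive `Hyperion.stats` calls, with values taken from the README's sample output:

```ruby
# Diff two monotonic-counter snapshots taken `seconds` apart into
# per-second rates. Keys absent from the first snapshot count from 0.
def counter_rates(before, after, seconds)
  after.to_h { |k, v| [k, (v - before.fetch(k, 0)) / seconds.to_f] }
end

# In real use: before = Hyperion.stats; sleep 10; after = Hyperion.stats
before = { requests_total: 8910, bytes_written: 2_351_023 }
after  = { requests_total: 9030, bytes_written: 2_411_023 }

p counter_rates(before, after, 10)
# requests_total → 12.0 r/s; bytes_written → 6000.0 B/s
```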
@@ -52,16 +87,17 @@ module Hyperion
 
 # Tests can call .reset! between examples to avoid cross-spec leakage.
 def reset!
-  @threads_mutex.synchronize do
-    @threads.each { |t| t[@thread_key]&.clear }
+  @counters_mutex.synchronize do
+    @thread_counters.each(&:clear)
   end
 end
 
 private
 
-def register_thread_counters
+def register_thread_counters(thread)
   counters = Hash.new(0)
-  @threads_mutex.synchronize { @threads << Thread.current }
+  thread.thread_variable_set(@thread_key, counters)
+  @counters_mutex.synchronize { @thread_counters << counters }
   counters
 end
 end
@@ -142,10 +142,19 @@ module Hyperion
 
 # Cached HTTP `Date:` header at second resolution. `Time.now.httpdate`
 # allocates several strings; at high r/s the cache reuses one String per
-# second per thread instead of allocating per response.
+# second per OS thread instead of allocating per response. Stored as a
+# thread variable (truly thread-local across fibers) so under Async
+# every fiber on this thread shares the same cache — otherwise each
+# fiber would rebuild the httpdate String on its first response after
+# a second tick.
 def cached_date
   now_s = Process.clock_gettime(Process::CLOCK_REALTIME, :second)
-  cache = (Thread.current[:__hyperion_date_cache__] ||= [-1, ''])
+  thread = Thread.current
+  cache = thread.thread_variable_get(:__hyperion_date_cache__)
+  if cache.nil?
+    cache = [-1, '']
+    thread.thread_variable_set(:__hyperion_date_cache__, cache)
+  end
   return cache[1] if cache[0] == now_s
 
   cache[0] = now_s
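The cache pattern above can be lifted into a standalone sketch to see the cross-fiber sharing in action (names here are illustrative stand-ins, not Hyperion internals):

```ruby
require 'time'

# Per-second Date-header cache stored in a thread variable, so every
# fiber on the OS thread shares the same [epoch_second, string] pair.
def cached_date
  now_s = Process.clock_gettime(Process::CLOCK_REALTIME, :second)
  cache = Thread.current.thread_variable_get(:__date_cache__)
  if cache.nil?
    cache = [-1, '']
    Thread.current.thread_variable_set(:__date_cache__, cache)
  end
  return cache[1] if cache[0] == now_s

  cache[0] = now_s
  cache[1] = Time.now.utc.httpdate # one allocation per second per thread
end

warmed     = cached_date
from_fiber = Fiber.new { cached_date }.resume
# warmed.equal?(from_fiber) holds unless the two calls straddled a
# second boundary — the fiber reads the already-warmed shared cache.
```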
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Hyperion
-  VERSION = '1.4.0'
+  VERSION = '1.4.2'
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: hyperion-rb
 version: !ruby/object:Gem::Version
-  version: 1.4.0
+  version: 1.4.2
 platform: ruby
 authors:
 - Andrey Lobanov