protobuf-nats 0.13.0 → 0.13.1.pre1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +20 -0
data/README.md +77 -4
data/bench/bench.md +87 -7
data/bench/muxer_resilience_bench.rb +151 -0
data/bench/server_intake_bench.rb +158 -0
data/bench/soak.rb +146 -0
data/lib/protobuf/nats/client.rb +32 -8
data/lib/protobuf/nats/config.rb +18 -14
data/lib/protobuf/nats/errors.rb +26 -0
data/lib/protobuf/nats/response_muxer.rb +53 -18
data/lib/protobuf/nats/server.rb +128 -17
data/lib/protobuf/nats/super_subscription_manager.rb +117 -61
data/lib/protobuf/nats/thread_pool.rb +14 -4
data/lib/protobuf/nats/version.rb +1 -1
data/lib/protobuf/nats.rb +62 -7
metadata +4 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: dd9d0e1d6f565a66e972312fa19398f5b027c426b363672b5aa96b5a61f00595
-  data.tar.gz: c253885854d9bafcd5e714f8f1ff2afdb1193758d394f42c91ebf58f70865bd6
+  metadata.gz: 547f632aa7ad154f6a546c67df3628166ac2db3acbc2dd532a1e98aa6525788e
+  data.tar.gz: 699fe41c76d6aabb6f5db20f20f8309f5723e99d74f164d906dabd92fc441556
 SHA512:
-  metadata.gz: c6415db921943a0c61e3c310aea1a67f50fc955dfe03512d8709aeb3ae242228d8da78ce60df8158e9e7ba1b8d56cb8d59512e265a27e8f106d9fcf85fdf2c1c
-  data.tar.gz: ed7ff5492e9dce9a7c15aaf30885c2fa0e50533186e2c48a662eed6f80ae4db5a485a5469d3707b221956f02d05b68843273a23280cc7c9209217bca66f03c7a
+  metadata.gz: ac2417d75bbc60ad475c01bec82bd334a83ef7b3a7a6051b096273e28406c68c9344756078bb6e1a729f8827c478e91b0341c1d69bbe3ff35fdb2d3f6bab47b8
+  data.tar.gz: 4bbc4f568f992068277a3ee2d9b50d6b715cf80e12e6208e020bed6137cd54c7dcd37a7279933f7ca2f0ebcef52aae9adba939633de9edbf5f76366ed1fb0188

data/CHANGELOG.md CHANGED

@@ -1,5 +1,25 @@
 ## Changelog
+### 0.13.1
+Fixes a production regression and a set of related issues, all of the same class: assumptions left over from the JNats → nats-pure migration in 0.13.0 that became silently wrong.
+- **Dropped-connection retries were silently disabled.** Dropping JNats collapsed `Errors::IOException` to the never-raised `MriIOException`, so the client's reconnect/retry `rescue` became dead code. A dropped NATS connection then escaped immediately as an `RPC_ERROR` (surfacing as a 500) instead of being retried. The client now rescues the transport errors `nats-pure` and the socket layer actually raise (`EOFError`, `IOError`, `Errno::ECONNRESET`/`EPIPE`/`ECONNREFUSED`/`ETIMEDOUT`, `NATS::IO::ConnectionClosedError`, and `java.io.IOException` on JRuby) via `Errors::RETRYABLE_TRANSPORT_ERRORS` and rides them out with the existing `reconnect_delay` retry loop.
+- **Response muxer `pending_size` drift (could silently drop all responses).** nats-pure increments a subscription's `pending_size` (synchronized) for every inbound message and uses it to enforce the slow-consumer byte limit; for a callback-less subscription it never decrements it, so the muxer would be the sole consumer. Rather than mirror that accounting with a lock on every message, the muxer now **disables the byte-based limit** on its response subscription and relies on the message-count limit (the `SizedQueue` depth, tracked accurately for free). This removes the per-message lock from the dispatch hot path (~**2.7× faster** per message on JRuby — see `bench/muxer_resilience_bench.rb`) and eliminates the drift bug entirely.
+- **Dispatcher no longer busy-spins during a restart window.** If `@resp_sub` was briefly `nil` while the muxer restarted, the dispatch loop raised `NoMethodError` every iteration — busy-spinning and emitting a logged error + error-callback per spin. It now parks briefly (~0.2% of the old wasted work, zero errors).
+- **Self-healing backoff counter is now thread-safe.** The shared dispatcher crash counter was a plain `Integer` mutated by multiple dispatcher threads (it lost ~45% of updates under true parallelism on JRuby, corrupting the exponential backoff). It is now a `Concurrent::AtomicFixnum` that decays once a dispatcher is healthy.
+- **Client connection lifecycle hardening.** Connection callbacks (`on_disconnect`/`on_reconnect`/`on_close`/`on_error`) are now registered before `connect`, so handshake-time events are observed; and a failed handshake closes the half-open client so nats-pure's reader/flusher threads aren't leaked.
+- **Removed the dead `:disable_reconnect_buffer` connect option.** nats-pure has no such option (it was a JNats concept), so it was silently ignored. Transient disconnects are now handled by the client's transport-error retry path and `ack_timeout`.
+- **Server no longer leaves clients hanging on handler/publish failure.** If processing a request fails after the ACK was sent, the server now publishes an encoded `RPC_ERROR` response so the client fails fast instead of blocking until `response_timeout` (60s).
+- **Config no longer crashes when the YAML file has no section for the current environment** (or is empty); it falls back to defaults.
+- **TLS now floors at 1.2 and ceilings at 1.3** (replacing the deprecated `ssl_version = :TLSv1_2` hard pin), so TLS 1.3 is used when the server supports it and a TLS-1.2-only transport still negotiates down to 1.2. Verified on JRuby 9.4 and 10.0.
+- **Server request intake is now parallelized.** `SuperSubscriptionManager` drained the shared intake queue with a single thread that also published every ACK/NACK, so on JRuby intake was pinned to one core and one slow publish (e.g. nats-pure's buffer during a reconnect) head-of-line blocked *every* subject. Intake now fans out to `PB_NATS_SERVER_SUBSCRIPTION_HANDLERS` threads (default `processor_count` on JRuby, 1 on CRuby) with per-thread self-healing backoff. NATS queue-group semantics and subscription counts are unchanged — each request is still delivered to exactly one consumer. Measured **~8.5× intake throughput** and head-of-line stall **~505ms → ~0.4ms** at 8 handlers (`bench/server_intake_bench.rb`).
+- **Client retry is bounded and jittered.** `PB_NATS_CLIENT_MAX_RETRIES` (default 3) and `PB_NATS_CLIENT_RECONNECT_DELAY_SPLAY_LIMIT` (default 1000ms) make retries configurable, and the reconnect sleep now adds random jitter so a fleet hitting the same outage doesn't reconnect in lockstep.
+- **More transient errors are retried.** `ConnectionPool::TimeoutError` (subscription-pool exhaustion during a reconnect) is now treated as transient instead of surfacing as an `RPC_ERROR`.
+- **`connection_options` only forwards nats-pure-recognized keys** (servers, max_reconnect_attempts, connect_timeout, tls); app-level settings are read via their own accessors and no longer leak into `nats.connect`. YAML config now uses `safe_load`.
+- **Thread-pool robustness.** `wait_for_termination` prunes under its mutex and returns a real drained/timed-out result; a new `replenish` (called each server tick) respawns a worker killed by a non-StandardError. On shutdown the drain timeout tracks `handler_overdue_ms` so a legitimate long handler isn't killed mid-flight, and abandoned in-flight handlers are logged/instrumented.
+- **Error callbacks run off the read loop.** The nats `on_error` hooks dispatch via a bounded executor (`notify_error_callbacks_async`) so a slow user callback can't stall message processing for every subject.
+- **Server handler observability (long operations are first-class).** Handlers are never aborted — long-running operations (up to and beyond a minute) are allowed. The server now tracks in-flight handlers and emits `server.inflight_count`, `server.inflight_oldest_age_ms`, `server.overdue_handler_count`, `server.handler_overdue`, `server.pending_intake_queue_size`, `server.slow_handler` (opt-in via `PB_NATS_SERVER_SLOW_HANDLER_THRESHOLD_MS`), and `server.thread_pool_saturated`. A handler is only flagged "overdue" once it outlives the client's `response_timeout` (`PB_NATS_SERVER_HANDLER_OVERDUE_MS`, default 65s), so normal long ops are not mislabeled. Server duration metrics now use a monotonic clock.
 ### 0.13.0
 This is a large overhaul of the client and server internals.
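
The bounded, jittered retry described in the 0.13.1 entries can be sketched roughly as follows. This is a minimal illustration, not the gem's client code: `with_transport_retries`, its keyword arguments, and the `sleeper` hook are hypothetical names, and the error list is a trimmed stand-in for `Errors::RETRYABLE_TRANSPORT_ERRORS` (the nats-pure and JRuby classes are left out so the sketch runs on the standard library alone).

```ruby
# Hypothetical sketch: bounded retry with random jitter on the reconnect sleep.
# Trimmed stand-in for the gem's retryable transport error list.
RETRYABLE = [EOFError, IOError, Errno::ECONNRESET, Errno::EPIPE,
             Errno::ECONNREFUSED, Errno::ETIMEDOUT].freeze

# Runs the block, making at most `max_retries` attempts in total. Between
# attempts it sleeps `reconnect_delay` seconds plus a random jitter of
# 0..splay_limit_ms milliseconds, so many clients do not retry in lockstep.
def with_transport_retries(max_retries: 3, reconnect_delay: 5.0,
                           splay_limit_ms: 1000, sleeper: ->(s) { sleep(s) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *RETRYABLE
    raise if attempts >= max_retries # out of attempts: surface the error
    jitter = splay_limit_ms.positive? ? rand(0..splay_limit_ms) / 1000.0 : 0.0
    sleeper.call(reconnect_delay + jitter)
    retry
  end
end

# Usage: the first two attempts hit a reset connection, the third succeeds.
result = with_transport_retries(reconnect_delay: 0.0, splay_limit_ms: 0) do |attempt|
  raise Errno::ECONNRESET if attempt < 3
  :ok
end
puts result
```

Setting `splay_limit_ms` to 0 disables the jitter, mirroring the documented behavior of `PB_NATS_CLIENT_RECONNECT_DELAY_SPLAY_LIMIT`.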

data/README.md CHANGED

@@ -39,7 +39,23 @@ file is removed it will resubscribe and restart slow start (default: `nil`).
 `PB_NATS_SERVER_SUBSCRIPTIONS_PER_RPC_ENDPOINT` - Number of subscriptions to create for each rpc endpoint. This number is
 used to allow JVM based servers to warm-up slowly to prevent jolts in runtime performance across your RPC network
-(default: 10).
+(default: 10). Each subscription joins the NATS queue group for its endpoint, so every request is still delivered to
+exactly one consumer — this knob controls subscription/interest count, not duplicate delivery.
+`PB_NATS_SERVER_SUBSCRIPTION_HANDLERS` - Number of threads that drain the shared intake queue and publish ACK/NACKs
+(see [How it works](#how-it-works)). Defaults to `Concurrent.processor_count` on JRuby and `1` on CRuby. This is the
+*consumer* parallelism for messages this server has already received; it does not change how many topics are subscribed
+to or the queue-group delivery semantics. Minimum of 1.
+`PB_NATS_SERVER_SLOW_HANDLER_THRESHOLD_MS` - If set (> 0), emit `server.slow_handler` when a handler runs longer than this
+many milliseconds. Informational/SLA only — handlers are never aborted (default: 0, off).
+`PB_NATS_SERVER_HANDLER_OVERDUE_MS` - A handler still running past this many milliseconds is reported as "overdue"
+(`server.handler_overdue` + `server.overdue_handler_count`) — i.e. the client has already given up (`response_timeout`)
+so the work is orphaned. Defaults above the client's 60s response timeout so legitimate long operations are not flagged
+(default: 65000). **This should track your clients' `PB_NATS_CLIENT_RESPONSE_TIMEOUT`** — set it to roughly that value (plus
+a small grace). If clients use a longer response timeout, raise this so handlers aren't flagged overdue while a client is
+still waiting; if shorter, lower it so orphaned work is surfaced promptly.
 `PB_NATS_CLIENT_ACK_TIMEOUT` - Seconds to wait for an ACK from the rpc server (default: 5 seconds).
@@ -50,7 +66,11 @@ used to allow JVM based servers to warm-up slowly to prevent jolts in runtime pe
 `PB_NATS_CLIENT_RESPONSE_TIMEOUT` - Seconds to wait for a non-ACK response from the rpc server (default: 60 seconds).
-`PB_NATS_CLIENT_RECONNECT_DELAY` - If we detect a reconnect delay, we will wait this many seconds (default: the ACK timeout).
+`PB_NATS_CLIENT_RECONNECT_DELAY` - When a request hits a transient transport error (e.g. the NATS connection drops or is reset), the client sleeps this many seconds before retrying to give the connection time to re-establish (default: the ACK timeout). See [Resilience](#resilience).
+`PB_NATS_CLIENT_RECONNECT_DELAY_SPLAY_LIMIT` - Random jitter (milliseconds, `0..limit`) added to the reconnect delay so a fleet hitting the same NATS outage does not reconnect in lockstep (default: 1000). Set to 0 to disable jitter.
+`PB_NATS_CLIENT_MAX_RETRIES` - Number of attempts for ack-timeouts and transient transport errors before raising (default: 3).
 `PB_NATS_CLIENT_SUBSCRIPTION_POOL_SIZE` - If subscription pooling is desired for the request/response cycle then the pool size maximum should be set; the pool is lazy and therefore will only start new subscriptions as necessary (default: 0)
@@ -93,6 +113,8 @@ An example config looks like this:
       - "original_service": "replacement_service"
 ```
+When `uses_tls` is set, the client negotiates TLS with a floor of 1.2 and a ceiling of 1.3: it uses TLS 1.3 where the NATS server supports it and falls back to 1.2 otherwise (verified on JRuby 9.4 and 10.0).
 ## Usage
 This library is designed to be an alternative transport implementation used by the `protobuf` gem. In order to make
@@ -162,13 +184,64 @@ If we were to add another service endpoint called `search` to the `UserService`
 - **ResponseMuxer** (`lib/protobuf/nats/response_muxer.rb`) — the client uses a single wildcard subscription to multiplex
   all RPC responses (similar to the Golang NATS client) instead of subscribing/unsubscribing per request. One or more
   dispatcher threads drain the shared subscription and route each reply to the waiting caller via a `Concurrent::Map`,
-  keyed by a UUIDv7 request token. Tune the dispatcher count with `PB_NATS_RESPONSE_MUXER_DISPATCHERS`.
+  keyed by a UUIDv7 request token. Tune the dispatcher count with `PB_NATS_RESPONSE_MUXER_DISPATCHERS`. Slow-consumer
+  protection on the response subscription is by message count (the queue depth); the dispatch hot path does no per-message
+  locking. Dispatcher threads self-heal: a crashed dispatcher is restarted with exponential backoff (capped at 60s) that
+  decays once healthy.
 - **SuperSubscriptionManager** (`lib/protobuf/nats/super_subscription_manager.rb`) — the server manages the lifecycle of
-  RPC endpoint subscriptions, including slow start, pausing, and resubscription.
+  RPC endpoint subscriptions (NATS queue groups, so each request is delivered to one consumer), including slow start,
+  pausing, and resubscription. All subscriptions feed one shared intake queue drained by `PB_NATS_SERVER_SUBSCRIPTION_HANDLERS`
+  handler threads, so a slow ACK publish on one message can't head-of-line block every other subject. Handler threads
+  self-heal with exponential backoff.
+- **Server observability** — beyond the thread-pool gauges, the server emits in-flight handler metrics
+  (`server.inflight_count`, `server.inflight_oldest_age_ms`, `server.overdue_handler_count`, `server.handler_overdue`,
+  `server.pending_intake_queue_size`, `server.thread_pool_saturated`). Long-running handlers are allowed and never aborted;
+  a handler is only "overdue" once it outlives the client's `response_timeout` (see `PB_NATS_SERVER_HANDLER_OVERDUE_MS`).
+## Resilience
+The client is built to ride out transient NATS hiccups rather than surface them as request failures:
+- **Transient transport errors are retried.** If a request hits a dropped/reset/closed connection (`EOFError`,
+  `IOError`, `Errno::ECONNRESET`/`EPIPE`/`ECONNREFUSED`/`ETIMEDOUT`, `NATS::IO::ConnectionClosedError`, or a Java
+  `IOException` on JRuby — see `Errors::RETRYABLE_TRANSPORT_ERRORS`), the client sleeps `PB_NATS_CLIENT_RECONNECT_DELAY`
+  and retries (up to 3 attempts) while `nats-pure` re-establishes the connection in the background.
+- **Missing ACKs and NACKs are retried** with their own timeouts/backoff (`PB_NATS_CLIENT_ACK_TIMEOUT`,
+  `PB_NATS_CLIENT_NACK_BACKOFF_INTERVALS`).
+- **Server-side failures fail the caller fast.** If the server cannot process a request after it has ACKed, it publishes
+  an encoded RPC error response so the client raises immediately instead of blocking until `PB_NATS_CLIENT_RESPONSE_TIMEOUT`.
+- **The response dispatcher self-heals.** A crashed muxer dispatcher restarts with exponential backoff, and a brief
+  subscription-restart window won't busy-spin the dispatch loop.
+See `bench/muxer_resilience_bench.rb` for microbenchmarks of the dispatch hot path and these resilience paths.
+## Delivery semantics (at-least-once)
+**Current design choice:** RPC delivery is **at-least-once**, and the gem does **not** deduplicate requests. The resilience features above are the reason: when the client retries on an ACK/response timeout or a transient transport error, the server may have *already received and processed* the original request, so a single client call can run a handler **more than once**. (NATS queue groups guarantee each *delivered* message goes to one consumer, but they do not prevent the client from re-sending after a timeout.)
+The gem deliberately favors at-least-once over at-most-once: dropping work on a transient blip is usually worse than occasionally repeating it. Making this safe is therefore the **service author's responsibility** — handlers that have side effects should be written to be idempotent:
+- Key writes on a natural/business id or a client-supplied idempotency token (upsert / `find_or_create`) rather than blind inserts.
+- Make external side effects (charges, emails, downstream RPCs) safe to repeat, or guard them with your own dedup keyed on a request id you put in the message.
+- Naturally idempotent operations (reads, idempotent upserts) need no special handling.
+**Why no built-in dedup (yet):** correct dedup across a horizontally-scaled service requires a *shared* store (a retry can land on a different server instance), a tuned TTL, and a cached response to replay on duplicates — and it only helps RPCs that aren't already idempotent. A future, **opt-in per-RPC** dedup with a pluggable store may be added; it will not be the default. Until then, treat handlers as potentially re-run.
 ## Future Improvements (locked behind ruby version)
 - Migrate from the `uuid7` gem to native `Random#uuid_v7` once the minimum Ruby version supports it (see `UUIDv7Helper`).
+## Benchmarks
+Microbenchmarks live in `bench/` and measure both the old and new behavior in one process (no NATS server required). See `bench/bench.md` for details. Highlights on JRuby:
+- `bench/muxer_resilience_bench.rb` — response-muxer dispatch hot path (~2.5× faster per message with the per-message lock removed), restart-window resilience, and crash-counter accuracy.
+- `bench/server_intake_bench.rb` — server intake fan-out (~8× throughput, head-of-line stall ~505ms → ~2ms) and the handler-exhaustion observability.
+```
+bundle exec ruby -Ilib bench/server_intake_bench.rb
+bundle exec ruby -Ilib bench/muxer_resilience_bench.rb
+```
 ## Development
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
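
Because delivery is at-least-once, a handler with side effects may want a guard like the one sketched below. This is only an in-process illustration under stated assumptions: `IdempotencyGuard` and the token strings are hypothetical, and a real horizontally scaled service would need a shared store with a TTL, as the README's dedup discussion notes.

```ruby
# Hypothetical sketch: run a side effect at most once per idempotency token
# and replay the cached result on duplicates. In-memory only, so it does not
# protect against a retry landing on a different server instance.
class IdempotencyGuard
  def initialize
    @results = {}
    @lock = Mutex.new
  end

  # Yields only the first time `token` is seen; later calls return the stored
  # result. The lock is held while the block runs, which keeps the sketch
  # simple but serializes guarded work.
  def once(token)
    @lock.synchronize do
      return @results[token] if @results.key?(token)
      @results[token] = yield
    end
  end
end

guard = IdempotencyGuard.new
charges = 0
2.times { guard.once("request-123") { charges += 1 } } # a retried request
puts charges
```

The token would typically be a request id that the client places in the message, so a retried call carries the same key as the original.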

data/bench/bench.md CHANGED

@@ -1,16 +1,96 @@
+## Benchmarks
-Notes:
-`-Xjit.threshold=0` - Setting the threshold to 0 forces JRuby to compile every method into Java bytecode immediately before its very first execution. This is particularly useful for debugging or bypassing warm-up times during profiling
+- `bench/concurrency_bench.rb` — end-to-end hot-path throughput (muxer round-trip,
+  subscription-key cache, thread pool) across thread counts. No NATS server needed.
+- `bench/muxer_resilience_bench.rb` — measures the response-muxer hot-path and
+  self-healing fixes (both old/baseline and new/patched behavior in one process):
+  - **A. Dispatch hot-path** — per-message `pending_size` accounting that was
+    removed; the dispatch step is ~**2.7× faster** per message on JRuby once the
+    per-message subscription lock is gone.
+  - **B. nil-`@resp_sub` resilience** — during a restart window the old loop
+    busy-spun (a `NoMethodError` + logged error/callback every iteration); the new
+    loop parks, doing **~0.2%** of the old wasted work and emitting **0** errors.
+  - **C. Self-healing crash counter** — a plain Integer mutated by N dispatcher
+    threads loses ~**45%** of updates on JRuby (corrupting the exponential backoff);
+    the `Concurrent::AtomicFixnum` replacement loses none.
+  Run: `bundle exec ruby -Ilib bench/muxer_resilience_bench.rb`
+- `bench/server_intake_bench.rb` — server intake fan-out + handler observability
+  (old single-handler vs new N-handler intake, in one process):
+  - **A. Intake throughput** — with a per-ACK publish cost, N drain threads scale
+    intake ~linearly (measured **~8.5×** at 8 handlers on JRuby vs the old single
+    intake thread).
+  - **B. Head-of-line blocking** — behind one slow (0.5s) publish, 50 quick
+    messages finished in **~505ms** with one handler vs **~0.4ms** with N.
+  - **C. Observability demo** — with hung handlers the new notifications report
+    `inflight_count` / `inflight_oldest_age_ms` / `overdue_handler_count` and fire
+    `server.handler_overdue`, where before only `server.message_dropped` was visible.
-`-Xjit.threshold=10 -J-XX:CompileThreshold=10` - If you are running benchmarks and want both JRuby and the JVM to aggressively optimize early, you can lower both thresholds simultaneously
+  Run: `bundle exec ruby -Ilib bench/server_intake_bench.rb`
+- `bench/soak.rb` — opt-in soak/chaos test: spawns its own `nats-server`, runs a
+  real protobuf-nats server + client in-process under sustained concurrency
+  (including deliberately long handlers), bounces the nats-server mid-run, and
+  asserts recovery (≥90% success) while reporting the resilience signals. Skips
+  if `nats-server` isn't on PATH.
-`bundle;  bx ruby -I lib bench/real_client.rb`
+  Run: `SOAK_DURATION=20 SOAK_BOUNCES=3 bundle exec ruby -Ilib bench/soak.rb`
-Start local nats server so details can be monitored.
-`/opt/homebrew/opt/nats-server/bin/nats-server -DV -m 8222 -p 4222`
+---
+## Running benchmarks (warm + reliable)
+These numbers are meaningless cold. On JRuby the JVM has to load classes and JIT-compile the hot paths before it reaches steady state, so the first second(s) of any run are far slower than production. Always warm up, repeat, and compare like-for-like.
+### 1. Use the production engine
+Run on JRuby (what production uses); CRuby numbers differ because the GVL serializes the parallelism these benches exercise.
+```
+rbenv shell jruby-9.4.14.0   # or your deployed JRuby
+ruby -v                       # confirm engine before trusting any number
+```
+### 2. Benchmarking JRUBY_OPTS
+Fix the heap so GC resizing doesn't jitter the run, give the young gen room, and don't block on entropy:
+```
+export JRUBY_OPTS="-J-Xms4g -J-Xmx4g -J-Xmn1g --disable:did_you_mean -J-Djava.security.egd=file:/dev/./urandom"
+```
+- Set `-Xms == -Xmx` so the heap never resizes mid-measurement.
+- Do **not** use `--dev` for benchmarking — it disables the JIT for fast startup and will understate performance.
+- Optional faster warmup (compile sooner): add `-Xjit.threshold=10 -J-XX:CompileThreshold=10`. `-Xjit.threshold=0` forces immediate compilation — useful for profiling, but prefer real warmup for representative steady-state numbers.
+### 3. Warm up, then measure
+- `muxer_resilience_bench.rb` section A uses **benchmark-ips**, which warms up on its own (warmup then a timed window) — no extra flags needed.
+- The loop-driven benches (`concurrency_bench.rb`, and the throughput sections of `server_intake_bench.rb`) measure a fixed window. Give them a real warmup and a longer window:
+```
+BENCH_WARMUP=5 BENCH_DURATION=10 BENCH_THREADS=1,4,8,16 bundle exec ruby -Ilib bench/concurrency_bench.rb
+```
+### 4. Repeat and take the median
+JVM warmup and machine noise make any single run unreliable. Run each bench **3+ times**, discard the first (cold class-load/JIT), and report the **median**. Keep the machine quiet (close other apps, disable CPU throttling / keep laptops on AC) and run one bench at a time.
+### Per-script tuning knobs
+| Script | Env knobs (defaults) |
+| --- | --- |
+| `concurrency_bench.rb` | `BENCH_DURATION` (4), `BENCH_WARMUP` (2), `BENCH_THREADS` (`1,4,8,16`), `BENCH_POOL_WORKERS` (8) |
+| `muxer_resilience_bench.rb` | none — benchmark-ips controls warmup/time |
+| `server_intake_bench.rb` | `BENCH_HANDLERS` (cores), `BENCH_MSGS` (20000), `BENCH_PUBLISH_LATENCY_US` (50) |
+| `soak.rb` | `SOAK_DURATION` (15), `SOAK_THREADS` (12), `SOAK_BOUNCES` (2), `SOAK_NATS_PORT` (4299) |
+### Real end-to-end run (optional, needs a NATS server)
+`bench/real_client.rb` drives the example app against a live server. Start a local nats-server (with monitoring) first:
 ```
-export JRUBY_OPTS="--disable:did_you_mean -J-Djava.security.egd=file:/dev/./urandom -J-Xmx2g -J-Xms1024m -J-Xmn512m -Xjit.threshold=10 -J-XX:CompileThreshold=10"
+nats-server -DV -m 8222 -p 4222          # or: /opt/homebrew/opt/nats-server/bin/nats-server ...
+bundle exec ruby -Ilib bench/real_client.rb
 ```
+`bench/soak.rb` spawns and bounces its own throwaway nats-server, so it needs only the `nats-server` binary on PATH (it self-skips otherwise).
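
The "repeat and take the median" advice in bench.md can be applied with a small helper like this one. It is a sketch, not part of the gem: `median_after_warmup` is a hypothetical name, and the sample values are made up for illustration.

```ruby
# Hypothetical helper: discard the first (cold) run, then take the median of
# the remaining samples.
def median_after_warmup(samples)
  measured = samples.drop(1) # the first run pays class-load/JIT warmup
  raise ArgumentError, "need at least 2 runs" if measured.empty?

  sorted = measured.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Illustrative numbers only: one cold run followed by three warm runs.
puts median_after_warmup([100, 10, 30, 20])
```

The median is less sensitive than the mean to a single noisy run, which is why it suits short benchmark series.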

data/bench/muxer_resilience_bench.rb ADDED

@@ -0,0 +1,151 @@
+# Benchmarks for the response-muxer hot-path and self-healing changes.
+#
+# This file measures BOTH the old (baseline) and new (patched) behavior in one
+# process so the speedup/robustness delta is reproducible on CRuby and JRuby
+# without a NATS server:
+#
+#   A. Dispatch hot-path cost   -- per-message pending_size accounting that was
+#                                  removed (#1). benchmark-ips, lower is better.
+#   B. nil-@resp_sub resilience -- busy-spin vs park during a restart window (#3).
+#   C. Self-healing counter     -- lost updates with a plain int vs AtomicFixnum
+#                                  under concurrent crashes (#4).
+#
+# Usage:
+#   bundle exec ruby -Ilib bench/muxer_resilience_bench.rb
+require "bundler/setup"
+require "benchmark/ips"
+require "concurrent"
+require "nats/client" # real NATS::Subscription / NATS::Msg
+def mono
+  ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+end
+puts "=" * 72
+puts "protobuf-nats response-muxer resilience bench"
+puts "engine=#{RUBY_ENGINE} #{RUBY_VERSION}  processor_count=#{::Concurrent.processor_count}"
+puts "=" * 72
+# --------------------------------------------------------------------------
+# A. Dispatch hot-path: per-message pending_size accounting (removed in #1).
+#
+# Old dispatch did `sub.synchronize { sub.pending_size -= msg.data.size }` for
+# EVERY response message; the new code does nothing here. We compare the old
+# accounting step against the cheapest real per-message op (a Concurrent::Map
+# lookup, which the dispatcher still does) so the delta is the lock overhead we
+# removed from the hot path.
+# --------------------------------------------------------------------------
+puts "\nA. Dispatch hot-path per-message overhead (higher ips = better)\n\n"
+sub = ::NATS::Subscription.new
+sub.pending_size = 0
+resp_map = ::Concurrent::Map.new
+resp_map["tok"] = { :queue => ::Queue.new }
+size = 64
+Benchmark.ips do |x|
+  x.config(:time => 3, :warmup => 1)
+  x.report("old: synchronize { pending_size -= n } + map lookup") do
+    sub.synchronize { sub.pending_size -= size }
+    resp_map["tok"]
+  end
+  x.report("new: map lookup only (accounting removed)") do
+    resp_map["tok"]
+  end
+  x.compare!
+end
+# --------------------------------------------------------------------------
+# B. nil-@resp_sub resilience (#3). During a restart @resp_sub can briefly be
+# nil. The old loop dereferenced it unconditionally (NoMethodError every
+# iteration -> busy-spin + a logged error/callback per spin); the new loop
+# parks. We run each for a fixed window and count iterations and "errors that
+# would be logged/dispatched to callbacks".
+# --------------------------------------------------------------------------
+puts "\nB. Behavior while @resp_sub is nil for #{(WINDOW = 0.5)}s (lower spin = better)\n\n"
+def run_old_loop(window)
+  resp_sub = nil # the restart window
+  iters = 0
+  errors = 0
+  deadline = mono + window
+  while mono < deadline
+    begin
+      resp_sub.pending_queue.pop # NoMethodError on nil
+    rescue => _e
+      errors += 1 # old code logs + notify_error_callbacks here
+    end
+    iters += 1
+  end
+  [iters, errors]
+end
+def run_new_loop(window)
+  resp_sub = nil
+  iters = 0
+  errors = 0
+  deadline = mono + window
+  while mono < deadline
+    s = resp_sub
+    if s.nil?
+      sleep 0.01 # park instead of spinning
+      iters += 1
+      next
+    end
+    begin
+      s.pending_queue.pop
+    rescue => _e
+      errors += 1
+    end
+    iters += 1
+  end
+  [iters, errors]
+end
+old_iters, old_errs = run_old_loop(WINDOW)
+new_iters, new_errs = run_new_loop(WINDOW)
+printf("  old loop: %12d iterations, %12d errors logged/dispatched\n", old_iters, old_errs)
+printf("  new loop: %12d iterations, %12d errors logged/dispatched\n", new_iters, new_errs)
+printf("  => new loop does %.5f%% of the old loop's wasted work\n",
+       old_iters.zero? ? 0.0 : (new_iters.to_f / old_iters * 100))
+# --------------------------------------------------------------------------
+# C. Self-healing crash counter (#4). The old counter was a plain Integer
+# mutated by multiple dispatcher threads (`@crash_count = (@crash_count||0)+1`),
+# which loses updates under true parallelism, corrupting the exponential
+# backoff. The new counter is a Concurrent::AtomicFixnum. We have N threads each
+# "crash" K times and check the final count.
+# --------------------------------------------------------------------------
+puts "\nC. Crash-counter accuracy under concurrent crashes (expected == actual is correct)\n\n"
+def hammer(counter, threads, per_thread)
+  ts = threads.times.map do
+    ::Thread.new do
+      per_thread.times { counter.call }
+    end
+  end
+  ts.each(&:join)
+end
+threads = [::Concurrent.processor_count, 4].max
+per_thread = 50_000
+expected = threads * per_thread
+# Old: plain integer read-modify-write (racy).
+plain = 0
+hammer(->{ plain = plain + 1 }, threads, per_thread)
+# New: atomic increment.
+atomic = ::Concurrent::AtomicFixnum.new(0)
+hammer(->{ atomic.increment }, threads, per_thread)
+printf("  threads=%d  per_thread=%d  expected=%d\n", threads, per_thread, expected)
+printf("  old plain Integer: %10d  (lost %d updates)\n", plain, expected - plain)
+printf("  new AtomicFixnum:  %10d  (lost %d updates)\n", atomic.value, expected - atomic.value)
+puts "\ndone."
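
Section C's counter feeds the dispatcher's exponential backoff, which the README describes as capped at 60s and decaying once healthy. The sketch below shows one way such a counter could drive the delay. It is an assumption-heavy illustration: `CrashBackoff`, the base delay, and the decrement-by-one decay are hypothetical, and a `Mutex` stands in for `Concurrent::AtomicFixnum` so the example needs no gems.

```ruby
# Hypothetical sketch of a thread-safe crash counter driving capped
# exponential backoff. Base delay and decay rule are assumptions.
class CrashBackoff
  MAX_EXPONENT = 30 # keeps 2**n small once the cap has long been reached

  def initialize(base: 0.5, cap: 60.0)
    @base = base
    @cap = cap
    @crashes = 0
    @lock = Mutex.new
  end

  # Records a crash and returns the delay (seconds) before the next restart.
  def record_crash
    count = @lock.synchronize { @crashes += 1 }
    exponent = [count - 1, MAX_EXPONENT].min
    [@base * (2**exponent), @cap].min
  end

  # Decays the counter by one step once a dispatcher is running normally.
  def record_healthy
    @lock.synchronize { @crashes -= 1 if @crashes.positive? }
  end

  def crashes
    @lock.synchronize { @crashes }
  end
end

backoff = CrashBackoff.new(base: 1.0)
puts 7.times.map { backoff.record_crash }.inspect
```

Because every read-modify-write happens under the lock, concurrent crashes cannot lose updates, which is the property section C measures for the plain `Integer` versus the atomic counter.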

data/bench/server_intake_bench.rb ADDED

@@ -0,0 +1,158 @@
+# Benchmarks for the server intake fan-out (#1) and handler observability (#2).
+#
+# Models old (1 intake handler) vs new (N intake handlers) in one process, plus
+# a demonstration of the #2 in-flight observability. No NATS server required.
+#
+#   A. Intake throughput     -- acks/sec with 1 vs N drain threads when each ACK
+#                               publish has some latency (the real bottleneck).
+#   B. Head-of-line blocking -- how long other subjects stall behind one slow
+#                               publish with 1 vs N handlers.
+#   C. Observability demo    -- with hung handlers, the new server notifications
+#                               surface the saturation/overdue work that was
+#                               previously invisible (only message_dropped).
+#
+# Usage:
+#   bundle exec ruby -Ilib bench/server_intake_bench.rb
+require "bundler/setup"
+require "concurrent"
+require "nats/client"             # real NATS::Subscription / NATS::Msg
+require "protobuf/nats"
+::Protobuf::Logging.logger = ::Logger.new(nil)
+def mono
+  ::Process.clock_gettime(::Process::CLOCK_MONOTONIC)
+end
+HANDLERS        = Integer(ENV.fetch("BENCH_HANDLERS", [::Concurrent.processor_count, 4].max.to_s))
+MSGS            = Integer(ENV.fetch("BENCH_MSGS", "20000"))
+PUBLISH_LAT_US  = Integer(ENV.fetch("BENCH_PUBLISH_LATENCY_US", "50")) # per-ACK publish latency
+puts "=" * 72
+puts "protobuf-nats server intake bench"
+puts "engine=#{RUBY_ENGINE} #{RUBY_VERSION}  processor_count=#{::Concurrent.processor_count}"
+puts "handlers(new)=#{HANDLERS}  msgs=#{MSGS}  publish_latency=#{PUBLISH_LAT_US}us"
+puts "=" * 72
+# --------------------------------------------------------------------------
+# Shared intake model: a SizedQueue fed with `total` messages, drained by
+# `handlers` threads. Each message does light work + an ACK "publish" that
+# costs `publish_latency` seconds (the part that serializes on one thread today).
+# --------------------------------------------------------------------------
+def drain(handlers, total, publish_latency)
+  queue = ::SizedQueue.new(total + handlers)
+  total.times { queue.push(:msg) }
+  handlers.times { queue.push(:stop) }
+  processed = ::Concurrent::AtomicFixnum.new(0)
+  t0 = mono
+  threads = handlers.times.map do
+    ::Thread.new do
+      loop do
+        m = queue.pop
+        break if m == :stop
+        sleep(publish_latency) if publish_latency.positive?
+        processed.increment
+      end
+    end
+  end
+  threads.each(&:join)
+  elapsed = mono - t0
+  { per_sec: processed.value / elapsed, elapsed: elapsed }
+end
+puts "\nA. Intake throughput (acks/sec; higher is better)\n\n"
+lat = PUBLISH_LAT_US / 1_000_000.0
+old = drain(1, MSGS, lat)
+new = drain(HANDLERS, MSGS, lat)
+printf("  old (1 handler):   %12.0f acks/s  (%.2fs)\n", old[:per_sec], old[:elapsed])
+printf("  new (%d handlers):  %12.0f acks/s  (%.2fs)\n", HANDLERS, new[:per_sec], new[:elapsed])
+printf("  => %.2fx faster intake\n", new[:per_sec] / old[:per_sec])
+# --------------------------------------------------------------------------
+# B. Head-of-line blocking: one slow publish is enqueued first, followed by
+# `fast_count` quick messages. Measure how long until all the quick messages
+# finish. With one handler they wait behind the slow publish; with N they don't.
+# --------------------------------------------------------------------------
+def head_of_line(handlers, slow_latency, fast_count)
+  queue = ::SizedQueue.new(fast_count + 1 + handlers)
+  queue.push(:slow)
+  fast_count.times { queue.push(:fast) }
+  handlers.times { queue.push(:stop) }
+  fast_done = ::Concurrent::AtomicFixnum.new(0)
+  last_fast_at = ::Concurrent::AtomicReference.new(nil)
+  start = mono
+  threads = handlers.times.map do
+    ::Thread.new do
+      loop do
+        m = queue.pop
+        break if m == :stop
+        if m == :slow
+          sleep slow_latency
+        else
+          last_fast_at.set(mono) if fast_done.increment == fast_count
+        end
+      end
+    end
+  end
+  threads.each(&:join)
+  (last_fast_at.get || mono) - start
+end
+puts "\nB. Head-of-line blocking behind one slow (0.5s) publish (lower = better)\n\n"
+slow = 0.5
+old_b = head_of_line(1, slow, 50)
+new_b = head_of_line(HANDLERS, slow, 50)
+printf("  old (1 handler):   50 quick messages finished after %6.1f ms (stuck behind the slow publish)\n", old_b * 1000)
+printf("  new (%d handlers):  50 quick messages finished after %6.1f ms (unaffected)\n", HANDLERS, new_b * 1000)
+# --------------------------------------------------------------------------
+# C. #2 observability demo: hung handlers occupy the pool. Previously operators
+# only saw `message_dropped`; now the in-flight gauges and the overdue event explain why.
+# --------------------------------------------------------------------------
+puts "\nC. Handler-exhaustion observability (what an operator now sees)\n\n"
+ENV["PB_NATS_SERVER_SUBSCRIPTION_HANDLERS"] = "1"
+ENV["PB_NATS_SERVER_HANDLER_OVERDUE_MS"] = "100"
+class DemoNats
+  def connect(*); end
+  def new_inbox; "_INBOX.demo"; end
+  def subscribe(_s, *_a)
+    sub = ::NATS::Subscription.new
+    sub.pending_queue = ::SizedQueue.new(1024)
+    sub
+  end
+  def publish(*); end
+  def flush(*); end
+  %i[on_disconnect on_reconnect on_close on_error].each { |m| define_method(m) { |*| } }
+  def close; end
+end
+server = ::Protobuf::Nats::Server.new(:threads => 4, :client => DemoNats.new, :server => "bench")
+release = ::Queue.new
+server.define_singleton_method(:handle_request) { |*_| release.pop; "" }
+gauges = {}
+%w[inflight_count inflight_oldest_age_ms overdue_handler_count handler_overdue pending_intake_queue_size].each do |name|
+  ::ActiveSupport::Notifications.subscribe("server.#{name}.protobuf-nats") { |_, _, _, _, v| gauges[name] = v }
+end
+4.times { |i| server.enqueue_request("req#{i}", "inbox#{i}") } # all 4 pool slots now hung
+sleep 0.15                                                       # exceed the 100ms overdue window
+server.enqueue_request("req5", "inbox5")                         # pool full -> NACK + saturated
+server.instrument_inflight_handlers
+printf("  inflight_count          = %s   (handlers stuck on the downstream)\n", gauges["inflight_count"])
+printf("  inflight_oldest_age_ms  = %.0f\n", gauges["inflight_oldest_age_ms"] || 0)
+printf("  overdue_handler_count   = %s   (client already gave up on these)\n", gauges["overdue_handler_count"])
+printf("  handler_overdue fired   = %s\n", gauges.key?("handler_overdue"))
+puts   "  (previously: only server.message_dropped, with no hint that handlers were stuck)"
+release << :go until release.num_waiting.zero?
+4.times { release << :go }
+puts "\ndone."
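As a rough sanity check on parts A and B of the bench above, the queue model can be reduced to two back-of-the-envelope formulas. The sketch below is not part of protobuf-nats: `ideal_acks_per_sec` and `ideal_head_of_line_wait` are hypothetical helper names, and the numbers are idealized upper bounds that ignore thread scheduling overhead and `sleep` granularity, so measured results will likely be lower.

```ruby
# Hypothetical helpers (not part of protobuf-nats): idealized numbers for the
# bench's queue model, ignoring scheduler overhead and sleep granularity.

# Upper bound on acks/sec when each ACK publish blocks one handler thread for
# `latency_s` seconds and `handlers` threads drain the queue in parallel.
def ideal_acks_per_sec(handlers, latency_s)
  handlers / latency_s.to_f
end

# Idealized time until the quick messages finish when one slow publish is
# queued ahead of them. A single handler must wait out the slow publish;
# with two or more handlers another thread drains the quick messages.
def ideal_head_of_line_wait(handlers, slow_latency_s)
  handlers > 1 ? 0.0 : slow_latency_s.to_f
end

latency = 50 / 1_000_000.0 # the bench's default BENCH_PUBLISH_LATENCY_US of 50
puts format("1 handler:  %.0f acks/s", ideal_acks_per_sec(1, latency))
puts format("4 handlers: %.0f acks/s", ideal_acks_per_sec(4, latency))
puts format("head-of-line wait, 1 handler:  %.1f ms", ideal_head_of_line_wait(1, 0.5) * 1000)
puts format("head-of-line wait, 4 handlers: %.1f ms", ideal_head_of_line_wait(4, 0.5) * 1000)
```

Comparing these bounds with the bench output gives a sense of how much of the measured gap comes from the handler count itself and how much from Ruby's thread and timer overhead.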