nnq 0.4.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ed71a88238bae0611223d36bb3ec0a39795388d7b90227fb690cfc924cf85198
- data.tar.gz: cda3bfff65005960b91672cd1c6e2a61e8fce104e202ed928932e7b0404d7eda
+ metadata.gz: 0a336ac1e24bc6210ddeac6731163baab6f8980d9eebbe3235fac13157416fcf
+ data.tar.gz: f049bf9038235487966cae1c8920b452abf9984e3776b2b93cbbd95e067fb485
  SHA512:
- metadata.gz: 87ddb00a39836f3699dbd5f17a18b34dbd1c0c18d284ad1cf0b14d6271e565444ffce926ce2514c912423f943ef9a994ebb79eb3620b238f872f840124e3ad38
- data.tar.gz: 193a8a3a1830f18aba6399990a7f1ea81639165c6ffc072691874ca972e21732f4edaf6fff3a82c3bf40b1d3a117ff04adc381e1a3f65d2f2ca677b091ae9d9d
+ metadata.gz: dcef45943a41f1bc53bbcf8ecc0c34c2779e4a16dea04e8f78854cf6a9debe11168596b47b764728e49393f61c319d41e2d79e489b37dec809adc59cbc0741f0
+ data.tar.gz: fd7aaa30c57d5b8fbfd6279aba8b96423e201837ca4ef0144a315ec020da5e0183aaede0c7327fc49bf1bd318894099e4fe02657a12aebf98c1d895582583c62
data/CHANGELOG.md CHANGED
@@ -1,5 +1,119 @@
  # Changelog
 
+ ## 0.6.0 — 2026-04-15
+
+ - **NNG-style raw mode for REQ/REP and SURVEYOR/RESPONDENT.** Constructing
+ any of the four with `raw: true` bypasses the cooked state machine
+ (request-id tracking, pending-reply slot, survey window) and exposes
+ the full SP backtrace header as an opaque, caller-supplied handle.
+ - `#receive` returns `[pipe, header, body]` where `pipe` is the live
+ `NNQ::Connection` that delivered the message (idiomatic Ruby handle
+ — no opaque pipe_id token, no lookup registry), `header` is the
+ parsed backtrace bytes, and `body` is the payload.
+ - Raw REQ/SURVEYOR send: `send(body, header:)`; REQ round-robins
+ across peers, SURVEYOR fans out to all of them.
+ - Raw REP/RESPONDENT send: `send(body, to:, header:)` — routes
+ directly to a prior `pipe` with the stored `header` written
+ verbatim, so the cooked peer matches the reply. Closed peer or
+ over-TTL header → silent drop (matches NNG behavior).
+ - Cooked-mode methods (`send_request`, `send_reply`, `send_survey`)
+ raise `NNQ::Error` in raw mode and vice versa.
+ - Unblocks proxy/device-style use cases (forwarders, request routers)
+ without touching the cooked code paths. `lib/nnq/routing/{req,rep,
+ surveyor,respondent}_raw.rb` live alongside their cooked siblings;
+ `build_routing` branches on `@raw` inside REQ0/REP0/SURVEYOR0/
+ RESPONDENT0. PUB/SUB and PUSH/PULL raw are still out of scope.
+ - **Zero-alloc cooked send paths via protocol-sp `header:` kwarg.**
+ `Connection#send_message` / `#write_message` grow an optional
+ `header:` kwarg that protocol-sp writes between the SP length prefix
+ and the body as a third buffered write (coalesced into a single
+ `writev`). Cooked `Req#send_request`, `Rep#send_reply`, and
+ `Respondent#send_reply` no longer allocate the `header + body`
+ intermediate String on every send — the savings apply to every
+ REQ/REP round trip regardless of whether raw mode is used.
+ Requires `protocol-sp >= 0.3`.
+ - **`Options#recv_hwm`** — new option, defaults to `Options::DEFAULT_HWM`
+ (same as `send_hwm`). Bounds the raw routing strategies' receive
+ queues; the cooked paths still use their existing (unbounded) state
+ and are unaffected.
+
+ ## 0.5.0 — 2026-04-15
+
+ - **Send-path freezes the body** — every public send method (PUSH,
+ PUB, PAIR, BUS, REQ, REP, SURVEYOR, RESPONDENT) routes the body
+ through `Socket#frozen_binary`, which coerces to a frozen binary
+ string. Fast path: already frozen and binary → returned as-is, no
+ allocation. Slow path: `body.b.freeze` (one copy). Prevents a
+ caller from mutating the string after it has been enqueued (the
+ body can sit in a send queue or per-peer queue until a pump
+ writes it).
+ - **Hot-path: no kwargs splat on verbose monitor emit** —
+ `emit_verbose_monitor_event(type, **detail)` replaced with dedicated
+ `emit_verbose_msg_sent(body)` / `emit_verbose_msg_received(body)`
+ helpers. Early-returns before allocating the detail hash, so the
+ send/recv loops pay nothing when `-vvv` is off. Send pump also
+ hoists the `verbose_monitor` check out of the batch `.each`.
+ - **YJIT-friendly `all?` blocks** — `@queues.each_value.all?(&:empty?)`
+ → explicit `{ |q| q.empty? }` in pub/bus/surveyor `drained?`
+ (YJIT specializes explicit blocks, not `Symbol#to_proc`).
+ - **`Reactor.run` uses `Async::Promise`** — replaces the
+ `Thread::Queue` + manual `[:ok,val]`/`[:error,exc]` tagging with a
+ single `result.fulfill { block.call }` + `result.wait` pair.
+ - **`Engine#spawn_task(parent:)`** — renamed from `barrier:` to make it
+ clear any parent barrier is accepted, not just the socket-level one.
+ - **`linger` default → `Float::INFINITY`**, for parity with libzmq.
+ `Socket#close` waits forever for the send queue to drain. Pass
+ `linger: 0` for the old drop-on-close behavior.
+ - **`Socket.new` accepts a block** — File.open-style. The socket is
+ yielded to the block and `#close`d when the block returns (or
+ raises).
+ - **`drain_send_queue` rescues `Async::Stop`** — parent-task
+ cancellation during close no longer propagates out of the ensure
+ path; the rest of teardown runs.
+ - **Hot-path `Array#first`** — `send_pump` uses `Array#first` instead
+ of `[0]` for YJIT specialization.
+ - **Barrier-based cascading teardown** — `SocketLifecycle` owns a
+ socket-level `Async::Barrier`; `ConnectionLifecycle` creates a nested
+ per-connection barrier. All pumps, accept loops, reconnect loops, and
+ supervisors live under these barriers. `Engine#close` calls
+ `barrier.stop` once and every descendant unwinds atomically. Replaces
+ the manual `@tasks` array.
+ - **Per-connection supervisor** — each connection spawns a supervisor
+ task (on the socket barrier) that watches for the first pump exit and
+ runs `lost!` in `ensure`. Placing the supervisor outside the
+ per-connection barrier avoids the self-stop footgun.
+ - **Connect timeout** — `Transport::TCP.connect` uses
+ `Socket.tcp(host, port, connect_timeout:)` instead of `TCPSocket.new`.
+ Timeout derived from `reconnect_interval` (floor 0.5s). Fixes macOS
+ hang where IPv6 `connect(2)` never delivers `ECONNREFUSED`.
+ - **Handshake timeout** — SP greeting exchange wrapped in
+ `Async::Task#with_timeout(handshake_timeout)`. Prevents a hang when a
+ non-NNG service accepts the TCP connection but never sends a greeting.
+ - **Reconnect after handshake failure** — `ConnectionLifecycle#handshake!`
+ now calls `tear_down!(reconnect: true)` on error instead of bare
+ `transition!(:closed)`, so the endpoint doesn't go dead when a peer
+ RSTs mid-handshake.
+ - **Quantized reconnect sleeps** — `Reconnect#quantized_wait` aligns
+ retries to wall-clock grid boundaries. Multiple clients reconnecting
+ with the same interval wake at the same instant.
+ - **Send pump fairness yield** — `Async::Task.current.yield` after each
+ batch write ensures peer pumps get a turn when the queue stays
+ non-empty.
+ - Add `DESIGN.md` documenting the architecture.
+ - **Versioned socket names** — `PUSH` → `PUSH0`, `PULL` → `PULL0`, etc.
+ Canonical names now include the SP protocol version. Unversioned
+ aliases (`NNQ::PUSH = NNQ::PUSH0`) are kept for backward compat.
+ - **`raw:` kwarg** — `Socket#initialize` accepts `raw: false`. Plumbing
+ for raw-mode routing (device/proxy support). No functional raw
+ routing yet.
+ - **`NNQ::BUS0`** — best-effort bidirectional mesh (bus0). Fan-out send
+ to all peers (drop when full), shared recv queue. Self-pairing.
+ - **`NNQ::SURVEYOR0` / `NNQ::RESPONDENT0`** — survey/response pattern
+ (survey0). Surveyor broadcasts a survey with a timed reply window
+ (`options.survey_time`, default 1s). Respondent echoes the backtrace
+ like REP. Shared `Routing::Backtrace` module extracted from REP.
+ - **`NNQ::TimedOut`** error raised when the survey window expires.
+
  ## 0.4.0 — 2026-04-09
 
  - `Socket#all_peers_gone` — `Async::Promise` resolving the first time
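Raw mode hands the SP backtrace to the caller as opaque bytes. To make those bytes a little less opaque, here is a minimal sketch of splitting a req0/rep0-style backtrace off a message, assuming the layout used by NNG: a stack of 4-byte big-endian words where each forwarding device pushes a pipe ID with the high bit clear, and the originating request ID (high bit set) terminates the stack. `split_backtrace`, `push_hop`, and the 8-word `MAX_TTL` are hypothetical names chosen for illustration; they are not nnq or protocol-sp API, and nnq's own `Routing::Backtrace` module may differ.

```ruby
# Sketch of req0/rep0-style backtrace handling (assumed wire layout):
# 4-byte big-endian words; hop words (pipe IDs) have the high bit
# clear, and the terminating request ID has the high bit set.
HIGH_BIT = 0x8000_0000
MAX_TTL  = 8 # hypothetical hop limit, counted in words

# Splits a received message into [header, body]. Returns nil when no
# request ID is found or the stack is deeper than MAX_TTL words
# (a raw REP/RESPONDENT would silently drop such a message).
def split_backtrace(message)
  offset = 0
  while offset + 4 <= message.bytesize
    word = message.byteslice(offset, 4).unpack1("N")
    offset += 4
    return nil if offset / 4 > MAX_TTL
    return [message.byteslice(0, offset), message.byteslice(offset..)] if (word & HIGH_BIT) != 0
  end
  nil # ran out of bytes before reaching a request ID
end

# What a forwarding device does on the request path: push the ID of
# the pipe the request arrived on in front of the existing backtrace.
def push_hop(pipe_id, message)
  [pipe_id & ~HIGH_BIT].pack("N") + message
end
```

A raw REP or RESPONDENT would keep `header` together with the pipe the message arrived on and write it back verbatim ahead of the reply, which is what the changelog describes for `send(body, to:, header:)`; a stack deeper than the limit corresponds to the over-TTL case that is silently dropped.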
data/lib/nnq/bus.rb ADDED
@@ -0,0 +1,37 @@
+ # frozen_string_literal: true
+
+ require_relative "socket"
+ require_relative "routing/bus"
+
+ module NNQ
+ # BUS (nng bus0): best-effort bidirectional mesh. Every message sent
+ # goes to all directly connected peers. Every message received from
+ # any peer is delivered to the application. Self-pairing (BUS ↔ BUS).
+ #
+ # Send never blocks — if a peer's queue is full, the message is
+ # dropped for that peer (matching nng's best-effort semantics).
+ #
+ class BUS0 < Socket
+ def send(body)
+ body = frozen_binary(body)
+ Reactor.run { @engine.routing.send(body) }
+ end
+
+
+ def receive
+ Reactor.run { @engine.routing.receive }
+ end
+
+
+ private
+
+ def protocol
+ Protocol::SP::Protocols::BUS_V0
+ end
+
+
+ def build_routing(engine)
+ Routing::Bus.new(engine)
+ end
+ end
+ end
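`BUS0#send` does two things: it coerces the body with `frozen_binary`, then asks the routing strategy to fan it out, dropping the message for any peer whose queue is full. The sketch below reproduces both steps with plain threads, assuming one bounded `Thread::SizedQueue` per peer. `BusFanOut` and `fan_out` are hypothetical names (the real `Routing::Bus` is Async-based and is not part of this diff), and `fan_out` avoids the name `send` so the standalone example does not shadow `Object#send`.

```ruby
# Mirrors the rule from the 0.5.0 changelog: already frozen + binary is
# returned as-is (no allocation); anything else costs one binary copy.
def frozen_binary(body)
  return body if body.frozen? && body.encoding == Encoding::BINARY

  body.b.freeze
end

# Hypothetical thread-based model of bus0's best-effort fan-out.
class BusFanOut
  attr_reader :queues

  def initialize(peer_count, hwm)
    @queues = Array.new(peer_count) { Thread::SizedQueue.new(hwm) }
  end

  # Never blocks. Returns how many peers accepted the message; a peer
  # whose queue is at its high-water mark simply misses it.
  def fan_out(body)
    body = frozen_binary(body)
    @queues.count do |queue|
      queue.push(body, true) # non_block = true: raises ThreadError when full
      true
    rescue ThreadError
      false
    end
  end
end
```

Dropping instead of blocking keeps one slow peer from stalling the whole mesh, at the cost of that peer silently missing messages; freezing first lets every peer queue share the same immutable string.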
@@ -12,9 +12,11 @@ module NNQ
  # @return [Protocol::SP::Connection]
  attr_reader :sp
 
+
  # @return [String, nil] endpoint URI we connected to / accepted from
  attr_reader :endpoint
 
+
  # @param sp [Protocol::SP::Connection] handshake-completed SP connection
  # @param endpoint [String, nil]
  def initialize(sp, endpoint: nil)
@@ -25,16 +27,20 @@ module NNQ
 
 
  # @return [Integer] peer protocol id (e.g. Protocols::PULL_V0)
- def peer_protocol = @sp.peer_protocol
+ def peer_protocol
+ @sp.peer_protocol
+ end
 
 
  # Writes one message into the SP connection's send buffer (no flush).
  #
  # @param body [String]
+ # @param header [String, nil] optional binary prefix written between
+ # the SP length prefix and body (see Protocol::SP::Connection)
  # @return [void]
- def write_message(body)
+ def write_message(body, header: nil)
  raise ClosedError, "connection closed" if @closed
- @sp.write_message(body)
+ @sp.write_message(body, header: header)
  end
 
 
@@ -53,10 +59,11 @@ module NNQ
  # each call is request-paced and there's nothing to batch.
  #
  # @param body [String]
+ # @param header [String, nil] optional binary prefix
  # @return [void]
- def send_message(body)
+ def send_message(body, header: nil)
  raise ClosedError, "connection closed" if @closed
- @sp.send_message(body)
+ @sp.send_message(body, header: header)
  end
 
 
@@ -77,7 +84,9 @@ module NNQ
 
 
  # @return [Boolean]
- def closed? = @closed
+ def closed?
+ @closed
+ end
 
 
  # Closes the underlying SP connection. Safe to call twice.
@@ -86,5 +95,6 @@ module NNQ
  @closed = true
  @sp.close
  end
+
  end
  end
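The new `header:` kwarg lets protocol-sp write the backtrace between the length prefix and the body rather than allocating `header + body` first. A sketch of the resulting frame, assuming an NNG-style TCP framing with an 8-byte big-endian length prefix that counts header and body together; `write_frame` is a hypothetical helper, not protocol-sp's API. Ruby's `IO#write` (and `StringIO#write`) accept several strings in one call, so the pieces reach the stream without an intermediate concatenation.

```ruby
require "stringio"

# Writes one frame: 8-byte big-endian length (assumed to cover header +
# body), then the optional header, then the body. No header + body
# String is built; the pieces are handed to IO#write together.
def write_frame(io, body, header: nil)
  size = body.bytesize + (header ? header.bytesize : 0)
  prefix = [size].pack("Q>")
  header ? io.write(prefix, header, body) : io.write(prefix, body)
end
```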
@@ -1,5 +1,6 @@
  # frozen_string_literal: true
 
+ require "async/barrier"
  require "protocol/sp"
  require_relative "../connection"
 
@@ -42,6 +43,12 @@ module NNQ
  # @return [Symbol]
  attr_reader :state
 
+ # @return [Async::Barrier] holds all per-connection pump tasks
+ # (send pump, recv pump). When the connection is torn down,
+ # {#tear_down!} calls `@barrier.stop` to cancel every sibling
+ # task atomically.
+ attr_reader :barrier
+
 
  # @param engine [Engine]
  # @param endpoint [String, nil]
@@ -52,6 +59,7 @@ module NNQ
  @framing = framing
  @state = :new
  @conn = nil
+ @barrier = Async::Barrier.new(parent: engine.barrier)
  end
 
 
@@ -68,13 +76,15 @@ module NNQ
  max_message_size: @engine.options.max_message_size,
  framing: @framing,
  )
- sp.handshake!
+ Async::Task.current.with_timeout(handshake_timeout) { sp.handshake! }
  ready!(NNQ::Connection.new(sp, endpoint: @endpoint))
  @conn
- rescue => e
- @engine.emit_monitor_event(:handshake_failed, endpoint: @endpoint, detail: { error: e })
+ rescue Protocol::SP::Error, *CONNECTION_LOST, Async::TimeoutError => error
+ @engine.emit_monitor_event(:handshake_failed, endpoint: @endpoint, detail: { error: error })
  io.close rescue nil
- transition!(:closed) unless @state == :closed
+ # Full tear-down with reconnect: without this, the endpoint
+ # goes dead when a peer RSTs mid-handshake.
+ tear_down!(reconnect: true)
  raise
  end
 
@@ -83,16 +93,28 @@ module NNQ
  # asks the engine to schedule a reconnect (if the endpoint is in
  # the dialed set and reconnect is still enabled).
  def lost!
- ep = @endpoint
- tear_down!
- @engine.maybe_reconnect(ep)
+ tear_down!(reconnect: true)
  end
 
 
  # Deliberate close (engine shutdown or routing eviction). Does
  # not trigger reconnect.
  def close!
- tear_down!
+ tear_down!(reconnect: false)
+ end
+
+
+ # Starts a supervisor for this connection. Must be called after
+ # all per-connection pumps (recv loop, send pump) have been
+ # spawned on the connection barrier. The supervisor blocks until
+ # the first pump exits, then runs tear_down! via lost!.
+ #
+ # Called by Engine#handle_accepted / Engine#handle_connected after
+ # spawning the recv loop — routing's connection_added may have
+ # already spawned send pumps during ready!, so the barrier is
+ # guaranteed non-empty by then.
+ def start_supervisor!
+ start_supervisor unless @barrier.empty?
  end
 
 
@@ -106,7 +128,7 @@ module NNQ
  @engine.routing.connection_added(conn) if @engine.routing.respond_to?(:connection_added)
  rescue ConnectionRejected
  @engine.emit_monitor_event(:connection_rejected, endpoint: @endpoint)
- tear_down!
+ tear_down!(reconnect: false)
  raise
  end
  @engine.lifecycle.peer_connected.resolve(conn) unless @engine.lifecycle.peer_connected.resolved?
@@ -116,7 +138,7 @@ module NNQ
  end
 
 
- def tear_down!
+ def tear_down!(reconnect: false)
  return if @state == :closed
  transition!(:closed)
  if @conn
@@ -126,6 +148,35 @@ module NNQ
  @engine.emit_monitor_event(:disconnected, endpoint: @endpoint)
  @engine.resolve_all_peers_gone_if_empty
  end
+ @engine.maybe_reconnect(@endpoint) if reconnect
+ # Cancel every sibling pump of this connection. The caller is
+ # the supervisor task, which is NOT in the barrier — so there
+ # is no self-stop risk.
+ @barrier.stop
+ end
+
+
+ # Spawns a supervisor task on the *socket-level* barrier (not the
+ # per-connection barrier) that blocks on the first pump to finish
+ # and then triggers teardown.
+ def start_supervisor
+ @engine.barrier.async(transient: true, annotation: "conn supervisor") do
+ @barrier.wait { |task| task.wait; break }
+ rescue Async::Stop, Async::Cancel
+ rescue *CONNECTION_LOST
+ ensure
+ lost!
+ end
+ end
+
+
+ # Handshake timeout: same logic as TCP.connect_timeout — derived
+ # from reconnect_interval (floor 0.5s). Prevents a hang when the
+ # peer accepts the TCP connection but never sends an SP greeting.
+ def handshake_timeout
+ ri = @engine.options.reconnect_interval
+ ri = ri.end if ri.is_a?(Range)
+ [ri, 0.5].max
  end
 
 
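The supervisor waits for whichever pump exits first, then runs `lost!`, whose `tear_down!` ends in `@barrier.stop`. Below is a thread-based analogue of that control flow, offered only as an illustration: nnq itself uses Async tasks and `Async::Barrier`, and `PumpGroup` and `supervise` are hypothetical names. It also shows why the supervisor lives outside the group it stops: `stop` kills every thread in the group, and the supervisor is not one of them.

```ruby
# Hypothetical thread-based analogue of a per-connection barrier:
# every pump reports its own exit on a queue.
class PumpGroup
  attr_reader :threads

  def initialize
    @threads = []
    @exits = Thread::Queue.new
  end

  def spawn(&block)
    @threads << Thread.new do
      block.call
    ensure
      @exits << Thread.current # runs on return, exception, or kill
    end
  end

  def empty?
    @threads.empty?
  end

  # Blocks until the first pump exits.
  def wait_first
    @exits.pop
  end

  # Analogue of @barrier.stop: kill and reap every pump.
  def stop
    @threads.each(&:kill)
    @threads.each { |t| t.join rescue nil } # ignore pumps that died with an error
  end
end

# Analogue of start_supervisor!: the supervisor thread is NOT in the
# group, so group.stop can never kill the thread that calls it.
def supervise(group, &on_lost)
  return nil if group.empty?

  Thread.new do
    group.wait_first
  ensure
    on_lost.call # the lost! analogue (tear-down, maybe reconnect)
    group.stop   # then cancel the sibling pumps
  end
end
```

If the supervisor were itself one of `group.threads`, `stop` could kill it partway through the `kill` loop and leave later siblings running, which is the self-stop footgun the changelog mentions.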
@@ -55,10 +55,10 @@ module NNQ
  def run(parent_task, delay: nil)
  delay, max_delay = init_delay(delay)
 
- task = parent_task.async(transient: true, annotation: "nnq reconnect #{@endpoint}") do
+ parent_task.async(transient: true, annotation: "nnq reconnect #{@endpoint}") do
  loop do
  break if @engine.closed?
- sleep delay if delay > 0
+ sleep quantized_wait(delay) if delay > 0
  break if @engine.closed?
  begin
  @engine.transport_for(@endpoint).connect(@endpoint, @engine)
@@ -70,13 +70,22 @@ module NNQ
  end
  rescue Async::Stop
  end
- @engine.tasks << task
  end
 
 
  private
 
 
+ # Wall-clock quantized sleep: wait until the next +delay+-sized
+ # grid tick. Multiple clients reconnecting with the same interval
+ # wake up at the same instant, collapsing staggered retries into
+ # aligned waves.
+ def quantized_wait(delay, now = Time.now.to_f)
+ wait = delay - (now % delay)
+ wait.positive? ? wait : delay
+ end
+
+
  def init_delay(delay)
  ri = @options.reconnect_interval
  if ri.is_a?(Range)
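To see the alignment `quantized_wait` produces, the method can be lifted out of `Reconnect` unchanged and fed a few staggered clock readings. Clients that start waiting at different moments but share the same `delay` all compute the same wake-up instant, the next multiple of `delay` on the wall clock. The `now` values below are made up for the demonstration.

```ruby
# Same body as Reconnect#quantized_wait in the diff above, as a plain method.
def quantized_wait(delay, now = Time.now.to_f)
  wait = delay - (now % delay)
  wait.positive? ? wait : delay
end

delay  = 2.0
starts = [10.5, 11.25, 11.75] # three clients, hypothetical clock readings
wakes  = starts.map { |now| now + quantized_wait(delay, now) }
# Each client waits a different amount, but all wake at the same grid tick.
# A client already sitting exactly on a tick waits a full interval, not zero.
on_tick = quantized_wait(delay, 12.0)
```

The trade-off is that the first retry can come sooner than `delay` (here as little as 0.25 s), and clients sharing an interval reconnect together at each tick rather than spread out over it.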
@@ -1,5 +1,6 @@
  # frozen_string_literal: true
 
+ require "async/barrier"
  require "async/promise"
 
  module NNQ
@@ -42,9 +43,14 @@ module NNQ
  # Edge-triggered: does not re-arm on reconnect.
  attr_reader :all_peers_gone
 
+ # @return [Async::Barrier, nil] holds every socket-scoped task
+ # (connection supervisors, reconnect loops, accept loops).
+ # {Engine#close} calls +barrier.stop+ to cascade teardown
+ # through every per-connection barrier in one shot.
+ attr_reader :barrier
+
  # @return [Boolean] when false, the engine must not schedule new
- # reconnect attempts. Default true. nnq has no automatic
- # reconnect loop yet, so this currently just records intent.
+ # reconnect attempts. Default true.
  attr_accessor :reconnect_enabled
 
 
@@ -55,6 +61,7 @@ module NNQ
  @peer_connected = Async::Promise.new
  @all_peers_gone = Async::Promise.new
  @reconnect_enabled = true
+ @barrier = nil
  end
 
 
@@ -75,6 +82,7 @@ module NNQ
  return false if @parent_task
  @parent_task = task
  @on_io_thread = on_io_thread
+ @barrier = Async::Barrier.new(parent: @parent_task)
  transition!(:open)
  true
  end
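Because each per-connection barrier is created with the socket-level barrier as its `parent:`, the single `barrier.stop` in `Engine#close` reaches every pump. The toy model below imitates that cascade with threads: a child scope spawns through its parent, so the parent ends up tracking every descendant thread. `TaskScope` is a hypothetical stand-in written for illustration, not `Async::Barrier`'s implementation.

```ruby
# Toy stand-in for a barrier; NOT Async::Barrier's real implementation.
# A scope created with parent: spawns through that parent, so the
# parent ends up tracking every descendant thread.
class TaskScope
  attr_reader :threads

  def initialize(parent: nil)
    @parent = parent
    @threads = []
  end

  def async(&block)
    thread = @parent ? @parent.async(&block) : Thread.new(&block)
    @threads << thread
    thread
  end

  # Kills and joins every thread spawned through this scope
  # (including those spawned by child scopes).
  def stop
    @threads.each(&:kill)
    @threads.each(&:join)
  end
end

socket = TaskScope.new
socket.async { sleep }                               # e.g. an accept loop
conns = Array.new(2) { TaskScope.new(parent: socket) }
conns.each { |c| 2.times { c.async { sleep } } }     # e.g. send/recv pumps
```

Stopping a single child scope affects only that connection's threads, which matches `tear_down!` calling `@barrier.stop` on its own per-connection barrier; stopping the root reaches everything at once.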