nnq 0.4.0 → 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +114 -0
- data/lib/nnq/bus.rb +37 -0
- data/lib/nnq/connection.rb +16 -6
- data/lib/nnq/engine/connection_lifecycle.rb +61 -10
- data/lib/nnq/engine/reconnect.rb +12 -3
- data/lib/nnq/engine/socket_lifecycle.rb +10 -2
- data/lib/nnq/engine.rb +77 -30
- data/lib/nnq/error.rb +26 -6
- data/lib/nnq/monitor_event.rb +3 -1
- data/lib/nnq/options.rb +10 -1
- data/lib/nnq/pair.rb +6 -1
- data/lib/nnq/pub_sub.rb +9 -2
- data/lib/nnq/push_pull.rb +9 -2
- data/lib/nnq/reactor.rb +12 -11
- data/lib/nnq/req_rep.rb +61 -13
- data/lib/nnq/routing/backtrace.rb +47 -0
- data/lib/nnq/routing/bus.rb +108 -0
- data/lib/nnq/routing/pair.rb +4 -1
- data/lib/nnq/routing/pub.rb +9 -5
- data/lib/nnq/routing/pull.rb +2 -1
- data/lib/nnq/routing/push.rb +2 -0
- data/lib/nnq/routing/rep.rb +7 -22
- data/lib/nnq/routing/rep_raw.rb +63 -0
- data/lib/nnq/routing/req.rb +7 -3
- data/lib/nnq/routing/req_raw.rb +73 -0
- data/lib/nnq/routing/respondent.rb +84 -0
- data/lib/nnq/routing/respondent_raw.rb +54 -0
- data/lib/nnq/routing/send_pump.rb +27 -6
- data/lib/nnq/routing/sub.rb +4 -0
- data/lib/nnq/routing/surveyor.rb +138 -0
- data/lib/nnq/routing/surveyor_raw.rb +107 -0
- data/lib/nnq/socket.rb +51 -8
- data/lib/nnq/surveyor_respondent.rb +98 -0
- data/lib/nnq/transport/inproc.rb +5 -0
- data/lib/nnq/transport/ipc.rb +3 -0
- data/lib/nnq/transport/tcp.rb +27 -5
- data/lib/nnq/version.rb +1 -1
- data/lib/nnq.rb +2 -0
- metadata +13 -3
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0a336ac1e24bc6210ddeac6731163baab6f8980d9eebbe3235fac13157416fcf
+  data.tar.gz: f049bf9038235487966cae1c8920b452abf9984e3776b2b93cbbd95e067fb485
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: dcef45943a41f1bc53bbcf8ecc0c34c2779e4a16dea04e8f78854cf6a9debe11168596b47b764728e49393f61c319d41e2d79e489b37dec809adc59cbc0741f0
+  data.tar.gz: fd7aaa30c57d5b8fbfd6279aba8b96423e201837ca4ef0144a315ec020da5e0183aaede0c7327fc49bf1bd318894099e4fe02657a12aebf98c1d895582583c62
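The new checksums can be verified against a downloaded artifact with Ruby's stdlib. A minimal sketch (the file path is illustrative, not part of this diff):

```ruby
require "digest"
require "tempfile"

# Compare a file's SHA-256 digest to the value recorded in
# checksums.yaml. Returns true when the artifact is intact.
def checksum_matches?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256
end
```

The same check with `Digest::SHA512` covers the second block of the diff.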
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,119 @@
 # Changelog
 
+## 0.6.0 — 2026-04-15
+
+- **NNG-style raw mode for REQ/REP and SURVEYOR/RESPONDENT.** Constructing
+  any of the four with `raw: true` bypasses the cooked state machine
+  (request-id tracking, pending-reply slot, survey window) and exposes
+  the full SP backtrace header as an opaque, caller-supplied handle.
+  - `#receive` returns `[pipe, header, body]` where `pipe` is the live
+    `NNQ::Connection` that delivered the message (idiomatic Ruby handle
+    — no opaque pipe_id token, no lookup registry), `header` is the
+    parsed backtrace bytes, and `body` is the payload.
+  - Raw REQ/SURVEYOR send: `send(body, header:)` — fans round-robin /
+    fans out.
+  - Raw REP/RESPONDENT send: `send(body, to:, header:)` — routes
+    directly to a prior `pipe` with the stored `header` written
+    verbatim, so the cooked peer matches the reply. Closed peer or
+    over-TTL header → silent drop (matches NNG behavior).
+  - Cooked-mode methods (`send_request`, `send_reply`, `send_survey`)
+    raise `NNQ::Error` in raw mode and vice versa.
+  - Unblocks proxy/device-style use cases (forwarders, request routers)
+    without touching the cooked code paths. `lib/nnq/routing/{req,rep,
+    surveyor,respondent}_raw.rb` live alongside their cooked siblings;
+    `build_routing` branches on `@raw` inside REQ0/REP0/SURVEYOR0/
+    RESPONDENT0. PUB/SUB and PUSH/PULL raw are still out of scope.
+- **Zero-alloc cooked send paths via protocol-sp `header:` kwarg.**
+  `Connection#send_message` / `#write_message` grow an optional
+  `header:` kwarg that protocol-sp writes between the SP length prefix
+  and the body as a third buffered write (coalesced into a single
+  `writev`). Cooked `Req#send_request`, `Rep#send_reply`, and
+  `Respondent#send_reply` no longer allocate the `header + body`
+  intermediate String on every send — the savings apply to every
+  REQ/REP round trip regardless of whether raw mode is used.
+  Requires `protocol-sp >= 0.3`.
+- **`Options#recv_hwm`** — new option, defaults to `Options::DEFAULT_HWM`
+  (same as `send_hwm`). Bounds the raw routing strategies' receive
+  queues; the cooked paths still use their existing (unbounded) state
+  and are unaffected.
+
+## 0.5.0 — 2026-04-15
+
+- **Send-path freezes the body** — every public send method (PUSH,
+  PUB, PAIR, BUS, REQ, REP, SURVEYOR, RESPONDENT) routes the body
+  through `Socket#frozen_binary`, which coerces to a frozen binary
+  string. Fast path: already frozen and binary → returned as-is, no
+  allocation. Slow path: `body.b.freeze` (one copy). Prevents a
+  caller from mutating the string after it has been enqueued (the
+  body can sit in a send queue or per-peer queue until a pump
+  writes it).
+- **Hot-path: no kwargs splat on verbose monitor emit** —
+  `emit_verbose_monitor_event(type, **detail)` replaced with dedicated
+  `emit_verbose_msg_sent(body)` / `emit_verbose_msg_received(body)`
+  helpers. Early-returns before allocating the detail hash, so the
+  send/recv loops pay nothing when `-vvv` is off. Send pump also
+  hoists the `verbose_monitor` check out of the batch `.each`.
+- **YJIT-friendly `all?` blocks** — `@queues.each_value.all?(&:empty?)`
+  → explicit `{ |q| q.empty? }` in pub/bus/surveyor `drained?`
+  (YJIT specializes explicit blocks, not `Symbol#to_proc`).
+- **`Reactor.run` uses `Async::Promise`** — replaces the
+  `Thread::Queue` + manual `[:ok,val]`/`[:error,exc]` tagging with a
+  single `result.fulfill { block.call }` + `result.wait` pair.
+- **`Engine#spawn_task(parent:)`** — renamed from `barrier:` to make it
+  clear any parent barrier is accepted, not just the socket-level one.
+- **`linger` default → `Float::INFINITY`** — matches libzmq parity.
+  `Socket#close` waits forever for the send queue to drain. Pass
+  `linger: 0` for the old drop-on-close behavior.
+- **`Socket.new` accepts a block** — File.open-style. The socket is
+  yielded to the block and `#close`d when the block returns (or
+  raises).
+- **`drain_send_queue` rescues `Async::Stop`** — parent-task
+  cancellation during close no longer propagates out of the ensure
+  path; the rest of teardown runs.
+- **Hot-path `Array#first`** — `send_pump` uses `Array#first` instead
+  of `[0]` for YJIT specialization.
+- **Barrier-based cascading teardown** — `SocketLifecycle` owns a
+  socket-level `Async::Barrier`; `ConnectionLifecycle` creates a nested
+  per-connection barrier. All pumps, accept loops, reconnect loops, and
+  supervisors live under these barriers. `Engine#close` calls
+  `barrier.stop` once and every descendant unwinds atomically. Replaces
+  the manual `@tasks` array.
+- **Per-connection supervisor** — each connection spawns a supervisor
+  task (on the socket barrier) that watches for the first pump exit and
+  runs `lost!` in `ensure`. Placing the supervisor outside the
+  per-connection barrier avoids the self-stop footgun.
+- **Connect timeout** — `Transport::TCP.connect` uses
+  `Socket.tcp(host, port, connect_timeout:)` instead of `TCPSocket.new`.
+  Timeout derived from `reconnect_interval` (floor 0.5s). Fixes macOS
+  hang where IPv6 `connect(2)` never delivers `ECONNREFUSED`.
+- **Handshake timeout** — SP greeting exchange wrapped in
+  `Async::Task#with_timeout(handshake_timeout)`. Prevents a hang when a
+  non-NNG service accepts the TCP connection but never sends a greeting.
+- **Reconnect after handshake failure** — `ConnectionLifecycle#handshake!`
+  now calls `tear_down!(reconnect: true)` on error instead of bare
+  `transition!(:closed)`, so the endpoint doesn't go dead when a peer
+  RSTs mid-handshake.
+- **Quantized reconnect sleeps** — `Reconnect#quantized_wait` aligns
+  retries to wall-clock grid boundaries. Multiple clients reconnecting
+  with the same interval wake at the same instant.
+- **Send pump fairness yield** — `Async::Task.current.yield` after each
+  batch write ensures peer pumps get a turn when the queue stays
+  non-empty.
+- Add `DESIGN.md` documenting the architecture.
+- **Versioned socket names** — `PUSH` → `PUSH0`, `PULL` → `PULL0`, etc.
+  Canonical names now include the SP protocol version. Unversioned
+  aliases (`NNQ::PUSH = NNQ::PUSH0`) are kept for backward compat.
+- **`raw:` kwarg** — `Socket#initialize` accepts `raw: false`. Plumbing
+  for raw-mode routing (device/proxy support). No functional raw
+  routing yet.
+- **`NNQ::BUS0`** — best-effort bidirectional mesh (bus0). Fan-out send
+  to all peers (drop when full), shared recv queue. Self-pairing.
+- **`NNQ::SURVEYOR0` / `NNQ::RESPONDENT0`** — survey/response pattern
+  (survey0). Surveyor broadcasts a survey with a timed reply window
+  (`options.survey_time`, default 1s). Respondent echoes the backtrace
+  like REP. Shared `Routing::Backtrace` module extracted from REP.
+- **`NNQ::TimedOut`** error raised when the survey window expires.
+
 ## 0.4.0 — 2026-04-09
 
 - `Socket#all_peers_gone` — `Async::Promise` resolving the first time
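The 0.5.0 `Socket#frozen_binary` entry describes a two-path coercion. A standalone sketch of that logic, inferred from the changelog wording rather than copied from the gem source:

```ruby
# Coerce a message body to a frozen binary String. Fast path: an
# already-frozen binary string is returned untouched (no allocation).
# Slow path: one copy via String#b, then freeze the copy. Freezing
# prevents the caller from mutating the body while it sits in a send
# queue waiting for a pump to write it.
def frozen_binary(body)
  return body if body.frozen? && body.encoding == Encoding::BINARY
  body.b.freeze
end
```

Callers can therefore pre-freeze hot-loop payloads once and pay zero per-send allocation.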
data/lib/nnq/bus.rb
ADDED

@@ -0,0 +1,37 @@
+# frozen_string_literal: true
+
+require_relative "socket"
+require_relative "routing/bus"
+
+module NNQ
+  # BUS (nng bus0): best-effort bidirectional mesh. Every message sent
+  # goes to all directly connected peers. Every message received from
+  # any peer is delivered to the application. Self-pairing (BUS ↔ BUS).
+  #
+  # Send never blocks — if a peer's queue is full, the message is
+  # dropped for that peer (matching nng's best-effort semantics).
+  #
+  class BUS0 < Socket
+    def send(body)
+      body = frozen_binary(body)
+      Reactor.run { @engine.routing.send(body) }
+    end
+
+
+    def receive
+      Reactor.run { @engine.routing.receive }
+    end
+
+
+    private
+
+    def protocol
+      Protocol::SP::Protocols::BUS_V0
+    end
+
+
+    def build_routing(engine)
+      Routing::Bus.new(engine)
+    end
+  end
+end
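The drop-when-full semantics described in the class comment can be modeled with plain arrays standing in for the per-peer queues. A toy sketch (`Routing::Bus` itself is not shown in this diff; `bus_fan_out` is a hypothetical name):

```ruby
# Toy model of bus0 fan-out: enqueue the body on every peer queue,
# dropping it for any peer whose queue is already at the high-water
# mark. Send never blocks and never raises on a full queue.
def bus_fan_out(queues, body, hwm)
  queues.each { |q| q << body if q.size < hwm }
end
```

Slow consumers lose messages instead of back-pressuring the whole mesh, which is the trade-off bus0 makes.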
data/lib/nnq/connection.rb
CHANGED

@@ -12,9 +12,11 @@ module NNQ
     # @return [Protocol::SP::Connection]
     attr_reader :sp
 
+
     # @return [String, nil] endpoint URI we connected to / accepted from
     attr_reader :endpoint
 
+
     # @param sp [Protocol::SP::Connection] handshake-completed SP connection
     # @param endpoint [String, nil]
     def initialize(sp, endpoint: nil)
@@ -25,16 +27,20 @@ module NNQ
 
 
     # @return [Integer] peer protocol id (e.g. Protocols::PULL_V0)
-    def peer_protocol
+    def peer_protocol
+      @sp.peer_protocol
+    end
 
 
     # Writes one message into the SP connection's send buffer (no flush).
     #
     # @param body [String]
+    # @param header [String, nil] optional binary prefix written between
+    #   the SP length prefix and body (see Protocol::SP::Connection)
     # @return [void]
-    def write_message(body)
+    def write_message(body, header: nil)
       raise ClosedError, "connection closed" if @closed
-      @sp.write_message(body)
+      @sp.write_message(body, header: header)
     end
 
 
@@ -53,10 +59,11 @@ module NNQ
     # each call is request-paced and there's nothing to batch.
     #
     # @param body [String]
+    # @param header [String, nil] optional binary prefix
     # @return [void]
-    def send_message(body)
+    def send_message(body, header: nil)
       raise ClosedError, "connection closed" if @closed
-      @sp.send_message(body)
+      @sp.send_message(body, header: header)
     end
 
 
@@ -77,7 +84,9 @@ module NNQ
 
 
     # @return [Boolean]
-    def closed?
+    def closed?
+      @closed
+    end
 
 
     # Closes the underlying SP connection. Safe to call twice.
@@ -86,5 +95,6 @@ module NNQ
       @closed = true
       @sp.close
     end
+
   end
 end
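The `header:` kwarg slots the backtrace bytes between the SP length prefix and the body, avoiding the `header + body` intermediate String. A self-contained sketch of that wire layout, assuming nanomsg's TCP mapping (64-bit big-endian message length); `StringIO` stands in for the socket and `write_sp_message` is a hypothetical name, not the protocol-sp API:

```ruby
require "stringio"

# Write one SP message as three buffered writes: [length][header][body].
# The length covers header + body together, so the receiver sees a
# single message even though the sender never concatenated the two.
def write_sp_message(io, body, header: nil)
  length = body.bytesize + (header ? header.bytesize : 0)
  io.write([length].pack("Q>")) # 8-byte big-endian length prefix
  io.write(header) if header
  io.write(body)
end
```

On a real socket the three writes coalesce into one `writev`, which is the zero-alloc win the changelog describes.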
data/lib/nnq/engine/connection_lifecycle.rb
CHANGED

@@ -1,5 +1,6 @@
 # frozen_string_literal: true
 
+require "async/barrier"
 require "protocol/sp"
 require_relative "../connection"
 
@@ -42,6 +43,12 @@ module NNQ
     # @return [Symbol]
     attr_reader :state
 
+    # @return [Async::Barrier] holds all per-connection pump tasks
+    #   (send pump, recv pump). When the connection is torn down,
+    #   {#tear_down!} calls `@barrier.stop` to cancel every sibling
+    #   task atomically.
+    attr_reader :barrier
+
 
     # @param engine [Engine]
     # @param endpoint [String, nil]
@@ -52,6 +59,7 @@ module NNQ
       @framing = framing
       @state = :new
       @conn = nil
+      @barrier = Async::Barrier.new(parent: engine.barrier)
     end
 
 
@@ -68,13 +76,15 @@ module NNQ
         max_message_size: @engine.options.max_message_size,
         framing: @framing,
       )
-      sp.handshake!
+      Async::Task.current.with_timeout(handshake_timeout) { sp.handshake! }
      ready!(NNQ::Connection.new(sp, endpoint: @endpoint))
      @conn
-    rescue =>
-      @engine.emit_monitor_event(:handshake_failed, endpoint: @endpoint, detail: { error:
+    rescue Protocol::SP::Error, *CONNECTION_LOST, Async::TimeoutError => error
+      @engine.emit_monitor_event(:handshake_failed, endpoint: @endpoint, detail: { error: error })
      io.close rescue nil
-
+      # Full tear-down with reconnect: without this, the endpoint
+      # goes dead when a peer RSTs mid-handshake.
+      tear_down!(reconnect: true)
      raise
    end
 
@@ -83,16 +93,28 @@ module NNQ
     # asks the engine to schedule a reconnect (if the endpoint is in
     # the dialed set and reconnect is still enabled).
     def lost!
-
-      tear_down!
-      @engine.maybe_reconnect(ep)
+      tear_down!(reconnect: true)
     end
 
 
     # Deliberate close (engine shutdown or routing eviction). Does
     # not trigger reconnect.
     def close!
-      tear_down!
+      tear_down!(reconnect: false)
+    end
+
+
+    # Starts a supervisor for this connection. Must be called after
+    # all per-connection pumps (recv loop, send pump) have been
+    # spawned on the connection barrier. The supervisor blocks until
+    # the first pump exits, then runs tear_down! via lost!.
+    #
+    # Called by Engine#handle_accepted / Engine#handle_connected after
+    # spawning the recv loop — routing's connection_added may have
+    # already spawned send pumps during ready!, so the barrier is
+    # guaranteed non-empty by then.
+    def start_supervisor!
+      start_supervisor unless @barrier.empty?
     end
 
 
@@ -106,7 +128,7 @@ module NNQ
       @engine.routing.connection_added(conn) if @engine.routing.respond_to?(:connection_added)
     rescue ConnectionRejected
       @engine.emit_monitor_event(:connection_rejected, endpoint: @endpoint)
-      tear_down!
+      tear_down!(reconnect: false)
       raise
     end
     @engine.lifecycle.peer_connected.resolve(conn) unless @engine.lifecycle.peer_connected.resolved?
@@ -116,7 +138,7 @@ module NNQ
     end
 
 
-    def tear_down!
+    def tear_down!(reconnect: false)
       return if @state == :closed
       transition!(:closed)
       if @conn
@@ -126,6 +148,35 @@ module NNQ
         @engine.emit_monitor_event(:disconnected, endpoint: @endpoint)
         @engine.resolve_all_peers_gone_if_empty
       end
+      @engine.maybe_reconnect(@endpoint) if reconnect
+      # Cancel every sibling pump of this connection. The caller is
+      # the supervisor task, which is NOT in the barrier — so there
+      # is no self-stop risk.
+      @barrier.stop
+    end
+
+
+    # Spawns a supervisor task on the *socket-level* barrier (not the
+    # per-connection barrier) that blocks on the first pump to finish
+    # and then triggers teardown.
+    def start_supervisor
+      @engine.barrier.async(transient: true, annotation: "conn supervisor") do
+        @barrier.wait { |task| task.wait; break }
+      rescue Async::Stop, Async::Cancel
+      rescue *CONNECTION_LOST
+      ensure
+        lost!
+      end
+    end
+
+
+    # Handshake timeout: same logic as TCP.connect_timeout — derived
+    # from reconnect_interval (floor 0.5s). Prevents a hang when the
+    # peer accepts the TCP connection but never sends an SP greeting.
+    def handshake_timeout
+      ri = @engine.options.reconnect_interval
+      ri = ri.end if ri.is_a?(Range)
+      [ri, 0.5].max
     end
 
 
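The `handshake_timeout` derivation above is pure arithmetic and easy to check in isolation. A standalone copy of the logic from the hunk, taking the interval as an argument instead of reading it from `@engine.options`:

```ruby
# Derive the handshake timeout from a reconnect interval that may be
# either a single number or a Range: take the Range's upper bound,
# then clamp to a 0.5-second floor.
def handshake_timeout(reconnect_interval)
  ri = reconnect_interval
  ri = ri.end if ri.is_a?(Range)
  [ri, 0.5].max
end
```

The floor guarantees a very aggressive reconnect interval never makes the SP greeting exchange impossible to complete.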
data/lib/nnq/engine/reconnect.rb
CHANGED

@@ -55,10 +55,10 @@ module NNQ
     def run(parent_task, delay: nil)
       delay, max_delay = init_delay(delay)
 
-
+      parent_task.async(transient: true, annotation: "nnq reconnect #{@endpoint}") do
         loop do
           break if @engine.closed?
-          sleep delay if delay > 0
+          sleep quantized_wait(delay) if delay > 0
           break if @engine.closed?
           begin
             @engine.transport_for(@endpoint).connect(@endpoint, @engine)
@@ -70,13 +70,22 @@ module NNQ
         end
       rescue Async::Stop
       end
-      @engine.tasks << task
     end
 
 
     private
 
 
+    # Wall-clock quantized sleep: wait until the next +delay+-sized
+    # grid tick. Multiple clients reconnecting with the same interval
+    # wake up at the same instant, collapsing staggered retries into
+    # aligned waves.
+    def quantized_wait(delay, now = Time.now.to_f)
+      wait = delay - (now % delay)
+      wait.positive? ? wait : delay
+    end
+
+
     def init_delay(delay)
       ri = @options.reconnect_interval
       if ri.is_a?(Range)
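The `quantized_wait` grid alignment can be exercised standalone. This is a copy of the method from the hunk above, called with an explicit `now` instead of the wall clock:

```ruby
# Sleep just long enough to land on the next wall-clock multiple of
# +delay+. The positive? guard handles the edge where now falls
# exactly on (or, through float error, fractionally past) a tick: we
# then wait one full interval rather than zero.
def quantized_wait(delay, now = Time.now.to_f)
  wait = delay - (now % delay)
  wait.positive? ? wait : delay
end
```

Two clients that went down at different moments inside the same interval compute different waits but wake at the same grid tick, which is the "aligned waves" behavior the comment describes.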
|
@@ -1,5 +1,6 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
|
+
require "async/barrier"
|
|
3
4
|
require "async/promise"
|
|
4
5
|
|
|
5
6
|
module NNQ
|
|
@@ -42,9 +43,14 @@ module NNQ
|
|
|
42
43
|
# Edge-triggered: does not re-arm on reconnect.
|
|
43
44
|
attr_reader :all_peers_gone
|
|
44
45
|
|
|
46
|
+
# @return [Async::Barrier, nil] holds every socket-scoped task
|
|
47
|
+
# (connection supervisors, reconnect loops, accept loops).
|
|
48
|
+
# {Engine#close} calls +barrier.stop+ to cascade teardown
|
|
49
|
+
# through every per-connection barrier in one shot.
|
|
50
|
+
attr_reader :barrier
|
|
51
|
+
|
|
45
52
|
# @return [Boolean] when false, the engine must not schedule new
|
|
46
|
-
# reconnect attempts. Default true.
|
|
47
|
-
# reconnect loop yet, so this currently just records intent.
|
|
53
|
+
# reconnect attempts. Default true.
|
|
48
54
|
attr_accessor :reconnect_enabled
|
|
49
55
|
|
|
50
56
|
|
|
@@ -55,6 +61,7 @@ module NNQ
|
|
|
55
61
|
@peer_connected = Async::Promise.new
|
|
56
62
|
@all_peers_gone = Async::Promise.new
|
|
57
63
|
@reconnect_enabled = true
|
|
64
|
+
@barrier = nil
|
|
58
65
|
end
|
|
59
66
|
|
|
60
67
|
|
|
@@ -75,6 +82,7 @@ module NNQ
|
|
|
75
82
|
return false if @parent_task
|
|
76
83
|
@parent_task = task
|
|
77
84
|
@on_io_thread = on_io_thread
|
|
85
|
+
@barrier = Async::Barrier.new(parent: @parent_task)
|
|
78
86
|
transition!(:open)
|
|
79
87
|
true
|
|
80
88
|
end
|