nnq 0.2.0 → 0.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +118 -0
- data/lib/nnq/bus.rb +37 -0
- data/lib/nnq/connection.rb +9 -2
- data/lib/nnq/engine/connection_lifecycle.rb +72 -12
- data/lib/nnq/engine/reconnect.rb +112 -0
- data/lib/nnq/engine/socket_lifecycle.rb +40 -3
- data/lib/nnq/engine.rb +186 -35
- data/lib/nnq/error.rb +26 -6
- data/lib/nnq/monitor_event.rb +18 -0
- data/lib/nnq/options.rb +8 -1
- data/lib/nnq/pair.rb +6 -1
- data/lib/nnq/pub_sub.rb +9 -2
- data/lib/nnq/push_pull.rb +16 -3
- data/lib/nnq/reactor.rb +12 -11
- data/lib/nnq/req_rep.rb +10 -2
- data/lib/nnq/routing/backtrace.rb +39 -0
- data/lib/nnq/routing/bus.rb +108 -0
- data/lib/nnq/routing/pair.rb +10 -1
- data/lib/nnq/routing/pub.rb +9 -4
- data/lib/nnq/routing/pull.rb +10 -1
- data/lib/nnq/routing/push.rb +2 -0
- data/lib/nnq/routing/rep.rb +10 -20
- data/lib/nnq/routing/req.rb +6 -2
- data/lib/nnq/routing/respondent.rb +84 -0
- data/lib/nnq/routing/send_pump.rb +27 -5
- data/lib/nnq/routing/sub.rb +9 -0
- data/lib/nnq/routing/surveyor.rb +138 -0
- data/lib/nnq/socket.rb +102 -5
- data/lib/nnq/surveyor_respondent.rb +78 -0
- data/lib/nnq/transport/inproc.rb +5 -0
- data/lib/nnq/transport/ipc.rb +3 -0
- data/lib/nnq/transport/tcp.rb +27 -5
- data/lib/nnq/version.rb +1 -1
- data/lib/nnq.rb +10 -0
- metadata +11 -3
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 68a6dd62dc097b93740827c44f95bbfa983d5c7d6072b4625257bd2350ba23fe
+  data.tar.gz: 376c1ef08eda16a8950ae703d1798256c8c5ba720cd1e5d0e988f1a87067c093
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f88c29c241ea5922930342a11c00181007cf698045b219c6e7cc111d2563e37e7e56c64efbb61b48efa608eaa6ab3e7104bc33983c6202410f1745a50a082012
+  data.tar.gz: e1e42983059b5a280495216a08b6e6ac6a9e6eff9385a2093ad62b9ac7698a80d24a2f36f14c4f4f9579962b7fde9c83a042f04e1819e5963ba2ae94b5cbed4f
data/CHANGELOG.md
CHANGED

@@ -1,5 +1,123 @@
 # Changelog
 
+## 0.5.0 — 2026-04-15
+
+- **Send-path freezes the body** — every public send method (PUSH,
+  PUB, PAIR, BUS, REQ, REP, SURVEYOR, RESPONDENT) routes the body
+  through `Socket#frozen_binary`, which coerces to a frozen binary
+  string. Fast path: already frozen and binary → returned as-is, no
+  allocation. Slow path: `body.b.freeze` (one copy). Prevents a
+  caller from mutating the string after it has been enqueued (the
+  body can sit in a send queue or per-peer queue until a pump
+  writes it).
+- **Hot-path: no kwargs splat on verbose monitor emit** —
+  `emit_verbose_monitor_event(type, **detail)` replaced with dedicated
+  `emit_verbose_msg_sent(body)` / `emit_verbose_msg_received(body)`
+  helpers. Early-returns before allocating the detail hash, so the
+  send/recv loops pay nothing when `-vvv` is off. Send pump also
+  hoists the `verbose_monitor` check out of the batch `.each`.
+- **YJIT-friendly `all?` blocks** — `@queues.each_value.all?(&:empty?)`
+  → explicit `{ |q| q.empty? }` in pub/bus/surveyor `drained?`
+  (YJIT specializes explicit blocks, not `Symbol#to_proc`).
+- **`Reactor.run` uses `Async::Promise`** — replaces the
+  `Thread::Queue` + manual `[:ok,val]`/`[:error,exc]` tagging with a
+  single `result.fulfill { block.call }` + `result.wait` pair.
+- **`Engine#spawn_task(parent:)`** — renamed from `barrier:` to make it
+  clear any parent barrier is accepted, not just the socket-level one.
+- **`linger` default → `Float::INFINITY`** — matches libzmq parity.
+  `Socket#close` waits forever for the send queue to drain. Pass
+  `linger: 0` for the old drop-on-close behavior.
+- **`Socket.new` accepts a block** — File.open-style. The socket is
+  yielded to the block and `#close`d when the block returns (or
+  raises).
+- **`drain_send_queue` rescues `Async::Stop`** — parent-task
+  cancellation during close no longer propagates out of the ensure
+  path; the rest of teardown runs.
+- **Hot-path `Array#first`** — `send_pump` uses `Array#first` instead
+  of `[0]` for YJIT specialization.
+- **Barrier-based cascading teardown** — `SocketLifecycle` owns a
+  socket-level `Async::Barrier`; `ConnectionLifecycle` creates a nested
+  per-connection barrier. All pumps, accept loops, reconnect loops, and
+  supervisors live under these barriers. `Engine#close` calls
+  `barrier.stop` once and every descendant unwinds atomically. Replaces
+  the manual `@tasks` array.
+- **Per-connection supervisor** — each connection spawns a supervisor
+  task (on the socket barrier) that watches for the first pump exit and
+  runs `lost!` in `ensure`. Placing the supervisor outside the
+  per-connection barrier avoids the self-stop footgun.
+- **Connect timeout** — `Transport::TCP.connect` uses
+  `Socket.tcp(host, port, connect_timeout:)` instead of `TCPSocket.new`.
+  Timeout derived from `reconnect_interval` (floor 0.5s). Fixes macOS
+  hang where IPv6 `connect(2)` never delivers `ECONNREFUSED`.
+- **Handshake timeout** — SP greeting exchange wrapped in
+  `Async::Task#with_timeout(handshake_timeout)`. Prevents a hang when a
+  non-NNG service accepts the TCP connection but never sends a greeting.
+- **Reconnect after handshake failure** — `ConnectionLifecycle#handshake!`
+  now calls `tear_down!(reconnect: true)` on error instead of bare
+  `transition!(:closed)`, so the endpoint doesn't go dead when a peer
+  RSTs mid-handshake.
+- **Quantized reconnect sleeps** — `Reconnect#quantized_wait` aligns
+  retries to wall-clock grid boundaries. Multiple clients reconnecting
+  with the same interval wake at the same instant.
+- **Send pump fairness yield** — `Async::Task.current.yield` after each
+  batch write ensures peer pumps get a turn when the queue stays
+  non-empty.
+- Add `DESIGN.md` documenting the architecture.
+- **Versioned socket names** — `PUSH` → `PUSH0`, `PULL` → `PULL0`, etc.
+  Canonical names now include the SP protocol version. Unversioned
+  aliases (`NNQ::PUSH = NNQ::PUSH0`) are kept for backward compat.
+- **`raw:` kwarg** — `Socket#initialize` accepts `raw: false`. Plumbing
+  for raw-mode routing (device/proxy support). No functional raw
+  routing yet.
+- **`NNQ::BUS0`** — best-effort bidirectional mesh (bus0). Fan-out send
+  to all peers (drop when full), shared recv queue. Self-pairing.
+- **`NNQ::SURVEYOR0` / `NNQ::RESPONDENT0`** — survey/response pattern
+  (survey0). Surveyor broadcasts a survey with a timed reply window
+  (`options.survey_time`, default 1s). Respondent echoes the backtrace
+  like REP. Shared `Routing::Backtrace` module extracted from REP.
+- **`NNQ::TimedOut`** error raised when the survey window expires.
+
+## 0.4.0 — 2026-04-09
+
+- `Socket#all_peers_gone` — `Async::Promise` resolving the first time
+  the connection set becomes empty after at least one peer connected.
+  Edge-triggered, ported from OMQ.
+- `Socket#close_read` — closes the recv side only. Buffered messages
+  drain, then `#receive` returns `nil`. Send side stays operational.
+- `Socket#reconnect_enabled` / `#reconnect_enabled=` — flipped by
+  transient-mode consumers before draining to prevent the background
+  reconnect loop from revivifying a dying socket.
+- `Socket#monitor` / `NNQ::MonitorEvent` — lifecycle event stream
+  emitting `:listening`, `:connect_delayed`, `:connect_retried`,
+  `:connected`, `:handshake_succeeded`/`_failed`, `:disconnected`,
+  `:closed`, and (when `verbose: true`) `:message_sent` /
+  `:message_received`. Ported from OMQ, minus the heartbeat/mechanism
+  events nnq doesn't have.
+- Background reconnect — `NNQ::Engine::Reconnect` runs a `transient: true`
+  task per dialed endpoint, retrying with exponential back-off bounded
+  by `options.reconnect_interval` (Numeric or Range). `connect` becomes
+  non-blocking for `tcp://` and `ipc://`; `inproc://` stays synchronous.
+  `CONNECTION_FAILED` / `CONNECTION_LOST` mutable-at-load-time registries
+  let plugins append transport-specific error classes.
+- `NNQ::PULL#receive` honors `options.read_timeout` via
+  `Fiber.scheduler.with_timeout`. Previously the option was declared
+  but inert.
+- `NNQ.freeze_for_ractors!` — freezes `Engine::CONNECTION_FAILED`,
+  `Engine::CONNECTION_LOST`, and `Engine::TRANSPORTS` so NNQ sockets
+  can be used from non-main Ractors. Required for nnq-cli's `pipe -P N`
+  parallel worker mode.
+
+## 0.3.0 — 2026-04-09
+
+- `Socket#peer_connected` — `Async::Promise` that resolves with the
+  first connected peer (or `nil` on close without any peers). Ported
+  from OMQ. Held on `SocketLifecycle`, resolved by `ConnectionLifecycle`
+  on first `ready!`, and edge-triggered so callers don't need to poll.
+- `bench/` — main throughput suite ported from OMQ. Four patterns
+  (push/pull, req/rep, pair, pub/sub) across inproc, ipc, and tcp.
+  Calibration-driven burst sizing, fastest-of-3 reporting, regression
+  report with `--update-readme` to regenerate README tables.
+
 ## 0.2.0 — 2026-04-09
 
 - `NNQ::PUB` / `NNQ::SUB` with local prefix filtering (pub0/sub0).
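The 0.5.0 send-path change is small enough to sketch standalone. The following is an illustrative reimplementation of the `frozen_binary` coercion the changelog describes — not the gem's actual source — showing the fast path (no allocation) and the slow path (one copy):

```ruby
# Illustrative sketch of the frozen_binary coercion described in the
# 0.5.0 notes above (not the gem's actual code).
def frozen_binary(body)
  # Fast path: already frozen and binary-encoded — returned as-is, no allocation.
  return body if body.frozen? && body.encoding == Encoding::BINARY

  # Slow path: one copy — re-encode to binary, then freeze.
  body.b.freeze
end

body = +"hello"
queued = frozen_binary(body)
body << " world"   # caller mutation no longer reaches the enqueued copy
queued             # still "hello", frozen, binary-encoded
```

The point of the coercion is the last two lines: once the body is enqueued, nothing the caller does to its string can corrupt what the pump eventually writes.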
data/lib/nnq/bus.rb
ADDED

@@ -0,0 +1,37 @@
+# frozen_string_literal: true
+
+require_relative "socket"
+require_relative "routing/bus"
+
+module NNQ
+  # BUS (nng bus0): best-effort bidirectional mesh. Every message sent
+  # goes to all directly connected peers. Every message received from
+  # any peer is delivered to the application. Self-pairing (BUS ↔ BUS).
+  #
+  # Send never blocks — if a peer's queue is full, the message is
+  # dropped for that peer (matching nng's best-effort semantics).
+  #
+  class BUS0 < Socket
+    def send(body)
+      body = frozen_binary(body)
+      Reactor.run { @engine.routing.send(body) }
+    end
+
+
+    def receive
+      Reactor.run { @engine.routing.receive }
+    end
+
+
+    private
+
+    def protocol
+      Protocol::SP::Protocols::BUS_V0
+    end
+
+
+    def build_routing(engine)
+      Routing::Bus.new(engine)
+    end
+  end
+end
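The drop-when-full fan-out itself lives in `Routing::Bus`, which is not shown in this diff. A minimal stand-in for those semantics — with a plain bounded queue in place of the gem's per-peer queues; `PeerQueue`, `offer`, and `bus_fan_out` are hypothetical names for illustration:

```ruby
# Hypothetical stand-in for bus0 fan-out: send never blocks; a full
# per-peer queue drops the message for that peer only.
PeerQueue = Struct.new(:max, :items) do
  def offer(msg)
    return false if items.size >= max  # full → drop for this peer
    items << msg
    true
  end
end

# Fan a message out to every peer; returns how many peers accepted it.
def bus_fan_out(queues, msg)
  queues.count { |q| q.offer(msg) }
end

fast = PeerQueue.new(2, [])
slow = PeerQueue.new(1, [])
bus_fan_out([fast, slow], "a")  # both peers accept
bus_fan_out([fast, slow], "b")  # slow's queue is full; only fast accepts
```

This is the best-effort trade-off the class comment describes: a slow peer loses messages rather than back-pressuring the whole mesh.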
data/lib/nnq/connection.rb
CHANGED

@@ -12,9 +12,11 @@ module NNQ
     # @return [Protocol::SP::Connection]
     attr_reader :sp
 
+
     # @return [String, nil] endpoint URI we connected to / accepted from
     attr_reader :endpoint
 
+
     # @param sp [Protocol::SP::Connection] handshake-completed SP connection
     # @param endpoint [String, nil]
     def initialize(sp, endpoint: nil)
@@ -25,7 +27,9 @@ module NNQ
 
 
     # @return [Integer] peer protocol id (e.g. Protocols::PULL_V0)
-    def peer_protocol
+    def peer_protocol
+      @sp.peer_protocol
+    end
 
 
     # Writes one message into the SP connection's send buffer (no flush).
@@ -77,7 +81,9 @@ module NNQ
 
 
     # @return [Boolean]
-    def closed?
+    def closed?
+      @closed
+    end
 
 
     # Closes the underlying SP connection. Safe to call twice.
@@ -86,5 +92,6 @@ module NNQ
       @closed = true
      @sp.close
     end
+
   end
 end
data/lib/nnq/engine/connection_lifecycle.rb
CHANGED

@@ -1,5 +1,6 @@
 # frozen_string_literal: true
 
+require "async/barrier"
 require "protocol/sp"
 require_relative "../connection"
 
@@ -42,6 +43,12 @@ module NNQ
       # @return [Symbol]
       attr_reader :state
 
+      # @return [Async::Barrier] holds all per-connection pump tasks
+      #   (send pump, recv pump). When the connection is torn down,
+      #   {#tear_down!} calls `@barrier.stop` to cancel every sibling
+      #   task atomically.
+      attr_reader :barrier
+
 
       # @param engine [Engine]
       # @param endpoint [String, nil]
@@ -52,6 +59,7 @@ module NNQ
         @framing = framing
         @state = :new
         @conn = nil
+        @barrier = Async::Barrier.new(parent: engine.barrier)
       end
 
 
@@ -68,28 +76,45 @@ module NNQ
           max_message_size: @engine.options.max_message_size,
           framing: @framing,
         )
-        sp.handshake!
+        Async::Task.current.with_timeout(handshake_timeout) { sp.handshake! }
         ready!(NNQ::Connection.new(sp, endpoint: @endpoint))
         @conn
-      rescue
+      rescue Protocol::SP::Error, *CONNECTION_LOST, Async::TimeoutError => error
+        @engine.emit_monitor_event(:handshake_failed, endpoint: @endpoint, detail: { error: error })
         io.close rescue nil
-
+        # Full tear-down with reconnect: without this, the endpoint
+        # goes dead when a peer RSTs mid-handshake.
+        tear_down!(reconnect: true)
         raise
       end
 
 
-      #
-      #
+      # Unexpected loss of an established connection. Tears down and
+      # asks the engine to schedule a reconnect (if the endpoint is in
+      # the dialed set and reconnect is still enabled).
       def lost!
-        tear_down!
+        tear_down!(reconnect: true)
       end
 
 
-      #
-      #
-      # reconnect yet, so the two behave identically.
+      # Deliberate close (engine shutdown or routing eviction). Does
+      # not trigger reconnect.
       def close!
-        tear_down!
+        tear_down!(reconnect: false)
+      end
+
+
+      # Starts a supervisor for this connection. Must be called after
+      # all per-connection pumps (recv loop, send pump) have been
+      # spawned on the connection barrier. The supervisor blocks until
+      # the first pump exits, then runs tear_down! via lost!.
+      #
+      # Called by Engine#handle_accepted / Engine#handle_connected after
+      # spawning the recv loop — routing's connection_added may have
+      # already spawned send pumps during ready!, so the barrier is
+      # guaranteed non-empty by then.
+      def start_supervisor!
+        start_supervisor unless @barrier.empty?
       end
 
 
@@ -102,24 +127,59 @@ module NNQ
         begin
           @engine.routing.connection_added(conn) if @engine.routing.respond_to?(:connection_added)
         rescue ConnectionRejected
-
+          @engine.emit_monitor_event(:connection_rejected, endpoint: @endpoint)
+          tear_down!(reconnect: false)
           raise
         end
+        @engine.lifecycle.peer_connected.resolve(conn) unless @engine.lifecycle.peer_connected.resolved?
+        @engine.emit_monitor_event(:handshake_succeeded, endpoint: @endpoint)
+        @engine.emit_monitor_event(:connected, endpoint: @endpoint)
         @engine.new_pipe.signal
       end
 
 
-      def tear_down!
+      def tear_down!(reconnect: false)
         return if @state == :closed
         transition!(:closed)
         if @conn
           @engine.connections.delete(@conn)
           @engine.routing.connection_removed(@conn) if @engine.routing.respond_to?(:connection_removed)
           @conn.close rescue nil
+          @engine.emit_monitor_event(:disconnected, endpoint: @endpoint)
+          @engine.resolve_all_peers_gone_if_empty
+        end
+        @engine.maybe_reconnect(@endpoint) if reconnect
+        # Cancel every sibling pump of this connection. The caller is
+        # the supervisor task, which is NOT in the barrier — so there
+        # is no self-stop risk.
+        @barrier.stop
+      end
+
+
+      # Spawns a supervisor task on the *socket-level* barrier (not the
+      # per-connection barrier) that blocks on the first pump to finish
+      # and then triggers teardown.
+      def start_supervisor
+        @engine.barrier.async(transient: true, annotation: "conn supervisor") do
+          @barrier.wait { |task| task.wait; break }
+        rescue Async::Stop, Async::Cancel
+        rescue *CONNECTION_LOST
+        ensure
+          lost!
         end
       end
 
+
+      # Handshake timeout: same logic as TCP.connect_timeout — derived
+      # from reconnect_interval (floor 0.5s). Prevents a hang when the
+      # peer accepts the TCP connection but never sends an SP greeting.
+      def handshake_timeout
+        ri = @engine.options.reconnect_interval
+        ri = ri.end if ri.is_a?(Range)
+        [ri, 0.5].max
+      end
+
+
       def transition!(new_state)
         allowed = TRANSITIONS[@state]
         unless allowed&.include?(new_state)
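The `handshake_timeout` derivation is a pure function of the reconnect interval and can be exercised in isolation. Extracted here for illustration (the real method reads `@engine.options.reconnect_interval` instead of taking an argument):

```ruby
# Standalone copy of the handshake_timeout derivation, for illustration:
# a Range reconnect_interval contributes its upper bound, and 0.5s is
# the floor in every case.
def handshake_timeout(reconnect_interval)
  ri = reconnect_interval
  ri = ri.end if ri.is_a?(Range)
  [ri, 0.5].max
end

handshake_timeout(0.1)        # below the floor → 0.5
handshake_timeout(2.0)        # → 2.0
handshake_timeout(0.25..8.0)  # Range → upper bound, 8.0
```

Using the Range's upper bound errs on the generous side: a peer that is slow but alive gets the full back-off window to complete its greeting before the timeout fires.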
data/lib/nnq/engine/reconnect.rb
ADDED

@@ -0,0 +1,112 @@
+# frozen_string_literal: true
+
+module NNQ
+  class Engine
+    # Connection errors that should trigger a reconnect retry rather
+    # than propagate. Mutable at load time so plugins (e.g. a future
+    # TLS transport) can append their own error classes; frozen on
+    # first {Engine#connect}.
+    CONNECTION_FAILED = [
+      Errno::ECONNREFUSED,
+      Errno::EHOSTUNREACH,
+      Errno::ENETUNREACH,
+      Errno::ENOENT,
+      Errno::EPIPE,
+      Errno::ETIMEDOUT,
+      Socket::ResolutionError,
+    ]
+
+    # Errors that indicate an established connection went away. Used
+    # by the recv loop and pumps to silently terminate (the connection
+    # lifecycle's #lost! handler decides whether to reconnect).
+    CONNECTION_LOST = [
+      EOFError,
+      IOError,
+      Errno::ECONNRESET,
+      Errno::EPIPE,
+    ]
+
+
+    # Schedules reconnect attempts with exponential back-off.
+    #
+    # Runs a background task that loops until a connection is
+    # established or the engine is closed. Caller is non-blocking:
+    # {Engine#connect} returns immediately and the actual dial happens
+    # inside the task.
+    #
+    class Reconnect
+      # @param endpoint [String]
+      # @param options [Options]
+      # @param parent_task [Async::Task]
+      # @param engine [Engine]
+      # @param delay [Numeric, nil] initial delay (defaults to reconnect_interval)
+      def self.schedule(endpoint, options, parent_task, engine, delay: nil)
+        new(engine, endpoint, options).run(parent_task, delay: delay)
+      end
+
+
+      def initialize(engine, endpoint, options)
+        @engine = engine
+        @endpoint = endpoint
+        @options = options
+      end
+
+
+      def run(parent_task, delay: nil)
+        delay, max_delay = init_delay(delay)
+
+        parent_task.async(transient: true, annotation: "nnq reconnect #{@endpoint}") do
+          loop do
+            break if @engine.closed?
+            sleep quantized_wait(delay) if delay > 0
+            break if @engine.closed?
+            begin
+              @engine.transport_for(@endpoint).connect(@endpoint, @engine)
+              break
+            rescue *CONNECTION_FAILED, *CONNECTION_LOST => e
+              delay = next_delay(delay, max_delay)
+              @engine.emit_monitor_event(:connect_retried, endpoint: @endpoint, detail: { interval: delay, error: e })
+            end
+          end
+        rescue Async::Stop
+        end
+      end
+
+
+      private
+
+
+      # Wall-clock quantized sleep: wait until the next +delay+-sized
+      # grid tick. Multiple clients reconnecting with the same interval
+      # wake up at the same instant, collapsing staggered retries into
+      # aligned waves.
+      def quantized_wait(delay, now = Time.now.to_f)
+        wait = delay - (now % delay)
+        wait.positive? ? wait : delay
+      end
+
+
+      def init_delay(delay)
+        ri = @options.reconnect_interval
+        if ri.is_a?(Range)
+          [delay || ri.begin, ri.end]
+        else
+          [delay || ri, nil]
+        end
+      end
+
+
+      def next_delay(delay, max_delay)
+        ri = @options.reconnect_interval
+        if ri.is_a?(Range)
+          delay = delay * 2
+          delay = [delay, max_delay].min if max_delay
+          delay = ri.begin if delay == 0
+          delay
+        else
+          ri
+        end
+      end
+    end
+  end
+end
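`quantized_wait` is plain modular arithmetic over wall-clock time. Copied out here unchanged (the method already accepts `now` explicitly, which makes the grid alignment easy to check in isolation):

```ruby
# quantized_wait from the file above, standalone: sleep long enough to
# land on the next delay-sized wall-clock grid tick.
def quantized_wait(delay, now = Time.now.to_f)
  wait = delay - (now % delay)
  wait.positive? ? wait : delay
end

# Two clients that fail at different instants wake on the same tick:
quantized_wait(2.0, 100.3)  # ≈ 1.7 → wakes at t = 102.0
quantized_wait(2.0, 101.1)  # ≈ 0.9 → also wakes at t = 102.0
quantized_wait(2.0, 100.0)  # already on a tick → full delay, 2.0
```

This is the opposite of jittered back-off: instead of spreading retries out, it deliberately collapses them into aligned waves, which is what the changelog's "wake at the same instant" refers to.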
data/lib/nnq/engine/socket_lifecycle.rb
CHANGED

@@ -1,5 +1,8 @@
 # frozen_string_literal: true
 
+require "async/barrier"
+require "async/promise"
+
 module NNQ
   class Engine
     # Owns the socket-level state: `:new → :open → :closing → :closed`
@@ -31,11 +34,34 @@ module NNQ
       # @return [Boolean] true if parent_task is the shared Reactor thread
       attr_reader :on_io_thread
 
+      # @return [Async::Promise] resolves with the first connected peer
+      #   (or nil if the socket closes before anyone connects)
+      attr_reader :peer_connected
+
+      # @return [Async::Promise] resolves with true the first time the
+      #   connection set becomes empty after at least one peer connected.
+      #   Edge-triggered: does not re-arm on reconnect.
+      attr_reader :all_peers_gone
+
+      # @return [Async::Barrier, nil] holds every socket-scoped task
+      #   (connection supervisors, reconnect loops, accept loops).
+      #   {Engine#close} calls +barrier.stop+ to cascade teardown
+      #   through every per-connection barrier in one shot.
+      attr_reader :barrier
+
+      # @return [Boolean] when false, the engine must not schedule new
+      #   reconnect attempts. Default true.
+      attr_accessor :reconnect_enabled
+
 
       def initialize
-        @state
-        @parent_task
-        @on_io_thread
+        @state = :new
+        @parent_task = nil
+        @on_io_thread = false
+        @peer_connected = Async::Promise.new
+        @all_peers_gone = Async::Promise.new
+        @reconnect_enabled = true
+        @barrier = nil
       end
 
 
@@ -56,6 +82,7 @@ module NNQ
        return false if @parent_task
        @parent_task = task
        @on_io_thread = on_io_thread
+       @barrier = Async::Barrier.new(parent: @parent_task)
        transition!(:open)
        true
      end
@@ -74,6 +101,16 @@ module NNQ
       end
 
 
+      # Resolves `all_peers_gone` if we had peers and now have none.
+      # Idempotent.
+      # @param connections [Hash] current connection map
+      def resolve_all_peers_gone_if_empty(connections)
+        return unless @peer_connected.resolved? && connections.empty?
+        return if @all_peers_gone.resolved?
+        @all_peers_gone.resolve(true)
+      end
+
+
      private
 
      def transition!(new_state)