omq 0.16.2 → 0.17.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 856ca133b440ec0812ec17c9a470ce25d69a405f0d3fdbedb04194a4ac60527e
4
- data.tar.gz: 8b01aceb4098b436ad0c69f0c62930d6fafe8372bf6cbc77004cb4569936c6be
3
+ metadata.gz: ec5d4d1943efe37da6ad5493f80591b9578467e0c470fd4e2a577968580609cd
4
+ data.tar.gz: 8476b90cc637629874c94388cd91ae7a6f3ce3aaac410af1aa77c3fbd6470412
5
5
  SHA512:
6
- metadata.gz: 7b08a46d592cc3ba300991aaab8b99cbb22c377520b10bee9a8e67537d714474707113b15e8202a3f3b5bf277d8796b8ff0670f63e6cb850721d4a18374aac2b
7
- data.tar.gz: 7cb4caa364be3a387f4aaa531b4019fa075d434b2d243507892cc7ae6dbe13f0b9b2c201564f703ff4fc5ed42b27f714753ae870c4fcd9af47ab102afdcb3bf3
6
+ metadata.gz: 67b62142c8786b9594efdf5f62f0aef8b5329498a04faafec760305cb73d3de354b6f4dd7f5db3e416f4e69e683b1c60ad7920a879fb72c7942d602af2088715
7
+ data.tar.gz: 2731913ad5262a1d0038af1887b3c2bb167df719b650dd99d3d59812573180adb0e528eeaafba021823f12d7e8a38dbad92b8df07a9d6bac98a4534b71ae3cfc
data/CHANGELOG.md CHANGED
@@ -1,5 +1,77 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.17.0 — 2026-04-10
4
+
5
+ ### Changed
6
+
7
+ - **`Readable#receive` no longer prefetches a batch.** Each `#receive`
8
+ call dequeues exactly one message from the engine recv queue. The
9
+ per-socket prefetch buffer (`@recv_buffer` + `@recv_mutex`) and
10
+ `dequeue_recv_batch` are gone, along with `Readable::RECV_BATCH_SIZE`.
11
+ Simpler code; a ~5–10% inproc microbenchmark regression is accepted
12
+ (tcp/ipc unchanged — wire I/O dominates dispatch overhead).
13
+
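The removed prefetch path can be pictured with a plain stdlib queue (illustrative only; `Queue` stands in for the engine recv queue):

```ruby
q = Queue.new                  # stands in for the engine recv queue
3.times { |i| q << [i.to_s] }  # three queued multipart messages

# 0.17.0 behaviour in miniature: each receive call dequeues exactly one
# message; nothing is prefetched into a per-socket buffer under a mutex.
first = q.pop
# the remaining messages stay on the engine queue for later calls
```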
14
+ ### Added
15
+
16
+ - **Socket-level `Async::Barrier` and cascading teardown.**
17
+ `SocketLifecycle` now owns an `Async::Barrier` that tracks every
18
+ socket-scoped task — connection supervisors, pumps, accept loops,
19
+ reconnect loops, heartbeat, maintenance. `Engine#close` and the new
20
+ `Engine#stop` stop this single barrier and every descendant unwinds
21
+ in one call, so the ordering of `:disconnected` / `all_peers_gone` /
22
+ `maybe_reconnect` side effects no longer depends on which pump
23
+ happens to observe the disconnect first.
24
+
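The cascade can be sketched with a stdlib-only stand-in (illustrative; the gem uses `Async::Barrier`, and `TinyBarrier` here is a hypothetical miniature, not its API):

```ruby
# A barrier tracks every task spawned through it, so teardown is one call
# instead of per-task bookkeeping. Thread stands in for Async tasks.
class TinyBarrier
  def initialize
    @tasks = []
  end

  # Track a task so a later #stop can take it down.
  def async(&block)
    thread = Thread.new(&block)
    @tasks << thread
    thread
  end

  # Cascade: a single call unwinds every tracked task.
  def stop
    @tasks.each(&:kill)
    @tasks.each(&:join)
    @tasks.clear
  end

  def empty?
    @tasks.empty?
  end
end

barrier = TinyBarrier.new
3.times { barrier.async { sleep } }  # pumps parked indefinitely
barrier.stop                         # one call tears down all of them
```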
25
+ - **`Socket#stop`** — immediate hard stop that skips the linger drain
26
+ and goes straight to the barrier cascade. Complements `#close` for
27
+ crash-path cleanup.
28
+
29
+ - **`parent:` kwarg on `Socket#bind` / `Socket#connect`.** Accepts any
30
+ object responding to `#async` (`Async::Task`, `Async::Barrier`,
31
+ `Async::Semaphore`). The socket-level barrier is constructed with
32
+ the caller's parent, so every task spawned under the socket lives
33
+ under the caller's Async tree — standard Async idiom for letting
34
+ callers coordinate teardown of internal tasks with their own work.
35
+
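A minimal sketch of the duck-typed contract, using a hypothetical stdlib stand-in (real callers pass `Async::Task`, `Async::Barrier`, or `Async::Semaphore`; `ThreadParent` is invented for illustration):

```ruby
# Any object responding to #async with the right shape can be a parent:
# keyword options plus a block, returning the spawned task.
class ThreadParent
  def initialize
    @threads = []
  end

  def async(**_options, &block)
    thread = Thread.new(&block)
    @threads << thread
    thread
  end
end

parent = ThreadParent.new
task = parent.async(annotation: "demo") { :done }
task.join
```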
36
+ ### Fixed
37
+
38
+ - **macOS: PUSH fails to reconnect after peer rebinds** (and analogous
39
+ races on any platform where the send pump observes the disconnect
40
+ before the recv pump does). The send pump's `rescue EPIPE` called
41
+ `connection_lost(conn)` → `tear_down!` → `routing.connection_removed`
42
+ → `.stop` on `@conn_send_tasks[conn]` — which **was** the
43
+ currently-running send pump. `Task#stop` on self raises `Async::Cancel`
44
+ synchronously and unwinds through `tear_down!` mid-sequence, before
45
+ `:disconnected` emission and `maybe_reconnect`, leaving the socket
46
+ stuck with no reconnect scheduled. Root-caused from a `ruby -d`
47
+ trace showing `EPIPE` at `buffered.rb:112` immediately followed by
48
+ `Async::Cancel` at `task.rb:358` "Cancelling current task!".
49
+
50
+ Fix: introduce a per-connection `Async::Barrier` and a supervisor
51
+ task placed on the *socket* barrier (not the per-conn one) that
52
+ blocks on `@barrier.wait { |t| t.wait; break }` and runs `lost!`
53
+ in its `ensure`. Pumps now just exit on `EPIPE` / `EOFError` /
54
+ ZMTP errors — they never initiate teardown from inside themselves,
55
+ so `Task#stop`-on-self is structurally impossible. All three
56
+ shutdown paths (peer disconnect, `#close`, `#stop`) converge on the
57
+ same ordered `tear_down!` sequence.
58
+
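The hazard and the fix can be sketched with plain threads (`Thread#kill` standing in for `Task#stop`-on-self; purely illustrative, not the gem's code):

```ruby
effects = []

# The bug, in miniature: "stopping" the current task unwinds immediately,
# so side effects sequenced after the stop never run.
buggy = Thread.new do
  effects << :teardown_started
  Thread.current.kill
  effects << :reconnect_scheduled  # never reached
end
buggy.join

# The fix, in miniature: pumps merely exit; a supervisor *outside* the
# group blocks on the first finisher and runs the ordered side effects.
pump = Thread.new { effects << :pump_exited }  # just exits, initiates nothing
supervisor = Thread.new do
  pump.join
  effects << :teardown_ran                     # runs exactly once, in order
end
supervisor.join
```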
59
+ - **`DESIGN.md` synced with post-barrier-refactor reality.** Rewrote
60
+ the Task tree and Engine lifecycle sections to reflect the
61
+ socket-level `Async::Barrier`, per-connection nested barrier, supervisor
62
+ pattern, `Socket#stop`, and user-provided `parent:` kwarg. Added a
63
+ new Cancellation safety subsection documenting that wire writes in
64
+ protocol-zmtp are wrapped in `Async::Task#defer_cancel` so cascade
65
+ teardown during a mid-frame write can't desync the peer's framer.
66
+
67
+ - **IPC connect to an existing `SOCK_DGRAM` socket file** now surfaces
68
+ as a connect-time failure with backoff retry instead of crashing
69
+ the pump. `Errno::EPROTOTYPE` added to `CONNECTION_FAILED` (not
70
+ `CONNECTION_LOST` — it's a connect() error, not an
71
+ established-connection drop). Consistent with how `ECONNREFUSED` is treated for
72
+ TCP: the endpoint is misconfigured or not ready, the socket keeps
73
+ trying, and the user sees `:connect_retried` monitor events.
74
+
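The classification choice, sketched in miniature (the constant names match the entry above, but the other members and the `classify` helper are assumptions for illustration, not the gem's exact lists):

```ruby
# Connect-time errors get backoff retry; drops of an established
# connection get teardown. EPROTOTYPE belongs in the first bucket.
CONNECTION_FAILED = [Errno::ECONNREFUSED, Errno::EPROTOTYPE].freeze
CONNECTION_LOST   = [Errno::EPIPE, Errno::ECONNRESET].freeze

def classify(error)
  case error
  when *CONNECTION_FAILED then :connect_retried  # socket keeps dialing
  when *CONNECTION_LOST   then :disconnected     # teardown, maybe reconnect
  end
end
```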
3
75
  ## 0.16.2 — 2026-04-09
4
76
 
5
77
  ### Fixed
@@ -40,6 +40,13 @@ module OMQ
40
40
  # @return [Symbol] current state
41
41
  attr_reader :state
42
42
 
43
+ # @return [Async::Barrier] holds all per-connection pump tasks
44
+ # (send pump, recv pump, reaper, heartbeat). When the connection
45
+ # is torn down, {#tear_down!} calls `@barrier.stop` to take down
46
+ # every sibling task atomically — so the first pump to see a
47
+ # disconnect takes down all the others.
48
+ attr_reader :barrier
49
+
43
50
 
44
51
  # @param engine [Engine]
45
52
  # @param endpoint [String, nil]
@@ -51,6 +58,11 @@ module OMQ
51
58
  @done = done
52
59
  @state = :new
53
60
  @conn = nil
61
+ # Nest the per-connection barrier under the socket-level barrier
62
+ # so every pump spawned via +@barrier.async+ is also tracked by
63
+ # the socket barrier — {Engine#stop}/{Engine#close} cascade
64
+ # through in one call.
65
+ @barrier = Async::Barrier.new(parent: engine.barrier)
54
66
  end
55
67
 
56
68
 
@@ -71,7 +83,7 @@ module OMQ
71
83
  max_message_size: @engine.options.max_message_size,
72
84
  )
73
85
  conn.handshake!
74
- Heartbeat.start(Async::Task.current, conn, @engine.options, @engine.tasks)
86
+ Heartbeat.start(@barrier, conn, @engine.options, @engine.tasks)
75
87
  ready!(conn)
76
88
  @conn
77
89
  rescue Protocol::ZMTP::Error, *CONNECTION_LOST => error
@@ -120,6 +132,39 @@ module OMQ
120
132
  @engine.routing.connection_added(@conn)
121
133
  @engine.peer_connected.resolve(@conn)
122
134
  transition!(:ready)
135
+ # No supervisor if nothing to supervise: inproc DirectPipes
136
+ # wire the recv/send paths synchronously (no task-based pumps),
137
+ # and isolated unit tests use a FakeEngine without pumps at all.
138
+ # Waiting on an empty barrier returns immediately and would
139
+ # tear the connection down right after registering.
140
+ start_supervisor unless @barrier.empty?
141
+ end
142
+
143
+
144
+ # Spawns a supervisor task on the *socket-level* barrier (not the
145
+ # per-connection barrier) that blocks on the first pump to finish
146
+ # and then triggers teardown.
147
+ #
148
+ # Keeping the supervisor out of the per-connection barrier avoids
149
+ # the self-stop problem: stopping the current task raises
150
+ # Async::Cancel synchronously and unwinds before side effects can
151
+ # run. Placing it on the socket barrier means {Engine#stop} /
152
+ # {Engine#close} cascade-cancels the supervisor, whose +ensure+
153
+ # runs the ordered disconnect side effects once.
154
+ #
155
+ def start_supervisor
156
+ @supervisor = @engine.barrier.async(transient: true, annotation: "conn supervisor") do
157
+ @barrier.wait do |task|
158
+ task.wait
159
+ break
160
+ end
161
+ rescue Async::Stop, Async::Cancel
162
+ # socket or supervisor cancelled externally (socket closing)
163
+ rescue Protocol::ZMTP::Error, *CONNECTION_LOST
164
+ # expected pump exit on disconnect
165
+ ensure
166
+ lost!
167
+ end
123
168
  end
124
169
 
125
170
 
@@ -133,6 +178,10 @@ module OMQ
133
178
  @done&.resolve(true)
134
179
  @engine.resolve_all_peers_gone_if_empty
135
180
  @engine.maybe_reconnect(@endpoint) if reconnect
181
+ # Cancel every sibling pump of this connection. The caller is
182
+ # the supervisor task, which is NOT in the barrier — so there
183
+ # is no self-stop risk.
184
+ @barrier.stop
136
185
  end
137
186
 
138
187
 
@@ -8,12 +8,12 @@ module OMQ
8
8
  # if no traffic is seen within +timeout+ seconds.
9
9
  #
10
10
  module Heartbeat
11
- # @param parent_task [Async::Task]
11
+ # @param parent [Async::Task, Async::Barrier] parent to spawn under
12
12
  # @param conn [Connection]
13
13
  # @param options [Options]
14
14
  # @param tasks [Array]
15
15
  #
16
- def self.start(parent_task, conn, options, tasks)
16
+ def self.start(parent, conn, options, tasks)
17
17
  interval = options.heartbeat_interval
18
18
  return unless interval
19
19
 
@@ -21,7 +21,7 @@ module OMQ
21
21
  timeout = options.heartbeat_timeout || interval
22
22
  conn.touch_heartbeat
23
23
 
24
- tasks << parent_task.async(transient: true, annotation: "heartbeat") do
24
+ tasks << parent.async(transient: true, annotation: "heartbeat") do
25
25
  loop do
26
26
  sleep interval
27
27
  conn.send_command(Protocol::ZMTP::Codec::Command.ping(ttl: ttl, context: "".b))
@@ -30,7 +30,7 @@ module OMQ
30
30
  break
31
31
  end
32
32
  end
33
- rescue Async::Stop
33
+ rescue Async::Stop, Async::Cancel
34
34
  rescue *CONNECTION_LOST
35
35
  # connection closed
36
36
  end
@@ -19,15 +19,15 @@ module OMQ
19
19
 
20
20
  # Public entry point — callers use the class method.
21
21
  #
22
- # @param parent_task [Async::Task]
22
+ # @param parent [Async::Task, Async::Barrier] parent to spawn under
23
23
  # @param conn [Connection, Transport::Inproc::DirectPipe]
24
24
  # @param recv_queue [SignalingQueue]
25
25
  # @param engine [Engine]
26
26
  # @param transform [Proc, nil]
27
27
  # @return [Async::Task, nil]
28
28
  #
29
- def self.start(parent_task, conn, recv_queue, engine, transform)
30
- new(conn, recv_queue, engine).start(parent_task, transform)
29
+ def self.start(parent, conn, recv_queue, engine, transform)
30
+ new(conn, recv_queue, engine).start(parent, transform)
31
31
  end
32
32
 
33
33
 
@@ -67,10 +67,10 @@ module OMQ
67
67
  private
68
68
 
69
69
 
70
- def start_with_transform(parent_task, transform)
70
+ def start_with_transform(parent, transform)
71
71
  conn, recv_queue, engine, count_bytes = @conn, @recv_queue, @engine, @count_bytes
72
72
 
73
- parent_task.async(transient: true, annotation: "recv pump") do |task|
73
+ parent.async(transient: true, annotation: "recv pump") do |task|
74
74
  loop do
75
75
  count = 0
76
76
  bytes = 0
@@ -84,19 +84,19 @@ module OMQ
84
84
  end
85
85
  task.yield
86
86
  end
87
- rescue Async::Stop
87
+ rescue Async::Stop, Async::Cancel
88
88
  rescue Protocol::ZMTP::Error, *CONNECTION_LOST
89
- @engine.connection_lost(conn)
89
+ # expected disconnect — supervisor will trigger teardown
90
90
  rescue => error
91
91
  @engine.signal_fatal_error(error)
92
92
  end
93
93
  end
94
94
 
95
95
 
96
- def start_direct(parent_task)
96
+ def start_direct(parent)
97
97
  conn, recv_queue, engine, count_bytes = @conn, @recv_queue, @engine, @count_bytes
98
98
 
99
- parent_task.async(transient: true, annotation: "recv pump") do |task|
99
+ parent.async(transient: true, annotation: "recv pump") do |task|
100
100
  loop do
101
101
  count = 0
102
102
  bytes = 0
@@ -109,9 +109,9 @@ module OMQ
109
109
  end
110
110
  task.yield
111
111
  end
112
- rescue Async::Stop
112
+ rescue Async::Stop, Async::Cancel
113
113
  rescue Protocol::ZMTP::Error, *CONNECTION_LOST
114
- @engine.connection_lost(conn)
114
+ # expected disconnect — supervisor will trigger teardown
115
115
  rescue => error
116
116
  @engine.signal_fatal_error(error)
117
117
  end
@@ -34,12 +34,21 @@ module OMQ
34
34
  # @return [Async::Promise] resolves once all peers are gone (after having had peers)
35
35
  attr_reader :all_peers_gone
36
36
 
37
- # @return [Async::Task, nil] root of the socket's task tree
37
+ # @return [Async::Task, Async::Barrier, Async::Semaphore, nil] root of
38
+ # the socket's task tree (may be user-provided via +parent:+ on
39
+ # {Socket#bind} / {Socket#connect}; falls back to the current
40
+ # Async task or the shared Reactor root)
38
41
  attr_reader :parent_task
39
42
 
40
43
  # @return [Boolean] true if parent_task is the shared Reactor thread
41
44
  attr_reader :on_io_thread
42
45
 
46
+ # @return [Async::Barrier] holds every socket-scoped task (connection
47
+ # supervisors, reconnect loops, heartbeat, monitor, accept loops).
48
+ # {Engine#stop} and {Engine#close} call +barrier.stop+ to cascade
49
+ # teardown through every per-connection barrier in one shot.
50
+ attr_reader :barrier
51
+
43
52
  # @return [Boolean] whether auto-reconnect is enabled
44
53
  attr_accessor :reconnect_enabled
45
54
 
@@ -51,6 +60,7 @@ module OMQ
51
60
  @reconnect_enabled = true
52
61
  @parent_task = nil
53
62
  @on_io_thread = false
63
+ @barrier = nil
54
64
  end
55
65
 
56
66
 
@@ -60,22 +70,36 @@ module OMQ
60
70
  def alive? = @state == :new || @state == :open
61
71
 
62
72
 
63
- # Captures the current Async task (or the shared Reactor root) as
64
- # this socket's task tree root. Transitions `:new → :open`.
73
+ # Captures the socket's task tree root. Transitions `:new → :open`.
74
+ #
75
+ # When +parent+ is provided (any Async task/barrier/semaphore — any
76
+ # object that responds to +#async+), it is used as the root; this is
77
+ # the common Async idiom for letting callers place internal tasks
78
+ # under a caller-managed parent so teardown can be coordinated with
79
+ # other work. Otherwise falls back to the current Async task or the
80
+ # shared Reactor root for non-Async callers.
81
+ #
82
+ # The socket-level {#barrier} is constructed with the captured root
83
+ # as its parent so every task spawned via +barrier.async+ lives
84
+ # under the caller's tree.
65
85
  #
86
+ # @param parent [#async, nil] optional Async parent
66
87
  # @param linger [Numeric, nil] used to register the Reactor linger slot
67
88
  # when falling back to the IO thread
68
89
  # @return [Boolean] true on first-time capture, false if already captured
69
90
  #
70
- def capture_parent_task(linger:)
91
+ def capture_parent_task(parent: nil, linger:)
71
92
  return false if @parent_task
72
- if Async::Task.current?
93
+ if parent
94
+ @parent_task = parent
95
+ elsif Async::Task.current?
73
96
  @parent_task = Async::Task.current
74
97
  else
75
98
  @parent_task = Reactor.root_task
76
99
  @on_io_thread = true
77
100
  Reactor.track_linger(linger)
78
101
  end
102
+ @barrier = Async::Barrier.new(parent: @parent_task)
79
103
  transition!(:open)
80
104
  true
81
105
  end
data/lib/omq/engine.rb CHANGED
@@ -90,6 +90,7 @@ module OMQ
90
90
  def peer_connected = @lifecycle.peer_connected
91
91
  def all_peers_gone = @lifecycle.all_peers_gone
92
92
  def parent_task = @lifecycle.parent_task
93
+ def barrier = @lifecycle.barrier
93
94
  def closed? = @lifecycle.closed?
94
95
  def reconnect_enabled=(value)
95
96
  @lifecycle.reconnect_enabled = value
@@ -109,9 +110,9 @@ module OMQ
109
110
  def spawn_inproc_retry(endpoint)
110
111
  ri = @options.reconnect_interval
111
112
  ivl = ri.is_a?(Range) ? ri.begin : ri
112
- @tasks << @lifecycle.parent_task.async(transient: true, annotation: "inproc reconnect #{endpoint}") do
113
+ @tasks << @lifecycle.barrier.async(transient: true, annotation: "inproc reconnect #{endpoint}") do
113
114
  yield ivl
114
- rescue Async::Stop
115
+ rescue Async::Stop, Async::Cancel
115
116
  end
116
117
  end
117
118
 
@@ -122,8 +123,9 @@ module OMQ
122
123
  # @return [void]
123
124
  # @raise [ArgumentError] on unsupported transport
124
125
  #
125
- def bind(endpoint)
126
+ def bind(endpoint, parent: nil)
126
127
  OMQ.freeze_for_ractors!
128
+ capture_parent_task(parent: parent)
127
129
  transport = transport_for(endpoint)
128
130
  listener = transport.bind(endpoint, self)
129
131
  start_accept_loops(listener)
@@ -142,8 +144,9 @@ module OMQ
142
144
  # @param endpoint [String]
143
145
  # @return [void]
144
146
  #
145
- def connect(endpoint)
147
+ def connect(endpoint, parent: nil)
146
148
  OMQ.freeze_for_ractors!
149
+ capture_parent_task(parent: parent)
147
150
  validate_endpoint!(endpoint)
148
151
  @dialed.add(endpoint)
149
152
  if endpoint.start_with?("inproc://")
@@ -232,30 +235,6 @@ module OMQ
232
235
  end
233
236
 
234
237
 
235
- # Dequeues up to +max+ messages or +max_bytes+ total. Blocks
236
- # on the first, then drains non-blocking.
237
- #
238
- # @param max [Integer] message count limit
239
- # @param max_bytes [Integer] byte size limit
240
- # @return [Array<Array<String>>]
241
- #
242
- def dequeue_recv_batch(max, max_bytes: 1 << 20)
243
- raise @fatal_error if @fatal_error
244
- queue = routing.recv_queue
245
- msg = queue.dequeue
246
- raise @fatal_error if msg.nil? && @fatal_error
247
- batch = [msg]
248
- bytes = msg.sum(&:bytesize)
249
- while batch.size < max && bytes < max_bytes
250
- msg = queue.dequeue(timeout: 0)
251
- break unless msg
252
- batch << msg
253
- bytes += msg.sum(&:bytesize)
254
- end
255
- batch
256
- end
257
-
258
-
259
238
  # Pushes a nil sentinel into the recv queue, unblocking a
260
239
  # pending {#dequeue_recv} with a nil return value.
261
240
  #
@@ -284,7 +263,11 @@ module OMQ
284
263
  # @return [Async::Task, nil]
285
264
  #
286
265
  def start_recv_pump(conn, recv_queue, &transform)
287
- task = RecvPump.start(Async::Task.current, conn, recv_queue, self, transform)
266
+ # Spawn on the connection's lifecycle barrier so the recv pump is
267
+ # torn down together with the rest of its sibling per-connection
268
+ # pumps when the connection is lost.
269
+ parent = @connections[conn]&.barrier || @lifecycle.barrier
270
+ task = RecvPump.start(parent, conn, recv_queue, self, transform)
288
271
  @tasks << task if task
289
272
  task
290
273
  end
@@ -318,7 +301,10 @@ module OMQ
318
301
  end
319
302
 
320
303
 
321
- # Closes all connections and listeners.
304
+ # Closes all connections and listeners gracefully. Drains pending
305
+ # sends up to +linger+ seconds, then cascades teardown through the
306
+ # socket-level {SocketLifecycle#barrier} — every per-connection
307
+ # barrier is stopped as a side effect, cancelling every pump.
322
308
  #
323
309
  # @return [void]
324
310
  #
@@ -330,8 +316,27 @@ module OMQ
330
316
  @lifecycle.finish_closing!
331
317
  Reactor.untrack_linger(@options.linger) if @lifecycle.on_io_thread
332
318
  stop_listeners
333
- close_connections
334
- stop_tasks
319
+ tear_down_barrier
320
+ routing.stop rescue nil
321
+ emit_monitor_event(:closed)
322
+ close_monitor_queue
323
+ end
324
+
325
+
326
+ # Immediate hard stop: skips the linger drain and cascades teardown
327
+ # through the socket-level barrier. Intended for crash-path cleanup
328
+ # where {#close}'s drain is either unsafe or undesired.
329
+ #
330
+ # @return [void]
331
+ #
332
+ def stop
333
+ return unless @lifecycle.alive?
334
+ @lifecycle.start_closing! if @lifecycle.open?
335
+ @lifecycle.finish_closing!
336
+ Reactor.untrack_linger(@options.linger) if @lifecycle.on_io_thread
337
+ stop_listeners
338
+ tear_down_barrier
339
+ routing.stop rescue nil
335
340
  emit_monitor_event(:closed)
336
341
  close_monitor_queue
337
342
  end
@@ -350,7 +355,7 @@ module OMQ
350
355
  def spawn_pump_task(annotation:, &block)
351
356
  Async::Task.current.async(transient: true, annotation: annotation) do
352
357
  yield
353
- rescue Async::Stop, Protocol::ZMTP::Error, *CONNECTION_LOST
358
+ rescue Async::Stop, Async::Cancel, Protocol::ZMTP::Error, *CONNECTION_LOST
354
359
  # normal shutdown / expected disconnect
355
360
  rescue => error
356
361
  signal_fatal_error(error)
@@ -358,6 +363,31 @@ module OMQ
358
363
  end
359
364
 
360
365
 
366
+ # Spawns a per-connection pump task on the connection's own
367
+ # lifecycle barrier. When any pump on the barrier exits (e.g. the
368
+ # send pump sees EPIPE and returns), the supervisor runs {ConnectionLifecycle#tear_down!},
369
+ # which calls `barrier.stop` and cancels every sibling pump for that
370
+ # connection — so a dead peer can no longer leave orphan send
371
+ # pumps blocked on `dequeue` waiting for messages that will never
372
+ # be written.
373
+ #
374
+ # @param conn [Connection, Transport::Inproc::DirectPipe]
375
+ # @param annotation [String]
376
+ #
377
+ def spawn_conn_pump_task(conn, annotation:, &block)
378
+ lifecycle = @connections[conn]
379
+ return spawn_pump_task(annotation: annotation, &block) unless lifecycle
380
+
381
+ lifecycle.barrier.async(transient: true, annotation: annotation) do
382
+ yield
383
+ rescue Async::Stop, Async::Cancel, Protocol::ZMTP::Error, *CONNECTION_LOST
384
+ # normal shutdown / expected disconnect / sibling tore us down
385
+ rescue => error
386
+ signal_fatal_error(error)
387
+ end
388
+ end
389
+
390
+
361
391
  # Wraps an unexpected pump error as {OMQ::SocketDeadError} and
362
392
  # unblocks any callers waiting on the recv queue.
363
393
  #
@@ -378,14 +408,22 @@ module OMQ
378
408
  end
379
409
 
380
410
 
381
- # Saves the current Async task so connection subtrees can be
382
- # spawned under the caller's task tree. Called by Socket before
383
- # the first bind/connect outside Reactor.run so non-Async
384
- # callers get the IO thread's root task, not an ephemeral work task.
411
+ # Captures the socket's task tree root and starts the socket-level
412
+ # maintenance task. If +parent+ is given, it is used as the parent
413
+ # for every task spawned under this socket (connection supervisors,
414
+ # reconnect loops, maintenance, monitor). Otherwise the current
415
+ # Async task (or the shared Reactor root, for non-Async callers)
416
+ # is captured automatically.
417
+ #
418
+ # Idempotent: first call wins. Subsequent calls (including from
419
+ # later bind/connect invocations) with a different +parent+ are
420
+ # silently ignored.
385
421
  #
386
- def capture_parent_task
387
- return unless @lifecycle.capture_parent_task(linger: @options.linger)
388
- Maintenance.start(@lifecycle.parent_task, @options.mechanism, @tasks)
422
+ # @param parent [#async, nil] optional Async parent
423
+ #
424
+ def capture_parent_task(parent: nil)
425
+ return unless @lifecycle.capture_parent_task(parent: parent, linger: @options.linger)
426
+ Maintenance.start(@lifecycle.barrier, @options.mechanism, @tasks)
389
427
  end
390
428
 
391
429
 
@@ -432,11 +470,13 @@ module OMQ
432
470
  private
433
471
 
434
472
  def spawn_connection(io, as_server:, endpoint: nil)
435
- task = @lifecycle.parent_task&.async(transient: true, annotation: "conn #{endpoint}") do
473
+ task = @lifecycle.barrier&.async(transient: true, annotation: "conn #{endpoint}") do
436
474
  done = Async::Promise.new
437
475
  lifecycle = ConnectionLifecycle.new(self, endpoint: endpoint, done: done)
438
476
  lifecycle.handshake!(io, as_server: as_server)
439
477
  done.wait
478
+ rescue Async::Stop, Async::Cancel
479
+ # socket barrier stopped — cascade teardown
440
480
  rescue Async::Queue::ClosedError
441
481
  # connection dropped during drain — message re-staged
442
482
  rescue Protocol::ZMTP::Error, *CONNECTION_LOST
@@ -471,7 +511,7 @@ module OMQ
471
511
 
472
512
  def start_accept_loops(listener)
473
513
  return unless listener.respond_to?(:start_accept_loops)
474
- listener.start_accept_loops(@lifecycle.parent_task) do |io|
514
+ listener.start_accept_loops(@lifecycle.barrier) do |io|
475
515
  handle_accepted(io, endpoint: listener.endpoint)
476
516
  end
477
517
  end
@@ -483,19 +523,19 @@ module OMQ
483
523
  end
484
524
 
485
525
 
486
- def close_connections
487
- @connections.values.each(&:close!)
488
- end
489
-
490
-
491
526
  def close_connections_at(endpoint)
492
527
  @connections.values.select { |lc| lc.endpoint == endpoint }.each(&:close!)
493
528
  end
494
529
 
495
530
 
496
- def stop_tasks
497
- routing.stop rescue nil
498
- @tasks.each { |t| t.stop rescue nil }
531
+ # Cascades teardown through the socket-level barrier. Stopping the
532
+ # barrier cancels every tracked task: connection supervisors (whose
533
+ # `ensure lost!` runs the ordered disconnect side effects), accept
534
+ # loops, reconnect loops, heartbeat, maintenance. After the cascade,
535
+ # clears the legacy +@tasks+ list.
536
+ #
537
+ def tear_down_barrier
538
+ @lifecycle.barrier&.stop
499
539
  @tasks.clear
500
540
  end
501
541
 
@@ -16,13 +16,7 @@ module OMQ
16
16
  # @raise [IO::TimeoutError] if timeout exceeded
17
17
  #
18
18
  def dequeue(timeout: @options.read_timeout)
19
- msg = @recv_mutex.synchronize { @recv_buffer.shift }
20
- return msg if msg
21
-
22
- batch = Reactor.run { with_timeout(timeout) { @engine.dequeue_recv_batch(Readable::RECV_BATCH_SIZE) } }
23
- msg = batch.shift
24
- @recv_mutex.synchronize { @recv_buffer.concat(batch) } unless batch.empty?
25
- msg
19
+ Reactor.run { with_timeout(timeout) { @engine.dequeue_recv } }
26
20
  end
27
21
 
28
22
  alias_method :pop, :dequeue
data/lib/omq/readable.rb CHANGED
@@ -8,20 +8,13 @@ module OMQ
8
8
  module Readable
9
9
  include QueueReadable
10
10
 
11
- # Maximum messages to prefetch from the recv queue per drain.
12
- RECV_BATCH_SIZE = 64
13
-
14
-
15
- # Receives the next message. Returns from a local prefetch
16
- # buffer when available, otherwise drains up to
17
- # {RECV_BATCH_SIZE} messages from the recv queue in one
18
- # synchronized dequeue.
11
+ # Receives the next message directly from the engine recv queue.
19
12
  #
20
13
  # @return [Array<String>] message parts
21
14
  # @raise [IO::TimeoutError] if read_timeout exceeded
22
15
  #
23
16
  def receive
24
- @recv_mutex.synchronize { @recv_buffer.shift } || fill_recv_buffer
17
+ Reactor.run { with_timeout(@options.read_timeout) { @engine.dequeue_recv } }
25
18
  end
26
19
 
27
20
 
@@ -33,14 +26,5 @@ module OMQ
33
26
  def wait_readable(timeout = @options.read_timeout)
34
27
  true
35
28
  end
36
-
37
- private
38
-
39
- def fill_recv_buffer
40
- batch = Reactor.run { with_timeout(@options.read_timeout) { @engine.dequeue_recv_batch(RECV_BATCH_SIZE) } }
41
- msg = batch.shift
42
- @recv_mutex.synchronize { @recv_buffer.concat(batch) } unless batch.empty?
43
- msg
44
- end
45
29
  end
46
30
  end
@@ -17,7 +17,7 @@ module OMQ
17
17
  # @return [Async::Task]
18
18
  #
19
19
  def self.start(engine, conn, q, tasks)
20
- task = engine.spawn_pump_task(annotation: "send pump") do
20
+ task = engine.spawn_conn_pump_task(conn, annotation: "send pump") do
21
21
  loop do
22
22
  batch = [q.dequeue]
23
23
  Routing.drain_send_queue(q, batch)
@@ -28,9 +28,6 @@ module OMQ
28
28
  end
29
29
  conn.flush
30
30
  batch.each { |parts| engine.emit_verbose_monitor_event(:message_sent, parts: parts) }
31
- rescue Protocol::ZMTP::Error, *CONNECTION_LOST
32
- engine.connection_lost(conn)
33
- break
34
31
  end
35
32
  end
36
33
  tasks << task
@@ -123,7 +123,7 @@ module OMQ
123
123
 
124
124
 
125
125
  def start_subscription_listener(conn)
126
- @tasks << @engine.spawn_pump_task(annotation: "subscription listener") do
126
+ @tasks << @engine.spawn_conn_pump_task(conn, annotation: "subscription listener") do
127
127
  loop do
128
128
  frame = conn.read_frame
129
129
  next unless frame.command?
@@ -135,8 +135,6 @@ module OMQ
135
135
  on_cancel(conn, cmd.data)
136
136
  end
137
137
  end
138
- rescue *CONNECTION_LOST
139
- @engine.connection_lost(conn)
140
138
  end
141
139
  end
142
140
 
@@ -159,7 +157,7 @@ module OMQ
159
157
 
160
158
 
161
159
  def start_conn_send_pump_normal(conn, q, use_wire)
162
- @engine.spawn_pump_task(annotation: "send pump") do
160
+ @engine.spawn_conn_pump_task(conn, annotation: "send pump") do
163
161
  loop do
164
162
  batch = [q.dequeue]
165
163
  Routing.drain_send_queue(q, batch)
@@ -167,9 +165,6 @@ module OMQ
167
165
  conn.flush
168
166
  batch.each { |parts| @engine.emit_verbose_monitor_event(:message_sent, parts: parts) }
169
167
  end
170
- rescue Protocol::ZMTP::Error, *CONNECTION_LOST
171
- @engine.connection_lost(conn)
172
- break
173
168
  end
174
169
  end
175
170
  end
@@ -187,21 +182,16 @@ module OMQ
187
182
 
188
183
 
189
184
  def start_conn_send_pump_conflate(conn, q)
190
- @engine.spawn_pump_task(annotation: "send pump") do
185
+ @engine.spawn_conn_pump_task(conn, annotation: "send pump") do
191
186
  loop do
192
187
  batch = [q.dequeue]
193
188
  Routing.drain_send_queue(q, batch)
194
189
  # Keep only the latest message that matches the subscription.
195
190
  latest = batch.reverse.find { |parts| subscribed?(conn, parts.first || EMPTY_BINARY) }
196
191
  next unless latest
197
- begin
198
- conn.write_message(latest)
199
- conn.flush
200
- @engine.emit_verbose_monitor_event(:message_sent, parts: latest)
201
- rescue Protocol::ZMTP::Error, *CONNECTION_LOST
202
- @engine.connection_lost(conn)
203
- break
204
- end
192
+ conn.write_message(latest)
193
+ conn.flush
194
+ @engine.emit_verbose_monitor_event(:message_sent, parts: latest)
205
195
  end
206
196
  end
207
197
  end
@@ -86,7 +86,7 @@ module OMQ
86
86
  private
87
87
 
88
88
  def start_send_pump(conn)
89
- @send_pump = @engine.spawn_pump_task(annotation: "send pump") do
89
+ @send_pump = @engine.spawn_conn_pump_task(conn, annotation: "send pump") do
90
90
  loop do
91
91
  batch = [@send_queue.dequeue]
92
92
  Routing.drain_send_queue(@send_queue, batch)
@@ -97,9 +97,6 @@ module OMQ
97
97
  end
98
98
  conn.flush
99
99
  batch.each { |parts| @engine.emit_verbose_monitor_event(:message_sent, parts: parts) }
100
- rescue Protocol::ZMTP::Error, *CONNECTION_LOST
101
- @engine.connection_lost(conn)
102
- break
103
100
  end
104
101
  end
105
102
  @tasks << @send_pump
@@ -62,10 +62,8 @@ module OMQ
62
62
  #
63
63
  def start_reaper(conn)
64
64
  return if conn.is_a?(Transport::Inproc::DirectPipe)
65
- @tasks << @engine.spawn_pump_task(annotation: "reaper") do
66
- conn.receive_message # blocks until peer disconnects
67
- rescue *CONNECTION_LOST
68
- @engine.connection_lost(conn)
65
+ @tasks << @engine.spawn_conn_pump_task(conn, annotation: "reaper") do
66
+ conn.receive_message # blocks until peer disconnects; then exits
69
67
  end
70
68
  end
71
69
  end
@@ -60,7 +60,7 @@ module OMQ
60
60
  #
61
61
  def remove_round_robin_send_connection(conn)
62
62
  update_direct_pipe
63
- @conn_send_tasks.delete(conn)&.stop
63
+ @conn_send_tasks.delete(conn)
64
64
  end
65
65
 
66
66
 
@@ -122,16 +122,13 @@ module OMQ
122
122
  # @param conn [Connection]
123
123
  #
124
124
  def start_conn_send_pump(conn)
125
- task = @engine.spawn_pump_task(annotation: "send pump") do
125
+ task = @engine.spawn_conn_pump_task(conn, annotation: "send pump") do
126
126
  loop do
127
127
  batch = [@send_queue.dequeue]
128
128
  drain_send_queue_capped(batch)
129
129
  write_batch(conn, batch)
130
130
  batch.each { |parts| @engine.emit_verbose_monitor_event(:message_sent, parts: parts) }
131
131
  Async::Task.current.yield
132
- rescue Protocol::ZMTP::Error, *CONNECTION_LOST
133
- @engine.connection_lost(conn)
134
- break
135
132
  end
136
133
  end
137
134
  @conn_send_tasks[conn] = task
@@ -81,23 +81,18 @@ module OMQ
  private
 
  def start_conn_send_pump(conn, q)
- task = @engine.spawn_pump_task(annotation: "send pump") do
+ task = @engine.spawn_conn_pump_task(conn, annotation: "send pump") do
  loop do
  parts = q.dequeue
  frame = parts.first&.b
  next if frame.nil? || frame.empty?
  flag = frame.getbyte(0)
  prefix = frame.byteslice(1..) || "".b
- begin
- case flag
- when 0x01
- conn.send_command(Protocol::ZMTP::Codec::Command.subscribe(prefix))
- when 0x00
- conn.send_command(Protocol::ZMTP::Codec::Command.cancel(prefix))
- end
- rescue Protocol::ZMTP::Error, *CONNECTION_LOST
- @engine.connection_lost(conn)
- break
+ case flag
+ when 0x01
+ conn.send_command(Protocol::ZMTP::Codec::Command.subscribe(prefix))
+ when 0x00
+ conn.send_command(Protocol::ZMTP::Codec::Command.cancel(prefix))
  end
  end
  end
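The rewritten pump above dispatches on a one-byte flag at the head of each queued frame: `0x01` means subscribe, `0x00` means cancel, and the remaining bytes are the topic prefix. A minimal sketch of that framing convention (helper names are illustrative, not taken from the gem):

```ruby
# Build and parse XSUB-style subscription frames: one flag byte
# (0x01 = subscribe, 0x00 = cancel) followed by the topic prefix.
def subscription_frame(subscribe:, prefix: "")
  (subscribe ? "\x01" : "\x00").b + prefix.b
end

def parse_subscription_frame(frame)
  frame = frame.b
  return nil if frame.empty?
  {
    subscribe: frame.getbyte(0) == 0x01,
    prefix: frame.byteslice(1..) || "".b, # guard mirrors the pump's own nil fallback
  }
end

sub = parse_subscription_frame(subscription_frame(subscribe: true, prefix: "weather."))
can = parse_subscription_frame(subscription_frame(subscribe: false))
```

The `|| "".b` fallback matches the pump's handling of a flag-only frame, where the prefix is empty.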
data/lib/omq/routing.rb CHANGED
@@ -70,8 +70,7 @@ module OMQ
  #
  def self.drain_send_queue(queue, batch)
  loop do
- msg = queue.dequeue(timeout: 0)
- break unless msg
+ msg = queue.dequeue(timeout: 0) or break
  batch << msg
  end
  end
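The tightened loop relies on `dequeue(timeout: 0)` returning nil when the queue is empty, so low-precedence `or break` ends the drain in one expression. The same idiom can be sketched against Ruby's stdlib `Thread::Queue`, whose non-blocking `pop(true)` raises instead of returning nil, so the wrapper below (illustrative, not from the gem) adapts it:

```ruby
# Non-blocking dequeue returning nil on empty, mimicking the
# Async queue's dequeue(timeout: 0) used in routing.rb.
def try_dequeue(queue)
  queue.pop(true)
rescue ThreadError # Thread::Queue#pop(true) raises when empty
  nil
end

# Same shape as OMQ::Routing.drain_send_queue after the change:
# `or` binds looser than `=`, so this assigns, then breaks on nil.
def drain_send_queue(queue, batch)
  loop do
    msg = try_dequeue(queue) or break
    batch << msg
  end
end

q = Thread::Queue.new
%w[a b c].each { |m| q << m }
batch = []
drain_send_queue(q, batch)
```

Using `or` rather than `||` here is deliberate: `msg = try_dequeue(queue) || break` would parse differently, assigning the result of the whole `||` expression.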
data/lib/omq/socket.rb CHANGED
@@ -75,10 +75,18 @@ module OMQ
  # Binds to an endpoint.
  #
  # @param endpoint [String]
+ # @param parent [#async, nil] Async parent for the socket's task tree.
+ # Accepts any object that responds to +#async+ — +Async::Task+,
+ # +Async::Barrier+, +Async::Semaphore+. When given, every task
+ # spawned under this socket (connection supervisors, reconnect
+ # loops, heartbeat, monitor) is placed under +parent+, so callers
+ # can coordinate teardown with their own Async tree. Only the
+ # *first* bind/connect call captures the parent — subsequent
+ # calls ignore the kwarg.
  # @return [void]
  #
- def bind(endpoint)
- ensure_parent_task
+ def bind(endpoint, parent: nil)
+ ensure_parent_task(parent: parent)
  Reactor.run do
  @engine.bind(endpoint)
  @last_tcp_port = @engine.last_tcp_port
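The capture-once rule the doc comment describes ("only the first bind/connect call captures the parent") can be pictured with a small stand-in. This class is illustrative, not omq's implementation; in real use the parent would be an `Async::Barrier` or `Async::Task` from the `async` gem:

```ruby
# Illustrative stand-in for the capture-once behaviour: the first
# bind/connect call records the parent, later calls ignore the kwarg.
class ParentCapture
  attr_reader :parent

  def capture_parent_task(parent: nil)
    return @parent if @captured
    @captured = true
    @parent = parent
  end
end

sock = ParentCapture.new
sock.capture_parent_task(parent: :callers_barrier) # first call wins
sock.capture_parent_task(parent: :another_barrier) # ignored
```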
@@ -89,10 +97,11 @@ module OMQ
  # Connects to an endpoint.
  #
  # @param endpoint [String]
+ # @param parent [#async, nil] see {#bind}.
  # @return [void]
  #
- def connect(endpoint)
- ensure_parent_task
+ def connect(endpoint, parent: nil)
+ ensure_parent_task(parent: parent)
  Reactor.run { @engine.connect(endpoint) }
  end
 
@@ -201,7 +210,10 @@ module OMQ
  end
 
 
- # Closes the socket and releases all resources.
+ # Closes the socket and releases all resources. Drains pending sends
+ # up to +linger+ seconds, then cascades teardown through the
+ # socket-level Async::Barrier — every connection's per-connection
+ # barrier is stopped, cancelling every pump.
  #
  # @return [nil]
  #
@@ -211,6 +223,18 @@ module OMQ
  end
 
 
+ # Immediate hard stop. Skips the linger drain and cascades teardown
+ # through the socket-level Async::Barrier. Intended for crash-path
+ # cleanup or when the caller already knows no pending sends matter.
+ #
+ # @return [nil]
+ #
+ def stop
+ Reactor.run { @engine.stop }
+ nil
+ end
+
+
  # Set socket to use unbounded pipes (HWM=0).
  #
  # @return [nil]
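The difference between `#close` and the new `#stop` can be sketched with a toy pump: close drains queued sends for up to a `linger` deadline before tearing down, while stop discards them immediately. The class and its internals are illustrative, not the engine's actual code:

```ruby
# Toy model of the close-vs-stop split: close drains, stop discards.
class ToyPump
  attr_reader :sent

  def initialize
    @pending = Thread::Queue.new
    @sent = []
  end

  def enqueue(msg)
    @pending << msg
  end

  # Drain pending sends until empty or the linger deadline passes,
  # then fall through to the hard stop.
  def close(linger: 0.1)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + linger
    until @pending.empty?
      break if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      @sent << @pending.pop
    end
    stop
  end

  # Immediate hard stop: whatever is still queued is dropped.
  def stop
    @pending.clear
    nil
  end
end

drained = ToyPump.new
drained.enqueue(:hello)
drained.close   # linger drain delivers :hello before teardown

dropped = ToyPump.new
dropped.enqueue(:hello)
dropped.stop    # hard stop discards the pending send
```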
@@ -254,8 +278,8 @@ module OMQ
  # Must be called OUTSIDE Reactor.run so that non-Async callers
  # get the IO thread's root task, not an ephemeral work task.
  #
- def ensure_parent_task
- @engine.capture_parent_task
+ def ensure_parent_task(parent: nil)
+ @engine.capture_parent_task(parent: parent)
  end
 
 
@@ -292,9 +316,7 @@ module OMQ
  @options.recv_timeout = recv_timeout if recv_timeout
  @options.conflate = conflate
  @options.on_mute = on_mute if on_mute
- @recv_buffer = []
- @recv_mutex = Mutex.new
- @engine = case backend
+ @engine = case backend
  when nil, :ruby
  Engine.new(socket_type, @options)
  when :ffi
data/lib/omq/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module OMQ
- VERSION = "0.16.2"
+ VERSION = "0.17.0"
  end
data/lib/omq.rb CHANGED
@@ -45,6 +45,7 @@ module OMQ
  Errno::ETIMEDOUT,
  Errno::EHOSTUNREACH,
  Errno::ENETUNREACH,
+ Errno::EPROTOTYPE, # IPC: existing socket file is SOCK_DGRAM, not SOCK_STREAM
  Socket::ResolutionError,
  ]
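The new `Errno::EPROTOTYPE` entry covers a stream `connect(2)` hitting a path already bound by a datagram unix socket. A self-contained repro sketch; the exact errno varies by platform (EPROTOTYPE on some, ECONNREFUSED on others), so the rescue is kept broad:

```ruby
require "socket"
require "tmpdir"

path = File.join(Dir.mktmpdir, "stale.sock")

# Leave a SOCK_DGRAM socket file at the path, as a stale IPC
# endpoint from another process might.
dgram = Socket.new(:UNIX, :DGRAM)
dgram.bind(Socket.sockaddr_un(path))

# A ZMTP-style SOCK_STREAM connect to the same path then fails;
# on some platforms the errno is EPROTOTYPE.
error =
  begin
    UNIXSocket.new(path)
    nil
  rescue SystemCallError => e
    e
  end
dgram.close
```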
 
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: omq
  version: !ruby/object:Gem::Version
- version: 0.16.2
+ version: 0.17.0
  platform: ruby
  authors:
  - Patrik Wenger