eventhub-processor2 1.27.2 → 1.28.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 1c00f385100ce56b24b64a527d935dc4820ce5446c41361792d5f00b7345940d
4
- data.tar.gz: f8d3a05880d7af80858c39b000325ec4e3789551961d941cf0511fd051ec6d58
3
+ metadata.gz: 712c5d8a0d59c49a1efe698824cca79e0d229894db4673cfc6eaed9d8080bc57
4
+ data.tar.gz: c3107bdb2f39ab2e79faa2bcd6cd2230c0c4fa07b31a067efd9cba07fe51540b
5
5
  SHA512:
6
- metadata.gz: d20c763b7e403e2946a83af3cba688ebbd2ef92199332ce7aa2e22e6c3bd9bc8c288fea4dcace16c2a1d504fb2c8d643a6d3eb54ea419bc482d75675155c5c70
7
- data.tar.gz: fb3830c2b68e056d995b8c33742cdd46f216788094cce20f25d5e176a74f86facdc52c9b2da9eeb7b9561d7b16cce8683187e1eccb14c6d29ed1d0b820195a74
6
+ metadata.gz: 8ff4cb3ad79b36661685556048fd9006f899c1deb7dcc80e00c83f15a253e2d5bd6e60d5697a83827ad1adcefe22dd2c54d3de7ee4eeeb0e9ce3126e7f3c345d
7
+ data.tar.gz: e486e53d7a81179645b3e2a0be64bca1e6604215a16bdb5e1c5a08114ed2b7893339bf89ae48c9c34241bf516c49fa0bded503673b2fa15934b08c9bf20f2ee4
data/.gitignore CHANGED
@@ -2,7 +2,8 @@
2
2
  coverage
3
3
  .rspec_status
4
4
  logs
5
- example/data
5
+ soak/data
6
+ soak/logs
6
7
  pkg
7
8
  tmp
8
9
  checksums
data/.tool-versions CHANGED
@@ -1 +1 @@
1
- ruby 4.0.2
1
+ ruby 4.0.4
data/CHANGELOG.md CHANGED
@@ -1,5 +1,37 @@
1
1
  # Changelog of EventHub::Processor2
2
2
 
3
+ # 1.28.0 / 2026-05-18
4
+
5
+ **Reliability**
6
+
7
+ * Survive broker restarts. `automatically_recover: true` + recovery callbacks, so transient disconnects no longer crash the Celluloid actor or exhaust the supervisor's restart budget. Fixes the dispatcher's crash-loop into permanent death after a broker bounce.
8
+ * Survive silent consumer-thread death. New `channel#on_uncaught_exception` / `consumer#on_cancellation` hooks on `ActorListenerAmqp` escalate to an actor restart, but only for non-recoverable errors - transient `Bunny::NetworkFailure` / `ConnectionClosedError` / `TCPConnectionFailed` / `Timeout::Error` / `IOError` and mid-disconnect cancellations are left to Bunny's recovery (no spurious 15s restart sleep).
9
+ * `ActorWatchdog` tolerates transient broker errors; only escalates on 3 consecutive cycles of persistent queue absence.
10
+ * Tolerant `cleanup` in `ActorListenerAmqp`, `ActorPublisher`, `ActorHeartbeat` - bunny-3 raises on `close` of a torn-down session no longer crash the finalizer.
11
+ * **Patch Celluloid 0.18 / Ruby 3.x incompatibility.** Celluloid's `Internals::Logger.crash` mutates a frozen string literal, which raises `FrozenError` before our exception handler runs - the actor thread then dies silently, the supervisor never sees the exit, and the listener becomes a zombie. Symptom: SIGHUP-triggered restart looks like it works but the listener stops consuming. Patched via `Module#prepend` in `lib/eventhub/patches/celluloid_logger.rb`.
12
+
13
+ **Throughput**
14
+
15
+ * `ActorPublisher` and `ActorHeartbeat` now reuse a single channel per actor instead of opening one per message. Eliminates the RabbitMQ `Channel is stopping with N pending publisher confirms` warning and lifts publisher throughput from ~hundreds to ~7k msg/s in local tests.
16
+ * New `rake test:performance` task: regression gate at 5k msg/s on a reused channel (opt-in, tagged `:performance`).
17
+
18
+ **Tracing**
19
+
20
+ * `correlation_id` now survives the cross-actor hop on publish. `Processor2#publish` captures `CorrelationId.current` in the caller's thread before handing off to the publisher actor, so the AMQP header is preserved end-to-end.
21
+
22
+ **Security**
23
+
24
+ * Sensitive-value redaction in the rendered config page now walks nested hashes and arrays. Previously `server.credentials.password` and `connections[].token` leaked through; now redacted at any depth.
25
+
26
+ **Observability**
27
+
28
+ * `Celluloid.exception_handler` log line includes the dying actor's class name.
29
+ * Log broker `connection.blocked` / `connection.unblocked` events.
30
+
31
+ **Test harness**
32
+
33
+ * New `soak/` reliability harness (publisher / router / receiver / crasher) with `make soak` Makefile target. Adaptive drain, real-vs-in-flight orphan classification, configurable chaos length. Validated with 2h chaos run: 0 real orphans.
34
+
3
35
  # 1.27.2 / 2026-04-08
4
36
 
5
37
  * Fix publish return value leaking Bunny::Exchange object back to callers, causing unintended re-publishing of garbage messages via `handle_payload`
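The Tracing entry above describes capturing `CorrelationId.current` in the caller's thread before handing off to the publisher actor. A minimal sketch of that idea, using plain threads instead of Celluloid; `RequestContext` and `publish_on_worker` are hypothetical names for illustration, not the gem's API:

```ruby
# Hypothetical thread-local holder, loosely modelled on the `.current` / `.with`
# calls visible in the diff. Not the gem's CorrelationId implementation.
module RequestContext
  KEY = :request_context_correlation_id

  def self.current
    Thread.current[KEY]
  end

  # Set the id for the duration of the block, then restore the previous value.
  def self.with(id)
    previous = Thread.current[KEY]
    Thread.current[KEY] = id
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

# The keyword default is evaluated when the method is called, i.e. in the
# caller's thread, before the work moves to another thread.
def publish_on_worker(message, correlation_id: RequestContext.current)
  Thread.new do
    snapshot = {
      message: message,
      correlation_id: correlation_id,             # captured before the hop
      seen_in_worker: RequestContext.current      # nil: the worker has its own locals
    }
    snapshot
  end.value
end

result = RequestContext.with("abc-123") { publish_on_worker("hello") }
puts "captured: #{result[:correlation_id].inspect}, worker-local: #{result[:seen_in_worker].inspect}"
```

Reading the value inside the worker would yield `nil`, which is why the id has to travel as an explicit argument.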
data/Makefile ADDED
@@ -0,0 +1,115 @@
1
+ # Convenience targets for running and soak-testing the chaos harness in soak/.
2
+ # The gem itself is built via `rake` (see Rakefile).
3
+
4
+ SHELL := /bin/bash
5
+ SOAK_DIR := soak
6
+ DATA_DIR := $(SOAK_DIR)/data
7
+ LOG_DIR := $(SOAK_DIR)/logs
8
+ SOAK_MINUTES ?= 10
9
+ SOAK_DRAIN_POLL_S ?= 5
10
+ SOAK_DRAIN_MAX_S ?= 600
11
+
12
+ .PHONY: help
13
+ help:
14
+ @echo "Targets:"
15
+ @echo " make soak-start start publisher, router, receiver, crasher in background"
16
+ @echo " make soak-stop stop everything (SIGINT, clean shutdown)"
17
+ @echo " make soak-clean stop and wipe $(DATA_DIR)/ + $(LOG_DIR)/"
18
+ @echo " make soak run reliability soak for SOAK_MINUTES (default 10) min"
19
+ @echo ""
20
+ @echo "Soak env overrides:"
21
+ @echo " SOAK_MINUTES=N length of the chaos phase (default 10)"
22
+ @echo " SOAK_DRAIN_POLL_S=N drain poll interval in seconds (default 5)"
23
+ @echo " SOAK_DRAIN_MAX_S=N drain hard cap in seconds (default 600 = 10 min)"
24
+ @echo ""
25
+ @echo "Publisher env overrides (forwarded to publisher.rb):"
26
+ @echo " PAUSE_BETWEEN_WORK=F seconds between publishes (default 0.05)"
27
+ @echo " PUBLISH_MAX_ATTEMPTS=N publish retry attempts on transient errors (default 8)"
28
+ @echo " PUBLISH_RETRY_DELAY_S=F seconds between publish retries (default 1)"
29
+
30
+ .PHONY: soak-start
31
+ soak-start: soak-stop
32
+ @mkdir -p $(DATA_DIR)
33
+ @echo "==> starting receiver"
34
+ @cd $(SOAK_DIR) && nohup ruby receiver.rb > /dev/null 2>&1 &
35
+ @echo "==> starting router"
36
+ @cd $(SOAK_DIR) && nohup ruby router.rb > /dev/null 2>&1 &
37
+ @sleep 3
38
+ @echo "==> starting publisher"
39
+ @cd $(SOAK_DIR) && nohup ruby publisher.rb > /dev/null 2>&1 &
40
+ @sleep 2
41
+ @echo "==> starting crasher"
42
+ @cd $(SOAK_DIR) && nohup ruby crasher.rb > /dev/null 2>&1 &
43
+ @echo "==> all processes started; tail logs in $(SOAK_DIR)/logs/ruby/"
44
+
45
+ .PHONY: soak-stop
46
+ soak-stop:
47
+ @pkill -INT -f "ruby (publisher|router|receiver|crasher)\.rb" 2>/dev/null || true
48
+ @sleep 2
49
+
50
+ .PHONY: soak-clean
51
+ soak-clean: soak-stop
52
+ @rm -rf $(DATA_DIR) $(LOG_DIR)
53
+ @mkdir -p $(DATA_DIR) $(LOG_DIR)/ruby
54
+ @echo "==> cleaned $(DATA_DIR) and $(LOG_DIR)"
55
+
56
+ # Soak: run the chaos loop for SOAK_MINUTES, then SIGKILL the publisher
57
+ # (skipping its cleanup so we keep an honest snapshot of any in-flight
58
+ # files), give router+receiver up to SOAK_DRAIN_MAX_S seconds to drain the queue,
59
+ # then count remaining files. Anything left is an orphan.
60
+ .PHONY: soak
61
+ soak:
62
+ @started=$$(date +%s); started_human=$$(date '+%Y-%m-%d %H:%M:%S %Z'); \
63
+ chaos_ends_at=$$(date -r $$(($$started + $(SOAK_MINUTES) * 60)) '+%H:%M:%S' 2>/dev/null \
64
+ || date -d @"$$(($$started + $(SOAK_MINUTES) * 60))" '+%H:%M:%S' 2>/dev/null); \
65
+ echo "==> soak: $(SOAK_MINUTES) min chaos + adaptive drain (cap $(SOAK_DRAIN_MAX_S)s)"; \
66
+ echo " started: $$started_human"; \
67
+ echo " chaos ends ~ $$chaos_ends_at"; \
68
+ $(MAKE) --no-print-directory soak-clean; \
69
+ mkdir -p $(DATA_DIR); \
70
+ ( cd $(SOAK_DIR) && nohup ruby receiver.rb > /dev/null 2>&1 & ); \
71
+ ( cd $(SOAK_DIR) && nohup ruby router.rb > /dev/null 2>&1 & ); \
72
+ sleep 3; \
73
+ ( cd $(SOAK_DIR) && nohup ruby publisher.rb > /dev/null 2>&1 & ); \
74
+ sleep 2; \
75
+ ( cd $(SOAK_DIR) && nohup ruby crasher.rb > /dev/null 2>&1 & ); \
76
+ echo "==> chaos phase running for $$(($(SOAK_MINUTES) * 60))s..."; \
77
+ sleep $$(($(SOAK_MINUTES) * 60)); \
78
+ echo "==> stopping crasher (SIGINT)"; \
79
+ pkill -INT -f "ruby crasher\.rb" 2>/dev/null || true; \
80
+ echo "==> SIGKILL publisher to skip its cleanup and freeze the snapshot"; \
81
+ pkill -KILL -f "ruby publisher\.rb" 2>/dev/null || true; \
82
+ sleep 1; \
83
+ drain_started=$$(date +%s); prev=-1; deadline=$$(($$drain_started + $(SOAK_DRAIN_MAX_S))); \
84
+ echo "==> draining (poll every $(SOAK_DRAIN_POLL_S)s, cap $(SOAK_DRAIN_MAX_S)s)"; \
85
+ echo " (orphans = files with no matching store.json entry; in-flight = SIGKILL race, not a failure)"; \
86
+ while :; do \
87
+ now=$$(date +%s); \
88
+ read real in_flight <<< $$($(SOAK_DIR)/check_orphans.rb $(DATA_DIR)); \
89
+ elapsed_drain=$$(($$now - $$drain_started)); \
90
+ printf " [%4ds] real=%d in_flight=%d\n" $$elapsed_drain $$real $$in_flight; \
91
+ if [ "$$real" = "0" ]; then break; fi; \
92
+ if [ $$now -ge $$deadline ]; then echo " drain cap reached, giving up"; break; fi; \
93
+ if [ "$$real" = "$$prev" ]; then echo " no progress in last interval, giving up"; break; fi; \
94
+ prev=$$real; \
95
+ sleep $(SOAK_DRAIN_POLL_S); \
96
+ done; \
97
+ finished=$$(date +%s); finished_human=$$(date '+%Y-%m-%d %H:%M:%S %Z'); \
98
+ elapsed=$$(($$finished - $$started)); \
99
+ read real in_flight <<< $$($(SOAK_DIR)/check_orphans.rb $(DATA_DIR)); \
100
+ echo ""; \
101
+ echo "==> soak result"; \
102
+ echo " started: $$started_human"; \
103
+ echo " finished: $$finished_human"; \
104
+ echo " elapsed total: $${elapsed}s"; \
105
+ echo " drain time: $$(($$finished - $$drain_started))s"; \
106
+ echo " real orphans: $$real (pipeline loss)"; \
107
+ echo " in-flight at SIGKILL: $$in_flight (expected residual; cleaned on next publisher start)"; \
108
+ $(MAKE) --no-print-directory soak-stop; \
109
+ if [ "$$real" = "0" ]; then \
110
+ echo "==> PASS"; \
111
+ else \
112
+ echo "==> FAIL: $$real real orphan(s) remain after drain"; \
113
+ $(SOAK_DIR)/check_orphans.rb $(DATA_DIR) --list-orphans >/dev/null; \
114
+ exit 1; \
115
+ fi
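The drain loop in the `soak` target stops on one of three conditions, checked in this order: zero real orphans, deadline reached, or no progress since the previous poll. The same control flow as a small Ruby sketch; `count_real` is a hypothetical callable standing in for `check_orphans.rb`, and the clock and sleeper are injectable only so the example runs instantly:

```ruby
MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

# Poll until the count reaches zero, the cap is hit, or the count stops changing.
# Returns [reason, last_count].
def drain(count_real, poll_s:, max_s:, clock: MONOTONIC_CLOCK, sleeper: ->(s) { sleep(s) })
  deadline = clock.call + max_s
  prev = nil
  loop do
    real = count_real.call
    return [:drained, real] if real.zero?
    return [:cap_reached, real] if clock.call >= deadline
    return [:stalled, real] if real == prev   # no progress in the last interval
    prev = real
    sleeper.call(poll_s)
  end
end

counts = [5, 2, 0].each
p drain(-> { counts.next }, poll_s: 0, max_s: 60, sleeper: ->(_s) {})
# => [:drained, 0]
```

The stall check is what makes the drain adaptive: a run with a stuck count ends after one extra poll rather than waiting out the full cap.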
data/README.md CHANGED
@@ -463,7 +463,7 @@ end
463
463
 
464
464
  ### Configuration
465
465
 
466
- Displays the active configuration as an HTML table. Sensitive values (passwords, tokens, keys) are automatically redacted.
466
+ Displays the active configuration as an HTML table. Sensitive values (passwords, tokens, keys) are automatically redacted at any depth — keys matching the sensitive list are masked whether they appear at the top level, inside nested hashes, or inside hashes nested in arrays.
467
467
 
468
468
  ```
469
469
  GET {base_path}/docs/configuration
@@ -547,6 +547,35 @@ end
547
547
 
548
548
  To install this gem onto your local machine, run `bundle exec rake install`.
549
549
 
550
+ ### Reliability soak harness
551
+
552
+ A multi-process chaos harness lives in `soak/` (publisher, router, receiver, crasher). It validates end-to-end reliability under sustained broker restarts and `SIGHUP`-triggered listener restarts. A run is considered passing when zero "real orphan" files remain (files whose UUID is not in the publisher's transaction store — i.e. messages whose delivery was claimed by the publisher but never made it through the pipeline).
553
+
554
+ ```
555
+ # 10-minute chaos + adaptive drain (default), prints PASS/FAIL
556
+ make soak
557
+
558
+ # Longer runs
559
+ SOAK_MINUTES=60 make soak
560
+ SOAK_MINUTES=120 SOAK_DRAIN_MAX_S=1200 make soak
561
+
562
+ # See all targets and env knobs
563
+ make help
564
+ ```
565
+
566
+ See `soak/README.md` for the full description.
567
+
568
+ ### Throughput baseline
569
+
570
+ `rake test:performance` runs a publisher-throughput regression spec against a real local RabbitMQ. It is excluded from `rake spec` (tagged `:performance`) so it does not run on every CI build.
571
+
572
+ ```
573
+ bundle exec rake test:performance
574
+ PERF_FLOOR=3000 bundle exec rake test:performance # lower the floor for slower hosts
575
+ ```
576
+
577
+ Default floor is 5,000 msg/s. Typical local result is ~7,000 msg/s.
578
+
550
579
  ## Publishing
551
580
 
552
581
  This project uses [Trusted Publishing](https://guides.rubygems.org/trusted-publishing/) to securely publish gems to RubyGems.org via GitHub Actions. To release a new version:
data/Rakefile CHANGED
@@ -4,6 +4,15 @@ require "standard/rake"
4
4
 
5
5
  RSpec::Core::RakeTask.new(:spec) do |t|
6
6
  t.verbose = false
7
+ t.rspec_opts = "--tag ~performance"
8
+ end
9
+
10
+ namespace :test do
11
+ desc "Run throughput baseline against real RabbitMQ (skipped by `rake spec`)"
12
+ RSpec::Core::RakeTask.new(:performance) do |t|
13
+ t.verbose = false
14
+ t.rspec_opts = "--tag performance --format documentation"
15
+ end
7
16
  end
8
17
 
9
18
  desc "Initialize or reset rabbitmq docker container (run before rspec)"
data/example/README.md CHANGED
@@ -1,40 +1,20 @@
1
- ## Example Application
1
+ ## Example processor
2
2
 
3
- ### Description
3
+ `example.rb` is a minimal `EventHub::Processor2` subclass used to demo and
4
+ visually verify the gem's built-in HTTP endpoints (`/heartbeat`, `/version`,
5
+ `/docs`, `/changelog`, `/configuration`). It's intentionally tiny - just enough
6
+ to start a processor against the local RabbitMQ container.
4
7
 
5
- Example folder contains a series of applications in order to test reliability and performance of processor2 gem.
6
- * publisher.rb - Creates a unique file and a message
7
- * router.rb - routes messges via queues
8
- * receiver.rb - receives message and does final processing
9
- * crasher.rb - restarts message broker or sends signals to other processes
8
+ For the chaos / reliability test harness (publisher, router, receiver,
9
+ crasher), see [`../soak/`](../soak/) and the `make soak` target in the
10
+ project root.
10
11
 
11
- ### How does it work?
12
+ ### Run
12
13
 
13
- A message is passed throuhg the following components.
14
- publisher.rb => [example.outbound] => router.rb => [example.inbound] => receiver.rb
14
+ ```bash
15
+ cd example
16
+ bundle exec ruby example.rb
17
+ ```
15
18
 
16
- 1. publisher.rb generates a unique ID, creates a json message with the ID as payload, save the message in a file, and passes the message to example.outbound queue.
17
-
18
- 2. router.rb receives the message and passes it to exmaple.outbound queue
19
-
20
- 3. receiver.rb gets the message and deletes the file with the given ID
21
-
22
- ### Goal
23
- What ever happens to these components (restarted, killed and restarted, stopped and started, message broker killed, stopped and started) if you do a graceful shutdown at the end there should be no message in the /data folder (except store.json).
24
-
25
- Graceful shutdown with CTRL-C or TERM signal to pid
26
- * Stop producer.rb. Leave the other components running until all messages in example.* queues are gone.
27
- * Stop remaining components
28
- * Check ./example/data folder
29
-
30
-
31
- ### How to use?
32
- * Make sure docker container (process-rabbitmq) is running (see [readme](../docker/README.md))
33
- * Start one or more router with: bundle exec ruby router.rb
34
- * Start one or more receiver with: bundle exec ruby receier.rb
35
- * Start one publisher with: bundle exec ruby publisher.rb
36
- * Start one crasher with: bundle exec ruby crasher.rb (or do this manually)
37
-
38
- ### Note
39
- * Publisher has a simple transaction store implemented to deal with issues between file creation and file publishing. At the end of the publisher process in the cleanup method pending transaction get processed and coresponding files get deleted.
40
- * Watch for huge log files!
19
+ Then visit `http://localhost:8083/svc/example/docs` (or whatever `http.port`
20
+ is configured in `config/example.json`).
@@ -9,6 +9,8 @@ module EventHub
9
9
 
10
10
  def initialize(processor_instance)
11
11
  @processor_instance = processor_instance
12
+ @connection = nil
13
+ @channel = nil
12
14
  async.start
13
15
  end
14
16
 
@@ -28,21 +30,49 @@ module EventHub
28
30
 
29
31
  def cleanup
30
32
  EventHub.logger.info("Heartbeat is cleaning up...")
31
- publish(heartbeat(action: "stopped"))
32
- EventHub.logger.info("Heartbeat has sent a [stopped] beat")
33
+ begin
34
+ publish(heartbeat(action: "stopped"))
35
+ EventHub.logger.info("Heartbeat has sent a [stopped] beat")
36
+ rescue => ex
37
+ EventHub.logger.warn("Heartbeat cleanup publish: ignoring #{ex.class}: #{ex.message}")
38
+ end
39
+ begin
40
+ @channel&.close
41
+ rescue => ex
42
+ EventHub.logger.warn("Heartbeat cleanup channel: ignoring #{ex.class}: #{ex.message}")
43
+ end
44
+ begin
45
+ @connection&.close
46
+ rescue => ex
47
+ EventHub.logger.warn("Heartbeat cleanup connection: ignoring #{ex.class}: #{ex.message}")
48
+ end
33
49
  end
34
50
 
35
51
  private
36
52
 
37
53
  def publish(message)
38
- connection = create_bunny_connection
39
- connection.start
40
- channel = connection.create_channel
41
- channel.confirm_select(tracking: true)
42
- exchange = channel.direct(EventHub::EH_X_INBOUND, durable: true)
54
+ ensure_channel
55
+ exchange = @channel.direct(EventHub::EH_X_INBOUND, durable: true)
43
56
  exchange.publish(message, persistent: true)
44
- ensure
45
- connection&.close
57
+ rescue Bunny::NetworkFailure, Bunny::ChannelAlreadyClosed => e
58
+ EventHub.logger.warn("Heartbeat channel dropped: #{e.class}: #{e.message}")
59
+ begin
60
+ @channel&.close
61
+ rescue
62
+ nil
63
+ end
64
+ @channel = nil
65
+ raise
66
+ end
67
+
68
+ def ensure_channel
69
+ unless @connection
70
+ @connection = create_bunny_connection
71
+ @connection.start
72
+ end
73
+ return if @channel&.open?
74
+ @channel = @connection.create_channel
75
+ @channel.confirm_select(tracking: true)
46
76
  end
47
77
 
48
78
  def heartbeat(args = {action: "running"})
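The tolerant `cleanup` above wraps each close in its own `begin`/`rescue` so that one failure does not skip the next step. A compact sketch of the same pattern as a helper; `close_quietly`, `FlakyChannel`, and `FakeSession` are hypothetical stand-ins, not Bunny classes:

```ruby
require "logger"

# Close each resource in order. A StandardError from one close is logged and
# swallowed so the remaining resources are still closed. nil entries are skipped.
def close_quietly(resources, logger:)
  resources.each do |label, resource|
    resource&.close
  rescue => ex
    logger.warn("cleanup #{label}: ignoring #{ex.class}: #{ex.message}")
  end
end

# Stand-in for a channel whose session is already gone.
class FlakyChannel
  def close
    raise IOError, "session already torn down"
  end
end

# Stand-in for a connection that closes normally.
class FakeSession
  attr_reader :closed

  def close
    @closed = true
  end
end

session = FakeSession.new
close_quietly({channel: FlakyChannel.new, connection: session, unused: nil}, logger: Logger.new($stdout))
puts "connection closed: #{session.closed}"
```

The rescue sits inside the per-resource block, so the warning for the channel is logged and the connection is still closed afterwards.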
@@ -39,6 +39,37 @@ module EventHub
39
39
  def listen(args = {})
40
40
  with_listen(args) do |connection, channel, consumer, queue, queue_name|
41
41
  EventHub.logger.info("Listening to queue [#{queue_name}]")
42
+
43
+ # log broker-initiated connection state changes
44
+ connection.on_blocked { |reason| EventHub.logger.warn("Broker blocked connection: #{reason}") }
45
+ connection.on_unblocked { EventHub.logger.info("Broker unblocked connection") }
46
+
47
+ # Only escalate to an actor restart for exceptions Bunny will NOT
48
+ # recover from. Transient network exceptions are handled by Bunny's
49
+ # automatic recovery; an actor restart in that window races recovery
50
+ # and incurs an avoidable 15s before_restart sleep without consumption.
51
+ channel.on_uncaught_exception do |ex, _consumer|
52
+ if recoverable_bunny_error?(ex)
53
+ EventHub.logger.warn("Consumer thread raised recoverable #{ex.class}: #{ex.message} - leaving recovery to Bunny")
54
+ else
55
+ EventHub.logger.error("Consumer thread raised non-recoverable #{ex.class}: #{ex.message} - restarting listener")
56
+ Celluloid::Actor[:actor_listener_amqp]&.async&.restart
57
+ end
58
+ end
59
+
60
+ # Broker may cancel a consumer (queue deleted, HA failover, policy change).
61
+ # If the connection is still open, this is a real broker-side cancel and
62
+ # we must restart. If the connection is closed/recovering, Bunny will
63
+ # re-register the consumer itself on reconnect; do not race it.
64
+ consumer.on_cancellation do
65
+ if connection.open?
66
+ EventHub.logger.error("Consumer for [#{queue_name}] cancelled by broker - restarting listener")
67
+ Celluloid::Actor[:actor_listener_amqp]&.async&.restart
68
+ else
69
+ EventHub.logger.warn("Consumer for [#{queue_name}] cancelled during disconnect - leaving recovery to Bunny")
70
+ end
71
+ end
72
+
42
73
  consumer.on_delivery do |delivery_info, metadata, payload|
43
74
  CorrelationId.with(metadata[:correlation_id]) do
44
75
  EventHub.logger.info("#{queue_name}: [#{delivery_info.delivery_tag}]" \
@@ -70,9 +101,10 @@ module EventHub
70
101
 
71
102
  def with_listen(args = {}, &block)
72
103
  connection = create_bunny_connection
73
- connection.start
74
104
  queue_name = args[:queue_name]
105
+ # store FIRST so cleanup can find a partially-started session
75
106
  @connections[queue_name] = connection
107
+ connection.start
76
108
  channel = connection.create_channel
77
109
  channel.prefetch(1)
78
110
  queue = channel.queue(queue_name, durable: true)
@@ -88,6 +120,7 @@ module EventHub
88
120
  def handle_payload(args = {})
89
121
  response_messages = []
90
122
  connection = args[:connection]
123
+ correlation_id = args[:correlation_id] || CorrelationId.current
91
124
 
92
125
  # convert to EventHub message
93
126
  message = EventHub::Message.from_json(args[:payload])
@@ -123,9 +156,12 @@ module EventHub
123
156
  end
124
157
  end
125
158
 
159
+ # use possibly-updated execution_id fallback from above
160
+ correlation_id ||= CorrelationId.current
161
+
126
162
  Array(response_messages).each do |message|
127
163
  next unless message.is_a?(EventHub::Message)
128
- publish(message: message.to_json, connection: connection)
164
+ publish(message: message.to_json, connection: connection, correlation_id: correlation_id)
129
165
  end
130
166
  end
131
167
 
@@ -136,15 +172,33 @@ module EventHub
136
172
 
137
173
  def cleanup
138
174
  EventHub.logger.info("Listener amqp is cleaning up...")
139
- # close all open connections
175
+ # close all open connections; bunny-3 can raise on a torn-down session
140
176
  return unless @connections
141
177
  @connections.values.each do |connection|
142
178
  connection&.close
179
+ rescue => ex
180
+ EventHub.logger.warn("Listener cleanup: ignoring #{ex.class}: #{ex.message}")
143
181
  end
144
182
  end
145
183
 
146
184
  def publish(args)
147
185
  @actor_publisher.publish(args)
148
186
  end
187
+
188
+ # Exceptions that Bunny's network recovery handles transparently. If one of
189
+ # these bubbles into `on_uncaught_exception`, the right move is to let the
190
+ # in-flight recovery complete rather than racing it with an actor restart.
191
+ RECOVERABLE_BUNNY_ERRORS = [
192
+ Bunny::NetworkFailure,
193
+ Bunny::ConnectionClosedError,
194
+ Bunny::TCPConnectionFailed,
195
+ Bunny::TCPConnectionFailedForAllHosts,
196
+ Timeout::Error,
197
+ IOError
198
+ ].freeze
199
+
200
+ def recoverable_bunny_error?(ex)
201
+ RECOVERABLE_BUNNY_ERRORS.any? { |klass| ex.is_a?(klass) }
202
+ end
149
203
  end
150
204
  end
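`recoverable_bunny_error?` classifies by `is_a?`, so subclasses of a listed class count as recoverable too. A standalone sketch of that decision using only standard-library exceptions plus one hypothetical stand-in (`FakeNetworkFailure`) in place of the Bunny classes:

```ruby
require "timeout"

# Hypothetical stand-in for a client library's network error class.
class FakeNetworkFailure < StandardError; end

RECOVERABLE = [FakeNetworkFailure, Timeout::Error, IOError].freeze

def recoverable?(ex)
  RECOVERABLE.any? { |klass| ex.is_a?(klass) }
end

# Map an exception to the action the listener would take.
def decide(ex)
  recoverable?(ex) ? :leave_to_client_recovery : :restart_listener
end

p decide(FakeNetworkFailure.new("broker bounced"))  # => :leave_to_client_recovery
p decide(EOFError.new("socket closed"))             # => :leave_to_client_recovery (EOFError < IOError)
p decide(ArgumentError.new("bad payload"))          # => :restart_listener
```

Because matching is by ancestry, adding a broad class such as `IOError` to the list also covers its descendants, which is worth keeping in mind when extending it.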
@@ -10,23 +10,17 @@ module EventHub
10
10
  def initialize
11
11
  EventHub.logger.info("Publisher is starting...")
12
12
  @connection = nil
13
+ @channel = nil
13
14
  end
14
15
 
15
16
  def publish(args = {})
16
- # keep connection once established
17
- unless @connection
18
- @connection = create_bunny_connection
19
- @connection.start
20
- end
17
+ ensure_channel
21
18
 
22
19
  message = args[:message]
23
20
  return if message.nil?
24
21
 
25
22
  exchange_name = args[:exchange_name] || EH_X_INBOUND
26
-
27
- channel = @connection.create_channel
28
- channel.confirm_select(tracking: true)
29
- exchange = channel.direct(exchange_name, durable: true)
23
+ exchange = @channel.direct(exchange_name, durable: true)
30
24
 
31
25
  publish_options = {persistent: true}
32
26
  correlation_id = args[:correlation_id] || CorrelationId.current
@@ -34,13 +28,53 @@ module EventHub
34
28
 
35
29
  exchange.publish(message, publish_options)
36
30
  nil
37
- ensure
38
- channel&.close
31
+ rescue Bunny::NetworkFailure, Bunny::ChannelAlreadyClosed => e
32
+ # broker-side close - drop the channel so next publish reopens it
33
+ EventHub.logger.warn("Publisher channel dropped: #{e.class}: #{e.message}")
34
+ begin
35
+ @channel&.close
36
+ rescue
37
+ nil
38
+ end
39
+ @channel = nil
40
+ raise
39
41
  end
40
42
 
41
43
  def cleanup
42
44
  EventHub.logger.info("Publisher is cleaning up...")
43
- @connection&.close
45
+ begin
46
+ @channel&.close
47
+ rescue => ex
48
+ EventHub.logger.warn("Publisher cleanup channel: ignoring #{ex.class}: #{ex.message}")
49
+ end
50
+ begin
51
+ @connection&.close
52
+ rescue => ex
53
+ EventHub.logger.warn("Publisher cleanup connection: ignoring #{ex.class}: #{ex.message}")
54
+ end
55
+ end
56
+
57
+ private
58
+
59
+ def ensure_channel
60
+ unless @connection
61
+ @connection = create_bunny_connection
62
+ @connection.start
63
+ end
64
+ return if @channel&.open?
65
+
66
+ attempts = 0
67
+ begin
68
+ @channel = @connection.create_channel
69
+ @channel.confirm_select(tracking: true)
70
+ rescue Bunny::NetworkFailure, Bunny::ChannelAlreadyClosed
71
+ attempts += 1
72
+ if attempts < 3
73
+ sleep 1
74
+ retry
75
+ end
76
+ raise
77
+ end
44
78
  end
45
79
  end
46
80
  end
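The publisher now opens a channel lazily and reuses it until it is no longer open. The sketch below shows only that reuse logic against hypothetical in-memory stand-ins (`FakeChannel`, `FakeConnection`); it leaves out publisher confirms, retries, and error handling from the diff:

```ruby
# Stand-in channel: records messages and can be closed.
class FakeChannel
  attr_reader :published

  def initialize
    @open = true
    @published = []
  end

  def open?
    @open
  end

  def close
    @open = false
  end

  def publish(message)
    @published << message
  end
end

# Stand-in connection: remembers every channel it has opened.
class FakeConnection
  attr_reader :channels

  def initialize
    @channels = []
  end

  def create_channel
    FakeChannel.new.tap { |channel| @channels << channel }
  end
end

class ReusingPublisher
  def initialize(connection)
    @connection = connection
    @channel = nil
  end

  def publish(message)
    ensure_channel
    @channel.publish(message)
    nil
  end

  private

  # Open a channel only when there is none or the current one is closed.
  def ensure_channel
    return if @channel&.open?
    @channel = @connection.create_channel
  end
end

connection = FakeConnection.new
publisher = ReusingPublisher.new(connection)
1_000.times { |i| publisher.publish("msg #{i}") }
puts connection.channels.size  # => 1
```

If the channel is closed between publishes, the next `publish` opens a second one, so the reuse does not depend on the channel living forever.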
@@ -7,9 +7,13 @@ module EventHub
7
7
 
8
8
  finalizer :cleanup
9
9
 
10
+ # number of consecutive failed cycles before we raise to force a restart
11
+ MISSING_QUEUE_THRESHOLD = 3
12
+
10
13
  def initialize
11
14
  cycle = Configuration.processor[:watchdog_cycle_in_s]
12
15
  EventHub.logger.info("Watchdog is starting [cycle: #{cycle}s]...")
16
+ @consecutive_failures = 0
13
17
  async.start
14
18
  end
15
19
 
@@ -30,14 +34,28 @@ module EventHub
30
34
  connection = create_bunny_connection
31
35
  connection.start
32
36
 
33
- EventHub::Configuration.processor[:listener_queues].each do |queue_name|
34
- unless connection.queue_exists?(queue_name)
35
- EventHub.logger.warn("Queue [#{queue_name}] is missing")
36
- raise "Queue [#{queue_name}] is missing"
37
+ missing = EventHub::Configuration.processor[:listener_queues].reject do |queue_name|
38
+ connection.queue_exists?(queue_name)
39
+ end
40
+
41
+ if missing.empty?
42
+ @consecutive_failures = 0
43
+ else
44
+ @consecutive_failures += 1
45
+ EventHub.logger.warn("Watchdog: queue(s) missing #{missing.inspect} (#{@consecutive_failures}/#{MISSING_QUEUE_THRESHOLD})")
46
+ if @consecutive_failures >= MISSING_QUEUE_THRESHOLD
47
+ raise "Queue(s) missing for #{@consecutive_failures} consecutive cycles: #{missing.inspect}"
37
48
  end
38
49
  end
50
+ rescue Bunny::NetworkFailure, Bunny::TCPConnectionFailed, Timeout::Error => ex
51
+ # transient broker problems are auto-recovered by Bunny; don't fight it
52
+ EventHub.logger.warn("Watchdog: transient broker error #{ex.class}: #{ex.message} - skipping cycle")
39
53
  ensure
40
- connection&.close
54
+ begin
55
+ connection&.close
56
+ rescue
57
+ nil
58
+ end
41
59
  end
42
60
  end
43
61
  end
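The watchdog change replaces "raise on the first missing queue" with a counter that resets on any healthy cycle. A minimal sketch of that counter on its own; `MissingQueueTracker` is a hypothetical name, and the broker check is replaced by passing the list of missing queues directly:

```ruby
class MissingQueueTracker
  THRESHOLD = 3

  attr_reader :consecutive_failures

  def initialize
    @consecutive_failures = 0
  end

  # Returns :ok when nothing is missing (and resets the counter), :tolerated
  # while below the threshold, and raises once the threshold is reached.
  def record(missing_queues)
    if missing_queues.empty?
      @consecutive_failures = 0
      return :ok
    end

    @consecutive_failures += 1
    if @consecutive_failures >= THRESHOLD
      raise "Queue(s) missing for #{@consecutive_failures} consecutive cycles: #{missing_queues.inspect}"
    end
    :tolerated
  end
end

tracker = MissingQueueTracker.new
p tracker.record(["example.inbound"])  # => :tolerated
p tracker.record([])                   # => :ok (counter back to 0)
p tracker.record(["example.inbound"])  # => :tolerated
```

In the diff, a rescued transient broker error skips the cycle and leaves the counter untouched, so only cycles that actually observed a missing queue count toward the threshold.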
data/lib/eventhub/base.rb CHANGED
@@ -28,6 +28,32 @@ require_relative "actor_listener_amqp"
28
28
  require_relative "actor_listener_http"
29
29
  require_relative "docs_renderer"
30
30
  require_relative "processor2"
31
+ require_relative "patches/celluloid_logger"
32
+
33
+ module EventHub
34
+ # Format a Celluloid actor exception with the dying actor's class name
35
+ # so post-mortem analysis can identify which actor died.
36
+ #
37
+ # Important: inside an actor's crash flow, `Celluloid.current_actor`
38
+ # returns the Proxy::Cell, whose `.class` goes through method_missing /
39
+ # the mailbox - which can hang on a dying actor. Read the raw actor
40
+ # object out of the thread-local instead and walk to the subject class
41
+ # via instance variables (no proxy round-trips).
42
+ def self.format_celluloid_exception(ex)
43
+ actor_name = begin
44
+ actor = Thread.current[:celluloid_actor]
45
+ if actor
46
+ behavior = actor.instance_variable_get(:@behavior)
47
+ subject = behavior&.instance_variable_get(:@subject)
48
+ subject&.class&.name
49
+ end
50
+ rescue
51
+ nil
52
+ end
53
+ prefix = actor_name ? "[#{actor_name}] " : ""
54
+ "#{prefix}Exception occured: #{ex.class}: #{ex.message}"
55
+ end
56
+ end
31
57
 
32
58
  Celluloid.logger = nil
33
- Celluloid.exception_handler { |ex| EventHub.logger.error "Exception occured: #{ex}" }
59
+ Celluloid.exception_handler { |ex| EventHub.logger.error(EventHub.format_celluloid_exception(ex)) }
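The `patches/celluloid_logger` file required above is described in the changelog as a `Module#prepend` fix for a method that appends to a frozen string. The general technique can be sketched as follows; `LegacyLogger` is a hypothetical stand-in for the third-party code, and passing a `dup` is an assumed way to fix it, not necessarily what the gem's patch does:

```ruby
# Stand-in for upstream code that mutates its argument in place.
module LegacyLogger
  def self.crash(string, exception)
    string << "\n" << exception.message  # raises FrozenError if string is frozen
    string
  end
end

# The patch: call the original with an unfrozen copy.
module UnfrozenCrashPatch
  def crash(string, exception)
    super(string.dup, exception)
  end
end

message = "Actor crashed!".freeze

begin
  LegacyLogger.crash(message, RuntimeError.new("boom"))
rescue FrozenError => e
  puts "unpatched: #{e.class}"
end

# `crash` is a singleton method, so the patch is prepended to the singleton class.
LegacyLogger.singleton_class.prepend(UnfrozenCrashPatch)

puts LegacyLogger.crash(message, RuntimeError.new("boom"))
```

Prepending keeps the original method reachable through `super`, so the patch only changes what is passed in and leaves the upstream logic alone.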
@@ -162,7 +162,9 @@ module EventHub
162
162
  def config_to_html_table(hash, depth = 0, prefix = "")
163
163
  rows = hash.map do |key, value|
164
164
  full_key = prefix.empty? ? key.to_s : "#{prefix}.#{key}"
165
- if depth == 0 && value.is_a?(Hash) && !value.empty?
165
+ if sensitive_key?(key) && !value.nil? && !(value.respond_to?(:empty?) && value.empty?)
166
+ "<tr><td class=\"config-key\">#{ERB::Util.html_escape(full_key)}</td><td>#{redacted_html}</td></tr>"
167
+ elsif depth == 0 && value.is_a?(Hash) && !value.empty?
166
168
  "<tr class=\"is-section is-section-top\"><td colspan=\"2\"><strong>#{ERB::Util.html_escape(full_key)}</strong></td></tr>\n" \
167
169
  "#{config_to_html_table(value, 1, full_key)}"
168
170
  elsif value.is_a?(Hash) && value.empty?
@@ -178,9 +180,7 @@ module EventHub
178
180
  elsif value.is_a?(Array)
179
181
  format_array_rows(full_key, key, value, depth)
180
182
  else
181
- display_value = if sensitive_key?(key)
182
- "<span class=\"redacted\">***</span>"
183
- elsif value.nil? || value.to_s.strip.empty?
183
+ display_value = if value.nil? || value.to_s.strip.empty?
184
184
  "<span class=\"not-set\">(not set)</span>"
185
185
  else
186
186
  ERB::Util.html_escape(value.to_s)
@@ -200,7 +200,7 @@ module EventHub
200
200
  return "<tr><td class=\"config-key\">#{ERB::Util.html_escape(full_key)}</td><td><span class=\"not-set\">(empty)</span></td></tr>" if array.empty?
201
201
 
202
202
  if sensitive_key?(key)
203
- return "<tr><td class=\"config-key\">#{ERB::Util.html_escape(full_key)}</td><td><span class=\"redacted\">***</span></td></tr>"
203
+ return "<tr><td class=\"config-key\">#{ERB::Util.html_escape(full_key)}</td><td>#{redacted_html}</td></tr>"
204
204
  end
205
205
 
206
206
  inner = array.map { |item| format_array_item(item) }.join("\n")
@@ -210,7 +210,7 @@ module EventHub
210
210
  def format_array_item(item)
211
211
  if item.is_a?(Hash)
212
212
  rows = item.map do |k, v|
213
- value = format_nested_value(v)
213
+ value = sensitive_key?(k) ? redacted_html : format_nested_value(v)
214
214
  "<tr><td>#{ERB::Util.html_escape(k)}</td><td>#{value}</td></tr>"
215
215
  end.join
216
216
  "<li><table class=\"table is-bordered is-narrow config-subtable\">#{rows}</table></li>"
@@ -225,7 +225,8 @@ module EventHub
225
225
  def format_nested_value(value)
226
226
  if value.is_a?(Hash)
227
227
  rows = value.map do |k, v|
228
- "<tr><td>#{ERB::Util.html_escape(k)}</td><td>#{format_nested_value(v)}</td></tr>"
228
+ inner = sensitive_key?(k) ? redacted_html : format_nested_value(v)
229
+ "<tr><td>#{ERB::Util.html_escape(k)}</td><td>#{inner}</td></tr>"
229
230
  end.join
230
231
  "<table class=\"table is-bordered is-narrow config-subtable\">#{rows}</table>"
231
232
  elsif value.is_a?(Array)
@@ -238,6 +239,10 @@ module EventHub
238
239
  end
239
240
  end
240
241
 
242
+ def redacted_html
243
+ "<span class=\"redacted\">***</span>"
244
+ end
245
+
241
246
  def compact_hash?(hash)
242
247
  hash.values.all? do |v|
243
248
  if v.is_a?(Hash)
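The renderer changes above apply `sensitive_key?` at every level of the HTML output. The same any-depth walk can be sketched on plain data; the `SENSITIVE` pattern here is a hypothetical placeholder (the gem's actual key list is not shown in this diff), and unlike the top-level branch in the diff this sketch masks sensitive keys even when their value is empty:

```ruby
SENSITIVE = /password|token|secret|key/i  # hypothetical pattern, for illustration only
REDACTED = "***"

# Return a copy of the structure with the values of sensitive keys masked,
# descending into nested hashes and into hashes inside arrays.
def redact(value)
  case value
  when Hash
    value.to_h { |k, v| [k, SENSITIVE.match?(k.to_s) ? REDACTED : redact(v)] }
  when Array
    value.map { |item| redact(item) }
  else
    value
  end
end

config = {
  server: {credentials: {user: "guest", password: "s3cret"}},
  connections: [{name: "primary", token: "abc"}]
}

safe = redact(config)
puts safe[:server][:credentials][:password]  # => ***
puts safe[:connections][0][:token]           # => ***
puts safe[:server][:credentials][:user]      # => guest
```

Checking the key before recursing means a sensitive key holding a whole hash or array is masked as a unit, rather than being walked and partially shown.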
@@ -18,6 +18,11 @@ module EventHub
18
18
  end
19
19
 
20
20
  def create_bunny_connection
21
+ connection_string, connection_properties = bunny_connection_options
22
+ Bunny.new(connection_string, connection_properties)
23
+ end
24
+
25
+ def bunny_connection_options
21
26
  server = EventHub::Configuration.server
22
27
 
23
28
  protocol = "amqp"
@@ -31,8 +36,21 @@ module EventHub
         connection_properties[:logger] = Logger.new(File::NULL)
       end

-      # we don't need it since reactors can deal with it
-      connection_properties[:automatically_recover] = false
+      # Bunny's network recovery: re-establish connection, re-open channels,
+      # re-declare queues, and re-register consumers transparently after a
+      # broker disconnect (heartbeat miss, broker restart, LB drop).
+      connection_properties[:automatically_recover] = true
+      connection_properties[:network_recovery_interval] = 5
+      connection_properties[:recovery_attempts] = nil
+      connection_properties[:continuation_timeout] = 15_000
+
+      # Belt-and-suspenders: if recovery_attempts is ever capped, escalate to
+      # a Celluloid actor restart instead of going silent.
+      connection_properties[:recovery_attempts_exhausted] = lambda do
+        EventHub.logger.error("Bunny recovery attempts exhausted - actor will restart")
+        actor = Celluloid::Actor[:actor_listener_amqp]
+        actor&.async&.restart
+      end

       # do we do tls?
       if server[:tls]
@@ -45,8 +63,7 @@ module EventHub
       end

       connection_string = "#{protocol}://#{server[:host]}:#{server[:port]}"
-
-      Bunny.new(connection_string, connection_properties)
+      [connection_string, connection_properties]
     end

     # Formats stamp into UTC format
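Note on the hunks above: splitting `bunny_connection_options` out of `create_bunny_connection` means the recovery settings can be inspected without opening a connection. The sketch below illustrates that idea only. `connection_options` and its `server` hash argument are hypothetical stand-ins for the gem's method and `EventHub::Configuration.server`, and the `amqps` switch for TLS is an assumption, since the TLS branch is not visible in this diff.

```ruby
# Hypothetical stand-in for bunny_connection_options; it takes the server
# settings as an argument instead of reading EventHub::Configuration.server.
def connection_options(server)
  # Assumption: TLS selects the amqps scheme (that branch is elided in the diff).
  protocol = server[:tls] ? "amqps" : "amqp"

  properties = {
    automatically_recover: true,   # let Bunny reconnect after a broker bounce
    network_recovery_interval: 5,  # seconds between reconnect attempts
    recovery_attempts: nil,        # nil means no cap on attempts
    continuation_timeout: 15_000   # milliseconds
  }

  ["#{protocol}://#{server[:host]}:#{server[:port]}", properties]
end

url, props = connection_options({host: "localhost", port: 5672, tls: false})
```

Because the helper returns plain data, a spec can assert on `props` directly and leave `Bunny.new` to the thin wrapper.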
@@ -0,0 +1,51 @@
+# EventHub patches for upstream gems.
+#
+# Celluloid 0.18 (the last released version, ~2016) is incompatible with
+# Ruby 3.x frozen-string-literal defaults. Its `Internals::Logger.crash`
+# mutates string literals like:
+#
+#   def crash(string, exception)
+#     if Celluloid.log_actor_crashes
+#       string << "\n" << format_exception(exception)  # FrozenError under Ruby 3.x
+#       error string
+#     end
+#     @exception_handlers.each { |h| h.call(exception) }
+#   end
+#
+# The `string << ...` raises FrozenError BEFORE the registered exception
+# handlers fire. The actor thread then dies silently:
+#   * no exit event is sent to the supervisor (no restart),
+#   * no exit event is sent to linked sub-actors (they stay alive as zombies),
+#   * no error is logged anywhere.
+#
+# Externally the symptom is: an actor whose method raises (e.g. our
+# `ActorListenerAmqp#restart` raising "Listener amqp is restarting...") appears
+# to be entering the raise but never actually dies, and the listener never
+# gets restarted. We hit this in 1.28.0 testing: SIGHUP looked like it
+# worked (Configuration reloaded, async.restart enqueued, restart entered)
+# but the listener silently became a zombie.
+#
+# Upstream fix is unlikely - Celluloid is unmaintained. We prepend a corrected
+# `crash` that defrosts the input string before mutating it. Behavior is
+# otherwise identical to the original.
+module EventHub
+  module Patches
+    module CelluloidLoggerCrash
+      def crash(string, exception)
+        message = +String(string)
+        if Celluloid.log_actor_crashes
+          message << "\n" << format_exception(exception)
+          error message
+        end
+
+        @exception_handlers.each do |handler|
+          handler.call(exception)
+        rescue => ex
+          error(+"EXCEPTION HANDLER CRASHED:\n" << format_exception(ex))
+        end
+      end
+    end
+  end
+end
+
+Celluloid::Internals::Logger.singleton_class.prepend(EventHub::Patches::CelluloidLoggerCrash)
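Note on the patch above: the failure it describes can be reproduced without Celluloid. The sketch below is a standalone illustration, not the gem's code. `append_in_place` and `append_to_copy` are hypothetical names that mirror the unpatched and patched `crash` bodies, and the frozen input is created explicitly with `.freeze`.

```ruby
# Mirrors the unpatched body: mutates the caller's string in place.
def append_in_place(string, detail)
  string << "\n" << detail
end

# Mirrors the patched body: unary plus returns the receiver if it is
# unfrozen, otherwise an unfrozen duplicate, so the append is always safe.
def append_to_copy(string, detail)
  message = +String(string)
  message << "\n" << detail
end

frozen = "Actor crashed!".freeze

raised =
  begin
    append_in_place(frozen, "RuntimeError: boom")
    false
  rescue FrozenError
    true
  end

safe = append_to_copy(frozen, "RuntimeError: boom")
```

The frozen input is left untouched by `append_to_copy`, which is why the handlers that run after the logging step still get called in the patched version.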
@@ -61,6 +61,11 @@ module EventHub
     # pass message as string like: '{ "header": ... , "body": { .. }}'
     # and optionally exchange_name: 'your exchange name'
     def publish(args = {})
+      # capture caller-thread thread-local before the cross-actor hop;
+      # CorrelationId.current is nil inside the publisher actor's thread
+      if CorrelationId.current && !args[:correlation_id]
+        args = args.merge(correlation_id: CorrelationId.current)
+      end
       Celluloid::Actor[:actor_listener_amqp].publish(args)
     rescue => error
       EventHub.logger.error("Unexpected exeption while publish: #{error}")
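Note on the hunk above: the comment says the correlation id has to be captured on the caller's thread because it is not visible inside the actor's thread. The sketch below shows that effect with plain threads. `CorrelationIdSketch` is a hypothetical stand-in that assumes the value is kept in `Thread.current`, and a bare `Thread.new` stands in for the cross-actor hop.

```ruby
# Hypothetical stand-in for the gem's CorrelationId; assumes thread-local storage.
module CorrelationIdSketch
  def self.current
    Thread.current[:correlation_id]
  end

  def self.current=(value)
    Thread.current[:correlation_id] = value
  end
end

def publish_on_other_thread(args = {})
  # Capture on the caller's thread, before the hop, unless the caller set one.
  if CorrelationIdSketch.current && !args[:correlation_id]
    args = args.merge(correlation_id: CorrelationIdSketch.current)
  end

  worker = Thread.new do
    # Inside the new thread the thread-local is empty; only args carries the id.
    seen = CorrelationIdSketch.current
    {seen_in_thread: seen, args: args}
  end
  worker.value
end

CorrelationIdSketch.current = "abc-123"
result = publish_on_other_thread(message: "{}")
```

Merging into a copy of `args` keeps the caller's hash unchanged, and an explicitly passed `correlation_id` still wins.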
@@ -1,3 +1,3 @@
 module EventHub
-  VERSION = "1.27.2".freeze
+  VERSION = "1.28.0".freeze
 end
data/soak/README.md ADDED
@@ -0,0 +1,72 @@
+## Soak / reliability harness
+
+This folder contains a multi-process chaos test for the `eventhub-processor2`
+gem. Four programs cooperate so that any reliability gap (lost messages,
+zombie consumers, stuck publisher, ...) shows up as an orphan file in
+`soak/data/`.
+
+* `publisher.rb` - creates a unique file and a message, publishes to `example.outbound`
+* `router.rb` - listens on `example.outbound`, re-publishes to `example.inbound`
+* `receiver.rb` - listens on `example.inbound`, deletes the file with the given id
+* `crasher.rb` - randomly restarts RabbitMQ or sends SIGHUP to router/receiver
+
+```
+publisher => [example.outbound] => router => [example.inbound] => receiver
+```
+
+### Goal
+
+No matter what the crasher does, after a graceful shutdown and a drain
+period `soak/data/` should contain only `store.json` (and that should
+be `{}`).
+
+### Quick start (via Makefile)
+
+The project root has a `Makefile` that wraps everything:
+
+```bash
+make soak                  # 10-minute default soak, prints PASS/FAIL
+SOAK_MINUTES=30 make soak  # longer
+make soak-start            # start all four manually
+make soak-stop             # SIGINT them all
+make soak-clean            # stop + wipe data/
+make help                  # list targets and env knobs
+```
+
+The `soak` target runs the chaos loop, then `SIGKILL`s the publisher
+(skipping its cleanup) so any in-flight file stays on disk as an honest
+snapshot. After draining, it counts files in `data/` excluding `store.json`
+and exits 0 if empty, 1 with the first 5 orphan ids if not.
+
+### Manual start
+
+Make sure the RabbitMQ container is running (see [docker readme](../docker/README.md)):
+
+```bash
+cd soak
+bundle exec ruby receiver.rb   # in its own terminal
+bundle exec ruby router.rb     # in its own terminal
+bundle exec ruby publisher.rb  # in its own terminal
+bundle exec ruby crasher.rb    # optional, in its own terminal
+```
+
+Graceful shutdown order: stop `publisher.rb` first, let `router`/`receiver`
+drain the queues, then stop the rest. Check `soak/data/` afterwards.
+
+### Publisher knobs (env overridable)
+
+* `PAUSE_BETWEEN_WORK=F` - seconds between publishes (default `0.05` ~ 20 msg/s)
+* `PUBLISH_MAX_ATTEMPTS=N` - publish retry attempts on transient Bunny errors (default `8`)
+* `PUBLISH_RETRY_DELAY_S=F` - seconds between publish retries (default `1`)
+
+The publisher uses `wait_for_confirms` (synchronous publisher confirms) and
+retries `Bunny::NetworkFailure` / `Bunny::ChannelAlreadyClosed` / `Timeout::Error`
+to bridge Bunny's channel-recovery window after a broker restart.
+
+### Notes
+
+* The publisher's `TransactionStore.cleanup` runs on graceful shutdown and
+  deletes any pending file in `store.json`. That's why the `make soak`
+  target uses `SIGKILL` for the publisher at the end - you want to see what
+  was actually in flight at the moment the chaos phase ended.
+* Watch for huge log files in `logs/ruby/` during long runs.
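Note on the README above: the `PUBLISH_MAX_ATTEMPTS` and `PUBLISH_RETRY_DELAY_S` knobs describe a bounded retry loop around the publish call. A minimal sketch of that loop shape, using Ruby's `retry` keyword, is given below. `TransientError` and `with_retries` are hypothetical names introduced for illustration. The real publisher rescues Bunny's error classes and waits for publisher confirms, neither of which is modelled here.

```ruby
# Hypothetical error class standing in for Bunny's transient errors.
class TransientError < StandardError; end

# Runs the block up to max_attempts times, sleeping delay_s between tries.
# Returns the block's value, or nil once the attempts are used up.
def with_retries(max_attempts:, delay_s:)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue TransientError
    if attempts < max_attempts
      sleep delay_s
      retry # re-runs the begin block; attempts keeps its value
    end
    nil # give up; the caller would leave the message pending
  end
end

calls = 0
result = with_retries(max_attempts: 8, delay_s: 0) do |attempt|
  calls += 1
  raise TransientError, "channel closed" if attempt < 3
  :confirmed
end

gave_up = with_retries(max_attempts: 2, delay_s: 0) { raise TransientError }
```

With the README defaults (8 attempts, 1 second apart) such a loop spans a few seconds of broker downtime, which the README relates to Bunny's channel-recovery window.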
@@ -0,0 +1,37 @@
+#!/usr/bin/env ruby
+# Distinguish real orphans from in-flight-at-SIGKILL artifacts.
+#
+# After the publisher is SIGKILLed, three states can exist in data/:
+#   * file + UUID in store.json - publisher was mid-transaction; expected residual.
+#     A real restart would cleanup via TransactionStore.
+#   * file + UUID NOT in store - publisher confirmed delivery but receiver never
+#     deleted the file. This is a real pipeline loss.
+#   * no file + UUID in store - delivery completed despite SIGKILL; harmless.
+#
+# We treat only the middle case as a soak FAILURE.
+#
+# Usage: soak/check_orphans.rb <data_dir>
+# Output (single line): "<real_orphan_count> <in_flight_count>"
+
+require "json"
+
+data_dir = ARGV[0] || "soak/data"
+store_path = File.join(data_dir, "store.json")
+in_flight = begin
+  JSON.parse(File.read(store_path))
+rescue
+  {}
+end.keys
+
+files = Dir.glob(File.join(data_dir, "*.json"))
+  .map { |f| File.basename(f, ".json") }
+  .reject { |id| id == "store" }
+
+real_orphans = files - in_flight
+
+puts "#{real_orphans.size} #{in_flight.size}"
+
+if ARGV.include?("--list-orphans") && !real_orphans.empty?
+  warn "first 5 real orphan ids:"
+  real_orphans.first(5).each { |id| warn "  #{id}" }
+end
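Note on the script above: its header comment lists three states, and each one corresponds to a plain array operation on the two id lists. The sketch below spells that out on in-memory data. `classify` and its result keys are hypothetical names, and the script itself only computes the first of the three.

```ruby
# files:     ids that still have a data file on disk
# in_flight: ids still recorded in store.json
def classify(files, in_flight)
  {
    real_orphans: files - in_flight,       # file, not in store: pipeline loss
    in_flight_residual: files & in_flight, # file and in store: expected after SIGKILL
    completed: in_flight - files           # in store, no file: harmless
  }
end

result = classify(%w[a b c], %w[b d])
```

Only `real_orphans` decides pass or fail according to the script's comment. The other two sets are useful mainly for diagnostics.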
@@ -9,10 +9,16 @@ require_relative "../lib/eventhub/sleeper"
 SIGNALS_FOR_TERMINATION = [:INT, :TERM, :QUIT]
 SIGNALS_FOR_RELOAD_CONFIG = [:HUP]
 ALL_SIGNALS = SIGNALS_FOR_TERMINATION + SIGNALS_FOR_RELOAD_CONFIG
-PAUSE_BETWEEN_WORK = 0.05 # default is 0.05
+
+# Tunables (override via env). Defaults are sized to bridge a typical
+# Bunny channel-recovery window (~network_recovery_interval = 5s) plus
+# reconnect, so a broker restart doesn't leak in-flight publishes.
+PAUSE_BETWEEN_WORK = Float(ENV.fetch("PAUSE_BETWEEN_WORK", "0.05"))
+PUBLISH_MAX_ATTEMPTS = Integer(ENV.fetch("PUBLISH_MAX_ATTEMPTS", "8"))
+PUBLISH_RETRY_DELAY_S = Float(ENV.fetch("PUBLISH_RETRY_DELAY_S", "1"))

 Celluloid.logger = nil
-Celluloid.exception_handler { |ex| Publisher.logger.error "Exception occured: #{ex}}" }
+Celluloid.exception_handler { |ex| Publisher.logger.error "Exception occured: #{ex.class}: #{ex.message}" }
 Celluloid.boot

 # Publisher module
@@ -110,14 +116,24 @@ module Publisher
         sleep PAUSE_BETWEEN_WORK
       end
     ensure
-      @connection&.close
+      begin
+        @connection&.close
+      rescue => ex
+        Publisher.logger.warn("Worker connection close: ignoring #{ex.class}: #{ex.message}")
+      end
     end

     private

     def connect
       @connection = Bunny.new(vhost: "event_hub",
-        automatic_recovery: false,
+        # match the gem's recovery defaults so Bunny re-opens the channel
+        # and re-declares the exchange after a broker restart, without
+        # crashing the Worker actor.
+        automatically_recover: true,
+        network_recovery_interval: 5,
+        recovery_attempts: nil,
+        continuation_timeout: 15_000,
         logger: Logger.new(File::NULL))
       @connection.start
       @channel = @connection.create_channel
@@ -131,14 +147,35 @@ module Publisher
       file_name = "data/#{id}.json"
       data = {body: {id: id}}.to_json

-      # start transaction...
+      # start transaction (durable on disk via store.json)
       Celluloid::Actor[:transaction_store].start(id)
       File.write(file_name, data)
       Publisher.logger.info("[#{id}] - Message/File created")

-      @exchange.publish(data, persistent: true)
+      # Bridge Bunny's channel-recovery window: when the broker is bouncing,
+      # the channel needs ~network_recovery_interval seconds to reopen and
+      # publishes during that window raise ChannelAlreadyClosed. Retry a few
+      # times with backoff so the message survives a broker hiccup. After
+      # PUBLISH_MAX_ATTEMPTS, leave it pending on disk for an external sweep.
+      attempts = 0
+      begin
+        attempts += 1
+        @exchange.publish(data, persistent: true)
+        unless @channel.wait_for_confirms
+          Publisher.logger.warn("[#{id}] - broker nacked; leaving pending")
+          return
+        end
+      rescue Bunny::NetworkFailure, Bunny::ChannelAlreadyClosed, Timeout::Error => ex
+        if attempts < PUBLISH_MAX_ATTEMPTS
+          sleep PUBLISH_RETRY_DELAY_S
+          retry
+        end
+        Publisher.logger.warn("[#{id}] - publish failed after #{attempts} attempts (#{ex.class}: #{ex.message}); leaving pending")
+        return
+      end
+
       Celluloid::Actor[:transaction_store]&.stop(id)
-      Publisher&.logger&.info("[#{id}] - Message sent")
+      Publisher.logger.info("[#{id}] - Message sent")
     end
   end

metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: eventhub-processor2
 version: !ruby/object:Gem::Version
-  version: 1.27.2
+  version: 1.28.0
 platform: ruby
 authors:
 - Steiner, Thomas
@@ -179,6 +179,7 @@ files:
 - CHANGELOG.md
 - Gemfile
 - LICENSE.txt
+- Makefile
 - README.md
 - Rakefile
 - bin/console
@@ -190,16 +191,9 @@ files:
 - docker/rabbitmq.config
 - docker/reset
 - eventhub-processor2.gemspec
-- example/CHANGELOG.md
 - example/README.md
 - example/config/example.json
-- example/config/receiver.json
-- example/config/router.json
-- example/crasher.rb
 - example/example.rb
-- example/publisher.rb
-- example/receiver.rb
-- example/router.rb
 - lib/eventhub/actor_heartbeat.rb
 - lib/eventhub/actor_listener_amqp.rb
 - lib/eventhub/actor_listener_http.rb
@@ -220,11 +214,21 @@ files:
 - lib/eventhub/helper.rb
 - lib/eventhub/logger.rb
 - lib/eventhub/message.rb
+- lib/eventhub/patches/celluloid_logger.rb
 - lib/eventhub/processor2.rb
 - lib/eventhub/sleeper.rb
 - lib/eventhub/statistics.rb
 - lib/eventhub/templates/layout.erb
 - lib/eventhub/version.rb
+- soak/CHANGELOG.md
+- soak/README.md
+- soak/check_orphans.rb
+- soak/config/receiver.json
+- soak/config/router.json
+- soak/crasher.rb
+- soak/publisher.rb
+- soak/receiver.rb
+- soak/router.rb
 homepage: https://github.com/thomis/eventhub-processor2
 licenses:
 - MIT
@@ -243,7 +247,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 4.0.6
+rubygems_version: 4.0.10
 specification_version: 4
 summary: Next generation gem to build ruby based eventhub processor
 test_files: []
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes