waterdrop 2.10.0 → 2.10.2
- checksums.yaml +4 -4
- data/.ruby-version +1 -1
- data/CHANGELOG.md +23 -0
- data/Gemfile +1 -1
- data/Gemfile.lock +15 -15
- data/docker-compose.oauth.yml +1 -1
- data/docker-compose.sasl.yml +1 -1
- data/docker-compose.yml +1 -1
- data/lib/waterdrop/connection_pool.rb +19 -7
- data/lib/waterdrop/errors.rb +1 -2
- data/lib/waterdrop/instrumentation/callbacks/delivery.rb +14 -0
- data/lib/waterdrop/instrumentation/callbacks/error.rb +1 -2
- data/lib/waterdrop/instrumentation/logger_listener.rb +2 -2
- data/lib/waterdrop/instrumentation/vendors/datadog/metrics_listener.rb +1 -2
- data/lib/waterdrop/polling/config.rb +1 -2
- data/lib/waterdrop/polling/latch.rb +1 -2
- data/lib/waterdrop/polling/poller.rb +38 -18
- data/lib/waterdrop/polling/queue_pipe.rb +1 -2
- data/lib/waterdrop/polling/state.rb +2 -4
- data/lib/waterdrop/producer/async.rb +2 -2
- data/lib/waterdrop/producer/buffer.rb +34 -2
- data/lib/waterdrop/producer/idempotence.rb +9 -0
- data/lib/waterdrop/producer/status.rb +21 -1
- data/lib/waterdrop/producer/sync.rb +10 -4
- data/lib/waterdrop/producer/tombstone.rb +6 -2
- data/lib/waterdrop/producer/variant.rb +18 -4
- data/lib/waterdrop/producer.rb +119 -23
- data/lib/waterdrop/version.rb +1 -1
- data/package-lock.json +6 -6
- data/renovate.json +14 -3
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1329d22d7f4f960b2df24949a040dd1e1b57ec73002ed779f3bcd0c5ad4dad20
+  data.tar.gz: b2b9868379d00dd3951df5a2e3b45bdce1b24295284fd5cb6cca410709a38a05
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 817244d3151a463cca84a1a7fefd298c5e355337325817430cdef5f5eb1c09013d20f801f9ef95b5e9182539f8cb550f25df62042bf93bc7826e9f50ee4b1caf
+  data.tar.gz: c60cc27e7b3787a9610229dacdcbe0440fa7332c5659f5e61348409b4ca2c507718a5bbd93bf5e088f8cc7a5d8b91da981eac0f377257ec0c4394e20dcd92994
data/.ruby-version
CHANGED
@@ -1 +1 @@
-4.0.
+4.0.5
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,28 @@
 # WaterDrop changelog

+## 2.10.2 (2026-06-15)
+- [Feature] Expose `Producer#current_variant` as a public method. It returns the variant active for the current dispatch on the current fiber - the custom variant while inside a `#with`/`#variant`-wrapped call, otherwise the producer's default variant - so middleware and instrumentation listeners running synchronously within a dispatch can read the effective per-dispatch settings (`topic_config`, `max_wait_timeout`, `default?`). The lookup is fiber-local and dispatch-scoped: outside a variant-wrapped call (or from an asynchronous delivery callback) it returns the default variant.
+- [Enhancement] Stop allocating one interpolated string per message in `LoggerListener` batch produce handlers. The quoted topic strings were only ever counted (quoting is a 1:1 mapping), never displayed, so counting the raw topic values yields the identical number with zero string allocations - relevant for large `produce_many_*` batches with the default logger listener attached.
+- [Enhancement] Use `Array#concat` in `Producer#buffer_many` instead of appending messages one by one.
+- [Enhancement] Skip building the `message.acknowledged` instrumentation payload in the delivery callback when nothing is subscribed to that event. The notifications bus already short-circuits on empty listeners, but only after the payload hash was allocated - once per delivered message on the polling thread. Mirrors the listener guard already used by the statistics callback. Late subscribers keep working as the check happens on each emission.
+- [Enhancement] Resolve the fiber-local variant once per `#produce` call and once per `#produce_many_sync` wait phase instead of re-resolving it for every usage and for every waited delivery handle. For a 1,000-message sync batch this removes ~2,000 redundant fiber-local lookups.
+- [Enhancement] Do not allocate the fiber-local variants hash on the `Producer#current_variant` read path. Previously every fiber that produced messages got a Hash pinned to it for the fiber's lifetime (per producer use), even when variants were never used - wasteful under fiber-per-request servers (Falcon, async). The hash is now only created by variant wrapper methods that actually need to write to it.
+- [Enhancement] Cache the variant validation contract in a constant instead of instantiating a new `Contracts::Variant` on every `Producer#with` / `Producer#variant` call (mirrors the existing `Transactions::CONTRACT` pattern).
+- [Enhancement] Cache the tombstone validation contract in a constant instead of instantiating a new `Contracts::Tombstone` per tombstone message, removing per-message allocations in the `tombstone_*` APIs (mirrors the existing `Transactions::CONTRACT` pattern).
+- [Enhancement] Replace explicit `Warning[:performance]` opt-in with a dynamic approach using `Warning.categories` (available since Ruby 3.4) to automatically enable all stable opt-in warning categories in the test suite, including `:strict_unused_block` introduced in Ruby 4.0.
+- [Fix] Prevent a deadlock between a transactional single-message dispatch and `#close`. A single `produce_sync`/`produce_async` on a transactional producer incremented the operations counter (which `#close` drains while holding `@transaction_mutex`) before acquiring `@transaction_mutex` for its per-message transaction - an inverted lock order. A dispatch that had counted itself but not yet taken `@transaction_mutex` could deadlock a concurrent `#close` permanently (the close wait loop has no timeout). Transactional dispatches now take `@transaction_mutex` before the operation is counted, matching `#close`'s lock order (`@transaction_mutex` -> `@operating_mutex` -> operations counter).
+- [Fix] Prevent a deadlock (`ThreadError: deadlock; recursive locking`) when closing an idempotent producer (with `reload_on_idempotent_fatal_error` enabled) that has buffered messages whose final flush surfaces a fatal librdkafka error. `#close` performs the final flush while already holding `@operating_mutex`, and the idempotent fatal-error reload tried to re-acquire that same mutex, leaving the producer stuck in `:closing` with the native client leaked. The idempotent reload is now skipped on the closing path, and the final buffer flush is best-effort so client teardown always completes.
+- [Fix] Make concurrent idempotent fatal-error reload thread-safe. When several threads shared an idempotent producer (with `reload_on_idempotent_fatal_error` enabled), a single fatal librdkafka condition failed all their in-flight produces at once and each entered the reload path; the second reload ran `reload!` after the first had already reset `@client` to `nil`, raising `NoMethodError`. The idempotent reload now bails out if another thread already reloaded (mirroring the transactional path's `return if @status.configured?` guard). Additionally, `Status#active?` now classifies the lifecycle from a single atomic read and `Producer#ensure_active!` branches on one snapshot, so a concurrent `configured -> connected` transition during a reload can no longer make `ensure_active!` raise `StatusInvalidError` for a valid, active producer.
+- [Fix] Stop `#flush_async` / `#flush_sync` from silently dropping valid buffered messages when the dispatch fails. `#flush` removes the batch from the internal buffer before dispatching it, and a failure (a single invalid message failing validation before anything is sent, or a mid-batch inline error such as queue full) previously discarded the entire taken batch - the removed messages were never restored. A failed flush now re-buffers the messages that never reached librdkafka (the whole batch on validation failure or on a transactional rollback, the unsent remainder otherwise) so they can be retried instead of being lost.
+- [Fix] Make `Producer#close` fork-safe so the GC finalizer inherited by a forked child can no longer close the parent's client. `#client` registers an `ObjectSpace` finalizer that calls `#close`; that finalizer is inherited across `fork`, and a child that inherited a used producer, never touched it, and exited normally would run `#close` in the child - flushing and closing (with the real rdkafka client, `rd_kafka_destroy` on a fork-inherited handle, i.e. undefined behavior) a client owned by the parent. `#close` now detects when it runs in a process other than the one that built the client, drops the inherited references and finalizer, and returns without touching the native client (matching the existing fork guard on the `#client` path).
+- [Fix] Guard the internal buffer appends in `Producer#buffer` and `Producer#buffer_many` with `@buffer_mutex`. The appends mutated the shared `@messages` buffer without the lock that `flush`/`purge`/`close` hold while swapping it for a fresh array, so a concurrent swap landing between reading `@messages` and appending could drop the message into an orphaned array that is never dispatched - silently losing buffered messages in the documented "buffer in one thread, flush in another" pattern.
+- [Fix] Stop a nested same-producer variant call from clobbering the outer variant inside a variant `transaction` block. `transaction` is the only variant-wrapped method that yields user code, so a variant call nested inside it (another `variant.produce_*`, or a raw producer dispatch in the same scope) used to delete the shared `Fiber.current.waterdrop_clients` entry on return, making the rest of the block silently fall back to the default variant and dispatch with default `topic_config` (timeouts, compression, partitioner) instead of the altered one. The wrapper now saves and restores the previous entry instead of unconditionally deleting it (still deleting when there was none, so the fiber-local hash does not accumulate stale keys).
+- [Fix] Stop `ConnectionPool#shutdown` and `#reload` from silently dropping in-flight messages. Both closed every pooled producer with `close!` (force), which flushes for the max wait timeout and then purges whatever has not drained - so on a slow or unreachable broker, queued `produce_async` messages were cancelled and lost with no delivery report. They now close producers gracefully by default (`#reload` always; `#shutdown` unless called with the new `force: true`), letting messages flush instead of being purged. Pass `pool.shutdown(force: true)` to keep the old force-and-purge behavior.
+- [Fix] Close a race in the FD poller where a producer registered while the last one was being torn down could be left permanently unpolled (sync produces hang until timeout, async deliveries are never acknowledged). The poller thread decided to exit (last producer unregistered) and cleared its thread reference in two separate, unsynchronized steps, so a `register` landing in that gap saw the still-alive exiting thread, skipped starting a fresh one, and then had its producer's state closed by the exiting thread's cleanup. The thread now decides to stop and clears its reference in a single mutex section, so a racing `register` either keeps it running or starts a fresh thread; and the exit cleanup runs only on an abnormal exit, since a normal exit always leaves an empty registry and so can never close a producer registered in the gap.
+
+## 2.10.1 (2026-05-25)
+- [Fix] Prevent `Producer#close` from raising `ThreadError: can't be called from trap context` when invoked from a Ruby signal trap context (e.g. Puma's `after_stopped` DSL hook in single mode). `close` now detects this case and delegates to a background thread, joining it so the caller blocks until the producer is fully closed (#866).
+
 ## 2.10.0 (2026-05-07)
 - [Fix] Clean up native rdkafka client, global instrumentation callbacks, and poller registration when `init_transactions` fails during producer client construction. Previously, each failed attempt permanently leaked native threads, pipe file descriptors, and callback registry entries because the started `rd_kafka_t` handle was abandoned without being destroyed.
 - **[Breaking]** Skip emitting librdkafka statistics when nothing is subscribed to `statistics.emitted` at the time the underlying rdkafka client is constructed. When no listener is present at build time, `statistics.interval.ms` is forced to `0` regardless of user configuration and the statistics callback is not registered, saving substantial allocations in the hot path (no JSON parsing, no statistics hash materialization, no decoration work). To use statistics, subscribe a listener to `statistics.emitted` BEFORE the first producer use (before the underlying client is lazily initialized).
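The nested-variant fix in the 2.10.2 entries comes down to a save-and-restore of a fiber-local entry, with a read path that never allocates the hash. The sketch below illustrates that idea only. `VariantScope`, its `KEY`, and the method names are hypothetical, and it uses `Thread.current[]` (which is fiber-local in Ruby) rather than the `Fiber.current.waterdrop_clients` accessor the changelog mentions.

```ruby
# Hypothetical stand-in for the variant bookkeeping; all names here are illustrative.
module VariantScope
  KEY = :variant_scope_sketch

  # Runs the block with `variant` active for `producer_id` on the current fiber, then
  # restores whatever was active before (or removes the key when nothing was).
  def self.with_variant(producer_id, variant)
    store = (Thread.current[KEY] ||= {}) # Thread#[] storage is fiber-local
    had_previous = store.key?(producer_id)
    previous = store[producer_id]
    store[producer_id] = variant
    yield
  ensure
    if had_previous
      store[producer_id] = previous # nested call: hand the outer variant back
    else
      store.delete(producer_id)     # outermost call: leave no stale key behind
    end
  end

  # Read path: never creates the hash, falls back to the default variant.
  def self.current(producer_id, default)
    store = Thread.current[KEY]
    store ? store.fetch(producer_id, default) : default
  end
end

seen = []
VariantScope.with_variant(:producer_a, :outer) do
  VariantScope.with_variant(:producer_a, :inner) do
    seen << VariantScope.current(:producer_a, :default)
  end
  # With an unconditional delete this would already report :default.
  seen << VariantScope.current(:producer_a, :default)
end
seen << VariantScope.current(:producer_a, :default)
```

Deleting the key when there was no previous entry keeps the per-fiber hash from accumulating entries, which is the property the changelog calls out for long-lived fibers.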
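The transactional dispatch versus `#close` deadlock fix relies on both paths taking locks in one order (`@transaction_mutex`, then `@operating_mutex`, then the operations counter). A minimal sketch of that ordering, assuming a simplified producer: `LockOrderSketch`, `@counter_mutex`, and the `sent` counter are hypothetical and do not mirror WaterDrop's internals.

```ruby
require "timeout"

# Hypothetical model of the lock order; not the gem's implementation.
class LockOrderSketch
  attr_reader :sent

  def initialize
    @transaction_mutex = Mutex.new
    @operating_mutex = Mutex.new
    @counter_mutex = Mutex.new
    @operations = 0
    @sent = 0
    @closed = false
  end

  # Transactional dispatch: take @transaction_mutex FIRST, only then count the operation.
  def produce
    @transaction_mutex.synchronize do
      return false if @closed

      count(1)
      begin
        @sent += 1
      ensure
        count(-1)
      end
      true
    end
  end

  # Close follows the same order: @transaction_mutex -> @operating_mutex -> counter.
  def close
    @transaction_mutex.synchronize do
      @operating_mutex.synchronize do
        # A counted dispatch always holds @transaction_mutex, so this never waits forever.
        sleep(0.001) until in_flight.zero?
        @closed = true
      end
    end
  end

  private

  def count(delta)
    @counter_mutex.synchronize { @operations += delta }
  end

  def in_flight
    @counter_mutex.synchronize { @operations }
  end
end

producer = LockOrderSketch.new
Timeout.timeout(5) do
  workers = Array.new(4) { Thread.new { 100.times { producer.produce } } }
  closer = Thread.new { producer.close }
  (workers + [closer]).each(&:join)
end
late = producer.produce # dispatch after close is rejected
```

With the inverted order described in the changelog, a dispatch could increment the counter and then block on `@transaction_mutex` while `close` held it and waited for the counter to reach zero.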
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    waterdrop (2.10.
+    waterdrop (2.10.2)
       karafka-core (>= 2.5.12, < 3.0.0)
       karafka-rdkafka (>= 0.24.0)
       zeitwerk (~> 2.3)
@@ -16,11 +16,11 @@ GEM
     drb (2.2.3)
     ffi (1.17.4)
     io-console (0.8.2)
-    json (2.19.
-    karafka-core (2.5.
+    json (2.19.7)
+    karafka-core (2.5.13)
       karafka-rdkafka (>= 0.20.0)
       logger (>= 1.6.0)
-    karafka-rdkafka (0.
+    karafka-rdkafka (0.27.2)
       ffi (~> 1.17.1)
       json (> 2.0)
       logger
@@ -35,7 +35,7 @@ GEM
       ruby2_keywords (>= 0.0.5)
     ostruct (0.6.3)
     prism (1.9.0)
-    rake (13.
+    rake (13.4.2)
     reline (0.6.3)
       io-console (~> 0.5)
     ruby2_keywords (0.0.5)
@@ -45,8 +45,8 @@ GEM
       simplecov_json_formatter (~> 0.1)
     simplecov-html (0.13.2)
     simplecov_json_formatter (0.1.4)
-    warning (1.
-    zeitwerk (2.
+    warning (1.6.0)
+    zeitwerk (2.8.2)

 PLATFORMS
   ruby
@@ -60,7 +60,7 @@ DEPENDENCIES
   simplecov
   warning
   waterdrop!
-  zeitwerk (~> 2.
+  zeitwerk (~> 2.8.0)

 CHECKSUMS
   byebug (13.0.0) sha256=d2263efe751941ca520fa29744b71972d39cbc41839496706f5d9b22e92ae05d
@@ -69,24 +69,24 @@ CHECKSUMS
   drb (2.2.3) sha256=0b00d6fdb50995fe4a45dea13663493c841112e4068656854646f418fda13373
   ffi (1.17.4) sha256=bcd1642e06f0d16fc9e09ac6d49c3a7298b9789bcb58127302f934e437d60acf
   io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc
-  json (2.19.
-  karafka-core (2.5.
-  karafka-rdkafka (0.
+  json (2.19.7) sha256=fe432c8639f6efff69f9d73b518a3705d9581ab93156f981ea72806e1e5bcc3e
+  karafka-core (2.5.13) sha256=0acec083043bb6166c4b647a7458091cc7b08066d3b92a026932925ec7e07f61
+  karafka-rdkafka (0.27.2) sha256=3ccce96306642be70bff8168e4e737fc10f2ffae20bc0ff0a43d88dbb7452d31
   logger (1.7.0) sha256=196edec7cc44b66cfb40f9755ce11b392f21f7967696af15d274dde7edff0203
   mini_portile2 (2.8.9) sha256=0cd7c7f824e010c072e33f68bc02d85a00aeb6fce05bb4819c03dfd3c140c289
   minitest (6.0.6) sha256=153ea36d1d987a62942382b61075745042a2b3123b1cd48f4c3675af9cc7d6f1
   mocha (3.1.0) sha256=75f42d69ebfb1f10b32489dff8f8431d37a418120ecdfc07afe3bc183d4e1d56
   ostruct (0.6.3) sha256=95a2ed4a4bd1d190784e666b47b2d3f078e4a9efda2fccf18f84ddc6538ed912
   prism (1.9.0) sha256=7b530c6a9f92c24300014919c9dcbc055bf4cdf51ec30aed099b06cd6674ef85
-  rake (13.
+  rake (13.4.2) sha256=cb825b2bd5f1f8e91ca37bddb4b9aaf345551b4731da62949be002fa89283701
   reline (0.6.3) sha256=1198b04973565b36ec0f11542ab3f5cfeeec34823f4e54cebde90968092b1835
   ruby2_keywords (0.0.5) sha256=ffd13740c573b7301cf7a2e61fc857b2a8e3d3aff32545d6f8300d8bae10e3ef
   simplecov (0.22.0) sha256=fe2622c7834ff23b98066bb0a854284b2729a569ac659f82621fc22ef36213a5
   simplecov-html (0.13.2) sha256=bd0b8e54e7c2d7685927e8d6286466359b6f16b18cb0df47b508e8d73c777246
   simplecov_json_formatter (0.1.4) sha256=529418fbe8de1713ac2b2d612aa3daa56d316975d307244399fa4838c601b428
-  warning (1.
-  waterdrop (2.10.
-  zeitwerk (2.
+  warning (1.6.0) sha256=a49cdfae19fb77d19afff2efbe45f8ab759e9cd25b4e4ce2c79dbaf46bdb6c9e
+  waterdrop (2.10.2)
+  zeitwerk (2.8.2) sha256=7212a61311083c604184b1ea2574b9aa05cd14f855a0841c06985cabe9181d12

 BUNDLED WITH
    4.0.6
data/docker-compose.oauth.yml
CHANGED
data/docker-compose.sasl.yml
CHANGED
data/docker-compose.yml
CHANGED
data/lib/waterdrop/connection_pool.rb
CHANGED
@@ -113,11 +113,14 @@ module WaterDrop
 end

 # Shutdown the global connection pool
-
+#
+# @param force [Boolean] when true, force-close each producer, purging unflushed messages.
+# Defaults to false (graceful close) so in-flight messages are not silently dropped.
+def shutdown(force: false)
   return unless @default_pool

   pool = @default_pool
-  @default_pool.shutdown
+  @default_pool.shutdown(force: force)
   @default_pool = nil

   # Emit global event for pool shutdown
@@ -237,9 +240,16 @@ module WaterDrop
 end

 # Shutdown the connection pool
-
+#
+# @param force [Boolean] when true, force-close each producer, purging any messages that do not
+# flush within the producer's max wait timeout. Defaults to false: producers are closed
+# gracefully so in-flight messages are flushed instead of being silently dropped when the
+# broker is slow or unreachable.
+def shutdown(force: false)
   @pool.shutdown do |producer|
-
+    next unless producer&.status&.active?
+
+    force ? producer.close! : producer.close
   end

   # Emit event after pool is shut down
@@ -254,11 +264,13 @@ module WaterDrop
 # for API consistency across both individual producers and connection pools
 alias_method :close, :shutdown

-# Reload all connections in the pool
-#
+# Reload all connections in the pool. Useful for configuration changes or error recovery
+#
+# @note Producers are always closed gracefully (never force-closed): a reload must not drop
+# in-flight messages, so it waits for them to flush rather than purging the queue.
 def reload
   @pool.reload do |producer|
-    producer.close
+    producer.close if producer&.status&.active?
   end

   # Emit event after pool is reloaded
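The behavioral difference between the graceful default and `force: true` in the connection pool hunks can be sketched with stand-in producers. `FakeProducer` and `shutdown_pool` are hypothetical helpers, and the sketch collapses the `producer&.status&.active?` check into a single `active?` call.

```ruby
# Hypothetical stand-in for a pooled producer: `close` lets queued messages drain,
# `close!` gives up and purges whatever is still queued.
class FakeProducer
  attr_reader :delivered, :purged

  def initialize(queued)
    @queue = queued.dup
    @delivered = []
    @purged = []
    @active = true
  end

  def active?
    @active
  end

  def close
    @delivered.concat(@queue.shift(@queue.size))
    @active = false
  end

  def close!
    @purged.concat(@queue.shift(@queue.size))
    @active = false
  end
end

# Mirrors the shape of the new shutdown block: skip missing or inactive producers,
# force-close only when explicitly asked to.
def shutdown_pool(producers, force: false)
  producers.each do |producer|
    next unless producer&.active?

    force ? producer.close! : producer.close
  end
end

graceful = FakeProducer.new(%w[a b])
forced = FakeProducer.new(%w[c d])

shutdown_pool([graceful, nil])        # default: graceful, nil slots are skipped
shutdown_pool([forced], force: true)  # opt-in: old force-and-purge behavior
```

Per the changelog, the equivalent opt-in on a real pool is `pool.shutdown(force: true)`, while `#reload` always takes the graceful path.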
data/lib/waterdrop/errors.rb
CHANGED
@@ -80,7 +80,6 @@ module WaterDrop
     end
   end

-  # Alias so we can have a nicer API to abort transactions
-  # This makes referencing easier
+  # Alias so we can have a nicer API to abort transactions. This makes referencing easier
   AbortTransaction = Errors::AbortTransaction
 end
data/lib/waterdrop/instrumentation/callbacks/delivery.rb
CHANGED
@@ -60,7 +60,14 @@ module WaterDrop
 private

 # @param delivery_report [Rdkafka::Producer::DeliveryReport] delivery report
+# @note This is the most frequently fired event in the system (once per delivered
+# message) and most users do not subscribe to it. While the notifications bus
+# short-circuits on empty listeners, that happens only after the payload hash is
+# built, so we guard here to keep the no-listeners path allocation-free. We check on
+# each emission to support late subscribers.
 def instrument_acknowledged(delivery_report)
+  return unless listening?
+
   @monitor.instrument(
     "message.acknowledged",
     caller: self,
@@ -111,6 +118,13 @@ module WaterDrop
       def build_error(delivery_report)
         ::Rdkafka::RdkafkaError.new(delivery_report.error)
       end
+
+      # Check if anyone is listening to the acknowledgement events
+      # @return [Boolean] true if there are any listeners
+      def listening?
+        listeners = @monitor.listeners["message.acknowledged"]
+        listeners && !listeners.empty?
+      end
     end
   end
 end
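The `listening?` guard added to the delivery callback can be illustrated with a tiny notifications bus. This is a sketch under stated assumptions: `TinyMonitor` and `DeliveryCallbackSketch` are hypothetical, and only the `listeners[...]` lookup and the early return follow the diff.

```ruby
# Hypothetical minimal bus exposing a `listeners` hash keyed by event name.
class TinyMonitor
  attr_reader :listeners

  def initialize
    @listeners = {}
  end

  def subscribe(event, &block)
    (@listeners[event] ||= []) << block
  end

  def instrument(event, payload)
    @listeners.fetch(event, []).each { |listener| listener.call(payload) }
  end
end

class DeliveryCallbackSketch
  attr_reader :payloads_built

  def initialize(monitor)
    @monitor = monitor
    @payloads_built = 0
  end

  def call(offset)
    # Guard before the payload hash exists, so the no-listener path allocates nothing.
    return unless listening?

    @payloads_built += 1
    @monitor.instrument("message.acknowledged", { offset: offset })
  end

  private

  # Checked on every emission so late subscribers are picked up.
  def listening?
    listeners = @monitor.listeners["message.acknowledged"]
    listeners && !listeners.empty?
  end
end

monitor = TinyMonitor.new
callback = DeliveryCallbackSketch.new(monitor)

callback.call(1) # nobody subscribed yet: skipped entirely

acked = []
monitor.subscribe("message.acknowledged") { |payload| acked << payload[:offset] }
callback.call(2) # late subscriber is honored
```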
data/lib/waterdrop/instrumentation/callbacks/error.rb
CHANGED
@@ -21,8 +21,7 @@ module WaterDrop
 # @note When there is a particular message produce error (not internal error), the error
 # is shipped via the delivery callback, not via error callback.
 def call(client_name, error)
-  # Emit only errors related to our client
-  # Same as with statistics (mor explanation there)
+  # Emit only errors related to our client, same as with statistics (mor explanation there)
   return unless @client_name == client_name

   @monitor.instrument(
data/lib/waterdrop/instrumentation/logger_listener.rb
CHANGED
@@ -47,7 +47,7 @@ module WaterDrop
 # @param event [Dry::Events::Event] event that happened with the details
 def on_messages_produced_async(event)
   messages = event[:messages]
-  topics_count = messages.map { |message|
+  topics_count = messages.map { |message| message[:topic] }.uniq.count

   info(
     event,
@@ -62,7 +62,7 @@ module WaterDrop
 # @param event [Dry::Events::Event] event that happened with the details
 def on_messages_produced_sync(event)
   messages = event[:messages]
-  topics_count = messages.map { |message|
+  topics_count = messages.map { |message| message[:topic] }.uniq.count

   info(event, "Sync producing of #{messages.size} messages to #{topics_count} topics")

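Why counting raw topic values gives the same number as counting quoted strings can be checked directly. In the sketch below the "before" expression is an assumption reconstructed from the changelog wording (the removed line is truncated in this diff), and the sample messages are made up.

```ruby
messages = [
  { topic: "events", payload: "a" },
  { topic: "events", payload: "b" },
  { topic: "audit", payload: "c" }
]

# Assumed shape of the old code: one interpolated string per message, built only to be counted.
quoted_count = messages.map { |message| "'#{message[:topic]}'" }.uniq.count

# New code from the diff: quoting is a 1:1 mapping, so the raw values yield the same count.
topics_count = messages.map { |message| message[:topic] }.uniq.count

summary = "Sync producing of #{messages.size} messages to #{topics_count} topics"
```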
data/lib/waterdrop/instrumentation/vendors/datadog/metrics_listener.rb
CHANGED
@@ -218,8 +218,7 @@ module WaterDrop
 when :brokers
   statistics.fetch("brokers").each_value do |broker_statistics|
     # Skip bootstrap nodes
-    # Bootstrap nodes have nodeid -1, other nodes have positive
-    # node ids
+    # Bootstrap nodes have nodeid -1, other nodes have positive node ids
     next if broker_statistics["nodeid"] == -1

     public_send(
data/lib/waterdrop/polling/config.rb
CHANGED
@@ -16,8 +16,7 @@ module WaterDrop
 extend ::Karafka::Core::Configurable

 # Ruby thread priority for the poller thread
-# Valid range: -3 to 3 (Ruby's thread priority range)
-# Higher values = higher priority
+# Valid range: -3 to 3 (Ruby's thread priority range). Higher values = higher priority
 setting :thread_priority, default: 0

 # IO.select timeout in milliseconds
data/lib/waterdrop/polling/latch.rb
CHANGED
@@ -33,8 +33,7 @@ module WaterDrop
   end
 end

-# Waits until the latch is released
-# Returns immediately if already released
+# Waits until the latch is released. Returns immediately if already released
 def wait
   @mutex.synchronize do
     @cv.wait(@mutex) until @released
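The `wait` method in this hunk is the classic mutex plus condition-variable latch. A self-contained sketch of the pattern follows; `LatchSketch`, `release!`, and `released?` are hypothetical names, and only the body of `wait` follows the diff.

```ruby
class LatchSketch
  def initialize
    @mutex = Mutex.new
    @cv = ConditionVariable.new
    @released = false
  end

  # Flips the flag and wakes every waiter.
  def release!
    @mutex.synchronize do
      @released = true
      @cv.broadcast
    end
  end

  # Waits until the latch is released. Returns immediately if already released.
  def wait
    @mutex.synchronize do
      # `until` re-checks the flag after every wakeup, which covers spurious wakeups.
      @cv.wait(@mutex) until @released
    end
  end

  def released?
    @mutex.synchronize { @released }
  end
end

latch = LatchSketch.new
order = Queue.new

waiter = Thread.new do
  latch.wait
  order << :woken
end

order << :releasing
latch.release!
waiter.join
latch.wait # already released: returns without blocking
```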
data/lib/waterdrop/polling/poller.rb
CHANGED
@@ -186,8 +186,7 @@ module WaterDrop
   @ios_dirty = true
 end

-# Ensures the polling thread is running
-# Must be called within @mutex.synchronize
+# Ensures the polling thread is running. Must be called within @mutex.synchronize
 def ensure_thread_running!
   return if @thread&.alive?

@@ -200,9 +199,29 @@ module WaterDrop
 # Main polling loop that runs in a dedicated thread
 def polling_loop
   backoff_ms = 0
+  clean_exit = false

   loop do
-
+    # Decide whether to stop AND clear @thread in a single critical section. This is what
+    # closes the register/shutdown race: a concurrent `register` is serialized by @mutex, so
+    # it either runs before this block (we observe its producer plus `@shutdown = false` and
+    # keep polling) or after it (it finds `@thread` already nil and starts a fresh thread).
+    # Previously the exit decision and the `@thread = nil` teardown were separate and
+    # unsynchronized, so a producer registered in that gap was treated as already served by
+    # this exiting thread and then closed by its cleanup - left registered but never polled.
+    stop = @mutex.synchronize do
+      if @shutdown || @producers.empty?
+        @thread = nil
+        true
+      else
+        false
+      end
+    end
+
+    if stop
+      clean_exit = true
+      break
+    end

     # Apply backoff from previous error
     if backoff_ms > 0
@@ -213,9 +232,9 @@ module WaterDrop
   # Collect readable IOs (queue FDs)
   readable_ios, io_to_state = collect_readable_ios

-  #
-  #
-
+  # A producer may have registered right after the stop check above; if the cached snapshot
+  # is momentarily empty, loop to rebuild it instead of selecting on an empty set.
+  next if readable_ios.empty?

   poll_with_select(readable_ios, io_to_state)
 rescue => e
@@ -229,13 +248,12 @@ module WaterDrop
     end
   end
 ensure
-  #
-  #
-
-
-  #
-
-  close_all_states
+  # A normal exit already cleared @thread above with an empty registry, so there is nothing to
+  # release - and skipping cleanup here is what keeps a producer registered in the exit gap
+  # from being closed: its fresh thread owns it now. Only an abnormal exit (an exception
+  # escaped the loop) can leave producers registered with callers blocked in `unregister`;
+  # release those so they don't hang.
+  close_all_states unless clean_exit
 end

 # Broadcasts an error to all registered producers' monitors
@@ -379,13 +397,15 @@ module WaterDrop
   state.close
 end

-#
-#
-#
+# Releases any producer states still registered when the poller thread exits ABNORMALLY (an
+# exception escaped the loop), so callers blocked in `unregister` waiting on their latch are
+# not left hanging. A normal exit clears the registry through the loop and never calls this,
+# which is why no thread-ownership check is needed here.
 def close_all_states
   states = @mutex.synchronize do
-
-    @producers.
+    @thread = nil
+    to_close = @producers.values
+    @producers = {}
     @ios_dirty = true
     to_close
   end
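The core of the poller fix is that "should I stop?" and `@thread = nil` happen inside one critical section, so a racing `register` either keeps the thread alive or starts a new one. The sketch below models only that decision. `PollerSketch`, its poll counter, and the 1 ms sleep standing in for `IO.select` are hypothetical simplifications.

```ruby
require "timeout"

# Hypothetical model of the stop decision; not the gem's poller.
class PollerSketch
  def initialize
    @mutex = Mutex.new
    @producers = {}
    @thread = nil
    @polls = Hash.new(0)
  end

  def register(id)
    @mutex.synchronize do
      @producers[id] = true
      ensure_thread_running!
    end
  end

  def unregister(id)
    @mutex.synchronize { @producers.delete(id) }
  end

  def polls(id)
    @mutex.synchronize { @polls[id] }
  end

  private

  # Must be called within @mutex.synchronize
  def ensure_thread_running!
    return if @thread&.alive?

    @thread = Thread.new { polling_loop }
  end

  def polling_loop
    loop do
      # Decide to stop AND clear @thread atomically: a concurrent register either ran
      # before this block (we see its producer and keep going) or runs after it
      # (it sees @thread == nil and starts a fresh thread).
      stop = @mutex.synchronize do
        if @producers.empty?
          @thread = nil
          true
        else
          @producers.each_key { |id| @polls[id] += 1 }
          false
        end
      end

      break if stop

      sleep(0.001) # stand-in for the IO.select wait
    end
  end
end

poller = PollerSketch.new
Timeout.timeout(10) do
  50.times do |id|
    poller.register(id)
    sleep(0.001) until poller.polls(id) > 0 # would hang if a registration were orphaned
    poller.unregister(id)
  end
end
```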
data/lib/waterdrop/polling/queue_pipe.rb
CHANGED
@@ -25,8 +25,7 @@ module WaterDrop
   client.enable_queue_io_events(@writer.fileno)
 end

-# Signals by writing a byte to the pipe. Used to wake IO.select for continue/close signals
+# Signals by writing a byte to the pipe. Used to wake IO.select for continue/close signals
 # Thread-safe and non-blocking; silently ignores errors
 def signal
   @writer.write_nonblock("W", exception: false)
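The `signal` method is one half of the self-pipe pattern: a byte written to a pipe wakes a thread blocked in `IO.select`. A minimal stdlib-only sketch follows. The `drain` lambda is an assumption about what draining looks like (the gem's `drain` body is not part of this diff), and the timings are arbitrary.

```ruby
reader, writer = IO.pipe

# Signal: write one byte without blocking. With `exception: false` a full pipe yields
# :wait_writable instead of raising.
signal = -> { writer.write_nonblock("W", exception: false) }

# Drain (assumed implementation): read until the pipe has nothing left.
drain = lambda do
  drained = 0
  loop do
    chunk = reader.read_nonblock(1024, exception: false)
    break if chunk == :wait_readable || chunk.nil?

    drained += chunk.bytesize
  end
  drained
end

waker = Thread.new do
  sleep(0.05)
  signal.call
end

# Blocks until the waker writes its byte (or the 5 second safety timeout expires).
ready, = IO.select([reader], nil, nil, 5)
waker.join

woken = !ready.nil?
drained_bytes = drain.call
idle = IO.select([reader], nil, nil, 0) # nothing pending after the drain

reader.close
writer.close
```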
data/lib/waterdrop/polling/state.rb
CHANGED
@@ -53,8 +53,7 @@ module WaterDrop
   @io = @queue_pipe.reader
 end

-# Drains the queue pipe
-# Called before polling to clear any pending signals
+# Drains the queue pipe. Called before polling to clear any pending signals
 def drain
   @queue_pipe.drain
 end
@@ -88,8 +87,7 @@ module WaterDrop

 private_constant :STALE_CHECK_THROTTLE_MS

-# Marks this producer as having been polled
-# Called after polling to track staleness
+# Marks this producer as having been polled. Called after polling to track staleness
 def mark_polled!
   @last_poll_time = monotonic_now
 end
data/lib/waterdrop/producer/async.rb
CHANGED
@@ -21,7 +21,7 @@ module WaterDrop
     "message.produced_async",
     producer_id: id,
     message: message
-  ) { produce(message) }
+  ) { produce(message, "produce_async") }
 rescue *SUPPORTED_FLOW_ERRORS => e
   # We use this syntax here because we want to preserve the original `#cause` when we
   # instrument the error and there is no way to manually assign `#cause` value
@@ -62,7 +62,7 @@ module WaterDrop
 ) do
   with_transaction_if_transactional do
     messages.each do |message|
-      dispatched << produce(message)
+      dispatched << produce(message, "produce_many_async")
     end
   end

data/lib/waterdrop/producer/buffer.rb
CHANGED
@@ -12,12 +12,15 @@ module WaterDrop
 def buffer(message)
   ensure_active!

+  # The append runs under @buffer_mutex because flush/purge/close swap @messages for a fresh
+  # array under the same lock. Without it, a concurrent swap between reading @messages and
+  # appending would land the message in the orphaned old array and silently lose it.
   @monitor.instrument(
     "message.buffered",
     producer_id: id,
     message: message,
     buffer: @messages
-  ) { @messages << message }
+  ) { @buffer_mutex.synchronize { @messages << message } }
 end

 # Adds given messages into the internal producer buffer without flushing them to Kafka
@@ -29,13 +32,16 @@ module WaterDrop
 def buffer_many(messages)
   ensure_active!

+  # The concat runs under @buffer_mutex for the same reason as #buffer: flush/purge/close swap
+  # @messages under the lock, so an unguarded concat could append into an array that has just
+  # been captured for dispatch (or discarded), silently losing the messages.
   @monitor.instrument(
     "messages.buffered",
     producer_id: id,
     messages: messages,
     buffer: @messages
   ) do
-
+    @buffer_mutex.synchronize { @messages.concat(messages) }
     messages
   end
 end
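The reason the appends need `@buffer_mutex` is that the flusher swaps `@messages` for a fresh array under that lock. The sketch below shows the guarded append and the swap side by side and checks that no message is lost under concurrent use. `BufferSketch` and `take_all` are hypothetical names, and the thread counts are arbitrary.

```ruby
class BufferSketch
  def initialize
    @buffer_mutex = Mutex.new
    @messages = []
  end

  # Append under the same lock the swap uses, so the target array cannot change mid-append.
  def buffer(message)
    @buffer_mutex.synchronize { @messages << message }
  end

  def buffer_many(messages)
    @buffer_mutex.synchronize { @messages.concat(messages) }
    messages
  end

  # What a flush does first: swap the buffer for a fresh array and hand back the old one.
  def take_all
    @buffer_mutex.synchronize do
      taken = @messages
      @messages = []
      taken
    end
  end
end

buffer = BufferSketch.new
flushed = []

writers = Array.new(4) do |writer_id|
  Thread.new { 500.times { |i| buffer.buffer("w#{writer_id}-#{i}") } }
end

flusher = Thread.new do
  200.times do
    flushed.concat(buffer.take_all)
    Thread.pass
  end
end

(writers + [flusher]).each(&:join)
flushed.concat(buffer.take_all) # collect whatever arrived after the last swap
```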
@@ -83,6 +89,32 @@ module WaterDrop
       return data_for_dispatch if data_for_dispatch.empty?

       sync ? produce_many_sync(data_for_dispatch) : produce_many_async(data_for_dispatch)
+    rescue Errors::ProduceManyError => e
+      # A dispatch failed partway through the batch. Re-buffer the messages that never reached
+      # librdkafka so a partial failure does not silently drop valid buffered messages. For a
+      # transactional producer the whole batch is rolled back (nothing is visible to consumers),
+      # so all of it is restored; for a regular producer `e.dispatched` holds the handles already
+      # created, so only the remainder is restored.
+      requeue_unflushed(transactional? ? data_for_dispatch : data_for_dispatch.drop(e.dispatched.size))
+
+      raise
+    rescue Errors::MessageInvalidError
+      # Validation runs before anything is dispatched, so nothing reached librdkafka. Restore the
+      # whole batch instead of dropping valid messages alongside the invalid one.
+      requeue_unflushed(data_for_dispatch)
+
+      raise
+    end
+
+    # Puts not-yet-dispatched messages back at the front of the buffer (preserving their original
+    # order relative to each other and to anything buffered concurrently), so a failed flush does
+    # not lose them.
+    #
+    # @param messages [Array<Hash>] messages to restore to the buffer
+    def requeue_unflushed(messages)
+      return if messages.empty?
+
+      @buffer_mutex.synchronize { @messages.unshift(*messages) }
     end
   end
 end
|
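The re-buffering logic added to `#flush` above can be sketched in isolation. This is an illustration under stated assumptions rather than the library's code: `DispatchFailed` stands in for `Errors::ProduceManyError`, `RequeueingBuffer` is a hypothetical class, and the dispatcher is passed in as a block instead of calling `produce_many_sync` / `produce_many_async`.

```ruby
# Hypothetical error carrying the handles created before the failure.
class DispatchFailed < StandardError
  attr_reader :dispatched

  def initialize(dispatched)
    super("dispatch failed partway through the batch")
    @dispatched = dispatched
  end
end

# Hypothetical buffer that restores undispatched messages when a flush fails.
class RequeueingBuffer
  attr_reader :messages

  def initialize(transactional: false)
    @messages = []
    @buffer_mutex = Mutex.new
    @transactional = transactional
  end

  def buffer(message)
    @buffer_mutex.synchronize { @messages << message }
  end

  # Captures the buffer, hands it to the dispatcher block and, if the block raises,
  # puts back whatever was not dispatched before re-raising.
  def flush
    batch = @buffer_mutex.synchronize do
      captured = @messages
      @messages = []
      captured
    end
    return batch if batch.empty?

    yield(batch)
  rescue DispatchFailed => e
    # Transactional: the whole batch was rolled back. Otherwise only the remainder is restored.
    requeue(@transactional ? batch : batch.drop(e.dispatched.size))
    raise
  end

  private

  def requeue(messages)
    return if messages.empty?

    # Restored messages go to the front, ahead of anything buffered in the meantime.
    @buffer_mutex.synchronize { @messages.unshift(*messages) }
  end
end

buffer = RequeueingBuffer.new
%w[a b c d].each { |payload| buffer.buffer(payload) }

begin
  buffer.flush do |batch|
    # Pretend the first two messages were handed over before the third one failed.
    raise DispatchFailed.new(batch.first(2))
  end
rescue DispatchFailed
  buffer.buffer("e") # buffered after the failure, so it stays behind the restored ones
end
```

With a non-transactional buffer only `c` and `d` are restored. With `transactional: true` the same failure restores all four messages.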
@@ -57,6 +57,15 @@ module WaterDrop
|
|
|
57
57
|
# @note After reload, the producer will automatically retry the failed operation
|
|
58
58
|
def idempotent_reload_client_on_fatal_error(attempt, error)
|
|
59
59
|
@operating_mutex.synchronize do
|
|
60
|
+
# When several threads share an idempotent producer, one fatal librdkafka condition fails
|
|
61
|
+
# all their in-flight produces at once and each enters this method. The mutex serializes
|
|
62
|
+
# them, but a thread that waited here may arrive after another has already reloaded -
|
|
63
|
+
# resetting @client to nil and moving the producer to the configured state. Running
|
|
64
|
+
# reload! again would call methods on a nil @client and raise NoMethodError, so we bail
|
|
65
|
+
# out and let #produce retry against the freshly reloaded client. This mirrors the
|
|
66
|
+
# `return if @status.configured?` guard on the transactional reload path.
|
|
67
|
+
next if @client.nil? || @status.configured?
|
|
68
|
+
|
|
60
69
|
# Emit producer.reload event before reload
|
|
61
70
|
# Users can subscribe to this event and modify event[:caller].config.kafka to change
|
|
62
71
|
# producer config
|
|
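The guard added to `idempotent_reload_client_on_fatal_error` above is an instance of re-checking state after acquiring a lock: several threads fail together, but only the first one through the mutex should reload. The sketch below shows the general idea with a generation counter, which is a swapped-in technique; the diff itself checks `@client.nil? || @status.configured?`. `ReloadableClient` and its methods are hypothetical.

```ruby
# Hypothetical sketch of "re-check after acquiring the lock"; not WaterDrop's implementation.
class ReloadableClient
  attr_reader :reloads

  def initialize
    @operating_mutex = Mutex.new
    @generation = 0
    @reloads = 0
  end

  # Generation of the client a caller was using when its operation failed.
  def generation
    @operating_mutex.synchronize { @generation }
  end

  # Reloads only if nobody has reloaded since the caller observed `failed_generation`.
  def reload_after_failure(failed_generation)
    @operating_mutex.synchronize do
      # A thread that waited on the mutex may arrive after another one already reloaded.
      next false unless failed_generation == @generation

      @generation += 1
      @reloads += 1
      true
    end
  end
end

client = ReloadableClient.new
seen = client.generation
# Eight threads all observed the same failure and race to reload.
results = 8.times.map { Thread.new { client.reload_after_failure(seen) } }.map(&:value)
```

Exactly one thread performs the reload; the others bail out and can retry against the fresh client.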
@@ -17,6 +17,16 @@ module WaterDrop
|
|
|
17
17
|
|
|
18
18
|
private_constant :LIFECYCLE
|
|
19
19
|
|
|
20
|
+
# States in which the producer is considered active and able to accept work. Kept as a single
|
|
21
|
+
# set so the current state can be classified in one atomic read (see `#active?` / `#to_sym`)
|
|
22
|
+
# rather than via a chain of predicate calls that could straddle a concurrent transition.
|
|
23
|
+
ACTIVE_STATES = %i[
|
|
24
|
+
connected
|
|
25
|
+
configured
|
|
26
|
+
disconnecting
|
|
27
|
+
disconnected
|
|
28
|
+
].freeze
|
|
29
|
+
|
|
20
30
|
# Creates a new instance of status with the initial state
|
|
21
31
|
# @return [Status]
|
|
22
32
|
def initialize
|
|
@@ -29,7 +39,10 @@ module WaterDrop
|
|
|
29
39
|
# established or disconnected, meaning it was working but user disconnected for his own
|
|
30
40
|
# reasons though sending could reconnect and continue.
|
|
31
41
|
def active?
|
|
32
|
-
|
|
42
|
+
# Single read of @current so a concurrent transition cannot make this return false for a
|
|
43
|
+
# status that is in fact active (for example flipping configured -> connected mid-check
|
|
44
|
+
# while another thread reloads the client after a fatal error).
|
|
45
|
+
ACTIVE_STATES.include?(@current)
|
|
33
46
|
end
|
|
34
47
|
|
|
35
48
|
# @return [String] current status as a string
|
|
@@ -37,6 +50,13 @@ module WaterDrop
|
|
|
37
50
|
@current.to_s
|
|
38
51
|
end
|
|
39
52
|
|
|
53
|
+
# @return [Symbol] current lifecycle state captured as a single atomic read. Lets callers
|
|
54
|
+
# branch on one consistent value instead of issuing several predicate calls that could
|
|
55
|
+
# observe different states if the producer is transitioning on another thread.
|
|
56
|
+
def to_sym
|
|
57
|
+
@current
|
|
58
|
+
end
|
|
59
|
+
|
|
40
60
|
LIFECYCLE.each do |state|
|
|
41
61
|
# @example
|
|
42
62
|
# def initial?
|
|
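The `ACTIVE_STATES` and `#to_sym` additions above let a caller classify the lifecycle state from a single read instead of a chain of predicate calls. A minimal sketch of that approach follows. `MiniStatus` and `describe` are hypothetical names; only the state names are taken from the diff.

```ruby
# Hypothetical miniature of a status object; state names are taken from the diff above.
class MiniStatus
  ACTIVE_STATES = %i[connected configured disconnecting disconnected].freeze

  def initialize
    @current = :initial
  end

  def transition_to(state)
    @current = state
  end

  # One read of @current, classified against a frozen set.
  def active?
    ACTIVE_STATES.include?(@current)
  end

  # Lets a caller capture the state once and branch on that single value.
  def to_sym
    @current
  end
end

def describe(status)
  state = status.to_sym # captured once; later transitions do not affect this call

  return :active if MiniStatus::ACTIVE_STATES.include?(state)
  return :not_configured if state == :initial

  :closed_or_closing
end
```

Branching on the captured `state` means every comparison in `describe` sees the same value, even if another thread transitions the status in between.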
@@ -24,7 +24,7 @@ module WaterDrop
|
|
|
24
24
|
producer_id: id,
|
|
25
25
|
message: message
|
|
26
26
|
) do
|
|
27
|
-
wait(produce(message))
|
|
27
|
+
wait(produce(message, "produce_sync"))
|
|
28
28
|
end
|
|
29
29
|
rescue *SUPPORTED_FLOW_ERRORS => e
|
|
30
30
|
# We use this syntax here because we want to preserve the original `#cause` when we
|
|
@@ -84,21 +84,27 @@ module WaterDrop
|
|
|
84
84
|
begin
|
|
85
85
|
with_transaction_if_transactional do
|
|
86
86
|
messages.each do |message|
|
|
87
|
-
dispatched << produce(message)
|
|
87
|
+
dispatched << produce(message, "produce_many_sync")
|
|
88
88
|
end
|
|
89
89
|
end
|
|
90
90
|
rescue *SUPPORTED_FLOW_ERRORS => e
|
|
91
91
|
inline_error = e
|
|
92
92
|
end
|
|
93
93
|
|
|
94
|
+
# Resolve the variant timeout once instead of re-resolving the fiber-local variant for
|
|
95
|
+
# every single handler we wait on
|
|
96
|
+
max_wait_timeout = current_variant.max_wait_timeout
|
|
97
|
+
|
|
94
98
|
# This will ensure, that we have all verdicts before raising the failure, so we pass
|
|
95
99
|
# all delivery handles having a final verdict
|
|
96
|
-
dispatched.each
|
|
100
|
+
dispatched.each do |handler|
|
|
101
|
+
wait(handler, max_wait_timeout: max_wait_timeout, raise_response_error: false)
|
|
102
|
+
end
|
|
97
103
|
|
|
98
104
|
raise(inline_error) if inline_error
|
|
99
105
|
|
|
100
106
|
# This will raise an error on the first error that have happened
|
|
101
|
-
dispatched.each { |handler| wait(handler) }
|
|
107
|
+
dispatched.each { |handler| wait(handler, max_wait_timeout: max_wait_timeout) }
|
|
102
108
|
|
|
103
109
|
dispatched
|
|
104
110
|
end
|
|
@@ -8,6 +8,11 @@ module WaterDrop
|
|
|
8
8
|
# in compacted topics. This module provides a dedicated API so users don't have to manually
|
|
9
9
|
# construct `produce_*(topic:, key:, payload: nil, ...)` calls.
|
|
10
10
|
module Tombstone
|
|
11
|
+
# Contract to validate that tombstone message input is correct
|
|
12
|
+
CONTRACT = Contracts::Tombstone.new
|
|
13
|
+
|
|
14
|
+
private_constant :CONTRACT
|
|
15
|
+
|
|
11
16
|
# Produces a tombstone message to Kafka and waits for it to be delivered
|
|
12
17
|
#
|
|
13
18
|
# @param message [Hash] hash with at least `:topic`, `:key`, and `:partition` keys.
|
|
@@ -66,10 +71,9 @@ module WaterDrop
|
|
|
66
71
|
# @raise [Errors::MessageInvalidError] when key or partition is missing
|
|
67
72
|
def prepare_tombstone(message)
|
|
68
73
|
message = message.dup
|
|
69
|
-
message.delete(:payload)
|
|
70
74
|
message[:payload] = nil
|
|
71
75
|
|
|
72
|
-
|
|
76
|
+
CONTRACT.validate!(message, Errors::MessageInvalidError)
|
|
73
77
|
|
|
74
78
|
message
|
|
75
79
|
end
|
|
@@ -34,7 +34,10 @@ module WaterDrop
|
|
|
34
34
|
# When rdkafka-ruby detects empty hash, it will use the librdkafka defaults
|
|
35
35
|
EMPTY_HASH = {}.freeze
|
|
36
36
|
|
|
37
|
-
|
|
37
|
+
# Contract to validate that variant alteration data is correct
|
|
38
|
+
CONTRACT = Contracts::Variant.new
|
|
39
|
+
|
|
40
|
+
private_constant :EMPTY_HASH, :CONTRACT
|
|
38
41
|
|
|
39
42
|
attr_reader :max_wait_timeout, :topic_config, :producer
|
|
40
43
|
|
|
@@ -56,7 +59,7 @@ module WaterDrop
|
|
|
56
59
|
@default = default
|
|
57
60
|
super(producer)
|
|
58
61
|
|
|
59
|
-
|
|
62
|
+
CONTRACT.validate!(to_h, Errors::VariantInvalidError)
|
|
60
63
|
end
|
|
61
64
|
|
|
62
65
|
# @return [Boolean] is this a default variant for this producer
|
|
@@ -75,23 +78,34 @@ module WaterDrop
|
|
|
75
78
|
Transactions
|
|
76
79
|
].each do |scope|
|
|
77
80
|
scope.instance_methods(false).each do |method_name|
|
|
81
|
+
# We save and restore any variant already active for this producer in this fiber rather
|
|
82
|
+
# than unconditionally deleting it. A variant-wrapped method that yields user code (e.g.
|
|
83
|
+
# `transaction`) may wrap a nested same-producer variant call; without save/restore the
|
|
84
|
+
# inner call's `ensure` would clear the slot the outer scope still needs, so the rest of
|
|
85
|
+
# the outer scope would silently fall back to the default variant. When there was no outer
|
|
86
|
+
# entry we still `delete` (not nil-assign) to avoid leaving stale entries behind.
|
|
87
|
+
#
|
|
78
88
|
# @example
|
|
79
89
|
# def produce_async(*args, &block)
|
|
80
90
|
# ref = Fiber.current.waterdrop_clients ||= {}
|
|
91
|
+
# had = ref.key?(@producer.id)
|
|
92
|
+
# prev = ref[@producer.id]
|
|
81
93
|
# ref[@producer.id] = self
|
|
82
94
|
#
|
|
83
95
|
# @producer.produce_async(*args, &block)
|
|
84
96
|
# ensure
|
|
85
|
-
# ref.delete(@producer.id)
|
|
97
|
+
# had ? (ref[@producer.id] = prev) : ref.delete(@producer.id)
|
|
86
98
|
# end
|
|
87
99
|
class_eval <<-RUBY, __FILE__, __LINE__ + 1
|
|
88
100
|
def #{method_name}(*args, &block)
|
|
89
101
|
ref = Fiber.current.waterdrop_clients ||= {}
|
|
102
|
+
had = ref.key?(@producer.id)
|
|
103
|
+
prev = ref[@producer.id]
|
|
90
104
|
ref[@producer.id] = self
|
|
91
105
|
|
|
92
106
|
@producer.#{method_name}(*args, &block)
|
|
93
107
|
ensure
|
|
94
|
-
ref.delete(@producer.id)
|
|
108
|
+
had ? (ref[@producer.id] = prev) : ref.delete(@producer.id)
|
|
95
109
|
end
|
|
96
110
|
RUBY
|
|
97
111
|
end
|
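The save/restore change in the variant wrapper above matters when a wrapped call yields to user code that makes a nested wrapped call for the same producer. The sketch below reproduces the pattern with `Thread#[]` (which is fiber-local in Ruby) as the slot store; the diff itself uses a `waterdrop_clients` accessor on `Fiber`. `ScopedSetting`, `activate` and the `:scoped_settings` key are hypothetical.

```ruby
# Hypothetical sketch of save/restore for a fiber-local slot; not WaterDrop's Variant class.
class ScopedSetting
  attr_reader :value

  def initialize(owner_id, value)
    @owner_id = owner_id
    @value = value
  end

  # Makes this setting current for the duration of the block, then restores the previous one.
  def activate
    ref = Thread.current[:scoped_settings] ||= {}
    had = ref.key?(@owner_id)
    prev = ref[@owner_id]
    ref[@owner_id] = self

    yield
  ensure
    # Restore the outer entry if there was one; otherwise delete so no stale entry remains.
    had ? (ref[@owner_id] = prev) : ref.delete(@owner_id)
  end

  # Read-only lookup: does not allocate the slot hash when it does not exist yet.
  def self.current(owner_id)
    slots = Thread.current[:scoped_settings]
    slots && slots[owner_id]
  end
end

outer = ScopedSetting.new("producer-1", :outer)
inner = ScopedSetting.new("producer-1", :inner)
observed = []

outer.activate do
  inner.activate { observed << ScopedSetting.current("producer-1").value }
  # With an unconditional delete in `ensure`, this lookup would find nothing.
  observed << ScopedSetting.current("producer-1").value
end
observed << ScopedSetting.current("producer-1")
```

After the inner block finishes, the outer scope still sees its own setting, and nothing is left behind once the outer block returns.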
data/lib/waterdrop/producer.rb
CHANGED
|
@@ -152,8 +152,7 @@ module WaterDrop
|
|
|
152
152
|
|
|
153
153
|
# We should raise an error when trying to use a producer with client from a fork. Always.
|
|
154
154
|
if @client
|
|
155
|
-
# We need to reset the client, otherwise there might be attempt to close the parent
|
|
156
|
-
# client
|
|
155
|
+
# We need to reset the client, otherwise there might be attempt to close the parent client
|
|
157
156
|
@client = nil
|
|
158
157
|
raise Errors::ProducerUsedInParentProcess, Process.pid
|
|
159
158
|
end
|
|
@@ -264,6 +263,29 @@ module WaterDrop
|
|
|
264
263
|
@middleware ||= config.middleware
|
|
265
264
|
end
|
|
266
265
|
|
|
266
|
+
# Returns the variant currently in effect for dispatches on the current fiber.
|
|
267
|
+
#
|
|
268
|
+
# While executing inside a variant-wrapped call (any method invoked on the object returned by
|
|
269
|
+
# {#with} / {#variant}), this returns that variant; otherwise it returns the producer's default
|
|
270
|
+
# variant. It is primarily useful to middleware and instrumentation listeners that run
|
|
271
|
+
# synchronously within a dispatch and want to read the effective per-dispatch settings, such as
|
|
272
|
+
# `#topic_config`, `#max_wait_timeout` or `#default?`.
|
|
273
|
+
#
|
|
274
|
+
# @return [WaterDrop::Producer::Variant] the variant active for the current dispatch on this
|
|
275
|
+
# fiber, or the producer's default variant when not inside a variant-wrapped call
|
|
276
|
+
#
|
|
277
|
+
# @note The lookup is fiber-local and scoped to a single dispatch; it does not represent a
|
|
278
|
+
# producer-wide setting. Called from arbitrary code outside a variant-wrapped call it always
|
|
279
|
+
# returns the default variant. It is likewise not meaningful from asynchronous delivery
|
|
280
|
+
# callbacks (which run on the poller thread, a different fiber) - there it also returns the
|
|
281
|
+
# default variant, not the variant the acknowledged message was dispatched with.
|
|
282
|
+
def current_variant
|
|
283
|
+
# Read-only: the fiber-local hash is created by the variant wrapper methods only when needed,
|
|
284
|
+
# so we must not allocate it here just to look up a variant that may not exist.
|
|
285
|
+
clients = Fiber.current.waterdrop_clients
|
|
286
|
+
(clients && clients[id]) || @default_variant
|
|
287
|
+
end
|
|
288
|
+
|
|
267
289
|
# Disconnects the producer from Kafka while keeping it configured for potential reconnection
|
|
268
290
|
#
|
|
269
291
|
# This method safely disconnects the underlying Kafka client while preserving the producer's
|
|
@@ -339,6 +361,19 @@ module WaterDrop
|
|
|
339
361
|
# @param force [Boolean] should we force closing even with outstanding messages after the
|
|
340
362
|
# max wait timeout
|
|
341
363
|
def close(force: false)
|
|
364
|
+
# If the client was built in a different process, we have been forked. The client and its
|
|
365
|
+
# native resources belong to the parent, so we must never flush or close them here: with the
|
|
366
|
+
# real rdkafka client that is rd_kafka_destroy on a fork-inherited handle (undefined behavior),
|
|
367
|
+
# and it would also tear down a client the parent still uses. We just drop our references and
|
|
368
|
+
# the inherited finalizer and return. This matters most for the GC finalizer, which is
|
|
369
|
+
# inherited across fork and would otherwise run #close in the child at exit.
|
|
370
|
+
if @client && @pid != Process.pid
|
|
371
|
+
@client = nil
|
|
372
|
+
ObjectSpace.undefine_finalizer(id)
|
|
373
|
+
|
|
374
|
+
return
|
|
375
|
+
end
|
|
376
|
+
|
|
342
377
|
# When closing from within the FD poller thread (e.g., from a callback like
|
|
343
378
|
# message.acknowledged or error.occurred), we must delegate to a background thread.
|
|
344
379
|
# Close performs flush which waits for delivery reports, but delivery reports require
|
|
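The fork guard added at the top of `#close` above compares the pid recorded when the client was built with the current pid. A minimal sketch of that check follows. `FakeClient` and `OwnedResource` are hypothetical, and the `owner_pid:` parameter exists only so the inherited case can be simulated without actually forking.

```ruby
# Hypothetical stand-in for the native client.
class FakeClient
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Hypothetical owner of a client that must only be closed by the process that built it.
class OwnedResource
  def initialize(client, owner_pid: Process.pid)
    @client = client
    @pid = owner_pid # pid of the process that built the client
  end

  # Returns :skipped when the client belongs to another process, :closed otherwise.
  def close
    if @client && @pid != Process.pid
      @client = nil # drop the reference, never touch the inherited handle
      return :skipped
    end

    @client&.close
    @client = nil
    :closed
  end
end

own_client = FakeClient.new
own_result = OwnedResource.new(own_client).close

# Simulates a resource inherited across fork by recording a different owner pid.
inherited_client = FakeClient.new
inherited_result = OwnedResource.new(inherited_client, owner_pid: Process.pid + 1).close
```

The inherited client is never closed; only the local reference is dropped.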
@@ -382,7 +417,18 @@ module WaterDrop
|
|
|
382
417
|
|
|
383
418
|
# Flush has its own buffer mutex but even if it is blocked, flushing can still happen
|
|
384
419
|
# as we close the client after the flushing (even if blocked by the mutex)
|
|
385
|
-
|
|
420
|
+
#
|
|
421
|
+
# This is best-effort: if a buffered message surfaces a terminal error here (for example
|
|
422
|
+
# a fatal error on an idempotent producer), we must still proceed to close the underlying
|
|
423
|
+
# client. Otherwise the native client and its resources would leak and the producer would
|
|
424
|
+
# stay stuck in the `:closing` state. The failure is already surfaced via the
|
|
425
|
+
# `error.occurred` instrumentation emitted by the dispatch itself, so swallowing the
|
|
426
|
+
# re-raised wrapper here does not hide it.
|
|
427
|
+
begin
|
|
428
|
+
flush(true)
|
|
429
|
+
rescue Errors::ProduceError
|
|
430
|
+
nil
|
|
431
|
+
end
|
|
386
432
|
|
|
387
433
|
# We should not close the client in several threads the same time
|
|
388
434
|
# It is safe to run it several times but not exactly the same moment
|
|
@@ -423,6 +469,20 @@ module WaterDrop
|
|
|
423
469
|
end
|
|
424
470
|
end
|
|
425
471
|
end
|
|
472
|
+
rescue ThreadError => e
|
|
473
|
+
# Ruby raises ThreadError with this specific message when Mutex#synchronize (or #lock) is
|
|
474
|
+
# called from a signal trap context. There is no public Ruby API to detect trap context
|
|
475
|
+
# proactively - Thread.current is the same object as the main thread, its status is "run",
|
|
476
|
+
# and caller_locations contains no "trap" frame. The only observable difference is that
|
|
477
|
+
# blocking mutex operations raise this error. We re-raise anything else (e.g.
|
|
478
|
+
# "deadlock; recursive locking") so those are not silently swallowed.
|
|
479
|
+
#
|
|
480
|
+
# Puma's `after_stopped` DSL hook in single mode is one example that fires in trap context.
|
|
481
|
+
# We escape by delegating to a background thread and joining so the caller blocks until the
|
|
482
|
+
# producer is fully closed.
|
|
483
|
+
raise unless e.message == "can't be called from trap context"
|
|
484
|
+
|
|
485
|
+
Thread.new { close(force: force) }.value
|
|
426
486
|
end
|
|
427
487
|
|
|
428
488
|
# Closes the producer with forced close after timeout, purging any outgoing data
|
|
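The `rescue ThreadError` branch above relies on Ruby refusing blocking mutex operations inside a signal handler and on a fresh thread not being subject to that restriction. The sketch below exercises that behaviour on a POSIX system using `SIGUSR1`. `TrapSafe` and its `close` method are hypothetical, and the signal choice is an assumption for illustration.

```ruby
# Hypothetical helper; assumes a POSIX platform where SIGUSR1 is available.
module TrapSafe
  LOCK = Mutex.new
  TRAP_MESSAGE = "can't be called from trap context"

  def self.close(log)
    LOCK.synchronize { log << :closed }
  rescue ThreadError => e
    # Anything other than the trap-context error (e.g. recursive locking) is re-raised.
    raise unless e.message == TRAP_MESSAGE

    log << :delegated
    # A fresh thread is not in trap context, so it may take the mutex; `value` joins it.
    Thread.new { close(log) }.value
  end
end

log = []
previous = Signal.trap("USR1") { TrapSafe.close(log) }
Process.kill("USR1", Process.pid)

# Wait (with a deadline) until the handler has run.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 5
sleep 0.01 until log.include?(:closed) || Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline

Signal.trap("USR1", previous || "DEFAULT")
```

The handler first hits the `ThreadError`, records the delegation, and the background thread then completes the guarded section.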
@@ -484,15 +544,21 @@ module WaterDrop
|
|
|
484
544
|
# Ensures that we don't run any operations when the producer is not configured or when it
|
|
485
545
|
# was already closed
|
|
486
546
|
def ensure_active!
|
|
487
|
-
|
|
488
|
-
|
|
547
|
+
# Capture the lifecycle state once. Another thread may be transitioning the producer between
|
|
548
|
+
# states (for example configured -> connected while reloading the client after a fatal error),
|
|
549
|
+
# and issuing several @status predicate calls here could otherwise observe an inconsistent mix
|
|
550
|
+
# of states and raise StatusInvalidError for what is in fact a valid, active producer.
|
|
551
|
+
state = @status.to_sym
|
|
489
552
|
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
|
|
553
|
+
return if Status::ACTIVE_STATES.include?(state)
|
|
554
|
+
return if state == :closing && @operating_mutex.owned?
|
|
555
|
+
|
|
556
|
+
raise Errors::ProducerNotConfiguredError, id if state == :initial
|
|
557
|
+
raise Errors::ProducerClosedError, id if state == :closing
|
|
558
|
+
raise Errors::ProducerClosedError, id if state == :closed
|
|
493
559
|
|
|
494
560
|
# This should never happen
|
|
495
|
-
raise Errors::StatusInvalidError, [id,
|
|
561
|
+
raise Errors::StatusInvalidError, [id, state.to_s]
|
|
496
562
|
end
|
|
497
563
|
|
|
498
564
|
# Ensures that the message we want to send out to Kafka is actually valid and that it can be
|
|
@@ -506,26 +572,48 @@ module WaterDrop
|
|
|
506
572
|
# Waits on a given handler
|
|
507
573
|
#
|
|
508
574
|
# @param handler [Rdkafka::Producer::DeliveryHandle]
|
|
575
|
+
# @param max_wait_timeout [Integer] max wait timeout in ms. Resolved from the current variant
|
|
576
|
+
# by default but can be passed in by batch operations that wait on many handlers, so the
|
|
577
|
+
# variant is not re-resolved for each of them.
|
|
509
578
|
# @param raise_response_error [Boolean] should we raise the response error after we receive the
|
|
510
579
|
# final result and it is an error.
|
|
511
|
-
def wait(handler, raise_response_error: true)
|
|
580
|
+
def wait(handler, max_wait_timeout: current_variant.max_wait_timeout, raise_response_error: true)
|
|
512
581
|
handler.wait(
|
|
513
|
-
max_wait_timeout_ms:
|
|
582
|
+
max_wait_timeout_ms: max_wait_timeout,
|
|
514
583
|
raise_response_error: raise_response_error
|
|
515
584
|
)
|
|
516
585
|
end
|
|
517
586
|
|
|
518
|
-
#
|
|
519
|
-
#
|
|
520
|
-
|
|
521
|
-
|
|
522
|
-
|
|
587
|
+
# Dispatches a message, ensuring transactional producers take the transaction lock before the
|
|
588
|
+
# operation is counted.
|
|
589
|
+
#
|
|
590
|
+
# For a transactional producer we wrap the whole dispatch (including the operations-counter
|
|
591
|
+
# bookkeeping) in `transaction`, so `@transaction_mutex` is acquired BEFORE
|
|
592
|
+
# `@operations_in_progress` is incremented. This makes `#produce` acquire locks in the same order
|
|
593
|
+
# as `#close` (`@transaction_mutex` -> `@operating_mutex` -> operations counter) and removes a
|
|
594
|
+
# lock-order inversion: without it, a dispatch that had already counted itself could block forever
|
|
595
|
+
# on `@transaction_mutex` held by a concurrent `#close` that was itself waiting for the operations
|
|
596
|
+
# counter to drain. When we already own the transaction lock (inside an explicit transaction block
|
|
597
|
+
# or the closing flush) the order is already correct, so we dispatch directly.
|
|
598
|
+
#
|
|
599
|
+
# @param message [Hash] message we want to send
|
|
600
|
+
# @param label [String] short name of the public dispatch method (e.g. `"produce_sync"`) that
|
|
601
|
+
# we surface in the `message.*` queue-full error type. Passed explicitly by each public entry
|
|
602
|
+
# point so we never have to walk the call stack to recover it (the number of internal frames
|
|
603
|
+
# varies because the transactional path wraps the dispatch in a `transaction`).
|
|
604
|
+
def produce(message, label)
|
|
605
|
+
if transactional? && !@transaction_mutex.owned?
|
|
606
|
+
transaction { produce_to_client(message, label) }
|
|
607
|
+
else
|
|
608
|
+
produce_to_client(message, label)
|
|
609
|
+
end
|
|
523
610
|
end
|
|
524
611
|
|
|
525
612
|
# Runs the client produce method with a given message
|
|
526
613
|
#
|
|
527
614
|
# @param message [Hash] message we want to send
|
|
528
|
-
|
|
615
|
+
# @param label [String] public dispatch method name used in the queue-full error type
|
|
616
|
+
def produce_to_client(message, label)
|
|
529
617
|
produce_time ||= monotonic_now
|
|
530
618
|
|
|
531
619
|
# This can happen only during flushing on closing, in case like this we don't have to
|
|
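The new `#produce` above takes the transaction lock only when the current thread does not already hold it (`!@transaction_mutex.owned?`). That check matters because Ruby's `Mutex` is not reentrant: locking it again from the owning thread raises `ThreadError`. A minimal sketch of the conditional-acquire pattern follows. `LockedSection` and `run` are hypothetical names.

```ruby
# Hypothetical sketch of "acquire unless already owned"; not a WaterDrop class.
class LockedSection
  attr_reader :depth_log

  def initialize
    @mutex = Mutex.new
    @depth_log = []
  end

  # Takes the lock unless the current thread already holds it, then runs the block.
  def run(label, &block)
    if @mutex.owned?
      @depth_log << [label, :reused]
      block.call
    else
      @mutex.synchronize do
        @depth_log << [label, :acquired]
        block.call
      end
    end
  end
end

section = LockedSection.new
result = section.run(:outer) { section.run(:inner) { :done } }
```

The nested call reuses the lock the outer call already holds instead of trying to acquire it a second time.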
@@ -537,16 +625,20 @@ module WaterDrop
|
|
|
537
625
|
ensure_active!
|
|
538
626
|
end
|
|
539
627
|
|
|
628
|
+
# The variant is fiber-local and cannot change mid-call, so we resolve it once instead of
|
|
629
|
+
# paying the fiber-local lookup for each usage
|
|
630
|
+
variant = current_variant
|
|
631
|
+
|
|
540
632
|
# We basically only duplicate the message hash only if it is needed.
|
|
541
633
|
# It is needed when user is using a custom settings variant or when symbol is provided as
|
|
542
634
|
# the topic name. We should never mutate user input message as it may be a hash that the
|
|
543
635
|
# user is using for some other operations
|
|
544
|
-
if message[:topic].is_a?(Symbol) || !
|
|
636
|
+
if message[:topic].is_a?(Symbol) || !variant.default?
|
|
545
637
|
message = message.dup
|
|
546
638
|
# In case someone defines topic as a symbol, we need to convert it into a string as
|
|
547
639
|
# librdkafka does not accept symbols
|
|
548
640
|
message[:topic] = message[:topic].to_s
|
|
549
|
-
message[:topic_config] =
|
|
641
|
+
message[:topic_config] = variant.topic_config
|
|
550
642
|
end
|
|
551
643
|
|
|
552
644
|
result = if transactional?
|
|
@@ -560,8 +652,14 @@ module WaterDrop
|
|
|
560
652
|
|
|
561
653
|
result
|
|
562
654
|
rescue SUPPORTED_FLOW_ERRORS.first => e
|
|
563
|
-
# Check if this is a fatal error on an idempotent producer and we should reload
|
|
564
|
-
|
|
655
|
+
# Check if this is a fatal error on an idempotent producer and we should reload.
|
|
656
|
+
#
|
|
657
|
+
# We must never reload while closing. During `#close` the final `flush` runs while this
|
|
658
|
+
# thread already owns `@operating_mutex`; the idempotent reload re-acquires that same mutex,
|
|
659
|
+
# which Ruby rejects with `ThreadError: deadlock; recursive locking`, and it would also try to
|
|
660
|
+
# rebuild the very client we are tearing down. In that case we let the error propagate so
|
|
661
|
+
# `#close` can finish and release the underlying client.
|
|
662
|
+
if idempotent_reloadable?(e) && !@operating_mutex.owned?
|
|
565
663
|
# Check if we've exceeded max reload attempts
|
|
566
664
|
raise unless idempotent_retryable?
|
|
567
665
|
|
|
@@ -597,8 +695,6 @@ module WaterDrop
|
|
|
597
695
|
# in an infinite loop, effectively hanging the processing
|
|
598
696
|
raise unless monotonic_now - produce_time < @config.wait_timeout_on_queue_full
|
|
599
697
|
|
|
600
|
-
label = caller_locations(2, 1)[0].label.split.last.split("#").last
|
|
601
|
-
|
|
602
698
|
# We use this syntax here because we want to preserve the original `#cause` when we
|
|
603
699
|
# instrument the error and there is no way to manually assign `#cause` value. We want to keep
|
|
604
700
|
# the original cause to maintain the same API across all the errors dispatched to the
|
data/lib/waterdrop/version.rb
CHANGED
data/package-lock.json
CHANGED
|
@@ -286,9 +286,9 @@
|
|
|
286
286
|
}
|
|
287
287
|
},
|
|
288
288
|
"node_modules/smol-toml": {
|
|
289
|
-
"version": "1.6.
|
|
290
|
-
"resolved": "https://registry.npmjs.org/smol-toml/-/smol-toml-1.6.
|
|
291
|
-
"integrity": "sha512-
|
|
289
|
+
"version": "1.6.1",
|
|
290
|
+
"resolved": "https://registry.npmjs.org/smol-toml/-/smol-toml-1.6.1.tgz",
|
|
291
|
+
"integrity": "sha512-dWUG8F5sIIARXih1DTaQAX4SsiTXhInKf1buxdY9DIg4ZYPZK5nGM1VRIYmEbDbsHt7USo99xSLFu5Q1IqTmsg==",
|
|
292
292
|
"dev": true,
|
|
293
293
|
"license": "BSD-3-Clause",
|
|
294
294
|
"engines": {
|
|
@@ -312,9 +312,9 @@
|
|
|
312
312
|
}
|
|
313
313
|
},
|
|
314
314
|
"node_modules/yaml": {
|
|
315
|
-
"version": "2.
|
|
316
|
-
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.
|
|
317
|
-
"integrity": "sha512-
|
|
315
|
+
"version": "2.9.0",
|
|
316
|
+
"resolved": "https://registry.npmjs.org/yaml/-/yaml-2.9.0.tgz",
|
|
317
|
+
"integrity": "sha512-2AvhNX3mb8zd6Zy7INTtSpl1F15HW6Wnqj0srWlkKLcpYl/gMIMJiyuGq2KeI2YFxUPjdlB+3Lc10seMLtL4cA==",
|
|
318
318
|
"dev": true,
|
|
319
319
|
"license": "ISC",
|
|
320
320
|
"bin": {
|
data/renovate.json
CHANGED
|
@@ -17,7 +17,7 @@
|
|
|
17
17
|
{
|
|
18
18
|
"minimumReleaseAge": "7 days",
|
|
19
19
|
"matchDepNames": [
|
|
20
|
-
"
|
|
20
|
+
"*"
|
|
21
21
|
]
|
|
22
22
|
},
|
|
23
23
|
{
|
|
@@ -39,7 +39,15 @@
|
|
|
39
39
|
"ruby/setup-ruby",
|
|
40
40
|
"ruby"
|
|
41
41
|
],
|
|
42
|
-
"groupName": "ruby setup"
|
|
42
|
+
"groupName": "ruby setup",
|
|
43
|
+
"internalChecksFilter": "strict"
|
|
44
|
+
},
|
|
45
|
+
{
|
|
46
|
+
"description": "Let setup-ruby pass age gate before ruby so it is ready when the group PR is created",
|
|
47
|
+
"matchPackageNames": [
|
|
48
|
+
"ruby/setup-ruby"
|
|
49
|
+
],
|
|
50
|
+
"minimumReleaseAge": "5 days"
|
|
43
51
|
}
|
|
44
52
|
],
|
|
45
53
|
"minimumReleaseAge": "7 days",
|
|
@@ -47,6 +55,9 @@
|
|
|
47
55
|
"dependencies"
|
|
48
56
|
],
|
|
49
57
|
"lockFileMaintenance": {
|
|
50
|
-
"enabled": true
|
|
58
|
+
"enabled": true,
|
|
59
|
+
"schedule": [
|
|
60
|
+
"before 4am on the first day of the month"
|
|
61
|
+
]
|
|
51
62
|
}
|
|
52
63
|
}
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: waterdrop
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 2.10.
|
|
4
|
+
version: 2.10.2
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Maciej Mensfeld
|
|
@@ -160,7 +160,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
160
160
|
- !ruby/object:Gem::Version
|
|
161
161
|
version: '0'
|
|
162
162
|
requirements: []
|
|
163
|
-
rubygems_version: 4.0.
|
|
163
|
+
rubygems_version: 4.0.10
|
|
164
164
|
specification_version: 4
|
|
165
165
|
summary: Kafka messaging made easy!
|
|
166
166
|
test_files: []
|