RubyGems - waterdrop - Versions diffs - 2.10.0 → 2.10.2 - Mend

waterdrop 2.10.0 → 2.10.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

checksums.yaml +4 -4
data/.ruby-version +1 -1
data/CHANGELOG.md +23 -0
data/Gemfile +1 -1
data/Gemfile.lock +15 -15
data/docker-compose.oauth.yml +1 -1
data/docker-compose.sasl.yml +1 -1
data/docker-compose.yml +1 -1
data/lib/waterdrop/connection_pool.rb +19 -7
data/lib/waterdrop/errors.rb +1 -2
data/lib/waterdrop/instrumentation/callbacks/delivery.rb +14 -0
data/lib/waterdrop/instrumentation/callbacks/error.rb +1 -2
data/lib/waterdrop/instrumentation/logger_listener.rb +2 -2
data/lib/waterdrop/instrumentation/vendors/datadog/metrics_listener.rb +1 -2
data/lib/waterdrop/polling/config.rb +1 -2
data/lib/waterdrop/polling/latch.rb +1 -2
data/lib/waterdrop/polling/poller.rb +38 -18
data/lib/waterdrop/polling/queue_pipe.rb +1 -2
data/lib/waterdrop/polling/state.rb +2 -4
data/lib/waterdrop/producer/async.rb +2 -2
data/lib/waterdrop/producer/buffer.rb +34 -2
data/lib/waterdrop/producer/idempotence.rb +9 -0
data/lib/waterdrop/producer/status.rb +21 -1
data/lib/waterdrop/producer/sync.rb +10 -4
data/lib/waterdrop/producer/tombstone.rb +6 -2
data/lib/waterdrop/producer/variant.rb +18 -4
data/lib/waterdrop/producer.rb +119 -23
data/lib/waterdrop/version.rb +1 -1
data/package-lock.json +6 -6
data/renovate.json +14 -3
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 5a840a99425c1700eb3ea2cad5da08279e11ba0f4a2600b046dcef14bca9255b
-  data.tar.gz: 6107f58c3ed66912e56379660a021eb87c1e18a0d7fee7702460c1e90b75fea3
+  metadata.gz: 1329d22d7f4f960b2df24949a040dd1e1b57ec73002ed779f3bcd0c5ad4dad20
+  data.tar.gz: b2b9868379d00dd3951df5a2e3b45bdce1b24295284fd5cb6cca410709a38a05
 SHA512:
-  metadata.gz: 454ff01bc3baa3c2b47c46c6538dd8310c3b2af10cafe3551f00630e78d5980b7afeec7eb2047f769b24d0d6106aba4366e04e99d91045154a4ddb0b34edd4b1
-  data.tar.gz: '0980ac5585f18983d4d6918f99cf8bf070160d5c99964459c714b532195d74372785dd9b3e40366fae709f92aea9c364e2b71d33c3ffadde0528f60e863d8348'
+  metadata.gz: 817244d3151a463cca84a1a7fefd298c5e355337325817430cdef5f5eb1c09013d20f801f9ef95b5e9182539f8cb550f25df62042bf93bc7826e9f50ee4b1caf
+  data.tar.gz: c60cc27e7b3787a9610229dacdcbe0440fa7332c5659f5e61348409b4ca2c507718a5bbd93bf5e088f8cc7a5d8b91da981eac0f377257ec0c4394e20dcd92994

data/.ruby-version CHANGED

@@ -1 +1 @@
-4.0.3
+4.0.5

data/CHANGELOG.md CHANGED

@@ -1,5 +1,28 @@
 # WaterDrop changelog
+## 2.10.2 (2026-06-15)
+- [Feature] Expose `Producer#current_variant` as a public method. It returns the variant active for the current dispatch on the current fiber - the custom variant while inside a `#with`/`#variant`-wrapped call, otherwise the producer's default variant - so middleware and instrumentation listeners running synchronously within a dispatch can read the effective per-dispatch settings (`topic_config`, `max_wait_timeout`, `default?`). The lookup is fiber-local and dispatch-scoped: outside a variant-wrapped call (or from an asynchronous delivery callback) it returns the default variant.
+- [Enhancement] Stop allocating one interpolated string per message in `LoggerListener` batch produce handlers. The quoted topic strings were only ever counted (quoting is a 1:1 mapping), never displayed, so counting the raw topic values yields the identical number with zero string allocations - relevant for large `produce_many_*` batches with the default logger listener attached.
+- [Enhancement] Use `Array#concat` in `Producer#buffer_many` instead of appending messages one by one.
+- [Enhancement] Skip building the `message.acknowledged` instrumentation payload in the delivery callback when nothing is subscribed to that event. The notifications bus already short-circuits on empty listeners, but only after the payload hash was allocated - once per delivered message on the polling thread. Mirrors the listener guard already used by the statistics callback. Late subscribers keep working as the check happens on each emission.
+- [Enhancement] Resolve the fiber-local variant once per `#produce` call and once per `#produce_many_sync` wait phase instead of re-resolving it for every usage and for every waited delivery handle. For a 1,000-message sync batch this removes ~2,000 redundant fiber-local lookups.
+- [Enhancement] Do not allocate the fiber-local variants hash on the `Producer#current_variant` read path. Previously every fiber that produced messages got a Hash pinned to it for the fiber's lifetime (per producer use), even when variants were never used - wasteful under fiber-per-request servers (Falcon, async). The hash is now only created by variant wrapper methods that actually need to write to it.
+- [Enhancement] Cache the variant validation contract in a constant instead of instantiating a new `Contracts::Variant` on every `Producer#with` / `Producer#variant` call (mirrors the existing `Transactions::CONTRACT` pattern).
+- [Enhancement] Cache the tombstone validation contract in a constant instead of instantiating a new `Contracts::Tombstone` per tombstone message, removing per-message allocations in the `tombstone_*` APIs (mirrors the existing `Transactions::CONTRACT` pattern).
+- [Enhancement] Replace explicit `Warning[:performance]` opt-in with a dynamic approach using `Warning.categories` (available since Ruby 3.4) to automatically enable all stable opt-in warning categories in the test suite, including `:strict_unused_block` introduced in Ruby 4.0.
+- [Fix] Prevent a deadlock between a transactional single-message dispatch and `#close`. A single `produce_sync`/`produce_async` on a transactional producer incremented the operations counter (which `#close` drains while holding `@transaction_mutex`) before acquiring `@transaction_mutex` for its per-message transaction - an inverted lock order. A dispatch that had counted itself but not yet taken `@transaction_mutex` could deadlock a concurrent `#close` permanently (the close wait loop has no timeout). Transactional dispatches now take `@transaction_mutex` before the operation is counted, matching `#close`'s lock order (`@transaction_mutex` -> `@operating_mutex` -> operations counter).
+- [Fix] Prevent a deadlock (`ThreadError: deadlock; recursive locking`) when closing an idempotent producer (with `reload_on_idempotent_fatal_error` enabled) that has buffered messages whose final flush surfaces a fatal librdkafka error. `#close` performs the final flush while already holding `@operating_mutex`, and the idempotent fatal-error reload tried to re-acquire that same mutex, leaving the producer stuck in `:closing` with the native client leaked. The idempotent reload is now skipped on the closing path, and the final buffer flush is best-effort so client teardown always completes.
+- [Fix] Make concurrent idempotent fatal-error reload thread-safe. When several threads shared an idempotent producer (with `reload_on_idempotent_fatal_error` enabled), a single fatal librdkafka condition failed all their in-flight produces at once and each entered the reload path; the second reload ran `reload!` after the first had already reset `@client` to `nil`, raising `NoMethodError`. The idempotent reload now bails out if another thread already reloaded (mirroring the transactional path's `return if @status.configured?` guard). Additionally, `Status#active?` now classifies the lifecycle from a single atomic read and `Producer#ensure_active!` branches on one snapshot, so a concurrent `configured -> connected` transition during a reload can no longer make `ensure_active!` raise `StatusInvalidError` for a valid, active producer.
+- [Fix] Stop `#flush_async` / `#flush_sync` from silently dropping valid buffered messages when the dispatch fails. `#flush` removes the batch from the internal buffer before dispatching it, and a failure (a single invalid message failing validation before anything is sent, or a mid-batch inline error such as queue full) previously discarded the entire taken batch - the removed messages were never restored. A failed flush now re-buffers the messages that never reached librdkafka (the whole batch on validation failure or on a transactional rollback, the unsent remainder otherwise) so they can be retried instead of being lost.
+- [Fix] Make `Producer#close` fork-safe so the GC finalizer inherited by a forked child can no longer close the parent's client. `#client` registers an `ObjectSpace` finalizer that calls `#close`; that finalizer is inherited across `fork`, and a child that inherited a used producer, never touched it, and exited normally would run `#close` in the child - flushing and closing (with the real rdkafka client, `rd_kafka_destroy` on a fork-inherited handle, i.e. undefined behavior) a client owned by the parent. `#close` now detects when it runs in a process other than the one that built the client, drops the inherited references and finalizer, and returns without touching the native client (matching the existing fork guard on the `#client` path).
+- [Fix] Guard the internal buffer appends in `Producer#buffer` and `Producer#buffer_many` with `@buffer_mutex`. The appends mutated the shared `@messages` buffer without the lock that `flush`/`purge`/`close` hold while swapping it for a fresh array, so a concurrent swap landing between reading `@messages` and appending could drop the message into an orphaned array that is never dispatched - silently losing buffered messages in the documented "buffer in one thread, flush in another" pattern.
+- [Fix] Stop a nested same-producer variant call from clobbering the outer variant inside a variant `transaction` block. `transaction` is the only variant-wrapped method that yields user code, so a variant call nested inside it (another `variant.produce_*`, or a raw producer dispatch in the same scope) used to delete the shared `Fiber.current.waterdrop_clients` entry on return, making the rest of the block silently fall back to the default variant and dispatch with default `topic_config` (timeouts, compression, partitioner) instead of the altered one. The wrapper now saves and restores the previous entry instead of unconditionally deleting it (still deleting when there was none, so the fiber-local hash does not accumulate stale keys).
+- [Fix] Stop `ConnectionPool#shutdown` and `#reload` from silently dropping in-flight messages. Both closed every pooled producer with `close!` (force), which flushes for the max wait timeout and then purges whatever has not drained - so on a slow or unreachable broker, queued `produce_async` messages were cancelled and lost with no delivery report. They now close producers gracefully by default (`#reload` always; `#shutdown` unless called with the new `force: true`), letting messages flush instead of being purged. Pass `pool.shutdown(force: true)` to keep the old force-and-purge behavior.
+- [Fix] Close a race in the FD poller where a producer registered while the last one was being torn down could be left permanently unpolled (sync produces hang until timeout, async deliveries are never acknowledged). The poller thread decided to exit (last producer unregistered) and cleared its thread reference in two separate, unsynchronized steps, so a `register` landing in that gap saw the still-alive exiting thread, skipped starting a fresh one, and then had its producer's state closed by the exiting thread's cleanup. The thread now decides to stop and clears its reference in a single mutex section, so a racing `register` either keeps it running or starts a fresh thread; and the exit cleanup runs only on an abnormal exit, since a normal exit always leaves an empty registry and so can never close a producer registered in the gap.
+## 2.10.1 (2026-05-25)
+- [Fix] Prevent `Producer#close` from raising `ThreadError: can't be called from trap context` when invoked from a Ruby signal trap context (e.g. Puma's `after_stopped` DSL hook in single mode). `close` now detects this case and delegates to a background thread, joining it so the caller blocks until the producer is fully closed (#866).
 ## 2.10.0 (2026-05-07)
 - [Fix] Clean up native rdkafka client, global instrumentation callbacks, and poller registration when `init_transactions` fails during producer client construction. Previously, each failed attempt permanently leaked native threads, pipe file descriptors, and callback registry entries because the started `rd_kafka_t` handle was abandoned without being destroyed.
 - **[Breaking]** Skip emitting librdkafka statistics when nothing is subscribed to `statistics.emitted` at the time the underlying rdkafka client is constructed. When no listener is present at build time, `statistics.interval.ms` is forced to `0` regardless of user configuration and the statistics callback is not registered, saving substantial allocations in the hot path (no JSON parsing, no statistics hash materialization, no decoration work). To use statistics, subscribe a listener to `statistics.emitted` BEFORE the first producer use (before the underlying client is lazily initialized).
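
The nested-variant fix listed under 2.10.2 (save and restore the previous entry instead of always deleting it) can be sketched in isolation. This is a minimal illustration, not WaterDrop's code: `SketchVariants`, `STORE`, and both wrapper methods are hypothetical names, and a plain module-level Hash stands in for the fiber-local hash the changelog describes.

```ruby
# Hypothetical stand-in for the per-fiber variants registry (assumption: the
# real gem keeps this on the current fiber, not in a global Hash).
module SketchVariants
  STORE = {}

  # Returns the active variant for a producer, or :default when none is set.
  def self.current(producer_id)
    STORE.fetch(producer_id, :default)
  end

  # Old behavior: unconditionally delete the entry on the way out, which
  # clobbers an outer variant when calls are nested.
  def self.with_variant_deleting(producer_id, variant)
    STORE[producer_id] = variant
    yield
  ensure
    STORE.delete(producer_id)
  end

  # New behavior: remember what was there and put it back; delete only when
  # there was no previous entry, so no stale keys accumulate.
  def self.with_variant_restoring(producer_id, variant)
    had_previous = STORE.key?(producer_id)
    previous = STORE[producer_id]
    STORE[producer_id] = variant
    yield
  ensure
    if had_previous
      STORE[producer_id] = previous
    else
      STORE.delete(producer_id)
    end
  end
end

SketchVariants.with_variant_restoring(:producer, :slow) do
  SketchVariants.with_variant_restoring(:producer, :fast) { }
  # Still :slow here; with the deleting wrapper this would be :default.
  puts SketchVariants.current(:producer)
end
```

With the deleting wrapper, the inner call removes the shared entry and the rest of the outer block silently runs with the default variant, which is the behavior the changelog entry describes as fixed.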

data/Gemfile CHANGED

@@ -5,7 +5,7 @@ source "https://rubygems.org"
 gemspec
 # Relaxed from 2.7 because we support Ruby 3.1
-gem "zeitwerk", "~> 2.7.0"
+gem "zeitwerk", "~> 2.8.0"
 group :development do
   gem "byebug"

data/Gemfile.lock CHANGED

@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    waterdrop (2.10.0)
+    waterdrop (2.10.2)
       karafka-core (>= 2.5.12, < 3.0.0)
       karafka-rdkafka (>= 0.24.0)
       zeitwerk (~> 2.3)
@@ -16,11 +16,11 @@ GEM
     drb (2.2.3)
     ffi (1.17.4)
     io-console (0.8.2)
-    json (2.19.3)
-    karafka-core (2.5.12)
+    json (2.19.7)
+    karafka-core (2.5.13)
       karafka-rdkafka (>= 0.20.0)
       logger (>= 1.6.0)
-    karafka-rdkafka (0.25.0)
+    karafka-rdkafka (0.27.2)
       ffi (~> 1.17.1)
       json (> 2.0)
       logger
@@ -35,7 +35,7 @@ GEM
       ruby2_keywords (>= 0.0.5)
     ostruct (0.6.3)
     prism (1.9.0)
-    rake (13.3.1)
+    rake (13.4.2)
     reline (0.6.3)
       io-console (~> 0.5)
     ruby2_keywords (0.0.5)
@@ -45,8 +45,8 @@ GEM
       simplecov_json_formatter (~> 0.1)
     simplecov-html (0.13.2)
     simplecov_json_formatter (0.1.4)
-    warning (1.5.0)
-    zeitwerk (2.7.5)
+    warning (1.6.0)
+    zeitwerk (2.8.2)
 PLATFORMS
   ruby
@@ -60,7 +60,7 @@ DEPENDENCIES
   simplecov
   warning
   waterdrop!
-  zeitwerk (~> 2.7.0)
+  zeitwerk (~> 2.8.0)
 CHECKSUMS
   byebug (13.0.0) sha256=d2263efe751941ca520fa29744b71972d39cbc41839496706f5d9b22e92ae05d
@@ -69,24 +69,24 @@ CHECKSUMS
   drb (2.2.3) sha256=0b00d6fdb50995fe4a45dea13663493c841112e4068656854646f418fda13373
   ffi (1.17.4) sha256=bcd1642e06f0d16fc9e09ac6d49c3a7298b9789bcb58127302f934e437d60acf
   io-console (0.8.2) sha256=d6e3ae7a7cc7574f4b8893b4fca2162e57a825b223a177b7afa236c5ef9814cc
-  json (2.19.3) sha256=289b0bb53052a1fa8c34ab33cc750b659ba14a5c45f3fcf4b18762dc67c78646
-  karafka-core (2.5.12) sha256=57cbb45a187fbe3df9b9a57af59dda7211f9969524b2afbb83792a64705860e1
-  karafka-rdkafka (0.25.0) sha256=67b316b942cf9ff7e9d7bbf9029e6f2d91eba97b4c9dc93b9f49fd207dfb80f8
+  json (2.19.7) sha256=fe432c8639f6efff69f9d73b518a3705d9581ab93156f981ea72806e1e5bcc3e
+  karafka-core (2.5.13) sha256=0acec083043bb6166c4b647a7458091cc7b08066d3b92a026932925ec7e07f61
+  karafka-rdkafka (0.27.2) sha256=3ccce96306642be70bff8168e4e737fc10f2ffae20bc0ff0a43d88dbb7452d31
   logger (1.7.0) sha256=196edec7cc44b66cfb40f9755ce11b392f21f7967696af15d274dde7edff0203
   mini_portile2 (2.8.9) sha256=0cd7c7f824e010c072e33f68bc02d85a00aeb6fce05bb4819c03dfd3c140c289
   minitest (6.0.6) sha256=153ea36d1d987a62942382b61075745042a2b3123b1cd48f4c3675af9cc7d6f1
   mocha (3.1.0) sha256=75f42d69ebfb1f10b32489dff8f8431d37a418120ecdfc07afe3bc183d4e1d56
   ostruct (0.6.3) sha256=95a2ed4a4bd1d190784e666b47b2d3f078e4a9efda2fccf18f84ddc6538ed912
   prism (1.9.0) sha256=7b530c6a9f92c24300014919c9dcbc055bf4cdf51ec30aed099b06cd6674ef85
-  rake (13.3.1) sha256=8c9e89d09f66a26a01264e7e3480ec0607f0c497a861ef16063604b1b08eb19c
+  rake (13.4.2) sha256=cb825b2bd5f1f8e91ca37bddb4b9aaf345551b4731da62949be002fa89283701
   reline (0.6.3) sha256=1198b04973565b36ec0f11542ab3f5cfeeec34823f4e54cebde90968092b1835
   ruby2_keywords (0.0.5) sha256=ffd13740c573b7301cf7a2e61fc857b2a8e3d3aff32545d6f8300d8bae10e3ef
   simplecov (0.22.0) sha256=fe2622c7834ff23b98066bb0a854284b2729a569ac659f82621fc22ef36213a5
   simplecov-html (0.13.2) sha256=bd0b8e54e7c2d7685927e8d6286466359b6f16b18cb0df47b508e8d73c777246
   simplecov_json_formatter (0.1.4) sha256=529418fbe8de1713ac2b2d612aa3daa56d316975d307244399fa4838c601b428
-  warning (1.5.0) sha256=0f12c49fea0c06757778eefdcc7771e4fd99308901e3d55c504d87afdd718c53
-  waterdrop (2.10.0)
-  zeitwerk (2.7.5) sha256=d8da92128c09ea6ec62c949011b00ed4a20242b255293dd66bf41545398f73dd
+  warning (1.6.0) sha256=a49cdfae19fb77d19afff2efbe45f8ab759e9cd25b4e4ce2c79dbaf46bdb6c9e
+  waterdrop (2.10.2)
+  zeitwerk (2.8.2) sha256=7212a61311083c604184b1ea2574b9aa05cd14f855a0841c06985cabe9181d12
 BUNDLED WITH
   4.0.6

data/docker-compose.oauth.yml CHANGED

@@ -18,7 +18,7 @@ services:
       start_period: 90s
   kafka-oauth:
-    image: confluentinc/cp-kafka:8.2.0
+    image: confluentinc/cp-kafka:8.2.1
     container_name: kafka-oauth
     depends_on:
       keycloak:

data/docker-compose.sasl.yml CHANGED

@@ -1,6 +1,6 @@
 services:
   kafka-sasl:
-    image: confluentinc/cp-kafka:8.2.0
+    image: confluentinc/cp-kafka:8.2.1
     container_name: kafka-sasl
     ports:
       - "9095:9095"

data/docker-compose.yml CHANGED

@@ -1,7 +1,7 @@
 services:
   kafka:
     container_name: kafka
-    image: confluentinc/cp-kafka:8.2.0
+    image: confluentinc/cp-kafka:8.2.1
     ports:
       - 9092:9092

data/lib/waterdrop/connection_pool.rb CHANGED

@@ -113,11 +113,14 @@ module WaterDrop
       end
       # Shutdown the global connection pool
-      def shutdown
+      #
+      # @param force [Boolean] when true, force-close each producer, purging unflushed messages.
+      #   Defaults to false (graceful close) so in-flight messages are not silently dropped.
+      def shutdown(force: false)
         return unless @default_pool
         pool = @default_pool
-        @default_pool.shutdown
+        @default_pool.shutdown(force: force)
         @default_pool = nil
         # Emit global event for pool shutdown
@@ -237,9 +240,16 @@ module WaterDrop
     end
     # Shutdown the connection pool
-    def shutdown
+    #
+    # @param force [Boolean] when true, force-close each producer, purging any messages that do not
+    #   flush within the producer's max wait timeout. Defaults to false: producers are closed
+    #   gracefully so in-flight messages are flushed instead of being silently dropped when the
+    #   broker is slow or unreachable.
+    def shutdown(force: false)
       @pool.shutdown do |producer|
-        producer.close! if producer&.status&.active?
+        next unless producer&.status&.active?
+        force ? producer.close! : producer.close
       end
       # Emit event after pool is shut down
@@ -254,11 +264,13 @@ module WaterDrop
     # for API consistency across both individual producers and connection pools
     alias_method :close, :shutdown
-    # Reload all connections in the pool
-    # Useful for configuration changes or error recovery
+    # Reload all connections in the pool. Useful for configuration changes or error recovery
+    #
+    # @note Producers are always closed gracefully (never force-closed): a reload must not drop
+    #   in-flight messages, so it waits for them to flush rather than purging the queue.
     def reload
       @pool.reload do |producer|
-        producer.close! if producer&.status&.active?
+        producer.close if producer&.status&.active?
       end
       # Emit event after pool is reloaded
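
The graceful-versus-force distinction in the hunk above can be sketched with a stub. This is an illustrative model only: `SketchPooledProducer` and `sketch_pool_shutdown` are hypothetical, and the stub assumes an unreachable broker, so a force close purges everything still queued while a graceful close is modeled as eventually delivering it.

```ruby
# Hypothetical stub producer; not WaterDrop::Producer.
class SketchPooledProducer
  attr_reader :delivered, :purged

  def initialize(queued)
    @queued = queued.dup
    @delivered = []
    @purged = []
    @active = true
  end

  def active?
    @active
  end

  # Graceful close: modeled as waiting until every queued message is delivered.
  def close
    @delivered.concat(@queued)
    @queued.clear
    @active = false
  end

  # Force close: modeled as purging whatever has not drained (here: everything).
  def close!
    @purged.concat(@queued)
    @queued.clear
    @active = false
  end
end

# Mirrors the shape of the new shutdown block: skip inactive producers, then
# pick the close flavor based on the force flag.
def sketch_pool_shutdown(producers, force: false)
  producers.each do |producer|
    next unless producer.active?

    force ? producer.close! : producer.close
  end
end

producer = SketchPooledProducer.new(%w[m1 m2])
sketch_pool_shutdown([producer])
puts producer.delivered.inspect
```

Passing `force: true` to the sketch routes the same messages into `purged` instead, which corresponds to the pre-2.10.2 behavior the changelog says is now opt-in.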

data/lib/waterdrop/errors.rb CHANGED

@@ -80,7 +80,6 @@ module WaterDrop
     end
   end
-  # Alias so we can have a nicer API to abort transactions
-  # This makes referencing easier
+  # Alias so we can have a nicer API to abort transactions. This makes referencing easier
   AbortTransaction = Errors::AbortTransaction
 end

data/lib/waterdrop/instrumentation/callbacks/delivery.rb CHANGED

@@ -60,7 +60,14 @@ module WaterDrop
         private
         # @param delivery_report [Rdkafka::Producer::DeliveryReport] delivery report
+        # @note This is the most frequently fired event in the system (once per delivered
+        #   message) and most users do not subscribe to it. While the notifications bus
+        #   short-circuits on empty listeners, that happens only after the payload hash is
+        #   built, so we guard here to keep the no-listeners path allocation-free. We check on
+        #   each emission to support late subscribers.
         def instrument_acknowledged(delivery_report)
+          return unless listening?
           @monitor.instrument(
             "message.acknowledged",
             caller: self,
@@ -111,6 +118,13 @@ module WaterDrop
         def build_error(delivery_report)
           ::Rdkafka::RdkafkaError.new(delivery_report.error)
         end
+        # Check if anyone is listening to the acknowledgement events
+        # @return [Boolean] true if there are any listeners
+        def listening?
+          listeners = @monitor.listeners["message.acknowledged"]
+          listeners && !listeners.empty?
+        end
       end
     end
   end
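
The listener guard added above can be illustrated with a small stand-alone model. `SketchMonitor` and `SketchDeliveryCallback` are hypothetical stand-ins (the real monitor comes from karafka-core and is not reproduced here); the sketch only shows that the payload hash is never built while nothing is subscribed, and that a late subscriber still receives events because the check runs on each emission.

```ruby
# Hypothetical minimal notifications bus.
class SketchMonitor
  attr_reader :listeners

  def initialize
    @listeners = {}
  end

  def subscribe(event, &block)
    (@listeners[event] ||= []) << block
  end

  def instrument(event, payload)
    @listeners.fetch(event, []).each { |listener| listener.call(payload) }
  end
end

# Hypothetical delivery callback that counts how many payloads it builds.
class SketchDeliveryCallback
  EVENT = "message.acknowledged"

  attr_reader :payloads_built

  def initialize(monitor)
    @monitor = monitor
    @payloads_built = 0
  end

  def call(offset)
    # Guard first, so the no-listeners path allocates no payload hash.
    return unless listening?

    @payloads_built += 1
    @monitor.instrument(EVENT, { offset: offset })
  end

  private

  def listening?
    listeners = @monitor.listeners[EVENT]
    listeners && !listeners.empty?
  end
end

monitor = SketchMonitor.new
callback = SketchDeliveryCallback.new(monitor)
callback.call(1) # nothing subscribed yet, so no payload is built
monitor.subscribe("message.acknowledged") { |payload| puts payload.inspect }
callback.call(2) # the late subscriber receives this one
```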

data/lib/waterdrop/instrumentation/callbacks/error.rb CHANGED

@@ -21,8 +21,7 @@ module WaterDrop
         # @note When there is a particular message produce error (not internal error), the error
         #   is shipped via the delivery callback, not via error callback.
         def call(client_name, error)
-          # Emit only errors related to our client
-          # Same as with statistics (mor explanation there)
+          # Emit only errors related to our client, same as with statistics (mor explanation there)
           return unless @client_name == client_name
           @monitor.instrument(

data/lib/waterdrop/instrumentation/logger_listener.rb CHANGED

@@ -47,7 +47,7 @@ module WaterDrop
       # @param event [Dry::Events::Event] event that happened with the details
       def on_messages_produced_async(event)
         messages = event[:messages]
-        topics_count = messages.map { |message| "'#{message[:topic]}'" }.uniq.count
+        topics_count = messages.map { |message| message[:topic] }.uniq.count
         info(
           event,
@@ -62,7 +62,7 @@ module WaterDrop
       # @param event [Dry::Events::Event] event that happened with the details
       def on_messages_produced_sync(event)
         messages = event[:messages]
-        topics_count = messages.map { |message| "'#{message[:topic]}'" }.uniq.count
+        topics_count = messages.map { |message| message[:topic] }.uniq.count
         info(event, "Sync producing of #{messages.size} messages to #{topics_count} topics")
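
A quick way to see why the two forms in this hunk count the same number of topics: quoting maps each distinct topic string to a distinct quoted string, so deduplicating before or after quoting gives the same count. The sample batch and helper names below are hypothetical, and the sketch assumes every `:topic` value is a String (a batch mixing `"events"` and `:events` would be counted differently by the two forms).

```ruby
# Hypothetical sample batch; only the :topic key matters here.
SAMPLE_MESSAGES = [
  { topic: "events", payload: "a" },
  { topic: "logs", payload: "b" },
  { topic: "events", payload: "c" }
].freeze

# Old form: allocates one interpolated string per message before deduplicating.
def quoted_topics_count(messages)
  messages.map { |message| "'#{message[:topic]}'" }.uniq.count
end

# New form: deduplicates the raw topic values, with no extra strings built.
def raw_topics_count(messages)
  messages.map { |message| message[:topic] }.uniq.count
end

puts quoted_topics_count(SAMPLE_MESSAGES) # => 2
puts raw_topics_count(SAMPLE_MESSAGES)    # => 2
```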

data/lib/waterdrop/instrumentation/vendors/datadog/metrics_listener.rb CHANGED

@@ -218,8 +218,7 @@ module WaterDrop
             when :brokers
               statistics.fetch("brokers").each_value do |broker_statistics|
                 # Skip bootstrap nodes
-                # Bootstrap nodes have nodeid -1, other nodes have positive
-                # node ids
+                # Bootstrap nodes have nodeid -1, other nodes have positive node ids
                 next if broker_statistics["nodeid"] == -1
                 public_send(

data/lib/waterdrop/polling/config.rb CHANGED

@@ -16,8 +16,7 @@ module WaterDrop
       extend ::Karafka::Core::Configurable
       # Ruby thread priority for the poller thread
-      # Valid range: -3 to 3 (Ruby's thread priority range)
-      # Higher values = higher priority
+      # Valid range: -3 to 3 (Ruby's thread priority range). Higher values = higher priority
       setting :thread_priority, default: 0
       # IO.select timeout in milliseconds

data/lib/waterdrop/polling/latch.rb CHANGED

@@ -33,8 +33,7 @@ module WaterDrop
         end
       end
-      # Waits until the latch is released
-      # Returns immediately if already released
+      # Waits until the latch is released. Returns immediately if already released
       def wait
         @mutex.synchronize do
           @cv.wait(@mutex) until @released

data/lib/waterdrop/polling/poller.rb CHANGED

@@ -186,8 +186,7 @@ module WaterDrop
         @ios_dirty = true
       end
-      # Ensures the polling thread is running
-      # Must be called within @mutex.synchronize
+      # Ensures the polling thread is running. Must be called within @mutex.synchronize
       def ensure_thread_running!
         return if @thread&.alive?
@@ -200,9 +199,29 @@ module WaterDrop
       # Main polling loop that runs in a dedicated thread
       def polling_loop
         backoff_ms = 0
+        clean_exit = false
         loop do
-          break if @shutdown
+          # Decide whether to stop AND clear @thread in a single critical section. This is what
+          # closes the register/shutdown race: a concurrent `register` is serialized by @mutex, so
+          # it either runs before this block (we observe its producer plus `@shutdown = false` and
+          # keep polling) or after it (it finds `@thread` already nil and starts a fresh thread).
+          # Previously the exit decision and the `@thread = nil` teardown were separate and
+          # unsynchronized, so a producer registered in that gap was treated as already served by
+          # this exiting thread and then closed by its cleanup - left registered but never polled.
+          stop = @mutex.synchronize do
+            if @shutdown || @producers.empty?
+              @thread = nil
+              true
+            else
+              false
+            end
+          end
+          if stop
+            clean_exit = true
+            break
+          end
           # Apply backoff from previous error
           if backoff_ms > 0
@@ -213,9 +232,9 @@ module WaterDrop
           # Collect readable IOs (queue FDs)
           readable_ios, io_to_state = collect_readable_ios
-          # Exit when no producers registered
-          # New registrations will start a fresh thread via ensure_thread_running!
-          break if readable_ios.empty?
+          # A producer may have registered right after the stop check above; if the cached snapshot
+          # is momentarily empty, loop to rebuild it instead of selecting on an empty set.
+          next if readable_ios.empty?
           poll_with_select(readable_ios, io_to_state)
         rescue => e
@@ -229,13 +248,12 @@ module WaterDrop
             end
         end
       ensure
-        # Clear thread reference first so new registrations will start a fresh thread
-        # This prevents race where register sees old thread as alive during cleanup
-        @mutex.synchronize { @thread = nil }
-        # When the poller thread exits (error or clean shutdown), close all remaining states
-        # This releases any latches that might be waiting in unregister calls
-        close_all_states
+        # A normal exit already cleared @thread above with an empty registry, so there is nothing to
+        # release - and skipping cleanup here is what keeps a producer registered in the exit gap
+        # from being closed: its fresh thread owns it now. Only an abnormal exit (an exception
+        # escaped the loop) can leave producers registered with callers blocked in `unregister`;
+        # release those so they don't hang.
+        close_all_states unless clean_exit
       end
       # Broadcasts an error to all registered producers' monitors
@@ -379,13 +397,15 @@ module WaterDrop
         state.close
       end
-      # Closes all remaining producer states
-      # Called when the poller thread exits to release any pending latches
-      # This prevents deadlocks if producers are waiting in unregister
+      # Releases any producer states still registered when the poller thread exits ABNORMALLY (an
+      # exception escaped the loop), so callers blocked in `unregister` waiting on their latch are
+      # not left hanging. A normal exit clears the registry through the loop and never calls this,
+      # which is why no thread-ownership check is needed here.
       def close_all_states
         states = @mutex.synchronize do
-          to_close = @producers.values.dup
-          @producers.clear
+          @thread = nil
+          to_close = @producers.values
+          @producers = {}
           @ios_dirty = true
           to_close
         end
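
The core idea of the poller change (deciding to stop and clearing the thread reference in one critical section) can be shown in a stripped-down form. `SketchPoller` is a hypothetical class: it has no IO, no shutdown flag, and no states to close, and it only demonstrates that `register` and the exit decision are serialized by the same mutex, so a registration either keeps the existing thread alive or finds `@thread` already `nil` and starts a fresh one.

```ruby
# Hypothetical, heavily simplified poller; not WaterDrop::Polling::Poller.
class SketchPoller
  def initialize
    @mutex = Mutex.new
    @producers = {}
    @thread = nil
  end

  def register(id)
    @mutex.synchronize do
      @producers[id] = true
      # A thread that already decided to exit has set @thread to nil under
      # this same mutex, so it can never be mistaken for a live poller here.
      @thread ||= Thread.new { polling_loop }
    end
  end

  def unregister(id)
    @mutex.synchronize { @producers.delete(id) }
  end

  def running?
    @mutex.synchronize { !@thread.nil? }
  end

  private

  def polling_loop
    loop do
      # Exit decision and reference teardown happen atomically.
      stop = @mutex.synchronize do
        if @producers.empty?
          @thread = nil
          true
        else
          false
        end
      end
      break if stop

      sleep 0.001 # stand-in for the real select/poll work
    end
  end
end

poller = SketchPoller.new
poller.register(:a)
poller.unregister(:a)
sleep 0.01 while poller.running? # wait for the idle thread to exit
poller.register(:b)              # starts a fresh thread
puts poller.running?
poller.unregister(:b)
```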

data/lib/waterdrop/polling/queue_pipe.rb CHANGED

@@ -25,8 +25,7 @@ module WaterDrop
         client.enable_queue_io_events(@writer.fileno)
       end
-      # Signals by writing a byte to the pipe
-      # Used to wake IO.select for continue/close signals
+      # Signals by writing a byte to the pipe. Used to wake IO.select for continue/close signals
       # Thread-safe and non-blocking; silently ignores errors
       def signal
         @writer.write_nonblock("W", exception: false)

data/lib/waterdrop/polling/state.rb CHANGED

@@ -53,8 +53,7 @@ module WaterDrop
         @io = @queue_pipe.reader
       end
-      # Drains the queue pipe
-      # Called before polling to clear any pending signals
+      # Drains the queue pipe. Called before polling to clear any pending signals
       def drain
         @queue_pipe.drain
       end
@@ -88,8 +87,7 @@ module WaterDrop
       private_constant :STALE_CHECK_THROTTLE_MS
-      # Marks this producer as having been polled
-      # Called after polling to track staleness
+      # Marks this producer as having been polled. Called after polling to track staleness
       def mark_polled!
         @last_poll_time = monotonic_now
       end

data/lib/waterdrop/producer/async.rb CHANGED

@@ -21,7 +21,7 @@ module WaterDrop
           "message.produced_async",
           producer_id: id,
           message: message
-        ) { produce(message) }
+        ) { produce(message, "produce_async") }
       rescue *SUPPORTED_FLOW_ERRORS => e
         # We use this syntax here because we want to preserve the original `#cause` when we
         # instrument the error and there is no way to manually assign `#cause` value
@@ -62,7 +62,7 @@ module WaterDrop
         ) do
           with_transaction_if_transactional do
             messages.each do |message|
-              dispatched << produce(message)
+              dispatched << produce(message, "produce_many_async")
             end
           end

data/lib/waterdrop/producer/buffer.rb CHANGED

@@ -12,12 +12,15 @@ module WaterDrop
       def buffer(message)
         ensure_active!
+        # The append runs under @buffer_mutex because flush/purge/close swap @messages for a fresh
+        # array under the same lock. Without it, a concurrent swap between reading @messages and
+        # appending would land the message in the orphaned old array and silently lose it.
         @monitor.instrument(
           "message.buffered",
           producer_id: id,
           message: message,
           buffer: @messages
-        ) { @messages << message }
+        ) { @buffer_mutex.synchronize { @messages << message } }
       end
       # Adds given messages into the internal producer buffer without flushing them to Kafka
@@ -29,13 +32,16 @@ module WaterDrop
       def buffer_many(messages)
         ensure_active!
+        # The concat runs under @buffer_mutex for the same reason as #buffer: flush/purge/close swap
+        # @messages under the lock, so an unguarded concat could append into an array that has just
+        # been captured for dispatch (or discarded), silently losing the messages.
         @monitor.instrument(
           "messages.buffered",
           producer_id: id,
           messages: messages,
           buffer: @messages
         ) do
-          messages.each { |message| @messages << message }
+          @buffer_mutex.synchronize { @messages.concat(messages) }
           messages
         end
       end
@@ -83,6 +89,32 @@ module WaterDrop
         return data_for_dispatch if data_for_dispatch.empty?
         sync ? produce_many_sync(data_for_dispatch) : produce_many_async(data_for_dispatch)
+      rescue Errors::ProduceManyError => e
+        # A dispatch failed partway through the batch. Re-buffer the messages that never reached
+        # librdkafka so a partial failure does not silently drop valid buffered messages. For a
+        # transactional producer the whole batch is rolled back (nothing is visible to consumers),
+        # so all of it is restored; for a regular producer `e.dispatched` holds the handles already
+        # created, so only the remainder is restored.
+        requeue_unflushed(transactional? ? data_for_dispatch : data_for_dispatch.drop(e.dispatched.size))
+        raise
+      rescue Errors::MessageInvalidError
+        # Validation runs before anything is dispatched, so nothing reached librdkafka. Restore the
+        # whole batch instead of dropping valid messages alongside the invalid one.
+        requeue_unflushed(data_for_dispatch)
+        raise
+      end
+      # Puts not-yet-dispatched messages back at the front of the buffer (preserving their original
+      # order relative to each other and to anything buffered concurrently), so a failed flush does
+      # not lose them.
+      #
+      # @param messages [Array<Hash>] messages to restore to the buffer
+      def requeue_unflushed(messages)
+        return if messages.empty?
+        @buffer_mutex.synchronize { @messages.unshift(*messages) }
       end
     end
   end
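
The buffer changes above combine two ideas: every append and every swap happens under one lock, and messages that never reached the client go back to the front of the buffer. A minimal sketch of that pattern follows. `SketchBuffer` and its method names are hypothetical stand-ins for illustration, not WaterDrop's classes or API.

```ruby
# Hypothetical stand-in for the producer buffer; not WaterDrop's actual class.
class SketchBuffer
  def initialize
    @messages = []
    @mutex = Mutex.new
  end

  # Append under the lock so a concurrent swap cannot orphan the message.
  def add(message)
    @mutex.synchronize { @messages << message }
  end

  # Swap the buffer for a fresh array and return the captured batch.
  def take_all
    @mutex.synchronize do
      batch = @messages
      @messages = []
      batch
    end
  end

  # Put undispatched messages back at the front, ahead of anything buffered since.
  def requeue(messages)
    return if messages.empty?

    @mutex.synchronize { @messages.unshift(*messages) }
  end

  # Snapshot of the current buffer contents.
  def to_a
    @mutex.synchronize { @messages.dup }
  end
end

buffer = SketchBuffer.new
%w[a b c].each { |m| buffer.add(m) }

batch = buffer.take_all   # captured for dispatch
buffer.add("d")           # buffered while the flush is in flight
dispatched_count = 1      # assume only "a" reached the client before the failure
buffer.requeue(batch.drop(dispatched_count))
```

After the requeue, the restored messages sit ahead of the one buffered during the failed flush, which matches the ordering the comment on `requeue_unflushed` describes.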

data/lib/waterdrop/producer/idempotence.rb CHANGED

@@ -57,6 +57,15 @@ module WaterDrop
       # @note After reload, the producer will automatically retry the failed operation
       def idempotent_reload_client_on_fatal_error(attempt, error)
         @operating_mutex.synchronize do
+          # When several threads share an idempotent producer, one fatal librdkafka condition fails
+          # all their in-flight produces at once and each enters this method. The mutex serializes
+          # them, but a thread that waited here may arrive after another has already reloaded -
+          # resetting @client to nil and moving the producer to the configured state. Running
+          # reload! again would call methods on a nil @client and raise NoMethodError, so we bail
+          # out and let #produce retry against the freshly reloaded client. This mirrors the
+          # `return if @status.configured?` guard on the transactional reload path.
+          next if @client.nil? || @status.configured?
           # Emit producer.reload event before reload
           # Users can subscribe to this event and modify event[:caller].config.kafka to change
           # producer config
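
The guard added in this hunk is a re-check after acquiring the lock: threads that queued up behind the first reloader see that the work is already done and skip it. A rough sketch of the idea, assuming a hypothetical `SketchReloader` class (the names and the counter are illustrative only):

```ruby
# Hypothetical illustration of "re-check state after taking the lock".
class SketchReloader
  attr_reader :reloads

  def initialize
    @mutex = Mutex.new
    @client = Object.new # stands in for a live client
    @reloads = 0
  end

  # Every thread that hit the same fatal error ends up calling this.
  def reload_on_fatal
    @mutex.synchronize do
      # Another thread already reset the client while we waited: nothing to do.
      next if @client.nil?

      @client = nil
      @reloads += 1
    end
  end
end

reloader = SketchReloader.new
threads = Array.new(4) { Thread.new { reloader.reload_on_fatal } }
threads.each(&:join)
```

Only the first thread through the mutex performs the reload. The others return without touching the already cleared client.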

data/lib/waterdrop/producer/status.rb CHANGED

@@ -17,6 +17,16 @@ module WaterDrop
       private_constant :LIFECYCLE
+      # States in which the producer is considered active and able to accept work. Kept as a single
+      # set so the current state can be classified in one atomic read (see `#active?` / `#to_sym`)
+      # rather than via a chain of predicate calls that could straddle a concurrent transition.
+      ACTIVE_STATES = %i[
+        connected
+        configured
+        disconnecting
+        disconnected
+      ].freeze
       # Creates a new instance of status with the initial state
       # @return [Status]
       def initialize
@@ -29,7 +39,10 @@ module WaterDrop
       #   established or disconnected, meaning it was working but the user disconnected for their
       #   own reasons, though sending could reconnect and continue.
       def active?
-        connected? || configured? || disconnecting? || disconnected?
+        # Single read of @current so a concurrent transition cannot make this return false for a
+        # status that is in fact active (for example flipping configured -> connected mid-check
+        # while another thread reloads the client after a fatal error).
+        ACTIVE_STATES.include?(@current)
       end
       # @return [String] current status as a string
@@ -37,6 +50,13 @@ module WaterDrop
         @current.to_s
       end
+      # @return [Symbol] current lifecycle state captured as a single atomic read. Lets callers
+      #   branch on one consistent value instead of issuing several predicate calls that could
+      #   observe different states if the producer is transitioning on another thread.
+      def to_sym
+        @current
+      end
       LIFECYCLE.each do |state|
         # @example
         #   def initial?
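
The point of `ACTIVE_STATES` and `#to_sym` is that a caller reads the state once and branches on that single value. The sketch below shows the shape of such a check. `SketchStatus` and `classify` are hypothetical names for illustration; they mirror the diff only loosely.

```ruby
# Hypothetical, simplified status holder; not WaterDrop's Status class.
class SketchStatus
  ACTIVE_STATES = %i[connected configured disconnecting disconnected].freeze

  def initialize
    @current = :initial
  end

  def transition(state)
    @current = state
  end

  # One read of @current, so the answer reflects a single consistent state.
  def active?
    ACTIVE_STATES.include?(@current)
  end

  def to_sym
    @current
  end
end

# Capture the state once, then classify it without reading the status again.
def classify(status)
  state = status.to_sym

  return :ok if SketchStatus::ACTIVE_STATES.include?(state)
  return :not_configured if state == :initial
  return :closed if %i[closing closed].include?(state)

  :invalid
end

status = SketchStatus.new
before_configure = classify(status)
status.transition(:configured)
after_configure = classify(status)
status.transition(:closed)
after_close = classify(status)
```

Because `classify` holds its own copy of the symbol, a transition on another thread between the checks cannot make it fall through to the `:invalid` branch for a state that was valid when read.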

data/lib/waterdrop/producer/sync.rb CHANGED

@@ -24,7 +24,7 @@ module WaterDrop
           producer_id: id,
           message: message
         ) do
-          wait(produce(message))
+          wait(produce(message, "produce_sync"))
         end
       rescue *SUPPORTED_FLOW_ERRORS => e
         # We use this syntax here because we want to preserve the original `#cause` when we
@@ -84,21 +84,27 @@ module WaterDrop
           begin
             with_transaction_if_transactional do
               messages.each do |message|
-                dispatched << produce(message)
+                dispatched << produce(message, "produce_many_sync")
               end
             end
           rescue *SUPPORTED_FLOW_ERRORS => e
             inline_error = e
           end
+          # Resolve the variant timeout once instead of re-resolving the fiber-local variant for
+          # every single handler we wait on
+          max_wait_timeout = current_variant.max_wait_timeout
          # This ensures that we have all verdicts before raising the failure, so we pass
          # all delivery handles with a final verdict
-          dispatched.each { |handler| wait(handler, raise_response_error: false) }
+          dispatched.each do |handler|
+            wait(handler, max_wait_timeout: max_wait_timeout, raise_response_error: false)
+          end
           raise(inline_error) if inline_error
          # This will raise an error on the first error that has happened
-          dispatched.each { |handler| wait(handler) }
+          dispatched.each { |handler| wait(handler, max_wait_timeout: max_wait_timeout) }
           dispatched
         end
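
The `max_wait_timeout:` change relies on Ruby evaluating a keyword default only when the caller omits the argument. A small sketch of that behavior, using a hypothetical `SketchWaiter` that counts how often the default is resolved (the class, the counter and the 5000 ms value are assumptions for illustration):

```ruby
# Hypothetical waiter; counts how many times the timeout is looked up.
class SketchWaiter
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # Stands in for the variant lookup.
  def current_timeout
    @lookups += 1
    5_000
  end

  # The default expression runs only when max_wait_timeout is not passed in.
  def wait(handle, max_wait_timeout: current_timeout)
    [handle, max_wait_timeout]
  end
end

waiter = SketchWaiter.new

# Each call resolves the timeout again.
3.times { |i| waiter.wait(i) }
lookups_per_call = waiter.lookups

# Resolve once, then pass it explicitly to every wait.
timeout = waiter.current_timeout
3.times { |i| waiter.wait(i, max_wait_timeout: timeout) }
lookups_resolved_once = waiter.lookups - lookups_per_call
```

With the explicit argument the batch pays for one lookup regardless of how many handles it waits on.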

data/lib/waterdrop/producer/tombstone.rb CHANGED

@@ -8,6 +8,11 @@ module WaterDrop
     # in compacted topics. This module provides a dedicated API so users don't have to manually
     # construct `produce_*(topic:, key:, payload: nil, ...)` calls.
     module Tombstone
+      # Contract to validate that tombstone message input is correct
+      CONTRACT = Contracts::Tombstone.new
+      private_constant :CONTRACT
       # Produces a tombstone message to Kafka and waits for it to be delivered
       #
       # @param message [Hash] hash with at least `:topic`, `:key`, and `:partition` keys.
@@ -66,10 +71,9 @@ module WaterDrop
       # @raise [Errors::MessageInvalidError] when key or partition is missing
       def prepare_tombstone(message)
         message = message.dup
-        message.delete(:payload)
         message[:payload] = nil
-        Contracts::Tombstone.new.validate!(message, Errors::MessageInvalidError)
+        CONTRACT.validate!(message, Errors::MessageInvalidError)
         message
       end

data/lib/waterdrop/producer/variant.rb CHANGED

@@ -34,7 +34,10 @@ module WaterDrop
       # When rdkafka-ruby detects empty hash, it will use the librdkafka defaults
       EMPTY_HASH = {}.freeze
-      private_constant :EMPTY_HASH
+      # Contract to validate that variant alteration data is correct
+      CONTRACT = Contracts::Variant.new
+      private_constant :EMPTY_HASH, :CONTRACT
       attr_reader :max_wait_timeout, :topic_config, :producer
@@ -56,7 +59,7 @@ module WaterDrop
         @default = default
         super(producer)
-        Contracts::Variant.new.validate!(to_h, Errors::VariantInvalidError)
+        CONTRACT.validate!(to_h, Errors::VariantInvalidError)
       end
       # @return [Boolean] is this a default variant for this producer
@@ -75,23 +78,34 @@ module WaterDrop
         Transactions
       ].each do |scope|
         scope.instance_methods(false).each do |method_name|
+          # We save and restore any variant already active for this producer in this fiber rather
+          # than unconditionally deleting it. A variant-wrapped method that yields user code (e.g.
+          # `transaction`) may wrap a nested same-producer variant call; without save/restore the
+          # inner call's `ensure` would clear the slot the outer scope still needs, so the rest of
+          # the outer scope would silently fall back to the default variant. When there was no outer
+          # entry we still `delete` (not nil-assign) to avoid leaving stale entries behind.
+          #
           # @example
           #   def produce_async(*args, &block)
           #     ref = Fiber.current.waterdrop_clients ||= {}
+          #     had = ref.key?(@producer.id)
+          #     prev = ref[@producer.id]
           #     ref[@producer.id] = self
           #
           #     @producer.produce_async(*args, &block)
           #   ensure
-          #     ref.delete(@producer.id)
+          #     had ? (ref[@producer.id] = prev) : ref.delete(@producer.id)
           #   end
           class_eval <<-RUBY, __FILE__, __LINE__ + 1
             def #{method_name}(*args, &block)
               ref = Fiber.current.waterdrop_clients ||= {}
+              had = ref.key?(@producer.id)
+              prev = ref[@producer.id]
               ref[@producer.id] = self
               @producer.#{method_name}(*args, &block)
             ensure
-              ref.delete(@producer.id)
+              had ? (ref[@producer.id] = prev) : ref.delete(@producer.id)
             end
           RUBY
         end
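
The save/restore logic in the generated methods can be tried in isolation. The sketch below uses a plain Hash and a hypothetical `with_slot` helper instead of the fiber-local storage the gem uses, so it only illustrates the nesting behavior, not the actual storage mechanism.

```ruby
# Hypothetical helper mirroring the had/prev bookkeeping from the diff.
def with_slot(registry, key, value)
  had = registry.key?(key)
  prev = registry[key]
  registry[key] = value

  yield
ensure
  # Restore the outer entry if there was one, otherwise remove ours entirely.
  had ? (registry[key] = prev) : registry.delete(key)
end

registry = {}
seen = []

with_slot(registry, :producer, :outer) do
  with_slot(registry, :producer, :inner) { seen << registry[:producer] }
  # The nested call has returned; the outer value must still be in place.
  seen << registry[:producer]
end
```

With an unconditional `delete` in the `ensure`, the second read inside the outer block would find no entry, which is the silent fallback to the default variant that the comment in the diff describes.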

data/lib/waterdrop/producer.rb CHANGED

@@ -152,8 +152,7 @@ module WaterDrop
         # We should raise an error when trying to use a producer with client from a fork. Always.
         if @client
-          # We need to reset the client, otherwise there might be attempt to close the parent
-          # client
+          # We need to reset the client, otherwise there might be an attempt to close the parent client
           @client = nil
           raise Errors::ProducerUsedInParentProcess, Process.pid
         end
@@ -264,6 +263,29 @@ module WaterDrop
       @middleware ||= config.middleware
     end
+    # Returns the variant currently in effect for dispatches on the current fiber.
+    #
+    # While executing inside a variant-wrapped call (any method invoked on the object returned by
+    # {#with} / {#variant}), this returns that variant; otherwise it returns the producer's default
+    # variant. It is primarily useful to middleware and instrumentation listeners that run
+    # synchronously within a dispatch and want to read the effective per-dispatch settings, such as
+    # `#topic_config`, `#max_wait_timeout` or `#default?`.
+    #
+    # @return [WaterDrop::Producer::Variant] the variant active for the current dispatch on this
+    #   fiber, or the producer's default variant when not inside a variant-wrapped call
+    #
+    # @note The lookup is fiber-local and scoped to a single dispatch; it does not represent a
+    #   producer-wide setting. Called from arbitrary code outside a variant-wrapped call it always
+    #   returns the default variant. It is likewise not meaningful from asynchronous delivery
+    #   callbacks (which run on the poller thread, a different fiber) - there it also returns the
+    #   default variant, not the variant the acknowledged message was dispatched with.
+    def current_variant
+      # Read-only: the fiber-local hash is created by the variant wrapper methods only when needed,
+      # so we must not allocate it here just to look up a variant that may not exist.
+      clients = Fiber.current.waterdrop_clients
+      (clients && clients[id]) || @default_variant
+    end
     # Disconnects the producer from Kafka while keeping it configured for potential reconnection
     #
     # This method safely disconnects the underlying Kafka client while preserving the producer's
@@ -339,6 +361,19 @@ module WaterDrop
     # @param force [Boolean] should we force closing even with outstanding messages after the
     #   max wait timeout
     def close(force: false)
+      # If the client was built in a different process, we have been forked. The client and its
+      # native resources belong to the parent, so we must never flush or close them here: with the
+      # real rdkafka client that is rd_kafka_destroy on a fork-inherited handle (undefined behavior),
+      # and it would also tear down a client the parent still uses. We just drop our references and
+      # the inherited finalizer and return. This matters most for the GC finalizer, which is
+      # inherited across fork and would otherwise run #close in the child at exit.
+      if @client && @pid != Process.pid
+        @client = nil
+        ObjectSpace.undefine_finalizer(id)
+        return
+      end
       # When closing from within the FD poller thread (e.g., from a callback like
       # message.acknowledged or error.occurred), we must delegate to a background thread.
       # Close performs flush which waits for delivery reports, but delivery reports require
@@ -382,7 +417,18 @@ module WaterDrop
             # Flush has its own buffer mutex but even if it is blocked, flushing can still happen
             # as we close the client after the flushing (even if blocked by the mutex)
-            flush(true)
+            #
+            # This is best-effort: if a buffered message surfaces a terminal error here (for example
+            # a fatal error on an idempotent producer), we must still proceed to close the underlying
+            # client. Otherwise the native client and its resources would leak and the producer would
+            # stay stuck in the `:closing` state. The failure is already surfaced via the
+            # `error.occurred` instrumentation emitted by the dispatch itself, so swallowing the
+            # re-raised wrapper here does not hide it.
+            begin
+              flush(true)
+            rescue Errors::ProduceError
+              nil
+            end
            # We should not close the client in several threads at the same time
            # It is safe to run it several times but not at exactly the same moment
@@ -423,6 +469,20 @@ module WaterDrop
           end
         end
       end
+    rescue ThreadError => e
+      # Ruby raises ThreadError with this specific message when Mutex#synchronize (or #lock) is
+      # called from a signal trap context. There is no public Ruby API to detect trap context
+      # proactively - Thread.current is the same object as the main thread, its status is "run",
+      # and caller_locations contains no "trap" frame. The only observable difference is that
+      # blocking mutex operations raise this error. We re-raise anything else (e.g.
+      # "deadlock; recursive locking") so those are not silently swallowed.
+      #
+      # Puma's `after_stopped` DSL hook in single mode is one example that fires in trap context.
+      # We escape by delegating to a background thread and joining so the caller blocks until the
+      # producer is fully closed.
+      raise unless e.message == "can't be called from trap context"
+      Thread.new { close(force: force) }.value
     end
     # Closes the producer with forced close after timeout, purging any outgoing data
@@ -484,15 +544,21 @@ module WaterDrop
     # Ensures that we don't run any operations when the producer is not configured or when it
     # was already closed
     def ensure_active!
-      return if @status.active?
-      return if @status.closing? && @operating_mutex.owned?
+      # Capture the lifecycle state once. Another thread may be transitioning the producer between
+      # states (for example configured -> connected while reloading the client after a fatal error),
+      # and issuing several @status predicate calls here could otherwise observe an inconsistent mix
+      # of states and raise StatusInvalidError for what is in fact a valid, active producer.
+      state = @status.to_sym
-      raise Errors::ProducerNotConfiguredError, id if @status.initial?
-      raise Errors::ProducerClosedError, id if @status.closing?
-      raise Errors::ProducerClosedError, id if @status.closed?
+      return if Status::ACTIVE_STATES.include?(state)
+      return if state == :closing && @operating_mutex.owned?
+      raise Errors::ProducerNotConfiguredError, id if state == :initial
+      raise Errors::ProducerClosedError, id if state == :closing
+      raise Errors::ProducerClosedError, id if state == :closed
       # This should never happen
-      raise Errors::StatusInvalidError, [id, @status.to_s]
+      raise Errors::StatusInvalidError, [id, state.to_s]
     end
     # Ensures that the message we want to send out to Kafka is actually valid and that it can be
@@ -506,26 +572,48 @@ module WaterDrop
     # Waits on a given handler
     #
     # @param handler [Rdkafka::Producer::DeliveryHandle]
+    # @param max_wait_timeout [Integer] max wait timeout in ms. Resolved from the current variant
+    #   by default but can be passed in by batch operations that wait on many handlers, so the
+    #   variant is not re-resolved for each of them.
     # @param raise_response_error [Boolean] should we raise the response error after we receive the
     #   final result and it is an error.
-    def wait(handler, raise_response_error: true)
+    def wait(handler, max_wait_timeout: current_variant.max_wait_timeout, raise_response_error: true)
       handler.wait(
-        max_wait_timeout_ms: current_variant.max_wait_timeout,
+        max_wait_timeout_ms: max_wait_timeout,
         raise_response_error: raise_response_error
       )
     end
-    # @return [Producer::Variant] the variant config. Either custom if built using `#with` or
-    #   a default one.
-    def current_variant
-      Fiber.current.waterdrop_clients ||= {}
-      Fiber.current.waterdrop_clients[id] || @default_variant
+    # Dispatches a message, ensuring transactional producers take the transaction lock before the
+    # operation is counted.
+    #
+    # For a transactional producer we wrap the whole dispatch (including the operations-counter
+    # bookkeeping) in `transaction`, so `@transaction_mutex` is acquired BEFORE
+    # `@operations_in_progress` is incremented. This makes `#produce` acquire locks in the same order
+    # as `#close` (`@transaction_mutex` -> `@operating_mutex` -> operations counter) and removes a
+    # lock-order inversion: without it, a dispatch that had already counted itself could block forever
+    # on `@transaction_mutex` held by a concurrent `#close` that was itself waiting for the operations
+    # counter to drain. When we already own the transaction lock (inside an explicit transaction block
+    # or the closing flush) the order is already correct, so we dispatch directly.
+    #
+    # @param message [Hash] message we want to send
+    # @param label [String] short name of the public dispatch method (e.g. `"produce_sync"`) that
+    #   we surface in the `message.*` queue-full error type. Passed explicitly by each public entry
+    #   point so we never have to walk the call stack to recover it (the number of internal frames
+    #   varies because the transactional path wraps the dispatch in a `transaction`).
+    def produce(message, label)
+      if transactional? && !@transaction_mutex.owned?
+        transaction { produce_to_client(message, label) }
+      else
+        produce_to_client(message, label)
+      end
     end
     # Runs the client produce method with a given message
     #
     # @param message [Hash] message we want to send
-    def produce(message)
+    # @param label [String] public dispatch method name used in the queue-full error type
+    def produce_to_client(message, label)
       produce_time ||= monotonic_now
       # This can happen only during flushing on closing, in case like this we don't have to
@@ -537,16 +625,20 @@ module WaterDrop
         ensure_active!
       end
+      # The variant is fiber-local and cannot change mid-call, so we resolve it once instead of
+      # paying the fiber-local lookup for each usage
+      variant = current_variant
      # We duplicate the message hash only if it is needed.
      # It is needed when the user is using a custom settings variant or when symbol is provided as
       # the topic name. We should never mutate user input message as it may be a hash that the
       # user is using for some other operations
-      if message[:topic].is_a?(Symbol) || !current_variant.default?
+      if message[:topic].is_a?(Symbol) || !variant.default?
         message = message.dup
         # In case someone defines topic as a symbol, we need to convert it into a string as
         # librdkafka does not accept symbols
         message[:topic] = message[:topic].to_s
-        message[:topic_config] = current_variant.topic_config
+        message[:topic_config] = variant.topic_config
       end
       result = if transactional?
@@ -560,8 +652,14 @@ module WaterDrop
       result
     rescue SUPPORTED_FLOW_ERRORS.first => e
-      # Check if this is a fatal error on an idempotent producer and we should reload
-      if idempotent_reloadable?(e)
+      # Check if this is a fatal error on an idempotent producer and we should reload.
+      #
+      # We must never reload while closing. During `#close` the final `flush` runs while this
+      # thread already owns `@operating_mutex`; the idempotent reload re-acquires that same mutex,
+      # which Ruby rejects with `ThreadError: deadlock; recursive locking`, and it would also try to
+      # rebuild the very client we are tearing down. In that case we let the error propagate so
+      # `#close` can finish and release the underlying client.
+      if idempotent_reloadable?(e) && !@operating_mutex.owned?
         # Check if we've exceeded max reload attempts
         raise unless idempotent_retryable?
@@ -597,8 +695,6 @@ module WaterDrop
       # in an infinite loop, effectively hanging the processing
       raise unless monotonic_now - produce_time < @config.wait_timeout_on_queue_full
-      label = caller_locations(2, 1)[0].label.split.last.split("#").last
       # We use this syntax here because we want to preserve the original `#cause` when we
       # instrument the error and there is no way to manually assign `#cause` value. We want to keep
       # the original cause to maintain the same API across all the errors dispatched to the
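
One of the `#close` changes in this file makes the final flush best-effort, so that a terminal produce error cannot leave the producer stuck in `:closing` with its client still open. A stripped-down sketch of that control flow, assuming a hypothetical `SketchProducer` with its own `ProduceError` class (neither is WaterDrop's real implementation):

```ruby
# Hypothetical producer showing a best-effort flush during close.
class SketchProducer
  # Stands in for the library's produce error; defined here for the sketch only.
  class ProduceError < StandardError; end

  attr_reader :state, :client_closed

  def initialize(flush_fails:)
    @flush_fails = flush_fails
    @state = :connected
    @client_closed = false
  end

  def close
    @state = :closing

    # Swallow the terminal produce error so the teardown below always runs.
    begin
      flush
    rescue ProduceError
      nil
    end

    @client_closed = true
    @state = :closed
  end

  private

  def flush
    raise ProduceError, "fatal error during final flush" if @flush_fails
  end
end

producer = SketchProducer.new(flush_fails: true)
producer.close
```

Only the produce error is rescued. Any other exception from the flush would still propagate, which keeps unrelated failures visible.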

data/lib/waterdrop/version.rb CHANGED

@@ -3,5 +3,5 @@
 # WaterDrop library
 module WaterDrop
   # Current WaterDrop version
-  VERSION = "2.10.0"
+  VERSION = "2.10.2"
 end

data/package-lock.json CHANGED

@@ -286,9 +286,9 @@
       }
     },
     "node_modules/smol-toml": {
-      "version": "1.6.0",
-      "resolved": "https://registry.npmjs.org/smol-toml/-/smol-toml-1.6.0.tgz",
-      "integrity": "sha512-4zemZi0HvTnYwLfrpk/CF9LOd9Lt87kAt50GnqhMpyF9U3poDAP2+iukq2bZsO/ufegbYehBkqINbsWxj4l4cw==",
+      "version": "1.6.1",
+      "resolved": "https://registry.npmjs.org/smol-toml/-/smol-toml-1.6.1.tgz",
+      "integrity": "sha512-dWUG8F5sIIARXih1DTaQAX4SsiTXhInKf1buxdY9DIg4ZYPZK5nGM1VRIYmEbDbsHt7USo99xSLFu5Q1IqTmsg==",
       "dev": true,
       "license": "BSD-3-Clause",
       "engines": {
@@ -312,9 +312,9 @@
       }
     },
     "node_modules/yaml": {
-      "version": "2.8.2",
-      "resolved": "https://registry.npmjs.org/yaml/-/yaml-2.8.2.tgz",
-      "integrity": "sha512-mplynKqc1C2hTVYxd0PU2xQAc22TI1vShAYGksCCfxbn/dFwnHTNi1bvYsBTkhdUNtGIf5xNOg938rrSSYvS9A==",
+      "version": "2.9.0",
+      "resolved": "https://registry.npmjs.org/yaml/-/yaml-2.9.0.tgz",
+      "integrity": "sha512-2AvhNX3mb8zd6Zy7INTtSpl1F15HW6Wnqj0srWlkKLcpYl/gMIMJiyuGq2KeI2YFxUPjdlB+3Lc10seMLtL4cA==",
       "dev": true,
       "license": "ISC",
       "bin": {

data/renovate.json CHANGED

@@ -17,7 +17,7 @@
     {
       "minimumReleaseAge": "7 days",
       "matchDepNames": [
-        "/*/"
+        "*"
       ]
     },
     {
@@ -39,7 +39,15 @@
         "ruby/setup-ruby",
         "ruby"
       ],
-      "groupName": "ruby setup"
+      "groupName": "ruby setup",
+      "internalChecksFilter": "strict"
+    },
+    {
+      "description": "Let setup-ruby pass age gate before ruby so it is ready when the group PR is created",
+      "matchPackageNames": [
+        "ruby/setup-ruby"
+      ],
+      "minimumReleaseAge": "5 days"
     }
   ],
   "minimumReleaseAge": "7 days",
@@ -47,6 +55,9 @@
     "dependencies"
   ],
   "lockFileMaintenance": {
-    "enabled": true
+    "enabled": true,
+    "schedule": [
+      "before 4am on the first day of the month"
+    ]
   }
 }

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: waterdrop
 version: !ruby/object:Gem::Version
-  version: 2.10.0
+  version: 2.10.2
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -160,7 +160,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 4.0.6
+rubygems_version: 4.0.10
 specification_version: 4
 summary: Kafka messaging made easy!
 test_files: []