waterdrop 2.5.0 → 2.5.1
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +3 -0
- data/Gemfile.lock +2 -2
- data/README.md +34 -7
- data/lib/waterdrop/config.rb +11 -0
- data/lib/waterdrop/producer.rb +31 -0
- data/lib/waterdrop/version.rb +1 -1
- data.tar.gz.sig +0 -0
- metadata +2 -2
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2b66f5a9cb1c6fe80fe594777cb60f9fd20f120c2a897ab439404c825503bb37
+  data.tar.gz: 985491a90694c7c729e5c2dd8a581127c96ec26a5eb5eda53d03bc32ab463ee6
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ec7c01eb151ad4142f7eccfb988c56394f53b4679e480a9d8706c73323e3f6a25f8f704595fe56d5ba45f7995756add34931cf91bfcb079be6873bcc5563371
+  data.tar.gz: 94261472ac4786fd7911bce919b45267b0d1dc7298965c37ceee51718733d72a650c50745fabac7a0a8604909414b4efb6233640f3f24c499e51f42316d3597a
checksums.yaml.gz.sig
CHANGED
Binary file
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,8 @@
 # WaterDrop changelog
 
+## 2.5.1 (2023-03-09)
+- [Feature] Introduce a configurable backoff upon `librdkafka` queue full (false by default).
+
 ## 2.5.0 (2023-03-04)
 - [Feature] Pipe **all** the errors including synchronous errors via the `error.occurred`.
 - [Improvement] Pipe delivery errors that occurred not via the error callback using the `error.occurred` channel.
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    waterdrop (2.5.0)
+    waterdrop (2.5.1)
       karafka-core (>= 2.0.12, < 3.0.0)
       zeitwerk (~> 2.3)
 
@@ -30,7 +30,7 @@ GEM
       mini_portile2 (~> 2.6)
       rake (> 12)
     mini_portile2 (2.8.1)
-    minitest (5.
+    minitest (5.18.0)
     rake (13.0.6)
     rspec (3.12.0)
       rspec-core (~> 3.12.0)
data/README.md
CHANGED
@@ -29,6 +29,7 @@ It:
 * [Buffering](#buffering)
   + [Using WaterDrop to buffer messages based on the application logic](#using-waterdrop-to-buffer-messages-based-on-the-application-logic)
   + [Using WaterDrop with rdkafka buffers to achieve periodic auto-flushing](#using-waterdrop-with-rdkafka-buffers-to-achieve-periodic-auto-flushing)
+* [Idempotence](#idempotence)
 * [Compression](#compression)
 - [Instrumentation](#instrumentation)
 * [Usage statistics](#usage-statistics)
@@ -92,13 +93,15 @@ end
 
 Some of the options are:
 
-| Option
-|
-| `id`
-| `logger`
-| `deliver`
-| `max_wait_timeout`
-| `wait_timeout`
+| Option                       | Description                                                      |
+|------------------------------|------------------------------------------------------------------|
+| `id`                         | id of the producer for instrumentation and logging               |
+| `logger`                     | Logger that we want to use                                       |
+| `deliver`                    | Should we send messages to Kafka or just fake the delivery       |
+| `max_wait_timeout`           | Waits that long for the delivery report or raises an error       |
+| `wait_timeout`               | Waits that long before re-check of delivery report availability  |
+| `wait_on_queue_full`         | Should we wait on queue full or raise an error when that happens |
+| `wait_on_queue_full_timeout` | Waits that long before retry when queue is full                  |
 
 Full list of the root configuration options is available [here](https://github.com/karafka/waterdrop/blob/master/lib/waterdrop/config.rb#L25).
 
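The two new queue-full options can be combined with the existing ones in a producer setup. A minimal sketch, assuming the 2.5.1 API shown in this diff (the broker address is a placeholder):

```ruby
require 'waterdrop'

producer = WaterDrop::Producer.new do |config|
  config.deliver = true
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
  # Instead of raising when librdkafka reports a full queue, back off and retry
  config.wait_on_queue_full = true
  # Sleep this long (in seconds) between retries; 0.1 is the 2.5.1 default
  config.wait_on_queue_full_timeout = 0.5
end
```

With `wait_on_queue_full` left at its `false` default, a full queue raises an error instead of retrying.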
@@ -206,6 +209,30 @@ WaterDrop producers support buffering messages in their internal buffers and on
 
 This means that depending on your use case, you can achieve both granular buffering and flushing control when needed with context awareness and periodic and size-based flushing functionalities.
 
+### Idempotence
+
+When idempotence is enabled, the producer will ensure that messages are successfully produced exactly once and in the original production order.
+
+To enable idempotence, you need to set the `enable.idempotence` kafka scope setting to `true`:
+
+```ruby
+WaterDrop::Producer.new do |config|
+  config.deliver = true
+  config.kafka = {
+    'bootstrap.servers': 'localhost:9092',
+    'enable.idempotence': true
+  }
+end
+```
+
+The following Kafka configuration properties are adjusted automatically (if not modified by the user) when idempotence is enabled:
+
+- `max.in.flight.requests.per.connection` set to `5`
+- `retries` set to `2147483647`
+- `acks` set to `all`
+
+The idempotent producer ensures that messages are always delivered in the correct order and without duplicates. In other words, when an idempotent producer sends a message, the messaging system ensures that the message is only delivered once to the message broker and subsequently to the consumers, even if the producer tries to send the message multiple times.
+
 ### Compression
 
 WaterDrop supports following compression types:
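The idempotence section added above lists three librdkafka properties that are adjusted automatically when the user has not set them. A hedged sketch of the equivalent explicit `kafka` scope configuration (values copied from that list; the broker address is a placeholder):

```ruby
config.kafka = {
  'bootstrap.servers': 'localhost:9092',
  'enable.idempotence': true,
  # Explicit equivalents of the automatic adjustments:
  'max.in.flight.requests.per.connection': 5,
  'retries': 2_147_483_647,
  'acks': 'all'
}
```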
data/lib/waterdrop/config.rb
CHANGED
@@ -50,6 +50,17 @@ module WaterDrop
     # delivery report. In a really robust systems, this describes the min-delivery time
     # for a single sync message when produced in isolation
     setting :wait_timeout, default: 0.005 # 5 milliseconds
+    # option [Boolean] should we upon detecting full librdkafka queue backoff and retry or should
+    # we raise an exception.
+    # When this is set to `true`, upon full queue, we won't raise an error. There will be error
+    # in the `error.occurred` notification pipeline with a proper type as while this is
+    # recoverable, in a high number it still may mean issues.
+    # Waiting is one of the recommended strategies.
+    setting :wait_on_queue_full, default: false
+    # option [Integer] how long (in seconds) should we backoff before a retry when queue is full
+    # The retry will happen with the same message and backoff should give us some time to
+    # dispatch previously buffered messages.
+    setting :wait_on_queue_full_timeout, default: 0.1
     # option [Boolean] should we send messages. Setting this to false can be really useful when
     # testing and or developing because when set to false, won't actually ping Kafka but will
     # run all the validations, etc
data/lib/waterdrop/producer.rb
CHANGED
@@ -173,6 +173,37 @@ module WaterDrop
     # @param message [Hash] message we want to send
     def produce(message)
       client.produce(**message)
+    rescue SUPPORTED_FLOW_ERRORS.first => e
+      # Unless we want to wait and retry and it's a full queue, we raise normally
+      raise unless @config.wait_on_queue_full
+      raise unless e.code == :queue_full
+
+      # We use this syntax here because we want to preserve the original `#cause` when we
+      # instrument the error and there is no way to manually assign `#cause` value. We want to keep
+      # the original cause to maintain the same API across all the errors dispatched to the
+      # notifications pipeline.
+      begin
+        raise Errors::ProduceError
+      rescue Errors::ProduceError => e
+        # We want to instrument on this event even when we restart it.
+        # The reason is simple: instrumentation and visibility.
+        # We can recover from this, but despite that we should be able to instrument this.
+        # If this type of event happens too often, it may indicate that the buffer settings are not
+        # well configured.
+        @monitor.instrument(
+          'error.occurred',
+          producer_id: id,
+          message: message,
+          error: e,
+          type: 'message.produce'
+        )
+
+        # We do not poll the producer because polling happens in a background thread
+        # It also should not be a frequent case (queue full), hence it's ok to just throttle.
+        sleep @config.wait_on_queue_full_timeout
+      end
+
+      retry
     end
 
     # Waits on a given handler
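The rescued `produce` flow above can be sketched as a self-contained simulation. `FakeClient`, `QueueFullError`, and `produce_with_backoff` are hypothetical stand-ins for the real client, the rdkafka error, and WaterDrop's method; the `error.occurred` instrumentation is omitted:

```ruby
# NOT the WaterDrop source: a minimal sketch of the queue-full backoff-and-retry
# flow added in 2.5.1, using stand-in classes so it runs without the gem.
class QueueFullError < StandardError
  def code
    :queue_full
  end
end

# Raises :queue_full a fixed number of times, then accepts the message.
class FakeClient
  attr_reader :delivered

  def initialize(failures)
    @failures = failures
    @delivered = []
  end

  def produce(payload:)
    if @failures.positive?
      @failures -= 1
      raise QueueFullError
    end
    @delivered << payload
  end
end

# Mirrors the rescued logic: retry with a sleep on :queue_full when the
# wait_on_queue_full option is enabled, otherwise re-raise. Returns the
# number of attempts made so the behavior is observable.
def produce_with_backoff(client, message, wait_on_queue_full:, wait_on_queue_full_timeout: 0.001)
  attempts = 0
  begin
    attempts += 1
    client.produce(**message)
  rescue QueueFullError => e
    raise unless wait_on_queue_full
    raise unless e.code == :queue_full

    # In WaterDrop this is also where `error.occurred` is instrumented
    sleep wait_on_queue_full_timeout
    retry
  end
  attempts
end

client = FakeClient.new(2)
attempts = produce_with_backoff(client, { payload: 'hello' }, wait_on_queue_full: true)
puts attempts                 # 3: two queue-full hits, then success
puts client.delivered.inspect # ["hello"]
```

With `wait_on_queue_full: false` the first `QueueFullError` propagates to the caller, matching the pre-2.5.1 behavior.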
data/lib/waterdrop/version.rb
CHANGED
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: waterdrop
 version: !ruby/object:Gem::Version
-  version: 2.5.0
+  version: 2.5.1
 platform: ruby
 authors:
 - Maciej Mensfeld
 
@@ -35,7 +35,7 @@ cert_chain:
   Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
   MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
   -----END CERTIFICATE-----
-date: 2023-03-
+date: 2023-03-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: karafka-core
metadata.gz.sig
CHANGED
Binary file