waterdrop 2.4.5 → 2.4.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: e1cabe8225d1038bc9fab5c41efaf5822b3d6a633dac233f92058874be8cb5ae
-   data.tar.gz: ac830393a2c6ed2d50042a08fbd59a6d2ebbd0de505a53953c67b3d50ac1dd22
+   metadata.gz: 8aa78b8b5f2d8534689cb9fe3db46d579610ce3b4767cef46b16d8fb1e19d48e
+   data.tar.gz: 75ff38cc56317bc74047fe9ba53ee1f42b098299f694c1ad5767f80ed4c2cf7e
  SHA512:
-   metadata.gz: d757751cb8a62fee91c5aff7ac30260e3b45d62d17db71eedcd23b77881b4564b8dc0bef534eb060049a65b5d99610c09c86eeb13913a999d864dffc3adee5e3
-   data.tar.gz: 51cb53cc3a4f168b3157ad840157f2df8a9479381972fe0d8cc13b445449099373acd76d37cf0206ec35d625fb75723b3ab4a3e0c193413de9844c54106209c5
+   metadata.gz: 3e5589c065d8db716a277bb78e985b85fbe94c2d064eff1cae780f9b4fa6f64ddd95ef0591c99de2dcefedd5582c80ad83442f1cd53091b27349fb193bcd98f6
+   data.tar.gz: 9af7835c4419dd10af2cde93b42c7797e1811f3f049fe7ae4a50a8ccad50fc6f86a8703777c60410d10c1615386eaee2acbeacf073dd640ba24af24ed534326a
checksums.yaml.gz.sig CHANGED
Binary file
data/CHANGELOG.md CHANGED
@@ -1,5 +1,29 @@
  # WaterDrop changelog
 
+ ## 2.4.7 (2022-12-18)
+ - Add support for customizable middlewares that can modify the message hash prior to validation and dispatch.
+ - Fix a case where, when the leader was not available, the metadata request would not be retried.
+ - Require `karafka-core` 2.0.7.
+
+ ## 2.4.6 (2022-12-10)
+ - Set `statistics.interval.ms` to 5 seconds by default, so the defaults cover all the instrumentation out of the box.
+
+ ### Upgrade notes
+
+ If you want to disable `librdkafka` statistics because you do not use them at all, update the `kafka` `statistics.interval.ms` setting and set it to `0`:
+
+ ```ruby
+ producer = WaterDrop::Producer.new
+
+ producer.setup do |config|
+   config.deliver = true
+   config.kafka = {
+     'bootstrap.servers': 'localhost:9092',
+     'statistics.interval.ms': 0
+   }
+ end
+ ```
+
  ## 2.4.5 (2022-12-10)
  - Fix invalid error scope visibility.
  - Cache partition count to improve messages production and lower stress on Kafka when `partition_key` is on.
data/Gemfile.lock CHANGED
@@ -1,8 +1,8 @@
  PATH
    remote: .
    specs:
-     waterdrop (2.4.5)
-       karafka-core (>= 2.0.6, < 3.0.0)
+     waterdrop (2.4.7)
+       karafka-core (>= 2.0.7, < 3.0.0)
        zeitwerk (~> 2.3)
 
  GEM
@@ -22,7 +22,7 @@ GEM
      ffi (1.15.5)
      i18n (1.12.0)
        concurrent-ruby (~> 1.0)
-     karafka-core (2.0.6)
+     karafka-core (2.0.7)
        concurrent-ruby (>= 1.1)
        rdkafka (>= 0.12)
      mini_portile2 (2.8.0)
data/README.md CHANGED
@@ -35,6 +35,7 @@ It:
    * [Error notifications](#error-notifications)
    * [Datadog and StatsD integration](#datadog-and-statsd-integration)
    * [Forking and potential memory problems](#forking-and-potential-memory-problems)
+ - [Middleware](#middleware)
  - [Note on contributions](#note-on-contributions)
 
  ## Installation
@@ -301,9 +302,9 @@ See the `WaterDrop::Instrumentation::Monitor::EVENTS` for the list of all the su
 
  ### Usage statistics
 
- WaterDrop may be configured to emit internal metrics at a fixed interval by setting the `kafka` `statistics.interval.ms` configuration property to a value > `0`. Once that is done, emitted statistics are available after subscribing to the `statistics.emitted` publisher event.
+ WaterDrop is configured to emit internal `librdkafka` metrics every five seconds. You can change this by setting the `kafka` `statistics.interval.ms` configuration property to a value greater than or equal to `0`. Emitted statistics are available after subscribing to the `statistics.emitted` publisher event. If set to `0`, metrics will not be published.
 
- The statistics include all of the metrics from `librdkafka` (full list [here](https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md)) as well as the diff of those against the previously emitted values.
+ The statistics include all of the metrics from `librdkafka` (complete list [here](https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md)) as well as the diff of those against the previously emitted values.
 
  For several attributes like `txmsgs`, `librdkafka` publishes only the totals. In order to make it easier to track the progress (for example number of messages sent between statistics emitted events), WaterDrop diffs all the numeric values against previously available numbers. All of those metrics are available under the same key as the metric but with additional `_d` postfix:
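
As a rough illustration of the `_d` idea, the diff of two consecutive statistics snapshots could be computed as sketched below. This is a simplified sketch over flat hashes, not WaterDrop's actual implementation; the `diff_statistics` helper and the sample numbers are hypothetical.

```ruby
# Hypothetical helper: adds a `<key>_d` entry for every numeric value,
# holding the change since the previous snapshot. Non-numeric values are kept as-is.
def diff_statistics(previous, current)
  current.each_with_object({}) do |(key, value), result|
    result[key] = value
    next unless value.is_a?(Numeric)

    before = previous[key]
    # When no previous numeric value exists, the diff is reported as 0
    result["#{key}_d"] = before.is_a?(Numeric) ? value - before : 0
  end
end

previous = { 'name' => 'producer-1', 'txmsgs' => 100 }
current = { 'name' => 'producer-1', 'txmsgs' => 130 }

stats = diff_statistics(previous, current)
puts stats['txmsgs']   # total reported in the latest snapshot
puts stats['txmsgs_d'] # change since the previous snapshot
```

With totals of 100 and then 130, the `txmsgs_d` entry holds the 30 messages sent between the two emissions, while string values such as `name` get no `_d` counterpart.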
 
@@ -420,6 +421,38 @@ If you work with forked processes, make sure you **don't** use the producer befo
 
  To tackle this [obstacle](https://github.com/appsignal/rdkafka-ruby/issues/15) related to rdkafka, WaterDrop adds finalizer to each of the producers to close the rdkafka client before the Ruby process is shutdown. Due to the [nature of the finalizers](https://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/), this implementation prevents producers from being GCed (except upon VM shutdown) and can cause memory leaks if you don't use persistent/long-lived producers in a long-running process or if you don't use the `#close` method of a producer when it is no longer needed. Creating a producer instance for each message is anyhow a rather bad idea, so we recommend not to.
 
+ ## Middleware
+
+ WaterDrop supports injecting middleware similar to Rack.
+
+ Middleware can be used to provide extra functionalities like auto-serialization of data or any other modifications of messages before their validation and dispatch.
+
+ Each middleware accepts the message hash as input and is expected to return a message hash as a result.
+
+ There are two methods to register middlewares:
+
+ - `#prepend` - registers middleware as the first in the order of execution
+ - `#append` - registers middleware as the last in the order of execution
+
+ Below you can find an example middleware that converts the incoming payload into a JSON string by running `#to_json` automatically:
+
+ ```ruby
+ class AutoMapper
+   def call(message)
+     message[:payload] = message[:payload].to_json
+     message
+   end
+ end
+
+ # Register middleware
+ producer.middleware.append(AutoMapper.new)
+
+ # Dispatch without manual casting
+ producer.produce_async(topic: 'users', payload: user)
+ ```
+
+ **Note**: It is up to the end user to decide whether to modify the provided message or deep copy it and update the newly created one.
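
To illustrate the trade-off from the note above, here is a minimal sketch of a middleware that leaves the caller's hash untouched by returning a modified copy. The `NonMutatingAutoMapper` name is hypothetical; only the `#call(message)` contract comes from the README text. `Hash#merge` makes a shallow copy, which is enough here because only the top-level `:payload` key is replaced.

```ruby
require 'json'

# Hypothetical middleware: serializes the payload but returns a new hash
# instead of modifying the one it was given
class NonMutatingAutoMapper
  def call(message)
    message.merge(payload: message[:payload].to_json)
  end
end

original = { topic: 'users', payload: { id: 1 } }
transformed = NonMutatingAutoMapper.new.call(original)

puts transformed[:payload]      # JSON string
puts original[:payload].inspect # still a Ruby hash
```

Copying costs an extra allocation per message, so modifying in place may be preferable when the message hash is not used anywhere else.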
+
  ## Note on contributions
 
  First, thank you for considering contributing to the Karafka ecosystem! It's people like you that make the open source community such a great community!
@@ -9,7 +9,10 @@ module WaterDrop
 
      # Defaults for kafka settings, that will be overwritten only if not present already
      KAFKA_DEFAULTS = {
-       'client.id': 'waterdrop'
+       'client.id': 'waterdrop',
+       # emit librdkafka statistics every five seconds. This is used in instrumentation.
+       # When disabled, part of metrics will not be published and available.
+       'statistics.interval.ms': 5_000
      }.freeze
 
      private_constant :KAFKA_DEFAULTS
@@ -54,6 +57,12 @@ module WaterDrop
      # rdkafka options
      # @see https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
      setting :kafka, default: {}
+     # Middleware chain that can be expanded with useful middleware steps
+     setting(
+       :middleware,
+       default: false,
+       constructor: ->(middleware) { middleware || WaterDrop::Middleware.new }
+     )
 
      # Configuration method
      # @yield Runs a block of code providing a config singleton instance to it
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
 
  module WaterDrop
+   # WaterDrop instrumentation related module
    module Instrumentation
      # Default listener that hooks up to our instrumentation and uses its events for logging
      # It can be removed/replaced or anything without any harm to the Waterdrop flow
@@ -0,0 +1,50 @@
+ # frozen_string_literal: true
+
+ module WaterDrop
+   # Simple middleware layer for manipulating messages prior to their validation
+   class Middleware
+     def initialize
+       @mutex = Mutex.new
+       @steps = []
+     end
+
+     # Runs middleware on a single message prior to validation
+     #
+     # @param message [Hash] message hash
+     # @return [Hash] message hash. Either the same if transformed in place, or a copy if modified
+     #   into a new object.
+     # @note You need to decide yourself whether you don't use the message hash data anywhere else
+     #   and you want to save on memory by modifying it in place or do you want to do a deep copy
+     def run(message)
+       @steps.each do |step|
+         message = step.call(message)
+       end
+
+       message
+     end
+
+     # @param messages [Array<Hash>] messages on which we want to run middlewares
+     # @return [Array<Hash>] transformed messages
+     def run_many(messages)
+       messages.map do |message|
+         run(message)
+       end
+     end
+
+     # Register given middleware as the first one in the chain
+     # @param step [#call] step that needs to return the message
+     def prepend(step)
+       @mutex.synchronize do
+         @steps.prepend step
+       end
+     end
+
+     # Register given middleware as the last one in the chain
+     # @param step [#call] step that needs to return the message
+     def append(step)
+       @mutex.synchronize do
+         @steps.append step
+       end
+     end
+   end
+ end
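
The ordering that `#prepend` and `#append` produce can be seen in a small standalone sketch. The `Chain` class below is a stand-in that mirrors the `run`/`prepend`/`append` logic of the new file so the example runs without the gem installed; the `:trace` key and the lambdas are hypothetical and used only to record the execution order.

```ruby
# Stand-in mirroring the run/prepend/append logic of the class above
class Chain
  def initialize
    @mutex = Mutex.new
    @steps = []
  end

  # Feeds the message through every step, each step receiving the previous result
  def run(message)
    @steps.reduce(message) { |current, step| step.call(current) }
  end

  def prepend(step)
    @mutex.synchronize { @steps.prepend(step) }
  end

  def append(step)
    @mutex.synchronize { @steps.append(step) }
  end
end

chain = Chain.new
# Any object responding to #call works as a step, including lambdas
chain.append(->(message) { message.merge(trace: message.fetch(:trace, []) + [:second]) })
chain.append(->(message) { message.merge(trace: message[:trace] + [:third]) })
chain.prepend(->(message) { message.merge(trace: [:first]) })

result = chain.run(topic: 'users', payload: 'data')
puts result[:trace].inspect
```

The prepended step runs first even though it was registered last, and appended steps keep their registration order.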
@@ -7,6 +7,14 @@ module WaterDrop
      module Rdkafka
        # Rdkafka::Metadata patches
        module Metadata
+         # Errors upon which we retry the metadata fetch
+         RETRIED_ERRORS = %i[
+           timed_out
+           leader_not_available
+         ].freeze
+
+         private_constant :RETRIED_ERRORS
+
          # We overwrite this method because there were reports of metadata operation timing out
          # when Kafka was under stress. While the messages dispatch will be retried, metadata
          # fetch happens prior to that, effectively crashing the process. Metadata fetch was not
@@ -19,7 +27,7 @@ module WaterDrop
 
            super(*args)
          rescue ::Rdkafka::RdkafkaError => e
-           raise unless e.code == :timed_out
+           raise unless RETRIED_ERRORS.include?(e.code)
            raise if attempt > 10
 
            backoff_factor = 2**attempt
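
The hunk shows only the beginning of the rescue branch, so the following is a self-contained sketch of the general pattern (retry on a whitelist of error codes with exponential backoff), not the patched method itself. `FakeError`, `with_metadata_retry`, and the `base_delay` value are assumptions made for illustration; the real code rescues `::Rdkafka::RdkafkaError`.

```ruby
# Hypothetical error carrying a symbolic code, standing in for ::Rdkafka::RdkafkaError
class FakeError < StandardError
  attr_reader :code

  def initialize(code)
    @code = code
    super(code.to_s)
  end
end

RETRIED_ERRORS = %i[timed_out leader_not_available].freeze

# Retries the block on selected error codes, sleeping longer after each failure.
# The base delay is an assumption made for this sketch.
def with_metadata_retry(max_attempts: 10, base_delay: 0.001)
  attempt = 0

  begin
    attempt += 1
    yield(attempt)
  rescue FakeError => e
    # Errors outside of the whitelist are re-raised immediately
    raise unless RETRIED_ERRORS.include?(e.code)
    raise if attempt > max_attempts

    sleep(base_delay * (2**attempt))
    retry
  end
end

result = with_metadata_retry do |attempt|
  raise FakeError, :leader_not_available if attempt < 3

  "fetched on attempt #{attempt}"
end

puts result
```

Keeping the retried codes in a frozen constant makes it easy to extend the list later without touching the control flow, which is what the change from `e.code == :timed_out` to `RETRIED_ERRORS.include?(e.code)` achieves.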
@@ -15,6 +15,8 @@ module WaterDrop
        # message could not be sent to Kafka
        def produce_async(message)
          ensure_active!
+
+         message = middleware.run(message)
          validate_message!(message)
 
          @monitor.instrument(
@@ -36,6 +38,8 @@ module WaterDrop
        # and the message could not be sent to Kafka
        def produce_many_async(messages)
          ensure_active!
+
+         messages = middleware.run_many(messages)
          messages.each { |message| validate_message!(message) }
 
          @monitor.instrument(
@@ -19,6 +19,7 @@ module WaterDrop
        # message could not be sent to Kafka
        def buffer(message)
          ensure_active!
+         message = middleware.run(message)
          validate_message!(message)
 
          @monitor.instrument(
@@ -37,6 +38,8 @@ module WaterDrop
        # and the message could not be sent to Kafka
        def buffer_many(messages)
          ensure_active!
+
+         messages = middleware.run_many(messages)
          messages.each { |message| validate_message!(message) }
 
          @monitor.instrument(
@@ -17,6 +17,8 @@ module WaterDrop
        # message could not be sent to Kafka
        def produce_sync(message)
          ensure_active!
+
+         message = middleware.run(message)
          validate_message!(message)
 
          @monitor.instrument(
@@ -47,6 +49,8 @@ module WaterDrop
        # and the message could not be sent to Kafka
        def produce_many_sync(messages)
          ensure_active!
+
+         messages = middleware.run_many(messages)
          messages.each { |message| validate_message!(message) }
 
          @monitor.instrument('messages.produced_sync', producer_id: id, messages: messages) do
@@ -3,10 +3,13 @@
  module WaterDrop
    # Main WaterDrop messages producer
    class Producer
+     extend Forwardable
      include Sync
      include Async
      include Buffer
 
+     def_delegators :config, :middleware
+
      # @return [String] uuid of the current producer
      attr_reader :id
      # @return [Status] producer status object
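
The `def_delegators :config, :middleware` line is what makes `producer.middleware` available in the produce and buffer methods. A minimal sketch of that delegation with Ruby's standard `Forwardable` module follows; `SketchConfig` and `SketchProducer` are hypothetical stand-ins, not the gem's classes.

```ruby
require 'forwardable'

# Hypothetical stand-in for the config object that holds the middleware chain
SketchConfig = Struct.new(:middleware)

# Hypothetical stand-in for the producer
class SketchProducer
  extend Forwardable

  attr_reader :config

  # SketchProducer#middleware forwards to config.middleware
  def_delegators :config, :middleware

  def initialize(config)
    @config = config
  end
end

config = SketchConfig.new([])
producer = SketchProducer.new(config)

# Both calls return the very same object
puts producer.middleware.equal?(config.middleware)
```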
@@ -3,5 +3,5 @@
  # WaterDrop library
  module WaterDrop
    # Current WaterDrop version
-   VERSION = '2.4.5'
+   VERSION = '2.4.7'
  end
data/waterdrop.gemspec CHANGED
@@ -16,7 +16,7 @@ Gem::Specification.new do |spec|
    spec.description = spec.summary
    spec.license = 'MIT'
 
-   spec.add_dependency 'karafka-core', '>= 2.0.6', '< 3.0.0'
+   spec.add_dependency 'karafka-core', '>= 2.0.7', '< 3.0.0'
    spec.add_dependency 'zeitwerk', '~> 2.3'
 
    spec.required_ruby_version = '>= 2.7'
data.tar.gz.sig CHANGED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: waterdrop
  version: !ruby/object:Gem::Version
-   version: 2.4.5
+   version: 2.4.7
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
    Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
    MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
    -----END CERTIFICATE-----
- date: 2022-12-10 00:00:00.000000000 Z
+ date: 2022-12-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: karafka-core
@@ -43,7 +43,7 @@ dependencies:
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 2.0.6
+         version: 2.0.7
      - - "<"
        - !ruby/object:Gem::Version
          version: 3.0.0
@@ -53,7 +53,7 @@ dependencies:
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: 2.0.6
+         version: 2.0.7
      - - "<"
        - !ruby/object:Gem::Version
          version: 3.0.0
@@ -108,6 +108,7 @@ files:
  - lib/waterdrop/instrumentation/notifications.rb
  - lib/waterdrop/instrumentation/vendors/datadog/dashboard.json
  - lib/waterdrop/instrumentation/vendors/datadog/listener.rb
+ - lib/waterdrop/middleware.rb
  - lib/waterdrop/patches/rdkafka/metadata.rb
  - lib/waterdrop/patches/rdkafka/producer.rb
  - lib/waterdrop/producer.rb
metadata.gz.sig CHANGED
Binary file