waterdrop 2.4.5 → 2.4.7
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +24 -0
- data/Gemfile.lock +3 -3
- data/README.md +35 -2
- data/lib/waterdrop/config.rb +10 -1
- data/lib/waterdrop/instrumentation/logger_listener.rb +1 -0
- data/lib/waterdrop/middleware.rb +50 -0
- data/lib/waterdrop/patches/rdkafka/metadata.rb +9 -1
- data/lib/waterdrop/producer/async.rb +4 -0
- data/lib/waterdrop/producer/buffer.rb +3 -0
- data/lib/waterdrop/producer/sync.rb +4 -0
- data/lib/waterdrop/producer.rb +3 -0
- data/lib/waterdrop/version.rb +1 -1
- data/waterdrop.gemspec +1 -1
- data.tar.gz.sig +3 -5
- metadata +5 -4
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 8aa78b8b5f2d8534689cb9fe3db46d579610ce3b4767cef46b16d8fb1e19d48e
|
4
|
+
data.tar.gz: 75ff38cc56317bc74047fe9ba53ee1f42b098299f694c1ad5767f80ed4c2cf7e
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 3e5589c065d8db716a277bb78e985b85fbe94c2d064eff1cae780f9b4fa6f64ddd95ef0591c99de2dcefedd5582c80ad83442f1cd53091b27349fb193bcd98f6
|
7
|
+
data.tar.gz: 9af7835c4419dd10af2cde93b42c7797e1811f3f049fe7ae4a50a8ccad50fc6f86a8703777c60410d10c1615386eaee2acbeacf073dd640ba24af24ed534326a
|
checksums.yaml.gz.sig
CHANGED
Binary file
|
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,29 @@
|
|
1
1
|
# WaterDrop changelog
|
2
2
|
|
3
|
+
## 2.4.7 (2022-12-18)
|
4
|
+
- Add support for customizable middlewares that can modify the message hash prior to validation and dispatch.
|
5
|
+
- Fix a case where upon not-available leader, metadata request would not be retried
|
6
|
+
- Require `karafka-core` 2.0.7.
|
7
|
+
|
8
|
+
## 2.4.6 (2022-12-10)
|
9
|
+
- Set `statistics.interval.ms` to 5 seconds by default, so the defaults cover all the instrumentation out of the box.
|
10
|
+
|
11
|
+
### Upgrade notes
|
12
|
+
|
13
|
+
If you want to disable `librdkafka` statistics because you do not use them at all, update the `kafka` `statistics.interval.ms` setting and set it to `0`:
|
14
|
+
|
15
|
+
```ruby
|
16
|
+
producer = WaterDrop::Producer.new
|
17
|
+
|
18
|
+
producer.setup do |config|
|
19
|
+
config.deliver = true
|
20
|
+
config.kafka = {
|
21
|
+
'bootstrap.servers': 'localhost:9092',
|
22
|
+
'statistics.interval.ms': 0
|
23
|
+
}
|
24
|
+
end
|
25
|
+
```
|
26
|
+
|
3
27
|
## 2.4.5 (2022-12-10)
|
4
28
|
- Fix invalid error scope visibility.
|
5
29
|
- Cache partition count to improve messages production and lower stress on Kafka when `partition_key` is on.
|
data/Gemfile.lock
CHANGED
@@ -1,8 +1,8 @@
|
|
1
1
|
PATH
|
2
2
|
remote: .
|
3
3
|
specs:
|
4
|
-
waterdrop (2.4.
|
5
|
-
karafka-core (>= 2.0.
|
4
|
+
waterdrop (2.4.7)
|
5
|
+
karafka-core (>= 2.0.7, < 3.0.0)
|
6
6
|
zeitwerk (~> 2.3)
|
7
7
|
|
8
8
|
GEM
|
@@ -22,7 +22,7 @@ GEM
|
|
22
22
|
ffi (1.15.5)
|
23
23
|
i18n (1.12.0)
|
24
24
|
concurrent-ruby (~> 1.0)
|
25
|
-
karafka-core (2.0.
|
25
|
+
karafka-core (2.0.7)
|
26
26
|
concurrent-ruby (>= 1.1)
|
27
27
|
rdkafka (>= 0.12)
|
28
28
|
mini_portile2 (2.8.0)
|
data/README.md
CHANGED
@@ -35,6 +35,7 @@ It:
|
|
35
35
|
* [Error notifications](#error-notifications)
|
36
36
|
* [Datadog and StatsD integration](#datadog-and-statsd-integration)
|
37
37
|
* [Forking and potential memory problems](#forking-and-potential-memory-problems)
|
38
|
+
- [Middleware](#middleware)
|
38
39
|
- [Note on contributions](#note-on-contributions)
|
39
40
|
|
40
41
|
## Installation
|
@@ -301,9 +302,9 @@ See the `WaterDrop::Instrumentation::Monitor::EVENTS` for the list of all the su
|
|
301
302
|
|
302
303
|
### Usage statistics
|
303
304
|
|
304
|
-
WaterDrop
|
305
|
+
WaterDrop is configured to emit internal `librdkafka` metrics every five seconds. You can change this by setting the `kafka` `statistics.interval.ms` configuration property to a value greater than or equal to `0`. Emitted statistics are available after subscribing to the `statistics.emitted` publisher event. If set to `0`, metrics will not be published.
|
305
306
|
|
306
|
-
The statistics include all of the metrics from `librdkafka` (
|
307
|
+
The statistics include all of the metrics from `librdkafka` (complete list [here](https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md)) as well as the diff of those against the previously emitted values.
|
307
308
|
|
308
309
|
For several attributes like `txmsgs`, `librdkafka` publishes only the totals. In order to make it easier to track the progress (for example number of messages sent between statistics emitted events), WaterDrop diffs all the numeric values against previously available numbers. All of those metrics are available under the same key as the metric but with additional `_d` postfix:
|
309
310
|
|
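As a rough illustration of the `_d` diffing described in the README text above, the sketch below compares two statistics snapshots and adds a `_d` key for every numeric value. The `diff_stats` helper and the sample hashes are hypothetical and only mirror the described behaviour; WaterDrop's own implementation is not shown in this diff and may handle more cases.

```ruby
# Hypothetical helper: adds "<key>_d" entries holding the change of each
# numeric value against the previous snapshot. Nested hashes are walked
# recursively.
def diff_stats(previous, current)
  current.each_with_object({}) do |(key, value), result|
    result[key] = value

    case value
    when Hash
      result[key] = diff_stats(previous.fetch(key, {}), value)
    when Numeric
      # Missing previous values are treated as 0 (an assumption of this sketch)
      result["#{key}_d"] = value - previous.fetch(key, 0)
    end
  end
end

previous = { 'txmsgs' => 100, 'name' => 'producer-1', 'brokers' => { 'b1' => { 'tx' => 10 } } }
current  = { 'txmsgs' => 130, 'name' => 'producer-1', 'brokers' => { 'b1' => { 'tx' => 14 } } }

stats = diff_stats(previous, current)
puts stats['txmsgs_d']
puts stats['brokers']['b1']['tx_d']
```

Non-numeric values such as `name` are copied through without a `_d` companion.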
@@ -420,6 +421,38 @@ If you work with forked processes, make sure you **don't** use the producer befo
|
|
420
421
|
|
421
422
|
To tackle this [obstacle](https://github.com/appsignal/rdkafka-ruby/issues/15) related to rdkafka, WaterDrop adds a finalizer to each of the producers to close the rdkafka client before the Ruby process is shut down. Due to the [nature of the finalizers](https://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/), this implementation prevents producers from being GCed (except upon VM shutdown) and can cause memory leaks if you don't use persistent/long-lived producers in a long-running process or if you don't use the `#close` method of a producer when it is no longer needed. Creating a producer instance for each message is a rather bad idea anyway, so we recommend against it.
|
422
423
|
|
424
|
+
## Middleware
|
425
|
+
|
426
|
+
WaterDrop supports injecting middleware similar to Rack.
|
427
|
+
|
428
|
+
Middleware can be used to provide extra functionalities like auto-serialization of data or any other modifications of messages before their validation and dispatch.
|
429
|
+
|
430
|
+
Each middleware accepts the message hash as input and expects a message hash as a result.
|
431
|
+
|
432
|
+
There are two methods to register middlewares:
|
433
|
+
|
434
|
+
- `#prepend` - registers middleware as the first in the order of execution
|
435
|
+
- `#append` - registers middleware as the last in the order of execution
|
436
|
+
|
437
|
+
Below you can find an example middleware that converts the incoming payload into a JSON string by running `#to_json` automatically:
|
438
|
+
|
439
|
+
```ruby
|
440
|
+
class AutoMapper
|
441
|
+
def call(message)
|
442
|
+
message[:payload] = message[:payload].to_json
|
443
|
+
message
|
444
|
+
end
|
445
|
+
end
|
446
|
+
|
447
|
+
# Register middleware
|
448
|
+
producer.middleware.append(AutoMapper.new)
|
449
|
+
|
450
|
+
# Dispatch without manual casting
|
451
|
+
producer.produce_async(topic: 'users', payload: user)
|
452
|
+
```
|
453
|
+
|
454
|
+
**Note**: It is up to the end user to decide whether to modify the provided message or deep copy it and update the newly created one.
|
455
|
+
|
423
456
|
## Note on contributions
|
424
457
|
|
425
458
|
First, thank you for considering contributing to the Karafka ecosystem! It's people like you that make the open source community such a great community!
|
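To illustrate the README note about modifying the message in place versus copying it, here is a minimal sketch of a middleware that leaves the caller's hash untouched. `NonMutatingMapper` is a hypothetical name; the only contract taken from the README text above is that a middleware responds to `#call`, receives the message hash, and returns a message hash. `Hash#merge` makes a shallow copy only, so nested objects are still shared.

```ruby
require 'json'

# Hypothetical middleware: returns a new hash instead of modifying the input
class NonMutatingMapper
  def call(message)
    # merge builds a new (shallow) hash; the original message stays as it was
    message.merge(payload: message[:payload].to_json)
  end
end

original = { topic: 'users', payload: { id: 1 } }
transformed = NonMutatingMapper.new.call(original)

puts transformed[:payload]
puts original[:payload].class
```

With WaterDrop it would presumably be registered the same way as the README example, via `producer.middleware.append(NonMutatingMapper.new)`.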
data/lib/waterdrop/config.rb
CHANGED
@@ -9,7 +9,10 @@ module WaterDrop
|
|
9
9
|
|
10
10
|
# Defaults for kafka settings, that will be overwritten only if not present already
|
11
11
|
KAFKA_DEFAULTS = {
|
12
|
-
'client.id': 'waterdrop'
|
12
|
+
'client.id': 'waterdrop',
|
13
|
+
# emit librdkafka statistics every five seconds. This is used in instrumentation.
|
14
|
+
# When disabled, part of metrics will not be published and available.
|
15
|
+
'statistics.interval.ms': 5_000
|
13
16
|
}.freeze
|
14
17
|
|
15
18
|
private_constant :KAFKA_DEFAULTS
|
@@ -54,6 +57,12 @@ module WaterDrop
|
|
54
57
|
# rdkafka options
|
55
58
|
# @see https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
|
56
59
|
setting :kafka, default: {}
|
60
|
+
# Middleware chain that can be expanded with useful middleware steps
|
61
|
+
setting(
|
62
|
+
:middleware,
|
63
|
+
default: false,
|
64
|
+
constructor: ->(middleware) { middleware || WaterDrop::Middleware.new }
|
65
|
+
)
|
57
66
|
|
58
67
|
# Configuration method
|
59
68
|
# @yield Runs a block of code providing a config singleton instance to it
|
data/lib/waterdrop/instrumentation/logger_listener.rb
CHANGED
@@ -1,6 +1,7 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
module WaterDrop
|
4
|
+
# WaterDrop instrumentation related module
|
4
5
|
module Instrumentation
|
5
6
|
# Default listener that hooks up to our instrumentation and uses its events for logging
|
6
7
|
# It can be removed/replaced or anything without any harm to the Waterdrop flow
|
data/lib/waterdrop/middleware.rb
ADDED
@@ -0,0 +1,50 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module WaterDrop
|
4
|
+
# Simple middleware layer for manipulating messages prior to their validation
|
5
|
+
class Middleware
|
6
|
+
def initialize
|
7
|
+
@mutex = Mutex.new
|
8
|
+
@steps = []
|
9
|
+
end
|
10
|
+
|
11
|
+
# Runs middleware on a single message prior to validation
|
12
|
+
#
|
13
|
+
# @param message [Hash] message hash
|
14
|
+
# @return [Hash] message hash. Either the same if transformed in place, or a copy if modified
|
15
|
+
# into a new object.
|
16
|
+
# @note You need to decide yourself whether you don't use the message hash data anywhere else
|
17
|
+
# and you want to save on memory by modifying it in place or do you want to do a deep copy
|
18
|
+
def run(message)
|
19
|
+
@steps.each do |step|
|
20
|
+
message = step.call(message)
|
21
|
+
end
|
22
|
+
|
23
|
+
message
|
24
|
+
end
|
25
|
+
|
26
|
+
# @param messages [Array<Hash>] messages on which we want to run middlewares
|
27
|
+
# @return [Array<Hash>] transformed messages
|
28
|
+
def run_many(messages)
|
29
|
+
messages.map do |message|
|
30
|
+
run(message)
|
31
|
+
end
|
32
|
+
end
|
33
|
+
|
34
|
+
# Register given middleware as the first one in the chain
|
35
|
+
# @param step [#call] step that needs to return the message
|
36
|
+
def prepend(step)
|
37
|
+
@mutex.synchronize do
|
38
|
+
@steps.prepend step
|
39
|
+
end
|
40
|
+
end
|
41
|
+
|
42
|
+
# Register given middleware as the last one in the chain
|
43
|
+
# @param step [#call] step that needs to return the message
|
44
|
+
def append(step)
|
45
|
+
@mutex.synchronize do
|
46
|
+
@steps.append step
|
47
|
+
end
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
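The order in which `#prepend` and `#append` place steps can be checked with a stand-in chain. `MiniChain` below is a hypothetical, stripped-down copy of the class in this diff (same `@steps` array and mutex), written so the sketch runs without WaterDrop installed; the lambdas are illustrative only.

```ruby
# Hypothetical stand-in mirroring the middleware class from the diff above
class MiniChain
  def initialize
    @mutex = Mutex.new
    @steps = []
  end

  # Passes the message through every step, in registration order
  def run(message)
    @steps.reduce(message) { |current, step| step.call(current) }
  end

  def prepend(step)
    @mutex.synchronize { @steps.prepend(step) }
  end

  def append(step)
    @mutex.synchronize { @steps.append(step) }
  end
end

chain = MiniChain.new
# Any object responding to #call works, so lambdas are enough here
chain.append(->(msg) { msg.merge(trace: msg.fetch(:trace, []) + [:appended]) })
chain.prepend(->(msg) { msg.merge(trace: msg.fetch(:trace, []) + [:prepended]) })

result = chain.run({ topic: 'users', payload: 'data' })
p result[:trace]
```

Although the prepended step was registered second, it runs first, which matches the "first in the order of execution" wording in the README.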
data/lib/waterdrop/patches/rdkafka/metadata.rb
CHANGED
@@ -7,6 +7,14 @@ module WaterDrop
|
|
7
7
|
module Rdkafka
|
8
8
|
# Rdkafka::Metadata patches
|
9
9
|
module Metadata
|
10
|
+
# Errors upon which we retry the metadata fetch
|
11
|
+
RETRIED_ERRORS = %i[
|
12
|
+
timed_out
|
13
|
+
leader_not_available
|
14
|
+
].freeze
|
15
|
+
|
16
|
+
private_constant :RETRIED_ERRORS
|
17
|
+
|
10
18
|
# We overwrite this method because there were reports of metadata operation timing out
|
11
19
|
# when Kafka was under stress. While the messages dispatch will be retried, metadata
|
12
20
|
# fetch happens prior to that, effectively crashing the process. Metadata fetch was not
|
@@ -19,7 +27,7 @@ module WaterDrop
|
|
19
27
|
|
20
28
|
super(*args)
|
21
29
|
rescue ::Rdkafka::RdkafkaError => e
|
22
|
-
raise unless e.code
|
30
|
+
raise unless RETRIED_ERRORS.include?(e.code)
|
23
31
|
raise if attempt > 10
|
24
32
|
|
25
33
|
backoff_factor = 2**attempt
|
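The retry rule in this hunk (retry only selected error codes, give up after a bounded number of attempts, back off exponentially) can be sketched without rdkafka. `FakeKafkaError`, `with_metadata_retry`, and the `max_attempts` and `base_delay` parameters are hypothetical; the real patch rescues `::Rdkafka::RdkafkaError`, and its exact sleep timing is not visible in this diff.

```ruby
# Hypothetical stand-in for an error carrying a symbolic code
class FakeKafkaError < StandardError
  attr_reader :code

  def initialize(code)
    @code = code
    super("kafka error: #{code}")
  end
end

RETRIED_ERRORS = %i[timed_out leader_not_available].freeze

# Retries the block only for the listed error codes, with exponential backoff
def with_metadata_retry(max_attempts: 10, base_delay: 0.001)
  attempt = 0

  begin
    attempt += 1
    yield(attempt)
  rescue FakeKafkaError => e
    # Anything outside the allow-list is re-raised immediately
    raise unless RETRIED_ERRORS.include?(e.code)
    raise if attempt > max_attempts

    sleep(base_delay * (2**attempt))
    retry
  end
end

calls = 0
result = with_metadata_retry do
  calls += 1
  raise FakeKafkaError, :leader_not_available if calls < 3

  :metadata
end

puts "#{result} after #{calls} calls"
```

Before this change, per the changelog entry above, a `leader_not_available` error was not retried; the allow-list makes the set of retried codes explicit.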
data/lib/waterdrop/producer/async.rb
CHANGED
@@ -15,6 +15,8 @@ module WaterDrop
|
|
15
15
|
# message could not be sent to Kafka
|
16
16
|
def produce_async(message)
|
17
17
|
ensure_active!
|
18
|
+
|
19
|
+
message = middleware.run(message)
|
18
20
|
validate_message!(message)
|
19
21
|
|
20
22
|
@monitor.instrument(
|
@@ -36,6 +38,8 @@ module WaterDrop
|
|
36
38
|
# and the message could not be sent to Kafka
|
37
39
|
def produce_many_async(messages)
|
38
40
|
ensure_active!
|
41
|
+
|
42
|
+
messages = middleware.run_many(messages)
|
39
43
|
messages.each { |message| validate_message!(message) }
|
40
44
|
|
41
45
|
@monitor.instrument(
|
data/lib/waterdrop/producer/buffer.rb
CHANGED
@@ -19,6 +19,7 @@ module WaterDrop
|
|
19
19
|
# message could not be sent to Kafka
|
20
20
|
def buffer(message)
|
21
21
|
ensure_active!
|
22
|
+
message = middleware.run(message)
|
22
23
|
validate_message!(message)
|
23
24
|
|
24
25
|
@monitor.instrument(
|
@@ -37,6 +38,8 @@ module WaterDrop
|
|
37
38
|
# and the message could not be sent to Kafka
|
38
39
|
def buffer_many(messages)
|
39
40
|
ensure_active!
|
41
|
+
|
42
|
+
messages = middleware.run_many(messages)
|
40
43
|
messages.each { |message| validate_message!(message) }
|
41
44
|
|
42
45
|
@monitor.instrument(
|
data/lib/waterdrop/producer/sync.rb
CHANGED
@@ -17,6 +17,8 @@ module WaterDrop
|
|
17
17
|
# message could not be sent to Kafka
|
18
18
|
def produce_sync(message)
|
19
19
|
ensure_active!
|
20
|
+
|
21
|
+
message = middleware.run(message)
|
20
22
|
validate_message!(message)
|
21
23
|
|
22
24
|
@monitor.instrument(
|
@@ -47,6 +49,8 @@ module WaterDrop
|
|
47
49
|
# and the message could not be sent to Kafka
|
48
50
|
def produce_many_sync(messages)
|
49
51
|
ensure_active!
|
52
|
+
|
53
|
+
messages = middleware.run_many(messages)
|
50
54
|
messages.each { |message| validate_message!(message) }
|
51
55
|
|
52
56
|
@monitor.instrument('messages.produced_sync', producer_id: id, messages: messages) do
|
data/lib/waterdrop/producer.rb
CHANGED
@@ -3,10 +3,13 @@
|
|
3
3
|
module WaterDrop
|
4
4
|
# Main WaterDrop messages producer
|
5
5
|
class Producer
|
6
|
+
extend Forwardable
|
6
7
|
include Sync
|
7
8
|
include Async
|
8
9
|
include Buffer
|
9
10
|
|
11
|
+
def_delegators :config, :middleware
|
12
|
+
|
10
13
|
# @return [String] uuid of the current producer
|
11
14
|
attr_reader :id
|
12
15
|
# @return [Status] producer status object
|
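`def_delegators :config, :middleware` comes from Ruby's standard `Forwardable` module and makes `producer.middleware` forward to `producer.config.middleware`. A minimal sketch with hypothetical `DemoConfig` and `DemoProducer` classes (not WaterDrop's own):

```ruby
require 'forwardable'

# Hypothetical config object exposing a middleware attribute
DemoConfig = Struct.new(:middleware)

# Hypothetical producer that forwards #middleware to its config
class DemoProducer
  extend Forwardable

  attr_reader :config

  def_delegators :config, :middleware

  def initialize(config)
    @config = config
  end
end

chain = []
producer = DemoProducer.new(DemoConfig.new(chain))

# The delegated reader returns the very same object the config holds
puts producer.middleware.equal?(chain)
```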
data/lib/waterdrop/version.rb
CHANGED
data/waterdrop.gemspec
CHANGED
@@ -16,7 +16,7 @@ Gem::Specification.new do |spec|
|
|
16
16
|
spec.description = spec.summary
|
17
17
|
spec.license = 'MIT'
|
18
18
|
|
19
|
-
spec.add_dependency 'karafka-core', '>= 2.0.
|
19
|
+
spec.add_dependency 'karafka-core', '>= 2.0.7', '< 3.0.0'
|
20
20
|
spec.add_dependency 'zeitwerk', '~> 2.3'
|
21
21
|
|
22
22
|
spec.required_ruby_version = '>= 2.7'
|
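The `karafka-core` constraint in this gemspec hunk can be checked with RubyGems' own `Gem::Requirement`, which ships with Ruby. The version numbers tested below are arbitrary samples chosen for illustration:

```ruby
require 'rubygems'

# Same constraint as in the added gemspec line above
requirement = Gem::Requirement.new('>= 2.0.7', '< 3.0.0')

%w[2.0.6 2.0.7 2.9.9 3.0.0].each do |version|
  ok = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok}"
end
```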
data.tar.gz.sig
CHANGED
Binary file
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: waterdrop
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 2.4.
|
4
|
+
version: 2.4.7
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Maciej Mensfeld
|
@@ -35,7 +35,7 @@ cert_chain:
|
|
35
35
|
Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
|
36
36
|
MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
|
37
37
|
-----END CERTIFICATE-----
|
38
|
-
date: 2022-12-
|
38
|
+
date: 2022-12-18 00:00:00.000000000 Z
|
39
39
|
dependencies:
|
40
40
|
- !ruby/object:Gem::Dependency
|
41
41
|
name: karafka-core
|
@@ -43,7 +43,7 @@ dependencies:
|
|
43
43
|
requirements:
|
44
44
|
- - ">="
|
45
45
|
- !ruby/object:Gem::Version
|
46
|
-
version: 2.0.
|
46
|
+
version: 2.0.7
|
47
47
|
- - "<"
|
48
48
|
- !ruby/object:Gem::Version
|
49
49
|
version: 3.0.0
|
@@ -53,7 +53,7 @@ dependencies:
|
|
53
53
|
requirements:
|
54
54
|
- - ">="
|
55
55
|
- !ruby/object:Gem::Version
|
56
|
-
version: 2.0.
|
56
|
+
version: 2.0.7
|
57
57
|
- - "<"
|
58
58
|
- !ruby/object:Gem::Version
|
59
59
|
version: 3.0.0
|
@@ -108,6 +108,7 @@ files:
|
|
108
108
|
- lib/waterdrop/instrumentation/notifications.rb
|
109
109
|
- lib/waterdrop/instrumentation/vendors/datadog/dashboard.json
|
110
110
|
- lib/waterdrop/instrumentation/vendors/datadog/listener.rb
|
111
|
+
- lib/waterdrop/middleware.rb
|
111
112
|
- lib/waterdrop/patches/rdkafka/metadata.rb
|
112
113
|
- lib/waterdrop/patches/rdkafka/producer.rb
|
113
114
|
- lib/waterdrop/producer.rb
|
metadata.gz.sig
CHANGED
Binary file
|