waterdrop 2.0.7 → 2.6.11

Files changed (56)
  1. checksums.yaml +4 -4
  2. checksums.yaml.gz.sig +0 -0
  3. data/.github/FUNDING.yml +1 -0
  4. data/.github/workflows/ci.yml +22 -11
  5. data/.ruby-version +1 -1
  6. data/CHANGELOG.md +200 -0
  7. data/Gemfile +0 -2
  8. data/Gemfile.lock +32 -75
  9. data/README.md +22 -275
  10. data/certs/cert_chain.pem +26 -0
  11. data/config/locales/errors.yml +33 -0
  12. data/docker-compose.yml +19 -12
  13. data/lib/waterdrop/clients/buffered.rb +90 -0
  14. data/lib/waterdrop/clients/dummy.rb +69 -0
  15. data/lib/waterdrop/clients/rdkafka.rb +34 -0
  16. data/lib/{water_drop → waterdrop}/config.rb +39 -16
  17. data/lib/waterdrop/contracts/config.rb +43 -0
  18. data/lib/waterdrop/contracts/message.rb +64 -0
  19. data/lib/{water_drop → waterdrop}/errors.rb +14 -7
  20. data/lib/waterdrop/instrumentation/callbacks/delivery.rb +102 -0
  21. data/lib/{water_drop → waterdrop}/instrumentation/callbacks/error.rb +6 -2
  22. data/lib/{water_drop → waterdrop}/instrumentation/callbacks/statistics.rb +1 -1
  23. data/lib/{water_drop/instrumentation/stdout_listener.rb → waterdrop/instrumentation/logger_listener.rb} +66 -21
  24. data/lib/waterdrop/instrumentation/monitor.rb +20 -0
  25. data/lib/{water_drop/instrumentation/monitor.rb → waterdrop/instrumentation/notifications.rb} +12 -14
  26. data/lib/waterdrop/instrumentation/vendors/datadog/dashboard.json +1 -0
  27. data/lib/waterdrop/instrumentation/vendors/datadog/metrics_listener.rb +210 -0
  28. data/lib/waterdrop/middleware.rb +50 -0
  29. data/lib/{water_drop → waterdrop}/producer/async.rb +40 -4
  30. data/lib/{water_drop → waterdrop}/producer/buffer.rb +12 -30
  31. data/lib/{water_drop → waterdrop}/producer/builder.rb +6 -11
  32. data/lib/{water_drop → waterdrop}/producer/sync.rb +44 -15
  33. data/lib/waterdrop/producer/transactions.rb +170 -0
  34. data/lib/waterdrop/producer.rb +308 -0
  35. data/lib/{water_drop → waterdrop}/version.rb +1 -1
  36. data/lib/waterdrop.rb +28 -2
  37. data/renovate.json +6 -0
  38. data/waterdrop.gemspec +14 -11
  39. data.tar.gz.sig +0 -0
  40. metadata +71 -111
  41. metadata.gz.sig +0 -0
  42. data/certs/mensfeld.pem +0 -25
  43. data/config/errors.yml +0 -6
  44. data/lib/water_drop/contracts/config.rb +0 -26
  45. data/lib/water_drop/contracts/message.rb +0 -42
  46. data/lib/water_drop/instrumentation/callbacks/delivery.rb +0 -30
  47. data/lib/water_drop/instrumentation/callbacks/statistics_decorator.rb +0 -77
  48. data/lib/water_drop/instrumentation/callbacks_manager.rb +0 -39
  49. data/lib/water_drop/instrumentation.rb +0 -20
  50. data/lib/water_drop/patches/rdkafka/bindings.rb +0 -42
  51. data/lib/water_drop/patches/rdkafka/producer.rb +0 -20
  52. data/lib/water_drop/producer/dummy_client.rb +0 -32
  53. data/lib/water_drop/producer.rb +0 -162
  54. data/lib/water_drop.rb +0 -36
  55. /data/lib/{water_drop → waterdrop}/contracts.rb +0 -0
  56. /data/lib/{water_drop → waterdrop}/producer/status.rb +0 -0
data/README.md CHANGED
@@ -1,84 +1,40 @@
  # WaterDrop

- **Note**: Documentation presented here refers to WaterDrop `2.0.0`.
-
- WaterDrop `2.0` does **not** work with Karafka `1.*` and aims to either work as a standalone producer outside of Karafka `1.*` ecosystem or as a part of not yet released Karafka `2.0.*`.
-
- Please refer to [this](https://github.com/karafka/waterdrop/tree/1.4) branch and its documentation for details about WaterDrop `1.*` usage.
-
  [![Build Status](https://github.com/karafka/waterdrop/workflows/ci/badge.svg)](https://github.com/karafka/waterdrop/actions?query=workflow%3Aci)
  [![Gem Version](https://badge.fury.io/rb/waterdrop.svg)](http://badge.fury.io/rb/waterdrop)
  [![Join the chat at https://slack.karafka.io](https://raw.githubusercontent.com/karafka/misc/master/slack.svg)](https://slack.karafka.io)

- Gem used to send messages to Kafka in an easy way with an extra validation layer. It is a part of the [Karafka](https://github.com/karafka/karafka) ecosystem.
+ WaterDrop is a standalone gem that sends messages to Kafka easily with an extra validation layer. It is a part of the [Karafka](https://github.com/karafka/karafka) ecosystem.

  It:

- - Is thread safe
+ - Is thread-safe
  - Supports sync producing
  - Supports async producing
+ - Supports transactions
  - Supports buffering
  - Supports producing messages to multiple clusters
  - Supports multiple delivery policies
- - Works with Kafka 1.0+ and Ruby 2.6+
-
- ## Table of contents
+ - Works with Kafka `1.0+` and Ruby `2.7+`
+ - Works with and without Karafka

- - [Installation](#installation)
- - [Setup](#setup)
-   * [WaterDrop configuration options](#waterdrop-configuration-options)
-   * [Kafka configuration options](#kafka-configuration-options)
- - [Usage](#usage)
-   * [Basic usage](#basic-usage)
-   * [Buffering](#buffering)
-     + [Using WaterDrop to buffer messages based on the application logic](#using-waterdrop-to-buffer-messages-based-on-the-application-logic)
-     + [Using WaterDrop with rdkafka buffers to achieve periodic auto-flushing](#using-waterdrop-with-rdkafka-buffers-to-achieve-periodic-auto-flushing)
- - [Instrumentation](#instrumentation)
-   * [Usage statistics](#usage-statistics)
-   * [Error notifications](#error-notifications)
-   * [Forking and potential memory problems](#forking-and-potential-memory-problems)
- - [Note on contributions](#note-on-contributions)
-
- ## Installation
-
- ```ruby
- gem install waterdrop
- ```
-
- or add this to your Gemfile:
-
- ```ruby
- gem 'waterdrop'
- ```
-
- and run
-
- ```
- bundle install
- ```
+ ## Documentation

- ## Setup
+ Karafka ecosystem components documentation, including WaterDrop, can be found [here](https://karafka.io/docs/#waterdrop).

- WaterDrop is a complex tool, that contains multiple configuration options. To keep everything organized, all the configuration options were divided into two groups:
+ ## Getting Started

- - WaterDrop options - options directly related to WaterDrop and its components
- - Kafka driver options - options related to `rdkafka`
+ If you want to both produce and consume messages, please use [Karafka](https://github.com/karafka/karafka/). It integrates WaterDrop automatically.

- To apply all those configuration options, you need to create a producer instance and use the ```#setup``` method:
+ To get started with WaterDrop:

- ```ruby
- producer = WaterDrop::Producer.new
+ 1. Add it to your Gemfile:

- producer.setup do |config|
-   config.deliver = true
-   config.kafka = {
-     'bootstrap.servers': 'localhost:9092',
-     'request.required.acks': 1
-   }
- end
+ ```bash
+ bundle add waterdrop
  ```

- or you can do the same while initializing the producer:
+ 2. Create and configure a producer:

  ```ruby
  producer = WaterDrop::Producer.new do |config|
@@ -90,41 +46,17 @@ producer = WaterDrop::Producer.new do |config|
  end
  ```

- ### WaterDrop configuration options
+ 3. Use it as follows:

- | Option | Description |
- |--------------------|-----------------------------------------------------------------|
- | `id` | id of the producer for instrumentation and logging |
- | `logger` | Logger that we want to use |
- | `deliver` | Should we send messages to Kafka or just fake the delivery |
- | `max_wait_timeout` | Waits that long for the delivery report or raises an error |
- | `wait_timeout` | Waits that long before re-check of delivery report availability |
-
- ### Kafka configuration options
-
- You can create producers with different `kafka` settings. Documentation of the available configuration options is available on https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md.
-
- ## Usage
-
- Please refer to the [documentation](https://www.rubydoc.info/gems/waterdrop) in case you're interested in the more advanced API.
-
- ### Basic usage
-
- To send Kafka messages, just create a producer and use it:

  ```ruby
- producer = WaterDrop::Producer.new
-
- producer.setup do |config|
-   config.kafka = { 'bootstrap.servers': 'localhost:9092' }
- end
-
+ # sync producing
  producer.produce_sync(topic: 'my-topic', payload: 'my message')

  # or for async
  producer.produce_async(topic: 'my-topic', payload: 'my message')

- # or in batches
+ # or in sync batches
  producer.produce_many_sync(
    [
      { topic: 'my-topic', payload: 'my message'},
@@ -132,7 +64,7 @@ producer.produce_many_sync(
    ]
  )

- # both sync and async
+ # and async batches
  producer.produce_many_async(
    [
      { topic: 'my-topic', payload: 'my message'},
@@ -140,194 +72,9 @@ producer.produce_many_async(
    ]
  )

- # Don't forget to close the producer once you're done to flush the internal buffers, etc
- producer.close
- ```
-
- Each message that you want to publish, will have its value checked.
-
- Here are all the things you can provide in the message hash:
-
- | Option | Required | Value type | Description |
- |-----------------|----------|---------------|----------------------------------------------------------|
- | `topic` | true | String | The Kafka topic that should be written to |
- | `payload` | true | String | Data you want to send to Kafka |
- | `key` | false | String | The key that should be set in the Kafka message |
- | `partition` | false | Integer | A specific partition number that should be written to |
- | `partition_key` | false | String | Key to indicate the destination partition of the message |
- | `timestamp` | false | Time, Integer | The timestamp that should be set on the message |
- | `headers` | false | Hash | Headers for the message |
-
- Keep in mind, that message you want to send should be either binary or stringified (to_s, to_json, etc).
-
- ### Buffering
-
- WaterDrop producers support buffering messages in their internal buffers and on the `rdkafka` level via `queue.buffering.*` set of settings.
-
- This means that depending on your use case, you can achieve both granular buffering and flushing control when needed with context awareness and periodic and size-based flushing functionalities.
-
- #### Using WaterDrop to buffer messages based on the application logic
-
- ```ruby
- producer = WaterDrop::Producer.new
-
- producer.setup do |config|
-   config.kafka = { 'bootstrap.servers': 'localhost:9092' }
- end
-
- # Simulating some events states of a transaction - notice, that the messages will be flushed to
- # kafka only upon arrival of the `finished` state.
- %w[
-   started
-   processed
-   finished
- ].each do |state|
-   producer.buffer(topic: 'events', payload: state)
-
-   puts "The messages buffer size #{producer.messages.size}"
-   producer.flush_sync if state == 'finished'
-   puts "The messages buffer size #{producer.messages.size}"
- end
-
- producer.close
- ```
-
- #### Using WaterDrop with rdkafka buffers to achieve periodic auto-flushing
-
- ```ruby
- producer = WaterDrop::Producer.new
-
- producer.setup do |config|
-   config.kafka = {
-     'bootstrap.servers': 'localhost:9092',
-     # Accumulate messages for at most 10 seconds
-     'queue.buffering.max.ms' => 10_000
-   }
- end
-
- # WaterDrop will flush messages minimum once every 10 seconds
- 30.times do |i|
-   producer.produce_async(topic: 'events', payload: i.to_s)
-   sleep(1)
- end
-
- producer.close
- ```
-
- ## Instrumentation
-
- Each of the producers after the `#setup` is done, has a custom monitor to which you can subscribe.
-
- ```ruby
- producer = WaterDrop::Producer.new
-
- producer.setup do |config|
-   config.kafka = { 'bootstrap.servers': 'localhost:9092' }
- end
-
- producer.monitor.subscribe('message.produced_async') do |event|
-   puts "A message was produced to '#{event[:message][:topic]}' topic!"
- end
-
- producer.produce_async(topic: 'events', payload: 'data')
-
- producer.close
- ```
-
- See the `WaterDrop::Instrumentation::Monitor::EVENTS` for the list of all the supported events.
-
- ### Usage statistics
-
- WaterDrop may be configured to emit internal metrics at a fixed interval by setting the `kafka` `statistics.interval.ms` configuration property to a value > `0`. Once that is done, emitted statistics are available after subscribing to the `statistics.emitted` publisher event.
-
- The statistics include all of the metrics from `librdkafka` (full list [here](https://github.com/edenhill/librdkafka/blob/master/STATISTICS.md)) as well as the diff of those against the previously emitted values.
-
- For several attributes like `txmsgs`, `librdkafka` publishes only the totals. In order to make it easier to track the progress (for example number of messages sent between statistics emitted events), WaterDrop diffs all the numeric values against previously available numbers. All of those metrics are available under the same key as the metric but with additional `_d` postfix:
-
-
- ```ruby
- producer = WaterDrop::Producer.new do |config|
-   config.kafka = {
-     'bootstrap.servers': 'localhost:9092',
-     'statistics.interval.ms': 2_000 # emit statistics every 2 seconds
-   }
- end
-
- producer.monitor.subscribe('statistics.emitted') do |event|
-   sum = event[:statistics]['txmsgs']
-   diff = event[:statistics]['txmsgs_d']
-
-   p "Sent messages: #{sum}"
-   p "Messages sent from last statistics report: #{diff}"
- end
-
- sleep(2)
-
- # Sent messages: 0
- # Messages sent from last statistics report: 0
-
- 20.times { producer.produce_async(topic: 'events', payload: 'data') }
-
- # Sent messages: 20
- # Messages sent from last statistics report: 20
-
- sleep(2)
-
- 20.times { producer.produce_async(topic: 'events', payload: 'data') }
-
- # Sent messages: 40
- # Messages sent from last statistics report: 20
-
- sleep(2)
-
- # Sent messages: 40
- # Messages sent from last statistics report: 0
-
- producer.close
- ```
-
- Note: The metrics returned may not be completely consistent between brokers, toppars and totals, due to the internal asynchronous nature of librdkafka. E.g., the top level tx total may be less than the sum of the broker tx values which it represents.
-
- ### Error notifications
-
- Aside from errors related to publishing messages like `buffer.flushed_async.error`, WaterDrop allows you to listen to errors that occur in its internal background threads. Things like reconnecting to Kafka upon network errors and others unrelated to publishing messages are all available under `error.emitted` notification key. You can subscribe to this event to ensure your setup is healthy and without any problems that would otherwise go unnoticed as long as messages are delivered.
-
- ```ruby
- producer = WaterDrop::Producer.new do |config|
-   # Note invalid connection port...
-   config.kafka = { 'bootstrap.servers': 'localhost:9090' }
- end
-
- producer.monitor.subscribe('error.emitted') do |event|
-   error = event[:error]
-
-   p "Internal error occurred: #{error}"
+ # transactions
+ producer.transaction do
+   producer.produce_async(topic: 'my-topic', payload: 'my message')
+   producer.produce_async(topic: 'my-topic', payload: 'my message')
  end
-
- # Run this code without Kafka cluster
- loop do
-   producer.produce_async(topic: 'events', payload: 'data')
-
-   sleep(1)
- end
-
- # After you stop your Kafka cluster, you will see a lot of those:
- #
- # Internal error occurred: Local: Broker transport failure (transport)
- #
- # Internal error occurred: Local: Broker transport failure (transport)
  ```
-
- ### Forking and potential memory problems
-
- If you work with forked processes, make sure you **don't** use the producer before the fork. You can easily configure the producer and then fork and use it.
-
- To tackle this [obstacle](https://github.com/appsignal/rdkafka-ruby/issues/15) related to rdkafka, WaterDrop adds finalizer to each of the producers to close the rdkafka client before the Ruby process is shutdown. Due to the [nature of the finalizers](https://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/), this implementation prevents producers from being GCed (except upon VM shutdown) and can cause memory leaks if you don't use persistent/long-lived producers in a long-running process or if you don't use the `#close` method of a producer when it is no longer needed. Creating a producer instance for each message is anyhow a rather bad idea, so we recommend not to.
-
- ## Note on contributions
-
- First, thank you for considering contributing to the Karafka ecosystem! It's people like you that make the open source community such a great community!
-
- Each pull request must pass all the RSpec specs, integration tests and meet our quality requirements.
-
- Fork it, update and wait for the Github Actions results.
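The new README demonstrates `producer.transaction` without showing the required configuration. A minimal sketch of wiring up a transactional producer, assuming only what this changeset shows: `Clients::Rdkafka` (expanded further below) switches the client into transactional mode when a `transactional.id` key is present in the `kafka` settings:

```ruby
require 'waterdrop'

producer = WaterDrop::Producer.new do |config|
  config.kafka = {
    'bootstrap.servers': 'localhost:9092',
    # Presence of this key makes Clients::Rdkafka call init_transactions
    'transactional.id': 'my-app-producer'
  }
end

# Everything inside the block is committed atomically or not at all
producer.transaction do
  producer.produce_async(topic: 'my-topic', payload: 'my message')
  producer.produce_async(topic: 'my-topic', payload: 'my message')
end

producer.close
```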
data/certs/cert_chain.pem ADDED
@@ -0,0 +1,26 @@
+ -----BEGIN CERTIFICATE-----
+ MIIEcDCCAtigAwIBAgIBATANBgkqhkiG9w0BAQsFADA/MRAwDgYDVQQDDAdjb250
+ YWN0MRcwFQYKCZImiZPyLGQBGRYHa2FyYWZrYTESMBAGCgmSJomT8ixkARkWAmlv
+ MB4XDTIzMDgyMTA3MjU1NFoXDTI0MDgyMDA3MjU1NFowPzEQMA4GA1UEAwwHY29u
+ dGFjdDEXMBUGCgmSJomT8ixkARkWB2thcmFma2ExEjAQBgoJkiaJk/IsZAEZFgJp
+ bzCCAaIwDQYJKoZIhvcNAQEBBQADggGPADCCAYoCggGBAOuZpyQKEwsTG9plLat7
+ 8bUaNuNBEnouTsNMr6X+XTgvyrAxTuocdsyP1sNCjdS1B8RiiDH1/Nt9qpvlBWon
+ sdJ1SYhaWNVfqiYStTDnCx3PRMmHRdD4KqUWKpN6VpZ1O/Zu+9Mw0COmvXgZuuO9
+ wMSJkXRo6dTCfMedLAIxjMeBIxtoLR2e6Jm6MR8+8WYYVWrO9kSOOt5eKQLBY7aK
+ b/Dc40EcJKPg3Z30Pia1M9ZyRlb6SOj6SKpHRqc7vbVQxjEw6Jjal1lZ49m3YZMd
+ ArMAs9lQZNdSw5/UX6HWWURLowg6k10RnhTUtYyzO9BFev0JFJftHnmuk8vtb+SD
+ 5VPmjFXg2VOcw0B7FtG75Vackk8QKfgVe3nSPhVpew2CSPlbJzH80wChbr19+e3+
+ YGr1tOiaJrL6c+PNmb0F31NXMKpj/r+n15HwlTMRxQrzFcgjBlxf2XFGnPQXHhBm
+ kp1OFnEq4GG9sON4glRldkwzi/f/fGcZmo5fm3d+0ZdNgwIDAQABo3cwdTAJBgNV
+ HRMEAjAAMAsGA1UdDwQEAwIEsDAdBgNVHQ4EFgQUPVH5+dLA80A1kJ2Uz5iGwfOa
+ 1+swHQYDVR0RBBYwFIESY29udGFjdEBrYXJhZmthLmlvMB0GA1UdEgQWMBSBEmNv
+ bnRhY3RAa2FyYWZrYS5pbzANBgkqhkiG9w0BAQsFAAOCAYEAnpa0jcN7JzREHMTQ
+ bfZ+xcvlrzuROMY6A3zIZmQgbnoZZNuX4cMRrT1p1HuwXpxdpHPw7dDjYqWw3+1h
+ 3mXLeMuk7amjQpYoSWU/OIZMhIsARra22UN8qkkUlUj3AwTaChVKN/bPJOM2DzfU
+ kz9vUgLeYYFfQbZqeI6SsM7ltilRV4W8D9yNUQQvOxCFxtLOetJ00fC/E7zMUzbK
+ IBwYFQYsbI6XQzgAIPW6nGSYKgRhkfpmquXSNKZRIQ4V6bFrufa+DzD0bt2ZA3ah
+ fMmJguyb5L2Gf1zpDXzFSPMG7YQFLzwYz1zZZvOU7/UCpQsHpID/YxqDp4+Dgb+Y
+ qma0whX8UG/gXFV2pYWpYOfpatvahwi+A1TwPQsuZwkkhi1OyF1At3RY+hjSXyav
+ AnG1dJU+yL2BK7vaVytLTstJME5mepSZ46qqIJXMuWob/YPDmVaBF39TDSG9e34s
+ msG3BiCqgOgHAnL23+CN3Rt8MsuRfEtoTKpJVcCfoEoNHOkc
+ -----END CERTIFICATE-----
data/config/locales/errors.yml ADDED
@@ -0,0 +1,33 @@
+ en:
+   validations:
+     config:
+       missing: must be present
+       logger_format: must be present
+       deliver_format: must be boolean
+       id_format: must be a non-empty string
+       max_payload_size_format: must be an integer that is equal or bigger than 1
+       wait_timeout_format: must be a numeric that is bigger than 0
+       max_wait_timeout_format: must be an integer that is equal or bigger than 0
+       kafka_format: must be a hash with symbol based keys
+       kafka_key_must_be_a_symbol: All keys under the kafka settings scope need to be symbols
+       wait_on_queue_full_format: must be boolean
+       wait_backoff_on_queue_full_format: must be a numeric that is bigger or equal to 0
+       wait_timeout_on_queue_full_format: must be a numeric that is bigger or equal to 0
+
+     message:
+       missing: must be present
+       partition_format: must be an integer greater or equal to -1
+       topic_format: 'does not match the topic allowed format'
+       partition_key_format: must be a non-empty string
+       timestamp_format: must be either time or integer
+       payload_format: must be string or nil
+       headers_format: must be a hash
+       key_format: must be a non-empty string
+       payload_max_size: is more than `max_payload_size` config value
+       headers_invalid_key_type: all headers keys need to be of type String
+       headers_invalid_value_type: all headers values need to be of type String
+
+     test:
+       missing: must be present
+       nested.id_format: 'is invalid'
+       nested.id2_format: 'is invalid'
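These translations back the new `Contracts::Config` and `Contracts::Message` validations (entries 17 and 18 in the files list). A sketch of how they surface at produce time; the exact error class lives in `errors.rb`, which this diff does not expand, so its name here is an assumption:

```ruby
# Hypothetical failure: spaces are not part of the allowed topic format
producer.produce_sync(topic: 'my topic', payload: 'data')
# => raises WaterDrop::Errors::MessageInvalidError (name assumed, see errors.rb)
#    with details such as { topic: ['does not match the topic allowed format'] }
```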
data/docker-compose.yml CHANGED
@@ -1,18 +1,25 @@
  version: '2'
+
  services:
-   zookeeper:
-     image: wurstmeister/zookeeper
-     ports:
-       - "2181:2181"
    kafka:
-     image: wurstmeister/kafka:2.12-2.5.0
+     container_name: kafka
+     image: confluentinc/cp-kafka:7.5.1
+
      ports:
-       - "9092:9092"
+       - 9092:9092
+
      environment:
-       KAFKA_ADVERTISED_HOST_NAME: localhost
-       KAFKA_ADVERTISED_PORT: 9092
-       KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+       CLUSTER_ID: kafka-docker-cluster-1
+       KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
+       KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
+       KAFKA_PROCESS_ROLES: broker,controller
+       KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
+       KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
+       KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
+       KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://127.0.0.1:9092
+       KAFKA_BROKER_ID: 1
+       KAFKA_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:9093
+       ALLOW_PLAINTEXT_LISTENER: 'yes'
        KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
-       KAFKA_CREATE_TOPICS: 'example_topic:1:1'
-   volumes:
-     - /var/run/docker.sock:/var/run/docker.sock
+       KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
+       KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
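The compose file moves from a two-container ZooKeeper setup to a single KRaft broker acting as its own controller (`KAFKA_PROCESS_ROLES: broker,controller`), with the transaction-state-log settings the new transactions API needs on a one-broker cluster. A quick round-trip sketch to verify the container accepts traffic on the advertised `127.0.0.1:9092` listener; the topic name is arbitrary and relies on `KAFKA_AUTO_CREATE_TOPICS_ENABLE`:

```ruby
require 'waterdrop'

producer = WaterDrop::Producer.new do |config|
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

# produce_sync blocks until the broker acknowledges the message
report = producer.produce_sync(topic: 'smoke-test', payload: 'ping')
puts "delivered to partition #{report.partition} at offset #{report.offset}"

producer.close
```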
data/lib/waterdrop/clients/buffered.rb ADDED
@@ -0,0 +1,90 @@
+ # frozen_string_literal: true
+
+ module WaterDrop
+   module Clients
+     # Client used to buffer messages that we send out in specs and other places.
+     class Buffered < Clients::Dummy
+       attr_accessor :messages
+
+       # @param args [Object] anything accepted by `Clients::Dummy`
+       def initialize(*args)
+         super
+         @messages = []
+         @topics = Hash.new { |k, v| k[v] = [] }
+
+         @transaction_active = false
+         @transaction_messages = []
+         @transaction_topics = Hash.new { |k, v| k[v] = [] }
+         @transaction_level = 0
+       end
+
+       # "Produces" message to Kafka: it acknowledges it locally, adds it to the internal buffer
+       # @param message [Hash] `WaterDrop::Producer#produce_sync` message hash
+       # @return [Dummy::Handle] fake delivery handle that can be materialized into a report
+       def produce(message)
+         if @transaction_active
+           @transaction_topics[message.fetch(:topic)] << message
+           @transaction_messages << message
+         else
+           # We pre-validate the message payload, so topic is ensured to be present
+           @topics[message.fetch(:topic)] << message
+           @messages << message
+         end
+
+         super(**message.to_h)
+       end
+
+       # Starts the transaction on a given level
+       def begin_transaction
+         @transaction_level += 1
+         @transaction_active = true
+       end
+
+       # Finishes given level of transaction
+       def commit_transaction
+         @transaction_level -= 1
+
+         return unless @transaction_level.zero?
+
+         # Transfer transactional data on success
+         @transaction_topics.each do |topic, messages|
+           @topics[topic] += messages
+         end
+
+         @messages += @transaction_messages
+
+         @transaction_topics.clear
+         @transaction_messages.clear
+         @transaction_active = false
+       end
+
+       # Aborts the transaction
+       def abort_transaction
+         @transaction_level -= 1
+
+         return unless @transaction_level.zero?
+
+         @transaction_topics.clear
+         @transaction_messages.clear
+         @transaction_active = false
+       end
+
+       # Returns messages produced to a given topic
+       # @param topic [String]
+       def messages_for(topic)
+         @topics[topic]
+       end
+
+       # Clears internal buffer
+       # Used in between specs so messages do not leak out
+       def reset
+         @transaction_level = 0
+         @transaction_active = false
+         @transaction_topics.clear
+         @transaction_messages.clear
+         @messages.clear
+         @topics.each_value(&:clear)
+       end
+     end
+   end
+ end
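A sketch of how the buffered client can be used in specs, assuming a `client_class` setting exposed by the reworked `config.rb` (that file's diff is not expanded here, so the option name is an assumption):

```ruby
producer = WaterDrop::Producer.new do |config|
  # Swap the real dispatch layer for the in-memory buffer (option name assumed)
  config.client_class = WaterDrop::Clients::Buffered
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

producer.produce_sync(topic: 'events', payload: 'data')

producer.client.messages_for('events').size # => 1
producer.client.reset # clear between specs so messages do not leak out
```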
data/lib/waterdrop/clients/dummy.rb ADDED
@@ -0,0 +1,69 @@
+ # frozen_string_literal: true
+
+ module WaterDrop
+   module Clients
+     # A dummy client that is supposed to be used instead of Rdkafka::Producer in case we don't
+     # want to dispatch anything to Kafka.
+     #
+     # It does not store anything and just ignores messages. It does however return proper delivery
+     # handle that can be materialized into a report.
+     class Dummy
+       # `::Rdkafka::Producer::DeliveryHandle` object API compatible dummy object
+       class Handle < ::Rdkafka::Producer::DeliveryHandle
+         # @param topic [String] topic where we want to dispatch message
+         # @param partition [Integer] target partition
+         # @param offset [Integer] offset assigned by our fake "Kafka"
+         def initialize(topic, partition, offset)
+           @topic = topic
+           @partition = partition
+           @offset = offset
+         end
+
+         # Does not wait, just creates the result
+         #
+         # @param _args [Array] anything the wait handle would accept
+         # @return [::Rdkafka::Producer::DeliveryReport]
+         def wait(*_args)
+           create_result
+         end
+
+         # Creates a delivery report with details where the message went
+         #
+         # @return [::Rdkafka::Producer::DeliveryReport]
+         def create_result
+           ::Rdkafka::Producer::DeliveryReport.new(
+             @partition,
+             @offset,
+             @topic
+           )
+         end
+       end
+
+       # @param _producer [WaterDrop::Producer]
+       # @return [Dummy] dummy instance
+       def initialize(_producer)
+         @counters = Hash.new { |h, k| h[k] = -1 }
+       end
+
+       # "Produces" the message
+       # @param topic [String, Symbol] topic where we want to dispatch message
+       # @param partition [Integer] target partition
+       # @param _args [Hash] remaining details that are ignored in the dummy mode
+       # @return [Handle] delivery handle
+       def produce(topic:, partition: 0, **_args)
+         Handle.new(topic.to_s, partition, @counters["#{topic}#{partition}"] += 1)
+       end
+
+       # @param _args [Object] anything really, this dummy is suppose to support anything
+       def respond_to_missing?(*_args)
+         true
+       end
+
+       # @param _args [Object] anything really, this dummy is suppose to support anything
+       # @return [self] returns self for chaining cases
+       def method_missing(*_args)
+         self || super
+       end
+     end
+   end
+ end
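Since `Handle` mimics the rdkafka delivery-handle API, code under test can wait on reports as if Kafka were there. A sketch, assuming this client is what the producer builder picks when `deliver` is set to `false` (the builder diff is not expanded here):

```ruby
producer = WaterDrop::Producer.new do |config|
  config.deliver = false # fake the delivery instead of dispatching
  config.kafka = { 'bootstrap.servers': 'localhost:9092' }
end

handle = producer.produce_async(topic: 'events', payload: 'data')
report = handle.wait
# Offsets are counted per topic-partition, starting at 0
report.offset # => 0
```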
data/lib/waterdrop/clients/rdkafka.rb ADDED
@@ -0,0 +1,34 @@
+ # frozen_string_literal: true
+
+ module WaterDrop
+   # Namespace for all the clients that WaterDrop may use under the hood
+   module Clients
+     # Default Rdkafka client.
+     # Since we use the ::Rdkafka::Producer under the hood, this is just a module that aligns with
+     # client building API for the convenience.
+     module Rdkafka
+       class << self
+         # @param producer [WaterDrop::Producer] producer instance with its config, etc
+         # @note We overwrite this that way, because we do not care
+         def new(producer)
+           config = producer.config.kafka.to_h
+
+           client = ::Rdkafka::Config.new(config).producer
+
+           # This callback is not global and is per client, thus we do not have to wrap it with a
+           # callbacks manager to make it work
+           client.delivery_callback = Instrumentation::Callbacks::Delivery.new(
+             producer.id,
+             producer.transactional?,
+             producer.config.monitor
+           )
+
+           # Switch to the transactional mode if user provided the transactional id
+           client.init_transactions if config.key?(:'transactional.id')
+
+           client
+         end
+       end
+     end
+   end
+ end
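The per-client delivery callback wired above is how broker acknowledgements reach the monitor. A sketch of listening for them; `callbacks/delivery.rb` is not expanded in this diff, so the event name and payload keys are assumptions based on the callback's role:

```ruby
# Event name assumed; check instrumentation/callbacks/delivery.rb for the exact key
producer.monitor.subscribe('message.acknowledged') do |event|
  puts "broker confirmed a message from producer #{event[:producer_id]}"
end
```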