karafka 2.2.9 → 2.2.10

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 71ed12ebff017d2ea4ded639fc745c09c3edf198456b2eb3a4d6e3f1740be925
-   data.tar.gz: de696bd36d0af383eed008492ab0fd71dc0395515b56fe050ca93765631cbcb9
+   metadata.gz: 460029e2dbc352c56f107e18bbff28f5914b5b56a0e42be9670e13f0f3592fc9
+   data.tar.gz: 68c25bacd1995a8e9c97a915e1e1b2b4435437c97a5e31e2824499c9ad531057
  SHA512:
-   metadata.gz: 14be5b79b7e25ca006bc3d1eecd6706c0beaec38b67ec949e5fc064bc1b883468a0eedb7dabf10d61f9ecf197ec2f476bf413f9382368203437d171318269312
-   data.tar.gz: c6fa615fe8eb7c36f5804421c392c13241d4fd0d3250039f08c0a146cb68cae1b58aac0a9e75c8019b7c13eb743867afd9df229063ee2d8e405e64464a2bb523
+   metadata.gz: adc877c3be3cb8885b0244168014387f0353bed89575b6931f5e1ca062d01ba6a0e5a0566da4dd282f5117c7daf27ff2116f27025ec4213972c1ca52e92980de
+   data.tar.gz: a62ff96ef96ea85d98288bad3c8f6cf0bfafc1ee6de1cf928fe803ad80435b0fa137aac9cf5eff4b232e854b3c4df76c3c16efaa4a117d6b86e442975cfc9a30
checksums.yaml.gz.sig CHANGED
Binary file
@@ -37,14 +37,15 @@ Here's an example:

  ```
  $ [bundle exec] karafka info
- Karafka version: 1.3.0
- Ruby version: 2.6.3
- Ruby-kafka version: 0.7.9
- Application client id: karafka-local
- Backend: inline
- Batch fetching: true
- Batch consuming: true
- Boot file: /app/karafka/karafka.rb
+ Karafka version: 2.2.10 + Pro
+ Ruby version: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
+ Rdkafka version: 0.13.8
+ Consumer groups count: 2
+ Subscription groups count: 2
+ Workers count: 2
+ Application client id: example_app
+ Boot file: /app/karafka.rb
  Environment: development
- Kafka seed brokers: ["kafka://kafka:9092"]
+ License: Commercial
+ License entity: karafka-ci
  ```
@@ -1,6 +1,8 @@
  name: ci

- concurrency: ci-${{ github.ref }}
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: true

  on:
    pull_request:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
  # Karafka framework changelog

+ ## 2.2.10 (2023-11-02)
+ - [Improvement] Allow for running `#pause` without specifying the offset (provide offset or `:consecutive`). This allows for pausing on the consecutive message (last received + 1), so after resume we will get last message received + 1 effectively not using `#seek` and not purging `librdkafka` buffer preserving on networking. Please be mindful that this uses notion of last message passed from **librdkafka**, and not the last one available in the consumer (`messages.last`). While for regular cases they will be the same, when using things like DLQ, LRJs, VPs or Filtering API, those may not be the same.
+ - [Improvement] **Drastically** improve network efficiency of operating with LRJ by using the `:consecutive` offset as default strategy for running LRJs without moving the offset in place and purging the data.
+ - [Improvement] Do not "seek in place". When pausing and/or seeking to the same location as the current position, do nothing not to purge buffers and not to move to the same place where we are.
+ - [Fix] Pattern regexps should not be part of declaratives even when configured.
+
+ ### Upgrade Notes
+
+ In the latest Karafka release, there are no breaking changes. However, please note the updates to #pause and #seek. If you spot any issues, please report them immediately. Your feedback is crucial.
+
  ## 2.2.9 (2023-10-24)
  - [Improvement] Allow using negative offset references in `Karafka::Admin#read_topic`.
  - [Change] Make sure that WaterDrop `2.6.10` or higher is used with this release to support transactions fully and the Web-UI.
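The 2.2.10 changelog entry above warns that `:consecutive` is based on the last message handed over by librdkafka rather than on `messages.last`. A minimal sketch of why the two can differ, in plain Ruby with no Karafka involved; the offsets and the filter condition are made up for illustration:

```ruby
# Illustrative only: plain Ruby, no Karafka or librdkafka involved.
# Offsets that (hypothetically) arrived from librdkafka in one poll.
fetched_offsets = (100..109).to_a

# A hypothetical filter that drops some messages before the consumer sees the batch.
visible_offsets = fetched_offsets.select { |offset| offset < 105 }

# Offset derived from the last message visible to the consumer (`messages.last` + 1).
after_last_visible = visible_offsets.last + 1

# Offset a `:consecutive` pause resumes from: last fetched message + 1.
consecutive = fetched_offsets.last + 1

puts "resume after messages.last: #{after_last_visible}"
puts "resume on :consecutive:     #{consecutive}"
```

Without filtering, both values would be the same, which matches the "regular cases" mentioned in the changelog.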
@@ -11,7 +21,7 @@
  - [Refactor] Reorganize how rebalance events are propagated from `librdkafka` to Karafka. Replace `connection.client.rebalance_callback` with `rebalance.partitions_assigned` and `rebalance.partitions_revoked`. Introduce two extra events: `rebalance.partitions_assign` and `rebalance.partitions_revoke` to handle pre-rebalance future work.
  - [Refactor] Remove `thor` as a CLI layer and rely on Ruby `OptParser`

- ### Upgrade notes
+ ### Upgrade Notes

  1. Unless you were using `connection.client.rebalance_callback` which was considered private, nothing.
  2. None of the CLI commands should change but `thor` has been removed so please report if you find any bugs.
@@ -53,7 +63,7 @@
  - [Enhancement] Make sure that consumer group used by `Karafka::Admin` obeys the `ConsumerMapper` setup.
  - [Fix] Fix a case where subscription group would not accept a symbol name.

- ### Upgrade notes
+ ### Upgrade Notes

  As always, please make sure you have upgraded to the most recent version of `2.1` before upgrading to `2.2`.
@@ -184,7 +194,7 @@ If you want to maintain the `2.1` behavior, that is `karafka_admin` admin group,
  - [Change] Removed `Karafka::Pro::BaseConsumer` in favor of `Karafka::BaseConsumer`. (#1345)
  - [Fix] Fix for `max_messages` and `max_wait_time` not having reference in errors.yml (#1443)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Upgrade to Karafka `2.0.41` prior to upgrading to `2.1.0`.
  2. Replace `Karafka::Pro::BaseConsumer` references to `Karafka::BaseConsumer`.
@@ -256,7 +266,7 @@ If you want to maintain the `2.1` behavior, that is `karafka_admin` admin group,
  - [Improvement] Attach an `embedded` tag to Karafka processes started using the embedded API.
  - [Change] Renamed `Datadog::Listener` to `Datadog::MetricsListener` for consistency. (#1124)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Replace `Datadog::Listener` references to `Datadog::MetricsListener`.
@@ -270,7 +280,7 @@ If you want to maintain the `2.1` behavior, that is `karafka_admin` admin group,
  - [Improvement] Introduce a `strict_topics_namespacing` config option to enable/disable the strict topics naming validations. This can be useful when working with pre-existing topics which we cannot or do not want to rename.
  - [Fix] Karafka monitor is prematurely cached (#1314)

- ### Upgrade notes
+ ### Upgrade Notes

  Since `#tags` were introduced on consumers, the `#tags` method is now part of the consumers API.
@@ -332,7 +342,7 @@ end
  - [Improvement] Include `original_consumer_group` in the DLQ dispatched messages in Pro.
  - [Improvement] Use Karafka `client_id` as kafka `client.id` value by default

- ### Upgrade notes
+ ### Upgrade Notes

  If you want to continue to use `karafka` as default for kafka `client.id`, assign it manually:
@@ -385,7 +395,7 @@ class KarafkaApp < Karafka::App
  - [Fix] Do not trigger code reloading when `consumer_persistence` is enabled.
  - [Fix] Shutdown producer after all the consumer components are down and the status is stopped. This will ensure, that any instrumentation related Kafka messaging can still operate.

- ### Upgrade notes
+ ### Upgrade Notes

  If you want to disable `librdkafka` statistics because you do not use them at all, update the `kafka` `statistics.interval.ms` setting and set it to `0`:
@@ -447,7 +457,7 @@ end
  - [Fix] Few typos around DLQ and Pro DLQ Dispatch original metadata naming.
  - [Fix] Narrow the components lookup to the appropriate scope (#1114)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Replace `original-*` references from DLQ dispatched metadata with `original_*`
@@ -490,7 +500,7 @@ end
  - [Specs] Split specs into regular and pro to simplify how resources are loaded
  - [Specs] Add specs to ensure, that all the Pro components have a proper per-file license (#1099)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Remove the `manual_offset_management` setting from the main config if you use it:
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     karafka (2.2.9)
+     karafka (2.2.10)
        karafka-core (>= 2.2.2, < 2.3.0)
        waterdrop (>= 2.6.10, < 3.0.0)
        zeitwerk (~> 2.3)
@@ -39,24 +39,24 @@ GEM
        activesupport (>= 6.1)
      i18n (1.14.1)
        concurrent-ruby (~> 1.0)
-     karafka-core (2.2.3)
+     karafka-core (2.2.5)
        concurrent-ruby (>= 1.1)
-       karafka-rdkafka (>= 0.13.6, < 0.14.0)
-     karafka-rdkafka (0.13.6)
+       karafka-rdkafka (>= 0.13.8, < 0.15.0)
+     karafka-rdkafka (0.13.8)
        ffi (~> 1.15)
        mini_portile2 (~> 2.6)
        rake (> 12)
-     karafka-web (0.7.7)
+     karafka-web (0.7.10)
        erubi (~> 1.4)
-       karafka (>= 2.2.8.beta1, < 3.0.0)
-       karafka-core (>= 2.2.2, < 3.0.0)
+       karafka (>= 2.2.9, < 3.0.0)
+       karafka-core (>= 2.2.4, < 3.0.0)
        roda (~> 3.68, >= 3.69)
        tilt (~> 2.0)
      mini_portile2 (2.8.5)
      minitest (5.20.0)
      mutex_m (0.1.2)
      rack (3.0.8)
-     rake (13.0.6)
+     rake (13.1.0)
      roda (3.73.0)
        rack
      rspec (3.12.0)
@@ -82,7 +82,7 @@ GEM
      tilt (2.3.0)
      tzinfo (2.0.6)
        concurrent-ruby (~> 1.0)
-     waterdrop (2.6.10)
+     waterdrop (2.6.11)
        karafka-core (>= 2.2.3, < 3.0.0)
        zeitwerk (~> 2.3)
      zeitwerk (2.6.12)
@@ -171,23 +171,28 @@ module Karafka
    @used
  end

- # Pauses processing on a given offset for the current topic partition
+ # Pauses processing on a given offset or consecutive offset for the current topic partition
  #
  # After given partition is resumed, it will continue processing from the given offset
- # @param offset [Integer] offset from which we want to restart the processing
+ # @param offset [Integer, Symbol] offset from which we want to restart the processing or
+ #   `:consecutive` if we want to pause and continue without changing the consecutive offset
+ #   (cursor position)
  # @param timeout [Integer, nil] how long in milliseconds do we want to pause or nil to use the
  #   default exponential pausing strategy defined for retries
  # @param manual_pause [Boolean] Flag to differentiate between user pause and system/strategy
  #   based pause. While they both pause in exactly the same way, the strategy application
  #   may need to differentiate between them.
+ #
+ # @note It is **critical** to understand how pause with `:consecutive` offset operates. While
+ #   it provides benefit of not purging librdkafka buffer, in case of usage of filters, retries
+ #   or other advanced options the consecutive offset may not be the one you want to pause on.
+ #   Test it well to ensure, that this behaviour is expected by you.
  def pause(offset, timeout = nil, manual_pause = true)
    timeout ? coordinator.pause_tracker.pause(timeout) : coordinator.pause_tracker.pause

-   client.pause(
-     topic.name,
-     partition,
-     offset
-   )
+   offset = nil if offset == :consecutive
+
+   client.pause(topic.name, partition, offset)

    # Indicate, that user took a manual action of pausing
    coordinator.manual_pause if manual_pause
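The hunk above maps the `:consecutive` symbol to `nil` before delegating to the client. A minimal sketch of that mapping, assuming a hypothetical `RecordingClient` stand-in and a simplified `sketch_pause` helper (neither exists in Karafka; pause tracking and the coordinator are left out):

```ruby
# Hypothetical stand-in for the connection client; it only records what it was asked to do.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def pause(topic, partition, offset = nil)
    @calls << [topic, partition, offset]
  end
end

# Simplified consumer-side pause: translates `:consecutive` into nil before delegating.
def sketch_pause(client, topic, partition, offset)
  offset = nil if offset == :consecutive
  client.pause(topic, partition, offset)
end

client = RecordingClient.new
sketch_pause(client, 'events', 0, 42)
sketch_pause(client, 'events', 0, :consecutive)

p client.calls
```

An explicit offset reaches the client unchanged, while `:consecutive` arrives as `nil`, which the client treats as "do not seek".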
@@ -160,16 +160,17 @@ module Karafka
  #
  # @param topic [String] topic name
  # @param partition [Integer] partition
- # @param offset [Integer] offset of the message on which we want to pause (this message will
- #   be reprocessed after getting back to processing)
+ # @param offset [Integer, nil] offset of the message on which we want to pause (this message
+ #   will be reprocessed after getting back to processing) or nil if we want to pause and
+ #   resume from the consecutive offset (+1 from the last message passed to us by librdkafka)
  # @note This will pause indefinitely and requires manual `#resume`
- def pause(topic, partition, offset)
+ # @note When `#internal_seek` is not involved (when offset is `nil`) we will not purge the
+ #   librdkafka buffers and continue from the last cursor offset
+ def pause(topic, partition, offset = nil)
    @mutex.synchronize do
      # Do not pause if the client got closed, would not change anything
      return if @closed

-     pause_msg = Messages::Seek.new(topic, partition, offset)
-
      internal_commit_offsets(async: true)

      # Here we do not use our cached tpls because we should not try to pause something we do
@@ -190,6 +191,14 @@ module Karafka
      @paused_tpls[topic][partition] = tpl

      @kafka.pause(tpl)
+
+     # If offset is not provided, will pause where it finished.
+     # This makes librdkafka not purge buffers and can provide significant network savings
+     # when we just want to pause before further processing without changing the offsets
+     return unless offset
+
+     pause_msg = Messages::Seek.new(topic, partition, offset)
+
      internal_seek(pause_msg)
    end
  end
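The client-side change above always pauses and only seeks when an offset is given. A minimal sketch of that control flow, assuming a hypothetical `SketchClient` that logs actions instead of talking to librdkafka (mutex handling, offset commits and topic partition lists are omitted):

```ruby
# Hypothetical, heavily simplified client: it logs actions instead of calling librdkafka.
class SketchClient
  attr_reader :log

  def initialize
    @log = []
  end

  # Always pauses; seeks only when an explicit offset was provided.
  def pause(topic, partition, offset = nil)
    @log << [:pause, topic, partition]

    # No offset means "pause on the consecutive offset": skip the seek entirely.
    return unless offset

    @log << [:seek, topic, partition, offset]
  end
end

client = SketchClient.new
client.pause('events', 0)
client.pause('events', 0, 7)

client.log.each { |entry| p entry }
```

Only the second call produces a `:seek` entry, which is the step the diff associates with purging the librdkafka buffers.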
@@ -354,6 +363,9 @@ module Karafka
  #
  # @param message [Messages::Message, Messages::Seek] message to which we want to seek to.
  #   It can have the time based offset.
+ #
+ # @note Will not invoke seeking if the desired seek would lead us to the current position.
+ #   This prevents us from flushing librdkafka buffer when it is not needed.
  def internal_seek(message)
    # If the seek message offset is in a time format, we need to find the closest "real"
    # offset matching before we seek
@@ -378,6 +390,14 @@ module Karafka
      message.offset = detected_partition&.offset || raise(Errors::InvalidTimeBasedOffsetError)
    end

+   # Never seek if we would get the same location as we would get without seeking
+   # This prevents us from the expensive buffer purges that can lead to increased network
+   # traffic and can cost a lot of money
+   #
+   # This code adds around 0.01 ms per seek but saves from many user unexpected behaviours in
+   # seeking and pausing
+   return if message.offset == topic_partition_position(message.topic, message.partition)
+
    @kafka.seek(message)
  end
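The guard added to `internal_seek` skips the seek when the target offset equals the current position. A minimal sketch of that idea, assuming a hypothetical in-memory `FakeKafka` (not the rdkafka API) that falls back to `-1` for unknown positions:

```ruby
# Hypothetical in-memory stand-in for the rdkafka consumer. It tracks one position per
# topic partition and counts how often a seek is actually performed.
class FakeKafka
  attr_reader :seeks

  def initialize(positions)
    @positions = positions
    @seeks = 0
  end

  # Returns the current position or -1 when it is not known.
  def position(topic, partition)
    @positions.fetch([topic, partition], -1)
  end

  def seek(topic, partition, offset)
    @seeks += 1
    @positions[[topic, partition]] = offset
  end
end

# Simplified guard: skip the seek when it would not move the position.
def sketch_seek(kafka, topic, partition, offset)
  return false if offset == kafka.position(topic, partition)

  kafka.seek(topic, partition, offset)
  true
end

kafka = FakeKafka.new({ ['events', 0] => 10 })

puts sketch_seek(kafka, 'events', 0, 10) # same position, seek skipped
puts sketch_seek(kafka, 'events', 0, 5)  # different position, seek performed
puts kafka.seeks
```

The counter shows that only the second call reached `seek`; the first was a no-op because the position already matched.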
@@ -432,6 +452,18 @@ module Karafka
    Rdkafka::Consumer::TopicPartitionList.new({ topic => [rdkafka_partition] })
  end

+ # @param topic [String]
+ # @param partition [Integer]
+ # @return [Integer] current position within topic partition or `-1` if it could not be
+ #   established. It may be `-1` in case we lost the assignment or we did not yet fetch data
+ #   for this topic partition
+ def topic_partition_position(topic, partition)
+   rd_partition = ::Rdkafka::Consumer::Partition.new(partition, nil, 0)
+   tpl = ::Rdkafka::Consumer::TopicPartitionList.new(topic => [rd_partition])
+
+   @kafka.position(tpl).to_h.fetch(topic).first.offset || -1
+ end
+
  # Performs a single poll operation and handles retries and error
  #
  # @param timeout [Integer] timeout for a single poll
@@ -71,6 +71,8 @@ module Karafka
  # Prints info about a consumer pause occurrence. Irrelevant if user or system initiated.
  #
  # @param event [Karafka::Core::Monitoring::Event] event details including payload
+ # @note There may be no offset provided in case user wants to pause on the consecutive offset
+ #   position. This can be beneficial when not wanting to purge the buffers.
  def on_client_pause(event)
    topic = event[:topic]
    partition = event[:partition]
@@ -78,7 +80,9 @@ module Karafka
    client = event[:caller]

    info <<~MSG.tr("\n", ' ').strip!
-     [#{client.id}] Pausing on topic #{topic}/#{partition} on offset #{offset}
+     [#{client.id}]
+     Pausing on topic #{topic}/#{partition}
+     on #{offset ? "offset #{offset}" : 'the consecutive offset'}
    MSG
  end
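The listener change above builds a single-line log message that differs depending on whether an offset is present. A minimal sketch of the formatting, assuming a hypothetical `pause_message` helper (not part of Karafka) and using the non-bang `strip` so a string is always returned:

```ruby
# Hypothetical helper mirroring the heredoc formatting from the listener.
# Newlines are turned into spaces and the trailing space is stripped.
def pause_message(client_id, topic, partition, offset)
  <<~MSG.tr("\n", ' ').strip
    [#{client_id}]
    Pausing on topic #{topic}/#{partition}
    on #{offset ? "offset #{offset}" : 'the consecutive offset'}
  MSG
end

puts pause_message('client-1', 'events', 0, 42)
puts pause_message('client-1', 'events', 0, nil)
```

With an offset the message ends in `on offset 42`; with `nil` it ends in `on the consecutive offset`.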
@@ -37,7 +37,7 @@ module Karafka
    super

    coordinator.on_enqueued do
-     pause(coordinator.seek_offset, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
+     pause(:consecutive, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
    end
  end
@@ -35,12 +35,10 @@ module Karafka
    # This ensures that when running LRJ with VP, things operate as expected run only
    # once for all the virtual partitions collectively
    coordinator.on_enqueued do
-     # Pause at the first message in a batch. That way in case of a crash, we will not
-     # loose any messages.
-     #
-     # For VP it applies the same way and since VP cannot be used with MOM we should not
-     # have any edge cases here.
-     pause(coordinator.seek_offset, MAX_PAUSE_TIME, false)
+     # Pause and continue with another batch in case of a regular resume.
+     # In case of an error, the `#retry_after_pause` will move the offset to the first
+     # message out of this batch.
+     pause(:consecutive, MAX_PAUSE_TIME, false)
    end
  end
@@ -35,12 +35,7 @@ module Karafka
    # This ensures that when running LRJ with VP, things operate as expected run only
    # once for all the virtual partitions collectively
    coordinator.on_enqueued do
-     # Pause at the first message in a batch. That way in case of a crash, we will not
-     # loose any messages.
-     #
-     # For VP it applies the same way and since VP cannot be used with MOM we should not
-     # have any edge cases here.
-     pause(coordinator.seek_offset, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
+     pause(:consecutive, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
    end
  end
@@ -50,6 +50,10 @@ module Karafka
    virtual_topic = public_send(:topic=, pattern.name, &block)
    # Indicate the nature of this topic (matcher)
    virtual_topic.patterns(active: true, type: :matcher, pattern: pattern)
+   # Pattern subscriptions should never be part of declarative topics definitions
+   # Since they are subscribed by regular expressions, we do not know the target
+   # topics names so we cannot manage them via declaratives
+   virtual_topic.config(active: false)
    pattern.topic = virtual_topic
    @patterns << pattern
  end
@@ -3,5 +3,5 @@
  # Main module namespace
  module Karafka
    # Current Karafka version
-   VERSION = '2.2.9'
+   VERSION = '2.2.10'
  end
data/renovate.json CHANGED
@@ -1,6 +1,9 @@
  {
    "$schema": "https://docs.renovatebot.com/renovate-schema.json",
    "extends": [
-     "config:base"
+     "config:base"
+   ],
+   "ignorePaths": [
+     "spec/integrations"
    ]
  }
data.tar.gz.sig CHANGED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: karafka
  version: !ruby/object:Gem::Version
-   version: 2.2.9
+   version: 2.2.10
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
    AnG1dJU+yL2BK7vaVytLTstJME5mepSZ46qqIJXMuWob/YPDmVaBF39TDSG9e34s
    msG3BiCqgOgHAnL23+CN3Rt8MsuRfEtoTKpJVcCfoEoNHOkc
    -----END CERTIFICATE-----
- date: 2023-10-24 00:00:00.000000000 Z
+ date: 2023-11-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: karafka-core
metadata.gz.sig CHANGED
Binary file