karafka 2.2.9 → 2.2.10

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 71ed12ebff017d2ea4ded639fc745c09c3edf198456b2eb3a4d6e3f1740be925
- data.tar.gz: de696bd36d0af383eed008492ab0fd71dc0395515b56fe050ca93765631cbcb9
+ metadata.gz: 460029e2dbc352c56f107e18bbff28f5914b5b56a0e42be9670e13f0f3592fc9
+ data.tar.gz: 68c25bacd1995a8e9c97a915e1e1b2b4435437c97a5e31e2824499c9ad531057
  SHA512:
- metadata.gz: 14be5b79b7e25ca006bc3d1eecd6706c0beaec38b67ec949e5fc064bc1b883468a0eedb7dabf10d61f9ecf197ec2f476bf413f9382368203437d171318269312
- data.tar.gz: c6fa615fe8eb7c36f5804421c392c13241d4fd0d3250039f08c0a146cb68cae1b58aac0a9e75c8019b7c13eb743867afd9df229063ee2d8e405e64464a2bb523
+ metadata.gz: adc877c3be3cb8885b0244168014387f0353bed89575b6931f5e1ca062d01ba6a0e5a0566da4dd282f5117c7daf27ff2116f27025ec4213972c1ca52e92980de
+ data.tar.gz: a62ff96ef96ea85d98288bad3c8f6cf0bfafc1ee6de1cf928fe803ad80435b0fa137aac9cf5eff4b232e854b3c4df76c3c16efaa4a117d6b86e442975cfc9a30
checksums.yaml.gz.sig CHANGED
Binary file
@@ -37,14 +37,15 @@ Here's an example:

  ```
  $ [bundle exec] karafka info
- Karafka version: 1.3.0
- Ruby version: 2.6.3
- Ruby-kafka version: 0.7.9
- Application client id: karafka-local
- Backend: inline
- Batch fetching: true
- Batch consuming: true
- Boot file: /app/karafka/karafka.rb
+ Karafka version: 2.2.10 + Pro
+ Ruby version: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
+ Rdkafka version: 0.13.8
+ Consumer groups count: 2
+ Subscription groups count: 2
+ Workers count: 2
+ Application client id: example_app
+ Boot file: /app/karafka.rb
  Environment: development
- Kafka seed brokers: ["kafka://kafka:9092"]
+ License: Commercial
+ License entity: karafka-ci
  ```
@@ -1,6 +1,8 @@
  name: ci

- concurrency: ci-${{ github.ref }}
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.ref }}
+   cancel-in-progress: true

  on:
    pull_request:
data/CHANGELOG.md CHANGED
@@ -1,5 +1,15 @@
  # Karafka framework changelog

+ ## 2.2.10 (2023-11-02)
+ - [Improvement] Allow for running `#pause` without specifying the offset (provide an offset or `:consecutive`). This allows pausing on the consecutive message (last received + 1), so after resume we will get the last received message + 1, effectively not using `#seek` and not purging the `librdkafka` buffer, which saves on networking. Please be mindful that this uses the notion of the last message passed from **librdkafka**, and not the last one available in the consumer (`messages.last`). While for regular cases they will be the same, when using things like DLQ, LRJs, VPs or the Filtering API, they may differ.
+ - [Improvement] **Drastically** improve network efficiency of operating with LRJ by using the `:consecutive` offset as the default strategy for running LRJs, without moving the offset in place and purging the data.
+ - [Improvement] Do not "seek in place". When pausing and/or seeking to the same location as the current position, do nothing, so we do not purge buffers and do not move to the place where we already are.
+ - [Fix] Pattern regexps should not be part of declaratives even when configured.
+
+ ### Upgrade Notes
+
+ In the latest Karafka release there are no breaking changes. However, please note the updates to `#pause` and `#seek`. If you spot any issues, please report them immediately. Your feedback is crucial.
+
  ## 2.2.9 (2023-10-24)
  - [Improvement] Allow using negative offset references in `Karafka::Admin#read_topic`.
  - [Change] Make sure that WaterDrop `2.6.10` or higher is used with this release to support transactions fully and the Web-UI.
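The `:consecutive` note above can be illustrated with a toy model. This is plain Ruby, not Karafka's API; the offsets and the "filtered" message are made up for illustration:

```ruby
# Offsets librdkafka passed to the framework for one partition
delivered = [10, 11, 12, 13]

# Suppose an advanced feature (e.g. the Filtering API) dropped offset 13,
# so the consumer only "sees" a subset in `messages`
visible = delivered.reject { |o| o == 13 }

# :consecutive resumes at last-delivered + 1 (librdkafka's notion) ...
consecutive_resume = delivered.last + 1 # => 14

# ... while `messages.last + 1` would resume earlier and re-fetch data
naive_resume = visible.last + 1 # => 13
```

This is why the changelog warns that the consecutive offset and `messages.last` only coincide in the regular (no filtering, no DLQ) case.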
@@ -11,7 +21,7 @@
  - [Refactor] Reorganize how rebalance events are propagated from `librdkafka` to Karafka. Replace `connection.client.rebalance_callback` with `rebalance.partitions_assigned` and `rebalance.partitions_revoked`. Introduce two extra events: `rebalance.partitions_assign` and `rebalance.partitions_revoke` to handle pre-rebalance future work.
  - [Refactor] Remove `thor` as a CLI layer and rely on Ruby `OptParser`

- ### Upgrade notes
+ ### Upgrade Notes

  1. Unless you were using `connection.client.rebalance_callback` which was considered private, nothing.
  2. None of the CLI commands should change but `thor` has been removed so please report if you find any bugs.
@@ -53,7 +63,7 @@
  - [Enhancement] Make sure that consumer group used by `Karafka::Admin` obeys the `ConsumerMapper` setup.
  - [Fix] Fix a case where subscription group would not accept a symbol name.

- ### Upgrade notes
+ ### Upgrade Notes

  As always, please make sure you have upgraded to the most recent version of `2.1` before upgrading to `2.2`.

@@ -184,7 +194,7 @@ If you want to maintain the `2.1` behavior, that is `karafka_admin` admin group,
  - [Change] Removed `Karafka::Pro::BaseConsumer` in favor of `Karafka::BaseConsumer`. (#1345)
  - [Fix] Fix for `max_messages` and `max_wait_time` not having reference in errors.yml (#1443)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Upgrade to Karafka `2.0.41` prior to upgrading to `2.1.0`.
  2. Replace `Karafka::Pro::BaseConsumer` references to `Karafka::BaseConsumer`.
@@ -256,7 +266,7 @@ If you want to maintain the `2.1` behavior, that is `karafka_admin` admin group,
  - [Improvement] Attach an `embedded` tag to Karafka processes started using the embedded API.
  - [Change] Renamed `Datadog::Listener` to `Datadog::MetricsListener` for consistency. (#1124)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Replace `Datadog::Listener` references to `Datadog::MetricsListener`.

@@ -270,7 +280,7 @@ If you want to maintain the `2.1` behavior, that is `karafka_admin` admin group,
  - [Improvement] Introduce a `strict_topics_namespacing` config option to enable/disable the strict topics naming validations. This can be useful when working with pre-existing topics which we cannot or do not want to rename.
  - [Fix] Karafka monitor is prematurely cached (#1314)

- ### Upgrade notes
+ ### Upgrade Notes

  Since `#tags` were introduced on consumers, the `#tags` method is now part of the consumers API.

@@ -332,7 +342,7 @@ end
  - [Improvement] Include `original_consumer_group` in the DLQ dispatched messages in Pro.
  - [Improvement] Use Karafka `client_id` as kafka `client.id` value by default

- ### Upgrade notes
+ ### Upgrade Notes

  If you want to continue to use `karafka` as default for kafka `client.id`, assign it manually:

@@ -385,7 +395,7 @@ class KarafkaApp < Karafka::App
  - [Fix] Do not trigger code reloading when `consumer_persistence` is enabled.
  - [Fix] Shutdown producer after all the consumer components are down and the status is stopped. This will ensure, that any instrumentation related Kafka messaging can still operate.

- ### Upgrade notes
+ ### Upgrade Notes

  If you want to disable `librdkafka` statistics because you do not use them at all, update the `kafka` `statistics.interval.ms` setting and set it to `0`:

@@ -447,7 +457,7 @@ end
  - [Fix] Few typos around DLQ and Pro DLQ Dispatch original metadata naming.
  - [Fix] Narrow the components lookup to the appropriate scope (#1114)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Replace `original-*` references from DLQ dispatched metadata with `original_*`

@@ -490,7 +500,7 @@ end
  - [Specs] Split specs into regular and pro to simplify how resources are loaded
  - [Specs] Add specs to ensure, that all the Pro components have a proper per-file license (#1099)

- ### Upgrade notes
+ ### Upgrade Notes

  1. Remove the `manual_offset_management` setting from the main config if you use it:

data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- karafka (2.2.9)
+ karafka (2.2.10)
  karafka-core (>= 2.2.2, < 2.3.0)
  waterdrop (>= 2.6.10, < 3.0.0)
  zeitwerk (~> 2.3)
@@ -39,24 +39,24 @@ GEM
  activesupport (>= 6.1)
  i18n (1.14.1)
  concurrent-ruby (~> 1.0)
- karafka-core (2.2.3)
+ karafka-core (2.2.5)
  concurrent-ruby (>= 1.1)
- karafka-rdkafka (>= 0.13.6, < 0.14.0)
- karafka-rdkafka (0.13.6)
+ karafka-rdkafka (>= 0.13.8, < 0.15.0)
+ karafka-rdkafka (0.13.8)
  ffi (~> 1.15)
  mini_portile2 (~> 2.6)
  rake (> 12)
- karafka-web (0.7.7)
+ karafka-web (0.7.10)
  erubi (~> 1.4)
- karafka (>= 2.2.8.beta1, < 3.0.0)
- karafka-core (>= 2.2.2, < 3.0.0)
+ karafka (>= 2.2.9, < 3.0.0)
+ karafka-core (>= 2.2.4, < 3.0.0)
  roda (~> 3.68, >= 3.69)
  tilt (~> 2.0)
  mini_portile2 (2.8.5)
  minitest (5.20.0)
  mutex_m (0.1.2)
  rack (3.0.8)
- rake (13.0.6)
+ rake (13.1.0)
  roda (3.73.0)
  rack
  rspec (3.12.0)
@@ -82,7 +82,7 @@ GEM
  tilt (2.3.0)
  tzinfo (2.0.6)
  concurrent-ruby (~> 1.0)
- waterdrop (2.6.10)
+ waterdrop (2.6.11)
  karafka-core (>= 2.2.3, < 3.0.0)
  zeitwerk (~> 2.3)
  zeitwerk (2.6.12)
@@ -171,23 +171,28 @@ module Karafka
  @used
  end

- # Pauses processing on a given offset for the current topic partition
+ # Pauses processing on a given offset or consecutive offset for the current topic partition
  #
  # After given partition is resumed, it will continue processing from the given offset
- # @param offset [Integer] offset from which we want to restart the processing
+ # @param offset [Integer, Symbol] offset from which we want to restart the processing or
+ #   `:consecutive` if we want to pause and continue without changing the consecutive offset
+ #   (cursor position)
  # @param timeout [Integer, nil] how long in milliseconds do we want to pause or nil to use the
  #   default exponential pausing strategy defined for retries
  # @param manual_pause [Boolean] Flag to differentiate between user pause and system/strategy
  #   based pause. While they both pause in exactly the same way, the strategy application
  #   may need to differentiate between them.
+ #
+ # @note It is **critical** to understand how pause with `:consecutive` offset operates. While
+ #   it provides benefit of not purging librdkafka buffer, in case of usage of filters, retries
+ #   or other advanced options the consecutive offset may not be the one you want to pause on.
+ #   Test it well to ensure, that this behaviour is expected by you.
  def pause(offset, timeout = nil, manual_pause = true)
    timeout ? coordinator.pause_tracker.pause(timeout) : coordinator.pause_tracker.pause

- client.pause(
-   topic.name,
-   partition,
-   offset
- )
+ offset = nil if offset == :consecutive
+
+ client.pause(topic.name, partition, offset)

  # Indicate, that user took a manual action of pausing
  coordinator.manual_pause if manual_pause
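A minimal sketch of the hunk above, showing how the `:consecutive` sentinel degrades to `nil` before reaching the client layer. This is plain Ruby with a hypothetical `StubClient`, not Karafka's real classes:

```ruby
# Hypothetical stand-in for Karafka's client; just records what it was asked to do
class StubClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def pause(topic, partition, offset)
    @calls << [topic, partition, offset]
  end
end

# Mirrors the consumer-side logic from the diff: :consecutive becomes nil,
# which later tells the client to skip the seek (and the buffer purge)
def pause(client, topic, partition, offset)
  offset = nil if offset == :consecutive
  client.pause(topic, partition, offset)
end

client = StubClient.new
pause(client, 'events', 0, 100)          # explicit offset: a seek will follow
pause(client, 'events', 0, :consecutive) # consecutive: nil offset, no seek

client.calls # => [["events", 0, 100], ["events", 0, nil]]
```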
@@ -160,16 +160,17 @@ module Karafka
  #
  # @param topic [String] topic name
  # @param partition [Integer] partition
- # @param offset [Integer] offset of the message on which we want to pause (this message will
- #   be reprocessed after getting back to processing)
+ # @param offset [Integer, nil] offset of the message on which we want to pause (this message
+ #   will be reprocessed after getting back to processing) or nil if we want to pause and
+ #   resume from the consecutive offset (+1 from the last message passed to us by librdkafka)
  # @note This will pause indefinitely and requires manual `#resume`
- def pause(topic, partition, offset)
+ # @note When `#internal_seek` is not involved (when offset is `nil`) we will not purge the
+ #   librdkafka buffers and continue from the last cursor offset
+ def pause(topic, partition, offset = nil)
  @mutex.synchronize do
  # Do not pause if the client got closed, would not change anything
  return if @closed

- pause_msg = Messages::Seek.new(topic, partition, offset)
-
  internal_commit_offsets(async: true)

  # Here we do not use our cached tpls because we should not try to pause something we do
@@ -190,6 +191,14 @@ module Karafka
  @paused_tpls[topic][partition] = tpl

  @kafka.pause(tpl)
+
+ # If offset is not provided, will pause where it finished.
+ # This makes librdkafka not purge buffers and can provide significant network savings
+ # when we just want to pause before further processing without changing the offsets
+ return unless offset
+
+ pause_msg = Messages::Seek.new(topic, partition, offset)
+
  internal_seek(pause_msg)
  end
  end
@@ -354,6 +363,9 @@ module Karafka
  #
  # @param message [Messages::Message, Messages::Seek] message to which we want to seek to.
  #   It can have the time based offset.
+ #
+ # @note Will not invoke seeking if the desired seek would lead us to the current position.
+ #   This prevents us from flushing librdkafka buffer when it is not needed.
  def internal_seek(message)
  # If the seek message offset is in a time format, we need to find the closest "real"
  # offset matching before we seek
@@ -378,6 +390,14 @@ module Karafka
  message.offset = detected_partition&.offset || raise(Errors::InvalidTimeBasedOffsetError)
  end

+ # Never seek if we would get the same location as we would get without seeking
+ # This prevents us from the expensive buffer purges that can lead to increased network
+ # traffic and can cost a lot of money
+ #
+ # This code adds around 0.01 ms per seek but saves from many user unexpected behaviours in
+ # seeking and pausing
+ return if message.offset == topic_partition_position(message.topic, message.partition)
+
  @kafka.seek(message)
  end

@@ -432,6 +452,18 @@ module Karafka
  Rdkafka::Consumer::TopicPartitionList.new({ topic => [rdkafka_partition] })
  end

+ # @param topic [String]
+ # @param partition [Integer]
+ # @return [Integer] current position within topic partition or `-1` if it could not be
+ #   established. It may be `-1` in case we lost the assignment or we did not yet fetch data
+ #   for this topic partition
+ def topic_partition_position(topic, partition)
+   rd_partition = ::Rdkafka::Consumer::Partition.new(partition, nil, 0)
+   tpl = ::Rdkafka::Consumer::TopicPartitionList.new(topic => [rd_partition])
+
+   @kafka.position(tpl).to_h.fetch(topic).first.offset || -1
+ end
+
  # Performs a single poll operation and handles retries and error
  #
  # @param timeout [Integer] timeout for a single poll
@@ -71,6 +71,8 @@ module Karafka
  # Prints info about a consumer pause occurrence. Irrelevant if user or system initiated.
  #
  # @param event [Karafka::Core::Monitoring::Event] event details including payload
+ # @note There may be no offset provided in case user wants to pause on the consecutive offset
+ #   position. This can be beneficial when not wanting to purge the buffers.
  def on_client_pause(event)
  topic = event[:topic]
  partition = event[:partition]
@@ -78,7 +80,9 @@ module Karafka
  client = event[:caller]

  info <<~MSG.tr("\n", ' ').strip!
- [#{client.id}] Pausing on topic #{topic}/#{partition} on offset #{offset}
+ [#{client.id}]
+ Pausing on topic #{topic}/#{partition}
+ on #{offset ? "offset #{offset}" : 'the consecutive offset'}
  MSG
  end

@@ -37,7 +37,7 @@ module Karafka
  super

  coordinator.on_enqueued do
- pause(coordinator.seek_offset, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
+ pause(:consecutive, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
  end
  end

@@ -35,12 +35,10 @@ module Karafka
  # This ensures that when running LRJ with VP, things operate as expected run only
  # once for all the virtual partitions collectively
  coordinator.on_enqueued do
- # Pause at the first message in a batch. That way in case of a crash, we will not
- # loose any messages.
- #
- # For VP it applies the same way and since VP cannot be used with MOM we should not
- # have any edge cases here.
- pause(coordinator.seek_offset, MAX_PAUSE_TIME, false)
+ # Pause and continue with another batch in case of a regular resume.
+ # In case of an error, the `#retry_after_pause` will move the offset to the first
+ # message out of this batch.
+ pause(:consecutive, MAX_PAUSE_TIME, false)
  end
  end

@@ -35,12 +35,7 @@ module Karafka
  # This ensures that when running LRJ with VP, things operate as expected run only
  # once for all the virtual partitions collectively
  coordinator.on_enqueued do
- # Pause at the first message in a batch. That way in case of a crash, we will not
- # loose any messages.
- #
- # For VP it applies the same way and since VP cannot be used with MOM we should not
- # have any edge cases here.
- pause(coordinator.seek_offset, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
+ pause(:consecutive, Strategies::Lrj::Default::MAX_PAUSE_TIME, false)
  end
  end

@@ -50,6 +50,10 @@ module Karafka
  virtual_topic = public_send(:topic=, pattern.name, &block)
  # Indicate the nature of this topic (matcher)
  virtual_topic.patterns(active: true, type: :matcher, pattern: pattern)
+ # Pattern subscriptions should never be part of declarative topics definitions
+ # Since they are subscribed by regular expressions, we do not know the target
+ # topics names so we cannot manage them via declaratives
+ virtual_topic.config(active: false)
  pattern.topic = virtual_topic
  @patterns << pattern
  end
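The declaratives exclusion above can be sketched as a simple filter: topics backing a regexp pattern are flagged inactive for declarative management, so only concrete topics get created/migrated. The `Topic` struct and names here are illustrative, not Karafka's real topic model:

```ruby
# Illustrative topic record: name plus a flag for declarative management
Topic = Struct.new(:name, :declaratives_active)

topics = [
  Topic.new('events', true),                 # concrete topic: manageable
  Topic.new('karafka-pattern-1a2b', false)   # matcher topic backing a regexp
]

# Declarative topic management only considers active entries, so the
# regexp-backed matcher topic is never created or migrated
manageable = topics.select(&:declaratives_active).map(&:name)
manageable # => ["events"]
```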
@@ -3,5 +3,5 @@
  # Main module namespace
  module Karafka
  # Current Karafka version
- VERSION = '2.2.9'
+ VERSION = '2.2.10'
  end
data/renovate.json CHANGED
@@ -1,6 +1,9 @@
  {
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": [
- "config:base"
+ "config:base"
+ ],
+ "ignorePaths": [
+ "spec/integrations"
  ]
  }
data.tar.gz.sig CHANGED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: karafka
  version: !ruby/object:Gem::Version
- version: 2.2.9
+ version: 2.2.10
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
  AnG1dJU+yL2BK7vaVytLTstJME5mepSZ46qqIJXMuWob/YPDmVaBF39TDSG9e34s
  msG3BiCqgOgHAnL23+CN3Rt8MsuRfEtoTKpJVcCfoEoNHOkc
  -----END CERTIFICATE-----
- date: 2023-10-24 00:00:00.000000000 Z
+ date: 2023-11-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: karafka-core
metadata.gz.sig CHANGED
Binary file