karafka 2.0.37 → 2.0.38

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: bab9c1d7bc952b4ecbfc4fad794d7e7c861cd3a332cc5d9058cef6c0bd9b57cb
- data.tar.gz: 7662bd8dc5748d9112f3c72b2912619534e45750188f78df2f69a7e6ae1f9c31
+ metadata.gz: 1b9653385cf5a3b1e27eae06d53b9761c9a1f265252f721773258459eb3df1e7
+ data.tar.gz: c0af983ab0539e8463bf2612068a6b261de1325078c3e8600b0d6df0f596d100
  SHA512:
- metadata.gz: 9a99a84d538a74bd27d5a0f585a12dbbe67eb76ab63cc1a0984cbe1562f230070ad482418f85393a3a479e81534a0957a0863c91f3a7f5b6433f74efd317c79e
- data.tar.gz: 7da6129cd795f65d821bae897864648e4a5e37c0d07e8745f110f0d03a23d688d7717ab6c42652c4660fe3be3d26ee7051a8c2c66c1fcb62834a1c4159bd4ac4
+ metadata.gz: d9000a8f71d7fff762db5f567956f6ea68e436b428014c509ae233730c9f75fd6ac311e51b0022999dfdce64362c86dab6912ce549378d9def231e5749961140
+ data.tar.gz: f980261b5ada2f46efbf919aac86ab63da5bccce26639b9e7d98c07c6012cc3c727189a548627687092ee2802aca8df3d5459bcdcc8d9d29b35f2d6da92a64fc
checksums.yaml.gz.sig CHANGED
Binary file
data/CHANGELOG.md CHANGED
@@ -1,5 +1,11 @@
  # Karafka framework changelog

+ ## 2.0.38 (2023-03-27)
+ - [Improvement] Introduce `Karafka::Admin#read_watermark_offsets` to get low and high watermark offset values.
+ - [Improvement] Track active_job_id in instrumentation (#1372)
+ - [Improvement] Improve `#read_topic` reading in case of a compacted partition where the offset is below the low watermark offset. This optimizes reading and does not go beyond the low watermark offset.
+ - [Improvement] Allow `#read_topic` to accept instance settings to overwrite any settings needed to customize reading behaviours.
+
  ## 2.0.37 (2023-03-20)
  - [Fix] Declarative topics execution on a secondary cluster runs topics creation on the primary one (#1365)
  - [Fix] Admin read operations commit offset when not needed (#1369)
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
- karafka (2.0.37)
+ karafka (2.0.38)
  karafka-core (>= 2.0.12, < 3.0.0)
  thor (>= 0.20)
  waterdrop (>= 2.4.10, < 3.0.0)
@@ -10,10 +10,10 @@ PATH
  GEM
    remote: https://rubygems.org/
    specs:
- activejob (7.0.4.2)
- activesupport (= 7.0.4.2)
+ activejob (7.0.4.3)
+ activesupport (= 7.0.4.3)
  globalid (>= 0.3.6)
- activesupport (7.0.4.2)
+ activesupport (7.0.4.3)
  concurrent-ruby (~> 1.0, >= 1.0.2)
  i18n (>= 1.6, < 2)
  minitest (>= 5.1)
data/README.md CHANGED
@@ -86,7 +86,7 @@ bundle exec karafka server

  I also sell Karafka Pro subscriptions. It includes a commercial-friendly license, priority support, architecture consultations, enhanced Web UI and high throughput data processing-related features (virtual partitions, long-running jobs, and more).

- **20%** of the income will be distributed back to other OSS projects that Karafka uses under the hood.
+ **10%** of the income will be distributed back to other OSS projects that Karafka uses under the hood.

  Help me provide high-quality open-source software. Please see the Karafka [homepage](https://karafka.io/#become-pro) for more details.

@@ -12,16 +12,31 @@ module Karafka
  messages.each do |message|
  break if Karafka::App.stopping?

- # We technically speaking could set this as deserializer and reference it from the
- # message instead of using the `#raw_payload`. This is not done on purpose to simplify
- # the ActiveJob setup here
- job = ::ActiveSupport::JSON.decode(message.raw_payload)
+ consume_job(message)

- tags.add(:job_class, job['job_class'])
+ mark_as_consumed(message)
+ end
+ end

- ::ActiveJob::Base.execute(job)
+ private

- mark_as_consumed(message)
+ # Consumes a message with the job and runs needed instrumentation
+ #
+ # @param job_message [Karafka::Messages::Message] message with active job
+ def consume_job(job_message)
+ # We technically speaking could set this as deserializer and reference it from the
+ # message instead of using the `#raw_payload`. This is not done on purpose to simplify
+ # the ActiveJob setup here
+ job = ::ActiveSupport::JSON.decode(job_message.raw_payload)
+
+ tags.add(:job_class, job['job_class'])
+
+ payload = { caller: self, job: job, message: job_message }
+
+ # We publish both to make it consistent with `consumer.x` events
+ Karafka.monitor.instrument('active_job.consume', payload)
+ Karafka.monitor.instrument('active_job.consumed', payload) do
+ ::ActiveJob::Base.execute(job)
  end
  end
  end
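The refactored consumer above publishes a bare `active_job.consume` event before execution and wraps the execution itself in `active_job.consumed`, mirroring the `consumer.x` event pairs. The sketch below illustrates that publish-both pattern with a minimal plain-Ruby event bus; `TinyMonitor` is a stand-in invented for this example, not the actual Karafka::Core monitor.

```ruby
# Minimal illustrative event bus: subscribers registered per event name,
# `instrument` optionally runs a block and then notifies subscribers.
class TinyMonitor
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event, &block)
    @subscribers[event] << block
  end

  # Runs the given block (if any), then notifies subscribers with the payload
  def instrument(event, payload)
    result = yield if block_given?
    @subscribers[event].each { |subscriber| subscriber.call(payload) }
    result
  end
end

monitor = TinyMonitor.new
order = []
monitor.subscribe('active_job.consume') { |_payload| order << :before }
monitor.subscribe('active_job.consumed') { |_payload| order << :after }

payload = { job: { 'job_class' => 'MyJob' } }

# Publish both, as the consumer does: a pre-execution event plus a wrapping one
monitor.instrument('active_job.consume', payload)
monitor.instrument('active_job.consumed', payload) do
  order << :execute # stands in for ::ActiveJob::Base.execute(job)
end

order # => [:before, :execute, :after]
```

Subscribers can therefore hook either the moment consumption starts or the full execution span, which is what makes tracking things like `active_job_id` (#1372) possible without touching the job code.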
data/lib/karafka/admin.rb CHANGED
@@ -44,17 +44,32 @@ module Karafka
  # @param count [Integer] how many messages we want to get at most
  # @param start_offset [Integer] offset from which we should start. If -1 is provided
  # (default) we will start from the latest offset
+ # @param settings [Hash] kafka extra settings (optional)
  #
  # @return [Array<Karafka::Messages::Message>] array with messages
- def read_topic(name, partition, count, start_offset = -1)
+ def read_topic(name, partition, count, start_offset = -1, settings = {})
  messages = []
  tpl = Rdkafka::Consumer::TopicPartitionList.new
+ low_offset, high_offset = nil

- with_consumer do |consumer|
- offsets = consumer.query_watermark_offsets(name, partition)
- end_offset = offsets.last
+ with_consumer(settings) do |consumer|
+ low_offset, high_offset = consumer.query_watermark_offsets(name, partition)
+
+ # Select offset dynamically if -1 or less
+ start_offset = high_offset - count if start_offset.negative?

- start_offset = [0, offsets.last - count].max if start_offset.negative?
+ # Build the requested range - since first element is on the start offset we need to
+ # subtract one from requested count to end up with expected number of elements
+ requested_range = (start_offset..start_offset + (count - 1))
+ # Establish theoretical available range. Note, that this does not handle cases related to
+ # log retention or compaction
+ available_range = (low_offset..high_offset)
+ # Select only offset that we can select. This will remove all the potential offsets that
+ # are below the low watermark offset
+ possible_range = requested_range.select { |offset| available_range.include?(offset) }
+
+ start_offset = possible_range.first
+ count = possible_range.count

  tpl.add_topic_and_partitions_with_offsets(name, partition => start_offset)
  consumer.assign(tpl)
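The new range-clamping step above is the core of the compacted-partition fix: the requested offsets are intersected with the partition's watermark range so reads never start below the low watermark. A standalone sketch of that logic (the helper name `clamp_read_range` is illustrative, not part of the Karafka API):

```ruby
# Given the partition watermarks, trim a requested read range so it never
# dips below the low watermark (e.g. after compaction or retention removed
# early offsets). Returns the effective [start_offset, count].
def clamp_read_range(low_offset, high_offset, count, start_offset)
  # Pick the start dynamically when a negative offset (e.g. -1) was requested
  start_offset = high_offset - count if start_offset.negative?

  # First element sits on start_offset itself, hence `count - 1`
  requested_range = (start_offset..start_offset + (count - 1))
  # Theoretical available range between the watermarks
  available_range = (low_offset..high_offset)
  # Keep only offsets that actually exist in the partition
  possible_range = requested_range.select { |offset| available_range.include?(offset) }

  [possible_range.first, possible_range.count]
end

# Partition with watermarks 50..100: asking for 10 messages starting at
# offset 45 yields only the 5 offsets (50..54) still readable
clamp_read_range(50, 100, 10, 45) # => [50, 5]

# Default -1 start resolves to the last `count` offsets
clamp_read_range(0, 100, 10, -1)  # => [90, 10]
```

In the real method the resulting `possible_range` is also used in the poll loop to stop once a received message falls outside the requested window.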
@@ -64,11 +79,15 @@ module Karafka
  loop do
  # If we've got as many messages as we've wanted stop
  break if messages.size >= count
- # If we've reached end of the topic messages, don't process more
- break if !messages.empty? && end_offset <= messages.last.offset

  message = consumer.poll(200)
- messages << message if message
+
+ next unless message
+
+ # If the message we've got is beyond the requested range, stop
+ break unless possible_range.include?(message.offset)
+
+ messages << message
  rescue Rdkafka::RdkafkaError => e
  # End of partition
  break if e.code == :partition_eof
@@ -77,7 +96,7 @@ module Karafka
  end
  end

- messages.map do |message|
+ messages.map! do |message|
  Messages::Builders::Message.call(
  message,
  # Use topic from routes if we can match it or create a dummy one
@@ -136,6 +155,17 @@ module Karafka
  end
  end

+ # Fetches the watermark offsets for a given topic partition
+ #
+ # @param name [String, Symbol] topic name
+ # @param partition [Integer] partition
+ # @return [Array<Integer, Integer>] low watermark offset and high watermark offset
+ def read_watermark_offsets(name, partition)
+ with_consumer do |consumer|
+ consumer.query_watermark_offsets(name, partition)
+ end
+ end
+
  # @return [Rdkafka::Metadata] cluster metadata info
  def cluster_info
  with_admin do |admin|
@@ -159,15 +189,16 @@ module Karafka

  # Creates admin instance and yields it. After usage it closes the admin instance
  def with_admin
- admin = config(:producer).admin
+ admin = config(:producer, {}).admin
  yield(admin)
  ensure
  admin&.close
  end

  # Creates consumer instance and yields it. After usage it closes the consumer instance
- def with_consumer
- consumer = config(:consumer).consumer
+ # @param settings [Hash] extra settings to customize consumer
+ def with_consumer(settings = {})
+ consumer = config(:consumer, settings).consumer
  yield(consumer)
  ensure
  consumer&.close
@@ -196,11 +227,12 @@ module Karafka
  end

  # @param type [Symbol] type of config we want
+ # @param settings [Hash] extra settings for config (if needed)
  # @return [::Rdkafka::Config] rdkafka config
- def config(type)
+ def config(type, settings)
  config_hash = Karafka::Setup::AttributesMap.public_send(
  type,
- Karafka::App.config.kafka.dup.merge(CONFIG_DEFAULTS)
+ Karafka::App.config.kafka.dup.merge(CONFIG_DEFAULTS).merge!(settings)
  )

  ::Rdkafka::Config.new(config_hash)
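The precedence in the updated `#config` is: app-level kafka config, then the admin defaults, then caller-supplied settings, with later merges winning. A plain-Ruby sketch of that merge chain; the hash keys and the `CONFIG_DEFAULTS` values here are illustrative stand-ins, not necessarily the gem's actual defaults:

```ruby
# Illustrative admin defaults (the real CONFIG_DEFAULTS lives in Karafka::Admin)
CONFIG_DEFAULTS = {
  'group.id' => 'karafka_admin',
  'enable.auto.commit' => false
}.freeze

# Mirrors the merge chain: base config, then defaults, then per-call settings
def admin_config_hash(app_kafka, settings)
  app_kafka.dup.merge(CONFIG_DEFAULTS).merge!(settings)
end

app_kafka = {
  'bootstrap.servers' => 'localhost:9092',
  'enable.auto.commit' => true
}

# Defaults override the app config ('enable.auto.commit' becomes false),
# and the per-call settings override everything else
cfg = admin_config_hash(app_kafka, { 'statistics.interval.ms' => 100 })
```

This is what lets `read_topic(name, partition, count, start_offset, settings)` customize a single read (via `with_consumer(settings)`) without mutating the global Karafka configuration.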
@@ -17,6 +17,9 @@ module Karafka
  # complete list of all the events. Please use the #available_events on fully loaded
  # Karafka system to determine all of the events you can use.
  EVENTS = %w[
+ active_job.consume
+ active_job.consumed
+
  app.initialized
  app.running
  app.quieting
@@ -22,7 +22,7 @@ module Karafka
  #
  # It contains slightly better revocation warranties than the regular blocking consumer as
  # it can stop processing batch of jobs in the middle after the revocation.
- class Consumer < Karafka::Pro::BaseConsumer
+ class Consumer < ::Karafka::ActiveJob::Consumer
  # Runs ActiveJob jobs processing and handles lrj if needed
  def consume
  messages.each do |message|
@@ -31,11 +31,7 @@ module Karafka
  break if revoked?
  break if Karafka::App.stopping?

- job = ::ActiveSupport::JSON.decode(message.raw_payload)
-
- tags.add(:job_class, job['job_class'])
-
- ::ActiveJob::Base.execute(job)
+ consume_job(message)

  # We cannot mark jobs as done after each if there are virtual partitions. Otherwise
  # this could create random markings.
@@ -3,5 +3,5 @@
  # Main module namespace
  module Karafka
  # Current Karafka version
- VERSION = '2.0.37'
+ VERSION = '2.0.38'
  end
data.tar.gz.sig CHANGED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: karafka
  version: !ruby/object:Gem::Version
- version: 2.0.37
+ version: 2.0.38
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
  Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
  MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
  -----END CERTIFICATE-----
- date: 2023-03-20 00:00:00.000000000 Z
+ date: 2023-03-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: karafka-core
metadata.gz.sig CHANGED
Binary file