karafka 2.0.4 → 2.0.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8f065ff811caf7ed4cf8c1b39c09c5936bc06ef99f51ee8040ed00164dcffbe6
- data.tar.gz: 878c4a53feaaa8587334cfb7c6c19d370ffaf03b6d67ef6ddeb50699da4d7322
+ metadata.gz: 0abed3f97a58be6b48f640468f7d7e6d48bc0960596b21d022b4616dd047be28
+ data.tar.gz: 48143253beee640e25e47a81474767c179e715e855d6173b59566483a57af5a8
  SHA512:
- metadata.gz: 71794fc1da73605fe6cf40d1826d48294f436a61bfec46c03dce370b5093cefa5dec09b3eba8e33434dd13fe0daa57ff1b634e2e42de887075c0f29d820b0122
- data.tar.gz: 17f1dc66520907e04a876cd86995ff2619fe46c995a239f6ffe61045cd284d82a24ebcb276501b7fa669163a99c45edb4738fdbbc6f59660a4752fca658ac7c7
+ metadata.gz: 9c9f8c170ac82fc0f1eb6ea41698dcd82cc525006931a59443d004c94eb18b56ffcb67eb1eb45fcc1fd557fee22e6e63ceb7a8a001245469e3e574d87c88c8e8
+ data.tar.gz: 47bc7e7dfe5ca3d503a3cb18da4e4b95c076197dc26b5633195e169d3f4d94da4effaf27bd4360ddff1481031b1ee20f61e465e24f6984570f6067ca4fbd51ea
checksums.yaml.gz.sig CHANGED
Binary file
data/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
  # Karafka framework changelog

+ ## 2.0.7 (Unreleased)
+ - [Breaking change] Redefine the Virtual Partitions routing DSL to accept concurrency
+ - Allow for `concurrency` setting in Virtual Partitions to extend or limit number of jobs per regular partition. This allows to make sure, we do not use all the threads on virtual partitions jobs
+ - Allow for creation of as many Virtual Partitions as needed, without taking global `concurrency` into consideration
+
+ ## 2.0.6 (2022-09-02)
+ - Improve client closing.
+ - Fix for: Multiple LRJ topics fetched concurrently block ability for LRJ to kick in (#1002)
+ - Introduce a pre-enqueue sync execution layer to prevent starvation cases for LRJ
+ - Close admin upon critical errors to prevent segmentation faults
+ - Add support for manual subscription group management (#852)
+
+ ## 2.0.5 (2022-08-23)
+ - Fix unnecessary double new line in the `karafka.rb` template for Ruby on Rails
+ - Fix a case where a manually paused partition would not be processed after rebalance (#988)
+ - Increase specs stability.
+ - Lower concurrency of execution of specs in Github CI.
+
  ## 2.0.4 (2022-08-19)
  - Fix hanging topic creation (#964)
  - Fix conflict with other Rails loading libraries like `gruf` (#974)
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- karafka (2.0.4)
+ karafka (2.0.7)
  karafka-core (>= 2.0.2, < 3.0.0)
  rdkafka (>= 0.12)
  thor (>= 0.20)
data/README.md CHANGED
@@ -8,12 +8,12 @@

  Karafka is a Ruby and Rails multi-threaded efficient Kafka processing framework that:

- - Supports parallel processing in [multiple threads](https://github.com/karafka/karafka/wiki/Concurrency-and-multithreading) (also for a [single topic partition](https://github.com/karafka/karafka/wiki/Pro-Virtual-Partitions) work)
- - Has [ActiveJob backend](https://github.com/karafka/karafka/wiki/Active-Job) support (including [ordered jobs](https://github.com/karafka/karafka/wiki/Pro-Enhanced-Active-Job#ordered-jobs))
- - [Automatically integrates](https://github.com/karafka/karafka/wiki/Integrating-with-Ruby-on-Rails-and-other-frameworks#integrating-with-ruby-on-rails=) with Ruby on Rails
- - Supports in-development [code reloading](https://github.com/karafka/karafka/wiki/Auto-reload-of-code-changes-in-development)
+ - Supports parallel processing in [multiple threads](https://karafka.io/docs/Concurrency-and-multithreading) (also for a [single topic partition](https://karafka.io/docs/Pro-Virtual-Partitions) work)
+ - Has [ActiveJob backend](https://karafka.io/docs/Active-Job) support (including [ordered jobs](https://karafka.io/docs/Pro-Enhanced-Active-Job#ordered-jobs))
+ - [Automatically integrates](https://karafka.io/docs/Integrating-with-Ruby-on-Rails-and-other-frameworks#integrating-with-ruby-on-rails) with Ruby on Rails
+ - Supports in-development [code reloading](https://karafka.io/docs/Auto-reload-of-code-changes-in-development)
  - Is powered by [librdkafka](https://github.com/edenhill/librdkafka) (the Apache Kafka C/C++ client library)
- - Has an out-of the box [StatsD/DataDog monitoring](https://github.com/karafka/karafka/wiki/Monitoring-and-logging) with a dashboard template.
+ - Has an out-of the box [StatsD/DataDog monitoring](https://karafka.io/docs/Monitoring-and-logging) with a dashboard template.

  ```ruby
  # Define what topics you want to consume with which consumers in karafka.rb
@@ -42,13 +42,13 @@ If you're entirely new to the subject, you can start with our "Kafka on Rails" a
  - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 1 – Kafka basics and its advantages](https://mensfeld.pl/2017/11/kafka-on-rails-using-kafka-with-ruby-on-rails-part-1-kafka-basics-and-its-advantages/)
  - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 2 – Getting started with Rails and Kafka](https://mensfeld.pl/2018/01/kafka-on-rails-using-kafka-with-ruby-on-rails-part-2-getting-started-with-ruby-and-kafka/)

- If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://github.com/karafka/karafka/wiki/Getting-started) guides and the [example apps repository](https://github.com/karafka/example-apps).
+ If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://karafka.io/docs/Getting-Started) guides and the [example apps repository](https://github.com/karafka/example-apps).

  We also maintain many [integration specs](https://github.com/karafka/karafka/tree/master/spec/integrations) illustrating various use-cases and features of the framework.

  ### TL;DR (1 minute from setup to publishing and consuming messages)

- **Prerequisites**: Kafka running. You can start it by following instructions from [here](https://github.com/karafka/karafka/wiki/Setting-up-Kafka).
+ **Prerequisites**: Kafka running. You can start it by following instructions from [here](https://karafka.io/docs/Setting-up-Kafka).

  1. Add and install Karafka:

@@ -85,8 +85,8 @@ Help me provide high-quality open-source software. Please see the Karafka [homep

  ## Support

- Karafka has [Wiki pages](https://github.com/karafka/karafka/wiki) for almost everything and a pretty decent [FAQ](https://github.com/karafka/karafka/wiki/FAQ). It covers the installation, setup, and deployment, along with other useful details on how to run Karafka.
+ Karafka has [Wiki pages](https://karafka.io/docs) for almost everything and a pretty decent [FAQ](https://karafka.io/docs/FAQ). It covers the installation, setup, and deployment, along with other useful details on how to run Karafka.

  If you have questions about using Karafka, feel free to join our [Slack](https://slack.karafka.io) channel.

- Karafka has [priority support](https://github.com/karafka/karafka/wiki/Pro-Support) for technical and architectural questions that is part of the Karafka Pro subscription.
+ Karafka has [priority support](https://karafka.io/docs/Pro-Support) for technical and architectural questions that is part of the Karafka Pro subscription.
data/bin/integrations CHANGED
@@ -19,7 +19,7 @@ ROOT_PATH = Pathname.new(File.expand_path(File.join(File.dirname(__FILE__), '../
  # When the value is high, there's a problem with thread allocation on Github CI, that is why
  # we limit it. Locally we can run a lot of those, as many of them have sleeps and do not use a lot
  # of CPU
- CONCURRENCY = ENV.key?('CI') ? 5 : Etc.nprocessors * 2
+ CONCURRENCY = ENV.key?('CI') ? 3 : Etc.nprocessors * 2

  # How many bytes do we want to keep from the stdout in the buffer for when we need to print it
  MAX_BUFFER_OUTPUT = 51_200
@@ -47,6 +47,8 @@ class Scenario
  # @param path [String] path to the scenarios file
  def initialize(path)
  @path = path
+ # First 1024 characters from stdout
+ @stdout_head = ''
  # Last 1024 characters from stdout
  @stdout_tail = ''
  end
@@ -75,8 +77,6 @@ class Scenario
  def finished?
  # If the thread is running too long, kill it
  if current_time - @started_at > MAX_RUN_TIME
- @wait_thr.kill
-
  begin
  Process.kill('TERM', pid)
  # It may finish right after we want to kill it, that's why we ignore this
@@ -88,6 +88,7 @@ class Scenario
  # to stdout. Otherwise after reaching the buffer size, it would hang
  buffer = ''
  @stdout.read_nonblock(MAX_BUFFER_OUTPUT, buffer, exception: false)
+ @stdout_head = buffer if @stdout_head.empty?
  @stdout_tail << buffer
  @stdout_tail = @stdout_tail[-MAX_BUFFER_OUTPUT..-1] || @stdout_tail

@@ -112,6 +113,11 @@ class Scenario
  @wait_thr.value&.exitstatus || 123
  end

+ # @return [String] exit status of the process
+ def exit_status
+ @wait_thr.value.to_s
+ end
+
  # Prints a status report when scenario is finished and stdout if it failed
  def report
  if success?
@@ -123,7 +129,11 @@ class Scenario

  puts
  puts "\e[#{31}m#{'[FAILED]'}\e[0m #{name}"
+ puts "Time taken: #{current_time - @started_at} seconds"
  puts "Exit code: #{exit_code}"
+ puts "Exit status: #{exit_status}"
+ puts @stdout_head
+ puts '...'
  puts @stdout_tail
  puts buffer
  puts
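The head-and-tail capture added to `Scenario` above can be sketched in isolation. This is only an illustration under stated assumptions: `OutputBuffer` is a hypothetical helper that does not exist in the gem, and the tail limit is shortened to 5 characters so the truncation is visible.

```ruby
# Hypothetical helper (not part of Karafka): remembers the first chunk read
# and keeps only the last `max_tail` characters of everything read so far.
class OutputBuffer
  attr_reader :head, :tail

  def initialize(max_tail)
    @max_tail = max_tail
    @head = String.new
    @tail = String.new
  end

  # Appends a chunk, mirroring the head/tail bookkeeping from the diff
  def <<(chunk)
    # Only the first non-empty chunk is ever stored as the head
    @head = chunk.dup if @head.empty?
    @tail << chunk
    # Slicing returns nil when the string is shorter than the limit
    @tail = @tail[-@max_tail..-1] || @tail
    self
  end
end

buffer = OutputBuffer.new(5)
buffer << 'hello ' << 'world'

puts buffer.head
puts '...'
puts buffer.tail
```

Keeping both ends is useful for failed scenarios because the first lines usually show the boot phase while the last lines show the actual failure.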
data/config/errors.yml CHANGED
@@ -35,6 +35,7 @@ en:
  consumer_format: needs to be present
  id_format: 'needs to be a string with a Kafka accepted format'
  initial_offset_format: needs to be either earliest or latest
+ subscription_group_format: must be nil or a non-empty string

  consumer_group:
  missing: needs to be present
@@ -54,3 +55,5 @@ en:

  pro_consumer_group_topic:
  consumer_format: needs to inherit from Karafka::Pro::BaseConsumer and not Karafka::Consumer
+ virtual_partitions.partitioner_respond_to_call: needs to be defined and needs to respond to `#call`
+ virtual_partitions.concurrency_format: needs to be equal to or more than 1
data/lib/karafka/admin.rb CHANGED
@@ -54,8 +54,9 @@ module Karafka
  def with_admin
  admin = ::Rdkafka::Config.new(Karafka::App.config.kafka).admin
  result = yield(admin)
- admin.close
  result
+ ensure
+ admin&.close
  end
  end
  end
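The move of `admin.close` into an `ensure` clause means the handle is released even when the block raises. A minimal sketch of that pattern follows; `FakeAdmin` is a hypothetical stand-in and not the rdkafka admin API.

```ruby
# Hypothetical stand-in for an admin client; it only tracks whether it was closed
class FakeAdmin
  attr_reader :closed

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

# Yields the admin handle and always closes it, even when the block raises.
# The safe navigation operator covers the case where building the handle failed.
def with_admin(admin = FakeAdmin.new)
  yield(admin)
ensure
  admin&.close
end

admin = FakeAdmin.new

begin
  with_admin(admin) { raise 'critical error' }
rescue RuntimeError
  # The error still propagates to the caller, but the handle was closed first
end

puts admin.closed
```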
@@ -15,13 +15,24 @@ module Karafka
  # @return [Waterdrop::Producer] producer instance
  attr_accessor :producer

- # Can be used to run preparation code
+ # Can be used to run preparation code prior to the job being enqueued
  #
  # @private
- # @note This should not be used by the end users as it is part of the lifecycle of things but
+ # @note This should not be used by the end users as it is part of the lifecycle of things and
+ # not as a part of the public api. This should not perform any extensive operations as it is
+ # blocking and running in the listener thread.
+ def on_before_enqueue; end
+
+ # Can be used to run preparation code in the worker
+ #
+ # @private
+ # @note This should not be used by the end users as it is part of the lifecycle of things and
  # not as part of the public api. This can act as a hook when creating non-blocking
  # consumers and doing other advanced stuff
- def on_before_consume; end
+ def on_before_consume
+ messages.metadata.processed_at = Time.now
+ messages.metadata.freeze
+ end

  # Executes the default consumer flow.
  #
@@ -70,10 +81,15 @@ module Karafka
  end
  end

- # Trigger method for running on shutdown.
+ # Trigger method for running on partition revocation.
  #
  # @private
  def on_revoked
+ # We need to always un-pause the processing in case we have lost a given partition.
+ # Otherwise the underlying librdkafka would not know we may want to continue processing and
+ # the pause could in theory last forever
+ resume
+
  coordinator.revoke

  Karafka.monitor.instrument('consumer.revoked', caller: self) do
@@ -275,16 +275,16 @@ module Karafka

  # Commits the stored offsets in a sync way and closes the consumer.
  def close
- # Once client is closed, we should not close it again
- # This could only happen in case of a race-condition when forceful shutdown happens
- # and triggers this from a different thread
- return if @closed
-
  @mutex.synchronize do
- internal_commit_offsets(async: false)
+ # Once client is closed, we should not close it again
+ # This could only happen in case of a race-condition when forceful shutdown happens
+ # and triggers this from a different thread
+ return if @closed

  @closed = true

+ internal_commit_offsets(async: false)
+
  # Remove callbacks runners that were registered
  ::Karafka::Instrumentation.statistics_callbacks.delete(@subscription_group.id)
  ::Karafka::Instrumentation.error_callbacks.delete(@subscription_group.id)
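Moving the `@closed` check inside the mutex makes the check and the flag update one atomic step, so two threads can no longer both pass the guard. The sketch below illustrates this with a hypothetical `SketchClient`; the commit counter stands in for `internal_commit_offsets` and is an assumption made for the example.

```ruby
# Hypothetical client that counts how many times the closing logic really ran
class SketchClient
  attr_reader :commits

  def initialize
    @mutex = Mutex.new
    @closed = false
    @commits = 0
  end

  def close
    @mutex.synchronize do
      # Check and set happen under the same lock, so only one caller proceeds
      return if @closed

      @closed = true
      # Stand-in for the final synchronous offsets commit
      @commits += 1
    end
  end
end

client = SketchClient.new

# Simulate a forceful shutdown racing with a regular one
threads = Array.new(8) { Thread.new { client.close } }
threads.each(&:join)

puts client.commits
```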
@@ -185,7 +185,9 @@ module Karafka
  # processed (if it was assigned and revoked really fast), thus we may not have it
  # here. In cases like this, we do not run a revocation job
  @executors.find_all(topic, partition).each do |executor|
- jobs << @jobs_builder.revoked(executor)
+ job = @jobs_builder.revoked(executor)
+ job.before_enqueue
+ jobs << job
  end

  # We need to remove all the executors of a given topic partition that we have lost, so
@@ -205,7 +207,9 @@ module Karafka
  jobs = []

  @executors.each do |_, _, executor|
- jobs << @jobs_builder.shutdown(executor)
+ job = @jobs_builder.shutdown(executor)
+ job.before_enqueue
+ jobs << job
  end

  @scheduler.schedule_shutdown(@jobs_queue, jobs)
@@ -238,10 +242,10 @@ module Karafka
  @partitioner.call(topic, messages) do |group_id, partition_messages|
  # Count the job we're going to create here
  coordinator.increment
-
  executor = @executors.find_or_create(topic, partition, group_id)
-
- jobs << @jobs_builder.consume(executor, partition_messages, coordinator)
+ job = @jobs_builder.consume(executor, partition_messages, coordinator)
+ job.before_enqueue
+ jobs << job
  end
  end

@@ -12,8 +12,8 @@ module Karafka
  ).fetch('en').fetch('validations').fetch('consumer_group')
  end

- required(:id) { |id| id.is_a?(String) && Contracts::TOPIC_REGEXP.match?(id) }
- required(:topics) { |topics| topics.is_a?(Array) && !topics.empty? }
+ required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+ required(:topics) { |val| val.is_a?(Array) && !val.empty? }

  virtual do |data, errors|
  next unless errors.empty?
@@ -12,15 +12,16 @@ module Karafka
  ).fetch('en').fetch('validations').fetch('consumer_group_topic')
  end

- required(:consumer) { |consumer_group| !consumer_group.nil? }
- required(:deserializer) { |deserializer| !deserializer.nil? }
- required(:id) { |id| id.is_a?(String) && Contracts::TOPIC_REGEXP.match?(id) }
- required(:kafka) { |kafka| kafka.is_a?(Hash) && !kafka.empty? }
- required(:max_messages) { |mm| mm.is_a?(Integer) && mm >= 1 }
- required(:initial_offset) { |io| %w[earliest latest].include?(io) }
- required(:max_wait_time) { |mwt| mwt.is_a?(Integer) && mwt >= 10 }
- required(:manual_offset_management) { |mmm| [true, false].include?(mmm) }
- required(:name) { |name| name.is_a?(String) && Contracts::TOPIC_REGEXP.match?(name) }
+ required(:consumer) { |val| !val.nil? }
+ required(:deserializer) { |val| !val.nil? }
+ required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+ required(:kafka) { |val| val.is_a?(Hash) && !val.empty? }
+ required(:max_messages) { |val| val.is_a?(Integer) && val >= 1 }
+ required(:initial_offset) { |val| %w[earliest latest].include?(val) }
+ required(:max_wait_time) { |val| val.is_a?(Integer) && val >= 10 }
+ required(:manual_offset_management) { |val| [true, false].include?(val) }
+ required(:name) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+ required(:subscription_group) { |val| val.nil? || (val.is_a?(String) && !val.empty?) }

  virtual do |data, errors|
  next unless errors.empty?
@@ -28,9 +28,8 @@ module Karafka
  created_at: messages.last.timestamp,
  # When this batch was built and scheduled for execution
  scheduled_at: scheduled_at,
- # We build the batch metadata when we pick up the job in the worker, thus we can use
- # current time here
- processed_at: Time.now
+ # This needs to be set to a correct value prior to processing starting
+ processed_at: nil
  )
  end
  end
@@ -14,11 +14,13 @@ module Karafka
  # @param received_at [Time] moment in time when the messages were received
  # @return [Karafka::Messages::Messages] messages batch object
  def call(messages, topic, received_at)
+ # We cannot freeze the batch metadata because it is altered with the processed_at time
+ # prior to the consumption. It is being frozen there
  metadata = BatchMetadata.call(
  messages,
  topic,
  received_at
- ).freeze
+ )

  Karafka::Messages::Messages.new(
  messages,
@@ -35,7 +35,7 @@ module Karafka

  # We cannot mark jobs as done after each if there are virtual partitions. Otherwise
  # this could create random markings
- next if topic.virtual_partitioner?
+ next if topic.virtual_partitions?

  mark_as_consumed(message)
  end
@@ -23,13 +23,17 @@ module Karafka

  private_constant :MAX_PAUSE_TIME

- # Pauses processing of a given partition until we're done with the processing
+ # Pauses processing of a given partition until we're done with the processing.
  # This ensures, that we can easily poll not reaching the `max.poll.interval`
- def on_before_consume
+ # @note This needs to happen in the listener thread, because we cannot wait on this being
+ # executed in the workers. Workers may be already running some LRJ jobs that are blocking
+ # all the threads until finished, yet unless we pause the incoming partitions information,
+ # we may be kicked out of the consumer group due to not polling often enough
+ def on_before_enqueue
  return unless topic.long_running_job?

  # This ensures, that when running LRJ with VP, things operate as expected
- coordinator.on_started do |first_group_message|
+ coordinator.on_enqueued do |first_group_message|
  # Pause at the first message in a batch. That way in case of a crash, we will not lose
  # any messages
  pause(first_group_message.offset, MAX_PAUSE_TIME)
@@ -44,6 +48,29 @@ module Karafka
  end
  end

+ # Trigger method for running on partition revocation.
+ #
+ # @private
+ def on_revoked
+ # We do not want to resume on revocation in case of a LRJ.
+ # For LRJ we resume after the successful processing or do a backoff pause in case of a
+ # failure. Double non-blocking resume could cause problems in coordination.
+ resume unless topic.long_running_job?
+
+ coordinator.revoke
+
+ Karafka.monitor.instrument('consumer.revoked', caller: self) do
+ revoked
+ end
+ rescue StandardError => e
+ Karafka.monitor.instrument(
+ 'error.occurred',
+ error: e,
+ caller: self,
+ type: 'consumer.revoked.error'
+ )
+ end
+
  private

  # Handles the post-consumption flow depending on topic settings
@@ -74,6 +101,8 @@ module Karafka
  resume
  else
  # If processing failed, we need to pause
+ # For long running job this will overwrite the default never-ending pause and will cause
+ # the processing to keep going after the error backoff
  pause(@seek_offset || first_message.offset)
  end
  end
@@ -22,11 +22,31 @@ module Karafka
  ).fetch('en').fetch('validations').fetch('pro_consumer_group_topic')
  end

- virtual do |data|
+ nested(:virtual_partitions) do
+ required(:active) { |val| [true, false].include?(val) }
+ required(:partitioner) { |val| val.nil? || val.respond_to?(:call) }
+ required(:concurrency) { |val| val.is_a?(Integer) && val >= 1 }
+ end
+
+ virtual do |data, errors|
+ next unless errors.empty?
  next if data[:consumer] < Karafka::Pro::BaseConsumer

  [[%i[consumer], :consumer_format]]
  end
+
+ # When virtual partitions are defined, partitioner needs to respond to `#call` and it
+ # cannot be nil
+ virtual do |data, errors|
+ next unless errors.empty?
+
+ virtual_partitions = data[:virtual_partitions]
+
+ next unless virtual_partitions[:active]
+ next if virtual_partitions[:partitioner].respond_to?(:call)
+
+ [[%i[virtual_partitions partitioner], :respond_to_call]]
+ end
  end
  end
  end
@@ -67,7 +67,7 @@ module Karafka

  # Loads routing extensions
  def load_routing_extensions
- ::Karafka::Routing::Topic.include(Routing::TopicExtensions)
+ ::Karafka::Routing::Topic.prepend(Routing::TopicExtensions)
  ::Karafka::Routing::Builder.prepend(Routing::BuilderExtensions)
  end
  end
@@ -18,6 +18,7 @@ module Karafka
  # @param args [Object] anything the base coordinator accepts
  def initialize(*args)
  super
+ @on_enqueued_invoked = false
  @on_started_invoked = false
  @on_finished_invoked = false
  @flow_lock = Mutex.new
@@ -30,6 +31,7 @@ module Karafka
  super

  @mutex.synchronize do
+ @on_enqueued_invoked = false
  @on_started_invoked = false
  @on_finished_invoked = false
  @first_message = messages.first
@@ -42,6 +44,18 @@ module Karafka
  @running_jobs.zero?
  end

+ # Runs synchronized code once for a collective of virtual partitions prior to work being
+ # enqueued
+ def on_enqueued
+ @flow_lock.synchronize do
+ return if @on_enqueued_invoked
+
+ @on_enqueued_invoked = true
+
+ yield(@first_message, @last_message)
+ end
+ end
+
  # Runs given code only once per all the coordinated jobs upon starting first of them
  def on_started
  @flow_lock.synchronize do
@@ -25,8 +25,9 @@ module Karafka
  # @note It needs to be working with a proper consumer that will handle the partition
  # management. This layer of the framework knows nothing about Kafka messages consumption.
  class ConsumeNonBlocking < ::Karafka::Processing::Jobs::Consume
- # Releases the blocking lock after it is done with the preparation phase for this job
- def before_call
+ # Makes this job non-blocking from the start
+ # @param args [Array] any arguments accepted by `::Karafka::Processing::Jobs::Consume`
+ def initialize(*args)
  super
  @non_blocking = true
  end
@@ -21,17 +21,15 @@ module Karafka
  def call(topic, messages)
  ktopic = @subscription_group.topics.find(topic)

- @concurrency ||= ::Karafka::App.config.concurrency
-
  # We only partition work if we have a virtual partitioner and more than one thread to
  # process the data. With one thread it is not worth partitioning the work as the work
  # itself will be assigned to one thread (pointless work)
- if ktopic.virtual_partitioner? && @concurrency > 1
+ if ktopic.virtual_partitions? && ktopic.virtual_partitions.concurrency > 1
  # We need to reduce it to number of threads, so the group_id is not a direct effect
  # of the end user action. Otherwise the persistence layer for consumers would cache
  # it forever and it would cause memory leaks
  groupings = messages
- .group_by { |msg| ktopic.virtual_partitioner.call(msg) }
+ .group_by { |msg| ktopic.virtual_partitions.partitioner.call(msg) }
  .values

  # Reduce the max concurrency to a size that matches the concurrency
@@ -41,7 +39,7 @@ module Karafka
  # The algorithm here is simple, we assume that the most costly in terms of processing,
  # will be processing of the biggest group and we reduce the smallest ones to have
  # max of groups equal to concurrency
- while groupings.size > @concurrency
+ while groupings.size > ktopic.virtual_partitions.concurrency
  groupings.sort_by! { |grouping| -grouping.size }

  # Offset order needs to be maintained for virtual partitions
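The hunk above is cut off inside the reduction loop, so the exact merge step is not visible here. The sketch below shows one way the described idea could work, assuming the two smallest groups are merged and re-sorted by offset on each pass. `Message` and `virtual_groups` are hypothetical names used only for this illustration.

```ruby
# Hypothetical message type used only for this sketch
Message = Struct.new(:offset, :key)

# Groups messages with the partitioner and then merges the smallest groups
# until at most `concurrency` groups remain.
def virtual_groups(messages, concurrency, partitioner)
  raise ArgumentError, 'concurrency needs to be at least 1' if concurrency < 1

  groupings = messages.group_by { |msg| partitioner.call(msg) }.values

  while groupings.size > concurrency
    # Largest group first, so the two smallest ones end up at the tail
    groupings.sort_by! { |grouping| -grouping.size }
    # Assumption: merge the two smallest groups and restore the offset order
    merged = groupings.pop(2).flatten(1).sort_by(&:offset)
    groupings << merged
  end

  groupings
end

messages = Array.new(10) { |i| Message.new(i, "user-#{i % 5}") }
groups = virtual_groups(messages, 3, ->(msg) { msg.key })

groups.each { |group| puts group.map(&:offset).inspect }
```

With the per-topic `concurrency` setting, the cap passed to such a reduction no longer has to equal the number of worker threads.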
@@ -15,23 +15,59 @@ module Karafka
  module Routing
  # Routing extensions that allow to configure some extra PRO routing options
  module TopicExtensions
+ # Internal representation of the virtual partitions settings and configuration
+ # This allows us to abstract away things in a nice manner
+ #
+ # For features with more options than just on/off we use this approach as it simplifies
+ # the code. We do not use it for all not to create unneeded complexity
+ VirtualPartitions = Struct.new(
+ :active,
+ :partitioner,
+ :concurrency,
+ keyword_init: true
+ ) { alias_method :active?, :active }
+
  class << self
  # @param base [Class] class we extend
- def included(base)
+ def prepended(base)
  base.attr_accessor :long_running_job
- base.attr_accessor :virtual_partitioner
  end
  end

- # @return [Boolean] true if virtual partitioner is defined, false otherwise
- def virtual_partitioner?
- virtual_partitioner != nil
+ # @param concurrency [Integer] max number of virtual partitions that can come out of the
+ # single distribution flow. When set to more than the Karafka threading, will create
+ # more work than workers. When less, can ensure we have spare resources to process other
+ # things in parallel.
+ # @param partitioner [nil, #call] nil or callable partitioner
+ # @return [VirtualPartitions] method that allows to set the virtual partitions details
+ # during the routing configuration and then allows to retrieve it
+ def virtual_partitions(
+ concurrency: Karafka::App.config.concurrency,
+ partitioner: nil
+ )
+ @virtual_partitions ||= VirtualPartitions.new(
+ active: !partitioner.nil?,
+ concurrency: concurrency,
+ partitioner: partitioner
+ )
+ end
+
+ # @return [Boolean] are virtual partitions enabled for given topic
+ def virtual_partitions?
+ virtual_partitions.active?
  end

  # @return [Boolean] is a given job on a topic a long-running one
  def long_running_job?
  @long_running_job || false
  end
+
+ # @return [Hash] hash with topic details and the extensions details
+ def to_h
+ super.merge(
+ virtual_partitions: virtual_partitions.to_h
+ )
+ end
  end
  end
  end
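The new `virtual_partitions` method acts as both setter and getter through memoization. The sketch below reproduces that shape outside of the framework; `SketchTopic` and `DEFAULT_CONCURRENCY` are hypothetical and replace the real topic class and `Karafka::App.config.concurrency`.

```ruby
# Settings object modelled on the struct from the diff
VirtualPartitions = Struct.new(:active, :partitioner, :concurrency, keyword_init: true) do
  alias_method :active?, :active
end

# Hypothetical stand-in for a routing topic
class SketchTopic
  # Assumed default, used here instead of the framework-wide concurrency
  DEFAULT_CONCURRENCY = 5

  # The first call builds the settings, later calls return the memoized object
  def virtual_partitions(concurrency: DEFAULT_CONCURRENCY, partitioner: nil)
    @virtual_partitions ||= VirtualPartitions.new(
      active: !partitioner.nil?,
      concurrency: concurrency,
      partitioner: partitioner
    )
  end

  def virtual_partitions?
    virtual_partitions.active?
  end
end

topic = SketchTopic.new
topic.virtual_partitions(partitioner: ->(message) { message[:user_id] }, concurrency: 2)

puts topic.virtual_partitions?
puts topic.virtual_partitions.concurrency
```

One consequence of the `||=` memoization in this sketch is that the first call wins: arguments passed to any later call are ignored, including a call made after `virtual_partitions?` has already been queried.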
@@ -37,14 +37,17 @@ module Karafka
  @topic = topic
  end

- # Builds the consumer instance, builds messages batch and sets all that is needed to run the
- # user consumption logic
+ # Allows us to prepare the consumer in the listener thread prior to the job being sent to
+ # the queue. It also allows to run some code that is time sensitive and cannot wait in the
+ # queue as it could cause starvation.
  #
  # @param messages [Array<Karafka::Messages::Message>]
- # @param received_at [Time] the moment we've received the batch (actually the moment we've)
- # enqueued it, but good enough
  # @param coordinator [Karafka::Processing::Coordinator] coordinator for processing management
- def before_consume(messages, received_at, coordinator)
+ def before_enqueue(messages, coordinator)
+ # the moment we've received the batch or actually the moment we've enqueued it,
+ # but good enough
+ @enqueued_at = Time.now
+
  # Recreate consumer with each batch if persistence is not enabled
  # We reload the consumers with each batch instead of relying on some external signals
  # when needed for consistency. That way devs may have it on or off and not in this
@@ -57,9 +60,14 @@ module Karafka
  consumer.messages = Messages::Builders::Messages.call(
  messages,
  @topic,
- received_at
+ @enqueued_at
  )

+ consumer.on_before_enqueue
+ end
+
+ # Runs setup and warm-up code in the worker prior to running the consumption
+ def before_consume
  consumer.on_before_consume
  end

@@ -22,6 +22,10 @@ module Karafka
  @non_blocking = false
  end

+ # When redefined can run any code prior to the job being enqueued
+ # @note This will run in the listener thread and not in the worker
+ def before_enqueue; end
+
  # When redefined can run any code that should run before executing the proper code
  def before_call; end

@@ -18,13 +18,18 @@ module Karafka
  @executor = executor
  @messages = messages
  @coordinator = coordinator
- @created_at = Time.now
  super()
  end

+ # Runs all the preparation code on the executor that needs to happen before the job is
+ # enqueued.
+ def before_enqueue
+ executor.before_enqueue(@messages, @coordinator)
+ end
+
  # Runs the before consumption preparations on the executor
  def before_call
- executor.before_consume(@messages, @created_at, @coordinator)
+ executor.before_consume
  end

  # Runs the given executor
  # Runs the given executor
@@ -49,7 +49,6 @@ module Karafka
  instrument_details = { caller: self, job: job, jobs_queue: @jobs_queue }

  if job
-
  Karafka.monitor.instrument('worker.process', instrument_details)

  Karafka.monitor.instrument('worker.processed', instrument_details) do
@@ -7,15 +7,6 @@ module Karafka
  class Proxy
  attr_reader :target

- # We should proxy only non ? and = methods as we want to have a regular dsl
- IGNORED_POSTFIXES = %w[
- ?
- =
- !
- ].freeze
-
- private_constant :IGNORED_POSTFIXES
-
  # @param target [Object] target object to which we proxy any DSL call
  # @param block [Proc] block that we want to evaluate in the proxy context
  def initialize(target, &block)
@@ -25,21 +16,23 @@ module Karafka

  # Translates the no "=" DSL of routing into elements assignments on target
  # @param method_name [Symbol] name of the missing method
- # @param arguments [Array] array with it's arguments
- # @param block [Proc] block provided to the method
- def method_missing(method_name, *arguments, &block)
+ def method_missing(method_name, ...)
  return super unless respond_to_missing?(method_name)

- @target.public_send(:"#{method_name}=", *arguments, &block)
+ if @target.respond_to?(:"#{method_name}=")
+ @target.public_send(:"#{method_name}=", ...)
+ else
+ @target.public_send(method_name, ...)
+ end
  end

  # Tells whether or not a given element exists on the target
  # @param method_name [Symbol] name of the missing method
  # @param include_private [Boolean] should we include private in the check as well
  def respond_to_missing?(method_name, include_private = false)
- return false if IGNORED_POSTFIXES.any? { |postfix| method_name.to_s.end_with?(postfix) }
-
- @target.respond_to?(:"#{method_name}=", include_private) || super
+ @target.respond_to?(:"#{method_name}=", include_private) ||
+ @target.respond_to?(method_name, include_private) ||
+ super
  end
  end
  end
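The reworked proxy prefers a setter when the target has one and otherwise forwards the call unchanged, which is what lets a keyword-based DSL method such as `virtual_partitions` be used in routing. The sketch below shows that dispatch on a hypothetical `RoutedTopic`; evaluating the block with `instance_eval` is an assumption, as the body of `initialize` is not part of the diff. It needs a Ruby version that supports `...` after a leading argument (3.0 or newer).

```ruby
# Hypothetical target with two plain attributes and one keyword-based DSL method
class RoutedTopic
  attr_accessor :name, :max_messages
  attr_reader :virtual_partitions_settings

  def virtual_partitions(concurrency: 1, partitioner: nil)
    @virtual_partitions_settings = { concurrency: concurrency, partitioner: partitioner }
  end
end

class Proxy
  attr_reader :target

  def initialize(target, &block)
    @target = target
    # Assumption: the block is evaluated in the proxy context
    instance_eval(&block) if block
  end

  # Uses the setter when it exists, otherwise forwards the call as it is
  def method_missing(method_name, ...)
    return super unless respond_to_missing?(method_name)

    if @target.respond_to?(:"#{method_name}=")
      @target.public_send(:"#{method_name}=", ...)
    else
      @target.public_send(method_name, ...)
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    @target.respond_to?(:"#{method_name}=", include_private) ||
      @target.respond_to?(method_name, include_private) ||
      super
  end
end

topic = RoutedTopic.new

Proxy.new(topic) do
  name 'events'
  max_messages 100
  virtual_partitions concurrency: 2, partitioner: ->(message) { message[:user_id] }
end

puts topic.name
puts topic.virtual_partitions_settings[:concurrency]
```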
@@ -19,6 +19,7 @@ module Karafka
  max_messages
  max_wait_time
  initial_offset
+ subscription_group
  ].freeze

  private_constant :DISTRIBUTION_KEYS
@@ -8,6 +8,7 @@ module Karafka
  class Topic
  attr_reader :id, :name, :consumer_group
  attr_writer :consumer
+ attr_accessor :subscription_group

  # Attributes we can inherit from the root unless they were defined on this level
  INHERITABLE_ATTRIBUTES = %i[
@@ -91,7 +92,8 @@ module Karafka
  id: id,
  name: name,
  consumer: consumer,
- consumer_group_id: consumer_group.id
+ consumer_group_id: consumer_group.id,
+ subscription_group: subscription_group
  ).freeze
  end
  end
@@ -1,6 +1,6 @@
  # frozen_string_literal: true
-
  <% unless rails? -%>
+
  # This file is auto-generated during the install process.
  # If by any chance you've wanted a setup for Rails app, either run the `karafka:install`
  # command again or refer to the install templates available in the source codes
@@ -3,5 +3,5 @@
  # Main module namespace
  module Karafka
  # Current Karafka version
- VERSION = '2.0.4'
+ VERSION = '2.0.7'
  end
data.tar.gz.sig CHANGED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: karafka
  version: !ruby/object:Gem::Version
- version: 2.0.4
+ version: 2.0.7
  platform: ruby
  authors:
  - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
  Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
  MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
  -----END CERTIFICATE-----
- date: 2022-08-19 00:00:00.000000000 Z
+ date: 2022-09-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: karafka-core
metadata.gz.sig CHANGED
Binary file