karafka 2.0.4 → 2.0.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +18 -0
- data/Gemfile.lock +1 -1
- data/README.md +9 -9
- data/bin/integrations +13 -3
- data/config/errors.yml +3 -0
- data/lib/karafka/admin.rb +2 -1
- data/lib/karafka/base_consumer.rb +20 -4
- data/lib/karafka/connection/client.rb +6 -6
- data/lib/karafka/connection/listener.rb +9 -5
- data/lib/karafka/contracts/consumer_group.rb +2 -2
- data/lib/karafka/contracts/consumer_group_topic.rb +10 -9
- data/lib/karafka/messages/builders/batch_metadata.rb +2 -3
- data/lib/karafka/messages/builders/messages.rb +3 -1
- data/lib/karafka/pro/active_job/consumer.rb +1 -1
- data/lib/karafka/pro/base_consumer.rb +32 -3
- data/lib/karafka/pro/contracts/consumer_group_topic.rb +21 -1
- data/lib/karafka/pro/loader.rb +1 -1
- data/lib/karafka/pro/processing/coordinator.rb +14 -0
- data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb +3 -2
- data/lib/karafka/pro/processing/partitioner.rb +3 -5
- data/lib/karafka/pro/routing/topic_extensions.rb +41 -5
- data/lib/karafka/processing/executor.rb +14 -6
- data/lib/karafka/processing/jobs/base.rb +4 -0
- data/lib/karafka/processing/jobs/consume.rb +7 -2
- data/lib/karafka/processing/worker.rb +0 -1
- data/lib/karafka/routing/proxy.rb +9 -16
- data/lib/karafka/routing/subscription_groups_builder.rb +1 -0
- data/lib/karafka/routing/topic.rb +3 -1
- data/lib/karafka/templates/karafka.rb.erb +1 -1
- data/lib/karafka/version.rb +1 -1
- data.tar.gz.sig +0 -0
- metadata +2 -2
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0abed3f97a58be6b48f640468f7d7e6d48bc0960596b21d022b4616dd047be28
+  data.tar.gz: 48143253beee640e25e47a81474767c179e715e855d6173b59566483a57af5a8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9c9f8c170ac82fc0f1eb6ea41698dcd82cc525006931a59443d004c94eb18b56ffcb67eb1eb45fcc1fd557fee22e6e63ceb7a8a001245469e3e574d87c88c8e8
+  data.tar.gz: 47bc7e7dfe5ca3d503a3cb18da4e4b95c076197dc26b5633195e169d3f4d94da4effaf27bd4360ddff1481031b1ee20f61e465e24f6984570f6067ca4fbd51ea
checksums.yaml.gz.sig
CHANGED
Binary file
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,23 @@
 # Karafka framework changelog
 
+## 2.0.7 (Unreleased)
+- [Breaking change] Redefine the Virtual Partitions routing DSL to accept concurrency
+- Allow the `concurrency` setting in Virtual Partitions to extend or limit the number of jobs per regular partition. This makes it possible to ensure we do not use all the threads on Virtual Partitions jobs
+- Allow creation of as many Virtual Partitions as needed, without taking the global `concurrency` into consideration
+
+## 2.0.6 (2022-09-02)
+- Improve client closing.
+- Fix for: multiple LRJ topics fetched concurrently block the ability for LRJ to kick in (#1002)
+- Introduce a pre-enqueue sync execution layer to prevent starvation cases for LRJ
+- Close admin upon critical errors to prevent segmentation faults
+- Add support for manual subscription group management (#852)
+
+## 2.0.5 (2022-08-23)
+- Fix unnecessary double new line in the `karafka.rb` template for Ruby on Rails
+- Fix a case where a manually paused partition would not be processed after rebalance (#988)
+- Increase specs stability.
+- Lower concurrency of execution of specs in Github CI.
+
 ## 2.0.4 (2022-08-19)
 - Fix hanging topic creation (#964)
 - Fix conflict with other Rails loading libraries like `gruf` (#974)
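As a quick illustration of the redefined Virtual Partitions DSL (shown in full in the `topic_extensions.rb` diff below), a minimal routing sketch; the consumer class and the key-based partitioner here are hypothetical placeholders:

```ruby
class KarafkaApp < Karafka::App
  setup do |config|
    config.concurrency = 10 # global number of worker threads
  end

  routes.draw do
    topic :orders_states do
      # Hypothetical Pro consumer (must inherit from Karafka::Pro::BaseConsumer)
      consumer OrdersStatesConsumer

      # Split each partition's batch by message key into at most 5 virtual
      # partitions, independently of the global concurrency above
      virtual_partitions(
        partitioner: ->(message) { message.key },
        concurrency: 5
      )
    end
  end
end
```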
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -8,12 +8,12 @@
 
 Karafka is a Ruby and Rails multi-threaded efficient Kafka processing framework that:
 
-- Supports parallel processing in [multiple threads](https://
-- Has [ActiveJob backend](https://
-- [Automatically integrates](https://
-- Supports in-development [code reloading](https://
+- Supports parallel processing in [multiple threads](https://karafka.io/docs/Concurrency-and-multithreading) (also for a [single topic partition](https://karafka.io/docs/Pro-Virtual-Partitions) work)
+- Has [ActiveJob backend](https://karafka.io/docs/Active-Job) support (including [ordered jobs](https://karafka.io/docs/Pro-Enhanced-Active-Job#ordered-jobs))
+- [Automatically integrates](https://karafka.io/docs/Integrating-with-Ruby-on-Rails-and-other-frameworks#integrating-with-ruby-on-rails) with Ruby on Rails
+- Supports in-development [code reloading](https://karafka.io/docs/Auto-reload-of-code-changes-in-development)
 - Is powered by [librdkafka](https://github.com/edenhill/librdkafka) (the Apache Kafka C/C++ client library)
-- Has an out-of the box [StatsD/DataDog monitoring](https://
+- Has an out-of the box [StatsD/DataDog monitoring](https://karafka.io/docs/Monitoring-and-logging) with a dashboard template.
 
 ```ruby
 # Define what topics you want to consume with which consumers in karafka.rb
@@ -42,13 +42,13 @@ If you're entirely new to the subject, you can start with our "Kafka on Rails" a
 - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 1 – Kafka basics and its advantages](https://mensfeld.pl/2017/11/kafka-on-rails-using-kafka-with-ruby-on-rails-part-1-kafka-basics-and-its-advantages/)
 - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 2 – Getting started with Rails and Kafka](https://mensfeld.pl/2018/01/kafka-on-rails-using-kafka-with-ruby-on-rails-part-2-getting-started-with-ruby-and-kafka/)
 
-If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://
+If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://karafka.io/docs/Getting-Started) guides and the [example apps repository](https://github.com/karafka/example-apps).
 
 We also maintain many [integration specs](https://github.com/karafka/karafka/tree/master/spec/integrations) illustrating various use-cases and features of the framework.
 
 ### TL;DR (1 minute from setup to publishing and consuming messages)
 
-**Prerequisites**: Kafka running. You can start it by following instructions from [here](https://
+**Prerequisites**: Kafka running. You can start it by following instructions from [here](https://karafka.io/docs/Setting-up-Kafka).
 
 1. Add and install Karafka:
 
@@ -85,8 +85,8 @@ Help me provide high-quality open-source software. Please see the Karafka [homep
 
 ## Support
 
-Karafka has [Wiki pages](https://
+Karafka has [Wiki pages](https://karafka.io/docs) for almost everything and a pretty decent [FAQ](https://karafka.io/docs/FAQ). It covers the installation, setup, and deployment, along with other useful details on how to run Karafka.
 
 If you have questions about using Karafka, feel free to join our [Slack](https://slack.karafka.io) channel.
 
-Karafka has [priority support](https://
+Karafka has [priority support](https://karafka.io/docs/Pro-Support) for technical and architectural questions that is part of the Karafka Pro subscription.
data/bin/integrations
CHANGED
@@ -19,7 +19,7 @@ ROOT_PATH = Pathname.new(File.expand_path(File.join(File.dirname(__FILE__), '../
 # When the value is high, there's a problem with thread allocation on Github CI, tht is why
 # we limit it. Locally we can run a lot of those, as many of them have sleeps and do not use a lot
 # of CPU
-CONCURRENCY = ENV.key?('CI') ?
+CONCURRENCY = ENV.key?('CI') ? 3 : Etc.nprocessors * 2
 
 # How may bytes do we want to keep from the stdout in the buffer for when we need to print it
 MAX_BUFFER_OUTPUT = 51_200
@@ -47,6 +47,8 @@ class Scenario
   # @param path [String] path to the scenarios file
   def initialize(path)
     @path = path
+    # First 1024 characters from stdout
+    @stdout_head = ''
     # Last 1024 characters from stdout
     @stdout_tail = ''
   end
@@ -75,8 +77,6 @@ class Scenario
   def finished?
     # If the thread is running too long, kill it
     if current_time - @started_at > MAX_RUN_TIME
-      @wait_thr.kill
-
       begin
         Process.kill('TERM', pid)
       # It may finish right after we want to kill it, that's why we ignore this
@@ -88,6 +88,7 @@ class Scenario
     # to stdout. Otherwise after reaching the buffer size, it would hang
     buffer = ''
     @stdout.read_nonblock(MAX_BUFFER_OUTPUT, buffer, exception: false)
+    @stdout_head = buffer if @stdout_head.empty?
     @stdout_tail << buffer
     @stdout_tail = @stdout_tail[-MAX_BUFFER_OUTPUT..-1] || @stdout_tail
 
@@ -112,6 +113,11 @@ class Scenario
     @wait_thr.value&.exitstatus || 123
   end
 
+  # @return [String] exit status of the process
+  def exit_status
+    @wait_thr.value.to_s
+  end
+
   # Prints a status report when scenario is finished and stdout if it failed
   def report
     if success?
@@ -123,7 +129,11 @@ class Scenario
 
     puts
     puts "\e[#{31}m#{'[FAILED]'}\e[0m #{name}"
+    puts "Time taken: #{current_time - @started_at} seconds"
     puts "Exit code: #{exit_code}"
+    puts "Exit status: #{exit_status}"
+    puts @stdout_head
+    puts '...'
     puts @stdout_tail
     puts buffer
     puts
data/config/errors.yml
CHANGED
@@ -35,6 +35,7 @@ en:
     consumer_format: needs to be present
     id_format: 'needs to be a string with a Kafka accepted format'
     initial_offset_format: needs to be either earliest or latest
+    subscription_group_format: must be nil or a non-empty string
 
   consumer_group:
     missing: needs to be present
@@ -54,3 +55,5 @@ en:
 
   pro_consumer_group_topic:
     consumer_format: needs to inherit from Karafka::Pro::BaseConsumer and not Karafka::Consumer
+    virtual_partitions.partitioner_respond_to_call: needs to be defined and needs to respond to `#call`
+    virtual_partitions.concurrency_format: needs to be equl or more than 1
data/lib/karafka/admin.rb
CHANGED
data/lib/karafka/base_consumer.rb
CHANGED
@@ -15,13 +15,24 @@ module Karafka
     # @return [Waterdrop::Producer] producer instance
     attr_accessor :producer
 
-    # Can be used to run preparation code
+    # Can be used to run preparation code prior to the job being enqueued
     #
     # @private
-    # @note This should not be used by the end users as it is part of the lifecycle of things
+    # @note This should not be used by the end users as it is part of the lifecycle of things and
+    #   not as a part of the public api. This should not perform any extensive operations as it is
+    #   blocking and running in the listener thread.
+    def on_before_enqueue; end
+
+    # Can be used to run preparation code in the worker
+    #
+    # @private
+    # @note This should not be used by the end users as it is part of the lifecycle of things and
     #   not as part of the public api. This can act as a hook when creating non-blocking
     #   consumers and doing other advanced stuff
-    def on_before_consume
+    def on_before_consume
+      messages.metadata.processed_at = Time.now
+      messages.metadata.freeze
+    end
 
     # Executes the default consumer flow.
     #
@@ -70,10 +81,15 @@ module Karafka
       end
     end
 
-    # Trigger method for running on
+    # Trigger method for running on partition revocation.
    #
    # @private
    def on_revoked
+      # We need to always un-pause the processing in case we have lost a given partition.
+      # Otherwise the underlying librdkafka would not know we may want to continue processing and
+      # the pause could in theory last forever
+      resume
+
      coordinator.revoke
 
      Karafka.monitor.instrument('consumer.revoked', caller: self) do
data/lib/karafka/connection/client.rb
CHANGED
@@ -275,16 +275,16 @@ module Karafka
 
     # Commits the stored offsets in a sync way and closes the consumer.
     def close
-      # Once client is closed, we should not close it again
-      # This could only happen in case of a race-condition when forceful shutdown happens
-      # and triggers this from a different thread
-      return if @closed
-
       @mutex.synchronize do
-
+        # Once client is closed, we should not close it again
+        # This could only happen in case of a race-condition when forceful shutdown happens
+        # and triggers this from a different thread
+        return if @closed
 
        @closed = true
 
+        internal_commit_offsets(async: false)
+
        # Remove callbacks runners that were registered
        ::Karafka::Instrumentation.statistics_callbacks.delete(@subscription_group.id)
        ::Karafka::Instrumentation.error_callbacks.delete(@subscription_group.id)
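The relocation above moves the `@closed` guard inside `@mutex.synchronize`, closing the race where a forceful shutdown triggers `close` from another thread. A minimal standalone sketch of that pattern (not Karafka's code):

```ruby
# With the guard outside the lock, two threads can both observe
# closed == false and run the close body twice; under the lock, the
# second thread sees the flag already flipped by the first.
class Closable
  def initialize
    @mutex = Mutex.new
    @closed = false
  end

  def close
    @mutex.synchronize do
      return if @closed

      @closed = true
      puts "closed once by #{Thread.current.object_id}"
    end
  end
end

closable = Closable.new
Array.new(2) { Thread.new { closable.close } }.each(&:join)
# => prints exactly once
```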
data/lib/karafka/connection/listener.rb
CHANGED
@@ -185,7 +185,9 @@ module Karafka
       # processed (if it was assigned and revoked really fast), thus we may not have it
       # here. In cases like this, we do not run a revocation job
       @executors.find_all(topic, partition).each do |executor|
-        jobs << @jobs_builder.revoked(executor)
+        job = @jobs_builder.revoked(executor)
+        job.before_enqueue
+        jobs << job
       end
 
       # We need to remove all the executors of a given topic partition that we have lost, so
@@ -205,7 +207,9 @@ module Karafka
       jobs = []
 
       @executors.each do |_, _, executor|
-        jobs << @jobs_builder.shutdown(executor)
+        job = @jobs_builder.shutdown(executor)
+        job.before_enqueue
+        jobs << job
       end
 
       @scheduler.schedule_shutdown(@jobs_queue, jobs)
@@ -238,10 +242,10 @@ module Karafka
       @partitioner.call(topic, messages) do |group_id, partition_messages|
         # Count the job we're going to create here
         coordinator.increment
-
         executor = @executors.find_or_create(topic, partition, group_id)
-
-        jobs << @jobs_builder.consume(executor, partition_messages, coordinator)
+        job = @jobs_builder.consume(executor, partition_messages, coordinator)
+        job.before_enqueue
+        jobs << job
       end
     end
data/lib/karafka/contracts/consumer_group.rb
CHANGED
@@ -12,8 +12,8 @@ module Karafka
       ).fetch('en').fetch('validations').fetch('consumer_group')
     end
 
-    required(:id) { |
-    required(:topics) { |
+    required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:topics) { |val| val.is_a?(Array) && !val.empty? }
 
     virtual do |data, errors|
       next unless errors.empty?
data/lib/karafka/contracts/consumer_group_topic.rb
CHANGED
@@ -12,15 +12,16 @@ module Karafka
       ).fetch('en').fetch('validations').fetch('consumer_group_topic')
     end
 
-    required(:consumer) { |
-    required(:deserializer) { |
-    required(:id) { |
-    required(:kafka) { |
-    required(:max_messages) { |
-    required(:initial_offset) { |
-    required(:max_wait_time) { |
-    required(:manual_offset_management) { |
-    required(:name) { |
+    required(:consumer) { |val| !val.nil? }
+    required(:deserializer) { |val| !val.nil? }
+    required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:kafka) { |val| val.is_a?(Hash) && !val.empty? }
+    required(:max_messages) { |val| val.is_a?(Integer) && val >= 1 }
+    required(:initial_offset) { |val| %w[earliest latest].include?(val) }
+    required(:max_wait_time) { |val| val.is_a?(Integer) && val >= 10 }
+    required(:manual_offset_management) { |val| [true, false].include?(val) }
+    required(:name) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:subscription_group) { |val| val.nil? || (val.is_a?(String) && !val.empty?) }
 
     virtual do |data, errors|
       next unless errors.empty?
data/lib/karafka/messages/builders/batch_metadata.rb
CHANGED
@@ -28,9 +28,8 @@ module Karafka
         created_at: messages.last.timestamp,
         # When this batch was built and scheduled for execution
         scheduled_at: scheduled_at,
-        #
-
-        processed_at: Time.now
+        # This needs to be set to a correct value prior to processing starting
+        processed_at: nil
       )
     end
   end
data/lib/karafka/messages/builders/messages.rb
CHANGED
@@ -14,11 +14,13 @@ module Karafka
       # @param received_at [Time] moment in time when the messages were received
       # @return [Karafka::Messages::Messages] messages batch object
       def call(messages, topic, received_at)
+        # We cannot freeze the batch metadata because it is altered with the processed_at time
+        # prior to the consumption. It is being frozen there
         metadata = BatchMetadata.call(
           messages,
           topic,
           received_at
-        ).freeze
+        )
 
         Karafka::Messages::Messages.new(
           messages,
data/lib/karafka/pro/base_consumer.rb
CHANGED
@@ -23,13 +23,17 @@ module Karafka
 
     private_constant :MAX_PAUSE_TIME
 
-    # Pauses processing of a given partition until we're done with the processing
+    # Pauses processing of a given partition until we're done with the processing.
     # This ensures, that we can easily poll not reaching the `max.poll.interval`
-    def on_before_consume
+    # @note This needs to happen in the listener thread, because we cannot wait on this being
+    #   executed in the workers. Workers may be already running some LRJ jobs that are blocking
+    #   all the threads until finished, yet unless we pause the incoming partitions information,
+    #   we may be kicked out of the consumer group due to not polling often enough
+    def on_before_enqueue
       return unless topic.long_running_job?
 
       # This ensures, that when running LRJ with VP, things operate as expected
-      coordinator.
+      coordinator.on_enqueued do |first_group_message|
        # Pause at the first message in a batch. That way in case of a crash, we will not loose
        # any messages
        pause(first_group_message.offset, MAX_PAUSE_TIME)
@@ -44,6 +48,29 @@ module Karafka
       end
     end
 
+    # Trigger method for running on partition revocation.
+    #
+    # @private
+    def on_revoked
+      # We do not want to resume on revocation in case of a LRJ.
+      # For LRJ we resume after the successful processing or do a backoff pause in case of a
+      # failure. Double non-blocking resume could cause problems in coordination.
+      resume unless topic.long_running_job?
+
+      coordinator.revoke
+
+      Karafka.monitor.instrument('consumer.revoked', caller: self) do
+        revoked
+      end
+    rescue StandardError => e
+      Karafka.monitor.instrument(
+        'error.occurred',
+        error: e,
+        caller: self,
+        type: 'consumer.revoked.error'
+      )
+    end
+
     private
 
     # Handles the post-consumption flow depending on topic settings
@@ -74,6 +101,8 @@ module Karafka
         resume
       else
         # If processing failed, we need to pause
+        # For long running job this will overwrite the default never-ending pause and will cause
+        # the processing to keep going after the error backoff
         pause(@seek_offset || first_message.offset)
       end
     end
data/lib/karafka/pro/contracts/consumer_group_topic.rb
CHANGED
@@ -22,11 +22,31 @@ module Karafka
       ).fetch('en').fetch('validations').fetch('pro_consumer_group_topic')
     end
 
-    virtual do |data, errors|
+    nested(:virtual_partitions) do
+      required(:active) { |val| [true, false].include?(val) }
+      required(:partitioner) { |val| val.nil? || val.respond_to?(:call) }
+      required(:concurrency) { |val| val.is_a?(Integer) && val >= 1 }
+    end
+
+    virtual do |data, errors|
+      next unless errors.empty?
       next if data[:consumer] < Karafka::Pro::BaseConsumer
 
       [[%i[consumer], :consumer_format]]
     end
+
+    # When virtual partitions are defined, partitioner needs to respond to `#call` and it
+    # cannot be nil
+    virtual do |data, errors|
+      next unless errors.empty?
+
+      virtual_partitions = data[:virtual_partitions]
+
+      next unless virtual_partitions[:active]
+      next if virtual_partitions[:partitioner].respond_to?(:call)
+
+      [[%i[virtual_partitions partitioner], :respond_to_call]]
+    end
   end
 end
data/lib/karafka/pro/loader.rb
CHANGED
@@ -67,7 +67,7 @@ module Karafka
 
     # Loads routing extensions
    def load_routing_extensions
-      ::Karafka::Routing::Topic.
+      ::Karafka::Routing::Topic.prepend(Routing::TopicExtensions)
      ::Karafka::Routing::Builder.prepend(Routing::BuilderExtensions)
    end
  end
data/lib/karafka/pro/processing/coordinator.rb
CHANGED
@@ -18,6 +18,7 @@ module Karafka
     # @param args [Object] anything the base coordinator accepts
     def initialize(*args)
       super
+      @on_enqueued_invoked = false
       @on_started_invoked = false
       @on_finished_invoked = false
       @flow_lock = Mutex.new
@@ -30,6 +31,7 @@ module Karafka
       super
 
       @mutex.synchronize do
+        @on_enqueued_invoked = false
         @on_started_invoked = false
         @on_finished_invoked = false
         @first_message = messages.first
@@ -42,6 +44,18 @@ module Karafka
       @running_jobs.zero?
     end
 
+    # Runs synchronized code once for a collective of virtual partitions prior to work being
+    # enqueued
+    def on_enqueued
+      @flow_lock.synchronize do
+        return if @on_enqueued_invoked
+
+        @on_enqueued_invoked = true
+
+        yield(@first_message, @last_message)
+      end
+    end
+
     # Runs given code only once per all the coordinated jobs upon starting first of them
     def on_started
       @flow_lock.synchronize do
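`on_enqueued` follows the same run-once pattern as `on_started` and `on_finished`: whichever of the coordinated virtual partition jobs gets there first executes the block, the rest skip it. A standalone sketch of the idea (class and method names assumed, not the framework API):

```ruby
# Run a block exactly once for a collective of related jobs, regardless
# of how many of them attempt to trigger it
class OnceFlow
  def initialize
    @flow_lock = Mutex.new
    @invoked = false
  end

  def once
    @flow_lock.synchronize do
      return if @invoked

      @invoked = true
      yield
    end
  end
end

flow = OnceFlow.new
5.times { flow.once { puts 'pausing the partition once, before enqueuing' } }
# => prints exactly once
```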
data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb
CHANGED
@@ -25,8 +25,9 @@ module Karafka
     # @note It needs to be working with a proper consumer that will handle the partition
     #   management. This layer of the framework knows nothing about Kafka messages consumption.
     class ConsumeNonBlocking < ::Karafka::Processing::Jobs::Consume
-      #
-
+      # Makes this job non-blocking from the start
+      # @param args [Array] any arguments accepted by `::Karafka::Processing::Jobs::Consume`
       def initialize(*args)
         super
         @non_blocking = true
       end
data/lib/karafka/pro/processing/partitioner.rb
CHANGED
@@ -21,17 +21,15 @@ module Karafka
     def call(topic, messages)
       ktopic = @subscription_group.topics.find(topic)
 
-      @concurrency ||= ::Karafka::App.config.concurrency
-
       # We only partition work if we have a virtual partitioner and more than one thread to
       # process the data. With one thread it is not worth partitioning the work as the work
       # itself will be assigned to one thread (pointless work)
-      if ktopic.
+      if ktopic.virtual_partitions? && ktopic.virtual_partitions.concurrency > 1
         # We need to reduce it to number of threads, so the group_id is not a direct effect
         # of the end user action. Otherwise the persistence layer for consumers would cache
         # it forever and it would cause memory leaks
         groupings = messages
-          .group_by { |msg| ktopic.
+          .group_by { |msg| ktopic.virtual_partitions.partitioner.call(msg) }
           .values
 
         # Reduce the max concurrency to a size that matches the concurrency
@@ -41,7 +39,7 @@ module Karafka
         # The algorithm here is simple, we assume that the most costly in terms of processing,
         # will be processing of the biggest group and we reduce the smallest once to have
         # max of groups equal to concurrency
-        while groupings.size >
+        while groupings.size > ktopic.virtual_partitions.concurrency
           groupings.sort_by! { |grouping| -grouping.size }
 
           # Offset order needs to be maintained for virtual partitions
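The reduction loop elided above merges the smallest groupings until their count matches the configured concurrency, keeping offset order inside each merged group. A self-contained sketch of that idea (simplified, not the exact Pro implementation):

```ruby
# Messages only need to respond to #offset for this sketch
Message = Struct.new(:key, :offset)
messages = Array.new(20) { |i| Message.new("key#{i % 7}", i) }

concurrency = 3
partitioner = ->(msg) { msg.key }

groupings = messages.group_by { |msg| partitioner.call(msg) }.values

while groupings.size > concurrency
  # Biggest group first, so the two smallest sit at the end
  groupings.sort_by! { |grouping| -grouping.size }
  # Merge the two smallest; offset order needs to be maintained
  groupings << groupings.pop(2).flatten.sort_by(&:offset)
end

groupings.each_with_index { |group, i| puts "vp#{i}: #{group.map(&:offset)}" }
```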
data/lib/karafka/pro/routing/topic_extensions.rb
CHANGED
@@ -15,23 +15,59 @@ module Karafka
   module Routing
     # Routing extensions that allow to configure some extra PRO routing options
     module TopicExtensions
+      # Internal representation of the virtual partitions settings and configuration
+      # This allows us to abstract away things in a nice manner
+      #
+      # For features with more options than just on/off we use this approach as it simplifies
+      # the code. We do not use it for all not to create unneeded complexity
+      VirtualPartitions = Struct.new(
+        :active,
+        :partitioner,
+        :concurrency,
+        keyword_init: true
+      ) { alias_method :active?, :active }
+
       class << self
         # @param base [Class] class we extend
-        def included(base)
+        def prepended(base)
           base.attr_accessor :long_running_job
-          base.attr_accessor :virtual_partitioner
         end
       end
 
-      # @
-
-
+      # @param concurrency [Integer] max number of virtual partitions that can come out of the
+      #   single distribution flow. When set to more than the Karafka threading, will create
+      #   more work than workers. When less, can ensure we have spare resources to process other
+      #   things in parallel.
+      # @param partitioner [nil, #call] nil or callable partitioner
+      # @return [VirtualPartitions] method that allows to set the virtual partitions details
+      #   during the routing configuration and then allows to retrieve it
+      def virtual_partitions(
+        concurrency: Karafka::App.config.concurrency,
+        partitioner: nil
+      )
+        @virtual_partitions ||= VirtualPartitions.new(
+          active: !partitioner.nil?,
+          concurrency: concurrency,
+          partitioner: partitioner
+        )
+      end
+
+      # @return [Boolean] are virtual partitions enabled for given topic
+      def virtual_partitions?
+        virtual_partitions.active?
       end
 
       # @return [Boolean] is a given job on a topic a long-running one
       def long_running_job?
         @long_running_job || false
       end
+
+      # @return [Hash] hash with topic details and the extensions details
+      def to_h
+        super.merge(
+          virtual_partitions: virtual_partitions.to_h
+        )
+      end
     end
   end
 end
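Since `active` is derived purely from whether a partitioner was provided, a topic configured as in the earlier routing sketch would behave roughly as follows (illustrative values):

```ruby
topic.virtual_partitions?              # => true, a partitioner was given
topic.virtual_partitions.concurrency   # => 5
topic.virtual_partitions.partitioner   # => the proc from the routing block
topic.to_h[:virtual_partitions]        # => { active: true, partitioner: #<Proc...>, concurrency: 5 }
```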
data/lib/karafka/processing/executor.rb
CHANGED
@@ -37,14 +37,17 @@ module Karafka
       @topic = topic
     end
 
-    #
-    #
+    # Allows us to prepare the consumer in the listener thread prior to the job being send to
+    # the queue. It also allows to run some code that is time sensitive and cannot wait in the
+    # queue as it could cause starvation.
     #
     # @param messages [Array<Karafka::Messages::Message>]
-    # @param received_at [Time] the moment we've received the batch (actually the moment we've)
-    #   enqueued it, but good enough
     # @param coordinator [Karafka::Processing::Coordinator] coordinator for processing management
-    def
+    def before_enqueue(messages, coordinator)
+      # the moment we've received the batch or actually the moment we've enqueued it,
+      # but good enough
+      @enqueued_at = Time.now
+
      # Recreate consumer with each batch if persistence is not enabled
      # We reload the consumers with each batch instead of relying on some external signals
      # when needed for consistency. That way devs may have it on or off and not in this
@@ -57,9 +60,14 @@ module Karafka
      consumer.messages = Messages::Builders::Messages.call(
        messages,
        @topic,
-        received_at
+        @enqueued_at
      )
 
+      consumer.on_before_enqueue
+    end
+
+    # Runs setup and warm-up code in the worker prior to running the consumption
+    def before_consume
      consumer.on_before_consume
    end
data/lib/karafka/processing/jobs/base.rb
CHANGED
@@ -22,6 +22,10 @@ module Karafka
       @non_blocking = false
     end
 
+    # When redefined can run any code prior to the job being enqueued
+    # @note This will run in the listener thread and not in the worker
+    def before_enqueue; end
+
     # When redefined can run any code that should run before executing the proper code
     def before_call; end
data/lib/karafka/processing/jobs/consume.rb
CHANGED
@@ -18,13 +18,18 @@ module Karafka
       @executor = executor
       @messages = messages
       @coordinator = coordinator
-      @created_at = Time.now
       super()
     end
 
+    # Runs all the preparation code on the executor that needs to happen before the job is
+    # enqueued.
+    def before_enqueue
+      executor.before_enqueue(@messages, @coordinator)
+    end
+
     # Runs the before consumption preparations on the executor
     def before_call
-      executor.before_consume
+      executor.before_consume
     end
 
     # Runs the given executor
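Taken together, the executor and job changes split preparation across threads: `before_enqueue` runs in the listener thread before the job can sit in the queue, while `before_call` and the consumption itself run in a worker. A condensed sketch of that ordering (the queue wiring here is illustrative, not Karafka's internals):

```ruby
# Listener thread: build the job and run the time-sensitive preparation
# (e.g. LRJ pausing) before the job waits in the queue
job = jobs_builder.consume(executor, partition_messages, coordinator)
job.before_enqueue
jobs_queue << job

# Worker thread: picks the job up and runs the remaining lifecycle
job = jobs_queue.pop
job.before_call # => executor.before_consume => consumer.on_before_consume
job.call        # runs the actual consumption
```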
data/lib/karafka/routing/proxy.rb
CHANGED
@@ -7,15 +7,6 @@ module Karafka
   class Proxy
     attr_reader :target
 
-    # We should proxy only non ? and = methods as we want to have a regular dsl
-    IGNORED_POSTFIXES = %w[
-      ?
-      =
-      !
-    ].freeze
-
-    private_constant :IGNORED_POSTFIXES
-
     # @param target [Object] target object to which we proxy any DSL call
     # @param block [Proc] block that we want to evaluate in the proxy context
     def initialize(target, &block)
@@ -25,21 +16,23 @@ module Karafka
 
     # Translates the no "=" DSL of routing into elements assignments on target
     # @param method_name [Symbol] name of the missing method
-
-    # @param block [Proc] block provided to the method
-    def method_missing(method_name, *arguments, &block)
+    def method_missing(method_name, ...)
       return super unless respond_to_missing?(method_name)
 
-      @target.
+      if @target.respond_to?(:"#{method_name}=")
+        @target.public_send(:"#{method_name}=", ...)
+      else
+        @target.public_send(method_name, ...)
+      end
     end
 
     # Tells whether or not a given element exists on the target
     # @param method_name [Symbol] name of the missing method
     # @param include_private [Boolean] should we include private in the check as well
     def respond_to_missing?(method_name, include_private = false)
-
-
-
+      @target.respond_to?(:"#{method_name}=", include_private) ||
+        @target.respond_to?(method_name, include_private) ||
+        super
     end
   end
 end
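A standalone sketch of the resulting proxy behavior (simplified; forwarding with `...` in this position requires Ruby 3.0+): a bare `name :value` call on the proxy becomes `target.name = :value` whenever the target exposes a writer, and is forwarded as-is otherwise.

```ruby
class MiniProxy
  def initialize(target, &block)
    @target = target
    instance_eval(&block)
  end

  # Translate `name value` DSL calls into `name=` assignments on the target
  def method_missing(method_name, ...)
    return super unless respond_to_missing?(method_name)

    if @target.respond_to?(:"#{method_name}=")
      @target.public_send(:"#{method_name}=", ...)
    else
      @target.public_send(method_name, ...)
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    @target.respond_to?(:"#{method_name}=", include_private) ||
      @target.respond_to?(method_name, include_private) ||
      super
  end
end

Topic = Struct.new(:name, :consumer)
topic = Topic.new
MiniProxy.new(topic) do
  name 'payments'
  consumer 'PaymentsConsumer'
end
topic.name # => "payments"
```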
data/lib/karafka/routing/topic.rb
CHANGED
@@ -8,6 +8,7 @@ module Karafka
   class Topic
     attr_reader :id, :name, :consumer_group
     attr_writer :consumer
+    attr_accessor :subscription_group
 
     # Attributes we can inherit from the root unless they were defined on this level
     INHERITABLE_ATTRIBUTES = %i[
@@ -91,7 +92,8 @@ module Karafka
       id: id,
       name: name,
       consumer: consumer,
-      consumer_group_id: consumer_group.id
+      consumer_group_id: consumer_group.id,
+      subscription_group: subscription_group
     ).freeze
   end
 end
data/lib/karafka/templates/karafka.rb.erb
CHANGED
@@ -1,6 +1,6 @@
 # frozen_string_literal: true
-
 <% unless rails? -%>
+
 # This file is auto-generated during the install process.
 # If by any chance you've wanted a setup for Rails app, either run the `karafka:install`
 # command again or refer to the install templates available in the source codes
data/lib/karafka/version.rb
CHANGED
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: karafka
 version: !ruby/object:Gem::Version
-  version: 2.0.4
+  version: 2.0.7
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
   Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
   MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
   -----END CERTIFICATE-----
-date: 2022-
+date: 2022-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: karafka-core
metadata.gz.sig
CHANGED
Binary file