karafka 2.0.4 → 2.0.7
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +18 -0
- data/Gemfile.lock +1 -1
- data/README.md +9 -9
- data/bin/integrations +13 -3
- data/config/errors.yml +3 -0
- data/lib/karafka/admin.rb +2 -1
- data/lib/karafka/base_consumer.rb +20 -4
- data/lib/karafka/connection/client.rb +6 -6
- data/lib/karafka/connection/listener.rb +9 -5
- data/lib/karafka/contracts/consumer_group.rb +2 -2
- data/lib/karafka/contracts/consumer_group_topic.rb +10 -9
- data/lib/karafka/messages/builders/batch_metadata.rb +2 -3
- data/lib/karafka/messages/builders/messages.rb +3 -1
- data/lib/karafka/pro/active_job/consumer.rb +1 -1
- data/lib/karafka/pro/base_consumer.rb +32 -3
- data/lib/karafka/pro/contracts/consumer_group_topic.rb +21 -1
- data/lib/karafka/pro/loader.rb +1 -1
- data/lib/karafka/pro/processing/coordinator.rb +14 -0
- data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb +3 -2
- data/lib/karafka/pro/processing/partitioner.rb +3 -5
- data/lib/karafka/pro/routing/topic_extensions.rb +41 -5
- data/lib/karafka/processing/executor.rb +14 -6
- data/lib/karafka/processing/jobs/base.rb +4 -0
- data/lib/karafka/processing/jobs/consume.rb +7 -2
- data/lib/karafka/processing/worker.rb +0 -1
- data/lib/karafka/routing/proxy.rb +9 -16
- data/lib/karafka/routing/subscription_groups_builder.rb +1 -0
- data/lib/karafka/routing/topic.rb +3 -1
- data/lib/karafka/templates/karafka.rb.erb +1 -1
- data/lib/karafka/version.rb +1 -1
- data.tar.gz.sig +0 -0
- metadata +2 -2
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0abed3f97a58be6b48f640468f7d7e6d48bc0960596b21d022b4616dd047be28
+  data.tar.gz: 48143253beee640e25e47a81474767c179e715e855d6173b59566483a57af5a8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9c9f8c170ac82fc0f1eb6ea41698dcd82cc525006931a59443d004c94eb18b56ffcb67eb1eb45fcc1fd557fee22e6e63ceb7a8a001245469e3e574d87c88c8e8
+  data.tar.gz: 47bc7e7dfe5ca3d503a3cb18da4e4b95c076197dc26b5633195e169d3f4d94da4effaf27bd4360ddff1481031b1ee20f61e465e24f6984570f6067ca4fbd51ea
checksums.yaml.gz.sig
CHANGED
Binary file
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,23 @@
 # Karafka framework changelog
 
+## 2.0.7 (Unreleased)
+- [Breaking change] Redefine the Virtual Partitions routing DSL to accept concurrency
+- Allow for `concurrency` setting in Virtual Partitions to extend or limit number of jobs per regular partition. This allows to make sure, we do not use all the threads on virtual partitions jobs
+- Allow for creation of as many Virtual Partitions as needed, without taking global `concurrency` into consideration
+
+## 2.0.6 (2022-09-02)
+- Improve client closing.
+- Fix for: Multiple LRJ topics fetched concurrently block ability for LRJ to kick in (#1002)
+- Introduce a pre-enqueue sync execution layer to prevent starvation cases for LRJ
+- Close admin upon critical errors to prevent segmentation faults
+- Add support for manual subscription group management (#852)
+
+## 2.0.5 (2022-08-23)
+- Fix unnecessary double new line in the `karafka.rb` template for Ruby on Rails
+- Fix a case where a manually paused partition would not be processed after rebalance (#988)
+- Increase specs stability.
+- Lower concurrency of execution of specs in Github CI.
+
 ## 2.0.4 (2022-08-19)
 - Fix hanging topic creation (#964)
 - Fix conflict with other Rails loading libraries like `gruf` (#974)
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -8,12 +8,12 @@
 
 Karafka is a Ruby and Rails multi-threaded efficient Kafka processing framework that:
 
-- Supports parallel processing in [multiple threads](https://
-- Has [ActiveJob backend](https://
-- [Automatically integrates](https://
-- Supports in-development [code reloading](https://
+- Supports parallel processing in [multiple threads](https://karafka.io/docs/Concurrency-and-multithreading) (also for a [single topic partition](https://karafka.io/docs/Pro-Virtual-Partitions) work)
+- Has [ActiveJob backend](https://karafka.io/docs/Active-Job) support (including [ordered jobs](https://karafka.io/docs/Pro-Enhanced-Active-Job#ordered-jobs))
+- [Automatically integrates](https://karafka.io/docs/Integrating-with-Ruby-on-Rails-and-other-frameworks#integrating-with-ruby-on-rails) with Ruby on Rails
+- Supports in-development [code reloading](https://karafka.io/docs/Auto-reload-of-code-changes-in-development)
 - Is powered by [librdkafka](https://github.com/edenhill/librdkafka) (the Apache Kafka C/C++ client library)
-- Has an out-of the box [StatsD/DataDog monitoring](https://
+- Has an out-of the box [StatsD/DataDog monitoring](https://karafka.io/docs/Monitoring-and-logging) with a dashboard template.
 
 ```ruby
 # Define what topics you want to consume with which consumers in karafka.rb
@@ -42,13 +42,13 @@ If you're entirely new to the subject, you can start with our "Kafka on Rails" a
 - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 1 – Kafka basics and its advantages](https://mensfeld.pl/2017/11/kafka-on-rails-using-kafka-with-ruby-on-rails-part-1-kafka-basics-and-its-advantages/)
 - [Kafka on Rails: Using Kafka with Ruby on Rails – Part 2 – Getting started with Rails and Kafka](https://mensfeld.pl/2018/01/kafka-on-rails-using-kafka-with-ruby-on-rails-part-2-getting-started-with-ruby-and-kafka/)
 
-If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://
+If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to visit our [Getting started](https://karafka.io/docs/Getting-Started) guides and the [example apps repository](https://github.com/karafka/example-apps).
 
 We also maintain many [integration specs](https://github.com/karafka/karafka/tree/master/spec/integrations) illustrating various use-cases and features of the framework.
 
 ### TL;DR (1 minute from setup to publishing and consuming messages)
 
-**Prerequisites**: Kafka running. You can start it by following instructions from [here](https://
+**Prerequisites**: Kafka running. You can start it by following instructions from [here](https://karafka.io/docs/Setting-up-Kafka).
 
 1. Add and install Karafka:
 
@@ -85,8 +85,8 @@ Help me provide high-quality open-source software. Please see the Karafka [homep
 
 ## Support
 
-Karafka has [Wiki pages](https://
+Karafka has [Wiki pages](https://karafka.io/docs) for almost everything and a pretty decent [FAQ](https://karafka.io/docs/FAQ). It covers the installation, setup, and deployment, along with other useful details on how to run Karafka.
 
 If you have questions about using Karafka, feel free to join our [Slack](https://slack.karafka.io) channel.
 
-Karafka has [priority support](https://
+Karafka has [priority support](https://karafka.io/docs/Pro-Support) for technical and architectural questions that is part of the Karafka Pro subscription.
data/bin/integrations
CHANGED
@@ -19,7 +19,7 @@ ROOT_PATH = Pathname.new(File.expand_path(File.join(File.dirname(__FILE__), '../
 # When the value is high, there's a problem with thread allocation on Github CI, tht is why
 # we limit it. Locally we can run a lot of those, as many of them have sleeps and do not use a lot
 # of CPU
-CONCURRENCY = ENV.key?('CI') ?
+CONCURRENCY = ENV.key?('CI') ? 3 : Etc.nprocessors * 2
 
 # How may bytes do we want to keep from the stdout in the buffer for when we need to print it
 MAX_BUFFER_OUTPUT = 51_200
@@ -47,6 +47,8 @@ class Scenario
   # @param path [String] path to the scenarios file
   def initialize(path)
     @path = path
+    # First 1024 characters from stdout
+    @stdout_head = ''
     # Last 1024 characters from stdout
     @stdout_tail = ''
   end
@@ -75,8 +77,6 @@ class Scenario
   def finished?
     # If the thread is running too long, kill it
    if current_time - @started_at > MAX_RUN_TIME
-      @wait_thr.kill
-
      begin
        Process.kill('TERM', pid)
      # It may finish right after we want to kill it, that's why we ignore this
@@ -88,6 +88,7 @@ class Scenario
     # to stdout. Otherwise after reaching the buffer size, it would hang
     buffer = ''
     @stdout.read_nonblock(MAX_BUFFER_OUTPUT, buffer, exception: false)
+    @stdout_head = buffer if @stdout_head.empty?
     @stdout_tail << buffer
     @stdout_tail = @stdout_tail[-MAX_BUFFER_OUTPUT..-1] || @stdout_tail
@@ -112,6 +113,11 @@ class Scenario
     @wait_thr.value&.exitstatus || 123
   end
 
+  # @return [String] exit status of the process
+  def exit_status
+    @wait_thr.value.to_s
+  end
+
   # Prints a status report when scenario is finished and stdout if it failed
   def report
     if success?
@@ -123,7 +129,11 @@ class Scenario
 
     puts
     puts "\e[#{31}m#{'[FAILED]'}\e[0m #{name}"
+    puts "Time taken: #{current_time - @started_at} seconds"
     puts "Exit code: #{exit_code}"
+    puts "Exit status: #{exit_status}"
+    puts @stdout_head
+    puts '...'
     puts @stdout_tail
     puts buffer
     puts
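The head/tail capture added here is a small pattern worth seeing in isolation: remember the first read plus a rolling tail so failure reports show both ends of stdout. A minimal standalone sketch, with an array standing in for the nonblocking reads:

```ruby
# Head/tail stdout capture in isolation: keep the first chunk plus a rolling
# tail capped at MAX_BUFFER_OUTPUT bytes (the constant value mirrors the script).
MAX_BUFFER_OUTPUT = 51_200

head = ''
tail = ''

['first chunk', 'middle chunk', 'final chunk'].each do |buffer|
  head = buffer if head.empty?                 # keep only the very first read
  tail << buffer                               # append everything...
  tail = tail[-MAX_BUFFER_OUTPUT..-1] || tail  # ...but trim to the last bytes
end

puts head
puts '...'
puts tail
```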
data/config/errors.yml
CHANGED
@@ -35,6 +35,7 @@ en:
     consumer_format: needs to be present
     id_format: 'needs to be a string with a Kafka accepted format'
     initial_offset_format: needs to be either earliest or latest
+    subscription_group_format: must be nil or a non-empty string
 
   consumer_group:
     missing: needs to be present
@@ -54,3 +55,5 @@ en:
 
   pro_consumer_group_topic:
     consumer_format: needs to inherit from Karafka::Pro::BaseConsumer and not Karafka::Consumer
+    virtual_partitions.partitioner_respond_to_call: needs to be defined and needs to respond to `#call`
+    virtual_partitions.concurrency_format: needs to be equl or more than 1
data/lib/karafka/admin.rb
CHANGED
data/lib/karafka/base_consumer.rb
CHANGED
@@ -15,13 +15,24 @@ module Karafka
     # @return [Waterdrop::Producer] producer instance
     attr_accessor :producer
 
-    # Can be used to run preparation code
+    # Can be used to run preparation code prior to the job being enqueued
     #
     # @private
-    # @note This should not be used by the end users as it is part of the lifecycle of things
+    # @note This should not be used by the end users as it is part of the lifecycle of things and
+    #   not as a part of the public api. This should not perform any extensive operations as it is
+    #   blocking and running in the listener thread.
+    def on_before_enqueue; end
+
+    # Can be used to run preparation code in the worker
+    #
+    # @private
+    # @note This should not be used by the end users as it is part of the lifecycle of things and
     #   not as part of the public api. This can act as a hook when creating non-blocking
     #   consumers and doing other advanced stuff
-    def on_before_consume
+    def on_before_consume
+      messages.metadata.processed_at = Time.now
+      messages.metadata.freeze
+    end
 
     # Executes the default consumer flow.
     #
@@ -70,10 +81,15 @@ module Karafka
       end
     end
 
-    # Trigger method for running on
+    # Trigger method for running on partition revocation.
     #
     # @private
     def on_revoked
+      # We need to always un-pause the processing in case we have lost a given partition.
+      # Otherwise the underlying librdkafka would not know we may want to continue processing and
+      # the pause could in theory last forever
+      resume
+
       coordinator.revoke
 
       Karafka.monitor.instrument('consumer.revoked', caller: self) do
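Taken together with the executor changes below, the intended ordering is: `on_before_enqueue` in the listener thread, then `on_before_consume` in the worker, then `consume`. An illustrative plain-Ruby sketch of that ordering (not framework code):

```ruby
# Lifecycle ordering established by these hooks, sketched with a fake consumer.
class SketchConsumer
  def on_before_enqueue
    puts 'listener thread: fast, non-blocking preparation (must not block polling)'
  end

  def on_before_consume
    puts 'worker thread: stamp messages.metadata.processed_at and freeze it'
  end

  def consume
    puts 'worker thread: the actual processing'
  end
end

consumer = SketchConsumer.new
consumer.on_before_enqueue # invoked by the listener prior to enqueuing the job
consumer.on_before_consume # invoked by the worker right before consumption
consumer.consume
```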
data/lib/karafka/connection/client.rb
CHANGED
@@ -275,16 +275,16 @@ module Karafka
 
      # Commits the stored offsets in a sync way and closes the consumer.
      def close
-        # Once client is closed, we should not close it again
-        # This could only happen in case of a race-condition when forceful shutdown happens
-        # and triggers this from a different thread
-        return if @closed
-
        @mutex.synchronize do
-
+          # Once client is closed, we should not close it again
+          # This could only happen in case of a race-condition when forceful shutdown happens
+          # and triggers this from a different thread
+          return if @closed
 
          @closed = true
 
+          internal_commit_offsets(async: false)
+
          # Remove callbacks runners that were registered
          ::Karafka::Instrumentation.statistics_callbacks.delete(@subscription_group.id)
          ::Karafka::Instrumentation.error_callbacks.delete(@subscription_group.id)
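Moving the `@closed` guard inside the mutex matters because a check-then-set done outside the lock can be interleaved by a forceful shutdown arriving from another thread. A minimal standalone sketch of why the relocated guard is safe:

```ruby
# With the guard and the flag update under the same lock, check-and-set is
# atomic, so the close work runs exactly once no matter the interleaving.
mutex = Mutex.new
closed = false
closes = 0

threads = Array.new(4) do
  Thread.new do
    mutex.synchronize do
      next if closed # guard and flag update now happen under the same lock

      closed = true
      closes += 1    # stands in for the actual closing work
    end
  end
end

threads.each(&:join)
puts closes # => 1 regardless of thread interleaving
```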
data/lib/karafka/connection/listener.rb
CHANGED
@@ -185,7 +185,9 @@ module Karafka
        # processed (if it was assigned and revoked really fast), thus we may not have it
        # here. In cases like this, we do not run a revocation job
        @executors.find_all(topic, partition).each do |executor|
-
+          job = @jobs_builder.revoked(executor)
+          job.before_enqueue
+          jobs << job
        end
 
        # We need to remove all the executors of a given topic partition that we have lost, so
@@ -205,7 +207,9 @@ module Karafka
        jobs = []
 
        @executors.each do |_, _, executor|
-
+          job = @jobs_builder.shutdown(executor)
+          job.before_enqueue
+          jobs << job
        end
 
        @scheduler.schedule_shutdown(@jobs_queue, jobs)
@@ -238,10 +242,10 @@ module Karafka
        @partitioner.call(topic, messages) do |group_id, partition_messages|
          # Count the job we're going to create here
          coordinator.increment
-
          executor = @executors.find_or_create(topic, partition, group_id)
-
-
+          job = @jobs_builder.consume(executor, partition_messages, coordinator)
+          job.before_enqueue
+          jobs << job
        end
      end
 
data/lib/karafka/contracts/consumer_group.rb
CHANGED
@@ -12,8 +12,8 @@ module Karafka
      ).fetch('en').fetch('validations').fetch('consumer_group')
    end
 
-    required(:id) { |
-    required(:topics) { |
+    required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:topics) { |val| val.is_a?(Array) && !val.empty? }
 
    virtual do |data, errors|
      next unless errors.empty?
data/lib/karafka/contracts/consumer_group_topic.rb
CHANGED
@@ -12,15 +12,16 @@ module Karafka
      ).fetch('en').fetch('validations').fetch('consumer_group_topic')
    end
 
-    required(:consumer) { |
-    required(:deserializer) { |
-    required(:id) { |
-    required(:kafka) { |
-    required(:max_messages) { |
-    required(:initial_offset) { |
-    required(:max_wait_time) { |
-    required(:manual_offset_management) { |
-    required(:name) { |
+    required(:consumer) { |val| !val.nil? }
+    required(:deserializer) { |val| !val.nil? }
+    required(:id) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:kafka) { |val| val.is_a?(Hash) && !val.empty? }
+    required(:max_messages) { |val| val.is_a?(Integer) && val >= 1 }
+    required(:initial_offset) { |val| %w[earliest latest].include?(val) }
+    required(:max_wait_time) { |val| val.is_a?(Integer) && val >= 10 }
+    required(:manual_offset_management) { |val| [true, false].include?(val) }
+    required(:name) { |val| val.is_a?(String) && Contracts::TOPIC_REGEXP.match?(val) }
+    required(:subscription_group) { |val| val.nil? || (val.is_a?(String) && !val.empty?) }
 
    virtual do |data, errors|
      next unless errors.empty?
data/lib/karafka/messages/builders/batch_metadata.rb
CHANGED
@@ -28,9 +28,8 @@ module Karafka
            created_at: messages.last.timestamp,
            # When this batch was built and scheduled for execution
            scheduled_at: scheduled_at,
-            #
-
-            processed_at: Time.now
+            # This needs to be set to a correct value prior to processing starting
+            processed_at: nil
          )
        end
      end
data/lib/karafka/messages/builders/messages.rb
CHANGED
@@ -14,11 +14,13 @@ module Karafka
        # @param received_at [Time] moment in time when the messages were received
        # @return [Karafka::Messages::Messages] messages batch object
        def call(messages, topic, received_at)
+          # We cannot freeze the batch metadata because it is altered with the processed_at time
+          # prior to the consumption. It is being frozen there
          metadata = BatchMetadata.call(
            messages,
            topic,
            received_at
-          )
+          )
 
          Karafka::Messages::Messages.new(
            messages,
data/lib/karafka/pro/base_consumer.rb
CHANGED
@@ -23,13 +23,17 @@ module Karafka
 
      private_constant :MAX_PAUSE_TIME
 
-      # Pauses processing of a given partition until we're done with the processing
+      # Pauses processing of a given partition until we're done with the processing.
      # This ensures, that we can easily poll not reaching the `max.poll.interval`
-
+      # @note This needs to happen in the listener thread, because we cannot wait on this being
+      #   executed in the workers. Workers may be already running some LRJ jobs that are blocking
+      #   all the threads until finished, yet unless we pause the incoming partitions information,
+      #   we may be kicked out of the consumer group due to not polling often enough
+      def on_before_enqueue
        return unless topic.long_running_job?
 
        # This ensures, that when running LRJ with VP, things operate as expected
-        coordinator.
+        coordinator.on_enqueued do |first_group_message|
          # Pause at the first message in a batch. That way in case of a crash, we will not loose
          # any messages
          pause(first_group_message.offset, MAX_PAUSE_TIME)
@@ -44,6 +48,29 @@ module Karafka
        end
      end
 
+      # Trigger method for running on partition revocation.
+      #
+      # @private
+      def on_revoked
+        # We do not want to resume on revocation in case of a LRJ.
+        # For LRJ we resume after the successful processing or do a backoff pause in case of a
+        # failure. Double non-blocking resume could cause problems in coordination.
+        resume unless topic.long_running_job?
+
+        coordinator.revoke
+
+        Karafka.monitor.instrument('consumer.revoked', caller: self) do
+          revoked
+        end
+      rescue StandardError => e
+        Karafka.monitor.instrument(
+          'error.occurred',
+          error: e,
+          caller: self,
+          type: 'consumer.revoked.error'
+        )
+      end
+
      private
 
      # Handles the post-consumption flow depending on topic settings
@@ -74,6 +101,8 @@ module Karafka
          resume
        else
          # If processing failed, we need to pause
+          # For long running job this will overwrite the default never-ending pause and will cause
+          # the processing th keep going after the error backoff
          pause(@seek_offset || first_message.offset)
        end
      end
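The LRJ flow described here: pause effectively forever before the job is enqueued, then either resume (success) or overwrite the pause with a finite error backoff (failure). An illustrative sketch with a fake partition object; the real `MAX_PAUSE_TIME` is a private constant, so the value below is a stand-in:

```ruby
# LRJ pause/resume control flow, sketched outside the framework.
MAX_PAUSE_TIME = 1_000 * 60 * 60 # assumed placeholder value, in ms

class SketchPartition
  def pause(offset, timeout_ms = 5_000)
    puts "paused at offset #{offset} for #{timeout_ms}ms"
  end

  def resume
    puts 'resumed'
  end
end

partition = SketchPartition.new
partition.pause(100, MAX_PAUSE_TIME) # on_before_enqueue, in the listener thread

success = false
# post-consumption flow: resume on success, finite backoff pause on failure
success ? partition.resume : partition.pause(100)
```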
data/lib/karafka/pro/contracts/consumer_group_topic.rb
CHANGED
@@ -22,11 +22,31 @@ module Karafka
          ).fetch('en').fetch('validations').fetch('pro_consumer_group_topic')
        end
 
-
+        nested(:virtual_partitions) do
+          required(:active) { |val| [true, false].include?(val) }
+          required(:partitioner) { |val| val.nil? || val.respond_to?(:call) }
+          required(:concurrency) { |val| val.is_a?(Integer) && val >= 1 }
+        end
+
+        virtual do |data, errors|
+          next unless errors.empty?
          next if data[:consumer] < Karafka::Pro::BaseConsumer
 
          [[%i[consumer], :consumer_format]]
        end
+
+        # When virtual partitions are defined, partitioner needs to respond to `#call` and it
+        # cannot be nil
+        virtual do |data, errors|
+          next unless errors.empty?
+
+          virtual_partitions = data[:virtual_partitions]
+
+          next unless virtual_partitions[:active]
+          next if virtual_partitions[:partitioner].respond_to?(:call)
+
+          [[%i[virtual_partitions partitioner], :respond_to_call]]
+        end
      end
    end
  end
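What the new nested rules accept and reject, mirrored as a plain hand-rolled check (a sketch of the semantics only, not the karafka-core contract API):

```ruby
# Hand-rolled equivalent of the virtual_partitions contract rules above.
def virtual_partitions_errors(config)
  errors = []
  errors << :active_format unless [true, false].include?(config[:active])

  unless config[:partitioner].nil? || config[:partitioner].respond_to?(:call)
    errors << :partitioner_format
  end

  unless config[:concurrency].is_a?(Integer) && config[:concurrency] >= 1
    errors << :concurrency_format
  end

  # When active, the partitioner must be defined and respond to #call
  if config[:active] && !config[:partitioner].respond_to?(:call)
    errors << :partitioner_respond_to_call
  end

  errors
end

p virtual_partitions_errors(active: true, partitioner: ->(msg) { msg }, concurrency: 5)
# => []
p virtual_partitions_errors(active: true, partitioner: nil, concurrency: 0)
# => [:concurrency_format, :partitioner_respond_to_call]
```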
data/lib/karafka/pro/loader.rb
CHANGED
@@ -67,7 +67,7 @@ module Karafka
 
      # Loads routing extensions
      def load_routing_extensions
-        ::Karafka::Routing::Topic.
+        ::Karafka::Routing::Topic.prepend(Routing::TopicExtensions)
        ::Karafka::Routing::Builder.prepend(Routing::BuilderExtensions)
      end
    end
data/lib/karafka/pro/processing/coordinator.rb
CHANGED
@@ -18,6 +18,7 @@ module Karafka
        # @param args [Object] anything the base coordinator accepts
        def initialize(*args)
          super
+          @on_enqueued_invoked = false
          @on_started_invoked = false
          @on_finished_invoked = false
          @flow_lock = Mutex.new
@@ -30,6 +31,7 @@ module Karafka
          super
 
          @mutex.synchronize do
+            @on_enqueued_invoked = false
            @on_started_invoked = false
            @on_finished_invoked = false
            @first_message = messages.first
@@ -42,6 +44,18 @@ module Karafka
          @running_jobs.zero?
        end
 
+        # Runs synchronized code once for a collective of virtual partitions prior to work being
+        # enqueued
+        def on_enqueued
+          @flow_lock.synchronize do
+            return if @on_enqueued_invoked
+
+            @on_enqueued_invoked = true
+
+            yield(@first_message, @last_message)
+          end
+        end
+
        # Runs given code only once per all the coordinated jobs upon starting first of them
        def on_started
          @flow_lock.synchronize do
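The `on_enqueued` guard is the same run-once-per-collective pattern as `on_started` and `on_finished`: many virtual partition jobs may invoke the hook, but only the first invocation under the lock runs the block. In isolation:

```ruby
# The run-once-per-collective pattern the coordinator uses, stripped down.
class RunOnce
  def initialize
    @flow_lock = Mutex.new
    @invoked = false
  end

  def call
    @flow_lock.synchronize do
      return if @invoked

      @invoked = true
      yield
    end
  end
end

once = RunOnce.new
5.times { once.call { puts 'runs exactly once for the whole collective' } }
```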
data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb
CHANGED
@@ -25,8 +25,9 @@ module Karafka
        # @note It needs to be working with a proper consumer that will handle the partition
        #   management. This layer of the framework knows nothing about Kafka messages consumption.
        class ConsumeNonBlocking < ::Karafka::Processing::Jobs::Consume
-          #
-
+          # Makes this job non-blocking from the start
+          # @param args [Array] any arguments accepted by `::Karafka::Processing::Jobs::Consume`
+          def initialize(*args)
            super
            @non_blocking = true
          end
data/lib/karafka/pro/processing/partitioner.rb
CHANGED
@@ -21,17 +21,15 @@ module Karafka
        def call(topic, messages)
          ktopic = @subscription_group.topics.find(topic)
 
-          @concurrency ||= ::Karafka::App.config.concurrency
-
          # We only partition work if we have a virtual partitioner and more than one thread to
          # process the data. With one thread it is not worth partitioning the work as the work
          # itself will be assigned to one thread (pointless work)
-          if ktopic.
+          if ktopic.virtual_partitions? && ktopic.virtual_partitions.concurrency > 1
            # We need to reduce it to number of threads, so the group_id is not a direct effect
            # of the end user action. Otherwise the persistence layer for consumers would cache
            # it forever and it would cause memory leaks
            groupings = messages
-              .group_by { |msg| ktopic.
+              .group_by { |msg| ktopic.virtual_partitions.partitioner.call(msg) }
              .values
 
            # Reduce the max concurrency to a size that matches the concurrency
@@ -41,7 +39,7 @@ module Karafka
            # The algorithm here is simple, we assume that the most costly in terms of processing,
            # will be processing of the biggest group and we reduce the smallest once to have
            # max of groups equal to concurrency
-            while groupings.size >
+            while groupings.size > ktopic.virtual_partitions.concurrency
              groupings.sort_by! { |grouping| -grouping.size }
 
              # Offset order needs to be maintained for virtual partitions
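The reduction the partitioner performs can be sketched standalone: group messages by the partitioner key, then fold the smallest groups together until at most `concurrency` groups remain, restoring in-group (offset) order after each merge. A minimal sketch under those assumptions, with integers standing in for message offsets:

```ruby
# Grouping reduction sketch: at most `concurrency` virtual partitions come out.
def reduce_groupings(items, concurrency, &partitioner)
  groupings = items.group_by(&partitioner).values

  while groupings.size > concurrency
    # Biggest groups first, so the two smallest sit at the end
    groupings.sort_by! { |grouping| -grouping.size }

    smallest = groupings.pop
    # Offset order needs to be maintained within the merged group
    groupings[-1] = (groupings[-1] + smallest).sort
  end

  groupings
end

# 10 offsets spread over 5 keys, squeezed into at most 3 virtual partitions
p reduce_groupings((1..10).to_a, 3) { |offset| offset % 5 }
```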
data/lib/karafka/pro/routing/topic_extensions.rb
CHANGED
@@ -15,23 +15,59 @@ module Karafka
    module Routing
      # Routing extensions that allow to configure some extra PRO routing options
      module TopicExtensions
+        # Internal representation of the virtual partitions settings and configuration
+        # This allows us to abstract away things in a nice manner
+        #
+        # For features with more options than just on/off we use this approach as it simplifies
+        # the code. We do not use it for all not to create unneeded complexity
+        VirtualPartitions = Struct.new(
+          :active,
+          :partitioner,
+          :concurrency,
+          keyword_init: true
+        ) { alias_method :active?, :active }
+
        class << self
          # @param base [Class] class we extend
-          def
+          def prepended(base)
            base.attr_accessor :long_running_job
-            base.attr_accessor :virtual_partitioner
          end
        end
 
-        # @
-
-
+        # @param concurrency [Integer] max number of virtual partitions that can come out of the
+        #   single distribution flow. When set to more than the Karafka threading, will create
+        #   more work than workers. When less, can ensure we have spare resources to process other
+        #   things in parallel.
+        # @param partitioner [nil, #call] nil or callable partitioner
+        # @return [VirtualPartitions] method that allows to set the virtual partitions details
+        #   during the routing configuration and then allows to retrieve it
+        def virtual_partitions(
+          concurrency: Karafka::App.config.concurrency,
+          partitioner: nil
+        )
+          @virtual_partitions ||= VirtualPartitions.new(
+            active: !partitioner.nil?,
+            concurrency: concurrency,
+            partitioner: partitioner
+          )
+        end
+
+        # @return [Boolean] are virtual partitions enabled for given topic
+        def virtual_partitions?
+          virtual_partitions.active?
        end
 
        # @return [Boolean] is a given job on a topic a long-running one
        def long_running_job?
          @long_running_job || false
        end
+
+        # @return [Hash] hash with topic details and the extensions details
+        def to_h
+          super.merge(
+            virtual_partitions: virtual_partitions.to_h
+          )
+        end
      end
    end
  end
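The settings-struct pattern used above is handy on its own: a keyword-initialized `Struct` with a boolean predicate alias gives a tiny typed settings object. In isolation:

```ruby
# Keyword-initialized Struct with a predicate alias, as in VirtualPartitions.
Settings = Struct.new(:active, :partitioner, :concurrency, keyword_init: true) do
  alias_method :active?, :active
end

s = Settings.new(active: true, concurrency: 5)
puts s.active? # => true
p s.to_h       # => {:active=>true, :partitioner=>nil, :concurrency=>5}
```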
data/lib/karafka/processing/executor.rb
CHANGED
@@ -37,14 +37,17 @@ module Karafka
        @topic = topic
      end
 
-      #
-      #
+      # Allows us to prepare the consumer in the listener thread prior to the job being send to
+      # the queue. It also allows to run some code that is time sensitive and cannot wait in the
+      # queue as it could cause starvation.
      #
      # @param messages [Array<Karafka::Messages::Message>]
-      # @param received_at [Time] the moment we've received the batch (actually the moment we've)
-      #   enqueued it, but good enough
      # @param coordinator [Karafka::Processing::Coordinator] coordinator for processing management
-      def
+      def before_enqueue(messages, coordinator)
+        # the moment we've received the batch or actually the moment we've enqueued it,
+        # but good enough
+        @enqueued_at = Time.now
+
        # Recreate consumer with each batch if persistence is not enabled
        # We reload the consumers with each batch instead of relying on some external signals
        # when needed for consistency. That way devs may have it on or off and not in this
@@ -57,9 +60,14 @@ module Karafka
        consumer.messages = Messages::Builders::Messages.call(
          messages,
          @topic,
-
+          @enqueued_at
        )
 
+        consumer.on_before_enqueue
+      end
+
+      # Runs setup and warm-up code in the worker prior to running the consumption
+      def before_consume
        consumer.on_before_consume
      end
 
data/lib/karafka/processing/jobs/base.rb
CHANGED
@@ -22,6 +22,10 @@ module Karafka
          @non_blocking = false
        end
 
+        # When redefined can run any code prior to the job being enqueued
+        # @note This will run in the listener thread and not in the worker
+        def before_enqueue; end
+
        # When redefined can run any code that should run before executing the proper code
        def before_call; end
 
data/lib/karafka/processing/jobs/consume.rb
CHANGED
@@ -18,13 +18,18 @@ module Karafka
          @executor = executor
          @messages = messages
          @coordinator = coordinator
-          @created_at = Time.now
          super()
        end
 
+        # Runs all the preparation code on the executor that needs to happen before the job is
+        # enqueued.
+        def before_enqueue
+          executor.before_enqueue(@messages, @coordinator)
+        end
+
        # Runs the before consumption preparations on the executor
        def before_call
-          executor.before_consume
+          executor.before_consume
        end
 
        # Runs the given executor
data/lib/karafka/routing/proxy.rb
CHANGED
@@ -7,15 +7,6 @@ module Karafka
    class Proxy
      attr_reader :target
 
-      # We should proxy only non ? and = methods as we want to have a regular dsl
-      IGNORED_POSTFIXES = %w[
-        ?
-        =
-        !
-      ].freeze
-
-      private_constant :IGNORED_POSTFIXES
-
      # @param target [Object] target object to which we proxy any DSL call
      # @param block [Proc] block that we want to evaluate in the proxy context
      def initialize(target, &block)
@@ -25,21 +16,23 @@ module Karafka
 
      # Translates the no "=" DSL of routing into elements assignments on target
      # @param method_name [Symbol] name of the missing method
-
-      # @param block [Proc] block provided to the method
-      def method_missing(method_name, *arguments, &block)
+      def method_missing(method_name, ...)
        return super unless respond_to_missing?(method_name)
 
-        @target.
+        if @target.respond_to?(:"#{method_name}=")
+          @target.public_send(:"#{method_name}=", ...)
+        else
+          @target.public_send(method_name, ...)
+        end
      end
 
      # Tells whether or not a given element exists on the target
      # @param method_name [Symbol] name of the missing method
      # @param include_private [Boolean] should we include private in the check as well
      def respond_to_missing?(method_name, include_private = false)
-
-
-
+        @target.respond_to?(:"#{method_name}=", include_private) ||
+          @target.respond_to?(method_name, include_private) ||
+          super
      end
    end
  end
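The proxy's job is to let a bare `name value` call in a routing block become a `name=` assignment on the target when a setter exists. A standalone sketch of the same trick (plain Ruby, hypothetical `Target` class; the `...` argument forwarding mirrors the diff and needs Ruby 3.0+):

```ruby
# DSL proxy trick in isolation: forward missing methods to the target's setter
# when one exists, otherwise to the plain method.
class Target
  attr_accessor :consumer
end

class Proxy
  def initialize(target, &block)
    @target = target
    instance_eval(&block)
  end

  def method_missing(method_name, ...)
    return super unless respond_to_missing?(method_name)

    if @target.respond_to?(:"#{method_name}=")
      @target.public_send(:"#{method_name}=", ...)
    else
      @target.public_send(method_name, ...)
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    @target.respond_to?(:"#{method_name}=", include_private) ||
      @target.respond_to?(method_name, include_private) ||
      super
  end
end

target = Target.new
Proxy.new(target) { consumer 'MyConsumer' } # reads like a DSL, runs consumer=
puts target.consumer # => "MyConsumer"
```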
data/lib/karafka/routing/topic.rb
CHANGED
@@ -8,6 +8,7 @@ module Karafka
    class Topic
      attr_reader :id, :name, :consumer_group
      attr_writer :consumer
+      attr_accessor :subscription_group
 
      # Attributes we can inherit from the root unless they were defined on this level
      INHERITABLE_ATTRIBUTES = %i[
@@ -91,7 +92,8 @@ module Karafka
          id: id,
          name: name,
          consumer: consumer,
-          consumer_group_id: consumer_group.id
+          consumer_group_id: consumer_group.id,
+          subscription_group: subscription_group
        ).freeze
      end
    end
data/lib/karafka/templates/karafka.rb.erb
CHANGED
@@ -1,6 +1,6 @@
 # frozen_string_literal: true
-
 <% unless rails? -%>
+
 # This file is auto-generated during the install process.
 # If by any chance you've wanted a setup for Rails app, either run the `karafka:install`
 # command again or refer to the install templates available in the source codes
data/lib/karafka/version.rb
CHANGED
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: karafka
 version: !ruby/object:Gem::Version
-  version: 2.0.
+  version: 2.0.7
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -35,7 +35,7 @@ cert_chain:
   Qf04B9ceLUaC4fPVEz10FyobjaFoY4i32xRto3XnrzeAgfEe4swLq8bQsR3w/EF3
   MGU0FeSV2Yj7Xc2x/7BzLK8xQn5l7Yy75iPF+KP3vVmDHnNl
   -----END CERTIFICATE-----
-date: 2022-
+date: 2022-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: karafka-core
metadata.gz.sig
CHANGED
Binary file