karafka 2.0.0.beta5 → 2.0.0.rc1
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +8 -0
- data/CONTRIBUTING.md +0 -5
- data/Gemfile.lock +2 -2
- data/README.md +2 -10
- data/bin/stress_many +1 -1
- data/bin/stress_one +1 -1
- data/docker-compose.yml +4 -0
- data/lib/karafka/base_consumer.rb +4 -6
- data/lib/karafka/connection/client.rb +16 -13
- data/lib/karafka/connection/listener.rb +12 -24
- data/lib/karafka/connection/pauses_manager.rb +0 -8
- data/lib/karafka/contracts/config.rb +1 -0
- data/lib/karafka/pro/base_consumer.rb +21 -12
- data/lib/karafka/pro/loader.rb +2 -0
- data/lib/karafka/pro/processing/coordinator.rb +51 -0
- data/lib/karafka/pro/processing/partitioner.rb +41 -0
- data/lib/karafka/pro/routing/extensions.rb +6 -0
- data/lib/karafka/processing/coordinator.rb +6 -2
- data/lib/karafka/processing/coordinators_buffer.rb +3 -7
- data/lib/karafka/processing/executor.rb +1 -1
- data/lib/karafka/processing/partitioner.rb +22 -0
- data/lib/karafka/setup/config.rb +2 -0
- data/lib/karafka/version.rb +1 -1
- data.tar.gz.sig +0 -0
- metadata +4 -2
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 506ffb9aef3309eae2ee26e3283b7bc83859b26f4fe41995dca9d8f5e7bf0533
+  data.tar.gz: 14b39f0597676207bf9f2bf10b06c51ba539c3aa01959dfb2378c2e23941d240
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4b2ad5ef4eff629abfc0088be0c400bd0f9420d24c05c1414541a4c167e7e6bf2b9735bf0e596358d8e3f3c2392bf2c2b4ba345cdb8f0226b54877ce111fd749
+  data.tar.gz: 93adbc64906ff4a03e67dee646a5fe696d825357753bb804c690036800c947edaa186b67e5da8db115a0cc4a9efa46e6b1e061a608e496255bbfdc6ddeb60c14
checksums.yaml.gz.sig
CHANGED
Binary file
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,13 @@
 # Karafka framework changelog
 
+## 2.0.0-rc1 (2022-07-08)
+- Extract consumption partitioner out of listener inline code.
+- Introduce virtual partitioner concept for parallel processing of data from a single topic partition.
+- Improve stability when Kafka internal errors occur while polling.
+- Fix a case where we would resume an LRJ partition upon rebalance, reclaiming the partition while a job was still running.
+- Do not revoke pauses for lost partitions. This allows un-pausing reclaimed partitions when LRJ jobs are done.
+- Fail integrations by default (unless configured otherwise) if any errors occur during Karafka server execution.
+
 ## 2.0.0-beta5 (2022-07-05)
 - Always resume processing of a revoked partition upon assignment.
 - Improve specs stability.
data/CONTRIBUTING.md
CHANGED
@@ -34,8 +34,3 @@ By sending a pull request to the pro components, you are agreeing to transfer the copyright of your code to Maciej Mensfeld.
 
 If you have any questions, create an [issue](issue) (protip: do a quick search first to see if someone else didn't ask the same question before!).
 You can also reach us at hello@karafka.opencollective.com.
-
-## Credits
-
-Thank you to all the people who have already contributed to karafka!
-<a href="graphs/contributors"><img src="https://opencollective.com/karafka/contributors.svg?width=890" /></a>
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    karafka (2.0.0.beta5)
+    karafka (2.0.0.rc1)
       dry-configurable (~> 0.13)
       dry-monitor (~> 0.5)
       dry-validation (~> 1.7)
@@ -45,7 +45,7 @@ GEM
       dry-configurable (~> 0.13, >= 0.13.0)
       dry-core (~> 0.5, >= 0.5)
       dry-events (~> 0.2)
-    dry-schema (1.9.
+    dry-schema (1.9.3)
       concurrent-ruby (~> 1.0)
       dry-configurable (~> 0.13, >= 0.13.0)
       dry-core (~> 0.5, >= 0.5)
data/README.md
CHANGED
@@ -8,7 +8,7 @@
 
 ## About Karafka
 
-Karafka is a framework used to simplify Apache Kafka based Ruby and Ruby on Rails applications development.
+Karafka is a multi-threaded framework used to simplify Apache Kafka based Ruby and Ruby on Rails applications development.
 
 ```ruby
 # Define what topics you want to consume with which consumers in karafka.rb
@@ -45,7 +45,7 @@ We also maintain many [integration specs](https://github.com/karafka/karafka/tre
 
 ## Want to Upgrade? LGPL is not for you? Want to help?
 
-I also sell Karafka Pro subscription. It includes commercial-friendly license, priority support, architecture consultations and high throughput data processing-related features (
+I also sell Karafka Pro subscription. It includes commercial-friendly license, priority support, architecture consultations and high throughput data processing-related features (virtual partitions, long running jobs and more).
 
 **20%** of the income will be distributed back to other OSS projects that Karafka uses under the hood.
 
@@ -56,11 +56,3 @@ Help me provide high-quality open-source software. Please see the Karafka [homep
 
 Karafka has [Wiki pages](https://github.com/karafka/karafka/wiki) for almost everything and a pretty decent [FAQ](https://github.com/karafka/karafka/wiki/FAQ). It covers the whole installation, setup and deployment along with other useful details on how to run Karafka.
 
 If you have any questions about using Karafka, feel free to join our [Slack](https://slack.karafka.io) channel.
-
-## Note on contributions
-
-First, thank you for considering contributing to the Karafka ecosystem! It's people like you that make the open source community such a great community!
-
-Each pull request must pass all the RSpec specs, integration tests and meet our quality requirements.
-
-Fork it, update and wait for the Github Actions results.
data/bin/stress_many
CHANGED
data/bin/stress_one
CHANGED
data/docker-compose.yml
CHANGED
@@ -33,6 +33,9 @@ services:
       integrations_14_02:2:1,\
       integrations_15_02:2:1,\
       integrations_16_02:2:1,\
+      integrations_17_02:2:1,\
+      integrations_18_02:2:1,\
+      integrations_19_02:2:1,\
       integrations_00_03:3:1,\
       integrations_01_03:3:1,\
       integrations_02_03:3:1,\
@@ -41,6 +44,7 @@ services:
       integrations_01_10:10:1,\
       benchmarks_00_01:1:1,\
       benchmarks_00_05:5:1,\
+      benchmarks_01_05:5:1,\
       benchmarks_00_10:10:1"
     volumes:
       - /var/run/docker.sock:/var/run/docker.sock
data/lib/karafka/base_consumer.rb
CHANGED
@@ -35,9 +35,9 @@ module Karafka
         consume
       end
 
-
+      coordinator.consumption(self).success!
     rescue StandardError => e
-
+      coordinator.consumption(self).failure!
 
       Karafka.monitor.instrument(
         'error.occurred',
@@ -47,7 +47,7 @@ module Karafka
       )
     ensure
       # We need to decrease number of jobs that this coordinator coordinates as it has finished
-
+      coordinator.decrement
     end
 
     # @private
@@ -56,7 +56,7 @@ module Karafka
     def on_after_consume
       return if revoked?
 
-      if
+      if coordinator.success?
         coordinator.pause_tracker.reset
 
         # Mark as consumed only if manual offset management is not on
@@ -76,8 +76,6 @@ module Karafka
     def on_revoked
       coordinator.revoke
 
-      resume
-
       Karafka.monitor.instrument('consumer.revoked', caller: self) do
         revoked
       end
data/lib/karafka/connection/client.rb
CHANGED
@@ -316,37 +316,40 @@ module Karafka
 
       time_poll.start
 
-      @kafka.poll(
+      @kafka.poll(timeout)
     rescue ::Rdkafka::RdkafkaError => e
-
-
-
+      # We return nil, so we do not restart until running the whole loop
+      # This allows us to run revocation jobs and other things and we will pick up new work
+      # next time after dispatching all the things that are needed
+      #
+      # If we would retry here, the client reset would become transparent and we would not have
+      # a chance to take any actions
       case e.code
       when :max_poll_exceeded # -147
         reset
+        return nil
       when :transport # -195
         reset
+        return nil
      when :rebalance_in_progress # -27
         reset
+        return nil
       when :not_coordinator # 16
         reset
+        return nil
       when :network_exception # 13
         reset
+        return nil
       end
 
-      time_poll.
-
+      raise if time_poll.attempts > MAX_POLL_RETRIES
       raise unless time_poll.retryable?
 
+      time_poll.checkpoint
       time_poll.backoff
 
-      #
-
-      # next time after dispatching all the things that are needed
-      #
-      # If we would retry here, the client reset would become transparent and we would not have
-      # a chance to take any actions
-      nil
+      # On unknown errors we do our best to retry and handle them before raising
+      retry
     end
 
     # Builds a new rdkafka consumer instance based on the subscription group configuration
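The client.rb hunk above rewires poll error handling: known transient Rdkafka errors reset the client and return nil, while unknown errors go through a capped backoff-and-retry before being re-raised. Below is a hedged, framework-free sketch of that retry-budget pattern; `TransientError`, `MAX_RETRIES` and the backoff formula are illustrative assumptions, not Karafka's actual implementation.

```ruby
MAX_RETRIES = 5

class TransientError < StandardError; end

# Runs the given block, retrying transient failures with exponential
# backoff until the retry budget is exhausted, then re-raises.
def poll_with_backoff(attempts = 0)
  yield
rescue TransientError
  attempts += 1
  # Once past the budget, surface the error instead of retrying forever
  raise if attempts > MAX_RETRIES

  sleep(0.001 * (2**attempts)) # brief exponential backoff before retrying
  retry
end

# Usage: succeeds on the third attempt after two transient failures
calls = 0
result = poll_with_backoff do
  calls += 1
  raise TransientError if calls < 3
  :polled
end
```

Note that `retry` re-runs the method body while locals such as `attempts` keep their values, which is exactly what makes the budget check work.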
data/lib/karafka/connection/listener.rb
CHANGED
@@ -18,15 +18,18 @@ module Karafka
     # @param jobs_queue [Karafka::Processing::JobsQueue] queue where we should push work
     # @return [Karafka::Connection::Listener] listener instance
     def initialize(subscription_group, jobs_queue)
+      proc_config = ::Karafka::App.config.internal.processing
+
       @id = SecureRandom.uuid
       @subscription_group = subscription_group
       @jobs_queue = jobs_queue
-      @jobs_builder = ::Karafka::App.config.internal.processing.jobs_builder
       @coordinators = Processing::CoordinatorsBuffer.new
       @client = Client.new(@subscription_group)
       @executors = Processing::ExecutorsBuffer.new(@client, subscription_group)
+      @jobs_builder = proc_config.jobs_builder
+      @partitioner = proc_config.partitioner_class.new(subscription_group)
       # We reference scheduler here as it is much faster than fetching this each time
-      @scheduler =
+      @scheduler = proc_config.scheduler
       # We keep one buffer for messages to preserve memory and not allocate extra objects
       # We can do this that way because we always first schedule jobs using messages before we
       # fetch another batch.
@@ -79,10 +82,6 @@ module Karafka
         poll_and_remap_messages
       end
 
-      # This will ensure, that in the next poll, we continue processing (if we get them back)
-      # partitions that we have paused
-      resume_assigned_partitions
-
       # If there were revoked partitions, we need to wait on their jobs to finish before
       # distributing consuming jobs as upon revoking, we might get assigned to the same
       # partitions, thus getting their jobs. The revoking jobs need to finish before
@@ -159,8 +158,6 @@ module Karafka
 
       revoked_partitions.each do |topic, partitions|
         partitions.each do |partition|
-          # We revoke the coordinator here, so we do not have to revoke it in the revoke job
-          # itself (this happens prior to scheduling those jobs)
           @coordinators.revoke(topic, partition)
 
           # There may be a case where we have lost partition of which data we have never
@@ -204,17 +201,6 @@ module Karafka
       )
     end
 
-    # Revoked partition needs to be resumed if we were processing them earlier. This will do
-    # nothing to things that we are planning to process. Without this, things we get
-    # re-assigned would not be polled.
-    def resume_assigned_partitions
-      @client.rebalance_manager.assigned_partitions.each do |topic, partitions|
-        partitions.each do |partition|
-          @client.resume(topic, partition)
-        end
-      end
-    end
-
     # Takes the messages per topic partition and enqueues processing jobs in threads using
     # given scheduler.
     def build_and_schedule_consumption_jobs
@@ -226,14 +212,16 @@ module Karafka
        coordinator = @coordinators.find_or_create(topic, partition)
 
        # Start work coordination for this topic partition
-       coordinator.start
+       coordinator.start(messages)
 
-
-
+       @partitioner.call(topic, messages) do |group_id, partition_messages|
+         # Count the job we're going to create here
+         coordinator.increment
 
-
+         executor = @executors.find_or_create(topic, partition, group_id)
 
-
+         jobs << @jobs_builder.consume(executor, partition_messages, coordinator)
+       end
      end
 
      @scheduler.schedule_consumption(@jobs_queue, jobs)
data/lib/karafka/connection/pauses_manager.rb
CHANGED
@@ -25,14 +25,6 @@ module Karafka
       )
     end
 
-    # Revokes pause tracker for a given topic partition
-    #
-    # @param topic [String] topic name
-    # @param partition [Integer] partition number
-    def revoke(topic, partition)
-      @pauses[topic].delete(partition)
-    end
-
     # Resumes processing of partitions for which pause time has ended.
     #
     # @yieldparam [String] topic name
data/lib/karafka/pro/base_consumer.rb
CHANGED
@@ -26,29 +26,38 @@ module Karafka
     # Pauses processing of a given partition until we're done with the processing
     # This ensures, that we can easily poll not reaching the `max.poll.interval`
     def on_before_consume
-      # Pause at the first message in a batch. That way in case of a crash, we will not loose
-      # any messages
       return unless topic.long_running_job?
 
-
+      # This ensures, that when running LRJ with VP, things operate as expected
+      coordinator.on_started do |first_group_message|
+        # Pause at the first message in a batch. That way in case of a crash, we will not loose
+        # any messages
+        pause(first_group_message.offset, MAX_PAUSE_TIME)
+      end
     end
 
     # Runs extra logic after consumption that is related to handling long running jobs
     # @note This overwrites the '#on_after_consume' from the base consumer
     def on_after_consume
-
-
+      coordinator.on_finished do |first_group_message, last_group_message|
+        on_after_consume_regular(first_group_message, last_group_message)
+      end
+    end
 
-
+    private
+
+    # Handles the post-consumption flow depending on topic settings
+    #
+    # @param first_message [Karafka::Messages::Message]
+    # @param last_message [Karafka::Messages::Message]
+    def on_after_consume_regular(first_message, last_message)
+      if coordinator.success?
        coordinator.pause_tracker.reset
 
        # We use the non-blocking one here. If someone needs the blocking one, can implement it
        # with manual offset management
        # Mark as consumed only if manual offset management is not on
-       mark_as_consumed(
-
-       # We check it twice as marking could change this state
-       return if revoked?
+       mark_as_consumed(last_message) unless topic.manual_offset_management? || revoked?
 
        # If this is not a long running job there is nothing for us to do here
        return unless topic.long_running_job?
@@ -60,12 +69,12 @@ module Karafka
        # interesting (yet valid) corner case, where with manual offset management on and no
        # marking as consumed, we end up with an infinite loop processing same messages over and
        # over again
-       seek(@seek_offset ||
+       seek(@seek_offset || first_message.offset)
 
        resume
      else
        # If processing failed, we need to pause
-       pause(@seek_offset ||
+       pause(@seek_offset || first_message.offset)
      end
    end
data/lib/karafka/pro/loader.rb
CHANGED
@@ -21,6 +21,7 @@ module Karafka
       processing/jobs/consume_non_blocking
       processing/jobs_builder
       processing/coordinator
+      processing/partitioner
       routing/extensions
       active_job/consumer
       active_job/dispatcher
@@ -39,6 +40,7 @@ module Karafka
       icfg = config.internal
 
       icfg.processing.coordinator_class = Processing::Coordinator
+      icfg.processing.partitioner_class = Processing::Partitioner
       icfg.processing.scheduler = Processing::Scheduler.new
       icfg.processing.jobs_builder = Processing::JobsBuilder.new
data/lib/karafka/pro/processing/coordinator.rb
CHANGED
@@ -6,6 +6,57 @@ module Karafka
       # Pro coordinator that provides extra orchestration methods useful for parallel processing
       # within the same partition
       class Coordinator < ::Karafka::Processing::Coordinator
+        # @param args [Object] anything the base coordinator accepts
+        def initialize(*args)
+          super
+          @on_started_invoked = false
+          @on_finished_invoked = false
+          @flow_lock = Mutex.new
+        end
+
+        # Starts the coordination process
+        # @param messages [Array<Karafka::Messages::Message>] messages for which processing we are
+        #   going to coordinate.
+        def start(messages)
+          super
+
+          @mutex.synchronize do
+            @on_started_invoked = false
+            @on_finished_invoked = false
+            @first_message = messages.first
+            @last_message = messages.last
+          end
+        end
+
+        # @return [Boolean] is the coordinated work finished or not
+        def finished?
+          @running_jobs.zero?
+        end
+
+        # Runs given code only once per all the coordinated jobs upon starting first of them
+        def on_started
+          @flow_lock.synchronize do
+            return if @on_started_invoked
+
+            @on_started_invoked = true
+
+            yield(@first_message, @last_message)
+          end
+        end
+
+        # Runs once when all the work that is supposed to be coordinated is finished
+        # It runs once per all the coordinated jobs and should be used to run any type of post
+        # jobs coordination processing execution
+        def on_finished
+          @flow_lock.synchronize do
+            return unless finished?
+            return if @on_finished_invoked
+
+            @on_finished_invoked = true
+
+            yield(@first_message, @last_message)
+          end
+        end
       end
     end
   end
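The Pro coordinator above guards its `on_started`/`on_finished` hooks with a mutex and a boolean flag so that, with many jobs running in parallel on one partition, the block fires exactly once. A hedged, standalone sketch of that run-once pattern (`OnceGuard` is an illustrative name, not a Karafka class):

```ruby
# Many threads may invoke the hook; a mutex plus a flag guarantee the
# wrapped block runs exactly once, mirroring on_started/on_finished above.
class OnceGuard
  def initialize
    @lock = Mutex.new
    @invoked = false
  end

  # Yields only on the first call; every later call is a no-op
  def call
    @lock.synchronize do
      return if @invoked

      @invoked = true
      yield
    end
  end
end

counter = 0
guard = OnceGuard.new
threads = 10.times.map { Thread.new { guard.call { counter += 1 } } }
threads.each(&:join)
# counter stays at 1 regardless of how many threads raced
```

The `synchronize` block makes the check-then-set of `@invoked` atomic, which is what a bare boolean flag alone cannot provide.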
data/lib/karafka/pro/processing/partitioner.rb
ADDED
@@ -0,0 +1,41 @@
+# frozen_string_literal: true
+
+# This Karafka component is a Pro component.
+# All of the commercial components are present in the lib/karafka/pro directory of this
+# repository and their usage requires commercial license agreement.
+#
+# Karafka has also commercial-friendly license, commercial support and commercial components.
+#
+# By sending a pull request to the pro components, you are agreeing to transfer the copyright of
+# your code to Maciej Mensfeld.
+
+module Karafka
+  module Pro
+    module Processing
+      # Pro partitioner that can distribute work based on the virtual partitioner settings
+      class Partitioner < ::Karafka::Processing::Partitioner
+        # @param topic [String] topic name
+        # @param messages [Array<Karafka::Messages::Message>] karafka messages
+        # @yieldparam [Integer] group id
+        # @yieldparam [Array<Karafka::Messages::Message>] karafka messages
+        def call(topic, messages)
+          ktopic = @subscription_group.topics.find(topic)
+
+          @concurrency ||= ::Karafka::App.config.concurrency
+
+          # We only partition work if we have a virtual partitioner and more than one thread to
+          # process the data. With one thread it is not worth partitioning the work as the work
+          # itself will be assigned to one thread (pointless work)
+          if ktopic.virtual_partitioner? && @concurrency > 1
+            messages
+              .group_by { |msg| ktopic.virtual_partitioner.call(msg).hash.abs % @concurrency }
+              .each { |group_id, messages_group| yield(group_id, messages_group) }
+          else
+            # When no virtual partitioner, works as regular one
+            yield(0, messages)
+          end
+        end
+      end
+    end
+  end
+end
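The Pro partitioner above distributes messages with `virtual_partitioner.call(msg).hash.abs % concurrency`. This standalone snippet demonstrates the two properties that expression relies on: the number of groups never exceeds the concurrency level, and equal keys always map to the same group, which preserves per-key ordering within a virtual partition.

```ruby
concurrency = 3
keys = %w[a b c d e f g a b c]

# Same bucketing expression as the diff above, applied to plain strings
buckets = keys.group_by { |k| k.hash.abs % concurrency }

# Repeated keys land in the bucket they were first assigned to, because
# String#hash is deterministic within one process
bucket_of_a = buckets.find { |_, ks| ks.include?('a') }.first
```

With one thread (`concurrency == 1`) every key maps to bucket 0, which is why the Pro partitioner skips the grouping entirely in that case.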
data/lib/karafka/pro/routing/extensions.rb
CHANGED
@@ -19,9 +19,15 @@ module Karafka
         # @param base [Class] class we extend
         def included(base)
           base.attr_accessor :long_running_job
+          base.attr_accessor :virtual_partitioner
         end
       end
 
+      # @return [Boolean] true if virtual partitioner is defined, false otherwise
+      def virtual_partitioner?
+        virtual_partitioner != nil
+      end
+
       # @return [Boolean] is a given job on a topic a long running one
       def long_running_job?
         @long_running_job || false
data/lib/karafka/processing/coordinator.rb
CHANGED
@@ -23,7 +23,9 @@ module Karafka
       end
 
       # Starts the coordinator for given consumption jobs
-
+      # @param _messages [Array<Karafka::Messages::Message>] batch of message for which we are
+      #   going to coordinate work. Not used with regular coordinator.
+      def start(_messages)
        @mutex.synchronize do
          @running_jobs = 0
          # We need to clear the consumption results hash here, otherwise we could end up storing
@@ -44,7 +46,9 @@ module Karafka
 
        return @running_jobs unless @running_jobs.negative?
 
-
+        # This should never happen. If it does, something is heavily out of sync. Please reach
+        # out to us if you encounter this
+        raise Karafka::Errors::InvalidCoordinatorState, 'Was zero before decrementation'
      end
    end
data/lib/karafka/processing/coordinators_buffer.rb
CHANGED
@@ -2,7 +2,7 @@
 
 module Karafka
   module Processing
-    #
+    # Coordinators builder used to build coordinators per topic partition
     #
     # It provides direct pauses access for revocation
     #
@@ -34,17 +34,13 @@ module Karafka
     # @param topic [String] topic name
     # @param partition [Integer] partition number
     def revoke(topic, partition)
-      @
+      return unless @coordinators[topic].key?(partition)
 
       # The fact that we delete here does not change the fact that the executor still holds the
       # reference to this coordinator. We delete it here, as we will no longer process any
       # new stuff with it and we may need a new coordinator if we regain this partition, but the
       # coordinator may still be in use
-
-
-      return unless coordinator
-
-      coordinator.revoke
+      @coordinators[topic].delete(partition).revoke
     end
 
     # Clears coordinators and re-created the pauses manager
data/lib/karafka/processing/partitioner.rb
ADDED
@@ -0,0 +1,22 @@
+# frozen_string_literal: true
+
+module Karafka
+  module Processing
+    # Basic partitioner for work division
+    # It does not divide any work.
+    class Partitioner
+      # @param subscription_group [Karafka::Routing::SubscriptionGroup] subscription group
+      def initialize(subscription_group)
+        @subscription_group = subscription_group
+      end
+
+      # @param _topic [String] topic name
+      # @param messages [Array<Karafka::Messages::Message>] karafka messages
+      # @yieldparam [Integer] group id
+      # @yieldparam [Array<Karafka::Messages::Message>] karafka messages
+      def call(_topic, messages)
+        yield(0, messages)
+      end
+    end
+  end
+end
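The default partitioner above implements a deliberately minimal contract: yield a single group (id 0) containing the whole batch. Karafka swaps in the Pro variant through the `partitioner_class` setting, so callers depend only on the `#call(topic, messages)` interface. A hedged sketch of that injection style; the `Config` struct and class names here are illustrative, not Karafka's actual setup:

```ruby
# Minimal stand-in matching the default partitioner's contract: the whole
# batch goes out as one group with id 0
class DefaultPartitioner
  def call(_topic, messages)
    yield(0, messages)
  end
end

# The consumer of the config only knows about partitioner_class, so a Pro
# implementation with the same #call contract can be dropped in
Config = Struct.new(:partitioner_class)
config = Config.new(DefaultPartitioner)

partitioner = config.partitioner_class.new
groups = []
partitioner.call('events', %w[m1 m2]) { |id, msgs| groups << [id, msgs] }
```

Storing the class (rather than an instance) in the config is what lets the listener construct one partitioner per subscription group, as the listener diff above does.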
data/lib/karafka/setup/config.rb
CHANGED
@@ -107,6 +107,8 @@ module Karafka
         setting :jobs_builder, default: Processing::JobsBuilder.new
         # option coordinator [Class] work coordinator we want to use for processing coordination
         setting :coordinator_class, default: Processing::Coordinator
+        # option partitioner_class [Class] partitioner we use against a batch of data
+        setting :partitioner_class, default: Processing::Partitioner
       end
 
       # Karafka components for ActiveJob
data/lib/karafka/version.rb
CHANGED
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: karafka
 version: !ruby/object:Gem::Version
-  version: 2.0.0.beta5
+  version: 2.0.0.rc1
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -34,7 +34,7 @@ cert_chain:
   R2P11bWoCtr70BsccVrN8jEhzwXngMyI2gVt750Y+dbTu1KgRqZKp/ECe7ZzPzXj
   pIy9vHxTANKYVyI4qj8OrFdEM5BQNu8oQpL0iQ==
   -----END CERTIFICATE-----
-date: 2022-07-
+date: 2022-07-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: dry-configurable
@@ -240,6 +240,7 @@ files:
 - lib/karafka/pro/processing/coordinator.rb
 - lib/karafka/pro/processing/jobs/consume_non_blocking.rb
 - lib/karafka/pro/processing/jobs_builder.rb
+- lib/karafka/pro/processing/partitioner.rb
 - lib/karafka/pro/processing/scheduler.rb
 - lib/karafka/pro/routing/extensions.rb
 - lib/karafka/process.rb
@@ -253,6 +254,7 @@ files:
 - lib/karafka/processing/jobs/shutdown.rb
 - lib/karafka/processing/jobs_builder.rb
 - lib/karafka/processing/jobs_queue.rb
+- lib/karafka/processing/partitioner.rb
 - lib/karafka/processing/result.rb
 - lib/karafka/processing/scheduler.rb
 - lib/karafka/processing/worker.rb
metadata.gz.sig
CHANGED
Binary file