karafka 2.0.0.beta5 → 2.0.0.rc1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +8 -0
- data/CONTRIBUTING.md +0 -5
- data/Gemfile.lock +2 -2
- data/README.md +2 -10
- data/bin/stress_many +1 -1
- data/bin/stress_one +1 -1
- data/docker-compose.yml +4 -0
- data/lib/karafka/base_consumer.rb +4 -6
- data/lib/karafka/connection/client.rb +16 -13
- data/lib/karafka/connection/listener.rb +12 -24
- data/lib/karafka/connection/pauses_manager.rb +0 -8
- data/lib/karafka/contracts/config.rb +1 -0
- data/lib/karafka/pro/base_consumer.rb +21 -12
- data/lib/karafka/pro/loader.rb +2 -0
- data/lib/karafka/pro/processing/coordinator.rb +51 -0
- data/lib/karafka/pro/processing/partitioner.rb +41 -0
- data/lib/karafka/pro/routing/extensions.rb +6 -0
- data/lib/karafka/processing/coordinator.rb +6 -2
- data/lib/karafka/processing/coordinators_buffer.rb +3 -7
- data/lib/karafka/processing/executor.rb +1 -1
- data/lib/karafka/processing/partitioner.rb +22 -0
- data/lib/karafka/setup/config.rb +2 -0
- data/lib/karafka/version.rb +1 -1
- data.tar.gz.sig +0 -0
- metadata +4 -2
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 506ffb9aef3309eae2ee26e3283b7bc83859b26f4fe41995dca9d8f5e7bf0533
+  data.tar.gz: 14b39f0597676207bf9f2bf10b06c51ba539c3aa01959dfb2378c2e23941d240
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4b2ad5ef4eff629abfc0088be0c400bd0f9420d24c05c1414541a4c167e7e6bf2b9735bf0e596358d8e3f3c2392bf2c2b4ba345cdb8f0226b54877ce111fd749
+  data.tar.gz: 93adbc64906ff4a03e67dee646a5fe696d825357753bb804c690036800c947edaa186b67e5da8db115a0cc4a9efa46e6b1e061a608e496255bbfdc6ddeb60c14
checksums.yaml.gz.sig
CHANGED
Binary file
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,13 @@
 # Karafka framework changelog

+## 2.0.0-rc1 (2022-07-08)
+- Extract the consumption partitioner out of the listener's inline code.
+- Introduce a virtual partitioner concept for parallel processing of data from a single topic partition.
+- Improve stability when Kafka internal errors occur while polling.
+- Fix a case where, upon rebalance, we would resume an LRJ partition and reclaim it while a job was still running.
+- Do not revoke pauses for lost partitions. This allows reclaimed partitions to be un-paused when LRJ jobs are done.
+- Fail integrations by default (unless configured otherwise) if any errors occur during Karafka server execution.
+
 ## 2.0.0-beta5 (2022-07-05)
 - Always resume processing of a revoked partition upon assignment.
 - Improve specs stability.
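The virtual partitioner called out above is configured per topic in routing: the routing extension later in this diff adds a `virtual_partitioner` attribute, and the Pro partitioner hashes the lambda's result to pick a processing group. A minimal, hypothetical `karafka.rb` sketch of how it could be wired (the `OrdersConsumer` class and grouping by message key are assumptions, not part of this diff):

```ruby
# Hypothetical karafka.rb sketch for Karafka Pro 2.0.0-rc1
class KarafkaApp < Karafka::App
  setup do |config|
    config.kafka = { 'bootstrap.servers': '127.0.0.1:9092' }
    # Virtual partitioning only kicks in when concurrency > 1
    config.concurrency = 5
  end

  routes.draw do
    topic :orders do
      consumer OrdersConsumer
      # Messages whose lambda results are equal stay in one group (and one
      # thread); distinct keys may be processed in parallel
      virtual_partitioner ->(message) { message.key }
    end
  end
end
```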
data/CONTRIBUTING.md
CHANGED
@@ -34,8 +34,3 @@ By sending a pull request to the pro components, you are agreeing to transfer th

 If you have any questions, create an [issue](issue) (protip: do a quick search first to see if someone else didn't ask the same question before!).
 You can also reach us at hello@karafka.opencollective.com.
-
-## Credits
-
-Thank you to all the people who have already contributed to karafka!
-<a href="graphs/contributors"><img src="https://opencollective.com/karafka/contributors.svg?width=890" /></a>
data/Gemfile.lock
CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    karafka (2.0.0.beta5)
+    karafka (2.0.0.rc1)
       dry-configurable (~> 0.13)
       dry-monitor (~> 0.5)
       dry-validation (~> 1.7)
@@ -45,7 +45,7 @@ GEM
       dry-configurable (~> 0.13, >= 0.13.0)
       dry-core (~> 0.5, >= 0.5)
       dry-events (~> 0.2)
-    dry-schema (1.9.
+    dry-schema (1.9.3)
       concurrent-ruby (~> 1.0)
       dry-configurable (~> 0.13, >= 0.13.0)
       dry-core (~> 0.5, >= 0.5)
data/README.md
CHANGED
@@ -8,7 +8,7 @@

 ## About Karafka

-Karafka is a framework used to simplify Apache Kafka based Ruby and Ruby on Rails applications development.
+Karafka is a multi-threaded framework used to simplify Apache Kafka based Ruby and Ruby on Rails applications development.

 ```ruby
 # Define what topics you want to consume with which consumers in karafka.rb
@@ -45,7 +45,7 @@ We also maintain many [integration specs](https://github.com/karafka/karafka/tre

 ## Want to Upgrade? LGPL is not for you? Want to help?

-I also sell Karafka Pro subscription. It includes commercial-friendly license, priority support, architecture consultations and high throughput data processing-related features (
+I also sell Karafka Pro subscription. It includes commercial-friendly license, priority support, architecture consultations and high throughput data processing-related features (virtual partitions, long running jobs and more).

 **20%** of the income will be distributed back to other OSS projects that Karafka uses under the hood.

@@ -56,11 +56,3 @@ Help me provide high-quality open-source software. Please see the Karafka [homep
 Karafka has [Wiki pages](https://github.com/karafka/karafka/wiki) for almost everything and a pretty decent [FAQ](https://github.com/karafka/karafka/wiki/FAQ). It covers the whole installation, setup and deployment along with other useful details on how to run Karafka.

 If you have any questions about using Karafka, feel free to join our [Slack](https://slack.karafka.io) channel.
-
-## Note on contributions
-
-First, thank you for considering contributing to the Karafka ecosystem! It's people like you that make the open source community such a great community!
-
-Each pull request must pass all the RSpec specs, integration tests and meet our quality requirements.
-
-Fork it, update and wait for the Github Actions results.
data/bin/stress_many
CHANGED
data/bin/stress_one
CHANGED
data/docker-compose.yml
CHANGED
@@ -33,6 +33,9 @@ services:
       integrations_14_02:2:1,\
       integrations_15_02:2:1,\
       integrations_16_02:2:1,\
+      integrations_17_02:2:1,\
+      integrations_18_02:2:1,\
+      integrations_19_02:2:1,\
       integrations_00_03:3:1,\
       integrations_01_03:3:1,\
       integrations_02_03:3:1,\
@@ -41,6 +44,7 @@ services:
       integrations_01_10:10:1,\
       benchmarks_00_01:1:1,\
       benchmarks_00_05:5:1,\
+      benchmarks_01_05:5:1,\
       benchmarks_00_10:10:1"
     volumes:
       - /var/run/docker.sock:/var/run/docker.sock
data/lib/karafka/base_consumer.rb
CHANGED
@@ -35,9 +35,9 @@ module Karafka
         consume
       end

-
+      coordinator.consumption(self).success!
     rescue StandardError => e
-
+      coordinator.consumption(self).failure!

       Karafka.monitor.instrument(
         'error.occurred',
@@ -47,7 +47,7 @@ module Karafka
       )
     ensure
       # We need to decrease number of jobs that this coordinator coordinates as it has finished
-
+      coordinator.decrement
     end

     # @private
@@ -56,7 +56,7 @@ module Karafka
     def on_after_consume
       return if revoked?

-      if
+      if coordinator.success?
         coordinator.pause_tracker.reset

         # Mark as consumed only if manual offset management is not on
@@ -76,8 +76,6 @@ module Karafka
     def on_revoked
       coordinator.revoke

-      resume
-
       Karafka.monitor.instrument('consumer.revoked', caller: self) do
         revoked
       end
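The `success!`/`failure!` calls above record a per-consumer result on the coordinator, and `coordinator.success?` holds only when every tracked job succeeded. A minimal runnable sketch of that bookkeeping (simplified, hypothetical class names rather than Karafka's exact internals):

```ruby
# Per-consumer result tracking, reduced to the essentials (hypothetical names)
class Result
  def initialize
    @success = true
  end

  def success!
    @success = true
  end

  def failure!
    @success = false
  end

  def success?
    @success
  end
end

class MiniCoordinator
  def initialize
    @mutex = Mutex.new
    @results = {}
  end

  # One result slot per consumer instance working on this topic partition
  def consumption(consumer)
    @mutex.synchronize { @results[consumer] ||= Result.new }
  end

  # The batch counts as successful only when all of its jobs succeeded
  def success?
    @mutex.synchronize { @results.values.all?(&:success?) }
  end
end

coordinator = MiniCoordinator.new
coordinator.consumption(:consumer_a).success!
coordinator.consumption(:consumer_b).failure!
puts coordinator.success? # => false
```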
data/lib/karafka/connection/client.rb
CHANGED
@@ -316,37 +316,40 @@ module Karafka

       time_poll.start

-      @kafka.poll(
+      @kafka.poll(timeout)
     rescue ::Rdkafka::RdkafkaError => e
-
-
-
+      # We return nil, so we do not restart until running the whole loop
+      # This allows us to run revocation jobs and other things and we will pick up new work
+      # next time after dispatching all the things that are needed
+      #
+      # If we would retry here, the client reset would become transparent and we would not have
+      # a chance to take any actions
       case e.code
       when :max_poll_exceeded # -147
         reset
+        return nil
       when :transport # -195
         reset
+        return nil
       when :rebalance_in_progress # -27
         reset
+        return nil
       when :not_coordinator # 16
         reset
+        return nil
       when :network_exception # 13
         reset
+        return nil
       end

-      time_poll.
-
+      raise if time_poll.attempts > MAX_POLL_RETRIES
       raise unless time_poll.retryable?

+      time_poll.checkpoint
       time_poll.backoff

-      #
-
-      # next time after dispatching all the things that are needed
-      #
-      # If we would retry here, the client reset would become transparent and we would not have
-      # a chance to take any actions
-      nil
+      # On unknown errors we do our best to retry and handle them before raising
+      retry
     end

     # Builds a new rdkafka consumer instance based on the subscription group configuration
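The rescue flow above splits errors into two groups: known recoverable rdkafka errors reset the client and now explicitly `return nil`, so the main listener loop can dispatch revocation and other jobs before polling again, while unknown errors back off and retry a bounded number of times before re-raising. A simplified sketch of that control flow, with a hypothetical `tracker` standing in for Karafka's poll time tracker:

```ruby
require 'rdkafka'

MAX_POLL_RETRIES = 20 # assumed value; the real constant lives in the client

# Simplified poll flow (hypothetical helper objects, not Karafka's exact code)
def poll_with_recovery(kafka, tracker, timeout)
  tracker.start
  kafka.poll(timeout)
rescue Rdkafka::RdkafkaError => e
  case e.code
  when :max_poll_exceeded, :transport, :rebalance_in_progress,
       :not_coordinator, :network_exception
    # Known recoverable errors: reset the client and surface nil so the
    # caller finishes its loop (e.g. runs revocation jobs) before re-polling
    reset_client(kafka) # hypothetical: rebuilds the rdkafka consumer
    return nil
  end

  # Unknown errors: bounded backoff-and-retry, then give up and raise
  raise if tracker.attempts > MAX_POLL_RETRIES
  raise unless tracker.retryable?

  tracker.checkpoint
  tracker.backoff
  retry
end
```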
data/lib/karafka/connection/listener.rb
CHANGED
@@ -18,15 +18,18 @@ module Karafka
       # @param jobs_queue [Karafka::Processing::JobsQueue] queue where we should push work
       # @return [Karafka::Connection::Listener] listener instance
       def initialize(subscription_group, jobs_queue)
+        proc_config = ::Karafka::App.config.internal.processing
+
         @id = SecureRandom.uuid
         @subscription_group = subscription_group
         @jobs_queue = jobs_queue
-        @jobs_builder = ::Karafka::App.config.internal.processing.jobs_builder
         @coordinators = Processing::CoordinatorsBuffer.new
         @client = Client.new(@subscription_group)
         @executors = Processing::ExecutorsBuffer.new(@client, subscription_group)
+        @jobs_builder = proc_config.jobs_builder
+        @partitioner = proc_config.partitioner_class.new(subscription_group)
         # We reference scheduler here as it is much faster than fetching this each time
-        @scheduler =
+        @scheduler = proc_config.scheduler
         # We keep one buffer for messages to preserve memory and not allocate extra objects
         # We can do this that way because we always first schedule jobs using messages before we
         # fetch another batch.
@@ -79,10 +82,6 @@ module Karafka
           poll_and_remap_messages
         end

-        # This will ensure, that in the next poll, we continue processing (if we get them back)
-        # partitions that we have paused
-        resume_assigned_partitions
-
         # If there were revoked partitions, we need to wait on their jobs to finish before
         # distributing consuming jobs as upon revoking, we might get assigned to the same
         # partitions, thus getting their jobs. The revoking jobs need to finish before
@@ -159,8 +158,6 @@ module Karafka

         revoked_partitions.each do |topic, partitions|
           partitions.each do |partition|
-            # We revoke the coordinator here, so we do not have to revoke it in the revoke job
-            # itself (this happens prior to scheduling those jobs)
             @coordinators.revoke(topic, partition)

             # There may be a case where we have lost partition of which data we have never
@@ -204,17 +201,6 @@ module Karafka
         )
       end

-      # Revoked partition needs to be resumed if we were processing them earlier. This will do
-      # nothing to things that we are planning to process. Without this, things we get
-      # re-assigned would not be polled.
-      def resume_assigned_partitions
-        @client.rebalance_manager.assigned_partitions.each do |topic, partitions|
-          partitions.each do |partition|
-            @client.resume(topic, partition)
-          end
-        end
-      end
-
       # Takes the messages per topic partition and enqueues processing jobs in threads using
       # given scheduler.
       def build_and_schedule_consumption_jobs
@@ -226,14 +212,16 @@ module Karafka
           coordinator = @coordinators.find_or_create(topic, partition)

           # Start work coordination for this topic partition
-          coordinator.start
+          coordinator.start(messages)

-
-
+          @partitioner.call(topic, messages) do |group_id, partition_messages|
+            # Count the job we're going to create here
+            coordinator.increment

-
+            executor = @executors.find_or_create(topic, partition, group_id)

-
+            jobs << @jobs_builder.consume(executor, partition_messages, coordinator)
+          end
         end

         @scheduler.schedule_consumption(@jobs_queue, jobs)
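With this change the listener no longer builds exactly one job per polled partition; it delegates batch splitting to the partitioner, which yields `(group_id, messages)` pairs, and one executor and job are created per yielded group. A runnable sketch of that contract, using a stand-in for the default pass-through partitioner added later in this diff:

```ruby
# Stand-in for Karafka::Processing::Partitioner: yields the whole batch as a
# single group (group id 0), matching the pre-rc1 one-job-per-partition flow
class PassThroughPartitioner
  def call(_topic, messages)
    yield(0, messages)
  end
end

jobs = []
batch = %w[message_1 message_2 message_3]

PassThroughPartitioner.new.call('events', batch) do |group_id, group_messages|
  # In the real listener, each yielded group gets its own executor and job
  jobs << { group: group_id, messages: group_messages }
end

p jobs # => [{:group=>0, :messages=>["message_1", "message_2", "message_3"]}]
```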
data/lib/karafka/connection/pauses_manager.rb
CHANGED
@@ -25,14 +25,6 @@ module Karafka
        )
      end

-     # Revokes pause tracker for a given topic partition
-     #
-     # @param topic [String] topic name
-     # @param partition [Integer] partition number
-     def revoke(topic, partition)
-       @pauses[topic].delete(partition)
-     end
-
      # Resumes processing of partitions for which pause time has ended.
      #
      # @yieldparam [String] topic name
data/lib/karafka/pro/base_consumer.rb
CHANGED
@@ -26,29 +26,38 @@ module Karafka
      # Pauses processing of a given partition until we're done with the processing
      # This ensures, that we can easily poll not reaching the `max.poll.interval`
      def on_before_consume
-       # Pause at the first message in a batch. That way in case of a crash, we will not loose
-       # any messages
        return unless topic.long_running_job?

-
+       # This ensures, that when running LRJ with VP, things operate as expected
+       coordinator.on_started do |first_group_message|
+         # Pause at the first message in a batch. That way in case of a crash, we will not loose
+         # any messages
+         pause(first_group_message.offset, MAX_PAUSE_TIME)
+       end
      end

      # Runs extra logic after consumption that is related to handling long running jobs
      # @note This overwrites the '#on_after_consume' from the base consumer
      def on_after_consume
-
-
+       coordinator.on_finished do |first_group_message, last_group_message|
+         on_after_consume_regular(first_group_message, last_group_message)
+       end
+     end

-
+     private

+     # Handles the post-consumption flow depending on topic settings
+     #
+     # @param first_message [Karafka::Messages::Message]
+     # @param last_message [Karafka::Messages::Message]
+     def on_after_consume_regular(first_message, last_message)
+       if coordinator.success?
          coordinator.pause_tracker.reset

          # We use the non-blocking one here. If someone needs the blocking one, can implement it
          # with manual offset management
          # Mark as consumed only if manual offset management is not on
-         mark_as_consumed(
-
-         # We check it twice as marking could change this state
-         return if revoked?
+         mark_as_consumed(last_message) unless topic.manual_offset_management? || revoked?

          # If this is not a long running job there is nothing for us to do here
          return unless topic.long_running_job?
@@ -60,12 +69,12 @@ module Karafka
          # interesting (yet valid) corner case, where with manual offset management on and no
          # marking as consumed, we end up with an infinite loop processing same messages over and
          # over again
-         seek(@seek_offset ||
+         seek(@seek_offset || first_message.offset)

          resume
        else
          # If processing failed, we need to pause
-         pause(@seek_offset ||
+         pause(@seek_offset || first_message.offset)
        end
      end
    end
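Combined with the coordinator callbacks, the LRJ flow above becomes: pause the partition exactly once before the first job of a batch starts (so the listener can keep polling without exceeding `max.poll.interval.ms`), then seek and resume exactly once after the last job finishes. A schematic sketch with hypothetical stand-ins for the client calls:

```ruby
# Schematic LRJ flow (hypothetical stand-ins, not Karafka's exact API)
class LrjFlow
  def initialize(coordinator, client)
    @coordinator = coordinator
    @client = client
  end

  # Before any job of the batch: pause once, at the first offset, so a crash
  # cannot skip messages - on resume we would re-fetch from the batch start
  def before_consume(max_pause_ms)
    @coordinator.on_started do |first_message|
      @client.pause(first_message.offset, max_pause_ms)
    end
  end

  # After each job; the block fires only once, when the whole batch is done
  def after_consume
    @coordinator.on_finished do |first_message, _last_message|
      @client.seek(first_message.offset) # or the last marked offset, if any
      @client.resume
    end
  end
end
```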
data/lib/karafka/pro/loader.rb
CHANGED
@@ -21,6 +21,7 @@ module Karafka
        processing/jobs/consume_non_blocking
        processing/jobs_builder
        processing/coordinator
+       processing/partitioner
        routing/extensions
        active_job/consumer
        active_job/dispatcher
@@ -39,6 +40,7 @@ module Karafka
        icfg = config.internal

        icfg.processing.coordinator_class = Processing::Coordinator
+       icfg.processing.partitioner_class = Processing::Partitioner
        icfg.processing.scheduler = Processing::Scheduler.new
        icfg.processing.jobs_builder = Processing::JobsBuilder.new
data/lib/karafka/pro/processing/coordinator.rb
CHANGED
@@ -6,6 +6,57 @@ module Karafka
      # Pro coordinator that provides extra orchestration methods useful for parallel processing
      # within the same partition
      class Coordinator < ::Karafka::Processing::Coordinator
+       # @param args [Object] anything the base coordinator accepts
+       def initialize(*args)
+         super
+         @on_started_invoked = false
+         @on_finished_invoked = false
+         @flow_lock = Mutex.new
+       end
+
+       # Starts the coordination process
+       # @param messages [Array<Karafka::Messages::Message>] messages for which processing we are
+       #   going to coordinate.
+       def start(messages)
+         super
+
+         @mutex.synchronize do
+           @on_started_invoked = false
+           @on_finished_invoked = false
+           @first_message = messages.first
+           @last_message = messages.last
+         end
+       end
+
+       # @return [Boolean] is the coordinated work finished or not
+       def finished?
+         @running_jobs.zero?
+       end
+
+       # Runs given code only once per all the coordinated jobs upon starting first of them
+       def on_started
+         @flow_lock.synchronize do
+           return if @on_started_invoked
+
+           @on_started_invoked = true
+
+           yield(@first_message, @last_message)
+         end
+       end
+
+       # Runs once when all the work that is suppose to be coordinated is finished
+       # It runs once per all the coordinated jobs and should be used to run any type of post
+       # jobs coordination processing execution
+       def on_finished
+         @flow_lock.synchronize do
+           return unless finished?
+           return if @on_finished_invoked
+
+           @on_finished_invoked = true
+
+           yield(@first_message, @last_message)
+         end
+       end
      end
    end
  end
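The `on_started`/`on_finished` pair gives once-per-batch semantics even when many virtual-partition jobs share one coordinator: the first job to start fires `on_started`, and only a job that sees the running counter at zero fires `on_finished`. A compact runnable sketch of the idea (simplified names, not the exact Pro implementation):

```ruby
# Once-per-batch callbacks across parallel jobs (simplified sketch)
class FlowCoordinator
  def initialize
    @lock = Mutex.new
    @running_jobs = 0
    @started = false
    @finished = false
  end

  def increment
    @lock.synchronize { @running_jobs += 1 }
  end

  def decrement
    @lock.synchronize { @running_jobs -= 1 }
  end

  # Fires the block only for the first job that starts
  def on_started
    @lock.synchronize do
      next if @started

      @started = true
      yield
    end
  end

  # Fires the block only once, and only when no jobs are running anymore
  def on_finished
    @lock.synchronize do
      next if @running_jobs.positive? || @finished

      @finished = true
      yield
    end
  end
end

coordinator = FlowCoordinator.new
5.times { coordinator.increment }

threads = Array.new(5) do
  Thread.new do
    coordinator.on_started { puts 'batch started (prints once)' }
    sleep(rand / 100.0) # simulated work
    coordinator.decrement
    coordinator.on_finished { puts 'batch finished (prints once)' }
  end
end

threads.each(&:join)
```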
data/lib/karafka/pro/processing/partitioner.rb
ADDED
@@ -0,0 +1,41 @@
+# frozen_string_literal: true
+
+# This Karafka component is a Pro component.
+# All of the commercial components are present in the lib/karafka/pro directory of this
+# repository and their usage requires commercial license agreement.
+#
+# Karafka has also commercial-friendly license, commercial support and commercial components.
+#
+# By sending a pull request to the pro components, you are agreeing to transfer the copyright of
+# your code to Maciej Mensfeld.
+
+module Karafka
+  module Pro
+    module Processing
+      # Pro partitioner that can distribute work based on the virtual partitioner settings
+      class Partitioner < ::Karafka::Processing::Partitioner
+        # @param topic [String] topic name
+        # @param messages [Array<Karafka::Messages::Message>] karafka messages
+        # @yieldparam [Integer] group id
+        # @yieldparam [Array<Karafka::Messages::Message>] karafka messages
+        def call(topic, messages)
+          ktopic = @subscription_group.topics.find(topic)
+
+          @concurrency ||= ::Karafka::App.config.concurrency
+
+          # We only partition work if we have a virtual partitioner and more than one thread to
+          # process the data. With one thread it is not worth partitioning the work as the work
+          # itself will be assigned to one thread (pointless work)
+          if ktopic.virtual_partitioner? && @concurrency > 1
+            messages
+              .group_by { |msg| ktopic.virtual_partitioner.call(msg).hash.abs % @concurrency }
+              .each { |group_id, messages_group| yield(group_id, messages_group) }
+          else
+            # When no virtual partitioner, works as regular one
+            yield(0, messages)
+          end
+        end
+      end
+    end
+  end
+end
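The grouping rule above is plain Ruby: the virtual partitioner's result is hashed and taken modulo the configured concurrency, so messages with equal keys always land in the same group (preserving per-key ordering) while different keys can spread across threads. A runnable illustration using plain hashes as stand-ins for Karafka message objects:

```ruby
# Virtual partitioning illustrated with plain Ruby (hypothetical data)
concurrency = 5

messages = [
  { order_id: 'a', event: 'created' },
  { order_id: 'b', event: 'created' },
  { order_id: 'a', event: 'paid' }
]

# Stand-in for the per-topic virtual partitioner lambda
virtual_partitioner = ->(message) { message[:order_id] }

groups = messages.group_by do |message|
  virtual_partitioner.call(message).hash.abs % concurrency
end

groups.each do |group_id, group_messages|
  puts "group #{group_id}: #{group_messages.map { |m| m[:event] }.join(', ')}"
end
# Both 'a' events always share a group, so their relative order is kept;
# 'b' may land in a different group and be processed in parallel.
```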
data/lib/karafka/pro/routing/extensions.rb
CHANGED
@@ -19,9 +19,15 @@ module Karafka
        # @param base [Class] class we extend
        def included(base)
          base.attr_accessor :long_running_job
+         base.attr_accessor :virtual_partitioner
        end
      end

+     # @return [Boolean] true if virtual partitioner is defined, false otherwise
+     def virtual_partitioner?
+       virtual_partitioner != nil
+     end
+
      # @return [Boolean] is a given job on a topic a long running one
      def long_running_job?
        @long_running_job || false
data/lib/karafka/processing/coordinator.rb
CHANGED
@@ -23,7 +23,9 @@ module Karafka
      end

      # Starts the coordinator for given consumption jobs
-
+     # @param _messages [Array<Karafka::Messages::Message>] batch of message for which we are
+     #   going to coordinate work. Not used with regular coordinator.
+     def start(_messages)
        @mutex.synchronize do
          @running_jobs = 0
          # We need to clear the consumption results hash here, otherwise we could end up storing
@@ -44,7 +46,9 @@ module Karafka

        return @running_jobs unless @running_jobs.negative?

-
+       # This should never happen. If it does, something is heavily out of sync. Please reach
+       # out to us if you encounter this
+       raise Karafka::Errors::InvalidCoordinatorState, 'Was zero before decrementation'
      end
    end

data/lib/karafka/processing/coordinators_buffer.rb
CHANGED
@@ -2,7 +2,7 @@

 module Karafka
   module Processing
-    #
+    # Coordinators builder used to build coordinators per topic partition
     #
     # It provides direct pauses access for revocation
     #
@@ -34,17 +34,13 @@ module Karafka
      # @param topic [String] topic name
      # @param partition [Integer] partition number
      def revoke(topic, partition)
-       @
+       return unless @coordinators[topic].key?(partition)

        # The fact that we delete here does not change the fact that the executor still holds the
        # reference to this coordinator. We delete it here, as we will no longer process any
        # new stuff with it and we may need a new coordinator if we regain this partition, but the
        # coordinator may still be in use
-
-
-       return unless coordinator
-
-       coordinator.revoke
+       @coordinators[topic].delete(partition).revoke
      end

      # Clears coordinators and re-created the pauses manager
data/lib/karafka/processing/partitioner.rb
ADDED
@@ -0,0 +1,22 @@
+# frozen_string_literal: true
+
+module Karafka
+  module Processing
+    # Basic partitioner for work division
+    # It does not divide any work.
+    class Partitioner
+      # @param subscription_group [Karafka::Routing::SubscriptionGroup] subscription group
+      def initialize(subscription_group)
+        @subscription_group = subscription_group
+      end
+
+      # @param _topic [String] topic name
+      # @param messages [Array<Karafka::Messages::Message>] karafka messages
+      # @yieldparam [Integer] group id
+      # @yieldparam [Array<Karafka::Messages::Message>] karafka messages
+      def call(_topic, messages)
+        yield(0, messages)
+      end
+    end
+  end
+end
data/lib/karafka/setup/config.rb
CHANGED
@@ -107,6 +107,8 @@ module Karafka
        setting :jobs_builder, default: Processing::JobsBuilder.new
        # option coordinator [Class] work coordinator we want to user for processing coordination
        setting :coordinator_class, default: Processing::Coordinator
+       # option partitioner_class [Class] partitioner we use against a batch of data
+       setting :partitioner_class, default: Processing::Partitioner
      end

      # Karafka components for ActiveJob
data/lib/karafka/version.rb
CHANGED
data.tar.gz.sig
CHANGED
Binary file
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: karafka
 version: !ruby/object:Gem::Version
-  version: 2.0.0.beta5
+  version: 2.0.0.rc1
 platform: ruby
 authors:
 - Maciej Mensfeld
@@ -34,7 +34,7 @@ cert_chain:
   R2P11bWoCtr70BsccVrN8jEhzwXngMyI2gVt750Y+dbTu1KgRqZKp/ECe7ZzPzXj
   pIy9vHxTANKYVyI4qj8OrFdEM5BQNu8oQpL0iQ==
   -----END CERTIFICATE-----
-date: 2022-07-
+date: 2022-07-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: dry-configurable
@@ -240,6 +240,7 @@ files:
 - lib/karafka/pro/processing/coordinator.rb
 - lib/karafka/pro/processing/jobs/consume_non_blocking.rb
 - lib/karafka/pro/processing/jobs_builder.rb
+- lib/karafka/pro/processing/partitioner.rb
 - lib/karafka/pro/processing/scheduler.rb
 - lib/karafka/pro/routing/extensions.rb
 - lib/karafka/process.rb
@@ -253,6 +254,7 @@ files:
 - lib/karafka/processing/jobs/shutdown.rb
 - lib/karafka/processing/jobs_builder.rb
 - lib/karafka/processing/jobs_queue.rb
+- lib/karafka/processing/partitioner.rb
 - lib/karafka/processing/result.rb
 - lib/karafka/processing/scheduler.rb
 - lib/karafka/processing/worker.rb
metadata.gz.sig
CHANGED
Binary file