karafka 2.0.0.beta4 → 2.0.0.beta5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. checksums.yaml +4 -4
  2. checksums.yaml.gz.sig +0 -0
  3. data/.github/workflows/ci.yml +18 -1
  4. data/CHANGELOG.md +15 -0
  5. data/Gemfile.lock +1 -1
  6. data/bin/benchmarks +2 -2
  7. data/bin/integrations +10 -3
  8. data/bin/{stress → stress_many} +0 -0
  9. data/bin/stress_one +13 -0
  10. data/docker-compose.yml +23 -18
  11. data/lib/karafka/active_job/routing/extensions.rb +1 -1
  12. data/lib/karafka/app.rb +2 -1
  13. data/lib/karafka/base_consumer.rb +26 -19
  14. data/lib/karafka/connection/client.rb +24 -4
  15. data/lib/karafka/connection/listener.rb +49 -11
  16. data/lib/karafka/connection/pauses_manager.rb +8 -0
  17. data/lib/karafka/connection/rebalance_manager.rb +20 -19
  18. data/lib/karafka/contracts/config.rb +17 -4
  19. data/lib/karafka/contracts/server_cli_options.rb +1 -1
  20. data/lib/karafka/errors.rb +3 -0
  21. data/lib/karafka/pro/active_job/consumer.rb +1 -8
  22. data/lib/karafka/pro/base_consumer.rb +10 -13
  23. data/lib/karafka/pro/loader.rb +11 -6
  24. data/lib/karafka/pro/processing/coordinator.rb +12 -0
  25. data/lib/karafka/pro/processing/jobs_builder.rb +3 -2
  26. data/lib/karafka/pro/processing/scheduler.rb +56 -0
  27. data/lib/karafka/processing/coordinator.rb +84 -0
  28. data/lib/karafka/processing/coordinators_buffer.rb +58 -0
  29. data/lib/karafka/processing/executor.rb +6 -16
  30. data/lib/karafka/processing/executors_buffer.rb +46 -15
  31. data/lib/karafka/processing/jobs/consume.rb +4 -2
  32. data/lib/karafka/processing/jobs_builder.rb +3 -2
  33. data/lib/karafka/processing/result.rb +0 -5
  34. data/lib/karafka/processing/scheduler.rb +22 -0
  35. data/lib/karafka/routing/consumer_group.rb +1 -1
  36. data/lib/karafka/routing/topic.rb +9 -0
  37. data/lib/karafka/setup/config.rb +18 -10
  38. data/lib/karafka/version.rb +1 -1
  39. data.tar.gz.sig +0 -0
  40. metadata +9 -5
  41. metadata.gz.sig +4 -1
  42. data/lib/karafka/pro/scheduler.rb +0 -54
  43. data/lib/karafka/scheduler.rb +0 -20
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: e4e9430d2278617cbed38f5696011603d9c0d8c53813dfc180499dc6e4b97563
-  data.tar.gz: f082a95aa9841912f819dc0598591c4b96d7ef1199eff324e65ca0c601008dae
+  metadata.gz: 2c8e680ffdf69f88899a715c84cc484e8f568f4a93da9284195f4bf55a283ee1
+  data.tar.gz: 974356226a10ba2c77de770351a47180716533021a89040bcdc1aae57f452121
 SHA512:
-  metadata.gz: 7252c5503234ab4d35fa02d2bb0a18dd8239584fdddc5b451cfdf028a61f37d59a269bac804913d0abf46e2d3273188560e48aa9de40fbb319c766624c1a3b95
-  data.tar.gz: a4cc5d7c18d2a45483ee26acbacf62c9c13f8824697af96a3f2bf5bccb232d5b07097ed49cfb84a9b46e09f31405813d50b1564d6668f0a483023f449427428b
+  metadata.gz: 2427aaae1b1b07430df7c9f042d290bbae8380fb1f6ec7c26eecee92b8fe79e13ea9f3a99a36bf89b314ffba809c556618b22c0a87f0c0c83bb73cf8af72321b
+  data.tar.gz: 55e18448b5645acd38c4194967ea7df657c142d82a105699f7b204f222f8dfb2dbd14cce82b1f424ec177afb78049b3e7588642013674a3c2923a8848b6b87e7
checksums.yaml.gz.sig CHANGED
Binary file
data/.github/workflows/ci.yml CHANGED
@@ -8,6 +8,10 @@ on:
   schedule:
     - cron: '0 1 * * *'
 
+env:
+  BUNDLE_RETRY: 6
+  BUNDLE_JOBS: 4
+
 jobs:
   diffend:
     runs-on: ubuntu-latest
@@ -17,13 +21,16 @@ jobs:
       - uses: actions/checkout@v2
         with:
           fetch-depth: 0
+
       - name: Set up Ruby
         uses: ruby/setup-ruby@v1
         with:
           ruby-version: 3.1
           bundler-cache: true
+
       - name: Install Diffend plugin
         run: bundle plugin install diffend
+
       - name: Bundle Secure
         run: bundle secure
 
@@ -101,7 +108,17 @@ jobs:
         uses: ruby/setup-ruby@v1
         with:
           ruby-version: ${{matrix.ruby}}
-          bundler-cache: true
+
+      - name: Install latest Bundler
+        run: |
+          gem install bundler --no-document
+          gem update --system --no-document
+          bundle config set without 'tools benchmarks docs'
+
+      - name: Bundle install
+        run: |
+          bundle config set without development
+          bundle install
 
       - name: Ensure all needed Kafka topics are created and wait if not
         run: |
data/CHANGELOG.md CHANGED
@@ -1,5 +1,20 @@
 # Karafka framework changelog
 
+## 2.0.0-beta5 (2022-07-05)
+- Always resume processing of a revoked partition upon assignment.
+- Improve specs stability.
+- Fix a case where the revocation job would be executed on a partition for which we never did any work.
+- Introduce a jobs group coordinator for easier jobs management.
+- Improve stability of resuming paused partitions that were revoked and re-assigned.
+- Optimize reaction time on partition ownership changes.
+- Fix a bug where, despite setting a long max wait time, we would return messages before reaching it while not reaching the desired max messages count.
+- Add more integration specs related to polling limits.
+- Remove auto-detection of re-assigned partitions upon rebalance, as for too fast rebalances it could not be accurate enough. It would also misbehave for rebalances happening right after a `#seek` was issued for a partition.
+- Optimize the removal of pre-buffered lost partitions data.
+- Always run `#revoked` when a rebalance with revocation happens.
+- Evict executors upon rebalance, to prevent race-conditions.
+- Align topic names for integration specs.
+
 ## 2.0.0-beta4 (2022-06-20)
 - Rename job internal api methods from `#prepare` to `#before_call` and from `#teardown` to `#after_call` to abstract away jobs execution from any type of executors and consumers logic
 - Remove ability of running `before_consume` and `after_consume` completely. Those should be for internal usage only.
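The "jobs group coordinator" mentioned above reappears across the diffs below (`data/lib/karafka/processing/coordinator.rb` is one of the new files). As a rough mental model only, and not Karafka's actual implementation, a coordinator can be thought of as a per topic partition object that counts in-flight jobs and remembers failures and revocations:

```ruby
# Illustrative-only sketch of the coordinator concept; class and method names
# are assumptions, not the contents of karafka's processing/coordinator.rb.
class SketchCoordinator
  def initialize
    @mutex = Mutex.new
    @running_jobs = 0
    @failed = false
    @revoked = false
  end

  # One increment per job scheduled for the current batch of this topic partition
  def increment
    @mutex.synchronize { @running_jobs += 1 }
  end

  # Called when a job finishes, regardless of its outcome
  def decrement
    @mutex.synchronize { @running_jobs -= 1 }
  end

  def failure!
    @mutex.synchronize { @failed = true }
  end

  # The batch counts as successful only when nothing is running and nothing failed
  def success?
    @mutex.synchronize { @running_jobs.zero? && !@failed }
  end

  # Marks the partition as lost so consumers can check it via #revoked?
  def revoke
    @mutex.synchronize { @revoked = true }
  end

  def revoked?
    @mutex.synchronize { @revoked }
  end
end
```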
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
 PATH
   remote: .
   specs:
-    karafka (2.0.0.beta4)
+    karafka (2.0.0.beta5)
       dry-configurable (~> 0.13)
       dry-monitor (~> 0.5)
       dry-validation (~> 1.7)
data/bin/benchmarks CHANGED
@@ -39,8 +39,8 @@ if ENV['SEED']
 
 # We do not populate data of benchmarks_0_10 as we use it with life-stream data only
 %w[
-  benchmarks_0_01
-  benchmarks_0_05
+  benchmarks_00_01
+  benchmarks_00_05
 ].each do |topic_name|
   partitions_count = topic_name.split('_').last.to_i
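The rename above matters because, as the context line shows, the benchmark helper derives the partition count from the topic name suffix; the zero-padded names keep that parse intact:

```ruby
# Partition count is taken from the last underscore-separated segment
'benchmarks_00_05'.split('_').last.to_i # => 5
'benchmarks_00_10'.split('_').last.to_i # => 10
```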
 
data/bin/integrations CHANGED
@@ -21,6 +21,9 @@ ROOT_PATH = Pathname.new(File.expand_path(File.join(File.dirname(__FILE__), '../
 # of CPU
 CONCURRENCY = ENV.key?('CI') ? 5 : Etc.nprocessors * 2
 
+# How many bytes do we want to keep from the stdout in the buffer for when we need to print it
+MAX_BUFFER_OUTPUT = 10_240
+
 # Abstraction around a single test scenario execution process
 class Scenario
   # How long a scenario can run before we kill it
@@ -84,9 +87,9 @@ class Scenario
     # We read it so it won't grow as we use our default logger that prints to both test.log and
     # to stdout. Otherwise after reaching the buffer size, it would hang
     buffer = ''
-    @stdout.read_nonblock(10_240, buffer, exception: false)
+    @stdout.read_nonblock(MAX_BUFFER_OUTPUT, buffer, exception: false)
     @stdout_tail << buffer
-    @stdout_tail = @stdout_tail[-10_024..-1] || @stdout_tail
+    @stdout_tail = @stdout_tail[-MAX_BUFFER_OUTPUT..-1] || @stdout_tail
 
     !@wait_thr.alive?
   end
@@ -114,11 +117,15 @@ class Scenario
     if success?
       print "\e[#{32}m#{'.'}\e[0m"
     else
+      buffer = ''
+
+      @stderr.read_nonblock(MAX_BUFFER_OUTPUT, buffer, exception: false)
+
      puts
      puts "\e[#{31}m#{'[FAILED]'}\e[0m #{name}"
      puts "Exit code: #{exit_code}"
      puts @stdout_tail
-      puts @stderr.read
+      puts buffer
      puts
    end
  end
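For readers unfamiliar with the pattern used here, the change boils down to draining a child process' output without blocking and keeping only a bounded tail. A standalone sketch of that idea (the spawned command is just a stand-in):

```ruby
require 'open3'

MAX_BUFFER_OUTPUT = 10_240

stdin, stdout, wait_thr = Open3.popen2('ruby -e "10.times { puts :tick; sleep 0.1 }"')
stdin.close

tail = +''

# Drain whatever is available so the pipe never fills up, but cap the kept tail
until wait_thr.join(0.2)
  chunk = +''
  stdout.read_nonblock(MAX_BUFFER_OUTPUT, chunk, exception: false)
  tail << chunk
  tail = tail[-MAX_BUFFER_OUTPUT..] || tail
end

# Pick up anything written right before the process exited
tail << stdout.read.to_s
puts(tail[-MAX_BUFFER_OUTPUT..] || tail)
```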
data/bin/{stress → stress_many} RENAMED
File without changes
data/bin/stress_one ADDED
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+# Runs a single integration spec in an endless loop
+# This allows us to ensure (after long enough time) that the integration spec is stable and
+# that there are no anomalies when running it for a long period of time
+
+set -e
+
+while :
+do
+  reset
+  bin/scenario $1
+done
data/docker-compose.yml CHANGED
@@ -16,26 +16,31 @@ services:
       KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
       KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
       KAFKA_CREATE_TOPICS:
-        "integrations_0_02:2:1,\
-        integrations_1_02:2:1,\
-        integrations_2_02:2:1,\
-        integrations_3_02:2:1,\
-        integrations_4_02:2:1,\
-        integrations_5_02:2:1,\
-        integrations_6_02:2:1,\
-        integrations_7_02:2:1,\
-        integrations_8_02:2:1,\
-        integrations_9_02:2:1,\
+        "integrations_00_02:2:1,\
+        integrations_01_02:2:1,\
+        integrations_02_02:2:1,\
+        integrations_03_02:2:1,\
+        integrations_04_02:2:1,\
+        integrations_05_02:2:1,\
+        integrations_06_02:2:1,\
+        integrations_07_02:2:1,\
+        integrations_08_02:2:1,\
+        integrations_09_02:2:1,\
         integrations_10_02:2:1,\
         integrations_11_02:2:1,\
         integrations_12_02:2:1,\
-        integrations_0_03:3:1,\
-        integrations_1_03:3:1,\
-        integrations_2_03:3:1,\
-        integrations_0_10:10:1,\
-        integrations_1_10:10:1,\
-        benchmarks_0_01:1:1,\
-        benchmarks_0_05:5:1,\
-        benchmarks_0_10:10:1"
+        integrations_13_02:2:1,\
+        integrations_14_02:2:1,\
+        integrations_15_02:2:1,\
+        integrations_16_02:2:1,\
+        integrations_00_03:3:1,\
+        integrations_01_03:3:1,\
+        integrations_02_03:3:1,\
+        integrations_03_03:3:1,\
+        integrations_00_10:10:1,\
+        integrations_01_10:10:1,\
+        benchmarks_00_01:1:1,\
+        benchmarks_00_05:5:1,\
+        benchmarks_00_10:10:1"
     volumes:
       - /var/run/docker.sock:/var/run/docker.sock
data/lib/karafka/active_job/routing/extensions.rb CHANGED
@@ -13,7 +13,7 @@ module Karafka
         # @param block [Proc] block that we can use for some extra configuration
         def active_job_topic(name, &block)
           topic(name) do
-            consumer App.config.internal.active_job.consumer
+            consumer App.config.internal.active_job.consumer_class
 
             next unless block
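Only the internal setting name changes here; the routing DSL that uses it stays the same. A minimal routing sketch under assumed defaults (the app class, client id, broker address and topic name are illustrative):

```ruby
# frozen_string_literal: true

require 'karafka'

class KarafkaApp < Karafka::App
  setup do |config|
    config.client_id = 'example_app'
    config.kafka = { 'bootstrap.servers': 'localhost:9092' }
  end

  routes.draw do
    # Internally resolves its consumer via config.internal.active_job.consumer_class
    active_job_topic :default
  end
end
```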
 
data/lib/karafka/app.rb CHANGED
@@ -10,7 +10,8 @@ module Karafka
       def consumer_groups
         config
           .internal
-          .routing_builder
+          .routing
+          .builder
       end
 
       # @return [Array<Karafka::Routing::SubscriptionGroup>] active subscription groups
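The same reshuffle of flat internal settings into nested namespaces appears in several files of this release. Collected from the hunks in this diff (evaluating these lines requires a booted Karafka app; the exact `setup/config.rb` layout is not shown here and is an assumption):

```ruby
Karafka::App.config.internal.routing.builder            # previously internal.routing_builder
Karafka::App.config.internal.processing.scheduler       # previously internal.scheduler
Karafka::App.config.internal.processing.jobs_builder    # previously internal.jobs_builder
Karafka::App.config.internal.active_job.consumer_class  # previously internal.active_job.consumer
```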
data/lib/karafka/base_consumer.rb CHANGED
@@ -10,17 +10,11 @@ module Karafka
     attr_accessor :messages
     # @return [Karafka::Connection::Client] kafka connection client
     attr_accessor :client
-    # @return [Karafka::TimeTrackers::Pause] current topic partition pause tracker
-    attr_accessor :pause_tracker
+    # @return [Karafka::Processing::Coordinator] coordinator
+    attr_accessor :coordinator
     # @return [Waterdrop::Producer] producer instance
     attr_accessor :producer
 
-    def initialize
-      # We re-use one to save on object allocation
-      # It also allows us to transfer the consumption notion to another batch
-      @consumption = Processing::Result.new
-    end
-
     # Can be used to run preparation code
     #
     # @private
@@ -41,9 +35,9 @@ module Karafka
        consume
      end
 
-      @consumption.success!
+      @coordinator.consumption(self).success!
    rescue StandardError => e
-      @consumption.failure!
+      @coordinator.consumption(self).failure!
 
      Karafka.monitor.instrument(
        'error.occurred',
@@ -51,14 +45,19 @@ module Karafka
        caller: self,
        type: 'consumer.consume.error'
      )
+    ensure
+      # We need to decrease number of jobs that this coordinator coordinates as it has finished
+      @coordinator.decrement
    end
 
    # @private
    # @note This should not be used by the end users as it is part of the lifecycle of things but
    #   not as part of the public api.
    def on_after_consume
-      if @consumption.success?
-        pause_tracker.reset
+      return if revoked?
+
+      if @coordinator.success?
+        coordinator.pause_tracker.reset
 
        # Mark as consumed only if manual offset management is not on
        return if topic.manual_offset_management?
@@ -75,6 +74,10 @@ module Karafka
    #
    # @private
    def on_revoked
+      coordinator.revoke
+
+      resume
+
      Karafka.monitor.instrument('consumer.revoked', caller: self) do
        revoked
      end
@@ -132,9 +135,11 @@ module Karafka
    # processed but rather at the next one. This applies to both sync and async versions of this
    # method.
    def mark_as_consumed(message)
-      @revoked = !client.mark_as_consumed(message)
+      unless client.mark_as_consumed(message)
+        coordinator.revoke
 
-      return false if revoked?
+        return false
+      end
 
      @seek_offset = message.offset + 1
 
@@ -147,9 +152,11 @@ module Karafka
    # @return [Boolean] true if we were able to mark the offset, false otherwise. False indicates
    #   that we were not able and that we have lost the partition.
    def mark_as_consumed!(message)
-      @revoked = !client.mark_as_consumed!(message)
+      unless client.mark_as_consumed!(message)
+        coordinator.revoke
 
-      return false if revoked?
+        return false
+      end
 
      @seek_offset = message.offset + 1
 
@@ -163,7 +170,7 @@ module Karafka
    # @param timeout [Integer, nil] how long in milliseconds do we want to pause or nil to use the
    #   default exponential pausing strategy defined for retries
    def pause(offset, timeout = nil)
-      timeout ? pause_tracker.pause(timeout) : pause_tracker.pause
+      timeout ? coordinator.pause_tracker.pause(timeout) : coordinator.pause_tracker.pause
 
      client.pause(
        messages.metadata.topic,
@@ -176,7 +183,7 @@ module Karafka
    def resume
      # This is sufficient to expire a partition pause, as with it will be resumed by the listener
      # thread before the next poll.
-      pause_tracker.expire
+      coordinator.pause_tracker.expire
    end
 
    # Seeks in the context of current topic and partition
@@ -196,7 +203,7 @@ module Karafka
    # @note We know that partition got revoked because when we try to mark message as consumed,
    #   unless if is successful, it will return false
    def revoked?
-      @revoked || false
+      coordinator.revoked?
    end
  end
 end
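With the coordinator in place, losing a partition surfaces to user code through `mark_as_consumed` returning `false` and `revoked?` flipping to `true`. A minimal consumer sketch of that pattern (class, topic and helper names are illustrative, not from this release):

```ruby
class OrdersConsumer < Karafka::BaseConsumer
  def consume
    messages.each do |message|
      store(message.payload)

      # If marking fails, the partition was taken away from this process; stop early,
      # the new owner will pick up the remaining messages.
      return unless mark_as_consumed(message)
    end
  end

  private

  # Stand-in for real persistence logic
  def store(payload)
    Karafka.logger.info("stored: #{payload.inspect}")
  end
end
```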
data/lib/karafka/connection/client.rb CHANGED
@@ -36,6 +36,12 @@ module Karafka
        # Marks if we need to offset. If we did not store offsets, we should not commit the offset
        #   position as it will crash rdkafka
        @offsetting = false
+        # We need to keep track of what we have paused for resuming
+        # In case we lose a partition, we still need to resume it, otherwise it won't be fetched
+        # again if we get reassigned to it later on. We need to keep them as after revocation we
+        # no longer may be able to fetch them from Kafka. We could build them but it is easier
+        # to just keep them here and use if needed when cannot be obtained
+        @paused_tpls = Hash.new { |h, k| h[k] = {} }
      end
 
      # Fetches messages within boundaries defined by the settings (time, size, topics, etc).
@@ -45,12 +51,13 @@ module Karafka
      # @note This method should not be executed from many threads at the same time
      def batch_poll
        time_poll = TimeTrackers::Poll.new(@subscription_group.max_wait_time)
-        time_poll.start
 
        @buffer.clear
        @rebalance_manager.clear
 
        loop do
+          time_poll.start
+
          # Don't fetch more messages if we do not have any time left
          break if time_poll.exceeded?
          # Don't fetch more messages if we've fetched max as we've wanted
@@ -69,7 +76,11 @@ module Karafka
          # If partition revocation happens, we need to remove messages from revoked partitions
          # as well as ensure we do not have duplicated due to the offset reset for partitions
          # that we got assigned
-          remove_revoked_and_duplicated_messages if @rebalance_manager.revoked_partitions?
+          # We also do early break, so the information about rebalance is used as soon as possible
+          if @rebalance_manager.changed?
+            remove_revoked_and_duplicated_messages
+            break
+          end
 
          # Finally once we've (potentially) removed revoked, etc, if no messages were returned
          # we can break.
@@ -144,10 +155,14 @@ module Karafka
 
        internal_commit_offsets(async: false)
 
+        # Here we do not use our cached tpls because we should not try to pause something we do
+        # not own anymore.
        tpl = topic_partition_list(topic, partition)
 
        return unless tpl
 
+        @paused_tpls[topic][partition] = tpl
+
        @kafka.pause(tpl)
 
        @kafka.seek(pause_msg)
@@ -169,9 +184,13 @@ module Karafka
        # We can skip performance penalty since resuming should not happen too often
        internal_commit_offsets(async: false)
 
-        tpl = topic_partition_list(topic, partition)
+        # If we were not able, let's try to reuse the one we have (if we have)
+        tpl = topic_partition_list(topic, partition) || @paused_tpls[topic][partition]
 
        return unless tpl
+        # If we did not have it, it means we never paused this partition, thus no resume should
+        #   happen in the first place
+        return unless @paused_tpls[topic].delete(partition)
 
        @kafka.resume(tpl)
      ensure
@@ -214,6 +233,7 @@ module Karafka
        @mutex.synchronize do
          @closed = false
          @offsetting = false
+          @paused_tpls.clear
          @kafka = build_consumer
        end
      end
@@ -369,7 +389,7 @@ module Karafka
      # we are no longer responsible in a given process for processing those messages and they
      # should have been picked up by a different process.
      def remove_revoked_and_duplicated_messages
-        @rebalance_manager.revoked_partitions.each do |topic, partitions|
+        @rebalance_manager.lost_partitions.each do |topic, partitions|
          partitions.each do |partition|
            @buffer.delete(topic, partition)
          end
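The `@paused_tpls` bookkeeping above can be summarized in plain Ruby: cache whatever was used to pause, fall back to it on resume, and skip resuming partitions that were never paused. An illustrative sketch only (the hash literal stands in for an `Rdkafka::Consumer::TopicPartitionList`):

```ruby
paused_tpls = Hash.new { |hash, topic| hash[topic] = {} }

# Pretend the live lookup no longer knows this partition after a rebalance
live_lookup = ->(_topic, _partition) { nil }

# Pausing: remember the topic partition list we paused with
paused_tpls['events'][0] = { 'events' => [0] }

# Resuming: prefer the live lookup, fall back to the cached entry
tpl = live_lookup.call('events', 0) || paused_tpls['events'][0]

# Only resume when this partition was actually paused by us before
puts "resume #{tpl.inspect}" if tpl && paused_tpls['events'].delete(0)
```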
data/lib/karafka/connection/listener.rb CHANGED
@@ -21,12 +21,12 @@ module Karafka
        @id = SecureRandom.uuid
        @subscription_group = subscription_group
        @jobs_queue = jobs_queue
-        @jobs_builder = ::Karafka::App.config.internal.jobs_builder
-        @pauses_manager = PausesManager.new
+        @jobs_builder = ::Karafka::App.config.internal.processing.jobs_builder
+        @coordinators = Processing::CoordinatorsBuffer.new
        @client = Client.new(@subscription_group)
        @executors = Processing::ExecutorsBuffer.new(@client, subscription_group)
        # We reference scheduler here as it is much faster than fetching this each time
-        @scheduler = ::Karafka::App.config.internal.scheduler
+        @scheduler = ::Karafka::App.config.internal.processing.scheduler
        # We keep one buffer for messages to preserve memory and not allocate extra objects
        # We can do this that way because we always first schedule jobs using messages before we
        # fetch another batch.
@@ -79,6 +79,10 @@ module Karafka
          poll_and_remap_messages
        end
 
+        # This will ensure that, in the next poll, we continue processing (if we get them back)
+        # partitions that we have paused
+        resume_assigned_partitions
+
        # If there were revoked partitions, we need to wait on their jobs to finish before
        # distributing consuming jobs as upon revoking, we might get assigned to the same
        # partitions, thus getting their jobs. The revoking jobs need to finish before
@@ -86,6 +90,9 @@ module Karafka
        build_and_schedule_revoke_lost_partitions_jobs
 
        # We wait only on jobs from our subscription group. Other groups are independent.
+        # This will block on revoked jobs until they are finished. Those are not meant to last
+        # long and should not have any bigger impact on the system. Doing this in a blocking way
+        # simplifies the overall design and prevents race conditions
        wait
 
        build_and_schedule_consumption_jobs
@@ -136,7 +143,7 @@ module Karafka
 
      # Resumes processing of partitions that were paused due to an error.
      def resume_paused_partitions
-        @pauses_manager.resume do |topic, partition|
+        @coordinators.resume do |topic, partition|
          @client.resume(topic, partition)
        end
      end
@@ -152,9 +159,23 @@ module Karafka
 
        revoked_partitions.each do |topic, partitions|
          partitions.each do |partition|
-            pause_tracker = @pauses_manager.fetch(topic, partition)
-            executor = @executors.fetch(topic, partition, pause_tracker)
-            jobs << @jobs_builder.revoked(executor)
+            # We revoke the coordinator here, so we do not have to revoke it in the revoke job
+            # itself (this happens prior to scheduling those jobs)
+            @coordinators.revoke(topic, partition)
+
+            # There may be a case where we have lost a partition of which data we have never
+            # processed (if it was assigned and revoked really fast), thus we may not have it
+            # here. In cases like this, we do not run a revocation job
+            @executors.find_all(topic, partition).each do |executor|
+              jobs << @jobs_builder.revoked(executor)
+            end
+
+            # We need to remove all the executors of a given topic partition that we have lost, so
+            # next time we pick up its work, new executors kick in. This may be needed especially
+            # for LRJ where we could end up with a race condition
+            # This revocation needs to happen after the jobs are scheduled, otherwise they would
+            # be scheduled with new executors instead of old
+            @executors.revoke(topic, partition)
          end
        end
 
@@ -183,6 +204,17 @@ module Karafka
        )
      end
 
+      # Revoked partitions need to be resumed if we were processing them earlier. This will do
+      # nothing to things that we are planning to process. Without this, things we get
+      # re-assigned would not be polled.
+      def resume_assigned_partitions
+        @client.rebalance_manager.assigned_partitions.each do |topic, partitions|
+          partitions.each do |partition|
+            @client.resume(topic, partition)
+          end
+        end
+      end
+
      # Takes the messages per topic partition and enqueues processing jobs in threads using
      # given scheduler.
      def build_and_schedule_consumption_jobs
@@ -191,11 +223,17 @@ module Karafka
        jobs = []
 
        @messages_buffer.each do |topic, partition, messages|
-          pause_tracker = @pauses_manager.fetch(topic, partition)
+          coordinator = @coordinators.find_or_create(topic, partition)
+
+          # Start work coordination for this topic partition
+          coordinator.start
+
+          # Count the job we're going to create here
+          coordinator.increment
 
-          executor = @executors.fetch(topic, partition, pause_tracker)
+          executor = @executors.find_or_create(topic, partition, 0)
 
-          jobs << @jobs_builder.consume(executor, messages)
+          jobs << @jobs_builder.consume(executor, messages, coordinator)
        end
 
        @scheduler.schedule_consumption(@jobs_queue, jobs)
@@ -231,7 +269,7 @@ module Karafka
        @jobs_queue.wait(@subscription_group.id)
        @jobs_queue.clear(@subscription_group.id)
        @client.reset
-        @pauses_manager = PausesManager.new
+        @coordinators.reset
        @executors = Processing::ExecutorsBuffer.new(@client, @subscription_group)
      end
    end
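The ordering inside `build_and_schedule_revoke_lost_partitions_jobs` is the important part: revocation jobs must be built against the executors that actually did the work, and only then are those executors evicted so a re-assignment starts with fresh ones. A toy model of that ordering (all names and data are illustrative):

```ruby
executors = { ['events', 0] => [:executor_a, :executor_b] }
revoked = { 'events' => [0] }
jobs = []

revoked.each do |topic, partitions|
  partitions.each do |partition|
    # First: build revocation jobs only for executors that processed data
    Array(executors[[topic, partition]]).each do |executor|
      jobs << [:revoked, executor]
    end

    # Then: evict them, so freshly re-assigned work gets brand new executors
    executors.delete([topic, partition])
  end
end

p jobs      # => [[:revoked, :executor_a], [:revoked, :executor_b]]
p executors # => {}
```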
data/lib/karafka/connection/pauses_manager.rb CHANGED
@@ -25,6 +25,14 @@ module Karafka
        )
      end
 
+      # Revokes pause tracker for a given topic partition
+      #
+      # @param topic [String] topic name
+      # @param partition [Integer] partition number
+      def revoke(topic, partition)
+        @pauses[topic].delete(partition)
+      end
+
      # Resumes processing of partitions for which pause time has ended.
      #
      # @yieldparam [String] topic name
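The new `#revoke` simply drops the tracker for a lost topic partition from the nested topic => partition storage. A tiny illustration of that data shape (symbols stand in for real pause trackers):

```ruby
pauses = Hash.new { |hash, topic| hash[topic] = {} }
pauses['events'][0] = :pause_tracker
pauses['events'][1] = :pause_tracker

pauses['events'].delete(0) # what #revoke does for the lost partition

p pauses # => {"events"=>{1=>:pause_tracker}}
```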
data/lib/karafka/connection/rebalance_manager.rb CHANGED
@@ -18,13 +18,15 @@ module Karafka
      # Empty array for internal usage not to create new objects
      EMPTY_ARRAY = [].freeze
 
+      attr_reader :assigned_partitions, :revoked_partitions
+
      private_constant :EMPTY_ARRAY
 
      # @return [RebalanceManager]
      def initialize
        @assigned_partitions = {}
        @revoked_partitions = {}
-        @lost_partitions = {}
+        @changed = false
      end
 
      # Resets the rebalance manager state
@@ -33,26 +35,12 @@ module Karafka
      def clear
        @assigned_partitions.clear
        @revoked_partitions.clear
-        @lost_partitions.clear
-      end
-
-      # @return [Hash<String, Array<Integer>>] hash where the keys are the names of topics for
-      #   which we've lost partitions and array with ids of the partitions as the value
-      # @note We do not consider as lost topics and partitions that got revoked and assigned
-      def revoked_partitions
-        return @revoked_partitions if @revoked_partitions.empty?
-        return @lost_partitions unless @lost_partitions.empty?
-
-        @revoked_partitions.each do |topic, partitions|
-          @lost_partitions[topic] = partitions - @assigned_partitions.fetch(topic, EMPTY_ARRAY)
-        end
-
-        @lost_partitions
+        @changed = false
      end
 
-      # @return [Boolean] true if any partitions were revoked
-      def revoked_partitions?
-        !revoked_partitions.empty?
+      # @return [Boolean] indicates a state change in the partitions assignment
+      def changed?
+        @changed
      end
 
      # Callback that kicks in inside of rdkafka, when new partitions are assigned.
@@ -62,6 +50,7 @@ module Karafka
      # @param partitions [Rdkafka::Consumer::TopicPartitionList]
      def on_partitions_assigned(_, partitions)
        @assigned_partitions = partitions.to_h.transform_values { |part| part.map(&:partition) }
+        @changed = true
      end
 
      # Callback that kicks in inside of rdkafka, when partitions are revoked.
@@ -71,6 +60,18 @@ module Karafka
      # @param partitions [Rdkafka::Consumer::TopicPartitionList]
      def on_partitions_revoked(_, partitions)
        @revoked_partitions = partitions.to_h.transform_values { |part| part.map(&:partition) }
+        @changed = true
+      end
+
+      # We consider as lost only partitions that were taken away and not re-assigned back to us
+      def lost_partitions
+        lost_partitions = {}
+
+        revoked_partitions.each do |topic, partitions|
+          lost_partitions[topic] = partitions - assigned_partitions.fetch(topic, EMPTY_ARRAY)
+        end
+
+        lost_partitions
      end
    end
  end
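The new `#lost_partitions` treats a partition as lost only when it was revoked and did not come straight back in the same rebalance. With illustrative data:

```ruby
revoked  = { 'events' => [0, 1, 2] }
assigned = { 'events' => [1] }

lost = revoked.to_h do |topic, partitions|
  [topic, partitions - assigned.fetch(topic, [])]
end

p lost # => {"events"=>[0, 2]}
```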