karafka 2.0.0.alpha5 → 2.0.0.beta2

Files changed (54)
  1. checksums.yaml +4 -4
  2. checksums.yaml.gz.sig +0 -0
  3. data/.ruby-version +1 -1
  4. data/CHANGELOG.md +35 -2
  5. data/Gemfile.lock +6 -6
  6. data/bin/integrations +55 -43
  7. data/config/errors.yml +1 -0
  8. data/docker-compose.yml +4 -1
  9. data/lib/active_job/karafka.rb +2 -2
  10. data/lib/karafka/active_job/routing/extensions.rb +21 -0
  11. data/lib/karafka/base_consumer.rb +65 -12
  12. data/lib/karafka/connection/client.rb +36 -6
  13. data/lib/karafka/connection/listener.rb +92 -27
  14. data/lib/karafka/connection/listeners_batch.rb +24 -0
  15. data/lib/karafka/connection/messages_buffer.rb +49 -22
  16. data/lib/karafka/connection/pauses_manager.rb +2 -2
  17. data/lib/karafka/connection/raw_messages_buffer.rb +101 -0
  18. data/lib/karafka/connection/rebalance_manager.rb +35 -20
  19. data/lib/karafka/contracts/config.rb +8 -0
  20. data/lib/karafka/helpers/async.rb +33 -0
  21. data/lib/karafka/instrumentation/monitor.rb +2 -1
  22. data/lib/karafka/messages/batch_metadata.rb +26 -3
  23. data/lib/karafka/messages/builders/batch_metadata.rb +17 -29
  24. data/lib/karafka/messages/builders/message.rb +1 -0
  25. data/lib/karafka/messages/builders/messages.rb +4 -12
  26. data/lib/karafka/pro/active_job/consumer.rb +21 -0
  27. data/lib/karafka/pro/active_job/dispatcher.rb +10 -10
  28. data/lib/karafka/pro/active_job/job_options_contract.rb +9 -9
  29. data/lib/karafka/pro/loader.rb +17 -8
  30. data/lib/karafka/pro/performance_tracker.rb +80 -0
  31. data/lib/karafka/pro/processing/jobs/consume_non_blocking.rb +38 -0
  32. data/lib/karafka/pro/scheduler.rb +54 -0
  33. data/lib/karafka/processing/executor.rb +19 -11
  34. data/lib/karafka/processing/executors_buffer.rb +15 -7
  35. data/lib/karafka/processing/jobs/base.rb +28 -0
  36. data/lib/karafka/processing/jobs/consume.rb +11 -4
  37. data/lib/karafka/processing/jobs_queue.rb +28 -16
  38. data/lib/karafka/processing/worker.rb +30 -9
  39. data/lib/karafka/processing/workers_batch.rb +5 -0
  40. data/lib/karafka/railtie.rb +12 -0
  41. data/lib/karafka/routing/consumer_group.rb +1 -1
  42. data/lib/karafka/routing/subscription_group.rb +1 -1
  43. data/lib/karafka/routing/subscription_groups_builder.rb +3 -2
  44. data/lib/karafka/routing/topics.rb +38 -0
  45. data/lib/karafka/runner.rb +19 -27
  46. data/lib/karafka/scheduler.rb +20 -0
  47. data/lib/karafka/server.rb +24 -23
  48. data/lib/karafka/setup/config.rb +4 -1
  49. data/lib/karafka/time_trackers/pause.rb +10 -2
  50. data/lib/karafka/version.rb +1 -1
  51. data.tar.gz.sig +0 -0
  52. metadata +13 -4
  53. metadata.gz.sig +0 -0
  54. data/lib/karafka/active_job/routing_extensions.rb +0 -18
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 062ee1f4c49d482daa48aa1af06aae08b83fb879ecfb4a6a14d5b2b34ed2975a
- data.tar.gz: 28b92b8b5cea506641e339d59f53122fb18deb1637c20a2b34038ae2000b6f17
+ metadata.gz: 7f75623e7d9cdcc4ba151ad551079275528c56bf66cd9c32ecc585756a8d505c
+ data.tar.gz: e0becf53133b579f581ddfdf947bbff21221fe69a8c73a0406174aecd0155f3a
  SHA512:
- metadata.gz: '0497e94e2aa16ee20ded58e17313c9259bacb6f3fa91259e30f0cb0560b58a1eecb11447fb150e09cd659bb24d48e0cc732dc3fc7cc585a2aefe980df2e5b3f1'
- data.tar.gz: 280407edd6298a7e62f970d2f9a2d6d1ff6eba4737ea4369eba20c4a83b9d00a0efa24ef36ceabd9def060839039fcf925c341f520efa2ab68532d484482f4fd
+ metadata.gz: 2ef2ac59f1ea60136abbaccf460206a0c2f6d4fe3124eda520f3a19568702acc774e0d9e02eae24cfe6bb6cb8ee8aa74602588aa818dc3537cd6bbc8409f159d
+ data.tar.gz: e29e964e777e2bd8a551458f591b92aea69a3ac2eafa1bc1d75bb42d7cd6bb904abf997724be4268ae1f9d429627d7b92d5c38e51817d36b0f27c6499a062af3
checksums.yaml.gz.sig CHANGED
Binary file
data/.ruby-version CHANGED
@@ -1 +1 @@
- 3.1.0
+ 3.1.2
data/CHANGELOG.md CHANGED
@@ -1,5 +1,38 @@
  # Karafka framework changelog

+ ## 2.0.0-beta2 (2022-06-07)
+ - Abstract away notion of topics groups (until now it was just an array)
+ - Optimize how jobs queue is closed. Since we enqueue jobs only from the listeners, we can safely close jobs queue once listeners are done. By extracting this responsibility from listeners, we remove corner cases and race conditions. Note here: for non-blocking jobs we do wait for them to finish while running the `poll`. This ensures, that for async jobs that are long-living, we do not reach `max.poll.interval`.
+ - `Shutdown` jobs are executed in workers to align all the jobs behaviours.
+ - `Shutdown` jobs are always blocking.
+ - Notion of `ListenersBatch` was introduced similar to `WorkersBatch` to abstract this concept.
+ - Change default `shutdown_timeout` to be more than `max_wait_time` not to cause forced shutdown when no messages are being received from Kafka.
+ - Abstract away scheduling of revocation and shutdown jobs for both default and pro schedulers
+ - Introduce a second (internal) messages buffer to distinguish between raw messages buffer and karafka messages buffer
+ - Move messages and their metadata remap process to the listener thread to allow for their inline usage
+ - Change how we wait in the shutdown phase, so shutdown jobs can still use Kafka connection even if they run for a longer period of time. This will prevent us from being kicked out from the group early.
+ - Introduce validation that ensures, that `shutdown_timeout` is more than `max_wait_time`. This will prevent users from ending up with a config that could lead to frequent forceful shutdowns.
+
+ ## 2.0.0-beta1 (2022-05-22)
+ - Update the jobs queue blocking engine and allow for non-blocking jobs execution
+ - Provide `#prepared` hook that always runs before the fetching loop is unblocked
+ - [Pro] Introduce performance tracker for scheduling optimizer
+ - Provide ability to pause (`#pause`) and resume (`#resume`) given partitions from the consumers
+ - Small integration specs refactoring + specs for pausing scenarios
+
+ ## 2.0.0-alpha6 (2022-04-17)
+ - Fix a bug, where upon missing boot file and Rails, railtie would fail with a generic exception (#818)
+ - Fix an issue with parallel pristine specs colliding with each other during `bundle install` (#820)
+ - Replace `consumer.consume` with `consumer.consumed` event to match the behaviour
+ - Make sure, that offset committing happens before the `consumer.consumed` event is propagated
+ - Fix for failing when not installed (just a dependency) (#817)
+ - Evict messages from partitions that were lost upon rebalancing (#825)
+ - Do **not** run `#revoked` on partitions that were lost and assigned back upon rebalancing (#825)
+ - Remove potential duplicated that could occur upon rebalance with re-assigned partitions (#825)
+ - Optimize integration test suite additional consumers shutdown process (#828)
+ - Optimize messages eviction and duplicates removal on poll stopped due to lack of messages
+ - Add static group membership integration spec
+
  ## 2.0.0-alpha5 (2022-04-03)
  - Rename StdoutListener to LoggerListener (#811)

@@ -13,12 +46,12 @@
  ## 2.0.0-alpha2 (2022-02-19)
  - Require `kafka` keys to be symbols
- - Added ActiveJob Pro adapter
+ - [Pro] Added ActiveJob Pro adapter
  - Small updates to the license and docs

  ## 2.0.0-alpha1 (2022-01-30)
  - Change license to `LGPL-3.0`
- - Introduce a Pro subscription
+ - [Pro] Introduce a Pro subscription
  - Switch from `ruby-kafka` to `librdkafka` as an underlying driver
  - Introduce fully automatic integration tests that go through the whole server lifecycle
  - Integrate WaterDrop tightly with autoconfiguration inheritance and an option to redefine it
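The 2.0.0-alpha6 entry above replaces the `consumer.consume` instrumentation event with `consumer.consumed`. A minimal sketch of subscribing to the renamed event; the `event[:caller]` payload key mirrors the `caller: self` argument visible in the `base_consumer.rb` diff further down, while the logged summary itself is illustrative:

```ruby
# Listens to the renamed instrumentation event emitted after a batch is consumed
Karafka.monitor.subscribe('consumer.consumed') do |event|
  consumer = event[:caller]
  metadata = consumer.messages.metadata

  # Illustrative logging only
  puts "Consumed #{consumer.messages.count} message(s) from " \
       "#{metadata.topic}/#{metadata.partition}"
end
```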
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- karafka (2.0.0.alpha5)
+ karafka (2.0.0.beta2)
  dry-configurable (~> 0.13)
  dry-monitor (~> 0.5)
  dry-validation (~> 1.7)
@@ -13,10 +13,10 @@ PATH
  GEM
  remote: https://rubygems.org/
  specs:
- activejob (7.0.2.3)
- activesupport (= 7.0.2.3)
+ activejob (7.0.3)
+ activesupport (= 7.0.3)
  globalid (>= 0.3.6)
- activesupport (7.0.2.3)
+ activesupport (7.0.3)
  concurrent-ruby (~> 1.0, >= 1.0.2)
  i18n (>= 1.6, < 2)
  minitest (>= 5.1)
@@ -25,7 +25,7 @@ GEM
  concurrent-ruby (1.1.10)
  diff-lcs (1.5.0)
  docile (1.4.0)
- dry-configurable (0.14.0)
+ dry-configurable (0.15.0)
  concurrent-ruby (~> 1.0)
  dry-core (~> 0.6)
  dry-container (0.9.0)
@@ -121,4 +121,4 @@ DEPENDENCIES
  simplecov

  BUNDLED WITH
- 2.3.10
+ 2.3.11
data/bin/integrations CHANGED
@@ -44,17 +44,30 @@ class Scenario
  # @param path [String] path to the scenarios file
  def initialize(path)
  @path = path
- @stdin, @stdout, @stderr, @wait_thr = Open3.popen3(init_and_build_cmd)
- @started_at = current_time
  # Last 1024 characters from stdout
  @stdout_tail = ''
  end

+ # Starts running given scenario in a separate process
+ def start
+ @stdin, @stdout, @stderr, @wait_thr = Open3.popen3(init_and_build_cmd)
+ @started_at = current_time
+ end
+
  # @return [String] integration spec name
  def name
  @path.gsub("#{ROOT_PATH}/spec/integrations/", '')
  end

+ # @return [Boolean] true if spec is pristine
+ def pristine?
+ scenario_dir = File.dirname(@path)
+
+ # If there is a Gemfile in a scenario directory, it means it is a pristine spec and we need
+ # to run bundle install, etc in order to run it
+ File.exist?(File.join(scenario_dir, 'Gemfile'))
+ end
+
  # @return [Boolean] did this scenario finished or is it still running
  def finished?
  # If the thread is running too long, kill it
@@ -73,6 +86,13 @@ class Scenario
  !@wait_thr.alive?
  end

+ # @return [Boolean] did this scenario finish successfully or not
+ def success?
+ expected_exit_codes = EXIT_CODES[name] || EXIT_CODES[:default]
+
+ expected_exit_codes.include?(exit_code)
+ end
+
  # @return [Integer] pid of the process of this scenario
  def pid
  @wait_thr.pid
@@ -84,13 +104,6 @@ class Scenario
  @wait_thr.value&.exitstatus || 123
  end

- # @return [Boolean] did this scenario finish successfully or not
- def success?
- expected_exit_codes = EXIT_CODES[name] || EXIT_CODES[:default]
-
- expected_exit_codes.include?(exit_code)
- end
-
  # Prints a status report when scenario is finished and stdout if it failed
  def report
  result = success? ? "\e[#{32}m#{'OK'}\e[0m" : "\e[#{31}m#{'FAILED'}\e[0m"
@@ -109,11 +122,10 @@ class Scenario
  # Sets up a proper environment for a given spec to run and returns the run command
  # @return [String] run command
  def init_and_build_cmd
- scenario_dir = File.dirname(@path)
-
  # If there is a Gemfile in a scenario directory, it means it is a pristine spec and we need
  # to run bundle install, etc in order to run it
- if File.exist?(File.join(scenario_dir, 'Gemfile'))
+ if pristine?
+ scenario_dir = File.dirname(@path)
  # We copy the spec into a temp dir, not to pollute the spec location with logs, etc
  temp_dir = Dir.mktmpdir
  file_name = File.basename(@path)
@@ -141,31 +153,6 @@ class Scenario
  end
  end

- # Simple array to keep track of active integration processes thread running with info on which
- # test scenario is running
- active_scenarios = []
-
- # Finished runners
- finished_scenarios = []
-
- # Waits for any of the processes to be finished and tracks exit codes
- #
- # @param active_scenarios [Array] active runners
- # @param finished_scenarios [Hash] finished forks exit codes
- def wait_and_track(active_scenarios, finished_scenarios)
- exited = active_scenarios.find(&:finished?)
-
- if exited
- scenario = active_scenarios.delete(exited)
-
- scenario.report
-
- finished_scenarios << scenario
- else
- Thread.pass
- end
- end
-
  # Load all the specs
  specs = Dir[ROOT_PATH.join('spec/integrations/**/*.rb')]

@@ -182,15 +169,40 @@ seed = (ENV['SEED'] || rand(0..10_000)).to_i

  puts "Random seed: #{seed}"

- specs.shuffle(random: Random.new(seed)).each do |integration_test|
- scenario = Scenario.new(integration_test)
+ scenarios = specs
+ .shuffle(random: Random.new(seed))
+ .map { |integration_test| Scenario.new(integration_test) }

- active_scenarios << scenario
+ regulars = scenarios.reject(&:pristine?)
+ pristine = scenarios.select(&:pristine?)

- wait_and_track(active_scenarios, finished_scenarios) until active_scenarios.size < CONCURRENCY
- end
+ active_scenarios = []
+ finished_scenarios = []
+
+ while finished_scenarios.size < scenarios.size
+ # If we have space to run another scenario, we add it
+ if active_scenarios.size < CONCURRENCY
+ scenario = nil
+ # We can run only one pristine at the same time due to concurrency issues within bundler
+ # Since they usually take longer than others, we try to run them as fast as possible when there
+ # is a slot
+ scenario = pristine.pop unless active_scenarios.any?(&:pristine?)
+ scenario ||= regulars.pop
+
+ if scenario
+ scenario.start
+ active_scenarios << scenario
+ end
+ end

- wait_and_track(active_scenarios, finished_scenarios) while !active_scenarios.empty?
+ active_scenarios.select(&:finished?).each do |exited|
+ scenario = active_scenarios.delete(exited)
+ scenario.report
+ finished_scenarios << scenario
+ end
+
+ sleep(0.1)
+ end

  # Fail all if any of the tests does not have expected exit code
  raise IntegrationTestError unless finished_scenarios.all?(&:success?)
data/config/errors.yml CHANGED
@@ -2,6 +2,7 @@ en:
  dry_validation:
  errors:
  max_timeout_vs_pause_max_timeout: pause_timeout must be less or equal to pause_max_timeout
+ shutdown_timeout_vs_max_wait_time: shutdown_timeout must be more than max_wait_time
  topics_names_not_unique: all topic names within a single consumer group must be unique
  required_usage_count: Given topic must be used at least once
  consumer_groups_inclusion: Unknown consumer group
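The new `shutdown_timeout_vs_max_wait_time` message backs the validation announced in the 2.0.0-beta2 changelog. A minimal setup sketch that satisfies the contract; the broker address and the exact millisecond values are illustrative:

```ruby
# frozen_string_literal: true

class KarafkaApp < Karafka::App
  setup do |config|
    config.kafka = { 'bootstrap.servers': '127.0.0.1:9092' }

    # max_wait_time caps a single poll (in ms); shutdown_timeout has to be
    # larger, otherwise the config contract rejects the setup with the
    # shutdown_timeout_vs_max_wait_time error above
    config.max_wait_time = 1_000
    config.shutdown_timeout = 30_000
  end
end
```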
data/docker-compose.yml CHANGED
@@ -14,7 +14,10 @@ services:
  KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
  KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
  KAFKA_CREATE_TOPICS:
- "integrations_0_03:3:1,\
+ "integrations_0_02:2:1,\
+ integrations_1_02:2:1,\
+ integrations_2_02:2:1,\
+ integrations_0_03:3:1,\
  integrations_1_03:3:1,\
  integrations_2_03:3:1,\
  integrations_0_10:10:1,\
data/lib/active_job/karafka.rb CHANGED
@@ -14,8 +14,8 @@ begin
  # We extend routing builder by adding a simple wrapper for easier jobs topics defining
  # This needs to be extended here as it is going to be used in karafka routes, hence doing that in
  # the railtie initializer would be too late
- ::Karafka::Routing::Builder.include ::Karafka::ActiveJob::RoutingExtensions
- ::Karafka::Routing::Proxy.include ::Karafka::ActiveJob::RoutingExtensions
+ ::Karafka::Routing::Builder.include ::Karafka::ActiveJob::Routing::Extensions
+ ::Karafka::Routing::Proxy.include ::Karafka::ActiveJob::Routing::Extensions
  rescue LoadError
  # We extend ActiveJob stuff in the railtie
  end
data/lib/karafka/active_job/routing/extensions.rb CHANGED
@@ -0,0 +1,21 @@
+ # frozen_string_literal: true
+
+ module Karafka
+ # ActiveJob related Karafka stuff
+ module ActiveJob
+ # Karafka routing ActiveJob related components
+ module Routing
+ # Routing extensions for ActiveJob
+ module Extensions
+ # This method simplifies routes definition for ActiveJob topics / queues by auto-injecting
+ # the consumer class
+ # @param name [String, Symbol] name of the topic where ActiveJobs jobs should go
+ def active_job_topic(name)
+ topic(name) do
+ consumer App.config.internal.active_job.consumer
+ end
+ end
+ end
+ end
+ end
+ end
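For reference, a routing sketch using the `active_job_topic` helper defined above; the application class and the topic name are illustrative:

```ruby
# frozen_string_literal: true

class KarafkaApp < Karafka::App
  routes.draw do
    # Declares a topic consumed by the ActiveJob consumer configured under
    # config.internal.active_job.consumer, so no consumer class is given here
    active_job_topic :default
  end
end
```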
data/lib/karafka/base_consumer.rb CHANGED
@@ -10,8 +10,8 @@ module Karafka
  attr_accessor :messages
  # @return [Karafka::Connection::Client] kafka connection client
  attr_accessor :client
- # @return [Karafka::TimeTrackers::Pause] current topic partition pause
- attr_accessor :pause
+ # @return [Karafka::TimeTrackers::Pause] current topic partition pause tracker
+ attr_accessor :pause_tracker
  # @return [Waterdrop::Producer] producer instance
  attr_accessor :producer

@@ -21,18 +21,18 @@ module Karafka
  # that may not yet kick in when error occurs. That way we pause always on the last processed
  # message.
  def on_consume
- Karafka.monitor.instrument('consumer.consume', caller: self) do
+ Karafka.monitor.instrument('consumer.consumed', caller: self) do
  consume
- end

- pause.reset
+ pause_tracker.reset

- # Mark as consumed only if manual offset management is not on
- return if topic.manual_offset_management
+ # Mark as consumed only if manual offset management is not on
+ next if topic.manual_offset_management

- # We use the non-blocking one here. If someone needs the blocking one, can implement it with
- # manual offset management
- mark_as_consumed(messages.last)
+ # We use the non-blocking one here. If someone needs the blocking one, can implement it
+ # with manual offset management
+ mark_as_consumed(messages.last)
+ end
  rescue StandardError => e
  Karafka.monitor.instrument(
  'error.occurred',
@@ -40,8 +40,8 @@ module Karafka
  caller: self,
  type: 'consumer.consume.error'
  )
- client.pause(topic.name, messages.first.partition, @seek_offset || messages.first.offset)
- pause.pause
+
+ pause(@seek_offset || messages.first.offset)
  end

  # Trigger method for running on shutdown.
@@ -76,8 +76,31 @@ module Karafka
  )
  end

+ # Can be used to run preparation code
+ #
+ # @private
+ # @note This should not be used by the end users as it is part of the lifecycle of things but
+ # not as part of the public api. This can act as a hook when creating non-blocking
+ # consumers and doing other advanced stuff
+ def on_prepared
+ Karafka.monitor.instrument('consumer.prepared', caller: self) do
+ prepared
+ end
+ rescue StandardError => e
+ Karafka.monitor.instrument(
+ 'error.occurred',
+ error: e,
+ caller: self,
+ type: 'consumer.prepared.error'
+ )
+ end
+
  private

+ # Method that gets called in the blocking flow allowing to setup any type of resources or to
+ # send additional commands to Kafka before the proper execution starts.
+ def prepared; end
+
  # Method that will perform business logic and on data received from Kafka (it will consume
  # the data)
  # @note This method needs bo be implemented in a subclass. We stub it here as a failover if
@@ -97,6 +120,10 @@ module Karafka
  # Marks message as consumed in an async way.
  #
  # @param message [Messages::Message] last successfully processed message.
+ # @note We keep track of this offset in case we would mark as consumed and got error when
+ # processing another message. In case like this we do not pause on the message we've already
+ # processed but rather at the next one. This applies to both sync and async versions of this
+ # method.
  def mark_as_consumed(message)
  client.mark_as_consumed(message)
  @seek_offset = message.offset + 1
@@ -110,6 +137,32 @@ module Karafka
  @seek_offset = message.offset + 1
  end

+ # Pauses processing on a given offset for the current topic partition
+ #
+ # After given partition is resumed, it will continue processing from the given offset
+ # @param offset [Integer] offset from which we want to restart the processing
+ # @param timeout [Integer, nil] how long in milliseconds do we want to pause or nil to use the
+ # default exponential pausing strategy defined for retries
+ def pause(offset, timeout = nil)
+ client.pause(
+ messages.metadata.topic,
+ messages.metadata.partition,
+ offset
+ )
+
+ timeout ? pause_tracker.pause(timeout) : pause_tracker.pause
+ end
+
+ # Resumes processing of the current topic partition
+ def resume
+ client.resume(
+ messages.metadata.topic,
+ messages.metadata.partition
+ )
+
+ pause_tracker.expire
+ end
+
  # Seeks in the context of current topic and partition
  #
  # @param offset [Integer] offset where we want to seek
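To show how the new hooks above fit together, a consumer sketch using the `prepared` hook and `pause`; the class, the backend dependency, and the throttling condition are all illustrative, and the automatic resume after the pause expires is how the beta releases appear to behave:

```ruby
# frozen_string_literal: true

class OrdersConsumer < Karafka::BaseConsumer
  # Runs in the blocking part of the flow, before the polling loop is unblocked
  def prepared
    @backend = Orders::Backend.new # hypothetical dependency
  end

  def consume
    messages.each do |message|
      @backend.import(message.payload)
    end

    # Illustrative throttling: back off this topic partition for 10 seconds
    # when the backend reports pressure; processing restarts from the next
    # offset once the pause expires (or after an explicit #resume)
    pause(messages.last.offset + 1, 10_000) if @backend.overloaded?
  end
end
```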
data/lib/karafka/connection/client.rb CHANGED
@@ -30,7 +30,7 @@ module Karafka
  @mutex = Mutex.new
  @closed = false
  @subscription_group = subscription_group
- @buffer = MessagesBuffer.new
+ @buffer = RawMessagesBuffer.new
  @rebalance_manager = RebalanceManager.new
  @kafka = build_consumer
  # Marks if we need to offset. If we did not store offsets, we should not commit the offset
@@ -48,6 +48,7 @@ module Karafka
  time_poll.start

  @buffer.clear
+ @rebalance_manager.clear

  loop do
  # Don't fetch more messages if we do not have any time left
@@ -58,13 +59,23 @@ module Karafka
  # Fetch message within our time boundaries
  message = poll(time_poll.remaining)

- # If there are no more messages, return what we have
- break unless message
-
- @buffer << message
+ # Put a message to the buffer if there is one
+ @buffer << message if message

  # Track time spent on all of the processing and polling
  time_poll.checkpoint
+
+ # Upon polling rebalance manager might have been updated.
+ # If partition revocation happens, we need to remove messages from revoked partitions
+ # as well as ensure we do not have duplicated due to the offset reset for partitions
+ # that we got assigned
+ remove_revoked_and_duplicated_messages if @rebalance_manager.revoked_partitions?
+
+ # Finally once we've (potentially) removed revoked, etc, if no messages were returned
+ # we can break.
+ # Worth keeping in mind, that the rebalance manager might have been updated despite no
+ # messages being returned during a poll
+ break unless message
  end

  @buffer
@@ -84,6 +95,9 @@ module Karafka
  # Ignoring a case where there would not be an offset (for example when rebalance occurs).
  #
  # @param async [Boolean] should the commit happen async or sync (async by default)
+ # @return [Boolean] did committing was successful. It may be not, when we no longer own
+ # given partition.
+ #
  # @note This will commit all the offsets for the whole consumer. In order to achieve
  # granular control over where the offset should be for particular topic partitions, the
  # store_offset should be used to only store new offset when we want to to be flushed
@@ -212,6 +226,8 @@ module Karafka
  ::Karafka::Instrumentation.error_callbacks.delete(@subscription_group.id)

  @kafka.close
+ @buffer.clear
+ @rebalance_manager.clear
  end
  end

@@ -232,7 +248,7 @@ module Karafka
  # Performs a single poll operation.
  #
  # @param timeout [Integer] timeout for a single poll
- # @return [Array<Rdkafka::Consumer::Message>, nil] fetched messages or nil if nothing polled
+ # @return [Rdkafka::Consumer::Message, nil] fetched message or nil if nothing polled
  def poll(timeout)
  time_poll ||= TimeTrackers::Poll.new(timeout)

@@ -301,6 +317,20 @@ module Karafka

  consumer
  end
+
+ # We may have a case where in the middle of data polling, we've lost a partition.
+ # In a case like this we should remove all the pre-buffered messages from list partitions as
+ # we are no longer responsible in a given process for processing those messages and they
+ # should have been picked up by a different process.
+ def remove_revoked_and_duplicated_messages
+ @rebalance_manager.revoked_partitions.each do |topic, partitions|
+ partitions.each do |partition|
+ @buffer.delete(topic, partition)
+ end
+ end
+
+ @buffer.uniq!
+ end
  end
  end
  end