racecar 3.0.0.alpha.3 → 3.0.0.beta.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 043070ffc30189dd4f500a691316db7a2a55422078b7bf5f31b81ff45aa69cab
- data.tar.gz: 73a2d774f56f7f14d18c5c43691f185ab64fd7e2f3cd6a29dfcad8b9111f00b5
+ metadata.gz: b0b740ff066f0f1aee2cd6bffac700badbbf684d2d9575c298aefe6a92a50d75
+ data.tar.gz: 022b4e5aea481beb637407cebe248c7890713a9ad64563edd316efca3f8ddfd6
  SHA512:
- metadata.gz: d0ea838d6e6381660a88b6e1d71c95d0fbe741609dc216b18830d8a74eee78b0d56a6e05713b4f12ac28f60146b566f4f125d53b4c4bd4c52c3c13459941fac4
- data.tar.gz: 213281a3b10ad91e3d7bcbf3a8974fbf15d8e1b5905518d3a4f13505ede101d254a433f86b1e7f875ccdad3c7d4d59fbc570b0d9db63f146615f4f454a65bd85
+ metadata.gz: fb0f983b58a63ec97ae7f7a47b42cb47ff3a03907055b2c855305918b30964983c15882f806bff173acbbff224d22763331c8503366b0b7783af3ad9e7535dac
+ data.tar.gz: 4f95b4582e091c8543aeaf683f915a55880dc622d49f67a56c8dbac160531babcff640919804e555d971022035dfb865931c8d603433c2b5748cb9c11c5b7642
data/CHANGELOG.md CHANGED
@@ -2,6 +2,17 @@
 
  ## Unreleased
 
+ ## 3.0.0
+
+ * Introduce multithreaded processing: when enabled, Racecar processes each assigned partition on its own dedicated thread. Disabled by default and gated behind the `multithreaded_processing_enabled` config.
+ * Refactor of the Racecar architecture: introduction of `PartitionProcessor` and `AsyncPartitionProcessor` handling the processing of messages.
+ * [Racecar::Config] Add `multithreaded_processing_enabled` (default `false`) to enable multithreaded processing. Can be set via `RACECAR_MULTITHREADED_PROCESSING_ENABLED=1`.
+ * [Racecar::Config] Add `multithreaded_processing_max_queue_size` (default `1000`) to cap the number of messages queued per partition before backpressure is applied.
+ * [Racecar::Config] Add `multithreaded_processing_resume_threshold` (default `0.5`) controlling the queue fill ratio at which a paused partition is resumed.
+ * [Racecar::Config] Add `multithreaded_processing_shutdown_timeout` (default `300`) for how long the main thread waits on each processing thread during graceful shutdown.
+ * Apply backpressure when multithreaded processing is enabled: a partition is paused once its queue reaches `multithreaded_processing_max_queue_size` and resumed once it drains below `multithreaded_processing_resume_threshold` of that size.
+ * Gracefully drain queued messages and exit per-partition threads on rebalance and shutdown.
+
  ## 2.12.0
 
  * Add tests against Ruby 3.4
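
The backpressure rule described in the 3.0.0 changelog entries (pause at `multithreaded_processing_max_queue_size`, resume below `multithreaded_processing_resume_threshold` of that size) can be illustrated with a small, single-threaded sketch. The `BackpressureQueue` class and its methods are hypothetical names chosen for this illustration; this is not Racecar's implementation, only one way the stated rule could behave.

```ruby
# Hypothetical illustration of the pause/resume rule; not Racecar's code.
# Deliberately single-threaded and not thread-safe, to keep the rule visible.
class BackpressureQueue
  attr_reader :paused

  def initialize(max_queue_size: 1000, resume_threshold: 0.5)
    @max_queue_size = max_queue_size
    @resume_threshold = resume_threshold
    @items = []
    @paused = false
  end

  # Polling side: enqueue, and pause once the queue reaches its cap.
  def push(item)
    @items << item
    @paused = true if @items.size >= @max_queue_size
    self
  end

  # Processing side: dequeue, and resume once the queue has drained
  # below resume_threshold * max_queue_size.
  def pop
    item = @items.shift
    @paused = false if @paused && @items.size < @max_queue_size * @resume_threshold
    item
  end

  def size
    @items.size
  end
end

queue = BackpressureQueue.new(max_queue_size: 4, resume_threshold: 0.5)
4.times { |i| queue.push(i) }
paused_when_full = queue.paused # cap of 4 reached

2.times { queue.pop }           # size is 2, which is not below 4 * 0.5
still_paused = queue.paused

queue.pop                       # size is 1, which is below 2
resumed = !queue.paused
```

With a cap of 4 and a threshold of 0.5, the queue stays paused while two items remain and only resumes once a single item is left, which matches the "drains below" wording in the changelog.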
data/Gemfile.lock CHANGED
@@ -1,7 +1,8 @@
  PATH
  remote: .
  specs:
- racecar (3.0.0.alpha.3)
+ racecar (3.0.0.beta.2)
+ concurrent-ruby (~> 1.3)
  king_konf (~> 1.0.0)
  rdkafka (>= 0.15.0)
 
@@ -81,4 +82,4 @@ DEPENDENCIES
  timecop
 
  BUNDLED WITH
- 2.6.2
+ 2.6.9
data/README.md CHANGED
@@ -134,6 +134,42 @@ end
 
  This is useful to do any one-off work that you wouldn't want to do for each and every message.
 
+ #### Multithreaded message processing (experimental)
+
+ Warning - limited battle testing in production environments; use at your own risk!
+
+ By default a Racecar consumer processes all of its assigned partitions on a single thread. When `multithreaded_processing_enabled` is set, the consumer instead spins up one dedicated thread per assigned partition, so partitions are processed concurrently within a single process. This is an alternative to [parallel workers](#running-consumers-in-parallel-experimental) that avoids forking extra processes (and the associated memory overhead), at the cost of running your consumer code on multiple threads.
+
+ Each partition thread gets its own instance of your consumer class, so **your consumer code does not need to be thread-safe** - a thread never shares its instance (or its instance state) with another partition. The main thread keeps polling Kafka and hands each partition's messages off to the relevant thread via a bounded queue; if a thread falls behind and its queue fills up, the partition is paused until the queue drains, applying backpressure rather than growing memory without bound.
+
+ **Warning:** the number of threads scales with the number of assigned partitions, which can be large. Since each thread runs its own consumer instance, every resource that instance acquires (database connections, file handles, network sockets, HTTP clients, etc.) is multiplied by the number of partition threads. Make sure your consumer releases any resource it grabs and that any connection pools or other shared limits are sized to accommodate the resulting concurrency.
+
+ Enable it via config (or the `RACECAR_MULTITHREADED_PROCESSING_ENABLED=1` environment variable):
+
+ ```ruby
+ Racecar.configure do |config|
+ config.multithreaded_processing_enabled = true
+ end
+
+ class ResizeImagesConsumer < Racecar::Consumer
+ subscribes_to "images"
+
+ def process(message)
+ # This runs on a thread dedicated to message.partition.
+ # @state below is private to this partition's thread.
+ @state ||= {}
+ Image.resize(message.value)
+ end
+ end
+ ```
+
+ The behaviour can be tuned with the following options:
+
+ - `multithreaded_processing_enabled` – Enable per-partition threads. Default is `false`.
+ - `multithreaded_processing_max_queue_size` – Maximum number of queued message batches per partition before the partition is paused to apply backpressure. Default is `1000`.
+ - `multithreaded_processing_resume_threshold` – A paused partition is resumed once its queue drains below this fraction of `multithreaded_processing_max_queue_size`. Default is `0.5` (50%).
+ - `multithreaded_processing_shutdown_timeout` – How many seconds to wait for each partition thread to finish during graceful shutdown. Default is `300`.
+
  #### Setting the starting position
 
  When a consumer is started for the first time, it needs to decide where in each partition to start. By default, it will start at the _beginning_, meaning that all past messages will be processed. If you want to instead start at the _end_ of each partition, change your `subscribes_to` like this:
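
The README section on multithreaded processing added in this hunk describes one thread per partition, each with its own consumer instance, fed by the main thread through a queue. A minimal standalone sketch of that shape follows. `CountingConsumer`, the use of `nil` as a shutdown signal, and the unbounded `Queue` are all assumptions made for illustration; the README describes a bounded queue with backpressure, which is left out here for brevity, and none of this is Racecar's actual code.

```ruby
# Hypothetical stand-in for a consumer class; not a Racecar::Consumer.
class CountingConsumer
  attr_reader :seen

  def initialize
    @seen = []
  end

  def process(message)
    @seen << message
  end
end

partitions = [0, 1, 2]
queues = partitions.to_h { |partition| [partition, Queue.new] }
consumers = partitions.to_h { |partition| [partition, CountingConsumer.new] }

# One thread per partition; each thread only touches its own consumer instance.
threads = partitions.map do |partition|
  Thread.new do
    consumer = consumers[partition]
    # Assumption for this sketch: a nil message means "shut down".
    while (message = queues[partition].pop)
      consumer.process(message)
    end
  end
end

# "Main thread": route each message to the queue of its partition.
messages = [[0, "a"], [1, "b"], [0, "c"], [2, "d"]]
messages.each { |partition, value| queues[partition] << value }

# Signal shutdown and wait for every partition thread to drain its queue.
queues.each_value { |queue| queue << nil }
threads.each(&:join)

seen_by_partition = consumers.transform_values(&:seen)
```

Because each instance is confined to one thread, the `@seen` array needs no locking, which is the property the README relies on when it says consumer code does not need to be thread-safe.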
@@ -7,12 +7,14 @@ module Racecar
  class AsyncPartitionProcessor
  attr_reader :thread
 
- THREAD_KEY_IDENTIFIER = 'racecar_topic_partition_identifier'.freeze
-
  def self.thread_key(topic, partition)
  "#{topic}/#{partition}"
  end
 
+ def thread_key
+ self.class.thread_key(@topic, @partition)
+ end
+
  def initialize(topic:, partition:, logger:, config:, consumer:, consumer_class:, instrumenter:, rdkafka_consumer:)
  @topic = topic
  @partition = partition
@@ -70,7 +72,6 @@ module Racecar
  rdkafka_consumer: @rdkafka_consumer,
  )
  @queue = Queue.new
- @thread = nil
 
  use_process_batch = consumer_class.method_defined?(:process_batch)
 
@@ -90,7 +91,6 @@ module Racecar
  def spawn_thread(&block)
  @thread = Thread.new do
  Thread.current.name = "Racecar thread for #{thread_key}"
- Thread.current[AsyncPartitionProcessor::THREAD_KEY_IDENTIFIER] = thread_key
  main_processing_loop(block)
  end
  end
@@ -121,10 +121,6 @@ module Racecar
  end
  end
 
- def thread_key
- self.class.thread_key(@topic, @partition)
- end
-
  def main_processing_loop(block)
  loop do
  msgs = @queue.pop
@@ -48,9 +48,8 @@ module Racecar
 
  with_error_handling(message, payload) do |pause|
  @instrumenter.instrument("process_message", payload) do
- if @config.multithreaded_processing_enabled && consumer_class_instance.instance_variable_get(:@producer)&.closed?
- reconfigure_consumer_class_instance!
- end
+ reconfigure_if_producer_closed!
+ exit_if_rebalancing!
  consumer_class_instance.process(Racecar::Message.new(message, retries_count: pause.pauses_count))
  consumer_class_instance.deliver!
  consumer.store_offset(message, @rdkafka_consumer) unless rebalancing
@@ -76,9 +75,8 @@ module Racecar
  racecar_messages = messages.map do |message|
  Racecar::Message.new(message, retries_count: pause.pauses_count)
  end
- if @config.multithreaded_processing_enabled && consumer_class_instance.instance_variable_get(:@producer)&.closed?
- reconfigure_consumer_class_instance!
- end
+ reconfigure_if_producer_closed!
+ exit_if_rebalancing!
  consumer_class_instance.process_batch(racecar_messages)
  consumer_class_instance.deliver!
  consumer.store_offset(messages.last, @rdkafka_consumer) unless rebalancing
@@ -146,14 +144,20 @@ module Racecar
  elsif !shutting_down
  handle_processing_error(e, payload, pause: pause)
  pause.pause!
- unless config.pause_timeout <= 0
- @sleep_mutex.synchronize do
- next if rebalancing || shutting_down
+
+ break if config.pause_timeout == 0
+
+ @sleep_mutex.synchronize do
+ next if rebalancing || shutting_down
+ if config.pause_timeout == -1
+ # Pause indefinitely. backoff_interval is Float::INFINITY here,
+ @sleep_cv.wait(@sleep_mutex)
+ else
  @sleep_cv.wait(@sleep_mutex, pause.backoff_interval)
  end
  end
  Thread.exit if rebalancing
- break if shutting_down
+ break if shutting_down
  else
  handle_processing_error(e, payload, pause: pause)
  break
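
The last hunk above distinguishes three values of `pause_timeout`: `0` skips the wait, `-1` waits on the condition variable without a timeout, and anything else waits for `pause.backoff_interval`. The standalone sketch below shows that three-way branch using Ruby's `Mutex` and `ConditionVariable`. The helper `wait_for_pause` and its return symbols are hypothetical names for this illustration, and the `rebalancing`/`shutting_down` guard from the diff is omitted.

```ruby
# Hypothetical helper mirroring the three pause_timeout cases; not Racecar's code.
def wait_for_pause(mutex, cv, pause_timeout:, backoff_interval:)
  return :no_pause if pause_timeout == 0

  mutex.synchronize do
    if pause_timeout == -1
      # No timeout argument: blocks until another thread signals the variable.
      cv.wait(mutex)
      :woken
    else
      # Returns after backoff_interval seconds, or earlier if signalled.
      cv.wait(mutex, backoff_interval)
      :timed_wait_finished
    end
  end
end

mutex = Mutex.new
cv = ConditionVariable.new

immediate = wait_for_pause(mutex, cv, pause_timeout: 0, backoff_interval: 0)
timed = wait_for_pause(mutex, cv, pause_timeout: 10, backoff_interval: 0.01)

waiter = Thread.new do
  wait_for_pause(mutex, cv, pause_timeout: -1, backoff_interval: Float::INFINITY)
end
# Keep signalling until the waiter has actually woken up and finished,
# since a signal sent before the waiter starts waiting would be lost.
mutex.synchronize { cv.signal } until waiter.join(0.01)
indefinite = waiter.value
```

The indefinite case only returns because another thread signals the condition variable, which is presumably why the diff keeps the wait inside `@sleep_mutex.synchronize` so that rebalance or shutdown handling can wake it.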
@@ -205,6 +209,21 @@ module Racecar
  reconfigure_consumer_class_instance!
  end
 
+ def reconfigure_if_producer_closed!
+ return unless @config.multithreaded_processing_enabled
+ return unless consumer_class_instance.instance_variable_get(:@producer)&.closed?
+
+ reconfigure_consumer_class_instance!
+ end
+
+ def exit_if_rebalancing!
+ return unless @config.multithreaded_processing_enabled
+ if rebalancing
+ logger.info "Exiting processing thread for #{topic}/#{partition} due to rebalancing"
+ Thread.exit
+ end
+ end
+
  def reconfigure_consumer_class_instance!
  consumer_class_instance.configure(
  producer: consumer.producer,
@@ -2,7 +2,6 @@ module Racecar
  class RebalanceListener
  def initialize(config, instrumenter, partition_processors)
  @consumer_class = config.consumer_class
- @config = config
  @instrumenter = instrumenter
  @partition_processors = partition_processors
  @rdkafka_consumer = nil
@@ -1,6 +1,7 @@
  # frozen_string_literal: true
 
  require "rdkafka"
+ require "timeout"
  require "racecar/pause"
  require "racecar/message"
  require "racecar/message_delivery_error"
@@ -168,7 +169,9 @@ module Racecar
  processors_snapshot.each do |processor|
  if processor.respond_to?(:thread)
  begin
- processor.thread.join(config.multithreaded_processing_shutdown_timeout)
+ raise Timeout::Error unless processor.thread.join(config.multithreaded_processing_shutdown_timeout)
+ rescue Timeout::Error
+ logger.error "Processor thread for #{processor.thread_key} did not finish within #{config.multithreaded_processing_shutdown_timeout} seconds. It may be stuck in a long-running process or blocked on I/O."
  rescue => e
  logger.error "Error while waiting for processor thread to finish: #{e}"
  end
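
The shutdown change above depends on `Thread#join(timeout)` returning `nil` when the thread has not finished within the limit, rather than raising. Before this change a timed-out join was silently ignored; now it is turned into a `Timeout::Error` and logged. The sketch below shows the return-value behaviour in isolation. `join_or_report` is a hypothetical helper for this illustration and is not part of Racecar.

```ruby
# Hypothetical helper illustrating Thread#join(timeout); not Racecar's code.
def join_or_report(thread, timeout)
  # Thread#join returns the thread when it finished in time, nil on timeout.
  if thread.join(timeout)
    :finished
  else
    :timed_out
  end
end

fast = Thread.new { :done }
slow = Thread.new { sleep } # sleeps until killed

fast_result = join_or_report(fast, 1)
slow_result = join_or_report(slow, 0.05)

slow.kill # clean up the stuck thread so the example can exit
slow.join
```

Note that `Timeout::Error` is a `StandardError` subclass, so in the diff the dedicated `rescue Timeout::Error` clause has to come before the generic `rescue => e`, as it does, for the more specific message to be logged.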
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Racecar
- VERSION = "3.0.0.alpha.3"
+ VERSION = "3.0.0.beta.2"
  end
data/racecar.gemspec CHANGED
@@ -24,6 +24,7 @@ Gem::Specification.new do |spec|
 
  spec.add_runtime_dependency "king_konf", "~> 1.0.0"
  spec.add_runtime_dependency "rdkafka", ">= 0.15.0"
+ spec.add_runtime_dependency "concurrent-ruby", "~> 1.3"
 
  spec.add_development_dependency "bundler", [">= 1.13", "< 3"]
  spec.add_development_dependency "pry-byebug"
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: racecar
  version: !ruby/object:Gem::Version
- version: 3.0.0.alpha.3
+ version: 3.0.0.beta.2
  platform: ruby
  authors:
  - Daniel Schierbeck
@@ -38,6 +38,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: 0.15.0
+ - !ruby/object:Gem::Dependency
+ name: concurrent-ruby
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.3'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.3'
  - !ruby/object:Gem::Dependency
  name: bundler
  requirement: !ruby/object:Gem::Requirement