cdc-parallel 0.1.1 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 0f5c66df35f175cac43e88bae1fec2eba6d381d2bcb41940445863a9bb726032
-   data.tar.gz: 975b651448c205b7458b2c381e5ae93a248c64323e019db5cae73eb880130477
+   metadata.gz: 72b3b33568a37fa04b270edad8511a446703f115d78d86c2347c6af2e4be76a0
+   data.tar.gz: e044e42d90f11b6b75b946ef0f1b1725f2810056be353a56342a908c139eb91c
  SHA512:
-   metadata.gz: 247be6f8016f3d4ffda9e13dd5c61c1359d1ccf26947ce56674e755d296f329172fd86b85a6386eba08d4c48efb24b082bf3c6f03cdfbf6ce28f4103dc28984c
-   data.tar.gz: 28009df5732c2679e4e433508fe8f47dc69037c7bd3416b05d9b4d5165229c2175c42dcd0dc4fe743daacfbfd76a5b4f4b6b02d10421debaf249ab3a06577256
+   metadata.gz: 3780fa1a616b33e824cff4f6c7f0ec8402085e5a742abbaa45082d1dc8ce6c7be04bd462aa78e4afcb61f7f9a3b0adab54ed4e47f972633c12348d57e65a632f
+   data.tar.gz: 01ae9a18f4d8a6bab6c5f7e71333c38475a1930bdce6326e41bc9376100e74800cb850b22b8fda1de9638a2e78be3094a27bfd1b6592a6e868554ce933b471af
data/CHANGELOG.md CHANGED
@@ -4,11 +4,45 @@ All notable changes to this project will be documented in this file.
 
  The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
 
- ## [Unreleased]
+ ## [0.2.1] - 2026-06-03
 
  ### Added
 
- - Placeholder for future development.
+ v0.2.1 - Correctness and reliability patch
+
+ - Enforced processor timeout handling.
+ - Fixed transaction partial-failure behavior.
+ - Added regression coverage for hung processors and transaction failure cases.
+
+ ## [0.2.0] - 2026-06-03
+
+ ### Added
+
+ - Pre-warmed persistent Ractor worker pool implementation.
+ - `ProcessorPool#process_many` for batched dispatch.
+ - Tiny workload benchmark for dispatch overhead analysis.
+ - CPU-bound workload benchmark for throughput analysis.
+ - Batch workload benchmark for CDC-style event processing.
+ - Performance test suite guarded by `CDC_PARALLEL_PERFORMANCE_TESTS=1`.
+ - Reusable benchmark Docker image.
+ - `benchmark:processor_pool` Rake task.
+ - `benchmark:docker_build` Rake task.
+ - `benchmark:docker_run` Rake task.
+ - Benchmark documentation and reproducibility guidance.
+
+ ### Changed
+
+ - Processor workers are now initialized once and reused for the lifetime of the pool.
+ - Benchmark methodology updated to measure pre-warmed worker execution.
+ - README updated with benchmark execution instructions and example results.
+
+ ### Performance
+
+ Local benchmark results on Ruby 4.0.5 (4 workers) demonstrated measurable throughput improvements for CPU-bound workloads using pre-warmed worker pools compared to serial execution.
+
+ Benchmark results vary by hardware, operating system, Ruby version, and workload characteristics. Users are encouraged to reproduce results on their own systems using the included benchmark suite.
+
+
 
  ## [0.1.0] - 2026-05-31
 
data/README.md CHANGED
@@ -161,3 +161,106 @@ The default `test` task runs unit, integration, and behavior tests. Performance
  ## License
 
  MIT.
+
+
+ ## Benchmarking
+
+ `cdc-parallel` includes reproducible benchmarks that compare serial processor execution against the pre-warmed Ractor worker pool.
+
+ The benchmark focuses on three workload categories:
+
+ | Workload | Purpose                                         |
+ | -------- | ----------------------------------------------- |
+ | tiny     | Measure dispatch overhead                       |
+ | cpu      | Measure CPU-bound processing throughput         |
+ | batch    | Measure batched CDC event processing throughput |
+
+ ### Running Benchmarks
+
+ Tiny workload:
+
+ ```bash
+ BENCHMARK_WORKLOAD=tiny \
+ bundle exec rake benchmark:processor_pool
+ ```
+
+ CPU-bound workload:
+
+ ```bash
+ BENCHMARK_WORKLOAD=cpu \
+ BENCHMARK_CPU_ROUNDS=5000 \
+ bundle exec rake benchmark:processor_pool
+ ```
+
+ Batch workload:
+
+ ```bash
+ BENCHMARK_WORKLOAD=batch \
+ BENCHMARK_BATCH_SIZE=10000 \
+ bundle exec rake benchmark:processor_pool
+ ```
+
+ ### Benchmark Docker Image
+
+ Build and run the reusable Docker image:
+
+ ```bash
+ bundle exec rake benchmark:docker_build
+ bundle exec rake benchmark:docker_run
+ ```
+
+ Or run the image directly after it is published to GitHub Container Registry:
+
+ ```bash
+ docker run --rm ghcr.io/kanutocd/cdc-parallel-benchmark:main
+ ```
+
+ The benchmark image is intended to become the shared performance validation
+ pattern across CDC Ecosystem gems, enabling reproducible benchmark execution
+ locally, in CI, and across different development environments.
+
+ ### Example Result
+
+ Environment:
+
+ * Ruby 4.0.5
+ * x86_64 Linux
+ * 4 workers
+
+ CPU workload (`BENCHMARK_CPU_ROUNDS=5000`):
+
+ ```json
+ {
+   "serial": {
+     "events_per_second": 120.26
+   },
+   "parallel": {
+     "events_per_second": 250.15
+   },
+   "ratio": {
+     "parallel_to_serial": 2.08
+   }
+ }
+ ```
+
+ ### Interpretation
+
+ A ratio greater than `1.0` indicates that the pre-warmed Ractor worker pool outperformed serial execution.
+
+ ```text
+ ratio > 1.0 => parallel faster
+ ratio = 1.0 => equivalent
+ ratio < 1.0 => serial faster
+ ```
+
+ ### Reproducibility
+
+ Benchmark results vary depending on:
+
+ * CPU model
+ * Core count
+ * Operating system
+ * Ruby version
+ * Background system activity
+
+ The benchmark suite is provided so that users can reproduce and validate results on their own hardware.
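Reviewer sketch for the interpretation rule added to the README above. It assumes the reported `parallel_to_serial` value is the parallel `events_per_second` divided by the serial `events_per_second`, rounded to two decimals; the benchmark script itself is not part of this diff, and `interpret_ratio` is a hypothetical helper, not something the gem ships.

```ruby
# Hypothetical helper (not part of cdc-parallel): classify a benchmark run.
# Assumption: ratio = parallel events/sec divided by serial events/sec.
def interpret_ratio(serial_eps, parallel_eps)
  ratio = (parallel_eps / serial_eps).round(2)

  verdict =
    if ratio > 1.0
      "parallel faster"
    elsif ratio < 1.0
      "serial faster"
    else
      "equivalent"
    end

  [ratio, verdict]
end

# Figures taken from the README's example result.
ratio, verdict = interpret_ratio(120.26, 250.15)
puts "#{ratio} => #{verdict}" # prints "2.08 => parallel faster"
```

Under that assumption the example figures reproduce the `2.08` shown in the README's JSON output.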
@@ -12,6 +12,8 @@ module CDC
        def initialize(size: Etc.nprocessors, timeout: nil)
          raise ArgumentError, "size must be an Integer" unless size.is_a?(Integer)
          raise ArgumentError, "size must be greater than zero" unless size.positive?
+         raise ArgumentError, "timeout must be numeric" unless timeout.nil? || timeout.is_a?(Numeric)
+         raise ArgumentError, "timeout must be greater than zero" if timeout && !timeout.positive?
 
          super
          ::Ractor.make_shareable(self)
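Reviewer sketch of what the two new guards accept and reject. `validate_timeout!` is a hypothetical stand-alone copy of the added checks, written only so the behaviour can be exercised outside the gem; the real checks run inside the configuration initializer shown in the hunk above.

```ruby
# Hypothetical stand-alone copy of the timeout guards added in this hunk.
def validate_timeout!(timeout)
  raise ArgumentError, "timeout must be numeric" unless timeout.nil? || timeout.is_a?(Numeric)
  raise ArgumentError, "timeout must be greater than zero" if timeout && !timeout.positive?

  timeout
end

# nil disables the timeout; positive numbers pass; zero, negatives, and
# non-numeric values are rejected.
[nil, 0.5, 30, 0, -1, "5"].each do |candidate|
  validate_timeout!(candidate)
  puts "#{candidate.inspect}: accepted"
rescue ArgumentError => e
  puts "#{candidate.inspect}: rejected (#{e.message})"
end
```

Note that a numeric string such as `"5"` fails the first guard rather than being coerced, so callers reading the timeout from an environment variable would need to convert it first.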
@@ -2,13 +2,13 @@
 
  module CDC
    module Parallel
-     # Executes one Ractor-safe processor in isolated Ractor workers.
+     # Executes one Ractor-safe processor in pre-warmed persistent Ractor workers.
      #
-     # This v0.1 implementation intentionally uses one-shot worker Ractors for
-     # deterministic synchronous semantics while preserving the public pool API.
-     # The parallel-pool dependency is kept as the runtime foundation for later
-     # async/throughput-focused versions.
-     class ProcessorPool
+     # Workers are created during initialization and reused for every dispatch.
+     # This pays Ractor startup cost once, keeps workers alive after processor
+     # failures, and provides both synchronous single-item processing and batched
+     # dispatch for throughput-oriented benchmarks and runtimes.
+     class ProcessorPool # rubocop:disable Metrics/ClassLength
        # @param processor [CDC::Core::Processor]
        # @param size [Integer]
        # @param timeout [Float, nil]
@@ -18,47 +18,147 @@ module CDC
 
          @processor = ::Ractor.make_shareable(processor)
          @configuration = Configuration.new(size:, timeout:)
+         @workers = Array.new(@configuration.size) do
+           build_worker(@processor)
+         end.freeze
+
+         @next_worker = 0
          @shutdown = false
        end
 
-       # Process one ChangeEvent.
+       # Process one work item synchronously.
        #
-       # @param event [CDC::Core::ChangeEvent]
+       # @param item [Object]
        # @return [CDC::Core::ProcessorResult]
-       def process(event)
+       def process(item)
+         process_many([item]).fetch(0)
+       end
+
+       # Process many work items using the pre-warmed worker pool.
+       #
+       # Results are returned in the same order as the supplied work items.
+       #
+       # @param items [Array<Object>]
+       # @return [Array<CDC::Core::ProcessorResult>]
+       def process_many(items)
          raise ShutdownError, "processor pool has been shut down" if @shutdown
 
-         work = ::Ractor.make_shareable(event)
-         worker = ::Ractor.new(@processor, work) do |processor, item|
-           CDC::Parallel::ResultCollector.normalize(processor.process(item))
-         rescue StandardError => e
-           CDC::Parallel::ResultCollector.worker_failure(e)
+         work_items = items.map { |item| ::Ractor.make_shareable(item) }
+         reply_port = ::Ractor::Port.new
+
+         work_items.each_with_index do |item, index|
+           next_worker.send([index, item, reply_port])
          end
 
-         ResultCollector.normalize(take(worker))
+         collect_results(reply_port, work_items.length)
+       ensure
+         reply_port&.close
        end
 
        # Shut down the pool.
        #
        # @return [void]
        def shutdown
+         return if @shutdown
+
          @shutdown = true
+
+         @workers.each do |worker|
+           worker.send(nil)
+         rescue Ractor::ClosedError
+           # Already stopped.
+         end
        end
 
        private
 
        def validate_processor!(processor)
-         return if processor.class.respond_to?(:ractor_safe?) && processor.class.ractor_safe?
+         return if processor.class.respond_to?(:ractor_safe?) &&
+                   processor.class.ractor_safe?
 
-         raise UnsafeProcessorError, "#{processor.class} must declare ractor_safe!"
+         raise UnsafeProcessorError,
+               "#{processor.class} must declare ractor_safe!"
        end
 
-       def take(worker)
-         if worker.respond_to?(:value)
-           worker.value
+       def build_worker(processor) # rubocop:disable Metrics/MethodLength
+         ::Ractor.new(processor) do |safe_processor|
+           loop do
+             message = ::Ractor.receive
+             break if message.nil?
+
+             index, item, reply_port = message
+
+             response = begin
+               CDC::Parallel::ResultCollector.worker_success(
+                 safe_processor.process(item)
+               )
+             rescue StandardError => e
+               CDC::Parallel::ResultCollector.worker_failure(e)
+             end
+
+             begin
+               reply_port << [index, response]
+             rescue Ractor::ClosedError
+               # The caller may have timed out and closed the reply port.
+             end
+           end
+         end
+       end
+
+       def next_worker
+         worker = @workers[@next_worker]
+
+         @next_worker += 1
+         @next_worker = 0 if @next_worker >= @workers.length
+
+         worker
+       end
+
+       def collect_results(reply_port, count)
+         results = Array.new(count)
+         return results.freeze if count.zero?
+
+         if @configuration.timeout
+           collect_results_with_timeout(reply_port, results)
          else
-           worker.take
+           collect_results_without_timeout(reply_port, results)
+         end
+       end
+
+       def collect_results_without_timeout(reply_port, results)
+         results.length.times do
+           index, response = reply_port.receive
+           results[index] = ResultCollector.normalize(response)
+         end
+
+         results.freeze
+       end
+
+       def collect_results_with_timeout(reply_port, results)
+         deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @configuration.timeout
+
+         results.length.times do
+           remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
+           return timeout_results(results) unless remaining.positive?
+
+           index, response = ::Timeout.timeout(remaining, TimeoutError) { reply_port.receive }
+           results[index] = ResultCollector.normalize(response)
+         rescue TimeoutError
+           return timeout_results(results)
          end
+
+         results.freeze
+       end
+
+       def timeout_results(results)
+         missing = results.count(&:nil?)
+         timeout_error = TimeoutError.new(
+           "processor pool timed out after #{@configuration.timeout} seconds waiting for #{missing} result(s)"
+         )
+
+         results.map do |result|
+           result || CDC::Core::ProcessorResult.failure(timeout_error)
+         end.freeze
        end
      end
    end
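Reviewer sketch of the deadline-based collection introduced in `collect_results_with_timeout`. To stay runnable on Ruby versions without `Ractor::Port`, it swaps the reply port for a plain thread-safe `Queue`; `collect_with_deadline` and the `:timed_out` placeholder are hypothetical names, and the gem itself fills missing slots with a `ProcessorResult` failure instead. The point it illustrates is that a single monotonic deadline covers the whole batch, replies may arrive in any order, and each reply carries its index so results end up in input order.

```ruby
require "timeout"

# Hypothetical analogue of the pool's timed collection, using a Queue in
# place of Ractor::Port. Each reply is an [index, value] pair.
def collect_with_deadline(replies, count, timeout)
  results = Array.new(count)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  count.times do
    # Budget left for the whole batch, not per item.
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    break unless remaining.positive?

    index, value = Timeout.timeout(remaining) { replies.pop }
    results[index] = value
  rescue Timeout::Error
    break
  end

  # Slots that never received a reply are marked instead of left as nil.
  results.map { |result| result.nil? ? :timed_out : result }
end

replies = Queue.new
replies << [1, "second"] # replies can arrive out of order
replies << [0, "first"]
# The third reply never arrives, so its slot gets the placeholder.
p collect_with_deadline(replies, 3, 0.1) # prints ["first", "second", :timed_out]
```

As in the hunk above, a slot is treated as missing when it is still `nil`, so this sketch cannot distinguish a timed-out item from one whose value is genuinely `nil`; the pool avoids that ambiguity only as long as normalized results are never `nil`.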
@@ -6,6 +6,14 @@ module CDC
      class ResultCollector
        FAILURE_MARKER = :__cdc_parallel_failure__
 
+       # Build a shareable success payload that can safely cross a Ractor boundary.
+       #
+       # @param value [Object]
+       # @return [Object]
+       def self.worker_success(value)
+         ::Ractor.make_shareable(value)
+       end
+
        # Build a shareable failure payload that can safely cross a Ractor boundary.
        #
        # @param error [Exception]
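Reviewer sketch of what `Ractor.make_shareable`, the call `worker_success` wraps, does to a value. The `payload` hash is made-up sample data; only `Ractor.make_shareable` and `Ractor.shareable?` are real Ruby APIs here.

```ruby
# Made-up sample payload standing in for a processor's return value.
payload = { status: "ok", rows: [{ id: 1, name: "alice" }] }

shared = Ractor.make_shareable(payload)

puts Ractor.shareable?(shared)           # true
puts shared.equal?(payload)              # true: frozen in place, not copied
puts shared[:rows].first[:name].frozen?  # true: nested objects are frozen too

begin
  shared[:status] = "changed"
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```

Because the default is to freeze in place rather than copy, whatever object a processor returns is deeply frozen inside the worker before it is sent back to the caller.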
@@ -3,6 +3,6 @@
  module CDC
    module Parallel
      # Current cdc-parallel version.
-     VERSION = "0.1.1"
+     VERSION = "0.2.1"
    end
  end
data/lib/cdc/parallel.rb CHANGED
@@ -1,7 +1,7 @@
  # frozen_string_literal: true
 
  require "etc"
- require "ractor-pool"
+ require "timeout"
 
  require_relative "parallel/version"
  require_relative "parallel/errors"
@@ -13,7 +13,7 @@ require_relative "parallel/router"
  require_relative "parallel/runtime"
 
  module CDC
-   # Optional high-throughput Ractor runtime for cdc-core processors.
+   # Optional parallel Change Data Capture runtime for cdc-core processors.
    module Parallel
    end
  end
@@ -1,16 +1,15 @@
  module CDC
    module Parallel
-     # Executes one Ractor-safe processor in isolated Ractor workers.
-     #
-     # This v0.1 implementation intentionally uses one-shot worker Ractors for
-     # deterministic synchronous semantics while preserving the public pool API.
-     # The parallel-pool dependency is kept as the runtime foundation for later
-     # async/throughput-focused versions.
+     # Executes one Ractor-safe processor in pre-warmed persistent Ractor workers.
      class ProcessorPool
        @processor: untyped
 
        @configuration: untyped
 
+       @workers: untyped
+
+       @next_worker: Integer
+
        @shutdown: untyped
 
        # @param processor [CDC::Core::Processor]
@@ -19,11 +18,17 @@ module CDC
        # @return [void]
        def initialize: (processor: untyped, ?size: untyped, ?timeout: untyped?) -> void
 
-       # Process one ChangeEvent.
+       # Process one work item synchronously.
        #
-       # @param event [CDC::Core::ChangeEvent]
+       # @param item [Object]
        # @return [CDC::Core::ProcessorResult]
-       def process: (untyped event) -> untyped
+       def process: (untyped item) -> untyped
+
+       # Process many work items using the pre-warmed worker pool.
+       #
+       # @param items [Array<Object>]
+       # @return [Array<CDC::Core::ProcessorResult>]
+       def process_many: (untyped items) -> untyped
 
        # Shut down the pool.
        #
@@ -34,7 +39,17 @@ module CDC
 
        def validate_processor!: (untyped processor) -> (nil | untyped)
 
-       def take: (untyped worker) -> untyped
+       def build_worker: (untyped processor) -> untyped
+
+       def next_worker: () -> untyped
+
+       def collect_results: (untyped reply_port, Integer count) -> untyped
+
+       def collect_results_without_timeout: (untyped reply_port, untyped results) -> untyped
+
+       def collect_results_with_timeout: (untyped reply_port, untyped results) -> untyped
+
+       def timeout_results: (untyped results) -> untyped
      end
    end
  end
@@ -4,6 +4,12 @@ module CDC
      class ResultCollector
        FAILURE_MARKER: :__cdc_parallel_failure__
 
+       # Build a shareable success payload that can safely cross a Ractor boundary.
+       #
+       # @param value [Object]
+       # @return [Object]
+       def self.worker_success: (untyped value) -> untyped
+
        # Build a shareable failure payload that can safely cross a Ractor boundary.
        #
        # @param error [Exception]
@@ -1,6 +1,6 @@
  module CDC
    module Parallel
      # Current cdc-parallel version.
-     VERSION: "0.1.0"
+     VERSION: "0.2.0"
    end
  end
@@ -0,0 +1,3 @@
+ module Timeout
+   def self.timeout: (untyped sec, untyped klass) { () -> untyped } -> untyped
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: cdc-parallel
  version: !ruby/object:Gem::Version
-   version: 0.1.1
+   version: 0.2.1
  platform: ruby
  authors:
  - Ken C. Demanawa
@@ -73,6 +73,7 @@ files:
  - sig/shims/cdc_core.rbs
  - sig/shims/data_define.rbs
  - sig/shims/etc.rbs
+ - sig/shims/timeout.rbs
  homepage: https://kanutocd.github.io/cdc-parallel/
  licenses:
  - MIT