cdc-parallel 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 7b66764665f18a0f84cc6b45c5138d932fb000d01be2bef02a26504e7013c5cb
- data.tar.gz: 12daffb757cae8c4e9c5330cce505cc964834ac205fc89d3febc29944772f815
+ metadata.gz: 6e16d6352e78132e2e0f488542b17117bda8733ca15b3117f8151b6acbfc3567
+ data.tar.gz: 3e28c6c37d5078696ab334f5c8b15409172bebb8102747c2e7558f0089d4ad64
  SHA512:
- metadata.gz: e7e3bd0dcd37187e7f31a9930fdea00680f898ce52c31cf33b232004bd9da4ec07cb132fc1bd71a337c7a8db99b7888618d96fc435132e16eb3038e1ddb8588f
- data.tar.gz: f9af8ecfb077ef1898d312f45169b6b0b81d3ea641af1e9f00374999cd06542bcda7561fc8878b147f61ebcd4c0f5503f88d3f332f1657be19bc45504d77de49
+ metadata.gz: 4e71ef2eeda63a9d6f6c59d49ba140b73a62d0e37894e168060a4c796f28a7d6790b37b891c7188fff78de1e843386c0f892f0720ba3de16ea9bfe198e719382
+ data.tar.gz: de5eb4cb7861e263402305e361d4ac49d335d9b657562cca93544c64f34f76646289212647506c87f4d6ef7eb5ec586cbb58596ad9427f8becb99ca1af82127c
data/CHANGELOG.md CHANGED
@@ -4,11 +4,35 @@ All notable changes to this project will be documented in this file.
 
  The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
 
- ## [Unreleased]
+ ## [0.2.0] - 2026-06-03
 
  ### Added
 
- - Placeholder for future development.
+ - Pre-warmed persistent Ractor worker pool implementation.
+ - `ProcessorPool#process_many` for batched dispatch.
+ - Tiny workload benchmark for dispatch overhead analysis.
+ - CPU-bound workload benchmark for throughput analysis.
+ - Batch workload benchmark for CDC-style event processing.
+ - Performance test suite guarded by `CDC_PARALLEL_PERFORMANCE_TESTS=1`.
+ - Reusable benchmark Docker image.
+ - `benchmark:processor_pool` Rake task.
+ - `benchmark:docker_build` Rake task.
+ - `benchmark:docker_run` Rake task.
+ - Benchmark documentation and reproducibility guidance.
+
+ ### Changed
+
+ - Processor workers are now initialized once and reused for the lifetime of the pool.
+ - Benchmark methodology updated to measure pre-warmed worker execution.
+ - README updated with benchmark execution instructions and example results.
+
+ ### Performance
+
+ Local benchmark results on Ruby 4.0.5 (4 workers) demonstrated measurable throughput improvements for CPU-bound workloads using pre-warmed worker pools compared to serial execution.
+
+ Benchmark results vary by hardware, operating system, Ruby version, and workload characteristics. Users are encouraged to reproduce results on their own systems using the included benchmark suite.
+
+
 
  ## [0.1.0] - 2026-05-31
 
@@ -28,3 +52,10 @@ The format is based on Keep a Changelog, and this project adheres to Semantic Ve
  - Added Minitest suite.
  - Added README and example.
  - Added CI and release workflows.
+
+ ## [0.1.1] - 2026-06-03
+
+ No code changes.
+
+ Improves RubyGems metadata and documentation wording to
+ explicitly identify CDC as Change Data Capture.
data/README.md CHANGED
@@ -3,7 +3,7 @@
  [![Gem Version](https://badge.fury.io/rb/cdc-parallel.svg)](https://badge.fury.io/rb/cdc-parallel)
  [![CI](https://github.com/kanutocd/cdc-parallel/workflows/CI/badge.svg)](https://github.com/kanutocd/cdc-parallel/actions)
  [![Coverage Status](https://codecov.io/gh/kanutocd/cdc-parallel/branch/main/graph/badge.svg)](https://codecov.io/gh/kanutocd/cdc-parallel)
- [![Ruby Version](https://img.shields.io/badge/ruby-%3E%3D%203.4-ruby.svg)](https://www.ruby-lang.org/en/)
+ [![Ruby Version](https://img.shields.io/badge/ruby-%3E%3D%204.0-ruby.svg)](https://www.ruby-lang.org/en/)
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
  Optional high-throughput Ractor runtime for `cdc-core`.
@@ -161,3 +161,106 @@ The default `test` task runs unit, integration, and behavior tests. Performance
  ## License
 
  MIT.
+
+
+ ## Benchmarking
+
+ `cdc-parallel` includes reproducible benchmarks that compare serial processor execution against the pre-warmed Ractor worker pool.
+
+ The benchmark focuses on three workload categories:
+
+ | Workload | Purpose                                         |
+ | -------- | ----------------------------------------------- |
+ | tiny     | Measure dispatch overhead                       |
+ | cpu      | Measure CPU-bound processing throughput         |
+ | batch    | Measure batched CDC event processing throughput |
+
+ ### Running Benchmarks
+
+ Tiny workload:
+
+ ```bash
+ BENCHMARK_WORKLOAD=tiny \
+ bundle exec rake benchmark:processor_pool
+ ```
+
+ CPU-bound workload:
+
+ ```bash
+ BENCHMARK_WORKLOAD=cpu \
+ BENCHMARK_CPU_ROUNDS=5000 \
+ bundle exec rake benchmark:processor_pool
+ ```
+
+ Batch workload:
+
+ ```bash
+ BENCHMARK_WORKLOAD=batch \
+ BENCHMARK_BATCH_SIZE=10000 \
+ bundle exec rake benchmark:processor_pool
+ ```
+
+ ### Benchmark Docker Image
+
+ Build and run the reusable Docker image:
+
+ ```bash
+ bundle exec rake benchmark:docker_build
+ bundle exec rake benchmark:docker_run
+ ```
+
+ Or run the image directly after it is published to GitHub Container Registry:
+
+ ```bash
+ docker run --rm ghcr.io/kanutocd/cdc-parallel-benchmark:main
+ ```
+
+ The benchmark image is intended to become the shared performance validation
+ pattern across CDC Ecosystem gems, enabling reproducible benchmark execution
+ locally, in CI, and across different development environments.
+
+ ### Example Result
+
+ Environment:
+
+ * Ruby 4.0.5
+ * x86_64 Linux
+ * 4 workers
+
+ CPU workload (`BENCHMARK_CPU_ROUNDS=5000`):
+
+ ```json
+ {
+   "serial": {
+     "events_per_second": 120.26
+   },
+   "parallel": {
+     "events_per_second": 250.15
+   },
+   "ratio": {
+     "parallel_to_serial": 2.08
+   }
+ }
+ ```
+
+ ### Interpretation
+
+ A ratio greater than `1.0` indicates that the pre-warmed Ractor worker pool outperformed serial execution.
+
+ ```text
+ ratio > 1.0 => parallel faster
+ ratio = 1.0 => equivalent
+ ratio < 1.0 => serial faster
+ ```
+
+ ### Reproducibility
+
+ Benchmark results vary depending on:
+
+ * CPU model
+ * Core count
+ * Operating system
+ * Ruby version
+ * Background system activity
+
+ The benchmark suite is provided so that users can reproduce and validate results on their own hardware.
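As a worked check on the Example Result in the README hunk above, the reported ratio can be recomputed from the two throughput figures. This is only a sketch: `parallel_to_serial` and `interpret` are hypothetical helper names that do not appear in the gem, and rounding to two decimals is an assumption about how the benchmark formats its output.

```ruby
# Hypothetical helpers for illustration; the gem's benchmark script may
# compute and format the ratio differently.
def parallel_to_serial(serial_eps, parallel_eps)
  (parallel_eps / serial_eps).round(2)
end

# Maps a ratio onto the three cases listed under "Interpretation".
def interpret(ratio)
  if ratio > 1.0
    "parallel faster"
  elsif ratio < 1.0
    "serial faster"
  else
    "equivalent"
  end
end

ratio = parallel_to_serial(120.26, 250.15)
puts "#{ratio} => #{interpret(ratio)}" # prints "2.08 => parallel faster"
```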
@@ -2,12 +2,12 @@
 
  module CDC
    module Parallel
-     # Executes one Ractor-safe processor in isolated Ractor workers.
+     # Executes one Ractor-safe processor in pre-warmed persistent Ractor workers.
      #
-     # This v0.1 implementation intentionally uses one-shot worker Ractors for
-     # deterministic synchronous semantics while preserving the public pool API.
-     # The parallel-pool dependency is kept as the runtime foundation for later
-     # async/throughput-focused versions.
+     # Workers are created during initialization and reused for every dispatch.
+     # This pays Ractor startup cost once, keeps workers alive after processor
+     # failures, and provides both synchronous single-item processing and batched
+     # dispatch for throughput-oriented benchmarks and runtimes.
      class ProcessorPool
        # @param processor [CDC::Core::Processor]
        # @param size [Integer]
@@ -18,47 +18,107 @@ module CDC
 
          @processor = ::Ractor.make_shareable(processor)
          @configuration = Configuration.new(size:, timeout:)
+         @workers = Array.new(@configuration.size) do
+           build_worker(@processor)
+         end.freeze
+
+         @next_worker = 0
          @shutdown = false
        end
 
-       # Process one ChangeEvent.
+       # Process one work item synchronously.
        #
-       # @param event [CDC::Core::ChangeEvent]
+       # @param item [Object]
        # @return [CDC::Core::ProcessorResult]
-       def process(event)
+       def process(item)
+         process_many([item]).fetch(0)
+       end
+
+       # Process many work items using the pre-warmed worker pool.
+       #
+       # Results are returned in the same order as the supplied work items.
+       #
+       # @param items [Array<Object>]
+       # @return [Array<CDC::Core::ProcessorResult>]
+       def process_many(items)
          raise ShutdownError, "processor pool has been shut down" if @shutdown
 
-         work = ::Ractor.make_shareable(event)
-         worker = ::Ractor.new(@processor, work) do |processor, item|
-           CDC::Parallel::ResultCollector.normalize(processor.process(item))
-         rescue StandardError => e
-           CDC::Parallel::ResultCollector.worker_failure(e)
+         work_items = items.map { |item| ::Ractor.make_shareable(item) }
+         reply_port = ::Ractor::Port.new
+
+         work_items.each_with_index do |item, index|
+           next_worker.send([index, item, reply_port])
          end
 
-         ResultCollector.normalize(take(worker))
+         collect_results(reply_port, work_items.length)
+       ensure
+         reply_port&.close
        end
 
        # Shut down the pool.
        #
        # @return [void]
        def shutdown
+         return if @shutdown
+
          @shutdown = true
+
+         @workers.each do |worker|
+           worker.send(nil)
+         rescue Ractor::ClosedError
+           # Already stopped.
+         end
        end
 
        private
 
        def validate_processor!(processor)
-         return if processor.class.respond_to?(:ractor_safe?) && processor.class.ractor_safe?
+         return if processor.class.respond_to?(:ractor_safe?) &&
+                   processor.class.ractor_safe?
 
-         raise UnsafeProcessorError, "#{processor.class} must declare ractor_safe!"
+         raise UnsafeProcessorError,
+               "#{processor.class} must declare ractor_safe!"
        end
 
-       def take(worker)
-         if worker.respond_to?(:value)
-           worker.value
-         else
-           worker.take
+       def build_worker(processor)
+         ::Ractor.new(processor) do |safe_processor|
+           loop do
+             message = ::Ractor.receive
+             break if message.nil?
+
+             index, item, reply_port = message
+
+             response = begin
+               CDC::Parallel::ResultCollector.worker_success(
+                 safe_processor.process(item)
+               )
+             rescue StandardError => e
+               CDC::Parallel::ResultCollector.worker_failure(e)
+             end
+
+             reply_port << [index, response]
+           end
+         end
+       end
+
+       def next_worker
+         worker = @workers[@next_worker]
+
+         @next_worker += 1
+         @next_worker = 0 if @next_worker >= @workers.length
+
+         worker
+       end
+
+       def collect_results(reply_port, count)
+         results = Array.new(count)
+
+         count.times do
+           index, response = reply_port.receive
+           results[index] = ResultCollector.normalize(response)
          end
+
+         results.freeze
        end
      end
    end
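The dispatch pattern in the hunk above (round-robin worker selection, index-tagged messages, a shared reply channel, and reassembly by index) can be tried without Ractors. The sketch below deliberately swaps in plain `Thread` and `Queue` objects for the Ractor workers and `Ractor::Port`, so it shows only the ordering logic, not Ractor isolation or parallel speedup. `SketchPool` and its `[:ok, value]` / `[:error, message]` payloads are hypothetical and are not part of cdc-parallel.

```ruby
# Thread-based stand-in for the dispatch pattern; NOT the gem's Ractor code.
class SketchPool
  def initialize(size, &handler)
    @handler = handler
    @inboxes = Array.new(size) { Queue.new }
    @threads = @inboxes.map do |inbox|
      Thread.new do
        # A nil message is the stop signal, mirroring the worker loop above.
        while (message = inbox.pop)
          index, item, reply = message
          response = begin
            [:ok, @handler.call(item)]
          rescue StandardError => e
            [:error, e.message] # the worker survives a failing item
          end
          reply << [index, response]
        end
      end
    end
    @cursor = 0
  end

  # Returns one response per item, in the order the items were supplied.
  def process_many(items)
    reply = Queue.new
    items.each_with_index { |item, index| next_inbox << [index, item, reply] }

    results = Array.new(items.length)
    items.length.times do
      index, response = reply.pop # replies may arrive in any order
      results[index] = response
    end
    results
  end

  def shutdown
    @inboxes.each { |inbox| inbox << nil }
    @threads.each(&:join)
  end

  private

  # Round-robin selection, wrapping back to the first worker.
  def next_inbox
    inbox = @inboxes[@cursor]
    @cursor = (@cursor + 1) % @inboxes.length
    inbox
  end
end

pool = SketchPool.new(2) { |n| Integer(n) * 2 }
p pool.process_many([1, 2, "x", 4]) # the "x" item yields an [:error, ...] entry
pool.shutdown
```

Tagging each message with its index is what lets the caller return results in input order even though workers finish at different times.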
@@ -6,6 +6,14 @@ module CDC
      class ResultCollector
        FAILURE_MARKER = :__cdc_parallel_failure__
 
+       # Build a shareable success payload that can safely cross a Ractor boundary.
+       #
+       # @param value [Object]
+       # @return [Object]
+       def self.worker_success(value)
+         ::Ractor.make_shareable(value)
+       end
+
        # Build a shareable failure payload that can safely cross a Ractor boundary.
        #
        # @param error [Exception]
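`worker_success` in the hunk above delegates to `Ractor.make_shareable`, which deep-freezes an object graph in place so it can be passed between Ractors without copying. A minimal illustration follows; the hash contents are made up for the example and do not reflect the shape of a real `CDC::Core::ProcessorResult`.

```ruby
# Illustrative payload only; the keys and values are hypothetical.
payload = { table: "orders", changes: [{ column: "status", value: +"paid" }] }

shared = Ractor.make_shareable(payload)

puts shared.equal?(payload)              # same object, frozen in place
puts Ractor.shareable?(shared)           # the whole graph is now shareable
puts shared[:changes][0][:value].frozen? # nested strings are frozen too
```

Because the freeze is applied in place, a processor that returns an object it still intends to mutate would see a `FrozenError` on the next mutation.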
@@ -3,6 +3,6 @@
  module CDC
    module Parallel
      # Current cdc-parallel version.
-     VERSION = "0.1.0"
+     VERSION = "0.2.0"
    end
  end
data/lib/cdc/parallel.rb CHANGED
@@ -1,7 +1,6 @@
  # frozen_string_literal: true
 
  require "etc"
- require "ractor-pool"
 
  require_relative "parallel/version"
  require_relative "parallel/errors"
@@ -13,7 +12,7 @@ require_relative "parallel/router"
  require_relative "parallel/runtime"
 
  module CDC
-   # Optional high-throughput Ractor runtime for cdc-core processors.
+   # Optional parallel Change Data Capture runtime for cdc-core processors.
    module Parallel
    end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: cdc-parallel
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  platform: ruby
  authors:
  - Ken C. Demanawa
@@ -39,8 +39,8 @@ dependencies:
  version: 0.4.0
  description: |
  cdc-parallel provides optional Ractor-backed parallel execution for
- cdc-core. It accelerates Change Data Capture (CDC) pipelines while
- preserving the simplicity and composability of the CDC Ecosystem.
+ cdc-core. It accelerates PostgreSQL Change Data Capture (CDC) event
+ processing while preserving the cdc-core programming model.
  email:
  - kenneth.c.demanawa@gmail.com
  executables: []
@@ -98,5 +98,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  requirements: []
  rubygems_version: 4.0.10
  specification_version: 4
- summary: Optional parallel runtime for the CDC Ecosystem.
+ summary: Optional parallel Change Data Capture (CDC) runtime for cdc-core.
  test_files: []