cdc-parallel 0.1.0 → 0.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +33 -2
- data/README.md +104 -1
- data/lib/cdc/parallel/processor_pool.rb +81 -21
- data/lib/cdc/parallel/result_collector.rb +8 -0
- data/lib/cdc/parallel/version.rb +1 -1
- data/lib/cdc/parallel.rb +1 -2
- metadata +4 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6e16d6352e78132e2e0f488542b17117bda8733ca15b3117f8151b6acbfc3567
+  data.tar.gz: 3e28c6c37d5078696ab334f5c8b15409172bebb8102747c2e7558f0089d4ad64
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4e71ef2eeda63a9d6f6c59d49ba140b73a62d0e37894e168060a4c796f28a7d6790b37b891c7188fff78de1e843386c0f892f0720ba3de16ea9bfe198e719382
+  data.tar.gz: de5eb4cb7861e263402305e361d4ac49d335d9b657562cca93544c64f34f76646289212647506c87f4d6ef7eb5ec586cbb58596ad9427f8becb99ca1af82127c
data/CHANGELOG.md
CHANGED
@@ -4,11 +4,35 @@ All notable changes to this project will be documented in this file.
 
 The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
 
-## [
+## [0.2.0] - 2026-06-03
 
 ### Added
 
--
+- Pre-warmed persistent Ractor worker pool implementation.
+- `ProcessorPool#process_many` for batched dispatch.
+- Tiny workload benchmark for dispatch overhead analysis.
+- CPU-bound workload benchmark for throughput analysis.
+- Batch workload benchmark for CDC-style event processing.
+- Performance test suite guarded by `CDC_PARALLEL_PERFORMANCE_TESTS=1`.
+- Reusable benchmark Docker image.
+- `benchmark:processor_pool` Rake task.
+- `benchmark:docker_build` Rake task.
+- `benchmark:docker_run` Rake task.
+- Benchmark documentation and reproducibility guidance.
+
+### Changed
+
+- Processor workers are now initialized once and reused for the lifetime of the pool.
+- Benchmark methodology updated to measure pre-warmed worker execution.
+- README updated with benchmark execution instructions and example results.
+
+### Performance
+
+Local benchmark results on Ruby 4.0.5 (4 workers) demonstrated measurable throughput improvements for CPU-bound workloads using pre-warmed worker pools compared to serial execution.
+
+Benchmark results vary by hardware, operating system, Ruby version, and workload characteristics. Users are encouraged to reproduce results on their own systems using the included benchmark suite.
+
+
 
 ## [0.1.0] - 2026-05-31
 
@@ -28,3 +52,10 @@ The format is based on Keep a Changelog, and this project adheres to Semantic Ve
 - Added Minitest suite.
 - Added README and example.
 - Added CI and release workflows.
+
+## [0.1.1] - 2026-06-03
+
+No code changes.
+
+Improves RubyGems metadata and documentation wording to
+explicitly identify CDC as Change Data Capture.
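The changelog mentions a performance suite guarded by `CDC_PARALLEL_PERFORMANCE_TESTS=1`, but the diff does not show how that guard is wired. The following is only a sketch of one common way to gate slow tests on an environment variable. Both helper names are hypothetical and are not part of the gem.

```ruby
# Hypothetical helper: true only when the variable is exactly "1".
# `env` defaults to ENV and also accepts a plain Hash, which keeps it testable.
def performance_tests_enabled?(env = ENV)
  env.fetch("CDC_PARALLEL_PERFORMANCE_TESTS", "0") == "1"
end

# Hypothetical entry point: a slow check bails out early when the guard is off.
def run_performance_check(env = ENV)
  return :skipped unless performance_tests_enabled?(env)

  :ran
end
```

With a guard of this shape, the default `test` task stays fast and the slow suite runs only when the variable is set explicitly.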
data/README.md
CHANGED
@@ -3,7 +3,7 @@
 [](https://badge.fury.io/rb/cdc-parallel)
 [](https://github.com/kanutocd/cdc-parallel/actions)
 [](https://codecov.io/gh/kanutocd/cdc-parallel)
-[](https://www.ruby-lang.org/en/)
 [](https://opensource.org/licenses/MIT)
 
 Optional high-throughput Ractor runtime for `cdc-core`.
@@ -161,3 +161,106 @@ The default `test` task runs unit, integration, and behavior tests. Performance
 ## License
 
 MIT.
+
+
+## Benchmarking
+
+`cdc-parallel` includes reproducible benchmarks that compare serial processor execution against the pre-warmed Ractor worker pool.
+
+The benchmark focuses on three workload categories:
+
+| Workload | Purpose                                         |
+| -------- | ----------------------------------------------- |
+| tiny     | Measure dispatch overhead                       |
+| cpu      | Measure CPU-bound processing throughput         |
+| batch    | Measure batched CDC event processing throughput |
+
+### Running Benchmarks
+
+Tiny workload:
+
+```bash
+BENCHMARK_WORKLOAD=tiny \
+bundle exec rake benchmark:processor_pool
+```
+
+CPU-bound workload:
+
+```bash
+BENCHMARK_WORKLOAD=cpu \
+BENCHMARK_CPU_ROUNDS=5000 \
+bundle exec rake benchmark:processor_pool
+```
+
+Batch workload:
+
+```bash
+BENCHMARK_WORKLOAD=batch \
+BENCHMARK_BATCH_SIZE=10000 \
+bundle exec rake benchmark:processor_pool
+```
+
+### Benchmark Docker Image
+
+Build and run the reusable Docker image:
+
+```bash
+bundle exec rake benchmark:docker_build
+bundle exec rake benchmark:docker_run
+```
+
+Or run the image directly after it is published to GitHub Container Registry:
+
+```bash
+docker run --rm ghcr.io/kanutocd/cdc-parallel-benchmark:main
+```
+
+The benchmark image is intended to become the shared performance validation
+pattern across CDC Ecosystem gems, enabling reproducible benchmark execution
+locally, in CI, and across different development environments.
+
+### Example Result
+
+Environment:
+
+* Ruby 4.0.5
+* x86_64 Linux
+* 4 workers
+
+CPU workload (`BENCHMARK_CPU_ROUNDS=5000`):
+
+```json
+{
+  "serial": {
+    "events_per_second": 120.26
+  },
+  "parallel": {
+    "events_per_second": 250.15
+  },
+  "ratio": {
+    "parallel_to_serial": 2.08
+  }
+}
+```
+
+### Interpretation
+
+A ratio greater than `1.0` indicates that the pre-warmed Ractor worker pool outperformed serial execution.
+
+```text
+ratio > 1.0 => parallel faster
+ratio = 1.0 => equivalent
+ratio < 1.0 => serial faster
+```
+
+### Reproducibility
+
+Benchmark results vary depending on:
+
+* CPU model
+* Core count
+* Operating system
+* Ruby version
+* Background system activity
+
+The benchmark suite is provided so that users can reproduce and validate results on their own hardware.
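The README's example result and interpretation rules reduce to a small amount of arithmetic. The sketch below shows how an events-per-second figure and the `parallel_to_serial` ratio could be derived and read. The gem's own benchmark script is not part of this diff, so all three helper names are hypothetical; only the two throughput figures come from the README.

```ruby
# Hypothetical helpers; the gem's benchmark script is not shown in this diff.

# Events per second for a block that handles `count` events, timed with a
# monotonic clock so wall-clock adjustments do not skew the measurement.
def events_per_second(count)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  count / elapsed
end

# Parallel throughput divided by serial throughput, rounded to two decimals.
def parallel_to_serial(parallel_eps, serial_eps)
  (parallel_eps / serial_eps).round(2)
end

# Mirrors the three interpretation rules from the README.
def interpret(ratio)
  if ratio > 1.0
    "parallel faster"
  elsif ratio < 1.0
    "serial faster"
  else
    "equivalent"
  end
end

ratio = parallel_to_serial(250.15, 120.26) # figures from the README example
puts "#{ratio} => #{interpret(ratio)}"     # prints "2.08 => parallel faster"
```

On a different machine the inputs will differ, which is the point of the reproducibility note: the ratio is only meaningful for the hardware and Ruby version that produced it.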
data/lib/cdc/parallel/processor_pool.rb
CHANGED
@@ -2,12 +2,12 @@
 
 module CDC
   module Parallel
-    # Executes one Ractor-safe processor in
+    # Executes one Ractor-safe processor in pre-warmed persistent Ractor workers.
     #
-    #
-    #
-    #
-    #
+    # Workers are created during initialization and reused for every dispatch.
+    # This pays Ractor startup cost once, keeps workers alive after processor
+    # failures, and provides both synchronous single-item processing and batched
+    # dispatch for throughput-oriented benchmarks and runtimes.
     class ProcessorPool
       # @param processor [CDC::Core::Processor]
       # @param size [Integer]
@@ -18,47 +18,107 @@ module CDC
 
         @processor = ::Ractor.make_shareable(processor)
         @configuration = Configuration.new(size:, timeout:)
+        @workers = Array.new(@configuration.size) do
+          build_worker(@processor)
+        end.freeze
+
+        @next_worker = 0
         @shutdown = false
       end
 
-      # Process one
+      # Process one work item synchronously.
       #
-      # @param
+      # @param item [Object]
       # @return [CDC::Core::ProcessorResult]
-      def process(
+      def process(item)
+        process_many([item]).fetch(0)
+      end
+
+      # Process many work items using the pre-warmed worker pool.
+      #
+      # Results are returned in the same order as the supplied work items.
+      #
+      # @param items [Array<Object>]
+      # @return [Array<CDC::Core::ProcessorResult>]
+      def process_many(items)
         raise ShutdownError, "processor pool has been shut down" if @shutdown
 
-
-
-
-
-
+        work_items = items.map { |item| ::Ractor.make_shareable(item) }
+        reply_port = ::Ractor::Port.new
+
+        work_items.each_with_index do |item, index|
+          next_worker.send([index, item, reply_port])
         end
 
-
+        collect_results(reply_port, work_items.length)
+      ensure
+        reply_port&.close
       end
 
       # Shut down the pool.
       #
       # @return [void]
       def shutdown
+        return if @shutdown
+
         @shutdown = true
+
+        @workers.each do |worker|
+          worker.send(nil)
+        rescue Ractor::ClosedError
+          # Already stopped.
+        end
       end
 
       private
 
       def validate_processor!(processor)
-        return if processor.class.respond_to?(:ractor_safe?) &&
+        return if processor.class.respond_to?(:ractor_safe?) &&
+                  processor.class.ractor_safe?
 
-        raise UnsafeProcessorError,
+        raise UnsafeProcessorError,
+              "#{processor.class} must declare ractor_safe!"
       end
 
-      def
-
-
-
-
+      def build_worker(processor)
+        ::Ractor.new(processor) do |safe_processor|
+          loop do
+            message = ::Ractor.receive
+            break if message.nil?
+
+            index, item, reply_port = message
+
+            response = begin
+              CDC::Parallel::ResultCollector.worker_success(
+                safe_processor.process(item)
+              )
+            rescue StandardError => e
+              CDC::Parallel::ResultCollector.worker_failure(e)
+            end
+
+            reply_port << [index, response]
+          end
+        end
+      end
+
+      def next_worker
+        worker = @workers[@next_worker]
+
+        @next_worker += 1
+        @next_worker = 0 if @next_worker >= @workers.length
+
+        worker
+      end
+
+      def collect_results(reply_port, count)
+        results = Array.new(count)
+
+        count.times do
+          index, response = reply_port.receive
+          results[index] = ResultCollector.normalize(response)
         end
+
+        results.freeze
       end
     end
   end
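The new `ProcessorPool` combines four ideas: workers started once up front, round-robin dispatch, replies tagged with the item's index so results can be reordered, and a `nil` message as the stop signal. The sketch below reproduces that shape with `Thread` and `Queue` standing in for `Ractor` and `Ractor::Port`, so it runs on any recent Ruby. It says nothing about Ractor parallelism, and `ThreadPoolSketch` is a hypothetical class, not part of the gem.

```ruby
# Thread-based stand-in for the Ractor pool shown in the diff.
# ThreadPoolSketch and its internals are hypothetical illustration only.
class ThreadPoolSketch
  def initialize(size:, &handler)
    @inboxes = Array.new(size) { Queue.new }
    # Workers start once and block on their inbox until work arrives.
    @threads = @inboxes.map do |inbox|
      Thread.new do
        while (message = inbox.pop) # nil is the shutdown signal
          index, item, reply = message
          response = begin
            [:ok, handler.call(item)]
          rescue StandardError => e
            [:error, e.message] # a failing item does not stop the worker
          end
          reply << [index, response]
        end
      end
    end
    @next = 0
  end

  # Dispatch round-robin, then place each reply at its original index.
  def process_many(items)
    reply = Queue.new
    items.each_with_index do |item, index|
      @inboxes[@next] << [index, item, reply]
      @next = (@next + 1) % @inboxes.length
    end

    results = Array.new(items.length)
    items.length.times do
      index, response = reply.pop
      results[index] = response
    end
    results
  end

  def shutdown
    @inboxes.each { |inbox| inbox << nil }
    @threads.each(&:join)
  end
end

pool = ThreadPoolSketch.new(size: 2) { |value| Integer(value) * 2 }
results = pool.process_many([1, "x", 3])
pool.shutdown
```

Because every reply carries its index, the order in which workers finish does not matter: the second item fails, the first and third succeed, and all three land in their original positions.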
data/lib/cdc/parallel/result_collector.rb
CHANGED
@@ -6,6 +6,14 @@ module CDC
     class ResultCollector
       FAILURE_MARKER = :__cdc_parallel_failure__
 
+      # Build a shareable success payload that can safely cross a Ractor boundary.
+      #
+      # @param value [Object]
+      # @return [Object]
+      def self.worker_success(value)
+        ::Ractor.make_shareable(value)
+      end
+
       # Build a shareable failure payload that can safely cross a Ractor boundary.
       #
       # @param error [Exception]
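`worker_success` relies on `Ractor.make_shareable`, which deep-freezes an object in place so that it can be passed between Ractors without copying. A minimal sketch of that effect, using a made-up payload (the `event` hash is hypothetical and is not a `cdc-core` type):

```ruby
# Hypothetical payload; real cdc-core results are not shown in this diff.
event = { table: "orders", tags: ["insert", "v1"] }

# make_shareable freezes the hash and everything reachable from it,
# and returns the same object rather than a copy.
shared = Ractor.make_shareable(event)

puts shared.equal?(event)      # same object
puts shared[:tags].frozen?     # nested array is frozen too
puts Ractor.shareable?(shared)

begin
  shared[:tags] << "late"
rescue FrozenError => e
  puts "mutation rejected: #{e.class}"
end
```

One consequence worth keeping in mind is that the caller's original object is frozen as well, so a processor's return value cannot be mutated after it has been handed to the collector.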
data/lib/cdc/parallel/version.rb
CHANGED
data/lib/cdc/parallel.rb
CHANGED
@@ -1,7 +1,6 @@
 # frozen_string_literal: true
 
 require "etc"
-require "ractor-pool"
 
 require_relative "parallel/version"
 require_relative "parallel/errors"
@@ -13,7 +12,7 @@ require_relative "parallel/router"
 require_relative "parallel/runtime"
 
 module CDC
-  # Optional
+  # Optional parallel Change Data Capture runtime for cdc-core processors.
   module Parallel
   end
 end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: cdc-parallel
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.2.0
 platform: ruby
 authors:
 - Ken C. Demanawa
@@ -39,8 +39,8 @@ dependencies:
         version: 0.4.0
 description: |
   cdc-parallel provides optional Ractor-backed parallel execution for
-  cdc-core. It accelerates Change Data Capture (CDC)
-  preserving the
+  cdc-core. It accelerates PostgreSQL Change Data Capture (CDC) event
+  processing while preserving the cdc-core programming model.
 email:
 - kenneth.c.demanawa@gmail.com
 executables: []
@@ -98,5 +98,5 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 rubygems_version: 4.0.10
 specification_version: 4
-summary: Optional parallel
+summary: Optional parallel Change Data Capture (CDC) runtime for cdc-core.
 test_files: []