cdc-parallel 0.2.2 → 0.2.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +48 -21
- data/README.md +54 -48
- data/lib/cdc/parallel/configuration.rb +28 -5
- data/lib/cdc/parallel/errors.rb +59 -5
- data/lib/cdc/parallel/processor_pool.rb +198 -44
- data/lib/cdc/parallel/result_collector.rb +43 -2
- data/lib/cdc/parallel/router.rb +26 -1
- data/lib/cdc/parallel/runtime.rb +65 -4
- data/lib/cdc/parallel/transaction_pool.rb +54 -3
- data/lib/cdc/parallel/version.rb +6 -1
- data/lib/cdc/parallel.rb +33 -1
- data/sig/cdc/parallel/configuration.rbs +8 -2
- data/sig/cdc/parallel/errors.rbs +7 -7
- data/sig/cdc/parallel/processor_pool.rbs +33 -16
- data/sig/cdc/parallel/result_collector.rbs +6 -4
- data/sig/cdc/parallel/router.rbs +7 -4
- data/sig/cdc/parallel/runtime.rbs +11 -8
- data/sig/cdc/parallel/transaction_pool.rbs +4 -4
- data/sig/cdc/parallel/version.rbs +1 -1
- metadata +5 -23
- data/sig/shims/cdc_core.rbs +0 -14
- data/sig/shims/data_define.rbs +0 -0
- data/sig/shims/etc.rbs +0 -3
- data/sig/shims/timeout.rbs +0 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 727373877c3c90e65e65ec4bbb5f5312b7e0d15d879af54402fa6752ac730aa0
+data.tar.gz: cd2701bf5e93f5ef825f4b2e95a015dfbbb214bb14cb3f095a65d499aeaf9f28
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 33e6152dfaf7fdeda19853229519f2154b6ec39cb87b3c074dea0658e94eb65156d11d8e7ce11d0c3560dae678fe196dc1cc99a4f6fb4c6d712f8b84be7ad274
+data.tar.gz: 9999fdf8a4e05694f506b1a4f6e6db9679bc0da09d47b8c67513959d8467ead0fce0b26245b158c3101c8467110ed29af2aefeb4bc13c27d95ac76e38920e449
|
data/CHANGELOG.md
CHANGED
@@ -4,30 +4,59 @@ All notable changes to this project will be documented in this file.

 The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

-
+## Unreleased

-
+No unreleased changes.

-
-- Updated transaction processing so partial event failures fail the transaction result while preserving per-event results.
-- Added CI validation for RBS signatures.
+## [0.2.3] - 2026-06-03

-
+### Added
+
+- Added Port-native `Ractor::Port` worker inbox dispatch for the pre-warmed processor pool.
+- Added concurrent threaded caller regression coverage for `ProcessorPool#process_many`.
+- Added worker inbox boot verification coverage.
+- Added multi-trial processor-pool benchmark reporting with min, median, max, and p95 distributions.
+- Added minimum measurement duration support for benchmark trials.
+- Added worker-count sweep support through `BENCHMARK_WORKER_COUNTS`.
+- Added benchmark comparison across serial execution, repeated `ProcessorPool#process`, and batched `ProcessorPool#process_many`.
+- Added benchmark environment metadata for Ruby, platform, host, CPU count, and uname details.
+- Added detailed benchmark methodology and report documentation under `benchmark/README.md`.
+
+### Changed
+
+- Updated worker dispatch to send work through worker-owned inbox ports instead of direct worker messages.
+- Synchronized dispatch and shutdown with a mutex so multiple Ruby threads can submit work safely.
+- Updated processor pool RBS signatures for worker inboxes and Port-native dispatch helpers.
+- Expanded README documentation for the worker dispatch model.
+- Updated README benchmark guidance to point to the detailed benchmark report documentation.
+- Updated benchmark ratio reporting to compare median throughput against serial execution.
+
+## [0.2.2] - 2026-06-03

-
-
-
-
+### Changed
+
+- Improved processor pool shutdown so workers are signaled and confirmed stopped where practical.
+- Updated transaction processing so partial event failures fail the transaction result while preserving per-event results.
+- Added CI validation for RBS signatures.
+
+### Added
+
+- Added regression coverage for shutdown after processed and pending work.
+- Added regression coverage for timeout-bounded shutdown behavior.
+- Added regression coverage for `process_many([])` returning a clean empty result.
+- Added transaction pool coverage for successful and partially failed transactions.

 ## [0.2.1] - 2026-06-03

 ### Added

-
+- Enforced processor timeout handling.
+- Fixed transaction partial-failure behavior.
+- Added regression coverage for hung processors and transaction failure cases.

-
-
-
+### Changed
+
+- Released a correctness and reliability patch.

 ## [0.2.0] - 2026-06-03

@@ -57,7 +86,12 @@ Local benchmark results on Ruby 4.0.5 (4 workers) demonstrated measurable throug

 Benchmark results vary by hardware, operating system, Ruby version, and workload characteristics. Users are encouraged to reproduce results on their own systems using the included benchmark suite.

+## [0.1.1] - 2026-06-03
+
+No code changes.

+Improves RubyGems metadata and documentation wording to
+explicitly identify CDC as Change Data Capture.

 ## [0.1.0] - 2026-05-31

@@ -77,10 +111,3 @@ Benchmark results vary by hardware, operating system, Ruby version, and workload
 - Added Minitest suite.
 - Added README and example.
 - Added CI and release workflows.
-
-## [0.1.1] - 2026-06-03
-
-No code changes.
-
-Improves RubyGems metadata and documentation wording to
-explicitly identify CDC as Change Data Capture.
|
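The 0.2.3 changelog entries mention per-trial distributions (min, median, max, p95) and a ratio based on median throughput against serial execution. The following is a minimal sketch of how such a summary could be computed. The `summarize` helper, the nearest-rank percentile rule, and the sample numbers are all assumptions made for illustration and are not taken from the gem's benchmark code.

```ruby
# Hypothetical helper: summarize per-trial throughput samples (events/second).
def summarize(samples)
  raise ArgumentError, "samples must not be empty" if samples.empty?

  sorted = samples.sort
  mid = sorted.size / 2
  median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  # Nearest-rank percentile (an assumed rule): the smallest sample that has
  # at least 95% of all samples at or below it.
  p95 = sorted[(0.95 * sorted.size).ceil - 1]

  { min: sorted.first, median: median, max: sorted.last, p95: p95 }
end

# Made-up sample values, used only to exercise the helper.
serial_trials = [100.0, 102.0, 98.0]
pool_trials = [118.0, 121.5, 119.2, 130.4, 120.1, 122.8, 117.6]

serial = summarize(serial_trials)
pool = summarize(pool_trials)

# Ratio of median throughputs; a value above 1.0 means the pool was faster.
ratio = (pool[:median] / serial[:median]).round(2)
```

Comparing medians rather than single runs keeps one unusually fast or slow trial from dominating the reported ratio.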
data/README.md
CHANGED
@@ -2,7 +2,6 @@

 [](https://badge.fury.io/rb/cdc-parallel)
 [](https://github.com/kanutocd/cdc-parallel/actions)
-[](https://codecov.io/gh/kanutocd/cdc-parallel)
 [](https://www.ruby-lang.org/en/)
 [](https://opensource.org/licenses/MIT)

@@ -82,6 +81,39 @@ Unsafe processors raise:
 CDC::Parallel::UnsafeProcessorError
 ```

+## Concurrency Contract
+
+`CDC::Parallel::ProcessorPool` accepts submissions from multiple Ruby threads.
+Dispatch state is synchronized inside the pool, while processor execution occurs
+inside isolated Ruby 4 Ractors.
+
+Workers own their `Ractor::Port` inboxes. The pool sends work to those inboxes,
+and workers send results back to a caller-owned reply port.
+
+```text
+Caller Thread A ─┐
+Caller Thread B ─┼─> ProcessorPool
+Caller Thread C ─┘ │
+│ synchronized dispatch
+▼
++-------------------+
+| worker selection |
++-------------------+
+│ │ │
+▼ ▼ ▼
+inbox port inbox port inbox port
+│ │ │
+▼ ▼ ▼
+Ractor 1 Ractor 2 Ractor 3
+│ │ │
+└───┬───┴───┬───┘
+▼ ▼
+caller-owned reply port
+│
+▼
+ordered ProcessorResult[]
+```
+
 ## What Belongs Here

 - Ractor processor execution
@@ -175,7 +207,10 @@ The benchmark focuses on three workload categories:
 | cpu | Measure CPU-bound processing throughput |
 | batch | Measure batched CDC event processing throughput |

-
+See [benchmark/README.md](benchmark/README.md) for the full benchmark methodology,
+configuration reference, report schema, and interpretation guidance.
+
+### Quick Start

 Tiny workload:

@@ -200,6 +235,23 @@ BENCHMARK_BATCH_SIZE=10000 \
 bundle exec rake benchmark:processor_pool
 ```

+Worker-count sweep:
+
+```bash
+BENCHMARK_WORKLOAD=cpu \
+BENCHMARK_WORKER_COUNTS=1,2,4 \
+bundle exec rake benchmark:processor_pool
+```
+
+Credibility controls:
+
+```bash
+BENCHMARK_TRIALS=7 \
+BENCHMARK_MIN_DURATION=0.25 \
+BENCHMARK_ITERATIONS=1000 \
+bundle exec rake benchmark:processor_pool
+```
+
 ### Benchmark Docker Image

 Build and run the reusable Docker image:
@@ -218,49 +270,3 @@ docker run --rm ghcr.io/kanutocd/cdc-parallel-benchmark:main
 The benchmark image is intended to become the shared performance validation
 pattern across CDC Ecosystem gems, enabling reproducible benchmark execution
 locally, in CI, and across different development environments.
-
-### Example Result
-
-Environment:
-
-* Ruby 4.0.5
-* x86_64 Linux
-* 4 workers
-
-CPU workload (`BENCHMARK_CPU_ROUNDS=5000`):
-
-```json
-{
-"serial": {
-"events_per_second": 120.26
-},
-"parallel": {
-"events_per_second": 250.15
-},
-"ratio": {
-"parallel_to_serial": 2.08
-}
-}
-```
-
-### Interpretation
-
-A ratio greater than `1.0` indicates that the pre-warmed Ractor worker pool outperformed serial execution.
-
-```text
-ratio > 1.0 => parallel faster
-ratio = 1.0 => equivalent
-ratio < 1.0 => serial faster
-```
-
-### Reproducibility
-
-Benchmark results vary depending on:
-
-* CPU model
-* Core count
-* Operating system
-* Ruby version
-* Background system activity
-
-The benchmark suite is provided so that users can reproduce and validate results on their own hardware.
|
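The Concurrency Contract section added to the README describes synchronized dispatch into worker-owned inboxes, with results returned through a caller-owned reply port. The sketch below shows the shape of that flow using plain threads. It is an analogy only: `Queue` stands in for `Ractor::Port`, threads stand in for Ractors, and `TinyPool` with its round-robin worker selection is a hypothetical construction, not the gem's `ProcessorPool`.

```ruby
# Analogy only: Queue stands in for Ractor::Port, Thread for Ractor.
# TinyPool and its round-robin selection are hypothetical.
class TinyPool
  def initialize(size:, &processor)
    @mutex = Mutex.new # guards worker selection and shutdown state
    @next_worker = 0
    @closed = false
    @inboxes = Array.new(size) { Queue.new } # one inbox per worker
    @workers = @inboxes.map do |inbox|
      Thread.new do
        # Each worker reads only its own inbox until it sees :stop.
        while (message = inbox.pop) != :stop
          index, item, reply = message
          reply << [index, processor.call(item)]
        end
      end
    end
  end

  def process_many(items)
    reply = Queue.new # caller-owned reply channel
    @mutex.synchronize do
      raise "pool is shut down" if @closed

      items.each_with_index do |item, index|
        @inboxes[@next_worker] << [index, item, reply]
        @next_worker = (@next_worker + 1) % @inboxes.size
      end
    end

    # Replies arrive in completion order; the index restores input order.
    results = Array.new(items.size)
    items.size.times do
      index, value = reply.pop
      results[index] = value
    end
    results
  end

  def shutdown
    @mutex.synchronize do
      return if @closed

      @closed = true
      @inboxes.each { |inbox| inbox << :stop }
    end
    @workers.each(&:join)
  end
end

pool = TinyPool.new(size: 3) { |n| n * n }
callers = 3.times.map do |t|
  Thread.new { pool.process_many([t, t + 10, t + 20]) }
end
results = callers.map(&:value)
pool.shutdown
```

Because each caller waits on its own reply queue, concurrent callers never see each other's results, and only the short dispatch step holds the mutex. The sketch omits error handling: a processor that raises would leave its caller waiting.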
data/lib/cdc/parallel/configuration.rb
CHANGED
@@ -2,13 +2,36 @@

 module CDC
 module Parallel
-# Immutable configuration
+# Immutable configuration shared by cdc-parallel runtime objects.
 #
-#
-#
-#
-#
+# `Configuration` validates worker sizing and timeout values at construction
+# time, freezes the resulting data object through `Data.define`, and makes
+# the instance shareable so it is safe to retain around Ractor-oriented
+# runtime objects.
+#
+# @example Default configuration
+# config = CDC::Parallel::Configuration.new
+# config.size #=> Etc.nprocessors
+# config.timeout #=> nil
+#
+# @example Explicit worker count and timeout
+# config = CDC::Parallel::Configuration.new(size: 4, timeout: 5)
+#
+# @!attribute [r] size
+# @return [Integer] Number of worker Ractors to boot.
+# @!attribute [r] timeout
+# @return [Numeric, nil] Optional wait timeout in seconds.
+# @api public
 class Configuration < Data.define(:size, :timeout)
+# Create a validated runtime configuration.
+#
+# @param size [Integer]
+# Worker count. Must be greater than zero.
+# @param timeout [Numeric, nil]
+# Optional timeout in seconds. Must be greater than zero when provided.
+# @raise [ArgumentError]
+# Raised when `size` or `timeout` is invalid.
+# @return [void]
 def initialize(size: Etc.nprocessors, timeout: nil)
 raise ArgumentError, "size must be an Integer" unless size.is_a?(Integer)
 raise ArgumentError, "size must be greater than zero" unless size.positive?
|
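The hunk above cuts off after the two `size` checks, so the rest of `initialize` is not visible. As a rough sketch of the pattern the new doc comments describe (validate, freeze through `Data.define`, make shareable), the following assumes Ruby 3.2 or later for `Data`. `SketchConfiguration` is a hypothetical name, and the timeout check and the `Ractor.make_shareable` call are assumptions inferred from the comments rather than code copied from the gem.

```ruby
require "etc"

# Hypothetical stand-in; only the two size checks are copied from the diff.
class SketchConfiguration < Data.define(:size, :timeout)
  def initialize(size: Etc.nprocessors, timeout: nil)
    raise ArgumentError, "size must be an Integer" unless size.is_a?(Integer)
    raise ArgumentError, "size must be greater than zero" unless size.positive?

    # Assumed rule, following the "@param timeout" comment in the diff.
    unless timeout.nil? || (timeout.is_a?(Numeric) && timeout.positive?)
      raise ArgumentError, "timeout must be a positive Numeric or nil"
    end

    # Data#initialize assigns the members and freezes the instance.
    super(size: size, timeout: timeout)
    # Assumed step: mark the frozen instance as shareable across Ractors.
    Ractor.make_shareable(self)
  end
end

config = SketchConfiguration.new(size: 4, timeout: 5)
default_config = SketchConfiguration.new
```

Validating before calling `super` means an invalid configuration never exists as an object, so runtime code that receives one does not need to re-check it.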
data/lib/cdc/parallel/errors.rb
CHANGED
@@ -2,22 +2,69 @@

 module CDC
 module Parallel
-# Base cdc-parallel
+# Base error for all cdc-parallel-specific failures.
+#
+# Rescue this class when callers want to handle any failure raised directly
+# by the parallel runtime layer.
+#
+# @api public
 class Error < StandardError; end

 # Raised when a processor has not declared itself Ractor-safe.
+#
+# Processors must opt in with `ractor_safe!` before they can be used by
+# {ProcessorPool}, {TransactionPool}, or {Runtime}. This prevents accidental
+# movement of mutable or otherwise unsafe processor objects across Ractor
+# boundaries.
+#
+# @api public
 class UnsafeProcessorError < Error; end

-# Raised when work is submitted after
+# Raised when work is submitted after a pool or runtime has been shut down.
+#
+# @api public
 class ShutdownError < Error; end

-# Raised when the runtime receives an unsupported work item.
+# Raised when the runtime receives an unsupported work item shape.
+#
+# `cdc-parallel` accepts normalized `CDC::Core::ChangeEvent` and
+# `CDC::Core::TransactionEnvelope` objects. Source-specific payloads must be
+# normalized by a source adapter before they reach this runtime layer.
+#
+# @api public
 class UnsupportedWorkItemError < Error; end

-#
+# Represents an exception raised inside a worker Ractor.
+#
+# Worker exceptions are serialized before they cross the Ractor boundary and
+# reconstructed as `ProcessorExecutionError` instances by
+# {ResultCollector.normalize}. The original exception class name, message,
+# and backtrace are exposed for diagnostics.
+#
+# @example Inspecting the original worker exception
+# result = runtime.process(event)
+# if result.failure?
+# error = result.error
+# error.original_class
+# error.original_message
+# end
+#
+# @attr_reader original_class [String] original exception class name.
+# @attr_reader original_message [String] original exception message.
+# @attr_reader original_backtrace [Array<String>] original exception backtrace.
+# @api public
 class ProcessorExecutionError < Error
 attr_reader :original_class, :original_message, :original_backtrace

+# Create a reconstructed worker exception.
+#
+# @param original_class [String]
+# Class name of the exception raised inside the worker.
+# @param original_message [String]
+# Message from the exception raised inside the worker.
+# @param original_backtrace [Array<String>]
+# Backtrace captured inside the worker.
+# @return [void]
 def initialize(original_class:, original_message:, original_backtrace: [])
 @original_class = original_class
 @original_message = original_message
@@ -28,7 +75,14 @@ module CDC
 end
 end

-# Raised when a
+# Raised when a pool does not receive worker results before the configured
+# timeout.
+#
+# Timeout failures are normally returned inside `CDC::Core::ProcessorResult`
+# failure objects rather than raised directly to the caller during result
+# collection.
+#
+# @api public
 class TimeoutError < Error; end
 end
 end
|
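The new `ProcessorExecutionError` comments describe worker exceptions being serialized before crossing the Ractor boundary and rebuilt on the caller side. The sketch below illustrates that round trip with plain hashes. `SketchExecutionError`, `serialize_failure`, `rebuild_failure`, and the message format passed to `super` are hypothetical. Only the three reader attributes and the keyword signature mirror the diff, and the gem's actual `ResultCollector.normalize` is not shown in this section.

```ruby
# Hypothetical stand-in for the error class shown in the diff.
class SketchExecutionError < StandardError
  attr_reader :original_class, :original_message, :original_backtrace

  def initialize(original_class:, original_message:, original_backtrace: [])
    @original_class = original_class
    @original_message = original_message
    @original_backtrace = original_backtrace
    # Assumed message format; the diff does not show this part.
    super("#{original_class}: #{original_message}")
  end
end

# Worker side (hypothetical helper): reduce the exception to strings and
# arrays so that no live exception object has to cross the boundary.
def serialize_failure(error)
  {
    original_class: error.class.name,
    original_message: error.message,
    original_backtrace: Array(error.backtrace)
  }
end

# Caller side (hypothetical helper): rebuild a uniform error from the payload.
def rebuild_failure(payload)
  SketchExecutionError.new(**payload)
end

payload =
  begin
    Integer("not a number")
  rescue ArgumentError => e
    serialize_failure(e)
  end

error = rebuild_failure(payload)
```

Carrying the class name as a string means the caller does not need the worker's exception class to be loadable or shareable, at the cost of losing the ability to `rescue` by the original type.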