cdc-concurrent 0.1.0 → 0.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: cbc0a0ebe15a9254e91b40caa48db7f7f26e4b01933be88e5fd148ea8c3371d0
4
- data.tar.gz: 38888cc8ade72e4611e352fa534a25876cbc71d6ef74c2f725b5bb3ef1c4c069
3
+ metadata.gz: e7567d8160a1a8b4f930c718574ed7a6e7c181266d62572bbad2362c5b6a480f
4
+ data.tar.gz: b0867ca4e10b7e26e4c355420183517bc34e13bf4a72b49f91cb6f63227670fe
5
5
  SHA512:
6
- metadata.gz: c3315531e5f0cbe43f86461acabf835ec5f187d65dace206d1640a213b42702a0ee8e1088421f1e2f05e076ce9dfcdd7625f309f837b613c5f13dfc459b07152
7
- data.tar.gz: 351566c71b92c033c2c35b2639653eabf6e75d3bacee667f643dde9af47bffaf7dbeee4f802eb8e237bd5e9a5bc7957d43269b323fc9c17b89bedda1ecc7dd8b
6
+ metadata.gz: b145904c468fdc9ad5bcf492635da16d75371952f04ab247e220d7f3d6fa0a269a32e4255dc810b6e3661fa8bb1c11f1d5ca9306f72c4ce9475a8ef60d7b0dd8
7
+ data.tar.gz: ce8a9edee9f28b4c372b187fa1d8a871f31209321cf4b76b536d44e88b5e0e55006529c0af4fd0c35f60219822da93498e160a51b397c14b015c2fec8915b3a4
data/CHANGELOG.md CHANGED
@@ -1,3 +1,24 @@
1
+ ## [0.1.1] - 2026-06-12
2
+
3
+ Tightened the type surface and switched the signatures over to the published
4
+ `cdc-core` 0.1.3 RBS files.
5
+
6
+ ### Added
7
+
8
+ - Published `cdc-core` RBS files are now loaded directly by Steep.
9
+ - `CDC::Concurrent` signatures are tighter around runtime, router, and pool
10
+ boundaries.
11
+
12
+ ### Changed
13
+
14
+ - Removed local RBS shims for `cdc-core` and `async`.
15
+ - Updated the gem dependency floor to `cdc-core >= 0.1.3`.
16
+
17
+ ### Fixed
18
+
19
+ - `TransactionPool` now handles failure results with a missing error value
20
+ defensively.
21
+
1
22
  ## [0.1.0] - 2026-06-04
2
23
 
3
24
  Initial release of `cdc-concurrent`.
data/README.md CHANGED
@@ -1,5 +1,10 @@
1
1
  # cdc-concurrent
2
2
 
3
+ [![Gem Version](https://badge.fury.io/rb/cdc-concurrent.svg)](https://badge.fury.io/rb/cdc-concurrent)
4
+ [![CI](https://github.com/kanutocd/cdc-concurrent/workflows/CI/badge.svg)](https://github.com/kanutocd/cdc-concurrent/actions)
5
+ [![Ruby Version](https://img.shields.io/badge/ruby-%3E%3D%203.4-ruby.svg)](https://www.ruby-lang.org/en/)
6
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
+
3
8
  Optional I/O-concurrent runtime adapter for `cdc-core`.
4
9
 
5
10
  `cdc-concurrent` executes `CDC::Core::Processor` objects with Fiber-scheduler-based I/O concurrency using `async`. It is the I/O-bound twin of `cdc-parallel`.
@@ -73,6 +78,10 @@ results = runtime.process_many(events)
73
78
 
74
79
  Results preserve input order by default. Set `preserve_order: false` when completion order is acceptable.
75
80
 
81
+ For I/O-bound throughput, `process_many` is the primary path. Repeated
82
+ single-event `process` calls still pay one Async dispatch per event, while
83
+ `process_many` lets the runtime overlap waits across the submitted batch.
84
+
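The difference between paying for each wait in turn and overlapping waits across a batch can be sketched with the standard library alone. This is an illustration, not the gem's implementation: `handle` is a hypothetical stand-in for an I/O-bound processor, and plain threads stand in for the Async-backed runtime only because they need no extra gems.

```ruby
require "benchmark"

# Hypothetical stand-in for an I/O-bound event handler.
# The sleep simulates time spent waiting on a socket or disk.
def handle(event)
  sleep(0.02)
  event * 2
end

events = (1..10).to_a

# One event at a time: every wait is paid in full before the next starts.
serial_results = nil
serial_seconds = Benchmark.realtime do
  serial_results = events.map { |event| handle(event) }
end

# Whole batch at once: the waits overlap. Threads are an assumption made
# for this sketch; cdc-concurrent itself builds on the async gem.
overlapped_results = nil
overlapped_seconds = Benchmark.realtime do
  workers = events.map { |event| Thread.new { handle(event) } }
  # Joining in submission order keeps the results in input order.
  overlapped_results = workers.map(&:value)
end

puts format("serial: %.3fs, overlapped: %.3fs", serial_seconds, overlapped_seconds)
```

Both runs return the same values in the same order; only the elapsed time differs, because the batch run waits for all events at once instead of one after another.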
76
85
  ## Transaction Processing
77
86
 
78
87
  ```ruby
@@ -99,10 +108,17 @@ CDC::Concurrent::UnsafeProcessorError
99
108
 
100
109
  A concurrent-safe processor should avoid unsafe shared mutable instance state. This runtime runs tasks concurrently in one Ruby process; it does not isolate mutable objects like Ractors do.
101
110
 
111
+ `concurrent_safe!` is a declaration of processor intent, not a proof. The
112
+ runtime cannot verify that a processor avoids unsafe shared mutable state or
113
+ uses scheduler-compatible I/O.
114
+
102
115
  ## Important Limits
103
116
 
104
117
  `cdc-concurrent` improves throughput only for I/O that cooperates with Ruby's Fiber scheduler. Blocking libraries that do not yield to the scheduler will still block the process.
105
118
 
119
+ Timeouts are applied per event task. They are not whole-batch or
120
+ whole-transaction deadlines.
121
+
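A per-event timeout means a batch as a whole can run longer than the timeout value without any single event failing. The sketch below shows that with the standard-library `Timeout` module. `PER_EVENT_TIMEOUT` and `run_with_timeout` are hypothetical names, and the gem's own timeout mechanism may differ.

```ruby
require "timeout"

# Hypothetical per-event limit, in seconds.
PER_EVENT_TIMEOUT = 0.1

# Runs one simulated event under its own deadline and reports the outcome.
def run_with_timeout(seconds_of_work)
  Timeout.timeout(PER_EVENT_TIMEOUT) do
    sleep(seconds_of_work) # simulated I/O wait
    :ok
  end
rescue Timeout::Error
  :timeout
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
# Only the third event exceeds its own deadline; the others are unaffected.
outcomes = [0.01, 0.01, 0.5, 0.01].map { |work| run_with_timeout(work) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

p outcomes
puts format("batch elapsed: %.3fs", elapsed)
```

The batch takes longer than `PER_EVENT_TIMEOUT` in total, yet only the slow event is reported as timed out. A whole-batch or whole-transaction deadline would have to be enforced separately by the caller.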
106
122
  For CPU-bound processing, use `cdc-parallel`.
107
123
 
108
124
  ## Roadmap
@@ -114,6 +130,112 @@ For CPU-bound processing, use `cdc-parallel`.
114
130
  - Sink abstractions
115
131
  - Async Redis/OpenSearch integrations
116
132
 
133
+ ## Test Organization
134
+
135
+ The test suite is grouped by intent so the same structure can be reused across CDC ecosystem gems.
136
+
137
+ ```text
138
+ test/unit/ focused class and branch coverage
139
+ test/integration/ component interaction and runtime integration
140
+ test/behavior/ ecosystem contracts and guardrails
141
+ test/performance/ opt-in smoke benchmarks
142
+ ```
143
+
144
+ Run the default quality suite:
145
+
146
+ ```bash
147
+ bundle exec rake test
148
+ ```
149
+
150
+ Run a specific group:
151
+
152
+ ```bash
153
+ bundle exec rake test:unit
154
+ bundle exec rake test:integration
155
+ bundle exec rake test:behavior
156
+ bundle exec rake test:performance
157
+ ```
158
+
159
+ The default `test` task runs unit, integration, and behavior tests. Performance tests are intentionally separate because they are environment-sensitive.
160
+
161
+ ## Benchmarking
162
+
163
+ `cdc-concurrent` includes reproducible benchmarks that compare serial processor execution against the Async-backed processor pool.
164
+
165
+ The benchmark focuses on three workload categories:
166
+
167
+ | Workload | Purpose |
168
+ | -------- | -------------------------------------------- |
169
+ | tiny | Measure dispatch overhead |
170
+ | io | Measure scheduler-friendly I/O concurrency |
171
+ | batch | Measure batched CDC event I/O fanout |
172
+
173
+ See [benchmark/README.md](benchmark/README.md) for the full benchmark methodology,
174
+ configuration reference, report schema, and interpretation guidance.
175
+
176
+ `cdc-parallel` and `cdc-concurrent` benchmark different bottlenecks.
177
+ `cdc-parallel` measures CPU parallelism; `cdc-concurrent` measures I/O wait
178
+ overlap. Their speedup ratios are not directly comparable.
179
+
180
+ ### Quick Start
181
+
182
+ Default I/O workload:
183
+
184
+ ```bash
185
+ bundle exec rake benchmark:processor_pool
186
+ ```
187
+
188
+ Tiny overhead workload:
189
+
190
+ ```bash
191
+ BENCHMARK_WORKLOAD=tiny \
192
+ bundle exec rake benchmark:processor_pool
193
+ ```
194
+
195
+ Batch workload:
196
+
197
+ ```bash
198
+ BENCHMARK_WORKLOAD=batch \
199
+ BENCHMARK_BATCH_SIZE=1000 \
200
+ bundle exec rake benchmark:processor_pool
201
+ ```
202
+
203
+ Concurrency sweep:
204
+
205
+ ```bash
206
+ BENCHMARK_WORKLOAD=io \
207
+ BENCHMARK_CONCURRENCY_COUNTS=1,10,50,100 \
208
+ bundle exec rake benchmark:processor_pool
209
+ ```
210
+
211
+ Credibility controls:
212
+
213
+ ```bash
214
+ BENCHMARK_TRIALS=7 \
215
+ BENCHMARK_MIN_DURATION=0.25 \
216
+ BENCHMARK_ITERATIONS=1000 \
217
+ bundle exec rake benchmark:processor_pool
218
+ ```
219
+
220
+ ### Benchmark Docker Image
221
+
222
+ Build and run the reusable Docker image:
223
+
224
+ ```bash
225
+ bundle exec rake benchmark:docker_build
226
+ bundle exec rake benchmark:docker_run
227
+ ```
228
+
229
+ Or run the image directly after it is published to GitHub Container Registry:
230
+
231
+ ```bash
232
+ docker run --rm ghcr.io/kanutocd/cdc-concurrent-benchmark:main
233
+ ```
234
+
235
+ The benchmark image is intended to follow the shared performance validation
236
+ pattern across CDC Ecosystem gems, enabling reproducible benchmark execution
237
+ locally, in CI, and across different development environments.
238
+
117
239
  ## License
118
240
 
119
- MIT.
241
+ [MIT](LICENSE.txt)
@@ -0,0 +1,165 @@
1
+ # cdc-concurrent Benchmarking
2
+
3
+ This directory contains the reproducible benchmark harness for `cdc-concurrent`.
4
+
5
+ The benchmark compares direct serial processor execution against the
6
+ Async-backed `CDC::Concurrent::ProcessorPool`.
7
+
8
+ ## Goals
9
+
10
+ The benchmark is designed to answer practical runtime questions:
11
+
12
+ - What is the dispatch overhead for tiny work?
13
+ - When does Async-backed I/O concurrency amortize its overhead?
14
+ - How much does batched `process_many` improve throughput?
15
+ - How does throughput change as concurrency changes?
16
+ - Are results stable across multiple trials?
17
+
18
+ ## Workloads
19
+
20
+ | Workload | Purpose | Default options |
21
+ | -------- | -------------------------------------------- | ------------------------ |
22
+ | tiny | Measures processor-pool dispatch cost | none |
23
+ | io | Measures scheduler-friendly I/O wait overlap | `io_sleep: 0.001` |
24
+ | batch | Measures CDC-style batch I/O fanout | `batch_size: 100`, `io_sleep: 0.001` |
25
+
26
+ Tiny workloads intentionally do almost no work. They are useful for measuring
27
+ runtime overhead, but they are not expected to make concurrent execution look
28
+ faster than direct method calls.
29
+
30
+ I/O and batch workloads are better indicators of useful concurrent throughput.
31
+
32
+ ## Execution Modes
33
+
34
+ The benchmark compares three execution modes.
35
+
36
+ | Mode | Meaning |
37
+ | ---------------- | -------------------------------------------- |
38
+ | serial | Direct processor execution |
39
+ | repeated_process | Repeated `ProcessorPool#process` calls |
40
+ | process_many | Batched `ProcessorPool#process_many` calls |
41
+
42
+ `serial` is measured once per benchmark run. `repeated_process` and
43
+ `process_many` are measured once for each configured concurrency count.
44
+
45
+ For `cdc-concurrent`, `process_many` is the primary throughput path. Repeated
46
+ single-event `process` calls are included to expose per-dispatch overhead, but
47
+ they do not represent the best way to overlap I/O waits.
48
+
49
+ ## Configuration
50
+
51
+ | Environment variable | Default | Meaning |
52
+ | -------------------------------- | ------- | -------------------------------------------- |
53
+ | `BENCHMARK_WORKLOAD` | `io` | `tiny`, `io`, or `batch` |
54
+ | `BENCHMARK_ITERATIONS` | `1000` | Work items submitted per pass |
55
+ | `BENCHMARK_WARMUP` | `100` | Warmup work items before measurement |
56
+ | `BENCHMARK_TRIALS` | `5` | Number of measured trials |
57
+ | `BENCHMARK_MIN_DURATION` | `0.1` | Minimum seconds per trial |
58
+ | `BENCHMARK_CONCURRENCY` | `100` | Single concurrency count when no sweep is given |
59
+ | `BENCHMARK_CONCURRENCY_COUNTS` | unset | Comma-separated concurrency sweep, e.g. `10,50,100` |
60
+ | `BENCHMARK_IO_SLEEP` | `0.001` | Seconds slept by I/O-like workloads |
61
+ | `BENCHMARK_BATCH_SIZE` | `100` | Events inside each batch workload item |
62
+
63
+ `BENCHMARK_CONCURRENCY_COUNTS` takes precedence over `BENCHMARK_CONCURRENCY`.
64
+
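The precedence rule can be sketched as a small resolver. `resolve_concurrency_counts` is a hypothetical helper written for this note, not code taken from the harness; it only assumes the variable names and the default of `100` from the table above.

```ruby
# Hypothetical helper: resolves the concurrency counts from an ENV-like hash.
# BENCHMARK_CONCURRENCY_COUNTS wins whenever it is set and non-empty.
def resolve_concurrency_counts(env)
  sweep = env["BENCHMARK_CONCURRENCY_COUNTS"].to_s.strip
  unless sweep.empty?
    # Integer() raises ArgumentError on malformed entries such as "10,abc".
    return sweep.split(",").map { |count| Integer(count.strip, 10) }
  end

  # Fall back to the single count, using the documented default of 100.
  [Integer(env.fetch("BENCHMARK_CONCURRENCY", "100"), 10)]
end

p resolve_concurrency_counts(
  { "BENCHMARK_CONCURRENCY_COUNTS" => "10,50,100", "BENCHMARK_CONCURRENCY" => "25" }
)
p resolve_concurrency_counts({ "BENCHMARK_CONCURRENCY" => "25" })
p resolve_concurrency_counts({})
```

Passing a plain hash instead of reading `ENV` directly keeps the sketch easy to exercise; `ENV.to_h` can be passed in the same way.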
65
+ ## Examples
66
+
67
+ Run the default I/O workload:
68
+
69
+ ```bash
70
+ bundle exec rake benchmark:processor_pool
71
+ ```
72
+
73
+ Run the tiny overhead workload:
74
+
75
+ ```bash
76
+ BENCHMARK_WORKLOAD=tiny \
77
+ bundle exec rake benchmark:processor_pool
78
+ ```
79
+
80
+ Run the I/O workload with a custom concurrency sweep:
81
+
82
+ ```bash
83
+ BENCHMARK_WORKLOAD=io \
84
+ BENCHMARK_CONCURRENCY_COUNTS=10,50,100 \
85
+ bundle exec rake benchmark:processor_pool
86
+ ```
87
+
88
+ Run a longer batch benchmark:
89
+
90
+ ```bash
91
+ BENCHMARK_WORKLOAD=batch \
92
+ BENCHMARK_TRIALS=9 \
93
+ BENCHMARK_MIN_DURATION=0.5 \
94
+ BENCHMARK_CONCURRENCY_COUNTS=10,50,100 \
95
+ bundle exec rake benchmark:processor_pool
96
+ ```
97
+
98
+ Build and run the reusable Docker benchmark image:
99
+
100
+ ```bash
101
+ bundle exec rake benchmark:docker_build
102
+ bundle exec rake benchmark:docker_run
103
+ ```
104
+
105
+ ## Report Shape
106
+
107
+ The benchmark prints JSON.
108
+
109
+ Top-level fields:
110
+
111
+ | Field | Meaning |
112
+ | ------------------- | -------------------------------------------- |
113
+ | `benchmark` | Benchmark name |
114
+ | `gem` | Gem name |
115
+ | `timestamp` | UTC timestamp |
116
+ | `environment` | Ruby, platform, host, CPU, and uname metadata |
117
+ | `config` | Benchmark configuration |
118
+ | `workload_options` | Workload-specific options |
119
+ | `serial` | Serial execution distribution |
120
+ | `concurrency_sweep` | Concurrent mode distributions by concurrency |
121
+ | `interpretation` | Ratio interpretation guide |
122
+
123
+ Each distribution includes:
124
+
125
+ | Field | Meaning |
126
+ | -------- | -------------------------------- |
127
+ | `min` | Fastest observed value |
128
+ | `median` | Median observed value |
129
+ | `max` | Slowest observed value |
130
+ | `p95` | 95th percentile observed value |
131
+
132
+ Each mode also includes `raw_trials` so results can be inspected or reprocessed.
133
+
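Because `raw_trials` is included, the distribution fields can be recomputed from it. The following is a minimal sketch under stated assumptions: `summarize` is a hypothetical helper, `raw_trials` is treated as a flat array of numbers, and the nearest-rank definition of `p95` is an assumption, since the harness may use a different percentile method.

```ruby
# Hypothetical helper: recomputes min, median, max, and p95 from raw trials.
def summarize(raw_trials)
  raise ArgumentError, "raw_trials must not be empty" if raw_trials.empty?

  sorted = raw_trials.sort
  count = sorted.length
  mid = count / 2
  # Median: middle value, or the mean of the two middle values.
  median = count.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  # Nearest-rank p95 (assumption): the smallest value with at least
  # 95% of the samples at or below it.
  p95_index = (count * 95 / 100.0).ceil - 1

  { min: sorted.first, median: median, max: sorted.last, p95: sorted[p95_index] }
end

p summarize([5.0, 1.0, 3.0, 2.0, 4.0])
```

With only a handful of trials, nearest-rank `p95` coincides with `max`, which is one reason to raise `BENCHMARK_TRIALS` when tail behavior matters.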
134
+ ## Interpretation
135
+
136
+ `ratio_to_serial_median_events_per_second` compares a concurrent mode's median
137
+ throughput against serial median throughput.
138
+
139
+ ```text
140
+ ratio_to_serial_median_events_per_second > 1.0 => concurrent mode faster
141
+ ratio_to_serial_median_events_per_second = 1.0 => equivalent
142
+ ratio_to_serial_median_events_per_second < 1.0 => serial faster
143
+ ```
144
+
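The ratio and its reading can be sketched in a few lines. `ratio_to_serial` and `interpret` are hypothetical helpers for illustration, and the throughput figures in the usage lines are made-up inputs, not measured results.

```ruby
# Hypothetical helper: ratio of a concurrent mode's median throughput
# (events per second) to the serial median throughput.
def ratio_to_serial(concurrent_median_eps, serial_median_eps)
  raise ArgumentError, "serial median must be positive" unless serial_median_eps.positive?

  concurrent_median_eps.to_f / serial_median_eps
end

# Maps a ratio onto the three cases listed above.
def interpret(ratio)
  if ratio > 1.0
    "concurrent mode faster"
  elsif ratio < 1.0
    "serial faster"
  else
    "equivalent"
  end
end

# Illustrative inputs only.
ratio = ratio_to_serial(5_000, 1_000)
puts "#{ratio} => #{interpret(ratio)}"
```

An exact `1.0` is rare in practice, so ratios close to it are best read alongside the per-mode `min` and `max` spread rather than as a strict win for either side.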
145
+ Tiny workloads primarily measure dispatch overhead, so serial execution may be
146
+ faster. I/O-bound and batched workloads are better indicators of useful
147
+ concurrent throughput.
148
+
149
+ `cdc-parallel` and `cdc-concurrent` benchmark different bottlenecks.
150
+ `cdc-parallel` measures CPU parallelism; `cdc-concurrent` measures I/O wait
151
+ overlap. Their speedup ratios are not directly comparable.
152
+
153
+ ## Reproducibility
154
+
155
+ Benchmark results vary depending on:
156
+
157
+ - CPU model
158
+ - operating system
159
+ - Ruby version
160
+ - background system activity
161
+ - scheduler-compatible I/O behavior
162
+ - thermal and power-management state
163
+
164
+ Use multiple trials, a minimum measurement duration, and concurrency sweeps when
165
+ comparing results across machines or releases.