ruby_reactor 0.1.0
- checksums.yaml +7 -0
- data/.rspec +3 -0
- data/.rubocop.yml +98 -0
- data/CODE_OF_CONDUCT.md +84 -0
- data/README.md +570 -0
- data/Rakefile +12 -0
- data/documentation/DAG.md +457 -0
- data/documentation/README.md +123 -0
- data/documentation/async_reactors.md +369 -0
- data/documentation/composition.md +199 -0
- data/documentation/core_concepts.md +662 -0
- data/documentation/data_pipelines.md +224 -0
- data/documentation/examples/inventory_management.md +749 -0
- data/documentation/examples/order_processing.md +365 -0
- data/documentation/examples/payment_processing.md +654 -0
- data/documentation/getting_started.md +224 -0
- data/documentation/retry_configuration.md +357 -0
- data/lib/ruby_reactor/async_router.rb +91 -0
- data/lib/ruby_reactor/configuration.rb +41 -0
- data/lib/ruby_reactor/context.rb +169 -0
- data/lib/ruby_reactor/context_serializer.rb +164 -0
- data/lib/ruby_reactor/dependency_graph.rb +126 -0
- data/lib/ruby_reactor/dsl/compose_builder.rb +86 -0
- data/lib/ruby_reactor/dsl/map_builder.rb +112 -0
- data/lib/ruby_reactor/dsl/reactor.rb +151 -0
- data/lib/ruby_reactor/dsl/step_builder.rb +177 -0
- data/lib/ruby_reactor/dsl/template_helpers.rb +36 -0
- data/lib/ruby_reactor/dsl/validation_helpers.rb +35 -0
- data/lib/ruby_reactor/error/base.rb +16 -0
- data/lib/ruby_reactor/error/compensation_error.rb +8 -0
- data/lib/ruby_reactor/error/context_too_large_error.rb +11 -0
- data/lib/ruby_reactor/error/dependency_error.rb +8 -0
- data/lib/ruby_reactor/error/deserialization_error.rb +11 -0
- data/lib/ruby_reactor/error/input_validation_error.rb +29 -0
- data/lib/ruby_reactor/error/schema_version_error.rb +11 -0
- data/lib/ruby_reactor/error/step_failure_error.rb +18 -0
- data/lib/ruby_reactor/error/undo_error.rb +8 -0
- data/lib/ruby_reactor/error/validation_error.rb +8 -0
- data/lib/ruby_reactor/executor/compensation_manager.rb +79 -0
- data/lib/ruby_reactor/executor/graph_manager.rb +41 -0
- data/lib/ruby_reactor/executor/input_validator.rb +39 -0
- data/lib/ruby_reactor/executor/result_handler.rb +103 -0
- data/lib/ruby_reactor/executor/retry_manager.rb +156 -0
- data/lib/ruby_reactor/executor/step_executor.rb +319 -0
- data/lib/ruby_reactor/executor.rb +123 -0
- data/lib/ruby_reactor/map/collector.rb +65 -0
- data/lib/ruby_reactor/map/element_executor.rb +154 -0
- data/lib/ruby_reactor/map/execution.rb +60 -0
- data/lib/ruby_reactor/map/helpers.rb +67 -0
- data/lib/ruby_reactor/max_retries_exhausted_failure.rb +19 -0
- data/lib/ruby_reactor/reactor.rb +75 -0
- data/lib/ruby_reactor/retry_context.rb +92 -0
- data/lib/ruby_reactor/retry_queued_result.rb +26 -0
- data/lib/ruby_reactor/sidekiq_workers/map_collector_worker.rb +13 -0
- data/lib/ruby_reactor/sidekiq_workers/map_element_worker.rb +13 -0
- data/lib/ruby_reactor/sidekiq_workers/map_execution_worker.rb +15 -0
- data/lib/ruby_reactor/sidekiq_workers/worker.rb +55 -0
- data/lib/ruby_reactor/step/compose_step.rb +107 -0
- data/lib/ruby_reactor/step/map_step.rb +234 -0
- data/lib/ruby_reactor/step.rb +33 -0
- data/lib/ruby_reactor/storage/adapter.rb +51 -0
- data/lib/ruby_reactor/storage/configuration.rb +15 -0
- data/lib/ruby_reactor/storage/redis_adapter.rb +140 -0
- data/lib/ruby_reactor/template/base.rb +15 -0
- data/lib/ruby_reactor/template/element.rb +25 -0
- data/lib/ruby_reactor/template/input.rb +48 -0
- data/lib/ruby_reactor/template/result.rb +48 -0
- data/lib/ruby_reactor/template/value.rb +22 -0
- data/lib/ruby_reactor/validation/base.rb +26 -0
- data/lib/ruby_reactor/validation/input_validator.rb +62 -0
- data/lib/ruby_reactor/validation/schema_builder.rb +17 -0
- data/lib/ruby_reactor/version.rb +5 -0
- data/lib/ruby_reactor.rb +159 -0
- data/sig/ruby_reactor.rbs +4 -0
- metadata +178 -0
@@ -0,0 +1,224 @@
# Data Pipelines

RubyReactor provides powerful data pipeline capabilities through the `map` feature, allowing you to process collections of data efficiently. This system supports both synchronous and asynchronous execution, batch processing, and robust error handling.

## Overview

The data pipeline system is built around the `map` step, which iterates over an input collection and processes each element through a defined sub-reactor or inline steps.

Key features:
- **Parallel Processing**: Execute steps asynchronously via Sidekiq
- **Batch Control**: Manage system load with configurable batch sizes
- **Error Handling**: Choose between failing fast or collecting partial results
- **Retries**: Configure granular retry policies for individual steps
- **Aggregation**: Collect and transform results after processing

## Basic Usage

The simplest form of a data pipeline is an inline `map` step that processes elements synchronously.

```ruby
class UserTransformationReactor < RubyReactor::Reactor
  input :users

  map :transformed_users do
    source input(:users)
    argument :user, element(:transformed_users)

    # Define steps to run for each element
    step :normalize do
      argument :user, input(:user)
      run do |args, _|
        user = args[:user]
        Success({
          name: user[:name].strip,
          email: user[:email].downcase
        })
      end
    end

    # The result of this step becomes the result for the element
    returns :normalize
  end
end
```
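Conceptually, each element of `input(:users)` flows through the `:normalize` step, and the map's result is the array of per-element results. The per-element transformation can be sketched in plain Ruby (illustrative only; RubyReactor runs this inside the step and wraps each value in `Success`/`Failure`):

```ruby
# Plain-Ruby sketch of what the :transformed_users map does to each element.
# Illustrative only: RubyReactor executes this per element via the step above.
def normalize_user(user)
  {
    name: user[:name].strip,
    email: user[:email].downcase
  }
end

users = [
  { name: "  Ada Lovelace ", email: "ADA@Example.COM" },
  { name: "Alan Turing", email: "Alan@example.com" }
]

transformed = users.map { |u| normalize_user(u) }
```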

## Async Execution

For long-running or resource-intensive tasks, you can offload processing to background jobs using Sidekiq.

To enable async execution, add the `async true` directive to your map definition.

```ruby
map :process_orders do
  source input(:orders)
  argument :order, element(:process_orders)

  # Enable async execution via Sidekiq
  async true

  step :charge_card do
    argument :order, input(:order)
    run { |args, _| PaymentService.charge(args[:order]) }
  end

  returns :charge_card
end
```

### Execution Flow

```mermaid
sequenceDiagram
    participant Reactor
    participant Redis
    participant Sidekiq
    participant Worker

    Reactor->>Redis: Store Context
    Reactor->>Sidekiq: Enqueue MapElementWorkers
    Note over Reactor: Returns AsyncResult immediately

    loop For each element
        Sidekiq->>Worker: Process Element
        Worker->>Redis: Update Element Result
    end

    Worker->>Sidekiq: Enqueue MapCollectorWorker (when done)
    Sidekiq->>Worker: Run Collector
    Worker->>Redis: Store Final Result
```

## Batch Processing

When processing large datasets asynchronously, you can control the parallelism using `batch_size`. This limits how many Sidekiq jobs are enqueued simultaneously, preventing system overload.

```ruby
map :bulk_import do
  source input(:records)
  argument :record, element(:bulk_import)

  # Process only 50 records at a time
  async true, batch_size: 50

  step :import_record do
    # ...
  end
end
```

**How it works:**
1. The system initially enqueues `batch_size` jobs.
2. As each job completes, it triggers the next job in the queue.
3. This maintains a steady stream of processing without flooding the queue.
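The sliding window described above can be sketched in plain Ruby (names here are illustrative, not the gem's API; RubyReactor implements this with Sidekiq workers and Redis state):

```ruby
# Sketch of sliding-window batching: start batch_size jobs up front, then
# start one more each time a job finishes, so at most batch_size jobs are
# ever in flight at once. Hypothetical names, for illustration only.
def run_batched(items, batch_size)
  active  = items.first(batch_size)  # jobs enqueued immediately
  pending = items.drop(batch_size)   # items waiting for a free slot
  peak    = 0
  results = []

  until active.empty?
    peak = [peak, active.size].max
    item = active.shift
    results << item * 2                            # stand-in for the real work
    active << pending.shift unless pending.empty?  # completion enqueues the next job
  end

  [results, peak]
end
```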

## Error Handling

You can control how the pipeline reacts to failures using the `fail_fast` option.

### Fail Fast (Default)

By default (`fail_fast true`), the entire map operation fails immediately if any single element fails.

```ruby
map :strict_processing do
  source input(:items)
  # ...
  fail_fast true # Default
end
```

### Collecting Partial Results

If you want to process all elements regardless of failures, set `fail_fast false`. You can then use a `collect` block to handle successes and failures separately.

```ruby
map :resilient_processing do
  source input(:items)
  argument :item, element(:resilient_processing)

  # Continue processing even if some items fail
  fail_fast false

  step :risky_operation do
    # ...
  end

  returns :risky_operation

  # Aggregate results
  collect do |results|
    # results is an array of Result objects (Success or Failure)
    successful = results.select(&:success?).map(&:value)
    failed = results.select(&:failure?).map(&:error)

    {
      processed: successful,
      errors: failed,
      success_rate: successful.length.to_f / results.length
    }
  end
end
```
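The aggregation logic above can be exercised in isolation with stand-in result objects (RubyReactor's actual `Success`/`Failure` objects may expose a different interface; `StubResult` is purely illustrative):

```ruby
# Stand-in result objects to illustrate the collect block above.
# RubyReactor's real Success/Failure API may differ in method names.
StubResult = Struct.new(:value, :error) do
  def success?
    error.nil?
  end

  def failure?
    !success?
  end
end

results = [
  StubResult.new("item-1", nil),
  StubResult.new(nil, "timeout"),
  StubResult.new("item-3", nil)
]

successful = results.select(&:success?).map(&:value)
failed     = results.select(&:failure?).map(&:error)

summary = {
  processed: successful,
  errors: failed,
  success_rate: successful.length.to_f / results.length
}
```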

## Retry Configuration

You can configure retries for individual steps within a map. This is particularly useful for transient failures (e.g., network timeouts) in async pipelines.

```ruby
map :reliable_processing do
  source input(:urls)
  argument :url, element(:reliable_processing)
  async true

  step :fetch_data do
    argument :url, input(:url)

    # Retry up to 3 times with exponential backoff
    retries max_attempts: 3, backoff: :exponential, base_delay: 1.second

    run do |args, _|
      # If this raises or returns Failure, it will be retried
      HttpClient.get(args[:url])
    end
  end

  returns :fetch_data
end
```
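Assuming the common formula `delay = base_delay * 2**(attempt - 1)` (the gem's exact backoff curve and any jitter may differ; see `retry_configuration.md`), the waits between attempts grow like this:

```ruby
# Hypothetical illustration of exponential backoff; RubyReactor's exact
# formula (and any jitter) may differ from this simple doubling.
def backoff_delay(attempt, base_delay: 1)
  base_delay * (2**(attempt - 1))
end

# Delays (in seconds) before retries 1, 2, and 3 with base_delay: 1
delays = (1..3).map { |attempt| backoff_delay(attempt) }
```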

### Retry Behavior

- **Async Mode**: Retries are handled by requeuing the Sidekiq job with a delay. This is non-blocking and efficient.
- **Sync Mode**: Retries happen immediately within the execution thread (blocking).

## Visualization

### Async Batch Execution

```mermaid
graph TD
    Start[Start Map] --> Init[Initialize Batch]
    Init --> Q1["Queue Initial Batch<br/>(Size N)"]

    subgraph Workers
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker ...]
    end

    Q1 --> W1
    Q1 --> W2

    W1 -->|Complete| Next1{More Items?}
    W2 -->|Complete| Next2{More Items?}

    Next1 -->|Yes| Q2[Queue Next Item]
    Next2 -->|Yes| Q2

    Q2 --> W3

    Next1 -->|No| Check{All Done?}
    Check -->|Yes| Collect[Run Collector]
    Collect --> Finish[Final Result]
```