ruby_reactor 0.3.2 → 0.4.0
- checksums.yaml +4 -4
- data/.release-please-config.json +15 -0
- data/.release-please-manifest.json +3 -0
- data/.tool-versions +1 -0
- data/CHANGELOG.md +13 -0
- data/README.md +80 -4
- data/lib/ruby_reactor/context_serializer.rb +10 -1
- data/lib/ruby_reactor/map/result_enumerator.rb +4 -3
- data/lib/ruby_reactor/rate_limit.rb +2 -2
- data/lib/ruby_reactor/sidekiq_workers/worker.rb +58 -1
- data/lib/ruby_reactor/version.rb +1 -1
- metadata +7 -52
- data/documentation/DAG.md +0 -457
- data/documentation/README.md +0 -135
- data/documentation/async_reactors.md +0 -381
- data/documentation/composition.md +0 -199
- data/documentation/core_concepts.md +0 -676
- data/documentation/data_pipelines.md +0 -230
- data/documentation/examples/inventory_management.md +0 -748
- data/documentation/examples/order_processing.md +0 -380
- data/documentation/examples/payment_processing.md +0 -565
- data/documentation/getting_started.md +0 -242
- data/documentation/images/failed_order_processing.png +0 -0
- data/documentation/images/payment_workflow.png +0 -0
- data/documentation/interrupts.md +0 -163
- data/documentation/locks_and_semaphores.md +0 -459
- data/documentation/retry_configuration.md +0 -362
- data/documentation/testing.md +0 -994
- data/gui/.gitignore +0 -24
- data/gui/README.md +0 -73
- data/gui/eslint.config.js +0 -23
- data/gui/index.html +0 -13
- data/gui/package-lock.json +0 -5925
- data/gui/package.json +0 -46
- data/gui/postcss.config.js +0 -6
- data/gui/public/vite.svg +0 -1
- data/gui/src/App.css +0 -42
- data/gui/src/App.tsx +0 -51
- data/gui/src/assets/react.svg +0 -1
- data/gui/src/components/DagVisualizer.tsx +0 -424
- data/gui/src/components/Dashboard.tsx +0 -163
- data/gui/src/components/ErrorBoundary.tsx +0 -47
- data/gui/src/components/ReactorDetail.tsx +0 -135
- data/gui/src/components/StepInspector.tsx +0 -492
- data/gui/src/components/__tests__/DagVisualizer.test.tsx +0 -140
- data/gui/src/components/__tests__/ReactorDetail.test.tsx +0 -111
- data/gui/src/components/__tests__/StepInspector.test.tsx +0 -408
- data/gui/src/globals.d.ts +0 -7
- data/gui/src/index.css +0 -14
- data/gui/src/lib/utils.ts +0 -13
- data/gui/src/main.tsx +0 -14
- data/gui/src/test/setup.ts +0 -11
- data/gui/tailwind.config.js +0 -11
- data/gui/tsconfig.app.json +0 -28
- data/gui/tsconfig.json +0 -7
- data/gui/tsconfig.node.json +0 -26
- data/gui/vite.config.ts +0 -8
- data/gui/vitest.config.ts +0 -13
@@ -1,230 +0,0 @@
# Data Pipelines

RubyReactor provides powerful data pipeline capabilities through the `map` feature, allowing you to process collections of data efficiently. This system supports both synchronous and asynchronous execution, batch processing, and robust error handling.

## Overview

The data pipeline system is built around the `map` step, which iterates over an input collection and processes each element through a defined sub-reactor or inline steps.

Key features:
- **Parallel Processing**: Execute steps asynchronously via Sidekiq
- **Batch Control**: Manage system load with configurable batch sizes
- **Error Handling**: Choose between failing fast or collecting partial results
- **Retries**: Configure granular retry policies for individual steps
- **Aggregation**: Collect and transform results after processing

## Basic Usage

The simplest form of a data pipeline is an inline `map` step that processes elements synchronously.

```ruby
class UserTransformationReactor < RubyReactor::Reactor
  input :users

  map :transformed_users do
    source input(:users)
    argument :user, element(:transformed_users)

    # Define steps to run for each element
    step :normalize do
      argument :user, input(:user)
      run do |args, _|
        user = args[:user]
        Success({
          name: user[:name].strip,
          email: user[:email].downcase
        })
      end
    end

    # The result of this step becomes the result for the element
    returns :normalize
  end
end
```
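
For intuition, the per-element work of the `:normalize` step can be sketched in plain Ruby. This is only an illustration of the transformation itself: it uses no RubyReactor classes, and `normalize_user` and the sample data are hypothetical names introduced here.

```ruby
# Plain-Ruby sketch of what the :normalize step computes for one element.
# `normalize_user` is a hypothetical helper, not part of RubyReactor.
def normalize_user(user)
  {
    name: user[:name].strip,      # drop leading/trailing whitespace
    email: user[:email].downcase  # canonicalize the address casing
  }
end

# Hypothetical sample input, shaped like the `users` input above.
users = [
  { name: "  Ada Lovelace ", email: "ADA@Example.com" },
  { name: "Grace Hopper", email: "Grace@Example.COM" }
]

# A synchronous map applies the step to every element in order.
transformed_users = users.map { |user| normalize_user(user) }
```

In the reactor version, each hash would additionally be wrapped in `Success(...)` so the map step can tell successful elements from failed ones.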

## Dynamic Sources & ActiveRecord

The `map` step supports a dynamic `source` block, which is particularly useful when working with ActiveRecord or when the collection depends on input arguments. Instead of passing a static collection, you can define a block that returns an Enumerable or an `ActiveRecord::Relation`.

```ruby
map :process_products do
  argument :filter, input(:filter)

  # Dynamic source block
  source do |args|
    # This block executes at runtime
    threshold = args[:filter][:stock]
    Product.where("stock >= ?", threshold)
  end

  argument :product, element(:process_products)
  async true, batch_size: 100

  step :process do
    # ...
  end

  returns :process
end
```

When an `ActiveRecord::Relation` is returned, RubyReactor efficiently batches the query using database-level `OFFSET` and `LIMIT` based on the configured `batch_size`, preventing memory bloat by not loading all records at once.
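
The `OFFSET` / `LIMIT` idea can be sketched without a database. The snippet below is a minimal illustration, assuming an in-memory array stands in for the table; `fetch_batch` and `each_batch` are hypothetical helpers, not RubyReactor or ActiveRecord APIs.

```ruby
# Hypothetical stand-in for `SELECT ... LIMIT batch_size OFFSET offset`.
# A real relation would issue SQL; here we slice an in-memory array.
def fetch_batch(records, offset, batch_size)
  records[offset, batch_size] || []
end

# Yields one batch at a time so that only `batch_size` records are held
# by the caller at once.
def each_batch(records, batch_size)
  return enum_for(:each_batch, records, batch_size) unless block_given?

  offset = 0
  loop do
    batch = fetch_batch(records, offset, batch_size)
    break if batch.empty?

    yield batch
    offset += batch_size
  end
end

records = (1..10).to_a
batches = each_batch(records, 4).to_a
```

With ten records and a batch size of four, the loop produces three batches, the last one partial.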

## Batch Processing Mechanism

When processing large datasets asynchronously, you can control the parallelism using `batch_size`. This limits how many Sidekiq jobs are enqueued simultaneously, preventing system overload.

```ruby
map :bulk_import do
  source input(:records)
  argument :record, element(:bulk_import)

  # Process only 50 records at a time
  async true, batch_size: 50

  step :import_record do
    # ...
  end
end
```

### Back Pressure & Resource Management

When `async true` is used with a `batch_size`, RubyReactor implements an intelligent **back pressure** mechanism. Instead of flooding Redis and Sidekiq with millions of jobs immediately (which is the standard behavior for many background job systems), the system processes data in controlled chunks.

This approach provides critical benefits for stability and scalability:

1. **Memory Efficiency**: By using `ActiveRecord` batching (`LIMIT` / `OFFSET`), only the current batch of records is loaded into memory. This allows processing datasets larger than available RAM.
2. **Redis Protection**: Prevents "Queue Explosion". Only a small number of job arguments are stored in Redis at any time, preventing OOM errors in your Redis instance.
3. **Database Stability**: Database load is distributed over time rather than spiking all at once.

**Visualizing the Flow:**

```mermaid
graph TD
    Start[Start Map] -->|Batch Size: N| BatchManager

    subgraph "Back Pressure Loop"
        BatchManager[Batch Manager] -->|Fetch N Items| DB[(Database)]
        DB --> Records
        Records -->|Enqueue N Jobs| Sidekiq

        Sidekiq --> W1[Worker 1]
        Sidekiq --> W2[Worker 2]

        W1 -.->|Complete| Check{Batch Done?}
        W2 -.->|Complete| Check

        Check -->|No| Wait[Wait]
        Check -->|Yes| Next[Trigger Next Batch]
        Next --> BatchManager
    end

    BatchManager -->|No More Items| Finish[Aggregator]
```

This ensures that the system works at the speed of your workers, not the speed of the enqueueing process, maintaining a constant and manageable resource footprint regardless of dataset size.
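
The loop in the diagram can be approximated in plain Ruby. The sketch below is an illustration under stated assumptions: threads stand in for Sidekiq jobs, and the method blocks while waiting for a batch, whereas the description above suggests the real system triggers the next batch when workers report completion. `process_with_back_pressure` is a hypothetical name.

```ruby
# Processes `items` batch by batch. At most `batch_size` workers are in
# flight at any time; the next batch starts only after the current one ends.
def process_with_back_pressure(items, batch_size, &work)
  results = []
  max_in_flight = 0

  items.each_slice(batch_size) do |batch|
    # One thread per item is a stand-in for "enqueue N jobs".
    workers = batch.map { |item| Thread.new { work.call(item) } }
    max_in_flight = [max_in_flight, workers.size].max

    # Thread#value joins the thread, so this waits for the whole batch.
    results.concat(workers.map(&:value))
  end

  [results, max_in_flight]
end

results, max_in_flight = process_with_back_pressure((1..7).to_a, 3) { |n| n * 2 }
```

Because each batch is awaited before the next is fetched, the number of concurrent workers never exceeds the batch size, whatever the size of the input.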

## Error Handling

You can control how the pipeline reacts to failures using the `fail_fast` option.

### Fail Fast (Default)

By default (`fail_fast true`), the entire map operation fails immediately if any single element fails.

```ruby
map :strict_processing do
  source input(:items)
  # ...
  fail_fast true # Default
end
```
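
As a rough mental model of fail-fast semantics, the following plain-Ruby sketch stops at the first failing element. It is not RubyReactor's implementation; `map_fail_fast` and the `[:ok, value]` / `[:error, message]` pairs are hypothetical conventions chosen for this example.

```ruby
# The block returns [:ok, value] or [:error, message] for each element.
# Processing stops at the first :error, and later elements are never visited.
def map_fail_fast(items)
  values = []
  items.each do |item|
    status, payload = yield item
    return [:error, payload] if status == :error

    values << payload
  end
  [:ok, values]
end

processed = []
outcome = map_fail_fast([1, 2, -3, 4]) do |n|
  processed << n # record which elements were actually visited
  n.negative? ? [:error, "negative input: #{n}"] : [:ok, n * 10]
end
```

Here the fourth element is never processed, because the third one fails and ends the whole operation.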

### Collecting Results (Successes & Failures)

If you want to process all elements regardless of failures, set `fail_fast false`. The map step returns a `ResultEnumerator` that allows you to easily separate successful executions from failures.

```ruby
map :resilient_processing do
  source input(:items)
  argument :item, element(:resilient_processing)

  # Continue processing even if some items fail
  fail_fast false

  step :risky_operation do
    # ...
  end

  returns :risky_operation
end

step :analyze_results do
  argument :results, result(:resilient_processing)

  run do |args|
    col = args[:results]

    # Iterate over successful results
    col.successes.each do |value|
      # 'value' is the direct return value of the map element
      puts "Success: #{value}"
    end

    # Iterate over failures
    col.failures.each do |error|
      # 'error' is the failure object/message itself
      puts "Error: #{error}"
    end

    # Note: Iterating the collection directly yields wrapped objects
    col.each do |result|
      if result.is_a?(RubyReactor::Success)
        puts "Wrapped Value: #{result.value}"
      else
        puts "Wrapped Error: #{result.error}"
      end
    end

    Success({
      success_count: col.successes.count,
      failure_count: col.failures.count
    })
  end
end
```
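
The difference from fail-fast can be sketched in plain Ruby: every element is attempted, and the outcomes are partitioned afterwards. This is a simplified illustration, not the `ResultEnumerator` itself; `collect_results` and the hash it returns are hypothetical.

```ruby
# Runs the block for every item, capturing exceptions instead of stopping,
# then splits the outcomes into successes and failures.
def collect_results(items)
  outcomes = items.map do |item|
    begin
      [:ok, yield(item)]
    rescue StandardError => e
      [:error, e.message]
    end
  end

  ok, failed = outcomes.partition { |status, _| status == :ok }
  { successes: ok.map(&:last), failures: failed.map(&:last) }
end

# Integer("x") raises ArgumentError, which becomes a collected failure.
summary = collect_results(["4", "x", "10"]) { |s| Integer(s) }
```

The element that fails to parse does not prevent `"10"` from being processed, which is the behavior `fail_fast false` is described as providing.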

## Retry Configuration

You can configure retries for individual steps within a map. This is particularly useful for transient failures (e.g., network timeouts) in async pipelines.

```ruby
map :reliable_processing do
  source input(:urls)
  argument :url, element(:reliable_processing)
  async true

  step :fetch_data do
    argument :url, input(:url)

    # Retry up to 3 times with exponential backoff
    retries max_attempts: 3, backoff: :exponential, base_delay: 1.second

    run do |args, _|
      # If this raises or returns Failure, it will be retried
      HttpClient.get(args[:url])
    end
  end

  returns :fetch_data
end
```

### Retry Behavior

- **Async Mode**: Retries are handled by requeuing the Sidekiq job with a delay. This is non-blocking and efficient.
- **Sync Mode**: Retries happen immediately within the execution thread (blocking).
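
A blocking retry loop with exponential backoff might look like the sketch below. Two things are assumptions rather than documented behavior: the delay formula `base_delay * 2**(attempt - 1)`, and treating `max_attempts` as the total number of attempts. `exponential_delay`, `with_retries`, and the injectable `sleeper` are hypothetical names used only for illustration.

```ruby
# Assumed formula; the gem's actual backoff calculation may differ.
def exponential_delay(attempt, base_delay: 1.0)
  base_delay * (2**(attempt - 1))
end

# Runs the block until it succeeds or `max_attempts` attempts have failed.
# `sleeper` is injectable so the example does not actually wait.
def with_retries(max_attempts:, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts # out of attempts: surface the error

    sleeper.call(exponential_delay(attempt, base_delay: base_delay))
    retry
  end
end

delays = []
calls = 0
value = with_retries(max_attempts: 3, sleeper: ->(seconds) { delays << seconds }) do |attempt|
  calls += 1
  raise "transient failure" if attempt < 3

  "ok"
end
```

In an async setup the waiting would not happen in the calling thread; as the bullet above notes, the job is requeued with a delay instead.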