simple_flow 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.envrc +1 -0
- data/.github/workflows/deploy-github-pages.yml +52 -0
- data/.rubocop.yml +57 -0
- data/CHANGELOG.md +4 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +481 -0
- data/Rakefile +15 -0
- data/benchmarks/parallel_vs_sequential.rb +98 -0
- data/benchmarks/pipeline_overhead.rb +130 -0
- data/docs/api/middleware.md +468 -0
- data/docs/api/parallel-step.md +363 -0
- data/docs/api/pipeline.md +382 -0
- data/docs/api/result.md +375 -0
- data/docs/concurrent/best-practices.md +687 -0
- data/docs/concurrent/introduction.md +246 -0
- data/docs/concurrent/parallel-steps.md +418 -0
- data/docs/concurrent/performance.md +481 -0
- data/docs/core-concepts/flow-control.md +452 -0
- data/docs/core-concepts/middleware.md +389 -0
- data/docs/core-concepts/overview.md +219 -0
- data/docs/core-concepts/pipeline.md +315 -0
- data/docs/core-concepts/result.md +168 -0
- data/docs/core-concepts/steps.md +391 -0
- data/docs/development/benchmarking.md +443 -0
- data/docs/development/contributing.md +380 -0
- data/docs/development/dagwood-concepts.md +435 -0
- data/docs/development/testing.md +514 -0
- data/docs/getting-started/examples.md +197 -0
- data/docs/getting-started/installation.md +62 -0
- data/docs/getting-started/quick-start.md +218 -0
- data/docs/guides/choosing-concurrency-model.md +441 -0
- data/docs/guides/complex-workflows.md +440 -0
- data/docs/guides/data-fetching.md +478 -0
- data/docs/guides/error-handling.md +635 -0
- data/docs/guides/file-processing.md +505 -0
- data/docs/guides/validation-patterns.md +496 -0
- data/docs/index.md +169 -0
- data/examples/.gitignore +3 -0
- data/examples/01_basic_pipeline.rb +112 -0
- data/examples/02_error_handling.rb +178 -0
- data/examples/03_middleware.rb +186 -0
- data/examples/04_parallel_automatic.rb +221 -0
- data/examples/05_parallel_explicit.rb +279 -0
- data/examples/06_real_world_ecommerce.rb +288 -0
- data/examples/07_real_world_etl.rb +277 -0
- data/examples/08_graph_visualization.rb +246 -0
- data/examples/09_pipeline_visualization.rb +266 -0
- data/examples/10_concurrency_control.rb +235 -0
- data/examples/11_sequential_dependencies.rb +243 -0
- data/examples/12_none_constant.rb +161 -0
- data/examples/README.md +374 -0
- data/examples/regression_test/01_basic_pipeline.txt +38 -0
- data/examples/regression_test/02_error_handling.txt +92 -0
- data/examples/regression_test/03_middleware.txt +61 -0
- data/examples/regression_test/04_parallel_automatic.txt +86 -0
- data/examples/regression_test/05_parallel_explicit.txt +80 -0
- data/examples/regression_test/06_real_world_ecommerce.txt +53 -0
- data/examples/regression_test/07_real_world_etl.txt +58 -0
- data/examples/regression_test/08_graph_visualization.txt +429 -0
- data/examples/regression_test/09_pipeline_visualization.txt +305 -0
- data/examples/regression_test/10_concurrency_control.txt +96 -0
- data/examples/regression_test/11_sequential_dependencies.txt +86 -0
- data/examples/regression_test/12_none_constant.txt +64 -0
- data/examples/regression_test.rb +105 -0
- data/lib/simple_flow/dependency_graph.rb +120 -0
- data/lib/simple_flow/dependency_graph_visualizer.rb +326 -0
- data/lib/simple_flow/middleware.rb +36 -0
- data/lib/simple_flow/parallel_executor.rb +80 -0
- data/lib/simple_flow/pipeline.rb +405 -0
- data/lib/simple_flow/result.rb +88 -0
- data/lib/simple_flow/step_tracker.rb +58 -0
- data/lib/simple_flow/version.rb +5 -0
- data/lib/simple_flow.rb +41 -0
- data/mkdocs.yml +146 -0
- data/pipeline_graph.dot +51 -0
- data/pipeline_graph.html +60 -0
- data/pipeline_graph.mmd +19 -0
- metadata +127 -0

data/docs/getting-started/installation.md
@@ -0,0 +1,62 @@
# Installation

## Requirements

- Ruby >= 2.7.0
- Bundler (recommended)

## Installation Methods

### Using Bundler (Recommended)

Add SimpleFlow to your `Gemfile`:

```ruby
gem 'simple_flow'
```

Then install:

```bash
bundle install
```

### Using RubyGems

Install directly with gem:

```bash
gem install simple_flow
```

## Dependencies

SimpleFlow has minimal dependencies:

- **async** (~> 2.0) - For concurrent execution support

All dependencies are automatically installed.

## Verifying Installation

After installation, verify SimpleFlow is working:

```ruby
require 'simple_flow'

pipeline = SimpleFlow::Pipeline.new do
  step ->(result) { result.continue("Hello, SimpleFlow!") }
end

result = pipeline.call(SimpleFlow::Result.new(nil))
puts result.value
# => "Hello, SimpleFlow!"
```

If this runs without errors, you're ready to go!
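
You can also confirm which version you have installed. The constant below is an assumption based on the conventional `version.rb` file the gem ships (`lib/simple_flow/version.rb`):

```ruby
require 'simple_flow'

# Assumes the gem follows the usual convention of defining SimpleFlow::VERSION.
puts SimpleFlow::VERSION
# => "0.1.0"
```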

## Next Steps

- [Quick Start Guide](quick-start.md) - Build your first pipeline
- [Examples](examples.md) - See SimpleFlow in action
- [Core Concepts](../core-concepts/overview.md) - Understand the fundamentals

data/docs/getting-started/quick-start.md
@@ -0,0 +1,218 @@
# Quick Start

Get up and running with SimpleFlow in 5 minutes!

## Your First Pipeline

```ruby
require 'simple_flow'

# Create a simple text processing pipeline
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) { result.continue(result.value.strip) }
  step ->(result) { result.continue(result.value.downcase) }
  step ->(result) { result.continue("Hello, #{result.value}!") }
end

# Execute the pipeline
result = pipeline.call(SimpleFlow::Result.new(" WORLD "))
puts result.value
# => "Hello, world!"
```

## Understanding the Basics

### Sequential Execution

**Steps execute in order, with each step automatically depending on the previous step's success.**

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) { puts "Step 1"; result.continue(result.value) }
  step ->(result) { puts "Step 2"; result.halt("error") }           # Stops here
  step ->(result) { puts "Step 3"; result.continue(result.value) }  # Never runs
end

result = pipeline.call(SimpleFlow::Result.new(nil))
# Output: Step 1
#         Step 2
# (Step 3 is skipped because Step 2 halted)
```

When any step halts (returns `result.halt`), the pipeline stops immediately and subsequent steps are not executed.
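
From the caller's side, you can detect this with the `continue?` predicate (a small illustrative addition; the same predicate is used in the Error Handling section below):

```ruby
result = pipeline.call(SimpleFlow::Result.new(nil))

# continue? is true only when no step halted, so the caller can branch on it.
if result.continue?
  puts "all steps ran"
else
  puts "pipeline halted early" # this branch runs for the example above
end
```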

### 1. Create a Result

A `Result` wraps your data:

```ruby
result = SimpleFlow::Result.new(42)
```

### 2. Define Steps

Steps are callable objects (usually lambdas) that transform results:

```ruby
step ->(result) {
  new_value = result.value * 2
  result.continue(new_value)
}
```
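
Lambdas are the common case, but any object that responds to `call` should also work as a step. The class below is an illustrative sketch of that idea, not an API taken from the gem's docs:

```ruby
# Hypothetical example: a step written as a plain class with a #call method,
# assuming the pipeline only ever invokes #call on each step.
class DoubleStep
  def call(result)
    result.continue(result.value * 2)
  end
end

# Used like any other step inside a pipeline definition:
#   step DoubleStep.new
```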

### 3. Build a Pipeline

Combine steps into a pipeline:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) { result.continue(result.value + 10) }
  step ->(result) { result.continue(result.value * 2) }
end
```

### 4. Execute

Call the pipeline with an initial result:

```ruby
final = pipeline.call(SimpleFlow::Result.new(5))
puts final.value # => 30 ((5 + 10) * 2)
```

## Adding Context

Track metadata throughout your pipeline:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) {
    result
      .with_context(:started_at, Time.now)
      .continue(result.value)
  }

  step ->(result) {
    result
      .with_context(:user, "Alice")
      .continue(result.value.upcase)
  }
end

result = pipeline.call(SimpleFlow::Result.new("hello"))
puts result.value   # => "HELLO"
puts result.context # => {:started_at=>..., :user=>"Alice"}
```

## Error Handling

Accumulate errors and halt execution:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) {
    age = result.value
    if age < 18
      result.halt.with_error(:age, "Must be 18 or older")
    else
      result.continue(age)
    end
  }

  step ->(result) {
    # This won't execute if age < 18
    result.continue("Approved for age #{result.value}")
  }
end

result = pipeline.call(SimpleFlow::Result.new(16))
puts result.continue? # => false
puts result.errors    # => {:age=>["Must be 18 or older"]}
```
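
Since the step above returns `result.halt(...).with_error(...)`, `with_error` evidently returns a result, so a single step can attach several errors before it hands the halted result back. The sketch below is illustrative only and assumes that chaining works:

```ruby
# Sketch only (not from the gem's docs): assumes with_error returns a new
# Result and can therefore be chained.
validate_signup = ->(result) {
  data    = result.value
  missing = %i[email age].reject { |field| data[field] }

  if missing.empty?
    result.continue(data)
  else
    # Halt once, then attach one error per missing field.
    missing.reduce(result.halt(data)) { |res, field| res.with_error(field, "is required") }
  end
}
```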

## Concurrent Execution

Run independent steps in parallel:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  parallel do
    step ->(result) { result.with_context(:a, fetch_data_a).continue(result.value) }
    step ->(result) { result.with_context(:b, fetch_data_b).continue(result.value) }
    step ->(result) { result.with_context(:c, fetch_data_c).continue(result.value) }
  end

  step ->(result) {
    # All three fetches completed concurrently
    result.continue("Aggregated data")
  }
end
```

## Middleware

Add cross-cutting concerns:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  use_middleware SimpleFlow::MiddleWare::Logging

  step ->(result) { result.continue(result.value + 1) }
  step ->(result) { result.continue(result.value * 2) }
end

# Logs before and after each step
```
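
The built-in Logging middleware is just one example of the wrapping idea; the middleware API docs cover the real contract. As a rough, hypothetical sketch (the interface here is assumed, not taken from the gem), a timing middleware might look like this:

```ruby
# Hypothetical sketch: assumes a middleware wraps each step as a callable,
# receiving the step at construction time and exposing #call(result).
# Check docs/api/middleware.md for the gem's actual contract.
class TimingMiddleware
  def initialize(step)
    @step = step
  end

  def call(result)
    started = Time.now
    outcome = @step.call(result)
    puts "step finished in #{(Time.now - started).round(3)}s"
    outcome
  end
end

# Hypothetical usage, mirroring the Logging example above:
#   use_middleware TimingMiddleware
```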

## Real-World Example

Here's a more complete example:

```ruby
require 'simple_flow'

# Define validation steps
validate_email = ->(result) {
  email = result.value[:email]
  if email && email.match?(/\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)+\z/i)
    result.continue(result.value)
  else
    result.halt(result.value).with_error(:email, "Invalid email format")
  end
}

validate_age = ->(result) {
  age = result.value[:age]
  if age && age >= 18
    result.continue(result.value)
  else
    result.halt(result.value).with_error(:age, "Must be 18 or older")
  end
}

# Build validation pipeline
validation_pipeline = SimpleFlow::Pipeline.new do
  step validate_email
  step validate_age
end

# Test with valid data
valid_data = { email: "alice@example.com", age: 25 }
result = validation_pipeline.call(SimpleFlow::Result.new(valid_data))
puts result.continue? # => true

# Test with invalid data
invalid_data = { email: "invalid", age: 16 }
result = validation_pipeline.call(SimpleFlow::Result.new(invalid_data))
puts result.continue? # => false
puts result.errors    # => {:email=>["Invalid email format"]}
```

## Next Steps

Now that you've got the basics, explore:

- [Examples](examples.md) - Real-world use cases
- [Core Concepts](../core-concepts/overview.md) - Deep dive into architecture
- [Concurrent Execution](../concurrent/introduction.md) - Maximize performance
- [Error Handling Guide](../guides/error-handling.md) - Advanced error patterns

data/docs/guides/choosing-concurrency-model.md
@@ -0,0 +1,441 @@
# Choosing a Concurrency Model

SimpleFlow supports two different approaches for parallel execution: Ruby threads and the async gem (fiber-based). This guide helps you choose the right one for your use case.

## Overview

You can control which concurrency model a pipeline uses in two ways:

### 1. Automatic Detection (Default)

When you create a pipeline without specifying concurrency:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  # steps...
end
```

SimpleFlow automatically uses the best available model:
- **Without async gem**: Uses Ruby's built-in threads
- **With async gem**: Uses fiber-based concurrency
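
Under the hood, this kind of auto-detection usually amounts to checking whether the async gem can be loaded. The snippet below is an illustrative sketch of that check, not SimpleFlow's actual implementation:

```ruby
# Illustrative only: one common way a library detects an optional gem.
def async_available?
  require 'async'
  true
rescue LoadError
  false
end

puts async_available? ? "will use fibers (async)" : "will use threads"
```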

### 2. Explicit Concurrency Selection

You can explicitly choose the concurrency model per pipeline:

```ruby
# Force threads (even if async gem is available)
pipeline = SimpleFlow::Pipeline.new(concurrency: :threads) do
  # steps...
end

# Force async (raises error if async gem not available)
pipeline = SimpleFlow::Pipeline.new(concurrency: :async) do
  # steps...
end

# Auto-detect (default behavior)
pipeline = SimpleFlow::Pipeline.new(concurrency: :auto) do
  # steps...
end
```

Both provide **actual parallel execution** - the difference is in how they achieve it and their resource characteristics.

## Ruby Threads (Without async gem)

### How It Works

- Creates actual OS threads (like having multiple workers)
- Each thread runs independently
- Ruby's GIL (Global Interpreter Lock) means only one thread runs Ruby code at a time
- **BUT**: When a thread waits for I/O (network, disk, database), other threads can run

### Best For

- **Simple use cases**: You just want things to run in parallel
- **Blocking I/O operations**:
  - Making HTTP requests to APIs
  - Reading/writing files
  - Database queries
  - Any "waiting" operations
- **Mixed libraries**: Works with any Ruby gem (doesn't need async support)
- **Small-to-medium concurrency**: 10-100 parallel operations

### Resource Usage

- Each thread uses ~1-2 MB of memory
- OS manages thread scheduling
- Limited by system resources (maybe 100-1,000 threads max)

### Example Scenario

```ruby
# Fetching data from 10 different APIs in parallel
pipeline = SimpleFlow::Pipeline.new do
  step :validate, validator, depends_on: []

  # These 10 API calls run in parallel with threads
  step :api_1, ->(r) { r.with_context(:api_1, fetch_api_1) }, depends_on: [:validate]
  step :api_2, ->(r) { r.with_context(:api_2, fetch_api_2) }, depends_on: [:validate]
  # ... 8 more API calls

  step :merge, merger, depends_on: [:api_1, :api_2, ...]
end

# Each API call takes 500ms, threads let them all wait simultaneously
# Total time: ~500ms instead of 5 seconds
result = pipeline.call_parallel(initial_data)
```

---

## Async Gem (Fiber-based)

### How It Works

- Uses Ruby "fibers" (lightweight green threads)
- Cooperative scheduling (fibers yield control when waiting)
- Event loop manages thousands of concurrent operations
- Requires async-aware libraries (async-http, async-postgres, etc.)

### Best For

- **High concurrency**: Thousands of simultaneous operations
- **I/O-heavy applications**: Web scrapers, API gateways, chat servers
- **Long-running services**: Background workers processing many jobs
- **Async-compatible stack**: When using async-aware gems

### Resource Usage

- Each fiber uses ~4-8 KB of memory (250x lighter than threads!)
- Can handle 10,000+ concurrent operations
- More efficient CPU and memory usage

### Example Scenario

```ruby
# Web scraper fetching 10,000 product pages
require 'async'
require 'async/http/internet'

pipeline = SimpleFlow::Pipeline.new do
  step :load_urls, url_loader, depends_on: []

  # With async gem, can handle thousands of concurrent requests
  step :fetch_pages, ->(result) {
    urls = result.value[:urls]
    pages = Async::HTTP::Internet.new.get_all(urls)
    result.with_context(:pages, pages).continue(result.value)
  }, depends_on: [:load_urls]

  step :parse_data, parser, depends_on: [:fetch_pages]
end

# With threads: Would crash or be very slow (10,000 threads = 10+ GB RAM)
# With async: Handles it smoothly (10,000 fibers = ~80 MB RAM)
result = pipeline.call_parallel(initial_data)
```

---

## Decision Guide

### Use Threads (no async gem) when:

✅ You have **10-100 parallel operations**
✅ Using **standard Ruby gems** (not async-compatible)
✅ Making **database queries** or **HTTP requests** with traditional libraries
✅ You want **simple, straightforward code**
✅ Building **internal tools** or **scripts**

**Example:**
```ruby
# E-commerce checkout: Check inventory, calculate shipping, process payment
# 3-5 parallel operations, standard libraries

# Option 1: Auto-detect (uses threads since no async gem needed)
pipeline = SimpleFlow::Pipeline.new do
  step :validate_order, validator, depends_on: []
  step :check_inventory, inventory_checker, depends_on: [:validate_order]
  step :calculate_shipping, shipping_calculator, depends_on: [:validate_order]
  step :process_payment, payment_processor, depends_on: [:check_inventory, :calculate_shipping]
end

# Option 2: Explicitly use threads (works even if async gem is installed)
pipeline = SimpleFlow::Pipeline.new(concurrency: :threads) do
  step :validate_order, validator, depends_on: []
  step :check_inventory, inventory_checker, depends_on: [:validate_order]
  step :calculate_shipping, shipping_calculator, depends_on: [:validate_order]
  step :process_payment, payment_processor, depends_on: [:check_inventory, :calculate_shipping]
end

result = pipeline.call_parallel(order) # ✅ Threads work great
```

### Use Async (add async gem) when:

✅ You need **1,000+ concurrent operations**
✅ Building **high-performance web services**
✅ Processing **large-scale I/O operations** (web scraping, bulk APIs)
✅ Using **async-compatible libraries** (async-http, async-postgres)
✅ Optimizing **resource usage** (hosting costs, memory limits)

**Example:**
```ruby
# Monitoring service checking 5,000 endpoints every minute
# Need low memory footprint and high concurrency

# Gemfile:
gem 'async', '~> 2.0'
gem 'async-http', '~> 0.60'

# Explicitly require async concurrency for this high-volume pipeline
pipeline = SimpleFlow::Pipeline.new(concurrency: :async) do
  step :load_endpoints, endpoint_loader, depends_on: []

  # Async gem allows 5,000 concurrent health checks efficiently
  step :check_all, health_checker, depends_on: [:load_endpoints]

  step :aggregate_results, aggregator, depends_on: [:check_all]
end

result = pipeline.call_parallel(config) # ✅ Async is essential
# Raises error if async gem not installed
```

---

## Quick Comparison Table

| Factor | Ruby Threads | Async Gem |
|--------|-------------|-----------|
| **Setup** | None (built-in) | `gem 'async'` |
| **Concurrency Limit** | ~100-1,000 | ~10,000+ |
| **Memory per operation** | 1-2 MB | 4-8 KB |
| **Library compatibility** | Any Ruby gem | Needs async-aware gems |
| **Learning curve** | Simple | Moderate |
| **Speed (I/O)** | Fast | Faster |
| **Speed (CPU)** | GIL-limited | GIL-limited (same) |
| **Best use case** | Standard apps | High-concurrency services |

---

## Real-World Analogy

**Threads** = Hiring separate workers
- Each worker has their own desk, phone, computer (more resources)
- Can have 50-100 workers before office gets crowded
- Workers use regular tools everyone knows
- Easy to manage

**Async** = One worker with a really efficient task list
- Worker rapidly switches between tasks when waiting
- Can juggle 10,000 tasks because they're mostly waiting anyway
- Needs special tools designed for rapid task-switching
- More efficient but requires planning

---

## Switching Between Models

The beauty of SimpleFlow is that you can switch between concurrency models without changing your pipeline code:

### Starting with Threads

```ruby
# Gemfile - no async gem
gem 'simple_flow'

# Your pipeline code
pipeline = SimpleFlow::Pipeline.new do
  step :fetch_user, user_fetcher, depends_on: []
  step :fetch_orders, order_fetcher, depends_on: [:fetch_user]
  step :fetch_products, product_fetcher, depends_on: [:fetch_user]
end

result = pipeline.call_parallel(data) # Uses threads
```

### Upgrading to Async

```ruby
# Gemfile - add async gem
gem 'simple_flow'
gem 'async', '~> 2.0'

# Same pipeline code - no changes needed!
pipeline = SimpleFlow::Pipeline.new do
  step :fetch_user, user_fetcher, depends_on: []
  step :fetch_orders, order_fetcher, depends_on: [:fetch_user]
  step :fetch_products, product_fetcher, depends_on: [:fetch_user]
end

result = pipeline.call_parallel(data) # Now uses async automatically
```

### Mixing Concurrency Models in One Application

You can use different concurrency models for different pipelines in the same application:

```ruby
# Gemfile - include async for high-volume pipelines
gem 'simple_flow'
gem 'async', '~> 2.0'

# Low-volume pipeline: Use threads for simplicity
user_pipeline = SimpleFlow::Pipeline.new(concurrency: :threads) do
  step :validate, validator, depends_on: []
  step :fetch_profile, profile_fetcher, depends_on: [:validate]
  step :fetch_preferences, prefs_fetcher, depends_on: [:validate]
end

# High-volume pipeline: Use async for efficiency
monitoring_pipeline = SimpleFlow::Pipeline.new(concurrency: :async) do
  step :load_endpoints, endpoint_loader, depends_on: []
  step :check_all, health_checker, depends_on: [:load_endpoints]
  step :alert, alerter, depends_on: [:check_all]
end

# Each pipeline uses its configured concurrency model
user_result = user_pipeline.call_parallel(user_data)          # Uses threads
monitoring_result = monitoring_pipeline.call_parallel(config) # Uses async
```

This allows you to optimize each pipeline based on its specific requirements!

---

## Performance Characteristics

### I/O-Bound Operations

Both threads and async excel at I/O-bound operations (network, disk, database):

```ruby
# API calls, database queries, file operations
# Both models provide significant speedup over sequential execution

# Sequential: 10 API calls × 200ms = 2000ms
# Threads:    10 API calls in parallel = ~200ms
# Async:      10 API calls in parallel = ~200ms

# Winner: Tie (both are fast for moderate I/O)
```
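
You can see this on your own machine by timing a pipeline whose steps simulate slow I/O. A rough sketch using Ruby's built-in `Benchmark` module (sleep stands in for a 200ms API call; the initial argument is passed as a `Result` here to mirror `Pipeline#call` in the Quick Start, so adjust if `call_parallel` expects a raw value):

```ruby
require 'benchmark'
require 'simple_flow'

# Three independent "API calls" of ~200ms each; sleep releases the GIL,
# so threads (or fibers) can all wait at the same time.
slow_step = ->(result) { sleep(0.2); result.continue(result.value) }

pipeline = SimpleFlow::Pipeline.new do
  step :a, slow_step, depends_on: []
  step :b, slow_step, depends_on: []
  step :c, slow_step, depends_on: []
end

elapsed = Benchmark.realtime { pipeline.call_parallel(SimpleFlow::Result.new(nil)) }
puts format("3 x 200ms steps took %.2fs in parallel", elapsed) # roughly 0.2s, not 0.6s
```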

### High Concurrency (1000+ operations)

Async shines when dealing with thousands of concurrent operations:

```ruby
# 5,000 concurrent HTTP requests

# Threads: 5,000 threads × 1.5 MB = 7.5 GB RAM ❌
# Async:   5,000 fibers  × 6 KB   = 30 MB RAM  ✅

# Winner: Async (dramatically lower resource usage)
```

### CPU-Bound Operations

Neither model helps with pure CPU work due to Ruby's GIL:

```ruby
# Heavy computation (image processing, data crunching)
# GIL ensures only one thread/fiber does CPU work at a time

# Sequential: 1000ms
# Threads:    1000ms (GIL limitation)
# Async:      1000ms (GIL limitation)

# Winner: None (use process-based parallelism for CPU work)
```
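
If you do need to parallelize CPU-heavy work, plain Ruby processes sidestep the GIL entirely. This is a generic Ruby sketch (not a SimpleFlow feature) of the kind of thing you could do inside a step; note that `fork` is unavailable on Windows and JRuby:

```ruby
# Generic Ruby, not part of SimpleFlow: fork one process per chunk of CPU work.
# Each child process has its own GIL, so the chunks really run in parallel.
chunks = (1..4).to_a

pids = chunks.map do |chunk|
  Process.fork do
    # CPU-heavy work for this chunk goes here.
    1_000_000.times { Math.sqrt(chunk * 123.456) }
  end
end

pids.each { |pid| Process.wait(pid) }
```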

---

## Common Questions

### Q: Can I use both in the same application?

**A:** Yes! SimpleFlow automatically detects if async is available and uses it. Different pipelines in the same app can use different models.

### Q: Do I need to change my code to switch models?

**A:** No! Just add or remove the `async` gem from your Gemfile. Your pipeline code stays the same.

### Q: What if I'm not sure which to use?

**A:** Start without async (use threads). It's simpler and works great for most use cases. Add async later if you need it.

### Q: Can I check which model is being used?

**A:** Yes! Use the `async_available?` method:

```ruby
pipeline = SimpleFlow::Pipeline.new
puts "Using async: #{pipeline.async_available?}"
```

### Q: Are there any compatibility issues with async?

**A:** Async requires async-aware libraries for best results:
- Use `async-http` instead of `net/http` or `httparty`
- Use `async-postgres` instead of `pg`
- Check if your favorite gems have async versions

With threads, any Ruby gem works out of the box.

---

## Recommendations

### For Most Users

**Start with threads (no async gem):**
- Simpler setup
- Works with any library
- Sufficient for most applications
- Easy to understand and debug

### Upgrade to Async When

You experience any of these:
- ⚠️ High memory usage from threads
- ⚠️ Need more than 100 concurrent operations
- ⚠️ Building high-throughput services
- ⚠️ Already using async-compatible libraries
- ⚠️ Hosting costs driven by memory usage

### Migration Path

1. **Start**: Build with threads (no dependencies)
2. **Measure**: Profile your application under realistic load
3. **Decide**: If you hit thread limits, add async gem
4. **Switch**: Just add gem to Gemfile, no code changes
5. **Optimize**: Gradually adopt async-aware libraries for better performance

---

## Next Steps

- [Parallel Execution](../concurrent/parallel-steps.md) - Deep dive into parallel execution patterns
- [Performance](../concurrent/performance.md) - Benchmarking and optimization tips
- [Best Practices](../concurrent/best-practices.md) - Concurrent programming patterns
- [Error Handling](error-handling.md) - Handling errors in parallel pipelines

---

## Summary

| Your Scenario | Recommendation |
|--------------|----------------|
| Building internal tools, scripts | ✅ **Threads** (no async) |
| Standard web app with DB queries | ✅ **Threads** (no async) |
| Processing 10-100 parallel tasks | ✅ **Threads** (no async) |
| High-volume API gateway | ✅ **Async** (add gem) |
| Web scraper (1000+ requests) | ✅ **Async** (add gem) |
| Real-time chat/notifications | ✅ **Async** (add gem) |
| Background job processor | ✅ **Async** (add gem) |

**Remember:** You can always start simple (threads) and upgrade to async later without changing your pipeline code!