simple_flow 0.1.0

Files changed (80)
  1. checksums.yaml +7 -0
  2. data/.envrc +1 -0
  3. data/.github/workflows/deploy-github-pages.yml +52 -0
  4. data/.rubocop.yml +57 -0
  5. data/CHANGELOG.md +4 -0
  6. data/COMMITS.md +196 -0
  7. data/LICENSE +21 -0
  8. data/README.md +481 -0
  9. data/Rakefile +15 -0
  10. data/benchmarks/parallel_vs_sequential.rb +98 -0
  11. data/benchmarks/pipeline_overhead.rb +130 -0
  12. data/docs/api/middleware.md +468 -0
  13. data/docs/api/parallel-step.md +363 -0
  14. data/docs/api/pipeline.md +382 -0
  15. data/docs/api/result.md +375 -0
  16. data/docs/concurrent/best-practices.md +687 -0
  17. data/docs/concurrent/introduction.md +246 -0
  18. data/docs/concurrent/parallel-steps.md +418 -0
  19. data/docs/concurrent/performance.md +481 -0
  20. data/docs/core-concepts/flow-control.md +452 -0
  21. data/docs/core-concepts/middleware.md +389 -0
  22. data/docs/core-concepts/overview.md +219 -0
  23. data/docs/core-concepts/pipeline.md +315 -0
  24. data/docs/core-concepts/result.md +168 -0
  25. data/docs/core-concepts/steps.md +391 -0
  26. data/docs/development/benchmarking.md +443 -0
  27. data/docs/development/contributing.md +380 -0
  28. data/docs/development/dagwood-concepts.md +435 -0
  29. data/docs/development/testing.md +514 -0
  30. data/docs/getting-started/examples.md +197 -0
  31. data/docs/getting-started/installation.md +62 -0
  32. data/docs/getting-started/quick-start.md +218 -0
  33. data/docs/guides/choosing-concurrency-model.md +441 -0
  34. data/docs/guides/complex-workflows.md +440 -0
  35. data/docs/guides/data-fetching.md +478 -0
  36. data/docs/guides/error-handling.md +635 -0
  37. data/docs/guides/file-processing.md +505 -0
  38. data/docs/guides/validation-patterns.md +496 -0
  39. data/docs/index.md +169 -0
  40. data/examples/.gitignore +3 -0
  41. data/examples/01_basic_pipeline.rb +112 -0
  42. data/examples/02_error_handling.rb +178 -0
  43. data/examples/03_middleware.rb +186 -0
  44. data/examples/04_parallel_automatic.rb +221 -0
  45. data/examples/05_parallel_explicit.rb +279 -0
  46. data/examples/06_real_world_ecommerce.rb +288 -0
  47. data/examples/07_real_world_etl.rb +277 -0
  48. data/examples/08_graph_visualization.rb +246 -0
  49. data/examples/09_pipeline_visualization.rb +266 -0
  50. data/examples/10_concurrency_control.rb +235 -0
  51. data/examples/11_sequential_dependencies.rb +243 -0
  52. data/examples/12_none_constant.rb +161 -0
  53. data/examples/README.md +374 -0
  54. data/examples/regression_test/01_basic_pipeline.txt +38 -0
  55. data/examples/regression_test/02_error_handling.txt +92 -0
  56. data/examples/regression_test/03_middleware.txt +61 -0
  57. data/examples/regression_test/04_parallel_automatic.txt +86 -0
  58. data/examples/regression_test/05_parallel_explicit.txt +80 -0
  59. data/examples/regression_test/06_real_world_ecommerce.txt +53 -0
  60. data/examples/regression_test/07_real_world_etl.txt +58 -0
  61. data/examples/regression_test/08_graph_visualization.txt +429 -0
  62. data/examples/regression_test/09_pipeline_visualization.txt +305 -0
  63. data/examples/regression_test/10_concurrency_control.txt +96 -0
  64. data/examples/regression_test/11_sequential_dependencies.txt +86 -0
  65. data/examples/regression_test/12_none_constant.txt +64 -0
  66. data/examples/regression_test.rb +105 -0
  67. data/lib/simple_flow/dependency_graph.rb +120 -0
  68. data/lib/simple_flow/dependency_graph_visualizer.rb +326 -0
  69. data/lib/simple_flow/middleware.rb +36 -0
  70. data/lib/simple_flow/parallel_executor.rb +80 -0
  71. data/lib/simple_flow/pipeline.rb +405 -0
  72. data/lib/simple_flow/result.rb +88 -0
  73. data/lib/simple_flow/step_tracker.rb +58 -0
  74. data/lib/simple_flow/version.rb +5 -0
  75. data/lib/simple_flow.rb +41 -0
  76. data/mkdocs.yml +146 -0
  77. data/pipeline_graph.dot +51 -0
  78. data/pipeline_graph.html +60 -0
  79. data/pipeline_graph.mmd +19 -0
  80. metadata +127 -0
@@ -0,0 +1,481 @@
1
+ # Performance Characteristics
2
+
3
+ Understanding the performance implications of parallel execution in SimpleFlow helps you make informed decisions about when and how to use concurrent execution.
4
+
5
+ ## Overview
6
+
7
+ SimpleFlow uses the `async` gem for concurrent execution. When the gem is available, parallel steps run in separate fibers, so I/O waits in different steps can overlap. When it is not available, SimpleFlow falls back to sequential execution.
8
+
9
+ ## Async Gem Integration
10
+
11
+ ### Checking Availability
12
+
13
+ ```ruby
14
+ pipeline = SimpleFlow::Pipeline.new
15
+ pipeline.async_available? # => true if async gem is installed
16
+ ```
17
+
18
+ ### Installation
19
+
20
+ Add to your Gemfile:
21
+
22
+ ```ruby
23
+ gem 'async', '~> 2.0'
24
+ ```
25
+
26
+ Then run:
27
+
28
+ ```bash
29
+ bundle install
30
+ ```
31
+
32
+ ### Fallback Behavior
33
+
34
+ If async is not available, SimpleFlow automatically falls back to sequential execution:
35
+
36
+ ```ruby
37
+ # With async gem
38
+ result = pipeline.call_parallel(data) # Executes in parallel
39
+
40
+ # Without async gem
41
+ result = pipeline.call_parallel(data) # Executes sequentially (automatically)
42
+ ```
43
+
44
+ The API remains identical, ensuring your code works in both scenarios.
45
+
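The fallback described above follows a common Ruby pattern for optional dependencies. The sketch below shows one way such a check could look; it is not SimpleFlow's actual implementation, and `ASYNC_AVAILABLE` and `run_all` are hypothetical names chosen for illustration.

```ruby
# Hypothetical sketch of an optional-dependency fallback; not SimpleFlow's code.
ASYNC_AVAILABLE =
  begin
    require 'async'
    true
  rescue LoadError
    false
  end

# Runs each callable and returns the results in order.
# Uses async tasks when the gem loaded, otherwise a plain sequential loop.
def run_all(callables)
  return callables.map(&:call) unless ASYNC_AVAILABLE

  Async do |task|
    callables.map { |c| task.async { c.call } }.map(&:wait)
  end.wait
end

results = run_all([-> { 1 + 1 }, -> { 'ok' }])
p results
```

Because both branches return the same array of results, callers do not need to know which path ran.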
46
+ ## When to Use Parallel Execution
47
+
48
+ ### Ideal Use Cases (I/O-Bound)
49
+
50
+ Parallel execution provides significant benefits for I/O-bound operations:
51
+
52
+ #### 1. Multiple API Calls
53
+
54
+ ```ruby
55
+ pipeline = SimpleFlow::Pipeline.new do
56
+ step :fetch_weather, ->(result) {
57
+ # I/O-bound: Network request
58
+ weather = WeatherAPI.fetch(result.value[:location])
59
+ result.with_context(:weather, weather).continue(result.value)
60
+ }, depends_on: []
61
+
62
+ step :fetch_news, ->(result) {
63
+ # I/O-bound: Network request
64
+ news = NewsAPI.fetch(result.value[:topic])
65
+ result.with_context(:news, news).continue(result.value)
66
+ }, depends_on: []
67
+
68
+ step :fetch_stocks, ->(result) {
69
+ # I/O-bound: Network request
70
+ stocks = StockAPI.fetch(result.value[:symbols])
71
+ result.with_context(:stocks, stocks).continue(result.value)
72
+ }, depends_on: []
73
+ end
74
+
75
+ # Sequential: ~300ms (100ms per API call)
76
+ # Parallel: ~100ms (all calls concurrent)
77
+ # Speedup: 3x
78
+ ```
79
+
80
+ #### 2. Database Queries
81
+
82
+ ```ruby
83
+ pipeline = SimpleFlow::Pipeline.new do
84
+ step :query_users, ->(result) {
85
+ # I/O-bound: Database query
86
+ users = DB[:users].where(active: true).all
87
+ result.with_context(:users, users).continue(result.value)
88
+ }, depends_on: []
89
+
90
+ step :query_posts, ->(result) {
91
+ # I/O-bound: Database query
92
+ posts = DB[:posts].where(published: true).all
93
+ result.with_context(:posts, posts).continue(result.value)
94
+ }, depends_on: []
95
+
96
+ step :query_comments, ->(result) {
97
+ # I/O-bound: Database query
98
+ comments = DB[:comments].where(approved: true).all
99
+ result.with_context(:comments, comments).continue(result.value)
100
+ }, depends_on: []
101
+ end
102
+
103
+ # Sequential: ~150ms (50ms per query)
104
+ # Parallel: ~50ms (all queries concurrent)
105
+ # Speedup: 3x
106
+ ```
107
+
108
+ #### 3. File Operations
109
+
110
+ ```ruby
111
+ pipeline = SimpleFlow::Pipeline.new do
112
+ step :read_config, ->(result) {
113
+ # I/O-bound: File read
114
+ config = JSON.parse(File.read('config.json'))
115
+ result.with_context(:config, config).continue(result.value)
116
+ }, depends_on: []
117
+
118
+ step :read_users, ->(result) {
119
+ # I/O-bound: File read
120
+ users = CSV.read('users.csv')
121
+ result.with_context(:users, users).continue(result.value)
122
+ }, depends_on: []
123
+
124
+ step :read_logs, ->(result) {
125
+ # I/O-bound: File read
126
+ logs = File.readlines('app.log')
127
+ result.with_context(:logs, logs).continue(result.value)
128
+ }, depends_on: []
129
+ end
130
+
131
+ # Sequential: ~300ms (100ms per file)
132
+ # Parallel: ~100ms (all reads concurrent)
133
+ # Speedup: 3x
134
+ ```
135
+
136
+ ### When NOT to Use Parallel Execution
137
+
138
+ #### 1. CPU-Bound Operations
139
+
140
+ Due to Ruby's Global Interpreter Lock (GIL, known in CRuby as the Global VM Lock or GVL), CPU-bound operations do not benefit from parallel execution:
141
+
142
+ ```ruby
143
+ # CPU-intensive calculations
144
+ pipeline = SimpleFlow::Pipeline.new do
145
+ step :calculate_fibonacci, ->(result) {
146
+ # CPU-bound: No I/O, pure computation
147
+ fib = fibonacci(result.value)
148
+ result.with_context(:fib, fib).continue(result.value)
149
+ }, depends_on: []
150
+
151
+ step :calculate_primes, ->(result) {
152
+ # CPU-bound: No I/O, pure computation
153
+ primes = find_primes(result.value)
154
+ result.with_context(:primes, primes).continue(result.value)
155
+ }, depends_on: []
156
+ end
157
+
158
+ # Sequential: ~200ms
159
+ # Parallel: ~200ms (no speedup due to GIL)
160
+ # Speedup: None
161
+ ```
162
+
163
+ **Recommendation**: Use sequential execution for CPU-bound tasks.
164
+
165
+ #### 2. Steps with Shared State
166
+
167
+ Avoid parallel execution when steps modify shared state:
168
+
169
+ ```ruby
170
+ # BAD: Race conditions
171
+ @counter = 0
172
+
173
+ pipeline = SimpleFlow::Pipeline.new do
174
+ step :increment_a, ->(result) {
175
+ @counter += 1 # Race condition!
176
+ result.continue(result.value)
177
+ }, depends_on: []
178
+
179
+ step :increment_b, ->(result) {
180
+ @counter += 1 # Race condition!
181
+ result.continue(result.value)
182
+ }, depends_on: []
183
+ end
184
+ ```
185
+
186
+ **Recommendation**: Design steps to be independent and use context for data sharing.
187
+
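One way to follow this recommendation is to have each step return its own contribution and merge the pieces afterwards. The sketch below uses plain lambdas and standard-library threads as stand-ins for pipeline steps, so it runs without SimpleFlow; the names are illustrative only.

```ruby
# Each lambda returns its own contribution instead of mutating shared state.
increment_a = -> { { a: 1 } }
increment_b = -> { { b: 1 } }

# Threads stand in for concurrent steps; Thread#value joins the thread
# and returns the block's result.
fragments = [increment_a, increment_b]
            .map { |step| Thread.new { step.call } }
            .map(&:value)

# Merging happens in one place, after all concurrent work has finished.
merged = fragments.reduce({}, :merge)
total  = merged.values.sum

p merged
p total
```

Since no step writes to a variable another step reads, the outcome does not depend on scheduling order.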
188
+ #### 3. Small, Fast Operations
189
+
190
+ Parallel execution has overhead. For very fast operations, the overhead may exceed the benefit:
191
+
192
+ ```ruby
193
+ pipeline = SimpleFlow::Pipeline.new do
194
+ step :upcase_string, ->(result) {
195
+ result.continue(result.value.upcase) # ~0.001ms
196
+ }, depends_on: []
197
+
198
+ step :reverse_string, ->(result) {
199
+ result.continue(result.value.reverse) # ~0.001ms
200
+ }, depends_on: []
201
+ end
202
+
203
+ # Sequential: ~0.002ms
204
+ # Parallel: ~0.5ms (overhead > benefit)
205
+ # Slowdown: 250x
206
+ ```
207
+
208
+ **Recommendation**: Use parallel execution only when individual steps take at least 10ms, and ideally closer to 100ms.
209
+
210
+ ## Performance Benchmarks
211
+
212
+ ### Test Setup
213
+
214
+ ```ruby
215
+ require 'benchmark'
216
+
217
+ # Simulate I/O delay
218
+ def simulate_io(duration_ms)
219
+ sleep(duration_ms / 1000.0)
220
+ end
221
+
222
+ # Simple pipeline with 3 parallel steps
223
+ pipeline = SimpleFlow::Pipeline.new do
224
+ step :task_a, ->(result) {
225
+ simulate_io(100)
226
+ result.with_context(:a, "done").continue(result.value)
227
+ }, depends_on: []
228
+
229
+ step :task_b, ->(result) {
230
+ simulate_io(100)
231
+ result.with_context(:b, "done").continue(result.value)
232
+ }, depends_on: []
233
+
234
+ step :task_c, ->(result) {
235
+ simulate_io(100)
236
+ result.with_context(:c, "done").continue(result.value)
237
+ }, depends_on: []
238
+ end
239
+
240
+ initial = SimpleFlow::Result.new(nil)
241
+ ```
242
+
243
+ ### Results
244
+
245
+ ```ruby
246
+ Benchmark.bm do |x|
247
+ x.report("Sequential:") { pipeline.call(initial) }
248
+ x.report("Parallel: ") { pipeline.call_parallel(initial) }
249
+ end
250
+ ```
251
+
252
+ Output:
253
+ ```
254
+ user system total real
255
+ Sequential: 0.000000 0.000000 0.000000 ( 0.301234)
256
+ Parallel: 0.000000 0.000000 0.000000 ( 0.101456)
257
+ ```
258
+
259
+ **Speedup**: 2.97x (nearly 3x for 3 parallel steps)
260
+
261
+ ### Complex Pipeline
262
+
263
+ ```ruby
264
+ # Multi-level pipeline (like e-commerce example)
265
+ # Level 1: 1 step (100ms)
266
+ # Level 2: 2 parallel steps (100ms each)
267
+ # Level 3: 1 step (100ms)
268
+ # Level 4: 2 parallel steps (100ms each)
269
+
270
+ # Sequential: 100 + 100 + 100 + 100 + 100 + 100 = 600ms
271
+ # Parallel: 100 + 100 + 100 + 100 = 400ms
272
+ # Speedup: 1.5x
273
+ ```
274
+
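The arithmetic above can be reproduced with a few lines of Ruby. This is a back-of-the-envelope model that ignores scheduling overhead: sequential time is the sum of all steps, and parallel time is the sum of the slowest step in each level.

```ruby
# Step durations in milliseconds, grouped by dependency level.
levels = [[100], [100, 100], [100], [100, 100]]

sequential_ms = levels.flatten.sum              # every step runs one after another
parallel_ms   = levels.sum { |level| level.max } # each level costs its slowest step
speedup       = sequential_ms.to_f / parallel_ms

puts "Sequential: #{sequential_ms}ms"
puts "Parallel:   #{parallel_ms}ms"
puts "Speedup:    #{speedup.round(2)}x"
```

The model also shows why the speedup is below 2x here: two of the four levels contain a single step and cannot be shortened.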
275
+ ## GIL Limitations
276
+
277
+ ### Understanding the GIL
278
+
279
+ Ruby's Global Interpreter Lock (GIL) allows only one thread to execute Ruby code at a time. This means:
280
+
281
+ 1. **I/O Operations**: Can run concurrently (I/O happens outside the GIL)
282
+ 2. **CPU Operations**: Cannot run concurrently (bound by GIL)
283
+
284
+ ### Example: I/O vs CPU
285
+
286
+ ```ruby
287
+ # I/O-bound: Benefits from parallelism
288
+ step :fetch_api, ->(result) {
289
+ # Network I/O releases GIL
290
+ response = HTTP.get("https://api.example.com")
291
+ result.with_context(:data, response).continue(result.value)
292
+ }
293
+
294
+ # CPU-bound: No benefit from parallelism
295
+ step :calculate, ->(result) {
296
+ # Pure Ruby computation holds GIL
297
+ sum = (1..1_000_000).reduce(:+)
298
+ result.continue(sum)
299
+ }
300
+ ```
301
+
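The I/O case can be observed with the standard library alone. The sketch below uses threads and `sleep` as a stand-in for a network call (an assumption for illustration; SimpleFlow itself is described as using fibers via `async`). Waiting in `sleep` does not hold the lock, so the waits overlap.

```ruby
# Monotonic wall-clock timer; returns elapsed seconds for the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

io_wait = -> { sleep(0.05) } # stand-in for a network call

sequential = elapsed { 3.times { io_wait.call } }
threaded   = elapsed { Array.new(3) { Thread.new { io_wait.call } }.each(&:join) }

puts format('sequential: %.3fs', sequential)
puts format('threaded:   %.3fs', threaded)
```

Replacing `sleep` with a pure-Ruby computation should remove most of the difference on CRuby, which is the behavior the CPU-bound example describes.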
302
+ ### Ruby Implementation Differences
303
+
304
+ Different Ruby implementations handle the GIL differently:
305
+
306
+ - **MRI (CRuby)**: Has GIL, I/O can be concurrent
307
+ - **JRuby**: No GIL, true parallelism for CPU tasks
308
+ - **TruffleRuby**: No GIL, true parallelism for CPU tasks
309
+
310
+ SimpleFlow works with all implementations, but performance characteristics vary.
311
+
312
+ ## Overhead Analysis
313
+
314
+ ### Parallel Execution Overhead
315
+
316
+ Parallel execution adds overhead from:
317
+
318
+ 1. **Task creation**: Creating async tasks
319
+ 2. **Synchronization**: Waiting for tasks to complete
320
+ 3. **Result merging**: Combining contexts and errors
321
+
322
+ ### Overhead Measurements
323
+
324
+ ```ruby
325
+ # Overhead for empty steps
326
+ pipeline = SimpleFlow::Pipeline.new do
327
+ (1..10).each do |i|
328
+ step "step_#{i}".to_sym, ->(result) {
329
+ result.continue(result.value)
330
+ }, depends_on: []
331
+ end
332
+ end
333
+
334
+ # Sequential: ~0.5ms
335
+ # Parallel: ~5ms
336
+ # Overhead: ~4.5ms
337
+ ```
338
+
339
+ **Guideline**: Parallel execution is worthwhile when:
340
+ - Each step takes > 10ms
341
+ - Multiple steps can run concurrently
342
+ - Steps are I/O-bound
343
+
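The guideline can be turned into a rough estimate. The helper below is a hypothetical sketch, not part of SimpleFlow: it assumes all steps are independent and that parallel execution adds a fixed overhead, which you would measure for your own pipeline. The sample inputs reuse the approximate figures quoted above.

```ruby
# Rough estimate for independent steps that run concurrently.
# overhead_ms is an assumed, measured input rather than a constant.
def estimate(step_durations_ms, overhead_ms:)
  sequential = step_durations_ms.sum
  parallel   = step_durations_ms.max + overhead_ms
  { sequential: sequential, parallel: parallel, worthwhile: parallel < sequential }
end

fast = estimate(Array.new(10, 0.05), overhead_ms: 4.5) # ten near-empty steps
slow = estimate(Array.new(3, 100), overhead_ms: 4.5)   # three 100ms I/O steps

p fast[:worthwhile]
p slow[:worthwhile]
```

Under these assumptions the near-empty steps lose to overhead, while the 100ms steps keep most of their theoretical gain.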
344
+ ## Optimization Strategies
345
+
346
+ ### 1. Batch Independent Operations
347
+
348
+ Group independent I/O operations for maximum concurrency:
349
+
350
+ ```ruby
351
+ # GOOD: Maximum parallelism
352
+ pipeline = SimpleFlow::Pipeline.new do
353
+ step :fetch_user_data, ->(result) { ... }, depends_on: []
354
+ step :fetch_product_data, ->(result) { ... }, depends_on: []
355
+ step :fetch_order_data, ->(result) { ... }, depends_on: []
356
+ step :fetch_shipping_data, ->(result) { ... }, depends_on: []
357
+ # All 4 run in parallel
358
+ end
359
+
360
+ # BAD: Artificial dependencies
361
+ pipeline = SimpleFlow::Pipeline.new do
362
+ step :fetch_user_data, ->(result) { ... }, depends_on: []
363
+ step :fetch_product_data, ->(result) { ... }, depends_on: [:fetch_user_data]
364
+ step :fetch_order_data, ->(result) { ... }, depends_on: [:fetch_product_data]
365
+ # All run sequentially (slower)
366
+ end
367
+ ```
368
+
369
+ ### 2. Minimize Context Size
370
+
371
+ Large contexts slow down result merging:
372
+
373
+ ```ruby
374
+ # GOOD: Only essential data
375
+ step :fetch_users, ->(result) {
376
+ users = fetch_all_users
377
+ user_ids = users.map { |u| u[:id] }
378
+ result.with_context(:user_ids, user_ids).continue(result.value)
379
+ }
380
+
381
+ # BAD: Large data structures
382
+ step :fetch_users, ->(result) {
383
+ users = fetch_all_users # Huge array
384
+ result.with_context(:all_users, users).continue(result.value)
385
+ }
386
+ ```
387
+
388
+ ### 3. Use Connection Pools
389
+
390
+ For database operations, use connection pooling:
391
+
392
+ ```ruby
393
+ # Configure connection pool
394
+ DB = Sequel.connect(
395
+ 'postgres://localhost/mydb',
396
+ max_connections: 10 # Allow concurrent queries
397
+ )
398
+
399
+ pipeline = SimpleFlow::Pipeline.new do
400
+ step :query_a, ->(result) {
401
+ # Uses connection from pool
402
+ data = DB[:table_a].all
403
+ result.with_context(:data_a, data).continue(result.value)
404
+ }, depends_on: []
405
+
406
+ step :query_b, ->(result) {
407
+ # Uses different connection from pool
408
+ data = DB[:table_b].all
409
+ result.with_context(:data_b, data).continue(result.value)
410
+ }, depends_on: []
411
+ end
412
+ ```
413
+
414
+ ### 4. Profile Before Optimizing
415
+
416
+ Measure actual performance before adding parallelism:
417
+
418
+ ```ruby
419
+ require 'benchmark'
420
+
421
+ # Test sequential
422
+ sequential_time = Benchmark.realtime do
423
+ pipeline.call(initial_result)
424
+ end
425
+
426
+ # Test parallel
427
+ parallel_time = Benchmark.realtime do
428
+ pipeline.call_parallel(initial_result)
429
+ end
430
+
431
+ speedup = sequential_time / parallel_time
432
+ puts "Speedup: #{speedup.round(2)}x"
433
+ ```
434
+
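A single measurement can be noisy, so it may help to repeat each run and compare medians. The helper below is a minimal sketch using only the standard library; `median_time` is a hypothetical name, and the `sleep` call stands in for `pipeline.call(initial_result)`.

```ruby
require 'benchmark'

# Runs the block `runs` times and returns the median wall-clock time in seconds.
def median_time(runs: 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }.sort
  mid = runs / 2
  runs.odd? ? times[mid] : (times[mid - 1] + times[mid]) / 2.0
end

work = -> { sleep(0.01) } # stand-in for pipeline.call(initial_result)
median = median_time(runs: 5) { work.call }

puts format('median: %.4fs', median)
```

Comparing the median of the sequential runs with the median of the parallel runs gives a speedup figure that is less sensitive to one-off pauses.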
435
+ ## Monitoring and Debugging
436
+
437
+ ### Execution Time Tracking
438
+
439
+ Add timing to your steps:
440
+
441
+ ```ruby
442
+ step :timed_operation, ->(result) {
443
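+ start = Process.clock_gettime(Process::CLOCK_MONOTONIC) # monotonic: unaffected by clock adjustments
444
+
445
+ # Your operation
446
+ data = perform_operation(result.value)
447
+
448
+ duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start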
449
+ result
450
+ .with_context(:operation_data, data)
451
+ .with_context(:operation_duration, duration)
452
+ .continue(result.value)
453
+ }
454
+ ```
455
+
456
+ ### Visualization
457
+
458
+ Use visualization tools to understand execution flow:
459
+
460
+ ```ruby
461
+ # View execution plan
462
+ puts pipeline.execution_plan
463
+
464
+ # Generate visual diagram
465
+ File.write('pipeline.dot', pipeline.visualize_dot)
466
+ # Run: dot -Tpng pipeline.dot -o pipeline.png
467
+ ```
468
+
469
+ ## Performance Testing
470
+
471
+ See `examples/04_parallel_automatic.rb` for performance comparisons showing:
472
+
473
+ - Parallel vs sequential execution times
474
+ - Context merging behavior
475
+ - Error handling overhead
476
+
477
+ ## Related Documentation
478
+
479
+ - [Parallel Steps](parallel-steps.md) - How to use named steps with dependencies
480
+ - [Best Practices](best-practices.md) - Recommended patterns for concurrent execution
481
+ - [Benchmarking Guide](../development/benchmarking.md) - How to benchmark your pipelines