simple_flow 0.1.0

Files changed (80)
  1. checksums.yaml +7 -0
  2. data/.envrc +1 -0
  3. data/.github/workflows/deploy-github-pages.yml +52 -0
  4. data/.rubocop.yml +57 -0
  5. data/CHANGELOG.md +4 -0
  6. data/COMMITS.md +196 -0
  7. data/LICENSE +21 -0
  8. data/README.md +481 -0
  9. data/Rakefile +15 -0
  10. data/benchmarks/parallel_vs_sequential.rb +98 -0
  11. data/benchmarks/pipeline_overhead.rb +130 -0
  12. data/docs/api/middleware.md +468 -0
  13. data/docs/api/parallel-step.md +363 -0
  14. data/docs/api/pipeline.md +382 -0
  15. data/docs/api/result.md +375 -0
  16. data/docs/concurrent/best-practices.md +687 -0
  17. data/docs/concurrent/introduction.md +246 -0
  18. data/docs/concurrent/parallel-steps.md +418 -0
  19. data/docs/concurrent/performance.md +481 -0
  20. data/docs/core-concepts/flow-control.md +452 -0
  21. data/docs/core-concepts/middleware.md +389 -0
  22. data/docs/core-concepts/overview.md +219 -0
  23. data/docs/core-concepts/pipeline.md +315 -0
  24. data/docs/core-concepts/result.md +168 -0
  25. data/docs/core-concepts/steps.md +391 -0
  26. data/docs/development/benchmarking.md +443 -0
  27. data/docs/development/contributing.md +380 -0
  28. data/docs/development/dagwood-concepts.md +435 -0
  29. data/docs/development/testing.md +514 -0
  30. data/docs/getting-started/examples.md +197 -0
  31. data/docs/getting-started/installation.md +62 -0
  32. data/docs/getting-started/quick-start.md +218 -0
  33. data/docs/guides/choosing-concurrency-model.md +441 -0
  34. data/docs/guides/complex-workflows.md +440 -0
  35. data/docs/guides/data-fetching.md +478 -0
  36. data/docs/guides/error-handling.md +635 -0
  37. data/docs/guides/file-processing.md +505 -0
  38. data/docs/guides/validation-patterns.md +496 -0
  39. data/docs/index.md +169 -0
  40. data/examples/.gitignore +3 -0
  41. data/examples/01_basic_pipeline.rb +112 -0
  42. data/examples/02_error_handling.rb +178 -0
  43. data/examples/03_middleware.rb +186 -0
  44. data/examples/04_parallel_automatic.rb +221 -0
  45. data/examples/05_parallel_explicit.rb +279 -0
  46. data/examples/06_real_world_ecommerce.rb +288 -0
  47. data/examples/07_real_world_etl.rb +277 -0
  48. data/examples/08_graph_visualization.rb +246 -0
  49. data/examples/09_pipeline_visualization.rb +266 -0
  50. data/examples/10_concurrency_control.rb +235 -0
  51. data/examples/11_sequential_dependencies.rb +243 -0
  52. data/examples/12_none_constant.rb +161 -0
  53. data/examples/README.md +374 -0
  54. data/examples/regression_test/01_basic_pipeline.txt +38 -0
  55. data/examples/regression_test/02_error_handling.txt +92 -0
  56. data/examples/regression_test/03_middleware.txt +61 -0
  57. data/examples/regression_test/04_parallel_automatic.txt +86 -0
  58. data/examples/regression_test/05_parallel_explicit.txt +80 -0
  59. data/examples/regression_test/06_real_world_ecommerce.txt +53 -0
  60. data/examples/regression_test/07_real_world_etl.txt +58 -0
  61. data/examples/regression_test/08_graph_visualization.txt +429 -0
  62. data/examples/regression_test/09_pipeline_visualization.txt +305 -0
  63. data/examples/regression_test/10_concurrency_control.txt +96 -0
  64. data/examples/regression_test/11_sequential_dependencies.txt +86 -0
  65. data/examples/regression_test/12_none_constant.txt +64 -0
  66. data/examples/regression_test.rb +105 -0
  67. data/lib/simple_flow/dependency_graph.rb +120 -0
  68. data/lib/simple_flow/dependency_graph_visualizer.rb +326 -0
  69. data/lib/simple_flow/middleware.rb +36 -0
  70. data/lib/simple_flow/parallel_executor.rb +80 -0
  71. data/lib/simple_flow/pipeline.rb +405 -0
  72. data/lib/simple_flow/result.rb +88 -0
  73. data/lib/simple_flow/step_tracker.rb +58 -0
  74. data/lib/simple_flow/version.rb +5 -0
  75. data/lib/simple_flow.rb +41 -0
  76. data/mkdocs.yml +146 -0
  77. data/pipeline_graph.dot +51 -0
  78. data/pipeline_graph.html +60 -0
  79. data/pipeline_graph.mmd +19 -0
  80. metadata +127 -0
@@ -0,0 +1,315 @@
1
+ # Pipeline
2
+
3
+ The `Pipeline` class is the orchestrator that manages the execution of steps in your data processing workflow.
4
+
5
+ ## Overview
6
+
7
+ A Pipeline defines a sequence of operations (steps) that transform data, with support for:
8
+
9
+ - Sequential execution with automatic dependencies
10
+ - Parallel execution (automatic and explicit)
11
+ - Middleware integration
12
+ - Short-circuit evaluation
13
+ - Explicit dependency management
14
+
15
+ ## Execution Modes
16
+
17
+ SimpleFlow pipelines support two distinct execution modes:
18
+
19
+ ### Sequential Execution (Default)
20
+
21
+ **Unnamed steps execute in order, with each step automatically depending on the previous step's success.**
22
+
23
+ When a step halts (returns `result.halt`), the pipeline immediately stops and subsequent steps are not executed.
24
+
25
+ ```ruby
26
+ pipeline = SimpleFlow::Pipeline.new do
27
+ step ->(result) { puts "Step 1"; result.continue(result.value) }
28
+ step ->(result) { puts "Step 2"; result.halt("stopped") }
29
+ step ->(result) { puts "Step 3"; result.continue(result.value) } # NEVER EXECUTES
30
+ end
31
+
32
+ result = pipeline.call(SimpleFlow::Result.new(nil))
33
+ # Output:
34
+ # Step 1
35
+ # Step 2
36
+ # (Step 3 is skipped because Step 2 halted)
37
+ ```
38
+
39
+ This automatic dependency chain means:
40
+ - Steps execute one at a time in the order they were defined
41
+ - Each step receives the result from the previous step
42
+ - If any step halts, the entire pipeline stops immediately
43
+ - No need to specify dependencies for sequential workflows
44
+
45
+ ### Parallel Execution
46
+
47
+ **Named steps with explicit dependencies can run concurrently using `call_parallel`.**
48
+
49
+ ```ruby
50
+ pipeline = SimpleFlow::Pipeline.new do
51
+ step :validate, validator, depends_on: []
52
+ step :fetch_a, fetcher_a, depends_on: [:validate] # Runs in parallel with fetch_b
53
+ step :fetch_b, fetcher_b, depends_on: [:validate] # Runs in parallel with fetch_a
54
+ step :merge, merger, depends_on: [:fetch_a, :fetch_b]
55
+ end
56
+
57
+ result = pipeline.call_parallel(initial_data)
58
+ ```
59
+
60
+ See [Parallel Execution](#parallel-execution) below for details.
61
+
62
+ ## Basic Usage
63
+
64
+ ```ruby
65
+ require 'simple_flow'
66
+
67
+ pipeline = SimpleFlow::Pipeline.new do
68
+ step ->(result) { result.continue(result.value * 2) }
69
+ step ->(result) { result.continue(result.value + 10) }
70
+ step ->(result) { result.continue(result.value.to_s) }
71
+ end
72
+
73
+ result = pipeline.call(SimpleFlow::Result.new(5))
74
+ result.value # => "20"
75
+ ```
76
+
77
+ ## Defining Steps
78
+
79
+ ### Lambda Steps
80
+
81
+ ```ruby
82
+ pipeline = SimpleFlow::Pipeline.new do
83
+ step ->(result) do
84
+ # Process the result
85
+ new_value = transform(result.value)
86
+ result.continue(new_value)
87
+ end
88
+ end
89
+ ```
90
+
91
+ ### Method Steps
92
+
93
+ ```ruby
94
+ def validate_user(result)
95
+ if !result.value[:email].to_s.strip.empty? # plain-Ruby check; `present?` would require ActiveSupport
96
+ result.continue(result.value)
97
+ else
98
+ result.with_error(:validation, 'Email required').halt
99
+ end
100
+ end
101
+
102
+ pipeline = SimpleFlow::Pipeline.new do
103
+ step method(:validate_user)
104
+ end
105
+ ```
106
+
107
+ ### Callable Objects
108
+
109
+ ```ruby
110
+ class EmailValidator
111
+ def call(result)
112
+ # Validation logic
113
+ result.continue(result.value)
114
+ end
115
+ end
116
+
117
+ pipeline = SimpleFlow::Pipeline.new do
118
+ step EmailValidator.new
119
+ end
120
+ ```
121
+
122
+ ## Named Steps with Dependencies
123
+
124
+ For parallel execution, you can define named steps with explicit dependencies:
125
+
126
+ ```ruby
127
+ pipeline = SimpleFlow::Pipeline.new do
128
+ step :validate, ->(result) { validate(result) }, depends_on: []
129
+ step :fetch_user, ->(result) { fetch_user(result) }, depends_on: [:validate]
130
+ step :fetch_orders, ->(result) { fetch_orders(result) }, depends_on: [:validate]
131
+ step :calculate, ->(result) { calculate(result) }, depends_on: [:fetch_user, :fetch_orders]
132
+ end
133
+ ```
134
+
135
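+ Steps become eligible to run once all of their dependencies have completed, so steps that share the same dependencies (here `:fetch_user` and `:fetch_orders`) run in parallel automatically when the pipeline is executed with `call_parallel`.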
136
+
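To make the grouping rule concrete, here is a minimal plain-Ruby sketch of how step names could be arranged into parallel groups from their `depends_on` lists. It does not use SimpleFlow's own dependency graph; the `parallel_groups` helper and the hash layout are assumptions made purely for illustration.

```ruby
# Hypothetical helper, not part of SimpleFlow: groups step names into
# "levels" where every step in a level has all of its dependencies
# satisfied by the levels before it.
def parallel_groups(dependencies)
  remaining = dependencies.dup
  done = []
  groups = []

  until remaining.empty?
    # A step is ready once every one of its dependencies has completed.
    ready = remaining.select { |_name, deps| (deps - done).empty? }.keys
    raise ArgumentError, "circular or missing dependency" if ready.empty?

    groups << ready
    done.concat(ready)
    ready.each { |name| remaining.delete(name) }
  end

  groups
end

deps = {
  validate: [],
  fetch_user: [:validate],
  fetch_orders: [:validate],
  calculate: [:fetch_user, :fetch_orders]
}

groups = parallel_groups(deps)
p groups
```

Each inner array is a set of steps that could run concurrently; the outer array gives the order in which those sets would run.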
137
+ ## Parallel Execution
138
+
139
+ ### Automatic Parallelization
140
+
141
+ ```ruby
142
+ # These will run in parallel (both depend only on :validate)
143
+ pipeline = SimpleFlow::Pipeline.new do
144
+ step :validate, validator, depends_on: []
145
+ step :fetch_orders, fetch_orders_callable, depends_on: [:validate]
146
+ step :fetch_products, fetch_products_callable, depends_on: [:validate]
147
+ end
148
+
149
+ result = pipeline.call_parallel(initial_result)
150
+ ```
151
+
152
+ ### Explicit Parallel Blocks
153
+
154
+ ```ruby
155
+ pipeline = SimpleFlow::Pipeline.new do
156
+ # Sequential step
157
+ step ->(result) { validate(result) }
158
+
159
+ # These run in parallel
160
+ parallel do
161
+ step ->(result) { fetch_from_api(result) }
162
+ step ->(result) { fetch_from_cache(result) }
163
+ step ->(result) { fetch_from_database(result) }
164
+ end
165
+
166
+ # Sequential step
167
+ step ->(result) { merge_results(result) }
168
+ end
169
+ ```
170
+
171
+ ## Short-Circuit Evaluation
172
+
173
+ **Pipelines automatically stop executing when a step halts.** This is a core feature of sequential execution - each unnamed step implicitly depends on the previous step's success.
174
+
175
+ ```ruby
176
+ pipeline = SimpleFlow::Pipeline.new do
177
+ step ->(result) { result.continue("step 1") }
178
+ step ->(result) { result.halt("stopped") } # Execution stops here
179
+ step ->(result) { result.continue("step 3") } # Never executed
180
+ end
181
+
182
+ result = pipeline.call(SimpleFlow::Result.new(nil))
183
+ result.value # => "stopped"
184
+ result.continue? # => false
185
+ ```
186
+
187
+ **Implementation detail:** The `call` method checks `continue?` on the current result before invoking each step. If it returns `false`, the pipeline returns that result immediately without executing the remaining steps:
188
+
189
+ ```ruby
190
+ # Simplified view of Pipeline#call
191
+ def call(result)
192
+ steps.reduce(result) do |res, step|
193
+ return res unless res.continue? # Short-circuit on halt
194
+ step.call(res)
195
+ end
196
+ end
197
+ ```
198
+
199
+ This behavior ensures:
200
+ - **Fail-fast**: Errors stop processing immediately
201
+ - **Resource efficiency**: No wasted computation on already-failed results
202
+ - **Predictable flow**: Clear execution path based on step outcomes
203
+
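The short-circuit behavior described above can be tried without the gem. The sketch below uses `MiniResult`, a simplified stand-in for `SimpleFlow::Result`, and a hypothetical `run_steps` helper; both are assumptions for illustration and only model the value and the continue flag.

```ruby
# MiniResult is a simplified stand-in for SimpleFlow::Result (an assumption
# for this sketch); it only tracks a value and whether to keep going.
MiniResult = Struct.new(:value, :continuing) do
  def continue(new_value)
    MiniResult.new(new_value, true)
  end

  def halt(new_value = nil)
    MiniResult.new(new_value.nil? ? value : new_value, false)
  end

  def continue?
    continuing
  end
end

# Hypothetical runner: calls the steps in order and stops at the first
# result whose continue flag is false.
def run_steps(steps, result)
  steps.each do |step|
    break unless result.continue?
    result = step.call(result)
  end
  result
end

executed = []
steps = [
  ->(r) { executed << 1; r.continue("step 1") },
  ->(r) { executed << 2; r.halt("stopped") },
  ->(r) { executed << 3; r.continue("step 3") }
]

final = run_steps(steps, MiniResult.new(nil, true))
p executed        # which steps actually ran
p final.value
p final.continue?
```

The third lambda is never called, which is the fail-fast property in miniature: once a step halts, later steps cost nothing.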
204
+ ## Middleware
205
+
206
+ Apply cross-cutting concerns using middleware:
207
+
208
+ ```ruby
209
+ pipeline = SimpleFlow::Pipeline.new do
210
+ use_middleware SimpleFlow::MiddleWare::Logging
211
+ use_middleware SimpleFlow::MiddleWare::Instrumentation, api_key: 'my-key'
212
+
213
+ step ->(result) { process(result) }
214
+ end
215
+ ```
216
+
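Conceptually, a middleware wraps a callable and adds behavior around its `call`. The sketch below shows that decorator shape in plain Ruby. `TimingMiddleware` and its constructor signature (a callable plus keyword options) are assumptions for illustration, not the gem's documented interface, and a bare integer stands in for a `Result`.

```ruby
# Hypothetical middleware written as a plain decorator. The constructor
# signature (callable plus keyword options) is an assumption for this sketch.
class TimingMiddleware
  def initialize(callable, label: "step")
    @callable = callable
    @label = label
  end

  # Times the wrapped callable and returns its output unchanged.
  def call(result)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    output = @callable.call(result)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    puts format("%s took %.6fs", @label, elapsed)
    output
  end
end

double = ->(value) { value * 2 }
timed  = TimingMiddleware.new(double, label: "double")
answer = timed.call(21)
p answer
```

Because the wrapper exposes the same `call` interface as the step it wraps, wrappers of this kind can be stacked without the step knowing about them.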
217
+ [Learn more about Middleware](middleware.md)
218
+
219
+ ## Visualization
220
+
221
+ Pipelines with named steps can be visualized:
222
+
223
+ ```ruby
224
+ # Generate ASCII visualization
225
+ puts pipeline.visualize_ascii
226
+
227
+ # Export to Graphviz DOT format
228
+ File.write('pipeline.dot', pipeline.visualize_dot)
229
+
230
+ # Export to Mermaid diagram
231
+ File.write('pipeline.mmd', pipeline.visualize_mermaid)
232
+
233
+ # Get execution plan analysis
234
+ puts pipeline.execution_plan
235
+ ```
236
+
237
+ ## API Reference
238
+
239
+ ### Class Methods
240
+
241
+ | Method | Description |
242
+ |--------|-------------|
243
+ | `new(&block)` | Create a new pipeline with DSL block |
244
+
245
+ ### Instance Methods
246
+
247
+ | Method | Description |
248
+ |--------|-------------|
249
+ | `call(result)` | Execute pipeline sequentially |
250
+ | `call_parallel(result, strategy: :auto)` | Execute with parallelization |
251
+ | `dependency_graph` | Get underlying dependency graph |
252
+ | `visualize` | Get visualizer instance |
253
+ | `visualize_ascii(show_groups: true)` | ASCII visualization |
254
+ | `visualize_dot(include_groups: true, orientation: 'TB')` | Graphviz DOT export |
255
+ | `visualize_mermaid` | Mermaid diagram export |
256
+ | `execution_plan` | Performance analysis |
257
+
258
+ ### DSL Methods (in Pipeline.new block)
259
+
260
+ | Method | Description |
261
+ |--------|-------------|
262
+ | `step(callable)` | Add anonymous step |
263
+ | `step(name, callable, depends_on: [])` | Add named step with dependencies |
264
+ | `parallel(&block)` | Define explicit parallel block |
265
+ | `use_middleware(middleware, **options)` | Add middleware |
266
+
267
+ ## Best Practices
268
+
269
+ 1. **Keep steps focused**: Each step should do one thing well
270
+ 2. **Use meaningful names**: Named steps improve visualization and debugging
271
+ 3. **Handle errors gracefully**: Use `.halt` to stop processing on errors
272
+ 4. **Leverage context**: Pass metadata between steps via `result.context`
273
+ 5. **Consider parallelization**: Use named steps with dependencies for I/O-bound operations
274
+ 6. **Apply middleware judiciously**: Add logging/instrumentation for observability
275
+
276
+ ## Example: E-Commerce Order Processing
277
+
278
+ ```ruby
279
+ pipeline = SimpleFlow::Pipeline.new do
280
+ use_middleware SimpleFlow::MiddleWare::Logging
281
+ use_middleware SimpleFlow::MiddleWare::Instrumentation
282
+
283
+ step :validate, ->(result) {
284
+ # Validate order
285
+ result.continue(result.value)
286
+ }, depends_on: :none
287
+
288
+ step :check_inventory, ->(result) {
289
+ # Check stock
290
+ result.continue(result.value)
291
+ }, depends_on: [:validate]
292
+
293
+ step :calculate_shipping, ->(result) {
294
+ # Calculate shipping cost
295
+ result.continue(result.value)
296
+ }, depends_on: [:validate]
297
+
298
+ step :process_payment, ->(result) {
299
+ # Process payment
300
+ result.continue(result.value)
301
+ }, depends_on: [:check_inventory, :calculate_shipping]
302
+
303
+ step :send_confirmation, ->(result) {
304
+ # Send email
305
+ result.continue(result.value)
306
+ }, depends_on: [:process_payment]
307
+ end
308
+ ```
309
+
310
+ ## Next Steps
311
+
312
+ - [Steps](steps.md) - Deep dive into step implementations
313
+ - [Middleware](middleware.md) - Adding cross-cutting concerns
314
+ - [Parallel Execution](../concurrent/parallel-steps.md) - Concurrent processing patterns
315
+ - [Complex Workflows Guide](../guides/complex-workflows.md) - Real-world examples
@@ -0,0 +1,168 @@
1
+ # Result
2
+
3
+ The `Result` class is the fundamental value object in SimpleFlow that encapsulates the outcome of each operation in your pipeline.
4
+
5
+ ## Overview
6
+
7
+ A `Result` object contains three main components:
8
+
9
+ - **Value**: The actual data being processed
10
+ - **Context**: A hash of metadata and contextual information
11
+ - **Errors**: Categorized error messages accumulated during processing
12
+
13
+ ## Immutability
14
+
15
+ Results are immutable - every operation returns a new `Result` instance rather than modifying the existing one. This design promotes safer concurrent operations and functional programming patterns.
16
+
17
+ ```ruby
18
+ original = SimpleFlow::Result.new("data")
19
+ updated = original.with_context(:user_id, 123)
20
+
21
+ original.context # => {}
22
+ updated.context # => { user_id: 123 }
23
+ ```
24
+
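One way such immutability could be achieved is to freeze each instance and have every `with_*` method build a new object. The sketch below is a simplified stand-in, not the gem's actual implementation; the class name `ImmutableResult` and its internals are assumptions for illustration.

```ruby
# ImmutableResult is a hypothetical, simplified stand-in for
# SimpleFlow::Result; the gem's real implementation may differ.
class ImmutableResult
  attr_reader :value, :context, :errors

  def initialize(value, context: {}, errors: {})
    @value = value
    @context = context.dup.freeze
    @errors = errors.dup.freeze
    freeze
  end

  # Returns a new instance with the extra context entry; self is untouched.
  def with_context(key, val)
    self.class.new(value, context: context.merge(key => val), errors: errors)
  end

  # Appends a message under the given category and returns a new instance.
  def with_error(key, message)
    updated = errors.merge(key => [*errors[key], message])
    self.class.new(value, context: context, errors: updated)
  end
end

original = ImmutableResult.new("data")
updated  = original.with_context(:user_id, 123)
                   .with_error(:validation, "Email is required")
                   .with_error(:validation, "Password too short")

p original.context  # => {}
p updated.context
p updated.errors
```

Since no instance is ever modified after construction, a result can be handed to several concurrent steps without any of them observing another's changes.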
25
+ ## Creating Results
26
+
27
+ ### Basic Initialization
28
+
29
+ ```ruby
30
+ # Simple result with just a value
31
+ result = SimpleFlow::Result.new(10)
32
+
33
+ # Result with initial context and errors
34
+ result = SimpleFlow::Result.new(
35
+ { count: 5 },
36
+ context: { user_id: 123 },
37
+ errors: { validation: ['Required field missing'] }
38
+ )
39
+ ```
40
+
41
+ ## Working with Context
42
+
43
+ Context allows you to pass metadata through your pipeline without modifying the primary value.
44
+
45
+ ```ruby
46
+ result = SimpleFlow::Result.new(data)
47
+ .with_context(:user_id, 123)
48
+ .with_context(:timestamp, Time.now.to_i)
49
+ .with_context(:source, 'api')
50
+
51
+ result.context
52
+ # => { user_id: 123, timestamp: 1234567890, source: 'api' }
53
+ ```
54
+
55
+ ### Common Context Use Cases
56
+
57
+ - User authentication details
58
+ - Request timestamps
59
+ - Transaction IDs
60
+ - Debug information
61
+ - Performance metrics
62
+
63
+ ## Error Handling
64
+
65
+ Errors are organized by category, allowing multiple errors per category:
66
+
67
+ ```ruby
68
+ result = SimpleFlow::Result.new(data)
69
+ .with_error(:validation, 'Email is required')
70
+ .with_error(:validation, 'Password too short')
71
+ .with_error(:authentication, 'Invalid token')
72
+
73
+ result.errors
74
+ # => {
75
+ # validation: ['Email is required', 'Password too short'],
76
+ # authentication: ['Invalid token']
77
+ # }
78
+ ```
79
+
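Since `errors` is a Hash of category to message list, ordinary Hash and Enumerable methods are enough to inspect it. The sketch below builds a hash by hand in the shape shown above, so it runs without the gem; the variable names are illustrative only.

```ruby
# This hash mirrors the shape shown above; it is built by hand here so the
# snippet runs without the gem.
errors = {
  validation: ['Email is required', 'Password too short'],
  authentication: ['Invalid token']
}

# Flatten every category into "category: message" strings.
messages = errors.flat_map do |category, list|
  list.map { |message| "#{category}: #{message}" }
end

total = errors.values.sum(&:size)

puts messages
puts "validation failed" if errors.key?(:validation)
puts "total errors: #{total}"
```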
80
+ ## Flow Control
81
+
82
+ Results include a continue flag that controls pipeline execution.
83
+
84
+ ### Continue
85
+
86
+ Move to the next step with a new value:
87
+
88
+ ```ruby
89
+ result = result.continue(new_value)
90
+ # continue? => true
91
+ ```
92
+
93
+ ### Halt
94
+
95
+ Stop pipeline execution:
96
+
97
+ ```ruby
98
+ # Halt without changing value
99
+ result = result.halt
100
+ # continue? => false, value unchanged
101
+
102
+ # Halt with a new value
103
+ result = result.halt(error_response)
104
+ # continue? => false, value changed
105
+ ```
106
+
107
+ ### Checking Status
108
+
109
+ ```ruby
110
+ if result.continue?
111
+ # Pipeline will proceed
112
+ else
113
+ # Pipeline has been halted
114
+ end
115
+ ```
116
+
117
+ ## Example: Multi-Step Processing
118
+
119
+ ```ruby
120
+ def process_user_registration(params)
121
+ result = SimpleFlow::Result.new(params)
122
+ .with_context(:ip_address, request.ip) # assumes a web framework's `request` object is in scope
123
+ .with_context(:timestamp, Time.now)
124
+
125
+ # Validation
126
+ if params[:email].nil?
127
+ return result
128
+ .with_error(:validation, 'Email required')
129
+ .halt
130
+ end
131
+
132
+ # Process
133
+ user = create_user(params)
134
+
135
+ result
136
+ .continue(user)
137
+ .with_context(:user_id, user.id)
138
+ end
139
+ ```
140
+
141
+ ## API Reference
142
+
143
+ ### Instance Methods
144
+
145
+ | Method | Description | Returns |
146
+ |--------|-------------|---------|
147
+ | `value` | Get the current value | Object |
148
+ | `context` | Get the context hash | Hash |
149
+ | `errors` | Get the errors hash | Hash |
150
+ | `continue?` | Check if pipeline should continue | Boolean |
151
+ | `with_context(key, value)` | Add context | New Result |
152
+ | `with_error(key, message)` | Add error | New Result |
153
+ | `continue(new_value)` | Proceed with new value | New Result |
154
+ | `halt(new_value = nil)` | Stop execution, keeping the current value unless a new one is given | New Result |
155
+
156
+ ## Best Practices
157
+
158
+ 1. **Use context for metadata**: Keep the value focused on the data being processed
159
+ 2. **Categorize errors**: Use meaningful error keys like `:validation`, `:authentication`, `:database`
160
+ 3. **Halt early**: Stop processing as soon as you know the operation cannot succeed
161
+ 4. **Chain operations**: Take advantage of immutability to build readable operation chains
162
+ 5. **Preserve information**: When halting, preserve context and errors for debugging
163
+
164
+ ## Next Steps
165
+
166
+ - [Pipeline](pipeline.md) - Learn how Results flow through pipelines
167
+ - [Flow Control](flow-control.md) - Advanced flow control patterns
168
+ - [Error Handling Guide](../guides/error-handling.md) - Comprehensive error handling strategies