simple_flow 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.envrc +1 -0
- data/.github/workflows/deploy-github-pages.yml +52 -0
- data/.rubocop.yml +57 -0
- data/CHANGELOG.md +4 -0
- data/COMMITS.md +196 -0
- data/LICENSE +21 -0
- data/README.md +481 -0
- data/Rakefile +15 -0
- data/benchmarks/parallel_vs_sequential.rb +98 -0
- data/benchmarks/pipeline_overhead.rb +130 -0
- data/docs/api/middleware.md +468 -0
- data/docs/api/parallel-step.md +363 -0
- data/docs/api/pipeline.md +382 -0
- data/docs/api/result.md +375 -0
- data/docs/concurrent/best-practices.md +687 -0
- data/docs/concurrent/introduction.md +246 -0
- data/docs/concurrent/parallel-steps.md +418 -0
- data/docs/concurrent/performance.md +481 -0
- data/docs/core-concepts/flow-control.md +452 -0
- data/docs/core-concepts/middleware.md +389 -0
- data/docs/core-concepts/overview.md +219 -0
- data/docs/core-concepts/pipeline.md +315 -0
- data/docs/core-concepts/result.md +168 -0
- data/docs/core-concepts/steps.md +391 -0
- data/docs/development/benchmarking.md +443 -0
- data/docs/development/contributing.md +380 -0
- data/docs/development/dagwood-concepts.md +435 -0
- data/docs/development/testing.md +514 -0
- data/docs/getting-started/examples.md +197 -0
- data/docs/getting-started/installation.md +62 -0
- data/docs/getting-started/quick-start.md +218 -0
- data/docs/guides/choosing-concurrency-model.md +441 -0
- data/docs/guides/complex-workflows.md +440 -0
- data/docs/guides/data-fetching.md +478 -0
- data/docs/guides/error-handling.md +635 -0
- data/docs/guides/file-processing.md +505 -0
- data/docs/guides/validation-patterns.md +496 -0
- data/docs/index.md +169 -0
- data/examples/.gitignore +3 -0
- data/examples/01_basic_pipeline.rb +112 -0
- data/examples/02_error_handling.rb +178 -0
- data/examples/03_middleware.rb +186 -0
- data/examples/04_parallel_automatic.rb +221 -0
- data/examples/05_parallel_explicit.rb +279 -0
- data/examples/06_real_world_ecommerce.rb +288 -0
- data/examples/07_real_world_etl.rb +277 -0
- data/examples/08_graph_visualization.rb +246 -0
- data/examples/09_pipeline_visualization.rb +266 -0
- data/examples/10_concurrency_control.rb +235 -0
- data/examples/11_sequential_dependencies.rb +243 -0
- data/examples/12_none_constant.rb +161 -0
- data/examples/README.md +374 -0
- data/examples/regression_test/01_basic_pipeline.txt +38 -0
- data/examples/regression_test/02_error_handling.txt +92 -0
- data/examples/regression_test/03_middleware.txt +61 -0
- data/examples/regression_test/04_parallel_automatic.txt +86 -0
- data/examples/regression_test/05_parallel_explicit.txt +80 -0
- data/examples/regression_test/06_real_world_ecommerce.txt +53 -0
- data/examples/regression_test/07_real_world_etl.txt +58 -0
- data/examples/regression_test/08_graph_visualization.txt +429 -0
- data/examples/regression_test/09_pipeline_visualization.txt +305 -0
- data/examples/regression_test/10_concurrency_control.txt +96 -0
- data/examples/regression_test/11_sequential_dependencies.txt +86 -0
- data/examples/regression_test/12_none_constant.txt +64 -0
- data/examples/regression_test.rb +105 -0
- data/lib/simple_flow/dependency_graph.rb +120 -0
- data/lib/simple_flow/dependency_graph_visualizer.rb +326 -0
- data/lib/simple_flow/middleware.rb +36 -0
- data/lib/simple_flow/parallel_executor.rb +80 -0
- data/lib/simple_flow/pipeline.rb +405 -0
- data/lib/simple_flow/result.rb +88 -0
- data/lib/simple_flow/step_tracker.rb +58 -0
- data/lib/simple_flow/version.rb +5 -0
- data/lib/simple_flow.rb +41 -0
- data/mkdocs.yml +146 -0
- data/pipeline_graph.dot +51 -0
- data/pipeline_graph.html +60 -0
- data/pipeline_graph.mmd +19 -0
- metadata +127 -0

@@ -0,0 +1,246 @@ data/docs/concurrent/introduction.md

# Concurrent Execution

One of SimpleFlow's most powerful features is the ability to execute independent steps **concurrently** using fiber-based concurrency.

## Why Concurrent Execution?

Many workflows have steps that don't depend on each other and can run at the same time:

- Fetching data from multiple APIs
- Running independent validation checks
- Processing multiple files
- Enriching data from various sources

Running these steps concurrently can **dramatically improve performance**.

## Performance Benefits

Consider fetching data from 4 APIs:

**Sequential Execution: ~0.4s**
```ruby
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) { fetch_api_1(result) } # 0.1s
  step ->(result) { fetch_api_2(result) } # 0.1s
  step ->(result) { fetch_api_3(result) } # 0.1s
  step ->(result) { fetch_api_4(result) } # 0.1s
end
# Total: 0.4s
```

**Parallel Execution: ~0.1s**
```ruby
pipeline = SimpleFlow::Pipeline.new do
  parallel do
    step ->(result) { fetch_api_1(result) } # ┐
    step ->(result) { fetch_api_2(result) } # ├─ All run
    step ->(result) { fetch_api_3(result) } # ├─ concurrently
    step ->(result) { fetch_api_4(result) } # ┘
  end
end
# Total: ~0.1s (4x speedup!)
```

## Basic Usage

Use the `parallel` block in your pipeline:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  # This runs first (sequential)
  step ->(result) { initialize_data(result) }

  # These run concurrently
  parallel do
    step ->(result) { fetch_orders(result) }
    step ->(result) { fetch_preferences(result) }
    step ->(result) { fetch_analytics(result) }
  end

  # This waits for all parallel steps to complete
  step ->(result) { aggregate_results(result) }
end
```

## How It Works

### Fiber-Based Concurrency

SimpleFlow uses the **Async gem**, which provides fiber-based concurrency:

- **No threading overhead**: Fibers are lightweight
- **No GIL bottleneck for I/O**: Fibers yield while waiting on I/O, so Ruby's Global Interpreter Lock does not limit I/O-bound work
- **Perfect for I/O**: Ideal for network requests, file operations, etc.

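To make the fiber model concrete, here is a minimal, self-contained sketch that uses the Async gem directly. It is illustrative only (not SimpleFlow's internal code), and `fetch` is a hypothetical stand-in for an I/O-bound call:

```ruby
# Minimal sketch of fiber-based concurrency with the Async gem.
# Not SimpleFlow internals; `fetch` simulates an I/O-bound call.
require 'async'

def fetch(name)
  sleep(0.1) # simulated network latency; the fiber yields while waiting
  "#{name} data"
end

Async do |task|
  fibers  = (1..4).map { |i| task.async { fetch("api_#{i}") } } # start four fibers
  results = fibers.map(&:wait)                                  # completes in ~0.1s, not ~0.4s
  p results
end
```

Because each fiber yields while it waits, the four simulated requests overlap instead of queueing, which is the same effect the `parallel` block gives you inside a pipeline.
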
### Result Merging

When parallel steps complete, their results are automatically merged:

```ruby
parallel do
  step ->(result) { result.with_context(:a, 1).continue(result.value) }
  step ->(result) { result.with_context(:b, 2).continue(result.value) }
  step ->(result) { result.with_context(:c, 3).continue(result.value) }
end

# Merged result has all contexts: {:a=>1, :b=>2, :c=>3}
```

**Merging Rules:**

- **Values**: Uses the last non-halted result's value
- **Contexts**: Merges all contexts together
- **Errors**: Merges all errors together
- **Continue**: If any step halts, the merged result is halted

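The sketch below approximates those rules using the `Result` methods shown throughout these docs (`context`, `errors`, `with_context`, `with_error`, `continue`, `halt`, `continue?`). It is for intuition only and is not the gem's actual merge implementation:

```ruby
# Rough approximation of the merging rules above; not the gem's real code.
# `original` is the result that entered the parallel block, `results` are
# the results returned by the block's steps.
def merge_parallel_results(original, results)
  merged = original
  results.each do |r|
    r.context.each { |key, value| merged = merged.with_context(key, value) } # merge contexts
    r.errors.each { |key, msgs| Array(msgs).each { |m| merged = merged.with_error(key, m) } } # merge errors
  end
  value = (results.reverse.find(&:continue?) || results.last).value # last non-halted value wins
  results.all?(&:continue?) ? merged.continue(value) : merged.halt(value) # halted if any step halted
end
```
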
## Real-World Example

### User Data Aggregation

```ruby
require 'simple_flow'
require 'net/http'
require 'json'

pipeline = SimpleFlow::Pipeline.new do
  # Validate user ID
  step ->(result) {
    user_id = result.value
    user_id > 0 ?
      result.continue(user_id) :
      result.halt.with_error(:validation, "Invalid user ID")
  }

  # Fetch data from multiple services concurrently
  parallel do
    step ->(result) {
      user_id = result.value
      profile = fetch_user_profile(user_id)
      result.with_context(:profile, profile).continue(user_id)
    }

    step ->(result) {
      user_id = result.value
      orders = fetch_user_orders(user_id)
      result.with_context(:orders, orders).continue(user_id)
    }

    step ->(result) {
      user_id = result.value
      preferences = fetch_user_preferences(user_id)
      result.with_context(:preferences, preferences).continue(user_id)
    }

    step ->(result) {
      user_id = result.value
      analytics = fetch_user_analytics(user_id)
      result.with_context(:analytics, analytics).continue(user_id)
    }
  end

  # Aggregate all fetched data
  step ->(result) {
    aggregated = {
      user_id: result.value,
      profile: result.context[:profile],
      orders: result.context[:orders],
      preferences: result.context[:preferences],
      analytics: result.context[:analytics]
    }
    result.continue(aggregated)
  }
end

# Execute
result = pipeline.call(SimpleFlow::Result.new(123))
puts result.value[:profile]
# => {...}
```

## Multiple Parallel Blocks

You can have multiple parallel blocks in a pipeline:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step ->(result) { initialize_data(result) }

  # First parallel block
  parallel do
    step ->(result) { fetch_data_a(result) }
    step ->(result) { fetch_data_b(result) }
  end

  step ->(result) { process_first_batch(result) }

  # Second parallel block
  parallel do
    step ->(result) { enrich_data_a(result) }
    step ->(result) { enrich_data_b(result) }
    step ->(result) { enrich_data_c(result) }
  end

  step ->(result) { finalize(result) }
end
```

## Error Handling

If any parallel step halts, the entire parallel block halts:

```ruby
parallel do
  step ->(result) { result.continue("success") }
  step ->(result) { result.halt.with_error(:service, "Failed") }
  step ->(result) { result.continue("success") }
end
# Result is halted with error: {:service=>["Failed"]}
```

All errors are accumulated:

```ruby
parallel do
  step ->(result) { result.with_error(:a, "Error A").continue(result.value) }
  step ->(result) { result.with_error(:b, "Error B").continue(result.value) }
end
# Result has errors: {:a=>["Error A"], :b=>["Error B"]}
```

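A step placed after the parallel block can then inspect the accumulated errors and decide whether to stop. A minimal sketch (the `:summary` key and message are arbitrary):

```ruby
# After the parallel block: halt if any branch reported an error.
step ->(result) {
  if result.errors.any?
    result.halt.with_error(:summary, "One or more parallel steps failed")
  else
    result.continue(result.value)
  end
}
```
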
## Best Practices

### ✅ Good Use Cases

- **Independent I/O operations**: API calls, database queries
- **Independent validations**: Multiple validation checks
- **Data enrichment**: Fetching supplementary data
- **File processing**: Processing multiple files

### ❌ Poor Use Cases

- **Dependent operations**: When step B needs step A's result
- **CPU-intensive work**: Better suited to separate processes; fiber concurrency doesn't speed up CPU-bound code
- **Shared mutable state**: Could cause race conditions
- **Very quick operations**: Overhead might outweigh benefits

## When to Use Parallel Execution

Use the `parallel` block when:

1. ✅ Steps are **independent** (don't depend on each other's results)
2. ✅ Steps are **I/O-bound** (network, file, database)
3. ✅ Total execution time of the steps > ~50ms
4. ✅ Steps can safely run concurrently

Don't use `parallel` when:

1. ❌ Steps depend on previous results
2. ❌ Steps are very fast (<10ms each)
3. ❌ Steps modify shared state
4. ❌ Steps are CPU-intensive

## Next Steps

- [Parallel Steps Guide](parallel-steps.md) - Deep dive into ParallelStep
- [Performance Tips](performance.md) - Optimize concurrent execution
- [Best Practices](best-practices.md) - Patterns and anti-patterns
- [Examples](../getting-started/examples.md) - See it in action

@@ -0,0 +1,418 @@ data/docs/concurrent/parallel-steps.md

# Parallel Execution with Named Steps

SimpleFlow provides powerful parallel execution capabilities through two approaches: automatic parallel discovery using dependency graphs, and explicit parallel blocks. This guide focuses on using named steps with dependencies for automatic parallelization.

## Overview

When you define steps with names and dependencies, SimpleFlow automatically analyzes the dependency graph and executes independent steps concurrently. This gives you the performance benefit without requiring you to manage parallelism explicitly.

## Basic Concepts

### Named Steps

A named step is defined with three components:

1. **Name** (Symbol) - Unique identifier for the step
2. **Callable** (Proc/Lambda) - The code to execute
3. **Dependencies** (Array of Symbols) - Steps that must complete first

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step :step_name, ->(result) {
    # Your code here
    result.continue(new_value)
  }, depends_on: [:prerequisite_step]
end
```

### Dependency Declaration

Dependencies are declared using the `depends_on:` parameter:

```ruby
# No dependencies - can run immediately
step :initial_step, ->(result) { ... }, depends_on: []

# Depends on one step
step :second_step, ->(result) { ... }, depends_on: [:initial_step]

# Depends on multiple steps
step :final_step, ->(result) { ... }, depends_on: [:second_step, :third_step]
```

## Automatic Parallelization

### How It Works

1. **Graph Analysis**: SimpleFlow builds a dependency graph from your step declarations
2. **Topological Sort**: Steps are organized into execution groups using Ruby's TSort module
3. **Parallel Execution**: Steps with all dependencies satisfied run concurrently
4. **Result Merging**: Contexts and errors from parallel steps are automatically merged

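As a rough illustration of the first two phases (graph analysis and grouping), independent steps can be bucketed into execution levels as shown here. This is a simplified sketch using the step names from the example below; the gem's real logic lives in `lib/simple_flow/dependency_graph.rb`:

```ruby
# Simplified sketch: group named steps into execution levels from their
# declared dependencies. Illustration only; the gem's implementation is in
# lib/simple_flow/dependency_graph.rb.
deps = {
  fetch_user:        [],
  fetch_orders:      [:fetch_user],
  fetch_preferences: [:fetch_user],
  build_profile:     [:fetch_orders, :fetch_preferences]
}

groups = []
done   = []
until done.size == deps.size
  ready = deps.keys.reject { |s| done.include?(s) }
              .select { |s| (deps[s] - done).empty? } # every prerequisite already finished
  raise "circular dependency detected" if ready.empty?
  groups << ready # every step in this group can run concurrently
  done.concat(ready)
end

p groups
# => [[:fetch_user], [:fetch_orders, :fetch_preferences], [:build_profile]]
```
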
### Simple Example

```ruby
pipeline = SimpleFlow::Pipeline.new do
  # Step 1: Runs first (no dependencies)
  step :fetch_user, ->(result) {
    user = UserService.find(result.value)
    result.with_context(:user, user).continue(result.value)
  }, depends_on: []

  # Steps 2 & 3: Run in parallel (both depend only on step 1)
  step :fetch_orders, ->(result) {
    orders = OrderService.for_user(result.context[:user])
    result.with_context(:orders, orders).continue(result.value)
  }, depends_on: [:fetch_user]

  step :fetch_preferences, ->(result) {
    prefs = PreferenceService.for_user(result.context[:user])
    result.with_context(:preferences, prefs).continue(result.value)
  }, depends_on: [:fetch_user]

  # Step 4: Runs after both parallel steps complete
  step :build_profile, ->(result) {
    profile = {
      user: result.context[:user],
      orders: result.context[:orders],
      preferences: result.context[:preferences]
    }
    result.continue(profile)
  }, depends_on: [:fetch_orders, :fetch_preferences]
end

# Execute with automatic parallelism
result = pipeline.call_parallel(SimpleFlow::Result.new(user_id))
```

**Execution Flow:**

1. `fetch_user` runs first
2. `fetch_orders` and `fetch_preferences` run in parallel
3. `build_profile` runs after both parallel steps complete

## Complex Dependency Graphs

### Multi-Level Parallelism

```ruby
pipeline = SimpleFlow::Pipeline.new do
  # Level 1: Validation (sequential)
  step :validate_input, ->(result) {
    # Validate request
    result.with_context(:validated, true).continue(result.value)
  }, depends_on: []

  # Level 2: Three independent checks (parallel)
  step :check_inventory, ->(result) {
    inventory = InventoryService.check(result.value)
    result.with_context(:inventory, inventory).continue(result.value)
  }, depends_on: [:validate_input]

  step :check_pricing, ->(result) {
    price = PricingService.calculate(result.value)
    result.with_context(:price, price).continue(result.value)
  }, depends_on: [:validate_input]

  step :check_shipping, ->(result) {
    shipping = ShippingService.calculate(result.value)
    result.with_context(:shipping, shipping).continue(result.value)
  }, depends_on: [:validate_input]

  # Level 3: Calculate discount (depends on inventory and pricing)
  step :calculate_discount, ->(result) {
    discount = DiscountService.calculate(
      result.context[:inventory],
      result.context[:price]
    )
    result.with_context(:discount, discount).continue(result.value)
  }, depends_on: [:check_inventory, :check_pricing]

  # Level 4: Finalize (depends on discount and shipping)
  step :finalize_order, ->(result) {
    total = result.context[:price] +
            result.context[:shipping] -
            result.context[:discount]
    result.continue(total)
  }, depends_on: [:calculate_discount, :check_shipping]
end
```

**Execution Groups:**

- Group 1: `validate_input` (sequential)
- Group 2: `check_inventory`, `check_pricing`, `check_shipping` (parallel)
- Group 3: `calculate_discount` (sequential, waits for inventory and pricing)
- Group 4: `finalize_order` (sequential, waits for discount and shipping)

## Context Merging

When parallel steps complete, SimpleFlow automatically merges their contexts and errors:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step :task_a, ->(result) {
    result.with_context(:data_a, "from A").continue(result.value)
  }, depends_on: []

  step :task_b, ->(result) {
    result.with_context(:data_b, "from B").continue(result.value)
  }, depends_on: []

  step :combine, ->(result) {
    # Both contexts are available
    combined = {
      a: result.context[:data_a], # "from A"
      b: result.context[:data_b]  # "from B"
    }
    result.continue(combined)
  }, depends_on: [:task_a, :task_b]
end
```

### Error Accumulation

Errors from parallel steps are also merged:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step :validate_email, ->(result) {
    if invalid_email?(result.value[:email])
      result = result.with_error(:email, "Invalid format")
    end
    result.continue(result.value)
  }, depends_on: []

  step :validate_phone, ->(result) {
    if invalid_phone?(result.value[:phone])
      result = result.with_error(:phone, "Invalid format")
    end
    result.continue(result.value)
  }, depends_on: []

  step :check_errors, ->(result) {
    # Errors from both parallel validations are available
    if result.errors.any?
      result.halt(result.value) # Stop if any validation failed
    else
      result.continue(result.value)
    end
  }, depends_on: [:validate_email, :validate_phone]
end
```

## Halting Execution

If any parallel step calls `halt`, the pipeline stops and later steps do not run:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  step :task_a, ->(result) {
    result.with_context(:success_a, true).continue(result.value)
  }, depends_on: []

  step :task_b, ->(result) {
    # This step fails
    result.halt.with_error(:failure, "Task B failed")
  }, depends_on: []

  step :task_c, ->(result) {
    result.with_context(:success_c, true).continue(result.value)
  }, depends_on: []

  step :final_step, ->(result) {
    # This will NOT execute because task_b halted
    result.continue("Completed")
  }, depends_on: [:task_a, :task_b, :task_c]
end

result = pipeline.call_parallel(initial_data)
# result.continue? => false
# result.errors => {:failure => ["Task B failed"]}
```

## Execution Methods

### `call_parallel(result, strategy: :auto)`

Executes the pipeline with parallel support:

```ruby
# Automatic strategy (default) - uses dependency graph if named steps exist
result = pipeline.call_parallel(initial_result)

# Automatic strategy (explicit)
result = pipeline.call_parallel(initial_result, strategy: :auto)

# Explicit strategy - only uses explicit parallel blocks
result = pipeline.call_parallel(initial_result, strategy: :explicit)
```

### `call(result)`

Executes sequentially (ignores parallelism):

```ruby
# Sequential execution - useful for debugging
result = pipeline.call(initial_result)
```

## Visualizing Dependencies

### ASCII Visualization

```ruby
# Print dependency graph to console
puts pipeline.visualize_ascii

# Hide parallel groups
puts pipeline.visualize_ascii(show_groups: false)
```

### Graphviz DOT Format

```ruby
# Generate DOT file for visualization
dot_content = pipeline.visualize_dot
File.write('pipeline.dot', dot_content)

# Generate image: dot -Tpng pipeline.dot -o pipeline.png

# Left-to-right orientation
dot_content = pipeline.visualize_dot(orientation: 'LR')
```

### Mermaid Diagrams

```ruby
# Generate Mermaid diagram
mermaid = pipeline.visualize_mermaid
File.write('pipeline.mmd', mermaid)

# View at https://mermaid.live/
```

### Execution Plan

```ruby
# Get detailed execution analysis
puts pipeline.execution_plan
```

Output includes:

- Total steps and execution phases
- Which steps run in parallel
- Potential speedup vs sequential execution
- Step-by-step execution order

## Best Practices

### 1. Design Independent Steps

Ensure parallel steps are truly independent:

```ruby
# GOOD: Independent operations
step :fetch_user_data, ->(result) { ... }, depends_on: []
step :fetch_product_data, ->(result) { ... }, depends_on: []

# BAD: Steps that modify shared state
step :increment_counter, ->(result) { @counter += 1; ... }, depends_on: []
step :read_counter, ->(result) { puts @counter; ... }, depends_on: []
```

### 2. Use Context for Data Sharing

Pass data between steps using context, not instance variables:

```ruby
# GOOD: Using context
step :fetch_data, ->(result) {
  data = API.fetch(result.value)
  result.with_context(:api_data, data).continue(result.value)
}, depends_on: []

step :process_data, ->(result) {
  processed = transform(result.context[:api_data])
  result.continue(processed)
}, depends_on: [:fetch_data]

# BAD: Using instance variables
@shared_data = nil
step :fetch_data, ->(result) {
  @shared_data = API.fetch(result.value) # Race condition!
  result.continue(result.value)
}, depends_on: []
```

### 3. Declare All Dependencies

Be explicit about dependencies to ensure correct execution order:

```ruby
# GOOD: Clear dependencies
step :load_config, ->(result) { ... }, depends_on: []
step :validate_config, ->(result) { ... }, depends_on: [:load_config]
step :apply_config, ->(result) { ... }, depends_on: [:validate_config]

# BAD: Missing dependencies
step :load_config, ->(result) { ... }, depends_on: []
step :apply_config, ->(result) { ... }, depends_on: [] # Should depend on load_config!
```

### 4. Keep Steps Focused

Each step should have a single responsibility:

```ruby
# GOOD: Focused steps
step :fetch_user, ->(result) { ... }, depends_on: []
step :fetch_orders, ->(result) { ... }, depends_on: [:fetch_user]
step :calculate_total, ->(result) { ... }, depends_on: [:fetch_orders]

# BAD: Monolithic step
step :do_everything, ->(result) {
  user = fetch_user
  orders = fetch_orders(user)
  total = calculate_total(orders)
  # Too much in one step!
}, depends_on: []
```

### 5. Handle Errors Gracefully

Add error handling at appropriate points:

```ruby
pipeline = SimpleFlow::Pipeline.new do
  # Parallel data fetching
  step :fetch_a, ->(result) { ... }, depends_on: []
  step :fetch_b, ->(result) { ... }, depends_on: []

  # Check for errors before proceeding
  step :validate_fetch, ->(result) {
    if result.errors.any?
      result.halt.with_error(:fetch, "Failed to fetch required data")
    else
      result.continue(result.value)
    end
  }, depends_on: [:fetch_a, :fetch_b]

  # Only runs if validation passes
  step :process, ->(result) { ... }, depends_on: [:validate_fetch]
end
```

## Real-World Example

See `examples/06_real_world_ecommerce.rb` in this gem for a complete e-commerce order processing pipeline that demonstrates:

- Multi-level parallel execution
- Context merging
- Error handling
- Complex dependency relationships

## Related Documentation

- [Performance Characteristics](performance.md) - Understanding parallel execution performance
- [Best Practices](best-practices.md) - Comprehensive best practices for concurrent execution
- [Pipeline API](../api/pipeline.md) - Complete Pipeline API reference
- [Parallel Executor API](../api/parallel-step.md) - Low-level parallel execution details