job-workflow 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.rspec +3 -0
- data/.rubocop.yml +91 -0
- data/CHANGELOG.md +23 -0
- data/LICENSE.txt +21 -0
- data/README.md +47 -0
- data/Rakefile +55 -0
- data/Steepfile +10 -0
- data/guides/API_REFERENCE.md +112 -0
- data/guides/BEST_PRACTICES.md +113 -0
- data/guides/CACHE_STORE_INTEGRATION.md +145 -0
- data/guides/CONDITIONAL_EXECUTION.md +66 -0
- data/guides/DEPENDENCY_WAIT.md +386 -0
- data/guides/DRY_RUN.md +390 -0
- data/guides/DSL_BASICS.md +216 -0
- data/guides/ERROR_HANDLING.md +187 -0
- data/guides/GETTING_STARTED.md +524 -0
- data/guides/INSTRUMENTATION.md +131 -0
- data/guides/LIFECYCLE_HOOKS.md +415 -0
- data/guides/NAMESPACES.md +75 -0
- data/guides/OPENTELEMETRY_INTEGRATION.md +86 -0
- data/guides/PARALLEL_PROCESSING.md +302 -0
- data/guides/PRODUCTION_DEPLOYMENT.md +110 -0
- data/guides/QUEUE_MANAGEMENT.md +141 -0
- data/guides/README.md +174 -0
- data/guides/SCHEDULED_JOBS.md +165 -0
- data/guides/STRUCTURED_LOGGING.md +268 -0
- data/guides/TASK_OUTPUTS.md +240 -0
- data/guides/TESTING_STRATEGY.md +56 -0
- data/guides/THROTTLING.md +198 -0
- data/guides/TROUBLESHOOTING.md +53 -0
- data/guides/WORKFLOW_COMPOSITION.md +675 -0
- data/guides/WORKFLOW_STATUS_QUERY.md +288 -0
- data/lib/job-workflow.rb +3 -0
- data/lib/job_workflow/argument_def.rb +16 -0
- data/lib/job_workflow/arguments.rb +40 -0
- data/lib/job_workflow/auto_scaling/adapter/aws_adapter.rb +66 -0
- data/lib/job_workflow/auto_scaling/adapter.rb +31 -0
- data/lib/job_workflow/auto_scaling/configuration.rb +85 -0
- data/lib/job_workflow/auto_scaling/executor.rb +43 -0
- data/lib/job_workflow/auto_scaling.rb +69 -0
- data/lib/job_workflow/cache_store_adapters.rb +46 -0
- data/lib/job_workflow/context.rb +352 -0
- data/lib/job_workflow/dry_run_config.rb +31 -0
- data/lib/job_workflow/dsl.rb +236 -0
- data/lib/job_workflow/error_hook.rb +24 -0
- data/lib/job_workflow/hook.rb +24 -0
- data/lib/job_workflow/hook_registry.rb +66 -0
- data/lib/job_workflow/instrumentation/log_subscriber.rb +194 -0
- data/lib/job_workflow/instrumentation/opentelemetry_subscriber.rb +221 -0
- data/lib/job_workflow/instrumentation.rb +257 -0
- data/lib/job_workflow/job_status.rb +92 -0
- data/lib/job_workflow/logger.rb +86 -0
- data/lib/job_workflow/namespace.rb +36 -0
- data/lib/job_workflow/output.rb +81 -0
- data/lib/job_workflow/output_def.rb +14 -0
- data/lib/job_workflow/queue.rb +74 -0
- data/lib/job_workflow/queue_adapter.rb +38 -0
- data/lib/job_workflow/queue_adapters/abstract.rb +87 -0
- data/lib/job_workflow/queue_adapters/null_adapter.rb +127 -0
- data/lib/job_workflow/queue_adapters/solid_queue_adapter.rb +224 -0
- data/lib/job_workflow/runner.rb +173 -0
- data/lib/job_workflow/schedule.rb +46 -0
- data/lib/job_workflow/semaphore.rb +71 -0
- data/lib/job_workflow/task.rb +83 -0
- data/lib/job_workflow/task_callable.rb +43 -0
- data/lib/job_workflow/task_context.rb +70 -0
- data/lib/job_workflow/task_dependency_wait.rb +66 -0
- data/lib/job_workflow/task_enqueue.rb +50 -0
- data/lib/job_workflow/task_graph.rb +43 -0
- data/lib/job_workflow/task_job_status.rb +70 -0
- data/lib/job_workflow/task_output.rb +51 -0
- data/lib/job_workflow/task_retry.rb +64 -0
- data/lib/job_workflow/task_throttle.rb +46 -0
- data/lib/job_workflow/version.rb +5 -0
- data/lib/job_workflow/workflow.rb +87 -0
- data/lib/job_workflow/workflow_status.rb +112 -0
- data/lib/job_workflow.rb +59 -0
- data/rbs_collection.lock.yaml +172 -0
- data/rbs_collection.yaml +14 -0
- data/sig/generated/job-workflow.rbs +2 -0
- data/sig/generated/job_workflow/argument_def.rbs +14 -0
- data/sig/generated/job_workflow/arguments.rbs +26 -0
- data/sig/generated/job_workflow/auto_scaling/adapter/aws_adapter.rbs +32 -0
- data/sig/generated/job_workflow/auto_scaling/adapter.rbs +22 -0
- data/sig/generated/job_workflow/auto_scaling/configuration.rbs +50 -0
- data/sig/generated/job_workflow/auto_scaling/executor.rbs +29 -0
- data/sig/generated/job_workflow/auto_scaling.rbs +47 -0
- data/sig/generated/job_workflow/cache_store_adapters.rbs +28 -0
- data/sig/generated/job_workflow/context.rbs +155 -0
- data/sig/generated/job_workflow/dry_run_config.rbs +16 -0
- data/sig/generated/job_workflow/dsl.rbs +117 -0
- data/sig/generated/job_workflow/error_hook.rbs +18 -0
- data/sig/generated/job_workflow/hook.rbs +18 -0
- data/sig/generated/job_workflow/hook_registry.rbs +47 -0
- data/sig/generated/job_workflow/instrumentation/log_subscriber.rbs +102 -0
- data/sig/generated/job_workflow/instrumentation/opentelemetry_subscriber.rbs +113 -0
- data/sig/generated/job_workflow/instrumentation.rbs +138 -0
- data/sig/generated/job_workflow/job_status.rbs +46 -0
- data/sig/generated/job_workflow/logger.rbs +56 -0
- data/sig/generated/job_workflow/namespace.rbs +24 -0
- data/sig/generated/job_workflow/output.rbs +39 -0
- data/sig/generated/job_workflow/output_def.rbs +12 -0
- data/sig/generated/job_workflow/queue.rbs +49 -0
- data/sig/generated/job_workflow/queue_adapter.rbs +18 -0
- data/sig/generated/job_workflow/queue_adapters/abstract.rbs +56 -0
- data/sig/generated/job_workflow/queue_adapters/null_adapter.rbs +73 -0
- data/sig/generated/job_workflow/queue_adapters/solid_queue_adapter.rbs +111 -0
- data/sig/generated/job_workflow/runner.rbs +66 -0
- data/sig/generated/job_workflow/schedule.rbs +34 -0
- data/sig/generated/job_workflow/semaphore.rbs +37 -0
- data/sig/generated/job_workflow/task.rbs +60 -0
- data/sig/generated/job_workflow/task_callable.rbs +30 -0
- data/sig/generated/job_workflow/task_context.rbs +52 -0
- data/sig/generated/job_workflow/task_dependency_wait.rbs +42 -0
- data/sig/generated/job_workflow/task_enqueue.rbs +27 -0
- data/sig/generated/job_workflow/task_graph.rbs +27 -0
- data/sig/generated/job_workflow/task_job_status.rbs +42 -0
- data/sig/generated/job_workflow/task_output.rbs +29 -0
- data/sig/generated/job_workflow/task_retry.rbs +30 -0
- data/sig/generated/job_workflow/task_throttle.rbs +20 -0
- data/sig/generated/job_workflow/version.rbs +5 -0
- data/sig/generated/job_workflow/workflow.rbs +48 -0
- data/sig/generated/job_workflow/workflow_status.rbs +55 -0
- data/sig/generated/job_workflow.rbs +8 -0
- data/sig-private/activejob.rbs +35 -0
- data/sig-private/activesupport.rbs +23 -0
- data/sig-private/aws.rbs +32 -0
- data/sig-private/opentelemetry.rbs +40 -0
- data/sig-private/solid_queue.rbs +108 -0
- data/tmp/.keep +0 -0
- metadata +190 -0

data/guides/ERROR_HANDLING.md
@@ -0,0 +1,187 @@

# Error Handling

JobWorkflow provides robust error handling. With retry strategies and custom error hooks, you can build reliable workflows.

## Retry Configuration

### Simple Retry

Specify the maximum retry count with a simple integer.

```ruby
argument :api_endpoint, "String"

# Simple retry (up to 3 times)
task :fetch_data, retry: 3, output: { data: "Hash" } do |ctx|
  endpoint = ctx.arguments.api_endpoint
  { data: ExternalAPI.fetch(endpoint) }
end
```

### Advanced Retry Configuration

Use a Hash for detailed retry configuration with exponential backoff.

```ruby
task :advanced_retry,
     retry: {
       count: 5,               # Maximum retry attempts
       strategy: :exponential, # :linear or :exponential
       base_delay: 2,          # Initial wait time in seconds
       jitter: true            # Add ±randomness to prevent thundering herd
     },
     output: { result: "String" } do |ctx|
  { result: unreliable_operation }
  # Retry intervals: 2±1s, 4±2s, 8±4s, 16±8s, 32±16s
end
```

## Retry Strategies

### Linear

Retries at fixed intervals.

```ruby
task :linear_retry,
     retry: { count: 5, strategy: :linear, base_delay: 10 },
     output: { result: "String" } do |ctx|
  { result: operation }
  # Retry intervals: 10s, 10s, 10s, 10s, 10s
end
```

### Exponential (Recommended)

Doubles the wait time with each retry.

```ruby
task :exponential_retry,
     retry: { count: 5, strategy: :exponential, base_delay: 2, jitter: true },
     output: { result: "String" } do |ctx|
  { result: operation }
  # Retry intervals: 2±1s, 4±2s, 8±4s, 16±8s, 32±16s
end
```
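
To make the schedule concrete, here is a small illustrative sketch of how an exponential delay with ±50% jitter can be derived from `base_delay`. This is a reader-side model of the documented intervals, not JobWorkflow's internal code:

```ruby
# Illustrative only: a model of the documented delay schedule,
# not JobWorkflow's implementation.
# Wait before the nth retry (1-based) for strategy: :exponential.
def retry_delay(attempt, base_delay:, jitter: false)
  delay = base_delay * (2**(attempt - 1)) # 2, 4, 8, 16, 32 for base_delay: 2
  return delay unless jitter

  # jitter: true spreads each wait over delay ± 50% (hence "2±1s, 4±2s, ...")
  delay + rand(-delay / 2.0..delay / 2.0)
end

(1..5).map { |n| retry_delay(n, base_delay: 2) } # => [2, 4, 8, 16, 32]
```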

## Workflow-Level Retry

### Using ActiveJob's `retry_on`

To retry the entire workflow (all tasks from the beginning) when an error occurs, use ActiveJob's standard `retry_on` method. This automatically requeues the complete job, ensuring all tasks are re-executed:

```ruby
class DataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  argument :data_source, "String"

  # Retry the entire workflow on StandardError (e.g., API timeouts)
  retry_on StandardError, wait: :exponentially_longer, attempts: 5

  task :fetch_data, output: { raw_data: "String" } do |ctx|
    source = ctx.arguments.data_source
    { raw_data: ExternalAPI.fetch(source) }
  end

  task :validate_data, depends_on: [:fetch_data], output: { valid: "bool" } do |ctx|
    data = ctx.output[:fetch_data][:raw_data]
    { valid: validate(data) }
  end

  task :process_data, depends_on: [:validate_data] do |ctx|
    # ... process data
  end
end
```

### Combining Task-Level and Workflow-Level Retries

You can combine task-level retries (for handling transient errors) with workflow-level retries (for catastrophic failures):

```ruby
class RobustDataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  # Workflow-level: Handle catastrophic failures (e.g., database connection loss)
  retry_on DatabaseConnectionError, wait: :exponentially_longer, attempts: 3

  argument :batch_id, "String"

  # Task-level: Handle transient API errors
  task :fetch_data,
       retry: { count: 3, strategy: :exponential, base_delay: 2 },
       output: { raw_data: "String" } do |ctx|
    { raw_data: ExternalAPI.fetch(ctx.arguments.batch_id) }
  end

  task :validate_data,
       depends_on: [:fetch_data],
       retry: { count: 2, strategy: :linear, base_delay: 1 },
       output: { valid: "bool" } do |ctx|
    data = ctx.output[:fetch_data][:raw_data]
    { valid: validate(data) }
  end

  task :store_results, depends_on: [:validate_data] do |ctx|
    # If this succeeds, the entire workflow is complete.
    # If a database connection error occurs here, the entire job is retried.
    Database.store(ctx.output[:validate_data])
  end
end
```

### Retry Options

The `retry_on` method supports several options from ActiveJob:

```ruby
class MyWorkflowJob < ApplicationJob
  include JobWorkflow::DSL

  # Wait with exponential backoff (2, 4, 8, 16, 32 seconds...)
  retry_on TimeoutError, wait: :exponentially_longer, attempts: 5

  # Wait with a fixed interval
  retry_on APIError, wait: 30.seconds, attempts: 3

  # Custom wait logic
  retry_on CustomError, wait: ->(executions) { (executions + 1) * 10.seconds }, attempts: 4

  # Multiple error types
  retry_on TimeoutError, APIError, wait: :exponentially_longer, attempts: 3

  # ... task definitions
end
```
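
For reference, ActiveJob passes the current execution count to a custom `wait:` lambda, so the waits above grow linearly. A quick check, substituting plain integers for the `.seconds` durations:

```ruby
# Same lambda as above, with plain integers standing in for durations.
wait = ->(executions) { (executions + 1) * 10 } # seconds
(1..4).map { |n| wait.call(n) } # => [20, 30, 40, 50]
```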

### Key Differences: Task-Level vs Workflow-Level Retry

| Aspect | Task-Level (`retry:`) | Workflow-Level (`retry_on`) |
|--------|------------------------|------------------------------|
| **Scope** | Single task only | Entire workflow |
| **Re-execution** | Only the failed task retries | All tasks restart from the beginning |
| **Use Case** | Transient errors in one task (API timeouts, etc.) | Catastrophic failures affecting the whole workflow |
| **Output Preservation** | Previous outputs still available | Context reset on workflow retry |
| **Example** | API call times out | Database connection lost |

### Best Practices for Retry Strategy

1. **Task-level retries** for transient, recoverable errors:
   ```ruby
   task :api_call,
        retry: { count: 3, strategy: :exponential, base_delay: 2 }
   ```

2. **Workflow-level retries** for environment issues (database, network):
   ```ruby
   retry_on DatabaseConnectionError, wait: :exponentially_longer, attempts: 3
   ```

3. **Avoid infinite retries**:
   - Always set a maximum `attempts` limit
   - Use exponential backoff to avoid overwhelming systems

4. **Monitor retry patterns** (see the sketch after this list):
   - Use instrumentation hooks to track retry occurrences
   - Alert on repeated failures to identify systemic issues
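
A minimal monitoring sketch, assuming JobWorkflow publishes its events through `ActiveSupport::Notifications` under the `task.retry` name shown in the logging docs, and using a statsd-style client as a stand-in for your metrics backend. Check INSTRUMENTATION.md for the exact event names and payload keys:

```ruby
# Assumed event name and payload key; verify against INSTRUMENTATION.md.
ActiveSupport::Notifications.subscribe("task.retry") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  task = event.payload[:task_name] # hypothetical payload key
  StatsD.increment("job_workflow.task_retry", tags: ["task:#{task}"])
end
```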

data/guides/GETTING_STARTED.md
@@ -0,0 +1,524 @@

# Getting Started with JobWorkflow

Welcome to JobWorkflow! This guide will take you from installation to creating and running your first workflow.

---

## 🚀 Quick Start (5 Minutes)

Want to get started immediately? Here's the absolute minimum you need:

### 1. Install JobWorkflow

```ruby
# Gemfile
gem 'job-workflow'
```

```bash
bundle install
```

### 2. Create Your First Workflow

```ruby
# app/jobs/hello_workflow_job.rb
class HelloWorkflowJob < ApplicationJob
  include JobWorkflow::DSL

  # Define input argument
  argument :name, "String"

  # Define a simple task
  task :greet do |ctx|
    name = ctx.arguments.name
    puts "Hello, #{name}!"
  end
end
```

### 3. Run It

```ruby
# In Rails console or from your app
HelloWorkflowJob.perform_later(name: "World")
```

**That's it!** You've just created and executed your first JobWorkflow workflow. 🎉

---

## What is JobWorkflow?

JobWorkflow is a declarative workflow orchestration engine for Ruby on Rails applications. Built on top of ActiveJob, it allows you to write complex workflows using a concise DSL.

### Why JobWorkflow?

- **Declarative DSL**: Familiar syntax similar to Rake and RSpec
- **Automatic Dependency Management**: Tasks execute in the correct order automatically
- **Parallel Processing**: Efficient parallel execution using map tasks
- **Built-in Error Handling**: Retry functionality with exponential backoff
- **JSON Serialization**: Schema-tolerant context persistence
- **Flexible Design**: A foundation for building production-style workflows

### When to Use JobWorkflow

JobWorkflow is ideal for:

- **ETL Pipelines**: Extract, transform, and load data workflows
- **Business Processes**: Multi-step business logic with dependencies
- **Batch Processing**: Process collections of items in parallel
- **API Integration**: Coordinate multiple API calls with rate limiting
- **Data Synchronization**: Keep multiple systems in sync
- **Report Generation**: Generate complex reports with multiple data sources

---

## Installation

### Requirements

Before installing JobWorkflow, ensure your environment meets these requirements:

- **Ruby** >= 3.1.0
- **Rails** >= 7.1.0 (ActiveJob, ActiveSupport)
- **Queue Backend**: SolidQueue recommended (other adapters supported)
- **Cache Backend**: SolidCache recommended (other adapters supported)

### Adding to Gemfile

Add JobWorkflow to your application's Gemfile:

```ruby
# Gemfile
gem 'job-workflow'

# Optional but recommended: SolidQueue and SolidCache
gem 'solid_queue'
gem 'solid_cache'
```

Run bundle install:

```bash
bundle install
```

### Configuring ActiveJob

Set up SolidQueue as your ActiveJob backend:

```ruby
# config/application.rb
config.active_job.queue_adapter = :solid_queue
```

If using SolidQueue, install and configure it:

```bash
bin/rails solid_queue:install
bin/rails db:migrate
```

### Configuring Cache Store (Optional but Recommended)

For workflows using throttling or semaphores, configure SolidCache:

```ruby
# config/environments/production.rb
config.cache_store = :solid_cache_store
```

Install SolidCache:

```bash
bin/rails solid_cache:install
bin/rails db:migrate
```

---

## Core Concepts

Understanding these core concepts will help you build effective workflows with JobWorkflow.

### Workflow

A workflow is a collection of tasks that execute in a defined order. Each workflow is represented as a job class that includes `JobWorkflow::DSL`.

```ruby
class MyWorkflowJob < ApplicationJob
  include JobWorkflow::DSL

  # Tasks defined here
end
```

### Task

A task is the smallest execution unit in a workflow. Each task:

- Has a unique name (symbol)
- Can depend on other tasks
- Receives a Context object
- Can return outputs for use in later tasks

```ruby
task :fetch_data, output: { result: "String" } do |ctx|
  { result: "data" }
end
```

### Arguments

Arguments are **immutable inputs** passed to the workflow. They represent the initial configuration and data:

```ruby
# Define arguments
argument :user_id, "Integer"
argument :email, "String"
argument :config, "Hash", default: {}

# Access in tasks (read-only)
task :example do |ctx|
  user_id = ctx.arguments.user_id # Read-only
  email = ctx.arguments.email
end
```

**Key Points**:
- Arguments are read-only and cannot be modified
- Arguments persist throughout workflow execution
- Use task outputs to pass data between tasks

### Context

Context provides access to workflow state:

- **Arguments**: Immutable inputs (`ctx.arguments`)
- **Outputs**: Results from previous tasks (`ctx.output[:task_name]`)
- **Utilities**: Throttling, instrumentation (`ctx.throttle`, `ctx.instrument`)

```ruby
task :process, depends_on: [:fetch_data] do |ctx|
  # Access arguments
  config = ctx.arguments.config

  # Access outputs from previous tasks
  data = ctx.output[:fetch_data].first.result

  # Use throttling
  ctx.throttle(limit: 10, key: "api") do
    API.call(data)
  end
end
```

### Outputs

Outputs are structured data returned from tasks. They are:

- Defined using the `output:` option
- Accessible via `ctx.output[:task_name]`
- Automatically collected for map tasks (arrays)
- Persisted with the context

```ruby
# Define output structure
task :fetch_user, output: { user: "Hash", status: "Symbol" } do |ctx|
  user = User.find(ctx.arguments.user_id)
  {
    user: user.as_json,
    status: :ok
  }
end

# Access output in another task
task :process_user, depends_on: [:fetch_user] do |ctx|
  user_data = ctx.output[:fetch_user].first.user
  status = ctx.output[:fetch_user].first.status
  # Process user_data...
end
```

---

## Your First Workflow

Let's create a practical example: a simple ETL (Extract-Transform-Load) workflow.

### Step 1: Create the Job Class

Create a new job file:

```ruby
# app/jobs/data_pipeline_job.rb
class DataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  # Define arguments (immutable inputs)
  argument :source_id, "Integer"

  # Task 1: Data extraction
  task :extract, output: { raw_data: "String" } do |ctx|
    source_id = ctx.arguments.source_id
    raw_data = ExternalAPI.fetch(source_id)
    { raw_data: raw_data }
  end

  # Task 2: Data transformation (depends on extract)
  task :transform, depends_on: [:extract], output: { transformed_data: "Hash" } do |ctx|
    raw_data = ctx.output[:extract].first.raw_data
    transformed_data = JSON.parse(raw_data)
    { transformed_data: transformed_data }
  end

  # Task 3: Data loading (depends on transform)
  task :load, depends_on: [:transform] do |ctx|
    transformed_data = ctx.output[:transform].first.transformed_data
    DataModel.create!(transformed_data)
    Rails.logger.info "Data loaded successfully"
  end
end
```

### Step 2: Enqueue the Job

From a controller, Rake task, or Rails console:

```ruby
# Asynchronous execution (recommended)
DataPipelineJob.perform_later(source_id: 123)

# Synchronous execution (for testing/development)
DataPipelineJob.perform_now(source_id: 123)
```

### Step 3: Understand the Execution Flow

1. The `extract` task executes first (no dependencies)
2. After `extract` completes, the `transform` task executes
3. After `transform` completes, the `load` task executes

JobWorkflow automatically determines the correct execution order based on dependencies using topological sorting.
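
For intuition, that ordering can be modeled as Kahn's algorithm over the dependency graph. The following is an illustrative sketch using this guide's ETL tasks, not the gem's actual `TaskGraph` implementation:

```ruby
# Illustrative Kahn's algorithm; not JobWorkflow's TaskGraph code.
deps = { extract: [], transform: [:extract], load: [:transform] }

order = []
remaining = deps.transform_values(&:dup)
until remaining.empty?
  ready = remaining.select { |_task, d| d.empty? }.keys
  raise "dependency cycle detected" if ready.empty?

  order.concat(ready)                           # these tasks can run now (in parallel)
  ready.each { |t| remaining.delete(t) }
  remaining.transform_values! { |d| d - ready } # their dependents are unblocked
end
order # => [:extract, :transform, :load]
```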

### Step 4: Monitor Execution

JobWorkflow outputs workflow execution status to the Rails logger:

```ruby
# config/environments/development.rb
config.log_level = :info
```

Example log output:

```json
{"time":"2024-01-02T10:00:00.123Z","level":"INFO","event":"workflow.start","job_name":"DataPipelineJob","job_id":"abc123"}
{"time":"2024-01-02T10:00:01.234Z","level":"INFO","event":"task.start","job_name":"DataPipelineJob","job_id":"abc123","task_name":"extract"}
{"time":"2024-01-02T10:00:05.345Z","level":"INFO","event":"task.complete","job_name":"DataPipelineJob","job_id":"abc123","task_name":"extract"}
{"time":"2024-01-02T10:00:05.456Z","level":"INFO","event":"task.start","job_name":"DataPipelineJob","job_id":"abc123","task_name":"transform"}
{"time":"2024-01-02T10:00:07.567Z","level":"INFO","event":"task.complete","job_name":"DataPipelineJob","job_id":"abc123","task_name":"transform"}
{"time":"2024-01-02T10:00:07.678Z","level":"INFO","event":"task.start","job_name":"DataPipelineJob","job_id":"abc123","task_name":"load"}
{"time":"2024-01-02T10:00:10.789Z","level":"INFO","event":"task.complete","job_name":"DataPipelineJob","job_id":"abc123","task_name":"load"}
{"time":"2024-01-02T10:00:10.890Z","level":"INFO","event":"workflow.complete","job_name":"DataPipelineJob","job_id":"abc123"}
```

---

## Common Patterns

Here are some common patterns you'll use when building workflows.

### Multiple Dependencies

Tasks can depend on multiple other tasks:

```ruby
argument :order_id, "Integer"

task :fetch_order, output: { order: "Hash" } do |ctx|
  order = Order.find(ctx.arguments.order_id)
  { order: order.as_json }
end

task :fetch_user, depends_on: [:fetch_order], output: { user: "Hash" } do |ctx|
  order = ctx.output[:fetch_order].first.order
  user = User.find(order["user_id"])
  { user: user.as_json }
end

task :fetch_inventory, depends_on: [:fetch_order], output: { inventory: "Array[Hash]" } do |ctx|
  order = ctx.output[:fetch_order].first.order
  inventory = order["items"].map { |item| Inventory.check(item["id"]) }
  { inventory: inventory }
end

# This task waits for both :fetch_user and :fetch_inventory
task :validate_order, depends_on: [:fetch_user, :fetch_inventory] do |ctx|
  user = ctx.output[:fetch_user].first.user
  inventory = ctx.output[:fetch_inventory].first.inventory

  OrderValidator.validate(user, inventory)
end
```

### Conditional Execution

Execute tasks only when conditions are met:

```ruby
argument :user, "User"
argument :amount, "Integer"

task :basic_processing do |ctx|
  # Always executes
  BasicProcessor.process(ctx.arguments.amount)
end

# Only execute for premium users
task :premium_processing,
     depends_on: [:basic_processing],
     condition: ->(ctx) { ctx.arguments.user.premium? } do |ctx|
  PremiumProcessor.process(ctx.arguments.amount)
end

# Only execute for large amounts
task :large_amount_processing,
     depends_on: [:basic_processing],
     condition: ->(ctx) { ctx.arguments.amount > 1000 } do |ctx|
  LargeAmountProcessor.process(ctx.arguments.amount)
end
```

### Error Handling with Retry

Add retry logic to handle transient failures:

```ruby
argument :api_endpoint, "String"

# Simple retry (up to 3 times)
task :fetch_data, retry: 3, output: { data: "Hash" } do |ctx|
  endpoint = ctx.arguments.api_endpoint
  { data: ExternalAPI.fetch(endpoint) }
end

# Advanced retry with exponential backoff
task :fetch_data_advanced,
     retry: {
       count: 5,
       strategy: :exponential,
       base_delay: 2,
       jitter: true
     },
     output: { data: "Hash" } do |ctx|
  endpoint = ctx.arguments.api_endpoint
  { data: ExternalAPI.fetch(endpoint) }
  # Retry intervals: 2±1s, 4±2s, 8±4s, 16±8s, 32±16s
end
```

### Parallel Processing

Process collections in parallel:

```ruby
argument :user_ids, "Array[Integer]"

# Process each user in parallel
task :process_users,
     each: ->(ctx) { ctx.arguments.user_ids },
     output: { user_id: "Integer", status: "Symbol" } do |ctx|
  user_id = ctx.each_value
  user = User.find(user_id)
  user.process!
  {
    user_id: user_id,
    status: :processed
  }
end

# Aggregate results
task :summarize, depends_on: [:process_users] do |ctx|
  results = ctx.output[:process_users]
  puts "Processed #{results.size} users"
  results.each do |result|
    puts "User #{result.user_id}: #{result.status}"
  end
end
```

### Throttling API Calls

Limit concurrent API calls to respect rate limits:

```ruby
argument :items, "Array[Hash]"

# Max 10 concurrent API calls
task :fetch_from_api,
     throttle: 10,
     each: ->(ctx) { ctx.arguments.items },
     output: { result: "Hash" } do |ctx|
  item = ctx.each_value
  { result: RateLimitedAPI.fetch(item["id"]) }
end
```

---

## Debugging and Logging

### Viewing Logs

JobWorkflow uses structured JSON logging. Configure your log level:

```ruby
# config/environments/development.rb
config.log_level = :debug # Show all logs, including throttling

# config/environments/production.rb
config.log_level = :info # Show workflow and task lifecycle events
```

### Common Log Events

- `workflow.start` / `workflow.complete` - Workflow lifecycle
- `task.start` / `task.complete` - Task execution
- `task.retry` - Task retry after failure
- `task.skip` - Task skipped (condition not met)
- `throttle.acquire.*` / `throttle.release` - Throttling events

### Testing Workflows

Test workflows in development using `perform_now`:

```ruby
# In Rails console
result = DataPipelineJob.perform_now(source_id: 123)
```

For automated testing, see the [TESTING_STRATEGY.md](TESTING_STRATEGY.md) guide.
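
As a minimal starting point, here is a sketch of an automated test for the `DataPipelineJob` above, assuming `rspec-rails` and `rspec-mocks` (the stubbed `ExternalAPI` and `DataModel` names come from this guide's example):

```ruby
# spec/jobs/data_pipeline_job_spec.rb
require "rails_helper"

RSpec.describe DataPipelineJob do
  it "extracts, transforms, and loads a record" do
    # Stub the external call so the test is deterministic
    allow(ExternalAPI).to receive(:fetch).with(123).and_return('{"name":"test"}')

    expect { described_class.perform_now(source_id: 123) }
      .to change(DataModel, :count).by(1)
  end
end
```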

---

## Next Steps

Now that you have a basic understanding of JobWorkflow, here are some recommended next steps:

1. **[DSL_BASICS.md](DSL_BASICS.md)** - Learn the full DSL syntax and task options
2. **[TASK_OUTPUTS.md](TASK_OUTPUTS.md)** - Master task outputs and data passing
3. **[PARALLEL_PROCESSING.md](PARALLEL_PROCESSING.md)** - Build efficient parallel workflows
4. **[ERROR_HANDLING.md](ERROR_HANDLING.md)** - Implement robust error handling
5. **[PRODUCTION_DEPLOYMENT.md](PRODUCTION_DEPLOYMENT.md)** - Deploy to production safely

---

## Need Help?

- **Documentation**: Browse the other guides in this directory
- **Issues**: Report bugs or request features on [GitHub](https://github.com/shoma07/job-workflow/issues)
- **Examples**: Check out the example workflows in the repository

Happy workflow building! 🚀