job-workflow 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (132)
  1. checksums.yaml +7 -0
  2. data/.rspec +3 -0
  3. data/.rubocop.yml +91 -0
  4. data/CHANGELOG.md +23 -0
  5. data/LICENSE.txt +21 -0
  6. data/README.md +47 -0
  7. data/Rakefile +55 -0
  8. data/Steepfile +10 -0
  9. data/guides/API_REFERENCE.md +112 -0
  10. data/guides/BEST_PRACTICES.md +113 -0
  11. data/guides/CACHE_STORE_INTEGRATION.md +145 -0
  12. data/guides/CONDITIONAL_EXECUTION.md +66 -0
  13. data/guides/DEPENDENCY_WAIT.md +386 -0
  14. data/guides/DRY_RUN.md +390 -0
  15. data/guides/DSL_BASICS.md +216 -0
  16. data/guides/ERROR_HANDLING.md +187 -0
  17. data/guides/GETTING_STARTED.md +524 -0
  18. data/guides/INSTRUMENTATION.md +131 -0
  19. data/guides/LIFECYCLE_HOOKS.md +415 -0
  20. data/guides/NAMESPACES.md +75 -0
  21. data/guides/OPENTELEMETRY_INTEGRATION.md +86 -0
  22. data/guides/PARALLEL_PROCESSING.md +302 -0
  23. data/guides/PRODUCTION_DEPLOYMENT.md +110 -0
  24. data/guides/QUEUE_MANAGEMENT.md +141 -0
  25. data/guides/README.md +174 -0
  26. data/guides/SCHEDULED_JOBS.md +165 -0
  27. data/guides/STRUCTURED_LOGGING.md +268 -0
  28. data/guides/TASK_OUTPUTS.md +240 -0
  29. data/guides/TESTING_STRATEGY.md +56 -0
  30. data/guides/THROTTLING.md +198 -0
  31. data/guides/TROUBLESHOOTING.md +53 -0
  32. data/guides/WORKFLOW_COMPOSITION.md +675 -0
  33. data/guides/WORKFLOW_STATUS_QUERY.md +288 -0
  34. data/lib/job-workflow.rb +3 -0
  35. data/lib/job_workflow/argument_def.rb +16 -0
  36. data/lib/job_workflow/arguments.rb +40 -0
  37. data/lib/job_workflow/auto_scaling/adapter/aws_adapter.rb +66 -0
  38. data/lib/job_workflow/auto_scaling/adapter.rb +31 -0
  39. data/lib/job_workflow/auto_scaling/configuration.rb +85 -0
  40. data/lib/job_workflow/auto_scaling/executor.rb +43 -0
  41. data/lib/job_workflow/auto_scaling.rb +69 -0
  42. data/lib/job_workflow/cache_store_adapters.rb +46 -0
  43. data/lib/job_workflow/context.rb +352 -0
  44. data/lib/job_workflow/dry_run_config.rb +31 -0
  45. data/lib/job_workflow/dsl.rb +236 -0
  46. data/lib/job_workflow/error_hook.rb +24 -0
  47. data/lib/job_workflow/hook.rb +24 -0
  48. data/lib/job_workflow/hook_registry.rb +66 -0
  49. data/lib/job_workflow/instrumentation/log_subscriber.rb +194 -0
  50. data/lib/job_workflow/instrumentation/opentelemetry_subscriber.rb +221 -0
  51. data/lib/job_workflow/instrumentation.rb +257 -0
  52. data/lib/job_workflow/job_status.rb +92 -0
  53. data/lib/job_workflow/logger.rb +86 -0
  54. data/lib/job_workflow/namespace.rb +36 -0
  55. data/lib/job_workflow/output.rb +81 -0
  56. data/lib/job_workflow/output_def.rb +14 -0
  57. data/lib/job_workflow/queue.rb +74 -0
  58. data/lib/job_workflow/queue_adapter.rb +38 -0
  59. data/lib/job_workflow/queue_adapters/abstract.rb +87 -0
  60. data/lib/job_workflow/queue_adapters/null_adapter.rb +127 -0
  61. data/lib/job_workflow/queue_adapters/solid_queue_adapter.rb +224 -0
  62. data/lib/job_workflow/runner.rb +173 -0
  63. data/lib/job_workflow/schedule.rb +46 -0
  64. data/lib/job_workflow/semaphore.rb +71 -0
  65. data/lib/job_workflow/task.rb +83 -0
  66. data/lib/job_workflow/task_callable.rb +43 -0
  67. data/lib/job_workflow/task_context.rb +70 -0
  68. data/lib/job_workflow/task_dependency_wait.rb +66 -0
  69. data/lib/job_workflow/task_enqueue.rb +50 -0
  70. data/lib/job_workflow/task_graph.rb +43 -0
  71. data/lib/job_workflow/task_job_status.rb +70 -0
  72. data/lib/job_workflow/task_output.rb +51 -0
  73. data/lib/job_workflow/task_retry.rb +64 -0
  74. data/lib/job_workflow/task_throttle.rb +46 -0
  75. data/lib/job_workflow/version.rb +5 -0
  76. data/lib/job_workflow/workflow.rb +87 -0
  77. data/lib/job_workflow/workflow_status.rb +112 -0
  78. data/lib/job_workflow.rb +59 -0
  79. data/rbs_collection.lock.yaml +172 -0
  80. data/rbs_collection.yaml +14 -0
  81. data/sig/generated/job-workflow.rbs +2 -0
  82. data/sig/generated/job_workflow/argument_def.rbs +14 -0
  83. data/sig/generated/job_workflow/arguments.rbs +26 -0
  84. data/sig/generated/job_workflow/auto_scaling/adapter/aws_adapter.rbs +32 -0
  85. data/sig/generated/job_workflow/auto_scaling/adapter.rbs +22 -0
  86. data/sig/generated/job_workflow/auto_scaling/configuration.rbs +50 -0
  87. data/sig/generated/job_workflow/auto_scaling/executor.rbs +29 -0
  88. data/sig/generated/job_workflow/auto_scaling.rbs +47 -0
  89. data/sig/generated/job_workflow/cache_store_adapters.rbs +28 -0
  90. data/sig/generated/job_workflow/context.rbs +155 -0
  91. data/sig/generated/job_workflow/dry_run_config.rbs +16 -0
  92. data/sig/generated/job_workflow/dsl.rbs +117 -0
  93. data/sig/generated/job_workflow/error_hook.rbs +18 -0
  94. data/sig/generated/job_workflow/hook.rbs +18 -0
  95. data/sig/generated/job_workflow/hook_registry.rbs +47 -0
  96. data/sig/generated/job_workflow/instrumentation/log_subscriber.rbs +102 -0
  97. data/sig/generated/job_workflow/instrumentation/opentelemetry_subscriber.rbs +113 -0
  98. data/sig/generated/job_workflow/instrumentation.rbs +138 -0
  99. data/sig/generated/job_workflow/job_status.rbs +46 -0
  100. data/sig/generated/job_workflow/logger.rbs +56 -0
  101. data/sig/generated/job_workflow/namespace.rbs +24 -0
  102. data/sig/generated/job_workflow/output.rbs +39 -0
  103. data/sig/generated/job_workflow/output_def.rbs +12 -0
  104. data/sig/generated/job_workflow/queue.rbs +49 -0
  105. data/sig/generated/job_workflow/queue_adapter.rbs +18 -0
  106. data/sig/generated/job_workflow/queue_adapters/abstract.rbs +56 -0
  107. data/sig/generated/job_workflow/queue_adapters/null_adapter.rbs +73 -0
  108. data/sig/generated/job_workflow/queue_adapters/solid_queue_adapter.rbs +111 -0
  109. data/sig/generated/job_workflow/runner.rbs +66 -0
  110. data/sig/generated/job_workflow/schedule.rbs +34 -0
  111. data/sig/generated/job_workflow/semaphore.rbs +37 -0
  112. data/sig/generated/job_workflow/task.rbs +60 -0
  113. data/sig/generated/job_workflow/task_callable.rbs +30 -0
  114. data/sig/generated/job_workflow/task_context.rbs +52 -0
  115. data/sig/generated/job_workflow/task_dependency_wait.rbs +42 -0
  116. data/sig/generated/job_workflow/task_enqueue.rbs +27 -0
  117. data/sig/generated/job_workflow/task_graph.rbs +27 -0
  118. data/sig/generated/job_workflow/task_job_status.rbs +42 -0
  119. data/sig/generated/job_workflow/task_output.rbs +29 -0
  120. data/sig/generated/job_workflow/task_retry.rbs +30 -0
  121. data/sig/generated/job_workflow/task_throttle.rbs +20 -0
  122. data/sig/generated/job_workflow/version.rbs +5 -0
  123. data/sig/generated/job_workflow/workflow.rbs +48 -0
  124. data/sig/generated/job_workflow/workflow_status.rbs +55 -0
  125. data/sig/generated/job_workflow.rbs +8 -0
  126. data/sig-private/activejob.rbs +35 -0
  127. data/sig-private/activesupport.rbs +23 -0
  128. data/sig-private/aws.rbs +32 -0
  129. data/sig-private/opentelemetry.rbs +40 -0
  130. data/sig-private/solid_queue.rbs +108 -0
  131. data/tmp/.keep +0 -0
  132. metadata +190 -0

data/guides/ERROR_HANDLING.md
@@ -0,0 +1,187 @@
# Error Handling

JobWorkflow provides robust error handling features. With retry strategies and custom error handling, you can build reliable workflows.

## Retry Configuration

### Simple Retry

Specify the maximum retry count with a simple integer.

```ruby
argument :api_endpoint, "String"

# Simple retry (up to 3 times)
task :fetch_data, retry: 3, output: { data: "Hash" } do |ctx|
  endpoint = ctx.arguments.api_endpoint
  { data: ExternalAPI.fetch(endpoint) }
end
```

### Advanced Retry Configuration

Use a Hash for detailed retry configuration with exponential backoff.

```ruby
task :advanced_retry,
  retry: {
    count: 5,               # Maximum retry attempts
    strategy: :exponential, # :linear or :exponential
    base_delay: 2,          # Initial wait time in seconds
    jitter: true            # Add ±randomness to prevent thundering herd
  },
  output: { result: "String" } do |ctx|
  { result: unreliable_operation }
  # Retry intervals: 2±1s, 4±2s, 8±4s, 16±8s, 32±16s
end
```

## Retry Strategies

### Linear

Retries at fixed intervals.

```ruby
task :linear_retry,
  retry: { count: 5, strategy: :linear, base_delay: 10 },
  output: { result: "String" } do |ctx|
  { result: operation }
  # Retry intervals: 10s, 10s, 10s, 10s, 10s
end
```

### Exponential (Recommended)

Doubles wait time with each retry.

```ruby
task :exponential_retry,
  retry: { count: 5, strategy: :exponential, base_delay: 2, jitter: true },
  output: { result: "String" } do |ctx|
  { result: operation }
  # Retry intervals: 2±1s, 4±2s, 8±4s, 16±8s, 32±16s
end
```
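
A quick way to sanity-check a retry configuration is to compute its base schedule by hand. This sketch simply reproduces the intervals shown in the comments above (the delay doubles from `base_delay`; with `jitter: true` each value then varies by up to ±50%):

```ruby
base_delay = 2
schedule = (1..5).map { |attempt| base_delay * 2**(attempt - 1) }
schedule # => [2, 4, 8, 16, 32] seconds before retries 1..5 (±50% with jitter)
```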

## Workflow-Level Retry

### Using ActiveJob's `retry_on`

To retry the entire workflow (all tasks from the beginning) when an error occurs, use ActiveJob's standard `retry_on` method. This automatically requeues the complete job, ensuring all tasks are re-executed:

```ruby
class DataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  argument :data_source, "String"

  # Retry the entire workflow on StandardError (e.g., API timeouts)
  retry_on StandardError, wait: :exponentially_longer, attempts: 5

  task :fetch_data, output: { raw_data: "String" } do |ctx|
    source = ctx.arguments.data_source
    { raw_data: ExternalAPI.fetch(source) }
  end

  task :validate_data, depends_on: [:fetch_data], output: { valid: "bool" } do |ctx|
    data = ctx.output[:fetch_data][:raw_data]
    { valid: validate(data) }
  end

  task :process_data, depends_on: [:validate_data] do |ctx|
    # ... process data
  end
end
```

### Combining Task-Level and Workflow-Level Retries

You can combine task-level retries (for handling transient errors) with workflow-level retries (for catastrophic failures):

```ruby
class RobustDataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  # Workflow-level: Handle catastrophic failures (e.g., database connection loss)
  retry_on DatabaseConnectionError, wait: :exponentially_longer, attempts: 3

  argument :batch_id, "String"

  # Task-level: Handle transient API errors
  task :fetch_data,
    retry: { count: 3, strategy: :exponential, base_delay: 2 },
    output: { raw_data: "String" } do |ctx|
    { raw_data: ExternalAPI.fetch(ctx.arguments.batch_id) }
  end

  task :validate_data,
    depends_on: [:fetch_data],
    retry: { count: 2, strategy: :linear, base_delay: 1 },
    output: { valid: "bool" } do |ctx|
    data = ctx.output[:fetch_data][:raw_data]
    { valid: validate(data) }
  end

  task :store_results, depends_on: [:validate_data] do |ctx|
    # If this succeeds, the entire workflow is complete
    # If a database connection error occurs here, the entire job is retried
    Database.store(ctx.output[:validate_data])
  end
end
```

### Retry Options

The `retry_on` method supports several options from ActiveJob:

```ruby
class MyWorkflowJob < ApplicationJob
  include JobWorkflow::DSL

  # Wait with exponential backoff (2, 4, 8, 16, 32 seconds...)
  retry_on TimeoutError, wait: :exponentially_longer, attempts: 5

  # Wait with a fixed interval
  retry_on APIError, wait: 30.seconds, attempts: 3

  # Custom wait logic
  retry_on CustomError, wait: ->(executions) { (executions + 1) * 10.seconds }, attempts: 4

  # Multiple error types
  retry_on TimeoutError, APIError, wait: :exponentially_longer, attempts: 3

  # ... task definitions
end
```

### Key Differences: Task-Level vs Workflow-Level Retry

| Aspect | Task-Level (`retry:`) | Workflow-Level (`retry_on`) |
|--------|------------------------|------------------------------|
| **Scope** | Single task only | Entire workflow |
| **Re-execution** | Only the failed task retries | All tasks restart from the beginning |
| **Use Case** | Transient errors in one task (API timeouts, etc.) | Catastrophic failures affecting the whole workflow |
| **Output Preservation** | Previous outputs still available | Context reset on workflow retry |
| **Example** | API call times out | Database connection lost |

### Best Practices for Retry Strategy

1. **Task-level retries** for transient, recoverable errors:
   ```ruby
   task :api_call,
     retry: { count: 3, strategy: :exponential, base_delay: 2 } do |ctx|
     # call the flaky external service here
   end
   ```

2. **Workflow-level retries** for environment issues (database, network):
   ```ruby
   retry_on DatabaseConnectionError, wait: :exponentially_longer, attempts: 3
   ```

3. **Avoid infinite retries**:
   - Always set a maximum `attempts` limit
   - Use exponential backoff to avoid overwhelming systems

4. **Monitor retry patterns** (see the sketch below):
   - Use instrumentation hooks to track retry occurrences
   - Alert on repeated failures to identify systemic issues
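
For the monitoring point above, one approach is to subscribe to the instrumentation events that also feed the structured logs. A minimal sketch — the event name pattern and payload keys (`:job_name`, `:task_name`) are assumptions based on the log fields shown in GETTING_STARTED.md; check INSTRUMENTATION.md for the exact names JobWorkflow emits:

```ruby
# config/initializers/job_workflow_retry_alerts.rb
ActiveSupport::Notifications.subscribe(/task\.retry/) do |event|
  # Surface retries so repeated failures are visible; swap the logger
  # call for your metrics or alerting client as needed.
  Rails.logger.warn(
    "[job_workflow] retry job=#{event.payload[:job_name]} task=#{event.payload[:task_name]}"
  )
end
```
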
data/guides/GETTING_STARTED.md
@@ -0,0 +1,524 @@
# Getting Started with JobWorkflow

Welcome to JobWorkflow! This guide will help you get up and running quickly, from installation to creating your first workflow.

---

## 🚀 Quick Start (5 Minutes)

Want to get started immediately? Here's the absolute minimum you need:

### 1. Install JobWorkflow

```ruby
# Gemfile
gem 'job-workflow'
```

```bash
bundle install
```

### 2. Create Your First Workflow

```ruby
# app/jobs/hello_workflow_job.rb
class HelloWorkflowJob < ApplicationJob
  include JobWorkflow::DSL

  # Define input argument
  argument :name, "String"

  # Define a simple task
  task :greet do |ctx|
    name = ctx.arguments.name
    puts "Hello, #{name}!"
  end
end
```

### 3. Run It

```ruby
# In Rails console or from your app
HelloWorkflowJob.perform_later(name: "World")
```

**That's it!** You've just created and executed your first JobWorkflow workflow. 🎉

---

## What is JobWorkflow?

JobWorkflow is a declarative workflow orchestration engine for Ruby on Rails applications. Built on top of ActiveJob, it allows you to write complex workflows using a concise DSL.

### Why JobWorkflow?

- **Declarative DSL**: Familiar syntax similar to Rake and RSpec
- **Automatic Dependency Management**: Tasks execute in the correct order automatically
- **Parallel Processing**: Efficient parallel execution using map tasks
- **Built-in Error Handling**: Retry functionality with exponential backoff
- **JSON Serialization**: Schema-tolerant context persistence
- **Flexible Design**: A foundation for building production-style workflows

### When to Use JobWorkflow

JobWorkflow is ideal for:

- **ETL Pipelines**: Extract, transform, and load data workflows
- **Business Processes**: Multi-step business logic with dependencies
- **Batch Processing**: Process collections of items in parallel
- **API Integration**: Coordinate multiple API calls with rate limiting
- **Data Synchronization**: Keep multiple systems in sync
- **Report Generation**: Generate complex reports with multiple data sources

---

## Installation

### Requirements

Before installing JobWorkflow, ensure your environment meets these requirements:

- **Ruby** >= 3.1.0
- **Rails** >= 7.1.0 (ActiveJob, ActiveSupport)
- **Queue Backend**: SolidQueue recommended (other adapters supported)
- **Cache Backend**: SolidCache recommended (other adapters supported)

### Adding to Gemfile

Add JobWorkflow to your application's Gemfile:

```ruby
# Gemfile
gem 'job-workflow'

# Optional but recommended: SolidQueue and SolidCache
gem 'solid_queue'
gem 'solid_cache'
```

Run bundle install:

```bash
bundle install
```

### Configuring ActiveJob

Set up SolidQueue as your ActiveJob backend:

```ruby
# config/application.rb
config.active_job.queue_adapter = :solid_queue
```

Then install SolidQueue and run its migrations:

```bash
bin/rails solid_queue:install
bin/rails db:migrate
```

### Configuring Cache Store (Optional but Recommended)

For workflows using throttling or semaphores, configure SolidCache:

```ruby
# config/environments/production.rb
config.cache_store = :solid_cache_store
```

Install SolidCache:

```bash
bin/rails solid_cache:install
bin/rails db:migrate
```

---

## Core Concepts

Understanding these core concepts will help you build effective workflows with JobWorkflow.

### Workflow

A workflow is a collection of tasks that execute in a defined order. Each workflow is represented as a job class that includes `JobWorkflow::DSL`.

```ruby
class MyWorkflowJob < ApplicationJob
  include JobWorkflow::DSL

  # Tasks defined here
end
```

### Task

A task is the smallest execution unit in a workflow. Each task:

- Has a unique name (symbol)
- Can depend on other tasks
- Receives a Context object
- Can return outputs for use in later tasks

```ruby
task :fetch_data, output: { result: "String" } do |ctx|
  { result: "data" }
end
```

### Arguments

Arguments are **immutable inputs** passed to the workflow. They represent the initial configuration and data:

```ruby
# Define arguments
argument :user_id, "Integer"
argument :email, "String"
argument :config, "Hash", default: {}

# Access in tasks (read-only)
task :example do |ctx|
  user_id = ctx.arguments.user_id # Read-only
  email = ctx.arguments.email
end
```

**Key Points**:
- Arguments are read-only and cannot be modified
- Arguments persist throughout workflow execution
- Use task outputs to pass data between tasks
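
As a quick illustration (reusing the hypothetical `MyWorkflowJob` from above), declared defaults kick in when an optional argument is omitted at enqueue time:

```ruby
# user_id and email are required; config falls back to the declared default: {}
MyWorkflowJob.perform_later(user_id: 42, email: "user@example.com")
```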

### Context

Context provides access to workflow state:

- **Arguments**: Immutable inputs (`ctx.arguments`)
- **Outputs**: Results from previous tasks (`ctx.output[:task_name]`)
- **Utilities**: Throttling, instrumentation (`ctx.throttle`, `ctx.instrument`)

```ruby
task :process, depends_on: [:fetch_data] do |ctx|
  # Access arguments
  config = ctx.arguments.config

  # Access outputs from previous tasks
  data = ctx.output[:fetch_data].first.result

  # Use throttling
  ctx.throttle(limit: 10, key: "api") do
    API.call(data)
  end
end
```

### Outputs

Outputs are structured data returned from tasks. They are:

- Defined using the `output:` option
- Accessible via `ctx.output[:task_name]`
- Automatically collected for map tasks (arrays)
- Persisted with the context

```ruby
# Define output structure
task :fetch_user, output: { user: "Hash", status: "Symbol" } do |ctx|
  user = User.find(ctx.arguments.user_id)
  {
    user: user.as_json,
    status: :ok
  }
end

# Access output in another task
task :process_user, depends_on: [:fetch_user] do |ctx|
  user_data = ctx.output[:fetch_user].first.user
  status = ctx.output[:fetch_user].first.status
  # Process user_data...
end
```

---

## Your First Workflow

Let's create a practical example: a simple ETL (Extract-Transform-Load) workflow.

### Step 1: Create the Job Class

Create a new job file:

```ruby
# app/jobs/data_pipeline_job.rb
class DataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  # Define arguments (immutable inputs)
  argument :source_id, "Integer"

  # Task 1: Data extraction
  task :extract, output: { raw_data: "String" } do |ctx|
    source_id = ctx.arguments.source_id
    raw_data = ExternalAPI.fetch(source_id)
    { raw_data: raw_data }
  end

  # Task 2: Data transformation (depends on extract)
  task :transform, depends_on: [:extract], output: { transformed_data: "Hash" } do |ctx|
    raw_data = ctx.output[:extract].first.raw_data
    transformed_data = JSON.parse(raw_data)
    { transformed_data: transformed_data }
  end

  # Task 3: Data loading (depends on transform)
  task :load, depends_on: [:transform] do |ctx|
    transformed_data = ctx.output[:transform].first.transformed_data
    DataModel.create!(transformed_data)
    Rails.logger.info "Data loaded successfully"
  end
end
```

### Step 2: Enqueue the Job

From a controller, Rake task, or Rails console:

```ruby
# Asynchronous execution (recommended)
DataPipelineJob.perform_later(source_id: 123)

# Synchronous execution (for testing/development)
DataPipelineJob.perform_now(source_id: 123)
```

### Step 3: Understand the Execution Flow

1. The `extract` task executes first (no dependencies)
2. After `extract` completes, the `transform` task executes
3. After `transform` completes, the `load` task executes

JobWorkflow automatically determines the correct execution order based on dependencies using topological sorting.
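
You never order tasks by hand; the order falls out of `depends_on`. For intuition, here is a toy Kahn-style topological sort over the pipeline above (illustrative only, not JobWorkflow's internal implementation):

```ruby
# Each task maps to the tasks it depends on, as declared above.
deps = { extract: [], transform: [:extract], load: [:transform] }

order = []
remaining = deps.keys
until remaining.empty?
  # A task is ready once all of its dependencies are already ordered
  ready = remaining.select { |t| (deps[t] - order).empty? }
  raise "dependency cycle" if ready.empty?
  order.concat(ready)
  remaining -= ready
end
order # => [:extract, :transform, :load]
```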

### Step 4: Monitor Execution

JobWorkflow outputs workflow execution status to the Rails logger:

```ruby
# config/environments/development.rb
config.log_level = :info
```

Example log output:

```json
{"time":"2024-01-02T10:00:00.123Z","level":"INFO","event":"workflow.start","job_name":"DataPipelineJob","job_id":"abc123"}
{"time":"2024-01-02T10:00:01.234Z","level":"INFO","event":"task.start","job_name":"DataPipelineJob","job_id":"abc123","task_name":"extract"}
{"time":"2024-01-02T10:00:05.345Z","level":"INFO","event":"task.complete","job_name":"DataPipelineJob","job_id":"abc123","task_name":"extract"}
{"time":"2024-01-02T10:00:05.456Z","level":"INFO","event":"task.start","job_name":"DataPipelineJob","job_id":"abc123","task_name":"transform"}
{"time":"2024-01-02T10:00:07.567Z","level":"INFO","event":"task.complete","job_name":"DataPipelineJob","job_id":"abc123","task_name":"transform"}
{"time":"2024-01-02T10:00:07.678Z","level":"INFO","event":"task.start","job_name":"DataPipelineJob","job_id":"abc123","task_name":"load"}
{"time":"2024-01-02T10:00:10.789Z","level":"INFO","event":"task.complete","job_name":"DataPipelineJob","job_id":"abc123","task_name":"load"}
{"time":"2024-01-02T10:00:10.890Z","level":"INFO","event":"workflow.complete","job_name":"DataPipelineJob","job_id":"abc123"}
```

---

## Common Patterns

Here are some common patterns you'll use when building workflows.

### Multiple Dependencies

Tasks can depend on multiple other tasks:

```ruby
argument :order_id, "Integer"

task :fetch_order, output: { order: "Hash" } do |ctx|
  order = Order.find(ctx.arguments.order_id)
  { order: order.as_json }
end

task :fetch_user, depends_on: [:fetch_order], output: { user: "Hash" } do |ctx|
  order = ctx.output[:fetch_order].first.order
  user = User.find(order["user_id"])
  { user: user.as_json }
end

task :fetch_inventory, depends_on: [:fetch_order], output: { inventory: "Array[Hash]" } do |ctx|
  order = ctx.output[:fetch_order].first.order
  inventory = order["items"].map { |item| Inventory.check(item["id"]) }
  { inventory: inventory }
end

# This task waits for both :fetch_user and :fetch_inventory
task :validate_order, depends_on: [:fetch_user, :fetch_inventory] do |ctx|
  user = ctx.output[:fetch_user].first.user
  inventory = ctx.output[:fetch_inventory].first.inventory

  OrderValidator.validate(user, inventory)
end
```

### Conditional Execution

Execute tasks only when conditions are met:

```ruby
argument :user, "User"
argument :amount, "Integer"

task :basic_processing do |ctx|
  # Always executes
  BasicProcessor.process(ctx.arguments.amount)
end

# Only execute for premium users
task :premium_processing,
  depends_on: [:basic_processing],
  condition: ->(ctx) { ctx.arguments.user.premium? } do |ctx|
  PremiumProcessor.process(ctx.arguments.amount)
end

# Only execute for large amounts
task :large_amount_processing,
  depends_on: [:basic_processing],
  condition: ->(ctx) { ctx.arguments.amount > 1000 } do |ctx|
  LargeAmountProcessor.process(ctx.arguments.amount)
end
```

### Error Handling with Retry

Add retry logic to handle transient failures:

```ruby
argument :api_endpoint, "String"

# Simple retry (up to 3 times)
task :fetch_data, retry: 3, output: { data: "Hash" } do |ctx|
  endpoint = ctx.arguments.api_endpoint
  { data: ExternalAPI.fetch(endpoint) }
end

# Advanced retry with exponential backoff
task :fetch_data_advanced,
  retry: {
    count: 5,
    strategy: :exponential,
    base_delay: 2,
    jitter: true
  },
  output: { data: "Hash" } do |ctx|
  endpoint = ctx.arguments.api_endpoint
  { data: ExternalAPI.fetch(endpoint) }
  # Retry intervals: 2±1s, 4±2s, 8±4s, 16±8s, 32±16s
end
```

### Parallel Processing

Process collections in parallel:

```ruby
argument :user_ids, "Array[Integer]"

# Process each user in parallel
task :process_users,
  each: ->(ctx) { ctx.arguments.user_ids },
  output: { user_id: "Integer", status: "Symbol" } do |ctx|
  user_id = ctx.each_value
  user = User.find(user_id)
  user.process!
  {
    user_id: user_id,
    status: :processed
  }
end

# Aggregate results
task :summarize, depends_on: [:process_users] do |ctx|
  results = ctx.output[:process_users]
  puts "Processed #{results.size} users"
  results.each do |result|
    puts "User #{result.user_id}: #{result.status}"
  end
end
```

### Throttling API Calls

Limit concurrent API calls to respect rate limits:

```ruby
argument :items, "Array[Hash]"

# Max 10 concurrent API calls
task :fetch_from_api,
  throttle: 10,
  each: ->(ctx) { ctx.arguments.items },
  output: { result: "Hash" } do |ctx|
  item = ctx.each_value
  { result: RateLimitedAPI.fetch(item["id"]) }
end
```

---

## Debugging and Logging

### Viewing Logs

JobWorkflow uses structured JSON logging. Configure your log level:

```ruby
# config/environments/development.rb
config.log_level = :debug # Show all logs including throttling

# config/environments/production.rb
config.log_level = :info # Show workflow and task lifecycle events
```

### Common Log Events

- `workflow.start` / `workflow.complete` - Workflow lifecycle
- `task.start` / `task.complete` - Task execution
- `task.retry` - Task retry after failure
- `task.skip` - Task skipped (condition not met)
- `throttle.acquire.*` / `throttle.release` - Throttling events

### Testing Workflows

Test workflows in development using `perform_now`:

```ruby
# In Rails console
result = DataPipelineJob.perform_now(source_id: 123)
```
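
As a starting point for an automated test, here is a minimal RSpec sketch for the `DataPipelineJob` from earlier. Assumptions: your app uses RSpec with rspec-mocks, and `ExternalAPI`/`DataModel` are the illustrative constants from that example:

```ruby
# spec/jobs/data_pipeline_job_spec.rb
require "rails_helper"

RSpec.describe DataPipelineJob do
  it "extracts, transforms, and loads one record" do
    # Stub the external boundary so the test is deterministic
    allow(ExternalAPI).to receive(:fetch).with(123).and_return('{"name":"test"}')

    expect {
      described_class.perform_now(source_id: 123)
    }.to change(DataModel, :count).by(1)
  end
end
```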

For automated testing, see the [TESTING_STRATEGY.md](TESTING_STRATEGY.md) guide.

---

## Next Steps

Now that you have a basic understanding of JobWorkflow, here are some recommended next steps:

1. **[DSL_BASICS.md](DSL_BASICS.md)** - Learn the full DSL syntax and task options
2. **[TASK_OUTPUTS.md](TASK_OUTPUTS.md)** - Master task outputs and data passing
3. **[PARALLEL_PROCESSING.md](PARALLEL_PROCESSING.md)** - Build efficient parallel workflows
4. **[ERROR_HANDLING.md](ERROR_HANDLING.md)** - Implement robust error handling
5. **[PRODUCTION_DEPLOYMENT.md](PRODUCTION_DEPLOYMENT.md)** - Deploy to production safely

---

## Need Help?

- **Documentation**: Browse the other guides in this directory
- **Issues**: Report bugs or request features on [GitHub](https://github.com/shoma07/job-workflow/issues)
- **Examples**: Check out the example workflows in the repository

Happy workflow building! 🚀