job-workflow 0.1.3
- checksums.yaml +7 -0
- data/.rspec +3 -0
- data/.rubocop.yml +91 -0
- data/CHANGELOG.md +23 -0
- data/LICENSE.txt +21 -0
- data/README.md +47 -0
- data/Rakefile +55 -0
- data/Steepfile +10 -0
- data/guides/API_REFERENCE.md +112 -0
- data/guides/BEST_PRACTICES.md +113 -0
- data/guides/CACHE_STORE_INTEGRATION.md +145 -0
- data/guides/CONDITIONAL_EXECUTION.md +66 -0
- data/guides/DEPENDENCY_WAIT.md +386 -0
- data/guides/DRY_RUN.md +390 -0
- data/guides/DSL_BASICS.md +216 -0
- data/guides/ERROR_HANDLING.md +187 -0
- data/guides/GETTING_STARTED.md +524 -0
- data/guides/INSTRUMENTATION.md +131 -0
- data/guides/LIFECYCLE_HOOKS.md +415 -0
- data/guides/NAMESPACES.md +75 -0
- data/guides/OPENTELEMETRY_INTEGRATION.md +86 -0
- data/guides/PARALLEL_PROCESSING.md +302 -0
- data/guides/PRODUCTION_DEPLOYMENT.md +110 -0
- data/guides/QUEUE_MANAGEMENT.md +141 -0
- data/guides/README.md +174 -0
- data/guides/SCHEDULED_JOBS.md +165 -0
- data/guides/STRUCTURED_LOGGING.md +268 -0
- data/guides/TASK_OUTPUTS.md +240 -0
- data/guides/TESTING_STRATEGY.md +56 -0
- data/guides/THROTTLING.md +198 -0
- data/guides/TROUBLESHOOTING.md +53 -0
- data/guides/WORKFLOW_COMPOSITION.md +675 -0
- data/guides/WORKFLOW_STATUS_QUERY.md +288 -0
- data/lib/job-workflow.rb +3 -0
- data/lib/job_workflow/argument_def.rb +16 -0
- data/lib/job_workflow/arguments.rb +40 -0
- data/lib/job_workflow/auto_scaling/adapter/aws_adapter.rb +66 -0
- data/lib/job_workflow/auto_scaling/adapter.rb +31 -0
- data/lib/job_workflow/auto_scaling/configuration.rb +85 -0
- data/lib/job_workflow/auto_scaling/executor.rb +43 -0
- data/lib/job_workflow/auto_scaling.rb +69 -0
- data/lib/job_workflow/cache_store_adapters.rb +46 -0
- data/lib/job_workflow/context.rb +352 -0
- data/lib/job_workflow/dry_run_config.rb +31 -0
- data/lib/job_workflow/dsl.rb +236 -0
- data/lib/job_workflow/error_hook.rb +24 -0
- data/lib/job_workflow/hook.rb +24 -0
- data/lib/job_workflow/hook_registry.rb +66 -0
- data/lib/job_workflow/instrumentation/log_subscriber.rb +194 -0
- data/lib/job_workflow/instrumentation/opentelemetry_subscriber.rb +221 -0
- data/lib/job_workflow/instrumentation.rb +257 -0
- data/lib/job_workflow/job_status.rb +92 -0
- data/lib/job_workflow/logger.rb +86 -0
- data/lib/job_workflow/namespace.rb +36 -0
- data/lib/job_workflow/output.rb +81 -0
- data/lib/job_workflow/output_def.rb +14 -0
- data/lib/job_workflow/queue.rb +74 -0
- data/lib/job_workflow/queue_adapter.rb +38 -0
- data/lib/job_workflow/queue_adapters/abstract.rb +87 -0
- data/lib/job_workflow/queue_adapters/null_adapter.rb +127 -0
- data/lib/job_workflow/queue_adapters/solid_queue_adapter.rb +224 -0
- data/lib/job_workflow/runner.rb +173 -0
- data/lib/job_workflow/schedule.rb +46 -0
- data/lib/job_workflow/semaphore.rb +71 -0
- data/lib/job_workflow/task.rb +83 -0
- data/lib/job_workflow/task_callable.rb +43 -0
- data/lib/job_workflow/task_context.rb +70 -0
- data/lib/job_workflow/task_dependency_wait.rb +66 -0
- data/lib/job_workflow/task_enqueue.rb +50 -0
- data/lib/job_workflow/task_graph.rb +43 -0
- data/lib/job_workflow/task_job_status.rb +70 -0
- data/lib/job_workflow/task_output.rb +51 -0
- data/lib/job_workflow/task_retry.rb +64 -0
- data/lib/job_workflow/task_throttle.rb +46 -0
- data/lib/job_workflow/version.rb +5 -0
- data/lib/job_workflow/workflow.rb +87 -0
- data/lib/job_workflow/workflow_status.rb +112 -0
- data/lib/job_workflow.rb +59 -0
- data/rbs_collection.lock.yaml +172 -0
- data/rbs_collection.yaml +14 -0
- data/sig/generated/job-workflow.rbs +2 -0
- data/sig/generated/job_workflow/argument_def.rbs +14 -0
- data/sig/generated/job_workflow/arguments.rbs +26 -0
- data/sig/generated/job_workflow/auto_scaling/adapter/aws_adapter.rbs +32 -0
- data/sig/generated/job_workflow/auto_scaling/adapter.rbs +22 -0
- data/sig/generated/job_workflow/auto_scaling/configuration.rbs +50 -0
- data/sig/generated/job_workflow/auto_scaling/executor.rbs +29 -0
- data/sig/generated/job_workflow/auto_scaling.rbs +47 -0
- data/sig/generated/job_workflow/cache_store_adapters.rbs +28 -0
- data/sig/generated/job_workflow/context.rbs +155 -0
- data/sig/generated/job_workflow/dry_run_config.rbs +16 -0
- data/sig/generated/job_workflow/dsl.rbs +117 -0
- data/sig/generated/job_workflow/error_hook.rbs +18 -0
- data/sig/generated/job_workflow/hook.rbs +18 -0
- data/sig/generated/job_workflow/hook_registry.rbs +47 -0
- data/sig/generated/job_workflow/instrumentation/log_subscriber.rbs +102 -0
- data/sig/generated/job_workflow/instrumentation/opentelemetry_subscriber.rbs +113 -0
- data/sig/generated/job_workflow/instrumentation.rbs +138 -0
- data/sig/generated/job_workflow/job_status.rbs +46 -0
- data/sig/generated/job_workflow/logger.rbs +56 -0
- data/sig/generated/job_workflow/namespace.rbs +24 -0
- data/sig/generated/job_workflow/output.rbs +39 -0
- data/sig/generated/job_workflow/output_def.rbs +12 -0
- data/sig/generated/job_workflow/queue.rbs +49 -0
- data/sig/generated/job_workflow/queue_adapter.rbs +18 -0
- data/sig/generated/job_workflow/queue_adapters/abstract.rbs +56 -0
- data/sig/generated/job_workflow/queue_adapters/null_adapter.rbs +73 -0
- data/sig/generated/job_workflow/queue_adapters/solid_queue_adapter.rbs +111 -0
- data/sig/generated/job_workflow/runner.rbs +66 -0
- data/sig/generated/job_workflow/schedule.rbs +34 -0
- data/sig/generated/job_workflow/semaphore.rbs +37 -0
- data/sig/generated/job_workflow/task.rbs +60 -0
- data/sig/generated/job_workflow/task_callable.rbs +30 -0
- data/sig/generated/job_workflow/task_context.rbs +52 -0
- data/sig/generated/job_workflow/task_dependency_wait.rbs +42 -0
- data/sig/generated/job_workflow/task_enqueue.rbs +27 -0
- data/sig/generated/job_workflow/task_graph.rbs +27 -0
- data/sig/generated/job_workflow/task_job_status.rbs +42 -0
- data/sig/generated/job_workflow/task_output.rbs +29 -0
- data/sig/generated/job_workflow/task_retry.rbs +30 -0
- data/sig/generated/job_workflow/task_throttle.rbs +20 -0
- data/sig/generated/job_workflow/version.rbs +5 -0
- data/sig/generated/job_workflow/workflow.rbs +48 -0
- data/sig/generated/job_workflow/workflow_status.rbs +55 -0
- data/sig/generated/job_workflow.rbs +8 -0
- data/sig-private/activejob.rbs +35 -0
- data/sig-private/activesupport.rbs +23 -0
- data/sig-private/aws.rbs +32 -0
- data/sig-private/opentelemetry.rbs +40 -0
- data/sig-private/solid_queue.rbs +108 -0
- data/tmp/.keep +0 -0
- metadata +190 -0
@@ -0,0 +1,386 @@
# Dependency Wait

JobWorkflow provides a `dependency_wait` option for tasks to efficiently wait for their dependencies without occupying worker threads. This feature is essential for workflows where map tasks spawn many parallel sub-jobs.

## The Problem

Consider a workflow where Task B depends on Task A, and Task A is a map task that spawns many parallel sub-jobs:

```ruby
class ExampleJob < ApplicationJob
  include JobWorkflow::DSL

  argument :items, "Array[Integer]"

  task :process_items,
       each: ->(ctx) { ctx.arguments.items },
       enqueue: { concurrency: 5 },
       output: { result: "Integer" } do |ctx|
    # This creates many sub-jobs
    { result: ctx.each_value * 2 }
  end

  task :aggregate,
       depends_on: [:process_items] do |ctx|
    # This task needs to wait for all sub-jobs to complete
    ctx.output[:process_items].sum { |h| h[:result] }
  end
end
```

Without `dependency_wait`, the `:aggregate` task would continuously poll to check whether the `:process_items` sub-jobs are complete, occupying a worker thread the entire time. If 10 such workflows are running on only 10 workers, every worker could be blocked waiting, leaving none free to run the sub-jobs they are waiting on.

## The Solution: `dependency_wait`

The `dependency_wait` option enables efficient waiting by:

1. **Checking if dependencies are complete** - If not, instead of polling in a loop...
2. **Creating a ScheduledExecution** - Rescheduling the job for later
3. **Releasing the worker** - Freeing the thread to process other jobs
4. **Automatic retry** - The job will be picked up again after the reschedule delay

### Basic Usage

**Important:** Boolean values for `dependency_wait` (e.g. `dependency_wait: true` or `dependency_wait: false`) are **not supported**. Use an Integer (shorthand for `poll_timeout`) or a Hash for explicit configuration.

- Polling-only (default): omit `dependency_wait` or pass an empty Hash; this uses `poll_timeout = 0` (polling-only, no reschedule)

```ruby
# Polling-only (worker will be occupied while waiting)
task :aggregate,
     depends_on: [:process_items],
     dependency_wait: {} do |ctx|
  ctx.output[:process_items].sum { |h| h[:result] }
end
```

To enable non-blocking rescheduling, set a positive `poll_timeout` (either as an Integer or inside a Hash):

```ruby
# Example: poll up to 30s in-process, then reschedule for later execution
task :aggregate,
     depends_on: [:process_items],
     dependency_wait: { poll_timeout: 30, poll_interval: 2, reschedule_delay: 5 } do |ctx|
  ctx.output[:process_items].sum { |h| h[:result] }
end

# Or use the integer shorthand to set poll_timeout directly:

task :aggregate,
     depends_on: [:process_items],
     dependency_wait: 30 do |ctx|
  ctx.output[:process_items].sum { |h| h[:result] }
end
```

### Configuration Options

```ruby
# Enable rescheduling after 30s of polling
task :aggregate,
     depends_on: [:process_items],
     dependency_wait: {
       poll_timeout: 30,      # Max seconds to poll before rescheduling
       poll_interval: 2,      # Seconds between polls during initial wait
       reschedule_delay: 5    # Seconds to wait before job is re-executed
     } do |ctx|
  ctx.output[:process_items]
end

# Shorthand: integer value sets poll_timeout (rescheduling enabled)
task :aggregate,
     depends_on: [:process_items],
     dependency_wait: 30 do |ctx|
  ctx.output[:process_items]
end
```

#### Option Details

| Option | Default | Description |
|--------|---------|-------------|
| `poll_timeout` | 0 (polling-only) | If `<= 0`, the runner will poll indefinitely and **will not** reschedule; set to a positive integer (seconds) to enable rescheduling after that many seconds. |
| `poll_interval` | 5 | Seconds between dependency checks during the in-process polling phase. Lower values detect completions sooner but increase DB load. |
| `reschedule_delay` | 5 | Seconds until the rescheduled job becomes executable once rescheduled. Should be tuned based on expected completion time of dependent work. |
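
The accepted forms of `dependency_wait` (omitted, Integer shorthand, Hash) and the defaults in the table can be pictured as a small normalization step. The sketch below is illustrative only: `normalize_dependency_wait` and `reschedules?` are hypothetical names, and raising `ArgumentError` for Booleans is an assumption about how the unsupported case might be rejected, not a description of JobWorkflow's internals.

```ruby
# Hypothetical helper, not part of JobWorkflow's API.
# Defaults are taken from the option table above.
DEFAULT_DEPENDENCY_WAIT = {
  poll_timeout: 0,      # 0 means polling-only, never reschedule
  poll_interval: 5,
  reschedule_delay: 5
}.freeze

def normalize_dependency_wait(value)
  case value
  when nil
    DEFAULT_DEPENDENCY_WAIT.dup
  when true, false
    # Assumption: Booleans are rejected loudly rather than silently ignored.
    raise ArgumentError, "dependency_wait does not accept Booleans; use an Integer or a Hash"
  when Integer
    # Integer shorthand sets poll_timeout only.
    DEFAULT_DEPENDENCY_WAIT.merge(poll_timeout: value)
  when Hash
    # An empty Hash falls back to the defaults (polling-only).
    DEFAULT_DEPENDENCY_WAIT.merge(value)
  else
    raise ArgumentError, "unsupported dependency_wait value: #{value.inspect}"
  end
end

# Rescheduling is enabled only for a positive poll_timeout.
def reschedules?(options)
  options[:poll_timeout].positive?
end
```

Under these assumptions, `normalize_dependency_wait(30)` and `normalize_dependency_wait({ poll_timeout: 30 })` produce the same options, while `{}` and an omitted option both stay polling-only.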

## How It Works

### Execution Flow

```
┌─────────────────────────────────────────────────────────────────────┐
│                  Task with dependency_wait starts                   │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                Check: Are all dependencies complete?                │
└─────────────────────────────────────────────────────────────────────┘
              │                                    │
             Yes                                   No
              │                                    │
              ▼                                    ▼
┌───────────────────────────┐    ┌────────────────────────────────────┐
│    Execute task block     │    │   Poll for poll_timeout seconds    │
└───────────────────────────┘    └────────────────────────────────────┘
                                                   │
                                       ┌───────────┴───────────┐
                                   Complete              Still waiting
                                       │                       │
                                       ▼                       ▼
                       ┌───────────────────────┐    ┌─────────────────────┐
                       │  Execute task block   │    │   Reschedule job    │
                       └───────────────────────┘    │  (release worker)   │
                                                    └─────────────────────┘
                                                               │
                                                               ▼
                                                    ┌─────────────────────┐
                                                    │   Job re-executes   │
                                                    │  after reschedule_  │
                                                    │    delay seconds    │
                                                    └─────────────────────┘
                                                               │
                                                               ▼
                                                       (Repeat from top)
```
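
The flow above can also be read as a small decision loop. The following is a minimal sketch under stated assumptions, not JobWorkflow's implementation: `wait_for_dependencies` is a hypothetical name, the block stands in for the "are all dependencies complete?" check, and the injectable `clock` and `sleeper` exist only so the loop can be exercised without real waiting.

```ruby
# Hypothetical sketch of the poll-then-reschedule decision; names are illustrative.
# Returns :ready when the block reports completion, or [:reschedule, delay]
# once a positive poll_timeout has elapsed.
def wait_for_dependencies(poll_timeout:, poll_interval:, reschedule_delay:,
                          clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                          sleeper: ->(seconds) { sleep(seconds) })
  started_at = clock.call

  loop do
    # The block answers "are all dependencies complete?"
    return :ready if yield

    elapsed = clock.call - started_at
    # poll_timeout <= 0 means polling-only: keep looping, never reschedule.
    return [:reschedule, reschedule_delay] if poll_timeout.positive? && elapsed >= poll_timeout

    sleeper.call(poll_interval)
  end
end
```

A caller would run the task block on `:ready` and hand the delay to the queue adapter on `[:reschedule, delay]`. With `poll_timeout: 0` the method only ever returns `:ready`, which matches the polling-only default.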

### SolidQueue Integration

The `dependency_wait` feature builds on SolidQueue's internal mechanisms using the following control flow:

1. **ScheduledExecution Creation** - When the polling timeout is exceeded, `reschedule_job` creates a scheduled job entry with `scheduled_at` set to current time + `reschedule_delay`
2. **ClaimedExecution Cleanup** - The current claimed execution is deleted to free the worker thread
3. **Control Flow via `throw/catch`** - `throw :rescheduled` exits the job execution, bypassing normal completion markers
4. **ClaimedExecutionPatch** - Patches SolidQueue's `finished` method to handle rescheduled jobs gracefully: if the claimed execution record no longer exists, it returns early without marking the job as finished
5. **Dispatcher Pickup** - SolidQueue's dispatcher picks up the scheduled job when it becomes due and re-executes it

**Important**: The `throw/catch` mechanism is safe because:
- `throw` is not an exception, so it won't be caught by `rescue Exception`
- It jumps directly to the corresponding `catch` block in `Runner#run`
- SolidQueue's `ClaimedExecution#perform` completes normally without raising errors
- Neither `finished_at` nor `failed_at` is ever set for the job, allowing the dispatcher to re-execute it
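
The first two points can be checked in plain Ruby, independent of JobWorkflow or SolidQueue. In this minimal sketch, `perform_with_reschedule` is a hypothetical stand-in for a job body; it shows that `throw` passes straight through `rescue Exception` while `ensure` blocks still run.

```ruby
# Plain-Ruby demonstration; perform_with_reschedule is a made-up stand-in for a job body.
def perform_with_reschedule(log)
  catch(:rescheduled) do
    begin
      log << :task_started
      throw :rescheduled, :rescheduled # non-local jump to the enclosing catch
    rescue Exception => e              # deliberately broad to prove the point
      log << [:rescued, e.class]       # never reached: throw is not an exception
    ensure
      log << :ensure_ran               # cleanup still runs while unwinding
    end
    :completed                         # skipped because of the throw
  end
end

log = []
outcome = perform_with_reschedule(log)
puts outcome.inspect
puts log.inspect
```

The method returns the value passed to `throw`, and the log contains only `:task_started` and `:ensure_ran`, so error-handling code wrapped around the job body is not triggered by a reschedule.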

## Real-World Example

### ETL Pipeline with Parallel Processing

```ruby
class DataPipelineJob < ApplicationJob
  include JobWorkflow::DSL

  argument :date, "String"

  # Extract data from multiple sources in parallel
  task :extract_data,
       each: ->(ctx) { %w[users orders products inventory] },
       enqueue: { concurrency: 4 },
       output: { source: "String", count: "Integer" } do |ctx|
    source = ctx.each_value
    data = DataSource.fetch(source, date: ctx.arguments.date)
    { source: source, count: data.size }
  end

  # Transform: wait for all extracts without blocking workers
  task :transform_data,
       depends_on: [:extract_data],
       dependency_wait: {
         poll_timeout: 30,
         reschedule_delay: 10
       },
       output: { transformed_count: "Integer" } do |ctx|
    extracted = ctx.output[:extract_data]
    # extracted is an array of outputs from each parallel sub-job
    transformed = Transformer.process(extracted)
    { transformed_count: transformed.size }
  end

  # Load into destination
  task :load_data,
       depends_on: [:transform_data] do |ctx|
    count = ctx.output[:transform_data].first[:transformed_count]
    DataWarehouse.load(count)
  end
end
```

### API Aggregation with Rate Limiting

```ruby
class APIAggregatorJob < ApplicationJob
  include JobWorkflow::DSL

  argument :user_ids, "Array[Integer]"

  # Fetch user data with rate limiting
  task :fetch_users,
       each: ->(ctx) { ctx.arguments.user_ids },
       enqueue: { concurrency: 10 },
       throttle: { key: "external_api", limit: 5 },
       output: { user_id: "Integer", data: "Hash" } do |ctx|
    user_id = ctx.each_value
    { user_id: user_id, data: ExternalAPI.get_user(user_id) }
  end

  # Generate report: efficiently wait for all API calls
  task :generate_report,
       depends_on: [:fetch_users],
       dependency_wait: {
         poll_timeout: 60,       # Long poll for slow API
         reschedule_delay: 15    # Generous reschedule delay
       } do |ctx|
    users = ctx.output[:fetch_users]
    ReportGenerator.create(users)
  end
end
```

## Best Practices

### 1. Tune `poll_timeout` Based on Expected Wait Time

```ruby
# If you want rescheduling: choose a positive poll_timeout
# For quick tasks (< 30s expected)
dependency_wait: { poll_timeout: 10, reschedule_delay: 5 }

# For medium tasks (30s - 2min expected)
dependency_wait: { poll_timeout: 30, reschedule_delay: 15 }

# For long tasks (> 2min expected)
dependency_wait: { poll_timeout: 60, reschedule_delay: 30 }
```

**Note**: Boolean shorthand for `dependency_wait` is not supported. To enable rescheduling, use a positive `poll_timeout` (via the Integer shorthand or as a Hash option); otherwise omit `dependency_wait` or pass an empty Hash to get polling-only behavior (`poll_timeout = 0`).
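
When comparing settings, a rough back-of-the-envelope model can help. The sketch below assumes a simplified cycle in which a waiting job holds a worker for `poll_timeout` seconds, then sits in the schedule for `reschedule_delay` seconds, and it ignores dispatcher latency and queueing time. `wait_profile` is a hypothetical helper, not something JobWorkflow provides.

```ruby
# Rough model only: one cycle = poll_timeout seconds on a worker
# followed by reschedule_delay seconds off it.
def wait_profile(expected_wait:, poll_timeout:, reschedule_delay:)
  cycle = poll_timeout + reschedule_delay

  # A reschedule happens at the end of every polling phase that finishes
  # before the dependencies do.
  reschedules =
    if expected_wait < poll_timeout
      0
    else
      ((expected_wait - poll_timeout) / cycle) + 1
    end

  {
    occupancy: (poll_timeout.to_f / cycle).round(3), # share of the wait spent holding a worker
    reschedules: reschedules                         # approximate number of reschedules
  }
end

# Dependencies expected to take about five minutes:
p wait_profile(expected_wait: 300, poll_timeout: 10, reschedule_delay: 5)
p wait_profile(expected_wait: 300, poll_timeout: 60, reschedule_delay: 30)
```

Under this model both settings keep a worker busy for about two thirds of the wait, but the shorter cycle reschedules roughly 20 times against roughly 3 for the longer one. A larger `reschedule_delay` relative to `poll_timeout` lowers worker occupancy, while a longer cycle lowers the number of reschedule writes.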

### 2. Consider Worker Pool Size

If you have many workers, a longer `poll_timeout` is acceptable:

```ruby
# Few workers (< 10): Release quickly
dependency_wait: { poll_timeout: 5, reschedule_delay: 3 }

# Many workers (> 50): Can afford to poll longer
dependency_wait: { poll_timeout: 60, reschedule_delay: 10 }
```

### 3. Use with `enqueue` for Parallel Sub-jobs

`dependency_wait` is most beneficial when combined with enqueued map tasks:

```ruby
# ✅ Good: dependency_wait with parallel sub-jobs
task :process,
     each: ->(ctx) { ctx.arguments.items },
     enqueue: { concurrency: 10 } do |ctx|
  heavy_process(ctx.each_value)
end

task :aggregate,
     depends_on: [:process],
     dependency_wait: { poll_timeout: 30, reschedule_delay: 10 } do |ctx|
  # Workers won't be blocked waiting for sub-jobs
end

# ⚠️ Less beneficial: dependency_wait without parallel execution
task :process do |ctx|
  # Single synchronous task
end

task :next_step,
     depends_on: [:process],
     dependency_wait: {} do |ctx|
  # dependency_wait adds overhead here since process is synchronous
end
```

### 4. Monitor Reschedule Behavior

Use instrumentation to track reschedule events:

```ruby
# Subscribe to instrumentation events
ActiveSupport::Notifications.subscribe("job_rescheduled.job_workflow") do |_name, _start, _finish, _id, payload|
  Rails.logger.info(
    "Job rescheduled",
    task: payload[:task_name],
    poll_count: payload[:poll_count],
    delay: payload[:reschedule_delay]
  )
end
```
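
Beyond logging, the same event can feed a simple counter that flags tasks which reschedule suspiciously often. This is a minimal sketch: `RescheduleTracker` is a hypothetical class, and it relies only on the `task_name` payload key shown above. It is written as a callable object with the same five-argument shape as the subscriber block, so it can be exercised without Rails.

```ruby
# Hypothetical tracker; not shipped with JobWorkflow.
class RescheduleTracker
  def initialize(max_reschedules:)
    @max_reschedules = max_reschedules
    @counts = Hash.new(0)
  end

  # Same argument shape as the subscriber block above.
  def call(_name, _started, _finished, _id, payload)
    @counts[payload[:task_name]] += 1
  end

  def count_for(task_name)
    @counts[task_name]
  end

  # Tasks that have been rescheduled more often than the configured limit.
  def suspicious_tasks
    @counts.select { |_task, count| count > @max_reschedules }.keys
  end
end

tracker = RescheduleTracker.new(max_reschedules: 2)

# In a Rails process the tracker could be registered as a callable subscriber:
#   ActiveSupport::Notifications.subscribe("job_rescheduled.job_workflow", tracker)
# Here the events are simulated by calling it directly.
3.times { tracker.call("job_rescheduled.job_workflow", nil, nil, "abc", { task_name: :aggregate }) }
tracker.call("job_rescheduled.job_workflow", nil, nil, "abc", { task_name: :generate_report })

p tracker.suspicious_tasks
```

The counts live in process memory, so a real deployment would more likely export them to a metrics backend; the threshold itself is an arbitrary illustration.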

## Troubleshooting

### Job Keeps Rescheduling Forever

**Symptom**: The task keeps getting rescheduled without ever completing.

**Cause**: Dependencies never complete (failed or stuck sub-jobs).

**Solution**:
1. Check sub-job status using `JobWorkflow::JobStatus`
2. Look for failed executions in `solid_queue_failed_executions`
3. Add error handling or a timeout to the tasks it depends on

```ruby
# Check job status
status = JobWorkflow::JobStatus.new(MyJob, job_id)
status.fetch!
puts status.tasks_status  # See which tasks are incomplete
```

### Too Many Reschedules Causing Overhead

**Symptom**: High database load from frequent rescheduling.

**Solution**: Increase `poll_timeout` and `reschedule_delay`:

```ruby
dependency_wait: {
  poll_timeout: 60,      # Poll longer before rescheduling
  reschedule_delay: 30   # Wait longer between reschedules
}
```

### Worker Not Released

**Symptom**: Workers are still blocked despite using `dependency_wait`.

**Cause**: The `ClaimedExecutionPatch` was not applied to SolidQueue.

**Solution**: JobWorkflow automatically installs the patch when the adapter is initialized. Ensure SolidQueue is properly configured and the adapter initialization runs during boot.

## Technical Details

### How `throw/catch` Makes This Safe

JobWorkflow uses Ruby's `throw/catch` mechanism (not exceptions) to handle job rescheduling:

**Why `throw/catch` instead of exceptions?**
- `throw` is a non-local jump mechanism, not exception handling
- When `throw :rescheduled` is called, it doesn't trigger `rescue Exception` blocks
- SolidQueue's `execute` method treats the job as successful (no exception)
- The `catch(:rescheduled)` in `Runner#run` cleanly exits the workflow
- The job completes normally, but `ClaimedExecutionPatch` prevents `finished_at` from being set

**Flow sequence:**
1. `reschedule_job` updates `scheduled_at` and deletes the claimed execution
2. `throw :rescheduled` is called
3. Execution jumps to `catch(:rescheduled)` in `Runner#run`
4. `run` method completes normally
5. `execute` returns `Result.new(true, nil)` (success)
6. SolidQueue calls `finished`, which checks `self.class.exists?(id)`
7. Since the claimed execution was already deleted, `finished` returns early
8. `job.finished!` is never called → `finished_at` remains NULL
9. SolidQueue's dispatcher later executes the scheduled job
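
The sequence can be traced with in-memory stand-ins. Everything in the sketch below is hypothetical: `FakeJob`, `FakeClaimedExecution`, and this `reschedule_job` only imitate the behaviour described in steps 1 to 8 and are not SolidQueue's models or JobWorkflow's adapter code.

```ruby
# In-memory stand-ins for illustration only.
FakeJob = Struct.new(:id, :finished_at, :scheduled_at)

class FakeClaimedExecution
  @records = {}

  class << self
    attr_reader :records

    def exists?(id)
      records.key?(id)
    end
  end

  attr_reader :id, :job

  def initialize(id, job)
    @id = id
    @job = job
    self.class.records[id] = self
  end

  def destroy
    self.class.records.delete(id)
  end

  # Steps 2-6: run the job body inside catch, then call finished as usual.
  def perform(now:)
    catch(:rescheduled) { yield self }
    finished(now)
  end

  # Steps 6-8: return early if the claimed execution was already deleted.
  def finished(now)
    return unless self.class.exists?(id)

    job.finished_at = now
    destroy
  end
end

# Step 1 and 2: record the new scheduled_at, drop the claim, jump out.
def reschedule_job(execution, now:, delay:)
  execution.job.scheduled_at = now + delay
  execution.destroy
  throw :rescheduled
end

job = FakeJob.new(1, nil, nil)
execution = FakeClaimedExecution.new(10, job)
execution.perform(now: 100) { |claimed| reschedule_job(claimed, now: 100, delay: 5) }
p job.finished_at  # still nil, so the job remains eligible to run again
p job.scheduled_at
```

A job body that returns normally takes the other branch: the claim still exists when `finished` runs, so `finished_at` is set.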

### Database State After Reschedule

After a successful reschedule:

| Table | State |
|-------|-------|
| `solid_queue_jobs` | `finished_at` remains NULL |
| `solid_queue_claimed_executions` | Record deleted |
| `solid_queue_scheduled_executions` | New record with `scheduled_at` |

The SolidQueue dispatcher will move the scheduled execution to ready executions when `scheduled_at` is reached.
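
As a rough mental model of that last step, the promotion from scheduled to ready can be pictured with plain arrays. This is only an illustration: `FakeScheduledExecution` and `dispatch_due` are made-up names, and the real dispatcher works on database tables in batches rather than on in-memory lists.

```ruby
# Illustrative model only; not SolidQueue's dispatcher.
FakeScheduledExecution = Struct.new(:job_id, :scheduled_at)

# Moves every execution that is due at `now` into the ready list
# (earliest first) and returns the executions that are still pending.
def dispatch_due(scheduled, ready, now:)
  due, pending = scheduled.partition { |execution| execution.scheduled_at <= now }
  ready.concat(due.sort_by(&:scheduled_at).map(&:job_id))
  pending
end

scheduled = [
  FakeScheduledExecution.new(1, 105), # rescheduled at t=100 with a 5s delay
  FakeScheduledExecution.new(2, 130)
]
ready = []

scheduled = dispatch_due(scheduled, ready, now: 110)
p ready
p scheduled.map(&:job_id)
```

At `now: 110` only the first job has reached its `scheduled_at`, so it becomes ready while the second stays scheduled.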