openclaw-scheduler 0.2.0

Files changed (70)
  1. package/AGENTS.md +302 -0
  2. package/BEST-PRACTICES.md +506 -0
  3. package/CHANGELOG.md +82 -0
  4. package/CODE_OF_CONDUCT.md +22 -0
  5. package/CONTEXT.md +26 -0
  6. package/CONTRIBUTING.md +73 -0
  7. package/IMPLEMENTATION_SPEC.md +170 -0
  8. package/INSTALL-ADDITIONAL-HOST.md +333 -0
  9. package/INSTALL-LINUX.md +419 -0
  10. package/INSTALL-WINDOWS.md +305 -0
  11. package/INSTALL.md +364 -0
  12. package/JOB-QUICK-REF.md +222 -0
  13. package/LICENSE +21 -0
  14. package/QUICK-START.md +256 -0
  15. package/README.md +2170 -0
  16. package/SECURITY.md +34 -0
  17. package/UNINSTALL.md +129 -0
  18. package/UPGRADING.md +436 -0
  19. package/agents.js +67 -0
  20. package/approval.js +107 -0
  21. package/backup.js +390 -0
  22. package/bin/openclaw-scheduler.js +138 -0
  23. package/cli.js +1083 -0
  24. package/db.js +122 -0
  25. package/dispatch/529-recovery.mjs +204 -0
  26. package/dispatch/README.md +372 -0
  27. package/dispatch/config.example.json +24 -0
  28. package/dispatch/deliver-watcher.sh +57 -0
  29. package/dispatch/hooks.mjs +171 -0
  30. package/dispatch/index.mjs +1836 -0
  31. package/dispatch/watcher.mjs +1396 -0
  32. package/dispatch-queue.js +112 -0
  33. package/dispatcher-approvals.js +96 -0
  34. package/dispatcher-delivery.js +43 -0
  35. package/dispatcher-maintenance.js +242 -0
  36. package/dispatcher-shell.js +29 -0
  37. package/dispatcher-strategies.js +1280 -0
  38. package/dispatcher-utils.js +81 -0
  39. package/dispatcher.js +855 -0
  40. package/docs/adr-schedule-ownership.md +73 -0
  41. package/docs/gateway-contract.md +904 -0
  42. package/docs/plans/2026-03-09-fix-typescript-types.md +91 -0
  43. package/docs/plans/2026-03-09-test-coverage-gaps.md +83 -0
  44. package/docs/plans/2026-03-10-dispatcher-refactor.md +801 -0
  45. package/docs/trust-architecture.md +266 -0
  46. package/gateway.js +473 -0
  47. package/idempotency.js +119 -0
  48. package/index.d.ts +864 -0
  49. package/index.js +17 -0
  50. package/jobs.js +1224 -0
  51. package/messages.js +357 -0
  52. package/migrate-consolidate.js +694 -0
  53. package/migrate.js +125 -0
  54. package/package.json +130 -0
  55. package/paths.js +79 -0
  56. package/prompt-context.js +94 -0
  57. package/retrieval.js +176 -0
  58. package/runs.js +270 -0
  59. package/scheduler-schema.js +101 -0
  60. package/schema.sql +480 -0
  61. package/scripts/dispatch-cli-utils.mjs +65 -0
  62. package/scripts/inbox-consumer.mjs +288 -0
  63. package/scripts/stuck-detector.sh +18 -0
  64. package/scripts/stuck-run-detector.mjs +333 -0
  65. package/scripts/telegram-webhook-check.mjs +238 -0
  66. package/setup.mjs +724 -0
  67. package/shell-result.js +214 -0
  68. package/task-tracker.js +300 -0
  69. package/team-adapter.js +335 -0
  70. package/v02-runtime.js +599 -0
@@ -0,0 +1,801 @@
1
+ **Status: Completed**
2
+
3
+ # Refactor dispatchJob into Strategy Pattern
4
+
5
+ > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
6
+
7
+ **Goal:** Decompose the 549-line `dispatchJob` closure in dispatcher.js into explicit context object + strategy functions + declarative finalization, without changing any observable behavior.
8
+
9
+ **Architecture:** Extract the function into three phases -- `prepareDispatch` (guards + run creation), per-target strategy functions (watchdog/main/shell/agent), and `finalizeDispatch` (uniform post-execution ceremony). All phases communicate via a plain `DispatchContext` object and a `DispatchResult` descriptor, eliminating shared closure state. The new functions live in a new `dispatcher-strategies.js` file; the orchestrator stays in `dispatcher.js`.
10
+
11
+ **Tech Stack:** Node.js ESM, better-sqlite3, existing test harness + dispatcher integration tests.
12
+
13
+ ---
14
+
15
+ ### Task 1: Create dispatcher-strategies.js with DispatchResult shape and finalizeDispatch
16
+
17
+ This task creates the new module with the post-execution ceremony extracted from all five branches. The ceremony handles: finishRun, idempotency key management, agent status cleanup, delivery, retry logic, updateJobAfterRun, dispatch completion, triggered children, and dequeue.
18
+
19
+ **Files:**
20
+ - Create: `dispatcher-strategies.js`
21
+ - Modify: `package.json` (add to `files` array)
22
+
23
+ **Step 1:** Create `dispatcher-strategies.js` with the `finalizeDispatch` function and the `makeDefaultResult` helper. `finalizeDispatch` is the single place that replaces the post-execution code duplicated across branches.
24
+
25
+ ```javascript
26
+ // dispatcher-strategies.js
27
+ // Strategy pattern for dispatchJob: each execution target returns a DispatchResult,
28
+ // and finalizeDispatch processes it uniformly.
29
+
30
+ /**
31
+ * DispatchResult shape (returned by every strategy):
32
+ * {
33
+ * status: 'ok' | 'error' | 'skipped',
34
+ * summary: string,
35
+ * content: string, // for delivery + trigger condition eval
36
+ * errorMessage: string | null,
37
+ * runFinishFields: object, // extra fields for finishRun (shell_exit_code, etc.)
38
+ * deliveryOverride: string | null, // override delivery content (null = use content)
39
+ * skipDelivery: boolean, // suppress delivery entirely
40
+ * skipJobUpdate: boolean, // strategy handled job state itself
41
+ * skipChildren: boolean, // don't fire triggered children
42
+ * skipDequeue: boolean, // don't drain overlap queue
43
+ * idemAction: 'keep' | 'release', // what to do with idempotency key
44
+ * earlyReturn: boolean, // finalize should skip everything (strategy fully handled it)
45
+ * }
46
+ */
47
+
48
+ export function makeDefaultResult() {
49
+ return {
50
+ status: 'ok',
51
+ summary: '',
52
+ content: '',
53
+ errorMessage: null,
54
+ runFinishFields: {},
55
+ deliveryOverride: null,
56
+ skipDelivery: false,
57
+ skipJobUpdate: false,
58
+ skipChildren: false,
59
+ skipDequeue: false,
60
+ idemAction: 'keep',
61
+ earlyReturn: false,
62
+ };
63
+ }
64
+
65
+ /**
66
+ * Uniform post-execution ceremony. Processes the DispatchResult from any strategy.
67
+ *
68
+ * @param {object} job - The job record
69
+ * @param {object} ctx - DispatchContext from prepareDispatch
70
+ * @param {object} result - DispatchResult from the strategy
71
+ * @param {object} deps - Injected dependencies
72
+ */
73
+ export async function finalizeDispatch(job, ctx, result, deps) {
74
+ const {
75
+ finishRun, updateIdempotencyResultHash, releaseIdempotencyKey,
76
+ setAgentStatus, handleDelivery, shouldRetry, scheduleRetry,
77
+ getDb, updateJobAfterRun, setDispatchStatus, handleTriggeredChildren,
78
+ dequeueJob, log,
79
+ } = deps;
80
+
81
+ if (result.earlyReturn) return;
82
+
83
+ // 1. Finish the run
84
+ finishRun(ctx.run.id, result.status, {
85
+ summary: result.summary,
86
+ error_message: result.errorMessage,
87
+ ...result.runFinishFields,
88
+ });
89
+
90
+ // 2. Idempotency key management
91
+ if (ctx.idemKey) {
92
+ if (result.status === 'ok' && result.idemAction === 'keep') {
93
+ updateIdempotencyResultHash(ctx.idemKey, result.content);
94
+ } else {
95
+ releaseIdempotencyKey(ctx.idemKey);
96
+ }
97
+ }
98
+
99
+ // 3. Agent status cleanup
100
+ if (job.agent_id) setAgentStatus(job.agent_id, 'idle', null);
101
+
102
+ // 4. Delivery
103
+ if (!result.skipDelivery) {
104
+ const deliveryContent = result.deliveryOverride ?? result.content;
105
+ const shouldAnnounce = ['announce', 'announce-always'].includes(job.delivery_mode)
106
+ && deliveryContent?.trim();
107
+
108
+ if (shouldAnnounce) {
109
+ if (result.status === 'error') {
110
+ const willRetry = job.max_retries > 0 && (ctx.run.retry_count || 0) < job.max_retries;
111
+ const retryLabel = willRetry ? 'will retry' : 'no retries configured';
112
+ await handleDelivery(job, `⚠️ Job soft-failed (${retryLabel}): ${job.name}\n\n${deliveryContent}`);
113
+ } else {
114
+ await handleDelivery(job, deliveryContent);
115
+ }
116
+ }
117
+ }
118
+
119
+ // 5. Retry on error
120
+ if (result.status === 'error' && shouldRetry(job, ctx.run.id)) {
121
+ const retry = scheduleRetry(job, ctx.run.id);
122
+ if (retry.dispatch) {
123
+ log('info', `Scheduling retry ${retry.retryCount}/${job.max_retries} in ${retry.delaySec}s`, {
124
+ jobId: job.id, runId: ctx.run.id,
125
+ });
126
+ getDb().prepare('UPDATE runs SET retry_count = ? WHERE id = ?').run(retry.retryCount, ctx.run.id);
127
+ if (ctx.dispatchRecord) setDispatchStatus(ctx.dispatchRecord.id, 'done');
128
+ if (!result.skipDequeue && dequeueJob(job.id)) {
129
+ log('info', `Dequeued pending dispatch for ${job.name}`);
130
+ }
131
+ if (!result.skipChildren) {
132
+ handleTriggeredChildren(job.id, 'error', result.content, ctx.run.id, ' on soft failure');
133
+ }
134
+       log('info', `Failed: ${job.name} (retry scheduled)`, { runId: ctx.run.id });
135
+ return; // retry path handles everything
136
+ }
137
+ log('warn', `Retry skipped for ${job.name} -- dispatch backlog limit reached`, {
138
+ jobId: job.id, runId: ctx.run.id,
139
+ maxQueuedDispatches: job.max_queued_dispatches || 25,
140
+ });
141
+ }
142
+
143
+ // 6. Update job state
144
+ if (!result.skipJobUpdate) {
145
+ updateJobAfterRun(job, result.status);
146
+ }
147
+
148
+ // 7. Complete dispatch
149
+ if (ctx.dispatchRecord) {
150
+ setDispatchStatus(ctx.dispatchRecord.id, result.status === 'error' ? 'cancelled' : 'done');
151
+ }
152
+
153
+ // 8. Triggered children
154
+ if (!result.skipChildren) {
155
+ handleTriggeredChildren(job.id, result.status, result.content, ctx.run.id);
156
+ }
157
+
158
+ // 9. Dequeue overlap
159
+ if (!result.skipDequeue && dequeueJob(job.id)) {
160
+ log('info', `Dequeued pending dispatch for ${job.name}`);
161
+ }
162
+ }
163
+ ```
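
The keep/release rule in step 2 of `finalizeDispatch` can be read as a small pure function. The sketch below is illustrative only: `resolveIdemOutcome` and its return labels are hypothetical names, not part of the plan. It mirrors the condition shown above, including the detail that an `'error'` status releases the key even when `idemAction` is `'keep'`.

```javascript
// Hypothetical helper (not in the plan): mirrors step 2 of finalizeDispatch.
// Returns a label for what finalizeDispatch would do with the idempotency key.
function resolveIdemOutcome(ctx, result) {
  if (!ctx.idemKey) return 'none'; // no key was claimed for this run
  if (result.status === 'ok' && result.idemAction === 'keep') {
    return 'record-hash'; // corresponds to updateIdempotencyResultHash
  }
  return 'release'; // corresponds to releaseIdempotencyKey
}

console.log(resolveIdemOutcome({ idemKey: 'abc' }, { status: 'error', idemAction: 'keep' }));
```

A table-style test over this function is one way to pin the rule down before rewiring `dispatchJob`.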
164
+
165
+ **Step 2:** Add `dispatcher-strategies.js` to the `files` array in `package.json`, after `dispatcher-maintenance.js`.
166
+
167
+ **Step 3:** Run `npm test` -- must still pass (648 tests, no behavior changes yet since nothing imports the new file).
168
+
169
+ ### Task 2: Extract prepareDispatch from dispatchJob's guard section
170
+
171
+ This task extracts lines 186-355 of the current dispatchJob (dispatch claim, approval gate, resource pool, overlap control, idempotency, run creation) into `prepareDispatch` in `dispatcher-strategies.js`. It returns a `DispatchContext` object or `null` if a guard rejected.
172
+
173
+ **Files:**
174
+ - Modify: `dispatcher-strategies.js`
175
+
176
+ **Step 1:** Add `prepareDispatch` to `dispatcher-strategies.js`:
177
+
178
+ ```javascript
179
+ /**
180
+ * DispatchContext shape (returned by prepareDispatch):
181
+ * {
182
+ * dispatchRecord: object | null,
183
+ * idemKey: string | null,
184
+ * run: object, // the created run record
185
+ * retryCount: number,
186
+ * dispatchKind: string | null,
187
+ * isChainDispatch: boolean,
188
+ * }
189
+ */
190
+
191
+ /**
192
+ * Phase 1: Guards + run creation. Returns DispatchContext or null (guard rejected).
193
+ *
194
+ * @param {object} job
195
+ * @param {object} opts - { approvalBypass, dispatchRecord }
196
+ * @param {object} deps - Injected dependencies
197
+ * @returns {object|null}
198
+ */
199
+ export async function prepareDispatch(job, opts, deps) {
200
+ const {
201
+ claimDispatch, releaseDispatch, setDispatchStatus,
202
+ countPendingApprovalsForJob, getPendingApproval,
203
+ createApproval, createRun, getRun,
204
+ hasRunningRunForPool, hasRunningRun,
205
+ enqueueJob, getDispatchBacklogCount,
206
+ generateIdempotencyKey, generateChainIdempotencyKey,
207
+ generateRunNowIdempotencyKey, claimIdempotencyKey,
208
+ finishRun, getDb,
209
+     sqliteNow, adaptiveDeferralMs,
210
+ handleDelivery, advanceNextRun,
211
+ TICK_INTERVAL_MS,
212
+ log,
213
+ } = deps;
214
+
215
+ const approvalBypass = opts.approvalBypass === true;
216
+ let dispatchRecord = opts.dispatchRecord || null;
217
+
218
+ // Claim pending dispatch
219
+ if (dispatchRecord && dispatchRecord.status === 'pending') {
220
+ dispatchRecord = claimDispatch(dispatchRecord.id);
221
+ if (!dispatchRecord) {
222
+ log('debug', `Skipping claimed dispatch for ${job.name}`, { dispatchId: opts.dispatchRecord.id });
223
+ return null;
224
+ }
225
+ }
226
+
227
+ const completeCurrentDispatch = (status = 'done') => {
228
+ if (!dispatchRecord) return null;
229
+ return setDispatchStatus(dispatchRecord.id, status);
230
+ };
231
+
232
+ const dispatchKind = dispatchRecord?.dispatch_kind || null;
233
+ const isChainDispatch = dispatchKind === 'chain';
234
+ const dispatchBacklogDepth = getDispatchBacklogCount(job.id);
235
+
236
+ // HITL approval gate
237
+ if (job.approval_required && isChainDispatch && !approvalBypass) {
238
+ const pendingApprovalCount = countPendingApprovalsForJob(job.id);
239
+ if (pendingApprovalCount >= (job.max_pending_approvals || 10)) {
240
+ completeCurrentDispatch('cancelled');
241
+ log('warn', `Approval backlog limit reached for ${job.name}`, {
242
+ jobId: job.id,
243
+ pendingApprovals: pendingApprovalCount,
244
+ maxPendingApprovals: job.max_pending_approvals || 10,
245
+ });
246
+ return null;
247
+ }
248
+ const existing = getPendingApproval(job.id);
249
+ if (existing) {
250
+ releaseDispatch(dispatchRecord.id, sqliteNow(adaptiveDeferralMs(dispatchBacklogDepth)));
251
+ log('debug', `Skipping ${job.name} -- approval already pending`, {
252
+ approvalId: existing.id,
253
+ dispatchId: dispatchRecord?.id || null,
254
+ deferredMs: adaptiveDeferralMs(dispatchBacklogDepth),
255
+ });
256
+ return null;
257
+ }
258
+ const run = createRun(job.id, {
259
+ run_timeout_ms: job.run_timeout_ms,
260
+ status: 'awaiting_approval',
261
+ dispatch_queue_id: dispatchRecord?.id || null,
262
+ triggered_by_run: dispatchRecord?.source_run_id || null,
263
+ retry_of: dispatchRecord?.retry_of_run_id || null,
264
+ });
265
+ const approval = createApproval(job.id, run.id, dispatchRecord?.id || null);
266
+ if (dispatchRecord) setDispatchStatus(dispatchRecord.id, 'awaiting_approval');
267
+ log('info', `Approval required for ${job.name} -- awaiting operator`, { approvalId: approval.id, runId: run.id });
268
+ const msg = `\u26a0\ufe0f Job '${job.name}' requires approval.\nApprove: node cli.js jobs approve ${job.id}\nReject: node cli.js jobs reject ${job.id}`;
269
+ await handleDelivery(job, msg);
270
+ return null;
271
+ }
272
+
273
+ // Resource pool concurrency
274
+ if (job.resource_pool && hasRunningRunForPool(job.resource_pool)) {
275
+ log('info', `Skipping ${job.name} -- resource pool '${job.resource_pool}' busy`, { jobId: job.id, pool: job.resource_pool });
276
+ if (dispatchRecord) {
277
+ releaseDispatch(dispatchRecord.id, sqliteNow(TICK_INTERVAL_MS));
278
+ } else {
279
+ advanceNextRun(job);
280
+ }
281
+ return null;
282
+ }
283
+
284
+ // Overlap control
285
+ if (hasRunningRun(job.id)) {
286
+ if (job.overlap_policy === 'skip') {
287
+ log('info', `Skipping ${job.name} -- previous run still active`, { jobId: job.id });
288
+ if (dispatchRecord) {
289
+ completeCurrentDispatch('cancelled');
290
+ } else {
291
+ advanceNextRun(job);
292
+ }
293
+ return null;
294
+ }
295
+ if (job.overlap_policy === 'queue') {
296
+ const queueResult = enqueueJob(job.id);
297
+ if (!queueResult.queued) {
298
+ log('warn', `Queue limit reached for ${job.name} -- dropping overlap dispatch`, {
299
+ jobId: job.id,
300
+ queuedCount: queueResult.queued_count,
301
+ maxQueuedDispatches: job.max_queued_dispatches || 25,
302
+ });
303
+ if (dispatchRecord) {
304
+ completeCurrentDispatch('cancelled');
305
+ } else {
306
+ advanceNextRun(job);
307
+ }
308
+ return null;
309
+ }
310
+ log('info', `Queueing ${job.name} -- previous run still active`, {
311
+ jobId: job.id,
312
+ queuedCount: queueResult.queued_count,
313
+ });
314
+ if (dispatchRecord) {
315
+ completeCurrentDispatch('done');
316
+ } else {
317
+ advanceNextRun(job);
318
+ }
319
+ return null;
320
+ }
321
+ // 'allow' falls through
322
+ }
323
+
324
+ // Idempotency key generation
325
+ const scheduledTime = job.next_run_at;
326
+ let idemKey;
327
+ if (dispatchKind === 'chain') {
328
+ idemKey = generateChainIdempotencyKey(dispatchRecord.source_run_id || dispatchRecord.id, job.id);
329
+ } else if (dispatchKind === 'manual') {
330
+ idemKey = generateRunNowIdempotencyKey(job.id);
331
+ } else if (dispatchKind === 'retry') {
332
+ idemKey = generateChainIdempotencyKey(dispatchRecord.retry_of_run_id || dispatchRecord.id, job.id);
333
+ } else {
334
+ idemKey = generateIdempotencyKey(job, scheduledTime);
335
+ }
336
+
337
+ // Idempotency dedup
338
+ if (idemKey) {
339
+ const existing = getDb().prepare("SELECT * FROM idempotency_ledger WHERE key = ? AND status = 'claimed'").get(idemKey);
340
+ if (existing) {
341
+ log('info', `Idempotency skip: ${job.name} (key ${idemKey.slice(0,8)}... already claimed by run ${existing.run_id.slice(0,8)}...)`);
342
+ if (dispatchRecord) {
343
+ completeCurrentDispatch('done');
344
+ } else {
345
+ advanceNextRun(job);
346
+ }
347
+ return null;
348
+ }
349
+ }
350
+
351
+ log('info', `Dispatching: ${job.name}`, { jobId: job.id, target: job.session_target });
352
+
353
+ const retryCount = dispatchKind === 'retry' && dispatchRecord?.retry_of_run_id
354
+ ? (getRun(dispatchRecord.retry_of_run_id)?.retry_count || 0)
355
+ : 0;
356
+
357
+ const run = createRun(job.id, {
358
+ run_timeout_ms: job.run_timeout_ms,
359
+ idempotency_key: idemKey,
360
+ retry_count: retryCount,
361
+ dispatch_queue_id: dispatchRecord?.id || null,
362
+ triggered_by_run: dispatchRecord?.source_run_id || null,
363
+ retry_of: dispatchRecord?.retry_of_run_id || null,
364
+ });
365
+
366
+ // Claim idempotency key
367
+ if (idemKey) {
368
+ const expiresAt = job.delete_after_run
369
+ ? sqliteNow(24 * 60 * 60 * 1000)
370
+ : sqliteNow(7 * 24 * 60 * 60 * 1000);
371
+ const claimed = claimIdempotencyKey(idemKey, job.id, run.id, expiresAt);
372
+ if (!claimed) {
373
+ log('warn', `Idempotency race: ${job.name} key ${idemKey.slice(0,8)}... claimed by concurrent dispatch`);
374
+ finishRun(run.id, 'skipped', { summary: 'Idempotency key already claimed (race)' });
375
+ if (dispatchRecord) {
376
+ completeCurrentDispatch('done');
377
+ } else {
378
+ advanceNextRun(job);
379
+ }
380
+ return null;
381
+ }
382
+ }
383
+
384
+ return { dispatchRecord, idemKey, run, retryCount, dispatchKind, isChainDispatch };
385
+ }
386
+ ```
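
As a reading aid, the overlap-control guard in `prepareDispatch` reduces to a small decision table. The sketch below uses a hypothetical helper, `resolveOverlap`, which is not part of the plan; `queued` stands in for `enqueueJob(job.id).queued`. The `dispatchStatus` value only applies when a dispatch record exists; without one, the code above calls `advanceNextRun(job)` instead.

```javascript
// Hypothetical helper (not in the plan): summarises the overlap-control branch.
// proceed: whether prepareDispatch continues to idempotency and run creation.
// dispatchStatus: status passed to completeCurrentDispatch when a record exists.
function resolveOverlap(policy, hasRunning, queued) {
  if (!hasRunning) return { proceed: true, dispatchStatus: null };
  if (policy === 'skip') return { proceed: false, dispatchStatus: 'cancelled' };
  if (policy === 'queue') {
    // Queue limit reached -> dropped ('cancelled'); otherwise enqueued ('done').
    return { proceed: false, dispatchStatus: queued ? 'done' : 'cancelled' };
  }
  return { proceed: true, dispatchStatus: null }; // 'allow' falls through
}

console.log(resolveOverlap('queue', true, false));
```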
387
+
388
+ **Step 2:** Run `npm test` -- must pass (no callers yet).
389
+
390
+ ### Task 3: Extract the four strategy functions
391
+
392
+ This task creates `executeWatchdog`, `executeMain`, `executeShell`, and `executeAgent` in `dispatcher-strategies.js`. Each receives `(job, ctx, deps)` and returns a `DispatchResult`.
393
+
394
+ **Files:**
395
+ - Modify: `dispatcher-strategies.js`
396
+
397
+ **Step 1:** Add `executeWatchdog`:
398
+
399
+ ```javascript
400
+ export async function executeWatchdog(job, ctx, deps) {
401
+ const { runShellCommand, handleDelivery, updateJob, deleteJob, log } = deps;
402
+ const result = makeDefaultResult();
403
+ result.skipChildren = true;
404
+ result.skipDequeue = true;
405
+ result.idemAction = 'keep';
406
+
407
+ const checkCmd = job.watchdog_check_cmd;
408
+ if (!checkCmd) {
409
+ result.status = 'error';
410
+ result.errorMessage = 'Watchdog job missing watchdog_check_cmd';
411
+ result.skipJobUpdate = false;
412
+ return result;
413
+ }
414
+
415
+ const shellExec = await runShellCommand(checkCmd, Math.min(job.run_timeout_ms || 300000, 60000));
416
+ const exitCode = shellExec.exitCode;
417
+ const stdout = (shellExec.stdout || '').trim();
418
+ const stderr = (shellExec.stderr || '').trim();
419
+
420
+ let timedOut = false;
421
+ let elapsedMin = 0;
422
+ if (job.watchdog_started_at && job.watchdog_timeout_min) {
423
+ const startedAt = new Date(job.watchdog_started_at).getTime();
424
+ elapsedMin = Math.round((Date.now() - startedAt) / 60000);
425
+ if (elapsedMin >= job.watchdog_timeout_min) timedOut = true;
426
+ }
427
+
428
+ if (exitCode === 2) {
429
+ result.summary = `Watchdog check failed (transient): ${stderr || stdout}`;
430
+ result.skipDelivery = true;
431
+ log('debug', `Watchdog check transient failure: ${job.name}`, { exitCode, stderr: stderr.slice(0, 200) });
432
+
433
+ } else if (exitCode === 0 && stdout) {
434
+ const completionMsg = `\u2705 [watchdog] Task "${job.watchdog_target_label}" completed -- watchdog disarmed`;
435
+ result.summary = completionMsg;
436
+ result.content = completionMsg;
437
+ log('info', `Watchdog: target completed: ${job.watchdog_target_label}`, { jobId: job.id });
438
+
439
+ if (job.watchdog_alert_channel && job.watchdog_alert_target) {
440
+ await handleDelivery({
441
+ ...job,
442
+ delivery_mode: 'announce-always',
443
+ delivery_channel: job.watchdog_alert_channel,
444
+ delivery_to: job.watchdog_alert_target,
445
+ }, completionMsg);
446
+ }
447
+ result.skipDelivery = true;
448
+
449
+ if (job.watchdog_self_destruct) {
450
+ result.skipJobUpdate = true;
451
+ updateJob(job.id, { enabled: 0 });
452
+ deleteJob(job.id);
453
+ log('info', `Watchdog self-destructed: ${job.name}`, { jobId: job.id });
454
+ }
455
+
456
+ } else if (exitCode === 1 || timedOut) {
457
+ const reason = timedOut
458
+ ? `running for ${elapsedMin}min (threshold: ${job.watchdog_timeout_min}min)`
459
+ : `check command reported stuck`;
460
+ const alertMsg = [
461
+ `\ud83d\udea8 [watchdog] Task "${job.watchdog_target_label}" appears stuck`,
462
+ `- Dispatched: ${job.watchdog_started_at || 'unknown'}`,
463
+ `- Running for: ${elapsedMin} minutes (threshold: ${job.watchdog_timeout_min || '?'} min)`,
464
+ `- Reason: ${reason}`,
465
+ `- Check: ${checkCmd}`,
466
+ stderr ? `- Error: ${stderr.slice(0, 500)}` : null,
467
+ stdout ? `- Output: ${stdout.slice(0, 500)}` : null,
468
+ ].filter(Boolean).join('\n');
469
+ result.summary = `Watchdog alert fired: ${reason}`;
470
+ result.content = alertMsg;
471
+
472
+ log('warn', `Watchdog alert: ${job.watchdog_target_label} stuck`, {
473
+ jobId: job.id, elapsedMin, timedOut, exitCode,
474
+ });
475
+
476
+ if (job.watchdog_alert_channel && job.watchdog_alert_target) {
477
+ await handleDelivery({
478
+ ...job,
479
+ delivery_mode: 'announce-always',
480
+ delivery_channel: job.watchdog_alert_channel,
481
+ delivery_to: job.watchdog_alert_target,
482
+ }, alertMsg);
483
+ }
484
+ result.skipDelivery = true;
485
+
486
+ } else {
487
+ result.summary = `Watchdog check: target still running (${elapsedMin}min elapsed)`;
488
+ result.skipDelivery = true;
489
+ log('debug', `Watchdog: target still running: ${job.watchdog_target_label}`, {
490
+ jobId: job.id, elapsedMin,
491
+ });
492
+ }
493
+
494
+ return result;
495
+ }
496
+ ```
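
The branch order in `executeWatchdog` matters: exit code 2 wins over a timeout, and exit code 0 with empty stdout falls through to the timeout check. A minimal sketch of that ordering, assuming a hypothetical `classifyWatchdog` helper (the name and the returned labels are illustrative, not part of the plan):

```javascript
// Hypothetical helper (not in the plan): same branch order as executeWatchdog.
function classifyWatchdog(exitCode, stdout, timedOut) {
  if (exitCode === 2) return 'transient';           // check itself failed; no alert
  if (exitCode === 0 && stdout) return 'completed'; // target finished; disarm
  if (exitCode === 1 || timedOut) return 'stuck';   // alert fires
  return 'running';                                 // keep polling quietly
}

console.log(classifyWatchdog(0, '', true));
```

Keeping the classification separate from the side effects (delivery, self-destruct) would make the four outcomes testable without a shell or a database.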
497
+
498
+ **Step 2:** Add `executeMain`:
499
+
500
+ ```javascript
501
+ export async function executeMain(job, ctx, deps) {
502
+ const { sendSystemEvent, buildExecutionIntentNote } = deps;
503
+ const result = makeDefaultResult();
504
+ result.skipChildren = true;
505
+ result.skipDequeue = true;
506
+
507
+ const executionNote = buildExecutionIntentNote(job);
508
+ const modelNote = job.payload_thinking
509
+ ? `[SYSTEM NOTE -- model policy]\nPrefer reasoning depth: ${job.payload_thinking}.\n[END SYSTEM NOTE]\n\n`
510
+ : '';
511
+ await sendSystemEvent(`${executionNote ? `${executionNote}\n\n` : ''}${modelNote}${job.payload_message}`, 'now');
512
+ result.summary = 'System event dispatched';
513
+ result.content = job.payload_message;
514
+
515
+ // Main session only delivers on announce-always (not on error)
516
+ if (job.delivery_mode !== 'announce-always') {
517
+ result.skipDelivery = true;
518
+ }
519
+
520
+ return result;
521
+ }
522
+ ```
523
+
524
+ **Step 3:** Add `executeShell`:
525
+
526
+ ```javascript
527
+ export async function executeShell(job, ctx, deps) {
528
+ const { runShellCommand, normalizeShellResult, log } = deps;
529
+ const result = makeDefaultResult();
530
+
531
+ const shellExec = await runShellCommand(job.payload_message, job.run_timeout_ms);
532
+ const shellResult = normalizeShellResult(shellExec, {
533
+ runId: ctx.run.id,
534
+ timeoutMs: job.run_timeout_ms,
535
+ storeLimit: job.output_store_limit_bytes || undefined,
536
+ excerptLimit: job.output_excerpt_limit_bytes || undefined,
537
+ summaryLimit: job.output_summary_limit_bytes || undefined,
538
+ offloadThreshold: job.output_offload_threshold_bytes || undefined,
539
+ });
540
+
541
+ result.status = shellResult.status;
542
+ result.summary = shellResult.summary;
543
+ result.errorMessage = shellResult.errorMessage;
544
+ result.content = shellResult.deliveryText;
545
+ result.runFinishFields = {
546
+ context_summary: shellResult.contextSummary,
547
+ shell_exit_code: shellResult.exitCode,
548
+ shell_signal: shellResult.signal,
549
+ shell_timed_out: shellResult.timedOut,
550
+ shell_stdout: shellResult.stdout,
551
+ shell_stderr: shellResult.stderr,
552
+ shell_stdout_path: shellResult.stdoutPath,
553
+ shell_stderr_path: shellResult.stderrPath,
554
+ shell_stdout_bytes: shellResult.stdoutBytes,
555
+ shell_stderr_bytes: shellResult.stderrBytes,
556
+ };
557
+
558
+ // Shell delivery logic: announce-always sends on all results, announce sends on error only
559
+ const announcePayload = shellResult.deliveryText.trim() ? shellResult.deliveryText : shellResult.errorMessage;
560
+ if (job.delivery_mode === 'announce-always' && announcePayload) {
561
+ const prefix = shellResult.status === 'ok' ? '' : `\u26a0\ufe0f Shell job failed: ${job.name}\n\n`;
562
+ result.deliveryOverride = `${prefix}${announcePayload}`;
563
+ } else if (job.delivery_mode === 'announce' && shellResult.status !== 'ok' && announcePayload) {
564
+ result.deliveryOverride = announcePayload;
565
+ } else {
566
+ result.skipDelivery = true;
567
+ }
568
+
569
+ log('info', `Shell ${shellResult.status}: ${job.name}`, {
570
+ runId: ctx.run.id,
571
+ exitCode: shellResult.exitCode,
572
+ signal: shellResult.signal,
573
+ timedOut: shellResult.timedOut,
574
+ });
575
+
576
+ return result;
577
+ }
578
+ ```
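
The shell delivery rules above can be isolated into a pure function for testing. This is a sketch under assumptions: `resolveShellDelivery` is a hypothetical name, and it takes plain values instead of the `job` and `shellResult` objects. The branching copies the logic in `executeShell`.

```javascript
// Hypothetical helper (not in the plan): the delivery decision from executeShell.
function resolveShellDelivery(deliveryMode, status, deliveryText, errorMessage, jobName) {
  // Fall back to the error message when there is no deliverable output.
  const payload = deliveryText.trim() ? deliveryText : errorMessage;
  if (deliveryMode === 'announce-always' && payload) {
    const prefix = status === 'ok' ? '' : `\u26a0\ufe0f Shell job failed: ${jobName}\n\n`;
    return { skipDelivery: false, deliveryOverride: `${prefix}${payload}` };
  }
  if (deliveryMode === 'announce' && status !== 'ok' && payload) {
    return { skipDelivery: false, deliveryOverride: payload };
  }
  return { skipDelivery: true, deliveryOverride: null };
}

console.log(resolveShellDelivery('announce', 'ok', 'all good', null, 'nightly'));
```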
579
+
580
+ **Step 4:** Add `executeAgent`:
581
+
582
+ ```javascript
583
+ export async function executeAgent(job, ctx, deps) {
584
+ const {
585
+ waitForGateway, updateRunSession, setAgentStatus,
586
+ buildJobPrompt, runAgentTurnWithActivityTimeout,
587
+ updateContextSummary, releaseDispatch, releaseIdempotencyKey,
588
+ updateJob, matchesSentinel, detectTransientError,
589
+ sqliteNow, log,
590
+ } = deps;
591
+ const result = makeDefaultResult();
592
+
593
+ // Gateway health check
594
+ const gatewayReady = await waitForGateway(30000, 2000);
595
+ if (!gatewayReady) {
596
+ log('warn', `Gateway unavailable after 30s -- deferring: ${job.name}`, { jobId: job.id });
597
+ // Strategy handles everything for the gateway-down case
598
+ deps.finishRun(ctx.run.id, 'error', { error_message: 'Gateway unavailable -- deferred' });
599
+ if (ctx.idemKey) releaseIdempotencyKey(ctx.idemKey);
600
+ const deferredAt = sqliteNow(60000);
601
+ if (ctx.dispatchRecord) {
602
+ releaseDispatch(ctx.dispatchRecord.id, deferredAt);
603
+ } else {
604
+ updateJob(job.id, { next_run_at: deferredAt });
605
+ }
606
+ result.earlyReturn = true;
607
+ return result;
608
+ }
609
+
610
+ const sessionKey = job.preferred_session_key || `scheduler:${job.id}:${ctx.run.id}`;
611
+ updateRunSession(ctx.run.id, sessionKey, null);
612
+
613
+ if (job.agent_id) setAgentStatus(job.agent_id, 'busy', sessionKey);
614
+
615
+ const { prompt, contextMeta } = buildJobPrompt(job, ctx.run);
616
+ try { updateContextSummary(ctx.run.id, contextMeta); } catch (_e) { /* column may not exist yet */ }
617
+
618
+ const turnResult = await runAgentTurnWithActivityTimeout({
619
+ message: prompt,
620
+ agentId: job.agent_id || 'main',
621
+ sessionKey,
622
+ model: job.payload_model || undefined,
623
+ idleTimeoutMs: (job.payload_timeout_seconds || 120) * 1000,
624
+ pollIntervalMs: 60000,
625
+ absoluteTimeoutMs: job.run_timeout_ms || 300000,
626
+ });
627
+
628
+ const content = turnResult.content || '';
629
+ const trimmed = content.trim();
630
+
631
+ const isHeartbeatOk = matchesSentinel(trimmed, 'HEARTBEAT_OK');
632
+ const isNoFlush = trimmed === 'NO_FLUSH';
633
+ const isIdempotentSkip = matchesSentinel(trimmed, 'IDEMPOTENT_SKIP');
634
+ const isTaskFailed = matchesSentinel(trimmed, 'TASK_FAILED');
635
+ const isTransientError = detectTransientError(content);
636
+
637
+ if (isNoFlush) log('info', `Flush: nothing to flush for ${job.name}`);
638
+ if (isIdempotentSkip) log('info', `Idempotent skip (agent): ${job.name}`);
639
+ if (isTaskFailed) log('warn', `Agent signalled TASK_FAILED: ${job.name}`, { runId: ctx.run.id });
640
+ if (isTransientError) log('warn', `Transient error detected in agent reply: ${job.name}`, { runId: ctx.run.id, snippet: content.slice(0, 200) });
641
+
642
+ const effectiveStatus = (isTaskFailed || isTransientError) ? 'error' : 'ok';
643
+
644
+ result.status = effectiveStatus;
645
+ result.summary = content.slice(0, 5000);
646
+ result.content = content;
647
+ result.errorMessage = effectiveStatus === 'error'
648
+ ? (isTaskFailed ? 'Agent signalled TASK_FAILED' : 'Transient error in agent reply')
649
+ : null;
650
+ result.idemAction = effectiveStatus === 'ok' ? 'keep' : 'release';
651
+
652
+ // Suppress delivery for sentinel responses
653
+ if (isHeartbeatOk || isNoFlush || isIdempotentSkip) {
654
+ result.skipDelivery = true;
655
+ }
656
+
657
+ log('info', `Completed: ${job.name} (${turnResult.usage?.total_tokens || '?'} tokens)`, {
658
+ runId: ctx.run.id,
659
+ durationMs: ctx.run.duration_ms,
660
+ });
661
+
662
+ return result;
663
+ }
664
+ ```
665
+
666
+ **Step 5:** Run `npm test` -- must pass.
667
+
668
+ ### Task 4: Add executeStrategy dispatcher and the error-catch wrapper
669
+
670
+ **Files:**
671
+ - Modify: `dispatcher-strategies.js`
672
+
673
+ **Step 1:** Add the strategy dispatcher that routes to the correct execution function:
674
+
675
+ ```javascript
676
+ export async function executeStrategy(job, ctx, deps) {
677
+   const { handleDelivery, log } = deps;
678
+ try {
679
+ if (job.job_type === 'watchdog') return await executeWatchdog(job, ctx, deps);
680
+ if (job.session_target === 'main') return await executeMain(job, ctx, deps);
681
+ if (job.session_target === 'shell') return await executeShell(job, ctx, deps);
682
+ return await executeAgent(job, ctx, deps);
683
+ } catch (err) {
684
+ log('error', `Failed: ${job.name}: ${err.message}`, { jobId: job.id });
685
+
686
+ // Deliver failure notification
687
+ if (['announce', 'announce-always'].includes(job.delivery_mode)) {
688
+ await handleDelivery(job, `\u26a0\ufe0f Job failed: ${job.name}\n\n${err.message}`);
689
+ }
690
+
691
+ const result = makeDefaultResult();
692
+ result.status = 'error';
693
+ result.errorMessage = err.message;
694
+ result.content = err.message;
695
+ result.idemAction = 'release';
696
+ result.skipDelivery = true; // already delivered above
697
+ return result;
698
+ }
699
+ }
700
+ ```
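
The routing in `executeStrategy` is order-sensitive: `job_type === 'watchdog'` is checked before `session_target`, so a watchdog job takes the watchdog path regardless of its session target. A small sketch, assuming a hypothetical `pickStrategyName` helper that returns a label instead of calling the strategy:

```javascript
// Hypothetical helper (not in the plan): same routing order as executeStrategy.
function pickStrategyName(job) {
  if (job.job_type === 'watchdog') return 'watchdog';
  if (job.session_target === 'main') return 'main';
  if (job.session_target === 'shell') return 'shell';
  return 'agent'; // default for any other session target
}

console.log(pickStrategyName({ job_type: 'watchdog', session_target: 'shell' }));
```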
701
+
702
+ **Step 2:** Run `npm test` -- must pass.
703
+
704
+ ### Task 5: Rewire dispatchJob to use the new functions
705
+
706
+ This is the key task: replace the 549-line body of `dispatchJob` with a 5-line orchestrator that calls `prepareDispatch`, `executeStrategy`, and `finalizeDispatch`.
707
+
708
+ **Files:**
709
+ - Modify: `dispatcher.js`
710
+
711
+ **Step 1:** Add import at the top of dispatcher.js:
712
+
713
+ ```javascript
714
+ import {
715
+ prepareDispatch, executeStrategy, finalizeDispatch,
716
+ } from './dispatcher-strategies.js';
717
+ ```
718
+
719
+ **Step 2:** Replace the entire `dispatchJob` function body (lines 186-735) with:
720
+
721
+ ```javascript
722
+ async function dispatchJob(job, opts = {}) {
723
+ const deps = buildDispatchDeps();
724
+ const ctx = await prepareDispatch(job, opts, deps);
725
+ if (!ctx) return;
726
+
727
+ const result = await executeStrategy(job, ctx, deps);
728
+ await finalizeDispatch(job, ctx, result, deps);
729
+ }
730
+ ```
731
+
732
+ **Step 3:** Add `buildDispatchDeps()` right above `dispatchJob` -- this wires local functions and imports into the deps object that strategies receive:
733
+
734
+ ```javascript
735
+ function buildDispatchDeps() {
736
+ return {
737
+ // dispatch-queue
738
+ claimDispatch, releaseDispatch, setDispatchStatus,
739
+ // approval
740
+ countPendingApprovalsForJob, getPendingApproval, createApproval,
741
+ // runs
742
+ createRun, finishRun, getRun, updateRunSession, updateContextSummary,
743
+ // jobs
744
+ hasRunningRunForPool, hasRunningRun, enqueueJob, dequeueJob,
745
+ getDispatchBacklogCount, shouldRetry, scheduleRetry,
746
+ updateJob, deleteJob,
747
+ // idempotency
748
+ generateIdempotencyKey, generateChainIdempotencyKey,
749
+ generateRunNowIdempotencyKey, claimIdempotencyKey,
750
+ releaseIdempotencyKey, updateIdempotencyResultHash,
751
+ // gateway
752
+ waitForGateway, runAgentTurnWithActivityTimeout, sendSystemEvent,
753
+ // agents
754
+ setAgentStatus,
755
+ // shell
756
+ runShellCommand, normalizeShellResult,
757
+ // utils
758
+ sqliteNow, adaptiveDeferralMs, buildExecutionIntentNote,
759
+ matchesSentinel, detectTransientError,
760
+ // prompt
761
+ buildJobPrompt,
762
+ // delivery
763
+ handleDelivery,
764
+ // local helpers
765
+ advanceNextRun, updateJobAfterRun, handleTriggeredChildren,
766
+ // db
767
+ getDb,
768
+ // config
769
+ TICK_INTERVAL_MS,
770
+ // logging
771
+ log,
772
+ };
773
+ }
774
+ ```
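
Because every phase destructures from the same `deps` object, a key left out of `buildDispatchDeps()` only surfaces when the destructured function is first called. One possible guard is a check that lists absent keys. The helper name `missingDeps` and the idea of a per-strategy required list are assumptions for illustration, not part of the plan.

```javascript
// Hypothetical helper (not in the plan): report which required keys are absent
// from a deps object, so a wiring mistake fails fast instead of at call time.
function missingDeps(deps, required) {
  return required.filter((name) => deps[name] === undefined);
}

// Example: the names executeShell destructures, checked against a partial stub.
const shellDeps = { runShellCommand: () => {}, log: () => {} };
console.log(missingDeps(shellDeps, ['runShellCommand', 'normalizeShellResult', 'log']));
```

A check like this could run once at startup or inside a unit test; it does not replace the `npm test` and `node dispatcher.js` checks in the steps below.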
775
+
776
+ **Step 4:** Remove the old `dispatchJob` body (the 549 lines between the function signature and the closing brace), keeping only the new 5-line version.
777
+
778
+ **Step 5:** Run `npm test` -- all 648 tests must pass.
779
+
780
+ **Step 6:** Run `node dispatcher.js` briefly (Ctrl+C after startup message) to verify it starts without import errors.
781
+
782
+ ### Task 6: Verify and commit
783
+
784
+ **Step 1:** Run `npm test` -- must show 648 passed, 0 failed.
785
+
786
+ **Step 2:** Run `npx eslint dispatcher.js dispatcher-strategies.js` -- must be clean.
787
+
788
+ **Step 3:** Run `npm run typecheck` -- must pass.
789
+
790
+ **Step 4:** Commit:
791
+
792
+ ```bash
793
+ git add dispatcher.js dispatcher-strategies.js package.json
794
+ git commit -m "Refactor dispatchJob into strategy pattern with explicit context
795
+
796
+ Extract the 549-line dispatchJob closure into prepareDispatch (guards +
797
+ run creation), four strategy functions (watchdog/main/shell/agent), and
798
+ finalizeDispatch (uniform post-execution ceremony). Eliminates shared
799
+ closure state in favor of explicit DispatchContext and DispatchResult
800
+ objects. No behavioral changes."
801
+ ```