openclaw-scheduler 0.2.0

Files changed (70)
  1. package/AGENTS.md +302 -0
  2. package/BEST-PRACTICES.md +506 -0
  3. package/CHANGELOG.md +82 -0
  4. package/CODE_OF_CONDUCT.md +22 -0
  5. package/CONTEXT.md +26 -0
  6. package/CONTRIBUTING.md +73 -0
  7. package/IMPLEMENTATION_SPEC.md +170 -0
  8. package/INSTALL-ADDITIONAL-HOST.md +333 -0
  9. package/INSTALL-LINUX.md +419 -0
  10. package/INSTALL-WINDOWS.md +305 -0
  11. package/INSTALL.md +364 -0
  12. package/JOB-QUICK-REF.md +222 -0
  13. package/LICENSE +21 -0
  14. package/QUICK-START.md +256 -0
  15. package/README.md +2170 -0
  16. package/SECURITY.md +34 -0
  17. package/UNINSTALL.md +129 -0
  18. package/UPGRADING.md +436 -0
  19. package/agents.js +67 -0
  20. package/approval.js +107 -0
  21. package/backup.js +390 -0
  22. package/bin/openclaw-scheduler.js +138 -0
  23. package/cli.js +1083 -0
  24. package/db.js +122 -0
  25. package/dispatch/529-recovery.mjs +204 -0
  26. package/dispatch/README.md +372 -0
  27. package/dispatch/config.example.json +24 -0
  28. package/dispatch/deliver-watcher.sh +57 -0
  29. package/dispatch/hooks.mjs +171 -0
  30. package/dispatch/index.mjs +1836 -0
  31. package/dispatch/watcher.mjs +1396 -0
  32. package/dispatch-queue.js +112 -0
  33. package/dispatcher-approvals.js +96 -0
  34. package/dispatcher-delivery.js +43 -0
  35. package/dispatcher-maintenance.js +242 -0
  36. package/dispatcher-shell.js +29 -0
  37. package/dispatcher-strategies.js +1280 -0
  38. package/dispatcher-utils.js +81 -0
  39. package/dispatcher.js +855 -0
  40. package/docs/adr-schedule-ownership.md +73 -0
  41. package/docs/gateway-contract.md +904 -0
  42. package/docs/plans/2026-03-09-fix-typescript-types.md +91 -0
  43. package/docs/plans/2026-03-09-test-coverage-gaps.md +83 -0
  44. package/docs/plans/2026-03-10-dispatcher-refactor.md +801 -0
  45. package/docs/trust-architecture.md +266 -0
  46. package/gateway.js +473 -0
  47. package/idempotency.js +119 -0
  48. package/index.d.ts +864 -0
  49. package/index.js +17 -0
  50. package/jobs.js +1224 -0
  51. package/messages.js +357 -0
  52. package/migrate-consolidate.js +694 -0
  53. package/migrate.js +125 -0
  54. package/package.json +130 -0
  55. package/paths.js +79 -0
  56. package/prompt-context.js +94 -0
  57. package/retrieval.js +176 -0
  58. package/runs.js +270 -0
  59. package/scheduler-schema.js +101 -0
  60. package/schema.sql +480 -0
  61. package/scripts/dispatch-cli-utils.mjs +65 -0
  62. package/scripts/inbox-consumer.mjs +288 -0
  63. package/scripts/stuck-detector.sh +18 -0
  64. package/scripts/stuck-run-detector.mjs +333 -0
  65. package/scripts/telegram-webhook-check.mjs +238 -0
  66. package/setup.mjs +724 -0
  67. package/shell-result.js +214 -0
  68. package/task-tracker.js +300 -0
  69. package/team-adapter.js +335 -0
  70. package/v02-runtime.js +599 -0
package/README.md ADDED
@@ -0,0 +1,2170 @@
1
+ # OpenClaw Scheduler
2
+
3
+ [![CI](https://github.com/amittell/openclaw-scheduler/actions/workflows/ci.yml/badge.svg)](https://github.com/amittell/openclaw-scheduler/actions/workflows/ci.yml)
4
+ [![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE)
5
+ [![Node](https://img.shields.io/badge/node-%E2%89%A520-green)](https://nodejs.org)
6
+
7
+ A durable orchestration runtime for [OpenClaw](https://openclaw.ai) agents and shell workflows. Use it when built-in cron and heartbeat stop being enough: jobs fail and disappear into logs, shell scripts depend on gateway uptime, multi-step workflows need retries and approvals, and you want a real audit trail for what ran, what failed, and what triggered what.
8
+
9
+ It replaces OpenClaw's built-in cron/heartbeat with a SQLite-backed scheduler that keeps full run history, supports shell and agent steps in the same workflow, and lets you build chains like `shell check -> agent diagnosis -> human approval -> remediation`.
10
+
11
+ **Repo:** `github.com/amittell/openclaw-scheduler`
12
+ **Default location:** `~/.openclaw/scheduler/`
13
+ **Service:** `ai.openclaw.scheduler` (macOS launchd: LaunchAgent or LaunchDaemon)
14
+ **Runtime:** Node.js 20+ (ESM), SQLite via `better-sqlite3`, cron parsing via `croner`
15
+ **Tests:** run with `npm test` (full suite, in-memory SQLite)
16
+ **Platform:** macOS · Linux · Windows (WSL2)
17
+
18
+ In practice, this gives you:
19
+ - scheduled jobs with real run history instead of “it probably ran”
20
+ - shell jobs that still work when the gateway is unhealthy
21
+ - AI jobs that stay isolated from your personal chats
22
+ - chains, retries, and approval gates for workflows that are bigger than one cron line
23
+
24
+ ---
25
+
26
+ ## Table of Contents
27
+
28
+ 1. [Why This Exists](#why-this-exists)
29
+ 2. [Concrete Use Cases](#concrete-use-cases)
30
+ 3. [When To Use It](#when-to-use-it)
31
+ 4. [What Replaced What](#what-replaced-what)
32
+ 5. [Quick Start](#quick-start)
33
+ 6. [Five-Minute Setup](#five-minute-setup)
34
+ 7. [Starter Recipes](#starter-recipes)
35
+ 8. [Common Migrations](#common-migrations)
36
+ 9. [Platform Support](#platform-support)
37
+ 10. [Architecture](#architecture)
38
+ 11. [How Jobs Execute](#how-jobs-execute)
39
+ 12. [Delivery Modes](#delivery-modes)
40
+ 13. [Delivery Aliases](#delivery-aliases)
41
+ 14. [Shell Jobs](#shell-jobs)
42
+ 15. [HITL Approval Gates](#hitl-approval-gates)
43
+ 16. [Idempotency](#idempotency)
44
+ 17. [Context Retrieval](#context-retrieval)
45
+ 18. [Task Tracker](#task-tracker)
46
+ 19. [Resource Pools](#resource-pools)
47
+ 20. [Workflow Chains](#workflow-chains)
48
+ 21. [Retry Logic](#retry-logic)
49
+ 22. [Chain Safety](#chain-safety)
50
+ 23. [Inter-Agent Messaging](#inter-agent-messaging)
51
+ 24. [Backup & Recovery](#backup--recovery)
52
+ 25. [Agent Registry](#agent-registry)
53
+ 26. [Database Schema](#database-schema)
54
+ 27. [CLI Reference](#cli-reference)
55
+ 28. [Configuration](#configuration)
56
+ 29. [Service Management](#service-management)
57
+ 30. [Error Handling & Backoff](#error-handling--backoff)
58
+ 31. [Migration & History](#migration--history)
59
+ 32. [Upgrading](#upgrading)
60
+ 33. [Removing the Scheduler](#removing-the-scheduler)
61
+ 34. [Best Practices](#best-practices)
62
+ 35. [File Reference](#file-reference)
63
+ 36. [Testing](#testing)
64
+ 37. [Sub-agent Dispatch](#sub-agent-dispatch)
65
+ 38. [Working with agentcli](#working-with-agentcli)
66
+ 39. [Trust Architecture](#trust-architecture)
67
+ 40. [Troubleshooting](#troubleshooting)
68
+ 41. [Companion Scripts](#companion-scripts)
69
+
70
+ ---
71
+
72
+ ## Why This Exists
73
+
74
+ OpenClaw's built-in cron and heartbeat are fine until your workflows stop being simple.
75
+
76
+ The pain usually looks like this:
77
+
78
+ - A scheduled agent run fails, but the only record is a log line or a chat reply.
79
+ - A shell script is operationally important, but it should keep running even if the gateway is unhealthy.
80
+ - One step should trigger another, but only on success, only on failure, or only if the output contains a specific signal.
81
+ - A risky action needs a human in the loop instead of firing immediately.
82
+ - An agent needs to hand work to another agent or process, and you want that handoff tracked and auditable.
83
+
84
+ `openclaw-scheduler` exists to solve those problems without making you build a second application stack. It gives OpenClaw a durable runtime for workflows: jobs, runs, chains, retries, shell execution, approvals, and message routing all backed by SQLite.
85
+
86
+ ## Concrete Use Cases
87
+
88
+ These are the kinds of workflows this scheduler is meant for:
89
+
90
+ - `metrics capture -> analysis -> approval -> report publish`
91
+ You want each step tracked, retried if needed, and gated before the final action.
92
+ - `shell ingest fails -> agent diagnoses failure -> operator approves remediation`
93
+ The ingest should still run without the gateway, but the failure follow-up can use an agent.
94
+ - `workspace audit -> diagnosis -> memory compression`
95
+ The audit is a shell step, the diagnosis is an agent step, and the remediation should only run if the diagnosis actually recommends it.
96
+ - `bot health check -> alert -> repair action`
97
+ A shell check runs on schedule, an agent summarizes the issue, and a repair step waits for approval.
98
+
99
+ The differentiator is not just "better cron". It is mixed shell + agent workflows with durable state and control over what happens after success, failure, timeout, or explicit signals in output.
100
+
101
+ ## When To Use It
102
+
103
+ Use it when you want:
104
+
105
+ - reliable scheduled execution with history and retries
106
+ - shell jobs that do not depend on OpenClaw gateway availability
107
+ - parent/child workflow chains
108
+ - approval gates before risky steps
109
+ - auditable inter-agent or agent-to-shell handoffs
110
+
111
+ Do not use it if simple cron is enough. If all you need is “run one thing every hour” and you do not care about retries, chains, approvals, or run history, this is probably more machinery than you need.
112
+
113
+ ## What Replaced What
114
+
115
+ > If you have never used OpenClaw's built-in cron, skip migration and go directly to [Five-Minute Setup](#five-minute-setup).
116
+
117
+ | Before (OC built-in) | After (scheduler) |
118
+ |----------------------|-------------------|
119
+ | `~/.openclaw/cron/jobs.json` | SQLite `jobs` table with full run history |
120
+ | `heartbeat.every: "5m"` | Scheduled jobs (e.g., "Daily Workspace Audit") |
121
+ | No run tracking | Full run lifecycle with status, duration, summary |
122
+ | No chain support | Parent/child jobs with trigger-on-completion |
123
+ | No retry | Auto-retry with configurable attempts and delay |
124
+ | No inter-agent comms | Message queue with priority, threading, broadcast |
125
+ | Shell scripts (manual) | Shell job target — cron-scheduled scripts, no gateway needed |
126
+
127
+ ---
128
+
129
+ ## Quick Start
130
+
131
+ **New to the scheduler?** Start with [QUICK-START.md](QUICK-START.md) -- a focused guide covering installation, converting existing OpenClaw crons, and building your first workflow chain.
132
+
133
+ For the full reference, use the npm-first path below and then jump straight to [Five-Minute Setup](#five-minute-setup).
134
+
135
+ ### Option A: npm-first (publish/install flow)
136
+
137
+ ```bash
138
+ mkdir -p ~/.openclaw/scheduler
139
+ npm install --prefix ~/.openclaw/scheduler openclaw-scheduler@latest
140
+ npm exec --prefix ~/.openclaw/scheduler openclaw-scheduler -- setup
141
+ ```
142
+
143
+ This installs the package without cloning the repo. The launcher command maps to:
144
+ - `openclaw-scheduler setup` → `setup.mjs`
145
+ - `openclaw-scheduler start` → `dispatcher.js`
146
+ - `openclaw-scheduler webhook-check` → `scripts/telegram-webhook-check.mjs`
147
+ - `openclaw-scheduler <anything-else>` → `cli.js`
148
+
149
+ For npm installs, scheduler state defaults to `~/.openclaw/scheduler/` rather than `node_modules/openclaw-scheduler/`, so upgrades do not trample the database path.
150
+
151
+ If your Node runtime changes later, rebuild the native SQLite binding before restarting the scheduler:
152
+
153
+ ```bash
154
+ cd ~/.openclaw/scheduler
155
+ npm rebuild better-sqlite3
156
+ ```
157
+
158
+ This is commonly needed after a Homebrew Node upgrade on macOS or any major Node ABI change.
159
+
160
+ ### macOS shell setup for ad hoc commands
161
+
162
+ If you use `zsh` on macOS, put your minimal Homebrew PATH bootstrap in `~/.zshenv`, not only in `~/.zprofile` or `~/.zshrc`.
163
+
164
+ Why:
165
+ - `launchd` services do not depend on your interactive shell startup files
166
+ - ad hoc commands like `ssh host 'node cli.js status'` run a non-interactive shell
167
+ - non-interactive `zsh` reads `~/.zshenv`, but reads neither `~/.zprofile` nor `~/.zshrc`
168
+
169
+ Recommended `~/.zshenv`:
170
+
171
+ ```zsh
172
+ # ~/.zshenv — sourced by all zsh instances, including non-interactive SSH commands
173
+ if [ -x /opt/homebrew/bin/brew ]; then
174
+   eval "$(/opt/homebrew/bin/brew shellenv)"
175
+ fi
176
+
177
+ export PATH="$HOME/.local/bin:$HOME/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
178
+ ```
179
+
180
+ If you load OpenClaw completions in `~/.zshrc`, initialize completions first so `compdef` is available:
181
+
182
+ ```zsh
183
+ autoload -Uz compinit
184
+ compinit
185
+
186
+ if [ -f "$HOME/.openclaw/completions/openclaw.zsh" ]; then
187
+   source "$HOME/.openclaw/completions/openclaw.zsh"
188
+ fi
189
+ ```
190
+
191
+ Avoid pinning a versioned Node path like `/opt/homebrew/opt/node@22/bin` in shell startup files. Prefer the stable Homebrew symlink `/opt/homebrew/bin/node`, which survives normal `brew upgrade node`.
192
+
193
+ ### Option B: source clone (dev/contributor flow)
194
+
195
+ ```bash
196
+ git clone https://github.com/amittell/openclaw-scheduler ~/.openclaw/scheduler
197
+ cd ~/.openclaw/scheduler
198
+ npm install
199
+ npm test # should end with: 0 failed
200
+ npm run lint # static checks
201
+ npm run typecheck # exported API declarations
202
+ npm run coverage # coverage summary + lcov report
203
+ npm run verify:local # full local maintainer gate
204
+ npm run verify:smoke # lightweight smoke gate used by GitHub Actions
205
+ ```
206
+
207
+ GitHub Actions runs the smoke gate plus the in-memory test suite on Linux, macOS, and Windows with Node 20. The full release gate still runs locally via `npm run verify:local` and is enforced again by `prepublishOnly`.
208
+
209
+ The package also exports a small, safe programmatic API surface for tooling:
210
+
211
+ ```js
212
+ import { db, jobs, runs, shellResults } from 'openclaw-scheduler';
213
+ ```
214
+
215
+ After installing from a clone, run the interactive setup wizard:
216
+
217
+ ```bash
218
+ npm exec openclaw-scheduler -- setup
219
+ # or: node setup.mjs
220
+ ```
221
+
222
+ The wizard will:
223
+ - Run DB migrations
224
+ - Append scheduler queue/inbox-consumer entries to your agent's `MEMORY.md` and `workspace-index.md`
225
+ - Create **Inbox Consumer** + **Stuck Run Detector** scheduler jobs
226
+ - Configure dispatcher auto-start service:
227
+   - macOS: LaunchAgent (personal auto-login Mac) or LaunchDaemon (headless/pre-login startup)
228
+   - Linux/WSL2: systemd user service (or PM2 fallback)
229
+
230
+ After setup:
231
+
232
+ ```bash
233
+ npm exec openclaw-scheduler -- status # verify scheduler is running
234
+ node scripts/stuck-run-detector.mjs # should print: No stale runs older than 15 minute(s).
235
+ tail -5 /tmp/openclaw-scheduler.log # last 5 log lines (use tail -f to follow live)
236
+ ```
237
+
238
+ Dispatcher setup is covered in:
239
+ - [INSTALL.md](INSTALL.md) (macOS launchd: LaunchAgent or LaunchDaemon)
240
+ - [INSTALL-LINUX.md](INSTALL-LINUX.md) (Linux/WSL2 systemd + PM2 fallback)
241
+ - [INSTALL-WINDOWS.md](INSTALL-WINDOWS.md) (WSL2 setup path)
242
+ For additional hosts, see [INSTALL-ADDITIONAL-HOST.md](INSTALL-ADDITIONAL-HOST.md).
243
+
244
+ ---
245
+
246
+ ## Five-Minute Setup
247
+
248
+ This is the shortest path from "I installed it" to "I have a real job running."
249
+
250
+ ### 1. Install and initialize
251
+
252
+ ```bash
253
+ mkdir -p ~/.openclaw/scheduler
254
+ npm install --prefix ~/.openclaw/scheduler openclaw-scheduler@latest
255
+ alias ocs='npm exec --prefix ~/.openclaw/scheduler openclaw-scheduler --'
256
+ ocs setup
257
+ ocs status
258
+ ```
259
+
260
+ What this does:
261
+ - installs the package into `~/.openclaw/scheduler`
262
+ - creates or migrates `scheduler.db`
263
+ - installs the scheduler service (on macOS, with the launchd mode you choose, `agent` or `daemon`; on Linux/WSL2, a systemd user service or PM2)
264
+ - creates the built-in helper jobs like `Inbox Consumer` and `Stuck Run Detector`
265
+
266
+ If you plan to use the scheduler often, add the `ocs` alias to your shell profile.
267
+
268
+ ### 2. Add your first real job
269
+
270
+ This example runs a simple shell health check every 15 minutes.
271
+
272
+ ```bash
273
+ ocs jobs add '{
274
+ "name": "Disk Space Check",
275
+ "schedule_cron": "*/15 * * * *",
276
+ "session_target": "shell",
277
+ "payload_message": "df -h /",
278
+ "delivery_mode": "none",
279
+ "run_timeout_ms": 120000,
280
+ "origin": "system"
281
+ }'
282
+ ```
283
+
284
+ What it means in plain English:
285
+ - `schedule_cron`: run every 15 minutes
286
+ - `session_target: "shell"`: run a shell command directly, no AI needed
287
+ - `payload_message`: the command to run
288
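+ - `delivery_mode: "none"`: do not send the output anywhere automatically
+ - `run_timeout_ms`: the maximum execution time in milliseconds (here 2 minutes); required for shell jobs, with no default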
289
+ - `origin: "system"`: this job was created by the system, not from a user chat
290
+
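You can turn the field list above into a pre-flight check to run before `ocs jobs add`. This sketch is hypothetical: `lintJobSpec` is not part of the scheduler. It only restates rules given elsewhere in this README:

- there are three delivery modes
- `run_timeout_ms` is required for shell jobs
- delivery is suppressed without a channel and target

It also assumes that an `@alias` in `delivery_to` supplies both the channel and the target, and that a missing `delivery_mode` behaves like `"none"`.

```javascript
// Hypothetical pre-flight lint for a job spec; not part of openclaw-scheduler.
// Each rule restates this README; the real scheduler may validate differently.
const DELIVERY_MODES = new Set(["none", "announce", "announce-always"]);

function lintJobSpec(spec) {
  const issues = [];
  const mode = spec.delivery_mode ?? "none"; // assumption: unset behaves like "none"

  if (!DELIVERY_MODES.has(mode)) {
    issues.push(`unknown delivery_mode "${mode}"`);
  }

  // Shell jobs: run_timeout_ms is required and has no default.
  if (spec.session_target === "shell") {
    const timeout = spec.run_timeout_ms;
    if (!Number.isInteger(timeout) || timeout <= 0) {
      issues.push("shell jobs need a positive integer run_timeout_ms");
    }
  }

  // Delivery is suppressed without a channel and a target.
  // Assumption: an "@alias" in delivery_to supplies both.
  if (mode !== "none") {
    const usesAlias = typeof spec.delivery_to === "string" && spec.delivery_to.startsWith("@");
    if (!usesAlias && (!spec.delivery_channel || !spec.delivery_to)) {
      issues.push(`delivery_mode "${mode}" without delivery_channel and delivery_to: output will not be sent`);
    }
  }
  return issues;
}

// The "Disk Space Check" spec from step 2.
const diskCheck = {
  name: "Disk Space Check",
  schedule_cron: "*/15 * * * *",
  session_target: "shell",
  payload_message: "df -h /",
  delivery_mode: "none",
  run_timeout_ms: 120000,
  origin: "system",
};
console.log(lintJobSpec(diskCheck)); // []
```

The scheduler itself remains the authority on what it accepts. Treat this as a local sanity check only.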
291
+ ### 3. Run it now and inspect the result
292
+
293
+ ```bash
294
+ ocs jobs list
295
+ # copy the job ID for "Disk Space Check"
296
+
297
+ ocs jobs run <job-id>
298
+ ocs runs list <job-id> 5
299
+ ```
300
+
301
+ If you want the full run record:
302
+
303
+ ```bash
304
+ ocs runs get <run-id>
305
+ ```
306
+
307
+ At that point you have a working scheduler install, a real job, and visible run history. Everything after this is layering on more power: AI jobs, delivery, retries, workflow chains, and approvals.
308
+
309
+ ---
310
+
311
+ ## Starter Recipes
312
+
313
+ These are copy-paste examples for the most common first workflows.
314
+
315
+ ### 1. Shell health check with failure alerts
316
+
317
+ Use this when you want a script to run reliably even if the OpenClaw gateway is down.
318
+
319
+ ```bash
320
+ ocs jobs add '{
321
+ "name": "API Health Check",
322
+ "schedule_cron": "*/15 * * * *",
323
+ "session_target": "shell",
324
+ "payload_message": "curl -fsS http://127.0.0.1:8080/health || exit 1",
325
+ "delivery_mode": "announce",
326
+ "delivery_channel": "telegram",
327
+ "delivery_to": "YOUR_CHAT_ID",
328
+ "run_timeout_ms": 120000,
329
+ "origin": "system"
330
+ }'
331
+ ```
332
+
333
+ Why this is useful:
334
+ - the check itself runs even if the gateway is unhealthy (the failure alert is still sent through the gateway's messaging channels)
335
+ - it only announces on failure
336
+ - every run is stored in history
337
+
338
+ ### 2. Daily AI summary
339
+
340
+ Use this when you want a scheduled agent report instead of a shell script.
341
+
342
+ ```bash
343
+ ocs jobs add '{
344
+ "name": "Daily Ops Summary",
345
+ "schedule_cron": "0 9 * * *",
346
+ "schedule_tz": "America/New_York",
347
+ "session_target": "isolated",
348
+ "payload_message": "Summarize the last 24 hours of important errors, deploys, and follow-ups in 5 bullet points.",
349
+ "delivery_mode": "announce-always",
350
+ "delivery_channel": "telegram",
351
+ "delivery_to": "YOUR_CHAT_ID",
352
+ "run_timeout_ms": 300000,
353
+ "origin": "system"
354
+ }'
355
+ ```
356
+
357
+ Why this is useful:
358
+ - the agent runs in its own isolated session
359
+ - the result is delivered every time
360
+ - the run history stays separate from your personal chat threads
361
+
362
+ ### 3. Approval-gated follow-up step
363
+
364
+ Use this when a risky step should wait for a human before it runs.
365
+
366
+ ```bash
367
+ ocs jobs add '{
368
+ "name": "Delete Old Backups",
369
+ "parent_id": "<parent-job-id>",
370
+ "trigger_on": "success",
371
+ "approval_required": 1,
372
+ "approval_timeout_s": 3600,
373
+ "approval_auto": "reject",
374
+ "session_target": "shell",
375
+ "payload_message": "find /backups -type f -mtime +14 -delete",
376
+ "delivery_mode": "announce-always",
377
+ "delivery_channel": "telegram",
378
+ "delivery_to": "YOUR_CHAT_ID",
379
+ "run_timeout_ms": 120000
380
+ }'
381
+ ```
382
+
383
+ Why this is useful:
384
+ - the parent job can run automatically
385
+ - the risky cleanup step pauses until someone approves it
386
+ - the scheduler records who approved or rejected it
387
+
388
+ Approve or reject later with:
389
+
390
+ ```bash
391
+ ocs approvals list
392
+ ocs jobs approve <job-id>
393
+ ocs jobs reject <job-id> "Not today"
394
+ ```
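You can model the gate in this recipe as a small state function. This is a minimal sketch under stated assumptions, not the scheduler's logic:

- An explicit approve or reject always wins.
- Once `approval_timeout_s` passes with no decision, `approval_auto` settles the outcome.

Only `"reject"` appears in this README, so treating `"approve"` as the other `approval_auto` value is an assumption. `approvalOutcome` is a hypothetical name.

```javascript
// Hypothetical model of an approval gate; not the scheduler's implementation.
// Assumptions: an explicit decision always wins; after approval_timeout_s with no
// decision, approval_auto settles it ("approve" approves, anything else rejects).
function approvalOutcome({ requestedAtMs, nowMs, timeoutS, auto, decision }) {
  if (decision === "approve") return "approved";
  if (decision === "reject") return "rejected";

  const elapsedS = (nowMs - requestedAtMs) / 1000;
  if (typeof timeoutS === "number" && elapsedS >= timeoutS) {
    return auto === "approve" ? "approved" : "rejected";
  }
  return "pending";
}

// The "Delete Old Backups" gate: approval_timeout_s = 3600, approval_auto = "reject".
const gate = { requestedAtMs: 0, timeoutS: 3600, auto: "reject" };
console.log(approvalOutcome({ ...gate, nowMs: 10 * 60_000 })); // pending
console.log(approvalOutcome({ ...gate, nowMs: 61 * 60_000 })); // rejected
console.log(approvalOutcome({ ...gate, nowMs: 5_000, decision: "approve" })); // approved
```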
395
+
396
+ ---
397
+
398
+ ## Common Migrations
399
+
400
+ If you already have cron jobs, OpenClaw cron entries, or shell scripts, this is the simplest way to think about the conversion.
401
+
402
+ ### 1. OpenClaw built-in cron -> import first, then clean up
403
+
404
+ If your jobs already live in `~/.openclaw/cron/jobs.json`, start with the importer:
405
+
406
+ ```bash
407
+ cd ~/.openclaw/scheduler
408
+ node migrate.js
409
+ node cli.js jobs list
410
+ ```
411
+
412
+ Then disable OpenClaw's built-in cron and heartbeat so the imported jobs do not run twice:
413
+
414
+ ```bash
415
+ openclaw cron edit <job-id> --disable
416
+ openclaw config set cron.enabled false
417
+ openclaw config set agents.defaults.heartbeat.every "0m"
418
+ ```
419
+
420
+ Use this path when the existing jobs are already OpenClaw-native. It gets you into SQLite quickly, then you can refine the imported jobs later.
421
+
422
+ ### 2. Plain shell cron line -> `session_target: "shell"`
423
+
424
+ If you have a normal cron line like:
425
+
426
+ ```cron
427
+ */5 * * * * /usr/local/bin/check-api.sh
428
+ ```
429
+
430
+ Convert it to:
431
+
432
+ ```bash
433
+ ocs jobs add '{
434
+ "name": "API Check",
435
+ "schedule_cron": "*/5 * * * *",
436
+ "session_target": "shell",
437
+ "payload_message": "/usr/local/bin/check-api.sh",
438
+ "delivery_mode": "announce",
439
+ "delivery_channel": "telegram",
440
+ "delivery_to": "YOUR_CHAT_ID",
441
+ "run_timeout_ms": 120000,
442
+ "origin": "system"
443
+ }'
444
+ ```
445
+
446
+ Choose `shell` when the task is deterministic and you do not need AI reasoning.
447
+
448
+ ### 3. AI-ish cron job -> `session_target: "isolated"`
449
+
450
+ If the old job was really “run a prompt every morning”, use an isolated agent job instead of a shell script:
451
+
452
+ ```bash
453
+ ocs jobs add '{
454
+ "name": "Daily Status Summary",
455
+ "schedule_cron": "0 8 * * *",
456
+ "schedule_tz": "America/New_York",
457
+ "session_target": "isolated",
458
+ "payload_message": "Summarize the most important errors, deploys, and follow-ups from the last 24 hours in 5 bullet points.",
459
+ "delivery_mode": "announce-always",
460
+ "delivery_channel": "telegram",
461
+ "delivery_to": "YOUR_CHAT_ID",
462
+ "run_timeout_ms": 300000,
463
+ "origin": "system"
464
+ }'
465
+ ```
466
+
467
+ Choose `isolated` when the job needs reasoning, writing, summarization, or tools.
468
+
469
+ ### 4. Two cron jobs with manual ordering -> parent/child chain
470
+
471
+ If your current workflow is:
472
+ - run a backup
473
+ - wait
474
+ - run verification
475
+
476
+ Model that as a chain instead of two unrelated cron entries:
477
+
478
+ ```bash
479
+ ocs jobs add '{
480
+ "name": "Nightly Backup",
481
+ "schedule_cron": "0 2 * * *",
482
+ "session_target": "shell",
483
+ "payload_message": "/usr/local/bin/nightly-backup.sh",
484
+ "delivery_mode": "announce",
485
+ "delivery_channel": "telegram",
486
+ "delivery_to": "YOUR_CHAT_ID",
487
+ "run_timeout_ms": 600000,
488
+ "origin": "system"
489
+ }'
490
+ ```
491
+
492
+ Then create the follow-up:
493
+
494
+ ```bash
495
+ ocs jobs add '{
496
+ "name": "Verify Nightly Backup",
497
+ "parent_id": "<backup-job-id>",
498
+ "trigger_on": "success",
499
+ "trigger_delay_s": 60,
500
+ "session_target": "shell",
501
+ "payload_message": "/usr/local/bin/verify-backup.sh",
502
+ "delivery_mode": "announce",
503
+ "delivery_channel": "telegram",
504
+ "delivery_to": "YOUR_CHAT_ID",
505
+ "run_timeout_ms": 120000
506
+ }'
507
+ ```
508
+
509
+ This is one of the biggest upgrades over plain cron: the second step now runs because the first step succeeded, not because the clock happened to reach another minute.
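The parent/child relationship above can be sketched as a pure function that runs when a parent finishes. Several pieces are illustrative assumptions:

- `childrenToSchedule` is a hypothetical name.
- The `backup` and `verify` job IDs are made up; real job IDs are generated by the scheduler.
- The `"failure"` value of `trigger_on` is assumed.

Only `"success"` and `trigger_delay_s` come from the examples.

```javascript
// Hypothetical sketch: which child jobs should a finished parent run trigger?
// trigger_on "success" comes from the examples; "failure" is an assumed counterpart.
// trigger_delay_s postpones the child's run by that many seconds.
function childrenToSchedule(parentRun, jobs, finishedAtMs) {
  const outcome = parentRun.status === "ok" ? "success" : "failure";
  return jobs
    .filter((job) => job.parent_id === parentRun.job_id && job.trigger_on === outcome)
    .map((job) => ({
      job_id: job.id,
      run_at_ms: finishedAtMs + (job.trigger_delay_s ?? 0) * 1000,
    }));
}

// Illustrative IDs only.
const jobs = [
  { id: "backup", name: "Nightly Backup" },
  { id: "verify", name: "Verify Nightly Backup", parent_id: "backup", trigger_on: "success", trigger_delay_s: 60 },
];

console.log(childrenToSchedule({ job_id: "backup", status: "ok" }, jobs, 0));
// [ { job_id: 'verify', run_at_ms: 60000 } ]
console.log(childrenToSchedule({ job_id: "backup", status: "error" }, jobs, 0));
// []
```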
510
+
511
+ ### 5. Risky follow-up -> add `approval_required`
512
+
513
+ If the current process is “job runs, then a human decides whether to continue,” model that decision directly:
514
+
515
+ ```bash
516
+ ocs jobs add '{
517
+ "name": "Delete Temp Files",
518
+ "parent_id": "<analysis-job-id>",
519
+ "trigger_on": "success",
520
+ "approval_required": 1,
521
+ "approval_timeout_s": 3600,
522
+ "approval_auto": "reject",
523
+ "session_target": "shell",
524
+ "payload_message": "find /tmp/myapp -type f -mtime +7 -delete",
525
+ "delivery_mode": "announce-always",
526
+ "delivery_channel": "telegram",
527
+ "delivery_to": "YOUR_CHAT_ID",
528
+ "run_timeout_ms": 120000
529
+ }'
530
+ ```
531
+
532
+ That keeps the job automated, but only up to the point where human judgment is actually needed.
533
+
534
+ ### Rule of thumb
535
+
536
+ When converting existing work:
537
+ - start with `shell` unless you clearly need AI reasoning
538
+ - add delivery only if someone really needs to see the output
539
+ - use chains when one step depends on another
540
+ - use approvals when the next step would be annoying, expensive, or risky if it ran by mistake
541
+
542
+ ---
543
+
544
+ ## Platform Support
545
+
546
+ | Platform | Service Manager | Shell Jobs | Status |
547
+ |----------|----------------|------------|--------|
548
+ | macOS | launchd (`agent` or `daemon`) | `/bin/zsh` | ✅ Tested |
549
+ | Linux | systemd user service | `/bin/bash` | ✅ Supported |
550
+ | Windows (WSL2) | systemd (WSL2) / PM2 (WSL1) | `/bin/bash` | ✅ Supported |
551
+ | Windows (native) | — | — | ❌ Not supported — use WSL2 |
552
+
553
+ - **macOS:** Full guide in [INSTALL.md](INSTALL.md)
554
+ - **Linux:** Full guide in [INSTALL-LINUX.md](INSTALL-LINUX.md)
555
+ - **Windows:** Install WSL2, then follow [INSTALL-LINUX.md](INSTALL-LINUX.md). See [INSTALL-WINDOWS.md](INSTALL-WINDOWS.md) for WSL2 setup.
556
+
557
+ Override the shell for shell jobs with the `SCHEDULER_SHELL=/path/to/shell` environment variable.
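The platform defaults and the `SCHEDULER_SHELL` override fit in a small lookup. This is a hypothetical sketch, since `resolveShell` is not a scheduler export. The platform strings are the values Node reports in `process.platform`: `"darwin"`, `"linux"`, and `"win32"`. WSL2 reports `"linux"`.

```javascript
// Hypothetical helper mirroring the Platform Support table; not the scheduler's code.
// SCHEDULER_SHELL wins; otherwise macOS uses /bin/zsh and Linux (including WSL2) /bin/bash.
function resolveShell(platform, env = {}) {
  if (env.SCHEDULER_SHELL) return env.SCHEDULER_SHELL;
  if (platform === "darwin") return "/bin/zsh";
  if (platform === "linux") return "/bin/bash";
  throw new Error(`unsupported platform "${platform}": run the scheduler under WSL2 on Windows`);
}

console.log(resolveShell("darwin")); // /bin/zsh
console.log(resolveShell("linux", { SCHEDULER_SHELL: "/bin/sh" })); // /bin/sh
```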
558
+
559
+ ---
560
+
561
+ ## Architecture
562
+
563
+ The scheduler sits alongside the OpenClaw gateway as an independent process. It creates **isolated sessions** for each job — they never touch the user's main conversation.
564
+
565
+ ```
566
+ ┌─────────────────────────────────────────────┐
567
+ │  Host Machine (e.g., scheduler-host.local)  │
568
+ │                                             │
569
+ │  OpenClaw Gateway (:18789)                  │
570
+ │  ├─ Telegram / Discord / etc.               │
571
+ │  ├─ Chat completions endpoint (/v1/...)     │
572
+ │  ├─ Tool execution (exec, browser, k8s...)  │
573
+ │  └─ Memory search                           │
574
+ │                                             │
575
+ │  Scheduler (launchd service)                │
576
+ │  ├─ SQLite DB (scheduler.db)                │
577
+ │  ├─ Job dispatch via chat completions       │
578
+ │  ├─ Workflow chain engine                   │
579
+ │  ├─ Retry logic                             │
580
+ │  ├─ Shell job execution                     │
581
+ │  ├─ HITL approval gates                     │
582
+ │  ├─ Idempotency ledger                      │
583
+ │  ├─ Inter-agent message queue               │
584
+ │  ├─ Task tracker                            │
585
+ │  └─ MinIO backup                            │
586
+ └─────────────────────────────────────────────┘
587
+ ```
588
+
589
+ ### Tick Loop
590
+
591
+ ```
592
+ ┌────────────────────────────────────────────────────────────┐
593
+ │  Dispatcher Loop (10s tick)                                │
594
+ │                                                            │
595
+ │  1. Gateway health check                                   │
596
+ │  2. Find due jobs → dispatch                               │
597
+ │  3. Check running runs (stale/timeout detection)           │
598
+ │  4. HITL approval gate check                               │
599
+ │  5. Message delivery + spawn handling                      │
600
+ │  6. Task tracker dead-man's-switch                         │
601
+ │  7. Expire old messages                                    │
602
+ │  8. Prune old runs + WAL checkpoint (hourly)               │
603
+ │  9. Backup to MinIO (every 5 min)                          │
604
+ └────────────────────────────────────────────────────────────┘
605
+             │                           │
606
+             ▼                           ▼
607
+ ┌───────────────────────┐   ┌────────────────────────┐
608
+ │ SQLite DB             │   │ OpenClaw Gateway       │
609
+ │                       │   │                        │
610
+ │ • jobs                │   │ • /v1/chat/completions │
611
+ │ • runs                │   │ • /tools/invoke        │
612
+ │ • messages            │   │ • /health              │
613
+ │ • agents              │   │ • system event CLI     │
614
+ │ • approvals           │   └────────────────────────┘
615
+ │ • task_tracker        │
616
+ │ • idempotency_ledger  │
617
+ │ • delivery_aliases    │
618
+ │ • schema_migrations   │
619
+ └───────────────────────┘
620
+ ```
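You can simulate the cadence in the diagram to see how often each housekeeping task fires: ticks every 10 seconds, a backup every 5 minutes, pruning every hour. This is an illustrative sketch, not dispatcher code. The task names and the `periodicTasksDue` helper are hypothetical, and the per-tick steps 1 to 7 are omitted.

```javascript
// Hypothetical sketch of a 10 s dispatcher tick that gates slower housekeeping.
// Intervals come from the diagram above; names and structure are illustrative only.
const TICK_MS = 10_000;
const PERIODIC_MS = {
  backup: 5 * 60_000, // step 9: backup every 5 minutes
  pruneAndCheckpoint: 60 * 60_000, // step 8: prune + WAL checkpoint hourly
};

// Return the periodic tasks whose interval has elapsed since they last ran.
function periodicTasksDue(nowMs, lastRunMs) {
  return Object.entries(PERIODIC_MS)
    .filter(([name, everyMs]) => nowMs - (lastRunMs[name] ?? -Infinity) >= everyMs)
    .map(([name]) => name);
}

// Simulate ticks over a time window and count how often each periodic task fires.
function simulate(durationMs) {
  const lastRunMs = {};
  const counts = { backup: 0, pruneAndCheckpoint: 0 };
  for (let now = 0; now < durationMs; now += TICK_MS) {
    // Steps 1-7 (health check, due jobs, timeouts, approvals, messages...) would run here.
    for (const task of periodicTasksDue(now, lastRunMs)) {
      counts[task] += 1;
      lastRunMs[task] = now;
    }
  }
  return counts;
}

console.log(simulate(60 * 60_000)); // { backup: 12, pruneAndCheckpoint: 1 }
```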
621
+
622
+ ### Session Types
623
+
624
+ | Session | Created By | Lifetime | Used For |
625
+ |---------|-----------|----------|----------|
626
+ | User DM | Telegram message | Persistent per-peer | Your conversations |
627
+ | Group chat | Group message | Persistent per-group | Team discussions |
628
+ | Isolated job | Dispatcher via API | One-shot, dies after completion | Cron jobs, chain steps |
629
+ | Main session | `openclaw system event` | Existing main session | Jobs needing main context |
630
+ | Shell | Dispatcher (direct) | Per-job (no session) | Cron scripts, backups, maintenance |
631
+ | Sub-agent | `sessions_spawn` | Task-scoped | Delegated work |
632
+
633
+ Scheduler jobs get completely isolated sessions. They can't see your chat history and your chats can't see theirs.
634
+
635
+ ---
636
+
637
+ ## How Jobs Execute
638
+
639
+ ### Isolated Jobs (default)
640
+
641
+ ```
642
+ Scheduler tick (every 10s)
643
+
644
+ ├─ getDueJobs() → "Hourly Workspace Backup is due"
645
+ ├─ hasRunningRun()? → skip if overlap_policy='skip'
646
+ ├─ createRun() → status='running'
647
+ ├─ setAgentStatus('main', 'busy')
648
+
649
+ ├─ POST /v1/chat/completions
650
+ │ session: scheduler:<job_id>:<run_id> (unique, isolated)
651
+ │ model: openclaw:main
652
+ │ message: [job prompt + any pending inbox messages]
653
+
654
+ │ ← "Committed 3 files, pushed to origin"
655
+
656
+ ├─ finishRun('ok', summary)
657
+ ├─ setAgentStatus('main', 'idle')
658
+ ├─ Deliver to Telegram? → delivery_mode + channel + target
659
+ ├─ Queue result message for traceability
660
+ ├─ Advance next_run_at to next cron fire
661
+ └─ Trigger child jobs if any (workflow chain)
662
+ ```
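Here is a hypothetical sketch of the skip check and the per-run session key from the flow above. `planDispatch` and the job and run IDs are illustrative. The sketch treats every `overlap_policy` other than `"skip"` as allowing overlap. That is a simplification, because the README does not list the other policies.

```javascript
// Hypothetical sketch of the first decisions in the flow above; not dispatcher code.
// Session key format scheduler:<job_id>:<run_id> and model openclaw:main come from the diagram.
// Simplification: any overlap_policy other than "skip" is treated as "allow overlap".
function planDispatch(job, runId, hasRunningRun) {
  if (hasRunningRun && job.overlap_policy === "skip") {
    return { action: "skip" };
  }
  return {
    action: "dispatch",
    session: `scheduler:${job.id}:${runId}`, // unique per run, so runs never share history
    model: "openclaw:main",
  };
}

// Illustrative IDs only.
const job = { id: "job-42", name: "Hourly Workspace Backup", overlap_policy: "skip" };
console.log(planDispatch(job, "run-7", false).session); // scheduler:job-42:run-7
console.log(planDispatch(job, "run-8", true).action); // skip
```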
663
+
664
+ ### Main Session Jobs
665
+
666
+ For jobs that need the main session context (rare):
667
+
668
+ ```
669
+ Dispatcher → exec: openclaw system event --text "..." --mode now
670
+ ```
671
+
672
+ This injects directly into the active agent session.
673
+
674
+ ### Shell Jobs
675
+
676
+ ```
677
+ Shell Job (session_target='shell')
678
+
679
+ ├─ getDueJobs() → "Hourly Backup is due"
680
+ ├─ createRun() → status='running'
681
+ ├─ run "<payload_message>" via shell (platform default or SCHEDULER_SHELL)
682
+ │ (no gateway required)
683
+ │ ← exit 0: "Backup complete, 3 files"
684
+
685
+ ├─ finishRun(exit===0 ? 'ok' : 'error')
686
+ ├─ announce: post output if exit ≠ 0
687
+ ├─ announce-always: post output regardless
688
+ └─ Trigger child jobs if any
689
+ ```
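The exit-code rule above maps a finished process to run fields: `exit === 0` is `ok`, anything else is `error`, and `announce` posts on failure. The `shell_*` field names come from the Shell Jobs section of this README. The function name is hypothetical. The input mimics the result object of Node's `child_process.spawnSync()`, where a timeout sets `error.code` to `"ETIMEDOUT"`.

```javascript
// Hypothetical mapping from a finished shell command to run fields; not the scheduler's code.
// Input mimics Node's child_process.spawnSync() result: { status, signal, error },
// where a hit timeout sets error.code === "ETIMEDOUT".
function shellRunOutcome(result) {
  const timedOut = result.error?.code === "ETIMEDOUT";
  const exitCode = result.status ?? null; // null when the process was killed by a signal
  const ok = !timedOut && exitCode === 0;
  return {
    status: ok ? "ok" : "error",
    shell_exit_code: exitCode,
    shell_signal: result.signal ?? null,
    shell_timed_out: timedOut,
    announce: !ok, // delivery_mode "announce" posts only when the command fails
  };
}

console.log(shellRunOutcome({ status: 0, signal: null }).status); // ok
console.log(shellRunOutcome({ status: 2, signal: null }).announce); // true
const timeoutError = Object.assign(new Error("spawnSync /bin/bash ETIMEDOUT"), { code: "ETIMEDOUT" });
console.log(shellRunOutcome({ status: null, signal: "SIGTERM", error: timeoutError }).shell_timed_out); // true
```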
690
+
691
+ ### Prompt Building
692
+
693
+ Each isolated job prompt includes:
694
+ 1. Header: `[scheduler:<job_id> <job_name>]`
695
+ 2. Pending inbox messages for the agent (up to 5)
696
+ 3. Context from prior runs (if `context_retrieval` is set)
697
+ 4. The job's `payload_message`
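Here is one way to assemble the four parts. The header format and the limit of 5 inbox messages come from the list above. The section labels, separators, and message formatting are assumptions, and `buildPrompt` is a hypothetical name.

```javascript
// Hypothetical sketch of the four-part prompt; not the scheduler's code.
// From the README: the header format and the cap of 5 inbox messages.
// Assumed: the section labels, blank-line separators, and message formatting.
const MAX_INBOX_MESSAGES = 5;

function buildPrompt(job, inbox = [], priorContext = "") {
  const parts = [`[scheduler:${job.id} ${job.name}]`];

  const pending = inbox.slice(0, MAX_INBOX_MESSAGES);
  if (pending.length > 0) {
    parts.push("Inbox:\n" + pending.map((m) => `- from ${m.from}: ${m.body}`).join("\n"));
  }
  if (job.context_retrieval && priorContext) {
    parts.push("Context from prior runs:\n" + priorContext);
  }
  parts.push(job.payload_message);
  return parts.join("\n\n");
}

// Illustrative job and inbox.
const job = { id: "job-42", name: "Daily Ops Summary", payload_message: "Summarize the last 24 hours." };
const inbox = Array.from({ length: 7 }, (_, i) => ({ from: "ops", body: `note ${i + 1}` }));
console.log(buildPrompt(job, inbox));
```

With seven pending messages, only the first five (`note 1` through `note 5`) make it into the prompt.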
698
+
699
+ ---
700
+
701
+ ## Delivery Modes
702
+
703
+ The scheduler delivers job output through the OpenClaw gateway's messaging
704
+ system. All channels supported by the gateway work with the scheduler:
705
+ **Telegram**, **Discord**, **WhatsApp**, **Signal**, **iMessage**, and **Slack**.
706
+ Set `delivery_channel` to the channel name and `delivery_to` to the
707
+ channel-specific target (chat ID, channel ID, phone number, handle, etc.).
708
+
709
+ | Mode | When output is delivered |
710
+ |------|-------------------------|
711
+ | `none` | Never (background jobs) |
712
+ | `announce` | Agent jobs: delivers when run status is not `ok`. Shell jobs: non-zero exit only. Silently skipped for `main` session jobs (use `announce-always` instead) |
713
+ | `announce-always` | Always delivers output (LLM or shell), including `main` session jobs |
714
+
715
+ > **Note:** delivery is suppressed if `delivery_channel` or `delivery_to` is missing, regardless of `delivery_mode`. An `@alias` in `delivery_to` supplies both; see [Delivery Aliases](#delivery-aliases).
716
+ >
717
+ > Examples in this document use Telegram for `delivery_channel` since it is the
718
+ > most common configuration. Replace it with your channel of choice.
719
+
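The delivery-mode table and note above reduce to a small predicate. This sketch is hypothetical: `shouldDeliver` is not a scheduler export. It assumes any `@alias` has already been resolved into a channel and target. The `run` shape with `status` and `exitCode` is also an assumption.

```javascript
// Hypothetical encoding of the delivery-mode table; not the scheduler's code.
// Assumes any "@alias" in delivery_to was already resolved to a channel and target.
// run: { status: "ok" | "error" | ..., exitCode: number | null } (exitCode used for shell jobs)
function shouldDeliver(job, run) {
  // Missing channel or target suppresses delivery regardless of mode.
  if (!job.delivery_channel || !job.delivery_to) return false;

  switch (job.delivery_mode) {
    case "announce-always":
      return true;
    case "announce":
      if (job.session_target === "main") return false; // silently skipped for main-session jobs
      if (job.session_target === "shell") return run.exitCode !== 0;
      return run.status !== "ok"; // agent jobs
    default: // "none" or unset
      return false;
  }
}

const target = { delivery_channel: "telegram", delivery_to: "YOUR_CHAT_ID" };
console.log(shouldDeliver({ ...target, delivery_mode: "announce", session_target: "shell" }, { exitCode: 0 })); // false
console.log(shouldDeliver({ ...target, delivery_mode: "announce", session_target: "shell" }, { exitCode: 1 })); // true
console.log(shouldDeliver({ delivery_mode: "announce-always", session_target: "isolated" }, { status: "ok" })); // false
```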
720
+ ---
721
+
722
+ ## Delivery Aliases
723
+
724
+ Delivery aliases let you define named delivery targets (e.g., `@my_team`) instead of hard-coding channel/target pairs in every job.
725
+
726
+ ```bash
727
+ # Create a named alias
728
+ openclaw-scheduler alias add my_team telegram -100200000000
729
+
730
+ # Use @alias in job (resolves at dispatch time)
731
+ openclaw-scheduler jobs add '{
732
+ "name": "Alert",
733
+ "delivery_mode": "announce",
734
+ "delivery_to": "@my_team",
735
+ ...
736
+ }'
737
+
738
+ # List aliases
739
+ openclaw-scheduler alias list
740
+
741
+ # Remove an alias
742
+ openclaw-scheduler alias remove my_team
743
+ ```
744
+
745
+ Aliases are resolved at dispatch time. If an alias is deleted, jobs that still reference it fall back to suppressed delivery (no output is sent).
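Dispatch-time resolution might look like the following sketch, where a `Map` stands in for the `delivery_aliases` table. The `resolveDelivery` helper is hypothetical. The rule that an alias supplies both channel and target, ignoring any `delivery_channel` on the job, is an assumption based on aliases being stored as channel/target pairs.

```typescript
interface DeliveryTarget {
  channel: string;
  to: string;
}

// In-memory stand-in for the delivery_aliases table: alias name -> target.
type AliasTable = Map<string, DeliveryTarget>;

// Resolve a job's delivery fields at dispatch time. A delivery_to of
// "@name" is looked up in the alias table (the alias supplies both channel
// and target); any other value is paired with the job's own delivery_channel.
// Returns null when delivery should be suppressed.
function resolveDelivery(channel: string | null, to: string | null, aliases: AliasTable): DeliveryTarget | null {
  if (to !== null && to.charAt(0) === "@") {
    const hit = aliases.get(to.slice(1));
    return hit ? { channel: hit.channel, to: hit.to } : null; // deleted alias -> suppressed
  }
  if (!channel || !to) return null;
  return { channel, to };
}
```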
746
+
747
+ ---
748
+
749
+ ## Shell Jobs
750
+
751
+ Shell jobs run a command directly on the host — no gateway or LLM required. Ideal for backups, scripts, maintenance tasks, and anything that doesn't need AI.
752
+
753
+ ```bash
754
+ openclaw-scheduler jobs add '{
755
+ "name": "Hourly Backup",
756
+ "schedule_cron": "0 * * * *",
757
+ "schedule_tz": "America/New_York",
758
+ "session_target": "shell",
759
+ "payload_message": "/path/to/backup.sh",
760
+ "delivery_mode": "announce",
761
+ "delivery_channel": "telegram",
762
+ "delivery_to": "YOUR_CHAT_ID",
763
+ "run_timeout_ms": 600000,
764
+ "origin": "system"
765
+ }'
766
+ ```
767
+
768
+ **Key properties:**
769
+ - **No gateway dependency** — runs even when gateway is down
770
+ - `payload_message` is the command to execute (shell string passed to the configured shell)
771
+ - Output captured up to 1MB, with preview/offload budgets to keep large output out of the main run row
772
+ - Shell runs persist structured failure context on `runs`: `shell_exit_code`, `shell_signal`, `shell_timed_out`, `shell_stdout`, `shell_stderr`, plus optional `shell_stdout_path` / `shell_stderr_path` when large output is offloaded
773
+ - Failure-triggered agent children receive shell context with separate exit code, stdout, and stderr blocks
774
+ - `run_timeout_ms` controls max execution time (required, no default)
775
+ - Workflow chains work the same way — shell jobs can trigger children on success/failure
776
+ - Shell jobs honor `max_retries` before failure children fire, the same as isolated agent jobs
777
+ - `openclaw-scheduler runs output <run-id> stdout|stderr` retrieves stored or offloaded shell output on demand
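The shell flow diagram maps exit 0 to `ok` and anything else to `error`; the `shell_timed_out` column and the `timeout` run status suggest a third branch. A minimal, hypothetical sketch of that mapping, assuming a timeout takes precedence over the exit code and that `shell_timed_out` is stored as 0/1:

```typescript
// Hypothetical summary of a finished shell process.
interface ShellResult {
  exitCode: number | null; // null when the process was killed (signal or timeout)
  signal: string | null;   // e.g. "SIGTERM"
  timedOut: boolean;       // exceeded run_timeout_ms
}

type ShellRunStatus = "ok" | "error" | "timeout";

interface ShellRunFields {
  status: ShellRunStatus;
  shell_exit_code: number | null;
  shell_signal: string | null;
  shell_timed_out: 0 | 1; // SQLite-style boolean (assumption)
}

// Map a shell process outcome onto a run status plus the structured
// failure columns stored on `runs`.
function finishShellRun(r: ShellResult): ShellRunFields {
  let status: ShellRunStatus;
  if (r.timedOut) status = "timeout";
  else if (r.exitCode === 0) status = "ok";
  else status = "error";
  return {
    status,
    shell_exit_code: r.exitCode,
    shell_signal: r.signal,
    shell_timed_out: r.timedOut ? 1 : 0,
  };
}
```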
778
+
779
+ **With environment variables:**
780
+ ```bash
781
+ openclaw-scheduler jobs add '{
782
+ "name": "DB Dump",
783
+ "schedule_cron": "0 3 * * *",
784
+ "session_target": "shell",
785
+ "payload_message": "PGPASSWORD=secret pg_dump mydb > /backups/mydb.sql && echo OK",
786
+ "delivery_mode": "announce-always",
787
+ "delivery_channel": "telegram",
788
+ "delivery_to": "YOUR_CHAT_ID",
789
+ "run_timeout_ms": 600000,
790
+ "origin": "system"
791
+ }'
792
+ ```
793
+
794
+ ---
795
+
796
+ ## HITL Approval Gates
797
+
798
+ Jobs with `approval_required: 1` pause before each chain-triggered execution and wait for a human to approve or reject.
799
+
800
+ ```bash
801
+ # Job that requires operator approval before each chain-triggered execution
802
+ openclaw-scheduler jobs add '{
803
+ "name": "Deploy to Prod",
804
+ "parent_id": "<build-job-id>",
805
+ "trigger_on": "success",
806
+ "approval_required": 1,
807
+ "approval_timeout_s": 3600,
808
+ "approval_auto": "reject",
809
+ "payload_message": "Deploy the application to production",
810
+ "run_timeout_ms": 300000
811
+ }'
812
+ ```
813
+
814
+ When triggered, the job creator receives: `⚠️ Job 'Deploy to Prod' requires approval.`
815
+
816
+ ```bash
817
+ openclaw-scheduler jobs approve <job-id>
818
+ openclaw-scheduler jobs reject <job-id> "Postponing — too late in the day"
819
+ openclaw-scheduler approvals list
820
+ ```
821
+
822
+ **Key notes:**
823
+ - Approval gates only apply to **chain-triggered** jobs (`parent_id` set)
824
+ - Cron-scheduled jobs always dispatch without waiting for approval
825
+ - `approval_timeout_s` — auto-resolve timeout (seconds)
826
+ - `approval_auto` — `"approve"` or `"reject"` — what happens on timeout
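Timeout handling for a pending gate can be sketched as a per-tick check. The `checkGateTimeout` name and `PendingGate` shape are hypothetical, and two behaviors are assumptions: a gate without `approval_timeout_s` waits indefinitely, and an unset `approval_auto` rejects on timeout.

```typescript
type ApprovalDecision = "wait" | "approve" | "reject";

interface PendingGate {
  requestedAtMs: number; // when the gate opened
  approval_timeout_s: number | null;
  approval_auto: "approve" | "reject" | null;
}

// On each tick, decide whether a pending gate should auto-resolve.
// Manual `jobs approve` / `jobs reject` commands are handled separately.
function checkGateTimeout(g: PendingGate, nowMs: number): ApprovalDecision {
  if (g.approval_timeout_s === null) return "wait"; // assumption: no timeout -> wait for a human
  if (nowMs < g.requestedAtMs + g.approval_timeout_s * 1000) return "wait";
  return g.approval_auto ?? "reject"; // assumption: unset approval_auto fails closed
}
```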
827
+
828
+ ---
829
+
830
+ ## Idempotency
831
+
832
+ Control what happens when the dispatcher crashes mid-run.
833
+
834
+ ```bash
835
+ # Enable at-least-once: crashed runs replay on next startup
836
+ openclaw-scheduler jobs update <id> '{"delivery_guarantee":"at-least-once"}'
837
+
838
+ # Default (at-most-once): no replay
839
+ openclaw-scheduler jobs update <id> '{"delivery_guarantee":"at-most-once"}'
840
+ ```
841
+
842
+ **How it works:**
843
+ - **`at-most-once`** (default): if the dispatcher crashes mid-run, the run is marked `crashed` and the schedule advances normally. The run is not replayed.
844
+ - **`at-least-once`**: on startup, any run left in `running` by a crashed dispatcher is replayed as a new run, whose `replay_of` field records the original run ID for lineage.
845
+
846
+ **Idempotent agents** can return `IDEMPOTENT_SKIP` in their response to acknowledge they've already processed this execution (detected via the idempotency ledger).
847
+
848
+ The ledger also prevents double-dispatch in concurrent tick scenarios — each run acquires a lock before dispatch.
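Both recovery paths and the ledger's claim-once behavior can be illustrated with an in-memory sketch. Everything here is hypothetical (`planRecovery`, `IdempotencyLedger`, the action shapes). It also assumes that an at-least-once orphan is marked `crashed` as well as replayed. The real ledger lives in SQLite, where an atomic insert or transaction would be needed for the same first-claim-wins guarantee across processes.

```typescript
type Guarantee = "at-most-once" | "at-least-once";

interface OrphanRun {
  id: string;
  job_id: string;
}

type RecoveryAction =
  | { kind: "mark-crashed"; runId: string }
  | { kind: "replay"; runId: string; replay_of: string };

// On startup, plan what to do with runs a crashed dispatcher left in `running`.
// Assumption: every orphan is marked crashed, and at-least-once jobs also get
// a fresh replay run whose replay_of points at the orphan.
function planRecovery(orphans: OrphanRun[], guaranteeFor: (jobId: string) => Guarantee, newRunId: () => string): RecoveryAction[] {
  const actions: RecoveryAction[] = [];
  orphans.forEach((run) => {
    actions.push({ kind: "mark-crashed", runId: run.id });
    if (guaranteeFor(run.job_id) === "at-least-once") {
      actions.push({ kind: "replay", runId: newRunId(), replay_of: run.id });
    }
  });
  return actions;
}

// In-memory stand-in for the idempotency ledger: the first claim of a key
// wins, so two overlapping ticks cannot both dispatch the same run.
class IdempotencyLedger {
  private claimed = new Set<string>();

  claim(key: string): boolean {
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }

  release(key: string): void {
    this.claimed.delete(key);
  }
}
```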
849
+
850
+ ---
851
+
852
+ ## Context Retrieval
853
+
854
+ Inject prior run summaries into a job's prompt so the agent has awareness of recent outcomes.
855
+
856
+ ```bash
857
+ # Inject last 3 run summaries into job prompt
858
+ openclaw-scheduler jobs update <id> '{"context_retrieval":"recent","context_retrieval_limit":3}'
859
+
860
+ # Hybrid: recent runs + TF-IDF search for semantically relevant summaries
861
+ openclaw-scheduler jobs update <id> '{"context_retrieval":"hybrid","context_retrieval_limit":5}'
862
+ ```
863
+
864
+ **Modes:**
865
+ | Mode | Description |
866
+ |------|-------------|
867
+ | `none` | No context injected (default) |
868
+ | `recent` | Last N run summaries, newest first |
869
+ | `hybrid` | Recent runs + TF-IDF similarity search against all prior summaries |
870
+
871
+ Useful for health check jobs that should know about yesterday's failures, or audit jobs that build incrementally on prior work.
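To make `hybrid` concrete, here is a compact TF-IDF ranker. It sketches the general technique (term frequency times smoothed inverse document frequency, compared by cosine similarity) and is not the scheduler's implementation. The tokenizer, the smoothing formula, the query string (for example the job's `payload_message`), and the `recentCount` split between recent and similar summaries are all assumptions.

```typescript
interface RunSummary {
  runId: string;
  finishedAt: number; // epoch ms
  summary: string;
}

// Lowercase word tokens; a deliberately simple tokenizer.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// TF-IDF vector per document, with smoothed idf = ln((1 + n) / (1 + df)) + 1.
function tfidfVectors(docs: string[][]): Map<string, number>[] {
  const df = new Map<string, number>();
  docs.forEach((doc) => new Set(doc).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1)));
  const n = docs.length;
  return docs.map((doc) => {
    const tf = new Map<string, number>();
    doc.forEach((t) => tf.set(t, (tf.get(t) ?? 0) + 1));
    const vec = new Map<string, number>();
    tf.forEach((count, t) => {
      const idf = Math.log((1 + n) / (1 + (df.get(t) ?? 0))) + 1;
      vec.set(t, (count / doc.length) * idf);
    });
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((v, t) => {
    normA += v * v;
    dot += v * (b.get(t) ?? 0);
  });
  b.forEach((v) => {
    normB += v * v;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// "hybrid": the newest `recentCount` summaries, then remaining slots filled
// with the older summaries most similar to `query` (zero-score ones skipped).
function hybridContext(query: string, history: RunSummary[], limit: number, recentCount: number): RunSummary[] {
  const newestFirst = history.slice().sort((x, y) => y.finishedAt - x.finishedAt);
  const picked = newestFirst.slice(0, Math.min(recentCount, limit));
  const rest = newestFirst.slice(picked.length);
  const vecs = tfidfVectors([tokenize(query)].concat(rest.map((r) => tokenize(r.summary))));
  const scored = rest.map((r, i) => ({ r, score: cosine(vecs[0], vecs[i + 1]) }));
  scored.sort((x, y) => y.score - x.score);
  scored.forEach((s) => {
    if (picked.length < limit && s.score > 0) picked.push(s.r);
  });
  return picked;
}
```

Splitting the budget this way keeps the newest outcomes in the prompt while similarity search fills the remaining slots with older, relevant summaries.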
872
+
873
+ ---
874
+
875
+ ## Task Tracker
876
+
877
+ The task tracker provides a dead man's switch for coordinating teams of sub-agents. Create a tracker, list the expected agents, and receive a summary when all agents complete (or time out).
878
+
879
+ ```bash
880
+ # Create a task group to monitor N sub-agents
881
+ openclaw-scheduler tasks create '{
882
+ "name": "v2-release-team",
883
+ "expected_agents": ["schema-agent","frontend-agent","docs-agent"],
884
+ "timeout_s": 1800,
885
+ "delivery_channel": "telegram",
886
+ "delivery_to": "YOUR_CHAT_ID"
887
+ }'
888
+
889
+ # Monitor
890
+ openclaw-scheduler tasks list
891
+ openclaw-scheduler tasks status <tracker-id>
892
+ ```
893
+
894
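+ Each agent in the team must send heartbeat updates (`openclaw-scheduler tasks heartbeat <id> <label> running|completed|failed [msg]`), or register its session with `openclaw-scheduler tasks register-session` to enable automatic heartbeats. If an agent goes silent past its timeout, it is declared dead. When all agents complete or time out, a summary is delivered to the configured channel.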
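The dead man's switch can be sketched as a pure evaluation pass run on each check. The types and function names are hypothetical, and treating the group's `timeout_s` as the maximum silence allowed between an agent's heartbeats is an assumption.

```typescript
type TrackedState = "running" | "completed" | "failed" | "dead";

interface TrackedAgent {
  label: string;
  state: TrackedState;
  lastHeartbeatMs: number;
}

// One pass of the dead man's switch: a still-running agent whose last
// heartbeat is older than timeoutS is declared dead.
function evaluateTracker(agents: TrackedAgent[], timeoutS: number, nowMs: number): { agents: TrackedAgent[]; done: boolean } {
  const updated = agents.map((a): TrackedAgent =>
    a.state === "running" && nowMs - a.lastHeartbeatMs > timeoutS * 1000
      ? { label: a.label, state: "dead", lastHeartbeatMs: a.lastHeartbeatMs }
      : a,
  );
  // The group is finished once no agent is still running.
  const done = updated.every((a) => a.state !== "running");
  return { agents: updated, done };
}

// Hypothetical summary line delivered when the group finishes.
function summarizeTracker(agents: TrackedAgent[]): string {
  return agents.map((a) => `${a.label}: ${a.state}`).join(", ");
}
```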
895
+
896
+ ---
897
+
898
+ ## Resource Pools
899
+
900
+ Prevent concurrent execution across different jobs that share a resource.
901
+
902
+ ```bash
903
+ # Two jobs that must not run concurrently
904
+ openclaw-scheduler jobs add '{"name":"DB Migration","resource_pool":"database",...}'
905
+ openclaw-scheduler jobs add '{"name":"DB Backup","resource_pool":"database",...}'
906
+ ```
907
+
908
+ If one job in a pool is currently running, all other pool members skip their tick (same behavior as `overlap_policy: 'skip'`, but cross-job rather than per-job). Pool membership is set via the `resource_pool` string field.
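The pool check amounts to one predicate over currently running jobs. A minimal sketch with hypothetical names, leaving same-job overlap to `overlap_policy` as described above:

```typescript
interface PoolJob {
  id: string;
  resource_pool: string | null;
}

// False when a different job in the same resource_pool has a running run;
// the caller then skips this tick. Same-job overlap is left to overlap_policy.
function poolAllowsDispatch(job: PoolJob, runningJobs: PoolJob[]): boolean {
  if (!job.resource_pool) return true; // not pooled
  return !runningJobs.some((r) => r.id !== job.id && r.resource_pool === job.resource_pool);
}
```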
909
+
910
+ ---
911
+
912
+ ## Workflow Chains
913
+
914
+ Jobs can be linked into parent → child chains. When a parent completes, its children fire automatically.
915
+
916
+ ### Pattern 1: Chained Jobs
917
+
918
+ ```bash
919
+ # Parent: runs on cron
920
+ openclaw-scheduler jobs add '{
921
+ "name": "Build App",
922
+ "schedule_cron": "0 10 * * *",
923
+ "payload_message": "Build the application",
924
+ "run_timeout_ms": 300000,
925
+ "origin": "system"
926
+ }'
927
+ # → id: "abc123..."
928
+
929
+ # Child: fires when parent succeeds
930
+ openclaw-scheduler jobs add '{
931
+ "name": "Deploy App",
932
+ "payload_message": "Deploy to production",
933
+ "parent_id": "abc123...",
934
+ "trigger_on": "success",
935
+ "run_timeout_ms": 300000
936
+ }'
937
+
938
+ # Child: fires when parent fails
939
+ openclaw-scheduler jobs add '{
940
+ "name": "Build Alert",
941
+ "payload_message": "Build failed -- check logs",
942
+ "parent_id": "abc123...",
943
+ "trigger_on": "failure",
944
+ "delivery_mode": "announce",
945
+ "delivery_channel": "telegram",
+ "delivery_to": "YOUR_CHAT_ID",
946
+ "run_timeout_ms": 300000
947
+ }'
948
+ ```
949
+
950
+ **Trigger types:**
951
+ - `success` — parent run status = `ok`
952
+ - `failure` — parent run status = `error` or `timeout`
953
+ - `complete` — any completion (success, failure, or timeout)
954
+ - Child jobs are chain-triggered only. Use `trigger_delay_s` to delay a child run; one-shot `schedule_kind: "at"` is for root jobs only.
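The trigger rules can be expressed as small predicates. This sketch is hypothetical. It assumes that statuses other than `ok`, `error`, and `timeout` (such as `cancelled` or `crashed`) fire no children. It also holds back `complete` children while a failed parent still has retries left; the Retry Logic section documents that behavior only for `failure` children.

```typescript
type TriggerOn = "success" | "failure" | "complete";

// Does a finished parent run satisfy the child's trigger_on?
function triggerMatches(triggerOn: TriggerOn, parentStatus: string): boolean {
  const succeeded = parentStatus === "ok";
  const failed = parentStatus === "error" || parentStatus === "timeout";
  if (triggerOn === "success") return succeeded;
  if (triggerOn === "failure") return failed;
  return succeeded || failed; // "complete"
}

// A failed parent with retries left fires no children yet.
function shouldFireChild(triggerOn: TriggerOn, parentStatus: string, retriesRemaining: number): boolean {
  const isFailure = parentStatus === "error" || parentStatus === "timeout";
  if (isFailure && retriesRemaining > 0) return false;
  return triggerMatches(triggerOn, parentStatus);
}

// Earliest time the child may start, honoring trigger_delay_s.
function childRunAtMs(parentFinishedMs: number, triggerDelayS: number | null): number {
  return parentFinishedMs + (triggerDelayS ?? 0) * 1000;
}
```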
955
+
956
+ ### Pattern 2: Output-Based Trigger Conditions
957
+
958
+ ```bash
959
+ # Only fire child if parent output contains "ALERT"
960
+ openclaw-scheduler jobs add '{
961
+ "name": "Alert Handler",
962
+ "parent_id": "<monitor-job-id>",
963
+ "trigger_on": "success",
964
+ "trigger_condition": "contains:ALERT",
965
+ "payload_message": "Handle the alert",
966
+ "run_timeout_ms": 300000
967
+ }'
968
+
969
+ # Regex condition
970
+ openclaw-scheduler jobs add '{
971
+ "name": "Critical Error Handler",
972
+ "parent_id": "<monitor-job-id>",
973
+ "trigger_on": "success",
974
+ "trigger_condition": "regex:ERROR.*critical",
975
+ "payload_message": "Handle critical error",
976
+ "run_timeout_ms": 300000
977
+ }'
978
+ ```
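One way such condition strings might be evaluated is shown below (hypothetical `conditionMatches` helper). Assumptions: matching is case-sensitive, only the first `:` separates the prefix from its argument, an empty condition always passes, and an unknown prefix or invalid regex fails closed so the child does not fire.

```typescript
// Evaluate a trigger_condition such as "contains:ALERT" or
// "regex:ERROR.*critical" against the parent's output.
function conditionMatches(condition: string | null, parentOutput: string): boolean {
  if (!condition) return true; // no condition: always fire
  const sep = condition.indexOf(":");
  if (sep < 0) return false;
  const prefix = condition.slice(0, sep);
  const arg = condition.slice(sep + 1);
  if (prefix === "contains") return parentOutput.indexOf(arg) >= 0;
  if (prefix === "regex") {
    try {
      return new RegExp(arg).test(parentOutput);
    } catch {
      return false; // invalid pattern fails closed
    }
  }
  return false; // unknown prefix fails closed
}
```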
979
+
980
+ ### Pattern 3: Multi-Agent Workflows
981
+
982
+ Chain jobs targeting different agents:
983
+
984
+ ```
985
+ Build (agent: main, cron: 10am)
986
+ └─ Deploy (agent: ops, trigger: success)
987
+ └─ Health Check (agent: main, trigger: success, delay: 60s)
988
+ ```
989
+
990
+ ```bash
991
+ openclaw-scheduler jobs add '{
992
+ "name": "Deploy",
993
+ "payload_message": "deploy",
994
+ "agent_id": "ops",
995
+ "parent_id": "<build-id>",
996
+ "trigger_on": "success",
997
+ "run_timeout_ms": 300000
998
+ }'
999
+ ```
1000
+
1001
+ ### Pattern 4: Delayed Triggers
1002
+
1003
+ ```bash
1004
+ openclaw-scheduler jobs add '{
1005
+ "name": "Post-Deploy Check",
1006
+ "payload_message": "Verify services healthy",
1007
+ "parent_id": "<deploy-id>",
1008
+ "trigger_on": "success",
1009
+ "trigger_delay_s": 60,
1010
+ "run_timeout_ms": 300000
1011
+ }'
1012
+ ```
1013
+
1014
+ ### Pattern 5: Runtime Spawning
1015
+
1016
+ A running agent can create new jobs on the fly by sending a `spawn` message:
1017
+
1018
+ ```json
1019
+ {
1020
+ "from_agent": "main",
1021
+ "to_agent": "scheduler",
1022
+ "kind": "spawn",
1023
+ "body": "{\"name\":\"Dynamic Task\",\"payload_message\":\"analyze results\",\"delete_after_run\":true,\"run_now\":true}"
1024
+ }
1025
+ ```
1026
+
1027
+ ### Visualizing Chains
1028
+
1029
+ ```bash
1030
+ openclaw-scheduler jobs tree
1031
+
1032
+ # Output (all root jobs and their chains):
1033
+ # Build App
1034
+ # └─ Deploy App [→success] (agent:ops)
1035
+ # └─ Build Alert [→failure]
1036
+ # └─ Health Check [→complete +60s]
1037
+ ```
1038
+
1039
+ ---
1040
+
1041
+ ## Retry Logic
1042
+
1043
+ Jobs can auto-retry before declaring failure and triggering failure children.
1044
+
1045
+ ```bash
1046
+ openclaw-scheduler jobs add '{
1047
+ "name": "Flaky Deploy",
1048
+ "schedule_cron": "0 10 * * *",
1049
+ "payload_message": "deploy to prod",
1050
+ "max_retries": 3,
1051
+ "run_timeout_ms": 300000,
1052
+ "origin": "system"
1053
+ }'
1054
+ ```
1055
+
1056
+ **How it works:**
1057
+
1058
+ 1. Job fails → check `max_retries`
1059
+ 2. Retries remaining → schedule retry with exponential backoff (30s, 60s, 120s, ...)
1060
+ 3. Retry run tracks lineage: `retry_of` → failed run ID, `retry_count` incremented
1061
+ 4. All retries exhausted → trigger failure children + apply error backoff
1062
+ 5. Any retry succeeds → trigger success children, reset `consecutive_errors`
1063
+
1064
+ **Key:** failure children don't fire until all retries are exhausted. This prevents false alerts on transient failures.
1065
+
1066
+ The same retry ladder applies to shell jobs, isolated agent jobs, and main-session jobs that surface dispatch failures.
1067
+
1068
+ | Field | Default | Description |
1069
+ |-------|---------|-------------|
1070
+ | `max_retries` | 0 | Max retry attempts (0 = no retry) |
1071
+ | `runs.retry_of` | null | ID of the failed run being retried |
1072
+ | `runs.retry_count` | 0 | Which attempt this is (0 = first try) |
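The retry ladder can be sketched as follows. The `planRetry` helper is hypothetical. The 30s base, the doubling, and the `retry_of`/`retry_count` lineage come from the steps above; measuring the delay from the failure time is an assumption. With `max_retries: 3`, the retries run at +30s, +60s, and +120s, and the failure of the third retry returns `null`.

```typescript
const RETRY_BASE_MS = 30000; // first retry waits 30s

// Delay before the next attempt: 30s, 60s, 120s, ... (doubling).
// `retryCount` is the retry_count of the run that just failed (0 = first try).
function retryDelayMs(retryCount: number): number {
  return RETRY_BASE_MS * Math.pow(2, retryCount);
}

interface FailedRun {
  id: string;
  retry_count: number;
}

interface RetryPlan {
  retry_of: string;    // the failed run being retried
  retry_count: number; // attempt number of the new run
  runAtMs: number;
}

// Returns the next attempt, or null once retries are exhausted
// (the point where failure children fire and error backoff applies).
function planRetry(run: FailedRun, maxRetries: number, failedAtMs: number): RetryPlan | null {
  if (run.retry_count >= maxRetries) return null;
  return {
    retry_of: run.id,
    retry_count: run.retry_count + 1,
    runAtMs: failedAtMs + retryDelayMs(run.retry_count),
  };
}
```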
1073
+
1074
+ ---
1075
+
1076
+ ## Chain Safety
1077
+
1078
+ ### Max Chain Depth
1079
+
1080
+ `MAX_CHAIN_DEPTH = 10` — enforced on:
1081
+ - `createJob` — can't add a child deeper than 10 levels
1082
+ - `updateJob` — can't move a job to create a chain deeper than 10
1083
+ - `triggerChildren` — runtime safeguard stops dispatch at depth 10
1084
+
1085
+ ### Cycle Detection
1086
+
1087
+ `detectCycle()` walks up the parent chain on both create and update. Catches:
1088
+ - Self-referential: A → A
1089
+ - Deep cycles: A → B → C → A
1090
+ - Throws with a descriptive error message
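Both create/update checks can be sketched over an in-memory parent map. The helper names are hypothetical; they are not the scheduler's `detectCycle()`. Counting root jobs as depth 0, so that depth 10 is allowed and depth 11 is rejected, is an assumption about how "deeper than 10 levels" is measured.

```typescript
const MAX_CHAIN_DEPTH = 10;

// job id -> parent_id (null for root jobs)
type ParentMap = Map<string, string | null>;

// Would giving `jobId` the parent `newParentId` create a cycle?
// Walk up from the proposed parent; meeting jobId (or any repeat) means a loop.
function wouldCreateCycle(parentOf: ParentMap, jobId: string, newParentId: string | null): boolean {
  const seen = new Set<string>();
  let cur = newParentId;
  while (cur !== null) {
    if (cur === jobId || seen.has(cur)) return true;
    seen.add(cur);
    cur = parentOf.get(cur) ?? null;
  }
  return false;
}

// Depth a job would have under `newParentId` (root jobs are depth 0).
// Bounded so a malformed chain cannot loop forever.
function depthUnder(parentOf: ParentMap, newParentId: string | null): number {
  let depth = 0;
  let cur = newParentId;
  while (cur !== null && depth <= MAX_CHAIN_DEPTH) {
    depth++;
    cur = parentOf.get(cur) ?? null;
  }
  return depth;
}

function validateParent(parentOf: ParentMap, jobId: string, newParentId: string | null): void {
  if (wouldCreateCycle(parentOf, jobId, newParentId)) {
    throw new Error(`cycle: making ${newParentId} the parent of ${jobId} would create a loop`);
  }
  const depth = depthUnder(parentOf, newParentId);
  if (depth > MAX_CHAIN_DEPTH) {
    throw new Error(`chain too deep: ${jobId} would be at depth ${depth} (max ${MAX_CHAIN_DEPTH})`);
  }
}
```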
1091
+
1092
+ ### Chain Cancellation
1093
+
1094
+ ```bash
1095
+ openclaw-scheduler jobs cancel <job-id>
1096
+ # Cancels all running runs for this job + every descendant
1097
+ ```
1098
+
1099
+ Sets `status = 'cancelled'` on all running runs in the chain. No-op on finished runs.
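Cancelling a chain means collecting the job's descendants and then touching only runs that are still `running`. A pure, hypothetical sketch using a child-adjacency map built from `parent_id` links:

```typescript
interface ChainRun {
  id: string;
  job_id: string;
  status: string;
}

// parent job id -> child job ids (built from parent_id links)
type ChildrenMap = Map<string, string[]>;

// The job plus every descendant, collected breadth-first.
function chainJobIds(rootId: string, childrenOf: ChildrenMap): Set<string> {
  const found = new Set<string>([rootId]);
  const queue: string[] = [rootId];
  while (queue.length > 0) {
    const cur = queue.shift() as string;
    (childrenOf.get(cur) ?? []).forEach((child) => {
      if (!found.has(child)) {
        found.add(child);
        queue.push(child);
      }
    });
  }
  return found;
}

// Cancel running runs anywhere in the chain; finished runs are left alone.
function cancelChain(rootId: string, childrenOf: ChildrenMap, runs: ChainRun[]): ChainRun[] {
  const ids = chainJobIds(rootId, childrenOf);
  return runs.map((r) =>
    ids.has(r.job_id) && r.status === "running" ? { id: r.id, job_id: r.job_id, status: "cancelled" } : r,
  );
}
```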
1100
+
1101
+ ---
1102
+
1103
+ ## Inter-Agent Messaging
1104
+
1105
+ Agents exchange messages through the scheduler's queue.
1106
+
1107
+ ### Features
1108
+ - **Priority:** 0 (normal), 1 (high), 2 (urgent) — inbox sorted by priority then time
1109
+ - **Threading:** `reply_to` links messages into conversations
1110
+ - **Read receipts:** pending → delivered → read (with timestamps)
1111
+ - **Broadcast:** `to_agent = 'broadcast'` reaches all agents
1112
+ - **TTL/Expiry:** `expires_at` auto-expires unread messages
1113
+ - **Metadata:** JSON blob for structured data
1114
+ - **Kinds:** `text`, `task`, `result`, `status`, `system`, `spawn`, `decision`, `constraint`, `fact`, `preference`
1115
+ - **Owner field:** `owner` tracks message originator for audit
1116
+ - **Job linking:** messages can reference `job_id` and `run_id`
1117
+
1118
+ ### Delivery
1119
+
1120
+ Messages are delivered inline with job prompts. When the dispatcher builds a prompt, it includes up to 5 pending messages for the target agent and marks them as `delivered`.
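Combining the ordering rules from the feature list with the 5-message cap gives a selection step like the one below. This is a hypothetical sketch. Breaking priority ties oldest-first is an assumption, as is treating a message as expired once `expires_at` is at or before the current time.

```typescript
interface QueuedMessage {
  id: string;
  priority: number;   // 0 normal, 1 high, 2 urgent
  created_at: number; // epoch ms
  expires_at: number | null;
  status: "pending" | "delivered" | "read";
}

// Choose messages to inline into the next prompt and mark them delivered:
// unexpired pending messages, highest priority first, oldest first within
// a priority, capped at `limit`.
function takeForPrompt(inbox: QueuedMessage[], nowMs: number, limit = 5): QueuedMessage[] {
  const eligible = inbox.filter((m) => m.status === "pending" && (m.expires_at === null || m.expires_at > nowMs));
  eligible.sort((a, b) => b.priority - a.priority || a.created_at - b.created_at);
  const picked = eligible.slice(0, limit);
  picked.forEach((m) => {
    m.status = "delivered";
  });
  return picked;
}
```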
1121
+
1122
+ ### Usage
1123
+
1124
+ ```bash
1125
+ # Send a message
1126
+ openclaw-scheduler msg send <from-agent> <to-agent> "message body"
1127
+
1128
+ # Read inbox
1129
+ openclaw-scheduler msg inbox <agent-id>
1130
+
1131
+ # Mark all read
1132
+ openclaw-scheduler msg readall <agent-id>
1133
+ ```
1134
+
1135
+ ### Signal Queue Consumer Example
1136
+
1137
+ Use this when you want scripts to enqueue only actionable signals, then a single consumer job pushes those signals to Telegram.
1138
+
1139
+ ```bash
1140
+ # 1) Enqueue a signal
1141
+ openclaw-scheduler msg send monitor-agent main "Found 3 critical errors in prod logs"
1142
+
1143
+ # 2) Add a consumer shell job (every 5 minutes)
1144
+ openclaw-scheduler jobs add '{
1145
+ "name": "Inbox Consumer",
1146
+ "schedule_cron": "*/5 * * * *",
1147
+ "session_target": "shell",
1148
+ "payload_message": "npm exec --prefix ~/.openclaw/scheduler openclaw-inbox-consumer -- --to YOUR_CHAT_ID",
1149
+ "delivery_mode": "announce",
1150
+ "delivery_channel": "telegram",
1151
+ "delivery_to": "YOUR_CHAT_ID",
1152
+ "run_timeout_ms": 60000,
1153
+ "origin": "system"
1154
+ }'
1155
+ ```
1156
+
1157
+ ---
1158
+
1159
+ ## Backup & Recovery
1160
+
1161
+ MinIO backups are disabled by default. Set `SCHEDULER_BACKUP=1` (or `true`) to enable them.
1162
+
1163
+ The scheduler can back up its SQLite database to MinIO automatically.
1164
+
1165
+ ```bash
1166
+ # Manual snapshot
1167
+ node backup.js snapshot
1168
+
1169
+ # Manual rollup (hourly aggregate)
1170
+ node backup.js rollup
1171
+
1172
+ # Check backup status
1173
+ node backup.js status
1174
+
1175
+ # Restore from snapshot
1176
+ node backup.js restore
1177
+
1178
+ # Prune old backups
1179
+ node backup.js prune
1180
+ ```
1181
+
1182
+ **Configuration via environment:**
1183
+
1184
+ | Variable | Default | Description |
1185
+ |----------|---------|-------------|
1186
+ | `SCHEDULER_BACKUP_MC_ALIAS` | `backupstore` | MinIO client alias |
1187
+ | `SCHEDULER_BACKUP_BUCKET` | `scheduler-backups` | MinIO bucket name |
1188
+ | `SCHEDULER_BACKUP_PREFIX` | `scheduler` | Path prefix within bucket |
1189
+
1190
+ Requires `mc` (MinIO client) in `PATH`, configured with an alias for your MinIO server (`backupstore` by default; override with `SCHEDULER_BACKUP_MC_ALIAS`).
1191
+
1192
+ **Built-in (when running as a background service):**
1193
+ - Snapshot every 5 minutes (`SCHEDULER_BACKUP_MS`)
1194
+ - Rollup on the first tick of each hour
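The built-in cadence can be sketched as a per-tick decision. The names are hypothetical. Two details are assumptions: `"1"`/`"true"` are matched exactly, and the hour is tracked in UTC epoch hours rather than local time.

```typescript
// SCHEDULER_BACKUP accepts "1" or "true" (exact match assumed here).
function backupEnabled(value: string | undefined): boolean {
  return value === "1" || value === "true";
}

interface BackupState {
  lastSnapshotMs: number | null;
  lastRollupHour: number | null; // hours since the Unix epoch (UTC) of the last rollup
}

// Which backup actions a tick should perform: a snapshot once per interval
// (SCHEDULER_BACKUP_MS, 5 min default) and a rollup on the first tick of each hour.
function backupActions(state: BackupState, nowMs: number, snapshotEveryMs = 300000): { snapshot: boolean; rollup: boolean } {
  const snapshot = state.lastSnapshotMs === null || nowMs - state.lastSnapshotMs >= snapshotEveryMs;
  const rollup = state.lastRollupHour !== Math.floor(nowMs / 3600000);
  return { snapshot, rollup };
}
```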
1195
+
1196
+ ---
1197
+
1198
+ ## Agent Registry
1199
+
1200
+ | Operation | Function | Description |
1201
+ |-----------|----------|-------------|
1202
+ | Register | `upsertAgent(id, opts)` | Create or update |
1203
+ | Get | `getAgent(id)` | Fetch by ID |
1204
+ | List | `listAgents()` | All agents |
1205
+ | Set status | `setAgentStatus(id, status, sessionKey)` | idle/busy/offline |
1206
+ | Touch | `touchAgent(id)` | Update last_seen_at |
1207
+
1208
+ The dispatcher automatically manages agent status during dispatch (idle → busy → idle).
1209
+
1210
+ ```bash
1211
+ openclaw-scheduler agents list
1212
+ openclaw-scheduler agents get <id>
1213
+ openclaw-scheduler agents register <id> [name]
1214
+ ```
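The idle → busy → idle cycle is the classic try/finally pattern: the agent must return to idle even when a dispatch throws. A minimal, synchronous sketch against a hypothetical registry interface modeled on `setAgentStatus(id, status, sessionKey)`; clearing the session key on return to idle is an assumption.

```typescript
type AgentStatus = "idle" | "busy" | "offline";

// Hypothetical registry interface modeled on setAgentStatus(id, status, sessionKey).
interface AgentRegistry {
  setAgentStatus(id: string, status: AgentStatus, sessionKey: string | null): void;
}

// Mark the agent busy for the duration of `work` and return it to idle
// afterwards, even if `work` throws. Synchronous for brevity; an async
// dispatch would `await` inside the same try/finally.
function withBusyAgent<T>(reg: AgentRegistry, agentId: string, sessionKey: string, work: () => T): T {
  reg.setAgentStatus(agentId, "busy", sessionKey);
  try {
    return work();
  } finally {
    reg.setAgentStatus(agentId, "idle", null);
  }
}
```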
1215
+
1216
+ ---
1217
+
1218
+ ## Database Schema
1219
+
1220
+ **Schema version:** 23 | **Mode:** WAL | **Foreign keys:** ON
1221
+
1222
+ ### Tables
1223
+
1224
+ | Table | Description |
1225
+ |-------|-------------|
1226
+ | `jobs` | Job definitions (schedule, payload, chain config, delivery) |
1227
+ | `runs` | Execution history (status, timing, summaries, retry lineage) |
1228
+ | `messages` | Inter-agent message queue (priority, TTL, typed) |
1229
+ | `agents` | Agent registry (status, capabilities, last seen) |
1230
+ | `approvals` | HITL gate records (pending/approved/rejected/expired) |
1231
+ | `task_tracker` | Multi-agent task group definitions |
1232
+ | `task_tracker_agents` | Per-agent status within a task group |
1233
+ | `idempotency_ledger` | Dispatch deduplication and at-least-once tracking |
1234
+ | `delivery_aliases` | Named delivery targets (channel + target pairs) |
1235
+ | `job_dispatch_queue` | Pending and delivered dispatch entries per job |
1236
+ | `message_receipts` | Delivery receipt tracking for messages |
1237
+ | `team_tasks` | Team-scoped task definitions and status |
1238
+ | `team_mailbox_events` | Projected events from team mailbox activity |
1239
+ | `schema_migrations` | Baseline schema version log |
1240
+
1241
+ ### Jobs (key columns)
1242
+
1243
+ ```
1244
+ id, name, enabled, schedule_kind, schedule_cron, schedule_at, schedule_tz,
1245
+ session_target, agent_id, payload_kind, payload_message,
1246
+ payload_model, payload_thinking, execution_intent, execution_read_only,
1247
+ overlap_policy, run_timeout_ms, max_queued_dispatches, max_pending_approvals,
1248
+ max_trigger_fanout, output_store_limit_bytes, output_excerpt_limit_bytes,
1249
+ output_summary_limit_bytes, output_offload_threshold_bytes,
1250
+ max_retries, delivery_mode, delivery_channel,
1251
+ delivery_to, delivery_guarantee, delete_after_run, ttl_hours,
1252
+ parent_id, trigger_on, trigger_delay_s, trigger_condition,
1253
+ resource_pool, auth_profile,
1254
+ approval_required, approval_timeout_s, approval_auto,
1255
+ context_retrieval, context_retrieval_limit,
1256
+ preferred_session_key, job_type, watchdog_target_label,
1257
+ watchdog_check_cmd, watchdog_timeout_min, watchdog_alert_channel,
1258
+ watchdog_alert_target, watchdog_self_destruct, watchdog_started_at,
1259
+ next_run_at, last_run_at, last_status, consecutive_errors,
1260
+ created_at, updated_at
1261
+ ```
1262
+
1263
+ ### Runs (key columns)
1264
+
1265
+ ```
1266
+ id, job_id, status, started_at, finished_at, duration_ms,
1267
+ last_heartbeat, session_key, session_id, summary,
1268
+ error_message, shell_exit_code, shell_signal, shell_timed_out,
1269
+ shell_stdout, shell_stderr, shell_stdout_path, shell_stderr_path,
1270
+ shell_stdout_bytes, shell_stderr_bytes, dispatched_at, run_timeout_ms,
1271
+ triggered_by_run, retry_of, retry_count, replay_of
1272
+ ```
1273
+
1274
+ **Run statuses:** `pending`, `running`, `ok`, `error`, `timeout`, `skipped`, `cancelled`, `crashed`, `awaiting_approval`, `approved`
1275
+
1276
+ ### Messages (key columns)
1277
+
1278
+ ```
1279
+ id, from_agent, to_agent, reply_to, kind, subject, body,
1280
+ metadata, priority, channel, owner, status, delivered_at,
1281
+ read_at, expires_at, created_at, job_id, run_id
1282
+ ```
1283
+
1284
+ ### Agents (10 columns)
1285
+
1286
+ ```
1287
+ id, name, status, last_seen_at, session_key, capabilities,
1288
+ delivery_channel, delivery_to, brand_name, created_at
1289
+ ```
1290
+
1291
+ ---
1292
+
1293
+ ## CLI Reference
1294
+
1295
+ ```bash
1296
+ # ── Jobs ──────────────────────────────────────────
1297
+ openclaw-scheduler jobs list # List all (shows agent, parent, trigger; supports --type <type>)
1298
+ openclaw-scheduler jobs get <id> # Full details as JSON
1299
+ openclaw-scheduler jobs add '<json>' # Create a job (supports --dry-run)
1300
+ openclaw-scheduler jobs update <id> '<json>' # Partial update (supports --dry-run)
1301
+ openclaw-scheduler jobs validate '<json>' # Validate a job spec without creating it
1302
+ openclaw-scheduler jobs enable <id>
1303
+ openclaw-scheduler jobs disable <id> # NOTE: one-shot at-jobs with delete_after_run: true are auto-pruned after 24h (ordinary disabled cron jobs are kept indefinitely)
1304
+ openclaw-scheduler jobs delete <id> # Cascades to runs
1305
+ openclaw-scheduler jobs tree # Visual chain hierarchy
1306
+ openclaw-scheduler jobs cancel <id> # Cancel running chain
1307
+
1308
+ # ── Runs ──────────────────────────────────────────
1309
+ openclaw-scheduler runs list <job-id> [limit] # Run history
1310
+ openclaw-scheduler runs get <run-id> # Full run details
1311
+ openclaw-scheduler runs output <run-id> stdout|stderr # Stored/offloaded stdout or stderr
1312
+ openclaw-scheduler runs running # Active runs
1313
+ openclaw-scheduler runs stale [threshold-s] # Stale runs (default 90s)
1314
+
1315
+ # ── Messages ──────────────────────────────────────
1316
+ openclaw-scheduler msg send <from> <to> <body>
1317
+ openclaw-scheduler msg inbox <agent-id> [limit]
1318
+ openclaw-scheduler msg outbox <agent-id> [limit]
1319
+ openclaw-scheduler msg thread <message-id>
1320
+ openclaw-scheduler msg ack <message-id> [actor] [note]
1321
+ openclaw-scheduler msg receipts <message-id> [limit]
1322
+ openclaw-scheduler msg team-inbox <team-id> [limit] [member-id] [task-id]
1323
+ openclaw-scheduler msg read <message-id>
1324
+ openclaw-scheduler msg readall <agent-id>
1325
+ openclaw-scheduler msg unread <agent-id>
1326
+
1327
+ # ── Agents ────────────────────────────────────────
1328
+ openclaw-scheduler agents list
1329
+ openclaw-scheduler agents get <id>
1330
+ openclaw-scheduler agents register <id> [name]
1331
+
1332
+ # ── Approvals ─────────────────────────────────────
1333
+ openclaw-scheduler jobs approve <id> # Approve pending gate
1334
+ openclaw-scheduler jobs reject <id> [reason] # Reject pending gate
1335
+ openclaw-scheduler approvals list # All pending approvals
1336
+
1337
+ # ── Task Tracker ──────────────────────────────────
1338
+ openclaw-scheduler tasks create '<json>' # Create task group
1339
+ openclaw-scheduler tasks list # Active task groups
1340
+ openclaw-scheduler tasks status <id> # Detailed status
1341
+ openclaw-scheduler tasks history [limit] # Recently completed groups
1342
+ openclaw-scheduler tasks heartbeat <id> <label> running|completed|failed [msg]
1343
+ openclaw-scheduler tasks register-session <id> <label> <session-key> # Enable auto-heartbeat
1344
+
1345
+ # ── Queue ─────────────────────────────────────────
1346
+ openclaw-scheduler queue list [agent] [limit] # Pending + delivered messages
1347
+ openclaw-scheduler queue clear [agent] # Mark all messages read
1348
+ openclaw-scheduler queue prune # Prune old messages
1349
+
1350
+ # ── Team Adapter ─────────────────────────────────
1351
+ openclaw-scheduler team map [limit] # Project team messages into events
1352
+ openclaw-scheduler team tasks <team-id> [limit] # List team tasks
1353
+ openclaw-scheduler team events <team-id> [limit] [task-id] # List team events
1354
+ openclaw-scheduler team gate <team-id> <task-id> <members-json> [timeout-s]
1355
+ openclaw-scheduler team check-gates [limit] # Evaluate task gates
1356
+ openclaw-scheduler team ack <message-id> [actor] [note] # Team-aware ACK
1357
+
1358
+ # ── Idempotency ──────────────────────────────────
1359
+ openclaw-scheduler idem status <job-id> # Recent idempotency keys
1360
+ openclaw-scheduler idem check <key> # Check if key is claimed
1361
+ openclaw-scheduler idem release <key> # Manually release a key
1362
+ openclaw-scheduler idem prune # Force prune expired entries
1363
+
1364
+ # ── Delivery Aliases ──────────────────────────────
1365
+ openclaw-scheduler alias list # List all aliases
1366
+ openclaw-scheduler alias add <name> <channel> <target> [description]
1367
+ openclaw-scheduler alias remove <name>
1368
+
1369
+ # ── Schema Introspection ─────────────────────────
1370
+ openclaw-scheduler schema jobs # JSON schema for job fields (types, defaults, enums)
1371
+ openclaw-scheduler schema runs # Run statuses and key fields
1372
+ openclaw-scheduler schema messages # Message kinds and statuses
1373
+ openclaw-scheduler schema approvals # Approval statuses
1374
+ openclaw-scheduler schema dispatches # Dispatch kinds and statuses
1375
+ openclaw-scheduler schema all # Everything
1376
+
1377
+ # ── Status ────────────────────────────────────────
1378
+ openclaw-scheduler status
1379
+ openclaw-scheduler version # Print version (also: --version)
1380
+ ```
1381
+
1382
+ All CLI commands support `--json` for machine-readable output (useful for piping into `jq` or agent toolchains).
1383
+
1384
+ ---
1385
+
1386
+ ## Configuration
1387
+
1388
+ | Variable | Default | Description |
1389
+ |----------|---------|-------------|
1390
+ | `OPENCLAW_GATEWAY_URL` | `http://127.0.0.1:18789` | Gateway endpoint |
1391
+ | `OPENCLAW_GATEWAY_TOKEN` | *(required)* | Gateway auth token |
1392
+ | `OPENCLAW_GATEWAY_TOKEN_PATH` | `~/.openclaw/credentials/.gateway-token` | Path to gateway token file (used when `OPENCLAW_GATEWAY_TOKEN` is not set) |
1393
+ | `SCHEDULER_HOME` | `~/.openclaw/scheduler` | Base dir for scheduler data when installed from npm or when the package dir is not a writable source checkout |
1394
+ | `SCHEDULER_DB` | auto (`./scheduler.db` in a writable source checkout, else `~/.openclaw/scheduler/scheduler.db`) | SQLite database path |
1395
+ | `SCHEDULER_BACKUP_STAGING_DIR` | `~/.openclaw/scheduler/.backup-staging` | Temp folder used by `backup.js` snapshot/restore |
1396
+ | `SCHEDULER_TICK_MS` | `10000` | Tick interval (10s) |
1397
+ | `SCHEDULER_STALE_THRESHOLD_S` | `90` | Stale run threshold |
1398
+ | `SCHEDULER_HEARTBEAT_CHECK_MS` | `30000` | Health check interval |
1399
+ | `SCHEDULER_MESSAGE_DELIVERY_MS` | `15000` | Message + spawn processing interval |
1400
+ | `SCHEDULER_PRUNE_MS` | `3600000` | Prune interval (1 hour) |
1401
+ | `SCHEDULER_BACKUP_MS` | `300000` | MinIO backup interval (5 min) |
1402
+ | `SCHEDULER_BACKUP` | *(unset)* | Set to `"1"` or `"true"` to enable MinIO backups (requires `mc` CLI) |
1403
+ | `SCHEDULER_BACKUP_MC_ALIAS` | `backupstore` | MinIO alias used by `mc` for backup snapshots |
1404
+ | `SCHEDULER_BACKUP_BUCKET` | `scheduler-backups` | MinIO bucket for snapshots |
1405
+ | `SCHEDULER_BACKUP_PREFIX` | `scheduler` | Object prefix inside bucket |
1406
+ | `SCHEDULER_ARTIFACTS_DIR` | `~/.openclaw/scheduler/artifacts` | Directory for offloaded shell stdout/stderr files |
1407
+ | `SCHEDULER_DEBUG` | *(unset)* | `1` for debug logging |
1408
+ | `SCHEDULER_SHELL` | `/bin/zsh` (macOS), `/bin/bash` (Linux/WSL), `cmd.exe` (Windows) | Shell used for shell jobs |
1409
+ | `SCHEDULER_PROVIDER_PATH` | *(unset)* | Directory of provider plugin `*.js` files loaded at startup. High trust boundary -- only point at operator-controlled code. See [gateway contract](docs/gateway-contract.md#local-provider-plugins) |
1410
+ | `DISPATCH_CONFIG_DIR` | `~/.openclaw/dispatch` | Override dispatch config directory (labels.json, config.json) |
1411
+ | `DISPATCH_LABELS_PATH` | *(auto)* | Override path to labels.json for dispatch session tracking |
1412
+ | `DISPATCH_INDEX_PATH` | *(auto)* | Override path to dispatch/index.mjs (used by watcher) |
1413
+ | `DISPATCH_HOST` | `hostname()` | Host identifier sent with dispatch hook events |
1414
+ | `DISPATCH_WEBHOOK_URL` | *(unset)* | Webhook URL for dispatch lifecycle events (hooks.mjs) |
1415
+ | `LOKI_PUSH_URL` | *(unset)* | Loki push endpoint for dispatch event logging (hooks.mjs) |
1416
+ | `TELEGRAM_BOT_TOKEN` | *(unset)* | Bot token for webhook health check utility |
1417
+ | `TELEGRAM_WEBHOOK_URL` | *(unset)* | Expected webhook URL for Telegram webhook check |
1418
+ | `INBOX_AGENT` | `main` | Target agent for inbox-consumer.mjs |
1419
+ | `INBOX_DELIVERY_CHANNEL` | *(unset)* | Delivery channel for inbox-consumer.mjs forwarding |
1420
+ | `INBOX_DELIVERY_TO` | *(unset)* | Delivery target for inbox-consumer.mjs forwarding |
1421
+ | `INBOX_LIMIT` | `10` | Batch size for inbox-consumer.mjs |
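A few of these settings might be read with their documented defaults as shown below. The helper names are hypothetical. Falling back to the default on unparsable values is an assumption, and so is treating every platform other than macOS and Windows as bash.

```typescript
type Env = Record<string, string | undefined>;

// Default shell per platform (Node's process.platform names), per the
// SCHEDULER_SHELL row above.
function defaultShell(platform: string): string {
  if (platform === "darwin") return "/bin/zsh";
  if (platform === "win32") return "cmd.exe";
  return "/bin/bash"; // Linux/WSL (and, by assumption, anything else)
}

function resolveShell(env: Env, platform: string): string {
  return env["SCHEDULER_SHELL"] || defaultShell(platform);
}

// Read a positive integer setting; unset or invalid values fall back to
// the documented default.
function intSetting(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function loadTimings(env: Env) {
  return {
    tickMs: intSetting(env, "SCHEDULER_TICK_MS", 10000),
    staleThresholdS: intSetting(env, "SCHEDULER_STALE_THRESHOLD_S", 90),
    heartbeatCheckMs: intSetting(env, "SCHEDULER_HEARTBEAT_CHECK_MS", 30000),
    messageDeliveryMs: intSetting(env, "SCHEDULER_MESSAGE_DELIVERY_MS", 15000),
    pruneMs: intSetting(env, "SCHEDULER_PRUNE_MS", 3600000),
    backupMs: intSetting(env, "SCHEDULER_BACKUP_MS", 300000),
  };
}
```

In a Node process these would be called as `loadTimings(process.env)` and `resolveShell(process.env, process.platform)`.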
1422
+
1423
+ ---
1424
+
1425
+ Provider-backed identity / authorization / proof behavior, including
1426
+ `authorization_ref` fail-closed semantics, is documented in the
1427
+ [gateway contract](docs/gateway-contract.md#dispatch-time-authorization-evaluation).
1428
+
1429
+ ---
1430
+
1431
+ ## Service Management
1432
+
1433
+ > **Platform note:** The commands below are for macOS (launchd). For Linux, see [INSTALL-LINUX.md](INSTALL-LINUX.md). For Windows, see [INSTALL-WINDOWS.md](INSTALL-WINDOWS.md).
1434
+
1435
+ Choose the launchd mode that matches your host:
1436
+ - **LaunchAgent**: best for a personal Mac that auto-logs in and should run the scheduler in your user session
1437
+ - **LaunchDaemon**: best for a headless Mac or for starting the scheduler before login
1438
+
1439
+ Install either mode with the setup wizard:
1440
+
1441
+ ```bash
1442
+ openclaw-scheduler setup --service-mode agent
1443
+ # or
1444
+ openclaw-scheduler setup --service-mode daemon
1445
+ ```
1446
+
1447
+ ### macOS LaunchAgent
1448
+
1449
+ ```bash
1450
+ # Start / bootstrap
1451
+ launchctl bootstrap gui/$UID ~/Library/LaunchAgents/ai.openclaw.scheduler.plist
1452
+
1453
+ # Stop
1454
+ launchctl bootout gui/$UID/ai.openclaw.scheduler
1455
+
1456
+ # Restart
1457
+ launchctl kickstart -k gui/$UID/ai.openclaw.scheduler
1458
+
1459
+ # Status
1460
+ launchctl print gui/$UID/ai.openclaw.scheduler
1461
+ ps aux | grep dispatcher | grep -v grep
1462
+
1463
+ # Logs
1464
+ tail -f /tmp/openclaw-scheduler.log
1465
+
1466
+ # Quick health
1467
+ openclaw-scheduler status
1468
+ ```
1469
+
1470
+ ### macOS LaunchDaemon
1471
+
1472
+ ```bash
1473
+ # Start / bootstrap
1474
+ sudo launchctl bootstrap system /Library/LaunchDaemons/ai.openclaw.scheduler.plist
1475
+
1476
+ # Stop
1477
+ sudo launchctl bootout system/ai.openclaw.scheduler
1478
+
1479
+ # Restart
1480
+ sudo launchctl kickstart -k system/ai.openclaw.scheduler
1481
+
1482
+ # Status
1483
+ sudo launchctl print system/ai.openclaw.scheduler
1484
+ ps aux | grep dispatcher | grep -v grep
1485
+
1486
+ # Logs
1487
+ tail -f /tmp/openclaw-scheduler.log
1488
+
1489
+ # Quick health
1490
+ openclaw-scheduler status
1491
+ ```
1492
+
1493
+ Both modes use `RunAtLoad: true` and `KeepAlive: true`. LaunchDaemon also sets `UserName: <your-user>` so the service runs under your OpenClaw account while still surviving headless reboots.
1494
+
1495
+ ---
1496
+
1497
+ ## Error Handling & Backoff
1498
+
1499
+ ### On dispatch failure
1500
+
1501
+ 1. Run marked `error`, `consecutive_errors` increments
1502
+ 2. If `max_retries > 0` and retries remain → schedule retry (failure children wait)
1503
+ 3. If retries exhausted → trigger failure children, apply backoff
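The three failure steps can be sketched as a single decision function. This is an illustration, not the dispatcher's code: `max_retries` and `consecutive_errors` come from this README, while `retry_count` and the returned action names are hypothetical.

```javascript
// Sketch of the on-failure steps. `retry_count` and the action strings are
// hypothetical; only max_retries / consecutive_errors appear in the docs.
function onDispatchFailure(job, run) {
  // Step 1: the run is marked `error` and the job's error streak grows.
  const consecutiveErrors = (job.consecutive_errors ?? 0) + 1;
  const retriesUsed = run.retry_count ?? 0;

  // Step 2: retries remain, so schedule a retry; failure children keep waiting.
  if (job.max_retries > 0 && retriesUsed < job.max_retries) {
    return { runStatus: "error", consecutiveErrors, action: "retry", retryCount: retriesUsed + 1 };
  }

  // Step 3: retries exhausted, so fire failure children and apply backoff.
  return { runStatus: "error", consecutiveErrors, action: "trigger_failure_children_and_backoff" };
}

// A job allowing 2 retries that fails on its first attempt gets a retry.
console.log(onDispatchFailure({ max_retries: 2, consecutive_errors: 0 }, { retry_count: 0 }).action);
```

Failure children only see the final outcome, so a job that eventually succeeds on a retry never wakes them.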
1504
+
1505
+ ### Backoff schedule
1506
+
1507
+ | Consecutive errors | Delay |
1508
+ |-------------------|-------|
1509
+ | 1 | 30s |
1510
+ | 2 | 1 min |
1511
+ | 3 | 5 min |
1512
+ | 4 | 15 min |
1513
+ | 5+ | 1 hour |
1514
+
1515
+ After an error, the next run is scheduled for whichever is later: the next cron occurrence or the failure time plus the backoff delay. `consecutive_errors` resets to 0 on success.
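A minimal sketch of the backoff table and the "whichever is later" rule, assuming dates are plain JavaScript `Date` objects. `backoffDelayMs` and `nextRunAfterError` are illustrative names, not scheduler exports.

```javascript
// Delays from the backoff table, indexed by consecutive error count (1..5+).
const BACKOFF_MS = [30_000, 60_000, 5 * 60_000, 15 * 60_000, 60 * 60_000];

function backoffDelayMs(consecutiveErrors) {
  if (consecutiveErrors <= 0) return 0; // streak was reset by a success: no backoff
  const idx = Math.min(consecutiveErrors, BACKOFF_MS.length) - 1; // 5+ all map to 1 hour
  return BACKOFF_MS[idx];
}

// Next run = later of the next cron occurrence and failure time + backoff.
function nextRunAfterError(nextCronAt, failedAt, consecutiveErrors) {
  const backoffUntil = new Date(failedAt.getTime() + backoffDelayMs(consecutiveErrors));
  return nextCronAt > backoffUntil ? nextCronAt : backoffUntil;
}
```

For example, a job scheduled every minute that hits its third consecutive error at 09:00 is pushed to 09:05 rather than 09:01, while an hourly job after its first error keeps its next cron slot because that is already later than the 30s delay.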
1516
+
1517
+ ### Stale run detection
1518
+
1519
+ - Every 30s, dispatcher checks if running runs still have active sessions
1520
+ - No activity for 90s → marked `timeout`
1521
+ - Fallback: runs exceeding `run_timeout_ms` are force-timed-out
1522
+
1523
+ ### Gateway health
1524
+
1525
+ `GET /health` checked before each tick. If unreachable, isolated jobs are deferred; shell and main-session jobs continue.
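A sketch of that per-tick gate, assuming Node 18+ (global `fetch` and `AbortSignal.timeout`). The 5-second probe timeout and the `session_target` field on job rows are assumptions for illustration; the gateway address `http://127.0.0.1:18789` is the one used in Troubleshooting below.

```javascript
// Sketch of the per-tick gateway gate. The 5s timeout and the
// `session_target` field on job rows are illustrative assumptions.
async function gatewayIsHealthy(baseUrl, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/health`, { signal: AbortSignal.timeout(5000) });
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, or timeout => unhealthy
  }
}

// Isolated jobs need the gateway; shell and main-session jobs run regardless.
function partitionDueJobs(dueJobs, gatewayUp) {
  const runNow = [];
  const deferred = [];
  for (const job of dueJobs) {
    if (job.session_target === "isolated" && !gatewayUp) deferred.push(job);
    else runNow.push(job);
  }
  return { runNow, deferred };
}
```

A tick would call `await gatewayIsHealthy("http://127.0.0.1:18789")` once, then dispatch `runNow` and leave `deferred` jobs due for the next tick.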
1526
+
1527
+ ---
1528
+
1529
+ ## Migration & History
1530
+
1531
+ ### Importing from OC cron (first host only)
1532
+
1533
+ ```bash
1534
+ node migrate.js # imports from ~/.openclaw/cron/jobs.json
1535
+ ```
1536
+
1537
+ ### Schema baseline
1538
+
1539
+ As of public release `v0.1.0`, the schema is consolidated in `schema.sql` (baseline `v14`, now `v23`).
1540
+
1541
+ - Net-new installs: `initDb()` applies `schema.sql` directly.
1542
+ - Existing/pre-release DBs: `initDb()` runs `migrate-consolidate.js` to backfill missing columns/tables/indexes.
1543
+
1544
+ ### What was disabled in OpenClaw
1545
+
1546
+ | System | How disabled | Revert |
1547
+ |--------|-------------|--------|
1548
+ | Built-in cron | Jobs disabled (`openclaw cron edit <id> --disable`) + global cron off (`cron.enabled=false`) + gateway env `OPENCLAW_SKIP_CRON=1` | Re-enable jobs + `cron.enabled=true` + unset `OPENCLAW_SKIP_CRON` |
1549
+ | Heartbeat | `agents.defaults.heartbeat.every: "0m"` and disable/remove any per-agent `agents.list[].heartbeat` overrides | Set defaults/per-agent heartbeat cadence back (for example `"5m"`) |
1550
+ | Chat completions | Enabled for scheduler | Can leave enabled |
1551
+
1552
+ ### Public release
1553
+
1554
+ | Version | Date | Schema | Key changes |
1555
+ |---------|------|--------|-------------|
1556
+ | 0.2.0 | 2026-03-11 | v21 | Dispatch `done` hardening, auth profile support, one-shot `at` scheduling, expanded type coverage, UTC scheduling defaults, and portability/runtime fixes |
1557
+ | 0.1.0 | 2026-03-08 | v14 | First public release: workflow engine, structured shell failure triage, watchdog jobs, output offloading, execution-intent controls, safer migration checks, and public-release cleanup |
1558
+
1559
+ ### Pre-public development milestones
1560
+
1561
+ | Date | Former internal tag | Schema | Key changes |
1562
+ |------|----------------------|--------|-------------|
1563
+ | 2026-02-21 | 0.1.0 | v1 | Initial: jobs, runs, messages, agents, standalone dispatch |
1564
+ | 2026-02-22 | 0.4.0 | v3 | Workflow chains, cycle detection, spawn messages, multi-agent |
1565
+ | 2026-02-23 | 0.5.0 | v3b | Retry logic, max chain depth, chain cancellation, queue overlap |
1566
+ | 2026-02-24 | 0.6.0 | v5 | Shell jobs, announce-always, MinIO backup, resource pools, delivery aliases |
1567
+ | 2026-02-25 | 0.7.0 | v6/v7 | Idempotency, at-least-once, context retrieval, approval gates, task tracker, typed messages |
1568
+ | 2026-02-26 | 1.0.0 | v6 | Docs, LICENSE, CHANGELOG, package metadata |
1569
+ | 2026-03-02 | 1.0.1 | v9 | Consolidated schema + migration path, task tracker heartbeat/session baseline columns, session reuse field, Windows shell default fix (`cmd.exe`) |
1570
+ | 2026-03-03 | 1.0.2 | v10 | Team-aware routing fields on messages, explicit message receipt events, team adapter projection + task completion gates |
1571
+ | 2026-03-05 | 1.0.3 | v10 | Dispatch hardening: seeded 529 recovery job reconciliation, watcher token-telemetry safeguards, robust home-path resolution, and watcher DB checks without external `sqlite3` CLI |
1572
+ | 2026-03-08 | 1.1.0 | v13 | Structured shell failure triage, watchdog job type, safer migration skip checks, and public-release cleanup for docs/examples |
1573
+
1574
+ ---
1575
+
1576
+ ## Upgrading
1577
+
1578
+ Already have the scheduler running and need to update? See [UPGRADING.md](UPGRADING.md) for the full guide. Short version by platform:
1579
+
1580
+ ### macOS (launchd, git-clone install)
1581
+ ```bash
1582
+ cd ~/.openclaw/scheduler
1583
+ git pull && npm install
1584
+ SCHEDULER_DB=:memory: node test.js
1585
+ launchctl kickstart -k gui/$(id -u)/ai.openclaw.scheduler   # LaunchAgent
+ # LaunchDaemon: sudo launchctl kickstart -k system/ai.openclaw.scheduler
1586
+ ```
1587
+
1588
+ ### Linux / Windows WSL2 (systemd, git-clone install)
1589
+ ```bash
1590
+ cd ~/.openclaw/scheduler
1591
+ git pull && npm install
1592
+ SCHEDULER_DB=:memory: node test.js
1593
+ systemctl --user restart openclaw-scheduler
1594
+ ```
1595
+
1596
+ ### Windows native (PM2, git-clone install)
1597
+ ```powershell
1598
+ cd $env:USERPROFILE\.openclaw\scheduler
1599
+ git pull
1600
+ npm install
1601
+ $env:SCHEDULER_DB=":memory:"; node test.js; Remove-Item Env:SCHEDULER_DB
1602
+ pm2 restart openclaw-scheduler
1603
+ ```
1604
+
1605
+ ---
1606
+
1607
+ ## Removing the Scheduler
1608
+
1609
+ To stop the scheduler and restore OpenClaw's built-in cron/heartbeat, see [UNINSTALL.md](UNINSTALL.md).
1610
+
1611
+ Quick summary:
1612
+ 1. Stop the service (launchctl / systemctl / pm2)
1613
+ 2. Re-enable OC cron globally: `openclaw config set cron.enabled true` and remove `OPENCLAW_SKIP_CRON=1` from gateway service env
1614
+ 3. Re-enable OC cron jobs: `openclaw cron edit <id> --enable` for each job
1615
+ 4. Re-enable heartbeat: `openclaw config set agents.defaults.heartbeat.every "5m"` and restore any per-agent `agents.list[].heartbeat` overrides you use
1616
+ 5. Optionally delete `~/.openclaw/scheduler/`
1617
+
1618
+ ---
1619
+
1620
+ ## Best Practices
1621
+
1622
+ See [BEST-PRACTICES.md](BEST-PRACTICES.md) for:
1623
+ - Choosing between `shell`, `isolated`, and `main` session targets
1624
+ - Writing effective payload prompts for LLM jobs
1625
+ - When to use chains vs standalone jobs
1626
+ - Delivery mode selection
1627
+ - How to integrate the scheduler with your OpenClaw agent
1628
+ - Example MEMORY.md entries for agent awareness
1629
+
1630
+ ---
1631
+
1632
+ ## File Reference
1633
+
1634
+ ```
1635
+ ~/.openclaw/scheduler/
1636
+
1637
+ │ Core scheduler
1638
+ ├── dispatcher.js # Main process — tick loop, dispatch, chains, retry, backups
1639
+ ├── dispatcher-strategies.js # Dispatch strategy functions (prepare, execute, finalize)
1640
+ ├── dispatcher-maintenance.js # Stale run reaping, TTL pruning, WAL checkpoints
1641
+ ├── dispatcher-approvals.js # Approval timeout resolution and auto-approve/reject
1642
+ ├── dispatcher-delivery.js # Post-run delivery pipeline (announce, announce-always)
1643
+ ├── dispatcher-shell.js # Shell job execution and result normalization
1644
+ ├── dispatcher-utils.js # Shared dispatcher helpers and dependency wiring
1645
+ ├── dispatch-queue.js # Durable dispatch queue (manual runs, retries, chain triggers)
1646
+ ├── db.js # SQLite connection (WAL, FK ON, WAL checkpoint)
1647
+ ├── schema.sql # Complete schema (v23) -- all tables and columns, no incremental DDL
1648
+ ├── migrate-consolidate.js # Single migration for existing DBs: brings any prior version to v23
1649
+ ├── jobs.js # Job CRUD, cron, chains, cycle detection, resource pools, queue
1650
+ ├── runs.js # Run lifecycle, stale/timeout, cancellation, context summary
1651
+ ├── messages.js # Inter-agent message queue (priority, TTL, typed messages)
1652
+ ├── agents.js # Agent registry
1653
+ ├── gateway.js # OpenClaw API client (chat completions, events, delivery, aliases)
1654
+ ├── approval.js # HITL approval gates
1655
+ ├── idempotency.js # Idempotency ledger (at-least-once delivery dedup)
1656
+ ├── retrieval.js # Context retrieval (recent/hybrid run summaries)
1657
+ ├── task-tracker.js # Dead-man's-switch for multi-agent sub-agent teams
1658
+ ├── team-adapter.js # Team mailbox/task projection and task completion gates
1659
+ ├── backup.js # MinIO snapshot/rollup/restore (requires `mc` CLI)
1660
+ ├── cli.js # CLI management tool
1661
+ ├── migrate.js # Import from OC jobs.json
1662
+ ├── scripts/
1663
+ │ ├── dispatch-cli-utils.mjs # Dispatch CLI path resolution helpers
1664
+ │ ├── inbox-consumer.mjs # Drains queue messages and delivers to Telegram
1665
+ │ ├── stuck-run-detector.mjs # Detects stale running runs (alert-only via non-zero exit)
1666
+ │ └── telegram-webhook-check.mjs # Telegram webhook health check / repair utility
1667
+
1668
+ │ Service & docs
1669
+ ├── ~/Library/LaunchAgents/ai.openclaw.scheduler.plist # macOS LaunchAgent location after install
1670
+ ├── /Library/LaunchDaemons/ai.openclaw.scheduler.plist # macOS LaunchDaemon location after install
1671
+ ├── INSTALL.md # Full installation guide — macOS (first host)
1672
+ ├── INSTALL-ADDITIONAL-HOST.md # Installation guide for additional hosts
1673
+ ├── INSTALL-LINUX.md # Installation guide for Linux (systemd user service)
1674
+ ├── INSTALL-WINDOWS.md # Installation guide for Windows (WSL2 or PM2)
1675
+ ├── UPGRADING.md # Upgrade guide (all platforms)
1676
+ ├── UNINSTALL.md # Removal guide (all platforms)
1677
+ ├── BEST-PRACTICES.md # Job type selection, prompt writing, agent integration
1678
+ ├── QUICK-START.md # Focused guide: install, convert crons, first workflow
1679
+ ├── openclaw-scheduler.service # Linux systemd user service template
1680
+ ├── CHANGELOG.md # Version history
1681
+ └── README.md # This file
1682
+ ```
1683
+
1684
+ ---
1685
+
1686
+ ## Testing
1687
+
1688
+ ```bash
1689
+ # Run all tests (in-memory SQLite; expect 0 failed)
1690
+ SCHEDULER_DB=:memory: node test.js
1691
+
1692
+ # Or via npm:
1693
+ npm test
1694
+ ```
1695
+
1696
+ ### Test categories
1697
+
1698
+ - Schema creation & integrity
1699
+ - Job CRUD, cron parsing, due detection
1700
+ - Run lifecycle (create, heartbeat, finish, stale, timeout)
1701
+ - Agent registry (upsert, status, capabilities)
1702
+ - Message queue (priority, broadcast, TTL, typed messages)
1703
+ - Cascade deletes, pruning
1704
+ - Workflow chains (parent/child, trigger matching, tree traversal, trigger conditions)
1705
+ - Cycle detection (self, deep)
1706
+ - Max chain depth enforcement
1707
+ - Retry tracking and sequencing
1708
+ - Chain cancellation
1709
+ - Shell job execution
1710
+ - Approval gate lifecycle
1711
+ - Idempotency key claiming/releasing
1712
+ - Context retrieval (recent/hybrid)
1713
+ - Dispatcher integration (full dispatch pipeline with mock gateway)
1714
+
1715
+ ---
1716
+
1717
+ ## Sub-agent Dispatch
1718
+
1719
+ The dispatch module (`dispatch/index.mjs`) spawns and steers isolated agent sessions via the OpenClaw Gateway API and tracks them by a human-readable label. Unlike the scheduler's job/run model, dispatch calls the gateway directly -- no scheduler tick delay, no DB write required to start a session. Each session is assigned a unique session key, recorded in a local `labels.json` ledger, and auto-announces its result when the agent calls `done` as its final action. The module also supports symlink-based branding: a wrapper directory (such as `my-brand`) contains a `config.json` with a custom name and a symlink to `dispatch/index.mjs`, giving the same CLI a different identity in notifications and logs.
1720
+
1721
+ ### Quick Example
1722
+
1723
+ ```bash
1724
+ # Dispatch a sub-agent task and deliver the result to Telegram
1725
+ openclaw-scheduler enqueue \
1726
+ --label "fix-deploy-script" \
1727
+ --message "Fix the deploy script in ~/app to handle missing .env files" \
1728
+ --mode fresh \
1729
+ --thinking high \
1730
+ --timeout 3600 \
1731
+ --deliver-to YOUR_CHAT_ID \
1732
+ --delivery-mode announce
1733
+
1734
+ # Fallback (if openclaw-scheduler is not in PATH):
1735
+ node ~/.openclaw/scheduler/dispatch/index.mjs enqueue \
1736
+ --label "fix-deploy-script" --message "..." --deliver-to YOUR_CHAT_ID
1737
+ ```
1738
+
1739
+ ### Flag Reference
1740
+
1741
+ | Flag | Default | Description |
1742
+ |------|---------|-------------|
1743
+ | `--label` | required | Human-readable name for the session. Used for status lookups, reuse, and watchdog tracking. |
1744
+ | `--message` | required* | Prompt sent to the agent. |
1745
+ | `--message-file` | -- | Path to a file whose contents are used as the prompt. Alternative to `--message`; avoids shell-escaping issues with long prompts. |
1746
+ | `--mode` | `fresh` | `fresh` creates a new session. `reuse` continues the last session recorded for this label. |
1747
+ | `--thinking` | -- | Reasoning budget: `low`, `high`, or `xhigh`. |
1748
+ | `--model` | -- | Model override, e.g. `anthropic/claude-sonnet-4-6`. |
1749
+ | `--deliver-to` | -- | Delivery target (e.g. Telegram chat ID). Registers a scheduler watcher job for reliable at-least-once delivery. |
1750
+ | `--delivery-mode` | `announce` | `announce` delivers only when output is non-empty. `announce-always` delivers unconditionally. `none` suppresses delivery. |
1751
+ | `--timeout` | `300` | Session timeout in seconds. |
1752
+ | `--monitor` | on | Auto-register a watchdog job that alerts if the session goes silent past the configured threshold. |
1753
+ | `--no-monitor` | -- | Disable watchdog registration for this dispatch. |
1754
+
1755
+ *Either `--message` or `--message-file` is required.
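The `--delivery-mode` rules from the flag table reduce to a small predicate. This sketch assumes that whitespace-only output counts as "empty" for `announce`; the real dispatch module may draw that line differently.

```javascript
// Sketch of the --delivery-mode rules. Treating whitespace-only output as
// empty is an assumption, not confirmed behavior of dispatch/index.mjs.
function shouldDeliver(mode, output) {
  switch (mode) {
    case "announce":
      return typeof output === "string" && output.trim() !== ""; // only non-empty output
    case "announce-always":
      return true; // deliver even when the agent produced nothing
    case "none":
      return false; // suppress delivery entirely
    default:
      throw new Error(`unknown delivery mode: ${mode}`);
  }
}
```

`announce-always` is the safer choice for jobs where silence itself is a signal, because an empty result still produces a message.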
1756
+
1757
+ ### Subcommand Reference
1758
+
1759
+ | Subcommand | Description |
1760
+ |------------|-------------|
1761
+ | `enqueue` | Spawn a new agent session (or resume one with `--mode reuse`) and optionally register a scheduler watcher for delivery. |
1762
+ | `status` | Show current status for a label: session key, spawn time, running/done/error, and liveness data from the sessions store. |
1763
+ | `stuck` | Check all running sessions against the stuck threshold. Exits 1 if genuinely stuck sessions remain after auto-resolving completed ones. |
1764
+ | `result` | Retrieve the last assistant reply from a session transcript via `chat.history`. |
1765
+ | `sync` | Reconcile `labels.json` with sessions store state. Auto-marks sessions as done or error based on idle time. Supports `--dry-run`. |
1766
+ | `done` | Agent-side completion signal. The agent calls this as its final action to mark itself done immediately (push-based; no idle timeout wait). |
1767
+ | `send` | Inject a message into a running session for mid-run steering. The agent sees it as a new user turn. |
1768
+ | `steer` | Alias for `send`. The name makes steering intent explicit. |
1769
+ | `heartbeat` | Check whether a session has been active within the last 10 minutes. Accepts `--label` or `--session-key`. |
1770
+ | `list` | List all tracked labels in `labels.json`, sorted by most recent. Accepts `--status running|done|error` and `--limit`. |
1771
+
1772
+ ### Multi-agent Orchestration
1773
+
1774
+ The main agent acts as the orchestrator and delegates parallel units of work to sub-agents via `enqueue`. Each sub-agent runs in an isolated session, completes its assigned task, and calls `done` as its last action. Results are delivered back to the requesting chat (Telegram, Discord, WhatsApp, Signal, iMessage, or Slack) without the orchestrator polling.
1775
+
1776
+ **Spawn depth constraint:** The gateway enforces `maxSpawnDepth: 2`. The main agent (depth 0) spawns sub-agents (depth 1), which can spawn nested sub-agents (depth 2). Depth 3 is blocked. The dispatcher sets `spawnDepth: 1` on each fresh session automatically.
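The depth rule is simple arithmetic: a child's depth is its parent's depth plus one, and anything above `maxSpawnDepth` is refused. The sketch below is illustrative only; `childSpawnDepth` is a hypothetical helper, and the actual check happens inside the gateway.

```javascript
// Hypothetical helper illustrating the gateway's maxSpawnDepth rule.
const MAX_SPAWN_DEPTH = 2;

function childSpawnDepth(parentDepth) {
  const depth = parentDepth + 1;
  if (depth > MAX_SPAWN_DEPTH) {
    throw new Error(`spawn blocked: depth ${depth} exceeds maxSpawnDepth ${MAX_SPAWN_DEPTH}`);
  }
  return depth;
}

console.log(childSpawnDepth(0)); // main agent -> sub-agent at depth 1
```

In practice this means a dispatched worker (depth 1) may spawn one more layer of helpers, but those helpers cannot delegate further.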
1777
+
1778
+ **Example: 3 parallel workers**
1779
+
1780
+ ```bash
1781
+ # Orchestrator dispatches three workers in parallel.
1782
+ # All three run concurrently in isolated sessions.
1783
+
1784
+ openclaw-scheduler enqueue \
1785
+ --label "worker-schema" \
1786
+ --message "Review the DB schema and write documentation for all tables" \
1787
+ --thinking high --timeout 600 --deliver-to YOUR_CHAT_ID
1788
+
1789
+ openclaw-scheduler enqueue \
1790
+ --label "worker-frontend" \
1791
+ --message "Audit the React components for accessibility issues" \
1792
+ --thinking high --timeout 600 --deliver-to YOUR_CHAT_ID
1793
+
1794
+ openclaw-scheduler enqueue \
1795
+ --label "worker-docs" \
1796
+ --message "Update the API docs to reflect the new /v2 endpoints" \
1797
+ --thinking high --timeout 600 --deliver-to YOUR_CHAT_ID
1798
+
1799
+ # Each worker auto-announces its result to the configured channel when done.
1800
+ # No polling needed. Watchdog jobs are auto-registered for each.
1801
+ ```
1802
+
1803
+ Check status at any time:
1804
+
1805
+ ```bash
1806
+ openclaw-scheduler list --status running
1807
+ openclaw-scheduler dispatch status --label worker-schema
1808
+ ```
1809
+
1810
+ ### Branding and Configuration
1811
+
1812
+ `dispatch/index.mjs` resolves `config.json` relative to the directory of the invoking script, not the module itself. This means a symlink at `~/.openclaw/my-brand/index.mjs -> ~/.openclaw/scheduler/dispatch/index.mjs` will load `~/.openclaw/my-brand/config.json`, giving the same CLI a different brand name and defaults. All config fields are optional.
1813
+
1814
+ **`config.json` fields:**
1815
+
1816
+ | Field | Default | Description |
1817
+ |-------|---------|-------------|
1818
+ | `name` | `"dispatch"` | Brand name shown in Telegram notifications and log output. |
1819
+ | `startupGraceMs` | `90000` | Grace period (ms) after spawn before stuck detection and auto-resolve activate. |
1820
+ | `stuckThresholdMs` | `600000` | Silence duration (ms) before a session is considered stuck. |
1821
+ | `maxWatcherAgeMs` | `7200000` | Max watcher process age (ms) before it is treated as stale. |
1822
+ | `watchdogIntervalCron` | `"*/15 * * * *"` | Cron schedule for the auto-registered watchdog job. |
1823
+ | `watchdogTimeoutMin` | `60` | Sessions running longer than this (minutes) without completing trigger a watchdog alert. |
1824
+ | `deliver_watcher_ttl_hours` | `48` | TTL (hours) for scheduler-registered deliver-watcher jobs. These jobs are transient; they auto-prune once delivery is confirmed. Lower values prune faster; higher values retain audit history longer. |
1825
+
1826
+ **Environment variables:**
1827
+
1828
+ | Variable | Description |
1829
+ |----------|-------------|
1830
+ | `DISPATCH_LABELS_PATH` | Override path for `labels.json`. Default: `<invoke_dir>/labels.json`. |
1831
+ | `OPENCLAW_GATEWAY_TOKEN` | Gateway auth token. Falls back to `~/.openclaw/openclaw.json` if unset. |
1832
+
1833
+ **Minimal `config.json`:**
1834
+
1835
+ ```json
1836
+ {
1837
+ "name": "my-brand",
1838
+ "watchdogIntervalCron": "*/15 * * * *",
1839
+ "watchdogTimeoutMin": 60
1840
+ }
1841
+ ```
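The resolution rules above (config next to the invoked script, defaults for every field, `DISPATCH_LABELS_PATH` override) can be sketched as follows. This is not the module's actual loader: `loadDispatchConfig` is a hypothetical name, and it assumes the caller passes the path as invoked (for example `process.argv[1]`, which Node makes absolute but does not resolve through symlinks).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Defaults mirror the config.json field table above.
const DEFAULTS = {
  name: "dispatch",
  startupGraceMs: 90_000,
  stuckThresholdMs: 600_000,
  maxWatcherAgeMs: 7_200_000,
  watchdogIntervalCron: "*/15 * * * *",
  watchdogTimeoutMin: 60,
  deliver_watcher_ttl_hours: 48,
};

// invokedPath is the script path as invoked (the symlink, not its target),
// so ~/.openclaw/my-brand/index.mjs reads ~/.openclaw/my-brand/config.json.
function loadDispatchConfig(invokedPath, env = process.env) {
  const invokeDir = path.dirname(path.resolve(invokedPath));
  const configPath = path.join(invokeDir, "config.json");
  const fileConfig = fs.existsSync(configPath)
    ? JSON.parse(fs.readFileSync(configPath, "utf8"))
    : {}; // all fields are optional; a missing file means pure defaults
  return {
    ...DEFAULTS,
    ...fileConfig,
    labelsPath: env.DISPATCH_LABELS_PATH || path.join(invokeDir, "labels.json"),
  };
}
```

Resolving from the invoked path rather than from the module's own location is what lets one copy of `dispatch/index.mjs` serve several brands, each with its own `config.json` and `labels.json`.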
1842
+
1843
+ ### Monitoring and the Watchdog
1844
+
1845
+ When `--deliver-to` is set and `--no-monitor` is not passed, `enqueue` automatically registers a watchdog job in the scheduler DB alongside the delivery watcher job. The watchdog runs on the configured cron schedule and calls `stuck --threshold-min <watchdogTimeoutMin>` for the dispatched label. If the session has been silent past the threshold, the watchdog posts an alert to the configured delivery target and then disables itself.
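A sketch of the stuck test the watchdog relies on, combining `startupGraceMs` and `stuckThresholdMs` from the config table. The session shape (`spawnedAt`, `lastActivityAt`) is hypothetical; the real `stuck` subcommand reads liveness from the sessions store.

```javascript
// Hypothetical session shape: { spawnedAt: Date, lastActivityAt: Date }.
// Defaults match startupGraceMs / stuckThresholdMs in the config table.
function isStuck(session, now, { startupGraceMs = 90_000, stuckThresholdMs = 600_000 } = {}) {
  const nowMs = now.getTime();
  // Inside the startup grace period a session is never reported as stuck.
  if (nowMs - session.spawnedAt.getTime() < startupGraceMs) return false;
  // After that, it is stuck once its silence exceeds the threshold.
  return nowMs - session.lastActivityAt.getTime() > stuckThresholdMs;
}

// The watchdog passes --threshold-min <watchdogTimeoutMin>; convert to ms.
const thresholdFromMinutes = (min) => min * 60_000;
```

The grace period matters because a freshly spawned session often shows no activity while the model is still loading context; without it, short thresholds would flag every new dispatch.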
1846
+
1847
+ Check active dispatch sessions:
1848
+
1849
+ ```bash
1850
+ # List all running dispatch sessions
1851
+ openclaw-scheduler list --status running
1852
+
1853
+ # Check whether any session is stuck (exits 1 if found)
1854
+ node ~/.openclaw/scheduler/dispatch/index.mjs stuck --threshold-min 15
1855
+
1856
+ # Status for a specific label
1857
+ openclaw-scheduler dispatch status --label fix-deploy-script
1858
+ ```
1859
+
1860
+ The watchdog disarms itself automatically when the agent calls `done`, when `status` or `sync` auto-resolves the session from gateway idle state, or when `result` is fetched after a successful completion.
1861
+
1862
+ ---
1863
+
1864
+ ## Working with agentcli
1865
+
1866
+ [agentcli](https://github.com/amittell/agentcli) is the control-plane companion
1867
+ for the scheduler. It provides manifest authoring, validation, local execution,
1868
+ identity binding, and capability negotiation. The scheduler provides the durable
1869
+ runtime: scheduling, retries, approvals, delivery, and persistent state.
1870
+
1871
+ The scheduler works without agentcli -- most jobs are created by the OpenClaw
1872
+ agent itself when a user requests a scheduled task via Telegram or another
1873
+ messaging channel, and operators can also create jobs directly via the CLI.
1874
+ Adding agentcli on top gives you declarative workflow manifests, stable job IDs,
1875
+ v0.2 identity/authorization/evidence support, and repeatable applies for
1876
+ workflows that outgrow ad-hoc job creation.
1877
+
1878
+ ### Installing agentcli
1879
+
1880
+ ```bash
1881
+ npm install -g agentcli
1882
+ ```
1883
+
1884
+ ### Starting fresh with both tools
1885
+
1886
+ Write a manifest, validate it, then apply it to the scheduler:
1887
+
1888
+ ```bash
1889
+ # 1. Write a manifest
1890
+ cat > my-workflow.json <<'JSON'
1891
+ {
1892
+ "version": "0.1",
1893
+ "workflows": [{
1894
+ "id": "daily-ops",
1895
+ "name": "Daily Operations",
1896
+ "tasks": [{
1897
+ "id": "health-check",
1898
+ "name": "Morning Health Check",
1899
+ "prompt": "Run the daily health check and report any issues.",
1900
+ "target": { "session_target": "isolated", "agent_id": "main" },
1901
+ "schedule": { "cron": "0 9 * * *", "tz": "America/New_York" },
1902
+ "runtime": { "timeout_ms": 300000 },
1903
+ "delivery": { "mode": "announce", "channel": "telegram", "to": "YOUR_CHAT_ID" }
1904
+ }]
1905
+ }]
1906
+ }
1907
+ JSON
1908
+
1909
+ # 2. Validate locally (no scheduler needed)
1910
+ agentcli validate my-workflow.json
1911
+
1912
+ # 3. Preview what would be created (dry-run)
1913
+ agentcli apply my-workflow.json \
1914
+ --db ~/.openclaw/scheduler/scheduler.db \
1915
+ --scheduler-prefix ~/.openclaw/scheduler \
1916
+ --dry-run
1917
+
1918
+ # 4. Apply (creates the jobs)
1919
+ agentcli apply my-workflow.json \
1920
+ --db ~/.openclaw/scheduler/scheduler.db \
1921
+ --scheduler-prefix ~/.openclaw/scheduler
1922
+
1923
+ # 5. Verify
1924
+ openclaw-scheduler jobs list
1925
+ ```
1926
+
1927
+ ### Adopting existing scheduler jobs
1928
+
1929
+ If you already have jobs created directly via `openclaw-scheduler jobs add` and
1930
+ want to bring them under agentcli management:
1931
+
1932
+ 1. Write a manifest with task names that match your existing job names exactly.
1933
+
1934
+ 2. Run a one-time adoption by name:
1935
+
1936
+ ```bash
1937
+ agentcli apply my-workflow.json \
1938
+ --db ~/.openclaw/scheduler/scheduler.db \
1939
+ --scheduler-prefix ~/.openclaw/scheduler \
1940
+ --adopt-by name \
1941
+ --dry-run # preview first
1942
+
1943
+ agentcli apply my-workflow.json \
1944
+ --db ~/.openclaw/scheduler/scheduler.db \
1945
+ --scheduler-prefix ~/.openclaw/scheduler \
1946
+ --adopt-by name # execute adoption
1947
+ ```
1948
+
1949
+ This replaces each matched job with a new one under agentcli's stable ID
1950
+ scheme (SHA256 of workflow_id:task_id). The old job is deleted after the
1951
+ new one is created.
1952
+
1953
+ 3. On subsequent applies, use the default (no `--adopt-by` flag). Jobs are
1954
+ matched by their stable ID, so the manifest can be renamed or reorganized
1955
+ without losing job mapping.
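To see why renames are safe, here is a sketch of an ID derived from `workflow_id:task_id` as described above. The hex encoding and the lack of any prefix or truncation are assumptions; agentcli's real job IDs may be formatted differently.

```javascript
const crypto = require("node:crypto");

// Sketch: SHA-256 over "workflow_id:task_id". Hex output with no prefix or
// truncation is an assumption; agentcli's actual ID format may differ.
function stableJobId(workflowId, taskId) {
  return crypto.createHash("sha256").update(`${workflowId}:${taskId}`).digest("hex");
}

console.log(stableJobId("daily-ops", "health-check"));
```

Because only the `id` fields feed the hash, editing a task's `name` (or moving tasks around in the file) leaves its job ID unchanged, while changing a workflow or task `id` produces a new job.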
1956
+
1957
+ ### Workflow chains
1958
+
1959
+ agentcli manifests support parent/child task relationships that compile to
1960
+ scheduler trigger chains:
1961
+
1962
+ ```json
1963
+ {
1964
+ "version": "0.1",
1965
+ "workflows": [{
1966
+ "id": "deploy-pipeline",
1967
+ "name": "Deploy Pipeline",
1968
+ "tasks": [
1969
+ {
1970
+ "id": "build",
1971
+ "name": "Build",
1972
+ "shell": { "program": "sh", "args": ["-c", "npm run build"] },
1973
+ "target": { "session_target": "shell" },
1974
+ "schedule": { "cron": "0 2 * * *" },
1975
+ "runtime": { "timeout_ms": 600000 }
1976
+ },
1977
+ {
1978
+ "id": "deploy",
1979
+ "name": "Deploy",
1980
+ "shell": { "program": "sh", "args": ["-c", "fly deploy"] },
1981
+ "target": { "session_target": "shell" },
1982
+ "trigger": { "parent": "build", "on": "success" },
1983
+ "runtime": { "timeout_ms": 300000 }
1984
+ },
1985
+ {
1986
+ "id": "verify",
1987
+ "name": "Post-Deploy Verify",
1988
+ "prompt": "Verify all services are healthy after deploy.",
1989
+ "target": { "session_target": "isolated", "agent_id": "main" },
1990
+ "trigger": { "parent": "deploy", "on": "success" },
1991
+ "runtime": { "timeout_ms": 300000 },
1992
+ "delivery": { "mode": "announce-always", "channel": "telegram", "to": "YOUR_CHAT_ID" }
1993
+ }
1994
+ ]
1995
+ }]
1996
+ }
1997
+ ```
1998
+
1999
+ This compiles to three scheduler jobs: Build runs on cron, Deploy triggers
2000
+ on Build success, Verify triggers on Deploy success.
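The trigger wiring can be illustrated by walking the manifest's `trigger` fields. This sketch only follows `"on": "success"` edges shown in the example; `childrenToTrigger` and `successChain` are hypothetical helpers, and the scheduler itself resolves children from its own job rows at run time.

```javascript
// Children whose trigger matches this parent and outcome.
function childrenToTrigger(tasks, parentId, outcome) {
  return tasks
    .filter((t) => t.trigger?.parent === parentId && t.trigger?.on === outcome)
    .map((t) => t.id);
}

// Follow success triggers from a cron-scheduled root. The scheduler rejects
// cycles at creation time; `seen` is only a defensive guard here.
function successChain(tasks, rootId) {
  const order = [];
  const seen = new Set();
  const queue = [rootId];
  while (queue.length > 0) {
    const id = queue.shift();
    if (seen.has(id)) continue;
    seen.add(id);
    order.push(id);
    queue.push(...childrenToTrigger(tasks, id, "success"));
  }
  return order;
}

const pipeline = [
  { id: "build", schedule: { cron: "0 2 * * *" } },
  { id: "deploy", trigger: { parent: "build", on: "success" } },
  { id: "verify", trigger: { parent: "deploy", on: "success" } },
];

console.log(successChain(pipeline, "build")); // [ 'build', 'deploy', 'verify' ]
```

Only the root carries a `schedule`; the downstream tasks have no cron of their own and run solely when their parent's outcome matches.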
2001
+
2002
+ ### v0.2 identity and authorization
2003
+
2004
+ agentcli v0.2 manifests add identity profiles, authorization proofs, evidence
2005
+ generation, and credential handoff. These compile to the scheduler's v0.2
2006
+ runtime fields and are enforced at dispatch time:
2007
+
2008
+ ```json
2009
+ {
2010
+ "version": "0.2",
2011
+ "identity_profiles": [{
2012
+ "id": "stripe-readonly",
2013
+ "provider": "stripe",
2014
+ "subject": { "kind": "service", "principal": "agent://payments/reader" },
2015
+ "auth": { "mode": "service", "scopes": ["read"] },
2016
+ "trust": { "level": "supervised" }
2017
+ }],
2018
+ "workflows": [{
2019
+ "id": "payment-ops",
2020
+ "tasks": [{
2021
+ "id": "check-balance",
2022
+ "name": "Check Balance",
2023
+ "shell": { "program": "sh", "args": ["-c", "stripe balance retrieve"] },
2024
+ "target": { "session_target": "shell" },
2025
+ "identity": { "ref": "stripe-readonly" },
2026
+ "contract": {
2027
+ "required_trust_level": "supervised",
2028
+ "trust_enforcement": "strict"
2029
+ },
2030
+ "schedule": { "cron": "0 9 * * *" },
2031
+ "runtime": { "timeout_ms": 60000 }
2032
+ }]
2033
+ }]
2034
+ }
2035
+ ```
2036
+
2037
+ See the [agentcli examples directory](https://github.com/amittell/agentcli/tree/main/examples)
2038
+ for fully annotated manifests covering Stripe, Fly.io, Terraform, GitHub CLI,
2039
+ and more.
2040
+
2041
+ ### Environment variables
2042
+
2043
+ | Variable | Default | Description |
2044
+ |----------|---------|-------------|
2045
+ | `AGENTCLI_SCHEDULER_DB` | *(none)* | Path to scheduler SQLite database |
2046
+ | `AGENTCLI_SCHEDULER_PREFIX` | *(none)* | npm prefix where scheduler is installed |
2047
+ | `AGENTCLI_SCHEDULER_BIN` | *(none)* | Direct path to scheduler CLI binary |
2048
+ | `AGENTCLI_TARGET` | `standalone` | Default compilation target (`standalone` or `openclaw-scheduler`) |
2049
+ | `AGENTCLI_OUTPUT` | *(none)* | Output format (`json` or `ndjson`) |
2050
+
2051
+ ### Key commands
2052
+
2053
+ ```bash
2054
+ agentcli validate manifest.json # Check manifest validity
2055
+ agentcli compile manifest.json \
2056
+ --target openclaw-scheduler --explain # Preview compiled job specs
2057
+ agentcli apply manifest.json \
2058
+ --db path/to/scheduler.db --dry-run # Preview changes
2059
+ agentcli apply manifest.json \
2060
+ --db path/to/scheduler.db # Create/update jobs
2061
+ agentcli inspect jobs # List managed jobs
2062
+ agentcli exec manifest.json task-id \
2063
+ --dry-run --signer none # Local execution (no scheduler)
2064
+ ```
2065
+
2066
+ ---
2067
+
2068
+ ## Trust Architecture
2069
+
2070
+ The scheduler acts as a control-plane broker for child execution principals.
2071
+ Child tasks are bounded actors that receive only the credentials the scheduler
2072
+ gives them and cannot escalate their own authority. The credential model
2073
+ supports both precreated scoped keys and dynamic per-task key minting via
2074
+ identity providers.
2075
+
2076
+ For the full trust model -- including when the scheduler/child boundary is a
2077
+ real security boundary vs. an operational one, the credential flow from
2078
+ operator to child, and what the model does and does not guarantee -- see
2079
+ [docs/trust-architecture.md](docs/trust-architecture.md).
2080
+
2081
+ ---
2082
+
2083
+ ## Troubleshooting
2084
+
2085
+ ### Dispatcher isn't dispatching
2086
+
2087
+ ```bash
2088
+ ps aux | grep dispatcher # Is it running?
2089
+ tail -20 /tmp/openclaw-scheduler.log # Any errors?
2090
+ curl http://127.0.0.1:18789/health # Gateway reachable?
2091
+ openclaw-scheduler jobs list # Is nextRun in the past?
2092
+ openclaw-scheduler runs running # Overlap blocking?
2093
+ ```
2094
+
2095
+ ### Wrong next_run_at
2096
+
2097
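+ All dates must be in SQLite format (`YYYY-MM-DD HH:MM:SS`, UTC). `nextRunFromCron()` handles this. If you set dates manually, don't use ISO 8601 strings with `T`/`Z`: SQLite compares text values character by character, so `2026-03-11T09:00:00Z` sorts after `2026-03-11 23:59:59`, which breaks due-time comparisons.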
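If you compute a `next_run_at` value in JavaScript before writing it by hand, a conversion like the following produces the expected format. `toSqliteDatetime` and `fromSqliteDatetime` are hypothetical helper names, not scheduler exports.

```javascript
// Convert a Date to SQLite's "YYYY-MM-DD HH:MM:SS" text form, in UTC.
function toSqliteDatetime(date) {
  // toISOString() is always UTC, e.g. "2026-03-11T09:05:07.000Z".
  return date.toISOString().replace("T", " ").slice(0, 19);
}

// Parse a stored value back, treating it as UTC (the stored text has no "Z").
function fromSqliteDatetime(text) {
  return new Date(text.replace(" ", "T") + "Z");
}

console.log(toSqliteDatetime(new Date(Date.UTC(2026, 2, 11, 9, 5, 7)))); // 2026-03-11 09:05:07
```

Appending `Z` when parsing matters: without it, JavaScript would interpret the string as local time and shift the schedule by your UTC offset.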
2098
+
2099
+ ### Force a job to run now
2100
+
2101
+ ```bash
2102
+ sqlite3 scheduler.db "UPDATE jobs SET next_run_at = datetime('now', '-1 second') WHERE id = '<job-id>'"
2103
+ ```
2104
+
2105
+ ### Check schema version
2106
+
2107
+ ```bash
2108
+ sqlite3 scheduler.db "SELECT * FROM schema_migrations"
2109
+ ```
2110
+
2111
+ ### Service won't start
2112
+
2113
+ ```bash
2114
+ # LaunchAgent
2115
+ plutil -lint ~/Library/LaunchAgents/ai.openclaw.scheduler.plist
2116
+ launchctl bootstrap gui/$UID ~/Library/LaunchAgents/ai.openclaw.scheduler.plist
2117
+ launchctl print gui/$UID/ai.openclaw.scheduler
2118
+
2119
+ # LaunchDaemon
2120
+ sudo plutil -lint /Library/LaunchDaemons/ai.openclaw.scheduler.plist
2121
+ sudo launchctl bootstrap system /Library/LaunchDaemons/ai.openclaw.scheduler.plist
2122
+ sudo launchctl print system/ai.openclaw.scheduler
2123
+ ```
2124
+
2125
+ ### Logs not updating
2126
+
2127
+ Dispatcher logs to stderr (unbuffered). If logs look stale, the process may have crashed. Check the service that matches your launchd mode:
2128
+ - LaunchAgent: `launchctl print gui/$UID/ai.openclaw.scheduler`
2129
+ - LaunchDaemon: `sudo launchctl print system/ai.openclaw.scheduler`
2130
+
2131
+ ### Job shows 'awaiting_approval'
2132
+
2133
+ ```bash
2134
+ openclaw-scheduler approvals list
2135
+ openclaw-scheduler jobs approve <id> # or reject
2136
+ ```
2137
+
2138
+ ### Backup failing
2139
+
2140
+ ```bash
2141
+ mc alias list # verify backupstore alias configured
2142
+ # Check: SCHEDULER_BACKUP_MC_ALIAS, SCHEDULER_BACKUP_BUCKET, SCHEDULER_BACKUP_PREFIX env vars or defaults in backup.js
2143
+ # Verify MinIO is reachable: mc ls backupstore/
2144
+ ```
2145
+
2146
+ ---
2147
+
2148
+ ## Companion Scripts
2149
+
2150
+ The `scripts/` directory contains optional operational helpers built on top of core scheduler primitives.
2151
+
2152
+ These scripts are not required for scheduling itself, but they are useful for production operations:
2153
+ - `scripts/inbox-consumer.mjs` drains queued messages and delivers them to Telegram.
2154
+ - `scripts/stuck-run-detector.mjs` detects stale `running` runs and exits non-zero for alerting.
2155
+
2156
+ ### Signal Queue Pattern
2157
+
2158
+ The message queue (`messages` table) plus `cli.js msg send` implements a **signal-only** delivery path that complements `delivery_mode: announce`:
2159
+
2160
+ ```
2161
+ Failure path: dispatcher → announce → Telegram (immediate, unconditional)
2162
+ Signal path: script → cli.js msg send → queue → Inbox Consumer → Telegram
2163
+ ```
2164
+
2165
+ Scripts write to the queue **only when they have found something** — not unconditionally. A companion `scripts/inbox-consumer.mjs` shell job (run every 5 min) drains the queue and delivers to Telegram. It exits 0 when the queue is empty, so there is no noise.
2166
+
2167
+ > **Important:** The dispatcher does **not** write to the message queue automatically.
2168
+ > Every message in the queue was put there by a script with a specific receiver in mind.
2169
+ > Traceability for completed jobs comes from the `runs` table, `delivery_mode: announce`,
2170
+ > and run history/CLI views — not from queued messages.
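The signal path above can be sketched as a producer that stays silent when it has nothing to report and a consumer that drains the queue and exits 0 either way. The callbacks (`send`, `fetchPending`, `deliver`, `markDelivered`) are hypothetical stand-ins for `cli.js msg send`, the `messages` table, and Telegram delivery; the real `scripts/inbox-consumer.mjs` may be structured differently.

```javascript
// Producer side: a check script enqueues a signal only when it found something.
async function signalIfFindings(findings, send) {
  if (findings.length === 0) return false; // nothing to report, stay silent
  await send(findings.join("\n"));
  return true;
}

// Consumer side: deliver each pending message, acknowledging it only after
// delivery succeeds. An empty queue still returns exit code 0.
async function drainInbox(fetchPending, deliver, markDelivered) {
  const pending = await fetchPending();
  for (const msg of pending) {
    await deliver(msg); // a throw here leaves this and later messages queued
    await markDelivered(msg.id);
  }
  return 0;
}
```

Acknowledging after delivery gives at-least-once behavior: if delivery fails midway, the next 5-minute run retries the remaining messages instead of silently dropping them.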