openclaw-scheduler 0.2.0

Files changed (70)
  1. package/AGENTS.md +302 -0
  2. package/BEST-PRACTICES.md +506 -0
  3. package/CHANGELOG.md +82 -0
  4. package/CODE_OF_CONDUCT.md +22 -0
  5. package/CONTEXT.md +26 -0
  6. package/CONTRIBUTING.md +73 -0
  7. package/IMPLEMENTATION_SPEC.md +170 -0
  8. package/INSTALL-ADDITIONAL-HOST.md +333 -0
  9. package/INSTALL-LINUX.md +419 -0
  10. package/INSTALL-WINDOWS.md +305 -0
  11. package/INSTALL.md +364 -0
  12. package/JOB-QUICK-REF.md +222 -0
  13. package/LICENSE +21 -0
  14. package/QUICK-START.md +256 -0
  15. package/README.md +2170 -0
  16. package/SECURITY.md +34 -0
  17. package/UNINSTALL.md +129 -0
  18. package/UPGRADING.md +436 -0
  19. package/agents.js +67 -0
  20. package/approval.js +107 -0
  21. package/backup.js +390 -0
  22. package/bin/openclaw-scheduler.js +138 -0
  23. package/cli.js +1083 -0
  24. package/db.js +122 -0
  25. package/dispatch/529-recovery.mjs +204 -0
  26. package/dispatch/README.md +372 -0
  27. package/dispatch/config.example.json +24 -0
  28. package/dispatch/deliver-watcher.sh +57 -0
  29. package/dispatch/hooks.mjs +171 -0
  30. package/dispatch/index.mjs +1836 -0
  31. package/dispatch/watcher.mjs +1396 -0
  32. package/dispatch-queue.js +112 -0
  33. package/dispatcher-approvals.js +96 -0
  34. package/dispatcher-delivery.js +43 -0
  35. package/dispatcher-maintenance.js +242 -0
  36. package/dispatcher-shell.js +29 -0
  37. package/dispatcher-strategies.js +1280 -0
  38. package/dispatcher-utils.js +81 -0
  39. package/dispatcher.js +855 -0
  40. package/docs/adr-schedule-ownership.md +73 -0
  41. package/docs/gateway-contract.md +904 -0
  42. package/docs/plans/2026-03-09-fix-typescript-types.md +91 -0
  43. package/docs/plans/2026-03-09-test-coverage-gaps.md +83 -0
  44. package/docs/plans/2026-03-10-dispatcher-refactor.md +801 -0
  45. package/docs/trust-architecture.md +266 -0
  46. package/gateway.js +473 -0
  47. package/idempotency.js +119 -0
  48. package/index.d.ts +864 -0
  49. package/index.js +17 -0
  50. package/jobs.js +1224 -0
  51. package/messages.js +357 -0
  52. package/migrate-consolidate.js +694 -0
  53. package/migrate.js +125 -0
  54. package/package.json +130 -0
  55. package/paths.js +79 -0
  56. package/prompt-context.js +94 -0
  57. package/retrieval.js +176 -0
  58. package/runs.js +270 -0
  59. package/scheduler-schema.js +101 -0
  60. package/schema.sql +480 -0
  61. package/scripts/dispatch-cli-utils.mjs +65 -0
  62. package/scripts/inbox-consumer.mjs +288 -0
  63. package/scripts/stuck-detector.sh +18 -0
  64. package/scripts/stuck-run-detector.mjs +333 -0
  65. package/scripts/telegram-webhook-check.mjs +238 -0
  66. package/setup.mjs +724 -0
  67. package/shell-result.js +214 -0
  68. package/task-tracker.js +300 -0
  69. package/team-adapter.js +335 -0
  70. package/v02-runtime.js +599 -0
package/BEST-PRACTICES.md ADDED
@@ -0,0 +1,506 @@
1
+ # Best Practices
2
+
3
+ Guidance on choosing the right job type, writing effective prompts, structuring workflows, and making your OpenClaw agent scheduler-aware.
4
+
5
+ ---
6
+
7
+ ## Table of Contents
8
+
9
+ 1. [Choosing the Right Job Type](#choosing-the-right-job-type)
10
+    - [Session Targets at a Glance](#session-targets-at-a-glance)
11
+    - [When to use `shell`](#when-to-use-shell)
12
+    - [When to use `isolated`](#when-to-use-isolated)
13
+    - [When to use `main`](#when-to-use-main)
14
+    - [Chains vs Standalone](#chains-vs-standalone)
15
+    - [Delivery Modes](#delivery-modes)
16
+    - [Timeouts and Reliability](#timeouts-and-reliability)
17
+ 2. [Integrating with Your OpenClaw Agent](#integrating-with-your-openclaw-agent)
18
+    - [What the Agent Needs to Know](#what-the-agent-needs-to-know)
19
+    - [Adding the Scheduler to Agent Memory](#adding-the-scheduler-to-agent-memory)
20
+    - [How Scheduled Jobs Appear to the Agent](#how-scheduled-jobs-appear-to-the-agent)
21
+    - [The HEARTBEAT_OK Convention](#the-heartbeat_ok-convention)
22
+    - [Letting the Agent Create Jobs Dynamically](#letting-the-agent-create-jobs-dynamically)
23
+    - [Communicating Between Jobs](#communicating-between-jobs)
24
+    - [Practical Agent Briefing Example](#practical-agent-briefing-example)
25
+
26
+ ---
27
+
28
+ ## Choosing the Right Job Type
29
+
30
+ ### Session Targets at a Glance
31
+
32
+ | Use Case | Target | Why |
33
+ |----------|--------|-----|
34
+ | Backups, scripts, log rotation, file cleanup | `shell` | No LLM cost, no gateway dependency, runs even if gateway is down |
35
+ | Morning briefings, reports, analysis, summaries | `isolated` | Gets full LLM + tools, isolated from your conversations |
36
+ | Urgent alerts that must appear in active chat | `main` | Injects directly into your live session |
37
+ | Build → deploy → notify pipelines | Chain of `isolated` jobs | Each step gets fresh context, failures stop the chain |
38
+
39
+ ---
40
+
41
+ ### When to use `shell`
42
+
43
+ Use `shell` when:
44
+ - The task is deterministic — a script, a health check ping, a disk usage check, a backup
45
+ - You don't need AI reasoning
46
+ - The job must run even when the gateway is down or rate-limited
47
+ - Speed matters — shell jobs complete in milliseconds vs 10–60s for LLM calls
48
+ - You want to minimize API cost
49
+
50
+ `payload_message` is passed directly to the shell (`/bin/zsh` on macOS, `/bin/bash` on Linux). Override the shell with the `SCHEDULER_SHELL` environment variable.
51
+
52
+ **Examples:**
53
+
54
+ ```json
55
+ { "session_target": "shell", "payload_message": "~/scripts/backup.sh" }
56
+
57
+ { "session_target": "shell", "payload_message": "df -h / | grep -E '^/dev' | awk '{print $5}'" }
58
+
59
+ { "session_target": "shell", "payload_message": "curl -sf http://localhost:8080/health || exit 1" }
60
+
61
+ { "session_target": "shell", "payload_message": "cd ~/myapp && git pull && npm run build 2>&1" }
62
+ ```
63
+
64
+ **Delivery with shell jobs:**
65
+ - `delivery_mode: "announce"` — sends output only on non-zero exit (perfect for failure alerts)
66
+ - `delivery_mode: "announce-always"` — sends output every time
67
+ - `delivery_mode: "none"` — background, check runs via `openclaw-scheduler runs list`
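The three delivery modes above reduce to a small decision on the shell exit code. The sketch below restates that rule in JavaScript; `shouldDeliverShellOutput` is a hypothetical helper written for illustration and is not taken from the scheduler's source.

```javascript
// Illustrative sketch: restates the documented shell delivery rules.
// The function name and signature are assumptions, not the scheduler's API.
function shouldDeliverShellOutput(deliveryMode, exitCode) {
  switch (deliveryMode) {
    case "announce":
      // Only failures (non-zero exit) are delivered.
      return exitCode !== 0;
    case "announce-always":
      // Every run is delivered, pass or fail.
      return true;
    case "none":
      // Nothing is delivered; inspect runs via the CLI instead.
      return false;
    default:
      throw new Error(`Unknown delivery_mode: ${deliveryMode}`);
  }
}

console.log(shouldDeliverShellOutput("announce", 0)); // healthy run, stays quiet
console.log(shouldDeliverShellOutput("announce", 1)); // failed run, gets delivered
```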
68
+
69
+ **Shell jobs work great as chain parents too** — a shell build job can trigger an isolated deploy job on success.
70
+
71
+ #### Shell Daemon Keepalive Jobs — Critical Rules
72
+
73
+ When using shell jobs to keep a background daemon (e.g. a model server, a vector DB, an MCP bridge) warm and running, three rules prevent silent failures:
74
+
75
+ **Rule 1: The daemon wrapper script must block.**
76
+
77
+ If your launchd/systemd service runs a wrapper script that sets up the daemon and then exits, the service manager restarts the script immediately -- potentially spawning multiple daemon instances fighting for the same port, causing hangs or crashes.
78
+
79
+ ```bash
80
+ # Wrong -- script exits after setup, KeepAlive loops it immediately
81
+ ./bridge.py daemon start # starts daemon, initializes session, exits
82
+ ./bridge.py warmup # exits -> service restarts -> another daemon spawns
83
+
84
+ # Correct -- start daemon in background, block on its PID
85
+ ./mydaemon --port 8181 &
86
+ DAEMON_PID=$!
87
+ ./bridge.py daemon init # session setup (runs once)
88
+ ./bridge.py warmup # pre-warm (runs once)
89
+ wait $DAEMON_PID # blocks until daemon dies -> then service restarts cleanly
90
+ ```
91
+
92
+ On macOS (launchd with `KeepAlive: true`, whether LaunchAgent or LaunchDaemon) and Linux (systemd with `Restart=always`), the service manager restarts your script the moment it exits. Always block on the daemon process with `wait $DAEMON_PID`, as in the corrected script above.
93
+
94
+ **Rule 2: Keepalive jobs must exercise the actual hot path.**
95
+
96
+ A keepalive that pings `/health` or runs a lightweight query does NOT keep ML/neural models warm in GPU memory. Only calls that trigger real model inference keep the models loaded.
97
+
98
+ ```bash
99
+ # Wrong -- lightweight health check, no model loading, GPU models go cold
100
+ curl -sf http://localhost:8181/health
101
+
102
+ # Correct -- inference query triggers embedding + reranker model loading
103
+ curl -sf http://localhost:8181/v1/embeddings \
104
+   -H 'Content-Type: application/json' -d '{"input":"keepalive","model":"default"}' > /dev/null
105
+ ```
106
+
107
+ Schedule model-warming keepalives at least every 10 minutes. Every 30 minutes is too infrequent -- models unload from GPU between calls, causing cold-start delays (10-20s) on the next real query.
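As a rough sketch of the scheduling advice above, a keepalive can be expressed as a shell job on a `*/10` cron. The field names come from the job examples in this guide; `buildKeepaliveJob` itself is a hypothetical helper, and the job name is made up for illustration.

```javascript
// Illustrative sketch: builds a shell keepalive job definition.
// buildKeepaliveJob is hypothetical; only the JSON field names come from this guide.
function buildKeepaliveJob(name, command, everyMinutes = 10) {
  if (!Number.isInteger(everyMinutes) || everyMinutes < 1 || everyMinutes > 59) {
    throw new RangeError("everyMinutes must be an integer between 1 and 59");
  }
  return {
    name,
    schedule_cron: `*/${everyMinutes} * * * *`, // e.g. every 10 minutes
    session_target: "shell",                    // no LLM call, no gateway needed
    payload_message: command,                   // must hit the real inference path
    delivery_mode: "announce",                  // only alert when the command fails
  };
}

const job = buildKeepaliveJob(
  "Embedding Keepalive",
  "curl -sf http://localhost:8181/v1/embeddings -d '{\"input\":\"keepalive\",\"model\":\"default\"}' > /dev/null"
);
console.log(JSON.stringify(job, null, 2));
```

Intervals that do not divide 60 evenly (for example `*/7`) produce a shorter gap at the top of each hour, so 5, 10, or 15 are the more predictable choices.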
108
+
109
+ **Rule 3: Daemon session/state files must not live in `/tmp`.**
110
+
111
+ `/tmp` is cleared on macOS reboot and on Linux boot (or by `systemd-tmpfiles`). If your daemon stores a session ID or token in `/tmp`, any reboot causes the next client call to fail — often silently (e.g. `400 Already Initialized`, empty results, or a crash with no stdout).
112
+
113
+ ```bash
114
+ # ❌ Wrong — lost on reboot, daemon calls fail silently after restart
115
+ SESSION_FILE=/tmp/my-daemon-session.json
116
+
117
+ # ✅ Correct — persistent across reboots
118
+ SESSION_FILE=${XDG_CACHE_HOME:-$HOME/.cache}/my-daemon/session.json
119
+ ```
120
+
121
+ Always store daemon session files in `XDG_CACHE_HOME` or another directory that persists across reboots.
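For daemons or clients written in Node.js, the same fallback as the shell expansion `${XDG_CACHE_HOME:-$HOME/.cache}` might look like the sketch below. `sessionFilePath` is a hypothetical helper; the environment and home directory are passed in explicitly so the logic is easy to test.

```javascript
// Illustrative sketch: resolve a session file path that survives reboots.
// sessionFilePath is a hypothetical helper, not part of the scheduler.
function sessionFilePath(env, homeDir, daemonName) {
  // Like ${XDG_CACHE_HOME:-$HOME/.cache}: fall back when unset OR empty.
  const cacheRoot = env.XDG_CACHE_HOME ? env.XDG_CACHE_HOME : `${homeDir}/.cache`;
  return `${cacheRoot}/${daemonName}/session.json`;
}

// In a real script you would typically pass process.env and os.homedir().
console.log(sessionFilePath({}, "/home/alice", "my-daemon"));
```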
122
+
123
+ ---
124
+
125
+ ### When to use `isolated`
126
+
127
+ Use `isolated` when:
128
+ - The task requires reasoning, writing, planning, or multi-step tool use
129
+ - The task reads or writes memory files
130
+ - You want OpenClaw's tools available (kubectl, browser, file access, exec, etc.)
131
+ - The output should be formatted and delivered to a channel
132
+
133
+ **Writing effective `isolated` job prompts:**
134
+
135
+ | Rule | Bad | Good |
136
+ |------|-----|------|
137
+ | Be imperative and specific | "check kubernetes" | "Check k8s pods in requesthub-prod and requesthub-dev. List any non-Running pods." |
138
+ | Include a success signal | *(nothing)* | "If all pods Running, reply with exactly: HEARTBEAT_OK" |
139
+ | Specify output format | *(nothing)* | "Format issues as: ⚠️ \<namespace\>/\<pod\>: \<status\>" |
140
+ | State available resources | *(implicit)* | "Your memory files are in ~/.openclaw/workspace/memory/" |
141
+ | Set realistic timeouts | default 300s | 120s for single-tool, 240s for multi-tool |
142
+
143
+ **Bad prompt:**
144
+ ```
145
+ Check everything and let me know if anything is wrong
146
+ ```
147
+
148
+ **Good prompt:**
149
+ ```
150
+ Check k8s pod health across requesthub-prod and requesthub-dev namespaces.
151
+ List any non-Running pods. If all pods are Running, reply with exactly: HEARTBEAT_OK
152
+ Format any issues as: ⚠️ <namespace>/<pod>: <status>
153
+ ```
154
+
155
+ **Another good prompt (morning briefing):**
156
+ ```
157
+ Read ~/.openclaw/workspace/memory/2026-02-26.md (today's daily log).
158
+ Summarize: what was completed, what's in progress, any blockers.
159
+ Format as a 3-section bullet list: Done / In Progress / Blocked.
160
+ Keep it under 200 words.
161
+ ```
162
+
163
+ ---
164
+
165
+ ### When to use `main`
166
+
167
+ Use `main` sparingly. It injects directly into your active agent session — the same conversation you're having with your agent right now.
168
+
169
+ **Good uses:**
170
+ - Long-running task check-ins: "You've been running for 30 minutes — report status"
171
+ - Alerts that should appear in your live chat, not delivered separately to a channel
172
+ - Jobs that genuinely need your ongoing conversation context (rare)
173
+
174
+ **Bad uses:**
175
+ - Cron jobs that run overnight — they'll clutter your session history when you wake up
176
+ - Anything that can be delivered to a channel instead
177
+
178
+ ---
179
+
180
+ ### Chains vs Standalone
181
+
182
+ **Use standalone jobs for:**
183
+ - Independent recurring tasks (morning briefing, daily backup)
184
+ - Tasks with no dependencies on other job results
185
+ - Anything where a failure shouldn't block anything else
186
+
187
+ **Use chains for:**
188
+ - Pipelines: build → test → deploy → notify
189
+ - Conditional work: monitor → (only if ALERT found) → escalate
190
+ - Post-processing: analyze → (only if anomaly) → deep-dive
191
+ - Cleanup that runs regardless: build → (always) → cleanup temp files
192
+
193
+ **Good chain example:**
194
+
195
+ ```
196
+ [Daily CI Check — 9am]
197
+ └─ [Deploy Staging — trigger:success]
198
+ └─ [Smoke Test — trigger:success, delay:60s]
199
+ └─ [Notify Team — trigger:complete] ← runs regardless of pass/fail
200
+ └─ [Rollback Staging — trigger:failure]
201
+ ```
202
+
203
+ **Key chain tips:**
204
+ - Use `trigger_on: "complete"` (not success) for notification/cleanup jobs that should always run
205
+ - Use `trigger_delay_s` to give services time to start before smoke tests
206
+ - Use `trigger_condition: "contains:ALERT"` to only trigger escalation when the monitor actually finds something
207
+ - Set `max_retries: 1` or `2` on the first job in a chain — transient failures shouldn't kill the whole pipeline
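One way to read the `trigger_on` and `trigger_condition` tips above is as a two-step check: first the parent's outcome, then an optional substring match on its output. The sketch below encodes that reading. The `parentRun` shape (`ok`, `output`) and the function name are assumptions made for illustration, and only the `contains:` condition form shown in this guide is handled.

```javascript
// Illustrative sketch of chain trigger evaluation, under the assumptions above.
// parentRun = { ok: boolean, output: string } is a hypothetical shape.
function shouldTriggerChild(child, parentRun) {
  const triggerOn = child.trigger_on;
  const statusMatches =
    triggerOn === "complete" ||                  // runs regardless of pass/fail
    (triggerOn === "success" && parentRun.ok) ||
    (triggerOn === "failure" && !parentRun.ok);
  if (!statusMatches) return false;

  const condition = child.trigger_condition;
  if (!condition) return true;
  if (condition.startsWith("contains:")) {
    // e.g. "contains:ALERT" fires only when the parent output mentions ALERT
    return parentRun.output.includes(condition.slice("contains:".length));
  }
  throw new Error(`Unsupported trigger_condition in this sketch: ${condition}`);
}

const escalate = { trigger_on: "success", trigger_condition: "contains:ALERT" };
console.log(shouldTriggerChild(escalate, { ok: true, output: "ALERT: disk at 93%" }));
console.log(shouldTriggerChild(escalate, { ok: true, output: "all clear" }));
```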
208
+
209
+ ---
210
+
211
+ ### Delivery Modes
212
+
213
+ | Mode | Best for |
214
+ |------|----------|
215
+ | `none` | Background jobs — check results via `openclaw-scheduler runs list` |
216
+ | `announce` | LLM jobs with important output; shell jobs that should alert on failure |
217
+ | `announce-always` | Monitoring/audit jobs where you want every result |
218
+
219
+ **Don't** use `announce-always` on high-frequency jobs (every 5 minutes) unless you want a constant stream of messages. Save it for hourly or less frequent jobs, or jobs where the output is always relevant.
220
+
221
+ **The `announce` + `HEARTBEAT_OK` pattern** is the most useful combination: zero noise when healthy, immediate delivery when something needs attention. See [The HEARTBEAT_OK Convention](#the-heartbeat_ok-convention).
222
+
223
+ ---
224
+
225
+ ### Timeouts and Reliability
226
+
227
+ | Job type | Recommended `run_timeout_ms` | Notes |
228
+ |----------|------------------------------|-------|
229
+ | Simple shell script | 60,000 (60s) | Default 300s is usually too generous |
230
+ | LLM job, single tool | 120,000 (120s) | |
231
+ | LLM job, multi-tool (k8s + files + analysis) | 240,000 (240s) | |
232
+ | Long-running agent work | 600,000 (600s) | |
233
+
234
+ Set `max_retries: 1` or `max_retries: 2` for any job that hits external services (APIs, databases, GitHub, etc.) — transient failures happen, especially at cron-job-o'clock when everything runs at once.
235
+
236
+ **Retry tip:** Failure chain children don't fire until all retries are exhausted. This prevents false failure alerts on transient errors. Set retries before worrying about failure notifications.
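The retry tip can be pictured as a simple gate: failure children only fire once the number of failed attempts exceeds `max_retries`. The helper below is a hypothetical illustration of that ordering, not the dispatcher's actual code.

```javascript
// Illustrative sketch: decide what happens after a failed attempt.
// failedAttempts counts every failed execution so far, including the first.
function nextStepAfterFailure(failedAttempts, maxRetries) {
  return failedAttempts <= maxRetries ? "retry" : "fire-failure-children";
}

// With max_retries: 1, the first failure retries and the second one escalates.
console.log(nextStepAfterFailure(1, 1));
console.log(nextStepAfterFailure(2, 1));
```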
237
+
238
+ ---
239
+
240
+ ## Integrating with Your OpenClaw Agent
241
+
242
+ Your agent starts every isolated session with a fresh context. This section covers how to make it scheduler-aware.
243
+
244
+ ### What the Agent Needs to Know
245
+
246
+ For an agent to use the scheduler effectively, it needs to know:
247
+
248
+ 1. The scheduler exists and where it is (`~/.openclaw/scheduler/`)
249
+ 2. How to check status, create, and manage jobs via the CLI
250
+ 3. What scheduled prompts look like — the `[scheduler:...]` header
251
+ 4. How to respond correctly when *it is* the one receiving a scheduled prompt
252
+
253
+ ---
254
+
255
+ ### Adding the Scheduler to Agent Memory
256
+
257
+ Add a section like this to your `MEMORY.md` or workspace context file. Your agent reads this at the start of each session and will know how to work with the scheduler:
258
+
259
+ ```markdown
260
+ ## Scheduler
261
+ - Standalone scheduler at `~/.openclaw/scheduler/` — runs 24/7 as a background service
262
+ - Check status: `node ~/.openclaw/scheduler/cli.js status`
263
+ - List jobs: `node ~/.openclaw/scheduler/cli.js jobs list`
264
+ - Create a job: `node ~/.openclaw/scheduler/cli.js jobs add '<json>'`
265
+ - View run history: `node ~/.openclaw/scheduler/cli.js runs list <job-id>`
266
+ - Dispatch via OpenClaw chat completions API (isolated sessions — no chat history)
267
+ - Shell jobs run scripts directly — no LLM call, no gateway needed
268
+ - When you receive a prompt starting with [scheduler:...], you are in an isolated session.
269
+ No conversation history. Focus on the task. Reply HEARTBEAT_OK for watchdog jobs when healthy.
270
+ ```
271
+
272
+ ---
273
+
274
+ ### How Scheduled Jobs Appear to the Agent
275
+
276
+ When the dispatcher fires an isolated job, the agent receives a message structured like this:
277
+
278
+ ```
279
+ [scheduler:abc123 Daily Health Check]
280
+
281
+ --- Pending Messages ---
282
+ From: scheduler | result | Previous backup: 3 files committed, pushed to origin
283
+ ---
284
+
285
+ Check k8s pod health across requesthub-prod and requesthub-dev.
286
+ If all pods are Running, reply with exactly: HEARTBEAT_OK
287
+ Format any issues as: ⚠️ <namespace>/<pod>: <status>
288
+ ```
289
+
290
+ The agent should:
291
+ - Recognize the `[scheduler:...]` header as a scheduled task prompt
292
+ - Understand it's in an isolated session — no access to prior user conversations
293
+ - Focus entirely on the task described in `payload_message`
294
+ - Use appropriate tools (exec, kubectl, browser, file access) as needed
295
+ - Respond concisely — the response becomes the run summary stored in the `runs` table
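If you post-process prompts or logs yourself, the `[scheduler:...]` header can be recognized with a small parser. The sketch below assumes the header has the form `[scheduler:<id> <name>]` on the first line, as in the example above; the exact format is inferred from that single example, and `parseSchedulerHeader` is a hypothetical helper.

```javascript
// Illustrative sketch: parse "[scheduler:<id> <name>]" from the start of a prompt.
// The format is inferred from the example in this guide, not from a spec.
function parseSchedulerHeader(prompt) {
  const match = /^\[scheduler:([^\s\]]+)(?:\s+([^\]]*))?\]/.exec(prompt);
  if (!match) return null; // not a scheduled prompt
  return { jobId: match[1], jobName: (match[2] || "").trim() };
}

const header = parseSchedulerHeader("[scheduler:abc123 Daily Health Check]\n\nCheck k8s pod health.");
console.log(header);
```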
296
+
297
+ ---
298
+
299
+ ### The HEARTBEAT_OK Convention
300
+
301
+ For watchdog and health-check jobs, instruct the agent to reply `HEARTBEAT_OK` when nothing needs attention:
302
+
303
+ ```json
304
+ {
304
+   "name": "Disk Usage Check",
305
+   "schedule_cron": "0 * * * *",
306
+   "session_target": "isolated",
307
+   "payload_message": "Check disk usage on all mounted filesystems. If all mounts are under 80% full, reply with exactly: HEARTBEAT_OK\nIf any mount is 80% or more, describe the affected mounts and their usage.",
308
+   "delivery_mode": "announce",
309
+   "delivery_channel": "telegram",
310
+   "delivery_to": "YOUR_CHAT_ID"
311
+ }
313
+ ```
314
+
315
+ With `delivery_mode: "announce"`, the result is only delivered if the agent's response does **not** contain `HEARTBEAT_OK`.
316
+
317
+ Result: zero noise when healthy, immediate alert when something needs attention.
318
+
319
+ The same works for shell jobs — `announce` only triggers on non-zero exit.
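For agent (non-shell) jobs, the suppression rule described above can be sketched as a substring check on the response. The `announce` branch follows the text directly; treating `announce-always` as "always deliver" for agent jobs is an assumption based on the delivery modes table, and the function name is hypothetical.

```javascript
// Illustrative sketch of the HEARTBEAT_OK suppression rule for agent jobs.
// Only the "announce" branch is stated in this guide; the rest is assumed.
function shouldDeliverAgentResult(deliveryMode, responseText) {
  if (deliveryMode === "none") return false;
  if (deliveryMode === "announce-always") return true; // assumption
  if (deliveryMode === "announce") {
    // Delivered only when the response does NOT contain the healthy marker.
    return !responseText.includes("HEARTBEAT_OK");
  }
  throw new Error(`Unknown delivery_mode: ${deliveryMode}`);
}

console.log(shouldDeliverAgentResult("announce", "HEARTBEAT_OK"));
console.log(shouldDeliverAgentResult("announce", "/data is at 91% usage"));
```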
320
+
321
+ ---
322
+
323
+ ### Letting the Agent Create Jobs Dynamically
324
+
325
+ The agent can create new scheduler jobs at runtime in two ways:
326
+
327
+ #### 1. Direct CLI (recommended for user-initiated requests)
328
+
329
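+ When a user asks "remind me to review the PR in 2 hours", the agent runs the following (this example assumes it is currently 15:00 in the job's `schedule_tz`, so the reminder fires at 17:00):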
330
+
331
+ ```bash
332
+ node ~/.openclaw/scheduler/cli.js jobs add '{
333
+   "name": "PR Review Reminder",
334
+   "schedule_cron": "0 17 * * *",
335
+   "session_target": "isolated",
336
+   "payload_message": "Send Jordan a reminder to review the PR they opened this morning. Be specific about which PR.",
337
+   "delivery_mode": "announce",
338
+   "delivery_channel": "telegram",
339
+   "delivery_to": "YOUR_CHAT_ID",
340
+   "delete_after_run": true
341
+ }'
342
+ ```
343
+
344
+ Use `"delete_after_run": true` for one-shot reminders so they clean up after themselves.
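Turning "in 2 hours" into a cron expression means computing the target wall-clock time first. The sketch below does that in UTC, which the changelog lists as the default `schedule_tz` from 0.2.0 onward; it assumes standard five-field cron syntax and a hypothetical helper name. Because the expression pins the day and month, pairing it with `"delete_after_run": true` keeps it from firing again next year. The changelog also mentions one-shot scheduling via `schedule_kind: 'at'` and `schedule_at`, but their exact format is not described here, so this sketch sticks to cron.

```javascript
// Illustrative sketch: build a one-shot cron expression N hours from `now`, in UTC.
// cronForDelay is a hypothetical helper; it assumes five-field cron (m h dom mon dow).
function cronForDelay(now, delayHours) {
  const fireAt = new Date(now.getTime() + delayHours * 60 * 60 * 1000);
  return [
    fireAt.getUTCMinutes(),
    fireAt.getUTCHours(),
    fireAt.getUTCDate(),
    fireAt.getUTCMonth() + 1, // JavaScript months are 0-based, cron months are 1-based
    "*",
  ].join(" ");
}

// 15:00 UTC on 11 March 2026, plus two hours
console.log(cronForDelay(new Date(Date.UTC(2026, 2, 11, 15, 0)), 2)); // "0 17 11 3 *"
```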
345
+
346
+ #### 2. Spawn messages (for jobs creating child jobs at runtime)
347
+
348
+ An isolated job can create new jobs on the fly by sending a spawn message to the scheduler agent:
349
+
350
+ ```json
351
+ {
351
+   "from_agent": "main",
352
+   "to_agent": "scheduler",
353
+   "kind": "spawn",
354
+   "body": "{\"name\":\"Follow-up Analysis\",\"payload_message\":\"Analyze the anomaly found in the previous run and prepare a report.\",\"delete_after_run\":true,\"run_now\":true}"
355
+ }
357
+ ```
358
+
359
+ The dispatcher picks this up on its next tick (within 15s) and creates and immediately runs the spawned job.
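Note that `body` in the spawn message is a JSON string, not a nested object, which is why the example above is full of escaped quotes. Building the message programmatically avoids hand-escaping. The sketch below assumes only the four message fields shown above; `buildSpawnMessage` is a hypothetical helper.

```javascript
// Illustrative sketch: build a spawn message with a JSON-encoded body.
// buildSpawnMessage is hypothetical; the field names come from the example above.
function buildSpawnMessage(fromAgent, jobSpec) {
  return {
    from_agent: fromAgent,
    to_agent: "scheduler",
    kind: "spawn",
    body: JSON.stringify(jobSpec), // the job spec travels as a string
  };
}

const spawnMessage = buildSpawnMessage("main", {
  name: "Follow-up Analysis",
  payload_message: "Analyze the anomaly found in the previous run and prepare a report.",
  delete_after_run: true,
  run_now: true,
});
console.log(JSON.stringify(spawnMessage, null, 2));
```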
360
+
361
+ ---
362
+
363
+ ### Communicating Between Jobs
364
+
365
+ Jobs can pass data to each other using the inter-agent message queue. Messages injected into a job's context appear in the `--- Pending Messages ---` block at the top of the prompt.
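To make the block format concrete, the sketch below renders a list of messages in the same layout as the `--- Pending Messages ---` example shown earlier in this guide. The message shape (`from`, `kind`, `body`) and the choice to emit nothing for an empty inbox are assumptions; the real dispatcher may format edge cases differently.

```javascript
// Illustrative sketch: render queued messages in the layout shown in this guide.
// The { from, kind, body } shape is a hypothetical stand-in for a message row.
function renderPendingMessages(messages) {
  if (messages.length === 0) return ""; // assumption: no block when the inbox is empty
  const lines = messages.map((m) => `From: ${m.from} | ${m.kind} | ${m.body}`);
  return ["--- Pending Messages ---", ...lines, "---"].join("\n");
}

console.log(
  renderPendingMessages([
    { from: "scheduler", kind: "result", body: "Previous backup: 3 files committed, pushed to origin" },
  ])
);
```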
366
+
367
+ **Pattern: Monitor → Handler**
368
+
369
+ Monitor job finds an anomaly and sends a task to the handler agent:
370
+
371
+ ```bash
372
+ # In the monitor's payload_message, instruct the agent to:
373
+ # "If you find any ERROR entries in the log, send a message to agent 'main'
374
+ # with kind='task' and body='<details of the error>'"
375
+ ```
376
+
377
+ The handler job reads its inbox automatically at the start of its next run.
378
+
379
+ **Pattern: Job A → Job B via message queue**
380
+
381
+ ```bash
382
+ # Send a message from one job to another
383
+ openclaw-scheduler msg send monitor-agent handler-agent "Found 3 critical errors at 14:23"
384
+ ```
385
+
386
+ ---
387
+
388
+ ### Tracking Spawned Sub-Agents
389
+
390
+ When you spawn sub-agents to do parallel work, use the task tracker so the scheduler monitors them and delivers a completion summary automatically — without you polling.
391
+
392
+ **Step 1: Create a tracker before spawning**
393
+
394
+ ```bash
395
+ # In your agent session, before spawning:
396
+ TRACKER_ID=$(node ~/.openclaw/scheduler/cli.js tasks create '{
397
+   "name": "doc-sprint",
398
+   "expectedAgents": ["writer", "reviewer"],
399
+   "timeoutS": 3600,
400
+   "deliveryChannel": "telegram",
401
+   "deliveryTo": "YOUR_CHAT_ID"
402
+ }' | grep '"id"' | cut -d'"' -f4)
403
+ ```
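The `grep`/`cut` pipeline above depends on how the output is laid out line by line. If you capture the command's output in a Node.js script instead, parsing it as JSON is less fragile. The sketch below assumes the captured output is a single JSON object with a top-level `id` field, which is an inference from the pipeline above rather than a documented contract; `extractTrackerId` is a hypothetical helper.

```javascript
// Illustrative sketch: pull the tracker id out of captured CLI output.
// Assumes the output is one JSON object with a top-level "id" (not confirmed here).
function extractTrackerId(cliOutput) {
  const parsed = JSON.parse(cliOutput);
  if (typeof parsed.id !== "string" || parsed.id.length === 0) {
    throw new Error("No tracker id found in CLI output");
  }
  return parsed.id;
}

// Hypothetical output, for illustration only
console.log(extractTrackerId('{ "id": "trk-123", "name": "doc-sprint" }'));
```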
404
+
405
+ **Step 2: Spawn sub-agents (via `sessions_spawn` tool), then register their session keys**
406
+
407
+ ```bash
408
+ # After getting the childSessionKey from sessions_spawn:
409
+ node ~/.openclaw/scheduler/cli.js tasks register-session $TRACKER_ID writer "agent:main:subagent:abc-123"
410
+ node ~/.openclaw/scheduler/cli.js tasks register-session $TRACKER_ID reviewer "agent:main:subagent:def-456"
411
+ ```
412
+
413
+ Once session keys are registered, the **dispatcher auto-detects heartbeats** by calling `sessions_list` every 30s. As long as the sub-agent's session is active, it's counted as alive — no CLI calls required from inside the sub-agent.
414
+
415
+ **Step 3: Sub-agents report completion (optional but recommended)**
416
+
417
+ Add this to the sub-agent's task preamble:
418
+
419
+ ```
420
+ ## Status Reporting
421
+ Tracker ID: <TRACKER_ID>
422
+ Your agent label: writer
423
+
424
+ When you start working:
425
+ node ~/.openclaw/scheduler/cli.js tasks heartbeat <TRACKER_ID> writer running
426
+
427
+ When you finish successfully:
428
+ node ~/.openclaw/scheduler/cli.js tasks heartbeat <TRACKER_ID> writer completed "Brief summary of what you did"
429
+
430
+ If something goes wrong:
431
+ node ~/.openclaw/scheduler/cli.js tasks heartbeat <TRACKER_ID> writer failed "What went wrong"
432
+ ```
433
+
434
+ **What happens automatically:**
435
+ - Dispatcher checks active sessions every 30s — agents with active sessions stay "alive"
436
+ - Agents that go silent for > 5 minutes AND whose tracker has timed out → marked dead
437
+ - When all agents reach terminal state → delivery summary sent to your configured channel
438
+ - Check anytime: `openclaw-scheduler tasks status <TRACKER_ID>`
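The liveness rules above combine two conditions with AND: an agent is only marked dead when it has been silent for more than 5 minutes and its tracker has also timed out. A rough sketch of that logic follows. The field names (`lastSeenMs`, `createdAtMs`, `timeoutS`, `status`) and the set of terminal states are assumptions chosen for illustration, not the scheduler's schema.

```javascript
// Illustrative sketch of the tracker liveness rules, under assumed field names.
const SILENCE_LIMIT_MS = 5 * 60 * 1000; // "silent for > 5 minutes"

function isAgentDead(agent, tracker, nowMs) {
  const silentTooLong = nowMs - agent.lastSeenMs > SILENCE_LIMIT_MS;
  const trackerTimedOut = nowMs - tracker.createdAtMs > tracker.timeoutS * 1000;
  return silentTooLong && trackerTimedOut; // both conditions must hold
}

function allAgentsTerminal(agents) {
  // Assumed terminal states: the two reported via heartbeat, plus "dead".
  const terminal = ["completed", "failed", "dead"];
  return agents.every((a) => terminal.includes(a.status));
}

const tracker = { createdAtMs: 0, timeoutS: 3600 };
// Silent for 10 minutes, but the one-hour tracker timeout has not elapsed yet
console.log(isAgentDead({ lastSeenMs: 0 }, tracker, 10 * 60 * 1000));
```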
439
+
440
+ **Detecting sub-agents from other sessions:**
441
+
442
+ The scheduler's dispatcher calls `sessions_list` with `kinds: ['subagent']` which returns **all sub-agent sessions across all requesters** — not just the current session. This means:
443
+ - Sub-agents spawned from any session are visible to the task tracker
444
+ - Works even if the spawning session has ended
445
+ - Works across session compaction / context resets
446
+
447
+ ---
448
+
449
+ ### Practical Agent Briefing Example
450
+
451
+ Here's a complete, self-contained entry to add to your workspace `MEMORY.md` or context file. Copy and adapt it:
452
+
453
+ ```markdown
454
+ ## OpenClaw Scheduler — How to Use
455
+
456
+ The scheduler (`~/.openclaw/scheduler/`) runs as a background service (launchd / systemd)
457
+ and fires jobs independently of your chat sessions.
458
+
459
+ ### Quick commands
460
+ - Status: `openclaw-scheduler status`
461
+ - List jobs: `openclaw-scheduler jobs list`
462
+ - Add job: `openclaw-scheduler jobs add '<json>'`
463
+ - View runs: `openclaw-scheduler runs list <job-id>`
464
+ - Force run now: `openclaw-scheduler jobs run <id>`
465
+ - Logs: `tail -f /tmp/openclaw-scheduler.log`
466
+
467
+ > **Warning:** Avoid direct `sqlite3 ... UPDATE jobs SET next_run_at` statements against the live database. SQLite WAL locking can conflict with the running scheduler process and cause SQLITE_BUSY errors or stale reads. To force a run, use `openclaw-scheduler jobs run <id>` instead -- it enqueues through the dispatch queue safely.
468
+
469
+ ### When you receive a scheduled prompt
470
+ You're in an isolated session. No conversation history. No context from prior chats.
471
+ Read the `[scheduler:...]` header to identify the job, then do exactly what `payload_message` says.
472
+ - Reply `HEARTBEAT_OK` for watchdog/health-check jobs when nothing needs attention (suppresses delivery)
473
+ - For analysis/report jobs, write a concise summary — it's stored as the run record
474
+ - For shell jobs, you're not invoked at all — the command runs directly
475
+
476
+ ### Job types cheatsheet
477
+ | Type | Use for |
478
+ |------|---------|
479
+ | `shell` | Scripts, backups, pings — fast, no LLM, runs even when gateway is down |
480
+ | `isolated` | AI tasks needing tools, memory, reasoning — each run gets fresh context |
481
+ | `main` | Urgent alerts that must appear in your active chat (use sparingly) |
482
+
483
+ ### Creating jobs
484
+ When asked to set up a scheduled task, use:
485
+ ~~~bash
486
+ node ~/.openclaw/scheduler/cli.js jobs add '{
487
+   "name": "Task Name",
488
+   "schedule_cron": "0 9 * * 1-5",
489
+   "session_target": "isolated",
490
+   "payload_message": "Your clear, specific instructions here.",
491
+   "delivery_mode": "announce",
492
+   "delivery_channel": "telegram",
493
+   "delivery_to": "YOUR_CHAT_ID"
494
+ }'
495
+ ~~~
496
+ ```
497
+
498
+ ---
499
+
500
+ ## See Also
501
+
502
+ - [README.md](README.md) — Full feature reference
503
+ - [INSTALL.md](INSTALL.md) — macOS installation
504
+ - [INSTALL-LINUX.md](INSTALL-LINUX.md) — Linux installation
505
+ - [INSTALL-WINDOWS.md](INSTALL-WINDOWS.md) — Windows installation
506
+ - [UNINSTALL.md](UNINSTALL.md) — Removing the scheduler
package/CHANGELOG.md ADDED
@@ -0,0 +1,82 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ ## [Unreleased]
6
+
7
+ ### Fixed
8
+ - fix(watcher): exit cleanly when session status=done (PR #1)
9
+ - fix(watchdog): prevent auto-resolving active sessions with heartbeat + hard ceiling (PR #2)
10
+ - fix(gateway): reset idle timer while fetch is in flight (PR #3)
11
+ - fix(watcher): prevent premature kill of active subagent sessions with JSONL activity signal (PR #7)
12
+
13
+ ### Added
14
+ - feat: v0.2 runtime with identity/trust/authorization/evidence/credential handoff (PR #4)
15
+ - feat: x-openclaw-env-inject header for agent task credentials (PR #5)
16
+ - docs: trust architecture, multi-agent gateway routing, agent adoption files
17
+
18
+ ### Changed
19
+ - chore: replace non-ASCII characters with ASCII equivalents (PR #6)
20
+
21
+ ## [0.2.0] -- 2026-03-11
22
+
23
+ ### Added
24
+ - Strategy pattern refactor: decomposed 614-line `dispatchJob` closure into explicit `DispatchContext` + strategy functions (`prepareDispatch`, `executeStrategy`, `finalizeDispatch`) in new `dispatcher-strategies.js`
25
+ - Auth profile resolution for isolated agent turns: `auth_profile` field on jobs supports `'inherit'` (looks up main session profile) or explicit `'provider:label'`
26
+ - Drain-error retry: transient infrastructure errors (HTTP 529) bypass normal retry ladder and re-enqueue immediately
27
+ - One-shot `at`-style scheduling via `schedule_kind: 'at'` and `schedule_at` fields (schema v18)
28
+ - Complete TypeScript type coverage: 26 previously missing function signatures, 4 corrected return types, 51 missing schema columns added to `index.d.ts`
29
+ - Expanded type smoke tests from 23 to 192+ lines exercising all typed APIs
30
+ - 5 new test coverage areas: dispatcher-utils, dispatch-queue lifecycle, approval timeout/prune/count, run session/context, prompt-context edge cases
31
+ - `idempotency`, `taskTracker`, and `teamAdapter` modules now exported from `index.js` for programmatic consumers
32
+
33
+ ### Fixed
34
+ - `updateJobAfterRun` null guard prevents crash when job is deleted mid-dispatch
35
+ - Shell timeout and retry exhaustion handling corrected
36
+ - Boolean job flags normalized for SQLite writes
37
+ - Numeric enabled flags treated as disabled on create
38
+ - Child jobs can no longer self-fire as autonomous one-shot schedules; due selectors are root-only
39
+ - Disabled future one-shot jobs are no longer pruned before they ever run
40
+ - Consolidation migration now backfills partial legacy message/task-tracker tables without noisy fallback errors
41
+
42
+ ### Changed
43
+ - Default `schedule_tz` changed from `America/New_York` to `UTC` in schema, validation, and setup
44
+ - `--json` mode wired through all CLI subcommands (msg, tasks, team, queue, idem) via `emit()`/`fail()` helpers
45
+ - Dispatch subsystem portability: `process.execPath` replaces bare `node`, `__dirname`-relative paths replace hardcoded install paths
46
+ - Dispatcher reduced from ~1200 lines to ~850 lines; `dispatchJob` is now a 5-line orchestrator (strategy code lives in `dispatcher-strategies.js`)
47
+ - `buildDispatchDeps()` wires 36+ dependencies via dependency injection
48
+ - Full validation gate moved into local verification commands (`npm run verify:local` / `npm run verify:smoke`); GitHub Actions now runs a single lightweight smoke job
49
+ - Test baseline updated to 1410 passed
50
+ - Schema baseline is now v23
51
+
52
+ ## [0.1.0] -- 2026-03-08
53
+
54
+ First public release.
55
+
56
+ ### Added
57
+ - Watchdog job type for long-running task monitoring, including dedicated watchdog fields, CLI support, dispatcher handling, and config example scaffolding
58
+ - Durable dispatch queue for manual runs, retries, and chain-triggered executions, with persisted run causality via `dispatch_queue_id` and `triggered_by_run`
59
+ - Structured shell result persistence on runs: exit code, signal, timeout flag, stdout, and stderr
60
+ - Richer shell-failure context for triggered follow-up jobs and agent triage flows
61
+ - CLI improvements for machine use and release readiness, including `--json`, `jobs validate`, schema introspection, and improved npm-install defaults
62
+ - Safe typed root exports for programmatic tooling (`index.js` + `index.d.ts`)
63
+
64
+ ### Fixed
65
+ - Shell timeouts are now classified correctly as `timeout`, with `shell_timed_out` persisted on runs
66
+ - Shell retries now exhaust correctly and fire failure children only after the retry ladder is complete
67
+ - Consolidated migration skip logic now checks for actual column presence instead of relying on version markers alone
68
+ - Runtime startup version logging now reads from `package.json` instead of a stale hardcoded string
69
+ - Public-facing docs/examples no longer include private hostnames or deployment-specific Telegram identifiers
70
+ - Node 20 compatibility by removing runtime dependence on `node:sqlite` and JSON import attributes
71
+
72
+ ### Changed
73
+ - Schema baseline is now `v14`
74
+ - Added execution-intent fields, queue / approval / fan-out caps, shell-output offloading, and runtime budget visibility
75
+ - Tightened ESLint rules, added TypeScript declaration smoke tests, and enforced global coverage floors
76
+ - Extracted dispatcher approvals, delivery, maintenance, and shell helpers into dedicated modules
77
+ - Versioning reset to `0.1.0` as the first public release
78
+ - Updated verification baseline to `581 passed, 0 failed`
79
+
80
+ ## Pre-release
81
+
82
+ Internal development versions consolidated into 0.1.0. See git history for details.
package/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,22 @@
1
+ # Code of Conduct
2
+
3
+ ## Our Standard
4
+
5
+ Participate professionally and constructively.
6
+
7
+ Expected behavior:
8
+
9
+ - focus on technical substance
10
+ - keep feedback specific and actionable
11
+ - respect other contributors
12
+
13
+ Unacceptable behavior:
14
+
15
+ - harassment
16
+ - personal attacks
17
+ - discriminatory language
18
+ - deliberately disruptive conduct
19
+
20
+ ## Enforcement
21
+
22
+ Project maintainers may remove comments, issues, or contributions that violate this standard.
package/CONTEXT.md ADDED
@@ -0,0 +1,26 @@
1
+ # Context
2
+
3
+ ## Problem
4
+
5
+ Scheduled agent and shell workflows need durability: run history, retries,
6
+ approval gates, delivery, triggered chains, and an audit trail. Built-in
7
+ cron/heartbeat does not provide these.
8
+
9
+ ## Repo Position
10
+
11
+ `openclaw-scheduler` is the durable runtime. It sits below the control plane
12
+ (`agentcli`) and beside the gateway (`openclaw`).
13
+
14
+ - It can work standalone with jobs created by agents or operators via CLI.
15
+ - It can be driven by `agentcli` for declarative manifest-based workflows.
16
+ - It dispatches to the OpenClaw gateway for agent sessions.
17
+ - It runs shell jobs directly without the gateway.
18
+
19
+ ## Design Bias
20
+
21
+ - scheduling and state in SQLite (single-file, no external services)
22
+ - shell jobs are first-class, not second-class to agent jobs
23
+ - delivery is channel-agnostic (Telegram, Discord, WhatsApp, Signal, iMessage, Slack)
24
+ - run_timeout_ms is required on every job (no indefinite runs)
25
+ - overlap, retry, and delivery guarantee are per-job configuration
26
+ - keep the scheduler process stateless between ticks (all state in the DB)