openclaw-scheduler 0.2.0
- package/AGENTS.md +302 -0
- package/BEST-PRACTICES.md +506 -0
- package/CHANGELOG.md +82 -0
- package/CODE_OF_CONDUCT.md +22 -0
- package/CONTEXT.md +26 -0
- package/CONTRIBUTING.md +73 -0
- package/IMPLEMENTATION_SPEC.md +170 -0
- package/INSTALL-ADDITIONAL-HOST.md +333 -0
- package/INSTALL-LINUX.md +419 -0
- package/INSTALL-WINDOWS.md +305 -0
- package/INSTALL.md +364 -0
- package/JOB-QUICK-REF.md +222 -0
- package/LICENSE +21 -0
- package/QUICK-START.md +256 -0
- package/README.md +2170 -0
- package/SECURITY.md +34 -0
- package/UNINSTALL.md +129 -0
- package/UPGRADING.md +436 -0
- package/agents.js +67 -0
- package/approval.js +107 -0
- package/backup.js +390 -0
- package/bin/openclaw-scheduler.js +138 -0
- package/cli.js +1083 -0
- package/db.js +122 -0
- package/dispatch/529-recovery.mjs +204 -0
- package/dispatch/README.md +372 -0
- package/dispatch/config.example.json +24 -0
- package/dispatch/deliver-watcher.sh +57 -0
- package/dispatch/hooks.mjs +171 -0
- package/dispatch/index.mjs +1836 -0
- package/dispatch/watcher.mjs +1396 -0
- package/dispatch-queue.js +112 -0
- package/dispatcher-approvals.js +96 -0
- package/dispatcher-delivery.js +43 -0
- package/dispatcher-maintenance.js +242 -0
- package/dispatcher-shell.js +29 -0
- package/dispatcher-strategies.js +1280 -0
- package/dispatcher-utils.js +81 -0
- package/dispatcher.js +855 -0
- package/docs/adr-schedule-ownership.md +73 -0
- package/docs/gateway-contract.md +904 -0
- package/docs/plans/2026-03-09-fix-typescript-types.md +91 -0
- package/docs/plans/2026-03-09-test-coverage-gaps.md +83 -0
- package/docs/plans/2026-03-10-dispatcher-refactor.md +801 -0
- package/docs/trust-architecture.md +266 -0
- package/gateway.js +473 -0
- package/idempotency.js +119 -0
- package/index.d.ts +864 -0
- package/index.js +17 -0
- package/jobs.js +1224 -0
- package/messages.js +357 -0
- package/migrate-consolidate.js +694 -0
- package/migrate.js +125 -0
- package/package.json +130 -0
- package/paths.js +79 -0
- package/prompt-context.js +94 -0
- package/retrieval.js +176 -0
- package/runs.js +270 -0
- package/scheduler-schema.js +101 -0
- package/schema.sql +480 -0
- package/scripts/dispatch-cli-utils.mjs +65 -0
- package/scripts/inbox-consumer.mjs +288 -0
- package/scripts/stuck-detector.sh +18 -0
- package/scripts/stuck-run-detector.mjs +333 -0
- package/scripts/telegram-webhook-check.mjs +238 -0
- package/setup.mjs +724 -0
- package/shell-result.js +214 -0
- package/task-tracker.js +300 -0
- package/team-adapter.js +335 -0
- package/v02-runtime.js +599 -0

package/BEST-PRACTICES.md
ADDED
@@ -0,0 +1,506 @@
# Best Practices

Guidance on choosing the right job type, writing effective prompts, structuring workflows, and making your OpenClaw agent scheduler-aware.

---

## Table of Contents

1. [Choosing the Right Job Type](#choosing-the-right-job-type)
   - [Session Targets at a Glance](#session-targets-at-a-glance)
   - [When to use `shell`](#when-to-use-shell)
   - [When to use `isolated`](#when-to-use-isolated)
   - [When to use `main`](#when-to-use-main)
   - [Chains vs Standalone](#chains-vs-standalone)
   - [Delivery Modes](#delivery-modes)
   - [Timeouts and Reliability](#timeouts-and-reliability)
2. [Integrating with Your OpenClaw Agent](#integrating-with-your-openclaw-agent)
   - [What the Agent Needs to Know](#what-the-agent-needs-to-know)
   - [Adding the Scheduler to Agent Memory](#adding-the-scheduler-to-agent-memory)
   - [How Scheduled Jobs Appear to the Agent](#how-scheduled-jobs-appear-to-the-agent)
   - [The HEARTBEAT_OK Convention](#the-heartbeat_ok-convention)
   - [Letting the Agent Create Jobs Dynamically](#letting-the-agent-create-jobs-dynamically)
   - [Communicating Between Jobs](#communicating-between-jobs)
   - [Tracking Spawned Sub-Agents](#tracking-spawned-sub-agents)
   - [Practical Agent Briefing Example](#practical-agent-briefing-example)

---

## Choosing the Right Job Type

### Session Targets at a Glance

| Use Case | Target | Why |
|----------|--------|-----|
| Backups, scripts, log rotation, file cleanup | `shell` | No LLM cost, no gateway dependency, runs even if the gateway is down |
| Morning briefings, reports, analysis, summaries | `isolated` | Gets full LLM + tools, isolated from your conversations |
| Urgent alerts that must appear in active chat | `main` | Injects directly into your live session |
| Build → deploy → notify pipelines | Chain of `isolated` jobs | Each step gets fresh context; a failure stops downstream `trigger_on: "success"` steps |

---

### When to use `shell`

Use `shell` when:

- The task is deterministic — a script, a health check ping, a disk usage check, a backup
- You don't need AI reasoning
- The job must run even when the gateway is down or rate-limited
- Speed matters: a shell job adds no model latency, while an LLM call typically takes 10–60s
- You want to minimize API cost

`payload_message` is passed directly to the shell (`/bin/zsh` on macOS, `/bin/bash` on Linux). Override it with the `SCHEDULER_SHELL` environment variable.

**Examples:**

```json
{ "session_target": "shell", "payload_message": "~/scripts/backup.sh" }

{ "session_target": "shell", "payload_message": "df -h / | grep -E '^/dev' | awk '{print $5}'" }

{ "session_target": "shell", "payload_message": "curl -sf http://localhost:8080/health || exit 1" }

{ "session_target": "shell", "payload_message": "cd ~/myapp && git pull && npm run build 2>&1" }
```

**Delivery with shell jobs:**
- `delivery_mode: "announce"` — sends output only on non-zero exit (perfect for failure alerts)
- `delivery_mode: "announce-always"` — sends output every time
- `delivery_mode: "none"` — background, check runs via `openclaw-scheduler runs list`
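
The `df` example above always exits 0, so under `announce` it would never deliver anything. Below is a sketch of a failure-alert script that suits `announce`. It prints only the mounts at or above a threshold and exits 1 when it finds any. The function name `disk_alert` and the 80% default are illustrative assumptions. The function parses POSIX `df -P` output, where column 5 is the use percentage and column 6 is the mount point. Like the example above, it only considers `/dev/...` filesystems.

```bash
#!/usr/bin/env bash
# disk_alert THRESHOLD: read `df -P` output on stdin, print each /dev/... mount
# at or above THRESHOLD percent, and exit 1 if any were found (0 otherwise).
# Hypothetical helper; mount points containing spaces are not handled.
disk_alert() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    # Skip the header row and non-device filesystems (devfs, tmpfs, map ...).
    NR > 1 && $1 ~ /^\/dev\// {
      pct = $5
      sub(/%/, "", pct)              # "85%" -> "85"
      if (pct + 0 >= t + 0) {
        printf "%s at %s%%\n", $6, pct
        found = 1
      }
    }
    END { exit (found ? 1 : 0) }
  '
}

# As the last line of a job script (e.g. ~/scripts/disk-alert.sh):
#   df -P | disk_alert 80
```

To use it as a job, save the function in a script that ends with `df -P | disk_alert 80`, then point `payload_message` at that script. The script path is a placeholder. A healthy run prints nothing and exits 0, so `announce` stays quiet until a disk fills up.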

**Shell jobs work great as chain parents too** — a shell build job can trigger an isolated deploy job on success.

#### Shell Daemon Keepalive Jobs — Critical Rules

When using shell jobs to keep a background daemon (e.g. a model server, a vector DB, an MCP bridge) warm and running, three rules prevent silent failures:

**Rule 1: The daemon wrapper script must block.**

If your launchd/systemd service runs a wrapper script that sets up the daemon and then exits, the service manager restarts the script immediately -- potentially spawning multiple daemon instances fighting for the same port, causing hangs or crashes.

```bash
# Wrong -- script exits after setup, KeepAlive loops it immediately
./bridge.py daemon start   # starts daemon, initializes session, exits
./bridge.py warmup         # exits -> service restarts -> another daemon spawns

# Correct -- start daemon in background, block on its PID
./mydaemon --port 8181 &
DAEMON_PID=$!
./bridge.py daemon init    # session setup (runs once)
./bridge.py warmup         # pre-warm (runs once)
wait "$DAEMON_PID"         # blocks until daemon dies -> then service restarts cleanly
```

On macOS (`KeepAlive: true` launchd, whether LaunchAgent or LaunchDaemon) and Linux (`Restart=always` systemd), the moment your script exits, the service manager restarts it. Always block on the daemon process with `wait $PID`.
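
The final `wait` can be wrapped in a small helper that also behaves well when the service manager stops the job. This is a sketch, not part of the scheduler, and `block_on` is a hypothetical name. The helper forwards `SIGTERM`/`SIGINT` to the daemon so that a stop request is not lost. It keeps waiting after the signal and returns the daemon's own exit status only once the daemon has actually exited.

```bash
#!/usr/bin/env bash
# block_on PID: block until the background daemon PID (a child of this shell)
# exits, forwarding SIGTERM/SIGINT to it, and return the daemon's exit status.
# Hypothetical helper for launchd/systemd wrapper scripts.
block_on() {
  local pid="$1" status=0

  # Pass stop requests from the service manager on to the daemon.
  trap 'kill -TERM "$pid" 2>/dev/null' TERM INT

  wait "$pid" || status=$?
  # A trapped signal makes `wait` return early (status > 128), so keep
  # waiting until the daemon itself is gone and collect its real status.
  while kill -0 "$pid" 2>/dev/null; do
    status=0
    wait "$pid" || status=$?
  done

  trap - TERM INT
  return "$status"
}

# Wrapper script usage, mirroring the "Correct" example above:
#   ./mydaemon --port 8181 &
#   DAEMON_PID=$!
#   ./bridge.py daemon init
#   ./bridge.py warmup
#   block_on "$DAEMON_PID"
```

Without the trap, a bare `wait` can return on the stop signal while the daemon is still shutting down. The loop makes the wrapper exit only after the daemon does, which avoids two instances briefly overlapping on the same port.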

**Rule 2: Keepalive jobs must exercise the actual hot path.**

A keepalive that pings `/health` or runs a lightweight query does NOT keep ML/neural models warm in GPU memory. Only calls that trigger real model inference keep the models loaded.

```bash
# Wrong -- lightweight health check, no model loading, GPU models go cold
curl -sf http://localhost:8181/health

# Correct -- inference query triggers embedding + reranker model loading
curl -sf http://localhost:8181/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"input":"keepalive","model":"default"}' > /dev/null
```

Schedule model-warming keepalives at least every 10 minutes. Every 30 minutes is too infrequent -- models unload from GPU between calls, causing cold-start delays (10-20s) on the next real query. The right interval ultimately depends on how quickly your server unloads idle models, so keep it shorter than that idle timeout.
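
A keepalive job that follows Rule 2 might be defined like the sketch below. The endpoint, port, and model name are the same placeholders used above. The `--max-time 30`, the 60s timeout, and the single retry are assumptions you should tune. `*/10 * * * *` fires every 10 minutes. With `announce`, you only hear about the job when the inference call fails: `curl -f` exits non-zero on HTTP errors, and `-S` keeps the error message in the output.

```json
{
  "name": "Embedding Server Keepalive",
  "schedule_cron": "*/10 * * * *",
  "session_target": "shell",
  "payload_message": "curl -sSf --max-time 30 http://localhost:8181/v1/embeddings -H 'Content-Type: application/json' -d '{\"input\":\"keepalive\",\"model\":\"default\"}' > /dev/null",
  "delivery_mode": "announce",
  "delivery_channel": "telegram",
  "delivery_to": "YOUR_CHAT_ID",
  "run_timeout_ms": 60000,
  "max_retries": 1
}
```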

**Rule 3: Daemon session/state files must not live in `/tmp`.**

`/tmp` is cleared on every macOS reboot. On many Linux systems it is either a tmpfs or is cleaned at boot, and `systemd-tmpfiles` may also clean it periodically. If your daemon stores a session ID or token in `/tmp`, any reboot causes the next client call to fail — often silently (e.g. `400 Already Initialized`, empty results, or a crash with no stdout).

```bash
# ❌ Wrong — lost on reboot, daemon calls fail silently after restart
SESSION_FILE=/tmp/my-daemon-session.json

# ✅ Correct — persistent across reboots
SESSION_FILE=${XDG_CACHE_HOME:-$HOME/.cache}/my-daemon/session.json
```

Always store daemon session files in `XDG_CACHE_HOME` or another directory that persists across reboots.
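
A small helper can resolve that location and create the directory before the daemon first writes to it. This is a sketch that assumes a per-daemon subdirectory under `XDG_CACHE_HOME`, falling back to `~/.cache`. The `session_file` function and the `my-daemon` name are illustrative. The helper restricts the directory to its owner because session files often hold tokens.

```bash
#!/usr/bin/env bash
# session_file NAME: print a reboot-safe path for daemon NAME's session file,
# creating the parent directory (owner-only permissions) if it is missing.
# Hypothetical helper; adjust the layout to whatever your daemon expects.
session_file() {
  local name="$1"
  local dir="${XDG_CACHE_HOME:-$HOME/.cache}/$name"
  mkdir -p "$dir" || return 1
  chmod 700 "$dir" || return 1
  printf '%s/session.json\n' "$dir"
}

# Usage in a wrapper script:
#   SESSION_FILE=$(session_file my-daemon)
```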

---

### When to use `isolated`

Use `isolated` when:

- The task requires reasoning, writing, planning, or multi-step tool use
- The task reads or writes memory files
- You want OpenClaw's tools available (kubectl, browser, file access, exec, etc.)
- The output should be formatted and delivered to a channel

**Writing effective `isolated` job prompts:**

| Rule | Bad | Good |
|------|-----|------|
| Be imperative and specific | "check kubernetes" | "Check k8s pods in requesthub-prod and requesthub-dev. List any non-Running pods." |
| Include a success signal | *(nothing)* | "If all pods Running, reply with exactly: HEARTBEAT_OK" |
| Specify output format | *(nothing)* | "Format issues as: ⚠️ \<namespace\>/\<pod\>: \<status\>" |
| State available resources | *(implicit)* | "Your memory files are in ~/.openclaw/workspace/memory/" |
| Set realistic timeouts | default 300s | 120s for single-tool, 240s for multi-tool |

**Bad prompt:**
```
Check everything and let me know if anything is wrong
```

**Good prompt:**
```
Check k8s pod health across requesthub-prod and requesthub-dev namespaces.
List any non-Running pods. If all pods are Running, reply with exactly: HEARTBEAT_OK
Format any issues as: ⚠️ <namespace>/<pod>: <status>
```

**Another good prompt (morning briefing):**
```
Read today's daily log in ~/.openclaw/workspace/memory/ (files are named by date, e.g. 2026-02-26.md).
Summarize: what was completed, what's in progress, any blockers.
Format as a 3-section bullet list: Done / In Progress / Blocked.
Keep it under 200 words.
```

---

### When to use `main`

Use `main` sparingly. It injects directly into your active agent session — the same conversation you're having with your agent right now.

**Good uses:**
- Long-running task check-ins: "You've been running for 30 minutes — report status"
- Alerts that should appear in your live chat, not delivered separately to a channel
- Jobs that genuinely need your ongoing conversation context (rare)

**Bad uses:**
- Cron jobs that run overnight — they'll clutter your session history when you wake up
- Anything that can be delivered to a channel instead

---

### Chains vs Standalone

**Use standalone jobs for:**
- Independent recurring tasks (morning briefing, daily backup)
- Tasks with no dependencies on other job results
- Anything where a failure shouldn't block anything else

**Use chains for:**
- Pipelines: build → test → deploy → notify
- Conditional work: monitor → (only if ALERT found) → escalate
- Post-processing: analyze → (only if anomaly) → deep-dive
- Cleanup that runs regardless: build → (always) → cleanup temp files

**Good chain example:**

```
[Daily CI Check — 9am]
  └─ [Deploy Staging — trigger:success]
       └─ [Smoke Test — trigger:success, delay:60s]
            ├─ [Notify Team — trigger:complete]  ← runs regardless of pass/fail
            └─ [Rollback Staging — trigger:failure]
```

**Key chain tips:**
- Use `trigger_on: "complete"` (not `"success"`) for notification/cleanup jobs that should always run
- Use `trigger_delay_s` to give services time to start before smoke tests
- Use `trigger_condition: "contains:ALERT"` to trigger escalation only when the monitor actually finds something
- Set `max_retries: 1` or `2` on the first job in a chain — transient failures shouldn't kill the whole pipeline
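
One way to reason about these rules is to model a child's firing decision as a small predicate. This is an illustrative sketch of the behavior described on this page, not the dispatcher's actual code. The function name and the `success`/`failure` outcome labels are assumptions. Delays and retries are left out, and `contains:` is the only condition form modeled.

```bash
#!/usr/bin/env bash
# child_fires TRIGGER_ON PARENT_OUTCOME PARENT_OUTPUT [TRIGGER_CONDITION]
# Returns 0 if a chained child would fire, following the rules on this page.
# Illustrative model only (not the dispatcher's code); PARENT_OUTCOME is
# "success" or "failure", and only "contains:" conditions are modeled.
child_fires() {
  local on="$1" outcome="$2" output="$3" condition="${4:-}"

  case "$on" in
    success)  [[ "$outcome" == success ]] || return 1 ;;
    failure)  [[ "$outcome" == failure ]] || return 1 ;;
    complete) ;;                 # fires after either outcome
    *)        return 1 ;;
  esac

  case "$condition" in
    "")         return 0 ;;      # no condition: trigger_on alone decides
    contains:*) [[ "$output" == *"${condition#contains:}"* ]] ;;
    *)          return 1 ;;      # other condition forms are not modeled
  esac
}
```

In this model, the escalation step in "monitor → (only if ALERT found) → escalate" corresponds to `child_fires success success "$monitor_output" "contains:ALERT"`. The call succeeds only when the monitor's output actually mentions `ALERT`.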

---

### Delivery Modes

| Mode | Best for |
|------|----------|
| `none` | Background jobs — check results via `openclaw-scheduler runs list` |
| `announce` | LLM jobs with important output; shell jobs that should alert on failure |
| `announce-always` | Monitoring/audit jobs where you want every result |

**Don't** use `announce-always` on high-frequency jobs (e.g. every 5 minutes) unless you want a constant stream of messages. Save it for hourly or less frequent jobs, or jobs where the output is always relevant.

**The `announce` + `HEARTBEAT_OK` pattern** is the most useful combination: zero noise when healthy, immediate delivery when something needs attention. See [The HEARTBEAT_OK Convention](#the-heartbeat_ok-convention).

---

### Timeouts and Reliability

| Job type | Recommended `run_timeout_ms` | Notes |
|----------|------------------------------|-------|
| Simple shell script | 60,000 (60s) | Default 300s is usually too generous |
| LLM job, single tool | 120,000 (120s) | |
| LLM job, multi-tool (k8s + files + analysis) | 240,000 (240s) | |
| Long-running agent work | 600,000 (600s) | |

Set `max_retries: 1` or `max_retries: 2` for any job that hits external services (APIs, databases, GitHub, etc.) — transient failures happen, especially at cron-job-o'clock when everything runs at once.

**Retry tip:** Failure chain children don't fire until all retries are exhausted. This prevents false failure alerts on transient errors. Set retries before worrying about failure notifications.
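
The sketch below applies the table and the retry advice to a shell job that calls an external service. The URL is a placeholder. `--max-time 20` keeps curl well inside the 60s job timeout, so a hung connection fails fast and can be retried. Every field used here appears elsewhere on this page.

```json
{
  "name": "External API Check",
  "schedule_cron": "*/15 * * * *",
  "session_target": "shell",
  "payload_message": "curl -sSf --max-time 20 https://api.example.com/health",
  "run_timeout_ms": 60000,
  "max_retries": 2,
  "delivery_mode": "announce",
  "delivery_channel": "telegram",
  "delivery_to": "YOUR_CHAT_ID"
}
```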

---

## Integrating with Your OpenClaw Agent

Each isolated job starts your agent in a fresh context with no conversation history. This section covers how to make it scheduler-aware.

### What the Agent Needs to Know

For an agent to use the scheduler effectively, it needs to know:

1. The scheduler exists and where it is (`~/.openclaw/scheduler/`)
2. How to check status, create, and manage jobs via the CLI
3. What scheduled prompts look like — the `[scheduler:...]` header
4. How to respond correctly when *it is* the one receiving a scheduled prompt

---

### Adding the Scheduler to Agent Memory

Add a section like this to your `MEMORY.md` or workspace context file. Your agent reads this at the start of each session and will know how to work with the scheduler:

```markdown
## Scheduler
- Standalone scheduler at `~/.openclaw/scheduler/` — runs 24/7 as a background service
- Check status: `node ~/.openclaw/scheduler/cli.js status`
- List jobs: `node ~/.openclaw/scheduler/cli.js jobs list`
- Create a job: `node ~/.openclaw/scheduler/cli.js jobs add '<json>'`
- View run history: `node ~/.openclaw/scheduler/cli.js runs list <job-id>`
- Dispatch via OpenClaw chat completions API (isolated sessions — no chat history)
- Shell jobs run scripts directly — no LLM call, no gateway needed
- When you receive a prompt starting with [scheduler:...], you are in an isolated session.
  No conversation history. Focus on the task. Reply HEARTBEAT_OK for watchdog jobs when healthy.
```

---

### How Scheduled Jobs Appear to the Agent

When the dispatcher fires an isolated job, the agent receives a message structured like this:

```
[scheduler:abc123 Daily Health Check]

--- Pending Messages ---
From: scheduler | result | Previous backup: 3 files committed, pushed to origin
---

Check k8s pod health across requesthub-prod and requesthub-dev.
If all pods are Running, reply with exactly: HEARTBEAT_OK
Format any issues as: ⚠️ <namespace>/<pod>: <status>
```

The agent should:
- Recognize the `[scheduler:...]` header as a scheduled task prompt
- Understand it's in an isolated session — no access to prior user conversations
- Focus entirely on the task described in `payload_message`
- Use appropriate tools (exec, kubectl, browser, file access) as needed
- Respond concisely — the response becomes the run summary stored in the `runs` table

---

### The HEARTBEAT_OK Convention

For watchdog and health-check jobs, instruct the agent to reply `HEARTBEAT_OK` when nothing needs attention:

```json
{
  "name": "Disk Usage Check",
  "schedule_cron": "0 * * * *",
  "session_target": "isolated",
  "payload_message": "Check disk usage on all mounted filesystems. If all mounts are under 80% full, reply with exactly: HEARTBEAT_OK\nIf any mount is 80% or more, describe the affected mounts and their usage.",
  "delivery_mode": "announce",
  "delivery_channel": "telegram",
  "delivery_to": "YOUR_CHAT_ID"
}
```

With `delivery_mode: "announce"`, the result is only delivered if the agent's response does **not** contain `HEARTBEAT_OK`.

Result: zero noise when healthy, immediate alert when something needs attention.

Shell jobs get the same effect without the keyword: under `announce`, they deliver only when the command exits non-zero.
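
The `announce` delivery rule fits in a tiny predicate. This is a sketch of the behavior described on this page, not the dispatcher's code. `should_announce` is a hypothetical name. It takes a finished run's session target, exit code, and output.

```bash
#!/usr/bin/env bash
# should_announce TARGET EXIT_CODE OUTPUT
# Returns 0 if a run with delivery_mode "announce" would be delivered:
#   shell jobs -> only on a non-zero exit code
#   other jobs -> only if the response does not contain HEARTBEAT_OK
should_announce() {
  local target="$1" exit_code="$2" output="$3"
  if [[ "$target" == shell ]]; then
    [[ "$exit_code" -ne 0 ]]
  else
    [[ "$output" != *HEARTBEAT_OK* ]]
  fi
}
```

Because the rule is "does not contain", a healthy reply should be exactly `HEARTBEAT_OK`. An alert must never mention the token, for example by quoting the prompt back, or it will be suppressed.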

---

### Letting the Agent Create Jobs Dynamically

The agent can create new scheduler jobs at runtime in two ways:

#### 1. Direct CLI (recommended for user-initiated requests)

When a user asks "remind me to review the PR at 5pm", the agent runs:

```bash
node ~/.openclaw/scheduler/cli.js jobs add '{
  "name": "PR Review Reminder",
  "schedule_cron": "0 17 * * *",
  "session_target": "isolated",
  "payload_message": "Send Jordan a reminder to review PR <PR_URL>, which they opened this morning. Include the PR title and link.",
  "delivery_mode": "announce",
  "delivery_channel": "telegram",
  "delivery_to": "YOUR_CHAT_ID",
  "delete_after_run": true
}'
```

Use `"delete_after_run": true` for one-shot reminders so they clean up after themselves. The reminder runs in an isolated session with no conversation history, so put every specific detail (which PR, which person) in `payload_message` rather than expecting the agent to work it out. Cron times are evaluated in the job's `schedule_tz`, which defaults to `UTC` as of 0.2.0.
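
Hand-writing JSON inside single quotes breaks as soon as the message contains an apostrophe, as in `Jordan's PR`. Below is a sketch of building the job JSON in bash instead. `json_escape` and `reminder_job_json` are hypothetical helpers. `json_escape` handles only backslashes, double quotes, newlines, carriage returns, and tabs; other control characters are not escaped. If `jq` is installed, `jq -n --arg` is a sturdier choice.

```bash
#!/usr/bin/env bash
# json_escape STRING: print STRING as a JSON string literal (quotes included).
# Quoted replacements keep bash 5.2's patsub_replacement from altering them.
json_escape() {
  local s="$1" bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}      # backslash -> \\ (must run first)
  s=${s//"$dq"/"$bs$dq"}      # "         -> \"
  s=${s//$'\n'/"${bs}n"}      # newline   -> \n
  s=${s//$'\r'/"${bs}r"}      # CR        -> \r
  s=${s//$'\t'/"${bs}t"}      # tab       -> \t
  printf '"%s"' "$s"
}

# reminder_job_json NAME CRON MESSAGE CHAT_ID: one-shot isolated reminder job.
reminder_job_json() {
  printf '{"name":%s,"schedule_cron":%s,"session_target":"isolated","payload_message":%s,"delivery_mode":"announce","delivery_channel":"telegram","delivery_to":%s,"delete_after_run":true}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "$(json_escape "$4")"
}

# Usage:
#   node ~/.openclaw/scheduler/cli.js jobs add "$(reminder_job_json \
#     "PR Review Reminder" "0 17 * * *" "Remind Jordan to review Jordan's PR: <PR_URL>" "YOUR_CHAT_ID")"
```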

#### 2. Spawn messages (for jobs creating child jobs at runtime)

An isolated job can create new jobs on the fly by sending a spawn message to the scheduler agent:

```json
{
  "from_agent": "main",
  "to_agent": "scheduler",
  "kind": "spawn",
  "body": "{\"name\":\"Follow-up Analysis\",\"payload_message\":\"Analyze the anomaly found in the previous run and prepare a report.\",\"delete_after_run\":true,\"run_now\":true}"
}
```

The dispatcher picks this up on its next tick (within 15s), creates the spawned job, and runs it immediately.
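
Because `body` is itself a JSON document encoded as a string, its inner quotes are escaped. Decoded, the spawned job definition in the example above reads:

```json
{
  "name": "Follow-up Analysis",
  "payload_message": "Analyze the anomaly found in the previous run and prepare a report.",
  "delete_after_run": true,
  "run_now": true
}
```

When writing a spawn message by hand, it can be easier to draft this decoded form first and then encode it as a string for `body`.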

---

### Communicating Between Jobs

Jobs can pass data to each other using the inter-agent message queue. Messages injected into a job's context appear in the `--- Pending Messages ---` block at the top of the prompt.

**Pattern: Monitor → Handler**

Monitor job finds an anomaly and sends a task to the handler agent:

```bash
# In the monitor's payload_message, instruct the agent to:
# "If you find any ERROR entries in the log, send a message to agent 'main'
#  with kind='task' and body='<details of the error>'"
```

The handler job reads its inbox automatically at the start of its next run.

**Pattern: Job A → Job B via message queue**

```bash
# Send a message from one job to another
openclaw-scheduler msg send monitor-agent handler-agent "Found 3 critical errors at 14:23"
```

---

### Tracking Spawned Sub-Agents

When you spawn sub-agents to do parallel work, use the task tracker so the scheduler monitors them and delivers a completion summary automatically — without you polling.

**Step 1: Create a tracker before spawning**

```bash
# In your agent session, before spawning:
TRACKER_ID=$(node ~/.openclaw/scheduler/cli.js tasks create '{
  "name": "doc-sprint",
  "expectedAgents": ["writer", "reviewer"],
  "timeoutS": 3600,
  "deliveryChannel": "telegram",
  "deliveryTo": "YOUR_CHAT_ID"
}' | grep '"id"' | cut -d'"' -f4)
```
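
The `grep | cut` pipeline above assumes that `"id"` is the first quoted key on its line. On single-line output such as `{"name":"doc-sprint","id":"..."}`, it returns the name instead of the id. Below is a slightly sturdier sketch. It assumes the tracker's own id is the first `"id"` key in the command's output. The exact output shape is not documented here, so check it against your install. If `jq` is available and the id is a top-level field, `jq -r '.id'` is simpler.

```bash
#!/usr/bin/env bash
# extract_id: read JSON on stdin and print the value of the first "id" key.
# Handles pretty-printed and single-line output; not a full JSON parser.
# Hypothetical helper.
extract_id() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage:
#   TRACKER_ID=$(node ~/.openclaw/scheduler/cli.js tasks create '{...}' | extract_id)
```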

**Step 2: Spawn sub-agents (via the `sessions_spawn` tool), then register their session keys**

```bash
# After getting the childSessionKey from sessions_spawn:
node ~/.openclaw/scheduler/cli.js tasks register-session "$TRACKER_ID" writer "agent:main:subagent:abc-123"
node ~/.openclaw/scheduler/cli.js tasks register-session "$TRACKER_ID" reviewer "agent:main:subagent:def-456"
```

Once session keys are registered, the **dispatcher auto-detects heartbeats** by calling `sessions_list` every 30s. As long as the sub-agent's session is active, it's counted as alive — no CLI calls required from inside the sub-agent.

**Step 3: Sub-agents report completion (optional but recommended)**

Add this to the sub-agent's task preamble:

```
## Status Reporting
Tracker ID: <TRACKER_ID>
Your agent label: writer

When you start working:
node ~/.openclaw/scheduler/cli.js tasks heartbeat <TRACKER_ID> writer running

When you finish successfully:
node ~/.openclaw/scheduler/cli.js tasks heartbeat <TRACKER_ID> writer completed "Brief summary of what you did"

If something goes wrong:
node ~/.openclaw/scheduler/cli.js tasks heartbeat <TRACKER_ID> writer failed "What went wrong"
```

**What happens automatically:**
- Dispatcher checks active sessions every 30s — agents with active sessions stay "alive"
- Agents that go silent for > 5 minutes AND whose tracker has timed out → marked dead
- When all agents reach terminal state → delivery summary sent to your configured channel
- Check anytime: `openclaw-scheduler tasks status <TRACKER_ID>`

**Detecting sub-agents from other sessions:**

The scheduler's dispatcher calls `sessions_list` with `kinds: ['subagent']` which returns **all sub-agent sessions across all requesters** — not just the current session. This means:
- Sub-agents spawned from any session are visible to the task tracker
- Works even if the spawning session has ended
- Works across session compaction / context resets

---

### Practical Agent Briefing Example

Here's a complete, self-contained entry to add to your workspace `MEMORY.md` or context file. Copy and adapt it:

````markdown
## OpenClaw Scheduler — How to Use

The scheduler (`~/.openclaw/scheduler/`) runs as a background service (launchd / systemd)
and fires jobs independently of your chat sessions.

### Quick commands
- Status: `openclaw-scheduler status`
- List jobs: `openclaw-scheduler jobs list`
- Add job: `openclaw-scheduler jobs add '<json>'`
- View runs: `openclaw-scheduler runs list <job-id>`
- Force run now: `openclaw-scheduler jobs run <id>`
- Logs: `tail -f /tmp/openclaw-scheduler.log`

> **Warning:** Avoid direct `sqlite3 ... UPDATE jobs SET next_run_at` statements against the live database. SQLite WAL locking can conflict with the running scheduler process and cause SQLITE_BUSY errors or stale reads. Use `openclaw-scheduler jobs run <id>` instead -- it enqueues through the dispatch queue safely.

### When you receive a scheduled prompt
You're in an isolated session. No conversation history. No context from prior chats.
Read the `[scheduler:...]` header to identify the job, then do exactly what `payload_message` says.
- Reply `HEARTBEAT_OK` for watchdog/health-check jobs when nothing needs attention (suppresses delivery)
- For analysis/report jobs, write a concise summary — it's stored as the run record
- For shell jobs, you're not invoked at all — the command runs directly

### Job types cheatsheet
| Type | Use for |
|------|---------|
| `shell` | Scripts, backups, pings — fast, no LLM, runs even when gateway is down |
| `isolated` | AI tasks needing tools, memory, reasoning — each run gets fresh context |
| `main` | Urgent alerts that must appear in your active chat (use sparingly) |

### Creating jobs
When asked to set up a scheduled task, use:
```bash
node ~/.openclaw/scheduler/cli.js jobs add '{
  "name": "Task Name",
  "schedule_cron": "0 9 * * 1-5",
  "session_target": "isolated",
  "payload_message": "Your clear, specific instructions here.",
  "delivery_mode": "announce",
  "delivery_channel": "telegram",
  "delivery_to": "YOUR_CHAT_ID"
}'
```
````

---

## See Also

- [README.md](README.md) — Full feature reference
- [INSTALL.md](INSTALL.md) — macOS installation
- [INSTALL-LINUX.md](INSTALL-LINUX.md) — Linux installation
- [INSTALL-WINDOWS.md](INSTALL-WINDOWS.md) — Windows installation
- [UNINSTALL.md](UNINSTALL.md) — Removing the scheduler

package/CHANGELOG.md
ADDED
@@ -0,0 +1,82 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to this project will be documented in this file.
|
|
4
|
+
|
|
5
|
+
## [Unreleased]
|
|
6
|
+
|
|
7
|
+
### Fixed
|
|
8
|
+
- fix(watcher): exit cleanly when session status=done (PR #1)
|
|
9
|
+
- fix(watchdog): prevent auto-resolving active sessions with heartbeat + hard ceiling (PR #2)
|
|
10
|
+
- fix(gateway): reset idle timer while fetch is in flight (PR #3)
|
|
11
|
+
- fix(watcher): prevent premature kill of active subagent sessions with JSONL activity signal (PR #7)
|
|
12
|
+
|
|
13
|
+
### Added
|
|
14
|
+
- feat: v0.2 runtime with identity/trust/authorization/evidence/credential handoff (PR #4)
|
|
15
|
+
- feat: x-openclaw-env-inject header for agent task credentials (PR #5)
|
|
16
|
+
- docs: trust architecture, multi-agent gateway routing, agent adoption files
|
|
17
|
+
|
|
18
|
+
### Changed
|
|
19
|
+
- chore: replace non-ASCII characters with ASCII equivalents (PR #6)
|
|
20
|
+
|
|
21
|
+
## [0.2.0] -- 2026-03-11
|
|
22
|
+
|
|
23
|
+
### Added
|
|
24
|
+
- Strategy pattern refactor: decomposed 614-line `dispatchJob` closure into explicit `DispatchContext` + strategy functions (`prepareDispatch`, `executeStrategy`, `finalizeDispatch`) in new `dispatcher-strategies.js`
|
|
25
|
+
- Auth profile resolution for isolated agent turns: `auth_profile` field on jobs supports `'inherit'` (looks up main session profile) or explicit `'provider:label'`
|
|
26
|
+
- Drain-error retry: transient infrastructure errors (HTTP 529) bypass normal retry ladder and re-enqueue immediately
|
|
27
|
+
- One-shot `at`-style scheduling via `schedule_kind: 'at'` and `schedule_at` fields (schema v18)
|
|
28
|
+
- Complete TypeScript type coverage: 26 previously missing function signatures, 4 corrected return types, 51 missing schema columns added to `index.d.ts`
|
|
29
|
+
- Expanded type smoke tests from 23 to 192+ lines exercising all typed APIs
|
|
30
|
+
- 5 new test coverage areas: dispatcher-utils, dispatch-queue lifecycle, approval timeout/prune/count, run session/context, prompt-context edge cases
|
|
31
|
+
- `idempotency`, `taskTracker`, and `teamAdapter` modules now exported from `index.js` for programmatic consumers

### Fixed
- `updateJobAfterRun` null guard prevents a crash when a job is deleted mid-dispatch
- Shell timeout and retry exhaustion handling corrected
- Boolean job flags normalized for SQLite writes
- Numeric `enabled` flags treated as disabled on create
- Child jobs can no longer self-fire as autonomous one-shot schedules; due selectors are root-only
- Disabled future one-shot jobs are no longer pruned before they ever run
- Consolidation migration now backfills partial legacy message/task-tracker tables without noisy fallback errors

### Changed
- Default `schedule_tz` changed from `America/New_York` to `UTC` in schema, validation, and setup
- `--json` mode wired through all CLI subcommands (msg, tasks, team, queue, idem) via `emit()`/`fail()` helpers
- Dispatch subsystem portability: `process.execPath` replaces bare `node`; `__dirname`-relative paths replace hardcoded install paths
- Dispatcher reduced from ~1200 lines to ~850 lines; `dispatchJob` is now a 5-line orchestrator (strategy code lives in `dispatcher-strategies.js`)
- `buildDispatchDeps()` wires 36+ dependencies via dependency injection
- Full validation gate moved into local verification commands (`npm run verify:local` / `npm run verify:smoke`); GitHub Actions now runs a single lightweight smoke job
- Test baseline updated to 1410 passed
- Schema baseline is now `v23`
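
The shape of the 5-line orchestrator and the dependency-injected strategy functions can be sketched as follows. Only the function names come from the entries above; every signature, the context fields, the `kind` field, and the stub dependencies are hypothetical, so read this as an illustration of the pattern rather than the contents of `dispatcher-strategies.js`.

```javascript
// Hypothetical sketch: function names follow the changelog, but all
// signatures, fields, and stub dependencies are assumptions.
function buildDispatchDeps(overrides = {}) {
  // Every side effect is injected, so tests can swap in fakes.
  return {
    now: () => Date.now(),
    strategies: {
      shell: async (job) => ({ status: "ok", output: `shell:${job.id}` }),
      agent: async (job) => ({ status: "ok", output: `agent:${job.id}` }),
    },
    recordRun: () => {},
    ...overrides,
  };
}

// Phase 1: build an explicit context object instead of closure state.
function prepareDispatch(job, deps) {
  return { job, deps, startedAt: deps.now(), result: null };
}

// Phase 2: pick and run the strategy registered for this job kind.
async function executeStrategy(ctx) {
  const strategy = ctx.deps.strategies[ctx.job.kind];
  if (!strategy) throw new Error(`no strategy for kind "${ctx.job.kind}"`);
  ctx.result = await strategy(ctx.job);
  return ctx;
}

// Phase 3: persist the outcome and hand back the result.
function finalizeDispatch(ctx) {
  ctx.deps.recordRun(ctx);
  return ctx.result;
}

async function dispatchJob(job, deps) {
  const ctx = prepareDispatch(job, deps);
  await executeStrategy(ctx);
  return finalizeDispatch(ctx);
}
```

Because `buildDispatchDeps` accepts overrides in this sketch, a test can replace `recordRun` or a single strategy without touching the orchestrator.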

## [0.1.0] -- 2026-03-08

First public release.

### Added
- Watchdog job type for long-running task monitoring, including dedicated watchdog fields, CLI support, dispatcher handling, and config example scaffolding
- Durable dispatch queue for manual runs, retries, and chain-triggered executions, with persisted run causality via `dispatch_queue_id` and `triggered_by_run`
- Structured shell result persistence on runs: exit code, signal, timeout flag, stdout, and stderr
- Richer shell-failure context for triggered follow-up jobs and agent triage flows
- CLI improvements for machine use and release readiness, including `--json`, `jobs validate`, schema introspection, and improved npm-install defaults
- Safe typed root exports for programmatic tooling (`index.js` + `index.d.ts`)

### Fixed
- Shell timeouts are now classified correctly as `timeout`, with `shell_timed_out` persisted on runs
- Shell retries now exhaust correctly and fire failure children only after the retry ladder is complete
- Consolidated migration skip logic now checks for actual column presence instead of relying on version markers alone
- Runtime startup version logging now reads from `package.json` instead of a stale hardcoded string
- Public-facing docs/examples no longer include private hostnames or deployment-specific Telegram identifiers
- Node 20 compatibility restored by removing runtime dependence on `node:sqlite` and JSON import attributes
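
The shell-result entries (structured exit code, signal, timeout flag, stdout, and stderr, with timeouts classified as `timeout`) can be illustrated with Node's built-in `child_process.spawnSync`. This is a minimal sketch, not the dispatcher's implementation: `runShellJob`, the `ok`/`error` status names, and every `shell_*` field except `shell_timed_out` are hypothetical, and a real scheduler would more likely use the asynchronous `spawn` so a tick is not blocked.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: runs a shell command with a hard timeout and returns
// a structured result resembling the fields the changelog says are persisted.
function runShellJob(command, runTimeoutMs) {
  const res = spawnSync(command, {
    shell: true,           // run through the system shell
    timeout: runTimeoutMs, // Node sends killSignal (default SIGTERM) when exceeded
    encoding: "utf8",
  });
  // spawnSync reports a timeout via an error whose code is ETIMEDOUT.
  const timedOut = Boolean(res.error && res.error.code === "ETIMEDOUT");
  let status;
  if (timedOut) status = "timeout"; // kept distinct from ordinary failures
  else if (res.error || res.status !== 0) status = "error";
  else status = "ok";
  return {
    status,
    shell_exit_code: res.status, // null when the process was killed by a signal
    shell_signal: res.signal,    // e.g. "SIGTERM" after a timeout
    shell_timed_out: timedOut,
    shell_stdout: res.stdout ?? "",
    shell_stderr: res.stderr ?? "",
  };
}
```

Separating `timeout` from `error` lets retry and failure-child logic treat a hung command differently from one that exited non-zero.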

### Changed
- Schema baseline is now `v14`
- Added execution-intent fields, queue / approval / fan-out caps, shell-output offloading, and runtime budget visibility
- Tightened ESLint rules, added TypeScript declaration smoke tests, and enforced global coverage floors
- Extracted dispatcher approvals, delivery, maintenance, and shell helpers into dedicated modules
- Versioning reset to `0.1.0` as the first public release
- Updated verification baseline to `581 passed, 0 failed`

## Pre-release

Internal development versions consolidated into 0.1.0. See git history for details.

package/CODE_OF_CONDUCT.md

# Code of Conduct

## Our Standard

Participate professionally and constructively.

Expected behavior:

- focus on technical substance
- keep feedback specific and actionable
- respect other contributors

Unacceptable behavior:

- harassment
- personal attacks
- discriminatory language
- deliberately disruptive conduct

## Enforcement

Project maintainers may remove comments, issues, or contributions that violate this standard.

package/CONTEXT.md

# Context

## Problem

Scheduled agent and shell workflows need durability: run history, retries,
approval gates, delivery, triggered chains, and an audit trail. Built-in
cron/heartbeat does not provide these.

## Repo Position

`openclaw-scheduler` is the durable runtime. It sits below the control plane
(`agentcli`) and beside the gateway (`openclaw`).

- It can work standalone with jobs created by agents or operators via the CLI.
- It can be driven by `agentcli` for declarative manifest-based workflows.
- It dispatches to the OpenClaw gateway for agent sessions.
- It runs shell jobs directly without the gateway.

## Design Bias

- scheduling and state live in SQLite (single file, no external services)
- shell jobs are first-class, not second-class to agent jobs
- delivery is channel-agnostic (Telegram, Discord, WhatsApp, Signal, iMessage, Slack)
- `run_timeout_ms` is required on every job (no indefinite runs)
- overlap policy, retry policy, and delivery guarantee are configured per job
- keep the scheduler process stateless between ticks (all state in the DB)
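
The last design points (required `run_timeout_ms`, per-job overlap policy, no state held in the process between ticks) can be illustrated with a minimal sketch in which a plain object stands in for the SQLite database. `validateJob`, `tick`, and the `next_run_at`, `interval_ms`, `overlap`, and `deadline_ms` fields are hypothetical names chosen for illustration; the real schema and dispatcher differ.

```javascript
// Hypothetical sketch: `store` stands in for the SQLite file. The tick
// function keeps no module-level or closure state; it reads everything it
// needs from the store and writes every change back to it.
function validateJob(job) {
  if (!Number.isInteger(job.run_timeout_ms) || job.run_timeout_ms <= 0) {
    throw new Error(`job ${job.id}: run_timeout_ms is required`);
  }
  return job;
}

function tick(store, nowMs, dispatch) {
  let dispatched = 0;
  for (const job of store.jobs) {
    if (!job.enabled || job.next_run_at > nowMs) continue;
    // Overlap is per-job configuration: "skip" refuses a second concurrent run.
    const busy = store.runs.some((r) => r.job_id === job.id && r.status === "running");
    if (busy && job.overlap === "skip") continue;
    // Every run gets a deadline derived from the job's required timeout.
    store.runs.push({ job_id: job.id, status: "running", deadline_ms: nowMs + job.run_timeout_ms });
    job.next_run_at = nowMs + job.interval_ms;
    dispatch(job);
    dispatched += 1;
  }
  return dispatched;
}
```

Because `tick` holds nothing between calls, a restarted process can resume from the store alone, which is the property the last bullet describes.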