openclaw-scheduler 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (70)
  1. package/AGENTS.md +302 -0
  2. package/BEST-PRACTICES.md +506 -0
  3. package/CHANGELOG.md +82 -0
  4. package/CODE_OF_CONDUCT.md +22 -0
  5. package/CONTEXT.md +26 -0
  6. package/CONTRIBUTING.md +73 -0
  7. package/IMPLEMENTATION_SPEC.md +170 -0
  8. package/INSTALL-ADDITIONAL-HOST.md +333 -0
  9. package/INSTALL-LINUX.md +419 -0
  10. package/INSTALL-WINDOWS.md +305 -0
  11. package/INSTALL.md +364 -0
  12. package/JOB-QUICK-REF.md +222 -0
  13. package/LICENSE +21 -0
  14. package/QUICK-START.md +256 -0
  15. package/README.md +2170 -0
  16. package/SECURITY.md +34 -0
  17. package/UNINSTALL.md +129 -0
  18. package/UPGRADING.md +436 -0
  19. package/agents.js +67 -0
  20. package/approval.js +107 -0
  21. package/backup.js +390 -0
  22. package/bin/openclaw-scheduler.js +138 -0
  23. package/cli.js +1083 -0
  24. package/db.js +122 -0
  25. package/dispatch/529-recovery.mjs +204 -0
  26. package/dispatch/README.md +372 -0
  27. package/dispatch/config.example.json +24 -0
  28. package/dispatch/deliver-watcher.sh +57 -0
  29. package/dispatch/hooks.mjs +171 -0
  30. package/dispatch/index.mjs +1836 -0
  31. package/dispatch/watcher.mjs +1396 -0
  32. package/dispatch-queue.js +112 -0
  33. package/dispatcher-approvals.js +96 -0
  34. package/dispatcher-delivery.js +43 -0
  35. package/dispatcher-maintenance.js +242 -0
  36. package/dispatcher-shell.js +29 -0
  37. package/dispatcher-strategies.js +1280 -0
  38. package/dispatcher-utils.js +81 -0
  39. package/dispatcher.js +855 -0
  40. package/docs/adr-schedule-ownership.md +73 -0
  41. package/docs/gateway-contract.md +904 -0
  42. package/docs/plans/2026-03-09-fix-typescript-types.md +91 -0
  43. package/docs/plans/2026-03-09-test-coverage-gaps.md +83 -0
  44. package/docs/plans/2026-03-10-dispatcher-refactor.md +801 -0
  45. package/docs/trust-architecture.md +266 -0
  46. package/gateway.js +473 -0
  47. package/idempotency.js +119 -0
  48. package/index.d.ts +864 -0
  49. package/index.js +17 -0
  50. package/jobs.js +1224 -0
  51. package/messages.js +357 -0
  52. package/migrate-consolidate.js +694 -0
  53. package/migrate.js +125 -0
  54. package/package.json +130 -0
  55. package/paths.js +79 -0
  56. package/prompt-context.js +94 -0
  57. package/retrieval.js +176 -0
  58. package/runs.js +270 -0
  59. package/scheduler-schema.js +101 -0
  60. package/schema.sql +480 -0
  61. package/scripts/dispatch-cli-utils.mjs +65 -0
  62. package/scripts/inbox-consumer.mjs +288 -0
  63. package/scripts/stuck-detector.sh +18 -0
  64. package/scripts/stuck-run-detector.mjs +333 -0
  65. package/scripts/telegram-webhook-check.mjs +238 -0
  66. package/setup.mjs +724 -0
  67. package/shell-result.js +214 -0
  68. package/task-tracker.js +300 -0
  69. package/team-adapter.js +335 -0
  70. package/v02-runtime.js +599 -0
@@ -0,0 +1,904 @@
# OpenClaw Gateway Contract

Date: 2026-03-28

## Purpose

This document defines the gateway API surface that openclaw-scheduler depends on.
The scheduler relies on these endpoints and behaviors for session management,
agent execution, system event injection, and health monitoring. Changes to these
surfaces should be coordinated to avoid breaking the scheduler.

---

## Authentication

The scheduler resolves a bearer token using the following fallback chain:

1. **Environment variable**: `OPENCLAW_GATEWAY_TOKEN` (checked first).
2. **Token file**: Path from `OPENCLAW_GATEWAY_TOKEN_PATH`, or the default
   `~/.openclaw/credentials/.gateway-token`. The file contents are read once
   and cached for the process lifetime.

When a token is available, every HTTP request includes:

```
Authorization: Bearer <token>
```

If neither source provides a token, requests are sent without an
`Authorization` header.

Scope headers are endpoint-specific. When the scheduler needs a scoped gateway
operation, the per-endpoint contract below defines the additional
`x-openclaw-scopes` header.

`dispatch/index.mjs` uses a slightly different resolution path for the CLI
context: it checks `OPENCLAW_GATEWAY_TOKEN` first, then falls back to the
`gateway.auth.token` field in `~/.openclaw/openclaw.json`.

Reference:
- `gateway.js` (`getGatewayToken`, `authHeaders`)
- `dispatch/index.mjs` (`getGatewayToken`, `GATEWAY_TOKEN`)
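
The fallback chain can be sketched as follows. This is a minimal illustration, not the actual `gateway.js` code: the resolver factory is hypothetical, its dependencies are injected so it runs without a real token file, and trimming the file contents and caching a missing file are assumptions.

```javascript
// Hypothetical sketch of the token fallback chain described above.
// Assumptions: file contents are trimmed, and a missing/empty file is cached
// as "no token" so the file is read at most once per process.
function makeGatewayTokenResolver({ env, readFile, homedir }) {
  let cachedFileToken; // undefined = file not read yet
  return function getGatewayToken() {
    // 1. Environment variable wins.
    if (env.OPENCLAW_GATEWAY_TOKEN) return env.OPENCLAW_GATEWAY_TOKEN;
    // 2. Token file, read once and cached.
    if (cachedFileToken === undefined) {
      const path =
        env.OPENCLAW_GATEWAY_TOKEN_PATH ||
        `${homedir}/.openclaw/credentials/.gateway-token`;
      try {
        cachedFileToken = readFile(path).trim() || null;
      } catch {
        cachedFileToken = null; // unreadable file: no token
      }
    }
    return cachedFileToken;
  };
}

// Authorization header only when a token exists; otherwise no header at all.
function authHeaders(token) {
  return token ? { Authorization: `Bearer ${token}` } : {};
}
```

In Node.js the dependencies would typically be `process.env`, a `readFile` built on `fs.readFileSync(path, "utf8")`, and `os.homedir()`.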

---

## Gateway Base URL

Resolved from `OPENCLAW_GATEWAY_URL`, defaulting to `http://127.0.0.1:18789`.

Reference:
- `gateway.js` (`GATEWAY_URL`)
- `dispatch/index.mjs` (`GATEWAY_URL`)

---

## Endpoints

### POST /v1/chat/completions

**Purpose**: Primary dispatch mechanism for isolated scheduler jobs. Sends a
single user message to an agent and receives the complete assistant response.

**Callers**:
- `gateway.js` `runAgentTurn()`
- `gateway.js` `runAgentTurnWithActivityTimeout()`

**Request headers**:

| Header | Required | Description |
|---|---|---|
| `Content-Type` | Yes | Always `application/json` |
| `Authorization` | Conditional | `Bearer <token>` when a token is available |
| `x-openclaw-scopes` | Conditional | `operator.write` when a bearer token is sent. This scope header is specific to chat-completions dispatch. |
| `x-openclaw-agent-id` | Conditional | Agent ID string (e.g. `main`). Omitted when falsy. |
| `x-openclaw-session-key` | Conditional | Session key for continuity. Omitted when not provided. |
| `x-openclaw-auth-profile` | Conditional | Auth profile override. Omitted when null. See "Auth-Profile Forwarding" below. |
| `x-openclaw-env-inject` | Conditional | JSON-encoded map of materialized env vars. Omitted when the map is empty. See "Env-Inject Forwarding" below. |

**Request body**:

```json
{
  "model": "openclaw:<agentId>",
  "messages": [
    { "role": "user", "content": "<prompt text>" }
  ],
  "stream": false
}
```

The `model` field defaults to `openclaw:<agentId>` but can be overridden via
`job.payload_model`.
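
A hypothetical builder that applies the header and body rules above might look like this. The function name and parameter object are illustrative, and leaving `x-openclaw-agent-id` off when `job.agent_id` is unset relies on the gateway's own `"main"` default.

```javascript
// Illustrative request builder for POST /v1/chat/completions (not gateway.js code).
// `envInject` is assumed to be an already JSON-encoded string.
function buildChatCompletionRequest({ baseUrl, token, job, prompt, sessionKey, authProfile, envInject }) {
  const headers = { "Content-Type": "application/json" };
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
    headers["x-openclaw-scopes"] = "operator.write"; // only sent alongside a bearer token
  }
  if (job.agent_id) headers["x-openclaw-agent-id"] = job.agent_id; // omitted when falsy
  if (sessionKey) headers["x-openclaw-session-key"] = sessionKey;
  if (authProfile != null) headers["x-openclaw-auth-profile"] = authProfile; // omitted when null
  if (envInject) headers["x-openclaw-env-inject"] = envInject;
  const body = {
    // job.payload_model overrides the default `openclaw:<agentId>` model string
    model: job.payload_model || `openclaw:${job.agent_id || "main"}`,
    messages: [{ role: "user", content: prompt }],
    stream: false,
  };
  return {
    url: `${baseUrl}/v1/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```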

**Response body** (expected):

```json
{
  "choices": [
    {
      "message": {
        "content": "<assistant reply>"
      }
    }
  ],
  "usage": { ... }
}
```

The scheduler reads `data.choices[0].message.content` and `data.usage`.

**Response headers read**:

| Header | Description |
|---|---|
| `x-openclaw-session-key` | Returned session key. Used to update the caller's session tracking. |

**Error semantics**:
- Any non-2xx status throws: `Chat completions failed (<status>): <body first 500 chars>`
- `AbortError` / `TimeoutError` from the fetch signal is translated into a
  descriptive timeout message (see "Activity Timeout Pattern" below).
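
Reading the response and applying the documented error format could be sketched as below. It assumes a WHATWG `Response` (as returned by the global `fetch` in Node.js 18+); the function name and returned object shape are illustrative choices, not the scheduler's internal types.

```javascript
// Sketch: parse a /v1/chat/completions response per the contract above.
async function readChatCompletion(resp) {
  const text = await resp.text();
  if (!resp.ok) {
    // Documented format: status plus the first 500 chars of the body.
    throw new Error(`Chat completions failed (${resp.status}): ${text.slice(0, 500)}`);
  }
  const data = JSON.parse(text);
  return {
    content: data.choices?.[0]?.message?.content ?? null,
    usage: data.usage ?? null,
    // Session key echoed by the gateway (Headers.get returns null when absent).
    sessionKey: resp.headers.get("x-openclaw-session-key"),
  };
}
```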

**Timeout behavior**:
- `runAgentTurn`: Hard wall-clock abort via `AbortController` at `timeoutMs`
  (default 300000ms / 5 min).
- `runAgentTurnWithActivityTimeout`: Two-tier timeout -- see "Activity Timeout
  Pattern" below.

---

### POST /tools/invoke

**Purpose**: Invoke gateway-side tools for session listing, message delivery,
and session management.

**Caller**: `gateway.js` `invokeGatewayTool()`

**Request headers**:

| Header | Required | Description |
|---|---|---|
| `Content-Type` | Yes | Always `application/json` |
| `Authorization` | Conditional | `Bearer <token>` when available |

**Request body**:

```json
{
  "tool": "<tool_name>",
  "args": { ... },
  "sessionKey": "<session_key>"
}
```

**Timeout**: 30 seconds via `AbortSignal.timeout(30_000)`.

**Error semantics**: Non-2xx throws `Gateway <tool> failed (<status>): <body first 500 chars>`.
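
A `/tools/invoke` wrapper following this contract might look like the sketch below. The signature is hypothetical rather than that of `invokeGatewayTool()`, and `fetchImpl` is injectable purely so the logic can be exercised offline.

```javascript
// Sketch of a /tools/invoke call: 30s timeout, bearer auth, documented error text.
async function invokeGatewayTool({ baseUrl, token, tool, args, sessionKey, fetchImpl = fetch }) {
  const headers = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  const resp = await fetchImpl(`${baseUrl}/tools/invoke`, {
    method: "POST",
    headers,
    body: JSON.stringify({ tool, args, sessionKey }),
    signal: AbortSignal.timeout(30_000), // 30s ceiling per the contract
  });
  const text = await resp.text();
  if (!resp.ok) {
    throw new Error(`Gateway ${tool} failed (${resp.status}): ${text.slice(0, 500)}`);
  }
  return JSON.parse(text);
}
```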

#### Tool: `sessions_list`

**Caller**: `gateway.js` `listSessions()`

**Args**:

```json
{
  "activeMinutes": 60,
  "limit": 200,
  "kinds": ["subagent"],
  "messageLimit": 0
}
```

All fields except `messageLimit` are optional; `messageLimit: 0` is always sent
to suppress message history and return only session metadata.

**Response**: The scheduler normalizes across several possible response shapes:

```
result.result.details.sessions
result.result.sessions
result.sessions
result (raw array)
```

Each session object is expected to have, at minimum, a `key` (or `sessionKey`)
and an `updatedAt` field.
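
The normalization step can be sketched as a small pure function. Checking the shapes in the order listed above and dropping entries that carry neither `key` nor `sessionKey` are assumptions of this sketch.

```javascript
// Sketch: accept any of the documented response shapes and return a flat list.
function normalizeSessions(result) {
  const candidates = [
    result?.result?.details?.sessions,
    result?.result?.sessions,
    result?.sessions,
    result, // raw array
  ];
  const list = candidates.find(Array.isArray) ?? [];
  return list
    .map((s) => ({ ...s, key: s.key ?? s.sessionKey })) // `sessionKey` as fallback
    .filter((s) => typeof s.key === "string"); // assumption: keyless entries are dropped
}
```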

**Used by**:
- `runAgentTurnWithActivityTimeout` -- polls session activity during long runs
- `getAllSubAgentSessions` -- fetches all active subagent sessions
- `dispatcher-strategies.js` `executeAgent()` -- resolves `auth_profile: 'inherit'`
  by finding the main session's auth profile
- `dispatcher-maintenance.js` via `checkTaskTrackers` -- correlates subagent
  sessions with task group agents

#### Tool: `message`

**Caller**: `gateway.js` `deliverMessage()`

**Args**:

```json
{
  "action": "send",
  "message": "<text>",
  "channel": "telegram",
  "target": "<chat_id>"
}
```

Used for delivering job results, check-in updates, and notifications to
Telegram or other channels. Messages exceeding `TELEGRAM_MAX_MESSAGE_LENGTH`
(4096 chars) are split into numbered chunks by `splitMessageForChannel`.
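
The chunking idea can be illustrated with the hypothetical helper below. The `[i/n] ` prefix, the plain character-based cut (no attempt to break at lines or words), and measuring length in UTF-16 code units are all assumptions; the real `splitMessageForChannel` may differ on each point.

```javascript
// Hypothetical sketch: split text into numbered chunks that each fit `limit`.
const TELEGRAM_MAX_MESSAGE_LENGTH = 4096;

function splitIntoNumberedChunks(text, limit = TELEGRAM_MAX_MESSAGE_LENGTH) {
  if (text.length <= limit) return [text];
  let total = 2; // any split yields at least two chunks
  for (;;) {
    const size = limit - `[${total}/${total}] `.length; // room left after the longest prefix
    if (size <= 0) throw new RangeError("limit too small for numbered chunks");
    const needed = Math.ceil(text.length / size);
    if (needed <= total) {
      const chunks = [];
      for (let i = 0; i < needed; i++) {
        chunks.push(`[${i + 1}/${needed}] ${text.slice(i * size, (i + 1) * size)}`);
      }
      return chunks;
    }
    total = needed; // more chunks than assumed: retry with a possibly longer prefix
  }
}
```

Because the prefix length depends on the total chunk count, the loop re-sizes the chunks until the count stops growing.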

**Also used directly in `dispatch/index.mjs`** `cmdEnqueue()` via raw `fetch` to
`POST /tools/invoke` for the "Starting..." notification sent when spawning a
subagent session:

```json
{
  "tool": "message",
  "args": {
    "action": "send",
    "channel": "<deliverChannel>",
    "target": "<deliverTo>",
    "message": "<brand> [<label>] starting..."
  },
  "sessionKey": "main"
}
```

---

### GET /health

**Purpose**: Determine whether the gateway is reachable and responsive.

**Callers**:
- `gateway.js` `checkGatewayHealth()`
- `gateway.js` `waitForGateway()`

**Request headers**: `Authorization: Bearer <token>` when available.

**Timeout**: 5 seconds for `checkGatewayHealth`; variable for `waitForGateway`
(capped at 5 seconds per attempt).

**Response semantics**:
- `checkGatewayHealth()` returns `true` if `resp.ok` (2xx), and `false` otherwise
  or on any error.
- `waitForGateway()` treats **any HTTP response** (even non-200) as "gateway is
  up" -- it only needs TCP connectivity. It polls at `intervalMs` (default
  2000ms) up to `timeoutMs` (default 30000ms).

**Scheduler behavior when unhealthy**:
- Isolated jobs are deferred (`next_run_at` pushed forward by 60s).
- Shell and main-session jobs continue regardless.
- Health is re-checked every 60 seconds (`dispatcher.js` `tick()`).
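
The two probes and the deferral rule could be sketched as follows. The signatures and the `nextRunAtWhenUnhealthy` helper are hypothetical, `fetchImpl` is injected for testing, and the `session_target` values are the ones used elsewhere in this document.

```javascript
// checkGatewayHealth semantics: true only for 2xx; false on any error or timeout.
async function checkGatewayHealth(baseUrl, headers = {}, fetchImpl = fetch) {
  try {
    const resp = await fetchImpl(`${baseUrl}/health`, { headers, signal: AbortSignal.timeout(5000) });
    return resp.ok;
  } catch {
    return false;
  }
}

// waitForGateway semantics: any HTTP response (even 500) means "up".
async function waitForGateway(baseUrl, { intervalMs = 2000, timeoutMs = 30000, headers = {}, fetchImpl = fetch } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const perAttempt = Math.min(5000, deadline - Date.now()); // capped at 5s per attempt
    try {
      await fetchImpl(`${baseUrl}/health`, { headers, signal: AbortSignal.timeout(perAttempt) });
      return true;
    } catch {
      const wait = Math.min(intervalMs, Math.max(0, deadline - Date.now()));
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
  return false;
}

// While unhealthy, only isolated jobs are deferred (by 60s); null = dispatch normally.
function nextRunAtWhenUnhealthy(job, nowMs) {
  return job.session_target === "isolated" ? nowMs + 60_000 : null;
}
```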

---

### GET /sessions/:sessionKey

**Purpose**: Retrieve session metadata, including the message count used for
activity validation.

**Caller**: `dispatch/index.mjs` `cmdDone()`

**Request headers**: `Authorization: Bearer <GATEWAY_TOKEN>`

**Timeout**: 5 seconds via `AbortSignal.timeout(5000)`.

**Response body** (expected):

```json
{
  "messageCount": 15,
  "messages": [ ... ]
}
```

The scheduler reads `sessionInfo.messageCount` or falls back to
`sessionInfo.messages.length`. If the count is 2 or fewer, the done signal is
rejected because the session likely did not perform real work.

**Error handling**: Non-2xx responses or fetch failures are treated as
non-fatal -- the activity check is skipped with a stderr warning.
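
A sketch of the lookup plus the message-count check follows. The helper names are hypothetical; URL-encoding the session key and accepting the done signal when no count is available are assumptions that mirror the "skip, don't fail" behavior above.

```javascript
// Fetch session info; any failure is non-fatal and yields null (check skipped).
async function fetchSessionInfo(baseUrl, sessionKey, token, fetchImpl = fetch) {
  try {
    const resp = await fetchImpl(`${baseUrl}/sessions/${encodeURIComponent(sessionKey)}`, {
      headers: token ? { Authorization: `Bearer ${token}` } : {},
      signal: AbortSignal.timeout(5000),
    });
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
    return await resp.json();
  } catch (err) {
    console.error(`warning: session activity check skipped: ${err.message}`);
    return null;
  }
}

// Reject the done signal when the session has 2 or fewer messages.
function validateDoneSignal(sessionInfo) {
  if (!sessionInfo) return { accepted: true, reason: "activity check skipped" };
  const count = sessionInfo.messageCount ?? sessionInfo.messages?.length;
  if (typeof count !== "number") return { accepted: true, reason: "no message count" };
  if (count <= 2) return { accepted: false, reason: `only ${count} messages; likely no real work` };
  return { accepted: true, reason: "ok" };
}
```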

---

### CLI: openclaw system event

**Purpose**: Inject a system event into the main session. Used for jobs with
`session_target: 'main'` that communicate via the primary conversation thread
rather than isolated sessions.

**Caller**: `gateway.js` `sendSystemEvent()`

**Invocation**:

```
openclaw system event --text <text> --mode <now|queue> --json
```

**Arguments**:
- `--text`: The event text to inject.
- `--mode`: Either `now` (immediate injection) or `queue` (buffered delivery).
  Validated against the `VALID_MODES` set.
- `--json`: Request JSON output.

**Timeout**: 30 seconds (`execFileSync` timeout).

**Response parsing**: stdout is parsed as JSON. Any non-JSON prefix (e.g.
`openclaw doctor` output) is stripped by finding the first `{` character.
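
The mode validation and prefix stripping can be sketched as two pure helpers. The helper names are hypothetical; a caller would pass the arguments to `execFileSync("openclaw", args, { timeout: 30_000, encoding: "utf8" })` from `node:child_process`, which is omitted here.

```javascript
// Sketch of argument building and stdout parsing for `openclaw system event`.
const VALID_MODES = new Set(["now", "queue"]);

function systemEventArgs(text, mode) {
  if (!VALID_MODES.has(mode)) throw new Error(`system event failed: invalid mode ${mode}`);
  return ["system", "event", "--text", text, "--mode", mode, "--json"];
}

function parseSystemEventOutput(stdout) {
  const start = stdout.indexOf("{"); // skip any non-JSON prefix such as doctor output
  if (start === -1) throw new Error("system event failed: no JSON in output");
  return JSON.parse(stdout.slice(start));
}
```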

**Error semantics**: Throws `system event failed: <message>`.

**Used by**: `dispatcher-strategies.js` for the main-session dispatch strategy,
and `dispatcher.js` via `buildDispatchDeps()`.

---

### CLI: openclaw gateway call

**Purpose**: Invoke gateway RPC methods via the openclaw CLI. Used by
`dispatch/index.mjs` for session management operations that are not exposed as
direct HTTP endpoints.

**Caller**: `dispatch/index.mjs` `gatewayCall()`

**Invocation**:

```
openclaw gateway call <method> --json --params '<json>' --timeout <ms> [--expect-final]
```

**Environment**: If `GATEWAY_TOKEN` is available, it is passed as
`OPENCLAW_GATEWAY_TOKEN` in the child process environment.

**Timeout**: `opts.timeout` (default 15000ms) is passed to the CLI as
`--timeout`; the `execFileSync` call itself is given that value plus a 5000ms
buffer.

**Response parsing**: stdout is parsed as JSON. Non-JSON prefix lines (e.g.
plugin init logs) are stripped. On error, stderr and stdout are both checked for
parseable JSON before throwing.
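
A sketch of assembling the invocation and stripping log lines before the JSON payload. The helper names are hypothetical, the error-path check of stderr is not covered, and the stripping rule (take the first line from which the rest of stdout parses as JSON) is an assumed heuristic that copes with log lines such as `[plugins] ...` that themselves start with `[`.

```javascript
// Build argv/options for `openclaw gateway call`.
function gatewayCallInvocation(method, params, { timeout = 15000, expectFinal = false, token, env = {} } = {}) {
  const args = ["gateway", "call", method, "--json", "--params", JSON.stringify(params), "--timeout", String(timeout)];
  if (expectFinal) args.push("--expect-final");
  // The token travels via the child's environment, not argv.
  const childEnv = token ? { ...env, OPENCLAW_GATEWAY_TOKEN: token } : { ...env };
  return {
    file: "openclaw",
    args,
    // execFileSync gets the CLI timeout plus a 5000ms buffer.
    options: { env: childEnv, timeout: timeout + 5000, encoding: "utf8" },
  };
}

// Return the JSON payload, skipping any leading non-JSON log lines.
function parseGatewayCallOutput(stdout) {
  const lines = stdout.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const t = lines[i].trimStart();
    if (!t.startsWith("{") && !t.startsWith("[")) continue;
    try {
      return JSON.parse(lines.slice(i).join("\n"));
    } catch {
      // e.g. a "[plugins] ..." log line: keep scanning
    }
  }
  throw new Error("gateway call returned no JSON");
}
```

In practice `env` would usually be `process.env`, and the call would be `parseGatewayCallOutput(execFileSync(inv.file, inv.args, inv.options))`.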

#### Methods called

**`sessions.patch`** -- Configure session properties before agent dispatch.

Called in `cmdEnqueue()` for fresh sessions:

```json
// Set spawn depth
{ "key": "<sessionKey>", "spawnDepth": 1 }

// Set model override (when --model is provided)
{ "key": "<sessionKey>", "model": "<model>" }

// Set thinking level (when --thinking is provided)
{ "key": "<sessionKey>", "thinkingLevel": "low" | "high" | "xhigh" | null }
```

**`agent`** -- Dispatch a message to an agent session.

Called in `cmdEnqueue()` and `cmdSend()`:

```json
{
  "message": "<task message>",
  "sessionKey": "<session key>",
  "idempotencyKey": "<uuid>",
  "deliver": true,
  "lane": "subagent",
  "timeout": 300,
  "label": "<label>",
  "thinking": "high",
  "channel": "telegram",
  "replyTo": "<chat_id>",
  "replyChannel": "telegram"
}
```

For `cmdSend` (mid-session steering), the call uses `lane: 'nested'` and
`deliver: false`.

**`chat.history`** -- Retrieve session transcript.

Called in `cmdResult()`:

```json
{ "sessionKey": "<session key>" }
```

Response expected:

```json
{
  "messages": [
    { "role": "assistant", "content": "..." },
    ...
  ]
}
```

The scheduler scans backwards to find the last assistant message.
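
The backwards scan can be sketched as follows. The function name is hypothetical, and support for array-style content parts (`{ type: "text", text }`) is an assumption, since the contract only shows string content.

```javascript
// Sketch: return the text of the last assistant message in a chat.history result.
function lastAssistantMessage(history) {
  const messages = Array.isArray(history?.messages) ? history.messages : [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m?.role !== "assistant") continue;
    if (typeof m.content === "string") return m.content;
    if (Array.isArray(m.content)) {
      return m.content.filter((p) => p?.type === "text").map((p) => p.text).join("");
    }
  }
  return null; // no assistant reply yet
}
```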

**`sessions.list`** -- List active sessions (gateway API fallback).

Called in `checkSessionDone()` when a session is not found in the
local `sessions.json` store:

```json
{ "activeMinutes": 1440 }
```

Used to confirm whether a session is still active in the gateway before
auto-resolving it as done. This handles the case where subagent sessions
(openclaw 2026.3.13+) are tracked via SessionBindingService and are NOT
written to `sessions.json`.

---

## Session Lifecycle

### Creation

Sessions are created implicitly. The scheduler generates a session key in the
format `agent:<agentId>:subagent:<uuid>` (`dispatch/index.mjs` `makeSessionKey()`).
No explicit "create session" API exists -- the gateway creates the
session when it first receives a request with that key.
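
A hypothetical equivalent of `makeSessionKey()` is shown below, plus a helper that reads the agent namespace back from any `agent:`-prefixed key. `randomUUID` is injectable so the format can be checked deterministically; the default uses `globalThis.crypto.randomUUID()`, which recent Node.js versions expose globally.

```javascript
// Produce the documented `agent:<agentId>:subagent:<uuid>` format.
function makeSessionKey(agentId, randomUUID = () => globalThis.crypto.randomUUID()) {
  return `agent:${agentId}:subagent:${randomUUID()}`;
}

// Extract the owning agent from an `agent:<agentId>:...` key (null if not agent-scoped).
function agentIdFromSessionKey(key) {
  const parts = String(key).split(":");
  return parts[0] === "agent" && parts.length >= 3 ? parts[1] : null;
}
```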

### Configuration (Pre-dispatch)

Before dispatching work, `cmdEnqueue` patches the session via
`openclaw gateway call sessions.patch` to set:
- `spawnDepth: 1` (always, for fresh sessions)
- `model` (if the `--model` flag was provided)
- `thinkingLevel` (if the `--thinking` flag was provided)

### Dispatch

The scheduler dispatches work via two paths:

1. **Isolated agent turns** (`dispatcher.js` -> `dispatcher-strategies.js`):
   Uses `runAgentTurnWithActivityTimeout()`, which calls
   `POST /v1/chat/completions`. The response session key is stored in the run
   record via `updateRunSession()`.

2. **Sub-agent dispatch** (`dispatch/index.mjs`): Uses
   `openclaw gateway call agent`, the CLI-based equivalent. The session key
   and idempotency key are tracked in the labels.json ledger.

### Polling

The `runAgentTurnWithActivityTimeout` function polls session activity during
long-running turns by calling `listSessions()` (which invokes
`sessions_list` via `/tools/invoke`) at `pollIntervalMs` intervals (default
60s). It checks `updatedAt` on the matched session to determine whether the
agent is still active.

### Status Checking

`dispatch/index.mjs` checks session state through two mechanisms:

1. **Local sessions store**: Reads
   `~/.openclaw/agents/<agent>/sessions/sessions.json` directly from disk
   (`readSessionsStore()`). This is treated as ground truth for
   sessions that appear there.

2. **Gateway API fallback**: When a session is not found in the local store
   (common for subagent sessions in openclaw 2026.3.13+),
   `checkSessionDone()` falls back to `openclaw gateway call sessions.list`
   to confirm whether the session is still active.

### Completion Detection

A session is considered done when:
- It is found in neither the sessions store nor the gateway API, and it is
  past the 5-minute grace period for young sessions.
- It is found but has been idle past the threshold (default: the larger of the
  job timeout and 10 minutes).
- The agent explicitly calls the `done` subcommand, which sets the label status
  in labels.json.
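
The completion rules reduce to a pure function like the sketch below. The function and its input field names are hypothetical; the 5-minute grace period and the "larger of the job timeout and 10 minutes" idle threshold come from the list above.

```javascript
// Sketch of the completion rules as a pure decision.
const YOUNG_SESSION_GRACE_MS = 5 * 60_000;
const MIN_IDLE_THRESHOLD_MS = 10 * 60_000;

function isSessionDone({ labelStatus, foundInStore, foundInGateway, createdAtMs, lastActivityMs, jobTimeoutMs = 0, nowMs }) {
  if (labelStatus === "done") return true; // agent called the `done` subcommand
  if (!foundInStore && !foundInGateway) {
    return nowMs - createdAtMs > YOUNG_SESSION_GRACE_MS; // missing and no longer "young"
  }
  const idleThreshold = Math.max(jobTimeoutMs, MIN_IDLE_THRESHOLD_MS);
  return nowMs - lastActivityMs > idleThreshold;
}
```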

### Patching (Post-completion)

No explicit session close/delete API is called. Sessions remain in the store
after completion. Label status is updated in labels.json to `done`,
`interrupted`, or `error`.

---

## Multi-Agent Gateway Routing

A single OpenClaw gateway instance serves multiple agents. The scheduler
dispatches to specific agents by setting the `x-openclaw-agent-id` header
(or encoding the agent ID in the model string as `openclaw:<agentId>`).
No pre-registration step is required -- the gateway creates agent-scoped
state on first request.

### Agent ID resolution

The gateway resolves the target agent ID from each inbound request using
two sources, in priority order:

1. **Header**: `x-openclaw-agent-id` (or `x-openclaw-agent`). Highest
   priority. This is what the scheduler sets.
2. **Model string**: `openclaw:<agentId>` or `agent:<agentId>` patterns
   parsed from the `model` field in the request body.

If neither is present, the gateway defaults to `"main"`. Agent IDs are
normalized to lowercase and must match `[a-z0-9][a-z0-9_-]{0,63}`.

Reference: `openclaw/src/gateway/http-utils.ts`
(`resolveAgentIdFromHeader`, `resolveAgentIdFromModel`,
`resolveAgentIdForRequest`).
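
For illustration only, the resolution order can be re-expressed as the sketch below. It is not the code in `openclaw/src/gateway/http-utils.ts`; the function names are hypothetical, and trimming whitespace and ignoring an ID that fails validation (rather than rejecting the request) are assumptions.

```javascript
// Hypothetical re-expression of the header > model string > "main" order.
const AGENT_ID_RE = /^[a-z0-9][a-z0-9_-]{0,63}$/;

function normalizeAgentId(raw) {
  const id = typeof raw === "string" ? raw.trim().toLowerCase() : "";
  return AGENT_ID_RE.test(id) ? id : null;
}

function resolveAgentId(headers, model) {
  // 1. Header (names assumed already lowercased, as in Node's IncomingMessage).
  const fromHeader = normalizeAgentId(headers["x-openclaw-agent-id"] ?? headers["x-openclaw-agent"]);
  if (fromHeader) return fromHeader;
  // 2. `openclaw:<agentId>` or `agent:<agentId>` in the model field.
  const m = /^(?:openclaw|agent):([^:]+)/i.exec(model ?? "");
  const fromModel = m ? normalizeAgentId(m[1]) : null;
  // 3. Default.
  return fromModel ?? "main";
}
```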

### Agent-scoped session keys

Sessions are namespaced by agent ID. The session key format is:

```
agent:<agentId>:<prefix>:<identifier>
```

Examples:
- `agent:main:subagent:a1b2c3d4-...` -- main agent, scheduler-dispatched
  isolated session
- `agent:beta:openai:e5f6g7h8-...` -- beta agent, OpenAI-compatible chat session
- `agent:main:telegram:webhook:123456789` -- main agent, Telegram peer

This namespacing provides session isolation between agents. Agent beta's
sessions cannot read main's conversation history or tool state, and
vice versa, even though both run on the same gateway.

Reference: `openclaw/src/routing/session-key.ts`
(`buildAgentMainSessionKey`, `DEFAULT_AGENT_ID`).

### Per-agent configuration

Each agent has its own directory tree under `~/.openclaw/agents/<agentId>/`:

- `agent/models.json` -- provider endpoints and model definitions for this
  agent. Different agents can use different model providers (e.g. main
  uses Anthropic, beta uses OpenAI Codex via a different base URL).
- `agent/auth-profiles.json` -- credential profiles scoped to this agent.
  Each agent can have independent API keys, OAuth tokens, and provider
  configurations.
- `sessions/` -- per-agent session store (sessions.json + JSONL files).

The gateway reads from the correct agent directory based on the resolved
agent ID. This means agents on the same gateway can have completely
independent credential surfaces.

### Scheduler dispatch to non-default agents

The scheduler targets a specific agent by setting `agent_id` on the job:

```json
{
  "name": "Beta Agent Daily Task",
  "agent_id": "beta",
  "session_target": "isolated",
  "payload_kind": "agentTurn",
  "payload_message": "perform daily check"
}
```

At dispatch time, `gateway.js` sets `x-openclaw-agent-id: beta` on the
outbound `/v1/chat/completions` request. The gateway routes the request
to beta's agent scope, creates a session under `agent:beta:...`, and
uses beta's model and auth profile configuration.

Jobs without an explicit `agent_id` default to `"main"`.

### Multi-agent trust considerations

When multiple agents share a gateway, each agent is a separate execution
principal with its own credential surface:

- Auth profiles are per-agent (`~/.openclaw/agents/<id>/agent/auth-profiles.json`).
  A job dispatched to beta uses beta's profiles, not main's.
- The scheduler's `child_credential_policy` applies within a single
  agent's dispatch chain. Cross-agent credential scoping (e.g. a main
  job triggering a beta child with downscoped credentials) is not
  currently supported -- each agent resolves credentials from its own
  profile store.
- The `x-openclaw-env-inject` header is agent-agnostic: materialized
  env vars are forwarded to whichever agent the job targets.
- Session isolation between agents is enforced by the session key
  namespace. Agent A cannot access agent B's sessions or conversation
  history through the gateway.

For the broader trust architecture, see `docs/trust-architecture.md`.

---

## Activity Timeout Pattern

`runAgentTurnWithActivityTimeout()` in `gateway.js` implements a
two-tier timeout for the `/v1/chat/completions` call:

### Absolute Timeout

A hard ceiling (`absoluteTimeoutMs`, default 300000ms / 5 min) fires
regardless of activity. Maps to `job.run_timeout_ms`.

### Idle Timeout

Polls session activity via `listSessions()` at `pollIntervalMs` (default
60000ms / 1 min). Tracks a `lastSeenActivity` timestamp. If the session has been
idle for `2 * idleTimeoutMs` (default `2 * 120000ms = 240s`), the request is
aborted. The idle threshold maps to `job.payload_timeout_seconds`.

### Abort Reasons

On abort, the error message distinguishes the cause:
- `idle_timeout`: "Session idle for Ns -- aborted (activity-based timeout)"
- `absolute_timeout`: "Exceeded absolute timeout of Ns"

### Parameters

| Parameter | Default | Source |
|---|---|---|
| `idleTimeoutMs` | 120000 | `job.payload_timeout_seconds * 1000` |
| `pollIntervalMs` | 60000 | Hardcoded |
| `absoluteTimeoutMs` | 300000 | `job.run_timeout_ms` |
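
The per-poll decision can be sketched as a pure function. In `gateway.js` this logic would be wired to an `AbortController` and a poll of `listSessions()` every `pollIntervalMs`; that plumbing is omitted, and the function name and the use of `>=` comparisons are assumptions made for illustration.

```javascript
// Sketch: decide whether to abort, and why, at a given poll.
function checkActivityTimeout({ nowMs, startedAtMs, lastSeenActivityMs, idleTimeoutMs = 120_000, absoluteTimeoutMs = 300_000 }) {
  if (nowMs - startedAtMs >= absoluteTimeoutMs) {
    return {
      reason: "absolute_timeout",
      message: `Exceeded absolute timeout of ${Math.round(absoluteTimeoutMs / 1000)}s`,
    };
  }
  const idleMs = nowMs - lastSeenActivityMs;
  if (idleMs >= 2 * idleTimeoutMs) { // note the 2x factor on the idle threshold
    return {
      reason: "idle_timeout",
      message: `Session idle for ${Math.round(idleMs / 1000)}s -- aborted (activity-based timeout)`,
    };
  }
  return null; // keep waiting
}
```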

---

## Auth-Profile Forwarding

Jobs can specify an `auth_profile` field with three modes:

### `null` (default)

No `x-openclaw-auth-profile` header is sent. The gateway uses its default
authentication profile.

### `"inherit"`

The scheduler resolves the main session's active auth profile at dispatch time.
It calls `listSessions({ kinds: ['main'], activeMinutes: 120, limit: 10 })` via
the `sessions_list` tool, finds the main session, and reads its
`authProfileOverride`, `authProfile`, or `profile` field (in that priority
order).

If a profile is found, it replaces `'inherit'` with the resolved profile ID
string. If no main session profile is found, `'inherit'` is passed through
as-is to the gateway.

Reference: `dispatcher-strategies.js` `executeAgent()`.

### `"provider:label"` (explicit)

A specific provider and label string (e.g. `anthropic:production`) is passed
directly as the `x-openclaw-auth-profile` header value without resolution.
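
The three modes can be sketched as follows. The function name is hypothetical, `listSessions` is injected, and taking the first returned main session is an assumption; the field priority and the pass-through of `'inherit'` follow the text above.

```javascript
// Sketch: map a job's auth_profile to the header value (null = no header).
async function resolveAuthProfile(authProfile, listSessions) {
  if (authProfile == null) return null; // mode 1: no header
  if (authProfile !== "inherit") return authProfile; // mode 3: explicit "provider:label"
  // mode 2: look up the main session's profile
  const sessions = await listSessions({ kinds: ["main"], activeMinutes: 120, limit: 10 });
  const main = sessions[0]; // assumption: first main session returned
  const resolved = main?.authProfileOverride || main?.authProfile || main?.profile;
  return resolved || "inherit"; // pass through unchanged when nothing is found
}
```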

---

## Env-Inject Forwarding

When credential materialization for an agent task produces a non-empty plain
object of string environment variables, the scheduler JSON-encodes that map
and sends it as the `x-openclaw-env-inject` header on
`POST /v1/chat/completions`.

Validation rules:

- Arrays, non-plain objects, and null/undefined values are rejected.
- Empty objects are omitted.
- All values must be strings.
- Serialization uses `Object.fromEntries` on validated entries so hidden
  `toJSON` hooks on the original object cannot alter the payload.
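
A sketch of the sender-side rules follows. The contract does not say whether an invalid map throws or is silently dropped; this sketch throws a `TypeError` (an assumption) and returns `null` when no header should be sent. The function names are hypothetical.

```javascript
function isPlainObject(v) {
  if (v === null || typeof v !== "object" || Array.isArray(v)) return false;
  const proto = Object.getPrototypeOf(v);
  return proto === Object.prototype || proto === null;
}

// Returns the header value, or null when no header should be sent.
function serializeEnvInject(env) {
  if (env == null) return null; // nothing materialized
  if (!isPlainObject(env)) throw new TypeError("env-inject must be a plain object");
  const entries = Object.entries(env); // own enumerable string keys only
  if (entries.length === 0) return null; // empty objects are omitted
  for (const [key, value] of entries) {
    if (typeof value !== "string") throw new TypeError(`env-inject value for ${key} must be a string`);
  }
  // Serializing a fresh object built from the validated entries means a
  // (possibly non-enumerable) toJSON hook on the original is never consulted.
  return JSON.stringify(Object.fromEntries(entries));
}
```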

### Precedence when both headers are present

A request may include both `x-openclaw-auth-profile` and
`x-openclaw-env-inject`. These are complementary, not competing:

- `x-openclaw-auth-profile` selects which credential profile the gateway
  uses for upstream API calls (model provider routing).
- `x-openclaw-env-inject` injects task-scoped environment variables into
  the child session's process environment (credential materialization).

If the gateway receives both, it should apply both: select the auth profile
for provider routing, and merge the env vars into the child environment.
Neither header overrides the other.

### Header size limits

Materialized env maps should be kept small (a handful of API keys and
scope tokens). The scheduler does not enforce a size limit, but HTTP
proxies and gateways typically cap individual header values at 8 KB.
Gateway implementations should reject `x-openclaw-env-inject` values
that exceed a reasonable threshold (suggested: 8192 bytes) and return
`431 Request Header Fields Too Large`.

### Receiver-side implementation notes

When the gateway parses `x-openclaw-env-inject`, it must use a safe
merge strategy. Specifically:

- Parse the header value with `JSON.parse`.
- Validate that the result is a plain object (not an array, and not a
  prototype-chain exploit).
- Merge only string-valued entries into the child process environment.
- Do not use recursive merge or spread into `Object.prototype` --
  naive merge enables prototype pollution.
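
A receiver-side sketch of these notes follows. The 8192-byte threshold and `431` status come from this section; the function name, answering malformed values with `400`, skipping non-string values rather than rejecting the whole header, and the explicit deny-list of prototype-related keys are assumptions.

```javascript
const ENV_INJECT_MAX_BYTES = 8192;
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function parseEnvInjectHeader(headerValue, baseEnv) {
  if (Buffer.byteLength(headerValue, "utf8") > ENV_INJECT_MAX_BYTES) {
    return { ok: false, status: 431 };
  }
  let parsed;
  try {
    parsed = JSON.parse(headerValue);
  } catch {
    return { ok: false, status: 400 };
  }
  // JSON.parse only yields plain objects, arrays, or primitives.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { ok: false, status: 400 };
  }
  // Copy into a fresh object; never merge into shared objects or prototypes.
  const env = { ...baseEnv };
  for (const [key, value] of Object.entries(parsed)) {
    if (typeof value !== "string" || FORBIDDEN_KEYS.has(key)) continue;
    env[key] = value;
  }
  return { ok: true, env };
}
```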

This path requires matching receiver-side support in the gateway. Until that
support is available, `auth_profile` forwarding remains the compatibility path
for agent-side credential selection.

Reference: `gateway.js` (`runAgentTurn()`,
`runAgentTurnWithActivityTimeout()`) and `dispatcher-strategies.js`
(`executeAgent()`).

---

## Trust Architecture

For the full trust architecture -- including what the scheduler/child
boundary guarantees vs. what it does not, the credential flow from operator
to child, and the distinction between security boundaries and operational
boundaries -- see `docs/trust-architecture.md`.

The gateway contract intersects with the trust architecture at these points:

- **Session isolation:** isolated sessions cannot access the main session's
  memory or history. This provides context isolation between parent and child
  tasks.
- **Auth-profile forwarding:** the scheduler can direct the gateway to use a
  specific credential profile for agent tasks (see "Auth-Profile Forwarding"
  above).
- **Credential materialization:** for shell tasks, credentials are injected as
  environment variables by the identity provider. For agent tasks, the
  scheduler can now forward a materialized env map via
  `x-openclaw-env-inject`; `auth_profile` forwarding remains the profile-based
  compatibility path when the gateway does not yet apply env injection.

---

## Local Provider Plugins

### Dispatch-Time Authorization Evaluation

The scheduler evaluates **inline** `authorization` JSON at dispatch time. When
the authorization blob names a provider (`authorization.provider` or
`authorization.authorization_provider`), that provider is invoked and must
return one of `permit`, `deny`, or `escalate`; unsupported or missing decisions
fail closed as `deny`.

`authorization_ref` by itself is **not** an external-policy lookup mechanism
today. If `authorization_ref` is set and `authorization` is empty, dispatch-time
evaluation fails closed with `deny` because external policy resolution is not
implemented yet. Jobs that need a dispatch-time authorization gate must provide
an inline authorization blob (optionally provider-backed), or remove the ref.
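
The fail-closed rules can be sketched as below. The provider interface (`evaluate(authorization, job)`) and the registry shape (a `Map` keyed by provider name) are hypothetical, and because evaluation of an inline blob without a provider is not described here, the sketch leaves that case unresolved. Treating a job with neither `authorization` nor `authorization_ref` as ungated, and mapping a provider exception to `deny`, are also assumptions.

```javascript
const DECISIONS = new Set(["permit", "deny", "escalate"]);

async function evaluateDispatchAuthorization(job, providers) {
  const auth = job.authorization;
  const hasInline = auth && typeof auth === "object" && Object.keys(auth).length > 0;
  if (!hasInline) {
    if (job.authorization_ref) return { decision: "deny", reason: "authorization_ref resolution not implemented" };
    return { decision: "permit", reason: "no dispatch-time gate" }; // assumption
  }
  const name = auth.provider ?? auth.authorization_provider;
  if (!name) return { decision: null, reason: "inline evaluation without a provider (not covered)" };
  const provider = providers.get(name);
  if (!provider) return { decision: "deny", reason: `provider ${name} not loaded` }; // fail closed
  try {
    const raw = await provider.evaluate(auth, job);
    const decision = typeof raw === "string" ? raw : raw?.decision;
    return DECISIONS.has(decision)
      ? { decision, reason: "provider" }
      : { decision: "deny", reason: "unsupported or missing decision" };
  } catch (err) {
    return { decision: "deny", reason: `provider error: ${err.message}` }; // assumption
  }
}
```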
754
+
755
+ The scheduler can load local identity, authorization, and proof-verifier
756
+ plugins from `SCHEDULER_PROVIDER_PATH` at startup. Every `*.js` file in that
757
+ directory is imported and registered by `provider-registry.js`.
+
+ This is a high-trust boundary:
+
+ - `SCHEDULER_PROVIDER_PATH` should point only to operator-controlled code.
+ - The directory should not be writable by untrusted users or automation.
+ - If a job explicitly references a provider or verifier and that plugin is not
+ loaded, the v0.2 runtime fails closed instead of falling back to structural
+ checks.
+ - Credential handoff materialization is currently shell-only. Jobs that declare
+ `identity.presentation` or `credential_handoff` must use
+ `session_target: "shell"`; non-shell jobs fail closed at validation/dispatch
+ time.
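The two fail-closed rules in the list above can be sketched as a validation pass. The field names (`identity.presentation`, `credential_handoff`, `session_target`) follow this section, but the function itself is hypothetical, and `referenced` (the provider and verifier names a job depends on) is assumed to be collected elsewhere.

```javascript
// Hypothetical validation pass for the fail-closed rules above. `referenced`
// lists provider/verifier names the job depends on; `loaded` is the set of
// names registered at startup.
function validateTrustRequirements(job, referenced, loaded) {
  const errors = [];
  const declaresHandoff = Boolean(job.identity?.presentation || job.credential_handoff);
  if (declaresHandoff && job.session_target !== "shell") {
    errors.push('credential handoff requires session_target "shell"');
  }
  for (const name of referenced) {
    if (!loaded.has(name)) {
      // No fallback to structural checks: a missing plugin is an error.
      errors.push(`referenced provider or verifier not loaded: ${name}`);
    }
  }
  return errors; // an empty array means the job may proceed
}
```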
+
+ For the broader trust architecture that frames this provider trust boundary
+ within the scheduler/child execution model, see `docs/trust-architecture.md`.
+
+ Reference:
+ - `dispatcher.js` `main()` (provider loading at startup)
+ - `provider-registry.js` `loadProviders()`
+ - `v02-runtime.js` (`resolveIdentity()`, `verifyAuthorizationProof()`, `evaluateAuthorization()`)
+
+ ---
+
+ ## Cancellation and Interruption
+
+ ### Current State
+
+ There is no explicit cancel API. In-flight turns are stopped only by
+ timeout-based aborts, and stuck sessions are handled by watchdog jobs:
+
+ - **`runAgentTurn`**: Hard `AbortController` timeout on the fetch request.
+ - **`runAgentTurnWithActivityTimeout`**: Two-tier abort (idle + absolute).
+ - **Watchdog jobs**: `dispatch/index.mjs` registers watchdog cron jobs that run
+ the `stuck` subcommand. Stuck sessions are reported but not actively
+ cancelled; they are auto-resolved as `interrupted` in the labels ledger
+ when the sessions store confirms they are idle.
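The two-tier pattern behind `runAgentTurnWithActivityTimeout` can be sketched with a single `AbortController`: an idle timer that is re-armed on activity and an absolute timer that never is. This is an illustration of the pattern, not the code in `gateway.js`; the function name is hypothetical, and the timer functions are injectable (defaulting to the global `setTimeout`/`clearTimeout`) so the logic can be exercised without real waiting.

```javascript
// Two-tier abort: an idle timer re-armed on each sign of activity, plus an
// absolute timer that is never re-armed. Whichever fires first aborts.
function createTwoTierAbort({ idleMs, absoluteMs }, timers = globalThis) {
  const controller = new AbortController();
  let reason = null;
  let idleTimer = null;
  let absoluteTimer = null;

  // Clear both timers; safe to call more than once.
  function dispose() {
    if (idleTimer !== null) timers.clearTimeout(idleTimer);
    if (absoluteTimer !== null) timers.clearTimeout(absoluteTimer);
    idleTimer = absoluteTimer = null;
  }

  function fire(why) {
    if (controller.signal.aborted) return;
    reason = why; // "idle" or "absolute"
    dispose();
    controller.abort();
  }

  // Re-arm the idle timer; call this whenever the session shows activity.
  function touch() {
    if (controller.signal.aborted) return;
    if (idleTimer !== null) timers.clearTimeout(idleTimer);
    idleTimer = timers.setTimeout(() => fire("idle"), idleMs);
  }

  absoluteTimer = timers.setTimeout(() => fire("absolute"), absoluteMs);
  touch();
  return { signal: controller.signal, touch, dispose, get reason() { return reason; } };
}
```

In practice the returned `signal` would be passed to `fetch(url, { signal })`, with `touch()` called whenever activity polling observes progress, and `dispose()` called once the turn completes.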
+
+ When a session is auto-resolved (via `cmdStatus`, `cmdStuck`, or `cmdSync`),
+ the label is marked `interrupted` with a summary noting that work may be
+ incomplete. The associated watchdog job is disarmed via the scheduler CLI
+ (`jobs disable`).
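The auto-resolution step can be sketched as a pure state transition. The label fields (`status`, `summary`, `watchdogJobId`), the `running` status value, and the summary wording are hypothetical; the CLI arguments only mirror the `jobs disable` command named above, and passing the job ID as its argument is an assumption.

```javascript
// Hypothetical transition for auto-resolving a stuck session's label.
// Returns null when nothing should change.
function autoResolveLabel(label, { sessionIdle }) {
  if (label.status !== "running" || !sessionIdle) return null;
  return {
    label: {
      ...label,
      status: "interrupted",
      summary: "Auto-resolved after the session went idle; work may be incomplete.",
    },
    // Scheduler CLI arguments to disarm the watchdog (argument shape assumed).
    disableWatchdog: label.watchdogJobId ? ["jobs", "disable", String(label.watchdogJobId)] : null,
  };
}
```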
+
+ ### Proposed: Explicit Cancel API
+
+ A `sessions.cancel` method or `sessions.patch` with a cancel flag would allow
+ the scheduler to actively terminate a session rather than waiting for the
+ timeout to expire. This would reduce resource waste from abandoned sessions and
+ provide faster feedback to delivery targets.
+
+ ---
+
+ ## Version and Capability Discovery
+
+ ### Current State
+
+ The scheduler performs no version or capability checking against the gateway.
+ The `/health` endpoint is used only as a binary reachability check (2xx = up,
+ anything else = down). There is no mechanism to detect whether the gateway
+ supports specific tools, API versions, or features.
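The current binary check amounts to something like the following sketch. The function name, the 5-second timeout, and the injectable `fetchImpl` are assumptions for illustration; the point is that only a 2xx status (`res.ok`) counts as up, while any other status, a network error, or a timeout counts as down.

```javascript
// Binary reachability: only a 2xx response counts as "up". Non-2xx
// statuses, network errors, and the (assumed) timeout all count as "down".
async function isGatewayUp(baseUrl, { timeoutMs = 5000, fetchImpl = globalThis.fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(new URL("/health", baseUrl), { signal: controller.signal });
    return res.ok; // true for statuses 200-299
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

A `true` result says nothing about which tools or API versions the gateway supports.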
+
+ This creates a fragile coupling: if the gateway removes a tool or changes its
+ contract (e.g., the `sessions_list` response shape), the scheduler fails at
+ runtime with opaque errors rather than a clear incompatibility signal.
+
+ ### Proposed: Version and Capability Endpoint
+
+ The `/health` response should include version and capability metadata:
+
+ ```json
+ {
+   "ok": true,
+   "version": "2026.3.15",
+   "capabilities": [
+     "sessions_list",
+     "sessions.patch",
+     "chat.history",
+     "agent",
+     "message"
+   ]
+ }
+ ```
+
+ Alternatively, a dedicated `GET /v1/info` endpoint could serve this purpose,
+ keeping `/health` lightweight for load balancer probes.
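If the gateway adopted the proposed metadata, the scheduler could check it once at startup. This sketch assumes the field names from the JSON example above; the function name, the `required` list, and the three result states are hypothetical. Treating a response without `capabilities` as `unknown` rather than incompatible would keep older gateways usable.

```javascript
// Compare proposed /health (or /v1/info) metadata against the capabilities
// a scheduler build needs. `required` is a hypothetical per-build list.
function checkGatewayCompatibility(info, required) {
  if (!info || !Array.isArray(info.capabilities)) {
    // Older gateways without metadata: compatibility is unknown, not confirmed.
    return { status: "unknown", version: info?.version ?? null, missing: [] };
  }
  const advertised = new Set(info.capabilities);
  const missing = required.filter((cap) => !advertised.has(cap));
  return {
    status: missing.length === 0 ? "compatible" : "incompatible",
    version: info.version ?? null,
    missing,
  };
}
```

An `incompatible` result could then be reported with the missing names up front instead of surfacing later as an opaque runtime error.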
+
+ ---
+
+ ## Scheduler-vs-Native-Cron Distinction
+
+ ### Current State
+
+ There is no mechanism to distinguish scheduler-dispatched sessions from
+ sessions created by other sources (native openclaw cron, direct user
+ interaction, other subagent spawns). The scheduler generates unique session
+ keys with the format `agent:<id>:subagent:<uuid>`, but this is
+ indistinguishable from subagent sessions spawned by other means.
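A sketch of why the key format carries no origin: parsing a key of the form `agent:<id>:subagent:<uuid>` yields only an agent ID and a UUID. The function name is hypothetical, and the regular expression assumes agent IDs contain no colons and that `<uuid>` is a canonical 36-character UUID.

```javascript
// Parse keys of the form agent:<id>:subagent:<uuid>. Nothing in the key
// records who created the session.
const SUBAGENT_KEY = /^agent:([^:]+):subagent:([0-9a-f-]{36})$/i;

function parseSubagentKey(key) {
  const m = SUBAGENT_KEY.exec(key);
  return m ? { agentId: m[1], uuid: m[2].toLowerCase() } : null;
}
```

Keys produced by different callers parse to the same shape, so origin has to be carried somewhere other than the key.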
+
+ ### Proposed: x-openclaw-scheduler-run-id Header
+
+ Add a custom header to all scheduler-dispatched requests:
+
+ ```
+ x-openclaw-scheduler-run-id: <run_id>
+ ```
+
+ This would allow the gateway to tag session metadata with the originating
+ scheduler run, enabling:
+ - Filtering sessions by origin in the gateway UI or API
+ - Correlating gateway logs with scheduler run records
+ - Preventing duplicate dispatch if both the scheduler and native cron target
+ the same agent
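A sketch of how the proposed header could be attached on the scheduler side and indexed on the gateway side to support the first two uses above (filtering by origin and correlating with run records). The header exists only as a proposal, and `withSchedulerRunId` and `SessionOriginIndex` are hypothetical, not existing APIs.

```javascript
// Proposed header from this section; current gateways do not read it.
const RUN_ID_HEADER = "x-openclaw-scheduler-run-id";

// Scheduler side: stamp a dispatched request with its run ID.
function withSchedulerRunId(headers, runId) {
  return { ...headers, [RUN_ID_HEADER]: String(runId) };
}

// Hypothetical gateway-side index. Incoming header names are assumed to be
// lowercased already, as Node's http module does.
class SessionOriginIndex {
  constructor() {
    this.runIdBySession = new Map();
  }
  record(sessionKey, headers) {
    const runId = headers[RUN_ID_HEADER];
    this.runIdBySession.set(sessionKey, runId ? String(runId) : null);
  }
  // Correlate a scheduler run record with the session(s) it created.
  sessionsForRun(runId) {
    return [...this.runIdBySession].filter(([, id]) => id === runId).map(([key]) => key);
  }
  // Filter sessions by origin: scheduler-dispatched only.
  schedulerSessions() {
    return [...this.runIdBySession].filter(([, id]) => id !== null).map(([key]) => key);
  }
}
```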
+
+ ### Proposed: Session Source Metadata
+
+ Sessions should carry a `source` field in their metadata:
+
+ | Value | Description |
+ |---|---|
+ | `native-cron` | Created by openclaw's built-in cron system |
+ | `scheduler` | Created by openclaw-scheduler |
+ | `user` | Created by direct user interaction |
+ | `subagent` | Created by another agent session |
+
+ This could be set via `sessions.patch` at creation time or inferred from the
+ request headers.
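The two ways of setting `source` could be combined as below: an explicit value (for example, one supplied through `sessions.patch` at creation) wins, and otherwise the proposed `x-openclaw-scheduler-run-id` header implies `scheduler`. The function is hypothetical; in this sketch the other origins cannot be inferred from headers alone, so it returns `null` for them.

```javascript
// Allowed values from the proposed table above.
const SESSION_SOURCES = new Set(["native-cron", "scheduler", "user", "subagent"]);

// Prefer an explicit source; otherwise infer "scheduler" from the proposed
// run-id header. Header names are assumed to be lowercased.
function resolveSessionSource({ explicitSource, headers = {} } = {}) {
  if (explicitSource !== undefined) {
    if (!SESSION_SOURCES.has(explicitSource)) {
      throw new RangeError(`unknown session source: ${explicitSource}`);
    }
    return explicitSource;
  }
  return headers["x-openclaw-scheduler-run-id"] ? "scheduler" : null;
}
```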
+
+ ---
+
+ ## Summary of Gateway Dependencies
+
+ | Surface | Kind | Source File | Purpose |
+ |---|---|---|---|
+ | `POST /v1/chat/completions` | HTTP | `gateway.js` | Agent turn dispatch |
+ | `POST /tools/invoke` (sessions_list) | HTTP | `gateway.js` | Session activity polling, auth profile resolution |
+ | `POST /tools/invoke` (message) | HTTP | `gateway.js`, `dispatch/index.mjs` | Message delivery, notifications |
+ | `GET /health` | HTTP | `gateway.js` | Gateway reachability check |
+ | `GET /sessions/:key` | HTTP | `dispatch/index.mjs` | Session activity validation (done guard) |
+ | `openclaw system event` | CLI | `gateway.js` | Main-session event injection |
+ | `openclaw gateway call sessions.patch` | CLI | `dispatch/index.mjs` | Session configuration (model, thinking, spawnDepth) |
+ | `openclaw gateway call agent` | CLI | `dispatch/index.mjs` | Subagent session dispatch |
+ | `openclaw gateway call chat.history` | CLI | `dispatch/index.mjs` | Session transcript retrieval |
+ | `openclaw gateway call sessions.list` | CLI | `dispatch/index.mjs` | Session existence verification (fallback) |
+ | `x-openclaw-agent-id` | Header | `gateway.js` | Route request to correct agent |
+ | `x-openclaw-session-key` | Header (req) | `gateway.js` | Session continuity |
+ | `x-openclaw-session-key` | Header (resp) | `gateway.js` | Session key propagation |
+ | `x-openclaw-auth-profile` | Header | `gateway.js` | Auth profile override |
+ | `~/.openclaw/agents/<agent>/sessions/sessions.json` | File | `dispatch/index.mjs` | Local session state (ground truth) |