@action-llama/skill 0.23.8 → 0.24.2
- package/.claude-plugin/plugin.json +14 -0
- package/dist/build-docs.d.ts +32 -0
- package/dist/build-docs.d.ts.map +1 -0
- package/dist/build-docs.js +127 -0
- package/dist/build-docs.js.map +1 -0
- package/dist/index.d.ts +7 -7
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +9 -15
- package/dist/index.js.map +1 -1
- package/package.json +9 -6
- package/skills/al/SKILL.md +25 -0
- package/skills/al/agent-authoring.md +1436 -0
- package/skills/al/debugging.md +1075 -0
- package/skills/al/operations.md +1918 -0
- package/{content/commands/debug.md → skills/debug/SKILL.md} +12 -3
- package/{content/commands/iterate.md → skills/iterate/SKILL.md} +6 -0
- package/skills/new-agent/SKILL.md +36 -0
- package/{content/commands/run.md → skills/run/SKILL.md} +6 -0
- package/{content/commands/status.md → skills/status/SKILL.md} +5 -0
- package/content/AGENTS.md +0 -1128
- package/content/commands/new-agent.md +0 -24
- package/{content/mcp.json → .mcp.json} +0 -0
|
@@ -0,0 +1,1075 @@
|
|
|
1
|
+
# Agent Commands
|
|
2
|
+
|
|
3
|
+
Agents running in Docker containers have access to shell commands for persisting environment variables, signaling the scheduler, calling other agents, and coordinating with resource locks. These commands are installed at `/tmp/bin/` and taught to agents via a preamble injected before `SKILL.md`.
|
|
4
|
+
|
|
5
|
+
## Environment Commands
|
|
6
|
+
|
|
7
|
+
### `setenv`
|
|
8
|
+
|
|
9
|
+
Persist an environment variable across bash commands. Each bash command the agent runs starts in a fresh shell, so variables set with `export` are lost between commands. `setenv` makes them stick.
|
|
10
|
+
|
|
11
|
+
```bash
|
|
12
|
+
setenv <NAME> <value>
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
**First bash command — set variables:**
|
|
16
|
+
|
|
17
|
+
```bash
|
|
18
|
+
setenv REPO "acme/app"
|
|
19
|
+
setenv ISSUE_NUMBER 42
|
|
20
|
+
```
|
|
21
|
+
|
|
22
|
+
**Later bash command — variables are still available:**
|
|
23
|
+
|
|
24
|
+
```bash
|
|
25
|
+
gh issue view $ISSUE_NUMBER --repo $REPO
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
### How it works
|
|
29
|
+
|
|
30
|
+
`setenv` writes each variable to `/tmp/env.sh`, which is automatically sourced at the start of every bash command. The variable is also exported immediately in the current shell, so it's available right away in the same command.
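
The mechanism described above could be sketched roughly as follows. This is not the actual `setenv` source: `setenv_sketch` and the `ENV_FILE` override are hypothetical names used for illustration, and only the `/tmp/env.sh` path comes from the text.

```bash
# Hypothetical re-implementation of the behaviour described above;
# the real setenv may differ.
setenv_sketch() {
  local name="$1" value="$2"
  local env_file="${ENV_FILE:-/tmp/env.sh}"   # ENV_FILE override is an assumption
  local escaped
  # Escape embedded single quotes so the value survives being sourced later.
  escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")
  # Persist the variable for future shells...
  printf "export %s='%s'\n" "$name" "$escaped" >> "$env_file"
  # ...and export it immediately in the current shell.
  export "$name=$value"
}
```

Appending to a file that every new shell sources is what lets a value outlive the shell that set it, while the final `export` covers the rest of the current command.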
|
|
31
|
+
|
|
32
|
+
## Signal Commands
|
|
33
|
+
|
|
34
|
+
Signal commands write signal files that the scheduler reads after the session ends.
|
|
35
|
+
|
|
36
|
+
### `al-rerun`
|
|
37
|
+
|
|
38
|
+
Request an immediate rerun to drain remaining backlog. Without this, the scheduler treats the run as complete and waits for the next scheduled tick.
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
al-rerun
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
- Only applies to **scheduled** runs. Webhook-triggered and agent-called runs do not re-run.
|
|
45
|
+
- Reruns continue until the agent completes without calling `al-rerun`, hits an error, or reaches the `maxReruns` limit (default: 10).
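
As a sketch of the rules above, an agent could request a rerun only when work remains. `rerun_if_backlog` and its `remaining` argument are hypothetical; how the agent counts the remaining items is up to its own workflow.

```bash
# Hypothetical helper: request a rerun only when backlog remains.
rerun_if_backlog() {
  local remaining="$1"   # number of items still waiting, computed by the agent
  if [ "$remaining" -gt 0 ]; then
    al-rerun             # signal the scheduler to start another run
  fi
}

# Usage: rerun_if_backlog 3
```

Calling `al-rerun` unconditionally would keep rerunning an idle agent until `maxReruns` is reached, so gating it on actual backlog is the safer default.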
|
|
46
|
+
|
|
47
|
+
### `al-status "<text>"`
|
|
48
|
+
|
|
49
|
+
Update the status text shown in the TUI and web dashboard.
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
al-status "reviewing PR #42"
|
|
53
|
+
al-status "found 3 issues to work on"
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### `al-return "<value>"`
|
|
57
|
+
|
|
58
|
+
Return a value to the calling agent. Used when this agent was invoked via `al-subagent`.
|
|
59
|
+
|
|
60
|
+
```bash
|
|
61
|
+
al-return "PR looks good. Approved with minor suggestions."
|
|
62
|
+
al-return '{"approved": true, "comments": 2}'
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
The calling agent receives this value when it calls `al-subagent-wait`.
|
|
66
|
+
|
|
67
|
+
### `al-exit [code]`
|
|
68
|
+
|
|
69
|
+
Terminate the agent with an exit code indicating an unrecoverable error. Defaults to exit code 15.
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
al-exit # exit code 15
|
|
73
|
+
al-exit 1 # exit code 1
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
## Call Commands
|
|
77
|
+
|
|
78
|
+
Agent-to-agent calls allow agents to delegate work and collect results. These commands require the gateway (`GATEWAY_URL` must be set).
|
|
79
|
+
|
|
80
|
+
### `al-subagent <agent>`
|
|
81
|
+
|
|
82
|
+
Call another agent. Pass context via stdin. Returns a JSON response with a `callId`.
|
|
83
|
+
|
|
84
|
+
```bash
|
|
85
|
+
echo "Review PR #42 on acme/app" | al-subagent reviewer
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
**Response:**
|
|
89
|
+
|
|
90
|
+
```json
|
|
91
|
+
{"ok": true, "callId": "abc123"}
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
**Errors:**
|
|
95
|
+
|
|
96
|
+
```json
|
|
97
|
+
{"ok": false, "error": "self-call not allowed"}
|
|
98
|
+
{"ok": false, "error": "queue full"}
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
### `al-subagent-check <callId>`
|
|
102
|
+
|
|
103
|
+
Check the status of a call without blocking.
|
|
104
|
+
|
|
105
|
+
```bash
|
|
106
|
+
al-subagent-check abc123
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
**Response:**
|
|
110
|
+
|
|
111
|
+
```json
|
|
112
|
+
{"status": "pending"}
|
|
113
|
+
{"status": "running"}
|
|
114
|
+
{"status": "completed", "returnValue": "PR approved."}
|
|
115
|
+
{"status": "error", "error": "timeout"}
|
|
116
|
+
```
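
Because the check never blocks, it can be wrapped in a small predicate and interleaved with other work. The sketch below assumes the response shapes shown above; `call_finished` is a hypothetical helper, and it matches the `status` field with `grep` only to stay dependency-free (`jq` would be the more robust choice where it is installed).

```bash
# Hypothetical helper built on al-subagent-check.
# Returns 0 once the call is "completed" or "error", 1 while it is still
# pending or running, and 2 if the check command itself fails.
call_finished() {
  local response
  response=$(al-subagent-check "$1") || return 2
  printf '%s' "$response" |
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"(completed|error)"'
}
```

A caller can test `call_finished "$CALL_ID"` between slices of its own work and only fetch the full result once it returns 0.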
|
|
117
|
+
|
|
118
|
+
### `al-subagent-wait <callId> [...] [--timeout N]`
|
|
119
|
+
|
|
120
|
+
Wait for one or more calls to complete. Polls every 5 seconds. Default timeout: 900 seconds.
|
|
121
|
+
|
|
122
|
+
```bash
|
|
123
|
+
al-subagent-wait abc123 --timeout 600
|
|
124
|
+
al-subagent-wait abc123 def456 --timeout 300
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
**Response:**
|
|
128
|
+
|
|
129
|
+
```json
|
|
130
|
+
{
|
|
131
|
+
"abc123": {"status": "completed", "returnValue": "PR approved."},
|
|
132
|
+
"def456": {"status": "completed", "returnValue": "Tests pass."}
|
|
133
|
+
}
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
### Complete call example
|
|
137
|
+
|
|
138
|
+
```bash
|
|
139
|
+
# Fire multiple calls
|
|
140
|
+
REVIEW_ID=$(echo "Review PR #42 on acme/app" | al-subagent reviewer | jq -r .callId)
|
|
141
|
+
TEST_ID=$(echo "Run full test suite for acme/app" | al-subagent tester | jq -r .callId)
|
|
142
|
+
|
|
143
|
+
# ... do other work ...
|
|
144
|
+
|
|
145
|
+
# Collect results
|
|
146
|
+
RESULTS=$(al-subagent-wait "$REVIEW_ID" "$TEST_ID" --timeout 600)
|
|
147
|
+
echo "$RESULTS" | jq --arg id "$REVIEW_ID" '.[$id].returnValue'
|
|
148
|
+
echo "$RESULTS" | jq --arg id "$TEST_ID" '.[$id].returnValue'
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
### Call rules
|
|
152
|
+
|
|
153
|
+
- An agent cannot call itself (self-calls are rejected)
|
|
154
|
+
- If all runners for the target agent are busy, the call is queued (up to `workQueueSize`, default: 100)
|
|
155
|
+
- Call chains are allowed (A calls B, B calls C) up to `maxCallDepth` (default: 3)
|
|
156
|
+
- Called runs do not re-run — they respond to the single call
|
|
157
|
+
- The called agent receives an `<agent-call>` block with the caller name and context
|
|
158
|
+
- To return a value, the called agent uses `al-return`
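
Since a call can be rejected (a self-call, or a full queue), it may be worth checking `ok` before using the `callId`. The sketch below assumes the response shapes shown earlier; `start_call` is a hypothetical wrapper, and it extracts the ID with `sed` only to avoid a `jq` dependency.

```bash
# Hypothetical wrapper: start a call and print its callId,
# or report the gateway's error and return non-zero.
start_call() {
  local agent="$1" context="$2" response
  response=$(printf '%s\n' "$context" | al-subagent "$agent")
  if printf '%s' "$response" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'; then
    # Print just the callId (jq -r .callId also works where jq is installed).
    printf '%s\n' "$response" |
      sed -n 's/.*"callId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
  else
    echo "al-subagent $agent rejected the call: $response" >&2
    return 1
  fi
}

# Usage: REVIEW_ID=$(start_call reviewer "Review PR #42 on acme/app") || al-exit
```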
|
|
159
|
+
|
|
160
|
+
## Lock Commands
|
|
161
|
+
|
|
162
|
+
Resource locks prevent multiple agent instances from working on the same resource. Lock keys use URI format (e.g. `github://acme/app/issues/42`).
|
|
163
|
+
|
|
164
|
+
### `rlock`
|
|
165
|
+
|
|
166
|
+
Acquire an exclusive lock on a resource.
|
|
167
|
+
|
|
168
|
+
```bash
|
|
169
|
+
rlock "github://acme/app/issues/42"
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
**Success:**
|
|
173
|
+
|
|
174
|
+
```json
|
|
175
|
+
{"ok": true}
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
**Already held:**
|
|
179
|
+
|
|
180
|
+
```json
|
|
181
|
+
{"ok": false, "holder": "dev-abc123", "heldSince": "2025-01-15T10:30:00Z"}
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
**Deadlock detected:**
|
|
185
|
+
|
|
186
|
+
```json
|
|
187
|
+
{"ok": false, "reason": "possible deadlock detected", "cycle": ["dev-abc", "github://acme/app/pr/10", "dev-def", "deploy://api-prod"]}
|
|
188
|
+
```
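
The three outcomes above can be folded into a single return code. This is a sketch that assumes the response shapes shown in this section; `try_lock` is a hypothetical helper, not a command shipped in the container.

```bash
# Hypothetical helper: classify an rlock response.
# Returns 0 = acquired, 1 = held by someone else (or other failure),
#         2 = possible deadlock reported.
try_lock() {
  local response
  response=$(rlock "$1")
  if printf '%s' "$response" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'; then
    return 0
  elif printf '%s' "$response" | grep -q '"cycle"'; then
    return 2
  else
    return 1
  fi
}

# Usage:
#   if try_lock "github://acme/app/issues/42"; then
#     ...work on the issue...
#     runlock "github://acme/app/issues/42"
#   fi
```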
|
|
189
|
+
|
|
190
|
+
### `runlock`
|
|
191
|
+
|
|
192
|
+
Release a lock. Only the holder can release.
|
|
193
|
+
|
|
194
|
+
```bash
|
|
195
|
+
runlock "github://acme/app/issues/42"
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
**Success:**
|
|
199
|
+
|
|
200
|
+
```json
|
|
201
|
+
{"ok": true}
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
**Not holder:**
|
|
205
|
+
|
|
206
|
+
```json
|
|
207
|
+
{"ok": false, "reason": "not the lock holder"}
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
### `rlock-heartbeat`
|
|
211
|
+
|
|
212
|
+
Reset the TTL on a held lock. Use during long-running work to prevent the lock from expiring.
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
rlock-heartbeat "github://acme/app/issues/42"
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
**Success:**
|
|
219
|
+
|
|
220
|
+
```json
|
|
221
|
+
{"ok": true, "expiresAt": "2025-01-15T11:00:00Z"}
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
### Example in SKILL.md
|
|
225
|
+
|
|
226
|
+
Reference lock commands directly in your `SKILL.md` workflow:
|
|
227
|
+
|
|
228
|
+
```markdown
|
|
229
|
+
## Workflow
|
|
230
|
+
|
|
231
|
+
1. List open issues labeled "agent" in repos from `<agent-config>`
|
|
232
|
+
2. For each issue:
|
|
233
|
+
- rlock "github://owner/repo/issues/123"
|
|
234
|
+
- If the lock fails, skip this issue — another instance is handling it
|
|
235
|
+
- Clone the repo, create a branch, implement the fix
|
|
236
|
+
- Open a PR and link it to the issue
|
|
237
|
+
- runlock "github://owner/repo/issues/123"
|
|
238
|
+
3. If you completed work and there may be more issues, run `al-rerun`
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
The preamble teaches the agent the lock commands, their responses, and the URI key format.
|
|
242
|
+
|
|
243
|
+
### Lock authentication
|
|
244
|
+
|
|
245
|
+
Each container gets a unique per-run secret. Lock requests are authenticated with this secret, so only the container that acquired a lock can release or heartbeat it. There is no way for one agent instance to release another's lock.
|
|
246
|
+
|
|
247
|
+
### Auto-release on exit
|
|
248
|
+
|
|
249
|
+
When a container exits — whether it finishes successfully, hits an error, or times out — all of its locks are released automatically by the scheduler.
|
|
250
|
+
|
|
251
|
+
See [Resource Locks](/concepts/resource-locks) for a complete description of the locking system.
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
# Runtime Context
|
|
256
|
+
|
|
257
|
+
When your agent runs, Action Llama assembles a prompt from several sources and passes it to the LLM as a single user message. Your `SKILL.md` body becomes the system prompt; everything below is the **user prompt** your agent receives alongside it.
|
|
258
|
+
|
|
259
|
+
Understanding this structure helps you write better `SKILL.md` instructions — you can reference the injected blocks by name, avoid duplicating information that's already provided, and tailor your instructions to complement the runtime context.
|
|
260
|
+
|
|
261
|
+
## Prompt Structure
|
|
262
|
+
|
|
263
|
+
Here's the full user prompt for a webhook-triggered agent with a GitHub token credential:
|
|
264
|
+
|
|
265
|
+
```xml
|
|
266
|
+
<agent-config>
|
|
267
|
+
{"repo":"acme/widgets","labels":["bug","triage"]}
|
|
268
|
+
</agent-config>
|
|
269
|
+
|
|
270
|
+
<credential-context>
|
|
271
|
+
Credential files are mounted at `/credentials/` (read-only).
|
|
272
|
+
|
|
273
|
+
Environment variables already set from credentials:
|
|
274
|
+
- `GITHUB_TOKEN` / `GH_TOKEN` — use `gh` CLI and `git` directly
|
|
275
|
+
|
|
276
|
+
Use standard tools directly: `gh` CLI, `git`, `curl`.
|
|
277
|
+
|
|
278
|
+
Git clone protocol: Always clone repos via SSH...
|
|
279
|
+
|
|
280
|
+
Anti-exfiltration policy:
|
|
281
|
+
[security instructions omitted]
|
|
282
|
+
</credential-context>
|
|
283
|
+
|
|
284
|
+
<environment>
|
|
285
|
+
Filesystem: The root filesystem is read-only. `/tmp` is the only writable directory.
|
|
286
|
+
Use `/tmp` for cloning repos, writing scratch files, and any other disk I/O.
|
|
287
|
+
Your working directory is `/app/static` which contains your agent files.
|
|
288
|
+
|
|
289
|
+
Environment variables: Use `setenv NAME value` to persist variables across bash commands.
|
|
290
|
+
See the agent commands reference for details.
|
|
291
|
+
</environment>
|
|
292
|
+
|
|
293
|
+
<webhook-trigger>
|
|
294
|
+
{"source":"github","event":"issues","action":"opened","repo":"acme/widgets",
|
|
295
|
+
"number":42,"title":"Login button broken on Safari","body":"Steps to reproduce...",
|
|
296
|
+
"url":"https://github.com/acme/widgets/issues/42","author":"jdoe",
|
|
297
|
+
"labels":["bug"],"sender":"jdoe","timestamp":"2026-03-24T14:30:00Z",
|
|
298
|
+
"receiptId":"wh_abc123"}
|
|
299
|
+
</webhook-trigger>
|
|
300
|
+
|
|
301
|
+
A webhook event just fired. Review the trigger context above and take appropriate action.
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
Let's walk through each section.
|
|
305
|
+
|
|
306
|
+
## Agent Config
|
|
307
|
+
|
|
308
|
+
```xml
|
|
309
|
+
<agent-config>
|
|
310
|
+
{"repo":"acme/widgets","labels":["bug","triage"]}
|
|
311
|
+
</agent-config>
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
This is the JSON serialization of the `params` field from your agent's `config.toml`. Use it to pass configuration values that your agent's instructions can reference — repo names, label filters, thresholds, or any structured data your agent needs.
|
|
315
|
+
|
|
316
|
+
Your `SKILL.md` instructions can reference this directly, e.g.: *"Read the repo and labels from `<agent-config>` to determine which issues to process."*
|
|
317
|
+
|
|
318
|
+
See [Agent Config Reference](/reference/agent-config#params) for details on the `params` field.
|
|
319
|
+
|
|
320
|
+
## Credential Context
|
|
321
|
+
|
|
322
|
+
```xml
|
|
323
|
+
<credential-context>
|
|
324
|
+
Credential files are mounted at `/credentials/` (read-only).
|
|
325
|
+
|
|
326
|
+
Environment variables already set from credentials:
|
|
327
|
+
- `GITHUB_TOKEN` / `GH_TOKEN` — use `gh` CLI and `git` directly
|
|
328
|
+
|
|
329
|
+
Use standard tools directly: `gh` CLI, `git`, `curl`.
|
|
330
|
+
...
|
|
331
|
+
</credential-context>
|
|
332
|
+
```
|
|
333
|
+
|
|
334
|
+
This block tells the agent which credentials are available and how to use them. Each credential type defines its own context line — for example, a GitHub token credential explains that `GITHUB_TOKEN` is set and the `gh` CLI is ready to use.
|
|
335
|
+
|
|
336
|
+
The block also includes SSH clone instructions and a **security policy** that instructs the agent never to leak credentials in logs, comments, or API calls.
|
|
337
|
+
|
|
338
|
+
You don't need to repeat any of this in your `SKILL.md`. The agent already knows it can use `gh` and `git` — your instructions just need to say *what* to do, not *how* to authenticate.
|
|
339
|
+
|
|
340
|
+
See [Credentials Reference](/reference/credentials) for all credential types and their injected context.
|
|
341
|
+
|
|
342
|
+
## Environment
|
|
343
|
+
|
|
344
|
+
```xml
|
|
345
|
+
<environment>
|
|
346
|
+
Filesystem: The root filesystem is read-only. `/tmp` is the only writable directory.
|
|
347
|
+
...
|
|
348
|
+
Environment variables: Use `setenv NAME value` to persist variables across bash commands.
|
|
349
|
+
</environment>
|
|
350
|
+
```
|
|
351
|
+
|
|
352
|
+
This block describes the filesystem constraints. In Docker mode, the agent learns that `/tmp` is writable and the root filesystem is read-only. In [host-user mode](/reference/agent-config#runtime), the agent's working directory is `/tmp/al-runs/<instance-id>/`. The `setenv` command persists environment variables in both modes.
|
|
353
|
+
|
|
354
|
+
See [Environment Commands](/reference/agent-commands#environment-commands) for `setenv` details, and [Container Filesystem](/concepts/agents#container-filesystem) for the full mount table.
|
|
355
|
+
|
|
356
|
+
## Trigger Context
|
|
357
|
+
|
|
358
|
+
The final section varies by how the agent was triggered. This is the only part of the prompt that changes between runs.
|
|
359
|
+
|
|
360
|
+
### Webhook
|
|
361
|
+
|
|
362
|
+
```xml
|
|
363
|
+
<webhook-trigger>
|
|
364
|
+
{"source":"github","event":"issues","action":"opened","repo":"acme/widgets",...}
|
|
365
|
+
</webhook-trigger>
|
|
366
|
+
|
|
367
|
+
A webhook event just fired. Review the trigger context above and take appropriate action.
|
|
368
|
+
```
|
|
369
|
+
|
|
370
|
+
Contains the full webhook payload as JSON — source, event type, action, repo, issue/PR details, sender, timestamp, and a receipt ID for replay. Your `SKILL.md` instructions should describe how to handle the events your agent subscribes to.
|
|
371
|
+
|
|
372
|
+
See [Webhooks Reference](/reference/webhooks) for the full payload schema.
|
|
373
|
+
|
|
374
|
+
### Scheduled
|
|
375
|
+
|
|
376
|
+
```
|
|
377
|
+
You are running on a schedule. Check for new work and act on anything you find.
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
No structured data — the agent is expected to go find work on its own (poll for open issues, check a queue, etc.).
|
|
381
|
+
|
|
382
|
+
### Manual
|
|
383
|
+
|
|
384
|
+
```
|
|
385
|
+
You have been triggered manually. Check for new work and act on anything you find.
|
|
386
|
+
```
|
|
387
|
+
|
|
388
|
+
As with a scheduled run, no structured data is included. If you pass a prompt to `al run`, the agent instead receives:
|
|
389
|
+
|
|
390
|
+
```xml
|
|
391
|
+
<user-prompt>
|
|
392
|
+
Your prompt text here
|
|
393
|
+
</user-prompt>
|
|
394
|
+
|
|
395
|
+
You have been given a specific task. Complete the task described above.
|
|
396
|
+
```
|
|
397
|
+
|
|
398
|
+
### Agent Call
|
|
399
|
+
|
|
400
|
+
```xml
|
|
401
|
+
<agent-call>
|
|
402
|
+
{"caller":"orchestrator","context":"Find competitors for Acme in the CRM space"}
|
|
403
|
+
</agent-call>
|
|
404
|
+
|
|
405
|
+
You were called by the "orchestrator" agent. Review the call context above,
|
|
406
|
+
do the requested work, and use `al-return` to send back your result.
|
|
407
|
+
```
|
|
408
|
+
|
|
409
|
+
Contains the calling agent's name and the context string it passed. See the [Subagents Guide](/guides/subagents) for details on agent-to-agent calls.
|
|
410
|
+
|
|
411
|
+
## Skills
|
|
412
|
+
|
|
413
|
+
If your agent enables [skills](/reference/agent-config#skills) like `lock` or `subagent`, additional instruction blocks are injected between the `<environment>` block and the trigger context. These teach the agent the commands it can use — `rlock`/`runlock` for [resource locks](/concepts/resource-locks), or `al-subagent`/`al-subagent-wait` for [subagent calls](/guides/subagents).
|
|
414
|
+
|
|
415
|
+
Skill blocks only appear when explicitly enabled in your `SKILL.md` frontmatter.
|
|
416
|
+
|
|
417
|
+
## Dynamic Context Injection
|
|
418
|
+
|
|
419
|
+
Beyond the assembled prompt, you can inject runtime data into your `SKILL.md` body using the `` !`command` `` syntax. This runs shell commands during container startup and replaces the markers with their output — useful for fetching live data before the LLM session begins.
|
|
420
|
+
|
|
421
|
+
See the [Dynamic Context Guide](/guides/dynamic-context) for details.
|
|
422
|
+
|
|
423
|
+
## Writing Better Instructions
|
|
424
|
+
|
|
425
|
+
Now that you know what the agent receives automatically, here are some tips:
|
|
426
|
+
|
|
427
|
+
- **Don't repeat what's injected.** You don't need to tell the agent about `GITHUB_TOKEN` or filesystem constraints — it already knows.
|
|
428
|
+
- **Reference injected blocks by name.** Say *"Read the config from `<agent-config>`"* rather than hardcoding values.
|
|
429
|
+
- **Handle your trigger types.** If your agent subscribes to both cron and webhooks, your instructions should cover both paths — the trigger context tells the agent which one fired.
|
|
430
|
+
- **Keep instructions focused on behavior.** The runtime context handles the "how" (credentials, environment, tools). Your `SKILL.md` should focus on the "what" and "why."
|
|
431
|
+
|
|
432
|
+
---
|
|
433
|
+
|
|
434
|
+
# Resource Locks
|
|
435
|
+
|
|
436
|
+
When you set `scale > 1` on an agent, multiple instances run concurrently. Without coordination, two instances might pick up the same GitHub issue, review the same PR, or deploy the same service at the same time. Resource locks prevent this.
|
|
437
|
+
|
|
438
|
+
## Why Locks Exist
|
|
439
|
+
|
|
440
|
+
Locks let concurrent agent instances claim exclusive ownership of a resource before working on it. If another instance already holds the lock, the agent skips that resource and moves on.
|
|
441
|
+
|
|
442
|
+
## How It Works
|
|
443
|
+
|
|
444
|
+
1. Before working on a shared resource, the agent runs `rlock "github://acme/app/issues/42"`.
|
|
445
|
+
2. If the lock is free, the agent gets it and proceeds.
|
|
446
|
+
3. If another instance already holds the lock, the agent gets back the holder's name and skips that resource.
|
|
447
|
+
4. When done, the agent runs `runlock "github://acme/app/issues/42"`.
|
|
448
|
+
|
|
449
|
+
The agent learns the lock commands from a preamble injected before the session starts. Agent authors just reference the commands in their `SKILL.md` workflow — no need to think about HTTP endpoints or authentication.
|
|
450
|
+
|
|
451
|
+
## Commands
|
|
452
|
+
|
|
453
|
+
| Command | Description |
|
|
454
|
+
|---------|-------------|
|
|
455
|
+
| `rlock "<uri>"` | Acquire an exclusive lock. Fails if another instance holds it. |
|
|
456
|
+
| `runlock "<uri>"` | Release a lock. Only the holder can release. |
|
|
457
|
+
| `rlock-heartbeat "<uri>"` | Reset the TTL on a held lock. |
|
|
458
|
+
|
|
459
|
+
See [Agent Commands — Locks](/reference/agent-commands#lock-commands) for the full command reference with response JSON.
|
|
460
|
+
|
|
461
|
+
## Resource Key URIs
|
|
462
|
+
|
|
463
|
+
Lock keys use URI format. Use a scheme that identifies the resource type, and a path that uniquely identifies the instance:
|
|
464
|
+
|
|
465
|
+
| Pattern | Example |
|
|
466
|
+
|---------|---------|
|
|
467
|
+
| `github://owner/repo/issues/number` | `rlock "github://acme/app/issues/42"` |
|
|
468
|
+
| `github://owner/repo/pr/number` | `rlock "github://acme/app/pr/17"` |
|
|
469
|
+
| `deploy://service-name` | `rlock "deploy://api-prod"` |
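
Keys only prevent collisions if every instance builds them the same way, so it can help to generate them from one place. The helper names below are hypothetical; the URI patterns are the ones from the table above.

```bash
# Hypothetical helpers that build lock keys in the patterns listed above.
issue_lock_key()  { printf 'github://%s/%s/issues/%s\n' "$1" "$2" "$3"; }
pr_lock_key()     { printf 'github://%s/%s/pr/%s\n' "$1" "$2" "$3"; }
deploy_lock_key() { printf 'deploy://%s\n' "$1"; }

issue_lock_key acme app 42    # prints github://acme/app/issues/42
deploy_lock_key api-prod      # prints deploy://api-prod
```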
|
|
470
|
+
|
|
471
|
+
## TTL and Expiry
|
|
472
|
+
|
|
473
|
+
Locks expire automatically after **30 minutes** by default. This prevents deadlocks if an agent crashes or hangs without releasing its lock. The timeout is configurable via `resourceLockTimeout` in `config.toml` (value in seconds).
|
|
474
|
+
|
|
475
|
+
For work that takes longer than the timeout, use `rlock-heartbeat` to extend the TTL. Each heartbeat resets the clock to another full TTL period. If the agent forgets to heartbeat and the lock expires, another instance can claim it.
|
|
476
|
+
|
|
477
|
+
## Heartbeat
|
|
478
|
+
|
|
479
|
+
During long-running work, periodically run `rlock-heartbeat` to keep the lock alive:
|
|
480
|
+
|
|
481
|
+
```markdown
|
|
482
|
+
## Workflow
|
|
483
|
+
|
|
484
|
+
1. rlock "deploy://api-prod"
|
|
485
|
+
2. Run the deployment (may take 45+ minutes)
|
|
486
|
+
- Every 10 minutes, run rlock-heartbeat "deploy://api-prod"
|
|
487
|
+
3. runlock "deploy://api-prod"
|
|
488
|
+
```
|
|
489
|
+
|
|
490
|
+
Each heartbeat resets the expiry to a full TTL period from the current time.
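
When the long-running work is a single shell command, the heartbeat can run in the background for its duration. This is a sketch under that assumption: `run_with_heartbeat` is a hypothetical wrapper, the interval is in whole seconds, and it should be well below `resourceLockTimeout`.

```bash
# Hypothetical wrapper: run a command while heartbeating a held lock.
# Usage: run_with_heartbeat <lock-key> <interval-seconds> <command> [args...]
run_with_heartbeat() {
  local key="$1" interval="$2"; shift 2
  (
    # Background loop: refresh the lock until killed or a heartbeat fails.
    while sleep "$interval"; do
      rlock-heartbeat "$key" >/dev/null || break
    done
  ) &
  local hb_pid=$!
  "$@"                          # the long-running work
  local rc=$?
  kill "$hb_pid" 2>/dev/null    # stop heartbeating once the work is done
  wait "$hb_pid" 2>/dev/null
  return "$rc"
}

# Usage: run_with_heartbeat "deploy://api-prod" 600 ./deploy.sh
```

The wrapper does not acquire or release the lock, and `./deploy.sh` is a placeholder; call `rlock` before it and `runlock` after it, as in the workflow above.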
|
|
491
|
+
|
|
492
|
+
## Multiple Locks and Deadlock Detection
|
|
493
|
+
|
|
494
|
+
An agent instance can hold multiple locks simultaneously when working across related resources. However, this introduces the possibility of circular waits — agent A holds lock X and waits for lock Y, while agent B holds lock Y and waits for lock X.
|
|
495
|
+
|
|
496
|
+
The gateway detects these cycles automatically. When an `rlock` request would create a circular wait in the wait-for graph, the gateway rejects it with a `possible deadlock detected` reason and the cycle path instead of blocking forever. The agent can then release its held locks and retry.
|
|
497
|
+
|
|
498
|
+
```
|
|
499
|
+
# Example deadlock cycle:
|
|
500
|
+
# Agent A holds "github://acme/app/pr/10", wants "deploy://api-prod"
|
|
501
|
+
# Agent B holds "deploy://api-prod", wants "github://acme/app/pr/10"
|
|
502
|
+
# → rlock "deploy://api-prod" returns: possible deadlock detected
|
|
503
|
+
```
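
For the advanced multi-lock case, the release-and-retry step could look like the sketch below. `lock_both`, the attempt count, and the `LOCK_RETRY_DELAY` variable are all hypothetical; the only behaviour taken from the text is that a failed `rlock` returns a response without `"ok": true`.

```bash
# Hypothetical helper: hold both locks or neither.
# Usage: lock_both <first-key> <second-key> [attempts]
lock_both() {
  local first="$1" second="$2" attempts="${3:-3}" i=1
  local ok='"ok"[[:space:]]*:[[:space:]]*true'
  while :; do
    if rlock "$first" | grep -Eq "$ok"; then
      if rlock "$second" | grep -Eq "$ok"; then
        return 0                       # holding both locks
      fi
      runlock "$first" >/dev/null      # back off: release the lock we hold
    fi
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "${LOCK_RETRY_DELAY:-5}"     # LOCK_RETRY_DELAY is an assumed knob
  done
}
```

Releasing the first lock before retrying is what breaks the cycle: the other instance can then acquire it, finish, and free both resources.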
|
|
504
|
+
|
|
505
|
+
Note: The agent preamble constrains agents to one lock at a time for simplicity. Multi-lock is available for advanced use cases where the agent is explicitly instructed to hold multiple locks.
|
|
506
|
+
|
|
507
|
+
## Authentication
|
|
508
|
+
|
|
509
|
+
Each container gets a unique per-run secret (the same one used for the shutdown API). Lock requests are authenticated with this secret, so only the container that acquired a lock can release or heartbeat it. There is no way for one agent instance to release another's lock — it must wait for the TTL to expire.
|
|
510
|
+
|
|
511
|
+
## Auto-release on Exit
|
|
512
|
+
|
|
513
|
+
When a container exits — whether it finishes successfully, hits an error, or times out — all of its locks are released automatically by the scheduler. You don't need to worry about cleanup in error paths.
|
|
514
|
+
|
|
515
|
+
## Example in SKILL.md
|
|
516
|
+
|
|
517
|
+
```markdown
|
|
518
|
+
## Workflow
|
|
519
|
+
|
|
520
|
+
1. List open issues labeled "agent" in repos from `<agent-config>`
|
|
521
|
+
2. For each issue:
|
|
522
|
+
- rlock "github://owner/repo/issues/123"
|
|
523
|
+
- If the lock fails, skip this issue — another instance is handling it
|
|
524
|
+
- Clone the repo, create a branch, implement the fix
|
|
525
|
+
- Open a PR and link it to the issue
|
|
526
|
+
- runlock "github://owner/repo/issues/123"
|
|
527
|
+
3. If you completed work and there may be more issues, run `al-rerun`
|
|
528
|
+
```
|
|
529
|
+
|
|
530
|
+
## Configuration
|
|
531
|
+
|
|
532
|
+
| Setting | Location | Default | Description |
|
|
533
|
+
|---------|----------|---------|-------------|
|
|
534
|
+
| `resourceLockTimeout` | `config.toml` | `1800` (30 min) | Default TTL for locks in seconds |
|
|
535
|
+
|
|
536
|
+
## See Also
|
|
537
|
+
|
|
538
|
+
- [Agent Commands — Locks](/reference/agent-commands#lock-commands) — full command syntax and response JSON
|
|
539
|
+
- [Scaling Agents](/guides/scaling-agents) — guide on scaling with locks
|
|
540
|
+
|
|
541
|
+
---
|
|
542
|
+
|
|
543
|
+
# Dynamic Context
|
|
544
|
+
|
|
545
|
+
Agents spend tokens every time they fetch context at runtime — cloning repos, listing issues, calling APIs. Hooks let you stage this data before the LLM session starts, so the agent begins with everything it needs.
|
|
546
|
+
|
|
547
|
+
## The Problem
|
|
548
|
+
|
|
549
|
+
Without hooks, a typical agent run looks like:
|
|
550
|
+
|
|
551
|
+
1. LLM starts
|
|
552
|
+
2. LLM runs `git clone` (waits, uses tokens to read output)
|
|
553
|
+
3. LLM runs `gh issue list` (waits, uses tokens to parse JSON)
|
|
554
|
+
4. LLM starts actual work
|
|
555
|
+
|
|
556
|
+
Steps 2-3 are mechanical — the agent always needs to do them, and they don't benefit from LLM reasoning.
|
|
557
|
+
|
|
558
|
+
## The Solution: Hooks
|
|
559
|
+
|
|
560
|
+
Pre-hooks run **after credentials are loaded** but **before the LLM session starts**. They execute inside the container with full access to credentials and environment variables. Define them in the agent's `config.toml`:
|
|
561
|
+
|
|
562
|
+
```toml
|
|
563
|
+
# agents/<name>/config.toml
|
|
564
|
+
[hooks]
|
|
565
|
+
pre = [
|
|
566
|
+
"gh repo clone acme/app /tmp/repo --depth 1",
|
|
567
|
+
"gh issue list --repo acme/app --label bug --json number,title,body --limit 20 > /tmp/context/issues.json",
|
|
568
|
+
]
|
|
569
|
+
```
|
|
570
|
+
|
|
571
|
+
Then reference the staged files in the body of your `SKILL.md`:
|
|
572
|
+
|
|
573
|
+
```markdown
|
|
574
|
+
## Context
|
|
575
|
+
|
|
576
|
+
- The repo is cloned at `/tmp/repo`
|
|
577
|
+
- Open bug issues are at `/tmp/context/issues.json`
|
|
578
|
+
```
|
|
579
|
+
|
|
580
|
+
## Example: Git clone
|
|
581
|
+
|
|
582
|
+
The most common hook — clone the repo the agent will work on:
|
|
583
|
+
|
|
584
|
+
```toml
|
|
585
|
+
[hooks]
|
|
586
|
+
pre = ["gh repo clone acme/app /tmp/repo --depth 1"]
|
|
587
|
+
```
|
|
588
|
+
|
|
589
|
+
## Example: Shell command
|
|
590
|
+
|
|
591
|
+
Run any shell command. `GITHUB_TOKEN`, `GH_TOKEN`, and other credential env vars are already set:
|
|
592
|
+
|
|
593
|
+
```toml
|
|
594
|
+
[hooks]
|
|
595
|
+
pre = ["gh issue list --repo acme/app --label P1 --json number,title,body --limit 20 > /tmp/context/issues.json"]
|
|
596
|
+
```
|
|
597
|
+
|
|
598
|
+
## Example: HTTP fetch
|
|
599
|
+
|
|
600
|
+
Fetch data from an API endpoint:
|
|
601
|
+
|
|
602
|
+
```toml
|
|
603
|
+
[hooks]
|
|
604
|
+
pre = ["curl -sf -H \"Authorization: Bearer ${INTERNAL_TOKEN}\" https://api.internal/v1/feature-flags -o /tmp/context/flags.json"]
|
|
605
|
+
```
|
|
606
|
+
|
|
607
|
+
Environment variable interpolation (`${VAR_NAME}`) is supported since commands run via `/bin/sh`.
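
Because the expansion is done by the shell, ordinary `sh` quoting rules apply: a variable inside double quotes is expanded, while one inside single quotes is passed through literally. The demo below assumes the hook string reaches `/bin/sh -c` unchanged; the token value is a placeholder.

```bash
# Assumption: the hook string is handed to /bin/sh -c unchanged.
export INTERNAL_TOKEN="s3cret"   # placeholder value for the demo

# Double quotes inside the command: the shell expands the variable.
expanded=$(sh -c 'printf "%s" "Bearer ${INTERNAL_TOKEN}"')

# Single quotes inside the command: the text stays literal.
literal=$(sh -c "printf '%s' 'Bearer \${INTERNAL_TOKEN}'")

echo "$expanded"   # Bearer s3cret
echo "$literal"    # Bearer ${INTERNAL_TOKEN}
```

In a hook, that means a header containing `${VAR}` should be wrapped in double quotes (escaped as `\"` inside a TOML basic string) if the value is meant to be substituted.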
|
|
608
|
+
|
|
609
|
+
## Post-hooks
|
|
610
|
+
|
|
611
|
+
Post-hooks run after the LLM session completes. Use them for cleanup, artifact upload, or reporting:
|
|
612
|
+
|
|
613
|
+
```toml
|
|
614
|
+
[hooks]
|
|
615
|
+
pre = ["gh repo clone acme/app /tmp/repo --depth 1"]
|
|
616
|
+
post = [
|
|
617
|
+
"upload-artifacts.sh",
|
|
618
|
+
"curl -X POST https://hooks.slack.com/... -d '{\"text\": \"Agent run complete\"}'",
|
|
619
|
+
]
|
|
620
|
+
```
|
|
621
|
+
|
|
622
|
+
## Referencing Staged Files in SKILL.md
|
|
623
|
+
|
|
624
|
+
After hooks run, tell the agent what's available in the body of your `SKILL.md`:
|
|
625
|
+
|
|
626
|
+
```markdown
|
|
627
|
+
## Context
|
|
628
|
+
|
|
629
|
+
- The repo is cloned at `/tmp/repo`
|
|
630
|
+
- Open P1 issues are at `/tmp/context/issues.json`
|
|
631
|
+
- Feature flags (if available) are at `/tmp/context/flags.json`
|
|
632
|
+
```
|
|
633
|
+
|
|
634
|
+
## Direct Context Injection
|
|
635
|
+
|
|
636
|
+
For simple, inline data that needs to be embedded directly in your SKILL.md instructions, use direct context injection with the `` !`command` `` syntax. Commands are executed after pre-hooks but before the LLM session starts, and their output replaces the expression inline.
|
|
637
|
+
|
|
638
|
+
### Syntax
|
|
639
|
+
|
|
640
|
+
```markdown
|
|
641
|
+
The current time is !`date`.
|
|
642
|
+
There are !`ls /tmp/repo/src | wc -l` source files in the repo.
|
|
643
|
+
```
|
|
644
|
+
|
|
645
|
+
Becomes:
|
|
646
|
+
|
|
647
|
+
```
|
|
648
|
+
The current time is Mon Mar 22 21:30:45 UTC 2026.
|
|
649
|
+
There are 42 source files in the repo.
|
|
650
|
+
```
|
|
651
|
+
|
|
652
|
+
### When to use
|
|
653
|
+
|
|
654
|
+
- **Hooks**: For setup tasks like cloning repos or downloading data files
|
|
655
|
+
- **Direct injection**: For inline values the agent needs to reference in instructions
|
|
656
|
+
|
|
657
|
+
### Examples
|
|
658
|
+
|
|
659
|
+
**Basic usage**:
|
|
660
|
+
```markdown
|
|
661
|
+
You are analyzing code at !`date +"%Y-%m-%d %H:%M"`.
|
|
662
|
+
```
|
|
663
|
+
|
|
664
|
+
**With pre-staged data**:
|
|
665
|
+
```toml
|
|
666
|
+
[hooks]
|
|
667
|
+
pre = ["gh issue list --repo acme/app --label bug --json number --limit 20 > /tmp/issues.json"]
|
|
668
|
+
```
|
|
669
|
+
|
|
670
|
+
```markdown
|
|
671
|
+
There are !`cat /tmp/issues.json | jq length` open bug issues to work on.
|
|
672
|
+
```
|
|
673
|
+
|
|
674
|
+
**Error handling**:
|
|
675
|
+
If a command fails, the expression is replaced with `[Error: <message>]` and the run continues:
|
|
676
|
+
|
|
677
|
+
```markdown
|
|
678
|
+
Config value: !`cat /nonexistent/file`
|
|
679
|
+
```
|
|
680
|
+
|
|
681
|
+
Becomes:
|
|
682
|
+
```
|
|
683
|
+
Config value: [Error: cat: can't open '/nonexistent/file': No such file or directory]
|
|
684
|
+
```
|
|
685
|
+
|
|
686
|
+
### Limitations
|
|
687
|
+
|
|
688
|
+
- Commands have a 60-second timeout
|
|
689
|
+
- Output is limited to prevent prompt explosion
|
|
690
|
+
- Errors are inline — use hooks for critical setup that should fail the run
|
|
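Because a failing expression is rendered inline and output is limited, it can help to cap the size yourself and print a readable fallback when a file is missing. The following is a minimal sketch of that idea; the helper name `inject_capped` and the 2000-byte default are assumptions for illustration and are not part of Action Llama.

```bash
# Hypothetical helper (not an Action Llama command).
# Prints at most MAX_BYTES of FILE, or a short fallback line if FILE is unreadable.
inject_capped() {
  local file="$1" max_bytes="${2:-2000}"
  if [ -r "$file" ]; then
    head -c "$max_bytes" "$file"
  else
    printf '(no data at %s)\n' "$file"
  fi
}
```

A shell function defined elsewhere may not be visible to the shell that evaluates an expression, so in practice the same idea would probably be written inline, for example `` !`head -c 2000 /tmp/context/issues.json 2>/dev/null || echo "(no data)"` ``.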
691
|
+
|
|
692
|
+
## Tips
|
|
693
|
+
|
|
694
|
+
- **Hooks run sequentially** in the order defined in `config.toml`
|
|
695
|
+
- **Each hook has a 5-minute timeout** — hooks are also bounded by the container-level timeout
|
|
696
|
+
- **If a hook command fails** (non-zero exit), the run aborts with an error
|
|
697
|
+
- **Environment variables** set inside hook commands do not propagate back to the agent's `process.env`
|
|
698
|
+
- **Use hooks for setup, direct injection for values** — hooks for cloning repos or staging files, direct context injection for inline dynamic values the agent needs to reference
|
|
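Since variables exported inside a hook do not reach the agent, one workaround is to write the value to a file and read it back later. The sketch below assumes a staging directory of `/tmp/context` (the path used in the staged-file examples) and two hypothetical helper names, `stage_value` and `read_value`. In a real `config.toml` the body of each function would likely be inlined into the hook entry or the injection expression.

```bash
# Assumed staging directory; override CONTEXT_DIR to use another location.
CONTEXT_DIR="${CONTEXT_DIR:-/tmp/context}"

# Hypothetical helper: run inside a hook to persist NAME=VALUE as a file.
stage_value() {
  mkdir -p "$CONTEXT_DIR"
  printf '%s\n' "$2" > "$CONTEXT_DIR/$1"
}

# Hypothetical helper: run later (for example from a direct-injection
# expression) to read the value back, with a fallback when it was never staged.
read_value() {
  cat "$CONTEXT_DIR/$1" 2>/dev/null || printf '(unset)\n'
}
```

Keeping one small file per value means a missing value degrades to a visible `(unset)` marker instead of an inline error.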
699
|
+
|
|
700
|
+
## Next steps
|
|
701
|
+
|
|
702
|
+
- [Agent Config — Hooks](/reference/agent-config#hooks) — full field reference
|
|
703
|
+
- [Agents (concepts)](/concepts/agents) — full runtime lifecycle
|
|
704
|
+
|
|
705
|
+
---
|
|
706
|
+
|
|
707
|
+
# Shared Context
|
|
708
|
+
|
|
709
|
+
Agents often need the same context — coding conventions, repo layout, team policies. The `shared/` directory lets you maintain this context in one place and reference it from any agent's `SKILL.md`.
|
|
710
|
+
|
|
711
|
+
## How it works
|
|
712
|
+
|
|
713
|
+
Place files in a `shared/` directory at your project root. At image build time, these files are baked into every agent's container at `/app/static/shared/`.
|
|
714
|
+
|
|
715
|
+
```
|
|
716
|
+
my-project/
|
|
717
|
+
├── config.toml
|
|
718
|
+
├── shared/
|
|
719
|
+
│   ├── conventions.md
|
|
720
|
+
│   ├── repo-layout.md
|
|
721
|
+
│   └── team/
|
|
722
|
+
│       └── review-policy.md
|
|
723
|
+
└── agents/
|
|
724
|
+
    ├── dev/SKILL.md
|
|
725
|
+
    └── reviewer/SKILL.md
|
|
726
|
+
```
|
|
727
|
+
|
|
728
|
+
## Referencing shared files in SKILL.md
|
|
729
|
+
|
|
730
|
+
Use direct context injection to include shared files in your agent's prompt:
|
|
731
|
+
|
|
732
|
+
```markdown
|
|
733
|
+
## Context
|
|
734
|
+
|
|
735
|
+
!`cat /app/static/shared/conventions.md`
|
|
736
|
+
!`cat /app/static/shared/repo-layout.md`
|
|
737
|
+
```
|
|
738
|
+
|
|
739
|
+
Each agent chooses which shared files to include. There is no automatic injection — you control exactly what context each agent receives.
|
|
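If an agent should receive every top-level shared file, a small loop can stand in for a list of individual `cat` expressions. This is a sketch only: the function name `shared_bundle` and the `### <file>` heading format are assumptions, and it deliberately skips subdirectories.

```bash
# Hypothetical helper: print every top-level *.md file in DIR,
# each preceded by a heading that names the file.
shared_bundle() {
  local dir="${1:-/app/static/shared}" file
  for file in "$dir"/*.md; do
    [ -f "$file" ] || continue   # the glob stays literal when nothing matches
    printf '### %s\n\n' "${file##*/}"
    cat "$file"
    printf '\n'
  done
}
```

The trade-off is that the agent's prompt grows with every file added to `shared/`, so listing files explicitly remains the more controlled option.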
740
|
+
|
|
741
|
+
## Example: coding conventions
|
|
742
|
+
|
|
743
|
+
Create `shared/conventions.md`:
|
|
744
|
+
|
|
745
|
+
```markdown
|
|
746
|
+
# Coding Conventions
|
|
747
|
+
|
|
748
|
+
- TypeScript strict mode, no `any`
|
|
749
|
+
- Use `vitest` for tests, mirror `src/` structure in `test/`
|
|
750
|
+
- Prefer named exports over default exports
|
|
751
|
+
- Error messages should include enough context to debug without a stack trace
|
|
752
|
+
```
|
|
753
|
+
|
|
754
|
+
Reference it in `agents/dev/SKILL.md`:
|
|
755
|
+
|
|
756
|
+
```markdown
|
|
757
|
+
# Dev Agent
|
|
758
|
+
|
|
759
|
+
You solve GitHub issues by writing code.
|
|
760
|
+
|
|
761
|
+
## Context
|
|
762
|
+
|
|
763
|
+
!`cat /app/static/shared/conventions.md`
|
|
764
|
+
|
|
765
|
+
## Workflow
|
|
766
|
+
|
|
767
|
+
1. Read the issue
|
|
768
|
+
2. Write the fix following the conventions above
|
|
769
|
+
3. Run tests
|
|
770
|
+
4. Open a PR
|
|
771
|
+
```
|
|
772
|
+
|
|
773
|
+
And in `agents/reviewer/SKILL.md`:
|
|
774
|
+
|
|
775
|
+
```markdown
|
|
776
|
+
# Reviewer Agent
|
|
777
|
+
|
|
778
|
+
You review pull requests for correctness and style.
|
|
779
|
+
|
|
780
|
+
## Context
|
|
781
|
+
|
|
782
|
+
!`cat /app/static/shared/conventions.md`
|
|
783
|
+
|
|
784
|
+
## Workflow
|
|
785
|
+
|
|
786
|
+
1. Read the PR diff
|
|
787
|
+
2. Check against conventions
|
|
788
|
+
3. Approve or request changes
|
|
789
|
+
```
|
|
790
|
+
|
|
791
|
+
Both agents now share the same conventions without duplication.
|
|
792
|
+
|
|
793
|
+
## Subdirectories
|
|
794
|
+
|
|
795
|
+
The `shared/` directory supports subdirectories. A file at `shared/team/review-policy.md` is available at `/app/static/shared/team/review-policy.md` inside the container.
|
|
796
|
+
|
|
797
|
+
## Cloud deployment
|
|
798
|
+
|
|
799
|
+
When deploying with `al push`, the `shared/` directory is included automatically — `rsync` syncs the entire project directory.
|
|
800
|
+
|
|
801
|
+
---
|
|
802
|
+
|
|
803
|
+
# Subagents
|
|
804
|
+
|
|
805
|
+
Agents can call other agents to delegate work and collect results. This enables multi-agent workflows like planner → developer → reviewer pipelines.
|
|
806
|
+
|
|
807
|
+
## Use Case
|
|
808
|
+
|
|
809
|
+
A planner agent triages an issue and creates an implementation plan. It calls a dev agent to implement it, then calls a reviewer agent to review the PR.
|
|
810
|
+
|
|
811
|
+
## `al-subagent`: Fire a call
|
|
812
|
+
|
|
813
|
+
Pass context to another agent via stdin:
|
|
814
|
+
|
|
815
|
+
```bash
|
|
816
|
+
echo "Implement the fix for issue #42 on acme/app" | al-subagent dev
|
|
817
|
+
```
|
|
818
|
+
|
|
819
|
+
**Response:**
|
|
820
|
+
|
|
821
|
+
```json
|
|
822
|
+
{"ok": true, "callId": "abc123"}
|
|
823
|
+
```
|
|
824
|
+
|
|
825
|
+
The call is non-blocking — the calling agent continues working immediately.
|
|
826
|
+
|
|
827
|
+
## `al-subagent-check`: Non-blocking status
|
|
828
|
+
|
|
829
|
+
Check if a call has finished without waiting:
|
|
830
|
+
|
|
831
|
+
```bash
|
|
832
|
+
al-subagent-check abc123
|
|
833
|
+
```
|
|
834
|
+
|
|
835
|
+
**Response** (one of):
|
|
836
|
+
|
|
837
|
+
```json
|
|
838
|
+
{"status": "pending"}
|
|
839
|
+
{"status": "running"}
|
|
840
|
+
{"status": "completed", "returnValue": "PR #17 opened."}
|
|
841
|
+
{"status": "error", "error": "timeout"}
|
|
842
|
+
```
|
|
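A caller usually branches on the `status` field. The sketch below shows one way to do that; the function names and the returned labels are hypothetical, and the field is extracted with `sed` only to keep the sketch free of dependencies (`jq -r .status` would be sturdier where `jq` is installed).

```bash
# Pull the "status" field out of a one-line JSON response.
status_of() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Map a response from al-subagent-check to a (hypothetical) next action.
next_step() {
  case "$(status_of "$1")" in
    pending|running) echo "keep-working" ;;
    completed)       echo "use-result" ;;
    error)           echo "handle-failure" ;;
    *)               echo "unknown" ;;
  esac
}
```

For example, `next_step "$(al-subagent-check abc123)"` would print `keep-working` while the call is still pending or running.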
843
|
+
|
|
844
|
+
## `al-subagent-wait`: Block until done
|
|
845
|
+
|
|
846
|
+
Wait for one or more calls to complete:
|
|
847
|
+
|
|
848
|
+
```bash
|
|
849
|
+
al-subagent-wait abc123 --timeout 600
|
|
850
|
+
al-subagent-wait abc123 def456 --timeout 300
|
|
851
|
+
```
|
|
852
|
+
|
|
853
|
+
**Response:**
|
|
854
|
+
|
|
855
|
+
```json
|
|
856
|
+
{
|
|
857
|
+
"abc123": {"status": "completed", "returnValue": "PR #17 opened."},
|
|
858
|
+
"def456": {"status": "completed", "returnValue": "All tests pass."}
|
|
859
|
+
}
|
|
860
|
+
```
|
|
861
|
+
|
|
862
|
+
If `--timeout` is omitted, the default is 900 seconds. The command polls every 5 seconds.
|
|
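The polling behaviour can be approximated with `al-subagent-check` for a single call. This is a sketch of the semantics under stated assumptions, not the actual implementation of `al-subagent-wait`: it assumes `completed` and `error` are the only terminal statuses, and the function name `wait_for_call` and its argument order are hypothetical.

```bash
# Pull the "status" field out of a one-line JSON response (no jq required).
status_of() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# wait_for_call CALL_ID [TIMEOUT] [INTERVAL] [CHECK_CMD]
# Prints the final response and returns 0, or returns 1 on timeout.
wait_for_call() {
  local call_id="$1" timeout="${2:-900}" interval="${3:-5}"
  local check_cmd="${4:-al-subagent-check}" waited=0 response
  while :; do
    response=$("$check_cmd" "$call_id")
    case "$(status_of "$response")" in
      completed|error) printf '%s\n' "$response"; return 0 ;;
    esac
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

The check command is passed as a parameter so the loop can be exercised with a stub outside a container.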
863
|
+
|
|
864
|
+
## `al-return`: Send back a result
|
|
865
|
+
|
|
866
|
+
The called agent uses `al-return` to send a value back to the caller:
|
|
867
|
+
|
|
868
|
+
```bash
|
|
869
|
+
al-return "PR #17 opened. Ready for review."
|
|
870
|
+
```
|
|
871
|
+
|
|
872
|
+
## Multi-call Pattern
|
|
873
|
+
|
|
874
|
+
Fire several calls, continue working, then collect all results:
|
|
875
|
+
|
|
876
|
+
```bash
|
|
877
|
+
# Fire calls
|
|
878
|
+
DEV_ID=$(echo "Implement fix for #42" | al-subagent dev | jq -r .callId)
|
|
879
|
+
REVIEW_ID=$(echo "Review PR #17" | al-subagent reviewer | jq -r .callId)
|
|
880
|
+
|
|
881
|
+
# ... do other work while they run ...
|
|
882
|
+
|
|
883
|
+
# Collect results
|
|
884
|
+
RESULTS=$(al-subagent-wait "$DEV_ID" "$REVIEW_ID" --timeout 600)
|
|
885
|
+
echo "$RESULTS" | jq --arg id "$DEV_ID" '.[$id].returnValue'
|
|
886
|
+
echo "$RESULTS" | jq --arg id "$REVIEW_ID" '.[$id].returnValue'
|
|
887
|
+
```
|
|
888
|
+
|
|
889
|
+
## Complete Example: SKILL.md
|
|
890
|
+
|
|
891
|
+
Here's a planner agent that delegates to dev and reviewer:
|
|
892
|
+
|
|
893
|
+
```markdown
|
|
894
|
+
# Planner Agent
|
|
895
|
+
|
|
896
|
+
You orchestrate development workflows. When triggered, you assess the issue,
|
|
897
|
+
create an implementation plan, and delegate to other agents.
|
|
898
|
+
|
|
899
|
+
## Workflow
|
|
900
|
+
|
|
901
|
+
1. Read the issue from the webhook trigger or search for labeled issues
|
|
902
|
+
2. Assess the issue — is it clear enough for development?
|
|
903
|
+
3. If not, comment asking for clarification and stop
|
|
904
|
+
4. Write an implementation plan as a comment on the issue
|
|
905
|
+
5. Call the dev agent:
|
|
906
|
+
```
|
|
907
|
+
CALL_ID=$(echo "Implement the plan in comment #N on issue #M in owner/repo" | al-subagent dev | jq -r .callId)
|
|
908
|
+
```
|
|
909
|
+
6. Wait for dev to finish:
|
|
910
|
+
```
|
|
911
|
+
RESULT=$(al-subagent-wait "$CALL_ID" --timeout 1800)
|
|
912
|
+
```
|
|
913
|
+
7. If dev succeeded, call the reviewer:
|
|
914
|
+
```
|
|
915
|
+
echo "Review PR #P on owner/repo" | al-subagent reviewer
|
|
916
|
+
```
|
|
917
|
+
8. Comment on the issue with the final status
|
|
918
|
+
```
|
|
919
|
+
|
|
920
|
+
## Rules
|
|
921
|
+
|
|
922
|
+
- **No self-calls** — an agent cannot call itself (the call is rejected)
|
|
923
|
+
- **Call depth limit** — chains like A → B → C are allowed up to `maxCallDepth` (default: 3)
|
|
924
|
+
- **Queuing** — if all runners for the target agent are busy, the call is queued (up to `workQueueSize`, default: 100)
|
|
925
|
+
- **No reruns** — called agents do not re-run. They respond to the single call.
|
|
926
|
+
- **Gateway required** — call commands require the gateway. They return errors if `GATEWAY_URL` is not set.
|
|
927
|
+
|
|
928
|
+
## What the Called Agent Sees
|
|
929
|
+
|
|
930
|
+
The called agent receives a `<skill-subagent>` block in its prompt with:
|
|
931
|
+
|
|
932
|
+
- The name of the calling agent
|
|
933
|
+
- The context string passed via stdin
|
|
934
|
+
|
|
935
|
+
The called agent's `SKILL.md` should handle this trigger type:
|
|
936
|
+
|
|
937
|
+
```markdown
|
|
938
|
+
## Trigger handling
|
|
939
|
+
|
|
940
|
+
- **Agent call**: The `<skill-subagent>` block contains context from the calling agent.
|
|
941
|
+
Do what was requested and use `al-return` to send back results.
|
|
942
|
+
```
|
|
943
|
+
|
|
944
|
+
## Next steps
|
|
945
|
+
|
|
946
|
+
- [Agent Commands](/reference/agent-commands) — full call command syntax and responses
|
|
947
|
+
- [Agents (concepts)](/concepts/agents) — runtime lifecycle and trigger types
|
|
948
|
+
|
|
949
|
+
---
|
|
950
|
+
|
|
951
|
+
# Scaling Agents
|
|
952
|
+
|
|
953
|
+
By default, each agent runs one instance at a time. This guide shows how to scale up and use [resource locks](/concepts/resource-locks) to prevent duplicate work.
|
|
954
|
+
|
|
955
|
+
## The Problem
|
|
956
|
+
|
|
957
|
+
With `scale = 1`, a single agent instance handles all work sequentially. If 5 GitHub issues arrive via webhook while the agent is working on one, those 5 events queue up and wait. For high-volume workloads, this creates a bottleneck.
|
|
958
|
+
|
|
959
|
+
## Increase Scale
|
|
960
|
+
|
|
961
|
+
In the agent's `config.toml`:
|
|
962
|
+
|
|
963
|
+
```toml
|
|
964
|
+
# agents/dev/config.toml
|
|
965
|
+
scale = 3 # Run up to 3 instances concurrently
|
|
966
|
+
```
|
|
967
|
+
|
|
968
|
+
Now when 5 issues arrive, up to 3 are processed simultaneously. The remaining 2 wait in the work queue.
|
|
969
|
+
|
|
970
|
+
## Add Locking
|
|
971
|
+
|
|
972
|
+
With multiple instances, two of them might try to work on the same issue. Add a [lock/skip/work/unlock](/concepts/resource-locks) pattern to your `SKILL.md`:
|
|
973
|
+
|
|
974
|
+
```markdown
|
|
975
|
+
## Workflow
|
|
976
|
+
|
|
977
|
+
1. List open issues labeled "agent" in repos from `<agent-config>`
|
|
978
|
+
2. For each issue:
|
|
979
|
+
- rlock "github://owner/repo/issues/123"
|
|
980
|
+
- If the lock fails, skip this issue — another instance is handling it
|
|
981
|
+
- Clone the repo, create a branch, implement the fix
|
|
982
|
+
- Open a PR and link it to the issue
|
|
983
|
+
- runlock "github://owner/repo/issues/123"
|
|
984
|
+
3. If you completed work and there may be more issues, run `al-rerun`
|
|
985
|
+
```
|
|
986
|
+
|
|
987
|
+
### How lock commands work
|
|
988
|
+
|
|
989
|
+
When the agent runs `rlock "github://owner/repo/issues/123"`:
|
|
990
|
+
|
|
991
|
+
- **Lock acquired:** `{"ok": true}` — proceed with work
|
|
992
|
+
- **Already held:** `{"ok": false, "holder": "dev-abc123", ...}` — skip this resource
|
|
993
|
+
|
|
994
|
+
When done: `runlock "github://owner/repo/issues/123"` releases the lock.
|
|
995
|
+
|
|
996
|
+
If the agent crashes or times out, locks are [auto-released](/concepts/resource-locks#auto-release-on-exit).
|
|
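The lock/skip/work/unlock sequence can be wrapped in a small shell function. The sketch below is an illustration under assumptions: the wrapper name `with_lock` and its exit code 2 for a skipped resource are hypothetical, and the `ok` field is parsed with `sed` only to avoid a dependency on `jq`.

```bash
# with_lock RESOURCE COMMAND [ARGS...]
# Runs COMMAND only if the lock on RESOURCE is acquired, then releases it.
# Returns 2 when the lock is held elsewhere, otherwise COMMAND's exit status.
with_lock() {
  local resource="$1" reply ok rc=0
  shift
  reply=$(rlock "$resource")
  ok=$(printf '%s\n' "$reply" |
    sed -n 's/.*"ok"[[:space:]]*:[[:space:]]*\([a-z]*\).*/\1/p')
  if [ "$ok" != "true" ]; then
    echo "skipping $resource (lock not acquired)" >&2
    return 2
  fi
  "$@" || rc=$?                      # remember the work's exit status
  runlock "$resource" >/dev/null     # release even if the work failed
  return "$rc"
}
```

Releasing the lock after a failed command keeps the resource available to other instances without waiting for auto-release.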
997
|
+
|
|
998
|
+
## Monitor with `al stat`
|
|
999
|
+
|
|
1000
|
+
Check queue depth and running instances:
|
|
1001
|
+
|
|
1002
|
+
```bash
|
|
1003
|
+
al stat
|
|
1004
|
+
al stat -E production
|
|
1005
|
+
```
|
|
1006
|
+
|
|
1007
|
+
The `queue` column shows how many events are waiting. If it's consistently high, consider increasing `scale`.
|
|
1008
|
+
|
|
1009
|
+
## Resource Considerations
|
|
1010
|
+
|
|
1011
|
+
Each parallel instance:
|
|
1012
|
+
|
|
1013
|
+
- Uses a separate Docker container
|
|
1014
|
+
- Consumes memory (`local.memory` per container, default 4GB)
|
|
1015
|
+
- Consumes CPU (`local.cpus` per container, default 2)
|
|
1016
|
+
- Makes independent LLM API calls (watch your rate limits and quota)
|
|
1017
|
+
|
|
1018
|
+
### Tune work queue size
|
|
1019
|
+
|
|
1020
|
+
If events arrive faster than agents can process them, the queue buffers them:
|
|
1021
|
+
|
|
1022
|
+
```toml
|
|
1023
|
+
# config.toml
|
|
1024
|
+
workQueueSize = 200 # default: 100 per agent
|
|
1025
|
+
```
|
|
1026
|
+
|
|
1027
|
+
When the queue is full, the oldest items are dropped.
|
|
1028
|
+
|
|
1029
|
+
### Default agent scale
|
|
1030
|
+
|
|
1031
|
+
Set the default scale for all agents that don't have an explicit `scale` in their `config.toml`:
|
|
1032
|
+
|
|
1033
|
+
```toml
|
|
1034
|
+
# config.toml
|
|
1035
|
+
defaultAgentScale = 3 # each agent gets 3 runners unless overridden
|
|
1036
|
+
```
|
|
1037
|
+
|
|
1038
|
+
Without this setting, agents default to 1 runner each.
|
|
1039
|
+
|
|
1040
|
+
### Project-wide scale cap
|
|
1041
|
+
|
|
1042
|
+
Limit total concurrent runners across all agents:
|
|
1043
|
+
|
|
1044
|
+
```toml
|
|
1045
|
+
# config.toml
|
|
1046
|
+
scale = 10 # max 10 runners total across all agents
|
|
1047
|
+
```
|
|
1048
|
+
|
|
1049
|
+
If `defaultAgentScale * agentCount` exceeds `scale`, agents are throttled at startup and a warning is shown.
|
|
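The comparison behind that warning is simple arithmetic, sketched below. The function name `scale_status` and its output format are hypothetical; how the runtime actually distributes runners once the cap is exceeded is not covered here.

```bash
# scale_status DEFAULT_AGENT_SCALE AGENT_COUNT SCALE_CAP
# Reports whether the requested runner total exceeds the project-wide cap.
scale_status() {
  local default_scale="$1" agent_count="$2" cap="$3"
  local requested=$((default_scale * agent_count))
  if [ "$requested" -gt "$cap" ]; then
    echo "requested=$requested cap=$cap throttled"
  else
    echo "requested=$requested cap=$cap ok"
  fi
}
```

With `defaultAgentScale = 3`, four agents, and `scale = 10`, the request is 3 × 4 = 12 runners, which exceeds the cap; with three agents it is 9 and fits.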
1050
|
+
|
|
1051
|
+
## Example Configuration
|
|
1052
|
+
|
|
1053
|
+
Agent runtime config in `agents/dev/config.toml`:
|
|
1054
|
+
|
|
1055
|
+
```toml
|
|
1056
|
+
credentials = ["github_token", "git_ssh"]
|
|
1057
|
+
schedule = "*/5 * * * *"
|
|
1058
|
+
models = ["sonnet"]
|
|
1059
|
+
scale = 3
|
|
1060
|
+
|
|
1061
|
+
[[webhooks]]
|
|
1062
|
+
source = "my-github"
|
|
1063
|
+
events = ["issues"]
|
|
1064
|
+
actions = ["labeled"]
|
|
1065
|
+
labels = ["agent"]
|
|
1066
|
+
|
|
1067
|
+
[params]
|
|
1068
|
+
repos = ["acme/app", "acme/api"]
|
|
1069
|
+
triggerLabel = "agent"
|
|
1070
|
+
```
|
|
1071
|
+
|
|
1072
|
+
## Next steps
|
|
1073
|
+
|
|
1074
|
+
- [Resource Locks (concepts)](/concepts/resource-locks) — TTL, heartbeat, deadlock detection
|
|
1075
|
+
- [Agent Commands — Locks](/reference/agent-commands#lock-commands) — full command syntax and responses
|