@action-llama/skill 0.23.8 → 0.24.2

@@ -0,0 +1,1075 @@
1
+ # Agent Commands
2
+
3
+ Agents running in Docker containers have access to shell commands for persisting environment variables, signaling the scheduler, calling other agents, and coordinating with resource locks. These commands are installed at `/tmp/bin/` and taught to agents via a preamble injected before `SKILL.md`.
4
+
5
+ ## Environment Commands
6
+
7
+ ### `setenv`
8
+
9
+ Persist an environment variable across bash commands. Each bash command the agent runs starts in a fresh shell, so variables set with `export` are lost between commands. `setenv` makes them stick.
10
+
11
+ ```bash
12
+ setenv <NAME> <value>
13
+ ```
14
+
15
+ **First bash command — set variables:**
16
+
17
+ ```bash
18
+ setenv REPO "acme/app"
19
+ setenv ISSUE_NUMBER 42
20
+ ```
21
+
22
+ **Later bash command — variables are still available:**
23
+
24
+ ```bash
25
+ gh issue view $ISSUE_NUMBER --repo $REPO
26
+ ```
27
+
28
+ ### How it works
29
+
30
+ `setenv` writes each variable to `/tmp/env.sh`, which is automatically sourced at the start of every bash command. The variable is also exported immediately in the current shell, so it's available right away in the same command.
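
As a rough illustration of the mechanism described above, the sketch below re-creates the behavior in plain shell. It is not the shipped script: the function name `setenv_sketch` and the `ENV_FILE` override are assumptions made so the example can run outside a container.

```bash
# Hypothetical re-implementation for illustration only; the real setenv may differ.
# ENV_FILE is an assumed override so the sketch can run outside a container.
ENV_FILE="${ENV_FILE:-/tmp/env.sh}"

setenv_sketch() {
  name="$1"
  shift
  value="$*"
  # Escape embedded single quotes so the value survives being re-sourced.
  escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")
  # Append an export line that a later shell can source.
  printf "export %s='%s'\n" "$name" "$escaped" >> "$ENV_FILE"
  # Export right away so the variable is usable in the current command too.
  export "$name=$value"
}
```

Writing the value in single quotes keeps characters such as `$` or spaces from being reinterpreted when the file is sourced later.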
31
+
32
+ ## Signal Commands
33
+
34
+ Signal commands write signal files that the scheduler reads after the session ends.
35
+
36
+ ### `al-rerun`
37
+
38
+ Request an immediate rerun to drain remaining backlog. Without this, the scheduler treats the run as complete and waits for the next scheduled tick.
39
+
40
+ ```bash
41
+ al-rerun
42
+ ```
43
+
44
+ - Only applies to **scheduled** runs. Webhook-triggered and agent-called runs are never rerun.
45
+ - Reruns continue until the agent completes without calling `al-rerun`, hits an error, or reaches the `maxReruns` limit (default: 10).
46
+
47
+ ### `al-status "<text>"`
48
+
49
+ Update the status text shown in the TUI and web dashboard.
50
+
51
+ ```bash
52
+ al-status "reviewing PR #42"
53
+ al-status "found 3 issues to work on"
54
+ ```
55
+
56
+ ### `al-return "<value>"`
57
+
58
+ Return a value to the calling agent. Used when this agent was invoked via `al-subagent`.
59
+
60
+ ```bash
61
+ al-return "PR looks good. Approved with minor suggestions."
62
+ al-return '{"approved": true, "comments": 2}'
63
+ ```
64
+
65
+ The calling agent receives this value when it calls `al-subagent-wait`.
66
+
67
+ ### `al-exit [code]`
68
+
69
+ Terminate the agent with an exit code indicating an unrecoverable error. Defaults to exit code 15.
70
+
71
+ ```bash
72
+ al-exit # exit code 15
73
+ al-exit 1 # exit code 1
74
+ ```
75
+
76
+ ## Call Commands
77
+
78
+ Agent-to-agent calls allow agents to delegate work and collect results. These commands require the gateway (`GATEWAY_URL` must be set).
79
+
80
+ ### `al-subagent <agent>`
81
+
82
+ Call another agent. Pass context via stdin. Returns a JSON response with a `callId`.
83
+
84
+ ```bash
85
+ echo "Review PR #42 on acme/app" | al-subagent reviewer
86
+ ```
87
+
88
+ **Response:**
89
+
90
+ ```json
91
+ {"ok": true, "callId": "abc123"}
92
+ ```
93
+
94
+ **Errors:**
95
+
96
+ ```json
97
+ {"ok": false, "error": "self-call not allowed"}
98
+ {"ok": false, "error": "queue full"}
99
+ ```
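
Since a call can be rejected, a caller will usually want to check `ok` before using `callId`. The helper below is a sketch of that check. `subagent_call_id` is a hypothetical name, and the `sed` pattern assumes the compact single-line JSON shown above; where `jq` is available, `jq -r .callId` is the more robust choice.

```bash
# Hypothetical helper: print the callId from an al-subagent response,
# or report the error and return non-zero if the call was rejected.
subagent_call_id() {
  response="$1"
  case "$response" in
    *'"ok": true'*|*'"ok":true'*)
      # Pull out the value of "callId" (assumes no escaped quotes in the id).
      printf '%s\n' "$response" | sed -n 's/.*"callId": *"\([^"]*\)".*/\1/p'
      ;;
    *)
      printf 'call rejected: %s\n' "$response" >&2
      return 1
      ;;
  esac
}
```

An agent could capture the raw response first, pass it to the helper, and update its status with `al-status` when the helper returns non-zero.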
100
+
101
+ ### `al-subagent-check <callId>`
102
+
103
+ Check the status of a call. The command returns immediately and never blocks.
104
+
105
+ ```bash
106
+ al-subagent-check abc123
107
+ ```
108
+
109
+ **Response:**
110
+
111
+ ```json
112
+ {"status": "pending"}
113
+ {"status": "running"}
114
+ {"status": "completed", "returnValue": "PR approved."}
115
+ {"status": "error", "error": "timeout"}
116
+ ```
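
Because the check returns immediately, an agent can interleave it with other work. A minimal polling sketch follows. `poll_call` and its parameters are hypothetical, the check command is passed in by name so the loop can be exercised with a stand-in, and the status matching assumes the exact response shapes listed above.

```bash
# Hypothetical helper: poll a status command until the call finishes.
# Usage: poll_call <check-command> <callId> [interval-seconds] [max-attempts]
poll_call() {
  check_cmd="$1"
  call_id="$2"
  interval="${3:-5}"
  max_attempts="${4:-10}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    response=$("$check_cmd" "$call_id")
    case "$response" in
      *'"status": "completed"'*|*'"status": "error"'*)
        # Terminal state: hand the final response to the caller.
        printf '%s\n' "$response"
        return 0
        ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  # Still pending or running after the last attempt.
  return 1
}
```

Inside a container this would be invoked as `poll_call al-subagent-check abc123`. When no other work is pending, `al-subagent-wait` already covers the same need.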
117
+
118
+ ### `al-subagent-wait <callId> [...] [--timeout N]`
119
+
120
+ Wait for one or more calls to complete. Polls every 5 seconds. Default timeout: 900 seconds.
121
+
122
+ ```bash
123
+ al-subagent-wait abc123 --timeout 600
124
+ al-subagent-wait abc123 def456 --timeout 300
125
+ ```
126
+
127
+ **Response:**
128
+
129
+ ```json
130
+ {
131
+ "abc123": {"status": "completed", "returnValue": "PR approved."},
132
+ "def456": {"status": "completed", "returnValue": "Tests pass."}
133
+ }
134
+ ```
135
+
136
+ ### Complete call example
137
+
138
+ ```bash
139
+ # Fire multiple calls
140
+ REVIEW_ID=$(echo "Review PR #42 on acme/app" | al-subagent reviewer | jq -r .callId)
141
+ TEST_ID=$(echo "Run full test suite for acme/app" | al-subagent tester | jq -r .callId)
142
+
143
+ # ... do other work ...
144
+
145
+ # Collect results
146
+ RESULTS=$(al-subagent-wait "$REVIEW_ID" "$TEST_ID" --timeout 600)
147
+ echo "$RESULTS" | jq ".\"$REVIEW_ID\".returnValue"
148
+ echo "$RESULTS" | jq ".\"$TEST_ID\".returnValue"
149
+ ```
150
+
151
+ ### Call rules
152
+
153
+ - An agent cannot call itself (self-calls are rejected)
154
+ - If all runners for the target agent are busy, the call is queued (up to `workQueueSize`, default: 100)
155
+ - Call chains are allowed (A calls B, B calls C) up to `maxCallDepth` (default: 3)
156
+ - Called runs do not re-run — they respond to the single call
157
+ - The called agent receives a `<skill-subagent>` block with the caller name and context
158
+ - To return a value, the called agent uses `al-return`
159
+
160
+ ## Lock Commands
161
+
162
+ Resource locks prevent multiple agent instances from working on the same resource. Lock keys use URI format (e.g. `github://acme/app/issues/42`).
163
+
164
+ ### `rlock`
165
+
166
+ Acquire an exclusive lock on a resource.
167
+
168
+ ```bash
169
+ rlock "github://acme/app/issues/42"
170
+ ```
171
+
172
+ **Success:**
173
+
174
+ ```json
175
+ {"ok": true}
176
+ ```
177
+
178
+ **Already held:**
179
+
180
+ ```json
181
+ {"ok": false, "holder": "dev-abc123", "heldSince": "2025-01-15T10:30:00Z"}
182
+ ```
183
+
184
+ **Deadlock detected:**
185
+
186
+ ```json
187
+ {"ok": false, "reason": "possible deadlock detected", "cycle": ["dev-abc", "github://acme/app/pr/10", "dev-def", "deploy://api-prod"]}
188
+ ```
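
The three response shapes above can be told apart with a simple pattern match. The sketch below does so without `jq`. `classify_lock_response` is a hypothetical helper, and the patterns assume the exact field names shown in these examples.

```bash
# Hypothetical helper: map an rlock JSON response to a one-word outcome.
classify_lock_response() {
  case "$1" in
    *'"ok": true'*|*'"ok":true'*) echo acquired ;;
    *'"cycle"'*)                  echo deadlock ;;
    *'"holder"'*)                 echo held ;;
    *)                            echo unknown ;;
  esac
}
```

An agent could branch on the result: proceed on `acquired`, skip the resource on `held`, and release its other locks before retrying on `deadlock`.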
189
+
190
+ ### `runlock`
191
+
192
+ Release a lock. Only the holder can release.
193
+
194
+ ```bash
195
+ runlock "github://acme/app/issues/42"
196
+ ```
197
+
198
+ **Success:**
199
+
200
+ ```json
201
+ {"ok": true}
202
+ ```
203
+
204
+ **Not holder:**
205
+
206
+ ```json
207
+ {"ok": false, "reason": "not the lock holder"}
208
+ ```
209
+
210
+ ### `rlock-heartbeat`
211
+
212
+ Reset the TTL on a held lock. Use during long-running work to prevent the lock from expiring.
213
+
214
+ ```bash
215
+ rlock-heartbeat "github://acme/app/issues/42"
216
+ ```
217
+
218
+ **Success:**
219
+
220
+ ```json
221
+ {"ok": true, "expiresAt": "2025-01-15T11:00:00Z"}
222
+ ```
223
+
224
+ ### Example in SKILL.md
225
+
226
+ Reference lock commands directly in your `SKILL.md` workflow:
227
+
228
+ ```markdown
229
+ ## Workflow
230
+
231
+ 1. List open issues labeled "agent" in repos from `<agent-config>`
232
+ 2. For each issue:
233
+ - rlock "github://owner/repo/issues/123"
234
+ - If the lock fails, skip this issue — another instance is handling it
235
+ - Clone the repo, create a branch, implement the fix
236
+ - Open a PR and link it to the issue
237
+ - runlock "github://owner/repo/issues/123"
238
+ 3. If you completed work and there may be more issues, run `al-rerun`
239
+ ```
240
+
241
+ The preamble teaches the agent the lock commands, their responses, and the URI key format.
242
+
243
+ ### Lock authentication
244
+
245
+ Each container gets a unique per-run secret. Lock requests are authenticated with this secret, so only the container that acquired a lock can release or heartbeat it. There is no way for one agent instance to release another's lock.
246
+
247
+ ### Auto-release on exit
248
+
249
+ When a container exits — whether it finishes successfully, hits an error, or times out — all of its locks are released automatically by the scheduler.
250
+
251
+ See [Resource Locks](/concepts/resource-locks) for a complete description of the locking system.
252
+
253
+ ---
254
+
255
+ # Runtime Context
256
+
257
+ When your agent runs, Action Llama assembles a prompt from several sources and passes it to the LLM as a single user message. Your `SKILL.md` body becomes the system prompt; everything below is the **user prompt** your agent receives alongside it.
258
+
259
+ Understanding this structure helps you write better `SKILL.md` instructions — you can reference the injected blocks by name, avoid duplicating information that's already provided, and tailor your instructions to complement the runtime context.
260
+
261
+ ## Prompt Structure
262
+
263
+ Here's the full user prompt for a webhook-triggered agent with a GitHub token credential:
264
+
265
+ ```xml
266
+ <agent-config>
267
+ {"repo":"acme/widgets","labels":["bug","triage"]}
268
+ </agent-config>
269
+
270
+ <credential-context>
271
+ Credential files are mounted at `/credentials/` (read-only).
272
+
273
+ Environment variables already set from credentials:
274
+ - `GITHUB_TOKEN` / `GH_TOKEN` — use `gh` CLI and `git` directly
275
+
276
+ Use standard tools directly: `gh` CLI, `git`, `curl`.
277
+
278
+ Git clone protocol: Always clone repos via SSH...
279
+
280
+ Anti-exfiltration policy:
281
+ [security instructions omitted]
282
+ </credential-context>
283
+
284
+ <environment>
285
+ Filesystem: The root filesystem is read-only. `/tmp` is the only writable directory.
286
+ Use `/tmp` for cloning repos, writing scratch files, and any other disk I/O.
287
+ Your working directory is `/app/static` which contains your agent files.
288
+
289
+ Environment variables: Use `setenv NAME value` to persist variables across bash commands.
290
+ See the agent commands reference for details.
291
+ </environment>
292
+
293
+ <webhook-trigger>
294
+ {"source":"github","event":"issues","action":"opened","repo":"acme/widgets",
295
+ "number":42,"title":"Login button broken on Safari","body":"Steps to reproduce...",
296
+ "url":"https://github.com/acme/widgets/issues/42","author":"jdoe",
297
+ "labels":["bug"],"sender":"jdoe","timestamp":"2026-03-24T14:30:00Z",
298
+ "receiptId":"wh_abc123"}
299
+ </webhook-trigger>
300
+
301
+ A webhook event just fired. Review the trigger context above and take appropriate action.
302
+ ```
303
+
304
+ Let's walk through each section.
305
+
306
+ ## Agent Config
307
+
308
+ ```xml
309
+ <agent-config>
310
+ {"repo":"acme/widgets","labels":["bug","triage"]}
311
+ </agent-config>
312
+ ```
313
+
314
+ This is the JSON serialization of the `params` field from your agent's `config.toml`. Use it to pass configuration values that your agent's instructions can reference — repo names, label filters, thresholds, or any structured data your agent needs.
315
+
316
+ Your `SKILL.md` instructions can reference this directly, e.g.: *"Read the repo and labels from `<agent-config>` to determine which issues to process."*
317
+
318
+ See [Agent Config Reference](/reference/agent-config#params) for details on the `params` field.
319
+
320
+ ## Credential Context
321
+
322
+ ```xml
323
+ <credential-context>
324
+ Credential files are mounted at `/credentials/` (read-only).
325
+
326
+ Environment variables already set from credentials:
327
+ - `GITHUB_TOKEN` / `GH_TOKEN` — use `gh` CLI and `git` directly
328
+
329
+ Use standard tools directly: `gh` CLI, `git`, `curl`.
330
+ ...
331
+ </credential-context>
332
+ ```
333
+
334
+ This block tells the agent which credentials are available and how to use them. Each credential type defines its own context line — for example, a GitHub token credential explains that `GITHUB_TOKEN` is set and the `gh` CLI is ready to use.
335
+
336
+ The block also includes SSH clone instructions and a **security policy** that instructs the agent never to leak credentials in logs, comments, or API calls.
337
+
338
+ You don't need to repeat any of this in your `SKILL.md`. The agent already knows it can use `gh` and `git` — your instructions just need to say *what* to do, not *how* to authenticate.
339
+
340
+ See [Credentials Reference](/reference/credentials) for all credential types and their injected context.
341
+
342
+ ## Environment
343
+
344
+ ```xml
345
+ <environment>
346
+ Filesystem: The root filesystem is read-only. `/tmp` is the only writable directory.
347
+ ...
348
+ Environment variables: Use `setenv NAME value` to persist variables across bash commands.
349
+ </environment>
350
+ ```
351
+
352
+ This block describes the filesystem constraints. In Docker mode, the agent learns that `/tmp` is writable and the root filesystem is read-only. In [host-user mode](/reference/agent-config#runtime), the agent's working directory is `/tmp/al-runs/<instance-id>/`. The `setenv` command persists environment variables in both modes.
353
+
354
+ See [Environment Commands](/reference/agent-commands#environment-commands) for `setenv` details, and [Container Filesystem](/concepts/agents#container-filesystem) for the full mount table.
355
+
356
+ ## Trigger Context
357
+
358
+ The final section varies by how the agent was triggered. This is the only part of the prompt that changes between runs.
359
+
360
+ ### Webhook
361
+
362
+ ```xml
363
+ <webhook-trigger>
364
+ {"source":"github","event":"issues","action":"opened","repo":"acme/widgets",...}
365
+ </webhook-trigger>
366
+
367
+ A webhook event just fired. Review the trigger context above and take appropriate action.
368
+ ```
369
+
370
+ Contains the full webhook payload as JSON — source, event type, action, repo, issue/PR details, sender, timestamp, and a receipt ID for replay. Your `SKILL.md` instructions should describe how to handle the events your agent subscribes to.
371
+
372
+ See [Webhooks Reference](/reference/webhooks) for the full payload schema.
373
+
374
+ ### Scheduled
375
+
376
+ ```
377
+ You are running on a schedule. Check for new work and act on anything you find.
378
+ ```
379
+
380
+ No structured data — the agent is expected to go find work on its own (poll for open issues, check a queue, etc.).
381
+
382
+ ### Manual
383
+
384
+ ```
385
+ You have been triggered manually. Check for new work and act on anything you find.
386
+ ```
387
+
388
+ The default manual prompt carries no structured data, just like a scheduled run. If you pass a prompt to `al run`, the agent instead receives:
389
+
390
+ ```xml
391
+ <user-prompt>
392
+ Your prompt text here
393
+ </user-prompt>
394
+
395
+ You have been given a specific task. Complete the task described above.
396
+ ```
397
+
398
+ ### Agent Call
399
+
400
+ ```xml
401
+ <agent-call>
402
+ {"caller":"orchestrator","context":"Find competitors for Acme in the CRM space"}
403
+ </agent-call>
404
+
405
+ You were called by the "orchestrator" agent. Review the call context above,
406
+ do the requested work, and use `al-return` to send back your result.
407
+ ```
408
+
409
+ Contains the calling agent's name and the context string it passed. See the [Subagents Guide](/guides/subagents) for details on agent-to-agent calls.
410
+
411
+ ## Skills
412
+
413
+ If your agent enables [skills](/reference/agent-config#skills) like `lock` or `subagent`, additional instruction blocks are injected between the `<environment>` block and the trigger context. These teach the agent the commands it can use — `rlock`/`runlock` for [resource locks](/concepts/resource-locks), or `al-subagent`/`al-subagent-wait` for [subagent calls](/guides/subagents).
414
+
415
+ Skill blocks only appear when explicitly enabled in your `SKILL.md` frontmatter.
416
+
417
+ ## Dynamic Context Injection
418
+
419
+ Beyond the assembled prompt, you can inject runtime data into your `SKILL.md` body using the `` !`command` `` syntax. This runs shell commands during container startup and replaces the markers with their output — useful for fetching live data before the LLM session begins.
420
+
421
+ See the [Dynamic Context Guide](/guides/dynamic-context) for details.
422
+
423
+ ## Writing Better Instructions
424
+
425
+ Now that you know what the agent receives automatically, here are some tips:
426
+
427
+ - **Don't repeat what's injected.** You don't need to tell the agent about `GITHUB_TOKEN` or filesystem constraints — it already knows.
428
+ - **Reference injected blocks by name.** Say *"Read the config from `<agent-config>`"* rather than hardcoding values.
429
+ - **Handle your trigger types.** If your agent subscribes to both cron and webhooks, your instructions should cover both paths — the trigger context tells the agent which one fired.
430
+ - **Keep instructions focused on behavior.** The runtime context handles the "how" (credentials, environment, tools). Your `SKILL.md` should focus on the "what" and "why."
431
+
432
+ ---
433
+
434
+ # Resource Locks
435
+
436
+ When you set `scale > 1` on an agent, multiple instances run concurrently. Without coordination, two instances might pick up the same GitHub issue, review the same PR, or deploy the same service at the same time. Resource locks prevent this.
437
+
438
+ ## Why Locks Exist
439
+
440
+ Locks let concurrent agent instances claim exclusive ownership of a resource before working on it. If another instance already holds the lock, the agent skips that resource and moves on.
441
+
442
+ ## How It Works
443
+
444
+ 1. Before working on a shared resource, the agent runs `rlock "github://acme/app/issues/42"`.
445
+ 2. If the lock is free, the agent gets it and proceeds.
446
+ 3. If another instance already holds the lock, the agent gets back the holder's name and skips that resource.
447
+ 4. When done, the agent runs `runlock "github://acme/app/issues/42"`.
448
+
449
+ The agent learns the lock commands from a preamble injected before the session starts. Agent authors just reference the commands in their `SKILL.md` workflow — no need to think about HTTP endpoints or authentication.
450
+
451
+ ## Commands
452
+
453
+ | Command | Description |
454
+ |---------|-------------|
455
+ | `rlock "<uri>"` | Acquire an exclusive lock. Fails if another instance holds it. |
456
+ | `runlock "<uri>"` | Release a lock. Only the holder can release. |
457
+ | `rlock-heartbeat "<uri>"` | Reset the TTL on a held lock. |
458
+
459
+ See [Agent Commands — Locks](/reference/agent-commands#lock-commands) for the full command reference with response JSON.
460
+
461
+ ## Resource Key URIs
462
+
463
+ Lock keys use URI format. Use a scheme that identifies the resource type, and a path that uniquely identifies the instance:
464
+
465
+ | Pattern | Example |
466
+ |---------|---------|
467
+ | `github://owner/repo/issues/number` | `rlock "github://acme/app/issues/42"` |
468
+ | `github://owner/repo/pr/number` | `rlock "github://acme/app/pr/17"` |
469
+ | `deploy://service-name` | `rlock "deploy://api-prod"` |
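
Keys are easiest to keep consistent when they are built in one place. A small sketch follows, where `issue_lock_key` and `pr_lock_key` are hypothetical helper names that follow the patterns in the table:

```bash
# Hypothetical helpers: build lock keys that match the URI patterns above.
# Arguments: <owner/repo> <number>
issue_lock_key() {
  printf 'github://%s/issues/%s\n' "$1" "$2"
}

pr_lock_key() {
  printf 'github://%s/pr/%s\n' "$1" "$2"
}
```

For example, `rlock "$(issue_lock_key acme/app 42)"` requests the same key as the first row of the table.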
470
+
471
+ ## TTL and Expiry
472
+
473
+ Locks expire automatically after **30 minutes** by default. This prevents deadlocks if an agent crashes or hangs without releasing its lock. The timeout is configurable via `resourceLockTimeout` in `config.toml` (value in seconds).
474
+
475
+ For work that takes longer than the timeout, use `rlock-heartbeat` to extend the TTL. Each heartbeat resets the clock to another full TTL period. If the agent forgets to heartbeat and the lock expires, another instance can claim it.
476
+
477
+ ## Heartbeat
478
+
479
+ During long-running work, periodically run `rlock-heartbeat` to keep the lock alive:
480
+
481
+ ```markdown
482
+ ## Workflow
483
+
484
+ 1. rlock "deploy://api-prod"
485
+ 2. Run the deployment (may take 45+ minutes)
486
+ - Every 10 minutes, run rlock-heartbeat "deploy://api-prod"
487
+ 3. runlock "deploy://api-prod"
488
+ ```
489
+
490
+ Each heartbeat resets the expiry to a full TTL period from the current time.
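
One way to follow this pattern inside a single bash command is to run the heartbeat in the background while the long task runs. This is only a sketch: `with_heartbeat` is a hypothetical helper, the heartbeat command is passed in by name so a stand-in can be used for testing, and it assumes the task and the heartbeat loop run within the same bash command.

```bash
# Hypothetical helper: keep a lock alive while a long command runs.
# Usage: with_heartbeat <heartbeat-command> <uri> <interval-seconds> <command...>
with_heartbeat() {
  heartbeat_cmd="$1"
  uri="$2"
  interval="$3"
  shift 3
  # Background loop: send a heartbeat after every interval.
  (
    while :; do
      sleep "$interval"
      "$heartbeat_cmd" "$uri" >/dev/null
    done
  ) &
  heartbeat_pid=$!
  # Run the long task in the foreground and remember its exit status.
  status=0
  "$@" || status=$?
  # Stop the heartbeat loop once the task is done.
  kill "$heartbeat_pid" 2>/dev/null || true
  wait "$heartbeat_pid" 2>/dev/null || true
  return "$status"
}
```

With a ten-minute interval this would look like `with_heartbeat rlock-heartbeat "deploy://api-prod" 600 ./deploy.sh`, where `./deploy.sh` is a placeholder for the real deployment command.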
491
+
492
+ ## Multiple Locks and Deadlock Detection
493
+
494
+ An agent instance can hold multiple locks simultaneously when working across related resources. However, this introduces the possibility of circular waits — agent A holds lock X and waits for lock Y, while agent B holds lock Y and waits for lock X.
495
+
496
+ The gateway detects these cycles automatically. When an `rlock` request would create a circular wait in the wait-for graph, it returns a `possible deadlock` error with the cycle path instead of blocking forever. The agent can then release its held locks and retry.
497
+
498
+ ```
499
+ # Example deadlock cycle:
500
+ # Agent A holds "github://acme/app/pr/10", wants "deploy://api-prod"
501
+ # Agent B holds "deploy://api-prod", wants "github://acme/app/pr/10"
502
+ # → rlock "deploy://api-prod" returns: possible deadlock detected
503
+ ```
504
+
505
+ Note: The agent preamble constrains agents to one lock at a time for simplicity. Multi-lock is available for advanced use cases where the agent is explicitly instructed to hold multiple locks.
506
+
507
+ ## Authentication
508
+
509
+ Each container gets a unique per-run secret (the same one used for the shutdown API). Lock requests are authenticated with this secret, so only the container that acquired a lock can release or heartbeat it. There is no way for one agent instance to release another's lock — it must wait for the TTL to expire.
510
+
511
+ ## Auto-release on Exit
512
+
513
+ When a container exits — whether it finishes successfully, hits an error, or times out — all of its locks are released automatically by the scheduler. You don't need to worry about cleanup in error paths.
514
+
515
+ ## Example in SKILL.md
516
+
517
+ ```markdown
518
+ ## Workflow
519
+
520
+ 1. List open issues labeled "agent" in repos from `<agent-config>`
521
+ 2. For each issue:
522
+ - rlock "github://owner/repo/issues/123"
523
+ - If the lock fails, skip this issue — another instance is handling it
524
+ - Clone the repo, create a branch, implement the fix
525
+ - Open a PR and link it to the issue
526
+ - runlock "github://owner/repo/issues/123"
527
+ 3. If you completed work and there may be more issues, run `al-rerun`
528
+ ```
529
+
530
+ ## Configuration
531
+
532
+ | Setting | Location | Default | Description |
533
+ |---------|----------|---------|-------------|
534
+ | `resourceLockTimeout` | `config.toml` | `1800` (30 min) | Default TTL for locks in seconds |
535
+
536
+ ## See Also
537
+
538
+ - [Agent Commands — Locks](/reference/agent-commands#lock-commands) — full command syntax and response JSON
539
+ - [Scaling Agents](/guides/scaling-agents) — guide on scaling with locks
540
+
541
+ ---
542
+
543
+ # Dynamic Context
544
+
545
+ Agents spend tokens every time they fetch context at runtime — cloning repos, listing issues, calling APIs. Hooks let you stage this data before the LLM session starts, so the agent begins with everything it needs.
546
+
547
+ ## The Problem
548
+
549
+ Without hooks, a typical agent run looks like:
550
+
551
+ 1. LLM starts
552
+ 2. LLM runs `git clone` (waits, uses tokens to read output)
553
+ 3. LLM runs `gh issue list` (waits, uses tokens to parse JSON)
554
+ 4. LLM starts actual work
555
+
556
+ Steps 2-3 are mechanical — the agent always needs to do them, and they don't benefit from LLM reasoning.
557
+
558
+ ## The Solution: Hooks
559
+
560
+ Pre-hooks run **after credentials are loaded** but **before the LLM session starts**. They execute inside the container with full access to credentials and environment variables. Define them in the agent's `config.toml`:
561
+
562
+ ```toml
563
+ # agents/<name>/config.toml
564
+ [hooks]
565
+ pre = [
566
+ "gh repo clone acme/app /tmp/repo --depth 1",
567
+ "gh issue list --repo acme/app --label bug --json number,title,body --limit 20 > /tmp/context/issues.json",
568
+ ]
569
+ ```
570
+
571
+ Then reference the staged files in the body of your `SKILL.md`:
572
+
573
+ ```markdown
574
+ ## Context
575
+
576
+ - The repo is cloned at `/tmp/repo`
577
+ - Open bug issues are at `/tmp/context/issues.json`
578
+ ```
579
+
580
+ ## Example: Git clone
581
+
582
+ The most common hook — clone the repo the agent will work on:
583
+
584
+ ```toml
585
+ [hooks]
586
+ pre = ["gh repo clone acme/app /tmp/repo --depth 1"]
587
+ ```
588
+
589
+ ## Example: Shell command
590
+
591
+ Run any shell command. `GITHUB_TOKEN`, `GH_TOKEN`, and other credential env vars are already set:
592
+
593
+ ```toml
594
+ [hooks]
595
+ pre = ["gh issue list --repo acme/app --label P1 --json number,title,body --limit 20 > /tmp/context/issues.json"]
596
+ ```
597
+
598
+ ## Example: HTTP fetch
599
+
600
+ Fetch data from an API endpoint:
601
+
602
+ ```toml
603
+ [hooks]
604
+ pre = ["curl -sf -H \"Authorization: Bearer ${INTERNAL_TOKEN}\" https://api.internal/v1/feature-flags -o /tmp/context/flags.json"]
605
+ ```
606
+
607
+ Environment variable interpolation (`${VAR_NAME}`) is supported since commands run via `/bin/sh`. Wrap the expression in double quotes, because the shell does not expand variables inside single quotes.
608
+
609
+ ## Post-hooks
610
+
611
+ Post-hooks run after the LLM session completes. Use them for cleanup, artifact upload, or reporting:
612
+
613
+ ```toml
614
+ [hooks]
615
+ pre = ["gh repo clone acme/app /tmp/repo --depth 1"]
616
+ post = [
617
+ "upload-artifacts.sh",
618
+ "curl -X POST https://hooks.slack.com/... -d '{\"text\": \"Agent run complete\"}'",
619
+ ]
620
+ ```
621
+
622
+ ## Referencing Staged Files in SKILL.md
623
+
624
+ After hooks run, tell the agent what's available in the body of your `SKILL.md`:
625
+
626
+ ```markdown
627
+ ## Context
628
+
629
+ - The repo is cloned at `/tmp/repo`
630
+ - Open P1 issues are at `/tmp/context/issues.json`
631
+ - Feature flags (if available) are at `/tmp/context/flags.json`
632
+ ```
633
+
634
+ ## Direct Context Injection
635
+
636
+ For simple, inline data that needs to be embedded directly in your SKILL.md instructions, use direct context injection with the `` !`command` `` syntax. Commands are executed after pre-hooks but before the LLM session starts, and their output replaces the expression inline.
637
+
638
+ ### Syntax
639
+
640
+ ```markdown
641
+ The current time is !`date`.
642
+ There are !`ls /tmp/repo/src | wc -l` source files in the repo.
643
+ ```
644
+
645
+ Becomes:
646
+
647
+ ```
648
+ The current time is Sun Mar 22 21:30:45 UTC 2026.
649
+ There are 42 source files in the repo.
650
+ ```
651
+
652
+ ### When to use
653
+
654
+ - **Hooks**: For setup tasks like cloning repos or downloading data files
655
+ - **Direct injection**: For inline values the agent needs to reference in instructions
656
+
657
+ ### Examples
658
+
659
+ **Basic usage**:
660
+ ```markdown
661
+ You are analyzing code at !`date +"%Y-%m-%d %H:%M"`.
662
+ ```
663
+
664
+ **With pre-staged data**:
665
+ ```toml
666
+ [hooks]
667
+ pre = ["gh issue list --repo acme/app --label bug --json number --limit 20 > /tmp/issues.json"]
668
+ ```
669
+
670
+ ```markdown
671
+ There are !`jq length /tmp/issues.json` open bug issues to work on.
672
+ ```
673
+
674
+ **Error handling**:
675
+ If a command fails, it's replaced with `[Error: <message>]`:
676
+
677
+ ```markdown
678
+ Config value: !`cat /nonexistent/file`
679
+ ```
680
+
681
+ Becomes:
682
+ ```
683
+ Config value: [Error: cat: can't open '/nonexistent/file': No such file or directory]
684
+ ```
685
+
686
+ ### Limitations
687
+
688
+ - Commands have a 60-second timeout
689
+ - Output is limited to prevent prompt explosion
690
+ - Errors are inline — use hooks for critical setup that should fail the run
691
+
692
+ ## Tips
693
+
694
+ - **Hooks run sequentially** in the order defined in `config.toml`
695
+ - **Each hook has a 5-minute timeout** — hooks are also bounded by the container-level timeout
696
+ - **If a command fails** (non-zero exit), the run aborts with an error
697
+ - **Environment variables** set inside hook commands do not propagate back to the agent's `process.env`
698
+ - **Use hooks for setup, direct injection for values** — hooks for cloning repos or staging files, direct context injection for inline dynamic values the agent needs to reference
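
The sequential and fail-fast behavior described in these tips amounts to a simple loop. The sketch below models it for illustration. `run_hooks` is a hypothetical function, not the runtime's actual implementation, and it leaves out the per-hook timeout.

```bash
# Hypothetical model of hook execution: run each command in order via /bin/sh
# and stop at the first failure. Timeouts are not modeled here.
run_hooks() {
  for hook in "$@"; do
    if ! /bin/sh -c "$hook"; then
      printf 'hook failed: %s\n' "$hook" >&2
      return 1
    fi
  done
}
```

In this model each command runs in its own `/bin/sh` process, so a variable exported by one hook is gone once that hook exits, which is consistent with the tip about environment variables.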
699
+
700
+ ## Next steps
701
+
702
+ - [Agent Config — Hooks](/reference/agent-config#hooks) — full field reference
703
+ - [Agents (concepts)](/concepts/agents) — full runtime lifecycle
704
+
705
+ ---
706
+
707
+ # Shared Context
708
+
709
+ Agents often need the same context — coding conventions, repo layout, team policies. The `shared/` directory lets you maintain this context in one place and reference it from any agent's `SKILL.md`.
710
+
711
+ ## How it works
712
+
713
+ Place files in a `shared/` directory at your project root. At image build time, these files are baked into every agent's container at `/app/static/shared/`.
714
+
715
+ ```
716
+ my-project/
717
+ ├── config.toml
718
+ ├── shared/
719
+ │ ├── conventions.md
720
+ │ ├── repo-layout.md
721
+ │ └── team/
722
+ │ └── review-policy.md
723
+ ├── agents/
724
+ │ ├── dev/SKILL.md
725
+ │ └── reviewer/SKILL.md
726
+ ```
727
+
728
+ ## Referencing shared files in SKILL.md
729
+
730
+ Use direct context injection to include shared files in your agent's prompt:
731
+
732
+ ```markdown
733
+ ## Context
734
+
735
+ !`cat /app/static/shared/conventions.md`
736
+ !`cat /app/static/shared/repo-layout.md`
737
+ ```
738
+
739
+ Each agent chooses which shared files to include. There is no automatic injection — you control exactly what context each agent receives.
740
+
741
+ ## Example: coding conventions
742
+
743
+ Create `shared/conventions.md`:
744
+
745
+ ```markdown
746
+ # Coding Conventions
747
+
748
+ - TypeScript strict mode, no `any`
749
+ - Use `vitest` for tests, mirror `src/` structure in `test/`
750
+ - Prefer named exports over default exports
751
+ - Error messages should include enough context to debug without a stack trace
752
+ ```
753
+
754
+ Reference it in `agents/dev/SKILL.md`:
755
+
756
+ ```markdown
757
+ # Dev Agent
758
+
759
+ You solve GitHub issues by writing code.
760
+
761
+ ## Context
762
+
763
+ !`cat /app/static/shared/conventions.md`
764
+
765
+ ## Workflow
766
+
767
+ 1. Read the issue
768
+ 2. Write the fix following the conventions above
769
+ 3. Run tests
770
+ 4. Open a PR
771
+ ```
772
+
773
+ And in `agents/reviewer/SKILL.md`:
774
+
775
+ ```markdown
776
+ # Reviewer Agent
777
+
778
+ You review pull requests for correctness and style.
779
+
780
+ ## Context
781
+
782
+ !`cat /app/static/shared/conventions.md`
783
+
784
+ ## Workflow
785
+
786
+ 1. Read the PR diff
787
+ 2. Check against conventions
788
+ 3. Approve or request changes
789
+ ```
790
+
791
+ Both agents now share the same conventions without duplication.
792
+
793
+ ## Subdirectories
794
+
795
+ The `shared/` directory supports subdirectories. A file at `shared/team/review-policy.md` is available at `/app/static/shared/team/review-policy.md` inside the container.
796
+
797
+ ## Cloud deployment
798
+
799
+ When deploying with `al push`, the `shared/` directory is included automatically — `rsync` syncs the entire project directory.
800
+
801
+ ---
802
+
803
+ # Subagents
804
+
805
+ Agents can call other agents to delegate work and collect results. This enables multi-agent workflows like planner → developer → reviewer pipelines.
806
+
807
+ ## Use Case
808
+
809
+ A planner agent triages an issue and creates an implementation plan. It calls a dev agent to implement it, then calls a reviewer agent to review the PR.
810
+
811
+ ## `al-subagent`: Fire a call
812
+
813
+ Pass context to another agent via stdin:
814
+
815
+ ```bash
816
+ echo "Implement the fix for issue #42 on acme/app" | al-subagent dev
817
+ ```
818
+
819
+ **Response:**
820
+
821
+ ```json
822
+ {"ok": true, "callId": "abc123"}
823
+ ```
824
+
825
+ The call is non-blocking — the calling agent continues working immediately.
826
+
827
+ ## `al-subagent-check`: Non-blocking status
828
+
829
+ Check if a call has finished without waiting:
830
+
831
+ ```bash
832
+ al-subagent-check abc123
833
+ ```
834
+
835
+ **Response:**
836
+
837
+ ```json
838
+ {"status": "pending"}
839
+ {"status": "running"}
840
+ {"status": "completed", "returnValue": "PR #17 opened."}
841
+ {"status": "error", "error": "timeout"}
842
+ ```
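A caller usually needs to branch on the `status` field. The sketch below is one way to do that; `call_state` is a hypothetical helper name, not a command shipped with the package, and it assumes the response uses exactly the spacing shown above. Where `jq` is available, `jq -r .status` is the sturdier way to read the field.

```bash
# Hypothetical helper: map one al-subagent-check response to a coarse state.
# It matches on the literal JSON text shown in the responses above.
call_state() {
  case "$1" in
    *'"status": "completed"'*) echo "done" ;;
    *'"status": "error"'*)     echo "failed" ;;
    *'"status": "pending"'*|*'"status": "running"'*) echo "in-progress" ;;
    *) echo "unknown" ;;
  esac
}

# Inside an agent this would be: call_state "$(al-subagent-check "$CALL_ID")"
# Sample responses are passed in directly so the sketch runs anywhere.
call_state '{"status": "running"}'                                     # prints: in-progress
call_state '{"status": "completed", "returnValue": "PR #17 opened."}'  # prints: done
call_state '{"status": "error", "error": "timeout"}'                   # prints: failed
```

Treating `pending` and `running` alike keeps the caller's logic simple: in both cases the only sensible action is to keep working and check again later.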
843
+
844
+ ## `al-subagent-wait`: Block until done
845
+
846
+ Wait for one or more calls to complete:
847
+
848
+ ```bash
849
+ al-subagent-wait abc123 --timeout 600
850
+ al-subagent-wait abc123 def456 --timeout 300
851
+ ```
852
+
853
+ **Response:**
854
+
855
+ ```json
856
+ {
857
+ "abc123": {"status": "completed", "returnValue": "PR #17 opened."},
858
+ "def456": {"status": "completed", "returnValue": "All tests pass."}
859
+ }
860
+ ```
861
+
862
+ The default timeout is 900 seconds, and the command polls for completion every 5 seconds.
863
+
864
+ ## `al-return`: Send back a result
865
+
866
+ The called agent uses `al-return` to send a value back to the caller:
867
+
868
+ ```bash
869
+ al-return "PR #17 opened. Ready for review."
870
+ ```
871
+
872
+ ## Multi-call Pattern
873
+
874
+ Fire several calls, continue working, then collect all results:
875
+
876
+ ```bash
877
+ # Fire calls
878
+ DEV_ID=$(echo "Implement fix for #42" | al-subagent dev | jq -r .callId)
879
+ REVIEW_ID=$(echo "Review PR #17" | al-subagent reviewer | jq -r .callId)
880
+
881
+ # ... do other work while they run ...
882
+
883
+ # Collect results
884
+ RESULTS=$(al-subagent-wait "$DEV_ID" "$REVIEW_ID" --timeout 600)
885
+ echo "$RESULTS" | jq -r --arg id "$DEV_ID" '.[$id].returnValue'
886
+ echo "$RESULTS" | jq -r --arg id "$REVIEW_ID" '.[$id].returnValue'
887
+ ```
888
+
889
+ ## Complete Example: SKILL.md
890
+
891
+ Here's a planner agent that delegates to dev and reviewer:
892
+
893
+ ````markdown
894
+ # Planner Agent
895
+
896
+ You orchestrate development workflows. When triggered, you assess the issue,
897
+ create an implementation plan, and delegate to other agents.
898
+
899
+ ## Workflow
900
+
901
+ 1. Read the issue from the webhook trigger or search for labeled issues
902
+ 2. Assess the issue: is it clear enough for development?
903
+ 3. If not, comment asking for clarification and stop
904
+ 4. Write an implementation plan as a comment on the issue
905
+ 5. Call the dev agent and keep the call ID:
906
+    ```
907
+    CALL_ID=$(echo "Implement the plan in comment #N on issue #M in owner/repo" | al-subagent dev | jq -r .callId)
908
+    ```
909
+ 6. Wait for dev to finish:
910
+    ```
911
+    RESULT=$(al-subagent-wait "$CALL_ID" --timeout 1800)
912
+    ```
913
+ 7. If dev succeeded, call the reviewer:
914
+    ```
915
+    echo "Review PR #P on owner/repo" | al-subagent reviewer
916
+    ```
917
+ 8. Comment on the issue with the final status
918
+ ````
919
+
920
+ ## Rules
921
+
922
+ - **No self-calls** — an agent cannot call itself (the call is rejected)
923
+ - **Call depth limit** — chains like A → B → C are allowed up to `maxCallDepth` (default: 3)
924
+ - **Queuing** — if all runners for the target agent are busy, the call is queued (up to `workQueueSize`, default: 100)
925
+ - **No reruns** — called agents do not re-run. They respond to the single call.
926
+ - **Gateway required** — call commands require the gateway. They return errors if `GATEWAY_URL` is not set.
927
+
928
+ ## What the Called Agent Sees
929
+
930
+ The called agent receives a `<skill-subagent>` block in its prompt with:
931
+
932
+ - The name of the calling agent
933
+ - The context string passed via stdin
934
+
935
+ The called agent's `SKILL.md` should handle this trigger type:
936
+
937
+ ```markdown
938
+ ## Trigger handling
939
+
940
+ - **Agent call**: The `<skill-subagent>` block contains context from the calling agent.
941
+ Do what was requested and use `al-return` to send back results.
942
+ ```
943
+
944
+ ## Next steps
945
+
946
+ - [Agent Commands](/reference/agent-commands) — full call command syntax and responses
947
+ - [Agents (concepts)](/concepts/agents) — runtime lifecycle and trigger types
948
+
949
+ ---
950
+
951
+ # Scaling Agents
952
+
953
+ By default, each agent runs one instance at a time. This guide shows how to scale up and use [resource locks](/concepts/resource-locks) to prevent duplicate work.
954
+
955
+ ## The Problem
956
+
957
+ With `scale = 1`, a single agent instance handles all work sequentially. If 5 GitHub issues arrive via webhook while the agent is working on one, those 5 events queue up and wait. For high-volume workloads, this creates a bottleneck.
958
+
959
+ ## Increase Scale
960
+
961
+ In the agent's `config.toml`:
962
+
963
+ ```toml
964
+ # agents/dev/config.toml
965
+ scale = 3 # Run up to 3 instances concurrently
966
+ ```
967
+
968
+ Now when 5 issues arrive, up to 3 are processed simultaneously. The remaining 2 wait in the work queue.
969
+
970
+ ## Add Locking
971
+
972
+ With multiple instances, two agents might try to work on the same issue. Add a [lock/skip/work/unlock](/concepts/resource-locks) pattern to your `SKILL.md`:
973
+
974
+ ```markdown
975
+ ## Workflow
976
+
977
+ 1. List open issues labeled "agent" in repos from `<agent-config>`
978
+ 2. For each issue:
979
+    - Run `rlock "github://owner/repo/issues/123"`
980
+    - If the lock fails, skip this issue: another instance is handling it
981
+    - Clone the repo, create a branch, implement the fix
982
+    - Open a PR and link it to the issue
983
+    - Run `runlock "github://owner/repo/issues/123"`
984
+ 3. If you completed work and there may be more issues, run `al-rerun`
985
+ ```
986
+
987
+ ### How lock commands work
988
+
989
+ When the agent runs `rlock "github://owner/repo/issues/123"`:
990
+
991
+ - **Lock acquired:** `{"ok": true}` — proceed with work
992
+ - **Already held:** `{"ok": false, "holder": "dev-abc123", ...}` — skip this resource
993
+
994
+ When done: `runlock "github://owner/repo/issues/123"` releases the lock.
995
+
996
+ If the agent crashes or times out, locks are [auto-released](/concepts/resource-locks#auto-release-on-exit).
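The lock/skip/work/unlock pattern can be sketched as a shell loop. This is an illustration under stated assumptions rather than the package's own code: the `rlock` and `runlock` functions below are stand-ins that mimic the two responses documented above (with the "already held" response trimmed to `ok` and `holder`), so the loop runs outside a container. In a real agent you would drop the stand-ins and call the installed commands.

```bash
# Stand-ins for illustration only. They pretend issue 2 is held by another instance.
rlock() {
  case "$1" in
    */issues/2) echo '{"ok": false, "holder": "dev-abc123"}' ;;
    *)          echo '{"ok": true}' ;;
  esac
}
runlock() { :; }   # the real command releases the lock; the stand-in does nothing

handled=""
for n in 1 2 3; do
  uri="github://owner/repo/issues/$n"
  resp=$(rlock "$uri")
  case "$resp" in
    *'"ok": true'*) ;;                   # lock acquired: continue to the work step
    *) echo "skip $n"; continue ;;       # held elsewhere: move to the next issue
  esac
  echo "work $n"                         # clone, branch, implement, open a PR
  handled="$handled $n"
  runlock "$uri"                         # release as soon as the work is done
done
echo "handled:$handled"
```

With the stand-ins above, issues 1 and 3 are worked and issue 2 is skipped. Releasing inside the loop, rather than once at the end, keeps each lock held only for as long as its own issue is in progress.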
997
+
998
+ ## Monitor with `al stat`
999
+
1000
+ Check queue depth and running instances:
1001
+
1002
+ ```bash
1003
+ al stat
1004
+ al stat -E production
1005
+ ```
1006
+
1007
+ The `queue` column shows how many events are waiting. If it's consistently high, consider increasing `scale`.
1008
+
1009
+ ## Resource Considerations
1010
+
1011
+ Each parallel instance:
1012
+
1013
+ - Uses a separate Docker container
1014
+ - Consumes memory (`local.memory` per container, default 4GB)
1015
+ - Consumes CPU (`local.cpus` per container, default 2)
1016
+ - Makes independent LLM API calls (watch your rate limits and quota)
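As a rough worked example, the per-container limits above multiply by `scale`. The figures below use `scale = 3` from the earlier example and the stated defaults; the variable names are illustrative, and the totals are upper bounds that assume every container uses its full limit at the same time.

```bash
scale=3          # concurrent instances, from the example above
mem_gb_each=4    # local.memory default, in GB
cpus_each=2      # local.cpus default

total_mem_gb=$((scale * mem_gb_each))
total_cpus=$((scale * cpus_each))
echo "peak: ${total_mem_gb}GB memory, ${total_cpus} CPUs"   # prints: peak: 12GB memory, 6 CPUs
```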
1017
+
1018
+ ### Tune work queue size
1019
+
1020
+ If events arrive faster than agents can process them, the queue buffers them:
1021
+
1022
+ ```toml
1023
+ # config.toml
1024
+ workQueueSize = 200 # default: 100 per agent
1025
+ ```
1026
+
1027
+ When the queue is full, the oldest items are dropped.
1028
+
1029
+ ### Default agent scale
1030
+
1031
+ Set the default scale for all agents that don't have an explicit `scale` in their `config.toml`:
1032
+
1033
+ ```toml
1034
+ # config.toml
1035
+ defaultAgentScale = 3 # each agent gets 3 runners unless overridden
1036
+ ```
1037
+
1038
+ Without this setting, agents default to 1 runner each.
1039
+
1040
+ ### Project-wide scale cap
1041
+
1042
+ Limit total concurrent runners across all agents:
1043
+
1044
+ ```toml
1045
+ # config.toml
1046
+ scale = 10 # max 10 runners total across all agents
1047
+ ```
1048
+
1049
+ If `defaultAgentScale * agentCount` exceeds `scale`, agents are throttled at startup and a warning is shown.
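To see when the cap applies, compare the product with `scale`. The sketch below assumes a hypothetical project with four agents and no per-agent overrides; it only reproduces the comparison described above, and how the scheduler distributes the throttling across agents is not specified here.

```bash
default_agent_scale=3
agent_count=4    # hypothetical number of agents in the project
scale=10         # project-wide cap from the example above

requested=$((default_agent_scale * agent_count))
if [ "$requested" -gt "$scale" ]; then
  verdict="throttled"
else
  verdict="within cap"
fi
echo "requested=$requested cap=$scale -> $verdict"   # prints: requested=12 cap=10 -> throttled
```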
1050
+
1051
+ ## Example Configuration
1052
+
1053
+ Agent runtime config in `agents/dev/config.toml`:
1054
+
1055
+ ```toml
1056
+ credentials = ["github_token", "git_ssh"]
1057
+ schedule = "*/5 * * * *"
1058
+ models = ["sonnet"]
1059
+ scale = 3
1060
+
1061
+ [[webhooks]]
1062
+ source = "my-github"
1063
+ events = ["issues"]
1064
+ actions = ["labeled"]
1065
+ labels = ["agent"]
1066
+
1067
+ [params]
1068
+ repos = ["acme/app", "acme/api"]
1069
+ triggerLabel = "agent"
1070
+ ```
1071
+
1072
+ ## Next steps
1073
+
1074
+ - [Resource Locks (concepts)](/concepts/resource-locks) — TTL, heartbeat, deadlock detection
1075
+ - [Agent Commands — Locks](/reference/agent-commands#lock-commands) — full command syntax and responses