@action-llama/skill 0.23.8 → 0.24.2

# Getting Started

## Prerequisites

- **Node.js 20+** — [nodejs.org](https://nodejs.org)
- **Docker** — [docker.com](https://www.docker.com) (Docker Desktop on macOS/Windows, or Docker Engine on Linux). Not required if all agents use the [host-user runtime](/reference/agent-config#runtime).

## 1. Create a project

```bash
npm i -g @action-llama/action-llama@latest
```

```bash
al new my-agents
```

```bash
cd my-agents
```

The wizard prompts for your LLM API key (Anthropic recommended) and basic configuration. This creates:

- `config.toml`: project-level settings
- `.env.toml`: local environment binding (gitignored)
- `package.json`: with `@action-llama/action-llama` as a dependency
- `CLAUDE.md`: a symlink to an AGENTS.md file in the npm package that tells your agent everything it needs to know
- `.mcp.json`: MCP server config so [Claude Code](/integrations/claude) can interact with your agents

## 2. Create an agent

Add a "dev" agent that will implement GitHub issues when they are tagged "ready-for-dev":

```bash
npx al add Action-Llama/agents -a dev
```

Since this is the first time you've run it, you'll be prompted to configure credentials and webhooks. You'll also want to set the params to match your GitHub org.

You'll now see:

```
agents/dev/
  SKILL.md     # Portable metadata + instructions
  config.toml  # Runtime config (credentials, models, schedule, etc.)
```

You can also use a coding agent such as Claude Code to create new agents! The CLAUDE.md in the project root tells it everything it needs to know.

## 3. Run

```bash
npx al start
```

If any credentials are missing, you'll be prompted for them. The scheduler starts, discovers your agents, and begins running them on their configured schedules.

The terminal shows a live TUI with agent status, or use `-w` to enable the [web dashboard](/reference/web-dashboard):

```bash
npx al start -w
```

## What just happened?

1. The scheduler scanned for directories with `SKILL.md` and found your agent
2. It built a Docker image with your agent's tools
3. On the first cron tick, it launched a container, loaded credentials, and started an LLM session
4. The LLM received your `SKILL.md` instructions and ran autonomously
5. When it finished, the container was removed and logs were saved

## Key files

| File | Purpose |
|------|---------|
| `config.toml` | Project settings: named models, Docker limits, webhook sources |
| `agents/<name>/SKILL.md` | Portable metadata + agent instructions |
| `agents/<name>/config.toml` | Agent runtime config: credentials, models, schedule, webhooks, params |

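To tie these together, a per-agent `config.toml` might look roughly like this. The `models` list is the documented form; the `schedule` field and its cron syntax are assumptions for illustration, so check your generated file and the [agent config reference](/reference/agent-config) for exact field names:

```toml
# agents/dev/config.toml: hypothetical sketch, field names not guaranteed
models = ["sonnet", "haiku"]   # named models from the project config, in priority order
schedule = "*/15 * * * *"      # assumed field: run every 15 minutes
```
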
## Manual test run

Run a single agent once without starting the scheduler:

```bash
npx al run dev
```

## What's next

- [Using Webhooks](/first-steps/using-webhooks) — trigger agents from GitHub events
- [Dynamic Context](/guides/dynamic-context) — use hooks to stage context
- [Deploying to a VPS](/guides/deploying-to-vps) — run agents 24/7 on a server
- [Agents (concepts)](/concepts/agents) — understand the runtime lifecycle
- [CLI Commands](/reference/cli-commands) — all available commands and flags

---

# Using Webhooks

This tutorial walks you through setting up a GitHub webhook so that labeling an issue triggers an agent run. By the end, you'll have an agent that responds to GitHub events in real time.

## What you'll build

An agent that triggers when a GitHub issue is labeled `"agent"`. The agent runs, processes the issue, and you see the result in your terminal.

## Prerequisites

- An Action Llama project with an agent (see [Getting Started](/first-steps/getting-started))
- A GitHub repo you control
- A GitHub Personal Access Token (the agent already has one if you ran `al doctor`)

## 1. Add a webhook source to `config.toml`

Define a named webhook source in your project's `config.toml`:

```toml
[webhooks.my-github]
type = "github"
credential = "MyGithubWebhookSecret"
```

## 2. Add the credential

```bash
npx al doctor
```

It will prompt you for any missing credentials.

For the webhook, enter a random secret string (e.g. generate one with `openssl rand -hex 20`). Save this value — you'll need it when configuring GitHub.

## 3. Add a webhook trigger to your agent

In your agent's `config.toml`, add a webhook trigger:

```toml
# agents/<name>/config.toml
[[webhooks]]
source = "my-github"
events = ["issues"]
actions = ["labeled"]
labels = ["agent"]
```

This tells the agent to trigger when an issue in any repo is labeled with `"agent"`. You can also filter by repo:

```toml
[[webhooks]]
source = "my-github"
repos = ["your-org/your-repo"]
events = ["issues"]
actions = ["labeled"]
labels = ["agent"]
```

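Conceptually, each filter field narrows the match: an event must satisfy every field the trigger sets, and unset fields act as wildcards. This sketch shows that AND-style matching in plain JavaScript (illustrative only; the real matcher lives inside the scheduler and may differ):

```javascript
// Illustrative webhook-trigger matcher: a trigger matches an event only when
// every filter field it sets (repos, events, actions, labels) accepts the event.
function matchesTrigger(trigger, event) {
  if (trigger.repos && !trigger.repos.includes(event.repo)) return false;
  if (trigger.events && !trigger.events.includes(event.event)) return false;
  if (trigger.actions && !trigger.actions.includes(event.action)) return false;
  if (trigger.labels && !trigger.labels.includes(event.label)) return false;
  return true; // unset fields act as wildcards
}

const trigger = { events: ["issues"], actions: ["labeled"], labels: ["agent"] };
console.log(matchesTrigger(trigger, { event: "issues", action: "labeled", label: "agent" })); // true
console.log(matchesTrigger(trigger, { event: "issues", action: "closed", label: "agent" }));  // false
```

Adding `repos = ["your-org/your-repo"]` simply adds one more condition to the chain.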
## 4. Start ngrok for local development

Your local machine isn't reachable from the internet, so GitHub can't deliver webhooks directly. Use ngrok to create a tunnel:

```bash
ngrok http 8080
```

Copy the HTTPS URL (e.g. `https://abc123.ngrok-free.app`).

## 5. Configure the GitHub webhook

In your GitHub repo:

1. Go to **Settings > Webhooks > Add webhook**
2. **Payload URL:** `https://abc123.ngrok-free.app/webhooks/github` (your ngrok URL + `/webhooks/github`)
3. **Content type:** `application/json`
4. **Secret:** paste the same secret you used in step 2
5. **Events:** select "Let me select individual events" and check **Issues**
6. Click **Add webhook**

## 6. Start the scheduler

```bash
npx al start
```

## 7. Test it

Go to your GitHub repo, open (or create) an issue, and add the label `"agent"`.

In your terminal, you should see the agent pick up the webhook event and start running. Check the logs:

```bash
npx al logs dev
```

## How it works

1. GitHub sends a POST request to your ngrok URL when the issue is labeled
2. The gateway receives it at `/webhooks/github` and verifies the HMAC signature against your secret
3. The event is matched against your agent's webhook filters (`events: issues`, `actions: labeled`, `labels: agent`)
4. The agent is triggered with the full webhook payload injected into its prompt as a `<webhook-trigger>` block

## Next steps

- [Deploying to a VPS](/guides/deploying-to-vps) — deploy to avoid needing ngrok
- [Webhooks Reference](/reference/webhooks) — all providers and filter fields
- [Scaling Agents](/guides/scaling-agents) — handle high webhook volume with parallel instances

---

# CLI Commands

## `al new <name>`

Creates a new Action Llama project. Runs interactive setup to configure credentials and LLM defaults.

```bash
npx @action-llama/action-llama new my-project
```

Creates:
- `my-project/package.json` — with `@action-llama/action-llama` dependency
- `my-project/.gitignore`
- `my-project/.workspace/` — runtime state directory
- Credential files in `~/.action-llama/credentials/`

## `al doctor`

Checks all agent credentials and interactively prompts for any that are missing. Discovers agents in the project, collects their credential requirements (plus any webhook secret credentials), and ensures each one exists on disk. Also generates a gateway API key if one doesn't exist yet (used for dashboard and CLI authentication).

Additionally validates:
- Webhook trigger field configurations (misspelled fields, wrong types)
- Host-user runtime setup (OS user existence, sudoers configuration). On Linux, auto-creates users and sudoers rules. On macOS, prints manual setup instructions.

```bash
al doctor
al doctor -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name — pushes credentials to server and reconciles IAM |
| `--strict` | Treat unknown config fields as errors instead of warnings |

## `al run <agent>`

Manually triggers a single agent run. The agent runs once and the process exits when it completes. Useful for testing, debugging, or one-off runs without starting the full scheduler.

```bash
al run dev
al run reviewer -p ./my-project
al run dev -E production
al run dev --headless
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |
| `-H, --headless` | Non-interactive mode (no TUI, no credential prompts) |

## `al start`

Starts the scheduler. Runs all agents on their configured schedules and listens for webhooks.

```bash
al start
al start -w              # Enable web dashboard
al start -e              # VPS deployment: expose gateway publicly
al start --port 3000     # Custom gateway port
al start -H              # Headless (no TUI)
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |
| `-w, --web-ui` | Enable web dashboard (see [Web Dashboard](/reference/web-dashboard)) |
| `-e, --expose` | Bind gateway to `0.0.0.0` (public) while keeping local-mode features |
| `-H, --headless` | Non-interactive mode (no TUI, no credential prompts) |
| `--port <number>` | Gateway port (overrides `[gateway].port` in config) |

## `al stop`

Stops the scheduler and clears all pending agent work queues. Sends a stop signal to the gateway. In-flight runs continue until they finish, but no new runs will start.

```bash
al stop
al stop -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al stat`

Shows status of all discovered agents in the project. Displays each agent's schedule, credentials, webhook configuration, and queue depth.

```bash
al stat
al stat -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al logs [agent]`

View log files for a specific agent, or scheduler logs if no agent is specified.

```bash
al logs                    # Scheduler logs
al logs dev
al logs dev -n 100         # Show last 100 entries
al logs dev -f             # Follow/tail mode
al logs dev -d 2025-01-15  # Specific date
al logs dev -r             # Raw JSON log output
al logs dev -i abc123      # Specific instance
al logs dev -E production  # Remote agent logs
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |
| `-n, --lines <N>` | Number of log entries (default: 50) |
| `-f, --follow` | Tail mode — watch for new entries |
| `-d, --date <YYYY-MM-DD>` | View a specific date's log file |
| `-r, --raw` | Raw JSON log output (no formatting) |
| `-i, --instance <id>` | Filter to a specific instance ID |

### Troubleshooting logs

```bash
al logs                        # Scheduler logs
al logs <agent>                # Agent logs
al logs <agent> -f             # Follow mode
al logs <agent> -d 2025-01-15  # Specific date
al env logs production -f      # Server system logs via SSH
```

## `al stats [agent]`

Show historical run statistics from the local SQLite stats database. Without an agent name, shows a global summary across all agents. With an agent name, shows detailed per-run history.

```bash
al stats             # Global summary (last 7 days)
al stats dev         # Per-run detail for dev agent
al stats --since 24h # Last 24 hours only
al stats dev -n 50   # Last 50 runs for dev agent
al stats --calls     # Agent-to-agent call graph summary
al stats --json      # JSON output
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-s, --since <duration>` | Time window: e.g. `24h`, `7d`, `30d` (default: `7d`) |
| `-n <N>` | Number of recent runs to show (default: 20) |
| `--json` | Output as JSON |
| `--calls` | Show agent-to-agent call graph summary instead of run history |

## `al pause [name]`

Pause the scheduler or a single agent. Without a name, pauses the entire scheduler — all cron jobs stop firing. With a name, pauses that agent — its cron job stops firing and webhook events are ignored. In-flight runs continue until they finish. Requires the gateway.

```bash
al pause      # Pause the scheduler
al pause dev  # Pause a single agent
al pause dev -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al resume [name]`

Resume the scheduler or a single agent. Without a name, resumes the entire scheduler. With a name, resumes that agent — its cron job resumes firing and webhooks are accepted again.

```bash
al resume      # Resume the scheduler
al resume dev  # Resume a single agent
al resume dev -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al kill <target>`

Kill an agent (all running instances) or a single instance by ID. Tries the target as an agent name first; if not found, falls back to instance ID. This does **not** pause the agent — if it has a schedule, a new run will start at the next cron tick. To fully stop an agent, pause it first, then kill.

```bash
al kill dev              # Kill all instances of an agent
al kill my-agent-abc123  # Kill a single instance by ID
al kill dev -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al chat [agent]`

Open an interactive console. Without an agent name, opens the project-level console for creating and managing agents. With an agent name, opens an interactive session scoped to that agent's environment — credentials are loaded and injected as environment variables (e.g. `GITHUB_TOKEN`, `GIT_SSH_COMMAND`), and the working directory is set to the agent's directory.

```bash
al chat      # project-level console
al chat dev  # interactive session with dev agent's credentials
```

| Option | Description |
|--------|-------------|
| `[agent]` | Agent name — loads its credentials and environment |
| `-p, --project <dir>` | Project directory (default: `.`) |

When running in agent mode, the command probes the gateway and warns if it is not reachable:

```
Warning: No gateway detected at http://localhost:8080. Resource locks, agent calls, and signals are unavailable.
Start the scheduler with `al start` to enable these features.
```

The agent's SKILL.md is loaded as reference context but is **not** auto-executed — you drive the session interactively.

## `al push [agent]`

Deploy your project to a server over SSH. Requires a `[server]` section in your environment file. See [VPS Deployment](/concepts/vps-deployment) for how it works under the hood.

```bash
al push -E production               # Full project push
al push dev -E production           # Push only the dev agent (hot-reloaded)
al push --dry-run -E production     # Preview what would be synced
al push --creds-only -E production  # Sync only credentials
```

Without an agent name, pushes the entire project and can restart the remote service. With an agent name, pushes only that agent's files and credentials — the running scheduler detects the change and hot-reloads the agent without a full restart.

| Option | Description |
|--------|-------------|
| `[agent]` | Agent name — push only this agent (hot-reloaded, no restart) |
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment with `[server]` config |
| `--dry-run` | Show what would be synced without making changes |
| `--no-creds` | Skip credential sync |
| `--creds-only` | Sync only credentials (skip project files) |
| `--files-only` | Sync only project files (skip credentials) |
| `-a, --all` | Sync project files, credentials, and restart service |
| `--force-install` | Force `npm install` even if dependencies appear unchanged |

## Environment Commands

### `al env init <name>`

Create a new environment configuration file at `~/.action-llama/environments/<name>.toml`.

```bash
al env init production --type server
```

| Option | Description |
|--------|-------------|
| `--type <type>` | Environment type: `server` |

### `al env list`

List all configured environments.

```bash
al env list
```

### `al env show <name>`

Display the contents of an environment configuration file.

```bash
al env show production
```

### `al env set [name]`

Bind the current project to an environment by writing the environment name to `.env.toml`. Omit the name to unbind.

```bash
al env set production  # Bind project to "production"
al env set             # Unbind project from any environment
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al env check <name>`

Verify that an environment is provisioned and configured correctly. Checks SSH connectivity, Docker availability, and server readiness.

```bash
al env check production
```

### `al env prov [name]`

Provision a new VPS and save it as an environment. Supports Vultr and Hetzner. If the name is omitted, you'll be prompted for one. See [Deploying to a VPS](/guides/deploying-to-vps) for a walkthrough.

```bash
al env prov production
```

### `al env deprov <name>`

Tear down a provisioned environment. Stops containers, cleans up remote credentials, optionally deletes DNS records, and optionally deletes the VPS instance if it was provisioned via `al env prov`.

```bash
al env deprov staging
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al env logs [name]`

View server system logs (systemd journal) via SSH. If the name is omitted, uses the project's bound environment.

```bash
al env logs production
al env logs production -n 200  # Last 200 lines
al env logs production -f      # Follow mode
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-n, --lines <N>` | Number of log lines (default: 50) |
| `-f, --follow` | Tail mode — watch for new entries |

## Credential Commands

### `al creds ls`

Lists all stored credentials grouped by type, showing field names but not values.

```bash
al creds ls
```

### `al creds add <ref>`

Add or update a credential. Runs the interactive prompter with validation for the credential type.

```bash
al creds add github_token  # default instance
al creds add github_webhook_secret:myapp
al creds add git_ssh:prod
```

The `<ref>` format is `type` or `type:instance`. If no instance is specified, defaults to `default`. If the credential already exists, you'll be prompted to update it.

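The ref convention splits on the first colon and falls back to the `default` instance. A tiny illustrative parser (an assumption about the convention's edge cases, not the CLI's actual code):

```javascript
// Parse a credential ref: "type" or "type:instance" (instance defaults to "default").
function parseCredRef(ref) {
  const i = ref.indexOf(":");
  if (i === -1) return { type: ref, instance: "default" };
  return { type: ref.slice(0, i), instance: ref.slice(i + 1) };
}

console.log(parseCredRef("github_token"));                // instance "default"
console.log(parseCredRef("github_webhook_secret:myapp")); // instance "myapp"
```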
### `al creds rm <ref>`

Remove a credential from disk.

```bash
al creds rm github_token  # default instance
al creds rm github_webhook_secret:myapp
```

### `al creds types`

Browse available credential types interactively. Presents a searchable list of all built-in credential types. On selection, shows the credential's fields, environment variables, and agent context, then offers to add it immediately.

```bash
al creds types
```

## Agent Commands

### `al agent new`

Interactive wizard to create a new agent from a template. Prompts for agent type (dev, reviewer, devops, or custom), agent name, and then runs `al agent config` to configure the new agent.

```bash
al agent new
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al agent config <name>`

Interactively configure an existing agent. Opens a menu to edit each section of the agent's `config.toml`: credentials, model, schedule, webhooks, and params. Saves changes to `agents/<name>/config.toml` and runs `al doctor` on completion to validate the configuration.

```bash
al agent config dev
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

## Skill Management

### `al add <repo>`

Install a skill from a git repository. Clones the repo, discovers `SKILL.md` files (at the root or under `skills/*/`), copies the skill into `agents/<name>/`, creates a `config.toml` with a `source` field pointing back to the repo, and runs `al config` for interactive setup.

Accepts GitHub shorthand (`author/repo`) or a full git URL.

```bash
al add acme/dev-skills
al add acme/dev-skills --skill reviewer
al add https://github.com/acme/dev-skills.git
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-s, --skill <name>` | Skill name (if the repo contains multiple skills) |

### `al config <name>`

Shortcut for `al agent config <name>`. Interactively configure an agent's runtime settings in `config.toml`.

```bash
al config dev
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al update [agent]`

Update installed skills from their source repos. For each agent with a `source` field in its `config.toml`, clones the source repo, compares the upstream `SKILL.md` with the local copy, and prompts to accept changes. Only updates `SKILL.md` — `config.toml` is never touched.

```bash
al update      # Check all agents
al update dev  # Check only the dev agent
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

## Webhook Commands

### `al webhook replay <fixture>`

Load a webhook fixture file (JSON) and test which agents would match. Useful for debugging webhook configurations without sending real webhooks. The fixture file must have `headers` and `body` properties.

```bash
al webhook replay test/fixtures/github-issue.json
al webhook replay payload.json --source my-github
al webhook replay payload.json --run  # Interactively run a matched agent
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-r, --run` | Interactively run a matched agent after simulation |
| `-s, --source <name>` | Webhook source name from `config.toml` (auto-detected from headers if omitted) |

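A fixture is just a captured request. For a GitHub issues event it might look like the following (hypothetical and heavily trimmed; real GitHub payloads carry many more fields):

```json
{
  "headers": {
    "x-github-event": "issues",
    "x-hub-signature-256": "sha256=..."
  },
  "body": {
    "action": "labeled",
    "label": { "name": "agent" },
    "repository": { "full_name": "your-org/your-repo" }
  }
}
```
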
## MCP Commands

### `al mcp serve`

Starts the MCP stdio server for Claude Code integration. Claude Code spawns this as a subprocess — you don't typically run it directly. See [Claude Integration](/integrations/claude) for setup and usage.

```bash
al mcp serve
al mcp serve -p ./my-project
al mcp serve -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

### `al mcp init`

Writes or updates `.mcp.json` in the project root so Claude Code auto-discovers the MCP server.

```bash
al mcp init
al mcp init -p ./my-project
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

## Global Options

These options are available on most commands:

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name (also `AL_ENV` env var or `environment` field in `.env.toml`) |

---

# Project Configuration

The project-level `config.toml` lives at the root of your Action Llama project. All sections and fields are optional — sensible defaults are used for anything you omit. If the file doesn't exist at all, an empty config is assumed.

## Full Annotated Example

```toml
# Scheduler settings (top-level keys; in TOML these must appear before any [section] header)
resourceLockTimeout = 1800 # Lock TTL in seconds (default: 1800 / 30 minutes)
maxReruns = 10             # Max consecutive reruns for successful agent runs (default: 10)
maxCallDepth = 3           # Max depth for agent-to-agent call chains (default: 3)
workQueueSize = 100        # Max queued work items (webhooks + calls) per agent (default: 100)
scale = 10                 # Project-wide max concurrent runners across all agents (default: unlimited)
historyRetentionDays = 14  # Days to retain run history and webhook receipts (default: 14)

# Named models — define once, reference by name in SKILL.md
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"

[models.haiku]
provider = "anthropic"
model = "claude-haiku-4-5-20251001"
authType = "api_key"

[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"

# Local Docker container settings
[local]
image = "al-agent:latest" # Base image name (default: "al-agent:latest")
memory = "4g"             # Memory limit per container (default: "4g")
cpus = 2                  # CPU limit per container (default: 2)
timeout = 900             # Default max container runtime in seconds (default: 900, overridable per-agent)

# Gateway HTTP server settings
[gateway]
port = 8080 # Gateway port (default: 8080)

# Webhook sources — named webhook endpoints with provider type and credential
[webhooks.my-github]
type = "github"
credential = "MyOrg" # credential instance for HMAC validation

# Telemetry settings
[telemetry]
enabled = true
provider = "otel"
endpoint = "https://telemetry.example.com/v1"
serviceName = "action-llama"
samplingRate = 0.5
```

## Field Reference

### Top-level fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxReruns` | number | `10` | Maximum consecutive reruns when an agent requests a rerun via `al-rerun` before stopping |
| `maxCallDepth` | number | `3` | Maximum depth for agent-to-agent call chains (A calls B calls C = depth 2) |
| `workQueueSize` | number | `100` | Maximum queued work items (webhook events + agent calls) per agent when all runners are busy. Can be overridden per-agent with `maxWorkQueueSize` in the agent's `config.toml`. |
| `scale` | number | _(unlimited)_ | Project-wide cap on total concurrent runners across all agents |
| `resourceLockTimeout` | number | `1800` | Default lock TTL in seconds. Locks expire automatically after this duration unless refreshed via heartbeat. See [Resource Locks](/concepts/resource-locks). |
| `historyRetentionDays` | number | `14` | Number of days to retain run history and webhook receipts in the local SQLite stats database. Older entries are pruned automatically. |

### `[models.<name>]` — Named Models

Define models once in `config.toml`, then reference them by name in each agent's `SKILL.md` frontmatter. Agents list model names in priority order — the first is the primary model, and the rest are fallbacks tried automatically when the primary is rate-limited or unavailable.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `provider` | string | Yes | LLM provider: `"anthropic"`, `"openai"`, `"groq"`, `"google"`, `"xai"`, `"mistral"`, `"openrouter"`, or `"custom"` |
| `model` | string | Yes | Model ID (e.g. `"claude-sonnet-4-20250514"`, `"gpt-4o"`, `"gemini-2.0-flash-exp"`) |
| `authType` | string | Yes | Auth method: `"api_key"`, `"oauth_token"`, or `"pi_auth"` |
| `thinkingLevel` | string | No | Thinking budget: `"off"`, `"minimal"`, `"low"`, `"medium"`, `"high"`, `"xhigh"`. Only applies to Anthropic models with reasoning support. Ignored for other providers. |

```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"

[models.haiku]
provider = "anthropic"
model = "claude-haiku-4-5-20251001"
authType = "api_key"

[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"
```

Agents reference these by name in their `config.toml`:

```toml
# agents/<name>/config.toml
models = ["sonnet", "haiku"]
```

819
+ See [Models](/reference/models) for all supported providers, model IDs, auth types, and thinking levels.
820
+
821
+ ### `[local]` — Docker Container Settings
822
+
823
+ Controls local Docker container isolation. These settings apply only to agents using the default container runtime — they are ignored for agents using the [host-user runtime](/reference/agent-config#runtime).
824
+
825
+ | Field | Type | Default | Description |
826
+ |-------|------|---------|-------------|
827
+ | `image` | string | `"al-agent:latest"` | Base Docker image name |
828
+ | `memory` | string | `"4g"` | Memory limit per container (e.g. `"4g"`, `"8g"`) |
829
+ | `cpus` | number | `2` | CPU limit per container |
830
+ | `timeout` | number | `900` | Default max container runtime in seconds. Individual agents can override this with `timeout` in their `config.toml`. See [agent timeout](/reference/agent-config#timeout). |
831
+
832
+ ### `[gateway]` — HTTP Server
833
+
834
+ The gateway starts automatically when Docker mode or webhooks are enabled. It handles health checks, webhook reception, credential serving (local Docker only), resource locking, and the shutdown kill switch.
835
+
836
+ | Field | Type | Default | Description |
837
+ |-------|------|---------|-------------|
838
+ | `port` | number | `8080` | Port for the gateway HTTP server |
839
+
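For example, to move the gateway off the default port:

```toml
# config.toml
[gateway]
port = 9090
```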
840
+ ### `[webhooks.*]` — Webhook Sources
841
+
842
+ Named webhook sources that agents can reference in their webhook triggers. Each source defines a provider type and an optional credential for signature validation.
843
+
844
+ | Field | Type | Required | Description |
845
+ |-------|------|----------|-------------|
846
+ | `type` | string | Yes | Provider type: `"github"`, `"sentry"`, `"linear"`, or `"mintlify"` |
847
+ | `credential` | string | No | Credential instance name for HMAC signature validation (e.g. `"MyOrg"` maps to `github_webhook_secret:MyOrg`). Omit for unsigned webhooks. |
848
+
849
+ ```toml
850
+ [webhooks.my-github]
851
+ type = "github"
852
+ credential = "MyOrg" # uses github_webhook_secret:MyOrg for HMAC validation
853
+
854
+ [webhooks.my-sentry]
855
+ type = "sentry"
856
+ credential = "SentryProd" # uses sentry_client_secret:SentryProd
857
+
858
+ [webhooks.my-linear]
859
+ type = "linear"
860
+ credential = "LinearMain" # uses linear_webhook_secret:LinearMain
861
+
862
+ [webhooks.my-mintlify]
863
+ type = "mintlify"
864
+ credential = "MintlifyMain" # uses mintlify_webhook_secret:MintlifyMain
865
+
866
+ [webhooks.unsigned-github]
867
+ type = "github" # no credential — accepts unsigned webhooks
868
+ ```
869
+
870
+ Agents reference these sources by name in their `config.toml`:
871
+
872
+ ```toml
873
+ # agents/<name>/config.toml
874
+ [[webhooks]]
875
+ source = "my-github"
876
+ events = ["issues"]
877
+ ```
878
+
879
+ See [Webhooks](/reference/webhooks) for setup instructions and filter fields per provider.
880
+
881
+ ### `[telemetry]` — Observability
882
+
883
+ Optional OpenTelemetry integration.
884
+
885
+ | Field | Type | Default | Description |
886
+ |-------|------|---------|-------------|
887
+ | `enabled` | boolean | `false` | Enable or disable telemetry collection |
888
+ | `provider` | string | `"none"` | Telemetry provider: `"otel"` or `"none"` |
889
+ | `endpoint` | string | — | OpenTelemetry collector endpoint URL (required when `provider = "otel"`) |
890
+ | `serviceName` | string | — | Service name reported to the collector |
891
+ | `headers` | table | — | Additional HTTP headers sent with telemetry requests (e.g. auth tokens) |
892
+ | `samplingRate` | number | — | Sampling rate between `0.0` (none) and `1.0` (all traces) |
893
+
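A sketch of an OTel-enabled configuration. The endpoint, service name, and auth header are placeholders for your own collector setup:

```toml
# config.toml
[telemetry]
enabled = true
provider = "otel"
endpoint = "https://otel-collector.example.com:4318"
serviceName = "action-llama"
samplingRate = 0.25

[telemetry.headers]
Authorization = "Bearer YOUR_TOKEN"
```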
894
+ ## Minimal Examples
895
+
896
+ ### Anthropic with Docker (typical dev setup)
897
+
898
+ ```toml
899
+ [models.sonnet]
900
+ provider = "anthropic"
901
+ model = "claude-sonnet-4-20250514"
902
+ thinkingLevel = "medium"
903
+ authType = "api_key"
904
+ ```
905
+
906
+ Everything else uses defaults: Docker enabled, 4GB memory, 2 CPUs, 15min timeout, gateway on port 8080.
907
+
908
+ ### VPS production (environment file)
909
+
910
+ Server configuration lives in an environment file (`~/.action-llama/environments/<name>.toml`), not in `config.toml`. See [VPS Deployment](/concepts/vps-deployment) for full setup.
911
+
912
+ ```toml
913
+ # ~/.action-llama/environments/production.toml
914
+ [server]
915
+ host = "5.6.7.8"
916
+ user = "root"
917
+ keyPath = "~/.ssh/id_rsa"
918
+ basePath = "/opt/action-llama"
919
+ expose = true
920
+ ```
921
+
922
+ ### Cloud Run Jobs runtime (environment file)
923
+
924
+ To run agents as Cloud Run Jobs instead of local Docker containers, add a `[cloud]` section to your environment file. The scheduler still runs wherever you host it; only agent execution is offloaded to GCP.
925
+
926
+ ```toml
927
+ # ~/.action-llama/environments/production.toml
928
+ [cloud]
929
+ provider = "cloud-run"
930
+ project = "my-gcp-project"
931
+ region = "us-central1"
932
+ artifact_registry = "al-agents"
933
+ # Optional: service account email for job execution identity
934
+ # service_account = "al-agent-runner@my-gcp-project.iam.gserviceaccount.com"
935
+
936
+ [gateway]
937
+ url = "https://your-gateway.example.com" # Must be publicly reachable
938
+ ```
939
+
940
+ See [Running Agents on Cloud Run Jobs](/guides/cloud-run-runtime) for full setup instructions.
941
+
942
+ ---
943
+
944
+ # Scheduler
945
+
946
+ The scheduler is the heart of Action Llama. It discovers agents, fires cron triggers, dispatches webhook events, and manages runner pools.
947
+
948
+ ## Architecture
949
+
950
+ ```
951
+ ┌──────────────────────────────────────────────┐
952
+ │ Scheduler │
953
+ │ Discovers agents, fires cron triggers, │
954
+ │ manages runner pool and work queue │
955
+ ├──────────────────────────────────────────────┤
956
+ │ Gateway │
957
+ │ HTTP server: webhooks, resource locks, │
958
+ │ dashboard, agent signals, control API │
959
+ ├───────────┬───────────┬──────────────────────┤
960
+ │ Container │ Container │ Host-User Process │
961
+ │ Agent A │ Agent A │ Agent B │
962
+ │ (run 1) │ (run 2) │ (run 1) │
963
+ └───────────┴───────────┴──────────────────────┘
964
+ ```
965
+
966
+ - **Scheduler** — discovers agents by scanning for directories with `SKILL.md`. Registers cron jobs and webhook triggers. Manages a pool of runners per agent (configurable via `scale`) and a durable work queue for buffering events when runners are busy.
967
+ - **Gateway** — HTTP server that receives webhooks from external services (GitHub, Sentry, Linear, Mintlify), serves the web dashboard, handles resource locking, and processes agent signals.
968
+ - **Agent Processes** — each agent run is an isolated process — either a Docker container or a host-user process (via `sudo -u`) — with its own credentials and environment variables. Processes are ephemeral — they start, do their work, and are cleaned up.
969
+
970
+ ## Agent Discovery
971
+
972
+ The scheduler scans the project directory for subdirectories containing a `SKILL.md` file. Each discovered directory becomes an agent. The directory name is the agent name.
973
+
974
+ No registration step is needed. Add a new agent directory, restart the scheduler, and it picks it up automatically.
975
+
976
+ ## Cron Scheduling
977
+
978
+ Agents with a `schedule` field in their `SKILL.md` frontmatter are registered as cron jobs. When a cron tick fires:
979
+
980
+ - If a runner is available, the agent starts immediately
981
+ - If all runners are busy, the scheduled run is **skipped** with a warning (cron runs are not queued)
982
+
983
+ This means cron is best-effort — if an agent is still running from the previous tick, the new tick is dropped.
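For example, an agent that should run every 15 minutes carries a `schedule` field with a standard cron expression in its `SKILL.md` frontmatter (shown here as YAML frontmatter; the expression is illustrative):

```yaml
---
schedule: "*/15 * * * *"
---
```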
984
+
985
+ ## Webhook Dispatch
986
+
987
+ When the gateway receives a webhook:
988
+
989
+ 1. **Signature verification** — the payload is verified against the credential secret for that webhook source
990
+ 2. **Event parsing** — the raw payload is parsed into a `WebhookContext` (source, event, action, repo, etc.)
991
+ 3. **Filter matching** — the context is matched against each agent's webhook trigger filters
992
+ 4. **Runner dispatch** — if a runner is available, the agent starts. If all runners are busy, the event is **queued** in the work queue.
993
+
994
+ Unlike cron, webhook events are queued (not dropped) when runners are busy.
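The four steps above can be sketched as follows. This is a simplified illustration, not Action Llama's actual implementation: the `WebhookContext` and filter shapes are assumptions based on this section, and signature verification is shown GitHub-style (HMAC-SHA256 over the raw payload).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Step 1: verify a GitHub-style HMAC-SHA256 signature header
export function verifySignature(payload: string, secret: string, header: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(payload).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(header);
  // Length check first: timingSafeEqual throws on unequal lengths
  return a.length === b.length && timingSafeEqual(a, b);
}

// Steps 2-4: parse into a context, match agent filters, then start or queue
export interface WebhookContext { source: string; event: string; action?: string }
export interface AgentPool {
  name: string;
  filters: { source: string; events: string[] }[];
  runnerAvailable: boolean;
}

export function dispatch(
  ctx: WebhookContext,
  pools: AgentPool[],
  queue: { agent: string; ctx: WebhookContext }[],
): string[] {
  const started: string[] = [];
  for (const pool of pools) {
    const matched = pool.filters.some(
      (f) => f.source === ctx.source && f.events.includes(ctx.event),
    );
    if (!matched) continue;
    if (pool.runnerAvailable) started.push(pool.name); // free runner: start immediately
    else queue.push({ agent: pool.name, ctx });        // busy: queue, never drop
  }
  return started;
}
```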
995
+
996
+ ## Runner Pools
997
+
998
+ Each agent has its own pool of runners. The pool size is controlled by the `scale` field in `SKILL.md` frontmatter (default: 1).
999
+
1000
+ - `scale = 1` — only one instance can run at a time (default)
1001
+ - `scale = N` — up to N instances can run concurrently
1002
+ - `scale = 0` — agent is disabled (no runners, no cron, no webhooks)
1003
+
1004
+ The project-wide `scale` field in `config.toml` sets a cap on total concurrent runners across all agents.
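Like `schedule`, the per-agent `scale` field lives in the agent's `SKILL.md` frontmatter (shown here as YAML frontmatter; the value is illustrative):

```yaml
---
scale: 3   # allow up to 3 concurrent runs of this agent
---
```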
1005
+
1006
+ ## Work Queue
1007
+
1008
+ When a webhook event or agent call arrives but all runners are busy, the event is placed in a **work queue**. Items are dequeued and executed as runners become available.
1009
+
1010
+ - Backed by SQLite (`.al/work-queue.db`) — survives scheduler restarts
1011
+ - Per-agent queues
1012
+ - Configurable size: `workQueueSize` (default: 100)
1013
+ - When full, oldest items are dropped
1014
+ - Queue depth visible in `al stat` output
1015
+
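The per-agent override goes in the agent's own `config.toml`:

```toml
# agents/<name>/config.toml
maxWorkQueueSize = 250   # override the project-wide workQueueSize for this agent
```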
1016
+ ## Reruns
1017
+
1018
+ When a scheduled agent calls `al-rerun`, the scheduler immediately starts a new run. This continues until:
1019
+
1020
+ - The agent completes without calling `al-rerun` (no more work)
1021
+ - The agent hits an error
1022
+ - The `maxReruns` limit is reached (default: 10)
1023
+
1024
+ This lets agents drain their work queue efficiently without waiting for the next cron tick. Only scheduled runs can rerun — webhook and call runs do not.
1025
+
1026
+ ## Agent Calls
1027
+
1028
+ Agents can call other agents via `al-subagent`. The scheduler routes the call to the target agent's runner pool:
1029
+
1030
+ - If a runner is available, the called agent starts immediately
1031
+ - If all runners are busy, the call is queued in the work queue
1032
+ - Self-calls are rejected
1033
+ - Call depth is bounded by `maxCallDepth` (default: 3) to prevent infinite loops
1034
+
1035
+ See [Subagents](/guides/subagents) for a guide on agent-to-agent workflows.
1036
+
1037
+ ## Graceful Shutdown
1038
+
1039
+ When the scheduler receives a stop signal (`al stop` or SIGTERM):
1040
+
1041
+ 1. No new runs are started
1042
+ 2. All pending work queues are cleared
1043
+ 3. In-flight runs continue until they finish
1044
+ 4. Once all runs complete, the process exits
1045
+
1046
+ ## Configuration
1047
+
1048
+ | Setting | Location | Default | Description |
1049
+ |---------|----------|---------|-------------|
1050
+ | `maxReruns` | `config.toml` | `10` | Max consecutive reruns per agent |
1051
+ | `maxCallDepth` | `config.toml` | `3` | Max depth for agent call chains |
1052
+ | `workQueueSize` | `config.toml` | `100` | Max queued items per agent |
1053
+ | `scale` | `config.toml` | _(unlimited)_ | Project-wide max concurrent runners |
1054
+ | `scale` | `SKILL.md` frontmatter | `1` | Per-agent concurrent runner limit |
1055
+ | `gateway.port` | `config.toml` | `8080` | Gateway HTTP port |
1056
+
1057
+ ## Troubleshooting
1058
+
1059
+ ### Agent not running on schedule
1060
+
1061
+ - Verify the cron expression in `SKILL.md` frontmatter is valid
1062
+ - Check if the agent or scheduler is paused: `al stat`
1063
+ - Resume if paused: `al resume` (scheduler) or `al resume <agent>`
1064
+
1065
+ ### Agent keeps re-running
1066
+
1067
+ An agent that calls `al-rerun` will re-run immediately, up to `maxReruns` (default: 10). If it's re-running more than expected, check the agent's `SKILL.md` — it may be calling `al-rerun` even when there's no remaining work.
1068
+
1069
+ ```toml
1070
+ # config.toml — lower the limit if needed
1071
+ maxReruns = 5
1072
+ ```
1073
+
1074
+ ### Agent timing out
1075
+
1076
+ Default timeout is 900 seconds (15 minutes). Increase it in the project's `config.toml` or per-agent in the agent's `config.toml`:
1077
+
1078
+ ```toml
1079
+ # config.toml — project-wide default
1080
+ [local]
1081
+ timeout = 3600 # 1 hour
1082
+ ```
1083
+
1084
+ ```toml
1085
+ # agents/<name>/config.toml — per-agent override
1086
+ timeout = 7200 # 2 hours
1087
+ ```
1088
+
1089
+ ---
1090
+
1091
+ # Dockerfiles
1092
+
1093
+ Agents run in Docker containers built from a layered image system. See [Custom Dockerfiles](/guides/custom-dockerfiles) for a guide on writing and customizing Dockerfiles.
1094
+
1095
+ Agents can also run without Docker using the [host-user runtime](/reference/agent-config#runtime). Dockerfiles and container configuration do not apply to host-user agents.
1096
+
1097
+ ## Agent Dockerfile
1098
+
1099
+ Agents that need extra tools can add a `Dockerfile` to their directory:
1100
+
1101
+ ```
1102
+ my-project/
1103
+ agents/
1104
+ dev/
1105
+ SKILL.md
1106
+ Dockerfile <-- custom image for this agent
1107
+ reviewer/
1108
+ SKILL.md
1109
+ <-- no Dockerfile, uses base image
1110
+ ```
1111
+
1112
+ Agents without a Dockerfile use `al-agent:latest` directly.
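A minimal agent Dockerfile might look like this. The installed packages are illustrative; since the base image is Alpine, use `apk`:

```dockerfile
# agents/dev/Dockerfile
FROM al-agent:latest
RUN apk add --no-cache python3 ripgrep
```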
1113
+
1114
+ ## Project Dockerfile
1115
+
1116
+ Projects can optionally have a `Dockerfile` at the root that defines a shared base image for all agents. When present and customized beyond a bare `FROM al-agent:latest`, the build pipeline creates an intermediate image (`al-project-base:latest`) that all agent images layer on top of. If unmodified or absent, agents build directly on `al-agent:latest`.
1116
+
1117
+ Note that project Dockerfiles make agents harder to reuse: an agent that depends on tools installed in the project base image won't work when shared to another project. Prefer agent-level Dockerfiles so each agent is self-contained and portable.
1119
+
1120
+ ## Image Build Order
1121
+
1122
+ ```
1123
+ al-agent:latest <-- Action Llama package (automatic)
1124
+ |
1125
+ v
1126
+ al-project-base:latest <-- project Dockerfile (if customized)
1127
+ |
1128
+ v
1129
+ al-<agent>:latest <-- per-agent Dockerfile (if present)
1130
+ ```
1131
+
1132
+ If the project Dockerfile is unmodified, the middle layer is skipped.
1133
+
1134
+ ## Base Image Contents
1135
+
1136
+ The base image (`al-agent:latest`) is built automatically from the Action Llama package and includes:
1137
+
1138
+ | Package | Why |
1139
+ |---------|-----|
1140
+ | `node:20-alpine` | Runs the container entry point and pi-coding-agent SDK |
1141
+ | `git` | Clone repos, create branches, push commits |
1142
+ | `curl` | API calls (Sentry, arbitrary HTTP), anti-exfiltration shutdown |
1143
+ | `ca-certificates` | HTTPS for git, curl, npm |
1144
+ | `openssh-client` | SSH for `GIT_SSH_COMMAND` — git clone/push over SSH |
1145
+
1146
+ The base image also copies the compiled Action Llama application (`dist/`) and installs its npm dependencies. The entry point is `node /app/dist/agents/container-entry.js`.
1147
+
1148
+ ## Build Behavior
1149
+
1150
+ - The base image (`al-agent:latest`) is only built if it doesn't exist yet
1151
+ - The project base image (`al-project-base:latest`) is rebuilt on every `al start` if the project Dockerfile has customizations
1152
+ - Agent images are named `al-<agent-name>:latest` (e.g. `al-dev:latest`) and are rebuilt on every `al start` to pick up Dockerfile changes
1153
+ - The build context is the Action Llama package root (not the project directory), so `COPY` paths reference the package's `dist/`, `package.json`, etc.
1154
+ - The `FROM` line in agent Dockerfiles is automatically rewritten to point at the correct base image
1155
+
1156
+ ## Container Filesystem Layout
1157
+
1158
+ | Path | Mode | Contents |
1159
+ |------|------|----------|
1160
+ | `/app` | read-only | Action Llama application + node_modules |
1161
+ | `/credentials` | read-only | Mounted credential files (`/<type>/<instance>/<field>`) |
1162
+ | `/tmp` | read-write (tmpfs, 2GB) | Agent working directory — repos, scratch files, SSH keys |
1163
+ | `/workspace` | read-write (2GB) | Persistent workspace |
1164
+ | `/home/node` | read-write (64MB) | Home directory |
1165
+
1166
+ ## Configuration
1167
+
1168
+ | Key | Default | Description |
1169
+ |-----|---------|-------------|
1170
+ | `local.image` | `"al-agent:latest"` | Base Docker image name |
1171
+ | `local.memory` | `"4g"` | Memory limit per container |
1172
+ | `local.cpus` | `2` | CPU limit per container |
1173
+ | `local.timeout` | `900` | Max container runtime in seconds |
1174
+
1175
+ ## Troubleshooting
1176
+
1177
+ **"Docker is not running"** — Start Docker Desktop or the Docker daemon before running `al start`.
1178
+
1179
+ ```bash
1180
+ # macOS — open Docker Desktop
1181
+ open -a Docker
1182
+
1183
+ # Linux
1184
+ sudo systemctl start docker
1185
+ ```
1186
+
1187
+ **Base image build fails** — Run `docker build -t al-agent:latest -f docker/Dockerfile .` from the Action Llama package directory to see the full build output.
1188
+
1189
+ **Project base image build fails** — Check that the project `Dockerfile` starts with `FROM al-agent:latest` and that any `apk add` packages are spelled correctly. The base image uses Alpine Linux.
1190
+
1191
+ **Agent image build fails** — Check that your agent's `Dockerfile` starts with `FROM al-agent:latest` (the build pipeline rewrites this to the correct base) and that any package install commands use the right package manager for the base image: `apk add` on Alpine (the default) or `apt-get` on Debian-based images.
1194
+
1195
+ **Container out of memory** — Increase the memory limit in `config.toml`:
1196
+
1197
+ ```toml
1198
+ [local]
1199
+ memory = "8g" # default: "4g"
1200
+ ```
1201
+
1202
+ **Container exits immediately** — Check `al logs <agent>` for the error. Common causes: missing credentials, missing `SKILL.md`, invalid model config.
1203
+
1204
+ ---
1205
+
1206
+ # Deploying to a VPS
1207
+
1208
+ This guide walks you through deploying your Action Llama project to a VPS for 24/7 agent operation. We'll use `al push` (the recommended SSH push deploy approach).
1209
+
1210
+ ## Overview
1211
+
1212
+ Once deployed, your agents run continuously on the server — cron jobs fire on schedule, webhooks are publicly reachable, and the scheduler restarts automatically if it crashes.
1213
+
1214
+ ### Prerequisites
1215
+
1216
+ - **[Cloudflare](https://www.cloudflare.com/)** account
1217
+ - DNS for a domain managed on Cloudflare (so you can point it at your AL server)
1218
+ - Cloudflare API token with Zone editing permissions:
1219
+ - *Zone Settings*
1220
+ - *SSL and Certificates*
1221
+ - *DNS*
1222
+ - **[Hetzner](https://hetzner.com)** or **[Vultr](https://www.vultr.com/)** account and corresponding API keys
1223
+
1224
+ You should already have a domain whose DNS is hosted on Cloudflare. This is **strongly** recommended because it gives you HTTPS without any additional configuration.
1225
+
1226
+ ## 1. Provision a server
1227
+
1228
+ ```bash
1229
+ al env prov myserver
1230
+ ```
1231
+
1232
+ The interactive wizard guides you through:
1233
+
1234
+ 1. **Choose a provider** — Hetzner or Vultr (or connect an existing server)
1235
+ 2. **Pick a plan** — 2 vCPU / 4GB RAM works well for most projects ($5-6/month)
1236
+ 3. **Pick a region** — choose one close to your webhook sources
1237
+ 4. **SSH key** — generate a new key or use an existing one
1238
+ 5. **HTTPS (optional)** — set up TLS via Cloudflare
1239
+
1240
+ ### TLS with Cloudflare (recommended)
1241
+
1242
+ If you choose HTTPS, you'll need the Cloudflare account API token.
1243
+
1244
+ What Action Llama sets up:
1245
+
1246
+ - DNS A record pointing to your VPS (proxied through Cloudflare)
1247
+ - Cloudflare Origin CA certificate on the server
1248
+ - nginx reverse proxy with TLS termination
1249
+ - Cloudflare SSL mode set to Full (Strict)
1250
+
1251
+ For convenience, set the environment as the default:
1252
+
1253
+ ```bash
1254
+ al env set myserver
1255
+ ```
1256
+
1257
+ ## 2. Deploy
1258
+
1259
+ ```bash
1260
+ al push
1261
+ ```
1262
+
1263
+ This syncs your project files and credentials to the server, installs dependencies, and starts the scheduler as a systemd service. The first push takes a minute or two; subsequent pushes are faster (rsync only transfers changes).
1264
+
1265
+ ## 3. Verify
1266
+
1267
+ ```bash
1268
+ al env check myserver    # SSH + Docker + health check
1269
+ al stat -E myserver      # Agent status on the server
1270
+ al env logs myserver -f  # Tail server logs
1271
+ ```
1272
+
1273
+ ## 4. Update a single agent
1274
+
1275
+ After making changes to one agent, push just that agent:
1276
+
1277
+ ```bash
1278
+ al push dev -E myserver
1279
+ ```
1280
+
1281
+ This hot-reloads the agent without restarting the scheduler. The file watcher detects the change and picks up the new config/actions automatically.
1282
+
1283
+ ## 5. Tear down
1284
+
1285
+ ```bash
1286
+ al env deprov myserver
1287
+ ```
1288
+
1289
+ Stops containers, cleans credentials, deletes DNS records, and destroys the VPS instance.
1290
+
1291
+ ## Cost comparison
1292
+
1293
+ | Provider | vCPU | RAM | Storage | Price/month |
1294
+ |----------|------|-----|---------|-------------|
1295
+ | Hetzner | 1 | 2GB | 20GB SSD | ~$4 |
1296
+ | Vultr | 1 | 1GB | 25GB SSD | $6 |
1297
+ | DigitalOcean | 1 | 1GB | 25GB SSD | $6 |
1298
+ | Linode | 1 | 1GB | 25GB SSD | $5 |
1299
+
1300
+ ## Alternative: manual deployment
1301
+
1302
+ If you prefer to manage the server directly:
1303
+
1304
+ ```bash
1305
+ # On your VPS:
1306
+ npm install -g @action-llama/action-llama
1307
+ al new my-project && cd my-project
1308
+ al doctor
1309
+ al start -w --expose --headless
1310
+ ```
1311
+
1312
+ Then set up systemd for automatic restarts. See [VPS Deployment — concepts](/concepts/vps-deployment) for the full systemd unit file.
1313
+
1314
+ ## Next steps
1315
+
1316
+ - [VPS Deployment (concepts)](/concepts/vps-deployment) — understand what happens under the hood
1317
+ - [CLI Commands](/reference/cli-commands) — full `al push` and `al env` reference
1318
+ - [Web Dashboard](/reference/web-dashboard) — monitor your deployed agents in a browser
1319
+
1320
+ ---
1321
+
1322
+ # Running Agents on Cloud Run Jobs
1323
+
1324
+ The **Cloud Run Jobs runtime** lets you run agent containers on [Google Cloud Run Jobs](https://cloud.google.com/run/docs/create-jobs) instead of a local Docker daemon or VPS. The scheduler continues to run wherever you host it (local machine, VPS, or CI), while agent execution is offloaded to Cloud Run.
1325
+
1326
+ ## Overview
1327
+
1328
+ - **Agents run as Cloud Run Jobs** — ephemeral, serverless, one job per agent run
1329
+ - **Credentials via Secret Manager** — each credential field is stored as a Secret Manager secret, mounted into the job container at `/credentials/<type>/<instance>/<field>`
1330
+ - **Images via Artifact Registry** — agent images are pushed to Google Artifact Registry; old tags are automatically pruned
1331
+ - **Logs via Cloud Logging** — structured logs are streamed from Cloud Logging to your scheduler
1332
+ - **Public gateway required** — agents need to reach the gateway for registration, locks, and return values; the gateway URL must be publicly accessible
1333
+
1334
+ ## Prerequisites
1335
+
1336
+ - A Google Cloud project with billing enabled
1337
+ - The following APIs enabled:
1338
+ - `run.googleapis.com` (Cloud Run)
1339
+ - `secretmanager.googleapis.com` (Secret Manager)
1340
+ - `artifactregistry.googleapis.com` (Artifact Registry)
1341
+ - `logging.googleapis.com` (Cloud Logging)
1342
+ - A GCP service account with these roles:
1343
+ - `roles/run.admin`
1344
+ - `roles/secretmanager.admin`
1345
+ - `roles/artifactregistry.admin`
1346
+ - `roles/logging.viewer`
1347
+ - An Artifact Registry Docker repository in your project
1348
+
1349
+ ## Setup
1350
+
1351
+ ### 1. Create a service account
1352
+
1353
+ In the GCP console or via `gcloud`:
1354
+
1355
+ ```bash
1356
+ # Create the service account
1357
+ gcloud iam service-accounts create al-agent-runtime \
1358
+ --project=my-project \
1359
+ --display-name="Action Llama Agent Runtime"
1360
+
1361
+ # Grant required roles
1362
+ for ROLE in run.admin secretmanager.admin artifactregistry.admin logging.viewer; do
1363
+ gcloud projects add-iam-policy-binding my-project \
1364
+ --member="serviceAccount:al-agent-runtime@my-project.iam.gserviceaccount.com" \
1365
+ --role="roles/$ROLE"
1366
+ done
1367
+
1368
+ # Create and download a key
1369
+ gcloud iam service-accounts keys create ~/al-agent-runtime-key.json \
1370
+ --iam-account=al-agent-runtime@my-project.iam.gserviceaccount.com
1371
+ ```
1372
+
1373
+ ### 2. Add the credential to Action Llama
1374
+
1375
+ ```bash
1376
+ al cred add gcp_service_account
1377
+ # Paste the contents of ~/al-agent-runtime-key.json when prompted
1378
+ ```
1379
+
1380
+ ### 3. Create an Artifact Registry repository
1381
+
1382
+ ```bash
1383
+ gcloud artifacts repositories create al-agents \
1384
+ --repository-format=docker \
1385
+ --location=us-central1 \
1386
+ --project=my-project
1387
+ ```
1388
+
1389
+ ### 4. Configure your environment
1390
+
1391
+ Add Cloud Run configuration to your environment file (`~/.action-llama/environments/<name>.toml`):
1392
+
1393
+ ```toml
1394
+ [cloud]
1395
+ provider = "cloud-run"
1396
+ project = "my-project"
1397
+ region = "us-central1"
1398
+ artifact_registry = "al-agents"
1399
+ # Optional: service account email for job execution identity
1400
+ # service_account = "al-agent-runner@my-project.iam.gserviceaccount.com"
1401
+ ```
1402
+
1403
+ Also ensure your gateway has a public URL configured:
1404
+
1405
+ ```toml
1406
+ [gateway]
1407
+ url = "https://your-gateway.example.com"
1408
+ ```
1409
+
1410
+ ## How It Works
1411
+
1412
+ ### Credential mounting
1413
+
1414
+ Before each agent run, the scheduler creates ephemeral Secret Manager secrets — one per credential field. Each secret is mounted into the Cloud Run Job container at `/credentials/<type>/<instance>/<field>`, preserving the exact path layout that agents expect.
1415
+
1416
+ After the job completes, the runtime deletes all ephemeral secrets. This is equivalent to the Docker volume mount used by local and VPS runtimes.
1417
+
1418
+ ### Image lifecycle
1419
+
1420
+ When you run `al push` or build an agent image:
1421
+
1422
+ 1. The image is built locally using `docker build`
1423
+ 2. Tagged as `<region>-docker.pkg.dev/<project>/<registry>/<image>:<tag>`
1424
+ 3. Pushed to Artifact Registry
1425
+ 4. Old tags are automatically pruned — only the 3 most recent tags per image are kept
1426
+
1427
+ To avoid unbounded storage costs, we recommend also setting up [Artifact Registry cleanup policies](https://cloud.google.com/artifact-registry/docs/repositories/cleanup-policy) as an additional safeguard.
1428
+
1429
+ ### Job execution
1430
+
1431
+ Each agent run:
1432
+
1433
+ 1. Creates a Cloud Run Job (`al-<agentName>-<runId>`)
1434
+ 2. Runs the job with `maxRetries: 0` (one-shot, no automatic retries)
1435
+ 3. Configures a 1-hour default timeout (configurable via `timeout` in agent config)
1436
+ 4. Streams logs from Cloud Logging (with ~5–10s ingestion latency)
1437
+ 5. Polls for completion every 5 seconds
1438
+ 6. Deletes the job and its associated secrets after completion
1439
+
1440
+ ### Orphan recovery
1441
+
1442
+ Cloud Run Jobs are ephemeral. If the scheduler restarts, it can discover running jobs via `listRunningAgents()`. However, because Cloud Run Jobs don't expose container environment variables via an inspect API, orphaned jobs are **killed** rather than re-adopted. This is acceptable for ephemeral workloads.
1443
+
1444
+ ## Cost considerations
1445
+
1446
+ | Resource | Cost |
1447
+ |----------|------|
1448
+ | Cloud Run Jobs | ~$0.00002400 per vCPU-second, ~$0.00000250 per GiB-second |
1449
+ | Secret Manager | $0.06/10,000 API operations; $0.06/active secret version/month |
1450
+ | Artifact Registry | ~$0.10/GB/month for stored images |
1451
+ | Cloud Logging | First 50 GiB/month free; $0.01/GiB after |
1452
+
1453
+ For a typical agent run (2 vCPU, 2 GiB RAM, 5 minutes): ~$0.015 in Cloud Run compute.
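That estimate follows directly from the unit prices in the table, treating the run as 300 billable seconds:

```python
# Reproduce the Cloud Run compute estimate: 2 vCPU, 2 GiB RAM, 5 minutes
VCPU_PER_SEC = 0.0000240  # $/vCPU-second
GIB_PER_SEC = 0.0000025   # $/GiB-second

seconds = 5 * 60
cost = 2 * seconds * VCPU_PER_SEC + 2 * seconds * GIB_PER_SEC
print(f"${cost:.4f}")  # roughly a cent and a half of compute
```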
1454
+
1455
+ ## Limitations
1456
+
1457
+ - **Agents require a public gateway URL** — Cloud Run Jobs run in Google's infrastructure and can't reach a purely local scheduler. Configure `gateway.url` to point to a publicly accessible gateway.
1458
+ - **No real-time log streaming** — Cloud Logging has 5–10s ingestion latency; logs are polled every 3 seconds.
1459
+ - **No container inspect** — orphaned jobs are killed, not re-adopted.
1460
+ - **Image builds are local** — the `docker build` step runs where the scheduler runs (your machine or VPS). The built image is then pushed to Artifact Registry.
1461
+ - **Secret Manager quotas** — each credential field creates a Secret Manager secret. With many credentials and frequent runs, you may hit the default quota of 9,000 write operations per minute. Request a quota increase if needed.
1462
+
1463
+ ## Troubleshooting
1464
+
1465
+ **Agents can't reach the gateway**
1466
+
1467
+ Ensure `gateway.url` in your config points to a publicly reachable URL. The agent container runs in Google Cloud, not on your local network.
1468
+
1469
+ **Secret Manager permission denied**
1470
+
1471
+ The service account needs `roles/secretmanager.admin`. If you're using a dedicated execution service account (via `service_account` in config), that account also needs `roles/secretmanager.secretAccessor`.
1472
+
1473
+ **Artifact Registry authentication fails**
1474
+
1475
+ Ensure Docker is configured to authenticate with Artifact Registry:
1476
+ ```bash
1477
+ gcloud auth configure-docker us-central1-docker.pkg.dev
1478
+ ```
1479
+
1480
+ **Cloud Run Job creation fails with quota error**
1481
+
1482
+ Check your Cloud Run quotas in the GCP console. The default job limit per region is 1000. Request an increase if needed.
1483
+
1484
+ **Logs appear delayed**
1485
+
1486
+ Cloud Logging has 5–10s ingestion latency. This is expected. For debugging, check logs directly in the GCP console at:
1487
+ `https://console.cloud.google.com/run/jobs/details/<region>/<jobId>/executions?project=<project>`
1488
+
1489
+ ---
1490
+
1491
+ # Continuous Deployment
1492
+
1493
+ This guide sets up a CI pipeline that automatically deploys your Action Llama project whenever changes land on your main branch.
1494
+
1495
+ ## Overview
1496
+
1497
+ The pipeline works like this:
1498
+
1499
+ 1. Code is pushed to `main` (or a dependency updates)
1500
+ 2. GitHub Actions runs `npm install` and `al push --headless --no-creds`
1501
+ 3. Your server receives the updated project files and restarts the scheduler
1502
+
1503
+ Credentials are managed separately — the CI workflow only deploys code and agent configs, not secrets.
1504
+
1505
+ ### Prerequisites
1506
+
1507
+ - A VPS already provisioned and working with `al push` (see [Deploying to a VPS](/guides/deploying-to-vps))
1508
+ - Credentials already on the server (pushed once via `al push` locally, or managed separately)
1509
+ - Your project in a GitHub repository
1510
+
1511
+ ## 1. Set up GitHub secrets
1512
+
1513
+ You need two secrets in your GitHub repository (Settings > Secrets and variables > Actions):
1514
+
1515
+ | Secret | Contents |
1516
+ |--------|----------|
1517
+ | `DEPLOY_SSH_KEY` | SSH private key for the server (the same key used by `al push`) |
1518
+ | `DEPLOY_ENV_TOML` | Your environment TOML file contents |
1519
+
1520
+ ### Getting the environment TOML
1521
+
1522
+ Copy the contents of your environment file — this is the file at `~/.action-llama/environments/<name>.toml` on your local machine. It should look something like:
1523
+
1524
+ ```toml
1525
+ [server]
1526
+ host = "203.0.113.42"
1527
+ user = "root"
1528
+ keyPath = "~/.ssh/deploy_key"
1529
+ ```
1530
+
1531
+ Set `keyPath` to `~/.ssh/deploy_key` — this is where the CI workflow will write the SSH key.
1532
+
1533
+ ### Getting the SSH key
1534
+
1535
+ This is the private key that `al push` uses to connect to your server. If you provisioned with `al env prov`, it was generated automatically and stored in the credential system. Copy it from:
1536
+
1537
+ ```bash
1538
+ cat ~/.action-llama/credentials/vps_ssh/default/private_key
1539
+ ```
1540
+
1541
+ ## 2. Create the deploy workflow
1542
+
1543
+ Add this file to your project repository:
1544
+
1545
+ ```yaml
1546
+ # .github/workflows/deploy.yml
1547
+ name: Deploy
1548
+
1549
+ on:
1550
+ push:
1551
+ branches: [main]
1552
+ workflow_dispatch:
1553
+
1554
+ concurrency:
1555
+ group: deploy
1556
+ cancel-in-progress: false
1557
+
1558
+ jobs:
1559
+ deploy:
1560
+ runs-on: ubuntu-latest
1561
+ timeout-minutes: 10
1562
+ steps:
1563
+ - uses: actions/checkout@v4
1564
+
1565
+ - uses: actions/setup-node@v4
1566
+ with:
1567
+ node-version: 20
1568
+ cache: npm
1569
+
1570
+ - name: Install dependencies
1571
+ run: npm ci
1572
+
1573
+ - name: Set up SSH key
1574
+ run: |
1575
+ mkdir -p ~/.ssh
1576
+ echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/deploy_key
1577
+ chmod 600 ~/.ssh/deploy_key
1578
+
1579
+ - name: Set up environment config
1580
+ run: |
1581
+ mkdir -p ~/.action-llama/environments
1582
+ echo '${{ secrets.DEPLOY_ENV_TOML }}' > ~/.action-llama/environments/prod.toml
1583
+
1584
+ - name: Deploy
1585
+ run: npx al push --env prod --headless --no-creds
1586
+ ```
1587
+
1588
+ This installs your project (including Action Llama), writes the SSH key and environment config to the expected paths, and runs `al push` in headless mode with credential syncing disabled.
1589
+
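+ One caveat with the `echo '…'` form in the environment step: single quotes protect the secret from shell expansion, but the step breaks if the TOML ever contains a single quote itself. A heredoc variant of that step (same paths, same secret name) is more robust, since GitHub substitutes `${{ }}` expressions before the shell runs:
+
+ ```yaml
+ - name: Set up environment config
+   run: |
+     mkdir -p ~/.action-llama/environments
+     cat > ~/.action-llama/environments/prod.toml <<'EOF'
+     ${{ secrets.DEPLOY_ENV_TOML }}
+     EOF
+ ```
+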
1590
+ ## 3. Manage credentials separately
1591
+
1592
+ Since the CI workflow skips credentials (`--no-creds`), you need to push credentials to the server separately. Do this from your local machine:
1593
+
1594
+ ```bash
1595
+ al push --creds-only --env prod
1596
+ ```
1597
+
1598
+ Run this whenever you add or rotate a credential. The server retains credentials across code deploys — `al push --no-creds` only syncs project files.
1599
+
1600
+ ## Cross-repo triggers
1601
+
1602
+ If your agent project depends on a package in another repository (e.g., a shared Action Llama fork), you can trigger deploys automatically when that upstream repo changes.
1603
+
1604
+ ### Using repository dispatch
1605
+
1606
+ In the **upstream** repository's CI workflow, add a step that fires a deploy event after tests pass:
1607
+
1608
+ ```yaml
1609
+ - name: Trigger deploy
1610
+ if: github.ref == 'refs/heads/main'
1611
+ run: |
1612
+ gh api repos/<your-org>/<your-agents-repo>/dispatches \
1613
+ -f event_type=deploy
1614
+ env:
1615
+ GH_TOKEN: ${{ secrets.AGENTS_DEPLOY_TOKEN }}
1616
+ ```
1617
+
1618
+ Then update your deploy workflow to also listen for this event:
1619
+
1620
+ ```yaml
1621
+ on:
1622
+ push:
1623
+ branches: [main]
1624
+ repository_dispatch:
1625
+ types: [deploy]
1626
+ workflow_dispatch:
1627
+ ```
1628
+
1629
+ The `AGENTS_DEPLOY_TOKEN` secret needs to be a GitHub personal access token (or fine-grained token) with `contents: write` permission on the agents repository.
1630
+
1631
+ ### Installing from GitHub instead of npm
1632
+
1633
+ If you want your agents project to always use the latest version from a GitHub repository rather than a published npm package, update your `package.json`:
1634
+
1635
+ ```json
1636
+ {
1637
+ "dependencies": {
1638
+ "@action-llama/action-llama": "github:<your-org>/action-llama#main"
1639
+ }
1640
+ }
1641
+ ```
1642
+
1643
+ When `npm install` runs in CI, it clones the repo, runs the `prepare` script (which builds the TypeScript), and installs the result. Combined with a repository dispatch trigger, this gives you fully automated end-to-end deployment: a merge to the upstream repo triggers a deploy of your agents project with the latest version.
1644
+
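+ For this to work, the upstream fork's `package.json` needs a `prepare` script that performs the build. A hypothetical fragment (the exact build command is an assumption about the fork, not something Action Llama mandates):
+
+ ```json
+ {
+   "scripts": {
+     "build": "tsc -p tsconfig.json",
+     "prepare": "npm run build"
+   }
+ }
+ ```
+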
1645
+ ## Verifying deploys
1646
+
1647
+ After a deploy, you can check the status from your local machine:
1648
+
1649
+ ```bash
1650
+ al stat --env prod # Agent status on the server
1651
+ al logs --env prod -f # Tail server logs
1652
+ ```
1653
+
1654
+ Or check the GitHub Actions run output — `al push` prints deployment progress and a health check result at the end.
1655
+
1656
+ ## Next steps
1657
+
1658
+ - [Deploying to a VPS](/guides/deploying-to-vps) — initial server setup
1659
+ - [CLI Commands](/reference/cli-commands) — full `al push` flag reference
1660
+ - [Credentials](/reference/credentials) — how credentials are stored and synced
1661
+
1662
+ ---
1663
+
1664
+ # Custom Dockerfiles
1665
+
1666
+ Action Llama agents run in Docker containers built from a minimal Alpine-based image with Node.js, git, and curl. Agents that need extra tools can add a `Dockerfile` to their directory.
1667
+
1668
+ Custom Dockerfiles only apply to agents using the default container runtime. Agents configured with the [host-user runtime](/reference/agent-config#runtime) do not use Docker and will ignore any Dockerfile.
1669
+
1670
+ Project-level Dockerfiles are also supported but not recommended — they make agents harder to reuse across projects. See the [Dockerfiles reference](/reference/dockerfiles#project-dockerfile) for details.
1671
+
1672
+ ## Agent Dockerfiles
1673
+
1674
+ To give an agent its own image, place a `Dockerfile` in its directory:
1675
+
1676
+ ```
1677
+ my-project/
1678
+ agents/
1679
+ dev/
1680
+ SKILL.md
1681
+ Dockerfile <-- custom image for this agent
1682
+ reviewer/
1683
+ SKILL.md
1684
+ <-- no Dockerfile, uses base image
1685
+ ```
1686
+
1687
+ Use `FROM al-agent:latest` and add what you need. The build pipeline automatically rewrites the `FROM` line at build time. Switch to `root` to install packages, then back to `node`:
1688
+
1689
+ ```dockerfile
1690
+ FROM al-agent:latest
1691
+
1692
+ USER root
1693
+ RUN apk add --no-cache github-cli
1694
+ USER node
1695
+ ```
1696
+
1697
+ This is a thin layer on top of the base image: it builds quickly and shares most of its layers.
1698
+
1699
+ ## Common additions
1700
+
1701
+ ```dockerfile
1702
+ # GitHub CLI (for gh issue list, gh pr create, etc.)
1703
+ RUN apk add --no-cache github-cli
1704
+
1705
+ # Python (for agents that run Python scripts)
1706
+ RUN apk add --no-cache python3 py3-pip
1707
+
1708
+ # jq (for JSON processing in bash) — already in the base image
1709
+ # RUN apk add --no-cache jq
1710
+ ```
1711
+
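+ Note that these `RUN` lines still need the root/node switch shown above. Wrapped into a complete Dockerfile, a Python-capable agent image might look like this (which tools you install is up to the agent):
+
+ ```dockerfile
+ FROM al-agent:latest
+
+ USER root
+ RUN apk add --no-cache github-cli python3 py3-pip
+ USER node
+ ```
+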
1712
+ ## Writing a standalone Dockerfile
1713
+
1714
+ If you need full control, you can write a Dockerfile from scratch. It must:
1715
+
1716
+ 1. Include Node.js 20+
1717
+ 2. Copy the application code from the base image or install it
1718
+ 3. Set `ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]`
1719
+ 4. Use uid 1000 (`USER node` on node images) for compatibility with the container launcher
1720
+
1721
+ Example standalone Dockerfile:
1722
+
1723
+ ```dockerfile
1724
+ FROM node:20-alpine
1725
+
1726
+ # Install your tools
1727
+ RUN apk add --no-cache git curl ca-certificates openssh-client github-cli jq python3
1728
+
1729
+ # Copy app from the base image (avoids rebuilding from source)
1730
+ COPY --from=al-agent:latest /app /app
1731
+ WORKDIR /app
1732
+
1733
+ USER node
1734
+ ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]
1735
+ ```
1736
+
1737
+ The key requirement is that `/app/dist/agents/container-entry.js` exists and can run. The entry point reads `AGENT_CONFIG`, `PROMPT`, `GATEWAY_URL`, and `SHUTDOWN_SECRET` from environment variables, and credentials from `/credentials/`.
1738
+
1739
+ ## Next steps
1740
+
1741
+ - [Dockerfiles reference](/reference/dockerfiles) — build behavior, image contents, filesystem layout, and configuration
1742
+ - [Scaling Agents](/guides/scaling-agents) — run multiple instances of an agent
1743
+
1744
+ ---
1745
+
1746
+ # Gateway API
1747
+
1748
+ The gateway is the HTTP server that runs alongside the scheduler. It handles webhooks, serves the [web dashboard](/reference/web-dashboard), and exposes control and status APIs used by CLI commands and the dashboard.
1749
+
1750
+ The gateway starts automatically when needed — either when webhooks are configured, when `--web-ui` is passed to `al start`, or when Docker container communication is required. The port is controlled by the `[gateway].port` setting in `config.toml` (default: `8080`).
1751
+
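+ For example, to move the gateway off the default port, add a fragment like this to `config.toml` (the value here is illustrative):
+
+ ```toml
+ [gateway]
+ port = 9090
+ ```
+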
1752
+ ## Authentication
1753
+
1754
+ The gateway API is protected by an API key. The same key is used for both browser sessions and CLI access.
1755
+
1756
+ **Key location:** `~/.action-llama/credentials/gateway_api_key/default/key`
1757
+
1758
+ The key is generated automatically by `al doctor` or on first `al start`. To view or regenerate it, run `al doctor`.
1759
+
1760
+ ### CLI access
1761
+
1762
+ CLI commands (`al stat`, `al pause`, `al resume`, `al kill`) automatically read the API key from the credential store and send it as a `Bearer` token in the `Authorization` header.
1763
+
1764
+ ### Browser access
1765
+
1766
+ The [web dashboard](/reference/web-dashboard) uses cookie-based authentication. After logging in with the API key, an `al_session` cookie is set (HttpOnly, SameSite=Strict) so all subsequent requests — including SSE streams — are authenticated automatically.
1767
+
1768
+ ### Protected routes
1769
+
1770
+ The following routes require authentication:
1771
+
1772
+ - `/dashboard` and `/dashboard/*` — all dashboard pages and SSE streams
1773
+ - `/control/*` — scheduler and agent control endpoints
1774
+ - `/locks/status` — active lock information
1775
+
1776
+ Health checks (`/health`), webhook endpoints (`/webhooks/*`), and container management routes are **not** protected.
1777
+
1778
+ ### Migrating from `AL_DASHBOARD_SECRET`
1779
+
1780
+ The old `AL_DASHBOARD_SECRET` environment variable (HTTP Basic Auth) is no longer used. If it's still set, a deprecation warning is logged. Remove it from your environment and run `al doctor` to set up the new API key.
1781
+
1782
+ ## Control API
1783
+
1784
+ All control endpoints use `POST` and require authentication.
1785
+
1786
+ ### Scheduler control
1787
+
1788
+ | Endpoint | Description |
1789
+ |----------|-------------|
1790
+ | `POST /control/pause` | Pause the scheduler (all cron jobs) |
1791
+ | `POST /control/resume` | Resume the scheduler |
1792
+
1793
+ ### Agent control
1794
+
1795
+ | Endpoint | Description |
1796
+ |----------|-------------|
1797
+ | `POST /control/trigger/<name>` | Trigger an immediate agent run |
1798
+ | `POST /control/agents/<name>/enable` | Enable a disabled agent |
1799
+ | `POST /control/agents/<name>/disable` | Disable an agent (pauses its cron job) |
1800
+ | `POST /control/agents/<name>/pause` | Pause an agent (alias for disable) |
1801
+ | `POST /control/agents/<name>/resume` | Resume an agent (alias for enable) |
1802
+ | `POST /control/agents/<name>/kill` | Kill all running instances of an agent |
1803
+
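+ Since these are plain HTTP endpoints, they can also be called directly with the API key. The request shape (agent name and key are placeholders):
+
+ ```
+ POST /control/trigger/dev HTTP/1.1
+ Authorization: Bearer <api-key>
+ ```
+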
1804
+ ## Status API
1805
+
1806
+ ### SSE streams
1807
+
1808
+ Live updates use **Server-Sent Events (SSE)**:
1809
+
1810
+ | Endpoint | Description |
1811
+ |----------|-------------|
1812
+ | `GET /dashboard/api/status-stream` | Pushes agent status and scheduler info whenever state changes |
1813
+ | `GET /dashboard/api/logs/<agent>/stream` | Streams log lines for a specific agent (500ms poll interval) |
1814
+
1815
+ ### Trigger history
1816
+
1817
+ | Endpoint | Description |
1818
+ |----------|-------------|
1819
+ | `GET /api/stats/triggers` | Paginated trigger history (cron, webhook, agent-call). Supports query params: `page`, `limit`, `deadLetter` (boolean). |
1820
+ | `POST /api/webhooks/<receiptId>/replay` | Re-dispatch a stored webhook payload by receipt ID. Returns the dispatch result. |
1821
+
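+ For instance, the second page of dead-letter receipts, 50 per page, would be requested as (parameter values are examples):
+
+ ```
+ GET /api/stats/triggers?page=2&limit=50&deadLetter=true
+ Authorization: Bearer <api-key>
+ ```
+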
1822
+ ### Health check
1823
+
1824
+ | Endpoint | Description |
1825
+ |----------|-------------|
1826
+ | `GET /health` | Health check (no authentication required) |
1827
+
1828
+ ### Lock status
1829
+
1830
+ | Endpoint | Description |
1831
+ |----------|-------------|
1832
+ | `GET /locks/status` | Active resource lock information (requires authentication) |
1833
+
1834
+ ---
1835
+
1836
+ # Web Dashboard
1837
+
1838
+ Action Llama includes an optional web dashboard for monitoring agents. It provides a live view of agent statuses and streaming logs, similar to the terminal TUI but accessible from any browser.
1839
+
1840
+ ## Enabling the Dashboard
1841
+
1842
+ Pass `-w` or `--web-ui` to `al start`:
1843
+
1844
+ ```bash
1845
+ al start -w
1846
+ ```
1847
+
1848
+ The dashboard URL is shown in the TUI header and in headless log output once the scheduler starts:
1849
+
1850
+ ```
1851
+ Dashboard: http://localhost:8080/dashboard
1852
+ ```
1853
+
1854
+ The port is controlled by the `[gateway].port` setting in `config.toml` (default: `8080`).
1855
+
1856
+ ## Authentication
1857
+
1858
+ The dashboard is protected by the gateway API key. Navigate to `http://localhost:8080/dashboard` and you'll be redirected to a login page where you paste your API key. On success, an `al_session` cookie is set (HttpOnly, SameSite=Strict) so all subsequent requests — including SSE streams — are authenticated automatically.
1859
+
1860
+ A **Logout** link is available in the dashboard header.
1861
+
1862
+ See [Gateway API — Authentication](/reference/gateway-api#authentication) for details on key management and protected routes.
1863
+
1864
+ ## Dashboard Pages
1865
+
1866
+ ### Main Page — `/dashboard`
1867
+
1868
+ Displays a live overview of all agents:
1869
+
1870
+ | Column | Description |
1871
+ |--------|-------------|
1872
+ | Agent | Agent name (click to view logs) |
1873
+ | State | Current state: idle, running, building, or error |
1874
+ | Status | Latest status text or error message |
1875
+ | Last Run | Timestamp of the most recent run |
1876
+ | Duration | How long the last run took |
1877
+ | Next Run | When the next scheduled run will happen |
1878
+ | Actions | **Run** (trigger an immediate run) and **Enable/Disable** (toggle the agent) |
1879
+
1880
+ The header also includes:
1881
+
1882
+ - **Pause/Resume** button — pauses or resumes the scheduler (all cron jobs)
1883
+ - **Logout** link — clears the session cookie and redirects to the login page
1884
+
1885
+ Below the table, a **Recent Activity** section shows the last 20 log lines across all agents.
1886
+
1887
+ All data updates in real time via Server-Sent Events (SSE) — no manual refresh needed.
1888
+
1889
+ ### Trigger History — `/dashboard/triggers`
1890
+
1891
+ Displays a paginated table of all trigger events — cron, webhook, and agent-call triggers — with the outcome of each. Includes a toggle to show dead-letter webhook receipts (payloads that arrived but did not match any agent or failed validation).
1892
+
1893
+ Features:
1894
+ - **Pagination** — browse through historical triggers
1895
+ - **Dead-letter toggle** — show webhook payloads that were received but not dispatched
1896
+ - **Replay** — re-dispatch a stored webhook payload to matching agents
1897
+
1898
+ The same data is available via the [Trigger History API](/reference/gateway-api#trigger-history).
1899
+
1900
+ ### Agent Logs — `/dashboard/agents/<name>/logs`
1901
+
1902
+ Displays a live-streaming log view for a single agent. Logs follow automatically by default (new entries scroll into view as they arrive).
1903
+
1904
+ Features:
1905
+ - **Follow mode** — enabled by default, auto-scrolls to the latest log entry. Scrolling up pauses follow; scrolling back to the bottom re-enables it.
1906
+ - **Clear** — clears the log display (does not delete log files).
1907
+ - **Connection status** — shows whether the SSE connection is active.
1908
+ - **Log levels** — color-coded: green for INFO, yellow for WARN, red for ERROR.
1909
+
1910
+ On initial load, the last 100 log entries from the agent's log file are displayed, then new entries stream in as they are written.
1911
+
1912
+ ## How It Works
1913
+
1914
+ The dashboard is served by the same [gateway](/reference/gateway-api) that handles webhooks and container communication. When `--web-ui` is enabled, the gateway starts even if Docker and webhooks are not configured.
1915
+
1916
+ Dashboard actions (Run, Enable/Disable, Pause/Resume) call the [control API](/reference/gateway-api#control-api) endpoints. Live updates are delivered via [SSE streams](/reference/gateway-api#sse-streams).
1917
+
1918
+ No additional dependencies or frontend build steps are required. The dashboard is rendered as plain HTML with inline CSS and JavaScript.