@action-llama/skill 0.23.4 → 0.24.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +14 -0
- package/dist/build-docs.d.ts +32 -0
- package/dist/build-docs.d.ts.map +1 -0
- package/dist/build-docs.js +127 -0
- package/dist/build-docs.js.map +1 -0
- package/dist/index.d.ts +7 -7
- package/dist/index.d.ts.map +1 -1
- package/dist/index.js +9 -15
- package/dist/index.js.map +1 -1
- package/package.json +12 -6
- package/skills/al/SKILL.md +25 -0
- package/skills/al/agent-authoring.md +1436 -0
- package/skills/al/debugging.md +1075 -0
- package/skills/al/operations.md +1918 -0
- package/{content/commands/debug.md → skills/debug/SKILL.md} +12 -3
- package/{content/commands/iterate.md → skills/iterate/SKILL.md} +6 -0
- package/skills/new-agent/SKILL.md +36 -0
- package/{content/commands/run.md → skills/run/SKILL.md} +6 -0
- package/{content/commands/status.md → skills/status/SKILL.md} +5 -0
- package/content/AGENTS.md +0 -1128
- package/content/commands/new-agent.md +0 -24
- /package/{content/mcp.json → .mcp.json} +0 -0
@@ -0,0 +1,1918 @@
# Getting Started

## Prerequisites

- **Node.js 20+** — [nodejs.org](https://nodejs.org)
- **Docker** — [docker.com](https://www.docker.com) (Docker Desktop on macOS/Windows, or Docker Engine on Linux). Not required if all agents use the [host-user runtime](/reference/agent-config#runtime).

## 1. Create a project

```bash
npm i -g @action-llama/action-llama@latest
```

```bash
al new my-agents
```

```bash
cd my-agents
```

The wizard prompts for your LLM API key (Anthropic recommended) and basic configuration. This creates:

- `config.toml`: project-level settings
- `.env.toml`: local environment binding (gitignored)
- `package.json`: with `@action-llama/action-llama` as a dependency
- `CLAUDE.md`: a symlink to an AGENTS.md file in the npm package that has everything your agent needs to know
- `.mcp.json`: MCP server config so [Claude Code](/integrations/claude) can interact with your agents

## 2. Create an agent

Add a "dev" agent that implements GitHub issues when they are labeled "ready-for-dev":

```bash
npx al add Action-Llama/agents -a dev
```

Since this is the first time you've run it, you'll be prompted to configure credentials and webhooks. You'll also want to set the params to match your GitHub org.

You'll now see:

```
agents/dev/
  SKILL.md     # Portable metadata + instructions
  config.toml  # Runtime config (credentials, models, schedule, etc.)
```
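For orientation, the agent's `config.toml` might look something like the sketch below. The key names and values here are illustrative assumptions (the real ones come from the interactive setup), but they reflect the kinds of settings the file holds: credentials, models, schedule, webhooks, and params.

```toml
# agents/dev/config.toml — illustrative sketch; exact keys may differ
model = "sonnet"           # a named model defined in the project config.toml
schedule = "*/30 * * * *"  # cron expression

[[webhooks]]               # trigger on GitHub issue labels
source = "my-github"
events = ["issues"]
actions = ["labeled"]
labels = ["ready-for-dev"]

[params]
org = "your-org"           # hypothetical param; set to match your GitHub org
```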

You can also use a coding agent, such as Claude Code, to create new agents! The CLAUDE.md in the project root has everything it needs to know.

## 3. Run

```bash
npx al start
```

If any credentials are missing, you'll be prompted for them. The scheduler starts, discovers your agents, and begins running them on their configured schedules.

The terminal shows a live TUI with agent status, or use `-w` to enable the [web dashboard](/reference/web-dashboard):

```bash
npx al start -w
```

## What just happened?

1. The scheduler scanned for directories with `SKILL.md` and found your agent
2. It built a Docker image with your agent's tools
3. On the first cron tick, it launched a container, loaded credentials, and started an LLM session
4. The LLM received your `SKILL.md` instructions and ran autonomously
5. When it finished, the container was removed and logs were saved

## Key files

| File | Purpose |
|------|---------|
| `config.toml` | Project settings: named models, Docker limits, webhook sources |
| `agents/<name>/SKILL.md` | Portable metadata + agent instructions |
| `agents/<name>/config.toml` | Agent runtime config: credentials, models, schedule, webhooks, params |

## Manual test run

Run a single agent once without starting the scheduler:

```bash
npx al run dev
```

## What's next

- [Using Webhooks](/first-steps/using-webhooks) — trigger agents from GitHub events
- [Dynamic Context](/guides/dynamic-context) — use hooks to stage context
- [Deploying to a VPS](/guides/deploying-to-vps) — run agents 24/7 on a server
- [Agents (concepts)](/concepts/agents) — understand the runtime lifecycle
- [CLI Commands](/reference/cli-commands) — all available commands and flags

---

# Using Webhooks

This tutorial walks you through setting up a GitHub webhook so that labeling an issue triggers an agent run. By the end, you'll have an agent that responds to GitHub events in real time.

## What you'll build

An agent that triggers when a GitHub issue is labeled `"agent"`. The agent runs, processes the issue, and you see the result in your terminal.

## Prerequisites

- An Action Llama project with an agent (see [Getting Started](/first-steps/getting-started))
- A GitHub repo you control
- A GitHub Personal Access Token (already configured if you ran `al doctor`)

## 1. Add a webhook source to `config.toml`

Define a named webhook source in your project's `config.toml`:

```toml
[webhooks.my-github]
type = "github"
credential = "MyGithubWebhookSecret"
```

## 2. Add the credential

```bash
npx al doctor
```

It will prompt you for any missing credentials.

For the webhook, enter a random secret string (e.g. generate one with `openssl rand -hex 20`). Save this value — you'll need it when configuring GitHub.

## 3. Add a webhook trigger to your agent

In your agent's `config.toml`, add a webhook trigger:

```toml
# agents/<name>/config.toml
[[webhooks]]
source = "my-github"
events = ["issues"]
actions = ["labeled"]
labels = ["agent"]
```

This tells the agent to trigger when an issue in any repo is labeled `"agent"`. You can also filter by repo:

```toml
[[webhooks]]
source = "my-github"
repos = ["your-org/your-repo"]
events = ["issues"]
actions = ["labeled"]
labels = ["agent"]
```

## 4. Start ngrok for local development

Your local machine isn't reachable from the internet, so GitHub can't deliver webhooks directly. Use ngrok to create a tunnel:

```bash
ngrok http 8080
```

Copy the HTTPS URL (e.g. `https://abc123.ngrok-free.app`).

## 5. Configure the GitHub webhook

In your GitHub repo:

1. Go to **Settings > Webhooks > Add webhook**
2. **Payload URL:** `https://abc123.ngrok-free.app/webhooks/github` (your ngrok URL + `/webhooks/github`)
3. **Content type:** `application/json`
4. **Secret:** paste the same secret you used in step 2
5. **Events:** select "Let me select individual events" and check **Issues**
6. Click **Add webhook**

## 6. Start the scheduler

```bash
npx al start
```

## 7. Test it

Go to your GitHub repo, open (or create) an issue, and add the label `"agent"`.

In your terminal, you should see the agent pick up the webhook event and start running. Check the logs:

```bash
npx al logs dev
```

## How it works

1. GitHub sends a POST request to your ngrok URL when the issue is labeled
2. The gateway receives it at `/webhooks/github` and verifies the HMAC signature against your secret
3. The event is matched against your agent's webhook filters (`events: issues`, `actions: labeled`, `labels: agent`)
4. The agent is triggered with the full webhook payload injected into its prompt as a `<webhook-trigger>` block
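
The signature check in step 2 is GitHub's standard HMAC-SHA256 scheme for the `X-Hub-Signature-256` header. As a sketch (not the gateway's actual code), you can recompute the value GitHub would send with `openssl`, which is handy when crafting test payloads by hand; `mysecret` and the payload below are stand-ins:

```bash
# Recompute the X-Hub-Signature-256 header value for a payload.
# "mysecret" is a stand-in for the webhook secret configured in step 2.
secret='mysecret'
body='{"action":"labeled"}'
sig="sha256=$(printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')"
echo "$sig"
```

If a value computed this way matches the header GitHub sent, the payload was signed with your secret and was not tampered with in transit.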

## Next steps

- [Deploying to a VPS](/guides/deploying-to-vps) — deploy to avoid needing ngrok
- [Webhooks Reference](/reference/webhooks) — all providers and filter fields
- [Scaling Agents](/guides/scaling-agents) — handle high webhook volume with parallel instances

---

# CLI Commands

## `al new <name>`

Creates a new Action Llama project. Runs interactive setup to configure credentials and LLM defaults.

```bash
npx @action-llama/action-llama new my-project
```

Creates:
- `my-project/package.json` — with `@action-llama/action-llama` dependency
- `my-project/.gitignore`
- `my-project/.workspace/` — runtime state directory
- Credential files in `~/.action-llama/credentials/`

## `al doctor`

Checks all agent credentials and interactively prompts for any that are missing. Discovers agents in the project, collects their credential requirements (plus any webhook secret credentials), and ensures each one exists on disk. Also generates a gateway API key if one doesn't exist yet (used for dashboard and CLI authentication).

Additionally validates:
- Webhook trigger field configurations (misspelled fields, wrong types)
- Host-user runtime setup (OS user existence, sudoers configuration). On Linux, auto-creates users and sudoers rules. On macOS, prints manual setup instructions.

```bash
al doctor
al doctor -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name — pushes credentials to server and reconciles IAM |
| `--strict` | Treat unknown config fields as errors instead of warnings |

## `al run <agent>`

Manually triggers a single agent run. The agent runs once and the process exits when it completes. Useful for testing, debugging, or one-off runs without starting the full scheduler.

```bash
al run dev
al run reviewer -p ./my-project
al run dev -E production
al run dev --headless
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |
| `-H, --headless` | Non-interactive mode (no TUI, no credential prompts) |

## `al start`

Starts the scheduler. Runs all agents on their configured schedules and listens for webhooks.

```bash
al start
al start -w          # Enable web dashboard
al start -e          # VPS deployment: expose gateway publicly
al start --port 3000 # Custom gateway port
al start -H          # Headless (no TUI)
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |
| `-w, --web-ui` | Enable web dashboard (see [Web Dashboard](/reference/web-dashboard)) |
| `-e, --expose` | Bind gateway to `0.0.0.0` (public) while keeping local-mode features |
| `-H, --headless` | Non-interactive mode (no TUI, no credential prompts) |
| `--port <number>` | Gateway port (overrides `[gateway].port` in config) |

## `al stop`

Stops the scheduler and clears all pending agent work queues. Sends a stop signal to the gateway. In-flight runs continue until they finish, but no new runs will start.

```bash
al stop
al stop -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al stat`

Shows status of all discovered agents in the project. Displays each agent's schedule, credentials, webhook configuration, and queue depth.

```bash
al stat
al stat -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al logs [agent]`

View log files for a specific agent, or scheduler logs if no agent is specified.

```bash
al logs                   # Scheduler logs
al logs dev
al logs dev -n 100        # Show last 100 entries
al logs dev -f            # Follow/tail mode
al logs dev -d 2025-01-15 # Specific date
al logs dev -r            # Raw JSON log output
al logs dev -i abc123     # Specific instance
al logs dev -E production # Remote agent logs
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |
| `-n, --lines <N>` | Number of log entries (default: 50) |
| `-f, --follow` | Tail mode — watch for new entries |
| `-d, --date <YYYY-MM-DD>` | View a specific date's log file |
| `-r, --raw` | Raw JSON log output (no formatting) |
| `-i, --instance <id>` | Filter to a specific instance ID |

### Troubleshooting logs

```bash
al logs                       # Scheduler logs
al logs <agent>               # Agent logs
al logs <agent> -f            # Follow mode
al logs <agent> -d 2025-01-15 # Specific date
al env logs production -f     # Server system logs via SSH
```

## `al stats [agent]`

Show historical run statistics from the local SQLite stats database. Without an agent name, shows a global summary across all agents. With an agent name, shows detailed per-run history.

```bash
al stats             # Global summary (last 7 days)
al stats dev         # Per-run detail for dev agent
al stats --since 24h # Last 24 hours only
al stats dev -n 50   # Last 50 runs for dev agent
al stats --calls     # Agent-to-agent call graph summary
al stats --json      # JSON output
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-s, --since <duration>` | Time window: e.g. `24h`, `7d`, `30d` (default: `7d`) |
| `-n <N>` | Number of recent runs to show (default: 20) |
| `--json` | Output as JSON |
| `--calls` | Show agent-to-agent call graph summary instead of run history |

## `al pause [name]`

Pause the scheduler or a single agent. Without a name, pauses the entire scheduler — all cron jobs stop firing. With a name, pauses that agent — its cron job stops firing and webhook events are ignored. In-flight runs continue until they finish. Requires the gateway.

```bash
al pause     # Pause the scheduler
al pause dev # Pause a single agent
al pause dev -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al resume [name]`

Resume the scheduler or a single agent. Without a name, resumes the entire scheduler. With a name, resumes that agent — its cron job resumes firing and webhooks are accepted again.

```bash
al resume     # Resume the scheduler
al resume dev # Resume a single agent
al resume dev -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al kill <target>`

Kill an agent (all running instances) or a single instance by ID. Tries the target as an agent name first; if not found, falls back to instance ID. This does **not** pause the agent — if it has a schedule, a new run will start at the next cron tick. To fully stop an agent, pause it first, then kill.

```bash
al kill dev             # Kill all instances of an agent
al kill my-agent-abc123 # Kill a single instance by ID
al kill dev -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

## `al chat [agent]`

Open an interactive console. Without an agent name, opens the project-level console for creating and managing agents. With an agent name, opens an interactive session scoped to that agent's environment — credentials are loaded and injected as environment variables (e.g. `GITHUB_TOKEN`, `GIT_SSH_COMMAND`), and the working directory is set to the agent's directory.

```bash
al chat     # project-level console
al chat dev # interactive session with dev agent's credentials
```

| Option | Description |
|--------|-------------|
| `[agent]` | Agent name — loads its credentials and environment |
| `-p, --project <dir>` | Project directory (default: `.`) |

When running in agent mode, the command probes the gateway and warns if it is not reachable:

```
Warning: No gateway detected at http://localhost:8080. Resource locks, agent calls, and signals are unavailable.
Start the scheduler with `al start` to enable these features.
```

The agent's SKILL.md is loaded as reference context but is **not** auto-executed — you drive the session interactively.

## `al push [agent]`

Deploy your project to a server over SSH. Requires a `[server]` section in your environment file. See [VPS Deployment](/concepts/vps-deployment) for how it works under the hood.

```bash
al push -E production              # Full project push
al push dev -E production          # Push only the dev agent (hot-reloaded)
al push --dry-run -E production    # Preview what would be synced
al push --creds-only -E production # Sync only credentials
```

Without an agent name, pushes the entire project and can restart the remote service. With an agent name, pushes only that agent's files and credentials — the running scheduler detects the change and hot-reloads the agent without a full restart.
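
A server environment file lives at `~/.action-llama/environments/<name>.toml` (see `al env init`). As a minimal sketch only, a `[server]` section might look like the following; the `host` and `user` keys are assumptions for illustration, not confirmed field names:

```toml
# ~/.action-llama/environments/production.toml — illustrative sketch
[server]
host = "203.0.113.10" # server IP or hostname (example address)
user = "root"         # SSH user
```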

| Option | Description |
|--------|-------------|
| `[agent]` | Agent name — push only this agent (hot-reloaded, no restart) |
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment with `[server]` config |
| `--dry-run` | Show what would be synced without making changes |
| `--no-creds` | Skip credential sync |
| `--creds-only` | Sync only credentials (skip project files) |
| `--files-only` | Sync only project files (skip credentials) |
| `-a, --all` | Sync project files, credentials, and restart service |
| `--force-install` | Force `npm install` even if dependencies appear unchanged |

## Environment Commands

### `al env init <name>`

Create a new environment configuration file at `~/.action-llama/environments/<name>.toml`.

```bash
al env init production --type server
```

| Option | Description |
|--------|-------------|
| `--type <type>` | Environment type: `server` |

### `al env list`

List all configured environments.

```bash
al env list
```

### `al env show <name>`

Display the contents of an environment configuration file.

```bash
al env show production
```

### `al env set [name]`

Bind the current project to an environment by writing the environment name to `.env.toml`. Omit the name to unbind.

```bash
al env set production # Bind project to "production"
al env set            # Unbind project from any environment
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al env check <name>`

Verify that an environment is provisioned and configured correctly. Checks SSH connectivity, Docker availability, and server readiness.

```bash
al env check production
```

### `al env prov [name]`

Provision a new VPS and save it as an environment. Supports Vultr and Hetzner. If the name is omitted, you'll be prompted for one. See [Deploying to a VPS](/guides/deploying-to-vps) for a walkthrough.

```bash
al env prov production
```

### `al env deprov <name>`

Tear down a provisioned environment. Stops containers, cleans up remote credentials, optionally deletes DNS records, and optionally deletes the VPS instance if it was provisioned via `al env prov`.

```bash
al env deprov staging
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al env logs [name]`

View server system logs (systemd journal) via SSH. If the name is omitted, uses the project's bound environment.

```bash
al env logs production
al env logs production -n 200 # Last 200 lines
al env logs production -f     # Follow mode
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-n, --lines <N>` | Number of log lines (default: 50) |
| `-f, --follow` | Tail mode — watch for new entries |

## Credential Commands

### `al creds ls`

Lists all stored credentials grouped by type, showing field names but not values.

```bash
al creds ls
```

### `al creds add <ref>`

Add or update a credential. Runs the interactive prompter with validation for the credential type.

```bash
al creds add github_token # default instance
al creds add github_webhook_secret:myapp
al creds add git_ssh:prod
```

The `<ref>` format is `type` or `type:instance`. If no instance is specified, defaults to `default`. If the credential already exists, you'll be prompted to update it.

### `al creds rm <ref>`

Remove a credential from disk.

```bash
al creds rm github_token # default instance
al creds rm github_webhook_secret:myapp
```

### `al creds types`

Browse available credential types interactively. Presents a searchable list of all built-in credential types. On selection, shows the credential's fields, environment variables, and agent context, then offers to add it immediately.

```bash
al creds types
```

## Agent Commands

### `al agent new`

Interactive wizard to create a new agent from a template. Prompts for agent type (dev, reviewer, devops, or custom), agent name, and then runs `al agent config` to configure the new agent.

```bash
al agent new
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al agent config <name>`

Interactively configure an existing agent. Opens a menu to edit each section of the agent's `config.toml`: credentials, model, schedule, webhooks, and params. Saves changes to `agents/<name>/config.toml` and runs `al doctor` on completion to validate the configuration.

```bash
al agent config dev
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

## Skill Management

### `al add <repo>`

Install a skill from a git repository. Clones the repo, discovers `SKILL.md` files (at the root or under `skills/*/`), copies the skill into `agents/<name>/`, creates a `config.toml` with a `source` field pointing back to the repo, and runs `al config` for interactive setup.

Accepts GitHub shorthand (`author/repo`) or a full git URL.

```bash
al add acme/dev-skills
al add acme/dev-skills --skill reviewer
al add https://github.com/acme/dev-skills.git
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-s, --skill <name>` | Skill name (if the repo contains multiple skills) |

### `al config <name>`

Shortcut for `al agent config <name>`. Interactively configure an agent's runtime settings in `config.toml`.

```bash
al config dev
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

### `al update [agent]`

Update installed skills from their source repos. For each agent with a `source` field in its `config.toml`, clones the source repo, compares the upstream `SKILL.md` with the local copy, and prompts to accept changes. Only updates `SKILL.md` — `config.toml` is never touched.

```bash
al update     # Check all agents
al update dev # Check only the dev agent
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |

## Webhook Commands

### `al webhook replay <fixture>`

Load a webhook fixture file (JSON) and test which agents would match. Useful for debugging webhook configurations without sending real webhooks. The fixture file must have `headers` and `body` properties.

```bash
al webhook replay test/fixtures/github-issue.json
al webhook replay payload.json --source my-github
al webhook replay payload.json --run # Interactively run a matched agent
```
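
A minimal fixture might look like the sketch below. Only the `headers` and `body` properties are confirmed by the CLI; the specific header names and payload fields shown are illustrative of a GitHub `issues` delivery:

```json
{
  "headers": {
    "x-github-event": "issues",
    "content-type": "application/json"
  },
  "body": {
    "action": "labeled",
    "label": { "name": "agent" },
    "repository": { "full_name": "your-org/your-repo" },
    "issue": { "number": 42, "title": "Example issue" }
  }
}
```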

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-r, --run` | Interactively run a matched agent after simulation |
| `-s, --source <name>` | Webhook source name from `config.toml` (auto-detected from headers if omitted) |

## MCP Commands

### `al mcp serve`

Starts the MCP stdio server for Claude Code integration. Claude Code spawns this as a subprocess — you don't typically run it directly. See [Claude Integration](/integrations/claude) for setup and usage.

```bash
al mcp serve
al mcp serve -p ./my-project
al mcp serve -E production
```

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name |

### `al mcp init`

Writes or updates `.mcp.json` in the project root so Claude Code auto-discovers the MCP server.
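
The file follows Claude Code's standard `.mcp.json` shape (`mcpServers` with a `command` and `args` per server). The server name and command below are illustrative assumptions, not the exact values the CLI writes:

```json
{
  "mcpServers": {
    "action-llama": {
      "command": "npx",
      "args": ["al", "mcp", "serve"]
    }
  }
}
```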
|
|
692
|
+
|
|
693
|
+
```bash
|
|
694
|
+
al mcp init
|
|
695
|
+
al mcp init -p ./my-project
|
|
696
|
+
```
|
|
697
|
+
|
|
698
|
+
| Option | Description |
|
|
699
|
+
|--------|-------------|
|
|
700
|
+
| `-p, --project <dir>` | Project directory (default: `.`) |
## Global Options

These options are available on most commands:

| Option | Description |
|--------|-------------|
| `-p, --project <dir>` | Project directory (default: `.`) |
| `-E, --env <name>` | Environment name (also `AL_ENV` env var or `environment` field in `.env.toml`) |

---

# Project Configuration

The project-level `config.toml` lives at the root of your Action Llama project. All sections and fields are optional — sensible defaults are used for anything you omit. If the file doesn't exist at all, an empty config is assumed.

## Full Annotated Example

```toml
# Scheduler settings — top-level keys, so they must appear before any [section]
resourceLockTimeout = 1800  # Lock TTL in seconds (default: 1800 / 30 minutes)
maxReruns = 10              # Max consecutive reruns for successful agent runs (default: 10)
maxCallDepth = 3            # Max depth for agent-to-agent call chains (default: 3)
workQueueSize = 100         # Max queued work items (webhooks + calls) per agent (default: 100)
scale = 10                  # Project-wide max concurrent runners across all agents (default: unlimited)
historyRetentionDays = 14   # Days to retain run history and webhook receipts (default: 14)

# Named models — define once, reference by name in SKILL.md
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"

[models.haiku]
provider = "anthropic"
model = "claude-haiku-4-5-20251001"
authType = "api_key"

[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"

# Local Docker container settings
[local]
image = "al-agent:latest"   # Base image name (default: "al-agent:latest")
memory = "4g"               # Memory limit per container (default: "4g")
cpus = 2                    # CPU limit per container (default: 2)
timeout = 900               # Default max container runtime in seconds (default: 900, overridable per-agent)

# Gateway HTTP server settings
[gateway]
port = 8080                 # Gateway port (default: 8080)

# Webhook sources — named webhook endpoints with provider type and credential
[webhooks.my-github]
type = "github"
credential = "MyOrg"        # credential instance for HMAC validation

# Telemetry settings
[telemetry]
enabled = true
provider = "otel"
endpoint = "https://telemetry.example.com/v1"
serviceName = "action-llama"
samplingRate = 0.5
```

## Field Reference

### Top-level fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `maxReruns` | number | `10` | Maximum consecutive reruns when an agent requests a rerun via `al-rerun` before stopping |
| `maxCallDepth` | number | `3` | Maximum depth for agent-to-agent call chains (A calls B calls C = depth 2) |
| `workQueueSize` | number | `100` | Maximum queued work items (webhook events + agent calls) per agent when all runners are busy. Can be overridden per-agent with `maxWorkQueueSize` in the agent's `config.toml`. |
| `scale` | number | _(unlimited)_ | Project-wide cap on total concurrent runners across all agents |
| `resourceLockTimeout` | number | `1800` | Default lock TTL in seconds. Locks expire automatically after this duration unless refreshed via heartbeat. See [Resource Locks](/concepts/resource-locks). |
| `historyRetentionDays` | number | `14` | Number of days to retain run history and webhook receipts in the local SQLite stats database. Older entries are pruned automatically. |
### `[models.<name>]` — Named Models

Define models once in `config.toml`, then reference them by name in each agent's `SKILL.md` frontmatter. Agents list model names in priority order — the first is the primary model, and the rest are fallbacks tried automatically when the primary is rate-limited or unavailable.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `provider` | string | Yes | LLM provider: `"anthropic"`, `"openai"`, `"groq"`, `"google"`, `"xai"`, `"mistral"`, `"openrouter"`, or `"custom"` |
| `model` | string | Yes | Model ID (e.g. `"claude-sonnet-4-20250514"`, `"gpt-4o"`, `"gemini-2.0-flash-exp"`) |
| `authType` | string | Yes | Auth method: `"api_key"`, `"oauth_token"`, or `"pi_auth"` |
| `thinkingLevel` | string | No | Thinking budget: `"off"`, `"minimal"`, `"low"`, `"medium"`, `"high"`, `"xhigh"`. Only applies to Anthropic models with reasoning support. Ignored for other providers. |

```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"

[models.haiku]
provider = "anthropic"
model = "claude-haiku-4-5-20251001"
authType = "api_key"

[models.gpt4o]
provider = "openai"
model = "gpt-4o"
authType = "api_key"
```

Agents reference these by name in their `config.toml`:

```toml
# agents/<name>/config.toml
models = ["sonnet", "haiku"]
```

See [Models](/reference/models) for all supported providers, model IDs, auth types, and thinking levels.
### `[local]` — Docker Container Settings

Controls local Docker container isolation. These settings apply only to agents using the default container runtime — they are ignored for agents using the [host-user runtime](/reference/agent-config#runtime).

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `image` | string | `"al-agent:latest"` | Base Docker image name |
| `memory` | string | `"4g"` | Memory limit per container (e.g. `"4g"`, `"8g"`) |
| `cpus` | number | `2` | CPU limit per container |
| `timeout` | number | `900` | Default max container runtime in seconds. Individual agents can override this with `timeout` in their `config.toml`. See [agent timeout](/reference/agent-config#timeout). |

### `[gateway]` — HTTP Server

The gateway starts automatically when Docker mode or webhooks are enabled. It handles health checks, webhook reception, credential serving (local Docker only), resource locking, and the shutdown kill switch.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `port` | number | `8080` | Port for the gateway HTTP server |

### `[webhooks.*]` — Webhook Sources

Named webhook sources that agents can reference in their webhook triggers. Each source defines a provider type and an optional credential for signature validation.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `type` | string | Yes | Provider type: `"github"`, `"sentry"`, `"linear"`, or `"mintlify"` |
| `credential` | string | No | Credential instance name for HMAC signature validation (e.g. `"MyOrg"` maps to `github_webhook_secret:MyOrg`). Omit for unsigned webhooks. |

```toml
[webhooks.my-github]
type = "github"
credential = "MyOrg"         # uses github_webhook_secret:MyOrg for HMAC validation

[webhooks.my-sentry]
type = "sentry"
credential = "SentryProd"    # uses sentry_client_secret:SentryProd

[webhooks.my-linear]
type = "linear"
credential = "LinearMain"    # uses linear_webhook_secret:LinearMain

[webhooks.my-mintlify]
type = "mintlify"
credential = "MintlifyMain"  # uses mintlify_webhook_secret:MintlifyMain

[webhooks.unsigned-github]
type = "github"              # no credential — accepts unsigned webhooks
```

Agents reference these sources by name in their `config.toml`:

```toml
# agents/<name>/config.toml
[[webhooks]]
source = "my-github"
events = ["issues"]
```

See [Webhooks](/reference/webhooks) for setup instructions and filter fields per provider.

### `[telemetry]` — Observability

Optional OpenTelemetry integration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable or disable telemetry collection |
| `provider` | string | `"none"` | Telemetry provider: `"otel"` or `"none"` |
| `endpoint` | string | — | OpenTelemetry collector endpoint URL (required when `provider = "otel"`) |
| `serviceName` | string | — | Service name reported to the collector |
| `headers` | table | — | Additional HTTP headers sent with telemetry requests (e.g. auth tokens) |
| `samplingRate` | number | — | Sampling rate between `0.0` (none) and `1.0` (all traces) |
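As a sketch, a collector behind bearer-token auth might be configured like this. The `[telemetry.headers]` table form and the token value are assumptions, not a verified config shape:

```toml
[telemetry]
enabled = true
provider = "otel"
endpoint = "https://otel-collector.example.com/v1"
serviceName = "action-llama"
samplingRate = 0.1

# Extra headers sent to the collector (assumed table form)
[telemetry.headers]
Authorization = "Bearer my-collector-token"
```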
## Minimal Examples

### Anthropic with Docker (typical dev setup)

```toml
[models.sonnet]
provider = "anthropic"
model = "claude-sonnet-4-20250514"
thinkingLevel = "medium"
authType = "api_key"
```

Everything else uses defaults: Docker enabled, 4GB memory, 2 CPUs, 15min timeout, gateway on port 8080.

### VPS production (environment file)

Server configuration lives in an environment file (`~/.action-llama/environments/<name>.toml`), not in `config.toml`. See [VPS Deployment](/concepts/vps-deployment) for full setup.

```toml
# ~/.action-llama/environments/production.toml
[server]
host = "5.6.7.8"
user = "root"
keyPath = "~/.ssh/id_rsa"
basePath = "/opt/action-llama"
expose = true
```

### Cloud Run Jobs runtime (environment file)

To run agents as Cloud Run Jobs instead of local Docker containers, add a `[cloud]` section to your environment file. The scheduler still runs wherever you host it; only agent execution is offloaded to GCP.

```toml
# ~/.action-llama/environments/production.toml
[cloud]
provider = "cloud-run"
project = "my-gcp-project"
region = "us-central1"
artifact_registry = "al-agents"
# Optional: service account email for job execution identity
# service_account = "al-agent-runner@my-gcp-project.iam.gserviceaccount.com"

[gateway]
url = "https://your-gateway.example.com"  # Must be publicly reachable
```

See [Running Agents on Cloud Run Jobs](/guides/cloud-run-runtime) for full setup instructions.
---

# Scheduler

The scheduler is the heart of Action Llama. It discovers agents, fires cron triggers, dispatches webhook events, and manages runner pools.

## Architecture

```
┌──────────────────────────────────────────────┐
│                  Scheduler                   │
│   Discovers agents, fires cron triggers,     │
│   manages runner pool and work queue         │
├──────────────────────────────────────────────┤
│                   Gateway                    │
│   HTTP server: webhooks, resource locks,     │
│   dashboard, agent signals, control API      │
├───────────┬───────────┬──────────────────────┤
│ Container │ Container │  Host-User Process   │
│  Agent A  │  Agent A  │       Agent B        │
│  (run 1)  │  (run 2)  │       (run 1)        │
└───────────┴───────────┴──────────────────────┘
```

- **Scheduler** — discovers agents by scanning for directories with `SKILL.md`. Registers cron jobs and webhook triggers. Manages a pool of runners per agent (configurable via `scale`) and a durable work queue for buffering events when runners are busy.
- **Gateway** — HTTP server that receives webhooks from external services (GitHub, Sentry, Linear, Mintlify), serves the web dashboard, handles resource locking, and processes agent signals.
- **Agent Processes** — each agent run is an isolated process, either a Docker container or a host-user process (via `sudo -u`), with its own credentials and environment variables. Processes are ephemeral: they start, do their work, and are cleaned up.

## Agent Discovery

The scheduler scans the project directory for subdirectories containing a `SKILL.md` file. Each discovered directory becomes an agent. The directory name is the agent name.

No registration step is needed. Add a new agent directory, restart the scheduler, and it picks the agent up automatically.

## Cron Scheduling

Agents with a `schedule` field in their `SKILL.md` frontmatter are registered as cron jobs. When a cron tick fires:

- If a runner is available, the agent starts immediately
- If all runners are busy, the scheduled run is **skipped** with a warning (cron runs are not queued)

This means cron is best-effort — if an agent is still running from the previous tick, the new tick is dropped.

## Webhook Dispatch

When the gateway receives a webhook:

1. **Signature verification** — the payload is verified against the credential secret for that webhook source
2. **Event parsing** — the raw payload is parsed into a `WebhookContext` (source, event, action, repo, etc.)
3. **Filter matching** — the context is matched against each agent's webhook trigger filters
4. **Runner dispatch** — if a runner is available, the agent starts. If all runners are busy, the event is **queued** in the work queue.

Unlike cron, webhook events are queued (not dropped) when runners are busy.
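Signature verification in step 1 follows the standard webhook HMAC pattern. A minimal sketch for a GitHub-style `X-Hub-Signature-256` header (an illustration of the scheme, not Action Llama's actual implementation):

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the shared webhook secret."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected, signature_header)

body = b'{"action": "opened"}'
secret = b"s3cret"
sig = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_github_signature(secret, body, sig))        # True
print(verify_github_signature(secret, b"tampered", sig)) # False
```

Requests that fail this check are rejected before any parsing or dispatch happens.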
## Runner Pools

Each agent has its own pool of runners. The pool size is controlled by the `scale` field in `SKILL.md` frontmatter (default: 1).

- `scale = 1` — only one instance can run at a time (default)
- `scale = N` — up to N instances can run concurrently
- `scale = 0` — agent is disabled (no runners, no cron, no webhooks)

The project-wide `scale` field in `config.toml` sets a cap on total concurrent runners across all agents.

## Work Queue

When a webhook event or agent call arrives but all runners are busy, the event is placed in a **work queue**. Items are dequeued and executed as runners become available.

- Backed by SQLite (`.al/work-queue.db`) — survives scheduler restarts
- Per-agent queues
- Configurable size: `workQueueSize` (default: 100)
- When full, oldest items are dropped
- Queue depth visible in `al stat` output
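The drop-oldest behavior above can be sketched with a bounded deque. This illustrates the policy only; the real queue is SQLite-backed and persistent:

```python
from collections import deque

class WorkQueue:
    """Bounded FIFO that drops the oldest item when full (illustrative sketch)."""
    def __init__(self, max_size: int):
        self.items = deque(maxlen=max_size)  # deque discards from the left when full

    def enqueue(self, item):
        self.items.append(item)

    def dequeue(self):
        return self.items.popleft() if self.items else None

q = WorkQueue(max_size=3)
for event in ["e1", "e2", "e3", "e4"]:
    q.enqueue(event)
print(list(q.items))  # ['e2', 'e3', 'e4'] — 'e1' was dropped
```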
## Reruns

When a scheduled agent calls `al-rerun`, the scheduler immediately starts a new run. This continues until:

- The agent completes without calling `al-rerun` (no more work)
- The agent hits an error
- The `maxReruns` limit is reached (default: 10)

This lets agents drain their work queue efficiently without waiting for the next cron tick. Only scheduled runs can rerun — webhook and call runs do not.
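The rerun loop can be sketched as follows. For simplicity this caps total runs at `max_reruns`, and `run_once` stands in for a full agent run that reports whether it called `al-rerun`:

```python
def run_with_reruns(run_once, max_reruns: int = 10) -> int:
    """Drive a scheduled run, re-running while the agent signals more work."""
    runs = 0
    while runs < max_reruns:
        runs += 1
        wants_rerun = run_once()   # True when the agent called al-rerun
        if not wants_rerun:
            break                  # queue drained — wait for the next cron tick
    return runs

# Agent with three batches of work: requests a rerun twice, then stops.
work = iter([True, True, False])
print(run_with_reruns(lambda: next(work)))  # 3
```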
## Agent Calls

Agents can call other agents via `al-subagent`. The scheduler routes the call to the target agent's runner pool:

- If a runner is available, the called agent starts immediately
- If all runners are busy, the call is queued in the work queue
- Self-calls are rejected
- Call depth is bounded by `maxCallDepth` (default: 3) to prevent infinite loops

See [Subagents](/guides/subagents) for a guide on agent-to-agent workflows.
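The two guards above, self-call rejection and depth bounding, can be sketched as a dispatch check. Names and messages are illustrative, not Action Llama's API:

```python
MAX_CALL_DEPTH = 3  # mirrors the maxCallDepth default

def dispatch_call(caller: str, target: str, depth: int) -> str:
    """Reject self-calls and over-deep chains before queueing the call."""
    if caller == target:
        raise ValueError("self-calls are rejected")
    if depth >= MAX_CALL_DEPTH:
        raise ValueError(f"call depth {depth} exceeds maxCallDepth={MAX_CALL_DEPTH}")
    return f"queued {target} at depth {depth + 1}"

print(dispatch_call("triage", "dev", depth=0))  # queued dev at depth 1
```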
## Graceful Shutdown

When the scheduler receives a stop signal (`al stop` or SIGTERM):

1. No new runs are started
2. All pending work queues are cleared
3. In-flight runs continue until they finish
4. Once all runs complete, the process exits

## Configuration

| Setting | Location | Default | Description |
|---------|----------|---------|-------------|
| `maxReruns` | `config.toml` | `10` | Max consecutive reruns per agent |
| `maxCallDepth` | `config.toml` | `3` | Max depth for agent call chains |
| `workQueueSize` | `config.toml` | `100` | Max queued items per agent |
| `scale` | `config.toml` | _(unlimited)_ | Project-wide max concurrent runners |
| `scale` | `SKILL.md` frontmatter | `1` | Per-agent concurrent runner limit |
| `gateway.port` | `config.toml` | `8080` | Gateway HTTP port |

## Troubleshooting

### Agent not running on schedule

- Verify the cron expression in `SKILL.md` frontmatter is valid
- Check if the agent or scheduler is paused: `al stat`
- Resume if paused: `al resume` (scheduler) or `al resume <agent>`

### Agent keeps re-running

An agent that calls `al-rerun` will re-run immediately, up to `maxReruns` (default: 10). If it's re-running more than expected, check the agent's `SKILL.md` — it may be calling `al-rerun` even when there's no remaining work.

```toml
# config.toml — lower the limit if needed
maxReruns = 5
```

### Agent timing out

Default timeout is 900 seconds (15 minutes). Increase it in the project's `config.toml` or per-agent in the agent's `config.toml`:

```toml
# config.toml — project-wide default
[local]
timeout = 3600  # 1 hour
```

```toml
# agents/<name>/config.toml — per-agent override
timeout = 7200  # 2 hours
```

---
# Dockerfiles

Agents run in Docker containers built from a layered image system. See [Custom Dockerfiles](/guides/custom-dockerfiles) for a guide on writing and customizing Dockerfiles.

Agents can also run without Docker using the [host-user runtime](/reference/agent-config#runtime). Dockerfiles and container configuration do not apply to host-user agents.

## Agent Dockerfile

Agents that need extra tools can add a `Dockerfile` to their directory:

```
my-project/
  agents/
    dev/
      SKILL.md
      Dockerfile    <-- custom image for this agent
    reviewer/
      SKILL.md      <-- no Dockerfile, uses base image
```

Agents without a Dockerfile use `al-agent:latest` directly.

## Project Dockerfile

Projects can optionally have a `Dockerfile` at the root that defines a shared base image for all agents. When present and customized beyond a bare `FROM al-agent:latest`, the build pipeline creates an intermediate image (`al-project-base:latest`) that all agent images layer on top of. If unmodified or absent, agents build directly on `al-agent:latest`.

Note that project Dockerfiles make agents harder to reuse: an agent that depends on tools installed in the project base image won't work when shared to another project. Prefer agent-level Dockerfiles so each agent is self-contained and portable.

## Image Build Order

```
al-agent:latest          <-- Action Llama package (automatic)
        |
        v
al-project-base:latest   <-- project Dockerfile (if customized)
        |
        v
al-<agent>:latest        <-- per-agent Dockerfile (if present)
```

If the project Dockerfile is unmodified, the middle layer is skipped.

## Base Image Contents

The base image (`al-agent:latest`) is built automatically from the Action Llama package and includes:

| Package | Why |
|---------|-----|
| `node:20-alpine` | Runs the container entry point and pi-coding-agent SDK |
| `git` | Clone repos, create branches, push commits |
| `curl` | API calls (Sentry, arbitrary HTTP), anti-exfiltration shutdown |
| `ca-certificates` | HTTPS for git, curl, npm |
| `openssh-client` | SSH for `GIT_SSH_COMMAND` — git clone/push over SSH |

The base image also copies the compiled Action Llama application (`dist/`) and installs its npm dependencies. The entry point is `node /app/dist/agents/container-entry.js`.

## Build Behavior

- The base image (`al-agent:latest`) is only built if it doesn't exist yet
- The project base image (`al-project-base:latest`) is rebuilt on every `al start` if the project Dockerfile has customizations
- Agent images are named `al-<agent-name>:latest` (e.g. `al-dev:latest`) and are rebuilt on every `al start` to pick up Dockerfile changes
- The build context is the Action Llama package root (not the project directory), so `COPY` paths reference the package's `dist/`, `package.json`, etc.
- The `FROM` line in agent Dockerfiles is automatically rewritten to point at the correct base image

## Container Filesystem Layout

| Path | Mode | Contents |
|------|------|----------|
| `/app` | read-only | Action Llama application + node_modules |
| `/credentials` | read-only | Mounted credential files (`/<type>/<instance>/<field>`) |
| `/tmp` | read-write (tmpfs, 2GB) | Agent working directory — repos, scratch files, SSH keys |
| `/workspace` | read-write (2GB) | Persistent workspace |
| `/home/node` | read-write (64MB) | Home directory |

## Configuration

| Key | Default | Description |
|-----|---------|-------------|
| `local.image` | `"al-agent:latest"` | Base Docker image name |
| `local.memory` | `"4g"` | Memory limit per container |
| `local.cpus` | `2` | CPU limit per container |
| `local.timeout` | `900` | Max container runtime in seconds |

## Troubleshooting

**"Docker is not running"** — Start Docker Desktop or the Docker daemon before running `al start`.

```bash
# macOS — open Docker Desktop
open -a Docker

# Linux
sudo systemctl start docker
```

**Base image build fails** — Run `docker build -t al-agent:latest -f docker/Dockerfile .` from the Action Llama package directory to see the full build output.

**Project base image build fails** — Check that the project `Dockerfile` starts with `FROM al-agent:latest` and that any `apk add` packages are spelled correctly. The base image uses Alpine Linux.

**Agent image build fails** — Check that your agent's `Dockerfile` starts with `FROM al-agent:latest` (the build pipeline rewrites this to the correct base) and that package install commands match the base: `apk add` for the default Alpine base, `apt-get` for Debian-based custom bases.

**Container out of memory** — Increase the memory limit in `config.toml`:

```toml
[local]
memory = "8g"  # default: "4g"
```

**Container exits immediately** — Check `al logs <agent>` for the error. Common causes: missing credentials, missing `SKILL.md`, invalid model config.

---
# Deploying to a VPS

This guide walks you through deploying your Action Llama project to a VPS for 24/7 agent operation. We'll use `al push` (the recommended SSH push deploy approach).

## Overview

Once deployed, your agents run continuously on the server — cron jobs fire on schedule, webhooks are publicly reachable, and the scheduler restarts automatically if it crashes.

### Prerequisites

- **[Cloudflare](https://www.cloudflare.com/)** account
  - DNS for a domain managed on Cloudflare (so you can point it at your AL server)
  - Cloudflare account API token with Zone editing permissions:
    - *Zone Settings*
    - *SSL and Certificates*
    - *DNS*
- **[Hetzner](https://hetzner.com)** or **[Vultr](https://www.vultr.com/)** account and corresponding API keys

You should already have a domain whose DNS is hosted on Cloudflare. This is **strongly** recommended so that you can have HTTPS without additional configuration.

## 1. Provision a server

```bash
al env prov production
```

The interactive wizard guides you through:

1. **Choose a provider** — Hetzner or Vultr (or connect an existing server)
2. **Pick a plan** — 2 vCPU / 4GB RAM works well for most projects ($5-6/month)
3. **Pick a region** — choose one close to your webhook sources
4. **SSH key** — generate a new key or use an existing one
5. **HTTPS (optional)** — set up TLS via Cloudflare

### TLS with Cloudflare (recommended)

If you choose HTTPS, you'll need the Cloudflare account API token.

What Action Llama sets up:

- DNS A record pointing to your VPS (proxied through Cloudflare)
- Cloudflare Origin CA certificate on the server
- nginx reverse proxy with TLS termination
- Cloudflare SSL mode set to Full (Strict)

For convenience, set the environment as the default:

```bash
al env set production
```

## 2. Deploy

```bash
al push
```

This syncs your project files and credentials to the server, installs dependencies, and starts the scheduler as a systemd service. The first push takes a minute or two; subsequent pushes are faster (rsync only transfers changes).

## 3. Verify

```bash
al env check production   # SSH + Docker + health check
al stat -E production     # Agent status on the server
al env logs production -f # Tail server logs
```

## 4. Update a single agent

After making changes to one agent, push just that agent:

```bash
al push dev -E production
```

This hot-reloads the agent without restarting the scheduler. The file watcher detects the change and picks up the new config/actions automatically.

## 5. Tear down

```bash
al env deprov production
```

Stops containers, cleans credentials, deletes DNS records, and destroys the VPS instance.

## Cost comparison

| Provider | vCPU | RAM | Storage | Price/month |
|----------|------|-----|---------|-------------|
| Hetzner | 1 | 2GB | 20GB SSD | ~$4 |
| Vultr | 1 | 1GB | 25GB SSD | $6 |
| DigitalOcean | 1 | 1GB | 25GB SSD | $6 |
| Linode | 1 | 1GB | 25GB SSD | $5 |

## Alternative: manual deployment

If you prefer to manage the server directly:

```bash
# On your VPS:
npm install -g @action-llama/action-llama
al new my-project && cd my-project
al doctor
al start -w --expose --headless
```

Then set up systemd for automatic restarts. See [VPS Deployment — concepts](/concepts/vps-deployment) for the full systemd unit file.
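As a starting point, a minimal unit might look like this. The paths, flags, and unit name are assumptions; the linked concepts page has the full version:

```ini
# /etc/systemd/system/action-llama.service
[Unit]
Description=Action Llama scheduler
After=network-online.target docker.service

[Service]
WorkingDirectory=/opt/action-llama/my-project
ExecStart=/usr/bin/al start -w --expose --headless
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now action-llama` so it starts on boot and restarts on failure.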
|
|
1313
|
+
|
|
1314
|
+
## Next steps
|
|
1315
|
+
|
|
1316
|
+
- [VPS Deployment (concepts)](/concepts/vps-deployment) — understand what happens under the hood
|
|
1317
|
+
- [CLI Commands](/reference/cli-commands) — full `al push` and `al env` reference
|
|
1318
|
+
- [Web Dashboard](/reference/web-dashboard) — monitor your deployed agents in a browser
|
|
1319
|
+
|
|
1320
|
+
---
|
|
1321
|
+
|
|
1322
|
+
# Running Agents on Cloud Run Jobs

The **Cloud Run Jobs runtime** lets you run agent containers on [Google Cloud Run Jobs](https://cloud.google.com/run/docs/create-jobs) instead of a local Docker daemon or VPS. The scheduler continues to run wherever you host it (local machine, VPS, or CI), while agent execution is offloaded to Cloud Run.

## Overview

- **Agents run as Cloud Run Jobs** — ephemeral, serverless, one job per agent run
- **Credentials via Secret Manager** — each credential field is stored as a Secret Manager secret, mounted into the job container at `/credentials/<type>/<instance>/<field>`
- **Images via Artifact Registry** — agent images are pushed to Google Artifact Registry; old tags are automatically pruned
- **Logs via Cloud Logging** — structured logs are streamed from Cloud Logging to your scheduler
- **Public gateway required** — agents need to reach the gateway for registration, locks, and return values; the gateway URL must be publicly accessible

## Prerequisites

- A Google Cloud project with billing enabled
- The following APIs enabled:
  - `run.googleapis.com` (Cloud Run)
  - `secretmanager.googleapis.com` (Secret Manager)
  - `artifactregistry.googleapis.com` (Artifact Registry)
  - `logging.googleapis.com` (Cloud Logging)
- A GCP service account with these roles:
  - `roles/run.admin`
  - `roles/secretmanager.admin`
  - `roles/artifactregistry.admin`
  - `roles/logging.viewer`
- An Artifact Registry Docker repository in your project

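On a new project, the four required APIs can be enabled in one `gcloud` command (the project ID below is a placeholder):

```bash
# Enable all four required APIs (replace my-project with your project ID)
gcloud services enable \
  run.googleapis.com \
  secretmanager.googleapis.com \
  artifactregistry.googleapis.com \
  logging.googleapis.com \
  --project=my-project
```
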
## Setup

### 1. Create a service account

In the GCP console or via `gcloud`:

```bash
# Create the service account
gcloud iam service-accounts create al-agent-runtime \
  --project=my-project \
  --display-name="Action Llama Agent Runtime"

# Grant required roles
for ROLE in run.admin secretmanager.admin artifactregistry.admin logging.viewer; do
  gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:al-agent-runtime@my-project.iam.gserviceaccount.com" \
    --role="roles/$ROLE"
done

# Create and download a key
gcloud iam service-accounts keys create ~/al-agent-runtime-key.json \
  --iam-account=al-agent-runtime@my-project.iam.gserviceaccount.com
```

### 2. Add the credential to Action Llama

```bash
al cred add gcp_service_account
# Paste the contents of ~/al-agent-runtime-key.json when prompted
```

### 3. Create an Artifact Registry repository

```bash
gcloud artifacts repositories create al-agents \
  --repository-format=docker \
  --location=us-central1 \
  --project=my-project
```

### 4. Configure your environment

Add Cloud Run configuration to your environment file (`~/.action-llama/environments/<name>.toml`):

```toml
[cloud]
provider = "cloud-run"
project = "my-project"
region = "us-central1"
artifact_registry = "al-agents"
# Optional: service account email for job execution identity
# service_account = "al-agent-runner@my-project.iam.gserviceaccount.com"
```

Also ensure your gateway has a public URL configured:

```toml
[gateway]
url = "https://your-gateway.example.com"
```

## How It Works

### Credential mounting

Before each agent run, the scheduler creates ephemeral Secret Manager secrets — one per credential field. Each secret is mounted into the Cloud Run Job container at `/credentials/<type>/<instance>/<field>`, preserving the exact path layout that agents expect.

After the job completes, the runtime deletes all ephemeral secrets. This is equivalent to the Docker volume mount used by local and VPS runtimes.

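As a concrete example of that layout — using a hypothetical `github` credential instance named `default` with a `token` field — the path inside the job container expands like this:

```bash
# Hypothetical credential: type=github, instance=default, field=token
type=github instance=default field=token
echo "/credentials/${type}/${instance}/${field}"
# -> /credentials/github/default/token
```
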
### Image lifecycle

When you run `al push` or build an agent image:

1. The image is built locally using `docker build`
2. Tagged as `<region>-docker.pkg.dev/<project>/<registry>/<image>:<tag>`
3. Pushed to Artifact Registry
4. Old tags are automatically pruned — only the 3 most recent tags per image are kept

To avoid unbounded storage costs, we recommend also setting up [Artifact Registry cleanup policies](https://cloud.google.com/artifact-registry/docs/repositories/cleanup-policy) as an additional safeguard.

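The tag format from step 2 expands like this (all values below are hypothetical):

```bash
# Hypothetical values showing the Artifact Registry tag layout
region=us-central1 project=my-project registry=al-agents image=dev tag=a1b2c3
echo "${region}-docker.pkg.dev/${project}/${registry}/${image}:${tag}"
# -> us-central1-docker.pkg.dev/my-project/al-agents/dev:a1b2c3
```
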
### Job execution

Each agent run:

1. Creates a Cloud Run Job (`al-<agentName>-<runId>`)
2. Runs the job with `maxRetries: 0` (one-shot, no automatic retries)
3. Configures a 1-hour default timeout (configurable via `timeout` in agent config)
4. Streams logs from Cloud Logging (with ~5–10s ingestion latency)
5. Polls for completion every 5 seconds
6. Deletes the job and its associated secrets after completion

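While a run's job still exists (it is deleted after completion, per step 6), you can inspect it directly with `gcloud` — the job name follows the format from step 1; the IDs below are hypothetical:

```bash
# List executions of a specific agent-run job (name format: al-<agentName>-<runId>)
gcloud run jobs executions list --job=al-dev-a1b2c3 --region=us-central1
```
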
### Orphan recovery

Cloud Run Jobs are ephemeral. If the scheduler restarts, it can discover running jobs via `listRunningAgents()`. However, because Cloud Run Jobs don't expose container environment variables via an inspect API, orphaned jobs are **killed** rather than re-adopted. This is acceptable for ephemeral workloads.

## Cost considerations

| Resource | Cost |
|----------|------|
| Cloud Run Jobs | ~$0.00002400 per vCPU-second, ~$0.00000250 per GiB-second |
| Secret Manager | $0.06/10,000 API operations; $0.06/active secret version/month |
| Artifact Registry | ~$0.10/GB/month for stored images |
| Cloud Logging | First 50 GiB/month free; $0.01/GiB after |

For a typical agent run (2 vCPU, 2 GiB RAM, 5 minutes): ~$0.015 in Cloud Run compute.

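That figure follows directly from the per-second prices in the table:

```bash
# 2 vCPU and 2 GiB for 300 seconds at the table's Cloud Run rates
awk 'BEGIN {
  vcpu = 2 * 300 * 0.000024    # vCPU-seconds
  mem  = 2 * 300 * 0.0000025   # GiB-seconds
  printf "$%.4f\n", vcpu + mem
}'
# -> $0.0159
```

The prices are approximate and vary by region, so treat this as a ballpark.
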
## Limitations

- **Agents require a public gateway URL** — Cloud Run Jobs run in Google's infrastructure and can't reach a purely local scheduler. Configure `gateway.url` to point to a publicly accessible gateway.
- **No real-time log streaming** — Cloud Logging has 5–10s ingestion latency; logs are polled every 3 seconds.
- **No container inspect** — orphaned jobs are killed, not re-adopted.
- **Image builds are local** — the `docker build` step runs where the scheduler runs (your machine or VPS). The built image is then pushed to Artifact Registry.
- **Secret Manager quotas** — each credential field creates a Secret Manager secret. With many credentials and frequent runs, you may hit the default quota of 9,000 write operations per minute. Request a quota increase if needed.

## Troubleshooting

**Agents can't reach the gateway**

Ensure `gateway.url` in your config points to a publicly reachable URL. The agent container runs in Google Cloud, not on your local network.

**Secret Manager permission denied**

The service account needs `roles/secretmanager.admin`. If you're using a dedicated execution service account (via `service_account` in config), that account also needs `roles/secretmanager.secretAccessor`.

**Artifact Registry authentication fails**

Ensure Docker is configured to authenticate with Artifact Registry:

```bash
gcloud auth configure-docker us-central1-docker.pkg.dev
```

**Cloud Run Job creation fails with quota error**

Check your Cloud Run quotas in the GCP console. The default job limit per region is 1000. Request an increase if needed.

**Logs appear delayed**

Cloud Logging has 5–10s ingestion latency. This is expected. For debugging, check logs directly in the GCP console at:
`https://console.cloud.google.com/run/jobs/details/<region>/<jobId>/executions?project=<project>`

---

# Continuous Deployment

This guide sets up a CI pipeline that automatically deploys your Action Llama project whenever changes land on your main branch.

## Overview

The pipeline works like this:

1. Code is pushed to `main` (or a dependency updates)
2. GitHub Actions runs `npm install` and `al push --headless --no-creds`
3. Your server receives the updated project files and restarts the scheduler

Credentials are managed separately — the CI workflow only deploys code and agent configs, not secrets.

### Prerequisites

- A VPS already provisioned and working with `al push` (see [Deploying to a VPS](/guides/deploying-to-vps))
- Credentials already on the server (pushed once via `al push` locally, or managed separately)
- Your project in a GitHub repository

## 1. Set up GitHub secrets

You need two secrets in your GitHub repository (Settings > Secrets and variables > Actions):

| Secret | Contents |
|--------|----------|
| `DEPLOY_SSH_KEY` | SSH private key for the server (the same key used by `al push`) |
| `DEPLOY_ENV_TOML` | Your environment TOML file contents |

### Getting the environment TOML

Copy the contents of your environment file — this is the file at `~/.action-llama/environments/<name>.toml` on your local machine. It should look something like:

```toml
[server]
host = "203.0.113.42"
user = "root"
keyPath = "~/.ssh/deploy_key"
```

Set `keyPath` to `~/.ssh/deploy_key` — this is where the CI workflow will write the SSH key.

### Getting the SSH key

This is the private key that `al push` uses to connect to your server. If you provisioned with `al env prov`, it was generated automatically and stored in the credential system. Copy it from:

```bash
cat ~/.action-llama/credentials/vps_ssh/default/private_key
```

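If you use the GitHub CLI, both secrets can be set directly from these files (run inside the repository; paths assume the `al env prov` default key location and an environment named `prod`):

```bash
# Store both deploy secrets from their local files
gh secret set DEPLOY_SSH_KEY < ~/.action-llama/credentials/vps_ssh/default/private_key
gh secret set DEPLOY_ENV_TOML < ~/.action-llama/environments/prod.toml
```
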
## 2. Create the deploy workflow

Add this file to your project repository:

```yaml
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: deploy
  cancel-in-progress: false

jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.DEPLOY_SSH_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key

      - name: Set up environment config
        run: |
          mkdir -p ~/.action-llama/environments
          echo '${{ secrets.DEPLOY_ENV_TOML }}' > ~/.action-llama/environments/prod.toml

      - name: Deploy
        run: npx al push --env prod --headless --no-creds
```

This installs your project (including Action Llama), writes the SSH key and environment config to the expected paths, and runs `al push` in headless mode with credential syncing disabled.

## 3. Managing credentials separately

Since the CI workflow skips credentials (`--no-creds`), you need to push credentials to the server separately. Do this from your local machine:

```bash
al push --creds-only --env prod
```

Run this whenever you add or rotate a credential. The server retains credentials across code deploys — `al push --no-creds` only syncs project files.

## Cross-repo triggers

If your agent project depends on a package in another repository (e.g., a shared Action Llama fork), you can trigger deploys automatically when that upstream repo changes.

### Using repository dispatch

In the **upstream** repository's CI workflow, add a step that fires a deploy event after tests pass:

```yaml
- name: Trigger deploy
  if: github.ref == 'refs/heads/main'
  run: |
    gh api repos/<your-org>/<your-agents-repo>/dispatches \
      -f event_type=deploy
  env:
    GH_TOKEN: ${{ secrets.AGENTS_DEPLOY_TOKEN }}
```

Then update your deploy workflow to also listen for this event:

```yaml
on:
  push:
    branches: [main]
  repository_dispatch:
    types: [deploy]
  workflow_dispatch:
```

The `AGENTS_DEPLOY_TOKEN` secret needs to be a GitHub personal access token (or fine-grained token) with `contents: write` permission on the agents repository.

### Installing from GitHub instead of npm

If you want your agents project to always use the latest version from a GitHub repository rather than a published npm package, update your `package.json`:

```json
{
  "dependencies": {
    "@action-llama/action-llama": "github:<your-org>/action-llama#main"
  }
}
```

When `npm install` runs in CI, it clones the repo, runs the `prepare` script (which builds the TypeScript), and installs the result. Combined with a repository dispatch trigger, this gives you fully automated end-to-end deployment: merge to the upstream repo triggers a deploy of your agents project with the latest version.

## Verifying deploys

After a deploy, you can check the status from your local machine:

```bash
al stat --env prod     # Agent status on the server
al logs --env prod -f  # Tail server logs
```

Or check the GitHub Actions run output — `al push` prints deployment progress and a health check result at the end.

## Next steps

- [Deploying to a VPS](/guides/deploying-to-vps) — initial server setup
- [CLI Commands](/reference/cli-commands) — full `al push` flag reference
- [Credentials](/reference/credentials) — how credentials are stored and synced

---

# Custom Dockerfiles

Action Llama agents run in Docker containers built from a minimal Alpine-based image with Node.js, git, and curl. Agents that need extra tools can add a `Dockerfile` to their directory.

Custom Dockerfiles only apply to agents using the default container runtime. Agents configured with the [host-user runtime](/reference/agent-config#runtime) do not use Docker and will ignore any Dockerfile.

Project-level Dockerfiles are also supported but not recommended — they make agents harder to reuse across projects. See the [Dockerfiles reference](/reference/dockerfiles#project-dockerfile) for details.

## Agent Dockerfiles

To customize a single agent's image, place a `Dockerfile` next to its `SKILL.md`:

```
my-project/
  agents/
    dev/
      SKILL.md
      Dockerfile      <-- custom image for this agent
    reviewer/
      SKILL.md        <-- no Dockerfile, uses base image
```

Use `FROM al-agent:latest` and add what you need. The build pipeline automatically rewrites the `FROM` line at build time. Switch to `root` to install packages, then back to `node`:

```dockerfile
FROM al-agent:latest

USER root
RUN apk add --no-cache github-cli
USER node
```

This is a thin layer on top of the base — fast to build and shares most of the image.

## Common additions

```dockerfile
# GitHub CLI (for gh issue list, gh pr create, etc.)
RUN apk add --no-cache github-cli

# Python (for agents that run Python scripts)
RUN apk add --no-cache python3 py3-pip

# jq (for JSON processing in bash) — already in the base image
# RUN apk add --no-cache jq
```

## Writing a standalone Dockerfile

If you need full control, you can write a Dockerfile from scratch. It must:

1. Include Node.js 20+
2. Copy the application code from the base image or install it
3. Set `ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]`
4. Use uid 1000 (`USER node` on node images) for compatibility with the container launcher

Example standalone Dockerfile:

```dockerfile
FROM node:20-alpine

# Install your tools
RUN apk add --no-cache git curl ca-certificates openssh-client github-cli jq python3

# Copy app from the base image (avoids rebuilding from source)
COPY --from=al-agent:latest /app /app
WORKDIR /app

USER node
ENTRYPOINT ["node", "/app/dist/agents/container-entry.js"]
```

The key requirement is that `/app/dist/agents/container-entry.js` exists and can run. The entry point reads `AGENT_CONFIG`, `PROMPT`, `GATEWAY_URL`, and `SHUTDOWN_SECRET` from environment variables, and credentials from `/credentials/`.

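One way to smoke-test a standalone image before wiring it into the scheduler is to run it by hand. Everything below is a placeholder sketch — the image tag, the `AGENT_CONFIG` shape, and the local `creds/` directory are assumptions; the real scheduler injects the actual values:

```bash
# Placeholder env values; the real scheduler supplies the actual config and secret
docker run --rm \
  -e AGENT_CONFIG='{"name":"dev"}' \
  -e PROMPT='say hello' \
  -e GATEWAY_URL='http://host.docker.internal:8080' \
  -e SHUTDOWN_SECRET='dev-only' \
  -v "$PWD/creds:/credentials:ro" \
  my-standalone-agent:latest
```
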
## Next steps

- [Dockerfiles reference](/reference/dockerfiles) — build behavior, image contents, filesystem layout, and configuration
- [Scaling Agents](/guides/scaling-agents) — run multiple instances of an agent

---

# Gateway API

The gateway is the HTTP server that runs alongside the scheduler. It handles webhooks, serves the [web dashboard](/reference/web-dashboard), and exposes control and status APIs used by CLI commands and the dashboard.

The gateway starts automatically when needed — either when webhooks are configured, when `--web-ui` is passed to `al start`, or when Docker container communication is required. The port is controlled by the `[gateway].port` setting in `config.toml` (default: `8080`).

## Authentication

The gateway API is protected by an API key. The same key is used for both browser sessions and CLI access.

**Key location:** `~/.action-llama/credentials/gateway_api_key/default/key`

The key is generated automatically by `al doctor` or on first `al start`. To view or regenerate it, run `al doctor`.

### CLI access

CLI commands (`al stat`, `al pause`, `al resume`, `al kill`) automatically read the API key from the credential store and send it as a `Bearer` token in the `Authorization` header.

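The same scheme works from `curl` — for example, to pause the scheduler (assuming the default port and key location):

```bash
# Read the API key from the credential store and call a control endpoint
KEY=$(cat ~/.action-llama/credentials/gateway_api_key/default/key)
curl -X POST -H "Authorization: Bearer $KEY" http://localhost:8080/control/pause
```
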
### Browser access

The [web dashboard](/reference/web-dashboard) uses cookie-based authentication. After logging in with the API key, an `al_session` cookie is set (HttpOnly, SameSite=Strict) so all subsequent requests — including SSE streams — are authenticated automatically.

### Protected routes

The following routes require authentication:

- `/dashboard` and `/dashboard/*` — all dashboard pages and SSE streams
- `/control/*` — scheduler and agent control endpoints
- `/locks/status` — active lock information

Health checks (`/health`), webhook endpoints (`/webhooks/*`), and container management routes are **not** protected.

### Migrating from `AL_DASHBOARD_SECRET`

The old `AL_DASHBOARD_SECRET` environment variable (HTTP Basic Auth) is no longer used. If it's still set, a deprecation warning is logged. Remove it from your environment and run `al doctor` to set up the new API key.

## Control API

All control endpoints use `POST` and require authentication.

### Scheduler control

| Endpoint | Description |
|----------|-------------|
| `POST /control/pause` | Pause the scheduler (all cron jobs) |
| `POST /control/resume` | Resume the scheduler |

### Agent control

| Endpoint | Description |
|----------|-------------|
| `POST /control/trigger/<name>` | Trigger an immediate agent run |
| `POST /control/agents/<name>/enable` | Enable a disabled agent |
| `POST /control/agents/<name>/disable` | Disable an agent (pauses its cron job) |
| `POST /control/agents/<name>/pause` | Pause an agent (alias for disable) |
| `POST /control/agents/<name>/resume` | Resume an agent (alias for enable) |
| `POST /control/agents/<name>/kill` | Kill all running instances of an agent |

## Status API

### SSE streams

Live updates use **Server-Sent Events (SSE)**:

| Endpoint | Description |
|----------|-------------|
| `GET /dashboard/api/status-stream` | Pushes agent status and scheduler info whenever state changes |
| `GET /dashboard/api/logs/<agent>/stream` | Streams log lines for a specific agent (500ms poll interval) |

### Trigger history

| Endpoint | Description |
|----------|-------------|
| `GET /api/stats/triggers` | Paginated trigger history (cron, webhook, agent-call). Supports query params: `page`, `limit`, `deadLetter` (boolean). |
| `POST /api/webhooks/:receiptId/replay` | Re-dispatch a stored webhook payload by receipt ID. Returns the dispatch result. |

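For example, to page through dead-letter receipts and replay one — assuming the default port and that these routes accept the same Bearer key as the CLI; `<receiptId>` stays a placeholder:

```bash
KEY=$(cat ~/.action-llama/credentials/gateway_api_key/default/key)

# First page of dead-letter webhook receipts
curl -H "Authorization: Bearer $KEY" \
  "http://localhost:8080/api/stats/triggers?page=1&limit=20&deadLetter=true"

# Re-dispatch a stored payload by its receipt ID
curl -X POST -H "Authorization: Bearer $KEY" \
  "http://localhost:8080/api/webhooks/<receiptId>/replay"
```
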
### Health check

| Endpoint | Description |
|----------|-------------|
| `GET /health` | Health check (no authentication required) |

### Lock status

| Endpoint | Description |
|----------|-------------|
| `GET /locks/status` | Active resource lock information (requires authentication) |

---

# Web Dashboard

Action Llama includes an optional web-based dashboard for monitoring agents in your browser. It provides a live view of agent statuses and streaming logs — similar to the terminal TUI, but accessible from any browser.

## Enabling the Dashboard

Pass `-w` or `--web-ui` to `al start`:

```bash
al start -w
```

The dashboard URL is shown in the TUI header and in headless log output once the scheduler starts:

```
Dashboard: http://localhost:8080/dashboard
```

The port is controlled by the `[gateway].port` setting in `config.toml` (default: `8080`).

## Authentication

The dashboard is protected by the gateway API key. Navigate to `http://localhost:8080/dashboard` and you'll be redirected to a login page where you paste your API key. On success, an `al_session` cookie is set (HttpOnly, SameSite=Strict) so all subsequent requests — including SSE streams — are authenticated automatically.

A **Logout** link is available in the dashboard header.

See [Gateway API — Authentication](/reference/gateway-api#authentication) for details on key management and protected routes.

## Dashboard Pages

### Main Page — `/dashboard`

Displays a live overview of all agents:

| Column | Description |
|--------|-------------|
| Agent | Agent name (click to view logs) |
| State | Current state: idle, running, building, or error |
| Status | Latest status text or error message |
| Last Run | Timestamp of the most recent run |
| Duration | How long the last run took |
| Next Run | When the next scheduled run will happen |
| Actions | **Run** (trigger an immediate run) and **Enable/Disable** (toggle the agent) |

The header also includes:

- **Pause/Resume** button — pauses or resumes the scheduler (all cron jobs)
- **Logout** link — clears the session cookie and redirects to the login page

Below the table, a **Recent Activity** section shows the last 20 log lines across all agents.

All data updates in real time via Server-Sent Events (SSE) — no manual refresh needed.

### Trigger History — `/dashboard/triggers`

Displays a paginated table of all trigger events — cron, webhook, and agent-call triggers — with the outcome of each. Includes a toggle to show dead-letter webhook receipts (payloads that arrived but did not match any agent or failed validation).

Features:
- **Pagination** — browse through historical triggers
- **Dead-letter toggle** — show webhook payloads that were received but not dispatched
- **Replay** — re-dispatch a stored webhook payload to matching agents

The same data is available via the [Trigger History API](/reference/gateway-api#trigger-history).

### Agent Logs — `/dashboard/agents/<name>/logs`

Displays a live-streaming log view for a single agent. Logs follow automatically by default (new entries scroll into view as they arrive).

Features:
- **Follow mode** — enabled by default, auto-scrolls to the latest log entry. Scrolling up pauses follow; scrolling back to the bottom re-enables it.
- **Clear** — clears the log display (does not delete log files).
- **Connection status** — shows whether the SSE connection is active.
- **Log levels** — color-coded: green for INFO, yellow for WARN, red for ERROR.

On initial load, the last 100 log entries from the agent's log file are displayed, then new entries stream in as they are written.

## How It Works

The dashboard is served by the same [gateway](/reference/gateway-api) that handles webhooks and container communication. When `--web-ui` is enabled, the gateway starts even if Docker and webhooks are not configured.

Dashboard actions (Run, Enable/Disable, Pause/Resume) call the [control API](/reference/gateway-api#control-api) endpoints. Live updates are delivered via [SSE streams](/reference/gateway-api#sse-streams).

No additional dependencies or frontend build steps are required. The dashboard is rendered as plain HTML with inline CSS and JavaScript.