@action-llama/action-llama 0.12.2 → 0.13.1
- package/{docs/agent-reference → agent-docs}/AGENTS.md +31 -15
- package/{docs/agent-reference → agent-docs}/skills/README.md +1 -0
- package/agent-docs/skills/calls.md +82 -0
- package/{docs/agent-reference → agent-docs}/skills/resource-locks.md +13 -7
- package/{docs/agent-reference → agent-docs}/skills/signals.md +1 -1
- package/dist/agents/container-runner.d.ts +3 -2
- package/dist/agents/container-runner.d.ts.map +1 -1
- package/dist/agents/container-runner.js +12 -12
- package/dist/agents/container-runner.js.map +1 -1
- package/dist/agents/prompt.d.ts.map +1 -1
- package/dist/agents/prompt.js +3 -2
- package/dist/agents/prompt.js.map +1 -1
- package/dist/agents/runner.d.ts +3 -2
- package/dist/agents/runner.d.ts.map +1 -1
- package/dist/agents/runner.js +14 -14
- package/dist/agents/runner.js.map +1 -1
- package/dist/build-info.json +1 -1
- package/dist/cli/commands/doctor.d.ts +1 -0
- package/dist/cli/commands/doctor.d.ts.map +1 -1
- package/dist/cli/commands/doctor.js +53 -15
- package/dist/cli/commands/doctor.js.map +1 -1
- package/dist/cli/commands/env.d.ts +4 -0
- package/dist/cli/commands/env.d.ts.map +1 -1
- package/dist/cli/commands/env.js +41 -0
- package/dist/cli/commands/env.js.map +1 -1
- package/dist/cli/commands/kill.js +2 -2
- package/dist/cli/commands/kill.js.map +1 -1
- package/dist/cli/commands/logs.d.ts.map +1 -1
- package/dist/cli/commands/logs.js +25 -20
- package/dist/cli/commands/logs.js.map +1 -1
- package/dist/cli/commands/pause.js +2 -2
- package/dist/cli/commands/pause.js.map +1 -1
- package/dist/cli/commands/push.d.ts +1 -0
- package/dist/cli/commands/push.d.ts.map +1 -1
- package/dist/cli/commands/push.js +2 -1
- package/dist/cli/commands/push.js.map +1 -1
- package/dist/cli/commands/resume.js +2 -2
- package/dist/cli/commands/resume.js.map +1 -1
- package/dist/cli/commands/run.d.ts.map +1 -1
- package/dist/cli/commands/run.js +21 -46
- package/dist/cli/commands/run.js.map +1 -1
- package/dist/cli/commands/start.d.ts.map +1 -1
- package/dist/cli/commands/start.js +62 -2
- package/dist/cli/commands/start.js.map +1 -1
- package/dist/cli/commands/status.d.ts.map +1 -1
- package/dist/cli/commands/status.js +23 -7
- package/dist/cli/commands/status.js.map +1 -1
- package/dist/cli/commands/stop.d.ts +1 -0
- package/dist/cli/commands/stop.d.ts.map +1 -1
- package/dist/cli/commands/stop.js +3 -2
- package/dist/cli/commands/stop.js.map +1 -1
- package/dist/cli/gateway-client.d.ts +6 -0
- package/dist/cli/gateway-client.d.ts.map +1 -1
- package/dist/cli/gateway-client.js +19 -0
- package/dist/cli/gateway-client.js.map +1 -1
- package/dist/cli/main.js +12 -0
- package/dist/cli/main.js.map +1 -1
- package/dist/cloud/vps/constants.d.ts +1 -1
- package/dist/cloud/vps/constants.d.ts.map +1 -1
- package/dist/cloud/vps/constants.js +9 -0
- package/dist/cloud/vps/constants.js.map +1 -1
- package/dist/cloud/vps/hetzner-api.d.ts +14 -3
- package/dist/cloud/vps/hetzner-api.d.ts.map +1 -1
- package/dist/cloud/vps/hetzner-api.js +24 -11
- package/dist/cloud/vps/hetzner-api.js.map +1 -1
- package/dist/cloud/vps/provision.js +29 -6
- package/dist/cloud/vps/provision.js.map +1 -1
- package/dist/cloud/vps/ssh.d.ts +7 -0
- package/dist/cloud/vps/ssh.d.ts.map +1 -1
- package/dist/cloud/vps/ssh.js +15 -1
- package/dist/cloud/vps/ssh.js.map +1 -1
- package/dist/credentials/builtins/index.d.ts.map +1 -1
- package/dist/credentials/builtins/index.js +2 -0
- package/dist/credentials/builtins/index.js.map +1 -1
- package/dist/credentials/builtins/reddit-oauth.d.ts +4 -0
- package/dist/credentials/builtins/reddit-oauth.d.ts.map +1 -0
- package/dist/credentials/builtins/reddit-oauth.js +71 -0
- package/dist/credentials/builtins/reddit-oauth.js.map +1 -0
- package/dist/docker/local-runtime.d.ts +1 -0
- package/dist/docker/local-runtime.d.ts.map +1 -1
- package/dist/docker/local-runtime.js +9 -6
- package/dist/docker/local-runtime.js.map +1 -1
- package/dist/gateway/index.d.ts.map +1 -1
- package/dist/gateway/index.js +5 -4
- package/dist/gateway/index.js.map +1 -1
- package/dist/gateway/routes/logs.d.ts.map +1 -1
- package/dist/gateway/routes/logs.js +29 -111
- package/dist/gateway/routes/logs.js.map +1 -1
- package/dist/remote/bootstrap.d.ts +2 -0
- package/dist/remote/bootstrap.d.ts.map +1 -1
- package/dist/remote/bootstrap.js +7 -11
- package/dist/remote/bootstrap.js.map +1 -1
- package/dist/remote/push.d.ts +6 -0
- package/dist/remote/push.d.ts.map +1 -1
- package/dist/remote/push.js +172 -91
- package/dist/remote/push.js.map +1 -1
- package/dist/remote/ssh.d.ts +1 -0
- package/dist/remote/ssh.d.ts.map +1 -1
- package/dist/remote/ssh.js +8 -0
- package/dist/remote/ssh.js.map +1 -1
- package/dist/scheduler/index.d.ts.map +1 -1
- package/dist/scheduler/index.js +56 -7
- package/dist/scheduler/index.js.map +1 -1
- package/dist/scheduler/watcher.d.ts +1 -1
- package/dist/scheduler/watcher.d.ts.map +1 -1
- package/dist/scheduler/watcher.js +5 -6
- package/dist/scheduler/watcher.js.map +1 -1
- package/dist/setup/scaffold.js +2 -2
- package/dist/setup/scaffold.js.map +1 -1
- package/dist/shared/config.d.ts +1 -0
- package/dist/shared/config.d.ts.map +1 -1
- package/dist/shared/config.js.map +1 -1
- package/dist/shared/credentials.d.ts +8 -18
- package/dist/shared/credentials.d.ts.map +1 -1
- package/dist/shared/credentials.js +8 -62
- package/dist/shared/credentials.js.map +1 -1
- package/dist/shared/server.d.ts +2 -0
- package/dist/shared/server.d.ts.map +1 -1
- package/dist/shared/server.js.map +1 -1
- package/dist/tui/App.d.ts.map +1 -1
- package/dist/tui/App.js +1 -1
- package/dist/tui/App.js.map +1 -1
- package/dist/webhooks/definitions/github.d.ts.map +1 -1
- package/dist/webhooks/definitions/github.js +13 -0
- package/dist/webhooks/definitions/github.js.map +1 -1
- package/dist/webhooks/providers/github.d.ts.map +1 -1
- package/dist/webhooks/providers/github.js +6 -0
- package/dist/webhooks/providers/github.js.map +1 -1
- package/dist/webhooks/registry.d.ts.map +1 -1
- package/dist/webhooks/registry.js +9 -3
- package/dist/webhooks/registry.js.map +1 -1
- package/dist/webhooks/types.d.ts +3 -1
- package/dist/webhooks/types.d.ts.map +1 -1
- package/docker/bin/_http-exit +17 -0
- package/docker/bin/al-call +10 -4
- package/docker/bin/al-check +9 -3
- package/docker/bin/al-status +1 -1
- package/docker/bin/al-wait +11 -3
- package/docker/bin/rlock +9 -2
- package/docker/bin/rlock-heartbeat +9 -2
- package/docker/bin/runlock +9 -2
- package/package.json +2 -2
- package/docs/agent-config-reference.md +0 -313
- package/docs/agents.md +0 -256
- package/docs/cloud-run.md +0 -173
- package/docs/cloud.md +0 -98
- package/docs/commands.md +0 -286
- package/docs/config-reference.md +0 -241
- package/docs/creating-agents.md +0 -147
- package/docs/credentials.md +0 -167
- package/docs/docker.md +0 -323
- package/docs/ecs.md +0 -795
- package/docs/examples/dev/ACTIONS.md +0 -75
- package/docs/examples/dev/README.md +0 -28
- package/docs/examples/dev/agent-config.toml +0 -18
- package/docs/examples/devops/ACTIONS.md +0 -33
- package/docs/examples/devops/README.md +0 -23
- package/docs/examples/devops/agent-config.toml +0 -13
- package/docs/examples/index.md +0 -15
- package/docs/examples/reviewer/ACTIONS.md +0 -37
- package/docs/examples/reviewer/README.md +0 -22
- package/docs/examples/reviewer/agent-config.toml +0 -11
- package/docs/models.md +0 -191
- package/docs/vps-deployment.md +0 -285
- package/docs/web-dashboard.md +0 -113
- package/docs/webhooks.md +0 -152
- package/{docs/agent-reference → agent-docs}/skills/credentials.md +0 -0
- package/{docs/agent-reference → agent-docs}/skills/environment.md +0 -0
package/docs/ecs.md
DELETED
|
@@ -1,795 +0,0 @@
|
|
|
1
|
-
# ECS Fargate Mode
|
|
2
|
-
|
|
3
|
-
Run agents as ECS Fargate tasks on AWS instead of local Docker containers. Agents get the same isolation guarantees with the added benefits of managed infrastructure and per-agent secret isolation via IAM task roles.
|
|
4
|
-
|
|
5
|
-
## Prerequisites
|
|
6
|
-
|
|
7
|
-
- AWS account with ECS, ECR, Secrets Manager, DynamoDB, and CloudWatch Logs access
|
|
8
|
-
- AWS CLI configured (`aws configure`) or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars
|
|
9
|
-
- Docker is **not** required — images are built remotely via AWS CodeBuild and pushed directly to ECR
|
|
10
|
-
- The IAM user running `al setup cloud` needs `iam:CreateServiceLinkedRole` permission (or the service-linked roles for ECS and App Runner must already exist — see [Service-linked roles](#service-linked-roles))
|
|
11
|
-
|
|
12
|
-
## Configuration
|
|
13
|
-
|
|
14
|
-
In your project's `config.toml`:
|
|
15
|
-
|
|
16
|
-
```toml
|
|
17
|
-
[cloud]
|
|
18
|
-
provider = "ecs"
|
|
19
|
-
awsRegion = "us-east-1"
|
|
20
|
-
ecsCluster = "al-cluster"
|
|
21
|
-
ecrRepository = "123456789012.dkr.ecr.us-east-1.amazonaws.com/al-images"
|
|
22
|
-
executionRoleArn = "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
|
|
23
|
-
taskRoleArn = "arn:aws:iam::123456789012:role/al-default-task-role"
|
|
24
|
-
subnets = ["subnet-abc123"]
|
|
25
|
-
# securityGroups = ["sg-abc123"] # optional
|
|
26
|
-
# awsSecretPrefix = "action-llama" # optional, default: "action-llama"
|
|
27
|
-
```
|
|
28
|
-
|
|
29
|
-
| Key | Required | Description |
|-----|----------|-------------|
| `cloud.provider` | Yes | Set to `"ecs"` |
| `cloud.awsRegion` | Yes | AWS region (e.g. `us-east-1`) |
| `cloud.ecsCluster` | Yes | ECS cluster name or ARN |
| `cloud.ecrRepository` | Yes | Full ECR repository URI |
| `cloud.executionRoleArn` | Yes | IAM role for task execution (ECR pull + CloudWatch Logs) |
| `cloud.taskRoleArn` | Yes | Default IAM task role (Secrets Manager access) |
| `cloud.subnets` | Yes | VPC subnet IDs for Fargate tasks |
| `cloud.securityGroups` | No | Security group IDs for Fargate tasks |
| `cloud.awsSecretPrefix` | No | Secrets Manager name prefix (default: `"action-llama"`) |
| `cloud.buildBucket` | No | S3 bucket for CodeBuild source uploads (auto-created if omitted) |
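
As a rough illustration of the required keys in the table above, the following TypeScript sketch checks a parsed `[cloud]` section for missing entries. The `CloudConfig` type and the `missingCloudKeys` function are hypothetical names made up for this example; they are not part of the Action Llama API, and the real validation may differ.

```typescript
// Hypothetical shape of the parsed [cloud] table from config.toml.
type CloudConfig = Record<string, unknown>;

// Keys the table above marks as required (without the "cloud." prefix).
const REQUIRED_CLOUD_KEYS = [
  "provider",
  "awsRegion",
  "ecsCluster",
  "ecrRepository",
  "executionRoleArn",
  "taskRoleArn",
  "subnets",
] as const;

// Returns the required keys that are absent, empty strings, or empty arrays.
function missingCloudKeys(cloud: CloudConfig): string[] {
  return REQUIRED_CLOUD_KEYS.filter((key) => {
    const value = cloud[key];
    if (value === undefined || value === null || value === "") return true;
    return Array.isArray(value) && value.length === 0;
  });
}
```

Calling `missingCloudKeys` on a config that sets only `provider` would list the other six keys, which is one way to produce a single actionable error message instead of failing on the first missing key.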
|
|
41
|
-
|
|
42
|
-
Local Docker settings (`[local]`) control resource limits:
|
|
43
|
-
|
|
44
|
-
| Key | Default | Description |
|-----|---------|-------------|
| `local.memory` | `"4096"` | Memory per task in MiB |
| `local.cpus` | `2` | CPUs per task |
| `local.timeout` | `900` | Default max execution time in seconds (overridable per-agent) |
|
|
49
|
-
|
|
50
|
-
Individual agents can override the timeout in their `agent-config.toml` with the `timeout` field. On ECS, agents with effective timeout <= 900s automatically route to Lambda for faster startup. See [Per-agent timeout](#per-agent-timeout-and-lambda-routing).
|
|
51
|
-
|
|
52
|
-
Optional Lambda configuration (for agents that auto-route to Lambda):
|
|
53
|
-
|
|
54
|
-
| Key | Required | Default | Description |
|-----|----------|---------|-------------|
| `cloud.lambdaRoleArn` | No | auto-derived | Lambda execution role ARN (overrides per-agent role derivation) |
| `cloud.lambdaSubnets` | No | — | VPC subnet IDs for Lambda (only if Lambda needs VPC access) |
| `cloud.lambdaSecurityGroups` | No | — | Security groups for Lambda (only with `lambdaSubnets`) |
|
|
59
|
-
|
|
60
|
-
Optional cloud scheduler configuration (for `al cloud deploy`):
|
|
61
|
-
|
|
62
|
-
| Key | Required | Default | Description |
|-----|----------|---------|-------------|
| `cloud.schedulerCpu` | No | `"256"` | App Runner instance CPU (valid: `256`, `512`, `1024`, `2048`, `4096`) |
| `cloud.schedulerMemory` | No | `"512"` | App Runner instance memory in MB (valid depends on CPU — see [App Runner docs](https://docs.aws.amazon.com/apprunner/latest/dg/manage-configure.html)) |
| `cloud.appRunnerInstanceRoleArn` | No | — | IAM role assumed by the scheduler container (needs ECS, Lambda, Secrets Manager, ECR, CodeBuild, S3, CloudWatch Logs permissions) |
| `cloud.appRunnerAccessRoleArn` | Yes* | — | IAM role that allows App Runner to pull images from ECR (*required only when using `al cloud deploy`) |
|
|
68
|
-
|
|
69
|
-
## Service-linked roles
|
|
70
|
-
|
|
71
|
-
AWS requires service-linked roles for ECS and App Runner. These are account-level roles that AWS services use internally — they only need to be created once per AWS account.
|
|
72
|
-
|
|
73
|
-
`al setup cloud` automatically creates both:
|
|
74
|
-
|
|
75
|
-
- `AWSServiceRoleForECS` (for ECS Fargate task execution)
|
|
76
|
-
- `AWSServiceRoleForAppRunner` (for App Runner service management)
|
|
77
|
-
|
|
78
|
-
If your IAM user lacks `iam:CreateServiceLinkedRole` permission, create them manually:
|
|
79
|
-
|
|
80
|
-
```bash
|
|
81
|
-
aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com
|
|
82
|
-
aws iam create-service-linked-role --aws-service-name apprunner.amazonaws.com
|
|
83
|
-
```
|
|
84
|
-
|
|
85
|
-
These commands are safe to re-run: if the role already exists, they return an error and leave the existing role unchanged.
|
|
86
|
-
|
|
87
|
-
## Quick Setup
|
|
88
|
-
|
|
89
|
-
The fastest way to get started:
|
|
90
|
-
|
|
91
|
-
```bash
|
|
92
|
-
al setup cloud -p .
|
|
93
|
-
```
|
|
94
|
-
|
|
95
|
-
This interactive wizard prompts for all required fields, writes the `[cloud]` config, pushes credentials, and provisions IAM in one step.
|
|
96
|
-
|
|
97
|
-
## Manual Setup
|
|
98
|
-
|
|
99
|
-
### 1. Create an ECS cluster
|
|
100
|
-
|
|
101
|
-
```bash
|
|
102
|
-
aws ecs create-cluster --cluster-name al-cluster --region us-east-1
|
|
103
|
-
```
|
|
104
|
-
|
|
105
|
-
### 2. Create an ECR repository
|
|
106
|
-
|
|
107
|
-
```bash
|
|
108
|
-
aws ecr create-repository --repository-name al-images --region us-east-1
|
|
109
|
-
```
|
|
110
|
-
|
|
111
|
-
### 3. Create the execution role
|
|
112
|
-
|
|
113
|
-
The execution role allows ECS to pull images from ECR and write logs to CloudWatch. Create `ecs-execution-trust.json`:
|
|
114
|
-
|
|
115
|
-
```json
|
|
116
|
-
{
|
|
117
|
-
"Version": "2012-10-17",
|
|
118
|
-
"Statement": [{
|
|
119
|
-
"Effect": "Allow",
|
|
120
|
-
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
|
|
121
|
-
"Action": "sts:AssumeRole"
|
|
122
|
-
}]
|
|
123
|
-
}
|
|
124
|
-
```
|
|
125
|
-
|
|
126
|
-
```bash
|
|
127
|
-
aws iam create-role \
|
|
128
|
-
--role-name ecsTaskExecutionRole \
|
|
129
|
-
--assume-role-policy-document file://ecs-execution-trust.json
|
|
130
|
-
|
|
131
|
-
aws iam attach-role-policy \
|
|
132
|
-
--role-name ecsTaskExecutionRole \
|
|
133
|
-
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
|
|
134
|
-
```
|
|
135
|
-
|
|
136
|
-
The execution role also needs `secretsmanager:GetSecretValue` (so ECS can inject secrets at launch) and `logs:CreateLogGroup` (so ECS can create the CloudWatch log group on first run):
|
|
137
|
-
|
|
138
|
-
```bash
|
|
139
|
-
aws iam put-role-policy \
|
|
140
|
-
--role-name ecsTaskExecutionRole \
|
|
141
|
-
--policy-name ActionLlamaExecution \
|
|
142
|
-
--policy-document '{
|
|
143
|
-
"Version": "2012-10-17",
|
|
144
|
-
"Statement": [
|
|
145
|
-
{
|
|
146
|
-
"Effect": "Allow",
|
|
147
|
-
"Action": "secretsmanager:GetSecretValue",
|
|
148
|
-
"Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:action-llama/*"
|
|
149
|
-
},
|
|
150
|
-
{
|
|
151
|
-
"Effect": "Allow",
|
|
152
|
-
"Action": "logs:CreateLogGroup",
|
|
153
|
-
"Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/ecs/action-llama*"
|
|
154
|
-
}
|
|
155
|
-
]
|
|
156
|
-
}'
|
|
157
|
-
```
|
|
158
|
-
|
|
159
|
-
### 4. Push credentials and create per-agent task roles
|
|
160
|
-
|
|
161
|
-
```bash
|
|
162
|
-
al doctor -c -p .
|
|
163
|
-
```
|
|
164
|
-
|
|
165
|
-
This pushes all local credentials to AWS Secrets Manager, then creates a task role for each agent (`al-{agentName}-task-role`) and grants `secretsmanager:GetSecretValue` scoped to only that agent's declared secrets.
|
|
166
|
-
|
|
167
|
-
> **Re-run after adding agents:** Whenever you add a new agent to your project, re-run `al doctor -c` to create the task role for the new agent. Without this, the new agent will fail to access its credentials at runtime.
|
|
168
|
-
|
|
169
|
-
Alternatively, create roles manually:
|
|
170
|
-
|
|
171
|
-
```bash
|
|
172
|
-
# Create the role
|
|
173
|
-
aws iam create-role \
|
|
174
|
-
--role-name al-dev-task-role \
|
|
175
|
-
--assume-role-policy-document file://ecs-execution-trust.json
|
|
176
|
-
|
|
177
|
-
# Grant access to only this agent's secrets
|
|
178
|
-
aws iam put-role-policy \
|
|
179
|
-
--role-name al-dev-task-role \
|
|
180
|
-
--policy-name SecretsAccess \
|
|
181
|
-
--policy-document '{
|
|
182
|
-
"Version": "2012-10-17",
|
|
183
|
-
"Statement": [{
|
|
184
|
-
"Effect": "Allow",
|
|
185
|
-
"Action": "secretsmanager:GetSecretValue",
|
|
186
|
-
"Resource": [
|
|
187
|
-
"arn:aws:secretsmanager:us-east-1:123456789012:secret:action-llama/github_token/default/*",
|
|
188
|
-
"arn:aws:secretsmanager:us-east-1:123456789012:secret:action-llama/anthropic_key/default/*"
|
|
189
|
-
]
|
|
190
|
-
}]
|
|
191
|
-
}'
|
|
192
|
-
```
|
|
193
|
-
|
|
194
|
-
Repeat for each agent (`al-reviewer-task-role`, `al-devops-task-role`, etc.), scoping each role's policy to only that agent's credential paths.
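
When repeating this for several agents, the policy document can be generated rather than hand-edited. The sketch below builds the same `SecretsAccess` policy shape shown above from a list of credentials. The `CredentialRef` and `PolicyDocument` types and the `secretsPolicy` function are hypothetical helpers for illustration, not something the package is known to export.

```typescript
// A credential as it appears in the secret paths above: type plus instance.
interface CredentialRef {
  type: string; // e.g. "github_token"
  instance: string; // e.g. "default"
}

// Minimal typing for the single-statement policy used in this example.
interface PolicyDocument {
  Version: string;
  Statement: { Effect: string; Action: string; Resource: string[] }[];
}

// Builds a policy that allows reading only the listed credentials' secrets.
// The prefix default mirrors the documented awsSecretPrefix default.
function secretsPolicy(
  region: string,
  accountId: string,
  credentials: CredentialRef[],
  prefix: string = "action-llama",
): PolicyDocument {
  const resources = credentials.map(
    (c) =>
      `arn:aws:secretsmanager:${region}:${accountId}:secret:${prefix}/${c.type}/${c.instance}/*`,
  );
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: "secretsmanager:GetSecretValue", Resource: resources },
    ],
  };
}
```

The result of `JSON.stringify(secretsPolicy(...))` could be passed to `aws iam put-role-policy --policy-document`, keeping each agent's role scoped to its own credential paths.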
|
|
195
|
-
|
|
196
|
-
### 5. Ensure VPC networking
|
|
197
|
-
|
|
198
|
-
Fargate tasks need a VPC subnet with internet access (for pulling images, calling APIs). Use a public subnet with `assignPublicIp: ENABLED` (the default) or a private subnet with a NAT gateway.
|
|
199
|
-
|
|
200
|
-
### 6. Start
|
|
201
|
-
|
|
202
|
-
```bash
|
|
203
|
-
al start -c -p .
|
|
204
|
-
```
|
|
205
|
-
|
|
206
|
-
The scheduler will:
|
|
207
|
-
1. Build agent images remotely via CodeBuild and push to ECR
|
|
208
|
-
2. Create the CloudWatch log group if it doesn't exist
|
|
209
|
-
3. Register ECS task definitions with Secrets Manager secret injection
|
|
210
|
-
4. Run Fargate tasks on schedule or webhook trigger
|
|
211
|
-
5. Stream logs from CloudWatch Logs
|
|
212
|
-
|
|
213
|
-
## Per-agent timeout and Lambda routing
|
|
214
|
-
|
|
215
|
-
When using the ECS provider, agents are automatically routed to the most efficient AWS compute service based on their timeout:
|
|
216
|
-
|
|
217
|
-
- **Timeout <= 900s (15 min):** Routes to **AWS Lambda** — cold starts in ~1-2 seconds, pay-per-100ms pricing
|
|
218
|
-
- **Timeout > 900s:** Routes to **ECS Fargate** — cold starts in ~30-60 seconds, pay-per-second pricing
|
|
219
|
-
|
|
220
|
-
This happens automatically. You control it by setting `timeout` in each agent's `agent-config.toml`:
|
|
221
|
-
|
|
222
|
-
```toml
|
|
223
|
-
# agent-config.toml for a fast webhook responder
|
|
224
|
-
timeout = 300 # 5 minutes — will use Lambda on AWS
|
|
225
|
-
```
|
|
226
|
-
|
|
227
|
-
```toml
|
|
228
|
-
# agent-config.toml for a long-running refactoring agent
|
|
229
|
-
timeout = 3600 # 1 hour — will use ECS Fargate
|
|
230
|
-
```
|
|
231
|
-
|
|
232
|
-
If an agent doesn't set `timeout`, it falls back to `[local].timeout` in `config.toml`, then to the default of 900s. Since 900s is the Lambda maximum, agents without an explicit timeout default to Lambda.
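
The fallback chain and the routing rule described above can be sketched in a few lines of TypeScript. The function and constant names here (`effectiveTimeout`, `routeCompute`, `LAMBDA_MAX_TIMEOUT_S`) are invented for this example and only restate the documented behavior; the actual scheduler code may be organized differently.

```typescript
// Lambda's maximum timeout in seconds, as stated in the text above.
const LAMBDA_MAX_TIMEOUT_S = 900;

type Compute = "lambda" | "fargate";

// Resolve the effective timeout: the agent's own setting first,
// then [local].timeout from config.toml, then the 900s default.
function effectiveTimeout(agentTimeout?: number, localTimeout?: number): number {
  return agentTimeout ?? localTimeout ?? LAMBDA_MAX_TIMEOUT_S;
}

// Route by effective timeout: <= 900s goes to Lambda, longer runs to Fargate.
function routeCompute(agentTimeout?: number, localTimeout?: number): Compute {
  return effectiveTimeout(agentTimeout, localTimeout) <= LAMBDA_MAX_TIMEOUT_S
    ? "lambda"
    : "fargate";
}
```

Under this sketch, an agent with `timeout = 300` resolves to Lambda, one with `timeout = 3600` resolves to Fargate, and an agent with no timeout in a project whose `[local].timeout` is 1800 also resolves to Fargate.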
|
|
233
|
-
|
|
234
|
-
### Lambda memory
|
|
235
|
-
|
|
236
|
-
Lambda functions default to 512 MB of memory, which is sufficient for typical LLM agent workloads (HTTP calls to LLM APIs). Lambda's maximum is 3008 MB — the `local.memory` config value is clamped to this limit for Lambda-routed agents.
|
|
237
|
-
|
|
238
|
-
To increase memory, set `memory` under `[local]` in the project `config.toml`; the value applies project-wide rather than to a single agent:
|
|
239
|
-
|
|
240
|
-
```toml
|
|
241
|
-
[local]
|
|
242
|
-
memory = "2048" # 2 GB — clamped to 3008 for Lambda, used as-is for ECS
|
|
243
|
-
```
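
The clamping mentioned in the comment above amounts to a `min` against the Lambda limit. The following is a minimal sketch, assuming the `local.memory` string holds a whole number of MiB; `lambdaMemoryMb` is a hypothetical name, and how the real code treats an unset value is not covered here.

```typescript
// Lambda memory ceiling quoted in this section.
const LAMBDA_MAX_MEMORY_MB = 3008;

// Parse the string-valued local.memory setting and clamp it for Lambda.
// ECS would use the parsed value as-is, per the text above.
function lambdaMemoryMb(localMemory: string): number {
  const requested = Number.parseInt(localMemory, 10);
  if (Number.isNaN(requested) || requested <= 0) {
    throw new Error(`invalid memory value: ${localMemory}`);
  }
  return Math.min(requested, LAMBDA_MAX_MEMORY_MB);
}
```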
|
|
244
|
-
|
|
245
|
-
### Why Lambda is faster
|
|
246
|
-
|
|
247
|
-
Lambda keeps container images warm in pre-provisioned execution environments. When invoked, Lambda starts executing in ~1-2 seconds. ECS Fargate must provision a fresh VM, pull the image, and start the container — taking 30-60 seconds.
|
|
248
|
-
|
|
249
|
-
For agents that respond to webhooks (e.g., triaging issues, reviewing PRs, responding to alerts), this means the agent starts working almost immediately after the event arrives.
|
|
250
|
-
|
|
251
|
-
### Shared infrastructure
|
|
252
|
-
|
|
253
|
-
Both Lambda and ECS Fargate use the same infrastructure:
|
|
254
|
-
|
|
255
|
-
- **Same ECR images** — built once via CodeBuild, referenced by both runtimes
|
|
256
|
-
- **Same Secrets Manager credentials** — Lambda resolves secrets at invocation time and passes them as environment variables using the same `AL_SECRET_*` naming convention
|
|
257
|
-
- **Same CodeBuild pipeline** — no separate build step needed
|
|
258
|
-
|
|
259
|
-
### Lambda IAM roles
|
|
260
|
-
|
|
261
|
-
`al doctor -c` automatically creates Lambda execution roles (`al-{agentName}-lambda-role`) for agents with timeout <= 900s. These roles include:
|
|
262
|
-
|
|
263
|
-
- `secretsmanager:GetSecretValue` scoped to the agent's declared secrets
|
|
264
|
-
- `logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents` for CloudWatch
|
|
265
|
-
- `ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer` for pulling images
|
|
266
|
-
|
|
267
|
-
To use a shared role instead of per-agent roles, set `cloud.lambdaRoleArn` in `config.toml`.
|
|
268
|
-
|
|
269
|
-
### Operator IAM additions for Lambda
|
|
270
|
-
|
|
271
|
-
If your agents route to Lambda, add these permissions to your operator IAM policy:
|
|
272
|
-
|
|
273
|
-
```json
|
|
274
|
-
{
|
|
275
|
-
"Sid": "Lambda",
|
|
276
|
-
"Effect": "Allow",
|
|
277
|
-
"Action": [
|
|
278
|
-
"lambda:GetFunction",
|
|
279
|
-
"lambda:CreateFunction",
|
|
280
|
-
"lambda:UpdateFunctionCode",
|
|
281
|
-
"lambda:UpdateFunctionConfiguration",
|
|
282
|
-
"lambda:PutFunctionEventInvokeConfig",
|
|
283
|
-
"lambda:InvokeFunction"
|
|
284
|
-
],
|
|
285
|
-
"Resource": "arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:al-*"
|
|
286
|
-
}
|
|
287
|
-
```
|
|
288
|
-
|
|
289
|
-
And extend the `PassRole` condition to include `lambda.amazonaws.com` and the App Runner service principals:
|
|
290
|
-
|
|
291
|
-
```json
|
|
292
|
-
"Condition": {
|
|
293
|
-
"StringEquals": {
|
|
294
|
-
"iam:PassedToService": [
|
|
295
|
-
"ecs-tasks.amazonaws.com",
|
|
296
|
-
"codebuild.amazonaws.com",
|
|
297
|
-
"lambda.amazonaws.com",
|
|
298
|
-
"tasks.apprunner.amazonaws.com",
|
|
299
|
-
"build.apprunner.amazonaws.com"
|
|
300
|
-
]
|
|
301
|
-
}
|
|
302
|
-
}
|
|
303
|
-
```
|
|
304
|
-
|
|
305
|
-
## How it works
|
|
306
|
-
|
|
307
|
-
### Image lifecycle
|
|
308
|
-
|
|
309
|
-
Images are built remotely via AWS CodeBuild and pushed directly to ECR — no local Docker required. This means the scheduler can run anywhere (your machine, Railway, EC2, etc.).
|
|
310
|
-
|
|
311
|
-
Each agent gets its own image tag (`al-{agentName}-latest`). The build happens on every `al start -c` to ensure the latest code is deployed.
|
|
312
|
-
|
|
313
|
-
### How CodeBuild works
|
|
314
|
-
|
|
315
|
-
On each build, the ECS runtime:
|
|
316
|
-
|
|
317
|
-
1. Creates a tarball of the build context
|
|
318
|
-
2. Uploads it to S3 (bucket: `buildBucket` from config, or auto-created as `al-builds-<accountId>-<region>`)
|
|
319
|
-
3. Creates a CodeBuild project (`al-image-builder`) if it doesn't exist
|
|
320
|
-
4. Starts a build that produces and pushes the Docker image to ECR
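
The naming rules used in these steps are simple enough to express directly. This sketch shows the build bucket fallback from step 2 and the per-agent image tag described under "Image lifecycle". Both function names are hypothetical and only restate the documented patterns.

```typescript
// Bucket for CodeBuild source uploads: the configured buildBucket if set,
// otherwise the auto-created name al-builds-<accountId>-<region>.
function buildBucketName(accountId: string, region: string, buildBucket?: string): string {
  return buildBucket ?? `al-builds-${accountId}-${region}`;
}

// Per-agent image tag, following the al-{agentName}-latest pattern.
function imageTag(agentName: string): string {
  return `al-${agentName}-latest`;
}
```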
|
|
321
|
-
|
|
322
|
-
This requires:
|
|
323
|
-
|
|
324
|
-
- An IAM role `al-codebuild-role` that CodeBuild can assume, with ECR push and S3 read permissions
|
|
325
|
-
- The operator IAM policy must include CodeBuild and S3 permissions (see below)
|
|
326
|
-
|
|
327
|
-
To create the CodeBuild service role:
|
|
328
|
-
|
|
329
|
-
```bash
|
|
330
|
-
# Trust policy
|
|
331
|
-
aws iam create-role \
|
|
332
|
-
--role-name al-codebuild-role \
|
|
333
|
-
--assume-role-policy-document '{
|
|
334
|
-
"Version": "2012-10-17",
|
|
335
|
-
"Statement": [{
|
|
336
|
-
"Effect": "Allow",
|
|
337
|
-
"Principal": { "Service": "codebuild.amazonaws.com" },
|
|
338
|
-
"Action": "sts:AssumeRole"
|
|
339
|
-
}]
|
|
340
|
-
}'
|
|
341
|
-
|
|
342
|
-
# ECR push permissions
|
|
343
|
-
aws iam put-role-policy \
|
|
344
|
-
--role-name al-codebuild-role \
|
|
345
|
-
--policy-name ECRPush \
|
|
346
|
-
--policy-document '{
|
|
347
|
-
"Version": "2012-10-17",
|
|
348
|
-
"Statement": [
|
|
349
|
-
{
|
|
350
|
-
"Effect": "Allow",
|
|
351
|
-
"Action": "ecr:GetAuthorizationToken",
|
|
352
|
-
"Resource": "*"
|
|
353
|
-
},
|
|
354
|
-
{
|
|
355
|
-
"Effect": "Allow",
|
|
356
|
-
"Action": [
|
|
357
|
-
"ecr:BatchCheckLayerAvailability",
|
|
358
|
-
"ecr:PutImage",
|
|
359
|
-
"ecr:InitiateLayerUpload",
|
|
360
|
-
"ecr:UploadLayerPart",
|
|
361
|
-
"ecr:CompleteLayerUpload",
|
|
362
|
-
"ecr:GetDownloadUrlForLayer",
|
|
363
|
-
"ecr:BatchGetImage"
|
|
364
|
-
],
|
|
365
|
-
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_ID>:repository/al-*"
|
|
366
|
-
},
|
|
367
|
-
{
|
|
368
|
-
"Effect": "Allow",
|
|
369
|
-
"Action": "s3:GetObject",
|
|
370
|
-
"Resource": "arn:aws:s3:::al-builds-<ACCOUNT_ID>-<REGION>/*"
|
|
371
|
-
},
|
|
372
|
-
{
|
|
373
|
-
"Effect": "Allow",
|
|
374
|
-
"Action": [
|
|
375
|
-
"logs:CreateLogGroup",
|
|
376
|
-
"logs:CreateLogStream",
|
|
377
|
-
"logs:PutLogEvents"
|
|
378
|
-
],
|
|
379
|
-
"Resource": "*"
|
|
380
|
-
}
|
|
381
|
-
]
|
|
382
|
-
}'
|
|
383
|
-
```
|
|
384
|
-
|
|
385
|
-
### Secret injection
|
|
386
|
-
|
|
387
|
-
ECS injects secrets from AWS Secrets Manager as environment variables using the naming convention `AL_SECRET_{type}__{instance}__{field}`. The container entry point reads these environment variables and writes them to `/credentials/{type}/{instance}/{field}` for compatibility with the standard credential layout.
|
|
388
|
-
|
|
389
|
-
Secret names in Secrets Manager follow the convention: `{prefix}/{type}/{instance}/{field}` (e.g. `action-llama/github_token/default/token`).
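
The three naming conventions above (environment variable, credential file path, Secrets Manager name) all derive from the same `type`/`instance`/`field` triple. The TypeScript sketch below shows that mapping. The `SecretRef` type and helper names are assumptions made for this example, and the parser assumes that none of the three parts contains a double underscore.

```typescript
interface SecretRef {
  type: string; // e.g. "github_token"
  instance: string; // e.g. "default"
  field: string; // e.g. "token"
}

const ENV_PREFIX = "AL_SECRET_";

// AL_SECRET_{type}__{instance}__{field}
function envVarName(ref: SecretRef): string {
  return `${ENV_PREFIX}${ref.type}__${ref.instance}__${ref.field}`;
}

// /credentials/{type}/{instance}/{field}
function credentialPath(ref: SecretRef): string {
  return `/credentials/${ref.type}/${ref.instance}/${ref.field}`;
}

// {prefix}/{type}/{instance}/{field}
function secretName(ref: SecretRef, prefix: string = "action-llama"): string {
  return `${prefix}/${ref.type}/${ref.instance}/${ref.field}`;
}

// Inverse of envVarName; returns null for variables that do not match.
function parseEnvVarName(name: string): SecretRef | null {
  if (!name.startsWith(ENV_PREFIX)) return null;
  const parts = name.slice(ENV_PREFIX.length).split("__");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) return null;
  const [type, instance, field] = parts as [string, string, string];
  return { type, instance, field };
}
```

An entry point following this convention could iterate over `process.env`, call `parseEnvVarName` on each key, and write the value to `credentialPath(ref)` for every match.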
|
|
390
|
-
|
|
391
|
-
### Per-agent task roles
|
|
392
|
-
|
|
393
|
-
Each agent runs with its own IAM task role:
|
|
394
|
-
|
|
395
|
-
```
|
|
396
|
-
al-dev-task-role -> github_token, git_ssh, anthropic_key
|
|
397
|
-
al-reviewer-task-role -> github_token, git_ssh, anthropic_key
|
|
398
|
-
al-devops-task-role -> github_token, sentry_token, anthropic_key
|
|
399
|
-
```
|
|
400
|
-
|
|
401
|
-
Each role only has `secretsmanager:GetSecretValue` on its declared secrets. Even if an agent container is compromised and accesses the ECS task metadata endpoint to obtain the role's credentials, it can only read its own secrets.
|
|
402
|
-
|
|
403
|
-
The per-agent role ARN is derived automatically from the ECR repository's account ID: `arn:aws:iam::{accountId}:role/al-{agentName}-task-role`.
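
To make the derivation concrete, here is a minimal sketch that pulls the account ID out of an ECR repository URI and builds the task role ARN in the format given above. The function names and the URI pattern check are assumptions for illustration; the package's own parsing may accept other URI forms.

```typescript
// Extracts the 12-digit account ID from an ECR repository URI such as
// 123456789012.dkr.ecr.us-east-1.amazonaws.com/al-images.
function accountIdFromEcrUri(ecrRepository: string): string {
  if (!/^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/.test(ecrRepository)) {
    throw new Error(`not an ECR repository URI: ${ecrRepository}`);
  }
  return ecrRepository.slice(0, 12);
}

// arn:aws:iam::{accountId}:role/al-{agentName}-task-role
function taskRoleArn(ecrRepository: string, agentName: string): string {
  return `arn:aws:iam::${accountIdFromEcrUri(ecrRepository)}:role/al-${agentName}-task-role`;
}
```

With the `ecrRepository` value from the configuration example, an agent named `dev` would resolve to `arn:aws:iam::123456789012:role/al-dev-task-role`.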
|
|
404
|
-
|
|
405
|
-
### State persistence
|
|
406
|
-
|
|
407
|
-
The scheduler persists its runtime state (container registry, resource locks, work queues, inter-agent call state) to a DynamoDB table (`al-state`). This allows the scheduler to survive restarts (e.g., App Runner redeployments) without losing track of running containers. `al setup cloud` creates this table automatically with on-demand billing and native TTL.
|
|
408
|
-
|
|
409
|
-
Locally, the same state is stored in a SQLite database at `.al/state.db` in your project directory.
|
|
410
|
-
|
|
411
|
-
### Gateway
|
|
412
|
-
|
|
413
|
-
The gateway is **not required** for ECS mode. Containers get their credentials via native Secrets Manager injection (not the gateway's HTTP endpoint), and ECS handles task timeouts natively. The gateway still starts if you have webhooks configured, since webhooks are received by the scheduler process.
|
|
414
|
-
|
|
415
|
-
### Log streaming
|
|
416
|
-
|
|
417
|
-
Logs are streamed from CloudWatch Logs by polling. There is a ~5-10 second delay inherent to CloudWatch Logs ingestion. The TUI displays a warning about this delay when running in ECS mode.
|
|
418
|
-
|
|
419
|
-
## Comparison with local Docker
|
|
420
|
-
|
|
421
|
-
| Aspect | Local Docker | ECS Fargate | Lambda (auto, <=900s) |
|--------|-------------|-------------|----------------------|
| Where containers run | Your machine | AWS | AWS |
| Cold start | Instant (image cached) | ~30-60s | ~1-2s |
| Max runtime | Unlimited | Unlimited | 15 minutes |
| Credential delivery | Volume mount | Secrets Manager env vars | Secrets Manager env vars |
| Secret isolation | Mount-level | IAM task roles | IAM Lambda roles |
| Log latency | Real-time | ~5-10s | ~5-10s |
| Image builds | Local Docker | Remote via CodeBuild | Remote via CodeBuild |
| Cost | Free (your hardware) | Pay per second | Pay per 100ms |
|
|
431
|
-
|
|
432
|
-
## AWS permissions summary
|
|
433
|
-
|
|
434
|
-
There are six IAM principals involved:
|
|
435
|
-
|
|
436
|
-
1. **Operator** — your machine or CI (runs `al` commands)
|
|
437
|
-
2. **Execution role** — used by ECS itself to pull images, write logs, and inject secrets
|
|
438
|
-
3. **Task role** — one per agent on ECS Fargate, used by the container to read its own secrets
|
|
439
|
-
4. **Lambda execution role** — one per short-timeout agent, used by Lambda to read secrets and write logs
|
|
440
|
-
5. **App Runner access role** — allows App Runner to pull scheduler images from ECR (only for `al cloud deploy`)
|
|
441
|
-
6. **App Runner instance role** — assumed by the scheduler container, needs operator-level permissions (only for `al cloud deploy`)
|
|
442
|
-
|
|
443
|
-
### Operator IAM policy
|
|
444
|
-
|
|
445
|
-
This is the minimum policy for the IAM user or role running `al` commands. Replace `<REGION>`, `<ACCOUNT_ID>`, and `<REPO_NAME>` with your values.
|
|
446
|
-
|
|
447
|
-
> **Note:** `al setup cloud` automatically grants the PassRole and Logs statements via the `ActionLlamaOperator` inline policy. You still need to attach the remaining statements (ECS, SecretsManager, ECR, CodeBuild, Lambda, S3, IAM) manually or via your own IaC.
|
|
448
|
-
|
|
449
|
-
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Identity",
      "Effect": "Allow",
      "Action": "sts:GetCallerIdentity",
      "Resource": "*"
    },
    {
      "Sid": "ECSRuntime",
      "Effect": "Allow",
      "Action": [
        "ecs:RegisterTaskDefinition",
        "ecs:RunTask",
        "ecs:DescribeTasks",
        "ecs:ListTasks",
        "ecs:StopTask"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/ecs/action-llama*",
        "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws/lambda/al-*",
        "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws/apprunner/al-scheduler*"
      ]
    },
    {
      "Sid": "SecretsManager",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:ListSecrets",
        "secretsmanager:CreateSecret",
        "secretsmanager:PutSecretValue",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "*"
    },
    {
      "Sid": "PassRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::<ACCOUNT_ID>:role/al-*"
      ]
    },
    {
      "Sid": "IAMAgentRoles",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:PutRolePolicy",
        "iam:DeleteRole",
        "iam:DeleteRolePolicy",
        "iam:AttachRolePolicy"
      ],
      "Resource": [
        "arn:aws:iam::<ACCOUNT_ID>:role/al-*"
      ]
    },
    {
      "Sid": "IAMListRoles",
      "Effect": "Allow",
      "Action": "iam:ListRoles",
      "Resource": "*"
    },
    {
      "Sid": "ECR",
      "Effect": "Allow",
      "Action": [
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:SetRepositoryPolicy"
      ],
      "Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_ID>:repository/<REPO_NAME>"
    },
    {
      "Sid": "SetupWizardReadOnly",
      "Effect": "Allow",
      "Action": [
        "ecr:DescribeRepositories",
        "ecr:CreateRepository",
        "ecs:ListClusters",
        "ecs:DescribeClusters",
        "ecs:CreateCluster",
        "ec2:DescribeVpcs",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CodeBuild",
      "Effect": "Allow",
      "Action": [
        "codebuild:StartBuild",
        "codebuild:BatchGetBuilds",
        "codebuild:CreateProject"
      ],
      "Resource": "arn:aws:codebuild:<REGION>:<ACCOUNT_ID>:project/al-image-builder"
    },
    {
      "Sid": "Lambda",
      "Effect": "Allow",
      "Action": [
        "lambda:GetFunction",
        "lambda:CreateFunction",
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration",
        "lambda:PutFunctionEventInvokeConfig",
        "lambda:InvokeFunction"
      ],
      "Resource": "arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:al-*"
    },
    {
      "Sid": "S3BuildContext",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AppRunner",
      "Effect": "Allow",
      "Action": [
        "apprunner:CreateService",
        "apprunner:UpdateService",
        "apprunner:DescribeService",
        "apprunner:DeleteService"
      ],
      "Resource": "arn:aws:apprunner:<REGION>:<ACCOUNT_ID>:service/al-scheduler/*"
    },
    {
      "Sid": "AppRunnerList",
      "Effect": "Allow",
      "Action": "apprunner:ListServices",
      "Resource": "*"
    },
    {
      "Sid": "DynamoDB",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem",
        "dynamodb:Query",
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:UpdateTimeToLive"
      ],
      "Resource": "arn:aws:dynamodb:<REGION>:<ACCOUNT_ID>:table/al-state"
    },
    {
      "Sid": "ServiceLinkedRoles",
      "Effect": "Allow",
      "Action": "iam:CreateServiceLinkedRole",
      "Resource": [
        "arn:aws:iam::<ACCOUNT_ID>:role/aws-service-role/ecs.amazonaws.com/*",
        "arn:aws:iam::<ACCOUNT_ID>:role/aws-service-role/apprunner.amazonaws.com/*"
      ]
    }
  ]
}
```

The `SetupWizardReadOnly` statement is only needed for `al setup cloud`. Despite its name, it also grants `ecr:CreateRepository` and `ecs:CreateCluster`. You can remove it after initial setup if you prefer a tighter policy.

The `CodeBuild` and `S3BuildContext` statements are required for image builds via CodeBuild.

The `SecretsManager` statement can be scoped to `arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:action-llama/*` if you use the default secret prefix.

The `IAMAgentRoles` statement is scoped to `al-*` roles, so it cannot modify unrelated IAM resources.

The `AppRunner` statement is only needed for `al cloud deploy` / `al teardown cloud`. You can omit it if you only run the scheduler locally.
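
The placeholder substitution described above can be scripted. The following is a minimal sketch, assuming the policy JSON has been saved to a local template file; the function name `render_policy` and the file names in the usage note are illustrative and not part of Action Llama.

```bash
# Hypothetical helper: fill in the <REGION>, <ACCOUNT_ID>, and <REPO_NAME>
# placeholders of a policy template read from stdin, writing the result to stdout.
# "|" is used as the sed delimiter so repository names containing "/" still work.
render_policy() {
  local region="$1"
  local account_id="$2"
  local repo_name="$3"
  sed -e "s|<REGION>|${region}|g" \
      -e "s|<ACCOUNT_ID>|${account_id}|g" \
      -e "s|<REPO_NAME>|${repo_name}|g"
}
```

For example, `render_policy us-east-1 123456789012 my-repo < operator-policy.template.json > operator-policy.json` would produce a file that can be passed to the AWS CLI as `file://operator-policy.json`.
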

### Execution role (ECS infrastructure)

Attach the AWS managed policy `AmazonECSTaskExecutionRolePolicy`, plus an inline policy for secret injection:

| Service | Actions |
|---------|---------|
| ECR | `GetDownloadUrlForLayer`, `BatchGetImage`, `GetAuthorizationToken` |
| CloudWatch Logs | `CreateLogStream`, `PutLogEvents`, `CreateLogGroup` |
| Secrets Manager | `GetSecretValue` (on all agent secrets, so ECS can inject them) |

### Task role (container, per-agent)

Created automatically by `al doctor -c`. Each agent gets its own role scoped to only its secrets:

| Service | Actions |
|---------|---------|
| Secrets Manager | `GetSecretValue` (scoped to only that agent's secrets) |

## Deploying the scheduler

### Using `al cloud deploy` (recommended)

Deploy the scheduler as an AWS App Runner service:

```bash
al cloud deploy -p .
```

This builds a container image with the AL CLI and all project files baked in, pushes it to ECR, and creates an App Runner service. The scheduler runs in headless mode with the gateway enabled, providing a public HTTPS endpoint for webhooks.

The deployed service URL is printed on completion. Use it to configure webhook endpoints in GitHub/Sentry/Linear:

```
https://<service-id>.<region>.awsapprunner.com/webhooks/github
```
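
As a small sketch, the webhook URL can be assembled from the service ID and region shown in the deploy output. The function name `webhook_url` is illustrative. Only the `/webhooks/github` path appears in this document, so treat the path segment for any other provider as an assumption to verify.

```bash
# Hypothetical helper: build the webhook URL for an App Runner-hosted scheduler.
# $1 = App Runner service ID, $2 = AWS region, $3 = provider path segment
# (defaults to "github", the only path confirmed by the documentation).
webhook_url() {
  local service_id="$1"
  local region="$2"
  local provider="${3:-github}"
  printf 'https://%s.%s.awsapprunner.com/webhooks/%s\n' "$service_id" "$region" "$provider"
}
```
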

#### App Runner IAM roles

`al cloud deploy` requires two additional IAM roles:

**1. Access role** — allows App Runner to pull images from ECR:

```bash
aws iam create-role \
  --role-name al-apprunner-access-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "build.apprunner.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'

aws iam attach-role-policy \
  --role-name al-apprunner-access-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSAppRunnerServicePolicyForECRAccess
```

Set `cloud.appRunnerAccessRoleArn` to this role's ARN in `config.toml`.

**2. Instance role** — the IAM role assumed by the scheduler container. It needs the same permissions as the operator (ECS, Lambda, Secrets Manager, ECR, CodeBuild, S3, DynamoDB, CloudWatch Logs):

```bash
aws iam create-role \
  --role-name al-apprunner-instance-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "tasks.apprunner.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'
```

Attach the same policy statements from the [operator IAM policy](#operator-iam-policy) to this role (ECS, Lambda, Secrets Manager, ECR, CodeBuild, S3, DynamoDB, CloudWatch Logs, PassRole, IAM agent roles). Set `cloud.appRunnerInstanceRoleArn` to this role's ARN in `config.toml`.
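
The two steps above (attaching the policy and recording the role ARNs) might be scripted as follows. This is a sketch under stated assumptions: the inline policy name `ActionLlamaScheduler`, the default file name `operator-policy.json`, and the helper function names are all illustrative choices, not values defined by Action Llama. The role names and config keys are the ones used in this section.

```bash
# Run a command, or only print it when DRY_RUN=1 (useful for reviewing first).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Attach a rendered copy of the operator policy to the instance role as an
# inline policy. Policy name and default file path are assumptions.
attach_instance_policy() {
  local policy_file="${1:-operator-policy.json}"
  run aws iam put-role-policy \
    --role-name al-apprunner-instance-role \
    --policy-name ActionLlamaScheduler \
    --policy-document "file://${policy_file}"
}

# Print the matching entries for config.toml. If a [cloud] table already
# exists, merge the two keys into it instead of adding a second header.
print_cloud_config() {
  local account_id="$1"
  cat <<EOF
[cloud]
appRunnerAccessRoleArn = "arn:aws:iam::${account_id}:role/al-apprunner-access-role"
appRunnerInstanceRoleArn = "arn:aws:iam::${account_id}:role/al-apprunner-instance-role"
EOF
}
```

Running `DRY_RUN=1 attach_instance_policy` first prints the AWS CLI command without executing it, which makes it easy to check the role name and file path before changing anything in the account.
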

#### Managing the cloud scheduler

```bash
al stat -c               # Show scheduler service status + running agents
al logs scheduler -c     # Tail scheduler logs from CloudWatch
al teardown cloud        # Tear down scheduler + all cloud resources
```

### Manual deployment (alternative)

You can also deploy the scheduler manually to any platform that runs Node.js (Railway, Fly, EC2, etc.):

**Required environment variables:**

| Env var | Description |
|---------|-------------|
| `AWS_ACCESS_KEY_ID` | AWS access key for the operator IAM user/role |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key |

These provide the scheduler with the same permissions as running `al` locally. Use the [operator IAM policy](#operator-iam-policy) above to scope the access.

**Start command:**

```
al start -c --headless
```

**What needs to be in the deploy:**

- Your project repo (with `config.toml`, agent directories containing `agent-config.toml` and `ACTIONS.md`)
- `@action-llama/action-llama` as a dependency in `package.json`

The scheduler builds images via CodeBuild, launches containers on ECS Fargate, and streams logs from CloudWatch — all remotely. No local Docker is needed.
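
On a manually managed host, it can help to confirm that the required environment variables listed above are present before starting the scheduler. The following is a minimal sketch; the function name `check_aws_env` is hypothetical and not part of the `al` CLI.

```bash
# Report which required AWS credential variables are unset or empty.
# Prints one "missing: NAME" line per problem to stderr and returns
# non-zero if any variable is missing.
check_aws_env() {
  local missing=0
  local name value
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Indirect lookup of the variable named by $name (portable across sh and bash).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A start script could then use `check_aws_env && exec al start -c --headless`, so the process exits early with a clear message instead of failing later on the first AWS call.
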

## Troubleshooting

**"ECS runtime requires cloud.awsRegion..."** — Ensure all required fields are set in `config.toml` under `[cloud]`.

**"No AWS credentials found"** — Set `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars or run `aws configure`.

**"Unable to assume the service linked role"** — ECS needs a service-linked role the first time it's used in an account. `al setup cloud` creates this automatically, but if you set up manually: `aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com`.

**"Couldn't create a service-linked role for App Runner"** — Same issue for App Runner. `al setup cloud` creates this automatically, but if you set up manually: `aws iam create-service-linked-role --aws-service-name apprunner.amazonaws.com`. If `al setup cloud` itself fails with this error, your IAM user needs `iam:CreateServiceLinkedRole` permission — see [Service-linked roles](#service-linked-roles).

**"ECS was unable to assume the role 'arn:aws:iam::...:role/al-AGENT-task-role'"** — This is the most common issue with multiple agents. It means the IAM task role for your second (or subsequent) agent doesn't exist or has incorrect permissions. This typically happens because:

1. You ran `al setup cloud` with only one agent, then added more agents later
2. The per-agent role creation failed during setup
3. The role exists but has an incorrect trust policy

**Solutions:**
- **Quick fix:** Run `al doctor -c` to validate and create missing roles
- **Verify setup:** Run `al doctor -c --check-only` to see what's missing without making changes
- **Manual check:** Run `aws iam get-role --role-name al-AGENT-task-role` to see if the role exists

**Prevention:** Always run `al doctor -c` after adding new agents to ensure their IAM roles are created.
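
The manual check above can be repeated for several agents at once. This is a rough sketch, assuming agent names are passed as arguments and that roles follow the `al-AGENT-task-role` pattern quoted in the error message; the function names are hypothetical.

```bash
# Derive the per-agent task role name from an agent name.
task_role_name() {
  printf 'al-%s-task-role\n' "$1"
}

# Report, for each agent name given, whether its task role can be read.
# "missing" is also printed when the AWS CLI is unavailable or the caller
# lacks iam:GetRole, so treat it as a prompt to investigate further.
check_task_roles() {
  local agent role
  for agent in "$@"; do
    role=$(task_role_name "$agent")
    if aws iam get-role --role-name "$role" >/dev/null 2>&1; then
      echo "ok       $role"
    else
      echo "missing  $role"
    fi
  done
}
```

For example, `check_task_roles dev reviewer` would check `al-dev-task-role` and `al-reviewer-task-role`; any role reported as missing is a candidate for `al doctor -c`.
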

**"Failed to start ECS task"** — Check that the ECS cluster exists, subnets have internet access, and the execution role has the required permissions.

**CodeBuild build fails** — Check the build logs linked in the error message. Common causes: the `al-codebuild-role` is missing or lacks ECR push permissions, or the S3 bucket doesn't exist. Verify the role exists and has the permissions listed in the "How CodeBuild works" section above.

**"The specified log group does not exist"** — The CloudWatch log group `/ecs/action-llama` hasn't been created. The runtime creates it automatically on first launch, but the operator IAM user needs `logs:CreateLogGroup` permission. Either re-run `al setup cloud` (which creates it), or create it manually, substituting your own region for `us-east-1`:

```bash
aws logs create-log-group --log-group-name /ecs/action-llama --region us-east-1
```

If you get `AccessDeniedException`, add the `logs:CreateLogGroup` action to your operator IAM policy (see the operator policy above).

**"not authorized to perform: logs:FilterLogEvents"** — Your operator IAM user is missing CloudWatch Logs read permissions. Running `al setup cloud` grants these automatically (the `ActionLlamaOperator` inline policy). If you set up before this was added, re-run `al setup cloud` or manually add the Logs statement from the operator policy above.

**Logs are delayed** — This is expected. CloudWatch Logs has a ~5-10 second ingestion delay. The TUI shows a warning when running in ECS mode.

**Agent can't access secrets** — Verify the per-agent task role has `secretsmanager:GetSecretValue` on the correct secret ARNs. For an agent named `dev`, for example, check with `aws iam get-role-policy --role-name al-dev-task-role --policy-name SecretsAccess`.

**Task stops immediately with exit code 1** — Check CloudWatch Logs for the error. Common causes: missing credentials in Secrets Manager, missing `ACTIONS.md`, invalid model config.
|