@action-llama/action-llama 0.12.2 → 0.13.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (168)
  1. package/{docs/agent-reference → agent-docs}/AGENTS.md +31 -15
  2. package/{docs/agent-reference → agent-docs}/skills/README.md +1 -0
  3. package/agent-docs/skills/calls.md +82 -0
  4. package/{docs/agent-reference → agent-docs}/skills/resource-locks.md +13 -7
  5. package/{docs/agent-reference → agent-docs}/skills/signals.md +1 -1
  6. package/dist/agents/container-runner.d.ts +3 -2
  7. package/dist/agents/container-runner.d.ts.map +1 -1
  8. package/dist/agents/container-runner.js +12 -12
  9. package/dist/agents/container-runner.js.map +1 -1
  10. package/dist/agents/prompt.d.ts.map +1 -1
  11. package/dist/agents/prompt.js +3 -2
  12. package/dist/agents/prompt.js.map +1 -1
  13. package/dist/agents/runner.d.ts +3 -2
  14. package/dist/agents/runner.d.ts.map +1 -1
  15. package/dist/agents/runner.js +14 -14
  16. package/dist/agents/runner.js.map +1 -1
  17. package/dist/build-info.json +1 -1
  18. package/dist/cli/commands/doctor.d.ts +1 -0
  19. package/dist/cli/commands/doctor.d.ts.map +1 -1
  20. package/dist/cli/commands/doctor.js +53 -15
  21. package/dist/cli/commands/doctor.js.map +1 -1
  22. package/dist/cli/commands/env.d.ts +4 -0
  23. package/dist/cli/commands/env.d.ts.map +1 -1
  24. package/dist/cli/commands/env.js +41 -0
  25. package/dist/cli/commands/env.js.map +1 -1
  26. package/dist/cli/commands/kill.js +2 -2
  27. package/dist/cli/commands/kill.js.map +1 -1
  28. package/dist/cli/commands/logs.d.ts.map +1 -1
  29. package/dist/cli/commands/logs.js +25 -20
  30. package/dist/cli/commands/logs.js.map +1 -1
  31. package/dist/cli/commands/pause.js +2 -2
  32. package/dist/cli/commands/pause.js.map +1 -1
  33. package/dist/cli/commands/push.d.ts +1 -0
  34. package/dist/cli/commands/push.d.ts.map +1 -1
  35. package/dist/cli/commands/push.js +2 -1
  36. package/dist/cli/commands/push.js.map +1 -1
  37. package/dist/cli/commands/resume.js +2 -2
  38. package/dist/cli/commands/resume.js.map +1 -1
  39. package/dist/cli/commands/run.d.ts.map +1 -1
  40. package/dist/cli/commands/run.js +21 -46
  41. package/dist/cli/commands/run.js.map +1 -1
  42. package/dist/cli/commands/start.d.ts.map +1 -1
  43. package/dist/cli/commands/start.js +62 -2
  44. package/dist/cli/commands/start.js.map +1 -1
  45. package/dist/cli/commands/status.d.ts.map +1 -1
  46. package/dist/cli/commands/status.js +23 -7
  47. package/dist/cli/commands/status.js.map +1 -1
  48. package/dist/cli/commands/stop.d.ts +1 -0
  49. package/dist/cli/commands/stop.d.ts.map +1 -1
  50. package/dist/cli/commands/stop.js +3 -2
  51. package/dist/cli/commands/stop.js.map +1 -1
  52. package/dist/cli/gateway-client.d.ts +6 -0
  53. package/dist/cli/gateway-client.d.ts.map +1 -1
  54. package/dist/cli/gateway-client.js +19 -0
  55. package/dist/cli/gateway-client.js.map +1 -1
  56. package/dist/cli/main.js +12 -0
  57. package/dist/cli/main.js.map +1 -1
  58. package/dist/cloud/vps/constants.d.ts +1 -1
  59. package/dist/cloud/vps/constants.d.ts.map +1 -1
  60. package/dist/cloud/vps/constants.js +9 -0
  61. package/dist/cloud/vps/constants.js.map +1 -1
  62. package/dist/cloud/vps/hetzner-api.d.ts +14 -3
  63. package/dist/cloud/vps/hetzner-api.d.ts.map +1 -1
  64. package/dist/cloud/vps/hetzner-api.js +24 -11
  65. package/dist/cloud/vps/hetzner-api.js.map +1 -1
  66. package/dist/cloud/vps/provision.js +29 -6
  67. package/dist/cloud/vps/provision.js.map +1 -1
  68. package/dist/cloud/vps/ssh.d.ts +7 -0
  69. package/dist/cloud/vps/ssh.d.ts.map +1 -1
  70. package/dist/cloud/vps/ssh.js +15 -1
  71. package/dist/cloud/vps/ssh.js.map +1 -1
  72. package/dist/credentials/builtins/index.d.ts.map +1 -1
  73. package/dist/credentials/builtins/index.js +2 -0
  74. package/dist/credentials/builtins/index.js.map +1 -1
  75. package/dist/credentials/builtins/reddit-oauth.d.ts +4 -0
  76. package/dist/credentials/builtins/reddit-oauth.d.ts.map +1 -0
  77. package/dist/credentials/builtins/reddit-oauth.js +71 -0
  78. package/dist/credentials/builtins/reddit-oauth.js.map +1 -0
  79. package/dist/docker/local-runtime.d.ts +1 -0
  80. package/dist/docker/local-runtime.d.ts.map +1 -1
  81. package/dist/docker/local-runtime.js +9 -6
  82. package/dist/docker/local-runtime.js.map +1 -1
  83. package/dist/gateway/index.d.ts.map +1 -1
  84. package/dist/gateway/index.js +5 -4
  85. package/dist/gateway/index.js.map +1 -1
  86. package/dist/gateway/routes/logs.d.ts.map +1 -1
  87. package/dist/gateway/routes/logs.js +29 -111
  88. package/dist/gateway/routes/logs.js.map +1 -1
  89. package/dist/remote/bootstrap.d.ts +2 -0
  90. package/dist/remote/bootstrap.d.ts.map +1 -1
  91. package/dist/remote/bootstrap.js +7 -11
  92. package/dist/remote/bootstrap.js.map +1 -1
  93. package/dist/remote/push.d.ts +6 -0
  94. package/dist/remote/push.d.ts.map +1 -1
  95. package/dist/remote/push.js +172 -91
  96. package/dist/remote/push.js.map +1 -1
  97. package/dist/remote/ssh.d.ts +1 -0
  98. package/dist/remote/ssh.d.ts.map +1 -1
  99. package/dist/remote/ssh.js +8 -0
  100. package/dist/remote/ssh.js.map +1 -1
  101. package/dist/scheduler/index.d.ts.map +1 -1
  102. package/dist/scheduler/index.js +56 -7
  103. package/dist/scheduler/index.js.map +1 -1
  104. package/dist/scheduler/watcher.d.ts +1 -1
  105. package/dist/scheduler/watcher.d.ts.map +1 -1
  106. package/dist/scheduler/watcher.js +5 -6
  107. package/dist/scheduler/watcher.js.map +1 -1
  108. package/dist/setup/scaffold.js +2 -2
  109. package/dist/setup/scaffold.js.map +1 -1
  110. package/dist/shared/config.d.ts +1 -0
  111. package/dist/shared/config.d.ts.map +1 -1
  112. package/dist/shared/config.js.map +1 -1
  113. package/dist/shared/credentials.d.ts +8 -18
  114. package/dist/shared/credentials.d.ts.map +1 -1
  115. package/dist/shared/credentials.js +8 -62
  116. package/dist/shared/credentials.js.map +1 -1
  117. package/dist/shared/server.d.ts +2 -0
  118. package/dist/shared/server.d.ts.map +1 -1
  119. package/dist/shared/server.js.map +1 -1
  120. package/dist/tui/App.d.ts.map +1 -1
  121. package/dist/tui/App.js +1 -1
  122. package/dist/tui/App.js.map +1 -1
  123. package/dist/webhooks/definitions/github.d.ts.map +1 -1
  124. package/dist/webhooks/definitions/github.js +13 -0
  125. package/dist/webhooks/definitions/github.js.map +1 -1
  126. package/dist/webhooks/providers/github.d.ts.map +1 -1
  127. package/dist/webhooks/providers/github.js +6 -0
  128. package/dist/webhooks/providers/github.js.map +1 -1
  129. package/dist/webhooks/registry.d.ts.map +1 -1
  130. package/dist/webhooks/registry.js +9 -3
  131. package/dist/webhooks/registry.js.map +1 -1
  132. package/dist/webhooks/types.d.ts +3 -1
  133. package/dist/webhooks/types.d.ts.map +1 -1
  134. package/docker/bin/_http-exit +17 -0
  135. package/docker/bin/al-call +10 -4
  136. package/docker/bin/al-check +9 -3
  137. package/docker/bin/al-status +1 -1
  138. package/docker/bin/al-wait +11 -3
  139. package/docker/bin/rlock +9 -2
  140. package/docker/bin/rlock-heartbeat +9 -2
  141. package/docker/bin/runlock +9 -2
  142. package/package.json +2 -2
  143. package/docs/agent-config-reference.md +0 -313
  144. package/docs/agents.md +0 -256
  145. package/docs/cloud-run.md +0 -173
  146. package/docs/cloud.md +0 -98
  147. package/docs/commands.md +0 -286
  148. package/docs/config-reference.md +0 -241
  149. package/docs/creating-agents.md +0 -147
  150. package/docs/credentials.md +0 -167
  151. package/docs/docker.md +0 -323
  152. package/docs/ecs.md +0 -795
  153. package/docs/examples/dev/ACTIONS.md +0 -75
  154. package/docs/examples/dev/README.md +0 -28
  155. package/docs/examples/dev/agent-config.toml +0 -18
  156. package/docs/examples/devops/ACTIONS.md +0 -33
  157. package/docs/examples/devops/README.md +0 -23
  158. package/docs/examples/devops/agent-config.toml +0 -13
  159. package/docs/examples/index.md +0 -15
  160. package/docs/examples/reviewer/ACTIONS.md +0 -37
  161. package/docs/examples/reviewer/README.md +0 -22
  162. package/docs/examples/reviewer/agent-config.toml +0 -11
  163. package/docs/models.md +0 -191
  164. package/docs/vps-deployment.md +0 -285
  165. package/docs/web-dashboard.md +0 -113
  166. package/docs/webhooks.md +0 -152
  167. package/{docs/agent-reference → agent-docs}/skills/credentials.md +0 -0
  168. package/{docs/agent-reference → agent-docs}/skills/environment.md +0 -0
package/docs/ecs.md DELETED
@@ -1,795 +0,0 @@
1
- # ECS Fargate Mode
2
-
3
- Run agents as ECS Fargate tasks on AWS instead of local Docker containers. Agents get the same isolation guarantees with the added benefits of managed infrastructure and per-agent secret isolation via IAM task roles.
4
-
5
- ## Prerequisites
6
-
7
- - AWS account with ECS, ECR, Secrets Manager, DynamoDB, and CloudWatch Logs access
8
- - AWS CLI configured (`aws configure`) or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars
9
- - Docker is **not** required — images are built remotely via AWS CodeBuild and pushed directly to ECR
10
- - The IAM user running `al setup cloud` needs `iam:CreateServiceLinkedRole` permission (or the service-linked roles for ECS and App Runner must already exist — see [Service-linked roles](#service-linked-roles))
11
-
12
- ## Configuration
13
-
14
- In your project's `config.toml`:
15
-
16
- ```toml
17
- [cloud]
18
- provider = "ecs"
19
- awsRegion = "us-east-1"
20
- ecsCluster = "al-cluster"
21
- ecrRepository = "123456789012.dkr.ecr.us-east-1.amazonaws.com/al-images"
22
- executionRoleArn = "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
23
- taskRoleArn = "arn:aws:iam::123456789012:role/al-default-task-role"
24
- subnets = ["subnet-abc123"]
25
- # securityGroups = ["sg-abc123"] # optional
26
- # awsSecretPrefix = "action-llama" # optional, default: "action-llama"
27
- ```
28
-
29
- | Key | Required | Description |
30
- |-----|----------|-------------|
31
- | `cloud.provider` | Yes | Set to `"ecs"` |
32
- | `cloud.awsRegion` | Yes | AWS region (e.g. `us-east-1`) |
33
- | `cloud.ecsCluster` | Yes | ECS cluster name or ARN |
34
- | `cloud.ecrRepository` | Yes | Full ECR repository URI |
35
- | `cloud.executionRoleArn` | Yes | IAM role for task execution (ECR pull + CloudWatch Logs) |
36
- | `cloud.taskRoleArn` | Yes | Default IAM task role (Secrets Manager access) |
37
- | `cloud.subnets` | Yes | VPC subnet IDs for Fargate tasks |
38
- | `cloud.securityGroups` | No | Security group IDs for Fargate tasks |
39
- | `cloud.awsSecretPrefix` | No | Secrets Manager name prefix (default: `"action-llama"`) |
40
- | `cloud.buildBucket` | No | S3 bucket for CodeBuild source uploads (auto-created if omitted) |
41
-
42
- Local Docker settings (`[local]`) control resource limits:
43
-
44
- | Key | Default | Description |
45
- |-----|---------|-------------|
46
- | `local.memory` | `"4096"` | Memory per task in MiB |
47
- | `local.cpus` | `2` | CPUs per task |
48
- | `local.timeout` | `900` | Default max execution time in seconds (overridable per-agent) |
49
-
50
- Individual agents can override the timeout in their `agent-config.toml` with the `timeout` field. On ECS, agents with an effective timeout <= 900s are automatically routed to Lambda for faster startup. See [Per-agent timeout](#per-agent-timeout-and-lambda-routing).
51
-
52
- Optional Lambda configuration (for agents that auto-route to Lambda):
53
-
54
- | Key | Required | Default | Description |
55
- |-----|----------|---------|-------------|
56
- | `cloud.lambdaRoleArn` | No | auto-derived | Lambda execution role ARN (overrides per-agent role derivation) |
57
- | `cloud.lambdaSubnets` | No | — | VPC subnet IDs for Lambda (only if Lambda needs VPC access) |
58
- | `cloud.lambdaSecurityGroups` | No | — | Security groups for Lambda (only with `lambdaSubnets`) |
59
-
60
- Optional cloud scheduler configuration (for `al cloud deploy`):
61
-
62
- | Key | Required | Default | Description |
63
- |-----|----------|---------|-------------|
64
- | `cloud.schedulerCpu` | No | `"256"` | App Runner instance CPU (valid: `256`, `512`, `1024`, `2048`, `4096`) |
65
- | `cloud.schedulerMemory` | No | `"512"` | App Runner instance memory in MB (valid depends on CPU — see [App Runner docs](https://docs.aws.amazon.com/apprunner/latest/dg/manage-configure.html)) |
66
- | `cloud.appRunnerInstanceRoleArn` | No | — | IAM role assumed by the scheduler container (needs ECS, Lambda, Secrets Manager, ECR, CodeBuild, S3, CloudWatch Logs permissions) |
67
- | `cloud.appRunnerAccessRoleArn` | Yes* | — | IAM role that allows App Runner to pull images from ECR (*required only when using `al cloud deploy`) |
68
-
69
- ## Service-linked roles
70
-
71
- AWS requires service-linked roles for ECS and App Runner. These are account-level roles that AWS services use internally — they only need to be created once per AWS account.
72
-
73
- `al setup cloud` automatically creates both:
74
-
75
- - `AWSServiceRoleForECS` (for ECS Fargate task execution)
76
- - `AWSServiceRoleForAppRunner` (for App Runner service management)
77
-
78
- If your IAM user lacks `iam:CreateServiceLinkedRole` permission, create them manually:
79
-
80
- ```bash
81
- aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com
82
- aws iam create-service-linked-role --aws-service-name apprunner.amazonaws.com
83
- ```
84
-
85
- These commands are safe to re-run: if the role already exists, they return an error and change nothing.
86
-
87
- ## Quick Setup
88
-
89
- The fastest way to get started:
90
-
91
- ```bash
92
- al setup cloud -p .
93
- ```
94
-
95
- This interactive wizard prompts for all required fields, writes the `[cloud]` config, pushes credentials, and provisions IAM in one step.
96
-
97
- ## Manual Setup
98
-
99
- ### 1. Create an ECS cluster
100
-
101
- ```bash
102
- aws ecs create-cluster --cluster-name al-cluster --region us-east-1
103
- ```
104
-
105
- ### 2. Create an ECR repository
106
-
107
- ```bash
108
- aws ecr create-repository --repository-name al-images --region us-east-1
109
- ```
110
-
111
- ### 3. Create the execution role
112
-
113
- The execution role allows ECS to pull images from ECR and write logs to CloudWatch. Create `ecs-execution-trust.json`:
114
-
115
- ```json
116
- {
117
- "Version": "2012-10-17",
118
- "Statement": [{
119
- "Effect": "Allow",
120
- "Principal": { "Service": "ecs-tasks.amazonaws.com" },
121
- "Action": "sts:AssumeRole"
122
- }]
123
- }
124
- ```
125
-
126
- ```bash
127
- aws iam create-role \
128
- --role-name ecsTaskExecutionRole \
129
- --assume-role-policy-document file://ecs-execution-trust.json
130
-
131
- aws iam attach-role-policy \
132
- --role-name ecsTaskExecutionRole \
133
- --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
134
- ```
135
-
136
- The execution role also needs `secretsmanager:GetSecretValue` (so ECS can inject secrets at launch) and `logs:CreateLogGroup` (so ECS can create the CloudWatch log group on first run):
137
-
138
- ```bash
139
- aws iam put-role-policy \
140
- --role-name ecsTaskExecutionRole \
141
- --policy-name ActionLlamaExecution \
142
- --policy-document '{
143
- "Version": "2012-10-17",
144
- "Statement": [
145
- {
146
- "Effect": "Allow",
147
- "Action": "secretsmanager:GetSecretValue",
148
- "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:action-llama/*"
149
- },
150
- {
151
- "Effect": "Allow",
152
- "Action": "logs:CreateLogGroup",
153
- "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/ecs/action-llama*"
154
- }
155
- ]
156
- }'
157
- ```
158
-
159
- ### 4. Push credentials and create per-agent task roles
160
-
161
- ```bash
162
- al doctor -c -p .
163
- ```
164
-
165
- This pushes all local credentials to AWS Secrets Manager, then creates a task role for each agent (`al-{agentName}-task-role`) and grants `secretsmanager:GetSecretValue` scoped to only that agent's declared secrets.
166
-
167
- > **Re-run after adding agents:** Whenever you add a new agent to your project, re-run `al doctor -c` to create the task role for the new agent. Without this, the new agent will fail to access its credentials at runtime.
168
-
169
- Alternatively, create roles manually:
170
-
171
- ```bash
172
- # Create the role
173
- aws iam create-role \
174
- --role-name al-dev-task-role \
175
- --assume-role-policy-document file://ecs-execution-trust.json
176
-
177
- # Grant access to only this agent's secrets
178
- aws iam put-role-policy \
179
- --role-name al-dev-task-role \
180
- --policy-name SecretsAccess \
181
- --policy-document '{
182
- "Version": "2012-10-17",
183
- "Statement": [{
184
- "Effect": "Allow",
185
- "Action": "secretsmanager:GetSecretValue",
186
- "Resource": [
187
- "arn:aws:secretsmanager:us-east-1:123456789012:secret:action-llama/github_token/default/*",
188
- "arn:aws:secretsmanager:us-east-1:123456789012:secret:action-llama/anthropic_key/default/*"
189
- ]
190
- }]
191
- }'
192
- ```
193
-
194
- Repeat for each agent (`al-reviewer-task-role`, `al-devops-task-role`, etc.), scoping each role's policy to only that agent's credential paths.
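The per-agent scoping described above can be generated rather than hand-written. The following is a minimal sketch, assuming each credential is identified by a `(type, instance)` pair and that the resource pattern matches the manual example (`{prefix}/{type}/{instance}/*`). The function name `agent_secrets_policy` is hypothetical and not part of the `al` CLI.

```python
import json
from typing import Dict, Iterable, Tuple


def agent_secrets_policy(
    region: str,
    account_id: str,
    credentials: Iterable[Tuple[str, str]],
    prefix: str = "action-llama",
) -> Dict:
    """Build an inline policy that grants read access to one agent's secrets only.

    `credentials` is an iterable of (type, instance) pairs, for example
    ("github_token", "default"). This mirrors the manual policy shown above;
    it is an illustration, not the tool's actual implementation.
    """
    resources = [
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}/{ctype}/{instance}/*"
        for ctype, instance in credentials
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    policy = agent_secrets_policy(
        "us-east-1",
        "123456789012",
        [("github_token", "default"), ("anthropic_key", "default")],
    )
    # The JSON output can be passed to `aws iam put-role-policy --policy-document`.
    print(json.dumps(policy, indent=2))
```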
195
-
196
- ### 5. Ensure VPC networking
197
-
198
- Fargate tasks need a VPC subnet with internet access (for pulling images, calling APIs). Use a public subnet with `assignPublicIp: ENABLED` (the default) or a private subnet with a NAT gateway.
199
-
200
- ### 6. Start
201
-
202
- ```bash
203
- al start -c -p .
204
- ```
205
-
206
- The scheduler will:
207
- 1. Build agent images remotely via CodeBuild and push to ECR
208
- 2. Create the CloudWatch log group if it doesn't exist
209
- 3. Register ECS task definitions with Secrets Manager secret injection
210
- 4. Run Fargate tasks on schedule or webhook trigger
211
- 5. Stream logs from CloudWatch Logs
212
-
213
- ## Per-agent timeout and Lambda routing
214
-
215
- When using the ECS provider, agents are automatically routed to the most efficient AWS compute service based on their timeout:
216
-
217
- - **Timeout <= 900s (15 min):** Routes to **AWS Lambda** — cold starts in ~1-2 seconds, pay-per-100ms pricing
218
- - **Timeout > 900s:** Routes to **ECS Fargate** — cold starts in ~30-60 seconds, pay-per-second pricing
219
-
220
- This happens automatically. You control it by setting `timeout` in each agent's `agent-config.toml`:
221
-
222
- ```toml
223
- # agent-config.toml for a fast webhook responder
224
- timeout = 300 # 5 minutes — will use Lambda on AWS
225
- ```
226
-
227
- ```toml
228
- # agent-config.toml for a long-running refactoring agent
229
- timeout = 3600 # 1 hour — will use ECS Fargate
230
- ```
231
-
232
- If an agent doesn't set `timeout`, it falls back to `[local].timeout` in `config.toml`, then to the default of 900s. Since 900s is the Lambda maximum, agents without an explicit timeout default to Lambda.
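The fallback chain and the 900s threshold can be expressed as a small function. This is a sketch of the rule as the text states it, not the scheduler's actual code. The names `effective_timeout` and `pick_runtime` and the returned labels are hypothetical.

```python
from typing import Optional

# Lambda's maximum runtime (15 minutes), which is also the documented default timeout.
LAMBDA_MAX_TIMEOUT_S = 900


def effective_timeout(agent_timeout: Optional[int], local_timeout: Optional[int]) -> int:
    """Resolve the timeout: agent-config.toml first, then [local].timeout, then 900s."""
    if agent_timeout is not None:
        return agent_timeout
    if local_timeout is not None:
        return local_timeout
    return LAMBDA_MAX_TIMEOUT_S


def pick_runtime(agent_timeout: Optional[int] = None, local_timeout: Optional[int] = None) -> str:
    """Return "lambda" for timeouts up to 900s, otherwise "ecs-fargate"."""
    timeout = effective_timeout(agent_timeout, local_timeout)
    return "lambda" if timeout <= LAMBDA_MAX_TIMEOUT_S else "ecs-fargate"
```

With this rule, an agent that sets no timeout anywhere resolves to 900s and lands on Lambda, while a project-level `[local].timeout` above 900 moves every agent without its own override to Fargate.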
233
-
234
- ### Lambda memory
235
-
236
- Lambda functions default to 512 MB of memory, which is sufficient for typical LLM agent workloads (HTTP calls to LLM APIs). Lambda's maximum is 3008 MB — the `local.memory` config value is clamped to this limit for Lambda-routed agents.
237
-
238
- To increase memory, set `memory` under `[local]` in the project `config.toml`:
239
-
240
- ```toml
241
- [local]
242
- memory = "2048" # 2 GB — clamped to 3008 for Lambda, used as-is for ECS
243
- ```
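The clamping behaviour can be illustrated as follows. This sketch covers only the clamp described above, assuming `local.memory` is a string of MiB as in the config table. It does not model how the 512 MB Lambda default is chosen, and `effective_memory_mb` is a hypothetical name.

```python
# Lambda memory ceiling as stated in the text above.
LAMBDA_MAX_MEMORY_MB = 3008


def effective_memory_mb(local_memory: str, runtime: str) -> int:
    """Clamp the configured memory for Lambda-routed agents; pass it through for ECS."""
    requested = int(local_memory)
    if runtime == "lambda":
        return min(requested, LAMBDA_MAX_MEMORY_MB)
    return requested
```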
244
-
245
- ### Why Lambda is faster
246
-
247
- Lambda keeps container images warm in pre-provisioned execution environments. When invoked, Lambda starts executing in ~1-2 seconds. ECS Fargate must provision a fresh VM, pull the image, and start the container — taking 30-60 seconds.
248
-
249
- For agents that respond to webhooks (e.g., triaging issues, reviewing PRs, responding to alerts), this means the agent starts working almost immediately after the event arrives.
250
-
251
- ### Shared infrastructure
252
-
253
- Both Lambda and ECS Fargate use the same infrastructure:
254
-
255
- - **Same ECR images** — built once via CodeBuild, referenced by both runtimes
256
- - **Same Secrets Manager credentials** — Lambda resolves secrets at invocation time and passes them as environment variables using the same `AL_SECRET_*` naming convention
257
- - **Same CodeBuild pipeline** — no separate build step needed
258
-
259
- ### Lambda IAM roles
260
-
261
- `al doctor -c` automatically creates Lambda execution roles (`al-{agentName}-lambda-role`) for agents with timeout <= 900s. These roles include:
262
-
263
- - `secretsmanager:GetSecretValue` scoped to the agent's declared secrets
264
- - `logs:CreateLogGroup`, `logs:CreateLogStream`, `logs:PutLogEvents` for CloudWatch
265
- - `ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer` for pulling images
266
-
267
- To use a shared role instead of per-agent roles, set `cloud.lambdaRoleArn` in `config.toml`.
268
-
269
- ### Operator IAM additions for Lambda
270
-
271
- If your agents route to Lambda, add these permissions to your operator IAM policy:
272
-
273
- ```json
274
- {
275
- "Sid": "Lambda",
276
- "Effect": "Allow",
277
- "Action": [
278
- "lambda:GetFunction",
279
- "lambda:CreateFunction",
280
- "lambda:UpdateFunctionCode",
281
- "lambda:UpdateFunctionConfiguration",
282
- "lambda:PutFunctionEventInvokeConfig",
283
- "lambda:InvokeFunction"
284
- ],
285
- "Resource": "arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:al-*"
286
- }
287
- ```
288
-
289
- And extend the `PassRole` condition to include `lambda.amazonaws.com` and the App Runner service principals:
290
-
291
- ```json
292
- "Condition": {
293
- "StringEquals": {
294
- "iam:PassedToService": [
295
- "ecs-tasks.amazonaws.com",
296
- "codebuild.amazonaws.com",
297
- "lambda.amazonaws.com",
298
- "tasks.apprunner.amazonaws.com",
299
- "build.apprunner.amazonaws.com"
300
- ]
301
- }
302
- }
303
- ```
304
-
305
- ## How it works
306
-
307
- ### Image lifecycle
308
-
309
- Images are built remotely via AWS CodeBuild and pushed directly to ECR — no local Docker required. This means the scheduler can run anywhere (your machine, Railway, EC2, etc.).
310
-
311
- Each agent gets its own image tag (`al-{agentName}-latest`). The build happens on every `al start -c` to ensure the latest code is deployed.
312
-
313
- ### How CodeBuild works
314
-
315
- On each build, the ECS runtime:
316
-
317
- 1. Creates a tarball of the build context
318
- 2. Uploads it to S3 (bucket: `buildBucket` from config, or auto-created as `al-builds-<accountId>-<region>`)
319
- 3. Creates a CodeBuild project (`al-image-builder`) if it doesn't exist
320
- 4. Starts a build that produces and pushes the Docker image to ECR
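The naming rules in this section can be sketched as two helpers. The bucket default and the tag format come from the text. Joining repository and tag as `repository:tag` is an assumption based on standard image-reference notation, and both function names are hypothetical.

```python
from typing import Optional


def build_bucket_name(account_id: str, region: str, configured: Optional[str] = None) -> str:
    """Use `buildBucket` from config when set, else the auto-created default name."""
    return configured or f"al-builds-{account_id}-{region}"


def agent_image_ref(ecr_repository: str, agent_name: str) -> str:
    """Compose an image reference from the ECR repository URI and the per-agent tag.

    The tag format `al-{agentName}-latest` is from the docs; the `repo:tag`
    join is an assumption made for illustration.
    """
    return f"{ecr_repository}:al-{agent_name}-latest"
```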
321
-
322
- This requires:
323
-
324
- - An IAM role `al-codebuild-role` that CodeBuild can assume, with ECR push and S3 read permissions
325
- - The operator IAM policy must include CodeBuild and S3 permissions (see below)
326
-
327
- To create the CodeBuild service role:
328
-
329
- ```bash
330
- # Trust policy
331
- aws iam create-role \
332
- --role-name al-codebuild-role \
333
- --assume-role-policy-document '{
334
- "Version": "2012-10-17",
335
- "Statement": [{
336
- "Effect": "Allow",
337
- "Principal": { "Service": "codebuild.amazonaws.com" },
338
- "Action": "sts:AssumeRole"
339
- }]
340
- }'
341
-
342
- # ECR push permissions
343
- aws iam put-role-policy \
344
- --role-name al-codebuild-role \
345
- --policy-name ECRPush \
346
- --policy-document '{
347
- "Version": "2012-10-17",
348
- "Statement": [
349
- {
350
- "Effect": "Allow",
351
- "Action": "ecr:GetAuthorizationToken",
352
- "Resource": "*"
353
- },
354
- {
355
- "Effect": "Allow",
356
- "Action": [
357
- "ecr:BatchCheckLayerAvailability",
358
- "ecr:PutImage",
359
- "ecr:InitiateLayerUpload",
360
- "ecr:UploadLayerPart",
361
- "ecr:CompleteLayerUpload",
362
- "ecr:GetDownloadUrlForLayer",
363
- "ecr:BatchGetImage"
364
- ],
365
- "Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_ID>:repository/al-*"
366
- },
367
- {
368
- "Effect": "Allow",
369
- "Action": "s3:GetObject",
370
- "Resource": "arn:aws:s3:::al-builds-<ACCOUNT_ID>-<REGION>/*"
371
- },
372
- {
373
- "Effect": "Allow",
374
- "Action": [
375
- "logs:CreateLogGroup",
376
- "logs:CreateLogStream",
377
- "logs:PutLogEvents"
378
- ],
379
- "Resource": "*"
380
- }
381
- ]
382
- }'
383
- ```
384
-
385
- ### Secret injection
386
-
387
- ECS injects secrets from AWS Secrets Manager as environment variables using the naming convention `AL_SECRET_{type}__{instance}__{field}`. The container entry point reads these environment variables and writes them to `/credentials/{type}/{instance}/{field}` for compatibility with the standard credential layout.
388
-
389
- Secret names in Secrets Manager follow the convention: `{prefix}/{type}/{instance}/{field}` (e.g. `action-llama/github_token/default/token`).
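The three naming conventions above map onto each other mechanically. The sketch below shows the mapping, assuming that `type`, `instance`, and `field` never contain a double underscore (otherwise the environment variable name could not be split unambiguously). All function names are hypothetical.

```python
from typing import Tuple

ENV_PREFIX = "AL_SECRET_"


def secret_name(prefix: str, ctype: str, instance: str, field: str) -> str:
    """Secrets Manager name: {prefix}/{type}/{instance}/{field}."""
    return f"{prefix}/{ctype}/{instance}/{field}"


def env_var_name(ctype: str, instance: str, field: str) -> str:
    """Injected environment variable: AL_SECRET_{type}__{instance}__{field}."""
    return f"{ENV_PREFIX}{ctype}__{instance}__{field}"


def parse_env_var(name: str) -> Tuple[str, str, str]:
    """Split an AL_SECRET_* variable name back into (type, instance, field)."""
    if not name.startswith(ENV_PREFIX):
        raise ValueError(f"not an AL_SECRET_ variable: {name}")
    parts = name[len(ENV_PREFIX):].split("__")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected variable layout: {name}")
    return parts[0], parts[1], parts[2]


def credential_path(ctype: str, instance: str, field: str) -> str:
    """File the entry point writes: /credentials/{type}/{instance}/{field}."""
    return f"/credentials/{ctype}/{instance}/{field}"
```

For example, the secret `action-llama/github_token/default/token` corresponds to the variable `AL_SECRET_github_token__default__token` and the file `/credentials/github_token/default/token`.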
390
-
391
- ### Per-agent task roles
392
-
393
- Each agent runs with its own IAM task role:
394
-
395
- ```
396
- al-dev-task-role -> github_token, git_ssh, anthropic_key
397
- al-reviewer-task-role -> github_token, git_ssh, anthropic_key
398
- al-devops-task-role -> github_token, sentry_token, anthropic_key
399
- ```
400
-
401
- Each role only has `secretsmanager:GetSecretValue` on its declared secrets. Even if an agent container is compromised and accesses the ECS task metadata endpoint to obtain the role's credentials, it can only read its own secrets.
402
-
403
- The per-agent role ARN is derived automatically from the ECR repository's account ID: `arn:aws:iam::{accountId}:role/al-{agentName}-task-role`.
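The derivation can be sketched like this, assuming the ECR repository URI has the form shown in the configuration example (`<accountId>.dkr.ecr.<region>.amazonaws.com/<name>`). The helper names are hypothetical.

```python
def account_id_from_ecr_uri(ecr_repository: str) -> str:
    """Extract the 12-digit AWS account ID from a full ECR repository URI."""
    registry_host = ecr_repository.split("/", 1)[0]
    account_id = registry_host.split(".", 1)[0]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"cannot find an account ID in: {ecr_repository}")
    return account_id


def task_role_arn(ecr_repository: str, agent_name: str) -> str:
    """Build arn:aws:iam::{accountId}:role/al-{agentName}-task-role."""
    account_id = account_id_from_ecr_uri(ecr_repository)
    return f"arn:aws:iam::{account_id}:role/al-{agent_name}-task-role"
```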
404
-
405
- ### State persistence
406
-
407
- The scheduler persists its runtime state (container registry, resource locks, work queues, inter-agent call state) to a DynamoDB table (`al-state`). This allows the scheduler to survive restarts (e.g., App Runner redeployments) without losing track of running containers. `al setup cloud` creates this table automatically with on-demand billing and native TTL.
408
-
409
- Locally, the same state is stored in a SQLite database at `.al/state.db` in your project directory.
410
-
411
- ### Gateway
412
-
413
- The gateway is **not required** for ECS mode. Containers get their credentials via native Secrets Manager injection (not the gateway's HTTP endpoint), and ECS handles task timeouts natively. The gateway still starts if you have webhooks configured, since webhooks are received by the scheduler process.
414
-
415
- ### Log streaming
416
-
417
- Logs are streamed from CloudWatch Logs by polling. There is a ~5-10 second delay inherent to CloudWatch Logs ingestion. The TUI displays a warning about this delay when running in ECS mode.
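Polling-based streaming generally follows the loop below. This is a generic sketch, not the package's CloudWatch client: `fetch_page` and `is_done` are hypothetical callables supplied by the caller, and the cursor stands in for whatever token the log backend returns.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# One poll result: the new log lines and the cursor to pass to the next poll.
Page = Tuple[List[str], Optional[str]]


def poll_logs(
    fetch_page: Callable[[Optional[str]], Page],
    is_done: Callable[[], bool],
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield log lines until the task has finished and one final fetch has run."""
    cursor: Optional[str] = None
    while True:
        # Check completion before fetching, so the last fetch still drains
        # lines that were ingested late.
        finished = is_done()
        lines, cursor = fetch_page(cursor)
        yield from lines
        if finished:
            return
        sleep(interval_s)
```

Checking completion before the fetch, rather than after, matters when ingestion lags: lines written just before the task exits still appear in the final poll.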
418
-
419
- ## Comparison with local Docker
420
-
421
- | Aspect | Local Docker | ECS Fargate | Lambda (auto, <=900s) |
422
- |--------|-------------|-------------|----------------------|
423
- | Where containers run | Your machine | AWS | AWS |
424
- | Cold start | Instant (image cached) | ~30-60s | ~1-2s |
425
- | Max runtime | Unlimited | Unlimited | 15 minutes |
426
- | Credential delivery | Volume mount | Secrets Manager env vars | Secrets Manager env vars |
427
- | Secret isolation | Mount-level | IAM task roles | IAM Lambda roles |
428
- | Log latency | Real-time | ~5-10s | ~5-10s |
429
- | Image builds | Local Docker | Remote via CodeBuild | Remote via CodeBuild |
430
- | Cost | Free (your hardware) | Pay per second | Pay per 100ms |
431
-
432
- ## AWS permissions summary
433
-
434
- There are six IAM principals involved:
435
-
436
- 1. **Operator** — your machine or CI (runs `al` commands)
437
- 2. **Execution role** — used by ECS itself to pull images, write logs, and inject secrets
438
- 3. **Task role** — one per agent on ECS Fargate, used by the container to read its own secrets
439
- 4. **Lambda execution role** — one per short-timeout agent, used by Lambda to read secrets and write logs
440
- 5. **App Runner access role** — allows App Runner to pull scheduler images from ECR (only for `al cloud deploy`)
441
- 6. **App Runner instance role** — assumed by the scheduler container, needs operator-level permissions (only for `al cloud deploy`)
442
-
443
- ### Operator IAM policy
444
-
445
- This is the minimum policy for the IAM user or role running `al` commands. Replace `<REGION>`, `<ACCOUNT_ID>`, and `<REPO_NAME>` with your values.
446
-
447
- > **Note:** `al setup cloud` automatically grants the PassRole and Logs statements via the `ActionLlamaOperator` inline policy. You still need to attach the remaining statements (ECS, SecretsManager, ECR, CodeBuild, Lambda, S3, IAM) manually or via your own IaC.
448
-
449
- ```json
450
- {
451
- "Version": "2012-10-17",
452
- "Statement": [
453
- {
454
- "Sid": "Identity",
455
- "Effect": "Allow",
456
- "Action": "sts:GetCallerIdentity",
457
- "Resource": "*"
458
- },
459
- {
460
- "Sid": "ECSRuntime",
461
- "Effect": "Allow",
462
- "Action": [
463
- "ecs:RegisterTaskDefinition",
464
- "ecs:RunTask",
465
- "ecs:DescribeTasks",
466
- "ecs:ListTasks",
467
- "ecs:StopTask"
468
- ],
469
- "Resource": "*"
470
- },
471
- {
472
- "Sid": "Logs",
473
- "Effect": "Allow",
474
- "Action": [
475
- "logs:CreateLogGroup",
476
- "logs:GetLogEvents",
477
- "logs:FilterLogEvents"
478
- ],
479
- "Resource": [
480
- "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/ecs/action-llama*",
481
- "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws/lambda/al-*",
482
- "arn:aws:logs:<REGION>:<ACCOUNT_ID>:log-group:/aws/apprunner/al-scheduler*"
483
- ]
484
- },
485
- {
486
- "Sid": "SecretsManager",
487
- "Effect": "Allow",
488
- "Action": [
489
- "secretsmanager:ListSecrets",
490
- "secretsmanager:CreateSecret",
491
- "secretsmanager:PutSecretValue",
492
- "secretsmanager:GetSecretValue"
493
- ],
494
- "Resource": "*"
495
- },
496
- {
497
- "Sid": "PassRole",
498
- "Effect": "Allow",
499
- "Action": "iam:PassRole",
500
- "Resource": [
501
- "arn:aws:iam::<ACCOUNT_ID>:role/al-*"
502
- ]
503
- },
504
- {
505
- "Sid": "IAMAgentRoles",
506
- "Effect": "Allow",
507
- "Action": [
508
- "iam:CreateRole",
509
- "iam:GetRole",
510
- "iam:GetRolePolicy",
511
- "iam:PutRolePolicy",
512
- "iam:DeleteRole",
513
- "iam:DeleteRolePolicy",
514
- "iam:AttachRolePolicy"
515
- ],
516
- "Resource": [
517
- "arn:aws:iam::<ACCOUNT_ID>:role/al-*"
518
- ]
519
- },
520
- {
521
- "Sid": "IAMListRoles",
522
- "Effect": "Allow",
523
- "Action": "iam:ListRoles",
524
- "Resource": "*"
525
- },
526
- {
527
- "Sid": "ECR",
528
- "Effect": "Allow",
529
- "Action": [
530
- "ecr:BatchGetImage",
531
- "ecr:GetDownloadUrlForLayer",
532
- "ecr:PutImage",
533
- "ecr:InitiateLayerUpload",
534
- "ecr:UploadLayerPart",
535
- "ecr:CompleteLayerUpload",
536
- "ecr:SetRepositoryPolicy"
537
- ],
538
- "Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_ID>:repository/<REPO_NAME>"
539
- },
540
- {
541
- "Sid": "SetupWizardReadOnly",
542
- "Effect": "Allow",
543
- "Action": [
544
- "ecr:DescribeRepositories",
545
- "ecr:CreateRepository",
546
- "ecs:ListClusters",
547
- "ecs:DescribeClusters",
548
- "ecs:CreateCluster",
549
- "ec2:DescribeVpcs",
550
- "ec2:DescribeSubnets",
551
- "ec2:DescribeSecurityGroups"
552
- ],
553
- "Resource": "*"
554
- },
555
- {
556
- "Sid": "CodeBuild",
557
- "Effect": "Allow",
558
- "Action": [
559
- "codebuild:StartBuild",
560
- "codebuild:BatchGetBuilds",
561
- "codebuild:CreateProject"
562
- ],
563
- "Resource": "arn:aws:codebuild:<REGION>:<ACCOUNT_ID>:project/al-image-builder"
564
- },
565
- {
566
- "Sid": "Lambda",
567
- "Effect": "Allow",
568
- "Action": [
569
- "lambda:GetFunction",
570
- "lambda:CreateFunction",
571
- "lambda:UpdateFunctionCode",
572
- "lambda:UpdateFunctionConfiguration",
573
- "lambda:PutFunctionEventInvokeConfig",
574
- "lambda:InvokeFunction"
575
- ],
576
- "Resource": "arn:aws:lambda:<REGION>:<ACCOUNT_ID>:function:al-*"
577
- },
578
- {
579
- "Sid": "S3BuildContext",
580
- "Effect": "Allow",
581
- "Action": [
582
- "s3:CreateBucket",
583
- "s3:PutObject",
584
- "s3:ListBucket"
585
- ],
586
- "Resource": "*"
587
- },
588
- {
589
- "Sid": "AppRunner",
590
- "Effect": "Allow",
591
- "Action": [
592
- "apprunner:CreateService",
593
- "apprunner:UpdateService",
594
- "apprunner:DescribeService",
595
- "apprunner:DeleteService"
596
- ],
597
- "Resource": "arn:aws:apprunner:<REGION>:<ACCOUNT_ID>:service/al-scheduler/*"
598
- },
599
- {
600
- "Sid": "AppRunnerList",
601
- "Effect": "Allow",
602
- "Action": "apprunner:ListServices",
603
- "Resource": "*"
604
- },
605
- {
606
- "Sid": "DynamoDB",
607
- "Effect": "Allow",
608
- "Action": [
609
- "dynamodb:GetItem",
610
- "dynamodb:PutItem",
611
- "dynamodb:DeleteItem",
612
- "dynamodb:Query",
613
- "dynamodb:CreateTable",
614
- "dynamodb:DescribeTable",
615
- "dynamodb:UpdateTimeToLive"
616
- ],
617
- "Resource": "arn:aws:dynamodb:<REGION>:<ACCOUNT_ID>:table/al-state"
618
- },
619
- {
620
- "Sid": "ServiceLinkedRoles",
621
- "Effect": "Allow",
622
- "Action": "iam:CreateServiceLinkedRole",
623
- "Resource": [
624
- "arn:aws:iam::<ACCOUNT_ID>:role/aws-service-role/ecs.amazonaws.com/*",
625
- "arn:aws:iam::<ACCOUNT_ID>:role/aws-service-role/apprunner.amazonaws.com/*"
626
- ]
627
- }
628
- ]
629
- }
630
- ```
631
-
632
- The `SetupWizardReadOnly` statement is only needed for `al setup cloud`. You can remove it after initial setup if you prefer a tighter policy.
633
-
634
- The `CodeBuild` and `S3BuildContext` statements are required for image builds via CodeBuild.
635
-
636
- The `SecretsManager` statement can be scoped to `arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:action-llama/*` if you use the default secret prefix.
637
-
638
- The `IAMAgentRoles` statement is scoped to `al-*` roles, so it cannot modify unrelated IAM resources.
639
-
640
- The `AppRunner` statement is only needed for `al cloud deploy` / `al teardown cloud`. You can omit it if you only run the scheduler locally.
641
-
642
- ### Execution role (ECS infrastructure)
643
-
644
- Attach the AWS managed policy `AmazonECSTaskExecutionRolePolicy`, plus an inline policy for secret injection:
645
-
646
- | Service | Actions |
647
- |---------|---------|
648
- | ECR | `GetDownloadUrlForLayer`, `BatchGetImage`, `GetAuthorizationToken` |
649
- | CloudWatch Logs | `CreateLogStream`, `PutLogEvents`, `CreateLogGroup` |
650
- | Secrets Manager | `GetSecretValue` (on all agent secrets, so ECS can inject them) |
651
-
652
- ### Task role (container, per-agent)
653
-
654
- Created automatically by `al doctor -c`. Each agent gets its own role scoped to only its secrets:
655
-
656
- | Service | Actions |
657
- |---------|---------|
658
- | Secrets Manager | `GetSecretValue` (scoped to only that agent's secrets) |
659
-
660
- ## Deploying the scheduler
661
-
662
- ### Using `al cloud deploy` (recommended)
663
-
664
- Deploy the scheduler as an AWS App Runner service:
665
-
666
- ```bash
667
- al cloud deploy -p .
668
- ```
669
-
670
- This builds a container image with the AL CLI and all project files baked in, pushes it to ECR, and creates an App Runner service. The scheduler runs in headless mode with the gateway enabled, providing a public HTTPS endpoint for webhooks.
671
-
672
- The deployed service URL is printed on completion. Use it to configure webhook endpoints in GitHub/Sentry/Linear:
673
-
674
- ```
675
- https://<service-id>.<region>.awsapprunner.com/webhooks/github
676
- ```
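As a minimal sketch, the webhook endpoint can be derived from the service URL printed on completion. The `webhook_url` helper name is hypothetical, and only the `github` path is confirmed by the example above; passing any other provider name is an assumption about the path layout.

```bash
# Hypothetical helper: build a webhook URL from the App Runner service URL.
# The provider argument defaults to github, the only path shown in the docs.
webhook_url() {
  local base="${1%/}"            # strip one trailing slash, if present
  local provider="${2:-github}"
  printf '%s/webhooks/%s\n' "$base" "$provider"
}
```

For example, `webhook_url "https://<service-id>.<region>.awsapprunner.com"` prints the GitHub endpoint in the form shown above.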
677
-
678
- #### App Runner IAM roles
679
-
680
- `al cloud deploy` requires two additional IAM roles:
681
-
682
- **1. Access role** — allows App Runner to pull images from ECR:
683
-
684
- ```bash
685
- aws iam create-role \
686
- --role-name al-apprunner-access-role \
687
- --assume-role-policy-document '{
688
- "Version": "2012-10-17",
689
- "Statement": [{
690
- "Effect": "Allow",
691
- "Principal": { "Service": "build.apprunner.amazonaws.com" },
692
- "Action": "sts:AssumeRole"
693
- }]
694
- }'
695
-
696
- aws iam attach-role-policy \
697
- --role-name al-apprunner-access-role \
698
- --policy-arn arn:aws:iam::aws:policy/service-role/AWSAppRunnerServicePolicyForECRAccess
699
- ```
700
-
701
- Set `cloud.appRunnerAccessRoleArn` to this role's ARN in `config.toml`.
702
-
703
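- **2. Instance role**: the IAM role assumed by the scheduler container. It needs the same permissions as the operator (ECS, Lambda, Secrets Manager, ECR, CodeBuild, S3, DynamoDB, CloudWatch Logs, PassRole, IAM agent roles). Create it with a trust policy for `tasks.apprunner.amazonaws.com`: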
704
-
705
- ```bash
706
- aws iam create-role \
707
- --role-name al-apprunner-instance-role \
708
- --assume-role-policy-document '{
709
- "Version": "2012-10-17",
710
- "Statement": [{
711
- "Effect": "Allow",
712
- "Principal": { "Service": "tasks.apprunner.amazonaws.com" },
713
- "Action": "sts:AssumeRole"
714
- }]
715
- }'
716
- ```
717
-
718
- Attach the same policy statements from the [operator IAM policy](#operator-iam-policy) to this role (ECS, Lambda, Secrets Manager, ECR, CodeBuild, S3, DynamoDB, CloudWatch Logs, PassRole, IAM agent roles). Set `cloud.appRunnerInstanceRoleArn` to this role's ARN.
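One way to attach those statements is as an inline policy, sketched below. The file name `operator-policy.json`, the policy name `ActionLlamaInstance`, and the `AWS_CLI` override (used only for dry runs) are assumptions for illustration, not names the tool defines.

```bash
# Sketch: attach a saved copy of the operator policy to the instance role.
# operator-policy.json and ActionLlamaInstance are hypothetical names.
# AWS_CLI is an assumed override so the command can be dry-run (e.g. AWS_CLI=echo).
attach_instance_policy() {
  local policy_file="${1:-operator-policy.json}"
  "${AWS_CLI:-aws}" iam put-role-policy \
    --role-name al-apprunner-instance-role \
    --policy-name ActionLlamaInstance \
    --policy-document "file://${policy_file}"
}
```

Running `AWS_CLI=echo attach_instance_policy operator-policy.json` prints the command that would be issued without calling AWS.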
719
-
720
- #### Managing the cloud scheduler
721
-
722
- ```bash
723
- al stat -c # Show scheduler service status + running agents
724
- al logs scheduler -c # Tail scheduler logs from CloudWatch
725
- al teardown cloud # Tear down scheduler + all cloud resources
726
- ```
727
-
728
- ### Manual deployment (alternative)
729
-
730
- You can also deploy the scheduler manually to any platform that runs Node.js (Railway, Fly, EC2, etc.).
731
-
732
- **Required environment variables:**
733
-
734
- | Env var | Description |
735
- |---------|-------------|
736
- | `AWS_ACCESS_KEY_ID` | AWS access key for the operator IAM user/role |
737
- | `AWS_SECRET_ACCESS_KEY` | AWS secret key |
738
-
739
- These give the scheduler the same permissions as running `al` locally. Use the [operator IAM policy](#operator-iam-policy) to scope the access.
740
-
741
- **Start command:**
742
-
743
- ```
744
- al start -c --headless
745
- ```
746
-
747
- **What needs to be in the deploy:**
748
-
749
- - Your project repo (with `config.toml`, agent directories containing `agent-config.toml` and `ACTIONS.md`)
750
- - `@action-llama/action-llama` as a dependency in `package.json`
751
-
752
- The scheduler builds images via CodeBuild, launches containers on ECS Fargate, and streams logs from CloudWatch — all remotely. No local Docker is needed.
753
-
754
- ## Troubleshooting
755
-
756
- **"ECS runtime requires cloud.awsRegion..."** — Ensure all required fields are set in `config.toml` under `[cloud]`.
757
-
758
- **"No AWS credentials found"** — Set `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` env vars or run `aws configure`.
759
-
760
- **"Unable to assume the service linked role"** — ECS needs a service-linked role the first time it's used in an account. `al setup cloud` creates this automatically, but if you set up manually: `aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com`.
761
-
762
- **"Couldn't create a service-linked role for App Runner"** — Same issue for App Runner. `al setup cloud` creates this automatically, but if you set up manually: `aws iam create-service-linked-role --aws-service-name apprunner.amazonaws.com`. If `al setup cloud` itself fails with this error, your IAM user needs `iam:CreateServiceLinkedRole` permission — see [Service-linked roles](#service-linked-roles).
763
-
764
- **"ECS was unable to assume the role 'arn:aws:iam::...:role/al-AGENT-task-role'"**: This is the most common issue with multiple agents. It means the IAM task role for the named agent doesn't exist or cannot be assumed by ECS. This typically happens because:
765
-
766
- 1. You ran `al setup cloud` with only one agent, then added more agents later
767
- 2. The per-agent role creation failed during setup
768
- 3. The role exists but has an incorrect trust policy
769
-
770
- **Solutions:**
771
- - **Quick fix:** Run `al doctor -c` to validate and create missing roles
772
- - **Verify setup:** Run `al doctor -c --check-only` to see what's missing without making changes
773
- - **Manual check:** Run `aws iam get-role --role-name al-AGENT-task-role` to see if the role exists
774
-
775
- **Prevention:** Always run `al doctor -c` after adding new agents to ensure their IAM roles are created.
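The manual check above can be repeated for every agent with a small loop. This is a sketch under assumptions: the helper names are hypothetical, the role name follows the `al-AGENT-task-role` pattern from the error message, and `AWS_CLI` is an assumed override for dry runs.

```bash
# Hypothetical helper: derive the per-agent task role name (al-<agent>-task-role).
task_role_name() {
  printf 'al-%s-task-role\n' "$1"
}

# Print each agent whose task role cannot be fetched with `aws iam get-role`.
# AWS_CLI is an assumed override hook so the loop can run without credentials.
missing_task_roles() {
  local agent
  for agent in "$@"; do
    if ! "${AWS_CLI:-aws}" iam get-role \
         --role-name "$(task_role_name "$agent")" >/dev/null 2>&1; then
      printf '%s\n' "$agent"
    fi
  done
}
```

Calling `missing_task_roles dev reviewer` (agent names are examples) lists the agents that still need `al doctor -c` to create their roles. A failed lookup can also mean missing `iam:GetRole` permission, so treat the output as a hint rather than proof.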
776
-
777
- **"Failed to start ECS task"** — Check that the ECS cluster exists, subnets have internet access, and the execution role has the required permissions.
778
-
779
- **CodeBuild build fails** — Check the build logs linked in the error message. Common causes: the `al-codebuild-role` is missing or lacks ECR push permissions, or the S3 bucket doesn't exist. Verify the role exists and has the permissions listed in the "How CodeBuild works" section above.
780
-
781
- **"The specified log group does not exist"** — The CloudWatch log group `/ecs/action-llama` hasn't been created. The runtime creates it automatically on first launch, but the operator IAM user needs `logs:CreateLogGroup` permission. Either re-run `al setup cloud` (which creates it), or create it manually:
782
-
783
- ```bash
784
- aws logs create-log-group --log-group-name /ecs/action-llama --region us-east-1
785
- ```
786
-
787
- If you get `AccessDeniedException`, add the `logs:CreateLogGroup` action to your operator IAM policy (see the operator policy above).
788
-
789
- **"not authorized to perform: logs:FilterLogEvents"**: Your operator IAM user is missing CloudWatch Logs read permissions. Running `al setup cloud` grants these automatically through the `ActionLlamaOperator` inline policy. If you ran `al setup cloud` before this policy was introduced, re-run it or manually add the Logs statement from the operator policy above.
790
-
791
- **Logs are delayed** — This is expected. CloudWatch Logs has a ~5-10 second ingestion delay. The TUI shows a warning when running in ECS mode.
792
-
793
- **Agent can't access secrets**: Verify the per-agent task role has `secretsmanager:GetSecretValue` on the correct secret ARNs. For an agent named `dev`, check with `aws iam get-role-policy --role-name al-dev-task-role --policy-name SecretsAccess`.
794
-
795
- **Task stops immediately with exit code 1** — Check CloudWatch Logs for the error. Common causes: missing credentials in Secrets Manager, missing `ACTIONS.md`, invalid model config.