archal 0.9.19 → 0.9.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92)
  1. package/README.md +9 -1
  2. package/agents/github-octokit/.archal.json +8 -0
  3. package/agents/github-octokit/Dockerfile +8 -0
  4. package/agents/github-octokit/README.md +113 -0
  5. package/agents/github-octokit/agent.mjs +54 -0
  6. package/agents/github-octokit/package.json +9 -0
  7. package/agents/github-octokit/scenarios/test-repo-access.md +27 -0
  8. package/agents/google-workspace-local-tools/Dockerfile +6 -0
  9. package/agents/google-workspace-local-tools/README.md +58 -0
  10. package/agents/google-workspace-local-tools/agent.mjs +196 -0
  11. package/agents/google-workspace-local-tools/archal-harness.json +7 -0
  12. package/agents/google-workspace-local-tools/run-input.yaml +16 -0
  13. package/agents/google-workspace-local-tools/scenario.md +29 -0
  14. package/agents/hermes/.archal.json +8 -0
  15. package/agents/hermes/Dockerfile +46 -0
  16. package/agents/hermes/README.md +87 -0
  17. package/agents/hermes/SOUL.md +27 -0
  18. package/agents/hermes/config.yaml +34 -0
  19. package/agents/hermes/drive.mjs +113 -0
  20. package/agents/hermes/scenarios/stripe-customers-read-only.md +32 -0
  21. package/agents/openclaw/.archal.json +8 -0
  22. package/agents/openclaw/Dockerfile +96 -0
  23. package/agents/openclaw/README.md +120 -0
  24. package/agents/openclaw/drive.mjs +311 -0
  25. package/agents/openclaw/package.json +9 -0
  26. package/agents/openclaw/scenarios/github-issue-triage-read-only.md +44 -0
  27. package/agents/openclaw/workspace/AGENTS.md +23 -0
  28. package/agents/openclaw/workspace/IDENTITY.md +8 -0
  29. package/agents/openclaw/workspace/SOUL.md +14 -0
  30. package/agents/openclaw/workspace/TOOLS.md +35 -0
  31. package/agents/pagination-test/README.md +24 -0
  32. package/agents/pagination-test/scenario.md +24 -0
  33. package/agents/replay-capsule-harness/README.md +29 -0
  34. package/agents/replay-capsule-harness/observability-install-offline-e2e.mts +1517 -0
  35. package/agents/replay-capsule-harness/replay-capsule-e2e.mjs +104 -0
  36. package/clone-assets/apify/tools.json +256 -22
  37. package/clone-assets/calcom/tools.json +510 -0
  38. package/clone-assets/clickup/tools.json +1258 -0
  39. package/clone-assets/customerio/tools.json +386 -0
  40. package/clone-assets/datadog/tools.json +734 -0
  41. package/clone-assets/github/tools.json +306 -25
  42. package/clone-assets/gitlab/tools.json +999 -0
  43. package/clone-assets/google-workspace/tools.json +18 -6
  44. package/clone-assets/hubspot/tools.json +1406 -0
  45. package/clone-assets/jira/fidelity.json +1 -1
  46. package/clone-assets/jira/tools.json +266 -543
  47. package/clone-assets/linear/tools.json +238 -40
  48. package/clone-assets/ownerrez/tools.json +548 -0
  49. package/clone-assets/pricelabs/tools.json +343 -0
  50. package/clone-assets/sentry/tools.json +745 -0
  51. package/clone-assets/slack/tools.json +1 -2
  52. package/clone-assets/stripe/tools.json +185 -46
  53. package/clone-assets/supabase/tools.json +437 -0
  54. package/clone-assets/unipile/tools.json +408 -0
  55. package/clone-assets/webflow/tools.json +415 -0
  56. package/dist/autoloop-worker-types-BEb_E44z.d.cts +196 -0
  57. package/dist/cli.cjs +150299 -87430
  58. package/dist/commands/autoloop-hosted-worker.cjs +43942 -0
  59. package/dist/commands/autoloop-hosted-worker.d.cts +143 -0
  60. package/dist/commands/autoloop-pr-verification.cjs +4227 -0
  61. package/dist/commands/autoloop-pr-verification.d.cts +17 -0
  62. package/dist/{vitest/chunk-L36NXAU6.js → commands/autoloop-result-parser.cjs} +16445 -18852
  63. package/dist/commands/autoloop-result-parser.d.cts +39 -0
  64. package/dist/commands/autoloop-worker.cjs +36163 -0
  65. package/dist/commands/autoloop-worker.d.cts +97 -0
  66. package/dist/harness.cjs +1 -0
  67. package/dist/index.cjs +1 -1
  68. package/dist/replay.cjs +49624 -0
  69. package/dist/replay.d.cts +4625 -0
  70. package/dist/scenarios.cjs +80343 -0
  71. package/dist/scenarios.d.cts +562 -0
  72. package/dist/vitest/chunk-6CBYFCFK.js +4667 -0
  73. package/dist/vitest/chunk-ARVS45PP.js +2764 -0
  74. package/dist/vitest/index.cjs +6011 -75261
  75. package/dist/vitest/index.d.ts +7 -6
  76. package/dist/vitest/index.js +8 -8
  77. package/dist/vitest/runtime/hosted-session-reaper.cjs +792 -34359
  78. package/dist/vitest/runtime/hosted-session-reaper.js +1 -1
  79. package/dist/vitest/runtime/setup-files.js +2 -2
  80. package/package.json +8 -3
  81. package/skills/archal-agent/SKILL.md +87 -0
  82. package/skills/{attach → autoloop}/SKILL.md +94 -120
  83. package/skills/autoloop/references/hosted-sources.md +62 -0
  84. package/skills/autoloop/references/trace-schema-mapping.md +73 -0
  85. package/skills/eval/SKILL.md +35 -1
  86. package/skills/install-agent/SKILL.md +221 -0
  87. package/skills/onboard/SKILL.md +73 -5
  88. package/skills/scenario/SKILL.md +19 -4
  89. package/skills/seed/SKILL.md +237 -0
  90. package/dist/seed/dynamic-generator.cjs +0 -45687
  91. package/dist/seed/dynamic-generator.d.cts +0 -106
  92. package/dist/vitest/chunk-WZ7SA4CK.js +0 -47369
@@ -0,0 +1,221 @@
1
+ ---
2
+ name: install-agent
3
+ description: Connect an agent's repo and its production observability to Archal so its traces get captured and graded. Detects an existing observability stack (LangSmith, Langfuse, Datadog, OpenTelemetry, Braintrust), connects the GitHub App, opens an observability setup PR, and wires an existing trace vendor through `archal trace-source`. USE THIS whenever the user says "connect my agent", "install the Archal agent", "set up observability", "capture my agent's traces", "hook up my production traces", "where do my traces go", or asks how to get an already-running agent into Archal. Reach for it before telling anyone a capability is missing — read the honest limits below first.
4
+ user-invocable: true
5
+ argument-hint: "[repo + where its traces live]"
6
+ ---
7
+
8
+ # Archal Install Agent
9
+
10
+ You are connecting a real, already-running agent to Archal so its production
11
+ behavior gets captured and graded. Two things have to land: (1) Archal can read
12
+ the **repo** (GitHub App), and (2) Archal can read the agent's **traces** —
13
+ either by adding instrumentation through a setup PR, or by ingesting an existing
14
+ observability vendor. Once traces flow, grading and the autoloop take over.
15
+
16
+ Be honest about what this is. The "install agent" is **not** a sandboxed coding
17
+ agent that edits arbitrary code in the repo. It is deterministic repo inspection
18
+ plus a deterministic, templated setup PR, plus an optional one-shot managed
19
+ planner (an LLM call) that only relocates the bootstrap file and writes advisory
20
+ PR-body text. Set that expectation up front so nobody waits for an autonomous
21
+ coder that does not exist yet. The honest limit is spelled out below — do not
22
+ oversell it.
23
+
24
+ ## Why this exists
25
+
26
+ Archal grades agent behavior from traces. An agent that already runs in
27
+ production has traces somewhere — your own logs, or a vendor like Langfuse or
28
+ Braintrust. The install path's whole job is to get those traces into Archal's
29
+ normalized shape without you hand-writing exporters or copying secrets around.
30
+ Capturing the trace is the precondition for everything downstream: grading,
31
+ reproduction, and the autoloop that turns reproduced failures into PRs.
32
+
33
+ ## Discover first
34
+
35
+ Before changing anything, read the repo and find out where traces live:
36
+
37
+ 1. `package.json` / `pyproject.toml` / `requirements.txt`: language and
38
+ framework. Language matters — the planner and `@archal/state-capture` are
39
+ TypeScript-only today (see limits).
40
+ 2. Existing observability dependencies. Archal's detector recognizes exactly
41
+ five vendors by dependency name:
42
+ - `langsmith` -> LangSmith
43
+ - `langfuse`, `langfuse-node` (TS) / `langfuse` (py) -> Langfuse
44
+ - `dd-trace` (TS) / `ddtrace` (py) -> Datadog
45
+ - `@opentelemetry/sdk-node`, `@opentelemetry/sdk-trace-node` (TS) /
46
+ `opentelemetry-sdk` (py) -> OpenTelemetry
47
+ - `braintrust` -> Braintrust
48
+ A repo with any of these is a candidate for **augment**; a repo with none is
49
+ **greenfield**.
50
+ 3. GitHub remote — augment/greenfield setup PRs and the autoloop need a GitHub
51
+ remote that resolves to `github.com/<owner>/<repo>`:
52
+ ```bash
53
+ git remote get-url origin
54
+ ```
55
+ 4. Where do the traces actually go? Ask the user. The answer routes you:
56
+ - already in a hosted vendor (Langfuse, Braintrust) or a Postgres/Supabase
57
+ table -> ingest path (`archal trace-source`, delegate detail to `autoloop`)
58
+ - exported files on disk -> `archal trace-source import`
59
+ - nowhere yet / only app logs -> the observability setup PR
60
+
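The dependency-name detection in step 2 can be sketched as a lookup table. This is an illustrative sketch only, not Archal's detector: `detectVendors` and `installCandidate` are hypothetical names, and the only facts taken from the list above are the dependency names and the augment/greenfield rule.

```typescript
// Hypothetical sketch; function and constant names are illustrative, not Archal's API.
type Vendor = "LangSmith" | "Langfuse" | "Datadog" | "OpenTelemetry" | "Braintrust";

// Dependency names copied from the list above (TS and Python variants merged).
const VENDOR_BY_DEPENDENCY: { [dependency: string]: Vendor } = {
  "langsmith": "LangSmith",
  "langfuse": "Langfuse",
  "langfuse-node": "Langfuse",
  "dd-trace": "Datadog",
  "ddtrace": "Datadog",
  "@opentelemetry/sdk-node": "OpenTelemetry",
  "@opentelemetry/sdk-trace-node": "OpenTelemetry",
  "opentelemetry-sdk": "OpenTelemetry",
  "braintrust": "Braintrust",
};

// Return the distinct vendors implied by a repo's dependency names, in first-seen order.
function detectVendors(dependencies: string[]): Vendor[] {
  const found: Vendor[] = [];
  for (const dep of dependencies) {
    // hasOwnProperty guards against inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(VENDOR_BY_DEPENDENCY, dep)) continue;
    const vendor = VENDOR_BY_DEPENDENCY[dep];
    if (vendor !== undefined && found.indexOf(vendor) === -1) found.push(vendor);
  }
  return found;
}

// Any detected vendor makes the repo an augment candidate; none means greenfield.
function installCandidate(dependencies: string[]): "augment" | "greenfield" {
  return detectVendors(dependencies).length > 0 ? "augment" : "greenfield";
}
```

For example, `installCandidate(["express", "dd-trace"])` yields `"augment"`, while a repo with only `express` yields `"greenfield"`. Whether an augment candidate actually gets the augment install still depends on the language, as the install-mode section describes.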
61
+ Never print secrets while inspecting. Show env var names or secret references,
62
+ never plaintext keys or database URLs.
63
+
64
+ ## Preconditions
65
+
66
+ - Archal CLI installed in the repo or reachable with `npx archal`
67
+ - authenticated user (`archal login`) or `ARCHAL_TOKEN=archal_ws_...` (a
68
+ workspace key for CI)
69
+ - the **Archal GitHub App** installed on the target repo (required for the setup
70
+ PR and for autoloop fix PRs)
71
+ - a GitHub remote resolving to `github.com/<owner>/<repo>`
72
+ - for the ingest path: a read-only credential for the trace vendor
73
+
74
+ If a precondition is missing, make the smallest safe change and name what is
75
+ still required. Do not fake a connection.
76
+
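The GitHub remote precondition can be checked mechanically once `git remote get-url origin` has printed a URL. The sketch below is one hedged way to do it: `parseGitHubRemote` is a hypothetical helper, and the accepted URL shapes (HTTPS plus the two common SSH forms) are an assumption, not Archal's documented rule.

```typescript
// Hypothetical helper; the accepted URL shapes are an assumption for illustration.
interface GitHubRepo {
  owner: string;
  repo: string;
}

// Return owner/repo when the remote points at github.com, otherwise null.
function parseGitHubRemote(remoteUrl: string): GitHubRepo | null {
  const patterns: RegExp[] = [
    /^https:\/\/github\.com\/([^\/\s]+)\/([^\/\s]+?)(?:\.git)?\/?$/, // HTTPS
    /^git@github\.com:([^\/\s]+)\/([^\/\s]+?)(?:\.git)?$/,           // scp-style SSH
    /^ssh:\/\/git@github\.com\/([^\/\s]+)\/([^\/\s]+?)(?:\.git)?$/,  // ssh:// URL
  ];
  const trimmed = remoteUrl.trim();
  for (const pattern of patterns) {
    const match = pattern.exec(trimmed);
    if (match === null) continue;
    const owner = match[1];
    const repo = match[2];
    if (owner !== undefined && repo !== undefined) return { owner, repo };
  }
  return null;
}
```

A `null` result corresponds to the "No GitHub remote" case: name it as the missing precondition and stop there.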
77
+ ## Step 1 — connect the GitHub App
78
+
79
+ The repo connection is the GitHub App, not a token paste. Confirm the **Archal
80
+ GitHub App** is installed on the target repository and that the org granted it
81
+ access. Without it, the setup PR cannot be opened and the autoloop cannot open
82
+ fix PRs. If it is not installed, send the user to the dashboard's integration
83
+ flow to install it, then continue.
84
+
85
+ ## Step 2 — open the observability setup PR
86
+
87
+ When traces are not yet exported anywhere (or the user wants Archal's own
88
+ capture), open the **observability setup PR**. It is a deterministic, templated
89
+ patch — every file's contents are pre-generated; nothing is freely authored by
90
+ an LLM. The patch resolves into one of two install modes:
91
+
92
+ ### Greenfield (no existing observability detected)
93
+
94
+ Adds standard OpenTelemetry instrumentation pointed at Archal's OTLP endpoint:
95
+
96
+ - `archal-otel.ts` (TS) or `archal_otel.py` (Python) — an OpenTelemetry init
97
+ bootstrap (OTLP HTTP exporter + the node/python SDK), **not** an Archal-only
98
+ exporter
99
+ - `archal-replay-capsule.ts` / `archal_replay_capsule.py` — a replay helper
100
+ template
101
+ - OpenTelemetry SDK + framework instrumentation added to `package.json` (TS) or
102
+ `requirements.txt` (Python)
103
+ - an `.env.example` entry for the workspace key and an `ARCHAL_OBSERVABILITY.md`
104
+ guide
105
+
106
+ ### Augment (existing observability detected, TypeScript only)
107
+
108
+ When the repo already has one of the five vendors above **and** is TypeScript,
109
+ the PR instead adds Archal's state capture alongside the existing stack rather
110
+ than replacing it:
111
+
112
+ - `archal-state-capture.ts` importing from `@archal/state-capture`
113
+ - `@archal/state-capture` and `@opentelemetry/api` added to `package.json`
114
+ - the same `.env.example` entry and `ARCHAL_OBSERVABILITY.md` guide
115
+
116
+ Python repos with existing observability fall back to the greenfield
117
+ OpenTelemetry install — `@archal/state-capture` has no Python build yet, and the
118
+ PR body says so. Tell the user that explicitly rather than implying parity.
119
+
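The two install modes and the Python fallback reduce to a small decision function. The sketch below restates the rules from this section; `resolveInstallMode` and `InstallPlan` are hypothetical names, and the file lists are copied from the bullets above rather than from Archal's source.

```typescript
// Hypothetical sketch of the mode decision; names are illustrative.
type Language = "typescript" | "python";

interface InstallPlan {
  mode: "greenfield" | "augment";
  files: string[];     // files the setup PR adds or edits, per the lists above
  note: string | null; // caveat to surface to the user, if any
}

const SHARED_FILES: string[] = [".env.example", "ARCHAL_OBSERVABILITY.md"];

function resolveInstallMode(language: Language, hasExistingVendor: boolean): InstallPlan {
  // Augment requires both an existing vendor and a TypeScript repo.
  if (hasExistingVendor && language === "typescript") {
    return {
      mode: "augment",
      files: ["archal-state-capture.ts", "package.json"].concat(SHARED_FILES),
      note: null,
    };
  }
  // Everything else is the greenfield OpenTelemetry install.
  const files =
    language === "typescript"
      ? ["archal-otel.ts", "archal-replay-capsule.ts", "package.json"]
      : ["archal_otel.py", "archal_replay_capsule.py", "requirements.txt"];
  return {
    mode: "greenfield",
    files: files.concat(SHARED_FILES),
    note: hasExistingVendor
      ? "Python falls back to greenfield OpenTelemetry: @archal/state-capture has no Python build yet."
      : null,
  };
}
```

The `note` field exists so the fallback is reported explicitly instead of implying parity between the TypeScript and Python paths.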
120
+ ### The optional install planner (managed LLM)
121
+
122
+ On Pro/Enterprise workspaces, for a TypeScript greenfield/augment install with
123
+ repo detection available, one managed LLM call (intent `observability-install`,
124
+ public label **Archal install planner**, routed through the managed eval model
125
+ lane — gemini-class — and metered as `cogs_only` spend) adapts the deterministic
126
+ patch to the repo's real layout. It is strictly additive and **fail-open**: any
127
+ availability, auth, plan-gate, or validation problem ships the deterministic
128
+ patch unchanged. The planner can only:
129
+
130
+ - relocate the bootstrap file to a better directory (path only — file contents
131
+ are never edited)
132
+ - append advisory text to the PR body (where to wire startup, which functions to
133
+ wrap)
134
+
135
+ It never edits application code, never modifies existing instrumentation, and
136
+ never runs free-form codegen. Disable it with `ARCHAL_INSTALL_PLANNER_DISABLED=1`
137
+ to force the deterministic install.
138
+
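The planner's gating and fail-open behavior can be sketched as a wrapper around the deterministic patch. Everything named here is hypothetical (`SetupPatch`, `plannerSkipReason`, `planWithFailOpen`), the plan strings `"pro"` and `"enterprise"` are assumed spellings, and treating only the value `1` as "disabled" is an assumption based on the documented `ARCHAL_INSTALL_PLANNER_DISABLED=1`.

```typescript
// Hypothetical sketch of planner gating; not Archal's implementation.
interface SetupPatch {
  bootstrapPath: string; // the planner may relocate this path
  prBody: string;        // the planner may append advisory text here
}

interface PlannerContext {
  plan: string;                    // assumed spellings: "pro", "enterprise"
  language: "typescript" | "python";
  detectionAvailable: boolean;
  disabledEnv: string | undefined; // value of ARCHAL_INSTALL_PLANNER_DISABLED
}

// Return why the planner is skipped, or null when it may run.
function plannerSkipReason(ctx: PlannerContext): string | null {
  if (ctx.disabledEnv === "1") return "ARCHAL_INSTALL_PLANNER_DISABLED=1";
  if (ctx.plan !== "pro" && ctx.plan !== "enterprise") return "plan is not Pro/Enterprise";
  if (ctx.language !== "typescript") return "repo is not TypeScript";
  if (!ctx.detectionAvailable) return "repo detection unavailable";
  return null;
}

// Fail-open: any skip or planner error ships the deterministic patch unchanged.
function planWithFailOpen(
  patch: SetupPatch,
  ctx: PlannerContext,
  planner: (patch: SetupPatch) => SetupPatch,
): SetupPatch {
  if (plannerSkipReason(ctx) !== null) return patch;
  try {
    return planner(patch);
  } catch {
    return patch;
  }
}
```

Because the fallback value is the untouched input, a planner outage can only narrow the install and never block it.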
139
+ ## Step 3 — ingest an existing observability vendor
140
+
141
+ If the agent already emits traces to a vendor, you usually do **not** need the
142
+ setup PR — you normalize the existing traces with `archal trace-source`. This is
143
+ the genuine ingest path. It maps a vendor's payloads into Archal's trace upload
144
+ envelopes and uploads them to hosted autoloop when workspace auth is present.
145
+
146
+ Supported providers: `langfuse`, `braintrust`, `otel`, `http`, `supabase`,
147
+ `postgres`, `file`, `custom`. Pull/sync vendors (`langfuse`, `braintrust`,
148
+ Postgres/Supabase) are fetched on a cursor; push sources (`otel`, `http`,
149
+ `custom`) receive traces continuously through `serve`.
150
+
151
+ The command surface:
152
+
153
+ ```bash
154
+ archal trace-source connect <provider> # register a source (e.g. langfuse, braintrust, otel, custom)
155
+ archal trace-source test [source] # validate credentials and reachability
156
+ archal trace-source sync [source] # pull-fetch new traces (langfuse/braintrust/db)
157
+ archal trace-source watch [source] # continuous pull loop
158
+ archal trace-source serve [source] # receiver for push sources (otel/http/custom)
159
+ archal trace-source import <path> # normalize exported trace files on disk
160
+ archal trace-source status [source] # registry validation, cursor, last-sync state
161
+ archal trace-source list # registered sources
162
+ archal trace-source use|disable <source> # select / disable a source
163
+ ```
164
+
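The pull/push split above determines which subcommand to reach for after `connect`. A minimal sketch, assuming a hypothetical `ingestCommand` helper: the pull and push groupings come from the provider list, while routing `file` to `import` is an assumption, since the text only says exported files on disk go through `archal trace-source import`.

```typescript
// Hypothetical helper mapping a provider to the subcommand that moves its traces.
type Provider =
  | "langfuse" | "braintrust" | "otel" | "http"
  | "supabase" | "postgres" | "file" | "custom";

const PULL_PROVIDERS: Provider[] = ["langfuse", "braintrust", "supabase", "postgres"];
const PUSH_PROVIDERS: Provider[] = ["otel", "http", "custom"];

// `target` is a source id for pull/push providers and a path for `file`.
function ingestCommand(provider: Provider, target: string): string {
  if (PULL_PROVIDERS.indexOf(provider) !== -1) return `archal trace-source sync ${target}`;
  if (PUSH_PROVIDERS.indexOf(provider) !== -1) return `archal trace-source serve ${target}`;
  // Assumption: the remaining provider, `file`, is handled by `import <path>`.
  return `archal trace-source import ${target}`;
}
```

For a continuous pull loop, `watch` takes the place of `sync`; the per-vendor flags stay with the `autoloop` skill.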
165
+ **Delegate the deep mapping to the `autoloop` skill.** It owns the per-vendor
166
+ flags (`--base-url`, `--api-key-env`, `--out`, `--upload`, `--repository`,
167
+ schema/cursor/filter mapping for database sources) and the full
168
+ import/grade/reproduce/fix loop. Quote the command names here; point the user
169
+ there for the flag detail and the autoloop wiring.
170
+
171
+ ## The honest limit
172
+
173
+ There is **no sandboxed coding agent** that reads the whole repo and edits
174
+ arbitrary code to wire up instrumentation. For a small or conventional repo, the
175
+ setup PR drops in and works. For a large or unusual repo, the setup PR is a
176
+ **generic bootstrap the user finishes** — they still import the bootstrap at
177
+ startup and wrap the call sites the planner's advisory section points at. Say
178
+ this plainly. The typed model lanes `remediation_agent` and
179
+ `observability_install_agent` exist in the contract but are
180
+ `agent-executor-contract-only`: the lane is declared, but no real executor
181
+ consumes it yet. Do not describe either as a working autonomous coder.
182
+
183
+ ## Failure taxonomy
184
+
185
+ Classify precisely; do not paper over a missing precondition:
186
+
187
+ - **GitHub App not connected** — setup PR and fix PRs cannot open. Install the
188
+ Archal GitHub App on the repo.
189
+ - **No GitHub remote** — augment/greenfield PRs need `github.com/<owner>/<repo>`.
190
+ - **Language unsupported by augment** — Python with existing observability falls
191
+ back to greenfield OTel; `@archal/state-capture` is TS-only.
192
+ - **Planner skipped** — wrong plan (needs Pro/Enterprise), non-TypeScript,
193
+ `augment-existing-vendor` legacy mode, detection unavailable, or
194
+ `ARCHAL_INSTALL_PLANNER_DISABLED=1`. The deterministic patch still ships; this
195
+ is not a failure, just a narrower install.
196
+ - **Setup PR is a stub for this repo** — large/unusual layout; the user finishes
197
+ wiring. Not a bug; the honest limit.
198
+ - **Trace ingest failure** — `trace-source` adapter mismatch, bad credential,
199
+ rejected upload, or missing workspace auth. Use `archal trace-source test` and
200
+ `status` to localize it.
201
+ - **No usable trace evidence** — once ingested, grading or reproduction can still
202
+ block if the trace lacks task context or state. Hand off to the `autoloop`
203
+ skill's evidence diagnosis.
204
+
205
+ ## What to report back
206
+
207
+ After install or debugging, give the user:
208
+
209
+ - repo full name and whether the GitHub App is connected
210
+ - chosen path: setup PR (greenfield vs augment) or `trace-source` ingest
211
+ - if a setup PR: the install mode, the files it adds, and whether the planner ran
212
+ - if ingest: provider, source id, and the next `archal trace-source` command
213
+ - whether traces are flowing into Archal yet, or the exact blocker
214
+ - next command or next owner
215
+
216
+ ## Docs
217
+
218
+ - Autoloop production traces: https://docs.archal.ai/guides/autoloop-production-traces
219
+ - Autonomous loops: https://docs.archal.ai/guides/autoloop-production-traces
220
+ - CLI reference: https://docs.archal.ai/cli/autoloop
221
+ - Quickstart: https://docs.archal.ai/quickstart
@@ -87,6 +87,16 @@ Confirm detected clones, then ask which of these the user wants. Each delegates
87
87
 
88
88
  If the user doesn't have a harness yet, prefer `npx archal init`; it creates `./.archal/harness.mjs`, points `.archal.json` at it, and adds a starter scenario without overwriting existing files. The generated harness is a guarded stub: Archal refuses to score it until the user edits it to call their Cursor, Codex, Claude Code, or custom agent. A custom harness should read `AGENT_TASK` from env, call the agent runtime, print `{ "text": "..." }` to stdout, and call `reportAgentMetrics()` from `archal/harness` with accumulated `{ inputTokens, outputTokens, llmCallCount }` before exit. Service clients need one explicit routing mode: use sandbox/Docker routing when the harness calls normal service URLs such as `https://api.github.com`, or configure SDK base URLs from `AGENT_CLONE_URLS` and add the JSON headers from `AGENT_ROUTE_HEADERS` to those clone requests. Alternative: skip `agent` in `.archal.json` and pass `--harness <path>` per-run.
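The custom-harness contract described above (read `AGENT_TASK`, call the agent, print `{ "text": "..." }`, report metrics) can be sketched as a pure function. This is a hedged illustration, not the generated stub: `runHarness` is a hypothetical name, the agent call is synchronous for brevity, and the environment, the agent, and the metrics reporter are injected so that the sketch runs without the `archal` package installed.

```typescript
// Hypothetical sketch of the harness contract; dependencies are injected for illustration.
interface AgentMetrics {
  inputTokens: number;
  outputTokens: number;
  llmCallCount: number;
}

interface AgentResult {
  text: string;
  metrics: AgentMetrics; // totals accumulated across the agent's model calls
}

// Returns the JSON line the harness should print to stdout.
function runHarness(
  env: { [name: string]: string | undefined },
  callAgent: (task: string) => AgentResult,
  reportMetrics: (metrics: AgentMetrics) => void,
): string {
  const task = env["AGENT_TASK"];
  if (task === undefined || task === "") {
    throw new Error("AGENT_TASK is not set");
  }
  const result = callAgent(task);
  // Report accumulated usage before the process exits.
  reportMetrics(result.metrics);
  return JSON.stringify({ text: result.text });
}
```

In a real harness the caller would presumably pass the process environment, hand in `reportAgentMetrics` from `archal/harness` as the reporter, and write the returned string to stdout; the exact signature of `reportAgentMetrics` is not shown in this diff, so treat that wiring as an assumption.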
89
89
 
90
+ ### Or run a packaged agent (no harness to write)
91
+
92
+ If the user just wants to evaluate a real, ready-made agent, point them at a packaged agent instead of writing a harness. A packaged agent runs unmodified in Docker while Archal's TLS-intercept sidecar routes its calls to seeded clones and injects the host's model API key on its model calls. The bundled agents live under `examples/agents/<name>` (`openclaw`, `hermes`, `github-octokit`).
93
+
94
+ - `archal run <scenario>.md --sandbox` - run the bundled OpenClaw agent (needs Docker). Pick the model with `--agent-model <provider/model>`; export the matching key in the shell first (`OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `GEMINI_API_KEY`).
95
+ - `archal run <scenario>.md --harness examples/agents/hermes --dockerfile examples/agents/hermes/Dockerfile` - run any other bundled agent through the Docker harness flags (swap in `github-octokit` for the other one).
96
+ - `archal run <scenario>.md --harness ./<dir> --dockerfile ./<dir>/Dockerfile` - run your own packaged agent. A packaged agent is just a directory with a `Dockerfile`, a drive script (reads `AGENT_TASK`, prints the answer to stdout), and an `.archal.json`.
97
+
98
+ See the "Run a packaged agent" guide: https://docs.archal.ai/guides/packaged-agents
99
+
90
100
  ### Option A - Evaluate an agent with scenarios
91
101
 
92
102
  Write markdown scenario files that describe setup, prompt, and success criteria; `archal run` executes them against clones.
@@ -122,16 +132,73 @@ Do not paste a sample config here. The right shape depends on what's already in
122
132
 
123
133
  Run: `archal clone start <detected clones>` - gives live clone URLs the user's SDK clients can point at. `archal clone status` shows the active session; `archal clone stop` tears down.
124
134
 
125
- ### Option E - Attach real trace sources
135
+ ### Option E - Bounded pre-prod autonomous loop
136
+
137
+ Use this when the repo already has scenarios or can safely generate starter
138
+ pre-prod scenarios, and the user wants a coding agent to run checks, classify
139
+ failures, optionally patch, validate, and open a draft PR.
140
+
141
+ Start with:
142
+
143
+ ```bash
144
+ archal preprod plan --repo . --write-scenarios --write-config --out .archal/preprod-plan.json
145
+ archal preprod start --scenario-count 20 --dry-run --artifacts .archal/preprod
146
+ ```
147
+
148
+ `--write-scenarios` writes generated scenario markdown under `archal/` by
149
+ default, and `--write-config` writes `.archal.json` only when it can do so
150
+ without overwriting an existing config. `preprod start` creates or reuses
151
+ `.archal/preprod-pack.json`, writes generated scenarios under
152
+ `archal/generated/` by default, runs the pack, and leaves resumable artifacts.
153
+ If the repo already has `.archal.json`, read `.archal/preprod-plan.json` and
154
+ confirm the detected clone/harness surface before starting the loop.
155
+
156
+ Only enable local fix or PR commands after the dry-run artifacts have been
157
+ reviewed:
158
+
159
+ ```bash
160
+ archal preprod start \
161
+ --scenario-count 20 \
162
+ --allow-external-execution \
163
+ --remediation-agent codex \
164
+ --validation-command '<test command>' \
165
+ --open-pr \
166
+ --pr-command '<draft-pr command>' \
167
+ --artifacts .archal/preprod
168
+ ```
169
+
170
+ `--open-pr` requires both `--validation-command` and `--pr-command`; PR
171
+ publishing still stays disabled unless `--allow-external-execution` is present.
172
+ `preprod start` uses the managed preprod remediation path by default. It writes
173
+ a repo-local remediation context, invokes the selected coding agent, reruns the
174
+ scenario pack, and validates before PR creation. The remediation command
175
+ receives `ARCHAL_PREPROD_FAILURES_JSON`, `ARCHAL_PREPROD_ATTEMPT`,
176
+ `ARCHAL_PREPROD_REMEDIATION_CONTEXT_PATH`, and `ARCHAL_PREPROD_USAGE_PATH`.
177
+ If the coding agent can report its own model usage, write JSON to
178
+ `ARCHAL_PREPROD_USAGE_PATH` with fields such as `inputTokens`, `outputTokens`,
179
+ `provider`, `model`, `isByok`, and `costUsd`.
180
+
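The flag dependencies around `--open-pr` can be written down as a small check. The sketch below only restates the two rules from the paragraph above; `checkOpenPr` and the `PreprodFlags` shape are hypothetical and do not reflect how the CLI parses its arguments.

```typescript
// Hypothetical sketch of the --open-pr rules; not the CLI's own validation.
interface PreprodFlags {
  openPr: boolean;
  allowExternalExecution: boolean;
  validationCommand: string | null;
  prCommand: string | null;
}

interface OpenPrCheck {
  errors: string[];       // missing companion flags
  willPublishPr: boolean; // true only when every gate is satisfied
}

function checkOpenPr(flags: PreprodFlags): OpenPrCheck {
  const errors: string[] = [];
  if (flags.openPr) {
    if (flags.validationCommand === null) errors.push("--open-pr requires --validation-command");
    if (flags.prCommand === null) errors.push("--open-pr requires --pr-command");
  }
  // Publishing stays disabled unless --allow-external-execution is also present.
  const willPublishPr = flags.openPr && errors.length === 0 && flags.allowExternalExecution;
  return { errors, willPublishPr };
}
```

With `--open-pr`, `--validation-command`, and `--pr-command` set but `--allow-external-execution` absent, the check reports no errors and `willPublishPr` is `false`, which matches the review-the-dry-run-first guidance.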
181
+ Tell the user to inspect `.archal/preprod/preprod-result.json` and
182
+ `.archal/preprod/preprod-failures.json` for status, stop reason, attempts,
183
+ scenario run ids, validation, and PR summary. If the run was stopped with
184
+ `--stop-after` or interrupted, resume with `archal preprod start --resume
185
+ .archal/preprod --artifacts .archal/preprod`.
186
+
187
+ ### Option F - Autoloop real trace sources
126
188
 
127
189
  Use this when the repo already has agent traces from pre-production or
128
190
  production and the user wants Archal to import, grade, reproduce, and turn
129
191
  reproduced failures into GitHub issues or PRs.
130
192
 
131
- **Delegate to the `attach` skill.** It owns the trace-source mapping,
132
- `archal/harness.json`, `archal/scenario.md`, seed templates, `archal attach`
193
+ **Delegate to the `autoloop` skill.** It owns the trace-source mapping,
194
+ `archal/harness.json`, `archal/scenario.md`, seed templates, `archal autoloop`
133
195
  commands, dashboard expectations, and failure taxonomy. Do not inline the
134
- Attach flow here; it changes faster than starter scenario setup.
196
+ Autoloop flow here; it changes faster than starter scenario setup.
197
+
198
+ Set the expectation carefully: Autoloop is not arbitrary production trace replay.
199
+ It can reproduce only failures with enough trace evidence plus repo-owned
200
+ scenario/seed context to reconstruct realistic clone state. Missing evidence
201
+ should block with a clear artifact instead of being guessed.
135
202
 
136
203
  ## Verify
137
204
 
@@ -155,4 +222,5 @@ Run the first scenario or task. For Options A and B, hand off to the `eval` skil
155
222
 
156
223
  - Quickstart: https://docs.archal.ai/quickstart
157
224
  - Full docs: https://docs.archal.ai
158
- - Attach production traces: https://docs.archal.ai/guides/attach-production-traces
225
+ - Autonomous loops: https://docs.archal.ai/guides/autoloop-production-traces
226
+ - Autoloop production traces: https://docs.archal.ai/guides/autoloop-production-traces
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: scenario
3
- description: Write, edit, and validate Archal scenario files. Knows the markdown format, success criteria syntax, and config options.
3
+ description: Write, edit, and validate Archal scenario markdown: the format, success criteria syntax, and config. USE THIS whenever the user wants to "write a scenario", "add a test for my agent", "fix/edit my scenario", asks "what's the success criteria syntax" or about `[D]`/`[P]` criteria, needs a multi-clone scenario, or is validating scenario files. Reach for it on any mention of authoring or fixing Archal scenarios.
4
4
  user-invocable: true
5
5
  argument-hint: "[scenario description or file path]"
6
6
  ---
@@ -15,7 +15,7 @@ You write and edit Archal scenario files. Scenarios are markdown files that defi
15
15
  # Scenario Title
16
16
 
17
17
  ## Setup
18
- Starting state described in plain English. Drives seed generation.
18
+ Starting state in plain English. Context Archal reconstructs and the agent + evaluator read. Does NOT generate seed state.
19
19
 
20
20
  ## Prompt
21
21
  The task instruction given to the agent.
@@ -95,14 +95,25 @@ and `archal seed list` over maintaining a separate list in this skill.
95
95
  | Clone | Seeds |
96
96
  |------|-------|
97
97
  | `apify` | `empty` |
98
+ | `calcom` | `empty`, `demo` |
99
+ | `clickup` | `empty`, `demo` |
100
+ | `customerio` | `empty` |
101
+ | `datadog` | `empty` |
98
102
  | `github` | `empty`, `small-project`, `enterprise-repo`, `ci-cd-pipeline`, `stale-issues`, `large-backlog` |
103
+ | `gitlab` | `empty`, `demo` |
104
+ | `hubspot` | `empty`, `demo`, `stale-data` |
99
105
  | `slack` | `empty`, `engineering-team`, `busy-workspace`, `incident-active` |
100
106
  | `stripe` | `empty`, `small-business`, `checkout-flow`, `subscription-lifecycle`, `subscription-heavy` |
101
107
  | `jira` | `empty`, `small-project`, `enterprise`, `sprint-active`, `large-backlog` |
102
108
  | `linear` | `empty`, `small-team`, `engineering-org`, `multi-team`, `busy-backlog` |
103
109
  | `supabase` | `empty`, `small-project`, `saas-starter`, `ecommerce` |
104
110
  | `google-workspace` | `empty`, `assistant-baseline`, `gmail-busy-inbox`, `calendar-packed-week` |
111
+ | `ownerrez` | `empty` |
112
+ | `pricelabs` | `empty` |
113
+ | `sentry` | `empty`, `demo` |
105
114
  | `tavily` | `empty` |
115
+ | `unipile` | `empty` |
116
+ | `webflow` | `empty` |
106
117
  | `ramp` | `empty`, `default` |
107
118
  | `discord` | `empty`, `small-server`, `harvested` |
108
119
 
@@ -131,7 +142,11 @@ Use multiple clones by listing them in config:
131
142
  clones: github, slack
132
143
  ```
133
144
 
134
- The Setup section can describe state across both services. Each clone gets its own seed.
145
+ The Setup section can describe context across both services. Attach explicit seed state per clone via `seed:` or `## Seed State` (see Seed state below).
146
+
147
+ ## Seed state
148
+
149
+ Seeding is deterministic — explicit committed state, no LLM. Scenarios attach it via the `seed:` config key or a `## Seed State` section. To author or load explicit JSON/SQL/catalog state into a clone, delegate to the sibling `seed` skill (`packages/archal/skills/seed`) rather than handling the seeding mechanics here.
135
150
 
136
151
  ## Validation
137
152
 
@@ -147,7 +162,7 @@ Run `archal scenario list` to verify scenarios parse correctly. A valid scenario
147
162
  1. Writing `[D]` criteria that require subjective judgment
148
163
  2. Writing `[P]` criteria that could be checked deterministically
149
164
  3. Forgetting to specify which clone the scenario uses
150
- 4. Writing Setup descriptions that are too vague for seed generation
165
+ 4. Writing Setup descriptions too vague to ground the agent and evaluator
151
166
  5. Using seed names that don't exist (check the seed table above)
152
167
 
153
168
  ## Documentation
@@ -0,0 +1,237 @@
1
+ ---
2
+ name: seed
3
+ description: Craft and load explicit clone seed state for Archal, deterministically and with no LLM. This is the canonical "how to seed a clone" skill. Use it whenever you need to give a clone (github, stripe, supabase, slack, ...) a known starting state: writing or editing a JSON or SQL seed file, choosing a named catalog seed, wiring a scenario's seed, debugging "seed not found" / "seed_unavailable" / shape-mismatch / rollback errors, or deciding between an inline `## Seed State` block and a committed seed file. Reach for this any time the words seed, seed state, starting state, fixture data, or "set up the clone with..." appear.
4
+ user-invocable: true
5
+ argument-hint: "[clone + the state you want loaded]"
6
+ ---
7
+
8
+ # Archal Seed State
9
+
10
+ You craft and load explicit starting state for Archal clones. A seed is the
11
+ clone's state before a run begins: the issues GitHub holds, the customers and
12
+ subscriptions Stripe holds, the rows a Supabase database holds.
13
+
14
+ The one rule that defines this skill: **seeds are explicit, committed, and
15
+ deterministic. No LLM is involved.** You write the state, or you pick a named
16
+ catalog seed someone already wrote. You never ask a model to invent it.
17
+
18
+ The dedicated package that loads seed state, `@archal/seed-state`, says this in
19
+ its own README and it is the mental model to hold:
20
+
21
+ > This package intentionally contains no LLM calls, code generation, natural
22
+ > language extraction, cache, repair, or scenario-to-state generation.
23
+
24
+ ## The four shapes a seed can take
25
+
26
+ 1. **A committed JSON seed file** — `clones/<clone>/seeds/<name>.json`. A
27
+ top-level object whose keys are the clone's collections and whose values are
28
+ arrays of entities. This is the common shape for object-graph clones
29
+ (github, stripe, slack, linear, jira). Example file:
30
+ `clones/github/seeds/small-project.json`.
31
+
32
+ 2. **A committed SQL seed file** — `clones/<clone>/seeds/<name>.sql`. A set of
33
+ `CREATE TABLE` + `INSERT` statements. The natural shape for relational
34
+ clones like `supabase`, whose seeds on disk are `.sql`, not `.json`. Example:
35
+ `clones/supabase/seeds/ecommerce.sql`.
36
+
37
+ 3. **A named catalog seed** — a seed that already lives on disk for a clone, so
38
+ you reference it by name instead of writing one. `github: small-project`,
39
+ `stripe: checkout-flow`, `supabase: saas-starter`. Browse them with
40
+ `archal seed list <clone>`.
41
+
42
+ 4. **An inline `## Seed State` block in a scenario** — explicit state written
43
+ directly in the scenario markdown. Use it when the state is small and tightly
44
+ coupled to that one scenario. For anything reused across scenarios, prefer a
45
+ committed file (shapes 1–3) so it has one home.
46
+
47
+ Resolution on disk checks `<name>.json` first, then `<name>.sql`
48
+ (`loadSeedStateFromPath` in `packages/seed-state/src/state.ts`). Ship only one
49
+ form per seed name: if both exist, the `.json` file wins and the `.sql` is ignored.
50
+
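The resolution order above can be sketched as a small lookup. This is a minimal illustration, not the code in `loadSeedStateFromPath`: `resolveSeedFile`, the `SeedFile` type, and the injected `exists` callback are hypothetical names chosen so the sketch does not depend on any filesystem API.

```typescript
// Hypothetical helper: mirrors the ".json first, then .sql" order described above.
type SeedFile = { path: string; format: "json" | "sql" };

function resolveSeedFile(
  seedsDir: string,
  name: string,
  exists: (path: string) => boolean,
): SeedFile | undefined {
  const candidates: SeedFile[] = [
    { path: `${seedsDir}/${name}.json`, format: "json" },
    { path: `${seedsDir}/${name}.sql`, format: "sql" },
  ];
  // The first candidate that exists wins; a later one is never consulted.
  return candidates.find((candidate) => exists(candidate.path));
}

// Usage with an in-memory stand-in for the seeds directory.
const onDisk = new Set(["clones/supabase/seeds/ecommerce.sql"]);
const resolvedSeed = resolveSeedFile("clones/supabase/seeds", "ecommerce", (p) => onDisk.has(p));
// resolvedSeed is { path: "clones/supabase/seeds/ecommerce.sql", format: "sql" }
```

If both files existed for the same name, the `.json` candidate would be returned and the `.sql` one would never be read, which is why a seed name should have a single form.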
51
+ ## How a scenario or the CLI selects a seed
52
+
53
+ A seed value comes from one of two places, parsed identically:
54
+
55
+ - The scenario `## Config` key `seed:`
56
+ - The CLI flag `archal run --seed <name-or-path>`
57
+
58
+ Both accept the same forms (verbatim from the `--seed` flag help):
59
+
60
+ > Seed name (e.g. small-project), clone-prefixed name (github:small-project,
61
+ > applies only to that clone), seed family (enterprise, stale, ...), or local
62
+ > file path (.json / .md).
63
+
64
+ So the two everyday forms are:
65
+
66
+ - **Bare name** — `seed: small-project`. Applies to every clone the scenario
67
+ declares.
68
+ - **Clone-prefixed** — `seed: github:small-project`. Applies only to the named
69
+ clone. This matches `archal clone start --seed github:small-project`, so the
70
+ same string works in both commands.
71
+
72
+ The prefix is validated: if you write `seed: payments:checkout-flow` but the
73
+ scenario never declares a `payments` clone, you get a usage error naming the
74
+ clones the scenario actually has, rather than a confusing `seed_unavailable`
75
+ later from the runtime (`parseExplicitSeed` in
76
+ `cli/src/runner/seed-resolution.ts`).
77
+
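The parsing and prefix validation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real `parseExplicitSeed`: the `SeedSelection` type, the `parseSeedValue` name, the path-detection heuristic, and the exact error wording are all hypothetical. A seed family such as `enterprise` would look like a bare name to this sketch.

```typescript
type SeedSelection =
  | { kind: "path"; path: string }
  | { kind: "bare"; name: string }
  | { kind: "prefixed"; clone: string; name: string };

function parseSeedValue(value: string, declaredClones: string[]): SeedSelection {
  const trimmed = value.trim();
  // Assumption: anything containing "/" or ending in .json/.md is a local file path.
  if (trimmed.includes("/") || /\.(json|md)$/.test(trimmed)) {
    return { kind: "path", path: trimmed };
  }
  const colon = trimmed.indexOf(":");
  if (colon === -1) {
    // Bare name: applies to every clone the scenario declares.
    return { kind: "bare", name: trimmed };
  }
  const clone = trimmed.slice(0, colon);
  const name = trimmed.slice(colon + 1);
  if (!declaredClones.includes(clone)) {
    // Fail early, naming the clones the scenario actually declares.
    throw new Error(
      `clone "${clone}" is not part of this scenario (declared: ${declaredClones.join(", ")})`,
    );
  }
  return { kind: "prefixed", clone, name };
}

// Usage: a prefixed seed for a declared clone parses cleanly.
const selection = parseSeedValue("github:small-project", ["github", "stripe"]);
// selection is { kind: "prefixed", clone: "github", name: "small-project" }
```

Validating the prefix at parse time is what turns a late `seed_unavailable` into an immediate usage error.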
78
+ One behavior to remember: a scenario with a `## Setup` section but **no**
79
+ explicit `seed:` is forced to the `empty` seed, because Setup prose used to
80
+ drive dynamic generation and a pre-populated seed would conflict with it. To
81
+ combine a Setup description with a real seed, you must add an explicit `seed:`
82
+ field. (See `resolveRunSeedPlan` in `cli/src/runner/seed-resolution.ts`.)
83
+
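The Setup rule above amounts to a short precedence check. The sketch below is hypothetical and is not `resolveRunSeedPlan`: the `ScenarioSeedInput` shape and the `pickSeed` name are invented for illustration, and falling back to the clone's default seed when there is neither a Setup section nor an explicit seed is an assumption the text does not state.

```typescript
interface ScenarioSeedInput {
  hasSetupSection: boolean;
  explicitSeed?: string; // from `seed:` in ## Config or from --seed
  defaultSeed: string;   // assumption: used when nothing else applies
}

function pickSeed(input: ScenarioSeedInput): string {
  // An explicit seed always wins, even when a Setup section is present.
  if (input.explicitSeed !== undefined) return input.explicitSeed;
  // Setup without an explicit seed is forced to the empty seed.
  if (input.hasSetupSection) return "empty";
  return input.defaultSeed;
}

const setupOnly = pickSeed({ hasSetupSection: true, defaultSeed: "small-project" });
// setupOnly is "empty"
const setupWithSeed = pickSeed({
  hasSetupSection: true,
  explicitSeed: "github:small-project",
  defaultSeed: "small-project",
});
// setupWithSeed is "github:small-project"
```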
84
+ ### Command surface
85
+
86
+ | Command | What it does |
87
+ |---------|--------------|
88
+ | `archal seed list` | One row per clone: clone, default seed, seed count |
89
+ | `archal seed list <clone>` | Every seed for that clone, with the default marked |
90
+ | `archal seed list <clone> --json` | Same, machine-readable |
91
+ | `archal run <scenario>.md --seed <name-or-path>` | Override the seed for a run |
92
+ | `archal run ... --fresh-seed` | On a reused clone session, reset and re-apply the seed |
93
+ | `archal run ... --keep-state` | On a reused session, keep existing state, do not re-apply |
94
+ | `archal clone start <clone> --seed <seeds...>` | Start a live clone pre-seeded |
95
+ | `archal clone start <clone> --seed-file <path>` | Start a live clone, then load a JSON/markdown seed file |
96
+ | `archal clone seed --file <path>` | Sideload a JSON seed file into a running clone |
97
+
98
+ Prefer `archal seed list` over memorizing a seed table. The catalog is derived
99
+ from `clones/<clone>/seeds/*.{json,sql}` on disk, so the CLI is always current.
100
+
101
+ ## The deterministic load flow
102
+
103
+ When a run seeds a clone, the loader does this, in order
104
+ (`packages/runtime/src/seed-loader.ts`):
105
+
106
+ 1. **Snapshot first.** `GET /state` on each target clone and keep the response
107
+ text. This is the rollback point (`snapshotSeedTargets` →
108
+ `fetchSeedStateSnapshot`).
109
+ 2. **Apply the seed.** `PUT /state` with the seed body — `application/json` for
110
+ a JSON seed, `text/sql` for a SQL seed — one clone at a time, tracking each
111
+ one that committed.
112
+ 3. **Roll back on failure.** If any clone's load throws, restore the clones that
113
+ already committed by replaying their snapshots, then re-throw the original
114
+ error (`restoreSeedTargets`). A clone is never left half-seeded.
115
+ 4. **Capture the baseline after seeding.** Once seeding succeeds, the post-seed
116
+ state is captured as the baseline (`captureBaselineStates` in
117
+ `packages/runtime/src/clone-client.ts`). Resets between runs restore *this*
118
+ seeded baseline, not an empty clone — so every run in a multi-run scenario
119
+ starts from the same seeded state.
120
+
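The four steps above can be sketched with an in-memory stand-in for the clones. This is a simplified illustration, not the code in `seed-loader.ts`: the `SeedTarget` interface, `loadSeeds`, and `memoryTarget` are hypothetical, and the calls are synchronous here where the real loader talks to `/state` over HTTP.

```typescript
// Hypothetical synchronous stand-in for a clone's /state endpoint.
interface SeedTarget {
  name: string;
  getState(): string;           // plays the role of GET /state
  putState(body: string): void; // plays the role of PUT /state; may throw
}

function loadSeeds(targets: SeedTarget[], seeds: Map<string, string>): Map<string, string> {
  // 1. Snapshot every target before touching any of them.
  const snapshots = new Map<string, string>();
  for (const target of targets) snapshots.set(target.name, target.getState());

  // 2. Apply one clone at a time, remembering which ones committed.
  const committed: SeedTarget[] = [];
  try {
    for (const target of targets) {
      const seed = seeds.get(target.name);
      if (seed === undefined) continue;
      target.putState(seed);
      committed.push(target);
    }
  } catch (error) {
    // 3. Restore committed clones from their snapshots, then re-throw.
    for (const target of committed) target.putState(snapshots.get(target.name) ?? "{}");
    throw error;
  }

  // 4. The post-seed state becomes the baseline for later resets.
  const baseline = new Map<string, string>();
  for (const target of targets) baseline.set(target.name, target.getState());
  return baseline;
}

// In-memory clone that rejects one specific body, to exercise the rollback path.
function memoryTarget(name: string, initial: string, rejectBody?: string): SeedTarget {
  let state = initial;
  return {
    name,
    getState: () => state,
    putState: (body) => {
      if (body === rejectBody) throw new Error(`${name} rejected the seed`);
      state = body;
    },
  };
}

const githubClone = memoryTarget("github", '{"issues":[]}');
const stripeClone = memoryTarget("stripe", '{"customers":[]}', "bad-seed");
try {
  loadSeeds(
    [githubClone, stripeClone],
    new Map([["github", '{"issues":[{"id":"i1"}]}'], ["stripe", "bad-seed"]]),
  );
} catch {
  // githubClone was restored to its snapshot, so its state is '{"issues":[]}' again.
}
```

Taking all snapshots before the first `PUT` is what makes the rollback possible: by the time a later clone fails, the earlier clones' pre-seed state is already in hand.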
121
+ ### How state reaches the clone (the `/state` endpoint)
122
+
123
+ The clone server exposes `/state` (`clones/core/src/rest/rest-built-in-endpoints.ts`):
124
+
125
+ - `GET /state` — read the current state (used for the snapshot and baseline).
126
+ - `PUT /state` — load state. The handler branches on `Content-Type`:
127
+ - `text/sql` or `application/sql` → parsed and loaded as SQL (only when the
128
+ clone exposes SQL loading; otherwise it returns a 400 telling you to send
129
+ JSON).
130
+ - otherwise → loaded as JSON state.
131
+ - `PUT /state?seed=<name>` with a `{}` body → load a named catalog seed
132
+ server-side, without sending a state body at all.
133
+ - `DELETE /state` — wipe state (used by `--fresh-seed` before re-applying).
134
+
135
+ A successful `PUT` returns `{ ok: true }`. The content-type the CLI sends is set
136
+ in one place — `text/sql` when there is SQL, `application/json` otherwise
137
+ (`pushStateToCloud` in `cli/src/runner/execution/agent-http.ts`); named-seed
137
+ loads always send `application/json` with an empty JSON object (`{}`) as the body.
139
+
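The `PUT /state` branching described above can be sketched as a pure decision function. This is a hedged illustration, not the handler in `rest-built-in-endpoints.ts`: the `PutStateAction` type and `classifyPutState` are hypothetical, and checking the `seed` query parameter before the content type is an assumption about ordering.

```typescript
type PutStateAction =
  | { action: "load-named-seed"; seed: string }
  | { action: "load-sql" }
  | { action: "load-json" }
  | { action: "reject"; status: 400; reason: string };

function classifyPutState(
  contentType: string,
  seedParam: string | undefined,
  cloneSupportsSql: boolean,
): PutStateAction {
  // PUT /state?seed=<name> loads a catalog seed server-side; the body is ignored.
  if (seedParam !== undefined && seedParam !== "") {
    return { action: "load-named-seed", seed: seedParam };
  }
  // Ignore parameters such as "; charset=utf-8" when comparing media types.
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (mediaType === "text/sql" || mediaType === "application/sql") {
    return cloneSupportsSql
      ? { action: "load-sql" }
      : { action: "reject", status: 400, reason: "clone does not expose SQL state loading; send JSON" };
  }
  // Everything else is treated as JSON state.
  return { action: "load-json" };
}

const sqlToJsonOnlyClone = classifyPutState("text/sql", undefined, false);
// sqlToJsonOnlyClone.action is "reject"
```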
140
+ ## Do not do these
141
+
142
+ - **Do not synthesize seed data from scenario prose.** A `## Setup` paragraph is
143
+ context for the evaluator, not a source you extract state from. If you need
144
+ populated state, write a committed seed (or pick a named one) and reference it
145
+ with `seed:`.
146
+ - **Do not call an LLM to generate, repair, or "fill in" a seed.** The old
147
+ dynamic seed-generation path is being removed; `@archal/seed-state` was built
148
+ specifically to have none of it. Explicit committed seeds only.
149
+ - **Do not invent collection names.** A JSON seed must use the clone's real
150
+ collection keys (see "Writing a good JSON seed"). Unknown keys are dropped by
151
+ normalization, so a typo silently seeds nothing.
152
+ - **Do not hand-tune both `.json` and `.sql` for the same name.** Pick one form
153
+ per seed name; resolution stops at the first that exists.
154
+ - **Do not blow away a sideloaded session by accident.** On a reused
155
+ `archal clone start` session the loader probes for existing state before
156
+ re-applying. Use `--keep-state` to keep it or `--fresh-seed` to reset on
157
+ purpose, rather than fighting the guard.
158
+
159
+ ## Writing a good JSON seed
160
+
161
+ The seed file must mirror the clone's real state shape. The fastest reliable way
162
+ to get the shape right is to open an existing catalog seed for that clone and
163
+ match its top-level keys and per-entity fields.
164
+
165
+ A JSON seed is a top-level object: each key is a collection, each value is an
166
+ array of entity objects.
167
+
168
+ ```jsonc
169
+ {
170
+ "users": [{ "id": "u1", "login": "octocat", "type": "User" }],
171
+ "repos": [{ "id": "r1", "name": "demo", "fullName": "octocat/demo", "private": false }],
172
+ "issues": [{ "id": "i1", "repoId": "r1", "number": 1, "title": "First issue", "state": "open" }]
173
+ }
174
+ ```
175
+
176
+ Match the clone, not your imagination:
177
+
178
+ - Use the same collection names the clone uses. `clones/github/seeds/small-project.json`
179
+ has `users`, `repos`, `issues`, `pullRequests`, `labels`, and more.
180
+ - Give each entity the fields the clone expects, especially `id` and the foreign
181
+ keys that wire entities together (`repoId`, `issueNumber`, ...). The clone
182
+ maintains those relationships but does not validate them when the seed loads:
183
+ a broken reference surfaces as a realistic error at runtime, not at seed time.
184
+ - Only arrays survive normalization — non-array top-level values are dropped
185
+ (`normalizeSeedState`). Keep everything in collection arrays.
186
+
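The two dropping rules above (unknown keys and non-array values) can be illustrated with a small filter. This is a sketch of the behavior as described, not the real `normalizeSeedState`: the `normalizeSeed` name and the explicit `knownCollections` parameter are hypothetical.

```typescript
type NormalizedSeed = Record<string, unknown[]>;

function normalizeSeed(raw: Record<string, unknown>, knownCollections: string[]): NormalizedSeed {
  const normalized: NormalizedSeed = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!knownCollections.includes(key)) continue; // unknown keys are dropped
    if (!Array.isArray(value)) continue;           // non-array values are dropped
    normalized[key] = value;
  }
  return normalized;
}

// A typo ("isues") and a non-array value ("meta") both vanish without an error.
const seeded = normalizeSeed(
  { users: [{ id: "u1", login: "octocat" }], isues: [{ id: "i1" }], meta: { version: 1 } },
  ["users", "repos", "issues"],
);
// Object.keys(seeded) is ["users"]
```

Because nothing throws, a misspelled collection looks like a successful load with missing data, so comparing the loaded entity counts against the seed file is a cheap sanity check.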
187
+ ## When SQL fits
188
+
189
+ Use a `.sql` seed for relational clones whose state is naturally tables and
190
+ rows — `supabase` is the canonical case, and its seeds on disk are `.sql`. The
191
+ SQL seed is plain `CREATE TABLE` + `INSERT INTO ... VALUES (...)`:
192
+
193
+ ```sql
194
+ CREATE TABLE customers (id serial, email text, name text);
195
+ INSERT INTO customers (email, name) VALUES
196
+ ('ada@example.com', 'Ada Lovelace'),
197
+ ('alan@example.com', 'Alan Turing');
198
+ ```
199
+
200
+ The SQL parser is deliberately small (`parseSqlSeed` in
201
+ `packages/seed-state/src/state.ts`): it understands `CREATE TABLE` (including
202
+ `IF NOT EXISTS`), `INSERT INTO ... VALUES`, schema-qualified and quoted
203
+ identifiers, line and block comments, and auto-assigns serial `id`s. It is not a
204
+ full SQL engine — anything other than `CREATE TABLE` / `INSERT` throws an
205
+ "Unsupported SQL seed statement" error. Keep seeds to those two statements.
206
+
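A pre-flight check in the spirit of that restriction might look like the sketch below. It is deliberately naive and is not `parseSqlSeed`: `checkSqlSeed` is a hypothetical helper that only classifies statements, and it would mishandle a `;` or `--` inside a string literal.

```typescript
function checkSqlSeed(sql: string): string[] {
  const withoutComments = sql
    .replace(/\/\*[\s\S]*?\*\//g, " ") // block comments
    .replace(/--[^\n]*/g, " ");         // line comments
  const statements = withoutComments
    .split(";")
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0);
  for (const statement of statements) {
    // Only CREATE TABLE and INSERT INTO are accepted, case-insensitively.
    if (!/^(CREATE\s+TABLE|INSERT\s+INTO)\b/i.test(statement)) {
      throw new Error(`Unsupported SQL seed statement: ${statement.slice(0, 40)}`);
    }
  }
  return statements;
}

const acceptedStatements = checkSqlSeed(`
  -- customers table
  CREATE TABLE IF NOT EXISTS customers (id serial, email text, name text);
  /* two rows would go here */
  INSERT INTO customers (email, name) VALUES ('ada@example.com', 'Ada Lovelace');
`);
// acceptedStatements.length is 2; a statement such as "DROP TABLE customers" would throw.
```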
207
+ ## Failure taxonomy
208
+
209
+ | Symptom | Likely cause | Fix |
210
+ |---------|--------------|-----|
211
+ | `Could not resolve static seed "<name>"` / `seed_unavailable` | Seed name is misspelled, not in this clone's catalog, or the hosted session was started with a different seed | `archal seed list <clone>` to confirm the name; re-provision the session with the right seed |
212
+ | `clone "<x>" is not part of this scenario` | Clone-prefixed seed names a clone the scenario doesn't declare | Add the clone to `clones:`, or drop the prefix |
213
+ | Seed loads but state looks empty / partial | Wrong collection keys, or non-array top-level values dropped by normalization | Match an existing catalog seed's keys; keep every collection an array |
214
+ | `Failed to parse seed file ...` | Malformed JSON | Validate the JSON; check trailing commas and quoting |
215
+ | `Unsupported SQL seed statement: ...` | SQL beyond `CREATE TABLE` / `INSERT` | Reduce the seed to those two statements |
216
+ | `…does not expose SQL state loading` (HTTP 400) | Sent SQL to a clone that only loads JSON | Send JSON state, or use a clone that supports SQL |
217
+ | `Rollback failed after partial seed load` | A clone errored mid-load and the snapshot restore also failed | The clones may be in a mixed state; restart the clone session and retry |
218
+ | Seed "not re-applied" warning on a reused session | The loader kept existing sideloaded state instead of overwriting it | `--fresh-seed` to reset and re-apply, or `--keep-state` to accept it on purpose |
219
+
220
+ ## What to report back
221
+
222
+ After authoring or loading a seed, tell the user:
223
+
224
+ - the clone(s) and the seed shape used (named catalog seed, JSON file, SQL file,
225
+ or inline `## Seed State`)
226
+ - the seed name or file path, and how a scenario/run selects it (`seed:` or
227
+ `--seed`)
228
+ - the entity counts loaded per clone (the loader logs e.g. `7 issues, 3 repos`)
229
+ - for a new seed file, that it mirrors an existing catalog seed's shape
230
+ - any blocker: missing seed name, shape mismatch, SQL parse error, or a rollback
231
+ that fired — with the exact next command
232
+
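For the entity-count line of the report, a summary in the `7 issues, 3 repos` style can be derived from the seed itself. The helper below is a hypothetical convenience, not the loader's logging code; skipping empty collections is a choice made for this sketch.

```typescript
function summarizeSeed(state: Record<string, unknown[]>): string {
  return Object.entries(state)
    .filter(([, entities]) => entities.length > 0) // skip empty collections
    .map(([collection, entities]) => `${entities.length} ${collection}`)
    .join(", ");
}

const summary = summarizeSeed({
  issues: new Array(7).fill({}),
  repos: new Array(3).fill({}),
  labels: [],
});
// summary is "7 issues, 3 repos"
```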
233
+ ## Docs
234
+
235
+ - Seeds guide: https://docs.archal.ai/guides/seeds
236
+ - Clones and seeds overview: https://docs.archal.ai/clones/overview
237
+ - Writing scenarios: https://docs.archal.ai/guides/writing-scenarios