archal 0.9.18 → 0.9.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92)
  1. package/README.md +9 -1
  2. package/agents/github-octokit/.archal.json +8 -0
  3. package/agents/github-octokit/Dockerfile +8 -0
  4. package/agents/github-octokit/README.md +113 -0
  5. package/agents/github-octokit/agent.mjs +54 -0
  6. package/agents/github-octokit/package.json +9 -0
  7. package/agents/github-octokit/scenarios/test-repo-access.md +27 -0
  8. package/agents/google-workspace-local-tools/Dockerfile +6 -0
  9. package/agents/google-workspace-local-tools/README.md +58 -0
  10. package/agents/google-workspace-local-tools/agent.mjs +196 -0
  11. package/agents/google-workspace-local-tools/archal-harness.json +7 -0
  12. package/agents/google-workspace-local-tools/run-input.yaml +16 -0
  13. package/agents/google-workspace-local-tools/scenario.md +29 -0
  14. package/agents/hermes/.archal.json +8 -0
  15. package/agents/hermes/Dockerfile +46 -0
  16. package/agents/hermes/README.md +87 -0
  17. package/agents/hermes/SOUL.md +27 -0
  18. package/agents/hermes/config.yaml +34 -0
  19. package/agents/hermes/drive.mjs +113 -0
  20. package/agents/hermes/scenarios/stripe-customers-read-only.md +32 -0
  21. package/agents/openclaw/.archal.json +8 -0
  22. package/agents/openclaw/Dockerfile +96 -0
  23. package/agents/openclaw/README.md +120 -0
  24. package/agents/openclaw/drive.mjs +311 -0
  25. package/agents/openclaw/package.json +9 -0
  26. package/agents/openclaw/scenarios/github-issue-triage-read-only.md +44 -0
  27. package/agents/openclaw/workspace/AGENTS.md +23 -0
  28. package/agents/openclaw/workspace/IDENTITY.md +8 -0
  29. package/agents/openclaw/workspace/SOUL.md +14 -0
  30. package/agents/openclaw/workspace/TOOLS.md +35 -0
  31. package/agents/pagination-test/README.md +24 -0
  32. package/agents/pagination-test/scenario.md +24 -0
  33. package/agents/replay-capsule-harness/README.md +29 -0
  34. package/agents/replay-capsule-harness/observability-install-offline-e2e.mts +1517 -0
  35. package/agents/replay-capsule-harness/replay-capsule-e2e.mjs +104 -0
  36. package/clone-assets/apify/tools.json +213 -13
  37. package/clone-assets/calcom/tools.json +510 -0
  38. package/clone-assets/clickup/tools.json +1258 -0
  39. package/clone-assets/customerio/tools.json +386 -0
  40. package/clone-assets/datadog/tools.json +734 -0
  41. package/clone-assets/github/tools.json +312 -25
  42. package/clone-assets/gitlab/tools.json +999 -0
  43. package/clone-assets/google-workspace/tools.json +18 -6
  44. package/clone-assets/hubspot/tools.json +1406 -0
  45. package/clone-assets/jira/fidelity.json +1 -1
  46. package/clone-assets/jira/tools.json +266 -543
  47. package/clone-assets/linear/tools.json +238 -40
  48. package/clone-assets/ownerrez/tools.json +548 -0
  49. package/clone-assets/pricelabs/tools.json +343 -0
  50. package/clone-assets/sentry/tools.json +745 -0
  51. package/clone-assets/slack/tools.json +1 -2
  52. package/clone-assets/stripe/tools.json +185 -46
  53. package/clone-assets/supabase/tools.json +511 -14
  54. package/clone-assets/unipile/tools.json +408 -0
  55. package/clone-assets/webflow/tools.json +415 -0
  56. package/dist/autoloop-worker-types-BEb_E44z.d.cts +196 -0
  57. package/dist/cli.cjs +151033 -75282
  58. package/dist/commands/autoloop-hosted-worker.cjs +43942 -0
  59. package/dist/commands/autoloop-hosted-worker.d.cts +143 -0
  60. package/dist/commands/autoloop-pr-verification.cjs +4227 -0
  61. package/dist/commands/autoloop-pr-verification.d.cts +17 -0
  62. package/dist/{vitest/chunk-IVXSSEYS.js → commands/autoloop-result-parser.cjs} +16515 -18857
  63. package/dist/commands/autoloop-result-parser.d.cts +39 -0
  64. package/dist/commands/autoloop-worker.cjs +36163 -0
  65. package/dist/commands/autoloop-worker.d.cts +97 -0
  66. package/dist/harness.cjs +1 -0
  67. package/dist/index.cjs +1 -1
  68. package/dist/replay.cjs +49624 -0
  69. package/dist/replay.d.cts +4625 -0
  70. package/dist/scenarios.cjs +80343 -0
  71. package/dist/scenarios.d.cts +562 -0
  72. package/dist/vitest/chunk-6CBYFCFK.js +4667 -0
  73. package/dist/vitest/chunk-ARVS45PP.js +2764 -0
  74. package/dist/vitest/index.cjs +6079 -75089
  75. package/dist/vitest/index.d.ts +7 -6
  76. package/dist/vitest/index.js +8 -8
  77. package/dist/vitest/runtime/hosted-session-reaper.cjs +801 -34187
  78. package/dist/vitest/runtime/hosted-session-reaper.js +1 -1
  79. package/dist/vitest/runtime/setup-files.js +2 -2
  80. package/package.json +14 -9
  81. package/skills/archal-agent/SKILL.md +87 -0
  82. package/skills/autoloop/SKILL.md +376 -0
  83. package/skills/autoloop/references/hosted-sources.md +62 -0
  84. package/skills/autoloop/references/trace-schema-mapping.md +73 -0
  85. package/skills/eval/SKILL.md +35 -1
  86. package/skills/install-agent/SKILL.md +221 -0
  87. package/skills/onboard/SKILL.md +80 -0
  88. package/skills/scenario/SKILL.md +19 -4
  89. package/skills/seed/SKILL.md +237 -0
  90. package/dist/seed/dynamic-generator.cjs +0 -45564
  91. package/dist/seed/dynamic-generator.d.cts +0 -106
  92. package/dist/vitest/chunk-CTSN67QR.js +0 -47188
@@ -90,6 +90,40 @@ Exit codes: `0` pass, `1` fail or score < threshold, `2` validation error. For G
 
  Workspace API keys are runtime and CI credentials bound to one workspace. They can run clones, upload and read traces, and read usage for that workspace. They cannot manage audit events or workspace API keys. Use an owner/admin user credential, either `archal login` or a dashboard-issued user API key, for workspace administration.
 
+ ## Pre-production autonomous loop
+
+ Use `archal preprod start` when the user wants a coding agent to run a bounded
+ pack of scenarios before shipping, remediate failures, rerun, validate, and
+ open a draft PR. This differs from post-production `archal autoloop`: it
+ starts from repo scenarios and clone runs, not imported production traces.
+
+ First, do a safe dry run:
+
+ ```bash
+ archal preprod start --scenario-count 20 --dry-run --artifacts .archal/preprod
+ ```
+
+ Then, only after the dry-run artifacts show genuine agent or scenario failures,
+ allow the managed remediation path:
+
+ ```bash
+ archal preprod start \
+   --scenario-count 20 \
+   --allow-external-execution \
+   --remediation-agent codex \
+   --validation-command 'pnpm test' \
+   --open-pr \
+   --pr-command 'gh pr create --draft --fill' \
+   --artifacts .archal/preprod
+ ```
+
+ Read `.archal/preprod/preprod-result.json`,
+ `.archal/preprod/preprod-failures.json`, and the remediation context before
+ summarizing results. Treat runs without validation evidence as local
+ remediation passes, not release proof. If a run stops after `initial-runs`,
+ `fix`, or `validation`, resume with `archal preprod start --resume
+ .archal/preprod --artifacts .archal/preprod`.
+
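A small sketch of the resume step, assuming the directory passed to `--artifacts` is left in place when a run stops. The helper name `preprod_resume_cmd` is hypothetical; it only prints the documented `--resume` command and refuses when there is no artifacts directory to resume from.

```bash
# Hypothetical helper: print the documented resume command for a preprod
# artifacts directory, or fail if that directory does not exist yet.
preprod_resume_cmd() {
  dir="${1:-.archal/preprod}"
  if [ ! -d "$dir" ]; then
    echo "no preprod artifacts in $dir; run the dry run first" >&2
    return 1
  fi
  printf 'archal preprod start --resume %s --artifacts %s\n' "$dir" "$dir"
}
```

Printing the command rather than running it keeps the resume step reviewable before any remediation or PR flags take effect again.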
  ## Artifacts + dashboard
 
  - **Local (always written):** `.archal/cache/last-run.json` (summary), `.archal/cache/runs/*.json` (full redacted trace).
@@ -108,6 +142,6 @@ Don't tell users they need `-o json` to save artifacts locally - that's only for
  ## Docs
 
  - Running with an agent: https://docs.archal.ai/guides/run-with-agent
- - Existing repo playbook: https://docs.archal.ai/guides/existing-agent-repo
+ - Existing repo playbook: https://docs.archal.ai/guides/run-with-agent
  - Scenario authoring: hand off to the `scenario` skill
  - Clone sessions: https://docs.archal.ai/guides/clone-sessions
@@ -0,0 +1,221 @@
+ ---
+ name: install-agent
+ description: Connect an agent's repo and its production observability to Archal so its traces get captured and graded. Detects an existing observability stack (LangSmith, Langfuse, Datadog, OpenTelemetry, Braintrust), connects the GitHub App, opens an observability setup PR, and wires an existing trace vendor through `archal trace-source`. USE THIS whenever the user says "connect my agent", "install the Archal agent", "set up observability", "capture my agent's traces", "hook up my production traces", "where do my traces go", or asks how to get an already-running agent into Archal. Reach for it before telling anyone a capability is missing, and read the honest limits below first.
+ user-invocable: true
+ argument-hint: "[repo + where its traces live]"
+ ---
+
+ # Archal Install Agent
+
+ You are connecting a real, already-running agent to Archal so its production
+ behavior gets captured and graded. Two things have to land: (1) Archal can read
+ the **repo** (GitHub App), and (2) Archal can read the agent's **traces**,
+ either by adding instrumentation through a setup PR, or by ingesting an existing
+ observability vendor. Once traces flow, grading and the autoloop take over.
+
+ Be honest about what this is. The "install agent" is **not** a sandboxed coding
+ agent that edits arbitrary code in the repo. It is deterministic repo inspection
+ plus a deterministic, templated setup PR, plus an optional one-shot managed
+ planner (an LLM call) that only relocates the bootstrap file and writes advisory
+ PR-body text. Set that expectation up front so nobody waits for an autonomous
+ coder that does not exist yet. The honest limit is spelled out below; do not
+ oversell it.
+
+ ## Why this exists
+
+ Archal grades agent behavior from traces. An agent that already runs in
+ production has traces somewhere: your own logs, or a vendor like Langfuse or
+ Braintrust. The install path's whole job is to get those traces into Archal's
+ normalized shape without you hand-writing exporters or copying secrets around.
+ Capturing the trace is the precondition for everything downstream: grading,
+ reproduction, and the autoloop that turns reproduced failures into PRs.
+
+ ## Discover first
+
+ Before changing anything, read the repo and find out where traces live:
+
+ 1. `package.json` / `pyproject.toml` / `requirements.txt`: language and
+    framework. Language matters: the planner and `@archal/state-capture` are
+    TypeScript-only today (see limits).
+ 2. Existing observability dependencies. Archal's detector recognizes exactly
+    five vendors by dependency name:
+    - `langsmith` -> LangSmith
+    - `langfuse`, `langfuse-node` (TS) / `langfuse` (py) -> Langfuse
+    - `dd-trace` (TS) / `ddtrace` (py) -> Datadog
+    - `@opentelemetry/sdk-node`, `@opentelemetry/sdk-trace-node` (TS) /
+      `opentelemetry-sdk` (py) -> OpenTelemetry
+    - `braintrust` -> Braintrust
+    A repo with any of these is a candidate for **augment**; a repo with none is
+    **greenfield**.
+ 3. GitHub remote: augment/greenfield setup PRs and the autoloop need a GitHub
+    remote that resolves to `github.com/<owner>/<repo>`:
+    ```bash
+    git remote get-url origin
+    ```
+ 4. Where do the traces actually go? Ask the user. The answer routes you:
+    - already in a hosted vendor (Langfuse, Braintrust) or a Postgres/Supabase
+      table -> ingest path (`archal trace-source`, delegate detail to `autoloop`)
+    - exported files on disk -> `archal trace-source import`
+    - nowhere yet / only app logs -> the observability setup PR
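The dependency check in item 2 can be approximated from the shell. This is a hypothetical, grep-based sketch of the five-vendor rule listed above, not Archal's detector: it reads a single `package.json` or `requirements.txt`, does not truly parse JSON, ignores `pyproject.toml`, and the function name `detect_vendors` is illustrative.

```bash
# Hypothetical approximation of the five-vendor dependency check.
# Reads one manifest; prints detected vendor names, one per line, sorted.
detect_vendors() {
  manifest="$1"
  case "$manifest" in
    *package.json)
      kind=ts
      # Collect quoted JSON keys ("name": ...); dependency names are among them.
      deps=$(grep -oE '"[^"]+"[[:space:]]*:' "$manifest" | sed -E 's/^"([^"]+)".*/\1/')
      ;;
    *requirements.txt)
      kind=py
      # Drop comments, keep the bare package name, normalize to lowercase.
      deps=$(sed -E -e 's/#.*//' -e 's/^[[:space:]]*([A-Za-z0-9._-]+).*/\1/' "$manifest" \
        | tr '[:upper:]' '[:lower:]')
      ;;
    *)
      echo "unsupported manifest: $manifest" >&2
      return 2
      ;;
  esac
  for dep in $deps; do
    case "$kind:$dep" in
      *:langsmith) echo LangSmith ;;
      ts:langfuse | ts:langfuse-node | py:langfuse) echo Langfuse ;;
      ts:dd-trace | py:ddtrace) echo Datadog ;;
      ts:@opentelemetry/sdk-node | ts:@opentelemetry/sdk-trace-node | py:opentelemetry-sdk) echo OpenTelemetry ;;
      *:braintrust) echo Braintrust ;;
    esac
  done | sort -u
}
```

Any output means the repo is an augment candidate; empty output means greenfield. Because the `package.json` branch matches any quoted key, a script or config key that happens to share a vendor's name would be a false positive.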
+
+ Never print secrets while inspecting. Show env var names or secret references,
+ never plaintext keys or database URLs.
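One way to follow that rule in a shell session is to test whether a variable is present without ever expanding it. A minimal sketch; `report_env` and the variable names in the example call are hypothetical, so substitute whatever the repo actually reads.

```bash
# Report whether each named environment variable is set, never its value.
report_env() {
  for name in "$@"; do
    # printenv exits 0 when the variable is in the environment; its output
    # (the value) is discarded so it never reaches the terminal or a log.
    if printenv "$name" >/dev/null 2>&1; then
      echo "$name: set"
    else
      echo "$name: missing"
    fi
  done
}

# Example: report_env LANGFUSE_SECRET_KEY DATABASE_URL
```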
+
+ ## Preconditions
+
+ - Archal CLI installed in the repo or reachable with `npx archal`
+ - authenticated user (`archal login`) or `ARCHAL_TOKEN=archal_ws_...` (a
+   workspace key for CI)
+ - the **Archal GitHub App** installed on the target repo (required for the setup
+   PR and for autoloop fix PRs)
+ - a GitHub remote resolving to `github.com/<owner>/<repo>`
+ - for the ingest path: a read-only credential for the trace vendor
+
+ If a precondition is missing, make the smallest safe change and name what is
+ still required. Do not fake a connection.
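The GitHub-remote precondition can be checked mechanically. This sketch assumes only the common HTTPS and SSH URL forms for `github.com` and reduces `git remote get-url origin` to `<owner>/<repo>`, failing otherwise; `github_slug` is a hypothetical name. It deliberately does not echo the URL on failure, since HTTPS remotes can embed tokens.

```bash
# Hypothetical check: reduce a github.com remote URL to "<owner>/<repo>".
# With no argument, reads the origin remote of the current repo.
github_slug() {
  url="${1:-$(git remote get-url origin)}"
  case "$url" in
    https://github.com/*) path="${url#https://github.com/}" ;;
    git@github.com:*) path="${url#git@github.com:}" ;;
    ssh://git@github.com/*) path="${url#ssh://git@github.com/}" ;;
    *)
      # Do not print the URL itself: HTTPS remotes can embed credentials.
      echo "origin does not resolve to github.com/<owner>/<repo>" >&2
      return 1
      ;;
  esac
  path="${path%/}"
  path="${path%.git}"
  case "$path" in
    */*/* | /* | "") ;;                        # extra segments or empty owner: reject
    ?*/?*) printf '%s\n' "$path"; return 0 ;;  # exactly owner/repo
  esac
  echo "unexpected remote path" >&2
  return 1
}
```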
+
+ ## Step 1: connect the GitHub App
+
+ The repo connection is the GitHub App, not a token paste. Confirm the **Archal
+ GitHub App** is installed on the target repository and that the org granted it
+ access. Without it, the setup PR cannot be opened and the autoloop cannot open
+ fix PRs. If it is not installed, send the user to the dashboard's integration
+ flow to install it, then continue.
+
+ ## Step 2: open the observability setup PR
+
+ When traces are not yet exported anywhere (or the user wants Archal's own
+ capture), open the **observability setup PR**. It is a deterministic, templated
+ patch: every file's contents are pre-generated, and nothing is freely authored by
+ an LLM. The patch resolves into one of two install modes:
+
+ ### Greenfield (no existing observability detected)
+
+ Adds standard OpenTelemetry instrumentation pointed at Archal's OTLP endpoint:
+
+ - `archal-otel.ts` (TS) or `archal_otel.py` (Python): an OpenTelemetry init
+   bootstrap (OTLP HTTP exporter + the node/python SDK), **not** an Archal-only
+   exporter
+ - `archal-replay-capsule.ts` / `archal_replay_capsule.py`: a replay helper
+   template
+ - OpenTelemetry SDK + framework instrumentation added to `package.json` (TS) or
+   `requirements.txt` (Python)
+ - an `.env.example` entry for the workspace key and an `ARCHAL_OBSERVABILITY.md`
+   guide
+
+ ### Augment (existing observability detected, TypeScript only)
+
+ When the repo already has one of the five vendors above **and** is TypeScript,
+ the PR instead adds Archal's state capture alongside the existing stack rather
+ than replacing it:
+
+ - `archal-state-capture.ts` importing from `@archal/state-capture`
+ - `@archal/state-capture` and `@opentelemetry/api` added to `package.json`
+ - the same `.env.example` entry and `ARCHAL_OBSERVABILITY.md` guide
+
+ Python repos with existing observability fall back to the greenfield
+ OpenTelemetry install: `@archal/state-capture` has no Python build yet, and the
+ PR body says so. Tell the user that explicitly rather than implying parity.
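The mode rules above reduce to a two-input decision. A minimal sketch; the `install_mode` name and its `ts`/`py` and `yes`/`no` inputs are hypothetical labels, not Archal options.

```bash
# Hypothetical summary of the setup-PR mode rules:
#   TypeScript + detected vendor -> augment (state capture alongside the vendor)
#   anything else, including Python + detected vendor -> greenfield OpenTelemetry
install_mode() {
  lang="$1"        # ts | py
  has_vendor="$2"  # yes | no
  if [ "$lang" = ts ] && [ "$has_vendor" = yes ]; then
    echo augment
  else
    echo greenfield
  fi
}
```

The Python-with-vendor case is the one worth calling out to the user, since it lands on greenfield even though observability already exists.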
+
+ ### The optional install planner (managed LLM)
+
+ On Pro/Enterprise workspaces, for a TypeScript greenfield/augment install with
+ repo detection available, one managed LLM call (intent `observability-install`,
+ public label **Archal install planner**, routed through the gemini-class managed
+ eval model lane and metered as `cogs_only` spend) adapts the deterministic
+ patch to the repo's real layout. It is strictly additive and **fail-open**: any
+ availability, auth, plan-gate, or validation problem ships the deterministic
+ patch unchanged. The planner can only:
+
+ - relocate the bootstrap file to a better directory (path only; file contents
+   are never edited)
+ - append advisory text to the PR body (where to wire startup, which functions to
+   wrap)
+
+ It never edits application code, never modifies existing instrumentation, and
+ never runs free-form codegen. Disable it with `ARCHAL_INSTALL_PLANNER_DISABLED=1`
+ to force the deterministic install.
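The planner's gating conditions, as described here and in the failure taxonomy below, can be sketched as one predicate. This is a hypothetical helper, not how Archal evaluates them internally; plan, language, mode, and detection are passed in as plain strings whose values are assumptions.

```bash
# Hypothetical predicate: succeeds (exit 0) only when the install planner
# would be expected to run; otherwise the deterministic patch ships alone.
planner_runs() {
  plan="$1"       # e.g. pro | enterprise | other
  lang="$2"       # ts | py
  mode="$3"       # greenfield | augment | augment-existing-vendor
  detection="$4"  # available | unavailable
  [ "${ARCHAL_INSTALL_PLANNER_DISABLED:-}" = 1 ] && return 1
  case "$plan" in pro | enterprise) ;; *) return 1 ;; esac
  [ "$lang" = ts ] || return 1
  case "$mode" in greenfield | augment) ;; *) return 1 ;; esac
  [ "$detection" = available ]
}
```

A non-zero result here is not an error: it only means the install is the narrower, deterministic one.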
+
+ ## Step 3: ingest an existing observability vendor
+
+ If the agent already emits traces to a vendor, you usually do **not** need the
+ setup PR; instead, normalize the existing traces with `archal trace-source`. This is
+ the genuine ingest path. It maps a vendor's payloads into Archal's trace upload
+ envelopes and uploads them to hosted autoloop when workspace auth is present.
+
+ Supported providers: `langfuse`, `braintrust`, `otel`, `http`, `supabase`,
+ `postgres`, `file`, `custom`. Pull/sync vendors (`langfuse`, `braintrust`,
+ Postgres/Supabase) are fetched on a cursor; push sources (`otel`, `http`,
+ `custom`) receive traces continuously through `serve`.
+
+ The command surface:
+
+ ```bash
+ archal trace-source connect <provider>    # register a source (e.g. langfuse, braintrust, otel, custom)
+ archal trace-source test [source]         # validate credentials and reachability
+ archal trace-source sync [source]         # pull-fetch new traces (langfuse/braintrust/db)
+ archal trace-source watch [source]        # continuous pull loop
+ archal trace-source serve [source]        # receiver for push sources (otel/http/custom)
+ archal trace-source import <path>         # normalize exported trace files on disk
+ archal trace-source status [source]       # registry validation, cursor, last-sync state
+ archal trace-source list                  # registered sources
+ archal trace-source use|disable <source>  # select / disable a source
+ ```
+
+ **Delegate the deep mapping to the `autoloop` skill.** It owns the per-vendor
+ flags (`--base-url`, `--api-key-env`, `--out`, `--upload`, `--repository`,
+ schema/cursor/filter mapping for database sources) and the full
+ import/grade/reproduce/fix loop. Quote the command names here; point the user
+ there for the flag detail and the autoloop wiring.
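The pull/push split above determines which subcommand keeps traces moving once a source is connected and tested. A hypothetical lookup that mirrors it, with `file` routed to `import` as in the discovery step; it is a reading aid, not part of the CLI.

```bash
# Hypothetical lookup: which `archal trace-source` subcommand moves traces
# for a given provider once it is connected and tested.
trace_source_verb() {
  case "$1" in
    langfuse | braintrust | supabase | postgres) echo sync ;;  # pulled on a cursor (`watch` loops it)
    otel | http | custom) echo serve ;;                         # pushed to a receiver
    file) echo import ;;                                        # exported files on disk
    *)
      echo "unknown provider: $1" >&2
      return 1
      ;;
  esac
}
```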
+
+ ## The honest limit
+
+ There is **no sandboxed coding agent** that reads the whole repo and edits
+ arbitrary code to wire up instrumentation. For a small or conventional repo, the
+ setup PR drops in and works. For a large or unusual repo, the setup PR is a
+ **generic bootstrap the user finishes**: they still import the bootstrap at
+ startup and wrap the call sites the planner's advisory section points at. Say
+ this plainly. The typed model lanes `remediation_agent` and
+ `observability_install_agent` exist in the contract but are
+ `agent-executor-contract-only`: the lane is declared, but no real executor
+ consumes it yet. Do not describe either as a working autonomous coder.
+
+ ## Failure taxonomy
+
+ Classify precisely; do not paper over a missing precondition:
+
+ - **GitHub App not connected**: setup PR and fix PRs cannot open. Install the
+   Archal GitHub App on the repo.
+ - **No GitHub remote**: augment/greenfield PRs need `github.com/<owner>/<repo>`.
+ - **Language unsupported by augment**: Python with existing observability falls
+   back to greenfield OTel; `@archal/state-capture` is TS-only.
+ - **Planner skipped**: wrong plan (needs Pro/Enterprise), non-TypeScript,
+   `augment-existing-vendor` legacy mode, detection unavailable, or
+   `ARCHAL_INSTALL_PLANNER_DISABLED=1`. The deterministic patch still ships; this
+   is not a failure, just a narrower install.
+ - **Setup PR is a stub for this repo**: large/unusual layout; the user finishes
+   wiring. Not a bug; this is the honest limit.
+ - **Trace ingest failure**: `trace-source` adapter mismatch, bad credential,
+   rejected upload, or missing workspace auth. Use `archal trace-source test` and
+   `status` to localize it.
+ - **No usable trace evidence**: once ingested, grading or reproduction can still
+   block if the trace lacks task context or state. Hand off to the `autoloop`
+   skill's evidence diagnosis.
+
+ ## What to report back
+
+ After install or debugging, give the user:
+
+ - repo full name and whether the GitHub App is connected
+ - chosen path: setup PR (greenfield vs augment) or `trace-source` ingest
+ - if a setup PR: the install mode, the files it adds, and whether the planner ran
+ - if ingest: provider, source id, and the next `archal trace-source` command
+ - whether traces are flowing into Archal yet, or the exact blocker
+ - next command or next owner
+
+ ## Docs
+
+ - Autoloop production traces: https://docs.archal.ai/guides/autoloop-production-traces
+ - Autonomous loops: https://docs.archal.ai/guides/autoloop-production-traces
+ - CLI reference: https://docs.archal.ai/cli/autoloop
+ - Quickstart: https://docs.archal.ai/quickstart
@@ -87,6 +87,16 @@ Confirm detected clones, then ask which of these the user wants. Each delegates
 
  If the user doesn't have a harness yet, prefer `npx archal init`; it creates `./.archal/harness.mjs`, points `.archal.json` at it, and adds a starter scenario without overwriting existing files. The generated harness is a guarded stub: Archal refuses to score it until the user edits it to call their Cursor, Codex, Claude Code, or custom agent. A custom harness should read `AGENT_TASK` from env, call the agent runtime, print `{ "text": "..." }` to stdout, and call `reportAgentMetrics()` from `archal/harness` with accumulated `{ inputTokens, outputTokens, llmCallCount }` before exit. Service clients need one explicit routing mode: use sandbox/Docker routing when the harness calls normal service URLs such as `https://api.github.com`, or configure SDK base URLs from `AGENT_CLONE_URLS` and add the JSON headers from `AGENT_ROUTE_HEADERS` to those clone requests. Alternative: skip `agent` in `.archal.json` and pass `--harness <path>` per-run.
 
+ ### Or run a packaged agent (no harness to write)
+
+ If the user just wants to evaluate a real, ready-made agent, point them at a packaged agent instead of writing a harness. A packaged agent runs unmodified in Docker while Archal's TLS-intercept sidecar routes its calls to seeded clones and injects the host's model API key into its model calls. The bundled agents live under `examples/agents/<name>` (`openclaw`, `hermes`, `github-octokit`).
+
+ - `archal run <scenario>.md --sandbox` - run the bundled OpenClaw agent (needs Docker). Pick the model with `--agent-model <provider/model>`; export the matching key in the shell first (`OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `GEMINI_API_KEY`).
+ - `archal run <scenario>.md --harness examples/agents/hermes --dockerfile examples/agents/hermes/Dockerfile` - run any other bundled agent through the Docker harness flags (swap `hermes` for `github-octokit` to run that agent instead).
+ - `archal run <scenario>.md --harness ./<dir> --dockerfile ./<dir>/Dockerfile` - run your own packaged agent. A packaged agent is just a directory with a `Dockerfile`, a drive script (reads `AGENT_TASK`, prints the answer to stdout), and an `.archal.json`.
+
+ See the "Run a packaged agent" guide: https://docs.archal.ai/guides/packaged-agents
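Before using the third bullet's command, it can help to confirm the directory has the pieces a packaged agent needs. A sketch assuming the layout described above: it checks for `Dockerfile` and `.archal.json` (the drive script's name varies, e.g. `drive.mjs` or `agent.mjs` in the bundled agents, so it is not checked), then prints the documented `archal run` invocation. `packaged_agent_cmd` is a hypothetical helper.

```bash
# Hypothetical helper: verify a packaged-agent directory, then print the
# documented `archal run ... --harness ... --dockerfile ...` command.
packaged_agent_cmd() {
  scenario="$1"
  dir="${2%/}"   # drop a trailing slash so paths join cleanly
  missing=0
  for f in Dockerfile .archal.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  printf 'archal run %s --harness %s --dockerfile %s/Dockerfile\n' "$scenario" "$dir" "$dir"
}
```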
+
  ### Option A - Evaluate an agent with scenarios
 
  Write markdown scenario files that describe setup, prompt, and success criteria; `archal run` executes them against clones.
@@ -122,6 +132,74 @@ Do not paste a sample config here. The right shape depends on what's already in
 
  Run: `archal clone start <detected clones>` - gives live clone URLs the user's SDK clients can point at. `archal clone status` shows the active session; `archal clone stop` tears down.
 
+ ### Option E - Bounded pre-prod autonomous loop
+
+ Use this when the repo already has scenarios or can safely generate starter
+ pre-prod scenarios, and the user wants a coding agent to run checks, classify
+ failures, optionally patch, validate, and open a draft PR.
+
+ Start with:
+
+ ```bash
+ archal preprod plan --repo . --write-scenarios --write-config --out .archal/preprod-plan.json
+ archal preprod start --scenario-count 20 --dry-run --artifacts .archal/preprod
+ ```
+
+ `--write-scenarios` writes generated scenario markdown under `archal/` by
+ default, and `--write-config` writes `.archal.json` only when it can do so
+ without overwriting an existing config. `preprod start` creates or reuses
+ `.archal/preprod-pack.json`, writes generated scenarios under
+ `archal/generated/` by default, runs the pack, and leaves resumable artifacts.
+ If the repo already has `.archal.json`, read `.archal/preprod-plan.json` and
+ confirm the detected clone/harness surface before starting the loop.
+
+ Only enable local fix or PR commands after the dry-run artifacts have been
+ reviewed:
+
+ ```bash
+ archal preprod start \
+   --scenario-count 20 \
+   --allow-external-execution \
+   --remediation-agent codex \
+   --validation-command '<test command>' \
+   --open-pr \
+   --pr-command '<draft-pr command>' \
+   --artifacts .archal/preprod
+ ```
+
+ `--open-pr` requires both `--validation-command` and `--pr-command`; PR
+ publishing stays disabled unless `--allow-external-execution` is present.
+ `preprod start` uses the managed preprod remediation path by default. It writes
+ a repo-local remediation context, invokes the selected coding agent, reruns the
+ scenario pack, and validates before PR creation. The remediation command
+ receives `ARCHAL_PREPROD_FAILURES_JSON`, `ARCHAL_PREPROD_ATTEMPT`,
+ `ARCHAL_PREPROD_REMEDIATION_CONTEXT_PATH`, and `ARCHAL_PREPROD_USAGE_PATH`.
+ If the coding agent can report its own model usage, write JSON to
+ `ARCHAL_PREPROD_USAGE_PATH` with fields such as `inputTokens`, `outputTokens`,
+ `provider`, `model`, `isByok`, and `costUsd`.
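A remediation command written in shell could report usage as sketched below. The field names are the ones listed above, but the value types (integers for token counts, a boolean for `isByok`, a plain number for `costUsd`) are assumptions, as are the example provider and model; `write_preprod_usage` is a hypothetical name. String values are not JSON-escaped, so keep them to simple identifiers.

```bash
# Hypothetical reporter: write model usage JSON for `archal preprod start`.
# Args: inputTokens outputTokens provider model isByok(true|false) costUsd
write_preprod_usage() {
  # Nothing to do when the run did not provide a usage path.
  [ -n "${ARCHAL_PREPROD_USAGE_PATH:-}" ] || return 0
  printf '{"inputTokens":%d,"outputTokens":%d,"provider":"%s","model":"%s","isByok":%s,"costUsd":%s}\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" >"$ARCHAL_PREPROD_USAGE_PATH"
}

# Example, at the end of the remediation command:
# write_preprod_usage 12000 3400 openai example-model false 0.42
```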
+
+ Tell the user to inspect `.archal/preprod/preprod-result.json` and
+ `.archal/preprod/preprod-failures.json` for status, stop reason, attempts,
+ scenario run ids, validation, and PR summary. If the run was stopped with
+ `--stop-after` or interrupted, resume with `archal preprod start --resume
+ .archal/preprod --artifacts .archal/preprod`.
+
+ ### Option F - Autoloop real trace sources
+
+ Use this when the repo already has agent traces from pre-production or
+ production and the user wants Archal to import, grade, reproduce, and turn
+ reproduced failures into GitHub issues or PRs.
+
+ **Delegate to the `autoloop` skill.** It owns the trace-source mapping,
+ `archal/harness.json`, `archal/scenario.md`, seed templates, `archal autoloop`
+ commands, dashboard expectations, and failure taxonomy. Do not inline the
+ Autoloop flow here; it changes faster than starter scenario setup.
+
+ Set the expectation carefully: Autoloop is not arbitrary production trace replay.
+ It can reproduce only failures with enough trace evidence plus repo-owned
+ scenario/seed context to reconstruct realistic clone state. When evidence is
+ missing, the run should block with a clear artifact instead of guessing.
+
  ## Verify
 
  Run the first scenario or task. For Options A and B, hand off to the `eval` skill to interpret the satisfaction score and diagnose failures - that skill owns the runtime mental model (`[D]` vs `[P]` criteria, trace inspection, harness execution diagnostics).
@@ -144,3 +222,5 @@ Run the first scenario or task. For Options A and B, hand off to the `eval` skil
 
  - Quickstart: https://docs.archal.ai/quickstart
  - Full docs: https://docs.archal.ai
+ - Autonomous loops: https://docs.archal.ai/guides/autoloop-production-traces
+ - Autoloop production traces: https://docs.archal.ai/guides/autoloop-production-traces
@@ -1,6 +1,6 @@
  ---
  name: scenario
- description: Write, edit, and validate Archal scenario files. Knows the markdown format, success criteria syntax, and config options.
+ description: Write, edit, and validate Archal scenario markdown: the format, success criteria syntax, and config. USE THIS whenever the user wants to "write a scenario", "add a test for my agent", "fix/edit my scenario", asks "what's the success criteria syntax" or about `[D]`/`[P]` criteria, needs a multi-clone scenario, or is validating scenario files. Reach for it on any mention of authoring or fixing Archal scenarios.
  user-invocable: true
  argument-hint: "[scenario description or file path]"
  ---
@@ -15,7 +15,7 @@ You write and edit Archal scenario files. Scenarios are markdown files that defi
  # Scenario Title
 
  ## Setup
- Starting state described in plain English. Drives seed generation.
+ Starting state in plain English: context that Archal reconstructs and that the agent and evaluator read. Does NOT generate seed state.
 
  ## Prompt
  The task instruction given to the agent.
@@ -95,14 +95,25 @@ and `archal seed list` over maintaining a separate list in this skill.
  | Clone | Seeds |
  |------|-------|
  | `apify` | `empty` |
+ | `calcom` | `empty`, `demo` |
+ | `clickup` | `empty`, `demo` |
+ | `customerio` | `empty` |
+ | `datadog` | `empty` |
  | `github` | `empty`, `small-project`, `enterprise-repo`, `ci-cd-pipeline`, `stale-issues`, `large-backlog` |
+ | `gitlab` | `empty`, `demo` |
+ | `hubspot` | `empty`, `demo`, `stale-data` |
  | `slack` | `empty`, `engineering-team`, `busy-workspace`, `incident-active` |
  | `stripe` | `empty`, `small-business`, `checkout-flow`, `subscription-lifecycle`, `subscription-heavy` |
  | `jira` | `empty`, `small-project`, `enterprise`, `sprint-active`, `large-backlog` |
  | `linear` | `empty`, `small-team`, `engineering-org`, `multi-team`, `busy-backlog` |
  | `supabase` | `empty`, `small-project`, `saas-starter`, `ecommerce` |
  | `google-workspace` | `empty`, `assistant-baseline`, `gmail-busy-inbox`, `calendar-packed-week` |
+ | `ownerrez` | `empty` |
+ | `pricelabs` | `empty` |
+ | `sentry` | `empty`, `demo` |
  | `tavily` | `empty` |
+ | `unipile` | `empty` |
+ | `webflow` | `empty` |
  | `ramp` | `empty`, `default` |
  | `discord` | `empty`, `small-server`, `harvested` |
 
@@ -131,7 +142,11 @@ Use multiple clones by listing them in config:
  clones: github, slack
  ```
 
- The Setup section can describe state across both services. Each clone gets its own seed.
+ The Setup section can describe context across both services. Attach explicit seed state per clone via `seed:` or `## Seed State` (see Seed state below).
+
+ ## Seed state
+
+ Seeding is deterministic: explicit committed state, no LLM. Scenarios attach it via the `seed:` config key or a `## Seed State` section. To author or load explicit JSON/SQL/catalog state into a clone, delegate to the sibling `seed` skill (`packages/archal/skills/seed`) rather than handling the seeding mechanics here.
 
  ## Validation
 
@@ -147,7 +162,7 @@ Run `archal scenario list` to verify scenarios parse correctly. A valid scenario
  1. Writing `[D]` criteria that require subjective judgment
  2. Writing `[P]` criteria that could be checked deterministically
  3. Forgetting to specify which clone the scenario uses
- 4. Writing Setup descriptions that are too vague for seed generation
+ 4. Writing Setup descriptions too vague to ground the agent and evaluator
  5. Using seed names that don't exist (check the seed table above)
 
  ## Documentation