npm - archal - Versions diffs - 0.9.19 → 0.9.20 - Mend

archal 0.9.19 → 0.9.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/README.md +9 -1
package/agents/github-octokit/.archal.json +8 -0
package/agents/github-octokit/Dockerfile +8 -0
package/agents/github-octokit/README.md +113 -0
package/agents/github-octokit/agent.mjs +54 -0
package/agents/github-octokit/package.json +9 -0
package/agents/github-octokit/scenarios/test-repo-access.md +27 -0
package/agents/google-workspace-local-tools/Dockerfile +6 -0
package/agents/google-workspace-local-tools/README.md +58 -0
package/agents/google-workspace-local-tools/agent.mjs +196 -0
package/agents/google-workspace-local-tools/archal-harness.json +7 -0
package/agents/google-workspace-local-tools/run-input.yaml +16 -0
package/agents/google-workspace-local-tools/scenario.md +29 -0
package/agents/hermes/.archal.json +8 -0
package/agents/hermes/Dockerfile +46 -0
package/agents/hermes/README.md +87 -0
package/agents/hermes/SOUL.md +27 -0
package/agents/hermes/config.yaml +34 -0
package/agents/hermes/drive.mjs +113 -0
package/agents/hermes/scenarios/stripe-customers-read-only.md +32 -0
package/agents/openclaw/.archal.json +8 -0
package/agents/openclaw/Dockerfile +96 -0
package/agents/openclaw/README.md +120 -0
package/agents/openclaw/drive.mjs +311 -0
package/agents/openclaw/package.json +9 -0
package/agents/openclaw/scenarios/github-issue-triage-read-only.md +44 -0
package/agents/openclaw/workspace/AGENTS.md +23 -0
package/agents/openclaw/workspace/IDENTITY.md +8 -0
package/agents/openclaw/workspace/SOUL.md +14 -0
package/agents/openclaw/workspace/TOOLS.md +35 -0
package/agents/pagination-test/README.md +24 -0
package/agents/pagination-test/scenario.md +24 -0
package/agents/replay-capsule-harness/README.md +29 -0
package/agents/replay-capsule-harness/observability-install-offline-e2e.mts +1517 -0
package/agents/replay-capsule-harness/replay-capsule-e2e.mjs +104 -0
package/clone-assets/apify/tools.json +256 -22
package/clone-assets/calcom/tools.json +510 -0
package/clone-assets/clickup/tools.json +1258 -0
package/clone-assets/customerio/tools.json +386 -0
package/clone-assets/datadog/tools.json +734 -0
package/clone-assets/github/tools.json +306 -25
package/clone-assets/gitlab/tools.json +999 -0
package/clone-assets/google-workspace/tools.json +18 -6
package/clone-assets/hubspot/tools.json +1406 -0
package/clone-assets/jira/fidelity.json +1 -1
package/clone-assets/jira/tools.json +266 -543
package/clone-assets/linear/tools.json +238 -40
package/clone-assets/ownerrez/tools.json +548 -0
package/clone-assets/pricelabs/tools.json +343 -0
package/clone-assets/sentry/tools.json +745 -0
package/clone-assets/slack/tools.json +1 -2
package/clone-assets/stripe/tools.json +185 -46
package/clone-assets/supabase/tools.json +437 -0
package/clone-assets/unipile/tools.json +408 -0
package/clone-assets/webflow/tools.json +415 -0
package/dist/autoloop-worker-types-BEb_E44z.d.cts +196 -0
package/dist/cli.cjs +150299 -87430
package/dist/commands/autoloop-hosted-worker.cjs +43942 -0
package/dist/commands/autoloop-hosted-worker.d.cts +143 -0
package/dist/commands/autoloop-pr-verification.cjs +4227 -0
package/dist/commands/autoloop-pr-verification.d.cts +17 -0
package/dist/{vitest/chunk-L36NXAU6.js → commands/autoloop-result-parser.cjs} +16445 -18852
package/dist/commands/autoloop-result-parser.d.cts +39 -0
package/dist/commands/autoloop-worker.cjs +36163 -0
package/dist/commands/autoloop-worker.d.cts +97 -0
package/dist/harness.cjs +1 -0
package/dist/index.cjs +1 -1
package/dist/replay.cjs +49624 -0
package/dist/replay.d.cts +4625 -0
package/dist/scenarios.cjs +80343 -0
package/dist/scenarios.d.cts +562 -0
package/dist/vitest/chunk-6CBYFCFK.js +4667 -0
package/dist/vitest/chunk-ARVS45PP.js +2764 -0
package/dist/vitest/index.cjs +6011 -75261
package/dist/vitest/index.d.ts +7 -6
package/dist/vitest/index.js +8 -8
package/dist/vitest/runtime/hosted-session-reaper.cjs +792 -34359
package/dist/vitest/runtime/hosted-session-reaper.js +1 -1
package/dist/vitest/runtime/setup-files.js +2 -2
package/package.json +8 -3
package/skills/archal-agent/SKILL.md +87 -0
package/skills/{attach → autoloop}/SKILL.md +94 -120
package/skills/autoloop/references/hosted-sources.md +62 -0
package/skills/autoloop/references/trace-schema-mapping.md +73 -0
package/skills/eval/SKILL.md +35 -1
package/skills/install-agent/SKILL.md +221 -0
package/skills/onboard/SKILL.md +73 -5
package/skills/scenario/SKILL.md +19 -4
package/skills/seed/SKILL.md +237 -0
package/dist/seed/dynamic-generator.cjs +0 -45687
package/dist/seed/dynamic-generator.d.cts +0 -106
package/dist/vitest/chunk-WZ7SA4CK.js +0 -47369

package/skills/install-agent/SKILL.md ADDED

@@ -0,0 +1,221 @@
+---
+name: install-agent
+description: Connect an agent's repo and its production observability to Archal so its traces get captured and graded. Detects an existing observability stack (LangSmith, Langfuse, Datadog, OpenTelemetry, Braintrust), connects the GitHub App, opens an observability setup PR, and wires an existing trace vendor through `archal trace-source`. USE THIS whenever the user says "connect my agent", "install the Archal agent", "set up observability", "capture my agent's traces", "hook up my production traces", "where do my traces go", or asks how to get an already-running agent into Archal. Reach for it before telling anyone a capability is missing — read the honest limits below first.
+user-invocable: true
+argument-hint: "[repo + where its traces live]"
+---
+# Archal Install Agent
+You are connecting a real, already-running agent to Archal so its production
+behavior gets captured and graded. Two things have to land: (1) Archal can read
+the **repo** (GitHub App), and (2) Archal can read the agent's **traces** —
+either by adding instrumentation through a setup PR, or by ingesting an existing
+observability vendor. Once traces flow, grading and the autoloop take over.
+Be honest about what this is. The "install agent" is **not** a sandboxed coding
+agent that edits arbitrary code in the repo. It is deterministic repo inspection
+plus a deterministic, templated setup PR, plus an optional one-shot managed
+planner (an LLM call) that only relocates the bootstrap file and writes advisory
+PR-body text. Set that expectation up front so nobody waits for an autonomous
+coder that does not exist yet. The honest limit is spelled out below — do not
+oversell it.
+## Why this exists
+Archal grades agent behavior from traces. An agent that already runs in
+production has traces somewhere — your own logs, or a vendor like Langfuse or
+Braintrust. The install path's whole job is to get those traces into Archal's
+normalized shape without you hand-writing exporters or copying secrets around.
+Capturing the trace is the precondition for everything downstream: grading,
+reproduction, and the autoloop that turns reproduced failures into PRs.
+## Discover first
+Before changing anything, read the repo and find out where traces live:
+1. `package.json` / `pyproject.toml` / `requirements.txt`: language and
+   framework. Language matters — the planner and `@archal/state-capture` are
+   TypeScript-only today (see limits).
+2. Existing observability dependencies. Archal's detector recognizes exactly
+   five vendors by dependency name:
+   - `langsmith` -> LangSmith
+   - `langfuse`, `langfuse-node` (TS) / `langfuse` (py) -> Langfuse
+   - `dd-trace` (TS) / `ddtrace` (py) -> Datadog
+   - `@opentelemetry/sdk-node`, `@opentelemetry/sdk-trace-node` (TS) /
+     `opentelemetry-sdk` (py) -> OpenTelemetry
+   - `braintrust` -> Braintrust
+   A repo with any of these is a candidate for **augment**; a repo with none is
+   **greenfield**.
+3. GitHub remote — augment/greenfield setup PRs and the autoloop need a GitHub
+   remote that resolves to `github.com/<owner>/<repo>`:
+   ```bash
+   git remote get-url origin
+   ```
+4. Where do the traces actually go? Ask the user. The answer routes you:
+   - already in a hosted vendor (Langfuse, Braintrust) or a Postgres/Supabase
+     table -> ingest path (`archal trace-source`, delegate detail to `autoloop`)
+   - exported files on disk -> `archal trace-source import`
+   - nowhere yet / only app logs -> the observability setup PR
+Never print secrets while inspecting. Show env var names or secret references,
+never plaintext keys or database URLs.
+## Preconditions
+- Archal CLI installed in the repo or reachable with `npx archal`
+- authenticated user (`archal login`) or `ARCHAL_TOKEN=archal_ws_...` (a
+  workspace key for CI)
+- the **Archal GitHub App** installed on the target repo (required for the setup
+  PR and for autoloop fix PRs)
+- a GitHub remote resolving to `github.com/<owner>/<repo>`
+- for the ingest path: a read-only credential for the trace vendor
+If a precondition is missing, make the smallest safe change and name what is
+still required. Do not fake a connection.
+## Step 1 — connect the GitHub App
+The repo connection is the GitHub App, not a token paste. Confirm the **Archal
+GitHub App** is installed on the target repository and that the org granted it
+access. Without it, the setup PR cannot be opened and the autoloop cannot open
+fix PRs. If it is not installed, send the user to the dashboard's integration
+flow to install it, then continue.
+## Step 2 — open the observability setup PR
+When traces are not yet exported anywhere (or the user wants Archal's own
+capture), open the **observability setup PR**. It is a deterministic, templated
+patch — every file's contents are pre-generated; nothing is freely authored by
+an LLM. The patch resolves into one of two install modes:
+### Greenfield (no existing observability detected)
+Adds standard OpenTelemetry instrumentation pointed at Archal's OTLP endpoint:
+- `archal-otel.ts` (TS) or `archal_otel.py` (Python) — an OpenTelemetry init
+  bootstrap (OTLP HTTP exporter + the node/python SDK), **not** an Archal-only
+  exporter
+- `archal-replay-capsule.ts` / `archal_replay_capsule.py` — a replay helper
+  template
+- OpenTelemetry SDK + framework instrumentation added to `package.json` (TS) or
+  `requirements.txt` (Python)
+- an `.env.example` entry for the workspace key and an `ARCHAL_OBSERVABILITY.md`
+  guide
+### Augment (existing observability detected, TypeScript only)
+When the repo already has one of the five vendors above **and** is TypeScript,
+the PR instead adds Archal's state capture alongside the existing stack rather
+than replacing it:
+- `archal-state-capture.ts` importing from `@archal/state-capture`
+- `@archal/state-capture` and `@opentelemetry/api` added to `package.json`
+- the same `.env.example` entry and `ARCHAL_OBSERVABILITY.md` guide
+Python repos with existing observability fall back to the greenfield
+OpenTelemetry install — `@archal/state-capture` has no Python build yet, and the
+PR body says so. Tell the user that explicitly rather than implying parity.
+### The optional install planner (managed LLM)
+On Pro/Enterprise workspaces, for a TypeScript greenfield/augment install with
+repo detection available, one managed LLM call (intent `observability-install`,
+public label **Archal install planner**, routed through the managed eval model
+lane — gemini-class — and metered as `cogs_only` spend) adapts the deterministic
+patch to the repo's real layout. It is strictly additive and **fail-open**: any
+availability, auth, plan-gate, or validation problem ships the deterministic
+patch unchanged. The planner can only:
+- relocate the bootstrap file to a better directory (path only — file contents
+  are never edited)
+- append advisory text to the PR body (where to wire startup, which functions to
+  wrap)
+It never edits application code, never modifies existing instrumentation, and
+never runs free-form codegen. Disable it with `ARCHAL_INSTALL_PLANNER_DISABLED=1`
+to force the deterministic install.
+## Step 3 — ingest an existing observability vendor
+If the agent already emits traces to a vendor, you usually do **not** need the
+setup PR — you normalize the existing traces with `archal trace-source`. This is
+the genuine ingest path. It maps a vendor's payloads into Archal's trace upload
+envelopes and uploads them to hosted autoloop when workspace auth is present.
+Supported providers: `langfuse`, `braintrust`, `otel`, `http`, `supabase`,
+`postgres`, `file`, `custom`. Pull/sync vendors (`langfuse`, `braintrust`,
+Postgres/Supabase) are fetched on a cursor; push sources (`otel`, `http`,
+`custom`) receive traces continuously through `serve`.
+The command surface:
+```bash
+archal trace-source connect <provider>   # register a source (e.g. langfuse, braintrust, otel, custom)
+archal trace-source test [source]        # validate credentials and reachability
+archal trace-source sync [source]        # pull-fetch new traces (langfuse/braintrust/db)
+archal trace-source watch [source]       # continuous pull loop
+archal trace-source serve [source]       # receiver for push sources (otel/http/custom)
+archal trace-source import <path>        # normalize exported trace files on disk
+archal trace-source status [source]      # registry validation, cursor, last-sync state
+archal trace-source list                 # registered sources
+archal trace-source use|disable <source> # select / disable a source
+```
+**Delegate the deep mapping to the `autoloop` skill.** It owns the per-vendor
+flags (`--base-url`, `--api-key-env`, `--out`, `--upload`, `--repository`,
+schema/cursor/filter mapping for database sources) and the full
+import/grade/reproduce/fix loop. Quote the command names here; point the user
+there for the flag detail and the autoloop wiring.
+## The honest limit
+There is **no sandboxed coding agent** that reads the whole repo and edits
+arbitrary code to wire up instrumentation. For a small or conventional repo, the
+setup PR drops in and works. For a large or unusual repo, the setup PR is a
+**generic bootstrap the user finishes** — they still import the bootstrap at
+startup and wrap the call sites the planner's advisory section points at. Say
+this plainly. The typed model lanes `remediation_agent` and
+`observability_install_agent` exist in the contract but are
+`agent-executor-contract-only`: the lane is declared, but no real executor
+consumes it yet. Do not describe either as a working autonomous coder.
+## Failure taxonomy
+Classify precisely; do not paper over a missing precondition:
+- **GitHub App not connected** — setup PR and fix PRs cannot open. Install the
+  Archal GitHub App on the repo.
+- **No GitHub remote** — augment/greenfield PRs need `github.com/<owner>/<repo>`.
+- **Language unsupported by augment** — Python with existing observability falls
+  back to greenfield OTel; `@archal/state-capture` is TS-only.
+- **Planner skipped** — wrong plan (needs Pro/Enterprise), non-TypeScript,
+  `augment-existing-vendor` legacy mode, detection unavailable, or
+  `ARCHAL_INSTALL_PLANNER_DISABLED=1`. The deterministic patch still ships; this
+  is not a failure, just a narrower install.
+- **Setup PR is a stub for this repo** — large/unusual layout; the user finishes
+  wiring. Not a bug; the honest limit.
+- **Trace ingest failure** — `trace-source` adapter mismatch, bad credential,
+  rejected upload, or missing workspace auth. Use `archal trace-source test` and
+  `status` to localize it.
+- **No usable trace evidence** — once ingested, grading or reproduction can still
+  block if the trace lacks task context or state. Hand off to the `autoloop`
+  skill's evidence diagnosis.
+## What to report back
+After install or debugging, give the user:
+- repo full name and whether the GitHub App is connected
+- chosen path: setup PR (greenfield vs augment) or `trace-source` ingest
+- if a setup PR: the install mode, the files it adds, and whether the planner ran
+- if ingest: provider, source id, and the next `archal trace-source` command
+- whether traces are flowing into Archal yet, or the exact blocker
+- next command or next owner
+## Docs
+- Autoloop production traces: https://docs.archal.ai/guides/autoloop-production-traces
+- Autonomous loops: https://docs.archal.ai/guides/autoloop-production-traces
+- CLI reference: https://docs.archal.ai/cli/autoloop
+- Quickstart: https://docs.archal.ai/quickstart
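
Reviewer's note, not part of the package diff: the "Discover first" and "Step 2" sections of this skill describe a dependency-name lookup followed by a greenfield/augment decision. A minimal sketch of that rule might look like the following. Only the dependency names and the mode rule (augment requires an existing vendor and TypeScript; Python falls back to greenfield) come from the skill text. Every identifier here (`detectVendors`, `chooseInstallMode`, `VENDOR_DEPENDENCIES`) is hypothetical and is not Archal's detector.

```typescript
// Reviewer sketch, NOT Archal's implementation. All names are hypothetical;
// only the dependency list and the mode rule are taken from the skill text.
type Vendor = "LangSmith" | "Langfuse" | "Datadog" | "OpenTelemetry" | "Braintrust";
type Language = "typescript" | "python";
type InstallMode = "greenfield" | "augment";

// Dependency name -> vendor, per language, as listed under "Discover first".
const VENDOR_DEPENDENCIES: Record<Language, Array<[string, Vendor]>> = {
  typescript: [
    ["langsmith", "LangSmith"],
    ["langfuse", "Langfuse"],
    ["langfuse-node", "Langfuse"],
    ["dd-trace", "Datadog"],
    ["@opentelemetry/sdk-node", "OpenTelemetry"],
    ["@opentelemetry/sdk-trace-node", "OpenTelemetry"],
    ["braintrust", "Braintrust"],
  ],
  python: [
    ["langsmith", "LangSmith"],
    ["langfuse", "Langfuse"],
    ["ddtrace", "Datadog"],
    ["opentelemetry-sdk", "OpenTelemetry"],
    ["braintrust", "Braintrust"],
  ],
};

// Return each recognized vendor once, in table order.
function detectVendors(language: Language, dependencies: string[]): Vendor[] {
  const found: Vendor[] = [];
  for (const [name, vendor] of VENDOR_DEPENDENCIES[language]) {
    if (dependencies.indexOf(name) !== -1 && found.indexOf(vendor) === -1) {
      found.push(vendor);
    }
  }
  return found;
}

// Augment needs an existing vendor AND TypeScript; everything else,
// including Python with existing observability, resolves to greenfield.
function chooseInstallMode(language: Language, vendors: Vendor[]): InstallMode {
  return vendors.length > 0 && language === "typescript" ? "augment" : "greenfield";
}
```

Under these assumptions, a Python repo that depends on `ddtrace` is detected as Datadog but still resolves to `greenfield`, which matches the fallback the skill asks the agent to state explicitly.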

package/skills/onboard/SKILL.md CHANGED

@@ -87,6 +87,16 @@ Confirm detected clones, then ask which of these the user wants. Each delegates
 If the user doesn't have a harness yet, prefer `npx archal init`; it creates `./.archal/harness.mjs`, points `.archal.json` at it, and adds a starter scenario without overwriting existing files. The generated harness is a guarded stub: Archal refuses to score it until the user edits it to call their Cursor, Codex, Claude Code, or custom agent. A custom harness should read `AGENT_TASK` from env, call the agent runtime, print `{ "text": "..." }` to stdout, and call `reportAgentMetrics()` from `archal/harness` with accumulated `{ inputTokens, outputTokens, llmCallCount }` before exit. Service clients need one explicit routing mode: use sandbox/Docker routing when the harness calls normal service URLs such as `https://api.github.com`, or configure SDK base URLs from `AGENT_CLONE_URLS` and add the JSON headers from `AGENT_ROUTE_HEADERS` to those clone requests. Alternative: skip `agent` in `.archal.json` and pass `--harness <path>` per-run.
+### Or run a packaged agent (no harness to write)
+If the user just wants to evaluate a real, ready-made agent, point them at a packaged agent instead of writing a harness. A packaged agent runs unmodified in Docker while Archal's TLS-intercept sidecar routes its calls to seeded clones and injects the host's model API key on its model calls. The bundled agents live under `examples/agents/<name>` (`openclaw`, `hermes`, `github-octokit`).
+- `archal run <scenario>.md --sandbox` - run the bundled OpenClaw agent (needs Docker). Pick the model with `--agent-model <provider/model>`; export the matching key in the shell first (`OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `GEMINI_API_KEY`).
+- `archal run <scenario>.md --harness examples/agents/hermes --dockerfile examples/agents/hermes/Dockerfile` - run any other bundled agent through the Docker harness flags (swap in `github-octokit` for the other one).
+- `archal run <scenario>.md --harness ./<dir> --dockerfile ./<dir>/Dockerfile` - run your own packaged agent. A packaged agent is just a directory with a `Dockerfile`, a drive script (reads `AGENT_TASK`, prints the answer to stdout), and an `.archal.json`.
+See the "Run a packaged agent" guide: https://docs.archal.ai/guides/packaged-agents
 ### Option A - Evaluate an agent with scenarios
 Write markdown scenario files that describe setup, prompt, and success criteria; `archal run` executes them against clones.
@@ -122,16 +132,73 @@ Do not paste a sample config here. The right shape depends on what's already in
 Run: `archal clone start <detected clones>` - gives live clone URLs the user's SDK clients can point at. `archal clone status` shows the active session; `archal clone stop` tears down.
-### Option E - Attach real trace sources
+### Option E - Bounded pre-prod autonomous loop
+Use this when the repo already has scenarios or can safely generate starter
+pre-prod scenarios, and the user wants a coding agent to run checks, classify
+failures, optionally patch, validate, and open a draft PR.
+Start with:
+```bash
+archal preprod plan --repo . --write-scenarios --write-config --out .archal/preprod-plan.json
+archal preprod start --scenario-count 20 --dry-run --artifacts .archal/preprod
+```
+`--write-scenarios` writes generated scenario markdown under `archal/` by
+default, and `--write-config` writes `.archal.json` only when it can do so
+without overwriting an existing config. `preprod start` creates or reuses
+`.archal/preprod-pack.json`, writes generated scenarios under
+`archal/generated/` by default, runs the pack, and leaves resumable artifacts.
+If the repo already has `.archal.json`, read `.archal/preprod-plan.json` and
+confirm the detected clone/harness surface before starting the loop.
+Only enable local fix or PR commands after the dry-run artifacts have been
+reviewed:
+```bash
+archal preprod start \
+  --scenario-count 20 \
+  --allow-external-execution \
+  --remediation-agent codex \
+  --validation-command '<test command>' \
+  --open-pr \
+  --pr-command '<draft-pr command>' \
+  --artifacts .archal/preprod
+```
+`--open-pr` requires both `--validation-command` and `--pr-command`; PR
+publishing still stays disabled unless `--allow-external-execution` is present.
+`preprod start` uses the managed preprod remediation path by default. It writes
+a repo-local remediation context, invokes the selected coding agent, reruns the
+scenario pack, and validates before PR creation. The remediation command
+receives `ARCHAL_PREPROD_FAILURES_JSON`, `ARCHAL_PREPROD_ATTEMPT`,
+`ARCHAL_PREPROD_REMEDIATION_CONTEXT_PATH`, and `ARCHAL_PREPROD_USAGE_PATH`.
+If the coding agent can report its own model usage, write JSON to
+`ARCHAL_PREPROD_USAGE_PATH` with fields such as `inputTokens`, `outputTokens`,
+`provider`, `model`, `isByok`, and `costUsd`.
+Tell the user to inspect `.archal/preprod/preprod-result.json` and
+`.archal/preprod/preprod-failures.json` for status, stop reason, attempts,
+scenario run ids, validation, and PR summary. If the run was stopped with
+`--stop-after` or interrupted, resume with `archal preprod start --resume
+.archal/preprod --artifacts .archal/preprod`.
+### Option F - Autoloop real trace sources
 Use this when the repo already has agent traces from pre-production or
 production and the user wants Archal to import, grade, reproduce, and turn
 reproduced failures into GitHub issues or PRs.
-**Delegate to the `attach` skill.** It owns the trace-source mapping,
-`archal/harness.json`, `archal/scenario.md`, seed templates, `archal attach`
+**Delegate to the `autoloop` skill.** It owns the trace-source mapping,
+`archal/harness.json`, `archal/scenario.md`, seed templates, `archal autoloop`
 commands, dashboard expectations, and failure taxonomy. Do not inline the
-Attach flow here; it changes faster than starter scenario setup.
+Autoloop flow here; it changes faster than starter scenario setup.
+Set the expectation carefully: Autoloop is not arbitrary production trace replay.
+It can reproduce only failures with enough trace evidence plus repo-owned
+scenario/seed context to reconstruct realistic clone state. Missing evidence
+should block with a clear artifact instead of being guessed.
 ## Verify
@@ -155,4 +222,5 @@ Run the first scenario or task. For Options A and B, hand off to the `eval` skil
 - Quickstart: https://docs.archal.ai/quickstart
 - Full docs: https://docs.archal.ai
-- Attach production traces: https://docs.archal.ai/guides/attach-production-traces
+- Autonomous loops: https://docs.archal.ai/guides/autoloop-production-traces
+- Autoloop production traces: https://docs.archal.ai/guides/autoloop-production-traces
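
Reviewer's note, not part of the package diff: Option E above states two gating rules for `archal preprod start`: `--open-pr` requires both `--validation-command` and `--pr-command`, and PR publishing stays disabled unless `--allow-external-execution` is present. The sketch below encodes just those two rules as a pure function. The `PreprodFlags` shape, the `PrDecision` kinds, and the function name are hypothetical; how the real CLI reports each case is not shown in the diff.

```typescript
// Reviewer sketch of the flag rules described for `archal preprod start`.
// Types, names, and result kinds are hypothetical, not the CLI's own.
interface PreprodFlags {
  openPr: boolean;
  validationCommand?: string;
  prCommand?: string;
  allowExternalExecution: boolean;
}

type PrDecision =
  | { kind: "not-requested" }
  | { kind: "usage-error"; missing: string[] }
  | { kind: "publishing-disabled" }
  | { kind: "publish" };

function decidePrPublishing(flags: PreprodFlags): PrDecision {
  if (!flags.openPr) {
    return { kind: "not-requested" };
  }
  // --open-pr requires both companion commands.
  const missing: string[] = [];
  if (!flags.validationCommand) missing.push("--validation-command");
  if (!flags.prCommand) missing.push("--pr-command");
  if (missing.length > 0) {
    return { kind: "usage-error", missing };
  }
  // Even a complete --open-pr request stays disabled without explicit opt-in.
  if (!flags.allowExternalExecution) {
    return { kind: "publishing-disabled" };
  }
  return { kind: "publish" };
}
```

Checking the companion commands before the opt-in flag is a design choice of this sketch, so that an incomplete command line is reported first; the diff does not say which check the CLI performs first.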

package/skills/scenario/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: scenario
-description: Write, edit, and validate Archal scenario files. Knows the markdown format, success criteria syntax, and config options.
+description: Write, edit, and validate Archal scenario markdown — the format, success criteria syntax, and config. USE THIS whenever the user wants to "write a scenario", "add a test for my agent", "fix/edit my scenario", asks "what's the success criteria syntax" or about `[D]`/`[P]` criteria, needs a multi-clone scenario, or is validating scenario files. Reach for it on any mention of authoring or fixing Archal scenarios.
 user-invocable: true
 argument-hint: "[scenario description or file path]"
 ---
@@ -15,7 +15,7 @@ You write and edit Archal scenario files. Scenarios are markdown files that defi
 # Scenario Title
 ## Setup
-Starting state described in plain English. Drives seed generation.
+Starting state in plain English. Context Archal reconstructs and the agent + evaluator read. Does NOT generate seed state.
 ## Prompt
 The task instruction given to the agent.
@@ -95,14 +95,25 @@ and `archal seed list` over maintaining a separate list in this skill.
 | Clone | Seeds |
 |------|-------|
 | `apify` | `empty` |
+| `calcom` | `empty`, `demo` |
+| `clickup` | `empty`, `demo` |
+| `customerio` | `empty` |
+| `datadog` | `empty` |
 | `github` | `empty`, `small-project`, `enterprise-repo`, `ci-cd-pipeline`, `stale-issues`, `large-backlog` |
+| `gitlab` | `empty`, `demo` |
+| `hubspot` | `empty`, `demo`, `stale-data` |
 | `slack` | `empty`, `engineering-team`, `busy-workspace`, `incident-active` |
 | `stripe` | `empty`, `small-business`, `checkout-flow`, `subscription-lifecycle`, `subscription-heavy` |
 | `jira` | `empty`, `small-project`, `enterprise`, `sprint-active`, `large-backlog` |
 | `linear` | `empty`, `small-team`, `engineering-org`, `multi-team`, `busy-backlog` |
 | `supabase` | `empty`, `small-project`, `saas-starter`, `ecommerce` |
 | `google-workspace` | `empty`, `assistant-baseline`, `gmail-busy-inbox`, `calendar-packed-week` |
+| `ownerrez` | `empty` |
+| `pricelabs` | `empty` |
+| `sentry` | `empty`, `demo` |
 | `tavily` | `empty` |
+| `unipile` | `empty` |
+| `webflow` | `empty` |
 | `ramp` | `empty`, `default` |
 | `discord` | `empty`, `small-server`, `harvested` |
@@ -131,7 +142,11 @@ Use multiple clones by listing them in config:
 clones: github, slack
 ```
-The Setup section can describe state across both services. Each clone gets its own seed.
+The Setup section can describe context across both services. Attach explicit seed state per clone via `seed:` or `## Seed State` (see Seed state below).
+## Seed state
+Seeding is deterministic — explicit committed state, no LLM. Scenarios attach it via the `seed:` config key or a `## Seed State` section. To author or load explicit JSON/SQL/catalog state into a clone, delegate to the sibling `seed` skill (`packages/archal/skills/seed`) rather than handling the seeding mechanics here.
 ## Validation
@@ -147,7 +162,7 @@ Run `archal scenario list` to verify scenarios parse correctly. A valid scenario
 1. Writing `[D]` criteria that require subjective judgment
 2. Writing `[P]` criteria that could be checked deterministically
 3. Forgetting to specify which clone the scenario uses
-4. Writing Setup descriptions that are too vague for seed generation
+4. Writing Setup descriptions too vague to ground the agent and evaluator
 5. Using seed names that don't exist (check the seed table above)
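Reviewer's note, not part of the package diff: common mistake 5 can be checked mechanically. The sketch below validates requested `clone`/`seed` pairs against a hand-copied excerpt of the seed table in this diff (`github`, `hubspot`, `sentry`, `webflow`). The function name and data shape are hypothetical, and the skill itself recommends `archal seed list` over a static table, so treat this only as an illustration of the check.

```typescript
// Reviewer sketch: validate seed names against an excerpt of the seed table
// shown in this diff. Names are hypothetical; the real catalog lives on disk
// and is listed by `archal seed list`.
const SEED_TABLE: Record<string, string[]> = {
  github: ["empty", "small-project", "enterprise-repo", "ci-cd-pipeline", "stale-issues", "large-backlog"],
  hubspot: ["empty", "demo", "stale-data"],
  sentry: ["empty", "demo"],
  webflow: ["empty"],
};

// Returns one human-readable problem per invalid [clone, seed] pair.
function findUnknownSeeds(requested: Array<[string, string]>): string[] {
  const problems: string[] = [];
  for (const [clone, seed] of requested) {
    const known = Object.prototype.hasOwnProperty.call(SEED_TABLE, clone)
      ? SEED_TABLE[clone]
      : undefined;
    if (known === undefined) {
      problems.push(`${clone}: clone is not in this table excerpt`);
    } else if (known.indexOf(seed) === -1) {
      problems.push(`${clone}:${seed} not found; known seeds: ${known.join(", ")}`);
    }
  }
  return problems;
}
```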
 ## Documentation

package/skills/seed/SKILL.md ADDED

@@ -0,0 +1,237 @@
+---
+name: seed
+description: Craft and load explicit clone seed state for Archal, deterministically and with no LLM. This is the canonical "how to seed a clone" skill. Use it whenever you need to give a clone (github, stripe, supabase, slack, ...) a known starting state: writing or editing a JSON or SQL seed file, choosing a named catalog seed, wiring a scenario's seed, debugging "seed not found" / "seed_unavailable" / shape-mismatch / rollback errors, or deciding between an inline `## Seed State` block and a committed seed file. Reach for this any time the words seed, seed state, starting state, fixture data, or "set up the clone with..." appear.
+user-invocable: true
+argument-hint: "[clone + the state you want loaded]"
+---
+# Archal Seed State
+You craft and load explicit starting state for Archal clones. A seed is the
+clone's state before a run begins: the issues GitHub holds, the customers and
+subscriptions Stripe holds, the rows a Supabase database holds.
+The one rule that defines this skill: **seeds are explicit, committed, and
+deterministic. No LLM is involved.** You write the state, or you pick a named
+catalog seed someone already wrote. You never ask a model to invent it.
+The dedicated package that loads seed state, `@archal/seed-state`, says this in
+its own README and it is the mental model to hold:
+> This package intentionally contains no LLM calls, code generation, natural
+> language extraction, cache, repair, or scenario-to-state generation.
+## The four shapes a seed can take
+1. **A committed JSON seed file** — `clones/<clone>/seeds/<name>.json`. A
+   top-level object whose keys are the clone's collections and whose values are
+   arrays of entities. This is the common shape for object-graph clones
+   (github, stripe, slack, linear, jira). Example file:
+   `clones/github/seeds/small-project.json`.
+2. **A committed SQL seed file** — `clones/<clone>/seeds/<name>.sql`. A set of
+   `CREATE TABLE` + `INSERT` statements. The natural shape for relational
+   clones like `supabase`, whose seeds on disk are `.sql`, not `.json`. Example:
+   `clones/supabase/seeds/ecommerce.sql`.
+3. **A named catalog seed** — a seed that already lives on disk for a clone, so
+   you reference it by name instead of writing one. `github: small-project`,
+   `stripe: checkout-flow`, `supabase: saas-starter`. Browse them with
+   `archal seed list <clone>`.
+4. **An inline `## Seed State` block in a scenario** — explicit state written
+   directly in the scenario markdown. Use it when the state is small and tightly
+   coupled to that one scenario. For anything reused across scenarios, prefer a
+   committed file (shapes 1–3) so it has one home.
+Resolution on disk checks `<name>.json` first, then `<name>.sql`
+(`loadSeedStateFromPath` in `packages/seed-state/src/state.ts`). A clone can
+ship either form for a given seed name, not both.
+## How a scenario or the CLI selects a seed
+A seed value comes from one of two places, parsed identically:
+- The scenario `## Config` key `seed:`
+- The CLI flag `archal run --seed <name-or-path>`
+Both accept the same forms (verbatim from the `--seed` flag help):
+> Seed name (e.g. small-project), clone-prefixed name (github:small-project,
+> applies only to that clone), seed family (enterprise, stale, ...), or local
+> file path (.json / .md).
+So the two everyday shapes are:
+- **Bare name** — `seed: small-project`. Applies to every clone the scenario
+  declares.
+- **Clone-prefixed** — `seed: github:small-project`. Applies only to the named
+  clone. This matches `archal clone start --seed github:small-project`, so the
+  same string works in both commands.
+The prefix is validated: if you write `seed: payments:checkout-flow` but the
+scenario never declares a `payments` clone, you get a usage error naming the
+clones the scenario actually has, rather than a confusing `seed_unavailable`
+later from the runtime (`parseExplicitSeed` in
+`cli/src/runner/seed-resolution.ts`).
+One behavior to remember: a scenario with a `## Setup` section but **no**
+explicit `seed:` is forced to the `empty` seed, because Setup prose used to
+drive dynamic generation and a pre-populated seed would conflict with it. To
+combine a Setup description with a real seed, you must add an explicit `seed:`
+field. (See `resolveRunSeedPlan` in `cli/src/runner/seed-resolution.ts`.)
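Reviewer's note, not part of the package diff: the two everyday seed forms described above (bare name and clone-prefixed name) and the prefix validation can be sketched as follows. This is not `parseExplicitSeed`; the function name, the result type, and the error wording are hypothetical, and seed families and file paths are deliberately left out.

```typescript
// Reviewer sketch of the bare vs. clone-prefixed seed forms. NOT the CLI's
// parseExplicitSeed; names and error text are hypothetical. Seed families
// and local file paths are out of scope here.
type SeedSelection =
  | { kind: "all-clones"; seed: string }
  | { kind: "single-clone"; clone: string; seed: string };

function parseSeedValue(value: string, declaredClones: string[]): SeedSelection {
  const separator = value.indexOf(":");
  if (separator === -1) {
    // Bare name: applies to every clone the scenario declares.
    return { kind: "all-clones", seed: value };
  }
  const clone = value.slice(0, separator);
  const seed = value.slice(separator + 1);
  if (declaredClones.indexOf(clone) === -1) {
    // Fail early and name the clones the scenario actually has, instead of
    // surfacing a later runtime error.
    throw new Error(
      `Seed "${value}" targets clone "${clone}", but the scenario declares: ${declaredClones.join(", ")}`
    );
  }
  return { kind: "single-clone", clone, seed };
}
```

With a scenario that declares only `stripe`, `parseSeedValue("payments:checkout-flow", ["stripe"])` throws in this sketch, mirroring the usage error the skill describes.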
+### Command surface
+| Command | What it does |
+|---------|--------------|
+| `archal seed list` | One row per clone: clone, default seed, seed count |
+| `archal seed list <clone>` | Every seed for that clone, with the default marked |
+| `archal seed list <clone> --json` | Same, machine-readable |
+| `archal run <scenario>.md --seed <name-or-path>` | Override the seed for a run |
+| `archal run ... --fresh-seed` | On a reused clone session, reset and re-apply the seed |
+| `archal run ... --keep-state` | On a reused session, keep existing state, do not re-apply |
+| `archal clone start <clone> --seed <seeds...>` | Start a live clone pre-seeded |
+| `archal clone start <clone> --seed-file <path>` | Start a live clone, then load a JSON/markdown seed file |
+| `archal clone seed --file <path>` | Sideload a JSON seed file into a running clone |
+Prefer `archal seed list` over memorizing a seed table. The catalog is derived
+from `clones/<clone>/seeds/*.{json,sql}` on disk, so the CLI is always current.
+## The deterministic load flow
+When a run seeds a clone, the loader does this, in order
+(`packages/runtime/src/seed-loader.ts`):
+1. **Snapshot first.** `GET /state` on each target clone and keep the response
+   text. This is the rollback point (`snapshotSeedTargets` →
+   `fetchSeedStateSnapshot`).
+2. **Apply the seed.** `PUT /state` with the seed body — `application/json` for
+   a JSON seed, `text/sql` for a SQL seed — one clone at a time, tracking each
+   one that committed.
+3. **Roll back on failure.** If any clone's load throws, restore the clones that
+   already committed by replaying their snapshots, then re-throw the original
+   error (`restoreSeedTargets`). Clones are left in a mixed state only if the
+   restore itself also fails, which is reported as
+   `Rollback failed after partial seed load`.
+4. **Capture the baseline after seeding.** Once seeding succeeds, the post-seed
+   state is captured as the baseline (`captureBaselineStates` in
+   `packages/runtime/src/clone-client.ts`). Resets between runs restore *this*
+   seeded baseline, not an empty clone — so every run in a multi-run scenario
+   starts from the same seeded state.
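The four steps above can be sketched as follows. This is an illustration, not the code in `seed-loader.ts`: `CloneStateClient`, `SeedTarget`, and `seedAll` are hypothetical names, and the client interface stands in for the real HTTP calls so the flow can be read (and exercised) without a running clone. The sketch does not model the case where the restore itself fails.

```typescript
// Hypothetical stand-in for the HTTP calls made against one clone.
interface CloneStateClient {
  name: string;
  getState(): Promise<string>; // GET /state, raw response text
  putState(body: string): Promise<void>; // PUT /state
}

interface SeedTarget {
  client: CloneStateClient;
  seedBody: string;
}

async function seedAll(targets: SeedTarget[]): Promise<Record<string, string>> {
  // 1. Snapshot every target before touching any of them.
  const snapshots: Record<string, string> = {};
  for (const t of targets) {
    snapshots[t.client.name] = await t.client.getState();
  }

  // 2. Apply the seed one clone at a time, tracking each clone that committed.
  const committed: CloneStateClient[] = [];
  try {
    for (const t of targets) {
      await t.client.putState(t.seedBody);
      committed.push(t.client);
    }
  } catch (err) {
    // 3. Restore the clones that already committed, then re-throw the original error.
    for (const c of committed) {
      await c.putState(snapshots[c.name] as string);
    }
    throw err;
  }

  // 4. Capture the post-seed state as the baseline for later resets.
  const baseline: Record<string, string> = {};
  for (const t of targets) {
    baseline[t.client.name] = await t.client.getState();
  }
  return baseline;
}
```

Taking all snapshots before the first `PUT` is what makes the rollback possible: by the time a later clone fails, the pre-seed state of the earlier ones is already in hand.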
+### How state reaches the clone (the `/state` endpoint)
+The clone server exposes `/state` (`clones/core/src/rest/rest-built-in-endpoints.ts`):
+- `GET /state` — read the current state (used for the snapshot and baseline).
+- `PUT /state` — load state. The handler branches on `Content-Type`:
+  - `text/sql` or `application/sql` → parsed and loaded as SQL (only when the
+    clone exposes SQL loading; otherwise it returns a 400 telling you to send
+    JSON).
+  - otherwise → loaded as JSON state.
+  - `PUT /state?seed=<name>` with a `{}` body → load a named catalog seed
+    server-side, without sending a state body at all.
+- `DELETE /state` — wipe state (used by `--fresh-seed` before re-applying).
+A successful `PUT` returns `{ ok: true }`. The content-type the CLI sends is set
+in one place — `text/sql` when there is SQL, `application/json` otherwise
+(`pushStateToCloud` in `cli/src/runner/execution/agent-http.ts`); named-seed
+loads always send `application/json` with an empty body.
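A small sketch of how the three kinds of `PUT /state` request differ. The `SeedPayload` and `StateRequest` types and the `buildStateRequest` function are hypothetical, not the real `pushStateToCloud`; only the paths, content types, and the `{}` body for named seeds come from the description above.

```typescript
// Hypothetical description of an outgoing request.
interface StateRequest {
  method: "PUT";
  path: string;
  contentType: "text/sql" | "application/json";
  body: string;
}

// Hypothetical tagged union for the three ways to load state.
type SeedPayload =
  | { kind: "sql"; sql: string }
  | { kind: "json"; state: Record<string, unknown[]> }
  | { kind: "named"; name: string };

function buildStateRequest(payload: SeedPayload): StateRequest {
  switch (payload.kind) {
    case "sql":
      // SQL seeds are sent as text/sql.
      return { method: "PUT", path: "/state", contentType: "text/sql", body: payload.sql };
    case "json":
      // Everything else is sent as JSON state.
      return {
        method: "PUT",
        path: "/state",
        contentType: "application/json",
        body: JSON.stringify(payload.state),
      };
    case "named":
      // Named catalog seeds are resolved server-side; the body is just `{}`.
      return {
        method: "PUT",
        path: `/state?seed=${encodeURIComponent(payload.name)}`,
        contentType: "application/json",
        body: "{}",
      };
  }
}
```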
+## Do not do these
+- **Do not synthesize seed data from scenario prose.** A `## Setup` paragraph is
+  context for the evaluator, not a source you extract state from. If you need
+  populated state, write a committed seed (or pick a named one) and reference it
+  with `seed:`.
+- **Do not call an LLM to generate, repair, or "fill in" a seed.** The old
+  dynamic seed-generation path is being removed; `@archal/seed-state` was built
+  specifically to have none of it. Explicit committed seeds only.
+- **Do not invent collection names.** A JSON seed must use the clone's real
+  collection keys (see "Writing a good JSON seed"). Unknown keys are dropped by
+  normalization, so a typo silently seeds nothing.
+- **Do not hand-tune both `.json` and `.sql` for the same name.** Pick one form
+  per seed name; resolution stops at the first that exists.
+- **Do not blow away a sideloaded session by accident.** On a reused
+  `archal clone start` session the loader probes for existing state before
+  re-applying. Use `--keep-state` to keep it or `--fresh-seed` to reset on
+  purpose, rather than fighting the guard.
+## Writing a good JSON seed
+The seed file must mirror the clone's real state shape. The fastest reliable way
+to get the shape right is to open an existing catalog seed for that clone and
+match its top-level keys and per-entity fields.
+A JSON seed is a top-level object: each key is a collection, each value is an
+array of entity objects.
+```jsonc
+{
+  "users":  [{ "id": "u1", "login": "octocat", "type": "User" }],
+  "repos":  [{ "id": "r1", "name": "demo", "fullName": "octocat/demo", "private": false }],
+  "issues": [{ "id": "i1", "repoId": "r1", "number": 1, "title": "First issue", "state": "open" }]
+}
+```
+Match the clone, not your imagination:
+- Use the same collection names the clone uses. `clones/github/seeds/small-project.json`
+  has `users`, `repos`, `issues`, `pullRequests`, `labels`, and more.
+- Give each entity the fields the clone expects, especially `id` and the foreign
+  keys that wire entities together (`repoId`, `issueNumber`, ...). The clone
+  maintains those relationships but does not validate them at seed time: a
+  broken reference loads cleanly and only surfaces as a realistic error at
+  runtime.
+- Only arrays survive normalization — non-array top-level values are dropped
+  (`normalizeSeedState`). Keep everything in collection arrays.
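To see why a typo "silently seeds nothing", here is a minimal sketch of the two drop rules described above (unknown keys and non-array values). It is not the real `normalizeSeedState`: the function `normalizeSketch`, the `knownCollections` parameter, and the `dropped` list are assumptions added so the discarded keys become visible.

```typescript
type SeedState = Record<string, unknown[]>;

// Hypothetical result: the surviving state plus the keys that were discarded.
interface NormalizeResult {
  state: SeedState;
  dropped: string[];
}

function normalizeSketch(
  raw: Record<string, unknown>,
  knownCollections: string[]
): NormalizeResult {
  const state: SeedState = {};
  const dropped: string[] = [];
  for (const key of Object.keys(raw)) {
    const value = raw[key];
    // Keep a key only if it is a known collection AND its value is an array.
    if (Array.isArray(value) && knownCollections.indexOf(key) !== -1) {
      state[key] = value;
    } else {
      dropped.push(key);
    }
  }
  return { state, dropped };
}
```

Running a draft seed through a check like this before loading it makes a misspelled key such as `isues` show up in `dropped` instead of disappearing.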
+## When SQL fits
+Use a `.sql` seed for relational clones whose state is naturally tables and
+rows — `supabase` is the canonical case, and its seeds on disk are `.sql`. The
+SQL seed is plain `CREATE TABLE` + `INSERT INTO ... VALUES (...)`:
+```sql
+CREATE TABLE customers (id serial, email text, name text);
+INSERT INTO customers (email, name) VALUES
+  ('ada@example.com', 'Ada Lovelace'),
+  ('alan@example.com', 'Alan Turing');
+```
+The SQL parser is deliberately small (`parseSqlSeed` in
+`packages/seed-state/src/state.ts`). It understands `CREATE TABLE` (including
+`IF NOT EXISTS`), `INSERT INTO ... VALUES`, schema-qualified and quoted
+identifiers, and line and block comments, and it auto-assigns serial `id`s. It
+is not a full SQL engine: any statement other than `CREATE TABLE` or `INSERT`
+throws an "Unsupported SQL seed statement" error, so keep seeds to those two
+statements.
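A pre-flight check for a `.sql` seed could look roughly like this. It is a deliberately naive sketch and not the real `parseSqlSeed`: `checkSeedStatements` is a hypothetical helper, and it assumes no semicolons or comment markers appear inside string literals.

```typescript
// Return the statements in a SQL seed, or throw on anything other than
// CREATE TABLE / INSERT INTO. Naive: does not understand string literals.
function checkSeedStatements(sql: string): string[] {
  const withoutComments = sql
    .replace(/\/\*[\s\S]*?\*\//g, " ") // strip block comments
    .replace(/--[^\n]*/g, " "); // strip line comments
  const statements = withoutComments
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  for (const stmt of statements) {
    if (!/^(CREATE\s+TABLE|INSERT\s+INTO)\b/i.test(stmt)) {
      throw new Error(`Unsupported SQL seed statement: ${stmt}`);
    }
  }
  return statements;
}
```

The `customers` example above passes as two statements, while a seed containing `DROP TABLE customers;` is rejected.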
+## Failure taxonomy
+| Symptom | Likely cause | Fix |
+|---------|--------------|-----|
+| `Could not resolve static seed "<name>"` / `seed_unavailable` | Seed name is misspelled, not in this clone's catalog, or the hosted session was started with a different seed | `archal seed list <clone>` to confirm the name; re-provision the session with the right seed |
+| `clone "<x>" is not part of this scenario` | Clone-prefixed seed names a clone the scenario doesn't declare | Add the clone to `clones:`, or drop the prefix |
+| Seed loads but state looks empty / partial | Wrong collection keys, or non-array top-level values dropped by normalization | Match an existing catalog seed's keys; keep every collection an array |
+| `Failed to parse seed file ...` | Malformed JSON | Validate the JSON; check trailing commas and quoting |
+| `Unsupported SQL seed statement: ...` | SQL beyond `CREATE TABLE` / `INSERT` | Reduce the seed to those two statements |
+| `…does not expose SQL state loading` (HTTP 400) | Sent SQL to a clone that only loads JSON | Send JSON state, or use a clone that supports SQL |
+| `Rollback failed after partial seed load` | A clone errored mid-load and the snapshot restore also failed | The clones may be in a mixed state; restart the clone session and retry |
+| Seed "not re-applied" warning on a reused session | The loader kept existing sideloaded state instead of overwriting it | `--fresh-seed` to reset and re-apply, or `--keep-state` to accept it on purpose |
+## What to report back
+After authoring or loading a seed, tell the user:
+- the clone(s) and the seed shape used (named catalog seed, JSON file, SQL file,
+  or inline `## Seed State`)
+- the seed name or file path, and how a scenario/run selects it (`seed:` or
+  `--seed`)
+- the entity counts loaded per clone (the loader logs e.g. `7 issues, 3 repos`)
+- for a new seed file, that it mirrors an existing catalog seed's shape
+- any blocker: missing seed name, shape mismatch, SQL parse error, or a rollback
+  that fired — with the exact next command
+## Docs
+- Seeds guide: https://docs.archal.ai/guides/seeds
+- Clones and seeds overview: https://docs.archal.ai/clones/overview
+- Writing scenarios: https://docs.archal.ai/guides/writing-scenarios