archal 0.9.10 → 0.9.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "archal",
- "version": "0.9.10",
+ "version": "0.9.11",
  "description": "Test your agents & integrations against digital twins",
  "type": "module",
  "main": "dist/index.cjs",
@@ -41,6 +41,7 @@
  "files": [
  "bin",
  "dist",
+ "skills",
  "twin-assets",
  "LICENSE"
  ],
@@ -0,0 +1,55 @@
+ ---
+ name: audit
+ description: Audit an Archal repository thoroughly. Trace real execution paths, identify concrete bugs and design flaws, distinguish root-cause fixes from architecture problems, and add regression tests for every confirmed issue.
+ user-invocable: true
+ argument-hint: "[repo path or scope]"
+ ---
+
+ # Archal Repository Audit
+
+ Use this skill to inspect an Archal repository deeply, find problems worth fixing, and avoid shallow or local-only patches.
+
+ ## Audit standard
+
+ - Trace real execution paths from entrypoints before proposing fixes.
+ - Prefer root-cause fixes over guards, silencing, or narrow special cases.
+ - If the real problem is architectural, report it instead of applying a monkey patch.
+ - For every confirmed bug you fix, add the narrowest regression test that would have caught it earlier.
+ - When the touched path has compatibility logic, always include at least one regression test that covers a stale-data or pre-migration row.
+
+ ## Working pattern
+
+ 1. Map the hot paths first.
+    - Identify the actual entrypoints: CLI commands, web routes, background jobs, and core runtime/session flows.
+    - Ignore dead-looking surfaces until the primary paths are understood.
+ 2. Read the execution path end to end.
+    - Follow inputs through parsing, validation, persistence, normalization, and response shaping.
+    - Inspect nearby invariants and adjacent edge cases before deciding on a fix.
+ 3. Separate findings into two buckets.
+    - **Fix now**: clear bug, contained scope, root cause understood, regression test is obvious.
+    - **Escalate**: the defect comes from a bad abstraction or architectural boundary and a local patch would hide the real problem.
+ 4. Validate narrowly, then broadly.
+    - Run the smallest meaningful tests for the changed path first.
+    - If code changed, also run the relevant package build/typecheck before concluding.
+
+ ## What to look for
+
+ - Compatibility shims that silently drop data from old rows or partially migrated schemas
+ - Session lifecycle bugs around start, ready, teardown, stale state, and idempotency
+ - Projection code that derives canonical state from stale denormalized fields
+ - Fallback behavior that changes semantics instead of preserving them
+ - Query builders that filter on derived fields inconsistently across list/count paths
+ - Evidence, trace, or normalization code that double-counts, hides, or misattributes records
+
+ ## Output format
+
+ For each finding, report:
+
+ - Problem
+ - Technical cause
+ - Simple explanation
+ - Optimal fix
+ - Why that fix is better than narrower alternatives
+ - Regression test to add
+
+ If no actionable problems are found in a slice, say that explicitly and note any remaining coverage gaps.
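As an illustration of the stale-data regression test the `audit` skill asks for, here is a minimal sketch under stated assumptions: the row shape (a legacy `twin` column superseded by a `services` array) and the `normalizeSessionRow` shim are hypothetical and not taken from Archal's code. The checks push a pre-migration row and a partially migrated row through the shim and fail if either loses data, which is the "compatibility shims that silently drop data" case from the checklist.

```typescript
// Hypothetical row shapes for illustration; not Archal's real schema.
// Pre-migration rows stored a single twin; migrated rows store a list.
interface StoredSessionRow {
  id: string;
  twin?: string; // legacy column
  services?: string[]; // current column
}

interface SessionRow {
  id: string;
  services: string[];
}

// Compatibility shim: prefer the migrated column, but fall back to the
// legacy column instead of silently returning an empty list.
function normalizeSessionRow(row: StoredSessionRow): SessionRow {
  if (row.services && row.services.length > 0) {
    return { id: row.id, services: [...row.services] };
  }
  return { id: row.id, services: row.twin ? [row.twin] : [] };
}

// Regression check with a pre-migration row: the twin must survive.
const preMigration = normalizeSessionRow({ id: "s1", twin: "github" });
if (preMigration.services.join(",") !== "github") {
  throw new Error("pre-migration row lost its twin");
}

// Partially migrated row: the new column exists but is still empty.
const partial = normalizeSessionRow({ id: "s2", twin: "slack", services: [] });
if (partial.services.join(",") !== "slack") {
  throw new Error("partially migrated row lost its twin");
}
```

In a real repository these checks would live in the project's test suite (for example as Vitest test cases) rather than at module top level.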
@@ -0,0 +1,93 @@
+ ---
+ name: onboard
+ description: Set up Archal in a project from scratch. Detects dependencies, installs the CLI, handles auth, then routes to the right sub-skill (scenario, vitest, etc.) for the workflow the user wants. Use when the user asks to "set up archal", "initialize archal", "get started with archal", or "add archal to this repo".
+ user-invocable: true
+ ---
+
+ # Archal Onboard
+
+ You are setting up Archal in this project. Archal tests AI agents against digital twins of real services (GitHub, Slack, Stripe, etc.). Handle installation and auth yourself; delegate the workflow-specific setup to the matching sub-skill.
+
+ ## Discover first
+
+ Before asking anything, read the repo:
+
+ 1. `package.json` deps → infer likely twins:
+    - `@octokit/rest`, `octokit` → `github`
+    - `stripe` → `stripe`
+    - `@slack/web-api`, `@slack/bolt` → `slack`
+    - `@linear/sdk` → `linear`
+    - `@supabase/supabase-js` → `supabase`
+    - `googleapis`, `@google-cloud/*` → `google-workspace`
+    - `jira-client`, `jira.js` → `jira`
+ 2. Existing vitest config? Existing scenarios? Existing `.archal.json`? Those change which workflow makes sense.
+ 3. If no `package.json` or no matching deps: ask "Which services does your agent interact with?" and show the full list: `github`, `slack`, `stripe`, `linear`, `jira`, `supabase`, `google-workspace`, `ramp`.
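+
+ ## Install + auth
+
+ ```bash
+ npx archal --version   # check if installed
+ npm install -D archal  # install if not (or -g for global)
+ npx archal usage       # check auth
+ npx archal login       # OAuth browser flow, or: npx archal login --token <token>
+ ```
+
+ With a local (`-D`) install, `node_modules/.bin` is not on the `PATH`, so run the bare `archal ...` commands in this skill through `npx archal` or an npm script.
+
+ In CI, set the `ARCHAL_TOKEN` environment variable instead of running `archal login`.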
+
+ ## Pick a workflow
+
+ Confirm detected twins, then ask which of these the user wants. Each delegates to a sub-skill where appropriate — don't inline those flows.
+
+ ### Option A — Test an agent with scenarios
+
+ Write markdown scenario files that describe setup, prompt, and success criteria; `archal run` executes them against twins.
+
+ 1. Create `.archal.json`:
+    ```json
+    { "agent": "<agent command>", "twins": ["<detected twins>"] }
+    ```
+ 2. **Delegate to the `scenario` skill** to author a starter scenario. Don't paste a canned example here — the skill knows the markdown format and success-criteria syntax.
+ 3. Run: `archal run scenarios/<first>.md`.
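+
+ ### Option B — Run quick inline tasks
+
+ 1. `.archal.json` with just twins:
+    ```json
+    { "twins": ["<detected twins>"] }
+    ```
+ 2. Demo: `archal run --task "Create an issue titled hello" --harness <path> --twin github`.
+
+ `--task` replaces only the scenario file; it still needs a runnable agent path (`--harness <path>`, an `agent` command in `.archal.json`, or a top-level `harness.{ts,js,mjs,cjs}` the CLI can discover). No sub-skill needed — this is a one-shot.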
+
+ ### Option C — Twins in a Vitest suite
+
+ **Delegate to the `vitest` skill.** It handles reading the existing vitest config, identifying which tests should route, picking the right composition pattern, and seeding the twins.
+
+ Do not paste a sample config here. The right shape depends on what's already in the repo.
+
+ ### Option D — Persistent twins to develop against
+
+ Run: `archal twin start <detected twins>` — gives live twin URLs the user's SDK clients can point at.
+
+ ## Verify
+
+ Run the first test or task and show the result.
+
+ ## `.archal.json` schema
+
+ | Field | Type | Required | Default | Description |
+ |-------|------|----------|---------|-------------|
+ | `agent` | string or `{ command, args }` | yes (for scenarios) | | Shell command to run the agent |
+ | `title` | string | no | | Display name for reports |
+ | `twins` | string[] | no | inferred | Which twins to provision |
+ | `scenarios` | string[] | no | | Scenario file paths relative to config |
+ | `seeds` | `Record<string, string>` | no | | Per-twin seed overrides |
+ | `agentModel` | string | no | | LLM model the agent uses |
+ | `model` | string | no | `gemini-2.5-pro` | Evaluator model |
+ | `runs` | number | no | `1` | Runs per scenario |
+ | `timeout` | number | no | `180` | Timeout per run in seconds |
+
+ ## Docs
+
+ - Quickstart: https://docs.archal.ai/quickstart
+ - Full docs: https://docs.archal.ai
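The dependency-to-twin mapping in the `onboard` skill's "Discover first" step can be sketched as a small lookup over `package.json`. This is an illustration under assumptions: `inferTwins` is a hypothetical helper, it only reads `dependencies` and `devDependencies`, and it treats `@google-cloud/*` as a prefix match; whatever detection Archal itself performs may differ.

```typescript
// Hypothetical helper mirroring the mapping in "Discover first".
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Exact package names that imply a twin.
const EXACT_DEPS = new Map<string, string>([
  ["@octokit/rest", "github"],
  ["octokit", "github"],
  ["stripe", "stripe"],
  ["@slack/web-api", "slack"],
  ["@slack/bolt", "slack"],
  ["@linear/sdk", "linear"],
  ["@supabase/supabase-js", "supabase"],
  ["googleapis", "google-workspace"],
  ["jira-client", "jira"],
  ["jira.js", "jira"],
]);

// Scope prefixes: any `@google-cloud/*` package implies google-workspace.
const PREFIX_DEPS: Array<[string, string]> = [["@google-cloud/", "google-workspace"]];

function inferTwins(pkg: PackageJson): string[] {
  const names = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  const twins = new Set<string>();
  for (const name of names) {
    const exact = EXACT_DEPS.get(name);
    if (exact !== undefined) twins.add(exact);
    for (const [prefix, twin] of PREFIX_DEPS) {
      if (name.startsWith(prefix)) twins.add(twin);
    }
  }
  // Sorted so the suggestion is stable regardless of dependency order.
  return Array.from(twins).sort();
}
```

An empty result corresponds to step 3: ask the user which services the agent talks to.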
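Read as code, the `.archal.json` schema table amounts to a type plus default resolution. The sketch below is illustrative only: `ArchalConfig`, `resolveConfig`, the `args?: string[]` type, and the positivity check (borrowed from the `scenario` skill's validation rules) are assumptions. The field names, the defaults (`gemini-2.5-pro`, `1` run, `180` seconds), and the rule that `agent` is required for scenario runs come from the table.

```typescript
// Field names and defaults follow the `.archal.json` schema table;
// the types and this resolver are an illustrative sketch only.
// `args?: string[]` is an assumed shape for the object form of `agent`.
type AgentSpec = string | { command: string; args?: string[] };

interface ArchalConfig {
  agent?: AgentSpec;
  title?: string;
  twins?: string[];
  scenarios?: string[];
  seeds?: Record<string, string>;
  agentModel?: string;
  model?: string;
  runs?: number;
  timeout?: number;
}

interface ResolvedConfig extends ArchalConfig {
  model: string;
  runs: number;
  timeout: number;
}

function resolveConfig(raw: ArchalConfig, forScenarios: boolean): ResolvedConfig {
  if (forScenarios && raw.agent === undefined) {
    throw new Error("`agent` is required when running scenarios");
  }
  const resolved: ResolvedConfig = {
    ...raw,
    model: raw.model ?? "gemini-2.5-pro", // evaluator model
    runs: raw.runs ?? 1, // runs per scenario
    timeout: raw.timeout ?? 180, // seconds per run
  };
  // Assumed to mirror the scenario rule "positive timeout and runs values".
  if (resolved.runs <= 0 || resolved.timeout <= 0) {
    throw new Error("`runs` and `timeout` must be positive");
  }
  return resolved;
}
```

Twin inference is deliberately left out: the table only says `twins` defaults to "inferred".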
@@ -0,0 +1,146 @@
+ ---
+ name: scenario
+ description: Write, edit, and validate Archal scenario files. Knows the markdown format, success criteria syntax, and config options.
+ user-invocable: true
+ argument-hint: "[scenario description or file path]"
+ ---
+
+ # Archal Scenario Writer
+
+ You write and edit Archal scenario files. Scenarios are markdown files that define a test for an AI agent running against digital twins.
+
+ ## Scenario format
+
+ ```markdown
+ # Scenario Title
+
+ ## Setup
+ Starting state described in plain English. Drives seed generation.
+
+ ## Prompt
+ The task instruction given to the agent.
+
+ ## Expected Behavior
+ Answer key for the evaluator. Never shown to the agent.
+
+ ## Success Criteria
+ - [D] Deterministic criterion checked against twin state
+ - [P] Probabilistic criterion judged by LLM
+
+ ## Config
+ twins: github
+ timeout: 90
+ runs: 3
+ ```
+
+ ## Sections
+
+ | Section | Required | Aliases | Purpose |
+ |---------|----------|---------|---------|
+ | `# Title` | yes | | Scenario name (H1 heading) |
+ | `## Setup` | no | `Context`, `Initial State` | Starting state in plain English |
+ | `## Prompt` | yes | `Task`, `Instruction`, `Instructions`, `Request` | Task given to the agent |
+ | `## Expected Behavior` | no | `Expected Behaviour`, `Behavior`, `Behaviour`, `Judge Notes`, `Evaluation Notes` | Answer key for evaluator (never shown to agent) |
+ | `## Success Criteria` | yes | `Success`, `Criteria`, `Checks`, `Assertions` | Evaluable checks |
+ | `## Config` | no | | Runtime settings |
+ | `## Seed State` | no | | Explicit seed data |
+
+ ## Success criteria syntax
+
+ Each criterion is a bullet point. Tag with `[D]` or `[P]`:
+
+ - `[D]` = **Deterministic**. Checked against twin state programmatically. Use for counts, existence checks, state assertions. No LLM cost.
+ - `[P]` = **Probabilistic**. Judged by LLM evaluator from the trace and final state. Use for tone, quality, correctness, reasoning.
+
+ If no tag is provided, Archal infers the type:
+ - Numeric/state patterns (`exactly N`, `at least N`, `is created/closed/merged`, `no errors`, `count is/equals`) are auto-tagged `[D]`
+ - Everything else defaults to `[P]`
+
+ ### Writing good criteria
+
+ **Good `[D]` criteria:**
+ - `[D] Exactly 4 issues are closed`
+ - `[D] A pull request exists with title containing "fix"`
+ - `[D] No issues have the label "wontfix"`
+ - `[D] The Slack channel #incidents has at least 1 new message`
+
+ **Good `[P]` criteria:**
+ - `[P] Each closing comment explains the inactivity period`
+ - `[P] The PR description summarizes all changes accurately`
+ - `[P] The agent does not modify any unrelated issues`
+
+ **Bad criteria (avoid):**
+ - `The agent works correctly` (too vague)
+ - `[D] The response is good` (not deterministic)
+ - `[P] Exactly 3 items exist` (should be `[D]`)
+
+ ## Config keys
+
+ | Key | Type | Default | Description |
+ |-----|------|---------|-------------|
+ | `twins` | comma-separated | inferred from content | Which twins to use |
+ | `seed` | string | | Named seed to load |
+ | `timeout` | integer | `180` | Seconds per run |
+ | `runs` | integer | `1` | Number of runs |
+ | `evaluator-model` | string | `gemini-2.5-pro` | LLM for `[P]` criteria |
+ | `tags` | comma-separated | | Scenario tags |
+
+ Aliases for `evaluator-model`: `evaluator`, `evaluatormodel`, `model`.
+
+ ## Available twins and general-purpose seeds
+
+ | Twin | Seeds |
+ |------|-------|
+ | `github` | `empty`, `small-project`, `enterprise-repo`, `ci-cd-pipeline`, `stale-issues`, `large-backlog` |
+ | `slack` | `empty`, `engineering-team`, `busy-workspace`, `incident-active` |
+ | `stripe` | `empty`, `small-business`, `checkout-flow`, `subscription-lifecycle`, `subscription-heavy` |
+ | `jira` | `empty`, `small-project`, `enterprise`, `sprint-active`, `large-backlog` |
+ | `linear` | `empty`, `small-team`, `engineering-org`, `multi-team`, `busy-backlog` |
+ | `supabase` | `empty`, `small-project`, `saas-starter`, `ecommerce` |
+ | `google-workspace` | `empty`, `assistant-baseline`, `gmail-busy-inbox`, `calendar-packed-week` |
+ | `ramp` | `empty`, `default` |
+
+ ## Twin auto-detection from content
+
+ If no `twins:` config is set, Archal infers twins from keywords in Setup, Expected Behavior, and Prompt:
+
+ - `github`, `repository`, `pull request`, `create_issue` -> `github`
+ - `slack`, `slack channel`, `send_message` -> `slack`
+ - `linear`, `linear ticket` -> `linear`
+ - `jira`, `jira sprint` -> `jira`
+ - `stripe`, `payment`, `refund`, `subscription`, `invoice` -> `stripe`
+ - `supabase`, `database`, `sql query` -> `supabase`
+ - `google workspace`, `gmail`, `calendar event`, `inbox` -> `google-workspace`
+
+ ## Multi-service scenarios
+
+ Use multiple twins by listing them in config:
+
+ ```markdown
+ ## Config
+ twins: github, slack
+ ```
+
+ The Setup section can describe state across both services. Each twin gets its own seed.
+
+ ## Validation
+
+ Run `archal scenario list` to verify scenarios parse correctly. A valid scenario must have:
+ - A title (H1 heading)
+ - A Prompt section
+ - At least one success criterion
+ - At least one referenced twin (explicit or inferred)
+ - Positive timeout and runs values
+
+ ## Common mistakes to avoid
+
+ 1. Writing `[D]` criteria that require subjective judgment
+ 2. Writing `[P]` criteria that could be checked deterministically
+ 3. Forgetting to specify which twin the scenario uses
+ 4. Writing Setup descriptions that are too vague for seed generation
+ 5. Using seed names that don't exist (check the seed table above)
+
+ ## Documentation
+
+ - Writing scenarios: https://docs.archal.ai/guides/writing-scenarios
+ - Twins and seeds: https://docs.archal.ai/twins/overview
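The `scenario` skill's tag-inference rule can be approximated with a handful of regular expressions. This is a sketch of the documented behavior, not Archal's parser: `classifyCriterion` is hypothetical, and the exact patterns are assumptions derived from the examples listed under "Success criteria syntax".

```typescript
type CriterionKind = "D" | "P";

interface Criterion {
  kind: CriterionKind;
  text: string;
  inferred: boolean; // true when no [D]/[P] tag was written
}

// Approximations of the patterns documented as auto-tagged [D].
const DETERMINISTIC_PATTERNS: RegExp[] = [
  /\bexactly \d+\b/i,
  /\bat least \d+\b/i,
  /\bis (created|closed|merged)\b/i,
  /\bno errors\b/i,
  /\bcount (is|equals)\b/i,
];

function classifyCriterion(bullet: string): Criterion {
  // Strip the leading "- " bullet marker if present.
  const line = bullet.replace(/^\s*-\s*/, "").trim();
  const tagged = /^\[(D|P)\]\s*(.*)$/.exec(line);
  if (tagged) {
    // An explicit tag always wins over inference.
    return { kind: tagged[1] as CriterionKind, text: tagged[2] ?? "", inferred: false };
  }
  const deterministic = DETERMINISTIC_PATTERNS.some((re) => re.test(line));
  return { kind: deterministic ? "D" : "P", text: line, inferred: true };
}
```

Because an explicit tag is honored, `[P] Exactly 3 items exist` stays probabilistic until someone retags it by hand, which is why the skill lists it as a bad criterion.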
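Twin auto-detection can likewise be sketched as a keyword scan over the Setup, Expected Behavior, and Prompt text. `inferTwinsFromText` is hypothetical, and the plain case-insensitive substring match is an assumption; the real detector may tokenize or weight keywords differently. A naive substring match like this one can over-trigger (for example, `linear` inside "nonlinear").

```typescript
// Keyword lists copied from "Twin auto-detection from content".
const TWIN_KEYWORDS: Record<string, string[]> = {
  github: ["github", "repository", "pull request", "create_issue"],
  slack: ["slack", "slack channel", "send_message"],
  linear: ["linear", "linear ticket"],
  jira: ["jira", "jira sprint"],
  stripe: ["stripe", "payment", "refund", "subscription", "invoice"],
  supabase: ["supabase", "database", "sql query"],
  "google-workspace": ["google workspace", "gmail", "calendar event", "inbox"],
};

interface ScenarioText {
  setup?: string;
  expectedBehavior?: string;
  prompt: string;
}

// Hypothetical detector: returns twins in the order of TWIN_KEYWORDS.
function inferTwinsFromText(s: ScenarioText): string[] {
  const haystack = [s.setup, s.expectedBehavior, s.prompt]
    .filter((part): part is string => typeof part === "string")
    .join("\n")
    .toLowerCase();
  return Object.entries(TWIN_KEYWORDS)
    .filter(([, keywords]) => keywords.some((k) => haystack.includes(k)))
    .map(([twin]) => twin);
}
```

An explicit `twins:` line in `## Config` sidesteps these heuristics entirely.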
@@ -0,0 +1,109 @@
+ ---
+ name: test
+ description: Run Archal scenarios or inline tasks against hosted twins, diagnose failed runs, and interpret satisfaction scores. Triggers on "run my scenario", "test my agent", "archal run X", "debug this failing run", "what does this satisfaction score mean".
+ user-invocable: true
+ argument-hint: "[scenario.md or task description]"
+ ---
+
+ # Archal Test Runner
+
+ You run Archal scenarios and inline tasks, then help the user interpret the results. For setting up the agent path or `.archal.json` in a fresh repo, hand off to the `onboard` skill.
+
+ ## What only you know (product mental model)
+
+ - `archal run` spawns the user's agent as a child process. The agent needs:
+   - A **runnable agent path**. Supply it explicitly with `--harness <path>` (e.g. `./.archal/harness.ts`) or with an `agent` command in `.archal.json`. Repo-local auto-discovery also walks up from cwd for a top-level `harness.{ts,js,mjs,cjs}`.
+   - A **headless boundary** — no UI, no browser OAuth. The process is spawned without a shell, so interactive auth hangs forever.
+   - Env vars — auto-injected. `ARCHAL_ENGINE_TASK` is the prompt; `ARCHAL_<TWIN>_BASE_URL` / `ARCHAL_<TWIN>_URL` point at twins; `ARCHAL_PREFLIGHT=1` is set during boot check (harness should exit early).
+ - Every `archal run` writes local artifacts to `.archal/cache/last-run.json` and `.archal/cache/runs/*.json` **regardless** of `--output`. `--output json` is only for machine-readable stdout; it's not needed for local persistence.
+ - **Satisfaction score** = (runs passing all criteria) / (total runs). `[D]` criteria are deterministic state checks; `[P]` criteria are LLM-judged from trace + final state.
+
+ ## Preflight the harness before a run
+
+ When the agent path is uncertain, or after any change to the harness file, smoke-test the harness directly before `archal run`:
+
+ ```bash
+ ARCHAL_PREFLIGHT=1 ARCHAL_ENGINE_TASK="Reply with OK and do not use tools." npx tsx ./.archal/harness.ts
+ ```
+
+ A harness that exits cleanly with no tool calls is ready. This catches a missing runnable entrypoint, UI-boot assumptions, missing provider keys, and service-bridge misconfiguration. A failure here is much easier to diagnose than a silent timeout inside `archal run`.
+
+ ## Running
+
+ Scenario from a file:
+
+ ```bash
+ archal run scenario.md
+ archal run scenario.md --runs 5 --seed enterprise-repo # N runs → satisfaction score
+ ```
+
+ Inline task (no scenario file):
+
+ ```bash
+ archal run --task "Create an issue titled hello" --harness ./.archal/harness.ts --twin github
+ ```
+
+ `--task` only replaces the scenario file — it still needs a runnable agent path. `--twin` is required with `--task`; repeat or comma-separate for multiple twins.
+
+ When `.archal.json` exists in cwd, bare `archal run` uses it. If the user doesn't have one yet, that's setup — hand off to the `onboard` skill, which owns harness creation and `.archal.json` scaffolding.
+
+ ## Interpret results
+
+ Score breakdown:
+ - `100%` = every run passed every criterion
+ - `80%` = 4 of 5 runs passed (with `--runs 5`)
+ - `0%` = none passed
+
+ Criterion types:
+ - `[D]` — deterministic state check. A failure is real: the check reads actual twin state, so it is never evaluator variance.
+ - `[P]` — LLM judge reads trace + final state. A single failure can be variance; re-run with `--runs 3` or more to confirm before acting on it.
+
+ ## Diagnose failures
+
+ Re-run with `-v` for the full trace, then classify with these signals:
+
+ - **Agent bug** — wrong tool called, wrong arguments, stopped early.
+   *Signals:* trace shows the correct tool was available but the agent chose another; or arguments are malformed.
+   *Fix:* agent prompt, tool wiring, or underlying model.
+
+ - **Scenario bug** — criteria are too strict, ambiguous, or contradict the Setup.
+   *Signals:* agent clearly did the right thing but a `[D]` criterion expects an exact count the Setup didn't guarantee; or two criteria contradict each other.
+   *Fix:* make Setup more specific, or relax the criterion. Use the `scenario` skill.
+
+ - **Seed mismatch** — twin state doesn't match what Setup describes.
+   *Signals:* agent's first introspection tool call returns unexpected state (e.g. Setup says "4 stale issues" but the seed has 3).
+   *Fix:* pick a different seed, or adjust Setup to match. Browse seeds with `archal seed list <twin>`.
+
+ - **Harness bug** — agent process never started, crashed immediately, or hung.
+   *Signals:* no tool calls in the trace, stderr shows a boot error, or the run times out at the configured `--timeout`.
+   *Fix:* smoke-test the harness directly with `ARCHAL_PREFLIGHT=1 ARCHAL_ENGINE_TASK="Reply with OK." npx tsx ./.archal/harness.ts`, then look for UI-only imports, missing provider keys, or interactive auth.
+
+ ## CI mode
+
+ ```bash
+ archal run scenario.md --runs 3 --pass-threshold 80 -o json -q
+ ```
+
+ Exit codes: `0` pass, `1` fail or score < threshold, `2` validation error. For GitHub Actions, store the token as a repository secret and expose it to the job as the `ARCHAL_TOKEN` environment variable.
+
+ ## Artifacts + dashboard
+
+ - **Local (always written):** `.archal/cache/last-run.json` (summary), `.archal/cache/runs/*.json` (full redacted trace).
+ - **Hosted:** every run also uploads to https://www.archal.ai/dashboard — useful for sharing a failing trace with a colleague or comparing across agent model versions.
+
+ Don't tell users they need `-o json` to save artifacts locally — that's only for stdout.
+
+ ## Anti-patterns
+
+ - Don't re-document the `archal run` flag list here. `archal run --help` and https://docs.archal.ai/cli/run own that — they'll drift if duplicated.
+ - Don't guess the agent path. If the user doesn't have `--harness`, a repo-local harness, or `.archal.json`, hand off to `onboard` — it owns setup.
+ - Don't promote `--proxy` as the default. It's for agents that still call real service domains through raw HTTPS clients. Env-var wiring is the primary path; proxy is a fallback.
+ - Don't classify a single `[P]` failure as an agent bug without re-running. Probabilistic criteria need sample size.
+ - Don't treat a `[D]` failure as evaluator variance. Deterministic failures are real; classify them as an agent, scenario, seed, or harness bug.
+
+ ## Docs
+
+ - Running with an agent: https://docs.archal.ai/guides/run-with-agent
+ - Existing repo playbook: https://docs.archal.ai/guides/existing-agent-repo
+ - Scenario authoring: hand off to the `scenario` skill
+ - Twin sessions: https://docs.archal.ai/guides/twin-sessions
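The `test` skill's satisfaction score and CI exit-code rule can be sketched as follows. `RunResult`, `satisfaction`, and `ciExitCode` are hypothetical names; the formula (runs passing every criterion divided by total runs) and the exit codes (`0` pass, `1` fail or below threshold, `2` validation error) come from the skill. Treating `--pass-threshold` as a percentage compared with `>=`, and defaulting it to 100 when omitted, are assumptions.

```typescript
interface CriterionResult {
  kind: "D" | "P";
  passed: boolean;
}

interface RunResult {
  criteria: CriterionResult[];
}

// Satisfaction score as a percentage: runs that passed *every* criterion
// divided by total runs. Multiplying before dividing keeps 4/5 at exactly 80.
function satisfaction(runs: RunResult[]): number {
  if (runs.length === 0) return 0;
  const passing = runs.filter((run) => run.criteria.every((c) => c.passed)).length;
  return (passing * 100) / runs.length;
}

// Exit-code rule from "CI mode". The default threshold of 100 (every run
// must pass) when no threshold is given is an assumption.
function ciExitCode(runs: RunResult[], passThreshold = 100, validationFailed = false): 0 | 1 | 2 {
  if (validationFailed) return 2;
  return satisfaction(runs) >= passThreshold ? 0 : 1;
}
```

With five runs where one run fails a single `[P]` criterion, the score is 80, so `--pass-threshold 80` passes and a threshold of 90 fails.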
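For the environment contract described in the `test` skill's mental model, a harness might gather the injected variables like this. `readArchalEnv` is a hypothetical helper, not part of the `archal` package; the variable names (`ARCHAL_ENGINE_TASK`, `ARCHAL_PREFLIGHT`, `ARCHAL_<TWIN>_BASE_URL` / `ARCHAL_<TWIN>_URL`) are the ones the skill lists. How multi-word twins such as `google-workspace` are spelled in the variable name is not documented there, so the sketch just lowercases whatever sits between `ARCHAL_` and the suffix, and it would also pick up any unrelated `ARCHAL_*_URL` variable.

```typescript
interface ArchalEnv {
  task: string;
  preflight: boolean;
  twinUrls: Record<string, string>; // keyed by lowercased twin segment
}

// Hypothetical helper: collects the variables `archal run` injects.
function readArchalEnv(env: Record<string, string | undefined>): ArchalEnv {
  const twinUrls: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (!value) continue;
    const base = /^ARCHAL_(.+)_BASE_URL$/.exec(key);
    if (base) {
      // `_BASE_URL` wins over a plain `_URL` for the same twin.
      twinUrls[base[1].toLowerCase()] = value;
      continue;
    }
    const plain = /^ARCHAL_(.+)_URL$/.exec(key);
    if (plain) {
      // Only fill the slot if no `_BASE_URL` has claimed it yet.
      twinUrls[plain[1].toLowerCase()] ??= value;
    }
  }
  return {
    task: env.ARCHAL_ENGINE_TASK ?? "",
    preflight: env.ARCHAL_PREFLIGHT === "1",
    twinUrls,
  };
}
```

In a harness entrypoint you would call `readArchalEnv(process.env)` first and exit early when `preflight` is true, before constructing any model client.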
@@ -0,0 +1,133 @@
+ ---
+ name: vitest
+ description: Wire `archal/vitest` into a user's existing Vitest suite so integration tests hit hosted twins instead of real SaaS. Use when the user asks to "add archal to vitest", "wire up vitest with twins", "test against twins in vitest", or when invoked from Option C of the `onboard` skill.
+ user-invocable: true
+ ---
+
+ # Archal Vitest Integration
+
+ Wire `archal/vitest` into the user's existing Vitest suite. Don't paste a canned config — inspect what's already there, surface the right choices, and compose on top of it.
+
+ ## What only you know
+
+ Claude already knows what Vitest is and how a fetch interceptor works. These are the Archal-specific facts that determine your choices:
+
+ - `archal/vitest` is a **subpath export of the `archal` npm package**. Users install it with `pnpm add -D archal`, not `@archal/vitest`.
+ - Route mode installs a setup file that rewrites `fetch()` calls to hosted twins. **Test code stays unchanged** — same SDKs, same URLs.
+ - Twins are hosted on **ECS Fargate** in Archal's AWS. First run = ~30s cold start. Subsequent runs within the 30-min idle TTL = ~2s. Tell the user; they'll think it's hung otherwise.
+ - Session cache key = `(projectName, services, seeds)` hash. Change any of those and the cache misses.
+ - **Seeds = starting state.** Omit to get the twin's default. Named seeds give fixtures (e.g. `small-project` for GitHub, `small-business` for Stripe). Never ask "what seed?" open-ended — the user doesn't know the catalog.
+ - Route-mode twins available: `github`, `slack`, `stripe`, `jira`, `supabase`, `google-workspace`. Not yet: `linear`, `ramp`.
+
+ ## Discover before you ask
+
+ 1. `package.json` deps → infer likely twins (`@octokit/rest` → github, `stripe` → stripe, `@slack/web-api` → slack, `@supabase/supabase-js` → supabase, `googleapis` → google-workspace, `jira.js` → jira).
+ 2. Read any existing `vitest.config.ts` / `vitest.config.js` / `vitest.workspace.ts`. Note `setupFiles`, `include`/`exclude`, `reporters`, `projects`.
+ 3. Grep test files (`__tests__/`, `tests/`, `*.test.ts`) for outbound calls: `fetch(`, `Octokit`, `new Stripe`, `WebClient`, `createClient`. These are the routing candidates.
+ 4. Auth: `archal usage` tells you whether they're logged in. If not, use `archal login` locally or `ARCHAL_TOKEN` in CI.
+
+ ## Ask only what you couldn't infer
+
+ Offer your inferred answer as the default.
+
+ 1. **Scope.** "I found these N test files making outbound HTTP calls: [list]. All of them? Or a specific subset (by folder, glob, or file list)?"
+ 2. **Twin set.** "From deps I see `[github, stripe]`. Complete, or am I missing/over-including?"
+ 3. **Seeds (per twin, with inline catalog).** For each twin, present three choices:
+    > "For `github`: (a) default empty twin, (b) `small-project` seed (one repo, few issues/PRs — good starting point), (c) custom seed name. Which?"
+
+ ## Pick a config pattern
+
+ Three patterns. The right one depends on what you saw in discovery.
+
+ ### Pattern A — wrap existing `vitest.config.ts` with `withArchal` (all tests hit twins)
+
+ For dedicated integration-test packages where every test should route. `withArchal` is a merge helper: it preserves everything in the existing `test` block (`coverage`, `alias`, `globalSetup`, `poolOptions`, custom reporters, etc.) and additively composes Archal's setup file, reporter, and session env on top.
+
+ Edit their existing file in place; the only change is wrapping the existing `test:` value in `withArchal(...)`:
+
+ ```ts
+ import { defineConfig } from 'vitest/config';
+ import { withArchal } from 'archal/vitest';
+
+ export default defineConfig({
+   test: withArchal(
+     {
+       // ...everything they already had, unchanged
+       globals: true,
+       setupFiles: ['./test/my-setup.ts'],
+       coverage: { provider: 'v8' },
+     },
+     {
+       services: {
+         github: { mode: 'route', seed: 'small-project' },
+         stripe: { mode: 'route' },
+       },
+     },
+   ),
+ });
+ ```
+
+ Merge behavior: `setupFiles` and `reporters` are concatenated, `env` is merged (user keys preserved + Archal session keys added), and any other field the user had is passed through untouched.
+
+ If the user is starting from scratch (no existing `test` block), pass `{}` as the first argument: `withArchal({}, { services })`.
+
+ ### Pattern B — workspace with a separate Archal project (subset of tests hit twins)
+
+ Most common shape. Unit tests stay fast; only the routed subset provisions twins.
+
+ ```ts
+ // vitest.workspace.ts
+ import { archalVitestProject } from 'archal/vitest';
+
+ export default [
+   './vitest.config.ts', // their existing unit project, untouched
+   archalVitestProject(
+     {
+       name: 'hosted-twins',
+       services: {
+         github: { mode: 'route', seed: 'small-project' },
+         stripe: { mode: 'route' },
+       },
+     },
+     { include: ['__tests__/hosted/**/*.test.ts'] },
+   ),
+ ];
+ ```
+
+ ### Pattern C — separate config + npm script (strict isolation)
+
+ Create `vitest.integration.config.ts` using Pattern A, plus a `"test:integration": "vitest -c vitest.integration.config.ts"` script. Use this when `pnpm test` must stay unit-only.
+
+ ## Apply → verify
+
+ 1. Install `archal` if missing.
+ 2. Write/edit the config.
+ 3. Ensure auth (`archal login` or `ARCHAL_TOKEN`).
+ 4. Run one routed test: `pnpm vitest run <path>`.
+
+ To confirm routing is live from inside a test:
+ ```ts
+ import { getInstalledArchalVitestSession } from 'archal/vitest';
+ console.log(getInstalledArchalVitestSession()?.resolvedRuntime.resolvedServices);
+ ```
+
+ ## Failure modes
+
+ - **Real API response instead of twin response** — test file isn't in the routed project's `include` glob.
+ - **401/auth at setup** — `ARCHAL_TOKEN` unset or `archal login` not run.
+ - **First run takes 30+ seconds** — ECS cold-start, expected. Warn the user up front.
+ - **Seed state unexpected** — inspect via `getInstalledArchalVitestSession()`; confirm the resolved seed matches intent.
+ - **`resetArchalTwins()` not restoring** — call it in `beforeEach`, not `beforeAll`, so state resets before every test.
+ - **CI credential race** (parallel jobs corrupting `~/.archal/credentials.json`) — export `ARCHAL_TOKEN` directly; don't rely on the credential file.
+
+ ## Anti-patterns
+
+ - Don't route `localhost` or the user's own backend. Route mode is for external SaaS.
+ - Don't set `testIsolation: 'serial'` preemptively. Set it only after you've observed cross-test state leaks.
+ - Don't add route mode to tests that don't make outbound HTTP calls — the interceptor install has overhead.
+ - Don't drive vitest through `.archal.json`. That file is for the CLI `archal run` flow; the vitest integration is self-contained.
+ - Don't paste a canonical config without reading what's already in the repo.
+
+ ## Docs
+
+ - Guide: https://docs.archal.ai/guides/vitest
+ - Package reference: `packages/vitest/README.md`
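The `vitest` skill's session cache key behavior can be illustrated with a stable hash over `(projectName, services, seeds)`. This sketch is not Archal's implementation: `sessionCacheKey` is hypothetical, and SHA-256 over a key-sorted JSON encoding is an assumed choice. It shows why changing the project name, the service set, or any seed yields a different key (a cache miss, so expect another cold start). Sorting services and seed entries so that reordering them does not change the key is also an assumption; whether Archal normalizes order is not documented.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of a session cache key over (projectName, services, seeds).
function sessionCacheKey(
  projectName: string,
  services: string[],
  seeds: Record<string, string | undefined>,
): string {
  // Sort services and seed entries so logically equal inputs hash the same;
  // omitted and undefined seeds are treated identically.
  const normalized = {
    projectName,
    services: [...services].sort(),
    seeds: Object.keys(seeds)
      .sort()
      .filter((twin) => seeds[twin] !== undefined)
      .map((twin) => [twin, seeds[twin]]),
  };
  return createHash("sha256").update(JSON.stringify(normalized)).digest("hex");
}
```

For example, switching the `github` seed from `small-project` to `enterprise-repo` produces a new key, so the next run provisions a fresh session instead of reusing the warm one.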