npm - archal - Versions diffs - 0.9.10 → 0.9.11 - Mend

archal 0.9.10 → 0.9.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/dist/index.cjs +5277 -65
package/package.json +2 -1
package/skills/audit/SKILL.md +55 -0
package/skills/onboard/SKILL.md +93 -0
package/skills/scenario/SKILL.md +146 -0
package/skills/test/SKILL.md +109 -0
package/skills/vitest/SKILL.md +133 -0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "archal",
-  "version": "0.9.10",
+  "version": "0.9.11",
   "description": "Test your agents & integrations against digital twins",
   "type": "module",
   "main": "dist/index.cjs",
@@ -41,6 +41,7 @@
   "files": [
     "bin",
     "dist",
+    "skills",
     "twin-assets",
     "LICENSE"
   ],

package/skills/audit/SKILL.md ADDED

@@ -0,0 +1,55 @@
+---
+name: audit
+description: Audit an Archal repository thoroughly. Trace real execution paths, identify concrete bugs and design flaws, distinguish root-cause fixes from architecture problems, and add regression tests for every confirmed issue.
+user-invocable: true
+argument-hint: "[repo path or scope]"
+---
+# Archal Repository Audit
+Use this skill when the goal is to inspect an Archal repository deeply, find problems worth fixing, and avoid shallow or local-only patches.
+## Audit standard
+- Trace real execution paths from entrypoints before proposing fixes.
+- Prefer root-cause fixes over guards, silencing, or narrow special cases.
+- If the real problem is architectural, report it instead of applying a monkey patch.
+- For every confirmed bug you fix, add the narrowest regression test that would have caught it earlier.
+- Always include at least one regression test that covers a stale-data row or pre-migration row when the touched path has compatibility logic.
+## Working pattern
+1. Map the hot paths first.
+   - Identify the actual entrypoints: CLI commands, web routes, background jobs, and core runtime/session flows.
+   - Ignore dead-looking surfaces until the primary paths are understood.
+2. Read the execution path end to end.
+   - Follow inputs through parsing, validation, persistence, normalization, and response shaping.
+   - Inspect nearby invariants and adjacent edge cases before deciding on a fix.
+3. Separate findings into two buckets.
+   - **Fix now**: clear bug, contained scope, root cause understood, regression test is obvious.
+   - **Escalate**: the defect comes from a bad abstraction or architectural boundary and a local patch would hide the real problem.
+4. Validate narrowly, then broadly.
+   - Run the smallest meaningful tests for the changed path first.
+   - If code changed, also run the relevant package build/typecheck before concluding.
+## What to look for
+- Compatibility shims that silently drop data from old rows or partially migrated schemas
+- Session lifecycle bugs around start, ready, teardown, stale state, and idempotency
+- Projection code that derives canonical state from stale denormalized fields
+- Fallback behavior that changes semantics instead of preserving them
+- Query builders that filter on derived fields inconsistently across list/count paths
+- Evidence, trace, or normalization code that double-counts, hides, or misattributes records
+## Output format
+For each finding, report:
+- Problem
+- Technical cause
+- Simple explanation
+- Optimal fix
+- Why that fix is better than narrower alternatives
+- Regression test to add
+If no actionable problems are found in a slice, say that explicitly and note any remaining coverage gaps.

package/skills/onboard/SKILL.md ADDED

@@ -0,0 +1,93 @@
+---
+name: onboard
+description: Set up Archal in a project from scratch. Detects dependencies, installs the CLI, handles auth, then routes to the right sub-skill (scenarios, vitest, etc.) for the workflow the user wants. Use when the user asks to "set up archal", "initialize archal", "get started with archal", or "add archal to this repo".
+user-invocable: true
+---
+# Archal Onboard
+You are setting up Archal in this project. Archal tests AI agents against digital twins of real services (GitHub, Slack, Stripe, etc.). Handle installation and auth yourself; delegate the workflow-specific setup to the matching sub-skill.
+## Discover first
+Before asking anything, read the repo:
+1. `package.json` deps → infer likely twins:
+   - `@octokit/rest`, `octokit` → `github`
+   - `stripe` → `stripe`
+   - `@slack/web-api`, `@slack/bolt` → `slack`
+   - `@linear/sdk` → `linear`
+   - `@supabase/supabase-js` → `supabase`
+   - `googleapis`, `@google-cloud/*` → `google-workspace`
+   - `jira-client`, `jira.js` → `jira`
+2. Existing vitest config? Existing scenarios? Existing `.archal.json`? Those change which workflow makes sense.
+3. If no `package.json` or no matching deps: ask "Which services does your agent interact with?" and show the full list: `github`, `slack`, `stripe`, `linear`, `jira`, `supabase`, `google-workspace`, `ramp`.
+## Install + auth
+```bash
+npx archal --version        # check if installed
+npm install -D archal       # install if not (or -g for global)
+archal usage                # check auth
+archal login                # OAuth browser flow, or: archal login --token <token>
+```
+In CI, use `ARCHAL_TOKEN` instead of `archal login`.
+## Pick a workflow
+Confirm detected twins, then ask which of these the user wants. Each delegates to a sub-skill where appropriate — don't inline those flows.
+### Option A — Test an agent with scenarios
+Write markdown scenario files that describe setup, prompt, and success criteria; `archal run` executes them against twins.
+1. Create `.archal.json`:
+   ```json
+   { "agent": "<agent command>", "twins": ["<detected twins>"] }
+   ```
+2. **Delegate to the `scenario` skill** to author a starter scenario. Don't paste a canned example here — the skill knows the markdown format and success-criteria syntax.
+3. Run: `archal run scenarios/<first>.md`.
+### Option B — Run quick inline tasks
+1. `.archal.json` with just twins:
+   ```json
+   { "twins": ["<detected twins>"] }
+   ```
+2. Demo: `archal run --task "Create an issue titled hello" --twin github`.
+No sub-skill needed — this is a one-shot.
+### Option C — Twins in a Vitest suite
+**Delegate to the `vitest` skill.** It handles reading the existing vitest config, identifying which tests should route, picking the right composition pattern, and seeding the twins.
+Do not paste a sample config here. The right shape depends on what's already in the repo.
+### Option D — Persistent twins to develop against
+Run: `archal twin start <detected twins>` — gives live twin URLs the user's SDK clients can point at.
+## Verify
+Run the first test or task and show the result.
+## `.archal.json` schema
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `agent` | string or `{ command, args }` | yes (for scenarios) | | Shell command to run the agent |
+| `title` | string | no | | Display name for reports |
+| `twins` | string[] | no | inferred | Which twins to provision |
+| `scenarios` | string[] | no | | Scenario file paths relative to config |
+| `seeds` | `Record<string, string>` | no | | Per-twin seed overrides |
+| `agentModel` | string | no | | LLM model the agent uses |
+| `model` | string | no | `gemini-2.5-pro` | Evaluator model |
+| `runs` | number | no | `1` | Runs per scenario |
+| `timeout` | number | no | `180` | Timeout per run in seconds |
+## Docs
+- Quickstart: https://docs.archal.ai/quickstart
+- Full docs: https://docs.archal.ai
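
Note on the hunk above: the "Discover first" step maps `package.json` dependencies to likely twins. A minimal sketch of that mapping follows, assuming a plain lookup table. `inferTwins`, `DEP_TO_TWIN`, and `PackageJsonLike` are hypothetical names for illustration and are not part of the `archal` package; only the dependency-to-twin pairs come from the skill text.

```typescript
// Hypothetical helper, not part of archal: dependency name -> twin name,
// transcribed from the list in the onboard skill.
const DEP_TO_TWIN = new Map<string, string>([
  ['@octokit/rest', 'github'],
  ['octokit', 'github'],
  ['stripe', 'stripe'],
  ['@slack/web-api', 'slack'],
  ['@slack/bolt', 'slack'],
  ['@linear/sdk', 'linear'],
  ['@supabase/supabase-js', 'supabase'],
  ['googleapis', 'google-workspace'],
  ['jira-client', 'jira'],
  ['jira.js', 'jira'],
]);

interface PackageJsonLike {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the sorted, de-duplicated list of twins suggested by the deps.
function inferTwins(pkg: PackageJsonLike): string[] {
  const names = [
    ...Object.keys(pkg.dependencies ?? {}),
    ...Object.keys(pkg.devDependencies ?? {}),
  ];
  const twins = new Set<string>();
  for (const name of names) {
    // The skill lists `@google-cloud/*` as a wildcard, so match by prefix.
    const twin = name.startsWith('@google-cloud/')
      ? 'google-workspace'
      : DEP_TO_TWIN.get(name);
    if (twin !== undefined) twins.add(twin);
  }
  return Array.from(twins).sort();
}
```

An empty result corresponds to the fallback in step 3 of the skill: ask the user which services the agent interacts with.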

package/skills/scenario/SKILL.md ADDED

@@ -0,0 +1,146 @@
+---
+name: scenario
+description: Write, edit, and validate Archal scenario files. Knows the markdown format, success criteria syntax, and config options.
+user-invocable: true
+argument-hint: "[scenario description or file path]"
+---
+# Archal Scenario Writer
+You write and edit Archal scenario files. Scenarios are markdown files that define a test for an AI agent running against digital twins.
+## Scenario format
+```markdown
+# Scenario Title
+## Setup
+Starting state described in plain English. Drives seed generation.
+## Prompt
+The task instruction given to the agent.
+## Expected Behavior
+Answer key for the evaluator. Never shown to the agent.
+## Success Criteria
+- [D] Deterministic criterion checked against twin state
+- [P] Probabilistic criterion judged by LLM
+## Config
+twins: github
+timeout: 90
+runs: 3
+```
+## Sections
+| Section | Required | Aliases | Purpose |
+|---------|----------|---------|---------|
+| `# Title` | yes | | Scenario name (H1 heading) |
+| `## Setup` | no | `Context`, `Initial State` | Starting state in plain English |
+| `## Prompt` | yes | `Task`, `Instruction`, `Instructions`, `Request` | Task given to the agent |
+| `## Expected Behavior` | no | `Expected Behaviour`, `Behavior`, `Behaviour`, `Judge Notes`, `Evaluation Notes` | Answer key for evaluator (never shown to agent) |
+| `## Success Criteria` | yes | `Success`, `Criteria`, `Checks`, `Assertions` | Evaluable checks |
+| `## Config` | no | | Runtime settings |
+| `## Seed State` | no | | Explicit seed data |
+## Success criteria syntax
+Each criterion is a bullet point. Tag with `[D]` or `[P]`:
+- `[D]` = **Deterministic**. Checked against twin state programmatically. Use for counts, existence checks, state assertions. No LLM cost.
+- `[P]` = **Probabilistic**. Judged by LLM evaluator from the trace and final state. Use for tone, quality, correctness, reasoning.
+If no tag is provided, Archal infers the type:
+- Numeric/state patterns (`exactly N`, `at least N`, `is created/closed/merged`, `no errors`, `count is/equals`) are auto-tagged `[D]`
+- Everything else defaults to `[P]`
+### Writing good criteria
+**Good `[D]` criteria:**
+- `[D] Exactly 4 issues are closed`
+- `[D] A pull request exists with title containing "fix"`
+- `[D] No issues have the label "wontfix"`
+- `[D] The Slack channel #incidents has at least 1 new message`
+**Good `[P]` criteria:**
+- `[P] Each closing comment explains the inactivity period`
+- `[P] The PR description summarizes all changes accurately`
+- `[P] The agent does not modify any unrelated issues`
+**Bad criteria (avoid):**
+- `The agent works correctly` (too vague)
+- `[D] The response is good` (not deterministic)
+- `[P] Exactly 3 items exist` (should be `[D]`)
+## Config keys
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| `twins` | comma-separated | inferred from content | Which twins to use |
+| `seed` | string | | Named seed to load |
+| `timeout` | integer | `180` | Seconds per run |
+| `runs` | integer | `1` | Number of runs |
+| `evaluator-model` | string | `gemini-2.5-pro` | LLM for `[P]` criteria |
+| `tags` | comma-separated | | Scenario tags |
+Aliases for `evaluator-model`: `evaluator`, `evaluatormodel`, `model`.
+## Available twins and general-purpose seeds
+| Twin | Seeds |
+|------|-------|
+| `github` | `empty`, `small-project`, `enterprise-repo`, `ci-cd-pipeline`, `stale-issues`, `large-backlog` |
+| `slack` | `empty`, `engineering-team`, `busy-workspace`, `incident-active` |
+| `stripe` | `empty`, `small-business`, `checkout-flow`, `subscription-lifecycle`, `subscription-heavy` |
+| `jira` | `empty`, `small-project`, `enterprise`, `sprint-active`, `large-backlog` |
+| `linear` | `empty`, `small-team`, `engineering-org`, `multi-team`, `busy-backlog` |
+| `supabase` | `empty`, `small-project`, `saas-starter`, `ecommerce` |
+| `google-workspace` | `empty`, `assistant-baseline`, `gmail-busy-inbox`, `calendar-packed-week` |
+| `ramp` | `empty`, `default` |
+## Twin auto-detection from content
+If no `twins:` config is set, Archal infers twins from keywords in Setup, Expected Behavior, and Prompt:
+- `github`, `repository`, `pull request`, `create_issue` -> `github`
+- `slack`, `slack channel`, `send_message` -> `slack`
+- `linear`, `linear ticket` -> `linear`
+- `jira`, `jira sprint` -> `jira`
+- `stripe`, `payment`, `refund`, `subscription`, `invoice` -> `stripe`
+- `supabase`, `database`, `sql query` -> `supabase`
+- `google workspace`, `gmail`, `calendar event`, `inbox` -> `google-workspace`
+## Multi-service scenarios
+Use multiple twins by listing them in config:
+```markdown
+## Config
+twins: github, slack
+```
+The Setup section can describe state across both services. Each twin gets its own seed.
+## Validation
+Run `archal scenario list` to verify scenarios parse correctly. A valid scenario must have:
+- A title (H1 heading)
+- A Prompt section
+- At least one success criterion
+- At least one referenced twin (explicit or inferred)
+- Positive timeout and runs values
+## Common mistakes to avoid
+1. Writing `[D]` criteria that require subjective judgment
+2. Writing `[P]` criteria that could be checked deterministically
+3. Forgetting to specify which twin the scenario uses
+4. Writing Setup descriptions that are too vague for seed generation
+5. Using seed names that don't exist (check the seed table above)
+## Documentation
+- Writing scenarios: https://docs.archal.ai/guides/writing-scenarios
+- Twins and seeds: https://docs.archal.ai/twins/overview
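
Note on the hunk above: the "Success criteria syntax" section says untagged criteria are inferred as `[D]` when they match numeric or state patterns, and as `[P]` otherwise. The sketch below illustrates that rule. The regular expressions are assumptions derived from the examples in the skill text, and `classifyCriterion` is a hypothetical name; the matcher shipped in `dist/index.cjs` may differ. Treating an explicit tag as taking precedence is also an assumption.

```typescript
type CriterionType = 'D' | 'P';

// Assumed patterns, transcribed from the examples in the skill text.
const DETERMINISTIC_PATTERNS: RegExp[] = [
  /\bexactly \d+\b/i,
  /\bat least \d+\b/i,
  /\bis (created|closed|merged)\b/i,
  /\bno errors\b/i,
  /\bcount (is|equals)\b/i,
];

// Classifies one bullet line from a "## Success Criteria" section.
function classifyCriterion(line: string): { type: CriterionType; text: string } {
  // An explicit [D] or [P] tag is used as written (assumption).
  const tagged = /^\s*(?:-\s*)?\[([DP])\]\s*(.*)$/.exec(line);
  if (tagged) {
    return { type: tagged[1] as CriterionType, text: tagged[2].trim() };
  }
  // No tag: strip the bullet marker and infer the type from the wording.
  const text = line.replace(/^\s*-\s*/, '').trim();
  const deterministic = DETERMINISTIC_PATTERNS.some((p) => p.test(text));
  return { type: deterministic ? 'D' : 'P', text };
}
```

Under these assumptions, `classifyCriterion('- At least 2 comments exist')` is inferred as `D`, while `classifyCriterion('- The PR description is accurate')` falls through to `P`.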

package/skills/test/SKILL.md ADDED

@@ -0,0 +1,109 @@
+---
+name: test
+description: Run Archal scenarios or inline tasks against hosted twins, diagnose failed runs, and interpret satisfaction scores. Triggers on "run my scenario", "test my agent", "archal run X", "debug this failing run", "what does this satisfaction score mean".
+user-invocable: true
+argument-hint: "[scenario.md or task description]"
+---
+# Archal Test Runner
+You run Archal scenarios and inline tasks, then help the user interpret the results. For setting up the agent path or `.archal.json` in a fresh repo, hand off to the `onboard` skill.
+## What only you know (product mental model)
+- `archal run` spawns the user's agent as a child process. The agent needs:
+  - A **runnable agent path**. Two ways to supply it: explicit `--harness <path>` (e.g. `./.archal/harness.ts`), or `.archal.json` with an `agent` command. Repo-local auto-discovery also walks up from cwd for a top-level `harness.{ts,js,mjs,cjs}`.
+  - A **headless boundary** — no UI, no browser OAuth. The process is spawned without a shell, so interactive auth hangs forever.
+  - Env vars — auto-injected. `ARCHAL_ENGINE_TASK` is the prompt; `ARCHAL_<TWIN>_BASE_URL` / `ARCHAL_<TWIN>_URL` point at twins; `ARCHAL_PREFLIGHT=1` is set during boot check (harness should exit early).
+- Every `archal run` writes local artifacts under `.archal/cache/last-run.json` and `.archal/cache/runs/*.json` **regardless** of `--output`. `--output json` is only for machine-readable stdout; it's not needed for local persistence.
+- **Satisfaction score** = (runs passing all criteria) / (total runs). `[D]` criteria are deterministic state checks; `[P]` criteria are LLM-judged from trace + final state.
+## Preflight the harness before a run
+When the agent path is uncertain, or after any change to the harness file, smoke-test the harness directly before `archal run`:
+```bash
+ARCHAL_PREFLIGHT=1 ARCHAL_ENGINE_TASK="Reply with OK and do not use tools." npx tsx ./.archal/harness.ts
+```
+A harness that exits cleanly with no tool calls is ready. Catches: no runnable entrypoint, UI-boot assumptions, missing provider keys, service bridge misconfig. A failure here is much easier to diagnose than a silent timeout inside `archal run`.
+## Running
+Scenario from a file:
+```bash
+archal run scenario.md
+archal run scenario.md --runs 5 --seed enterprise-repo   # N runs → satisfaction score
+```
+Inline task (no scenario file):
+```bash
+archal run --task "Create an issue titled hello" --harness ./.archal/harness.ts --twin github
+```
+`--task` only replaces the scenario file — it still needs a runnable agent path. `--twin` is required with `--task`; repeat or comma-separate for multiple twins.
+When `.archal.json` exists in cwd, bare `archal run` uses it. If the user doesn't have one yet, that's setup — hand off to the `onboard` skill, which owns harness creation and `.archal.json` scaffolding.
+## Interpret results
+Score breakdown:
+- `100%` = every run passed every criterion
+- `80%` = 4/5 runs passed
+- `0%` = none passed
+Criterion types:
+- `[D]` — deterministic state check. A failure is real; never a model variance artifact.
+- `[P]` — LLM judge reads trace + final state. A single failure can be variance; re-run with `--runs 3+` to confirm before acting on it.
+## Diagnose failures
+Re-run with `-v` for the full trace, then classify with these signals:
+- **Agent bug** — wrong tool called, wrong arguments, stopped early.
+  *Signals:* trace shows the correct tool was available but the agent chose another; or arguments are malformed.
+  *Fix:* agent prompt, tool wiring, or underlying model.
+- **Scenario bug** — criteria are too strict, ambiguous, or contradict the Setup.
+  *Signals:* agent clearly did the right thing but a `[D]` criterion expects an exact count the Setup didn't guarantee; or two criteria contradict each other.
+  *Fix:* make Setup more specific, or relax the criterion. Use the `scenario` skill.
+- **Seed mismatch** — twin state doesn't match what Setup describes.
+  *Signals:* agent's first introspection tool call returns unexpected state (e.g. Setup says "4 stale issues" but the seed has 3).
+  *Fix:* different seed, or adjust Setup to match. `archal seed list <twin>` to browse.
+- **Harness bug** — agent process never started, crashed immediately, or hung.
+  *Signals:* no tool calls in the trace, stderr shows a boot error, or the run times out at the configured `--timeout`.
+  *Fix:* smoke-test the harness directly with `ARCHAL_PREFLIGHT=1 ARCHAL_ENGINE_TASK="Reply with OK." npx tsx ./.archal/harness.ts`, then look for UI-only imports, missing provider keys, or interactive auth.
+## CI mode
+```bash
+archal run scenario.md --runs 3 --pass-threshold 80 -o json -q
+```
+Exit codes: `0` pass, `1` fail or score < threshold, `2` validation error. For GitHub Actions, inject `ARCHAL_TOKEN` as a secret.
+## Artifacts + dashboard
+- **Local (always written):** `.archal/cache/last-run.json` (summary), `.archal/cache/runs/*.json` (full redacted trace).
+- **Hosted:** every run also uploads to https://www.archal.ai/dashboard — useful for sharing a failing trace with a colleague or comparing across agent model versions.
+Don't tell users they need `-o json` to save artifacts locally — that's only for stdout.
+## Anti-patterns
+- Don't re-document the `archal run` flag list here. `archal run --help` and https://docs.archal.ai/cli/run own that — they'll drift if duplicated.
+- Don't guess the agent path. If the user doesn't have `--harness`, a repo-local harness, or `.archal.json`, hand off to `onboard` — it owns setup.
+- Don't promote `--proxy` as default. It's for agents that still call real service domains through raw HTTPS clients. Env-var wiring is the primary path; proxy is a fallback.
+- Don't classify a single `[P]` failure as an agent bug without re-running. Probabilistic criteria need sample size.
+- Don't treat a `[D]` failure as model variance. Deterministic failures are real bugs.
+## Docs
+- Running with an agent: https://docs.archal.ai/guides/run-with-agent
+- Existing repo playbook: https://docs.archal.ai/guides/existing-agent-repo
+- Scenario authoring: hand off to the `scenario` skill
+- Twin sessions: https://docs.archal.ai/guides/twin-sessions
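
Note on the hunk above: the skill defines the satisfaction score as (runs passing all criteria) / (total runs), and the CI section says exit code `1` covers a score below `--pass-threshold`. The sketch below works through that arithmetic. `RunResult`, `satisfactionScore`, and `exitCodeFor` are hypothetical names for illustration, not the CLI's internals, and the validation-error exit code `2` is out of scope here.

```typescript
interface RunResult {
  // One entry per criterion: true if that criterion passed in this run.
  criteria: boolean[];
}

// Percentage of runs in which every criterion passed.
function satisfactionScore(runs: RunResult[]): number {
  if (runs.length === 0) throw new Error('at least one run is required');
  const passing = runs.filter((run) => run.criteria.every(Boolean)).length;
  // Multiply before dividing so whole-number percentages stay exact.
  return (passing * 100) / runs.length;
}

// 0 = pass, 1 = score below threshold (per the "Exit codes" line in the skill).
function exitCodeFor(score: number, passThreshold: number): 0 | 1 {
  return score >= passThreshold ? 0 : 1;
}

const runs: RunResult[] = [
  { criteria: [true, true] },
  { criteria: [true, true] },
  { criteria: [true, true] },
  { criteria: [true, true] },
  { criteria: [true, false] }, // one failed criterion fails the whole run
];
const score = satisfactionScore(runs); // 4 of 5 runs passed
```

With `--runs 5 --pass-threshold 80`, four fully passing runs give a score of 80, which meets the threshold in this sketch because the skill describes failure only for a score strictly below it.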

package/skills/vitest/SKILL.md ADDED

@@ -0,0 +1,133 @@
+---
+name: vitest
+description: Wire `archal/vitest` into a user's existing Vitest suite so integration tests hit hosted twins instead of real SaaS. Use when the user asks to "add archal to vitest", "wire up vitest with twins", "test against twins in vitest", or when invoked from `archal-onboard` Option C.
+user-invocable: true
+---
+# Archal Vitest Integration
+Wire `archal/vitest` into the user's existing Vitest suite. Don't paste a canned config — inspect what's already there, surface the right choices, and compose on top of it.
+## What only you know
+Claude already knows what Vitest is and how a fetch interceptor works. These are the Archal-specific facts that determine your choices:
+- `archal/vitest` is a **subpath export of the `archal` npm package**. Users do `pnpm add -D archal`, not `@archal/vitest`.
+- Route mode installs a setup file that rewrites `fetch()` calls to hosted twins. **Test code stays unchanged** — same SDKs, same URLs.
+- Twins are hosted on **ECS Fargate** in Archal's AWS. First run = ~30s cold start. Subsequent runs within the 30-min idle TTL = ~2s. Tell the user; they'll think it's hung otherwise.
+- Session cache key = `(projectName, services, seeds)` hash. Change any of those and the cache misses.
+- **Seeds = starting state.** Omit to get the twin's default. Named seeds give fixtures (e.g. `small-project` for GitHub, `small-business` for Stripe). Never ask "what seed?" open-ended — the user doesn't know the catalog.
+- Route-mode twins available: `github`, `slack`, `stripe`, `jira`, `supabase`, `google-workspace`. Not yet: `linear`, `ramp`.
+## Discover before you ask
+1. `package.json` deps → infer likely twins (`@octokit/rest` → github, `stripe` → stripe, `@slack/web-api` → slack, `@supabase/supabase-js` → supabase, `googleapis` → google-workspace, `jira.js` → jira).
+2. Read any existing `vitest.config.ts` / `vitest.config.js` / `vitest.workspace.ts`. Note `setupFiles`, `include`/`exclude`, `reporters`, `projects`.
+3. Grep test files (`__tests__/`, `tests/`, `*.test.ts`) for outbound calls: `fetch(`, `Octokit`, `new Stripe`, `WebClient`, `createClient`. These are the routing candidates.
+4. Auth: `archal usage` tells you if they're logged in. `archal login` or `ARCHAL_TOKEN` in CI.
+## Ask only what you couldn't infer
+Offer your inferred answer as the default.
+1. **Scope.** "I found these N test files making outbound HTTP calls: [list]. All of them? Or a specific subset (by folder, glob, or file list)?"
+2. **Twin set.** "From deps I see `[github, stripe]`. Complete, or am I missing/over-including?"
+3. **Seeds (per twin, with inline catalog).** For each twin, present three choices:
+   > "For `github`: (a) default empty twin, (b) `small-project` seed (one repo, few issues/PRs — good starting point), (c) custom seed name. Which?"
+## Pick a config pattern
+Three patterns. The right one depends on what you saw in discovery.
+### Pattern A — wrap existing `vitest.config.ts` with `withArchal` (all tests hit twins)
+For dedicated integration-test packages where every test should route. `withArchal` is a merge helper: it preserves everything in the existing `test` block (`coverage`, `alias`, `globalSetup`, `poolOptions`, custom reporters, etc.) and additively composes Archal's setup file, reporter, and session env on top.
+Edit their existing file in place — the change is one line on the `test:` value:
+```ts
+import { defineConfig } from 'vitest/config';
+import { withArchal } from 'archal/vitest';
+export default defineConfig({
+  test: withArchal(
+    {
+      // ...everything they already had, unchanged
+      globals: true,
+      setupFiles: ['./test/my-setup.ts'],
+      coverage: { provider: 'v8' },
+    },
+    {
+      services: {
+        github: { mode: 'route', seed: 'small-project' },
+        stripe: { mode: 'route' },
+      },
+    },
+  ),
+});
+```
+Merge behavior: `setupFiles` and `reporters` are concatenated, `env` is merged (user keys preserved + Archal session keys added), and any other field the user had is passed through untouched.
+If the user is starting from scratch (no existing `test` block), pass `{}` as the first argument: `withArchal({}, { services })`.
+### Pattern B — workspace with a separate Archal project (subset of tests hit twins)
+Most common shape. Unit tests stay fast; only the routed subset provisions twins.
+```ts
+import { archalVitestProject } from 'archal/vitest';
+export default [
+  './vitest.config.ts', // their existing unit project untouched
+  archalVitestProject(
+    {
+      name: 'hosted-twins',
+      services: {
+        github: { mode: 'route', seed: 'small-project' },
+        stripe: { mode: 'route' },
+      },
+    },
+    { include: ['__tests__/hosted/**/*.test.ts'] },
+  ),
+];
+```
+### Pattern C — separate config + npm script (strict isolation)
+`vitest.integration.config.ts` using Pattern A, plus `"test:integration": "vitest -c vitest.integration.config.ts"`. Use when `pnpm test` must stay unit-only.
+## Apply → verify
+1. Install `archal` if missing.
+2. Write/edit the config.
+3. Ensure auth (`archal login` or `ARCHAL_TOKEN`).
+4. Run one routed test: `pnpm vitest run <path>`.
+If confirming routing is live from inside a test:
+```ts
+import { getInstalledArchalVitestSession } from 'archal/vitest';
+console.log(getInstalledArchalVitestSession()?.resolvedRuntime.resolvedServices);
+```
+## Failure modes
+- **Real API response instead of twin response** — test file isn't in the routed project's `include` glob.
+- **401/auth at setup** — `ARCHAL_TOKEN` unset or `archal login` not run.
+- **First run takes 30+ seconds** — ECS cold-start, expected. Warn the user up front.
+- **Seed state unexpected** — inspect via `getInstalledArchalVitestSession()`; confirm resolved seed matches intent.
+- **`resetArchalTwins()` not restoring** — call in `beforeEach`, not `beforeAll`.
+- **CI credential race** (parallel jobs corrupting `~/.archal/credentials.json`) — export `ARCHAL_TOKEN` directly; don't rely on the credential file.
+## Anti-patterns
+- Don't route `localhost` or the user's own backend. Route mode is for external SaaS.
+- Don't set `testIsolation: 'serial'` preemptively. Only when you've observed cross-test state leaks.
+- Don't add route mode to tests that don't make outbound HTTP calls — the interceptor install has overhead.
+- Don't drive vitest through `.archal.json`. That file is for the CLI `archal run` flow; the vitest integration is self-contained.
+- Don't paste a canonical config without reading what's already in the repo.
+## Docs
+- Guide: https://docs.archal.ai/guides/vitest
+- Package reference: `packages/vitest/README.md`
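
Note on the hunk above: the skill states that the session cache key is a hash of `(projectName, services, seeds)` and that changing any of them causes a cache miss. The sketch below only illustrates that idea. The canonical form, the order normalization, and the FNV-1a hash are all assumptions chosen to keep the example dependency-free; the skill text does not say how `archal/vitest` derives the key, and `SessionInputs`, `canonicalize`, `fnv1a`, and `cacheKey` are hypothetical names.

```typescript
interface SessionInputs {
  projectName: string;
  services: string[];
  seeds: Record<string, string>; // twin name -> seed name
}

// Canonical form. Sorting is an assumption: the real implementation may or
// may not treat differently ordered inputs as the same session.
function canonicalize(inputs: SessionInputs): string {
  const services = [...inputs.services].sort();
  const seeds = Object.keys(inputs.seeds)
    .sort()
    .map((twin) => [twin, inputs.seeds[twin]]);
  return JSON.stringify([inputs.projectName, services, seeds]);
}

// FNV-1a (32-bit), used only as a stand-in for whatever hash archal uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('0000000' + hash.toString(16)).slice(-8);
}

function cacheKey(inputs: SessionInputs): string {
  return fnv1a(canonicalize(inputs));
}
```

The practical consequence described in the skill is that renaming the project, adding a service, or switching a seed produces a different key, so the next run pays the cold-start cost again.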