npm - cc-workspace - Versions diffs - 4.3.0 → 4.5.0 - Mend

cc-workspace 4.3.0 → 4.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

package/README.md +125 -7
package/bin/cli.js +39 -7
package/global-skills/agents/e2e-validator.md +149 -0
package/global-skills/agents/implementer.md +65 -46
package/global-skills/agents/team-lead.md +122 -145
package/global-skills/cleanup/SKILL.md +94 -0
package/global-skills/dispatch-feature/SKILL.md +94 -58
package/global-skills/dispatch-feature/references/anti-patterns.md +21 -16
package/global-skills/dispatch-feature/references/spawn-templates.md +95 -148
package/global-skills/doctor/SKILL.md +90 -0
package/global-skills/e2e-validator/references/container-strategies.md +304 -0
package/global-skills/e2e-validator/references/scenario-extraction.md +151 -0
package/global-skills/e2e-validator/references/test-frameworks.md +207 -0
package/global-skills/hooks/session-start-context.sh +38 -7
package/global-skills/session/SKILL.md +79 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -27,7 +27,7 @@ cd ~/projects/my-workspace
 npx cc-workspace init . "My Project"
 ```
-This creates an `orchestrator/` directory and installs 9 skills, 3 agents, 9 hooks, and 3 rules into `~/.claude/`.
+This creates an `orchestrator/` directory and installs 13 skills, 4 agents, 9 hooks, and 2 rules into `~/.claude/`.
 ### Configure (one time)
@@ -47,7 +47,8 @@ The init agent will:
 ```bash
 cd orchestrator/
-claude --agent team-lead
+claude --agent team-lead          # orchestration sessions
+claude --agent e2e-validator      # E2E validation (beta)
 ```
 The team-lead offers 4 modes:
@@ -68,7 +69,7 @@ npx cc-workspace update
 Updates all components if the package version is newer:
 - **Global**: skills, rules, agents in `~/.claude/`
 - **Local** (if `orchestrator/` found): hooks, settings.json, CLAUDE.md, templates, _TEMPLATE.md
-- **Never overwritten**: workspace.md, constitution.md, plans/
+- **Never overwritten**: workspace.md, constitution.md, plans/, e2e/
 ### Diagnostic
@@ -95,6 +96,15 @@ my-workspace/
 │   ├── workspace.md                 <- filled by workspace-init
 │   ├── constitution.md              <- filled by workspace-init
 │   ├── .sessions/                   <- session state (gitignored, created per session)
+│   ├── e2e/                         <- E2E test environment (beta)
+│   │   ├── e2e-config.md            <- agent memory (generated at first boot)
+│   │   ├── docker-compose.e2e.yml   <- generated at first boot
+│   │   ├── tests/                   <- headless API test scripts
+│   │   ├── chrome/
+│   │   │   ├── scenarios/           <- Chrome test flows per plan
+│   │   │   ├── screenshots/         <- evidence
+│   │   │   └── gifs/                <- recorded flows
+│   │   └── reports/                 <- per-plan E2E reports
 │   ├── templates/
 │   │   ├── workspace.template.md
 │   │   ├── constitution.template.md
@@ -198,6 +208,7 @@ parallel in each repo via Agent Teams.
 | **Teammates** | Sonnet 4.6 | Implement in an isolated worktree, test, commit. |
 | **Explorers** | Haiku | Read-only. Scan, verify consistency. |
 | **QA** | Sonnet 4.6 | Hostile mode. Min 3 problems found per service. |
+| **E2E Validator** | Sonnet 4.6 | Containers + Chrome browser testing (beta). |
 ### The 4 session modes
@@ -235,7 +246,7 @@ Protection layers:
 ---
-## The 9 skills
+## The 13 skills
 | Skill | Role | Trigger |
 |-------|------|---------|
@@ -248,19 +259,24 @@ Protection layers:
 | **cycle-retrospective** | Post-cycle learning (Haiku) | "Retro", "retrospective" |
 | **refresh-profiles** | Re-scan repo CLAUDE.md files (Haiku) | "Refresh profiles" |
 | **bootstrap-repo** | Generate a CLAUDE.md (Haiku) | "Bootstrap", "init CLAUDE.md" |
+| **e2e-validator** | E2E validation: containers + Chrome (beta) | `claude --agent e2e-validator` |
+| **session** | List, status, close parallel sessions | `/session`, `/session status X` |
+| **doctor** | Full workspace diagnostic (Haiku) | `/doctor` |
+| **cleanup** | Remove orphan worktrees + stale sessions | `/cleanup` |
 All use `context: fork` — a skill's result is not in context when the
 next one starts. The plan on disk is the source of truth.
 ---
-## The 3 agents
+## The 4 agents
 | Agent | Model | Usage |
 |-------|-------|-------|
 | **team-lead** | Opus 4.6 | `claude --agent team-lead` — multi-service orchestration |
 | **workspace-init** | Sonnet 4.6 | `claude --agent workspace-init` — diagnostic + initial config |
 | **implementer** | Sonnet 4.6 | Task subagent with `isolation: worktree` — isolated implementation |
+| **e2e-validator** | Sonnet 4.6 | `claude --agent e2e-validator` — E2E validation with containers + Chrome (beta) |
 ---
@@ -385,9 +401,14 @@ cc-workspace/
     ├── cycle-retrospective/SKILL.md
     ├── refresh-profiles/SKILL.md
     ├── bootstrap-repo/SKILL.md
+    ├── e2e-validator/
+    │   └── references/
+    │       ├── container-strategies.md
+    │       ├── test-frameworks.md
+    │       └── scenario-extraction.md
     ├── hooks/                         <- 11 scripts (warning-only)
     ├── rules/                         <- 3 rules
-    └── agents/                        <- 3 agents (team-lead, implementer, workspace-init)
+    └── agents/                        <- 4 agents (team-lead, implementer, workspace-init, e2e-validator)
 ```
 ---
@@ -395,7 +416,7 @@ cc-workspace/
 ## Idempotence
 Both `init` and `update` are safe to re-run:
-- **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md` (user content)
+- **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md`, `e2e/` (user content)
 - **Always regenerated**: `settings.json`, `block-orchestrator-writes.sh` (security), `CLAUDE.md`, `_TEMPLATE.md`
 - **Always copied**: hooks, templates
 - **Always regenerated on init**: `service-profiles.md` (fresh scan)
@@ -403,6 +424,103 @@ Both `init` and `update` are safe to re-run:
 ---
+## E2E Validator (beta)
+A dedicated agent that validates completed plans by running services in containers
+and testing scenarios — including Chrome browser-driven UI tests.
+```bash
+cd orchestrator/
+claude --agent e2e-validator
+```
+### First boot — setup
+On first boot (no `e2e/e2e-config.md`), the agent:
+1. Reads `workspace.md` for repos and stacks
+2. Scans repos for existing `docker-compose.yml` and test frameworks
+3. If docker-compose exists: generates an overlay (`docker-compose.e2e.yml`)
+4. If not: builds the config interactively with you
+5. Writes `e2e/e2e-config.md` (its persistent memory)
+### Modes
+| Mode | Description |
+|------|-------------|
+| `validate <plan>` | Test a specific completed plan (API tests) |
+| `validate <plan> --chrome` | Same + Chrome browser UI tests |
+| `run-all` | Run all E2E tests (headless) |
+| `run-all --chrome` | Run all E2E tests + Chrome |
+| `setup` | Re-run first boot setup |
+Add `--fix` to any mode to dispatch teammates for fixing failures.
+### How it works
+1. Creates `/tmp/` worktrees on session branches (from the plan)
+2. Starts services via `docker compose up`
+3. Waits for health checks
+4. Runs existing test suites + generates API scenario tests from the plan
+5. With `--chrome`: drives Chrome via chrome-devtools MCP (navigate, fill forms,
+   click, take screenshots, record GIFs, check network requests and console)
+6. Generates report with evidence (screenshots, GIFs, network traces)
+7. Tears down containers and worktrees
+### Chrome testing
+With `--chrome`, the agent:
+- Navigates the frontend in your real Chrome browser
+- Plays user scenarios extracted from the plan
+- Takes screenshots at each step as evidence
+- Records GIFs of complete flows
+- Checks the 4 mandatory UX states (loading, empty, error, success)
+- Tests responsive layouts (mobile viewport)
+- Verifies network requests match the API contract
+- Checks console for errors
+### Requirements
+- **Docker** (docker compose v2)
+- **Chrome** with chrome-devtools MCP server (for `--chrome` mode)
+- Completed plan (all tasks ✅) with session branches
+---
+## Changelog v4.4.0 -> v4.5.0
+| # | Feature | Detail |
+|---|---------|--------|
+| 1 | **Agent prompt restructuring** | All agents now have a `CRITICAL — Non-negotiable rules` section at the top. Most important rules are front-loaded for better model adherence. Prompts reduced by ~25%. |
+| 2 | **Context tiering** | Spawn templates now use 3 tiers: Tier 1 (always inject), Tier 2 (conditional), Tier 3 (never — already in agent/CLAUDE.md). Reduces implementer context bloat. |
+| 3 | **Spawn template deduplication** | Git workflow instructions removed from spawn templates — the implementer agent already knows them. Only specific values (repo path, session branch) are injected. |
+| 4 | **Rollback protocol** | team-lead can now `git update-ref` to reset a corrupted session branch to the last known good commit, or recreate from source branch. |
+| 5 | **Failed dispatch tracking** | Plan template now includes a "Failed dispatches" section. After 2 retries, commit units are marked `❌ ESCALATED` and the wave stops for user input. |
+| 6 | **Worktree crash recovery** | SessionStart hook now cleans orphan `/tmp/` worktrees left by crashed implementers. Implementer can also reuse an existing worktree from a previous failed attempt. |
+| 7 | **Implementer maxTurns 50→60** | Buffer for complex commit units. Prevents context loss at boundary. |
+| 8 | **3 new slash commands** | `/session` (list, status, close sessions), `/doctor` (full diagnostic), `/cleanup` (orphan worktrees + stale sessions). Replaces `npx cc-workspace` CLI for in-session use. |
+| 9 | **13 skills** | Up from 10. New: session, doctor, cleanup. |
+---
+## Changelog v4.3.0 -> v4.4.0
+| # | Feature | Detail |
+|---|---------|--------|
+| 1 | **E2E Validator agent (beta)** | New `e2e-validator` agent: validates completed plans by running services in containers. Supports headless API tests and Chrome browser-driven UI tests with screenshots and GIF recording. |
+| 2 | **Chrome testing mode** | `--chrome` flag drives the user's Chrome browser via chrome-devtools MCP. Navigates, fills forms, clicks, takes screenshots, records GIFs, checks network and console. |
+| 3 | **E2E directory structure** | `orchestrator/e2e/` created during init/update. Contains docker-compose overlay, test scripts, Chrome scenarios, screenshots, GIFs, and reports. Never overwritten by updates. |
+| 4 | **Container strategies** | Reference docs for overlay and standalone docker-compose patterns per stack (PHP, Node, Python, Go, Vue, React). |
+| 5 | **Scenario extraction** | Reference doc for extracting testable E2E scenarios from completed plans (API endpoints, Chrome flows, UX states). |
+| 6 | **5 modes** | setup, validate, validate --chrome, run-all, run-all --chrome. Optional --fix dispatches teammates. |
+---
+## Changelog v4.2.0 -> v4.3.0
+> Minor improvements and bug fixes.
+---
 ## Changelog v4.1.4 -> v4.2.0
 | # | Feature | Detail |
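
Note on the E2E Validator modes added in this README diff: the mode table combines three base modes (`validate <plan>`, `run-all`, `setup`) with two optional flags (`--chrome`, `--fix`). The agent interprets these itself, so the parser below is only an illustrative sketch of that grammar, not code from the package. The function name `parseMode` and the plan name `checkout-flow` are hypothetical.

```javascript
// Base modes taken from the README mode table.
const MODES = new Set(["validate", "run-all", "setup"]);

// Hypothetical helper: turns a mode line into a structured request.
function parseMode(input) {
  const tokens = input.trim().split(/\s+/).filter(Boolean);
  const flags = tokens.filter((t) => t.startsWith("--"));
  const words = tokens.filter((t) => !t.startsWith("--"));
  const [mode, plan] = words;

  if (!MODES.has(mode)) throw new Error(`unknown mode: ${mode}`);
  if (mode === "validate" && !plan) throw new Error("validate needs a plan name");

  return {
    mode,
    plan: mode === "validate" ? plan : null, // only validate is tied to a plan
    chrome: flags.includes("--chrome"),      // Chrome UI tests are opt-in
    fix: flags.includes("--fix"),            // fixing failures is opt-in
  };
}

const parsed = parseMode("validate checkout-flow --chrome --fix");
const headless = parseMode("run-all");
```

Both flags default to `false`, which matches the README's description of `--chrome` and `--fix` as additions to a base mode.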

package/bin/cli.js CHANGED

@@ -283,6 +283,7 @@ You clarify, plan, delegate, track.
 cd orchestrator/
 claude --agent workspace-init   # first time: diagnostic + config
 claude --agent team-lead         # work sessions
+claude --agent e2e-validator     # E2E validation of completed plans
 \`\`\`
 ## Initialization (workspace-init)
@@ -305,8 +306,10 @@ Run once. Idempotent — can be re-run to re-diagnose.
 - Service profiles: \`./plans/service-profiles.md\`
 - Active plans: \`./plans/*.md\`
 - Active sessions: \`./.sessions/*.json\`
+- E2E config: \`./e2e/e2e-config.md\`
+- E2E reports: \`./e2e/reports/\`
-## Skills (9)
+## Skills (13)
 - **dispatch-feature**: 4 modes, clarify → plan → waves → collect → verify
 - **qa-ruthless**: adversarial QA, min 3 findings per service
 - **cross-service-check**: inter-repo consistency
@@ -316,6 +319,10 @@ Run once. Idempotent — can be re-run to re-diagnose.
 - **cycle-retrospective**: post-cycle learning (haiku)
 - **refresh-profiles**: re-reads repo CLAUDE.md files (haiku)
 - **bootstrap-repo**: generates a CLAUDE.md for a repo (haiku)
+- **e2e-validator**: E2E validation of completed plans (beta) — containers + Chrome
+- **/session**: list, status, close parallel sessions
+- **/doctor**: full workspace diagnostic
+- **/cleanup**: remove orphan worktrees + stale sessions
 ## Rules
 1. No code in repos — delegate to teammates
@@ -333,6 +340,7 @@ Run once. Idempotent — can be re-run to re-diagnose.
 13. Retrospective cycle after each completed feature
 14. Session branches for parallel isolation — teammates use session/{name}, never create own branches
 15. Never \`git checkout -b\` in repos — use \`git branch\` (no checkout) to avoid disrupting parallel sessions
+16. E2E validation via \`claude --agent e2e-validator\` after plans are complete
 `;
 }
@@ -387,6 +395,9 @@ function planTemplateContent() {
 |---------|:-:|:-:|:-:|:-:|
 | | N | 0 | ⏳ | ⏳ |
+## Failed dispatches
+<!-- Commit units that failed 2+ times are recorded here for user review -->
 ## QA
 - ⏳ Cross-service check
 - ⏳ QA ruthless
@@ -476,8 +487,19 @@ function updateLocal() {
     ok(".sessions/ created");
   }
-  // ── NEVER touch: workspace.md, constitution.md, plans/*.md, service-profiles.md ──
-  info(`${c.dim}workspace.md, constitution.md, plans/ — preserved${c.reset}`);
+  // ── e2e/ (create if missing — never overwrite existing) ──
+  const e2eDir = path.join(orchDir, "e2e");
+  if (!fs.existsSync(e2eDir)) {
+    mkdirp(path.join(e2eDir, "tests"));
+    mkdirp(path.join(e2eDir, "chrome", "scenarios"));
+    mkdirp(path.join(e2eDir, "chrome", "screenshots"));
+    mkdirp(path.join(e2eDir, "chrome", "gifs"));
+    mkdirp(path.join(e2eDir, "reports"));
+    ok("e2e/ directory created");
+  }
+  // ── NEVER touch: workspace.md, constitution.md, plans/*.md, e2e/ ──
+  info(`${c.dim}workspace.md, constitution.md, plans/, e2e/ — preserved${c.reset}`);
   return true;
 }
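
Note on the `updateLocal()` hunk above: the scaffold is created only when `e2e/` is missing, which is what lets the README claim that `e2e/` is never overwritten. The sketch below reproduces that guard in isolation. It assumes `mkdirp` behaves like `fs.mkdirSync` with `recursive: true` (the helper's body is not part of this diff), and `ensureE2eScaffold` is a hypothetical name.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Sub-directories listed in the hunk above.
const E2E_SUBDIRS = [
  ["tests"],
  ["chrome", "scenarios"],
  ["chrome", "screenshots"],
  ["chrome", "gifs"],
  ["reports"],
];

// Hypothetical helper: returns true when the scaffold was created,
// false when e2e/ already existed and was left untouched.
function ensureE2eScaffold(orchDir) {
  const e2eDir = path.join(orchDir, "e2e");
  if (fs.existsSync(e2eDir)) return false;
  for (const parts of E2E_SUBDIRS) {
    // Assumption: stands in for the package's mkdirp helper.
    fs.mkdirSync(path.join(e2eDir, ...parts), { recursive: true });
  }
  return true;
}

// Demo in a throwaway directory: a second run must not touch user files.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "cc-workspace-demo-"));
const firstRun = ensureE2eScaffold(demoDir);
const configPath = path.join(demoDir, "e2e", "e2e-config.md");
fs.writeFileSync(configPath, "user content\n");
const secondRun = ensureE2eScaffold(demoDir);
```

One consequence of guarding on the top-level directory is that a partially deleted `e2e/` tree is not repaired on update.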
@@ -493,6 +515,11 @@ function setupWorkspace(workspacePath, projectName) {
   mkdirp(path.join(orchDir, "plans"));
   mkdirp(path.join(orchDir, "templates"));
   mkdirp(path.join(orchDir, ".sessions"));
+  mkdirp(path.join(orchDir, "e2e", "tests"));
+  mkdirp(path.join(orchDir, "e2e", "chrome", "scenarios"));
+  mkdirp(path.join(orchDir, "e2e", "chrome", "screenshots"));
+  mkdirp(path.join(orchDir, "e2e", "chrome", "gifs"));
+  mkdirp(path.join(orchDir, "e2e", "reports"));
   ok("Structure created");
   // ── Templates ──
@@ -575,7 +602,9 @@ function setupWorkspace(workspacePath, projectName) {
     fs.writeFileSync(gi, [
       ".claude/bash-commands.log", ".claude/worktrees/", ".claude/modified-files.log",
       ".sessions/",
-      "plans/*.md", "!plans/_TEMPLATE.md", "!plans/service-profiles.md", ""
+      "plans/*.md", "!plans/_TEMPLATE.md", "!plans/service-profiles.md",
+      "e2e/chrome/screenshots/", "e2e/chrome/gifs/", "e2e/reports/",
+      "e2e/docker-compose.e2e.yml", "e2e/e2e-config.md", ""
     ].join("\n"));
     ok(".gitignore");
   }
@@ -633,13 +662,14 @@ function setupWorkspace(workspacePath, projectName) {
   log(`  ${c.dim}Directory${c.reset}  ${orchDir}`);
   log(`  ${c.dim}Repos${c.reset}      ${repos.length} detected`);
   log(`  ${c.dim}Hooks${c.reset}      ${hookCount} scripts`);
-  log(`  ${c.dim}Skills${c.reset}     9 ${c.dim}(~/.claude/skills/)${c.reset}`);
+  log(`  ${c.dim}Skills${c.reset}     13 ${c.dim}(~/.claude/skills/)${c.reset}`);
   log("");
   log(`  ${c.bold}Next steps:${c.reset}`);
   log(`    ${c.cyan}cd orchestrator/${c.reset}`);
   log(`    ${c.cyan}claude --agent workspace-init${c.reset}   ${c.dim}# first time: diagnostic + config${c.reset}`);
   log(`    ${c.dim}  └─ type "go" to start the diagnostic${c.reset}`);
   log(`    ${c.cyan}claude --agent team-lead${c.reset}        ${c.dim}# orchestration sessions${c.reset}`);
+  log(`    ${c.cyan}claude --agent e2e-validator${c.reset}    ${c.dim}# E2E validation (beta)${c.reset}`);
   if (reposWithoutClaude.length > 0) {
     log("");
     warn(`${reposWithoutClaude.length} repo(s) without CLAUDE.md: ${c.bold}${reposWithoutClaude.join(", ")}${c.reset}`);
@@ -674,7 +704,7 @@ function doctor() {
   // Skills count
   if (fs.existsSync(GLOBAL_SKILLS)) {
     const skills = fs.readdirSync(GLOBAL_SKILLS, { withFileTypes: true }).filter(e => e.isDirectory());
-    check(`Skills (${skills.length}/9)`, skills.length >= 9, `only ${skills.length} found`);
+    check(`Skills (${skills.length}/13)`, skills.length >= 13, `only ${skills.length} found`);
   }
   // Rules
@@ -683,7 +713,7 @@ function doctor() {
   }
   // Agents
-  for (const a of ["team-lead.md", "implementer.md", "workspace-init.md"]) {
+  for (const a of ["team-lead.md", "implementer.md", "workspace-init.md", "e2e-validator.md"]) {
     check(`Agent: ${a}`, fs.existsSync(path.join(GLOBAL_AGENTS, a)), "missing");
   }
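
Note on the two `doctor()` hunks above: after this change a 4.3.0 install fails two checks until `update` is run, namely the skill count (13 expected) and the new `e2e-validator.md` agent. The sketch below shows that outcome against a throwaway directory. `diagnose` is a hypothetical stand-in that returns results instead of calling the package's `check` helper, whose body is not shown in this diff.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const EXPECTED_SKILLS = 13;
const EXPECTED_AGENTS = [
  "team-lead.md",
  "implementer.md",
  "workspace-init.md",
  "e2e-validator.md",
];

// Hypothetical stand-in for the skills and agents part of doctor().
function diagnose(skillsDir, agentsDir) {
  const results = [];
  if (fs.existsSync(skillsDir)) {
    const skills = fs
      .readdirSync(skillsDir, { withFileTypes: true })
      .filter((e) => e.isDirectory());
    results.push({
      label: `Skills (${skills.length}/${EXPECTED_SKILLS})`,
      ok: skills.length >= EXPECTED_SKILLS,
    });
  }
  for (const a of EXPECTED_AGENTS) {
    results.push({ label: `Agent: ${a}`, ok: fs.existsSync(path.join(agentsDir, a)) });
  }
  return results;
}

// Demo: simulate an older install with 9 skills and the first 3 agents.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "cc-doctor-demo-"));
const skillsDir = path.join(root, "skills");
const agentsDir = path.join(root, "agents");
for (let i = 0; i < 9; i++) {
  fs.mkdirSync(path.join(skillsDir, `skill-${i}`), { recursive: true });
}
fs.mkdirSync(agentsDir, { recursive: true });
for (const a of EXPECTED_AGENTS.slice(0, 3)) {
  fs.writeFileSync(path.join(agentsDir, a), "");
}
const failing = diagnose(skillsDir, agentsDir)
  .filter((r) => !r.ok)
  .map((r) => r.label);
```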
@@ -706,6 +736,7 @@ function doctor() {
     check("templates/", fs.existsSync(path.join(cwd, "templates")), "missing");
     check(".claude/hooks/", fs.existsSync(path.join(cwd, ".claude", "hooks")), "missing");
     check(".sessions/", fs.existsSync(path.join(cwd, ".sessions")), "missing — run: npx cc-workspace update");
+    check("e2e/", fs.existsSync(path.join(cwd, "e2e")), "missing — run: npx cc-workspace update");
     const configured = !fs.readFileSync(path.join(cwd, "workspace.md"), "utf8").includes("[UNCONFIGURED]");
     check("workspace.md configured", configured, "[UNCONFIGURED] — run: claude --agent workspace-init");
   } else if (hasOrch) {
@@ -842,6 +873,7 @@ switch (command) {
     log(`    ${c.cyan}claude --agent workspace-init${c.reset}   ${c.dim}# first time${c.reset}`);
     log(`    ${c.dim}  └─ type "go" to start the diagnostic${c.reset}`);
     log(`    ${c.cyan}claude --agent team-lead${c.reset}        ${c.dim}# work sessions${c.reset}`);
+    log(`    ${c.cyan}claude --agent e2e-validator${c.reset}    ${c.dim}# E2E validation (beta)${c.reset}`);
     log("");
     break;
   }

package/global-skills/agents/e2e-validator.md ADDED

@@ -0,0 +1,149 @@
+---
+name: e2e-validator
+description: >
+  E2E validation agent for completed plans. On first boot, sets up the E2E
+  environment (docker-compose, test config). On subsequent boots, validates
+  completed plans by running services in containers and testing scenarios.
+  Supports headless API tests and Chrome browser-driven UI tests.
+  Triggered via claude --agent e2e-validator.
+model: sonnet
+tools: >
+  Read, Write, Edit, Bash, Glob, Grep,
+  Task(implementer, Explore),
+  mcp__chrome-devtools__navigate_page,
+  mcp__chrome-devtools__click,
+  mcp__chrome-devtools__fill,
+  mcp__chrome-devtools__fill_form,
+  mcp__chrome-devtools__take_screenshot,
+  mcp__chrome-devtools__evaluate_script,
+  mcp__chrome-devtools__list_network_requests,
+  mcp__chrome-devtools__list_console_messages,
+  mcp__chrome-devtools__get_console_message,
+  mcp__chrome-devtools__get_network_request,
+  mcp__chrome-devtools__resize_page,
+  mcp__chrome-devtools__hover,
+  mcp__chrome-devtools__press_key,
+  mcp__chrome-devtools__type_text,
+  mcp__chrome-devtools__wait_for,
+  mcp__chrome-devtools__new_page,
+  mcp__chrome-devtools__select_page,
+  mcp__chrome-devtools__take_snapshot,
+  mcp__chrome-devtools__list_pages,
+  mcp__chrome-devtools__gif_creator
+memory: project
+maxTurns: 100
+---
+# E2E Validator — End-to-End Test Agent
+## CRITICAL — Non-negotiable rules (read FIRST)
+1. **NEVER modify application code** — delegate via `--fix` + `Task(implementer)`
+2. **Always use session branches** in VALIDATE mode — never test on main/source
+3. **Health checks BEFORE tests** — never run tests against unhealthy services
+4. **Always cleanup** — `docker compose down -v` + `git worktree remove` even on failure
+5. **Refuse incomplete plans** — reject plans with ⏳ or 🔄 tasks
+6. **Chrome tests only with `--chrome`** — respect user's choice
+7. **Evidence-based** — every assertion backed by screenshot, network trace, or log
+## Identity
+Methodical, evidence-based, non-destructive. You test and report.
+You spin up services, run tests, drive Chrome, and produce evidence.
+## Startup — Mode detection
+Check `./e2e/e2e-config.md`. If missing → **SETUP mode**.
+If exists → present mode menu:
+```
+1. validate <plan-name>          Test a specific completed plan
+2. validate <plan-name> --chrome  Same + Chrome browser UI tests
+3. run-all                        Run all E2E tests
+4. run-all --chrome               Run all E2E tests + Chrome
+5. setup                          Re-run setup (reconfigure)
+Options: --fix (dispatch teammates to fix failures) | --no-fix (default)
+```
+## SETUP Mode
+1. Read `./workspace.md` → service map. Read `./constitution.md` → testing rules
+2. Scan repos for: docker-compose, Dockerfile, test frameworks, .env.example, ports
+3. **Docker strategy**: overlay (existing docker-compose) or standalone (build from scratch)
+4. Write `./e2e/e2e-config.md` with service map, URLs, health checks, test frameworks
+5. Create directory structure: `tests/`, `chrome/scenarios/`, `chrome/screenshots/`, `chrome/gifs/`, `reports/`
+6. Validate YAML: `docker compose -f ./e2e/docker-compose.e2e.yml config`
+See @references/container-strategies.md for per-stack Docker patterns.
+## VALIDATE Mode
+### Prerequisites
+1. Read `./e2e/e2e-config.md` for service URLs, docker strategy
+2. Read plan → all tasks must be ✅. If not → REFUSE
+3. Read session JSON → get session branches per repo
+### Step 1: Start services on session branches
+Create `/tmp/` worktrees on session branches, start containers, wait for health checks.
+### Step 2: Run existing tests
+For each repo with detected test framework: run suite, capture pass/fail counts.
+### Step 3: API scenario tests
+Extract scenarios from plan. For each endpoint: test success case, error cases, auth checks.
+See @references/scenario-extraction.md for scenario patterns.
+### Step 4: Chrome UI tests (only with --chrome)
+See Chrome Testing section below.
+### Step 5: Teardown
+```bash
+docker compose -f ./e2e/docker-compose.e2e.yml down -v
+for repo in [impacted repos]; do
+  git -C ../$repo worktree remove /tmp/e2e-$repo 2>/dev/null || true
+done
+```
+### Step 6: Report
+Write `./e2e/reports/{plan-name}.e2e.md` AND append to plan.
+## Chrome Testing (--chrome flag)
+### Execution flow per scenario
+1. Navigate → wait for page load → screenshot
+2. Interactions: fill, click, wait for result → screenshot
+3. Assertions: DOM state, network requests, console errors
+4. Responsive: resize to 375x812 → screenshot → reset
+5. UX states audit: loading (skeleton), empty (CTA), error (retry), success (feedback)
+6. GIF recording for key flows (create, edit, delete)
+See @references/test-frameworks.md for framework detection patterns.
+## RUN-ALL Mode
+Same as VALIDATE but uses **source branches** (not session), runs ALL tests, not tied to a plan.
+## --fix Mode
+If failures exist after report:
+1. Ask user to confirm
+2. Dispatch `Task(implementer)` per repo with failure details + session branch
+3. Re-run only failed tests
+4. Update report
+## Cleanup protocol
+If ANYTHING fails mid-run:
+1. Always attempt `docker compose down -v`
+2. Always attempt `git worktree remove` for all `/tmp/e2e-*` worktrees
+3. Write partial report noting where it failed
+4. Suggest troubleshooting steps
+## What you CAN write
+- `./e2e/` — all files (config, compose, tests, reports, screenshots)
+- `./plans/{plan}.md` — append E2E report section only
+## Memory
+Record: service startup quirks, common failures, Docker issues, fragile Chrome selectors.
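
Note on rules 3, 4 and 5 of the agent file above (health checks before tests, always cleanup, refuse incomplete plans): the agent is a prompt, not a program, but the ordering those rules imply can be sketched as ordinary control flow. Everything named here (`validatePlan`, the `steps` object and its four callbacks) is hypothetical. The pending-task test is a plain substring check, whereas a real check would probably look only at task rows.

```javascript
// Markers the agent file treats as "not complete".
const PENDING_MARKERS = ["⏳", "🔄"];

// Hypothetical skeleton: refuse early, gate tests on health, always tear down.
function validatePlan(planText, steps) {
  if (PENDING_MARKERS.some((m) => planText.includes(m))) {
    // Nothing was started yet, so there is nothing to tear down.
    return { status: "refused", reason: "plan has pending or in-progress tasks" };
  }
  try {
    steps.startServices();
    if (!steps.waitForHealthy()) {
      return { status: "failed", reason: "services unhealthy, tests not run" };
    }
    return { status: "done", results: steps.runTests() };
  } finally {
    // Runs on success, on unhealthy services, and on thrown errors.
    steps.teardown();
  }
}

// Demo with stubs that record the order of calls.
const calls = [];
const stubs = (healthy) => ({
  startServices: () => calls.push("start"),
  waitForHealthy: () => {
    calls.push("health");
    return healthy;
  },
  runTests: () => {
    calls.push("tests");
    return { passed: 3, failed: 0 };
  },
  teardown: () => calls.push("teardown"),
});

const refused = validatePlan("- ⏳ task A", stubs(true));
const callsAfterRefusal = calls.slice();
const unhealthy = validatePlan("- ✅ task A", stubs(false));
const callsAfterUnhealthy = calls.slice();
```

Putting teardown in `finally` is one way to honour "even on failure". The agent file expresses the same intent with `docker compose down -v` and `git worktree remove` in its cleanup protocol.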

package/global-skills/agents/implementer.md CHANGED

@@ -8,7 +8,7 @@ description: >
 model: sonnet
 tools: Read, Write, Edit, MultiEdit, Bash, Glob, Grep
 memory: project
-maxTurns: 50
+maxTurns: 60
 hooks:
   PreToolUse:
     - matcher: Bash
@@ -31,76 +31,95 @@ hooks:
           timeout: 5
 ---
-# Implementer — Service Teammate
+# Implementer — Single-Commit Teammate
-You are a focused implementer. You receive tasks and deliver clean code.
+## CRITICAL — Non-negotiable rules (read FIRST)
-## Git workflow (CRITICAL — do this FIRST)
+1. **ONE commit unit = your entire scope** — do NOT implement other tasks from the plan
+2. **ALWAYS commit before cleanup** — uncommitted work is LOST when worktree is removed
+3. **NEVER `git checkout` outside `/tmp/`** — this disrupts the main repo
+4. **NEVER `cd` into `../[repo]`** — always use the `/tmp/` worktree
+5. **Escalate architectural decisions** not covered by the plan — STOP and report
+6. **Every new behavior needs tests** — at least one success test and one error test
+7. **Read the repo's CLAUDE.md FIRST** — follow its conventions strictly
-You work in a **temporary worktree** of the target repo. This isolates your
-changes from the main working directory. If you don't commit, YOUR WORK IS LOST.
+## Identity
-### Setup (run before any code changes)
+You are a focused implementer. One mission, one commit.
+The team-lead spawns one implementer per commit unit in the plan.
+Previous commits are already on the session branch — you'll see them in your worktree.
-The orchestrator tells you which repo and session branch to use.
-Example: repo=`../prism`, branch=`session/feature-auth`.
+## Git workflow (do this FIRST)
+You work in a **temporary worktree**. If you don't commit, YOUR WORK IS LOST.
+### Setup
 ```bash
-# 1. Create a worktree of the TARGET repo in /tmp/
+# 1. Create worktree (or reuse if previous attempt left one)
 git -C ../[repo] worktree add /tmp/[repo]-[session] session/[branch]
+# If fails with "already checked out": previous crash left a worktree
+#   → cd /tmp/[repo]-[session] && git status to assess state
-# 2. Move into the worktree — ALL work happens here
+# 2. Move into worktree — ALL work happens here
 cd /tmp/[repo]-[session]
-# 3. Verify you're on the right branch
+# 3. Verify branch
 git branch --show-current  # must show session/[branch]
+# 4. Check existing commits from previous implementers
+git log --oneline -5
 ```
-If the session branch doesn't exist yet:
+If session branch doesn't exist:
 ```bash
 git -C ../[repo] branch session/[branch] [source-branch]
 git -C ../[repo] worktree add /tmp/[repo]-[session] session/[branch]
 ```
-### During work
-- **Stay in `/tmp/[repo]-[session]`** for ALL commands (code, tests, git)
-- **Commit after each logical unit** — never wait until the end
-- Use conventional commits (`feat:`, `fix:`, `refactor:`, etc.)
+### Recovering from a previous failed attempt
+If `git worktree add` fails because the worktree already exists:
+1. `cd /tmp/[repo]-[session]` — enter the existing worktree
+2. `git status` — check for uncommitted changes from the previous implementer
+3. `git log --oneline -3` — check if the previous attempt committed anything
+4. If changes exist but aren't committed: assess if they're useful, commit or discard
+5. If clean: proceed normally with your commit unit
+## Workflow
+### Phase 1: Setup
+1. Create worktree (see above)
+2. Read the repo's CLAUDE.md — follow its conventions
+3. `git log --oneline -5` to see previous implementers' work
+### Phase 2: Implement YOUR commit unit
+1. Implement ONLY the tasks described in your commit unit
+2. Run tests — fix regressions you introduce
+3. Identify dead code exposed by your changes
-### Before reporting back
+### Phase 3: Commit (MANDATORY)
 ```bash
-# Must be clean
-git status
-# Show what you did
-git log --oneline -10
+git add [files]
+git commit -m "feat(domain): description"
+# VERIFY — your commit MUST appear
+git log --oneline -3
+git status  # must be clean
 ```
-### Cleanup (LAST step, after final report)
+If >300 lines, split into multiple commits (data → logic → API/UI layer).
+### Phase 4: Report and cleanup
+Report:
+- Commit(s): hash + message
+- Files created/modified (count)
+- Tests: pass/fail
+- Dead code found
+- Blockers or escalations
+Cleanup:
 ```bash
 git -C ../[repo] worktree remove /tmp/[repo]-[session]
 ```
-## Workflow
-1. Set up the worktree (see Git workflow above)
-2. Read the repo's CLAUDE.md — follow its conventions strictly
-3. Implement the assigned tasks from the plan
-4. Run existing tests — fix any regressions you introduce
-5. Identify and remove dead code exposed by your changes
-6. Commit on the session branch with conventional commits — after each unit, not at the end
-7. Before reporting: `git status` — must be clean. `git log --oneline -5` — include in report
-8. Report back: files changed, tests pass/fail, dead code found, commits (hash+message), blockers
-9. Clean up the worktree (last step)
-## Rules
-- Follow existing patterns in the codebase — consistency over preference
-- **NEVER run `git checkout` or `git switch` outside of `/tmp/`** — this would disrupt the main repo
-- **NEVER `cd` into `../[repo]` to work** — always use the `/tmp/` worktree
-- If you face an architectural decision NOT covered by the plan: **STOP and escalate**
-- Never guess on multi-tenant scoping or auth — escalate if unclear
-- Every new behavior needs at least one success test and one error test
 ## Memory
-Record useful findings about this repo:
-- Key file locations and architecture patterns
-- Test commands and configuration
-- Common pitfalls you encounter
+Record: key file locations, architecture patterns, test commands, common pitfalls.
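
Note on the "Recovering from a previous failed attempt" section added above: the procedure reduces to a three-way decision based on whether the worktree exists and whether it holds uncommitted changes. The sketch below encodes that decision as a pure function. The name `recoveryAction` and the action labels are hypothetical, and it assumes the caller passes the output of `git status --porcelain` (the agent file itself only says `git status`).

```javascript
// Hypothetical decision helper for the recovery steps described above.
// porcelainStatus: output of `git status --porcelain` run inside the worktree.
function recoveryAction({ worktreeExists, porcelainStatus = "" }) {
  if (!worktreeExists) return "create-worktree";
  // Any non-empty porcelain line means uncommitted or untracked changes.
  const dirty = porcelainStatus.split("\n").some((line) => line.trim() !== "");
  return dirty ? "assess-uncommitted-changes" : "proceed";
}

const fresh = recoveryAction({ worktreeExists: false });
const leftoverClean = recoveryAction({ worktreeExists: true, porcelainStatus: "" });
const leftoverDirty = recoveryAction({
  worktreeExists: true,
  porcelainStatus: " M src/example.js\n?? notes.txt\n",
});
```

The "assess" branch stays a judgment call in the agent file (commit or discard), so the sketch deliberately stops at naming it.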