cc-workspace 4.3.0 → 4.5.0

package/README.md CHANGED
@@ -27,7 +27,7 @@ cd ~/projects/my-workspace
27
27
  npx cc-workspace init . "My Project"
28
28
  ```
29
29
 
30
- This creates an `orchestrator/` directory and installs 9 skills, 3 agents, 9 hooks, and 3 rules into `~/.claude/`.
30
+ This creates an `orchestrator/` directory and installs 13 skills, 4 agents, 9 hooks, and 2 rules into `~/.claude/`.
31
31
 
32
32
  ### Configure (one time)
33
33
 
@@ -47,7 +47,8 @@ The init agent will:
47
47
 
48
48
  ```bash
49
49
  cd orchestrator/
50
- claude --agent team-lead
50
+ claude --agent team-lead # orchestration sessions
51
+ claude --agent e2e-validator # E2E validation (beta)
51
52
  ```
52
53
 
53
54
  The team-lead offers 4 modes:
@@ -68,7 +69,7 @@ npx cc-workspace update
68
69
  Updates all components if the package version is newer:
69
70
  - **Global**: skills, rules, agents in `~/.claude/`
70
71
  - **Local** (if `orchestrator/` found): hooks, settings.json, CLAUDE.md, templates, _TEMPLATE.md
71
- - **Never overwritten**: workspace.md, constitution.md, plans/
72
+ - **Never overwritten**: workspace.md, constitution.md, plans/, e2e/
72
73
 
73
74
  ### Diagnostic
74
75
 
@@ -95,6 +96,15 @@ my-workspace/
95
96
  │ ├── workspace.md <- filled by workspace-init
96
97
  │ ├── constitution.md <- filled by workspace-init
97
98
  │ ├── .sessions/ <- session state (gitignored, created per session)
99
+ │ ├── e2e/ <- E2E test environment (beta)
100
+ │ │ ├── e2e-config.md <- agent memory (generated at first boot)
101
+ │ │ ├── docker-compose.e2e.yml <- generated at first boot
102
+ │ │ ├── tests/ <- headless API test scripts
103
+ │ │ ├── chrome/
104
+ │ │ │ ├── scenarios/ <- Chrome test flows per plan
105
+ │ │ │ ├── screenshots/ <- evidence
106
+ │ │ │ └── gifs/ <- recorded flows
107
+ │ │ └── reports/ <- per-plan E2E reports
98
108
  │ ├── templates/
99
109
  │ │ ├── workspace.template.md
100
110
  │ │ ├── constitution.template.md
@@ -198,6 +208,7 @@ parallel in each repo via Agent Teams.
198
208
  | **Teammates** | Sonnet 4.6 | Implement in an isolated worktree, test, commit. |
199
209
  | **Explorers** | Haiku | Read-only. Scan, verify consistency. |
200
210
  | **QA** | Sonnet 4.6 | Hostile mode. Min 3 problems found per service. |
211
+ | **E2E Validator** | Sonnet 4.6 | Containers + Chrome browser testing (beta). |
201
212
 
202
213
  ### The 4 session modes
203
214
 
@@ -235,7 +246,7 @@ Protection layers:
235
246
 
236
247
  ---
237
248
 
238
- ## The 9 skills
249
+ ## The 13 skills
239
250
 
240
251
  | Skill | Role | Trigger |
241
252
  |-------|------|---------|
@@ -248,19 +259,24 @@ Protection layers:
248
259
  | **cycle-retrospective** | Post-cycle learning (Haiku) | "Retro", "retrospective" |
249
260
  | **refresh-profiles** | Re-scan repo CLAUDE.md files (Haiku) | "Refresh profiles" |
250
261
  | **bootstrap-repo** | Generate a CLAUDE.md (Haiku) | "Bootstrap", "init CLAUDE.md" |
262
+ | **e2e-validator** | E2E validation: containers + Chrome (beta) | `claude --agent e2e-validator` |
263
+ | **session** | List, status, close parallel sessions | `/session`, `/session status X` |
264
+ | **doctor** | Full workspace diagnostic (Haiku) | `/doctor` |
265
+ | **cleanup** | Remove orphan worktrees + stale sessions | `/cleanup` |
251
266
 
252
267
  All use `context: fork` — a skill's result is not in context when the
253
268
  next one starts. The plan on disk is the source of truth.
254
269
 
255
270
  ---
256
271
 
257
- ## The 3 agents
272
+ ## The 4 agents
258
273
 
259
274
  | Agent | Model | Usage |
260
275
  |-------|-------|-------|
261
276
  | **team-lead** | Opus 4.6 | `claude --agent team-lead` — multi-service orchestration |
262
277
  | **workspace-init** | Sonnet 4.6 | `claude --agent workspace-init` — diagnostic + initial config |
263
278
  | **implementer** | Sonnet 4.6 | Task subagent with `isolation: worktree` — isolated implementation |
279
+ | **e2e-validator** | Sonnet 4.6 | `claude --agent e2e-validator` — E2E validation with containers + Chrome (beta) |
264
280
 
265
281
  ---
266
282
 
@@ -385,9 +401,14 @@ cc-workspace/
385
401
  ├── cycle-retrospective/SKILL.md
386
402
  ├── refresh-profiles/SKILL.md
387
403
  ├── bootstrap-repo/SKILL.md
404
+ ├── e2e-validator/
405
+ │ └── references/
406
+ │ ├── container-strategies.md
407
+ │ ├── test-frameworks.md
408
+ │ └── scenario-extraction.md
388
409
  ├── hooks/ <- 11 scripts (warning-only)
389
410
  ├── rules/ <- 3 rules
390
- └── agents/ <- 3 agents (team-lead, implementer, workspace-init)
411
+ └── agents/ <- 4 agents (team-lead, implementer, workspace-init, e2e-validator)
391
412
  ```
392
413
 
393
414
  ---
@@ -395,7 +416,7 @@ cc-workspace/
395
416
  ## Idempotence
396
417
 
397
418
  Both `init` and `update` are safe to re-run:
398
- - **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md` (user content)
419
+ - **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md`, `e2e/` (user content)
399
420
  - **Always regenerated**: `settings.json`, `block-orchestrator-writes.sh` (security), `CLAUDE.md`, `_TEMPLATE.md`
400
421
  - **Always copied**: hooks, templates
401
422
  - **Always regenerated on init**: `service-profiles.md` (fresh scan)
@@ -403,6 +424,103 @@ Both `init` and `update` are safe to re-run:
403
424
 
404
425
  ---
405
426
 
427
+ ## E2E Validator (beta)
428
+
429
+ A dedicated agent that validates completed plans by running services in containers
430
+ and testing scenarios — including Chrome browser-driven UI tests.
431
+
432
+ ```bash
433
+ cd orchestrator/
434
+ claude --agent e2e-validator
435
+ ```
436
+
437
+ ### First boot — setup
438
+
439
+ On first boot (no `e2e/e2e-config.md`), the agent:
440
+ 1. Reads `workspace.md` for repos and stacks
441
+ 2. Scans repos for existing `docker-compose.yml` and test frameworks
442
+ 3. If docker-compose exists: generates an overlay (`docker-compose.e2e.yml`)
443
+ 4. If not: builds the config interactively with you
444
+ 5. Writes `e2e/e2e-config.md` (its persistent memory)
445
+
446
+ ### Modes
447
+
448
+ | Mode | Description |
449
+ |------|-------------|
450
+ | `validate <plan>` | Test a specific completed plan (API tests) |
451
+ | `validate <plan> --chrome` | Same + Chrome browser UI tests |
452
+ | `run-all` | Run all E2E tests (headless) |
453
+ | `run-all --chrome` | Run all E2E tests + Chrome |
454
+ | `setup` | Re-run first boot setup |
455
+
456
+ Add `--fix` to any mode to dispatch teammates for fixing failures.
457
+
458
+ ### How it works
459
+
460
+ 1. Creates `/tmp/` worktrees on session branches (from the plan)
461
+ 2. Starts services via `docker compose up`
462
+ 3. Waits for health checks
463
+ 4. Runs existing test suites + generates API scenario tests from the plan
464
+ 5. With `--chrome`: drives Chrome via chrome-devtools MCP (navigate, fill forms,
465
+ click, take screenshots, record GIFs, check network requests and console)
466
+ 6. Generates report with evidence (screenshots, GIFs, network traces)
467
+ 7. Tears down containers and worktrees
468
+
469
+ ### Chrome testing
470
+
471
+ With `--chrome`, the agent:
472
+ - Navigates the frontend in your real Chrome browser
473
+ - Plays user scenarios extracted from the plan
474
+ - Takes screenshots at each step as evidence
475
+ - Records GIFs of complete flows
476
+ - Checks the 4 mandatory UX states (loading, empty, error, success)
477
+ - Tests responsive layouts (mobile viewport)
478
+ - Verifies network requests match the API contract
479
+ - Checks console for errors
480
+
481
+ ### Requirements
482
+
483
+ - **Docker** (docker compose v2)
484
+ - **Chrome** with chrome-devtools MCP server (for `--chrome` mode)
485
+ - Completed plan (all tasks ✅) with session branches
486
+
487
+ ---
488
+
489
+ ## Changelog v4.4.0 -> v4.5.0
490
+
491
+ | # | Feature | Detail |
492
+ |---|---------|--------|
493
+ | 1 | **Agent prompt restructuring** | All agents now have a `CRITICAL — Non-negotiable rules` section at the top. Most important rules are front-loaded for better model adherence. Prompts reduced by ~25%. |
494
+ | 2 | **Context tiering** | Spawn templates now use 3 tiers: Tier 1 (always inject), Tier 2 (conditional), Tier 3 (never — already in agent/CLAUDE.md). Reduces implementer context bloat. |
495
+ | 3 | **Spawn template deduplication** | Git workflow instructions removed from spawn templates — the implementer agent already knows them. Only specific values (repo path, session branch) are injected. |
496
+ | 4 | **Rollback protocol** | team-lead can now `git update-ref` to reset a corrupted session branch to the last known good commit, or recreate from source branch. |
497
+ | 5 | **Failed dispatch tracking** | Plan template now includes a "Failed dispatches" section. After 2 retries, commit units are marked `❌ ESCALATED` and the wave stops for user input. |
498
+ | 6 | **Worktree crash recovery** | SessionStart hook now cleans orphan `/tmp/` worktrees left by crashed implementers. Implementer can also reuse an existing worktree from a previous failed attempt. |
499
+ | 7 | **Implementer maxTurns 50→60** | Buffer for complex commit units. Prevents context loss at boundary. |
500
+ | 8 | **3 new slash commands** | `/session` (list, status, close sessions), `/doctor` (full diagnostic), `/cleanup` (orphan worktrees + stale sessions). Replaces `npx cc-workspace` CLI for in-session use. |
501
+ | 9 | **13 skills** | Up from 10. New: session, doctor, cleanup. |
502
+
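The failed-dispatch rule in row 5 can be sketched as a small retry loop. This is an illustration only, not the package's code: `dispatchWithRetries`, its `attempt` callback, and the returned object shape are hypothetical, and "after 2 retries" is read here as one initial attempt plus two retries.

```javascript
// Hypothetical helper: retries one commit unit, then escalates.
// `attempt` is any function that throws when the dispatch fails.
function dispatchWithRetries(unit, attempt, maxRetries = 2) {
  const failures = [];
  for (let tryNo = 0; tryNo <= maxRetries; tryNo++) {
    try {
      const result = attempt(unit, tryNo);
      return { unit, status: "✅", result, failures };
    } catch (err) {
      // Keep every failure so it can be listed under "Failed dispatches".
      failures.push(String(err.message || err));
    }
  }
  // All attempts failed: mark the unit and let the caller stop the wave.
  return { unit, status: "❌ ESCALATED", failures };
}

const outcome = dispatchWithRetries("api: add login endpoint", () => {
  throw new Error("tests failed");
});
console.log(outcome.status, outcome.failures.length);
```

Returning the collected failures, rather than throwing, lets the caller write them into the plan before asking the user how to proceed.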
503
+ ---
504
+
505
+ ## Changelog v4.3.0 -> v4.4.0
506
+
507
+ | # | Feature | Detail |
508
+ |---|---------|--------|
509
+ | 1 | **E2E Validator agent (beta)** | New `e2e-validator` agent: validates completed plans by running services in containers. Supports headless API tests and Chrome browser-driven UI tests with screenshots and GIF recording. |
510
+ | 2 | **Chrome testing mode** | `--chrome` flag drives the user's Chrome browser via chrome-devtools MCP. Navigates, fills forms, clicks, takes screenshots, records GIFs, checks network and console. |
511
+ | 3 | **E2E directory structure** | `orchestrator/e2e/` created during init/update. Contains docker-compose overlay, test scripts, Chrome scenarios, screenshots, GIFs, and reports. Never overwritten by updates. |
512
+ | 4 | **Container strategies** | Reference docs for overlay and standalone docker-compose patterns per stack (PHP, Node, Python, Go, Vue, React). |
513
+ | 5 | **Scenario extraction** | Reference doc for extracting testable E2E scenarios from completed plans (API endpoints, Chrome flows, UX states). |
514
+ | 6 | **5 modes** | setup, validate, validate --chrome, run-all, run-all --chrome. Optional --fix dispatches teammates. |
515
+
516
+ ---
517
+
518
+ ## Changelog v4.2.0 -> v4.3.0
519
+
520
+ > Minor improvements and bug fixes.
521
+
522
+ ---
523
+
406
524
  ## Changelog v4.1.4 -> v4.2.0
407
525
 
408
526
  | # | Feature | Detail |
package/bin/cli.js CHANGED
@@ -283,6 +283,7 @@ You clarify, plan, delegate, track.
283
283
  cd orchestrator/
284
284
  claude --agent workspace-init # first time: diagnostic + config
285
285
  claude --agent team-lead # work sessions
286
+ claude --agent e2e-validator # E2E validation of completed plans
286
287
  \`\`\`
287
288
 
288
289
  ## Initialization (workspace-init)
@@ -305,8 +306,10 @@ Run once. Idempotent — can be re-run to re-diagnose.
305
306
  - Service profiles: \`./plans/service-profiles.md\`
306
307
  - Active plans: \`./plans/*.md\`
307
308
  - Active sessions: \`./.sessions/*.json\`
309
+ - E2E config: \`./e2e/e2e-config.md\`
310
+ - E2E reports: \`./e2e/reports/\`
308
311
 
309
- ## Skills (9)
312
+ ## Skills (13)
310
313
  - **dispatch-feature**: 4 modes, clarify → plan → waves → collect → verify
311
314
  - **qa-ruthless**: adversarial QA, min 3 findings per service
312
315
  - **cross-service-check**: inter-repo consistency
@@ -316,6 +319,10 @@ Run once. Idempotent — can be re-run to re-diagnose.
316
319
  - **cycle-retrospective**: post-cycle learning (haiku)
317
320
  - **refresh-profiles**: re-reads repo CLAUDE.md files (haiku)
318
321
  - **bootstrap-repo**: generates a CLAUDE.md for a repo (haiku)
322
+ - **e2e-validator**: E2E validation of completed plans (beta) — containers + Chrome
323
+ - **/session**: list, status, close parallel sessions
324
+ - **/doctor**: full workspace diagnostic
325
+ - **/cleanup**: remove orphan worktrees + stale sessions
319
326
 
320
327
  ## Rules
321
328
  1. No code in repos — delegate to teammates
@@ -333,6 +340,7 @@ Run once. Idempotent — can be re-run to re-diagnose.
333
340
  13. Retrospective cycle after each completed feature
334
341
  14. Session branches for parallel isolation — teammates use session/{name}, never create own branches
335
342
  15. Never \`git checkout -b\` in repos — use \`git branch\` (no checkout) to avoid disrupting parallel sessions
343
+ 16. E2E validation via \`claude --agent e2e-validator\` after plans are complete
336
344
  `;
337
345
  }
338
346
 
@@ -387,6 +395,9 @@ function planTemplateContent() {
387
395
  |---------|:-:|:-:|:-:|:-:|
388
396
  | | N | 0 | ⏳ | ⏳ |
389
397
 
398
+ ## Failed dispatches
399
+ <!-- Commit units that failed 2+ times are recorded here for user review -->
400
+
390
401
  ## QA
391
402
  - ⏳ Cross-service check
392
403
  - ⏳ QA ruthless
@@ -476,8 +487,19 @@ function updateLocal() {
476
487
  ok(".sessions/ created");
477
488
  }
478
489
 
479
- // ── NEVER touch: workspace.md, constitution.md, plans/*.md, service-profiles.md ──
480
- info(`${c.dim}workspace.md, constitution.md, plans/ — preserved${c.reset}`);
490
+ // ── e2e/ (create if missing, never overwrite existing) ──
491
+ const e2eDir = path.join(orchDir, "e2e");
492
+ if (!fs.existsSync(e2eDir)) {
493
+ mkdirp(path.join(e2eDir, "tests"));
494
+ mkdirp(path.join(e2eDir, "chrome", "scenarios"));
495
+ mkdirp(path.join(e2eDir, "chrome", "screenshots"));
496
+ mkdirp(path.join(e2eDir, "chrome", "gifs"));
497
+ mkdirp(path.join(e2eDir, "reports"));
498
+ ok("e2e/ directory created");
499
+ }
500
+
501
+ // ── NEVER touch: workspace.md, constitution.md, plans/*.md, e2e/ ──
502
+ info(`${c.dim}workspace.md, constitution.md, plans/, e2e/ — preserved${c.reset}`);
481
503
 
482
504
  return true;
483
505
  }
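The "create if missing, never overwrite" behavior added to `updateLocal()` can be isolated into a standalone sketch. The subdirectory list is taken from the diff; the helper name `ensureE2eDirs`, its return value, and the use of `fs.mkdirSync` in place of the package's own `mkdirp` are assumptions made for illustration.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Subdirectories listed in the diff; everything else here is illustrative.
const E2E_SUBDIRS = [
  ["tests"],
  ["chrome", "scenarios"],
  ["chrome", "screenshots"],
  ["chrome", "gifs"],
  ["reports"],
];

// Hypothetical helper: creates e2e/ only when it is absent.
// Returns true if the tree was created, false if an existing e2e/ was left alone.
function ensureE2eDirs(orchDir) {
  const e2eDir = path.join(orchDir, "e2e");
  if (fs.existsSync(e2eDir)) return false;
  for (const parts of E2E_SUBDIRS) {
    fs.mkdirSync(path.join(e2eDir, ...parts), { recursive: true });
  }
  return true;
}

// Usage against a throwaway directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "orch-"));
console.log(ensureE2eDirs(demoDir)); // true: the tree was created
console.log(ensureE2eDirs(demoDir)); // false: existing e2e/ is untouched
```

One consequence of guarding on the top-level directory, as the diff does, is that a partially deleted `e2e/` tree is not repaired on update; only a missing `e2e/` triggers creation.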
@@ -493,6 +515,11 @@ function setupWorkspace(workspacePath, projectName) {
493
515
  mkdirp(path.join(orchDir, "plans"));
494
516
  mkdirp(path.join(orchDir, "templates"));
495
517
  mkdirp(path.join(orchDir, ".sessions"));
518
+ mkdirp(path.join(orchDir, "e2e", "tests"));
519
+ mkdirp(path.join(orchDir, "e2e", "chrome", "scenarios"));
520
+ mkdirp(path.join(orchDir, "e2e", "chrome", "screenshots"));
521
+ mkdirp(path.join(orchDir, "e2e", "chrome", "gifs"));
522
+ mkdirp(path.join(orchDir, "e2e", "reports"));
496
523
  ok("Structure created");
497
524
 
498
525
  // ── Templates ──
@@ -575,7 +602,9 @@ function setupWorkspace(workspacePath, projectName) {
575
602
  fs.writeFileSync(gi, [
576
603
  ".claude/bash-commands.log", ".claude/worktrees/", ".claude/modified-files.log",
577
604
  ".sessions/",
578
- "plans/*.md", "!plans/_TEMPLATE.md", "!plans/service-profiles.md", ""
605
+ "plans/*.md", "!plans/_TEMPLATE.md", "!plans/service-profiles.md",
606
+ "e2e/chrome/screenshots/", "e2e/chrome/gifs/", "e2e/reports/",
607
+ "e2e/docker-compose.e2e.yml", "e2e/e2e-config.md", ""
579
608
  ].join("\n"));
580
609
  ok(".gitignore");
581
610
  }
@@ -633,13 +662,14 @@ function setupWorkspace(workspacePath, projectName) {
633
662
  log(` ${c.dim}Directory${c.reset} ${orchDir}`);
634
663
  log(` ${c.dim}Repos${c.reset} ${repos.length} detected`);
635
664
  log(` ${c.dim}Hooks${c.reset} ${hookCount} scripts`);
636
- log(` ${c.dim}Skills${c.reset} 9 ${c.dim}(~/.claude/skills/)${c.reset}`);
665
+ log(` ${c.dim}Skills${c.reset} 13 ${c.dim}(~/.claude/skills/)${c.reset}`);
637
666
  log("");
638
667
  log(` ${c.bold}Next steps:${c.reset}`);
639
668
  log(` ${c.cyan}cd orchestrator/${c.reset}`);
640
669
  log(` ${c.cyan}claude --agent workspace-init${c.reset} ${c.dim}# first time: diagnostic + config${c.reset}`);
641
670
  log(` ${c.dim} └─ type "go" to start the diagnostic${c.reset}`);
642
671
  log(` ${c.cyan}claude --agent team-lead${c.reset} ${c.dim}# orchestration sessions${c.reset}`);
672
+ log(` ${c.cyan}claude --agent e2e-validator${c.reset} ${c.dim}# E2E validation (beta)${c.reset}`);
643
673
  if (reposWithoutClaude.length > 0) {
644
674
  log("");
645
675
  warn(`${reposWithoutClaude.length} repo(s) without CLAUDE.md: ${c.bold}${reposWithoutClaude.join(", ")}${c.reset}`);
@@ -674,7 +704,7 @@ function doctor() {
674
704
  // Skills count
675
705
  if (fs.existsSync(GLOBAL_SKILLS)) {
676
706
  const skills = fs.readdirSync(GLOBAL_SKILLS, { withFileTypes: true }).filter(e => e.isDirectory());
677
- check(`Skills (${skills.length}/9)`, skills.length >= 9, `only ${skills.length} found`);
707
+ check(`Skills (${skills.length}/13)`, skills.length >= 13, `only ${skills.length} found`);
678
708
  }
679
709
 
680
710
  // Rules
@@ -683,7 +713,7 @@ function doctor() {
683
713
  }
684
714
 
685
715
  // Agents
686
- for (const a of ["team-lead.md", "implementer.md", "workspace-init.md"]) {
716
+ for (const a of ["team-lead.md", "implementer.md", "workspace-init.md", "e2e-validator.md"]) {
687
717
  check(`Agent: ${a}`, fs.existsSync(path.join(GLOBAL_AGENTS, a)), "missing");
688
718
  }
689
719
 
@@ -706,6 +736,7 @@ function doctor() {
706
736
  check("templates/", fs.existsSync(path.join(cwd, "templates")), "missing");
707
737
  check(".claude/hooks/", fs.existsSync(path.join(cwd, ".claude", "hooks")), "missing");
708
738
  check(".sessions/", fs.existsSync(path.join(cwd, ".sessions")), "missing — run: npx cc-workspace update");
739
+ check("e2e/", fs.existsSync(path.join(cwd, "e2e")), "missing — run: npx cc-workspace update");
709
740
  const configured = !fs.readFileSync(path.join(cwd, "workspace.md"), "utf8").includes("[UNCONFIGURED]");
710
741
  check("workspace.md configured", configured, "[UNCONFIGURED] — run: claude --agent workspace-init");
711
742
  } else if (hasOrch) {
@@ -842,6 +873,7 @@ switch (command) {
842
873
  log(` ${c.cyan}claude --agent workspace-init${c.reset} ${c.dim}# first time${c.reset}`);
843
874
  log(` ${c.dim} └─ type "go" to start the diagnostic${c.reset}`);
844
875
  log(` ${c.cyan}claude --agent team-lead${c.reset} ${c.dim}# work sessions${c.reset}`);
876
+ log(` ${c.cyan}claude --agent e2e-validator${c.reset} ${c.dim}# E2E validation (beta)${c.reset}`);
845
877
  log("");
846
878
  break;
847
879
  }
@@ -0,0 +1,149 @@
1
+ ---
2
+ name: e2e-validator
3
+ description: >
4
+ E2E validation agent for completed plans. On first boot, sets up the E2E
5
+ environment (docker-compose, test config). On subsequent boots, validates
6
+ completed plans by running services in containers and testing scenarios.
7
+ Supports headless API tests and Chrome browser-driven UI tests.
8
+ Triggered via claude --agent e2e-validator.
9
+ model: sonnet
10
+ tools: >
11
+ Read, Write, Edit, Bash, Glob, Grep,
12
+ Task(implementer, Explore),
13
+ mcp__chrome-devtools__navigate_page,
14
+ mcp__chrome-devtools__click,
15
+ mcp__chrome-devtools__fill,
16
+ mcp__chrome-devtools__fill_form,
17
+ mcp__chrome-devtools__take_screenshot,
18
+ mcp__chrome-devtools__evaluate_script,
19
+ mcp__chrome-devtools__list_network_requests,
20
+ mcp__chrome-devtools__list_console_messages,
21
+ mcp__chrome-devtools__get_console_message,
22
+ mcp__chrome-devtools__get_network_request,
23
+ mcp__chrome-devtools__resize_page,
24
+ mcp__chrome-devtools__hover,
25
+ mcp__chrome-devtools__press_key,
26
+ mcp__chrome-devtools__type_text,
27
+ mcp__chrome-devtools__wait_for,
28
+ mcp__chrome-devtools__new_page,
29
+ mcp__chrome-devtools__select_page,
30
+ mcp__chrome-devtools__take_snapshot,
31
+ mcp__chrome-devtools__list_pages,
32
+ mcp__chrome-devtools__gif_creator
33
+ memory: project
34
+ maxTurns: 100
35
+ ---
36
+
37
+ # E2E Validator — End-to-End Test Agent
38
+
39
+ ## CRITICAL — Non-negotiable rules (read FIRST)
40
+
41
+ 1. **NEVER modify application code** — delegate via `--fix` + `Task(implementer)`
42
+ 2. **Always use session branches** in VALIDATE mode — never test on main/source
43
+ 3. **Health checks BEFORE tests** — never run tests against unhealthy services
44
+ 4. **Always cleanup** — `docker compose down -v` + `git worktree remove` even on failure
45
+ 5. **Refuse incomplete plans** — reject plans with ⏳ or 🔄 tasks
46
+ 6. **Chrome tests only with `--chrome`** — respect user's choice
47
+ 7. **Evidence-based** — every assertion backed by screenshot, network trace, or log
48
+
49
+ ## Identity
50
+
51
+ Methodical, evidence-based, non-destructive. You test and report.
52
+ You spin up services, run tests, drive Chrome, and produce evidence.
53
+
54
+ ## Startup — Mode detection
55
+
56
+ Check `./e2e/e2e-config.md`. If missing → **SETUP mode**.
57
+ If exists → present mode menu:
58
+
59
+ ```
60
+ 1. validate <plan-name> Test a specific completed plan
61
+ 2. validate <plan-name> --chrome Same + Chrome browser UI tests
62
+ 3. run-all Run all E2E tests
63
+ 4. run-all --chrome Run all E2E tests + Chrome
64
+ 5. setup Re-run setup (reconfigure)
65
+
66
+ Options: --fix (dispatch teammates to fix failures) | --no-fix (default)
67
+ ```
68
+
69
+ ## SETUP Mode
70
+
71
+ 1. Read `./workspace.md` → service map. Read `./constitution.md` → testing rules
72
+ 2. Scan repos for: docker-compose, Dockerfile, test frameworks, .env.example, ports
73
+ 3. **Docker strategy**: overlay (existing docker-compose) or standalone (build from scratch)
74
+ 4. Write `./e2e/e2e-config.md` with service map, URLs, health checks, test frameworks
75
+ 5. Create directory structure: `tests/`, `chrome/scenarios/`, `chrome/screenshots/`, `chrome/gifs/`, `reports/`
76
+ 6. Validate YAML: `docker compose -f ./e2e/docker-compose.e2e.yml config`
77
+
78
+ See @references/container-strategies.md for per-stack Docker patterns.
79
+
80
+ ## VALIDATE Mode
81
+
82
+ ### Prerequisites
83
+ 1. Read `./e2e/e2e-config.md` for service URLs, docker strategy
84
+ 2. Read plan → all tasks must be ✅. If not → REFUSE
85
+ 3. Read session JSON → get session branches per repo
86
+
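The completeness check in prerequisite 2 could look roughly like the sketch below. It assumes task status markers appear inline in the plan's markdown; `findPendingTasks` and `canValidate` are hypothetical names, and the agent itself performs this check by reading the plan rather than by running code.

```javascript
// Markers named in the agent rules: ⏳ and 🔄 mean a task is not finished.
const PENDING_MARKERS = ["⏳", "🔄"];

// Hypothetical helper: returns the plan lines that still carry a pending marker.
function findPendingTasks(planMarkdown) {
  return planMarkdown
    .split("\n")
    .filter((line) => PENDING_MARKERS.some((marker) => line.includes(marker)));
}

// A plan is eligible for validation only when nothing is pending.
function canValidate(planMarkdown) {
  return findPendingTasks(planMarkdown).length === 0;
}

const plan = ["| api | ✅ |", "| frontend | 🔄 |"].join("\n");
console.log(canValidate(plan)); // false: one task is still in progress
console.log(findPendingTasks(plan));
```

Because this sketch scans every line, a ⏳ in any section of the plan counts as pending; a stricter version would restrict the scan to the task table.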
87
+ ### Step 1: Start services on session branches
88
+ Create `/tmp/` worktrees on session branches, start containers, wait for health checks.
89
+
90
+ ### Step 2: Run existing tests
91
+ For each repo with detected test framework: run suite, capture pass/fail counts.
92
+
93
+ ### Step 3: API scenario tests
94
+ Extract scenarios from plan. For each endpoint: test success case, error cases, auth checks.
95
+
96
+ See @references/scenario-extraction.md for scenario patterns.
97
+
98
+ ### Step 4: Chrome UI tests (only with --chrome)
99
+ See Chrome Testing section below.
100
+
101
+ ### Step 5: Teardown
102
+ ```bash
103
+ docker compose -f ./e2e/docker-compose.e2e.yml down -v
104
+ for repo in [impacted repos]; do
105
+ git -C ../$repo worktree remove /tmp/e2e-$repo 2>/dev/null || true
106
+ done
107
+ ```
108
+
109
+ ### Step 6: Report
110
+ Write `./e2e/reports/{plan-name}.e2e.md` AND append to plan.
111
+
112
+ ## Chrome Testing (--chrome flag)
113
+
114
+ ### Execution flow per scenario
115
+ 1. Navigate → wait for page load → screenshot
116
+ 2. Interactions: fill, click, wait for result → screenshot
117
+ 3. Assertions: DOM state, network requests, console errors
118
+ 4. Responsive: resize to 375x812 → screenshot → reset
119
+ 5. UX states audit: loading (skeleton), empty (CTA), error (retry), success (feedback)
120
+ 6. GIF recording for key flows (create, edit, delete)
121
+
122
+ See @references/test-frameworks.md for framework detection patterns.
123
+
124
+ ## RUN-ALL Mode
125
+
126
+ Same as VALIDATE but uses **source branches** (not session), runs ALL tests, not tied to a plan.
127
+
128
+ ## --fix Mode
129
+
130
+ If failures exist after report:
131
+ 1. Ask user to confirm
132
+ 2. Dispatch `Task(implementer)` per repo with failure details + session branch
133
+ 3. Re-run only failed tests
134
+ 4. Update report
135
+
136
+ ## Cleanup protocol
137
+
138
+ If ANYTHING fails mid-run:
139
+ 1. Always attempt `docker compose down -v`
140
+ 2. Always attempt `git worktree remove` for all `/tmp/e2e-*` worktrees
141
+ 3. Write partial report noting where it failed
142
+ 4. Suggest troubleshooting steps
143
+
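The "always attempt cleanup" rule maps naturally onto `try`/`finally`. The sketch below is illustrative: `teardownCommands` and `runWithTeardown` are hypothetical helpers, the command strings mirror the teardown step in this prompt, and `exec` is injected so the flow can be exercised without Docker or git.

```javascript
// Hypothetical helper: builds the teardown commands for the given repos.
function teardownCommands(repos) {
  return [
    "docker compose -f ./e2e/docker-compose.e2e.yml down -v",
    ...repos.map((repo) => `git -C ../${repo} worktree remove /tmp/e2e-${repo}`),
  ];
}

// Runs `work`, then always attempts every teardown command, even if `work` throws.
function runWithTeardown(repos, work, exec) {
  try {
    return work();
  } finally {
    for (const cmd of teardownCommands(repos)) {
      try {
        exec(cmd);
      } catch (err) {
        // Best effort: one failed cleanup step must not skip the remaining ones.
      }
    }
  }
}

// Usage with a recording stub instead of a real shell
const ran = [];
try {
  runWithTeardown(["api", "web"], () => {
    throw new Error("tests failed");
  }, (cmd) => ran.push(cmd));
} catch (err) {
  console.log(err.message, ran.length); // tests failed 3
}
```

In a real run, `exec` could be `(cmd) => execSync(cmd, { stdio: "inherit" })` with `execSync` from `node:child_process`; the original error still propagates after cleanup, so a partial report can be written by the caller.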
144
+ ## What you CAN write
145
+ - `./e2e/` — all files (config, compose, tests, reports, screenshots)
146
+ - `./plans/{plan}.md` — append E2E report section only
147
+
148
+ ## Memory
149
+ Record: service startup quirks, common failures, Docker issues, fragile Chrome selectors.
@@ -8,7 +8,7 @@ description: >
8
8
  model: sonnet
9
9
  tools: Read, Write, Edit, MultiEdit, Bash, Glob, Grep
10
10
  memory: project
11
- maxTurns: 50
11
+ maxTurns: 60
12
12
  hooks:
13
13
  PreToolUse:
14
14
  - matcher: Bash
@@ -31,76 +31,95 @@ hooks:
31
31
  timeout: 5
32
32
  ---
33
33
 
34
- # Implementer — Service Teammate
34
+ # Implementer — Single-Commit Teammate
35
35
 
36
- You are a focused implementer. You receive tasks and deliver clean code.
36
+ ## CRITICAL — Non-negotiable rules (read FIRST)
37
37
 
38
- ## Git workflow (CRITICAL — do this FIRST)
38
+ 1. **ONE commit unit = your entire scope** — do NOT implement other tasks from the plan
39
+ 2. **ALWAYS commit before cleanup** — uncommitted work is LOST when worktree is removed
40
+ 3. **NEVER `git checkout` outside `/tmp/`** — this disrupts the main repo
41
+ 4. **NEVER `cd` into `../[repo]`** — always use the `/tmp/` worktree
42
+ 5. **Escalate architectural decisions** not covered by the plan — STOP and report
43
+ 6. **Every new behavior needs tests** — at least one success test and one error test
44
+ 7. **Read the repo's CLAUDE.md FIRST** — follow its conventions strictly
39
45
 
40
- You work in a **temporary worktree** of the target repo. This isolates your
41
- changes from the main working directory. If you don't commit, YOUR WORK IS LOST.
46
+ ## Identity
42
47
 
43
- ### Setup (run before any code changes)
48
+ You are a focused implementer. One mission, one commit.
49
+ The team-lead spawns one implementer per commit unit in the plan.
50
+ Previous commits are already on the session branch — you'll see them in your worktree.
44
51
 
45
- The orchestrator tells you which repo and session branch to use.
46
- Example: repo=`../prism`, branch=`session/feature-auth`.
52
+ ## Git workflow (do this FIRST)
47
53
 
54
+ You work in a **temporary worktree**. If you don't commit, YOUR WORK IS LOST.
55
+
56
+ ### Setup
48
57
  ```bash
49
- # 1. Create a worktree of the TARGET repo in /tmp/
58
+ # 1. Create worktree (or reuse if previous attempt left one)
50
59
  git -C ../[repo] worktree add /tmp/[repo]-[session] session/[branch]
60
+ # If fails with "already checked out": previous crash left a worktree
61
+ # → cd /tmp/[repo]-[session] && git status to assess state
51
62
 
52
- # 2. Move into the worktree — ALL work happens here
63
+ # 2. Move into worktree — ALL work happens here
53
64
  cd /tmp/[repo]-[session]
54
65
 
55
- # 3. Verify you're on the right branch
66
+ # 3. Verify branch
56
67
  git branch --show-current # must show session/[branch]
68
+
69
+ # 4. Check existing commits from previous implementers
70
+ git log --oneline -5
57
71
  ```
58
72
 
59
- If the session branch doesn't exist yet:
73
+ If session branch doesn't exist:
60
74
  ```bash
61
75
  git -C ../[repo] branch session/[branch] [source-branch]
62
76
  git -C ../[repo] worktree add /tmp/[repo]-[session] session/[branch]
63
77
  ```
64
78
 
65
- ### During work
66
- - **Stay in `/tmp/[repo]-[session]`** for ALL commands (code, tests, git)
67
- - **Commit after each logical unit** — never wait until the end
68
- - Use conventional commits (`feat:`, `fix:`, `refactor:`, etc.)
79
+ ### Recovering from a previous failed attempt
80
+ If `git worktree add` fails because the worktree already exists:
81
+ 1. `cd /tmp/[repo]-[session]` — enter the existing worktree
82
+ 2. `git status` — check for uncommitted changes from the previous implementer
83
+ 3. `git log --oneline -3` — check if the previous attempt committed anything
84
+ 4. If changes exist but aren't committed: assess if they're useful, commit or discard
85
+ 5. If clean: proceed normally with your commit unit
86
+
87
+ ## Workflow
88
+
89
+ ### Phase 1: Setup
90
+ 1. Create worktree (see above)
91
+ 2. Read the repo's CLAUDE.md — follow its conventions
92
+ 3. `git log --oneline -5` to see previous implementers' work
93
+
94
+ ### Phase 2: Implement YOUR commit unit
95
+ 1. Implement ONLY the tasks described in your commit unit
96
+ 2. Run tests — fix regressions you introduce
97
+ 3. Identify dead code exposed by your changes
69
98
 
70
- ### Before reporting back
99
+ ### Phase 3: Commit (MANDATORY)
71
100
  ```bash
72
- # Must be clean
73
- git status
74
- # Show what you did
75
- git log --oneline -10
101
+ git add [files]
102
+ git commit -m "feat(domain): description"
103
+
104
+ # VERIFY — your commit MUST appear
105
+ git log --oneline -3
106
+ git status # must be clean
76
107
  ```
77
108
 
78
- ### Cleanup (LAST step, after final report)
109
+ If >300 lines, split into multiple commits (data → logic → API/UI layer).
110
+
111
+ ### Phase 4: Report and cleanup
112
+ Report:
113
+ - Commit(s): hash + message
114
+ - Files created/modified (count)
115
+ - Tests: pass/fail
116
+ - Dead code found
117
+ - Blockers or escalations
118
+
119
+ Cleanup:
79
120
  ```bash
80
121
  git -C ../[repo] worktree remove /tmp/[repo]-[session]
81
122
  ```
82
123
 
83
- ## Workflow
84
- 1. Set up the worktree (see Git workflow above)
85
- 2. Read the repo's CLAUDE.md — follow its conventions strictly
86
- 3. Implement the assigned tasks from the plan
87
- 4. Run existing tests — fix any regressions you introduce
88
- 5. Identify and remove dead code exposed by your changes
89
- 6. Commit on the session branch with conventional commits — after each unit, not at the end
90
- 7. Before reporting: `git status` — must be clean. `git log --oneline -5` — include in report
91
- 8. Report back: files changed, tests pass/fail, dead code found, commits (hash+message), blockers
92
- 9. Clean up the worktree (last step)
93
-
94
- ## Rules
95
- - Follow existing patterns in the codebase — consistency over preference
96
- - **NEVER run `git checkout` or `git switch` outside of `/tmp/`** — this would disrupt the main repo
97
- - **NEVER `cd` into `../[repo]` to work** — always use the `/tmp/` worktree
98
- - If you face an architectural decision NOT covered by the plan: **STOP and escalate**
99
- - Never guess on multi-tenant scoping or auth — escalate if unclear
100
- - Every new behavior needs at least one success test and one error test
101
-
102
124
  ## Memory
103
- Record useful findings about this repo:
104
- - Key file locations and architecture patterns
105
- - Test commands and configuration
106
- - Common pitfalls you encounter
125
+ Record: key file locations, architecture patterns, test commands, common pitfalls.