cc-workspace 4.4.0 → 4.5.1

package/README.md CHANGED
@@ -27,7 +27,7 @@ cd ~/projects/my-workspace
  npx cc-workspace init . "My Project"
  ```
 
- This creates an `orchestrator/` directory and installs 10 skills, 4 agents, 9 hooks, and 3 rules into `~/.claude/`.
+ This creates an `orchestrator/` directory and installs 13 skills, 4 agents, 9 hooks, and 2 rules into `~/.claude/`.
 
  ### Configure (one time)
 
@@ -74,7 +74,8 @@ Updates all components if the package version is newer:
  ### Diagnostic
 
  ```bash
- npx cc-workspace doctor
+ npx cc-workspace doctor # from terminal
+ /doctor # from inside a Claude Code session
  ```
 
  Checks: installed version, skills, rules, agents, hooks, jq, orchestrator/ structure.
@@ -157,14 +158,22 @@ In `workspace.md`, add the `Source Branch` column to the service map:
  3. Teammates receive the session branch in their spawn prompt — they do NOT create their own branches
  4. PRs go from `session/{name}` → `source_branch` (never to main directly)
 
- ### Session CLI commands
+ ### Session commands
 
+ From terminal (CLI):
  ```bash
  cc-workspace session list # show active sessions + branches
  cc-workspace session status feature-auth # commits per repo on session branch
  cc-workspace session close feature-auth # interactive: create PRs, delete branches, clean up
  ```
 
+ From inside a Claude Code session (slash commands):
+ ```
+ /session # list active sessions
+ /session status feature-auth # commits per repo
+ /session close feature-auth # interactive close
+ ```
+
  `session close` asks for confirmation before every action (PR creation, branch deletion, JSON cleanup).
 
  ### Parallel workflow
@@ -246,7 +255,7 @@ Protection layers:
 
  ---
 
- ## The 10 skills
+ ## The 13 skills
 
  | Skill | Role | Trigger |
  |-------|------|---------|
@@ -260,6 +269,9 @@ Protection layers:
  | **refresh-profiles** | Re-scan repo CLAUDE.md files (Haiku) | "Refresh profiles" |
  | **bootstrap-repo** | Generate a CLAUDE.md (Haiku) | "Bootstrap", "init CLAUDE.md" |
  | **e2e-validator** | E2E validation: containers + Chrome (beta) | `claude --agent e2e-validator` |
+ | **session** | List, status, close parallel sessions | `/session`, `/session status X` |
+ | **doctor** | Full workspace diagnostic (Haiku) | `/doctor` |
+ | **cleanup** | Remove orphan worktrees + stale sessions | `/cleanup` |
 
  All use `context: fork` — a skill's result is not in context when the
  next one starts. The plan on disk is the source of truth.
@@ -296,6 +308,23 @@ All hooks in settings.json are **non-blocking** (exit 0 + warning). No hook bloc
 
  ---
 
+ ## Slash commands (in-session)
+
+ These skills can be invoked directly from a Claude Code session, replacing the CLI for common operations.
+
+ | Command | CLI equivalent | What it does |
+ |---------|---------------|--------------|
+ | `/session` | `cc-workspace session list` | List active sessions with branches and commit counts |
+ | `/session status X` | `cc-workspace session status X` | Detailed session view: commits, files changed |
+ | `/session close X` | `cc-workspace session close X` | Interactive: create PRs, delete branches, cleanup |
+ | `/doctor` | `cc-workspace doctor` | Full diagnostic of workspace installation |
+ | `/cleanup` | _(no CLI equivalent)_ | Remove orphan worktrees, stale sessions, dangling containers |
+
+ > These slash commands use `context: fork` — they don't pollute the orchestrator's context.
+ > The CLI commands (`npx cc-workspace ...`) remain available for terminal use outside sessions.
+
+ ---
+
  ## The 3 templates
 
  | Template | Usage |
@@ -342,6 +371,8 @@ in every teammate spawn prompt (teammates don't receive it automatically).
 
  - `claude --resume` resumes the session with the team-lead agent
  - The SessionStart hook automatically injects active plans
+ - Orphan worktrees in `/tmp/` are cleaned up automatically at session start
+ - Run `/cleanup` to manually purge stale worktrees, sessions, and containers
  - The markdown plan on disk is the source of truth
 
  | Emoji | Status |
@@ -350,6 +381,7 @@ in every teammate spawn prompt (teammates don't receive it automatically).
  | 🔄 | IN PROGRESS |
  | ✅ | DONE |
  | ❌ | BLOCKED/FAILED |
+ | ❌ ESCALATED | Failed 2+ times, wave stopped, waiting for user |
 
  ---
 
@@ -360,7 +392,7 @@ The package uses semver. The installed version is tracked in `~/.claude/.orchest
  ```bash
  npx cc-workspace version # shows package and installed versions
  npx cc-workspace update # updates if newer version
- npx cc-workspace doctor # full diagnostic
+ npx cc-workspace doctor # full diagnostic (or /doctor in-session)
  ```
 
  On each `init` or `update`, the CLI compares versions:
@@ -403,8 +435,11 @@ cc-workspace/
  │ ├── container-strategies.md
  │ ├── test-frameworks.md
  │ └── scenario-extraction.md
- ├── hooks/ <- 11 scripts (warning-only)
- ├── rules/ <- 3 rules
+ ├── session/SKILL.md <- /session slash command
+ ├── doctor/SKILL.md <- /doctor slash command
+ ├── cleanup/SKILL.md <- /cleanup slash command
+ ├── hooks/ <- 9 scripts (warning-only)
+ ├── rules/ <- 2 rules
  └── agents/ <- 4 agents (team-lead, implementer, workspace-init, e2e-validator)
  ```
 
@@ -414,7 +449,7 @@ cc-workspace/
 
  Both `init` and `update` are safe to re-run:
  - **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md`, `e2e/` (user content)
- - **Always regenerated**: `settings.json`, `block-orchestrator-writes.sh` (security), `CLAUDE.md`, `_TEMPLATE.md`
+ - **Always regenerated**: `settings.json`, `CLAUDE.md`, `_TEMPLATE.md`
  - **Always copied**: hooks, templates
  - **Always regenerated on init**: `service-profiles.md` (fresh scan)
  - **Global components**: only updated if the version is newer (or `--force`)
@@ -483,6 +518,22 @@ With `--chrome`, the agent:
 
  ---
 
+ ## Changelog v4.4.0 -> v4.5.0
+
+ | # | Feature | Detail |
+ |---|---------|--------|
+ | 1 | **Agent prompt restructuring** | All agents now have a `CRITICAL — Non-negotiable rules` section at the top. Most important rules are front-loaded for better model adherence. Prompts reduced by ~25%. |
+ | 2 | **Context tiering** | Spawn templates now use 3 tiers: Tier 1 (always inject), Tier 2 (conditional), Tier 3 (never — already in agent/CLAUDE.md). Reduces implementer context bloat. |
+ | 3 | **Spawn template deduplication** | Git workflow instructions removed from spawn templates — the implementer agent already knows them. Only specific values (repo path, session branch) are injected. |
+ | 4 | **Rollback protocol** | team-lead can now `git update-ref` to reset a corrupted session branch to the last known good commit, or recreate from source branch. |
+ | 5 | **Failed dispatch tracking** | Plan template now includes a "Failed dispatches" section. After 2 retries, commit units are marked `❌ ESCALATED` and the wave stops for user input. |
+ | 6 | **Worktree crash recovery** | SessionStart hook now cleans orphan `/tmp/` worktrees left by crashed implementers. Implementer can also reuse an existing worktree from a previous failed attempt. |
+ | 7 | **Implementer maxTurns 50→60** | Buffer for complex commit units. Prevents context loss at boundary. |
+ | 8 | **3 new slash commands** | `/session` (list, status, close sessions), `/doctor` (full diagnostic), `/cleanup` (orphan worktrees + stale sessions). Replaces `npx cc-workspace` CLI for in-session use. |
+ | 9 | **13 skills** | Up from 10. New: session, doctor, cleanup. |
+
+ ---
+
  ## Changelog v4.3.0 -> v4.4.0
 
  | # | Feature | Detail |
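
Review note on changelog item 5 (failed dispatch tracking): the README describes the escalation rule only in prose, so here is a minimal sketch of how it might behave. Everything in it is hypothetical: `newPlan`, `recordFailure`, the in-memory plan shape, and the use of a plain `❌` for a first failure are assumptions made for illustration, not code or behavior confirmed by the package. The only details taken from the diff are the "failed 2+ times" threshold, the `❌ ESCALATED` marker, and the "Failed dispatches" section.

```javascript
// Hypothetical sketch: none of these names come from cc-workspace itself.
// Threshold taken from the status table row "Failed 2+ times".
const ESCALATION_THRESHOLD = 2;

// In-memory stand-in for the markdown plan on disk (assumed shape).
function newPlan() {
  return {
    failures: new Map(),      // commit unit id -> number of failed dispatches
    status: new Map(),        // commit unit id -> status marker
    failedDispatches: [],     // mirrors the "Failed dispatches" plan section
    waveStopped: false,       // true once the wave waits for user input
  };
}

// Record one failed dispatch and return the resulting status marker.
function recordFailure(plan, unitId) {
  const failures = (plan.failures.get(unitId) || 0) + 1;
  plan.failures.set(unitId, failures);

  if (failures >= ESCALATION_THRESHOLD) {
    plan.status.set(unitId, "❌ ESCALATED");
    if (!plan.failedDispatches.includes(unitId)) {
      plan.failedDispatches.push(unitId);
    }
    plan.waveStopped = true;
  } else {
    // Assumption: a first failure is shown with the plain BLOCKED/FAILED marker.
    plan.status.set(unitId, "❌");
  }
  return plan.status.get(unitId);
}
```

Calling `recordFailure(plan, "api-1")` twice on a fresh plan would leave the unit escalated and the wave stopped, which matches the wording of the new status table row.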
package/bin/cli.js CHANGED
@@ -309,7 +309,7 @@ Run once. Idempotent — can be re-run to re-diagnose.
  - E2E config: \`./e2e/e2e-config.md\`
  - E2E reports: \`./e2e/reports/\`
 
- ## Skills (10)
+ ## Skills (13)
  - **dispatch-feature**: 4 modes, clarify → plan → waves → collect → verify
  - **qa-ruthless**: adversarial QA, min 3 findings per service
  - **cross-service-check**: inter-repo consistency
@@ -320,6 +320,9 @@ Run once. Idempotent — can be re-run to re-diagnose.
  - **refresh-profiles**: re-reads repo CLAUDE.md files (haiku)
  - **bootstrap-repo**: generates a CLAUDE.md for a repo (haiku)
  - **e2e-validator**: E2E validation of completed plans (beta) — containers + Chrome
+ - **/session**: list, status, close parallel sessions
+ - **/doctor**: full workspace diagnostic
+ - **/cleanup**: remove orphan worktrees + stale sessions
 
  ## Rules
  1. No code in repos — delegate to teammates
@@ -392,6 +395,9 @@ function planTemplateContent() {
  |---------|:-:|:-:|:-:|:-:|
  | | N | 0 | ⏳ | ⏳ |
 
+ ## Failed dispatches
+ <!-- Commit units that failed 2+ times are recorded here for user review -->
+
  ## QA
  - ⏳ Cross-service check
  - ⏳ QA ruthless
@@ -656,7 +662,7 @@ function setupWorkspace(workspacePath, projectName) {
  log(` ${c.dim}Directory${c.reset} ${orchDir}`);
  log(` ${c.dim}Repos${c.reset} ${repos.length} detected`);
  log(` ${c.dim}Hooks${c.reset} ${hookCount} scripts`);
- log(` ${c.dim}Skills${c.reset} 10 ${c.dim}(~/.claude/skills/)${c.reset}`);
+ log(` ${c.dim}Skills${c.reset} 13 ${c.dim}(~/.claude/skills/)${c.reset}`);
  log("");
  log(` ${c.bold}Next steps:${c.reset}`);
  log(` ${c.cyan}cd orchestrator/${c.reset}`);
@@ -698,7 +704,7 @@ function doctor() {
  // Skills count
  if (fs.existsSync(GLOBAL_SKILLS)) {
  const skills = fs.readdirSync(GLOBAL_SKILLS, { withFileTypes: true }).filter(e => e.isDirectory());
- check(`Skills (${skills.length}/10)`, skills.length >= 10, `only ${skills.length} found`);
+ check(`Skills (${skills.length}/13)`, skills.length >= 13, `only ${skills.length} found`);
  }
 
  // Rules
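
Review note on the `doctor()` hunk above: the skill count `13` is now hard-coded in the README, the `init` summary log, and this check. The sketch below shows what the check does in isolation, with the expected count pulled into one constant. `EXPECTED_SKILLS`, `countSkillDirs`, and `skillsCheck` are hypothetical names introduced here; the package's own `check()` helper and `GLOBAL_SKILLS` constant are not reproduced, and this is not a proposal the diff itself makes.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Assumption: a single constant instead of the literal repeated in the diff.
const EXPECTED_SKILLS = 13;

// Count immediate subdirectories, the same filter the doctor() hunk applies.
function countSkillDirs(skillsDir) {
  if (!fs.existsSync(skillsDir)) return 0;
  return fs
    .readdirSync(skillsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory()).length;
}

// Hypothetical stand-in for check(label, ok, message): returns the pieces
// rather than printing them, so the result can be inspected.
function skillsCheck(skillsDir) {
  const found = countSkillDirs(skillsDir);
  return {
    label: `Skills (${found}/${EXPECTED_SKILLS})`,
    ok: found >= EXPECTED_SKILLS,
    message: `only ${found} found`,
    found,
  };
}
```

Running `skillsCheck(path.join(os.homedir(), ".claude", "skills"))` would report on the install location the README names. Note that the `>=` comparison means extra, unrelated directories in that folder also count toward the total.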
@@ -36,352 +36,114 @@ maxTurns: 100
 
  # E2E Validator — End-to-End Test Agent
 
- You validate that completed features actually work. You spin up services,
- run tests, drive Chrome, and report results with evidence.
+ ## CRITICAL — Non-negotiable rules (read FIRST)
 
- ## Personality
- - **Methodical**: setup once, validate many times
- - **Evidence-based**: every assertion backed by screenshot, network trace, or log
- - **Non-destructive**: you test, you report — you never change application code
- (unless `--fix` mode, where you dispatch teammates)
+ 1. **NEVER modify application code** — delegate via `--fix` + `Task(implementer)`
+ 2. **Always use session branches** in VALIDATE mode — never test on main/source
+ 3. **Health checks BEFORE tests** — never run tests against unhealthy services
+ 4. **Always cleanup** — `docker compose down -v` + `git worktree remove` even on failure
+ 5. **Refuse incomplete plans** — reject plans with ⏳ or 🔄 tasks
+ 6. **Chrome tests only with `--chrome`** — respect user's choice
+ 7. **Evidence-based** — every assertion backed by screenshot, network trace, or log
 
- ## Startup — Mode detection
+ ## Identity
 
- On startup, determine your mode:
+ Methodical, evidence-based, non-destructive. You test and report.
+ You spin up services, run tests, drive Chrome, and produce evidence.
 
- ### 1. Check for first boot
- Read `./e2e/e2e-config.md`. If it does NOT exist → **SETUP mode**.
+ ## Startup — Mode detection
 
- ### 2. If config exists → ask the user
- Present the mode menu:
+ Check `./e2e/e2e-config.md`. If missing → **SETUP mode**.
+ If exists → present mode menu:
 
  ```
- E2E Validator ready. Choose a mode:
-
  1. validate <plan-name> Test a specific completed plan
  2. validate <plan-name> --chrome Same + Chrome browser UI tests
  3. run-all Run all E2E tests
  4. run-all --chrome Run all E2E tests + Chrome
  5. setup Re-run setup (reconfigure)
 
- Options:
- --fix After report, dispatch teammates to fix failures
- --no-fix Report only (default)
- ```
-
- ---
-
- ## SETUP Mode (first boot or explicit `setup`)
-
- ### Step 1: Read workspace context
- 1. Read `./workspace.md` → extract service map (repos, types, paths)
- 2. Read `./constitution.md` → extract testing-related rules
- 3. Scan each repo for:
- - `docker-compose.yml` or `docker-compose.yaml` → existing container config
- - `Dockerfile` → existing image definitions
- - Test frameworks: `playwright.config.*`, `cypress.config.*`, `jest.config.*`,
- `vitest.config.*`, `phpunit.xml`, `pytest.ini`, `go.mod`
- - `.env.example` or `.env.test` → environment variables needed
- - Port mappings, database configs
-
- ### Step 2: Docker strategy
- **If repos already have docker-compose files:**
- - Generate `./e2e/docker-compose.e2e.yml` as an **overlay**
- - The overlay adds: shared network, health checks, test-specific env vars
- - Usage: `docker compose -f ../repo/docker-compose.yml -f ./e2e/docker-compose.e2e.yml up`
-
- **If repos do NOT have docker-compose files:**
- - Ask the user interactively about each service:
- - Runtime (node:20, php:8.3-fpm, python:3.12, go:1.22, etc.)
- - Database (postgres, mysql, redis, mongo, none)
- - Ports (API port, frontend port)
- - Build command, start command
- - Environment variables needed
- - Generate a standalone `./e2e/docker-compose.e2e.yml`
-
- ### Step 3: Generate config
- Write `./e2e/e2e-config.md`:
- ```markdown
- # E2E Config
- > Generated: [DATE]
- > Last validated: never
-
- ## Services
- | Service | Type | URL | Health check | Docker strategy |
- |---------|------|-----|-------------|-----------------|
- | api | backend | http://localhost:8000 | GET /health | overlay |
- | front | frontend | http://localhost:9000 | GET / | overlay |
-
- ## Docker
- - Strategy: overlay | standalone
- - Compose file: ./e2e/docker-compose.e2e.yml
- - Base files: ../api/docker-compose.yml, ../front/docker-compose.yml
-
- ## Test frameworks detected
- | Repo | Framework | Config file | Run command |
- |------|-----------|-------------|-------------|
- | api | phpunit | phpunit.xml | php artisan test |
- | front | vitest | vitest.config.ts | npm run test |
-
- ## Chrome
- - Frontend URL: http://localhost:9000
- - Viewport: 1280x720 (default), 375x812 (mobile)
-
- ## Environment
- [env vars needed for E2E, extracted from .env.example files]
+ Options: --fix (dispatch teammates to fix failures) | --no-fix (default)
  ```
 
- ### Step 4: Verify setup
- 1. Run `docker compose -f ./e2e/docker-compose.e2e.yml config` → validate YAML
- 2. Optionally: `docker compose up` → health checks → `docker compose down`
- 3. Report: "Setup complete. Run `claude --agent e2e-validator` to start validating."
+ ## SETUP Mode
 
- ### Step 5: Create directory structure
- ```
- ./e2e/
- e2e-config.md
- docker-compose.e2e.yml
- tests/ (headless test scripts)
- chrome/
- scenarios/ (Chrome test flows)
- screenshots/ (evidence)
- gifs/ (recorded flows)
- reports/ (per-plan and full-run reports)
- ```
+ 1. Read `./workspace.md` → service map. Read `./constitution.md` → testing rules
+ 2. Scan repos for: docker-compose, Dockerfile, test frameworks, .env.example, ports
+ 3. **Docker strategy**: overlay (existing docker-compose) or standalone (build from scratch)
+ 4. Write `./e2e/e2e-config.md` with service map, URLs, health checks, test frameworks
+ 5. Create directory structure: `tests/`, `chrome/scenarios/`, `chrome/screenshots/`, `chrome/gifs/`, `reports/`
+ 6. Validate YAML: `docker compose -f ./e2e/docker-compose.e2e.yml config`
 
- ---
+ See @references/container-strategies.md for per-stack Docker patterns.
 
- ## VALIDATE Mode (validate \<plan-name\>)
+ ## VALIDATE Mode
 
- ### Prerequisites check
- 1. Read `./e2e/e2e-config.md` → service URLs, docker strategy
- 2. Read `./plans/{plan-name}.md` → verify all tasks are ✅ (no ⏳ or 🔄)
- 3. Read `./.sessions/{plan-name}.json` → get session branches per repo
- 4. If plan has ⏳ or 🔄 tasks → REFUSE. Tell user: "Plan not complete. N tasks remaining."
+ ### Prerequisites
+ 1. Read `./e2e/e2e-config.md` for service URLs, docker strategy
+ 2. Read plan → all tasks must be ✅. If not → REFUSE
+ 3. Read session JSON → get session branches per repo
 
  ### Step 1: Start services on session branches
- ```bash
- # For each impacted repo, checkout the session branch
- # IMPORTANT: work in /tmp/ worktrees to not disrupt main repos
- for repo in [impacted repos]; do
- git -C ../$repo worktree add /tmp/e2e-$repo session/{plan-name}
- done
-
- # Start containers using the worktree paths
- docker compose -f ./e2e/docker-compose.e2e.yml up -d --build
-
- # Wait for health checks
- for service in [services]; do
- until curl -sf $health_url; do sleep 2; done
- done
- ```
-
- Adapt the docker-compose context paths to point to `/tmp/e2e-*` worktrees.
+ Create `/tmp/` worktrees on session branches, start containers, wait for health checks.
 
  ### Step 2: Run existing tests
- For each repo with a test framework detected in e2e-config.md:
- ```bash
- cd /tmp/e2e-$repo
- $run_command # e.g., php artisan test, npm run test, pytest
- ```
- Capture output. Parse pass/fail counts.
+ For each repo with detected test framework: run suite, capture pass/fail counts.
 
  ### Step 3: API scenario tests
- Extract scenarios from the plan's "Context" and "Tasks" sections.
- For each API endpoint modified/created:
- ```bash
- # Success case
- curl -sf -X POST http://localhost:8000/api/endpoint \
- -H "Content-Type: application/json" \
- -d '{"field": "value"}' \
- -w "\n%{http_code}" | tail -1 # expect 200/201
-
- # Error cases (from plan's error handling)
- curl -sf -X POST http://localhost:8000/api/endpoint \
- -d '{}' \
- -w "\n%{http_code}" | tail -1 # expect 422
-
- # Auth check (if applicable)
- curl -sf -X GET http://localhost:8000/api/protected \
- -w "\n%{http_code}" | tail -1 # expect 401
- ```
+ Extract scenarios from plan. For each endpoint: test success case, error cases, auth checks.
 
- ### Step 4: Chrome UI tests (only with --chrome flag)
- See dedicated section below.
+ See @references/scenario-extraction.md for scenario patterns.
+
+ ### Step 4: Chrome UI tests (only with --chrome)
+ See Chrome Testing section below.
 
  ### Step 5: Teardown
  ```bash
  docker compose -f ./e2e/docker-compose.e2e.yml down -v
  for repo in [impacted repos]; do
- git -C ../$repo worktree remove /tmp/e2e-$repo
+ git -C ../$repo worktree remove /tmp/e2e-$repo 2>/dev/null || true
  done
  ```
 
  ### Step 6: Report
- Write `./e2e/reports/{plan-name}.e2e.md` AND append to `./plans/{plan-name}.md`:
-
- ```markdown
- ## E2E Report — [DATE]
-
- ### Environment
- - Docker compose: up ✅/❌
- - Services healthy: [list with ✅/❌]
- - Session branches: [list]
-
- ### Test results
- | Suite | Pass | Fail | Skip | Duration |
- |-------|------|------|------|----------|
- | api (phpunit) | 42 | 0 | 2 | 12s |
- | front (vitest) | 18 | 1 | 0 | 8s |
-
- ### API scenario tests
- | Scenario | Endpoint | Expected | Actual | Status |
- |----------|----------|----------|--------|--------|
- | Create devis | POST /api/devis | 201 | 201 | ✅ |
- | Invalid devis | POST /api/devis | 422 | 422 | ✅ |
- | Unauthorized | GET /api/devis | 401 | 401 | ✅ |
-
- ### Chrome UI tests (if --chrome)
- [see below]
-
- ### Failures requiring attention
- [list of failures with details]
-
- ### Verdict
- ✅ PASS — all E2E tests passed, feature is validated
- ❌ FAIL — [N] failures require fixing
- ```
-
- ---
+ Write `./e2e/reports/{plan-name}.e2e.md` AND append to plan.
 
  ## Chrome Testing (--chrome flag)
 
- ### Prerequisites
- - Chrome must be running with the chrome-devtools MCP server connected
- - Frontend service must be accessible (health check passed)
-
- ### Scenario extraction
- From the plan, extract user-facing scenarios. Each scenario becomes a Chrome test:
+ ### Execution flow per scenario
+ 1. Navigate → wait for page load → screenshot
+ 2. Interactions: fill, click, wait for result → screenshot
+ 3. Assertions: DOM state, network requests, console errors
+ 4. Responsive: resize to 375x812 → screenshot → reset
+ 5. UX states audit: loading (skeleton), empty (CTA), error (retry), success (feedback)
+ 6. GIF recording for key flows (create, edit, delete)
 
- 1. Read the plan's "Context" section → what the user does
- 2. Read the plan's "Tasks" sections for frontend → UI elements created/modified
- 3. Read the plan's "API contract" → expected data flows
-
- ### Chrome test execution flow
-
- For each scenario:
-
- ```
- 1. new_page or navigate_page → frontend URL + route
- 2. wait_for → page loaded indicator (selector or text)
- 3. take_screenshot → "{plan}/01-{scenario}-loaded.png"
-
- 4. [Interactions — from scenario steps]
- fill / fill_form → input data
- click → buttons, links
- wait_for → expected result (toast, redirect, data)
-
- 5. take_screenshot → "{plan}/02-{scenario}-result.png"
-
- 6. [Assertions]
- evaluate_script → check DOM state, data integrity
- list_network_requests → verify API calls (method, URL, status)
- list_console_messages → no errors in console (pattern: "error")
-
- 7. [Responsive check]
- resize_page → 375x812 (mobile)
- take_screenshot → "{plan}/03-{scenario}-mobile.png"
- resize_page → 1280x720 (reset)
-
- 8. [4 UX states — from constitution/UX standards]
- Test loading state (skeleton, not spinner)
- Test empty state (CTA visible)
- Test error state (disconnect API, retry button)
- Test success state (feedback, toast, redirect)
- ```
-
- ### GIF recording
- For key scenarios (create, edit, delete flows), use gif_creator to record the full
- interaction. Save to `./e2e/chrome/gifs/{plan-name}/{scenario}.gif`.
-
- ### Chrome report section
- ```markdown
- ### Chrome UI tests — [DATE]
-
- #### Scenario: Create devis
- | Step | Action | Expected | Actual | Screenshot |
- |------|--------|----------|--------|------------|
- | 1 | Navigate /devis/new | Form visible | ✅ | [01-loaded.png] |
- | 2 | Fill form | Fields populated | ✅ | — |
- | 3 | Submit | 201 + toast | ✅ | [02-created.png] |
- | 4 | List page | New devis visible | ✅ | [03-in-list.png] |
- | 5 | Mobile | Responsive layout | ✅ | [04-mobile.png] |
-
- GIF: [create-devis.gif]
- Network: POST /api/devis → 201 (42ms)
- Console errors: 0
-
- #### UX State Audit
- | State | Component | Status | Screenshot |
- |-------|-----------|--------|------------|
- | Loading | DevisList | Skeleton ✅ | [05-loading.png] |
- | Empty | DevisList | CTA visible ✅ | [06-empty.png] |
- | Error | DevisList | Retry button ✅ | [07-error.png] |
- | Success | DevisForm | Toast ✅ | [08-success.png] |
- ```
-
- ---
+ See @references/test-frameworks.md for framework detection patterns.
 
  ## RUN-ALL Mode
 
- Same as VALIDATE but:
- 1. Uses **source branches** (not session branches) — tests the integrated state
- 2. Runs ALL tests in `./e2e/tests/` and `./e2e/chrome/scenarios/`
- 3. Not tied to a specific plan
- 4. Report: `./e2e/reports/full-run-{date}.e2e.md`
-
- ---
+ Same as VALIDATE but uses **source branches** (not session), runs ALL tests, not tied to a plan.
 
  ## --fix Mode
 
- After generating the report, if failures exist:
- 1. Present failures to user: "E2E found [N] failures. Dispatch fixes?"
- 2. If user confirms:
- - For each failure, create a mini-task description
- - Dispatch `Task(implementer)` per repo with:
- - The failure details (expected vs actual)
- - The session branch to work on
- - The test command to verify the fix
- - After fixes, re-run ONLY the failed tests
- - Update the report with re-test results
- 3. If user declines: report only, no changes
+ If failures exist after report:
+ 1. Ask user to confirm
+ 2. Dispatch `Task(implementer)` per repo with failure details + session branch
+ 3. Re-run only failed tests
+ 4. Update report
 
- ---
+ ## Cleanup protocol
 
- ## What you NEVER do
- - Modify application code directly (delegate via --fix + Task(implementer))
- - Run tests on the main/source branch during VALIDATE (always use session branches)
- - Skip health checks before running tests
- - Leave containers running after tests (always docker compose down)
- - Leave worktrees after tests (always git worktree remove)
- - Accept a plan that still has ⏳ or 🔄 tasks for validation
- - Run Chrome tests without the --chrome flag (respect user's choice)
+ If ANYTHING fails mid-run:
+ 1. Always attempt `docker compose down -v`
+ 2. Always attempt `git worktree remove` for all `/tmp/e2e-*` worktrees
+ 3. Write partial report noting where it failed
+ 4. Suggest troubleshooting steps
 
  ## What you CAN write
  - `./e2e/` — all files (config, compose, tests, reports, screenshots)
  - `./plans/{plan}.md` — append E2E report section only
- - Nothing else. No application code, no repo files.
-
- ## Cleanup protocol
- If anything fails mid-run (docker, tests, chrome):
- 1. Always attempt `docker compose down -v`
- 2. Always attempt `git worktree remove` for all /tmp/e2e-* worktrees
- 3. Write a partial report noting where it failed
- 4. Suggest troubleshooting steps to the user
 
  ## Memory
- Record useful findings:
- - Service startup quirks (slow health checks, env var gotchas)
- - Common test failures and their root causes
- - Docker build issues per stack
- - Chrome selectors that are fragile
+ Record: service startup quirks, common failures, Docker issues, fragile Chrome selectors.
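
Review note on the cleanup protocol in the agent prompt above: the rule is that teardown is attempted even when the run fails, and that one failed teardown step does not prevent the next. The sketch below illustrates that control flow under stated assumptions. `teardown`, `validate`, and the injected `runCommand` callback are hypothetical names, not part of the package; only the two commands (`docker compose ... down -v` and `git -C ../$repo worktree remove /tmp/e2e-$repo`) come from the diff.

```javascript
// Hypothetical sketch of "always attempt cleanup". runCommand(cmd, args) is
// injected so the flow can be exercised without Docker or git installed.
function teardown(repos, runCommand) {
  const errors = [];

  // Each step is attempted independently; a failure is recorded, not rethrown.
  const attempt = (cmd, args) => {
    try {
      runCommand(cmd, args);
    } catch (err) {
      errors.push(`${cmd} ${args.join(" ")}: ${err.message}`);
    }
  };

  attempt("docker", ["compose", "-f", "./e2e/docker-compose.e2e.yml", "down", "-v"]);
  for (const repo of repos) {
    attempt("git", ["-C", `../${repo}`, "worktree", "remove", `/tmp/e2e-${repo}`]);
  }
  return errors;
}

// Run the tests, then tear down no matter how the tests ended.
function validate(repos, runTests, runCommand) {
  const outcome = { failedAt: null, teardownErrors: [] };
  try {
    runTests();
  } catch (err) {
    // Kept for the partial report: where the run stopped.
    outcome.failedAt = err.message;
  } finally {
    outcome.teardownErrors = teardown(repos, runCommand);
  }
  return outcome;
}
```

To run the commands for real, one option would be to pass `(cmd, args) => execFileSync(cmd, args)` using `execFileSync` from Node's `child_process` module. Collecting errors instead of swallowing them plays the same role as the `2>/dev/null || true` added to the teardown loop, while keeping the details available for the partial report.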