cc-workspace 4.3.0 → 4.5.0
- package/README.md +125 -7
- package/bin/cli.js +39 -7
- package/global-skills/agents/e2e-validator.md +149 -0
- package/global-skills/agents/implementer.md +65 -46
- package/global-skills/agents/team-lead.md +122 -145
- package/global-skills/cleanup/SKILL.md +94 -0
- package/global-skills/dispatch-feature/SKILL.md +94 -58
- package/global-skills/dispatch-feature/references/anti-patterns.md +21 -16
- package/global-skills/dispatch-feature/references/spawn-templates.md +95 -148
- package/global-skills/doctor/SKILL.md +90 -0
- package/global-skills/e2e-validator/references/container-strategies.md +304 -0
- package/global-skills/e2e-validator/references/scenario-extraction.md +151 -0
- package/global-skills/e2e-validator/references/test-frameworks.md +207 -0
- package/global-skills/hooks/session-start-context.sh +38 -7
- package/global-skills/session/SKILL.md +79 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -27,7 +27,7 @@ cd ~/projects/my-workspace
|
|
|
27
27
|
npx cc-workspace init . "My Project"
|
|
28
28
|
```
|
|
29
29
|
|
|
30
|
-
This creates an `orchestrator/` directory and installs
|
|
30
|
+
This creates an `orchestrator/` directory and installs 13 skills, 4 agents, 9 hooks, and 2 rules into `~/.claude/`.
|
|
31
31
|
|
|
32
32
|
### Configure (one time)
|
|
33
33
|
|
|
@@ -47,7 +47,8 @@ The init agent will:
|
|
|
47
47
|
|
|
48
48
|
```bash
|
|
49
49
|
cd orchestrator/
|
|
50
|
-
claude --agent team-lead
|
|
50
|
+
claude --agent team-lead # orchestration sessions
|
|
51
|
+
claude --agent e2e-validator # E2E validation (beta)
|
|
51
52
|
```
|
|
52
53
|
|
|
53
54
|
The team-lead offers 4 modes:
|
|
@@ -68,7 +69,7 @@ npx cc-workspace update
|
|
|
68
69
|
Updates all components if the package version is newer:
|
|
69
70
|
- **Global**: skills, rules, agents in `~/.claude/`
|
|
70
71
|
- **Local** (if `orchestrator/` found): hooks, settings.json, CLAUDE.md, templates, _TEMPLATE.md
|
|
71
|
-
- **Never overwritten**: workspace.md, constitution.md, plans/
|
|
72
|
+
- **Never overwritten**: workspace.md, constitution.md, plans/, e2e/
|
|
72
73
|
|
|
73
74
|
### Diagnostic
|
|
74
75
|
|
|
@@ -95,6 +96,15 @@ my-workspace/
|
|
|
95
96
|
│ ├── workspace.md <- filled by workspace-init
|
|
96
97
|
│ ├── constitution.md <- filled by workspace-init
|
|
97
98
|
│ ├── .sessions/ <- session state (gitignored, created per session)
|
|
99
|
+
│ ├── e2e/ <- E2E test environment (beta)
|
|
100
|
+
│ │ ├── e2e-config.md <- agent memory (generated at first boot)
|
|
101
|
+
│ │ ├── docker-compose.e2e.yml <- generated at first boot
|
|
102
|
+
│ │ ├── tests/ <- headless API test scripts
|
|
103
|
+
│ │ ├── chrome/
|
|
104
|
+
│ │ │ ├── scenarios/ <- Chrome test flows per plan
|
|
105
|
+
│ │ │ ├── screenshots/ <- evidence
|
|
106
|
+
│ │ │ └── gifs/ <- recorded flows
|
|
107
|
+
│ │ └── reports/ <- per-plan E2E reports
|
|
98
108
|
│ ├── templates/
|
|
99
109
|
│ │ ├── workspace.template.md
|
|
100
110
|
│ │ ├── constitution.template.md
|
|
@@ -198,6 +208,7 @@ parallel in each repo via Agent Teams.
|
|
|
198
208
|
| **Teammates** | Sonnet 4.6 | Implement in an isolated worktree, test, commit. |
|
|
199
209
|
| **Explorers** | Haiku | Read-only. Scan, verify consistency. |
|
|
200
210
|
| **QA** | Sonnet 4.6 | Hostile mode. Min 3 problems found per service. |
|
|
211
|
+
| **E2E Validator** | Sonnet 4.6 | Containers + Chrome browser testing (beta). |
|
|
201
212
|
|
|
202
213
|
### The 4 session modes
|
|
203
214
|
|
|
@@ -235,7 +246,7 @@ Protection layers:
|
|
|
235
246
|
|
|
236
247
|
---
|
|
237
248
|
|
|
238
|
-
## The
|
|
249
|
+
## The 13 skills
|
|
239
250
|
|
|
240
251
|
| Skill | Role | Trigger |
|
|
241
252
|
|-------|------|---------|
|
|
@@ -248,19 +259,24 @@ Protection layers:
|
|
|
248
259
|
| **cycle-retrospective** | Post-cycle learning (Haiku) | "Retro", "retrospective" |
|
|
249
260
|
| **refresh-profiles** | Re-scan repo CLAUDE.md files (Haiku) | "Refresh profiles" |
|
|
250
261
|
| **bootstrap-repo** | Generate a CLAUDE.md (Haiku) | "Bootstrap", "init CLAUDE.md" |
|
|
262
|
+
| **e2e-validator** | E2E validation: containers + Chrome (beta) | `claude --agent e2e-validator` |
|
|
263
|
+
| **session** | List, status, close parallel sessions | `/session`, `/session status X` |
|
|
264
|
+
| **doctor** | Full workspace diagnostic (Haiku) | `/doctor` |
|
|
265
|
+
| **cleanup** | Remove orphan worktrees + stale sessions | `/cleanup` |
|
|
251
266
|
|
|
252
267
|
All use `context: fork` — a skill's result is not in context when the
|
|
253
268
|
next one starts. The plan on disk is the source of truth.
|
|
254
269
|
|
|
255
270
|
---
|
|
256
271
|
|
|
257
|
-
## The
|
|
272
|
+
## The 4 agents
|
|
258
273
|
|
|
259
274
|
| Agent | Model | Usage |
|
|
260
275
|
|-------|-------|-------|
|
|
261
276
|
| **team-lead** | Opus 4.6 | `claude --agent team-lead` — multi-service orchestration |
|
|
262
277
|
| **workspace-init** | Sonnet 4.6 | `claude --agent workspace-init` — diagnostic + initial config |
|
|
263
278
|
| **implementer** | Sonnet 4.6 | Task subagent with `isolation: worktree` — isolated implementation |
|
|
279
|
+
| **e2e-validator** | Sonnet 4.6 | `claude --agent e2e-validator` — E2E validation with containers + Chrome (beta) |
|
|
264
280
|
|
|
265
281
|
---
|
|
266
282
|
|
|
@@ -385,9 +401,14 @@ cc-workspace/
|
|
|
385
401
|
├── cycle-retrospective/SKILL.md
|
|
386
402
|
├── refresh-profiles/SKILL.md
|
|
387
403
|
├── bootstrap-repo/SKILL.md
|
|
404
|
+
├── e2e-validator/
|
|
405
|
+
│ └── references/
|
|
406
|
+
│ ├── container-strategies.md
|
|
407
|
+
│ ├── test-frameworks.md
|
|
408
|
+
│ └── scenario-extraction.md
|
|
388
409
|
├── hooks/ <- 11 scripts (warning-only)
|
|
389
410
|
├── rules/ <- 3 rules
|
|
390
|
-
└── agents/ <-
|
|
411
|
+
└── agents/ <- 4 agents (team-lead, implementer, workspace-init, e2e-validator)
|
|
391
412
|
```
|
|
392
413
|
|
|
393
414
|
---
|
|
@@ -395,7 +416,7 @@ cc-workspace/
|
|
|
395
416
|
## Idempotence
|
|
396
417
|
|
|
397
418
|
Both `init` and `update` are safe to re-run:
|
|
398
|
-
- **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md` (user content)
|
|
419
|
+
- **Never overwritten**: `workspace.md`, `constitution.md`, `plans/*.md`, `e2e/` (user content)
|
|
399
420
|
- **Always regenerated**: `settings.json`, `block-orchestrator-writes.sh` (security), `CLAUDE.md`, `_TEMPLATE.md`
|
|
400
421
|
- **Always copied**: hooks, templates
|
|
401
422
|
- **Always regenerated on init**: `service-profiles.md` (fresh scan)
|
|
@@ -403,6 +424,103 @@ Both `init` and `update` are safe to re-run:
|
|
|
403
424
|
|
|
404
425
|
---
|
|
405
426
|
|
|
427
|
+
## E2E Validator (beta)
|
|
428
|
+
|
|
429
|
+
A dedicated agent that validates completed plans by running services in containers
|
|
430
|
+
and testing scenarios — including Chrome browser-driven UI tests.
|
|
431
|
+
|
|
432
|
+
```bash
|
|
433
|
+
cd orchestrator/
|
|
434
|
+
claude --agent e2e-validator
|
|
435
|
+
```
|
|
436
|
+
|
|
437
|
+
### First boot — setup
|
|
438
|
+
|
|
439
|
+
On first boot (no `e2e/e2e-config.md`), the agent:
|
|
440
|
+
1. Reads `workspace.md` for repos and stacks
|
|
441
|
+
2. Scans repos for existing `docker-compose.yml` and test frameworks
|
|
442
|
+
3. If docker-compose exists: generates an overlay (`docker-compose.e2e.yml`)
|
|
443
|
+
4. If not: builds the config interactively with you
|
|
444
|
+
5. Writes `e2e/e2e-config.md` (its persistent memory)
|
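The first-boot decision above comes down to whether one file exists. A minimal sketch in Node.js, assuming a hypothetical helper name `detectE2eMode` (not part of cc-workspace; the agent makes this check from its prompt rather than from code like this):

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: returns "setup" when e2e/e2e-config.md is absent,
// otherwise "menu" (the agent would then offer its mode menu).
function detectE2eMode(orchDir) {
  const configPath = path.join(orchDir, "e2e", "e2e-config.md");
  return fs.existsSync(configPath) ? "menu" : "setup";
}

// Usage against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "e2e-mode-"));
console.log(detectE2eMode(demoDir)); // no config written yet
fs.mkdirSync(path.join(demoDir, "e2e"));
fs.writeFileSync(path.join(demoDir, "e2e", "e2e-config.md"), "# config\n");
console.log(detectE2eMode(demoDir)); // config now present
```

Keying the decision on the config file alone means that deleting `e2e/e2e-config.md` is enough to trigger setup again.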
|
445
|
+
|
|
446
|
+
### Modes
|
|
447
|
+
|
|
448
|
+
| Mode | Description |
|
|
449
|
+
|------|-------------|
|
|
450
|
+
| `validate <plan>` | Test a specific completed plan (API tests) |
|
|
451
|
+
| `validate <plan> --chrome` | Same + Chrome browser UI tests |
|
|
452
|
+
| `run-all` | Run all E2E tests (headless) |
|
|
453
|
+
| `run-all --chrome` | Run all E2E tests + Chrome |
|
|
454
|
+
| `setup` | Re-run first boot setup |
|
|
455
|
+
|
|
456
|
+
Add `--fix` to any mode to dispatch teammates for fixing failures.
|
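To make the mode grammar in the table concrete, here is a small sketch of how such a command string could be parsed. The function name `parseE2eCommand`, the returned object shape, and the plan name `feature-auth` are illustrative assumptions; the agent itself interprets these commands from its prompt.

```javascript
// Hypothetical parser for the mode strings listed in the table above.
const KNOWN_MODES = new Set(["validate", "run-all", "setup"]);

function parseE2eCommand(input) {
  const tokens = input.trim().split(/\s+/).filter(Boolean);
  const mode = tokens[0];
  if (!KNOWN_MODES.has(mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  const flags = tokens.filter((t) => t.startsWith("--"));
  const args = tokens.slice(1).filter((t) => !t.startsWith("--"));
  // Only `validate` takes a plan name, and it takes exactly one.
  if (mode === "validate" && args.length !== 1) {
    throw new Error("validate expects exactly one plan name");
  }
  return {
    mode,
    plan: mode === "validate" ? args[0] : null,
    chrome: flags.includes("--chrome"),
    fix: flags.includes("--fix"),
  };
}

console.log(parseE2eCommand("validate feature-auth --chrome --fix"));
```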
|
457
|
+
|
|
458
|
+
### How it works
|
|
459
|
+
|
|
460
|
+
1. Creates `/tmp/` worktrees on session branches (from the plan)
|
|
461
|
+
2. Starts services via `docker compose up`
|
|
462
|
+
3. Waits for health checks
|
|
463
|
+
4. Runs existing test suites + generates API scenario tests from the plan
|
|
464
|
+
5. With `--chrome`: drives Chrome via chrome-devtools MCP (navigate, fill forms,
|
|
465
|
+
click, take screenshots, record GIFs, check network requests and console)
|
|
466
|
+
6. Generates report with evidence (screenshots, GIFs, network traces)
|
|
467
|
+
7. Tears down containers and worktrees
|
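The sequence above can be pictured as an ordered list of shell commands. The sketch below only builds that list and does not execute anything. The function name, the input shape, the `/tmp/e2e-<repo>` worktree naming, and the `up -d` invocation are assumptions for illustration; health checks, tests, and reporting are left out.

```javascript
// Hypothetical planner: lists the shell commands one run might issue.
function planE2eCommands(branchesByRepo) {
  const compose = "docker compose -f ./e2e/docker-compose.e2e.yml";
  const repos = Object.keys(branchesByRepo);
  return [
    // Step 1: one worktree per repo, on that repo's session branch.
    ...repos.map((r) => `git -C ../${r} worktree add /tmp/e2e-${r} ${branchesByRepo[r]}`),
    // Step 2: start the services.
    `${compose} up -d`,
    // Steps 3-6 (health checks, tests, report) would run here.
    // Step 7: tear down containers, then worktrees.
    `${compose} down -v`,
    ...repos.map((r) => `git -C ../${r} worktree remove /tmp/e2e-${r}`),
  ];
}

console.log(planE2eCommands({ api: "session/feature-auth" }).join("\n"));
```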
|
468
|
+
|
|
469
|
+
### Chrome testing
|
|
470
|
+
|
|
471
|
+
With `--chrome`, the agent:
|
|
472
|
+
- Navigates the frontend in your real Chrome browser
|
|
473
|
+
- Plays user scenarios extracted from the plan
|
|
474
|
+
- Takes screenshots at each step as evidence
|
|
475
|
+
- Records GIFs of complete flows
|
|
476
|
+
- Checks the 4 mandatory UX states (loading, empty, error, success)
|
|
477
|
+
- Tests responsive layouts (mobile viewport)
|
|
478
|
+
- Verifies network requests match the API contract
|
|
479
|
+
- Checks console for errors
|
|
480
|
+
|
|
481
|
+
### Requirements
|
|
482
|
+
|
|
483
|
+
- **Docker** (docker compose v2)
|
|
484
|
+
- **Chrome** with chrome-devtools MCP server (for `--chrome` mode)
|
|
485
|
+
- Completed plan (all tasks ✅) with session branches
|
|
486
|
+
|
|
487
|
+
---
|
|
488
|
+
|
|
489
|
+
## Changelog v4.4.0 -> v4.5.0
|
|
490
|
+
|
|
491
|
+
| # | Feature | Detail |
|
|
492
|
+
|---|---------|--------|
|
|
493
|
+
| 1 | **Agent prompt restructuring** | All agents now have a `CRITICAL — Non-negotiable rules` section at the top. Most important rules are front-loaded for better model adherence. Prompts reduced by ~25%. |
|
|
494
|
+
| 2 | **Context tiering** | Spawn templates now use 3 tiers: Tier 1 (always inject), Tier 2 (conditional), Tier 3 (never — already in agent/CLAUDE.md). Reduces implementer context bloat. |
|
|
495
|
+
| 3 | **Spawn template deduplication** | Git workflow instructions removed from spawn templates — the implementer agent already knows them. Only specific values (repo path, session branch) are injected. |
|
|
496
|
+
| 4 | **Rollback protocol** | team-lead can now `git update-ref` to reset a corrupted session branch to the last known good commit, or recreate from source branch. |
|
|
497
|
+
| 5 | **Failed dispatch tracking** | Plan template now includes a "Failed dispatches" section. After 2 retries, commit units are marked `❌ ESCALATED` and the wave stops for user input. |
|
|
498
|
+
| 6 | **Worktree crash recovery** | SessionStart hook now cleans orphan `/tmp/` worktrees left by crashed implementers. Implementer can also reuse an existing worktree from a previous failed attempt. |
|
|
499
|
+
| 7 | **Implementer maxTurns 50→60** | Adds headroom for complex commit units, so an implementer is less likely to hit the turn limit (and lose its context) partway through a commit unit. |
|
|
500
|
+
| 8 | **3 new slash commands** | `/session` (list, status, close sessions), `/doctor` (full diagnostic), `/cleanup` (orphan worktrees + stale sessions). Replaces `npx cc-workspace` CLI for in-session use. |
|
|
501
|
+
| 9 | **13 skills** | Up from 10. New: session, doctor, cleanup. |
|
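Item 5 (failed dispatch tracking) can be read as a bounded retry loop. The sketch below is one possible interpretation, assuming "after 2 retries" means one initial attempt plus two retries; the function name, the `runOnce` callback, and the result shape are hypothetical and not taken from the package.

```javascript
// Hypothetical model of item 5: retry a commit unit up to MAX_RETRIES
// times after the first attempt, then mark it as escalated.
const MAX_RETRIES = 2;

function dispatchWithEscalation(unit, runOnce) {
  let attempts = 0;
  while (attempts <= MAX_RETRIES) {
    attempts += 1;
    // runOnce returns true when the implementer succeeded.
    if (runOnce(unit, attempts)) {
      return { unit, status: "✅", attempts };
    }
  }
  // The caller is expected to stop the wave and ask the user.
  return { unit, status: "❌ ESCALATED", attempts };
}

console.log(dispatchWithEscalation("api: add login endpoint", () => false));
```

Returning the attempt count alongside the status gives the caller what it needs to fill in the "Failed dispatches" section of the plan.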
|
502
|
+
|
|
503
|
+
---
|
|
504
|
+
|
|
505
|
+
## Changelog v4.3.0 -> v4.4.0
|
|
506
|
+
|
|
507
|
+
| # | Feature | Detail |
|
|
508
|
+
|---|---------|--------|
|
|
509
|
+
| 1 | **E2E Validator agent (beta)** | New `e2e-validator` agent: validates completed plans by running services in containers. Supports headless API tests and Chrome browser-driven UI tests with screenshots and GIF recording. |
|
|
510
|
+
| 2 | **Chrome testing mode** | `--chrome` flag drives the user's Chrome browser via chrome-devtools MCP. Navigates, fills forms, clicks, takes screenshots, records GIFs, checks network and console. |
|
|
511
|
+
| 3 | **E2E directory structure** | `orchestrator/e2e/` created during init/update. Contains docker-compose overlay, test scripts, Chrome scenarios, screenshots, GIFs, and reports. Never overwritten by updates. |
|
|
512
|
+
| 4 | **Container strategies** | Reference docs for overlay and standalone docker-compose patterns per stack (PHP, Node, Python, Go, Vue, React). |
|
|
513
|
+
| 5 | **Scenario extraction** | Reference doc for extracting testable E2E scenarios from completed plans (API endpoints, Chrome flows, UX states). |
|
|
514
|
+
| 6 | **5 modes** | `setup`, `validate`, `validate --chrome`, `run-all`, `run-all --chrome`. Optional `--fix` dispatches teammates. |
|
|
515
|
+
|
|
516
|
+
---
|
|
517
|
+
|
|
518
|
+
## Changelog v4.2.0 -> v4.3.0
|
|
519
|
+
|
|
520
|
+
> Minor improvements and bug fixes.
|
|
521
|
+
|
|
522
|
+
---
|
|
523
|
+
|
|
406
524
|
## Changelog v4.1.4 -> v4.2.0
|
|
407
525
|
|
|
408
526
|
| # | Feature | Detail |
|
package/bin/cli.js
CHANGED
|
@@ -283,6 +283,7 @@ You clarify, plan, delegate, track.
|
|
|
283
283
|
cd orchestrator/
|
|
284
284
|
claude --agent workspace-init # first time: diagnostic + config
|
|
285
285
|
claude --agent team-lead # work sessions
|
|
286
|
+
claude --agent e2e-validator # E2E validation of completed plans
|
|
286
287
|
\`\`\`
|
|
287
288
|
|
|
288
289
|
## Initialization (workspace-init)
|
|
@@ -305,8 +306,10 @@ Run once. Idempotent — can be re-run to re-diagnose.
|
|
|
305
306
|
- Service profiles: \`./plans/service-profiles.md\`
|
|
306
307
|
- Active plans: \`./plans/*.md\`
|
|
307
308
|
- Active sessions: \`./.sessions/*.json\`
|
|
309
|
+
- E2E config: \`./e2e/e2e-config.md\`
|
|
310
|
+
- E2E reports: \`./e2e/reports/\`
|
|
308
311
|
|
|
309
|
-
## Skills (
|
|
312
|
+
## Skills (13)
|
|
310
313
|
- **dispatch-feature**: 4 modes, clarify → plan → waves → collect → verify
|
|
311
314
|
- **qa-ruthless**: adversarial QA, min 3 findings per service
|
|
312
315
|
- **cross-service-check**: inter-repo consistency
|
|
@@ -316,6 +319,10 @@ Run once. Idempotent — can be re-run to re-diagnose.
|
|
|
316
319
|
- **cycle-retrospective**: post-cycle learning (haiku)
|
|
317
320
|
- **refresh-profiles**: re-reads repo CLAUDE.md files (haiku)
|
|
318
321
|
- **bootstrap-repo**: generates a CLAUDE.md for a repo (haiku)
|
|
322
|
+
- **e2e-validator**: E2E validation of completed plans (beta) — containers + Chrome
|
|
323
|
+
- **/session**: list, status, close parallel sessions
|
|
324
|
+
- **/doctor**: full workspace diagnostic
|
|
325
|
+
- **/cleanup**: remove orphan worktrees + stale sessions
|
|
319
326
|
|
|
320
327
|
## Rules
|
|
321
328
|
1. No code in repos — delegate to teammates
|
|
@@ -333,6 +340,7 @@ Run once. Idempotent — can be re-run to re-diagnose.
|
|
|
333
340
|
13. Retrospective cycle after each completed feature
|
|
334
341
|
14. Session branches for parallel isolation — teammates use session/{name}, never create own branches
|
|
335
342
|
15. Never \`git checkout -b\` in repos — use \`git branch\` (no checkout) to avoid disrupting parallel sessions
|
|
343
|
+
16. E2E validation via \`claude --agent e2e-validator\` after plans are complete
|
|
336
344
|
`;
|
|
337
345
|
}
|
|
338
346
|
|
|
@@ -387,6 +395,9 @@ function planTemplateContent() {
|
|
|
387
395
|
|---------|:-:|:-:|:-:|:-:|
|
|
388
396
|
| | N | 0 | ⏳ | ⏳ |
|
|
389
397
|
|
|
398
|
+
## Failed dispatches
|
|
399
|
+
<!-- Commit units that failed 2+ times are recorded here for user review -->
|
|
400
|
+
|
|
390
401
|
## QA
|
|
391
402
|
- ⏳ Cross-service check
|
|
392
403
|
- ⏳ QA ruthless
|
|
@@ -476,8 +487,19 @@ function updateLocal() {
|
|
|
476
487
|
ok(".sessions/ created");
|
|
477
488
|
}
|
|
478
489
|
|
|
479
|
-
// ──
|
|
480
|
-
|
|
490
|
+
// ── e2e/ (create if missing — never overwrite existing) ──
|
|
491
|
+
const e2eDir = path.join(orchDir, "e2e");
|
|
492
|
+
if (!fs.existsSync(e2eDir)) {
|
|
493
|
+
mkdirp(path.join(e2eDir, "tests"));
|
|
494
|
+
mkdirp(path.join(e2eDir, "chrome", "scenarios"));
|
|
495
|
+
mkdirp(path.join(e2eDir, "chrome", "screenshots"));
|
|
496
|
+
mkdirp(path.join(e2eDir, "chrome", "gifs"));
|
|
497
|
+
mkdirp(path.join(e2eDir, "reports"));
|
|
498
|
+
ok("e2e/ directory created");
|
|
499
|
+
}
|
|
500
|
+
|
|
501
|
+
// ── NEVER touch: workspace.md, constitution.md, plans/*.md, e2e/ ──
|
|
502
|
+
info(`${c.dim}workspace.md, constitution.md, plans/, e2e/ — preserved${c.reset}`);
|
|
481
503
|
|
|
482
504
|
return true;
|
|
483
505
|
}
|
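The create-if-missing behaviour of this hunk can be tried in isolation. The sketch below is a stand-in, not the package's code: it assumes `mkdirp` behaves like `fs.mkdirSync(dir, { recursive: true })` (its definition is not shown in this diff), and `ensureE2eDir` is a hypothetical name.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const E2E_SUBDIRS = [
  ["tests"],
  ["chrome", "scenarios"],
  ["chrome", "screenshots"],
  ["chrome", "gifs"],
  ["reports"],
];

// Create e2e/ only when it is absent; never touch an existing tree.
// Returns true when the tree was created, false when it was left alone.
function ensureE2eDir(orchDir) {
  const e2eDir = path.join(orchDir, "e2e");
  if (fs.existsSync(e2eDir)) return false;
  for (const parts of E2E_SUBDIRS) {
    fs.mkdirSync(path.join(e2eDir, ...parts), { recursive: true });
  }
  return true;
}

const orchDir = fs.mkdtempSync(path.join(os.tmpdir(), "orch-"));
console.log(ensureE2eDir(orchDir)); // first run creates the tree
console.log(ensureE2eDir(orchDir)); // second run leaves it alone
```

Because the guard tests the top-level `e2e/` directory, a partially populated tree is also left untouched, which fits the "never overwrite existing" comment in the hunk.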
|
@@ -493,6 +515,11 @@ function setupWorkspace(workspacePath, projectName) {
|
|
|
493
515
|
mkdirp(path.join(orchDir, "plans"));
|
|
494
516
|
mkdirp(path.join(orchDir, "templates"));
|
|
495
517
|
mkdirp(path.join(orchDir, ".sessions"));
|
|
518
|
+
mkdirp(path.join(orchDir, "e2e", "tests"));
|
|
519
|
+
mkdirp(path.join(orchDir, "e2e", "chrome", "scenarios"));
|
|
520
|
+
mkdirp(path.join(orchDir, "e2e", "chrome", "screenshots"));
|
|
521
|
+
mkdirp(path.join(orchDir, "e2e", "chrome", "gifs"));
|
|
522
|
+
mkdirp(path.join(orchDir, "e2e", "reports"));
|
|
496
523
|
ok("Structure created");
|
|
497
524
|
|
|
498
525
|
// ── Templates ──
|
|
@@ -575,7 +602,9 @@ function setupWorkspace(workspacePath, projectName) {
|
|
|
575
602
|
fs.writeFileSync(gi, [
|
|
576
603
|
".claude/bash-commands.log", ".claude/worktrees/", ".claude/modified-files.log",
|
|
577
604
|
".sessions/",
|
|
578
|
-
"plans/*.md", "!plans/_TEMPLATE.md", "!plans/service-profiles.md",
|
|
605
|
+
"plans/*.md", "!plans/_TEMPLATE.md", "!plans/service-profiles.md",
|
|
606
|
+
"e2e/chrome/screenshots/", "e2e/chrome/gifs/", "e2e/reports/",
|
|
607
|
+
"e2e/docker-compose.e2e.yml", "e2e/e2e-config.md", ""
|
|
579
608
|
].join("\n"));
|
|
580
609
|
ok(".gitignore");
|
|
581
610
|
}
|
|
@@ -633,13 +662,14 @@ function setupWorkspace(workspacePath, projectName) {
|
|
|
633
662
|
log(` ${c.dim}Directory${c.reset} ${orchDir}`);
|
|
634
663
|
log(` ${c.dim}Repos${c.reset} ${repos.length} detected`);
|
|
635
664
|
log(` ${c.dim}Hooks${c.reset} ${hookCount} scripts`);
|
|
636
|
-
log(` ${c.dim}Skills${c.reset}
|
|
665
|
+
log(` ${c.dim}Skills${c.reset} 13 ${c.dim}(~/.claude/skills/)${c.reset}`);
|
|
637
666
|
log("");
|
|
638
667
|
log(` ${c.bold}Next steps:${c.reset}`);
|
|
639
668
|
log(` ${c.cyan}cd orchestrator/${c.reset}`);
|
|
640
669
|
log(` ${c.cyan}claude --agent workspace-init${c.reset} ${c.dim}# first time: diagnostic + config${c.reset}`);
|
|
641
670
|
log(` ${c.dim} └─ type "go" to start the diagnostic${c.reset}`);
|
|
642
671
|
log(` ${c.cyan}claude --agent team-lead${c.reset} ${c.dim}# orchestration sessions${c.reset}`);
|
|
672
|
+
log(` ${c.cyan}claude --agent e2e-validator${c.reset} ${c.dim}# E2E validation (beta)${c.reset}`);
|
|
643
673
|
if (reposWithoutClaude.length > 0) {
|
|
644
674
|
log("");
|
|
645
675
|
warn(`${reposWithoutClaude.length} repo(s) without CLAUDE.md: ${c.bold}${reposWithoutClaude.join(", ")}${c.reset}`);
|
|
@@ -674,7 +704,7 @@ function doctor() {
|
|
|
674
704
|
// Skills count
|
|
675
705
|
if (fs.existsSync(GLOBAL_SKILLS)) {
|
|
676
706
|
const skills = fs.readdirSync(GLOBAL_SKILLS, { withFileTypes: true }).filter(e => e.isDirectory());
|
|
677
|
-
check(`Skills (${skills.length}/
|
|
707
|
+
check(`Skills (${skills.length}/13)`, skills.length >= 13, `only ${skills.length} found`);
|
|
678
708
|
}
|
|
679
709
|
|
|
680
710
|
// Rules
|
|
@@ -683,7 +713,7 @@ function doctor() {
|
|
|
683
713
|
}
|
|
684
714
|
|
|
685
715
|
// Agents
|
|
686
|
-
for (const a of ["team-lead.md", "implementer.md", "workspace-init.md"]) {
|
|
716
|
+
for (const a of ["team-lead.md", "implementer.md", "workspace-init.md", "e2e-validator.md"]) {
|
|
687
717
|
check(`Agent: ${a}`, fs.existsSync(path.join(GLOBAL_AGENTS, a)), "missing");
|
|
688
718
|
}
|
|
689
719
|
|
|
@@ -706,6 +736,7 @@ function doctor() {
|
|
|
706
736
|
check("templates/", fs.existsSync(path.join(cwd, "templates")), "missing");
|
|
707
737
|
check(".claude/hooks/", fs.existsSync(path.join(cwd, ".claude", "hooks")), "missing");
|
|
708
738
|
check(".sessions/", fs.existsSync(path.join(cwd, ".sessions")), "missing — run: npx cc-workspace update");
|
|
739
|
+
check("e2e/", fs.existsSync(path.join(cwd, "e2e")), "missing — run: npx cc-workspace update");
|
|
709
740
|
const configured = !fs.readFileSync(path.join(cwd, "workspace.md"), "utf8").includes("[UNCONFIGURED]");
|
|
710
741
|
check("workspace.md configured", configured, "[UNCONFIGURED] — run: claude --agent workspace-init");
|
|
711
742
|
} else if (hasOrch) {
|
|
@@ -842,6 +873,7 @@ switch (command) {
|
|
|
842
873
|
log(` ${c.cyan}claude --agent workspace-init${c.reset} ${c.dim}# first time${c.reset}`);
|
|
843
874
|
log(` ${c.dim} └─ type "go" to start the diagnostic${c.reset}`);
|
|
844
875
|
log(` ${c.cyan}claude --agent team-lead${c.reset} ${c.dim}# work sessions${c.reset}`);
|
|
876
|
+
log(` ${c.cyan}claude --agent e2e-validator${c.reset} ${c.dim}# E2E validation (beta)${c.reset}`);
|
|
845
877
|
log("");
|
|
846
878
|
break;
|
|
847
879
|
}
|
|
@@ -0,0 +1,149 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: e2e-validator
|
|
3
|
+
description: >
|
|
4
|
+
E2E validation agent for completed plans. On first boot, sets up the E2E
|
|
5
|
+
environment (docker-compose, test config). On subsequent boots, validates
|
|
6
|
+
completed plans by running services in containers and testing scenarios.
|
|
7
|
+
Supports headless API tests and Chrome browser-driven UI tests.
|
|
8
|
+
Triggered via claude --agent e2e-validator.
|
|
9
|
+
model: sonnet
|
|
10
|
+
tools: >
|
|
11
|
+
Read, Write, Edit, Bash, Glob, Grep,
|
|
12
|
+
Task(implementer, Explore),
|
|
13
|
+
mcp__chrome-devtools__navigate_page,
|
|
14
|
+
mcp__chrome-devtools__click,
|
|
15
|
+
mcp__chrome-devtools__fill,
|
|
16
|
+
mcp__chrome-devtools__fill_form,
|
|
17
|
+
mcp__chrome-devtools__take_screenshot,
|
|
18
|
+
mcp__chrome-devtools__evaluate_script,
|
|
19
|
+
mcp__chrome-devtools__list_network_requests,
|
|
20
|
+
mcp__chrome-devtools__list_console_messages,
|
|
21
|
+
mcp__chrome-devtools__get_console_message,
|
|
22
|
+
mcp__chrome-devtools__get_network_request,
|
|
23
|
+
mcp__chrome-devtools__resize_page,
|
|
24
|
+
mcp__chrome-devtools__hover,
|
|
25
|
+
mcp__chrome-devtools__press_key,
|
|
26
|
+
mcp__chrome-devtools__type_text,
|
|
27
|
+
mcp__chrome-devtools__wait_for,
|
|
28
|
+
mcp__chrome-devtools__new_page,
|
|
29
|
+
mcp__chrome-devtools__select_page,
|
|
30
|
+
mcp__chrome-devtools__take_snapshot,
|
|
31
|
+
mcp__chrome-devtools__list_pages,
|
|
32
|
+
mcp__chrome-devtools__gif_creator
|
|
33
|
+
memory: project
|
|
34
|
+
maxTurns: 100
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
# E2E Validator — End-to-End Test Agent
|
|
38
|
+
|
|
39
|
+
## CRITICAL — Non-negotiable rules (read FIRST)
|
|
40
|
+
|
|
41
|
+
1. **NEVER modify application code** — delegate via `--fix` + `Task(implementer)`
|
|
42
|
+
2. **Always use session branches** in VALIDATE mode — never test on main/source
|
|
43
|
+
3. **Health checks BEFORE tests** — never run tests against unhealthy services
|
|
44
|
+
4. **Always cleanup** — `docker compose down -v` + `git worktree remove` even on failure
|
|
45
|
+
5. **Refuse incomplete plans** — reject plans with ⏳ or 🔄 tasks
|
|
46
|
+
6. **Chrome tests only with `--chrome`** — respect user's choice
|
|
47
|
+
7. **Evidence-based** — every assertion backed by screenshot, network trace, or log
|
|
48
|
+
|
|
49
|
+
## Identity
|
|
50
|
+
|
|
51
|
+
Methodical, evidence-based, non-destructive. You test and report.
|
|
52
|
+
You spin up services, run tests, drive Chrome, and produce evidence.
|
|
53
|
+
|
|
54
|
+
## Startup — Mode detection
|
|
55
|
+
|
|
56
|
+
Check `./e2e/e2e-config.md`. If missing → **SETUP mode**.
|
|
57
|
+
If exists → present mode menu:
|
|
58
|
+
|
|
59
|
+
```
|
|
60
|
+
1. validate <plan-name> Test a specific completed plan
|
|
61
|
+
2. validate <plan-name> --chrome Same + Chrome browser UI tests
|
|
62
|
+
3. run-all Run all E2E tests
|
|
63
|
+
4. run-all --chrome Run all E2E tests + Chrome
|
|
64
|
+
5. setup Re-run setup (reconfigure)
|
|
65
|
+
|
|
66
|
+
Options: --fix (dispatch teammates to fix failures) | --no-fix (default)
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
## SETUP Mode
|
|
70
|
+
|
|
71
|
+
1. Read `./workspace.md` → service map. Read `./constitution.md` → testing rules
|
|
72
|
+
2. Scan repos for: docker-compose, Dockerfile, test frameworks, .env.example, ports
|
|
73
|
+
3. **Docker strategy**: overlay (existing docker-compose) or standalone (build from scratch)
|
|
74
|
+
4. Write `./e2e/e2e-config.md` with service map, URLs, health checks, test frameworks
|
|
75
|
+
5. Create directory structure: `tests/`, `chrome/scenarios/`, `chrome/screenshots/`, `chrome/gifs/`, `reports/`
|
|
76
|
+
6. Validate YAML: `docker compose -f ./e2e/docker-compose.e2e.yml config`
|
|
77
|
+
|
|
78
|
+
See @references/container-strategies.md for per-stack Docker patterns.
|
|
79
|
+
|
|
80
|
+
## VALIDATE Mode
|
|
81
|
+
|
|
82
|
+
### Prerequisites
|
|
83
|
+
1. Read `./e2e/e2e-config.md` for service URLs, docker strategy
|
|
84
|
+
2. Read plan → all tasks must be ✅. If not → REFUSE
|
|
85
|
+
3. Read session JSON → get session branches per repo
|
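Prerequisite 2 can be approximated with a plain text scan. This sketch assumes a plan is a Markdown string and that any line containing ⏳ or 🔄 counts as unfinished. That may be stricter than the agent's own reading of the plan, and `findUnfinishedTasks` is a hypothetical name.

```javascript
// Hypothetical check: collect every plan line that still carries a
// pending (⏳) or in-progress (🔄) marker. An empty result means the
// plan would pass the "all tasks must be ✅" prerequisite.
function findUnfinishedTasks(planMarkdown) {
  return planMarkdown
    .split("\n")
    .filter((line) => line.includes("⏳") || line.includes("🔄"))
    .map((line) => line.trim());
}

const plan = ["| api | ✅ |", "| web | 🔄 |", "- ⏳ QA ruthless"].join("\n");
const unfinished = findUnfinishedTasks(plan);
console.log(unfinished.length === 0 ? "OK to validate" : "REFUSE", unfinished);
```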
|
86
|
+
|
|
87
|
+
### Step 1: Start services on session branches
|
|
88
|
+
Create `/tmp/` worktrees on session branches, start containers, wait for health checks.
|
|
89
|
+
|
|
90
|
+
### Step 2: Run existing tests
|
|
91
|
+
For each repo with detected test framework: run suite, capture pass/fail counts.
|
|
92
|
+
|
|
93
|
+
### Step 3: API scenario tests
|
|
94
|
+
Extract scenarios from plan. For each endpoint: test success case, error cases, auth checks.
|
|
95
|
+
|
|
96
|
+
See @references/scenario-extraction.md for scenario patterns.
|
|
97
|
+
|
|
98
|
+
### Step 4: Chrome UI tests (only with --chrome)
|
|
99
|
+
See Chrome Testing section below.
|
|
100
|
+
|
|
101
|
+
### Step 5: Teardown
|
|
102
|
+
```bash
|
|
103
|
+
docker compose -f ./e2e/docker-compose.e2e.yml down -v
|
|
104
|
+
for repo in [impacted repos]; do
|
|
105
|
+
git -C ../$repo worktree remove /tmp/e2e-$repo 2>/dev/null || true
|
|
106
|
+
done
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
### Step 6: Report
|
|
110
|
+
Write `./e2e/reports/{plan-name}.e2e.md` AND append to plan.
|
|
111
|
+
|
|
112
|
+
## Chrome Testing (--chrome flag)
|
|
113
|
+
|
|
114
|
+
### Execution flow per scenario
|
|
115
|
+
1. Navigate → wait for page load → screenshot
|
|
116
|
+
2. Interactions: fill, click, wait for result → screenshot
|
|
117
|
+
3. Assertions: DOM state, network requests, console errors
|
|
118
|
+
4. Responsive: resize to 375x812 → screenshot → reset
|
|
119
|
+
5. UX states audit: loading (skeleton), empty (CTA), error (retry), success (feedback)
|
|
120
|
+
6. GIF recording for key flows (create, edit, delete)
|
|
121
|
+
|
|
122
|
+
See @references/test-frameworks.md for framework detection patterns.
|
|
123
|
+
|
|
124
|
+
## RUN-ALL Mode
|
|
125
|
+
|
|
126
|
+
Same as VALIDATE, but uses **source branches** (not session branches), runs ALL tests, and is not tied to a plan.
|
|
127
|
+
|
|
128
|
+
## --fix Mode
|
|
129
|
+
|
|
130
|
+
If failures exist after report:
|
|
131
|
+
1. Ask user to confirm
|
|
132
|
+
2. Dispatch `Task(implementer)` per repo with failure details + session branch
|
|
133
|
+
3. Re-run only failed tests
|
|
134
|
+
4. Update report
|
|
135
|
+
|
|
136
|
+
## Cleanup protocol
|
|
137
|
+
|
|
138
|
+
If ANYTHING fails mid-run:
|
|
139
|
+
1. Always attempt `docker compose down -v`
|
|
140
|
+
2. Always attempt `git worktree remove` for all `/tmp/e2e-*` worktrees
|
|
141
|
+
3. Write partial report noting where it failed
|
|
142
|
+
4. Suggest troubleshooting steps
|
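The "always attempt cleanup" rule maps naturally onto `try`/`finally`. In the sketch below, `steps` and `cleanups` are arrays of caller-supplied functions. The runner name and the report shape are assumptions for illustration, and real cleanups would shell out to `docker compose down -v` and `git worktree remove`.

```javascript
// Hypothetical runner: every cleanup is attempted even if a step
// or an earlier cleanup throws; the report notes where the run failed.
function runWithCleanup(steps, cleanups) {
  const report = { completed: 0, failedAt: null, error: null, cleanupErrors: [] };
  try {
    for (const step of steps) {
      step();
      report.completed += 1;
    }
  } catch (err) {
    report.failedAt = report.completed; // index of the failing step
    report.error = err.message;
  } finally {
    for (const cleanup of cleanups) {
      try {
        cleanup();
      } catch (err) {
        report.cleanupErrors.push(err.message);
      }
    }
  }
  return report;
}

console.log(
  runWithCleanup(
    [() => {}, () => { throw new Error("health check timed out"); }],
    [() => {}, () => {}]
  )
);
```

Collecting cleanup errors instead of rethrowing them keeps one failed teardown from hiding the others, and gives the partial report something concrete to record.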
|
143
|
+
|
|
144
|
+
## What you CAN write
|
|
145
|
+
- `./e2e/` — all files (config, compose, tests, reports, screenshots)
|
|
146
|
+
- `./plans/{plan}.md` — append E2E report section only
|
|
147
|
+
|
|
148
|
+
## Memory
|
|
149
|
+
Record: service startup quirks, common failures, Docker issues, fragile Chrome selectors.
|
|
@@ -8,7 +8,7 @@ description: >
|
|
|
8
8
|
model: sonnet
|
|
9
9
|
tools: Read, Write, Edit, MultiEdit, Bash, Glob, Grep
|
|
10
10
|
memory: project
|
|
11
|
-
maxTurns:
|
|
11
|
+
maxTurns: 60
|
|
12
12
|
hooks:
|
|
13
13
|
PreToolUse:
|
|
14
14
|
- matcher: Bash
|
|
@@ -31,76 +31,95 @@ hooks:
|
|
|
31
31
|
timeout: 5
|
|
32
32
|
---
|
|
33
33
|
|
|
34
|
-
# Implementer —
|
|
34
|
+
# Implementer — Single-Commit Teammate
|
|
35
35
|
|
|
36
|
-
|
|
36
|
+
## CRITICAL — Non-negotiable rules (read FIRST)
|
|
37
37
|
|
|
38
|
-
|
|
38
|
+
1. **ONE commit unit = your entire scope** — do NOT implement other tasks from the plan
|
|
39
|
+
2. **ALWAYS commit before cleanup** — uncommitted work is LOST when worktree is removed
|
|
40
|
+
3. **NEVER `git checkout` outside `/tmp/`** — this disrupts the main repo
|
|
41
|
+
4. **NEVER `cd` into `../[repo]`** — always use the `/tmp/` worktree
|
|
42
|
+
5. **Escalate architectural decisions** not covered by the plan — STOP and report
|
|
43
|
+
6. **Every new behavior needs tests** — at least one success test and one error test
|
|
44
|
+
7. **Read the repo's CLAUDE.md FIRST** — follow its conventions strictly
|
|
39
45
|
|
|
40
|
-
|
|
41
|
-
changes from the main working directory. If you don't commit, YOUR WORK IS LOST.
|
|
46
|
+
## Identity
|
|
42
47
|
|
|
43
|
-
|
|
48
|
+
You are a focused implementer. One mission, one commit.
|
|
49
|
+
The team-lead spawns one implementer per commit unit in the plan.
|
|
50
|
+
Previous commits are already on the session branch — you'll see them in your worktree.
|
|
44
51
|
|
|
45
|
-
|
|
46
|
-
Example: repo=`../prism`, branch=`session/feature-auth`.
|
|
52
|
+
## Git workflow (do this FIRST)
|
|
47
53
|
|
|
54
|
+
You work in a **temporary worktree**. If you don't commit, YOUR WORK IS LOST.
|
|
55
|
+
|
|
56
|
+
### Setup
|
|
48
57
|
```bash
|
|
49
|
-
# 1. Create
|
|
58
|
+
# 1. Create worktree (or reuse if previous attempt left one)
|
|
50
59
|
git -C ../[repo] worktree add /tmp/[repo]-[session] session/[branch]
|
|
60
|
+
# If this fails with "already checked out", a previous crash left a worktree behind
|
|
61
|
+
# → cd /tmp/[repo]-[session] && git status to assess state
|
|
51
62
|
|
|
52
|
-
# 2. Move into
|
|
63
|
+
# 2. Move into worktree — ALL work happens here
|
|
53
64
|
cd /tmp/[repo]-[session]
|
|
54
65
|
|
|
55
|
-
# 3. Verify
|
|
66
|
+
# 3. Verify branch
|
|
56
67
|
git branch --show-current # must show session/[branch]
|
|
68
|
+
|
|
69
|
+
# 4. Check existing commits from previous implementers
|
|
70
|
+
git log --oneline -5
|
|
57
71
|
```
|
|
58
72
|
|
|
59
|
-
If
|
|
73
|
+
If the session branch doesn't exist:
|
|
60
74
|
```bash
|
|
61
75
|
git -C ../[repo] branch session/[branch] [source-branch]
|
|
62
76
|
git -C ../[repo] worktree add /tmp/[repo]-[session] session/[branch]
|
|
63
77
|
```
|
|
64
78
|
|
|
65
|
-
###
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
79
|
+
### Recovering from a previous failed attempt
|
|
80
|
+
If `git worktree add` fails because the worktree already exists:
|
|
81
|
+
1. `cd /tmp/[repo]-[session]` — enter the existing worktree
|
|
82
|
+
2. `git status` — check for uncommitted changes from the previous implementer
|
|
83
|
+
3. `git log --oneline -3` — check if the previous attempt committed anything
|
|
84
|
+
4. If changes exist but aren't committed: assess whether they're useful, then commit or discard them
|
|
85
|
+
5. If clean: proceed normally with your commit unit
|
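Steps 2, 4, and 5 of the recovery list can be sketched as a small triage over the text printed by `git status --porcelain`, where empty output means a clean tree. The function name and the action labels are hypothetical, and deciding whether leftover changes are useful remains a judgment call for the implementer.

```javascript
// Hypothetical triage of an existing worktree from porcelain output.
// Each non-empty line has the form "XY path": two status characters,
// one space, then the path.
function triageWorktree(porcelainOutput) {
  const changed = porcelainOutput
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => line.slice(3));
  return changed.length === 0
    ? { action: "proceed", files: [] }
    : { action: "review-uncommitted", files: changed };
}

console.log(triageWorktree(" M src/auth.js\n?? notes.txt\n"));
console.log(triageWorktree(""));
```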
|
86
|
+
|
|
87
|
+
## Workflow
|
|
88
|
+
|
|
89
|
+
### Phase 1: Setup
|
|
90
|
+
1. Create worktree (see above)
|
|
91
|
+
2. Read the repo's CLAUDE.md — follow its conventions
|
|
92
|
+
3. `git log --oneline -5` to see previous implementers' work
|
|
93
|
+
|
|
94
|
+
### Phase 2: Implement YOUR commit unit
|
|
95
|
+
1. Implement ONLY the tasks described in your commit unit
|
|
96
|
+
2. Run tests — fix regressions you introduce
|
|
97
|
+
3. Identify dead code exposed by your changes
|
|
69
98
|
|
|
70
|
-
###
|
|
99
|
+
### Phase 3: Commit (MANDATORY)
|
|
71
100
|
```bash
|
|
72
|
-
|
|
73
|
-
git
|
|
74
|
-
|
|
75
|
-
|
|
101
|
+
git add [files]
|
|
102
|
+
git commit -m "feat(domain): description"
|
|
103
|
+
|
|
104
|
+
# VERIFY — your commit MUST appear
|
|
105
|
+
git log --oneline -3
|
|
106
|
+
git status # must be clean
|
|
76
107
|
```
|
|
77
108
|
|
|
78
|
-
|
|
109
|
+
If the change exceeds 300 lines, split it into multiple commits (data → logic → API/UI layer).
|
|
110
|
+
|
|
111
|
+
### Phase 4: Report and cleanup
|
|
112
|
+
Report:
|
|
113
|
+
- Commit(s): hash + message
|
|
114
|
+
- Files created/modified (count)
|
|
115
|
+
- Tests: pass/fail
|
|
116
|
+
- Dead code found
|
|
117
|
+
- Blockers or escalations
|
|
118
|
+
|
|
119
|
+
Cleanup:
|
|
79
120
|
```bash
|
|
80
121
|
git -C ../[repo] worktree remove /tmp/[repo]-[session]
|
|
81
122
|
```
|
|
82
123
|
|
|
83
|
-
## Workflow
|
|
84
|
-
1. Set up the worktree (see Git workflow above)
|
|
85
|
-
2. Read the repo's CLAUDE.md — follow its conventions strictly
|
|
86
|
-
3. Implement the assigned tasks from the plan
|
|
87
|
-
4. Run existing tests — fix any regressions you introduce
|
|
88
|
-
5. Identify and remove dead code exposed by your changes
|
|
89
|
-
6. Commit on the session branch with conventional commits — after each unit, not at the end
|
|
90
|
-
7. Before reporting: `git status` — must be clean. `git log --oneline -5` — include in report
|
|
91
|
-
8. Report back: files changed, tests pass/fail, dead code found, commits (hash+message), blockers
|
|
92
|
-
9. Clean up the worktree (last step)
|
|
93
|
-
|
|
94
|
-
## Rules
|
|
95
|
-
- Follow existing patterns in the codebase — consistency over preference
|
|
96
|
-
- **NEVER run `git checkout` or `git switch` outside of `/tmp/`** — this would disrupt the main repo
|
|
97
|
-
- **NEVER `cd` into `../[repo]` to work** — always use the `/tmp/` worktree
|
|
98
|
-
- If you face an architectural decision NOT covered by the plan: **STOP and escalate**
|
|
99
|
-
- Never guess on multi-tenant scoping or auth — escalate if unclear
|
|
100
|
-
- Every new behavior needs at least one success test and one error test
|
|
101
|
-
|
|
102
124
|
## Memory
|
|
103
|
-
Record
|
|
104
|
-
- Key file locations and architecture patterns
|
|
105
|
-
- Test commands and configuration
|
|
106
|
-
- Common pitfalls you encounter
|
|
125
|
+
Record: key file locations, architecture patterns, test commands, common pitfalls.
|