buildanything 1.2.1 → 1.6.0
- package/.claude-plugin/plugin.json +1 -1
- package/agents/design-ui-designer.md +28 -0
- package/agents/design-ux-architect.md +10 -0
- package/commands/build.md +463 -324
- package/commands/protocols/brainstorm.md +99 -0
- package/commands/protocols/build-fix.md +52 -0
- package/commands/protocols/cleanup.md +56 -0
- package/commands/protocols/design.md +287 -0
- package/commands/protocols/eval-harness.md +62 -0
- package/commands/protocols/metric-loop.md +94 -0
- package/commands/protocols/planning.md +56 -0
- package/commands/protocols/verify.md +63 -0
- package/hooks/hooks.json +2 -2
- package/hooks/session-start +65 -8
- package/package.json +1 -1
package/commands/build.md CHANGED
@@ -1,464 +1,603 @@
 ---
-description: "Full product build pipeline
-argument-hint: "
+description: "Full product build pipeline — orchestrates specialist agents through brainstorming, research, architecture, implementation, testing, hardening, and shipping"
+argument-hint: "Describe what to build, or path to a design doc. --autonomous for unattended mode. --resume to continue a previous build."
 ---
|
|
5
5
|
|
|
6
|
-
|
|
6
|
+
<HARD-GATE>
|
|
7
|
+
YOU ARE AN ORCHESTRATOR. YOU COORDINATE AGENTS. YOU DO NOT WRITE CODE.
|
|
7
8
|
|
|
8
|
-
|
|
9
|
+
Every step below tells you to call the Agent tool. DO IT. Do not role-play as the agent. Do not write implementation code yourself. Do not skip the Agent tool call "because it's faster." If you are typing code instead of calling the Agent tool, STOP — you are violating this process.
|
|
9
10
|
|
|
10
|
-
|
|
11
|
-
You are an ORCHESTRATOR. You coordinate specialist agents. You do NOT write implementation code yourself.
|
|
11
|
+
"Launch an agent" = call the Agent tool (the actual tool in your toolbar, the one that spawns a subprocess).
|
|
12
12
|
|
|
13
|
-
|
|
13
|
+
For implementation agents, set mode: "bypassPermissions".
|
|
14
|
+
For parallel work, put multiple Agent tool calls in ONE message.
|
|
14
15
|
|
|
15
|
-
|
|
16
|
+
Exception: Brainstorming (Phase 1, Step 1.1) is a direct conversation with the user — you ask questions and process answers yourself. This is the ONE phase where you work directly, not through agents.
|
|
16
17
|
</HARD-GATE>
|
|
17
18
|
|
|
18
|
-
|
|
19
|
-
1. Read `docs/plans/.build-state.md` to recover your phase, step, and progress
|
|
20
|
-
2. Re-read THIS file completely — you are reading it now
|
|
21
|
-
3. Check the TodoWrite list for task progress
|
|
22
|
-
4. Resume from the saved state, not from scratch
|
|
23
|
-
5. Do NOT skip ahead or fall back to default coding behavior
|
|
24
|
-
|
|
25
|
-
### Rationalization Prevention
|
|
26
|
-
|
|
27
|
-
If you catch yourself thinking any of these, you are drifting from the process:
|
|
28
|
-
|
|
29
|
-
| Thought | Reality |
|
|
30
|
-
|---------|---------|
|
|
31
|
-
| "It's faster if I just write this myself" | You are an orchestrator. Dispatch to an agent. Speed is not your job — coordination is. |
|
|
32
|
-
| "This is too small for a subagent" | Every implementation task goes through an agent. No exceptions. Small tasks still need the Dev→QA loop. |
|
|
33
|
-
| "I'll skip the code review for this one" | Every task gets reviewed. The code-reviewer agent exists for a reason. |
|
|
34
|
-
| "The quality gate is obvious, I'll just proceed" | Present it to the user. Quality gates require explicit user approval. |
|
|
35
|
-
| "I already know what to build, I'll skip architecture" | Phase 1 is mandatory. The architecture step catches design mistakes before they become code. |
|
|
36
|
-
| "Tests aren't needed for this part" | Every task has acceptance criteria and tests. The Evidence Collector verifies. |
|
|
37
|
-
| "I'll clean this up later" | The Harden phase (Phase 4) exists for this. Don't skip steps — follow the process. |
|
|
38
|
-
| "Context was compacted, I'll just keep coding" | STOP. Re-read this file. Check .build-state.md. Reload the process. |
|
|
39
|
-
|
|
40
|
-
### Process Flowchart
|
|
41
|
-
|
|
42
|
-
```dot
|
|
43
|
-
digraph build_pipeline {
|
|
44
|
-
rankdir=TB;
|
|
45
|
-
node [shape=box];
|
|
46
|
-
|
|
47
|
-
start [label="User invokes /build" shape=ellipse];
|
|
48
|
-
p1 [label="Phase 1: Architecture & Planning\n(Backend Architect + UX Architect +\nSecurity Engineer + code-architect +\nSprint Prioritizer + Senior PM)"];
|
|
49
|
-
gate1 [label="Quality Gate 1\nUser approves architecture?" shape=diamond];
|
|
50
|
-
p2 [label="Phase 2: Foundation\n(DevOps Automator + Frontend Dev\nor Backend Architect)"];
|
|
51
|
-
gate2 [label="Quality Gate 2\nBuilds? Tests pass? Lint clean?" shape=diamond];
|
|
52
|
-
p3 [label="Phase 3: Build — Dev↔QA Loops\nFor EACH task:\nAgent implements → Evidence Collector\nverifies → code-reviewer reviews"];
|
|
53
|
-
retry [label="Retry (max 3)\nFeedback to dev agent" shape=box];
|
|
54
|
-
escalate [label="Escalate to user\nafter 3 failures" shape=box];
|
|
55
|
-
p4 [label="Phase 4: Harden\n(API Tester + Perf Benchmarker +\nAccessibility Auditor + Security Engineer +\ncode-simplifier + Reality Checker)"];
|
|
56
|
-
gate4 [label="Quality Gate 4\nReality Checker: PRODUCTION READY?" shape=diamond];
|
|
57
|
-
p5 [label="Phase 5: Ship\n(Technical Writer + final commit)"];
|
|
58
|
-
done [label="BUILD COMPLETE" shape=ellipse];
|
|
59
|
-
|
|
60
|
-
start -> p1;
|
|
61
|
-
p1 -> gate1;
|
|
62
|
-
gate1 -> p2 [label="approved"];
|
|
63
|
-
gate1 -> p1 [label="changes requested"];
|
|
64
|
-
p2 -> gate2;
|
|
65
|
-
gate2 -> p3 [label="pass"];
|
|
66
|
-
gate2 -> p2 [label="fix"];
|
|
67
|
-
p3 -> retry [label="task fails"];
|
|
68
|
-
retry -> p3 [label="< 3 retries"];
|
|
69
|
-
retry -> escalate [label="3 retries"];
|
|
70
|
-
p3 -> p4 [label="all tasks complete"];
|
|
71
|
-
p4 -> gate4;
|
|
72
|
-
gate4 -> p5 [label="PRODUCTION READY"];
|
|
73
|
-
gate4 -> p4 [label="NEEDS WORK"];
|
|
74
|
-
p5 -> done;
|
|
75
|
-
}
|
|
76
|
-
```
|
|
19
|
+
### Orchestrator Discipline
|
|
77
20
|
|
|
78
|
-
|
|
21
|
+
Your context window is precious. Protect it.
|
|
79
22
|
|
|
80
|
-
You are
|
|
23
|
+
**You are a DISPATCHER, not a DOER.** Your job is: read state → decide next step → compose agent prompt → dispatch → process result → decide next step.
|
|
81
24
|
|
|
82
|
-
**
|
|
25
|
+
**Two types of agents — handle their results differently:**
|
|
83
26
|
|
|
84
|
-
|
|
27
|
+
| Agent Type | Examples | What you keep |
|
|
28
|
+
|-----------|----------|---------------|
|
|
29
|
+
| **Research/analysis** | Market research, tech feasibility, architecture design, audits, measurement | **Full output** — their response IS the deliverable. You need it to synthesize, compare, and make decisions. Save to `docs/plans/` when applicable. |
|
|
30
|
+
| **Implementation** | Code writing, fixes, cleanup, verification, scaffolding | **Summary only** — their work product lives in the codebase. Keep: what was done, files changed, test results, pass/fail. Discard: code snippets, full build logs, lint output. |
|
|
85
31
|
|
|
86
|
-
|
|
32
|
+
**After implementation agents return:**
|
|
33
|
+
1. Extract: what was built, files changed, test pass/fail, any blockers
|
|
34
|
+
2. Record in `docs/plans/.build-state.md` under the current phase
|
|
35
|
+
3. The code is in the repo — you don't need it in your context
|
|
87
36
|
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
- **Parallelism within phases.** Agents within the same step run in parallel via the Agent tool. Phases run sequentially.
|
|
93
|
-
- **Real code, real tests, real commits.** This pipeline writes actual files, runs actual tests, and makes actual git commits. It does not produce documents about code.
|
|
94
|
-
- **Evidence-based quality.** The Reality Checker defaults to NEEDS WORK. The Evidence Collector requires proof. Do not self-approve.
|
|
95
|
-
- **TodoWrite for progress tracking.** Use TodoWrite to create and update a task checklist at the start of Phase 3. This is your primary progress tracker — it survives context compaction better than memory alone.
|
|
96
|
-
- **State persistence.** After completing each step, update `docs/plans/.build-state.md` with your current phase, step, task progress, and agent usage. This file is your recovery point if context is compacted.
|
|
37
|
+
**After research/analysis agents return:**
|
|
38
|
+
1. Read and use the full output — this is your decision-making input
|
|
39
|
+
2. Save the output to the appropriate file in `docs/plans/` (research brief, architecture doc, etc.)
|
|
40
|
+
3. Once saved to disk, you can reference the file later instead of holding it all in context
|
|
97
41
|
|
|
98
|
-
|
|
42
|
+
**Never do these yourself:**
|
|
43
|
+
- Read source code files to understand implementation details — spawn an Explore agent
|
|
44
|
+
- Write or edit code — spawn an implementation agent
|
|
45
|
+
- Debug failures — spawn a fix agent with the error message
|
|
99
46
|
|
|
100
|
-
|
|
47
|
+
If you catch yourself typing code or reading source files: STOP. You are wasting context. Spawn an agent.
|
|
101
48
|
|
|
102
|
-
|
|
49
|
+
**Dispatch Counter:** Track agent dispatches in `docs/plans/.build-state.md` under `## Dispatch Counter`:
|
|
50
|
+
- `dispatches_since_save: [N]`
|
|
51
|
+
- `last_save: [Phase.Step]`
|
|
52
|
+
Increment after each agent returns (parallel dispatch of 4 agents = +4). Reset to 0 after each compaction save.
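The counter rules above can be sketched as a small helper. This is an illustrative sketch only: the class name `DispatchCounter` and its methods are hypothetical, and the default threshold of 8 is taken from the compaction checkpoint rule stated later in this file, not from any real API.

```python
from dataclasses import dataclass


@dataclass
class DispatchCounter:
    """Hypothetical in-memory mirror of the `## Dispatch Counter` section."""

    dispatches_since_save: int = 0
    last_save: str = "Phase 0"

    def record_returns(self, agents_returned: int = 1) -> None:
        # Increment once per returned agent; a parallel dispatch of 4 counts as +4.
        self.dispatches_since_save += agents_returned

    def needs_checkpoint(self, threshold: int = 8) -> bool:
        # Threshold mirrors the ">= 8" compaction checkpoint rule (assumed default).
        return self.dispatches_since_save >= threshold

    def mark_saved(self, phase_step: str) -> None:
        # Reset to 0 after each compaction save and remember where it happened.
        self.dispatches_since_save = 0
        self.last_save = phase_step

    def to_markdown(self) -> str:
        # Render the two fields in the bullet form shown above.
        return (
            "## Dispatch Counter\n"
            f"- dispatches_since_save: {self.dispatches_since_save}\n"
            f"- last_save: {self.last_save}\n"
        )
```

For example, two parallel dispatches of 4 agents each would reach the checkpoint threshold, and `mark_saved("1.2")` would reset the count.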
|
|
103
53
|
|
|
104
|
-
|
|
105
|
-
- [ ] Phase 1: Architecture & Planning
|
|
106
|
-
- [ ] Phase 2: Foundation
|
|
107
|
-
- [ ] Phase 3: Build (will expand into per-task items later)
|
|
108
|
-
- [ ] Phase 4: Harden
|
|
109
|
-
- [ ] Phase 5: Ship
|
|
54
|
+
Input: $ARGUMENTS
|
|
110
55
|
|
|
111
|
-
|
|
112
|
-
```
|
|
113
|
-
Phase: 0 — Initializing
|
|
114
|
-
Input: [user's build request]
|
|
115
|
-
Started: [timestamp]
|
|
116
|
-
```
|
|
56
|
+
### Autonomous Mode
|
|
117
57
|
|
|
118
|
-
|
|
58
|
+
If the input contains `--autonomous` or `--auto`, this build runs **unattended**. The user will not be present to approve quality gates. In autonomous mode:
|
|
59
|
+
- Quality gates auto-approve. Do NOT pause and wait for user input.
|
|
60
|
+
- Brainstorming runs in autonomous mode (see protocol).
|
|
61
|
+
- When a metric loop stalls, accept the result if the score is >= 60% of the target; below that, skip it.
|
|
62
|
+
- Log every decision to `docs/plans/build-log.md` so the user can review later.
|
|
119
63
|
|
|
120
|
-
|
|
64
|
+
If `--autonomous` is NOT present, all quality gates require user approval as described below.
|
|
65
|
+
|
|
66
|
+
When combining `--resume` with `--autonomous`: the current invocation's flags take precedence over saved state. If you resume a previously interactive build with `--autonomous`, it continues in autonomous mode.
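The flag handling described above might look like the following sketch. The function name `resolve_mode` and the returned keys are hypothetical. It also assumes that a resume without `--autonomous` runs interactively, which is one reading of "the current invocation's flags take precedence".

```python
from typing import Dict, Union


def resolve_mode(arguments: str, saved_autonomous: bool = False) -> Dict[str, Union[bool, str]]:
    """Parse the build input and decide the run mode for this invocation."""
    tokens = arguments.split()
    flags = {"--autonomous", "--auto", "--resume"}

    # Either spelling of the flag turns on unattended mode.
    autonomous = "--autonomous" in tokens or "--auto" in tokens
    resume = "--resume" in tokens

    # saved_autonomous is accepted only to make the precedence explicit:
    # the current invocation wins, so the saved value is never consulted.
    _ = saved_autonomous

    # Everything that is not a recognized flag is treated as the build request.
    request = " ".join(t for t in tokens if t not in flags)
    return {"autonomous": autonomous, "resume": resume, "request": request}
```

Under these assumptions, `resolve_mode("--resume --autonomous", saved_autonomous=False)` reports an autonomous resume even though the saved build was interactive.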
|
|
67
|
+
|
|
68
|
+
### Metric Loop
|
|
121
69
|
|
|
122
|
-
|
|
70
|
+
Every phase uses a **metric-driven iteration loop** to drive quality. Read the full protocol at `commands/protocols/metric-loop.md`. Critical rules (survive compaction):
|
|
123
71
|
|
|
124
|
-
|
|
72
|
+
1. YOU define a metric for this phase based on context (what you're building, what matters). The metric is NOT predefined.
|
|
73
|
+
2. Spawn a **measurement agent** to score the artifact 0-100. Read its full output — it's analysis.
|
|
74
|
+
3. Pick the ONE highest-impact issue. Spawn a separate **fix agent** with ONLY that issue + file paths.
|
|
75
|
+
4. Re-measure. Repeat until the target is met, the loop stalls (2 consecutive deltas <= 0), or max iterations is reached.
|
|
76
|
+
5. Track all scores in `docs/plans/.build-state.md` — this is your lifeline across compaction.
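The loop in the steps above can be sketched as follows. This is a minimal illustration, not the protocol's implementation: `measure` and `fix` are hypothetical callables standing in for the separate measurement and fix agents, the default of 3 iterations is borrowed from the architecture metric loop, and `autonomous_stall_decision` encodes the 60% rule from the Autonomous Mode section under the assumption that "skip" is the only alternative to "accept".

```python
from typing import Callable, List, Tuple

# (score 0-100, the single highest-impact issue found by the measurement agent)
Measurement = Tuple[int, str]


def run_metric_loop(
    measure: Callable[[], Measurement],
    fix: Callable[[str], None],
    target: int,
    max_iterations: int = 3,
) -> Tuple[str, List[int]]:
    """Measure, fix one issue, re-measure, until target, stall, or max iterations."""
    score, top_issue = measure()
    scores = [score]
    non_improving = 0
    for _ in range(max_iterations):
        if scores[-1] >= target:
            return "target_met", scores
        # The fix agent receives ONLY the top issue, never the full findings.
        fix(top_issue)
        # Each measurement is fresh; nothing is carried over between iterations.
        score, top_issue = measure()
        scores.append(score)
        delta = scores[-1] - scores[-2]
        non_improving = non_improving + 1 if delta <= 0 else 0
        if non_improving >= 2:
            return "stalled", scores
    status = "target_met" if scores[-1] >= target else "max_iterations"
    return status, scores


def autonomous_stall_decision(score: int, target: int) -> str:
    """In autonomous mode, a stalled loop is accepted at >= 60% of target."""
    return "accept" if score >= 0.6 * target else "skip"
```

The returned score list is what would be recorded in `docs/plans/.build-state.md` after each iteration.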
|
|
125
77
|
|
|
126
78
|
<HARD-GATE>
|
|
127
|
-
|
|
79
|
+
METRIC LOOP NON-NEGOTIABLES:
|
|
80
|
+
- Measurement agent and fix agent are SEPARATE Agent tool calls — never share context (author-bias elimination).
|
|
81
|
+
- Fix agent gets ONLY the top issue + file paths + acceptance criteria. NOT the full measurement findings.
|
|
82
|
+
- One fix per iteration. Measure impact before fixing the next thing.
|
|
83
|
+
- Each measurement is fresh — don't accumulate findings across iterations.
|
|
128
84
|
</HARD-GATE>
|
|
129
85
|
|
|
130
|
-
###
|
|
86
|
+
### Handoff Documents
|
|
131
87
|
|
|
132
|
-
|
|
133
|
-
- Similar features and their implementation patterns
|
|
134
|
-
- Architecture layers and abstractions
|
|
135
|
-
- File organization conventions, testing patterns, build system
|
|
88
|
+
When spawning agents in sequence (e.g., architect → implementer → reviewer), pass **scoped handoffs** — not the full architecture dump. Each agent receives only what it needs:
|
|
136
89
|
|
|
137
|
-
|
|
90
|
+
1. **Relevant architecture section** — the specific part of architecture.md that applies to this agent's task
|
|
91
|
+
2. **Previous agent's output** — what the upstream agent produced (if any)
|
|
92
|
+
3. **Acceptance criteria** — what "done" looks like for THIS agent
|
|
138
93
|
|
|
139
|
-
|
|
94
|
+
For implementation agents (Phase 5+): Do NOT paste the entire Design Document or Architecture Document. Extract the relevant sections only. For research and architecture agents (Phases 1-2): pass the full document — these agents need complete context to do their analysis.
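One way to build a scoped handoff is to pull a single section out of the architecture file by heading. The sketch below assumes the architecture document uses ATX-style markdown headings. The helper names `extract_section` and `build_handoff` are hypothetical, and the parser deliberately ignores edge cases such as `#` characters inside fenced code.

```python
def extract_section(markdown: str, heading: str) -> str:
    """Return the section titled `heading`, including its nested subsections."""
    collected = []
    level = None  # heading depth of the section being captured
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            hashes = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[hashes:].strip()
            if level is None and title == heading:
                level = hashes
                collected.append(line)
                continue
            # A heading at the same or a shallower depth ends the section.
            if level is not None and hashes <= level:
                break
        if level is not None:
            collected.append(line)
    return "\n".join(collected).strip()


def build_handoff(architecture_section: str, previous_output: str, acceptance_criteria: str) -> str:
    """Assemble the three handoff parts listed above into one prompt fragment."""
    return (
        f"ARCHITECTURE (relevant section):\n{architecture_section}\n\n"
        f"PREVIOUS AGENT OUTPUT:\n{previous_output or '(none)'}\n\n"
        f"ACCEPTANCE CRITERIA:\n{acceptance_criteria}\n"
    )
```

Research and architecture agents would skip the extraction step and receive the full document instead.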
|
|
140
95
|
|
|
141
|
-
|
|
96
|
+
### Complexity Routing (Advisory)
|
|
142
97
|
|
|
143
|
-
|
|
98
|
+
When composing agent prompts, prefix with `[COMPLEXITY: S/M/L]` to hint at the appropriate model tier:
|
|
144
99
|
|
|
145
|
-
|
|
100
|
+
| Complexity | Task Types | Preferred Tier |
|
|
101
|
+
|-----------|-----------|----------------|
|
|
102
|
+
| S | Build-fix, cleanup, lint fix, single-error fix | Haiku-class (fastest) |
|
|
103
|
+
| M | Measurement, eval, testing, single-feature impl | Sonnet-class (balanced) |
|
|
104
|
+
| L | Architecture, research, multi-file impl, debugging | Opus-class (deepest reasoning) |
|
|
146
105
|
|
|
147
|
-
|
|
106
|
+
For sprint tasks, use the Size field from `docs/plans/sprint-tasks.md`. This is advisory — the tag documents intent for future model routing support.
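A tagging helper for the prefix described above could be as small as this. The names `TIER_BY_COMPLEXITY` and `tag_prompt` are hypothetical. The tier labels are copied from the table, and since routing is advisory the mapping is informational only.

```python
# Tier labels copied from the routing table; they document intent only.
TIER_BY_COMPLEXITY = {
    "S": "Haiku-class (fastest)",
    "M": "Sonnet-class (balanced)",
    "L": "Opus-class (deepest reasoning)",
}


def tag_prompt(prompt: str, complexity: str) -> str:
    """Prefix an agent prompt with the advisory complexity tag."""
    size = complexity.strip().upper()
    if size not in TIER_BY_COMPLEXITY:
        raise ValueError(f"complexity must be S, M, or L, got {complexity!r}")
    return f"[COMPLEXITY: {size}] {prompt}"
```

For a sprint task, the `complexity` argument would come from the task's Size field.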
|
|
148
107
|
|
|
149
|
-
|
|
108
|
+
---
|
|
150
109
|
|
|
151
|
-
|
|
110
|
+
## Phase 0: Context & Pre-Flight
|
|
152
111
|
|
|
153
|
-
|
|
112
|
+
**Resuming?** If the input contains `--resume` OR if context was just compacted (SessionStart hook fired with active state):
|
|
113
|
+
1. Read `docs/plans/.build-state.md` — verify it exists and has a Resume Point section.
|
|
114
|
+
If `docs/plans/.build-state.md` does not exist or has no Resume Point, warn the user: 'No previous build state found. Starting fresh.' Then proceed to Step 0.1 as a new build.
|
|
115
|
+
2. Re-read this file and all protocol files in `commands/protocols/`.
|
|
116
|
+
3. Re-read `docs/plans/sprint-tasks.md`, `docs/plans/architecture.md`, and `CLAUDE.md`.
|
|
117
|
+
4. Rebuild TodoWrite from the state file (TodoWrite does NOT survive compaction or session breaks).
|
|
118
|
+
5. Reset `dispatches_since_save` to 0 (fresh context window).
|
|
119
|
+
6. Resume from the saved phase and step. Skip Phase 0.
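The first resume check can be sketched as below. The function names are hypothetical, and the sketch assumes the Resume Point section is written as a literal `## Resume Point` heading, as described for the state file in Step 0.3.

```python
from pathlib import Path

STATE_FILE = Path("docs/plans/.build-state.md")


def can_resume(state_file: Path = STATE_FILE) -> bool:
    """True only if the state file exists and contains a Resume Point section."""
    if not state_file.is_file():
        return False
    text = state_file.read_text(encoding="utf-8")
    return any(line.strip() == "## Resume Point" for line in text.splitlines())


def resume_or_start(state_file: Path = STATE_FILE) -> str:
    """Decide between resuming and a fresh build, warning in the latter case."""
    if can_resume(state_file):
        return "resume"
    print("No previous build state found. Starting fresh.")
    return "start_fresh"
```

A `"start_fresh"` result corresponds to proceeding to Step 0.1 as a new build.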
|
|
154
120
|
|
|
155
|
-
|
|
156
|
-
- Break the build into ordered, atomic tasks
|
|
157
|
-
- Each task should be implementable and testable independently
|
|
158
|
-
- Define acceptance criteria for each task — what "done" looks like, what tests must pass
|
|
159
|
-
- Identify dependencies between tasks — what must be built first
|
|
160
|
-
- Estimate relative complexity (S/M/L) for each task
|
|
161
|
-
- **Include the architectural rationale** — WHY this task exists, which part of the architecture it implements
|
|
121
|
+
### Step 0.1 — Read the Room
|
|
162
122
|
|
|
163
|
-
|
|
164
|
-
- Confirm realistic scope — remove anything that isn't in the brainstorming spec
|
|
165
|
-
- Verify no missing tasks — every component from the architecture has implementation tasks
|
|
166
|
-
- Ensure task descriptions are specific enough that a developer agent can execute without ambiguity
|
|
123
|
+
Before doing anything, scan for existing context:
|
|
167
124
|
|
|
168
|
-
|
|
125
|
+
- Check if the input is a file path (e.g., `docs/plans/brainstorm.md`). If so, read it.
|
|
126
|
+
- Check if `docs/plans/` or `docs/briefs/` exist with prior brainstorming, design docs, decision briefs, or research. Read them.
|
|
127
|
+
- Check if there's existing code in the project. If so, this is an enhancement, not greenfield.
|
|
128
|
+
- Check the conversation history — has the user been discussing this idea already?
|
|
129
|
+
- Check if `docs/plans/learnings.md` exists from a previous build. If so, read it. Apply relevant PATTERNS to agent prompt design, avoid listed PITFALLs, use HEURISTICS when applicable.
|
|
169
130
|
|
|
170
|
-
|
|
171
|
-
# Sprint Tasks — buildanything pipeline
|
|
172
|
-
# PROCESS: Execute each task using build.md Phase 3 Dev→QA loops.
|
|
173
|
-
# DO NOT implement tasks directly. Dispatch to specialist agents.
|
|
174
|
-
# If you lost context, re-read: commands/build.md
|
|
175
|
-
#
|
|
176
|
-
# Each task MUST go through: Implement (agent) → Verify (Evidence Collector) → Review (code-reviewer)
|
|
177
|
-
```
|
|
131
|
+
**Classify what you found:**
|
|
178
132
|
|
|
179
|
-
|
|
133
|
+
| Context Level | What You Have | What Happens |
|
|
134
|
+
|---|---|---|
|
|
135
|
+
| **Full design** | Design doc with decisions, scope, tech stack, data models | Skip Phase 1. Feed design into Phase 2. |
|
|
136
|
+
| **Decision brief** | An idea-sweep brief with verdicts and MVP definition | Phase 1 skips research (Step 1.2). Brainstorming refines the brief into a design. |
|
|
137
|
+
| **Partial context** | Some notes, conversation, rough sketch | Phase 1 runs fully. Feed context into brainstorming + research. |
|
|
138
|
+
| **Raw idea** | One-line build request, no prior work | Phase 1 runs fully from scratch. |
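The classification table above maps naturally onto a lookup. The sketch below is illustrative: the dictionary keys match the table's context levels, while the field names (`run_phase_1`, `run_research`, `next_phase`) are hypothetical.

```python
from typing import Dict, Union

# One entry per row of the classification table.
CONTEXT_ROUTES: Dict[str, Dict[str, Union[bool, int]]] = {
    "Full design": {"run_phase_1": False, "run_research": False, "next_phase": 2},
    "Decision brief": {"run_phase_1": True, "run_research": False, "next_phase": 1},
    "Partial context": {"run_phase_1": True, "run_research": True, "next_phase": 1},
    "Raw idea": {"run_phase_1": True, "run_research": True, "next_phase": 1},
}


def route_for(context_level: str) -> Dict[str, Union[bool, int]]:
    """Look up what happens next for a classified context level."""
    try:
        return CONTEXT_ROUTES[context_level]
    except KeyError:
        raise ValueError(f"unknown context level: {context_level!r}") from None
```

`run_research` corresponds to Step 1.2, which a decision brief skips because its research is already done.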
|
|
180
139
|
|
|
181
|
-
|
|
182
|
-
1. Architecture Document (system diagram, component tree, data models, API contracts)
|
|
183
|
-
2. Sprint Task List (ordered tasks with acceptance criteria)
|
|
184
|
-
3. Identified risks or decisions that need user input
|
|
140
|
+
### Step 0.2 — Human Prerequisites Checklist
|
|
185
141
|
|
|
186
|
-
|
|
142
|
+
Identify everything that requires HUMAN action before going heads-down:
|
|
187
143
|
|
|
188
|
-
|
|
189
|
-
|
|
190
|
-
|
|
144
|
+
- **API keys & secrets** — External services the project integrates with. List each key needed.
|
|
145
|
+
- **Database setup** — Supabase, Postgres, etc. User needs to create it and provide credentials.
|
|
146
|
+
- **Repository** — Git repo on GitHub? Public or private?
|
|
147
|
+
- **Deployment** — Vercel, Railway, Fly.io? User needs to connect.
|
|
148
|
+
- **MCP servers** — Playwright for visual testing, database access, etc.
|
|
149
|
+
- **Local tooling** — Docker, specific runtimes, etc.
|
|
150
|
+
|
|
151
|
+
Present the checklist:
|
|
191
152
|
|
|
192
|
-
**Save state:** Write `docs/plans/.build-state.md`:
|
|
193
153
|
```
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
154
|
+
BEFORE I GO HEADS-DOWN, please set up:
|
|
155
|
+
|
|
156
|
+
[ ] [Service] API key → add as [KEY_NAME] to .env
|
|
157
|
+
[ ] [Database] → add connection URL to .env
|
|
158
|
+
[ ] GitHub repo → share the URL
|
|
159
|
+
[ ] [Deployment service] connected (optional)
|
|
160
|
+
|
|
161
|
+
Once done, say "ready" and I'll start building.
|
|
197
162
|
```
|
|
198
163
|
|
|
199
|
-
|
|
164
|
+
<HARD-GATE>
|
|
165
|
+
Interactive mode: DO NOT proceed until the user confirms prerequisites (or says to skip).
|
|
166
|
+
Autonomous mode: Log checklist to `docs/plans/build-log.md`. Create `.env.example` with required keys. Proceed — log missing keys as blockers if hit during build.
|
|
167
|
+
</HARD-GATE>
|
|
168
|
+
|
|
169
|
+
### Step 0.3 — Initialize
|
|
170
|
+
|
|
171
|
+
0. Create `docs/plans/` directory if it doesn't exist (greenfield projects won't have it).
|
|
172
|
+
1. Create a TodoWrite checklist with Phases 0-7.
|
|
173
|
+
2. Create `docs/plans/.build-state.md` as a single write with ALL of the following: phase and step (`Phase: 0 — Starting`), input (`[build request]`), context level (`[classification]`), prerequisites (`[status]`), dispatch counter (`dispatches_since_save: 0, last_save: Phase 0`), and a `## Resume Point` section with: phase, step, autonomous mode flag, completed tasks (none), git branch name.
|
|
174
|
+
3. Go to Phase 1 (or Phase 2 if context level is "Full design").
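The single-write state file from step 2 might be rendered like this. The exact layout of `.build-state.md` is not specified here, so the field order, labels, and the function name `render_initial_state` are assumptions made for illustration.

```python
def render_initial_state(
    phase_label: str,
    build_request: str,
    context_level: str,
    prerequisites: str,
    autonomous: bool,
    git_branch: str,
) -> str:
    """Build the full initial state file content so it can be written in one go."""
    lines = [
        f"Phase: {phase_label}",
        f"Input: {build_request}",
        f"Context level: {context_level}",
        f"Prerequisites: {prerequisites}",
        "",
        "## Dispatch Counter",
        "- dispatches_since_save: 0",
        "- last_save: Phase 0",
        "",
        "## Resume Point",
        "- phase: 0",
        "- step: 0.3",
        f"- autonomous: {str(autonomous).lower()}",
        "- completed tasks: none",
        f"- git branch: {git_branch}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned string with a single file write keeps the state file from ever being observed half-initialized.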
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
## Phase 1: Brainstorm & Research
|
|
179
|
+
|
|
180
|
+
**Goal**: Turn the raw idea into a validated Design Document grounded in research. This ensures Phase 2 architects receive a design, not a guess.
|
|
181
|
+
|
|
182
|
+
**Skip if** Step 0.1 classified context as "Full design" — go straight to Phase 2.
|
|
183
|
+
|
|
184
|
+
### Step 1.1 — Brainstorming
|
|
185
|
+
|
|
186
|
+
Follow the Brainstorm Protocol (`commands/protocols/brainstorm.md`).
|
|
187
|
+
|
|
188
|
+
In interactive mode: this is a conversation. Ask questions one at a time, propose approaches with trade-offs, let the user decide. Output: Design Document saved to `docs/plans/`.
|
|
189
|
+
|
|
190
|
+
In autonomous mode: synthesize a design document directly using the build request and available context. Pick pragmatic defaults. Log rationale to `docs/plans/build-log.md`.
|
|
191
|
+
|
|
192
|
+
### Step 1.2 — Parallel Research (5 agents, ONE message)
|
|
193
|
+
|
|
194
|
+
Skip if context level is "Decision brief" (research already done).
|
|
195
|
+
|
|
196
|
+
Call the Agent tool 5 times in a single message. Pass each agent the build request AND the Design Document draft.
|
|
197
|
+
|
|
198
|
+
1. Description: "Market research" — Prompt: "Research market size (TAM/SAM/SOM), competitive landscape (5-10 players), timing, and market structure for: [build request]. Design context: [paste design doc]. Use web search extensively. Report with a Market Verdict: GREEN/AMBER/RED."
|
|
199
|
+
|
|
200
|
+
2. Description: "Tech feasibility" — Prompt: "Evaluate hard technical problems (Solved/Hard/Unsolved), build-vs-buy decisions, MVP scope, and stack validation for: [build request]. Design context: [paste design doc]. Search for APIs and libraries mentioned in the design to verify they exist and are maintained. Report with a Technical Verdict."
|
|
201
|
+
|
|
202
|
+
3. Description: "User research" — Prompt: "Analyze target persona, jobs-to-be-done, current alternatives, behavioral barriers to adoption for: [build request]. Design context: [paste design doc]. Search for real user complaints and communities discussing this problem. Report with a User Verdict."
|
|
203
|
+
|
|
204
|
+
4. Description: "Business model" — Prompt: "Evaluate revenue models, unit economics, growth loops, first-1000-users strategy for: [build request]. Design context: [paste design doc]. Search for comparable pricing and growth data. Report with a Business Verdict."
|
|
205
|
+
|
|
206
|
+
5. Description: "Risk analysis" — Prompt: "Adversarial review: regulatory risk, security concerns, dependency risks, competitive response, top 3 failure modes for: [build request]. Design context: [paste design doc]. Search for enforcement actions and comparable failures. Report with a Risk Verdict."
|
|
207
|
+
|
|
208
|
+
After all 5 return, synthesize a **Research Brief** with a verdict table. Save to `docs/plans/research-brief.md`.
|
|
209
|
+
|
|
210
|
+
### Step 1.3 — Design Refinement
|
|
211
|
+
|
|
212
|
+
Read the Design Document and Research Brief together. Check for contradictions:
|
|
213
|
+
|
|
214
|
+
- Tech-feasibility flagged a hard problem as "Unsolved" → simplify or flag as risk
|
|
215
|
+
- Risk-analysis returned RED → add mitigation or descope
|
|
216
|
+
- User-research says "no validated demand" → flag as pivot point
|
|
217
|
+
- Business-model says "no moat" → note for speed-to-market priority
|
|
218
|
+
|
|
219
|
+
Update the Design Document with corrections. Save final version.
|
|
220
|
+
|
|
221
|
+
### Step 1.4 — Persist Decisions
|
|
222
|
+
|
|
223
|
+
Append key decisions to the project's `CLAUDE.md` (create if needed) under `## Build Decisions`:
|
|
224
|
+
|
|
225
|
+
- Project name and one-line description
|
|
226
|
+
- Primary user and core value prop
|
|
227
|
+
- Tech stack (with rationale)
|
|
228
|
+
- Key constraints or risks
|
|
229
|
+
- MVP scope boundary (in vs. deferred)
|
|
230
|
+
|
|
231
|
+
This ensures decisions survive context compaction.
|
|
232
|
+
|
|
233
|
+
### Quality Gate 1
|
|
234
|
+
|
|
235
|
+
**Autonomous:** Log design and research paths to `docs/plans/build-log.md`. If 2+ RED verdicts, log warning. Proceed.
|
|
236
|
+
|
|
237
|
+
**Interactive:** Present Design Document summary + Research Brief verdict table. Ask: "Approve this design, or want to adjust?" <HARD-GATE>DO NOT PROCEED without user approval.</HARD-GATE>
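The two gate behaviours above can be sketched as one decision function. This is a hedged illustration: `quality_gate_1` is a hypothetical name, and it assumes every research verdict uses the GREEN/AMBER/RED scale that the market research prompt specifies.

```python
from typing import Dict, List, Optional, Tuple


def quality_gate_1(
    verdicts: Dict[str, str],
    autonomous: bool,
    user_approved: Optional[bool] = None,
) -> Tuple[bool, List[str]]:
    """Return (proceed, log_lines) for Quality Gate 1."""
    log: List[str] = []
    red = sorted(name for name, verdict in verdicts.items() if verdict.upper() == "RED")
    if autonomous:
        # Autonomous mode never pauses: log, warn on 2+ RED verdicts, proceed.
        log.append("Quality Gate 1: auto-approved (autonomous mode)")
        if len(red) >= 2:
            log.append("WARNING: RED verdicts from " + ", ".join(red))
        return True, log
    # Interactive mode proceeds only on explicit user approval.
    return bool(user_approved), log
```

In autonomous mode the returned log lines would be appended to `docs/plans/build-log.md`.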
|
|
238
|
+
|
|
239
|
+
Update TodoWrite and `docs/plans/.build-state.md`.
|
|
240
|
+
|
|
241
|
+
**Compaction checkpoint:** Check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
|
|
200
242
|
|
|
201
243
|
---
|
|
202
244
|
|
|
203
|
-
## Phase 2:
|
|
245
|
+
## Phase 2: Architecture & Planning
|
|
246
|
+
|
|
247
|
+
**Goal**: Convert the validated Design Document into a concrete architecture and ordered task list. Every agent receives the Design Document — not just the build request.
|
|
204
248
|
|
|
205
|
-
|
|
249
|
+
### Step 2.1 — Explore (existing codebase only)
|
|
206
250
|
|
|
207
|
-
|
|
251
|
+
If the project has existing code, call the Agent tool with description "Explore codebase" and prompt "Explore this codebase. Map architecture layers, file conventions, testing patterns, existing features. Report findings."
|
|
208
252
|
|
|
209
|
-
|
|
210
|
-
- Project directory structure
|
|
211
|
-
- Package manager and dependencies
|
|
212
|
-
- Build/dev tooling configuration
|
|
213
|
-
- Linting, formatting, type checking config
|
|
214
|
-
- Base test framework and first passing test
|
|
215
|
-
- Git initialization and .gitignore
|
|
216
|
-
- Environment configuration (.env.example)
|
|
253
|
+
If greenfield, skip to Step 2.2.
|
|
217
254
|
|
|
218
|
-
|
|
219
|
-
Use the **Frontend Developer** or **Backend Architect** (as appropriate) to scaffold the actual project.
|
|
255
|
+
### Step 2.2 — Architecture Design (4 agents in parallel, ONE message)
|
|
220
256
|
|
|
221
|
-
|
|
257
|
+
Read the Design Document and Research Brief. Pass both to every agent.
|
|
222
258
|
|
|
223
|
-
|
|
259
|
+
Call the Agent tool 4 times in a single message:
|
|
224
260
|
|
|
225
|
-
|
|
226
|
-
- CSS design tokens (colors, spacing, typography as variables)
|
|
227
|
-
- Base layout components (grid, container, responsive breakpoints)
|
|
228
|
-
- Core UI primitives that other components will build on
|
|
261
|
+
1. Description: "Backend architecture" — Prompt: "Design system architecture. DESIGN DOC: [paste]. RESEARCH: [paste tech + risk sections]. Include services, data models, API contracts, database schema. Be specific. Respect tech stack and constraints from the design doc."
|
|
229
262
|
|
|
230
|
-
|
|
263
|
+
2. Description: "Frontend architecture" — Prompt: "Design frontend architecture. DESIGN DOC: [paste]. RESEARCH: [paste user research section]. Include component hierarchy, layout, responsive strategy, state management. Align UX with the user persona from research."
|
|
264
|
+
|
|
265
|
+
3. Description: "Security architecture" — Prompt: "Security review. DESIGN DOC: [paste]. RESEARCH: [paste risk section]. Cover auth model, input validation, secrets management, threat model. Address any regulatory risks flagged in research."
|
|
266
|
+
|
|
267
|
+
4. Description: "Implementation blueprint" — Prompt: "Implementation blueprint. DESIGN DOC: [paste]. Include specific files to create/modify, build sequence, dependency order. Scope to MVP boundary from design doc."
|
|
268
|
+
|
|
269
|
+
After all 4 return, YOU synthesize into one Architecture Document. Save to `docs/plans/architecture.md`.
|
|
270
|
+
|
|
271
|
+
### Step 2.3 — Metric Loop: Architecture Quality
|
|
272
|
+
|
|
273
|
+
Run the Metric Loop Protocol (`commands/protocols/metric-loop.md`) on the Architecture Document. Define a metric based on this project — coverage of design doc requirements, specificity, consistency between agents. Max 3 iterations.
|
|
274
|
+
|
|
275
|
+
### Step 2.4 — Sprint Planning
|
|
276
|
+
|
|
277
|
+
Follow the Planning Protocol (`commands/protocols/planning.md`). Use 2 sequential Agent tool calls:
|
|
278
|
+
|
|
279
|
+
Call the Agent tool — description: "Sprint breakdown" — prompt: "Break this architecture into ordered, atomic tasks. Each task needs: description, acceptance criteria, dependencies, size (S/M/L). ARCHITECTURE: [paste]. DESIGN DOC: [paste]. Scope to MVP only."
|
|
280
|
+
|
|
281
|
+
Then call the Agent tool — description: "Validate task list" — prompt: "Validate this task list: [paste]. Check scope is realistic, no missing tasks, descriptions specific enough for a developer agent to execute, all tasks within MVP boundary."
|
|
282
|
+
|
|
283
|
+
Save to `docs/plans/sprint-tasks.md`.
|
|
231
284
|
|
|
232
285
|
### Quality Gate 2
|
|
233
286
|
|
|
234
|
-
|
|
235
|
-
- Project builds without errors
|
|
236
|
-
- Test framework runs and the initial test passes
|
|
237
|
-
- Linting passes clean
|
|
238
|
-
- Directory structure matches the Architecture Document
|
|
287
|
+
**Autonomous:** Log to `docs/plans/build-log.md`. Proceed.
|
|
239
288
|
|
|
240
|
-
|
|
289
|
+
**Interactive:** Present Architecture + Sprint Task List. Ask: "Approve to start building, or flag changes?" <HARD-GATE>DO NOT PROCEED without user approval.</HARD-GATE>
|
|
241
290
|
|
|
242
|
-
|
|
243
|
-
```
|
|
244
|
-
Phase: 2 COMPLETE
|
|
245
|
-
Foundation: scaffolded, builds clean, tests pass
|
|
246
|
-
Next: Phase 3 — Dev↔QA loops
|
|
247
|
-
```
|
|
291
|
+
Update TodoWrite and `docs/plans/.build-state.md`.
|
|
248
292
|
|
|
249
|
-
|
|
293
|
+
**Compaction checkpoint:** Check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
|
|
250
294
|
|
|
251
295
|
---
|
|
252
296
|
|
|
253
|
-
## Phase 3:
|
|
297
|
+
## Phase 3: Design & Visual Identity
|
|
298
|
+
|
|
299
|
+
**Goal**: Transform architecture into a research-backed visual design system, proven with Playwright screenshots. Fully autonomous — agents research, decide, and iterate without user input.
|
|
300
|
+
|
|
301
|
+
**Skip if** the project has no user-facing frontend (CLI tools, pure APIs, backend services).
|
|
254
302
|
|
|
255
303
|
<HARD-GATE>
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
- You are dispatching to agents, not coding directly
|
|
260
|
-
- `docs/plans/.build-state.md` exists and is current
|
|
261
|
-
- TodoWrite has Phases 1 and 2 marked complete
|
|
262
|
-
|
|
263
|
-
If ANY check fails, STOP and resolve before continuing.
|
|
304
|
+
UI/UX IS THE PRODUCT. This phase is a full peer to Architecture and Build — not a footnote, not an afterthought, not a "nice to have." Do NOT skip, compress, or rush this phase for any reason. The agents must research real competitors and award-winning sites, make deliberate visual choices backed by that research, build proof screens, and iterate with Playwright-verified visual QA before a single line of product code is written.
|
|
305
|
+
|
|
306
|
+
Phase 4 (Foundation) WILL NOT START without `docs/plans/visual-design-spec.md`. If it does not exist, return here.
|
|
264
307
|
</HARD-GATE>
|
|
265
308
|
|
|
266
|
-
|
|
309
|
+
### Step 3.1 — Design Research (2 agents, parallel, both use Playwright)
|
|
267
310
|
|
|
268
|
-
|
|
311
|
+
Follow the Design Protocol (`commands/protocols/design.md`), Step 3.1.
|
|
269
312
|
|
|
270
|
-
|
|
313
|
+
Call the Agent tool 2 times in one message:
|
|
271
314
|
|
|
272
|
-
|
|
315
|
+
1. Description: "Competitive visual audit" — Prompt: "Research the top 5-8 competitors/analogues for: [product description]. Use Playwright to screenshot each site (desktop 1920x1080 + mobile 375x812). Screenshot standout components (hero, cards, forms, nav, CTAs). Save to docs/plans/design-references/competitors/. Analyze visual language: colors, typography, spacing, what feels premium vs cheap. Rank by visual quality. DESIGN DOC: [paste]."
|
|
273
316
|
|
|
274
|
-
|
|
275
|
-
- **Frontend Developer** — UI components, pages, client-side logic
|
|
276
|
-
- **Backend Architect** — APIs, database operations, server logic
|
|
277
|
-
- **AI Engineer** — ML features, model integration, data pipelines
|
|
278
|
-
- **Rapid Prototyper** — Quick integrations, glue code, utility functions
|
|
317
|
+
2. Description: "Design inspiration mining" — Prompt: "Search Awwwards.com, Godly.website, SiteInspire for award-winning sites in category: [product category]. Use Playwright to screenshot top 5-8 results + standout components. Save to docs/plans/design-references/inspiration/. Identify visual trends, what separates best-in-class from generic. DESIGN DOC: [paste]."
|
|
279
318
|
|
|
280
|
-
|
|
319
|
+
After both return, synthesize a **Design Research Brief** to `docs/plans/design-research.md`. Include all screenshot paths.
|
|
281
320
|
|
|
282
|
-
|
|
283
|
-
- The specific task description and acceptance criteria from the Sprint Task List
|
|
284
|
-
- The Architecture Document for context
|
|
285
|
-
- Access to all existing code via Read/Grep/Glob tools
|
|
321
|
+
### Step 3.2 — Design Direction (2 agents, sequential)
|
|
286
322
|
|
|
287
|
-
|
|
323
|
+
Follow the Design Protocol (`commands/protocols/design.md`), Step 3.2.
|
|
288
324
|
|
|
289
|
-
|
|
325
|
+
1. Call the Agent tool — description: "UX architecture" — Prompt: "Create structural design foundation. INPUTS: frontend architecture section from architecture.md [paste], Design Research Brief [paste], reference screenshot paths [list], user persona [paste]. OUTPUT: information architecture, layout strategy, component hierarchy, responsive approach, interaction patterns. Base decisions on competitive research, not generic patterns."
|
|
290
326
|
|
|
291
|
-
|
|
327
|
+
2. Call the Agent tool — description: "Visual design spec" — Prompt: "Create the Visual Design Spec with AUTONOMOUS decisions — pick the single best direction, do not present options. INPUTS: UX foundation [paste previous output], Design Research Brief [paste], reference screenshot paths [list], user persona [paste]. OUTPUT: color system (with hex, light+dark), typography (Google Fonts, mathematical scale), 8px spacing system, tinted shadow system, border radius, animation/motion, component styles with ALL states. Every choice must cite the research. Apply anti-AI-template rules from the Design Protocol. Save to docs/plans/visual-design-spec.md."
|
|
292
328
|
|
|
293
|
-
|
|
294
|
-
- Run the tests the developer wrote — do they pass?
|
|
295
|
-
- Check the acceptance criteria from the Sprint Task List — is each one met?
|
|
296
|
-
- If frontend: take screenshots as visual proof
|
|
297
|
-
- Report: **PASS** (all criteria met with evidence) or **FAIL** (specific failures listed)
|
|
329
|
+
### Step 3.3 — Proof Screens (1 implementation agent)
|
|
298
330
|
|
|
299
|
-
|
|
331
|
+
Call the Agent tool — description: "Build proof screens" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: L] Implement 2-3 proof screens (landing/hero, main app view, key form). INPUTS: Visual Design Spec [paste], UX foundation [paste relevant sections], reference screenshots [list paths — these are your visual targets]. Use EXACT colors, fonts, spacing from spec. Real styled responsive pages, not wireframes. Include hover/focus states, transitions. Commit: 'feat: proof screens for design validation'."
|
|
300
332
|
|
|
301
|
-
|
|
302
|
-
- Bugs, logic errors, security issues
|
|
303
|
-
- Adherence to project conventions from the Architecture Document
|
|
304
|
-
- Code quality — is it simple, DRY, readable?
|
|
333
|
+
### Step 3.4 — Visual QA Loop (Playwright + Metric Loop)
|
|
305
334
|
|
|
306
|
-
|
|
335
|
+
Run the Metric Loop Protocol (`commands/protocols/metric-loop.md`) using the measurement criteria from the Design Protocol (`commands/protocols/design.md`, Step 3.4).
|
|
307
336
|
|
|
308
|
-
|
|
309
|
-
- Mark task as complete in TodoWrite
|
|
310
|
-
- Move to next task
|
|
311
|
-
- Reset retry counter
|
|
337
|
+
Measurement: take Playwright screenshots of the proof screens (desktop + mobile). A design critic agent receives the screenshots, the Visual Design Spec, and the reference screenshots, then scores 0-100 across 6 dimensions: spacing/alignment, typography hierarchy, color harmony, component polish, responsive quality, originality (anti-AI-template check).
|
|
312
338
|
|
|
313
|
-
**
|
|
314
|
-
- Increment retry counter
|
|
315
|
-
- Send specific feedback to the developer agent: what failed, what the QA/reviewer found
|
|
316
|
-
- Developer fixes and resubmits
|
|
317
|
-
- Repeat Steps 3.2-3.3
|
|
339
|
+
**Target: 80. Max 5 iterations.** On stall: accept if the score is >= 65; if it is below 65, log a warning.
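One way to read the exit rule for this loop is sketched below. The target (80), iteration cap (5), and stall floor (65) come from the text. The stall test itself is an assumption made for illustration (latest score not higher than the previous one); the actual definition lives in the Metric Loop Protocol, which is not shown here. The `visualQaDecision` name and the decision labels are hypothetical.

```typescript
type LoopDecision =
  | "continue"
  | "target-met"
  | "accept-on-stall"
  | "warn-on-stall"
  | "max-iterations";

const QA_TARGET = 80;
const QA_MAX_ITERATIONS = 5;
const QA_STALL_FLOOR = 65;

// `scores` holds one critic score (0-100) per completed iteration.
// ASSUMPTION: a stall means the latest score did not improve on the
// previous one. The Metric Loop Protocol may define it differently.
function visualQaDecision(scores: number[]): LoopDecision {
  if (scores.length === 0) return "continue";
  const latest = scores[scores.length - 1];
  if (latest >= QA_TARGET) return "target-met";
  const stalled =
    scores.length >= 2 && latest <= scores[scores.length - 2];
  if (stalled) {
    return latest >= QA_STALL_FLOOR ? "accept-on-stall" : "warn-on-stall";
  }
  if (scores.length >= QA_MAX_ITERATIONS) return "max-iterations";
  return "continue";
}
```

The function only classifies the situation; what the orchestrator does on `warn-on-stall` or `max-iterations` is left to the caller, since the text specifies a logged warning but not further handling.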
|
|
318
340
|
|
|
319
|
-
|
|
320
|
-
- Stop and escalate to the user with:
|
|
321
|
-
- What the task is trying to do
|
|
322
|
-
- What keeps failing
|
|
323
|
-
- The specific error or QA feedback
|
|
324
|
-
- Ask: "Fix manually, skip for now, or redesign the approach?"
|
|
341
|
+
### Step 3.5 — Autonomous Quality Gate
|
|
325
342
|
|
|
326
|
-
|
|
343
|
+
Log to `docs/plans/build-log.md`: final screenshot paths, score history table, design decisions, originality score. No user pause. Proceed to Phase 4.
|
|
327
344
|
|
|
328
|
-
|
|
345
|
+
**Compaction checkpoint:** Check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
|
|
329
346
|
|
|
330
|
-
|
|
347
|
+
---
|
|
331
348
|
|
|
332
|
-
|
|
333
|
-
```
|
|
334
|
-
Task [X/total]: [task name] — COMPLETE
|
|
335
|
-
Tests: [pass count] passing
|
|
336
|
-
Attempts: [retry count]
|
|
337
|
-
Next: [next task name]
|
|
338
|
-
```
|
|
349
|
+
## Phase 4: Foundation
|
|
339
350
|
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
Phase
|
|
343
|
-
|
|
344
|
-
|
|
345
|
-
Retry counter: 0
|
|
346
|
-
Agents used this phase: [list]
|
|
347
|
-
```
|
|
351
|
+
<HARD-GATE>
|
|
352
|
+
Before starting Phase 4: Phase 2 must be approved AND Phase 3 must have produced `docs/plans/visual-design-spec.md`.
|
|
353
|
+
If visual-design-spec.md does not exist, DO NOT PROCEED. Return to Phase 3.
|
|
354
|
+
Step 4.2 (Design System) MUST implement from visual-design-spec.md — not generic architecture tokens.
|
|
355
|
+
</HARD-GATE>
|
|
348
356
|
|
|
349
|
-
|
|
357
|
+
### Step 4.1 — Scaffolding
|
|
358
|
+
|
|
359
|
+
Call the Agent tool — description: "Project scaffolding" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: M] Set up the project from this architecture: [paste]. Create directory structure, dependencies, build tooling, linting config, test framework with one passing test, .gitignore, .env.example. Commit: 'feat: initial scaffolding'."
|
|
360
|
+
|
|
361
|
+
### Step 4.2 — Design System (frontend only)
|
|
362
|
+
|
|
363
|
+
Call the Agent tool — description: "Design system setup" — mode: "bypassPermissions" — prompt: "Implement the design system from the Visual Design Spec: [paste from docs/plans/visual-design-spec.md]. Create CSS tokens matching the spec's color system, typography scale, spacing system, shadow/elevation tokens, and base layout components. Reference the proof screens from Phase 3 as implementation targets. Commit: 'feat: design system'."
|
|
364
|
+
|
|
365
|
+
### Step 4.3 — Metric Loop: Scaffold Health
|
|
366
|
+
|
|
367
|
+
Run the Metric Loop Protocol. Define a metric: builds clean, tests pass, lint clean, structure matches architecture. Max 3 iterations.
|
|
350
368
|
|
|
351
|
-
|
|
369
|
+
### Step 4.4 — Verification Gate
|
|
352
370
|
|
|
353
|
-
|
|
371
|
+
Run the Verification Protocol (`commands/protocols/verify.md`). Critical rules (survive compaction):
|
|
372
|
+
- ONE agent runs all 6 checks sequentially: Build → Type-Check → Lint → Test → Security → Diff Review. Stop on first FAIL.
|
|
373
|
+
- Agent auto-detects stack from manifest files (package.json → Node, go.mod → Go, etc.).
|
|
374
|
+
- On FAIL: for build/type/lint errors, use the Build-Fix Protocol (`commands/protocols/build-fix.md`) — fixes one error at a time with cascade detection. For test/security/diff failures, spawn a targeted fix agent. Re-verify. Max 3 fix attempts.
|
|
375
|
+
- On PASS: log `VERIFY: PASS (6/6)` to `docs/plans/.build-state.md`. Proceed.
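The critical rules above can be sketched as follows. This is an illustrative model only: the check order, the stop-on-first-FAIL behavior, the two manifest mappings, and the `VERIFY: PASS (6/6)` string come from the text. The `CheckFn` stubs, the function names, the route labels, and the exact FAIL string format are assumptions.

```typescript
type CheckName =
  | "Build" | "Type-Check" | "Lint" | "Test" | "Security" | "Diff Review";
type CheckFn = () => boolean; // hypothetical stub: true means the check passed

const CHECK_ORDER: CheckName[] = [
  "Build", "Type-Check", "Lint", "Test", "Security", "Diff Review",
];

// Manifest-to-stack mapping. Only these two pairs appear in the source.
function detectStack(files: string[]): string | undefined {
  if (files.indexOf("package.json") !== -1) return "Node";
  if (files.indexOf("go.mod") !== -1) return "Go";
  return undefined;
}

// Runs the checks in order and stops at the first failure.
function runVerification(checks: Record<CheckName, CheckFn>): string {
  let passed = 0;
  for (const name of CHECK_ORDER) {
    if (!checks[name]()) return `VERIFY: FAIL (${name})`; // assumed format
    passed += 1;
  }
  return `VERIFY: PASS (${passed}/${CHECK_ORDER.length})`;
}

// Build, type, and lint errors go to the Build-Fix Protocol; everything
// else goes to a targeted fix agent. Route labels are hypothetical.
function fixRoute(failed: CheckName): "build-fix" | "targeted-fix-agent" {
  return failed === "Build" || failed === "Type-Check" || failed === "Lint"
    ? "build-fix"
    : "targeted-fix-agent";
}
```

Modelling each check as a function keeps the ordering and early exit visible in one loop; in practice each stub would wrap whatever command the detected stack uses.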
|
|
376
|
+
|
|
377
|
+
Call the Agent tool — description: "Verify scaffolding" — mode: "bypassPermissions" — prompt: "Run the Verification Protocol. Execute all 6 checks sequentially, stop on first failure. Report: VERIFY: PASS or VERIFY: FAIL with details."
|
|
378
|
+
|
|
379
|
+
Do not proceed to Phase 5 until verification passes.
|
|
380
|
+
|
|
381
|
+
Update TodoWrite and `docs/plans/.build-state.md`.
|
|
382
|
+
|
|
383
|
+
**Compaction checkpoint:** Check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
|
|
384
|
+
|
|
385
|
+
---
|
|
386
|
+
|
|
387
|
+
## Phase 5: Build — Metric-Driven Dev Loops
|
|
354
388
|
|
|
355
389
|
<HARD-GATE>
|
|
356
|
-
|
|
390
|
+
Before starting: Phase 2 must be approved, Phase 3 must have produced `docs/plans/visual-design-spec.md`, and Phase 4 must have passed verification. You MUST call the Agent tool for EVERY task. No exceptions.
|
|
357
391
|
</HARD-GATE>
|
|
358
392
|
|
|
359
|
-
|
|
393
|
+
Expand TodoWrite with each sprint task.
|
|
394
|
+
|
|
395
|
+
**For EACH task:**
|
|
396
|
+
|
|
397
|
+
### Step 5.1 — Implement
|
|
398
|
+
|
|
399
|
+
Call the Agent tool — description: "[task name]" — mode: "bypassPermissions" — prompt: "TASK: [task description + acceptance criteria]. HANDOFF — Architecture section: [paste ONLY the relevant section from architecture.md]. Design section: [paste ONLY the relevant section from the design doc]. Previous task output: [what the last completed task produced, if relevant]. Implement fully with real code and tests. Commit: 'feat: [task]'. Report what you built, files changed, and test results."
|
|
400
|
+
|
|
401
|
+
Pick the right developer framing: frontend, backend, AI, etc. Set `[COMPLEXITY: S/M/L]` based on the task's Size from sprint-tasks.md.
|
|
402
|
+
|
|
403
|
+
### Step 5.1b — Cleanup (De-Sloppify)
|
|
404
|
+
|
|
405
|
+
Follow the Cleanup Protocol (`commands/protocols/cleanup.md`). Critical rules (survive compaction):
|
|
406
|
+
[COMPLEXITY: S]
|
|
407
|
+
- Skip if trivial (< 20 lines, single file).
|
|
408
|
+
- Cleanup agent is a SEPARATE agent from the implementer — no cleaning your own mess.
|
|
409
|
+
- Scope is sacred: ONLY files from the implementation changeset. Zero exceptions.
|
|
410
|
+
- Cleanup fixes: naming, dead code, unused imports, style, DRY. Does NOT: add features, change architecture, touch other files.
|
|
411
|
+
- If cleanup breaks acceptance criteria, revert and skip. Never block the metric loop on cleanup failure.
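The skip and scope rules above can be sketched as two small predicates. This is a sketch under stated assumptions: the text says "< 20 lines, single file", and the code below assumes both conditions must hold together. The function names are hypothetical.

```typescript
// ASSUMPTION: a change is trivial only when it is under 20 lines AND
// touches a single file. The source lists both without a connective.
function shouldSkipCleanup(linesChanged: number, filesChanged: number): boolean {
  return linesChanged < 20 && filesChanged === 1;
}

// Scope rule: the cleanup agent may touch only files from the
// implementation changeset. Returns the requested edits that are allowed.
function filterToChangeset(requested: string[], changeset: string[]): string[] {
  return requested.filter((file) => changeset.indexOf(file) !== -1);
}
```

A check like `filterToChangeset` could run on the cleanup agent's reported file list before its commit is accepted, which gives the "zero exceptions" rule something mechanical to enforce.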
|
|
412
|
+
|
|
413
|
+
Call the Agent tool — description: "Cleanup [task name]" — mode: "bypassPermissions" — with the list of files changed and the task's acceptance criteria.
|
|
414
|
+
|
|
415
|
+
### Step 5.2 — Metric Loop: Task Quality
|
|
416
|
+
|
|
417
|
+
Run the Metric Loop Protocol on the task implementation. Define a metric based on the task's acceptance criteria. Max 5 iterations.
|
|
418
|
+
|
|
419
|
+
### Step 5.3 — Loop Exit
|
|
420
|
+
|
|
421
|
+
On target met: mark task complete in TodoWrite, report "Task X/N: [name] — COMPLETE (score: [final], iterations: [count])".
|
|
422
|
+
|
|
423
|
+
On stall or max iterations:
|
|
424
|
+
- **Interactive:** present score history + top remaining issue to user.
|
|
425
|
+
- **Autonomous:** accept if score >= 60% of target, skip otherwise. Log to `docs/plans/build-log.md`.
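The loop-exit rules for a task can be sketched as a single decision function. The 60%-of-target threshold and the interactive/autonomous split come from the text. The assumption that "target met" means `score >= target`, along with the function name and action labels, is illustrative.

```typescript
type ExitAction = "complete" | "ask-user" | "accept" | "skip";
type BuildMode = "interactive" | "autonomous";

// Called once the metric loop has ended (target met, stall, or max
// iterations). ASSUMPTION: target met means score >= target.
function taskLoopExit(score: number, target: number, mode: BuildMode): ExitAction {
  if (score >= target) return "complete";
  if (mode === "interactive") return "ask-user"; // present history + top issue
  // Autonomous: accept at 60% of target or better, otherwise skip.
  return score >= 0.6 * target ? "accept" : "skip";
}
```

For a target of 80, the autonomous acceptance threshold works out to 48, so a stalled task at 50 would be accepted and one at 47 would be skipped and logged.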
|
|
426
|
+
|
|
427
|
+
After each task: update TodoWrite and `docs/plans/.build-state.md`.
|
|
428
|
+
|
|
429
|
+
### Step 5.4 — Post-Task Verification
|
|
430
|
+
|
|
431
|
+
Run the Verification Protocol (`commands/protocols/verify.md`) to catch regressions. If FAIL, fix before starting the next task.
|
|
432
|
+
|
|
433
|
+
**Compaction checkpoint:** Check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
|
|
434
|
+
|
|
435
|
+
---
|
|
436
|
+
|
|
437
|
+
## Phase 6: Harden — Metric-Driven Hardening
|
|
438
|
+
|
|
439
|
+
### Step 6.0 — Pre-Hardening Verification
|
|
360
440
|
|
|
361
|
-
|
|
441
|
+
Run the Verification Protocol (`commands/protocols/verify.md`). ONE agent, 6 sequential checks (Build → Type → Lint → Test → Security → Diff), stop on first FAIL. Max 3 fix attempts. All checks must pass before starting expensive audit agents — do not waste audit agents on code that doesn't build or pass tests.
|
|
362
442
|
|
|
363
|
-
|
|
443
|
+
### Step 6.1 — Initial Audit (4 agents in parallel, ONE message)
|
|
364
444
|
|
|
365
|
-
|
|
445
|
+
Call the Agent tool 4 times in one message:
|
|
366
446
|
|
|
367
|
-
|
|
447
|
+
1. Description: "API testing" — Prompt: "Comprehensive API validation: all endpoints, edge cases, error responses, auth flows. Report findings with counts."
|
|
368
448
|
|
|
369
|
-
|
|
449
|
+
2. Description: "Performance audit" — Prompt: "Measure response times, identify bottlenecks, flag performance issues. Report benchmarks."
|
|
370
450
|
|
|
371
|
-
|
|
451
|
+
3. Description: "Accessibility audit" — Prompt: "WCAG compliance audit on all interfaces. Check screen reader, keyboard nav, contrast. Report issues with counts."
|
|
372
452
|
|
|
373
|
-
|
|
374
|
-
- Route to the appropriate developer agent with the specific finding
|
|
375
|
-
- Developer fixes the issue
|
|
376
|
-
- The agent that found the issue re-validates
|
|
377
|
-
- Dev↔QA loop until the specific issue is resolved
|
|
453
|
+
4. Description: "Security audit" — Prompt: "Security review: auth, input validation, data exposure, dependency vulnerabilities. Report findings with severity."
|
|
378
454
|
|
|
379
|
-
### Step
|
|
455
|
+
### Step 6.1b — Eval Harness
|
|
380
456
|
|
|
381
|
-
|
|
457
|
+
Run the Eval Harness Protocol (`commands/protocols/eval-harness.md`). Define 8-15 concrete, executable eval cases from the audit findings and architecture doc. Run the eval agent. Record baseline pass rate. CRITICAL and HIGH failures feed into the metric loop in Step 6.2 as specific issues to fix.
|
|
382
458
|
|
|
383
|
-
|
|
384
|
-
2. **type-design-analyzer** (Claude Code) — Review all type definitions for proper encapsulation and invariants
|
|
385
|
-
3. **comment-analyzer** (Claude Code) — Verify all comments are accurate and useful
|
|
459
|
+
### Step 6.2 — Metric Loop: Hardening Quality
|
|
386
460
|
|
|
387
|
-
|
|
461
|
+
Run the Metric Loop Protocol on the full codebase using audit findings as initial input. Define a composite metric based on what this project needs. Max 4 iterations.
|
|
388
462
|
|
|
389
|
-
|
|
463
|
+
When fixing, dispatch to the RIGHT specialist. Security → security agent. Accessibility → frontend agent. Don't send everything to one agent.
|
|
390
464
|
|
|
391
|
-
|
|
392
|
-
- Cross-validate all test results
|
|
393
|
-
- Review all QA evidence from Phase 3 and Phase 4
|
|
394
|
-
- Check every acceptance criterion from the Sprint Task List
|
|
395
|
-
- Verdict: **PRODUCTION READY** or **NEEDS WORK** with specific items
|
|
465
|
+
### Step 6.2b — Eval Re-run
|
|
396
466
|
|
|
397
|
-
|
|
467
|
+
Re-run the Eval Harness after the metric loop exits. All CRITICAL eval cases must now pass. If any CRITICAL case still fails, include it as evidence for the Reality Checker.
|
|
398
468
|
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
402
|
-
3.
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
469
|
+
### Step 6.2c — E2E Testing (3 mandatory iterations)
|
|
470
|
+
|
|
471
|
+
<HARD-GATE>
|
|
472
|
+
ALL 3 ITERATIONS ARE MANDATORY. Do NOT stop after iteration 1 even if all tests pass. The purpose of 3 runs is to catch flaky tests, timing-dependent failures, and race conditions that only surface on repeated execution. Skip this step ONLY if the project has no user-facing frontend.
|
|
473
|
+
</HARD-GATE>
|
|
474
|
+
|
|
475
|
+
Generate and execute end-to-end tests using Playwright against the running application. Tests cover critical user journeys derived from the design doc and architecture.
|
|
476
|
+
|
|
477
|
+
**Iteration 1 — Generate & Run:**
|
|
478
|
+
|
|
479
|
+
Call the Agent tool — description: "E2E test generation" — mode: "bypassPermissions" — prompt:
|
|
480
|
+
|
|
481
|
+
"[COMPLEXITY: L] Generate and run end-to-end Playwright tests for this application.
|
|
482
|
+
|
|
483
|
+
INPUTS:
|
|
484
|
+
- Architecture doc (user flows and API contracts): [paste relevant sections from docs/plans/architecture.md]
|
|
485
|
+
- Design doc (core user journeys): [paste relevant sections]
|
|
486
|
+
- Visual Design Spec (component selectors and page structure): [paste relevant sections from docs/plans/visual-design-spec.md]
|
|
487
|
+
|
|
488
|
+
REQUIREMENTS:
|
|
489
|
+
1. Identify 5-10 critical user journeys from the design doc (auth flows, core feature flows, data entry, navigation)
|
|
490
|
+
2. Use Page Object Model pattern — one page object per major view
|
|
491
|
+
3. Use data-testid selectors (add them to components if missing)
|
|
492
|
+
4. Wait for API responses, NEVER use arbitrary timeouts (no waitForTimeout)
|
|
493
|
+
5. Capture screenshots at critical verification points
|
|
494
|
+
6. Configure multi-browser: Chromium + Firefox + WebKit
|
|
495
|
+
7. Set up playwright.config.ts with: fullyParallel, retries: 0 (we handle retries ourselves), screenshot: 'only-on-failure', video: 'retain-on-failure', trace: 'on-first-retry'
|
|
496
|
+
8. Run all tests. Report: total, passed, failed, with failure details and screenshot paths.
|
|
497
|
+
9. Commit: 'test: e2e test suite for critical user journeys'
|
|
498
|
+
|
|
499
|
+
Test priority:
|
|
500
|
+
- CRITICAL: Auth, core feature happy path, data submission, payment/transaction flows
|
|
501
|
+
- HIGH: Search, filtering, navigation, error states
|
|
502
|
+
- MEDIUM: Responsive layout, animations, edge cases"
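For reference, requirements 6 and 7 of the prompt above roughly correspond to a `playwright.config.ts` like the following. This is a sketch of what the agent might produce, not a file shipped with the package; the project names are illustrative, and `testDir` and other settings are left at their defaults.

```typescript
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,
  // Retries are handled by the three orchestrated iterations instead.
  retries: 0,
  use: {
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'on-first-retry',
  },
  // Multi-browser: Chromium + Firefox + WebKit.
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```

One thing worth checking when using these values together: `trace: 'on-first-retry'` records a trace only when a test is retried, so with `retries: 0` it would likely never produce one. If traces are wanted for failures, `'retain-on-failure'` is an existing alternative value.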
|
|
503
|
+
|
|
504
|
+
Record results: total tests, pass count, fail count, failure details. Log to `docs/plans/.build-state.md` under `## E2E Testing`:
|
|
406
505
|
|
|
407
|
-
**Save state:** Update `docs/plans/.build-state.md`:
|
|
408
506
|
```
|
|
409
|
-
|
|
410
|
-
|
|
507
|
+
| Iter | Total | Passed | Failed | Flaky | Top Failure |
|
|
508
|
+
|------|-------|--------|--------|-------|-------------|
|
|
509
|
+
| 1 | ... | ... | ... | ... | ... |
|
|
411
510
|
```
|
|
412
511
|
|
|
413
|
-
|
|
512
|
+
**Iteration 2 — Fix & Re-run:**
|
|
414
513
|
|
|
415
|
-
|
|
514
|
+
Call the Agent tool — description: "E2E fix iteration 2" — mode: "bypassPermissions" — prompt:
|
|
416
515
|
|
|
417
|
-
|
|
516
|
+
"[COMPLEXITY: M] Fix E2E test failures and re-run the full suite.
|
|
418
517
|
|
|
419
|
-
|
|
518
|
+
ITERATION 1 RESULTS: [paste failure details — test names, error messages, screenshot paths]
|
|
420
519
|
|
|
421
|
-
|
|
520
|
+
For each failure:
|
|
521
|
+
1. Diagnose: Is this a real bug, a flaky test, or a missing data-testid?
|
|
522
|
+
2. Real bugs: Fix the application code
|
|
523
|
+
3. Flaky tests: Add proper waits, fix race conditions, improve selectors
|
|
524
|
+
4. Missing selectors: Add data-testid attributes to components
|
|
525
|
+
5. Do NOT delete or skip failing tests — fix them
|
|
422
526
|
|
|
423
|
-
|
|
424
|
-
|
|
425
|
-
- API documentation (if applicable)
|
|
426
|
-
- Any environment/deployment notes
|
|
527
|
+
Re-run ALL tests (not just previously failing ones). Report results.
|
|
528
|
+
Commit fixes: 'fix: e2e test failures iteration 2'"
|
|
427
529
|
|
|
428
|
-
|
|
530
|
+
Record results in the E2E table. Identify any tests that passed in iteration 1 but failed in iteration 2 — these are flaky candidates.
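The flaky-candidate rule (passed in iteration 1, failed in iteration 2) can be sketched as a comparison of two result maps. The `IterationResult` shape and the function name are hypothetical; only the rule itself comes from the text.

```typescript
type Outcome = "passed" | "failed";
// Hypothetical shape: test name -> outcome for one iteration.
type IterationResult = Record<string, Outcome>;

// A test is a flaky candidate if it passed in iteration 1 but failed
// in iteration 2. Tests missing from iteration 2 are ignored.
function flakyCandidates(
  iter1: IterationResult,
  iter2: IterationResult
): string[] {
  return Object.keys(iter1)
    .filter((name) => iter1[name] === "passed" && iter2[name] === "failed")
    .sort();
}
```

The resulting list is what would be pasted into the "Flaky candidates" slot of the iteration 3 prompt.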
|
|
429
531
|
|
|
430
|
-
|
|
532
|
+
**Iteration 3 — Final Stability Run:**
|
|
431
533
|
|
|
432
|
-
|
|
534
|
+
Call the Agent tool — description: "E2E stability run" — mode: "bypassPermissions" — prompt:
|
|
433
535
|
|
|
434
|
-
|
|
536
|
+
"[COMPLEXITY: M] Final E2E stability run — iteration 3 of 3.
|
|
435
537
|
|
|
436
|
-
|
|
538
|
+
PREVIOUS RESULTS:
|
|
539
|
+
- Iteration 1: [pass/fail counts]
|
|
540
|
+
- Iteration 2: [pass/fail counts]
|
|
541
|
+
- Flaky candidates: [tests that had inconsistent results across iterations]
|
|
437
542
|
|
|
438
|
-
|
|
439
|
-
|
|
440
|
-
|
|
543
|
+
REQUIREMENTS:
|
|
544
|
+
1. Run ALL tests with --repeat-each=3 to detect flakiness (each test runs 3 times within this iteration)
|
|
545
|
+
2. Any test failing inconsistently across the 3 sub-runs: quarantine it with test.fixme() and record the file path + reason
|
|
546
|
+
3. Fix any remaining consistent failures
|
|
547
|
+
4. Generate final report with: total journeys, pass rate, flaky count, quarantined tests
|
|
548
|
+
5. Commit: 'test: e2e stability fixes iteration 3'
|
|
441
549
|
|
|
442
|
-
|
|
443
|
-
Tasks: [completed]/[total] ([pass rate]%)
|
|
444
|
-
Tests: [count] passing
|
|
445
|
-
Commits: [count]
|
|
550
|
+
PASS CRITERIA: 95%+ pass rate across all tests. Quarantined flaky tests do not count against pass rate but must be logged."
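The pass criteria can be made concrete with a short calculation. The 95% threshold comes from the text; the interpretation that quarantined tests are removed from the denominator is an assumption, as are the `E2eSummary` shape and the function names.

```typescript
// Hypothetical summary of the final stability run.
interface E2eSummary {
  total: number;       // all tests, including quarantined ones
  passed: number;      // tests that passed consistently
  quarantined: number; // tests marked with test.fixme()
}

// ASSUMPTION: "do not count against pass rate" means quarantined tests
// are excluded from the denominator. Returns 0 if nothing is left to count.
function e2ePassRate(s: E2eSummary): number {
  const counted = s.total - s.quarantined;
  return counted === 0 ? 0 : s.passed / counted;
}

function meetsE2eCriteria(s: E2eSummary): boolean {
  return e2ePassRate(s) >= 0.95;
}
```

For example, with 42 tests of which 2 are quarantined, 39 passing gives 39/40 = 97.5% and meets the bar, while 37 passing gives 92.5% and does not. The quarantined count should still be reported alongside the rate, since the text requires it to be logged.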
|
|
446
551
|
|
|
447
|
-
|
|
448
|
-
Implementation: [which developer agents were used]
|
|
449
|
-
QA: [Evidence Collector + code-reviewer findings]
|
|
450
|
-
Hardening: [API Tester + Performance Benchmarker + Accessibility Auditor + Security Engineer]
|
|
451
|
-
Final Verdict: [Reality Checker's assessment]
|
|
552
|
+
Record final results. Include in Reality Checker evidence.
|
|
452
553
|
|
|
453
|
-
|
|
454
|
-
Files Modified: [count]
|
|
554
|
+
### Step 6.3 — Reality Check
|
|
455
555
|
|
|
456
|
-
|
|
457
|
-
|
|
556
|
+
Call the Agent tool — description: "Final verdict" — prompt: "You are the Reality Checker. Default: NEEDS WORK. The hardening loop reached score [final_score] after [iterations] iterations. Score history: [paste table]. Review all evidence. Eval harness results: [baseline pass rate] → [final pass rate]. E2E test results: [paste E2E table — 3 iterations, final pass rate, quarantined count]. CRITICAL failures remaining: [list or none]. Verdict: PRODUCTION READY or NEEDS WORK with specifics."
|
|
557
|
+
|
|
558
|
+
<HARD-GATE>Do NOT self-approve. Reality Checker must give the verdict.</HARD-GATE>
|
|
559
|
+
|
|
560
|
+
**Autonomous:** Log verdict to `docs/plans/build-log.md`. Continue.
|
|
561
|
+
**Interactive:** Present score history + verdict to user. Update state.
|
|
458
562
|
|
|
459
|
-
|
|
563
|
+
**Compaction checkpoint:** Check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
|
|
564
|
+
|
|
565
|
+
---
|
|
566
|
+
|
|
567
|
+
## Phase 7: Ship
|
|
568
|
+
|
|
569
|
+
### Step 7.0 — Pre-Ship Verification
|
|
570
|
+
|
|
571
|
+
Final verification gate. Run the Verification Protocol (`commands/protocols/verify.md`). ONE agent, 6 sequential checks (Build → Type → Lint → Test → Security → Diff), stop on first FAIL. Max 3 fix attempts. All checks must pass before documenting and shipping. If FAIL persists, return to Phase 6 for targeted fixes.
|
|
572
|
+
|
|
573
|
+
### Step 7.1 — Documentation
|
|
574
|
+
|
|
575
|
+
Call the Agent tool — description: "Documentation" — mode: "bypassPermissions" — prompt: "Write project docs: README with setup/architecture/usage, API docs if applicable, deployment notes. Commit: 'docs: project documentation'."
|
|
576
|
+
|
|
577
|
+
### Step 7.2 — Metric Loop: Documentation Quality
|
|
578
|
+
|
|
579
|
+
Run the Metric Loop Protocol on documentation. Define a metric based on completeness and whether a new developer could follow the README. Max 3 iterations.
|
|
580
|
+
|
|
581
|
+
### Step 7.3 — Record Learnings
|
|
582
|
+
|
|
583
|
+
Append to `docs/plans/learnings.md` (create if it doesn't exist). Review the build and record 3-5 learnings:
|
|
584
|
+
|
|
585
|
+
- **PATTERN:** [what worked well and should be repeated in future builds]
|
|
586
|
+
- **PITFALL:** [what failed, caused waste, or required excessive iterations]
|
|
587
|
+
- **HEURISTIC:** [project-specific tuning discovered during this build]
|
|
588
|
+
|
|
589
|
+
Base learnings on: metric loop stall patterns, build-fix frequency, phases that exceeded expected iterations, agent prompts that needed rework.
|
|
590
|
+
|
|
591
|
+
### Completion Report
|
|
592
|
+
|
|
593
|
+
Create final commit. Present:
|
|
460
594
|
|
|
461
|
-
**Save final state:** Update `docs/plans/.build-state.md`:
|
|
462
595
|
```
|
|
463
|
-
|
|
596
|
+
BUILD COMPLETE
|
|
597
|
+
Project: [name] | Tasks: [done]/[total] | Tests: [count] passing
|
|
598
|
+
Agents used: [list] | Verdict: [Reality Checker result]
|
|
599
|
+
Metric loops run: [count] | Avg iterations: [N]
|
|
600
|
+
Remaining: [any NEEDS WORK items]
|
|
464
601
|
```
|
|
602
|
+
|
|
603
|
+
Mark all TodoWrite items complete. Update `docs/plans/.build-state.md`: "Phase: 7 COMPLETE."
|