buildanything 1.5.0 → 1.7.0
- package/.claude-plugin/marketplace.json +2 -1
- package/.claude-plugin/plugin.json +10 -2
- package/agents/agentic-identity-trust.md +65 -311
- package/agents/data-consolidation-agent.md +3 -22
- package/agents/design-brand-guardian.md +52 -275
- package/agents/design-image-prompt-engineer.md +67 -196
- package/agents/design-ui-designer.md +55 -351
- package/agents/design-ux-architect.md +54 -427
- package/agents/design-ux-researcher.md +48 -299
- package/agents/design-whimsy-injector.md +58 -405
- package/agents/engineering-backend-architect.md +39 -202
- package/agents/engineering-data-engineer.md +41 -236
- package/agents/engineering-devops-automator.md +73 -258
- package/agents/engineering-frontend-developer.md +33 -206
- package/agents/engineering-mobile-app-builder.md +36 -446
- package/agents/engineering-rapid-prototyper.md +34 -428
- package/agents/engineering-security-engineer.md +44 -204
- package/agents/engineering-senior-developer.md +18 -138
- package/agents/engineering-technical-writer.md +40 -302
- package/agents/marketing-app-store-optimizer.md +63 -276
- package/agents/marketing-social-media-strategist.md +38 -87
- package/agents/project-management-experiment-tracker.md +62 -156
- package/agents/report-distribution-agent.md +4 -24
- package/agents/sales-data-extraction-agent.md +3 -22
- package/agents/specialized-cultural-intelligence-strategist.md +41 -62
- package/agents/specialized-developer-advocate.md +65 -234
- package/agents/support-analytics-reporter.md +76 -306
- package/agents/support-executive-summary-generator.md +26 -172
- package/agents/support-finance-tracker.md +67 -362
- package/agents/support-legal-compliance-checker.md +40 -497
- package/agents/support-support-responder.md +40 -532
- package/agents/testing-accessibility-auditor.md +67 -271
- package/agents/testing-api-tester.md +58 -274
- package/agents/testing-evidence-collector.md +48 -170
- package/agents/testing-performance-benchmarker.md +75 -236
- package/agents/testing-reality-checker.md +49 -192
- package/agents/testing-test-results-analyzer.md +70 -276
- package/agents/testing-tool-evaluator.md +52 -368
- package/agents/testing-workflow-optimizer.md +66 -415
- package/bin/setup.js +45 -0
- package/bin/sync-version.js +38 -0
- package/commands/add-feature.md +98 -0
- package/commands/build.md +285 -79
- package/commands/dogfood.md +43 -0
- package/commands/fix.md +89 -0
- package/commands/idea-sweep.md +19 -82
- package/commands/refactor.md +68 -0
- package/commands/ux-review.md +81 -0
- package/commands/verify.md +43 -0
- package/hooks/session-start +22 -14
- package/package.json +4 -1
- package/agents/agents-orchestrator.md +0 -365
- package/agents/data-analytics-reporter.md +0 -52
- package/agents/lsp-index-engineer.md +0 -312
- package/agents/macos-spatial-metal-engineer.md +0 -335
- package/agents/marketing-content-creator.md +0 -52
- package/agents/marketing-growth-hacker.md +0 -52
- package/agents/product-sprint-prioritizer.md +0 -152
- package/agents/product-trend-researcher.md +0 -157
- package/agents/project-management-project-shepherd.md +0 -192
- package/agents/project-management-studio-operations.md +0 -198
- package/agents/project-management-studio-producer.md +0 -201
- package/agents/project-manager-senior.md +0 -133
- package/agents/support-infrastructure-maintainer.md +0 -616
- package/agents/terminal-integration-specialist.md +0 -68
- package/agents/visionos-spatial-engineer.md +0 -52
- package/agents/xr-cockpit-interaction-specialist.md +0 -30
- package/agents/xr-immersive-developer.md +0 -30
- package/agents/xr-interface-architect.md +0 -30
- package/commands/protocols/brainstorm.md +0 -99
- package/commands/protocols/build-fix.md +0 -52
- package/commands/protocols/cleanup.md +0 -56
- package/commands/protocols/eval-harness.md +0 -62
- package/commands/protocols/metric-loop.md +0 -94
- package/commands/protocols/planning.md +0 -56
- package/commands/protocols/verify.md +0 -63
package/commands/build.md
CHANGED
|
@@ -51,6 +51,8 @@ If you catch yourself typing code or reading source files: STOP. You are wasting
|
|
|
51
51
|
- `last_save: [Phase.Step]`
|
|
52
52
|
Increment after each agent returns (parallel dispatch of 4 agents = +4). Reset to 0 after each compaction save.
|
|
53
53
|
|
|
54
|
+
**Compaction checkpoint format:** At every phase boundary, check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
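The checkpoint rule above can be sketched in a few lines. This is a minimal illustration, not code from the package: it assumes the state file stores the counter as a plain `dispatches_since_save: N` entry (the form used in the Step 0.3 state template), and the helper names `read_counter`, `needs_checkpoint`, and `reset_counter` are hypothetical.

```python
import re

SAVE_THRESHOLD = 8  # from the rule above: save when the counter is >= 8

# Assumed layout: the counter appears as "dispatches_since_save: <int>".
COUNTER_RE = re.compile(r"dispatches_since_save:\s*(\d+)")


def read_counter(state_text: str) -> int:
    """Return the dispatch counter found in the state text, or 0 if absent."""
    match = COUNTER_RE.search(state_text)
    return int(match.group(1)) if match else 0


def needs_checkpoint(state_text: str) -> bool:
    """True when a full state save is due at a phase boundary."""
    return read_counter(state_text) >= SAVE_THRESHOLD


def reset_counter(state_text: str) -> str:
    """Return the state text with the counter reset to 0 after a save."""
    return COUNTER_RE.sub("dispatches_since_save: 0", state_text, count=1)
```

Keeping the threshold check separate from the reset mirrors the order in the rule: decide first, save everything, then zero the counter.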
|
|
55
|
+
|
|
54
56
|
Input: $ARGUMENTS
|
|
55
57
|
|
|
56
58
|
### Autonomous Mode
|
|
@@ -67,7 +69,7 @@ When combining `--resume` with `--autonomous`: the current invocation's flags ta
|
|
|
67
69
|
|
|
68
70
|
### Metric Loop
|
|
69
71
|
|
|
70
|
-
Every phase uses a **metric-driven iteration loop** to drive quality. Read the full protocol at `
|
|
72
|
+
Every phase uses a **metric-driven iteration loop** to drive quality. Read the full protocol at `protocols/metric-loop.md`. Critical rules (survive compaction):
|
|
71
73
|
|
|
72
74
|
1. YOU define a metric for this phase based on context (what you're building, what matters). The metric is NOT predefined.
|
|
73
75
|
2. Spawn a **measurement agent** to score the artifact 0-100. Read its full output — it's analysis.
|
|
@@ -91,19 +93,11 @@ When spawning agents in sequence (e.g., architect → implementer → reviewer),
|
|
|
91
93
|
2. **Previous agent's output** — what the upstream agent produced (if any)
|
|
92
94
|
3. **Acceptance criteria** — what "done" looks like for THIS agent
|
|
93
95
|
|
|
94
|
-
For implementation agents (Phase
|
|
96
|
+
For implementation agents (Phase 5+): Do NOT paste the entire Design Document or Architecture Document. Extract the relevant sections only. For research and architecture agents (Phases 1-2): pass the full document — these agents need complete context to do their analysis.
|
|
95
97
|
|
|
96
98
|
### Complexity Routing (Advisory)
|
|
97
99
|
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
| Complexity | Task Types | Preferred Tier |
|
|
101
|
-
|-----------|-----------|----------------|
|
|
102
|
-
| S | Build-fix, cleanup, lint fix, single-error fix | Haiku-class (fastest) |
|
|
103
|
-
| M | Measurement, eval, testing, single-feature impl | Sonnet-class (balanced) |
|
|
104
|
-
| L | Architecture, research, multi-file impl, debugging | Opus-class (deepest reasoning) |
|
|
105
|
-
|
|
106
|
-
For sprint tasks, use the Size field from `docs/plans/sprint-tasks.md`. This is advisory — the tag documents intent for future model routing support.
|
|
100
|
+
Tag agent prompts with `[COMPLEXITY: S/M/L]` based on task size from `docs/plans/sprint-tasks.md`. This is advisory — the tag documents intent for future model routing support.
|
|
107
101
|
|
|
108
102
|
---
|
|
109
103
|
|
|
@@ -112,7 +106,7 @@ For sprint tasks, use the Size field from `docs/plans/sprint-tasks.md`. This is
|
|
|
112
106
|
**Resuming?** If the input contains `--resume` OR if context was just compacted (SessionStart hook fired with active state):
|
|
113
107
|
1. Read `docs/plans/.build-state.md` — verify it exists and has a Resume Point section.
|
|
114
108
|
If `docs/plans/.build-state.md` does not exist or has no Resume Point, warn the user: 'No previous build state found. Starting fresh.' Then proceed to Step 0.1 as a new build.
|
|
115
|
-
2. Re-read this file and all protocol files in `
|
|
109
|
+
2. Re-read this file and all protocol files in `protocols/`.
|
|
116
110
|
3. Re-read `docs/plans/sprint-tasks.md`, `docs/plans/architecture.md`, and `CLAUDE.md`.
|
|
117
111
|
4. Rebuild TodoWrite from the state file (TodoWrite does NOT survive compaction or session breaks).
|
|
118
112
|
5. Reset `dispatches_since_save` to 0 (fresh context window).
|
|
@@ -169,7 +163,7 @@ Autonomous mode: Log checklist to `docs/plans/build-log.md`. Create `.env.exampl
|
|
|
169
163
|
### Step 0.3 — Initialize
|
|
170
164
|
|
|
171
165
|
0. Create `docs/plans/` directory if it doesn't exist (greenfield projects won't have it).
|
|
172
|
-
1. Create a TodoWrite checklist with Phases 0-
|
|
166
|
+
1. Create a TodoWrite checklist with Phases 0-7.
|
|
173
167
|
2. Create `docs/plans/.build-state.md` as a single write with ALL of the following: phase and step (`Phase: 0 — Starting`), input (`[build request]`), context level (`[classification]`), prerequisites (`[status]`), dispatch counter (`dispatches_since_save: 0, last_save: Phase 0`), and a `## Resume Point` section with: phase, step, autonomous mode flag, completed tasks (none), git branch name.
|
|
174
168
|
3. Go to Phase 1 (or Phase 2 if context level is "Full design").
|
|
175
169
|
|
|
@@ -183,7 +177,7 @@ Autonomous mode: Log checklist to `docs/plans/build-log.md`. Create `.env.exampl
|
|
|
183
177
|
|
|
184
178
|
### Step 1.1 — Brainstorming
|
|
185
179
|
|
|
186
|
-
Follow the Brainstorm Protocol (`
|
|
180
|
+
Follow the Brainstorm Protocol (`protocols/brainstorm.md`).
|
|
187
181
|
|
|
188
182
|
In interactive mode: this is a conversation. Ask questions one at a time, propose approaches with trade-offs, let the user decide. Output: Design Document saved to `docs/plans/`.
|
|
189
183
|
|
|
@@ -195,15 +189,15 @@ Skip if context level is "Decision brief" (research already done).
|
|
|
195
189
|
|
|
196
190
|
Call the Agent tool 5 times in a single message. Pass each agent the build request AND the Design Document draft.
|
|
197
191
|
|
|
198
|
-
1. Description: "Market research" — Prompt: "Research market size (TAM/SAM/SOM), competitive landscape (5-10 players), timing, and market structure for: [build request]. Design context: [paste design doc].
|
|
192
|
+
1. Description: "Market research" — Prompt: "Research market size (TAM/SAM/SOM), competitive landscape (5-10 players), timing, and market structure for: [build request]. Design context: [paste design doc]. Report with a Market Verdict: GREEN/AMBER/RED."
|
|
199
193
|
|
|
200
|
-
2. Description: "Tech feasibility" — Prompt: "Evaluate hard technical problems (Solved/Hard/Unsolved), build-vs-buy decisions, MVP scope, and stack validation for: [build request]. Design context: [paste design doc].
|
|
194
|
+
2. Description: "Tech feasibility" — Prompt: "Evaluate hard technical problems (Solved/Hard/Unsolved), build-vs-buy decisions, MVP scope, and stack validation for: [build request]. Design context: [paste design doc]. Verify APIs and libraries from the design exist and are maintained. Report with a Technical Verdict."
|
|
201
195
|
|
|
202
|
-
3. Description: "User research" — Prompt: "Analyze target persona, jobs-to-be-done, current alternatives, behavioral barriers to adoption for: [build request]. Design context: [paste design doc].
|
|
196
|
+
3. Description: "User research" — Prompt: "Analyze target persona, jobs-to-be-done, current alternatives, and behavioral barriers to adoption for: [build request]. Design context: [paste design doc]. Report with a User Verdict."
|
|
203
197
|
|
|
204
|
-
4. Description: "Business model" — Prompt: "Evaluate revenue models, unit economics, growth loops, first-1000-users strategy for: [build request]. Design context: [paste design doc].
|
|
198
|
+
4. Description: "Business model" — Prompt: "Evaluate revenue models, unit economics, growth loops, and first-1000-users strategy for: [build request]. Design context: [paste design doc]. Report with a Business Verdict."
|
|
205
199
|
|
|
206
|
-
5. Description: "Risk analysis" — Prompt: "Adversarial review: regulatory risk, security concerns, dependency risks, competitive response, top 3 failure modes for: [build request]. Design context: [paste design doc].
|
|
200
|
+
5. Description: "Risk analysis" — Prompt: "Adversarial review: regulatory risk, security concerns, dependency risks, competitive response, top 3 failure modes for: [build request]. Design context: [paste design doc]. Report with a Risk Verdict."
|
|
207
201
|
|
|
208
202
|
After all 5 return, synthesize a **Research Brief** with a verdict table. Save to `docs/plans/research-brief.md`.
|
|
209
203
|
|
|
@@ -218,17 +212,41 @@ Read the Design Document and Research Brief together. Check for contradictions:
|
|
|
218
212
|
|
|
219
213
|
Update the Design Document with corrections. Save final version.
|
|
220
214
|
|
|
221
|
-
### Step 1.4 —
|
|
215
|
+
### Step 1.4 — Write CLAUDE.md
|
|
216
|
+
|
|
217
|
+
Create (or overwrite) the project's `CLAUDE.md`. This is the product brain — every agent spawned during the build reads it automatically. Write it from the Design Document and Research Brief. It must give any agent enough context to make smart product, UX, and technical decisions without needing the full design doc.
|
|
218
|
+
|
|
219
|
+
<HARD-GATE>
|
|
220
|
+
CLAUDE.md must be under 200 lines. It is not a wiki, not a conventions doc, not a dump of everything you know. It is the minimum context an agent needs to make correct decisions about this specific product.
|
|
221
|
+
</HARD-GATE>
|
|
222
222
|
|
|
223
|
-
|
|
223
|
+
Structure:
|
|
224
224
|
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
-
|
|
228
|
-
|
|
229
|
-
|
|
225
|
+
```
|
|
226
|
+
## Product
|
|
227
|
+
[1-3 sentences: what this is, core value prop, what success looks like]
|
|
228
|
+
|
|
229
|
+
## User
|
|
230
|
+
[Primary persona: who they are, what they care about, pain points,
|
|
231
|
+
technical sophistication. This drives every UX decision.]
|
|
232
|
+
|
|
233
|
+
## Tech Stack
|
|
234
|
+
[Stack choices with 1-line rationale for each. Framework, DB, auth,
|
|
235
|
+
key libraries, deployment target.]
|
|
236
|
+
|
|
237
|
+
## Scope
|
|
238
|
+
[What's in MVP vs. deferred. Hard boundaries. This prevents agents
|
|
239
|
+
from building features that aren't scoped.]
|
|
240
|
+
|
|
241
|
+
## Rules
|
|
242
|
+
[Project-specific hard rules derived from the product and user context.
|
|
243
|
+
Examples: "All data must be real-time — no simulated/fake data",
|
|
244
|
+
"User must be able to pause/stop any automated process at any time",
|
|
245
|
+
"Every interactive element must have visible feedback within 200ms".
|
|
246
|
+
Only include rules this specific project needs — not generic best practices.]
|
|
247
|
+
```
|
|
230
248
|
|
|
231
|
-
|
|
249
|
+
Keep it product-focused. An implementation agent reading this should understand WHO the user is and WHAT matters enough to make the right call when the handoff prompt doesn't cover an edge case.
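The hard gate and the five-section structure for `CLAUDE.md` lend themselves to a mechanical check. The sketch below is an assumption about how such a check might look; nothing in the diff shows the package shipping one, and the function name `check_claude_md` is hypothetical. It only verifies the line limit and that each heading from the structure template is present.

```python
from typing import List

# Headings taken from the structure template; the limit comes from the hard gate.
REQUIRED_SECTIONS = ["## Product", "## User", "## Tech Stack", "## Scope", "## Rules"]
MAX_LINES = 200


def check_claude_md(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    lines = text.splitlines()
    problems = []
    if len(lines) >= MAX_LINES:
        problems.append(f"too long: {len(lines)} lines (must be under {MAX_LINES})")
    headings = {line.strip() for line in lines}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    return problems
```

A check like this cannot judge whether the content is product-focused; it only catches the two failures that are countable.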
|
|
232
250
|
|
|
233
251
|
### Quality Gate 1
|
|
234
252
|
|
|
@@ -238,7 +256,7 @@ This ensures decisions survive context compaction.
|
|
|
238
256
|
|
|
239
257
|
Update TodoWrite and `docs/plans/.build-state.md`.
|
|
240
258
|
|
|
241
|
-
**Compaction checkpoint
|
|
259
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
242
260
|
|
|
243
261
|
---
|
|
244
262
|
|
|
@@ -270,13 +288,13 @@ After all 4 return, YOU synthesize into one Architecture Document. Save to `docs
|
|
|
270
288
|
|
|
271
289
|
### Step 2.3 — Metric Loop: Architecture Quality
|
|
272
290
|
|
|
273
|
-
Run the Metric Loop Protocol (`
|
|
291
|
+
Run the Metric Loop Protocol (`protocols/metric-loop.md`) on the Architecture Document. Define a metric based on: coverage of design doc requirements, specificity, consistency between agents, and **simplicity** — is this the simplest architecture that meets the requirements? Could any service, abstraction, or dependency be eliminated without losing functionality? Penalize over-engineering (microservices for a simple app, Kubernetes for a static site, complex state management for a 3-page app). Max 3 iterations.
|
|
274
292
|
|
|
275
293
|
### Step 2.4 — Sprint Planning
|
|
276
294
|
|
|
277
|
-
Follow the Planning Protocol (`
|
|
295
|
+
Follow the Planning Protocol (`protocols/planning.md`). Use 2 sequential Agent tool calls:
|
|
278
296
|
|
|
279
|
-
Call the Agent tool — description: "Sprint breakdown" — prompt: "Break this architecture into ordered, atomic tasks. Each task needs: description, acceptance criteria, dependencies, size (S/M/L). ARCHITECTURE: [paste]. DESIGN DOC: [paste]. Scope to MVP only."
|
|
297
|
+
Call the Agent tool — description: "Sprint breakdown" — prompt: "Break this architecture into ordered, atomic tasks. Each task needs: description, acceptance criteria, dependencies, size (S/M/L). Include a `**Behavioral Test:**` field for every task that has UI — a concrete interaction test: 'Navigate to [page], click [element], verify [expected outcome]'. API-only tasks should have curl-based acceptance tests instead. ARCHITECTURE: [paste]. DESIGN DOC: [paste]. Scope to MVP only."
|
|
280
298
|
|
|
281
299
|
Then call the Agent tool — description: "Validate task list" — prompt: "Validate this task list: [paste]. Check scope is realistic, no missing tasks, descriptions specific enough for a developer agent to execute, all tasks within MVP boundary."
|
|
282
300
|
|
|
@@ -290,61 +308,125 @@ Save to `docs/plans/sprint-tasks.md`.
|
|
|
290
308
|
|
|
291
309
|
Update TodoWrite and `docs/plans/.build-state.md`.
|
|
292
310
|
|
|
293
|
-
**Compaction checkpoint
|
|
311
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
312
|
+
|
|
313
|
+
---
|
|
314
|
+
|
|
315
|
+
## Phase 3: Design & Visual Identity
|
|
316
|
+
|
|
317
|
+
**Goal**: Transform architecture into a research-backed visual design system, proven with Playwright screenshots. Fully autonomous — agents research, decide, and iterate without user input.
|
|
318
|
+
|
|
319
|
+
**Skip if** the project has no user-facing frontend (CLI tools, pure APIs, backend services).
|
|
320
|
+
|
|
321
|
+
<HARD-GATE>
|
|
322
|
+
UI/UX IS THE PRODUCT. This phase is a full peer to Architecture and Build — not a footnote, not an afterthought, not a "nice to have." Do NOT skip, compress, or rush this phase for any reason. The agents must research real competitors and award-winning sites, make deliberate visual choices backed by that research, build a living style guide with every component rendered and interactive, and iterate with Playwright-verified visual QA before a single line of product code is written.
|
|
323
|
+
|
|
324
|
+
Phase 4 (Foundation) WILL NOT START without `docs/plans/visual-design-spec.md`. If it does not exist, return here.
|
|
325
|
+
</HARD-GATE>
|
|
326
|
+
|
|
327
|
+
### Step 3.1 — Design Research (2 agents, parallel, both use Playwright)
|
|
328
|
+
|
|
329
|
+
Follow the Design Protocol (`protocols/design.md`), Step 3.1.
|
|
330
|
+
|
|
331
|
+
Call the Agent tool 2 times in one message:
|
|
332
|
+
|
|
333
|
+
1. Description: "Competitive visual audit" — Prompt: "Research the top 5-8 competitors/analogues for: [product description]. Use Playwright to screenshot each site (desktop 1920x1080 + mobile 375x812). Screenshot standout components (hero, cards, forms, nav, CTAs). Save to docs/plans/design-references/competitors/. Analyze visual language: colors, typography, spacing, what feels premium vs cheap. Rank by visual quality. DESIGN DOC: [paste]."
|
|
334
|
+
|
|
335
|
+
2. Description: "Design inspiration mining" — Prompt: "Search Awwwards.com, Godly.website, SiteInspire for award-winning sites in category: [product category]. Use Playwright to screenshot top 5-8 results + standout components. Save to docs/plans/design-references/inspiration/. Identify visual trends, what separates best-in-class from generic. DESIGN DOC: [paste]."
|
|
336
|
+
|
|
337
|
+
After both return, synthesize a **Design Research Brief** to `docs/plans/design-research.md`. Include all screenshot paths.
|
|
338
|
+
|
|
339
|
+
### Step 3.2 — Design Direction (2 agents, sequential)
|
|
340
|
+
|
|
341
|
+
Follow the Design Protocol (`protocols/design.md`), Step 3.2.
|
|
342
|
+
|
|
343
|
+
1. Call the Agent tool — description: "UX architecture" — Prompt: "Create structural design foundation. INPUTS: frontend architecture section from architecture.md [paste], Design Research Brief [paste], reference screenshot paths [list], user persona [paste]. OUTPUT: information architecture, layout strategy, component hierarchy, responsive approach, interaction patterns. Base decisions on competitive research, not generic patterns."
|
|
344
|
+
|
|
345
|
+
2. Call the Agent tool — description: "Visual design spec" — Prompt: "Create the Visual Design Spec with AUTONOMOUS decisions — pick the single best direction, do not present options. INPUTS: UX foundation [paste previous output], Design Research Brief [paste], reference screenshot paths [list], user persona [paste]. OUTPUT: color system (with hex, light+dark), typography (Google Fonts, mathematical scale), 8px spacing system, tinted shadow system, border radius, animation/motion, component styles with ALL states. Every choice must cite the research. Apply anti-AI-template rules from the Design Protocol. Save to docs/plans/visual-design-spec.md."
|
|
346
|
+
|
|
347
|
+
### Step 3.3 — Living Style Guide (1 implementation agent)
|
|
348
|
+
|
|
349
|
+
Follow the Design Protocol (`protocols/design.md`), Step 3.3.
|
|
350
|
+
|
|
351
|
+
Call the Agent tool — description: "Build living style guide" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: L] Build a living style guide page (/design-system route or standalone HTML). INPUTS: Visual Design Spec [paste], UX foundation [paste relevant sections], reference screenshots [list paths — these are your quality targets]. Must include rendered, interactive examples of: color swatches, typography scale, spacing scale, buttons (all states), form elements (all states), cards, navigation, feedback components (alerts, toasts, spinners, empty states), modals/overlays, and layout grid examples. Every component interactive (hover, focus, transitions work). Mobile-responsive. This ships with the product. Commit: 'feat: living style guide'."
|
|
352
|
+
|
|
353
|
+
### Step 3.4 — Visual QA Loop (Playwright + Metric Loop)
|
|
354
|
+
|
|
355
|
+
Run the Metric Loop Protocol (`protocols/metric-loop.md`) using the measurement criteria from the Design Protocol (`protocols/design.md`, Step 3.4).
|
|
356
|
+
|
|
357
|
+
Measurement: Playwright screenshots of the living style guide sections (desktop + mobile). Design critic agent scores 0-100 across 6 dimensions: spacing/alignment, typography hierarchy, color harmony, component polish, responsive quality, originality (anti-AI-template check). Receives screenshots + Visual Design Spec + reference screenshots.
|
|
358
|
+
|
|
359
|
+
**Target: 80. Max 5 iterations.** On stall: accept if >= 65, log warning below 65.
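The exit rule for this loop can be written as a small decision function. This is a sketch under stated assumptions: the diff does not define how a stall is detected, so `stalled` is passed in as an input, and treating the iteration cap the same way as a stall is an assumption made here for illustration. The function name and the returned labels are hypothetical.

```python
TARGET = 80              # target score from the step above
MAX_ITERATIONS = 5       # iteration cap from the step above
STALL_ACCEPT_FLOOR = 65  # on stall: accept at or above this score


def visual_qa_decision(score: int, iteration: int, stalled: bool) -> str:
    """Decide what to do after the design critic returns a score.

    Returns "done" when the target is met, "accept" or "warn" when the
    loop has to stop short of the target, and "iterate" otherwise.
    """
    if score >= TARGET:
        return "done"
    # Assumption: hitting the iteration cap is handled like a stall.
    if stalled or iteration >= MAX_ITERATIONS:
        return "accept" if score >= STALL_ACCEPT_FLOOR else "warn"
    return "iterate"
```

For example, a stalled score of 70 is accepted, while a stalled score of 60 is logged as a warning.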
|
|
360
|
+
|
|
361
|
+
### Step 3.5 — Autonomous Quality Gate
|
|
362
|
+
|
|
363
|
+
Log to `docs/plans/build-log.md`: final screenshot paths, score history table, design decisions, originality score. No user pause. Proceed to Phase 4.
|
|
364
|
+
|
|
365
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
294
366
|
|
|
295
367
|
---
|
|
296
368
|
|
|
297
|
-
## Phase
|
|
369
|
+
## Phase 4: Foundation
|
|
370
|
+
|
|
371
|
+
<HARD-GATE>
|
|
372
|
+
Before starting Phase 4: Phase 2 must be approved AND Phase 3 must have produced `docs/plans/visual-design-spec.md`.
|
|
373
|
+
If visual-design-spec.md does not exist, DO NOT PROCEED. Return to Phase 3.
|
|
374
|
+
Step 4.2 (Design System) MUST implement from visual-design-spec.md — not generic architecture tokens.
|
|
375
|
+
</HARD-GATE>
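The precondition in this gate reduces to one flag and one file check. A minimal sketch, assuming the approval state of Phase 2 is already known to the caller as a boolean; the function name `phase_4_may_start` is hypothetical and not part of the package.

```python
from pathlib import Path


def phase_4_may_start(project_root: str, phase_2_approved: bool) -> bool:
    """True only if Phase 2 is approved and the visual design spec exists."""
    spec = Path(project_root) / "docs" / "plans" / "visual-design-spec.md"
    return bool(phase_2_approved and spec.is_file())
```

When this returns False because the spec is missing, the gate says to return to Phase 3 instead of continuing.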
|
|
298
376
|
|
|
299
|
-
### Step
|
|
377
|
+
### Step 4.1 — Scaffolding
|
|
300
378
|
|
|
301
379
|
Call the Agent tool — description: "Project scaffolding" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: M] Set up the project from this architecture: [paste]. Create directory structure, dependencies, build tooling, linting config, test framework with one passing test, .gitignore, .env.example. Commit: 'feat: initial scaffolding'."
|
|
302
380
|
|
|
303
|
-
### Step
|
|
381
|
+
### Step 4.2 — Design System (frontend only)
|
|
304
382
|
|
|
305
|
-
Call the Agent tool — description: "Design system setup" — mode: "bypassPermissions" — prompt: "Implement design system
|
|
383
|
+
Call the Agent tool — description: "Design system setup" — mode: "bypassPermissions" — prompt: "Implement the design system from the Visual Design Spec: [paste from docs/plans/visual-design-spec.md]. Create CSS tokens matching the spec's color system, typography scale, spacing system, shadow/elevation tokens, and base layout components. The living style guide from Phase 3 is the reference implementation — components must match. Commit: 'feat: design system'."
|
|
306
384
|
|
|
307
|
-
### Step
|
|
385
|
+
### Step 4.2b — Acceptance Test Scaffolding
|
|
386
|
+
|
|
387
|
+
Call the Agent tool — description: "Scaffold acceptance tests" — mode: "bypassPermissions" — prompt: "Read docs/plans/sprint-tasks.md. For every task with a Behavioral Test field, create a Playwright test stub in tests/e2e/acceptance/. Use Page Object Model. Each test should: navigate to the page, perform the interaction, assert the expected outcome. Tests should FAIL right now (features aren't built yet) — that's correct. Also ensure agent-browser is available (run `which agent-browser`). Commit: 'test: scaffold acceptance tests from sprint tasks'."
|
|
388
|
+
|
|
389
|
+
### Step 4.3 — Metric Loop: Scaffold Health
|
|
308
390
|
|
|
309
391
|
Run the Metric Loop Protocol. Define a metric: builds clean, tests pass, lint clean, structure matches architecture. Max 3 iterations.
|
|
310
392
|
|
|
311
|
-
### Step
|
|
393
|
+
### Step 4.4 — Verification Gate
|
|
312
394
|
|
|
313
|
-
Run the Verification Protocol (`
|
|
395
|
+
Run the Verification Protocol (`protocols/verify.md`). Critical rules (survive compaction):
|
|
314
396
|
- ONE agent runs all 6 checks sequentially: Build → Type-Check → Lint → Test → Security → Diff Review. Stop on first FAIL.
|
|
315
397
|
- Agent auto-detects stack from manifest files (package.json → Node, go.mod → Go, etc.).
|
|
316
|
-
- On FAIL: for build/type/lint errors, use the Build-Fix Protocol (`
|
|
398
|
+
- On FAIL: for build/type/lint errors, use the Build-Fix Protocol (`protocols/build-fix.md`) — fixes one error at a time with cascade detection. For test/security/diff failures, spawn a targeted fix agent. Re-verify. Max 3 fix attempts.
|
|
317
399
|
- On PASS: log `VERIFY: PASS (6/6)` to `docs/plans/.build-state.md`. Proceed.
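The "six checks, stop on first FAIL" rule in the list above can be sketched as a sequential runner. This is an illustration only: each check is a hypothetical zero-argument callable supplied by the caller, and the exact wording of the FAIL line is an assumption, since the diff only specifies the `VERIFY: PASS (6/6)` form.

```python
from typing import Callable, Dict

# Order taken from the rule above.
CHECK_ORDER = ["Build", "Type-Check", "Lint", "Test", "Security", "Diff Review"]


def run_verification(checks: Dict[str, Callable[[], bool]]) -> str:
    """Run the checks in order and stop at the first one that fails.

    `checks` maps each name in CHECK_ORDER to a callable returning True on pass.
    """
    for index, name in enumerate(CHECK_ORDER, start=1):
        if not checks[name]():
            # Later checks are skipped once one fails.
            return f"VERIFY: FAIL ({name}, check {index}/6)"
    return "VERIFY: PASS (6/6)"
```

Stopping early keeps the failure report focused on one problem, which fits the one-error-at-a-time approach of the Build-Fix Protocol.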
|
|
318
400
|
|
|
319
401
|
Call the Agent tool — description: "Verify scaffolding" — mode: "bypassPermissions" — prompt: "Run the Verification Protocol. Execute all 6 checks sequentially, stop on first failure. Report: VERIFY: PASS or VERIFY: FAIL with details."
|
|
320
402
|
|
|
321
|
-
Do not proceed to Phase
|
|
403
|
+
Do not proceed to Phase 5 until verification passes.
|
|
322
404
|
|
|
323
405
|
Update TodoWrite and state.
|
|
324
406
|
|
|
325
|
-
**Compaction checkpoint
|
|
407
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
326
408
|
|
|
327
409
|
---
|
|
328
410
|
|
|
329
|
-
## Phase
|
|
411
|
+
## Phase 5: Build — Metric-Driven Dev Loops
|
|
330
412
|
|
|
331
413
|
<HARD-GATE>
|
|
332
|
-
Before starting: Phase 2 must be approved, Phase 3 must pass. You MUST call the Agent tool for EVERY task. No exceptions.
|
|
414
|
+
Before starting: Phase 2 must be approved, Phase 3 must produce docs/plans/visual-design-spec.md, Phase 4 must pass. You MUST call the Agent tool for EVERY task. No exceptions.
|
|
333
415
|
</HARD-GATE>
|
|
334
416
|
|
|
335
417
|
Expand TodoWrite with each sprint task.
|
|
336
418
|
|
|
337
419
|
**For EACH task:**
|
|
338
420
|
|
|
339
|
-
### Step
|
|
421
|
+
### Step 5.1 — Implement
|
|
340
422
|
|
|
341
|
-
Call the Agent tool — description: "[task name]" — mode: "bypassPermissions" — prompt: "TASK: [task description + acceptance criteria]. HANDOFF — Architecture section: [paste ONLY the relevant section from architecture.md]. Design section: [paste ONLY the relevant section from the design doc]. Previous task output: [what the last completed task produced, if relevant]. Implement fully with real code and tests. Commit: 'feat: [task]'. Report what you built, files changed, and test results."
|
|
423
|
+
Call the Agent tool — description: "[task name]" — mode: "bypassPermissions" — prompt: "TASK: [task description + acceptance criteria]. HANDOFF — Architecture section: [paste ONLY the relevant section from architecture.md]. Design section: [paste ONLY the relevant section from the design doc]. Previous task output: [what the last completed task produced, if relevant]. For UI tasks: the living style guide at /design-system shows every component's exact styling and states — match it. Implement fully with real code and tests. Commit: 'feat: [task]'. Report what you built, files changed, and test results."
|
|
342
424
|
|
|
343
425
|
Pick the right developer framing: frontend, backend, AI, etc. Set `[COMPLEXITY: S/M/L]` based on the task's Size from sprint-tasks.md.
|
|
344
426
|
|
|
345
|
-
### Step
|
|
427
|
+
### Step 5.1b — Cleanup (De-Sloppify)
|
|
346
428
|
|
|
347
|
-
Follow the Cleanup Protocol (`
|
|
429
|
+
Follow the Cleanup Protocol (`protocols/cleanup.md`). Critical rules (survive compaction):
|
|
348
430
|
[COMPLEXITY: S]
|
|
349
431
|
- Skip if trivial (< 20 lines, single file).
|
|
350
432
|
- Cleanup agent is a SEPARATE agent from the implementer — no cleaning your own mess.
|
|
@@ -354,11 +436,11 @@ Follow the Cleanup Protocol (`commands/protocols/cleanup.md`). Critical rules (s
|
|
|
354
436
|
|
|
355
437
|
Call the Agent tool — description: "Cleanup [task name]" — mode: "bypassPermissions" — with the list of files changed and the task's acceptance criteria.
|
|
356
438
|
|
|
357
|
-
### Step
|
|
439
|
+
### Step 5.2 — Metric Loop: Task Quality
|
|
358
440
|
|
|
359
|
-
Run the Metric Loop Protocol on the task implementation. Define a metric based on the task's acceptance criteria. Max 5 iterations.
|
|
441
|
+
Run the Metric Loop Protocol on the task implementation. Define a metric based on the task's acceptance criteria. For UI-facing tasks, include behavioral verification: the measurement agent should use agent-browser to verify the feature renders and responds to interaction, not just read the code. Max 5 iterations.
|
|
360
442
|
|
|
361
|
-
### Step
|
|
443
|
+
### Step 5.3 — Loop Exit
|
|
362
444
|
|
|
363
445
|
On target met: mark task complete in TodoWrite, report "Task X/N: [name] — COMPLETE (score: [final], iterations: [count])".
|
|
364
446
|
|
|
@@ -368,74 +450,198 @@ On stall or max iterations:
|
|
|
368
450
|
|
|
369
451
|
After each task: update TodoWrite and `docs/plans/.build-state.md`.
|
|
370
452
|
|
|
371
|
-
### Step
|
|
453
|
+
### Step 5.3b — Behavioral Smoke Test
|
|
454
|
+
|
|
455
|
+
Skip if this task has no Behavioral Test criteria (API-only, config, infrastructure tasks).
|
|
456
|
+
|
|
457
|
+
Run the Smoke Test Protocol (`protocols/smoke-test.md`). This uses agent-browser to open the app, execute the task's behavioral acceptance criteria, and verify the feature actually works.
|
|
458
|
+
|
|
459
|
+
Evidence saved to `docs/plans/evidence/[task-name]/`: annotated screenshot, snapshot diff, error log, network log, HAR file.
|
|
372
460
|
|
|
373
|
-
|
|
461
|
+
On FAIL: spawn fix agent with the evidence. The fix agent receives: what was expected (from acceptance criteria), what actually happened (snapshot diff + errors + screenshot), and the relevant source files. Max 2 fix-and-retest cycles.
|
|
374
462
|
|
|
375
|
-
|
|
463
|
+
On PASS: proceed to Step 5.4.
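The fix-and-retest limit in this step can be sketched as a bounded loop. Both callables are hypothetical stand-ins: `run_smoke_test` is assumed to return a `(passed, evidence)` pair, and `spawn_fix_agent` is assumed to take that evidence. What happens after the second failed cycle is not stated in this part of the diff, so the sketch simply reports the failure to the caller.

```python
from typing import Any, Callable, Tuple

MAX_FIX_CYCLES = 2  # from the step above: max 2 fix-and-retest cycles


def smoke_test_with_fixes(
    run_smoke_test: Callable[[], Tuple[bool, Any]],
    spawn_fix_agent: Callable[[Any], None],
) -> bool:
    """Run the smoke test, allowing up to two fix-and-retest cycles."""
    passed, evidence = run_smoke_test()
    cycles = 0
    while not passed and cycles < MAX_FIX_CYCLES:
        spawn_fix_agent(evidence)            # the fix agent receives the evidence
        passed, evidence = run_smoke_test()  # retest after the fix attempt
        cycles += 1
    return passed
```

The test runs at most three times in total: the initial run plus one retest per fix cycle.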
|
|
464
|
+
|
|
465
|
+
### Step 5.4 — Post-Task Verification
|
|
466
|
+
|
|
467
|
+
Run the Verification Protocol (`protocols/verify.md`). If FAIL, fix before starting the next task.
|
|
468
|
+
|
|
469
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
376
470
|
|
|
377
471
|
---
|
|
378
472
|
|
|
379
|
-
## Phase
|
|
473
|
+
## Phase 6: Harden — Metric-Driven Hardening
|
|
380
474
|
|
|
381
|
-
### Step
|
|
475
|
+
### Step 6.0 — Pre-Hardening Verification
|
|
382
476
|
|
|
383
|
-
Run the Verification Protocol (`
|
|
477
|
+
Run the Verification Protocol (`protocols/verify.md`). All checks must pass before starting expensive audit agents.
|
|
384
478
|
|
|
385
|
-
### Step
|
|
479
|
+
### Step 6.1 — Initial Audit (5 agents in parallel, ONE message)
|
|
386
480
|
|
|
387
|
-
|
|
481
|
+
Read the NFRs from `docs/plans/sprint-tasks.md`. Pass the relevant NFR thresholds to each audit agent so they have concrete targets, not generic checks.
|
|
388
482
|
|
|
389
|
-
|
|
483
|
+
Call the Agent tool 5 times in one message:
|
|
390
484
|
|
|
391
|
-
|
|
485
|
+
1. Description: "API testing" — Prompt: "Comprehensive API validation: all endpoints, edge cases, error responses, auth flows. NFR targets: [paste performance and reliability NFRs]. Report findings with counts."
|
|
392
486
|
|
|
393
|
-
|
|
487
|
+
2. Description: "Performance audit" — Prompt: "Measure response times, identify bottlenecks, flag performance issues. NFR targets: [paste performance NFRs — e.g., API < 200ms, page load < 3s]. Report benchmarks AGAINST these targets."
|
|
394
488
|
|
|
395
|
-
|
|
489
|
+
3. Description: "Accessibility audit" — Prompt: "WCAG compliance audit on all interfaces. NFR target: [paste accessibility NFR — e.g., WCAG AA]. Check screen reader, keyboard nav, contrast. Report issues with counts."
|
|
396
490
|
|
|
397
|
-
|
|
491
|
+
4. Description: "Security audit" — Prompt: "Security review: auth, input validation, data exposure, dependency vulnerabilities. NFR targets: [paste security NFRs]. Report findings with severity."
|
|
398
492
|
|
|
399
|
-
|
|
493
|
+
5. Description: "UX quality audit" — Prompt: "UX quality review of every user-facing page. NFR targets: [paste accessibility NFRs]. First, screenshot the living style guide at /design-system as your reference for how components should look. Then review every product page and check: loading states (every async action must show a loading indicator), error states (every form and API call must show user-friendly error feedback), empty states (every list/table must handle zero items gracefully), mobile responsiveness (test at 375px viewport — touch targets >= 44px, no horizontal scroll, readable text), form validation (inline feedback, not just alert()), transition smoothness (no layout shifts, no janky animations), visual consistency (compare each page's components against the style guide — buttons, inputs, cards, colors, spacing should match). Report issues with page, severity, and screenshot."
|
|
400
494
|
|
|
401
|
-
### Step
|
|
495
|
+
### Step 6.1b — Eval Harness
|
|
496
|
+
|
|
497
|
+
Run the Eval Harness Protocol (`protocols/eval-harness.md`). Define 8-15 concrete, executable eval cases from the audit findings and architecture doc. For UI flows, eval cases should use agent-browser: "agent-browser open /dashboard -> agent-browser click @submit -> agent-browser wait --text Success -> expect text contains confirmation ID". Run the eval agent. Record baseline pass rate. CRITICAL and HIGH failures feed into the metric loop in Step 6.2 as specific issues to fix.
|
|
498
|
+
|
|
499
|
+
### Step 6.2 — Metric Loop: Hardening Quality
|
|
402
500
|
|
|
403
501
|
Run the Metric Loop Protocol on the full codebase using audit findings as initial input. Define a composite metric based on what this project needs. Max 4 iterations.
|
|
404
502
|
|
|
405
503
|
When fixing, dispatch to the RIGHT specialist. Security → security agent. Accessibility → frontend agent. Don't send everything to one agent.
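The routing rule above can be read as a lookup from finding category to specialist. The following is a minimal sketch under stated assumptions: only the security and accessibility mappings come from the text, while the finding shape (`category`, `summary`) and the `"unassigned"` fallback are hypothetical names chosen for illustration.

```python
# Hypothetical routing table. Only these two mappings are stated in the step
# above; any further categories would have to be added per project.
SPECIALISTS = {
    "security": "security agent",
    "accessibility": "frontend agent",
}


def route_findings(findings):
    """Group audit findings by the specialist that should fix them.

    Each finding is assumed to be a dict with "category" and "summary" keys.
    Unknown categories land in an "unassigned" bucket instead of being sent
    to an arbitrary agent.
    """
    batches = {}
    for finding in findings:
        agent = SPECIALISTS.get(finding["category"], "unassigned")
        batches.setdefault(agent, []).append(finding["summary"])
    return batches
```

Keeping an explicit fallback bucket makes it visible when a finding has no obvious owner, which is the situation the step warns against handling by sending everything to one agent.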
|
|
406
504
|
|
|
407
|
-
### Step
|
|
505
|
+
### Step 6.2b — Eval Re-run
|
|
408
506
|
|
|
409
507
|
Re-run the Eval Harness after the metric loop exits. All CRITICAL eval cases must now pass. If any CRITICAL case still fails, include it as evidence for the Reality Checker.
|
|
410
508
|
|
|
411
|
-
### Step
|
|
509
|
+
### Step 6.2c — E2E Testing (3 mandatory iterations)
|
|
510
|
+
|
|
511
|
+
<HARD-GATE>
|
|
512
|
+
ALL 3 ITERATIONS ARE MANDATORY. Do NOT stop after iteration 1 even if all tests pass. The purpose of 3 runs is to catch flaky tests, timing-dependent failures, and race conditions that only surface on repeated execution. Skip this step ONLY if the project has no user-facing frontend.
|
|
513
|
+
</HARD-GATE>
|
|
514
|
+
|
|
515
|
+
Generate and execute end-to-end tests using Playwright against the running application. Tests cover the **User Journeys** defined in `docs/plans/sprint-tasks.md` (Step 0 of the Planning Protocol). Each journey = one E2E test file.
|
|
516
|
+
|
|
517
|
+
**Iteration 1 — Generate & Run:**
|
|
518
|
+
|
|
519
|
+
Call the Agent tool — description: "E2E test generation" — mode: "bypassPermissions" — prompt:
|
|
520
|
+
|
|
521
|
+
"[COMPLEXITY: L] Generate and run end-to-end Playwright tests for this application.
|
|
522
|
+
|
|
523
|
+
INPUTS:
|
|
524
|
+
- User Journeys from docs/plans/sprint-tasks.md: [paste the User Journeys section — each journey becomes one E2E test]
|
|
525
|
+
- Architecture doc (API contracts): [paste relevant sections from docs/plans/architecture.md]
|
|
526
|
+
- NFRs from docs/plans/sprint-tasks.md: [paste — use performance thresholds as test assertions]
|
|
527
|
+
- Visual Design Spec (component selectors): [paste relevant sections from docs/plans/visual-design-spec.md]
|
|
528
|
+
|
|
529
|
+
REQUIREMENTS:
|
|
530
|
+
1. One E2E test per User Journey from sprint-tasks.md (each journey = one test file covering the full flow)
|
|
531
|
+
2. Use Page Object Model pattern — one page object per major view
|
|
532
|
+
3. Use data-testid selectors (add them to components if missing)
|
|
533
|
+
4. Wait for API responses, NEVER use arbitrary timeouts (no waitForTimeout)
|
|
534
|
+
5. Capture screenshots at critical verification points
|
|
535
|
+
6. Configure multi-browser: Chromium + Firefox + WebKit
|
|
536
|
+
7. Set up playwright.config.ts with: fullyParallel, retries: 0 (we handle retries ourselves), screenshot: 'only-on-failure', video: 'retain-on-failure', trace: 'on-first-retry'
|
|
537
|
+
8. Run all tests. Report: total, passed, failed, with failure details and screenshot paths.
|
|
538
|
+
9. Commit: 'test: e2e test suite for critical user journeys'
|
|
539
|
+
|
|
540
|
+
Test priority:
|
|
541
|
+
- CRITICAL: Auth, core feature happy path, data submission, payment/transaction flows
|
|
542
|
+
- HIGH: Search, filtering, navigation, error states
|
|
543
|
+
- MEDIUM: Responsive layout, animations, edge cases"
|
|
544
|
+
|
|
545
|
+
Record results: total tests, pass count, fail count, failure details. Log to `docs/plans/.build-state.md` under `## E2E Testing`:
|
|
546
|
+
|
|
547
|
+
```
|
|
548
|
+
| Iter | Total | Passed | Failed | Flaky | Top Failure |
|
|
549
|
+
|------|-------|--------|--------|-------|-------------|
|
|
550
|
+
| 1 | ... | ... | ... | ... | ... |
|
|
551
|
+
```
|
|
552
|
+
|
|
553
|
+
**Iteration 2 — Fix & Re-run:**
|
|
554
|
+
|
|
555
|
+
Call the Agent tool — description: "E2E fix iteration 2" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: M] Fix E2E test failures from iteration 1: [paste failure details — test names, error messages, screenshot paths]. Diagnose each as real bug, flaky test, or missing selector. Fix accordingly — do NOT delete or skip tests. Re-run ALL tests. Commit: 'fix: e2e test failures iteration 2'."
|
|
556
|
+
|
|
557
|
+
Record results in the E2E table. Identify flaky candidates (passed iter 1, failed iter 2 or vice versa).
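The flaky-candidate rule (a test whose outcome differs between iteration 1 and iteration 2) can be sketched as a comparison of two result maps. This is an illustrative helper, not part of the package; the input shape (test name mapped to a pass/fail boolean) and the choice to ignore tests that exist in only one run are assumptions.

```python
def flaky_candidates(iter1, iter2):
    """Return test names whose pass/fail outcome differs between two runs.

    iter1 and iter2 map test name -> True (passed) or False (failed).
    Tests present in only one run are ignored here (an assumption), since
    a test added or removed between iterations is not evidence of flakiness.
    """
    shared = iter1.keys() & iter2.keys()
    return sorted(name for name in shared if iter1[name] != iter2[name])
```

Note that a test fixed during iteration 2 also flips from fail to pass, so the output is a list of candidates to confirm in the stability run rather than a verdict.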
|
|
558
|
+
|
|
559
|
+
**Iteration 3 — Final Stability Run:**
|
|
412
560
|
|
|
413
|
-
Call the Agent tool — description: "
|
|
561
|
+
Call the Agent tool — description: "E2E stability run" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: M] Final E2E stability run (3 of 3). Previous results — Iter 1: [pass/fail counts], Iter 2: [pass/fail counts], Flaky candidates: [list]. Run ALL tests with --repeat-each=3. Quarantine inconsistent tests with test.fixme(). Fix remaining consistent failures. PASS CRITERIA: 95%+ pass rate (quarantined flaky tests excluded but logged). Commit: 'test: e2e stability fixes iteration 3'."
|
|
562
|
+
|
|
563
|
+
Record the final results in the E2E table and include them in the Reality Checker evidence.
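The iteration 3 pass criterion (95%+ pass rate, with quarantined flaky tests excluded but logged) could be computed along these lines. This is a sketch under assumptions: the result shape, the function name, and the decision to fail the gate when no tests remain after quarantine are all hypothetical.

```python
def stability_verdict(results, quarantined, threshold=0.95):
    """Compute the pass rate over non-quarantined tests.

    results maps test name -> True (passed) or False (failed).
    quarantined is a set of test names excluded from the rate but still
    reported, so they stay visible in the evidence.
    """
    counted = {name: ok for name, ok in results.items() if name not in quarantined}
    if not counted:
        # Assumption: an empty suite cannot satisfy the gate.
        return {"pass_rate": None, "passed_gate": False,
                "quarantined": sorted(quarantined)}
    rate = sum(counted.values()) / len(counted)
    return {"pass_rate": rate, "passed_gate": rate >= threshold,
            "quarantined": sorted(quarantined)}
```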
|
|
564
|
+
|
|
565
|
+
### Step 6.2d — Autonomous Dogfooding
|
|
566
|
+
|
|
567
|
+
Run the agent-browser dogfood skill against the running app. Unlike the per-task smoke tests (which verify specific acceptance criteria), dogfooding is **exploratory** — it autonomously navigates every reachable page, clicks buttons, fills forms, checks console errors, and finds issues we didn't think to test.
|
|
568
|
+
|
|
569
|
+
Start the dev server if it is not already running, then invoke the dogfood skill:
|
|
570
|
+
|
|
571
|
+
Call the Agent tool — description: "Dogfood the app" — mode: "bypassPermissions" — prompt: "Run the agent-browser dogfood skill against the running app at http://localhost:[port]. Explore every reachable page. Click every button. Fill every form. Check console for errors. Report a structured list of issues with severity ratings (critical/high/medium/low), screenshots, and repro steps. If dogfood skill is not available, use agent-browser manually: snapshot each page, click all interactive elements, check errors and network requests. Also evaluate UX quality: missing loading states, poor error messages, broken mobile layouts (resize to 375px), visual inconsistencies, missing empty states, form validation gaps. Report UX issues separately from functional issues."
|
|
572
|
+
|
|
573
|
+
**Fix loop:** For each CRITICAL or HIGH issue found:
|
|
574
|
+
1. Classify: is this a code bug (fix in Phase 5 style — spawn implementation fix agent) or a structural problem (needs architecture change — spawn architect agent to propose a fix plan, then implementation agent to execute)?
|
|
575
|
+
2. Spawn the appropriate fix agent with: the issue description, repro steps, screenshot, affected page/component.
|
|
576
|
+
3. After fixes, re-run dogfood on the affected pages only (not the full app). If new CRITICAL/HIGH issues appear, repeat. Max 3 fix cycles.
|
|
577
|
+
|
|
578
|
+
MEDIUM/LOW issues: log to `docs/plans/build-log.md` for the Reality Checker.
|
|
579
|
+
|
|
580
|
+
### Step 6.2e — Fake Data Detector
|
|
581
|
+
|
|
582
|
+
Call the Agent tool — description: "Fake data audit" — mode: "bypassPermissions" — prompt: "Run the Fake Data Detector Protocol (protocols/fake-data-detector.md). Check for mock/hardcoded data in production paths. Static analysis: grep for Math.random() business data, hardcoded API responses, setTimeout faking async, placeholder text. Dynamic analysis: inspect HAR files from docs/plans/evidence/ for missing real API calls, static responses, absent WebSocket traffic. Report findings with file:line references and severity."
|
|
583
|
+
|
|
584
|
+
**Fix loop:** For each CRITICAL finding:
|
|
585
|
+
1. Spawn a fix agent with: the finding (file:line, what's fake, what it should be), and the relevant source files.
|
|
586
|
+
2. The fix agent replaces fake data with real API calls, real WebSocket connections, real data sources. If real data sources aren't available (missing API keys, no backend), the fix agent must flag this as a blocker — not paper over it with better-looking fake data.
|
|
587
|
+
3. After fixes, re-run the fake data detector (static checks only — fast). Max 2 fix cycles.
|
|
588
|
+
|
|
589
|
+
Remaining findings feed into the Reality Checker in Step 6.4.
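The static half of the fake data audit amounts to a pattern scan that reports `file:line` references. The sketch below is illustrative only: the contents of `protocols/fake-data-detector.md` are not shown in this diff, so the pattern list, labels, and function name are assumptions, and every match is a candidate for review rather than a confirmed finding (for example, `setTimeout` has many legitimate uses).

```python
import re

# Hypothetical patterns based on the examples named in the prompt above.
PATTERNS = {
    "Math.random()": re.compile(r"Math\.random\s*\("),
    "setTimeout": re.compile(r"\bsetTimeout\s*\("),
    "placeholder text": re.compile(r"lorem ipsum", re.IGNORECASE),
}


def scan_source(path, text):
    """Return (file:line, label) pairs for each suspicious line in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((f"{path}:{lineno}", label))
    return findings
```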
|
|
590
|
+
|
|
591
|
+
### Step 6.4 — Reality Check
|
|
592
|
+
|
|
593
|
+
Call the Agent tool — description: "Final verdict" — prompt: "You are the Reality Checker. Default: NEEDS WORK. The hardening loop reached score [final_score] after [iterations] iterations. Score history: [paste table]. Review all evidence. Eval harness results: [baseline pass rate] → [final pass rate]. E2E test results: [paste E2E table — 3 iterations, final pass rate, quarantined count]. Dogfood results: [paste issue count and any CRITICAL/HIGH findings, or 'clean — no issues found']. Fake data audit results: [paste findings or 'clean — no fake data detected']. CRITICAL failures remaining: [list or none]. Verdict: PRODUCTION READY or NEEDS WORK with specifics."
|
|
414
594
|
|
|
415
595
|
<HARD-GATE>Do NOT self-approve. Reality Checker must give the verdict.</HARD-GATE>
|
|
416
596
|
|
|
417
|
-
**
|
|
418
|
-
|
|
597
|
+
**On PRODUCTION READY:** Log verdict. Proceed to Phase 7.
|
|
598
|
+
|
|
599
|
+
**On NEEDS WORK:** The Reality Checker returns specific issues. These must be fixed — not logged and ignored.
|
|
600
|
+
|
|
601
|
+
1. Read the Reality Checker's specific findings. Classify each:
|
|
602
|
+
- **Code bug** (broken feature, failing test, fake data) → spawn implementation fix agent with the finding + affected files.
|
|
603
|
+
- **Structural issue** (missing feature, wrong architecture, data flow problem) → spawn architect agent to produce a fix plan, then implementation agent to execute it. This is a mini Phase 5 loop for the specific issue.
|
|
604
|
+
- **Blocker** (missing API key, no backend, needs human action) → log to `docs/plans/build-log.md` and present to user. Cannot be auto-fixed.
|
|
605
|
+
2. After fixes, re-run verification (7 checks) + the specific failing gate (E2E, dogfood, or fake data — whichever surfaced the issue).
|
|
606
|
+
3. Re-run the Reality Checker with updated evidence.
|
|
607
|
+
|
|
608
|
+
<HARD-GATE>
|
|
609
|
+
Max 2 NEEDS WORK cycles. If the Reality Checker returns NEEDS WORK a third time:
|
|
610
|
+
- **Interactive:** Present all remaining issues to user. Ask for direction.
|
|
611
|
+
- **Autonomous:** Log remaining issues to `docs/plans/build-log.md`. Proceed to Phase 7 with a warning in the completion report.
|
|
612
|
+
Do not loop forever.
|
|
613
|
+
</HARD-GATE>
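The bounded NEEDS WORK loop can be sketched as follows. This is a minimal control-flow illustration, assuming hypothetical `check` and `fix` callables that stand in for the Reality Checker call and the fix-and-reverify steps; the returned strings are placeholders for the actions described above.

```python
def run_reality_gate(check, fix, interactive, max_cycles=2):
    """Run the Reality Checker with at most max_cycles fix cycles.

    check() returns (verdict, issues); fix(issues) applies fixes and
    re-runs the relevant gates. Both are hypothetical stand-ins.
    """
    verdict, issues = check()
    cycles = 0
    while verdict == "NEEDS WORK" and cycles < max_cycles:
        fix(issues)
        cycles += 1
        verdict, issues = check()
    if verdict == "PRODUCTION READY":
        return "proceed to Phase 7"
    # A third NEEDS WORK verdict: stop looping and escalate.
    if interactive:
        return "ask user for direction"
    return "proceed to Phase 7 with warning"
```

With `max_cycles=2`, the checker runs at most three times: the initial verdict plus one re-check after each fix cycle.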
|
|
419
614
|
|
|
420
|
-
**Compaction checkpoint
|
|
615
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
421
616
|
|
|
422
617
|
---
|
|
423
618
|
|
|
424
|
-
## Phase
|
|
619
|
+
## Phase 7: Ship
|
|
620
|
+
|
|
621
|
+
### Step 7.0 — Pre-Ship Verification
|
|
622
|
+
|
|
623
|
+
Run the Verification Protocol (`protocols/verify.md`). All checks must pass before documenting and shipping. If FAIL persists after 3 fix attempts, return to Phase 6.
|
|
624
|
+
|
|
625
|
+
### Step 7.0b — Requirements Coverage Report
|
|
626
|
+
|
|
627
|
+
Call the Agent tool — description: "Requirements coverage check" — prompt: "Re-read the original Design Document (docs/plans/*.md design doc) and the user journeys + NFRs from docs/plans/sprint-tasks.md. For EVERY feature listed in the MVP scope, verify: (1) it has a corresponding implemented task, (2) it has a passing test or behavioral verification, (3) it is reachable and functional in the running app. Produce a coverage table:
|
|
628
|
+
|
|
629
|
+
| MVP Feature | Task | Test | Verified | Status |
|
|
630
|
+
|-------------|------|------|----------|--------|
|
|
425
631
|
|
|
426
|
-
|
|
632
|
+
Mark each as COVERED, PARTIAL (implemented but untested), or MISSING. Any MISSING feature is a blocker — report it immediately."
|
|
427
633
|
|
|
428
|
-
|
|
634
|
+
If any features are MISSING: spawn implementation agents to build them, then re-run verification. This is the final safety net before shipping — it catches requirements that were planned but somehow never built.
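One possible reading of the three coverage statuses, expressed as a function of the three checks in the prompt. The mapping is an assumption: the source defines PARTIAL only as "implemented but untested", so treating an implemented feature that lacks either a test or in-app verification as PARTIAL is an interpretation, and the function and parameter names are hypothetical.

```python
def coverage_status(has_task, has_test, verified):
    """Classify one MVP feature for the coverage table.

    has_task: an implemented task exists for the feature.
    has_test: a passing test or behavioral verification exists.
    verified: the feature is reachable and functional in the running app.
    """
    if not has_task:
        return "MISSING"
    if has_test and verified:
        return "COVERED"
    return "PARTIAL"


def blockers(features):
    """Return the names of features whose status is MISSING."""
    return [name for name, checks in features.items()
            if coverage_status(*checks) == "MISSING"]
```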
|
|
429
635
|
|
|
430
|
-
### Step
|
|
636
|
+
### Step 7.1 — Documentation
|
|
431
637
|
|
|
432
638
|
Call the Agent tool — description: "Documentation" — mode: "bypassPermissions" — prompt: "Write project docs: README with setup/architecture/usage, API docs if applicable, deployment notes. Commit: 'docs: project documentation'."
|
|
433
639
|
|
|
434
|
-
### Step
|
|
640
|
+
### Step 7.2 — Metric Loop: Documentation Quality
|
|
435
641
|
|
|
436
642
|
Run the Metric Loop Protocol on documentation. Define a metric based on completeness and whether a new developer could follow the README. Max 3 iterations.
|
|
437
643
|
|
|
438
|
-
### Step
|
|
644
|
+
### Step 7.3 — Record Learnings
|
|
439
645
|
|
|
440
646
|
Append to `docs/plans/learnings.md` (create if it doesn't exist). Review the build and record 3-5 learnings:
|
|
441
647
|
|
|
@@ -457,4 +663,4 @@ Metric loops run: [count] | Avg iterations: [N]
|
|
|
457
663
|
Remaining: [any NEEDS WORK items]
|
|
458
664
|
```
|
|
459
665
|
|
|
460
|
-
Mark all TodoWrite items complete. Update `docs/plans/.build-state.md`: "Phase:
|
|
666
|
+
Mark all TodoWrite items complete. Update `docs/plans/.build-state.md`: "Phase: 7 COMPLETE."
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: "Autonomous exploratory testing — navigate your running app like a real user, find bugs, UX issues, and broken flows"
|
|
3
|
+
argument-hint: "URL or page/route to test, e.g. 'http://localhost:3000' or '/settings'. Omit to dogfood the entire app."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Dogfood
|
|
7
|
+
|
|
8
|
+
You are a ruthless QA tester. Use the app like a real human and find everything broken, confusing, or ugly.
|
|
9
|
+
|
|
10
|
+
## Step 1: Scope and Server
|
|
11
|
+
|
|
12
|
+
- If the user provided a specific page/route: focus on that area only.
|
|
13
|
+
- If no argument: dogfood the entire app — discover all routes and test each one.
|
|
14
|
+
- Check if the app is already running at the target URL. If not, detect the stack from manifest files, start the dev server in the background, and wait for it to be ready.
|
|
15
|
+
|
|
16
|
+
## Step 2: Exploratory Testing
|
|
17
|
+
|
|
18
|
+
Use agent-browser (or Playwright MCP tools) for real user interactions:
|
|
19
|
+
|
|
20
|
+
1. **Navigate** — visit every discoverable page/route. Click nav links, follow breadcrumbs.
|
|
21
|
+
2. **Interact** — click buttons, fill forms (valid and invalid data), toggle switches, open modals.
|
|
22
|
+
3. **Check console** — after each page, check for JS errors and warnings.
|
|
23
|
+
4. **Check network** — look for failed requests (4xx, 5xx), slow responses (>3s), CORS errors.
|
|
24
|
+
5. **Screenshot** each page for the final report.
|
|
25
|
+
|
|
26
|
+
## Step 3: UX Checks
|
|
27
|
+
|
|
28
|
+
For each page: check **loading states** (spinner/skeleton vs blank flash), **error states** (submit invalid forms, hit broken routes), **mobile layout** (resize to 375px — check overflow, readability, tap targets), and **empty states** (what happens with no data).
|
|
29
|
+
|
|
30
|
+
## Step 4: Report
|
|
31
|
+
|
|
32
|
+
Present findings as a severity-sorted table:
|
|
33
|
+
|
|
34
|
+
| Severity | Page | Issue | Screenshot | Repro Steps |
|
|
35
|
+
|----------|------|-------|------------|-------------|
|
|
36
|
+
|
|
37
|
+
Severity: **CRITICAL** = crashes/data loss/security, **HIGH** = broken features/JS errors/failed requests, **MEDIUM** = UX confusion/layout issues, **LOW** = cosmetic polish.
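A severity-sorted report in the table format above could be produced like this. The sketch assumes each finding is a dict with hypothetical keys `severity`, `page`, `issue`, `screenshot`, and `repro`; only the column names and the four severity levels come from the text.

```python
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def render_report(findings):
    """Render findings as a Markdown table, most severe first.

    sorted() is stable, so findings with equal severity keep the order in
    which they were discovered.
    """
    rows = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    lines = [
        "| Severity | Page | Issue | Screenshot | Repro Steps |",
        "|----------|------|-------|------------|-------------|",
    ]
    for f in rows:
        lines.append(
            f"| {f['severity']} | {f['page']} | {f['issue']} "
            f"| {f['screenshot']} | {f['repro']} |"
        )
    return "\n".join(lines)
```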
|
|
38
|
+
|
|
39
|
+
## Step 5: Offer Fixes
|
|
40
|
+
|
|
41
|
+
For CRITICAL/HIGH issues, ask: "Found [N] critical/high issues. Fix now or just report?"
|
|
42
|
+
|
|
43
|
+
If they want fixes: address the issues one at a time, re-verifying after each fix. Close the browser when done.
|