buildanything 1.6.0 → 1.7.0
- package/.claude-plugin/marketplace.json +2 -1
- package/.claude-plugin/plugin.json +10 -2
- package/agents/agentic-identity-trust.md +65 -311
- package/agents/data-consolidation-agent.md +3 -22
- package/agents/design-brand-guardian.md +52 -275
- package/agents/design-image-prompt-engineer.md +67 -196
- package/agents/design-ui-designer.md +37 -361
- package/agents/design-ux-architect.md +51 -434
- package/agents/design-ux-researcher.md +48 -299
- package/agents/design-whimsy-injector.md +58 -405
- package/agents/engineering-backend-architect.md +39 -202
- package/agents/engineering-data-engineer.md +41 -236
- package/agents/engineering-devops-automator.md +73 -258
- package/agents/engineering-frontend-developer.md +33 -206
- package/agents/engineering-mobile-app-builder.md +36 -446
- package/agents/engineering-rapid-prototyper.md +34 -428
- package/agents/engineering-security-engineer.md +44 -204
- package/agents/engineering-senior-developer.md +18 -138
- package/agents/engineering-technical-writer.md +40 -302
- package/agents/marketing-app-store-optimizer.md +63 -276
- package/agents/marketing-social-media-strategist.md +38 -87
- package/agents/project-management-experiment-tracker.md +62 -156
- package/agents/report-distribution-agent.md +4 -24
- package/agents/sales-data-extraction-agent.md +3 -22
- package/agents/specialized-cultural-intelligence-strategist.md +41 -62
- package/agents/specialized-developer-advocate.md +65 -234
- package/agents/support-analytics-reporter.md +76 -306
- package/agents/support-executive-summary-generator.md +26 -172
- package/agents/support-finance-tracker.md +67 -362
- package/agents/support-legal-compliance-checker.md +40 -497
- package/agents/support-support-responder.md +40 -532
- package/agents/testing-accessibility-auditor.md +67 -271
- package/agents/testing-api-tester.md +58 -274
- package/agents/testing-evidence-collector.md +48 -170
- package/agents/testing-performance-benchmarker.md +75 -236
- package/agents/testing-reality-checker.md +49 -192
- package/agents/testing-test-results-analyzer.md +70 -276
- package/agents/testing-tool-evaluator.md +52 -368
- package/agents/testing-workflow-optimizer.md +66 -415
- package/bin/setup.js +45 -0
- package/bin/sync-version.js +38 -0
- package/commands/add-feature.md +98 -0
- package/commands/build.md +156 -93
- package/commands/dogfood.md +43 -0
- package/commands/fix.md +89 -0
- package/commands/idea-sweep.md +19 -82
- package/commands/refactor.md +68 -0
- package/commands/ux-review.md +81 -0
- package/commands/verify.md +43 -0
- package/hooks/session-start +5 -10
- package/package.json +4 -1
- package/agents/agents-orchestrator.md +0 -365
- package/agents/data-analytics-reporter.md +0 -52
- package/agents/lsp-index-engineer.md +0 -312
- package/agents/macos-spatial-metal-engineer.md +0 -335
- package/agents/marketing-content-creator.md +0 -52
- package/agents/marketing-growth-hacker.md +0 -52
- package/agents/product-sprint-prioritizer.md +0 -152
- package/agents/product-trend-researcher.md +0 -157
- package/agents/project-management-project-shepherd.md +0 -192
- package/agents/project-management-studio-operations.md +0 -198
- package/agents/project-management-studio-producer.md +0 -201
- package/agents/project-manager-senior.md +0 -133
- package/agents/support-infrastructure-maintainer.md +0 -616
- package/agents/terminal-integration-specialist.md +0 -68
- package/agents/visionos-spatial-engineer.md +0 -52
- package/agents/xr-cockpit-interaction-specialist.md +0 -30
- package/agents/xr-immersive-developer.md +0 -30
- package/agents/xr-interface-architect.md +0 -30
- package/commands/protocols/brainstorm.md +0 -99
- package/commands/protocols/build-fix.md +0 -52
- package/commands/protocols/cleanup.md +0 -56
- package/commands/protocols/design.md +0 -287
- package/commands/protocols/eval-harness.md +0 -62
- package/commands/protocols/metric-loop.md +0 -94
- package/commands/protocols/planning.md +0 -56
- package/commands/protocols/verify.md +0 -63
package/commands/add-feature.md
ADDED
@@ -0,0 +1,98 @@
+---
+description: "Add a single feature to an existing project — lightweight build cycle using existing architecture, design system, and CLAUDE.md context"
+argument-hint: "Describe the feature to add. --autonomous for unattended mode."
+---
+
+<HARD-GATE>
+YOU ARE AN ORCHESTRATOR. YOU COORDINATE AGENTS. YOU DO NOT WRITE CODE.
+
+"Launch an agent" = call the Agent tool. For implementation agents, set mode: "bypassPermissions". For parallel work, put multiple Agent tool calls in ONE message.
+</HARD-GATE>
+
+Input: $ARGUMENTS
+
+If the input contains `--autonomous` or `--auto`, skip user approval gates and log decisions to `docs/plans/build-log.md`.
+
+---
+
+## Phase 1: Context Gathering
+
+Read these files directly (no agent needed — this is fast):
+
+1. `CLAUDE.md` — product context, tech stack, rules
+2. `docs/plans/architecture.md` — current architecture
+3. `docs/plans/sprint-tasks.md` — existing user journeys and scope
+
+If any file is missing, proceed with what exists. If the codebase is unfamiliar or the feature touches unknown areas, spawn an Explore agent:
+
+Call the Agent tool — description: "Explore codebase for [feature area]" — prompt: "Find all files related to [feature area]. Report: directory structure, key files, patterns used, relevant components/routes/APIs. Be concise."
+
+---
+
+## Phase 2: Plan the Feature
+
+You do this yourself — no agent needed.
+
+1. **Break the feature into 1-5 tasks** (most features are 1-3). Each task should be one commit-sized unit of work.
+2. **Define behavioral acceptance criteria** for each task — what must be true when the task is done.
+3. **Define the user journey** — the end-to-end flow the user will experience with this feature.
+4. **Present the plan to the user for approval.** In autonomous mode, log the plan to `docs/plans/build-log.md` and proceed.
+
+---
+
+## Phase 3: Build
+
+**For EACH task:**
+
+### Step 3.1 — Implement
+
+Call the Agent tool — description: "[task name]" — mode: "bypassPermissions" — prompt: "TASK: [task description + acceptance criteria]. HANDOFF — Architecture context: [paste ONLY the relevant section from architecture.md]. Style guide: the living style guide at /design-system shows component styling — match it. Implement with real code and tests. Commit: 'feat: [task]'. Report what you built, files changed, and test results."
+
+Set `[COMPLEXITY: S/M/L]` based on task scope.
+
+### Step 3.2 — Cleanup
+
+Skip if trivial (< 20 lines, single file). Otherwise:
+
+Call the Agent tool — description: "Cleanup [task name]" — mode: "bypassPermissions" — prompt: "Clean up these files: [list from implementation]. Fix: naming, dead code, unused imports, style, DRY. Do NOT add features, change architecture, or touch other files. If cleanup breaks acceptance criteria, revert."
+
+### Step 3.3 — Smoke Test
+
+Skip if this task has no UI surface. Otherwise run the Smoke Test Protocol (`protocols/smoke-test.md`): open the affected route, execute behavioral acceptance criteria via agent-browser, collect evidence. On FAIL: spawn fix agent with evidence. Max 2 fix-and-retest cycles.
+
+### Step 3.4 — Verification
+
+Run the Verification Protocol (`protocols/verify.md`). All 7 checks. If FAIL, fix before starting the next task.
+
+---
+
+## Phase 4: End-to-End Verification
+
+### Step 4.1 — Run the User Journey
+
+Call the Agent tool — description: "E2E: [feature name]" — mode: "bypassPermissions" — prompt: "Verify the full user journey for [feature name]: [paste the user journey from Phase 2]. Use agent-browser to walk through each step. For each step: interact, verify the expected outcome, capture evidence. Report PASS/FAIL per step with screenshots."
+
+### Step 4.2 — Dogfood Affected Pages
+
+Call the Agent tool — description: "Dogfood [feature area]" — prompt: "Open every page affected by [feature name]. Check for: broken layouts, console errors, missing data, dead links, regressions. Report issues with screenshots."
+
+### Step 4.3 — Fix Loop
+
+If issues found in 4.1 or 4.2: spawn a fix agent with the evidence. Re-run the failing check. Max 2 fix-and-retest cycles. After 2 failures:
+- **Interactive:** present evidence to the user.
+- **Autonomous:** log to `docs/plans/build-log.md` and proceed with a warning.
+
+---
+
+## Phase 5: Done
+
+Report to the user:
+
+```
+FEATURE COMPLETE: [feature name]
+Tasks: [done]/[total] | Tests: [count] passing
+User journey: PASS/FAIL
+Evidence: [paths to screenshots/logs]
+```
+
+If the feature expands the product scope, update `CLAUDE.md` to reflect the new capability.
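The fix loop in Step 4.3 of `add-feature.md` reads as a bounded retry with a mode-dependent escalation. A minimal Python sketch of that control flow follows. The `check` and `fix` callables and the returned status strings are hypothetical stand-ins for "re-run the failing check" and "spawn a fix agent with the evidence"; none of these names appear in the package.

```python
from typing import Callable

MAX_FIX_CYCLES = 2  # "Max 2 fix-and-retest cycles" (Step 4.3)


def run_fix_loop(
    check: Callable[[], bool],
    fix: Callable[[], None],
    autonomous: bool,
) -> str:
    """Retry a failing check after each fix attempt, at most MAX_FIX_CYCLES times."""
    if check():
        return "PASS"
    for _ in range(MAX_FIX_CYCLES):
        fix()        # hypothetical: spawn a fix agent with the collected evidence
        if check():  # hypothetical: re-run only the check that failed
            return "PASS"
    # Still failing after two cycles: escalate according to the mode.
    # Status names are illustrative, not taken from the package.
    return "LOG_AND_PROCEED" if autonomous else "PRESENT_TO_USER"
```

In interactive mode the caller would show the evidence to the user on `PRESENT_TO_USER`; in autonomous mode it would append a warning to `docs/plans/build-log.md` and continue.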
package/commands/build.md
CHANGED
@@ -51,6 +51,8 @@ If you catch yourself typing code or reading source files: STOP. You are wasting
 - `last_save: [Phase.Step]`
 Increment after each agent returns (parallel dispatch of 4 agents = +4). Reset to 0 after each compaction save.
 
+**Compaction checkpoint format:** At every phase boundary, check `dispatches_since_save` in `docs/plans/.build-state.md`. If >= 8: save ALL state (current phase, task statuses, metric loop scores, decisions) to `docs/plans/.build-state.md`. Reset `dispatches_since_save` to 0. TodoWrite does NOT survive compaction — rebuild it from this state file on resume.
+
 Input: $ARGUMENTS
 
 ### Autonomous Mode
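The compaction checkpoint rule added in this hunk amounts to a counter with a threshold that is checked at phase boundaries. A small Python sketch of that bookkeeping, under stated assumptions: the `BuildState` class and its `saves` list are hypothetical (the list stands in for writes to `docs/plans/.build-state.md`), and only the counter name and the threshold of 8 come from the diff.

```python
from dataclasses import dataclass, field
from typing import List

SAVE_THRESHOLD = 8  # "If >= 8: save ALL state"


@dataclass
class BuildState:
    """Hypothetical in-memory model of the dispatch counter."""

    dispatches_since_save: int = 0
    saves: List[str] = field(default_factory=list)  # stand-in for state-file writes

    def record_dispatch(self, agents_returned: int = 1) -> None:
        # A parallel dispatch of 4 agents counts as +4.
        self.dispatches_since_save += agents_returned

    def phase_boundary(self, phase: str) -> bool:
        """Save and reset the counter if the threshold was reached."""
        if self.dispatches_since_save >= SAVE_THRESHOLD:
            self.saves.append(phase)
            self.dispatches_since_save = 0
            return True
        return False
```

Two parallel dispatches of four agents each would reach the threshold, so the next phase boundary triggers a save and resets the counter.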
@@ -67,7 +69,7 @@ When combining `--resume` with `--autonomous`: the current invocation's flags ta
 
 ### Metric Loop
 
-Every phase uses a **metric-driven iteration loop** to drive quality. Read the full protocol at `
+Every phase uses a **metric-driven iteration loop** to drive quality. Read the full protocol at `protocols/metric-loop.md`. Critical rules (survive compaction):
 
 1. YOU define a metric for this phase based on context (what you're building, what matters). The metric is NOT predefined.
 2. Spawn a **measurement agent** to score the artifact 0-100. Read its full output — it's analysis.
@@ -95,15 +97,7 @@ For implementation agents (Phase 5+): Do NOT paste the entire Design Document or
 
 ### Complexity Routing (Advisory)
 
-
-
-| Complexity | Task Types | Preferred Tier |
-|-----------|-----------|----------------|
-| S | Build-fix, cleanup, lint fix, single-error fix | Haiku-class (fastest) |
-| M | Measurement, eval, testing, single-feature impl | Sonnet-class (balanced) |
-| L | Architecture, research, multi-file impl, debugging | Opus-class (deepest reasoning) |
-
-For sprint tasks, use the Size field from `docs/plans/sprint-tasks.md`. This is advisory — the tag documents intent for future model routing support.
+Tag agent prompts with `[COMPLEXITY: S/M/L]` based on task size from `docs/plans/sprint-tasks.md`. This is advisory — the tag documents intent for future model routing support.
 
 ---
 
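The replacement text reduces complexity routing to a prompt tag. As an illustration only, the tagging could look like the Python helper below. The function name `tag_prompt` is hypothetical; the tag format `[COMPLEXITY: L]` as a prompt prefix matches how the style-guide prompt later in this diff uses it.

```python
VALID_SIZES = {"S", "M", "L"}


def tag_prompt(prompt: str, size: str) -> str:
    """Prefix an agent prompt with an advisory complexity tag."""
    normalized = size.strip().upper()
    if normalized not in VALID_SIZES:
        raise ValueError(f"size must be one of S, M, L; got {size!r}")
    return f"[COMPLEXITY: {normalized}] {prompt}"
```

Because the tag is advisory, a helper like this only documents intent; nothing in the diff shows the tag changing which model runs the agent.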
@@ -112,7 +106,7 @@ For sprint tasks, use the Size field from `docs/plans/sprint-tasks.md`. This is
 **Resuming?** If the input contains `--resume` OR if context was just compacted (SessionStart hook fired with active state):
 1. Read `docs/plans/.build-state.md` — verify it exists and has a Resume Point section.
 If `docs/plans/.build-state.md` does not exist or has no Resume Point, warn the user: 'No previous build state found. Starting fresh.' Then proceed to Step 0.1 as a new build.
-2. Re-read this file and all protocol files in `
+2. Re-read this file and all protocol files in `protocols/`.
 3. Re-read `docs/plans/sprint-tasks.md`, `docs/plans/architecture.md`, and `CLAUDE.md`.
 4. Rebuild TodoWrite from the state file (TodoWrite does NOT survive compaction or session breaks).
 5. Reset `dispatches_since_save` to 0 (fresh context window).
@@ -183,7 +177,7 @@ Autonomous mode: Log checklist to `docs/plans/build-log.md`. Create `.env.exampl
 
 ### Step 1.1 — Brainstorming
 
-Follow the Brainstorm Protocol (`
+Follow the Brainstorm Protocol (`protocols/brainstorm.md`).
 
 In interactive mode: this is a conversation. Ask questions one at a time, propose approaches with trade-offs, let the user decide. Output: Design Document saved to `docs/plans/`.
 
@@ -195,15 +189,15 @@ Skip if context level is "Decision brief" (research already done).
 
 Call the Agent tool 5 times in a single message. Pass each agent the build request AND the Design Document draft.
 
-1. Description: "Market research" — Prompt: "Research market size (TAM/SAM/SOM), competitive landscape (5-10 players), timing, and market structure for: [build request]. Design context: [paste design doc].
+1. Description: "Market research" — Prompt: "Research market size (TAM/SAM/SOM), competitive landscape (5-10 players), timing, and market structure for: [build request]. Design context: [paste design doc]. Report with a Market Verdict: GREEN/AMBER/RED."
 
-2. Description: "Tech feasibility" — Prompt: "Evaluate hard technical problems (Solved/Hard/Unsolved), build-vs-buy decisions, MVP scope, and stack validation for: [build request]. Design context: [paste design doc].
+2. Description: "Tech feasibility" — Prompt: "Evaluate hard technical problems (Solved/Hard/Unsolved), build-vs-buy decisions, MVP scope, and stack validation for: [build request]. Design context: [paste design doc]. Verify APIs and libraries from the design exist and are maintained. Report with a Technical Verdict."
 
-3. Description: "User research" — Prompt: "Analyze target persona, jobs-to-be-done, current alternatives, behavioral barriers to adoption for: [build request]. Design context: [paste design doc].
+3. Description: "User research" — Prompt: "Analyze target persona, jobs-to-be-done, current alternatives, and behavioral barriers to adoption for: [build request]. Design context: [paste design doc]. Report with a User Verdict."
 
-4. Description: "Business model" — Prompt: "Evaluate revenue models, unit economics, growth loops, first-1000-users strategy for: [build request]. Design context: [paste design doc].
+4. Description: "Business model" — Prompt: "Evaluate revenue models, unit economics, growth loops, and first-1000-users strategy for: [build request]. Design context: [paste design doc]. Report with a Business Verdict."
 
-5. Description: "Risk analysis" — Prompt: "Adversarial review: regulatory risk, security concerns, dependency risks, competitive response, top 3 failure modes for: [build request]. Design context: [paste design doc].
+5. Description: "Risk analysis" — Prompt: "Adversarial review: regulatory risk, security concerns, dependency risks, competitive response, top 3 failure modes for: [build request]. Design context: [paste design doc]. Report with a Risk Verdict."
 
 After all 5 return, synthesize a **Research Brief** with a verdict table. Save to `docs/plans/research-brief.md`.
 
@@ -218,17 +212,41 @@ Read the Design Document and Research Brief together. Check for contradictions:
 
 Update the Design Document with corrections. Save final version.
 
-### Step 1.4 —
+### Step 1.4 — Write CLAUDE.md
+
+Create (or overwrite) the project's `CLAUDE.md`. This is the product brain — every agent spawned during the build reads it automatically. Write it from the Design Document and Research Brief. It must give any agent enough context to make smart product, UX, and technical decisions without needing the full design doc.
+
+<HARD-GATE>
+CLAUDE.md must be under 200 lines. It is not a wiki, not a conventions doc, not a dump of everything you know. It is the minimum context an agent needs to make correct decisions about this specific product.
+</HARD-GATE>
 
-
+Structure:
 
-
-
---
-
-
+```
+## Product
+[1-3 sentences: what this is, core value prop, what success looks like]
+
+## User
+[Primary persona: who they are, what they care about, pain points,
+technical sophistication. This drives every UX decision.]
+
+## Tech Stack
+[Stack choices with 1-line rationale for each. Framework, DB, auth,
+key libraries, deployment target.]
+
+## Scope
+[What's in MVP vs. deferred. Hard boundaries. This prevents agents
+from building features that aren't scoped.]
+
+## Rules
+[Project-specific hard rules derived from the product and user context.
+Examples: "All data must be real-time — no simulated/fake data",
+"User must be able to pause/stop any automated process at any time",
+"Every interactive element must have visible feedback within 200ms".
+Only include rules this specific project needs — not generic best practices.]
+```
 
-
+Keep it product-focused. An implementation agent reading this should understand WHO the user is and WHAT matters enough to make the right call when the handoff prompt doesn't cover an edge case.
 
 ### Quality Gate 1
 
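The new Step 1.4 sets two checkable constraints on `CLAUDE.md`: it stays under 200 lines and it follows the five-section structure. A hedged Python sketch of a checker for those two constraints follows. The function name `check_claude_md` and the wording of its messages are hypothetical; the package itself states the rule only as prompt text and ships no such validator in this diff.

```python
from typing import List

MAX_LINES = 200  # "CLAUDE.md must be under 200 lines"
REQUIRED_SECTIONS = ["## Product", "## User", "## Tech Stack", "## Scope", "## Rules"]


def check_claude_md(text: str) -> List[str]:
    """Return a list of problems; an empty list means both constraints hold."""
    lines = text.splitlines()
    problems: List[str] = []
    if len(lines) >= MAX_LINES:
        problems.append(f"too long: {len(lines)} lines (must be under {MAX_LINES})")
    headings = {line.strip() for line in lines}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    return problems
```

A check like this covers length and headings only. Whether the content is "product-focused" still needs a reader.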
@@ -238,7 +256,7 @@ This ensures decisions survive context compaction.
 
 Update TodoWrite and `docs/plans/.build-state.md`.
 
-**Compaction checkpoint
+**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
 
 ---
 
@@ -270,13 +288,13 @@ After all 4 return, YOU synthesize into one Architecture Document. Save to `docs
 
 ### Step 2.3 — Metric Loop: Architecture Quality
 
-Run the Metric Loop Protocol (`
+Run the Metric Loop Protocol (`protocols/metric-loop.md`) on the Architecture Document. Define a metric based on: coverage of design doc requirements, specificity, consistency between agents, and **simplicity** — is this the simplest architecture that meets the requirements? Could any service, abstraction, or dependency be eliminated without losing functionality? Penalize over-engineering (microservices for a simple app, Kubernetes for a static site, complex state management for a 3-page app). Max 3 iterations.
 
 ### Step 2.4 — Sprint Planning
 
-Follow the Planning Protocol (`
+Follow the Planning Protocol (`protocols/planning.md`). Use 2 sequential Agent tool calls:
 
-Call the Agent tool — description: "Sprint breakdown" — prompt: "Break this architecture into ordered, atomic tasks. Each task needs: description, acceptance criteria, dependencies, size (S/M/L). ARCHITECTURE: [paste]. DESIGN DOC: [paste]. Scope to MVP only."
+Call the Agent tool — description: "Sprint breakdown" — prompt: "Break this architecture into ordered, atomic tasks. Each task needs: description, acceptance criteria, dependencies, size (S/M/L). Include a `**Behavioral Test:**` field for every task that has UI — a concrete interaction test: 'Navigate to [page], click [element], verify [expected outcome]'. API-only tasks should have curl-based acceptance tests instead. ARCHITECTURE: [paste]. DESIGN DOC: [paste]. Scope to MVP only."
 
 Then call the Agent tool — description: "Validate task list" — prompt: "Validate this task list: [paste]. Check scope is realistic, no missing tasks, descriptions specific enough for a developer agent to execute, all tasks within MVP boundary."
 
@@ -290,7 +308,7 @@ Save to `docs/plans/sprint-tasks.md`.
 
 Update TodoWrite and `docs/plans/.build-state.md`.
 
-**Compaction checkpoint
+**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
 
 ---
 
@@ -301,14 +319,14 @@ Update TodoWrite and `docs/plans/.build-state.md`.
 **Skip if** the project has no user-facing frontend (CLI tools, pure APIs, backend services).
 
 <HARD-GATE>
-UI/UX IS THE PRODUCT. This phase is a full peer to Architecture and Build — not a footnote, not an afterthought, not a "nice to have." Do NOT skip, compress, or rush this phase for any reason. The agents must research real competitors and award-winning sites, make deliberate visual choices backed by that research, build
+UI/UX IS THE PRODUCT. This phase is a full peer to Architecture and Build — not a footnote, not an afterthought, not a "nice to have." Do NOT skip, compress, or rush this phase for any reason. The agents must research real competitors and award-winning sites, make deliberate visual choices backed by that research, build a living style guide with every component rendered and interactive, and iterate with Playwright-verified visual QA before a single line of product code is written.
 
 Phase 4 (Foundation) WILL NOT START without `docs/plans/visual-design-spec.md`. If it does not exist, return here.
 </HARD-GATE>
 
 ### Step 3.1 — Design Research (2 agents, parallel, both use Playwright)
 
-Follow the Design Protocol (`
+Follow the Design Protocol (`protocols/design.md`), Step 3.1.
 
 Call the Agent tool 2 times in one message:
 
@@ -320,21 +338,23 @@ After both return, synthesize a **Design Research Brief** to `docs/plans/design-
 
 ### Step 3.2 — Design Direction (2 agents, sequential)
 
-Follow the Design Protocol (`
+Follow the Design Protocol (`protocols/design.md`), Step 3.2.
 
 1. Call the Agent tool — description: "UX architecture" — Prompt: "Create structural design foundation. INPUTS: frontend architecture section from architecture.md [paste], Design Research Brief [paste], reference screenshot paths [list], user persona [paste]. OUTPUT: information architecture, layout strategy, component hierarchy, responsive approach, interaction patterns. Base decisions on competitive research, not generic patterns."
 
 2. Call the Agent tool — description: "Visual design spec" — Prompt: "Create the Visual Design Spec with AUTONOMOUS decisions — pick the single best direction, do not present options. INPUTS: UX foundation [paste previous output], Design Research Brief [paste], reference screenshot paths [list], user persona [paste]. OUTPUT: color system (with hex, light+dark), typography (Google Fonts, mathematical scale), 8px spacing system, tinted shadow system, border radius, animation/motion, component styles with ALL states. Every choice must cite the research. Apply anti-AI-template rules from the Design Protocol. Save to docs/plans/visual-design-spec.md."
 
-### Step 3.3 —
+### Step 3.3 — Living Style Guide (1 implementation agent)
+
+Follow the Design Protocol (`protocols/design.md`), Step 3.3.
 
-Call the Agent tool — description: "Build
+Call the Agent tool — description: "Build living style guide" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: L] Build a living style guide page (/design-system route or standalone HTML). INPUTS: Visual Design Spec [paste], UX foundation [paste relevant sections], reference screenshots [list paths — these are your quality targets]. Must include rendered, interactive examples of: color swatches, typography scale, spacing scale, buttons (all states), form elements (all states), cards, navigation, feedback components (alerts, toasts, spinners, empty states), modals/overlays, and layout grid examples. Every component interactive (hover, focus, transitions work). Mobile-responsive. This ships with the product. Commit: 'feat: living style guide'."
 
 ### Step 3.4 — Visual QA Loop (Playwright + Metric Loop)
 
-Run the Metric Loop Protocol (`
+Run the Metric Loop Protocol (`protocols/metric-loop.md`) using the measurement criteria from the Design Protocol (`protocols/design.md`, Step 3.4).
 
-Measurement: Playwright screenshots of
+Measurement: Playwright screenshots of the living style guide sections (desktop + mobile). Design critic agent scores 0-100 across 6 dimensions: spacing/alignment, typography hierarchy, color harmony, component polish, responsive quality, originality (anti-AI-template check). Receives screenshots + Visual Design Spec + reference screenshots.
 
 **Target: 80. Max 5 iterations.** On stall: accept if >= 65, log warning below 65.
 
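The exit rule for the Visual QA loop (target 80, at most 5 iterations, accept at 65 or above on stall) can be sketched as a small decision function. This is an illustration under assumptions, not the protocol itself: the diff does not define "stall", so the sketch treats a score that fails to improve on the previous iteration as a stall, and it applies the same 65 floor when the iteration cap is reached. The function name and the returned labels are hypothetical.

```python
from typing import List

TARGET = 80          # "Target: 80"
MAX_ITERATIONS = 5   # "Max 5 iterations"
STALL_FLOOR = 65     # "On stall: accept if >= 65, log warning below 65"


def visual_qa_decision(scores: List[int]) -> str:
    """Decide the next step from the score history (one score per iteration)."""
    latest = scores[-1]
    if latest >= TARGET:
        return "DONE"
    # Assumption: a stall means no improvement over the previous iteration.
    stalled = len(scores) >= 2 and latest <= scores[-2]
    # Assumption: hitting the iteration cap is handled like a stall.
    if stalled or len(scores) >= MAX_ITERATIONS:
        return "ACCEPT" if latest >= STALL_FLOOR else "ACCEPT_WITH_WARNING"
    return "ITERATE"
```

The authoritative definitions live in `protocols/metric-loop.md`, which this diff references but does not show.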
@@ -342,7 +362,7 @@ Measurement: Playwright screenshots of proof screens (desktop + mobile). Design
 
 Log to `docs/plans/build-log.md`: final screenshot paths, score history table, design decisions, originality score. No user pause. Proceed to Phase 4.
 
-**Compaction checkpoint
+**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
 
 ---
 
@@ -360,7 +380,11 @@ Call the Agent tool — description: "Project scaffolding" — mode: "bypassPerm
 
 ### Step 4.2 — Design System (frontend only)
 
-Call the Agent tool — description: "Design system setup" — mode: "bypassPermissions" — prompt: "Implement the design system from the Visual Design Spec: [paste from docs/plans/visual-design-spec.md]. Create CSS tokens matching the spec's color system, typography scale, spacing system, shadow/elevation tokens, and base layout components.
+Call the Agent tool — description: "Design system setup" — mode: "bypassPermissions" — prompt: "Implement the design system from the Visual Design Spec: [paste from docs/plans/visual-design-spec.md]. Create CSS tokens matching the spec's color system, typography scale, spacing system, shadow/elevation tokens, and base layout components. The living style guide from Phase 3 is the reference implementation — components must match. Commit: 'feat: design system'."
+
+### Step 4.2b — Acceptance Test Scaffolding
+
+Call the Agent tool — description: "Scaffold acceptance tests" — mode: "bypassPermissions" — prompt: "Read docs/plans/sprint-tasks.md. For every task with a Behavioral Test field, create a Playwright test stub in tests/e2e/acceptance/. Use Page Object Model. Each test should: navigate to the page, perform the interaction, assert the expected outcome. Tests should FAIL right now (features aren't built yet) — that's correct. Also ensure agent-browser is available (run `which agent-browser`). Commit: 'test: scaffold acceptance tests from sprint tasks'."
 
 ### Step 4.3 — Metric Loop: Scaffold Health
 
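Step 4.2b depends on finding every task that carries a `**Behavioral Test:**` field in `docs/plans/sprint-tasks.md`. A rough Python sketch of that extraction is below. The exact layout of the sprint-task file is not shown in this diff, so the sketch assumes each field sits on a single line, optionally as a list item; the function name `behavioral_tests` is hypothetical.

```python
import re
from typing import List

# Assumption: one field per line, optionally prefixed by a "-" or "*" bullet.
FIELD = re.compile(r"^\s*(?:[-*]\s*)?\*\*Behavioral Test:\*\*\s*(.+?)\s*$")


def behavioral_tests(markdown: str) -> List[str]:
    """Collect the text of every Behavioral Test field, in document order."""
    found: List[str] = []
    for line in markdown.splitlines():
        match = FIELD.match(line)
        if match:
            found.append(match.group(1))
    return found
```

Each extracted string would become one failing test stub under `tests/e2e/acceptance/`; tasks without the field (API-only, config) produce none.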
@@ -368,10 +392,10 @@ Run the Metric Loop Protocol. Define a metric: builds clean, tests pass, lint cl
 
 ### Step 4.4 — Verification Gate
 
-Run the Verification Protocol (`
+Run the Verification Protocol (`protocols/verify.md`). Critical rules (survive compaction):
 - ONE agent runs all 6 checks sequentially: Build → Type-Check → Lint → Test → Security → Diff Review. Stop on first FAIL.
 - Agent auto-detects stack from manifest files (package.json → Node, go.mod → Go, etc.).
-- On FAIL: for build/type/lint errors, use the Build-Fix Protocol (`
+- On FAIL: for build/type/lint errors, use the Build-Fix Protocol (`protocols/build-fix.md`) — fixes one error at a time with cascade detection. For test/security/diff failures, spawn a targeted fix agent. Re-verify. Max 3 fix attempts.
 - On PASS: log `VERIFY: PASS (6/6)` to `docs/plans/.build-state.md`. Proceed.
 
 Call the Agent tool — description: "Verify scaffolding" — mode: "bypassPermissions" — prompt: "Run the Verification Protocol. Execute all 6 checks sequentially, stop on first failure. Report: VERIFY: PASS or VERIFY: FAIL with details."
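The verification gate runs six checks in a fixed order and stops at the first failure. A minimal Python sketch of that sequencing follows. The check names and the `VERIFY: PASS (6/6)` string come from the diff; the `run_verification` function, the `runners` mapping, and the exact shape of the FAIL message are assumptions made for illustration.

```python
from typing import Callable, Dict

# Order taken from the diff: Build → Type-Check → Lint → Test → Security → Diff Review.
CHECKS = ["Build", "Type-Check", "Lint", "Test", "Security", "Diff Review"]


def run_verification(runners: Dict[str, Callable[[], bool]]) -> str:
    """Run each check in order and stop on the first FAIL."""
    for name in CHECKS:
        if not runners[name]():
            # Hypothetical message format; the diff only says "FAIL with details".
            return f"VERIFY: FAIL ({name})"
    return f"VERIFY: PASS ({len(CHECKS)}/{len(CHECKS)})"
```

Stopping early means later checks never run against a broken build, which is why a build, type, or lint failure routes to the Build-Fix Protocol before anything else is retried.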
@@ -380,7 +404,7 @@ Do not proceed to Phase 5 until verification passes.
 
 Update TodoWrite and state.
 
-**Compaction checkpoint
+**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
 
 ---
 
@@ -396,13 +420,13 @@ Expand TodoWrite with each sprint task.
 
 ### Step 5.1 — Implement
 
-Call the Agent tool — description: "[task name]" — mode: "bypassPermissions" — prompt: "TASK: [task description + acceptance criteria]. HANDOFF — Architecture section: [paste ONLY the relevant section from architecture.md]. Design section: [paste ONLY the relevant section from the design doc]. Previous task output: [what the last completed task produced, if relevant]. Implement fully with real code and tests. Commit: 'feat: [task]'. Report what you built, files changed, and test results."
+Call the Agent tool — description: "[task name]" — mode: "bypassPermissions" — prompt: "TASK: [task description + acceptance criteria]. HANDOFF — Architecture section: [paste ONLY the relevant section from architecture.md]. Design section: [paste ONLY the relevant section from the design doc]. Previous task output: [what the last completed task produced, if relevant]. For UI tasks: the living style guide at /design-system shows every component's exact styling and states — match it. Implement fully with real code and tests. Commit: 'feat: [task]'. Report what you built, files changed, and test results."
 
 Pick the right developer framing: frontend, backend, AI, etc. Set `[COMPLEXITY: S/M/L]` based on the task's Size from sprint-tasks.md.
 
 ### Step 5.1b — Cleanup (De-Sloppify)
 
-Follow the Cleanup Protocol (`
+Follow the Cleanup Protocol (`protocols/cleanup.md`). Critical rules (survive compaction):
 [COMPLEXITY: S]
 - Skip if trivial (< 20 lines, single file).
 - Cleanup agent is a SEPARATE agent from the implementer — no cleaning your own mess.
|
|
|
414
438
|
|
|
415
439
|
### Step 5.2 — Metric Loop: Task Quality
|
|
416
440
|
|
|
417
|
-
Run the Metric Loop Protocol on the task implementation. Define a metric based on the task's acceptance criteria. Max 5 iterations.
|
|
441
|
+
Run the Metric Loop Protocol on the task implementation. Define a metric based on the task's acceptance criteria. For UI-facing tasks, include behavioral verification: the measurement agent should use agent-browser to verify the feature renders and responds to interaction, not just read the code. Max 5 iterations.
|
|
418
442
|
|
|
419
443
|
### Step 5.3 — Loop Exit
|
|
420
444
|
|
|
@@ -426,11 +450,23 @@ On stall or max iterations:
|
|
|
426
450
|
|
|
427
451
|
After each task: update TodoWrite and `docs/plans/.build-state.md`.
|
|
428
452
|
|
|
453
|
+
### Step 5.3b — Behavioral Smoke Test
|
|
454
|
+
|
|
455
|
+
Skip if this task has no Behavioral Test criteria (API-only, config, infrastructure tasks).
|
|
456
|
+
|
|
457
|
+
Run the Smoke Test Protocol (`protocols/smoke-test.md`). This uses agent-browser to open the app, execute the task's behavioral acceptance criteria, and verify the feature actually works.
|
|
458
|
+
|
|
459
|
+
Evidence saved to `docs/plans/evidence/[task-name]/`: annotated screenshot, snapshot diff, error log, network log, HAR file.
|
|
460
|
+
|
|
461
|
+
On FAIL: spawn fix agent with the evidence. The fix agent receives: what was expected (from acceptance criteria), what actually happened (snapshot diff + errors + screenshot), and the relevant source files. Max 2 fix-and-retest cycles.
|
|
462
|
+
|
|
463
|
+
On PASS: proceed to Step 5.4.
|
|
464
|
+
|
|
429
465
|
### Step 5.4 — Post-Task Verification
|
|
430
466
|
|
|
431
|
-
Run the Verification Protocol (`
|
|
467
|
+
Run the Verification Protocol (`protocols/verify.md`). If FAIL, fix before starting the next task.
|
|
432
468
|
|
|
433
|
-
**Compaction checkpoint
|
|
469
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
434
470
|
|
|
435
471
|
---
|
|
436
472
|
|
|
@@ -438,23 +474,27 @@ Run the Verification Protocol (`commands/protocols/verify.md`) to catch regressi
|
|
|
438
474
|
|
|
439
475
|
### Step 6.0 — Pre-Hardening Verification
|
|
440
476
|
|
|
441
|
-
Run the Verification Protocol (`
|
|
477
|
+
Run the Verification Protocol (`protocols/verify.md`). All checks must pass before starting expensive audit agents.
|
|
478
|
+
|
|
479
|
+
### Step 6.1 — Initial Audit (5 agents in parallel, ONE message)
|
|
442
480
|
|
|
443
|
-
|
|
481
|
+
Read the NFRs from `docs/plans/sprint-tasks.md`. Pass the relevant NFR thresholds to each audit agent so they have concrete targets, not generic checks.
|
|
444
482
|
|
|
445
|
-
Call the Agent tool
|
|
483
|
+
Call the Agent tool 5 times in one message:
|
|
446
484
|
|
|
447
|
-
1. Description: "API testing" — Prompt: "Comprehensive API validation: all endpoints, edge cases, error responses, auth flows. Report findings with counts."
|
|
485
|
+
1. Description: "API testing" — Prompt: "Comprehensive API validation: all endpoints, edge cases, error responses, auth flows. NFR targets: [paste performance and reliability NFRs]. Report findings with counts."
|
|
448
486
|
|
|
449
|
-
2. Description: "Performance audit" — Prompt: "Measure response times, identify bottlenecks, flag performance issues. Report benchmarks."
|
|
487
|
+
2. Description: "Performance audit" — Prompt: "Measure response times, identify bottlenecks, flag performance issues. NFR targets: [paste performance NFRs — e.g., API < 200ms, page load < 3s]. Report benchmarks AGAINST these targets."
|
|
450
488
|
|
|
451
|
-
3. Description: "Accessibility audit" — Prompt: "WCAG compliance audit on all interfaces. Check screen reader, keyboard nav, contrast. Report issues with counts."
|
|
489
|
+
3. Description: "Accessibility audit" — Prompt: "WCAG compliance audit on all interfaces. NFR target: [paste accessibility NFR — e.g., WCAG AA]. Check screen reader, keyboard nav, contrast. Report issues with counts."
|
|
452
490
|
|
|
453
|
-
4. Description: "Security audit" — Prompt: "Security review: auth, input validation, data exposure, dependency vulnerabilities. Report findings with severity."
|
|
491
|
+
4. Description: "Security audit" — Prompt: "Security review: auth, input validation, data exposure, dependency vulnerabilities. NFR targets: [paste security NFRs]. Report findings with severity."
|
|
492
|
+
|
|
493
|
+
5. Description: "UX quality audit" — Prompt: "UX quality review of every user-facing page. NFR targets: [paste accessibility NFRs]. First, screenshot the living style guide at /design-system as your reference for how components should look. Then review every product page and check: loading states (every async action must show a loading indicator), error states (every form and API call must show user-friendly error feedback), empty states (every list/table must handle zero items gracefully), mobile responsiveness (test at 375px viewport — touch targets >= 44px, no horizontal scroll, readable text), form validation (inline feedback, not just alert()), transition smoothness (no layout shifts, no janky animations), visual consistency (compare each page's components against the style guide — buttons, inputs, cards, colors, spacing should match). Report issues with page, severity, and screenshot."
|
|
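As a rough illustration of what "Report benchmarks AGAINST these targets" could look like, the Python sketch below compares measured timings with NFR thresholds. The metric names, the `NFR_TARGETS_MS` table, and the function are hypothetical; only the example thresholds (API < 200ms, page load < 3s) come from the prompt text above.

```python
# Hypothetical sketch: compare measured timings with NFR thresholds.
# Metric names are illustrative; thresholds mirror the examples in the prompt.
NFR_TARGETS_MS = {"api_response": 200, "page_load": 3000}


def check_against_targets(measured_ms, targets_ms=NFR_TARGETS_MS):
    """Return {metric: (measured value, target, status)} for every target."""
    report = {}
    for metric, target in targets_ms.items():
        value = measured_ms.get(metric)
        if value is None:
            status = "NOT MEASURED"  # a target with no benchmark is itself a finding
        elif value < target:
            status = "PASS"
        else:
            status = "FAIL"
        report[metric] = (value, target, status)
    return report
```

Reporting every target, including unmeasured ones, keeps an audit from silently skipping an NFR.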
454
494
|
|
|
455
495
|
### Step 6.1b — Eval Harness
|
|
456
496
|
|
|
457
|
-
Run the Eval Harness Protocol (`
|
|
497
|
+
Run the Eval Harness Protocol (`protocols/eval-harness.md`). Define 8-15 concrete, executable eval cases from the audit findings and architecture doc. For UI flows, eval cases should use agent-browser: "agent-browser open /dashboard -> agent-browser click @submit -> agent-browser wait --text Success -> expect text contains confirmation ID". Run the eval agent. Record baseline pass rate. CRITICAL and HIGH failures feed into the metric loop in Step 6.2 as specific issues to fix.
|
|
458
498
|
|
|
459
499
|
### Step 6.2 — Metric Loop: Hardening Quality
|
|
460
500
|
|
|
@@ -472,7 +512,7 @@ Re-run the Eval Harness after the metric loop exits. All CRITICAL eval cases mus
|
|
|
472
512
|
ALL 3 ITERATIONS ARE MANDATORY. Do NOT stop after iteration 1 even if all tests pass. The purpose of 3 runs is to catch flaky tests, timing-dependent failures, and race conditions that only surface on repeated execution. Skip this step ONLY if the project has no user-facing frontend.
|
|
473
513
|
</HARD-GATE>
|
|
474
514
|
|
|
475
|
-
Generate and execute end-to-end tests using Playwright against the running application. Tests cover
|
|
515
|
+
Generate and execute end-to-end tests using Playwright against the running application. Tests cover the **User Journeys** defined in `docs/plans/sprint-tasks.md` (Step 0 of the Planning Protocol). Each journey = one E2E test file.
|
|
476
516
|
|
|
477
517
|
**Iteration 1 — Generate & Run:**
|
|
478
518
|
|
|
@@ -481,12 +521,13 @@ Call the Agent tool — description: "E2E test generation" — mode: "bypassPerm
|
|
|
481
521
|
"[COMPLEXITY: L] Generate and run end-to-end Playwright tests for this application.
|
|
482
522
|
|
|
483
523
|
INPUTS:
|
|
484
|
-
-
|
|
485
|
-
-
|
|
486
|
-
-
|
|
524
|
+
- User Journeys from docs/plans/sprint-tasks.md: [paste the User Journeys section — each journey becomes one E2E test]
|
|
525
|
+
- Architecture doc (API contracts): [paste relevant sections from docs/plans/architecture.md]
|
|
526
|
+
- NFRs from docs/plans/sprint-tasks.md: [paste — use performance thresholds as test assertions]
|
|
527
|
+
- Visual Design Spec (component selectors): [paste relevant sections from docs/plans/visual-design-spec.md]
|
|
487
528
|
|
|
488
529
|
REQUIREMENTS:
|
|
489
|
-
1.
|
|
530
|
+
1. One E2E test per User Journey from sprint-tasks.md (each journey = one test file covering the full flow)
|
|
490
531
|
2. Use Page Object Model pattern — one page object per major view
|
|
491
532
|
3. Use data-testid selectors (add them to components if missing)
|
|
492
533
|
4. Wait for API responses, NEVER use arbitrary timeouts (no waitForTimeout)
|
|
@@ -511,56 +552,67 @@ Record results: total tests, pass count, fail count, failure details. Log to `do
|
|
|
511
552
|
|
|
512
553
|
**Iteration 2 — Fix & Re-run:**
|
|
513
554
|
|
|
514
|
-
Call the Agent tool — description: "E2E fix iteration 2" — mode: "bypassPermissions" — prompt:
|
|
555
|
+
Call the Agent tool — description: "E2E fix iteration 2" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: M] Fix E2E test failures from iteration 1: [paste failure details — test names, error messages, screenshot paths]. Diagnose each as real bug, flaky test, or missing selector. Fix accordingly — do NOT delete or skip tests. Re-run ALL tests. Commit: 'fix: e2e test failures iteration 2'."
|
|
515
556
|
|
|
516
|
-
|
|
557
|
+
Record results in the E2E table. Identify flaky candidates (passed iter 1, failed iter 2 or vice versa).
|
|
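The flaky-candidate rule above (passed iter 1, failed iter 2 or vice versa) can be sketched in Python as follows. The dict-based result format and the function name are assumptions made for illustration, not part of the package.

```python
def flaky_candidates(iter1, iter2):
    """Return the tests whose outcome differs between two iterations.

    iter1 and iter2 map test name -> True (passed) or False (failed).
    Tests present in only one iteration are ignored here.
    """
    shared = iter1.keys() & iter2.keys()
    return sorted(name for name in shared if iter1[name] != iter2[name])
```

A test that fails in both iterations is a consistent failure rather than a flaky candidate, so it is deliberately left out of the result.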
517
558
|
|
|
518
|
-
|
|
559
|
+
**Iteration 3 — Final Stability Run:**
|
|
519
560
|
|
|
520
|
-
|
|
521
|
-
1. Diagnose: Is this a real bug, a flaky test, or a missing data-testid?
|
|
522
|
-
2. Real bugs: Fix the application code
|
|
523
|
-
3. Flaky tests: Add proper waits, fix race conditions, improve selectors
|
|
524
|
-
4. Missing selectors: Add data-testid attributes to components
|
|
525
|
-
5. Do NOT delete or skip failing tests — fix them
|
|
561
|
+
Call the Agent tool — description: "E2E stability run" — mode: "bypassPermissions" — prompt: "[COMPLEXITY: M] Final E2E stability run (3 of 3). Previous results — Iter 1: [pass/fail counts], Iter 2: [pass/fail counts], Flaky candidates: [list]. Run ALL tests with --repeat-each=3. Quarantine inconsistent tests with test.fixme(). Fix remaining consistent failures. PASS CRITERIA: 95%+ pass rate (quarantined flaky tests excluded but logged). Commit: 'test: e2e stability fixes iteration 3'."
|
|
526
562
|
|
|
527
|
-
|
|
528
|
-
Commit fixes: 'fix: e2e test failures iteration 2'"
|
|
563
|
+
Record final results. Include in Reality Checker evidence.
|
|
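The pass criterion for the stability run (95%+ pass rate, with quarantined flaky tests excluded but logged) might be computed along these lines. This is a minimal Python sketch under an assumed result format; the function name and the handling of an empty result set are illustrative choices.

```python
def stability_verdict(results, quarantined, threshold=0.95):
    """Compute the pass rate over non-quarantined tests.

    results maps test name -> True (passed) or False (failed);
    quarantined is the set of test names marked with test.fixme().
    Returns (pass_rate, meets_threshold).
    """
    counted = {name: ok for name, ok in results.items() if name not in quarantined}
    if not counted:
        return 0.0, False  # assumption: nothing left to count is not a pass
    rate = sum(counted.values()) / len(counted)
    return rate, rate >= threshold
```

Quarantined tests drop out of the denominator, so the quarantined count should still be reported next to the pass rate.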
529
564
|
|
|
530
|
-
|
|
565
|
+
### Step 6.2d — Autonomous Dogfooding
|
|
531
566
|
|
|
532
|
-
**
|
|
567
|
+
Run the agent-browser dogfood skill against the running app. Unlike the per-task smoke tests (which verify specific acceptance criteria), dogfooding is **exploratory** — it autonomously navigates every reachable page, clicks buttons, fills forms, checks console errors, and finds issues we didn't think to test.
|
|
533
568
|
|
|
534
|
-
|
|
569
|
+
Start the dev server if not running. Then invoke the dogfood skill:
|
|
535
570
|
|
|
536
|
-
"
|
|
571
|
+
Call the Agent tool — description: "Dogfood the app" — mode: "bypassPermissions" — prompt: "Run the agent-browser dogfood skill against the running app at http://localhost:[port]. Explore every reachable page. Click every button. Fill every form. Check console for errors. Report a structured list of issues with severity ratings (critical/high/medium/low), screenshots, and repro steps. If dogfood skill is not available, use agent-browser manually: snapshot each page, click all interactive elements, check errors and network requests. Also evaluate UX quality: missing loading states, poor error messages, broken mobile layouts (resize to 375px), visual inconsistencies, missing empty states, form validation gaps. Report UX issues separately from functional issues."
|
|
537
572
|
|
|
538
|
-
|
|
539
|
-
|
|
540
|
-
|
|
541
|
-
-
|
|
573
|
+
**Fix loop:** For each CRITICAL or HIGH issue found:
|
|
574
|
+
1. Classify: is this a code bug (fix in Phase 5 style — spawn implementation fix agent) or a structural problem (needs architecture change — spawn architect agent to propose a fix plan, then implementation agent to execute)?
|
|
575
|
+
2. Spawn the appropriate fix agent with: the issue description, repro steps, screenshot, affected page/component.
|
|
576
|
+
3. After fixes, re-run dogfood on the affected pages only (not the full app). If new CRITICAL/HIGH issues appear, repeat. Max 3 fix cycles.
|
|
542
577
|
|
|
543
|
-
|
|
544
|
-
1. Run ALL tests with --repeat-each=3 to detect flakiness (each test runs 3 times within this iteration)
|
|
545
|
-
2. Any test failing inconsistently across the 3 sub-runs: quarantine with test.fixme() and file path + reason
|
|
546
|
-
3. Fix any remaining consistent failures
|
|
547
|
-
4. Generate final report with: total journeys, pass rate, flaky count, quarantined tests
|
|
548
|
-
5. Commit: 'test: e2e stability fixes iteration 3'
|
|
578
|
+
MEDIUM/LOW issues: log to `docs/plans/build-log.md` for the Reality Checker.
|
|
549
579
|
|
|
550
|
-
|
|
580
|
+
### Step 6.2e — Fake Data Detector
|
|
551
581
|
|
|
552
|
-
|
|
582
|
+
Call the Agent tool — description: "Fake data audit" — mode: "bypassPermissions" — prompt: "Run the Fake Data Detector Protocol (protocols/fake-data-detector.md). Check for mock/hardcoded data in production paths. Static analysis: grep for Math.random() business data, hardcoded API responses, setTimeout faking async, placeholder text. Dynamic analysis: inspect HAR files from docs/plans/evidence/ for missing real API calls, static responses, absent WebSocket traffic. Report findings with file:line references and severity."
|
|
583
|
+
|
|
584
|
+
**Fix loop:** For each CRITICAL finding:
|
|
585
|
+
1. Spawn a fix agent with: the finding (file:line, what's fake, what it should be), and the relevant source files.
|
|
586
|
+
2. The fix agent replaces fake data with real API calls, real WebSocket connections, real data sources. If real data sources aren't available (missing API keys, no backend), the fix agent must flag this as a blocker — not paper over it with better-looking fake data.
|
|
587
|
+
3. After fixes, re-run the fake data detector (static checks only — fast). Max 2 fix cycles.
|
|
553
588
|
|
|
554
|
-
|
|
589
|
+
Remaining findings feed into the Reality Checker in Step 6.4.
|
|
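To make the static-analysis half of the fake data audit more concrete, here is a small Python sketch that flags candidate lines with file:line references. It is not the content of `protocols/fake-data-detector.md`; the pattern set, labels, and function name are assumptions chosen to mirror the examples named in the prompt (Math.random(), setTimeout, placeholder text).

```python
import re

# Illustrative patterns only; a hit is a candidate for review, not proof of fake data.
PATTERNS = {
    "Math.random() call": re.compile(r"Math\.random\s*\("),
    "setTimeout call": re.compile(r"\bsetTimeout\s*\("),
    "placeholder text": re.compile(r"lorem ipsum", re.IGNORECASE),
}


def scan_source(path, text):
    """Return a list of ("path:line", label) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((f"{path}:{lineno}", label))
    return findings
```

Both `Math.random()` and `setTimeout` have legitimate uses, so each hit still needs a judgment about whether it sits on a production data path.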
555
590
|
|
|
556
|
-
|
|
591
|
+
### Step 6.4 — Reality Check
|
|
592
|
+
|
|
593
|
+
Call the Agent tool — description: "Final verdict" — prompt: "You are the Reality Checker. Default: NEEDS WORK. The hardening loop reached score [final_score] after [iterations] iterations. Score history: [paste table]. Review all evidence. Eval harness results: [baseline pass rate] → [final pass rate]. E2E test results: [paste E2E table — 3 iterations, final pass rate, quarantined count]. Dogfood results: [paste issue count and any CRITICAL/HIGH findings, or 'clean — no issues found']. Fake data audit results: [paste findings or 'clean — no fake data detected']. CRITICAL failures remaining: [list or none]. Verdict: PRODUCTION READY or NEEDS WORK with specifics."
|
|
557
594
|
|
|
558
595
|
<HARD-GATE>Do NOT self-approve. Reality Checker must give the verdict.</HARD-GATE>
|
|
559
596
|
|
|
560
|
-
**
|
|
561
|
-
|
|
597
|
+
**On PRODUCTION READY:** Log verdict. Proceed to Phase 7.
|
|
598
|
+
|
|
599
|
+
**On NEEDS WORK:** The Reality Checker returns specific issues. These must be fixed — not logged and ignored.
|
|
562
600
|
|
|
563
|
-
|
|
601
|
+
1. Read the Reality Checker's specific findings. Classify each:
|
|
602
|
+
- **Code bug** (broken feature, failing test, fake data) → spawn implementation fix agent with the finding + affected files.
|
|
603
|
+
- **Structural issue** (missing feature, wrong architecture, data flow problem) → spawn architect agent to produce a fix plan, then implementation agent to execute it. This is a mini Phase 5 loop for the specific issue.
|
|
604
|
+
- **Blocker** (missing API key, no backend, needs human action) → log to `docs/plans/build-log.md` and present to user. Cannot be auto-fixed.
|
|
605
|
+
2. After fixes, re-run verification (7 checks) + the specific failing gate (E2E, dogfood, or fake data — whichever surfaced the issue).
|
|
606
|
+
3. Re-run the Reality Checker with updated evidence.
|
|
607
|
+
|
|
608
|
+
<HARD-GATE>
|
|
609
|
+
Max 2 NEEDS WORK cycles. If the Reality Checker returns NEEDS WORK a third time:
|
|
610
|
+
- **Interactive:** Present all remaining issues to user. Ask for direction.
|
|
611
|
+
- **Autonomous:** Log remaining issues to `docs/plans/build-log.md`. Proceed to Phase 7 with a warning in the completion report.
|
|
612
|
+
Do not loop forever.
|
|
613
|
+
</HARD-GATE>
|
|
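The cycle cap in this hard gate can be read as a small decision function. The Python sketch below is one interpretation under stated assumptions: the function name, the returned strings, and the convention that `needs_work_count` includes the current verdict are all hypothetical.

```python
def next_action(verdict, needs_work_count, mode):
    """Decide what follows a Reality Checker verdict.

    needs_work_count: number of NEEDS WORK verdicts so far, including this one.
    mode: "interactive" or "autonomous".
    """
    if verdict == "PRODUCTION READY":
        return "proceed to Phase 7"
    if needs_work_count <= 2:
        # Within the 2-cycle budget: fix the findings, then re-run the checker.
        return "fix issues and re-run Reality Checker"
    if mode == "interactive":
        return "present remaining issues to user"
    return "log issues and proceed to Phase 7 with warning"
```

The third NEEDS WORK verdict is the point where the loop stops fixing and escalates.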
614
|
+
|
|
615
|
+
**Compaction checkpoint.** Update `docs/plans/.build-state.md` per the format above.
|
|
564
616
|
|
|
565
617
|
---
|
|
566
618
|
|
|
@@ -568,7 +620,18 @@ Call the Agent tool — description: "Final verdict" — prompt: "You are the Re
|
|
|
568
620
|
|
|
569
621
|
### Step 7.0 — Pre-Ship Verification
|
|
570
622
|
|
|
571
|
-
|
|
623
|
+
Run the Verification Protocol (`protocols/verify.md`). All checks must pass before documenting and shipping. If FAIL persists after 3 fix attempts, return to Phase 6.
|
|
624
|
+
|
|
625
|
+
### Step 7.0b — Requirements Coverage Report
|
|
626
|
+
|
|
627
|
+
Call the Agent tool — description: "Requirements coverage check" — prompt: "Re-read the original Design Document (docs/plans/*.md design doc) and the user journeys + NFRs from docs/plans/sprint-tasks.md. For EVERY feature listed in the MVP scope, verify: (1) it has a corresponding implemented task, (2) it has a passing test or behavioral verification, (3) it is reachable and functional in the running app. Produce a coverage table:
|
|
628
|
+
|
|
629
|
+
| MVP Feature | Task | Test | Verified | Status |
|
|
630
|
+
|-------------|------|------|----------|--------|
|
|
631
|
+
|
|
632
|
+
Mark each as COVERED, PARTIAL (implemented but untested), or MISSING. Any MISSING feature is a blocker — report it immediately."
|
|
633
|
+
|
|
634
|
+
If any features are MISSING: spawn implementation agents to build them, then re-run verification. This is the final safety net before shipping — it catches requirements that were planned but somehow never built.
|
|
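One possible reading of the COVERED / PARTIAL / MISSING labels is sketched below in Python. The exact mapping is an assumption: the prompt only defines PARTIAL as "implemented but untested", so treating an unverified feature as PARTIAL and a feature without a task as MISSING is an interpretation, and the function names are hypothetical.

```python
def coverage_status(has_task, has_test, verified):
    """Classify one MVP feature for the coverage table."""
    if not has_task:
        return "MISSING"   # planned but never built: a blocker
    if has_test and verified:
        return "COVERED"
    return "PARTIAL"       # implemented, but untested or not verified in the app


def blockers(features):
    """features maps feature name -> (has_task, has_test, verified)."""
    return sorted(name for name, flags in features.items()
                  if coverage_status(*flags) == "MISSING")
```

For example, `blockers({"login": (True, True, True), "export": (False, False, False)})` returns `["export"]`.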
572
635
|
|
|
573
636
|
### Step 7.1 — Documentation
|
|
574
637
|
|