warp-os 1.1.2 → 1.2.1

Files changed (45)
  1. package/CHANGELOG.md +85 -0
  2. package/README.md +6 -4
  3. package/VERSION +1 -1
  4. package/agents/warp-annotate.md +394 -0
  5. package/agents/warp-browse.md +9 -1
  6. package/agents/warp-build-code.md +9 -1
  7. package/agents/warp-orchestrator.md +10 -1
  8. package/agents/warp-plan-architect.md +120 -1
  9. package/agents/warp-plan-brainstorm.md +93 -2
  10. package/agents/warp-plan-design.md +97 -4
  11. package/agents/warp-plan-onboarding.md +9 -1
  12. package/agents/warp-plan-optimize.md +9 -1
  13. package/agents/warp-plan-scope.md +67 -1
  14. package/agents/warp-plan-security.md +576 -35
  15. package/agents/warp-plan-testdesign.md +9 -1
  16. package/agents/warp-qa-debug.md +117 -1
  17. package/agents/warp-qa-test.md +167 -1
  18. package/agents/warp-release-update.md +290 -4
  19. package/agents/warp-setup.md +9 -1
  20. package/agents/warp-upgrade.md +21 -4
  21. package/bin/hooks/CLAUDE.md +24 -0
  22. package/bin/hooks/_warp_json.sh +4 -2
  23. package/bin/hooks/identity-briefing.sh +20 -13
  24. package/bin/hooks/validate-askuser.sh +41 -0
  25. package/bin/migrate-sessions.js +284 -173
  26. package/dist/warp-annotate/SKILL.md +404 -0
  27. package/dist/warp-browse/SKILL.md +9 -1
  28. package/dist/warp-build-code/SKILL.md +9 -1
  29. package/dist/warp-orchestrator/SKILL.md +10 -1
  30. package/dist/warp-plan-architect/SKILL.md +120 -1
  31. package/dist/warp-plan-brainstorm/SKILL.md +93 -2
  32. package/dist/warp-plan-design/SKILL.md +97 -4
  33. package/dist/warp-plan-onboarding/SKILL.md +9 -1
  34. package/dist/warp-plan-optimize/SKILL.md +9 -1
  35. package/dist/warp-plan-scope/SKILL.md +67 -1
  36. package/dist/warp-plan-security/SKILL.md +578 -35
  37. package/dist/warp-plan-testdesign/SKILL.md +9 -1
  38. package/dist/warp-qa-debug/SKILL.md +117 -1
  39. package/dist/warp-qa-test/SKILL.md +167 -1
  40. package/dist/warp-release-update/SKILL.md +290 -4
  41. package/dist/warp-setup/SKILL.md +9 -1
  42. package/dist/warp-upgrade/SKILL.md +21 -4
  43. package/package.json +2 -2
  44. package/shared/project-hooks.json +7 -0
  45. package/shared/tier1-engineering-constitution.md +9 -1
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
119
119
 
120
120
  ## AskUserQuestion
121
121
 
122
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
123
+
122
124
  **Contract:**
123
125
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
124
126
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
140
142
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
141
143
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
142
144
 
145
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
146
+ - ☐ Completeness scores in every option label
147
+ - ☐ Recommended option listed first
148
+ - ☐ One decision per question (split if multiple)
149
+ - ☐ Analysis/reasoning already presented in message text above
150
+
143
151
  **Formatting:**
144
152
  - *Italics* for emphasis, not **bold** (bold for headers only).
145
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
153
+ - After each answer: `✔ Decision {N} recorded`
146
154
  - Previews under 8 lines. Full mockups go in conversation text before the question.
147
155
 
148
156
  ---
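The label format and rating bands in the AskUserQuestion hunk above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `score_emoji` and `option_label` are hypothetical names that do not appear in the package, and the option names are invented.

```shell
#!/bin/sh
# Hypothetical helper (not part of warp-os): map a completeness score
# to the emoji band described in the diff (9-10, 6-8, 1-5).
score_emoji() {
  score=$1
  if [ "$score" -ge 9 ] && [ "$score" -le 10 ]; then
    printf '🟢'
  elif [ "$score" -ge 6 ] && [ "$score" -le 8 ]; then
    printf '🟡'
  elif [ "$score" -ge 1 ] && [ "$score" -le 5 ]; then
    printf '🔴'
  else
    return 1   # scores outside 1-10 are rejected
  fi
}

# Hypothetical helper: build a label in the documented
# "Option name — X/10 <emoji>" format.
option_label() {
  emoji=$(score_emoji "$2") || return 1
  printf '%s — %s/10 %s\n' "$1" "$2" "$emoji"
}

# Invented option names, recommended option first.
option_label "Full migration with rollback" 9
option_label "Patch the hot path only" 4
```

The score goes in the label itself, which matches the rule that it belongs in the label and not the description.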
@@ -243,6 +251,28 @@ Internalize these cognitive patterns. They fire simultaneously on every input yo
243
251
 
244
252
  ---
245
253
 
254
+ ## PHASE 0: Scope Challenge
255
+
256
+ **Goal:** Before starting architecture, challenge whether the scope is right-sized. Architecture amplifies scope — if the scope is too large, the architecture will be too large. Five minutes here saves five days in build.
257
+
258
+ Read `.warp/reports/planning/scope.md` and the codebase. Then produce:
259
+
260
+ ```
261
+ SCOPE CHALLENGE:
262
+ Existing code that solves sub-problems: [search codebase for partial solutions]
263
+ Minimum change set: [what is the smallest change that delivers the scope?]
264
+ Complexity smell: [>8 files or >2 new classes/services = smell — justify or simplify]
265
+ Built-in alternatives: [does the framework/language have this built in?]
266
+ TODOS cross-reference: [does TODOS.md already track related work?]
267
+ Completeness check: [AI compresses cost — always prefer the complete version]
268
+ ```
269
+
270
+ If the scope challenge reveals the work is simpler than planned — existing code already solves sub-problems, the framework has built-in support, or the minimum change set is smaller than expected — surface it via AskUserQuestion and suggest scope reduction before proceeding. Do not architect a system larger than the problem requires.
271
+
272
+ If the scope holds, proceed to Phase 1.
273
+
274
+ ---
275
+
246
276
  ## PHASE 1: System Audit
247
277
 
248
278
  **Goal:** Understand what exists before designing what to build. New architecture on top of unexamined existing architecture produces collisions.
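The "complexity smell" thresholds from the Phase 0 scope challenge (more than 8 files, or more than 2 new classes or services) could be checked mechanically. The sketch below assumes the two counts are already known; `complexity_smell` is a hypothetical name, not something the package ships.

```shell
#!/bin/sh
# Hypothetical helper (not part of warp-os): apply the Phase 0 thresholds.
#   $1 = number of files the change touches
#   $2 = number of new classes or services it introduces
complexity_smell() {
  files_touched=$1
  new_units=$2
  if [ "$files_touched" -gt 8 ] || [ "$new_units" -gt 2 ]; then
    echo "smell: justify or simplify"
  else
    echo "ok"
  fi
}

complexity_smell 5 1    # within both thresholds
complexity_smell 12 1   # too many files
```

The file count might come from something like `git diff --name-only` against the base branch. Counting new classes or services is a judgment call and is left as an input here.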
@@ -366,6 +396,25 @@ OPERATION: [name, e.g., "fetch pilot's active flight"]
366
396
 
367
397
  [SYSTEM scale] Produce this for every primary operation. [MODULE scale] Produce this for the 3 most complex operations. [FEATURE scale] Produce this for the primary operation only.
368
398
 
399
+ ### 2D. Error & Rescue Map
400
+
401
+ For each major operation documented in 2C, produce a rescue map that pairs every failure with a specific recovery action and user-visible outcome. This complements the four-path data flow by mapping the operational response plan:
402
+
403
+ ```
404
+ ERROR & RESCUE MAP:
405
+ ┌──────────────────┬─────────────────┬──────────────────┬────────────────────┐
406
+ │ Method/Codepath │ What Can Fail │ Rescue Action │ User-Visible Result │
407
+ ├──────────────────┼─────────────────┼──────────────────┼────────────────────┤
408
+ │ [specific method] │ [specific error] │ [specific action] │ [specific outcome] │
409
+ └──────────────────┴─────────────────┴──────────────────┴────────────────────┘
410
+ ```
411
+
412
+ Rules:
413
+ - Every row must name a specific method or codepath, not a vague category.
414
+ - "Rescue Action" must be actionable: retry with backoff, return cached value, degrade gracefully, alert on-call. Never "handle error."
415
+ - "User-Visible Result" must describe exactly what the user sees or experiences. Never "an error message."
416
+ - If a method can fail in multiple ways, each failure gets its own row.
417
+
369
418
  ---
370
419
 
371
420
  ## PHASE 3: API Design
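Two of the rescue-map rules added in section 2D (never "handle error", never "an error message") are simple enough to lint. The sketch below assumes rows are kept as pipe-delimited text in the order method, failure, rescue action, user-visible result. That row format, the function name `lint_rescue_map`, and the sample rows are all hypothetical.

```shell
#!/bin/sh
# Hypothetical lint (not part of warp-os) for rescue-map rows written as
#   method|what can fail|rescue action|user-visible result
# Exits non-zero when any row breaks a rule.
lint_rescue_map() {
  awk -F'|' '
    NF != 4 {
      printf "row %d: expected 4 fields, got %d\n", NR, NF; bad = 1; next
    }
    tolower($3) ~ /^ *handle error\.? *$/ {
      printf "row %d: rescue action is not actionable\n", NR; bad = 1
    }
    tolower($4) ~ /^ *an error message\.? *$/ {
      printf "row %d: user-visible result is too vague\n", NR; bad = 1
    }
    END { exit bad + 0 }
  '
}

# Invented sample rows: the first passes, the second breaks both rules.
printf '%s\n' \
  'fetchActiveFlight|upstream timeout|retry with backoff, then return cached value|stale-data banner with last refresh time' \
  'saveLogbookEntry|write conflict|handle error|an error message' \
  | lint_rescue_map || echo "rescue map needs work"
```

A check like this only catches the two literal anti-patterns. Whether a row names a specific codepath still needs a human or agent review.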
@@ -533,6 +582,34 @@ BOUNDARY: [Component A] → [Component B]
533
582
 
534
583
  ---
535
584
 
585
+ ## PHASE 4.6: Observability & Debuggability Review
586
+
587
+ **Goal:** Verify that every major component can be diagnosed in production without attaching a debugger. Systems without observability are systems that fail silently and stay broken longer.
588
+
589
+ For each major component defined in Phase 2, produce:
590
+
591
+ ```
592
+ OBSERVABILITY:
593
+ Logging: [what is logged? structured? levels correct?]
594
+ Metrics: [key metrics exposed? latency, error rate, throughput?]
595
+ Tracing: [distributed tracing support? correlation IDs?]
596
+ Alerting: [what triggers alerts? who gets paged?]
597
+ Debuggability: [can you diagnose issues from logs alone?]
598
+ Admin tooling: [any admin endpoints or tools needed?]
599
+ ```
600
+
601
+ Rules:
602
+ - **Logging:** Every component must log in a structured format (JSON or equivalent). Log levels must be correct: ERROR for things that break, WARN for things that degrade, INFO for state transitions, DEBUG for investigation. If a component has no logging plan, flag it.
603
+ - **Metrics:** At minimum, every component that handles requests must expose latency (p50/p95/p99), error rate, and throughput. Components that manage queues must expose queue depth and processing lag.
604
+ - **Tracing:** If the system has more than two components in a request path, correlation IDs are required. Every log line in a request must include the correlation ID so the full path can be reconstructed.
605
+ - **Alerting:** Every failure mode from Phase 4 must have a corresponding alert or explicit justification for why it does not need one. "We will notice" is not an alerting strategy.
606
+ - **Debuggability:** The litmus test: can an engineer who did not build this component diagnose a production issue using only logs, metrics, and traces — without reading the source code? If not, the observability is insufficient.
607
+ - **Admin tooling:** If the system requires manual intervention for any operational task (clearing a stuck queue, resetting a user's state, force-refreshing cached data), document the admin tool or endpoint that enables it.
608
+
609
+ [FEATURE scale] Brief format — logging and key metrics only. [MODULE scale] Full format for each component. [SYSTEM scale] Full format plus cross-component tracing architecture.
610
+
611
+ ---
612
+
536
613
  ## PHASE 5: Technical Decisions
537
614
 
538
615
  **Goal:** Document each significant technical choice with rationale and alternatives. Future engineers need to understand why, not just what.
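The logging and tracing rules in Phase 4.6 ask for structured log lines that always carry a correlation ID. A minimal sketch of such a line, assuming a JSON shape with `ts`, `level`, `component`, `correlation_id`, and `message` fields: the field names, the `log_json` helper, and the sample values are assumptions, since the diff does not prescribe a schema.

```shell
#!/bin/sh
# Hypothetical helper (not part of warp-os): emit one JSON log line.
#   $1 = level (ERROR, WARN, INFO, DEBUG)
#   $2 = component name
#   $3 = correlation ID shared by every line in the same request
#   $4 = message
log_json() {
  level=$1; component=$2; correlation_id=$3; message=$4
  # Escape backslashes and double quotes so the message stays valid JSON.
  # Newlines and control characters are not handled in this sketch.
  escaped=$(printf '%s' "$message" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"ts":"%s","level":"%s","component":"%s","correlation_id":"%s","message":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$component" "$correlation_id" "$escaped"
}

# Invented component and request ID.
log_json INFO flight-sync req-7f3a "sync started"
log_json WARN flight-sync req-7f3a "upstream slow, serving cached value"
```

Because both lines share `req-7f3a`, filtering logs on that ID reconstructs the request path, which is the property the tracing rule is after.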
@@ -570,6 +647,39 @@ Categories that almost always contain significant decisions:
570
647
 
571
648
  **Goal:** Write the architecture artifact that design, spec, and build all depend on.
572
649
 
650
+ ### 6A. Unresolved Decision Tracking
651
+
652
+ Before writing, review all AskUserQuestion interactions from Phases 0-5. List any decisions the user did not fully answer, deferred, or gave ambiguous responses to:
653
+
654
+ ```
655
+ UNRESOLVED DECISIONS:
656
+ - [decision description] — deferred because: [reason] — revisit when: [trigger]
657
+ ```
658
+
659
+ Include these in architecture.md under a "## Unresolved Decisions" section. These are not failures — they are explicitly tracked unknowns. Downstream skills (design, build) must check this section and either resolve the decision when they have more context or carry it forward.
660
+
661
+ ### 6B. Worktree Parallelization Strategy (Optional)
662
+
663
+ If the architecture has >3 independent components that could be built concurrently (no shared data models, no blocking dependencies), produce a parallelization strategy. This enables the build phase to use git worktrees for concurrent implementation:
664
+
665
+ ```
666
+ PARALLELIZATION STRATEGY:
667
+ ┌──────────────┬──────────────────┬──────────────┬───────────────┐
668
+ │ Lane │ Components │ Dependencies │ Can Start After│
669
+ ├──────────────┼──────────────────┼──────────────┼───────────────┤
670
+ │ Lane A │ [component list] │ none │ immediately │
671
+ │ Lane B │ [component list] │ Lane A types │ Lane A types │
672
+ └──────────────┴──────────────────┴──────────────┴───────────────┘
673
+ ```
674
+
675
+ Rules:
676
+ - A lane is a set of components that can be built independently by a separate agent in a worktree.
677
+ - Lane dependencies must be explicit: "Lane B needs the type definitions from Lane A" — not "Lane B needs Lane A to be done."
678
+ - Shared types/interfaces should be in their own lane (often Lane A) so other lanes can start as soon as types are defined.
679
+ - If no meaningful parallelization exists (everything depends on everything else), skip this section.
680
+
681
+ ### 6C. Completeness Gate
682
+
573
683
  Run a completeness gate before writing:
574
684
 
575
685
  1. Every component in scope has a defined boundary and responsibility
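The lane table in section 6B is a dependency graph, so a valid start order can be derived with a topological sort. The sketch below leans on the standard `tsort` utility and assumes dependencies are written as "prerequisite dependent" pairs. The lane names and the `lane_start_order` wrapper are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of warp-os): read "prerequisite dependent"
# pairs on stdin and print lanes in an order that respects them.
lane_start_order() {
  tsort
}

# Invented lanes: shared types first, then two lanes that only need the types.
# A lane with no dependencies at all can be listed as a pair with itself.
printf '%s\n' \
  'lane-a-types lane-b-api' \
  'lane-a-types lane-c-ui' \
  'lane-d-docs lane-d-docs' \
  | lane_start_order
```

Lanes that come out with no ordering between them (here the API and UI lanes) are the candidates for separate worktrees. If the input contains a cycle, GNU `tsort` reports a loop and exits non-zero, which corresponds to the "everything depends on everything else" case where the section should be skipped.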
@@ -619,6 +729,15 @@ Create `.warp/reports/planning/architecture.md`:
619
729
  ## Technical Decisions
620
730
  {Each decision with context, options, choice, rationale, reversibility}
621
731
 
732
+ ## Observability
733
+ {Per component: logging, metrics, tracing, alerting, debuggability, admin tooling}
734
+
735
+ ## Unresolved Decisions
736
+ {Decisions deferred or unanswered during architecture — description, reason, revisit trigger}
737
+
738
+ ## Parallelization Strategy
739
+ {If applicable: lanes, components per lane, dependencies, start conditions}
740
+
622
741
  ## Open Questions for Design
623
742
  {Unresolved questions that the design phase must answer}
624
743
 
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
119
119
 
120
120
  ## AskUserQuestion
121
121
 
122
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
123
+
122
124
  **Contract:**
123
125
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
124
126
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
140
142
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
141
143
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
142
144
 
145
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
146
+ - ☐ Completeness scores in every option label
147
+ - ☐ Recommended option listed first
148
+ - ☐ One decision per question (split if multiple)
149
+ - ☐ Analysis/reasoning already presented in message text above
150
+
143
151
  **Formatting:**
144
152
  - *Italics* for emphasis, not **bold** (bold for headers only).
145
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
153
+ - After each answer: `✔ Decision {N} recorded`
146
154
  - Previews under 8 lines. Full mockups go in conversation text before the question.
147
155
 
148
156
  ---
@@ -299,7 +307,15 @@ Product stage determines which forcing questions to ask (see Phase 2).
299
307
  ```
300
308
  If prior brainstorms exist, surface them: "Prior brainstorm found at `.warp/reports/planning/brainstorm.md`. Want to build on it or start fresh?"
301
309
 
302
- 4. Output: "Here's what I understand about this project and the problem area you want to explore: [1-2 paragraph synthesis]." State what is known and what is unclear.
310
+ 4. **Related Design Discovery:** Search `.warp/reports/` for any existing artifacts with keyword overlap to the stated problem:
311
+ ```bash
312
+ grep -rl "[key terms from user's problem description]" .warp/reports/ 2>/dev/null
313
+ ```
314
+ If related work exists, present it via AskUserQuestion: "Found existing artifacts that may be relevant: [list with brief description of each]. Want to build on these or start fresh?"
315
+
316
+ This prevents duplicate brainstorming and surfaces prior thinking the user may have forgotten about. Skip this step only if `.warp/reports/` does not exist.
317
+
318
+ 5. Output: "Here's what I understand about this project and the problem area you want to explore: [1-2 paragraph synthesis]." State what is known and what is unclear.
303
319
 
304
320
  ---
305
321
 
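The Related Design Discovery step added above uses a single `grep -rl` with a placeholder for the key terms. One way to run it over several terms is sketched below, assuming fixed-string, case-insensitive matching is acceptable. `find_related_reports` and the sample terms are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of warp-os): list report files that mention
# any of the given key terms.
#   $1      = reports directory (for example .warp/reports)
#   $2 ...  = key terms taken from the user's problem description
find_related_reports() {
  reports_dir=$1
  shift
  # The step is skipped when the directory does not exist.
  [ -d "$reports_dir" ] || return 0
  for term in "$@"; do
    # -F: literal match, -i: ignore case, -l: file names only.
    grep -rliF -- "$term" "$reports_dir" 2>/dev/null || true
  done | sort -u
}

# Invented key terms.
find_related_reports .warp/reports "offline sync" "logbook"
```

The de-duplicated file list is what would be summarized in the AskUserQuestion prompt. An empty result means there is nothing to surface.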
@@ -434,6 +450,45 @@ If the framing is imprecise, reframe constructively: "Let me try restating what
434
450
 
435
451
  ---
436
452
 
453
+ ## PHASE 2.5: Landscape Awareness (Both Modes)
454
+
455
+ **Goal:** Ground the brainstorm in what actually exists in the market. Conventional wisdom is often wrong — search for evidence before synthesizing.
456
+
457
+ ### Privacy Gate
458
+
459
+ Before any web search, ask via AskUserQuestion:
460
+
461
+ > "I'd like to search for [specific query] to understand the competitive landscape. This will send a web search query. OK to proceed? [Y/n]"
462
+
463
+ Only proceed with the search if the user approves. If declined, skip to Phase 3 and note in the brainstorm artifact that landscape analysis was skipped by user choice.
464
+
465
+ ### Search Strategy
466
+
467
+ Search for:
468
+ - `[product category] + "alternatives"` — what direct competitors exist
469
+ - `[problem domain] + "solutions"` — how people solve this problem today
470
+ - `[user type] + [pain point]` — how the target user describes their problem
471
+
472
+ Use WebSearch for each approved query. Limit to 2-3 searches to avoid over-researching.
473
+
474
+ ### Landscape Synthesis
475
+
476
+ Produce:
477
+
478
+ ```
479
+ LANDSCAPE SYNTHESIS:
480
+ Conventional wisdom: [what everyone assumes the solution looks like]
481
+ Search findings: [what competitors actually do — 3-5 with URLs]
482
+ First-principles view: [what the data suggests that contradicts conventional wisdom]
483
+ Eureka moments: [any insight that changes the problem framing]
484
+ ```
485
+
486
+ **Integration:** Feed the landscape synthesis directly into Phase 3 (User Needs Mapping) and Phase 6 (Synthesis). The "Key Insight" in Phase 6 should reference landscape findings when they reveal a non-obvious opportunity.
487
+
488
+ **Smart-skip:** If the user already provided detailed competitive analysis or the product is in a space with no direct competitors (novel research, internal tool), skip the search but still produce the synthesis from what the user shared.
489
+
490
+ ---
491
+
437
492
  ## PHASE 2B: Builder Mode Questions
438
493
 
439
494
  Use this phase when the user is building for fun, learning, hacking, at a hackathon, or doing research.
@@ -663,6 +718,33 @@ Present via AskUserQuestion. Do NOT proceed without user approval.
663
718
 
664
719
  **Goal:** Write the output artifact with all session findings.
665
720
 
721
+ ### Design Doc Lineage
722
+
723
+ Before writing, check if a previous `brainstorm.md` exists:
724
+
725
+ ```bash
726
+ if [ -f .warp/reports/planning/brainstorm.md ]; then
727
+ # Get the date from the existing file's pipeline header
728
+ existing_date=$(grep -oP '\d{4}-\d{2}-\d{2}' .warp/reports/planning/brainstorm.md | head -1)
729
+ # Count existing versions in archive
730
+ mkdir -p .warp/reports/planning/archive
731
+ version=$(ls .warp/reports/planning/archive/brainstorm-v*.md 2>/dev/null | wc -l)
732
+ next_version=$((version + 1))
733
+ # Archive the previous version
734
+ cp .warp/reports/planning/brainstorm.md ".warp/reports/planning/archive/brainstorm-v${next_version}.md"
735
+ fi
736
+ ```
737
+
738
+ If a previous version was archived, add a `Supersedes:` comment at the top of the new brainstorm.md:
739
+
740
+ ```markdown
741
+ <!-- Supersedes: brainstorm-v[N].md ([date]) — [brief reason for new version, e.g., "scope expanded after competitive analysis"] -->
742
+ ```
743
+
744
+ This creates an audit trail of how the product thinking evolved. The archive is in `.warp/reports/planning/archive/` and is never deleted.
745
+
746
+ ### Write the Artifact
747
+
666
748
  Create `.warp/reports/planning/brainstorm.md`:
667
749
 
668
750
  ```markdown
@@ -703,12 +785,21 @@ Create `.warp/reports/planning/brainstorm.md`:
703
785
  ### Approach C: {name — if applicable}
704
786
  {from Phase 7}
705
787
 
788
+ ## Landscape Analysis
789
+ {from Phase 2.5 — conventional wisdom, search findings, first-principles view, eureka moments}
790
+ {If skipped by user choice, note: "Landscape analysis skipped at user request."}
791
+
706
792
  ## Recommended Direction
707
793
  {from Phase 6 synthesis}
708
794
 
709
795
  ## What to Build First
710
796
  {the narrowest wedge}
711
797
 
798
+ ## Distribution Plan
799
+ How users get this: {app store / npm / direct download / SaaS / browser extension / etc.}
800
+ CI/CD pipeline needed: {yes — describe / no — manual / existing pipeline covers it}
801
+ Update mechanism: {auto-update / manual / package manager / N/A for SaaS}
802
+
712
803
  ## Open Questions
713
804
  {unresolved uncertainties that the next skill should address}
714
805
 
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
119
119
 
120
120
  ## AskUserQuestion
121
121
 
122
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
123
+
122
124
  **Contract:**
123
125
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
124
126
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
140
142
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
141
143
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
142
144
 
145
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
146
+ - ☐ Completeness scores in every option label
147
+ - ☐ Recommended option listed first
148
+ - ☐ One decision per question (split if multiple)
149
+ - ☐ Analysis/reasoning already presented in message text above
150
+
143
151
  **Formatting:**
144
152
  - *Italics* for emphasis, not **bold** (bold for headers only).
145
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
153
+ - After each answer: `✔ Decision {N} recorded`
146
154
  - Previews under 8 lines. Full mockups go in conversation text before the question.
147
155
 
148
156
  ---
@@ -420,7 +428,44 @@ SCREEN: [name]
420
428
  - Never use technical language in user-facing copy: "Something went wrong" not "Error 500"
421
429
  - Loading copy uses present participle with ellipsis character: "Loading flights..." uses `…` not `...`
422
430
 
423
- ### 2E. Accessibility Strategy
431
+ ### 2E. Interaction State Coverage
432
+
433
+ For every feature or screen identified in the user flows, map which interaction states have been designed. This ensures no screen ships with only the happy path.
434
+
435
+ ```
436
+ INTERACTION STATE COVERAGE:
437
+ ┌──────────────┬─────────┬───────┬───────┬─────────┬─────────┐
438
+ │ Feature │ Loading │ Empty │ Error │ Success │ Partial │
439
+ ├──────────────┼─────────┼───────┼───────┼─────────┼─────────┤
440
+ │ [feature] │ ✓/✗ │ ✓/✗ │ ✓/✗ │ ✓/✗ │ ✓/✗ │
441
+ └──────────────┴─────────┴───────┴───────┴─────────┴─────────┘
442
+ Every ✗ must be designed before the design phase completes.
443
+ ```
444
+
445
+ Produce this table for every screen from Phase 2A. If a state does not apply to a given feature (e.g., a static "About" page has no loading state), mark it N/A with a brief reason. All other gaps are design debt that must be resolved before Phase 3.
446
+
447
+ ### 2F. User Journey & Emotional Arc
448
+
449
+ For the primary user flow (and any secondary flow that involves emotional stakes), produce a storyboard mapping what the user does, what they feel, and whether the design addresses that feeling.
450
+
451
+ ```
452
+ USER JOURNEY:
453
+ ┌──────┬──────────────────┬──────────────────┬───────────────────┐
454
+ │ Step │ User Does │ User Feels │ Design Specifies? │
455
+ ├──────┼──────────────────┼──────────────────┼───────────────────┤
456
+ │ 1 │ [action] │ [emotion] │ [yes/no + what] │
457
+ │ 2 │ [action] │ [emotion] │ [yes/no + what] │
458
+ └──────┴──────────────────┴──────────────────┴───────────────────┘
459
+ Any step where "Design Specifies?" = no is a gap to fill.
460
+ ```
461
+
462
+ Rules:
463
+ - Include negative emotions (confusion, anxiety, frustration) — these are where design matters most
464
+ - The "Design Specifies?" column must reference a concrete design element: a loading skeleton, a success animation, an error message with recovery instructions, a reassuring empty state
465
+ - If a step has no design specification, create one before proceeding to Phase 3
466
+ - For Builder mode projects, the emotional arc should include at least one "delight" moment where the user says "whoa"
467
+
468
+ ### 2G. Accessibility Strategy
424
469
 
425
470
  Define accessibility requirements at the strategy level:
426
471
 
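The interaction state coverage table from section 2E can be scanned for gaps once it is in a machine-readable form. The sketch below assumes one pipe-delimited row per feature, with the columns in the table's order and `✓`, `✗`, or `N/A` in each state cell. The row format, `coverage_gaps`, and the sample features are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of warp-os): report every ✗ in rows shaped as
#   feature|loading|empty|error|success|partial
# Exits non-zero when at least one state is undesigned.
coverage_gaps() {
  awk -F'|' '
    BEGIN { split("loading empty error success partial", state, " ") }
    {
      for (i = 2; i <= 6; i++)
        if ($i == "✗") {
          printf "%s: %s state not designed\n", $1, state[i - 1]
          gaps = 1
        }
    }
    END { exit gaps + 0 }
  '
}

# Invented sample rows; N/A cells are treated as resolved.
printf '%s\n' \
  'Flight list|✓|✓|✗|✓|N/A' \
  'About page|N/A|N/A|N/A|✓|N/A' \
  | coverage_gaps || echo "design debt remains before Phase 3"
```

This only counts the marks. The brief reason the section asks for alongside each N/A would still live in the design document itself.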
@@ -447,7 +492,7 @@ REDUCED MOTION:
447
492
  [how animations degrade — typically to opacity fade only]
448
493
  ```
449
494
 
450
- ### 2F. Figma Setup (if available)
495
+ ### 2H. Figma Setup (if available)
451
496
 
452
497
  If Figma MCP is configured (check `.warp/warp-tools.json` → `mcp_servers.figma.status`):
453
498
 
@@ -728,7 +773,55 @@ ANTI-SLOP VERIFICATION:
728
773
 
729
774
  If ANY item fails the slop scan, go back and fix it before proceeding.
730
775
 
731
- **HARD GATE: Visual System complete. Present color palette, typography, spacing, and key component wireframes to user for approval before proceeding to Implementation.**
776
+ **HARD GATE: Visual System complete. Present color palette, typography, spacing, and key component wireframes to user for approval before proceeding to Design Rating.**
777
+
778
+ ---
779
+
780
+ ## PHASE 3.5: Design Dimension Rating
781
+
782
+ **Goal:** Rate the design across seven critical dimensions, identify gaps, and fix them before implementation begins. This is the quality gate that prevents mediocre designs from reaching the build phase.
783
+
784
+ ### Rating Method
785
+
786
+ Score each dimension 0-10. For each, describe what a perfect 10 looks like for THIS specific product. Be concrete — "good typography" is not a 10 description; "type scale with 5 levels, mathematical 1.25 ratio, monospace for all data, system fonts for performance, tested at 200% zoom" is.
787
+
788
+ ```
789
+ DESIGN DIMENSION RATING:
790
+ ┌────────────────────────────┬───────┬──────────────────────────────────┐
791
+ │ Dimension │ Score │ What 10 looks like │
792
+ ├────────────────────────────┼───────┼──────────────────────────────────┤
793
+ │ Information Architecture │ [0-10]│ [specific description] │
794
+ │ Interaction State Coverage │ [0-10]│ [specific description] │
795
+ │ User Journey & Emotional Arc│ [0-10]│ [specific description] │
796
+ │ Design System Alignment │ [0-10]│ [specific description] │
797
+ │ Responsive & Accessibility │ [0-10]│ [specific description] │
798
+ │ Content Strategy │ [0-10]│ [specific description] │
799
+ │ Delight & Differentiation │ [0-10]│ [specific description] │
800
+ └────────────────────────────┴───────┴──────────────────────────────────┘
801
+ ```
802
+
803
+ ### Fix Loop
804
+
805
+ For any dimension scoring below 8:
806
+
807
+ 1. **Explain the gap** — what specifically is missing or weak, with concrete examples from the current design
808
+ 2. **Propose a fix** — what specific change would close the gap
809
+ 3. **Apply the fix** — update the relevant Phase 2 or Phase 3 output
810
+ 4. **Re-rate** — score the dimension again after the fix
811
+
812
+ Loop until all dimensions score 8 or higher, or the user says "move on."
813
+
814
+ ### Dimension Definitions
815
+
816
+ - **Information Architecture:** Is every screen's priority hierarchy clear? Can a user answer their primary question in under 1 second? Is navigation depth ≤ 3?
817
+ - **Interaction State Coverage:** Does every feature have loading, empty, error, success, and partial states designed? (Cross-reference with the Phase 2E table.)
818
+ - **User Journey & Emotional Arc:** Does every step in the primary flow have a designed emotional response? Are negative emotions (anxiety, confusion) explicitly addressed?
819
+ - **Design System Alignment:** Are all colors, typography, spacing, and components using tokens? Zero raw values? Consistent across every screen?
820
+ - **Responsive & Accessibility:** WCAG AA verified for all pairs? Touch targets ≥ 44px? Dynamic type tested? Reduced motion specified? Platform conventions honored?
821
+ - **Content Strategy:** Real copy on every screen, every state? Buttons are verb + object? Error messages include recovery? Empty states suggest next action?
822
+ - **Delight & Differentiation:** Would a human designer guess "AI made this"? Does the design have at least one moment that makes the user say "whoa"? Are all three anti-slop commitments honored?
823
+
824
+ **HARD GATE: All dimensions must score ≥ 8, or the user must explicitly approve moving forward with lower scores. Present the rating table to user for approval before proceeding to Implementation.**
732
825
 
733
826
  ---
734
827
 
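The Phase 3.5 hard gate (every dimension at 8 or higher) reduces to a threshold check over the rating table. A minimal sketch, assuming scores are kept as `score|dimension` rows: that format and the `dimensions_below_gate` name are hypothetical, and the user's explicit override described in the gate is deliberately left outside the script.

```shell
#!/bin/sh
# Hypothetical helper (not part of warp-os): list dimensions scoring below
# the gate of 8. Input rows are "score|dimension".
# Exits non-zero when any dimension still needs the fix loop.
dimensions_below_gate() {
  awk -F'|' -v gate=8 '
    $1 + 0 < gate { printf "%s: %s/10\n", $2, $1; failing = 1 }
    END { exit failing + 0 }
  '
}

# Invented scores for three of the seven dimensions.
printf '%s\n' \
  '9|Information Architecture' \
  '6|Content Strategy' \
  '8|Design System Alignment' \
  | dimensions_below_gate || echo "run the fix loop, then re-rate"
```

Re-running the check after each fix mirrors the re-rate step of the fix loop.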
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
119
119
 
120
120
  ## AskUserQuestion
121
121
 
122
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
123
+
122
124
  **Contract:**
123
125
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
124
126
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
140
142
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
141
143
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
142
144
 
145
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
146
+ - ☐ Completeness scores in every option label
147
+ - ☐ Recommended option listed first
148
+ - ☐ One decision per question (split if multiple)
149
+ - ☐ Analysis/reasoning already presented in message text above
150
+
143
151
  **Formatting:**
144
152
  - *Italics* for emphasis, not **bold** (bold for headers only).
145
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
153
+ - After each answer: `✔ Decision {N} recorded`
146
154
  - Previews under 8 lines. Full mockups go in conversation text before the question.
147
155
 
148
156
  ---
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
119
119
 
120
120
  ## AskUserQuestion
121
121
 
122
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
123
+
122
124
  **Contract:**
123
125
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
124
126
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
 
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
+ - ☐ Completeness scores in every option label
+ - ☐ Recommended option listed first
+ - ☐ One decision per question (split if multiple)
+ - ☐ Analysis/reasoning already presented in message text above
+
  **Formatting:**
  - *Italics* for emphasis, not **bold** (bold for headers only).
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
+ - After each answer: `✔ Decision {N} recorded`
  - Previews under 8 lines. Full mockups go in conversation text before the question.
 
  ---
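The pre-call checklist added in the hunk above lends itself to a mechanical check. The sketch below assumes a hypothetical question shape (a list of option dicts with a `label` key and an optional `recommended` flag); that shape is an assumption for illustration, not the schema of the AskUserQuestion tool. The one-decision check is only a rough heuristic that counts question marks.

```python
import re

# Label must end with " <dash U+2014> X/10 <marker>", X in 1..10.
LABEL_RE = re.compile(r" \u2014 (?:10|[1-9])/10 [🟢🟡🔴]$")


def precall_problems(analysis_text, question, options):
    """Return a list of checklist items that the pending question fails."""
    problems = []
    # Completeness scores in every option label.
    if not all(LABEL_RE.search(opt["label"]) for opt in options):
        problems.append("missing completeness score in an option label")
    # Recommended option listed first ("recommended" is a hypothetical flag).
    if options and not options[0].get("recommended", False):
        problems.append("recommended option is not listed first")
    # One decision per question (heuristic: at most one question mark).
    if question.count("?") > 1:
        problems.append("more than one decision in the question")
    # Analysis/reasoning already presented in the message text above.
    if not analysis_text.strip():
        problems.append("no analysis presented before the question")
    return problems


if __name__ == "__main__":
    options = [
        {"label": "Normalized tables \u2014 9/10 🟢", "recommended": True},
        {"label": "Single table \u2014 6/10 🟡"},
    ]
    print(precall_problems("Trade-offs discussed.", "Which storage approach?", options))
```

An empty result means all four boxes are ticked; anything else names the item to fix before the tool call.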
@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 
  ## AskUserQuestion
 
+ **Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
+
  **Contract:**
  1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
  2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
  Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
  Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
 
+ **Pre-call checklist (verify before every AskUserQuestion invocation):**
+ - ☐ Completeness scores in every option label
+ - ☐ Recommended option listed first
+ - ☐ One decision per question (split if multiple)
+ - ☐ Analysis/reasoning already presented in message text above
+
  **Formatting:**
  - *Italics* for emphasis, not **bold** (bold for headers only).
- - After each answer: `✔ Decision {N} recorded [quicksave updated]`
+ - After each answer: `✔ Decision {N} recorded`
  - Previews under 8 lines. Full mockups go in conversation text before the question.
 
  ---
@@ -328,6 +336,25 @@ LEVERAGE INVENTORY:
 
  **[SYSTEM scale only]:** If this is a greenfield project with no existing code, note that and proceed.
 
+ ### Taste Calibration
+
+ Before scoping new work, calibrate to the project's actual quality bar. Identify 2-3 well-designed files in the existing codebase and 1-2 anti-patterns. This ensures the scope targets the right level of quality — not aspirational, not lowest-common-denominator, but calibrated to the project's own best work.
+
+ ```
+ TASTE CALIBRATION:
+ Well-designed (match this quality):
+ - [file path] — [why it's good: clear names, good structure, etc.]
+ - [file path] — [why it's good]
+ Anti-patterns (avoid this):
+ - [file path] — [what's wrong: unclear, coupled, etc.]
+ ```
+
+ Rules:
+ - "Well-designed" means: clear naming, clean separation of concerns, readable without comments, testable in isolation. Not "clever" — clear.
+ - "Anti-pattern" means: unclear responsibility, tight coupling, implicit state, hard to test, confusing to a new reader.
+ - If the project is greenfield with no existing code, skip this section.
+ - The taste calibration informs the scope: new work should match the quality of the best existing work, not the worst. If the scope would require work at a quality level below the project's best, flag it.
+
  ---
 
  ## PHASE 2: Dream State Mapping
@@ -566,6 +593,41 @@ RISK: [name]
 
  ---
 
+ ## PHASE 6.5: Implementation Alternatives (Mandatory)
+
+ **Goal:** Before committing to a single approach, require 2-3 distinct implementation strategies with explicit trade-offs. This prevents tunnel vision and gives the architect phase real options instead of a predetermined path.
+
+ For the scoped work, produce:
+
+ ```
+ IMPLEMENTATION ALTERNATIVES:
+ A) [approach name]
+ Effort: [low/medium/high] Risk: [low/medium/high]
+ Pros: [list] Cons: [list]
+ B) [approach name]
+ Effort: [low/medium/high] Risk: [low/medium/high]
+ Pros: [list] Cons: [list]
+ C) [approach name] (if applicable)
+ Effort: [low/medium/high] Risk: [low/medium/high]
+ Pros: [list] Cons: [list]
+ ```
+
+ Rules:
+ - **Minimum two alternatives.** If you can only think of one way to build this, you have not thought hard enough. Even "build from scratch" vs. "use existing library" vs. "fork and customize" counts.
+ - **Alternatives must be genuinely different.** Not "React" vs. "React with different state management." Different means different architecture, different trade-offs, different failure modes. Examples: monolith vs. services, server-rendered vs. SPA, build vs. buy, single-table vs. normalized.
+ - **Effort and risk must be calibrated to this team.** "Low effort" for a team with React experience is different from "low effort" for a team learning React. Use the taste calibration and leverage inventory to ground estimates.
+ - **Do not pre-decide.** Present alternatives neutrally. The user (or the architect phase) chooses. If you have a strong recommendation, state it separately after the alternatives — not embedded in the pros/cons.
+ - **Include the "boring" option.** One alternative should always be the simplest, most conventional approach. If the boring option has no real downsides, it is probably the right choice.
+
+ Present via AskUserQuestion. User may select one, ask for more detail, or request additional alternatives. Record the selected approach (or "deferred to architect") in scope.md.
+
+ **Mode effects:**
+ - **Expansion / Selective Expansion:** Full analysis with 3 alternatives minimum.
+ - **Hold Scope:** 2 alternatives minimum — the current approach and one meaningful variation.
+ - **Reduction:** 2 alternatives — the minimum viable approach and the slightly-less-minimum approach. Focus on what can be cut from the implementation, not just from the feature list.
+
+ ---
+
  ## PHASE 7: Write scope.md
 
  **Goal:** Write the scope artifact that architect, design, spec, and build all depend on.
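The Phase 6.5 template and its mode effects, added in the hunk above, can be modeled as a small data structure. The following is a minimal sketch under stated assumptions: the names `Alternative`, `MIN_ALTERNATIVES`, `enough_alternatives`, and `render` are hypothetical and do not come from the package, and the Reduction mode's "2 alternatives" is treated here as a minimum.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase

LEVELS = ("low", "medium", "high")

# Minimum number of alternatives per mode, read off the "Mode effects" list.
MIN_ALTERNATIVES = {
    "expansion": 3,
    "selective expansion": 3,
    "hold scope": 2,
    "reduction": 2,
}


@dataclass
class Alternative:
    name: str
    effort: str
    risk: str
    pros: list = field(default_factory=list)
    cons: list = field(default_factory=list)

    def __post_init__(self):
        # Effort and risk are restricted to the three levels in the template.
        for level in (self.effort, self.risk):
            if level not in LEVELS:
                raise ValueError(f"expected one of {LEVELS}, got {level!r}")


def enough_alternatives(mode, alternatives):
    """Check the per-mode minimum; raises KeyError for an unknown mode."""
    return len(alternatives) >= MIN_ALTERNATIVES[mode.lower()]


def render(alternatives):
    """Render the alternatives in the layout of the template block."""
    lines = ["IMPLEMENTATION ALTERNATIVES:"]
    for letter, alt in zip(ascii_uppercase, alternatives):
        lines.append(f"{letter}) {alt.name}")
        lines.append(f"Effort: {alt.effort} Risk: {alt.risk}")
        lines.append(f"Pros: {', '.join(alt.pros)} Cons: {', '.join(alt.cons)}")
    return "\n".join(lines)


if __name__ == "__main__":
    alts = [
        Alternative("Use existing library", "low", "low", ["fast"], ["less control"]),
        Alternative("Build from scratch", "high", "medium", ["full control"], ["slow"]),
    ]
    print(render(alts))
    print(enough_alternatives("Hold Scope", alts))
```

The count check covers only the mechanical part of the rules. Whether two alternatives are genuinely different, or whether one of them is the "boring" option, still needs a human or agent judgment.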
@@ -625,6 +687,10 @@ Create `.warp/reports/planning/scope.md`:
 
  {Top risks from Phase 6}
 
+ ## Implementation Alternatives
+
+ {2-3 approaches with effort/risk/pros/cons from Phase 6.5. Selected approach marked.}
+
  ## Temporal Decisions
 
  {What must be decided now vs. deferred}