npm - warp-os - Versions diffs - 1.1.2 → 1.2.1 - Mend

warp-os 1.1.2 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/CHANGELOG.md +85 -0
package/README.md +6 -4
package/VERSION +1 -1
package/agents/warp-annotate.md +394 -0
package/agents/warp-browse.md +9 -1
package/agents/warp-build-code.md +9 -1
package/agents/warp-orchestrator.md +10 -1
package/agents/warp-plan-architect.md +120 -1
package/agents/warp-plan-brainstorm.md +93 -2
package/agents/warp-plan-design.md +97 -4
package/agents/warp-plan-onboarding.md +9 -1
package/agents/warp-plan-optimize.md +9 -1
package/agents/warp-plan-scope.md +67 -1
package/agents/warp-plan-security.md +576 -35
package/agents/warp-plan-testdesign.md +9 -1
package/agents/warp-qa-debug.md +117 -1
package/agents/warp-qa-test.md +167 -1
package/agents/warp-release-update.md +290 -4
package/agents/warp-setup.md +9 -1
package/agents/warp-upgrade.md +21 -4
package/bin/hooks/CLAUDE.md +24 -0
package/bin/hooks/_warp_json.sh +4 -2
package/bin/hooks/identity-briefing.sh +20 -13
package/bin/hooks/validate-askuser.sh +41 -0
package/bin/migrate-sessions.js +284 -173
package/dist/warp-annotate/SKILL.md +404 -0
package/dist/warp-browse/SKILL.md +9 -1
package/dist/warp-build-code/SKILL.md +9 -1
package/dist/warp-orchestrator/SKILL.md +10 -1
package/dist/warp-plan-architect/SKILL.md +120 -1
package/dist/warp-plan-brainstorm/SKILL.md +93 -2
package/dist/warp-plan-design/SKILL.md +97 -4
package/dist/warp-plan-onboarding/SKILL.md +9 -1
package/dist/warp-plan-optimize/SKILL.md +9 -1
package/dist/warp-plan-scope/SKILL.md +67 -1
package/dist/warp-plan-security/SKILL.md +578 -35
package/dist/warp-plan-testdesign/SKILL.md +9 -1
package/dist/warp-qa-debug/SKILL.md +117 -1
package/dist/warp-qa-test/SKILL.md +167 -1
package/dist/warp-release-update/SKILL.md +290 -4
package/dist/warp-setup/SKILL.md +9 -1
package/dist/warp-upgrade/SKILL.md +21 -4
package/package.json +2 -2
package/shared/project-hooks.json +7 -0
package/shared/tier1-engineering-constitution.md +9 -1

package/agents/warp-plan-architect.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -243,6 +251,28 @@ Internalize these cognitive patterns. They fire simultaneously on every input yo
 ---
+## PHASE 0: Scope Challenge
+**Goal:** Before starting architecture, challenge whether the scope is right-sized. Architecture amplifies scope — if the scope is too large, the architecture will be too large. Five minutes here saves five days in build.
+Read `.warp/reports/planning/scope.md` and the codebase. Then produce:
+```
+SCOPE CHALLENGE:
+  Existing code that solves sub-problems: [search codebase for partial solutions]
+  Minimum change set: [what is the smallest change that delivers the scope?]
+  Complexity smell: [>8 files or >2 new classes/services = smell — justify or simplify]
+  Built-in alternatives: [does the framework/language have this built in?]
+  TODOS cross-reference: [does TODOS.md already track related work?]
+  Completeness check: [AI compresses cost — always prefer the complete version]
+```
+If the scope challenge reveals the work is simpler than planned — existing code already solves sub-problems, the framework has built-in support, or the minimum change set is smaller than expected — surface it via AskUserQuestion and suggest scope reduction before proceeding. Do not architect a system larger than the problem requires.
+If the scope holds, proceed to Phase 1.
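
The "Complexity smell" line in the SCOPE CHALLENGE template can be checked mechanically. A minimal sketch, assuming the two counts have already been gathered by hand or by another tool; the function name and output strings are hypothetical and not part of warp-os:

```bash
#!/usr/bin/env bash
# Hypothetical helper: flags the Phase 0 "complexity smell".
# Thresholds (>8 files, >2 new classes/services) come from the template above;
# the function name and the output strings are assumptions for illustration.
complexity_smell() {
  local files_touched="$1" new_units="$2"
  if [ "$files_touched" -gt 8 ] || [ "$new_units" -gt 2 ]; then
    echo "smell: justify or simplify"
  else
    echo "ok"
  fi
}

complexity_smell 9 1   # prints "smell: justify or simplify"
complexity_smell 8 2   # prints "ok"
```

Both thresholds are strict, so exactly 8 files and 2 new classes still pass.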
+---
 ## PHASE 1: System Audit
 **Goal:** Understand what exists before designing what to build. New architecture on top of unexamined existing architecture produces collisions.
@@ -366,6 +396,25 @@ OPERATION: [name, e.g., "fetch pilot's active flight"]
 [SYSTEM scale] Produce this for every primary operation. [MODULE scale] Produce this for the 3 most complex operations. [FEATURE scale] Produce this for the primary operation only.
+### 2D. Error & Rescue Map
+For each major operation documented in 2C, produce a rescue map that pairs every failure with a specific recovery action and user-visible outcome. This complements the four-path data flow by mapping the operational response plan:
+```
+ERROR & RESCUE MAP:
+  ┌──────────────────┬─────────────────┬──────────────────┬────────────────────┐
+  │ Method/Codepath   │ What Can Fail    │ Rescue Action     │ User-Visible Result │
+  ├──────────────────┼─────────────────┼──────────────────┼────────────────────┤
+  │ [specific method] │ [specific error] │ [specific action]  │ [specific outcome]  │
+  └──────────────────┴─────────────────┴──────────────────┴────────────────────┘
+```
+Rules:
+- Every row must name a specific method or codepath, not a vague category.
+- "Rescue Action" must be actionable: retry with backoff, return cached value, degrade gracefully, alert on-call. Never "handle error."
+- "User-Visible Result" must describe exactly what the user sees or experiences. Never "an error message."
+- If a method can fail in multiple ways, each failure gets its own row.
 ---
 ## PHASE 3: API Design
@@ -533,6 +582,34 @@ BOUNDARY: [Component A] → [Component B]
 ---
+## PHASE 4.6: Observability & Debuggability Review
+**Goal:** Verify that every major component can be diagnosed in production without attaching a debugger. Systems without observability are systems that fail silently and stay broken longer.
+For each major component defined in Phase 2, produce:
+```
+OBSERVABILITY:
+  Logging: [what is logged? structured? levels correct?]
+  Metrics: [key metrics exposed? latency, error rate, throughput?]
+  Tracing: [distributed tracing support? correlation IDs?]
+  Alerting: [what triggers alerts? who gets paged?]
+  Debuggability: [can you diagnose issues from logs alone?]
+  Admin tooling: [any admin endpoints or tools needed?]
+```
+Rules:
+- **Logging:** Every component must log in a structured format (JSON or equivalent). Log levels must be correct: ERROR for things that break, WARN for things that degrade, INFO for state transitions, DEBUG for investigation. If a component has no logging plan, flag it.
+- **Metrics:** At minimum, every component that handles requests must expose latency (p50/p95/p99), error rate, and throughput. Components that manage queues must expose queue depth and processing lag.
+- **Tracing:** If the system has more than two components in a request path, correlation IDs are required. Every log line in a request must include the correlation ID so the full path can be reconstructed.
+- **Alerting:** Every failure mode from Phase 4 must have a corresponding alert or explicit justification for why it does not need one. "We will notice" is not an alerting strategy.
+- **Debuggability:** The litmus test: can an engineer who did not build this component diagnose a production issue using only logs, metrics, and traces — without reading the source code? If not, the observability is insufficient.
+- **Admin tooling:** If the system requires manual intervention for any operational task (clearing a stuck queue, resetting a user's state, force-refreshing cached data), document the admin tool or endpoint that enables it.
+[FEATURE scale] Brief format — logging and key metrics only. [MODULE scale] Full format for each component. [SYSTEM scale] Full format plus cross-component tracing architecture.
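
The Logging and Tracing rules above combine into one requirement: every log line is structured and carries the correlation ID. A minimal sketch of what that could look like for a shell component; the function name and JSON field names are assumptions, and the message is not JSON-escaped:

```bash
#!/usr/bin/env bash
# Hypothetical logger: one JSON object per line, always carrying the correlation ID.
# Field names are assumptions; the message is not JSON-escaped, so this sketch
# only suits messages without quotes or backslashes.
log_json() {
  local level="$1" correlation_id="$2" message="$3"
  case "$level" in
    ERROR|WARN|INFO|DEBUG) ;;   # the four levels named in the Logging rule
    *) echo "log_json: unknown level '$level'" >&2; return 1 ;;
  esac
  printf '{"level":"%s","correlation_id":"%s","message":"%s"}\n' \
    "$level" "$correlation_id" "$message"
}

log_json INFO req-42 "order state changed to PAID"
# prints: {"level":"INFO","correlation_id":"req-42","message":"order state changed to PAID"}
```

Rejecting unknown levels keeps the level vocabulary consistent across components. A real logger would also add a timestamp and escape the message.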
+---
 ## PHASE 5: Technical Decisions
 **Goal:** Document each significant technical choice with rationale and alternatives. Future engineers need to understand why, not just what.
@@ -570,6 +647,39 @@ Categories that almost always contain significant decisions:
 **Goal:** Write the architecture artifact that design, spec, and build all depend on.
+### 6A. Unresolved Decision Tracking
+Before writing, review all AskUserQuestion interactions from Phases 0-5. List any decisions the user did not fully answer, deferred, or gave ambiguous responses to:
+```
+UNRESOLVED DECISIONS:
+  - [decision description] — deferred because: [reason] — revisit when: [trigger]
+```
+Include these in architecture.md under a "## Unresolved Decisions" section. These are not failures — they are explicitly tracked unknowns. Downstream skills (design, build) must check this section and either resolve the decision when they have more context or carry it forward.
+### 6B. Worktree Parallelization Strategy (Optional)
+If the architecture has >3 independent components that could be built concurrently (no shared data models, no blocking dependencies), produce a parallelization strategy. This enables the build phase to use git worktrees for concurrent implementation:
+```
+PARALLELIZATION STRATEGY:
+  ┌──────────────┬──────────────────┬──────────────┬───────────────┐
+  │ Lane          │ Components        │ Dependencies  │ Can Start After│
+  ├──────────────┼──────────────────┼──────────────┼───────────────┤
+  │ Lane A       │ [component list]  │ none          │ immediately    │
+  │ Lane B       │ [component list]  │ Lane A types  │ Lane A types   │
+  └──────────────┴──────────────────┴──────────────┴───────────────┘
+```
+Rules:
+- A lane is a set of components that can be built independently by a separate agent in a worktree.
+- Lane dependencies must be explicit: "Lane B needs the type definitions from Lane A" — not "Lane B needs Lane A to be done."
+- Shared types/interfaces should be in their own lane (often Lane A) so other lanes can start as soon as types are defined.
+- If no meaningful parallelization exists (everything depends on everything else), skip this section.
+### 6C. Completeness Gate
 Run a completeness gate before writing:
 1. Every component in scope has a defined boundary and responsibility
@@ -619,6 +729,15 @@ Create `.warp/reports/planning/architecture.md`:
 ## Technical Decisions
 {Each decision with context, options, choice, rationale, reversibility}
+## Observability
+{Per component: logging, metrics, tracing, alerting, debuggability, admin tooling}
+## Unresolved Decisions
+{Decisions deferred or unanswered during architecture — description, reason, revisit trigger}
+## Parallelization Strategy
+{If applicable: lanes, components per lane, dependencies, start conditions}
 ## Open Questions for Design
 {Unresolved questions that the design phase must answer}

package/agents/warp-plan-brainstorm.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -299,7 +307,15 @@ Product stage determines which forcing questions to ask (see Phase 2).
    ```
    If prior brainstorms exist, surface them: "Prior brainstorm found at `.warp/reports/planning/brainstorm.md`. Want to build on it or start fresh?"
-4. Output: "Here's what I understand about this project and the problem area you want to explore: [1-2 paragraph synthesis]." State what is known and what is unclear.
+4. **Related Design Discovery:** Search `.warp/reports/` for any existing artifacts with keyword overlap to the stated problem:
+   ```bash
+   grep -rl "[key terms from user's problem description]" .warp/reports/ 2>/dev/null
+   ```
+   If related work exists, present it via AskUserQuestion: "Found existing artifacts that may be relevant: [list with brief description of each]. Want to build on these or start fresh?"
+   This prevents duplicate brainstorming and surfaces prior thinking the user may have forgotten about. Skip this step only if `.warp/reports/` does not exist.
+5. Output: "Here's what I understand about this project and the problem area you want to explore: [1-2 paragraph synthesis]." State what is known and what is unclear.
 ---
@@ -434,6 +450,45 @@ If the framing is imprecise, reframe constructively: "Let me try restating what
 ---
+## PHASE 2.5: Landscape Awareness (Both Modes)
+**Goal:** Ground the brainstorm in what actually exists in the market. Conventional wisdom is often wrong — search for evidence before synthesizing.
+### Privacy Gate
+Before any web search, ask via AskUserQuestion:
+> "I'd like to search for [specific query] to understand the competitive landscape. This will send a web search query. OK to proceed? [Y/n]"
+Only proceed with the search if the user approves. If declined, skip to Phase 3 and note in the brainstorm artifact that landscape analysis was skipped by user choice.
+### Search Strategy
+Search for:
+- `[product category] + "alternatives"` — what direct competitors exist
+- `[problem domain] + "solutions"` — how people solve this problem today
+- `[user type] + [pain point]` — how the target user describes their problem
+Use WebSearch for each approved query. Limit to 2-3 searches to avoid over-researching.
+### Landscape Synthesis
+Produce:
+```
+LANDSCAPE SYNTHESIS:
+  Conventional wisdom: [what everyone assumes the solution looks like]
+  Search findings: [what competitors actually do — 3-5 with URLs]
+  First-principles view: [what the data suggests that contradicts conventional wisdom]
+  Eureka moments: [any insight that changes the problem framing]
+```
+**Integration:** Feed the landscape synthesis directly into Phase 3 (User Needs Mapping) and Phase 6 (Synthesis). The "Key Insight" in Phase 6 should reference landscape findings when they reveal a non-obvious opportunity.
+**Smart-skip:** If the user already provided detailed competitive analysis or the product is in a space with no direct competitors (novel research, internal tool), skip the search but still produce the synthesis from what the user shared.
+---
 ## PHASE 2B: Builder Mode Questions
 Use this phase when the user is building for fun, learning, hacking, at a hackathon, or doing research.
@@ -663,6 +718,33 @@ Present via AskUserQuestion. Do NOT proceed without user approval.
 **Goal:** Write the output artifact with all session findings.
+### Design Doc Lineage
+Before writing, check if a previous `brainstorm.md` exists:
+```bash
+if [ -f .warp/reports/planning/brainstorm.md ]; then
+  # Get the date from the existing file's pipeline header
+  existing_date=$(grep -oP '\d{4}-\d{2}-\d{2}' .warp/reports/planning/brainstorm.md | head -1)
+  # Count existing versions in archive
+  mkdir -p .warp/reports/planning/archive
+  version=$(ls .warp/reports/planning/archive/brainstorm-v*.md 2>/dev/null | wc -l)
+  next_version=$((version + 1))
+  # Archive the previous version
+  cp .warp/reports/planning/brainstorm.md ".warp/reports/planning/archive/brainstorm-v${next_version}.md"
+fi
+```
+If a previous version was archived, add a `Supersedes:` comment at the top of the new brainstorm.md:
+```markdown
+<!-- Supersedes: brainstorm-v[N].md ([date]) — [brief reason for new version, e.g., "scope expanded after competitive analysis"] -->
+```
+This creates an audit trail of how the product thinking evolved. The archive is in `.warp/reports/planning/archive/` and is never deleted.
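
The archive step above derives the next version number from a file count. If the archive ever has a gap (for example `brainstorm-v1.md` and `brainstorm-v3.md` with no `v2`), a count of 2 would yield version 3 and `cp` would overwrite an existing file. A hedged alternative, assuming the same `brainstorm-vN.md` naming, takes the highest existing number instead; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical variant of the archive step: derive the next version from the
# highest existing brainstorm-vN.md rather than from a file count, so a gap in
# the archive cannot cause an existing version to be overwritten.
next_brainstorm_version() {
  local archive_dir="$1" max=0 f n
  for f in "$archive_dir"/brainstorm-v*.md; do
    [ -e "$f" ] || continue          # glob matched nothing
    n="${f##*/brainstorm-v}"         # strip path and prefix, e.g. "3.md"
    n="${n%.md}"                     # strip suffix, e.g. "3"
    case "$n" in
      ''|*[!0-9]*) continue ;;       # ignore names without a purely numeric version
    esac
    if [ "$n" -gt "$max" ]; then
      max="$n"
    fi
  done
  echo $((max + 1))
}
```

It could be called as `next_version=$(next_brainstorm_version .warp/reports/planning/archive)` in place of the `ls | wc -l` lines. An empty archive yields 1, matching the original behavior.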
+### Write the Artifact
 Create `.warp/reports/planning/brainstorm.md`:
 ```markdown
@@ -703,12 +785,21 @@ Create `.warp/reports/planning/brainstorm.md`:
 ### Approach C: {name — if applicable}
 {from Phase 7}
+## Landscape Analysis
+{from Phase 2.5 — conventional wisdom, search findings, first-principles view, eureka moments}
+{If skipped by user choice, note: "Landscape analysis skipped at user request."}
 ## Recommended Direction
 {from Phase 6 synthesis}
 ## What to Build First
 {the narrowest wedge}
+## Distribution Plan
+How users get this: {app store / npm / direct download / SaaS / browser extension / etc.}
+CI/CD pipeline needed: {yes — describe / no — manual / existing pipeline covers it}
+Update mechanism: {auto-update / manual / package manager / N/A for SaaS}
 ## Open Questions
 {unresolved uncertainties that the next skill should address}

package/agents/warp-plan-design.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -420,7 +428,44 @@ SCREEN: [name]
 - Never use technical language in user-facing copy: "Something went wrong" not "Error 500"
 - Loading copy uses present participle with ellipsis character: "Loading flights..." uses `…` not `...`
-### 2E. Accessibility Strategy
+### 2E. Interaction State Coverage
+For every feature or screen identified in the user flows, map which interaction states have been designed. This ensures no screen ships with only the happy path.
+```
+INTERACTION STATE COVERAGE:
+  ┌──────────────┬─────────┬───────┬───────┬─────────┬─────────┐
+  │ Feature       │ Loading │ Empty │ Error │ Success │ Partial │
+  ├──────────────┼─────────┼───────┼───────┼─────────┼─────────┤
+  │ [feature]     │ ✓/✗     │ ✓/✗   │ ✓/✗   │ ✓/✗     │ ✓/✗     │
+  └──────────────┴─────────┴───────┴───────┴─────────┴─────────┘
+Every ✗ must be designed before the design phase completes.
+```
+Produce this table for every screen from Phase 2A. If a state does not apply to a given feature (e.g., a static "About" page has no loading state), mark it N/A with a brief reason. All other gaps are design debt that must be resolved before Phase 3.
+### 2F. User Journey & Emotional Arc
+For the primary user flow (and any secondary flow that involves emotional stakes), produce a storyboard mapping what the user does, what they feel, and whether the design addresses that feeling.
+```
+USER JOURNEY:
+  ┌──────┬──────────────────┬──────────────────┬───────────────────┐
+  │ Step │ User Does         │ User Feels        │ Design Specifies?  │
+  ├──────┼──────────────────┼──────────────────┼───────────────────┤
+  │ 1    │ [action]          │ [emotion]         │ [yes/no + what]    │
+  │ 2    │ [action]          │ [emotion]         │ [yes/no + what]    │
+  └──────┴──────────────────┴──────────────────┴───────────────────┘
+Any step where "Design Specifies?" = no is a gap to fill.
+```
+Rules:
+- Include negative emotions (confusion, anxiety, frustration) — these are where design matters most
+- The "Design Specifies?" column must reference a concrete design element: a loading skeleton, a success animation, an error message with recovery instructions, a reassuring empty state
+- If a step has no design specification, create one before proceeding to Phase 3
+- For Builder mode projects, the emotional arc should include at least one "delight" moment where the user says "whoa"
+### 2G. Accessibility Strategy
 Define accessibility requirements at the strategy level:
@@ -447,7 +492,7 @@ REDUCED MOTION:
   [how animations degrade — typically to opacity fade only]
 ```
-### 2F. Figma Setup (if available)
+### 2H. Figma Setup (if available)
 If Figma MCP is configured (check `.warp/warp-tools.json` → `mcp_servers.figma.status`):
@@ -728,7 +773,55 @@ ANTI-SLOP VERIFICATION:
 If ANY item fails the slop scan, go back and fix it before proceeding.
-**HARD GATE: Visual System complete. Present color palette, typography, spacing, and key component wireframes to user for approval before proceeding to Implementation.**
+**HARD GATE: Visual System complete. Present color palette, typography, spacing, and key component wireframes to user for approval before proceeding to Design Rating.**
+---
+## PHASE 3.5: Design Dimension Rating
+**Goal:** Rate the design across seven critical dimensions, identify gaps, and fix them before implementation begins. This is the quality gate that prevents mediocre designs from reaching the build phase.
+### Rating Method
+Score each dimension 0-10. For each, describe what a perfect 10 looks like for THIS specific product. Be concrete — "good typography" is not a 10 description; "type scale with 5 levels, mathematical 1.25 ratio, monospace for all data, system fonts for performance, tested at 200% zoom" is.
+```
+DESIGN DIMENSION RATING:
+  ┌────────────────────────────┬───────┬──────────────────────────────────┐
+  │ Dimension                   │ Score │ What 10 looks like                │
+  ├────────────────────────────┼───────┼──────────────────────────────────┤
+  │ Information Architecture    │ [0-10]│ [specific description]            │
+  │ Interaction State Coverage  │ [0-10]│ [specific description]            │
+  │ User Journey & Emotional Arc│ [0-10]│ [specific description]            │
+  │ Design System Alignment     │ [0-10]│ [specific description]            │
+  │ Responsive & Accessibility  │ [0-10]│ [specific description]            │
+  │ Content Strategy            │ [0-10]│ [specific description]            │
+  │ Delight & Differentiation   │ [0-10]│ [specific description]            │
+  └────────────────────────────┴───────┴──────────────────────────────────┘
+```
+### Fix Loop
+For any dimension scoring below 8:
+1. **Explain the gap** — what specifically is missing or weak, with concrete examples from the current design
+2. **Propose a fix** — what specific change would close the gap
+3. **Apply the fix** — update the relevant Phase 2 or Phase 3 output
+4. **Re-rate** — score the dimension again after the fix
+Loop until all dimensions score 8 or higher, or the user says "move on."
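
The fix loop's entry condition (any dimension scoring below 8) can be sketched as a small filter. This assumes scores are available as `name:score` lines, which is an illustrative format and not something the skill itself produces; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: lists dimensions that still need the fix loop.
# Input: "Dimension name:score" lines on stdin (this format is an assumption).
# The gate defaults to 8, the threshold stated in the Fix Loop section.
dimensions_below_gate() {
  local gate="${1:-8}" name score
  while IFS=: read -r name score; do
    [ -n "$name" ] || continue       # skip blank lines
    if [ "$score" -lt "$gate" ]; then
      echo "$name"
    fi
  done
}

printf '%s\n' \
  "Information Architecture:9" \
  "Content Strategy:6" \
  "Delight & Differentiation:8" | dimensions_below_gate
# prints: Content Strategy
```

An empty result means the hard gate is met. A score of exactly 8 passes, since the gate is "8 or higher".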
+### Dimension Definitions
+- **Information Architecture:** Is every screen's priority hierarchy clear? Can a user answer their primary question in under 1 second? Is navigation depth ≤ 3?
+- **Interaction State Coverage:** Does every feature have loading, empty, error, success, and partial states designed? (Cross-reference with the Phase 2E table.)
+- **User Journey & Emotional Arc:** Does every step in the primary flow have a designed emotional response? Are negative emotions (anxiety, confusion) explicitly addressed?
+- **Design System Alignment:** Are all colors, typography, spacing, and components using tokens? Zero raw values? Consistent across every screen?
+- **Responsive & Accessibility:** WCAG AA verified for all pairs? Touch targets ≥ 44px? Dynamic type tested? Reduced motion specified? Platform conventions honored?
+- **Content Strategy:** Real copy on every screen, every state? Buttons are verb + object? Error messages include recovery? Empty states suggest next action?
+- **Delight & Differentiation:** Would a human designer guess "AI made this"? Does the design have at least one moment that makes the user say "whoa"? Are all three anti-slop commitments honored?
+**HARD GATE: All dimensions must score ≥ 8, or the user must explicitly approve moving forward with lower scores. Present the rating table to user for approval before proceeding to Implementation.**
 ---

package/agents/warp-plan-onboarding.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---

package/agents/warp-plan-optimize.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---

package/agents/warp-plan-scope.md CHANGED

@@ -119,6 +119,8 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 ## AskUserQuestion
+**Flow: analysis first, then decision tool.** Present your full reasoning, trade-offs, and recommendations as conversational text — the user wants to read your thinking. Then cap it with AskUserQuestion to formalize the decision. **If you're composing a message with multiple options or "which approach?" language, you MUST end it with AskUserQuestion.** Never present options in prose without the tool.
 **Contract:**
 1. **Re-ground:** Project name, branch, current task. (1-2 sentences.)
 2. **Simplify:** Plain English a smart 16-year-old could follow.
@@ -140,9 +142,15 @@ Shell commands use Unix syntax (Git Bash). Never use CMD (`dir`, `type`, `del`)
 Format: `"Option name — X/10 🟢"` (or 🟡 or 🔴). In the label, not the description.
 Rate: 🟢 9-10 complete, 🟡 6-8 adequate, 🔴 1-5 shortcuts.
+**Pre-call checklist (verify before every AskUserQuestion invocation):**
+- ☐ Completeness scores in every option label
+- ☐ Recommended option listed first
+- ☐ One decision per question (split if multiple)
+- ☐ Analysis/reasoning already presented in message text above
 **Formatting:**
 - *Italics* for emphasis, not **bold** (bold for headers only).
-- After each answer: `✔ Decision {N} recorded [quicksave updated]`
+- After each answer: `✔ Decision {N} recorded`
 - Previews under 8 lines. Full mockups go in conversation text before the question.
 ---
@@ -328,6 +336,25 @@ LEVERAGE INVENTORY:
 **[SYSTEM scale only]:** If this is a greenfield project with no existing code, note that and proceed.
+### Taste Calibration
+Before scoping new work, calibrate to the project's actual quality bar. Identify 2-3 well-designed files in the existing codebase and 1-2 anti-patterns. This ensures the scope targets the right level of quality — not aspirational, not lowest-common-denominator, but calibrated to the project's own best work.
+```
+TASTE CALIBRATION:
+  Well-designed (match this quality):
+    - [file path] — [why it's good: clear names, good structure, etc.]
+    - [file path] — [why it's good]
+  Anti-patterns (avoid this):
+    - [file path] — [what's wrong: unclear, coupled, etc.]
+```
+Rules:
+- "Well-designed" means: clear naming, clean separation of concerns, readable without comments, testable in isolation. Not "clever" — clear.
+- "Anti-pattern" means: unclear responsibility, tight coupling, implicit state, hard to test, confusing to a new reader.
+- If the project is greenfield with no existing code, skip this section.
+- The taste calibration informs the scope: new work should match the quality of the best existing work, not the worst. If the scope would require work at a quality level below the project's best, flag it.
 ---
 ## PHASE 2: Dream State Mapping
@@ -566,6 +593,41 @@ RISK: [name]
 ---
+## PHASE 6.5: Implementation Alternatives (Mandatory)
+**Goal:** Before committing to a single approach, require 2-3 distinct implementation strategies with explicit trade-offs. This prevents tunnel vision and gives the architect phase real options instead of a predetermined path.
+For the scoped work, produce:
+```
+IMPLEMENTATION ALTERNATIVES:
+  A) [approach name]
+     Effort: [low/medium/high]  Risk: [low/medium/high]
+     Pros: [list]  Cons: [list]
+  B) [approach name]
+     Effort: [low/medium/high]  Risk: [low/medium/high]
+     Pros: [list]  Cons: [list]
+  C) [approach name] (if applicable)
+     Effort: [low/medium/high]  Risk: [low/medium/high]
+     Pros: [list]  Cons: [list]
+```
+Rules:
+- **Minimum two alternatives.** If you can only think of one way to build this, you have not thought hard enough. Even "build from scratch" vs. "use existing library" vs. "fork and customize" counts.
+- **Alternatives must be genuinely different.** Not "React" vs. "React with different state management." Different means different architecture, different trade-offs, different failure modes. Examples: monolith vs. services, server-rendered vs. SPA, build vs. buy, single-table vs. normalized.
+- **Effort and risk must be calibrated to this team.** "Low effort" for a team with React experience is different from "low effort" for a team learning React. Use the taste calibration and leverage inventory to ground estimates.
+- **Do not pre-decide.** Present alternatives neutrally. The user (or the architect phase) chooses. If you have a strong recommendation, state it separately after the alternatives — not embedded in the pros/cons.
+- **Include the "boring" option.** One alternative should always be the simplest, most conventional approach. If the boring option has no real downsides, it is probably the right choice.
+Present via AskUserQuestion. User may select one, ask for more detail, or request additional alternatives. Record the selected approach (or "deferred to architect") in scope.md.
+**Mode effects:**
+- **Expansion / Selective Expansion:** Full analysis with 3 alternatives minimum.
+- **Hold Scope:** 2 alternatives minimum — the current approach and one meaningful variation.
+- **Reduction:** 2 alternatives — the minimum viable approach and the slightly-less-minimum approach. Focus on what can be cut from the implementation, not just from the feature list.
+---
 ## PHASE 7: Write scope.md
 **Goal:** Write the scope artifact that architect, design, spec, and build all depend on.
@@ -625,6 +687,10 @@ Create `.warp/reports/planning/scope.md`:
 {Top risks from Phase 6}
+## Implementation Alternatives
+{2-3 approaches with effort/risk/pros/cons from Phase 6.5. Selected approach marked.}
 ## Temporal Decisions
 {What must be decided now vs. deferred}