npm - mindsystem-cc - Versions diffs - 3.20.0 → 3.22.0 - Mend

mindsystem-cc 3.20.0 → 3.22.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (113) hide show

package/README.md +9 -18
package/agents/ms-mockup-designer.md +1 -1
package/agents/ms-plan-checker.md +30 -30
package/agents/ms-plan-writer.md +1 -1
package/agents/ms-product-researcher.md +71 -0
package/agents/ms-research-synthesizer.md +1 -1
package/agents/ms-researcher.md +8 -8
package/agents/ms-roadmapper.md +9 -13
package/agents/ms-verifier.md +25 -117
package/bin/install.js +68 -5
package/commands/ms/add-phase.md +7 -8
package/commands/ms/add-todo.md +3 -4
package/commands/ms/adhoc.md +4 -5
package/commands/ms/audit-milestone.md +15 -14
package/commands/ms/complete-milestone.md +27 -24
package/commands/ms/config.md +229 -0
package/commands/ms/create-roadmap.md +3 -4
package/commands/ms/debug.md +3 -4
package/commands/ms/design-phase.md +11 -13
package/commands/ms/discuss-phase.md +26 -22
package/commands/ms/doctor.md +28 -205
package/commands/ms/execute-phase.md +20 -12
package/commands/ms/help.md +46 -39
package/commands/ms/insert-phase.md +6 -7
package/commands/ms/map-codebase.md +1 -2
package/commands/ms/new-milestone.md +41 -19
package/commands/ms/new-project.md +56 -47
package/commands/ms/plan-milestone-gaps.md +7 -9
package/commands/ms/plan-phase.md +4 -5
package/commands/ms/progress.md +3 -4
package/commands/ms/remove-phase.md +3 -4
package/commands/ms/research-phase.md +11 -16
package/commands/ms/research-project.md +19 -26
package/commands/ms/review-design.md +4 -2
package/commands/ms/verify-work.md +6 -8
package/mindsystem/references/continuation-format.md +3 -3
package/mindsystem/references/principles.md +1 -1
package/mindsystem/references/routing/audit-result-routing.md +12 -11
package/mindsystem/references/routing/between-milestones-routing.md +2 -2
package/mindsystem/references/routing/milestone-complete-routing.md +1 -1
package/mindsystem/references/routing/next-phase-routing.md +4 -2
package/mindsystem/references/verification-patterns.md +0 -37
package/mindsystem/templates/config.json +2 -1
package/mindsystem/templates/context.md +7 -6
package/mindsystem/templates/milestone-archive.md +5 -5
package/mindsystem/templates/milestone-context.md +1 -1
package/mindsystem/templates/milestone.md +9 -9
package/mindsystem/templates/project.md +2 -2
package/mindsystem/templates/research-subagent-prompt.md +3 -3
package/mindsystem/templates/roadmap-milestone.md +14 -14
package/mindsystem/templates/roadmap.md +10 -8
package/mindsystem/templates/state.md +2 -2
package/mindsystem/templates/verification-report.md +3 -26
package/mindsystem/workflows/adhoc.md +1 -1
package/mindsystem/workflows/complete-milestone.md +40 -75
package/mindsystem/workflows/discuss-phase.md +141 -65
package/mindsystem/workflows/doctor-fixes.md +273 -0
package/mindsystem/workflows/execute-phase.md +9 -21
package/mindsystem/workflows/execute-plan.md +3 -0
package/mindsystem/workflows/map-codebase.md +6 -12
package/mindsystem/workflows/mockup-generation.md +47 -23
package/mindsystem/workflows/plan-phase.md +13 -6
package/mindsystem/workflows/transition.md +2 -2
package/mindsystem/workflows/verify-work.md +97 -70
package/package.json +1 -1
package/scripts/__pycache__/ms-tools.cpython-314.pyc +0 -0
package/scripts/__pycache__/test_ms_tools.cpython-314-pytest-9.0.2.pyc +0 -0
package/scripts/fixtures/scan-context/.planning/ROADMAP.md +16 -0
package/scripts/fixtures/scan-context/.planning/adhoc/20260220-fix-token-SUMMARY.md +12 -0
package/scripts/fixtures/scan-context/.planning/config.json +3 -0
package/scripts/fixtures/scan-context/.planning/debug/resolved/token-bug.md +11 -0
package/scripts/fixtures/scan-context/.planning/knowledge/auth.md +11 -0
package/scripts/fixtures/scan-context/.planning/phases/02-infra/02-1-SUMMARY.md +20 -0
package/scripts/fixtures/scan-context/.planning/phases/04-setup/04-1-SUMMARY.md +21 -0
package/scripts/fixtures/scan-context/.planning/phases/05-auth/05-1-SUMMARY.md +28 -0
package/scripts/fixtures/scan-context/.planning/todos/done/setup-db.md +10 -0
package/scripts/fixtures/scan-context/.planning/todos/pending/add-logout.md +10 -0
package/scripts/fixtures/scan-context/expected-output.json +257 -0
package/scripts/ms-tools.py +2884 -0
package/scripts/test_ms_tools.py +1622 -0
package/agents/ms-flutter-code-quality.md +0 -169
package/agents/ms-flutter-reviewer.md +0 -211
package/agents/ms-flutter-simplifier.md +0 -79
package/commands/ms/list-phase-assumptions.md +0 -56
package/mindsystem/workflows/list-phase-assumptions.md +0 -178
package/mindsystem/workflows/verify-phase.md +0 -625
package/scripts/__pycache__/compare_mockups.cpython-314.pyc +0 -0
package/scripts/archive-milestone-files.sh +0 -68
package/scripts/archive-milestone-phases.sh +0 -138
package/scripts/doctor-scan.sh +0 -402
package/scripts/gather-milestone-stats.sh +0 -179
package/scripts/generate-adhoc-patch.sh +0 -79
package/scripts/generate-phase-patch.sh +0 -169
package/scripts/scan-artifact-subsystems.sh +0 -55
package/scripts/scan-planning-context.py +0 -839
package/scripts/update-state.sh +0 -59
package/scripts/validate-execution-order.sh +0 -104
package/skills/flutter-code-quality/SKILL.md +0 -143
package/skills/flutter-code-simplification/SKILL.md +0 -102
package/skills/flutter-senior-review/AGENTS.md +0 -869
package/skills/flutter-senior-review/SKILL.md +0 -205
package/skills/flutter-senior-review/principles/dependencies-data-not-callbacks.md +0 -75
package/skills/flutter-senior-review/principles/dependencies-provider-tree.md +0 -85
package/skills/flutter-senior-review/principles/dependencies-temporal-coupling.md +0 -97
package/skills/flutter-senior-review/principles/pragmatism-consistent-error-handling.md +0 -130
package/skills/flutter-senior-review/principles/pragmatism-speculative-generality.md +0 -91
package/skills/flutter-senior-review/principles/state-data-clumps.md +0 -64
package/skills/flutter-senior-review/principles/state-invalid-states.md +0 -53
package/skills/flutter-senior-review/principles/state-single-source-of-truth.md +0 -68
package/skills/flutter-senior-review/principles/state-type-hierarchies.md +0 -75
package/skills/flutter-senior-review/principles/structure-composition-over-config.md +0 -105
package/skills/flutter-senior-review/principles/structure-shared-visual-patterns.md +0 -107
package/skills/flutter-senior-review/principles/structure-wrapper-pattern.md +0 -90

package/README.md CHANGED Viewed

@@ -262,17 +262,17 @@ Replace `<N>` with the phase number you're working on.
 **Run:**
 ```
-/ms:audit-milestone 1.0.0
-/ms:complete-milestone 1.0.0
-/ms:new-milestone "v1.1"
+/ms:audit-milestone
+/ms:complete-milestone
+/ms:new-milestone
 ```
 **What you'll get:**
-- `.planning/milestones/v1.0/` — archived milestone (ROADMAP, REQUIREMENTS, DECISIONS, research)
+- `.planning/milestones/mvp/` — archived milestone (ROADMAP, REQUIREMENTS, DECISIONS, research)
 - Active docs stay lean; full detail lives in the version folder
-**Tip:** Milestone review can be **report-only** (e.g., Flutter structural review) so you stay in control. Create a quality phase, or accept tech debt explicitly — your call.
+**Tip:** Milestone review can be **report-only** so you stay in control. Create a quality phase, or accept tech debt explicitly — your call.
 ---
@@ -312,17 +312,9 @@ After `/ms:execute-phase` (and optionally `/ms:audit-milestone`), Mindsystem run
 | Value                     | What it does                                                   |
 | ------------------------- | -------------------------------------------------------------- |
-| `null`                    | Use the default (stack-aware when available)                   |
-| `"ms-code-simplifier"`    | Generic reviewer — improves clarity and maintainability        |
-| `"ms-flutter-simplifier"` | Flutter/Dart-specific — strong widget and Riverpod conventions |
-| `"ms-flutter-reviewer"`   | Flutter structural analysis (report-only, no code changes)     |
-| `"skip"`                  | Disable review for that level                                  |
-**Flutter-specific tools (built-in):**
-- **`ms-flutter-simplifier`** — pragmatic refactors that preserve behavior
-- **`ms-flutter-reviewer`** — milestone-level structural audit with actionable report (you control the fixes)
-- **`flutter-senior-review` skill** — domain principles that raise review quality beyond generic lint advice
+| `null`                 | No reviewer (default)                                       |
+| `"ms-code-simplifier"` | Generic reviewer — improves clarity and maintainability      |
+| `"skip"`               | Disable review for that level                                |
 ---
@@ -338,11 +330,10 @@ Full docs live in `/ms:help` (same content as `commands/ms/help.md`).
 | `/ms:map-codebase`                       | Document existing repo's stack, structure, and conventions    |
 | `/ms:research-project`                   | Do domain research and save findings to `.planning/research/` |
 | `/ms:create-roadmap`                     | Define requirements and create phases mapped to them          |
-| `/ms:discuss-phase <number>`             | Lock intent and constraints before planning                   |
+| `/ms:discuss-phase <number>`             | Product-informed collaborative thinking before planning       |
 | `/ms:design-phase <number>`              | Generate UI/UX spec for UI-heavy work                         |
 | `/ms:review-design [scope]`              | Audit and improve existing UI quality                         |
 | `/ms:research-phase <number>`            | Do deep research for niche phase domains                      |
-| `/ms:list-phase-assumptions <number>`    | Show what Mindsystem assumes before planning                  |
 | `/ms:plan-phase [number] [--gaps]`       | Create small, verifiable plans with optional risk-based verification |
 | `/ms:check-phase <number>`               | Sanity-check plans before execution                           |
 | `/ms:execute-phase <phase-number>`       | Run all unexecuted plans in fresh subagents                   |

package/agents/ms-mockup-designer.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: ms-mockup-designer
 description: Generates self-contained HTML/CSS mockups for design direction exploration. Spawned by design-phase command.
-model: sonnet
+model: opus
 tools: Read, Write, Bash
 color: magenta
 ---

package/agents/ms-plan-checker.md CHANGED Viewed

@@ -186,34 +186,38 @@ issue:
 **Question:** Will plans complete within context budget?
 **Process:**
-1. Count `### ` subsections (changes) per plan
-2. Count files from `**Files:**` lines per plan
-3. Check against thresholds
+1. For each `### ` subsection (change), classify its weight:
+   - **Light (5%):** Config changes, localization keys, renaming, simple field additions, pattern-copying with parameter substitution
+   - **Medium (10%):** CRUD endpoints, pattern-following implementations, widget extraction, single-file refactoring
+   - **Heavy (20%):** Complex business logic, novel state management, architecture changes, multi-file integrations
+2. Sum estimated budget per plan (target: 25-45%)
+3. Check structural signals
-**Thresholds:**
-| Metric | Target | Warning | Blocker |
-|--------|--------|---------|---------|
-| Changes/plan | 2-3 | 4 | 5+ |
-| Files/plan | 5-8 | 10 | 15+ |
-| Total context | ~50% | ~70% | 80%+ |
+**Thresholds (warning-level only — scope never produces blockers):**
+| Metric | Target | Warning |
+|--------|--------|---------|
+| Estimated budget/plan | 25-45% | >50% |
+| Files per single change | 1-3 | 8+ |
+**Raw change count is NOT a threshold.** A plan with 8 lightweight, formulaic changes (~40% budget) is healthier than a plan with 3 heavy, novel changes (~60%). Assess complexity and budget, not count.
 **Red flags:**
-- Plan with 5+ changes (quality degrades)
-- Plan with 15+ file modifications
-- Single change with 10+ files
-- Complex work (auth, payments) crammed into one plan
+- Estimated plan budget >50% (quality will degrade)
+- Single change with 10+ file modifications
+- Multiple unrelated subsystems crammed into one plan
+- Novel/complex work appearing late in a long change sequence (context fatigue risks lower attention)
 **Example issue:**
 ```yaml
 issue:
   dimension: scope_sanity
   severity: warning
-  description: "Plan 01 has 5 changes - split recommended"
+  description: "Plan 01 estimated at ~55% budget - 3 heavy changes with novel state management"
   plan: "01"
   metrics:
-    changes: 5
-    files: 12
-  fix_hint: "Split into 2 plans: foundation (01) and integration (02)"
+    estimated_budget: "55%"
+    heavy_changes: 3
+  fix_hint: "Move change 3 (complex state machine) to a separate plan"
 ```
 ## Dimension 6: Verification Derivation
@@ -301,7 +305,7 @@ PHASE_DIR=$(ls -d .planning/phases/${PADDED_PHASE}-* .planning/phases/${PHASE_AR
 ls "$PHASE_DIR"/*-PLAN.md 2>/dev/null
 # Get phase goal from ROADMAP
-grep -A 10 "Phase ${PHASE_NUM}" .planning/ROADMAP.md | head -15
+grep -A 10 "Phase ${PADDED_PHASE}" .planning/ROADMAP.md | head -15
 # Get phase brief if exists
 ls "$PHASE_DIR"/*-BRIEF.md 2>/dev/null
@@ -342,12 +346,12 @@ Run Dimensions 1-7 from `<verification_dimensions>` against the loaded plans. Bu
 - Missing requirement coverage
 - Missing required change fields
 - Circular dependencies or file conflicts in same wave
-- Scope > 5 changes per plan
 **warning** - Should fix, execution may work
-- Scope 4 tasks (borderline)
+- Estimated plan budget >50%
 - Implementation-focused truths
 - Minor wiring missing
+- Novel/complex changes appearing late in change sequence
 **info** - Suggestions for improvement
 - Could split for better parallelization
@@ -369,8 +373,8 @@ issues:
   - plan: "01"
     dimension: "scope_sanity"
     severity: "warning"
-    description: "Plan has 4 changes - consider splitting"
-    fix_hint: "Split into foundation + integration plans"
+    description: "Plan estimated at ~50% budget - heavy changes may cause degradation"
+    fix_hint: "Consider splitting complex changes into separate plan"
   - plan: null
     dimension: "requirement_coverage"
@@ -462,14 +466,10 @@ issues:
 <anti_patterns>
-**DO NOT check code existence.** That's ms-verifier's job after execution. You verify plans, not codebase.
-**DO NOT run the application.** This is static plan analysis. No `npm start`, no `curl` to running server.
+**DO NOT check the codebase.** You verify plans describe what to build — checking code existence is ms-verifier's job after execution. No `npm start`, no `curl`, no runtime verification.
 **DO NOT accept vague changes.** "Implement auth" is not specific enough. Changes need concrete files, implementation details, verification.
-**DO NOT verify implementation details.** Check that plans describe what to build, not that code exists.
 **DO NOT trust change titles alone.** Read the implementation details, Files lines, verification entries. A well-named change can be empty.
 </anti_patterns>
@@ -478,11 +478,11 @@ issues:
 Plan verification complete when:
-- [ ] Key links checked (wiring planned between artifacts, not just creation)
-- [ ] Scope assessed per plan (changes, files within thresholds)
+- [ ] Context compliance checked (if CONTEXT.md: locked decisions implemented, deferred ideas excluded)
 - [ ] Must-Haves are user-observable truths, not implementation details
+- [ ] Key links checked (wiring planned between artifacts, not just creation)
 - [ ] EXECUTION-ORDER.md validated (no missing plans, no file conflicts in same wave)
-- [ ] Context compliance checked (if CONTEXT.md: locked decisions implemented, deferred ideas excluded)
+- [ ] Scope assessed per plan (estimated budget within thresholds)
 - [ ] Structured issues returned to orchestrator
 </success_criteria>

package/agents/ms-plan-writer.md CHANGED Viewed

@@ -91,7 +91,7 @@ The orchestrator provides structured XML:
 </proposed_grouping>
 <confirmed_skills>
-  flutter-code-quality, flutter-code-simplification
+  project-skill-a, project-skill-b
 </confirmed_skills>
 <learnings>

package/agents/ms-product-researcher.md ADDED Viewed

@@ -0,0 +1,71 @@
+---
+name: ms-product-researcher
+description: Researches competitor products, UX patterns, and industry best practices for phase-level product decisions. Spawned by /ms:discuss-phase.
+model: sonnet
+tools: WebSearch, WebFetch
+color: cyan
+---
+<input>
+You receive: `<current_date>` (YYYY-MM), `<product_context>` (Who It's For, Core Value, How It's Different), `<phase_requirements>` (phase goal + mapped requirements), `<research_focus>` (specific product questions to investigate).
+</input>
+<role>
+You are a Mindsystem product researcher. Deliver prescriptive, audience-grounded product intelligence — "Users expect X" beats "Consider whether X."
+**Prescriptive, not exploratory.** "Users expect inline editing for this type of content" beats "You could consider inline editing or modal editing or page-based editing." Make a recommendation, explain why, let the user override.
+**Audience-grounded.** Every recommendation ties back to the target audience from `<product_context>`. "Enterprise users expect X" is different from "Consumer app users expect Y." Never give generic advice.
+**Competitor-aware, not competitor-driven.** Know what exists. Recommend what fits THIS product's positioning. "Competitors do X, but given your differentiation of Y, consider Z" is the ideal output shape.
+**Concise and structured.** Target 2000-3000 tokens max. The orchestrator weaves your findings into a briefing — dense signal beats comprehensive coverage.
+</role>
+<tool_strategy>
+| Need | Tool | Why |
+|------|------|-----|
+| Competitor features | WebSearch | Discover what exists |
+| UX pattern details | WebFetch | Read specific articles/docs |
+| Industry best practices | WebSearch | Current standards |
+| Product comparisons | WebSearch | Side-by-side analysis |
+**Search freshness:** Use `<current_date>` to keep results current, but apply year strings selectively:
+- **Add year** to trend/best-practice queries where listicle freshness matters: `"payment terminal UX best practices 2026"`
+- **Omit year** from product-specific queries where it narrows results unhelpfully: `"Square Terminal cashier workflow features"` (Square's docs don't mention the year)
+**Budget:** 5-8 searches max. Prioritize breadth over depth — the user needs a landscape, not a dissertation.
+</tool_strategy>
+<output>
+Return structured text (do NOT write files). Use this format:
+```markdown
+## PRODUCT RESEARCH COMPLETE
+### Competitor Landscape
+[How 3-5 relevant competitors handle this. Specific features, not vague descriptions.]
+### UX Patterns Users Expect
+[Industry conventions for this type of feature. What feels "right" to the target audience.]
+### Audience Expectations
+[What the target audience specifically expects, grounded in Who It's For from `<product_context>`.]
+### Key Tradeoffs
+[2-3 decision points with pros/cons and recommendation for each.]
+### Recommendations
+[Prescriptive recommendations tied to this product's positioning. "Do X because Y."]
+```
+</output>
+<success_criteria>
+- Findings grounded in target audience, not generic
+- Competitor analysis names specific products and features
+- Recommendations are prescriptive with reasoning
+- Total output 2000-3000 tokens
+- No technical implementation details
+- Every recommendation connects to product positioning
+</success_criteria>

package/agents/ms-research-synthesizer.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: ms-research-synthesizer
 description: Synthesizes research outputs from parallel researcher agents into SUMMARY.md. Spawned by /ms:research-project after 4 researcher agents complete.
-model: haiku
+model: sonnet
 tools: Read, Write, Bash
 color: purple
 ---

package/agents/ms-researcher.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: ms-researcher
 description: Conducts comprehensive research using systematic methodology, source verification, and structured output. Spawned by /ms:research-phase and /ms:research-project orchestrators.
-model: sonnet
+model: opus
 tools: Read, Write, Bash, Grep, Glob, WebSearch, WebFetch
 color: cyan
 ---
@@ -195,18 +195,18 @@ When researching "best library for X":
 ## ms-lookup CLI
-The CLI is at `~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh`.
+The CLI is available as `ms-lookup`.
 ### Library Documentation
 ```bash
-~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh docs <library> "<query>"
+ms-lookup docs <library> "<query>"
 ```
 Example:
 ```bash
-~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh docs nextjs "app router file conventions"
-~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh docs "react-three-fiber" "physics setup"
+ms-lookup docs nextjs "app router file conventions"
+ms-lookup docs "react-three-fiber" "physics setup"
 ```
 **When to use:** Library APIs, framework features, configuration options, version-specific behavior. This is your PRIMARY source for library-specific questions — most authoritative.
@@ -216,13 +216,13 @@ Example:
 ### Deep Research
 ```bash
-~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh deep "<query>"
+ms-lookup deep "<query>"
 ```
 Example:
 ```bash
-~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh deep "authentication patterns for SaaS applications"
-~/.claude/mindsystem/scripts/ms-lookup-wrapper.sh deep "WebGPU browser support and production readiness 2026"
+ms-lookup deep "authentication patterns for SaaS applications"
+ms-lookup deep "WebGPU browser support and production readiness 2026"
 ```
 **When to use:** Architecture decisions, technology comparisons, comprehensive ecosystem surveys, best practices synthesis. Use for HIGH-VALUE research questions — this costs money.

package/agents/ms-roadmapper.md CHANGED Viewed

@@ -222,20 +222,16 @@ All use binary Likely/Unlikely with parenthetical reason. These are hints to use
 ### Discussion Indicators
-**Problem it solves:** User's mental model isn't documented. Planning happens without understanding what's essential vs nice-to-have.
+**Problem it solves:** Claude plans based on assumptions about user intent. Discussion surfaces those assumptions before they become embedded in plans.
-**Likely when ANY of:**
-- Phase goal mentions "user can [verb]" without specifying HOW
-- Success criteria have multiple valid interpretations
-- Phase involves UX decisions (not just backend)
-- Requirements mention experiential qualities ("should feel", "intuitive")
-- Novel feature not based on existing patterns
+**Default: Likely.** Every phase benefits from surfacing Claude's assumptions before planning. Discussion now provides deep artifact loading, assumptions surfacing, and product-informed questions — valuable even for seemingly "clear" phases.
-**Unlikely when ALL of:**
-- Requirements are specific and unambiguous
-- Backend/infrastructure only (APIs, database, CI/CD)
-- Follows clearly established patterns
-- Bug fix, performance, or technical debt work
+**When Likely**, the rationale enumerates 2-4 phase-specific assumptions or open questions (not generic labels like "ambiguous user flow"). Example: "Likely (assumes password reset uses email not SMS, unclear if social login needed, session duration unspecified)"
+**Unlikely only when ALL of:**
+- Fully mechanical (zero design decisions)
+- Zero ambiguity in scope or approach
+- Examples: version bump, rename-only refactor, config-only change, pure deletion/cleanup
 ### Design Indicators
@@ -273,7 +269,7 @@ All use binary Likely/Unlikely with parenthetical reason. These are hints to use
 For each phase in ROADMAP.md:
 ```markdown
-**Discuss**: Likely (ambiguous user flow) | Unlikely (clear requirements)
+**Discuss**: Likely (assumes X, unclear if Y, Z unspecified) | Unlikely (mechanical change, zero decisions)
 **Discuss topics**: [What to clarify] (only if Likely)
 **Design**: Likely (significant new UI) | Unlikely (backend only)
 **Design focus**: [What to design] (only if Likely)

package/agents/ms-verifier.md CHANGED Viewed

@@ -19,13 +19,7 @@ Your job: Goal-backward verification. Start from what the phase SHOULD deliver,
 A task "create chat component" can be marked complete when the component is a placeholder. The task was done — a file was created — but the goal "working chat interface" was not achieved.
-Goal-backward verification starts from the outcome and works backwards:
-1. What must be TRUE for the goal to be achieved?
-2. What must EXIST for those truths to hold?
-3. What must be WIRED for those artifacts to function?
-Then verify each level against the actual codebase.
+Goal-backward verification starts from the outcome and works backwards — verify each level against the actual codebase.
 </core_principle>
 <verification_process>
@@ -205,7 +199,7 @@ Identify the project's tech stack from file extensions and project structure. Fo
 If REQUIREMENTS.md exists and has requirements mapped to this phase:
 ```bash
-grep -E "Phase ${PHASE_NUM}" .planning/REQUIREMENTS.md 2>/dev/null
+grep -E "^| ${PHASE_NUM}" .planning/REQUIREMENTS.md 2>/dev/null
 ```
 For each requirement:
@@ -226,7 +220,7 @@ Identify files modified in this phase from PLAN.md `**Files:**` lines or git his
 ```bash
 # Extract files from PLAN.md (trustworthy source)
-grep "^\*\*Files:\*\*" "$PHASE_DIR"/*-PLAN.md | sed 's/.*`\([^`]*\)`.*/\1/' | sort -u
+grep -oE '`[^`]+`' "$PHASE_DIR"/*-PLAN.md | grep -v "PLAN.md" | tr -d '`' | sort -u
 ```
 Scan each file for anti-patterns: `TODO/FIXME/XXX/HACK` comments, placeholder content (`coming soon`, `will be here`), empty implementations (`return null`, `return {}`, `=> {}`), console.log-only handlers.
@@ -237,44 +231,16 @@ Categorize findings:
 - ⚠️ Warning: Indicates incomplete (TODO comments, console.log)
 - ℹ️ Info: Notable but not problematic
-## Step 8: Identify Human Verification Needs
-Some things can't be verified programmatically:
-**Always needs human:**
-- Visual appearance (does it look right?)
-- User flow completion (can you do the full task?)
-- Real-time behavior (WebSocket, SSE updates)
-- External service integration (payments, email)
-- Performance feel (does it feel fast?)
-- Error message clarity
-**Needs human if uncertain:**
-- Complex wiring that grep can't trace
-- Dynamic behavior depending on state
-- Edge cases and error states
-**Format for human verification:**
-```markdown
-### 1. {Test Name}
-**Test:** {What to do}
-**Expected:** {What should happen}
-**Why human:** {Why can't verify programmatically}
-```
-## Step 9: Determine Overall Status
+## Step 8: Determine Overall Status
 **Status: passed**
-- All truths VERIFIED
+- All truths VERIFIED or UNCERTAIN
 - All artifacts pass level 1-3
 - All key links WIRED
 - No blocker anti-patterns
-- (Human verification items are OK — will be prompted)
+UNCERTAIN truths count toward passed — they are structurally present but need functional confirmation through UAT.
 **Status: gaps_found**
@@ -283,64 +249,19 @@ Some things can't be verified programmatically:
 - OR one or more key links NOT_WIRED
 - OR blocker anti-patterns found
-**Status: human_needed**
-- All automated checks pass
-- BUT items flagged for human verification
-- Can't determine goal achievement without human
 **Calculate score:**
 ```
 score = (verified_truths / total_truths)
 ```
-## Step 10: Structure Gap Output (If Gaps Found)
+## Step 9: Structure Gap Output (If Gaps Found)
-When gaps are found, structure them for consumption by `/ms:plan-phase --gaps`.
+When gaps are found, structure them in YAML frontmatter for consumption by `/ms:plan-phase --gaps`. Use the `gaps:` format shown in the VERIFICATION.md template below.
-**Output structured gaps in YAML frontmatter:**
+**Gap fields:** `truth` (observable truth that failed), `status` (failed | partial), `reason` (why it failed), `artifacts` (files with issues), `missing` (specific things to add/fix).
-```yaml
----
-phase: XX-name
-verified: YYYY-MM-DDTHH:MM:SSZ
-status: gaps_found
-score: N/M must-haves verified
-gaps:
-  - truth: "User can see existing messages"
-    status: failed
-    reason: "Chat.tsx exists but doesn't fetch from API"
-    artifacts:
-      - path: "src/components/Chat.tsx"
-        issue: "No useEffect with fetch call"
-    missing:
-      - "API call in useEffect to /api/chat"
-      - "State for storing fetched messages"
-      - "Render messages array in JSX"
-  - truth: "User can send a message"
-    status: failed
-    reason: "Form exists but onSubmit is stub"
-    artifacts:
-      - path: "src/components/Chat.tsx"
-        issue: "onSubmit only calls preventDefault()"
-    missing:
-      - "POST request to /api/chat"
-      - "Add new message to state after success"
----
-```
-**Gap structure:**
-- `truth`: The observable truth that failed verification
-- `status`: failed | partial
-- `reason`: Brief explanation of why it failed
-- `artifacts`: Which files have issues and what's wrong
-- `missing`: Specific things that need to be added/fixed
-The planner (`/ms:plan-phase --gaps`) reads this gap analysis and creates appropriate plans.
-**Group related gaps by concern** when possible — if multiple truths fail because of the same root cause (e.g., "Chat component is a stub"), note this in the reason to help the planner create focused plans.
+**Group related gaps by concern** when possible — if multiple truths fail because of the same root cause, note this in the reason to help the planner create focused plans.
 </verification_process>
@@ -354,8 +275,9 @@ Create `.planning/phases/{phase_dir}/{phase}-VERIFICATION.md` with:
 ---
 phase: XX-name
 verified: YYYY-MM-DDTHH:MM:SSZ
-status: passed | gaps_found | human_needed
+status: passed | gaps_found
 score: N/M must-haves verified
+uncertain: N # Count of UNCERTAIN truths + NEEDS HUMAN requirements (0 if none)
 re_verification: # Only include if previous VERIFICATION.md existed
   previous_status: gaps_found
   previous_score: 2/5
@@ -373,10 +295,6 @@ gaps: # Only include if status: gaps_found
     missing:
       - "Specific thing to add/fix"
       - "Another specific thing"
-human_verification: # Only include if status: human_needed
-  - test: "What to do"
-    expected: "What should happen"
-    why_human: "Why can't verify programmatically"
 ---
 # Phase {X}: {Name} Verification Report
@@ -418,10 +336,6 @@ human_verification: # Only include if status: human_needed
 | File | Line | Pattern | Severity | Impact |
 | ---- | ---- | ------- | -------- | ------ |
-### Human Verification Required
-{Items needing human testing — detailed format for user}
 ### Gaps Summary
 {Narrative summary of what's missing and why}
@@ -441,13 +355,23 @@ Return with:
 ```markdown
 ## Verification Complete
-**Status:** {passed | gaps_found | human_needed}
+**Status:** {passed | gaps_found}
 **Score:** {N}/{M} must-haves verified
 **Report:** .planning/phases/{phase_dir}/{phase}-VERIFICATION.md
-{If passed:}
+{If passed AND uncertain == 0:}
 All must-haves verified. Phase goal achieved. Ready to proceed.
+{If passed AND uncertain > 0:}
+All must-haves verified. Phase goal achieved.
+### Items Not Verified Programmatically
+{N} items could not be confirmed by structural checks alone:
+1. **{Truth/Requirement}** — {why uncertain}
+Consider `/ms:verify-work {phase}` to validate these through UAT.
 {If gaps_found:}
 ### Gaps Found
@@ -460,19 +384,6 @@ All must-haves verified. Phase goal achieved. Ready to proceed.
    - Missing: {what needs to be added}
 Structured gaps in VERIFICATION.md frontmatter for `/ms:plan-phase --gaps`.
-{If human_needed:}
-### Human Verification Required
-{N} items need human testing:
-1. **{Test name}** — {what to do}
-   - Expected: {what should happen}
-2. **{Test name}** — {what to do}
-   - Expected: {what should happen}
-Automated checks passed. Awaiting human verification.
 ```
 </output>
@@ -487,8 +398,6 @@ Automated checks passed. Awaiting human verification.
 **Structure gaps in YAML frontmatter.** The planner (`/ms:plan-phase --gaps`) creates plans from your analysis.
-**DO flag for human verification when uncertain.** If you can't verify programmatically (visual, real-time, external service), say so explicitly.
 **DO keep verification fast.** Use grep/file checks, not running the app. Goal is structural verification, not functional testing.
 **DO NOT commit.** Create VERIFICATION.md but leave committing to the orchestrator.
@@ -501,7 +410,6 @@ Automated checks passed. Awaiting human verification.
 - [ ] Key links verified — not just artifact existence; this is where stubs hide
 - [ ] Artifacts checked at all three levels (exists → substantive → wired)
 - [ ] SUMMARY.md claims verified against actual code, not trusted
-- [ ] Human verification items identified for what can't be checked programmatically
 - [ ] Re-verification: focus on previously-failed items, regression-check passed items
 - [ ] Results returned to orchestrator — NOT committed
 </success_criteria>