npm - mindsystem-cc - Versions diffs - 3.18.1 → 3.20.0 - Mend

mindsystem-cc 3.18.1 → 3.20.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31) hide show

package/agents/ms-designer.md +5 -2
package/agents/ms-plan-writer.md +58 -53
package/commands/ms/add-phase.md +29 -17
package/commands/ms/check-phase.md +1 -1
package/commands/ms/design-phase.md +31 -21
package/commands/ms/doctor.md +208 -80
package/commands/ms/execute-phase.md +8 -8
package/commands/ms/help.md +5 -5
package/commands/ms/insert-phase.md +29 -17
package/commands/ms/new-milestone.md +2 -3
package/commands/ms/new-project.md +42 -68
package/commands/ms/plan-phase.md +4 -1
package/commands/ms/release-notes.md +32 -49
package/mindsystem/references/plan-format.md +11 -1
package/mindsystem/references/prework-status.md +1 -1
package/mindsystem/references/principles.md +2 -2
package/mindsystem/references/questioning.md +50 -8
package/mindsystem/references/scope-estimation.md +12 -13
package/mindsystem/templates/phase-prompt.md +5 -6
package/mindsystem/templates/project.md +69 -63
package/mindsystem/templates/roadmap.md +1 -1
package/mindsystem/templates/verification-report.md +1 -1
package/mindsystem/workflows/complete-milestone.md +27 -33
package/mindsystem/workflows/execute-phase.md +70 -70
package/mindsystem/workflows/execute-plan.md +3 -5
package/mindsystem/workflows/mockup-generation.md +24 -8
package/mindsystem/workflows/plan-phase.md +80 -18
package/mindsystem/workflows/transition.md +18 -23
package/mindsystem/workflows/verify-phase.md +1 -1
package/package.json +1 -1
package/scripts/doctor-scan.sh +402 -0

package/commands/ms/new-project.md CHANGED Viewed

@@ -94,31 +94,32 @@ Exit command.
 Ask inline (freeform, NOT AskUserQuestion):
-"What do you want to build?"
+- **Greenfield** (no existing code): "What do you want to build?"
+- **Brownfield** (code detected in Step 1): "Tell me about this project — if you were explaining it to someone who's never seen it, what does it do and who uses it?"
+For brownfield, explicitly note: "Not the next feature — the product as a whole."
 Wait for their response. This gives you the context needed to ask intelligent follow-up questions.
+**Derive business context:**
+After the initial response, infer business context before asking more questions. Present inferred audience/problem/differentiation for the user to react to:
+"It sounds like this is for [audience] dealing with [problem], and your approach is different because [differentiation]. Sound right?"
+This leverages what they've already said and gives them something concrete to react to.
 **Follow the thread:**
 Based on what they said, ask follow-up questions that dig into their response. Use AskUserQuestion with options that probe what they mentioned — interpretations, clarifications, concrete examples.
-Keep following threads. Each answer opens new threads to explore. Ask about:
-- What excited them
-- What problem sparked this
-- What they mean by vague terms
-- What it would actually look like
-- What's already decided
+Track coverage against the 6-item context checklist from `questioning.md`. Probe fuzziest areas first — spend questioning budget where clarity is lowest.
-Consult `questioning.md` for techniques:
-- Challenge vagueness
-- Make abstract concrete
-- Surface assumptions
-- Find edges
-- Reveal motivation
+Use grounding questions from `questioning.md` instead of template-shaped questions. Don't ask "Who is your target audience?" — ask "Who would be your first 10 users?"
-**Check context (background, not out loud):**
+When the conversation pauses and sections are still fuzzy, use success-backward: "Imagine this is wildly successful in a year. What does that look like?"
-As you go, mentally check the context checklist from `questioning.md`. If gaps remain, weave questions naturally. Don't suddenly switch to checklist mode.
+Consult `questioning.md` for techniques — challenge vagueness, derive before asking, use grounding questions over template-shaped questions.
 **Decision gate:**
@@ -142,54 +143,31 @@ Synthesize all context into `.planning/PROJECT.md` using the template from `temp
 **For greenfield projects:**
-Initialize requirements as hypotheses:
-```markdown
-## Requirements
-### Validated
-(None yet — ship to validate)
-### Active
-- [ ] [Requirement 1]
-- [ ] [Requirement 2]
-- [ ] [Requirement 3]
-### Out of Scope
-- [Exclusion 1] — [why]
-- [Exclusion 2] — [why]
-```
-All Active requirements are hypotheses until shipped and validated.
+Initialize all sections from questioning:
+- **What This Is** — product identity from questioning
+- **Core Value** — the ONE thing from questioning
+- **Who It's For** — target audience from questioning
+- **Core Problem** — pain or desire from questioning
+- **How It's Different** — differentiators from questioning
+- **Key User Flows** — core interactions from questioning
+- **Out of Scope** — boundaries from questioning
+- **Constraints** — hard limits from questioning
+- **Technical Context** — minimal for greenfield (stack choices if discussed)
+- **Validated** — `(None yet — ship to validate)`
+- **Key Decisions** — any decisions made during questioning
 **For brownfield projects (codebase map exists):**
-Infer Validated requirements from existing code:
+Same as greenfield, plus:
 1. Read `.planning/codebase/ARCHITECTURE.md` and `STACK.md`
-2. Identify what the codebase already does
-3. These become the initial Validated set
+2. Infer Validated requirements from existing code — what the codebase already does
+3. Populate Technical Context from codebase map (stack, integrations, known debt)
 ```markdown
-## Requirements
-### Validated
+## Validated
 - ✓ [Existing capability 1] — existing
 - ✓ [Existing capability 2] — existing
-- ✓ [Existing capability 3] — existing
-### Active
-- [ ] [New requirement 1]
-- [ ] [New requirement 2]
-### Out of Scope
-- [Exclusion 1] — [why]
 ```
 **Key Decisions:**
@@ -239,7 +217,7 @@ docs: initialize [project-name]
 [One-liner from PROJECT.md]
-Creates PROJECT.md with requirements and constraints.
+Creates PROJECT.md with business context and constraints.
 EOF
 )"
 ```
@@ -261,19 +239,14 @@ Project initialized:
 ## ▶ Next Up
-Choose your path:
-**Option A: Research first** (recommended)
-Research ecosystem → define requirements → create roadmap. Discovers standard stacks, expected features, architecture patterns.
-`/ms:research-project`
+`/ms:new-milestone` — Discover what to build first, create requirements and roadmap
-**Option B: Define requirements and roadmap directly** (familiar domains)
-Skip research, define requirements and create roadmap from what you know.
+<sub>`/clear` first → fresh context window</sub>
-`/ms:create-roadmap`
+---
-<sub>`/clear` first → fresh context window</sub>
+**Also available:**
+- `/ms:create-roadmap` — Skip milestone discovery, define requirements and roadmap directly
 ---
 ```
@@ -299,9 +272,10 @@ Update `.planning/STATE.md` Last Command field (if STATE.md exists):
 <success_criteria>
-- [ ] Deep questioning completed (not rushed, threads followed)
-- [ ] PROJECT.md captures full context with evolutionary structure
-- [ ] Requirements initialized as hypotheses (greenfield) or with inferred Validated (brownfield)
+- [ ] Deep questioning completed with business context extracted
+- [ ] PROJECT.md captures product identity (What/Value/Audience/Problem/Differentiation/Flows)
+- [ ] PROJECT.md captures boundaries (Out of Scope, Constraints, Technical Context)
+- [ ] Validated requirements initialized (empty for greenfield, inferred for brownfield)
 - [ ] Key Decisions table initialized
 - [ ] config.json has subsystems and code_review settings
 - [ ] All committed to git

package/commands/ms/plan-phase.md CHANGED Viewed

@@ -12,6 +12,7 @@ allowed-tools:
   - WebFetch
   - mcp__context7__*
   - Task
+  - Skill
 ---
 <objective>
@@ -90,7 +91,9 @@ Check for `.planning/codebase/` and load relevant documents based on phase type.
    - Perform mandatory discovery (Level 0-3 as appropriate)
    - Scan project history via context scanner script (prior decisions, issues, debug resolutions, adhoc learnings, cross-milestone patterns)
    - Break phase into tasks
-   - Estimate scope and split into multiple plans if needed
+   - Propose plan grouping (plan boundaries, wave structure, budget estimates) for user review
+   - Discover relevant project skills, confirm with user
+   - Hand off tasks + proposed grouping + confirmed skills to plan-writer subagent
    - Create PLAN.md file(s) with executable structure
 **Gap closure mode (--gaps flag):**

package/commands/ms/release-notes.md CHANGED Viewed

@@ -1,11 +1,11 @@
 ---
 name: ms:release-notes
 description: Show full Mindsystem release notes with update status
-allowed-tools: [Read, Bash, WebFetch]
+allowed-tools: [Read, Bash, Task]
 ---
 <objective>
-Display all Mindsystem release notes from 2.0.0 onward in clean bullet format, with update status at the end.
+Display Mindsystem release notes from 2.0.0 through installed version in clean bullet format, oldest first, with update status at the end.
 Use when: you want to see what Mindsystem has shipped across versions, or check if an update is available.
 </objective>
@@ -31,65 +31,48 @@ Reinstall with: `npx mindsystem-cc`
 STOP here if no VERSION file.
 </step>
-<step name="fetch_changelog">
-Fetch latest CHANGELOG.md from GitHub:
+<step name="fetch_and_display">
+Spawn a Haiku general-purpose subagent with `model: "haiku"` to fetch, parse, and display the changelog.
-Use WebFetch tool with:
-- URL: `https://raw.githubusercontent.com/rolandtolnay/mindsystem/main/CHANGELOG.md`
-- Prompt: "Return the full raw markdown content of this changelog file. Do not summarize or modify."
+The subagent prompt must include the installed version and instruct it to:
-**If fetch fails:**
-Fall back to local changelog:
-```bash
-cat ~/.claude/mindsystem/CHANGELOG.md 2>/dev/null
-```
+1. **Fetch changelog** — Run `curl -sfL https://raw.githubusercontent.com/rolandtolnay/mindsystem/main/CHANGELOG.md`. If curl fails, fall back to reading `~/.claude/mindsystem/CHANGELOG.md`.
-Note to user: "Showing local changelog (couldn't reach GitHub)."
-</step>
+2. **Parse versions** — Extract all `## [X.Y.Z]` entries from 2.0.0 onward (skip `## [Unreleased]` and `## [1.x]`).
-<step name="parse_and_display">
-From the changelog content:
+3. **Filter by installed version** — Only include versions where X.Y.Z is less than or equal to the installed version (semver comparison). Also extract the latest version from the changelog for the status line.
-1. **Extract latest version** — First `## [X.Y.Z]` entry after `## [Unreleased]`
-2. **Parse all version entries from 2.0.0 onward** — Skip the collapsed `## [1.x]` section
-3. **Convert to clean bullet format** — Transform Keep-a-Changelog sections into flat bullets:
-**Format each version as:**
-```
-Version X.Y.Z:
-  - Added feature description
-  - Added another feature
-  - Changed behavior description
-  - Removed old thing
-  - Fixed bug description
-```
-**Conversion rules:**
-- Combine all `### Added`, `### Changed`, `### Removed`, `### Fixed` items under a single version header
-- Prefix each item with its category: "Added", "Changed", "Removed", "Fixed"
-- Strip bold markers from item text
-- One bullet per changelog line item
-- Blank line between versions
-- Skip `### Migration` sections
-Output all versions from newest to oldest, stopping before `## [1.x]`.
-</step>
+4. **Convert to clean bullet format:**
+   ```
+   Version X.Y.Z:
+     - Added feature description
+     - Changed behavior description
+     - Removed old thing
+     - Fixed bug description
+   ```
+   - Combine all `### Added`, `### Changed`, `### Removed`, `### Fixed` items under a single version header
+   - Prefix each item with its category: "Added", "Changed", "Removed", "Fixed"
+   - Strip bold markers from item text
+   - One bullet per changelog line item
+   - Blank line between versions
+   - Skip `### Migration` sections
-<step name="update_status">
-After all release notes, add a single status line:
+5. **Output oldest first, newest last** — Display versions starting from 2.0.0 up to the installed version.
-**Compare installed version with latest version from changelog.**
+6. **End with update status line:**
+   - If installed < latest: `Update available: vINSTALLED -> vLATEST. Run 'npx mindsystem-cc@latest' to update.`
+   - If installed == latest: `You are using the latest version (vINSTALLED).`
+   - If installed > latest: `You are ahead of the latest release (vINSTALLED > vLATEST) — development version.`
-- **If behind:** `Update available: vINSTALLED -> vLATEST. Run 'npx mindsystem-cc@latest' to update.`
-- **If current:** `You are using the latest version (vINSTALLED).`
-- **If ahead:** `You are ahead of the latest release (vINSTALLED > vLATEST) — development version.`
+Output the subagent's response directly to the user.
 </step>
 </process>
 <success_criteria>
-- [ ] Installed version read from VERSION file
-- [ ] Remote changelog fetched (or graceful fallback to local)
-- [ ] All versions from 2.0.0 onward displayed in clean bullet format
+- [ ] Delegated to Haiku subagent (no main-agent token waste)
+- [ ] Only versions ≤ installed version displayed
+- [ ] Versions displayed oldest first, newest last
+- [ ] Clean bullet format with category prefixes
 - [ ] Update status shown as single line at end
 </success_criteria>

package/mindsystem/references/plan-format.md CHANGED Viewed

@@ -43,7 +43,7 @@ Details referencing existing utilities: "Use `comparePassword` in `src/lib/crypt
 | Section | Purpose | Required |
 |---------|---------|----------|
 | `# Plan NN: Title` | H1 with plan number and descriptive title | Yes |
-| `**Subsystem:** X \| **Type:** Y` | Inline metadata (replaces YAML frontmatter) | Yes (type defaults to `execute`) |
+| `**Subsystem:** X \| **Type:** Y \| **Skills:** Z` | Inline metadata (replaces YAML frontmatter) | Yes (type defaults to `execute`, skills optional) |
 | `## Context` | Why this work exists, why this approach | Yes |
 | `## Changes` | Numbered subsections with files and implementation details | Yes |
 | `## Verification` | Bash commands and observable checks | Yes |
@@ -130,6 +130,15 @@ Matches vocabulary from the project's `config.json`. Used by the executor when g
 When `**Type:**` is omitted, the plan defaults to `execute`.
+### Skills
+| Value | Behavior |
+|-------|----------|
+| *(omitted)* | No skills loaded. Skill discovery happens during `/ms:plan-phase` — absence means none were relevant. |
+| `skill-a, skill-b` | Executor invokes listed skills via Skill tool before implementing. |
+Skills are confirmed by the user during `/ms:plan-phase` and encoded into plans. Comma-separated list of skill names matching entries in the system-reminder.
 ---
 ## Must-Haves Specification
@@ -190,6 +199,7 @@ Execution order lives in a single `EXECUTION-ORDER.md` file alongside the plans.
 | What triggers TDD lazy-loading? | `**Type:** tdd` in inline metadata |
 | How does the executor know why an approach was chosen? | Reads `## Context` section |
 | How does the executor find existing utilities? | Reads `**Files:**` lines and inline references in `## Changes` |
+| What triggers skill loading in executor? | `**Skills:**` in inline metadata. No skills loaded if omitted. |
 ---

package/mindsystem/references/prework-status.md CHANGED Viewed

@@ -102,7 +102,7 @@ ELSE:
 | Pre-work | Recommended | Status |
 |----------|-------------|--------|
-| Discuss | Unlikely | - |
+| Discuss | Unlikely | ✓ Done |
 | Design | Unlikely | - |
 | Research | Likely | ✓ Done |

package/mindsystem/references/principles.md CHANGED Viewed

@@ -34,9 +34,9 @@ Plans must complete within reasonable context usage.
 - 70%+ context: Poor quality
 **Solution:** Budget-aware consolidation - prefer larger plans, calibrate from real data.
-- Budget-based grouping (sum of weights within 45%)
+- Orchestrator proposes grouping (weight heuristics, target 25-45%), plan-writer validates structurally
 - Each plan independently executable
-- Consolidate plans under ~10% with related work
+- Bias toward consolidation — fewer plans, less overhead
 </scope_control>
 <claude_automates>

package/mindsystem/references/questioning.md CHANGED Viewed

@@ -16,9 +16,10 @@ Don't interrogate. Collaborate. Don't follow a script. Follow the thread.
 By the end of questioning, you need enough clarity to write a PROJECT.md that downstream phases can act on:
-- **research-project** needs: what domain to research, what the user already knows, what unknowns exist
-- **create-roadmap** needs: clear enough vision to decompose into phases, what "done" looks like
-- **plan-phase** needs: specific requirements to break into tasks, context for implementation choices
+- **research-project** needs: product domain, target audience, core problem, what unknowns exist
+- **create-roadmap** needs: clear vision with business context grounding — who it's for, what problem it solves, how it's different — to decompose into phases
+- **plan-phase** needs: audience context for task weighting, specific requirements for implementation choices
+- **discuss-phase** needs: business context for PO-style analysis — audience, problem, differentiation inform scope recommendations
 - **execute-phase** needs: success criteria to verify against, the "why" behind requirements
 A vague PROJECT.md forces every downstream phase to guess. The cost compounds.
@@ -37,13 +38,17 @@ A vague PROJECT.md forces every downstream phase to guess. The cost compounds.
 **Clarify ambiguity.** "When you say Z, do you mean A or B?" "You mentioned X — tell me more."
+**Brownfield reframing.** For brownfield projects, reframe to product-level before feature-level. "Tell me about this project" not "What do you want to build?" Users with existing codebases default to describing the next feature — redirect to the product as a whole first.
+**Derive before asking.** Infer business context from feature descriptions before asking directly. "You described X, Y, Z — it sounds like this is for [audience] dealing with [problem]. Sound right?" This leverages what they've already said and gives them something concrete to react to.
 **Know when to stop.** When you understand what they want, why they want it, who it's for, and what done looks like — offer to proceed.
 </how_to_question>
 <highest_leverage>
-These three questions unlock the most downstream value. Everything else is nice-to-have context.
+These four questions unlock the most downstream value. Everything else is nice-to-have context.
 1. **"What does done look like?"** — Without observable outcomes, every downstream phase guesses scope. Roadmap can't decompose, plans can't verify, execution can't stop.
@@ -51,7 +56,9 @@ These three questions unlock the most downstream value. Everything else is nice-
 3. **"What already exists / what can't change?"** — Constraints and existing code prevent planning in a vacuum. Surfaces brownfield reality, API limitations, time pressure.
-If you only get three answers, get these three.
+4. **"How will you know this is a success?"** — Reveals what actually matters: commercial viability, reliability, user experience, personal satisfaction. Informs Core Value and the weighting of all sections.
+If you only get four answers, get these four.
 </highest_leverage>
@@ -109,16 +116,49 @@ User mentions "frustrated with current tools"
 </using_askuserquestion>
+<clarity_adaptive>
+Clarity is non-uniform across dimensions. Track per-section, not globally. Spend questioning budget where clarity is lowest.
+**High clarity signals:** Specific demographics, named competitors, concrete scenarios, quantifiable outcomes, clear success metrics.
+→ Confirm and move on. Probe one level deeper to test robustness at most.
+**Low clarity signals:** Broad categories ("developers", "small businesses"), vague benefits ("makes things easier"), no competitor awareness ("nothing else does this"), feature-focused language, hedging ("I think", "maybe").
+→ Offer frameworks. Provide concrete options via AskUserQuestion with derived options. Use scenarios to ground.
+If audience is crystal-clear but differentiation is fuzzy, probe differentiation — don't revisit audience.
+</clarity_adaptive>
+<grounding_questions>
+Grounding questions produce better answers than template-shaped questions. Use these instead of asking directly for template sections.
+| Section | Don't Ask | Ask Instead |
+|---------|-----------|-------------|
+| Who It's For | "Who is your target audience?" | "Who would be your first 10 users — real people you'd tell tomorrow?" |
+| Core Problem | "What problem does this solve?" | "What triggered you to want to build this? What's broken today?" |
+| How It's Different | "What's your USP?" | "What are people using today instead? What's wrong with it?" |
+| Core Value | "What's most important?" | "If only ONE thing worked perfectly and everything else was mediocre, what would that one thing be?" |
+| Key User Flows | "What are the key flows?" | "Walk me through a session. You open the app — then what?" |
+| Success | "How do you define success?" | "Imagine this is wildly successful in a year. What does that look like?" |
+People articulate by reacting, not generating.
+</grounding_questions>
 <context_checklist>
 Use this as a **background checklist**, not a conversation structure. Check these mentally as you go. If gaps remain, weave questions naturally.
 - [ ] What they're building (concrete enough to explain to a stranger)
 - [ ] Why it needs to exist (the problem or desire driving it)
-- [ ] Who it's for (even if just themselves)
-- [ ] What "done" looks like (observable outcomes)
+- [ ] Who it's for (specific enough to find 10 of these people)
+- [ ] What makes it different (from alternatives, even manual ones)
+- [ ] What users actually do (2-3 core interactions)
+- [ ] What success looks like (how they'll know it worked)
-Four things. If they volunteer more, capture it.
+Six things. If they volunteer more, capture it.
 </context_checklist>
@@ -148,6 +188,8 @@ Loop until "Create PROJECT.md" selected.
 - **Shallow acceptance** — Taking vague answers without probing
 - **Premature constraints** — Asking about tech stack before understanding the idea
 - **User skills** — NEVER ask about user's technical experience. Claude builds.
+- **Accepting "everyone" as audience** — Always narrow: "Who needs this MOST?" Broad audiences are a sign of fuzzy thinking, not universal appeal.
+- **Skipping differentiation** — "Nothing else does this" is almost always wrong. Probe alternatives including manual workarounds, spreadsheets, competitor products.
 </anti_patterns>

package/mindsystem/references/scope-estimation.md CHANGED Viewed

@@ -27,7 +27,7 @@ Why 50% not 80%?
 </context_target>
 <task_rule>
-**Budget-based grouping.** Classify tasks by weight, then pack plans until budget full.
+**Budget-aware grouping.** The orchestrator proposes plan boundaries using weight heuristics and contextual judgment.
 | Weight | Cost | Examples |
 |--------|------|----------|
@@ -35,9 +35,9 @@ Why 50% not 80%?
 | Medium | 10% | CRUD endpoints, widget extraction, single-file refactoring |
 | Heavy | 20% | Complex business logic, architecture changes, multi-file integrations |
-**Grouping rule:** `sum(weights) <= 45%`. Pack tasks by feature affinity until budget full. Bias toward consolidation — fewer plans, less overhead.
+**For the orchestrator (grouping authority):** Use weight estimates as guidance when proposing plan boundaries. Target 25-45% per plan. Bias toward consolidation — fewer plans, less overhead. Consider that sequential-only work (no parallelism benefit from splitting) can be grouped more aggressively. A single light task alone wastes executor overhead — consolidate with related work.
-**Minimum plan threshold:** Plans under ~10% → consolidate with related work in the same wave. A single light task alone wastes executor overhead.
+**For the plan-writer (structural validation):** Classify weights for the grouping rationale table. Do NOT re-group based on budget math — the orchestrator already considered context budget with user input. Deviate only for structural issues (file conflicts, circular dependencies, missing dependency chains).
 </task_rule>
 <tdd_plans>
@@ -64,7 +64,6 @@ See `~/.claude/mindsystem/references/tdd.md` for TDD plan structure.
 <split_signals>
 <always_split>
-- **Budget sum exceeds 45%** - Budget overflow regardless of task count
 - **Multiple subsystems** - DB + API + UI = separate plans
 - **Any task with >5 file modifications** - Split by file groups
 - **Discovery + verification in separate plans** - Don't mix exploratory and implementation work
@@ -163,12 +162,11 @@ Tasks: 8 (models, migrations, API, JWT, middleware, hashing, login form, registe
 Result: Task 1-3 good, Task 4-5 degrading, Task 6-8 rushed
 ```
-**Good - Budget-aware plans:**
+**Good - Consolidated plans:**
 ```
-Plan 1: "Auth Database + Config" (4L+1M = ~30%)
-Plan 2: "Auth API" (3M = ~30%)
-Plan 3: "Auth UI" (1H+1M = ~30%)
-Each: within 45% budget, peak quality, atomic commits
+Plan 1: "Auth Foundation" (models + config + middleware = ~35%)
+Plan 2: "Auth Endpoints + UI" (API + forms + wiring = ~40%)
+Each: within reasonable budget, peak quality, atomic commits
 ```
 **Bad - Horizontal layers (sequential):**
@@ -191,7 +189,9 @@ Waves: [01, 02, 03] (all parallel)
 </anti_patterns>
 <estimating_context>
-Weight estimates are heuristics for the plan-writer to bias toward consolidation, not precise predictions. Actual context usage depends on model, task complexity, file sizes, and context window. Calibrate from real execution data — when plans consistently finish with headroom, pack more aggressively.
+Weight estimates are heuristics for the orchestrator's grouping judgment, not mechanical constraints. Actual context usage depends on model, task complexity, file sizes, and context window. Calibrate from real execution data — when plans consistently finish with headroom, the orchestrator should group more aggressively.
+The plan-writer uses weight classifications for the grouping rationale table (transparency), not as a reason to override the orchestrator's proposed boundaries.
 </estimating_context>
 <summary>
@@ -203,9 +203,8 @@ Weight estimates are heuristics for the plan-writer to bias toward consolidation
 **The principle:** Fewer executors, same quality, less overhead. Bias toward consolidation.
 **The rules:**
-- Group by weight budget (`sum(weights) <= 45%`), not by fixed task count.
-- Consolidate plans under ~10% with related same-wave work.
-- Split when budget sum exceeds 45%.
+- Orchestrator proposes grouping using weight heuristics and contextual judgment (target 25-45%).
+- Plan-writer validates structurally (file conflicts, circular deps) — deviates only for structural issues.
 - Vertical slices over horizontal layers.
 - Dependencies centralized in EXECUTION-ORDER.md.
 - Autonomous plans get parallel execution.

package/mindsystem/templates/phase-prompt.md CHANGED Viewed

@@ -16,17 +16,16 @@ read `~/.claude/mindsystem/references/plan-format.md`.
 ## Scope Guidance
-**Plan sizing — budget-based grouping:**
+**Plan sizing — orchestrator proposes, plan-writer validates:**
-Weight classifications (L: 5%, M: 10%, H: 20%) defined in scope-estimation.md.
+Weight classifications (L: 5%, M: 10%, H: 20%) defined in scope-estimation.md. The orchestrator uses these as guidance when proposing plan boundaries to the user. Target 25-45% per plan. Bias toward consolidation.
-Grouping rule: `sum(weights) <= 45%`. Target 25-45% per plan. Plans under ~10% → consolidate with related same-wave work. Bias toward consolidation.
+The plan-writer receives the proposed grouping and validates structurally (file conflicts, circular deps). It does NOT re-group based on budget math.
-**When to split:**
+**When to split (orchestrator judgment):**
 - Different subsystems (auth vs API vs UI)
-- Budget sum exceeds 45%
-- Risk of context overflow
+- Risk of context overflow (contextual judgment, not mechanical formula)
 - TDD candidates — separate plans (inherently heavy-weight)
 **Vertical slices preferred:**