maxsimcli 4.0.0 → 4.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,104 +1,71 @@
  ---
  name: systematic-debugging
- description: Use when encountering any bug, test failure, or unexpected behavior — requires root cause investigation before attempting any fix
+ description: >-
+   Investigates bugs through systematic root-cause analysis: reproduce, hypothesize,
+   isolate, verify, fix, confirm. Use when encountering any bug, test failure,
+   unexpected behavior, or error message.
  ---

  # Systematic Debugging

- Random fixes waste time and create new bugs. Find the root cause first.
+ Find the root cause first. Random fixes waste time and create new bugs.

- **If you have not identified the root cause, you are guessing not debugging.**
+ **HARD GATE -- No fix attempts without understanding root cause. If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.**

- ## The Iron Law
+ ## Process

- <HARD-GATE>
- NO FIX ATTEMPTS WITHOUT UNDERSTANDING ROOT CAUSE.
- If you have not completed the REPRODUCE and HYPOTHESIZE steps, you CANNOT propose a fix.
- "Let me just try this" is guessing, not debugging.
- Violating this rule is a violation — not a time-saving shortcut.
- </HARD-GATE>
-
- ## The Gate Function
-
- Follow these steps IN ORDER for every bug, test failure, or unexpected behavior.
-
- ### 1. REPRODUCE — Confirm the Problem
+ ### 1. REPRODUCE -- Confirm the Problem

  - Run the failing command or test. Capture the EXACT error output.
  - Can you trigger it reliably? What are the exact steps?
- - If not reproducible: gather more data do not guess.
-
- ```bash
- # Example: reproduce a test failure
- npx vitest run path/to/failing.test.ts
- ```
+ - If not reproducible: gather more data -- do not guess.

- ### 2. HYPOTHESIZE Form a Theory
+ ### 2. HYPOTHESIZE -- Form a Theory

- - Read the error message COMPLETELY (stack trace, line numbers, exit codes)
- - Check recent changes: `git diff`, recent commits, new dependencies
+ - Read the error message completely (stack trace, line numbers, exit codes).
+ - Check recent changes: `git diff`, recent commits, new dependencies.
  - Trace data flow: where does the bad value originate?
- - State your hypothesis clearly: "I think X is the root cause because Y"
+ - State your hypothesis clearly: "I think X is the root cause because Y."

- ### 3. ISOLATE Narrow the Scope
+ ### 3. ISOLATE -- Narrow the Scope

- - Find the SMALLEST reproduction case
- - In multi-component systems, add diagnostic logging at each boundary
- - Identify which SPECIFIC layer or component is failing
- - Compare against working examples in the codebase
+ - Find the smallest reproduction case.
+ - In multi-component systems, add diagnostic logging at each boundary.
+ - Identify which specific layer or component is failing.
+ - Compare against working examples in the codebase.

- ### 4. VERIFY Test Your Hypothesis
+ ### 4. VERIFY -- Test Your Hypothesis

- - Make the SMALLEST possible change to test your hypothesis
- - Change ONE variable at a time never multiple things simultaneously
- - If hypothesis is wrong: form a NEW hypothesis, do not stack fixes
+ - Make the smallest possible change to test your hypothesis.
+ - Change one variable at a time -- never multiple things simultaneously.
+ - If hypothesis is wrong: form a new hypothesis, do not stack fixes.

- ### 5. FIX Address the Root Cause
+ ### 5. FIX -- Address the Root Cause

- - Write a failing test that reproduces the bug (see TDD skill)
- - Implement a SINGLE fix that addresses the root cause
- - No "while I'm here" improvements fix only the identified issue
+ - Write a failing test that reproduces the bug.
+ - Implement a single fix that addresses the root cause.
+ - No "while I'm here" improvements -- fix only the identified issue.

- ### 6. CONFIRM Verify the Fix
+ ### 6. CONFIRM -- Verify the Fix

- - Run the original failing test: it must now pass
- - Run the full test suite: no regressions
- - Verify the original error no longer occurs
+ - Run the original failing test: it must now pass.
+ - Run the full test suite: no regressions.
+ - Verify the original error no longer occurs.

- ```bash
- # Confirm the specific fix
- npx vitest run path/to/fixed.test.ts
- # Confirm no regressions
- npx vitest run
- ```
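
The REPRODUCE and CONFIRM steps both reduce to the same discipline: run the exact command and capture its output and exit code before reasoning about it. A minimal shell sketch, assuming a POSIX shell; `false` stands in for the real failing command (the package's own removed examples use `npx vitest run`):

```bash
#!/usr/bin/env sh
# Hedged sketch of the REPRODUCE step: capture exact output and exit code
# before hypothesizing. "false" is a placeholder for the real repro command,
# e.g. `npx vitest run path/to/failing.test.ts`.
repro_cmd="false"
output=$($repro_cmd 2>&1)
status=$?
echo "exit code: $status"
if [ "$status" -ne 0 ]; then
  echo "reproduced -- proceed to HYPOTHESIZE"
else
  echo "not reproduced -- gather more data, do not guess"
fi
```

Rerunning the same wrapper after the fix turns it into the CONFIRM check: the exit code must flip to 0.
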
+ ## Common Pitfalls

- ## Common Rationalizations REJECT THESE
-
- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "I think I know what it is" | Thinking is not evidence. Reproduce first, then hypothesize. |
- | "Let me just try this fix" | "Just try" = guessing. You have skipped REPRODUCE and HYPOTHESIZE. |
- | "Quick patch for now, investigate later" | "Later" never comes. Patches mask the real problem. |
+ | Excuse | Reality |
+ |--------|---------|
+ | "I think I know what it is" | Thinking is not evidence. Reproduce first. |
+ | "Let me just try this fix" | That is guessing. Complete REPRODUCE and HYPOTHESIZE first. |
  | "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
- | "The issue is simple, I don't need the process" | Simple bugs have root causes too. The process is fast for simple bugs. |
- | "I'm under time pressure" | Systematic debugging IS faster than guess-and-check thrashing. |
- | "The reference is too long, I'll skim it" | Partial understanding guarantees partial fixes. Read it completely. |
-
- ## Red Flags — STOP If You Catch Yourself:
-
- - Changing code before reproducing the error
- - Proposing a fix before reading the full error message and stack trace
- - Trying random fixes hoping one will work
- - Changing multiple things simultaneously
- - Saying "it's probably X" without evidence
- - Applying a fix that did not work, then adding another fix on top
- - On your 3rd failed fix attempt (this signals an architectural problem — escalate)
+ | "The issue is simple" | Simple bugs have root causes too. The process is fast for simple bugs. |

- **If any red flag triggers: STOP. Return to step 1 (REPRODUCE).**
+ Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once. If any of these triggers, return to step 1.

- **If 3+ fix attempts have failed:** The issue is likely architectural, not a simple bug. Document what you have tried and escalate to the user for a design decision.
+ If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried and escalate to the user.

- ## Verification Checklist
+ ## Verification

  Before claiming a bug is fixed, confirm:

@@ -110,9 +77,9 @@ Before claiming a bug is fixed, confirm:
  - [ ] The full test suite passes (no regressions)
  - [ ] The original error no longer occurs when running the original steps

- ## Debugging in MAXSIM Context
+ ## MAXSIM Integration

  When debugging during plan execution, MAXSIM deviation rules apply:
  - **Rule 1 (Auto-fix bugs):** You may auto-fix bugs found during execution, but you must still follow this debugging process.
- - **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint this is an architectural decision for the user.
+ - **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint -- this is an architectural decision for the user.
  - Track all debugging deviations for SUMMARY.md documentation.
@@ -1,93 +1,70 @@
  ---
  name: tdd
- description: Use when implementing any feature or bug fix — requires writing a failing test before any implementation code
+ description: >-
+   Enforces test-driven development with the Red-Green-Refactor cycle: write a
+   failing test first, implement minimal code to pass, then refactor. Use when
+   implementing features, fixing bugs, or adding new behavior.
  ---

  # Test-Driven Development (TDD)

  Write the test first. Watch it fail. Write minimal code to pass. Clean up.

- **If you did not watch the test fail, you do not know if it tests the right thing.**
+ **HARD GATE: No implementation code without a failing test first. If you wrote production code before the test, delete it and start over. No exceptions.**

- ## The Iron Law
+ ## Process

- <HARD-GATE>
- NO IMPLEMENTATION CODE WITHOUT A FAILING TEST FIRST.
- If you wrote production code before the test, DELETE IT. Start over.
- No exceptions. No "I'll add tests after." No "keep as reference."
- Violating this rule is a violation — not a judgment call.
- </HARD-GATE>
+ ### 1. RED -- Write One Failing Test

- ## The Gate Function
-
- Follow this cycle for every behavior change, feature addition, or bug fix.
-
- ### 1. RED — Write Failing Test
-
- - Write ONE minimal test that describes the desired behavior
+ - Write ONE minimal test describing the desired behavior
  - Test name describes what SHOULD happen, not implementation details
- - Use real code paths mocks only when unavoidable (external APIs, databases)
+ - Use real code paths -- mocks only when unavoidable (external APIs, databases)

- ### 2. VERIFY RED Run the Test
+ ### 2. VERIFY RED -- Run the Test

- ```bash
- # Run the test suite for this file
- npx vitest run path/to/test.test.ts
- ```
-
- - Test MUST fail (not error — fail with an assertion)
+ - Test MUST fail with an assertion (not error out from syntax or imports)
  - Failure message must match the missing behavior
- - If test passes immediately: you are testing existing behavior rewrite it
+ - If the test passes immediately, you are testing existing behavior -- rewrite it
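
The VERIFY RED gate can be made mechanical. A hedged sketch, assuming a CLI test runner; `false` is a placeholder for the real test command (the removed example used `npx vitest run path/to/test.test.ts`):

```bash
#!/usr/bin/env sh
# Hedged sketch of the VERIFY RED gate: refuse to proceed unless the new
# test actually fails. "false" is a placeholder for the real test command.
test_cmd="false"
if $test_cmd; then
  echo "GATE VIOLATION: test passed on first run -- rewrite the test"
else
  echo "RED confirmed -- proceed to GREEN"
fi
```
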
- ### 3. GREEN Write Minimal Code
+ ### 3. GREEN -- Write Minimal Code

  - Write the SIMPLEST code that makes the test pass
  - Do NOT add features the test does not require
- - Do NOT refactor yet — that comes next
-
- ### 4. VERIFY GREEN — Run All Tests
+ - Do NOT refactor yet

- ```bash
- npx vitest run
- ```
+ ### 4. VERIFY GREEN -- Run All Tests

  - The new test MUST pass
  - ALL existing tests MUST still pass
- - If any test fails: fix code, not tests
+ - If any test fails, fix code -- not tests

- ### 5. REFACTOR Clean Up (Tests Still Green)
+ ### 5. REFACTOR -- Clean Up (Tests Still Green)

  - Remove duplication, improve names, extract helpers
- - Run tests after every change — they must stay green
+ - Run tests after every change
  - Do NOT add new behavior during refactor

- ### 6. REPEAT Next failing test for next behavior
+ ### 6. REPEAT -- Next failing test for next behavior

- ## Common Rationalizations — REJECT THESE
+ ## Common Pitfalls

- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "Too simple to test" | Simple code breaks. The test takes 30 seconds to write. |
- | "I'll add tests after" | Tests written after pass immediately they prove nothing. |
- | "The test framework isn't set up yet" | Set it up. That is part of the task, not a reason to skip. |
+ | Excuse | Why It Fails |
+ |--------|-------------|
+ | "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
+ | "I'll add tests after" | Tests written after pass immediately -- they prove nothing. |
  | "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
- | "TDD is slower for this task" | TDD is faster than debugging. Every "quick skip" creates debt. |
+ | "TDD is slower" | TDD is faster than debugging. Every skip creates debt. |
  | "Let me keep the code as reference" | You will adapt it instead of writing test-first. Delete means delete. |
- | "I need to explore the design first" | Explore, then throw it away. Start implementation with TDD. |

- ## Red Flags STOP If You Catch Yourself:
+ Stop immediately if you catch yourself:

  - Writing implementation code before writing a test
- - Writing a test that passes on the first run (you are testing existing behavior)
- - Skipping the VERIFY RED step ("I know it will fail")
+ - Writing a test that passes on the first run
+ - Skipping the VERIFY RED step
  - Adding features beyond what the current test requires
- - Skipping the REFACTOR step to save time
- - Rationalizing "just this once" or "this is different"
- - Keeping pre-TDD code "as reference" while writing tests
+ - Keeping pre-TDD code "as reference"

- **If any red flag triggers: STOP. Delete the implementation. Write the test first.**
-
- ## Verification Checklist
+ ## Verification

  Before claiming TDD compliance, confirm:

@@ -99,20 +76,10 @@ Before claiming TDD compliance, confirm:
  - [ ] All tests pass after implementation
  - [ ] Refactoring (if any) did not break any tests

- Cannot check all boxes? You skipped TDD. Start over.
-
- ## When Stuck
-
- | Problem | Solution |
- |---------|----------|
- | Don't know how to test it | Write the assertion first. What should the output be? |
- | Test setup is too complex | The design is too complex. Simplify the interface. |
- | Must mock everything | Code is too coupled. Use dependency injection. |
- | Existing code has no tests | Add tests for the code you are changing. Start the cycle now. |
-
- ## Integration with MAXSIM
+ ## MAXSIM Integration

  In MAXSIM plan execution, tasks marked `tdd="true"` follow this cycle with per-step commits:
+
  - **RED commit:** `test({phase}-{plan}): add failing test for [feature]`
  - **GREEN commit:** `feat({phase}-{plan}): implement [feature]`
  - **REFACTOR commit (if changes made):** `refactor({phase}-{plan}): clean up [feature]`
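
The three commit templates above can be filled in mechanically; a minimal sketch with placeholder phase, plan, and feature values:

```bash
# Hedged sketch: composing the per-step commit messages from the templates
# above. The phase, plan, and feature values are placeholders.
phase="01"; plan="auth"; feature="login validation"
printf 'test(%s-%s): add failing test for %s\n' "$phase" "$plan" "$feature"
printf 'feat(%s-%s): implement %s\n' "$phase" "$plan" "$feature"
printf 'refactor(%s-%s): clean up %s\n' "$phase" "$plan" "$feature"
```
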
@@ -1,36 +1,33 @@
  ---
  name: using-maxsim
- description: Entry skill that establishes MAXSIM workflow rules — triggers before any action to route work through the correct MAXSIM commands, skills, and agents
  alwaysApply: true
+ description: >-
+   Routes all work through MAXSIM's spec-driven workflow: checks for planning
+   directory, determines active phase, and dispatches to the correct MAXSIM
+   command. Use when starting any work session, resuming work, or when unsure
+   which MAXSIM command to run.
  ---

  # Using MAXSIM

- MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks not ad-hoc coding.
+ MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks -- not ad-hoc coding.

- **If you are about to write code without a plan, STOP. Route through MAXSIM first.**
-
- ## The Iron Law
-
- <HARD-GATE>
- NO IMPLEMENTATION WITHOUT A PLAN.
- If there is no .planning/ directory, run `/maxsim:init` first.
+ **HARD GATE -- No implementation without a plan.**
+ If there is no `.planning/` directory, run `/maxsim:init` first.
  If there is no current phase, run `/maxsim:plan-phase` first.
  If there is no PLAN.md for the current phase, run `/maxsim:plan-phase` first.
  If there IS a plan, run `/maxsim:execute-phase` to execute it.
- Skipping the workflow is a violation — not a shortcut.
- </HARD-GATE>

- ## When This Skill Triggers
+ ## Process

- This skill applies to ALL work sessions. Before starting any task:
+ Before starting any task:

- 1. **Check for `.planning/` directory** if missing, initialize with `/maxsim:init`
- 2. **Check STATE.md** resume from last checkpoint if one exists
- 3. **Check current phase** determine what phase is active in ROADMAP.md
- 4. **Route to the correct command** based on the situation (see routing table below)
+ 1. **Check for `.planning/` directory** -- if missing, initialize with `/maxsim:init`
+ 2. **Check STATE.md** -- resume from last checkpoint if one exists
+ 3. **Check current phase** -- determine what phase is active in ROADMAP.md
+ 4. **Route to the correct command** based on the routing table below
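
The four checks above can be sketched as a routing function. This is an illustration only: the file locations `.planning/STATE.md` and `.planning/PLAN.md` are assumptions (MAXSIM may keep per-phase plans elsewhere), and only a few routes are modeled.

```bash
#!/usr/bin/env sh
# Hedged sketch of the pre-task routing checks. Paths are assumptions
# based on the conventions described above, not MAXSIM's actual layout.
route() {
  dir="$1"
  if [ ! -d "$dir/.planning" ]; then echo "/maxsim:init"
  elif grep -qs "checkpoint" "$dir/.planning/STATE.md"; then echo "/maxsim:resume-work"
  elif [ ! -f "$dir/.planning/PLAN.md" ]; then echo "/maxsim:plan-phase"
  else echo "/maxsim:execute-phase"
  fi
}
tmp=$(mktemp -d)
route "$tmp"                                  # -> /maxsim:init
mkdir -p "$tmp/.planning" && touch "$tmp/.planning/PLAN.md"
route "$tmp"                                  # -> /maxsim:execute-phase
rm -rf "$tmp"
```
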

- ## Routing Table
+ ### Routing Table

  | Situation | Route To |
  |-----------|----------|
@@ -39,18 +36,18 @@ This skill applies to ALL work sessions. Before starting any task:
  | Active phase has no PLAN.md | `/maxsim:plan-phase` |
  | Active phase has PLAN.md, not started | `/maxsim:execute-phase` |
  | Checkpoint exists in STATE.md | `/maxsim:resume-work` |
- | Bug found during execution | `/maxsim:debug` (triggers systematic-debugging skill) |
- | Phase complete, needs verification | `/maxsim:verify-phase` (triggers verification-before-completion skill) |
+ | Bug found during execution | `/maxsim:debug` |
+ | Phase complete, needs verification | `/maxsim:verify-phase` |
  | Quick standalone task | `/maxsim:quick` |
  | User asks for help | `/maxsim:help` |

- ## Available Skills
+ ### Available Skills

  Skills are behavioral rules that activate automatically based on context:

  | Skill | Triggers When |
  |-------|---------------|
- | `using-maxsim` | Always (alwaysApply) entry point for all MAXSIM work |
+ | `using-maxsim` | Always (alwaysApply) -- entry point for all MAXSIM work |
  | `systematic-debugging` | Any bug, test failure, or unexpected behavior encountered |
  | `tdd` | Implementing any feature or bug fix (write test first) |
  | `verification-before-completion` | Before claiming any work is complete or passing |
@@ -60,7 +57,7 @@ Skills are behavioral rules that activate automatically based on context:
  | `simplify` | When reviewing and cleaning up code changes |
  | `code-review` | When reviewing implementation quality |

- ## Available Agents
+ ### Available Agents

  Agents are specialized subagent prompts spawned by MAXSIM commands:

@@ -80,17 +77,7 @@ Agents are specialized subagent prompts spawned by MAXSIM commands:
  | `maxsim-codebase-mapper` | Maps codebase structure | `/maxsim:init` |
  | `maxsim-integration-checker` | Checks integration points | `/maxsim:verify-phase` |

- ## Common Rationalizations — REJECT THESE
-
- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "It's just a small fix" | Small fixes have context and consequences. Use `/maxsim:quick`. |
- | "I know what to do, I don't need a plan" | Plans catch what you miss. The plan is the checkpoint. |
- | "MAXSIM overhead is too much for this" | `/maxsim:quick` exists for lightweight tasks. Use it. |
- | "I'll plan it in my head" | Plans in your head die with context. Write them down. |
- | "The user said 'just do it'" | Route through `/maxsim:quick` — it is fast and still tracked. |
-
- ## Red Flags — STOP If You Catch Yourself:
+ ## Common Pitfalls

  - Writing implementation code without a PLAN.md
  - Skipping `/maxsim:init` because "the project is simple"
@@ -99,22 +86,22 @@ Agents are specialized subagent prompts spawned by MAXSIM commands:
  - Making architectural decisions without documenting them in STATE.md
  - Finishing work without running verification

- **If any red flag triggers: STOP. Check the routing table. Follow the workflow.**
+ **If any of these occur: stop, check the routing table, follow the workflow.**

- ## Verification Checklist
+ ## Verification

  Before ending any work session:

  - [ ] All work was routed through MAXSIM commands (not ad-hoc)
  - [ ] STATE.md reflects current progress and decisions
- - [ ] Any bugs encountered were debugged systematically (not guessed)
+ - [ ] Any bugs encountered were debugged systematically
  - [ ] Tests were written before implementation (TDD)
  - [ ] Completion claims have verification evidence
  - [ ] Recurring patterns or errors were saved to memory

- ## Integration with CLAUDE.md
+ ## MAXSIM Integration

  When a project has a `CLAUDE.md`, both apply:
  - `CLAUDE.md` defines project-specific conventions (language, tools, style)
  - MAXSIM skills define workflow rules (how work is structured and verified)
- - If they conflict, `CLAUDE.md` project conventions take priority for code style; MAXSIM takes priority for workflow structure
+ - If they conflict, `CLAUDE.md` takes priority for code style; MAXSIM takes priority for workflow structure
@@ -1,36 +1,28 @@
  ---
  name: verification-before-completion
- description: Use before claiming any work is complete, fixed, or passing — requires running verification commands and reading output before making success claims
+ description: >-
+   Requires running verification commands and reading actual output before making
+   any completion claims. Use when claiming work is done, tests pass, builds
+   succeed, or bugs are fixed. Prevents false completion claims.
  ---

  # Verification Before Completion

- Claiming work is complete without verification is dishonesty, not efficiency.
+ Evidence before claims, always.

- **Evidence before claims, always.**
+ **HARD GATE -- No completion claims without fresh verification evidence. If you have not run the verification command in this turn, you cannot claim it passes. "Should work" is not evidence. "I'm confident" is not evidence.**

- ## The Iron Law
+ ## Process

- <HARD-GATE>
- NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
- If you have not run the verification command in this turn, you CANNOT claim it passes.
- "Should work" is not evidence. "I'm confident" is not evidence.
- Violating this rule is a violation — not a special case.
- </HARD-GATE>
+ Before claiming any status or marking a task done:

- ## The Gate Function
-
- BEFORE claiming any status, expressing satisfaction, or marking a task done:
-
- 1. **IDENTIFY:** What command proves this claim?
- 2. **RUN:** Execute the FULL command (fresh, in this turn — not a previous run)
- 3. **READ:** Read the FULL output. Check the exit code. Count failures.
- 4. **VERIFY:** Does the output actually confirm the claim?
-    - If NO: State the actual status with evidence
-    - If YES: State the claim WITH the evidence
- 5. **CLAIM:** Only now may you assert completion
-
- **Skip any step = lying, not verifying.**
+ 1. **IDENTIFY** -- What command proves this claim?
+ 2. **RUN** -- Execute the full command fresh in this turn (not a previous run)
+ 3. **READ** -- Read the full output, check the exit code, count failures
+ 4. **VERIFY** -- Does the output actually confirm the claim?
+    - If NO: state the actual status with evidence
+    - If YES: state the claim with the evidence
+ 5. **CLAIM** -- Only now may you assert completion

  ### Evidence Block Format

@@ -43,35 +35,11 @@ OUTPUT: [relevant excerpt of actual output]
  VERDICT: PASS | FAIL
  ```
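
Producing the evidence block can be scripted. A hedged sketch; `true` stands in for the real verification command, and the COMMAND and EXIT CODE field names are assumed from context, since this hunk only shows the OUTPUT and VERDICT lines of the format:

```bash
#!/usr/bin/env sh
# Hedged sketch: run the verification command fresh and emit an evidence
# block. "true" is a placeholder for the real command, e.g. a test or
# build invocation; field names before OUTPUT are assumptions.
cmd="true"
output=$($cmd 2>&1)
status=$?
echo "COMMAND: $cmd"
echo "EXIT CODE: $status"
echo "OUTPUT: ${output:-<none>}"
if [ "$status" -eq 0 ]; then echo "VERDICT: PASS"; else echo "VERDICT: FAIL"; fi
```
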

- This format is required for task completion claims in MAXSIM plan execution. It is NOT required for intermediate status updates like "I have read the file" or "here is the plan."
-
- ## Common Rationalizations — REJECT THESE
-
- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "Should work now" | "Should" is not evidence. RUN the command. |
- | "I'm confident in the logic" | Confidence is not evidence. Run it. |
- | "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
- | "Just this once" | NO EXCEPTIONS. This is the rule, not a guideline. |
- | "I only changed one line" | One line can break everything. Verify. |
- | "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
- | "Partial check is enough" | Partial proves nothing about the unchecked parts. |
-
- ## Red Flags — STOP If You Catch Yourself:
+ This format is required for task completion claims in MAXSIM plan execution. It is not required for intermediate status updates.

- - Using "should", "probably", "seems to", or "looks good" about unverified work
- - Expressing satisfaction ("Great!", "Perfect!", "Done!") before running verification
- - About to commit or push without running the test/build command in THIS turn
- - Trusting a subagent's completion report without independent verification
- - Thinking "the last run was clean, I only changed one line"
- - About to mark a MAXSIM task as done without running the `<verify>` block
- - Relying on a previous turn's test output as current evidence
+ ### What Counts as Verification

- **If any red flag triggers: STOP. Run the command. Read the output. THEN make the claim.**
-
- ## What Counts as Verification
-
- | Claim | Requires | NOT Sufficient |
+ | Claim | Requires | Not Sufficient |
  |-------|----------|----------------|
  | "Tests pass" | Test command output showing 0 failures | Previous run, "should pass", partial run |
  | "Build succeeds" | Build command with exit code 0 | Linter passing, "logs look clean" |
@@ -79,7 +47,19 @@ This format is required for task completion claims in MAXSIM plan execution. It
  | "Task is complete" | All done criteria checked with evidence | "I implemented everything in the plan" |
  | "No regressions" | Full test suite passing | "I only changed one file" |

- ## Verification Checklist
+ ## Common Pitfalls
+
+ | Excuse | Why It Fails |
+ |--------|-------------|
+ | "Should work now" | "Should" is not evidence. Run the command. |
+ | "I'm confident in the logic" | Confidence is not evidence. Run it. |
+ | "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
+ | "I only changed one line" | One line can break everything. Verify. |
+ | "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
+
+ Stop if you catch yourself using "should", "probably", or "looks good" about unverified work, or expressing satisfaction before running verification.
+
+ ## Verification

  Before marking any work as complete:

@@ -91,12 +71,13 @@ Before marking any work as complete:
  - [ ] No "should", "probably", or "seems to" in your completion statement
  - [ ] Evidence block produced for the task completion claim

- ## In MAXSIM Plan Execution
+ ## MAXSIM Integration
+
+ The executor's task commit protocol requires verification before committing:

- The executor's task commit protocol requires verification BEFORE committing:
- 1. Run the task's `<verify>` block (automated checks)
- 2. Confirm the `<done>` criteria are met with evidence
+ 1. Run the task's verify block (automated checks)
+ 2. Confirm the done criteria are met with evidence
  3. Produce an evidence block for the task completion
  4. Only then: stage files and commit

- The verifier agent independently re-checks all claims do not assume the verifier will catch what you missed.
+ The verifier agent independently re-checks all claims -- do not assume the verifier will catch what you missed.