maxsimcli 5.0.6 → 5.1.0

Files changed (91)

package/README.md +316 -288
package/dist/assets/CHANGELOG.md +14 -0
package/dist/assets/hooks/maxsim-capture-learnings.cjs +128 -0
package/dist/assets/hooks/maxsim-capture-learnings.cjs.map +1 -0
package/dist/assets/hooks/maxsim-check-update.cjs +126 -88
package/dist/assets/hooks/maxsim-check-update.cjs.map +1 -1
package/dist/assets/hooks/maxsim-notification-sound.cjs +87 -43
package/dist/assets/hooks/maxsim-notification-sound.cjs.map +1 -1
package/dist/assets/hooks/maxsim-statusline.cjs +45 -171
package/dist/assets/hooks/maxsim-statusline.cjs.map +1 -1
package/dist/assets/hooks/maxsim-stop-sound.cjs +86 -43
package/dist/assets/hooks/maxsim-stop-sound.cjs.map +1 -1
package/dist/assets/hooks/maxsim-sync-reminder.cjs +72 -21
package/dist/assets/hooks/maxsim-sync-reminder.cjs.map +1 -1
package/dist/assets/templates/agents/AGENTS.md +62 -51
package/dist/assets/templates/agents/executor.md +44 -59
package/dist/assets/templates/agents/planner.md +36 -31
package/dist/assets/templates/agents/researcher.md +35 -43
package/dist/assets/templates/agents/verifier.md +29 -31
package/dist/assets/templates/commands/maxsim/debug.md +20 -154
package/dist/assets/templates/commands/maxsim/execute.md +19 -33
package/dist/assets/templates/commands/maxsim/go.md +21 -20
package/dist/assets/templates/commands/maxsim/help.md +5 -14
package/dist/assets/templates/commands/maxsim/init.md +18 -40
package/dist/assets/templates/commands/maxsim/plan.md +22 -37
package/dist/assets/templates/commands/maxsim/progress.md +15 -16
package/dist/assets/templates/commands/maxsim/quick.md +18 -29
package/dist/assets/templates/commands/maxsim/settings.md +18 -26
package/dist/assets/templates/references/continuation-format.md +2 -4
package/dist/assets/templates/references/model-profiles.md +2 -2
package/dist/assets/templates/references/planning-config.md +10 -11
package/dist/assets/templates/references/self-improvement.md +120 -0
package/dist/assets/templates/rules/conventions.md +1 -1
package/dist/assets/templates/rules/verification-protocol.md +1 -1
package/dist/assets/templates/skills/brainstorming/SKILL.md +35 -26
package/dist/assets/templates/skills/code-review/SKILL.md +78 -55
package/dist/assets/templates/skills/commit-conventions/SKILL.md +70 -36
package/dist/assets/templates/skills/github-operations/SKILL.md +142 -0
package/dist/assets/templates/skills/handoff-contract/SKILL.md +62 -28
package/dist/assets/templates/skills/maxsim-batch/SKILL.md +68 -42
package/dist/assets/templates/skills/maxsim-simplify/SKILL.md +65 -40
package/dist/assets/templates/skills/project-memory/SKILL.md +121 -0
package/dist/assets/templates/skills/research/SKILL.md +126 -0
package/dist/assets/templates/skills/roadmap-writing/SKILL.md +71 -68
package/dist/assets/templates/skills/systematic-debugging/SKILL.md +37 -25
package/dist/assets/templates/skills/tdd/SKILL.md +36 -39
package/dist/assets/templates/skills/using-maxsim/SKILL.md +69 -55
package/dist/assets/templates/skills/verification/SKILL.md +167 -0
package/dist/assets/templates/workflows/batch.md +249 -268
package/dist/assets/templates/workflows/diagnose-issues.md +225 -151
package/dist/assets/templates/workflows/execute-plan.md +191 -981
package/dist/assets/templates/workflows/execute.md +350 -309
package/dist/assets/templates/workflows/go.md +119 -138
package/dist/assets/templates/workflows/health.md +71 -114
package/dist/assets/templates/workflows/help.md +85 -147
package/dist/assets/templates/workflows/init-existing.md +180 -1373
package/dist/assets/templates/workflows/init.md +53 -165
package/dist/assets/templates/workflows/new-milestone.md +91 -334
package/dist/assets/templates/workflows/new-project.md +165 -1384
package/dist/assets/templates/workflows/plan-create.md +182 -73
package/dist/assets/templates/workflows/plan-discuss.md +89 -82
package/dist/assets/templates/workflows/plan-research.md +191 -85
package/dist/assets/templates/workflows/plan.md +122 -58
package/dist/assets/templates/workflows/progress.md +76 -310
package/dist/assets/templates/workflows/quick.md +70 -495
package/dist/assets/templates/workflows/sdd.md +231 -221
package/dist/assets/templates/workflows/settings.md +90 -120
package/dist/assets/templates/workflows/verify-phase.md +296 -258
package/dist/cli.cjs +17 -23465
package/dist/cli.cjs.map +1 -1
package/dist/install.cjs +356 -8358
package/dist/install.cjs.map +1 -1
package/package.json +16 -22
package/dist/assets/templates/skills/agent-system-map/SKILL.md +0 -92
package/dist/assets/templates/skills/evidence-collection/SKILL.md +0 -87
package/dist/assets/templates/skills/github-artifact-protocol/SKILL.md +0 -67
package/dist/assets/templates/skills/github-tools-guide/SKILL.md +0 -89
package/dist/assets/templates/skills/input-validation/SKILL.md +0 -51
package/dist/assets/templates/skills/memory-management/SKILL.md +0 -75
package/dist/assets/templates/skills/research-methodology/SKILL.md +0 -137
package/dist/assets/templates/skills/sdd/SKILL.md +0 -91
package/dist/assets/templates/skills/tool-priority-guide/SKILL.md +0 -80
package/dist/assets/templates/skills/verification-before-completion/SKILL.md +0 -71
package/dist/assets/templates/skills/verification-gates/SKILL.md +0 -169
package/dist/assets/templates/workflows/discuss-phase.md +0 -683
package/dist/assets/templates/workflows/research-phase.md +0 -73
package/dist/assets/templates/workflows/verify-work.md +0 -572
package/dist/core-D5zUr9cb.cjs +0 -4305
package/dist/core-D5zUr9cb.cjs.map +0 -1
package/dist/skills-CjFWZIGM.cjs +0 -6824
package/dist/skills-CjFWZIGM.cjs.map +0 -1

package/dist/assets/templates/skills/systematic-debugging/SKILL.md CHANGED

@@ -1,57 +1,66 @@
 ---
 name: systematic-debugging
-description: >-
-  Systematic debugging via reproduce-hypothesize-isolate-verify-fix cycle.
-  Requires evidence at each step. Use when investigating bugs, test failures,
-  unexpected behavior, or runtime errors.
+description: Systematic debugging via reproduce-hypothesize-isolate-verify-fix-confirm cycle. Requires evidence at each step. Use when encountering bugs, test failures, or unexpected behavior.
 ---
 # Systematic Debugging
 Find the root cause first. Random fixes waste time and create new bugs.
-**No fix attempts without understanding root cause.** If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.
+**No fix attempts without understanding root cause.** Completing REPRODUCE and HYPOTHESIZE is mandatory before proposing any fix.
-## The 5-Step Process
+## The 6-Step Process
-### 1. REPRODUCE -- Confirm the Problem
+### 1. REPRODUCE — Confirm the Problem
 - Run the failing command or test. Capture the EXACT error output.
-- Can you trigger it reliably? What are the exact steps?
-- If not reproducible: gather more data -- do not guess.
+- Can it be triggered reliably? What are the exact steps?
+- If not reproducible: gather more data — do not guess.
-### 2. HYPOTHESIZE -- Form a Theory
+**Output:** Exact reproduction steps and full error output. Nothing moves forward without this.
-- Read the error message completely (stack trace, line numbers, exit codes).
+### 2. HYPOTHESIZE — Form a Theory
+- Read the error message completely: stack trace, line numbers, exit codes.
 - Check recent changes: `git diff`, recent commits, new dependencies.
 - Trace data flow: where does the bad value originate?
-- State your hypothesis clearly: "I think X is the root cause because Y."
+- State the hypothesis explicitly: "I think X is the root cause because Y."
+**Output:** One clear hypothesis with evidence supporting it.
-### 3. ISOLATE -- Narrow the Scope
+### 3. ISOLATE — Narrow the Scope
 - Find the smallest reproduction case.
 - In multi-component systems, add diagnostic logging at each boundary.
 - Identify which specific layer or component is failing.
 - Compare against working examples in the codebase.
-### 4. VERIFY -- Test Your Hypothesis
+**Output:** The smallest failing case and the specific component responsible.
+### 4. VERIFY — Test the Hypothesis
+- Make the smallest possible change to test the hypothesis.
+- Change one variable at a time — never multiple things simultaneously.
+- If hypothesis is wrong: form a new hypothesis. Do not stack fixes.
-- Make the smallest possible change to test your hypothesis.
-- Change one variable at a time -- never multiple things simultaneously.
-- If hypothesis is wrong: form a new hypothesis, do not stack fixes.
+**Output:** Confirmed or rejected hypothesis, with evidence from the test.
-### 5. FIX -- Address the Root Cause
+### 5. FIX — Address the Root Cause
-- Write a failing test that reproduces the bug.
+- Write a failing test that reproduces the bug (when applicable).
 - Implement a single fix that addresses the root cause.
-- No "while I'm here" improvements -- fix only the identified issue.
+- No "while I'm here" improvements — fix only the identified issue.
-### 6. CONFIRM -- Verify the Fix
+**Output:** A minimal fix that targets exactly the identified root cause.
+### 6. CONFIRM — Verify the Fix
 - Run the original failing test: it must now pass.
 - Run the full test suite: no regressions.
 - Verify the original error no longer occurs.
+**Output:** Evidence that the bug is fixed and nothing is broken.
 ## Hypothesis Testing Protocol
 For each hypothesis:
@@ -59,11 +68,13 @@ For each hypothesis:
 1. **Form:** "I think X is the root cause because Y."
 2. **Design test:** "If X is the cause, then changing Z should produce W."
 3. **Run test:** Execute the change and observe the result.
-4. **Evaluate:** Did the result match the prediction? If yes, proceed to FIX. If no, form a new hypothesis.
+4. **Evaluate:** Did the result match the prediction? If yes, proceed to FIX. If no, discard the hypothesis and form a new one.
+Never carry forward a failed hypothesis. Each iteration starts clean.
 ## Escalation
-If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried (hypotheses tested, evidence gathered, fixes attempted) and escalate.
+If 3+ fix attempts have failed, the issue is likely architectural. Stop guessing. Document what has been tried — hypotheses tested, evidence gathered, fixes attempted — and escalate or step back to redesign.
 ## Common Pitfalls
@@ -71,9 +82,10 @@ If 3+ fix attempts have failed, the issue is likely architectural. Document what
 |--------|---------|
 | "I think I know what it is" | Thinking is not evidence. Reproduce first. |
 | "Let me just try this fix" | That is guessing. Complete REPRODUCE and HYPOTHESIZE first. |
-| "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
+| "Multiple changes at once saves time" | You cannot isolate what worked. New bugs will appear. |
 | "The issue is simple" | Simple bugs have root causes too. The process is fast for simple bugs. |
+| "The stack trace is enough" | Stack traces show where it crashed, not why. Trace the data. |
 Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once.
-See also: `/verification-before-completion` for evidence-based confirmation after fixes.
+See also: `verification-before-completion` for evidence-based confirmation after fixes.
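
Note on the hunk above: the Hypothesis Testing Protocol and the escalation rule can be sketched as a small loop. This is only an illustration and is not code shipped in maxsimcli. The names `Hypothesis`, `run_protocol`, and `MAX_ATTEMPTS` are hypothetical, and treating each rejected hypothesis as one failed attempt is a simplifying assumption based on the "3+ fix attempts" wording.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumption: mirrors the "3+ fix attempts" escalation threshold in the skill text.
MAX_ATTEMPTS = 3


@dataclass
class Hypothesis:
    """One 'I think X is the root cause because Y' statement plus its test."""
    cause: str                   # X
    because: str                 # Y
    prediction: str              # "changing Z should produce W"
    run_test: Callable[[], str]  # executes the change and returns the observed result


def run_protocol(
    hypotheses: List[Hypothesis],
) -> Tuple[str, Optional[Hypothesis], List[str]]:
    """Test hypotheses one at a time; a rejected hypothesis is discarded, never stacked."""
    log: List[str] = []
    for attempt, h in enumerate(hypotheses[:MAX_ATTEMPTS], start=1):
        observed = h.run_test()
        matched = observed == h.prediction
        verdict = "confirmed" if matched else "rejected"
        log.append(f"attempt {attempt}: {h.cause} -> {verdict}")
        if matched:
            # Prediction held: proceed to the FIX step with this hypothesis.
            return "FIX", h, log
    # Nothing confirmed: escalate once the attempt budget is spent,
    # otherwise go back and form a new hypothesis.
    status = "ESCALATE" if len(log) >= MAX_ATTEMPTS else "NEW_HYPOTHESIS"
    return status, None, log
```

The returned log doubles as the record of "hypotheses tested" that the Escalation section asks for.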

package/dist/assets/templates/skills/tdd/SKILL.md CHANGED

@@ -1,77 +1,74 @@
 ---
 name: tdd
-description: >-
-  Test-driven development with red-green-refactor cycle and atomic commits.
-  Write failing test first, then minimal passing code, then refactor. Use when
-  implementing business logic, API endpoints, data transformations, validation
-  rules, or algorithms.
+description: Enforces red-green-refactor TDD cycle with atomic commits per phase. Use when implementing features, fixing bugs, or when tests should drive the design.
 ---
 # Test-Driven Development (TDD)
 Write the test first. Watch it fail. Write minimal code to pass. Clean up.
-## When to Use TDD
+## When to Use
-**Good fit:** Business logic with defined I/O, API endpoints with contracts, data transformations, validation rules, algorithms, state machines.
+**Good fit:** Business logic with defined I/O, API endpoints with contracts, data transformations, validation rules, algorithms, state machines, bug fixes.
-**Poor fit:** UI layout, configuration files, build scripts, one-off scripts, mechanical renames.
+**Poor fit:** UI layout, configuration files, build scripts, one-off scripts, mechanical renames, exploratory spikes.
 ## The Red-Green-Refactor Cycle
-### 1. RED -- Write One Failing Test
+### RED — Write One Failing Test
-- Write ONE minimal test describing the desired behavior
-- Test name describes what SHOULD happen, not implementation details
-- Use real code paths -- mocks only when unavoidable (external APIs, databases)
+- Write ONE minimal test describing the desired behavior.
+- Test name describes what SHOULD happen, not implementation details.
+- Use real code paths — mocks only when unavoidable (external APIs, databases, I/O).
-### 2. VERIFY RED -- Run the Test
+**Verify RED:** Run the test. It MUST fail with an assertion error — not a syntax error or import failure. If the test passes immediately, it is testing existing behavior. Rewrite it.
-- Test MUST fail with an assertion (not error out from syntax or imports)
-- Failure message must match the missing behavior
-- If the test passes immediately, you are testing existing behavior -- rewrite it
+### GREEN — Write Minimal Code
-### 3. GREEN -- Write Minimal Code
+- Write the SIMPLEST code that makes the test pass.
+- Do NOT add features the test does not require.
+- Do NOT refactor yet.
-- Write the SIMPLEST code that makes the test pass
-- Do NOT add features the test does not require
-- Do NOT refactor yet
+**Verify GREEN:** Run all tests. The new test must pass. ALL existing tests must still pass. If any existing test fails, fix the code — not the tests.
-### 4. VERIFY GREEN -- Run All Tests
+### REFACTOR — Clean Up
-- The new test MUST pass
-- ALL existing tests MUST still pass
-- If any test fails, fix code -- not tests
+- Remove duplication, improve names, extract helpers.
+- Run tests after every change.
+- Do NOT add new behavior during refactor.
-### 5. REFACTOR -- Clean Up (Tests Still Green)
+**Verify REFACTOR:** All tests still pass. No behavior changed.
-- Remove duplication, improve names, extract helpers
-- Run tests after every change
-- Do NOT add new behavior during refactor
+### REPEAT
-### 6. REPEAT -- Next failing test for next behavior
+Move to the next failing test for the next unit of behavior.
 ## Commit Pattern
-Each TDD cycle produces 2-3 atomic commits:
+Each TDD cycle produces 2–3 atomic commits:
-- **RED commit:** `test({scope}): add failing test for [feature]`
-- **GREEN commit:** `feat({scope}): implement [feature]`
-- **REFACTOR commit (if changes made):** `refactor({scope}): clean up [feature]`
+| Phase | Commit format |
+|-------|---------------|
+| RED | `test({scope}): add failing test for [feature]` |
+| GREEN | `feat({scope}): implement [feature]` |
+| REFACTOR | `refactor({scope}): clean up [feature]` (omit if no changes) |
+Keep commits small. One cycle = one feature unit = one commit group.
 ## Context Budget
-TDD uses approximately 40% more context than direct implementation due to the RED-GREEN-REFACTOR overhead. Plan accordingly for long task lists.
+TDD uses approximately 40% more context than direct implementation due to cycle overhead. Plan accordingly for long task lists.
 ## Common Pitfalls
 | Excuse | Why It Fails |
 |--------|-------------|
-| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
-| "I'll add tests after" | Tests written after pass immediately -- they prove nothing. |
+| "Too simple to test" | Simple code breaks. The test takes 30 seconds to write. |
+| "I'll add tests after" | Tests written after pass immediately — they prove nothing. |
 | "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
-| "TDD is slower" | TDD is faster than debugging. Every skip creates debt. |
+| "TDD is slower" | TDD is faster than debugging. Every skipped test creates debt. |
+| "Mocks make this easy" | Over-mocking tests the mock, not the code. Prefer real code paths. |
-Stop immediately if you catch yourself writing implementation code before writing a test, writing a test that passes on the first run, skipping the VERIFY RED step, or adding features beyond what the current test requires.
+Stop immediately if you catch yourself writing implementation code before a test, writing a test that passes on the first run, skipping VERIFY RED, or adding features beyond what the current test requires.
-See also: `/verification-before-completion` for evidence-based completion claims after TDD cycles.
+See also: `verification-before-completion` for evidence-based completion claims after TDD cycles.
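
Note on the hunk above: the Commit Pattern table maps each TDD phase to a commit message format. A minimal sketch of that mapping follows. The helper `cycle_commits` and the `COMMIT_TEMPLATES` dictionary are hypothetical names, and the sketch assumes the bracketed `[feature]` placeholder is replaced by a plain feature description.

```python
from typing import Dict, List

# Formats copied from the Commit Pattern table; "[feature]" becomes "{feature}".
COMMIT_TEMPLATES: Dict[str, str] = {
    "RED": "test({scope}): add failing test for {feature}",
    "GREEN": "feat({scope}): implement {feature}",
    "REFACTOR": "refactor({scope}): clean up {feature}",
}


def cycle_commits(scope: str, feature: str, refactored: bool = True) -> List[str]:
    """Return the commit messages for one red-green-refactor cycle, in order."""
    phases = ["RED", "GREEN"]
    if refactored:
        # The REFACTOR commit is omitted when the refactor step made no changes.
        phases.append("REFACTOR")
    return [COMMIT_TEMPLATES[p].format(scope=scope, feature=feature) for p in phases]
```

Calling `cycle_commits("auth", "token refresh", refactored=False)` yields the two-commit variant of the "2–3 atomic commits" rule.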

package/dist/assets/templates/skills/using-maxsim/SKILL.md CHANGED

@@ -1,78 +1,92 @@
 ---
 name: using-maxsim
-description: >-
-  Routes work through MAXSIM's spec-driven workflow: checks planning state,
-  determines active phase, dispatches to the correct MAXSIM command. Use when
-  starting work sessions, resuming work, or choosing which MAXSIM command to run.
+description: Routes work through MaxsimCLI commands based on project state and user intent. Provides command reference and decision routing table. Use when determining which MaxsimCLI command to use or when starting a new session.
 ---
-# Using MAXSIM
+# Using MaxsimCLI
-MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks -- not ad-hoc coding.
+MaxsimCLI is a spec-driven development system. Work flows through phases, plans, and tasks — not ad-hoc coding.
-**No implementation without a plan.** If there is no `.planning/` directory, run `/maxsim:init` first. If there is no current phase, run `/maxsim:plan` first. If there IS a plan, run `/maxsim:execute` to execute it.
+**No implementation without a plan.** If there is no `.planning/` directory, run `/maxsim:init` first. If there is no current phase, run `/maxsim:plan N` first. If there is a plan, run `/maxsim:execute N` to execute it.
-## Routing
+---
+## Command Routing Table
+Determine user intent, then route to the correct command.
+| User Intent | Command |
+|-------------|---------|
+| Start a new project | `/maxsim:init` |
+| Continue where I left off | `/maxsim:go` |
+| Plan next phase | `/maxsim:plan N` |
+| Execute planned work | `/maxsim:execute N` |
+| Fix a bug | `/maxsim:debug` |
+| Quick one-off task | `/maxsim:quick` |
+| Check progress | `/maxsim:progress` |
+| Change settings | `/maxsim:settings` |
+| See all commands | `/maxsim:help` |
+---
+## Session Start Routing
+Before beginning any task in a session:
-Before starting any task:
+1. Check for `.planning/` directory — if missing, run `/maxsim:init`
+2. Read `STATE.md` — if a checkpoint exists, resume from it via `/maxsim:go`
+3. Check `ROADMAP.md` — identify the active phase
+4. Route to the correct command using the table above
-1. **Check for `.planning/` directory** -- if missing, initialize with `/maxsim:init`
-2. **Check STATE.md** -- resume from last checkpoint if one exists
-3. **Check current phase** -- determine what phase is active in ROADMAP.md
-4. **Route to the correct command** based on the table below
+GitHub Issues with label `maxsim:lesson` or `maxsim:decision` are the source of truth for project learnings and architectural decisions. Read them before planning.
-### Command Surface (9 commands)
+---
+## Skills Per Agent Type
-| Situation | Command |
-|-----------|---------|
-| No `.planning/` directory | `/maxsim:init` |
-| No ROADMAP.md or empty roadmap | `/maxsim:init` |
-| Active phase has no PLAN.md | `/maxsim:plan N` |
-| Active phase has PLAN.md, not started | `/maxsim:execute N` |
-| Phase complete, needs verification | `/maxsim:execute N` (auto-verifies) |
-| Bug found during execution | `/maxsim:debug` |
-| Quick standalone task | `/maxsim:quick` |
-| Check overall status | `/maxsim:progress` |
-| Don't know what to do next | `/maxsim:go` |
-| Change workflow settings | `/maxsim:settings` |
-| Need command reference | `/maxsim:help` |
+Skills load on-demand based on the current task. Each agent type draws from a different set.
-### Agent Model (4 agents)
+| Agent | Primary Skills |
+|-------|---------------|
+| Planner | `research`, `brainstorming`, `roadmap-writing`, `project-memory` |
+| Executor | `tdd`, `systematic-debugging`, `verification`, `maxsim-simplify` |
+| Verifier | `verification`, `code-review`, `systematic-debugging` |
+| Researcher | `research`, `project-memory` |
-MAXSIM uses 4 generic agent types. Specialization comes from the orchestrator's spawn prompt and on-demand skills, not from separate agent definitions.
+Skills are not auto-loaded. They activate when invoked directly (e.g., `/research`) or when the orchestrator spawns an agent with explicit skill instructions.
+---
-| Agent | Role | Spawned By |
-|-------|------|-----------|
-| Executor | Implements plans with atomic commits and verified completion | `/maxsim:execute` |
-| Planner | Creates structured PLAN.md files from requirements | `/maxsim:plan` |
-| Researcher | Gathers domain knowledge and codebase context | `/maxsim:plan` (research stage) |
-| Verifier | Reviews code, checks specs, debugs failures | `/maxsim:execute` (review stage), `/maxsim:debug` |
+## GitHub as Source of Truth
-### Skills
+All persistent project state lives in GitHub, not in local files that disappear between sessions:
-Skills load on-demand based on description matching or direct `/skill-name` invocation. They are not auto-loaded -- each skill activates only when its content is relevant to the current task.
+| Artifact | Location |
+|----------|---------|
+| Phase plans | `.planning/phases/N/PLAN.md` (committed) |
+| Roadmap | `.planning/ROADMAP.md` (committed) |
+| Session state | `.planning/STATE.md` (committed after each checkpoint) |
+| Learnings | GitHub Issues — label `maxsim:lesson` |
+| Decisions | GitHub Issues — label `maxsim:decision` |
-| Skill | When It Activates |
-|-------|-------------------|
-| `systematic-debugging` | Investigating bugs, test failures, or unexpected behavior |
-| `tdd` | Implementing business logic, APIs, data transformations |
-| `verification-before-completion` | Claiming work is done, tests pass, builds succeed |
-| `memory-management` | Recurring patterns, errors, or decisions worth persisting |
-| `brainstorming` | Facing architectural choices or design decisions |
-| `roadmap-writing` | Creating or restructuring a project roadmap |
-| `maxsim-simplify` | Reviewing code for duplication, dead code, complexity |
-| `code-review` | Reviewing implementation for security, interfaces, quality |
-| `sdd` | Executing sequential tasks with fresh-agent isolation |
-| `maxsim-batch` | Parallelizing work across independent worktree units |
+---
 ## Common Pitfalls
-- Writing implementation code without a PLAN.md
-- Skipping `/maxsim:init` because "the project is simple"
-- Ignoring STATE.md checkpoints from previous sessions
+- Writing implementation code without a `PLAN.md`
+- Skipping `/maxsim:init` because the project seems simple
+- Ignoring `STATE.md` checkpoints from previous sessions
 - Working outside the current phase without explicit user approval
-- Making architectural decisions without documenting them in STATE.md
+- Making architectural decisions without recording them as `maxsim:decision` issues
+If any of these occur: stop, check the routing table, and follow the workflow.
+---
-**If any of these occur: stop, check the routing table, follow the workflow.**
+## v6 Changes from v5
-See also: `/verification-before-completion` for evidence-based completion claims.
+- `/maxsim:resume-work` replaced by `/maxsim:go`
+- `/maxsim:plan-phase` and `/maxsim:execute-phase` replaced by `/maxsim:plan N` and `/maxsim:execute N`
+- Project memory now uses GitHub Issues instead of local STATE.md comments
+- `research` skill merges former `research-methodology` and `tool-priority-guide`
+- `project-memory` skill replaces `memory-management`
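
Note on the hunk above: the Session Start Routing steps amount to a short decision function. The sketch below is illustrative only. `ProjectState` and `route` are hypothetical names, and how the fields are filled in (reading `STATE.md` and `ROADMAP.md`) is deliberately left out because the diff does not show those file formats.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    """Facts gathered at session start (how they are gathered is not modeled here)."""
    has_planning_dir: bool  # does `.planning/` exist?
    has_checkpoint: bool    # does STATE.md record a checkpoint?
    active_phase: int       # phase number N identified from ROADMAP.md
    phase_has_plan: bool    # does the active phase already have a PLAN.md?


def route(state: ProjectState) -> str:
    """Pick the command to run, following the session-start checks in order."""
    if not state.has_planning_dir:
        return "/maxsim:init"
    if state.has_checkpoint:
        # Resume from the saved checkpoint.
        return "/maxsim:go"
    if not state.phase_has_plan:
        return f"/maxsim:plan {state.active_phase}"
    return f"/maxsim:execute {state.active_phase}"
```

Intent-driven commands such as `/maxsim:debug` or `/maxsim:quick` are not modeled, since they depend on what the user asks for rather than on project state.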

package/dist/assets/templates/skills/verification/SKILL.md ADDED

@@ -0,0 +1,167 @@
+---
+name: verification
+description: Evidence-based verification with quality gates, anti-rationalization enforcement, and retry escalation. Merges gate framework, evidence collection, and completion verification into one authoritative source. Use when completing tasks, verifying implementations, or before claiming work is done.
+---
+## The Iron Law
+No completion claim is valid without fresh verification evidence produced in THIS session. Evidence from a prior session, a prior attempt, or reasoning about what "should" be true does not count. If the evidence was not collected by running a tool call in the current session, it does not exist.
+---
+## Evidence Block Format
+Every claim about task completion, test status, build status, or spec compliance requires an Evidence Block. Produce one per claim.
+```
+**CLAIM**: [The specific assertion being made]
+**EVIDENCE**: [Tool name and exact command or action taken]
+**OUTPUT**: [Actual output, quoted verbatim — not paraphrased]
+**VERDICT**: PASS | FAIL | SKIPPED
+```
+SKIPPED is only allowed when the claim is explicitly out of scope and the reason is documented. A skipped gate must be acknowledged by the caller.
+---
+## 4 Quality Gates
+Gates run in order. A failure at any gate stops forward progress until resolved.
+### Gate 1 — Input Gate
+Run before work begins.
+- Spec or task definition exists and is unambiguous
+- Acceptance criteria are stated explicitly
+- Required inputs (files, configs, credentials) are present
+- Scope boundaries are defined — what is in and what is out
+Failure action: Return to requester with a clarifying question. Do not guess at requirements.
+### Gate 2 — Pre-Action Gate
+Run before executing changes.
+- Git state is clean or the working branch is correctly scoped
+- Dependencies are installed and match the lockfile
+- Linter and formatter configs are present
+- No blocking issues from a previous failed run remain in the working tree
+Failure action: Resolve the blocking state first. Document what was found and what was done to fix it.
+### Gate 3 — Completion Gate
+Run after implementation, before declaring done.
+- All tests pass (fresh run, not cached)
+- Build exits with code 0
+- Lint is clean
+- Every acceptance criterion from Gate 1 is addressed with an Evidence Block
+- No files are left in a modified-but-uncommitted state unless that is the intended deliverable
+Failure action: Fix failures. Do not skip a failing test. Do not suppress a lint error. Each fix resets the gate — re-run from the top of Gate 3.
+### Gate 4 — Quality Gate
+Run after Gate 3 passes.
+- Code review concerns (if any were raised) are resolved
+- No regressions introduced — GUARD command confirms this (see below)
+- Evidence Blocks are complete and attached to the work artifact
+- Handoff contract or completion note is written if another agent will consume this output
+Failure action: Rework the implementation. If regressions are found, revert before attempting a fix.
+---
+## What Counts as Evidence
+| Claim | Required Evidence | NOT Sufficient |
+|---|---|---|
+| Tests pass | `npm test` output showing pass count and zero failures | "I ran the tests" |
+| Build succeeds | `npm run build` (or equivalent) with exit code 0 shown | "Build should work" |
+| Lint is clean | `npm run lint` output with zero errors and zero warnings | "No obvious lint issues" |
+| File was created | `ls -la <path>` or Read tool output showing the file | "I wrote the file" |
+| Function behaves correctly | Test output or REPL output showing actual return value | "The logic looks right" |
+| API responds correctly | Actual HTTP response body and status code | "The endpoint exists" |
+| Dependency is installed | `package.json` or lockfile entry shown verbatim | "I installed it earlier" |
+| Spec is met | Quoted spec requirement next to quoted output proving it | "This matches the spec" |
+| No regressions | GUARD command output from this session | "Nothing was broken" |
+| Migration ran | Migration log or schema diff output | "I ran the migration" |
+---
+## Verify + Guard Pattern
+Every task execution uses two paired commands.
+**VERIFY** — "Did this task accomplish its stated goal?"
+Run after implementation. Produces an Evidence Block for each acceptance criterion. If any criterion fails, the task is not done.
+**GUARD** — "Did this change break what was already working?"
+Run after VERIFY passes. Executes the full test suite and any smoke checks that existed before the task started. If GUARD fails after VERIFY passes, the implementation introduced a regression.
+### Regression Protocol
+1. VERIFY passes, GUARD fails: attempt rework, limit 2 rework cycles
+2. After 2 rework cycles: revert the change entirely, escalate to user
+3. Do not merge a change where GUARD is failing
+---
+## Anti-Rationalization Table
+These phrases indicate a verification failure. They are never acceptable as evidence.
+| Forbidden Phrase | Why It Fails |
+|---|---|
+| "should work" | Describes expectation, not observed outcome |
+| "I already checked" | Not verifiable in this session |
+| "tests were passing before" | Stale evidence; fresh run required |
+| "this is obviously correct" | Correctness is measured, not assessed by inspection |
+| "I think it's fine" | No tool output, no claim |
+| "the logic is sound" | Logic can be sound and still produce wrong output |
+| "nothing changed in that area" | Changes in dependencies, configs, and imports are invisible to this claim |
+| "it worked in my local run" | Local run is not this session's evidence unless tool output is shown |
+| "we can verify later" | Verification deferred is verification skipped |
+| "this is low risk" | Risk level does not substitute for evidence |
+---
+## Retry Logic
+### Attempt Counting
+Each task starts at attempt 1. A failed gate that triggers a rework cycle increments the attempt counter. Attempt count resets only when the task scope changes materially.
+### Per-Attempt Rules
+- **Attempt 1**: Execute normally. Collect full evidence. If gates pass, complete.
+- **Attempt 2**: Fresh context. Do not carry forward assumptions from attempt 1. Re-read the spec. Re-run all gates from Gate 1.
+- **Attempt 3**: Fresh agent context. Treat this as a cold start. Diagnose why attempts 1 and 2 failed before touching any code.
+### After 3 Failures
+Escalate to the user. The escalation must include:
+1. The original task spec (quoted)
+2. What was attempted in each of the 3 runs (brief, factual)
+3. The specific gate that failed each time and the exact error output
+4. A diagnostic summary: is this a spec problem, an environment problem, or an implementation problem?
+5. A proposed next step (rewrite spec, fix environment, reduce scope)
+Do not attempt a 4th run without user acknowledgment and revised instructions.
+---
+## Common Pitfalls
+| Pitfall | Symptom | Correct Behavior |
+|---|---|---|
+| Caching test results | Reporting pass without re-running | Always run tests fresh; use `--no-cache` or equivalent |
+| Partial lint scope | Running lint on one file, claiming lint is clean | Run lint on the entire affected module or project |
+| Missing Gate 1 | Starting work before spec is confirmed | Always confirm acceptance criteria exist before writing code |
+| Evidence copied from prior session | Referencing output not produced in this session | All evidence must come from tool calls in the current session |
+| Verifying only the happy path | Tests pass but edge cases are untested | GUARD must include regression tests, not only new tests |
+| Skipping Gate 4 after Gate 3 passes | Declaring done without regression check | Gate 3 and Gate 4 are both required; neither is optional |
+| Conflating "no errors" with "correct output" | Exit code 0 but wrong behavior | Evidence must show correct output, not just absence of error |
+| Writing evidence after the fact | Constructing output from memory | Run the command, capture the output, paste it verbatim |
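
Note on the hunk above: the Evidence Block format and the Regression Protocol can be sketched together in a few lines. This is an illustration under stated assumptions, not code from maxsimcli. `EvidenceBlock` and `next_action` are hypothetical names, and the returned action strings are invented labels for the outcomes the skill describes.

```python
from dataclasses import dataclass

VERDICTS = ("PASS", "FAIL", "SKIPPED")
MAX_REWORK_CYCLES = 2  # from the Regression Protocol: "limit 2 rework cycles"


@dataclass
class EvidenceBlock:
    claim: str     # the specific assertion being made
    evidence: str  # tool name and exact command or action taken
    output: str    # actual output, quoted verbatim
    verdict: str   # one of VERDICTS

    def render(self) -> str:
        """Format the block using the four labels from the skill text."""
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {VERDICTS}")
        return "\n".join([
            f"**CLAIM**: {self.claim}",
            f"**EVIDENCE**: {self.evidence}",
            f"**OUTPUT**: {self.output}",
            f"**VERDICT**: {self.verdict}",
        ])


def next_action(verify_passed: bool, guard_passed: bool, rework_cycles: int) -> str:
    """Decide what follows a VERIFY + GUARD run, given rework cycles already used."""
    if not verify_passed:
        return "fix until VERIFY passes"  # the task is not done yet
    if guard_passed:
        return "done"
    # VERIFY passed but GUARD failed: the change introduced a regression.
    if rework_cycles < MAX_REWORK_CYCLES:
        return "rework"
    return "revert and escalate"
```

Keeping the rework limit as a named constant makes the "2 cycles, then revert" rule easy to audit against the skill text.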