maxsimcli 4.0.0 → 4.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,104 +1,71 @@
  ---
  name: systematic-debugging
- description: Use when encountering any bug, test failure, or unexpected behavior — requires root cause investigation before attempting any fix
+ description: >-
+   Investigates bugs through systematic root-cause analysis: reproduce, hypothesize,
+   isolate, verify, fix, confirm. Use when encountering any bug, test failure,
+   unexpected behavior, or error message.
  ---

  # Systematic Debugging

- Random fixes waste time and create new bugs. Find the root cause first.
+ Find the root cause first. Random fixes waste time and create new bugs.

- **If you have not identified the root cause, you are guessing not debugging.**
+ **HARD GATE -- No fix attempts without understanding root cause. If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.**

- ## The Iron Law
+ ## Process

- <HARD-GATE>
- NO FIX ATTEMPTS WITHOUT UNDERSTANDING ROOT CAUSE.
- If you have not completed the REPRODUCE and HYPOTHESIZE steps, you CANNOT propose a fix.
- "Let me just try this" is guessing, not debugging.
- Violating this rule is a violation — not a time-saving shortcut.
- </HARD-GATE>
-
- ## The Gate Function
-
- Follow these steps IN ORDER for every bug, test failure, or unexpected behavior.
-
- ### 1. REPRODUCE — Confirm the Problem
+ ### 1. REPRODUCE -- Confirm the Problem

  - Run the failing command or test. Capture the EXACT error output.
  - Can you trigger it reliably? What are the exact steps?
- - If not reproducible: gather more data do not guess.
-
- ```bash
- # Example: reproduce a test failure
- npx vitest run path/to/failing.test.ts
- ```
+ - If not reproducible: gather more data -- do not guess.

- ### 2. HYPOTHESIZE Form a Theory
+ ### 2. HYPOTHESIZE -- Form a Theory

- - Read the error message COMPLETELY (stack trace, line numbers, exit codes)
- - Check recent changes: `git diff`, recent commits, new dependencies
+ - Read the error message completely (stack trace, line numbers, exit codes).
+ - Check recent changes: `git diff`, recent commits, new dependencies.
  - Trace data flow: where does the bad value originate?
- - State your hypothesis clearly: "I think X is the root cause because Y"
+ - State your hypothesis clearly: "I think X is the root cause because Y."

- ### 3. ISOLATE Narrow the Scope
+ ### 3. ISOLATE -- Narrow the Scope

- - Find the SMALLEST reproduction case
- - In multi-component systems, add diagnostic logging at each boundary
- - Identify which SPECIFIC layer or component is failing
- - Compare against working examples in the codebase
+ - Find the smallest reproduction case.
+ - In multi-component systems, add diagnostic logging at each boundary.
+ - Identify which specific layer or component is failing.
+ - Compare against working examples in the codebase.

- ### 4. VERIFY Test Your Hypothesis
+ ### 4. VERIFY -- Test Your Hypothesis

- - Make the SMALLEST possible change to test your hypothesis
- - Change ONE variable at a time never multiple things simultaneously
- - If hypothesis is wrong: form a NEW hypothesis, do not stack fixes
+ - Make the smallest possible change to test your hypothesis.
+ - Change one variable at a time -- never multiple things simultaneously.
+ - If hypothesis is wrong: form a new hypothesis, do not stack fixes.

- ### 5. FIX Address the Root Cause
+ ### 5. FIX -- Address the Root Cause

- - Write a failing test that reproduces the bug (see TDD skill)
- - Implement a SINGLE fix that addresses the root cause
- - No "while I'm here" improvements fix only the identified issue
+ - Write a failing test that reproduces the bug.
+ - Implement a single fix that addresses the root cause.
+ - No "while I'm here" improvements -- fix only the identified issue.

- ### 6. CONFIRM Verify the Fix
+ ### 6. CONFIRM -- Verify the Fix

- - Run the original failing test: it must now pass
- - Run the full test suite: no regressions
- - Verify the original error no longer occurs
+ - Run the original failing test: it must now pass.
+ - Run the full test suite: no regressions.
+ - Verify the original error no longer occurs.

- ```bash
- # Confirm the specific fix
- npx vitest run path/to/fixed.test.ts
- # Confirm no regressions
- npx vitest run
- ```
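
The REPRODUCE and CONFIRM steps both reduce to the same discipline: run the exact command and capture its output and exit code before reasoning about it. A minimal shell sketch, assuming a POSIX shell; `false` stands in for the real failing command (the package's own removed examples use `npx vitest run`):

```bash
#!/usr/bin/env sh
# Hedged sketch of the REPRODUCE step: capture exact output and exit code
# before hypothesizing. "false" is a placeholder for the real repro command,
# e.g. `npx vitest run path/to/failing.test.ts`.
repro_cmd="false"
output=$($repro_cmd 2>&1)
status=$?
echo "exit code: $status"
if [ "$status" -ne 0 ]; then
  echo "reproduced -- proceed to HYPOTHESIZE"
else
  echo "not reproduced -- gather more data, do not guess"
fi
```

Rerunning the same wrapper after the fix turns it into the CONFIRM check: the exit code must flip to 0.
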
+ ## Common Pitfalls

- ## Common Rationalizations REJECT THESE
-
- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "I think I know what it is" | Thinking is not evidence. Reproduce first, then hypothesize. |
- | "Let me just try this fix" | "Just try" = guessing. You have skipped REPRODUCE and HYPOTHESIZE. |
- | "Quick patch for now, investigate later" | "Later" never comes. Patches mask the real problem. |
+ | Excuse | Reality |
+ |--------|---------|
+ | "I think I know what it is" | Thinking is not evidence. Reproduce first. |
+ | "Let me just try this fix" | That is guessing. Complete REPRODUCE and HYPOTHESIZE first. |
  | "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
- | "The issue is simple, I don't need the process" | Simple bugs have root causes too. The process is fast for simple bugs. |
- | "I'm under time pressure" | Systematic debugging IS faster than guess-and-check thrashing. |
- | "The reference is too long, I'll skim it" | Partial understanding guarantees partial fixes. Read it completely. |
-
- ## Red Flags — STOP If You Catch Yourself:
-
- - Changing code before reproducing the error
- - Proposing a fix before reading the full error message and stack trace
- - Trying random fixes hoping one will work
- - Changing multiple things simultaneously
- - Saying "it's probably X" without evidence
- - Applying a fix that did not work, then adding another fix on top
- - On your 3rd failed fix attempt (this signals an architectural problem — escalate)
+ | "The issue is simple" | Simple bugs have root causes too. The process is fast for simple bugs. |

- **If any red flag triggers: STOP. Return to step 1 (REPRODUCE).**
+ Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once. If any of these triggers, return to step 1.

- **If 3+ fix attempts have failed:** The issue is likely architectural, not a simple bug. Document what you have tried and escalate to the user for a design decision.
+ If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried and escalate to the user.

- ## Verification Checklist
+ ## Verification

  Before claiming a bug is fixed, confirm:

@@ -110,9 +77,9 @@ Before claiming a bug is fixed, confirm:
  - [ ] The full test suite passes (no regressions)
  - [ ] The original error no longer occurs when running the original steps

- ## Debugging in MAXSIM Context
+ ## MAXSIM Integration

  When debugging during plan execution, MAXSIM deviation rules apply:
  - **Rule 1 (Auto-fix bugs):** You may auto-fix bugs found during execution, but you must still follow this debugging process.
- - **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint this is an architectural decision for the user.
+ - **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint -- this is an architectural decision for the user.
  - Track all debugging deviations for SUMMARY.md documentation.
@@ -1,93 +1,70 @@
  ---
  name: tdd
- description: Use when implementing any feature or bug fix — requires writing a failing test before any implementation code
+ description: >-
+   Enforces test-driven development with the Red-Green-Refactor cycle: write a
+   failing test first, implement minimal code to pass, then refactor. Use when
+   implementing features, fixing bugs, or adding new behavior.
  ---

  # Test-Driven Development (TDD)

  Write the test first. Watch it fail. Write minimal code to pass. Clean up.

- **If you did not watch the test fail, you do not know if it tests the right thing.**
+ **HARD GATE: No implementation code without a failing test first. If you wrote production code before the test, delete it and start over. No exceptions.**

- ## The Iron Law
+ ## Process

- <HARD-GATE>
- NO IMPLEMENTATION CODE WITHOUT A FAILING TEST FIRST.
- If you wrote production code before the test, DELETE IT. Start over.
- No exceptions. No "I'll add tests after." No "keep as reference."
- Violating this rule is a violation — not a judgment call.
- </HARD-GATE>
+ ### 1. RED -- Write One Failing Test

- ## The Gate Function
-
- Follow this cycle for every behavior change, feature addition, or bug fix.
-
- ### 1. RED — Write Failing Test
-
- - Write ONE minimal test that describes the desired behavior
+ - Write ONE minimal test describing the desired behavior
  - Test name describes what SHOULD happen, not implementation details
- - Use real code paths mocks only when unavoidable (external APIs, databases)
+ - Use real code paths -- mocks only when unavoidable (external APIs, databases)

- ### 2. VERIFY RED Run the Test
+ ### 2. VERIFY RED -- Run the Test

- ```bash
- # Run the test suite for this file
- npx vitest run path/to/test.test.ts
- ```
-
- - Test MUST fail (not error — fail with an assertion)
+ - Test MUST fail with an assertion (not error out from syntax or imports)
  - Failure message must match the missing behavior
- - If test passes immediately: you are testing existing behavior rewrite it
+ - If the test passes immediately, you are testing existing behavior -- rewrite it
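
The VERIFY RED gate can be made mechanical. A hedged sketch, assuming a CLI test runner; `false` is a placeholder for the real test command (the removed example used `npx vitest run path/to/test.test.ts`):

```bash
#!/usr/bin/env sh
# Hedged sketch of the VERIFY RED gate: refuse to proceed unless the new
# test actually fails. "false" is a placeholder for the real test command.
test_cmd="false"
if $test_cmd; then
  echo "GATE VIOLATION: test passed on first run -- rewrite the test"
else
  echo "RED confirmed -- proceed to GREEN"
fi
```
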
- ### 3. GREEN Write Minimal Code
+ ### 3. GREEN -- Write Minimal Code

  - Write the SIMPLEST code that makes the test pass
  - Do NOT add features the test does not require
- - Do NOT refactor yet — that comes next
-
- ### 4. VERIFY GREEN — Run All Tests
+ - Do NOT refactor yet

- ```bash
- npx vitest run
- ```
+ ### 4. VERIFY GREEN -- Run All Tests

  - The new test MUST pass
  - ALL existing tests MUST still pass
- - If any test fails: fix code, not tests
+ - If any test fails, fix code -- not tests

- ### 5. REFACTOR Clean Up (Tests Still Green)
+ ### 5. REFACTOR -- Clean Up (Tests Still Green)

  - Remove duplication, improve names, extract helpers
- - Run tests after every change — they must stay green
+ - Run tests after every change
  - Do NOT add new behavior during refactor

- ### 6. REPEAT Next failing test for next behavior
+ ### 6. REPEAT -- Next failing test for next behavior

- ## Common Rationalizations — REJECT THESE
+ ## Common Pitfalls

- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "Too simple to test" | Simple code breaks. The test takes 30 seconds to write. |
- | "I'll add tests after" | Tests written after pass immediately they prove nothing. |
- | "The test framework isn't set up yet" | Set it up. That is part of the task, not a reason to skip. |
+ | Excuse | Why It Fails |
+ |--------|-------------|
+ | "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
+ | "I'll add tests after" | Tests written after pass immediately -- they prove nothing. |
  | "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
- | "TDD is slower for this task" | TDD is faster than debugging. Every "quick skip" creates debt. |
+ | "TDD is slower" | TDD is faster than debugging. Every skip creates debt. |
  | "Let me keep the code as reference" | You will adapt it instead of writing test-first. Delete means delete. |
- | "I need to explore the design first" | Explore, then throw it away. Start implementation with TDD. |

- ## Red Flags STOP If You Catch Yourself:
+ Stop immediately if you catch yourself:

  - Writing implementation code before writing a test
- - Writing a test that passes on the first run (you are testing existing behavior)
- - Skipping the VERIFY RED step ("I know it will fail")
+ - Writing a test that passes on the first run
+ - Skipping the VERIFY RED step
  - Adding features beyond what the current test requires
- - Skipping the REFACTOR step to save time
- - Rationalizing "just this once" or "this is different"
- - Keeping pre-TDD code "as reference" while writing tests
+ - Keeping pre-TDD code "as reference"

- **If any red flag triggers: STOP. Delete the implementation. Write the test first.**
-
- ## Verification Checklist
+ ## Verification

  Before claiming TDD compliance, confirm:

@@ -99,20 +76,10 @@ Before claiming TDD compliance, confirm:
  - [ ] All tests pass after implementation
  - [ ] Refactoring (if any) did not break any tests

- Cannot check all boxes? You skipped TDD. Start over.
-
- ## When Stuck
-
- | Problem | Solution |
- |---------|----------|
- | Don't know how to test it | Write the assertion first. What should the output be? |
- | Test setup is too complex | The design is too complex. Simplify the interface. |
- | Must mock everything | Code is too coupled. Use dependency injection. |
- | Existing code has no tests | Add tests for the code you are changing. Start the cycle now. |
-
- ## Integration with MAXSIM
+ ## MAXSIM Integration

  In MAXSIM plan execution, tasks marked `tdd="true"` follow this cycle with per-step commits:
+
  - **RED commit:** `test({phase}-{plan}): add failing test for [feature]`
  - **GREEN commit:** `feat({phase}-{plan}): implement [feature]`
  - **REFACTOR commit (if changes made):** `refactor({phase}-{plan}): clean up [feature]`
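
The three commit templates above can be filled in mechanically; a minimal sketch with placeholder phase, plan, and feature values:

```bash
# Hedged sketch: composing the per-step commit messages from the templates
# above. The phase, plan, and feature values are placeholders.
phase="01"; plan="auth"; feature="login validation"
printf 'test(%s-%s): add failing test for %s\n' "$phase" "$plan" "$feature"
printf 'feat(%s-%s): implement %s\n' "$phase" "$plan" "$feature"
printf 'refactor(%s-%s): clean up %s\n' "$phase" "$plan" "$feature"
```
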
@@ -1,36 +1,33 @@
  ---
  name: using-maxsim
- description: Entry skill that establishes MAXSIM workflow rules — triggers before any action to route work through the correct MAXSIM commands, skills, and agents
  alwaysApply: true
+ description: >-
+   Routes all work through MAXSIM's spec-driven workflow: checks for planning
+   directory, determines active phase, and dispatches to the correct MAXSIM
+   command. Use when starting any work session, resuming work, or when unsure
+   which MAXSIM command to run.
  ---

  # Using MAXSIM

- MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks not ad-hoc coding.
+ MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks -- not ad-hoc coding.

- **If you are about to write code without a plan, STOP. Route through MAXSIM first.**
-
- ## The Iron Law
-
- <HARD-GATE>
- NO IMPLEMENTATION WITHOUT A PLAN.
- If there is no .planning/ directory, run `/maxsim:init` first.
+ **HARD GATE -- No implementation without a plan.**
+ If there is no `.planning/` directory, run `/maxsim:init` first.
  If there is no current phase, run `/maxsim:plan-phase` first.
  If there is no PLAN.md for the current phase, run `/maxsim:plan-phase` first.
  If there IS a plan, run `/maxsim:execute-phase` to execute it.
- Skipping the workflow is a violation — not a shortcut.
- </HARD-GATE>

- ## When This Skill Triggers
+ ## Process

- This skill applies to ALL work sessions. Before starting any task:
+ Before starting any task:

- 1. **Check for `.planning/` directory** if missing, initialize with `/maxsim:init`
- 2. **Check STATE.md** resume from last checkpoint if one exists
- 3. **Check current phase** determine what phase is active in ROADMAP.md
- 4. **Route to the correct command** based on the situation (see routing table below)
+ 1. **Check for `.planning/` directory** -- if missing, initialize with `/maxsim:init`
+ 2. **Check STATE.md** -- resume from last checkpoint if one exists
+ 3. **Check current phase** -- determine what phase is active in ROADMAP.md
+ 4. **Route to the correct command** based on the routing table below
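
The four checks above can be sketched as a routing function. This is an illustration only: the file locations `.planning/STATE.md` and `.planning/PLAN.md` are assumptions (MAXSIM may keep per-phase plans elsewhere), and only a few routes are modeled.

```bash
#!/usr/bin/env sh
# Hedged sketch of the pre-task routing checks. Paths are assumptions
# based on the conventions described above, not MAXSIM's actual layout.
route() {
  dir="$1"
  if [ ! -d "$dir/.planning" ]; then echo "/maxsim:init"
  elif grep -qs "checkpoint" "$dir/.planning/STATE.md"; then echo "/maxsim:resume-work"
  elif [ ! -f "$dir/.planning/PLAN.md" ]; then echo "/maxsim:plan-phase"
  else echo "/maxsim:execute-phase"
  fi
}
tmp=$(mktemp -d)
route "$tmp"                                  # -> /maxsim:init
mkdir -p "$tmp/.planning" && touch "$tmp/.planning/PLAN.md"
route "$tmp"                                  # -> /maxsim:execute-phase
rm -rf "$tmp"
```
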

- ## Routing Table
+ ### Routing Table

  | Situation | Route To |
  |-----------|----------|
@@ -39,18 +36,18 @@ This skill applies to ALL work sessions. Before starting any task:
  | Active phase has no PLAN.md | `/maxsim:plan-phase` |
  | Active phase has PLAN.md, not started | `/maxsim:execute-phase` |
  | Checkpoint exists in STATE.md | `/maxsim:resume-work` |
- | Bug found during execution | `/maxsim:debug` (triggers systematic-debugging skill) |
- | Phase complete, needs verification | `/maxsim:verify-phase` (triggers verification-before-completion skill) |
+ | Bug found during execution | `/maxsim:debug` |
+ | Phase complete, needs verification | `/maxsim:verify-phase` |
  | Quick standalone task | `/maxsim:quick` |
  | User asks for help | `/maxsim:help` |

- ## Available Skills
+ ### Available Skills

  Skills are behavioral rules that activate automatically based on context:

  | Skill | Triggers When |
  |-------|---------------|
- | `using-maxsim` | Always (alwaysApply) entry point for all MAXSIM work |
+ | `using-maxsim` | Always (alwaysApply) -- entry point for all MAXSIM work |
  | `systematic-debugging` | Any bug, test failure, or unexpected behavior encountered |
  | `tdd` | Implementing any feature or bug fix (write test first) |
  | `verification-before-completion` | Before claiming any work is complete or passing |
@@ -60,7 +57,7 @@ Skills are behavioral rules that activate automatically based on context:
  | `simplify` | When reviewing and cleaning up code changes |
  | `code-review` | When reviewing implementation quality |

- ## Available Agents
+ ### Available Agents

  Agents are specialized subagent prompts spawned by MAXSIM commands:

@@ -80,17 +77,7 @@ Agents are specialized subagent prompts spawned by MAXSIM commands:
  | `maxsim-codebase-mapper` | Maps codebase structure | `/maxsim:init` |
  | `maxsim-integration-checker` | Checks integration points | `/maxsim:verify-phase` |

- ## Common Rationalizations — REJECT THESE
-
- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "It's just a small fix" | Small fixes have context and consequences. Use `/maxsim:quick`. |
- | "I know what to do, I don't need a plan" | Plans catch what you miss. The plan is the checkpoint. |
- | "MAXSIM overhead is too much for this" | `/maxsim:quick` exists for lightweight tasks. Use it. |
- | "I'll plan it in my head" | Plans in your head die with context. Write them down. |
- | "The user said 'just do it'" | Route through `/maxsim:quick` — it is fast and still tracked. |
-
- ## Red Flags — STOP If You Catch Yourself:
+ ## Common Pitfalls

  - Writing implementation code without a PLAN.md
  - Skipping `/maxsim:init` because "the project is simple"
@@ -99,22 +86,22 @@ Agents are specialized subagent prompts spawned by MAXSIM commands:
  - Making architectural decisions without documenting them in STATE.md
  - Finishing work without running verification

- **If any red flag triggers: STOP. Check the routing table. Follow the workflow.**
+ **If any of these occur: stop, check the routing table, follow the workflow.**

- ## Verification Checklist
+ ## Verification

  Before ending any work session:

  - [ ] All work was routed through MAXSIM commands (not ad-hoc)
  - [ ] STATE.md reflects current progress and decisions
- - [ ] Any bugs encountered were debugged systematically (not guessed)
+ - [ ] Any bugs encountered were debugged systematically
  - [ ] Tests were written before implementation (TDD)
  - [ ] Completion claims have verification evidence
  - [ ] Recurring patterns or errors were saved to memory

- ## Integration with CLAUDE.md
+ ## MAXSIM Integration

  When a project has a `CLAUDE.md`, both apply:
  - `CLAUDE.md` defines project-specific conventions (language, tools, style)
  - MAXSIM skills define workflow rules (how work is structured and verified)
- - If they conflict, `CLAUDE.md` project conventions take priority for code style; MAXSIM takes priority for workflow structure
+ - If they conflict, `CLAUDE.md` takes priority for code style; MAXSIM takes priority for workflow structure
@@ -1,36 +1,28 @@
  ---
  name: verification-before-completion
- description: Use before claiming any work is complete, fixed, or passing — requires running verification commands and reading output before making success claims
+ description: >-
+   Requires running verification commands and reading actual output before making
+   any completion claims. Use when claiming work is done, tests pass, builds
+   succeed, or bugs are fixed. Prevents false completion claims.
  ---

  # Verification Before Completion

- Claiming work is complete without verification is dishonesty, not efficiency.
+ Evidence before claims, always.

- **Evidence before claims, always.**
+ **HARD GATE -- No completion claims without fresh verification evidence. If you have not run the verification command in this turn, you cannot claim it passes. "Should work" is not evidence. "I'm confident" is not evidence.**

- ## The Iron Law
+ ## Process

- <HARD-GATE>
- NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
- If you have not run the verification command in this turn, you CANNOT claim it passes.
- "Should work" is not evidence. "I'm confident" is not evidence.
- Violating this rule is a violation — not a special case.
- </HARD-GATE>
+ Before claiming any status or marking a task done:

- ## The Gate Function
-
- BEFORE claiming any status, expressing satisfaction, or marking a task done:
-
- 1. **IDENTIFY:** What command proves this claim?
- 2. **RUN:** Execute the FULL command (fresh, in this turn — not a previous run)
- 3. **READ:** Read the FULL output. Check the exit code. Count failures.
- 4. **VERIFY:** Does the output actually confirm the claim?
-    - If NO: State the actual status with evidence
-    - If YES: State the claim WITH the evidence
- 5. **CLAIM:** Only now may you assert completion
-
- **Skip any step = lying, not verifying.**
+ 1. **IDENTIFY** -- What command proves this claim?
+ 2. **RUN** -- Execute the full command fresh in this turn (not a previous run)
+ 3. **READ** -- Read the full output, check the exit code, count failures
+ 4. **VERIFY** -- Does the output actually confirm the claim?
+    - If NO: state the actual status with evidence
+    - If YES: state the claim with the evidence
+ 5. **CLAIM** -- Only now may you assert completion

  ### Evidence Block Format

@@ -43,35 +35,11 @@ OUTPUT: [relevant excerpt of actual output]
  VERDICT: PASS | FAIL
  ```
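
Producing the evidence block can be scripted. A hedged sketch; `true` stands in for the real verification command, and the COMMAND and EXIT CODE field names are assumed from context, since this hunk only shows the OUTPUT and VERDICT lines of the format:

```bash
#!/usr/bin/env sh
# Hedged sketch: run the verification command fresh and emit an evidence
# block. "true" is a placeholder for the real command, e.g. a test or
# build invocation; field names before OUTPUT are assumptions.
cmd="true"
output=$($cmd 2>&1)
status=$?
echo "COMMAND: $cmd"
echo "EXIT CODE: $status"
echo "OUTPUT: ${output:-<none>}"
if [ "$status" -eq 0 ]; then echo "VERDICT: PASS"; else echo "VERDICT: FAIL"; fi
```
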

- This format is required for task completion claims in MAXSIM plan execution. It is NOT required for intermediate status updates like "I have read the file" or "here is the plan."
-
- ## Common Rationalizations — REJECT THESE
-
- | Excuse | Why It Violates the Rule |
- |--------|--------------------------|
- | "Should work now" | "Should" is not evidence. RUN the command. |
- | "I'm confident in the logic" | Confidence is not evidence. Run it. |
- | "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
- | "Just this once" | NO EXCEPTIONS. This is the rule, not a guideline. |
- | "I only changed one line" | One line can break everything. Verify. |
- | "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
- | "Partial check is enough" | Partial proves nothing about the unchecked parts. |
-
- ## Red Flags — STOP If You Catch Yourself:
+ This format is required for task completion claims in MAXSIM plan execution. It is not required for intermediate status updates.

- - Using "should", "probably", "seems to", or "looks good" about unverified work
- - Expressing satisfaction ("Great!", "Perfect!", "Done!") before running verification
- - About to commit or push without running the test/build command in THIS turn
- - Trusting a subagent's completion report without independent verification
- - Thinking "the last run was clean, I only changed one line"
- - About to mark a MAXSIM task as done without running the `<verify>` block
- - Relying on a previous turn's test output as current evidence
+ ### What Counts as Verification

- **If any red flag triggers: STOP. Run the command. Read the output. THEN make the claim.**
-
- ## What Counts as Verification
-
- | Claim | Requires | NOT Sufficient |
+ | Claim | Requires | Not Sufficient |
  |-------|----------|----------------|
  | "Tests pass" | Test command output showing 0 failures | Previous run, "should pass", partial run |
  | "Build succeeds" | Build command with exit code 0 | Linter passing, "logs look clean" |
@@ -79,7 +47,19 @@ This format is required for task completion claims in MAXSIM plan execution. It
  | "Task is complete" | All done criteria checked with evidence | "I implemented everything in the plan" |
  | "No regressions" | Full test suite passing | "I only changed one file" |

- ## Verification Checklist
+ ## Common Pitfalls
+
+ | Excuse | Why It Fails |
+ |--------|-------------|
+ | "Should work now" | "Should" is not evidence. Run the command. |
+ | "I'm confident in the logic" | Confidence is not evidence. Run it. |
+ | "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
+ | "I only changed one line" | One line can break everything. Verify. |
+ | "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
+
+ Stop if you catch yourself using "should", "probably", or "looks good" about unverified work, or expressing satisfaction before running verification.
+
+ ## Verification

  Before marking any work as complete:

@@ -91,12 +71,13 @@ Before marking any work as complete:
  - [ ] No "should", "probably", or "seems to" in your completion statement
  - [ ] Evidence block produced for the task completion claim

- ## In MAXSIM Plan Execution
+ ## MAXSIM Integration
+
+ The executor's task commit protocol requires verification before committing:

- The executor's task commit protocol requires verification BEFORE committing:
- 1. Run the task's `<verify>` block (automated checks)
- 2. Confirm the `<done>` criteria are met with evidence
+ 1. Run the task's verify block (automated checks)
+ 2. Confirm the done criteria are met with evidence
  3. Produce an evidence block for the task completion
  4. Only then: stage files and commit

- The verifier agent independently re-checks all claims do not assume the verifier will catch what you missed.
+ The verifier agent independently re-checks all claims -- do not assume the verifier will catch what you missed.