maxsimcli 4.0.0 → 4.0.2
- package/README.md +43 -17
- package/dist/assets/CHANGELOG.md +60 -0
- package/dist/assets/templates/skills/batch-worktree/SKILL.md +38 -85
- package/dist/assets/templates/skills/brainstorming/SKILL.md +44 -114
- package/dist/assets/templates/skills/code-review/SKILL.md +43 -71
- package/dist/assets/templates/skills/memory-management/SKILL.md +36 -100
- package/dist/assets/templates/skills/roadmap-writing/SKILL.md +39 -73
- package/dist/assets/templates/skills/sdd/SKILL.md +36 -85
- package/dist/assets/templates/skills/simplify/SKILL.md +96 -139
- package/dist/assets/templates/skills/systematic-debugging/SKILL.md +41 -74
- package/dist/assets/templates/skills/tdd/SKILL.md +32 -65
- package/dist/assets/templates/skills/using-maxsim/SKILL.md +26 -39
- package/dist/assets/templates/skills/verification-before-completion/SKILL.md +37 -56
- package/dist/mcp-server.cjs +22489 -32
- package/dist/mcp-server.cjs.map +1 -1
- package/package.json +1 -1
|
@@ -1,104 +1,71 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: systematic-debugging
|
|
3
|
-
description:
|
|
3
|
+
description: >-
|
|
4
|
+
Investigates bugs through systematic root-cause analysis: reproduce, hypothesize,
|
|
5
|
+
isolate, verify, fix, confirm. Use when encountering any bug, test failure,
|
|
6
|
+
unexpected behavior, or error message.
|
|
4
7
|
---
|
|
5
8
|
|
|
6
9
|
# Systematic Debugging
|
|
7
10
|
|
|
8
|
-
Random fixes waste time and create new bugs.
|
|
11
|
+
Find the root cause first. Random fixes waste time and create new bugs.
|
|
9
12
|
|
|
10
|
-
**If you have not
|
|
13
|
+
**HARD GATE -- No fix attempts without understanding root cause. If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.**
|
|
11
14
|
|
|
12
|
-
##
|
|
15
|
+
## Process
|
|
13
16
|
|
|
14
|
-
|
|
15
|
-
NO FIX ATTEMPTS WITHOUT UNDERSTANDING ROOT CAUSE.
|
|
16
|
-
If you have not completed the REPRODUCE and HYPOTHESIZE steps, you CANNOT propose a fix.
|
|
17
|
-
"Let me just try this" is guessing, not debugging.
|
|
18
|
-
Violating this rule is a violation — not a time-saving shortcut.
|
|
19
|
-
</HARD-GATE>
|
|
20
|
-
|
|
21
|
-
## The Gate Function
|
|
22
|
-
|
|
23
|
-
Follow these steps IN ORDER for every bug, test failure, or unexpected behavior.
|
|
24
|
-
|
|
25
|
-
### 1. REPRODUCE — Confirm the Problem
|
|
17
|
+
### 1. REPRODUCE -- Confirm the Problem
|
|
26
18
|
|
|
27
19
|
- Run the failing command or test. Capture the EXACT error output.
|
|
28
20
|
- Can you trigger it reliably? What are the exact steps?
|
|
29
|
-
- If not reproducible: gather more data
|
|
30
|
-
|
|
31
|
-
```bash
|
|
32
|
-
# Example: reproduce a test failure
|
|
33
|
-
npx vitest run path/to/failing.test.ts
|
|
34
|
-
```
|
|
21
|
+
- If not reproducible: gather more data -- do not guess.
|
|
35
22
|
|
|
36
|
-
### 2. HYPOTHESIZE
|
|
23
|
+
### 2. HYPOTHESIZE -- Form a Theory
|
|
37
24
|
|
|
38
|
-
- Read the error message
|
|
39
|
-
- Check recent changes: `git diff`, recent commits, new dependencies
|
|
25
|
+
- Read the error message completely (stack trace, line numbers, exit codes).
|
|
26
|
+
- Check recent changes: `git diff`, recent commits, new dependencies.
|
|
40
27
|
- Trace data flow: where does the bad value originate?
|
|
41
|
-
- State your hypothesis clearly: "I think X is the root cause because Y"
|
|
28
|
+
- State your hypothesis clearly: "I think X is the root cause because Y."
|
|
42
29
|
|
|
43
|
-
### 3. ISOLATE
|
|
30
|
+
### 3. ISOLATE -- Narrow the Scope
|
|
44
31
|
|
|
45
|
-
- Find the
|
|
46
|
-
- In multi-component systems, add diagnostic logging at each boundary
|
|
47
|
-
- Identify which
|
|
48
|
-
- Compare against working examples in the codebase
|
|
32
|
+
- Find the smallest reproduction case.
|
|
33
|
+
- In multi-component systems, add diagnostic logging at each boundary.
|
|
34
|
+
- Identify which specific layer or component is failing.
|
|
35
|
+
- Compare against working examples in the codebase.
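The ISOLATE step's "smallest reproduction case" can be approached mechanically when the failing input is a list. The sketch below is one possible way to do it, not part of MAXSIM: `shrink_failing_input`, `buggy_total`, and `triggers_bug` are hypothetical names, and `buggy_total` is only a stand-in for the code under investigation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def shrink_failing_input(items: List[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop elements while the failure still reproduces.

    Hypothetical helper: `fails` must return True when the bug triggers.
    """
    if not fails(items):
        raise ValueError("input does not reproduce the failure; return to REPRODUCE")
    current = list(items)
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if fails(candidate):
            current = candidate  # element was irrelevant, keep it removed
        else:
            index += 1  # element is needed to trigger the bug, keep it
    return current


def buggy_total(values: List[int]) -> int:
    # Stand-in for the code under investigation: it crashes on negative numbers.
    if any(v < 0 for v in values):
        raise RuntimeError("negative value")
    return sum(values)


def triggers_bug(values: List[int]) -> bool:
    try:
        buggy_total(values)
    except RuntimeError:
        return True
    return False


smallest = shrink_failing_input([3, 8, -1, 5, 12], triggers_bug)
print(smallest)
```

Each pass removes exactly one element, which keeps the search in line with the "one variable at a time" rule from the VERIFY step.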
|
|
49
36
|
|
|
50
|
-
### 4. VERIFY
|
|
37
|
+
### 4. VERIFY -- Test Your Hypothesis
|
|
51
38
|
|
|
52
|
-
- Make the
|
|
53
|
-
- Change
|
|
54
|
-
- If hypothesis is wrong: form a
|
|
39
|
+
- Make the smallest possible change to test your hypothesis.
|
|
40
|
+
- Change one variable at a time -- never multiple things simultaneously.
|
|
41
|
+
- If hypothesis is wrong: form a new hypothesis, do not stack fixes.
|
|
55
42
|
|
|
56
|
-
### 5. FIX
|
|
43
|
+
### 5. FIX -- Address the Root Cause
|
|
57
44
|
|
|
58
|
-
- Write a failing test that reproduces the bug
|
|
59
|
-
- Implement a
|
|
60
|
-
- No "while I'm here" improvements
|
|
45
|
+
- Write a failing test that reproduces the bug.
|
|
46
|
+
- Implement a single fix that addresses the root cause.
|
|
47
|
+
- No "while I'm here" improvements -- fix only the identified issue.
|
|
61
48
|
|
|
62
|
-
### 6. CONFIRM
|
|
49
|
+
### 6. CONFIRM -- Verify the Fix
|
|
63
50
|
|
|
64
|
-
- Run the original failing test: it must now pass
|
|
65
|
-
- Run the full test suite: no regressions
|
|
66
|
-
- Verify the original error no longer occurs
|
|
51
|
+
- Run the original failing test: it must now pass.
|
|
52
|
+
- Run the full test suite: no regressions.
|
|
53
|
+
- Verify the original error no longer occurs.
|
|
67
54
|
|
|
68
|
-
|
|
69
|
-
# Confirm the specific fix
|
|
70
|
-
npx vitest run path/to/fixed.test.ts
|
|
71
|
-
# Confirm no regressions
|
|
72
|
-
npx vitest run
|
|
73
|
-
```
|
|
55
|
+
## Common Pitfalls
|
|
74
56
|
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
|
78
|
-
|
|
79
|
-
| "I think I know what it is" | Thinking is not evidence. Reproduce first, then hypothesize. |
|
|
80
|
-
| "Let me just try this fix" | "Just try" = guessing. You have skipped REPRODUCE and HYPOTHESIZE. |
|
|
81
|
-
| "Quick patch for now, investigate later" | "Later" never comes. Patches mask the real problem. |
|
|
57
|
+
| Excuse | Reality |
|
|
58
|
+
|--------|---------|
|
|
59
|
+
| "I think I know what it is" | Thinking is not evidence. Reproduce first. |
|
|
60
|
+
| "Let me just try this fix" | That is guessing. Complete REPRODUCE and HYPOTHESIZE first. |
|
|
82
61
|
| "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
|
|
83
|
-
| "The issue is simple
|
|
84
|
-
| "I'm under time pressure" | Systematic debugging IS faster than guess-and-check thrashing. |
|
|
85
|
-
| "The reference is too long, I'll skim it" | Partial understanding guarantees partial fixes. Read it completely. |
|
|
86
|
-
|
|
87
|
-
## Red Flags — STOP If You Catch Yourself:
|
|
88
|
-
|
|
89
|
-
- Changing code before reproducing the error
|
|
90
|
-
- Proposing a fix before reading the full error message and stack trace
|
|
91
|
-
- Trying random fixes hoping one will work
|
|
92
|
-
- Changing multiple things simultaneously
|
|
93
|
-
- Saying "it's probably X" without evidence
|
|
94
|
-
- Applying a fix that did not work, then adding another fix on top
|
|
95
|
-
- On your 3rd failed fix attempt (this signals an architectural problem — escalate)
|
|
62
|
+
| "The issue is simple" | Simple bugs have root causes too. The process is fast for simple bugs. |
|
|
96
63
|
|
|
97
|
-
|
|
64
|
+
Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once. If any of these triggers, return to step 1.
|
|
98
65
|
|
|
99
|
-
|
|
66
|
+
If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried and escalate to the user.
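The escalation rule above could be tracked with a small log. This is a minimal sketch under assumptions: `FixLog` and its methods are hypothetical, and only the threshold of three failed attempts comes from the skill text.

```python
from dataclasses import dataclass, field
from typing import List

MAX_FAILED_FIXES = 3  # threshold taken from the skill text


@dataclass
class FixLog:
    """Hypothetical record of fix attempts that did not resolve the bug."""

    failed: List[str] = field(default_factory=list)

    def record_failure(self, description: str) -> None:
        self.failed.append(description)

    def should_escalate(self) -> bool:
        # Three or more failed fixes suggests an architectural problem.
        return len(self.failed) >= MAX_FAILED_FIXES

    def escalation_report(self) -> str:
        # Document what was tried so the user can make the decision.
        lines = [f"{n}. {text}" for n, text in enumerate(self.failed, start=1)]
        return "Fix attempts so far:\n" + "\n".join(lines)


log = FixLog()
log.record_failure("Pinned the dependency version")
log.record_failure("Added a retry around the request")
escalate_after_two = log.should_escalate()
log.record_failure("Increased the timeout")
escalate_after_three = log.should_escalate()
print(log.escalation_report())
```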
|
|
100
67
|
|
|
101
|
-
## Verification
|
|
68
|
+
## Verification
|
|
102
69
|
|
|
103
70
|
Before claiming a bug is fixed, confirm:
|
|
104
71
|
|
|
@@ -110,9 +77,9 @@ Before claiming a bug is fixed, confirm:
|
|
|
110
77
|
- [ ] The full test suite passes (no regressions)
|
|
111
78
|
- [ ] The original error no longer occurs when running the original steps
|
|
112
79
|
|
|
113
|
-
##
|
|
80
|
+
## MAXSIM Integration
|
|
114
81
|
|
|
115
82
|
When debugging during plan execution, MAXSIM deviation rules apply:
|
|
116
83
|
- **Rule 1 (Auto-fix bugs):** You may auto-fix bugs found during execution, but you must still follow this debugging process.
|
|
117
|
-
- **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint
|
|
84
|
+
- **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint -- this is an architectural decision for the user.
|
|
118
85
|
- Track all debugging deviations for SUMMARY.md documentation.
|
|
@@ -1,93 +1,70 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: tdd
|
|
3
|
-
description:
|
|
3
|
+
description: >-
|
|
4
|
+
Enforces test-driven development with the Red-Green-Refactor cycle: write a
|
|
5
|
+
failing test first, implement minimal code to pass, then refactor. Use when
|
|
6
|
+
implementing features, fixing bugs, or adding new behavior.
|
|
4
7
|
---
|
|
5
8
|
|
|
6
9
|
# Test-Driven Development (TDD)
|
|
7
10
|
|
|
8
11
|
Write the test first. Watch it fail. Write minimal code to pass. Clean up.
|
|
9
12
|
|
|
10
|
-
**
|
|
13
|
+
**HARD GATE: No implementation code without a failing test first. If you wrote production code before the test, delete it and start over. No exceptions.**
|
|
11
14
|
|
|
12
|
-
##
|
|
15
|
+
## Process
|
|
13
16
|
|
|
14
|
-
|
|
15
|
-
NO IMPLEMENTATION CODE WITHOUT A FAILING TEST FIRST.
|
|
16
|
-
If you wrote production code before the test, DELETE IT. Start over.
|
|
17
|
-
No exceptions. No "I'll add tests after." No "keep as reference."
|
|
18
|
-
Violating this rule is a violation — not a judgment call.
|
|
19
|
-
</HARD-GATE>
|
|
17
|
+
### 1. RED -- Write One Failing Test
|
|
20
18
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
Follow this cycle for every behavior change, feature addition, or bug fix.
|
|
24
|
-
|
|
25
|
-
### 1. RED — Write Failing Test
|
|
26
|
-
|
|
27
|
-
- Write ONE minimal test that describes the desired behavior
|
|
19
|
+
- Write ONE minimal test describing the desired behavior
|
|
28
20
|
- Test name describes what SHOULD happen, not implementation details
|
|
29
|
-
- Use real code paths
|
|
21
|
+
- Use real code paths -- mocks only when unavoidable (external APIs, databases)
|
|
30
22
|
|
|
31
|
-
### 2. VERIFY RED
|
|
23
|
+
### 2. VERIFY RED -- Run the Test
|
|
32
24
|
|
|
33
|
-
|
|
34
|
-
# Run the test suite for this file
|
|
35
|
-
npx vitest run path/to/test.test.ts
|
|
36
|
-
```
|
|
37
|
-
|
|
38
|
-
- Test MUST fail (not error — fail with an assertion)
|
|
25
|
+
- Test MUST fail with an assertion (not error out from syntax or imports)
|
|
39
26
|
- Failure message must match the missing behavior
|
|
40
|
-
- If test passes immediately
|
|
27
|
+
- If the test passes immediately, you are testing existing behavior -- rewrite it
|
|
41
28
|
|
|
42
|
-
### 3. GREEN
|
|
29
|
+
### 3. GREEN -- Write Minimal Code
|
|
43
30
|
|
|
44
31
|
- Write the SIMPLEST code that makes the test pass
|
|
45
32
|
- Do NOT add features the test does not require
|
|
46
|
-
- Do NOT refactor yet
|
|
47
|
-
|
|
48
|
-
### 4. VERIFY GREEN — Run All Tests
|
|
33
|
+
- Do NOT refactor yet
|
|
49
34
|
|
|
50
|
-
|
|
51
|
-
npx vitest run
|
|
52
|
-
```
|
|
35
|
+
### 4. VERIFY GREEN -- Run All Tests
|
|
53
36
|
|
|
54
37
|
- The new test MUST pass
|
|
55
38
|
- ALL existing tests MUST still pass
|
|
56
|
-
- If any test fails
|
|
39
|
+
- If any test fails, fix code -- not tests
|
|
57
40
|
|
|
58
|
-
### 5. REFACTOR
|
|
41
|
+
### 5. REFACTOR -- Clean Up (Tests Still Green)
|
|
59
42
|
|
|
60
43
|
- Remove duplication, improve names, extract helpers
|
|
61
|
-
- Run tests after every change
|
|
44
|
+
- Run tests after every change
|
|
62
45
|
- Do NOT add new behavior during refactor
|
|
63
46
|
|
|
64
|
-
### 6. REPEAT
|
|
47
|
+
### 6. REPEAT -- Next failing test for next behavior
|
|
65
48
|
|
|
66
|
-
## Common
|
|
49
|
+
## Common Pitfalls
|
|
67
50
|
|
|
68
|
-
| Excuse | Why
|
|
69
|
-
|
|
70
|
-
| "Too simple to test" | Simple code breaks. The test takes 30 seconds
|
|
71
|
-
| "I'll add tests after" | Tests written after pass immediately
|
|
72
|
-
| "The test framework isn't set up yet" | Set it up. That is part of the task, not a reason to skip. |
|
|
51
|
+
| Excuse | Why it fails |
|
|
52
|
+
|--------|-------------|
|
|
53
|
+
| "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
|
|
54
|
+
| "I'll add tests after" | Tests written after pass immediately -- they prove nothing. |
|
|
73
55
|
| "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
|
|
74
|
-
| "TDD is slower
|
|
56
|
+
| "TDD is slower" | TDD is faster than debugging. Every skip creates debt. |
|
|
75
57
|
| "Let me keep the code as reference" | You will adapt it instead of writing test-first. Delete means delete. |
|
|
76
|
-
| "I need to explore the design first" | Explore, then throw it away. Start implementation with TDD. |
|
|
77
58
|
|
|
78
|
-
|
|
59
|
+
Stop immediately if you catch yourself:
|
|
79
60
|
|
|
80
61
|
- Writing implementation code before writing a test
|
|
81
|
-
- Writing a test that passes on the first run
|
|
82
|
-
- Skipping the VERIFY RED step
|
|
62
|
+
- Writing a test that passes on the first run
|
|
63
|
+
- Skipping the VERIFY RED step
|
|
83
64
|
- Adding features beyond what the current test requires
|
|
84
|
-
-
|
|
85
|
-
- Rationalizing "just this once" or "this is different"
|
|
86
|
-
- Keeping pre-TDD code "as reference" while writing tests
|
|
65
|
+
- Keeping pre-TDD code "as reference"
|
|
87
66
|
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
## Verification Checklist
|
|
67
|
+
## Verification
|
|
91
68
|
|
|
92
69
|
Before claiming TDD compliance, confirm:
|
|
93
70
|
|
|
@@ -99,20 +76,10 @@ Before claiming TDD compliance, confirm:
|
|
|
99
76
|
- [ ] All tests pass after implementation
|
|
100
77
|
- [ ] Refactoring (if any) did not break any tests
|
|
101
78
|
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
## When Stuck
|
|
105
|
-
|
|
106
|
-
| Problem | Solution |
|
|
107
|
-
|---------|----------|
|
|
108
|
-
| Don't know how to test it | Write the assertion first. What should the output be? |
|
|
109
|
-
| Test setup is too complex | The design is too complex. Simplify the interface. |
|
|
110
|
-
| Must mock everything | Code is too coupled. Use dependency injection. |
|
|
111
|
-
| Existing code has no tests | Add tests for the code you are changing. Start the cycle now. |
|
|
112
|
-
|
|
113
|
-
## Integration with MAXSIM
|
|
79
|
+
## MAXSIM Integration
|
|
114
80
|
|
|
115
81
|
In MAXSIM plan execution, tasks marked `tdd="true"` follow this cycle with per-step commits:
|
|
82
|
+
|
|
116
83
|
- **RED commit:** `test({phase}-{plan}): add failing test for [feature]`
|
|
117
84
|
- **GREEN commit:** `feat({phase}-{plan}): implement [feature]`
|
|
118
85
|
- **REFACTOR commit (if changes made):** `refactor({phase}-{plan}): clean up [feature]`
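The three commit formats above can be generated from one template table. This is a sketch, not MAXSIM's own tooling: the function name and the example phase, plan, and feature values are hypothetical, while the message shapes are copied from the list above.

```python
COMMIT_TEMPLATES = {
    "red": "test({phase}-{plan}): add failing test for {feature}",
    "green": "feat({phase}-{plan}): implement {feature}",
    "refactor": "refactor({phase}-{plan}): clean up {feature}",
}


def tdd_commit_message(step: str, phase: str, plan: str, feature: str) -> str:
    """Build the commit message for one step of the Red-Green-Refactor cycle."""
    if step not in COMMIT_TEMPLATES:
        raise ValueError(f"unknown TDD step: {step!r}")
    return COMMIT_TEMPLATES[step].format(phase=phase, plan=plan, feature=feature)


# Hypothetical phase "03", plan "01", and feature name.
for step in ("red", "green", "refactor"):
    print(tdd_commit_message(step, "03", "01", "login rate limit"))
```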
|
|
@@ -1,36 +1,33 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: using-maxsim
|
|
3
|
-
description: Entry skill that establishes MAXSIM workflow rules — triggers before any action to route work through the correct MAXSIM commands, skills, and agents
|
|
4
3
|
alwaysApply: true
|
|
4
|
+
description: >-
|
|
5
|
+
Routes all work through MAXSIM's spec-driven workflow: checks for planning
|
|
6
|
+
directory, determines active phase, and dispatches to the correct MAXSIM
|
|
7
|
+
command. Use when starting any work session, resuming work, or when unsure
|
|
8
|
+
which MAXSIM command to run.
|
|
5
9
|
---
|
|
6
10
|
|
|
7
11
|
# Using MAXSIM
|
|
8
12
|
|
|
9
|
-
MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks
|
|
13
|
+
MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks -- not ad-hoc coding.
|
|
10
14
|
|
|
11
|
-
**
|
|
12
|
-
|
|
13
|
-
## The Iron Law
|
|
14
|
-
|
|
15
|
-
<HARD-GATE>
|
|
16
|
-
NO IMPLEMENTATION WITHOUT A PLAN.
|
|
17
|
-
If there is no .planning/ directory, run `/maxsim:init` first.
|
|
15
|
+
**HARD GATE -- No implementation without a plan.**
|
|
16
|
+
If there is no `.planning/` directory, run `/maxsim:init` first.
|
|
18
17
|
If there is no current phase, run `/maxsim:plan-phase` first.
|
|
19
18
|
If there is no PLAN.md for the current phase, run `/maxsim:plan-phase` first.
|
|
20
19
|
If there IS a plan, run `/maxsim:execute-phase` to execute it.
|
|
21
|
-
Skipping the workflow is a violation — not a shortcut.
|
|
22
|
-
</HARD-GATE>
|
|
23
20
|
|
|
24
|
-
##
|
|
21
|
+
## Process
|
|
25
22
|
|
|
26
|
-
|
|
23
|
+
Before starting any task:
|
|
27
24
|
|
|
28
|
-
1. **Check for `.planning/` directory**
|
|
29
|
-
2. **Check STATE.md**
|
|
30
|
-
3. **Check current phase**
|
|
31
|
-
4. **Route to the correct command** based on the
|
|
25
|
+
1. **Check for `.planning/` directory** -- if missing, initialize with `/maxsim:init`
|
|
26
|
+
2. **Check STATE.md** -- resume from last checkpoint if one exists
|
|
27
|
+
3. **Check current phase** -- determine what phase is active in ROADMAP.md
|
|
28
|
+
4. **Route to the correct command** based on the routing table below
|
|
32
29
|
|
|
33
|
-
|
|
30
|
+
### Routing Table
|
|
34
31
|
|
|
35
32
|
| Situation | Route To |
|
|
36
33
|
|-----------|----------|
|
|
@@ -39,18 +36,18 @@ This skill applies to ALL work sessions. Before starting any task:
|
|
|
39
36
|
| Active phase has no PLAN.md | `/maxsim:plan-phase` |
|
|
40
37
|
| Active phase has PLAN.md, not started | `/maxsim:execute-phase` |
|
|
41
38
|
| Checkpoint exists in STATE.md | `/maxsim:resume-work` |
|
|
42
|
-
| Bug found during execution | `/maxsim:debug`
|
|
43
|
-
| Phase complete, needs verification | `/maxsim:verify-phase`
|
|
39
|
+
| Bug found during execution | `/maxsim:debug` |
|
|
40
|
+
| Phase complete, needs verification | `/maxsim:verify-phase` |
|
|
44
41
|
| Quick standalone task | `/maxsim:quick` |
|
|
45
42
|
| User asks for help | `/maxsim:help` |
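
The session-start checks and the routing table can be read as a single decision function. The sketch below assumes the four checks run in the listed order. `ProjectState` and its flags are hypothetical, because the diff does not show how MAXSIM detects each condition. Only the session-start routes are covered, not `/maxsim:debug`, `/maxsim:verify-phase`, `/maxsim:quick`, or `/maxsim:help`.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    # Hypothetical flags; the real detection logic is not shown in the diff.
    has_planning_dir: bool
    has_checkpoint: bool = False
    has_active_phase: bool = False
    has_plan: bool = False


def route(state: ProjectState) -> str:
    """Pick the MAXSIM command for a new session, following the checks in order."""
    if not state.has_planning_dir:
        return "/maxsim:init"
    if state.has_checkpoint:
        return "/maxsim:resume-work"
    if not state.has_active_phase or not state.has_plan:
        return "/maxsim:plan-phase"
    return "/maxsim:execute-phase"


print(route(ProjectState(has_planning_dir=False)))
print(route(ProjectState(has_planning_dir=True, has_active_phase=True, has_plan=True)))
```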
|
|
46
43
|
|
|
47
|
-
|
|
44
|
+
### Available Skills
|
|
48
45
|
|
|
49
46
|
Skills are behavioral rules that activate automatically based on context:
|
|
50
47
|
|
|
51
48
|
| Skill | Triggers When |
|
|
52
49
|
|-------|---------------|
|
|
53
|
-
| `using-maxsim` | Always (alwaysApply)
|
|
50
|
+
| `using-maxsim` | Always (alwaysApply) -- entry point for all MAXSIM work |
|
|
54
51
|
| `systematic-debugging` | Any bug, test failure, or unexpected behavior encountered |
|
|
55
52
|
| `tdd` | Implementing any feature or bug fix (write test first) |
|
|
56
53
|
| `verification-before-completion` | Before claiming any work is complete or passing |
|
|
@@ -60,7 +57,7 @@ Skills are behavioral rules that activate automatically based on context:
|
|
|
60
57
|
| `simplify` | When reviewing and cleaning up code changes |
|
|
61
58
|
| `code-review` | When reviewing implementation quality |
|
|
62
59
|
|
|
63
|
-
|
|
60
|
+
### Available Agents
|
|
64
61
|
|
|
65
62
|
Agents are specialized subagent prompts spawned by MAXSIM commands:
|
|
66
63
|
|
|
@@ -80,17 +77,7 @@ Agents are specialized subagent prompts spawned by MAXSIM commands:
|
|
|
80
77
|
| `maxsim-codebase-mapper` | Maps codebase structure | `/maxsim:init` |
|
|
81
78
|
| `maxsim-integration-checker` | Checks integration points | `/maxsim:verify-phase` |
|
|
82
79
|
|
|
83
|
-
## Common
|
|
84
|
-
|
|
85
|
-
| Excuse | Why It Violates the Rule |
|
|
86
|
-
|--------|--------------------------|
|
|
87
|
-
| "It's just a small fix" | Small fixes have context and consequences. Use `/maxsim:quick`. |
|
|
88
|
-
| "I know what to do, I don't need a plan" | Plans catch what you miss. The plan is the checkpoint. |
|
|
89
|
-
| "MAXSIM overhead is too much for this" | `/maxsim:quick` exists for lightweight tasks. Use it. |
|
|
90
|
-
| "I'll plan it in my head" | Plans in your head die with context. Write them down. |
|
|
91
|
-
| "The user said 'just do it'" | Route through `/maxsim:quick` — it is fast and still tracked. |
|
|
92
|
-
|
|
93
|
-
## Red Flags — STOP If You Catch Yourself:
|
|
80
|
+
## Common Pitfalls
|
|
94
81
|
|
|
95
82
|
- Writing implementation code without a PLAN.md
|
|
96
83
|
- Skipping `/maxsim:init` because "the project is simple"
|
|
@@ -99,22 +86,22 @@ Agents are specialized subagent prompts spawned by MAXSIM commands:
|
|
|
99
86
|
- Making architectural decisions without documenting them in STATE.md
|
|
100
87
|
- Finishing work without running verification
|
|
101
88
|
|
|
102
|
-
**If any
|
|
89
|
+
**If any of these occur: stop, check the routing table, follow the workflow.**
|
|
103
90
|
|
|
104
|
-
## Verification
|
|
91
|
+
## Verification
|
|
105
92
|
|
|
106
93
|
Before ending any work session:
|
|
107
94
|
|
|
108
95
|
- [ ] All work was routed through MAXSIM commands (not ad-hoc)
|
|
109
96
|
- [ ] STATE.md reflects current progress and decisions
|
|
110
|
-
- [ ] Any bugs encountered were debugged systematically
|
|
97
|
+
- [ ] Any bugs encountered were debugged systematically
|
|
111
98
|
- [ ] Tests were written before implementation (TDD)
|
|
112
99
|
- [ ] Completion claims have verification evidence
|
|
113
100
|
- [ ] Recurring patterns or errors were saved to memory
|
|
114
101
|
|
|
115
|
-
## Integration
|
|
102
|
+
## MAXSIM Integration
|
|
116
103
|
|
|
117
104
|
When a project has a `CLAUDE.md`, both apply:
|
|
118
105
|
- `CLAUDE.md` defines project-specific conventions (language, tools, style)
|
|
119
106
|
- MAXSIM skills define workflow rules (how work is structured and verified)
|
|
120
|
-
- If they conflict, `CLAUDE.md`
|
|
107
|
+
- If they conflict, `CLAUDE.md` takes priority for code style; MAXSIM takes priority for workflow structure
|
|
@@ -1,36 +1,28 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: verification-before-completion
|
|
3
|
-
description:
|
|
3
|
+
description: >-
|
|
4
|
+
Requires running verification commands and reading actual output before making
|
|
5
|
+
any completion claims. Use when claiming work is done, tests pass, builds
|
|
6
|
+
succeed, or bugs are fixed. Prevents false completion claims.
|
|
4
7
|
---
|
|
5
8
|
|
|
6
9
|
# Verification Before Completion
|
|
7
10
|
|
|
8
|
-
|
|
11
|
+
Evidence before claims, always.
|
|
9
12
|
|
|
10
|
-
**
|
|
13
|
+
**HARD GATE -- No completion claims without fresh verification evidence. If you have not run the verification command in this turn, you cannot claim it passes. "Should work" is not evidence. "I'm confident" is not evidence.**
|
|
11
14
|
|
|
12
|
-
##
|
|
15
|
+
## Process
|
|
13
16
|
|
|
14
|
-
|
|
15
|
-
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
|
|
16
|
-
If you have not run the verification command in this turn, you CANNOT claim it passes.
|
|
17
|
-
"Should work" is not evidence. "I'm confident" is not evidence.
|
|
18
|
-
Violating this rule is a violation — not a special case.
|
|
19
|
-
</HARD-GATE>
|
|
17
|
+
Before claiming any status or marking a task done:
|
|
20
18
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
4. **VERIFY:** Does the output actually confirm the claim?
|
|
29
|
-
- If NO: State the actual status with evidence
|
|
30
|
-
- If YES: State the claim WITH the evidence
|
|
31
|
-
5. **CLAIM:** Only now may you assert completion
|
|
32
|
-
|
|
33
|
-
**Skip any step = lying, not verifying.**
|
|
19
|
+
1. **IDENTIFY** -- What command proves this claim?
|
|
20
|
+
2. **RUN** -- Execute the full command fresh in this turn (not a previous run)
|
|
21
|
+
3. **READ** -- Read the full output, check the exit code, count failures
|
|
22
|
+
4. **VERIFY** -- Does the output actually confirm the claim?
|
|
23
|
+
- If NO: state the actual status with evidence
|
|
24
|
+
- If YES: state the claim with the evidence
|
|
25
|
+
5. **CLAIM** -- Only now may you assert completion
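The five steps above could be sketched as a helper that runs the command fresh and reports what actually happened. This is an illustration under assumptions: only the `OUTPUT` and `VERDICT` field names appear in the diff, so `CLAIM`, `COMMAND`, and `EXIT CODE` are guesses at the rest of the evidence block. The two demo commands are stand-ins for a real test or build command.

```python
import subprocess
import sys
from typing import List


def evidence_block(claim: str, command: List[str]) -> str:
    """Run `command` now and build an evidence block from its real result."""
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    # The verdict comes from the exit code, never from expectation.
    verdict = "PASS" if result.returncode == 0 else "FAIL"
    return "\n".join([
        f"CLAIM: {claim}",                   # assumed field name
        f"COMMAND: {' '.join(command)}",     # assumed field name
        f"EXIT CODE: {result.returncode}",   # assumed field name
        f"OUTPUT: {output}",
        f"VERDICT: {verdict}",
    ])


# Stand-in commands: one exits with 0, the other with 1.
passing = evidence_block("Tests pass", [sys.executable, "-c", "print('0 failures')"])
failing = evidence_block("Build succeeds", [sys.executable, "-c", "import sys; sys.exit(1)"])
print(passing)
print(failing)
```

The exit code alone is a coarse signal. The READ step still calls for reading the full output and counting failures before the claim is made.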
|
|
34
26
|
|
|
35
27
|
### Evidence Block Format
|
|
36
28
|
|
|
@@ -43,35 +35,11 @@ OUTPUT: [relevant excerpt of actual output]
|
|
|
43
35
|
VERDICT: PASS | FAIL
|
|
44
36
|
```
|
|
45
37
|
|
|
46
|
-
This format is required for task completion claims in MAXSIM plan execution. It is
|
|
47
|
-
|
|
48
|
-
## Common Rationalizations — REJECT THESE
|
|
49
|
-
|
|
50
|
-
| Excuse | Why It Violates the Rule |
|
|
51
|
-
|--------|--------------------------|
|
|
52
|
-
| "Should work now" | "Should" is not evidence. RUN the command. |
|
|
53
|
-
| "I'm confident in the logic" | Confidence is not evidence. Run it. |
|
|
54
|
-
| "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
|
|
55
|
-
| "Just this once" | NO EXCEPTIONS. This is the rule, not a guideline. |
|
|
56
|
-
| "I only changed one line" | One line can break everything. Verify. |
|
|
57
|
-
| "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
|
|
58
|
-
| "Partial check is enough" | Partial proves nothing about the unchecked parts. |
|
|
59
|
-
|
|
60
|
-
## Red Flags — STOP If You Catch Yourself:
|
|
38
|
+
This format is required for task completion claims in MAXSIM plan execution. It is not required for intermediate status updates.
|
|
61
39
|
|
|
62
|
-
|
|
63
|
-
- Expressing satisfaction ("Great!", "Perfect!", "Done!") before running verification
|
|
64
|
-
- About to commit or push without running the test/build command in THIS turn
|
|
65
|
-
- Trusting a subagent's completion report without independent verification
|
|
66
|
-
- Thinking "the last run was clean, I only changed one line"
|
|
67
|
-
- About to mark a MAXSIM task as done without running the `<verify>` block
|
|
68
|
-
- Relying on a previous turn's test output as current evidence
|
|
40
|
+
### What Counts as Verification
|
|
69
41
|
|
|
70
|
-
|
|
71
|
-
|
|
72
|
-
## What Counts as Verification
|
|
73
|
-
|
|
74
|
-
| Claim | Requires | NOT Sufficient |
|
|
42
|
+
| Claim | Requires | Not Sufficient |
|
|
75
43
|
|-------|----------|----------------|
|
|
76
44
|
| "Tests pass" | Test command output showing 0 failures | Previous run, "should pass", partial run |
|
|
77
45
|
| "Build succeeds" | Build command with exit code 0 | Linter passing, "logs look clean" |
|
|
@@ -79,7 +47,19 @@ This format is required for task completion claims in MAXSIM plan execution. It
|
|
|
79
47
|
| "Task is complete" | All done criteria checked with evidence | "I implemented everything in the plan" |
|
|
80
48
|
| "No regressions" | Full test suite passing | "I only changed one file" |
|
|
81
49
|
|
|
82
|
-
##
|
|
50
|
+
## Common Pitfalls
|
|
51
|
+
|
|
52
|
+
| Excuse | Why It Fails |
|
|
53
|
+
|--------|-------------|
|
|
54
|
+
| "Should work now" | "Should" is not evidence. Run the command. |
|
|
55
|
+
| "I'm confident in the logic" | Confidence is not evidence. Run it. |
|
|
56
|
+
| "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
|
|
57
|
+
| "I only changed one line" | One line can break everything. Verify. |
|
|
58
|
+
| "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
|
|
59
|
+
|
|
60
|
+
Stop if you catch yourself using "should", "probably", or "looks good" about unverified work, or expressing satisfaction before running verification.
|
|
61
|
+
|
|
62
|
+
## Verification
|
|
83
63
|
|
|
84
64
|
Before marking any work as complete:
|
|
85
65
|
|
|
@@ -91,12 +71,13 @@ Before marking any work as complete:
|
|
|
91
71
|
- [ ] No "should", "probably", or "seems to" in your completion statement
|
|
92
72
|
- [ ] Evidence block produced for the task completion claim
|
|
93
73
|
|
|
94
|
-
##
|
|
74
|
+
## MAXSIM Integration
|
|
75
|
+
|
|
76
|
+
The executor's task commit protocol requires verification before committing:
|
|
95
77
|
|
|
96
|
-
|
|
97
|
-
|
|
98
|
-
2. Confirm the `<done>` criteria are met with evidence
|
|
78
|
+
1. Run the task's verify block (automated checks)
|
|
79
|
+
2. Confirm the done criteria are met with evidence
|
|
99
80
|
3. Produce an evidence block for the task completion
|
|
100
81
|
4. Only then: stage files and commit
|
|
101
82
|
|
|
102
|
-
The verifier agent independently re-checks all claims
|
|
83
|
+
The verifier agent independently re-checks all claims -- do not assume the verifier will catch what you missed.
|