maxsimcli 5.0.6 → 5.1.0

Files changed (91)
  1. package/README.md +316 -288
  2. package/dist/assets/CHANGELOG.md +14 -0
  3. package/dist/assets/hooks/maxsim-capture-learnings.cjs +128 -0
  4. package/dist/assets/hooks/maxsim-capture-learnings.cjs.map +1 -0
  5. package/dist/assets/hooks/maxsim-check-update.cjs +126 -88
  6. package/dist/assets/hooks/maxsim-check-update.cjs.map +1 -1
  7. package/dist/assets/hooks/maxsim-notification-sound.cjs +87 -43
  8. package/dist/assets/hooks/maxsim-notification-sound.cjs.map +1 -1
  9. package/dist/assets/hooks/maxsim-statusline.cjs +45 -171
  10. package/dist/assets/hooks/maxsim-statusline.cjs.map +1 -1
  11. package/dist/assets/hooks/maxsim-stop-sound.cjs +86 -43
  12. package/dist/assets/hooks/maxsim-stop-sound.cjs.map +1 -1
  13. package/dist/assets/hooks/maxsim-sync-reminder.cjs +72 -21
  14. package/dist/assets/hooks/maxsim-sync-reminder.cjs.map +1 -1
  15. package/dist/assets/templates/agents/AGENTS.md +62 -51
  16. package/dist/assets/templates/agents/executor.md +44 -59
  17. package/dist/assets/templates/agents/planner.md +36 -31
  18. package/dist/assets/templates/agents/researcher.md +35 -43
  19. package/dist/assets/templates/agents/verifier.md +29 -31
  20. package/dist/assets/templates/commands/maxsim/debug.md +20 -154
  21. package/dist/assets/templates/commands/maxsim/execute.md +19 -33
  22. package/dist/assets/templates/commands/maxsim/go.md +21 -20
  23. package/dist/assets/templates/commands/maxsim/help.md +5 -14
  24. package/dist/assets/templates/commands/maxsim/init.md +18 -40
  25. package/dist/assets/templates/commands/maxsim/plan.md +22 -37
  26. package/dist/assets/templates/commands/maxsim/progress.md +15 -16
  27. package/dist/assets/templates/commands/maxsim/quick.md +18 -29
  28. package/dist/assets/templates/commands/maxsim/settings.md +18 -26
  29. package/dist/assets/templates/references/continuation-format.md +2 -4
  30. package/dist/assets/templates/references/model-profiles.md +2 -2
  31. package/dist/assets/templates/references/planning-config.md +10 -11
  32. package/dist/assets/templates/references/self-improvement.md +120 -0
  33. package/dist/assets/templates/rules/conventions.md +1 -1
  34. package/dist/assets/templates/rules/verification-protocol.md +1 -1
  35. package/dist/assets/templates/skills/brainstorming/SKILL.md +35 -26
  36. package/dist/assets/templates/skills/code-review/SKILL.md +78 -55
  37. package/dist/assets/templates/skills/commit-conventions/SKILL.md +70 -36
  38. package/dist/assets/templates/skills/github-operations/SKILL.md +142 -0
  39. package/dist/assets/templates/skills/handoff-contract/SKILL.md +62 -28
  40. package/dist/assets/templates/skills/maxsim-batch/SKILL.md +68 -42
  41. package/dist/assets/templates/skills/maxsim-simplify/SKILL.md +65 -40
  42. package/dist/assets/templates/skills/project-memory/SKILL.md +121 -0
  43. package/dist/assets/templates/skills/research/SKILL.md +126 -0
  44. package/dist/assets/templates/skills/roadmap-writing/SKILL.md +71 -68
  45. package/dist/assets/templates/skills/systematic-debugging/SKILL.md +37 -25
  46. package/dist/assets/templates/skills/tdd/SKILL.md +36 -39
  47. package/dist/assets/templates/skills/using-maxsim/SKILL.md +69 -55
  48. package/dist/assets/templates/skills/verification/SKILL.md +167 -0
  49. package/dist/assets/templates/workflows/batch.md +249 -268
  50. package/dist/assets/templates/workflows/diagnose-issues.md +225 -151
  51. package/dist/assets/templates/workflows/execute-plan.md +191 -981
  52. package/dist/assets/templates/workflows/execute.md +350 -309
  53. package/dist/assets/templates/workflows/go.md +119 -138
  54. package/dist/assets/templates/workflows/health.md +71 -114
  55. package/dist/assets/templates/workflows/help.md +85 -147
  56. package/dist/assets/templates/workflows/init-existing.md +180 -1373
  57. package/dist/assets/templates/workflows/init.md +53 -165
  58. package/dist/assets/templates/workflows/new-milestone.md +91 -334
  59. package/dist/assets/templates/workflows/new-project.md +165 -1384
  60. package/dist/assets/templates/workflows/plan-create.md +182 -73
  61. package/dist/assets/templates/workflows/plan-discuss.md +89 -82
  62. package/dist/assets/templates/workflows/plan-research.md +191 -85
  63. package/dist/assets/templates/workflows/plan.md +122 -58
  64. package/dist/assets/templates/workflows/progress.md +76 -310
  65. package/dist/assets/templates/workflows/quick.md +70 -495
  66. package/dist/assets/templates/workflows/sdd.md +231 -221
  67. package/dist/assets/templates/workflows/settings.md +90 -120
  68. package/dist/assets/templates/workflows/verify-phase.md +296 -258
  69. package/dist/cli.cjs +17 -23465
  70. package/dist/cli.cjs.map +1 -1
  71. package/dist/install.cjs +356 -8358
  72. package/dist/install.cjs.map +1 -1
  73. package/package.json +16 -22
  74. package/dist/assets/templates/skills/agent-system-map/SKILL.md +0 -92
  75. package/dist/assets/templates/skills/evidence-collection/SKILL.md +0 -87
  76. package/dist/assets/templates/skills/github-artifact-protocol/SKILL.md +0 -67
  77. package/dist/assets/templates/skills/github-tools-guide/SKILL.md +0 -89
  78. package/dist/assets/templates/skills/input-validation/SKILL.md +0 -51
  79. package/dist/assets/templates/skills/memory-management/SKILL.md +0 -75
  80. package/dist/assets/templates/skills/research-methodology/SKILL.md +0 -137
  81. package/dist/assets/templates/skills/sdd/SKILL.md +0 -91
  82. package/dist/assets/templates/skills/tool-priority-guide/SKILL.md +0 -80
  83. package/dist/assets/templates/skills/verification-before-completion/SKILL.md +0 -71
  84. package/dist/assets/templates/skills/verification-gates/SKILL.md +0 -169
  85. package/dist/assets/templates/workflows/discuss-phase.md +0 -683
  86. package/dist/assets/templates/workflows/research-phase.md +0 -73
  87. package/dist/assets/templates/workflows/verify-work.md +0 -572
  88. package/dist/core-D5zUr9cb.cjs +0 -4305
  89. package/dist/core-D5zUr9cb.cjs.map +0 -1
  90. package/dist/skills-CjFWZIGM.cjs +0 -6824
  91. package/dist/skills-CjFWZIGM.cjs.map +0 -1
@@ -1,57 +1,66 @@
  ---
  name: systematic-debugging
- description: >-
- Systematic debugging via reproduce-hypothesize-isolate-verify-fix cycle.
- Requires evidence at each step. Use when investigating bugs, test failures,
- unexpected behavior, or runtime errors.
+ description: Systematic debugging via reproduce-hypothesize-isolate-verify-fix-confirm cycle. Requires evidence at each step. Use when encountering bugs, test failures, or unexpected behavior.
  ---

  # Systematic Debugging

  Find the root cause first. Random fixes waste time and create new bugs.

- **No fix attempts without understanding root cause.** If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.
+ **No fix attempts without understanding root cause.** Completing REPRODUCE and HYPOTHESIZE is mandatory before proposing any fix.

- ## The 5-Step Process
+ ## The 6-Step Process
 
- ### 1. REPRODUCE -- Confirm the Problem
+ ### 1. REPRODUCE — Confirm the Problem

  - Run the failing command or test. Capture the EXACT error output.
- - Can you trigger it reliably? What are the exact steps?
- - If not reproducible: gather more data -- do not guess.
+ - Can it be triggered reliably? What are the exact steps?
+ - If not reproducible: gather more data — do not guess.

- ### 2. HYPOTHESIZE -- Form a Theory
+ **Output:** Exact reproduction steps and full error output. Nothing moves forward without this.

- - Read the error message completely (stack trace, line numbers, exit codes).
+ ### 2. HYPOTHESIZE — Form a Theory
+
+ - Read the error message completely: stack trace, line numbers, exit codes.
  - Check recent changes: `git diff`, recent commits, new dependencies.
  - Trace data flow: where does the bad value originate?
- - State your hypothesis clearly: "I think X is the root cause because Y."
+ - State the hypothesis explicitly: "I think X is the root cause because Y."
+
+ **Output:** One clear hypothesis with evidence supporting it.
 
- ### 3. ISOLATE -- Narrow the Scope
+ ### 3. ISOLATE — Narrow the Scope

  - Find the smallest reproduction case.
  - In multi-component systems, add diagnostic logging at each boundary.
  - Identify which specific layer or component is failing.
  - Compare against working examples in the codebase.

- ### 4. VERIFY -- Test Your Hypothesis
+ **Output:** The smallest failing case and the specific component responsible.
+
+ ### 4. VERIFY — Test the Hypothesis
+
+ - Make the smallest possible change to test the hypothesis.
+ - Change one variable at a time — never multiple things simultaneously.
+ - If hypothesis is wrong: form a new hypothesis. Do not stack fixes.

- - Make the smallest possible change to test your hypothesis.
- - Change one variable at a time -- never multiple things simultaneously.
- - If hypothesis is wrong: form a new hypothesis, do not stack fixes.
+ **Output:** Confirmed or rejected hypothesis, with evidence from the test.
 
- ### 5. FIX -- Address the Root Cause
+ ### 5. FIX — Address the Root Cause

- - Write a failing test that reproduces the bug.
+ - Write a failing test that reproduces the bug (when applicable).
  - Implement a single fix that addresses the root cause.
- - No "while I'm here" improvements -- fix only the identified issue.
+ - No "while I'm here" improvements — fix only the identified issue.

- ### 6. CONFIRM -- Verify the Fix
+ **Output:** A minimal fix that targets exactly the identified root cause.
+
+ ### 6. CONFIRM — Verify the Fix

  - Run the original failing test: it must now pass.
  - Run the full test suite: no regressions.
  - Verify the original error no longer occurs.

+ **Output:** Evidence that the bug is fixed and nothing is broken.
+
  ## Hypothesis Testing Protocol

  For each hypothesis:
@@ -59,11 +68,13 @@ For each hypothesis:
  1. **Form:** "I think X is the root cause because Y."
  2. **Design test:** "If X is the cause, then changing Z should produce W."
  3. **Run test:** Execute the change and observe the result.
- 4. **Evaluate:** Did the result match the prediction? If yes, proceed to FIX. If no, form a new hypothesis.
+ 4. **Evaluate:** Did the result match the prediction? If yes, proceed to FIX. If no, discard the hypothesis and form a new one.
+
+ Never carry forward a failed hypothesis. Each iteration starts clean.

  ## Escalation

- If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried (hypotheses tested, evidence gathered, fixes attempted) and escalate.
+ If 3+ fix attempts have failed, the issue is likely architectural. Stop guessing. Document what has been tried — hypotheses tested, evidence gathered, fixes attempted — and escalate or step back to redesign.

  ## Common Pitfalls
 
@@ -71,9 +82,10 @@ If 3+ fix attempts have failed, the issue is likely architectural. Document what
  |--------|---------|
  | "I think I know what it is" | Thinking is not evidence. Reproduce first. |
  | "Let me just try this fix" | That is guessing. Complete REPRODUCE and HYPOTHESIZE first. |
- | "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
+ | "Multiple changes at once saves time" | You cannot isolate what worked. New bugs will appear. |
  | "The issue is simple" | Simple bugs have root causes too. The process is fast for simple bugs. |
+ | "The stack trace is enough" | Stack traces show where it crashed, not why. Trace the data. |

  Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once.

- See also: `/verification-before-completion` for evidence-based confirmation after fixes.
+ See also: `verification-before-completion` for evidence-based confirmation after fixes.
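The Hypothesis Testing Protocol and the Escalation rule in the skill above can be sketched in a few lines of Python. This is only an illustration: the names `Hypothesis`, `DebugSession`, `evaluate`, and `record_failed_fix` are hypothetical and not part of maxsimcli, and comparing the observation to the prediction as plain strings is a simplifying assumption.

```python
from dataclasses import dataclass, field

# Threshold taken from the Escalation section ("3+ fix attempts").
MAX_FIX_ATTEMPTS = 3


@dataclass
class Hypothesis:
    cause: str       # "I think X is the root cause ..."
    reason: str      # "... because Y."
    prediction: str  # "If X is the cause, then changing Z should produce W."


@dataclass
class DebugSession:
    rejected: list = field(default_factory=list)
    failed_fixes: int = 0

    def evaluate(self, hypothesis: Hypothesis, observed: str) -> str:
        """Step 4 (Evaluate): proceed to FIX only if the prediction held."""
        if observed == hypothesis.prediction:
            return "FIX"
        # A failed hypothesis is discarded, never carried forward.
        self.rejected.append(hypothesis)
        return "NEW_HYPOTHESIS"

    def record_failed_fix(self) -> str:
        """Count failed fix attempts and signal when to stop guessing."""
        self.failed_fixes += 1
        if self.failed_fixes >= MAX_FIX_ATTEMPTS:
            return "ESCALATE"
        return "CONTINUE"
```

The list of rejected hypotheses doubles as the record of "what has been tried" that the Escalation section asks for.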
@@ -1,77 +1,74 @@
  ---
  name: tdd
- description: >-
- Test-driven development with red-green-refactor cycle and atomic commits.
- Write failing test first, then minimal passing code, then refactor. Use when
- implementing business logic, API endpoints, data transformations, validation
- rules, or algorithms.
+ description: Enforces red-green-refactor TDD cycle with atomic commits per phase. Use when implementing features, fixing bugs, or when tests should drive the design.
  ---

  # Test-Driven Development (TDD)

  Write the test first. Watch it fail. Write minimal code to pass. Clean up.

- ## When to Use TDD
+ ## When to Use

- **Good fit:** Business logic with defined I/O, API endpoints with contracts, data transformations, validation rules, algorithms, state machines.
+ **Good fit:** Business logic with defined I/O, API endpoints with contracts, data transformations, validation rules, algorithms, state machines, bug fixes.

- **Poor fit:** UI layout, configuration files, build scripts, one-off scripts, mechanical renames.
+ **Poor fit:** UI layout, configuration files, build scripts, one-off scripts, mechanical renames, exploratory spikes.

  ## The Red-Green-Refactor Cycle
 
- ### 1. RED -- Write One Failing Test
+ ### RED — Write One Failing Test

- - Write ONE minimal test describing the desired behavior
- - Test name describes what SHOULD happen, not implementation details
- - Use real code paths -- mocks only when unavoidable (external APIs, databases)
+ - Write ONE minimal test describing the desired behavior.
+ - Test name describes what SHOULD happen, not implementation details.
+ - Use real code paths — mocks only when unavoidable (external APIs, databases, I/O).

- ### 2. VERIFY RED -- Run the Test
+ **Verify RED:** Run the test. It MUST fail with an assertion error — not a syntax error or import failure. If the test passes immediately, it is testing existing behavior. Rewrite it.

- - Test MUST fail with an assertion (not error out from syntax or imports)
- - Failure message must match the missing behavior
- - If the test passes immediately, you are testing existing behavior -- rewrite it
+ ### GREEN — Write Minimal Code

- ### 3. GREEN -- Write Minimal Code
+ - Write the SIMPLEST code that makes the test pass.
+ - Do NOT add features the test does not require.
+ - Do NOT refactor yet.
 
- - Write the SIMPLEST code that makes the test pass
- - Do NOT add features the test does not require
- - Do NOT refactor yet
+ **Verify GREEN:** Run all tests. The new test must pass. ALL existing tests must still pass. If any existing test fails, fix the code — not the tests.

- ### 4. VERIFY GREEN -- Run All Tests
+ ### REFACTOR — Clean Up

- - The new test MUST pass
- - ALL existing tests MUST still pass
- - If any test fails, fix code -- not tests
+ - Remove duplication, improve names, extract helpers.
+ - Run tests after every change.
+ - Do NOT add new behavior during refactor.

- ### 5. REFACTOR -- Clean Up (Tests Still Green)
+ **Verify REFACTOR:** All tests still pass. No behavior changed.

- - Remove duplication, improve names, extract helpers
- - Run tests after every change
- - Do NOT add new behavior during refactor
+ ### REPEAT

- ### 6. REPEAT -- Next failing test for next behavior
+ Move to the next failing test for the next unit of behavior.
 
  ## Commit Pattern

- Each TDD cycle produces 2-3 atomic commits:
+ Each TDD cycle produces 2–3 atomic commits:

- - **RED commit:** `test({scope}): add failing test for [feature]`
- - **GREEN commit:** `feat({scope}): implement [feature]`
- - **REFACTOR commit (if changes made):** `refactor({scope}): clean up [feature]`
+ | Phase | Commit format |
+ |-------|---------------|
+ | RED | `test({scope}): add failing test for [feature]` |
+ | GREEN | `feat({scope}): implement [feature]` |
+ | REFACTOR | `refactor({scope}): clean up [feature]` (omit if no changes) |
+
+ Keep commits small. One cycle = one feature unit = one commit group.

  ## Context Budget

- TDD uses approximately 40% more context than direct implementation due to the RED-GREEN-REFACTOR overhead. Plan accordingly for long task lists.
+ TDD uses approximately 40% more context than direct implementation due to cycle overhead. Plan accordingly for long task lists.
 
  ## Common Pitfalls

  | Excuse | Why It Fails |
  |--------|-------------|
- | "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
- | "I'll add tests after" | Tests written after pass immediately -- they prove nothing. |
+ | "Too simple to test" | Simple code breaks. The test takes 30 seconds to write. |
+ | "I'll add tests after" | Tests written after pass immediately — they prove nothing. |
  | "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
- | "TDD is slower" | TDD is faster than debugging. Every skip creates debt. |
+ | "TDD is slower" | TDD is faster than debugging. Every skipped test creates debt. |
+ | "Mocks make this easy" | Over-mocking tests the mock, not the code. Prefer real code paths. |

- Stop immediately if you catch yourself writing implementation code before writing a test, writing a test that passes on the first run, skipping the VERIFY RED step, or adding features beyond what the current test requires.
+ Stop immediately if you catch yourself writing implementation code before a test, writing a test that passes on the first run, skipping VERIFY RED, or adding features beyond what the current test requires.

- See also: `/verification-before-completion` for evidence-based completion claims after TDD cycles.
+ See also: `verification-before-completion` for evidence-based completion claims after TDD cycles.
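One pass through the red-green cycle described in the skill above might look like the following Python sketch using the standard `unittest` module. The `slugify` function, its behavior, and the `slug` commit scope are hypothetical examples chosen for illustration; they do not come from maxsimcli.

```python
import unittest


# GREEN: the simplest implementation that satisfies the test below.
# In a real cycle this function does not exist yet when the test is first
# written; add a stub with a wrong return value so the RED run fails on the
# assertion rather than on a NameError.
# Hypothetical commit: feat(slug): implement slugify
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")


# RED: one minimal test named after the desired behavior.
# Hypothetical commit: test(slug): add failing test for slugify
class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens_and_text_is_lowercased(self):
        self.assertEqual(slugify("Hello World"), "hello-world")
```

Running `python -m unittest` against the file after each phase corresponds to the Verify RED and Verify GREEN checks; nothing beyond what the single test demands has been added.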
@@ -1,78 +1,92 @@
  ---
  name: using-maxsim
- description: >-
- Routes work through MAXSIM's spec-driven workflow: checks planning state,
- determines active phase, dispatches to the correct MAXSIM command. Use when
- starting work sessions, resuming work, or choosing which MAXSIM command to run.
+ description: Routes work through MaxsimCLI commands based on project state and user intent. Provides command reference and decision routing table. Use when determining which MaxsimCLI command to use or when starting a new session.
  ---

- # Using MAXSIM
+ # Using MaxsimCLI

- MAXSIM is a spec-driven development system. Work flows through phases, plans, and tasks -- not ad-hoc coding.
+ MaxsimCLI is a spec-driven development system. Work flows through phases, plans, and tasks — not ad-hoc coding.

- **No implementation without a plan.** If there is no `.planning/` directory, run `/maxsim:init` first. If there is no current phase, run `/maxsim:plan` first. If there IS a plan, run `/maxsim:execute` to execute it.
+ **No implementation without a plan.** If there is no `.planning/` directory, run `/maxsim:init` first. If there is no current phase, run `/maxsim:plan N` first. If there is a plan, run `/maxsim:execute N` to execute it.
 
- ## Routing
+ ---
+
+ ## Command Routing Table
+
+ Determine user intent, then route to the correct command.
+
+ | User Intent | Command |
+ |-------------|---------|
+ | Start a new project | `/maxsim:init` |
+ | Continue where I left off | `/maxsim:go` |
+ | Plan next phase | `/maxsim:plan N` |
+ | Execute planned work | `/maxsim:execute N` |
+ | Fix a bug | `/maxsim:debug` |
+ | Quick one-off task | `/maxsim:quick` |
+ | Check progress | `/maxsim:progress` |
+ | Change settings | `/maxsim:settings` |
+ | See all commands | `/maxsim:help` |
+
+ ---
+
+ ## Session Start Routing
+
+ Before beginning any task in a session:

- Before starting any task:
+ 1. Check for `.planning/` directory — if missing, run `/maxsim:init`
+ 2. Read `STATE.md` — if a checkpoint exists, resume from it via `/maxsim:go`
+ 3. Check `ROADMAP.md` — identify the active phase
+ 4. Route to the correct command using the table above

- 1. **Check for `.planning/` directory** -- if missing, initialize with `/maxsim:init`
- 2. **Check STATE.md** -- resume from last checkpoint if one exists
- 3. **Check current phase** -- determine what phase is active in ROADMAP.md
- 4. **Route to the correct command** based on the table below
+ GitHub Issues with label `maxsim:lesson` or `maxsim:decision` are the source of truth for project learnings and architectural decisions. Read them before planning.
 
- ### Command Surface (9 commands)
+ ---
+
+ ## Skills Per Agent Type

- | Situation | Command |
- |-----------|---------|
- | No `.planning/` directory | `/maxsim:init` |
- | No ROADMAP.md or empty roadmap | `/maxsim:init` |
- | Active phase has no PLAN.md | `/maxsim:plan N` |
- | Active phase has PLAN.md, not started | `/maxsim:execute N` |
- | Phase complete, needs verification | `/maxsim:execute N` (auto-verifies) |
- | Bug found during execution | `/maxsim:debug` |
- | Quick standalone task | `/maxsim:quick` |
- | Check overall status | `/maxsim:progress` |
- | Don't know what to do next | `/maxsim:go` |
- | Change workflow settings | `/maxsim:settings` |
- | Need command reference | `/maxsim:help` |
+ Skills load on-demand based on the current task. Each agent type draws from a different set.

- ### Agent Model (4 agents)
+ | Agent | Primary Skills |
+ |-------|---------------|
+ | Planner | `research`, `brainstorming`, `roadmap-writing`, `project-memory` |
+ | Executor | `tdd`, `systematic-debugging`, `verification`, `maxsim-simplify` |
+ | Verifier | `verification`, `code-review`, `systematic-debugging` |
+ | Researcher | `research`, `project-memory` |

- MAXSIM uses 4 generic agent types. Specialization comes from the orchestrator's spawn prompt and on-demand skills, not from separate agent definitions.
+ Skills are not auto-loaded. They activate when invoked directly (e.g., `/research`) or when the orchestrator spawns an agent with explicit skill instructions.
+
+ ---

- | Agent | Role | Spawned By |
- |-------|------|-----------|
- | Executor | Implements plans with atomic commits and verified completion | `/maxsim:execute` |
- | Planner | Creates structured PLAN.md files from requirements | `/maxsim:plan` |
- | Researcher | Gathers domain knowledge and codebase context | `/maxsim:plan` (research stage) |
- | Verifier | Reviews code, checks specs, debugs failures | `/maxsim:execute` (review stage), `/maxsim:debug` |
+ ## GitHub as Source of Truth

- ### Skills
+ All persistent project state lives in GitHub, not in local files that disappear between sessions:
 
- Skills load on-demand based on description matching or direct `/skill-name` invocation. They are not auto-loaded -- each skill activates only when its content is relevant to the current task.
+ | Artifact | Location |
+ |----------|---------|
+ | Phase plans | `.planning/phases/N/PLAN.md` (committed) |
+ | Roadmap | `.planning/ROADMAP.md` (committed) |
+ | Session state | `.planning/STATE.md` (committed after each checkpoint) |
+ | Learnings | GitHub Issues — label `maxsim:lesson` |
+ | Decisions | GitHub Issues — label `maxsim:decision` |

- | Skill | When It Activates |
- |-------|-------------------|
- | `systematic-debugging` | Investigating bugs, test failures, or unexpected behavior |
- | `tdd` | Implementing business logic, APIs, data transformations |
- | `verification-before-completion` | Claiming work is done, tests pass, builds succeed |
- | `memory-management` | Recurring patterns, errors, or decisions worth persisting |
- | `brainstorming` | Facing architectural choices or design decisions |
- | `roadmap-writing` | Creating or restructuring a project roadmap |
- | `maxsim-simplify` | Reviewing code for duplication, dead code, complexity |
- | `code-review` | Reviewing implementation for security, interfaces, quality |
- | `sdd` | Executing sequential tasks with fresh-agent isolation |
- | `maxsim-batch` | Parallelizing work across independent worktree units |
+ ---
 
  ## Common Pitfalls

- - Writing implementation code without a PLAN.md
- - Skipping `/maxsim:init` because "the project is simple"
- - Ignoring STATE.md checkpoints from previous sessions
+ - Writing implementation code without a `PLAN.md`
+ - Skipping `/maxsim:init` because the project seems simple
+ - Ignoring `STATE.md` checkpoints from previous sessions
  - Working outside the current phase without explicit user approval
- - Making architectural decisions without documenting them in STATE.md
+ - Making architectural decisions without recording them as `maxsim:decision` issues
+
+ If any of these occur: stop, check the routing table, and follow the workflow.
+
+ ---

- **If any of these occur: stop, check the routing table, follow the workflow.**
+ ## v6 Changes from v5

- See also: `/verification-before-completion` for evidence-based completion claims.
+ - `/maxsim:resume-work` replaced by `/maxsim:go`
+ - `/maxsim:plan-phase` and `/maxsim:execute-phase` replaced by `/maxsim:plan N` and `/maxsim:execute N`
+ - Project memory now uses GitHub Issues instead of local STATE.md comments
+ - `research` skill merges former `research-methodology` and `tool-priority-guide`
+ - `project-memory` skill replaces `memory-management`
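The session-start routing in the new version of this skill can be read as a small decision function. The Python sketch below is an illustration under stated assumptions, not how maxsimcli implements routing: the `route` function is hypothetical, a non-empty `.planning/STATE.md` is treated as "a checkpoint exists", the fallback to `/maxsim:help` for unknown intents is an assumption, and the `ROADMAP.md` phase lookup is omitted because its format is not shown here.

```python
from pathlib import Path

# Intent-to-command pairs copied from the Command Routing Table.
INTENT_TO_COMMAND = {
    "start a new project": "/maxsim:init",
    "continue where i left off": "/maxsim:go",
    "plan next phase": "/maxsim:plan N",
    "execute planned work": "/maxsim:execute N",
    "fix a bug": "/maxsim:debug",
    "quick one-off task": "/maxsim:quick",
    "check progress": "/maxsim:progress",
    "change settings": "/maxsim:settings",
    "see all commands": "/maxsim:help",
}


def route(project_dir: str, intent: str) -> str:
    """Pick a command from project state first, then from user intent."""
    planning = Path(project_dir) / ".planning"

    # Step 1: no .planning/ directory means the project is uninitialized.
    if not planning.is_dir():
        return "/maxsim:init"

    # Step 2: resume from a checkpoint if one exists.
    # Assumption: any non-empty STATE.md counts as a checkpoint.
    state = planning / "STATE.md"
    if state.is_file() and state.read_text(encoding="utf-8").strip():
        return "/maxsim:go"

    # Step 4: otherwise route by intent (fallback is an assumption).
    return INTENT_TO_COMMAND.get(intent.strip().lower(), "/maxsim:help")
```

Checking state before intent mirrors the order of the numbered steps: a missing `.planning/` directory or a pending checkpoint takes priority over whatever the user asked for.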
@@ -0,0 +1,167 @@
+ ---
+ name: verification
+ description: Evidence-based verification with quality gates, anti-rationalization enforcement, and retry escalation. Merges gate framework, evidence collection, and completion verification into one authoritative source. Use when completing tasks, verifying implementations, or before claiming work is done.
+ ---
+
+ ## The Iron Law
+
+ No completion claim is valid without fresh verification evidence produced in THIS session. Evidence from a prior session, a prior attempt, or reasoning about what "should" be true does not count. If the evidence was not collected by running a tool call in the current session, it does not exist.
+
+ ---
+
+ ## Evidence Block Format
+
+ Every claim about task completion, test status, build status, or spec compliance requires an Evidence Block. Produce one per claim.
+
+ ```
+ **CLAIM**: [The specific assertion being made]
+ **EVIDENCE**: [Tool name and exact command or action taken]
+ **OUTPUT**: [Actual output, quoted verbatim — not paraphrased]
+ **VERDICT**: PASS | FAIL | SKIPPED
+ ```
+
+ SKIPPED is only allowed when the claim is explicitly out of scope and the reason is documented. A skipped gate must be acknowledged by the caller.
+
+ ---
+
+ ## 4 Quality Gates
+
+ Gates run in order. A failure at any gate stops forward progress until resolved.
+
+ ### Gate 1 — Input Gate
+ Run before work begins.
+
+ - Spec or task definition exists and is unambiguous
+ - Acceptance criteria are stated explicitly
+ - Required inputs (files, configs, credentials) are present
+ - Scope boundaries are defined — what is in and what is out
+
+ Failure action: Return to requester with a clarifying question. Do not guess at requirements.
+
+ ### Gate 2 — Pre-Action Gate
+ Run before executing changes.
+
+ - Git state is clean or the working branch is correctly scoped
+ - Dependencies are installed and match the lockfile
+ - Linter and formatter configs are present
+ - No blocking issues from a previous failed run remain in the working tree
+
+ Failure action: Resolve the blocking state first. Document what was found and what was done to fix it.
+
+ ### Gate 3 — Completion Gate
+ Run after implementation, before declaring done.
+
+ - All tests pass (fresh run, not cached)
+ - Build exits with code 0
+ - Lint is clean
+ - Every acceptance criterion from Gate 1 is addressed with an Evidence Block
+ - No files are left in a modified-but-uncommitted state unless that is the intended deliverable
+
+ Failure action: Fix failures. Do not skip a failing test. Do not suppress a lint error. Each fix resets the gate — re-run from the top of Gate 3.
+
+ ### Gate 4 — Quality Gate
+ Run after Gate 3 passes.
+
+ - Code review concerns (if any were raised) are resolved
+ - No regressions introduced — GUARD command confirms this (see below)
+ - Evidence Blocks are complete and attached to the work artifact
+ - Handoff contract or completion note is written if another agent will consume this output
+
+ Failure action: Rework the implementation. If regressions are found, revert before attempting a fix.
+
+ ---
+
+ ## What Counts as Evidence
+
+ | Claim | Required Evidence | NOT Sufficient |
+ |---|---|---|
+ | Tests pass | `npm test` output showing pass count and zero failures | "I ran the tests" |
+ | Build succeeds | `npm run build` (or equivalent) with exit code 0 shown | "Build should work" |
+ | Lint is clean | `npm run lint` output with zero errors and zero warnings | "No obvious lint issues" |
+ | File was created | `ls -la <path>` or Read tool output showing the file | "I wrote the file" |
+ | Function behaves correctly | Test output or REPL output showing actual return value | "The logic looks right" |
+ | API responds correctly | Actual HTTP response body and status code | "The endpoint exists" |
+ | Dependency is installed | `package.json` or lockfile entry shown verbatim | "I installed it earlier" |
+ | Spec is met | Quoted spec requirement next to quoted output proving it | "This matches the spec" |
+ | No regressions | GUARD command output from this session | "Nothing was broken" |
+ | Migration ran | Migration log or schema diff output | "I ran the migration" |
+
+ ---
+
+ ## Verify + Guard Pattern
+
+ Every task execution uses two paired commands.
+
+ **VERIFY** — "Did this task accomplish its stated goal?"
+
+ Run after implementation. Produces an Evidence Block for each acceptance criterion. If any criterion fails, the task is not done.
+
+ **GUARD** — "Did this change break what was already working?"
+
+ Run after VERIFY passes. Executes the full test suite and any smoke checks that existed before the task started. If GUARD fails after VERIFY passes, the implementation introduced a regression.
+
+ ### Regression Protocol
+
+ 1. VERIFY passes, GUARD fails: attempt rework, limit 2 rework cycles
+ 2. After 2 rework cycles: revert the change entirely, escalate to user
+ 3. Do not merge a change where GUARD is failing
+
+ ---
+
+ ## Anti-Rationalization Table
+
+ These phrases indicate a verification failure. They are never acceptable as evidence.
+
+ | Forbidden Phrase | Why It Fails |
+ |---|---|
+ | "should work" | Describes expectation, not observed outcome |
+ | "I already checked" | Not verifiable in this session |
+ | "tests were passing before" | Stale evidence; fresh run required |
+ | "this is obviously correct" | Correctness is measured, not assessed by inspection |
+ | "I think it's fine" | No tool output, no claim |
+ | "the logic is sound" | Logic can be sound and still produce wrong output |
+ | "nothing changed in that area" | Changes in dependencies, configs, and imports are invisible to this claim |
+ | "it worked in my local run" | Local run is not this session's evidence unless tool output is shown |
+ | "we can verify later" | Verification deferred is verification skipped |
+ | "this is low risk" | Risk level does not substitute for evidence |
+
+ ---
+
+ ## Retry Logic
+
+ ### Attempt Counting
+
+ Each task starts at attempt 1. A failed gate that triggers a rework cycle increments the attempt counter. Attempt count resets only when the task scope changes materially.
+
+ ### Per-Attempt Rules
+
+ - **Attempt 1**: Execute normally. Collect full evidence. If gates pass, complete.
+ - **Attempt 2**: Fresh context. Do not carry forward assumptions from attempt 1. Re-read the spec. Re-run all gates from Gate 1.
+ - **Attempt 3**: Fresh agent context. Treat this as a cold start. Diagnose why attempts 1 and 2 failed before touching any code.
+
+ ### After 3 Failures
+
+ Escalate to the user. The escalation must include:
+
+ 1. The original task spec (quoted)
+ 2. What was attempted in each of the 3 runs (brief, factual)
+ 3. The specific gate that failed each time and the exact error output
+ 4. A diagnostic summary: is this a spec problem, an environment problem, or an implementation problem?
+ 5. A proposed next step (rewrite spec, fix environment, reduce scope)
+
+ Do not attempt a 4th run without user acknowledgment and revised instructions.
+
+ ---
+
+ ## Common Pitfalls
+
+ | Pitfall | Symptom | Correct Behavior |
+ |---|---|---|
+ | Caching test results | Reporting pass without re-running | Always run tests fresh; use `--no-cache` or equivalent |
+ | Partial lint scope | Running lint on one file, claiming lint is clean | Run lint on the entire affected module or project |
+ | Missing Gate 1 | Starting work before spec is confirmed | Always confirm acceptance criteria exist before writing code |
+ | Evidence copied from prior session | Referencing output not produced in this session | All evidence must come from tool calls in the current session |
+ | Verifying only the happy path | Tests pass but edge cases are untested | GUARD must include regression tests, not only new tests |
+ | Skipping Gate 4 after Gate 3 passes | Declaring done without regression check | Gate 3 and Gate 4 are both required; neither is optional |
+ | Conflating "no errors" with "correct output" | Exit code 0 but wrong behavior | Evidence must show correct output, not just absence of error |
+ | Writing evidence after the fact | Constructing output from memory | Run the command, capture the output, paste it verbatim |
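To make the Evidence Block format and the Regression Protocol from the new `verification` skill concrete, here is a minimal Python sketch. It is an assumption-laden illustration: `EvidenceBlock`, `collect_evidence`, and `next_action` are hypothetical names, not maxsimcli APIs, and deriving the verdict from the exit code alone is a simplification that the skill itself warns about ("no errors" is not "correct output"), so a real check would also inspect the captured output.

```python
import subprocess
from dataclasses import dataclass

# Limit taken from the Regression Protocol ("limit 2 rework cycles").
MAX_REWORK_CYCLES = 2


@dataclass
class EvidenceBlock:
    claim: str
    evidence: str  # exact command that was run
    output: str    # captured verbatim, never paraphrased
    verdict: str   # PASS | FAIL | SKIPPED

    def render(self) -> str:
        return "\n".join([
            f"**CLAIM**: {self.claim}",
            f"**EVIDENCE**: {self.evidence}",
            f"**OUTPUT**: {self.output}",
            f"**VERDICT**: {self.verdict}",
        ])


def collect_evidence(claim: str, command: list[str]) -> EvidenceBlock:
    """Run the command now, so the evidence is fresh for this session."""
    completed = subprocess.run(command, capture_output=True, text=True)
    output = (completed.stdout + completed.stderr).strip()
    # Simplification: exit code 0 is treated as PASS.
    verdict = "PASS" if completed.returncode == 0 else "FAIL"
    return EvidenceBlock(claim, " ".join(command), output, verdict)


def next_action(verify_passed: bool, guard_passed: bool, rework_cycles: int) -> str:
    """Decide the next step from VERIFY/GUARD results and rework count."""
    if not verify_passed:
        return "NOT_DONE"
    if guard_passed:
        return "DONE"
    if rework_cycles < MAX_REWORK_CYCLES:
        return "REWORK"
    return "REVERT_AND_ESCALATE"
```

A call such as `collect_evidence("Tests pass", ["npm", "test"])` would produce one block per claim; the block is built only from what the command actually printed, which is the point of the "paste it verbatim" rule.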