maxsimcli 4.8.0 → 4.10.0

Files changed (162)
  1. package/README.md +180 -202
  2. package/dist/assets/CHANGELOG.md +61 -0
  3. package/dist/assets/hooks/maxsim-check-update.cjs +38 -0
  4. package/dist/assets/hooks/maxsim-check-update.cjs.map +1 -1
  5. package/dist/assets/hooks/maxsim-statusline.cjs +116 -48
  6. package/dist/assets/hooks/maxsim-statusline.cjs.map +1 -1
  7. package/dist/assets/hooks/maxsim-sync-reminder.cjs +117 -0
  8. package/dist/assets/hooks/maxsim-sync-reminder.cjs.map +1 -0
  9. package/dist/assets/templates/agents/AGENTS.md +78 -106
  10. package/dist/assets/templates/agents/executor.md +101 -0
  11. package/dist/assets/templates/agents/planner.md +86 -0
  12. package/dist/assets/templates/agents/researcher.md +71 -0
  13. package/dist/assets/templates/agents/verifier.md +88 -0
  14. package/dist/assets/templates/commands/maxsim/debug.md +7 -7
  15. package/dist/assets/templates/commands/maxsim/execute.md +45 -0
  16. package/dist/assets/templates/commands/maxsim/go.md +29 -0
  17. package/dist/assets/templates/commands/maxsim/help.md +2 -2
  18. package/dist/assets/templates/commands/maxsim/init.md +52 -0
  19. package/dist/assets/templates/commands/maxsim/plan.md +50 -0
  20. package/dist/assets/templates/commands/maxsim/progress.md +4 -3
  21. package/dist/assets/templates/commands/maxsim/quick.md +6 -4
  22. package/dist/assets/templates/commands/maxsim/settings.md +4 -3
  23. package/dist/assets/templates/references/continuation-format.md +16 -16
  24. package/dist/assets/templates/references/model-profile-resolution.md +1 -1
  25. package/dist/assets/templates/references/model-profiles.md +12 -19
  26. package/dist/assets/templates/rules/conventions.md +51 -0
  27. package/dist/assets/templates/rules/verification-protocol.md +57 -0
  28. package/dist/assets/templates/skills/agent-system-map/SKILL.md +92 -0
  29. package/dist/assets/templates/skills/brainstorming/SKILL.md +48 -36
  30. package/dist/assets/templates/skills/code-review/SKILL.md +40 -61
  31. package/dist/assets/templates/skills/commit-conventions/SKILL.md +75 -0
  32. package/dist/assets/templates/skills/evidence-collection/SKILL.md +87 -0
  33. package/dist/assets/templates/skills/handoff-contract/SKILL.md +70 -0
  34. package/dist/assets/templates/skills/input-validation/SKILL.md +51 -0
  35. package/dist/assets/templates/skills/maxsim-batch/SKILL.md +41 -45
  36. package/dist/assets/templates/skills/maxsim-simplify/SKILL.md +37 -90
  37. package/dist/assets/templates/skills/memory-management/SKILL.md +32 -67
  38. package/dist/assets/templates/skills/research-methodology/SKILL.md +137 -0
  39. package/dist/assets/templates/skills/roadmap-writing/SKILL.md +40 -58
  40. package/dist/assets/templates/skills/sdd/SKILL.md +34 -69
  41. package/dist/assets/templates/skills/systematic-debugging/SKILL.md +20 -26
  42. package/dist/assets/templates/skills/tdd/SKILL.md +25 -33
  43. package/dist/assets/templates/skills/tool-priority-guide/SKILL.md +80 -0
  44. package/dist/assets/templates/skills/using-maxsim/SKILL.md +42 -73
  45. package/dist/assets/templates/skills/verification-before-completion/SKILL.md +12 -24
  46. package/dist/assets/templates/skills/verification-gates/SKILL.md +169 -0
  47. package/dist/assets/templates/templates/UAT.md +3 -3
  48. package/dist/assets/templates/templates/VALIDATION.md +1 -1
  49. package/dist/assets/templates/templates/context.md +4 -4
  50. package/dist/assets/templates/templates/debug-subagent-prompt.md +3 -3
  51. package/dist/assets/templates/templates/discovery.md +2 -2
  52. package/dist/assets/templates/templates/phase-prompt.md +2 -2
  53. package/dist/assets/templates/templates/planner-subagent-prompt.md +7 -7
  54. package/dist/assets/templates/templates/project.md +1 -1
  55. package/dist/assets/templates/templates/research.md +1 -1
  56. package/dist/assets/templates/templates/state.md +2 -2
  57. package/dist/assets/templates/templates/summary.md +41 -0
  58. package/dist/assets/templates/workflows/batch.md +5 -5
  59. package/dist/assets/templates/workflows/diagnose-issues.md +2 -2
  60. package/dist/assets/templates/workflows/discovery-phase.md +3 -3
  61. package/dist/assets/templates/workflows/discuss-phase.md +11 -11
  62. package/dist/assets/templates/workflows/execute-phase.md +205 -11
  63. package/dist/assets/templates/workflows/execute-plan.md +299 -34
  64. package/dist/assets/templates/workflows/execute.md +421 -0
  65. package/dist/assets/templates/workflows/go.md +250 -0
  66. package/dist/assets/templates/workflows/health.md +5 -5
  67. package/dist/assets/templates/workflows/help.md +165 -435
  68. package/dist/assets/templates/workflows/init-existing.md +23 -23
  69. package/dist/assets/templates/workflows/init.md +205 -0
  70. package/dist/assets/templates/workflows/new-milestone.md +9 -9
  71. package/dist/assets/templates/workflows/new-project.md +26 -26
  72. package/dist/assets/templates/workflows/plan-create.md +298 -0
  73. package/dist/assets/templates/workflows/plan-discuss.md +347 -0
  74. package/dist/assets/templates/workflows/plan-phase.md +29 -29
  75. package/dist/assets/templates/workflows/plan-research.md +177 -0
  76. package/dist/assets/templates/workflows/plan.md +231 -0
  77. package/dist/assets/templates/workflows/progress.md +46 -42
  78. package/dist/assets/templates/workflows/quick.md +195 -14
  79. package/dist/assets/templates/workflows/research-phase.md +5 -5
  80. package/dist/assets/templates/workflows/sdd.md +20 -12
  81. package/dist/assets/templates/workflows/settings.md +18 -14
  82. package/dist/assets/templates/workflows/verify-phase.md +1 -1
  83. package/dist/assets/templates/workflows/verify-work.md +16 -16
  84. package/dist/cli.cjs +4589 -229
  85. package/dist/cli.cjs.map +1 -1
  86. package/dist/core-D5zUr9cb.cjs.map +1 -1
  87. package/dist/install.cjs +234 -17
  88. package/dist/install.cjs.map +1 -1
  89. package/dist/mcp-server.cjs +298 -20
  90. package/dist/mcp-server.cjs.map +1 -1
  91. package/dist/skills-CjFWZIGM.cjs.map +1 -1
  92. package/package.json +1 -1
  93. package/dist/assets/hooks/maxsim-context-monitor.cjs +0 -121
  94. package/dist/assets/hooks/maxsim-context-monitor.cjs.map +0 -1
  95. package/dist/assets/templates/agents/maxsim-code-reviewer.md +0 -239
  96. package/dist/assets/templates/agents/maxsim-codebase-mapper.md +0 -214
  97. package/dist/assets/templates/agents/maxsim-debugger.md +0 -572
  98. package/dist/assets/templates/agents/maxsim-drift-checker.md +0 -522
  99. package/dist/assets/templates/agents/maxsim-executor.md +0 -504
  100. package/dist/assets/templates/agents/maxsim-integration-checker.md +0 -273
  101. package/dist/assets/templates/agents/maxsim-phase-researcher.md +0 -305
  102. package/dist/assets/templates/agents/maxsim-plan-checker.md +0 -343
  103. package/dist/assets/templates/agents/maxsim-planner.md +0 -610
  104. package/dist/assets/templates/agents/maxsim-project-researcher.md +0 -359
  105. package/dist/assets/templates/agents/maxsim-research-synthesizer.md +0 -263
  106. package/dist/assets/templates/agents/maxsim-roadmapper.md +0 -324
  107. package/dist/assets/templates/agents/maxsim-spec-reviewer.md +0 -245
  108. package/dist/assets/templates/agents/maxsim-verifier.md +0 -393
  109. package/dist/assets/templates/commands/maxsim/add-phase.md +0 -43
  110. package/dist/assets/templates/commands/maxsim/add-tests.md +0 -41
  111. package/dist/assets/templates/commands/maxsim/add-todo.md +0 -57
  112. package/dist/assets/templates/commands/maxsim/artefakte.md +0 -122
  113. package/dist/assets/templates/commands/maxsim/audit-milestone.md +0 -36
  114. package/dist/assets/templates/commands/maxsim/batch.md +0 -42
  115. package/dist/assets/templates/commands/maxsim/check-drift.md +0 -56
  116. package/dist/assets/templates/commands/maxsim/check-todos.md +0 -46
  117. package/dist/assets/templates/commands/maxsim/cleanup.md +0 -18
  118. package/dist/assets/templates/commands/maxsim/complete-milestone.md +0 -136
  119. package/dist/assets/templates/commands/maxsim/discuss-phase.md +0 -87
  120. package/dist/assets/templates/commands/maxsim/discuss.md +0 -70
  121. package/dist/assets/templates/commands/maxsim/execute-phase.md +0 -41
  122. package/dist/assets/templates/commands/maxsim/health.md +0 -22
  123. package/dist/assets/templates/commands/maxsim/init-existing.md +0 -46
  124. package/dist/assets/templates/commands/maxsim/insert-phase.md +0 -32
  125. package/dist/assets/templates/commands/maxsim/list-phase-assumptions.md +0 -46
  126. package/dist/assets/templates/commands/maxsim/map-codebase.md +0 -71
  127. package/dist/assets/templates/commands/maxsim/new-milestone.md +0 -44
  128. package/dist/assets/templates/commands/maxsim/new-project.md +0 -46
  129. package/dist/assets/templates/commands/maxsim/pause-work.md +0 -38
  130. package/dist/assets/templates/commands/maxsim/plan-milestone-gaps.md +0 -34
  131. package/dist/assets/templates/commands/maxsim/plan-phase.md +0 -44
  132. package/dist/assets/templates/commands/maxsim/realign.md +0 -39
  133. package/dist/assets/templates/commands/maxsim/reapply-patches.md +0 -110
  134. package/dist/assets/templates/commands/maxsim/remove-phase.md +0 -31
  135. package/dist/assets/templates/commands/maxsim/research-phase.md +0 -189
  136. package/dist/assets/templates/commands/maxsim/resume-work.md +0 -40
  137. package/dist/assets/templates/commands/maxsim/roadmap.md +0 -19
  138. package/dist/assets/templates/commands/maxsim/sdd.md +0 -39
  139. package/dist/assets/templates/commands/maxsim/set-profile.md +0 -34
  140. package/dist/assets/templates/commands/maxsim/update.md +0 -37
  141. package/dist/assets/templates/commands/maxsim/verify-work.md +0 -38
  142. package/dist/assets/templates/workflows/add-phase.md +0 -111
  143. package/dist/assets/templates/workflows/add-tests.md +0 -351
  144. package/dist/assets/templates/workflows/add-todo.md +0 -247
  145. package/dist/assets/templates/workflows/audit-milestone.md +0 -297
  146. package/dist/assets/templates/workflows/check-drift.md +0 -248
  147. package/dist/assets/templates/workflows/check-todos.md +0 -261
  148. package/dist/assets/templates/workflows/cleanup.md +0 -153
  149. package/dist/assets/templates/workflows/complete-milestone.md +0 -701
  150. package/dist/assets/templates/workflows/discuss.md +0 -343
  151. package/dist/assets/templates/workflows/insert-phase.md +0 -129
  152. package/dist/assets/templates/workflows/list-phase-assumptions.md +0 -178
  153. package/dist/assets/templates/workflows/map-codebase.md +0 -315
  154. package/dist/assets/templates/workflows/pause-work.md +0 -122
  155. package/dist/assets/templates/workflows/plan-milestone-gaps.md +0 -274
  156. package/dist/assets/templates/workflows/realign.md +0 -288
  157. package/dist/assets/templates/workflows/remove-phase.md +0 -154
  158. package/dist/assets/templates/workflows/resume-project.md +0 -306
  159. package/dist/assets/templates/workflows/roadmap.md +0 -130
  160. package/dist/assets/templates/workflows/set-profile.md +0 -81
  161. package/dist/assets/templates/workflows/transition.md +0 -544
  162. package/dist/assets/templates/workflows/update.md +0 -220
@@ -1,22 +1,21 @@
1
1
  ---
2
2
  name: roadmap-writing
3
3
  description: >-
4
- Creates structured project roadmaps with phased planning, dependency graphs,
5
- and testable success criteria in MAXSIM-compatible format. Use when creating a
6
- new roadmap, restructuring project phases, or planning milestones.
4
+ Phased planning with dependency graphs, success criteria, and requirement
5
+ mapping. Produces roadmaps with observable truths as success criteria.
6
+ Use when creating project roadmaps, breaking features into phases, or
7
+ structuring multi-phase work.
7
8
  ---
8
9
 
9
10
  # Roadmap Writing
10
11
 
11
12
  A roadmap without success criteria is a wish list. Define what done looks like for every phase.
12
13
 
13
- **HARD GATE: No phase without success criteria and dependencies. Every phase must have a number, name, goal, testable success criteria, and explicit dependencies. Violating this rule is a violation, not flexibility.**
14
-
15
14
  ## Process
16
15
 
17
16
  ### 1. SCOPE -- Understand the Project
18
17
 
19
- Before writing phases, understand what you are planning:
18
+ Before writing phases:
20
19
 
21
20
  - Read PROJECT.md for vision and constraints
22
21
  - Read REQUIREMENTS.md for v1/v2/out-of-scope boundaries
@@ -29,7 +28,7 @@ Each phase should be:
29
28
 
30
29
  | Property | Requirement |
31
30
  |----------|------------|
32
- | **Independently deliverable** | The phase produces a working increment, not a half-built feature |
31
+ | **Independently deliverable** | Produces a working increment, not a half-built feature |
33
32
  | **1-3 days of work** | Larger phases should be split; smaller ones should be merged |
34
33
  | **Clear boundary** | You can tell when the phase is done without ambiguity |
35
34
  | **Ordered by dependency** | No phase depends on a later phase |
@@ -42,28 +41,25 @@ Phase numbering convention:
42
41
  | `01A`, `01B` | Parallel sub-phases that can execute concurrently |
43
42
  | `01.1`, `01.2` | Sequential sub-phases within a parent phase |
44
43
 
45
- Sort order: `01` then `01A` then `01B` then `01.1` then `01.2` then `02`.
46
-
47
44
  ### 3. DEFINE -- Write Each Phase
48
45
 
49
- Every phase must include all of these fields:
46
+ Every phase must include:
50
47
 
51
48
  ```markdown
52
49
  ### Phase {number}: {name}
53
50
  **Goal**: {one sentence -- what this phase achieves}
54
51
  **Depends on**: {phase numbers, or "Nothing" for the first phase}
55
- **Requirements**: {requirement IDs from REQUIREMENTS.md, if applicable}
52
+ **Requirements**: {requirement IDs from REQUIREMENTS.md}
56
53
  **Success Criteria** (what must be TRUE):
57
- 1. {Testable statement -- can be verified with a command, test, or inspection}
58
- 2. {Testable statement}
59
- 3. {Testable statement}
54
+ 1. {Observable truth -- verifiable by command, test, or inspection}
55
+ 2. {Observable truth}
60
56
  **Plans**: TBD
61
57
  ```
62
58
 
63
59
  Success criteria rules:
64
60
  - Each criterion must be testable -- "code is clean" is not testable; "no lint warnings" is testable
65
61
  - Include at least 2 criteria per phase
66
- - At least one criterion should be verifiable by running a command (test, build, lint)
62
+ - At least one criterion should be verifiable by running a command
67
63
  - Criteria describe the end state, not the process ("tests pass" not "write tests")
68
64
 
69
65
  ### 4. CONNECT -- Map Dependencies
@@ -71,42 +67,54 @@ Success criteria rules:
71
67
  - Which phases can run in parallel? (Use letter suffixes: `03A`, `03B`)
72
68
  - Which phases are strictly sequential? (Use number suffixes: `03.1`, `03.2`)
73
69
  - Are there any circular dependencies? (This is a design error -- restructure)
70
+ - Every phase except the first must declare at least one dependency
71
+
72
+ ### 5. MAP REQUIREMENTS -- Ensure Coverage
74
73
 
75
- Every phase except the first must declare at least one dependency.
74
+ Every requirement ID from REQUIREMENTS.md must appear in at least one phase. Produce a coverage map:
76
75
 
77
- ### 5. MILESTONE -- Group Into Milestones
76
+ ```
77
+ REQUIREMENT-ID -> Phase N
78
+ ```
78
79
 
79
- Group phases into milestones that represent user-visible releases. Each milestone should be a coherent deliverable that could ship independently.
80
+ If any requirement is unmapped, either add it to a phase or explicitly mark it as out-of-scope.
81
+
82
+ ### 6. MILESTONE -- Group Into Milestones
83
+
84
+ Group phases into milestones that represent user-visible releases:
80
85
 
81
86
  ```markdown
82
87
  ## Milestones
83
-
84
88
  - **v1.0 MVP** -- Phases 1-4
85
89
  - **v1.1 Polish** -- Phases 5-7
86
- - **v2.0 Scale** -- Phases 8-10
87
90
  ```
88
91
 
89
- ### 6. WRITE -- Produce the Roadmap
92
+ ### 7. VALIDATE -- Check the Roadmap
93
+
94
+ | Check | How to Verify |
95
+ |-------|--------------|
96
+ | Every phase has success criteria | Read each phase detail section |
97
+ | Dependencies are acyclic | Trace the dependency chain -- no loops |
98
+ | Phase numbering is sequential | Numbers increase, no gaps larger than 1 |
99
+ | Milestones cover all phases | Every phase appears in exactly one milestone |
100
+ | Success criteria are testable | Each criterion can be verified by command, test, or inspection |
101
+ | Requirements are covered | Every requirement ID maps to at least one phase |
90
102
 
91
- Assemble the complete ROADMAP.md:
103
+ ## Roadmap Format
92
104
 
93
105
  ```markdown
94
106
  # Roadmap: {project name}
95
107
 
96
108
  ## Overview
97
-
98
- {2-3 sentences: what the project is, what this roadmap covers, delivery strategy}
109
+ {2-3 sentences: what the project is, what this roadmap covers}
99
110
 
100
111
  ## Milestones
101
-
102
112
  - **{milestone name}** -- Phases {range} ({status})
103
113
 
104
114
  ## Phases
105
-
106
115
  - [ ] **Phase {N}: {name}** - {one-line summary}
107
116
 
108
117
  ## Phase Details
109
-
110
118
  ### Phase {N}: {name}
111
119
  **Goal**: ...
112
120
  **Depends on**: ...
@@ -114,51 +122,25 @@ Assemble the complete ROADMAP.md:
114
122
  **Success Criteria** (what must be TRUE):
115
123
  1. ...
116
124
  **Plans**: TBD
117
- ```
118
125
 
119
- ### 7. VALIDATE -- Check the Roadmap
120
-
121
- Before finalizing, verify:
122
-
123
- | Check | How to Verify |
124
- |-------|--------------|
125
- | Every phase has success criteria | Read each phase detail section |
126
- | Dependencies are acyclic | Trace the dependency chain -- no loops |
127
- | Phase numbering is sequential | Numbers increase, no gaps larger than 1 |
128
- | Milestones cover all phases | Every phase appears in exactly one milestone |
129
- | Success criteria are testable | Each criterion can be verified by command, test, or inspection |
126
+ ## Coverage Map
127
+ REQUIREMENT-ID -> Phase N
128
+ ```
130
129
 
131
130
  ## Common Pitfalls
132
131
 
133
132
  | Pitfall | Why It Fails |
134
133
  |---------|-------------|
135
134
  | "We don't know enough to plan" | Plan what you know. Unknown phases get a research spike first. |
136
- | "The roadmap will change anyway" | Plans change -- that is expected. No plan guarantees drift. |
137
135
  | "Success criteria are too rigid" | Vague criteria are useless. Rigid criteria are adjustable. |
138
136
  | "One big phase is simpler" | Big phases hide complexity and delay feedback. Split them. |
139
137
  | "Dependencies are obvious" | Obvious to you now. Not obvious to the agent running phase 5 next week. |
140
138
  | "We'll add details later" | Later never comes. Write the details now while context is fresh. |
141
139
 
142
- Stop if you catch yourself writing a phase without success criteria, creating phases longer than 3 days of work, skipping dependency declarations, writing vague criteria like "code is good", creating circular dependencies, or putting all work in one or two massive phases.
143
-
144
- ## Verification
145
-
146
- Before finalizing a roadmap, confirm:
147
-
148
- - [ ] Every phase has a number, name, goal, dependencies, and success criteria
149
- - [ ] Success criteria are testable (verifiable by command, test, or inspection)
150
- - [ ] Dependencies form a DAG (no circular dependencies)
151
- - [ ] Phase numbering follows MAXSIM convention (01, 01A, 01B, 01.1, etc.)
152
- - [ ] Phases are 1-3 days of work each
153
- - [ ] Milestones group phases into coherent deliverables
154
- - [ ] ROADMAP.md matches the expected format for MAXSIM CLI parsing
155
- - [ ] Overview section summarizes the project and delivery strategy
140
+ Stop if you catch yourself writing a phase without success criteria, creating phases longer than 3 days of work, skipping dependency declarations, or writing vague criteria like "code is good".
156
141
 
157
142
  ## MAXSIM Integration
158
143
 
159
- Roadmap writing integrates with the MAXSIM lifecycle:
160
- - Use during project initialization to create the initial roadmap
161
- - Use when restructuring after a significant scope change or pivot
162
144
  - The roadmap is read by MAXSIM agents via `roadmap read` -- format compliance is mandatory
163
- - Phase numbering must be parseable by `normalizePhaseName()` and `comparePhaseNum()` in core
145
+ - Phase numbering must be parseable by `normalizePhaseName()` and `comparePhaseNum()`
164
146
  - Config `model_profile` in `.planning/config.json` affects agent assignment per phase
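
The two mechanical checks that 4.10.0 adds to the roadmap VALIDATE table (requirement coverage and acyclic dependencies) can be sketched as below. This is an illustrative sketch, not MAXSIM's implementation: the `validate_roadmap` function and the dictionary layout for phases are assumptions made for the example, and only the Python standard library is used.

```python
from graphlib import CycleError, TopologicalSorter


def validate_roadmap(phases, requirement_ids):
    """Return a list of problems; an empty list means both checks passed.

    `phases` is a hypothetical in-memory form of ROADMAP.md:
    {"01": {"depends_on": [...], "requirements": [...]}, ...}
    """
    problems = []

    # Coverage map: every requirement ID must appear in at least one phase.
    covered = {req for phase in phases.values() for req in phase["requirements"]}
    for req in sorted(set(requirement_ids) - covered):
        problems.append(f"unmapped requirement: {req}")

    # Dependencies must be acyclic: a topological sort fails on any loop.
    graph = {name: set(phase["depends_on"]) for name, phase in phases.items()}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        problems.append("circular dependency: " + " -> ".join(exc.args[1]))

    return problems


example = {
    "01": {"depends_on": [], "requirements": ["REQ-1"]},
    "02": {"depends_on": ["01"], "requirements": []},
}
print(validate_roadmap(example, ["REQ-1", "REQ-2"]))
# prints ['unmapped requirement: REQ-2']
```

An unmapped requirement reported here would then be either added to a phase or marked out-of-scope, as the template text requires.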
@@ -1,84 +1,66 @@
1
1
  ---
2
2
  name: sdd
3
3
  description: >-
4
- Executes plan tasks sequentially, each in a fresh subagent with minimal context,
5
- with mandatory two-stage review between tasks. Use when executing sequential
6
- tasks where context rot is a concern or running spec-driven dispatch.
4
+ Spec-driven development with fresh-agent-per-task execution. Prevents context
5
+ rot by isolating each task in a clean context window with its spec. Use when
6
+ executing multi-task plans, orchestrating agent work, or when context
7
+ accumulation degrades quality.
7
8
  ---
8
9
 
9
- # Spec-Driven Dispatch (SDD)
10
+ # Spec-Driven Development (SDD)
10
11
 
11
- Execute tasks sequentially, each in a fresh subagent with clean context. Review every task before moving to the next.
12
+ Execute tasks sequentially, each in a fresh agent with clean context. Verify every task before moving to the next.
12
13
 
13
- **HARD GATE** -- No task starts until the previous task passes two-stage review. If the review found issues, they must be fixed before the next task begins. No exceptions, no deferral, no skipping review for simple tasks.
14
+ ## Why SDD
14
15
 
15
- ## Process
16
+ Context rot is the primary failure mode for multi-task execution. As an agent processes more tasks, earlier context competes with later instructions. Quality degrades predictably after 3-5 tasks in a single context window. SDD solves this by giving each task a fresh context with only its specification.
17
+
18
+ ## The SDD Process
16
19
 
17
20
  ### 1. LOAD -- Read the Plan
18
21
 
19
22
  - Read the plan file (PLAN.md) to get the ordered task list
20
- - For each task, identify: description, acceptance criteria, relevant files
21
- - Confirm task order makes sense (later tasks may depend on earlier ones)
23
+ - For each task: description, acceptance criteria, relevant files
24
+ - Confirm task order respects dependencies
22
25
 
23
26
  ### 2. DISPATCH -- Spawn Fresh Agent Per Task
24
27
 
25
28
  For each task in order:
26
29
 
27
- 1. Assemble the task context:
30
+ 1. Assemble minimal task context:
28
31
  - Task description and acceptance criteria from the plan
29
32
  - Only the files relevant to this specific task
30
33
  - Results from previous tasks (commit hashes, created files) -- NOT the full previous context
31
34
  2. Spawn a fresh agent with this minimal context
32
- 3. The agent implements the task, runs tests, and commits
35
+ 3. The agent implements the task, runs verification, and commits
33
36
 
34
37
  ### 3. REVIEW -- Two-Stage Quality Gate
35
38
 
36
- After each task completes, run two review stages before proceeding:
37
-
38
- **Stage 1: Spec Compliance**
39
-
40
- - Does the implementation match the task description?
41
- - Are all acceptance criteria met?
42
- - Were only the specified files modified (no scope creep)?
43
- - Do the changes align with the plan's intent?
44
-
45
- Verdict: PASS or FAIL with specific issues.
39
+ After each task completes:
46
40
 
47
- **Stage 2: Code Quality**
41
+ **Stage 1: Spec Compliance** -- Does the implementation match the task spec? Are all acceptance criteria met? Were only specified files modified?
48
42
 
49
- - Are there obvious bugs, edge cases, or error handling gaps?
50
- - Is the code readable and consistent with codebase conventions?
51
- - Are there unnecessary complications or dead code?
52
- - Do all tests pass?
43
+ **Stage 2: Code Quality** -- Are there bugs, edge cases, or error handling gaps? Is the code consistent with codebase conventions? Do all tests pass?
53
44
 
54
- Verdict: PASS or FAIL with specific issues.
45
+ Verdict: PASS or FAIL with specific issues per stage.
55
46
 
56
47
  ### 4. FIX -- Address Review Failures
57
48
 
58
49
  If either review stage fails:
59
50
 
60
- 1. Spawn a NEW fresh agent with the original task description, the review feedback, and the current file state
61
- 2. The fix agent addresses ONLY the review issues -- no new features
62
- 3. Re-run both review stages on the fixed code
63
- 4. If 3 fix attempts fail: STOP and escalate to the user
51
+ 1. Spawn a NEW fresh agent with original task spec + review feedback + current file state
52
+ 2. Fix agent addresses ONLY the review issues -- no new features
53
+ 3. Re-run both review stages
54
+ 4. If 3 fix attempts fail: STOP and escalate
64
55
 
65
56
  ### 5. ADVANCE -- Move to Next Task
66
57
 
67
58
  Only after both review stages pass:
68
59
 
69
- - Record the task as complete
70
- - Note the commit hash and any files created or modified
71
- - Pass this minimal summary (not full context) to the next task's agent
72
-
73
- ### 6. REPORT -- Final Summary
74
-
75
- After all tasks complete:
76
-
77
- - List each task with its status and commit hash
78
- - Note any tasks that required fix iterations
79
- - Summarize the total changes made
60
+ - Record task as complete with commit hash
61
+ - Pass minimal summary (not full context) to the next task
80
62
 
81
- ## Context Management Rules
63
+ ## Context Management
82
64
 
83
65
  Each agent receives ONLY what it needs:
84
66
 
@@ -89,38 +71,21 @@ Each agent receives ONLY what it needs:
89
71
  | Previous task commit hashes | Always |
90
72
  | Previous task full diff | Never |
91
73
  | Previous task agent conversation | Never |
92
- | PROJECT.md / REQUIREMENTS.md | Only if task references project-level concerns |
93
74
  | Full codebase | Never -- only specified files |
94
75
 
95
76
  The point of SDD is fresh context. Loading the previous agent's full context defeats the purpose.
96
77
 
78
+ ## When to Use SDD
79
+
80
+ - **Good fit:** Multi-task plans (3+ tasks), sequential work where each task builds on the previous, implementations where quality degrades over time
81
+ - **Poor fit:** Single-task work, highly interactive tasks requiring user feedback, tasks that share significant overlapping context
82
+
97
83
  ## Common Pitfalls
98
84
 
99
- | Pitfall | Why it matters |
85
+ | Pitfall | Why It Matters |
100
86
  |---|---|
101
- | Skipping review for simple tasks | Simple tasks still have bugs. Review takes seconds for simple code. |
102
- | Passing full context forward | Full context causes context rot. Minimal summaries keep agents effective. |
87
+ | Skipping review for simple tasks | Simple tasks still have bugs. Review catches what the implementer missed. |
88
+ | Passing full context forward | Full context causes the exact rot SDD is designed to prevent. |
103
89
  | Deferring fixes to the next task | The next task's agent does not know about the bug. Fix it now. |
104
- | Accumulating fix-later items across tasks | Each task must be clean before the next starts. |
105
-
106
- ## Verification
107
-
108
- Before reporting completion, confirm:
109
-
110
- - [ ] Every task was executed by a fresh agent with minimal context
111
- - [ ] Every task passed both spec compliance and code quality review
112
- - [ ] No task was skipped or started before the previous task passed review
113
- - [ ] Fix iterations (if any) are documented
114
- - [ ] All tests pass after the final task
115
- - [ ] Summary includes per-task status and commit hashes
116
-
117
- ## MAXSIM Integration
118
-
119
- When a plan specifies `skill: "sdd"`:
120
90
 
121
- - The orchestrator reads tasks from PLAN.md in order
122
- - Each task is dispatched to a fresh subagent
123
- - Two-stage review runs between every task
124
- - Failed reviews trigger fix agents (up to 3 attempts)
125
- - Progress is tracked in STATE.md via decision entries
126
- - Final results are recorded in SUMMARY.md
91
+ See also: `/verification-before-completion` for the evidence-based verification methodology used within each SDD task.
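
The dispatch, review, fix, and advance loop described in the SDD skill above can be sketched as follows. This is a minimal sketch under stated assumptions: `run_sdd`, `TaskResult`, and the `spawn_agent` and `review` callables are hypothetical names introduced for illustration and are not part of the maxsimcli API. Each agent call receives only the task, short summaries of earlier tasks, and any review feedback.

```python
from dataclasses import dataclass

MAX_FIX_ATTEMPTS = 3  # "If 3 fix attempts fail: STOP and escalate"


@dataclass
class TaskResult:
    task: str
    commit: str
    fix_attempts: int = 0


def run_sdd(tasks, spawn_agent, review):
    """Run tasks in order, each in a fresh agent, gated by review.

    spawn_agent(context) -> commit hash   (hypothetical callable)
    review(task, commit) -> list of issues, empty when both stages pass
    """
    summaries = []
    for task in tasks:
        # Minimal context: the task spec plus summaries, never full history.
        context = {"task": task, "previous": list(summaries), "feedback": None}
        commit = spawn_agent(context)
        issues = review(task, commit)
        attempts = 0
        while issues:
            if attempts == MAX_FIX_ATTEMPTS:
                raise RuntimeError(
                    f"escalate: {task!r} still failing after {attempts} fix attempts"
                )
            attempts += 1
            # A NEW fresh agent gets the original spec and the review feedback.
            context = {"task": task, "previous": list(summaries), "feedback": issues}
            commit = spawn_agent(context)
            issues = review(task, commit)
        # Advance only after review passes; record the commit hash.
        summaries.append(TaskResult(task, commit, attempts))
    return summaries
```

Passing callables in keeps the sketch independent of how agents are actually spawned; the returned list is the per-task summary that would be handed to later tasks.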
@@ -1,18 +1,18 @@
1
1
  ---
2
2
  name: systematic-debugging
3
3
  description: >-
4
- Investigates bugs through systematic root-cause analysis: reproduce, hypothesize,
5
- isolate, verify, fix, confirm. Use when encountering any bug, test failure,
6
- unexpected behavior, or error message.
4
+ Systematic debugging via reproduce-hypothesize-isolate-verify-fix cycle.
5
+ Requires evidence at each step. Use when investigating bugs, test failures,
6
+ unexpected behavior, or runtime errors.
7
7
  ---
8
8
 
9
9
  # Systematic Debugging
10
10
 
11
11
  Find the root cause first. Random fixes waste time and create new bugs.
12
12
 
13
- **HARD GATE -- No fix attempts without understanding root cause. If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.**
13
+ **No fix attempts without understanding root cause.** If you have not completed the REPRODUCE and HYPOTHESIZE steps, you cannot propose a fix.
14
14
 
15
- ## Process
15
+ ## The 5-Step Process
16
16
 
17
17
  ### 1. REPRODUCE -- Confirm the Problem
18
18
 
@@ -52,6 +52,19 @@ Find the root cause first. Random fixes waste time and create new bugs.
52
52
  - Run the full test suite: no regressions.
53
53
  - Verify the original error no longer occurs.
54
54
 
55
+ ## Hypothesis Testing Protocol
56
+
57
+ For each hypothesis:
58
+
59
+ 1. **Form:** "I think X is the root cause because Y."
60
+ 2. **Design test:** "If X is the cause, then changing Z should produce W."
61
+ 3. **Run test:** Execute the change and observe the result.
62
+ 4. **Evaluate:** Did the result match the prediction? If yes, proceed to FIX. If no, form a new hypothesis.
63
+
64
+ ## Escalation
65
+
66
+ If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried (hypotheses tested, evidence gathered, fixes attempted) and escalate.
67
+
55
68
  ## Common Pitfalls
56
69
 
57
70
  | Excuse | Reality |
@@ -61,25 +74,6 @@ Find the root cause first. Random fixes waste time and create new bugs.
61
74
  | "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
62
75
  | "The issue is simple" | Simple bugs have root causes too. The process is fast for simple bugs. |
63
76
 
64
- Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once. If any of these triggers, return to step 1.
65
-
66
- If 3+ fix attempts have failed, the issue is likely architectural. Document what you have tried and escalate to the user.
67
-
68
- ## Verification
69
-
70
- Before claiming a bug is fixed, confirm:
71
-
72
- - [ ] The original error has been reproduced reliably
73
- - [ ] Root cause has been identified with evidence (not guessed)
74
- - [ ] A failing test reproduces the bug
75
- - [ ] A single, targeted fix addresses the root cause
76
- - [ ] The failing test now passes
77
- - [ ] The full test suite passes (no regressions)
78
- - [ ] The original error no longer occurs when running the original steps
79
-
80
- ## MAXSIM Integration
77
+ Stop immediately if you catch yourself changing code before reproducing, proposing a fix before reading the full error, trying random fixes, or changing multiple things at once.
81
78
 
82
- When debugging during plan execution, MAXSIM deviation rules apply:
83
- - **Rule 1 (Auto-fix bugs):** You may auto-fix bugs found during execution, but you must still follow this debugging process.
84
- - **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint -- this is an architectural decision for the user.
85
- - Track all debugging deviations for SUMMARY.md documentation.
79
+ See also: `/verification-before-completion` for evidence-based confirmation after fixes.
@@ -1,18 +1,23 @@
1
1
  ---
2
2
  name: tdd
3
3
  description: >-
4
- Enforces test-driven development with the Red-Green-Refactor cycle: write a
5
- failing test first, implement minimal code to pass, then refactor. Use when
6
- implementing features, fixing bugs, or adding new behavior.
4
+ Test-driven development with red-green-refactor cycle and atomic commits.
5
+ Write failing test first, then minimal passing code, then refactor. Use when
6
+ implementing business logic, API endpoints, data transformations, validation
7
+ rules, or algorithms.
7
8
  ---
8
9
 
9
10
  # Test-Driven Development (TDD)
10
11
 
11
12
  Write the test first. Watch it fail. Write minimal code to pass. Clean up.
12
13
 
13
- **HARD GATE: No implementation code without a failing test first. If you wrote production code before the test, delete it and start over. No exceptions.**
14
+ ## When to Use TDD
14
15
 
15
- ## Process
16
+ **Good fit:** Business logic with defined I/O, API endpoints with contracts, data transformations, validation rules, algorithms, state machines.
17
+
18
+ **Poor fit:** UI layout, configuration files, build scripts, one-off scripts, mechanical renames.
19
+
20
+ ## The Red-Green-Refactor Cycle
16
21
 
17
22
  ### 1. RED -- Write One Failing Test
18
23
 
@@ -46,40 +51,27 @@ Write the test first. Watch it fail. Write minimal code to pass. Clean up.
46
51
 
47
52
  ### 6. REPEAT -- Next failing test for next behavior
48
53
 
54
+ ## Commit Pattern
55
+
56
+ Each TDD cycle produces 2-3 atomic commits:
57
+
58
+ - **RED commit:** `test({scope}): add failing test for [feature]`
59
+ - **GREEN commit:** `feat({scope}): implement [feature]`
60
+ - **REFACTOR commit (if changes made):** `refactor({scope}): clean up [feature]`
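As a rough illustration of the commit pattern above, the three message templates can be filled in mechanically. The helper name `tdd_commit_messages` and the example scope and feature values are assumptions made for this sketch; only the message templates come from the skill file.

```python
from typing import List


def tdd_commit_messages(scope: str, feature: str, refactored: bool = True) -> List[str]:
    """Build the 2-3 commit messages for one RED-GREEN-REFACTOR cycle."""
    messages = [
        f"test({scope}): add failing test for {feature}",  # RED commit
        f"feat({scope}): implement {feature}",             # GREEN commit
    ]
    if refactored:
        # The REFACTOR commit exists only if the refactor step changed something.
        messages.append(f"refactor({scope}): clean up {feature}")
    return messages


for message in tdd_commit_messages("auth", "password length validation"):
    print(message)
```

A cycle in which the refactor step changes nothing yields two commits, which matches the "2-3 atomic commits" wording.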
61
+
62
+ ## Context Budget
63
+
64
+ TDD uses approximately 40% more context than direct implementation due to the RED-GREEN-REFACTOR overhead. Plan accordingly for long task lists.
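To make "plan accordingly" concrete, the arithmetic might look like the sketch below. The 40% figure is the approximate one stated above; the window size and per-task estimate are made-up numbers for illustration only.

```python
TDD_OVERHEAD_PCT = 40  # approximate overhead stated in the Context Budget section


def tdd_context_cost(direct_tokens: int) -> int:
    """Estimated context for a task done with TDD, given its direct-implementation cost."""
    return direct_tokens * (100 + TDD_OVERHEAD_PCT) // 100


def tasks_that_fit(window_tokens: int, direct_tokens_per_task: int, use_tdd: bool) -> int:
    """How many equally sized tasks fit in one context window."""
    per_task = tdd_context_cost(direct_tokens_per_task) if use_tdd else direct_tokens_per_task
    return window_tokens // per_task


# Hypothetical numbers: a 200,000-token window and 10,000 tokens per task.
print(tasks_that_fit(200_000, 10_000, use_tdd=False))  # 20
print(tasks_that_fit(200_000, 10_000, use_tdd=True))   # 14
```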
65
+
49
66
  ## Common Pitfalls
50
67
 
51
- | Excuse | Why it fails |
68
+ | Excuse | Why It Fails |
52
69
  |--------|-------------|
53
70
  | "Too simple to test" | Simple code breaks. The test takes 30 seconds. |
54
71
  | "I'll add tests after" | Tests written after pass immediately -- they prove nothing. |
55
72
  | "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
56
73
  | "TDD is slower" | TDD is faster than debugging. Every skip creates debt. |
57
- | "Let me keep the code as reference" | You will adapt it instead of writing test-first. Delete means delete. |
58
-
59
- Stop immediately if you catch yourself:
60
-
61
- - Writing implementation code before writing a test
62
- - Writing a test that passes on the first run
63
- - Skipping the VERIFY RED step
64
- - Adding features beyond what the current test requires
65
- - Keeping pre-TDD code "as reference"
66
-
67
- ## Verification
68
-
69
- Before claiming TDD compliance, confirm:
70
-
71
- - [ ] Every new function/method has a corresponding test
72
- - [ ] Each test was written BEFORE its implementation
73
- - [ ] Each test was observed to FAIL before implementation was written
74
- - [ ] Each test failed for the expected reason (missing behavior, not syntax error)
75
- - [ ] Minimal code was written to pass each test
76
- - [ ] All tests pass after implementation
77
- - [ ] Refactoring (if any) did not break any tests
78
-
79
- ## MAXSIM Integration
80
74
 
81
- In MAXSIM plan execution, tasks marked `tdd="true"` follow this cycle with per-step commits:
75
+ Stop immediately if you catch yourself writing implementation code before writing a test, writing a test that passes on the first run, skipping the VERIFY RED step, or adding features beyond what the current test requires.
82
76
 
83
- - **RED commit:** `test({phase}-{plan}): add failing test for [feature]`
84
- - **GREEN commit:** `feat({phase}-{plan}): implement [feature]`
85
- - **REFACTOR commit (if changes made):** `refactor({phase}-{plan}): clean up [feature]`
77
+ See also: `/verification-before-completion` for evidence-based completion claims after TDD cycles.
@@ -0,0 +1,80 @@
1
+ ---
2
+ name: tool-priority-guide
3
+ description: >-
4
+ Tool selection guide for Claude Code operations. Maps common tasks to preferred
5
+ tools, explaining when to use Read over cat, Grep over rg, Glob over find,
6
+ Write over echo, and Edit over sed. Use when deciding which tool to use for
7
+ file operations, search, content modification, or web content retrieval.
8
+ user-invocable: false
9
+ ---
10
+
11
+ # Tool Priority Guide
12
+
13
+ Use dedicated Claude Code tools over Bash equivalents. Dedicated tools provide better permissions handling, output formatting, and user experience.
14
+
15
+ ## File Reading
16
+
17
+ | Task | Use | Not |
18
+ |------|-----|-----|
19
+ | Read file contents | **Read tool** | `cat`, `head`, `tail` via Bash |
20
+ | Read specific lines | **Read tool** (with offset/limit) | `sed -n 'X,Yp'` via Bash |
21
+ | Read images | **Read tool** (multimodal) | Not possible via Bash |
22
+ | Read PDFs | **Read tool** (with pages param) | `pdftotext` via Bash |
23
+
24
+ **Why Read:** Handles permissions, large files, binary formats. Returns line-numbered output.
25
+
26
+ ## File Writing
27
+
28
+ | Task | Use | Not |
29
+ |------|-----|-----|
30
+ | Create new file | **Write tool** | `echo > file`, `cat <<EOF` via Bash |
31
+ | Rewrite entire file | **Write tool** (after Read) | `cat > file` via Bash |
32
+ | Modify part of file | **Edit tool** | `sed`, `awk` via Bash |
33
+ | Rename string across file | **Edit tool** (replace_all) | `sed -i 's/old/new/g'` via Bash |
34
+
35
+ **Why Write/Edit:** Atomic operations, preserves encoding, provides diff view for review.
36
+
37
+ ## Searching
38
+
39
+ | Task | Use | Not |
40
+ |------|-----|-----|
41
+ | Search file contents | **Grep tool** | `grep`, `rg` via Bash |
42
+ | Find files by pattern | **Glob tool** | `find`, `ls -R` via Bash |
43
+ | Search with context | **Grep tool** (-A, -B, -C params) | `grep -C N` via Bash |
44
+ | Count matches | **Grep tool** (output_mode: count) | `grep -c` via Bash |
45
+
46
+ **Why Grep/Glob:** Optimized permissions, structured output, result limiting.
47
+
48
+ ## Web Content
49
+
50
+ | Task | Use | Not |
51
+ |------|-----|-----|
52
+ | Fetch documentation | **WebFetch tool** | `curl` via Bash |
53
+ | Read API responses | **WebFetch tool** | `curl \| jq` via Bash |
54
+ | Download files | **Bash** (`curl -O`) | WebFetch (not for binary downloads) |
55
+
56
+ **Why WebFetch:** Handles authentication, follows redirects, parses HTML.
57
+
58
+ ## When Bash IS the Right Tool
59
+
60
+ | Task | Why Bash |
61
+ |------|---------|
62
+ | Run build/test commands | `npm test`, `npm run build` -- no dedicated tool |
63
+ | Git operations | `git status`, `git commit` -- no dedicated tool |
64
+ | Install dependencies | `npm install` -- no dedicated tool |
65
+ | Check file existence | `test -f path` -- lightweight, often part of larger commands |
66
+ | Run project CLI tools | Project-specific commands -- no dedicated tool |
67
+ | Chained operations | Multiple sequential commands with `&&` |
68
+
69
+ ## Quick Reference
70
+
71
+ ```
72
+ Read file --> Read tool
73
+ Write file --> Write tool (new) or Edit tool (modify)
74
+ Search code --> Grep tool
75
+ Find files --> Glob tool
76
+ Fetch URL --> WebFetch tool
77
+ Run commands --> Bash tool
78
+ ```
79
+
80
+ The general principle: if a dedicated tool exists for the operation, use it. Fall back to Bash only when no dedicated tool covers the task.
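The general principle stated above amounts to a lookup with a Bash fallback. The sketch below is a hypothetical illustration of that decision rule, not how Claude Code selects tools internally; the task labels and the `choose_tool` helper are invented for the example.

```python
# Task labels are illustrative; tool names are the ones listed in the guide.
PREFERRED_TOOL = {
    "read file": "Read",
    "create file": "Write",
    "modify file": "Edit",
    "search code": "Grep",
    "find files": "Glob",
    "fetch url": "WebFetch",
}


def choose_tool(task: str) -> str:
    """Return the dedicated tool for a task, falling back to Bash when none covers it."""
    return PREFERRED_TOOL.get(task.strip().lower(), "Bash")


print(choose_tool("Search code"))  # Grep
print(choose_tool("git commit"))   # Bash
```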