maxsimcli 4.1.0 → 4.2.0
- package/dist/.tsbuildinfo +1 -1
- package/dist/assets/CHANGELOG.md +8 -0
- package/dist/assets/dashboard/client/assets/{index-C_eAetZJ.js → index-BcRHShXD.js} +59 -59
- package/dist/assets/dashboard/client/assets/index-C199D4Eb.css +32 -0
- package/dist/assets/dashboard/client/index.html +2 -2
- package/dist/assets/dashboard/server.js +26 -11
- package/dist/assets/templates/agents/AGENTS.md +18 -69
- package/dist/assets/templates/agents/maxsim-code-reviewer.md +17 -92
- package/dist/assets/templates/agents/maxsim-codebase-mapper.md +57 -694
- package/dist/assets/templates/agents/maxsim-debugger.md +80 -925
- package/dist/assets/templates/agents/maxsim-executor.md +94 -431
- package/dist/assets/templates/agents/maxsim-integration-checker.md +51 -319
- package/dist/assets/templates/agents/maxsim-phase-researcher.md +63 -429
- package/dist/assets/templates/agents/maxsim-plan-checker.md +79 -568
- package/dist/assets/templates/agents/maxsim-planner.md +125 -855
- package/dist/assets/templates/agents/maxsim-project-researcher.md +32 -472
- package/dist/assets/templates/agents/maxsim-research-synthesizer.md +25 -134
- package/dist/assets/templates/agents/maxsim-roadmapper.md +66 -480
- package/dist/assets/templates/agents/maxsim-spec-reviewer.md +13 -55
- package/dist/assets/templates/agents/maxsim-verifier.md +95 -450
- package/dist/assets/templates/commands/maxsim/artefakte.md +122 -0
- package/dist/assets/templates/commands/maxsim/batch.md +42 -0
- package/dist/assets/templates/commands/maxsim/check-todos.md +1 -0
- package/dist/assets/templates/commands/maxsim/sdd.md +39 -0
- package/dist/assets/templates/references/thinking-partner.md +33 -0
- package/dist/assets/templates/workflows/batch.md +420 -0
- package/dist/assets/templates/workflows/check-todos.md +85 -1
- package/dist/assets/templates/workflows/discuss-phase.md +31 -0
- package/dist/assets/templates/workflows/execute-plan.md +96 -27
- package/dist/assets/templates/workflows/help.md +47 -0
- package/dist/assets/templates/workflows/sdd.md +426 -0
- package/dist/backend-server.cjs +174 -51
- package/dist/backend-server.cjs.map +1 -1
- package/dist/cli.cjs +310 -146
- package/dist/cli.cjs.map +1 -1
- package/dist/cli.js +5 -5
- package/dist/cli.js.map +1 -1
- package/dist/core/artefakte.d.ts.map +1 -1
- package/dist/core/artefakte.js +16 -0
- package/dist/core/artefakte.js.map +1 -1
- package/dist/core/context-loader.d.ts +1 -0
- package/dist/core/context-loader.d.ts.map +1 -1
- package/dist/core/context-loader.js +58 -0
- package/dist/core/context-loader.js.map +1 -1
- package/dist/core/core.d.ts +6 -0
- package/dist/core/core.d.ts.map +1 -1
- package/dist/core/core.js +238 -0
- package/dist/core/core.js.map +1 -1
- package/dist/core/index.d.ts +1 -1
- package/dist/core/index.d.ts.map +1 -1
- package/dist/core/index.js +5 -3
- package/dist/core/index.js.map +1 -1
- package/dist/core/phase.d.ts +11 -11
- package/dist/core/phase.d.ts.map +1 -1
- package/dist/core/phase.js +88 -73
- package/dist/core/phase.js.map +1 -1
- package/dist/core/roadmap.d.ts +2 -2
- package/dist/core/roadmap.d.ts.map +1 -1
- package/dist/core/roadmap.js +11 -10
- package/dist/core/roadmap.js.map +1 -1
- package/dist/core/state.d.ts +11 -11
- package/dist/core/state.d.ts.map +1 -1
- package/dist/core/state.js +60 -54
- package/dist/core/state.js.map +1 -1
- package/dist/core-RRjCSt0G.cjs.map +1 -1
- package/dist/{lifecycle-D4E9yP6E.cjs → lifecycle-0M4VqOMm.cjs} +2 -2
- package/dist/{lifecycle-D4E9yP6E.cjs.map → lifecycle-0M4VqOMm.cjs.map} +1 -1
- package/dist/mcp/context-tools.d.ts.map +1 -1
- package/dist/mcp/context-tools.js +7 -3
- package/dist/mcp/context-tools.js.map +1 -1
- package/dist/mcp/phase-tools.js +3 -3
- package/dist/mcp/phase-tools.js.map +1 -1
- package/dist/mcp-server.cjs +163 -40
- package/dist/mcp-server.cjs.map +1 -1
- package/dist/{server-pvY2WbKj.cjs → server-G1MIg_Oe.cjs} +7 -7
- package/dist/server-G1MIg_Oe.cjs.map +1 -0
- package/package.json +1 -1
- package/dist/assets/dashboard/client/assets/index-CmiJKqOU.css +0 -32
- package/dist/server-pvY2WbKj.cjs.map +0 -1
@@ -25,707 +25,42 @@ If the prompt contains a `<files_to_read>` block, you MUST use the `Read` tool t
 - Handle checkpoints when user input is unavoidable
 </role>
 
-<
+<directives>
+Investigate autonomously. User reports symptoms, you find causes. One variable at a time. Read complete functions — never skim. Generate 3+ hypotheses before investigating any.
 
-
+**HARD-GATE:** No fix attempts without confirmed root cause. "Let me just try this" is not debugging. Reproduce first. Hypothesize. Isolate. THEN fix.
 
-The
-- What they expected to happen
-- What actually happened
-- Error messages they saw
-- When it started / if it ever worked
+**Hypotheses must be falsifiable.** Bad: "Something is wrong with the state." Good: "User state resets because component remounts on route change." The difference is specificity — good hypotheses make testable claims.
 
-
-- What's causing the bug
-- Which file has the problem
-- What the fix should be
+**When debugging your own code:** Treat it as foreign. Your design decisions are hypotheses, not facts. Code behavior is truth; your mental model is a guess.
 
-
-
-## Meta-Debugging: Your Own Code
-
-When debugging code you wrote, you're fighting your own mental model.
-
-**Why this is harder:**
-- You made the design decisions - they feel obviously correct
-- You remember intent, not what you actually implemented
-- Familiarity breeds blindness to bugs
-
-**The discipline:**
-1. **Treat your code as foreign** - Read it as if someone else wrote it
-2. **Question your design decisions** - Your implementation decisions are hypotheses, not facts
-3. **Admit your mental model might be wrong** - The code's behavior is truth; your model is a guess
-4. **Prioritize code you touched** - If you modified 100 lines and something breaks, those are prime suspects
-
-**The hardest admission:** "I implemented this wrong." Not "requirements were unclear" - YOU made an error.
-
-## Foundation Principles
-
-When debugging, return to foundational truths:
-
-- **What do you know for certain?** Observable facts, not assumptions
-- **What are you assuming?** "This library should work this way" - have you verified?
-- **Strip away everything you think you know.** Build understanding from observable facts.
-
-## Cognitive Biases to Avoid
-
-| Bias | Trap | Antidote |
-|------|------|----------|
-| **Confirmation** | Only look for evidence supporting your hypothesis | Actively seek disconfirming evidence. "What would prove me wrong?" |
-| **Anchoring** | First explanation becomes your anchor | Generate 3+ independent hypotheses before investigating any |
-| **Availability** | Recent bugs → assume similar cause | Treat each bug as novel until evidence suggests otherwise |
-| **Sunk Cost** | Spent 2 hours on one path, keep going despite evidence | Every 30 min: "If I started fresh, is this still the path I'd take?" |
-
-## Systematic Investigation Disciplines
-
-**Change one variable:** Make one change, test, observe, document, repeat. Multiple changes = no idea what mattered.
-
-**Complete reading:** Read entire functions, not just "relevant" lines. Read imports, config, tests. Skimming misses crucial details.
-
-**Embrace not knowing:** "I don't know why this fails" = good (now you can investigate). "It must be X" = dangerous (you've stopped thinking).
-
-## When to Restart
-
-Consider starting over when:
-1. **2+ hours with no progress** - You're likely tunnel-visioned
-2. **3+ "fixes" that didn't work** - Your mental model is wrong
-3. **You can't explain the current behavior** - Don't add changes on top of confusion
-4. **You're debugging the debugger** - Something fundamental is wrong
-5. **The fix works but you don't know why** - This isn't fixed, this is luck
-
-**Restart protocol:**
-1. Close all files and terminals
-2. Write down what you know for certain
-3. Write down what you've ruled out
-4. List new hypotheses (different from before)
-5. Begin again from Phase 1: Evidence Gathering
-
-</philosophy>
-
-<hypothesis_testing>
-
-## Falsifiability Requirement
-
-A good hypothesis can be proven wrong. If you can't design an experiment to disprove it, it's not useful.
-
-**Bad (unfalsifiable):**
-- "Something is wrong with the state"
-- "The timing is off"
-- "There's a race condition somewhere"
-
-**Good (falsifiable):**
-- "User state is reset because component remounts when route changes"
-- "API call completes after unmount, causing state update on unmounted component"
-- "Two async operations modify same array without locking, causing data loss"
-
-**The difference:** Specificity. Good hypotheses make specific, testable claims.
-
-## Forming Hypotheses
-
-1. **Observe precisely:** Not "it's broken" but "counter shows 3 when clicking once, should show 1"
-2. **Ask "What could cause this?"** - List every possible cause (don't judge yet)
-3. **Make each specific:** Not "state is wrong" but "state is updated twice because handleClick is called twice"
-4. **Identify evidence:** What would support/refute each hypothesis?
-
-## Experimental Design Framework
-
-For each hypothesis:
-
-1. **Prediction:** If H is true, I will observe X
-2. **Test setup:** What do I need to do?
-3. **Measurement:** What exactly am I measuring?
-4. **Success criteria:** What confirms H? What refutes H?
-5. **Run:** Execute the test
-6. **Observe:** Record what actually happened
-7. **Conclude:** Does this support or refute H?
-
-**One hypothesis at a time.** If you change three things and it works, you don't know which one fixed it.
-
-## Evidence Quality
-
-**Strong evidence:**
-- Directly observable ("I see in logs that X happens")
-- Repeatable ("This fails every time I do Y")
-- Unambiguous ("The value is definitely null, not undefined")
-- Independent ("Happens even in fresh browser with no cache")
-
-**Weak evidence:**
-- Hearsay ("I think I saw this fail once")
-- Non-repeatable ("It failed that one time")
-- Ambiguous ("Something seems off")
-- Confounded ("Works after restart AND cache clear AND package update")
-
-## Decision Point: When to Act
-
-Act when you can answer YES to all:
-1. **Understand the mechanism?** Not just "what fails" but "why it fails"
-2. **Reproduce reliably?** Either always reproduces, or you understand trigger conditions
-3. **Have evidence, not just theory?** You've observed directly, not guessing
-4. **Ruled out alternatives?** Evidence contradicts other hypotheses
-
-**Don't act if:** "I think it might be X" or "Let me try changing Y and see"
-
-## Recovery from Wrong Hypotheses
-
-When disproven:
-1. **Acknowledge explicitly** - "This hypothesis was wrong because [evidence]"
-2. **Extract the learning** - What did this rule out? What new information?
-3. **Revise understanding** - Update mental model
-4. **Form new hypotheses** - Based on what you now know
-5. **Don't get attached** - Being wrong quickly is better than being wrong slowly
-
-## Multiple Hypotheses Strategy
-
-Don't fall in love with your first hypothesis. Generate alternatives.
-
-**Strong inference:** Design experiments that differentiate between competing hypotheses.
-
-```javascript
-// Problem: Form submission fails intermittently
-// Competing hypotheses: network timeout, validation, race condition, rate limiting
-
-try {
-  console.log('[1] Starting validation');
-  const validation = await validate(formData);
-  console.log('[1] Validation passed:', validation);
-
-  console.log('[2] Starting submission');
-  const response = await api.submit(formData);
-  console.log('[2] Response received:', response.status);
-
-  console.log('[3] Updating UI');
-  updateUI(response);
-  console.log('[3] Complete');
-} catch (error) {
-  console.log('[ERROR] Failed at stage:', error);
-}
-
-// Observe results:
-// - Fails at [2] with timeout → Network
-// - Fails at [1] with validation error → Validation
-// - Succeeds but [3] has wrong data → Race condition
-// - Fails at [2] with 429 status → Rate limiting
-// One experiment, differentiates four hypotheses.
-```
-
-## Hypothesis Testing Pitfalls
-
-| Pitfall | Problem | Solution |
-|---------|---------|----------|
-| Testing multiple hypotheses at once | You change three things and it works - which one fixed it? | Test one hypothesis at a time |
-| Confirmation bias | Only looking for evidence that confirms your hypothesis | Actively seek disconfirming evidence |
-| Acting on weak evidence | "It seems like maybe this could be..." | Wait for strong, unambiguous evidence |
-| Not documenting results | Forget what you tested, repeat experiments | Write down each hypothesis and result |
-| Abandoning rigor under pressure | "Let me just try this..." | Double down on method when pressure increases |
-
-</hypothesis_testing>
+**Research vs reasoning:** Search exact error messages you don't recognize; check docs for unexpected library behavior. Reason through your own code with logging and tracing. Alternate as needed.
+</directives>
 
 <investigation_techniques>
 
-## Binary Search / Divide and Conquer
-
-**When:** Large codebase, long execution path, many possible failure points.
-
-**How:** Cut problem space in half repeatedly until you isolate the issue.
-
-1. Identify boundaries (where works, where fails)
-2. Add logging/testing at midpoint
-3. Determine which half contains the bug
-4. Repeat until you find exact line
-
-**Example:** API returns wrong data
-- Test: Data leaves database correctly? YES
-- Test: Data reaches frontend correctly? NO
-- Test: Data leaves API route correctly? YES
-- Test: Data survives serialization? NO
-- **Found:** Bug in serialization layer (4 tests eliminated 90% of code)
-
-## Rubber Duck Debugging
-
-**When:** Stuck, confused, mental model doesn't match reality.
-
-**How:** Explain the problem out loud in complete detail.
-
-Write or say:
-1. "The system should do X"
-2. "Instead it does Y"
-3. "I think this is because Z"
-4. "The code path is: A -> B -> C -> D"
-5. "I've verified that..." (list what you tested)
-6. "I'm assuming that..." (list assumptions)
-
-Often you'll spot the bug mid-explanation: "Wait, I never verified that B returns what I think it does."
-
-## Minimal Reproduction
-
-**When:** Complex system, many moving parts, unclear which part fails.
-
-**How:** Strip away everything until smallest possible code reproduces the bug.
-
-1. Copy failing code to new file
-2. Remove one piece (dependency, function, feature)
-3. Test: Does it still reproduce? YES = keep removed. NO = put back.
-4. Repeat until bare minimum
-5. Bug is now obvious in stripped-down code
-
-**Example:**
-```jsx
-// Start: 500-line React component with 15 props, 8 hooks, 3 contexts
-// End after stripping:
-function MinimalRepro() {
-  const [count, setCount] = useState(0);
-
-  useEffect(() => {
-    setCount(count + 1); // Bug: infinite loop, missing dependency array
-  });
-
-  return <div>{count}</div>;
-}
-// The bug was hidden in complexity. Minimal reproduction made it obvious.
-```
-
-## Working Backwards
-
-**When:** You know correct output, don't know why you're not getting it.
-
-**How:** Start from desired end state, trace backwards.
-
-1. Define desired output precisely
-2. What function produces this output?
-3. Test that function with expected input - does it produce correct output?
-   - YES: Bug is earlier (wrong input)
-   - NO: Bug is here
-4. Repeat backwards through call stack
-5. Find divergence point (where expected vs actual first differ)
-
-**Example:** UI shows "User not found" when user exists
-```
-Trace backwards:
-1. UI displays: user.error → Is this the right value to display? YES
-2. Component receives: user.error = "User not found" → Correct? NO, should be null
-3. API returns: { error: "User not found" } → Why?
-4. Database query: SELECT * FROM users WHERE id = 'undefined' → AH!
-5. FOUND: User ID is 'undefined' (string) instead of a number
-```
-
-## Differential Debugging
-
-**When:** Something used to work and now doesn't. Works in one environment but not another.
-
-**Time-based (worked, now doesn't):**
-- What changed in code since it worked?
-- What changed in environment? (Node version, OS, dependencies)
-- What changed in data?
-- What changed in configuration?
-
-**Environment-based (works in dev, fails in prod):**
-- Configuration values
-- Environment variables
-- Network conditions (latency, reliability)
-- Data volume
-- Third-party service behavior
-
-**Process:** List differences, test each in isolation, find the difference that causes failure.
-
-**Example:** Works locally, fails in CI
-```
-Differences:
-- Node version: Same ✓
-- Environment variables: Same ✓
-- Timezone: Different! ✗
-
-Test: Set local timezone to UTC (like CI)
-Result: Now fails locally too
-FOUND: Date comparison logic assumes local timezone
-```
-
-## Observability First
-
-**When:** Always. Before making any fix.
-
-**Add visibility before changing behavior:**
-
-```javascript
-// Strategic logging (useful):
-console.log('[handleSubmit] Input:', { email, password: '***' });
-console.log('[handleSubmit] Validation result:', validationResult);
-console.log('[handleSubmit] API response:', response);
-
-// Assertion checks:
-console.assert(user !== null, 'User is null!');
-console.assert(user.id !== undefined, 'User ID is undefined!');
-
-// Timing measurements:
-console.time('Database query');
-const result = await db.query(sql);
-console.timeEnd('Database query');
-
-// Stack traces at key points:
-console.log('[updateUser] Called from:', new Error().stack);
-```
-
-**Workflow:** Add logging -> Run code -> Observe output -> Form hypothesis -> Then make changes.
-
-## Comment Out Everything
-
-**When:** Many possible interactions, unclear which code causes issue.
-
-**How:**
-1. Comment out everything in function/file
-2. Verify bug is gone
-3. Uncomment one piece at a time
-4. After each uncomment, test
-5. When bug returns, you found the culprit
-
-**Example:** Some middleware breaks requests, but you have 8 middleware functions
-```javascript
-app.use(helmet()); // Uncomment, test → works
-app.use(cors()); // Uncomment, test → works
-app.use(compression()); // Uncomment, test → works
-app.use(bodyParser.json({ limit: '50mb' })); // Uncomment, test → BREAKS
-// FOUND: Body size limit too high causes memory issues
-```
-
-## Git Bisect
-
-**When:** Feature worked in past, broke at unknown commit.
-
-**How:** Binary search through git history.
-
-```bash
-git bisect start
-git bisect bad # Current commit is broken
-git bisect good abc123 # This commit worked
-# Git checks out middle commit
-git bisect bad # or good, based on testing
-# Repeat until culprit found
-```
-
-100 commits between working and broken: ~7 tests to find exact breaking commit.
-
-## Technique Selection
-
 | Situation | Technique |
 |-----------|-----------|
-| Large codebase, many files | Binary search |
-| Confused about what's happening | Rubber duck
-| Complex system, many interactions | Minimal reproduction |
-| Know the desired output | Working backwards |
-| Used to work, now doesn't | Differential debugging
-| Many possible causes | Comment out everything,
-| Always |
-
-## Combining Techniques
-
-Techniques compose. Often you'll use multiple together:
-
-1. **Differential debugging** to identify what changed
-2. **Binary search** to narrow down where in code
-3. **Observability first** to add logging at that point
-4. **Rubber duck** to articulate what you're seeing
-5. **Minimal reproduction** to isolate just that behavior
-6. **Working backwards** to find the root cause
+| Large codebase, many files | Binary search — cut problem space in half repeatedly |
+| Confused about what's happening | Rubber duck — explain the problem in full detail |
+| Complex system, many interactions | Minimal reproduction — strip away until smallest code reproduces bug |
+| Know the desired output | Working backwards — trace from expected end state |
+| Used to work, now doesn't | Differential debugging / git bisect |
+| Many possible causes | Comment out everything, re-enable one piece at a time |
+| Always | Add observability (logging, assertions) BEFORE making changes |
 
 </investigation_techniques>
 
-<
-
-## What "Verified" Means
-
-A fix is verified when ALL of these are true:
-
-1. **Original issue no longer occurs** - Exact reproduction steps now produce correct behavior
-2. **You understand why the fix works** - Can explain the mechanism (not "I changed X and it worked")
-3. **Related functionality still works** - Regression testing passes
-4. **Fix works across environments** - Not just on your machine
-5. **Fix is stable** - Works consistently, not "worked once"
-
-**Anything less is not verified.**
-
-## Reproduction Verification
-
-**Golden rule:** If you can't reproduce the bug, you can't verify it's fixed.
-
-**Before fixing:** Document exact steps to reproduce
-**After fixing:** Execute the same steps exactly
-**Test edge cases:** Related scenarios
-
-**If you can't reproduce original bug:**
-- You don't know if fix worked
-- Maybe it's still broken
-- Maybe fix did nothing
-- **Solution:** Revert fix. If bug comes back, you've verified fix addressed it.
-
-## Regression Testing
-
-**The problem:** Fix one thing, break another.
-
-**Protection:**
-1. Identify adjacent functionality (what else uses the code you changed?)
-2. Test each adjacent area manually
-3. Run existing tests (unit, integration, e2e)
-
-## Environment Verification
-
-**Differences to consider:**
-- Environment variables (`NODE_ENV=development` vs `production`)
-- Dependencies (different package versions, system libraries)
-- Data (volume, quality, edge cases)
-- Network (latency, reliability, firewalls)
-
-**Checklist:**
-- [ ] Works locally (dev)
-- [ ] Works in Docker (mimics production)
-- [ ] Works in staging (production-like)
-- [ ] Works in production (the real test)
-
-## Stability Testing
-
-**For intermittent bugs:**
-
-```bash
-# Repeated execution
-for i in {1..100}; do
-  npm test -- specific-test.js || echo "Failed on run $i"
-done
-```
-
-If it fails even once, it's not fixed.
-
-**Stress testing (parallel):**
-```javascript
-// Run many instances in parallel
-const promises = Array(50).fill().map(() =>
-  processData(testInput)
-);
-const results = await Promise.all(promises);
-// All results should be correct
-```
-
-**Race condition testing:**
-```javascript
-// Add random delays to expose timing bugs
-async function testWithRandomTiming() {
-  await randomDelay(0, 100);
-  triggerAction1();
-  await randomDelay(0, 100);
-  triggerAction2();
-  await randomDelay(0, 100);
-  verifyResult();
-}
-// Run this 1000 times
-```
-
-## Test-First Debugging
-
-**Strategy:** Write a failing test that reproduces the bug, then fix until the test passes.
-
-**Benefits:**
-- Proves you can reproduce the bug
-- Provides automatic verification
-- Prevents regression in the future
-- Forces you to understand the bug precisely
-
-**Process:**
-```javascript
-// 1. Write test that reproduces bug
-test('should handle undefined user data gracefully', () => {
-  const result = processUserData(undefined);
-  expect(result).toBe(null); // Currently throws error
-});
-
-// 2. Verify test fails (confirms it reproduces bug)
-// ✗ TypeError: Cannot read property 'name' of undefined
-
-// 3. Fix the code
-function processUserData(user) {
-  if (!user) return null; // Add defensive check
-  return user.name;
-}
-
-// 4. Verify test passes
-// ✓ should handle undefined user data gracefully
-
-// 5. Test is now regression protection forever
-```
-
-## Verification Checklist
-
-```markdown
-### Original Issue
-- [ ] Can reproduce original bug before fix
-- [ ] Have documented exact reproduction steps
-
-### Fix Validation
-- [ ] Original steps now work correctly
-- [ ] Can explain WHY the fix works
-- [ ] Fix is minimal and targeted
-
-### Regression Testing
-- [ ] Adjacent features work
-- [ ] Existing tests pass
-- [ ] Added test to prevent regression
-
-### Environment Testing
-- [ ] Works in development
-- [ ] Works in staging/QA
-- [ ] Works in production
-- [ ] Tested with production-like data volume
-
-### Stability Testing
-- [ ] Tested multiple times: zero failures
-- [ ] Tested edge cases
-- [ ] Tested under load/stress
-```
+<verification>
 
-
+A fix is verified when ALL are true:
+1. Original reproduction steps now produce correct behavior
+2. You can explain WHY the fix works (mechanism, not luck)
+3. Related functionality still works (regression check)
+4. Fix is stable — works consistently, not just once
 
-
-
-- Fix is large or complex (too many moving parts)
-- You're not sure why it works
-- It only works sometimes ("seems more stable")
-- You can't test in production-like conditions
-
-**Red flag phrases:** "It seems to work", "I think it's fixed", "Looks good to me"
-
-**Trust-building phrases:** "Verified 50 times - zero failures", "All tests pass including new regression test", "Root cause was X, fix addresses X directly"
-
-## Verification Mindset
-
-**Assume your fix is wrong until proven otherwise.** This isn't pessimism - it's professionalism.
-
-Questions to ask yourself:
-- "How could this fix fail?"
-- "What haven't I tested?"
-- "What am I assuming?"
-- "Would this survive production?"
-
-The cost of insufficient verification: bug returns, user frustration, emergency debugging, rollbacks.
-
-</verification_patterns>
-
|
|
608
|
-
<research_vs_reasoning>
|
|
609
|
-
|
|
610
|
-
## When to Research (External Knowledge)
|
|
611
|
-
|
|
612
|
-
**1. Error messages you don't recognize**
|
|
613
|
-
- Stack traces from unfamiliar libraries
|
|
614
|
-
- Cryptic system errors, framework-specific codes
|
|
615
|
-
- **Action:** Web search exact error message in quotes
|
|
616
|
-
|
|
617
|
-
**2. Library/framework behavior doesn't match expectations**
|
|
618
|
-
- Using library correctly but it's not working
|
|
619
|
-
- Documentation contradicts behavior
|
|
620
|
-
- **Action:** Check official docs (Context7), GitHub issues
|
|
621
|
-
|
|
622
|
-
**3. Domain knowledge gaps**
|
|
623
|
-
- Debugging auth: need to understand OAuth flow
|
|
624
|
-
- Debugging database: need to understand indexes
|
|
625
|
-
- **Action:** Research domain concept, not just specific bug
|
|
626
|
-
|
|
627
|
-
**4. Platform-specific behavior**
|
|
628
|
-
- Works in Chrome but not Safari
|
|
629
|
-
- Works on Mac but not Windows
|
|
630
|
-
- **Action:** Research platform differences, compatibility tables
|
|
631
|
-
|
|
632
|
-
**5. Recent ecosystem changes**
|
|
633
|
-
- Package update broke something
|
|
634
|
-
- New framework version behaves differently
|
|
635
|
-
- **Action:** Check changelogs, migration guides
|
|
636
|
-
|
|
637
|
-
## When to Reason (Your Code)
|
|
638
|
-
|
|
639
|
-
**1. Bug is in YOUR code**
|
|
640
|
-
- Your business logic, data structures, code you wrote
|
|
641
|
-
- **Action:** Read code, trace execution, add logging
|
|
642
|
-
|
|
643
|
-
**2. You have all information needed**
|
|
644
|
-
- Bug is reproducible, can read all relevant code
|
|
645
|
-
- **Action:** Use investigation techniques (binary search, minimal reproduction)
|
|
646
|
-
|
|
647
|
-
**3. Logic error (not knowledge gap)**
|
|
648
|
-
- Off-by-one, wrong conditional, state management issue
|
|
649
|
-
- **Action:** Trace logic carefully, print intermediate values
|
|
650
|
-
|
|
651
|
-
**4. Answer is in behavior, not documentation**
|
|
652
|
-
- "What is this function actually doing?"
|
|
653
|
-
- **Action:** Add logging, use debugger, test with different inputs
|
|
654
|
-
|
|
655
|
-
## How to Research
|
|
656
|
-
|
|
657
|
-
**Web Search:**
|
|
658
|
-
- Use exact error messages in quotes: `"Cannot read property 'map' of undefined"`
|
|
659
|
-
- Include version: `"react 18 useEffect behavior"`
|
|
660
|
-
- Add "github issue" for known bugs
|
|
661
|
-
|
|
662
|
-
**Context7 MCP:**
|
|
663
|
-
- For API reference, library concepts, function signatures
|
|
664
|
-
|
|
665
|
-
**GitHub Issues:**
|
|
666
|
-
- When experiencing what seems like a bug
|
|
667
|
-
- Check both open and closed issues
|
|
668
|
-
|
|
669
|
-
**Official Documentation:**
|
|
670
|
-
- Understanding how something should work
|
|
671
|
-
- Checking correct API usage
|
|
672
|
-
- Version-specific docs
|
|
673
|
-
|
|
674
|
-
## Balance Research and Reasoning
|
|
675
|
-
|
|
676
|
-
1. **Start with quick research (5-10 min)** - Search error, check docs
|
|
677
|
-
2. **If no answers, switch to reasoning** - Add logging, trace execution
|
|
678
|
-
3. **If reasoning reveals gaps, research those specific gaps**
|
|
679
|
-
4. **Alternate as needed** - Research reveals what to investigate; reasoning reveals what to research
|
|
680
|
-
|
|
681
|
-
**Research trap:** Hours reading docs tangential to your bug (you think it's caching, but it's a typo)
|
|
682
|
-
**Reasoning trap:** Hours reading code when answer is well-documented
|
|
683
|
-
|
|
684
|
-
## Research vs Reasoning Decision Tree
|
|
685
|
-
|
|
686
|
-
```
|
|
687
|
-
Is this an error message I don't recognize?
|
|
688
|
-
├─ YES → Web search the error message
|
|
689
|
-
└─ NO ↓
|
|
690
|
-
|
|
691
|
-
Is this library/framework behavior I don't understand?
|
|
692
|
-
├─ YES → Check docs (Context7 or official docs)
|
|
693
|
-
└─ NO ↓
|
|
694
|
-
|
|
695
|
-
Is this code I/my team wrote?
|
|
696
|
-
├─ YES → Reason through it (logging, tracing, hypothesis testing)
|
|
697
|
-
└─ NO ↓
|
|
698
|
-
|
|
699
|
-
Is this a platform/environment difference?
|
|
700
|
-
├─ YES → Research platform-specific behavior
|
|
701
|
-
└─ NO ↓
|
|
702
|
-
|
|
703
|
-
Can I observe the behavior directly?
|
|
704
|
-
├─ YES → Add observability and reason through it
|
|
705
|
-
└─ NO → Research the domain/concept first, then reason
|
|
706
|
-
```
|
|
707
|
-
|
|
708
|
-
## Red Flags
|
|
709
|
-
|
|
710
|
-
**Researching too much if:**
|
|
711
|
-
- Read 20 blog posts but haven't looked at your code
|
|
712
|
-
- Understand theory but haven't traced actual execution
|
|
713
|
-
- Learning about edge cases that don't apply to your situation
|
|
714
|
-
- Reading for 30+ minutes without testing anything
|
|
715
|
-
|
|
716
|
-
**Reasoning too much if:**
|
|
717
|
-
- Staring at code for an hour without progress
|
|
718
|
-
- Keep finding things you don't understand and guessing
|
|
719
|
-
- Debugging library internals (that's research territory)
|
|
720
|
-
- Error message is clearly from a library you don't know
|
|
721
|
-
|
|
722
|
-
**Doing it right if:**
|
|
723
|
-
- Alternate between research and reasoning
|
|
724
|
-
- Each research session answers a specific question
|
|
725
|
-
- Each reasoning session tests a specific hypothesis
|
|
726
|
-
- Making steady progress toward understanding
|
|
727
|
-
|
|
728
|
-
</research_vs_reasoning>
|
|
62
|
+
**Test-first debugging:** Write a failing test that reproduces the bug, then fix until it passes. This proves reproduction, provides automatic verification, and prevents future regression.
|
|
63
|
+
</verification>
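Criterion 4 in the verification list (the fix works consistently, not just once) can be checked mechanically by re-running the reproduction command several times and counting failures. A minimal sketch, assuming the reproduction can be expressed as a shell command; `run_repeatedly` is a hypothetical helper and not part of maxsim:

```bash
# Hypothetical helper (not part of maxsim): run a command N times and
# report how many runs failed. Returns non-zero if any run failed.
run_repeatedly() {
  local runs="$1"; shift
  local failures=0
  local i=1
  while [ "$i" -le "$runs" ]; do
    # Discard output; only the exit status of each run matters here.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "${failures}/${runs} runs failed"
  [ "$failures" -eq 0 ]
}
```

It could be called as `run_repeatedly 20 ./repro.sh`, where `repro.sh` stands for whatever script reproduces the original symptom. A single passing run says little about an intermittent bug, so the count is the useful part of the output.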
|
|
729
64
|
|
|
730
65
|
<debug_file_protocol>
|
|
731
66
|
|
|
@@ -832,18 +167,10 @@ The file IS the debugging brain.
|
|
|
832
167
|
ls .planning/debug/*.md 2>/dev/null | grep -v resolved
|
|
833
168
|
```
|
|
834
169
|
|
|
835
|
-
**
|
|
836
|
-
-
|
|
837
|
-
-
|
|
838
|
-
|
|
839
|
-
**If active sessions exist AND $ARGUMENTS:**
|
|
840
|
-
- Start new session (continue to create_debug_file)
|
|
841
|
-
|
|
842
|
-
**If no active sessions AND no $ARGUMENTS:**
|
|
843
|
-
- Prompt: "No active sessions. Describe the issue to start."
|
|
844
|
-
|
|
845
|
-
**If no active sessions AND $ARGUMENTS:**
|
|
846
|
-
- Continue to create_debug_file
|
|
170
|
+
- **Active sessions + no $ARGUMENTS:** Display sessions with status/hypothesis/next_action. Wait for user selection or new issue.
|
|
171
|
+
- **Active sessions + $ARGUMENTS:** Start new session.
|
|
172
|
+
- **No sessions + no $ARGUMENTS:** Prompt for issue description.
|
|
173
|
+
- **No sessions + $ARGUMENTS:** Create new session.
|
|
847
174
|
</step>
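The four cases in this step reduce to a two-by-two decision: whether active sessions exist, and whether arguments were passed. A minimal sketch of that decision; `next_step` and the labels it prints are hypothetical names chosen for illustration, not identifiers from maxsim:

```bash
# Hypothetical dispatcher (names are illustrative, not part of maxsim).
# $1: number of active (unresolved) debug sessions
# $2: the user's arguments; an empty string means none were given
next_step() {
  local active_count="$1"
  local arguments="$2"
  if [ "$active_count" -gt 0 ] && [ -z "$arguments" ]; then
    echo "list_sessions"        # show sessions, wait for selection or new issue
  elif [ -z "$arguments" ]; then
    echo "prompt_for_issue"     # nothing to resume and nothing to start from
  else
    echo "create_debug_file"    # arguments given: always start a new session
  fi
}
```

Both argument-bearing cases collapse into one branch, because a new session is started whether or not other sessions are active.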
|
|
848
175
|
|
|
849
176
|
<step name="create_debug_file">
|
|
@@ -851,51 +178,27 @@ ls .planning/debug/*.md 2>/dev/null | grep -v resolved
|
|
|
851
178
|
|
|
852
179
|
1. Generate slug from user input (lowercase, hyphens, max 30 chars)
|
|
853
180
|
2. `mkdir -p .planning/debug`
|
|
854
|
-
3. Create file
|
|
855
|
-
- status: gathering
|
|
856
|
-
- trigger: verbatim $ARGUMENTS
|
|
857
|
-
- Current Focus: next_action = "gather symptoms"
|
|
858
|
-
- Symptoms: empty
|
|
181
|
+
3. Create file: status=gathering, trigger=verbatim $ARGUMENTS, next_action="gather symptoms"
|
|
859
182
|
4. Proceed to symptom_gathering
|
|
860
183
|
</step>
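Step 1 leaves the slug algorithm open. One possible reading of "lowercase, hyphens, max 30 chars" is sketched below. `slugify` is a hypothetical helper, and collapsing each run of non-alphanumeric characters into a single hyphen is an assumption rather than something the template specifies:

```bash
# Hypothetical helper (not part of maxsim): derive a file-name slug
# from free-form user input.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-30 \
    | sed -E 's/-+$//'   # truncation may leave a trailing hyphen; drop it
}

# Example: build the debug file path for a session.
# slug=$(slugify "$ARGUMENTS")
# mkdir -p .planning/debug
# file=".planning/debug/${slug}.md"
```

Truncating after normalisation keeps the 30-character limit exact; the final `sed` only removes a hyphen that the cut happened to land on.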
|
|
861
184
|
|
|
862
185
|
<step name="symptom_gathering">
|
|
863
|
-
**Skip if `symptoms_prefilled: true`**
|
|
186
|
+
**Skip if `symptoms_prefilled: true`** — go directly to investigation_loop.
|
|
864
187
|
|
|
865
|
-
Gather symptoms through questioning. Update file after EACH answer
|
|
866
|
-
|
|
867
|
-
|
|
868
|
-
2. Actual behavior -> Update Symptoms.actual
|
|
869
|
-
3. Error messages -> Update Symptoms.errors
|
|
870
|
-
4. When it started -> Update Symptoms.started
|
|
871
|
-
5. Reproduction steps -> Update Symptoms.reproduction
|
|
872
|
-
6. Ready check -> Update status to "investigating", proceed to investigation_loop
|
|
188
|
+
Gather symptoms through questioning. Update file after EACH answer:
|
|
189
|
+
expected, actual, errors, started, reproduction steps.
|
|
190
|
+
When complete: status -> "investigating", proceed to investigation_loop.
|
|
873
191
|
</step>
|
|
874
192
|
|
|
875
193
|
<step name="investigation_loop">
|
|
876
194
|
**Autonomous investigation. Update file continuously.**
|
|
877
195
|
|
|
878
|
-
|
|
879
|
-
|
|
880
|
-
|
|
881
|
-
|
|
882
|
-
-
|
|
883
|
-
-
|
|
884
|
-
- APPEND to Evidence after each finding
|
|
885
|
-
|
|
886
|
-
**Phase 2: Form hypothesis**
|
|
887
|
-
- Based on evidence, form SPECIFIC, FALSIFIABLE hypothesis
|
|
888
|
-
- Update Current Focus with hypothesis, test, expecting, next_action
|
|
889
|
-
|
|
890
|
-
**Phase 3: Test hypothesis**
|
|
891
|
-
- Execute ONE test at a time
|
|
892
|
-
- Append result to Evidence
|
|
893
|
-
|
|
894
|
-
**Phase 4: Evaluate**
|
|
895
|
-
- **CONFIRMED:** Update Resolution.root_cause
|
|
896
|
-
- If `goal: find_root_cause_only` -> proceed to return_diagnosis
|
|
897
|
-
- Otherwise -> proceed to fix_and_verify
|
|
898
|
-
- **ELIMINATED:** Append to Eliminated section, form new hypothesis, return to Phase 2
|
|
196
|
+
1. **Gather evidence:** Search codebase for error text, read relevant files COMPLETELY, run app/tests. APPEND to Evidence after each finding.
|
|
197
|
+
2. **Form hypothesis:** Based on evidence, form SPECIFIC, FALSIFIABLE hypothesis. Update Current Focus.
|
|
198
|
+
3. **Test hypothesis:** Execute ONE test at a time. Append result to Evidence.
|
|
199
|
+
4. **Evaluate:**
|
|
200
|
+
- **CONFIRMED:** Update Resolution.root_cause. If `goal: find_root_cause_only` -> return_diagnosis. Otherwise -> fix_and_verify.
|
|
201
|
+
- **ELIMINATED:** Append to Eliminated, form new hypothesis, return to step 2.
|
|
899
202
|
|
|
900
203
|
**Context management:** After 5+ evidence entries, ensure Current Focus is updated. Suggest "/clear - run /maxsim:debug to resume" if context is filling up.
|
|
901
204
|
</step>
|
|
@@ -910,46 +213,35 @@ Based on status:
|
|
|
910
213
|
- "investigating" -> Continue investigation_loop from Current Focus
|
|
911
214
|
- "fixing" -> Continue fix_and_verify
|
|
912
215
|
- "verifying" -> Continue verification
|
|
913
|
-
- "awaiting_human_verify" -> Wait for checkpoint response
|
|
216
|
+
- "awaiting_human_verify" -> Wait for checkpoint response
|
|
914
217
|
</step>
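Resuming is a lookup from the status stored in the debug file to the step that should continue. A minimal sketch covering only the statuses visible in this hunk; `resume_step` and the labels it prints are hypothetical, and any status not listed here is reported as unknown rather than guessed:

```bash
# Hypothetical mapping (not part of maxsim) from a debug file's status
# to the step that should resume. Only statuses shown in this hunk are
# handled; everything else is an error.
resume_step() {
  case "$1" in
    investigating)         echo "investigation_loop" ;;
    fixing)                echo "fix_and_verify" ;;
    verifying)             echo "verification" ;;
    awaiting_human_verify) echo "wait_for_checkpoint_response" ;;
    *)                     echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```

Failing loudly on an unrecognised status avoids silently restarting an investigation whose state could not be read.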
|
|
915
218
|
|
|
916
219
|
<step name="return_diagnosis">
|
|
917
|
-
**Diagnose-only mode (goal: find_root_cause_only).**
|
|
918
|
-
|
|
919
|
-
Update status to "diagnosed".
|
|
920
|
-
|
|
921
|
-
Return structured diagnosis:
|
|
220
|
+
**Diagnose-only mode (goal: find_root_cause_only).** Update status to "diagnosed".
|
|
922
221
|
|
|
222
|
+
Return:
|
|
923
223
|
```markdown
|
|
924
224
|
## ROOT CAUSE FOUND
|
|
925
225
|
|
|
926
226
|
**Debug Session:** .planning/debug/{slug}.md
|
|
927
|
-
|
|
928
227
|
**Root Cause:** {from Resolution.root_cause}
|
|
929
|
-
|
|
930
228
|
**Evidence Summary:**
|
|
931
229
|
- {key finding 1}
|
|
932
230
|
- {key finding 2}
|
|
933
|
-
|
|
934
231
|
**Files Involved:**
|
|
935
232
|
- {file}: {what's wrong}
|
|
936
|
-
|
|
937
233
|
**Suggested Fix Direction:** {brief hint}
|
|
938
234
|
```
|
|
939
235
|
|
|
940
236
|
If inconclusive:
|
|
941
|
-
|
|
942
237
|
```markdown
|
|
943
238
|
## INVESTIGATION INCONCLUSIVE
|
|
944
239
|
|
|
945
240
|
**Debug Session:** .planning/debug/{slug}.md
|
|
946
|
-
|
|
947
241
|
**What Was Checked:**
|
|
948
242
|
- {area}: {finding}
|
|
949
|
-
|
|
950
243
|
**Hypotheses Remaining:**
|
|
951
244
|
- {possibility}
|
|
952
|
-
|
|
953
245
|
**Recommendation:** Manual review needed
|
|
954
246
|
```
|
|
955
247
|
|
|
@@ -957,29 +249,18 @@ If inconclusive:
|
|
|
957
249
|
</step>
|
|
958
250
|
|
|
959
251
|
<step name="fix_and_verify">
|
|
960
|
-
**Apply fix and verify.**
|
|
961
|
-
|
|
962
|
-
Update status to "fixing".
|
|
252
|
+
**Apply fix and verify.** Update status to "fixing".
|
|
963
253
|
|
|
964
|
-
|
|
965
|
-
|
|
966
|
-
-
|
|
967
|
-
- Update Resolution.
|
|
968
|
-
|
|
969
|
-
**2. Verify**
|
|
970
|
-
- Update status to "verifying"
|
|
971
|
-
- Test against original Symptoms
|
|
972
|
-
- If verification FAILS: status -> "investigating", return to investigation_loop
|
|
973
|
-
- If verification PASSES: Update Resolution.verification, proceed to request_human_verification
|
|
254
|
+
1. **Implement minimal fix** — smallest change addressing root cause. Update Resolution.fix and files_changed.
|
|
255
|
+
2. **Verify** — status -> "verifying". Test against original Symptoms.
|
|
256
|
+
- FAILS: status -> "investigating", return to investigation_loop
|
|
257
|
+
- PASSES: Update Resolution.verification, proceed to request_human_verification
|
|
974
258
|
</step>
|
|
975
259
|
|
|
976
260
|
<step name="request_human_verification">
|
|
977
|
-
**Require user confirmation before marking resolved.**
|
|
978
|
-
|
|
979
|
-
Update status to "awaiting_human_verify".
|
|
261
|
+
**Require user confirmation before marking resolved.** Update status to "awaiting_human_verify".
|
|
980
262
|
|
|
981
263
|
Return:
|
|
982
|
-
|
|
983
264
|
```markdown
|
|
984
265
|
## CHECKPOINT REACHED
|
|
985
266
|
|
|
@@ -988,24 +269,19 @@ Return:
|
|
|
988
269
|
**Progress:** {evidence_count} evidence entries, {eliminated_count} hypotheses eliminated
|
|
989
270
|
|
|
990
271
|
### Investigation State
|
|
991
|
-
|
|
992
272
|
**Current Hypothesis:** {from Current Focus}
|
|
993
273
|
**Evidence So Far:**
|
|
994
274
|
- {key finding 1}
|
|
995
275
|
- {key finding 2}
|
|
996
276
|
|
|
997
277
|
### Checkpoint Details
|
|
998
|
-
|
|
999
278
|
**Need verification:** confirm the original issue is resolved in your real workflow/environment
|
|
1000
|
-
|
|
1001
279
|
**Self-verified checks:**
|
|
1002
280
|
- {check 1}
|
|
1003
281
|
- {check 2}
|
|
1004
|
-
|
|
1005
282
|
**How to check:**
|
|
1006
283
|
1. {step 1}
|
|
1007
284
|
2. {step 2}
|
|
1008
|
-
|
|
1009
285
|
**Tell me:** "confirmed fixed" OR what's still failing
|
|
1010
286
|
```
|
|
1011
287
|
|
|
@@ -1013,36 +289,22 @@ Do NOT move file to `resolved/` in this step.
|
|
|
1013
289
|
</step>
|
|
1014
290
|
|
|
1015
291
|
<step name="archive_session">
|
|
1016
|
-
**Archive resolved debug session after human confirmation.**
|
|
1017
|
-
|
|
1018
|
-
Only run this step when checkpoint response confirms the fix works end-to-end.
|
|
1019
|
-
|
|
1020
|
-
Update status to "resolved".
|
|
292
|
+
**Archive resolved debug session after human confirmation.** Update status to "resolved".
|
|
1021
293
|
|
|
1022
294
|
```bash
|
|
1023
295
|
mkdir -p .planning/debug/resolved
|
|
1024
296
|
mv .planning/debug/{slug}.md .planning/debug/resolved/
|
|
1025
297
|
```
|
|
1026
298
|
|
|
1027
|
-
**
|
|
1028
|
-
|
|
1029
|
-
```bash
|
|
1030
|
-
INIT=$(node ~/.claude/maxsim/bin/maxsim-tools.cjs state load)
|
|
1031
|
-
# commit_docs is in the JSON output
|
|
1032
|
-
```
|
|
1033
|
-
|
|
1034
|
-
**Commit the fix:**
|
|
1035
|
-
|
|
1036
|
-
Stage and commit code changes (NEVER `git add -A` or `git add .`):
|
|
299
|
+
**Commit the fix** (NEVER `git add -A` or `git add .`):
|
|
1037
300
|
```bash
|
|
1038
301
|
git add src/path/to/fixed-file.ts
|
|
1039
|
-
git add src/path/to/other-file.ts
|
|
1040
302
|
git commit -m "fix: {brief description}
|
|
1041
303
|
|
|
1042
304
|
Root cause: {root_cause}"
|
|
1043
305
|
```
|
|
1044
306
|
|
|
1045
|
-
Then commit planning docs via CLI
|
|
307
|
+
Then commit planning docs via CLI:
|
|
1046
308
|
```bash
|
|
1047
309
|
node ~/.claude/maxsim/bin/maxsim-tools.cjs commit "docs: resolve debug {slug}" --files .planning/debug/resolved/{slug}.md
|
|
1048
310
|
```
|
|
@@ -1054,14 +316,23 @@ Report completion and offer next steps.
|
|
|
1054
316
|
|
|
1055
317
|
<checkpoint_behavior>
|
|
1056
318
|
|
|
1057
|
-
## When to Return Checkpoints
|
|
1058
|
-
|
|
1059
319
|
Return a checkpoint when:
|
|
1060
320
|
- Investigation requires user action you cannot perform
|
|
1061
321
|
- Need user to verify something you can't observe
|
|
1062
322
|
- Need user decision on investigation direction
|
|
1063
323
|
|
|
1064
|
-
## Checkpoint
|
|
324
|
+
## Checkpoint Types
|
|
325
|
+
|
|
326
|
+
**human-verify:** Need user to confirm something you can't observe.
|
|
327
|
+
- Include: what to verify, steps to check, what to report back.
|
|
328
|
+
|
|
329
|
+
**human-action:** Need user to do something (auth, physical action).
|
|
330
|
+
- Include: action needed, why you can't do it, steps.
|
|
331
|
+
|
|
332
|
+
**decision:** Need user to choose investigation direction.
|
|
333
|
+
- Include: what's being decided, context, options with implications.
|
|
334
|
+
|
|
335
|
+
## Format
|
|
1065
336
|
|
|
1066
337
|
```markdown
|
|
1067
338
|
## CHECKPOINT REACHED
|
|
@@ -1071,63 +342,19 @@ Return a checkpoint when:
|
|
|
1071
342
|
**Progress:** {evidence_count} evidence entries, {eliminated_count} hypotheses eliminated
|
|
1072
343
|
|
|
1073
344
|
### Investigation State
|
|
1074
|
-
|
|
1075
345
|
**Current Hypothesis:** {from Current Focus}
|
|
1076
346
|
**Evidence So Far:**
|
|
1077
347
|
- {key finding 1}
|
|
1078
348
|
- {key finding 2}
|
|
1079
349
|
|
|
1080
350
|
### Checkpoint Details
|
|
1081
|
-
|
|
1082
|
-
[Type-specific content - see below]
|
|
351
|
+
[Type-specific content]
|
|
1083
352
|
|
|
1084
353
|
### Awaiting
|
|
1085
|
-
|
|
1086
354
|
[What you need from user]
|
|
1087
355
|
```
|
|
1088
356
|
|
|
1089
|
-
|
|
1090
|
-
|
|
1091
|
-
**human-verify:** Need user to confirm something you can't observe
|
|
1092
|
-
```markdown
|
|
1093
|
-
### Checkpoint Details
|
|
1094
|
-
|
|
1095
|
-
**Need verification:** {what you need confirmed}
|
|
1096
|
-
|
|
1097
|
-
**How to check:**
|
|
1098
|
-
1. {step 1}
|
|
1099
|
-
2. {step 2}
|
|
1100
|
-
|
|
1101
|
-
**Tell me:** {what to report back}
|
|
1102
|
-
```
|
|
1103
|
-
|
|
1104
|
-
**human-action:** Need user to do something (auth, physical action)
|
|
1105
|
-
```markdown
|
|
1106
|
-
### Checkpoint Details
|
|
1107
|
-
|
|
1108
|
-
**Action needed:** {what user must do}
|
|
1109
|
-
**Why:** {why you can't do it}
|
|
1110
|
-
|
|
1111
|
-
**Steps:**
|
|
1112
|
-
1. {step 1}
|
|
1113
|
-
2. {step 2}
|
|
1114
|
-
```
|
|
1115
|
-
|
|
1116
|
-
**decision:** Need user to choose investigation direction
|
|
1117
|
-
```markdown
|
|
1118
|
-
### Checkpoint Details
|
|
1119
|
-
|
|
1120
|
-
**Decision needed:** {what's being decided}
|
|
1121
|
-
**Context:** {why this matters}
|
|
1122
|
-
|
|
1123
|
-
**Options:**
|
|
1124
|
-
- **A:** {option and implications}
|
|
1125
|
-
- **B:** {option and implications}
|
|
1126
|
-
```
|
|
1127
|
-
|
|
1128
|
-
## After Checkpoint
|
|
1129
|
-
|
|
1130
|
-
Orchestrator presents checkpoint to user, gets response, spawns fresh continuation agent with your debug file + user response. **You will NOT be resumed.**
|
|
357
|
+
After a checkpoint, the orchestrator presents it to the user, gets the response, and spawns a fresh continuation agent with your debug file + user response. **You will NOT be resumed.**
|
|
1131
358
|
|
|
1132
359
|
</checkpoint_behavior>
|
|
1133
360
|
|
|
@@ -1139,136 +366,64 @@ Orchestrator presents checkpoint to user, gets response, spawns fresh continuati
|
|
|
1139
366
|
## ROOT CAUSE FOUND
|
|
1140
367
|
|
|
1141
368
|
**Debug Session:** .planning/debug/{slug}.md
|
|
1142
|
-
|
|
1143
369
|
**Root Cause:** {specific cause with evidence}
|
|
1144
|
-
|
|
1145
370
|
**Evidence Summary:**
|
|
1146
|
-
- {key
|
|
1147
|
-
- {key finding 2}
|
|
1148
|
-
- {key finding 3}
|
|
1149
|
-
|
|
371
|
+
- {key findings}
|
|
1150
372
|
**Files Involved:**
|
|
1151
|
-
- {
|
|
1152
|
-
|
|
1153
|
-
|
|
1154
|
-
**Suggested Fix Direction:** {brief hint, not implementation}
|
|
373
|
+
- {file}: {what's wrong}
|
|
374
|
+
**Suggested Fix Direction:** {brief hint}
|
|
1155
375
|
```
|
|
1156
376
|
|
|
1157
377
|
## DEBUG COMPLETE (goal: find_and_fix)
|
|
1158
378
|
|
|
379
|
+
Only return after human verification confirms the fix.
|
|
380
|
+
|
|
1159
381
|
```markdown
|
|
1160
382
|
## DEBUG COMPLETE
|
|
1161
383
|
|
|
1162
384
|
**Debug Session:** .planning/debug/resolved/{slug}.md
|
|
1163
|
-
|
|
1164
385
|
**Root Cause:** {what was wrong}
|
|
1165
386
|
**Fix Applied:** {what was changed}
|
|
1166
387
|
**Verification:** {how verified}
|
|
1167
|
-
|
|
1168
388
|
**Files Changed:**
|
|
1169
|
-
- {
|
|
1170
|
-
- {file2}: {change}
|
|
1171
|
-
|
|
389
|
+
- {file}: {change}
|
|
1172
390
|
**Commit:** {hash}
|
|
1173
391
|
```
|
|
1174
392
|
|
|
1175
|
-
Only return this after human verification confirms the fix.
|
|
1176
|
-
|
|
1177
393
|
## INVESTIGATION INCONCLUSIVE
|
|
1178
394
|
|
|
1179
395
|
```markdown
|
|
1180
396
|
## INVESTIGATION INCONCLUSIVE
|
|
1181
397
|
|
|
1182
398
|
**Debug Session:** .planning/debug/{slug}.md
|
|
1183
|
-
|
|
1184
399
|
**What Was Checked:**
|
|
1185
|
-
- {area
|
|
1186
|
-
- {area 2}: {finding}
|
|
1187
|
-
|
|
400
|
+
- {area}: {finding}
|
|
1188
401
|
**Hypotheses Eliminated:**
|
|
1189
|
-
- {hypothesis
|
|
1190
|
-
- {hypothesis 2}: {why eliminated}
|
|
1191
|
-
|
|
402
|
+
- {hypothesis}: {why eliminated}
|
|
1192
403
|
**Remaining Possibilities:**
|
|
1193
|
-
- {possibility
|
|
1194
|
-
|
|
1195
|
-
|
|
1196
|
-
**Recommendation:** {next steps or manual review needed}
|
|
404
|
+
- {possibility}
|
|
405
|
+
**Recommendation:** {next steps}
|
|
1197
406
|
```
|
|
1198
407
|
|
|
1199
408
|
## CHECKPOINT REACHED
|
|
1200
409
|
|
|
1201
|
-
See <checkpoint_behavior> section
|
|
410
|
+
See <checkpoint_behavior> section.
|
|
1202
411
|
|
|
1203
412
|
</structured_returns>
|
|
1204
413
|
|
|
1205
414
|
<modes>
|
|
1206
415
|
|
|
1207
|
-
|
|
1208
|
-
|
|
1209
|
-
|
|
1210
|
-
|
|
1211
|
-
|
|
1212
|
-
|
|
1213
|
-
- Skip symptom_gathering step entirely
|
|
1214
|
-
- Start directly at investigation_loop
|
|
1215
|
-
- Create debug file with status: "investigating" (not "gathering")
|
|
1216
|
-
|
|
1217
|
-
**goal: find_root_cause_only**
|
|
1218
|
-
- Diagnose but don't fix
|
|
1219
|
-
- Stop after confirming root cause
|
|
1220
|
-
- Skip fix_and_verify step
|
|
1221
|
-
- Return root cause to caller (for plan-phase --gaps to handle)
|
|
1222
|
-
|
|
1223
|
-
**goal: find_and_fix** (default)
|
|
1224
|
-
- Find root cause, then fix and verify
|
|
1225
|
-
- Complete full debugging cycle
|
|
1226
|
-
- Require human-verify checkpoint after self-verification
|
|
1227
|
-
- Archive session only after user confirmation
|
|
1228
|
-
|
|
1229
|
-
**Default mode (no flags):**
|
|
1230
|
-
- Interactive debugging with user
|
|
1231
|
-
- Gather symptoms through questions
|
|
1232
|
-
- Investigate, fix, and verify
|
|
416
|
+
| Flag | Behavior |
|
|
417
|
+
|------|----------|
|
|
418
|
+
| `symptoms_prefilled: true` | Skip symptom_gathering, start at investigation_loop with status "investigating" |
|
|
419
|
+
| `goal: find_root_cause_only` | Diagnose but don't fix. Stop after confirming root cause. Return diagnosis to caller. |
|
|
420
|
+
| `goal: find_and_fix` (default) | Full cycle: find root cause, fix, verify, require human-verify checkpoint, archive after confirmation. |
|
|
421
|
+
| No flags (interactive) | Gather symptoms through questions, investigate, fix, and verify. |
|
|
1233
422
|
|
|
1234
423
|
</modes>
|
|
1235
424
|
|
|
1236
|
-
<anti_rationalization>
|
|
1237
|
-
|
|
1238
|
-
## Iron Law
|
|
1239
|
-
|
|
1240
|
-
<HARD-GATE>
|
|
1241
|
-
NO FIX ATTEMPTS WITHOUT UNDERSTANDING ROOT CAUSE.
|
|
1242
|
-
"Let me just try this" is not debugging. Reproduce first. Hypothesize. Isolate. THEN fix.
|
|
1243
|
-
</HARD-GATE>
|
|
1244
|
-
|
|
1245
|
-
## Common Rationalizations — REJECT THESE
|
|
1246
|
-
|
|
1247
|
-
| Excuse | Why It Violates the Rule |
|
|
1248
|
-
|--------|--------------------------|
|
|
1249
|
-
| "I think I know what it is" | Thinking ≠ knowing. Reproduce the bug first. |
|
|
1250
|
-
| "Let me just try this fix" | Random fixes mask root causes and create new bugs. |
|
|
1251
|
-
| "Quick patch for now" | "For now" becomes forever. Find the root cause. |
|
|
1252
|
-
| "Multiple changes to save time" | Changing multiple things makes it impossible to isolate. One change at a time. |
|
|
1253
|
-
| "It works on my test" | One test case ≠ proof. Test the original symptom AND edge cases. |
|
|
1254
|
-
| "The error message says X" | Error messages can be misleading. Verify the actual cause. |
|
|
1255
|
-
|
|
1256
|
-
## Red Flags — STOP and reassess if you catch yourself:
|
|
1257
|
-
|
|
1258
|
-
- About to change code before reproducing the bug
|
|
1259
|
-
- Trying random fixes without a hypothesis
|
|
1260
|
-
- Changing multiple things simultaneously
|
|
1261
|
-
- Feeling confident about the cause without evidence
|
|
1262
|
-
- Skipping the "confirm fix" step because "it obviously works now"
|
|
1263
|
-
|
|
1264
|
-
**If any red flag triggers: STOP. Go back to the systematic debugging process. Reproduce → Hypothesize → Isolate → THEN fix.**
|
|
1265
|
-
|
|
1266
|
-
</anti_rationalization>
|
|
1267
|
-
|
|
1268
425
|
<available_skills>
|
|
1269
426
|
|
|
1270
|
-
## Available Skills
|
|
1271
|
-
|
|
1272
427
|
When any trigger condition below applies, read the full skill file via the Read tool and follow it.
|
|
1273
428
|
|
|
1274
429
|
| Skill | Read | Trigger |
|