@gitwhy-cli/whyspec 0.1.16 → 0.1.18

@@ -1,34 +1,65 @@
1
1
  ---
2
2
  name: whyspec-debug
3
- description: Debug with scientific method — gather symptoms, form falsifiable hypotheses, test systematically, verify root cause before fixing. Searches team knowledge first and captures the full investigation as persistent context.
3
+ description: Use when encountering any bug, test failure, or unexpected behavior before proposing fixes.
4
+ argument-hint: "<bug-description-or-change-name>"
4
5
  ---
5
6
 
6
7
  # WhySpec Debug — Scientific Investigation
7
8
 
8
9
  Debug systematically. No fix without root cause.
9
10
 
10
- This skill implements a structured debugging process that captures the full investigation
11
- as persistent context — symptoms, hypotheses, evidence, root cause, and fix rationale.
12
-
13
11
  The investigation is automatically saved as a context file when resolved.
14
12
 
15
13
  ---
16
14
 
17
- ## Purpose
15
+ **Input**: A bug description, error message, or change name for an existing debug session.
18
16
 
19
- Debugging is not guessing. This skill enforces:
17
+ ## Iron Law
20
18
 
21
- 1. **Team knowledge first**: search past reasoning before reinventing
22
- 2. **Scientific method** — falsifiable hypotheses tested with evidence
23
- 3. **Iron Law** — no fix is proposed until root cause is verified
24
- 4. **Persistent state** — debug.md survives context resets so investigations can resume
25
- 5. **Reasoning capture** — every investigation produces a context file for future developers
19
+ **NO FIX WITHOUT VERIFIED ROOT CAUSE.** Guessing at fixes creates more bugs than it solves. A wrong diagnosis leads to a wrong fix that masks the real problem.
26
20
 
27
- ---
21
+ ## Red Flags — If You're Thinking This, STOP
28
22
 
29
- **Input**: A bug description, error message, or change name for an existing debug session.
23
+ - "The fix is obvious, I'll just apply it" → If it's obvious, verification takes 10 seconds. Do it.
24
+ - "I'll try this fix and see if it works" → That's guess-and-check, not debugging. Form a hypothesis first.
25
+ - "This is too simple to need the full process" → Simple bugs have root causes too. The process is fast for simple bugs.
26
+ - "I already know what's wrong from the stack trace" → The stack trace shows WHERE, not WHY. Investigate the WHY.
27
+ - "Let me just add some logging and see" → Logging is a test for a hypothesis. What's your hypothesis?
30
28
 
31
- ---
29
+ ## Rationalization Table
30
+
31
+ | If you catch yourself thinking... | Reality |
32
+ |----------------------------------|---------|
33
+ | "The fix is obvious, skip investigation" | Obvious fixes have root causes too. Verify in 30 seconds. |
34
+ | "It's just a typo/config issue" | Confirm it. Read the code. Don't assume. |
35
+ | "I'll just try this quick fix first" | The first fix sets the pattern. Do it right from the start. |
36
+ | "No time for full investigation" | Systematic debugging is FASTER than guess-and-check thrashing. |
37
+ | "I already know what's wrong" | Then verification should take 10 seconds. Do it anyway. |
38
+ | "It works on my machine" | That's a symptom, not a diagnosis. Find the environmental difference. |
39
+ | "The error message tells me exactly what's wrong" | Error messages describe symptoms. Root causes are upstream. |
40
+ | "Let me just revert the last change" | Revert is a workaround, not a fix. Why did the change break things? |
41
+
42
+ ## Tools
43
+
44
+ | Tool | When to use | When NOT to use |
45
+ |------|------------|-----------------|
46
+ | **Grep** | Search for error messages, function names, patterns in code | Don't grep before forming hypotheses — symptoms first |
47
+ | **Read** | Read suspect files, stack trace locations, config files | Don't read unrelated files — stay focused on hypotheses |
48
+ | **Bash** | Run tests to reproduce, `git log`/`git diff` to find trigger commits, execute test commands for hypotheses | Don't run destructive commands or modify production data |
49
+ | **Glob** | Find files by pattern when error references unknown paths | Don't glob the entire repo — scope to suspect areas |
50
+ | **Write** | Write/update debug.md (investigation state) and ctx_<id>.md (final capture) | Don't write fix code until root cause is verified |
51
+ | **WebSearch** | ONLY for: error messages with no codebase matches, library changelogs, CVE lookups | Never search web for: "how to debug X", generic solutions, or before investigating the codebase |
52
+ | **AskUserQuestion** | Escalation ONLY — after 2 rounds of failed hypotheses, or when root cause is outside codebase | Don't ask before investigating. The codebase has the answers. |
53
+
54
+ ### Codebase First, Web Never-First
55
+
56
+ Read the codebase BEFORE considering web search. Web search is justified ONLY when:
57
+ - Error message has zero matches in the codebase or git history
58
+ - Library version changelog needed (breaking changes between versions)
59
+ - Security advisory lookup (CVEs)
60
+ - Stack trace references internal framework code you can't read locally
61
+
62
+ Never search the web for: "how to fix X", architecture decisions, or generic debugging advice.
32
63
 
33
64
  ## Step 0: Team Knowledge Search
34
65
 
@@ -39,15 +70,12 @@ whyspec search --json "<keywords from bug description>"
39
70
  ```
40
71
 
41
72
  If results exist:
42
- - Display: "Found N past contexts in this domain"
43
- - List relevant titles and key decisions from past investigations
44
- - Note any past decisions that might inform the current bug
73
+ - Display relevant titles and key decisions from past investigations
74
+ - Note past decisions that might inform the current bug (e.g., "The auth middleware was deliberately changed in add-auth — this might explain the current session bug")
45
75
 
46
76
  If no results: note "No prior context found" and continue.
47
77
 
48
- This step takes seconds. It prevents re-investigating solved problems and surfaces past decisions that may explain the current behavior.
49
-
50
- ---
78
+ This takes seconds. It prevents re-investigating solved problems.
51
79
 
52
80
  ## Step 1: Symptoms Gathering
53
81
 
@@ -58,11 +86,11 @@ whyspec debug --json "<bug-name>"
58
86
  ```
59
87
 
60
88
  Parse the JSON response:
61
- - `path`: Debug session directory (e.g., `.gitwhy/changes/<bug-name>/`)
89
+ - `path`: Debug session directory
62
90
  - `template`: debug.md template structure
63
- - `related_contexts`: Past contexts in the same domain (from Step 0)
91
+ - `related_contexts`: Past contexts in the same domain
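A response like this can be validated before it is used. The sketch below is only a guess at the shape: the three field names come from the list above, while the field types and the helper name `parseDebugResponse` are assumptions made for illustration, not the CLI's documented schema.

```typescript
// Hypothetical shape of the `whyspec debug --json` response.
// Only the field names are taken from the text; the types are assumed.
interface DebugResponse {
  path: string;               // debug session directory
  template: string;           // debug.md template structure
  related_contexts: string[]; // past contexts in the same domain
}

// Parse raw CLI output and fail loudly if an expected field is missing.
function parseDebugResponse(raw: string): DebugResponse {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("debug response is not a JSON object");
  }
  const obj = data as Record<string, unknown>;
  if (typeof obj.path !== "string") throw new Error("missing field: path");
  if (typeof obj.template !== "string") throw new Error("missing field: template");
  // Treat a missing or malformed list as "no prior context found".
  const related = Array.isArray(obj.related_contexts) ? obj.related_contexts : [];
  return {
    path: obj.path,
    template: obj.template,
    related_contexts: related.filter((c): c is string => typeof c === "string"),
  };
}
```

Failing on a missing `path` early keeps a later write to `<path>/debug.md` from silently landing in the wrong place.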
64
92
 
65
- **Gather symptoms** — use **AskUserQuestion** if the user hasn't provided enough detail, or investigate the codebase directly:
93
+ **Gather symptoms** — investigate the codebase directly. Only ask the user if you genuinely can't find the information yourself:
66
94
 
67
95
  | Symptom | What to capture |
68
96
  |---------|----------------|
@@ -73,332 +101,188 @@ Parse the JSON response:
73
101
  | Timeline | When it started, what changed recently |
74
102
  | Scope | Who is affected, how often, which environments |
75
103
 
76
- **Write debug.md immediately** — this file IS the investigation state:
77
-
78
- ```markdown
79
- # Debug: <bug-name>
80
-
81
- ## Status: INVESTIGATING
82
-
104
+ <examples>
105
+ <good>
83
106
  ## Symptoms
84
-
85
- **Expected:** [what should happen]
86
- **Actual:** [what actually happens]
87
- **Error:**
107
+ **Expected:** POST /api/users returns 201 with user object
108
+ **Actual:** Returns 500 with "Cannot read properties of undefined (reading 'email')"
109
+ **Error:** TypeError at src/handlers/users.ts:47 — `req.body.email` is undefined
110
+ **Stack trace:**
88
111
  ```
89
- [exact error message or stack trace]
112
+ TypeError: Cannot read properties of undefined (reading 'email')
113
+ at createUser (src/handlers/users.ts:47:28)
114
+ at Layer.handle (node_modules/express/lib/router/layer.js:95:5)
90
115
  ```
91
- **Reproduction:** [minimal steps to reproduce]
92
- **Timeline:** [when it started, what changed recently]
93
- **Scope:** [who/what is affected, frequency]
94
-
95
- ## Related Past Contexts
96
-
97
- [Results from Step 0, or "None found"]
98
- [If found: relevant decisions, reasoning excerpts]
99
-
100
- ## Hypotheses
101
-
102
- [Populated in Step 2]
103
-
104
- ## Evidence Log
105
-
106
- [Populated in Step 3]
107
-
108
- ## Root Cause
109
-
110
- [Populated in Step 4]
111
-
112
- ## Fix
113
-
114
- [Populated in Step 5]
115
-
116
- ## Prevention
117
-
118
- [Populated in Step 5]
119
- ```
120
-
121
- **CRITICAL**: Write debug.md to `<path>/debug.md` NOW, after this step. It persists across context resets and enables resuming the investigation.
116
+ **Reproduction:** `curl -X POST localhost:3000/api/users -H "Content-Type: application/json" -d '{"name":"test"}'`
117
+ **Timeline:** Started after commit a1b2c3d (merged express-validator upgrade, Apr 8)
118
+ **Scope:** All POST endpoints with body parsing, not just /users. GET endpoints unaffected.
119
+ Why good: Exact error with file:line, exact reproduction command,
120
+ identified the trigger commit, and noticed the scope is broader than reported.
121
+ </good>
122
+
123
+ <bad>
124
+ ## Symptoms
125
+ **Expected:** API should work
126
+ **Actual:** Getting 500 errors
127
+ **Error:** Server error
128
+ Why bad: Vague symptoms lead to vague hypotheses. No file, no line,
129
+ no reproduction steps, no timeline.
130
+ </bad>
131
+ </examples>
122
132
 
123
- ---
133
+ **Write debug.md immediately** to `<path>/debug.md`. It persists across context resets.
124
134
 
125
135
  ## Step 2: Hypothesis Formation
126
136
 
127
- Form **3 or more falsifiable hypotheses**. Each must include a specific claim, a concrete test, and a way to disprove it:
128
-
129
- ```markdown
130
- ## Hypotheses
137
+ Form **3 or more falsifiable hypotheses**:
131
138
 
132
- ### H1: [Specific, testable claim about the root cause]
133
- - **Test:** [Concrete action — a command to run, a log to check, a condition to verify]
134
- - **Disproof:** [What evidence would prove this hypothesis WRONG]
139
+ <examples>
140
+ <good>
141
+ ### H1: express-validator upgrade broke body parsing middleware order
142
+ - **Test:** `git diff a1b2c3d -- src/app.ts` — check if middleware registration order changed
143
+ - **Disproof:** If middleware order is identical pre/post upgrade, this is wrong
135
144
  - **Status:** UNTESTED
136
- - **Likelihood:** HIGH / MEDIUM / LOW
145
+ - **Likelihood:** HIGH (scope matches all POST endpoints affected, timing matches upgrade)
137
146
 
138
- ### H2: [Different claim: consider a different subsystem or mechanism]
139
- - **Test:** [Concrete action]
140
- - **Disproof:** [What would disprove it]
147
+ ### H2: express-validator v7 changed req.body population timing
148
+ - **Test:** Add `console.log(req.body)` before and after validation middleware, compare output
149
+ - **Disproof:** If req.body is populated before validation in both versions, this is wrong
141
150
  - **Status:** UNTESTED
142
- - **Likelihood:** HIGH / MEDIUM / LOW
151
+ - **Likelihood:** MEDIUM (v7 changelog mentions "async validation" changes)
143
152
 
144
- ### H3: [Third claim: consider edge cases, race conditions, configuration]
145
- - **Test:** [Concrete action]
146
- - **Disproof:** [What would disprove it]
153
+ ### H3: Content-Type header handling changed in the upgrade
154
+ - **Test:** Send request with `Content-Type: application/json; charset=utf-8` — does it parse?
155
+ - **Disproof:** If extended Content-Type works the same in both versions, this is wrong
147
156
  - **Status:** UNTESTED
148
- - **Likelihood:** HIGH / MEDIUM / LOW
149
- ```
150
-
151
- **Hypothesis quality rules:**
152
- - Each hypothesis must be **specific enough to test** — "something is wrong with auth" is not a hypothesis
153
- - Each hypothesis must be **falsifiable** — there must be evidence that could prove it wrong
154
- - Hypotheses should target **different root causes** — not three variations of the same idea
155
- - **Use past contexts**: if Step 0 found related reasoning, let those decisions inform your hypotheses. A past choice ("we used X because of Y") might explain the current behavior.
156
-
157
- Rank by likelihood. Test the most likely first.
158
-
159
- Update debug.md with all hypotheses before proceeding.
160
-
161
- ---
157
+ - **Likelihood:** LOW (but worth testing — charset handling has caused issues before)
158
+ Why good: Each hypothesis is specific, testable, and falsifiable.
159
+ They target different root causes. Likelihood is justified with evidence.
160
+ </good>
161
+
162
+ <bad>
163
+ ### H1: Something is wrong with the API
164
+ - **Test:** Check the API
165
+ - **Disproof:** If the API works
166
+ ### H2: Maybe a dependency issue
167
+ - **Test:** Check dependencies
168
+ Why bad: Not specific enough to test. "Check the API" is not a concrete action.
169
+ </bad>
170
+ </examples>
171
+
172
+ Rank by likelihood. Test the most likely first. Update debug.md before proceeding.
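The ranking rule can be sketched as a small data model. The field names mirror the hypothesis bullets above (test, disproof, status, likelihood); the `Hypothesis` interface, the `id` field, and the `testingOrder` helper are hypothetical names chosen for this illustration and are not part of whyspec.

```typescript
type Likelihood = "HIGH" | "MEDIUM" | "LOW";
type Status = "UNTESTED" | "CONFIRMED" | "DISPROVED" | "INCONCLUSIVE";

// One entry of the Hypotheses section in debug.md (assumed representation).
interface Hypothesis {
  id: string;        // e.g. "H1"
  claim: string;     // specific, testable statement about the root cause
  test: string;      // concrete action to run
  disproof: string;  // evidence that would prove the claim wrong
  status: Status;
  likelihood: Likelihood;
}

const LIKELIHOOD_ORDER: Record<Likelihood, number> = { HIGH: 0, MEDIUM: 1, LOW: 2 };

// Return the untested hypotheses, most likely first.
// filter() copies the array, so the caller's list is not reordered.
function testingOrder(hypotheses: Hypothesis[]): Hypothesis[] {
  return hypotheses
    .filter((h) => h.status === "UNTESTED")
    .sort((a, b) => LIKELIHOOD_ORDER[a.likelihood] - LIKELIHOOD_ORDER[b.likelihood]);
}
```

Keeping `disproof` as a required field makes it harder to write down a hypothesis that cannot be falsified.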
162
173
 
163
174
  ## Step 3: Hypothesis Testing
164
175
 
165
- Test each hypothesis **one at a time, sequentially**. For each:
176
+ Test each hypothesis **one at a time, sequentially**:
166
177
 
167
- 1. **Execute the test** described in the hypothesis
168
- 2. **Record evidence** — exact output, logs, observed behavior
169
- 3. **Evaluate**: does the evidence support, refute, or leave the hypothesis inconclusive?
170
- 4. **Update status**: `CONFIRMED`, `DISPROVED`, or `INCONCLUSIVE`
171
- 5. **Update debug.md immediately** with findings
178
+ 1. Execute the test described in the hypothesis
179
+ 2. Record evidence — exact output, logs, observed behavior
180
+ 3. Evaluate — support, refute, or inconclusive?
181
+ 4. Update status: `CONFIRMED`, `DISPROVED`, or `INCONCLUSIVE`
182
+ 5. Update debug.md immediately
172
183
 
173
- ```markdown
174
- ## Evidence Log
175
-
176
- ### H1: [claim]: DISPROVED
177
- **Test performed:** [exact command or action taken]
178
- **Evidence:**
179
- ```
180
- [exact output, log entries, or observations]
181
- ```
182
- **Conclusion:** [why this hypothesis is disproved — what the evidence shows]
183
-
184
- ### H2: [claim] — CONFIRMED
185
- **Test performed:** [exact command or action taken]
186
- **Evidence:**
187
- ```
188
- [exact output showing the root cause]
189
- ```
190
- **Conclusion:** [why this is confirmed — the causal link between evidence and symptom]
191
- ```
192
-
193
- **Testing rules:**
184
+ **Rules:**
185
+ - **One hypothesis at a time** — never test multiple simultaneously
186
+ - **Max 3 tests per hypothesis** — if inconclusive after 3, mark INCONCLUSIVE and move on
187
+ - **Preserve the crime scene**: record current state before modifying suspect code
188
+ - **Update debug.md after each test**: don't batch
194
189
 
195
- - **One hypothesis at a time** — never test multiple simultaneously. Confounded evidence is useless.
196
- - **Max 3 tests per hypothesis**: if evidence is inconclusive after 3 attempts, mark INCONCLUSIVE and move to the next.
197
- - **Preserve the crime scene**: before modifying suspect code, record its current state in the evidence log.
198
- - **Update debug.md after each test** — don't batch. Each test result is written immediately.
190
+ If ALL hypotheses are disproved:
191
+ - Form new hypotheses based on what evidence revealed
192
+ - If stuck after a second round, escalate to the user
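The testing rules amount to a small state machine. The sketch below is one possible reading, under two simplifying assumptions: a single supporting or refuting test settles a hypothesis, and a round counts as exhausted when every hypothesis is disproved or inconclusive. All type and function names are hypothetical.

```typescript
type Verdict = "supports" | "refutes" | "inconclusive";
type TestStatus = "UNTESTED" | "CONFIRMED" | "DISPROVED" | "INCONCLUSIVE";
type NextAction = "verify-root-cause" | "keep-testing" | "new-hypotheses" | "escalate";

const MAX_TESTS_PER_HYPOTHESIS = 3; // "Max 3 tests per hypothesis"
const MAX_ROUNDS = 2;               // escalate when stuck after a second round

interface TrackedHypothesis {
  claim: string;
  status: TestStatus;
  testsRun: number;
}

// Apply one test result. Returns a new object; the input is not mutated.
function recordTest(h: TrackedHypothesis, verdict: Verdict): TrackedHypothesis {
  if (h.status !== "UNTESTED") {
    throw new Error(`hypothesis already resolved as ${h.status}`);
  }
  const testsRun = h.testsRun + 1;
  if (verdict === "supports") return { ...h, testsRun, status: "CONFIRMED" };
  if (verdict === "refutes") return { ...h, testsRun, status: "DISPROVED" };
  // Inconclusive: keep testing until the cap is reached, then move on.
  const status: TestStatus =
    testsRun >= MAX_TESTS_PER_HYPOTHESIS ? "INCONCLUSIVE" : "UNTESTED";
  return { ...h, testsRun, status };
}

// Decide what to do after the latest test in round `round` (1-based).
function nextAction(round: number, hypotheses: TrackedHypothesis[]): NextAction {
  if (hypotheses.some((h) => h.status === "CONFIRMED")) return "verify-root-cause";
  if (hypotheses.some((h) => h.status === "UNTESTED")) return "keep-testing";
  return round < MAX_ROUNDS ? "new-hypotheses" : "escalate";
}
```

Because `recordTest` handles one hypothesis and one verdict per call, it also encodes the rule that hypotheses are tested one at a time.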
199
193
 
200
- If ALL hypotheses are disproved or inconclusive:
201
- - Form new hypotheses based on what the evidence revealed
202
- - If still stuck after a second round, escalate to the user (see Escalation Rules)
203
-
204
- ---
194
+ ## Step 4: Root Cause Verification
205
195
 
206
- ## Step 4: Root Cause Verification — The Iron Law
196
+ Before proposing ANY fix:
207
197
 
208
- **No fix without verified root cause.**
209
-
210
- Before proposing ANY fix, you must:
211
-
212
- 1. **State the root cause clearly and specifically**
198
+ 1. **State the root cause** clearly and specifically
213
199
  2. **Explain the causal chain**: [trigger] → [mechanism] → [symptom]
214
- 3. **Verify predictive power**: can you predict the symptom from the cause? Can you reliably reproduce it?
215
-
216
- Update debug.md:
200
+ 3. **Verify predictive power**: can you predict the symptom from the cause?
217
201
 
218
202
  ```markdown
219
203
  ## Root Cause
220
-
221
- **Cause:** [precise description of what is wrong — not symptoms, the actual defect]
222
- **Causal chain:** [trigger event] → [mechanism/code path] → [observed symptom]
223
- **Verified by:** [how the causal link was confirmed — which test, which evidence]
224
- **Confidence:** HIGH / MEDIUM / LOW
204
+ **Cause:** express-validator v7 switched to async validation, body parsing
205
+ now completes AFTER route handler starts executing
206
+ **Causal chain:** express-validator upgrade → async body parsing → req.body
207
+ undefined when handler reads it synchronously → TypeError
208
+ **Verified by:** Adding `await` before validation resolved the issue in test
209
+ **Confidence:** HIGH
225
210
  ```
226
211
 
227
- **Confidence thresholds:**
228
-
229
212
  | Confidence | Criteria | Action |
230
213
  |-----------|----------|--------|
231
- | HIGH | Reproduction is reliable, causal chain is clear, evidence is unambiguous | Proceed to fix |
232
- | MEDIUM | Strong evidence but some uncertainty remains | Proceed with caution, note risks |
233
- | LOW | Circumstantial evidence, cannot reliably reproduce | **Escalate to user** — do NOT fix |
234
-
235
- If confidence is LOW:
236
- - Present all evidence gathered to the user via **AskUserQuestion**
237
- - Show: what was tested, what was found, what remains uncertain
238
- - Ask for additional context, access, or direction
239
- - **Do NOT guess at a fix**
240
-
241
- Update debug.md status: `## Status: ROOT CAUSE IDENTIFIED`
242
-
243
- ---
214
+ | HIGH | Reliable reproduction, clear causal chain | Proceed to fix |
215
+ | MEDIUM | Strong evidence, some uncertainty | Proceed with caution, note risks |
216
+ | LOW | Circumstantial evidence | **Escalate — do NOT fix** |
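The confidence table can be read as a gate in front of the fix step. This is a minimal sketch under assumptions: the `RootCause` fields mirror the Root Cause block (cause, causal chain, verified by, confidence), and the `fixGate` name and the three-link minimum for the causal chain are illustrative choices, not whyspec behaviour.

```typescript
type Confidence = "HIGH" | "MEDIUM" | "LOW";

// Assumed in-memory form of the "## Root Cause" section.
interface RootCause {
  cause: string;
  causalChain: string[]; // trigger, mechanism, symptom (in that order)
  verifiedBy: string;    // which test or evidence confirmed the link
  confidence: Confidence;
}

interface GateResult {
  proceed: boolean;
  note: string;
}

// Decide whether a fix may be proposed for this root cause.
function fixGate(rc: RootCause): GateResult {
  // Iron Law: an unverified or incomplete causal chain never passes.
  if (rc.verifiedBy.trim() === "" || rc.causalChain.length < 3) {
    return { proceed: false, note: "root cause not verified" };
  }
  switch (rc.confidence) {
    case "HIGH":
      return { proceed: true, note: "proceed to fix" };
    case "MEDIUM":
      return { proceed: true, note: "proceed with caution, note risks" };
    case "LOW":
      return { proceed: false, note: "escalate, do not fix" };
  }
}
```

The verification check runs before the confidence switch, so a claimed HIGH confidence without recorded evidence is still blocked.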
244
217
 
245
218
  ## Step 5: Fix + Auto-Capture
246
219
 
247
- Once root cause is verified with HIGH or MEDIUM confidence:
248
-
249
- ### 5a. Implement the fix
250
-
251
- - Make the **minimal, targeted change** that addresses the root cause
252
- - Don't refactor surrounding code — fix the bug, nothing more
253
- - Verify the fix resolves the symptom (run the reproduction steps again)
254
-
255
- ### 5b. Update debug.md
256
-
257
- ```markdown
258
- ## Fix
259
-
260
- **Change:** [what was modified and how]
261
- **Files:** [files changed]
262
- **Verification:** [how the fix was confirmed — test results, manual reproduction]
263
-
264
- ## Prevention
265
-
266
- **How to prevent recurrence:**
267
- - [Concrete preventive measure — e.g., "add input validation for X"]
268
- - [Process improvement, e.g., "add test case for this edge case"]
269
- - [Monitoring — e.g., "add alert for this error pattern"]
270
- ```
271
-
272
- Update debug.md status: `## Status: RESOLVED`
273
-
274
- ### 5c. Commit the fix
275
-
276
- Commit atomically with a clear message referencing the root cause.
277
-
278
- ### 5d. Auto-capture reasoning
279
-
280
- Generate a context file to preserve the full investigation:
281
-
282
- ```bash
283
- whyspec capture --json "<bug-name>"
284
- ```
285
-
286
- Write `<path>/ctx_<id>.md` in SaaS XML format:
287
-
288
- ```xml
289
- <context>
290
- <title>Debug: [short description — bug and fix]</title>
291
-
292
- <story>
293
- Phase 1 — Symptoms:
294
- [What was observed, when it started, reproduction steps]
295
-
296
- Phase 2 — Investigation:
297
- [Hypotheses formed, tests performed, evidence gathered]
298
- [Which hypotheses were disproved and why]
299
-
300
- Phase 3 — Root Cause:
301
- [The actual defect, causal chain, how it was verified]
302
-
303
- Phase 4 — Fix:
304
- [What was changed, how the fix was confirmed]
305
- </story>
306
-
307
- <reasoning>
308
- Why the bug existed and why this fix is correct.
309
-
310
- <decisions>
311
- - [Fix approach chosen] — [rationale for this approach]
312
- </decisions>
313
-
314
- <rejected>
315
- - [Alternative fix considered] — [why it was rejected]
316
- - [Disproved hypothesis] — [what evidence ruled it out]
317
- </rejected>
318
-
319
- <tradeoffs>
320
- - [Any trade-offs in the fix — scope, performance, complexity]
321
- </tradeoffs>
322
- </reasoning>
323
-
324
- <files>
325
- [Files changed to fix the bug]
326
- </files>
327
-
328
- <verification>[Test results confirming the fix]</verification>
329
- <risks>[Potential side effects, related areas to watch]</risks>
330
- </context>
331
- ```
332
-
333
- ### 5e. Show summary
334
-
335
- ```
336
- ## Debug Complete: <bug-name>
337
-
338
- Root cause: [one-line summary]
339
- Fix: [what was changed]
340
- Context: ctx_<id>.md
341
-
342
- Investigation:
343
- Hypotheses tested: N (M confirmed, P disproved)
344
- Evidence entries: N
345
- Past contexts referenced: N
346
-
347
- View full investigation: /whyspec-show <bug-name>
348
- ```
349
-
350
- ---
220
+ Once root cause is verified (HIGH or MEDIUM confidence):
221
+
222
+ 1. **Implement the minimal fix** — fix the bug, don't refactor surrounding code
223
+ 2. **Verify the fix** — run reproduction steps again, run test suite
224
+ 3. **Update debug.md** with fix details and prevention measures:
225
+ ```markdown
226
+ ## Fix
227
+ **Change:** Added `await` before express-validator `validationResult()` calls
228
+ **Files:** src/middleware/validate.ts (3 lines changed)
229
+ **Verification:** `npm test` — 47 pass, 0 fail. Manual curl test returns 201.
230
+
231
+ ## Prevention
232
+ - Added eslint rule for async validation middleware
233
+ - Added integration test: POST with body → verify req.body populated
234
+ - Updated UPGRADE.md with express-validator v7 migration notes
235
+ ```
236
+ 4. **Commit** atomically with root cause in message
237
+ 5. **Auto-capture** reasoning:
238
+ ```bash
239
+ whyspec capture --json "<bug-name>"
240
+ ```
241
+ Write `<path>/ctx_<id>.md` with the full investigation story.
242
+
243
+ 6. **Show summary:**
244
+ ```
245
+ ## Debug Complete: <bug-name>
246
+
247
+ Root cause: [one-line summary]
248
+ Fix: [what was changed]
249
+ Context: ctx_<id>.md
250
+
251
+ Investigation:
252
+ Hypotheses tested: N (M confirmed, P disproved)
253
+ Evidence entries: N
254
+ Past contexts referenced: N
255
+
256
+ View full investigation: /whyspec:show <bug-name>
257
+ ```
351
258
 
352
259
  ## Resuming an Investigation
353
260
 
354
- If the user invokes `/whyspec-debug` and a `debug.md` already exists for that change:
355
-
356
- 1. **Read debug.md** from the change folder
357
- 2. **Check the Status field** and resume from the appropriate step:
358
-
359
- | Status | Resume from |
360
- |--------|-------------|
361
- | `INVESTIGATING` | Last completed step — check which sections are populated |
362
- | `ROOT CAUSE IDENTIFIED` | Step 5 — implement the fix |
363
- | `RESOLVED` | Investigation is complete — show summary |
364
-
365
- 3. **Announce**: "Resuming debug session: <name> — Status: <status>"
366
- 4. **Show progress**: display completed sections and what remains
367
-
368
- This is why writing debug.md incrementally is critical — it's the contract for resumability.
369
-
370
- ---
261
+ If debug.md already exists for a change:
262
+ 1. Read debug.md
263
+ 2. Check Status and resume from the appropriate step
264
+ 3. Announce: "Resuming debug session: <name>, Status: <status>"
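Resuming could be sketched as a lookup from the recorded status to the next step. The three status values and the `## Status:` heading are taken from the earlier version of this skill shown in the removed lines; whether 0.1.18 still writes them in that form is an assumption, and `readStatus` and `RESUME_FROM` are hypothetical names.

```typescript
type SessionStatus = "INVESTIGATING" | "ROOT CAUSE IDENTIFIED" | "RESOLVED";

// Where to pick up for each status (wording adapted from the removed resume table).
const RESUME_FROM: Record<SessionStatus, string> = {
  "INVESTIGATING": "last completed step (check which sections are populated)",
  "ROOT CAUSE IDENTIFIED": "Step 5: implement the fix",
  "RESOLVED": "investigation complete: show summary",
};

// Extract the status from a line such as "## Status: INVESTIGATING".
// Returns null when the line is absent or the value is not recognised.
function readStatus(debugMd: string): SessionStatus | null {
  const match = debugMd.match(/^## Status:\s*(.+?)\s*$/m);
  if (!match) return null;
  const value = match[1].toUpperCase();
  const known = Object.keys(RESUME_FROM) as SessionStatus[];
  return known.find((s) => s === value) ?? null;
}
```

A `null` result is a reasonable cue to treat the session as new rather than guess where it stopped.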
371
265
 
372
266
  ## Escalation Rules
373
267
 
374
- Escalate to the user (via **AskUserQuestion**) when:
375
-
376
- | Trigger | What to present |
377
- |---------|----------------|
378
- | All hypotheses disproved (2 rounds) | Full evidence summary, ask for new direction |
379
- | Cannot reproduce | Symptoms documented, ask for environment details or access |
380
- | Root cause outside codebase | Findings documented, suggest infrastructure/environment investigation |
381
- | Root cause confidence is LOW | Evidence summary, explain uncertainty, ask for guidance |
382
- | Fix would introduce significant risk | Proposed fix, risk assessment, ask for approval |
268
+ | Trigger | Action |
269
+ |---------|--------|
270
+ | All hypotheses disproved (2 rounds) | Present full evidence, ask for new direction |
271
+ | Cannot reproduce | Document symptoms, ask for environment details |
272
+ | Root cause outside codebase | Document findings, suggest infrastructure investigation |
273
+ | Confidence is LOW | Present evidence, explain uncertainty, do NOT fix |
274
+ | Fix introduces significant risk | Present fix + risk assessment, ask for approval |
275
+ | 3 failed fix attempts | Stop. Present what was tried. Ask for help. |
383
276
 
384
- When escalating, always present:
385
- - What was tested and what was found
386
- - What remains uncertain
387
- - A specific question or request for the user
388
-
389
- **Never silently give up.** If you're stuck, say so with evidence.
390
-
391
- ---
277
+ **Never silently give up.** If stuck, present evidence and ask.
392
278
 
393
279
  ## Guardrails
394
280
 
395
- - **No fix without root cause** — the Iron Law is non-negotiable. Never propose a fix based on a guess, a hunch, or pattern-matching without evidence.
396
- - **Max 3 tests per hypothesis** — if evidence is inconclusive after 3 attempts, mark INCONCLUSIVE and form new hypotheses or escalate.
397
- - **Always capture reasoning** — every debug session MUST produce both `debug.md` AND `ctx_<id>.md`. No silent fixes. The investigation is as valuable as the fix.
398
- - **Write debug.md incrementally** — update after EVERY step, not at the end. This is the resumability contract. If context resets, the investigation survives.
399
- - **Don't skip team knowledge** — always run Step 0, even for "obvious" bugs. Past contexts prevent repeated mistakes and surface relevant decisions.
400
- - **Don't guess at root cause** — if uncertain after investigation, escalate. Wrong diagnosis leads to wrong fixes that mask the real problem.
401
- - **Test one hypothesis at a time** — never test multiple simultaneously. Sequential testing produces clean evidence.
402
- - **Preserve evidence** — before modifying suspect code, record its current state. Don't destroy the crime scene.
403
- - **Minimal fixes only** — fix the bug, don't refactor. Keep the diff focused on the root cause.
404
- - **Don't skip prevention** — after fixing, always document how to prevent recurrence. Future developers need this.
281
+ - **No fix without root cause** — the Iron Law is non-negotiable
282
+ - **Max 3 tests per hypothesis** — escalate if inconclusive
283
+ - **Always capture reasoning** — every debug session produces both debug.md AND ctx_<id>.md
284
+ - **Write debug.md incrementally** — update after EVERY step, not at the end
285
+ - **Don't skip team knowledge** — always run Step 0
286
+ - **Test one hypothesis at a time** — sequential testing produces clean evidence
287
+ - **Preserve evidence** — record state before modifying suspect code
288
+ - **Minimal fixes only** — fix the bug, don't refactor