@gitwhy-cli/whyspec 0.1.17 → 0.1.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,241 @@
+ ---
+ name: whyspec-capture
+ description: Use after coding to preserve reasoning — resolves the Decision Bridge with actual outcomes.
+ argument-hint: "[change-name]"
+ ---
+
+ Capture reasoning — create a context file that resolves the Decision Bridge and preserves the full story.
+
+ View the complete story with `/whyspec:show`
+
+ ---
+
+ **Input**: Optionally specify a change name. If omitted, auto-detect the most recently executed change.
+
+ ## Iron Law
+
+ **CAPTURE REASONING, NOT SUMMARIES.** "We used Redis" is a summary. "We chose Redis over in-memory because the app runs on 3 instances and rate limits must be shared — in-memory would let users bypass limits by hitting different instances" is reasoning. Every decision needs the WHY.
+
+ ## Red Flags — If You're Thinking This, STOP
+
+ - "The implementation was straightforward, not much to capture" → Every implementation has decisions. Find them.
+ - "I'll just summarize what files changed" → That's a git log, not reasoning. Capture WHY, not WHAT.
+ - "There were no surprises" → Look harder. Did you deviate from the plan at all? Change any approach? That's a surprise.
+ - "The decisions are obvious from the code" → They're obvious NOW. In 6 months, they won't be. Write the rationale.
+
+ ## Steps
+
+ 1. **Select the change**
+
+ If ARGUMENTS provides a name, use it. Otherwise:
+ - Auto-detect the most recently executed change (look for changes with completed tasks)
+ - If ambiguous, run `whyspec list --json` and let the user select
+
+ 2. **Read plan files for Decision Bridge mapping**
+
+ Read these files from the change folder — **required** before generating context:
+ - `<path>/intent.md` — the stated intent, "Decisions to Make" checkboxes
+ - `<path>/design.md` — the approach, "Questions to Resolve" items
+
+ Extract and track:
+ - Every `- [ ]` or `- [x]` item under "Decisions to Make" → each MUST be resolved in the context
+ - Every item under "Questions to Resolve" → each MUST be answered
+ - The stated constraints and success criteria → compare against what actually happened
+
+ 3. **Get capture data from CLI**
+
+ ```bash
+ whyspec capture --json "<name>"
+ ```
+
+ Parse the JSON response:
+ - `template`: Context file template
+ - `commits`: Commits associated with this change (auto-detected from git)
+ - `files_changed`: Files modified during implementation (auto-detected)
+ - `decisions_to_make`: Decision checkboxes extracted from plan files
+ - `change_name`: The change name for the header
+
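To make step 3 concrete, here is a minimal parsing sketch. The five field names come from the list above; the exact payload shape, and the `parse_capture`/`run_capture` helpers, are illustrative assumptions, not part of the whyspec CLI:

```python
import json
import subprocess

# Field names from the list above; the payload shape is an assumption.
REQUIRED_KEYS = {"template", "commits", "files_changed", "decisions_to_make", "change_name"}

def parse_capture(payload: str) -> dict:
    """Validate the JSON printed by `whyspec capture --json <name>`."""
    data = json.loads(payload)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"capture output missing keys: {sorted(missing)}")
    return data

def run_capture(name: str) -> dict:
    # Requires the whyspec binary on PATH; not exercised below.
    out = subprocess.run(
        ["whyspec", "capture", "--json", name],
        capture_output=True, text=True, check=True,
    )
    return parse_capture(out.stdout)

# Canned payload standing in for real CLI output:
sample = json.dumps({
    "template": "# Context\n",
    "commits": ["a1b2c3d"],
    "files_changed": ["src/lib/redis.ts"],
    "decisions_to_make": ["- [ ] Rate limit storage"],
    "change_name": "add-rate-limiting",
})
print(parse_capture(sample)["change_name"])
```

Failing loudly on a missing key is the point: a truncated or malformed CLI response should stop the capture, not silently produce an incomplete context file.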
+ 4. **Populate the Decision Bridge**
+
+ This is the core of the capture. Map every planned decision to its outcome:
+
+ a. **Decisions to Make → Decisions Made**: For EACH checkbox from intent.md, record:
+ - What was decided
+ - Why (the rationale — not just the choice, but the reasoning)
+ - Any constraints that influenced the decision
+
+ b. **Questions to Resolve → Answers**: For EACH question from design.md, record:
+ - The answer that emerged during implementation
+ - How it was determined
+
+ c. **Capture Surprises**: Identify decisions made during implementation that were NOT in the original plan:
+ - "What did we decide that we didn't plan to decide?"
+ - "What changed from the original design?"
+ - "What unexpected requirements emerged?"
+
+ If a planned decision was NOT made during implementation, note it as unresolved and ask the user.
+
+ <examples>
+ <good>
+ ## Decisions Made
+
+ **Rate limit storage → Redis**
+ Chose Redis over in-memory. The app runs on 3 instances behind an ALB.
+ In-memory rate limiting would let users bypass limits by hitting different
+ instances. Redis adds ~2ms latency per check, but our p99 is already 180ms
+ so the overhead is negligible. Reused the existing ioredis client at
+ src/lib/redis.ts rather than adding a new dependency.
+
+ **Limit granularity → Both IP and token**
+ IP-only would block shared offices (NAT). Token-only would let unauthenticated
+ abuse through. Implemented tiered: 100/15min per IP for unauthenticated,
+ 1000/15min per token for authenticated. The token tier uses the JWT sub claim.
+
+ **429 response → Standard with Retry-After**
+ Went with standard 429 + Retry-After header. Custom error body would require
+ updating all API clients. The Retry-After header is sufficient for automated
+ retry logic and doesn't break existing integrations.
+ Why good: Each decision has the WHAT (choice), WHY (rationale), and HOW
+ (specific implementation detail). References actual code and numbers.
+ </good>
+
+ <bad>
+ ## Decisions Made
+ - Used Redis for rate limiting
+ - Implemented per-IP and per-token limits
+ - Returns 429 status code
+ Why bad: Just restates WHAT was done. No WHY. No trade-off reasoning.
+ A future developer learns nothing about why these choices were made.
+ </bad>
+
+ <good>
+ ## Surprises (not in original plan)
+
+ **Added X-Request-ID middleware** — During implementation, discovered that
+ 429 responses were impossible to debug without a request ID. Added
+ X-Request-ID header generation as a prerequisite in src/middleware/requestId.ts.
+ This wasn't in the plan but is essential for production debugging.
+ Follow-up: Should be extracted into its own WhySpec change if we add more observability.
+
+ **Changed Redis key schema** — Plan assumed simple key-value, but discovered
+ the sliding window algorithm needs sorted sets. Changed from `ratelimit:{ip}`
+ string keys to `ratelimit:{ip}` sorted sets with timestamp scores.
+ This affects the Redis memory profile — noted in Risks.
+ Why good: Documents unplanned decisions with full context. Notes follow-up items.
+ </good>
+
+ <bad>
+ (No surprises section)
+ Why bad: Every implementation has surprises. If you didn't document any,
+ you weren't paying attention.
+ </bad>
+ </examples>
+
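The sorted-set surprise in the example above maps to a simple algorithm: store each request's timestamp as a score, drop scores older than the window, and count what remains. A minimal sketch, using an in-memory dict in place of Redis (the Redis version would use `ZADD`, `ZREMRANGEBYSCORE`, and `ZCARD`):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 15 * 60   # 15-minute window, per the example above
LIMIT = 100                # unauthenticated per-IP tier

# Stand-in for Redis sorted sets keyed `ratelimit:{ip}`; each list holds
# request timestamps, playing the role of sorted-set scores.
_hits = defaultdict(list)

def allow(ip, now=None):
    """Sliding-window check: prune stale timestamps, count, then record."""
    now = time.time() if now is None else now
    window = _hits[f"ratelimit:{ip}"]
    cutoff = now - WINDOW_SECONDS
    window[:] = [t for t in window if t > cutoff]  # ZREMRANGEBYSCORE
    if len(window) >= LIMIT:                       # ZCARD
        return False
    window.append(now)                             # ZADD
    return True
```

This is exactly why the key schema surprise mattered: a plain string counter cannot expire individual requests inside the window, while timestamp scores can.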
+ 5. **Generate ctx_<id>.md in SaaS XML format**
+
+ Write to `<path>/ctx_<id>.md` using the GitWhy SaaS format:
+
+ ```xml
+ <context>
+ <title>Short title describing what was built and why</title>
+
+ <story>
+ Phase-organized engineering journal. First-person, chronological.
+ Capture the FULL reasoning — not a summary.
+
+ Phase 1 — [Setup/Context]:
+ What the user asked for, initial understanding, preparation work.
+
+ Phase 2 — [Implementation]:
+ What was built, key decision points encountered, problems solved.
+ Reference specific files and approaches.
+
+ Phase 3 — [Verification]:
+ How the work was verified, test results, manual checks.
+ </story>
+
+ <reasoning>
+ Why this approach was chosen over alternatives.
+
+ <decisions>
+ - [Planned decision] — [chosen option] — [rationale]
+ </decisions>
+
+ <rejected>
+ - [Alternative not chosen] — [why it was rejected]
+ </rejected>
+
+ <tradeoffs>
+ - [Trade-off accepted] — [what was gained vs lost]
+ </tradeoffs>
+
+ Surprises (decisions not in the original plan):
+ - [Unexpected decision] — [why it was needed]
+ </reasoning>
+
+ <files>
+ path/to/file.ts — new — Brief description
+ path/to/other.ts — modified — Brief description
+ </files>
+
+ <agent>claude-code (model-name)</agent>
+ <tags>comma, separated, domain, keywords</tags>
+ <verification>Test results and build status</verification>
+ <risks>Open questions, follow-up items, known limitations</risks>
+ </context>
+ ```
+
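As a sketch of how the skeleton above might be assembled programmatically — `render_context` and its parameters are hypothetical helpers, not part of whyspec, and the real template nests `<decisions>`/`<rejected>`/`<tradeoffs>` inside `<reasoning>`, which this sketch flattens into free text:

```python
from xml.sax.saxutils import escape

def render_context(title, story, reasoning, files, agent, tags, verification, risks):
    """Assemble the <context> skeleton as a string.

    `files` is a list of (path, status, description) tuples. Free-text fields
    are escaped so the result stays parseable if tooling reads it as XML.
    """
    file_lines = "\n".join(f"{p} — {s} — {d}" for p, s, d in files)
    return f"""<context>
<title>{escape(title)}</title>

<story>
{escape(story)}
</story>

<reasoning>
{escape(reasoning)}
</reasoning>

<files>
{file_lines}
</files>

<agent>{escape(agent)}</agent>
<tags>{escape(", ".join(tags))}</tags>
<verification>{escape(verification)}</verification>
<risks>{escape(risks)}</risks>
</context>
"""

doc = render_context(
    title="Add rate limiting",
    story="Phase 1 — Setup: reviewed intent.md and design.md.",
    reasoning="Chose Redis over in-memory because limits must be shared across instances.",
    files=[("src/middleware/rateLimit.ts", "new", "Sliding-window limiter")],
    agent="claude-code (model-name)",
    tags=["rate-limiting", "redis"],
    verification="npm test — all pass",
    risks="Redis memory profile under sustained load",
)
print(doc.splitlines()[0])
```

Keeping every tag name identical to the template is the invariant that matters here — as the guardrails note, `git why log` and `git why push` depend on the format matching exactly.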
+ 6. **Show summary**
+
+ ```
+ ## Reasoning Captured: <name>
+
+ Context: ctx_<id>.md
+
+ Decision Bridge:
+ Planned decisions resolved: N/N
+ Questions answered: N/N
+ Surprises captured: N
+
+ Files documented: N
+ Commits linked: N
+
+ View the full story: /whyspec:show <name>
+ ```
+
+ ## Tools
+
+ | Tool | When to use | When NOT to use |
+ |------|------------|-----------------|
+ | **Read** | Read intent.md and design.md for Decision Bridge mapping (REQUIRED first step) | Don't skip reading plan files |
+ | **Bash** | Run `whyspec capture --json`, `git log --oneline` to find commits, `git diff` to review changes | Don't modify code during capture |
+ | **Write** | Create the ctx_<id>.md context file | Don't overwrite existing context files |
+ | **Grep** | Search for decisions referenced in plan files (verify they were implemented) | Don't search the entire codebase |
+ | **AskUserQuestion** | When a planned decision was NOT made during implementation — ask the user to resolve | Don't ask about decisions that are clearly resolved in the code |
+
+ ### AskUserQuestion Format (for unresolved decisions only)
+
+ 1. **Re-ground**: "Capturing reasoning for **<change>**"
+ 2. **The gap**: Which planned decision wasn't resolved
+ 3. **What you found**: Evidence from the code about what actually happened
+ 4. **Ask for rationale**: "What drove this choice?"
+
+ ## Rationalization Table
+
+ | If you catch yourself thinking... | Reality |
+ |----------------------------------|---------|
+ | "This decision is obvious, no need to explain it" | It's obvious now. In 6 months with new team members, it won't be. |
+ | "The code is self-documenting" | Code shows WHAT. Context captures WHY. They're complementary. |
+ | "There were no surprises during implementation" | You changed zero things from the plan? Really? Look again. |
+ | "I'll just list the files that changed" | That's `git diff --stat`. The capture's value is reasoning, not file lists. |
+ | "The rationale is already in the commit messages" | Commit messages are 1-2 lines. Reasoning is paragraphs. Different depth. |
+
+ ## Guardrails
+
+ - **Must read plan files FIRST** — never generate context without reading intent.md and design.md. The Decision Bridge requires mapping FROM plan TO outcome.
+ - **Every planned decision must be resolved** — if intent.md lists 5 "Decisions to Make", all 5 must appear in the context. Prompt the user for any that weren't addressed.
+ - **Never skip surprises** — unplanned decisions are the most valuable context. Actively search for them.
+ - **Use SaaS XML format exactly** — the `<context>` tags must match the GitWhy format so `git why log` and `git why push` work without conversion.
+ - **Include verification results** — what tests pass, what was manually verified. Evidence, not claims.
+ - **Don't fabricate rationale** — if you don't know why a decision was made, ask the user. Invented reasoning is worse than no reasoning.
+ - **One context per capture** — each `/whyspec:capture` invocation creates exactly one `ctx_<id>.md` file.
@@ -0,0 +1,288 @@
+ ---
+ name: whyspec-debug
+ description: Use when encountering any bug, test failure, or unexpected behavior — before proposing fixes.
+ argument-hint: "<bug-description-or-change-name>"
+ ---
+
+ # WhySpec Debug — Scientific Investigation
+
+ Debug systematically. No fix without root cause.
+
+ The investigation is automatically saved as a context file when resolved.
+
+ ---
+
+ **Input**: A bug description, error message, or change name for an existing debug session.
+
+ ## Iron Law
+
+ **NO FIX WITHOUT VERIFIED ROOT CAUSE.** Guessing at fixes creates more bugs than it solves. A wrong diagnosis leads to a wrong fix that masks the real problem.
+
+ ## Red Flags — If You're Thinking This, STOP
+
+ - "The fix is obvious, I'll just apply it" → If it's obvious, verification takes 10 seconds. Do it.
+ - "I'll try this fix and see if it works" → That's guess-and-check, not debugging. Form a hypothesis first.
+ - "This is too simple to need the full process" → Simple bugs have root causes too. The process is fast for simple bugs.
+ - "I already know what's wrong from the stack trace" → The stack trace shows WHERE, not WHY. Investigate the WHY.
+ - "Let me just add some logging and see" → Logging is a test for a hypothesis. What's your hypothesis?
+
+ ## Rationalization Table
+
+ | If you catch yourself thinking... | Reality |
+ |----------------------------------|---------|
+ | "The fix is obvious, skip investigation" | Obvious fixes have root causes too. Verify in 30 seconds. |
+ | "It's just a typo/config issue" | Confirm it. Read the code. Don't assume. |
+ | "I'll just try this quick fix first" | The first fix sets the pattern. Do it right from the start. |
+ | "No time for full investigation" | Systematic debugging is FASTER than guess-and-check thrashing. |
+ | "I already know what's wrong" | Then verification should take 10 seconds. Do it anyway. |
+ | "It works on my machine" | That's a symptom, not a diagnosis. Find the environmental difference. |
+ | "The error message tells me exactly what's wrong" | Error messages describe symptoms. Root causes are upstream. |
+ | "Let me just revert the last change" | Revert is a workaround, not a fix. Why did the change break things? |
+
+ ## Tools
+
+ | Tool | When to use | When NOT to use |
+ |------|------------|-----------------|
+ | **Grep** | Search for error messages, function names, patterns in code | Don't grep before forming hypotheses — symptoms first |
+ | **Read** | Read suspect files, stack trace locations, config files | Don't read unrelated files — stay focused on hypotheses |
+ | **Bash** | Run tests to reproduce, `git log`/`git diff` to find trigger commits, execute test commands for hypotheses | Don't run destructive commands or modify production data |
+ | **Glob** | Find files by pattern when error references unknown paths | Don't glob the entire repo — scope to suspect areas |
+ | **Write** | Write/update debug.md (investigation state) and ctx_<id>.md (final capture) | Don't write fix code until root cause is verified |
+ | **WebSearch** | ONLY for: error messages with no codebase matches, library changelogs, CVE lookups | Never search web for: "how to debug X", generic solutions, or before investigating the codebase |
+ | **AskUserQuestion** | Escalation ONLY — after 2 rounds of failed hypotheses, or when root cause is outside codebase | Don't ask before investigating. The codebase has the answers. |
+
+ ### Codebase First, Web Never-First
+
+ Read the codebase BEFORE considering web search. Web search is justified ONLY when:
+ - Error message has zero matches in the codebase or git history
+ - Library version changelog needed (breaking changes between versions)
+ - Security advisory lookup (CVEs)
+ - Stack trace references internal framework code you can't read locally
+
+ Never search the web for: "how to fix X", architecture decisions, or generic debugging advice.
+
+ ## Step 0: Team Knowledge Search
+
+ Before investigating, check if someone has reasoned about this domain before:
+
+ ```bash
+ whyspec search --json "<keywords from bug description>"
+ ```
+
+ If results exist:
+ - Display relevant titles and key decisions from past investigations
+ - Note past decisions that might inform the current bug (e.g., "The auth middleware was deliberately changed in add-auth — this might explain the current session bug")
+
+ If no results: note "No prior context found" and continue.
+
+ This takes seconds. It prevents re-investigating solved problems.
+
+ ## Step 1: Symptom Gathering
+
+ Create the debug session:
+
+ ```bash
+ whyspec debug --json "<bug-name>"
+ ```
+
+ Parse the JSON response:
+ - `path`: Debug session directory
+ - `template`: debug.md template structure
+ - `related_contexts`: Past contexts in the same domain
+
+ **Gather symptoms** — investigate the codebase directly. Only ask the user if you genuinely can't find the information yourself:
+
+ | Symptom | What to capture |
+ |---------|----------------|
+ | Expected behavior | What SHOULD happen |
+ | Actual behavior | What ACTUALLY happens |
+ | Error messages | Exact text, stack traces, error codes |
+ | Reproduction steps | Minimal sequence to trigger the bug |
+ | Timeline | When it started, what changed recently |
+ | Scope | Who is affected, how often, which environments |
+
+ <examples>
+ <good>
+ ## Symptoms
+ **Expected:** POST /api/users returns 201 with user object
+ **Actual:** Returns 500 with "Cannot read properties of undefined (reading 'email')"
+ **Error:** TypeError at src/handlers/users.ts:47 — `req.body.email` is undefined
+ **Stack trace:**
+ ```
+ TypeError: Cannot read properties of undefined (reading 'email')
+ at createUser (src/handlers/users.ts:47:28)
+ at Layer.handle (node_modules/express/lib/router/layer.js:95:5)
+ ```
+ **Reproduction:** `curl -X POST localhost:3000/api/users -H "Content-Type: application/json" -d '{"name":"test"}'`
+ **Timeline:** Started after commit a1b2c3d (merged express-validator upgrade, Apr 8)
+ **Scope:** All POST endpoints with body parsing, not just /users. GET endpoints unaffected.
+ Why good: Exact error with file:line, exact reproduction command,
+ identified the trigger commit, and noticed the scope is broader than reported.
+ </good>
+
+ <bad>
+ ## Symptoms
+ **Expected:** API should work
+ **Actual:** Getting 500 errors
+ **Error:** Server error
+ Why bad: Vague symptoms lead to vague hypotheses. No file, no line,
+ no reproduction steps, no timeline.
+ </bad>
+ </examples>
+
+ **Write debug.md immediately** to `<path>/debug.md`. It persists across context resets.
+
+ ## Step 2: Hypothesis Formation
+
+ Form **3 or more falsifiable hypotheses**:
+
+ <examples>
+ <good>
+ ### H1: express-validator upgrade broke body parsing middleware order
+ - **Test:** `git diff a1b2c3d -- src/app.ts` — check if middleware registration order changed
+ - **Disproof:** If middleware order is identical pre/post upgrade, this is wrong
+ - **Status:** UNTESTED
+ - **Likelihood:** HIGH (scope matches — all POST endpoints affected, timing matches upgrade)
+
+ ### H2: express-validator v7 changed req.body population timing
+ - **Test:** Add `console.log(req.body)` before and after validation middleware, compare output
+ - **Disproof:** If req.body is populated before validation in both versions, this is wrong
+ - **Status:** UNTESTED
+ - **Likelihood:** MEDIUM (v7 changelog mentions "async validation" changes)
+
+ ### H3: Content-Type header handling changed in the upgrade
+ - **Test:** Send request with `Content-Type: application/json; charset=utf-8` — does it parse?
+ - **Disproof:** If extended Content-Type works the same in both versions, this is wrong
+ - **Status:** UNTESTED
+ - **Likelihood:** LOW (but worth testing — charset handling has caused issues before)
+ Why good: Each hypothesis is specific, testable, and falsifiable.
+ They target different root causes. Likelihood is justified with evidence.
+ </good>
+
+ <bad>
+ ### H1: Something is wrong with the API
+ - **Test:** Check the API
+ - **Disproof:** If the API works
+ ### H2: Maybe a dependency issue
+ - **Test:** Check dependencies
+ Why bad: Not specific enough to test. "Check the API" is not a concrete action.
+ </bad>
+ </examples>
+
+ Rank by likelihood. Test the most likely first. Update debug.md before proceeding.
+
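The hypothesis entries above are structured enough to track as data. A minimal sketch of that bookkeeping — the `Hypothesis` record and `next_to_test` helper are illustrative, not part of whyspec:

```python
from dataclasses import dataclass, field

MAX_TESTS_PER_HYPOTHESIS = 3
LIKELIHOOD_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

@dataclass
class Hypothesis:
    claim: str
    test: str        # concrete action that could refute the claim
    disproof: str    # observation that would falsify it
    likelihood: str  # HIGH, MEDIUM, or LOW
    status: str = "UNTESTED"
    evidence: list = field(default_factory=list)

    def record(self, observation: str, verdict: str) -> None:
        """Log one test result and update status; the 3-test cap forces INCONCLUSIVE."""
        self.evidence.append(observation)
        self.status = verdict
        if self.status not in ("CONFIRMED", "DISPROVED") and len(self.evidence) >= MAX_TESTS_PER_HYPOTHESIS:
            self.status = "INCONCLUSIVE"

def next_to_test(hypotheses):
    """Pick the most likely hypothesis that is still untested."""
    untested = [h for h in hypotheses if h.status == "UNTESTED"]
    return min(untested, key=lambda h: LIKELIHOOD_RANK[h.likelihood]) if untested else None
```

The point of the structure is the discipline: a hypothesis without a `disproof` field is not falsifiable, and a hypothesis with three evidence entries and no verdict is done.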
+ ## Step 3: Hypothesis Testing
+
+ Test each hypothesis **one at a time, sequentially**:
+
+ 1. Execute the test described in the hypothesis
+ 2. Record evidence — exact output, logs, observed behavior
+ 3. Evaluate — support, refute, or inconclusive?
+ 4. Update status: `CONFIRMED`, `DISPROVED`, or `INCONCLUSIVE`
+ 5. Update debug.md immediately
+
+ **Rules:**
+ - **One hypothesis at a time** — never test multiple simultaneously
+ - **Max 3 tests per hypothesis** — if inconclusive after 3, mark INCONCLUSIVE and move on
+ - **Preserve the crime scene** — record current state before modifying suspect code
+ - **Update debug.md after each test** — don't batch
+
+ If ALL hypotheses are disproved:
+ - Form new hypotheses based on what the evidence revealed
+ - If stuck after a second round, escalate to the user
+
+ ## Step 4: Root Cause Verification
+
+ Before proposing ANY fix:
+
+ 1. **State the root cause** clearly and specifically
+ 2. **Explain the causal chain**: [trigger] → [mechanism] → [symptom]
+ 3. **Verify predictive power**: can you predict the symptom from the cause?
+
+ ```markdown
+ ## Root Cause
+ **Cause:** express-validator v7 switched to async validation, body parsing
+ now completes AFTER route handler starts executing
+ **Causal chain:** express-validator upgrade → async body parsing → req.body
+ undefined when handler reads it synchronously → TypeError
+ **Verified by:** Adding `await` before validation resolved the issue in test
+ **Confidence:** HIGH
+ ```
+
+ | Confidence | Criteria | Action |
+ |-----------|----------|--------|
+ | HIGH | Reliable reproduction, clear causal chain | Proceed to fix |
+ | MEDIUM | Strong evidence, some uncertainty | Proceed with caution, note risks |
+ | LOW | Circumstantial evidence | **Escalate — do NOT fix** |
+
+ ## Step 5: Fix + Auto-Capture
+
+ Once the root cause is verified (HIGH or MEDIUM confidence):
+
+ 1. **Implement the minimal fix** — fix the bug, don't refactor surrounding code
+ 2. **Verify the fix** — run the reproduction steps again, run the test suite
+ 3. **Update debug.md** — add fix details and prevention measures:
+ ```markdown
+ ## Fix
+ **Change:** Added `await` before express-validator `validationResult()` calls
+ **Files:** src/middleware/validate.ts (3 lines changed)
+ **Verification:** `npm test` — 47 pass, 0 fail. Manual curl test returns 201.
+
+ ## Prevention
+ - Added eslint rule for async validation middleware
+ - Added integration test: POST with body → verify req.body populated
+ - Updated UPGRADE.md with express-validator v7 migration notes
+ ```
+ 4. **Commit** atomically with the root cause in the message
+ 5. **Auto-capture** reasoning:
+ ```bash
+ whyspec capture --json "<bug-name>"
+ ```
+ Write `<path>/ctx_<id>.md` with the full investigation story.
+
+ 6. **Show summary:**
+ ```
+ ## Debug Complete: <bug-name>
+
+ Root cause: [one-line summary]
+ Fix: [what was changed]
+ Context: ctx_<id>.md
+
+ Investigation:
+ Hypotheses tested: N (M confirmed, P disproved)
+ Evidence entries: N
+ Past contexts referenced: N
+
+ View full investigation: /whyspec:show <bug-name>
+ ```
+
+ ## Resuming an Investigation
+
+ If debug.md already exists for a change:
+ 1. Read debug.md
+ 2. Check Status and resume from the appropriate step
+ 3. Announce: "Resuming debug session: <name> — Status: <status>"
+
+ ## Escalation Rules
+
+ | Trigger | Action |
+ |---------|--------|
+ | All hypotheses disproved (2 rounds) | Present full evidence, ask for new direction |
+ | Cannot reproduce | Document symptoms, ask for environment details |
+ | Root cause outside codebase | Document findings, suggest infrastructure investigation |
+ | Confidence is LOW | Present evidence, explain uncertainty, do NOT fix |
+ | Fix introduces significant risk | Present fix + risk assessment, ask for approval |
+ | 3 failed fix attempts | Stop. Present what was tried. Ask for help. |
+
+ **Never silently give up.** If stuck, present evidence and ask.
+
+ ## Guardrails
+
+ - **No fix without root cause** — the Iron Law is non-negotiable
+ - **Max 3 tests per hypothesis** — escalate if inconclusive
+ - **Always capture reasoning** — every debug session produces both debug.md AND ctx_<id>.md
+ - **Write debug.md incrementally** — update after EVERY step, not at the end
+ - **Don't skip team knowledge** — always run Step 0
+ - **Test one hypothesis at a time** — sequential testing produces clean evidence
+ - **Preserve evidence** — record state before modifying suspect code
+ - **Minimal fixes only** — fix the bug, don't refactor