npm - litclaude-ai - Versions diffs - 0.2.2 - Mend

litclaude-ai 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (156) hide show

package/plugins/litclaude/skills/ai-slop-remover/SKILL.md ADDED Viewed

@@ -0,0 +1,142 @@
+---
+name: ai-slop-remover
+description: "Removes AI-generated code smells from a SINGLE file while preserving functionality. For multiple files, run per file in parallel."
+---
+You are an expert code refactorer specializing in removing AI-generated "slop" patterns while STRICTLY preserving functionality.
+**INPUT**: Exactly ONE file path. If multiple paths provided, REJECT and instruct to run one LitClaude pass per file.
+---
+## DETECTION CRITERIA (Specific)
+### 1. Obvious Comments (EXCLUDE: BDD comments like #given, #when, #then, #when/then)
+**REMOVE**:
+- Comments restating the code: `x += 1  # increment x`
+- Docstrings on trivial methods: `"""Returns the name."""` for `def get_name(): return self.name`
+- Section dividers: `# ===== HELPER FUNCTIONS =====`
+- Commented-out code blocks
+- `# TODO: future enhancement` without concrete plan
+- `# Note: this is important` without explaining WHY
+**KEEP**:
+- Comments explaining WHY (business logic, edge cases, workarounds)
+- Links to issues/tickets: `# See SPR-1234`
+- Non-obvious algorithm explanations
+- Regex explanations
+- Matches to existing code style
+### 2. Over-Defensive Code
+**REMOVE**:
+- Null checks for values that CANNOT be None (e.g., Django request in view)
+- `if x is not None and x.attr is not None:` when x is guaranteed
+- Try-except around code that can't raise (e.g., dict literal access)
+- `isinstance()` checks for statically typed parameters
+- Default values for required parameters: `def foo(x: str = "")` when empty string is invalid
+- Backward-compat shims: `_old_name = new_name  # deprecated`
+- `# removed` or `# deleted` comments for removed code
+- Re-exports of unused items
+- Verbose, duplicated, or redundant code / test cases
+**KEEP**:
+- Validation at system boundaries (user input, external API responses)
+- Error handling for I/O operations
+- Null checks for nullable DB fields
+- assertions in test code to matching type expectations
+### 3. Spaghetti Nesting (2+ levels deep)
+**REFACTOR**:
+- Nested if-else chains -> early returns / guard clauses
+- `if x: if y: if z:` -> `if not x: return` / `if not y: return`
+- Nested loops with conditionals -> extract to helper OR use comprehensions
+- Complex ternary `a if b else (c if d else e)` -> explicit if-else
+---
+## PROCESS
+### Step 1: Read & Analyze
+Read the file. Identify ALL slop instances with line numbers.
+### Step 2: Deep Consideration (CRITICAL)
+For EACH identified issue, think:
+- **Functionality Impact**: Will removing this change behavior? If ANY doubt, SKIP.
+- **Test Coverage**: Are there tests that might break? If uncertain, SKIP.
+- **Context Dependency**: Is this "slop" actually necessary for this specific codebase? (e.g., defensive code for known flaky external API)
+- **Readability Trade-off**: Will removal make code LESS readable? If yes, SKIP.
+**RULE**: When in doubt, DO NOT CHANGE. False negatives are better than breaking code.
+### Step 3: Execute Changes
+Make changes using Claude Code edit tools. One logical change at a time.
+### Step 4: Detailed Report
+**OUTPUT FORMAT**:
+```
+## AI Slop Removed: {filename}
+### Analysis Summary
+- Total issues found: N
+- Issues fixed: M
+- Issues skipped (safety): K
+### Changes Made
+#### Change 1: [Category] Line X-Y
+**Before**: [original code snippet]
+**After**: [modified code snippet]
+**Why this is slop**: [Explain why this pattern is problematic]
+**Why safe to remove**: [Explain why functionality is preserved]
+**Impact**: None - purely cosmetic improvement
+---
+### Skipped Issues (Preserved for Safety)
+#### Skipped 1: Line X
+**Reason**: [Why you chose not to change this]
+### Summary
+- Removed N obvious comments
+- Simplified M defensive patterns
+- Flattened K nested structures
+- Preserved L patterns that looked like slop but serve purpose
+```
+---
+## SAFETY RULES
+1. **NEVER remove error handling for I/O, network, or file operations**
+2. **NEVER simplify validation for user input or external data**
+3. **NEVER change public API signatures**
+4. **NEVER remove type hints (even redundant-looking ones)**
+5. **If a pattern appears in multiple places, it might be intentional - ASK before bulk removal**
+6. **Preserve all BDD test comments (#given, #when, #then)**
+When finished, your report should be detailed enough that a reviewer can understand EXACTLY what changed and feel confident the changes are safe.
+---
+## WHEN NO SLOP FOUND
+If the file is clean, report:
+```
+## AI Slop Analysis: {filename}
+### Result: No AI Slop Detected
+This file is clean. Here's why:
+**Comments**: N comments found, all explain WHY not WHAT
+**Defensive Code**: Null checks present are appropriate (e.g., checks external API response)
+**Code Structure**: Maximum nesting depth acceptable, early returns used appropriately
+**Conclusion**: This code appears to be human-written or well-reviewed AI code. No changes needed.
+```

package/plugins/litclaude/skills/comment-checker/SKILL.md ADDED Viewed

@@ -0,0 +1,55 @@
+---
+name: comment-checker
+description: "Comment hygiene workflow adapted for LitClaude: keep comments useful, remove stale narration, and preserve intent-bearing notes."
+---
+# Comment Checker
+Use this skill after code edits or documentation rewrites where comments may
+become stale. The goal is not fewer comments; it is comments that carry intent
+the code or tests cannot express.
+## Keep Comments That Explain
+Keep comments when they explain:
+- why a compatibility path exists
+- why a safety boundary is conservative
+- why a file is generated or packaged
+- why a fallback exists for a specific runtime behavior
+- why a test asserts a policy rather than exact prose
+## Remove Comments That Narrate
+Remove or rewrite comments that merely repeat code:
+- "read the file"
+- "set the variable"
+- "loop through items"
+- "return result"
+If the comment is only true because the current implementation happens to work
+that way, prefer a clearer name or a test.
+## LitClaude-Specific Checks
+- Comments about Claude registry writes must match actual paths.
+- Comments about `/goal` must not imply the plugin can type it silently.
+- Comments about `--dry-run` must distinguish inspection from final proof.
+- Comments about package contents must match `package.json.files`.
+- Comments in tests should explain policy assertions, not every regex.
+## Review Flow
+1. Search changed files for comments.
+2. For each comment, ask whether the next maintainer would be worse off without
+   it.
+3. Delete narration.
+4. Update stale intent comments.
+5. Run tests touching the changed files.
+## Evidence
+For code comments, evidence is the diff plus tests. For docs comments or
+markdown policy notes, evidence is docs tests and one manual scan of the
+rendered or packaged surface when relevant.

package/plugins/litclaude/skills/debugging/SKILL.md ADDED Viewed

@@ -0,0 +1,70 @@
+---
+name: debugging
+description: "Systematic Claude Code debugging workflow adapted for LitClaude: reproduce, localize, explain root cause, patch minimally, and verify the real failing surface."
+---
+# Debugging
+Use this skill when behavior is broken, surprising, flaky, or contradicted by
+evidence. Do not patch from vibes. First make the failure reproducible.
+## Debugging Loop
+1. Reproduce the failure with the smallest command or scenario.
+2. Capture exact output, exit code, log, or transcript.
+3. State expected vs actual behavior.
+4. Localize the failing boundary.
+5. Patch minimally.
+6. Re-run the original reproduction.
+7. Add or update a regression test.
+8. Run one real-surface QA scenario.
+## Reproduction Standards
+A good reproduction is:
+- runnable from the project root
+- deterministic or has a bounded flake note
+- small enough for the next agent to rerun
+- tied to one expected observable
+For LitClaude examples:
+- `printf ... | node plugins/litclaude/bin/litclaude-hook.js user-prompt-submit`
+- `npm exec --yes --package ./ -- litclaude-ai --dry-run install`
+- `CLAUDE_CONFIG_DIR=$(mktemp -d) node bin/litclaude-ai.js install`
+- `node --test test/hooks.test.mjs --test-name-pattern litwork`
+## Root Cause Discipline
+Explain the mechanism, not just the symptom. "The test failed" is not a root
+cause. "The command markdown mentions `$ARGUMENTS`, which Claude renders empty
+and tests reject as a broken placeholder" is a root cause.
+## Common LitClaude Failure Classes
+- npm same-name source checkout causes `npx litclaude-ai` resolution confusion.
+- `npm publish` requires OTP and fails with `EOTP`.
+- CLI smoke uses unsupported `--version` and receives usage.
+- Claude Code exposes `/goal` as a slash command but not model-facing tools.
+- Plugin manifest version differs from `package.json`.
+- Hook JSON parsing fails with stack traces instead of controlled errors.
+- Package `files` omit a new skill or command.
+## Patch Rules
+- Patch the failing boundary, not an unrelated symptom.
+- Keep compatibility aliases unless a breaking change is intentional.
+- Add tests before production behavior changes.
+- Do not broaden catch blocks without asserting the new error path.
+- Do not call network or publish in tests.
+## Evidence
+Record:
+- failing command and output
+- changed file
+- regression test
+- passing command and output
+- manual QA artifact and cleanup

package/plugins/litclaude/skills/debugging/references/methodology/00-setup.md ADDED Viewed

@@ -0,0 +1,108 @@
+# Phase 0 + 1 — Environment Assessment & Journal Setup
+Before a debugger touches anything, you need a map of what's running and a ledger of what you'll touch. Skipping either phase is how debug sessions turn into "why is my repo dirty a week later" sessions.
+---
+## Phase 0 — Environment Assessment
+Map the ground truth before you attach. Attaching the wrong way wastes the first hour.
+### 1. Identify the runtime
+Read the actual manifest file, don't guess from extensions:
+- Python → `pyproject.toml`, `requirements*.txt`, `setup.py`, `uv.lock`, `.python-version`
+- Node → `package.json` (check `scripts`, check `engines`, check `type: module`)
+- Rust → `Cargo.toml`, `rust-toolchain*`
+- Go → `go.mod`, `go.sum`
+- Native / mixed → `Makefile`, `CMakeLists.txt`, the binary itself (`file <path>`)
+### 2. Load the matching runtime reference
+The moment you know the runtime, open `references/runtimes/<runtime>.md`. The commands in this phase (and every phase after) are runtime-specific. The shape of the answers is the same; the commands are not.
+### 3. Gather observable environment state
+The shape of the answers you need (commands in the runtime reference):
+| Question | Why it matters |
+|---|---|
+| What binary/interpreter/runtime actually launches the process? | Determines debugger flag plumbing. Wrappers (`tsx`, `poetry run`, `cargo run`, `bun`, supervisor scripts) change how flags propagate. |
+| Is there already a debug-relevant port in use, or another instance of the service running? | Either attach to it or kill it deliberately — never silently compete. |
+| Are symbols / source maps / debug info present and correct? | This determines whether breakpoints land on the right lines. Compiled-but-not-debug builds, stripped binaries, and incomplete source maps all silently misplace breakpoints. |
+| Does the code path require env vars, config files, or auth tokens to reach the bug? | Missing env often produces early-return paths that masquerade as the bug itself. |
+| Is there an existing failing test or known repro? | Prefer amplifying an existing repro over inventing one. |
+| Are watchers (file watchers, hot reloaders, supervisors) going to restart the process mid-session? | If yes, turn them off before attaching. Restarts drop inspector connections and invalidate breakpoints. |
+### 4. Gate check
+If any answer is "I'm not sure", you are not ready for Phase 1. Investigate until certain. Guessing here cascades into false-positive hypotheses in Phase 2.
+---
+## Phase 1 — Journal Setup
+Open **one** journal file at the project root: `.debug-journal.md`. Single source of truth for every artifact this skill creates. The contract with the user that you can undo everything.
+### Exclude from git (don't pollute the committed ignore list)
+```bash
+grep -qx '.debug-journal.md' .git/info/exclude || echo '.debug-journal.md' >> .git/info/exclude
+```
+`.git/info/exclude` is per-clone and not committed — perfect for local-session artifacts.
+### Journal template
+```markdown
+# Debug Journal — <short bug name>
+Started: <ISO timestamp>
+Goal: <one-sentence user request>
+## Environment snapshot (Phase 0)
+- Runtime: <language + version + launcher>
+- Entry: <command that starts the process>
+- Ports / sockets: <app=..., debugger=..., etc>
+- Git HEAD: <sha>, working tree clean? <yes/no>
+- References read: <list the files from references/ you loaded — proves you did the gate>
+## Hypotheses
+1. [STATUS] <hypothesis> — distinguishing evidence: <what would confirm/refute> — if true, fix is: <two words>
+2. ...
+## Failed hypothesis round counter
+- Round 1: <result>
+- Round 2: <result>
+<!-- At 2 consecutive failures, invoke Oracle Triple (see 04-oracle-triple.md). -->
+## Artifacts to revert
+<!-- Every temp edit, tmux session, fixture, env override, saved debugger session goes here
+     BEFORE it is created. The rule is journal-then-modify. -->
+- [ ] `src/foo.py` — added `breakpoint()` on 2 lines. Revert: `git checkout src/foo.py`
+- [ ] tmux session `debug-server`. Kill: `tmux kill-session -t debug-server`
+- [ ] `/tmp/debug-payload.json`. Remove: `rm /tmp/debug-payload.json`
+- [ ] env var in current shell: `FOO_BASE_URL=...`. Unset when done.
+- [ ] GDB session save: `~/ghidra-projects/scratch.gzf`. Remove if not promoting.
+## Findings
+<!-- Append observed values here with timestamp. Verbatim only, no paraphrasing. -->
+## Oracle Triple (if invoked)
+<!-- One subsection per Oracle round, with the synthesized new hypothesis set. -->
+## Final fix
+<!-- File paths + test path. Filled during Phase 7. -->
+```
+### The journal-then-modify rule
+Before any modification to the repo, shell, or system state, append to "Artifacts to revert" first. This one discipline is what prevents debug sessions from becoming git cleanup sessions.
+If you catch yourself about to run a command that creates a file, opens a port, or modifies source — stop, journal the intended artifact with its revert command, then run the command. Not the other way around.
+### Why a single journal (not scattered TODO comments)
+- One `git checkout`, one `rm`, one `tmux kill-session` list — simple Phase 9 walk.
+- Survives interruptions. If you get pulled away mid-session, the next agent (or you later) can continue or revert without guessing.
+- Prevents the most common failure: leaving `console.log`/`print()`/`dbg!` scattered across the tree.

package/plugins/litclaude/skills/debugging/references/methodology/02-investigate.md ADDED Viewed

@@ -0,0 +1,126 @@
+# Phase 2 + 3 — Hypothesis Formation & Parallel Investigation
+One hypothesis is a hunch. Three hypotheses is a decision. Investigation is how you turn the decision into runtime evidence.
+---
+## Phase 2 — Hypothesis Formation (Minimum Three)
+### Why three, not one
+A single hypothesis creates confirmation bias: you'll read runtime state looking for evidence that confirms it and unconsciously discount contradictions. Three hypotheses force you to design queries that *distinguish* between them, which is the only way runtime evidence becomes decisive.
+### Generate across orthogonal axes
+If your three hypotheses are all variations of "the handler has a bug", you don't actually have three hypotheses. Span the space:
+| Axis | Example framing |
+|---|---|
+| **User-code logic** | "The handler early-returns because condition X is unexpectedly true" |
+| **Library/SDK behavior** | "The third-party client swallows the error and returns a stub" |
+| **Environment/config** | "The env var is read at module-load time before it gets populated, so it's empty" |
+| **Async/timing** | "The promise rejects (or goroutine panics) after the response is already sent" |
+| **Silent side-effect** | "An earlier turn mutated shared state that the current turn inherits" |
+| **Observability gap** | "The error is raised but suppressed before logging; it only exists as an unawaited rejection / ignored signal" |
+| **Binary-level** (when applicable) | "The function we think is running is actually jumped over by a patched thunk / a different version loaded" |
+| **Build-vs-runtime** | "The code we're reading is not the code that's running — stale build, wrong symlink, cached wheel, or dist/ ahead of src/" |
+### For each hypothesis, write in the journal
+1. **Claim** — one sentence.
+2. **Distinguishing evidence** — the exact value or state that confirms or refutes it, AND where to read it (file:line, log source, breakpoint location, memory address).
+3. **If true, the fix is** — two words. Forces you to think through fix cost before committing to the hunt.
+### Collapse rule
+If two hypotheses have identical distinguishing evidence, they aren't actually different — collapse them and find a real alternative. If you can't come up with a third distinct hypothesis, you don't understand the system well enough yet. Go read a little more code before investigating.
+---
+## Phase 3 — Parallel Investigation
+Branch depending on what's available.
+### Path A: Team mode ENABLED
+When the `Claude workflow coordination` tools are present, create a **debug-squad** team and split investigation across members working on different evidence sources. This is the right default whenever you have ≥3 hypotheses and any of them would take >10 minutes to investigate single-threaded.
+**Team spec** — write to `~/.litclaude/teams/debug-squad/config.json`:
+```json
+{
+  "name": "debug-squad",
+  "lead": { "kind": "subagent_type", "subagent_type": "sisyphus" },
+  "members": [
+    {
+      "kind": "category",
+      "category": "deep",
+      "prompt": "You are the Runtime State Inspector. Your job: attach to the live process, hit breakpoints, read program state (variables, heap, goroutines, stack, registers depending on runtime), and report observed values verbatim. Never guess — if you don't see the value, say so. Report back via Claude workflow message with file:line / address references and captured values. Never edit source code. Never run git commands. If you need an instrumentation statement added (breakpoint(), debugger;, dbg!, etc.), ask the Lead first."
+    },
+    {
+      "kind": "category",
+      "category": "deep",
+      "prompt": "You are the Log Archaeologist. Your job: grep server logs, stderr streams, SDK-internal debug output (DEBUG env, RUST_LOG, GODEBUG, PYTHONASYNCIODEBUG), and correlate timestamps. Produce a timeline of events with latencies. Flag anything that looks like a silent catch, a swallowed rejection, a panic recovered-and-ignored, a success response that contains failure signals (HTTP 200 with empty body, stopReason=error, exit 0 with error-in-stdout). Never edit source code."
+    },
+    {
+      "kind": "category",
+      "category": "deep",
+      "prompt": "You are the Reproduction Engineer. Your job: build the smallest reliable repro — a curl command, a vitest/pytest/go test, a tmux script, a Playwright script for browser bugs, a pwntools script for binary targets. It must reproduce on first try and be copy-pasteable by the Lead. Document exact input, expected output, observed output. Save repro artifacts under /tmp/ and tell the Lead to journal them. If the bug is browser-based you MUST use Playwright CLI — do not simulate with curl."
+    },
+    {
+      "kind": "category",
+      "category": "deep",
+      "prompt": "You are the Trace Correlator. Your job: take findings from the other members and cross-link them. Build a causal chain from symptom to suspected cause. Identify missing evidence. Propose the next single most-decisive runtime query. Never edit source code; only reason across already-captured evidence. If hypotheses diverge sharply after correlation, tell the Lead immediately — that is the signal for the Oracle Triple."
+    }
+  ]
+}
+```
+**Assignment rule**: one hypothesis → one `Claude workflow task`. Give each hypothesis to the member whose evidence source is most likely to confirm or refute it. Broadcast the full hypothesis list once via `Claude workflow message(to="*")` so members know what the others are testing.
+**Lead responsibilities**:
+- Maintain the journal (members do not write to it).
+- Approve any source-code edits (including `debugger;` / `breakpoint()` / `dbg!` statements).
+- Synthesize member reports into updated hypothesis statuses.
+- Decide when to disband: `Claude workflow shutdown request` → `Claude workflow shutdown approval` → `Claude workflow cleanup`.
+**Team does NOT include Oracle** — Oracle is a hard-reject team member type. Oracle is used separately in Phase 4 (see `04-oracle-triple.md`).
+### Path B: Team mode DISABLED
+Fan out async Claude Code workflow lanes or subagents instead. Same rule: one
+hypothesis per lane.
+- Lane 1: runtime state investigation for hypothesis 1, with the bug summary
+  and exact state to inspect.
+- Lane 2: log/timing investigation for hypothesis 2.
+- Lane 3: reproduction minimizer for hypothesis 3.
+End your response, wait for completion notifications, then synthesize.
+---
+## Evidence capture discipline (both paths)
+For every piece of runtime state captured, record in the journal:
+```markdown
+### <ISO timestamp> — <what you looked at>
+- Source: <file:line | log source | curl command | breakpoint address>
+- Value: `<verbatim>`
+- Interpretation: <one line — why this matters>
+- Refutes/Confirms: H<n>
+```
+**Verbatim values only. No paraphrasing.**
+- `messages.length=0` is evidence.
+- "messages seemed empty" is not evidence — it's a memory of an observation, and memory of observations is where debug sessions go to die.
+If you find yourself about to paraphrase, stop, go back, and copy the raw value.
+---
+## Round completion
+A "round" is complete when every hypothesis has either confirming or refuting evidence — or when you have exhausted the evidence sources available without a decisive result. If the round ends inconclusively, that counts as a failed round for the counter in the journal. See `04-oracle-triple.md` for what to do at 2 consecutive failed rounds.

package/plugins/litclaude/skills/debugging/references/methodology/04-oracle-triple.md ADDED Viewed

@@ -0,0 +1,106 @@
+# Phase 4 — Oracle Triple Consultation
+At 2 consecutive failed hypothesis rounds, stop investigating and reframe. Continuing past two failures usually means the real cause is in a category you haven't imagined — and more time on your current mental model is wasted time.
+The Oracle Triple is how you break out of the mental box.
+> ⚠️ **Wrong tool for non-debugging tasks.** The Triple is for *stuck root-cause hunts*. If your task is producing an artifact (extraction, reverse engineering, audit, compliance documentation) and you want a skeptical review before declaring it done, use the **Verification Oracle** pattern in [partial-runtime-evidence.md](partial-runtime-evidence.md#verification-oracle-pattern-for-non-debug-tasks). Running the Triple on a finished extraction returns three diverging "what if you tried…" tangents that are not what you need.
+---
+## When to invoke
+| Situation | Invoke? |
+|---|---|
+| 1 round failed, you have new distinguishing evidence | No — run one more round with a refined hypothesis set |
+| 2 rounds failed, hypotheses now feel like variations of each other | **Yes — invoke now** |
+| 2 rounds failed, no new evidence angles left to try | **Yes — invoke now** |
+| You've been investigating >2 hours on the same bug | **Yes — invoke now regardless of round count** |
+| 1 round failed but the user is watching and wants speed | No — one round isn't enough to justify Oracle cost. Resist the urge. |
+---
+## Why three Oracles, and why *orthogonal* framings
+A single Oracle call returns a single coherent analysis. Coherent analyses tend to inherit the framing of the prompt, which means they inherit the same blind spots the investigator already has. Three Oracles with *orthogonal framings* force the analyses to diverge, and the places where they agree across frames is where the real signal lives.
+The three framings below are chosen to cover distinct bug-cause categories:
+- **A (obvious-but-missed)** — embarrassingly simple causes the investigator walked past.
+- **B (system-boundary)** — causes living at integration seams, not in the code being read.
+- **C (invariant-violation)** — assumptions load-bearing to current hypotheses that may themselves be false.
+Spawn all three in parallel.
+---
+## The three prompts
+Launch three parallel Claude Code workflow lanes or subagents with the same
+evidence packet and three orthogonal prompts:
+- Framing A - obvious-but-missed: ask for the three simplest causes a senior
+  engineer might spot quickly, including typos, stale cache, wrong process,
+  wrong file, wrong import, or test harness mismatch.
+- Framing B - system-boundary: ask for three causes at integration boundaries,
+  such as third-party SDK behavior, middleware mutation, proxy rewrites,
+  build-time vs runtime env resolution, module-load order, shared library
+  mismatch, ABI differences, or transport mismatch.
+- Framing C - invariant-violation: ask for the five most load-bearing assumed
+  invariants and the smallest runtime query that would falsify each.
+---
+## Synthesizing across three Oracles
+**Do not pick the highest-ranked candidate from a single Oracle.** That defeats the purpose of getting three framings.
+Instead, walk the outputs in this order:
+### 1. Agreement scan
+Note which candidate causes appear in at least two Oracles' outputs. Independent agreement across orthogonal framings is strong signal — when the obvious-but-missed framing and the system-boundary framing both land on the same cause, that's usually the bug.
+### 2. Disagreement scan
+Note where Oracles disagree. Disagreement is genuine uncertainty that runtime evidence (not more reasoning) must resolve. Each disagreement becomes a candidate for the next round's distinguishing query.
+### 3. New falsification queries
+Framing C produces concrete "one query that would decide it" suggestions. Pull these verbatim into your new round's evidence-gathering plan — they are designed to be decisive.
+### 4. Build the new hypothesis set
+Minimum 3, same rules as Phase 2. Aim to have hypotheses drawn from the agreement scan (likely cause) AND from the disagreement scan (so one round's evidence resolves the disagreement).
+Record in the journal:
+```markdown
+## Oracle Triple — Round <N>
+- Invoked at: <ISO timestamp>
+- Framing A summary: <top 3 candidates, one line each>
+- Framing B summary: <top 3 candidates>
+- Framing C summary: <5 load-bearing assumptions + falsification queries>
+### Cross-framing agreement
+- <candidate> appeared in A + B
+- <candidate> appeared in B + C
+### New hypothesis set
+1. <hypothesis> — evidence to gather: <one-liner>
+2. ...
+```
+### 5. Reset the counter
+Reset the "consecutive failed rounds" counter to 0. Return to Phase 3 (parallel investigation) with the new set.
+---
+## If *another* 2 rounds fail after the Oracle Triple
+You are genuinely stuck. This is the escalation threshold.
+Escalate to the user (see `05-escalate.md`) with the full trace: every hypothesis tried, every piece of evidence captured, both Oracle syntheses. Do not guess a fix.
+This is rare — in practice, the Oracle Triple resolves almost all stuck debugging sessions within one round, because it pulls in framings the investigator was too close to the code to see.

package/plugins/litclaude/skills/debugging/references/methodology/05-escalate.md ADDED Viewed

@@ -0,0 +1,69 @@
+# Phase 5 — User Decision Escalation
+Escalation is for genuine ambiguity, not for skipping investigation. Most "should I ask the user" moments are really "I don't want to do one more query" moments, and those are wrong.
+---
+## Ask the user ONLY when
+- **Evidence exhausted**, contradictions remain, and further investigation would require a decision with policy implications (e.g. "patch the third-party SDK vs wrap it vs change architecture").
+- The bug has **multiple valid fixes with different scope/risk tradeoffs** and the user's preference drives the choice.
+- A proposed fix would **change observable product behavior** for the end user (not just fix the internal bug).
+- You've **exhausted the Oracle Triple** and another 2 rounds failed after synthesis.
+## Do NOT ask when
+- You haven't tried the Oracle Triple yet.
+- The question can be answered by one more runtime query.
+- You're asking for permission to do the obvious thing.
+- You're asking because you're tired.
+---
+## Escalation format (paste into the reply)
+Keep it short. Evidence-dense. One decision, not a status update.
+```markdown
+## Decision needed
+**What we know** (verbatim evidence, not paraphrase):
+- <fact 1 with file:line or address>
+- <fact 2 with source>
+- <what the evidence rules IN>
+- <what the evidence rules OUT>
+**What the decision is** (one sentence):
+<the fork in the road>
+**Options**:
+| # | Fix | Scope | Risk | Effort |
+|---|-----|-------|------|--------|
+| A | <short label> | <files touched / layers> | <regressions possible> | <rough> |
+| B | ... | ... | ... | ... |
+| C | ... | ... | ... | ... |
+**Recommendation**: <A/B/C> because <one-sentence reason>.
+Which direction do you want?
+```
+---
+## Anti-patterns in escalation
+- **Asking without evidence.** "What do you want me to do?" is not an escalation, it's abandonment. Every escalation includes the evidence the user needs to decide.
+- **Two questions in one.** One decision per escalation. Multi-part questions lead to partial answers and re-escalation.
+- **Escalating before Phase 4.** If you haven't tried the Oracle Triple, you haven't earned the right to escalate.
+- **Presenting options you don't actually have.** If option C requires a library the user doesn't use, don't list it. The options are only things you can actually do today.
+- **Hiding a recommendation.** The user hired you to think — always end with a recommendation, even if you're low-confidence. Say so explicitly: "Recommendation (low confidence): B, because X. If you have context about Y that I don't, it might change to A."
+---
+## What happens after the user responds
+- **User picks an option**: return to Phase 6 (root cause confirmation) with the chosen direction. The user's choice is not itself confirmation — you still need runtime evidence that the cause you're fixing is the cause in play.
+- **User proposes a different option you hadn't considered**: treat it as new information. Update hypotheses. May trigger another Phase 3 round.
+- **User gives more context that resolves the disagreement**: skip to Phase 6.
+- **User is also unsure**: that's a signal you need more evidence, not more opinions. Run one more targeted query before asking again.