npm - litclaude-ai - Versions diffs - 0.2.2 - Mend

Files changed (156)

package/plugins/litclaude/skills/debugging/references/methodology/06-fix.md ADDED

@@ -0,0 +1,116 @@
+# Phase 6 + 7 — Root Cause Confirmation & TDD Fix
+A cause is not "confirmed" until you can toggle the bug by toggling the cause. Every other level of evidence is correlation, and correlation-driven fixes ship bugs.
+---
+## Phase 6 — Root Cause Confirmation
+You are allowed to call the cause "confirmed" only when ALL THREE of these hold:
+### 1. Captured runtime value matches the hypothesis exactly
+Not "the value looks consistent with" — the value is exactly the value the hypothesis predicted. If your hypothesis was "baseUrl is api.anthropic.com despite ANTHROPIC_BASE_URL being set to a proxy", the captured value is literally `"https://api.anthropic.com"` in the debugger at the moment of the HTTP call.
+### 2. Reproducible
+Running the repro a second time yields the same observation. Flaky repros mean you haven't isolated the cause; you've isolated a symptom that sometimes appears when the cause does. Keep investigating.
+### 3. Toggle proof (the one most skipped)
+**Changing the value** (via debugger assignment, env override, or a speculative one-line patch) **makes the bug disappear — and reverting brings the bug back**.
+If you can't toggle the bug by toggling the suspected cause, what you have is a correlation, not a mechanism. A correlation is a strong hypothesis, not a confirmed cause.
+Examples of a valid toggle proof:
+| Suspected cause | Toggle |
+|---|---|
+| Env var overrides library default, and the override is wrong | Unset the env var → bug goes away. Reset it → bug comes back. |
+| Async task is not awaited | Add `await` → bug goes away. Remove `await` → bug comes back. |
+| Third-party SDK uses hardcoded URL | Monkey-patch SDK to use env URL → bug goes away. Unpatch → bug comes back. |
+| Race condition on shared state | Add a mutex → bug goes away under load. Remove mutex → bug comes back under load. |
+If you can't construct a toggle proof, you haven't confirmed the cause. Run one more round.
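The toggle proof described above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not part of the skill itself: `toggle_proof`, `repro`, and `example_repro` are hypothetical names, and `repro(cause_present)` is assumed to run your reproduction with the suspected cause switched on or off and return `True` when the bug is observed.

```python
from typing import Callable


def toggle_proof(repro: Callable[[bool], bool], rounds: int = 2) -> bool:
    """Return True only if the bug follows the suspected cause in every round.

    `repro(cause_present)` is a hypothetical callback: it runs the
    reproduction with the suspected cause enabled or disabled and
    returns True when the bug is observed.
    """
    for _ in range(rounds):
        if not repro(True):   # cause present but no bug: not the mechanism
            return False
        if repro(False):      # cause removed but bug persists: only a correlation
            return False
    return True


# Illustrative stand-in for a real repro: the bug appears exactly when
# a (hypothetical) wrong URL override is in effect.
def example_repro(cause_present: bool) -> bool:
    base_url = "https://wrong.example" if cause_present else "https://proxy.example"
    return base_url != "https://proxy.example"
```

Running more than one round folds the "Reproducible" condition into the same check: a flaky repro fails the proof as well.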
+### Update the journal
+```markdown
+## Root cause (confirmed <ISO timestamp>)
+- Mechanism: <one paragraph, causal not correlational — the chain from cause to observable symptom>
+- Evidence: <file:line of captured value | path to saved repro | address + register state>
+- Toggle proof: "With <change X>, repro produces <good>. Reverting <change X>, repro produces <bad>."
+- Fix scope: <files and approximate line count>
+```
+The "mechanism" field is the acid test. If you can't write the causal chain from cause to observable symptom as one paragraph, you don't yet understand the bug well enough to fix it.
+---
+## Phase 7 — TDD Fix
+Red, green, refactor. No shortcuts.
+### 1. Red — failing-first test
+Write a test that fails *specifically because of this bug*. Requirements:
+- **Test name reads like a bug report.** `test_refinement_turn_returns_empty_content_when_anthropic_returns_401` is good. `test_bug_fix` is not.
+- **Failure message clearly shows what the bug looks like.** If someone reads only the failure output, they understand what's broken.
+- **Minimum infrastructure.** Don't spin up the whole server if a unit test against the right seam captures the mechanism.
+Run the test. Confirm it fails. Paste the failure output into the journal:
+````markdown
+### Red phase (<ISO timestamp>)
+Test: <path>::<name>
+Command: <exact invocation>
+Output:
+```
+<verbatim failure output>
+```
+Confirms: the bug is reproducible at the test-harness level, not just the manual repro.
+````
+### 2. Green — minimum change
+Make the test pass with the **smallest change that fully fixes the observed mechanism**.
+If the diff is larger than ~30 lines and you aren't refactoring, something is wrong — either you're fixing more than the bug, or the root cause was deeper than you confirmed. Back to Phase 6.
+Signs you're over-fixing:
+- Adding "just in case" null checks or try/except around other code
+- Refactoring adjacent functions because "while I'm here"
+- Adding new configuration options the bug didn't require
+- Introducing new abstractions to "make this cleaner"
+Resist all of these. Fix the bug. Note the surrounding issues for follow-up. Move on.
+### 3. Refactor — ONLY AFTER GREEN
+Only cleanup directly related to the fix. Do not re-architect.
+If the code around the fix is rough, note it in the journal as a follow-up for the user; do not expand scope here. Refactoring during a bugfix is how one-line fixes turn into hundred-line diffs nobody can review.
+### 4. Regression — full suite green
+Run the full test suite for the affected package (not just the one new test). Existing tests must still pass.
+If they don't, your "fix" broke something else. Back to Phase 6 with the new failure as evidence — usually it means the mechanism you thought you fixed was load-bearing for some other code path you didn't know about, and the "broken" test is actually pointing at a better understanding of the system.
+### Update the journal
+```markdown
+### Green phase (<ISO timestamp>)
+Fix: <file:line> — <two-line description of the change>
+Test: <path>::<name> now passes
+Full suite: <N> tests, <M> failures (should be 0)
+```
+---
+## The red-green discipline summary
+No red test → no proof the fix addresses the reported bug. Only proof it doesn't break tests that already existed.
+A test written *after* the fix might still pass with the fix reverted. If that's the case, the test doesn't lock the bug — it locks something else. Always verify the test fails without the fix and passes with it. The journal should show both outputs.
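One way to check that a test actually locks the bug is to run it against the code with and without the fix. The sketch below assumes both versions can be passed in as callables; `locks_bug`, `passes`, and the two `resolve_*` functions are hypothetical stand-ins that borrow the `ANTHROPIC_BASE_URL` scenario from Phase 6.

```python
from typing import Callable, Dict

Resolver = Callable[[Dict[str, str]], str]


def passes(test: Callable[[Resolver], None], impl: Resolver) -> bool:
    """Run a test body against one implementation; False on assertion failure."""
    try:
        test(impl)
    except AssertionError:
        return False
    return True


def locks_bug(test: Callable[[Resolver], None],
              without_fix: Resolver, with_fix: Resolver) -> bool:
    """A test locks the bug only if it is red without the fix and green with it."""
    return (not passes(test, without_fix)) and passes(test, with_fix)


# Hypothetical buggy and fixed versions of a base-URL resolver.
def resolve_without_fix(env: Dict[str, str]) -> str:
    return "https://api.anthropic.com"  # ignores the env override


def resolve_with_fix(env: Dict[str, str]) -> str:
    return env.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")


def check_env_override(resolve: Resolver) -> None:
    env = {"ANTHROPIC_BASE_URL": "http://127.0.0.1:8080"}
    assert resolve(env) == "http://127.0.0.1:8080", "env override was ignored"


def check_default(resolve: Resolver) -> None:
    # Passes with and without the fix, so it locks nothing about this bug.
    assert resolve({}) == "https://api.anthropic.com"
```

In a real repository the "without fix" run is usually done by stashing or reverting the fix and re-running the test command, with both outputs pasted into the journal.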

package/plugins/litclaude/skills/debugging/references/methodology/08-qa.md ADDED

@@ -0,0 +1,94 @@
+# Phase 8 — Manual QA by Actually Using It
+Tests cover cases you thought of. Real usage covers the ones you didn't.
+The single fastest way to ship a broken fix is to stop at "tests pass". Manual QA means interacting with the running system the way the user does, then comparing observed behavior to the original bug report.
+---
+## Product-type playbook
+Pick the row that matches the product. Do what it says. Do not substitute.
+| Product type | QA means… |
+|---|---|
+| **CLI tool** | Open `tmux`, run the actual command end-to-end, capture output. Paste the session transcript into the journal. Include exit code, stdout, stderr, side-effect check (files created/modified). |
+| **HTTP API** | Start the real server, hit endpoints with `curl` or `httpie`, inspect response status + body + headers. Hit the specific endpoint that reproduced the bug. If there's auth, use real auth. |
+| **Browser-served web app** | **Drive a real browser via Playwright CLI.** See [tools/playwright-cli.md](../tools/playwright-cli.md). Navigate the exact page/flow that reproduced the bug. Capture screenshot + DOM + network evidence. **Do not substitute with curl** — browsers have state (cookies, localStorage, service workers, client-side JS, viewport-dependent CSS) that curl does not have. |
+| **Agent / LLM pipeline** | Run the same user prompt that originally failed. Capture the full turn — tool calls, messages, usage counters. **Confirm non-zero usage** (zero usage = still failing silently, see silent-failure check below). |
+| **Background worker / job queue** | Trigger the job through the normal entry point (API call, cron tick, message publish), tail the worker logs, observe completion state in the queue or DB. Don't just call the worker function directly — the trigger path matters. |
+| **MCP server** | Invoke the tool via its actual client (Claude Desktop, Cursor, etc. if available) or `mcp-cli`, not just the HTTP probe endpoint. The MCP handshake itself is sometimes where bugs live. |
+| **Native binary** | Re-run the exact command that crashed / misbehaved. If the input was a file, use the same file. If the bug was exploitable, confirm the exploit repro via pwntools (see [tools/pwntools.md](../tools/pwntools.md)). Capture exit code, signal if any, core dump if generated. |
+| **Bundled-app binary** (Bun SEA, Node SEA, Electron, etc.) | Re-run the exact command. If the operation requires paid quota / blocked network, capture the **app's debug log** (`APP_DEBUG=1 APP_LOG_LEVEL=debug APP_LOG_FILE=/tmp/trace.log`) which usually emits the assembled request before sending. See [methodology/partial-runtime-evidence.md](partial-runtime-evidence.md) for combining partial signals into a defensible verification. |
+| **Long-running daemon** | Start fresh, let it run for the amount of time the bug originally took to manifest (not less), capture resource usage (memory, fd, cpu) throughout. Short-running QA misses resource leaks and cumulative state bugs. |
+---
+## Journal format
+Every QA run goes in the journal under "Findings":
+````markdown
+### Manual QA — <product type> (<ISO timestamp>)
+- Scenario: <one line describing what you did>
+- Command: `<exact invocation>`
+- Observed output:
+```
+<verbatim output, trimmed to relevant section>
+```
+- Expected output: <what correct behavior looks like>
+- Fix verified: yes / no / partial — <details>
+````
+If any QA step shows **partial or regressed behavior**, this is not "mostly done" — it's incomplete. Return to Phase 6.
+---
+## The silent-failure check (always run)
+Regardless of product type, audit the fix against these silent-failure patterns. If the original bug was a silent failure, the same pattern may exist in adjacent code that you haven't tested yet.
+### Universal silent-failure signals
+- HTTP 2xx with empty or default body
+- Response `ok: true` but a sub-field contains an error token (e.g. `stopReason: "error"`, `status: "failed"`)
+- `usage.totalTokens === 0` on an LLM response
+- Process exit code 0 but stderr contains an exception traceback
+- Panic recovered and logged but ignored
+- Goroutine / task / promise rejection with no top-level handler
+- `try { ... } catch { /* swallowed */ }` or `except: pass`
+- Success-shaped response whose semantic field indicates failure (e.g. `error` is expected to be `null` but actually holds `"..."`, and the caller's falsy check misses it)
+- Write returned success but read-back shows stale data
+- Job marked complete but side-effect did not happen
+- Cache hit path returned stale data and no refresh was triggered
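A few of the signals above can be checked mechanically on a captured response. The following is a rough sketch, assuming the response has already been parsed into a status code and a dict; `silent_failure_signals` and `ERROR_TOKENS` are hypothetical names, and the field names (`stopReason`, `status`, `usage.totalTokens`, `error`) are taken from the examples in the list.

```python
from typing import Any, List, Mapping, Optional

ERROR_TOKENS = {"error", "failed"}


def silent_failure_signals(status: int,
                           body: Optional[Mapping[str, Any]]) -> List[str]:
    """List the silent-failure signals found in one HTTP-style response."""
    signals: List[str] = []
    if not 200 <= status < 300:
        return signals  # a non-2xx status is a loud failure, not a silent one
    if not body:
        signals.append("2xx with empty body")
        return signals
    # Sub-fields that carry an error token despite the success status.
    for field in ("stopReason", "status"):
        if str(body.get(field, "")).lower() in ERROR_TOKENS:
            signals.append(f"{field} carries an error token")
    usage = body.get("usage")
    if isinstance(usage, Mapping) and usage.get("totalTokens") == 0:
        signals.append("usage.totalTokens is 0")
    if body.get("error"):
        signals.append("error field is populated on a success response")
    return signals
```

An empty result only means none of these particular patterns matched; it is not evidence that the response is correct.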
+### Language-specific silent-failure signals
+Check the runtime reference for additional patterns:
+- [runtimes/python.md](../runtimes/python.md) — asyncio task exceptions, bare `except`, `logging.exception` that goes nowhere
+- [runtimes/node.md](../runtimes/node.md) — unhandled promise rejections, `void` on async, swallowed `.catch(() => {})`
+- [runtimes/rust.md](../runtimes/rust.md) — `.unwrap_or_default()`, `let _ = result`, error variants discarded
+- [runtimes/go.md](../runtimes/go.md) — `if err != nil { return err }` that never reaches user output, recovered panics, buffered channels that block silently
+- [runtimes/native-binary.md](../runtimes/native-binary.md) — ignored return codes from libc, missing `perror`, `alarm()` / signal masks
+- [runtimes/bundled-js-binary.md](../runtimes/bundled-js-binary.md) — `process.env.X` baked at build time, dead code from tree-shaking failures, worker sub-bundles diverging from main bundle
+### What to do when you find another silent-failure spot
+Don't fix it. This is out of scope for the current bug.
+Note it in the journal under a "Follow-ups" section with:
+- File:line
+- Pattern matched
+- Proposed fix sketch (one line)
+- Risk level (what happens if left unfixed)
+Surface these to the user in the final message under "Next steps I didn't take".
+---
+## The "fix verified" bar
+"Fix verified" means: the exact original failing scenario, re-run, now produces the correct output. Not a similar scenario. Not a unit test of the fix. The original scenario.
+If you can't re-run the original scenario (e.g. it required a specific data state that's gone), construct the closest equivalent and document the difference in the journal. Escalate to the user if the equivalent is materially different.

package/plugins/litclaude/skills/debugging/references/methodology/09-cleanup.md ADDED

@@ -0,0 +1,164 @@
+# Phase 9 + 10 — Cleanup & Final Verification
+The working tree after the session must differ from before only by the real fix and its test. Anything else is a process failure.
+---
+## Phase 9 — Cleanup & Revert
+### The walk
+Open the journal's "Artifacts to revert" list. Walk it top to bottom. Check each box only after the revert command succeeds and produces no error.
+### Standard revert operations
+Most sessions create some combination of these artifacts. The commands below are the defaults — your journal should have the exact commands for this session.
+```bash
+# --- Temporary source edits (instrumentation statements, debug prints) ---
+git checkout -- <file>                           # reverts only that file ("--" keeps a path from being read as a branch name)
+git diff <file>                                  # verify clean
+# --- tmux sessions ---
+tmux kill-session -t <session-name>
+tmux ls                                          # confirm gone
+# --- Temp fixtures / scratch scripts ---
+rm -f /tmp/debug-*.*
+ls /tmp/debug-*.* 2>/dev/null                    # confirm gone (ls returns non-zero when no match)
+# --- Background processes (debugger-attached runtimes) ---
+pkill -f 'node --inspect' || true
+pkill -f 'python -m pdb' || true
+pkill -f 'debugpy' || true
+pkill -f 'dlv' || true
+pkill -f 'gdb' || true
+pkill -f 'lldb' || true
+# --- Debug-relevant ports confirmed free ---
+lsof -iTCP:9229 -sTCP:LISTEN -nP 2>/dev/null     # Node inspector default
+lsof -iTCP:5678 -sTCP:LISTEN -nP 2>/dev/null     # debugpy default
+lsof -iTCP:2345 -sTCP:LISTEN -nP 2>/dev/null     # dlv (conventional port, set via --listen)
+lsof -iTCP:9999 -sTCP:LISTEN -nP 2>/dev/null     # gdbserver (no fixed default; use the port recorded in your journal)
+# --- Env var overrides in current shell ---
+unset DEBUG_OVERRIDE_FOO
+unset PYTHONBREAKPOINT
+unset RUST_LOG
+unset DEBUG
+# --- Ghidra scratch projects (if created just for this session) ---
+# rm -rf ~/ghidra-projects/debug-scratch
+# --- Core dumps from debugging (if any) ---
+rm -f ./core ./core.* ~/core.*
+# --- Playwright trace files ---
+rm -rf playwright-report/ test-results/
+```
+### The verify command
+This is the single most important check of the whole skill:
+```bash
+git status
+git diff --stat
+```
+The diff must contain **only**:
+1. The real fix.
+2. The new failing-first test.
+3. Nothing else.
+### Detector checklist — scan the diff for these
+If `git status` shows any untracked debug file, or `git diff` shows any of the patterns below, **you are not done**. Clean it.
+| Pattern | Usually means |
+|---|---|
+| `debugger;` | Node debug statement left behind |
+| `breakpoint()` | Python debug statement left behind |
+| `dbg!(...)` | Rust debug macro left behind |
+| `fmt.Println("DEBUG: ...")` | Go ad-hoc print |
+| `console.log("[DEBUG]` | Node ad-hoc log |
+| `print(f"DEBUG: ` | Python ad-hoc print |
+| `// TODO DEBUG`, `// HACK`, `// XXX` | Stale debug marker |
+| `// <PROJECT>-DEBUG` | Session-specific marker from this skill's edits |
+| Commented-out code blocks near the fix | Dead code from trial fixes |
+| Reordered imports or formatting in unrelated files | Drift from your editor's autoformat during the session |
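The textual patterns in the table can be scanned for automatically. The sketch below is a heuristic helper, not part of the skill: `scan_diff` and `DEBUG_PATTERNS` are hypothetical names, the regexes are an approximation of the table rows, and the input is assumed to be the text produced by `git diff`. Commented-out code and autoformat drift still need a human read of the diff.

```python
import re
from typing import List, Tuple

# Approximate regexes for the textual rows of the detector checklist.
DEBUG_PATTERNS = [
    r"\bdebugger;",
    r"\bbreakpoint\(\)",
    r"\bdbg!\(",
    r'fmt\.Println\("DEBUG',
    r'console\.log\("\[DEBUG\]',
    r'print\(f?"DEBUG',
    r"//\s*(?:TODO DEBUG|HACK|XXX)",
]
_COMBINED = re.compile("|".join(f"(?:{p})" for p in DEBUG_PATTERNS))


def scan_diff(diff_text: str) -> List[Tuple[int, str]]:
    """Return (line number within the diff, text) for added lines that match."""
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; skip the "+++ b/file" header line.
        if line.startswith("+") and not line.startswith("+++"):
            added = line[1:]
            if _COMBINED.search(added):
                hits.append((number, added.strip()))
    return hits


sample_diff = "\n".join([
    "+++ b/app.py",
    "@@ -1,2 +1,4 @@",
    " def handler(req):",
    "+    breakpoint()",
    '+    print(f"DEBUG: {req}")',
    "+    return resolve(req)",
    "-    return None",
])
```

Calling `scan_diff(sample_diff)` flags the `breakpoint()` and `print(f"DEBUG: ...")` lines and leaves the real change alone.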
+### Remove the journal
+Only once the git check is clean:
+```bash
+rm .debug-journal.md
+sed -i.bak '/^\.debug-journal\.md$/d' .git/info/exclude && rm -f .git/info/exclude.bak
+```
+The journal is not part of the fix; it doesn't belong in the commit or in the git exclude list.
+---
+## Phase 10 — Final Verification
+Last gate before reporting done. All four gates must be true, and all four must have **evidence in your final message** to the user. Passing a gate without evidence is the same as failing it.
+### The four gates
+1. **Red→green toggle confirmed** — show the failing test output from before the fix and passing output after. Both outputs visible in the reply or the journal.
+2. **Full test suite green** — show the suite's final pass line (e.g. `42 passed in 3.14s`). Not just the new test.
+3. **Manual QA reproduced the fix** — show the command or scenario that originally failed and its now-correct output. Verbatim, not paraphrased.
+4. **Working tree clean of debug artifacts** — show `git diff --stat` output containing only fix + test, plus `git status` clean of untracked debug files.
+If any of the four lacks evidence, you have not finished — return to the appropriate phase.
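The rule that a gate without evidence counts as failed can be made explicit with a small record. This is only a sketch: `GateEvidence` and `missing_gates` are hypothetical names, and each field is assumed to hold the verbatim evidence text you intend to paste into the final message.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class GateEvidence:
    """Verbatim evidence for each of the four gates; empty means none collected."""
    red_green_toggle: str = ""
    full_suite: str = ""
    manual_qa: str = ""
    clean_tree: str = ""


def missing_gates(evidence: GateEvidence) -> List[str]:
    """Names of gates with no evidence, in gate order; an empty list means done."""
    return [f.name for f in fields(evidence)
            if not getattr(evidence, f.name).strip()]
```

For example, `missing_gates(GateEvidence(full_suite="42 passed in 3.14s"))` reports the other three gates as still open.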
+### Final message template
+Keep it short. Evidence-dense. The user should be able to skim it in 30 seconds.
+````markdown
+Fixed.
+**Root cause**: <one sentence — the mechanism, not the symptom>
+**Fix**: `<file:line>` — <two words>
+**Test**: `<test file>::<test name>` — red without fix, green with fix
+**QA**: <one line describing what you ran and what you saw>
+Diff:
+```
+<git diff --stat output — should be tiny>
+```
+**Next steps I didn't take** (awaiting your decision):
+- <follow-up 1, if any — from QA silent-failure scan or refactor opportunities noted during Phase 7>
+- <follow-up 2 — or "none" if nothing else surfaced>
+````
+### Example (from a real session)
+````markdown
+Fixed.
+**Root cause**: pi-mono Agent's `model.baseUrl` was hardcoded to `api.anthropic.com`, so the `ANTHROPIC_BASE_URL` env var was silently ignored. The proxy API key was rejected by the real Anthropic API with 401, but pi-mono packaged the error into the assistant message's `errorMessage` field instead of throwing, so the route's try/catch never fired and the client received HTTP 200 with empty content.
+**Fix**: `core/pi-bridge/modelResolver.ts:117` — override baseUrl
+**Test**: `__tests__/core/modelResolver.test.ts::resolves_env_override` — red without fix, green with fix
+**QA**: `curl -X POST /api/refinement/chat` with proxy env set, observed non-zero usage and non-empty content
+Diff:
+```
+ core/pi-bridge/modelResolver.ts              | 3 +++
+ __tests__/core/modelResolver.test.ts         | 42 ++++++++++++++++++++++
+ 2 files changed, 45 insertions(+)
+```
+**Next steps I didn't take** (awaiting your decision):
+- pi-mono itself silently swallows LLM errors into `errorMessage`; adding a throw-on-error wrapper at our orchestrator layer would surface these upstream
+- Same silent-failure pattern exists in the planning route — likely the same fix applies
+````

package/plugins/litclaude/skills/debugging/references/methodology/partial-runtime-evidence.md ADDED

@@ -0,0 +1,228 @@
+# Partial Runtime Evidence — When You Cannot Execute the Real Operation
+Read this when the principle that **runtime truth beats code reading** conflicts with the fact that **you cannot run the actual operation**.
+The skill's first invariant is "runtime state is the only source of truth." But sometimes the only state you can produce is a *partial* observation — the real call requires paid credits, a hardware device you don't have, network access through a corporate proxy, a production secret, or a customer dataset.
+**Partial runtime evidence is still runtime evidence.** This reference tells you which partial signals to harvest and how to combine them so the conclusion is defensible.
+---
+## When this applies
+Use this reference when ALL are true:
+1. The bug or extraction question requires runtime confirmation (per skill invariant #1).
+2. You attempted the obvious "just run it" path and it failed for reasons unrelated to the bug:
+   - 401/402/403 from a paid API
+   - "device not found" / "permission denied" / SIP block
+   - Production-only credentials
+   - Network isolation (air-gapped, behind VPN you don't have)
+   - Time-of-day or quota limits
+3. **Mocking the entire system** would defeat the verification — you specifically need evidence about how the *real* code behaves, not a stub.
+If only #1 and #2 are true and you can mock cleanly, just mock and proceed. This file is for cases where mocking would invalidate the answer.
+---
+## The hierarchy of partial evidence (strongest first)
+When you cannot capture the full outbound payload and the full response, capture as much as possible from this list. **Evidence further down the list relies on more inference; evidence higher up is closer to ground truth.**
+### Tier 1 — Pre-send / post-receive logs (best partial evidence)
+The system you're investigating builds a request, then sends it. If the build step logs the assembled request **before** transmission, that log is ground truth for everything except the wire-level bytes (TLS, headers added by HTTP library, etc.).
+```bash
+# Maximize debug logging
+APP_DEBUG=1 APP_LOG_LEVEL=debug APP_LOG_FILE=/tmp/trace.log ./target -x "minimal valid input" 2>&1 | head -200
+```
+Look for log lines like:
+- `Building request: model=X, params={...}`
+- `[provider] payload: {...}`
+- `Sending to <url>: <serialized body>`
+**Strength**: 95% of ground truth. Missing only wire-level transformations.
+### Tier 2 — Local interception via proxy / shim
+Run the real binary against a local proxy that records and (optionally) returns a canned response.
+```bash
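+# mitmproxy approach. mitmproxy is an interactive TUI, so run it in its own tmux pane
+# (mitmdump is the non-interactive variant if you need to background it).
+mitmproxy --listen-host 127.0.0.1 --listen-port 8888 --mode regular
+# In a second pane, point the target at the proxy and trust the mitmproxy CA:
+HTTPS_PROXY=http://127.0.0.1:8888 SSL_CERT_FILE=~/.mitmproxy/mitmproxy-ca-cert.pem ./target ...
+# mitmproxy now shows the actual TLS-decrypted request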
+```
+```bash
+# DYLD_INSERT_LIBRARIES / LD_PRELOAD shim approach
+# Wrap the network call to log payload, return a fake 200
+# See pwntools.md for shim examples
+```
+**Strength**: Wire-level ground truth, but requires the target to honor your proxy / preload.
+### Tier 3 — Static extraction × runtime fingerprint cross-check
+When you cannot send a request at all, you can still cross-check static analysis with whatever the binary does that *doesn't* require the real call:
+- The binary builds the request — even if sending fails, the build step ran. Trace it (Tier 1).
+- The binary writes a state file or cache — read it.
+- The binary emits version-specific User-Agent strings; verify they match your static extraction.
+- The binary's `--help` or `--version` output reveals build metadata; verify model lists / feature flags.
+**Strength**: Disjoint evidence sources confirming the same fact. Two independent partial signals that agree are nearly as strong as one full observation.
+### Tier 4 — Contrastive runtime under different inputs
+If you can run with input variant A but not B, run A and reason about B from code:
+```bash
+# A: minimal trial input — works for free tier
+./target --action=read --resource=local-file
+# B: full inference call — paid tier required, blocked
+# But the request-building code is shared between A and B!
+# Capture A's logs, then inspect the code path for B and verify only the model/endpoint diff.
+```
+**Strength**: Confirms shared code paths; remaining gap is only the difference between A and B.
+### Tier 5 — Vendor-published API logs / dashboard
+If the operation succeeded earlier (before quota ran out, before access was revoked), the vendor's dashboard / audit log may show the request. Lower fidelity but still observed behavior.
+**Strength**: Real wire data, but often summarized — token counts, status codes, no payload bodies.
+### Tier 6 — Pure code reading with peer review
+If literally none of the above is available, read the code carefully and submit it to **one Oracle for skeptical review** (see "Verification Oracle" below). This is the weakest tier and you must explicitly mark conclusions as "unverified" in the journal.
+---
+## How to combine partial signals
+A defensible conclusion **prefers two independent signals from different tiers**, with one exception: a complete Tier 2 wire-level capture is wire-level ground truth and can stand alone for request-shape claims (because the wire bytes are exactly what the remote received). For *behavioral* claims (what the system does next, what state it stores, what side effects it produces), still combine with another signal.
+| Available evidence | Defensibility |
+|---|---|
+| Tier 1 + Tier 1 (same log, different lines) | weak — single source |
+| Tier 1 + Tier 2 (debug log + proxy capture) | **strong** — independent confirmation |
+| Tier 1 + Tier 3 (debug log + version output cross-check) | **strong** — disjoint sources |
+| Tier 2 alone (full proxy capture) | strong **for request-shape claims only** — stands alone for "what bytes were sent". Add a second signal for response-handling or state claims. |
+| Tier 3 + Tier 4 (cross-check + contrastive run) | medium — both partial |
+| Tier 6 alone (code reading only) | **insufficient** — escalate or mark unverified |
+Record in the journal:
+```markdown
+## Partial runtime evidence
+### Question being verified
+<the specific claim, e.g. "Opus 4.7 default effort is 'high'">
+### Available signals
+- Tier 1: debug log /tmp/trace.log lines 47-49 show `effort: "high"` ✓
+- Tier 3: static extraction of m5T() function returns "high" for smart mode ✓
+- Tier 6: code path verified by reading prompt-builder.js ✓
+### Independence assessment
+Tier 1 and Tier 3 are independent — the log was emitted by a different
+code path than m5T() and would diverge if the static reading were wrong.
+### Conclusion
+VERIFIED via Tier 1 + Tier 3 agreement. No need to escalate.
+```
+If you cannot achieve a complete Tier 2 capture **or** two independent non-Tier-6 signals from the table above, **write an explicit note in the deliverable**:
+> ⚠️ Partial-evidence finding. The full outbound payload could not be captured because [reason]. The conclusion rests on:
+> - [signal A — tier and source]
+> - [signal B — tier and source]
+> A future verification should attempt [the missing tier] when [condition].
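The decision of whether the partial-evidence note is required can be written down as a small rule. The sketch below is one reading of the rule above, not the skill's own implementation: `Signal` and `needs_partial_evidence_note` are hypothetical names, independence is approximated by "different source label", and any two independent non-Tier-6 signals are treated as sufficient even where the table rates the pair only "medium".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signal:
    tier: int               # 1..6, following the hierarchy of partial evidence
    source: str             # where the evidence came from (hypothetical label)
    complete: bool = False  # True for a full capture; only meaningful for Tier 2


def needs_partial_evidence_note(signals: List[Signal]) -> bool:
    """True when the deliverable must carry the explicit partial-evidence note."""
    # A complete Tier 2 capture stands alone (for request-shape claims only).
    if any(s.tier == 2 and s.complete for s in signals):
        return False
    # Otherwise require two non-Tier-6 signals from different sources.
    independent_sources = {s.source for s in signals if s.tier != 6}
    return len(independent_sources) < 2
```

Two lines from the same debug log share one source label, so they count once, which matches the "weak, single source" row of the table.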
+---
+## Verification Oracle pattern (for non-debug tasks)
+The skill's main Oracle Triple (`04-oracle-triple.md`) is for **stuck debugging** — 2 failed rounds, mental box, three orthogonal framings to break out.
+For tasks where the deliverable is an **artifact, not a bug fix** (reverse engineering, extraction, audit, compliance documentation), use a different pattern: **single Oracle, late, skeptical, with the deliverable in hand**.
+### When to invoke
+- Right before declaring an extraction/audit task "done"
+- After every significant revision of the deliverable (not after every small edit)
+- Maximum 3-4 iterations before escalating to user
+### Pattern
+Use a Claude Code review lane or verifier subagent with this prompt shape:
+```text
+SKEPTICAL FINAL VERIFICATION — be critical, look for reasons the task is incomplete or wrong.
+## Original task
+<verbatim user request>
+## What I produced
+<list of artifacts with paths and brief descriptions>
+## Specific claims to verify
+<bullet list of every concrete claim in the deliverable>
+## Where to look
+<paths the Oracle should Read / Bash to verify>
+## Your job
+1. Read the deliverables.
+2. Spot-check each claim against the source/evidence the deliverable cites.
+3. Identify any unsubstantiated claims, missing pieces, or factual errors.
+4. End with PASS / FAIL / PARTIAL with specific gaps.
+Be skeptical. Don't rubber-stamp.
+```
+### Why this differs from the Oracle Triple
+| | Oracle Triple (debug) | Verification Oracle (artifact) |
+|---|---|---|
+| Trigger | 2 failed hypothesis rounds | About to declare "done" |
+| Count | 3 in parallel, orthogonal framings | 1 sequential, focused review |
+| Goal | Break out of mental box | Catch unsubstantiated claims |
+| Tone of prompt | Brainstorm wide alternatives | Skeptical audit |
+| Iteration | Reset hypothesis set after | Fix gaps, re-invoke until PASS |
+### Don't conflate them
+If you're stuck debugging, do the Triple. If you have a deliverable and need it audited, do the Verification Oracle. Doing the Triple on a finished extraction will return three diverging "what if you tried…" tangents that are not what you need. Doing the Verification Oracle on a stuck debugging session will return a polite "the evidence is incomplete" that you already knew.
+---
+## Common partial-evidence anti-patterns
+| Anti-pattern | Why it fails | Replacement |
+|---|---|---|
+| "It looks right in the code, so it works" | Tier 6 alone, unverified | Add at least one Tier 1-3 signal |
+| "I ran it once, didn't error, so it's correct" | Absence of error ≠ presence of correctness | Capture the actual output and verify content |
+| "The mock returns the value I wrote, so the code is fine" | Tautology — mock loops back your assumption | Use Tier 2 (proxy) instead, or cross-check with Tier 3 |
+| "The vendor's dashboard shows my call worked" | Dashboard often only shows status code, not behavior | Combine with Tier 1 if available |
+| "I'll trust the most recent Stack Overflow answer" | Code from a different version / context | Verify against the actual binary you have |
+---
+## Cleanup additions for partial-evidence work
+```bash
+# Proxy artifacts
+pkill -f mitmproxy 2>/dev/null
+rm -f ~/.mitmproxy/cache_* 2>/dev/null
+# Debug log files
+rm -f /tmp/trace.log /tmp/*-debug-trace.log
+# DYLD_INSERT / LD_PRELOAD shim libraries
+rm -f /tmp/*.dylib /tmp/*.so
+# Verify env vars set in your shell are not persisted
+unset HTTPS_PROXY APP_DEBUG APP_LOG_LEVEL APP_LOG_FILE 2>/dev/null
+```