litclaude-ai 0.2.2

Files changed (156)
  1. package/CHANGELOG.md +155 -0
  2. package/LICENSE +21 -0
  3. package/README.md +369 -0
  4. package/README_ko-KR.md +374 -0
  5. package/RELEASE_CHECKLIST.md +165 -0
  6. package/bin/litclaude-ai.js +643 -0
  7. package/cover.png +0 -0
  8. package/docs/agents.md +67 -0
  9. package/docs/hooks.md +134 -0
  10. package/docs/lsp.md +40 -0
  11. package/docs/migration.md +209 -0
  12. package/docs/workflow-compatibility-audit.md +119 -0
  13. package/generate_cover.py +123 -0
  14. package/package.json +48 -0
  15. package/plugins/litclaude/.claude-plugin/plugin.json +25 -0
  16. package/plugins/litclaude/.lsp.json +13 -0
  17. package/plugins/litclaude/.mcp.json +9 -0
  18. package/plugins/litclaude/agents/boulder-executor.md +12 -0
  19. package/plugins/litclaude/agents/librarian-researcher.md +15 -0
  20. package/plugins/litclaude/agents/oracle-verifier.md +16 -0
  21. package/plugins/litclaude/agents/prometheus-planner.md +13 -0
  22. package/plugins/litclaude/agents/qa-runner.md +16 -0
  23. package/plugins/litclaude/agents/quality-reviewer.md +17 -0
  24. package/plugins/litclaude/bin/litclaude-hook.js +110 -0
  25. package/plugins/litclaude/bin/litclaude-hud.js +271 -0
  26. package/plugins/litclaude/bin/litclaude-lsp-doctor.js +15 -0
  27. package/plugins/litclaude/bin/litclaude-mcp.js +70 -0
  28. package/plugins/litclaude/commands/deep-interview.md +21 -0
  29. package/plugins/litclaude/commands/dynamic-workflow.md +36 -0
  30. package/plugins/litclaude/commands/lit-loop.md +40 -0
  31. package/plugins/litclaude/commands/lit-plan.md +35 -0
  32. package/plugins/litclaude/commands/litgoal.md +30 -0
  33. package/plugins/litclaude/commands/review-work.md +35 -0
  34. package/plugins/litclaude/commands/start-work.md +36 -0
  35. package/plugins/litclaude/hooks/hooks.json +54 -0
  36. package/plugins/litclaude/lib/context-pressure.mjs +25 -0
  37. package/plugins/litclaude/lib/hud-accent-palette.mjs +58 -0
  38. package/plugins/litclaude/lib/litgoal/cli.mjs +266 -0
  39. package/plugins/litclaude/lib/litgoal/ledger.mjs +16 -0
  40. package/plugins/litclaude/lib/litgoal/paths.mjs +7 -0
  41. package/plugins/litclaude/lib/litgoal/state.mjs +67 -0
  42. package/plugins/litclaude/lib/mutated-file-paths.mjs +63 -0
  43. package/plugins/litclaude/lib/start-work-continuation.mjs +99 -0
  44. package/plugins/litclaude/lib/workflow-check.mjs +83 -0
  45. package/plugins/litclaude/skills/ai-slop-remover/SKILL.md +142 -0
  46. package/plugins/litclaude/skills/comment-checker/SKILL.md +55 -0
  47. package/plugins/litclaude/skills/debugging/SKILL.md +70 -0
  48. package/plugins/litclaude/skills/debugging/references/methodology/00-setup.md +108 -0
  49. package/plugins/litclaude/skills/debugging/references/methodology/02-investigate.md +126 -0
  50. package/plugins/litclaude/skills/debugging/references/methodology/04-oracle-triple.md +106 -0
  51. package/plugins/litclaude/skills/debugging/references/methodology/05-escalate.md +69 -0
  52. package/plugins/litclaude/skills/debugging/references/methodology/06-fix.md +116 -0
  53. package/plugins/litclaude/skills/debugging/references/methodology/08-qa.md +94 -0
  54. package/plugins/litclaude/skills/debugging/references/methodology/09-cleanup.md +164 -0
  55. package/plugins/litclaude/skills/debugging/references/methodology/partial-runtime-evidence.md +228 -0
  56. package/plugins/litclaude/skills/debugging/references/runtimes/bundled-js-binary.md +415 -0
  57. package/plugins/litclaude/skills/debugging/references/runtimes/go.md +252 -0
  58. package/plugins/litclaude/skills/debugging/references/runtimes/native-binary.md +484 -0
  59. package/plugins/litclaude/skills/debugging/references/runtimes/node.md +260 -0
  60. package/plugins/litclaude/skills/debugging/references/runtimes/python.md +248 -0
  61. package/plugins/litclaude/skills/debugging/references/runtimes/rust.md +234 -0
  62. package/plugins/litclaude/skills/debugging/references/tools/ghidra.md +212 -0
  63. package/plugins/litclaude/skills/debugging/references/tools/playwright-cli.md +194 -0
  64. package/plugins/litclaude/skills/debugging/references/tools/pwndbg.md +263 -0
  65. package/plugins/litclaude/skills/debugging/references/tools/pwntools.md +265 -0
  66. package/plugins/litclaude/skills/deep-interview/SKILL.md +323 -0
  67. package/plugins/litclaude/skills/deep-interview/scripts/render_progress.py +193 -0
  68. package/plugins/litclaude/skills/frontend-ui-ux/SKILL.md +62 -0
  69. package/plugins/litclaude/skills/lit-loop/SKILL.md +144 -0
  70. package/plugins/litclaude/skills/lit-plan/SKILL.md +125 -0
  71. package/plugins/litclaude/skills/litgoal/SKILL.md +219 -0
  72. package/plugins/litclaude/skills/lsp/SKILL.md +63 -0
  73. package/plugins/litclaude/skills/programming/SKILL.md +106 -0
  74. package/plugins/litclaude/skills/programming/references/go/README.md +90 -0
  75. package/plugins/litclaude/skills/programming/references/go/backend-stack.md +641 -0
  76. package/plugins/litclaude/skills/programming/references/go/bootstrap.md +328 -0
  77. package/plugins/litclaude/skills/programming/references/go/bubbletea-v2.md +360 -0
  78. package/plugins/litclaude/skills/programming/references/go/cobra-stack.md +468 -0
  79. package/plugins/litclaude/skills/programming/references/go/concurrency.md +362 -0
  80. package/plugins/litclaude/skills/programming/references/go/data-modeling.md +329 -0
  81. package/plugins/litclaude/skills/programming/references/go/error-handling.md +359 -0
  82. package/plugins/litclaude/skills/programming/references/go/golangci-strict.md +236 -0
  83. package/plugins/litclaude/skills/programming/references/go/grpc-connect.md +375 -0
  84. package/plugins/litclaude/skills/programming/references/go/libraries.md +337 -0
  85. package/plugins/litclaude/skills/programming/references/go/one-liners.md +202 -0
  86. package/plugins/litclaude/skills/programming/references/go/sqlc-pgx.md +471 -0
  87. package/plugins/litclaude/skills/programming/references/go/testing.md +467 -0
  88. package/plugins/litclaude/skills/programming/references/go/type-patterns.md +298 -0
  89. package/plugins/litclaude/skills/programming/references/python/README.md +314 -0
  90. package/plugins/litclaude/skills/programming/references/python/async-anyio.md +442 -0
  91. package/plugins/litclaude/skills/programming/references/python/data-modeling.md +233 -0
  92. package/plugins/litclaude/skills/programming/references/python/data-processing.md +133 -0
  93. package/plugins/litclaude/skills/programming/references/python/error-handling.md +218 -0
  94. package/plugins/litclaude/skills/programming/references/python/fastapi-stack.md +316 -0
  95. package/plugins/litclaude/skills/programming/references/python/httpx2-optimization.md +360 -0
  96. package/plugins/litclaude/skills/programming/references/python/libraries.md +307 -0
  97. package/plugins/litclaude/skills/programming/references/python/one-liners.md +268 -0
  98. package/plugins/litclaude/skills/programming/references/python/orjson-stack.md +378 -0
  99. package/plugins/litclaude/skills/programming/references/python/pydantic-ai.md +285 -0
  100. package/plugins/litclaude/skills/programming/references/python/pyproject-strict.md +232 -0
  101. package/plugins/litclaude/skills/programming/references/python/textual-tui.md +201 -0
  102. package/plugins/litclaude/skills/programming/references/python/type-patterns.md +176 -0
  103. package/plugins/litclaude/skills/programming/references/rust/README.md +317 -0
  104. package/plugins/litclaude/skills/programming/references/rust/async-tokio.md +299 -0
  105. package/plugins/litclaude/skills/programming/references/rust/axum-stack.md +467 -0
  106. package/plugins/litclaude/skills/programming/references/rust/cargo-strict.md +317 -0
  107. package/plugins/litclaude/skills/programming/references/rust/clap-stack.md +409 -0
  108. package/plugins/litclaude/skills/programming/references/rust/concurrency.md +375 -0
  109. package/plugins/litclaude/skills/programming/references/rust/libraries.md +439 -0
  110. package/plugins/litclaude/skills/programming/references/rust/one-liners.md +291 -0
  111. package/plugins/litclaude/skills/programming/references/rust/proptest-insta.md +429 -0
  112. package/plugins/litclaude/skills/programming/references/rust/type-state.md +354 -0
  113. package/plugins/litclaude/skills/programming/references/rust/unsafe-discipline.md +250 -0
  114. package/plugins/litclaude/skills/programming/references/rust/zero-cost-safety.md +527 -0
  115. package/plugins/litclaude/skills/programming/references/rust-ub/README.md +289 -0
  116. package/plugins/litclaude/skills/programming/references/rust-ub/miri-sanitizers-loom.md +411 -0
  117. package/plugins/litclaude/skills/programming/references/rust-ub/ub-taxonomy.md +269 -0
  118. package/plugins/litclaude/skills/programming/references/typescript/README.md +195 -0
  119. package/plugins/litclaude/skills/programming/references/typescript/backend-hono.md +672 -0
  120. package/plugins/litclaude/skills/programming/references/typescript/bootstrap.md +199 -0
  121. package/plugins/litclaude/skills/programming/references/typescript/data-modeling.md +202 -0
  122. package/plugins/litclaude/skills/programming/references/typescript/error-handling.md +169 -0
  123. package/plugins/litclaude/skills/programming/references/typescript/tsconfig-strict.md +152 -0
  124. package/plugins/litclaude/skills/programming/references/typescript/type-patterns.md +196 -0
  125. package/plugins/litclaude/skills/programming/scripts/go/check-no-excuse-rules.sh +173 -0
  126. package/plugins/litclaude/skills/programming/scripts/go/new-project.py +138 -0
  127. package/plugins/litclaude/skills/programming/scripts/go/templates/.editorconfig +13 -0
  128. package/plugins/litclaude/skills/programming/scripts/go/templates/.golangci.yml +95 -0
  129. package/plugins/litclaude/skills/programming/scripts/go/templates/AGENTS.md.tmpl +24 -0
  130. package/plugins/litclaude/skills/programming/scripts/go/templates/README.md.tmpl +12 -0
  131. package/plugins/litclaude/skills/programming/scripts/go/templates/Taskfile.yml +40 -0
  132. package/plugins/litclaude/skills/programming/scripts/go/templates/ci.yml +37 -0
  133. package/plugins/litclaude/skills/programming/scripts/go/templates/config.go +24 -0
  134. package/plugins/litclaude/skills/programming/scripts/go/templates/gitignore +15 -0
  135. package/plugins/litclaude/skills/programming/scripts/go/templates/main.go.tmpl +22 -0
  136. package/plugins/litclaude/skills/programming/scripts/go/templates/run.go +15 -0
  137. package/plugins/litclaude/skills/programming/scripts/python/check-no-excuse-rules.py +687 -0
  138. package/plugins/litclaude/skills/programming/scripts/python/new-project.py +172 -0
  139. package/plugins/litclaude/skills/programming/scripts/python/new-script.py +116 -0
  140. package/plugins/litclaude/skills/programming/scripts/rust/check-no-excuse-rules.py +296 -0
  141. package/plugins/litclaude/skills/programming/scripts/rust/check-no-excuse-rules.sh +158 -0
  142. package/plugins/litclaude/skills/programming/scripts/rust/new-project.py +175 -0
  143. package/plugins/litclaude/skills/programming/scripts/typescript/check-no-excuse-rules.ts +282 -0
  144. package/plugins/litclaude/skills/programming/scripts/typescript/new-project.ts +177 -0
  145. package/plugins/litclaude/skills/refactor/SKILL.md +73 -0
  146. package/plugins/litclaude/skills/remove-ai-slops/SKILL.md +52 -0
  147. package/plugins/litclaude/skills/review-work/SKILL.md +331 -0
  148. package/plugins/litclaude/skills/rules/SKILL.md +66 -0
  149. package/plugins/litclaude/skills/start-work/SKILL.md +132 -0
  150. package/scripts/audit-plan-checkboxes.mjs +37 -0
  151. package/scripts/doctor.mjs +41 -0
  152. package/scripts/inspect-agent-tools.mjs +27 -0
  153. package/scripts/postinstall.mjs +50 -0
  154. package/scripts/qa-claude-plugin-smoke.sh +60 -0
  155. package/scripts/qa-portable-install.sh +136 -0
  156. package/scripts/validate-plugin.mjs +72 -0
package/plugins/litclaude/skills/debugging/references/methodology/06-fix.md
@@ -0,0 +1,116 @@
1
+ # Phase 6 + 7 — Root Cause Confirmation & TDD Fix
2
+
3
+ A cause is not "confirmed" until you can toggle the bug by toggling the cause. Every other level of evidence is correlation, and correlation-driven fixes ship bugs.
4
+
5
+ ---
6
+
7
+ ## Phase 6 — Root Cause Confirmation
8
+
9
+ You are allowed to call the cause "confirmed" only when ALL THREE of these hold:
10
+
11
+ ### 1. Captured runtime value matches the hypothesis exactly
12
+
13
+ Not "the value looks consistent with" — the value is exactly the value the hypothesis predicted. If your hypothesis was "baseUrl is api.anthropic.com despite ANTHROPIC_BASE_URL being set to a proxy", the captured value is literally `"https://api.anthropic.com"` in the debugger at the moment of the HTTP call.
14
+
15
+ ### 2. Reproducible
16
+
17
+ Running the repro a second time yields the same observation. Flaky repros mean you haven't isolated the cause; you've isolated a symptom that sometimes appears when the cause does. Keep investigating.
18
+
19
+ ### 3. Toggle proof (the one most skipped)
20
+
21
+ **Changing the value** (via debugger assignment, env override, or a speculative one-line patch) **makes the bug disappear — and reverting brings the bug back**.
22
+
23
+ If you can't toggle the bug by toggling the suspected cause, what you have is a correlation, not a mechanism. A correlation is a strong hypothesis, not a confirmed cause.
24
+
25
+ Examples of a valid toggle proof:
26
+
27
+ | Suspected cause | Toggle |
28
+ |---|---|
29
+ | Env var overrides library default, and the override is wrong | Unset the env var → bug goes away. Reset it → bug comes back. |
30
+ | Async task is not awaited | Add `await` → bug goes away. Remove `await` → bug comes back. |
31
+ | Third-party SDK uses hardcoded URL | Monkey-patch SDK to use env URL → bug goes away. Unpatch → bug comes back. |
32
+ | Race condition on shared state | Add a mutex → bug goes away under load. Remove mutex → bug comes back under load. |
33
+
34
+ If you can't construct a toggle proof, you haven't confirmed the cause. Run one more round.
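
The three-step shape of a toggle proof (bug present, bug gone with the change, bug back after reverting) can be sketched in a few lines. This is a minimal illustration only: `resolve_base_url_buggy`, `resolve_base_url_patched`, `bug_present`, and `toggle_proof` are hypothetical names modelled on the `ANTHROPIC_BASE_URL` example above, not code from any real SDK.

```python
from typing import Callable, Dict

DEFAULT_BASE_URL = "https://api.anthropic.com"  # value named in the hypothesis

Resolver = Callable[[Dict[str, str]], str]


def resolve_base_url_buggy(env: Dict[str, str]) -> str:
    """Hypothetical suspected cause: the env override is ignored."""
    return DEFAULT_BASE_URL


def resolve_base_url_patched(env: Dict[str, str]) -> str:
    """Speculative one-line patch: honor ANTHROPIC_BASE_URL when it is set."""
    return env.get("ANTHROPIC_BASE_URL", DEFAULT_BASE_URL)


def bug_present(resolver: Resolver, env: Dict[str, str]) -> bool:
    """Repro: a proxy is configured, yet a different URL would be called."""
    override = env.get("ANTHROPIC_BASE_URL")
    return override is not None and resolver(env) != override


def toggle_proof(env: Dict[str, str]) -> bool:
    """True only if the bug follows the suspected cause in both directions."""
    before = bug_present(resolve_base_url_buggy, env)         # bug observed
    with_change = bug_present(resolve_base_url_patched, env)  # bug disappears
    reverted = bug_present(resolve_base_url_buggy, env)       # bug comes back
    return before and not with_change and reverted
```

If `toggle_proof` returns `False` for the environment that reproduced the bug, the suspected cause is still only a correlation.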
35
+
36
+ ### Update the journal
37
+
38
+ ```markdown
39
+ ## Root cause (confirmed <ISO timestamp>)
40
+ - Mechanism: <one paragraph, causal not correlational — the chain from cause to observable symptom>
41
+ - Evidence: <file:line of captured value | path to saved repro | address + register state>
42
+ - Toggle proof: "With <change X>, repro produces <good>. Reverting <change X>, repro produces <bad>."
43
+ - Fix scope: <files and approximate line count>
44
+ ```
45
+
46
+ The "mechanism" field is the acid test. If you can't write the causal chain from cause to observable symptom as one paragraph, you don't yet understand the bug well enough to fix it.
47
+
48
+ ---
49
+
50
+ ## Phase 7 — TDD Fix
51
+
52
+ Red, green, refactor. No shortcuts.
53
+
54
+ ### 1. Red — failing-first test
55
+
56
+ Write a test that fails *specifically because of this bug*. Requirements:
57
+
58
+ - **Test name reads like a bug report.** `test_refinement_turn_returns_empty_content_when_anthropic_returns_401` is good. `test_bug_fix` is not.
59
+ - **Failure message clearly shows what the bug looks like.** If someone reads only the failure output, they understand what's broken.
60
+ - **Minimum infrastructure.** Don't spin up the whole server if a unit test against the right seam captures the mechanism.
61
+
62
+ Run the test. Confirm it fails. Paste the failure output into the journal:
63
+
64
+ ```markdown
65
+ ### Red phase (<ISO timestamp>)
66
+ Test: <path>::<name>
67
+ Command: <exact invocation>
68
+ Output:
69
+ ```
70
+ <verbatim failure output>
71
+ ```
72
+ Confirms: the bug is reproducible at the test-harness level, not just the manual repro.
73
+ ```
74
+
75
+ ### 2. Green — minimum change
76
+
77
+ Make the test pass with the **smallest change that fully fixes the observed mechanism**.
78
+
79
+ If the diff is larger than ~30 lines and you aren't refactoring, something is wrong — either you're fixing more than the bug, or the root cause was deeper than you confirmed. Back to Phase 6.
80
+
81
+ Signs you're over-fixing:
82
+ - Adding "just in case" null checks or try/except around other code
83
+ - Refactoring adjacent functions because "while I'm here"
84
+ - Adding new configuration options the bug didn't require
85
+ - Introducing new abstractions to "make this cleaner"
86
+
87
+ Resist all of these. Fix the bug. Note the surrounding issues for follow-up. Move on.
88
+
89
+ ### 3. Refactor — ONLY AFTER GREEN
90
+
91
+ Only cleanup directly related to the fix. Do not re-architect.
92
+
93
+ If the code around the fix is rough, note it in the journal as a follow-up for the user; do not expand scope here. Refactoring during a bugfix is how one-line fixes turn into hundred-line diffs nobody can review.
94
+
95
+ ### 4. Regression — full suite green
96
+
97
+ Run the full test suite for the affected package (not just the one new test). Existing tests must still pass.
98
+
99
+ If they don't, your "fix" broke something else. Back to Phase 6 with the new failure as evidence — usually it means the mechanism you thought you fixed was load-bearing for some other code path you didn't know about, and the "broken" test is actually pointing at a better understanding of the system.
100
+
101
+ ### Update the journal
102
+
103
+ ```markdown
104
+ ### Green phase (<ISO timestamp>)
105
+ Fix: <file:line> — <two-line description of the change>
106
+ Test: <path>::<name> now passes
107
+ Full suite: <N tests>, <M failures, should be 0>
108
+ ```
109
+
110
+ ---
111
+
112
+ ## The red-green discipline summary
113
+
114
+ No red test → no proof the fix addresses the reported bug. Only proof it doesn't break tests that already existed.
115
+
116
+ A test written *after* the fix might still pass with the fix reverted. If that's the case, the test doesn't lock the bug — it locks something else. Always verify the test fails without the fix and passes with it. The journal should show both outputs.
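
One way to picture "the test locks the bug" is to run the same test body against the code with and without the fix. The sketch below assumes a toy off-by-one bug; `total_buggy`, `total_fixed`, `passes`, `locks_bug`, and the two `case_*` functions are hypothetical illustrations, not part of this skill.

```python
from typing import Callable, List

Impl = Callable[[List[int]], int]
Case = Callable[[Impl], None]


def total_buggy(items: List[int]) -> int:
    """Code without the fix: the last item is dropped."""
    return sum(items[:-1])


def total_fixed(items: List[int]) -> int:
    """Code with the fix applied."""
    return sum(items)


def case_total_includes_last_item(total: Impl) -> None:
    assert total([1, 2, 3]) == 6, "last item dropped from total"


def case_total_of_empty_list_is_zero(total: Impl) -> None:
    assert total([]) == 0, "empty list should total 0"


def passes(case: Case, impl: Impl) -> bool:
    """Run one test case against one implementation."""
    try:
        case(impl)
    except AssertionError:
        return False
    return True


def locks_bug(case: Case, without_fix: Impl, with_fix: Impl) -> bool:
    """A test locks the bug only if it is red without the fix and green with it."""
    return not passes(case, without_fix) and passes(case, with_fix)
```

Here `case_total_of_empty_list_is_zero` passes on both versions, so it proves nothing about this bug even though it is green after the fix.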
package/plugins/litclaude/skills/debugging/references/methodology/08-qa.md
@@ -0,0 +1,94 @@
1
+ # Phase 8 — Manual QA by Actually Using It
2
+
3
+ Tests cover cases you thought of. Real usage covers the ones you didn't.
4
+
5
+ The single fastest way to ship a broken fix is to stop at "tests pass". Manual QA means interacting with the running system the way the user does, then comparing observed behavior to the original bug report.
6
+
7
+ ---
8
+
9
+ ## Product-type playbook
10
+
11
+ Pick the row that matches the product. Do what it says. Do not substitute.
12
+
13
+ | Product type | QA means… |
14
+ |---|---|
15
+ | **CLI tool** | Open `tmux`, run the actual command end-to-end, capture output. Paste the session transcript into the journal. Include exit code, stdout, stderr, side-effect check (files created/modified). |
16
+ | **HTTP API** | Start the real server, hit endpoints with `curl` or `httpie`, inspect response status + body + headers. Hit the specific endpoint that reproduced the bug. If there's auth, use real auth. |
17
+ | **Browser-served web app** | **Drive a real browser via Playwright CLI.** See [tools/playwright-cli.md](../tools/playwright-cli.md). Navigate the exact page/flow that reproduced the bug. Capture screenshot + DOM + network evidence. **Do not substitute with curl** — browsers have state (cookies, localStorage, service workers, client-side JS, viewport-dependent CSS) that curl does not have. |
18
+ | **Agent / LLM pipeline** | Run the same user prompt that originally failed. Capture the full turn — tool calls, messages, usage counters. **Confirm non-zero usage** (zero usage = still failing silently, see silent-failure check below). |
19
+ | **Background worker / job queue** | Trigger the job through the normal entry point (API call, cron tick, message publish), tail the worker logs, observe completion state in the queue or DB. Don't just call the worker function directly — the trigger path matters. |
20
+ | **MCP server** | Invoke the tool via its actual client (Claude Desktop, Cursor, etc. if available) or `mcp-cli`, not just the HTTP probe endpoint. The MCP handshake itself is sometimes where bugs live. |
21
+ | **Native binary** | Re-run the exact command that crashed / misbehaved. If the input was a file, use the same file. If the bug was exploitable, confirm the exploit repro via pwntools (see [tools/pwntools.md](../tools/pwntools.md)). Capture exit code, signal if any, core dump if generated. |
22
+ | **Bundled-app binary** (Bun SEA, Node SEA, Electron, etc.) | Re-run the exact command. If the operation requires paid quota / blocked network, capture the **app's debug log** (`APP_DEBUG=1 APP_LOG_LEVEL=debug APP_LOG_FILE=/tmp/trace.log`) which usually emits the assembled request before sending. See [methodology/partial-runtime-evidence.md](partial-runtime-evidence.md) for combining partial signals into a defensible verification. |
23
+ | **Long-running daemon** | Start fresh, let it run for the amount of time the bug originally took to manifest (not less), capture resource usage (memory, fd, cpu) throughout. Short-running QA misses resource leaks and cumulative state bugs. |
24
+
25
+ ---
26
+
27
+ ## Journal format
28
+
29
+ Every QA run goes in the journal under "Findings":
30
+
31
+ ```markdown
32
+ ### Manual QA — <product type> (<ISO timestamp>)
33
+ - Scenario: <one line describing what you did>
34
+ - Command: `<exact invocation>`
35
+ - Observed output:
36
+ ```
37
+ <verbatim output, trimmed to relevant section>
38
+ ```
39
+ - Expected output: <what correct behavior looks like>
40
+ - Fix verified: yes / no / partial — <details>
41
+ ```
42
+
43
+ If any QA step shows **partial or regressed behavior**, this is not "mostly done" — it's incomplete. Return to Phase 6.
44
+
45
+ ---
46
+
47
+ ## The silent-failure check (always run)
48
+
49
+ Regardless of product type, audit the fix against these silent-failure patterns. If the original bug was a silent failure, the same pattern may exist in adjacent code that you haven't tested yet.
50
+
51
+ ### Universal silent-failure signals
52
+
53
+ - HTTP 2xx with empty or default body
54
+ - Response `ok: true` but a sub-field contains an error token (e.g. `stopReason: "error"`, `status: "failed"`)
55
+ - `usage.totalTokens === 0` on an LLM response
56
+ - Process exit code 0 but stderr contains an exception traceback
57
+ - Panic recovered and logged but ignored
58
+ - Goroutine / task / promise rejection with no top-level handler
59
+ - `try { ... } catch { /* swallowed */ }` or `except: pass`
60
+ - Success response shape but semantic field indicates failure (e.g. `error: null` actually being `error: "..."` with falsy check)
61
+ - Write returned success but read-back shows stale data
62
+ - Job marked complete but side-effect did not happen
63
+ - Cache hit path returned stale data and no refresh was triggered
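
A few of the signals above are mechanical enough to check in code. The following is a rough sketch, assuming a JSON-like response already parsed into a dict; the field names (`ok`, `stopReason`, `status`, `usage.totalTokens`, `error`) are taken from the list above, and the function name `silent_failure_signals` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

ERROR_TOKENS = ("error", "failed")  # error tokens named in the list above


def silent_failure_signals(status: int, body: Optional[Dict[str, Any]]) -> List[str]:
    """Return the silent-failure signals matched by a success-shaped response."""
    if not 200 <= status < 300:
        return []  # a non-2xx status is a loud failure, not a silent one
    if not body:
        return ["2xx with empty body"]

    signals: List[str] = []
    if body.get("ok") is True:
        for field in ("stopReason", "status"):
            if body.get(field) in ERROR_TOKENS:
                signals.append(f"ok: true but {field} is {body[field]!r}")
    usage = body.get("usage")
    if isinstance(usage, dict) and usage.get("totalTokens") == 0:
        signals.append("usage.totalTokens is 0")
    if body.get("error"):
        signals.append("error field set on a success-shaped response")
    return signals
```

An empty result only means none of these particular patterns matched; the remaining signals in the list (stale reads, missing side effects, swallowed panics) still need a manual check.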
64
+
65
+ ### Language-specific silent-failure signals
66
+
67
+ Check the runtime reference for additional patterns:
68
+
69
+ - [runtimes/python.md](../runtimes/python.md) — asyncio task exceptions, bare `except`, `logging.exception` that goes nowhere
70
+ - [runtimes/node.md](../runtimes/node.md) — unhandled promise rejections, `void` on async, swallowed `.catch(() => {})`
71
+ - [runtimes/rust.md](../runtimes/rust.md) — `.unwrap_or_default()`, `let _ = result`, error variants discarded
72
+ - [runtimes/go.md](../runtimes/go.md) — `if err != nil { return err }` that never reaches user output, recovered panics, buffered channels that block silently
73
+ - [runtimes/native-binary.md](../runtimes/native-binary.md) — ignored return codes from libc, missing `perror`, `alarm()` / signal masks
74
+ - [runtimes/bundled-js-binary.md](../runtimes/bundled-js-binary.md) — `process.env.X` baked at build time, dead code from tree-shaking failures, worker sub-bundles diverging from main bundle
75
+
76
+ ### What to do when you find another silent-failure spot
77
+
78
+ Don't fix it. This is out of scope for the current bug.
79
+
80
+ Note it in the journal under a "Follow-ups" section with:
81
+ - File:line
82
+ - Pattern matched
83
+ - Proposed fix sketch (one line)
84
+ - Risk level (what happens if left unfixed)
85
+
86
+ Surface these to the user in the final message under "Next steps I didn't take".
87
+
88
+ ---
89
+
90
+ ## The "fix verified" bar
91
+
92
+ "Fix verified" means: the exact original failing scenario, re-run, now produces the correct output. Not a similar scenario. Not a unit test of the fix. The original scenario.
93
+
94
+ If you can't re-run the original scenario (e.g. it required a specific data state that's gone), construct the closest equivalent and document the difference in the journal. Escalate to the user if the equivalent is materially different.
package/plugins/litclaude/skills/debugging/references/methodology/09-cleanup.md
@@ -0,0 +1,164 @@
1
+ # Phase 9 + 10 — Cleanup & Final Verification
2
+
3
+ The working tree after the session must differ from before only by the real fix and its test. Anything else is a process failure.
4
+
5
+ ---
6
+
7
+ ## Phase 9 — Cleanup & Revert
8
+
9
+ ### The walk
10
+
11
+ Open the journal's "Artifacts to revert" list. Walk it top to bottom. Check each box only after the revert command succeeds and produces no error.
12
+
13
+ ### Standard revert operations
14
+
15
+ Most sessions create some combination of these artifacts. The commands below are the defaults — your journal should have the exact commands for this session.
16
+
17
+ ```bash
18
+ # --- Temporary source edits (instrumentation statements, debug prints) ---
19
+ git checkout -- <file>            # reverts only that file; `--` keeps the path from being read as a branch name
20
+ git diff <file> # verify clean
21
+
22
+ # --- tmux sessions ---
23
+ tmux kill-session -t <session-name>
24
+ tmux ls # confirm gone
25
+
26
+ # --- Temp fixtures / scratch scripts ---
27
+ rm -f /tmp/debug-*.*
28
+ ls /tmp/debug-*.* 2>/dev/null # confirm gone (ls returns non-zero when no match)
29
+
30
+ # --- Background processes (debugger-attached runtimes) ---
31
+ pkill -f 'node --inspect' || true
32
+ pkill -f 'python -m pdb' || true
33
+ pkill -f 'debugpy' || true
34
+ pkill -f 'dlv' || true
35
+ pkill -f 'gdb' || true
36
+ pkill -f 'lldb' || true
37
+
38
+ # --- Debug-relevant ports confirmed free ---
39
+ lsof -iTCP:9229 -sTCP:LISTEN -nP 2>/dev/null # Node inspector default
40
+ lsof -iTCP:5678 -sTCP:LISTEN -nP 2>/dev/null # debugpy default
41
+ lsof -iTCP:2345 -sTCP:LISTEN -nP 2>/dev/null # dlv default
42
+ lsof -iTCP:9999 -sTCP:LISTEN -nP 2>/dev/null   # gdbserver (no built-in default port; 9999 assumed here, use your session's port)
43
+
44
+ # --- Env var overrides in current shell ---
45
+ unset DEBUG_OVERRIDE_FOO
46
+ unset PYTHONBREAKPOINT
47
+ unset RUST_LOG
48
+ unset DEBUG
49
+
50
+ # --- Ghidra scratch projects (if created just for this session) ---
51
+ # rm -rf ~/ghidra-projects/debug-scratch
52
+
53
+ # --- Core dumps from debugging (if any) ---
54
+ rm -f ./core ./core.* ~/core.*
55
+
56
+ # --- Playwright trace files ---
57
+ rm -rf playwright-report/ test-results/
58
+ ```
59
+
60
+ ### The verify command
61
+
62
+ This is the single most important check of the whole skill:
63
+
64
+ ```bash
65
+ git status
66
+ git diff --stat
67
+ ```
68
+
69
+ The diff must contain **only**:
70
+
71
+ 1. The real fix.
72
+ 2. The new failing-first test.
73
+ 3. Nothing else.
74
+
75
+ ### Detector checklist — scan the diff for these
76
+
77
+ If `git status` shows any untracked debug file, or `git diff` shows any of the patterns below, **you are not done**. Clean it.
78
+
79
+ | Pattern | Usually means |
80
+ |---|---|
81
+ | `debugger;` | Node debug statement left behind |
82
+ | `breakpoint()` | Python debug statement left behind |
83
+ | `dbg!(...)` | Rust debug macro left behind |
84
+ | `fmt.Println("DEBUG: ...")` | Go ad-hoc print |
85
+ | `console.log("[DEBUG]` | Node ad-hoc log |
86
+ | `print(f"DEBUG: ` | Python ad-hoc print |
87
+ | `// TODO DEBUG`, `// HACK`, `// XXX` | Stale debug marker |
88
+ | `// <PROJECT>-DEBUG` | Session-specific marker from this skill's edits |
89
+ | Commented-out code blocks near the fix | Dead code from trial fixes |
90
+ | Reordered imports or formatting in unrelated files | Drift from your editor's autoformat during the session |
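
The textual rows of this checklist can be scanned for automatically. The sketch below is one possible approach, assuming the input is unified diff text such as the output of `git diff`; `DEBUG_PATTERNS` and `scan_added_lines` are hypothetical names, and the regexes cover only the literal patterns from the table. Commented-out code and formatting drift are not detected.

```python
import re
from typing import List, Tuple

# Label -> regex for the literal patterns in the table above.
DEBUG_PATTERNS = {
    "debugger;": re.compile(r"\bdebugger;"),
    "breakpoint()": re.compile(r"\bbreakpoint\(\)"),
    "dbg!(...)": re.compile(r"\bdbg!\("),
    'fmt.Println("DEBUG': re.compile(r'fmt\.Println\("DEBUG'),
    'console.log("[DEBUG]': re.compile(r'console\.log\("\[DEBUG\]'),
    'print(f"DEBUG': re.compile(r'print\(f"DEBUG'),
    "stale debug marker": re.compile(r"//\s*(TODO DEBUG|HACK|XXX)\b"),
}


def scan_added_lines(diff_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number within the diff text, pattern label, added line)."""
    hits: List[Tuple[int, str, str]] = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for label, pattern in DEBUG_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line[1:].strip()))
    return hits
```

Any hit means the tree is not clean yet; an empty result still needs the manual read of `git diff` for the rows a regex cannot catch.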
91
+
92
+ ### Remove the journal
93
+
94
+ Only once the git check is clean:
95
+
96
+ ```bash
97
+ rm .debug-journal.md
98
+ sed -i.bak '/^\.debug-journal\.md$/d' .git/info/exclude && rm -f .git/info/exclude.bak
99
+ ```
100
+
101
+ The journal is not part of the fix; it doesn't belong in the commit or in the git exclude list.
102
+
103
+ ---
104
+
105
+ ## Phase 10 — Final Verification
106
+
107
+ Last gate before reporting done. All four gates must be true, and all four must have **evidence in your final message** to the user. Passing a gate without evidence is the same as failing it.
108
+
109
+ ### The four gates
110
+
111
+ 1. **Red→green toggle confirmed** — show the failing test output from before the fix and passing output after. Both outputs visible in the reply or the journal.
112
+
113
+ 2. **Full test suite green** — show the suite's final pass line (e.g. `42 passed in 3.14s`). Not just the new test.
114
+
115
+ 3. **Manual QA reproduced the fix** — show the command or scenario that originally failed and its now-correct output. Verbatim, not paraphrased.
116
+
117
+ 4. **Working tree clean of debug artifacts** — show `git diff --stat` output containing only fix + test, plus `git status` clean of untracked debug files.
118
+
119
+ If any of the four lacks evidence, you have not finished — return to the appropriate phase.
120
+
121
+ ### Final message template
122
+
123
+ Keep it short. Evidence-dense. The user should be able to skim it in 30 seconds.
124
+
125
+ ```markdown
126
+ Fixed.
127
+
128
+ **Root cause**: <one sentence — the mechanism, not the symptom>
129
+ **Fix**: `<file:line>` — <two words>
130
+ **Test**: `<test file>::<test name>` — red without fix, green with fix
131
+ **QA**: <one line describing what you ran and what you saw>
132
+
133
+ Diff:
134
+ ```
135
+ <git diff --stat output — should be tiny>
136
+ ```
137
+
138
+ **Next steps I didn't take** (awaiting your decision):
139
+ - <follow-up 1, if any — from QA silent-failure scan or refactor opportunities noted during Phase 7>
140
+ - <follow-up 2 — or "none" if nothing else surfaced>
141
+ ```
142
+
143
+ ### Example (from a real session)
144
+
145
+ ```markdown
146
+ Fixed.
147
+
148
+ **Root cause**: pi-mono Agent's `model.baseUrl` was hardcoded to `api.anthropic.com`, so the `ANTHROPIC_BASE_URL` env var was silently ignored. The proxy API key was rejected by the real Anthropic API with 401, but pi-mono packaged the error into the assistant message's `errorMessage` field instead of throwing, so the route's try/catch never fired and the client received HTTP 200 with empty content.
149
+
150
+ **Fix**: `core/pi-bridge/modelResolver.ts:117` — override baseUrl
151
+ **Test**: `__tests__/core/modelResolver.test.ts::resolves_env_override` — red without fix, green with fix
152
+ **QA**: `curl -X POST /api/refinement/chat` with proxy env set, observed non-zero usage and non-empty content
153
+
154
+ Diff:
155
+ ```
156
+ core/pi-bridge/modelResolver.ts | 3 +++
157
+ __tests__/core/modelResolver.test.ts | 42 ++++++++++++++++++++++
158
+ 2 files changed, 45 insertions(+)
159
+ ```
160
+
161
+ **Next steps I didn't take** (awaiting your decision):
162
+ - pi-mono itself silently swallows LLM errors into `errorMessage`; adding a throw-on-error wrapper at our orchestrator layer would surface these upstream
163
+ - Same silent-failure pattern exists in the planning route — likely the same fix applies
164
+ ```
package/plugins/litclaude/skills/debugging/references/methodology/partial-runtime-evidence.md
@@ -0,0 +1,228 @@
1
+ # Partial Runtime Evidence — When You Cannot Execute the Real Operation
2
+
3
+ Read this when the principle **runtime truth beats code reading** conflicts with the constraint **you cannot run the actual operation**.
4
+
5
+ The skill's first invariant is "runtime state is the only source of truth." But sometimes the only state you can produce is a *partial* observation — the real call requires paid credits, a hardware device you don't have, network access through a corporate proxy, a production secret, or a customer dataset.
6
+
7
+ **Partial runtime evidence is still runtime evidence.** This reference tells you which partial signals to harvest and how to combine them so the conclusion is defensible.
8
+
9
+ ---
10
+
11
+ ## When this applies
12
+
13
+ Use this reference when ALL are true:
14
+
15
+ 1. The bug or extraction question requires runtime confirmation (per skill invariant #1).
16
+ 2. You attempted the obvious "just run it" path and it failed for reasons unrelated to the bug:
17
+ - 401/402/403 from a paid API
18
+ - "device not found" / "permission denied" / SIP block
19
+ - Production-only credentials
20
+ - Network isolation (air-gapped, behind VPN you don't have)
21
+ - Time-of-day or quota limits
22
+ 3. **Mocking the entire system** would defeat the verification — you specifically need evidence about how the *real* code behaves, not a stub.
23
+
24
+ If only #1 and #2 are true and you can mock cleanly, just mock and proceed. This file is for cases where mocking would invalidate the answer.
25
+
26
+ ---
27
+
28
+ ## The hierarchy of partial evidence (strongest first)
29
+
30
+ When you cannot capture the full outbound payload + full response, capture as much as possible from this list. **Evidence further down the list relies more on inference; evidence higher up is closer to ground truth.**
31
+
32
+ ### Tier 1 — Pre-send / post-receive logs (best partial evidence)
33
+
34
+ The system you're investigating builds a request, then sends it. If the build step logs the assembled request **before** transmission, that log is ground truth for everything except the wire-level bytes (TLS, headers added by HTTP library, etc.).
35
+
36
+ ```bash
37
+ # Maximize debug logging
38
+ APP_DEBUG=1 APP_LOG_LEVEL=debug APP_LOG_FILE=/tmp/trace.log ./target -x "minimal valid input" 2>&1 | head -200
39
+ ```
40
+
41
+ Look for log lines like:
42
+ - `Building request: model=X, params={...}`
43
+ - `[provider] payload: {...}`
44
+ - `Sending to <url>: <serialized body>`
45
+
46
+ **Strength**: 95% of ground truth. Missing only wire-level transformations.
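
To make the log scan above repeatable, the filter can be wrapped in a small function. This is a minimal sketch: `extract_request_lines` is a hypothetical helper name, and the three patterns are taken only from the example log lines listed above, so a real target will likely need different wording.

```shell
# Hypothetical helper: print the lines of a debug log that look like
# assembled outbound requests, prefixed with their line numbers so they
# can be cited in the journal. The patterns mirror the example log lines
# above and are assumptions about the target's log format.
extract_request_lines() {
  local log_file="$1"
  grep -n -E 'Building request:|payload:|Sending to ' "$log_file"
}
```

Usage: `extract_request_lines /tmp/trace.log`. The function returns a non-zero status when nothing matches, which is itself a signal that the build step did not log the request.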
47
+
48
+ ### Tier 2 — Local interception via proxy / shim
49
+
50
+ Run the real binary against a local proxy that records and (optionally) returns a canned response.
51
+
52
+ ```bash
53
+ # mitmproxy approach (mitmdump is its non-interactive CLI, so it can run in the background)
53
+ mitmdump --listen-host 127.0.0.1 --listen-port 8888 --mode regular &
55
+ HTTPS_PROXY=http://127.0.0.1:8888 SSL_CERT_FILE=~/.mitmproxy/mitmproxy-ca-cert.pem ./target ...
56
+ # Now mitmproxy logs the actual TLS-decrypted request
57
+ ```
58
+
59
+ ```bash
60
+ # DYLD_INSERT_LIBRARIES / LD_PRELOAD shim approach
61
+ # Wrap the network call to log payload, return a fake 200
62
+ # See pwntools.md for shim examples
63
+ ```
64
+
65
+ **Strength**: Wire-level ground truth, but requires the target to honor your proxy / preload.
66
+
67
+ ### Tier 3 — Static extraction × runtime fingerprint cross-check
68
+
69
+ When you cannot send a request at all, you can still cross-check static analysis with whatever the binary does that *doesn't* require the real call:
70
+
71
+ - The binary builds the request — even if sending fails, the build step ran. Trace it (Tier 1).
72
+ - The binary writes a state file or cache — read it.
73
+ - The binary emits version-specific User-Agent strings; verify they match your static extraction.
74
+ - The binary's `--help` or `--version` output reveals build metadata; verify model lists / feature flags.
75
+
76
+ **Strength**: Disjoint evidence sources confirming the same fact. Two independent partial signals that agree are nearly as strong as one full observation.
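
One way to script a single Tier 3 cross-check is to assert that a statically extracted string actually appears in the output of a command that does not need the real call. This is a sketch under stated assumptions: `cross_check` is a hypothetical helper, and the expected string and the command are whatever your own static extraction and target provide.

```shell
# Hypothetical helper: succeed only if the statically extracted string
# ($1) appears verbatim in the combined stdout/stderr of the command
# given by the remaining arguments (for example: ./target --version).
cross_check() {
  local expected="$1"
  shift
  "$@" 2>&1 | grep -q -F -- "$expected"
}
```

Usage: `cross_check "<extracted string>" ./target --version && echo "static and runtime agree"`. A failed check does not prove the static reading wrong, only that this particular runtime fingerprint does not confirm it.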
77
+
78
+ ### Tier 4 — Contrastive runtime under different inputs
79
+
80
+ If you can run with input variant A but not B, run A and reason about B from code:
81
+
82
+ ```bash
83
+ # A: minimal trial input — works for free tier
84
+ ./target --action=read --resource=local-file
85
+ # B: full inference call — paid tier required, blocked
86
+ # But the request-building code is shared between A and B!
87
+ # Capture A's logs, then inspect the code path for B and verify only the model/endpoint diff.
88
+ ```
89
+
90
+ **Strength**: Confirms shared code paths; remaining gap is only the difference between A and B.
91
+
92
+ ### Tier 5 — Vendor-published API logs / dashboard
93
+
94
+ If the operation succeeded earlier (before quota ran out, before access was revoked), the vendor's dashboard / audit log may show the request. Lower fidelity but still observed behavior.
95
+
96
+ **Strength**: Real wire data, but often summarized — token counts, status codes, no payload bodies.
97
+
98
+ ### Tier 6 — Pure code reading with peer review
99
+
100
+ If literally none of the above is available, read the code carefully and submit it to **one Oracle for skeptical review** (see "Verification Oracle" below). This is the weakest tier and you must explicitly mark conclusions as "unverified" in the journal.
101
+
102
+ ---
103
+
104
+ ## How to combine partial signals
105
+
106
+ A defensible conclusion **prefers two independent signals from different tiers**, with one exception: a complete Tier 2 wire-level capture is wire-level ground truth and can stand alone for request-shape claims (because the wire bytes are exactly what the remote received). For *behavioral* claims (what the system does next, what state it stores, what side effects it produces), still combine with another signal.
107
+
108
+ | Available evidence | Defensibility |
109
+ |---|---|
110
+ | Tier 1 + Tier 1 (same log, different lines) | weak — single source |
111
+ | Tier 1 + Tier 2 (debug log + proxy capture) | **strong** — independent confirmation |
112
+ | Tier 1 + Tier 3 (debug log + version output cross-check) | **strong** — disjoint sources |
113
+ | Tier 2 alone (full proxy capture) | strong **for request-shape claims only** — stands alone for "what bytes were sent". Add a second signal for response-handling or state claims. |
114
+ | Tier 3 + Tier 4 (cross-check + contrastive run) | medium — both partial |
115
+ | Tier 6 alone (code reading only) | **insufficient** — escalate or mark unverified |
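
The table can also be encoded as a lookup so that journal entries use the same labels every time. This is a minimal sketch: `defensibility` is a hypothetical helper, it covers only the combinations the table lists, and anything else is reported as `unlisted` rather than guessed.

```shell
# Hypothetical helper: map the evidence tiers you hold (tier numbers 1-6,
# one argument per signal) to the defensibility label from the table above.
defensibility() {
  local tiers
  # Sort the tier numbers so argument order does not matter.
  tiers="$(printf '%s\n' "$@" | sort -n | tr '\n' ' ')"
  tiers="${tiers% }"
  case "$tiers" in
    "1 1") echo "weak" ;;                       # same log, different lines
    "1 2"|"1 3") echo "strong" ;;               # independent or disjoint sources
    "2") echo "strong-request-shape-only" ;;    # full proxy capture alone
    "3 4") echo "medium" ;;                     # both partial
    "6") echo "insufficient" ;;                 # code reading only
    *) echo "unlisted" ;;                       # not covered by the table
  esac
}
```

Usage: `defensibility 3 1` prints `strong`. A result of `insufficient` or `unlisted` means the conclusion still needs a human judgment about independence before it goes into the deliverable.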
116
+
117
+ Record in the journal:
118
+
119
+ ```markdown
120
+ ## Partial runtime evidence
121
+ ### Question being verified
122
+ <the specific claim, e.g. "Opus 4.7 default effort is 'high'">
123
+
124
+ ### Available signals
125
+ - Tier 1: debug log /tmp/trace.log line 47-49 shows `effort: "high"` ✓
126
+ - Tier 3: static extraction of m5T() function returns "high" for smart mode ✓
127
+ - Tier 6: code path verified by reading prompt-builder.js ✓
128
+
129
+ ### Independence assessment
130
+ Tier 1 and Tier 3 are independent — the log was emitted by a different
131
+ code path than m5T() and would diverge if the static reading were wrong.
132
+
133
+ ### Conclusion
134
+ VERIFIED via Tier 1 + Tier 3 agreement. No need to escalate.
135
+ ```
136
+
137
+ If you can achieve neither a complete Tier 2 capture **nor** two independent non-Tier-6 signals from the table above, **write an explicit note in the deliverable**:
138
+
139
+ > ⚠️ Partial-evidence finding. The full outbound payload could not be captured because [reason]. The conclusion rests on:
140
+ > - [signal A — tier and source]
141
+ > - [signal B — tier and source]
142
+ > A future verification should attempt [the missing tier] when [condition].
143
+
144
+ ---
145
+
146
+ ## Verification Oracle pattern (for non-debug tasks)
147
+
148
+ The skill's main Oracle Triple (`04-oracle-triple.md`) is for **stuck debugging** — 2 failed rounds, mental box, three orthogonal framings to break out.
149
+
150
+ For tasks where the deliverable is an **artifact, not a bug fix** (reverse engineering, extraction, audit, compliance documentation), use a different pattern: **single Oracle, late, skeptical, with the deliverable in hand**.
151
+
152
+ ### When to invoke
153
+
154
+ - Right before declaring an extraction/audit task "done"
155
+ - After every significant revision of the deliverable (not after every small edit)
156
+ - Maximum 3-4 iterations before escalating to the user
157
+
158
+ ### Pattern
159
+
160
+ Use a Claude Code review lane or verifier subagent with this prompt shape:
161
+
162
+ ```text
163
+ SKEPTICAL FINAL VERIFICATION — be critical, look for reasons the task is incomplete or wrong.
164
+
165
+ ## Original task
166
+ <verbatim user request>
167
+
168
+ ## What I produced
169
+ <list of artifacts with paths and brief descriptions>
170
+
171
+ ## Specific claims to verify
172
+ <bullet list of every concrete claim in the deliverable>
173
+
174
+ ## Where to look
175
+ <paths the Oracle should Read / Bash to verify>
176
+
177
+ ## Your job
178
+ 1. Read the deliverables.
179
+ 2. Spot-check each claim against the source/evidence the deliverable cites.
180
+ 3. Identify any unsubstantiated claims, missing pieces, or factual errors.
181
+ 4. End with PASS / FAIL / PARTIAL with specific gaps.
182
+ Be skeptical. Don't rubber-stamp.
183
+ ```
184
+
185
+ ### Why this differs from the Oracle Triple
186
+
187
+ | | Oracle Triple (debug) | Verification Oracle (artifact) |
188
+ |---|---|---|
189
+ | Trigger | 2 failed hypothesis rounds | About to declare "done" |
190
+ | Count | 3 in parallel, orthogonal framings | 1 sequential, focused review |
191
+ | Goal | Break out of mental box | Catch unsubstantiated claims |
192
+ | Tone of prompt | Brainstorm wide alternatives | Skeptical audit |
193
+ | Iteration | Reset hypothesis set after | Fix gaps, re-invoke until PASS |
194
+
195
+ ### Don't conflate them
196
+
197
+ If you're stuck debugging, do the Triple. If you have a deliverable and need it audited, do the Verification Oracle. Doing the Triple on a finished extraction will return three diverging "what if you tried…" tangents that are not what you need. Doing the Verification Oracle on a stuck debugging session will return a polite "the evidence is incomplete" that you already knew.
198
+
199
+ ---
200
+
201
+ ## Common partial-evidence anti-patterns
202
+
203
+ | Anti-pattern | Why it fails | Replacement |
204
+ |---|---|---|
205
+ | "It looks right in the code, so it works" | Tier 6 alone, unverified | Add at least one Tier 1-3 signal |
206
+ | "I ran it once, didn't error, so it's correct" | Absence of error ≠ presence of correctness | Capture the actual output and verify content |
207
+ | "The mock returns the value I wrote, so the code is fine" | Tautology — mock loops back your assumption | Use Tier 2 (proxy) instead, or cross-check with Tier 3 |
208
+ | "The vendor's dashboard shows my call worked" | Dashboard often only shows status code, not behavior | Combine with Tier 1 if available |
209
+ | "I'll trust the most-recent Stack Overflow answer" | Code from a different version / context | Verify against the actual binary you have |
210
+
211
+ ---
212
+
213
+ ## Cleanup additions for partial-evidence work
214
+
215
+ ```bash
216
+ # Proxy artifacts
217
+ pkill -f 'mitmproxy|mitmdump' 2>/dev/null
218
+ rm -f ~/.mitmproxy/cache_* 2>/dev/null
219
+
220
+ # Debug log files
221
+ rm -f /tmp/trace.log /tmp/*-debug-trace.log
222
+
223
+ # DYLD_INSERT / LD_PRELOAD shim libraries
224
+ rm -f /tmp/*.dylib /tmp/*.so
225
+
226
+ # Clear proxy and debug env vars from the current shell in case they were exported
227
+ unset HTTPS_PROXY SSL_CERT_FILE APP_DEBUG APP_LOG_LEVEL APP_LOG_FILE 2>/dev/null
228
+ ```