pi-llm-debugging 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # pi-llm-debugging
 
- A [pi](https://shittycodingagent.ai) extension that captures the full LLM provider request payload to disk before each call — letting you inspect exactly what gets sent to the model, turn by turn.
+ A [pi](https://shittycodingagent.ai) extension that captures both the full LLM provider request payload **and** the direct HTTP response from the provider to disk for every LLM call — letting you inspect exactly what gets sent to (and received from) the model, turn by turn.
 
  ## Install
 
@@ -16,10 +16,11 @@ pi remove npm:pi-llm-debugging
 
  ## How it works
 
- Every time pi is about to call the LLM, the extension intercepts the raw provider payload and writes it as a JSON file into your project's local `.pi` directory:
+ Every time pi is about to call the LLM, the extension writes **two** JSON files into your project's local `.pi` directory — one for the request, one for the raw provider response:
 
  ```
- .pi/pi-llm-debugging/<session_id>/<seq>.json
+ .pi/pi-llm-debugging/<session_id>/<seq>-req.json
+ .pi/pi-llm-debugging/<session_id>/<seq>-res.json
  ```
 
  - **`session_id`** — the current pi session identifier (visible in the footer bar). Resets on `/new`, `/resume`, and `/fork`.
@@ -30,12 +31,153 @@ For example, a session might produce:
  ```
  .pi/pi-llm-debugging/
  └── abc123def/
- ├── 001.json ← first turn
- ├── 002.json ← second turn (after a tool call loops back)
- └── 003.json
+ ├── 001-req.json ← first turn, request payload
+ ├── 001-res.json ← first turn, raw provider response
+ ├── 002-req.json ← second turn (after a tool call loops back)
+ ├── 002-res.json
+ ├── 003-req.json
+ └── 003-res.json
  ```
 
- Each file is the exact payload the provider receives: the full message history, system prompt, tool definitions, model parameters, and any cache hints.
+ **`<seq>-req.json`** is the exact payload the provider receives: the full message history, system prompt, tool definitions, model parameters, and any cache hints. It is captured via pi's `before_provider_request` event.
+
+ **`<seq>-res.json`** is the direct HTTP response from the provider, captured by transparently intercepting `fetch` for known LLM hosts (Anthropic, OpenAI, Gemini, Groq, Mistral, DeepSeek, xAI, Together, Fireworks, Cohere, Perplexity, OpenRouter, …). Each file has this shape:
+
+ ```jsonc
+ {
+   "url": "https://api.anthropic.com/v1/messages",
+   "method": "POST",
+   "status": 200,
+   "statusText": "OK",
+   "headers": { "content-type": "text/event-stream", ... },
+   "body": "event: message_start\ndata: {...}\n\n...", // raw bytes verbatim
+   "parsedBody": { /* decoded JSON, when content-type is application/json */ }
+ }
+ ```
+
+ For streaming SSE responses, `body` contains the full raw SSE text exactly as sent by the provider, so you can replay or diff it. For non-streamed JSON responses, `parsedBody` holds the decoded object for convenience.
+
+ ## Walkthrough: a 2-turn session
+
+ Let's trace what gets written when you run a single prompt that requires one tool call. Imagine you start a fresh pi session and type:
+
+ > **show me the number of files in cwd**
+
+ pi loops with the model twice: once to decide which tool to call, and once to summarize the tool output. That produces four files:
+
+ ```
+ .pi/pi-llm-debugging/abc123def/
+ ├── 001-req.json ← user prompt is sent
+ ├── 001-res.json ← model replies with a Bash tool call
+ ├── 002-req.json ← prompt + tool call + tool result are sent back
+ └── 002-res.json ← model replies with the final text answer
+ ```
+
+ ### Turn 1 — ask the question
+
+ **`001-req.json`** (trimmed) — just the user message and the model/tool config:
+
+ ```jsonc
+ {
+   "model": "claude-opus-4-6",
+   "system": [{ "type": "text", "text": "You are Claude Code, ..." }],
+   "tools": [ { "name": "Bash", /* ... */ }, /* Read, Edit, ... */ ],
+   "messages": [
+     {
+       "role": "user",
+       "content": [
+         { "type": "text", "text": "show me the number of files in cwd" }
+       ]
+     }
+   ]
+ }
+ ```
+
+ **`001-res.json`** — the raw SSE stream from Anthropic. The model decides to call `Bash`:
+
+ ```jsonc
+ {
+   "url": "https://api.anthropic.com/v1/messages",
+   "status": 200,
+   "headers": { "content-type": "text/event-stream", /* ... */ },
+   "body": "event: message_start\ndata: {\"type\":\"message_start\",\"message\":{\"usage\":{\"input_tokens\":6,\"cache_read_input_tokens\":5996,\"output_tokens\":0}}}\n\nevent: content_block_start\ndata: {\"type\":\"content_block_start\",\"index\":0,\"content_block\":{\"type\":\"tool_use\",\"id\":\"toolu_01GE7771KFZvGYeQhyLmbJqQ\",\"name\":\"Bash\",\"input\":{}}}\n\nevent: content_block_delta\ndata: {\"delta\":{\"type\":\"input_json_delta\",\"partial_json\":\"{\\\"command\\\": \\\"ls -1 | wc -l\\\"}\"}}\n\nevent: message_delta\ndata: {\"delta\":{\"stop_reason\":\"tool_use\"},\"usage\":{\"output_tokens\":72}}\n\nevent: message_stop\ndata: {\"type\":\"message_stop\"}\n\n"
+ }
+ ```
+
+ If you `jq -r '.body' 001-res.json` you'll see the SSE events laid out cleanly. Notable bits in this response:
+
+ - `stop_reason: "tool_use"` — the model wants pi to run a tool before continuing.
+ - `tool_use.name: "Bash"`, `input: { command: "ls -1 | wc -l" }` — exactly what pi will execute.
+ - `usage.cache_read_input_tokens: 5996` — 5996 tokens of system prompt + tool defs hit the prompt cache; only 6 fresh input tokens were billed at full rate.
+
+ pi runs `ls -1 | wc -l` locally and gets `13`. It then loops back to the model.
+
+ ### Turn 2 — the tool result is sent back
+
+ **`002-req.json`** (trimmed) — the conversation has grown by two messages: the assistant's `tool_use` block and a `user`-role `tool_result` carrying the bash output:
+
+ ```jsonc
+ {
+   "model": "claude-opus-4-6",
+   "messages": [
+     {
+       "role": "user",
+       "content": [{ "type": "text", "text": "show me the number of files in cwd" }]
+     },
+     {
+       "role": "assistant",
+       "content": [
+         {
+           "type": "tool_use",
+           "id": "toolu_01GE7771KFZvGYeQhyLmbJqQ",
+           "name": "Bash",
+           "input": { "command": "ls -1 | wc -l" }
+         }
+       ]
+     },
+     {
+       "role": "user",
+       "content": [
+         {
+           "type": "tool_result",
+           "tool_use_id": "toolu_01GE7771KFZvGYeQhyLmbJqQ",
+           "content": " 13\n"
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ Diffing `001-req.json` against `002-req.json` is the fastest way to see *exactly* how pi grew the conversation between turns — useful when debugging tool-result formatting or context bloat.
+
+ **`002-res.json`** — with the tool result in hand, the model now answers in plain text:
+
+ ```jsonc
+ {
+   "url": "https://api.anthropic.com/v1/messages",
+   "status": 200,
+   "body": "event: message_start\ndata: {...,\"usage\":{\"input_tokens\":1,\"cache_read_input_tokens\":6089,\"output_tokens\":1}}\n\nevent: content_block_start\ndata: {\"content_block\":{\"type\":\"text\",\"text\":\"\"}}\n\nevent: content_block_delta\ndata: {\"delta\":{\"type\":\"text_delta\",\"text\":\"13\"}}\n\nevent: content_block_delta\ndata: {\"delta\":{\"type\":\"text_delta\",\"text\":\" files/directories in the current working directory.\"}}\n\nevent: message_delta\ndata: {\"delta\":{\"stop_reason\":\"end_turn\"},\"usage\":{\"output_tokens\":17}}\n\nevent: message_stop\ndata: {}\n\n"
+ }
+ ```
+
+ This time:
+
+ - `stop_reason: "end_turn"` — the model is done, the agent loop exits.
+ - The streamed text deltas concatenate to `"13 files/directories in the current working directory."` — which is what you see in the pi UI.
+ - `cache_read_input_tokens: 6089` (vs `5996` on turn 1) — the prior turn's assistant + tool_result blocks were appended into the cache.
+
+ ### What you learned from 4 files
+
+ By reading these 4 files in order you can answer questions like:
+
+ - *Did pi send my exact prompt?* → `001-req.json`
+ - *Why did the model choose Bash and what command did it pick?* → `001-res.json`
+ - *Was the tool result formatted correctly when sent back?* → `002-req.json`
+ - *Did the model actually generate the final answer, or did pi mangle it?* → `002-res.json`
+ - *How much of my context was cached vs fresh?* → `usage` blocks in either `-res.json`
+
+ No guessing, no "works on my machine" — just the bytes that crossed the wire.
 
  ## What you can debug
 
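The walkthrough's advice to diff consecutive `-req.json` files can also be made programmatic. As a hedged TypeScript sketch (this `appendedMessages` helper is hypothetical, not part of the extension), one can list the messages pi appended between two consecutive turns, assuming the conversation only grows by appending, as the walkthrough shows:

```typescript
// Hypothetical helper: given two parsed <seq>-req.json payloads, return the
// messages appended on the later turn. Assumes pi only appends to the
// conversation between turns (no reordering or rewriting).
interface RequestPayload {
  messages: unknown[];
}

function appendedMessages(prev: RequestPayload, next: RequestPayload): unknown[] {
  return next.messages.slice(prev.messages.length);
}

// With the walkthrough's payloads, turn 2 adds the tool_use and tool_result:
const turn1 = { messages: [{ role: "user" }] };
const turn2 = {
  messages: [{ role: "user" }, { role: "assistant" }, { role: "user" }],
};
console.log(appendedMessages(turn1, turn2).length); // 2
```

This is only a convenience over `diff`; the files on disk remain the source of truth.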
@@ -49,6 +191,8 @@ Each file is the exact payload the provider receives: the full message history,
 
  **Compaction quality** — Compare the payload just before and just after a `/compact` to see what the summary replaced and whether important context was preserved.
 
+ **Response-side issues** — Inspect `<seq>-res.json` to see the raw SSE stream, tool-use blocks, stop reason, thinking blocks, cache hits, and token usage reported by the provider. Essential when the model does something surprising and you need to know whether it was the model, the parser, or pi itself.
+
  ## Tips
 
  Add the debugging output to your `.gitignore` so it doesn't end up in version control:
@@ -60,16 +204,19 @@ Add the debugging output to your `.gitignore` so it doesn't end up in version co
  Use `jq` to quickly inspect a payload:
 
  ```bash
- # See just the messages
- jq '.messages' .pi/pi-llm-debugging/<session_id>/001.json
+ # See just the messages sent on the first turn
+ jq '.messages' .pi/pi-llm-debugging/<session_id>/001-req.json
+
+ # Inspect the decoded provider response (non-streamed)
+ jq '.parsedBody' .pi/pi-llm-debugging/<session_id>/001-res.json
 
- # Count tokens approximation: check message content lengths
- jq '[.messages[].content | .. | strings | length] | add' .pi/pi-llm-debugging/<session_id>/001.json
+ # Replay a streamed SSE response to stdout
+ jq -r '.body' .pi/pi-llm-debugging/<session_id>/001-res.json
 
- # Diff two consecutive turns to see what changed
+ # Diff two consecutive request payloads to see what changed
  diff \
-   <(jq . .pi/pi-llm-debugging/<session_id>/001.json) \
-   <(jq . .pi/pi-llm-debugging/<session_id>/002.json)
+   <(jq . .pi/pi-llm-debugging/<session_id>/001-req.json) \
+   <(jq . .pi/pi-llm-debugging/<session_id>/002-req.json)
  ```
 
  Since files are scoped per-project under `.pi/`, each project manages its own debugging output independently — nothing bleeds into other projects or your global `~/.pi` directory.
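Beyond `jq`, the raw SSE text captured in a `-res.json` file is easy to post-process in code. As an illustrative TypeScript sketch (this `splitSseEvents` helper is hypothetical, not shipped with the extension), the captured `body` string can be split into individual SSE event blocks, which are separated by blank lines per the SSE format:

```typescript
// Hypothetical helper: split the raw `body` string of a <seq>-res.json
// capture into individual SSE event blocks. SSE events are delimited by a
// blank line ("\n\n"); trailing empty fragments are dropped.
function splitSseEvents(body: string): string[] {
  return body
    .split("\n\n")
    .map((block) => block.trim())
    .filter((block) => block.length > 0);
}

// Example against the event-stream shape shown in the README above:
const sampleBody =
  'event: message_start\ndata: {"type":"message_start"}\n\n' +
  'event: message_stop\ndata: {"type":"message_stop"}\n\n';
console.log(splitSseEvents(sampleBody).length); // 2
```

Each returned block still contains its `event:` and `data:` lines, so you can filter for, say, only `content_block_delta` events when reconstructing streamed text.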
@@ -1,21 +1,162 @@
  /**
-  * Pi LLM Debugging — Saves the full provider request payload to disk before each LLM call.
+  * Pi LLM Debugging — Saves the full provider request AND the raw provider response
+  * to disk for each LLM call.
   *
-  * Files are written to <project>/.pi/pi-llm-debugging/<pi_session_id>/<sequence>.json
-  * where <sequence> is a zero-padded counter (001, 002, ...).
+  * For every LLM turn, two files are written:
+  *   <project>/.pi/pi-llm-debugging/<pi_session_id>/<seq>-req.json
+  *   <project>/.pi/pi-llm-debugging/<pi_session_id>/<seq>-res.json
   *
-  * Unlike the global save-llm-prompt extension, each project manages its own debugging
-  * files under its local .pi directory, making it easy to review, diff, and gitignore
-  * per-project LLM traffic.
+  * - <seq> is a zero-padded counter (001, 002, ...).
+  * - <seq>-req.json contains the exact payload handed to the provider SDK
+  *   (captured via pi's `before_provider_request` event).
+  * - <seq>-res.json contains the direct HTTP response from the LLM provider
+  *   (captured by monkey-patching globalThis.fetch and teeing the response
+  *   body for known provider hosts). For streaming SSE responses the raw SSE
+  *   text is preserved verbatim inside the `body` field.
   *
-  * The current Pi session ID is shown in the footer bar and updates on /new, /resume, and /fork.
+  * Unlike pi's global save-llm-prompt extension, files are scoped to the
+  * current project's local .pi directory so each project manages its own
+  * debugging output (easy to gitignore, diff, and review).
+  *
+  * The current Pi session ID is shown in the footer bar and updates on
+  * /new, /resume, and /fork.
   */
 
  import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
  import { mkdirSync, writeFileSync } from "node:fs";
  import { join } from "node:path";
 
+ // Hostnames we consider "LLM provider" traffic worth capturing.
+ const PROVIDER_HOST_PATTERNS: RegExp[] = [
+   /(^|\.)anthropic\.com$/i,
+   /(^|\.)openai\.com$/i,
+   /(^|\.)openai\.azure\.com$/i,
+   /(^|\.)googleapis\.com$/i, // gemini / vertex
+   /(^|\.)generativelanguage\.googleapis\.com$/i,
+   /(^|\.)mistral\.ai$/i,
+   /(^|\.)groq\.com$/i,
+   /(^|\.)deepseek\.com$/i,
+   /(^|\.)x\.ai$/i,
+   /(^|\.)together\.xyz$/i,
+   /(^|\.)fireworks\.ai$/i,
+   /(^|\.)cohere\.(com|ai)$/i,
+   /(^|\.)perplexity\.ai$/i,
+   /(^|\.)openrouter\.ai$/i,
+ ];
+
+ function isProviderUrl(url: string): boolean {
+   try {
+     const host = new URL(url).hostname;
+     return PROVIDER_HOST_PATTERNS.some((re) => re.test(host));
+   } catch {
+     return false;
+   }
+ }
+
+ // Install the fetch interceptor exactly once per Node process. Multiple
+ // pi sessions (or extension re-inits) share the same hook and use a
+ // module-level "current target" to decide where to write the next response.
+ type ResponseTarget = { outDir: string; sequence: number } | null;
+ let currentTarget: ResponseTarget = null;
+ let fetchPatched = false;
+
+ function installFetchInterceptor() {
+   if (fetchPatched) return;
+   fetchPatched = true;
+
+   const originalFetch = globalThis.fetch;
+   if (typeof originalFetch !== "function") return;
+
+   globalThis.fetch = (async (
+     input: Parameters<typeof fetch>[0],
+     init?: Parameters<typeof fetch>[1],
+   ): Promise<Response> => {
+     const url =
+       typeof input === "string"
+         ? input
+         : input instanceof URL
+           ? input.toString()
+           : (input as Request).url;
+
+     const response = await originalFetch(input as any, init as any);
+
+     // Only intercept known provider traffic, and only if we have a
+     // request that hasn't been paired with a response yet.
+     if (!currentTarget || !isProviderUrl(url)) {
+       return response;
+     }
+
+     const target = currentTarget;
+     // One response per request: clear immediately so subsequent fetches
+     // (retries, unrelated calls) don't clobber this slot.
+     currentTarget = null;
+
+     const filename = `${String(target.sequence).padStart(3, "0")}-res.json`;
+     const filepath = join(target.outDir, filename);
+
+     // Tee the body so the caller still gets a fully readable response.
+     // For non-streamed JSON responses, .clone() + .text() is enough.
+     // For SSE streams, clone() also works: both branches can be
+     // consumed independently by the WHATWG fetch implementation.
+     const cloned = response.clone();
+
+     // Fire-and-forget: never block the real request on disk IO.
+     void (async () => {
+       try {
+         const headers: Record<string, string> = {};
+         cloned.headers.forEach((v, k) => {
+           headers[k] = v;
+         });
+         const bodyText = await cloned.text();
+
+         const contentType = headers["content-type"] || "";
+         let parsedBody: unknown = undefined;
+         if (contentType.includes("application/json")) {
+           try {
+             parsedBody = JSON.parse(bodyText);
+           } catch {
+             // keep raw text only
+           }
+         }
+
+         const record = {
+           url,
+           method: (init?.method || (typeof input !== "string" && !(input instanceof URL) ? (input as Request).method : "GET")).toUpperCase(),
+           status: response.status,
+           statusText: response.statusText,
+           headers,
+           // For SSE / text responses, `body` holds the raw stream text.
+           // For JSON responses, `parsedBody` holds the decoded object and
+           // `body` still holds the exact bytes for fidelity.
+           body: bodyText,
+           parsedBody,
+         };
+
+         writeFileSync(filepath, JSON.stringify(record, null, 2), "utf-8");
+       } catch (err) {
+         try {
+           writeFileSync(
+             filepath,
+             JSON.stringify(
+               { url, error: (err as Error)?.message || String(err) },
+               null,
+               2,
+             ),
+             "utf-8",
+           );
+         } catch {
+           // give up silently — debugging must never break the session
+         }
+       }
+     })();
+
+     return response;
+   }) as typeof fetch;
+ }
+
  export default function (pi: ExtensionAPI) {
+   installFetchInterceptor();
+
    let outDir = "";
    let sequence = 0;
 
@@ -46,8 +187,13 @@ export default function (pi: ExtensionAPI) {
    pi.on("before_provider_request", (_event, ctx) => {
      if (!outDir) initSession(ctx);
      sequence++;
-     const filename = `${String(sequence).padStart(3, "0")}.json`;
-     const filepath = join(outDir, filename);
-     writeFileSync(filepath, JSON.stringify(_event.payload, null, 2), "utf-8");
+
+     const seqStr = String(sequence).padStart(3, "0");
+     const reqPath = join(outDir, `${seqStr}-req.json`);
+     writeFileSync(reqPath, JSON.stringify(_event.payload, null, 2), "utf-8");
+
+     // Arm the fetch interceptor to route the very next provider-bound
+     // HTTP response into <seq>-res.json.
+     currentTarget = { outDir, sequence };
    });
  }
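The capture logic above hinges on a standard WHATWG fetch property: `Response.clone()` produces a second response whose body can be consumed independently of the original, so persisting the body never starves the real caller. A minimal standalone sketch of that tee pattern (the `teeBody` name is illustrative; it relies on the `Response` global available in Node 18+):

```typescript
// Illustrative sketch of the clone-and-tee pattern used by the interceptor.
// Both branches of a cloned Response yield the same text.
async function teeBody(
  res: Response,
): Promise<{ forCaller: string; forDisk: string }> {
  const saved = res.clone(); // must clone BEFORE the body is consumed
  const forCaller = await res.text(); // what the real caller would read
  const forDisk = await saved.text(); // what would be persisted to <seq>-res.json
  return { forCaller, forDisk };
}

// Usage with a synthetic response:
const demo = new Response('{"ok":true}', {
  status: 200,
  headers: { "content-type": "application/json" },
});
teeBody(demo).then(({ forCaller, forDisk }) => {
  console.log(forCaller === forDisk); // true: both branches see identical text
});
```

Calling `clone()` after the body has started being read throws, which is why the extension clones immediately after the upstream `fetch` resolves.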
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "pi-llm-debugging",
-   "version": "0.1.0",
+   "version": "0.2.0",
    "description": "Saves LLM provider request payloads to the project's .pi folder for per-project debugging",
    "repository": {
      "type": "git",