pi-thinking-only-guard 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 reluxa
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,119 @@
1
+ # Qwen3.6 Thinking-Only Guard — Pi Extension
2
+
3
+ ## Problem
4
+
5
+ Qwen3.6-27B (and similar thinking-capable models via providers like airouter) sometimes places
6
+ tool calls inside the `reasoning_content` (thinking block) instead of as proper `tool_calls` in the API response.
7
+
8
+ - `finish_reason: "stop"` — model thinks it is done
9
+ - Thinking content contains `<tool_call>` ... `</tool_call>` blocks
10
+ - No actual `tool_calls` in response — pi does not execute them
11
+ - No text content — user sees empty or thinking-only response
12
+
13
+ Known issue: [sgl-project/sglang#27021](https://github.com/sgl-project/sglang/issues/27021)
14
+
15
+ ## Root Cause
16
+
17
+ Provider emits:
18
+ ```
19
+ reasoning_content: "<tool_call>
20
+ <function=read>
21
+ <parameter=path>
22
+ /home/reluxa/.profile
23
+ </parameter>
24
+ </function>
25
+ </tool_call>"
26
+ content: ""
27
+ finish_reason: "stop"
28
+ ```
29
+
30
+ Pi's OpenAI-completions parser puts thinking in `[type: "thinking"]`, finds no tool calls,
31
+ and the turn ends. The model "stopped" from its perspective.
32
+
33
+ ## Solution: thinking-only-guard.ts
34
+
35
+ A pi extension that detects this pattern during live streaming and sends the trapped
36
+ tool call(s) back to the model so it can execute them properly.
37
+
38
+ ### Files
39
+
40
+ | File | Purpose |
41
+ |------|---------|
42
+ | `~/.pi/agent/extensions/thinking-only-guard.ts` | The extension |
43
+ | `~/.pi/agent/extensions/tests/thinking-only-guard.test.js` | Unit tests (14 tests) |
44
+
45
+ ### How It Works
46
+
47
+ 1. `message_update` — Accumulates `thinking_delta` tokens into `lastThinking`
48
+ 2. `message_end` — Checks if the completed assistant message matches the pattern:
49
+ - `toolCallCount >= 1` (one or more `<tool_call>` blocks in thinking)
50
+ - `hasText === false` (no `type: "text"` in content array)
51
+ - `hasRealToolCalls === false` (no `type: "toolCall"` in content array)
52
+ - `sawThinkingDelta === true` (only fires during live streaming, not session replay)
53
+ 3. If matched — extracts the exact tool call block(s) from thinking and sends a follow-up:
54
+ > Your last response had N tool call(s) inside your thinking block. Please execute them now:
55
+ >
56
+ > <tool_call>
57
+ <function=read>
58
+ <parameter=path>
59
+ /home/reluxa/.profile
60
+ </parameter>
61
+ </function>
62
+ </tool_call>
63
+ 4. `turn_end` — Resets retry counter
64
+ 5. Max 2 retries per turn before giving up
65
+
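Step 3 above can be sketched as follows. This is a minimal illustration rather than the extension's exact code: `buildFollowUp` is a hypothetical helper name, and the message wording is taken from the README example in step 3.

```typescript
const TOOL_OPEN = "<tool_call>";
const TOOL_CLOSE = "</tool_call>";

// Collect every complete <tool_call>...</tool_call> block, in order.
function extractAllToolCalls(text: string): string[] {
  const calls: string[] = [];
  let idx = 0;
  while (true) {
    const openIdx = text.indexOf(TOOL_OPEN, idx);
    if (openIdx === -1) break;
    const closeIdx = text.indexOf(TOOL_CLOSE, openIdx);
    if (closeIdx === -1) break; // unterminated block: ignore it
    calls.push(text.slice(openIdx, closeIdx + TOOL_CLOSE.length));
    idx = closeIdx + TOOL_CLOSE.length;
  }
  return calls;
}

// Hypothetical helper: build the follow-up user message,
// or return null when there is nothing to resend.
function buildFollowUp(thinking: string): string | null {
  const calls = extractAllToolCalls(thinking);
  if (calls.length === 0) return null;
  return (
    `Your last response had ${calls.length} tool call(s) inside your thinking block. ` +
    `Please execute them now:\n\n${calls.join("\n\n")}`
  );
}
```

Returning `null` for the empty case leaves it to the caller to decide whether to send a generic reminder or nothing at all.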
66
+ ### Trigger Conditions
67
+
68
+ | Condition | Must be |
69
+ |-----------|---------|
70
+ | Tool calls in thinking | >= 1 |
71
+ | Text blocks in content | 0 |
72
+ | Real toolCall entries | 0 |
73
+ | Live streaming | Yes |
74
+ | Retry count | < maxRetries (2) |
75
+
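The table above reduces to a single predicate. The sketch below is one way to write it, under stated assumptions: `GuardState`, `ContentBlock`, and `shouldTrigger` are hypothetical names for illustration, and pi's real message types may carry more fields than shown here.

```typescript
// Minimal shapes for this sketch; pi's real message types may differ.
type ContentBlock = { type: string; text?: string };

interface GuardState {
  thinking: string;          // accumulated thinking_delta text
  sawThinkingDelta: boolean; // true only during live streaming
  retryCount: number;        // follow-ups already sent
}

function shouldTrigger(
  state: GuardState,
  content: string | ContentBlock[],
  maxRetries = 2,
): boolean {
  // Number of opening markers found in the thinking text.
  const toolCallsInThinking = state.thinking.split("<tool_call>").length - 1;
  // Whitespace-only text does not count as text.
  const hasText =
    typeof content === "string"
      ? content.trim().length > 0
      : content.some((c) => c.type === "text" && (c.text ?? "").trim().length > 0);
  const hasRealToolCalls =
    typeof content !== "string" && content.some((c) => c.type === "toolCall");
  return (
    toolCallsInThinking >= 1 &&
    !hasText &&
    !hasRealToolCalls &&
    state.sawThinkingDelta &&
    state.retryCount < maxRetries
  );
}
```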
76
+ ### Configuration
77
+
78
+ Editable at the top of the extension file.
79
+
80
+ | Setting | Default | Notes |
81
+ |---------|---------|-------|
82
+ | `maxRetries` | 2 | Maximum automatic follow-ups per turn |
83
+
84
+ ### Running Tests
85
+
86
+ ```bash
87
+ node ~/.pi/agent/extensions/tests/thinking-only-guard.test.js
88
+ ```
89
+
90
+ 14 tests: single call, multiple calls, with text, with real toolCalls, plain thinking, empty thinking, extract N calls
91
+
92
+ ### Session Replay Results
93
+
94
+ Scanned 763 thinking-only messages across `~/.pi/agent/sessions/--home-reluxa--/`.
95
+ 4 would have triggered the guard (entries 50556cf1, 968d54e2, bf309c6e, 051b0538).
96
+
97
+ ## Design Decisions
98
+
99
+ ### Why send back as user message?
100
+
101
+ The turn is already finalized by `message_end` — pi will not re-scan for tool calls.
102
+ Sending the trapped call back triggers a fresh turn where the model executes it.
103
+
104
+ ### Why not modify the message in-place?
105
+
106
+ `message_end` can return `{ message }` to replace it, but the turn is already done.
107
+ Converting thinking to text only makes the calls visible, not executable.
108
+
109
+ ### Why not a custom provider wrapper?
110
+
111
+ Technically possible: intercept the raw streaming chunks and restructure `reasoning_content` into `tool_calls`.
112
+ However, this takes significantly more engineering and is fragile, since it depends on provider internals.
113
+ The current approach works and costs one extra turn.
114
+
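To make the wrapper option concrete, the sketch below parses one trapped block into a function name and arguments, which is the restructuring such a wrapper would need before it could emit real `tool_calls`. It is illustrative only: `ParsedCall` and `parseTrappedCall` are hypothetical names, the block format is assumed to match the example under Root Cause, and the streaming and provider plumbing are not shown.

```typescript
interface ParsedCall {
  name: string;
  args: Record<string, string>;
}

// Parse a single <tool_call> block of the form shown under Root Cause.
// Returns null when no <function=...> tag is present.
function parseTrappedCall(block: string): ParsedCall | null {
  const fn = /<function=([^>\s]+)>/.exec(block);
  if (!fn) return null;
  const args: Record<string, string> = {};
  const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
  let m: RegExpExecArray | null;
  while ((m = paramRe.exec(block)) !== null) {
    // Parameter values sit on their own lines, so trim surrounding whitespace.
    args[m[1]] = m[2].trim();
  }
  return { name: fn[1], args };
}
```

All argument values come back as strings; mapping them onto a tool's real parameter types would be a further step that depends on the tool schema.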
115
+ ## Related
116
+
117
+ - [Qwen3.6-27B on HuggingFace](https://huggingface.co/Qwen/Qwen3.6-27B)
118
+ - [SGLang issue: stops after thinking](https://github.com/sgl-project/sglang/issues/27021)
119
+ - [LMStudio: thinking + tool call issue](https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/2045)
package/package.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "name": "pi-thinking-only-guard",
3
+ "version": "0.1.0",
4
+ "description": "Auto-recover trapped tool calls from thinking blocks for Qwen3.6 and similar models",
5
+ "keywords": ["pi-package", "pi-coding-agent", "qwen", "thinking"],
6
+ "license": "MIT",
7
+ "repository": {
8
+ "type": "git",
9
+ "url": "git@github.com:reluxa/pi-thinking-only-guard.git"
10
+ },
11
+ "pi": {
12
+ "extensions": ["./thinking-only-guard.ts"]
13
+ }
14
+ }
@@ -0,0 +1,38 @@
1
+ #!/usr/bin/env node
2
+ const TOOL_OPEN = '<tool_call>';
3
+ const TOOL_CLOSE = '</tool_call>';
4
+ function countToolCalls(t){return t.split(TOOL_OPEN).length-1;}
5
+ function extractAllToolCalls(t){const r=[];let i=0;for(;;){const a=t.indexOf(TOOL_OPEN,i);if(a===-1)break;const b=t.indexOf(TOOL_CLOSE,a);if(b===-1)break;r.push(t.slice(a,b+TOOL_CLOSE.length));i=b+TOOL_CLOSE.length;}return r;}
6
+ function wouldThinking(text,c){
7
+ const n=countToolCalls(text);
8
+ const hasText=typeof c==="string"?c.trim().length>0:Array.isArray(c)?c.some(x=>x.type==="text"&&x.text&&x.text.trim().length>0):false;
9
+ const hasTC=Array.isArray(c)?c.some(x=>x.type==="toolCall"):false;
10
+ return n>=1&&!hasText&&!hasTC;
11
+ }
12
+ let p=0,f=0;
13
+ function assert(name,a,e){if(a===e){p++;console.log(" ✓ "+name);}else{f++;console.log(" ✗ "+name+" (got "+a+", expected "+e+")");}}
14
+ function assertCount(name,actual,expected){if(actual===expected){p++;console.log(" ✓ "+name);}else{f++;console.log(" ✗ "+name+" (got "+actual+", expected "+expected+")");}}
15
+ console.log("thinking-only-guard tests (multi-call version)");
16
+
17
+ // --- wouldThinking (should trigger) ---
18
+ assert("read in thinking, no text",wouldThinking('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
19
+ assert("bash in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\necho hello\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
20
+ assert("two tool calls in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\necho a\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
21
+ assert("bash + read in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
22
+ assert("bash + read + bash in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\npwd\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
23
+ assert("tool call + text before it, still 1 tc",wouldThinking('No model.json found.\n\n<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
24
+
25
+ // --- wouldThinking (should NOT trigger) ---
26
+ assert("read + text block",wouldThinking('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"},{type:"text",text:"x"}]),false);
27
+ assert("read + real toolCall",wouldThinking('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"},{type:"toolCall",id:"tc1"}]),false);
28
+ assert("plain thinking, no tool calls",wouldThinking('This is your home directory with quite a lot of files.',[{type:"thinking"}]),false);
29
+ assert("empty thinking",wouldThinking('',[{type:"thinking",thinking:""}]),false);
30
+
31
+ // --- extractAllToolCalls ---
32
+ assertCount("extract 1 tool call",extractAllToolCalls('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>').length,1);
33
+ assertCount("extract 2 tool calls",extractAllToolCalls('<tool_call>\n<function=bash>\n<parameter=command>\necho a\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call>').length,2);
34
+ assertCount("extract 3 tool calls",extractAllToolCalls('<tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\npwd\n</parameter>\n</function>\n</tool_call>').length,3);
35
+ assertCount("no tool calls",extractAllToolCalls('This is your home directory with quite a lot of files.').length,0);
36
+
37
+ console.log("passed: "+p+", failed: "+f);
38
+ process.exit(f>0?1:0);
package/thinking-only-guard.ts ADDED
@@ -0,0 +1,142 @@
1
+ /**
2
+ * thinking-only-guard.ts
3
+ *
4
+ * Detects when Qwen3.6 puts a tool call inside the thinking section
5
+ * instead of executing it as a real tool call, leaving the message
6
+ * with only thinking content and no text or tool calls.
7
+ *
8
+ * Extracts the trapped tool call and sends it back to the model
9
+ * so it can execute it properly instead of just asking to continue.
10
+ *
11
+ * Trigger condition:
12
+ * - One or more tool call markers inside thinking content
13
+ * - No text block in the content array
14
+ * - No real toolCall entries in the content array
15
+ * - Only fires during live streaming (not on session replay)
16
+ *
17
+ * Events used:
18
+ * - message_update: accumulates thinking content during streaming
19
+ * - turn_end: resets retry counter per turn
20
+ * - message_end: detects the pattern and sends the trapped tool call
21
+ */
22
+ import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
23
+
24
+ // Tool call markers used by pi's tool format
25
+ const TOOL_OPEN = "<tool_call>";
26
+ const TOOL_CLOSE = "</tool_call>";
27
+
28
+ /** Extract the first tool call block from thinking text.
29
+ * Returns the raw tool call string including markers, or null.
30
+ */
31
+ function extractToolCall(text: string): string | null {
32
+ const openIdx = text.indexOf(TOOL_OPEN);
33
+ if (openIdx === -1) return null;
34
+ const closeIdx = text.indexOf(TOOL_CLOSE, openIdx);
35
+ if (closeIdx === -1) return null;
36
+ return text.slice(openIdx, closeIdx + TOOL_CLOSE.length);
37
+ }
38
+
39
+ /** Count how many tool_call blocks appear in the thinking text */
40
+ function countToolCalls(text: string): number {
41
+ return (text.split(TOOL_OPEN).length - 1);
42
+ }
43
+
44
+ /** Extract all tool call blocks from thinking text */
45
+ function extractAllToolCalls(text: string): string[] {
46
+ const calls: string[] = [];
47
+ let idx = 0;
48
+ while (true) {
49
+ const openIdx = text.indexOf(TOOL_OPEN, idx);
50
+ if (openIdx === -1) break;
51
+ const closeIdx = text.indexOf(TOOL_CLOSE, openIdx);
52
+ if (closeIdx === -1) break;
53
+ calls.push(text.slice(openIdx, closeIdx + TOOL_CLOSE.length));
54
+ idx = closeIdx + TOOL_CLOSE.length;
55
+ }
56
+ return calls;
57
+ }
58
+
59
+ export default function (pi: ExtensionAPI) {
60
+ const maxRetries = 2;
61
+
62
+ let lastThinking = "";
63
+ let sawThinkingDelta = false;
64
+ let retryCount = 0;
65
+
66
+ pi.on("message_start", async (event) => {
67
+ if (event.message.role === "assistant") {
68
+ lastThinking = "";
69
+ sawThinkingDelta = false;
70
+ }
71
+ });
72
+
73
+ pi.on("message_update", async (event) => {
74
+ const amEvent = event.assistantMessageEvent;
75
+ if (amEvent.type === "thinking_delta") {
76
+ sawThinkingDelta = true;
77
+ lastThinking += amEvent.delta;
78
+ }
79
+ });
80
+
81
+ pi.on("turn_end", async () => {
82
+ retryCount = 0;
83
+ });
84
+
85
+ pi.on("message_end", async (event, ctx) => {
86
+ if (event.message.role !== "assistant") {
87
+ lastThinking = "";
88
+ sawThinkingDelta = false;
89
+ return;
90
+ }
91
+
92
+ const content = event.message.content;
93
+ const thinking = lastThinking;
94
+
95
+ // Only act on live streaming (not session replay / restore)
96
+ if (!sawThinkingDelta || thinking.length === 0) {
97
+ lastThinking = "";
98
+ sawThinkingDelta = false;
99
+ return;
100
+ }
101
+
102
+ const toolCallCount = countToolCalls(thinking);
103
+
104
+ // Check content array for text blocks and real toolCall entries
105
+ const hasText = typeof content === "string"
106
+ ? content.trim().length > 0
107
+ : Array.isArray(content)
108
+ ? content.some((c: any) => c.type === "text" && c.text?.trim().length > 0)
109
+ : false;
110
+
111
+ const hasRealToolCalls = Array.isArray(content)
112
+ ? content.some((c: any) => c.type === "toolCall")
113
+ : false;
114
+
115
+ // Trigger: at least one tool call trapped in thinking, no text, no real tool calls
116
+ if (toolCallCount >= 1 && !hasText && !hasRealToolCalls) {
117
+ if (retryCount >= maxRetries) {
118
+ ctx.ui.notify(`Thinking guard: stopped after ${maxRetries} retries`, "warn");
119
+ } else {
120
+ retryCount++;
121
+ const trappedCalls = extractAllToolCalls(thinking);
122
+ let followUp: string;
123
+ if (trappedCalls.length > 0) {
124
+ // Send the exact tool call(s) back so the model can execute them
125
+ followUp = `Your last response had ${trappedCalls.length} tool call(s) inside your thinking block instead of executing them. Please execute them now:\n\n${trappedCalls.join("\n\n")}`;
126
+ } else {
127
+ followUp = "You placed a tool call inside your thinking block instead of executing it. Please continue and make the tool call properly.";
128
+ }
129
+ ctx.ui.notify(
130
+ `Thinking guard: repeating trapped tool call(s) (attempt ${retryCount}/${maxRetries})`,
131
+ "warn"
132
+ );
133
+ setTimeout(() => {
134
+ pi.sendUserMessage(followUp, { triggerTurn: true });
135
+ }, 500);
136
+ }
137
+ }
138
+
139
+ lastThinking = "";
140
+ sawThinkingDelta = false;
141
+ });
142
+ }
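One edge case worth noting: `countToolCalls` counts opening markers, while `extractAllToolCalls` only returns blocks that also have a closing marker, so the two can disagree when the thinking text is cut off mid-call. The sketch below copies both helpers from the file above to show the difference; the truncated input is a made-up example.

```typescript
const TOOL_OPEN = "<tool_call>";
const TOOL_CLOSE = "</tool_call>";

// Counts opening markers only, whether or not the block is closed.
function countToolCalls(text: string): number {
  return text.split(TOOL_OPEN).length - 1;
}

// Returns only blocks that have both an opening and a closing marker.
function extractAllToolCalls(text: string): string[] {
  const calls: string[] = [];
  let idx = 0;
  while (true) {
    const openIdx = text.indexOf(TOOL_OPEN, idx);
    if (openIdx === -1) break;
    const closeIdx = text.indexOf(TOOL_CLOSE, openIdx);
    if (closeIdx === -1) break;
    calls.push(text.slice(openIdx, closeIdx + TOOL_CLOSE.length));
    idx = closeIdx + TOOL_CLOSE.length;
  }
  return calls;
}

// Hypothetical input: generation stopped before the closing marker.
const truncated = "<tool_call>\n<function=bash>\n<parameter=command>\nls";

const counted = countToolCalls(truncated);        // 1
const extracted = extractAllToolCalls(truncated); // []
console.log(counted, extracted.length);           // prints "1 0"
```

In that case the guard still triggers, because the count is 1, and it falls back to the generic follow-up message that does not include the call text.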