
pi-thinking-only-guard 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/LICENSE +21 -0
package/README.md +119 -0
package/package.json +14 -0
package/tests/thinking-only-guard.test.js +38 -0
package/thinking-only-guard.ts +142 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 reluxa
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,119 @@
+# Qwen3.6 Thinking-Only Guard — Pi Extension
+## Problem
+Qwen3.6-27B (and similar thinking-capable models via providers like airouter) sometimes places
+tool calls inside `reasoning_content` (the thinking block) instead of emitting them as proper `tool_calls` in the API response. When that happens:
+- `finish_reason: "stop"` — model thinks it is done
+- Thinking content contains `<tool_call>` ... `</tool_call>` blocks
+- No actual `tool_calls` in response — pi does not execute them
+- No text content — user sees empty or thinking-only response
+Known issue: [sgl-project/sglang#27021](https://github.com/sgl-project/sglang/issues/27021)
+## Root Cause
+Provider emits:
+```
+    reasoning_content: "<tool_call>
+<function=read>
+<parameter=path>
+/home/reluxa/.profile
+</parameter>
+</function>
+</tool_call>"
+    content: ""
+    finish_reason: "stop"
+```
+Pi's OpenAI-completions parser stores the reasoning as a `type: "thinking"` content block, finds no tool calls,
+and the turn ends. The model "stopped" from its perspective.
+## Solution: thinking-only-guard.ts
+A pi extension that detects this pattern during live streaming and sends the trapped
+tool call(s) back to the model so it can execute them properly.
+### Files
+| File | Purpose |
+|------|---------|
+| `~/.pi/agent/extensions/thinking-only-guard.ts` | The extension |
+| `~/.pi/agent/extensions/tests/thinking-only-guard.test.js` | Unit tests (14 tests) |
+### How It Works
+1. `message_start` / `message_update`: resets state for each new assistant message, then accumulates `thinking_delta` tokens into `lastThinking`
+2. `message_end`: checks whether the completed assistant message matches the pattern:
+   - `toolCallCount >= 1` (one or more `<tool_call>` blocks in thinking)
+   - `hasText === false` (no non-empty `type: "text"` block in the content array)
+   - `hasRealToolCalls === false` (no `type: "toolCall"` in the content array)
+   - `sawThinkingDelta === true` (only fires during live streaming, not session replay)
+3. If matched: extracts the exact tool call block(s) from thinking and sends them back as a follow-up user message:
+   > Your last response had N tool call(s) inside your thinking block. Please execute them now:
+   >
+   >     <tool_call>
+   >     <function=read>
+   >     <parameter=path>
+   >     /home/reluxa/.profile
+   >     </parameter>
+   >     </function>
+   >     </tool_call>
+4. `turn_end`: resets the retry counter
+5. Gives up after `maxRetries` (2) retries per turn
+### Trigger Conditions
+| Condition | Must be |
+|-----------|---------|
+| Tool calls in thinking | >= 1 |
+| Text blocks in content | 0 |
+| Real toolCall entries | 0 |
+| Live streaming | Yes |
+| Retry count | < maxRetries (2) |
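The steps and trigger table above can be condensed into one pure decision function that is testable without a running agent. This is a minimal sketch, not the extension's code: `ContentBlock`, `GuardState` and `buildFollowUp` are hypothetical names, the content-block shape is simplified to the fields the guard inspects, and the extension's "stopped after N retries" notification is left out (the function just returns `null`).

```typescript
// Simplified, hypothetical shape of a content block: only the fields the
// guard inspects are modeled here (an assumption, not pi's full type).
type ContentBlock = { type: string; text?: string };

interface GuardState {
  thinking: string;          // accumulated thinking_delta text
  sawThinkingDelta: boolean; // false when a session is replayed
  retryCount: number;        // retries already spent
  maxRetries: number;
}

const TOOL_OPEN = "<tool_call>";
const TOOL_CLOSE = "</tool_call>";

// Every complete <tool_call>...</tool_call> block, in order of appearance.
function extractAllToolCalls(thinking: string): string[] {
  const calls: string[] = [];
  let from = 0;
  for (;;) {
    const open = thinking.indexOf(TOOL_OPEN, from);
    if (open === -1) break;
    const close = thinking.indexOf(TOOL_CLOSE, open);
    if (close === -1) break;
    calls.push(thinking.slice(open, close + TOOL_CLOSE.length));
    from = close + TOOL_CLOSE.length;
  }
  return calls;
}

// Applies the trigger table row by row; returns the follow-up text, or null
// when the guard should stay quiet.
function buildFollowUp(content: ContentBlock[] | string, s: GuardState): string | null {
  if (!s.sawThinkingDelta || s.thinking.length === 0) return null; // live streaming only
  if (s.retryCount >= s.maxRetries) return null;                  // retry budget spent
  const hasText = typeof content === "string"
    ? content.trim().length > 0
    : content.some((c) => c.type === "text" && (c.text ?? "").trim().length > 0);
  const hasRealToolCalls =
    Array.isArray(content) && content.some((c) => c.type === "toolCall");
  const toolCallCount = s.thinking.split(TOOL_OPEN).length - 1;
  if (toolCallCount < 1 || hasText || hasRealToolCalls) return null;

  const calls = extractAllToolCalls(s.thinking);
  if (calls.length === 0) {
    // Opening marker without a closing one: nothing exact to echo back.
    return "You placed a tool call inside your thinking block instead of executing it. Please continue and make the tool call properly.";
  }
  return `Your last response had ${calls.length} tool call(s) inside your thinking block. ` +
    `Please execute them now:\n\n${calls.join("\n")}`;
}
```

With the decision kept pure, a `message_end` handler would only need to gather state and pass the result to `pi.sendUserMessage`, and unit tests could exercise this logic directly instead of a hand-copied version.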
+### Configuration
+Set via the `maxRetries` constant at the start of the exported function in `thinking-only-guard.ts`.
+| Setting | Default | Notes |
+|---------|---------|-------|
+| `maxRetries` | 2 | Max auto-continue per turn |
+### Running Tests
+```bash
+node ~/.pi/agent/extensions/tests/thinking-only-guard.test.js
+```
+14 tests: single call, multiple calls, with text, with real toolCalls, plain thinking, empty thinking, extract N calls
+### Session Replay Results
+Scanned 763 thinking-only messages across `~/.pi/agent/sessions/--home-reluxa--/`.
+4 would have triggered the guard (entries 50556cf1, 968d54e2, bf309c6e, 051b0538).
+## Design Decisions
+### Why send back as user message?
+The turn is already finalized by `message_end` — pi will not re-scan for tool calls.
+Sending the trapped call back triggers a fresh turn where the model executes it.
+### Why not modify the message in-place?
+`message_end` can return `{ message }` to replace it, but the turn is already done.
+Converting thinking to text only makes the calls visible, not executable.
+### Why not a custom provider wrapper?
+Technically possible — intercept raw streaming chunks, restructure reasoning_content into tool_calls.
+However, it takes significantly more engineering and is fragile, since it depends on provider internals.
+The current approach works at the cost of one extra turn.
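The wrapper alternative hinges on turning a trapped block back into a structured call. A rough sketch of just that step, assuming every trapped call uses the `<function=NAME>` / `<parameter=KEY>VALUE</parameter>` layout from the Root Cause example and that the target is an OpenAI-style `tool_calls` entry; `parseToolCall` and `toOpenAIToolCall` are hypothetical helpers. A real wrapper would still have to rewrite the streamed chunks and the `finish_reason`, which this does not attempt.

```typescript
// Hypothetical helpers: they assume every trapped call uses the layout shown in
// the Root Cause example, i.e. <function=NAME> wrapping <parameter=KEY>VALUE</parameter>.
interface ParsedToolCall {
  name: string;
  arguments: Record<string, string>; // raw string values, no type coercion
}

function parseToolCall(block: string): ParsedToolCall | null {
  const fn = /<function=([^>\s]+)>/.exec(block);
  if (fn === null) return null;
  const args: Record<string, string> = {};
  // [\s\S]*? matches lazily across newlines up to the nearest </parameter>.
  const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
  let m: RegExpExecArray | null;
  while ((m = paramRe.exec(block)) !== null) {
    // Values sit on their own line, so drop one leading and one trailing newline.
    args[m[1]] = m[2].replace(/^\n/, "").replace(/\n$/, "");
  }
  return { name: fn[1], arguments: args };
}

// OpenAI Chat Completions-style tool_calls entry; `arguments` is a JSON string.
function toOpenAIToolCall(call: ParsedToolCall, id: string) {
  return {
    id,
    type: "function" as const,
    function: { name: call.name, arguments: JSON.stringify(call.arguments) },
  };
}
```

Keeping values as strings avoids guessing parameter types; a fuller version would need the tool schemas to coerce numbers or booleans.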
+## Related
+- [Qwen3.6-27B on HuggingFace](https://huggingface.co/Qwen/Qwen3.6-27B)
+- [SGLang issue: stops after thinking](https://github.com/sgl-project/sglang/issues/27021)
+- [LMStudio: thinking + tool call issue](https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/2045)

package/package.json ADDED

@@ -0,0 +1,14 @@
+{
+  "name": "pi-thinking-only-guard",
+  "version": "0.1.0",
+  "description": "Auto-recover trapped tool calls from thinking blocks for Qwen3.6 and similar models",
+  "keywords": ["pi-package", "pi-coding-agent", "qwen", "thinking"],
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "git@github.com:reluxa/pi-thinking-only-guard.git"
+  },
+  "pi": {
+    "extensions": ["./thinking-only-guard.ts"]
+  }
+}

package/tests/thinking-only-guard.test.js ADDED

@@ -0,0 +1,38 @@
+#!/usr/bin/env node
+const TOOL_OPEN = '<tool_call>';
+const TOOL_CLOSE = '</tool_call>';
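+// Helpers mirroring the checks in thinking-only-guard.ts, so the tests run without pi.
+// Count opening <tool_call> markers in the thinking text.
+function countToolCalls(t) {
+  return t.split(TOOL_OPEN).length - 1;
+}
+// Return every complete <tool_call>...</tool_call> block, in order.
+function extractAllToolCalls(t) {
+  const r = [];
+  let i = 0;
+  for (;;) {
+    const a = t.indexOf(TOOL_OPEN, i);
+    if (a === -1) break;
+    const b = t.indexOf(TOOL_CLOSE, a);
+    if (b === -1) break;
+    r.push(t.slice(a, b + TOOL_CLOSE.length));
+    i = b + TOOL_CLOSE.length;
+  }
+  return r;
+}
+// True when the guard would fire: tool call(s) in thinking, no non-empty text, no real toolCall.
+function wouldThinking(text, c) {
+  const n = countToolCalls(text);
+  const hasText = typeof c === "string"
+    ? c.trim().length > 0
+    : Array.isArray(c)
+      ? c.some((x) => x.type === "text" && x.text && x.text.trim().length > 0)
+      : false;
+  const hasTC = Array.isArray(c) ? c.some((x) => x.type === "toolCall") : false;
+  return n >= 1 && !hasText && !hasTC;
+}
+let p = 0, f = 0;
+// Record a pass or fail and print a check mark or cross for each case.
+function assert(name, a, e) {
+  if (a === e) { p++; console.log("  ✓ " + name); }
+  else { f++; console.log("  ✗ " + name + " (got " + a + ", expected " + e + ")"); }
+}
+// Same as assert; used for the numeric extract counts.
+function assertCount(name, actual, expected) {
+  assert(name, actual, expected);
+}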
+console.log("thinking-only-guard tests (multi-call version)");
+// --- wouldThinking (should trigger) ---
+assert("read in thinking, no text",wouldThinking('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
+assert("bash in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\necho hello\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
+assert("two tool calls in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\necho a\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
+assert("bash + read in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
+assert("bash + read + bash in thinking, no text",wouldThinking('<tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\npwd\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
+assert("tool call + text before it, still 1 tc",wouldThinking('No model.json found.\n\n<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"}]),true);
+// --- wouldThinking (should NOT trigger) ---
+assert("read + text block",wouldThinking('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"},{type:"text",text:"x"}]),false);
+assert("read + real toolCall",wouldThinking('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>',[{type:"thinking"},{type:"toolCall",id:"tc1"}]),false);
+assert("plain thinking, no tool calls",wouldThinking('This is your home directory with quite a lot of files.',[{type:"thinking"}]),false);
+assert("empty thinking",wouldThinking('',[{type:"thinking",thinking:""}]),false);
+// --- extractAllToolCalls ---
+assertCount("extract 1 tool call",extractAllToolCalls('<tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call>').length,1);
+assertCount("extract 2 tool calls",extractAllToolCalls('<tool_call>\n<function=bash>\n<parameter=command>\necho a\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call>').length,2);
+assertCount("extract 3 tool calls",extractAllToolCalls('<tool_call>\n<function=bash>\n<parameter=command>\nls\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=read>\n<parameter=path>\n/home/reluxa/.profile\n</parameter>\n</function>\n</tool_call><tool_call>\n<function=bash>\n<parameter=command>\npwd\n</parameter>\n</function>\n</tool_call>').length,3);
+assertCount("no tool calls",extractAllToolCalls('This is your home directory with quite a lot of files.').length,0);
+console.log("passed: "+p+", failed: "+f);
+process.exit(f>0?1:0);

package/thinking-only-guard.ts ADDED

@@ -0,0 +1,142 @@
+/**
+ * thinking-only-guard.ts
+ *
+ * Detects when Qwen3.6 puts a tool call inside the thinking section
+ * instead of executing it as a real tool call, leaving the message
+ * with only thinking content and no text or tool calls.
+ *
+ * Extracts the trapped tool call(s) and sends them back to the model
+ * so it can execute them properly instead of just asking to continue.
+ *
+ * Trigger condition:
+ *   - One or more tool call markers inside thinking content
+ *   - No non-empty text block in the content array
+ *   - No real toolCall entries in the content array
+ *   - Only fires during live streaming (not on session replay)
+ *
+ * Events used:
+ *   - message_start: resets accumulated thinking for each assistant message
+ *   - message_update: accumulates thinking content during streaming
+ *   - turn_end: resets retry counter per turn
+ *   - message_end: detects the pattern and sends the trapped tool call(s)
+ */
+import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+// Tool call markers as emitted by the model (Qwen-style chat template)
+const TOOL_OPEN = "<tool_call>";
+const TOOL_CLOSE = "</tool_call>";
+/** Extract the tool call block from thinking text.
+ * Returns the raw tool call string including markers, or null.
+ */
+function extractToolCall(text: string): string | null {
+  const openIdx = text.indexOf(TOOL_OPEN);
+  if (openIdx === -1) return null;
+  const closeIdx = text.indexOf(TOOL_CLOSE, openIdx);
+  if (closeIdx === -1) return null;
+  return text.slice(openIdx, closeIdx + TOOL_CLOSE.length);
+}
+/** Count how many tool_call blocks appear in the thinking text */
+function countToolCalls(text: string): number {
+  return (text.split(TOOL_OPEN).length - 1);
+}
+/** Extract all tool call blocks from thinking text */
+function extractAllToolCalls(text: string): string[] {
+  const calls: string[] = [];
+  let idx = 0;
+  while (true) {
+    const openIdx = text.indexOf(TOOL_OPEN, idx);
+    if (openIdx === -1) break;
+    const closeIdx = text.indexOf(TOOL_CLOSE, openIdx);
+    if (closeIdx === -1) break;
+    calls.push(text.slice(openIdx, closeIdx + TOOL_CLOSE.length));
+    idx = closeIdx + TOOL_CLOSE.length;
+  }
+  return calls;
+}
+export default function (pi: ExtensionAPI) {
+  const maxRetries = 2;
+  let lastThinking = "";
+  let sawThinkingDelta = false;
+  let retryCount = 0;
+  pi.on("message_start", async (event) => {
+    if (event.message.role === "assistant") {
+      lastThinking = "";
+      sawThinkingDelta = false;
+    }
+  });
+  pi.on("message_update", async (event) => {
+    const amEvent = event.assistantMessageEvent;
+    if (amEvent.type === "thinking_delta") {
+      sawThinkingDelta = true;
+      lastThinking += amEvent.delta;
+    }
+  });
+  pi.on("turn_end", async () => {
+    retryCount = 0;
+  });
+  pi.on("message_end", async (event, ctx) => {
+    if (event.message.role !== "assistant") {
+      lastThinking = "";
+      sawThinkingDelta = false;
+      return;
+    }
+    const content = event.message.content;
+    const thinking = lastThinking;
+    // Only act on live streaming (not session replay / restore)
+    if (!sawThinkingDelta || thinking.length === 0) {
+      lastThinking = "";
+      sawThinkingDelta = false;
+      return;
+    }
+    const toolCallCount = countToolCalls(thinking);
+    // Check content array for text blocks and real toolCall entries
+    const hasText = typeof content === "string"
+      ? content.trim().length > 0
+      : Array.isArray(content)
+        ? content.some((c: any) => c.type === "text" && c.text?.trim().length > 0)
+        : false;
+    const hasRealToolCalls = Array.isArray(content)
+      ? content.some((c: any) => c.type === "toolCall")
+      : false;
+    // Trigger: one or more tool calls trapped in thinking, no text, no real tool calls
+    if (toolCallCount >= 1 && !hasText && !hasRealToolCalls) {
+      if (retryCount >= maxRetries) {
+        ctx.ui.notify(`Thinking guard: stopped after ${maxRetries} retries`, "warn");
+      } else {
+        retryCount++;
+        // Collect every complete <tool_call>...</tool_call> block, not just the first
+        const trappedCalls = extractAllToolCalls(thinking);
+        let followUp: string;
+        if (trappedCalls.length > 0) {
+          // Send the exact tool call(s) back so the model can execute them
+          followUp = `Your last response had ${trappedCalls.length} tool call(s) inside your thinking block. Please execute them now:\n\n${trappedCalls.join("\n")}`;
+        } else {
+          // Opening marker without a closing one: fall back to a generic nudge
+          followUp = "You placed a tool call inside your thinking block instead of executing it. Please continue and make the tool call properly.";
+        }
+        ctx.ui.notify(
+          `Thinking guard: repeating trapped tool call(s) (attempt ${retryCount}/${maxRetries})`,
+          "warn"
+        );
+        // Defer the follow-up by 500 ms rather than sending it from inside message_end
+        setTimeout(() => {
+          pi.sendUserMessage(followUp, { triggerTurn: true });
+        }, 500);
+      }
+    }
+    lastThinking = "";
+    sawThinkingDelta = false;
+  });
+}
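One possible gap in the retry cap: each follow-up starts a new turn, and if pi emits `turn_end` after the assistant's `message_end` for the same turn (an assumption here, not checked against pi's source), the `turn_end` handler zeroes `retryCount` right after every increment, so `maxRetries` would never stop a chain of trapped responses. A minimal sketch of an alternative that counts consecutive retries and resets only when a healthy response arrives; `RetryBudget` is a hypothetical helper, not part of pi.

```typescript
/**
 * Hypothetical helper: counts consecutive guard retries. It is reset only when
 * the model produces a healthy response (text or a real tool call), not on
 * every turn boundary.
 */
class RetryBudget {
  private used = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  /** Call when the trapped-tool-call pattern is detected; true if a retry is allowed. */
  tryConsume(): boolean {
    if (this.used >= this.max) return false; // refusal does not consume budget
    this.used++;
    return true;
  }

  /** Call from message_end when the assistant message did NOT match the pattern. */
  reset(): void {
    this.used = 0;
  }

  /** Retries spent since the last healthy response. */
  get attempts(): number {
    return this.used;
  }
}
```

Wired in, `message_end` could call `budget.tryConsume()` in place of the `retryCount >= maxRetries` check and `budget.reset()` on the non-matching path, with the `turn_end` reset dropped. That changes the cap from "per turn" to "per run of consecutive trapped responses".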