@estebanforge/pi-glm-tweaks 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 EstebanForge
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,73 @@
1
+ # @estebanforge/pi-glm-tweaks
2
+
3
+ Pi-native tweaks for Z.AI's **GLM-5.2**. Restricts the Pi thinking-level UI to the three modes GLM-5.2 actually supports (**off**, **high**, **max**), wires the native `thinkingFormat:"zai"` translation, and auto-clamps any stale level when the model is selected.
4
+
5
+ ## Install
6
+
7
+ ```
8
+ pi install npm:@estebanforge/pi-glm-tweaks
9
+ ```
10
+
11
+ Works with Pi's built-in `zai/glm-5.2` model out of the box, or a custom entry in `~/.pi/agent/models.json`. The extension re-registers it with the OpenAI-compat endpoint and the proper thinking map. Other Z.AI models (`zai/glm-4.7`, `zai/glm-5-turbo`, `zai/glm-5.1`, plus any custom entries) are preserved across the re-registration.
12
+
13
+ ## What it does
14
+
15
+ GLM-5.2 ships three thinking modes (per [docs.z.ai](https://docs.z.ai/guides/capabilities/thinking)):
16
+
17
+ | Pi thinking level | GLM-5.2 wire |
18
+ | --- | --- |
19
+ | `off` | `thinking: { type: "disabled" }` |
20
+ | `high` | `thinking: { type: "enabled" }` + `reasoning_effort: "high"` |
21
+ | `max` (Pi `xhigh`) | `thinking: { type: "enabled" }` + `reasoning_effort: "max"` |
22
+
23
+ Pi natively exposes six thinking levels (`off`, `minimal`, `low`, `medium`, `high`, `xhigh`). GLM-5.2 doesn't fit this scale well: `low`/`medium` get mapped to `high` server-side, `minimal` skips thinking, and `xhigh` is the only way to reach `reasoning_effort: "max"`.
24
+
25
+ This extension collapses that mismatch:
26
+
27
+ 1. **Re-registers `zai/glm-5.2`** on `session_start` with `api: "openai-completions"`, `baseUrl: https://api.z.ai/api/coding/paas/v4`, `compat.thinkingFormat: "zai"`, and a tight `thinkingLevelMap`:
28
+ ```ts
29
+ {
30
+ minimal: null, // hidden
31
+ low: null, // hidden
32
+ medium: null, // hidden
33
+ high: "high", // → reasoning_effort: "high"
34
+ xhigh: "max", // → reasoning_effort: "max"
35
+ // off omitted → supported, sends thinking.type = "disabled"
36
+ }
37
+ ```
38
+ 2. **Auto-clamps on `model_select`**: if the current level is one we hid (e.g. you switched from a model that allowed `medium`), it switches to `high` and shows a notification.
39
+ 3. **Footer hint**: sets `ctx.ui.setStatus("glm-thinking", "thinking: off | high | max")` while GLM-5.2 is the active model, and clears it when you switch to another model.
40
+
41
+ `Shift+Tab`, `/thinking`, and the level picker all see only the three GLM-5.2 modes.
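The level handling described in this section can be sketched as two pure functions. This is a minimal illustration, not Pi's code: `clampLevel`, `toZaiWire`, and the `ZaiThinkingFields` shape are hypothetical names, and the wire fields follow the README table rather than the actual `openai-completions.js` implementation.

```ts
// Hypothetical sketch of the GLM-5.2 level handling; not Pi's own code.
type PiThinkingLevel = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";
type SupportedLevel = "off" | "high" | "xhigh";

// Same shape as the extension's thinkingLevelMap: null marks a hidden level.
const THINKING_LEVEL_MAP: Record<Exclude<PiThinkingLevel, "off">, "high" | "max" | null> = {
  minimal: null,
  low: null,
  medium: null,
  high: "high",
  xhigh: "max",
};

// Wire fields as described in the README table (assumed shape).
interface ZaiThinkingFields {
  thinking: { type: "enabled" | "disabled" };
  reasoning_effort?: "high" | "max";
}

// What the model_select hook does: any hidden level falls back to "high".
function clampLevel(level: PiThinkingLevel): SupportedLevel {
  if (level === "off" || level === "high" || level === "xhigh") return level;
  return "high";
}

// Translate a supported level into the request fields GLM-5.2 expects.
function toZaiWire(level: SupportedLevel): ZaiThinkingFields {
  if (level === "off") return { thinking: { type: "disabled" } };
  const effort = THINKING_LEVEL_MAP[level];
  if (effort === null) throw new Error(`level ${level} is hidden for GLM-5.2`);
  return { thinking: { type: "enabled" }, reasoning_effort: effort };
}

console.log(clampLevel("medium")); // high
console.log(JSON.stringify(toZaiWire(clampLevel("xhigh"))));
// {"thinking":{"type":"enabled"},"reasoning_effort":"max"}
```

Keeping `off` out of the map mirrors the extension: `off` is always supported and needs no `reasoning_effort`.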
42
+
43
+ ## Token-efficiency tweaks
44
+
45
+ GLM-5.2 overthinks on long agent loops: it can spend an entire turn on `reasoning_content` without taking a tool call. The Z.AI API does not expose a `max_thinking_tokens` parameter, so the post that popularised this observation caps thinking at the provider layer instead, via mid-stream injection. We can't intercept the stream, but we can approximate the win with three cheap, opt-out tweaks:
46
+
47
+ | Flag | Default | What it does |
48
+ | --- | --- | --- |
49
+ | `glm-budget-nudge` | `true` | (a) Appends a soft thinking-budget fragment to the system prompt on every `zai/glm-5.2` turn. (b) Before each LLM call, sums the length of `reasoning_content` across prior assistant messages in the current agent loop (the one started by the most recent user prompt); once the total reaches ~2000 characters (roughly 500 English tokens), injects a one-shot hint to push the model back toward tool calls. Fires at most once per loop. The hint appears in the conversation panel as a user message prefixed `[system reminder: ...]`; that is intentional, so you can see when the ratchet fired. |
50
+ | `glm-clear-thinking` | `true` | Forces `clear_thinking: true` on every request. The coding endpoint (`api.z.ai/api/coding/paas/v4`) defaults to preserved thinking, which silently compounds `reasoning_content` across turns. At $4.4/MTok output, this is real money. |
51
+ | `glm-quick-disable` | `true` | For user prompts under 80 characters, forces `thinking.type: "disabled"` on every request in that turn, overriding the selected level. Trivial questions ("what time is it") don't need deep thinking. |
52
+
53
+ All three flags surface in `pi config` and Pi's flag editor; for example, `pi config set glm-budget-nudge false` disables the budget nudge.
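The subtle part of `glm-budget-nudge` is scoping the reasoning count to the current agent loop. The sketch below isolates that logic as pure functions, assuming (as the extension does) that assistant messages expose their reasoning as a `reasoning_content` string; `LoopMessage`, `findLoopStart`, `loopReasoningChars`, and `shouldFireRatchet` are hypothetical names. The 2,000-character threshold corresponds to roughly 500 tokens under the common ~4-characters-per-token rule of thumb for English.

```ts
// Hypothetical message shape: the extension assumes assistant messages may
// carry their reasoning text in a `reasoning_content` string field.
interface LoopMessage {
  role?: string;
  reasoning_content?: unknown;
}

const RATCHET_THRESHOLD_CHARS = 2_000; // ~500 tokens at ~4 chars per token

// Index of the most recent user message (start of the current loop), or -1.
function findLoopStart(messages: readonly LoopMessage[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === "user") return i;
  }
  return -1;
}

// Reasoning characters emitted by assistant messages after the loop start.
function loopReasoningChars(messages: readonly LoopMessage[]): number {
  let total = 0;
  for (const m of messages.slice(findLoopStart(messages) + 1)) {
    if (m.role === "assistant" && typeof m.reasoning_content === "string") {
      total += m.reasoning_content.length;
    }
  }
  return total;
}

// One-shot: fire at most once per loop, and only at or past the threshold.
function shouldFireRatchet(messages: readonly LoopMessage[], alreadyFired: boolean): boolean {
  return !alreadyFired && loopReasoningChars(messages) >= RATCHET_THRESHOLD_CHARS;
}

// Reasoning from an earlier loop must not count toward the current one.
const history: LoopMessage[] = [
  { role: "user" },
  { role: "assistant", reasoning_content: "x".repeat(5_000) }, // previous loop
  { role: "user" }, // current loop starts here
  { role: "assistant", reasoning_content: "y".repeat(1_500) },
  { role: "toolResult" },
  { role: "assistant", reasoning_content: "z".repeat(600) },
];

console.log(loopReasoningChars(history)); // 2100
console.log(shouldFireRatchet(history, false)); // true
console.log(shouldFireRatchet(history, true)); // false
```

Without the loop boundary, the 5,000 characters from the previous loop would trip the ratchet on the first call of every new turn.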
54
+
55
+ ### What the tweaks cannot do
56
+
57
+ - Cap thinking tokens at the wire level. Z.AI does not expose a thinking-budget parameter.
58
+ - Inject text mid-stream. Pi has no hook for mutating streamed chunks.
59
+ - Force the model to call a tool. The system prompt can ask; nothing forces it.
60
+ - Lower `reasoning_effort` per-request. Per [KiwiGaze/glm-for-copilot #7](https://github.com/KiwiGaze/glm-for-copilot/issues/7) it's a no-op on `/chat/completions`.
61
+
62
+ ## Why this exists
63
+
64
+ Pi's built-in `thinkingFormat: "zai"` (in `openai-completions.js`) already knows the wire translation. The catch is that a user-defined GLM-5.2 entry in `models.json` typically lacks a `thinkingLevelMap`, so the UI shows all six levels and sends invalid combinations when one of the unsupported levels is selected. This extension fills that gap automatically, with no manual `models.json` editing.
65
+
66
+ ## Compatibility
67
+
68
+ - Pi (`@earendil-works/pi-coding-agent`): any version in which `registerProvider` takes effect after the core is bound (no `/reload` needed) and `thinkingFormat: "zai"` is supported, plus the `before_agent_start`, `context`, and `before_provider_request` hooks and `registerFlag`.
69
+ - Z.AI API key: resolved through Pi's standard auth storage (env var `ZAI_API_KEY`, `/login`, or `models.json` provider `apiKey`). The extension does not configure auth; if no key resolves at session start, it shows a warning and leaves GLM-5.2 unchanged.
70
+
71
+ ## License
72
+
73
+ MIT
@@ -0,0 +1,289 @@
1
+ /**
2
+ * pi-glm-tweaks — Pi-native tweaks for Z.AI's GLM-5.2.
3
+ *
4
+ * Restricts the Pi thinking-level UI to the three modes GLM-5.2 actually
5
+ * supports (off, high, max), wires up the native `thinkingFormat: "zai"`
6
+ * translation, auto-clamps hidden levels, and applies token-efficiency
7
+ * hygiene (per-turn system-prompt nudge, intra-loop ratchet, wire-level
8
+ * clear_thinking and short-prompt quick-disable).
9
+ *
10
+ * Wire map (see https://docs.z.ai/guides/capabilities/thinking and
11
+ * providers/openai-completions.js in pi-ai):
12
+ *
13
+ * Pi level | thinking.type | reasoning_effort
14
+ * ----------|---------------|------------------
15
+ * off | "disabled" | (omitted)
16
+ * high | "enabled" | "high"
17
+ * xhigh | "enabled" | "max"
18
+ *
19
+ * Hidden levels (minimal, low, medium) are Pi-side concepts that don't map
20
+ * cleanly: low/medium get server-side-mapped to "high", minimal is a no-op
21
+ * for Pi's reasoning transport. Showing them invites accidental footguns.
22
+ *
23
+ * Behavior:
24
+ * - On session_start, re-register the `zai` provider with GLM-5.2 redefined
25
+ * against the OpenAI-compat endpoint and the tight thinkingLevelMap.
26
+ * registerProvider takes effect immediately after bindCore (no /reload).
27
+ * - On model_select to zai/glm-5.2, clamp a stale hidden level to "high"
28
+ * and notify. Set the footer status hint.
29
+ * - On model_select to any other model, clear the footer status.
30
+ * - On every user turn, inject a soft system-prompt budget fragment
31
+ * (`glm-budget-nudge`, default on).
32
+ * - Per LLM call, count cumulative reasoning_content; if over a
33
+ * threshold, inject a one-shot user-side hint to push the model back
34
+ * toward tool calls (`glm-budget-nudge`).
35
+ * - On every outgoing request, force `clear_thinking: true` (the coding
36
+ * endpoint defaults to preserved thinking, which silently compounds
37
+ * `reasoning_content` across turns). `glm-clear-thinking`, default on.
38
+ * - On short user prompts (<80 chars), force `thinking.type: "disabled"`
39
+ * to save tokens on trivial turns. `glm-quick-disable`, default on.
40
+ *
41
+ * Auth is untouched. The provider's existing key (ZAI_API_KEY env, /login,
42
+ * or models.json apiKey) continues to resolve against the new baseUrl.
43
+ */
44
+ import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
45
+
46
+ const PROVIDER = "zai";
47
+ const MODEL_ID = "glm-5.2";
48
+ const ZAI_CODING_BASE_URL = "https://api.z.ai/api/coding/paas/v4";
49
+
50
+ // Pi thinking-level keys we hide for GLM-5.2. Listed explicitly so the map
51
+ // stays grep-friendly; any level not present (notably `off`) is supported
52
+ // with the provider's default mapping (here: thinking.type="disabled").
53
+ const HIDDEN_LEVELS = new Set(["minimal", "low", "medium"]);
54
+
55
+ // Token-efficiency tuning constants. Hardcoded for v1; exposing them as
56
+ // flags would be over-engineering for a single-model extension. Bump these in
57
+ // a future minor if users report the ratchet firing too eagerly / not
58
+ // eagerly enough.
59
+ const SHORT_PROMPT_THRESHOLD = 80;
60
+ const RATCHET_THRESHOLD_CHARS = 2_000;
61
+
62
+ // Soft system-prompt fragment appended to every zai/glm-5.2 turn when
63
+ // the budget-nudge flag is on. It deliberately asks for no "I'm
64
+ // overthinking" acknowledgement: that would be unenforceable (the model
65
+ // may or may not emit it, may emit it in Chinese, and we'd have to detect it).
66
+ const BUDGET_FRAGMENT = `
67
+
68
+ <glm-thinking-budget>
69
+ You are operating under a per-turn thinking budget. Behave accordingly:
70
+ - Cap each thinking block at ~500 tokens. Don't ruminate; commit to a tool call or response.
71
+ - Take a tool call every 200-300 thinking tokens. Don't sit and speculate without acting.
72
+ - Prefer a concrete tool call over further internal deliberation.
73
+ </glm-thinking-budget>`;
74
+
75
+ // Redefined glm-5.2 model entry. `cost` mirrors the built-in (Z.AI does
76
+ // not publish per-token rates; the zeros mean Pi reports no spend). thinkingLevelMap
77
+ // doubles as UI-hide (`null`) and wire-level safety net: Pi's zai branch
78
+ // in openai-completions.js reads this map for reasoning_effort, and a
79
+ // null entry produces no reasoning_effort field on the wire. baseUrl
80
+ // is per-model (not provider-level) so we don't override any custom
81
+ // baseUrl the user may have set on other `zai/*` models.
82
+ const GLM52_MODEL = {
83
+ id: MODEL_ID,
84
+ name: "GLM-5.2",
85
+ api: "openai-completions",
86
+ baseUrl: ZAI_CODING_BASE_URL,
87
+ reasoning: true,
88
+ input: ["text"] as ("text" | "image")[],
89
+ contextWindow: 1_000_000,
90
+ maxTokens: 131_072,
91
+ cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
92
+ thinkingLevelMap: {
93
+ minimal: null,
94
+ low: null,
95
+ medium: null,
96
+ high: "high",
97
+ xhigh: "max",
98
+ },
99
+ compat: {
100
+ supportsDeveloperRole: false,
101
+ supportsReasoningEffort: true,
102
+ thinkingFormat: "zai" as const,
103
+ zaiToolStream: true,
104
+ },
105
+ };
106
+
107
+ function isZaiGlm52(model: { provider: string; id: string } | undefined | null): boolean {
108
+ return !!model && model.provider === PROVIDER && model.id === MODEL_ID;
109
+ }
110
+
111
+ export default function (pi: ExtensionAPI) {
112
+ // Register Pi-idiomatic flags at factory load time, NOT inside
113
+ // session_start. registerFlag is static setup; calling it per session
114
+ // would clobber user preferences on every /new or /reload.
115
+ pi.registerFlag("glm-budget-nudge", {
116
+ description: "Inject a soft thinking-budget system-prompt fragment and intra-loop ratchet for zai/glm-5.2.",
117
+ type: "boolean",
118
+ default: true,
119
+ });
120
+ pi.registerFlag("glm-clear-thinking", {
121
+ description: "Force clear_thinking=true on zai/glm-5.2 requests to prevent cross-turn reasoning_content carryover on the coding endpoint.",
122
+ type: "boolean",
123
+ default: true,
124
+ });
125
+ pi.registerFlag("glm-quick-disable", {
126
+ description: "Disable thinking on short user prompts (<80 chars) to save tokens on trivial turns.",
127
+ type: "boolean",
128
+ default: true,
129
+ });
130
+
131
+ // Per-loop mutable state. Node.js runs the extension hooks single-
132
+ // threaded, so a closure-scoped object is safe and avoids recomputing
133
+ // per-turn facts (such as prompt length) in every hook. Reset on every before_agent_start.
134
+ const loop: {
135
+ shortPrompt: boolean;
136
+ ratchetFired: boolean;
137
+ } = { shortPrompt: false, ratchetFired: false };
138
+
139
+ pi.on("session_start", async (_event, ctx) => {
140
+ // Build the full `zai` provider model list, patching only glm-5.2.
141
+ // registerProvider replaces ALL models for the provider when models
142
+ // are provided, so a single-entry list would silently drop
143
+ // glm-4.7, glm-5-turbo, glm-5.1, and any user-added zai entries.
144
+ const existing = ctx.modelRegistry.getAll().filter((m) => m.provider === PROVIDER);
145
+ if (existing.length === 0) return;
146
+ if (!existing.some((m) => m.id === MODEL_ID)) return;
147
+
148
+ // registerProvider requires apiKey (or oauth) when defining models,
149
+ // even for a provider that already has auth resolved. Pull the
150
+ // resolved key from the existing provider so we keep working
151
+ // whether the user used ZAI_API_KEY env, /login, or models.json
152
+ // apiKey.
153
+ const apiKey = await ctx.modelRegistry.getApiKeyForProvider(PROVIDER);
154
+ if (!apiKey) {
155
+ ctx.ui.notify(
156
+ "pi-glm-tweaks: ZAI auth not configured. Run `/login` or set ZAI_API_KEY to enable GLM-5.2 thinking tweaks.",
157
+ "warning",
158
+ );
159
+ return;
160
+ }
161
+
162
+ // Per-model spread preserves every original field (api, baseUrl,
163
+ // headers, compat extras) for non-target models. Only glm-5.2 gets
164
+ // the new thinkingLevelMap, baseUrl, and OpenAI-compat compat block.
165
+ // baseUrl is set at BOTH provider level (required by validation;
166
+ // satisfies the model-registry check) and per-model in GLM52_MODEL
167
+ // (per-model takes precedence at request time, so any custom
168
+ // baseUrl the user has on other `zai/*` models is preserved by
169
+ // the spread).
170
+ const models = existing.map((m) => (isZaiGlm52(m) ? GLM52_MODEL : { ...m }));
171
+ pi.registerProvider(PROVIDER, {
172
+ baseUrl: ZAI_CODING_BASE_URL,
173
+ apiKey,
174
+ models,
175
+ });
176
+ });
177
+
178
+ pi.on("before_agent_start", (event, ctx) => {
179
+ // Reset per-loop state at the start of each user turn. The other
180
+ // hooks read these to drive their per-turn behavior.
181
+ loop.shortPrompt = event.prompt.length < SHORT_PROMPT_THRESHOLD;
182
+ loop.ratchetFired = false;
183
+
184
+ if (!isZaiGlm52(ctx.model)) return {};
185
+ if (pi.getFlag("glm-budget-nudge") !== true) return {};
186
+
187
+ // Return the assembled prompt with our fragment appended. We must
188
+ // concatenate rather than replace: in Pi's before_agent_start chain,
189
+ // the systemPrompt we return replaces the upstream value, and other
190
+ // extensions downstream only see what we return.
191
+ return { systemPrompt: (event.systemPrompt ?? "") + BUDGET_FRAGMENT };
192
+ });
193
+
194
+ pi.on("context", (event, ctx) => {
195
+ if (!isZaiGlm52(ctx.model)) return {};
196
+ if (pi.getFlag("glm-budget-nudge") !== true) return {};
197
+ if (loop.ratchetFired) return {};
198
+
199
+ // Sum reasoning_content from assistant messages in the CURRENT
200
+ // agent loop only. Find the boundary by walking back to the last
201
+ // `role: "user"` message (the prompt that started this loop).
202
+ // toolResult / assistant / custom / etc. are not user role, so
203
+ // they don't reset the boundary. Without this scoping, a long
204
+ // session would fire the ratchet on the first LLM call of every
205
+ // new turn regardless of current-loop thinking.
206
+ let loopStart = event.messages.length - 1;
207
+ while (loopStart >= 0) {
208
+ const m = event.messages[loopStart] as { role?: string } | undefined;
209
+ if (m?.role === "user") break;
210
+ loopStart--;
211
+ }
212
+
213
+ let totalReasoning = 0;
214
+ for (let i = loopStart + 1; i < event.messages.length; i++) {
215
+ const m = event.messages[i];
216
+ if (typeof m !== "object" || m === null) continue;
217
+ const msg = m as { role?: string; reasoning_content?: unknown };
218
+ if (msg.role !== "assistant") continue;
219
+ if (typeof msg.reasoning_content === "string") {
220
+ totalReasoning += msg.reasoning_content.length;
221
+ }
222
+ }
223
+ if (totalReasoning < RATCHET_THRESHOLD_CHARS) return {};
224
+
225
+ loop.ratchetFired = true;
226
+ const hint = {
227
+ role: "user",
228
+ content:
229
+ "[system reminder: you've been thinking extensively without taking a tool call. Take a tool call now or wrap up your response.]",
230
+ };
231
+ return { messages: [...event.messages, hint as never] };
232
+ });
233
+
234
+ pi.on("before_provider_request", (event, ctx) => {
235
+ if (!isZaiGlm52(ctx.model)) return;
236
+ if (!event.payload || typeof event.payload !== "object") return;
237
+
238
+ const obj = event.payload as Record<string, unknown>;
239
+ const current = obj.thinking;
240
+ const thinking =
241
+ current && typeof current === "object" && !Array.isArray(current)
242
+ ? { ...(current as Record<string, unknown>) }
243
+ : ({} as Record<string, unknown>);
244
+
245
+ let mutated = false;
246
+
247
+ // Force clear_thinking on every request. The coding endpoint
248
+ // defaults to preserved thinking (clear_thinking: false), which
249
+ // silently compounds reasoning_content across turns. Cost at
250
+ // $4.4/MTok output makes this materially expensive.
251
+ if (pi.getFlag("glm-clear-thinking") === true) {
252
+ thinking.clear_thinking = true;
253
+ mutated = true;
254
+ }
255
+
256
+ // Short-prompt quick-disable: trivial turns ("what time is it")
257
+ // don't need deep thinking. Set thinking.type = "disabled" directly on
258
+ // the outgoing payload, overriding the selected level for this loop.
259
+ if (pi.getFlag("glm-quick-disable") === true && loop.shortPrompt) {
260
+ thinking.type = "disabled";
261
+ mutated = true;
262
+ }
263
+
264
+ if (mutated) {
265
+ obj.thinking = thinking;
266
+ }
267
+ return obj;
268
+ });
269
+
270
+ pi.on("model_select", (event, ctx) => {
271
+ if (!isZaiGlm52(event.model)) {
272
+ ctx.ui.setStatus("glm-thinking", undefined);
273
+ return;
274
+ }
275
+
276
+ // Auto-clamp if Pi's current level is one we hid for GLM-5.2.
277
+ // setThinkingLevel is a no-op if already at the requested level.
278
+ const current = pi.getThinkingLevel();
279
+ if (HIDDEN_LEVELS.has(current)) {
280
+ pi.setThinkingLevel("high");
281
+ ctx.ui.notify(
282
+ `GLM-5.2 thinking: "${current}" not supported. Switched to high (off | high | max).`,
283
+ "info",
284
+ );
285
+ }
286
+
287
+ ctx.ui.setStatus("glm-thinking", "thinking: off | high | max");
288
+ });
289
+ }
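The `before_provider_request` handler above edits the payload in place. The same rules can be expressed as a non-mutating function, which makes them straightforward to unit-test. This is a sketch: `PatchOptions` and `patchZaiPayload` are hypothetical names standing in for the extension's flags and per-loop state. Like the original handler, it leaves any `reasoning_effort` field untouched.

```ts
// Hypothetical stand-ins for the extension's flags and per-loop state.
interface PatchOptions {
  clearThinking: boolean; // glm-clear-thinking
  quickDisable: boolean; // glm-quick-disable
  prompt: string; // user prompt that started the current loop
}

const SHORT_PROMPT_THRESHOLD = 80;

// Returns a payload with the thinking tweaks applied; never mutates its input.
function patchZaiPayload(
  payload: Record<string, unknown>,
  opts: PatchOptions,
): Record<string, unknown> {
  const current = payload.thinking;
  // Copy an existing thinking object so the caller's payload stays untouched.
  const thinking: Record<string, unknown> =
    typeof current === "object" && current !== null && !Array.isArray(current)
      ? { ...(current as Record<string, unknown>) }
      : {};
  let mutated = false;

  if (opts.clearThinking) {
    thinking.clear_thinking = true; // stop reasoning_content carrying over
    mutated = true;
  }
  if (opts.quickDisable && opts.prompt.length < SHORT_PROMPT_THRESHOLD) {
    thinking.type = "disabled"; // trivial prompt: skip thinking for this loop
    mutated = true;
  }
  return mutated ? { ...payload, thinking } : payload;
}

const request = { model: "glm-5.2", thinking: { type: "enabled" }, reasoning_effort: "max" };
const patched = patchZaiPayload(request, {
  clearThinking: true,
  quickDisable: true,
  prompt: "what time is it",
});
console.log(JSON.stringify(patched.thinking)); // {"type":"disabled","clear_thinking":true}
console.log(JSON.stringify(request.thinking)); // {"type":"enabled"}
```

Returning the original object when no rule applies keeps the no-op path cheap, similar to the `mutated` check in the handler.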
package/package.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "name": "@estebanforge/pi-glm-tweaks",
3
+ "version": "1.0.0",
4
+ "description": "Pi-native tweaks for Z.AI's GLM-5.2. Restricts the Pi thinking-level UI to the three modes GLM-5.2 actually supports (off, high, max), wires up the native thinkingFormat:\"zai\" translation, and auto-clamps hidden levels when the model is selected.",
5
+ "keywords": [
6
+ "pi-package",
7
+ "pi-extension",
8
+ "zai",
9
+ "z-ai",
10
+ "glm",
11
+ "glm-5.2",
12
+ "thinking",
13
+ "reasoning_effort",
14
+ "thinking-mode"
15
+ ],
16
+ "license": "MIT",
17
+ "author": {
18
+ "name": "EstebanForge",
19
+ "email": "esteban@attitude.cl",
20
+ "url": "https://actitud.xyz"
21
+ },
22
+ "repository": {
23
+ "type": "git",
24
+ "url": "git+https://github.com/EstebanForge/pi-glm-tweaks.git"
25
+ },
26
+ "type": "module",
27
+ "files": [
28
+ "extensions",
29
+ "README.md"
30
+ ],
31
+ "pi": {
32
+ "extensions": [
33
+ "./extensions"
34
+ ]
35
+ },
36
+ "peerDependencies": {
37
+ "@earendil-works/pi-coding-agent": "*"
38
+ },
39
+ "peerDependenciesMeta": {
40
+ "@earendil-works/pi-coding-agent": {
41
+ "optional": true
42
+ }
43
+ },
44
+ "devDependencies": {
45
+ "@earendil-works/pi-coding-agent": "^0.80.2",
46
+ "@types/node": "^22.0.0",
47
+ "typescript": "^5.8.0"
48
+ }
49
+ }