glm-mcp-copilot 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 djerok
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,63 @@
1
+ # glm-mcp-copilot — GLM as a cheap delegate for GitHub Copilot (VS Code)
2
+
3
+ Use the **GLM** model (Zhipu / Z.ai) as a **~10× cheaper delegate** inside **GitHub Copilot / Copilot
4
+ Chat** (VS Code agent mode). It's the **same GLM MCP server** used by the Claude Code version — Copilot
5
+ calls `glm_agent` / `glm_delegate` / `glm_recommend` / `glm_status` to offload work to GLM.
6
+
7
+ > Sibling package: **[glm-mcp-claude](https://github.com/djerok/glm-mcp)** (the Claude Code version). Same server, different host.
8
+
9
+ ## What you get
10
+ - The **glm MCP server** registered in VS Code (agent mode) — tools:
11
+ - **`glm_agent`** — GLM works your repo directly (read/write/edit/run), returns a concise summary + stats.
12
+ - **`glm_delegate`** — GLM drafts text only (no file operations); Copilot places it.
13
+ - **`glm_recommend`** — free, local advice on whether a task suits GLM or the default model.
14
+ - **`glm_status`** — usage ledger (proof of GLM tokens spent) + config.
15
+ - A **`.github/copilot-instructions.md`** delegation policy so Copilot offloads to GLM automatically.
16
+
17
+ ## Prerequisites
18
+ - **VS Code** with **GitHub Copilot + Copilot Chat**, and **Agent mode** available (MCP support).
19
+ - **Node.js ≥ 18**.
20
+ - A **Z.ai / Zhipu GLM Coding Plan** API key — https://z.ai (the only paid key needed).
21
+
22
+ ## Install
23
+ ```bash
24
+ # from npm:
25
+ npx glm-mcp-copilot --key YOUR_ZAI_API_KEY
26
+
27
+ # or clone the repo and run the Copilot installer:
28
+ git clone https://github.com/djerok/glm-mcp
29
+ node glm-mcp/copilot/install-copilot.mjs --key YOUR_ZAI_API_KEY
30
+ ```
31
+ Run it **from your project folder** (it sets up that workspace). It:
32
+ 1. installs the GLM MCP server to `~/.glm-mcp/glm-mcp/` and runs `npm install`,
33
+ 2. writes your key to that server's `.env`,
34
+ 3. registers the server in `.vscode/mcp.json` (VS Code's `servers` format),
35
+ 4. writes `.github/copilot-instructions.md` (the delegation policy).
36
+
37
+ Then in VS Code: **Reload Window → open Copilot Chat → Agent mode → start the `glm` server** (`MCP: List
38
+ Servers`). Ask Copilot to do a coding task; steered by the instructions file, it should call `glm_agent`.
39
+
40
+ ## How it differs from the Claude Code version
41
+ Copilot doesn't have Claude Code's *subagents* or *PreToolUse hooks*, so there's no auto-routing hook or
42
+ `glm` subagent. Instead:
43
+ - **MCP tools** (`glm_*`) are available in **agent mode** and Copilot calls them.
44
+ - **`.github/copilot-instructions.md`** steers Copilot to delegate to GLM (the CLAUDE.md equivalent).
45
+
46
+ Everything else — the GLM agent loop, peak-aware model pick, cost bias, token cap, usage ledger,
47
+ `dry_run` oversight — is the **same server**, so it behaves identically once a tool is called.
48
+
49
+ ## Configuration
50
+ Same `.env` knobs as the Claude version, in `~/.glm-mcp/glm-mcp/.env`:
51
+ `GLM_API_KEY`, `GLM_BASE_URL`, `GLM_COST_BIAS`, `GLM_CAP`, `GLM_MAX_TOKENS`, `GLM_OFFPEAK_MODEL` /
52
+ `GLM_PEAK_MODEL`, etc. See `glm-mcp/.env.example`.
53
+
54
+ ## Verifying GLM usage
55
+ `glm_status` summarizes the usage ledger at `~/.glm-mcp/glm-mcp/usage.jsonl`, which gets one JSON line per successful
56
+ GLM call (model + tokens): an on-disk record that the work ran on GLM rather than Copilot's default model.
57
+
58
+ ## Security
59
+ - Your key lives in `~/.glm-mcp/glm-mcp/.env` (git-ignored) — not committed, not in the npm package.
60
+ - GLM routes through servers in China — keep secrets/regulated code on the default model.
61
+
62
+ ## License
63
+ [MIT](LICENSE) © [djerok](https://github.com/djerok) · Canonical repo: https://github.com/djerok/glm-mcp
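Install step 3 registers the server in `.vscode/mcp.json` using VS Code's `servers` format. The sketch below shows one plausible shape of that merge; it is not the installer's actual code. It assumes a stdio server launched as `node <install dir>/src/index.js` (the `main` entry in the server's `package.json`) that reads its key from `.env`, so the entry carries no `env` block. The function name `addGlmServer` is hypothetical.

```javascript
// Hypothetical sketch: merge a "glm" entry into an existing .vscode/mcp.json object.
// Assumes the server is started over stdio as `node <installDir>/src/index.js`
// and loads GLM_API_KEY from its own .env, so no "env" block is needed here.
function addGlmServer(config, installDir) {
  // Copy the top level and the servers map so the caller's object is not mutated.
  const next = { ...config, servers: { ...(config && config.servers) } };
  next.servers.glm = {
    type: "stdio",
    command: "node",
    args: [`${installDir.replace(/\/$/, "")}/src/index.js`],
  };
  return next;
}

// Usage: servers the workspace already registers are kept.
const existing = { servers: { other: { type: "stdio", command: "other-mcp" } } };
const merged = addGlmServer(existing, "/home/me/.glm-mcp/glm-mcp/");
console.log(JSON.stringify(merged, null, 2));
```

Merging rather than overwriting matters because a workspace's `mcp.json` may already list other servers.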
@@ -0,0 +1,21 @@
1
+ <!-- GLM delegation policy for GitHub Copilot / Copilot Chat (agent mode). -->
2
+ <!-- Installed to .github/copilot-instructions.md so Copilot reads it automatically. -->
3
+
4
+ # GLM delegation
5
+
6
+ You have the **glm** MCP server available in agent mode, with these tools:
7
+ `glm_agent`, `glm_delegate`, `glm_recommend`, `glm_status`. GLM (Zhipu/Z.ai) is **~10× cheaper**
8
+ than the default Copilot model — offload work to it to save cost.
9
+
10
+ - For **any** implementation / codegen / edit / refactor / test / docs task, call **`glm_agent`**
11
+ with the goal + the absolute repo path (`workdir`). GLM reads, writes, edits, and runs the files
12
+ itself, end-to-end, on **GLM tokens**. Prefer this over doing the file work yourself.
13
+ - For pure text drafts (no file ops), use **`glm_delegate`**.
14
+ - **To actually spend GLM tokens (not Copilot's), you must call the glm tools — don't do the work
15
+ inline yourself.** Doing it yourself spends the default model's tokens, not GLM's.
16
+ - Keep on the **default model** for: sensitive/secret code, vision/images, parallel work, very large
17
+ context (>128K), and heavy dependent tool-loops. Unsure? call **`glm_recommend`** (free, local).
18
+ - `glm_status` shows the GLM usage ledger — proof of how many GLM tokens were actually spent.
19
+
20
+ Note: for oversight, `glm_agent` supports `dry_run: true` (propose a diff without writing) and returns
21
+ a concise summary + stats after each run.
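The policy above reduces to a short routing rule. The function below restates it for readers; it is a hypothetical illustration, not the logic inside `glm_recommend` (whose heuristics are not shown here). The task fields (`sensitive`, `hasImages`, `parallel`, `contextTokens`, `heavyToolLoop`, `touchesFiles`) are assumptions, and "128K" is read as 128,000 tokens.

```javascript
// Hypothetical restatement of the delegation policy; not glm_recommend's real logic.
// Returns where a task should go: "default-model", "glm_delegate", or "glm_agent".
function routeTask(task) {
  // Cases the policy keeps on the default Copilot model.
  const keepOnDefault =
    task.sensitive ||                        // secrets / regulated code
    task.hasImages ||                        // vision
    task.parallel ||                         // parallel work
    (task.contextTokens ?? 0) > 128_000 ||   // very large context (>128K)
    task.heavyToolLoop;                      // heavy dependent tool-loops
  if (keepOnDefault) return "default-model";
  // Pure text drafts with no file operations go to glm_delegate.
  if (!task.touchesFiles) return "glm_delegate";
  // Implementation, edits, refactors, tests, and docs go to glm_agent.
  return "glm_agent";
}
```

When a task does not fit these buckets cleanly, the policy says to ask `glm_recommend` instead of guessing.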
@@ -0,0 +1,37 @@
1
+ # Copy to .env and fill in. .env is git-ignored.
2
+
3
+ # Your GLM (Zhipu / Z.ai) API key. Used as a Bearer token.
4
+ GLM_API_KEY=your-zai-key-here
5
+
6
+ # Anthropic-compatible endpoint for the GLM coding plan.
7
+ GLM_BASE_URL=https://api.z.ai/api/anthropic
8
+
9
+ # --- optional tuning (sensible defaults baked in) ---
10
+ # GLM_USE_HAIKU=off # off (DEFAULT) = skip the Haiku `glm` subagent and call GLM directly
11
+ # # (mcp__glm__glm_agent) so ALL tokens stay on GLM. Set to `on` to allow
12
+ # # the Haiku-orchestrated subagent (it spends some Claude tokens).
13
+ # GLM_COST_BIAS=7 # how hard to favor GLM. Default 7 => GLM handles ~98-100% of tasks
14
+ # # (Opus only for vision/parallel/huge-context/sensitive/heavy tool-loops).
15
+ # # Lower (e.g. 1.5) to route more hard tasks to Opus; 0 = capability only.
16
+ # GLM_MAX_CONCURRENT=1 # GLM caps in-flight requests ~1; keep at 1 unless your tier allows more
17
+ # --- output token cap (OFF by default = generous) ---
18
+ # By default the cap is OFF: every call may use up to GLM_MAX_TOKENS_CEILING (131072).
19
+ # max_tokens is a ceiling, not a target -- you pay for ACTUAL output, so leaving it off
20
+ # just prevents truncation. Turn the cap ON to control spend.
21
+ # GLM_CAP=off # off (default) | on -- enforce GLM_MAX_TOKENS when on
22
+ # GLM_MAX_TOKENS=32768 # the hard per-call limit applied WHEN GLM_CAP=on
23
+ # GLM_MAX_TOKENS_CEILING=131072 # the generous default used when the cap is OFF
24
+ # GLM_MAX_RETRIES=4
25
+ # GLM_TIMEOUT_MS=300000
26
+ # GLM_AGENT_MAX_ITERS=30 # max tool-loop turns for glm_agent before it stops
27
+ # GLM_AGENT_BASH_TIMEOUT_MS=120000 # per-command timeout for glm_agent's run_bash
28
+ # GLM_OFFPEAK_MODEL=glm-5.2 # model(s) for "auto" off-peak. Can be a COMMA LIST, e.g.
29
+ # # "glm-5.2,glm-5-turbo" -> the router auto-picks (most capable for
30
+ # # hard tasks, cheapest for easy ones).
31
+ # GLM_PEAK_MODEL=glm-5.2 # model(s) for "auto" during peak. glm-5.x models carry a ~3x peak surcharge,
32
+ # # so if "auto" resolves to a glm-5.x model, the router sends LESS work to GLM
33
+ # # at peak. Include a model without the surcharge (e.g. "glm-5.2,glm-4.7") and
34
+ # # the router will prefer it at peak, so GLM stays worth using.
35
+ # GLM_CHEAP_MODEL=glm-4.5-air
36
+ # GLM_PEAK_START_CN=14 # peak window start, China hour (UTC+8)
37
+ # GLM_PEAK_END_CN=18 # peak window end (exclusive)
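Two of the knobs above reduce to small calculations: the peak window is given in China hours (UTC+8, end exclusive), and the output cap only applies when `GLM_CAP=on`. This sketch shows how they could be resolved from the environment. The function names are hypothetical, the server's actual code may differ, and the sketch ignores any per-call `max_tokens` a caller might request.

```javascript
// Hypothetical sketch of how the peak-window and token-cap settings could resolve.

// Peak window: [GLM_PEAK_START_CN, GLM_PEAK_END_CN) in China time (UTC+8, no DST).
// Assumes start < end, as with the defaults 14 and 18.
function isChinaPeak(date, env = {}) {
  const start = parseInt(env.GLM_PEAK_START_CN || "14", 10);
  const end = parseInt(env.GLM_PEAK_END_CN || "18", 10);
  const chinaHour = (date.getUTCHours() + 8) % 24;
  return chinaHour >= start && chinaHour < end; // end hour is exclusive
}

// Output cap: GLM_MAX_TOKENS applies only when GLM_CAP=on; otherwise the generous ceiling.
function resolveMaxTokens(env = {}) {
  const ceiling = parseInt(env.GLM_MAX_TOKENS_CEILING || "131072", 10);
  const capped = parseInt(env.GLM_MAX_TOKENS || "32768", 10);
  return String(env.GLM_CAP || "off").toLowerCase() === "on" ? capped : ceiling;
}

// Usage: 06:30 UTC is 14:30 in China, inside the default 14-18 window.
console.log(isChinaPeak(new Date("2026-01-01T06:30:00Z"))); // true
console.log(resolveMaxTokens({ GLM_CAP: "on" }));           // 32768
```

In a real server the `env` argument would simply be `process.env`.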
@@ -0,0 +1,21 @@
1
+ {
2
+ "name": "glm-mcp",
3
+ "version": "1.1.7",
4
+ "description": "MCP server that delegates self-contained subtasks to the GLM (Zhipu/Z.ai) Anthropic-compatible API, so Claude Code can use GLM as a cheap, peak-aware subagent.",
5
+ "type": "module",
6
+ "bin": {
7
+ "glm-mcp": "src/index.js"
8
+ },
9
+ "main": "src/index.js",
10
+ "scripts": {
11
+ "start": "node src/index.js",
12
+ "smoke": "node src/smoke.js"
13
+ },
14
+ "dependencies": {
15
+ "@modelcontextprotocol/sdk": "^1.0.0",
16
+ "zod": "^3.23.8"
17
+ },
18
+ "engines": {
19
+ "node": ">=18"
20
+ }
21
+ }
@@ -0,0 +1,227 @@
1
+ // glmAgent.js
2
+ // Runs GLM as a real tool-using agent against the local filesystem, with oversight
3
+ // built in so the calling model (Opus in Claude Code, Copilot's model in VS Code) can see exactly what GLM did:
4
+ // - returns a unified DIFF of every change (isolated to the files GLM touched)
5
+ // - returns an ACTION LOG of every read/write/edit/bash
6
+ // - records a non-invasive git checkpoint + revert hint (when in a git repo)
7
+ // - supports dry_run: GLM proposes changes to an in-memory overlay and writes NOTHING,
8
+ // so the calling model can approve the diff before a real apply pass.
9
+
10
+ import { readFileSync, writeFileSync, readdirSync, statSync, mkdirSync, existsSync } from "node:fs";
11
+ import { resolve, dirname, relative, isAbsolute, join } from "node:path";
12
+ import { execSync } from "node:child_process";
13
+ import { glmMessage } from "./glmClient.js";
14
+
15
+ const MAX_ITERS = parseInt(process.env.GLM_AGENT_MAX_ITERS || "30", 10) || 30; // fall back if the env value is not a number
16
+ const BASH_TIMEOUT = parseInt(process.env.GLM_AGENT_BASH_TIMEOUT_MS || "120000", 10) || 120000;
17
+ const FILE_READ_CAP = 100000;
18
+ const BASH_OUT_CAP = 30000;
19
+ const DIFF_CAP = 20000;
20
+ const DIFF_LINE_CAP = 3000;
21
+
22
+ const TOOLS = [
23
+ { name: "read_file", description: "Read a UTF-8 text file (path relative to working dir or absolute).",
24
+ input_schema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] } },
25
+ { name: "write_file", description: "Create or overwrite a file. Creates parent dirs as needed.",
26
+ input_schema: { type: "object", properties: { path: { type: "string" }, content: { type: "string" } }, required: ["path", "content"] } },
27
+ { name: "edit_file", description: "Replace an exact substring in a file. old_string must appear exactly once.",
28
+ input_schema: { type: "object", properties: { path: { type: "string" }, old_string: { type: "string" }, new_string: { type: "string" } }, required: ["path", "old_string", "new_string"] } },
29
+ { name: "list_dir", description: "List entries in a directory (relative or absolute). Defaults to '.'.",
30
+ input_schema: { type: "object", properties: { path: { type: "string" } } } },
31
+ { name: "run_bash", description: "Run a shell command in the working dir; returns stdout+stderr. Disabled in dry_run.",
32
+ input_schema: { type: "object", properties: { command: { type: "string" } }, required: ["command"] } },
33
+ ];
34
+
35
+ function safeResolve(root, p) {
36
+ return isAbsolute(p || "") ? resolve(p) : resolve(root, p || "."); // not a sandbox: absolute and "../" paths can leave root
37
+ }
38
+
39
+ function unifiedDiff(oldStr, newStr, path) {
40
+ if (oldStr === newStr) return "";
41
+ const A = oldStr.length ? oldStr.split("\n") : [];
42
+ const B = newStr.length ? newStr.split("\n") : [];
43
+ if (A.length > DIFF_LINE_CAP || B.length > DIFF_LINE_CAP) {
44
+ return `--- ${path}\n+++ ${path}\n@@ large file: ${A.length} -> ${B.length} lines (detailed diff omitted) @@\n`;
45
+ }
46
+ const n = A.length, m = B.length;
47
+ const dp = [];
48
+ for (let i = 0; i <= n; i++) dp.push(new Int32Array(m + 1));
49
+ for (let i = n - 1; i >= 0; i--)
50
+ for (let j = m - 1; j >= 0; j--)
51
+ dp[i][j] = A[i] === B[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
52
+ const rows = [];
53
+ let i = 0, j = 0;
54
+ while (i < n && j < m) {
55
+ if (A[i] === B[j]) { rows.push([" ", A[i]]); i++; j++; }
56
+ else if (dp[i + 1][j] >= dp[i][j + 1]) { rows.push(["-", A[i]]); i++; }
57
+ else { rows.push(["+", B[j]]); j++; }
58
+ }
59
+ while (i < n) rows.push(["-", A[i++]]);
60
+ while (j < m) rows.push(["+", B[j++]]);
61
+ const out = [`--- ${path}`, `+++ ${path}`];
62
+ let ctx = [];
63
+ const flush = () => {
64
+ if (ctx.length > 6) {
65
+ out.push(" " + ctx[0], " " + ctx[1], `@@ ... ${ctx.length - 4} unchanged ... @@`, " " + ctx[ctx.length - 2], " " + ctx[ctx.length - 1]);
66
+ } else for (const c of ctx) out.push(" " + c);
67
+ ctx = [];
68
+ };
69
+ for (const [t, l] of rows) {
70
+ if (t === " ") ctx.push(l);
71
+ else { flush(); out.push(t + l); }
72
+ }
73
+ flush();
74
+ return out.join("\n") + "\n";
75
+ }
76
+
77
+ function gitCheckpoint(root) {
78
+ try {
79
+ execSync(`git -C "${root}" rev-parse --is-inside-work-tree`, { stdio: "ignore" });
80
+ } catch {
81
+ return { isRepo: false, baseline: null, revertHint: "Not a git repo — review the diff below; revert manually if needed." };
82
+ }
83
+ let baseline = "";
84
+ try { baseline = execSync(`git -C "${root}" stash create`, { encoding: "utf8" }).trim(); } catch {}
85
+ if (!baseline) {
86
+ try { baseline = execSync(`git -C "${root}" rev-parse HEAD`, { encoding: "utf8" }).trim(); } catch {}
87
+ }
88
+ return {
89
+ isRepo: true,
90
+ baseline,
91
+ revertHint: baseline
92
+ ? `To revert GLM's changes: \`git -C "${root}" checkout ${baseline} -- .\`, then delete any files GLM created (marked "(new file)" in the diff). Avoid a blanket \`git clean -fd\`: it would also delete your own untracked files. (Baseline is a non-invasive snapshot of tracked files; your working tree was not modified by the checkpoint.)`
93
+ : "Git repo detected but baseline capture failed; use `git diff` / `git stash` to review and revert.",
94
+ };
95
+ }
96
+
97
+ export async function runGlmAgent({ model, task, context, workdir, maxTokens = 32768, thinking = false, dryRun = false }) {
98
+ const root = workdir && workdir.trim() ? resolve(workdir) : process.cwd();
99
+ const log = [];
100
+ const originals = new Map(); // abs -> pre-run disk content (string|null if didn't exist)
101
+ const overlay = new Map(); // dry_run staging: abs -> proposed content
102
+ const checkpoint = dryRun ? { isRepo: false, baseline: null, revertHint: "dry_run: nothing written." } : gitCheckpoint(root);
103
+
104
+ const recordOriginal = (abs) => {
105
+ if (!originals.has(abs)) {
106
+ try { originals.set(abs, readFileSync(abs, "utf8")); } catch { originals.set(abs, null); }
107
+ }
108
+ };
109
+ const readCurrent = (abs) => {
110
+ if (dryRun && overlay.has(abs)) return overlay.get(abs);
111
+ return readFileSync(abs, "utf8");
112
+ };
113
+ const writeCurrent = (abs, content) => {
114
+ if (dryRun) { overlay.set(abs, content); return; }
115
+ mkdirSync(dirname(abs), { recursive: true });
116
+ writeFileSync(abs, content, "utf8");
117
+ };
118
+
119
+ function runTool(name, input) {
120
+ try {
121
+ switch (name) {
122
+ case "read_file": {
123
+ const abs = safeResolve(root, input.path);
124
+ const txt = readCurrent(abs);
125
+ log.push(`read ${relative(root, abs) || input.path}`);
126
+ return txt.length > FILE_READ_CAP ? txt.slice(0, FILE_READ_CAP) + "\n…[truncated]" : txt;
127
+ }
128
+ case "write_file": {
129
+ const abs = safeResolve(root, input.path);
130
+ recordOriginal(abs);
131
+ writeCurrent(abs, input.content ?? "");
132
+ log.push(`${dryRun ? "[dry] " : ""}write ${relative(root, abs) || input.path}`);
133
+ return `${dryRun ? "(dry_run, staged) " : ""}Wrote ${(input.content ?? "").length} chars to ${input.path}.`;
134
+ }
135
+ case "edit_file": {
136
+ const abs = safeResolve(root, input.path);
137
+ let cur;
138
+ try { cur = readCurrent(abs); } catch { return `ERROR: cannot read ${input.path} to edit.`; }
139
+ const occ = cur.split(input.old_string).length - 1;
140
+ if (occ === 0) return `ERROR: old_string not found in ${input.path}. Read the file and retry with an exact match.`;
141
+ if (occ > 1) return `ERROR: old_string appears ${occ} times in ${input.path}; add surrounding lines to make it unique.`;
142
+ recordOriginal(abs);
143
+ writeCurrent(abs, cur.replace(input.old_string, () => input.new_string ?? "")); // function replacer: "$&", "$$" etc. in new_string stay literal
144
+ log.push(`${dryRun ? "[dry] " : ""}edit ${relative(root, abs) || input.path}`);
145
+ return `${dryRun ? "(dry_run, staged) " : ""}Edited ${input.path} (1 replacement).`;
146
+ }
147
+ case "list_dir": {
148
+ const abs = safeResolve(root, input.path || ".");
149
+ const entries = readdirSync(abs).map((e) => {
150
+ try { return statSync(join(abs, e)).isDirectory() ? e + "/" : e; } catch { return e; }
151
+ });
152
+ return entries.join("\n") || "(empty)";
153
+ }
154
+ case "run_bash": {
155
+ if (dryRun) return `[dry_run] bash disabled. Use read_file/list_dir to inspect. (cmd was: ${input.command})`;
156
+ log.push(`bash: ${String(input.command).slice(0, 80)}`);
157
+ let out;
158
+ try {
159
+ out = execSync(input.command, { cwd: root, timeout: BASH_TIMEOUT, encoding: "utf8", stdio: ["ignore", "pipe", "pipe"], shell: true });
160
+ } catch (e) {
161
+ out = `${e.stdout || ""}${e.stderr || ""}\n[exit ${e.status ?? "?"}] ${e.message}`;
162
+ }
163
+ return (out || "(no output)").slice(0, BASH_OUT_CAP);
164
+ }
165
+ default:
166
+ return `ERROR: unknown tool ${name}`;
167
+ }
168
+ } catch (e) {
169
+ return `ERROR (${name}): ${e.message}`;
170
+ }
171
+ }
172
+
173
+ const system =
174
+ `You are a capable coding agent operating directly on a local repository.\n` +
175
+ `Working directory: ${root}\n` +
176
+ (dryRun
177
+ ? `DRY RUN: your write_file/edit_file are STAGED, not written to disk, and run_bash is disabled. ` +
178
+ `Produce the complete set of intended changes, then stop and summarize them.\n`
179
+ : `Make changes yourself with the tools; run tests/builds to verify. `) +
180
+ `Tools: read_file, write_file, edit_file, list_dir, run_bash. When fully done, stop calling ` +
181
+ `tools and reply with a concise summary of what you changed and how you verified it.`;
182
+
183
+ const messages = [{ role: "user", content: context ? `${task}\n\n--- CONTEXT ---\n${context}` : task }];
184
+ let lastText = "";
185
+ const totalUsage = { input_tokens: 0, output_tokens: 0 };
186
+ let iters = 0;
187
+
188
+ for (; iters < MAX_ITERS; iters++) {
189
+ const { raw, usage } = await glmMessage({ model, system, messages, maxTokens, thinking, tools: TOOLS });
190
+ totalUsage.input_tokens += usage.input_tokens || 0;
191
+ totalUsage.output_tokens += usage.output_tokens || 0;
192
+ const content = raw.content || [];
193
+ const textParts = content.filter((b) => b.type === "text").map((b) => b.text);
194
+ if (textParts.length) lastText = textParts.join("\n").trim();
195
+ const toolUses = content.filter((b) => b.type === "tool_use");
196
+ if (raw.stop_reason !== "tool_use" || toolUses.length === 0) break;
197
+ messages.push({ role: "assistant", content });
198
+ messages.push({
199
+ role: "user",
200
+ content: toolUses.map((tu) => ({ type: "tool_result", tool_use_id: tu.id, content: String(runTool(tu.name, tu.input || {})) })),
201
+ });
202
+ }
203
+
204
+ // Build the diff from captured originals.
205
+ let diff = "";
206
+ for (const [abs, orig] of originals) {
207
+ let now;
208
+ if (dryRun) now = overlay.has(abs) ? overlay.get(abs) : orig ?? "";
209
+ else now = existsSync(abs) ? readFileSync(abs, "utf8") : "";
210
+ const d = unifiedDiff(orig ?? "", now ?? "", relative(root, abs) || abs);
211
+ if (d) diff += (orig == null ? `(new file)\n` : "") + d + "\n";
212
+ }
213
+ if (diff.length > DIFF_CAP) diff = diff.slice(0, DIFF_CAP) + "\n…[diff truncated]";
214
+
215
+ return {
216
+ text: lastText || "(GLM finished without a summary)",
217
+ actions: log,
218
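+ iters: Math.min(iters + 1, MAX_ITERS), // GLM turns taken; the loop index is one short when GLM finishes early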
219
+ hitCap: iters >= MAX_ITERS,
220
+ usage: totalUsage,
221
+ root,
222
+ dryRun,
223
+ diff: diff.trim(),
224
+ changedFiles: [...originals.keys()].map((a) => relative(root, a) || a),
225
+ git: checkpoint,
226
+ };
227
+ }
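`unifiedDiff` above is a longest-common-subsequence (LCS) line diff: it fills a suffix table where `dp[i][j]` is the LCS length of `A[i..]` and `B[j..]`, then walks it to emit kept, removed, and added lines. Stripping out the context folding makes the technique easier to see; `lineDiff` is an illustrative name, not part of the package.

```javascript
// LCS-based line diff, the same technique unifiedDiff uses (context folding omitted).
// Returns rows like [" ", line] (kept), ["-", line] (removed), ["+", line] (added).
function lineDiff(oldText, newText) {
  const A = oldText.length ? oldText.split("\n") : [];
  const B = newText.length ? newText.split("\n") : [];
  const n = A.length, m = B.length;
  // dp[i][j] = LCS length of A[i..] and B[j..]; the extra row and column stay 0.
  const dp = Array.from({ length: n + 1 }, () => new Int32Array(m + 1));
  for (let i = n - 1; i >= 0; i--)
    for (let j = m - 1; j >= 0; j--)
      dp[i][j] = A[i] === B[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
  const rows = [];
  let i = 0, j = 0;
  while (i < n && j < m) {
    if (A[i] === B[j]) { rows.push([" ", A[i]]); i++; j++; }                // common line
    else if (dp[i + 1][j] >= dp[i][j + 1]) { rows.push(["-", A[i]]); i++; } // dropping A[i] keeps the LCS
    else { rows.push(["+", B[j]]); j++; }                                   // otherwise B[j] is new
  }
  while (i < n) rows.push(["-", A[i++]]);
  while (j < m) rows.push(["+", B[j++]]);
  return rows;
}

console.log(lineDiff("a\nb\nc", "a\nx\nc"));
```

The table costs O(n·m) memory (about 36 MB of `Int32Array` at 3,000 × 3,000 lines), which is presumably why `unifiedDiff` falls back to a one-line summary above `DIFF_LINE_CAP`.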
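`edit_file` applies an edit only when `old_string` occurs exactly once, counting matches with `split`. A pitfall when doing the replacement with `String.prototype.replace` and a *string* argument is that JavaScript expands `$&`, `$$`, `` $` `` and `$'` in the replacement, which silently corrupts code containing those sequences; passing a function as the replacement inserts the text literally. `editOnce` below is an illustrative name, not part of the package.

```javascript
// Exact-once string edit in the style of glmAgent's edit_file tool.
// Returns { ok: true, text } on success, or { ok: false, error } if the match is missing or ambiguous.
function editOnce(text, oldString, newString) {
  if (!oldString) return { ok: false, error: "old_string must be non-empty" };
  const occurrences = text.split(oldString).length - 1; // non-overlapping matches
  if (occurrences === 0) return { ok: false, error: "old_string not found" };
  if (occurrences > 1) return { ok: false, error: `old_string appears ${occurrences} times` };
  // A function replacer inserts newString literally: no $&, $$, $` or $' expansion.
  return { ok: true, text: text.replace(oldString, () => newString) };
}

// Why the function replacer matters:
const src = "price = 1;";
console.log(src.replace("1", "$$5"));        // string replacement: "$$" collapses to "$"
console.log(editOnce(src, "1", "$$5").text); // literal: keeps "$$5"
```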
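At its core, `runGlmAgent` runs the Anthropic Messages tool-use loop: call the model, and while it stops with `stop_reason: "tool_use"`, append its `assistant` turn plus one `user` turn of `tool_result` blocks (matched by `tool_use_id`) and call again. The sketch below reduces that loop to its message bookkeeping. It uses a synchronous, scripted stand-in for the model so it runs offline; `runToolLoop` and the scripted responses are hypothetical, and the real client (`glmMessage`) is async.

```javascript
// Minimal Messages-API tool loop with an injectable model function (hypothetical sketch).
// callModel(messages) returns a Messages-API-shaped response:
// { stop_reason, content: [{ type: "text", text } | { type: "tool_use", id, name, input }] }
function runToolLoop({ task, callModel, runTool, maxIters = 30 }) {
  const messages = [{ role: "user", content: task }];
  let lastText = "";
  let turns = 0;
  while (turns < maxIters) {
    const raw = callModel(messages);
    turns++;
    const content = raw.content || [];
    const text = content.filter((b) => b.type === "text").map((b) => b.text).join("\n").trim();
    if (text) lastText = text;
    const toolUses = content.filter((b) => b.type === "tool_use");
    if (raw.stop_reason !== "tool_use" || toolUses.length === 0) {
      return { text: lastText, turns, hitCap: false };
    }
    // Echo the assistant turn, then answer every tool_use with a matching tool_result.
    messages.push({ role: "assistant", content });
    messages.push({
      role: "user",
      content: toolUses.map((tu) => ({
        type: "tool_result",
        tool_use_id: tu.id,
        content: String(runTool(tu.name, tu.input || {})),
      })),
    });
  }
  return { text: lastText, turns, hitCap: true };
}

// Usage with a scripted model: one tool call, then a final summary.
const script = [
  { stop_reason: "tool_use", content: [{ type: "tool_use", id: "t1", name: "list_dir", input: {} }] },
  { stop_reason: "end_turn", content: [{ type: "text", text: "Listed the directory." }] },
];
const result = runToolLoop({
  task: "List the repo root.",
  callModel: (messages) => script[Math.floor(messages.length / 2)],
  runTool: (name) => (name === "list_dir" ? "src/\npackage.json" : "ERROR: unknown tool"),
});
console.log(result);
```

Counting turns separately from the loop condition keeps the reported count right whether the model finishes early or the loop hits its cap.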
@@ -0,0 +1,176 @@
1
+ // glmClient.js
2
+ // Thin client over the GLM Anthropic-compatible endpoint with two things that
3
+ // matter for GLM specifically:
4
+ // 1. A concurrency gate (GLM caps in-flight requests at ~1 even on paid tiers).
5
+ // 2. Exponential backoff on 429 / "concurrency" / 5xx errors.
6
+
7
+ import { appendFileSync, readFileSync } from "node:fs";
8
+ import { fileURLToPath } from "node:url";
9
+ import { dirname, join } from "node:path";
10
+
11
+ // Local usage ledger: every GLM call is appended here so you have independent, on-disk
12
+ // proof of GLM usage (model + tokens), regardless of what the z.ai dashboard shows.
13
+ // View it: cat <server install dir>/usage.jsonl (e.g. ~/.glm-mcp/glm-mcp/usage.jsonl for the Copilot install).
14
+ const USAGE_LOG = join(dirname(fileURLToPath(import.meta.url)), "..", "usage.jsonl");
15
+ function logUsage(model, usage) {
16
+ try {
17
+ appendFileSync(
18
+ USAGE_LOG,
19
+ JSON.stringify({
20
+ ts: new Date().toISOString(),
21
+ model,
22
+ input_tokens: usage.input_tokens || 0,
23
+ output_tokens: usage.output_tokens || 0,
24
+ }) + "\n"
25
+ );
26
+ } catch {}
27
+ }
28
+
29
+ /** Cumulative GLM usage from the local ledger — independent proof of GLM token spend. */
30
+ export function usageSummary() {
31
+ const out = { calls: 0, input_tokens: 0, output_tokens: 0, total_tokens: 0, by_model: {}, log_path: USAGE_LOG };
32
+ try {
33
+ for (const l of readFileSync(USAGE_LOG, "utf8").trim().split(/\r?\n/)) {
34
+ let e;
35
+ try { e = JSON.parse(l); } catch { continue; } // skip blank or corrupt lines (e.g. a torn append)
36
+ out.calls++;
37
+ out.input_tokens += e.input_tokens || 0;
38
+ out.output_tokens += e.output_tokens || 0;
39
+ out.by_model[e.model] = (out.by_model[e.model] || 0) + 1;
40
+ }
41
+ } catch {} // no ledger yet: totals stay 0
42
+ out.total_tokens = out.input_tokens + out.output_tokens;
43
+ return out;
44
+ }
45
+
46
+ const BASE_URL = (process.env.GLM_BASE_URL || "https://api.z.ai/api/anthropic").replace(/\/$/, "");
47
+ const API_KEY = process.env.GLM_API_KEY || process.env.ANTHROPIC_AUTH_TOKEN || "";
48
+ const MAX_CONCURRENT = Math.max(1, parseInt(process.env.GLM_MAX_CONCURRENT || "1", 10) || 1); // NaN would stall acquire() forever
49
+ const MAX_RETRIES = Math.max(0, parseInt(process.env.GLM_MAX_RETRIES || "4", 10));
50
+ const TIMEOUT_MS = parseInt(process.env.GLM_TIMEOUT_MS || "300000", 10) || 300000; // NaN would abort every request almost at once
51
+
52
+ // ---- tiny semaphore so we never exceed GLM's concurrency cap ----
53
+ let active = 0;
54
+ const waiters = [];
55
+ async function acquire() {
56
+ if (active < MAX_CONCURRENT) {
57
+ active++;
58
+ return;
59
+ }
60
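+ await new Promise((res) => waiters.push(res));
61
+ // no active++ here: release() hands its slot directly to this waiter
62
+ }
63
+ function release() {
64
+ const next = waiters.shift();
65
+ if (next) next(); // transfer the slot without freeing it, so no other caller can grab it first
66
+ else active--;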
67
+ }
68
+
69
+ const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
70
+
71
+ function isRetryable(status, bodyText) {
72
+ if (status === 429 || status === 503 || status === 502 || status === 500) return true;
73
+ if (bodyText && /concurren|rate.?limit|too\s+much/i.test(bodyText)) return true;
74
+ return false;
75
+ }
76
+
77
+ /**
78
+ * Call GLM's /v1/messages (Anthropic Messages API shape).
79
+ * @param {object} p
80
+ * @param {string} p.model
81
+ * @param {Array} p.messages Anthropic-style messages
82
+ * @param {string} [p.system]
83
+ * @param {number} [p.maxTokens]
84
+ * @param {boolean} [p.thinking]
85
+ * @returns {Promise<{text:string, usage:object, raw:object}>}
86
+ */
87
+ export async function glmMessage({ model, messages, system, maxTokens = 32768, thinking = false, tools }) {
88
+ if (!API_KEY) {
89
+ throw new Error(
90
+ "GLM_API_KEY (or ANTHROPIC_AUTH_TOKEN) is not set. Add it to glm-mcp/.env or to the MCP server's env (.mcp.json for Claude Code, .vscode/mcp.json for VS Code)."
91
+ );
92
+ }
93
+
94
+ const body = {
95
+ model,
96
+ max_tokens: maxTokens,
97
+ messages,
98
+ ...(system ? { system } : {}),
99
+ ...(tools && tools.length ? { tools } : {}),
100
+ ...(thinking ? { thinking: { type: "enabled", budget_tokens: Math.min(8000, Math.floor(maxTokens / 2)) } } : {}), // Anthropic requires budget_tokens < max_tokens
101
+ };
102
+
103
+ await acquire();
104
+ try {
105
+ let attempt = 0;
106
+ // eslint-disable-next-line no-constant-condition
107
+ while (true) {
108
+ const controller = new AbortController();
109
+ const t = setTimeout(() => controller.abort(), TIMEOUT_MS);
110
+ let res, txt;
111
+ try {
112
+ res = await fetch(`${BASE_URL}/v1/messages`, {
113
+ method: "POST",
114
+ headers: {
115
+ "content-type": "application/json",
116
+ authorization: `Bearer ${API_KEY}`,
117
+ "x-api-key": API_KEY,
118
+ "anthropic-version": "2023-06-01",
119
+ },
120
+ body: JSON.stringify(body),
121
+ signal: controller.signal,
122
+ });
123
+ txt = await res.text();
124
+ } catch (e) {
125
+ clearTimeout(t);
126
+ if (attempt < MAX_RETRIES) {
127
+ await sleep(backoff(attempt++));
128
+ continue;
129
+ }
130
+ throw new Error(`GLM request failed (network/timeout): ${e.message}`);
131
+ }
132
+ clearTimeout(t);
133
+
134
+ if (!res.ok) {
135
+ if (isRetryable(res.status, txt) && attempt < MAX_RETRIES) {
136
+ await sleep(backoff(attempt++, txt));
137
+ continue;
138
+ }
139
+ throw new Error(`GLM API error ${res.status}: ${truncate(txt, 800)}`);
140
+ }
141
+
142
+ let json;
143
+ try {
144
+ json = JSON.parse(txt);
145
+ } catch {
146
+ throw new Error(`GLM returned non-JSON response: ${truncate(txt, 800)}`);
147
+ }
148
+
149
+ const text = (json.content || [])
150
+ .filter((b) => b.type === "text")
151
+ .map((b) => b.text)
152
+ .join("\n")
153
+ .trim();
154
+
155
+ logUsage(model, json.usage || {}); // on-disk proof of GLM usage
156
+ return { text, usage: json.usage || {}, raw: json };
157
+ }
158
+ } finally {
159
+ release();
160
+ }
161
+ }
162
+
163
+ function backoff(attempt, bodyText) {
164
+ // Honor concurrency errors with a slightly longer floor.
165
+ const concurrency = bodyText && /concurren|too\s+much/i.test(bodyText);
166
+ const base = concurrency ? 2000 : 800;
167
+ const jitter = Math.random() * 400;
168
+ return Math.min(base * 2 ** attempt + jitter, 30000);
169
+ }
170
+
171
+ function truncate(s, n) {
172
+ if (!s) return "";
173
+ return s.length > n ? s.slice(0, n) + "…[truncated]" : s;
174
+ }
175
+
176
+ export const config = { BASE_URL, MAX_CONCURRENT, MAX_RETRIES, hasKey: Boolean(API_KEY) };
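`backoff` doubles the wait on each retry, starting from 800 ms (2 s when the error body mentions concurrency), adds under 400 ms of random jitter, and caps the delay at 30 s. Leaving out the jitter makes the schedule easy to tabulate; `backoffSchedule` is an illustrative helper, not part of the package.

```javascript
// Deterministic view of glmClient's backoff (jitter omitted): min(base * 2^attempt, 30 s).
function backoffSchedule(retries, concurrency = false) {
  const base = concurrency ? 2000 : 800;
  return Array.from({ length: retries }, (_, attempt) => Math.min(base * 2 ** attempt, 30000));
}

console.log(backoffSchedule(4));       // [ 800, 1600, 3200, 6400 ]
console.log(backoffSchedule(4, true)); // [ 2000, 4000, 8000, 16000 ]
```

With the default `GLM_MAX_RETRIES=4`, a failing request therefore sleeps about 12 s in total (30 s for concurrency errors) plus under 1.6 s of jitter, not counting the time each attempt itself takes.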
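The usage ledger is plain JSON Lines: `logUsage` appends one object per successful call with `ts`, `model`, `input_tokens`, and `output_tokens`. This sketch totals such a ledger from its text and skips blank or malformed lines instead of aborting. `summarizeLedger` is a hypothetical name (the real `usageSummary` reads the file itself), and the sample lines are made up for illustration.

```javascript
// Totals a usage.jsonl ledger given its text (same fields logUsage writes).
function summarizeLedger(jsonlText) {
  const out = { calls: 0, input_tokens: 0, output_tokens: 0, total_tokens: 0, by_model: {} };
  for (const line of jsonlText.split(/\r?\n/)) {
    let e;
    try { e = JSON.parse(line); } catch { continue; } // blank or torn line
    if (!e || typeof e !== "object") continue;
    out.calls++;
    out.input_tokens += e.input_tokens || 0;
    out.output_tokens += e.output_tokens || 0;
    out.by_model[e.model] = (out.by_model[e.model] || 0) + 1; // calls per model
  }
  out.total_tokens = out.input_tokens + out.output_tokens;
  return out;
}

// Made-up sample: two complete entries and one torn write.
const sample = [
  '{"ts":"2026-01-01T00:00:00Z","model":"glm-4.5-air","input_tokens":100,"output_tokens":20}',
  '{"ts":"2026-01-01T00:01:00Z","model":"glm-4.5-air","input_tokens":50,"output_tokens":5}',
  '{"ts":"2026-01-01T00:02:00Z","mod',
].join("\n");
console.log(summarizeLedger(sample));
```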