npm - glm-mcp-copilot - Versions diffs - 1.0.0 - Mend

glm-mcp-copilot 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

package/LICENSE +21 -0
package/README.md +63 -0
package/copilot-instructions.md +21 -0
package/glm-mcp/.env.example +37 -0
package/glm-mcp/package.json +21 -0
package/glm-mcp/src/glmAgent.js +227 -0
package/glm-mcp/src/glmClient.js +176 -0
package/glm-mcp/src/index.js +312 -0
package/glm-mcp/src/loadEnv.js +24 -0
package/glm-mcp/src/router.js +305 -0
package/glm-mcp/src/smoke.js +42 -0
package/install-copilot.mjs +102 -0
package/mcp.json.example +9 -0
package/package.json +43 -0

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 djerok
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,63 @@
+# glm-mcp-copilot — GLM as a cheap delegate for GitHub Copilot (VS Code)
+Use the **GLM** model (Zhipu / Z.ai) as a **~10× cheaper delegate** inside **GitHub Copilot / Copilot
+Chat** (VS Code agent mode). It's the **same GLM MCP server** used by the Claude Code version — Copilot
+calls `glm_agent` / `glm_delegate` / `glm_recommend` / `glm_status` to offload work to GLM.
+> Sibling package: **[glm-mcp-claude](../README.md)** (the Claude Code version). Same server, different host.
+## What you get
+- The **glm MCP server** registered in VS Code (agent mode) — tools:
+  - **`glm_agent`** — GLM works your repo directly (read/write/edit/run), returns a concise summary + stats.
+  - **`glm_delegate`** — GLM drafts text you place.
+  - **`glm_recommend`** — free advisory: GLM vs the default model.
+  - **`glm_status`** — usage ledger (proof of GLM tokens spent) + config.
+- A **`.github/copilot-instructions.md`** delegation policy so Copilot offloads to GLM automatically.
+## Prerequisites
+- **VS Code** with **GitHub Copilot + Copilot Chat**, and **Agent mode** available (MCP support).
+- **Node.js ≥ 18**.
+- A **Z.ai / Zhipu GLM Coding Plan** API key — https://z.ai (the only paid key needed).
+## Install
+```bash
+# from npm:
+npx glm-mcp-copilot --key YOUR_ZAI_API_KEY
+# or clone the repo and run the Copilot installer:
+git clone https://github.com/djerok/glm-mcp
+node glm-mcp/copilot/install-copilot.mjs --key YOUR_ZAI_API_KEY
+```
+Run it **from your project folder** (it sets up that workspace). It:
+1. installs the GLM MCP server to `~/.glm-mcp/glm-mcp/` and runs `npm install`,
+2. writes your key to that server's `.env`,
+3. registers the server in `.vscode/mcp.json` (VS Code's `servers` format),
+4. writes `.github/copilot-instructions.md` (the delegation policy).
+Then in VS Code: **Reload Window → open Copilot Chat → Agent mode → start the `glm` server** (`MCP: List
+Servers`). Ask Copilot to do a coding task; it will call `glm_agent`.
+## How it differs from the Claude Code version
+Copilot doesn't have Claude Code's *subagents* or *PreToolUse hooks*, so there's no auto-routing hook or
+`glm` subagent. Instead:
+- **MCP tools** (`glm_*`) are available in **agent mode** and Copilot calls them.
+- **`.github/copilot-instructions.md`** steers Copilot to delegate to GLM (the CLAUDE.md equivalent).
+Everything else — the GLM agent loop, peak-aware model pick, cost bias, token cap, usage ledger,
+`dry_run` oversight — is the **same server**, so it behaves identically once a tool is called.
+## Configuration
+Same `.env` knobs as the Claude version, in `~/.glm-mcp/glm-mcp/.env`:
+`GLM_API_KEY`, `GLM_BASE_URL`, `GLM_COST_BIAS`, `GLM_CAP`, `GLM_MAX_TOKENS`, `GLM_OFFPEAK_MODEL` /
+`GLM_PEAK_MODEL`, etc. See `glm-mcp/.env.example`.
+## Verifying GLM usage
+`glm_status` (or `~/.glm-mcp/glm-mcp/usage.jsonl`) logs every GLM call (model + tokens) — independent
+proof that work ran on GLM, not Copilot's default model.
+## Security
+- Your key lives in `~/.glm-mcp/glm-mcp/.env` (git-ignored) — not committed, not in the npm package.
+- GLM routes through servers in China — keep secrets/regulated code on the default model.
+## License
+[MIT](LICENSE) © [djerok](https://github.com/djerok) · Canonical repo: https://github.com/djerok/glm-mcp
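
The "Verifying GLM usage" section of the README describes `usage.jsonl` as a per-call ledger. As a minimal sketch, assuming each line is a JSON object with the `model`, `input_tokens` and `output_tokens` fields that `logUsage` in `glmClient.js` writes, the ledger text can be totalled per model like this. `tokensByModel` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper (not in the package): total calls and tokens per model
// from the text content of a usage.jsonl ledger.
function tokensByModel(jsonlText) {
  const totals = {};
  for (const line of jsonlText.split(/\r?\n/)) {
    if (!line.trim()) continue; // ignore blank lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip a malformed line instead of discarding the whole ledger
    }
    const t = totals[entry.model] || (totals[entry.model] = { calls: 0, input_tokens: 0, output_tokens: 0 });
    t.calls += 1;
    t.input_tokens += entry.input_tokens || 0;
    t.output_tokens += entry.output_tokens || 0;
  }
  return totals;
}
```

In practice the text would come from reading `~/.glm-mcp/glm-mcp/usage.jsonl`. Skipping malformed lines one at a time is a design choice of this sketch, so that a single truncated write does not hide the remaining entries.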

package/copilot-instructions.md ADDED

@@ -0,0 +1,21 @@
+<!-- GLM delegation policy for GitHub Copilot / Copilot Chat (agent mode). -->
+<!-- Installed to .github/copilot-instructions.md so Copilot reads it automatically. -->
+# GLM delegation
+You have the **glm** MCP server available in agent mode, with these tools:
+`glm_agent`, `glm_delegate`, `glm_recommend`, `glm_status`. GLM (Zhipu/Z.ai) is **~10× cheaper**
+than the default Copilot model — offload work to it to save cost.
+- For **any** implementation / codegen / edit / refactor / test / docs task, call **`glm_agent`**
+  with the goal + the absolute repo path (`workdir`). GLM reads, writes, edits, and runs the files
+  itself, end-to-end, on **GLM tokens**. Prefer this over doing the file work yourself.
+- For pure text drafts (no file ops), use **`glm_delegate`**.
+- **To actually spend GLM tokens (not Copilot's), you must call the glm tools — don't do the work
+  inline yourself.** Doing it yourself spends the default model's tokens, not GLM's.
+- Keep on the **default model** for: sensitive/secret code, vision/images, parallel work, very large
+  context (>128K), and heavy dependent tool-loops. Unsure? call **`glm_recommend`** (free, local).
+- `glm_status` shows the GLM usage ledger — proof of how many GLM tokens were actually spent.
+Note: for oversight, `glm_agent` supports `dry_run: true` (propose a diff without writing) and returns
+a concise summary + stats after each run.

package/glm-mcp/.env.example ADDED

@@ -0,0 +1,37 @@
+# Copy to .env and fill in. .env is git-ignored.
+# Your GLM (Zhipu / Z.ai) API key. Used as a Bearer token.
+GLM_API_KEY=your-zai-key-here
+# Anthropic-compatible endpoint for the GLM coding plan.
+GLM_BASE_URL=https://api.z.ai/api/anthropic
+# --- optional tuning (sensible defaults baked in) ---
+# GLM_USE_HAIKU=off           # off (DEFAULT) = skip the Haiku `glm` subagent and call GLM directly
+#                             # (mcp__glm__glm_agent) so ALL tokens stay on GLM. Set to `on` to allow
+#                             # the Haiku-orchestrated subagent (it spends some Claude tokens).
+# GLM_COST_BIAS=7             # how hard to favor GLM. Default 7 => GLM handles ~98-100% of tasks
+#                             # (Opus only for vision/parallel/huge-context/sensitive/heavy tool-loops).
+#                             # Lower (e.g. 1.5) to route more hard tasks to Opus; 0 = capability only.
+# GLM_MAX_CONCURRENT=1        # GLM caps in-flight requests ~1; keep at 1 unless your tier allows more
+# --- output token cap (OFF by default = generous) ---
+# By default the cap is OFF: every call may use up to GLM_MAX_TOKENS_CEILING (131072).
+# max_tokens is a ceiling, not a target -- you pay for ACTUAL output, so leaving it off
+# just prevents truncation. Turn the cap ON to control spend.
+# GLM_CAP=off                      # off (default) | on  -- enforce GLM_MAX_TOKENS when on
+# GLM_MAX_TOKENS=32768             # the hard per-call limit applied WHEN GLM_CAP=on
+# GLM_MAX_TOKENS_CEILING=131072    # the generous default used when the cap is OFF
+# GLM_MAX_RETRIES=4
+# GLM_TIMEOUT_MS=300000
+# GLM_AGENT_MAX_ITERS=30           # max tool-loop turns for glm_agent before it stops
+# GLM_AGENT_BASH_TIMEOUT_MS=120000 # per-command timeout for glm_agent's run_bash
+# GLM_OFFPEAK_MODEL=glm-5.2   # model(s) for "auto" off-peak. Can be a COMMA LIST, e.g.
+#                             # "glm-5.2,glm-5-turbo" -> the router auto-picks (most capable for
+#                             # hard tasks, cheapest for easy ones).
+# GLM_PEAK_MODEL=glm-5.2      # model(s) for "auto" during peak. glm-5.x carries the ~3x surcharge,
+#                             # so when "auto" lands on a glm-5.x model the router routes LESS to GLM
+#                             # at peak. Include a no-surcharge model (e.g. "glm-5.2,glm-4.7") and
+#                             # the router will prefer it at peak -> GLM stays fine to use.
+# GLM_CHEAP_MODEL=glm-4.5-air
+# GLM_PEAK_START_CN=14        # peak window start, China hour (UTC+8)
+# GLM_PEAK_END_CN=18          # peak window end (exclusive)
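
The comments in `.env.example` describe two rules: the output cap (`GLM_CAP` off means the ceiling applies, on means `GLM_MAX_TOKENS` applies) and the peak window measured in China time (UTC+8) with an exclusive end hour. The following is a minimal sketch of how those settings could be interpreted. The function names are hypothetical and the actual logic in `router.js` and `index.js` may differ. It also assumes the peak window does not wrap past midnight.

```javascript
// Hypothetical helpers; names are illustrative and not taken from the package.

// Resolve the per-call max_tokens from GLM_CAP / GLM_MAX_TOKENS / GLM_MAX_TOKENS_CEILING.
function effectiveMaxTokens(env) {
  const capOn = String(env.GLM_CAP || "off").toLowerCase() === "on";
  const limit = parseInt(env.GLM_MAX_TOKENS || "32768", 10);
  const ceiling = parseInt(env.GLM_MAX_TOKENS_CEILING || "131072", 10);
  return capOn ? limit : ceiling;
}

// True when `date` falls inside the peak window, measured in China time (UTC+8).
// The end hour is exclusive, matching the GLM_PEAK_END_CN comment.
// Assumption: start < end, i.e. the window does not wrap past midnight.
function isPeakHour(date, env) {
  const start = parseInt(env.GLM_PEAK_START_CN || "14", 10);
  const end = parseInt(env.GLM_PEAK_END_CN || "18", 10);
  const chinaHour = (date.getUTCHours() + 8) % 24;
  return chinaHour >= start && chinaHour < end;
}

console.log(effectiveMaxTokens({}));                                // 131072 (cap off)
console.log(effectiveMaxTokens({ GLM_CAP: "on" }));                 // 32768
console.log(isPeakHour(new Date("2026-01-01T06:00:00Z"), {}));      // true  (14:00 in UTC+8)
console.log(isPeakHour(new Date("2026-01-01T10:00:00Z"), {}));      // false (18:00, end is exclusive)
```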

package/glm-mcp/package.json ADDED

@@ -0,0 +1,21 @@
+{
+  "name": "glm-mcp",
+  "version": "1.1.7",
+  "description": "MCP server that delegates self-contained subtasks to the GLM (Zhipu/Z.ai) Anthropic-compatible API, so Claude Code can use GLM as a cheap, peak-aware subagent.",
+  "type": "module",
+  "bin": {
+    "glm-mcp": "src/index.js"
+  },
+  "main": "src/index.js",
+  "scripts": {
+    "start": "node src/index.js",
+    "smoke": "node src/smoke.js"
+  },
+  "dependencies": {
+    "@modelcontextprotocol/sdk": "^1.0.0",
+    "zod": "^3.23.8"
+  },
+  "engines": {
+    "node": ">=18"
+  }
+}

package/glm-mcp/src/glmAgent.js ADDED

@@ -0,0 +1,227 @@
+// glmAgent.js
+// Runs GLM as a real tool-using agent against the local filesystem, with oversight
+// built in so Opus can regulate and see exactly what GLM did:
+//   - returns a unified DIFF of every change (isolated to the files GLM touched)
+//   - returns an ACTION LOG of every read/write/edit/bash
+//   - records a non-invasive git checkpoint + revert hint (when in a git repo)
+//   - supports dry_run: GLM proposes changes to an in-memory overlay and writes NOTHING,
+//     so Opus can approve the diff before a real apply pass.
+import { readFileSync, writeFileSync, readdirSync, statSync, mkdirSync, existsSync } from "node:fs";
+import { resolve, dirname, relative, isAbsolute, join } from "node:path";
+import { execSync } from "node:child_process";
+import { glmMessage } from "./glmClient.js";
+const MAX_ITERS = parseInt(process.env.GLM_AGENT_MAX_ITERS || "30", 10);
+const BASH_TIMEOUT = parseInt(process.env.GLM_AGENT_BASH_TIMEOUT_MS || "120000", 10);
+const FILE_READ_CAP = 100000;
+const BASH_OUT_CAP = 30000;
+const DIFF_CAP = 20000;
+const DIFF_LINE_CAP = 3000;
+const TOOLS = [
+  { name: "read_file", description: "Read a UTF-8 text file (path relative to working dir or absolute).",
+    input_schema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] } },
+  { name: "write_file", description: "Create or overwrite a file. Creates parent dirs as needed.",
+    input_schema: { type: "object", properties: { path: { type: "string" }, content: { type: "string" } }, required: ["path", "content"] } },
+  { name: "edit_file", description: "Replace an exact substring in a file. old_string must appear exactly once.",
+    input_schema: { type: "object", properties: { path: { type: "string" }, old_string: { type: "string" }, new_string: { type: "string" } }, required: ["path", "old_string", "new_string"] } },
+  { name: "list_dir", description: "List entries in a directory (relative or absolute). Defaults to '.'.",
+    input_schema: { type: "object", properties: { path: { type: "string" } } } },
+  { name: "run_bash", description: "Run a shell command in the working dir; returns stdout+stderr. Disabled in dry_run.",
+    input_schema: { type: "object", properties: { command: { type: "string" } }, required: ["command"] } },
+];
+function safeResolve(root, p) { // NOTE: resolves only; neither absolute paths nor "../" segments are confined to root
+  return isAbsolute(p || "") ? resolve(p) : resolve(root, p || ".");
+}
+function unifiedDiff(oldStr, newStr, path) {
+  if (oldStr === newStr) return "";
+  const A = oldStr.length ? oldStr.split("\n") : [];
+  const B = newStr.length ? newStr.split("\n") : [];
+  if (A.length > DIFF_LINE_CAP || B.length > DIFF_LINE_CAP) {
+    return `--- ${path}\n+++ ${path}\n@@ large file: ${A.length} -> ${B.length} lines (detailed diff omitted) @@\n`;
+  }
+  const n = A.length, m = B.length;
+  const dp = [];
+  for (let i = 0; i <= n; i++) dp.push(new Int32Array(m + 1));
+  for (let i = n - 1; i >= 0; i--)
+    for (let j = m - 1; j >= 0; j--)
+      dp[i][j] = A[i] === B[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
+  const rows = [];
+  let i = 0, j = 0;
+  while (i < n && j < m) {
+    if (A[i] === B[j]) { rows.push([" ", A[i]]); i++; j++; }
+    else if (dp[i + 1][j] >= dp[i][j + 1]) { rows.push(["-", A[i]]); i++; }
+    else { rows.push(["+", B[j]]); j++; }
+  }
+  while (i < n) rows.push(["-", A[i++]]);
+  while (j < m) rows.push(["+", B[j++]]);
+  const out = [`--- ${path}`, `+++ ${path}`];
+  let ctx = [];
+  const flush = () => {
+    if (ctx.length > 6) {
+      out.push(" " + ctx[0], " " + ctx[1], `@@ ... ${ctx.length - 4} unchanged ... @@`, " " + ctx[ctx.length - 2], " " + ctx[ctx.length - 1]);
+    } else for (const c of ctx) out.push(" " + c);
+    ctx = [];
+  };
+  for (const [t, l] of rows) {
+    if (t === " ") ctx.push(l);
+    else { flush(); out.push(t + l); }
+  }
+  flush();
+  return out.join("\n") + "\n";
+}
+function gitCheckpoint(root) {
+  try {
+    execSync(`git -C "${root}" rev-parse --is-inside-work-tree`, { stdio: "ignore" });
+  } catch {
+    return { isRepo: false, baseline: null, revertHint: "Not a git repo — review the diff below; revert manually if needed." };
+  }
+  let baseline = "";
+  try { baseline = execSync(`git -C "${root}" stash create`, { encoding: "utf8" }).trim(); } catch {}
+  if (!baseline) {
+    try { baseline = execSync(`git -C "${root}" rev-parse HEAD`, { encoding: "utf8" }).trim(); } catch {}
+  }
+  return {
+    isRepo: true,
+    baseline,
+    revertHint: baseline
+      ? `To revert GLM's changes: \`git -C "${root}" checkout ${baseline} -- .\` then \`git -C "${root}" clean -fd\` to drop any new files. (Baseline is a non-invasive snapshot; your working tree was not modified by the checkpoint.)`
+      : "Git repo detected but baseline capture failed; use `git diff` / `git stash` to review and revert.",
+  };
+}
+export async function runGlmAgent({ model, task, context, workdir, maxTokens = 32768, thinking = false, dryRun = false }) {
+  const root = workdir && workdir.trim() ? resolve(workdir) : process.cwd();
+  const log = [];
+  const originals = new Map(); // abs -> pre-run disk content (string|null if didn't exist)
+  const overlay = new Map(); // dry_run staging: abs -> proposed content
+  const checkpoint = dryRun ? { isRepo: false, baseline: null, revertHint: "dry_run: nothing written." } : gitCheckpoint(root);
+  const recordOriginal = (abs) => {
+    if (!originals.has(abs)) {
+      try { originals.set(abs, readFileSync(abs, "utf8")); } catch { originals.set(abs, null); }
+    }
+  };
+  const readCurrent = (abs) => {
+    if (dryRun && overlay.has(abs)) return overlay.get(abs);
+    return readFileSync(abs, "utf8");
+  };
+  const writeCurrent = (abs, content) => {
+    if (dryRun) { overlay.set(abs, content); return; }
+    mkdirSync(dirname(abs), { recursive: true });
+    writeFileSync(abs, content, "utf8");
+  };
+  function runTool(name, input) {
+    try {
+      switch (name) {
+        case "read_file": {
+          const abs = safeResolve(root, input.path);
+          const txt = readCurrent(abs);
+          log.push(`read ${relative(root, abs) || input.path}`);
+          return txt.length > FILE_READ_CAP ? txt.slice(0, FILE_READ_CAP) + "\n…[truncated]" : txt;
+        }
+        case "write_file": {
+          const abs = safeResolve(root, input.path);
+          recordOriginal(abs);
+          writeCurrent(abs, input.content ?? "");
+          log.push(`${dryRun ? "[dry] " : ""}write ${relative(root, abs) || input.path}`);
+          return `${dryRun ? "(dry_run, staged) " : ""}Wrote ${(input.content ?? "").length} chars to ${input.path}.`;
+        }
+        case "edit_file": {
+          const abs = safeResolve(root, input.path);
+          let cur;
+          try { cur = readCurrent(abs); } catch { return `ERROR: cannot read ${input.path} to edit.`; }
+          const occ = cur.split(input.old_string).length - 1;
+          if (occ === 0) return `ERROR: old_string not found in ${input.path}. Read the file and retry with an exact match.`;
+          if (occ > 1) return `ERROR: old_string appears ${occ} times in ${input.path}; add surrounding lines to make it unique.`;
+          recordOriginal(abs);
+          writeCurrent(abs, cur.replace(input.old_string, () => input.new_string)); // replacer fn keeps "$&", "$1", "$$" in new_string literal
+          log.push(`${dryRun ? "[dry] " : ""}edit ${relative(root, abs) || input.path}`);
+          return `${dryRun ? "(dry_run, staged) " : ""}Edited ${input.path} (1 replacement).`;
+        }
+        case "list_dir": {
+          const abs = safeResolve(root, input.path || ".");
+          const entries = readdirSync(abs).map((e) => {
+            try { return statSync(join(abs, e)).isDirectory() ? e + "/" : e; } catch { return e; }
+          });
+          return entries.join("\n") || "(empty)";
+        }
+        case "run_bash": {
+          if (dryRun) return `[dry_run] bash disabled. Use read_file/list_dir to inspect. (cmd was: ${input.command})`;
+          log.push(`bash: ${String(input.command).slice(0, 80)}`);
+          let out;
+          try {
+            out = execSync(input.command, { cwd: root, timeout: BASH_TIMEOUT, encoding: "utf8", stdio: ["ignore", "pipe", "pipe"], shell: true });
+          } catch (e) {
+            out = `${e.stdout || ""}${e.stderr || ""}\n[exit ${e.status ?? "?"}] ${e.message}`;
+          }
+          return (out || "(no output)").slice(0, BASH_OUT_CAP);
+        }
+        default:
+          return `ERROR: unknown tool ${name}`;
+      }
+    } catch (e) {
+      return `ERROR (${name}): ${e.message}`;
+    }
+  }
+  const system =
+    `You are a capable coding agent operating directly on a local repository.\n` +
+    `Working directory: ${root}\n` +
+    (dryRun
+      ? `DRY RUN: your write_file/edit_file are STAGED, not written to disk, and run_bash is disabled. ` +
+        `Produce the complete set of intended changes, then stop and summarize them.\n`
+      : `Make changes yourself with the tools; run tests/builds to verify. `) +
+    `Tools: read_file, write_file, edit_file, list_dir, run_bash. When fully done, stop calling ` +
+    `tools and reply with a concise summary of what you changed and how you verified it.`;
+  const messages = [{ role: "user", content: context ? `${task}\n\n--- CONTEXT ---\n${context}` : task }];
+  let lastText = "";
+  const totalUsage = { input_tokens: 0, output_tokens: 0 };
+  let iters = 0;
+  for (; iters < MAX_ITERS; iters++) {
+    const { raw, usage } = await glmMessage({ model, system, messages, maxTokens, thinking, tools: TOOLS });
+    totalUsage.input_tokens += usage.input_tokens || 0;
+    totalUsage.output_tokens += usage.output_tokens || 0;
+    const content = raw.content || [];
+    const textParts = content.filter((b) => b.type === "text").map((b) => b.text);
+    if (textParts.length) lastText = textParts.join("\n").trim();
+    const toolUses = content.filter((b) => b.type === "tool_use");
+    if (raw.stop_reason !== "tool_use" || toolUses.length === 0) break;
+    messages.push({ role: "assistant", content });
+    messages.push({
+      role: "user",
+      content: toolUses.map((tu) => ({ type: "tool_result", tool_use_id: tu.id, content: String(runTool(tu.name, tu.input || {})) })),
+    });
+  }
+  // Build the diff from captured originals.
+  let diff = "";
+  for (const [abs, orig] of originals) {
+    let now;
+    if (dryRun) now = overlay.has(abs) ? overlay.get(abs) : orig ?? "";
+    else now = existsSync(abs) ? readFileSync(abs, "utf8") : "";
+    const d = unifiedDiff(orig ?? "", now ?? "", relative(root, abs) || abs);
+    if (d) diff += (orig == null ? `(new file)\n` : "") + d + "\n";
+  }
+  if (diff.length > DIFF_CAP) diff = diff.slice(0, DIFF_CAP) + "\n…[diff truncated]";
+  return {
+    text: lastText || "(GLM finished without a summary)",
+    actions: log,
+    iters,
+    hitCap: iters >= MAX_ITERS,
+    usage: totalUsage,
+    root,
+    dryRun,
+    diff: diff.trim(),
+    changedFiles: [...originals.keys()].map((a) => relative(root, a) || a),
+    git: checkpoint,
+  };
+}
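
The `edit_file` tool in this file applies a replacement only when `old_string` occurs exactly once. A minimal standalone sketch of that rule follows. `replaceOnce` is a hypothetical helper, not part of the package, and the empty-string guard is an added assumption. It passes the replacement through a function because `String.prototype.replace` expands sequences such as `$&` and `$$` when the replacement is given as a plain string.

```javascript
// Hypothetical helper (not in the package): replace `oldString` only when it
// occurs exactly once in `text`, inserting `newString` literally.
function replaceOnce(text, oldString, newString) {
  // Assumption of this sketch: an empty search string is rejected up front,
  // because "abc".split("") would otherwise report a misleading count.
  if (oldString === "") return { ok: false, error: "old_string must not be empty" };
  const occurrences = text.split(oldString).length - 1;
  if (occurrences === 0) return { ok: false, error: "old_string not found" };
  if (occurrences > 1) return { ok: false, error: `old_string appears ${occurrences} times` };
  // A replacer function keeps "$&", "$1" and "$$" in newString from being expanded.
  return { ok: true, text: text.replace(oldString, () => newString) };
}

console.log(replaceOnce("total = 1", "1", "$&5"));   // { ok: true, text: 'total = $&5' }
console.log("total = 1".replace("1", "$&5"));        // 'total = 15' (pattern expanded)
console.log(replaceOnce("a a", "a", "b").ok);        // false (two occurrences)
```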

package/glm-mcp/src/glmClient.js ADDED

@@ -0,0 +1,176 @@
+// glmClient.js
+// Thin client over the GLM Anthropic-compatible endpoint with two things that
+// matter for GLM specifically:
+//   1. A concurrency gate (GLM caps in-flight requests at ~1 even on paid tiers).
+//   2. Exponential backoff on 429 / "concurrency" / 5xx errors.
+import { appendFileSync, readFileSync } from "node:fs";
+import { fileURLToPath } from "node:url";
+import { dirname, join } from "node:path";
+// Local usage ledger: every GLM call is appended here so you have independent, on-disk
+// proof of GLM usage (model + tokens), regardless of what the z.ai dashboard shows.
+// View it: cat ~/.glm-mcp/glm-mcp/usage.jsonl (the ledger sits one level above src/, wherever the server is installed)
+const USAGE_LOG = join(dirname(fileURLToPath(import.meta.url)), "..", "usage.jsonl");
+function logUsage(model, usage) {
+  try {
+    appendFileSync(
+      USAGE_LOG,
+      JSON.stringify({
+        ts: new Date().toISOString(),
+        model,
+        input_tokens: usage.input_tokens || 0,
+        output_tokens: usage.output_tokens || 0,
+      }) + "\n"
+    );
+  } catch {}
+}
+/** Cumulative GLM usage from the local ledger — independent proof of GLM token spend. */
+export function usageSummary() {
+  const out = { calls: 0, input_tokens: 0, output_tokens: 0, total_tokens: 0, by_model: {}, log_path: USAGE_LOG };
+  try {
+    for (const l of readFileSync(USAGE_LOG, "utf8").trim().split(/\r?\n/)) {
+      if (!l) continue;
+      const e = JSON.parse(l);
+      out.calls++;
+      out.input_tokens += e.input_tokens || 0;
+      out.output_tokens += e.output_tokens || 0;
+      out.by_model[e.model] = (out.by_model[e.model] || 0) + 1;
+    }
+    out.total_tokens = out.input_tokens + out.output_tokens;
+  } catch {}
+  return out;
+}
+const BASE_URL = (process.env.GLM_BASE_URL || "https://api.z.ai/api/anthropic").replace(/\/$/, "");
+const API_KEY = process.env.GLM_API_KEY || process.env.ANTHROPIC_AUTH_TOKEN || "";
+const MAX_CONCURRENT = Math.max(1, parseInt(process.env.GLM_MAX_CONCURRENT || "1", 10));
+const MAX_RETRIES = Math.max(0, parseInt(process.env.GLM_MAX_RETRIES || "4", 10));
+const TIMEOUT_MS = parseInt(process.env.GLM_TIMEOUT_MS || "300000", 10);
+// ---- tiny semaphore so we never exceed GLM's concurrency cap ----
+let active = 0;
+const waiters = [];
+async function acquire() {
+  if (active < MAX_CONCURRENT) {
+    active++;
+    return;
+  }
+  await new Promise((res) => waiters.push(res));
+  active++;
+}
+function release() {
+  active--;
+  const next = waiters.shift();
+  if (next) next();
+}
+const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
+function isRetryable(status, bodyText) {
+  if (status === 429 || status === 503 || status === 502 || status === 500) return true;
+  if (bodyText && /concurren|rate.?limit|too\s+much/i.test(bodyText)) return true;
+  return false;
+}
+/**
+ * Call GLM's /v1/messages (Anthropic Messages API shape).
+ * @param {object} p
+ * @param {string} p.model
+ * @param {Array}  p.messages  Anthropic-style messages
+ * @param {string} [p.system]
+ * @param {number} [p.maxTokens]
+ * @param {boolean} [p.thinking]
+ * @param {Array}  [p.tools]  Anthropic-style tool definitions; left out of the request when empty
+ * @returns {Promise<{text:string, usage:object, raw:object}>}
+ */
+export async function glmMessage({ model, messages, system, maxTokens = 32768, thinking = false, tools }) {
+  if (!API_KEY) {
+    throw new Error(
+      "GLM_API_KEY (or ANTHROPIC_AUTH_TOKEN) is not set. Add it to glm-mcp/.env or the MCP server env in .mcp.json."
+    );
+  }
+  const body = {
+    model,
+    max_tokens: maxTokens,
+    messages,
+    ...(system ? { system } : {}),
+    ...(tools && tools.length ? { tools } : {}),
+    ...(thinking ? { thinking: { type: "enabled", budget_tokens: Math.min(maxTokens, 8000) } } : {}),
+  };
+  await acquire();
+  try {
+    let attempt = 0;
+    // eslint-disable-next-line no-constant-condition
+    while (true) {
+      const controller = new AbortController();
+      const t = setTimeout(() => controller.abort(), TIMEOUT_MS);
+      let res, txt;
+      try {
+        res = await fetch(`${BASE_URL}/v1/messages`, {
+          method: "POST",
+          headers: {
+            "content-type": "application/json",
+            authorization: `Bearer ${API_KEY}`,
+            "x-api-key": API_KEY,
+            "anthropic-version": "2023-06-01",
+          },
+          body: JSON.stringify(body),
+          signal: controller.signal,
+        });
+        txt = await res.text();
+      } catch (e) {
+        clearTimeout(t);
+        if (attempt < MAX_RETRIES) {
+          await sleep(backoff(attempt++));
+          continue;
+        }
+        throw new Error(`GLM request failed (network/timeout): ${e.message}`);
+      }
+      clearTimeout(t);
+      if (!res.ok) {
+        if (isRetryable(res.status, txt) && attempt < MAX_RETRIES) {
+          await sleep(backoff(attempt++, txt));
+          continue;
+        }
+        throw new Error(`GLM API error ${res.status}: ${truncate(txt, 800)}`);
+      }
+      let json;
+      try {
+        json = JSON.parse(txt);
+      } catch {
+        throw new Error(`GLM returned non-JSON response: ${truncate(txt, 800)}`);
+      }
+      const text = (json.content || [])
+        .filter((b) => b.type === "text")
+        .map((b) => b.text)
+        .join("\n")
+        .trim();
+      logUsage(model, json.usage || {}); // on-disk proof of GLM usage
+      return { text, usage: json.usage || {}, raw: json };
+    }
+  } finally {
+    release();
+  }
+}
+function backoff(attempt, bodyText) {
+  // Honor concurrency errors with a slightly longer floor.
+  const concurrency = bodyText && /concurren|too\s+much/i.test(bodyText);
+  const base = concurrency ? 2000 : 800;
+  const jitter = Math.random() * 400;
+  return Math.min(base * 2 ** attempt + jitter, 30000);
+}
+function truncate(s, n) {
+  if (!s) return "";
+  return s.length > n ? s.slice(0, n) + "…[truncated]" : s;
+}
+export const config = { BASE_URL, MAX_CONCURRENT, MAX_RETRIES, hasKey: Boolean(API_KEY) };
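
The `backoff` function in this file doubles a base delay on each attempt (800 ms normally, 2000 ms when the error body mentions concurrency), adds up to 400 ms of random jitter, and caps the result at 30 s. To make the resulting schedule easy to inspect, here is a deterministic sketch of the same rule in which the jitter is passed in rather than drawn from `Math.random()`. `backoffDelay` and `schedule` are hypothetical names used only for illustration.

```javascript
// Hypothetical, deterministic variant of the backoff rule (not in the package).
// `jitter` is supplied by the caller so the output is reproducible.
function backoffDelay(attempt, { concurrency = false, jitter = 0 } = {}) {
  const base = concurrency ? 2000 : 800; // longer floor for concurrency errors
  return Math.min(base * 2 ** attempt + jitter, 30000); // capped at 30 s
}

// Delays in milliseconds for attempts 0..retries-1.
function schedule(retries, options) {
  return Array.from({ length: retries }, (_, attempt) => backoffDelay(attempt, options));
}

console.log(schedule(4));                        // [ 800, 1600, 3200, 6400 ]
console.log(schedule(4, { concurrency: true })); // [ 2000, 4000, 8000, 16000 ]
console.log(backoffDelay(6));                    // 30000 (800 * 64 exceeds the cap)
```

With the default `GLM_MAX_RETRIES=4`, the jitter-free waits before the four retries sum to 12 s for ordinary retryable errors and 30 s for concurrency errors.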