glm-mcp-claude 1.0.0 → 1.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +11 -15
- package/agents/glm.md +14 -4
- package/docs/RULES.md +2 -1
- package/glm-mcp/.env.example +6 -1
- package/glm-mcp/package.json +1 -1
- package/glm-mcp/src/index.js +20 -5
- package/glm-mcp/src/router.js +19 -5
- package/hooks/glm_subagent_router.mjs +28 -9
- package/install.mjs +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -19,9 +19,9 @@ files directly. One command to install.
|
|
|
19
19
|
> **no separate pay-per-token Anthropic API key required**. Only GLM needs a (cheap) Z.ai key.
|
|
20
20
|
> Opus orchestrates on your subscription; GLM does the heavy lifting for a fraction of the cost.
|
|
21
21
|
|
|
22
|
-

|
|
23
23
|
|
|
24
|
-
<sub>↑
|
|
24
|
+
<sub>↑ **Directly calling the GLM agent (`glm_agent`).** Prompt: *"write a 2000-word Shakespearean essay about the usefulness of an umbrella into my Desktop."* GLM created the file itself — **18 iterations, ~$0.064** — Opus never touched the keys. GLM reads, writes, edits, and runs your files directly.</sub>
|
|
25
25
|
|
|
26
26
|
```bash
|
|
27
27
|
# from npm:
|
|
@@ -122,21 +122,16 @@ tool-heavy dependent loops, huge context, vision, or anything you mark sensitive
|
|
|
122
122
|
| `glm_delegate` | GLM tokens | Text in → text out. GLM drafts; you place it. |
|
|
123
123
|
| `glm_agent` | GLM tokens | GLM works your repo directly (read/write/edit/bash). Returns a diff + action log + git revert; supports `dry_run` (propose, don't write). |
|
|
124
124
|
|
|
125
|
-
### Example:
|
|
125
|
+
### Example: delegating a read-and-summarize task
|
|
126
126
|
|
|
127
|
-
|
|
127
|
+
The `glm` subagent — its cheap Haiku layer driving, GLM doing the heavy lifting — reading this
|
|
128
|
+
whole repo and summarizing it:
|
|
128
129
|
|
|
129
|
-

|
|
130
131
|
|
|
131
|
-
>
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
- **Output:** `Umbrella-Essay-Shakespeare.md` — ~2,260 words of Early Modern English (*thee/thou/thy*, *doth/hath*) with two blank-verse interludes
|
|
136
|
-
- **Work:** 18 tool-loop iterations; **one** file created, nothing existing touched
|
|
137
|
-
- **Cost:** ~**$0.064** — a fraction of running the same task on Opus
|
|
138
|
-
|
|
139
|
-
That's the point: the orchestrator stays on Opus while `glm_agent` does the heavy, file-touching work for cents.
|
|
132
|
+
<sub>Orchestrated by Haiku 4.5 (the cheap layer), offloading the token-heavy work to GLM via the
|
|
133
|
+
MCP tools — the Opus → Haiku → GLM hybrid in action. The orchestrator stays on Opus while GLM does
|
|
134
|
+
the file-touching / heavy work for cents.</sub>
|
|
140
135
|
|
|
141
136
|
---
|
|
142
137
|
|
|
@@ -159,7 +154,8 @@ fine letting it modify.
|
|
|
159
154
|
|---|---|---|
|
|
160
155
|
| `GLM_API_KEY` | — | Your Z.ai key. **Required.** |
|
|
161
156
|
| `GLM_BASE_URL` | `https://api.z.ai/api/anthropic` | Anthropic-compatible endpoint. |
|
|
162
|
-
| `
|
|
157
|
+
| `GLM_USE_HAIKU` | `off` | **Off (default) skips the Haiku `glm` subagent and calls GLM directly** (`glm_agent`), so *all* tokens stay on GLM. Set `on` to allow the Haiku-orchestrated subagent (it spends some Claude tokens to orchestrate). |
|
|
158
|
+
| `GLM_COST_BIAS` | `7` | How hard to favor GLM. **Default `7` → GLM carries ~98–100% of tasks** (Opus only for vision / parallel / >128K context / sensitive / heavy tool-loops). Lower it (e.g. `1.5`) to send more hard tasks (debugging, architecture, security) to Opus; `0` = decide on capability alone. |
|
|
163
159
|
| `GLM_CAP` | `off` | Output-token cap. **Off by default** = generous (up to 131072 per call). Set `on` to enforce `GLM_MAX_TOKENS` and rein in spend. |
|
|
164
160
|
| `GLM_MAX_TOKENS` | `32768` | The hard per-call limit applied **only when `GLM_CAP=on`**. (`max_tokens` is a ceiling, not a target — you pay for actual output.) |
|
|
165
161
|
| `GLM_MAX_TOKENS_CEILING` | `131072` | The generous default used when the cap is **off**. |
|
package/agents/glm.md
CHANGED
|
@@ -14,12 +14,18 @@ model: haiku
|
|
|
14
14
|
|
|
15
15
|
You are the **GLM delegate** — a full subagent with the same tools as any subagent
|
|
16
16
|
(Read, Grep, Glob, Write, Edit, Bash, …) PLUS the GLM tools. Your edge is COST: GLM
|
|
17
|
-
(~10x cheaper than Opus) does the heavy lifting.
|
|
17
|
+
(~10x cheaper than Opus) does the heavy lifting.
|
|
18
18
|
|
|
19
|
-
|
|
20
|
-
|
|
19
|
+
> ⚠️ **Token rule (important):** *you* run on Haiku (a Claude model). Anything **you** write with
|
|
20
|
+
> your own Write/Edit/Bash spends **Claude tokens, zero GLM**. Only the `glm_agent` / `glm_delegate`
|
|
21
|
+
> tools spend **GLM tokens**. So **strongly prefer `glm_agent`** for real work: let GLM do the
|
|
22
|
+
> reading/writing/running. Use your own tools mainly to gather context and to verify the result —
|
|
23
|
+
> not to produce the output yourself. This keeps the burden (and the tokens) on GLM.
|
|
24
|
+
|
|
25
|
+
### Default for coding tasks: `glm_agent` (GLM works the files directly)
|
|
26
|
+
For essentially every "go do this in the repo" task, hand the whole thing to `glm_agent`. It runs GLM
|
|
21
27
|
as a real agent with its own read/write/edit/bash tools, so GLM inspects and edits the code
|
|
22
|
-
itself and runs tests — end to end. Call it with:
|
|
28
|
+
itself and runs tests — end to end, on GLM tokens. Call it with:
|
|
23
29
|
- `task`: the self-contained coding task.
|
|
24
30
|
- `workdir`: the **absolute path of the project root** (pass it explicitly).
|
|
25
31
|
- `model`: leave `auto` (peak-aware); `thinking: true` for harder work.
|
|
@@ -36,6 +42,10 @@ call `glm_delegate` (task + pasted context), then apply it with your own Write/E
|
|
|
36
42
|
2. **Serialize GLM calls** — one at a time (GLM caps concurrency ~1).
|
|
37
43
|
3. **Verify before returning.** Build/lint/test or re-read. If GLM's output is wrong or it loops,
|
|
38
44
|
retry once with a sharper prompt; if still bad, do the critical part yourself or escalate to Opus.
|
|
45
|
+
4. **Always end your report with the GLM stats.** `glm_agent` prints a `=== GLM STATS ===` block
|
|
46
|
+
(model, tokens delegated, iterations, cost); `glm_delegate` prints a `[GLM delegated … tokens to
|
|
47
|
+
<model>]` line. Surface these in your final message so every run clearly states **which GLM model
|
|
48
|
+
ran (e.g. glm-5.2) and how many tokens were delegated.**
|
|
39
49
|
|
|
40
50
|
## Operating rules
|
|
41
51
|
- **Serialize GLM calls.** GLM caps concurrent requests (~1); one `glm_delegate` at a time.
|
package/docs/RULES.md
CHANGED
|
@@ -19,7 +19,8 @@ GLM is **~10× cheaper** than Opus, and **still ~3–4× cheaper even at peak**
|
|
|
19
19
|
| GLM-5.2 at peak | 1.8 / 6.6 | ~3–4× cheaper |
|
|
20
20
|
| GLM-4.7 (no multiplier) | 0.4 / 1.75 | ~12× cheaper |
|
|
21
21
|
|
|
22
|
-
So the router applies a standing **cost bias toward GLM** (`GLM_COST_BIAS`, default `
|
|
22
|
+
So the router applies a standing **cost bias toward GLM** (`GLM_COST_BIAS`, default `7` →
|
|
23
|
+
GLM carries ~98–100% of tasks; lower it to hand more hard tasks to Opus):
|
|
23
24
|
GLM is the default for safe-to-be-wrong work, and Opus is the exception you *pay up* for only
|
|
24
25
|
when quality/risk justifies it. The catch cost can't override: on hard tasks, *cheaper-but-wrong*
|
|
25
26
|
is **more** expensive (rework + Opus tokens to fix), so the capability penalties for
|
package/glm-mcp/.env.example
CHANGED
|
@@ -7,7 +7,12 @@ GLM_API_KEY=your-zai-key-here
|
|
|
7
7
|
GLM_BASE_URL=https://api.z.ai/api/anthropic
|
|
8
8
|
|
|
9
9
|
# --- optional tuning (sensible defaults baked in) ---
|
|
10
|
-
#
|
|
10
|
+
# GLM_USE_HAIKU=off # off (DEFAULT) = skip the Haiku `glm` subagent and call GLM directly
|
|
11
|
+
# # (mcp__glm__glm_agent) so ALL tokens stay on GLM. Set to `on` to allow
|
|
12
|
+
# # the Haiku-orchestrated subagent (it spends some Claude tokens).
|
|
13
|
+
# GLM_COST_BIAS=7 # how hard to favor GLM. Default 7 => GLM handles ~98-100% of tasks
|
|
14
|
+
# # (Opus only for vision/parallel/huge-context/sensitive/heavy tool-loops).
|
|
15
|
+
# # Lower (e.g. 1.5) to route more hard tasks to Opus; 0 = capability only.
|
|
11
16
|
# GLM_MAX_CONCURRENT=1 # GLM caps in-flight requests ~1; keep at 1 unless your tier allows more
|
|
12
17
|
# --- output token cap (OFF by default = generous) ---
|
|
13
18
|
# By default the cap is OFF: every call may use up to GLM_MAX_TOKENS_CEILING (131072).
|
package/glm-mcp/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "glm-mcp",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.1.1",
|
|
4
4
|
"description": "MCP server that delegates self-contained subtasks to the GLM (Zhipu/Z.ai) Anthropic-compatible API, so Claude Code can use GLM as a cheap, peak-aware subagent.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
package/glm-mcp/src/index.js
CHANGED
|
@@ -26,6 +26,7 @@ import {
|
|
|
26
26
|
MODELS,
|
|
27
27
|
resolveMaxTokens,
|
|
28
28
|
MAXTOK,
|
|
29
|
+
USE_HAIKU,
|
|
29
30
|
} from "./router.js";
|
|
30
31
|
import { runGlmAgent } from "./glmAgent.js";
|
|
31
32
|
|
|
@@ -177,16 +178,26 @@ server.registerTool(
|
|
|
177
178
|
const chosen = resolveModel(model, now);
|
|
178
179
|
try {
|
|
179
180
|
const r = await runGlmAgent({ model: chosen, task, context, workdir, maxTokens: resolveMaxTokens(max_tokens), thinking, dryRun: dry_run });
|
|
180
|
-
const
|
|
181
|
+
const inTok = r.usage.input_tokens || 0;
|
|
182
|
+
const outTok = r.usage.output_tokens || 0;
|
|
183
|
+
const totalTok = inTok + outTok;
|
|
184
|
+
const cost = estimateCost(chosen, inTok, outTok, now);
|
|
185
|
+
const opusCost = estimateCost("claude-opus", inTok, outTok, now);
|
|
186
|
+
const xCheaper = cost > 0 ? Math.round(opusCost / cost) : "?";
|
|
181
187
|
const banner = r.dryRun ? "*** DRY RUN — nothing was written; this is GLM's PROPOSED change for you to approve ***\n" : "";
|
|
182
|
-
const totalTok = (r.usage.input_tokens || 0) + (r.usage.output_tokens || 0);
|
|
183
188
|
const header =
|
|
184
|
-
`[GLM agent]
|
|
185
|
-
`dir=${r.root} | iterations=${r.iters}${r.hitCap ? " (HIT CAP -- may be incomplete)" : ""} | actions=${r.actions.length} | files=${r.changedFiles.length}`;
|
|
189
|
+
`[GLM agent] ${chosen} | dir=${r.root} | ${r.iters} iterations${r.hitCap ? " (HIT CAP -- may be incomplete)" : ""} | ${r.actions.length} actions | ${r.changedFiles.length} files`;
|
|
186
190
|
const actions = r.actions.length ? `\nActions:\n- ${r.actions.join("\n- ")}` : "";
|
|
187
191
|
const diff = r.diff ? `\n\n=== DIFF (review this) ===\n${r.diff}` : "\n\n(no file changes)";
|
|
188
192
|
const revert = !r.dryRun && r.git && r.git.revertHint ? `\n\nRevert: ${r.git.revertHint}` : "";
|
|
189
|
-
|
|
193
|
+
// Prominent stats footer, shown after every glm_agent run finishes.
|
|
194
|
+
const stats =
|
|
195
|
+
`\n\n=== GLM STATS (this subagent) ===\n` +
|
|
196
|
+
`model: ${chosen}\n` +
|
|
197
|
+
`tokens: ${totalTok} delegated to GLM (${inTok} in / ${outTok} out)\n` +
|
|
198
|
+
`iterations: ${r.iters}${r.hitCap ? " (hit cap)" : ""} files changed: ${r.changedFiles.length}\n` +
|
|
199
|
+
`est. cost: $${cost} (~${xCheaper}x cheaper than Opus)`;
|
|
200
|
+
return { content: [{ type: "text", text: clip(`${banner}${header}${actions}${diff}${revert}\n\n=== GLM SUMMARY ===\n${r.text}${stats}`) }] };
|
|
190
201
|
} catch (e) {
|
|
191
202
|
return {
|
|
192
203
|
isError: true,
|
|
@@ -284,6 +295,10 @@ server.registerTool(
|
|
|
284
295
|
base_url: config.BASE_URL,
|
|
285
296
|
api_key_loaded: config.hasKey,
|
|
286
297
|
max_concurrent: config.MAX_CONCURRENT,
|
|
298
|
+
use_haiku_subagent: USE_HAIKU,
|
|
299
|
+
orchestration: USE_HAIKU
|
|
300
|
+
? "Haiku `glm` subagent allowed (spends some Claude tokens to orchestrate)."
|
|
301
|
+
: "Direct GLM only (GLM_USE_HAIKU=off) -> call glm_agent directly; keeps all tokens on GLM.",
|
|
287
302
|
max_tokens: {
|
|
288
303
|
cap_enabled: MAXTOK.capEnabled,
|
|
289
304
|
default_per_call: resolveMaxTokens(undefined),
|
package/glm-mcp/src/router.js
CHANGED
|
@@ -48,11 +48,25 @@ function numEnv(name, fallback) {
|
|
|
48
48
|
return Number.isFinite(v) ? v : fallback;
|
|
49
49
|
}
|
|
50
50
|
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
51
|
+
function boolEnv(name, fallback) {
|
|
52
|
+
const v = (process.env[name] || "").trim().toLowerCase();
|
|
53
|
+
if (/^(1|on|true|yes)$/.test(v)) return true;
|
|
54
|
+
if (/^(0|off|false|no)$/.test(v)) return false;
|
|
55
|
+
return fallback;
|
|
56
|
+
}
|
|
57
|
+
|
|
58
|
+
// Use the Haiku-orchestrated `glm` subagent? DEFAULT false -> skip Haiku and call GLM directly
|
|
59
|
+
// (mcp__glm__glm_agent), so the burden and the tokens stay on GLM (the Haiku subagent's own
|
|
60
|
+
// writing would spend Claude tokens). Set GLM_USE_HAIKU=on in .env to allow the subagent path.
|
|
61
|
+
export const USE_HAIKU = boolEnv("GLM_USE_HAIKU", false);
|
|
62
|
+
|
|
63
|
+
// GLM is ~10x cheaper than Opus, so by default GLM carries the overwhelming majority of the
|
|
64
|
+
// burden: with GLM_COST_BIAS=7, ~98-100% of tasks route to GLM (measured across all task types,
|
|
65
|
+
// peak and off-peak). Opus is used only for what GLM genuinely can't/shouldn't do -- vision,
|
|
66
|
+
// parallel fan-out, >128K context, sensitive code, and heavy dependent tool-loops (the hard
|
|
67
|
+
// overrides). LOWER GLM_COST_BIAS (e.g. 1.5) if you want Opus to handle more of the hard tasks
|
|
68
|
+
// (debugging, architecture, security, big refactors); set 0 to decide on capability alone.
|
|
69
|
+
const COST_BIAS = numEnv("GLM_COST_BIAS", 7);
|
|
56
70
|
|
|
57
71
|
// --- Output token policy ---------------------------------------------------
|
|
58
72
|
// max_tokens is a CEILING, not a target: GLM stops when done and you're billed for
|
|
@@ -17,6 +17,21 @@ import { readFileSync } from "node:fs";
|
|
|
17
17
|
// Layout: <claude>/hooks/glm_subagent_router.mjs + <claude>/glm-mcp/src/router.js
|
|
18
18
|
const HERE = dirname(fileURLToPath(import.meta.url));
|
|
19
19
|
const ROUTER = resolve(HERE, "..", "glm-mcp", "src", "router.js");
|
|
20
|
+
const MCP_ENV = resolve(HERE, "..", "glm-mcp", ".env");
|
|
21
|
+
|
|
22
|
+
// Load the MCP server's .env so the hook honors the SAME settings the server uses
|
|
23
|
+
// (GLM_COST_BIAS, GLM_USE_HAIKU, peak window, ...). Best-effort; safe if the file is absent.
|
|
24
|
+
function loadMcpEnv() {
|
|
25
|
+
try {
|
|
26
|
+
for (const line of readFileSync(MCP_ENV, "utf8").split(/\r?\n/)) {
|
|
27
|
+
const m = line.match(/^\s*([A-Z0-9_]+)\s*=\s*(.*)\s*$/);
|
|
28
|
+
if (!m) continue;
|
|
29
|
+
let v = m[2];
|
|
30
|
+
if ((v.startsWith('"') && v.endsWith('"')) || (v.startsWith("'") && v.endsWith("'"))) v = v.slice(1, -1);
|
|
31
|
+
if (process.env[m[1]] === undefined) process.env[m[1]] = v;
|
|
32
|
+
}
|
|
33
|
+
} catch {}
|
|
34
|
+
}
|
|
20
35
|
|
|
21
36
|
// Pull the most recent genuine human message from the transcript (skipping
|
|
22
37
|
// tool_result turns), so we can detect when the user explicitly picked an agent.
|
|
@@ -120,6 +135,7 @@ function inferProfile(text) {
|
|
|
120
135
|
|
|
121
136
|
(async () => {
|
|
122
137
|
try {
|
|
138
|
+
loadMcpEnv(); // honor .env settings before importing the router
|
|
123
139
|
const raw = await readStdin();
|
|
124
140
|
const payload = JSON.parse(raw || "{}");
|
|
125
141
|
const ti = payload.tool_input || {};
|
|
@@ -142,7 +158,7 @@ function inferProfile(text) {
|
|
|
142
158
|
|
|
143
159
|
let verdict, peakNote = "";
|
|
144
160
|
try {
|
|
145
|
-
const { recommend, isPeak } = await import(pathToFileURL(ROUTER).href);
|
|
161
|
+
const { recommend, isPeak, USE_HAIKU } = await import(pathToFileURL(ROUTER).href);
|
|
146
162
|
const rec = recommend(profile);
|
|
147
163
|
const peak = isPeak();
|
|
148
164
|
peakNote = peak
|
|
@@ -151,19 +167,22 @@ function inferProfile(text) {
|
|
|
151
167
|
if (rec.engine !== "glm") {
|
|
152
168
|
verdict = `KEEP ON OPUS (inferred: ${profile.taskType}, confidence ${rec.confidence}). Why: ${rec.reasons[0] || ""}`;
|
|
153
169
|
} else if (repoTask) {
|
|
154
|
-
// Hands-on repo task ->
|
|
170
|
+
// Hands-on repo task -> call glm_agent DIRECTLY (the only path that spends GLM tokens).
|
|
171
|
+
const haikuClause = USE_HAIKU
|
|
172
|
+
? `The Haiku "glm" subagent is allowed (GLM_USE_HAIKU=on), but glm_agent direct is cheaper (no Claude orchestration tokens).`
|
|
173
|
+
: `Do NOT do this inline yourself, and do NOT use the Haiku "glm" subagent — both burn Claude/Opus tokens and spend ZERO GLM.`;
|
|
155
174
|
verdict =
|
|
156
175
|
`GLM-SUITABLE repo task (inferred: ${profile.taskType}, confidence ${rec.confidence}). ` +
|
|
157
|
-
|
|
158
|
-
`
|
|
159
|
-
`For oversight, pass dry_run:true first
|
|
160
|
-
`Why: ${rec.reasons[rec.reasons.length - 1] || rec.reasons[0] || ""}`;
|
|
176
|
+
`➤ CALL mcp__glm__glm_agent DIRECTLY with workdir="${cwd}" (model ${rec.model}) — the only path that ` +
|
|
177
|
+
`actually spends GLM tokens (GLM reads/edits the files and runs tests itself). ${haikuClause} ` +
|
|
178
|
+
`For oversight, pass dry_run:true first, then apply. Why: ${rec.reasons[rec.reasons.length - 1] || rec.reasons[0] || ""}`;
|
|
161
179
|
} else {
|
|
162
|
-
// Pure generation -> draft via glm_delegate
|
|
180
|
+
// Pure generation -> draft via glm_delegate (spends GLM tokens); not inline (Claude tokens).
|
|
163
181
|
verdict =
|
|
164
182
|
`GLM-SUITABLE generation (inferred: ${profile.taskType}, confidence ${rec.confidence}). ` +
|
|
165
|
-
|
|
166
|
-
`
|
|
183
|
+
`➤ CALL mcp__glm__glm_delegate (model ${rec.model}) to generate this — that spends GLM tokens. ` +
|
|
184
|
+
`Writing it yourself instead spends Claude tokens and zero GLM. Then place/verify the result. ` +
|
|
185
|
+
`Why: ${rec.reasons[rec.reasons.length - 1] || rec.reasons[0] || ""}`;
|
|
167
186
|
}
|
|
168
187
|
} catch {
|
|
169
188
|
verdict = `Could not load router; GLM delegate is still available (mcp__glm__glm_agent for repo tasks, mcp__glm__glm_delegate for drafts, mcp__glm__glm_recommend to decide).`;
|
package/install.mjs
CHANGED
|
@@ -102,7 +102,7 @@ writeFileSync(settingsPath, JSON.stringify(settings, null, 2) + "\n");
|
|
|
102
102
|
// 6. Append the global delegation policy (idempotent).
|
|
103
103
|
step("Adding global delegation policy to CLAUDE.md");
|
|
104
104
|
const claudeMd = join(CLAUDE, "CLAUDE.md");
|
|
105
|
-
const policy = `\n# Delegation (GLM hybrid)\n\nPrefer offloading well-specified, self-contained subtasks. When you spawn a subagent (the\n\`Task\` tool), a PreToolUse hook injects a \`[GLM router]\` line advising
|
|
105
|
+
const policy = `\n# Delegation (GLM hybrid)\n\nPrefer offloading well-specified, self-contained subtasks to GLM. When you spawn a subagent (the\n\`Task\` tool), a PreToolUse hook injects a \`[GLM router]\` line advising GLM vs Opus -- follow it.\nIf the user explicitly named an agent, honor it. Full rules: \`~/.claude/glm-mcp/README.md\`.\n\n**To actually spend GLM tokens (not Claude tokens): call \`mcp__glm__glm_agent\` directly** for repo\nwork, or \`mcp__glm__glm_delegate\` for pure generation. Prefer calling \`glm_agent\` directly over the\n\`glm\` subagent (which runs on Haiku -- its own writing spends Claude tokens; only the glm_* tools\nspend GLM). If a task is GLM-suitable, don't do it inline yourself (Opus/Claude tokens, zero GLM).\n`;
|
|
106
106
|
const existing = existsSync(claudeMd) ? readFileSync(claudeMd, "utf8") : "";
|
|
107
107
|
if (!existing.includes("[GLM router]")) {
|
|
108
108
|
writeFileSync(claudeMd, existing + policy);
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "glm-mcp-claude",
|
|
3
|
-
"version": "1.
|
|
3
|
+
"version": "1.1.1",
|
|
4
4
|
"description": "GLM (Zhipu/Z.ai) as a cheap, full-capability subagent for Claude Code — auto-routing between Opus and GLM, a file-editing agent with diff/dry-run/git-revert oversight, and a one-command installer.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|