glm-mcp-claude 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,14 @@
1
+ {
2
+ "_comment": "OPTIONAL project-scoped registration. The installer registers globally (user scope) instead, which is recommended. Use this only if you want GLM available in just one project: copy to .mcp.json, set the key, and adjust the path to glm-mcp/src/index.js.",
3
+ "mcpServers": {
4
+ "glm": {
5
+ "command": "node",
6
+ "args": ["./glm-mcp/src/index.js"],
7
+ "env": {
8
+ "GLM_API_KEY": "your-zai-key-here",
9
+ "GLM_BASE_URL": "https://api.z.ai/api/anthropic",
10
+ "GLM_MAX_CONCURRENT": "1"
11
+ }
12
+ }
13
+ }
14
+ }
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 djerok
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,220 @@
1
+ # GLM-as-Subagent for Claude Code β€” plug & play
2
+
3
+ [![npm version](https://img.shields.io/npm/v/glm-mcp-claude?color=cb3837&logo=npm)](https://www.npmjs.com/package/glm-mcp-claude)
4
+ [![npm downloads](https://img.shields.io/npm/dm/glm-mcp-claude?color=cb3837&logo=npm)](https://www.npmjs.com/package/glm-mcp-claude)
5
+ [![license: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
6
+ [![Claude Code](https://img.shields.io/badge/Claude%20Code-MCP-d97757)](https://github.com/djerok/glm_mcp_claude)
7
+
8
+ > πŸ“¦ **Canonical source:** **https://github.com/djerok/glm_mcp_claude** β€” created by
9
+ > [@djerok](https://github.com/djerok). If you found this via a fork, mirror, or an
10
+ > awesome-list, the original lives here. Please ⭐ / file issues / open PRs at the source.
11
+
12
+ Add **GLM** (Zhipu / Z.ai) to Claude Code as a **cheap, full-capability subagent** (~10Γ— cheaper
13
+ than Opus), with **automatic per-task routing** between Opus and GLM. Your main agent stays on
14
+ Opus; GLM does the well-specified, cost-sensitive work β€” and can read, write, edit, and run your
15
+ files directly. One command to install.
16
+
17
+ > βœ… **Works in the Claude Code app on a subscription-based Claude.** Your main agent runs on the
18
+ > Claude you already pay for through the **Claude Code app** (Pro / Max / Team subscription) β€”
19
+ > **no separate pay-per-token Anthropic API key required**. Only GLM needs a (cheap) Z.ai key.
20
+ > Opus orchestrates on your subscription; GLM does the heavy lifting for a fraction of the cost.
21
+
22
+ ![The glm subagent (orchestrated by Haiku 4.5, the cheap layer) reading this repo and offloading generation to GLM](assets/demo-glm-subagent-summary.png)
23
+
24
+ <sub>↑ The `glm` subagent (orchestrated by Haiku 4.5, the cheap layer) reading the repo and offloading the heavy work to GLM via the MCP tools β€” the Opus β†’ Haiku β†’ GLM hybrid in action.</sub>
25
+
26
+ ```bash
27
+ # from npm:
28
+ npx glm-mcp-claude --key YOUR_ZAI_API_KEY
29
+
30
+ # or straight from GitHub (no clone):
31
+ npx github:djerok/glm_mcp_claude --key YOUR_ZAI_API_KEY
32
+
33
+ # or clone and run the installer:
34
+ node install.mjs --key YOUR_ZAI_API_KEY
35
+ ```
36
+ …then restart Claude Code. That's it. (Details below.)
37
+
38
+ > πŸ”‘ **Your key must be from the Z.ai / Zhipu GLM Coding Plan.** Get one at
39
+ > **https://z.ai** β†’ subscribe to the **GLM Coding Plan**, then create an API key. A generic /
40
+ > free key without coding-plan access will not work for the coding models used here.
41
+
42
+ ---
43
+
44
+ ## What you get
45
+
46
+ - **`glm` subagent** β€” a full-tool subagent (read/write/edit/bash) powered by GLM.
47
+ - **`glm_agent` tool** β€” GLM as a real file-editing agent with built-in oversight (diff, dry-run, git revert).
48
+ - **`glm_delegate` / `glm_recommend` / `glm_status`** β€” draft-only delegation, a free routing advisor, and a health check.
49
+ - **Auto-delegation hook** β€” when you spawn a subagent, it injects a GLM-vs-Opus verdict so cheap work goes to GLM automatically. Zero token cost when you're not spawning subagents.
50
+ - **If you explicitly name an agent** ("use opus", "use the sonnet agent", "use glm"), the hook stays silent and just routes where you asked.
51
+
52
+ ---
53
+
54
+ ## Prerequisites
55
+
56
+ - The **Claude Code app** (desktop or CLI), signed in with a **subscription-based Claude**
57
+ (Pro / Max / Team). Your main agent uses this β€” **no Anthropic API key needed.** The `claude`
58
+ CLI should be on your PATH (`claude --version`).
59
+ - **Node.js β‰₯ 18** (`node -v`)
60
+ - **A Z.ai / Zhipu API key with GLM Coding Plan access** β€” get one at **<https://z.ai>**.
61
+ ⚠️ **It must be on the GLM Coding Plan** (the coding-plan subscription); a generic or free
62
+ key won't have access to the coding models this uses. This is the **only paid key required**,
63
+ and GLM is ~10Γ— cheaper than Opus.
64
+ - Git (optional, but enables `glm_agent`'s one-command revert)
65
+
66
+ ---
67
+
68
+ ## Install (recommended: global, all projects)
69
+
70
+ ```bash
71
+ # from this folder:
72
+ node install.mjs --key YOUR_ZAI_API_KEY
73
+ ```
74
+ The installer:
75
+ 1. copies the server to `~/.claude/glm-mcp/` and runs `npm install`,
76
+ 2. writes your key into `~/.claude/glm-mcp/.env`,
77
+ 3. installs the `glm` subagent (`~/.claude/agents/glm.md`) and the hook (`~/.claude/hooks/`),
78
+ 4. wires the hook into `~/.claude/settings.json` (backs it up first),
79
+ 5. adds a short delegation policy to `~/.claude/CLAUDE.md`,
80
+ 6. registers the MCP server with `claude mcp add glm -s user`.
81
+
82
+ Then **restart Claude Code** and run `glm_status` β€” you should see `"api_key_loaded": true`.
83
+
84
+ > Options: `--no-register` (skip the CLI step), `--skip-npm`, `--claude-dir PATH` (custom config dir).
85
+ > Re-running is safe (idempotent). No key on the command line? Run `node install.mjs`, then edit
86
+ > `~/.claude/glm-mcp/.env` and set `GLM_API_KEY=...`.
87
+
88
+ ### Per-project instead of global
89
+ Don't want it everywhere? Skip the installer. Copy `glm-mcp/` into your project, `cd glm-mcp && npm install`,
90
+ copy `.mcp.json.example` β†’ `.mcp.json` in the project root, set the key, and (optionally) copy
91
+ `agents/glm.md` to `.claude/agents/` and the hook into `.claude/` + `.claude/settings.json`.
92
+
93
+ ---
94
+
95
+ ## How it works (the short version)
96
+
97
+ ```
98
+ You ask for something
99
+ β†’ Opus orchestrates
100
+ β†’ wants to delegate a chunk β†’ spawns a subagent
101
+ β†’ [hook fires] "[GLM router] GLM-suitable repo task β†’ use glm_agent (dry_run first)"
102
+ (or "keep on Opus" for hard/sensitive work)
103
+ β†’ Opus runs glm_agent (GLM edits the files, runs tests) β€” or keeps it on Opus
104
+ β†’ you get a diff + action log + a one-command revert
105
+ ```
106
+
107
+ The routing rules live in **`glm-mcp/src/router.js`** and the **hook** β€” not in always-on context β€”
108
+ so they cost nothing until a subagent is actually spawned.
109
+
110
+ **Routing in one line:** GLM is the default (it's ~10Γ— cheaper); Opus is the exception for work
111
+ where being wrong is expensive β€” subtle debugging, architecture, large refactors, security,
112
+ tool-heavy dependent loops, huge context, vision, or anything you mark sensitive.
113
+
114
+ ---
115
+
116
+ ## The tools
117
+
118
+ | Tool | Cost | What it does |
119
+ |---|---|---|
120
+ | `glm_recommend` | free (local) | GLM-or-Opus decision + model pick + reasons. |
121
+ | `glm_status` | free (local) | Peak window, active model, key/config health. |
122
+ | `glm_delegate` | GLM tokens | Text in β†’ text out. GLM drafts; you place it. |
123
+ | `glm_agent` | GLM tokens | GLM works your repo directly (read/write/edit/bash). Returns a diff + action log + git revert; supports `dry_run` (propose, don't write). |
124
+
125
+ ### Example: directly calling the GLM agent
126
+
127
+ A real run β€” asking GLM (via `glm_agent`) to write a file end-to-end on disk:
128
+
129
+ ![GLM agent writing a 2000-word Shakespearean essay to disk in 18 iterations for about 6 cents](assets/demo-glm-agent-umbrella.png)
130
+
131
+ > **Prompt:** *"Using the GLM agent `glm_agent`, write a 2000-word essay in Shakespearean format about the usefulness of an umbrella, into my Desktop."*
132
+
133
+ GLM did it itself β€” created the file directly, no round-tripping the content through the main agent:
134
+
135
+ - **Output:** `Umbrella-Essay-Shakespeare.md` β€” ~2,260 words of Early Modern English (*thee/thou/thy*, *doth/hath*) with two blank-verse interludes
136
+ - **Work:** 18 tool-loop iterations; **one** file created, nothing existing touched
137
+ - **Cost:** ~**$0.064** β€” a fraction of running the same task on Opus
138
+
139
+ That's the point: the orchestrator stays on Opus while `glm_agent` does the heavy, file-touching work for cents.
140
+
141
+ ---
142
+
143
+ ## Oversight (how you stay in control of `glm_agent`)
144
+
145
+ - **Entry:** you/Opus choose when to call it and with which `workdir`.
146
+ - **`dry_run: true`:** GLM proposes a full diff and writes **nothing** β€” approve, then apply.
147
+ - **After a real run:** you get the unified **diff**, an **action log**, and a one-command
148
+ **git revert** (`git checkout <baseline> -- .`).
149
+
150
+ Note: file/bash ops inside `glm_agent` run in the MCP server process (not gated per-edit) and are
151
+ scoped to the `workdir` you pass. That's intentional (max autonomy) β€” point it only at repos you're
152
+ fine letting it modify.
153
+
154
+ ---
155
+
156
+ ## Configuration (`~/.claude/glm-mcp/.env`)
157
+
158
+ | Var | Default | Meaning |
159
+ |---|---|---|
160
+ | `GLM_API_KEY` | β€” | Your Z.ai key. **Required.** |
161
+ | `GLM_BASE_URL` | `https://api.z.ai/api/anthropic` | Anthropic-compatible endpoint. |
162
+ | `GLM_COST_BIAS` | `1.5` | How hard to favor GLM (it's ~10Γ— cheaper). Higher = more GLM; `0` = decide on capability only. |
163
+ | `GLM_CAP` | `off` | Output-token cap. **Off by default** = generous (up to 131072 per call). Set `on` to enforce `GLM_MAX_TOKENS` and rein in spend. |
164
+ | `GLM_MAX_TOKENS` | `32768` | The hard per-call limit applied **only when `GLM_CAP=on`**. (`max_tokens` is a ceiling, not a target β€” you pay for actual output.) |
165
+ | `GLM_MAX_TOKENS_CEILING` | `131072` | The generous default used when the cap is **off**. |
166
+ | `GLM_MAX_CONCURRENT` | `1` | GLM caps in-flight requests; keep at 1. |
167
+ | `GLM_OFFPEAK_MODEL` / `GLM_PEAK_MODEL` | `glm-5.2` / `glm-5.2` | Model(s) for `auto`. **Each can be a comma-separated list** (e.g. `glm-5.2,glm-5-turbo`) and the router auto-picks β€” most capable for hard tasks, cheapest for easy ones. **Peak rule:** when `auto` lands on a glm-5.x model (3Γ— surcharge) the router routes **less** work to GLM at peak; if you include a no-surcharge model (e.g. `GLM_PEAK_MODEL=glm-5.2,glm-4.7`) it's preferred at peak and GLM stays fine to use. |
168
+ | `GLM_PEAK_START_CN` / `GLM_PEAK_END_CN` | `14` / `18` | Peak window (China hour, UTC+8). |
169
+ | `GLM_AGENT_MAX_ITERS` | `30` | Max tool-loop turns for `glm_agent`. |
170
+
171
+ Full list with comments: `glm-mcp/.env.example`.
172
+
173
+ ---
174
+
175
+ ## Uninstall
176
+
177
+ ```bash
178
+ node uninstall.mjs # remove agent, hook, settings entry, MCP registration
179
+ node uninstall.mjs --purge # also delete ~/.claude/glm-mcp (and its .env)
180
+ ```
181
+
182
+ ---
183
+
184
+ ## Security
185
+
186
+ - **Never commit/share your `.env` or a `.mcp.json` containing the key.** `.gitignore` excludes them.
187
+ - GLM routes through servers in China β€” don't send secrets/regulated code you wouldn't send to a
188
+ third-party API. (Routing keeps `sensitive`-flagged work on Opus, but you decide what to delegate.)
189
+
190
+ ---
191
+
192
+ ## Troubleshooting
193
+
194
+ | Symptom | Fix |
195
+ |---|---|
196
+ | `glm_status` missing / tools absent | Restart Claude Code; `claude mcp get glm` to confirm registration. |
197
+ | `api_key_loaded: false` | Set `GLM_API_KEY` in `~/.claude/glm-mcp/.env`. |
198
+ | Server fails to start | `cd ~/.claude/glm-mcp && npm run smoke` to see the real error. |
199
+ | `Too much concurrency` | Expected under load; it auto-retries. Don't fan out parallel GLM calls. |
200
+ | Hook not firing | Check `~/.claude/settings.json` has a `PreToolUse` `Task` matcher pointing at `glm_subagent_router.mjs`. |
201
+
202
+ More background and the research behind the routing rules: see [`docs/`](docs/).
203
+
204
+ ---
205
+
206
+ ## Contributing
207
+
208
+ PRs and issues welcome β€” see [CONTRIBUTING.md](CONTRIBUTING.md). Good first areas: routing
209
+ rules (`glm-mcp/src/router.js` + the hook), provider adapters, and docs. Please never commit
210
+ secrets/`.env`.
211
+
212
+ ## License
213
+
214
+ [MIT](LICENSE) Β© [djerok](https://github.com/djerok)
215
+
216
+ ---
217
+
218
+ <sub>Original / canonical repository: **https://github.com/djerok/glm_mcp_claude**. If you fork,
219
+ mirror, or redistribute this project, please keep a link back to the source so others can find
220
+ updates, file issues, and contribute. Built by [@djerok](https://github.com/djerok).</sub>
package/agents/glm.md ADDED
@@ -0,0 +1,45 @@
1
+ ---
2
+ name: glm
3
+ description: >
4
+ Cost-saving delegate backed by the GLM model (Zhipu/Z.ai) via the glm MCP server.
5
+ A FULLY-CAPABLE subagent: it can read, search, write, edit, run commands, and apply
6
+ changes to disk just like any other subagent -- but it offloads the heavy generation
7
+ to GLM (~10x cheaper than Opus), then applies the result itself. Use PROACTIVELY for
8
+ cheap, well-specified, self-contained subtasks: frontend/UI, boilerplate, scaffolding,
9
+ CRUD, local refactors, docs, summarization, algorithmic codegen, tests, config. NOT
10
+ for security-sensitive/proprietary code, subtle long debugging, large multi-step
11
+ refactors, dependent agentic tool-loops, or work needing parallel agents -- those stay on Opus.
12
+ model: haiku
13
+ ---
14
+
15
+ You are the **GLM delegate** β€” a full subagent with the same tools as any subagent
16
+ (Read, Grep, Glob, Write, Edit, Bash, …) PLUS the GLM tools. Your edge is COST: GLM
17
+ (~10x cheaper than Opus) does the heavy lifting. You have two ways to use it β€” pick one:
18
+
19
+ ### Preferred for coding tasks: `glm_agent` (GLM works the files directly)
20
+ For most "go do this in the repo" tasks, hand the whole thing to `glm_agent`. It runs GLM
21
+ as a real agent with its own read/write/edit/bash tools, so GLM inspects and edits the code
22
+ itself and runs tests β€” end to end. Call it with:
23
+ - `task`: the self-contained coding task.
24
+ - `workdir`: the **absolute path of the project root** (pass it explicitly).
25
+ - `model`: leave `auto` (peak-aware); `thinking: true` for harder work.
26
+ Then verify the result (re-read changed files / run the build or tests) and report a summary.
27
+
28
+ ### For pure generation (no file ops): `glm_delegate`
29
+ If you just need GLM to draft text/code that *you* will place, gather context with Read/Grep,
30
+ call `glm_delegate` (task + pasted context), then apply it with your own Write/Edit.
31
+
32
+ ## Always
33
+ 1. **Check fit when unsure.** Call `glm_recommend`. If it returns `OPUS` (tool-heavy dependent
34
+ loop, large context >128K, long-horizon >20 steps, vision, etc.), stop and report that this
35
+ should run on Opus, with the reason. Don't force GLM where it's weak.
36
+ 2. **Serialize GLM calls** β€” one at a time (GLM caps concurrency ~1).
37
+ 3. **Verify before returning.** Build/lint/test or re-read. If GLM's output is wrong or it loops,
38
+ retry once with a sharper prompt; if still bad, do the critical part yourself or escalate to Opus.
39
+
40
+ ## Operating rules
41
+ - **Serialize GLM calls.** GLM caps concurrent requests (~1); one `glm_delegate` at a time.
42
+ - **You own the writes.** You can edit disk directly β€” verify before saving (build/lint/test or
43
+ re-read), and prefer Edit over blind overwrites. Correctness first; a cheap wrong change still
44
+ costs time to undo.
45
+ - You have full repo access like any subagent. Use it freely.
@@ -0,0 +1,58 @@
1
+ # Auto-Select: routing subagent work between GLM and Opus
2
+
3
+ > **Group 3, Agents B & C deliverables:** the auto-selection mechanism and the
4
+ > end-to-end smoothness contract.
5
+
6
+ ## How auto-selection actually works (and its honest limits)
7
+
8
+ Claude Code does **not** let you swap a subagent's underlying model to a non-Anthropic
9
+ provider β€” `model:` in an agent file only chooses among Anthropic models. So "make GLM a
10
+ subagent" is delivered as a **delegation tool** the orchestrator calls, plus rules that tell
11
+ the orchestrator *when* to call it. There is no hidden background process; routing is a
12
+ decision Claude makes using the rules below. That keeps it cheap, transparent, and accurate.
13
+
14
+ Three cooperating pieces:
15
+
16
+ 1. **The rule (cheap, no tokens):** `glm_recommend` and the table in [`RULES.md`](RULES.md)
17
+ decide GLM vs Opus locally β€” no GLM call needed to make the decision.
18
+ 2. **The execution:** `glm_delegate` runs the work on GLM with peak-aware model choice and
19
+ the concurrency gate; or the `glm` subagent wraps that with context-gathering.
20
+ 3. **The policy in context:** the snippet in [`CLAUDE.md`](../CLAUDE.md) makes every Claude
21
+ Code session in this project *consider GLM whenever it would otherwise spawn a subagent.*
22
+
23
+ ## The decision flow (what happens on every subagent-worthy task)
24
+
25
+ ```
26
+ Need to delegate a subtask?
27
+ β”‚
28
+ β–Ό
29
+ Is it sensitive / parallel / long+complex / latency-critical? ──yes──▢ Opus subagent
30
+ β”‚ no
31
+ β–Ό
32
+ Quick fit check (RULES.md table; glm_recommend if unsure)
33
+ β”‚
34
+ GLM β—€β”΄β–Ά Opus
35
+ β”‚
36
+ β–Ό
37
+ GLM picked β†’ `glm` subagent gathers context β†’ glm_delegate (auto model, peak-aware)
38
+ β”‚
39
+ β–Ό
40
+ Verify output. Wrong/low-quality? β†’ retry once β†’ still bad? β†’ escalate to Opus
41
+ ```
42
+
43
+ ## Smoothness / efficiency contract (the E2E checklist)
44
+
45
+ - **Consideration is automatic** β€” the CLAUDE.md policy ensures GLM is weighed on every
46
+ subagent dispatch, without the user asking.
47
+ - **Deciding is near-free** β€” the GLM-vs-Opus choice is a local rule lookup
48
+ (`glm_recommend` / the table), not an extra LLM round-trip. No meaningful token/latency tax.
49
+ - **Decisions are accurate** — overrides protect correctness (sensitive→Opus, hard→Opus);
50
+ cost-timing only shifts *which* engine handles the safe-to-delegate majority.
51
+ - **GLM behaves predictably** β€” it gets full context pasted in (it can't read files/tools),
52
+ output is verified, and failures escalate rather than silently degrade.
53
+ - **Plug-and-play** β€” to add this to any other Claude Code project: copy `glm-mcp/`,
54
+ `.mcp.json`, `.claude/agents/glm.md`, and the CLAUDE.md block. Set the key. Done.
55
+
56
+ ## Tuning knobs
57
+ All behavior is env-driven (see `glm-mcp/.env.example`): peak window hours, peak/off-peak
58
+ model picks, concurrency cap, retries, timeout. No code edits needed to adjust policy.
package/docs/RULES.md ADDED
@@ -0,0 +1,105 @@
1
+ # GLM-vs-Opus Routing Rules (synthesized)
2
+
3
+ This is the single source of truth for *when* to delegate to GLM. It is the human-readable
4
+ version of the logic in [`glm-mcp/src/router.js`](../glm-mcp/src/router.js) (`recommend()`),
5
+ synthesized from the three research docs in [`docs/research/`](research/).
6
+
7
+ > **Group 3, Agent A deliverable:** synthesized decision rules.
8
+
9
+ ## The one-line policy
10
+ **Default to GLM for the routine ~80%; escalate to Opus for the expensive-if-wrong minority.**
11
+
12
+ ## Why cost makes GLM the default (not just a "cheap option")
13
+ GLM is **~10Γ— cheaper** than Opus, and **still ~3–4Γ— cheaper even at peak** (3Γ— multiplier):
14
+
15
+ | Model | ~$/1M in/out | vs Opus |
16
+ |---|---|---|
17
+ | Opus | 5 / 25 | 1Γ— |
18
+ | GLM-5.2 off-peak | 0.6 / 2.2 | ~10Γ— cheaper |
19
+ | GLM-5.2 at peak | 1.8 / 6.6 | ~3–4Γ— cheaper |
20
+ | GLM-4.7 (no multiplier) | 0.4 / 1.75 | ~12Γ— cheaper |
21
+
22
+ So the router applies a standing **cost bias toward GLM** (`GLM_COST_BIAS`, default `1.5`):
23
+ GLM is the default for safe-to-be-wrong work, and Opus is the exception you *pay up* for only
24
+ when quality/risk justifies it. The catch cost can't override: on hard tasks, *cheaper-but-wrong*
25
+ is **more** expensive (rework + Opus tokens to fix), so the capability penalties for
26
+ debugging/architecture/large-refactor/security claw those back to Opus despite the price gap.
27
+ Tune `GLM_COST_BIAS` up to push more to GLM, or to `0` to decide on capability alone.
28
+
29
+ ## Hard overrides β€” always Opus (no matter the cost saving)
30
+ | Condition | Why |
31
+ |---|---|
32
+ | Proprietary / security-critical code or secrets | GLM routes through servers in China; Zhipu is on the US Entity List. Don't send sensitive IP. |
33
+ | Needs parallel / concurrent agents | GLM caps in-flight requests at ~1 even on paid tiers; fan-out breaks. |
34
+ | Long-horizon **and** high complexity (30+ sequential steps, multi-hour autonomy) | GLM drifts off-plan; Opus holds the plan. |
35
+ | Latency-sensitive interactive loop | GLM is among the slowest frontier coders (~50–100 tok/s). |
36
+
37
+ ## Tool-calling: the important nuance
38
+ GLM is **strong at one-shot / short tool calls** (clean schema adherence, near-zero argument
39
+ hallucination) but **weak at long, dependent agentic loops**. Root cause from research: GLM
40
+ "plans-then-acts" and depends on its reasoning state (`reasoning_content`) being fed back each
41
+ turn; in long loops it drifts or loops infinitely. Opus *interleaves* thinking with tool use,
42
+ so it adapts mid-execution. One report: a task that looped 40+ tool calls on GLM finished in
43
+ ~2 min on Claude. So:
44
+ - `toolcall_single` (one structured call / extraction) β†’ **GLM** (+2)
45
+ - `toolcall_fanout` (a few independent calls, no shared state) β†’ **GLM** (+1)
46
+ - `toolcall_heavy` / `agentic_loop` (many *dependent* calls, long horizon) β†’ **Opus** (hard override)
47
+
48
+ ## Scenario matrix (capability-fit weight, before cost bias)
49
+ Weight scale: +2 strongly GLM … βˆ’3 strongly Opus. Cost bias (+1.5) + timing then applied;
50
+ net effect: fit β‰₯ βˆ’1 usually routes GLM, fit ≀ βˆ’2 routes Opus.
51
+
52
+ | Lean GLM (+1/+2) | Lean / strong Opus (βˆ’1β€¦βˆ’3) |
53
+ |---|---|
54
+ | frontend/UI, boilerplate, scaffolding, config (+2) | iac/Terraform/K8s, dependency upgrades (βˆ’1) |
55
+ | CRUD, regex, docs, i18n, type/lint fixes (+2) | debugging, code review, perf optimization (βˆ’2) |
56
+ | unit tests, local refactor, prototype (+2) | DB migrations, 3rd-party API integration (βˆ’2) |
57
+ | toolcall_single (+2) | systems langs (Rust/Go/C) (βˆ’2) |
58
+ | SQL, ETL, CI/CD, CLI, notebook, integration tests (+1) | large refactor, architecture, security (βˆ’3) |
59
+ | algorithm, research, summarization, toolcall_fanout (+1) | toolcall_heavy, agentic_loop (βˆ’3) |
60
+ | general, ml_training (0 / toss-up β†’ cost breaks tie to GLM) | |
61
+
62
+ **Principle:** route by *cost of being wrong*, not token price. GLM where output verifies fast
63
+ (compiler/linter/test runner is ground truth) and a retry is cheap; Opus where errors are
64
+ silent, cascading, or expensive (migrations, security, review, perf).
65
+
66
+ ## Conditional overrides (force Opus regardless of task type)
67
+ - **Vision** (images/screenshots/GUI/computer-use) β†’ Opus (GLM text endpoint has no vision).
68
+ - **Input > ~128K tokens** β†’ Opus (GLM degrades past ~100K despite 1M advertised); use
69
+ `glm-5.2[1m]` only for pure retrieval/extraction.
70
+ - **> 20 dependent steps / sustained single goal** β†’ Opus (GLM goal-drift).
71
+ - **Unfamiliar / niche / post-cutoff API** β†’ βˆ’2: paste authoritative docs into the prompt, or Opus.
72
+ - **Chinese / bilingual** β†’ +1 toward GLM (genuine strength).
73
+
74
+ **Complexity adjusts the call:** `low` nudges toward GLM, `high` nudges toward Opus.
75
+
76
+ ## Cost-timing rule (the "use GLM less at peak" requirement)
77
+ - **Peak window:** ~**14:00–18:00 China time (UTC+8)** β‰ˆ 02:00–06:00 US Eastern.
78
+ - Flagship **GLM-5.2 costs ~3Γ— at peak, ~2Γ— off-peak** (1Γ— off-peak under a promo through ~Sep 2026).
79
+ - **Default `model: "auto"` is GLM-5.2 in both windows.** Since GLM-5.2 carries the ~3Γ— peak
80
+ surcharge, the router **routes less work to GLM during peak** (a peak penalty scaled by the
81
+ multiplier) instead of switching models β€” only stronger-fit tasks go to GLM at peak.
82
+ - GLM-4.7 carries no multiplier and stays available as a cheaper option: set `GLM_PEAK_MODEL=glm-4.7`
83
+ if you'd rather switch to a no-surcharge model at peak instead of routing less.
84
+ - Western working hours fall in GLM off-peak β†’ GLM is *most* attractive during your normal day.
85
+
86
+ ## Model picks
87
+ | Use | Model |
88
+ |---|---|
89
+ | Off-peak default | `glm-5.2` |
90
+ | Peak default | `glm-5.2` (router sends less to GLM at peak due to ~3Γ— cost) |
91
+ | No-surcharge alternative for peak | `glm-4.7` (set `GLM_PEAK_MODEL=glm-4.7`) |
92
+ | Cheapest / trivial work | `glm-4.5-air` |
93
+ | Huge context (>200K) | `glm-5.2[1m]` |
94
+
95
+ > `GLM_OFFPEAK_MODEL` / `GLM_PEAK_MODEL` each accept a **comma-separated list** (e.g.
96
+ > `glm-5.2,glm-5-turbo`); the router auto-picks one β€” most capable for hard tasks, cheapest for
97
+ > easy/peak ones. Put a no-surcharge model in the peak list (e.g. `glm-5.2,glm-4.7`) and it's
98
+ > preferred at peak, so GLM stays fine to use during the surcharge window.
99
+
100
+ ## Escalation contract
101
+ If GLM's output is wrong, low-quality, or it refuses: retry once with a sharper prompt,
102
+ then **escalate to Opus**. Never let cost-saving degrade correctness on work that matters.
103
+
104
+ > ⚠️ Pricing, peak windows, and the concurrency cap are vendor-set and drift. Verify against
105
+ > `z.ai/subscribe` and `docs.z.ai` periodically; tune via env vars (see `.env.example`).