npm - glm-mcp-claude - Versions diffs - 1.0.0 - Mend

glm-mcp-claude 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/.mcp.json.example +14 -0
package/LICENSE +21 -0
package/README.md +220 -0
package/agents/glm.md +45 -0
package/assets/demo-glm-agent-umbrella.png +0 -0
package/assets/demo-glm-subagent-summary.png +0 -0
package/docs/AUTOSELECT.md +58 -0
package/docs/RULES.md +105 -0
package/docs/research/glm-capabilities.md +241 -0
package/docs/research/glm-failure-modes-routing.md +287 -0
package/docs/research/glm-misc-and-integration.md +180 -0
package/docs/research/glm-peak-usage-and-cost.md +146 -0
package/docs/research/glm-vs-opus-scenario-matrix.md +85 -0
package/docs/research/glm-vs-opus-toolcalling.md +134 -0
package/glm-mcp/.env.example +32 -0
package/glm-mcp/package-lock.json +1180 -0
package/glm-mcp/package.json +21 -0
package/glm-mcp/src/glmAgent.js +227 -0
package/glm-mcp/src/glmClient.js +136 -0
package/glm-mcp/src/index.js +306 -0
package/glm-mcp/src/loadEnv.js +24 -0
package/glm-mcp/src/router.js +291 -0
package/glm-mcp/src/smoke.js +42 -0
package/hooks/glm_subagent_router.mjs +206 -0
package/install.mjs +132 -0
package/package.json +47 -0
package/uninstall.mjs +47 -0

package/.mcp.json.example ADDED

@@ -0,0 +1,14 @@
+{
+  "_comment": "OPTIONAL project-scoped registration. The installer registers globally (user scope) instead, which is recommended. Use this only if you want GLM available in just one project: copy to .mcp.json, set the key, and adjust the path to glm-mcp/src/index.js.",
+  "mcpServers": {
+    "glm": {
+      "command": "node",
+      "args": ["./glm-mcp/src/index.js"],
+      "env": {
+        "GLM_API_KEY": "your-zai-key-here",
+        "GLM_BASE_URL": "https://api.z.ai/api/anthropic",
+        "GLM_MAX_CONCURRENT": "1"
+      }
+    }
+  }
+}
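
Note (illustrative, not part of the package): the `_comment` above says to copy the file to `.mcp.json`, set the key, and adjust the server path. A minimal sketch of that step, assuming a hypothetical helper named `buildMcpConfig`; it drops the `_comment` field and refuses the placeholder key.

```javascript
// Hypothetical helper (not shipped with glm-mcp-claude): rebuilds the
// project-scoped .mcp.json content from an API key and a server path.
function buildMcpConfig(apiKey, serverPath = "./glm-mcp/src/index.js") {
  if (!apiKey || apiKey === "your-zai-key-here") {
    throw new Error("Set a real GLM_API_KEY before writing .mcp.json");
  }
  return {
    mcpServers: {
      glm: {
        command: "node",
        args: [serverPath],
        env: {
          GLM_API_KEY: apiKey,
          GLM_BASE_URL: "https://api.z.ai/api/anthropic",
          GLM_MAX_CONCURRENT: "1",
        },
      },
    },
  };
}

// Usage: serialize the result and save it as .mcp.json in the project root.
const exampleConfig = buildMcpConfig("example-key-123");
console.log(JSON.stringify(exampleConfig, null, 2));
```

Keep the generated file out of version control, since it contains the key.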

package/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 djerok
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,220 @@
+# GLM-as-Subagent for Claude Code — plug & play
+[![npm version](https://img.shields.io/npm/v/glm-mcp-claude?color=cb3837&logo=npm)](https://www.npmjs.com/package/glm-mcp-claude)
+[![npm downloads](https://img.shields.io/npm/dm/glm-mcp-claude?color=cb3837&logo=npm)](https://www.npmjs.com/package/glm-mcp-claude)
+[![license: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
+[![Claude Code](https://img.shields.io/badge/Claude%20Code-MCP-d97757)](https://github.com/djerok/glm_mcp_claude)
+> 📦 **Canonical source:** **https://github.com/djerok/glm_mcp_claude** — created by
+> [@djerok](https://github.com/djerok). If you found this via a fork, mirror, or an
+> awesome-list, the original lives here. Please ⭐ / file issues / open PRs at the source.
+Add **GLM** (Zhipu / Z.ai) to Claude Code as a **cheap, full-capability subagent** (~10× cheaper
+than Opus), with **automatic per-task routing** between Opus and GLM. Your main agent stays on
+Opus; GLM does the well-specified, cost-sensitive work — and can read, write, edit, and run your
+files directly. One command to install.
+> ✅ **Works in the Claude Code app on a subscription-based Claude.** Your main agent runs on the
+> Claude you already pay for through the **Claude Code app** (Pro / Max / Team subscription) —
+> **no separate pay-per-token Anthropic API key required**. Only GLM needs a (cheap) Z.ai key.
+> Opus orchestrates on your subscription; GLM does the heavy lifting for a fraction of the cost.
+![The glm subagent (orchestrated by Haiku 4.5, the cheap layer) reading this repo and offloading generation to GLM](assets/demo-glm-subagent-summary.png)
+<sub>↑ The `glm` subagent (orchestrated by Haiku 4.5, the cheap layer) reading the repo and offloading the heavy work to GLM via the MCP tools — the Opus → Haiku → GLM hybrid in action.</sub>
+```bash
+# from npm:
+npx glm-mcp-claude --key YOUR_ZAI_API_KEY
+# or straight from GitHub (no clone):
+npx github:djerok/glm_mcp_claude --key YOUR_ZAI_API_KEY
+# or clone and run the installer:
+node install.mjs --key YOUR_ZAI_API_KEY
+```
+…then restart Claude Code. That's it. (Details below.)
+> 🔑 **Your key must be from the Z.ai / Zhipu GLM Coding Plan.** Get one at
+> **https://z.ai** → subscribe to the **GLM Coding Plan**, then create an API key. A generic /
+> free key without coding-plan access will not work for the coding models used here.
+---
+## What you get
+- **`glm` subagent** — a full-tool subagent (read/write/edit/bash) powered by GLM.
+- **`glm_agent` tool** — GLM as a real file-editing agent with built-in oversight (diff, dry-run, git revert).
+- **`glm_delegate` / `glm_recommend` / `glm_status`** — draft-only delegation, a free routing advisor, and a health check.
+- **Auto-delegation hook** — when you spawn a subagent, it injects a GLM-vs-Opus verdict so cheap work goes to GLM automatically. Zero token cost when you're not spawning subagents.
+- **If you explicitly name an agent** ("use opus", "use the sonnet agent", "use glm"), the hook stays silent and just routes where you asked.
+---
+## Prerequisites
+- The **Claude Code app** (desktop or CLI), signed in with a **subscription-based Claude**
+  (Pro / Max / Team). Your main agent uses this — **no Anthropic API key needed.** The `claude`
+  CLI should be on your PATH (`claude --version`).
+- **Node.js ≥ 18** (`node -v`)
+- **A Z.ai / Zhipu API key with GLM Coding Plan access** — get one at **<https://z.ai>**.
+  ⚠️ **It must be on the GLM Coding Plan** (the coding-plan subscription); a generic or free
+  key won't have access to the coding models this uses. This is the **only paid key required**,
+  and GLM is ~10× cheaper than Opus.
+- Git (optional, but enables `glm_agent`'s one-command revert)
+---
+## Install (recommended: global, all projects)
+```bash
+# from this folder:
+node install.mjs --key YOUR_ZAI_API_KEY
+```
+The installer:
+1. copies the server to `~/.claude/glm-mcp/` and runs `npm install`,
+2. writes your key into `~/.claude/glm-mcp/.env`,
+3. installs the `glm` subagent (`~/.claude/agents/glm.md`) and the hook (`~/.claude/hooks/`),
+4. wires the hook into `~/.claude/settings.json` (backs it up first),
+5. adds a short delegation policy to `~/.claude/CLAUDE.md`,
+6. registers the MCP server with `claude mcp add glm -s user`.
+Then **restart Claude Code** and run `glm_status` — you should see `"api_key_loaded": true`.
+> Options: `--no-register` (skip the CLI step), `--skip-npm`, `--claude-dir PATH` (custom config dir).
+> Re-running is safe (idempotent). No key on the command line? Run `node install.mjs`, then edit
+> `~/.claude/glm-mcp/.env` and set `GLM_API_KEY=...`.
+### Per-project instead of global
+Don't want it everywhere? Skip the installer. Copy `glm-mcp/` into your project, `cd glm-mcp && npm install`,
+copy `.mcp.json.example` → `.mcp.json` in the project root, set the key, and (optionally) copy
+`agents/glm.md` to `.claude/agents/` and the hook into `.claude/` + `.claude/settings.json`.
+---
+## How it works (the short version)
+```
+You ask for something
+  → Opus orchestrates
+  → wants to delegate a chunk → spawns a subagent
+       → [hook fires] "[GLM router] GLM-suitable repo task → use glm_agent (dry_run first)"
+              (or "keep on Opus" for hard/sensitive work)
+       → Opus runs glm_agent (GLM edits the files, runs tests) — or keeps it on Opus
+       → you get a diff + action log + a one-command revert
+```
+The routing rules live in **`glm-mcp/src/router.js`** and the **hook** — not in always-on context —
+so they cost nothing until a subagent is actually spawned.
+**Routing in one line:** GLM is the default (it's ~10× cheaper); Opus is the exception for work
+where being wrong is expensive — subtle debugging, architecture, large refactors, security,
+tool-heavy dependent loops, huge context, vision, or anything you mark sensitive.
+---
+## The tools
+| Tool | Cost | What it does |
+|---|---|---|
+| `glm_recommend` | free (local) | GLM-or-Opus decision + model pick + reasons. |
+| `glm_status` | free (local) | Peak window, active model, key/config health. |
+| `glm_delegate` | GLM tokens | Text in → text out. GLM drafts; you place it. |
+| `glm_agent` | GLM tokens | GLM works your repo directly (read/write/edit/bash). Returns a diff + action log + git revert; supports `dry_run` (propose, don't write). |
+### Example: directly calling the GLM agent
+A real run — asking GLM (via `glm_agent`) to write a file end-to-end on disk:
+![GLM agent writing a 2000-word Shakespearean essay to disk in 18 iterations for about 6 cents](assets/demo-glm-agent-umbrella.png)
+> **Prompt:** *"Using the GLM agent `glm_agent`, write a 2000-word essay in Shakespearean format about the usefulness of an umbrella, into my Desktop."*
+GLM did it itself — created the file directly, no round-tripping the content through the main agent:
+- **Output:** `Umbrella-Essay-Shakespeare.md` — ~2,260 words of Early Modern English (*thee/thou/thy*, *doth/hath*) with two blank-verse interludes
+- **Work:** 18 tool-loop iterations; **one** file created, nothing existing touched
+- **Cost:** ~**$0.064** — a fraction of running the same task on Opus
+That's the point: the orchestrator stays on Opus while `glm_agent` does the heavy, file-touching work for cents.
+---
+## Oversight (how you stay in control of `glm_agent`)
+- **Entry:** you/Opus choose when to call it and with which `workdir`.
+- **`dry_run: true`:** GLM proposes a full diff and writes **nothing** — approve, then apply.
+- **After a real run:** you get the unified **diff**, an **action log**, and a one-command
+  **git revert** (`git checkout <baseline> -- .`).
+Note: file/bash ops inside `glm_agent` run in the MCP server process (not gated per-edit) and are
+scoped to the `workdir` you pass. That's intentional (max autonomy) — point it only at repos you're
+fine letting it modify.
+---
+## Configuration (`~/.claude/glm-mcp/.env`)
+| Var | Default | Meaning |
+|---|---|---|
+| `GLM_API_KEY` | — | Your Z.ai key. **Required.** |
+| `GLM_BASE_URL` | `https://api.z.ai/api/anthropic` | Anthropic-compatible endpoint. |
+| `GLM_COST_BIAS` | `1.5` | How hard to favor GLM (it's ~10× cheaper). Higher = more GLM; `0` = decide on capability only. |
+| `GLM_CAP` | `off` | Output-token cap. **Off by default** = generous (up to 131072 per call). Set `on` to enforce `GLM_MAX_TOKENS` and rein in spend. |
+| `GLM_MAX_TOKENS` | `32768` | The hard per-call limit applied **only when `GLM_CAP=on`**. (`max_tokens` is a ceiling, not a target — you pay for actual output.) |
+| `GLM_MAX_TOKENS_CEILING` | `131072` | The generous default used when the cap is **off**. |
+| `GLM_MAX_CONCURRENT` | `1` | GLM caps in-flight requests; keep at 1. |
+| `GLM_OFFPEAK_MODEL` / `GLM_PEAK_MODEL` | `glm-5.2` / `glm-5.2` | Model(s) for `auto`. **Each can be a comma-separated list** (e.g. `glm-5.2,glm-5-turbo`) and the router auto-picks — most capable for hard tasks, cheapest for easy ones. **Peak rule:** when `auto` lands on a glm-5.x model (3× surcharge) the router routes **less** work to GLM at peak; if you include a no-surcharge model (e.g. `GLM_PEAK_MODEL=glm-5.2,glm-4.7`) it's preferred at peak and GLM stays fine to use. |
+| `GLM_PEAK_START_CN` / `GLM_PEAK_END_CN` | `14` / `18` | Peak window (China hour, UTC+8). |
+| `GLM_AGENT_MAX_ITERS` | `30` | Max tool-loop turns for `glm_agent`. |
+Full list with comments: `glm-mcp/.env.example`.
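
Note (illustrative, not part of the package): the `GLM_CAP`, `GLM_MAX_TOKENS`, and `GLM_MAX_TOKENS_CEILING` rows interact. A minimal sketch of how the effective per-call limit could be resolved, assuming a hypothetical function `resolveMaxTokens`; the package's actual code is not shown in this excerpt and may differ.

```javascript
// Hypothetical sketch of the token-cap rule described in the table above.
// Defaults are taken from the table; the function name is an assumption.
function resolveMaxTokens(env = {}) {
  const capOn = String(env.GLM_CAP || "off").toLowerCase() === "on";
  const hardLimit = Number(env.GLM_MAX_TOKENS || 32768);
  const ceiling = Number(env.GLM_MAX_TOKENS_CEILING || 131072);
  // Cap on: enforce GLM_MAX_TOKENS. Cap off: use the generous ceiling.
  return capOn ? hardLimit : ceiling;
}

console.log(resolveMaxTokens({}));                                       // cap off
console.log(resolveMaxTokens({ GLM_CAP: "on" }));                        // cap on, default limit
console.log(resolveMaxTokens({ GLM_CAP: "on", GLM_MAX_TOKENS: "8192" })); // cap on, custom limit
```

Either way, `max_tokens` is a ceiling, so the bill depends on actual output rather than on the value chosen here.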
+---
+## Uninstall
+```bash
+node uninstall.mjs          # remove agent, hook, settings entry, MCP registration
+node uninstall.mjs --purge  # also delete ~/.claude/glm-mcp (and its .env)
+```
+---
+## Security
+- **Never commit/share your `.env` or a `.mcp.json` containing the key.** `.gitignore` excludes them.
+- GLM routes through servers in China — don't send secrets/regulated code you wouldn't send to a
+  third-party API. (Routing keeps `sensitive`-flagged work on Opus, but you decide what to delegate.)
+---
+## Troubleshooting
+| Symptom | Fix |
+|---|---|
+| `glm_status` missing / tools absent | Restart Claude Code; `claude mcp get glm` to confirm registration. |
+| `api_key_loaded: false` | Set `GLM_API_KEY` in `~/.claude/glm-mcp/.env`. |
+| Server fails to start | `cd ~/.claude/glm-mcp && npm run smoke` to see the real error. |
+| `Too much concurrency` | Expected under load; it auto-retries. Don't fan out parallel GLM calls. |
+| Hook not firing | Check `~/.claude/settings.json` has a `PreToolUse` `Task` matcher pointing at `glm_subagent_router.mjs`. |
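
Note (illustrative, not part of the package): the "Hook not firing" row asks you to confirm a `PreToolUse` entry with a `Task` matcher. A minimal sketch of that check, assuming `settings.json` follows Claude Code's hooks layout (`hooks.PreToolUse[]`, each entry with `matcher` and a `hooks[]` list of commands). The function name and the sample command string are hypothetical.

```javascript
// Hypothetical check: does a parsed settings.json object contain a
// PreToolUse entry matching "Task" that runs glm_subagent_router.mjs?
function hasGlmRouterHook(settings) {
  const entries = (settings && settings.hooks && settings.hooks.PreToolUse) || [];
  return entries.some(
    (entry) =>
      typeof entry.matcher === "string" &&
      entry.matcher.split("|").includes("Task") &&
      (entry.hooks || []).some(
        (h) => typeof h.command === "string" && h.command.includes("glm_subagent_router.mjs")
      )
  );
}

// Sample object; the command path is an assumed example, not the installer's output.
const sampleSettings = {
  hooks: {
    PreToolUse: [
      {
        matcher: "Task",
        hooks: [{ type: "command", command: "node ~/.claude/hooks/glm_subagent_router.mjs" }],
      },
    ],
  },
};
console.log(hasGlmRouterHook(sampleSettings));
```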
+More background and the research behind the routing rules: see [`docs/`](docs/).
+---
+## Contributing
+PRs and issues welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). Good first areas: routing
+rules (`glm-mcp/src/router.js` + the hook), provider adapters, and docs. Please never commit
+secrets/`.env`.
+## License
+[MIT](LICENSE) © [djerok](https://github.com/djerok)
+---
+<sub>Original / canonical repository: **https://github.com/djerok/glm_mcp_claude**. If you fork,
+mirror, or redistribute this project, please keep a link back to the source so others can find
+updates, file issues, and contribute. Built by [@djerok](https://github.com/djerok).</sub>

package/agents/glm.md ADDED

@@ -0,0 +1,45 @@
+---
+name: glm
+description: >
+  Cost-saving delegate backed by the GLM model (Zhipu/Z.ai) via the glm MCP server.
+  A FULLY-CAPABLE subagent: it can read, search, write, edit, run commands, and apply
+  changes to disk just like any other subagent -- but it offloads the heavy generation
+  to GLM (~10x cheaper than Opus), then applies the result itself. Use PROACTIVELY for
+  cheap, well-specified, self-contained subtasks: frontend/UI, boilerplate, scaffolding,
+  CRUD, local refactors, docs, summarization, algorithmic codegen, tests, config. NOT
+  for security-sensitive/proprietary code, subtle long debugging, large multi-step
+  refactors, dependent agentic tool-loops, or work needing parallel agents -- those stay on Opus.
+model: haiku
+---
+You are the **GLM delegate** — a full subagent with the same tools as any subagent
+(Read, Grep, Glob, Write, Edit, Bash, …) PLUS the GLM tools. Your edge is COST: GLM
+(~10x cheaper than Opus) does the heavy lifting. You have two ways to use it — pick one:
+### Preferred for coding tasks: `glm_agent` (GLM works the files directly)
+For most "go do this in the repo" tasks, hand the whole thing to `glm_agent`. It runs GLM
+as a real agent with its own read/write/edit/bash tools, so GLM inspects and edits the code
+itself and runs tests — end to end. Call it with:
+- `task`: the self-contained coding task.
+- `workdir`: the **absolute path of the project root** (pass it explicitly).
+- `model`: leave `auto` (peak-aware); `thinking: true` for harder work.
+Then verify the result (re-read changed files / run the build or tests) and report a summary.
+### For pure generation (no file ops): `glm_delegate`
+If you just need GLM to draft text/code that *you* will place, gather context with Read/Grep,
+call `glm_delegate` (task + pasted context), then apply it with your own Write/Edit.
+## Always
+1. **Check fit when unsure.** Call `glm_recommend`. If it returns `OPUS` (tool-heavy dependent
+   loop, large context >128K, long-horizon >20 steps, vision, etc.), stop and report that this
+   should run on Opus, with the reason. Don't force GLM where it's weak.
+2. **Serialize GLM calls** — one at a time (GLM caps concurrency ~1).
+3. **Verify before returning.** Build/lint/test or re-read. If GLM's output is wrong or it loops,
+   retry once with a sharper prompt; if still bad, do the critical part yourself or escalate to Opus.
+## Operating rules
+- **Serialize GLM calls.** GLM caps concurrent requests (~1); one `glm_delegate` at a time.
+- **You own the writes.** You can edit disk directly — verify before saving (build/lint/test or
+  re-read), and prefer Edit over blind overwrites. Correctness first; a cheap wrong change still
+  costs time to undo.
+- You have full repo access like any subagent. Use it freely.
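
Note (illustrative, not part of the package): the "Serialize GLM calls" rule can be enforced with a small promise queue. This is a generic sketch under the assumption that each GLM call is an async function; `createSerialGate` and `fakeGlmCall` are hypothetical names, and the package's own concurrency gate may work differently.

```javascript
// Hypothetical serial gate: each task starts only after the previous one
// has settled, so at most one call is in flight at a time.
function createSerialGate() {
  let tail = Promise.resolve();
  return function run(task) {
    const result = tail.then(() => task());
    // Keep the chain alive even if a task rejects.
    tail = result.then(() => undefined, () => undefined);
    return result;
  };
}

// Usage with stand-in tasks (timers instead of real GLM requests).
const gate = createSerialGate();
const fakeGlmCall = (label, ms) => () =>
  new Promise((resolve) => setTimeout(() => resolve(label), ms));

Promise.all([gate(fakeGlmCall("first", 30)), gate(fakeGlmCall("second", 5))]).then(
  (results) => console.log(results)
);
```

The second task is shorter but still waits for the first, which is the behavior the concurrency limit of about 1 requires.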

package/assets/demo-glm-agent-umbrella.png ADDED

Binary file

package/assets/demo-glm-subagent-summary.png ADDED

Binary file

package/docs/AUTOSELECT.md ADDED

@@ -0,0 +1,58 @@
+# Auto-Select: routing subagent work between GLM and Opus
+> **Group 3, Agents B & C deliverables:** the auto-selection mechanism and the
+> end-to-end smoothness contract.
+## How auto-selection actually works (and its honest limits)
+Claude Code does **not** let you swap a subagent's underlying model to a non-Anthropic
+provider — `model:` in an agent file only chooses among Anthropic models. So "make GLM a
+subagent" is delivered as a **delegation tool** the orchestrator calls, plus rules that tell
+the orchestrator *when* to call it. There is no hidden background process; routing is a
+decision Claude makes using the rules below. That keeps it cheap, transparent, and accurate.
+Three cooperating pieces:
+1. **The rule (cheap, no tokens):** `glm_recommend` and the table in [`RULES.md`](RULES.md)
+   decide GLM vs Opus locally — no GLM call needed to make the decision.
+2. **The execution:** `glm_delegate` runs the work on GLM with peak-aware model choice and
+   the concurrency gate; or the `glm` subagent wraps that with context-gathering.
+3. **The policy in context:** the snippet in [`CLAUDE.md`](../CLAUDE.md) makes every Claude
+   Code session in this project *consider GLM whenever it would otherwise spawn a subagent.*
+## The decision flow (what happens on every subagent-worthy task)
+```
+Need to delegate a subtask?
+        │
+        ▼
+Is it sensitive / parallel / long+complex / latency-critical?  ──yes──▶  Opus subagent
+        │ no
+        ▼
+Quick fit check (RULES.md table; glm_recommend if unsure)
+        │
+   GLM ◀┴▶ Opus
+        │
+        ▼
+GLM picked → `glm` subagent gathers context → glm_delegate (auto model, peak-aware)
+        │
+        ▼
+Verify output. Wrong/low-quality? → retry once → still bad? → escalate to Opus
+```
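
Note (illustrative, not part of the package): the flow above can be read as two small functions, one for the up-front override check and one for "verify, retry once, escalate". The flag names (`sensitive`, `parallel`, `longAndComplex`, `latencyCritical`) and both function names are assumptions made for this sketch.

```javascript
// Hypothetical sketch of the decision flow; not the package's API.
function chooseEngine(task, fitCheck) {
  const hardOverride =
    task.sensitive || task.parallel || task.longAndComplex || task.latencyCritical;
  if (hardOverride) return "opus";
  // fitCheck stands in for the RULES.md table or glm_recommend.
  return fitCheck(task) === "GLM" ? "glm" : "opus";
}

// Run on GLM, verify, retry once, then escalate to Opus.
function runWithEscalation(runOnGlm, verify, maxAttempts = 2) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = runOnGlm(attempt); // attempt 2 is the sharper-prompt retry
    if (verify(output)) return { engine: "glm", attempts: attempt, output };
  }
  return { engine: "opus", attempts: maxAttempts, output: null };
}

// Usage with stand-in callbacks.
console.log(chooseEngine({ sensitive: true }, () => "GLM"));
console.log(runWithEscalation((n) => (n === 2 ? "ok" : "bad"), (out) => out === "ok"));
```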
+## Smoothness / efficiency contract (the E2E checklist)
+- **Consideration is automatic** — the CLAUDE.md policy ensures GLM is weighed on every
+  subagent dispatch, without the user asking.
+- **Deciding is near-free** — the GLM-vs-Opus choice is a local rule lookup
+  (`glm_recommend` / the table), not an extra LLM round-trip. No meaningful token/latency tax.
+- **Decisions are accurate** — overrides protect correctness (sensitive→Opus, hard→Opus);
+  cost-timing only shifts *which* engine handles the safe-to-delegate majority.
+- **GLM behaves predictably** — it gets full context pasted in (it can't read files/tools),
+  output is verified, and failures escalate rather than silently degrade.
+- **Plug-and-play** — to add this to any other Claude Code project: copy `glm-mcp/`,
+  `.mcp.json`, `.claude/agents/glm.md`, and the CLAUDE.md block. Set the key. Done.
+## Tuning knobs
+All behavior is env-driven (see `glm-mcp/.env.example`): peak window hours, peak/off-peak
+model picks, concurrency cap, retries, timeout. No code edits needed to adjust policy.

package/docs/RULES.md ADDED

@@ -0,0 +1,105 @@
+# GLM-vs-Opus Routing Rules (synthesized)
+This is the single source of truth for *when* to delegate to GLM. It is the human-readable
+version of the logic in [`glm-mcp/src/router.js`](../glm-mcp/src/router.js) (`recommend()`),
+synthesized from the three research docs in [`docs/research/`](research/).
+> **Group 3, Agent A deliverable:** synthesized decision rules.
+## The one-line policy
+**Default to GLM for the routine ~80%; escalate to Opus for the expensive-if-wrong minority.**
+## Why cost makes GLM the default (not just a "cheap option")
+GLM is **~10× cheaper** than Opus, and **still ~3–4× cheaper even at peak** (3× multiplier):
+| Model | ~$/1M in/out | vs Opus |
+|---|---|---|
+| Opus | 5 / 25 | 1× |
+| GLM-5.2 off-peak | 0.6 / 2.2 | ~10× cheaper |
+| GLM-5.2 at peak | 1.8 / 6.6 | ~3–4× cheaper |
+| GLM-4.7 (no multiplier) | 0.4 / 1.75 | ~12× cheaper |
+So the router applies a standing **cost bias toward GLM** (`GLM_COST_BIAS`, default `1.5`):
+GLM is the default for safe-to-be-wrong work, and Opus is the exception you *pay up* for only
+when quality/risk justifies it. One thing the cost bias cannot override: on hard tasks, *cheaper-but-wrong*
+is **more** expensive (rework + Opus tokens to fix), so the capability penalties for
+debugging/architecture/large-refactor/security pull those tasks back to Opus despite the price gap.
+Tune `GLM_COST_BIAS` up to push more to GLM, or to `0` to decide on capability alone.
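
Note (illustrative, not part of the package): the "vs Opus" column can be checked directly from the listed prices. The sketch below only divides the table's own numbers; the object and function names are made up for the example.

```javascript
// Prices in USD per 1M tokens (input / output), copied from the table above.
const pricePerMillion = {
  "opus": { input: 5, output: 25 },
  "glm-5.2 off-peak": { input: 0.6, output: 2.2 },
  "glm-5.2 peak": { input: 1.8, output: 6.6 },
  "glm-4.7": { input: 0.4, output: 1.75 },
};

// How many times cheaper than Opus a model is, for input and output tokens.
function ratioVsOpus(name) {
  const model = pricePerMillion[name];
  const opus = pricePerMillion["opus"];
  return { input: opus.input / model.input, output: opus.output / model.output };
}

for (const name of Object.keys(pricePerMillion)) {
  const r = ratioVsOpus(name);
  console.log(`${name}: ${r.input.toFixed(1)}x input, ${r.output.toFixed(1)}x output`);
}
```

The ratios come out at roughly 8.3×/11.4× off-peak, 2.8×/3.8× at peak, and 12.5×/14.3× for GLM-4.7, which is where the rounded "~10×", "~3–4×", and "~12×" figures come from.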
+## Hard overrides — always Opus (no matter the cost saving)
+| Condition | Why |
+|---|---|
+| Proprietary / security-critical code or secrets | GLM routes through servers in China; Zhipu is on the US Entity List. Don't send sensitive IP. |
+| Needs parallel / concurrent agents | GLM caps in-flight requests at ~1 even on paid tiers; fan-out breaks. |
+| Long-horizon **and** high complexity (30+ sequential steps, multi-hour autonomy) | GLM drifts off-plan; Opus holds the plan. |
+| Latency-sensitive interactive loop | GLM is among the slowest frontier coders (~50–100 tok/s). |
+## Tool-calling: the important nuance
+GLM is **strong at one-shot / short tool calls** (clean schema adherence, near-zero argument
+hallucination) but **weak at long, dependent agentic loops**. Root cause from research: GLM
+"plans-then-acts" and depends on its reasoning state (`reasoning_content`) being fed back each
+turn; in long loops it drifts or loops infinitely. Opus *interleaves* thinking with tool use,
+so it adapts mid-execution. One report: a task that looped 40+ tool calls on GLM finished in
+~2 min on Claude. So:
+- `toolcall_single` (one structured call / extraction) → **GLM** (+2)
+- `toolcall_fanout` (a few independent calls, no shared state) → **GLM** (+1)
+- `toolcall_heavy` / `agentic_loop` (many *dependent* calls, long horizon) → **Opus** (hard override)
+## Scenario matrix (capability-fit weight, before cost bias)
+Weight scale: +2 strongly GLM … −3 strongly Opus. The cost bias (+1.5) and the timing adjustment are
+then applied; net effect: fit ≥ −1 usually routes to GLM, fit ≤ −2 routes to Opus.
+| Lean GLM (+1/+2) | Lean / strong Opus (−1…−3) |
+|---|---|
+| frontend/UI, boilerplate, scaffolding, config (+2) | iac/Terraform/K8s, dependency upgrades (−1) |
+| CRUD, regex, docs, i18n, type/lint fixes (+2) | debugging, code review, perf optimization (−2) |
+| unit tests, local refactor, prototype (+2) | DB migrations, 3rd-party API integration (−2) |
+| toolcall_single (+2) | systems langs (Rust/Go/C) (−2) |
+| SQL, ETL, CI/CD, CLI, notebook, integration tests (+1) | large refactor, architecture, security (−3) |
+| algorithm, research, summarization, toolcall_fanout (+1) | toolcall_heavy, agentic_loop (−3) |
+| general, ml_training (0 / toss-up → cost breaks tie to GLM) | |
+**Principle:** route by *cost of being wrong*, not token price. GLM where output verifies fast
+(compiler/linter/test runner is ground truth) and a retry is cheap; Opus where errors are
+silent, cascading, or expensive (migrations, security, review, perf).
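
Note (illustrative, not part of the package): a minimal sketch of how the fit weights and the cost bias could combine. It assumes a simple rule, score = fit + cost bias − peak penalty, with a positive score routing to GLM; that threshold is an assumption chosen to reproduce the stated net effect (fit ≥ −1 goes to GLM, fit ≤ −2 goes to Opus at the default bias of 1.5). Most task-type keys are hypothetical labels for rows of the matrix, and the real `recommend()` in `router.js` may compute this differently.

```javascript
// Fit weights taken from the scenario matrix (subset). Keys other than
// toolcall_single, toolcall_fanout, general, and iac are illustrative labels.
const FIT = {
  frontend: 2, boilerplate: 2, crud: 2, unit_tests: 2, toolcall_single: 2,
  sql: 1, algorithm: 1, toolcall_fanout: 1,
  general: 0,
  iac: -1,
  debugging: -2, db_migration: -2,
  large_refactor: -3, architecture: -3, security: -3,
};

// Assumed scoring rule: positive score routes to GLM, otherwise Opus.
function route(taskType, { costBias = 1.5, peakPenalty = 0 } = {}) {
  if (!(taskType in FIT)) throw new Error(`unknown task type: ${taskType}`);
  const score = FIT[taskType] + costBias - peakPenalty;
  return { score, engine: score > 0 ? "GLM" : "OPUS" };
}

console.log(route("iac"));                        // fit -1 plus bias 1.5
console.log(route("debugging"));                  // fit -2 plus bias 1.5
console.log(route("iac", { peakPenalty: 1 }));    // same task during peak
```

With a nonzero peak penalty, borderline tasks such as `iac` flip to Opus while strong-fit tasks stay on GLM, which matches the "only stronger-fit tasks go to GLM at peak" behavior described in the cost-timing rule.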
+## Conditional overrides (force Opus regardless of task type)
+- **Vision** (images/screenshots/GUI/computer-use) → Opus (GLM text endpoint has no vision).
+- **Input > ~128K tokens** → Opus (GLM degrades past ~100K despite 1M advertised); use
+  `glm-5.2[1m]` only for pure retrieval/extraction.
+- **> 20 dependent steps / sustained single goal** → Opus (GLM goal-drift).
+- **Unfamiliar / niche / post-cutoff API** → −2: paste authoritative docs into the prompt, or Opus.
+- **Chinese / bilingual** → +1 toward GLM (genuine strength).
+**Complexity adjusts the call:** `low` nudges toward GLM, `high` nudges toward Opus.
+## Cost-timing rule (the "use GLM less at peak" requirement)
+- **Peak window:** ~**14:00–18:00 China time (UTC+8)** ≈ 02:00–06:00 US Eastern during daylight time (EDT), or 01:00–05:00 during standard time (EST).
+- Flagship **GLM-5.2 costs ~3× at peak, ~2× off-peak** (1× off-peak under a promo through ~Sep 2026).
+- **Default `model: "auto"` is GLM-5.2 in both windows.** Since GLM-5.2 carries the ~3× peak
+  surcharge, the router **routes less work to GLM during peak** (a peak penalty scaled by the
+  multiplier) instead of switching models — only stronger-fit tasks go to GLM at peak.
+- GLM-4.7 carries no multiplier and stays available as a cheaper option: set `GLM_PEAK_MODEL=glm-4.7`
+  if you'd rather switch to a no-surcharge model at peak instead of routing less.
+- Western working hours fall in GLM off-peak → GLM is *most* attractive during your normal day.
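
Note (illustrative, not part of the package): a minimal sketch of the peak-window test, assuming the window is start-inclusive and end-exclusive in China hours (UTC+8, no daylight saving). The function names are hypothetical; the defaults mirror `GLM_PEAK_START_CN=14` and `GLM_PEAK_END_CN=18`.

```javascript
// Convert a Date to the hour of day in China (UTC+8).
function chinaHour(date) {
  return (date.getUTCHours() + 8) % 24;
}

// Assumed rule: peak when startHour <= China hour < endHour.
function isPeak(date, startHour = 14, endHour = 18) {
  const hour = chinaHour(date);
  return hour >= startHour && hour < endHour;
}

// 06:00 UTC is 14:00 in China (peak); 10:00 UTC is 18:00 in China (past the window).
console.log(isPeak(new Date(Date.UTC(2026, 0, 15, 6, 0))));
console.log(isPeak(new Date(Date.UTC(2026, 0, 15, 10, 0))));
```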
+## Model picks
+| Use | Model |
+|---|---|
+| Off-peak default | `glm-5.2` |
+| Peak default | `glm-5.2` (router sends less to GLM at peak due to ~3× cost) |
+| No-surcharge alternative for peak | `glm-4.7` (set `GLM_PEAK_MODEL=glm-4.7`) |
+| Cheapest / trivial work | `glm-4.5-air` |
+| Huge context (>200K) | `glm-5.2[1m]` |
+> `GLM_OFFPEAK_MODEL` / `GLM_PEAK_MODEL` each accept a **comma-separated list** (e.g.
+> `glm-5.2,glm-5-turbo`); the router auto-picks one — most capable for hard tasks, cheapest for
+> easy/peak ones. Put a no-surcharge model in the peak list (e.g. `glm-5.2,glm-4.7`) and it's
+> preferred at peak, so GLM stays fine to use during the surcharge window.
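
Note (illustrative, not part of the package): a minimal sketch of picking one model from a comma-separated list. It rests on three assumptions that the text does not confirm: the list is ordered most capable first and cheapest last, a model carries the peak surcharge when its name starts with `glm-5`, and at peak the first no-surcharge model wins. `pickModel` is a hypothetical name.

```javascript
// Hypothetical picker for GLM_OFFPEAK_MODEL / GLM_PEAK_MODEL list values.
function pickModel(listValue, { hard = false, peak = false } = {}) {
  const models = listValue.split(",").map((s) => s.trim()).filter(Boolean);
  if (models.length === 0) throw new Error("empty model list");
  if (peak) {
    // Assumption: glm-5.x names carry the peak surcharge; prefer anything else.
    const noSurcharge = models.find((m) => !m.startsWith("glm-5"));
    if (noSurcharge) return noSurcharge;
  }
  // Assumption: first entry is most capable, last entry is cheapest.
  return hard ? models[0] : models[models.length - 1];
}

console.log(pickModel("glm-5.2,glm-5-turbo", { hard: true }));
console.log(pickModel("glm-5.2,glm-5-turbo"));
console.log(pickModel("glm-5.2,glm-4.7", { hard: true, peak: true }));
```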
+## Escalation contract
+If GLM's output is wrong, low-quality, or it refuses: retry once with a sharper prompt,
+then **escalate to Opus**. Never let cost-saving degrade correctness on work that matters.
+> ⚠️ Pricing, peak windows, and the concurrency cap are vendor-set and drift. Verify against
+> `z.ai/subscribe` and `docs.z.ai` periodically; tune via env vars (see `.env.example`).