cheap-llm-mcp 0.1.0 → 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.example +3 -1
- package/CHANGELOG.md +16 -0
- package/README.en.md +210 -0
- package/README.md +154 -199
- package/README.zh-CN.md +36 -10
- package/codex-config.example.toml +3 -1
- package/dist/chat.js +3 -3
- package/dist/chat.js.map +1 -1
- package/dist/providers.d.ts +7 -0
- package/dist/providers.js +31 -14
- package/dist/providers.js.map +1 -1
- package/dist/server.d.ts +1 -1
- package/dist/server.js +2 -2
- package/dist/server.js.map +1 -1
- package/dist/setup.d.ts +13 -2
- package/dist/setup.js +112 -16
- package/dist/setup.js.map +1 -1
- package/dist/types.d.ts +2 -0
- package/package.json +5 -4
package/.env.example
CHANGED
@@ -1,7 +1,9 @@
 CHEAP_LLM_API_KEY=sk-...
 CHEAP_LLM_BASE_URL=https://api.deepseek.com
-CHEAP_LLM_MODEL=deepseek-
+CHEAP_LLM_MODEL=deepseek-v4-flash
 CHEAP_LLM_CHAT_PATH=/chat/completions
+CHEAP_LLM_API_KEY_HEADER=Authorization
+CHEAP_LLM_API_KEY_PREFIX=Bearer

 SIMPLE_LLM_CHINESE_DEFAULT=true
 SIMPLE_LLM_STABILITY_DEFAULT=true
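The two new keys only change how the API key is attached to the request the server sends to the provider. A minimal sketch of the two documented forms, using `curl` purely for illustration (the request body is the standard OpenAI-compatible shape and is an assumption here, not code from the package):

```bash
# Default form: CHEAP_LLM_API_KEY_HEADER=Authorization, CHEAP_LLM_API_KEY_PREFIX=Bearer
curl -sS "$CHEAP_LLM_BASE_URL$CHEAP_LLM_CHAT_PATH" \
  -H "Authorization: Bearer $CHEAP_LLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"'"$CHEAP_LLM_MODEL"'","messages":[{"role":"user","content":"ping"}]}'

# Bare-header form (e.g. Xiaomi MiMo): CHEAP_LLM_API_KEY_HEADER=api-key, CHEAP_LLM_API_KEY_PREFIX=none
curl -sS "$CHEAP_LLM_BASE_URL$CHEAP_LLM_CHAT_PATH" \
  -H "api-key: $CHEAP_LLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"'"$CHEAP_LLM_MODEL"'","messages":[{"role":"user","content":"ping"}]}'
```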
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,21 @@
 # Changelog

+## 0.1.3
+
+- Made the GitHub homepage README Chinese-first with a separate English docs link.
+- Added an optional tiny public API connectivity test to the setup wizard.
+- Added Qwen / Alibaba Cloud Bailian to the setup wizard and documented it as a first-class domestic model preset.
+
+## 0.1.2
+
+- Added explicit DeepSeek and Xiaomi MiMo setup presets.
+- Added configurable API-key auth headers for providers that use `api-key` instead of `Authorization: Bearer`.
+- Updated docs and examples to separate tested presets from generic OpenAI-compatible endpoints.
+
+## 0.1.1
+
+- Redacted API keys in setup command previews, running logs, and fallback config output.
+
 ## 0.1.0

 - Initial productized MCP server.
package/README.en.md
ADDED
@@ -0,0 +1,210 @@
+# cheap-llm-mcp
+
+[](https://www.npmjs.com/package/cheap-llm-mcp)
+[](https://github.com/stBlackCat/cheap-llm-mcp/actions/workflows/ci.yml)
+[](https://nodejs.org/)
+[](LICENSE)
+
+Still worried about GPT Plus limits? Still watching your Claude subscription tokens burn on tiny chores?
+
+`cheap-llm-mcp` solves a big chunk of that pain: use cheap AI for cheap work, while your premium model stays in charge.
+
+[Chinese homepage](README.md)
+
+This is a local stdio MCP server for Claude Code, Codex, and other MCP clients. It routes simple, low-risk, self-contained tasks to tested DeepSeek, Xiaomi MiMo, and Qwen / Alibaba Cloud Bailian presets, or to a custom OpenAI-compatible chat completions API. Your main AI still plans, reviews, edits, and decides. The cheap model just handles small drafts.
+
+## Quickstart
+
+Install the MCP server first:
+
+```bash
+npx -y cheap-llm-mcp@latest setup
+```
+
+Then fill in one OpenAI-compatible endpoint. The setup wizard can also send a tiny public connectivity test after you enter the API key.
+
+```bash
+CHEAP_LLM_BASE_URL=https://api.deepseek.com
+CHEAP_LLM_MODEL=deepseek-v4-flash
+CHEAP_LLM_API_KEY=sk-...
+CHEAP_LLM_CHAT_PATH=/chat/completions
+CHEAP_LLM_API_KEY_HEADER=Authorization
+CHEAP_LLM_API_KEY_PREFIX=Bearer
+```
+
+DeepSeek, Xiaomi MiMo, and Qwen are first-class presets. Other OpenAI-compatible providers may work when they expose a compatible chat completions endpoint and use a supported API-key header.
+
+Check your setup:
+
+```bash
+npx -y cheap-llm-mcp@latest doctor
+```
+
+Print manual config:
+
+```bash
+npx -y cheap-llm-mcp@latest config
+```
+
+## Claude Code
+
+The setup wizard can run this after confirmation:
+
+```bash
+claude mcp add --transport stdio --scope user \
+--env CHEAP_LLM_API_KEY=sk-... \
+--env CHEAP_LLM_BASE_URL=https://api.deepseek.com \
+--env CHEAP_LLM_MODEL=deepseek-v4-flash \
+--env CHEAP_LLM_CHAT_PATH=/chat/completions \
+--env CHEAP_LLM_API_KEY_HEADER=Authorization \
+--env CHEAP_LLM_API_KEY_PREFIX=Bearer \
+--env SIMPLE_LLM_CHINESE_DEFAULT=true \
+--env SIMPLE_LLM_STABILITY_DEFAULT=true \
+cheap-llm -- npx -y cheap-llm-mcp@latest
+```
+
+Restart Claude Code and run:
+
+```text
+/mcp
+```
+
+## Codex
+
+The setup wizard can run this after confirmation:
+
+```bash
+codex mcp add cheap-llm \
+--env CHEAP_LLM_API_KEY=sk-... \
+--env CHEAP_LLM_BASE_URL=https://api.deepseek.com \
+--env CHEAP_LLM_MODEL=deepseek-v4-flash \
+--env CHEAP_LLM_CHAT_PATH=/chat/completions \
+--env CHEAP_LLM_API_KEY_HEADER=Authorization \
+--env CHEAP_LLM_API_KEY_PREFIX=Bearer \
+--env SIMPLE_LLM_CHINESE_DEFAULT=true \
+--env SIMPLE_LLM_STABILITY_DEFAULT=true \
+-- npx -y cheap-llm-mcp@latest
+```
+
+Restart Codex and verify:
+
+```bash
+codex mcp list
+```
+
+If `codex mcp add` is unavailable, run:
+
+```bash
+npx -y cheap-llm-mcp@latest config
+```
+
+Then paste the printed TOML into `~/.codex/config.toml`.
+
+## What should be delegated?
+
+Good cheap-model tasks:
+
+- summarize a short note
+- translate or rewrite text
+- classify a small snippet
+- extract fields into JSON
+- draft a regex
+- explain a short command
+- produce a tiny isolated code snippet
+
+Bad cheap-model tasks:
+
+- decide architecture
+- edit your repo directly
+- review security-sensitive code
+- reason over a full private codebase
+- handle secrets or sensitive data
+- debug complex cross-file behavior
+
+## Stability Without Wasting Tokens
+
+Cheap models are useful, but they are not the boss.
+
+`cheap-llm-mcp` adds a compact default instruction that tells the cheap model to return a concise draft only, avoid final decisions, avoid pretending it edited files, avoid guessing missing facts, and say `UNCERTAIN` when the task is ambiguous.
+
+The MCP tool description also tells the host AI to lightly review the result against the original task before using it. This keeps the premium model in control without asking the cheap model to produce long self-review reports.
+
+## OpenAI-Compatible Config
+
+```bash
+CHEAP_LLM_BASE_URL=https://your-provider.example/v1
+CHEAP_LLM_MODEL=your-cheap-model
+CHEAP_LLM_API_KEY=your-api-key
+CHEAP_LLM_CHAT_PATH=/chat/completions
+CHEAP_LLM_API_KEY_HEADER=Authorization
+CHEAP_LLM_API_KEY_PREFIX=Bearer
+```
+
+Tested presets:
+
+```bash
+# DeepSeek
+CHEAP_LLM_BASE_URL=https://api.deepseek.com
+CHEAP_LLM_MODEL=deepseek-v4-flash
+
+# Xiaomi MiMo
+CHEAP_LLM_BASE_URL=https://api.xiaomimimo.com/v1
+CHEAP_LLM_MODEL=mimo-v2.5-pro
+
+# Qwen / Alibaba Cloud Bailian
+CHEAP_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
+CHEAP_LLM_MODEL=qwen-plus
+```
+
+Xiaomi MiMo also documents an `api-key` header. To switch to that form:
+
+```bash
+CHEAP_LLM_API_KEY_HEADER=api-key
+CHEAP_LLM_API_KEY_PREFIX=none
+```
+
+Qwen / Alibaba Cloud Bailian uses the DashScope OpenAI-compatible endpoint: `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`, with `qwen-plus` as the default model.
+
+Reference docs: [DeepSeek API](https://api-docs.deepseek.com/zh-cn/), [Xiaomi MiMo first API call](https://platform.xiaomimimo.com/docs/zh-CN/quick-start/first-api-call), and [Alibaba Cloud Model Studio Qwen API](https://www.alibabacloud.com/help/en/model-studio/use-qwen-by-calling-api).
+
+## Token Savings
+
+Run the MCP tool `get_token_savings` to see how many provider-reported tokens were handled by cheap models during the current MCP server session.
+
+For a persistent audit trail, set:
+
+```bash
+SIMPLE_LLM_USAGE_LOG=/path/to/cheap-llm-usage.jsonl
+```
+
+The usage log records provider, model, token counts, and timestamp only. It does not record prompts or model outputs.
+
+## Safety Defaults
+
+- Calls require `approvedForExternalApi=true`.
+- Calls require `dataClassification`.
+- `dataClassification=sensitive` is rejected.
+- Common secret patterns are rejected.
+- HTTP providers are rejected unless `SIMPLE_LLM_ALLOW_HTTP=true`.
+- Prompt size is capped by `SIMPLE_LLM_MAX_PROMPT_CHARS` (default: `12000`).
+- Requests time out via `SIMPLE_LLM_TIMEOUT_MS` (default: `60000`).
+- Provider errors are redacted before being returned.
+
+## Chinese Default
+
+Chinese-first output is enabled by default:
+
+```bash
+SIMPLE_LLM_CHINESE_DEFAULT=true
+```
+
+## Development
+
+```bash
+npm install
+npm run ci
+```
+
+## License
+
+MIT
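For orientation, the `SIMPLE_LLM_USAGE_LOG` file described above is a JSONL append log, one record per call. The field names in this sketch are an assumption based on the documented contents (provider, model, token counts, timestamp); the package's actual schema may differ:

```bash
# Inspect the most recent usage records (field names illustrative only).
tail -n 2 "$SIMPLE_LLM_USAGE_LOG"
# {"timestamp":"2025-01-01T12:00:00Z","provider":"deepseek","model":"deepseek-v4-flash","promptTokens":312,"completionTokens":488,"totalTokens":800}
# {"timestamp":"2025-01-01T12:03:10Z","provider":"qwen","model":"qwen-plus","promptTokens":120,"completionTokens":95,"totalTokens":215}
```

Consistent with the note above, no prompt or output text appears in a record.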
package/README.md
CHANGED
@@ -1,243 +1,198 @@
-# cheap-llm-mcp
-
-[](https://www.npmjs.com/package/cheap-llm-mcp)
-[](https://github.com/stBlackCat/cheap-llm-mcp/actions/workflows/ci.yml)
-[](https://nodejs.org/)
-[](LICENSE)
-… (12 removed lines not captured in this view)
+# cheap-llm-mcp
+
+[](https://www.npmjs.com/package/cheap-llm-mcp)
+[](https://github.com/stBlackCat/cheap-llm-mcp/actions/workflows/ci.yml)
+[](https://nodejs.org/)
+[](LICENSE)
+
+[English docs](README.en.md)
+
+还在为 GPT Plus 的额度发愁?还在心疼自己的 Claude 会员 token 太贵?
+
+这个 MCP 可以解决你的大部分小任务成本问题:用便宜的 AI 做便宜的事情,让贵的主模型继续负责统筹、审查和最终决策。
+
+`cheap-llm-mcp` 是一个本地 stdio MCP server,适用于 Claude Code、Codex 和其他 MCP 客户端。它可以把摘要、翻译、分类、抽取、小段代码等低风险任务交给国产低成本模型分流处理。内置预设包括 DeepSeek、Xiaomi MiMo 和 Qwen / 阿里云百炼,也可以接入你自己填写的 OpenAI-compatible API。
+
+## 快速开始
+
+先一键安装 MCP:

 ```bash
 npx -y cheap-llm-mcp@latest setup
 ```

-
+向导会让你选择客户端、provider 预设、模型和 API key。填完 API key 后,它可以立刻发一个极小的公开 ping 测试接口连通性,不发送你的项目内容。
+
+手动配置时,核心就是这几项:

 ```bash
 CHEAP_LLM_BASE_URL=https://api.deepseek.com
-CHEAP_LLM_MODEL=deepseek-
+CHEAP_LLM_MODEL=deepseek-v4-flash
 CHEAP_LLM_API_KEY=sk-...
 CHEAP_LLM_CHAT_PATH=/chat/completions
+CHEAP_LLM_API_KEY_HEADER=Authorization
+CHEAP_LLM_API_KEY_PREFIX=Bearer
+```
+
+DeepSeek、Xiaomi MiMo 和 Qwen 是明确支持的国产模型预设;其他平台只要提供兼容 chat completions 的接口,并且鉴权头能按下面的格式配置,就可以按自定义 OpenAI-compatible 接口尝试。
+
+检查配置:
+
+```bash
+npx -y cheap-llm-mcp@latest doctor
+```
+
+打印手动配置:
+
+```bash
+npx -y cheap-llm-mcp@latest config
 ```

-
-
-
-
-```bash
-npx -y cheap-llm-mcp@latest doctor
-```
-
-Print manual config:
-
-```bash
-npx -y cheap-llm-mcp@latest config
-```
-
-## Claude Code
-
-The setup wizard can run this after confirmation:
-
-```bash
+## Claude Code
+
+向导会在你确认后执行类似命令:
+
+```bash
 claude mcp add --transport stdio --scope user \
 --env CHEAP_LLM_API_KEY=sk-... \
 --env CHEAP_LLM_BASE_URL=https://api.deepseek.com \
---env CHEAP_LLM_MODEL=deepseek-
+--env CHEAP_LLM_MODEL=deepseek-v4-flash \
 --env CHEAP_LLM_CHAT_PATH=/chat/completions \
+--env CHEAP_LLM_API_KEY_HEADER=Authorization \
+--env CHEAP_LLM_API_KEY_PREFIX=Bearer \
 --env SIMPLE_LLM_CHINESE_DEFAULT=true \
 --env SIMPLE_LLM_STABILITY_DEFAULT=true \
-cheap-llm -- npx -y cheap-llm-mcp@latest
-```
-
-
-
-```text
-/mcp
-```
-
-## Codex
-
-
-
-```bash
+cheap-llm -- npx -y cheap-llm-mcp@latest
+```
+
+重启 Claude Code 后运行:
+
+```text
+/mcp
+```
+
+## Codex
+
+向导会在你确认后执行类似命令:
+
+```bash
 codex mcp add cheap-llm \
 --env CHEAP_LLM_API_KEY=sk-... \
 --env CHEAP_LLM_BASE_URL=https://api.deepseek.com \
---env CHEAP_LLM_MODEL=deepseek-
+--env CHEAP_LLM_MODEL=deepseek-v4-flash \
 --env CHEAP_LLM_CHAT_PATH=/chat/completions \
+--env CHEAP_LLM_API_KEY_HEADER=Authorization \
+--env CHEAP_LLM_API_KEY_PREFIX=Bearer \
 --env SIMPLE_LLM_CHINESE_DEFAULT=true \
 --env SIMPLE_LLM_STABILITY_DEFAULT=true \
--- npx -y cheap-llm-mcp@latest
-```
-
-… (7 removed lines not captured in this view)
-If `codex mcp add` is unavailable, run:
-
-```bash
-npx -y cheap-llm-mcp@latest config
-```
-
-Then paste the printed TOML into `~/.codex/config.toml`.
-
-## What should be delegated?
-
-Good cheap-model tasks:
-
-- summarize a short note
-- translate or rewrite text
-- classify a small snippet
-- extract fields into JSON
-- draft a regex
-- explain a short command
-- produce a tiny isolated code snippet
-
-Bad cheap-model tasks:
-
-- decide architecture
-- edit your repo directly
-- review security-sensitive code
-- reason over a full private codebase
-- handle secrets or sensitive data
-- debug complex cross-file behavior
-
-## Stability without wasting tokens
-
-Cheap models are useful, but they are not the boss.
-
-`cheap-llm-mcp` adds a compact default instruction that tells the cheap model to:
-
-- return a concise draft only
-- avoid final decisions
-- avoid pretending it edited files
-- avoid guessing missing facts
-- say `UNCERTAIN` when the task is ambiguous
-
-The MCP tool description also tells the host AI to lightly review the result against the original task before using it. This keeps the premium model in control without asking the cheap model to produce long self-review reports.
-
-Disable this default only if you know what you are doing:
-
-```bash
-SIMPLE_LLM_STABILITY_DEFAULT=false
-```
-
-## 30-second demo
-
-1. Run `npx -y cheap-llm-mcp@latest setup`.
-2. Restart Claude Code or Codex.
-3. Ask: "Use the cheap LLM MCP to summarize this short text."
-4. Your host AI delegates the small task, then checks the draft before using it.
-
-Available tools:
-
-- `ask_simple_model`: call a configured cheap model for a self-contained task.
-- `list_simple_model_providers`: show configured providers without leaking API keys.
-- `check_simple_model_setup`: validate local provider configuration without making a model request.
-- `get_token_savings`: show how many provider-reported tokens were routed to cheap models.
-
-## OpenAI-compatible config
-
-The recommended config is intentionally boring:
+-- npx -y cheap-llm-mcp@latest
+```
+
+如果命令不可用,运行 `npx -y cheap-llm-mcp@latest config`,把输出的 TOML 写入 `~/.codex/config.toml`。
+
+## 配置格式
+
+推荐路径就是这几项:

 ```bash
 CHEAP_LLM_BASE_URL=https://your-provider.example/v1
 CHEAP_LLM_MODEL=your-cheap-model
 CHEAP_LLM_API_KEY=your-api-key
 CHEAP_LLM_CHAT_PATH=/chat/completions
+CHEAP_LLM_API_KEY_HEADER=Authorization
+CHEAP_LLM_API_KEY_PREFIX=Bearer
 ```

-
+已明确支持的预设:

 ```bash
 # DeepSeek
 CHEAP_LLM_BASE_URL=https://api.deepseek.com
-CHEAP_LLM_MODEL=deepseek-
+CHEAP_LLM_MODEL=deepseek-v4-flash

-#
+# Xiaomi MiMo
+CHEAP_LLM_BASE_URL=https://api.xiaomimimo.com/v1
+CHEAP_LLM_MODEL=mimo-v2.5-pro
+
+# Qwen / 阿里云百炼
 CHEAP_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
 CHEAP_LLM_MODEL=qwen-plus

-#
+# 其他 OpenAI-compatible 网关
 CHEAP_LLM_BASE_URL=https://example.com/v1
 CHEAP_LLM_MODEL=model-id
 ```

-… (65 removed lines not captured in this view)
+DeepSeek 使用 `Authorization: Bearer ...`,接口是 `https://api.deepseek.com/chat/completions`。Xiaomi MiMo 的 OpenAI-compatible 接口是 `https://api.xiaomimimo.com/v1/chat/completions`;小米文档同时支持 `Authorization: Bearer ...` 和 `api-key` 头。如果你要切到 `api-key` 头,配置:
+
+```bash
+CHEAP_LLM_API_KEY_HEADER=api-key
+CHEAP_LLM_API_KEY_PREFIX=none
+```
+
+Qwen / 阿里云百炼使用 DashScope OpenAI-compatible endpoint:`https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`,默认模型是 `qwen-plus`。
+
+参考文档:[DeepSeek API](https://api-docs.deepseek.com/zh-cn/)、[Xiaomi MiMo 首次调用 API](https://platform.xiaomimimo.com/docs/zh-CN/quick-start/first-api-call) 和 [Alibaba Cloud Model Studio Qwen API](https://www.alibabacloud.com/help/en/model-studio/use-qwen-by-calling-api)。
+
+高级用户仍然可以用 `SIMPLE_LLM_PROVIDERS` 配多个具名 provider,但默认体验就是一个便宜的 OpenAI-compatible endpoint。
+
+## 什么任务适合外包?
+
+适合:短文本摘要、翻译润色、简单分类、小段信息抽取成 JSON、正则草稿、简短命令解释、独立小代码片段。
+
+不适合:架构决策、直接修改仓库、安全敏感代码审查、完整私有代码库上下文推理、密钥或敏感数据、复杂跨文件调试。
+
+## 稳定性控制,但不浪费 token
+
+便宜模型可以干活,但不能当负责人。
+
+`cheap-llm-mcp` 默认会给便宜模型加一条很短的稳定性约束:只输出简洁草案,不做最终决策,不假装已经修改文件,不乱猜缺失事实,遇到不确定任务时用 `UNCERTAIN` 说明。
+
+同时,MCP 工具描述会要求 Codex 或 Claude Code 的主 AI 对结果进行轻量审查:只对照原任务核对是否可用,不额外发大段上下文,也不默认让便宜模型再自我审查一遍。这样既能提升稳定性,也不会把省下来的 token 又花回去。
+
+## Token 节省统计
+
+运行 MCP 工具 `get_token_savings`,可以看到当前 MCP server 会话里实际有多少 token 被低费用模型处理了。
+
+它会统计低费用模型的 prompt、completion、total token,按 provider/model 分组,并给出粗略的 `estimatedPremiumTokensAvoided`。这里默认只统计 token,不硬编码价格表,因为各家模型价格经常变。
+
+## 默认中文约束
+
+默认开启:
+
+```bash
+SIMPLE_LLM_CHINESE_DEFAULT=true
+```
+
+MCP 会自动注入中文优先 system prompt:默认使用简体中文回答,但保留代码、命令、文件路径、API 名称、模型名称、错误信息、配置键和英文技术术语原文。
+
+## 安全边界
+
+这个 MCP 只适合低风险、可自包含的小任务:
+
+- 必须显式确认 `approvedForExternalApi=true`
+- 必须提供 `dataClassification`
+- `sensitive` 数据会直接拒绝
+- 自动扫描常见 API key、token、password、AWS key、private key
+- 默认只允许 HTTPS provider
+- 默认 prompt 上限是 12000 字符
+- 默认请求超时是 60000ms
+- provider 错误会脱敏后返回
+- setup 连通性测试只发送固定 public ping,不发送你的仓库、需求或业务数据
+
+不要把密钥、敏感客户数据、完整私有仓库上下文、安全判断、复杂架构决策、大规模重构交给外部便宜模型。
+
+## 为什么不是直接换小模型?
+
+直接把主模型换小,省了 token,但规划、判断、安全边界、工具编排都会变弱。`cheap-llm-mcp` 的思路是强模型继续当负责人,只把小而明确的任务转交出去。
+
+## 开发
+
+```bash
+npm install
+npm run ci
+```
+
+## License
+
+MIT
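Both READMEs point users at `npx -y cheap-llm-mcp@latest config` when `codex mcp add` is unavailable. As a rough sketch of what the pasted result typically looks like in `~/.codex/config.toml` (assuming Codex's `mcp_servers` table format; the authoritative snippet is whatever the `config` command prints):

```bash
# Illustrative only: prefer pasting the TOML printed by `config`.
cat >> ~/.codex/config.toml <<'EOF'
[mcp_servers.cheap-llm]
command = "npx"
args = ["-y", "cheap-llm-mcp@latest"]
env = { CHEAP_LLM_API_KEY = "sk-...", CHEAP_LLM_BASE_URL = "https://api.deepseek.com", CHEAP_LLM_MODEL = "deepseek-v4-flash" }
EOF
```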