copilot-custom-endpoint 1.3.0 → 1.3.2
- package/docs/example-config.md +203 -0
- package/docs/models/glm.md +365 -0
- package/docs/models/kimi.md +232 -0
- package/docs/models/mimo.md +258 -0
- package/docs/models/minimax.md +247 -0
- package/docs/models/qwen.md +320 -0
- package/docs/pricing.md +116 -0
- package/package.json +5 -1
@@ -0,0 +1,232 @@
# Kimi: VS Code Custom Endpoint Setup Guide

> **TL;DR:** Kimi K2.6 requires the local proxy. The K2 family locks `temperature` (`1` with thinking, `0.6` without) and `top_p: 0.95`, and tool turns need `thinking: { type: "disabled" }` because VS Code does not pass `reasoning_content` back. The proxy rewrites sampling values, suppresses thinking on tool turns, and preserves streaming. Direct VS Code → Moonshot integration is not viable in this environment.

## At a Glance

| Field                  | Value                                         |
| ---------------------- | --------------------------------------------- |
| Mode                   | **Proxy required** (local on `:3457`)         |
| Vision                 | ✅ Yes                                        |
| Tool calling           | ✅ Yes (proxy forces `thinking: disabled`)    |
| Context                | 256K                                          |
| Max output             | 32K                                           |
| Required `requestBody` | `temperature: 1`                              |
| Upstream endpoint      | `https://api.moonshot.ai/v1/chat/completions` |
| Proxy endpoint         | `http://127.0.0.1:3457/v1/chat/completions`   |

## Quick Start

1. **Start the proxy:** `npm run proxy:kimi`
2. **Edit `chatLanguageModels.json`:** add the Kimi block from [Setup](#setup) below.
3. **Set your Moonshot API key** via the Command Palette → **Chat: Manage Language Models**.
4. **Restart VS Code** and pick "Kimi K2.6" in the chat picker.

## Setup

### 1. VS Code configuration

Config file location:

| OS      | Path                                                              |
| ------- | ----------------------------------------------------------------- |
| Windows | `%APPDATA%\Code\User\chatLanguageModels.json`                     |
| macOS   | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
| Linux   | `~/.config/Code/User/chatLanguageModels.json`                     |

```json
{
  "name": "Kimi",
  "vendor": "customendpoint",
  "apiKey": "",
  "apiType": "chat-completions",
  "models": [
    {
      "id": "kimi-k2.6",
      "name": "Kimi K2.6",
      "url": "http://127.0.0.1:3457/v1/chat/completions",
      "requestBody": {
        "temperature": 1
      },
      "toolCalling": true,
      "vision": true,
      "streaming": true,
      "maxInputTokens": 262144,
      "maxOutputTokens": 32768
    }
  ]
}
```

### 2. API key

1. Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS).
2. Run **Chat: Manage Language Models**.
3. Find the **Kimi** group → **Update API Key**.
4. Paste your Moonshot API key.

> After you set the key via the UI, VS Code replaces `"apiKey": ""` with a `${input:chat.lm.secret.<id>}` reference.

### 3. Local proxy

| Setting      | Value                                                 |
| ------------ | ----------------------------------------------------- |
| Script       | `proxy/kimi-proxy.mjs`                                |
| Listen URL   | `http://127.0.0.1:3457/v1/chat/completions`           |
| Health check | `http://127.0.0.1:3457/healthz`                       |
| Start        | `npm run proxy:kimi` (or `node proxy/kimi-proxy.mjs`) |
| Help         | `node proxy/kimi-proxy.mjs --help`                    |

#### Environment variables

All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/config'` automatically).

| Variable                                    | Default                                               | Purpose                                                 |
| ------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------- |
| `KIMI_PROXY_PORT`                           | `3457` (falls back to `PORT`)                         | Local listen port                                       |
| `KIMI_UPSTREAM_URL`                         | `https://api.moonshot.ai/v1/chat/completions`         | Upstream Moonshot endpoint                              |
| `KIMI_PROXY_FORCE_TEMPERATURE`              | `1`                                                   | Temperature for thinking-mode requests                  |
| `KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE` | `0.6`                                                 | Temperature when thinking is disabled (tool requests)   |
| `KIMI_PROXY_FORCE_TOP_P`                    | `0.95`                                                | `top_p` forced into request body                        |
| `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS`    | `1`                                                   | Force `thinking={"type":"disabled"}` when tools present |
| `KIMI_PROXY_LOG`                            | `debug_log/kimi-proxy.ndjson` (relative to repo root) | Redacted NDJSON log path                                |
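
The table above can be read as a small defaults-resolution routine. The sketch below is illustrative only and is not taken from `proxy/kimi-proxy.mjs`: the function name `loadKimiProxyConfig`, the returned property names, the order in which `PORT` is consulted, and the rule that only `"0"` switches off thinking suppression are all assumptions.

```javascript
// Hypothetical helper: resolve proxy settings from an env-like object.
// Variable names and defaults come from the table; everything else is assumed.
function loadKimiProxyConfig(env = {}) {
  // Parse a numeric setting, falling back when it is unset, empty, or not a number.
  const num = (value, fallback) => {
    const parsed = Number(value);
    return value !== undefined && value !== "" && Number.isFinite(parsed)
      ? parsed
      : fallback;
  };

  return {
    // Assumed order: KIMI_PROXY_PORT first, then PORT, then 3457.
    port: num(env.KIMI_PROXY_PORT, num(env.PORT, 3457)),
    upstreamUrl:
      env.KIMI_UPSTREAM_URL || "https://api.moonshot.ai/v1/chat/completions",
    forceTemperature: num(env.KIMI_PROXY_FORCE_TEMPERATURE, 1),
    forceNonThinkingTemperature: num(
      env.KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE,
      0.6
    ),
    forceTopP: num(env.KIMI_PROXY_FORCE_TOP_P, 0.95),
    // Assumption: any value other than "0" keeps the default-on behavior.
    disableThinkingWithTools: env.KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS !== "0",
    logPath: env.KIMI_PROXY_LOG || "debug_log/kimi-proxy.ndjson",
  };
}
```

Calling `loadKimiProxyConfig(process.env)` in Node would return the effective settings for the current shell, which can be handy when comparing a `.env` file against the documented defaults.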

#### Health check response

```json
{
  "ok": true,
  "upstreamUrl": "https://api.moonshot.ai/v1/chat/completions",
  "port": 3457,
  "forcedTemperature": 1,
  "forcedTopP": 0.95
}
```
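
One way to use the health check is to compare the returned payload with the settings you expect before pointing VS Code at the proxy. The sketch below assumes only the field names shown in the sample response; the helper name `checkHealthPayload` is hypothetical and not part of the package.

```javascript
// Hypothetical helper: list mismatches between a /healthz payload and
// the values you expect. An empty array means everything matched.
function checkHealthPayload(payload, expected = {}) {
  const problems = [];
  if (!payload || payload.ok !== true) {
    problems.push("proxy did not report ok: true");
  }
  for (const [key, want] of Object.entries(expected)) {
    const got = payload ? payload[key] : undefined;
    if (got !== want) {
      problems.push(`${key}: expected ${want}, got ${got}`);
    }
  }
  return problems;
}

// Usage sketch (needs a running proxy and a Node version with global fetch):
// const res = await fetch("http://127.0.0.1:3457/healthz");
// console.log(checkHealthPayload(await res.json(), { port: 3457, forcedTopP: 0.95 }));
```

The check is kept separate from the network call so it can be exercised without a running proxy.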

#### Proxy behavior

- Forwards the existing `Authorization` header upstream.
- Rewrites plain-chat requests to `temperature: 1` and `top_p: 0.95`.
- Rewrites tool-enabled requests to `thinking: {"type": "disabled"}`, `temperature: 0.6`, and `top_p: 0.95`.
- Preserves streaming responses.
- Writes redacted request summaries to `debug_log/kimi-proxy.ndjson`.
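
The two rewrite rules in the list above can be sketched as a pure function over the request body. This is a minimal illustration under stated assumptions, not the code in `proxy/kimi-proxy.mjs`: the name `rewriteKimiRequest`, the option names, and the test for "tool-enabled" (a non-empty `tools` array) are hypothetical.

```javascript
// Hypothetical sketch of the documented rewrite rules.
// Returns a new body and leaves the input object untouched.
function rewriteKimiRequest(body, options = {}) {
  const {
    forceTemperature = 1,
    forceNonThinkingTemperature = 0.6,
    forceTopP = 0.95,
    disableThinkingWithTools = true,
  } = options;

  // Assumption: a request counts as tool-enabled when `tools` is a non-empty array.
  const hasTools = Array.isArray(body.tools) && body.tools.length > 0;
  const rewritten = { ...body, top_p: forceTopP };

  if (hasTools && disableThinkingWithTools) {
    // Tool turn: suppress thinking and use the non-thinking temperature.
    rewritten.thinking = { type: "disabled" };
    rewritten.temperature = forceNonThinkingTemperature;
  } else {
    // Plain chat: thinking stays as sent, temperature is forced.
    rewritten.temperature = forceTemperature;
  }
  return rewritten;
}
```

For example, a plain-chat body sent with `temperature: 0.1` and `top_p: 1` would come back with `temperature: 1` and `top_p: 0.95`, while the same body with a `tools` array would come back with `temperature: 0.6` and thinking disabled.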

## Configuration Reference

### Sampling parameters

| Parameter     | Value                         | Notes                           |
| ------------- | ----------------------------- | ------------------------------- |
| `temperature` | `1` (thinking) / `0.6` (tool) | Locked by model; proxy enforces |
| `top_p`       | `0.95`                        | Locked by model; proxy enforces |

### Thinking mode

| Turn type    | Behavior                                                    |
| ------------ | ----------------------------------------------------------- |
| Plain chat   | Thinking enabled, `temperature: 1`                          |
| Tool-enabled | `thinking: { type: "disabled" }` forced, `temperature: 0.6` |

### Capabilities

- Native multimodal: text, image, video input.
- Tool calling with `tool_choice: "auto"`.
- Streaming (SSE).
- `tools` / `tool_calls` only (deprecated `functions` not supported).
- `tool_choice="required"` is **not** supported by the model.

## Troubleshooting

| Symptom                                                | Likely cause               | Fix                                               |
| ------------------------------------------------------ | -------------------------- | ------------------------------------------------- |
| "Connection refused" on chat                           | Proxy not running          | `npm run proxy:kimi`                              |
| `invalid temperature: only 1 is allowed`               | Direct path without proxy  | Use the proxy                                     |
| `invalid top_p: only 0.95 is allowed`                  | Direct path without proxy  | Use the proxy                                     |
| `thinking is enabled but reasoning_content is missing` | Tool turn with thinking on | Verify `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS=1` |
| Model not in VS Code picker                            | Config not reloaded        | Restart VS Code                                   |
| `tool_choice=required` rejected                        | Model limitation           | Use `auto` only                                   |

## Pricing

For the cross-provider comparison, see [docs/pricing.md](../pricing.md). Kimi K2.6 on the **Moonshot direct platform**:

| Model       | Input      | Output (non-thinking) | Output (thinking) |
| ----------- | ---------- | --------------------- | ----------------- |
| `kimi-k2.6` | $0.16 / 1M | $0.95 / 1M            | $4.00 / 1M        |

> Via DashScope, K2.6 is also available at $0.89 / 1M input and $3.71 / 1M output (same model, regional pricing).

---

## Background & Findings

> This appendix preserves the validation narrative for future reference. It is not required to use the model.

### Why Kimi was a reasonable candidate

Kimi documents an OpenAI-compatible Chat Completions API with Bearer-token auth, `model` selection, streaming, and `tools` / `tool_calls`, which made VS Code Custom Endpoint `chat-completions` mode the lowest-risk starting point.

### Why direct integration failed

Direct VS Code requests to Moonshot failed in stages:

1. Initial auth failure while the config still pointed at the older `api.moonshot.cn` endpoint.
2. `invalid temperature: only 1 is allowed for this model`.
3. `invalid top_p: only 0.95 is allowed for this model`.
4. After the first tool-enabled attempt, `thinking is enabled but reasoning_content is missing in assistant tool call message`.

The model-level `requestBody.temperature = 1` override validated locally but was not sufficient in practice. This strongly suggests that VS Code's Custom Endpoint provider ignored or overwrote some model-specific request fields.

### Important caveats from research

- Kimi documents `tools` / `tool_calls`, not deprecated `functions` / `function_call`.
- `tool_choice="required"` is not supported.
- Thinking controls are Kimi-specific through a `thinking` object and `reasoning_content` fields.
- VS Code BYOK/custom endpoint support does not replace GitHub-hosted features such as inline completions or semantic search.
- K2-family models use fixed sampling values, which made request rewriting necessary when VS Code sent incompatible values.

### Validation results

| Check                                                   | Result                                                |
| ------------------------------------------------------- | ----------------------------------------------------- |
| `GET /v1/models` against Moonshot                       | ✅ HTTP 200                                           |
| Non-streaming chat against Moonshot                     | ✅ HTTP 200                                           |
| Streaming chat against Moonshot                         | ✅ HTTP 200                                           |
| Proxy-backed plain chat in VS Code                      | ✅                                                    |
| Proxy-backed streaming in VS Code                       | ✅                                                    |
| Proxy-backed integrated-browser tool use (post-rewrite) | ✅                                                    |
| Direct VS Code → Moonshot (no proxy)                    | ❌ Fails on temperature / top_p / `reasoning_content` |

### Tool-enabled validation details

**Prompt:** "Please open kimi documentation site using vscode integrated browser"

- First run: browser tool invocation succeeded, but the post-tool follow-up failed because thinking remained enabled and VS Code did not preserve `reasoning_content`.
- Workaround: force `thinking: { "type": "disabled" }` plus `temperature: 0.6` on tool-enabled turns.
- Rerun: both the tool turn and the follow-up model turn returned upstream `200` with `text/event-stream`.

### Proxy validation notes

- Redacted proxy logs confirmed `temperature 0.1 -> 1` and `top_p 1 -> 0.95` for plain-chat requests.
- Redacted proxy logs later confirmed `thinking undefined -> disabled` and `temperature 0.1 -> 0.6` for tool-enabled requests.

### Final verdict

- Acceptable for plain chat: **yes** (proxy)
- Acceptable for streaming chat: **yes** (proxy)
- Acceptable for tool-enabled agent use: **yes**, with the local proxy workaround
- Acceptable without a proxy: **no**

## References

- VS Code custom endpoint docs: `https://code.visualstudio.com/docs/copilot/customization/language-models#_add-a-custom-endpoint-model`
- Kimi docs index: `https://platform.kimi.ai/docs/llms.txt`
- Kimi chat completion docs: `https://platform.kimi.ai/docs/api/chat.md`
- Kimi models list: `https://platform.kimi.ai/docs/api/list-models.md`
- Kimi model parameter reference: `https://platform.kimi.ai/docs/api/models-overview.md`
- Kimi tool use docs: `https://platform.kimi.ai/docs/api/tool-use.md`
- Kimi K2.6 quickstart: `https://platform.kimi.ai/docs/guide/kimi-k2-6-quickstart.md`
- Kimi thinking guide: `https://platform.kimi.ai/docs/guide/use-kimi-k2-thinking-model.md`
- Kimi web search guide: `https://platform.kimi.ai/docs/guide/use-web-search.md`
- Kimi coding tools / agent guide: `https://platform.kimi.ai/docs/guide/agent-support.md`
- Kimi K2.6 pricing: `https://platform.kimi.ai/docs/pricing/chat-k26`
@@ -0,0 +1,258 @@
# Xiaomi MiMo: VS Code Custom Endpoint Setup Guide

> **TL;DR:** MiMo works directly; no proxy is needed. Set `thinking: { type: "disabled" }` in `requestBody` for tool-loop stability, because MiMo's API rejects (HTTP 400) any tool turn that is missing historical `reasoning_content`. Disabling thinking eliminates the field, so loops stay stable.

## At a Glance

| Field                  | Value                                            |
| ---------------------- | ------------------------------------------------ |
| Mode                   | **Direct** (no proxy)                            |
| Vision                 | ✅ Yes (`mimo-v2.5` only)                        |
| Tool calling           | ✅ Yes (with `thinking: disabled`)               |
| Context                | 1M (V2.5 Pro / V2.5) / 256K (V2 Flash)           |
| Max output             | 128K (V2.5 Pro) / 32K (V2.5) / 64K (V2 Flash)    |
| Required `requestBody` | `thinking: { type: "disabled" }`                 |
| Endpoint               | `https://api.xiaomimimo.com/v1/chat/completions` |

### Models at a glance

| Model           | Vision | Context | Role                                      |
| --------------- | ------ | ------- | ----------------------------------------- |
| `mimo-v2.5-pro` | ❌     | 1M      | Flagship text-only; best for agentic work |
| `mimo-v2.5`     | ✅     | 1M      | Omnimodal: text + image + video + audio   |
| `mimo-v2-flash` | ❌     | 256K    | Fastest and cheapest; strong reasoning    |

> Legacy `mimo-v2-pro` and `mimo-v2-omni` auto-route to V2.5 (with V2.5 pricing) as of June 1, 2026, and will be fully deprecated by June 30, 2026. Use the V2.5 series.

## Quick Start

1. **Edit `chatLanguageModels.json`:** add the MiMo block(s) from [Setup](#setup) below.
2. **Set your `MIMO_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
3. **Restart VS Code** and pick "MiMo V2.5 Pro", "MiMo V2.5", or "MiMo V2 Flash".

## Setup

### 1. VS Code configuration

Config file location:

| OS      | Path                                                              |
| ------- | ----------------------------------------------------------------- |
| Windows | `%APPDATA%\Code\User\chatLanguageModels.json`                     |
| macOS   | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
| Linux   | `~/.config/Code/User/chatLanguageModels.json`                     |

```json
{
  "name": "MiMo",
  "vendor": "customendpoint",
  "apiKey": "",
  "apiType": "chat-completions",
  "models": [
    {
      "id": "mimo-v2.5-pro",
      "name": "MiMo V2.5 Pro",
      "url": "https://api.xiaomimimo.com/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "streaming": true,
      "maxInputTokens": 1048576,
      "maxOutputTokens": 131072,
      "requestBody": {
        "thinking": { "type": "disabled" },
        "temperature": 1,
        "top_p": 0.95
      }
    },
    {
      "id": "mimo-v2.5",
      "name": "MiMo V2.5",
      "url": "https://api.xiaomimimo.com/v1/chat/completions",
      "toolCalling": true,
      "vision": true,
      "streaming": true,
      "maxInputTokens": 1048576,
      "maxOutputTokens": 32768,
      "requestBody": {
        "thinking": { "type": "disabled" },
        "temperature": 1,
        "top_p": 0.95
      }
    },
    {
      "id": "mimo-v2-flash",
      "name": "MiMo V2 Flash",
      "url": "https://api.xiaomimimo.com/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "streaming": true,
      "maxInputTokens": 262144,
      "maxOutputTokens": 65536,
      "requestBody": {
        "thinking": { "type": "disabled" },
        "temperature": 0.3,
        "top_p": 0.95
      }
    }
  ]
}
```

### 2. API key

1. Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS).
2. Run **Chat: Manage Language Models**.
3. Find the **MiMo** group → **Update API Key**.
4. Paste your MiMo API key.

> After you set the key via the UI, VS Code replaces `"apiKey": ""` with a `${input:chat.lm.secret.<id>}` reference.

### 3. Token Plan (optional)

Token Plan subscribers use different base URLs and `tp-` prefixed keys:

| Protocol  | Base URL                                         |
| --------- | ------------------------------------------------ |
| OpenAI    | `https://token-plan-cn.xiaomimimo.com/v1`        |
| Anthropic | `https://token-plan-cn.xiaomimimo.com/anthropic` |

> Pay-as-you-go keys are `sk-…`; Token Plan keys are `tp-…`. Use the endpoint that matches the prefix of the key you set.
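
The prefix rule above lends itself to a small lookup, which can catch a mismatched key before it surfaces as a 401. This is a sketch under assumptions: `mimoChatUrlForKey` is a hypothetical helper, and appending `/chat/completions` to the Token Plan OpenAI base URL follows the usual OpenAI-compatible convention rather than anything this guide states.

```javascript
// Hypothetical helper: choose the chat-completions URL from the key prefix.
const MIMO_CHAT_URLS = {
  // Pay-as-you-go endpoint, as used in the config block of this guide.
  "sk-": "https://api.xiaomimimo.com/v1/chat/completions",
  // Assumption: Token Plan OpenAI base URL plus the conventional path.
  "tp-": "https://token-plan-cn.xiaomimimo.com/v1/chat/completions",
};

function mimoChatUrlForKey(apiKey) {
  const prefix = String(apiKey).slice(0, 3);
  // Returns null for an unrecognized prefix so the caller can report it.
  return MIMO_CHAT_URLS[prefix] ?? null;
}
```

A `null` result suggests the key was pasted incorrectly or belongs to a different provider.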

## Configuration Reference

### Sampling parameters

| Task type            | `temperature` | `top_p` |
| -------------------- | ------------- | ------- |
| Agentic / tool-use   | `0.3`         | `0.95`  |
| Vibe coding          | `0.3`         | `0.95`  |
| General conversation | `0.8`         | `0.95`  |
| Math reasoning       | `1.0`         | `0.95`  |

> For `mimo-v2.5-pro` and `mimo-v2.5`, MiMo's docs recommend `temperature: 1.0` and `top_p: 0.95` regardless of task. In thinking mode these models also **lock** `temperature` to `1.0`, and any custom value is silently overridden. Since we disable thinking, your `requestBody` value is honored.

MiMo accepts `temperature` in `[0, 1.5]` and `top_p` in `[0.01, 1.0]`.
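
The task table and the accepted ranges can be combined into a helper that produces values for `requestBody`. The sketch below is illustrative: the task keys (`agentic`, `vibe-coding`, `conversation`, `math`) and the function name `mimoSampling` are hypothetical, and clamping out-of-range overrides is a design choice of this example, not documented API behavior.

```javascript
// Recommended values per task type, taken from the table above.
const MIMO_SAMPLING = {
  agentic: { temperature: 0.3, top_p: 0.95 },
  "vibe-coding": { temperature: 0.3, top_p: 0.95 },
  conversation: { temperature: 0.8, top_p: 0.95 },
  math: { temperature: 1.0, top_p: 0.95 },
};

const clamp = (value, min, max) => Math.min(max, Math.max(min, value));

// Hypothetical helper: recommended sampling for a task, with optional
// overrides clamped to the ranges MiMo accepts.
function mimoSampling(task, overrides = {}) {
  const base = MIMO_SAMPLING[task];
  if (!base) throw new Error(`unknown task type: ${task}`);
  const merged = { ...base, ...overrides };
  return {
    temperature: clamp(merged.temperature, 0, 1.5),
    top_p: clamp(merged.top_p, 0.01, 1.0),
  };
}
```

The result can be spread into a model's `requestBody` next to `thinking: { "type": "disabled" }`.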

### Thinking mode

| Model                        | API default `thinking.type` | API default `temperature`  |
| ---------------------------- | --------------------------- | -------------------------- |
| `mimo-v2.5-pro`, `mimo-v2.5` | `enabled`                   | `1.0` (locked in thinking) |
| `mimo-v2-flash`              | `disabled`                  | `0.3` (customizable)       |

When thinking is enabled, responses include a `reasoning_content` field alongside `content` and `tool_calls`.

### Capabilities

- Streaming (SSE, standard OpenAI format).
- Tool calling with `tool_choice: "auto"`.
- Vision (image input via OpenAI `content` array) on `mimo-v2.5` only.
- `tool_choice` other than `"auto"` is **stripped** and treated as `"auto"`.
- `mimo-v2.5` also supports video and audio understanding.

### Rate limits

**100 RPM / 10M TPM** per model per account.
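
If several agent sessions share one account, a client-side guard can help stay under the request limit. The following is a minimal sliding-window sketch covering only the 100 RPM figure; it does not track tokens per minute. The name `createRpmLimiter` is hypothetical and nothing like it is claimed to exist in the package.

```javascript
// Hypothetical sliding-window limiter for requests per minute.
// `tryAcquire` returns true if a request may be sent now, false otherwise.
function createRpmLimiter(limit = 100, windowMs = 60_000) {
  const timestamps = []; // send times (ms) still inside the window

  return function tryAcquire(now = Date.now()) {
    // Drop entries that have aged out of the window.
    while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    if (timestamps.length >= limit) return false;
    timestamps.push(now);
    return true;
  };
}
```

Passing `now` explicitly is optional; it exists so the limiter can be tested without waiting for real time to pass.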

## Troubleshooting

| Symptom                                    | Likely cause                                             | Fix                                                   |
| ------------------------------------------ | -------------------------------------------------------- | ----------------------------------------------------- |
| HTTP 400 on the second turn of a tool loop | `reasoning_content` missing in history (thinking on)     | Add `thinking: { type: "disabled" }` to `requestBody` |
| Vision request returns an error            | Used `mimo-v2.5-pro` or `mimo-v2-flash` (text-only)      | Use `mimo-v2.5` for vision                            |
| Custom `tool_choice` ignored               | MiMo only honors `"auto"`                                | Stick to `auto`                                       |
| 401 Unauthorized                           | Wrong key, or Token Plan URL used with pay-as-you-go key | Match key prefix (`sk-` vs `tp-`) to the endpoint     |
| 429 rate-limited                           | Concurrent sessions exceeded 100 RPM / 10M TPM           | Reduce concurrent agent sessions                      |

## Pricing

For the cross-provider comparison, see [docs/pricing.md](../pricing.md). Overseas (international) pay-as-you-go rates:

| Model           | Input (Cache Hit) | Input (Cache Miss) | Output     |
| --------------- | ----------------- | ------------------ | ---------- |
| `mimo-v2.5-pro` | $0.20 / 1M        | $1.00 / 1M         | $3.00 / 1M |
| `mimo-v2.5`     | $0.08 / 1M        | $0.40 / 1M         | $2.00 / 1M |
| `mimo-v2-flash` | $0.01 / 1M        | $0.10 / 1M         | $0.30 / 1M |

> Cache writing is currently free of charge (limited-time offer). MiMo also offers a Token Plan subscription with discounted rates and a free cache-writing promotion.
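
As a worked example of the rate table, a rough per-request cost can be estimated from token counts. The sketch assumes the pay-as-you-go prices listed above and treats cache writes as free, per the note; the function and field names (`estimateMimoCostUsd`, `cacheHitTokens`, and so on) are hypothetical.

```javascript
// Prices in USD per 1M tokens, copied from the table above.
const MIMO_PRICES_USD_PER_MILLION = {
  "mimo-v2.5-pro": { cacheHit: 0.2, cacheMiss: 1.0, output: 3.0 },
  "mimo-v2.5": { cacheHit: 0.08, cacheMiss: 0.4, output: 2.0 },
  "mimo-v2-flash": { cacheHit: 0.01, cacheMiss: 0.1, output: 0.3 },
};

// Hypothetical helper: estimate the cost of one request in USD.
function estimateMimoCostUsd(model, usage = {}) {
  const price = MIMO_PRICES_USD_PER_MILLION[model];
  if (!price) throw new Error(`no price listed for model: ${model}`);
  const { cacheHitTokens = 0, cacheMissTokens = 0, outputTokens = 0 } = usage;
  const total =
    cacheHitTokens * price.cacheHit +
    cacheMissTokens * price.cacheMiss +
    outputTokens * price.output;
  return total / 1_000_000;
}
```

For instance, 1M uncached input tokens plus 1M output tokens on `mimo-v2.5-pro` comes to $1.00 + $3.00 = $4.00 under these rates.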

---

## Background & Findings

> This appendix preserves the validation narrative for future reference. It is not required to use the model.

### The critical `reasoning_content` constraint

When thinking mode is enabled and the conversation history contains tool calls, the `reasoning_content` field **must** be fully passed back in every subsequent assistant message. Otherwise, the API returns HTTP 400.

This is the same class of problem as Qwen's `reasoning_content` issue, but **stricter**: MiMo's API actively rejects requests with missing historical `reasoning_content`, rather than silently degrading.

**Implication for VS Code Copilot:** VS Code's agent mode is unlikely to preserve `reasoning_content` across multi-turn tool loops. Therefore:

- **Thinking enabled + tool calling = broken** (400 errors after the first tool round-trip).
- **Thinking disabled + tool calling = works** (no `reasoning_content` to preserve).
- **Thinking enabled + plain chat = works** (no tool calls in history).
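
The three cases above can be approximated with a check over the outgoing message history. This is a conservative sketch, not MiMo's actual server-side validation: it flags every assistant message without `reasoning_content` once any assistant message in the history carries `tool_calls`. The function name is hypothetical, and the check only matters for requests sent with thinking enabled.

```javascript
// Hypothetical pre-flight check: indices of assistant messages that lack
// `reasoning_content` in a history that contains tool calls.
// An empty result means the history should not trip the constraint.
function findAssistantTurnsMissingReasoning(messages) {
  const isAssistant = (m) => m.role === "assistant";
  const historyHasToolCalls = messages.some(
    (m) => isAssistant(m) && Array.isArray(m.tool_calls) && m.tool_calls.length > 0
  );
  // Plain chat: no tool calls in history, so nothing has to be passed back.
  if (!historyHasToolCalls) return [];

  const missing = [];
  messages.forEach((m, index) => {
    if (isAssistant(m) && !m.reasoning_content) missing.push(index);
  });
  return missing;
}
```

A non-empty result on a thinking-enabled request is a hint that the API is likely to answer with HTTP 400, which matches the "thinking enabled + tool calling" case.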

### Why a static `thinking: disabled` is enough

VS Code's agent mode is the only flow that triggers tool loops, and the static `requestBody` setting disables thinking for every request, including those turns. Plain chat would work with thinking enabled, because without tool calls there is no `reasoning_content` that must be passed back.

A dynamic proxy (suppress thinking only when tools are present, the same pattern as `proxy/qwen-proxy.mjs`) would let plain chat show reasoning, but it is **not implemented** because:

- The cost of losing visible reasoning in plain chat is low for most users.
- Static suppression is one less moving part to maintain.

### Benchmark highlights (from official MiMo V2.5 announcement)

| Model           | SWE-Bench Verified | SWE-Bench Pro | Terminal-Bench 2.0 | AIME 2025 |
| --------------- | ------------------ | ------------- | ------------------ | --------- |
| `mimo-v2.5-pro` | n/a                | 57.2%         | 68.4%              | n/a       |
| `mimo-v2.5`     | n/a                | 56.1%         | n/a                | n/a       |
| `mimo-v2-flash` | 73.4%              | n/a           | n/a                | 94.1%     |

> `mimo-v2.5` additionally scores 87.7% on Video-MME and 62.3% on Claw-Eval Text.

### Validation results

| #   | Test                                      | Model           | Result                                                                             |
| --- | ----------------------------------------- | --------------- | ---------------------------------------------------------------------------------- |
| 1   | Add provider to `chatLanguageModels.json` | All             | ✅                                                                                 |
| 2   | Plain chat in VS Code                     | `mimo-v2.5-pro` | ✅ Model self-identified as MiMo 1T-param                                          |
| 3   | Agent mode (tool calling)                 | `mimo-v2.5-pro` | ✅ File reads, browser automation, terminal, image viewing all worked              |
| 4   | Vision                                    | `mimo-v2.5`     | ✅ Analyzed an attached screenshot (Facebook post, browser tabs, sidebar) in detail |

External API checks (curl):

| Check              | Model           | Result                                               |
| ------------------ | --------------- | ---------------------------------------------------- |
| Non-streaming chat | `mimo-v2-flash` | ✅                                                   |
| Streaming (SSE)    | `mimo-v2-flash` | ✅                                                   |
| Non-streaming chat | `mimo-v2.5-pro` | ✅                                                   |
| Tool calling       | `mimo-v2-flash` | ✅ `finish_reason: "tool_calls"` with valid JSON args |

### Known risks

| Risk                                  | Detail                                                             | Mitigation                                              |
| ------------------------------------- | ------------------------------------------------------------------ | ------------------------------------------------------- |
| `reasoning_content` 400 errors        | If thinking is accidentally enabled in tool loops, API returns 400 | Always set `thinking.type: "disabled"` in `requestBody` |
| `tool_choice` only supports `"auto"`  | Non-`auto` values are stripped                                     | Should not affect VS Code, which uses `auto`            |
| Auth header format                    | Both `api-key:` and `Authorization: Bearer` work                   | VS Code sends `Authorization: Bearer`; works directly   |
| `temperature` locked in thinking mode | V2.5 Pro / V2.5 force `temperature: 1.0` when thinking is on       | Not an issue when thinking is disabled                  |
| 1M context window                     | VS Code may not send enough tokens to benefit                      | Set conservatively; adjust after testing                |

## References

- API Platform: `https://platform.xiaomimimo.com/`
- OpenAI API Reference: `https://platform.xiaomimimo.com/docs/en-US/api/chat/openai-api`
- First API Call Guide: `https://platform.xiaomimimo.com/docs/en-US/quick-start/first-api-call`
- Model & Rate Limits: `https://platform.xiaomimimo.com/docs/en-US/quick-start/model`
- Model Hyperparameters: `https://platform.xiaomimimo.com/docs/en-US/quick-start/model-hyperparameters`
- Pricing: `https://platform.xiaomimimo.com/docs/en-US/pricing`
- `reasoning_content` Guide: `https://platform.xiaomimimo.com/docs/en-US/usage-guide/passing-back-reasoning_content`
- AI Tools Integration: `https://platform.xiaomimimo.com/docs/en-US/integration/claude-code`
- HuggingFace (MiMo-V2.5): `https://huggingface.co/XiaomiMiMo/MiMo-V2.5`
- HuggingFace (MiMo-V2.5-Pro): `https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro`
- HuggingFace (MiMo-V2-Flash): `https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash`
- MiMo V2.5 Blog: `https://mimo.xiaomi.com/mimo-v2-5`
- AI Studio (playground): `https://aistudio.xiaomimimo.com/`