copilot-custom-endpoint 1.3.0 → 1.3.2

@@ -0,0 +1,232 @@
1
+ # Kimi — VS Code Custom Endpoint Setup Guide
2
+
3
+ > **TL;DR:** Kimi K2.6 requires the local proxy. The K2 family locks `top_p: 0.95` and locks `temperature` to `1` with thinking enabled (`0.6` with thinking disabled). Tool turns also need `thinking: { type: "disabled" }`, because VS Code does not pass `reasoning_content` back. The proxy rewrites the sampling values, disables thinking on tool turns, and preserves streaming. Direct VS Code → Moonshot integration is not viable in this environment.
4
+
5
+ ## At a Glance
6
+
7
+ | Field | Value |
8
+ | ---------------------- | --------------------------------------------- |
9
+ | Mode | **Proxy required** (local on `:3457`) |
10
+ | Vision | ✅ Yes |
11
+ | Tool calling | ✅ Yes (proxy forces `thinking: disabled`) |
12
+ | Context | 256K |
13
+ | Max output | 32K |
14
+ | Required `requestBody` | `temperature: 1` |
15
+ | Upstream endpoint | `https://api.moonshot.ai/v1/chat/completions` |
16
+ | Proxy endpoint | `http://127.0.0.1:3457/v1/chat/completions` |
17
+
18
+ ## Quick Start
19
+
20
+ 1. **Start the proxy:** `npm run proxy:kimi`
21
+ 2. **Edit `chatLanguageModels.json`** — add the Kimi block from [Setup](#setup) below.
22
+ 3. **Set your Moonshot API key** via the Command Palette → **Chat: Manage Language Models**.
23
+ 4. **Restart VS Code** and pick "Kimi K2.6" in the chat picker.
24
+
25
+ ## Setup
26
+
27
+ ### 1. VS Code configuration
28
+
29
+ Config file location:
30
+
31
+ | OS | Path |
32
+ | ------- | ----------------------------------------------------------------- |
33
+ | Windows | `%APPDATA%\Code\User\chatLanguageModels.json` |
34
+ | macOS | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
35
+ | Linux | `~/.config/Code/User/chatLanguageModels.json` |
36
+
37
+ ```json
38
+ {
39
+ "name": "Kimi",
40
+ "vendor": "customendpoint",
41
+ "apiKey": "",
42
+ "apiType": "chat-completions",
43
+ "models": [
44
+ {
45
+ "id": "kimi-k2.6",
46
+ "name": "Kimi K2.6",
47
+ "url": "http://127.0.0.1:3457/v1/chat/completions",
48
+ "requestBody": {
49
+ "temperature": 1
50
+ },
51
+ "toolCalling": true,
52
+ "vision": true,
53
+ "streaming": true,
54
+ "maxInputTokens": 262144,
55
+ "maxOutputTokens": 32768
56
+ }
57
+ ]
58
+ }
59
+ ```
60
+
61
+ ### 2. API key
62
+
63
+ 1. Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS).
64
+ 2. Run **Chat: Manage Language Models**.
65
+ 3. Find the **Kimi** group → **Update API Key**.
66
+ 4. Paste your Moonshot API key.
67
+
68
+ > After you set the key through the UI, VS Code replaces `"apiKey": ""` with a `${input:chat.lm.secret.<id>}` reference.
69
+
70
+ ### 3. Local proxy
71
+
72
+ | Setting | Value |
73
+ | ------------ | ----------------------------------------------------- |
74
+ | Script | `proxy/kimi-proxy.mjs` |
75
+ | Listen URL | `http://127.0.0.1:3457/v1/chat/completions` |
76
+ | Health check | `http://127.0.0.1:3457/healthz` |
77
+ | Start | `npm run proxy:kimi` (or `node proxy/kimi-proxy.mjs`) |
78
+ | Help | `node proxy/kimi-proxy.mjs --help` |
79
+
80
+ #### Environment variables
81
+
82
+ All of these variables can be set in a `.env` file at the repo root (the proxy scripts run `import 'dotenv/config'` automatically).
83
+
84
+ | Variable | Default | Purpose |
85
+ | ------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------- |
86
+ | `KIMI_PROXY_PORT` | `3457` (falls back to `PORT`) | Local listen port |
87
+ | `KIMI_UPSTREAM_URL` | `https://api.moonshot.ai/v1/chat/completions` | Upstream Moonshot endpoint |
88
+ | `KIMI_PROXY_FORCE_TEMPERATURE` | `1` | Temperature for thinking-mode requests |
89
+ | `KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE` | `0.6` | Temperature when thinking is disabled (tool requests) |
90
+ | `KIMI_PROXY_FORCE_TOP_P` | `0.95` | `top_p` forced into request body |
91
+ | `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS` | `1` | Force `thinking={"type":"disabled"}` when tools present |
92
+ | `KIMI_PROXY_LOG` | `debug_log/kimi-proxy.ndjson` (relative to repo root) | Redacted NDJSON log path |
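The defaults in the table above can be pictured as a small resolver. This is a minimal sketch, not the code in `proxy/kimi-proxy.mjs`: the function name `loadKimiProxyConfig` and the returned field names are hypothetical, the fallback order `KIMI_PROXY_PORT` → `PORT` → `3457` is one reading of the table, and treating `"0"` as "off" for `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS` is an assumption.

```javascript
// Hypothetical helper: resolves proxy settings from an env-like object,
// falling back to the defaults listed in the table above.
function loadKimiProxyConfig(env = process.env) {
  // Parse a numeric variable; empty or non-numeric values use the fallback.
  const num = (value, fallback) => {
    const parsed = Number(value);
    return value !== undefined && value !== "" && Number.isFinite(parsed) ? parsed : fallback;
  };
  return {
    // Assumed order: KIMI_PROXY_PORT, then PORT, then 3457.
    port: num(env.KIMI_PROXY_PORT ?? env.PORT, 3457),
    upstreamUrl: env.KIMI_UPSTREAM_URL || "https://api.moonshot.ai/v1/chat/completions",
    forceTemperature: num(env.KIMI_PROXY_FORCE_TEMPERATURE, 1),
    forceNonThinkingTemperature: num(env.KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE, 0.6),
    forceTopP: num(env.KIMI_PROXY_FORCE_TOP_P, 0.95),
    // Assumption: any value other than "0" keeps the behavior on.
    disableThinkingWithTools: (env.KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS ?? "1") !== "0",
    logPath: env.KIMI_PROXY_LOG || "debug_log/kimi-proxy.ndjson",
  };
}

console.log(loadKimiProxyConfig({ PORT: "8080" }).port); // 8080
```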
93
+
94
+ #### Health check response
95
+
96
+ ```json
97
+ {
98
+ "ok": true,
99
+ "upstreamUrl": "https://api.moonshot.ai/v1/chat/completions",
100
+ "port": 3457,
101
+ "forcedTemperature": 1,
102
+ "forcedTopP": 0.95
103
+ }
104
+ ```
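To confirm that a running proxy reports the values you expect, the payload above can be compared field by field. This is a sketch only: `healthProblems` and `checkKimiProxy` are hypothetical names, and the expected values are the defaults from this guide. It relies on the global `fetch` available in Node 18 and later.

```javascript
// Hypothetical helper: compares a /healthz payload with the values you expect
// and returns a list of human-readable mismatches (empty when all match).
function healthProblems(payload, expected = { port: 3457, forcedTemperature: 1, forcedTopP: 0.95 }) {
  const problems = [];
  if (!payload || payload.ok !== true) problems.push("ok is not true");
  for (const [key, want] of Object.entries(expected)) {
    const got = payload ? payload[key] : undefined;
    if (got !== want) problems.push(`${key}: expected ${want}, got ${got}`);
  }
  return problems;
}

// Fetches the health endpoint and returns any mismatches.
async function checkKimiProxy(url = "http://127.0.0.1:3457/healthz") {
  const response = await fetch(url);
  if (!response.ok) return [`HTTP ${response.status}`];
  return healthProblems(await response.json());
}
```

Calling `checkKimiProxy().then(console.log)` while the proxy is running should print an empty array if the defaults are in effect.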
105
+
106
+ #### Proxy behavior
107
+
108
+ - Forwards the existing `Authorization` header upstream.
109
+ - Rewrites plain-chat requests to `temperature: 1` and `top_p: 0.95`.
110
+ - Rewrites tool-enabled requests to `thinking: {"type": "disabled"}`, `temperature: 0.6`, and `top_p: 0.95`.
111
+ - Preserves streaming responses.
112
+ - Writes redacted request summaries to `debug_log/kimi-proxy.ndjson`.
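The rewrite rules listed above can be sketched as a pure function over the request body. This illustrates the documented behavior and is not the code in `proxy/kimi-proxy.mjs`; the names `rewriteKimiRequest` and `DEFAULTS`, and the tool name `open_browser`, are hypothetical.

```javascript
// Hypothetical defaults mirroring the documented proxy settings.
const DEFAULTS = {
  thinkingTemperature: 1,
  nonThinkingTemperature: 0.6,
  topP: 0.95,
  disableThinkingWithTools: true,
};

// Returns a copy of the chat-completions body with the locked sampling values
// applied; tool-enabled requests also get thinking disabled.
function rewriteKimiRequest(body, options = DEFAULTS) {
  const hasTools = Array.isArray(body.tools) && body.tools.length > 0;
  const rewritten = { ...body, top_p: options.topP };
  if (hasTools && options.disableThinkingWithTools) {
    rewritten.thinking = { type: "disabled" };
    rewritten.temperature = options.nonThinkingTemperature;
  } else {
    rewritten.temperature = options.thinkingTemperature;
  }
  return rewritten;
}

const plain = rewriteKimiRequest({ model: "kimi-k2.6", temperature: 0.1, top_p: 1, messages: [] });
const withTools = rewriteKimiRequest({
  model: "kimi-k2.6",
  temperature: 0.1,
  tools: [{ type: "function", function: { name: "open_browser" } }],
  messages: [],
});
console.log(plain.temperature, plain.top_p); // 1 0.95
console.log(withTools.thinking.type, withTools.temperature); // disabled 0.6
```

Returning a copy instead of mutating the input keeps the original body available for a before/after log entry.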
113
+
114
+ ## Configuration Reference
115
+
116
+ ### Sampling parameters
117
+
118
+ | Parameter | Value | Notes |
119
+ | ------------- | ----------------------------- | -------------------------------- |
120
+ | `temperature` | `1` (thinking) / `0.6` (tool) | Locked by model — proxy enforces |
121
+ | `top_p` | `0.95` | Locked by model — proxy enforces |
122
+
123
+ ### Thinking mode
124
+
125
+ | Turn type | Behavior |
126
+ | ------------ | ----------------------------------------------------------- |
127
+ | Plain chat | Thinking enabled, `temperature: 1` |
128
+ | Tool-enabled | `thinking: { type: "disabled" }` forced, `temperature: 0.6` |
129
+
130
+ ### Capabilities
131
+
132
+ - Native multimodal: text, image, video input.
133
+ - Tool calling with `tool_choice: "auto"`.
134
+ - Streaming (SSE).
135
+ - `tools` / `tool_calls` only (deprecated `functions` not supported).
136
+ - `tool_choice="required"` is **not** supported by the model.
137
+
138
+ ## Troubleshooting
139
+
140
+ | Symptom | Likely cause | Fix |
141
+ | ------------------------------------------------------ | -------------------------- | ------------------------------------------------- |
142
+ | "Connection refused" on chat | Proxy not running | `npm run proxy:kimi` |
143
+ | `invalid temperature: only 1 is allowed` | Direct path without proxy | Use the proxy |
144
+ | `invalid top_p: only 0.95 is allowed` | Direct path without proxy | Use the proxy |
145
+ | `thinking is enabled but reasoning_content is missing` | Tool turn with thinking on | Verify `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS=1` |
146
+ | Model not in VS Code picker | Config not reloaded | Restart VS Code |
147
+ | `tool_choice=required` rejected | Model limitation | Use `auto` only |
148
+
149
+ ## Pricing
150
+
151
+ For the cross-provider comparison, see [docs/pricing.md](../pricing.md). Kimi K2.6 on the **Moonshot direct platform**:
152
+
153
+ | Model | Input | Output (non-thinking) | Output (thinking) |
154
+ | ----------- | ---------- | --------------------- | ----------------- |
155
+ | `kimi-k2.6` | $0.16 / 1M | $0.95 / 1M | $4.00 / 1M |
156
+
157
+ > Via DashScope, K2.6 is also available at $0.89 / 1M input and $3.71 / 1M output (same model, regional pricing).
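As a worked example of the Moonshot direct rates above, the sketch below estimates the cost of a single request. The function name `estimateKimiCostUsd` and the token counts are illustrative assumptions, not measurements.

```javascript
// Moonshot direct rates from the table above, in USD per 1M tokens.
const KIMI_K26_USD_PER_MILLION = { input: 0.16, outputNonThinking: 0.95, outputThinking: 4.0 };

// Hypothetical helper: estimated cost of one request in USD.
function estimateKimiCostUsd({ inputTokens, outputTokens, thinking }) {
  const rates = KIMI_K26_USD_PER_MILLION;
  const outputRate = thinking ? rates.outputThinking : rates.outputNonThinking;
  return (inputTokens * rates.input + outputTokens * outputRate) / 1_000_000;
}

// Illustrative request: 200,000 input tokens and 10,000 output tokens.
const toolTurn = estimateKimiCostUsd({ inputTokens: 200_000, outputTokens: 10_000, thinking: false });
const thinkingTurn = estimateKimiCostUsd({ inputTokens: 200_000, outputTokens: 10_000, thinking: true });
console.log(toolTurn.toFixed(4)); // 0.0415
console.log(thinkingTurn.toFixed(4)); // 0.0720
```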
158
+
159
+ ---
160
+
161
+ ## Background & Findings
162
+
163
+ > This appendix preserves the validation narrative for future reference. It is not required to use the model.
164
+
165
+ ### Why Kimi was a reasonable candidate
166
+
167
+ Kimi documents an OpenAI-compatible Chat Completions API with Bearer-token auth, `model` selection, streaming, and `tools` / `tool_calls` — making VS Code Custom Endpoint `chat-completions` mode the lowest-risk starting point.
168
+
169
+ ### Why direct integration failed
170
+
171
+ Direct VS Code requests to Moonshot failed in stages:
172
+
173
+ 1. Initial auth failure while the config still pointed at the older `api.moonshot.cn` endpoint.
174
+ 2. `invalid temperature: only 1 is allowed for this model`.
175
+ 3. `invalid top_p: only 0.95 is allowed for this model`.
176
+ 4. After the first tool-enabled attempt, `thinking is enabled but reasoning_content is missing in assistant tool call message`.
177
+
178
+ The model-level `requestBody.temperature = 1` override validated locally but was not sufficient in practice, which strongly suggests that VS Code's Custom Endpoint provider ignored or overwrote some model-specific request fields.
179
+
180
+ ### Important caveats from research
181
+
182
+ - Kimi documents `tools` / `tool_calls`, not deprecated `functions` / `function_call`.
183
+ - `tool_choice="required"` is not supported.
184
+ - Thinking controls are Kimi-specific through a `thinking` object and `reasoning_content` fields.
185
+ - VS Code BYOK/custom endpoint support does not replace GitHub-hosted features such as inline completions or semantic search.
186
+ - K2-family models use fixed sampling values, which made request rewriting necessary when VS Code sent incompatible values.
187
+
188
+ ### Validation results
189
+
190
+ | Check | Result |
191
+ | ------------------------------------------------------- | ------------------------------------------------------- |
192
+ | `GET /v1/models` against Moonshot | ✅ HTTP 200 |
193
+ | Non-streaming chat against Moonshot | ✅ HTTP 200 |
194
+ | Streaming chat against Moonshot | ✅ HTTP 200 |
195
+ | Proxy-backed plain chat in VS Code | ✅ |
196
+ | Proxy-backed streaming in VS Code | ✅ |
197
+ | Proxy-backed integrated-browser tool use (post-rewrite) | ✅ |
198
+ | Direct VS Code → Moonshot (no proxy) | ❌ — fails on temperature / top_p / `reasoning_content` |
199
+
200
+ ### Tool-enabled validation details
201
+
202
+ **Prompt:** "Please open kimi documentation site using vscode integrated browser"
203
+
204
+ - First run: browser tool invocation succeeded, but the post-tool follow-up failed because thinking remained enabled and VS Code did not preserve `reasoning_content`.
205
+ - Workaround: force `thinking: { "type": "disabled" }` plus `temperature: 0.6` on tool-enabled turns.
206
+ - Rerun: both the tool turn and the follow-up model turn returned upstream `200` with `text/event-stream`.
207
+
208
+ ### Proxy validation notes
209
+
210
+ - Redacted proxy logs confirmed `temperature 0.1 -> 1` and `top_p 1 -> 0.95` for plain-chat requests.
211
+ - Redacted proxy logs later confirmed `thinking undefined -> disabled` and `temperature 0.1 -> 0.6` for tool-enabled requests.
212
+
213
+ ### Final verdict
214
+
215
+ - Acceptable for plain chat: **yes** (proxy)
216
+ - Acceptable for streaming chat: **yes** (proxy)
217
+ - Acceptable for tool-enabled agent use: **yes**, with the local proxy workaround
218
+ - Acceptable without a proxy: **no**
219
+
220
+ ## References
221
+
222
+ - VS Code custom endpoint docs: `https://code.visualstudio.com/docs/copilot/customization/language-models#_add-a-custom-endpoint-model`
223
+ - Kimi docs index: `https://platform.kimi.ai/docs/llms.txt`
224
+ - Kimi chat completion docs: `https://platform.kimi.ai/docs/api/chat.md`
225
+ - Kimi models list: `https://platform.kimi.ai/docs/api/list-models.md`
226
+ - Kimi model parameter reference: `https://platform.kimi.ai/docs/api/models-overview.md`
227
+ - Kimi tool use docs: `https://platform.kimi.ai/docs/api/tool-use.md`
228
+ - Kimi K2.6 quickstart: `https://platform.kimi.ai/docs/guide/kimi-k2-6-quickstart.md`
229
+ - Kimi thinking guide: `https://platform.kimi.ai/docs/guide/use-kimi-k2-thinking-model.md`
230
+ - Kimi web search guide: `https://platform.kimi.ai/docs/guide/use-web-search.md`
231
+ - Kimi coding tools / agent guide: `https://platform.kimi.ai/docs/guide/agent-support.md`
232
+ - Kimi K2.6 pricing: `https://platform.kimi.ai/docs/pricing/chat-k26`
@@ -0,0 +1,258 @@
1
+ # Xiaomi MiMo — VS Code Custom Endpoint Setup Guide
2
+
3
+ > **TL;DR:** MiMo works directly — no proxy needed. Set `thinking: { type: "disabled" }` in `requestBody` for tool-loop stability, because MiMo's API rejects (HTTP 400) any tool turn that is missing historical `reasoning_content`. Disabling thinking eliminates the field, so loops stay stable.
4
+
5
+ ## At a Glance
6
+
7
+ | Field | Value |
8
+ | ---------------------- | ------------------------------------------------ |
9
+ | Mode | **Direct** (no proxy) |
10
+ | Vision | ✅ Yes (`mimo-v2.5` only) |
11
+ | Tool calling | ✅ Yes (with `thinking: disabled`) |
12
+ | Context | 1M (V2.5 Pro / V2.5) / 256K (V2 Flash) |
13
+ | Max output | 128K (V2.5 Pro) / 32K (V2.5) / 64K (V2 Flash) |
14
+ | Required `requestBody` | `thinking: { type: "disabled" }` |
15
+ | Endpoint | `https://api.xiaomimimo.com/v1/chat/completions` |
16
+
17
+ ### Models at a glance
18
+
19
+ | Model | Vision | Context | Role |
20
+ | --------------- | ------ | ------- | ------------------------------------------ |
21
+ | `mimo-v2.5-pro` | ❌ | 1M | Flagship text-only — best for agentic work |
22
+ | `mimo-v2.5` | ✅ | 1M | Omnimodal — text + image + video + audio |
23
+ | `mimo-v2-flash` | ❌ | 256K | Fastest and cheapest — strong reasoning |
24
+
25
+ > Legacy `mimo-v2-pro` and `mimo-v2-omni` auto-route to V2.5 (with V2.5 pricing) as of June 1, 2026, and will be fully deprecated by June 30, 2026. Use the V2.5 series.
26
+
27
+ ## Quick Start
28
+
29
+ 1. **Edit `chatLanguageModels.json`** — add the MiMo block(s) from [Setup](#setup) below.
30
+ 2. **Set your `MIMO_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
31
+ 3. **Restart VS Code** and pick "MiMo V2.5 Pro", "MiMo V2.5", or "MiMo V2 Flash".
32
+
33
+ ## Setup
34
+
35
+ ### 1. VS Code configuration
36
+
37
+ Config file location:
38
+
39
+ | OS | Path |
40
+ | ------- | ----------------------------------------------------------------- |
41
+ | Windows | `%APPDATA%\Code\User\chatLanguageModels.json` |
42
+ | macOS | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
43
+ | Linux | `~/.config/Code/User/chatLanguageModels.json` |
44
+
45
+ ```json
46
+ {
47
+ "name": "MiMo",
48
+ "vendor": "customendpoint",
49
+ "apiKey": "",
50
+ "apiType": "chat-completions",
51
+ "models": [
52
+ {
53
+ "id": "mimo-v2.5-pro",
54
+ "name": "MiMo V2.5 Pro",
55
+ "url": "https://api.xiaomimimo.com/v1/chat/completions",
56
+ "toolCalling": true,
57
+ "vision": false,
58
+ "streaming": true,
59
+ "maxInputTokens": 1048576,
60
+ "maxOutputTokens": 131072,
61
+ "requestBody": {
62
+ "thinking": { "type": "disabled" },
63
+ "temperature": 1,
64
+ "top_p": 0.95
65
+ }
66
+ },
67
+ {
68
+ "id": "mimo-v2.5",
69
+ "name": "MiMo V2.5",
70
+ "url": "https://api.xiaomimimo.com/v1/chat/completions",
71
+ "toolCalling": true,
72
+ "vision": true,
73
+ "streaming": true,
74
+ "maxInputTokens": 1048576,
75
+ "maxOutputTokens": 32768,
76
+ "requestBody": {
77
+ "thinking": { "type": "disabled" },
78
+ "temperature": 1,
79
+ "top_p": 0.95
80
+ }
81
+ },
82
+ {
83
+ "id": "mimo-v2-flash",
84
+ "name": "MiMo V2 Flash",
85
+ "url": "https://api.xiaomimimo.com/v1/chat/completions",
86
+ "toolCalling": true,
87
+ "vision": false,
88
+ "streaming": true,
89
+ "maxInputTokens": 262144,
90
+ "maxOutputTokens": 65536,
91
+ "requestBody": {
92
+ "thinking": { "type": "disabled" },
93
+ "temperature": 0.3,
94
+ "top_p": 0.95
95
+ }
96
+ }
97
+ ]
98
+ }
99
+ ```
100
+
101
+ ### 2. API key
102
+
103
+ 1. Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS).
104
+ 2. Run **Chat: Manage Language Models**.
105
+ 3. Find the **MiMo** group → **Update API Key**.
106
+ 4. Paste your MiMo API key.
107
+
108
+ > After you set the key through the UI, VS Code replaces `"apiKey": ""` with a `${input:chat.lm.secret.<id>}` reference.
109
+
110
+ ### 3. Token Plan (optional)
111
+
112
+ Token Plan subscribers use different base URLs and `tp-` prefixed keys:
113
+
114
+ | Protocol | Base URL |
115
+ | --------- | ------------------------------------------------ |
116
+ | OpenAI | `https://token-plan-cn.xiaomimimo.com/v1` |
117
+ | Anthropic | `https://token-plan-cn.xiaomimimo.com/anthropic` |
118
+
119
+ > Pay-as-you-go keys are `sk-…`; Token Plan keys are `tp-…`. The endpoint to use depends on which key you set.
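The key-prefix rule can be expressed as a small lookup, shown here as a sketch. The function name `mimoChatUrlForKey` is hypothetical, and appending `/chat/completions` to the Token Plan base URL is an assumption based on the usual OpenAI-compatible layout. Verify the full path against the Token Plan documentation before relying on it.

```javascript
// Chat-completions URL per key prefix.
const MIMO_CHAT_URLS = {
  "sk-": "https://api.xiaomimimo.com/v1/chat/completions",
  // Assumption: the Token Plan base URL takes the usual /chat/completions suffix.
  "tp-": "https://token-plan-cn.xiaomimimo.com/v1/chat/completions",
};

// Hypothetical helper: picks the endpoint that matches the key type.
function mimoChatUrlForKey(apiKey) {
  const prefix = Object.keys(MIMO_CHAT_URLS).find((p) => apiKey.startsWith(p));
  if (!prefix) throw new Error("Unrecognized MiMo key prefix (expected sk- or tp-)");
  return MIMO_CHAT_URLS[prefix];
}

console.log(mimoChatUrlForKey("tp-example")); // https://token-plan-cn.xiaomimimo.com/v1/chat/completions
```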
120
+
121
+ ## Configuration Reference
122
+
123
+ ### Sampling parameters
124
+
125
+ | Task type | `temperature` | `top_p` |
126
+ | -------------------- | ------------- | ------- |
127
+ | Agentic / tool-use | `0.3` | `0.95` |
128
+ | Vibe coding | `0.3` | `0.95` |
129
+ | General conversation | `0.8` | `0.95` |
130
+ | Math reasoning | `1.0` | `0.95` |
131
+
132
+ > For `mimo-v2.5-pro` and `mimo-v2.5`, MiMo's docs recommend `temperature: 1.0` and `top_p: 0.95` regardless of task. In thinking mode these models also **lock** `temperature` to `1.0` — any custom value is silently overridden. Since we disable thinking, your `requestBody` value is honored.
133
+
134
+ MiMo accepts `temperature` in `[0, 1.5]` and `top_p` in `[0.01, 1.0]`.
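The presets and accepted ranges above can be captured in a small lookup with a range check, sketched below. The preset keys (`agentic`, `vibeCoding`, `conversation`, `math`) and the function name `validateMimoSampling` are hypothetical labels for the table rows, not identifiers from MiMo's API.

```javascript
// Task presets from the table above (keys are hypothetical labels).
const MIMO_PRESETS = {
  agentic: { temperature: 0.3, top_p: 0.95 },
  vibeCoding: { temperature: 0.3, top_p: 0.95 },
  conversation: { temperature: 0.8, top_p: 0.95 },
  math: { temperature: 1.0, top_p: 0.95 },
};

// Hypothetical helper: returns a list of range violations (empty when valid).
function validateMimoSampling({ temperature, top_p }) {
  const errors = [];
  if (!(temperature >= 0 && temperature <= 1.5)) errors.push("temperature must be in [0, 1.5]");
  if (!(top_p >= 0.01 && top_p <= 1.0)) errors.push("top_p must be in [0.01, 1.0]");
  return errors;
}

console.log(validateMimoSampling(MIMO_PRESETS.agentic)); // []
console.log(validateMimoSampling({ temperature: 2, top_p: 0.95 })); // [ 'temperature must be in [0, 1.5]' ]
```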
135
+
136
+ ### Thinking mode
137
+
138
+ | Model | API default `thinking.type` | API default `temperature` |
139
+ | ---------------------------- | --------------------------- | -------------------------- |
140
+ | `mimo-v2.5-pro`, `mimo-v2.5` | `enabled` | `1.0` (locked in thinking) |
141
+ | `mimo-v2-flash` | `disabled` | `0.3` (customizable) |
142
+
143
+ When thinking is enabled, responses include a `reasoning_content` field alongside `content` and `tool_calls`.
144
+
145
+ ### Capabilities
146
+
147
+ - Streaming (SSE, standard OpenAI format).
148
+ - Tool calling with `tool_choice: "auto"`.
149
+ - Vision (image input via OpenAI `content` array) on `mimo-v2.5` only.
150
+ - `tool_choice` other than `"auto"` is **stripped** and treated as `"auto"`.
151
+ - `mimo-v2.5` also supports video and audio understanding.
152
+
153
+ ### Rate limits
154
+
155
+ **100 RPM / 10M TPM** per model per account.
156
+
157
+ ## Troubleshooting
158
+
159
+ | Symptom | Likely cause | Fix |
160
+ | ------------------------------------------ | -------------------------------------------------------- | ----------------------------------------------------- |
161
+ | HTTP 400 on the second turn of a tool loop | `reasoning_content` missing in history (thinking on) | Add `thinking: { type: "disabled" }` to `requestBody` |
162
+ | Vision request returns an error | Used `mimo-v2.5-pro` or `mimo-v2-flash` (text-only) | Use `mimo-v2.5` for vision |
163
+ | Custom `tool_choice` ignored | MiMo only honors `"auto"` | Stick to `auto` |
164
+ | 401 Unauthorized | Wrong key, or Token Plan URL used with pay-as-you-go key | Match key prefix (`sk-` vs `tp-`) to the endpoint |
165
+ | 429 rate-limited | Concurrent sessions exceeded 100 RPM / 10M TPM | Reduce concurrent agent sessions |
166
+
167
+ ## Pricing
168
+
169
+ For the cross-provider comparison, see [docs/pricing.md](../pricing.md). Overseas (international) pay-as-you-go rates:
170
+
171
+ | Model | Input (Cache Hit) | Input (Cache Miss) | Output |
172
+ | --------------- | ----------------- | ------------------ | ---------- |
173
+ | `mimo-v2.5-pro` | $0.20 / 1M | $1.00 / 1M | $3.00 / 1M |
174
+ | `mimo-v2.5` | $0.08 / 1M | $0.40 / 1M | $2.00 / 1M |
175
+ | `mimo-v2-flash` | $0.01 / 1M | $0.10 / 1M | $0.30 / 1M |
176
+
177
+ > Cache writing is currently free of charge (limited-time offer). MiMo also offers a Token Plan subscription with discounted rates and a free cache-writing promotion.
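As a worked example of how the cache split affects cost, the sketch below applies the rates above to one request. The function name `estimateMimoCostUsd` and the token counts are illustrative assumptions. Cache-write charges are left out because the note above describes them as currently free.

```javascript
// Overseas pay-as-you-go rates from the table above, in USD per 1M tokens.
const MIMO_USD_PER_MILLION = {
  "mimo-v2.5-pro": { cacheHit: 0.2, cacheMiss: 1.0, output: 3.0 },
  "mimo-v2.5": { cacheHit: 0.08, cacheMiss: 0.4, output: 2.0 },
  "mimo-v2-flash": { cacheHit: 0.01, cacheMiss: 0.1, output: 0.3 },
};

// Hypothetical helper: estimated cost of one request in USD.
function estimateMimoCostUsd(model, { cachedInputTokens, uncachedInputTokens, outputTokens }) {
  const rates = MIMO_USD_PER_MILLION[model];
  if (!rates) throw new Error(`No rates listed for ${model}`);
  const total =
    cachedInputTokens * rates.cacheHit +
    uncachedInputTokens * rates.cacheMiss +
    outputTokens * rates.output;
  return total / 1_000_000;
}

// Illustrative request: 800,000 cached input, 200,000 uncached input, 50,000 output.
const proCost = estimateMimoCostUsd("mimo-v2.5-pro", {
  cachedInputTokens: 800_000,
  uncachedInputTokens: 200_000,
  outputTokens: 50_000,
});
console.log(proCost.toFixed(2)); // 0.51
```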
178
+
179
+ ---
180
+
181
+ ## Background & Findings
182
+
183
+ > This appendix preserves the validation narrative for future reference. It is not required to use the model.
184
+
185
+ ### The critical `reasoning_content` constraint
186
+
187
+ When thinking mode is enabled and the conversation history contains tool calls, the `reasoning_content` field **must** be fully passed back in every subsequent assistant message. Otherwise, the API returns HTTP 400.
188
+
189
+ This is the same class of problem as Qwen's `reasoning_content` issue, but **stricter**: MiMo's API actively rejects requests with missing historical `reasoning_content`, rather than silently degrading.
190
+
191
+ **Implication for VS Code Copilot:** VS Code's agent mode is unlikely to preserve `reasoning_content` across multi-turn tool loops. Therefore:
192
+
193
+ - **Thinking enabled + tool calling = broken** (400 errors after the first tool round-trip).
194
+ - **Thinking disabled + tool calling = works** (no `reasoning_content` to preserve).
195
+ - **Thinking enabled + plain chat = works** (no tool calls in history).
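The three cases above can be checked against a message history before it is sent. This sketch uses a simplified reading of the constraint: once an assistant message carries `tool_calls`, every assistant message from that point on must include `reasoning_content`. The exact server-side rule may differ, and the name `missingReasoningIndexes` and the tool name `read_file` are hypothetical.

```javascript
// Hypothetical helper: returns indexes of assistant messages that would
// likely trigger the HTTP 400 when thinking is enabled.
function missingReasoningIndexes(messages) {
  const hasToolCalls = (m) =>
    m.role === "assistant" && Array.isArray(m.tool_calls) && m.tool_calls.length > 0;
  const firstToolTurn = messages.findIndex(hasToolCalls);
  if (firstToolTurn === -1) return []; // plain chat: nothing to pass back
  const missing = [];
  messages.forEach((m, i) => {
    if (i >= firstToolTurn && m.role === "assistant" && !m.reasoning_content) missing.push(i);
  });
  return missing;
}

const history = [
  { role: "user", content: "Read README.md" },
  {
    role: "assistant",
    content: "",
    tool_calls: [{ id: "call_1", type: "function", function: { name: "read_file", arguments: "{}" } }],
  },
  { role: "tool", tool_call_id: "call_1", content: "..." },
];
console.log(missingReasoningIndexes(history)); // [ 1 ]
```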
196
+
197
+ ### Why a static `thinking: disabled` is enough
198
+
199
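+ VS Code's agent mode is the only flow that triggers tool loops, and the static `requestBody` setting disables thinking on every request, so those turns are always covered. Plain chat would also work with thinking enabled, because without tool calls in the history there is no `reasoning_content` that must be passed back; the static setting simply gives up visible reasoning there.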
200
+
201
+ A dynamic proxy (suppress thinking only when tools are present — same pattern as `proxy/qwen-proxy.mjs`) would let plain chat show reasoning, but it is **not implemented** because:
202
+
203
+ - The cost of losing visible reasoning in plain chat is low for most users.
204
+ - Static suppression is one less moving part to maintain.
205
+
206
+ ### Benchmark highlights (from official MiMo V2.5 announcement)
207
+
208
+ | Model | SWE-Bench Verified | SWE-Bench Pro | Terminal-Bench 2.0 | AIME 2025 |
209
+ | --------------- | ------------------ | ------------- | ------------------ | --------- |
210
+ | `mimo-v2.5-pro` | — | 57.2% | 68.4% | — |
211
+ | `mimo-v2.5` | — | 56.1% | — | — |
212
+ | `mimo-v2-flash` | 73.4% | — | — | 94.1% |
213
+
214
+ > `mimo-v2.5` additionally scores 87.7% on Video-MME and 62.3% on Claw-Eval Text.
215
+
216
+ ### Validation results
217
+
218
+ | # | Test | Model | Result |
219
+ | --- | ----------------------------------------- | --------------- | ------------------------------------------------------------------------------------- |
220
+ | 1 | Add provider to `chatLanguageModels.json` | All | ✅ |
221
+ | 2 | Plain chat in VS Code | `mimo-v2.5-pro` | ✅ — model self-identified as MiMo 1T-param |
222
+ | 3 | Agent mode (tool calling) | `mimo-v2.5-pro` | ✅ — file reads, browser automation, terminal, image viewing all worked |
223
+ | 4 | Vision | `mimo-v2.5` | ✅ — analyzed an attached screenshot (Facebook post, browser tabs, sidebar) in detail |
224
+
225
+ External API checks (curl):
226
+
227
+ | Check | Model | Result |
228
+ | ------------------ | --------------- | ------------------------------------------------------- |
229
+ | Non-streaming chat | `mimo-v2-flash` | ✅ |
230
+ | Streaming (SSE) | `mimo-v2-flash` | ✅ |
231
+ | Non-streaming chat | `mimo-v2.5-pro` | ✅ |
232
+ | Tool calling | `mimo-v2-flash` | ✅ — `finish_reason: "tool_calls"` with valid JSON args |
233
+
234
+ ### Known risks
235
+
236
+ | Risk | Detail | Mitigation |
237
+ | ------------------------------------- | ------------------------------------------------------------------ | ------------------------------------------------------- |
238
+ | `reasoning_content` 400 errors | If thinking is accidentally enabled in tool loops, API returns 400 | Always set `thinking.type: "disabled"` in `requestBody` |
239
+ | `tool_choice` only supports `"auto"` | Non-`auto` values are stripped | Should not affect VS Code, which uses `auto` |
240
+ | Auth header format | Both `api-key:` and `Authorization: Bearer` work | VS Code sends `Authorization: Bearer` — works directly |
241
+ | `temperature` locked in thinking mode | V2.5 Pro / V2.5 force `temperature: 1.0` when thinking is on | Not an issue when thinking is disabled |
242
+ | 1M context window                     | VS Code may not send enough tokens to benefit                      | Set `maxInputTokens` conservatively; adjust after testing |
243
+
244
+ ## References
245
+
246
+ - API Platform: `https://platform.xiaomimimo.com/`
247
+ - OpenAI API Reference: `https://platform.xiaomimimo.com/docs/en-US/api/chat/openai-api`
248
+ - First API Call Guide: `https://platform.xiaomimimo.com/docs/en-US/quick-start/first-api-call`
249
+ - Model & Rate Limits: `https://platform.xiaomimimo.com/docs/en-US/quick-start/model`
250
+ - Model Hyperparameters: `https://platform.xiaomimimo.com/docs/en-US/quick-start/model-hyperparameters`
251
+ - Pricing: `https://platform.xiaomimimo.com/docs/en-US/pricing`
252
+ - `reasoning_content` Guide: `https://platform.xiaomimimo.com/docs/en-US/usage-guide/passing-back-reasoning_content`
253
+ - AI Tools Integration: `https://platform.xiaomimimo.com/docs/en-US/integration/claude-code`
254
+ - HuggingFace (MiMo-V2.5): `https://huggingface.co/XiaomiMiMo/MiMo-V2.5`
255
+ - HuggingFace (MiMo-V2.5-Pro): `https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro`
256
+ - HuggingFace (MiMo-V2-Flash): `https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash`
257
+ - MiMo V2.5 Blog: `https://mimo.xiaomi.com/mimo-v2-5`
258
+ - AI Studio (playground): `https://aistudio.xiaomimimo.com/`