copilot-custom-endpoint 1.0.5 → 1.1.1
- package/README.md +249 -81
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Github Copilot Custom Endpoints
|
|
2
2
|
|
|
3
|
-
> **TL;DR** — As of **June 1, 2026**, GitHub Copilot switched to usage-based billing (AI Credits), making every chat and agent session burn credits fast. This repo documents a practical workaround: use **cheaper, non-GitHub models** (DeepSeek, Kimi, Qwen) inside VS Code's Copilot chat — often at **5–55× lower cost** while retaining agent mode, tool calling, and streaming. We keep validated, copy-paste-ready configs and a small local proxy that smooths out provider quirks.
|
|
3
|
+
> **TL;DR** — As of **June 1, 2026**, GitHub Copilot switched to usage-based billing (AI Credits), making every chat and agent session burn credits fast. This repo documents a practical workaround: use **cheaper, non-GitHub models** (DeepSeek, Kimi, Qwen, MiMo) inside VS Code's Copilot chat — often at **5–55× lower cost** while retaining agent mode, tool calling, and streaming. We keep validated, copy-paste-ready configs and a small local proxy that smooths out provider quirks.
|
|
4
4
|
|
|
5
5
|
## What is this?
|
|
6
6
|
|
|
@@ -20,16 +20,20 @@ This repo is for those situations: validated, copy-paste-ready configs when Open
|
|
|
20
20
|
|
|
21
21
|
## Quick start
|
|
22
22
|
|
|
23
|
-
| Provider | Model
|
|
24
|
-
| ----------------------------- |
|
|
25
|
-
| **Moonshot (Kimi)** | `kimi-k2.6`
|
|
26
|
-
| **Alibaba Cloud (DashScope)** | `qwen3.6-plus`
|
|
27
|
-
| **Alibaba Cloud (DashScope)** | `qwen3.7-max`
|
|
28
|
-
| **DeepSeek** | `deepseek-v4`
|
|
23
|
+
| Provider | Model | Needs proxy? | Plain chat | Streaming | Tool calling | Vision |
|
|
24
|
+
| ----------------------------- | --------------- | ---------------------------------- | ---------- | --------- | ------------ | ------ |
|
|
25
|
+
| **Moonshot (Kimi)** | `kimi-k2.6` | Yes — `proxy/kimi-proxy.mjs` | ✅ | ✅ | ✅ | ✅ |
|
|
26
|
+
| **Alibaba Cloud (DashScope)** | `qwen3.6-plus` | Optional — `proxy/qwen-proxy.mjs`¹ | ✅² | ✅ | ✅ | ✅ |
|
|
27
|
+
| **Alibaba Cloud (DashScope)** | `qwen3.7-max` | Optional — `proxy/qwen-proxy.mjs`¹ | ✅² | ✅ | ✅ | ❌ |
|
|
28
|
+
| **DeepSeek** | `deepseek-v4` | No — uses a VS Code extension | ✅ | ✅ | ✅ | ✅³ |
|
|
29
|
+
| **Xiaomi MiMo** | `mimo-v2.5` | No | ✅ | ✅ | ✅ | ✅⁴ |
|
|
30
|
+
| **Xiaomi MiMo** | `mimo-v2.5-pro` | No | ✅ | ✅ | ✅ | ❌ |
|
|
31
|
+
| **Xiaomi MiMo** | `mimo-v2-flash` | No | ✅ | ✅ | ✅ | ❌ |
|
|
29
32
|
|
|
30
33
|
¹ Proxy is optional: direct path works with static `enable_thinking: false`. Proxy adds dynamic thinking suppression (thinking ON in plain chat, OFF in tool loops).
|
|
31
34
|
² With proxy: reasoning visible in plain chat. Without proxy: always suppressed.
|
|
32
|
-
³ Vision is supported through a proxy model (Claude, GPT-4o) that describes the image before sending to DeepSeek.
|
|
35
|
+
³ Vision is supported through a proxy model (Claude, GPT-4o) that describes the image before sending to DeepSeek.
|
|
36
|
+
⁴ Native vision via dedicated ViT encoder. Tested via VS Code image attachment in agent mode.
|
|
33
37
|
|
|
34
38
|
Pick the model you want and follow the corresponding section below.
|
|
35
39
|
|
|
@@ -99,10 +103,66 @@ Here's a complete, real-world example of `chatLanguageModels.json` combining all
|
|
|
99
103
|
"maxOutputTokens": 32768
|
|
100
104
|
}
|
|
101
105
|
]
|
|
106
|
+
},
|
|
107
|
+
{
|
|
108
|
+
"name": "MiMo",
|
|
109
|
+
"vendor": "customendpoint",
|
|
110
|
+
"apiKey": "<your-mimo-api-key>",
|
|
111
|
+
"apiType": "chat-completions",
|
|
112
|
+
"models": [
|
|
113
|
+
{
|
|
114
|
+
"id": "mimo-v2.5-pro",
|
|
115
|
+
"name": "MiMo V2.5 Pro",
|
|
116
|
+
"url": "https://api.xiaomimimo.com/v1/chat/completions",
|
|
117
|
+
"toolCalling": true,
|
|
118
|
+
"vision": false,
|
|
119
|
+
"streaming": true,
|
|
120
|
+
"maxInputTokens": 1048576,
|
|
121
|
+
"maxOutputTokens": 131072,
|
|
122
|
+
"requestBody": {
|
|
123
|
+
"thinking": { "type": "disabled" },
|
|
124
|
+
"temperature": 1,
|
|
125
|
+
"top_p": 0.95
|
|
126
|
+
}
|
|
127
|
+
},
|
|
128
|
+
{
|
|
129
|
+
"id": "mimo-v2.5",
|
|
130
|
+
"name": "MiMo V2.5",
|
|
131
|
+
"url": "https://api.xiaomimimo.com/v1/chat/completions",
|
|
132
|
+
"toolCalling": true,
|
|
133
|
+
"vision": true,
|
|
134
|
+
"streaming": true,
|
|
135
|
+
"maxInputTokens": 1048576,
|
|
136
|
+
"maxOutputTokens": 32768,
|
|
137
|
+
"requestBody": {
|
|
138
|
+
"thinking": { "type": "disabled" },
|
|
139
|
+
"temperature": 1,
|
|
140
|
+
"top_p": 0.95
|
|
141
|
+
}
|
|
142
|
+
},
|
|
143
|
+
{
|
|
144
|
+
"id": "mimo-v2-flash",
|
|
145
|
+
"name": "MiMo V2 Flash",
|
|
146
|
+
"url": "https://api.xiaomimimo.com/v1/chat/completions",
|
|
147
|
+
"toolCalling": true,
|
|
148
|
+
"vision": false,
|
|
149
|
+
"streaming": true,
|
|
150
|
+
"maxInputTokens": 262144,
|
|
151
|
+
"maxOutputTokens": 65536,
|
|
152
|
+
"requestBody": {
|
|
153
|
+
"thinking": { "type": "disabled" },
|
|
154
|
+
"temperature": 0.3,
|
|
155
|
+
"top_p": 0.95
|
|
156
|
+
}
|
|
157
|
+
}
|
|
158
|
+
]
|
|
102
159
|
}
|
|
103
160
|
]
|
|
104
161
|
```
|
|
105
162
|
|
|
163
|
+
<details>
|
|
164
|
+
<summary>Kimi K2.6 (Moonshot)</summary>
|
|
165
|
+
|
|
106
166
|
### Kimi K2.6 (Moonshot)
|
|
107
167
|
|
|
108
168
|
#### 1. Grab a Moonshot API key
|
|
@@ -215,33 +275,92 @@ Open (or create) your user config file (see [Config file location](#config-file-
|
|
|
215
275
|
| `invalid temperature` / `invalid top_p` | You're talking directly to Moonshot instead of through the proxy. Double-check the `url` in `chatLanguageModels.json`. |
|
|
216
276
|
| Tool calls fail after first turn | This happens if "thinking" stays enabled during tool loops. The proxy normally disables it automatically; ensure you're on the latest `proxy/kimi-proxy.mjs`. |
|
|
217
277
|
|
|
278
|
+
</details>
|
|
279
|
+
|
|
218
280
|
---
|
|
219
281
|
|
|
282
|
+
<details>
|
|
283
|
+
<summary>Qwen 3.6 Plus / Qwen 3.7 Max (DashScope)</summary>
|
|
284
|
+
|
|
220
285
|
### Qwen 3.6 Plus or Qwen 3.7 Max (DashScope)
|
|
221
286
|
|
|
222
|
-
|
|
287
|
+
Qwen models work **directly** with DashScope — no proxy needed. Just add `enable_thinking: false` to `requestBody` for tool-calling stability. An optional `proxy/qwen-proxy.mjs` is also available for dynamic thinking suppression (see [below](#optional-local-proxy-for-dynamic-thinking)).
|
|
223
288
|
|
|
224
289
|
#### 1. Grab a DashScope API key
|
|
225
290
|
|
|
226
|
-
|
|
291
|
+
Create an API key [here](https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=dashboard#/api-key).
|
|
292
|
+
|
|
293
|
+
> **Regional endpoints:** DashScope offers endpoints for several regions. API keys are region-specific.
|
|
294
|
+
>
|
|
295
|
+
> - **China (Beijing):** `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`
|
|
296
|
+
> - **US (Virginia):** `https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions`
|
|
297
|
+
> - **Singapore (default):** `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions`
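Because keys are region-specific, it can help to keep the endpoint choice in one place. The following is a minimal sketch, not part of the package: the region labels (`beijing`, `virginia`, `singapore`) and the helper name `dashscopeUrl` are invented for illustration; only the three URLs come from the list above.

```javascript
// Region-to-endpoint map copied from the list above.
// The keys are hypothetical labels chosen for this sketch.
const DASHSCOPE_ENDPOINTS = {
  beijing: "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions",
  virginia: "https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions",
  singapore: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
};

// Returns the chat-completions URL for a region, defaulting to Singapore.
// Throws on unknown labels so a typo does not silently select the wrong region.
function dashscopeUrl(region = "singapore") {
  if (!Object.prototype.hasOwnProperty.call(DASHSCOPE_ENDPOINTS, region)) {
    throw new Error(`Unknown DashScope region: ${region}`);
  }
  return DASHSCOPE_ENDPOINTS[region];
}
```

The returned string is what would go into each model's `url` field, paired with an API key created in the same region.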
|
|
298
|
+
|
|
299
|
+
#### 2. Register the models in VS Code
|
|
300
|
+
|
|
301
|
+
Open (or create) your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-dashscope-key>`):
|
|
302
|
+
|
|
303
|
+
```json
|
|
304
|
+
{
|
|
305
|
+
"name": "Qwen",
|
|
306
|
+
"vendor": "customendpoint",
|
|
307
|
+
"apiKey": "<your-dashscope-key>",
|
|
308
|
+
"apiType": "chat-completions",
|
|
309
|
+
"models": [
|
|
310
|
+
{
|
|
311
|
+
"id": "qwen3.7-max",
|
|
312
|
+
"name": "Qwen 3.7 Max",
|
|
313
|
+
"url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
314
|
+
"toolCalling": true,
|
|
315
|
+
"vision": false,
|
|
316
|
+
"streaming": true,
|
|
317
|
+
"requestBody": {
|
|
318
|
+
"enable_thinking": false
|
|
319
|
+
}
|
|
320
|
+
},
|
|
321
|
+
{
|
|
322
|
+
"id": "qwen3.6-plus",
|
|
323
|
+
"name": "Qwen 3.6 Plus",
|
|
324
|
+
"url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
325
|
+
"toolCalling": true,
|
|
326
|
+
"vision": true,
|
|
327
|
+
"streaming": true,
|
|
328
|
+
"requestBody": {
|
|
329
|
+
"enable_thinking": false
|
|
330
|
+
}
|
|
331
|
+
}
|
|
332
|
+
]
|
|
333
|
+
}
|
|
334
|
+
```
|
|
335
|
+
|
|
336
|
+
> **Trade-off:** `enable_thinking: false` suppresses reasoning in all requests (both plain chat and tool loops). Tool loops stay stable, but you never see the model's thought process. The [optional proxy](#optional-local-proxy-for-dynamic-thinking) below avoids this trade-off.
|
|
337
|
+
|
|
338
|
+
#### 3. Chat!
|
|
227
339
|
|
|
228
|
-
|
|
340
|
+
- Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
|
|
341
|
+
- Click the model picker (top-right of the chat input).
|
|
342
|
+
- Choose **Qwen 3.6 Plus** (with vision) or **Qwen 3.7 Max** (text only).
|
|
343
|
+
- Ask something. Streaming, tool use, and vision (3.6 Plus) all work.
|
|
344
|
+
|
|
345
|
+
---
|
|
229
346
|
|
|
230
|
-
|
|
347
|
+
#### Optional: Local proxy for dynamic thinking
|
|
231
348
|
|
|
232
|
-
|
|
349
|
+
If you want reasoning visible in plain chat but automatically suppressed during tool loops, run the optional `proxy/qwen-proxy.mjs` instead.
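To picture what "dynamic" means here, consider the sketch below. It is not the code in `proxy/qwen-proxy.mjs`. The function names and the heuristic (a request counts as part of a tool loop when its history contains a `tool` message or an assistant message with `tool_calls`) are assumptions made for illustration; only the `enable_thinking` field comes from this README.

```javascript
// Hypothetical heuristic: the request is inside a tool loop if the history
// already holds a tool result or an assistant turn that issued tool calls.
function isToolLoop(messages) {
  return messages.some(
    (m) =>
      m.role === "tool" ||
      (m.role === "assistant" &&
        Array.isArray(m.tool_calls) &&
        m.tool_calls.length > 0)
  );
}

// Returns a copy of the request body with enable_thinking switched on for
// plain chat and off for tool loops. The input object is not mutated.
function withDynamicThinking(body) {
  const messages = Array.isArray(body.messages) ? body.messages : [];
  return { ...body, enable_thinking: !isToolLoop(messages) };
}

// Example request bodies (contents are made up).
const plainChat = { messages: [{ role: "user", content: "Explain closures" }] };
const toolTurn = {
  messages: [
    { role: "user", content: "List the files" },
    { role: "assistant", content: null, tool_calls: [{ id: "call_1" }] },
    { role: "tool", tool_call_id: "call_1", content: "README.md" },
  ],
};

console.log(withDynamicThinking(plainChat).enable_thinking); // true
console.log(withDynamicThinking(toolTurn).enable_thinking); // false
```

A static `enable_thinking: false` in `requestBody` corresponds to always taking the second branch.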
|
|
350
|
+
|
|
351
|
+
Start the proxy:
|
|
233
352
|
|
|
234
353
|
```bash
|
|
235
354
|
npm run proxy:qwen
|
|
236
355
|
```
|
|
237
356
|
|
|
238
|
-
|
|
357
|
+
Or with all proxies:
|
|
239
358
|
|
|
240
359
|
```bash
|
|
241
360
|
npm run proxy
|
|
242
361
|
```
|
|
243
362
|
|
|
244
|
-
|
|
363
|
+
Or globally (from any directory):
|
|
245
364
|
|
|
246
365
|
```bash
|
|
247
366
|
# Qwen only
|
|
@@ -250,14 +369,6 @@ npx copilot-custom-endpoint qwen
|
|
|
250
369
|
npx copilot-custom-endpoint
|
|
251
370
|
```
|
|
252
371
|
|
|
253
|
-
Clean up debug logs
|
|
254
|
-
|
|
255
|
-
```bash
|
|
256
|
-
npm run clean:logs
|
|
257
|
-
# or with npx
|
|
258
|
-
npx copilot-custom-endpoint clean
|
|
259
|
-
```
|
|
260
|
-
|
|
261
372
|
You should see:
|
|
262
373
|
|
|
263
374
|
```
|
|
@@ -281,9 +392,7 @@ Expected response:
|
|
|
281
392
|
}
|
|
282
393
|
```
|
|
283
394
|
|
|
284
|
-
|
|
285
|
-
|
|
286
|
-
Open (or create) your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-dashscope-key>`). Point URLs at the proxy and omit `requestBody` — the proxy handles thinking dynamically:
|
|
395
|
+
Then update your VS Code config to point URLs at the proxy and remove `requestBody` — the proxy handles thinking dynamically:
|
|
287
396
|
|
|
288
397
|
```json
|
|
289
398
|
{
|
|
@@ -314,59 +423,7 @@ Open (or create) your user config file (see [Config file location](#config-file-
|
|
|
314
423
|
|
|
315
424
|
> **Keep the proxy terminal open** while using these models.
|
|
316
425
|
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
- Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
|
|
320
|
-
- Click the model picker (top-right of the chat input).
|
|
321
|
-
- Choose **Qwen 3.6 Plus** (with vision) or **Qwen 3.7 Max** (text only).
|
|
322
|
-
- Ask something. Streaming, tool use, and vision (3.6 Plus) all work.
|
|
323
|
-
|
|
324
|
-
> **Regional endpoints:** If connecting directly (no proxy), DashScope offers endpoints for several regions. The proxy uses `dashscope-intl.aliyuncs.com` (Singapore) by default, configurable via `QWEN_UPSTREAM_URL`.
|
|
325
|
-
>
|
|
326
|
-
> - **China (Beijing):** `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`
|
|
327
|
-
> - **US (Virginia):** `https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions`
|
|
328
|
-
> - **Singapore:** `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` (proxy default)
|
|
329
|
-
>
|
|
330
|
-
> API keys are region-specific.
|
|
331
|
-
|
|
332
|
-
#### Direct path (no proxy)
|
|
333
|
-
|
|
334
|
-
If you prefer not to run the proxy, Qwen models work **directly** with DashScope by using the upstream URL and a static `enable_thinking: false` in `requestBody`:
|
|
335
|
-
|
|
336
|
-
```json
|
|
337
|
-
{
|
|
338
|
-
"name": "Qwen",
|
|
339
|
-
"vendor": "customendpoint",
|
|
340
|
-
"apiKey": "<your-dashscope-key>",
|
|
341
|
-
"apiType": "chat-completions",
|
|
342
|
-
"models": [
|
|
343
|
-
{
|
|
344
|
-
"id": "qwen3.7-max",
|
|
345
|
-
"name": "Qwen 3.7 Max",
|
|
346
|
-
"url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
347
|
-
"toolCalling": true,
|
|
348
|
-
"vision": false,
|
|
349
|
-
"streaming": true,
|
|
350
|
-
"requestBody": {
|
|
351
|
-
"enable_thinking": false
|
|
352
|
-
}
|
|
353
|
-
},
|
|
354
|
-
{
|
|
355
|
-
"id": "qwen3.6-plus",
|
|
356
|
-
"name": "Qwen 3.6 Plus",
|
|
357
|
-
"url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
358
|
-
"toolCalling": true,
|
|
359
|
-
"vision": true,
|
|
360
|
-
"streaming": true,
|
|
361
|
-
"requestBody": {
|
|
362
|
-
"enable_thinking": false
|
|
363
|
-
}
|
|
364
|
-
}
|
|
365
|
-
]
|
|
366
|
-
}
|
|
367
|
-
```
|
|
368
|
-
|
|
369
|
-
> **Trade-off:** `enable_thinking: false` suppresses reasoning in all requests (both plain chat and tool loops). Tool loops stay stable, but you never see the model's thought process. The proxy path avoids this trade-off.
|
|
426
|
+
The proxy's upstream URL is configurable via the `QWEN_UPSTREAM_URL` environment variable (defaults to the Singapore endpoint shown in [step 1](#1-grab-a-dashscope-api-key)).
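As a rough illustration of that default-with-override behaviour, the sketch below reads the variable and falls back to the Singapore endpoint. It assumes `QWEN_UPSTREAM_URL` holds a full chat-completions URL; the README does not say whether the proxy expects a full URL or only a host, and `resolveQwenUpstream` is a hypothetical name, not a function from the package.

```javascript
// Assumption: the variable carries a complete chat-completions URL.
const DEFAULT_QWEN_UPSTREAM =
  "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions";

// Returns the override when it is set and non-blank, else the default.
// The env parameter exists only so the function can be exercised without
// touching the real process environment.
function resolveQwenUpstream(env = process.env) {
  const override = (env.QWEN_UPSTREAM_URL || "").trim();
  return override === "" ? DEFAULT_QWEN_UPSTREAM : override;
}
```

Under this assumption, switching the proxy to another region would mean exporting the variable before starting it and using an API key from that same region.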
|
|
370
427
|
|
|
371
428
|
#### Troubleshooting (Qwen)
|
|
372
429
|
|
|
@@ -375,8 +432,13 @@ If you prefer not to run the proxy, Qwen models work **directly** with DashScope
|
|
|
375
432
|
| `reasoning_content` errors during tool loops | Ensure `enable_thinking: false` is present in `requestBody` for every Qwen model. |
|
|
376
433
|
| Vision images fail to upload | Use base64-encoded images; external image URLs may fail if DashScope cannot reach them. |
|
|
377
434
|
|
|
435
|
+
</details>
|
|
436
|
+
|
|
378
437
|
---
|
|
379
438
|
|
|
439
|
+
<details>
|
|
440
|
+
<summary>DeepSeek V4 (VS Code Extension)</summary>
|
|
441
|
+
|
|
380
442
|
### DeepSeek V4 (VS Code Extension)
|
|
381
443
|
|
|
382
444
|
DeepSeek V4 Pro & Flash are available via a **dedicated VS Code extension** rather than a raw custom endpoint. The extension plugs DeepSeek directly into Copilot Chat's model picker while preserving agent mode, tool calling, skills, and MCP support.
|
|
@@ -418,16 +480,114 @@ DeepSeek V4 is text-only, but the extension handles images automatically — dro
|
|
|
418
480
|
|
|
419
481
|
> For the full official guide, see: [github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/github_copilot.md](https://github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/github_copilot.md)
|
|
420
482
|
|
|
483
|
+
</details>
|
|
484
|
+
|
|
485
|
+
---
|
|
486
|
+
|
|
487
|
+
<details>
|
|
488
|
+
<summary>Xiaomi MiMo</summary>
|
|
489
|
+
|
|
490
|
+
### Xiaomi MiMo
|
|
491
|
+
|
|
492
|
+
MiMo works **directly** — no proxy needed. Just add the provider entry to your VS Code config and select the model in the chat picker.
|
|
493
|
+
|
|
494
|
+
No proxy means lower latency, fewer moving parts, and nothing extra to keep running.
|
|
495
|
+
|
|
496
|
+
#### 1. Get a MiMo API key
|
|
497
|
+
|
|
498
|
+
Sign up at [platform.xiaomimimo.com](https://platform.xiaomimimo.com) and create an API key from the [Console](https://platform.xiaomimimo.com/console/api-keys).
|
|
499
|
+
|
|
500
|
+
#### 2. Register the models in VS Code
|
|
501
|
+
|
|
502
|
+
Open your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-mimo-api-key>`):
|
|
503
|
+
|
|
504
|
+
```json
|
|
505
|
+
{
|
|
506
|
+
"name": "MiMo",
|
|
507
|
+
"vendor": "customendpoint",
|
|
508
|
+
"apiKey": "<your-mimo-api-key>",
|
|
509
|
+
"apiType": "chat-completions",
|
|
510
|
+
"models": [
|
|
511
|
+
{
|
|
512
|
+
"id": "mimo-v2.5-pro",
|
|
513
|
+
"name": "MiMo V2.5 Pro",
|
|
514
|
+
"url": "https://api.xiaomimimo.com/v1/chat/completions",
|
|
515
|
+
"toolCalling": true,
|
|
516
|
+
"vision": false,
|
|
517
|
+
"streaming": true,
|
|
518
|
+
"maxInputTokens": 1048576,
|
|
519
|
+
"maxOutputTokens": 131072,
|
|
520
|
+
"requestBody": {
|
|
521
|
+
"thinking": { "type": "disabled" },
|
|
522
|
+
"temperature": 1,
|
|
523
|
+
"top_p": 0.95
|
|
524
|
+
}
|
|
525
|
+
},
|
|
526
|
+
{
|
|
527
|
+
"id": "mimo-v2.5",
|
|
528
|
+
"name": "MiMo V2.5",
|
|
529
|
+
"url": "https://api.xiaomimimo.com/v1/chat/completions",
|
|
530
|
+
"toolCalling": true,
|
|
531
|
+
"vision": true,
|
|
532
|
+
"streaming": true,
|
|
533
|
+
"maxInputTokens": 1048576,
|
|
534
|
+
"maxOutputTokens": 32768,
|
|
535
|
+
"requestBody": {
|
|
536
|
+
"thinking": { "type": "disabled" },
|
|
537
|
+
"temperature": 1,
|
|
538
|
+
"top_p": 0.95
|
|
539
|
+
}
|
|
540
|
+
},
|
|
541
|
+
{
|
|
542
|
+
"id": "mimo-v2-flash",
|
|
543
|
+
"name": "MiMo V2 Flash",
|
|
544
|
+
"url": "https://api.xiaomimimo.com/v1/chat/completions",
|
|
545
|
+
"toolCalling": true,
|
|
546
|
+
"vision": false,
|
|
547
|
+
"streaming": true,
|
|
548
|
+
"maxInputTokens": 262144,
|
|
549
|
+
"maxOutputTokens": 65536,
|
|
550
|
+
"requestBody": {
|
|
551
|
+
"thinking": { "type": "disabled" },
|
|
552
|
+
"temperature": 0.3,
|
|
553
|
+
"top_p": 0.95
|
|
554
|
+
}
|
|
555
|
+
}
|
|
556
|
+
]
|
|
557
|
+
}
|
|
558
|
+
```
|
|
559
|
+
|
|
560
|
+
> **Note:** `thinking: { "type": "disabled" }` is required for tool-calling stability. Without it, MiMo returns a 400 error when conversation history contains tool calls with missing `reasoning_content`.
|
|
561
|
+
|
|
562
|
+
#### 3. Chat!
|
|
563
|
+
|
|
564
|
+
- Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
|
|
565
|
+
- Click the model picker (top-right of the chat input).
|
|
566
|
+
- Choose **MiMo V2 Flash** (fastest/cheapest), **MiMo V2.5** (omnimodal with vision), or **MiMo V2.5 Pro** (most capable for agentic work).
|
|
567
|
+
- Ask something. Streaming, tool use, and vision (V2.5) all work.
|
|
568
|
+
|
|
569
|
+
#### Troubleshooting (MiMo)
|
|
570
|
+
|
|
571
|
+
| Symptom | Fix |
|
|
572
|
+
| ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
|
|
573
|
+
| 400 error `reasoning_content` during tool loops | Ensure `thinking: { "type": "disabled" }` is present in `requestBody` for every MiMo model. |
|
|
574
|
+
| Vision images fail to upload | Use `mimo-v2.5` (the only model with native vision). Text-only models (`pro`, `flash`) don't support image input. |
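The first row of the table can be checked mechanically before opening VS Code. The helper below is a sketch under assumptions: `modelsMissingDisabledThinking` is a hypothetical name, and it only inspects the fields used in the provider entry from step 2 (`models`, `id`, `requestBody.thinking.type`).

```javascript
// Returns the ids of models whose requestBody does not contain
// thinking: { "type": "disabled" }. An empty array means the entry is fine.
function modelsMissingDisabledThinking(provider) {
  const models = Array.isArray(provider.models) ? provider.models : [];
  return models
    .filter((m) => {
      const thinking = m.requestBody && m.requestBody.thinking;
      return !(thinking && thinking.type === "disabled");
    })
    .map((m) => m.id);
}

// Example entry with one correct model and one that lacks the flag.
const exampleProvider = {
  name: "MiMo",
  models: [
    { id: "mimo-v2.5", requestBody: { thinking: { type: "disabled" } } },
    { id: "mimo-v2-flash", requestBody: { temperature: 0.3 } },
  ],
};

console.log(modelsMissingDisabledThinking(exampleProvider)); // [ 'mimo-v2-flash' ]
```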
|
|
575
|
+
|
|
576
|
+
</details>
|
|
577
|
+
|
|
578
|
+
---
|
|
579
|
+
|
|
421
580
|
For the full research notes, tested values, and known limitations, see:
|
|
422
581
|
|
|
423
582
|
- [`docs/models/kimi-k2.6.md`](docs/models/kimi-k2.6.md)
|
|
424
583
|
- [`docs/models/qwen.md`](docs/models/qwen.md)
|
|
584
|
+
- [`docs/models/mimo.md`](docs/models/mimo.md)
|
|
425
585
|
|
|
426
586
|
## Pricing comparison
|
|
427
587
|
|
|
428
588
|
> **⏰ June 1, 2026 — GitHub Copilot switched to usage-based billing (AI Credits) today.**
|
|
429
589
|
>
|
|
430
|
-
> Before this change, Copilot
|
|
590
|
+
> Before this change, Copilot used **premium request-based billing** — each model had its own multiplier (e.g., GPT-5.5 = 7.5×, Claude Sonnet 4.6 = 1×, Haiku 4.5 = 0.33×), and every request consumed `multiplier × 1` from your monthly premium-request allowance. Now **every interaction burns AI credits** based on actual token consumption. Agent mode and complex multi-file tasks consume significantly more tokens than simple Q&A, which means your 7,000 Pro+ credits can disappear fast if you're using frontier models.
|
|
431
591
|
>
|
|
432
592
|
> **The practical workaround:** use cheaper alternative models (DeepSeek V4 Flash, Kimi K2.6, Qwen) that are still powerful enough for coding — often at **5–55× less cost** than the Copilot defaults. The tables below show the exact comparison.
|
|
433
593
|
>
|
|
@@ -470,32 +630,39 @@ These are the models available through GitHub Copilot's model roster as of June
|
|
|
470
630
|
| Model | Provider | Input (per 1M) | Output (per 1M) | Context window |
|
|
471
631
|
| --------------------- | --------- | ----------------------------- | --------------------------------------- | -------------- |
|
|
472
632
|
| **DeepSeek V4 Flash** | DeepSeek | $0.14 | $0.28 | 1M |
|
|
633
|
+
| **MiMo V2 Flash** 🏆 | Xiaomi | $0.10 | $0.30 | 256K |
|
|
473
634
|
| **Kimi K2.6** | Moonshot | $0.16 | $0.95 (non-thinking) / $4.00 (thinking) | 256K |
|
|
474
635
|
| **DeepSeek V4 Pro** | DeepSeek | $1.74 | $3.48 | 1M |
|
|
636
|
+
| **MiMo V2.5** | Xiaomi | $0.40 | $2.00 | 1M |
|
|
637
|
+
| **MiMo V2.5 Pro** | Xiaomi | $1.00 | $3.00 | 1M |
|
|
475
638
|
| **Qwen 3.6 Plus** | DashScope | $0.50 (≤256K) / $2.00 (>256K) | $3.00 (≤256K) / $6.00 (>256K) | 1M |
|
|
476
639
|
| **Qwen 3.7 Max** | DashScope | $2.50 (≤1M) | $7.50 (≤1M) | 1M |
|
|
477
640
|
|
|
478
641
|
> **Notes:**
|
|
479
642
|
>
|
|
480
643
|
> - **DeepSeek V4** input pricing shown is the **cache miss** price. Cache hits are significantly cheaper ($0.0028/M for Flash, $0.0145/M for Pro).
|
|
644
|
+
> - **MiMo** input pricing shown is the **cache miss** price. Cache hits are 5× cheaper for V2.5 Pro ($0.20/M) and V2.5 ($0.08/M), and 10× cheaper for V2 Flash ($0.01/M).
|
|
481
645
|
> - **Gemini 3 Flash** is priced at $0.50/MTok input (text/image/video) and $1.00/MTok input for audio.
|
|
482
646
|
> - **Anthropic (Claude)** models also have a cache write cost ($6.25/MTok for Opus, $3.75/MTok for Sonnet, $1.25/MTok for Haiku). Opus 4.7+ use a new tokenizer that may use up to 35% more tokens for the same text.
|
|
483
647
|
> - **OpenAI** models support cached input at 0.1× base input rate.
|
|
484
648
|
> - **Qwen** models use **tiered pricing** — determined by total input tokens per request. Prices above are for non-thinking mode.
|
|
485
649
|
> - **Kimi K2.6** pricing is from the **Moonshot platform** (direct). Via DashScope: $0.89 input / $3.71 output.
|
|
486
650
|
> - **DashScope** offers a **free quota** of 1M input + 1M output tokens per model, valid for 90 days.
|
|
651
|
+
> - **MiMo** offers a **Token Plan** subscription model with discounted rates and a free cache-writing promotion.
|
|
487
652
|
> - For typical Copilot chat usage (short-to-medium prompts), you'll almost always fall in the lowest pricing tier.
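The cache-hit notes above imply that the effective input price depends on how much of a prompt is served from cache. A minimal sketch of that arithmetic follows, assuming a simple linear blend and ignoring cache-write fees. The hit ratios in the example are made up; the per-1M prices are the MiMo figures quoted above.

```javascript
// Blended input price per 1M tokens.
// cacheHitRatio is the share of input tokens served from cache (0 to 1).
function blendedInputPrice(missPrice, hitPrice, cacheHitRatio) {
  if (!(cacheHitRatio >= 0 && cacheHitRatio <= 1)) {
    throw new RangeError("cacheHitRatio must be between 0 and 1");
  }
  return missPrice * (1 - cacheHitRatio) + hitPrice * cacheHitRatio;
}

// MiMo V2.5 Pro: $1.00 on a miss, $0.20 on a hit, hypothetical 50% hit ratio.
console.log(blendedInputPrice(1.0, 0.2, 0.5).toFixed(2)); // "0.60"

// MiMo V2 Flash: $0.10 on a miss, $0.01 on a hit, hypothetical 80% hit ratio.
console.log(blendedInputPrice(0.1, 0.01, 0.8).toFixed(3)); // "0.028"
```

Agent sessions tend to resend a long shared prefix on every turn, so the blended figure can sit well below the cache-miss price shown in the table, though the actual ratio depends on the provider's caching rules.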
|
|
488
653
|
|
|
489
654
|
**Quick cost comparison for a typical coding session** (~10K input + ~2K output tokens per turn, 50 turns):
|
|
490
655
|
|
|
491
656
|
| Model | Estimated session cost | Copilot Pro+ credits |
|
|
492
657
|
| ------------------------ | ---------------------- | -------------------- |
|
|
658
|
+
| MiMo V2 Flash 🏆 | ~$0.08 | — |
|
|
493
659
|
| DeepSeek V4 Flash 🏆 | ~$0.10 | — |
|
|
494
660
|
| Kimi K2.6 (non-thinking) | ~$0.18 | — |
|
|
495
|
-
|
|
|
661
|
+
| MiMo V2.5 | ~$0.40 | — |
|
|
496
662
|
| Kimi K2.6 (thinking) | ~$0.48 | — |
|
|
497
663
|
| Gemini 3 Flash | ~$0.55 | ~55 |
|
|
498
664
|
| Qwen 3.6 Plus | ~$0.55 | — |
|
|
665
|
+
| MiMo V2.5 Pro | ~$0.80 | — |
|
|
499
666
|
| GPT-5.4 mini | ~$0.83 | ~83 |
|
|
500
667
|
| Claude Haiku 4.5 | ~$1.00 | ~100 |
|
|
501
668
|
| DeepSeek V4 Pro | ~$1.22 | — |
|
|
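The session estimates in the rows above can be reproduced from the per-1M prices with a few lines of arithmetic. This sketch assumes the scenario stated in the README (about 10K input and 2K output tokens per turn, 50 turns), cache-miss input pricing, and no tier changes. The function name `sessionCost` is illustrative.

```javascript
// Estimated session cost in USD from per-1M-token prices.
function sessionCost(
  { inputPerM, outputPerM },
  { inputTokensPerTurn = 10000, outputTokensPerTurn = 2000, turns = 50 } = {}
) {
  const inputTokens = inputTokensPerTurn * turns; // 500,000 by default
  const outputTokens = outputTokensPerTurn * turns; // 100,000 by default
  return (inputTokens / 1e6) * inputPerM + (outputTokens / 1e6) * outputPerM;
}

// Prices taken from the pricing table earlier in this section.
console.log(sessionCost({ inputPerM: 0.1, outputPerM: 0.3 }).toFixed(2)); // "0.08" (MiMo V2 Flash)
console.log(sessionCost({ inputPerM: 0.4, outputPerM: 2.0 }).toFixed(2)); // "0.40" (MiMo V2.5)
console.log(sessionCost({ inputPerM: 1.74, outputPerM: 3.48 }).toFixed(2)); // "1.22" (DeepSeek V4 Pro)
```

Passing a different second argument, for example `{ turns: 200 }`, gives a rough figure for longer agent sessions under the same per-turn assumptions.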
@@ -519,6 +686,7 @@ These are the models available through GitHub Copilot's model roster as of June
|
|
|
519
686
|
> - [Google Gemini pricing](https://ai.google.dev/pricing)
|
|
520
687
|
> - [DashScope pricing](https://www.alibabacloud.com/help/en/model-studio/billing-for-model-studio)
|
|
521
688
|
> - [DeepSeek pricing](https://api-docs.deepseek.com/quick_start/pricing)
|
|
689
|
+
> - [MiMo pricing](https://platform.xiaomimimo.com/docs/en-US/pricing)
|
|
522
690
|
|
|
523
691
|
## Repo layout
|
|
524
692
|
|