copilot-custom-endpoint 1.1.0 → 1.1.1

Files changed (2)
  1. package/README.md +84 -72
  2. package/package.json +2 -2
package/README.md CHANGED
@@ -160,6 +160,9 @@ Here's a complete, real-world example of `chatLanguageModels.json` combining all
160
160
  ]
161
161
  ```
162
162
 
163
+ <details>
164
+ <summary>Kimi K2.6 (Moonshot)</summary>
165
+
163
166
  ### Kimi K2.6 (Moonshot)
164
167
 
165
168
  #### 1. Grab a Moonshot API key
@@ -272,33 +275,92 @@ Open (or create) your user config file (see [Config file location](#config-file-
272
275
  | `invalid temperature` / `invalid top_p` | You're talking directly to Moonshot instead of through the proxy. Double-check the `url` in `chatLanguageModels.json`. |
273
276
  | Tool calls fail after first turn | This happens if "thinking" stays enabled during tool loops. The proxy normally disables it automatically; ensure you're on the latest `proxy/kimi-proxy.mjs`. |
274
277
 
278
+ </details>
279
+
275
280
  ---
276
281
 
282
+ <details>
283
+ <summary>Qwen 3.6 Plus / Qwen 3.7 Max (DashScope)</summary>
284
+
277
285
  ### Qwen 3.6 Plus or Qwen 3.7 Max (DashScope)
278
286
 
279
- These models work with the optional `proxy/qwen-proxy.mjs` for dynamic thinking suppression (reasoning visible in plain chat, suppressed in tool loops). They also work **without a proxy** using a static `enable_thinking: false` — see the [direct path alternative](#direct-path-no-proxy) below.
287
+ Qwen models work **directly** with DashScope; no proxy is needed. Just add `enable_thinking: false` to `requestBody` for tool-calling stability. An optional `proxy/qwen-proxy.mjs` is also available for dynamic thinking suppression (see [below](#optional-local-proxy-for-dynamic-thinking)).
280
288
 
281
289
  #### 1. Grab a DashScope API key
282
290
 
283
- Sign up at [dashscope.aliyun.com](https://dashscope.aliyun.com) and create an API key.
291
+ Create an API key in the [Alibaba Cloud Model Studio console](https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=dashboard#/api-key).
292
+
293
+ > **Regional endpoints:** DashScope offers endpoints for several regions. API keys are region-specific.
294
+ >
295
+ > - **China (Beijing):** `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`
296
+ > - **US (Virginia):** `https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions`
297
+ > - **Singapore (default):** `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions`
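
Because API keys are region-specific, the `url` in each model entry has to match the region where the key was created. A minimal sketch of that lookup, using only the three endpoints listed above; the `DASHSCOPE_ENDPOINTS` map and the `endpointFor` helper are hypothetical names for illustration and are not part of this package:

```javascript
// Hypothetical helper: maps a region label to the DashScope endpoints listed above.
const DASHSCOPE_ENDPOINTS = new Map([
  ["beijing", "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"],
  ["virginia", "https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions"],
  ["singapore", "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"],
]);

// Returns the endpoint for a region; Singapore is the default, as in the list above.
function endpointFor(region = "singapore") {
  const url = DASHSCOPE_ENDPOINTS.get(String(region).toLowerCase());
  if (!url) {
    throw new Error(`Unknown DashScope region: ${region}`);
  }
  return url;
}
```

The value returned by `endpointFor("virginia")`, for example, is what would go into the `url` field of a model entry when the key was created in the US region.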
298
+
299
+ #### 2. Register the models in VS Code
300
+
301
+ Open (or create) your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-dashscope-key>`):
302
+
303
+ ```json
304
+ {
305
+ "name": "Qwen",
306
+ "vendor": "customendpoint",
307
+ "apiKey": "<your-dashscope-key>",
308
+ "apiType": "chat-completions",
309
+ "models": [
310
+ {
311
+ "id": "qwen3.7-max",
312
+ "name": "Qwen 3.7 Max",
313
+ "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
314
+ "toolCalling": true,
315
+ "vision": false,
316
+ "streaming": true,
317
+ "requestBody": {
318
+ "enable_thinking": false
319
+ }
320
+ },
321
+ {
322
+ "id": "qwen3.6-plus",
323
+ "name": "Qwen 3.6 Plus",
324
+ "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
325
+ "toolCalling": true,
326
+ "vision": true,
327
+ "streaming": true,
328
+ "requestBody": {
329
+ "enable_thinking": false
330
+ }
331
+ }
332
+ ]
333
+ }
334
+ ```
335
+
336
+ > **Trade-off:** `enable_thinking: false` suppresses reasoning in all requests (both plain chat and tool loops). Tool loops stay stable, but you never see the model's thought process. The [optional proxy](#optional-local-proxy-for-dynamic-thinking) below avoids this trade-off.
284
337
 
285
- #### 2. Start the optional local proxy (recommended)
338
+ #### 3. Chat!
339
+
340
+ - Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
341
+ - Click the model picker (top-right of the chat input).
342
+ - Choose **Qwen 3.6 Plus** (with vision) or **Qwen 3.7 Max** (text only).
343
+ - Ask something. Streaming, tool use, and vision (3.6 Plus) all work.
344
+
345
+ ---
346
+
347
+ #### Optional: Local proxy for dynamic thinking
286
348
 
287
- The proxy dynamically enables thinking in plain chat and disables it during tool calls:
349
+ If you want reasoning visible in plain chat but automatically suppressed during tool loops, run the optional `proxy/qwen-proxy.mjs` instead.
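
To make "dynamic" concrete, here is one way such a decision could be made per request. This is a sketch under assumptions, not the logic of `proxy/qwen-proxy.mjs`: it assumes an OpenAI-style chat-completions body, and it treats a conversation as a tool loop once the history contains a `tool` message or an assistant message with `tool_calls`. The helper names `isToolLoop` and `applyDynamicThinking` are hypothetical.

```javascript
// Sketch only: NOT the actual proxy/qwen-proxy.mjs implementation.
// Assumption: the request body follows the OpenAI chat-completions shape.
function isToolLoop(body) {
  const messages = Array.isArray(body.messages) ? body.messages : [];
  // A tool loop is assumed to have started once tool traffic appears in the history.
  return messages.some(
    (m) => m.role === "tool" || (Array.isArray(m.tool_calls) && m.tool_calls.length > 0)
  );
}

// Returns a copy of the body with thinking on for plain chat and off in tool loops.
function applyDynamicThinking(body) {
  return { ...body, enable_thinking: !isToolLoop(body) };
}
```

With the static `requestBody` setting, `enable_thinking` is `false` for every request; a per-request check like this is what would let plain chat keep its visible reasoning.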
288
350
 
289
- Run Qwen proxy
351
+ Start the proxy:
290
352
 
291
353
  ```bash
292
354
  npm run proxy:qwen
293
355
  ```
294
356
 
295
- Run all proxies
357
+ Or with all proxies:
296
358
 
297
359
  ```bash
298
360
  npm run proxy
299
361
  ```
300
362
 
301
- Run globally (from any directory)
363
+ Or globally (from any directory):
302
364
 
303
365
  ```bash
304
366
  # Qwen only
@@ -307,14 +369,6 @@ npx copilot-custom-endpoint qwen
307
369
  npx copilot-custom-endpoint
308
370
  ```
309
371
 
310
- Clean up debug logs
311
-
312
- ```bash
313
- npm run clean:logs
314
- # or with npx
315
- npx copilot-custom-endpoint clean
316
- ```
317
-
318
372
  You should see:
319
373
 
320
374
  ```
@@ -338,9 +392,7 @@ Expected response:
338
392
  }
339
393
  ```
340
394
 
341
- #### 3. Register the models in VS Code
342
-
343
- Open (or create) your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-dashscope-key>`). Point URLs at the proxy and omit `requestBody` — the proxy handles thinking dynamically:
395
+ Then update your VS Code config to point URLs at the proxy and remove `requestBody` — the proxy handles thinking dynamically:
344
396
 
345
397
  ```json
346
398
  {
@@ -371,59 +423,7 @@ Open (or create) your user config file (see [Config file location](#config-file-
371
423
 
372
424
  > **Keep the proxy terminal open** while using these models.
373
425
 
374
- #### 4. Chat!
375
-
376
- - Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
377
- - Click the model picker (top-right of the chat input).
378
- - Choose **Qwen 3.6 Plus** (with vision) or **Qwen 3.7 Max** (text only).
379
- - Ask something. Streaming, tool use, and vision (3.6 Plus) all work.
380
-
381
- > **Regional endpoints:** If connecting directly (no proxy), DashScope offers endpoints for several regions. The proxy uses `dashscope-intl.aliyuncs.com` (Singapore) by default, configurable via `QWEN_UPSTREAM_URL`.
382
- >
383
- > - **China (Beijing):** `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`
384
- > - **US (Virginia):** `https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions`
385
- > - **Singapore:** `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` (proxy default)
386
- >
387
- > API keys are region-specific.
388
-
389
- #### Direct path (no proxy)
390
-
391
- If you prefer not to run the proxy, Qwen models work **directly** with DashScope by using the upstream URL and a static `enable_thinking: false` in `requestBody`:
392
-
393
- ```json
394
- {
395
- "name": "Qwen",
396
- "vendor": "customendpoint",
397
- "apiKey": "<your-dashscope-key>",
398
- "apiType": "chat-completions",
399
- "models": [
400
- {
401
- "id": "qwen3.7-max",
402
- "name": "Qwen 3.7 Max",
403
- "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
404
- "toolCalling": true,
405
- "vision": false,
406
- "streaming": true,
407
- "requestBody": {
408
- "enable_thinking": false
409
- }
410
- },
411
- {
412
- "id": "qwen3.6-plus",
413
- "name": "Qwen 3.6 Plus",
414
- "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
415
- "toolCalling": true,
416
- "vision": true,
417
- "streaming": true,
418
- "requestBody": {
419
- "enable_thinking": false
420
- }
421
- }
422
- ]
423
- }
424
- ```
425
-
426
- > **Trade-off:** `enable_thinking: false` suppresses reasoning in all requests (both plain chat and tool loops). Tool loops stay stable, but you never see the model's thought process. The proxy path avoids this trade-off.
426
+ The proxy's upstream URL is configurable via the `QWEN_UPSTREAM_URL` environment variable (it defaults to the Singapore endpoint shown in [step 1](#1-grab-a-dashscope-api-key)).
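
A small sketch of how an environment-variable override with a Singapore default could be resolved. It assumes `QWEN_UPSTREAM_URL` holds a full chat-completions URL (the README does not say whether it is a base URL or a full path), and `resolveQwenUpstream` is a hypothetical helper rather than code from the proxy:

```javascript
// Assumption: QWEN_UPSTREAM_URL, when set, is a complete chat-completions URL.
const DEFAULT_QWEN_UPSTREAM =
  "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions";

// Hypothetical helper: picks the override if present, else the Singapore default.
function resolveQwenUpstream(env = process.env) {
  const override = (env.QWEN_UPSTREAM_URL ?? "").trim();
  if (!override) {
    return DEFAULT_QWEN_UPSTREAM;
  }
  // new URL() throws a TypeError on malformed input, so typos fail early.
  return new URL(override).toString();
}
```

Validating the override up front means a mistyped value surfaces when the proxy starts rather than on the first chat request.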
427
427
 
428
428
  #### Troubleshooting (Qwen)
429
429
 
@@ -432,8 +432,13 @@ If you prefer not to run the proxy, Qwen models work **directly** with DashScope
432
432
  | `reasoning_content` errors during tool loops | Ensure `enable_thinking: false` is present in `requestBody` for every Qwen model. |
433
433
  | Vision images fail to upload | Use base64-encoded images; external image URLs may fail if DashScope cannot reach them. |
434
434
 
435
+ </details>
436
+
435
437
  ---
436
438
 
439
+ <details>
440
+ <summary>DeepSeek V4 (VS Code Extension)</summary>
441
+
437
442
  ### DeepSeek V4 (VS Code Extension)
438
443
 
439
444
  DeepSeek V4 Pro & Flash are available via a **dedicated VS Code extension** rather than a raw custom endpoint. The extension plugs DeepSeek directly into Copilot Chat's model picker while preserving agent mode, tool calling, skills, and MCP support.
@@ -475,8 +480,13 @@ DeepSeek V4 is text-only, but the extension handles images automatically — dro
475
480
 
476
481
  > For the full official guide, see: [github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/github_copilot.md](https://github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/github_copilot.md)
477
482
 
483
+ </details>
484
+
478
485
  ---
479
486
 
487
+ <details>
488
+ <summary>Xiaomi MiMo</summary>
489
+
480
490
  ### Xiaomi MiMo
481
491
 
482
492
  MiMo works **directly** — no proxy needed. Just add the provider entry to your VS Code config and select the model in the chat picker.
@@ -563,6 +573,8 @@ Open your user config file (see [Config file location](#config-file-location) ab
563
573
  | 400 error `reasoning_content` during tool loops | Ensure `thinking: { "type": "disabled" }` is present in `requestBody` for every MiMo model. |
564
574
  | Vision images fail to upload | Use `mimo-v2.5` (the only model with native vision). Text-only models (`pro`, `flash`) don't support image input. |
565
575
 
576
+ </details>
577
+
566
578
  ---
567
579
 
568
580
  For the full research notes, tested values, and known limitations, see:
@@ -575,7 +587,7 @@ For the full research notes, tested values, and known limitations, see:
575
587
 
576
588
  > **⏰ June 1, 2026 — GitHub Copilot switched to usage-based billing (AI Credits) today.**
577
589
  >
578
- > Before this change, Copilot was a flat subscription with no per-turn metering, so you could use chat and agent mode as much as you wanted within rate-limit bounds. Now **every interaction burns AI credits** from your monthly allowance. Agent mode and complex multi-file tasks consume significantly more tokens than simple Q&A, which means your 7,000 Pro+ credits can disappear fast if you're using frontier models.
590
+ > Before this change, Copilot used **premium request-based billing**: each model had its own multiplier (e.g., GPT-5.5 = 7.5×, Claude Sonnet 4.6 = 1×, Haiku 4.5 = 0.33×), and every request consumed `multiplier × 1` from your monthly premium-request allowance. Now **every interaction burns AI credits** based on actual token consumption. Agent mode and complex multi-file tasks consume significantly more tokens than simple Q&A, which means your 7,000 Pro+ credits can disappear fast if you're using frontier models.
579
591
  >
580
592
  > **The practical workaround:** use cheaper alternative models (DeepSeek V4 Flash, Kimi K2.6, Qwen) that are still powerful enough for coding — often at **5–55× less cost** than the Copilot defaults. The tables below show the exact comparison.
581
593
  >
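
The premium-request arithmetic described in the note above can be worked through in a few lines. The multipliers are the ones quoted in the note; the request counts are made-up example inputs, and `premiumRequestsUsed` is a hypothetical helper:

```javascript
// Multipliers as quoted in the note above (old premium-request billing model).
const MULTIPLIERS = { "GPT-5.5": 7.5, "Claude Sonnet 4.6": 1, "Haiku 4.5": 0.33 };

// usage: { [modelName]: numberOfRequests }; each request costs multiplier × 1.
function premiumRequestsUsed(usage) {
  return Object.entries(usage).reduce((total, [model, count]) => {
    if (!Object.hasOwn(MULTIPLIERS, model)) {
      throw new Error(`No multiplier known for ${model}`);
    }
    return total + MULTIPLIERS[model] * count;
  }, 0);
}

// Example inputs (illustrative only): 40 × 7.5 + 100 × 1 + 100 × 0.33
const used = premiumRequestsUsed({
  "GPT-5.5": 40,
  "Claude Sonnet 4.6": 100,
  "Haiku 4.5": 100,
});
console.log(used.toFixed(2));
```

Under that model the cost depended only on which model answered and how many requests were sent, not on how many tokens each request consumed, which is the part that changes with AI Credits.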
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "copilot-custom-endpoint",
3
- "version": "1.1.0",
3
+ "version": "1.1.1",
4
4
  "description": "Local proxies for VS Code Copilot custom endpoints — Kimi K2 & Qwen 3.x",
5
5
  "license": "MIT",
6
6
  "type": "module",
@@ -51,4 +51,4 @@
51
51
  "dependencies": {
52
52
  "dotenv": "^17.4.2"
53
53
  }
54
- }
54
+ }