npm - copilot-custom-endpoint - Versions diffs - 1.3.13 → 1.4.0 - Mend

copilot-custom-endpoint 1.3.13 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/README.md CHANGED

@@ -24,9 +24,9 @@ That's it. No code, no servers to manage (unless the model specifically needs th
 | **MiMo V2 Flash**           | Xiaomi    | No                     | ❌           | [Setup](docs/models/mimo.md)                                                                       |
 | **MiMo V2.5**               | Xiaomi    | No                     | ✅           | [Setup](docs/models/mimo.md)                                                                       |
 | **MiMo V2.5 Pro**           | Xiaomi    | No                     | ❌           | [Setup](docs/models/mimo.md)                                                                       |
-| **Kimi K2.6**               | Moonshot  | **Yes**                | ✅           | [Setup](docs/models/kimi.md)                                                                       |
-| **Qwen 3.7 Plus**           | DashScope | Optional               | ✅           | [Setup](docs/models/qwen.md)                                                                       |
-| **Qwen 3.7 Max**            | DashScope | Optional               | ❌           | [Setup](docs/models/qwen.md)                                                                       |
+| **Kimi K2.7 Code / K2.6**   | Moonshot  | **Yes**                | ✅           | [Setup](docs/models/kimi.md)                                                                       |
+| **Qwen 3.7 Plus**           | DashScope | Optional (recommended) | ✅           | [Setup](docs/models/qwen.md)                                                                       |
+| **Qwen 3.7 Max**            | DashScope | Optional (recommended) | ❌           | [Setup](docs/models/qwen.md)                                                                       |
 | **MiniMax M3**              | MiniMax   | No                     | ✅           | [Setup](docs/models/minimax.md)                                                                    |
 | **GLM 5.1**                 | Z.ai      | No                     | ❌           | [Setup](docs/models/glm.md)                                                                        |
 | **GLM 5V Turbo**            | Z.ai      | No                     | ✅           | [Setup](docs/models/glm.md)                                                                        |
@@ -88,16 +88,17 @@ npx copilot-custom-endpoint clean    # Remove debug_log/
 ## Pricing snapshot
-All prices are **USD per 1M tokens** (cache miss). 1 AI credit = $0.01.
+All prices are **USD per 1M tokens** (cache miss). 1 AI credit = $0.01. **MiniMax M3** figures reflect a permanent 50% off list price — see the model doc for the full rate card.
 | Model                        | Input | Output | Context |
 | ---------------------------- | ----- | ------ | ------- |
 | **MiMo V2 Flash** 🏆         | $0.10 | $0.30  | 256K    |
 | **DeepSeek V4 Flash** 🏆     | $0.14 | $0.28  | 1M      |
-| **Kimi K2.6** (non-thinking) | $0.16 | $0.95  | 256K    |
+| **Kimi K2.6** (non-thinking) | $0.16 | $0.95  | 262K    |
+| **Kimi K2.7 Code**           | $0.19 | $4.00  | 262K    |
+| **MiniMax M3**               | $0.30 | $1.20  | 1M      |
 | **MiMo V2.5**                | $0.40 | $2.00  | 1M      |
 | **Qwen 3.7 Plus**            | $0.40 | $1.60  | 1M      |
-| **MiniMax M3**               | $0.60 | $2.40  | 1M      |
 | **MiMo V2.5 Pro**            | $1.00 | $3.00  | 1M      |
 | **GLM 5V Turbo**             | $1.20 | $4.00  | 200K    |
 | **GLM 5.1**                  | $1.40 | $4.40  | 200K    |
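To turn the snapshot into a per-request figure, divide each token count by one million, multiply by the matching rate, and divide the dollar total by $0.01 to get AI credits. The JavaScript sketch below does this for a few rows. The model ids come from the setup docs. The `PRICES_PER_MILLION` map and the `estimateCost` helper are hypothetical, and the sketch ignores cached-input and tiered rates.

```javascript
// Illustrative per-1M-token prices (USD, cache miss) copied from the pricing snapshot.
// MiniMax M3 uses the discounted rate; K2.7 Code output is its thinking-mode rate.
const PRICES_PER_MILLION = {
  "mimo-v2-flash": { input: 0.1, output: 0.3 },
  "kimi-k2.7-code": { input: 0.19, output: 4.0 },
  "MiniMax-M3": { input: 0.3, output: 1.2 },
  "glm-5.1": { input: 1.4, output: 4.4 },
};

// 1 AI credit = $0.01, per the note above the table.
const USD_PER_CREDIT = 0.01;

// Estimate the cost of one request from its token counts.
// Hypothetical helper: ignores cached-input discounts and tiered pricing.
function estimateCost(modelId, inputTokens, outputTokens) {
  const price = PRICES_PER_MILLION[modelId];
  if (!price) {
    throw new Error(`No price entry for model "${modelId}"`);
  }
  const usd =
    (inputTokens / 1e6) * price.input + (outputTokens / 1e6) * price.output;
  return { usd, credits: usd / USD_PER_CREDIT };
}

// Example: a 200K-token prompt with a 20K-token reply on GLM 5.1.
const { usd, credits } = estimateCost("glm-5.1", 200000, 20000);
console.log(`$${usd.toFixed(4)} = ${credits.toFixed(1)} AI credits`);
```

To cover other models, copy their rows into the map.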

package/docs/example-config.md CHANGED

@@ -3,6 +3,8 @@
 Here's a complete, real-world `chatLanguageModels.json` that combines **all the providers documented in this repo**. Copy what you need, leave the rest out.
 > **Note:** The `apiKey` fields are left as empty strings — set them via the **Chat: Manage Language Models** UI (Command Palette → right-click provider group → **Update API Key**). After you set a key via the UI, VS Code replaces the empty string with a `${input:chat.lm.secret.<id>}` secret reference.
+>
+> This combined config mirrors the provider blocks in the checked-in `chatLanguageModels.json`. Qwen is pointed at the local proxy (`:3458`), so its `requestBody.enable_thinking` override is omitted. Remove that key from any direct-path Qwen entry you repoint at the proxy.
 ```json
 [
@@ -15,24 +17,18 @@ Here's a complete, real-world `chatLanguageModels.json` that combines **all the
       {
         "id": "qwen3.7-max",
         "name": "Qwen 3.7 Max (text)",
-        "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
+        "url": "http://127.0.0.1:3458/v1/chat/completions",
         "toolCalling": true,
         "vision": false,
-        "streaming": true,
-        "requestBody": {
-          "enable_thinking": false
-        }
+        "streaming": true
       },
       {
         "id": "qwen3.7-plus",
         "name": "Qwen 3.7 Plus (vision)",
-        "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
+        "url": "http://127.0.0.1:3458/v1/chat/completions",
         "toolCalling": true,
         "vision": true,
-        "streaming": true,
-        "requestBody": {
-          "enable_thinking": false
-        }
+        "streaming": true
       }
     ]
   },
@@ -54,6 +50,20 @@ Here's a complete, real-world `chatLanguageModels.json` that combines **all the
         "streaming": true,
         "maxInputTokens": 262144,
         "maxOutputTokens": 32768
+      },
+      {
+        "id": "kimi-k2.7-code",
+        "name": "Kimi K2.7 Code (vision)",
+        "url": "http://127.0.0.1:3457/v1/chat/completions",
+        "requestBody": {
+          "temperature": 1,
+          "max_tokens": 4096
+        },
+        "toolCalling": true,
+        "vision": true,
+        "streaming": true,
+        "maxInputTokens": 262144,
+        "maxOutputTokens": 4096
       }
     ]
   },
@@ -155,7 +165,6 @@ Here's a complete, real-world `chatLanguageModels.json` that combines **all the
           "top_p": 0.95
         }
       },
       {
         "id": "glm-5v-turbo",
         "name": "GLM 5V Turbo (vision)",
@@ -180,7 +189,7 @@ Here's a complete, real-world `chatLanguageModels.json` that combines **all the
 If you only need one provider, jump straight to its setup guide:
-- [Kimi K2.6](kimi.md)
+- [Kimi K2.6 / K2.7 Code](kimi.md)
 - [Qwen 3.7 Plus / 3.7 Max](qwen.md)
 - [Xiaomi MiMo (V2.5 / V2.5 Pro / V2 Flash)](mimo.md)
 - [MiniMax M3](minimax.md)
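The note above says Qwen entries drop `requestBody.enable_thinking` once they point at the proxy. The sketch below shows that conversion, assuming you edit the config programmatically. The `toQwenProxyEntry` helper is hypothetical. It swaps the URL for the local proxy on `:3458`, deletes the static override, and drops `requestBody` if nothing else is left in it.

```javascript
// Local Qwen proxy URL used throughout this changeset.
const QWEN_PROXY_URL = "http://127.0.0.1:3458/v1/chat/completions";

// Hypothetical helper: repoint a direct-path Qwen model entry at the local proxy.
function toQwenProxyEntry(entry) {
  // Shallow-copy so the caller's object is left untouched.
  const out = { ...entry, url: QWEN_PROXY_URL };
  if (entry.requestBody) {
    const body = { ...entry.requestBody };
    // The proxy manages enable_thinking per request, so the static override goes.
    delete body.enable_thinking;
    if (Object.keys(body).length > 0) {
      out.requestBody = body;
    } else {
      delete out.requestBody;
    }
  }
  return out;
}

// The 1.3.13 direct-path entry for Qwen 3.7 Max.
const direct = {
  id: "qwen3.7-max",
  name: "Qwen 3.7 Max (text)",
  url: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
  toolCalling: true,
  vision: false,
  streaming: true,
  requestBody: { enable_thinking: false },
};

console.log(JSON.stringify(toQwenProxyEntry(direct), null, 2));
```

For this input, the printed entry has the same shape as the 1.4.0 `qwen3.7-max` block in the diff above.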

package/docs/models/glm.md CHANGED

@@ -10,7 +10,7 @@
 | Vision                 | ✅ Yes (`glm-5v-turbo` only)                            |
 | Tool calling           | ✅ Yes (native multimodal tool use on `glm-5v-turbo`)   |
 | Context (flagship)     | 200K (`glm-5.1` / `glm-5v-turbo`)                       |
-| Max output (flagship)  | 128K                                                    |
+| Max output (flagship)  | 131072                                                  |
 | Required `requestBody` | `thinking: { type: "enabled" }` (recommended)           |
 | Endpoint (intl)        | `https://api.z.ai/api/paas/v4/chat/completions`         |
 | Endpoint (China)       | `https://open.bigmodel.cn/api/paas/v4/chat/completions` |
@@ -20,8 +20,8 @@
 | Model          | Vision | Context | Max output | Thinking  | Cost (in / out per 1M) | Role                                                      |
 | -------------- | ------ | ------- | ---------- | --------- | ---------------------- | --------------------------------------------------------- |
-| `glm-5.1`      | ❌     | 200K    | 128K       | `enabled` | $1.40 / $4.40          | Current flagship — long-horizon / 8h autonomous work      |
-| `glm-5v-turbo` | ✅     | 200K    | 128K       | `enabled` | $1.20 / $4.00          | Multimodal **coding** model — vision-based agentic coding |
+| `glm-5.1`      | ❌     | 200K    | 131072     | `enabled` | $1.40 / $4.40          | Current flagship — long-horizon / 8h autonomous work      |
+| `glm-5v-turbo` | ✅     | 200K    | 131072     | `enabled` | $1.20 / $4.00          | Multimodal **coding** model — vision-based agentic coding |
 > Other GLM models — `glm-5`, `glm-5-turbo`, `glm-4.6v-flashx`, `glm-4.5`, `glm-4.5-air`, `glm-4.5-flash`, `glm-4.5-x`, `glm-4.5-airx`, `glm-4-32b-0414-128k` — are callable on the same endpoint but are intentionally **not** added to the default `chatLanguageModels.json` block below. Add them in the same shape if you need them. Note: `glm-4.6v-flashx` was previously in the default block but has been **removed** because live testing showed it is not reliable for tool calling.
@@ -54,7 +54,7 @@ Config file location:
   "models": [
     {
       "id": "glm-5.1",
-      "name": "GLM 5.1 (flagship)",
+      "name": "GLM 5.1 (text)",
       "url": "https://api.z.ai/api/paas/v4/chat/completions",
       "toolCalling": true,
       "vision": false,
@@ -69,7 +69,7 @@ Config file location:
     },
     {
       "id": "glm-5v-turbo",
-      "name": "GLM 5V Turbo (vision flagship)",
+      "name": "GLM 5V Turbo (vision)",
       "url": "https://api.z.ai/api/paas/v4/chat/completions",
       "toolCalling": true,
       "vision": true,

package/docs/models/kimi.md CHANGED

@@ -1,19 +1,36 @@
 # Kimi — VS Code Custom Endpoint Setup Guide
-> **TL;DR:** Kimi K2.6 requires the local proxy. The K2 family locks `temperature: 1` and `top_p: 0.95`, and requires `thinking: { type: "disabled" }` on tool turns. The proxy rewrites sampling values, suppresses thinking on tool turns, and preserves streaming. Direct VS Code → Moonshot integration is not viable in this environment.
+> **TL;DR:** Kimi models require the local proxy. The K2 family locks `temperature: 1` and `top_p: 0.95`. K2.6 requires `thinking: { type: "disabled" }` on tool turns; **K2.7 Code is always-thinking and rejects `thinking: disabled`**, so the proxy detects `kimi-k2.7*` and skips that rewrite while keeping sampling enforcement. Direct VS Code → Moonshot integration is not viable in this environment.
 ## At a Glance
-| Field                  | Value                                         |
-| ---------------------- | --------------------------------------------- |
-| Mode                   | **Proxy required** (local on `:3457`)         |
-| Vision                 | ✅ Yes                                        |
-| Tool calling           | ✅ Yes (proxy forces `thinking: disabled`)    |
-| Context                | 256K                                          |
-| Max output             | 32K                                           |
-| Required `requestBody` | `temperature: 1`                              |
-| Upstream endpoint      | `https://api.moonshot.ai/v1/chat/completions` |
-| Proxy endpoint         | `http://127.0.0.1:3457/v1/chat/completions`   |
+| Field             | Value                                         |
+| ----------------- | --------------------------------------------- |
+| Mode              | **Proxy required** (local on `:3457`)         |
+| Vision            | ✅ Yes                                        |
+| Tool calling      | ✅ Yes                                        |
+| Upstream endpoint | `https://api.moonshot.ai/v1/chat/completions` |
+| Proxy endpoint    | `http://127.0.0.1:3457/v1/chat/completions`   |
+### K2.6
+| Field                  | Value                                |
+| ---------------------- | ------------------------------------ |
+| Model id               | `kimi-k2.6`                          |
+| Context                | 262K                                 |
+| Max output             | 32768                                |
+| Required `requestBody` | `temperature: 1`                     |
+| Tool calling           | ✅ Proxy forces `thinking: disabled` |
+### K2.7 Code
+| Field                  | Value                                                      |
+| ---------------------- | ---------------------------------------------------------- |
+| Model id               | `kimi-k2.7-code`                                           |
+| Context                | 262K                                                       |
+| Max output             | 4096                                                       |
+| Required `requestBody` | `temperature: 1`, `max_tokens: 4096`                       |
+| Tool calling           | ✅ Proxy lets K2.7 think (it rejects `thinking: disabled`) |
 ## Quick Start
@@ -23,7 +40,7 @@
    - `npx copilot-custom-endpoint` (also starts the Qwen proxy concurrently)
 2. **Edit `chatLanguageModels.json`** — add the Kimi block from [Setup](#setup) below.
 3. **Set your Moonshot API key** via the Command Palette → **Chat: Manage Language Models**.
-4. **Restart VS Code** and pick "Kimi K2.6" in the chat picker.
+4. **Restart VS Code** and pick "Kimi K2.6" or "Kimi K2.7 Code" in the chat picker.
 ## Setup
@@ -46,7 +63,7 @@ Config file location:
   "models": [
     {
       "id": "kimi-k2.6",
-      "name": "Kimi K2.6",
+      "name": "Kimi K2.6 (vision)",
       "url": "http://127.0.0.1:3457/v1/chat/completions",
       "requestBody": {
         "temperature": 1
@@ -56,11 +73,27 @@ Config file location:
       "streaming": true,
       "maxInputTokens": 262144,
       "maxOutputTokens": 32768
+    },
+    {
+      "id": "kimi-k2.7-code",
+      "name": "Kimi K2.7 Code (vision)",
+      "url": "http://127.0.0.1:3457/v1/chat/completions",
+      "requestBody": {
+        "temperature": 1,
+        "max_tokens": 4096
+      },
+      "toolCalling": true,
+      "vision": true,
+      "streaming": true,
+      "maxInputTokens": 262144,
+      "maxOutputTokens": 4096
     }
   ]
 }
 ```
+> **K2.7 note:** `max_tokens` and `maxOutputTokens` are intentionally conservative at **4096**. K2.7 is always-thinking, so reasoning tokens inflate response size. During validation, values in the 24K–32K range triggered VS Code's "Response too long" error in agent mode, while 4096 worked. Raise this only after testing your specific workload.
 ### 2. API key
 1. Open the Command Palette (`Ctrl+Shift+P`).
@@ -84,15 +117,15 @@ Config file location:
 All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/config'` automatically).
-| Variable                                    | Default                                               | Purpose                                                 |
-| ------------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------- |
-| `KIMI_PROXY_PORT`                           | `3457` (falls back to `PORT`)                         | Local listen port                                       |
-| `KIMI_UPSTREAM_URL`                         | `https://api.moonshot.ai/v1/chat/completions`         | Upstream Moonshot endpoint                              |
-| `KIMI_PROXY_FORCE_TEMPERATURE`              | `1`                                                   | Temperature for thinking-mode requests                  |
-| `KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE` | `0.6`                                                 | Temperature when thinking is disabled (tool requests)   |
-| `KIMI_PROXY_FORCE_TOP_P`                    | `0.95`                                                | `top_p` forced into request body                        |
-| `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS`    | `1`                                                   | Force `thinking={"type":"disabled"}` when tools present |
-| `KIMI_PROXY_LOG`                            | `debug_log/kimi-proxy.ndjson` (relative to repo root) | Redacted NDJSON log path                                |
+| Variable                                    | Default                                                  | Purpose                                                                    |
+| ------------------------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------- |
+| `KIMI_PROXY_PORT`                           | `3457` (falls back to `PORT`)                            | Local listen port                                                          |
+| `KIMI_UPSTREAM_URL`                         | `https://api.moonshot.ai/v1/chat/completions`            | Upstream Moonshot endpoint                                                 |
+| `KIMI_PROXY_FORCE_TEMPERATURE`              | `1`                                                      | Temperature for thinking-mode requests                                     |
+| `KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE` | `0.6`                                                    | Temperature when thinking is disabled (tool requests)                      |
+| `KIMI_PROXY_FORCE_TOP_P`                    | `0.95`                                                   | `top_p` forced into request body                                           |
+| `KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS`    | `1`                                                      | Force `thinking={"type":"disabled"}` when tools present (skipped for K2.7) |
+| `KIMI_PROXY_LOG`                            | `debug_log/kimi-proxy.ndjson` (relative to proxy script) | Redacted NDJSON log path                                                   |
 #### Health check response
@@ -110,7 +143,8 @@ All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/c
 - Forwards the existing `Authorization` header upstream.
 - Rewrites plain-chat requests to `temperature: 1` and `top_p: 0.95`.
-- Rewrites tool-enabled requests to `thinking: {"type": "disabled"}`, `temperature: 0.6`, and `top_p: 0.95`.
+- For **K2.5/K2.6**: rewrites tool-enabled requests to `thinking: {"type": "disabled"}`, `temperature: 0.6`, and `top_p: 0.95`.
+- For **K2.7 Code**: keeps thinking enabled (K2.7 rejects `thinking: disabled` with HTTP 400) and rewrites to `temperature: 1`, `top_p: 0.95`.
 - Preserves streaming responses.
 - Writes redacted request summaries to `debug_log/kimi-proxy.ndjson`.
@@ -125,10 +159,11 @@ All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/c
 ### Thinking mode
-| Turn type    | Behavior                                                    |
-| ------------ | ----------------------------------------------------------- |
-| Plain chat   | Thinking enabled, `temperature: 1`                          |
-| Tool-enabled | `thinking: { type: "disabled" }` forced, `temperature: 0.6` |
+| Model       | Turn type    | Behavior                                                    |
+| ----------- | ------------ | ----------------------------------------------------------- |
+| K2.5 / K2.6 | Plain chat   | Thinking enabled, `temperature: 1`, `top_p: 0.95`           |
+| K2.5 / K2.6 | Tool-enabled | `thinking: { type: "disabled" }` forced, `temperature: 0.6`, `top_p: 0.95` |
+| K2.7 Code   | All turns    | Always-thinking, `temperature: 1`, `top_p: 0.95`            |
 ### Capabilities
@@ -151,12 +186,14 @@ All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/c
 ## Pricing
-For the cross-provider comparison, see [docs/pricing.md](../pricing.md). Kimi K2.6 on the **Moonshot direct platform**:
+For the cross-provider comparison, see [docs/pricing.md](../pricing.md). Kimi models on the **Moonshot direct platform**:
-| Model       | Input      | Output (non-thinking) | Output (thinking) |
-| ----------- | ---------- | --------------------- | ----------------- |
-| `kimi-k2.6` | $0.16 / 1M | $0.95 / 1M            | $4.00 / 1M        |
+| Model            | Input      | Cached input | Output (non-thinking) | Output (thinking) |
+| ---------------- | ---------- | ------------ | --------------------- | ----------------- |
+| `kimi-k2.6`      | $0.16 / 1M | —            | $0.95 / 1M            | $4.00 / 1M        |
+| `kimi-k2.7-code` | $0.19 / 1M | $0.95 / 1M   | —                     | $4.00 / 1M        |
+> **K2.7:** Always-thinking, so there is no non-thinking output rate. Cached input pricing applies.
 > Via DashScope, K2.6 is also available at $0.89 / 1M input and $3.71 / 1M output (same model, regional pricing).
 ---
@@ -213,11 +250,25 @@ The model-level `requestBody.temperature = 1` override validated locally but was
 - Redacted proxy logs confirmed `temperature 0.1 -> 1` and `top_p 1 -> 0.95` for plain-chat requests.
 - Redacted proxy logs later confirmed `thinking undefined -> disabled` and `temperature 0.1 -> 0.6` for tool-enabled requests.
+### K2.7 Code validation results (June 14, 2026)
+| Check                                                 | Result                                     |
+| ----------------------------------------------------- | ------------------------------------------ |
+| `GET /v1/models` — slug confirmed                     | ✅ `kimi-k2.7-code`                        |
+| Plain chat via proxy                                  | ✅                                         |
+| Tool turn with `thinking: disabled`                   | ❌ HTTP 400 — rejected by model            |
+| Tool turn letting K2.7 think                          | ✅                                         |
+| Two-turn tool loop via proxy                          | ✅ No `reasoning_content is missing` error |
+| VS Code Agent mode — integrated browser opened Google | ✅                                         |
+| `maxOutputTokens` 24K–32K in agent mode               | ❌ VS Code "Response too long"             |
+| `maxOutputTokens` 4096 in agent mode                  | ✅                                         |
 ### Final verdict
 - Acceptable for plain chat: **yes** (proxy)
 - Acceptable for streaming chat: **yes** (proxy)
 - Acceptable for tool-enabled agent use: **yes**, with the local proxy workaround
+- K2.7 specifically: **yes**, but keep `maxOutputTokens` low (4096 validated) to avoid VS Code's response-size limit
 - Acceptable without a proxy: **no**
 ## References
@@ -233,3 +284,4 @@ The model-level `requestBody.temperature = 1` override validated locally but was
 - Kimi web search guide: `https://platform.kimi.ai/docs/guide/use-web-search.md`
 - Kimi coding tools / agent guide: `https://platform.kimi.ai/docs/guide/agent-support.md`
 - Kimi K2.6 pricing: `https://platform.kimi.ai/docs/pricing/chat-k26`
+- Kimi K2.7 Code pricing: `https://platform.kimi.ai/docs/pricing/chat-k27-code`
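The proxy rules described for K2.5/K2.6 and K2.7 Code can be summarized as one rewrite function. The JavaScript sketch below is a hypothetical reconstruction, not the proxy's actual source. `readKimiConfig` and `rewriteKimiBody` are illustrative names, and the env defaults mirror the variables table. A request is assumed to be tool-enabled when it carries a non-empty `tools` array.

```javascript
// Hypothetical sketch of the Kimi proxy's rewrite rules, not the shipped proxy code.
// Assumption: a request is "tool-enabled" when it carries a non-empty `tools` array.

// Read the documented env variables, falling back to the documented defaults.
function readKimiConfig(env) {
  return {
    port: Number(env.KIMI_PROXY_PORT || env.PORT || 3457),
    temperature: Number(env.KIMI_PROXY_FORCE_TEMPERATURE || 1),
    nonThinkingTemperature: Number(env.KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE || 0.6),
    topP: Number(env.KIMI_PROXY_FORCE_TOP_P || 0.95),
    disableThinkingWithTools: (env.KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS || "1") === "1",
  };
}

// Return a rewritten copy of an OpenAI-style chat request body.
function rewriteKimiBody(body, cfg) {
  const out = { ...body, top_p: cfg.topP }; // top_p is forced on every turn
  const hasTools = Array.isArray(body.tools) && body.tools.length > 0;
  // K2.7 Code is always-thinking and rejects thinking: disabled with HTTP 400.
  const isK27 = typeof body.model === "string" && body.model.startsWith("kimi-k2.7");

  if (hasTools && cfg.disableThinkingWithTools && !isK27) {
    // K2.5 / K2.6 tool turn: force non-thinking mode and its temperature.
    out.thinking = { type: "disabled" };
    out.temperature = cfg.nonThinkingTemperature;
  } else {
    // Plain chat, or any K2.7 turn: thinking is left alone and the temperature
    // is forced to KIMI_PROXY_FORCE_TEMPERATURE (default 1).
    out.temperature = cfg.temperature;
  }
  return out;
}

const cfg = readKimiConfig(process.env);
const toolTurn = { model: "kimi-k2.6", temperature: 0.1, tools: [{ type: "function" }] };
console.log(rewriteKimiBody(toolTurn, cfg));
```

With default settings, the example tool turn returns `thinking: { type: "disabled" }`, `temperature: 0.6`, and `top_p: 0.95`. That matches the `temperature 0.1 -> 0.6` rewrite recorded in the validation logs. The same body with `model: "kimi-k2.7-code"` keeps `temperature: 1` and gains no `thinking` field.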

package/docs/models/mimo.md CHANGED

@@ -10,7 +10,7 @@
 | Vision                 | ✅ Yes (`mimo-v2.5` only)                        |
 | Tool calling           | ✅ Yes (with `thinking: disabled`)               |
 | Context                | 1M (V2.5 Pro / V2.5) / 256K (V2 Flash)           |
-| Max output             | 128K (V2.5 Pro) / 32K (V2.5) / 64K (V2 Flash)    |
+| Max output             | 131072 (V2.5 Pro) / 32768 (V2.5) / 65536 (V2 Flash) |
 | Required `requestBody` | `thinking: { type: "disabled" }`                 |
 | Endpoint               | `https://api.xiaomimimo.com/v1/chat/completions` |
@@ -51,7 +51,7 @@ Config file location:
   "models": [
     {
       "id": "mimo-v2.5-pro",
-      "name": "MiMo V2.5 Pro",
+      "name": "MiMo V2.5 Pro (text)",
       "url": "https://api.xiaomimimo.com/v1/chat/completions",
       "toolCalling": true,
       "vision": false,
@@ -66,7 +66,7 @@ Config file location:
     },
     {
       "id": "mimo-v2.5",
-      "name": "MiMo V2.5",
+      "name": "MiMo V2.5 (vision)",
       "url": "https://api.xiaomimimo.com/v1/chat/completions",
       "toolCalling": true,
       "vision": true,
@@ -81,7 +81,7 @@ Config file location:
     },
     {
       "id": "mimo-v2-flash",
-      "name": "MiMo V2 Flash",
+      "name": "MiMo V2 Flash (text)",
       "url": "https://api.xiaomimimo.com/v1/chat/completions",
       "toolCalling": true,
       "vision": false,

package/docs/models/minimax.md CHANGED

@@ -10,7 +10,7 @@
 | Vision                   | ✅ Yes (image + video)                                  |
 | Tool calling             | ✅ Yes                                                  |
 | Context                  | 1M (guaranteed 512K)                                    |
-| Max output               | 512K (recommended 128K)                                 |
+| Max output               | 131072                                                  |
 | Required `requestBody`   | `thinking: { type: "adaptive" }, reasoning_split: true` |
 | Endpoint (international) | `https://api.minimax.io/v1/chat/completions`            |
 | Endpoint (China)         | `https://api.minimaxi.com/v1/chat/completions`          |
@@ -42,7 +42,7 @@ Config file location:
   "models": [
     {
       "id": "MiniMax-M3",
-      "name": "MiniMax M3",
+      "name": "MiniMax M3 (vision)",
       "url": "https://api.minimax.io/v1/chat/completions",
       "toolCalling": true,
       "vision": true,
@@ -149,7 +149,7 @@ For the cross-provider comparison, see [docs/pricing.md](../pricing.md). MiniMax
 \* Input tokens above 512K are available in limited quantity for a limited time.
-> **Promo:** A 7-day 50% off promotion is available for new accounts, making the ≤ 512K tier effectively $0.30 / 1M input and $1.20 / 1M output for the first week.
+> **Permanent 50% off:** A standing 50% discount applies to all MiniMax-M3 pay-as-you-go usage on both the Standard and Priority tiers (verified June 9, 2026). The effective rates are $0.30 / 1M input, $1.20 / 1M output, and $0.06 / 1M cached input (≤ 512K tier).
 ### Token Plan (subscription)

package/docs/models/qwen.md CHANGED

@@ -1,19 +1,19 @@
 # Qwen (DashScope) — VS Code Custom Endpoint Setup Guide
-> **TL;DR:** Direct path works for `qwen3.7-plus` (vision) and `qwen3.7-max` (text-only) without a proxy. The optional `proxy/qwen-proxy.mjs` adds dynamic thinking suppression: reasoning stays ON in plain chat but turns OFF automatically when tools are invoked. Pick the mode that matches your tradeoff.
+> **TL;DR:** The checked-in `chatLanguageModels.json` routes `qwen3.7-plus` (vision) and `qwen3.7-max` (text-only) through the local `proxy/qwen-proxy.mjs` for dynamic thinking suppression: reasoning stays ON in plain chat but turns OFF automatically when tools are invoked. A direct DashScope path with static `enable_thinking: false` is also supported if you prefer not to run the proxy.
 ## At a Glance
 | Field                           | Value                                                                     |
 | ------------------------------- | ------------------------------------------------------------------------- |
-| Mode                            | **Direct** (no proxy) **or** **Proxy** (optional, for dynamic thinking)   |
-| Vision                          | ✅ Yes (`qwen3.7-plus`)                                                   |
-| Tool calling                    | ✅ Yes                                                                    |
-| Context                         | 1M                                                                        |
-| Required `requestBody` (direct) | `enable_thinking: false`                                                  |
-| Required `requestBody` (proxy)  | none — proxy injects based on tool activity in the conversation           |
-| Endpoint                        | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` |
-| Proxy endpoint                  | `http://127.0.0.1:3458/v1/chat/completions`                               |
+| Mode                            | **Proxy** (local on `:3458`) **or** **Direct** (static `enable_thinking: false`) |
+| Vision                          | ✅ Yes (`qwen3.7-plus`)                                                          |
+| Tool calling                    | ✅ Yes                                                                           |
+| Context                         | 1M                                                                               |
+| Required `requestBody` (direct) | `enable_thinking: false`                                                         |
+| Required `requestBody` (proxy)  | none — proxy injects based on tool activity in the conversation                  |
+| Endpoint                        | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions`        |
+| Proxy endpoint                  | `http://127.0.0.1:3458/v1/chat/completions`                                      |
 ### Models at a glance
@@ -22,15 +22,9 @@
 | `qwen3.7-plus` | ✅ Yes | Primary model with image understanding |
 | `qwen3.7-max`  | ❌ No  | Larger text-only model                 |
-> The snapshot `qwen3.7-plus-2026-05-26` is also available; the floating `qwen3.7-plus` alias is preferred.
+> The live `chatLanguageModels.json` points Qwen models at the local proxy by default; the direct DashScope URL is shown for users who prefer a static `enable_thinking: false` setup.
-## Quick Start — Direct Path (Recommended for Simplicity)
-1. **Edit `chatLanguageModels.json`** — add the Qwen block from [Setup § Direct](#direct-path) below.
-2. **Set your `DASHSCOPE_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
-3. **Restart VS Code** and pick "Qwen 3.7 Plus" or "Qwen 3.7 Max".
-## Quick Start — With Proxy (Dynamic Thinking)
+## Quick Start — With Proxy (Recommended)
 1. **Start the proxy** — choose one:
    - `npm run proxy:qwen` (from the repo root)
@@ -40,6 +34,12 @@
 3. **Set your DashScope API key** via the Language Models UI.
 4. **Restart VS Code.** Reasoning will be visible in plain chat and suppressed on tool turns.
+## Quick Start — Direct Path (No Proxy)
+1. **Edit `chatLanguageModels.json`** — add the Qwen block from [Setup § Direct](#direct-path) below.
+2. **Set your `DASHSCOPE_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
+3. **Restart VS Code** and pick "Qwen 3.7 Plus" or "Qwen 3.7 Max".
 ## Setup
 ### Regional endpoints
@@ -63,7 +63,7 @@ DashScope is region-specific — your API key only works on the endpoint it was
   "models": [
     {
       "id": "qwen3.7-max",
-      "name": "Qwen 3.7 Max",
+      "name": "Qwen 3.7 Max (text)",
       "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
       "toolCalling": true,
       "vision": false,
@@ -74,7 +74,7 @@ DashScope is region-specific — your API key only works on the endpoint it was
     },
     {
       "id": "qwen3.7-plus",
-      "name": "Qwen 3.7 Plus",
+      "name": "Qwen 3.7 Plus (vision)",
       "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
       "toolCalling": true,
       "vision": true,
@@ -89,6 +89,8 @@ DashScope is region-specific — your API key only works on the endpoint it was
 > **`enable_thinking: false`** suppresses the Qwen3 family's default thinking mode, which prevents `reasoning_content` issues during tool loops.
+> **Live config note:** The checked-in `chatLanguageModels.json` points Qwen at the local proxy (`http://127.0.0.1:3458`) with no `requestBody` override, so the proxy manages `enable_thinking` dynamically. Use the snippet above only if you are not running the proxy.
 ### Proxy path
 #### 1. Start the proxy
@@ -132,7 +134,7 @@ Expected response:
   "models": [
     {
       "id": "qwen3.7-max",
-      "name": "Qwen 3.7 Max",
+      "name": "Qwen 3.7 Max (text)",
       "url": "http://127.0.0.1:3458/v1/chat/completions",
       "toolCalling": true,
       "vision": false,
@@ -140,7 +142,7 @@ Expected response:
     },
     {
       "id": "qwen3.7-plus",
-      "name": "Qwen 3.7 Plus",
+      "name": "Qwen 3.7 Plus (vision)",
       "url": "http://127.0.0.1:3458/v1/chat/completions",
       "toolCalling": true,
       "vision": true,
@@ -160,7 +162,7 @@ All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/c
 | ---------------------------------------- | ------------------------------------------------------------------------- | -------------------------------------------------- |
 | `QWEN_PROXY_PORT`                        | `3458` (falls back to `PORT`)                                             | Local listen port                                  |
 | `QWEN_UPSTREAM_URL`                      | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` | Upstream DashScope endpoint                        |
-| `QWEN_PROXY_LOG`                         | `debug_log/qwen-proxy.ndjson` (relative to repo root)                     | Redacted NDJSON log path                           |
+| `QWEN_PROXY_LOG`                         | `debug_log/qwen-proxy.ndjson` (relative to proxy script)                  | Redacted NDJSON log path                           |
 | `QWEN_PROXY_DISABLE_THINKING_WITH_TOOLS` | `1`                                                                       | Set to `0` to skip tool-aware thinking suppression |
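The precedence in the table can be sketched as a small resolver. This is an illustration, not the proxy's code. `resolveQwenProxyConfig` is a hypothetical name, and it takes the environment as a plain object (you would pass `process.env` in practice). It also assumes that only the literal string `0` turns tool-aware suppression off.

```javascript
// Hypothetical resolver for the Qwen proxy environment variables in the table.
// Precedence and defaults follow the table; the name and shape are illustrative.
const DEFAULT_QWEN_UPSTREAM =
  'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions'

function resolveQwenProxyConfig(env) {
  // QWEN_PROXY_PORT wins, then PORT, then the documented default 3458.
  const rawPort = env.QWEN_PROXY_PORT ?? env.PORT ?? '3458'
  const port = Number.parseInt(rawPort, 10)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid proxy port: ${rawPort}`)
  }
  return {
    port,
    upstreamUrl: env.QWEN_UPSTREAM_URL ?? DEFAULT_QWEN_UPSTREAM,
    // The default is relative to the proxy script; resolving it is left to the caller.
    logPath: env.QWEN_PROXY_LOG ?? 'debug_log/qwen-proxy.ndjson',
    // Assumption: only the literal '0' disables tool-aware thinking suppression.
    disableThinkingWithTools:
      (env.QWEN_PROXY_DISABLE_THINKING_WITH_TOOLS ?? '1') !== '0',
  }
}

const cfg = resolveQwenProxyConfig({ PORT: '9000' })
console.log(cfg.port, cfg.disableThinkingWithTools) // 9000 true
```

Taking `env` as a parameter instead of reading `process.env` directly makes the resolver easy to exercise with different variable combinations.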
 #### Proxy request rewriting rules
@@ -175,6 +177,8 @@ The proxy detects active tool use by examining the conversation state, not just
 > **Why delete rather than set `true`?** Omitting the key lets Qwen use its built-in default (`true`). Deletion is closer to "don't interfere."
 >
 > **Why not check `body.tools`?** The proxy checks for tool _activity_ — tool results in the message history or an explicit `tool_choice` directive — rather than the mere presence of a tools array. This correctly handles tool-enabled conversations even when the client sends `tools` in an earlier request but omits it from subsequent turns.
+>
+> **Proxy vs. direct:** The live config uses the proxy URL with no `requestBody` override so this dynamic behavior is applied to every request. The direct-path snippet above keeps `enable_thinking: false` static in `requestBody` as a no-proxy alternative.
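The rule above can be sketched as a pure function over the request body. This is a minimal illustration under assumptions, not the proxy's implementation. `applyQwenThinkingRule` is a hypothetical name. "Tool activity" is taken to mean at least one `role: "tool"` message in `messages` or a `tool_choice` other than `"none"`. The `disableThinkingWithTools` flag stands in for `QWEN_PROXY_DISABLE_THINKING_WITH_TOOLS`.

```javascript
// Sketch of the tool-aware enable_thinking rule (hypothetical helper).
// Assumption: tool activity = a role:"tool" message in the history, or a
// tool_choice that is set and not "none".
function applyQwenThinkingRule(body, disableThinkingWithTools = true) {
  const messages = Array.isArray(body.messages) ? body.messages : []
  const hasToolResults = messages.some((m) => m?.role === 'tool')
  const toolChoice = body.tool_choice
  const hasToolChoice =
    toolChoice !== undefined && toolChoice !== null && toolChoice !== 'none'

  // Shallow copy so the caller's object is left untouched.
  const rewritten = { ...body }
  if (disableThinkingWithTools && (hasToolResults || hasToolChoice)) {
    // Tool turn: inject enable_thinking: false to keep reasoning_content out of the loop.
    rewritten.enable_thinking = false
  } else {
    // Plain chat (or suppression disabled): omit the key so Qwen uses its default.
    delete rewritten.enable_thinking
  }
  return rewritten
}

// list_files is a made-up tool name used only for this example.
const toolTurn = applyQwenThinkingRule({
  model: 'qwen3.7-plus',
  messages: [
    { role: 'user', content: 'List the repo files.' },
    {
      role: 'assistant',
      content: null,
      tool_calls: [{ id: 'call_1', type: 'function', function: { name: 'list_files', arguments: '{}' } }],
    },
    { role: 'tool', tool_call_id: 'call_1', content: '["README.md"]' },
  ],
})
console.log(toolTurn.enable_thinking) // false
```

Copying the body before editing it keeps the caller's object intact, so the incoming and rewritten payloads can be logged side by side.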
 ### API key
@@ -197,6 +201,8 @@ The Qwen3 hybrid-thinking models default to `enable_thinking: true`, producing `
 | Proxy path          | Thinking ON (default preserved) | Thinking OFF (auto-injected)  |
 | No config (default) | Thinking ON                     | Risk: history may be rejected |
+> The live `chatLanguageModels.json` uses the proxy path by default, so plain-chat reasoning is visible and tool turns are stable.
 ### Vision (`qwen3.7-plus`)
 - Image input via OpenAI-compatible `content` array format (base64 data URIs).
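As an illustration of the vision input format, the sketch below builds a user message whose `content` array pairs a text part with an `image_url` part carrying a base64 data URI. The helper `buildVisionMessage` and its `image/png` default are assumptions. The part shapes follow the standard OpenAI chat-completions format. The sketch relies on Node's global `Buffer`.

```javascript
// Builds a user message in the OpenAI-compatible content-array format with a
// base64 data URI. The helper name and the image/png default are assumptions.
function buildVisionMessage(prompt, imageBytes, mimeType = 'image/png') {
  const base64 = Buffer.from(imageBytes).toString('base64')
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'image_url', image_url: { url: `data:${mimeType};base64,${base64}` } },
    ],
  }
}

// In practice the bytes would come from something like fs.readFileSync('shot.png').
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47])
const visionRequest = {
  model: 'qwen3.7-plus',
  messages: [buildVisionMessage('What does this screenshot show?', pngSignature)],
}
console.log(visionRequest.messages[0].content[1].image_url.url) // data:image/png;base64,iVBORw==
```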

package/docs/pricing.md CHANGED

@@ -47,6 +47,7 @@ These are the models available through GitHub Copilot's model roster as of June
 | **MiMo V2 Flash**     | Xiaomi    | $0.10                         | $0.01                         | $0.30                                   | 256K           |
 | **DeepSeek V4 Flash** | DeepSeek  | $0.14                         | $0.0028                       | $0.28                                   | 1M             |
 | **Kimi K2.6**         | Moonshot  | $0.16                         | —                             | $0.95 (non-thinking) / $4.00 (thinking) | 256K           |
+| **Kimi K2.7 Code**    | Moonshot  | $0.19                         | $0.95                         | $4.00                                   | 262K           |
 | **Qwen 3.7 Plus**     | DashScope | $0.40 (≤256K) / $1.20 (>256K) | —                             | $1.60 (≤256K) / $4.80 (>256K)           | 1M             |
 | **MiMo V2.5**         | Xiaomi    | $0.40                         | $0.08                         | $2.00                                   | 1M             |
 | **DeepSeek V4 Pro**   | DeepSeek  | $0.435                        | $0.003625                     | $0.87                                   | 1M             |
@@ -66,7 +67,7 @@ These are the models available through GitHub Copilot's model roster as of June
 > - **Qwen** models use **tiered pricing** — determined by total input tokens per request. Prices above are for non-thinking mode.
 > - **Kimi K2.6** pricing is from the **Moonshot platform** (direct). Via DashScope: $0.89 input / $3.71 output.
 > - **DashScope** offers a **free quota** of 1M input + 1M output tokens per model, valid for 90 days.
-> - **MiniMax M3** uses **tiered pricing** — input price doubles above 512K input tokens. Cache hits are priced at 20% of the input rate ($0.12/M ≤512K, $0.24/M >512K). A 7-day 50% off promotion is available for new accounts.
+> - **MiniMax M3** uses **tiered pricing** — input price doubles above 512K input tokens. Cache hits are priced at 20% of the input rate ($0.12/M ≤512K, $0.24/M >512K). A **permanent 50% off** discount applies to all MiniMax-M3 pay-as-you-go usage (Standard and Priority tiers), making the effective rates half the list prices above.
 > - **GLM** models support prompt caching — cache hits are priced at $0.24/M for 5V Turbo and $0.26/M for 5.1.
 > - **MiMo** offers a **Token Plan** subscription model with discounted rates and a free cache-writing promotion.
 > - For typical Copilot chat usage (short-to-medium prompts), you'll almost always fall in the lowest pricing tier.
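To make the per-request tiering concrete, the sketch below prices a single Qwen 3.7 Plus call using the table's non-thinking list prices. It assumes "256K" means 256,000 input tokens. It also assumes the tier chosen by input size sets both the input and the output rate. `qwenRequestCostUsd` is a hypothetical helper.

```javascript
// Qwen 3.7 Plus non-thinking list prices (USD per 1M tokens) from the table.
// Assumptions: "256K" = 256,000 tokens, and the tier selected by input size
// sets both the input and output rate. Names are hypothetical.
const QWEN_37_PLUS_TIERS = [
  { maxInputTokens: 256_000, inputPerM: 0.4, outputPerM: 1.6 },
  { maxInputTokens: Infinity, inputPerM: 1.2, outputPerM: 4.8 },
]

function qwenRequestCostUsd(inputTokens, outputTokens, tiers = QWEN_37_PLUS_TIERS) {
  // Tiers are ordered by ceiling; the first one that fits the input wins.
  const tier = tiers.find((t) => inputTokens <= t.maxInputTokens)
  return (inputTokens * tier.inputPerM + outputTokens * tier.outputPerM) / 1_000_000
}

console.log(qwenRequestCostUsd(10_000, 2_000).toFixed(4)) // 0.0072
console.log(qwenRequestCostUsd(300_000, 2_000).toFixed(4)) // 0.3696
```

Under these assumptions, crossing the 256K threshold reprices the entire request, not just the tokens above it. As a result, the 300K-token call costs about 51 times as much as the 10K-token one.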
@@ -80,12 +81,13 @@ For a typical coding session (~10K input + ~2K output tokens per turn, 50 turns)
 | MiMo V2 Flash            | ~$0.08                 |
 | DeepSeek V4 Flash        | ~$0.10                 |
 | Kimi K2.6 (non-thinking) | ~$0.18                 |
+| MiniMax M3 (50% off)     | ~$0.27                 |
 | DeepSeek V4 Pro          | ~$0.30                 |
 | Raptor mini              | ~$0.33                 |
 | Qwen 3.7 Plus            | ~$0.36                 |
 | MiMo V2.5                | ~$0.40                 |
 | Kimi K2.6 (thinking)     | ~$0.48                 |
-| MiniMax M3               | ~$0.54                 |
+| Kimi K2.7 Code           | ~$0.50                 |
 | Gemini 3 Flash           | ~$0.55                 |
 | MiMo V2.5 Pro            | ~$0.80                 |
 | GPT-5.4 mini             | ~$0.83                 |
@@ -102,7 +104,7 @@ For a typical coding session (~10K input + ~2K output tokens per turn, 50 turns)
 > **How long does 7,000 credits last?** A Pro+ subscriber running 50-turn sessions could afford roughly **13 GPT-5.5 sessions**, **23 Opus sessions**, or **212 Raptor mini sessions** per month — or mix and match. (Multiply session cost by 100 to convert to AI credits.)
-> Prices last verified: June 1, 2026. Always check the official pages for the latest rates:
+> Prices last verified: June 9, 2026. Always check the official pages for the latest rates:
 >
 > - [GitHub Copilot models & pricing](https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing)
 > - [OpenAI pricing](https://openai.com/api/pricing/)
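The session estimates and the credit conversion are plain arithmetic and can be reproduced as shown below. The sketch assumes uncached input and output list prices, with no cache hits or tiers. This matches the rows checked here (Kimi K2.7 Code: 500K × $0.19/M + 100K × $4.00/M ≈ $0.50). Helper names are illustrative.

```javascript
// Session estimate: ~10K input + ~2K output tokens per turn, 50 turns,
// at uncached list prices (USD per 1M tokens). Cache hits and pricing
// tiers are ignored. Helper names are illustrative.
function sessionCostUsd(
  inputPerM,
  outputPerM,
  { turns = 50, inputPerTurn = 10_000, outputPerTurn = 2_000 } = {},
) {
  const inputTokens = turns * inputPerTurn // 500K by default
  const outputTokens = turns * outputPerTurn // 100K by default
  return (inputTokens * inputPerM + outputTokens * outputPerM) / 1_000_000
}

// The note converts session cost to AI credits by multiplying by 100.
function sessionsPerMonth(sessionUsd, monthlyCredits = 7_000) {
  return Math.floor(monthlyCredits / (sessionUsd * 100))
}

console.log(sessionCostUsd(0.19, 4.0).toFixed(3)) // Kimi K2.7 Code: 0.495
console.log(sessionCostUsd(0.16, 0.95).toFixed(3)) // Kimi K2.6 (non-thinking): 0.175
console.log(sessionsPerMonth(0.33)) // Raptor mini: 212
```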

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "copilot-custom-endpoint",
-  "version": "1.3.13",
+  "version": "1.4.0",
   "description": "Local proxies for VS Code Copilot custom endpoints — Kimi K2 & Qwen 3.x",
   "license": "MIT",
   "type": "module",

package/proxy/kimi-proxy.mjs CHANGED

@@ -8,12 +8,16 @@ import { createProxy } from '../lib/create-proxy.mjs'
  * - Validated in this repo with `kimi-k2.6`.
  * - Expected to work for `kimi-k2.5`, because Kimi documents the same fixed
  *   sampling and thinking behavior for `kimi-k2.6` / `kimi-k2.5`.
+ * - Validated in this repo with `kimi-k2.7-code` (June 14, 2026). K2.7 is
+ *   always-thinking and rejects `thinking: { type: 'disabled' }`. The proxy
+ *   detects K2.7 and skips the thinking-disable rewrite while keeping
+ *   temperature/top_p enforcement.
  * - Not intended for `moonshot-v1` models or non-Kimi providers, because this
  *   proxy rewrites requests to K2-family-specific values:
  *   - thinking mode temperature = 1.0
  *   - non-thinking mode temperature = 0.6
  *   - top_p = 0.95
- *   - tool-enabled requests force `thinking: { type: 'disabled' }`
+ *   - tool-enabled requests force `thinking: { type: 'disabled' }` (K2.5/K2.6 only)
  */
 const upstreamUrl =
   process.env.KIMI_UPSTREAM_URL ?? 'https://api.moonshot.ai/v1/chat/completions'
@@ -104,6 +108,12 @@ function rewriteKimi(payload) {
   const incomingTemperature = payload.temperature
   const incomingTopP = payload.top_p
   const incomingThinkingType = payload?.thinking?.type
+  const model = payload.model ?? ''
+  // K2.7 is always-thinking and rejects thinking: disabled.
+  // Detect K2.7 variants (e.g. kimi-k2.7-code) and skip the thinking-disable
+  // rewrite while keeping temperature/top_p enforcement.
+  const isK27 = model.startsWith('kimi-k2.7')
   // Determine if a tool is actually being invoked:
   // - tool_choice is set and not "none"
@@ -116,7 +126,7 @@ function rewriteKimi(payload) {
     (toolChoice !== undefined && toolChoice !== 'none' && toolChoice !== null)
   const hasTools = hasActiveToolCall
-  const useNonThinkingMode = disableThinkingWithTools && hasTools
+  const useNonThinkingMode = !isK27 && disableThinkingWithTools && hasTools
   const rewrittenTemperature = useNonThinkingMode
     ? forcedNonThinkingTemperature
     : forcedTemperature
@@ -134,6 +144,8 @@ function rewriteKimi(payload) {
   const rewrittenThinkingType = payload.thinking?.type
   const rewriteInfo = {
+    model,
+    isK27,
     incomingTemperature,
     rewrittenTemperature,
     incomingTopP,
@@ -145,7 +157,8 @@ function rewriteKimi(payload) {
   const summary = summarizePayload(payload, hasTools, rewriteInfo)
   const modeTag = hasTools ? '[tools]' : '[chat]'
-  const consoleMsg = `${modeTag} temperature ${String(incomingTemperature)} -> ${String(rewrittenTemperature)}, top_p ${String(incomingTopP)} -> ${String(forcedTopP)}, thinking ${String(incomingThinkingType)} -> ${String(rewrittenThinkingType)}`
+  const k27Tag = isK27 ? '[k2.7]' : ''
+  const consoleMsg = `${k27Tag}${modeTag} temperature ${String(incomingTemperature)} -> ${String(rewrittenTemperature)}, top_p ${String(incomingTopP)} -> ${String(forcedTopP)}, thinking ${String(incomingThinkingType)} -> ${String(rewrittenThinkingType)}`
   // Clean up internal key before forwarding
   delete payload.__incomingThinkingType
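The K2.7 change comes down to one extra guard on the non-thinking path. The sketch below condenses that decision and is not the proxy's code. It reproduces only the `tool_choice` clause of `hasActiveToolCall`, since the other conditions are elided from this hunk. It also assumes `thinking` passes through unchanged when the non-thinking path is not taken. The name `planKimiRewrite` is hypothetical.

```javascript
// Condensed sketch of the K2.7-aware decision in rewriteKimi(). Values come
// from the header comment; the function name and return shape are hypothetical.
const FORCED_TEMPERATURE = 1.0 // thinking mode
const FORCED_NON_THINKING_TEMPERATURE = 0.6
const FORCED_TOP_P = 0.95

function planKimiRewrite(payload, disableThinkingWithTools = true) {
  const model = typeof payload.model === 'string' ? payload.model : ''
  // K2.7 variants such as kimi-k2.7-code are always-thinking.
  const isK27 = model.startsWith('kimi-k2.7')
  // Only the tool_choice clause of hasActiveToolCall is reproduced here.
  const toolChoice = payload.tool_choice
  const hasActiveToolCall =
    toolChoice !== undefined && toolChoice !== null && toolChoice !== 'none'
  const useNonThinkingMode = !isK27 && disableThinkingWithTools && hasActiveToolCall
  return {
    isK27,
    temperature: useNonThinkingMode ? FORCED_NON_THINKING_TEMPERATURE : FORCED_TEMPERATURE,
    top_p: FORCED_TOP_P, // enforced for every K2 model, including K2.7
    // Assumption: thinking passes through unchanged outside non-thinking mode.
    thinking: useNonThinkingMode ? { type: 'disabled' } : payload.thinking,
  }
}

console.log(planKimiRewrite({ model: 'kimi-k2.6', tool_choice: 'auto' }).temperature) // 0.6
console.log(planKimiRewrite({ model: 'kimi-k2.7-code', tool_choice: 'auto' }).temperature) // 1
```

With tools active, `kimi-k2.6` drops to temperature 0.6 with thinking disabled. `kimi-k2.7-code` stays at 1.0 and keeps its `thinking` value. The `!isK27` guard on `useNonThinkingMode` exists to produce exactly this difference.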