copilot-custom-endpoint 1.3.1 → 1.3.3
- package/README.md +3 -3
- package/docs/example-config.md +3 -3
- package/docs/models/glm.md +18 -19
- package/docs/models/qwen.md +20 -18
- package/docs/pricing.md +2 -2
- package/package.json +1 -1
- package/proxy/qwen-proxy.mjs +1 -1
package/README.md
CHANGED
@@ -25,7 +25,7 @@ That's it. No code, no servers to manage (unless the model specifically needs th
 | **MiMo V2.5** | Xiaomi | No | ✅ | [Setup](docs/models/mimo.md) |
 | **MiMo V2.5 Pro** | Xiaomi | No | ❌ | [Setup](docs/models/mimo.md) |
 | **Kimi K2.6** | Moonshot | **Yes** | ✅ | [Setup](docs/models/kimi.md) |
-| **Qwen 3.
+| **Qwen 3.7 Plus** | DashScope | Optional | ✅ | [Setup](docs/models/qwen.md) |
 | **Qwen 3.7 Max** | DashScope | Optional | ❌ | [Setup](docs/models/qwen.md) |
 | **MiniMax M3** | MiniMax | No | ✅ | [Setup](docs/models/minimax.md) |
 | **GLM 5.1** | Z.ai | No | ❌ | [Setup](docs/models/glm.md) |
@@ -97,7 +97,7 @@ All prices are **USD per 1M tokens** (cache miss). 1 AI credit = $0.01.
 | **DeepSeek V4 Flash** 🏆 | $0.14 | $0.28 | 1M |
 | **Kimi K2.6** (non-thinking) | $0.16 | $0.95 | 256K |
 | **MiMo V2.5** | $0.40 | $2.00 | 1M |
-| **Qwen 3.
+| **Qwen 3.7 Plus** | $0.40 | $1.60 | 1M |
 | **MiniMax M3** | $0.60 | $2.40 | 1M |
 | **MiMo V2.5 Pro** | $1.00 | $3.00 | 1M |
 | **GLM 5V Turbo** | $1.20 | $4.00 | 200K |
@@ -117,7 +117,7 @@ VS Code's built-in `view_image` tool only accepts **static images** (PNG, JPG, G
 **Video Context MCP** is a small MCP server that bridges that gap. It works with **GitHub Copilot, Cursor, and Claude Code** out of the box, and:
 
 - **Extracts frames** from local files or remote URLs (no `ffmpeg` gymnastics required).
-- **Routes them through a multi-provider fallback chain** — `Gemini → GLM-4.6V → Qwen3.6 → Kimi K2.6 → MiMo-V2.5` — so a single `GLM 5V Turbo` rate-limit hiccup doesn't kill your session.
+- **Routes them through a multi-provider fallback chain** — `Gemini → GLM-4.6V-flash → Qwen3.6-plus → Kimi K2.6 → MiMo-V2.5` — so a single `GLM 5V Turbo` rate-limit hiccup doesn't kill your session.
 - **Answers natural-language questions** about the video grounded in actual frames: "what does the speaker click in the last 30 seconds?", "summarize the demo", "find the frame where the error appears".
 - **Extras:** timestamp search, audio transcription with speaker diarization, and video metadata (resolution, duration, codec).
 
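The fallback chain named in the README change above can be pictured as "try each provider in order, return the first success". The sketch below is only an illustration of that idea, not the Video Context MCP implementation: `runWithFallback` and the stub provider objects are hypothetical, and the calls are synchronous for brevity where real provider requests would be asynchronous.

```javascript
// Hypothetical sketch of a provider fallback chain (not the actual MCP code).
// Each provider is { name, call }; `call` throws when the provider fails.
function runWithFallback(providers, request) {
  const errors = [];
  for (const provider of providers) {
    try {
      // First provider that returns without throwing wins.
      return { provider: provider.name, result: provider.call(request) };
    } catch (err) {
      // Record the failure and move on to the next provider in the chain.
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Stub providers standing in for real API clients (assumed for illustration).
const chain = [
  { name: 'Gemini', call: () => { throw new Error('rate limited'); } },
  { name: 'GLM-4.6V-flash', call: (req) => `answer to: ${req}` },
  { name: 'Qwen3.6-plus', call: (req) => `answer to: ${req}` },
];

const outcome = runWithFallback(chain, 'summarize the demo');
console.log(outcome.provider); // the first provider that did not throw
```

Collecting the per-provider errors keeps the final failure message useful when every provider in the chain is unavailable.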
package/docs/example-config.md
CHANGED
@@ -24,8 +24,8 @@ Here's a complete, real-world `chatLanguageModels.json` that combines **all the
 }
 },
 {
-"id": "qwen3.
-"name": "Qwen 3.
+"id": "qwen3.7-plus",
+"name": "Qwen 3.7 Plus",
 "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
 "toolCalling": true,
 "vision": true,
@@ -195,7 +195,7 @@ Here's a complete, real-world `chatLanguageModels.json` that combines **all the
 If you only need one provider, jump straight to its setup guide:
 
 - [Kimi K2.6](kimi.md)
-- [Qwen 3.
+- [Qwen 3.7 Plus / 3.7 Max](qwen.md)
 - [Xiaomi MiMo (V2.5 / V2.5 Pro / V2 Flash)](mimo.md)
 - [MiniMax M3](minimax.md)
 - [GLM (5.1 / 4.7 Flash / 5V Turbo)](glm.md)
package/docs/models/glm.md
CHANGED
@@ -8,7 +8,7 @@
 > - When `thinking: { type: "enabled" }` is set — thinking tokens still hold the in-flight slot, so the model occupies the throttle window longer.
 > - During peak hours, when many users are sharing the same free pool.
 >
-> **This is the free tier behaving as designed, not a bug.** For uninterrupted work, use a paid model: `glm-4.6v
+> **This is the free tier behaving as designed, not a bug.** For uninterrupted work, use a paid model: `glm-4.6v` is the cheapest paid vision option ($0.30/$0.90 per 1M), `glm-4.7` is the best cost/quality balance for text-only ($0.60/$2.20 per 1M), and `glm-5.1` is the flagship ($1.40/$4.40 per 1M). See [Rate limits](#rate-limits) for the full breakdown.
 
 ## At a Glance
 
@@ -26,19 +26,18 @@
 
 ### Models at a glance
 
-| Model
-|
-| `glm-5.1`
-| `glm-4.7`
-| `glm-4.7-flash`
-| `glm-5v-turbo`
-| `glm-4.6v`
-| `glm-4.6v-
-| `glm-4.6v-flash` | ✅ | 128K | 32K | hybrid (auto) | Free ¹ | **Free** vision tier |
+| Model | Vision | Context | Max output | Thinking | Cost (in / out per 1M) | Role |
+| ---------------- | ------ | ------- | ---------- | ------------- | ---------------------- | --------------------------------------------------------- |
+| `glm-5.1` | ❌ | 200K | 128K | `enabled` | $1.40 / $4.40 | Current flagship — long-horizon / 8h autonomous work |
+| `glm-4.7` | ❌ | 200K | 128K | `enabled` | $0.60 / $2.20 | Flagship 4.x — strong coding/agent |
+| `glm-4.7-flash` | ❌ | 200K | 128K | `enabled` | Free ¹ | **Free** — newest 4.x tier at no cost |
+| `glm-5v-turbo` | ✅ | 200K | 128K | `enabled` | $1.20 / $4.00 | Multimodal **coding** model — vision-based agentic coding |
+| `glm-4.6v` | ✅ | 128K | 32K | hybrid (auto) | $0.30 / $0.90 | Vision + **native multimodal tool calls** |
+| `glm-4.6v-flash` | ✅ | 128K | 32K | hybrid (auto) | Free ¹ | **Free** vision tier |
 
-> ¹ **Free-tier caveat:** the two `*flash` free models are heavily rate-limited — see the [warning at the top of this document](#glm-zai--zhipu-ai--vs-code-custom-endpoint-setup-guide) and the [Rate limits](#rate-limits) section. Expect frequent HTTP `1302 / ChatRateLimited` errors, especially on context > 8K or with thinking enabled. For reliable use, prefer `glm-4.6v
+> ¹ **Free-tier caveat:** the two `*flash` free models are heavily rate-limited — see the [warning at the top of this document](#glm-zai--zhipu-ai--vs-code-custom-endpoint-setup-guide) and the [Rate limits](#rate-limits) section. Expect frequent HTTP `1302 / ChatRateLimited` errors, especially on context > 8K or with thinking enabled. For reliable use, prefer `glm-4.6v` (cheapest paid vision) or `glm-4.7` (best cost/quality, text-only).
 
-> Other GLM models — `glm-5`, `glm-5-turbo`, `glm-4.7`, `glm-4.6`, `glm-4.6v`, `glm-4.6v-flashx`, `glm-4.6v-flash`, `glm-4.5`, `glm-4.5-air`, `glm-4.5-flash`, `glm-4.5-x`, `glm-4.5-airx`, `glm-4-32b-0414-128k` — are callable on the same endpoint but are intentionally **not** added to the default `chatLanguageModels.json` block below. Add them in the same shape if you need them.
+> Other GLM models — `glm-5`, `glm-5-turbo`, `glm-4.7`, `glm-4.6`, `glm-4.6v`, `glm-4.6v-flashx`, `glm-4.6v-flash`, `glm-4.5`, `glm-4.5-air`, `glm-4.5-flash`, `glm-4.5-x`, `glm-4.5-airx`, `glm-4-32b-0414-128k` — are callable on the same endpoint but are intentionally **not** added to the default `chatLanguageModels.json` block below. Add them in the same shape if you need them. Note: `glm-4.6v-flashx` was previously in the default block but has been **removed** because live testing showed it is not reliable for tool calling.
 
 ## Quick Start
 
@@ -177,8 +176,8 @@ Config file location:
 - **Tool calling** with the standard `tools` array. `tool_choice` accepts only `auto`.
 - **Max 128 functions** per request.
 - **Tool stream** (`tool_stream: true`) is supported on the `glm-4.6v` family and above for streaming tool-call deltas.
-- **Vision** on `glm-4.6v`, `glm-4.6v-
-- **Video input** on `glm-5v-turbo` — the model natively accepts video (Input Modality: **Video / Image / Text / File**). Use a public video URL in an `image_url` content part via direct API call; VS Code's chat UI does not currently forward video attachments to the model. For a turnkey VS Code integration that bridges the gap (extracts frames, routes them to GLM or a fallback provider, and answers natural-language questions about the video), see [**Video Context MCP**](https://www.videocontextmcp.com/) — an MCP server that gives Copilot/Cursor/Claude Code video understanding via the `glm-4.6v` provider and a multi-provider fallback chain (Gemini → GLM-4.6V →
+- **Vision** on `glm-4.6v`, `glm-4.6v-flash`, and `glm-5v-turbo` using the OpenAI `image_url` content-part format. External URLs and base64 data URIs both work.
+- **Video input** on `glm-5v-turbo` — the model natively accepts video (Input Modality: **Video / Image / Text / File**). Use a public video URL in an `image_url` content part via direct API call; VS Code's chat UI does not currently forward video attachments to the model. For a turnkey VS Code integration that bridges the gap (extracts frames, routes them to GLM or a fallback provider, and answers natural-language questions about the video), see [**Video Context MCP**](https://www.videocontextmcp.com/) — an MCP server that gives Copilot/Cursor/Claude Code video understanding via the `glm-4.6v` provider and a multi-provider fallback chain (Gemini → GLM-4.6V → Qwen 3.7 Plus → Kimi K2.6 → MiMo-V2.5).
 - **Native multimodal tool calling** on `glm-4.6v` (and inherited by `glm-5v-turbo`) — images, screenshots, and document pages can be passed directly as tool parameters and tool results can be consumed visually.
 - **Built-in web search** is exposed as a tool type `web_search` (different from `function`).
 - **Context caching** is automatic — the API returns `usage.prompt_tokens_details.cached_tokens` on cache hits; cache writes are currently free of charge.
@@ -219,8 +218,8 @@ ChatRateLimited: Rate limit exceeded
 
 #### Paid-tier specifics
 
-- Paid models (`glm-5.1`, `glm-4.7`, `glm-4.6v`, `glm-
--
+- Paid models (`glm-5.1`, `glm-4.7`, `glm-4.6v`, `glm-5v-turbo`) share a much larger concurrency pool sized to your prepaid balance. (Note: `glm-4.6v-flashx` was previously listed here as the cheapest paid option, but it has been removed from the recommended set because live testing showed it is not reliable for tool calling.)
+- For the cheapest reliable paid option, use `glm-4.6v` ($0.30 / $0.90 per 1M) if you need vision, or `glm-4.7` ($0.60 / $2.20 per 1M) for text-only work.
 - `glm-4.7` ($0.60 / $2.20 per 1M) is the recommended default for agent/coding work — strong quality at a low price.
 - `glm-5.1` ($1.40 / $4.40 per 1M) is the flagship and only worth it for long-horizon autonomous tasks.
 
@@ -248,7 +247,7 @@ If you want to keep using `glm-4.7-flash` despite the limits:
 | 401 Unauthorized | Region mismatch (international key used on China URL, or vice versa) | Match your key to the regional endpoint |
 | Upstream complains about `reasoning_content is missing` | You set `clear_thinking: false` from a client that doesn't forward it | Drop `clear_thinking` from `requestBody` |
 | 429 / "concurrency limit exceeded" | Too many in-flight requests | Reduce concurrent agent sessions, or upgrade your Z.ai plan |
-| `1302` / `ChatRateLimited` on a free-tier model (`*flash`) | Expected behavior — free tier is heavily throttled | Wait ~30s and retry, disable `thinking`, start a new chat, or switch to `glm-4.
+| `1302` / `ChatRateLimited` on a free-tier model (`*flash`) | Expected behavior — free tier is heavily throttled | Wait ~30s and retry, disable `thinking`, start a new chat, or switch to `glm-4.7` |
 | Long Chinese responses when the prompt is English | Missing `Accept-Language: en-US,en` (Z.ai default) | Optional — VS Code's custom-endpoint provider doesn't set custom headers; usually the prompt language wins |
 
 ## Pricing
@@ -278,7 +277,7 @@ That makes VS Code's `chat-completions` provider the obvious starting point —
 
 | Concern | Z.ai / GLM behaviour | Why it matters for VS Code |
 | --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
-| Thinking default | Hybrid for `glm-4.6v` / `glm-4.6v-
+| Thinking default | Hybrid for `glm-4.6v` / `glm-4.6v-flash`; always-on for `glm-5.1` / `glm-4.7` / `glm-4.7-flash` / `glm-5v-turbo`. | VS Code can simply set `thinking: { type: "enabled" }` in `requestBody` to make thinking deterministic on every model. |
 | `reasoning_content` on tool turns | Z.ai defaults to `clear_thinking: true`, **silently stripping historical `reasoning_content`**. | This is a near-perfect match for VS Code, which does **not** preserve `reasoning_content` between turns. Loops work without extra plumbing. |
 | `tool_choice` | Only `auto` is accepted. | VS Code's default behaviour is `auto`, so no override needed. |
 | `temperature` hard cap | `[0.0, 1.0]` — strictly enforced server-side. | Use `1.0` for coding/agent work; never go above. |
@@ -344,7 +343,7 @@ This file is the **research record and the user-facing setup guide**. The implem
 
 ## Companion tools
 
-- [**Video Context MCP**](https://www.videocontextmcp.com/) — an MCP server that gives AI coding assistants (GitHub Copilot, Cursor, Claude Code) the ability to **understand video content** via natural language. Extracts frames from local or remote videos, routes them through a multi-provider fallback chain (**Gemini → GLM-4.6V →
+- [**Video Context MCP**](https://www.videocontextmcp.com/) — an MCP server that gives AI coding assistants (GitHub Copilot, Cursor, Claude Code) the ability to **understand video content** via natural language. Extracts frames from local or remote videos, routes them through a multi-provider fallback chain (**Gemini → GLM-4.6V-flash → Qwen 3.6 Plus → Kimi K2.6 → MiMo-V2.5**), and returns answers grounded in actual video frames. Also handles summarization, timestamp search, audio transcription with speaker diarization, and video metadata. Works around the limitation that VS Code's built-in `view_image` tool only accepts static images — so it lets `glm-5v-turbo`'s native video support actually be exercised end-to-end from inside VS Code.
 
 ## References
 
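The GLM constraints quoted in the hunks above (`tool_choice` accepts only `auto`, at most 128 functions per request, `temperature` capped to `[0.0, 1.0]`) lend themselves to a client-side pre-flight check. The sketch below is one possible shape for such a check; `checkGlmRequest` is a hypothetical helper, not part of the package or of Z.ai's API, and the server remains the authority on what it accepts.

```javascript
// Hypothetical pre-flight check for a GLM chat-completions request body.
// It only encodes the three limits stated in glm.md; nothing here calls the API.
function checkGlmRequest(body) {
  const problems = [];

  // glm.md: `tool_choice` accepts only `auto`.
  if (body.tool_choice !== undefined && body.tool_choice !== 'auto') {
    problems.push('tool_choice must be "auto"');
  }

  // glm.md: max 128 functions per request.
  if (Array.isArray(body.tools) && body.tools.length > 128) {
    problems.push(`too many tools: ${body.tools.length} > 128`);
  }

  // glm.md: temperature is hard-capped to [0.0, 1.0] server-side.
  if (typeof body.temperature === 'number' &&
      (body.temperature < 0 || body.temperature > 1)) {
    problems.push(`temperature ${body.temperature} is outside [0.0, 1.0]`);
  }

  return problems;
}

const okBody = { model: 'glm-4.7', temperature: 1.0, tool_choice: 'auto', tools: [] };
const badBody = { model: 'glm-4.7', temperature: 1.3, tool_choice: 'required' };

console.log(checkGlmRequest(okBody));  // no problems reported
console.log(checkGlmRequest(badBody)); // reports tool_choice and temperature
```

Returning a list of problems instead of throwing makes it easy to surface every violation at once before a request is sent.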
package/docs/models/qwen.md
CHANGED
@@ -1,13 +1,13 @@
 # Qwen (DashScope) — VS Code Custom Endpoint Setup Guide
 
-> **TL;DR:** Direct path works for
+> **TL;DR:** Direct path works for `qwen3.7-plus` (vision) and `qwen3.7-max` (text-only) without a proxy. The optional `proxy/qwen-proxy.mjs` adds dynamic thinking suppression: reasoning stays ON in plain chat but turns OFF automatically when tools are invoked. Pick the mode that matches your tradeoff.
 
 ## At a Glance
 
 | Field | Value |
 | ------------------------------- | ------------------------------------------------------------------------- |
 | Mode | **Direct** (no proxy) **or** **Proxy** (optional, for dynamic thinking) |
-| Vision | ✅ Yes (`qwen3.
+| Vision | ✅ Yes (`qwen3.7-plus`) |
 | Tool calling | ✅ Yes |
 | Context | 1M |
 | Required `requestBody` (direct) | `enable_thinking: false` |
@@ -19,16 +19,16 @@
 
 | Model | Vision | Role |
 | -------------- | ------ | -------------------------------------- |
-| `qwen3.
+| `qwen3.7-plus` | ✅ Yes | Primary model with image understanding |
 | `qwen3.7-max` | ❌ No | Larger text-only model |
 
-> The snapshot `qwen3.
+> The snapshot `qwen3.7-plus-2026-05-26` is also available; the floating `qwen3.7-plus` alias is preferred.
 
 ## Quick Start — Direct Path (Recommended for Simplicity)
 
 1. **Edit `chatLanguageModels.json`** — add the Qwen block from [Setup § Direct](#direct-path) below.
 2. **Set your `DASHSCOPE_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
-3. **Restart VS Code** and pick "Qwen 3.
+3. **Restart VS Code** and pick "Qwen 3.7 Plus" or "Qwen 3.7 Max".
 
 ## Quick Start — With Proxy (Dynamic Thinking)
 
@@ -70,8 +70,8 @@ DashScope is region-specific — your API key only works on the endpoint it was
 }
 },
 {
-"id": "qwen3.
-"name": "Qwen 3.
+"id": "qwen3.7-plus",
+"name": "Qwen 3.7 Plus",
 "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
 "toolCalling": true,
 "vision": true,
@@ -136,8 +136,8 @@ Expected response:
 "streaming": true
 },
 {
-"id": "qwen3.
-"name": "Qwen 3.
+"id": "qwen3.7-plus",
+"name": "Qwen 3.7 Plus",
 "url": "http://127.0.0.1:3458/v1/chat/completions",
 "toolCalling": true,
 "vision": true,
@@ -190,16 +190,18 @@ The Qwen3 hybrid-thinking models default to `enable_thinking: true`, producing `
 | Proxy path | Thinking ON (default preserved) | Thinking OFF (auto-injected) |
 | No config (default) | Thinking ON | Risk: history may be rejected |
 
-### Vision (`qwen3.
+### Vision (`qwen3.7-plus`)
 
 - Image input via OpenAI-compatible `content` array format (base64 data URIs).
 - **External image URLs may fail** if DashScope's servers cannot reach them — base64-encoded images work reliably.
+- **Image attachment behavior**: Unlike some other models, Qwen may fail to read images that are directly dragged and dropped into the Copilot Chat. If this happens, provide the absolute file path to the image (e.g., `c:\path\to\image.png`) in your prompt as a reliable workaround.
+- **Pricing**: **$0.40 / $1.60 per 1M input/output (≤ 256K)** and **$1.20 / $4.80 per 1M (> 256K)**.
 
 ### Capabilities
 
 - Streaming (SSE, `data: [DONE]` terminator).
 - Tool calling with `tools` array and `tool_calls` response.
-- Vision (image input) on `qwen3.
+- Vision (image input) on `qwen3.7-plus`.
 - Non-OpenAI extras: `enable_thinking`, `thinking_budget`, `enable_search` (via `extra_body`).
 
 ## Troubleshooting
@@ -220,7 +222,7 @@ For the cross-provider comparison, see [docs/pricing.md](../pricing.md). DashSco
 
 | Model | Input (≤ 256K tokens) | Input (> 256K tokens) | Output (≤ 256K tokens) | Output (> 256K tokens) |
 | -------------- | --------------------- | --------------------- | ---------------------- | ---------------------- |
-| `qwen3.
+| `qwen3.7-plus` | $0.40 / 1M | $1.20 / 1M | $1.60 / 1M | $4.80 / 1M |
 | `qwen3.7-max` | $2.50 / 1M (≤ 1M) | — | $7.50 / 1M (≤ 1M) | — |
 
 > **Free quota:** DashScope offers 1M input + 1M output tokens per model, valid for 90 days after activating Model Studio.
@@ -266,7 +268,7 @@ Both work — pick based on your preference:
 | Streaming in VS Code | ✅ | Token-by-token streaming confirmed |
 | Tool / agent use in VS Code | ✅ | Browser tool invoked successfully |
 
-#### Direct-path validation — `qwen3.
+#### Direct-path validation — `qwen3.7-plus`
 
 | Capability | Result | Notes |
 | -------------------------------------------- | ------ | ---------------------------------------------------------------------------- |
@@ -275,7 +277,7 @@ Both work — pick based on your preference:
 | Tool-enabled chat (`enable_thinking: false`) | ✅ | Clean `tool_calls`, no `reasoning_content`, 25 tokens |
 | Vision: image + text (curl, base64) | ✅ | Model correctly identified a 10×10 test pattern; `image_tokens: 66` |
 | Vision: image + text (curl, external URL) | ❌ | `Failed to download multimodal content` — DashScope couldn't reach Wikipedia |
-| Model appears in VS Code picker | ✅ | "Agent \| Qwen 3.
+| Model appears in VS Code picker | ✅ | "Agent \| Qwen 3.7 Plus" confirmed |
 | Plain chat in VS Code | ✅ | Streaming output confirmed |
 | Streaming in VS Code | ✅ | Token-by-token streaming confirmed |
 | Tool / agent use in VS Code | ✅ | Browser tool invoked to open Qwen docs and Google |
@@ -283,17 +285,17 @@ Both work — pick based on your preference:
 
 #### Intermittent `ERR_CONNECTION_RESET` investigation
 
-A `net::ERR_CONNECTION_RESET` was observed once during `qwen3.
+A `net::ERR_CONNECTION_RESET` was observed once during `qwen3.7-plus` validation, but did not reproduce on the same machine outside VS Code:
 
 - Direct `curl` POST to DashScope Singapore → HTTP 200.
 - Direct Node.js HTTPS POST → HTTP 200.
-- Direct Node.js HTTPS **streaming** POST with full `qwen3.
+- Direct Node.js HTTPS **streaming** POST with full `qwen3.7-plus.md` content embedded → HTTP 200.
 
 Conclusion: not a DashScope or Qwen model incompatibility. Evidence points to an intermittent VS Code / Electron transport issue or transient network interruption local to the editor process.
 
 ### Final verdict
 
-| Criterion | `qwen3.7-max` | `qwen3.
+| Criterion | `qwen3.7-max` | `qwen3.7-plus` |
 | ---------------------- | -------------- | -------------- |
 | Plain chat | ✅ | ✅ |
 | Streaming chat | ✅ | ✅ |
@@ -305,7 +307,7 @@ Conclusion: not a DashScope or Qwen model incompatibility. Evidence points to an
 
 - GitHub Copilot inline completions and semantic-search features remain outside scope.
 - One intermittent VS Code-side `net::ERR_CONNECTION_RESET` was observed — not reproducible externally, treated as transient transport issue.
-- External image URLs may fail if DashScope's servers cannot reach them; base64-encoded images work reliably
+- External image URLs may fail if DashScope's servers cannot reach them; base64-encoded images work reliably.
 - Vision is not supported on `qwen3.7-max` (text-generation model).
 - `maxInputTokens` / `maxOutputTokens` not yet confirmed from official DashScope documentation.
 - API keys are region-specific — a key created for one regional endpoint will not work with another.
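Since qwen.md reports that base64 data URIs work reliably while external image URLs may fail, it can help to see what such a message looks like. The sketch below builds a user message in the OpenAI-compatible `content` array format with an `image_url` part; `buildImageMessage` is a hypothetical helper name, the three sample bytes are placeholders rather than a real image, and `Buffer` assumes a Node.js runtime.

```javascript
// Hypothetical helper: wrap raw image bytes in an OpenAI-style vision message.
// Uses a base64 data URI, the form qwen.md reports as reliable on DashScope.
function buildImageMessage(prompt, imageBytes, mimeType) {
  // Buffer is a Node.js global; in a browser a different encoder would be needed.
  const base64 = Buffer.from(imageBytes).toString('base64');
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'image_url', image_url: { url: `data:${mimeType};base64,${base64}` } },
    ],
  };
}

// Placeholder bytes for illustration only (not a valid PNG).
const message = buildImageMessage(
  'What is in this image?',
  Uint8Array.from([1, 2, 3]),
  'image/png'
);

console.log(message.content[1].image_url.url); // data:image/png;base64,AQID
```

In practice the bytes would come from reading the image file, and the resulting message would be placed in the `messages` array of a chat-completions request.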
package/docs/pricing.md
CHANGED
@@ -50,7 +50,7 @@ These are the models available through GitHub Copilot's model roster as of June
 | **DeepSeek V4 Pro** | DeepSeek | $1.74 | $3.48 | 1M |
 | **MiMo V2.5** | Xiaomi | $0.40 | $2.00 | 1M |
 | **MiMo V2.5 Pro** | Xiaomi | $1.00 | $3.00 | 1M |
-| **Qwen 3.
+| **Qwen 3.7 Plus** | DashScope | $0.40 (≤256K) / $1.20 (>256K) | $1.60 (≤256K) / $4.80 (>256K) | 1M |
 | **Qwen 3.7 Max** | DashScope | $2.50 (≤1M) | $7.50 (≤1M) | 1M |
 | **MiniMax M3** | MiniMax | $0.60 (≤512K) / $1.20 (>512K) | $2.40 (≤512K) / $4.80 (>512K) | 1M |
 | **GLM 4.7 Flash** | Z.ai | Free (rate-limited ¹) | Free (rate-limited ¹) | 200K |
@@ -83,8 +83,8 @@ For a typical coding session (~10K input + ~2K output tokens per turn, 50 turns)
 | Kimi K2.6 (non-thinking) | ~$0.18 | — |
 | MiMo V2.5 | ~$0.40 | — |
 | Kimi K2.6 (thinking) | ~$0.48 | — |
+| Qwen 3.7 Plus | ~$0.36 | — |
 | Gemini 3 Flash | ~$0.55 | ~55 |
-| Qwen 3.6 Plus | ~$0.55 | — |
 | MiniMax M3 | ~$0.54 | — |
 | MiMo V2.5 Pro | ~$0.80 | — |
 | GLM 4.7 Flash (free) | ~$0.00 ¹ | — |
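The session estimates in the pricing.md hunk above follow from the stated workload (~10K input + ~2K output tokens per turn, 50 turns) and the per-1M prices. The sketch below reproduces that arithmetic; `sessionCostUsd` is a hypothetical helper, and it assumes flat cache-miss pricing at the lowest tier (for Qwen 3.7 Plus, the ≤ 256K rates), with no caching discount.

```javascript
// Hypothetical helper reproducing the pricing.md session estimate.
// Prices are USD per 1M tokens; the defaults mirror the doc's "typical session".
function sessionCostUsd(inputPerM, outputPerM, workload = {}) {
  const { inputTokens = 10_000, outputTokens = 2_000, turns = 50 } = workload;
  const perTurn = (inputTokens * inputPerM + outputTokens * outputPerM) / 1_000_000;
  return perTurn * turns;
}

// Qwen 3.7 Plus at $0.40 in / $1.60 out:
// 50 * (10,000 * 0.40 + 2,000 * 1.60) / 1,000,000 = 50 * 0.0072 = 0.36
const qwenPlus = sessionCostUsd(0.40, 1.60);
// MiniMax M3 at $0.60 in / $2.40 out gives 50 * 0.0108 = 0.54
const miniMax = sessionCostUsd(0.60, 2.40);

console.log(qwenPlus.toFixed(2)); // 0.36
console.log(miniMax.toFixed(2));  // 0.54
```

Both results match the `~$0.36` and `~$0.54` rows in the table, which suggests the estimates use the base-tier prices.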
package/package.json
CHANGED
package/proxy/qwen-proxy.mjs
CHANGED
@@ -5,7 +5,7 @@ import { createProxy } from '../lib/create-proxy.mjs'
 
 /**
 * Supported model scope for this proxy:
-* - Validated with `qwen3.
+* - Validated with `qwen3.7-plus` and `qwen3.7-max`.
 * - Expected to work for any Qwen3 hybrid-thinking model (qwen3-* series)
 * that supports the `enable_thinking` top-level field on DashScope's
 * OpenAI-compatible surface.
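The qwen.md TL;DR describes the proxy's behavior as leaving reasoning ON for plain chat and injecting thinking OFF when tools are involved. The body of `proxy/qwen-proxy.mjs` is not shown in this diff, so the sketch below is only an illustration of that rule, not the package's code: `applyDynamicThinking` is a hypothetical name, and treating a non-empty `tools` array as the trigger is an assumption.

```javascript
// Hypothetical sketch of "dynamic thinking suppression" (not the actual proxy code).
// Assumption: a non-empty `tools` array is what counts as "tools are invoked".
function applyDynamicThinking(body) {
  const hasTools = Array.isArray(body.tools) && body.tools.length > 0;
  if (!hasTools) {
    // Plain chat: leave the body untouched so the model default (thinking ON) applies.
    return body;
  }
  // Tool-enabled request: inject the top-level `enable_thinking: false` field.
  return { ...body, enable_thinking: false };
}

const plainChat = { model: 'qwen3.7-plus', messages: [{ role: 'user', content: 'hi' }] };
const toolChat = {
  ...plainChat,
  tools: [{ type: 'function', function: { name: 'open_page', parameters: {} } }],
};

console.log('enable_thinking' in applyDynamicThinking(plainChat)); // false
console.log(applyDynamicThinking(toolChat).enable_thinking);       // false
```

Returning a copy for the tool case keeps the caller's original request object unmodified, which is convenient when the same body is logged or retried.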