copilot-custom-endpoint 1.3.0 → 1.3.1
- package/docs/example-config.md +203 -0
- package/docs/models/glm.md +366 -0
- package/docs/models/kimi.md +232 -0
- package/docs/models/mimo.md +258 -0
- package/docs/models/minimax.md +247 -0
- package/docs/models/qwen.md +320 -0
- package/docs/pricing.md +116 -0
- package/package.json +5 -1
|
@@ -0,0 +1,247 @@
|
|
|
1
|
+
# MiniMax — VS Code Custom Endpoint Setup Guide
|
|
2
|
+
|
|
3
|
+
> **TL;DR:** MiniMax-M3 works directly — no proxy needed. Use `thinking: { type: "adaptive" }` + `reasoning_split: true` in `requestBody` so the model can reason and the response arrives in a clean OpenAI format (`reasoning_details` field, separate from `content`). **Important:** `thinking: { type: "disabled" }` is **not** a hard override — the model still reasons internally and emits `<think>` tags / `reasoning_content` regardless.
|
|
4
|
+
|
|
5
|
+
## At a Glance
|
|
6
|
+
|
|
7
|
+
| Field                    | Value                                                   |
| ------------------------ | ------------------------------------------------------- |
| Mode                     | **Direct** (no proxy)                                   |
| Vision                   | ✅ Yes (image + video)                                  |
| Tool calling             | ✅ Yes                                                  |
| Context                  | 1M (guaranteed 512K)                                    |
| Max output               | 512K (recommended 128K)                                 |
| Required `requestBody`   | `thinking: { type: "adaptive" }, reasoning_split: true` |
| Endpoint (international) | `https://api.minimax.io/v1/chat/completions`            |
| Endpoint (China)         | `https://api.minimaxi.com/v1/chat/completions`          |
|
|
17
|
+
|
|
18
|
+
## Quick Start
|
|
19
|
+
|
|
20
|
+
1. **Edit `chatLanguageModels.json`** — add the MiniMax block from [Setup](#setup) below.
|
|
21
|
+
2. **Set your `MINIMAX_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
|
|
22
|
+
3. **Restart VS Code** and pick "MiniMax M3" in the chat picker.
|
|
23
|
+
|
|
24
|
+
## Setup
|
|
25
|
+
|
|
26
|
+
### 1. VS Code configuration
|
|
27
|
+
|
|
28
|
+
Config file location:
|
|
29
|
+
|
|
30
|
+
| OS      | Path                                                              |
| ------- | ----------------------------------------------------------------- |
| Windows | `%APPDATA%\Code\User\chatLanguageModels.json`                     |
| macOS   | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
| Linux   | `~/.config/Code/User/chatLanguageModels.json`                     |
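
The path table above can be turned into a small lookup. This is a minimal sketch, not part of the package: the function name `chatModelsConfigPath` is hypothetical, it assumes a stable (non-Insiders, non-portable) VS Code install, and it treats every platform other than Windows and macOS as Linux.

```javascript
// Sketch: resolve the default chatLanguageModels.json location per OS.
// `platform` follows Node's process.platform values ("win32", "darwin", ...);
// `env` is a plain object such as process.env. Both are illustrative inputs.
function chatModelsConfigPath(platform, env) {
  const file = "chatLanguageModels.json";
  if (platform === "win32") {
    return `${env.APPDATA}\\Code\\User\\${file}`;
  }
  if (platform === "darwin") {
    return `${env.HOME}/Library/Application Support/Code/User/${file}`;
  }
  // Assumption: anything else is treated as Linux.
  return `${env.HOME}/.config/Code/User/${file}`;
}

console.log(chatModelsConfigPath(process.platform, process.env));
```

Passing the platform and environment as arguments keeps the function easy to check without touching the real machine state.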
|
|
35
|
+
|
|
36
|
+
```json
{
  "name": "MiniMax",
  "vendor": "customendpoint",
  "apiKey": "",
  "apiType": "chat-completions",
  "models": [
    {
      "id": "MiniMax-M3",
      "name": "MiniMax M3",
      "url": "https://api.minimax.io/v1/chat/completions",
      "toolCalling": true,
      "vision": true,
      "streaming": true,
      "maxInputTokens": 1048576,
      "maxOutputTokens": 131072,
      "requestBody": {
        "thinking": { "type": "adaptive" },
        "reasoning_split": true,
        "temperature": 1,
        "top_p": 0.95
      }
    }
  ]
}
```
|
|
62
|
+
|
|
63
|
+
### 2. API key
|
|
64
|
+
|
|
65
|
+
1. Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS).
|
|
66
|
+
2. Run **Chat: Manage Language Models**.
|
|
67
|
+
3. Find the **MiniMax** group → **Update API Key**.
|
|
68
|
+
4. Paste your MiniMax API key.
|
|
69
|
+
|
|
70
|
+
> After setting via the UI, VS Code replaces `"apiKey": ""` with a `${input:chat.lm.secret.<id>}` reference.
|
|
71
|
+
|
|
72
|
+
### 3. Regional endpoints
|
|
73
|
+
|
|
74
|
+
| Region | Endpoint |
|
|
75
|
+
| ------------- | ---------------------------------------------- |
|
|
76
|
+
| International | `https://api.minimax.io/v1/chat/completions` |
|
|
77
|
+
| China | `https://api.minimaxi.com/v1/chat/completions` |
|
|
78
|
+
|
|
79
|
+
> API keys are region-specific and cannot be used across regions.
|
|
80
|
+
|
|
81
|
+
## Configuration Reference
|
|
82
|
+
|
|
83
|
+
### Sampling parameters
|
|
84
|
+
|
|
85
|
+
| Task type | `temperature` | `top_p` |
|
|
86
|
+
| -------------------- | ------------- | ------- |
|
|
87
|
+
| Agentic / tool-use | `1.0` | `0.95` |
|
|
88
|
+
| Coding | `1.0` | `0.95` |
|
|
89
|
+
| General conversation | `1.0` | `0.95` |
|
|
90
|
+
|
|
91
|
+
M3 accepts `temperature` in `[0, 2]` and `top_p` in `[0, 1]`.
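
The accepted ranges can be checked before a value is written into `requestBody`. The helper below is a sketch under that assumption; `checkSampling` is a hypothetical name and is not something VS Code or MiniMax provides.

```javascript
// Sketch: validate sampling values against the ranges stated above
// (temperature in [0, 2], top_p in [0, 1]). Returns a list of problems;
// an empty list means both values are in range.
function checkSampling({ temperature, top_p }) {
  const problems = [];
  if (!Number.isFinite(temperature) || temperature < 0 || temperature > 2) {
    problems.push("temperature must be a number in [0, 2]");
  }
  if (!Number.isFinite(top_p) || top_p < 0 || top_p > 1) {
    problems.push("top_p must be a number in [0, 1]");
  }
  return problems;
}

console.log(checkSampling({ temperature: 1, top_p: 0.95 })); // prints []
```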
|
|
92
|
+
|
|
93
|
+
### Thinking mode
|
|
94
|
+
|
|
95
|
+
| `thinking.type` | Behavior |
| --------------- | -------- |
| `adaptive`      | **Recommended.** Model decides whether to think. |
| `disabled`      | Soft hint only: the model still reasons internally and emits `<think>` / `reasoning_content` regardless. Use only if you want a different response field layout. |
|
|
99
|
+
|
|
100
|
+
When thinking is enabled (any mode), the server returns thinking in one of two formats:
|
|
101
|
+
|
|
102
|
+
1. **Native** (default): thinking is embedded in `content` wrapped in `<think>` tags.
|
|
103
|
+
2. **Interleaved Thinking** (`reasoning_split: true`): thinking is separated into a `reasoning_details` field for cleaner programmatic access.
|
|
104
|
+
|
|
105
|
+
VS Code will most likely **ignore** the extra `reasoning_details` / `reasoning_content` / `delta.reasoning` fields it doesn't recognize, so setting `reasoning_split: true` is the most reliable way to keep reasoning text out of `content`.
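
For clients that receive the native format, the `<think>` block has to be separated from the visible answer by hand. The following is a minimal sketch of that separation; `splitThink` is a hypothetical helper, and it assumes the tags are complete and well-formed (a half-streamed, unclosed `<think>` is left untouched).

```javascript
// Sketch: split native-format output into reasoning and visible content.
// Assumes complete <think>...</think> pairs inside the `content` string.
function splitThink(content) {
  const pattern = /<think>([\s\S]*?)<\/think>/g;
  const reasoning = [];
  const visible = content.replace(pattern, (_match, inner) => {
    reasoning.push(inner.trim());
    return "";
  });
  return { reasoning: reasoning.join("\n"), content: visible.trim() };
}

console.log(splitThink("<think>plan the reply</think>Hello!"));
```

With `reasoning_split: true` this post-processing should not be needed, since the reasoning arrives in its own field.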
|
|
106
|
+
|
|
107
|
+
### Capabilities
|
|
108
|
+
|
|
109
|
+
- Streaming (SSE, standard OpenAI format).
|
|
110
|
+
- Tool calling with `tool_choice: "auto"`.
|
|
111
|
+
- Vision: image and video understanding (M3 only).
|
|
112
|
+
- Native multimodal training from step zero.
|
|
113
|
+
- Interleaved Thinking: model can reason between each round of tool interactions.
|
|
114
|
+
- Automatic prompt caching (no configuration needed).
|
|
115
|
+
- 1M context enables long-range agent tasks, long-horizon coding, and long-video understanding.
|
|
116
|
+
|
|
117
|
+
### Rate limits
|
|
118
|
+
|
|
119
|
+
| Model | RPM | TPM |
|
|
120
|
+
| ------------ | --- | ---------- |
|
|
121
|
+
| `MiniMax-M3` | 200 | 10,000,000 |
|
|
122
|
+
|
|
123
|
+
> Input tokens above 512K are available in limited quantity for a limited time. Contact sales for access.
|
|
124
|
+
|
|
125
|
+
### Model ID casing
|
|
126
|
+
|
|
127
|
+
MiniMax model IDs are **case-sensitive**. Use exactly:
|
|
128
|
+
|
|
129
|
+
- `MiniMax-M3` (not `minimax-m3` or `MINIMAX-M3`)
|
|
130
|
+
|
|
131
|
+
## Troubleshooting
|
|
132
|
+
|
|
133
|
+
| Symptom | Likely cause | Fix |
|
|
134
|
+
| ------------------------------ | ------------------------------------------------------ | -------------------------------------------------------- |
|
|
135
|
+
| Model not in picker | Config not reloaded, or wrong casing | Restart VS Code; verify model ID is exactly `MiniMax-M3` |
|
|
136
|
+
| Reasoning leaks into `content` | Missing `reasoning_split` | Add `reasoning_split: true` to `requestBody` |
|
|
137
|
+
| 401 Unauthorized | API key region mismatch | Use the endpoint that matches your key's region |
|
|
138
|
+
| 429 rate-limited | Concurrent sessions exceeded 200 RPM / 10M TPM | Reduce concurrent agent sessions |
|
|
139
|
+
| Vision request returns 400 | Vision only supported on M3 (not the legacy M2.x line) | Use `MiniMax-M3` |
|
|
140
|
+
|
|
141
|
+
## Pricing
|
|
142
|
+
|
|
143
|
+
For the cross-provider comparison, see [docs/pricing.md](../pricing.md). MiniMax-M3 pay-as-you-go rates:
|
|
144
|
+
|
|
145
|
+
| Token range           | Input (Cache Hit) | Input (Cache Miss) | Output     |
| --------------------- | ----------------- | ------------------ | ---------- |
| ≤ 512K input tokens   | $0.12 / 1M        | $0.60 / 1M         | $2.40 / 1M |
| > 512K input tokens\* | $0.24 / 1M        | $1.20 / 1M         | $4.80 / 1M |
|
|
149
|
+
|
|
150
|
+
\* Input tokens above 512K are available in limited quantity for a limited time.
|
|
151
|
+
|
|
152
|
+
> **Promo:** A 7-day 50% off promotion is available for new accounts, making the ≤ 512K tier effectively $0.30 / 1M cache-miss input and $1.20 / 1M output for the first week.
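
As a worked example of the pay-as-you-go table, the sketch below estimates the cost of one request. Two points are assumptions rather than documented behavior: "512K" is read as 512,000 tokens, and a request whose total input exceeds that limit is billed entirely at the higher tier. The name `estimateCostUsd` is hypothetical, and the promotion is not modeled.

```javascript
// Sketch: estimate MiniMax-M3 pay-as-you-go cost from the table above.
const TIER_LIMIT = 512000; // assumption: "512K" means 512,000 tokens
const RATES_PER_MILLION = {
  standard: { cacheHit: 0.12, cacheMiss: 0.6, output: 2.4 },
  long: { cacheHit: 0.24, cacheMiss: 1.2, output: 4.8 },
};

function estimateCostUsd({ cacheHitTokens = 0, cacheMissTokens = 0, outputTokens = 0 }) {
  const inputTokens = cacheHitTokens + cacheMissTokens;
  // Assumption: the tier is chosen by total input size for the whole request.
  const rates = inputTokens > TIER_LIMIT ? RATES_PER_MILLION.long : RATES_PER_MILLION.standard;
  const total =
    cacheHitTokens * rates.cacheHit +
    cacheMissTokens * rates.cacheMiss +
    outputTokens * rates.output;
  return total / 1e6;
}

// 100,000 uncached input tokens and 10,000 output tokens:
// 0.1 * $0.60 + 0.01 * $2.40 = $0.084
console.log(estimateCostUsd({ cacheMissTokens: 100000, outputTokens: 10000 }).toFixed(3)); // prints "0.084"
```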
|
|
153
|
+
|
|
154
|
+
### Token Plan (subscription)
|
|
155
|
+
|
|
156
|
+
MiniMax also offers monthly subscription plans with quota that resets each month (Plus $20/mo, Max $50/mo, Ultra $120/mo). All plans provide access to all models. See the [MiniMax Token Plan page](https://platform.minimax.io/docs/guides/pricing-token-plan) for details.
|
|
157
|
+
|
|
158
|
+
---
|
|
159
|
+
|
|
160
|
+
## Background & Findings
|
|
161
|
+
|
|
162
|
+
> This appendix preserves the validation narrative for future reference. It is not required to use the model.
|
|
163
|
+
|
|
164
|
+
### The `thinking` parameter is a soft hint
|
|
165
|
+
|
|
166
|
+
The `thinking: { "type": "disabled" }` parameter does **not** suppress `<think>` tags or `reasoning_content` in responses — the model always reasons internally. The setting is a layout hint, not a behavioral override.
|
|
167
|
+
|
|
168
|
+
This is the key insight behind the recommended config:
|
|
169
|
+
|
|
170
|
+
- `thinking: { type: "adaptive" }` lets the model decide when to reason (which is "always" in practice).
|
|
171
|
+
- `reasoning_split: true` tells the server to put the reasoning into a structured `reasoning_details` field, keeping `content` clean for VS Code.
|
|
172
|
+
|
|
173
|
+
If you have an older config that uses `disabled` (e.g., to mirror the MiMo convention), it will still work — the difference vs `adaptive` is purely cosmetic (response field layout). The model remains stable in 3-turn tool loops under both settings.
|
|
174
|
+
|
|
175
|
+
### Architecture
|
|
176
|
+
|
|
177
|
+
- **Model:** MiniMax-M3 (multimodal frontier coding model).
|
|
178
|
+
- **Architecture:** Novel MiniMax Sparse Attention (MSA) — designed for 1M context with low latency.
|
|
179
|
+
- **Training:** Native multimodal training from step zero with 100T+ data, deep alignment between textual and visual semantic spaces.
|
|
180
|
+
|
|
181
|
+
### Validation results (June 3, 2026)
|
|
182
|
+
|
|
183
|
+
#### Phase 1 — Connectivity check
|
|
184
|
+
|
|
185
|
+
| Check | Result | Notes |
|
|
186
|
+
| ------------------ | ------ | ------------------------------------------------------------------------------------- |
|
|
187
|
+
| Non-streaming chat | ✅ | Model responded with `<think>` reasoning + greeting content |
|
|
188
|
+
| Streaming (SSE) | ✅ | Chunks arrive as `data: {...}` with incremental `delta.content` and `delta.reasoning` |
|
|
189
|
+
| Tool calling | ✅ | `finish_reason: "tool_calls"` with `get_weather({"location": "San Francisco"})` |
|
|
190
|
+
| Vision | ✅ | Correctly identified Google logo colors (blue, red, yellow, green) from PNG URL |
|
|
191
|
+
|
|
192
|
+
**Key Phase 1 finding:** `thinking: {"type": "disabled"}` does not suppress reasoning — the model still emits `<think>` tags and `reasoning_content`. This is why the recommended config uses `adaptive` + `reasoning_split: true`.
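
The streaming row in the table above describes `data: {...}` chunks carrying `delta.content` and `delta.reasoning`. A minimal sketch of how a client might accumulate them follows. It assumes both delta fields are plain strings and that the stream ends with the usual OpenAI-style `data: [DONE]` line; neither detail was captured in the validation notes, and `accumulateSse` is a hypothetical helper.

```javascript
// Sketch: fold an SSE transcript into final content and reasoning strings.
// Assumes one JSON payload per "data:" line, OpenAI chat-completions shape.
function accumulateSse(rawText) {
  let content = "";
  let reasoning = "";
  for (const line of rawText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice(5).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const delta = JSON.parse(payload).choices?.[0]?.delta ?? {};
    if (typeof delta.content === "string") content += delta.content;
    if (typeof delta.reasoning === "string") reasoning += delta.reasoning;
  }
  return { content, reasoning };
}

const exampleStream = [
  'data: {"choices":[{"delta":{"reasoning":"greet the user"}}]}',
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "data: [DONE]",
].join("\n");

console.log(accumulateSse(exampleStream));
```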
|
|
193
|
+
|
|
194
|
+
#### Phase 2 — VS Code in-editor validation
|
|
195
|
+
|
|
196
|
+
The Copilot Chat panel in the validation session was running on `MiniMax M3`, making every response a live test.
|
|
197
|
+
|
|
198
|
+
| Step | Result | Evidence |
|
|
199
|
+
| --------------------------------------- | ------ | ------------------------------------------------------------------------------------------------------- |
|
|
200
|
+
| Add config to `chatLanguageModels.json` | ✅ | `MiniMax M3` appears in the model picker |
|
|
201
|
+
| Open VS Code and select the model | ✅ | Screenshot confirms "MiniMax M3" selected |
|
|
202
|
+
| Plain chat | ✅ | Coherent answer to "What do you mean by streaming in this context?" |
|
|
203
|
+
| Streaming | ✅ | Text appeared progressively in the chat panel |
|
|
204
|
+
| Tool calling (agent mode) | ✅ | `open_browser_page` invoked successfully → "Google" page title confirmed |
|
|
205
|
+
| Vision | ✅ | Facebook screenshot analyzed in detail (10 tabs, sidebar items, post content, birthdays, Reels section) |
|
|
206
|
+
|
|
207
|
+
#### Phase 3 — Multi-turn tool loop test
|
|
208
|
+
|
|
209
|
+
Asked the model to inspect a YouTube video (`https://www.youtube.com/watch?v=rAzT5lcezPs`) using videoMcp tools. The model chained three tool calls:
|
|
210
|
+
|
|
211
|
+
| # | Tool | Result |
|
|
212
|
+
| --- | --------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
|
|
213
|
+
| 1 | `mcp_videomcp_get_video_info` | ✅ 17 min 6 sec, 1280×720 @ 30 fps, h264, 96 MB |
|
|
214
|
+
| 2 | `mcp_videomcp_analyze_video` (Gemini backend) | ✅ Identified presenter (PewDiePie), topic (Odysseus — local self-hosted AI workspace), key features, sponsors |
|
|
215
|
+
| 3 | `mcp_videomcp_transcribe_video` (Deepgram) | ✅ 17 KB transcript written to file |
|
|
216
|
+
|
|
217
|
+
**Findings:**
|
|
218
|
+
|
|
219
|
+
- ✅ All three tool calls succeeded without errors.
|
|
220
|
+
- ✅ Model chained calls logically (metadata → analysis → transcript) rather than asking the user to re-prompt.
|
|
221
|
+
- ✅ Each tool result was incorporated into subsequent reasoning.
|
|
222
|
+
- ✅ No `<think>` tag or `reasoning_content` degradation observed mid-conversation — the multi-turn loop did not visibly break the model, contradicting the original Phase 1 worry that `<think>` tags would cause problems.
|
|
223
|
+
|
|
224
|
+
#### Phase 4 — Long-context test
|
|
225
|
+
|
|
226
|
+
**Skipped.** The 1M context claim rests on MiniMax's published benchmarks and was not independently verified here; the curl test in Phase 1 only confirmed single-turn support for multi-KB prompts. The long-context pressure test is deferred until a real workload requires it.
|
|
227
|
+
|
|
228
|
+
### Final verdict
|
|
229
|
+
|
|
230
|
+
- Acceptable for plain chat: **yes**
|
|
231
|
+
- Acceptable for streaming chat: **yes**
|
|
232
|
+
- Acceptable for tool-enabled agent use: **yes**
|
|
233
|
+
- Acceptable for vision: **yes**
|
|
234
|
+
- Acceptable without a proxy: **yes**
|
|
235
|
+
|
|
236
|
+
## References
|
|
237
|
+
|
|
238
|
+
- MiniMax Official Website: `https://www.minimax.io/`
|
|
239
|
+
- MiniMax API Documentation: `https://platform.minimax.io/docs/guides/models-intro`
|
|
240
|
+
- MiniMax M3 Model Page: `https://www.minimax.io/models/text/m3`
|
|
241
|
+
- MiniMax Text Generation Guide: `https://platform.minimax.io/docs/guides/text-generation`
|
|
242
|
+
- MiniMax Tool Use & Interleaved Thinking: `https://platform.minimax.io/docs/guides/text-m3-function-call`
|
|
243
|
+
- MiniMax Pricing: `https://platform.minimax.io/docs/pricing/overview`
|
|
244
|
+
- MiniMax Pay as You Go: `https://platform.minimax.io/docs/guides/pricing-paygo`
|
|
245
|
+
- MiniMax Token Plan: `https://platform.minimax.io/docs/guides/pricing-token-plan`
|
|
246
|
+
- MiniMax Rate Limits: `https://platform.minimax.io/docs/guides/rate-limits`
|
|
247
|
+
- MiniMax M3 for AI Coding Tools: `https://platform.minimax.io/docs/guides/text-ai-coding-tools`
|
|
@@ -0,0 +1,320 @@
|
|
|
1
|
+
# Qwen (DashScope) — VS Code Custom Endpoint Setup Guide
|
|
2
|
+
|
|
3
|
+
> **TL;DR:** Direct path works for both `qwen3.6-plus` (vision) and `qwen3.7-max` (text-only) without a proxy. The optional `proxy/qwen-proxy.mjs` adds dynamic thinking suppression: reasoning stays ON in plain chat but turns OFF automatically when tools are invoked. Pick the mode that matches your tradeoff.
|
|
4
|
+
|
|
5
|
+
## At a Glance
|
|
6
|
+
|
|
7
|
+
| Field | Value |
|
|
8
|
+
| ------------------------------- | ------------------------------------------------------------------------- |
|
|
9
|
+
| Mode | **Direct** (no proxy) **or** **Proxy** (optional, for dynamic thinking) |
|
|
10
|
+
| Vision | ✅ Yes (`qwen3.6-plus` only) |
|
|
11
|
+
| Tool calling | ✅ Yes |
|
|
12
|
+
| Context | 1M |
|
|
13
|
+
| Required `requestBody` (direct) | `enable_thinking: false` |
|
|
14
|
+
| Required `requestBody` (proxy) | none — proxy injects based on `tools` presence |
|
|
15
|
+
| Endpoint | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` |
|
|
16
|
+
| Proxy endpoint | `http://127.0.0.1:3458/v1/chat/completions` |
|
|
17
|
+
|
|
18
|
+
### Models at a glance
|
|
19
|
+
|
|
20
|
+
| Model | Vision | Role |
|
|
21
|
+
| -------------- | ------ | -------------------------------------- |
|
|
22
|
+
| `qwen3.6-plus` | ✅ Yes | Primary model with image understanding |
|
|
23
|
+
| `qwen3.7-max` | ❌ No | Larger text-only model |
|
|
24
|
+
|
|
25
|
+
> The snapshot `qwen3.6-plus-2026-04-02` is also available; the floating `qwen3.6-plus` alias is preferred.
|
|
26
|
+
|
|
27
|
+
## Quick Start — Direct Path (Recommended for Simplicity)
|
|
28
|
+
|
|
29
|
+
1. **Edit `chatLanguageModels.json`** — add the Qwen block from [Setup § Direct](#direct-path) below.
|
|
30
|
+
2. **Set your `DASHSCOPE_API_KEY`** via Command Palette → **Chat: Manage Language Models**.
|
|
31
|
+
3. **Restart VS Code** and pick "Qwen 3.6 Plus" or "Qwen 3.7 Max".
|
|
32
|
+
|
|
33
|
+
## Quick Start — With Proxy (Dynamic Thinking)
|
|
34
|
+
|
|
35
|
+
1. **Start the proxy:** `npm run proxy:qwen`.
|
|
36
|
+
2. **Edit `chatLanguageModels.json`** — use the proxy-path block from [Setup § Proxy](#proxy-path) below.
|
|
37
|
+
3. **Set your DashScope API key** via the Language Models UI.
|
|
38
|
+
4. **Restart VS Code.** Reasoning will be visible in plain chat and suppressed on tool turns.
|
|
39
|
+
|
|
40
|
+
## Setup
|
|
41
|
+
|
|
42
|
+
### Regional endpoints
|
|
43
|
+
|
|
44
|
+
DashScope is region-specific — your API key only works on the endpoint it was created for:
|
|
45
|
+
|
|
46
|
+
| Region | Endpoint |
|
|
47
|
+
| ---------------------------------- | ------------------------------------------------------------------------- |
|
|
48
|
+
| **Singapore (used in this guide)** | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` |
|
|
49
|
+
| China (Beijing) | `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions` |
|
|
50
|
+
| US (Virginia) | `https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions` |
|
|
51
|
+
|
|
52
|
+
### Direct path
|
|
53
|
+
|
|
54
|
+
```json
{
  "name": "Qwen",
  "vendor": "customendpoint",
  "apiKey": "",
  "apiType": "chat-completions",
  "models": [
    {
      "id": "qwen3.7-max",
      "name": "Qwen 3.7 Max",
      "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "streaming": true,
      "requestBody": {
        "enable_thinking": false
      }
    },
    {
      "id": "qwen3.6-plus",
      "name": "Qwen 3.6 Plus",
      "url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
      "toolCalling": true,
      "vision": true,
      "streaming": true,
      "requestBody": {
        "enable_thinking": false
      }
    }
  ]
}
```
|
|
86
|
+
|
|
87
|
+
> **`enable_thinking: false`** suppresses the Qwen3 family's default thinking mode, which prevents `reasoning_content` issues during tool loops.
|
|
88
|
+
|
|
89
|
+
### Proxy path
|
|
90
|
+
|
|
91
|
+
#### 1. Start the proxy
|
|
92
|
+
|
|
93
|
+
```bash
|
|
94
|
+
node proxy/qwen-proxy.mjs
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
Expected output:
|
|
98
|
+
|
|
99
|
+
```
|
|
100
|
+
[qwen-proxy] listening on http://127.0.0.1:3458/v1/chat/completions
|
|
101
|
+
[qwen-proxy] forwarding to https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
Verify it's alive:
|
|
105
|
+
|
|
106
|
+
```bash
|
|
107
|
+
curl http://127.0.0.1:3458/healthz
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
Expected response:
|
|
111
|
+
|
|
112
|
+
```json
{
  "ok": true,
  "upstreamUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
  "port": 3458,
  "disableThinkingWithTools": true
}
```
|
|
120
|
+
|
|
121
|
+
#### 2. Update VS Code config — point URLs to the proxy and remove `requestBody.enable_thinking`
|
|
122
|
+
|
|
123
|
+
```json
{
  "name": "Qwen",
  "vendor": "customendpoint",
  "apiKey": "",
  "apiType": "chat-completions",
  "models": [
    {
      "id": "qwen3.7-max",
      "name": "Qwen 3.7 Max",
      "url": "http://127.0.0.1:3458/v1/chat/completions",
      "toolCalling": true,
      "vision": false,
      "streaming": true
    },
    {
      "id": "qwen3.6-plus",
      "name": "Qwen 3.6 Plus",
      "url": "http://127.0.0.1:3458/v1/chat/completions",
      "toolCalling": true,
      "vision": true,
      "streaming": true
    }
  ]
}
```
|
|
149
|
+
|
|
150
|
+
> **Keep the proxy terminal open** while using Qwen via proxy.
|
|
151
|
+
|
|
152
|
+
#### Proxy environment variables
|
|
153
|
+
|
|
154
|
+
All can be set in a `.env` file at the repo root (both proxies `import 'dotenv/config'` automatically).
|
|
155
|
+
|
|
156
|
+
| Variable | Default | Purpose |
|
|
157
|
+
| ---------------------------------------- | ------------------------------------------------------------------------- | -------------------------------------------------- |
|
|
158
|
+
| `QWEN_PROXY_PORT` | `3458` (falls back to `PORT`) | Local listen port |
|
|
159
|
+
| `QWEN_UPSTREAM_URL` | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` | Upstream DashScope endpoint |
|
|
160
|
+
| `QWEN_PROXY_LOG` | `debug_log/qwen-proxy.ndjson` (relative to repo root) | Redacted NDJSON log path |
|
|
161
|
+
| `QWEN_PROXY_DISABLE_THINKING_WITH_TOOLS` | `1` | Set to `0` to skip tool-aware thinking suppression |
|
|
162
|
+
|
|
163
|
+
#### Proxy request rewriting rules
|
|
164
|
+
|
|
165
|
+
| Condition                                | Action                                                      |
| ---------------------------------------- | ----------------------------------------------------------- |
| `body.tools` is a non-empty array        | Set `body.enable_thinking = false`                          |
| `body.tools` is missing, empty, or falsy | Delete `body.enable_thinking` (let model default to `true`) |
|
|
169
|
+
|
|
170
|
+
> **Why delete rather than set `true`?** Omitting the key lets Qwen use its built-in default (`true`). Deletion is closer to "don't interfere."
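
The two rewriting rules above can be sketched as a pure function. This is an illustration of the rules as documented, not the actual source of `proxy/qwen-proxy.mjs`. The name `rewriteThinking` is hypothetical, and the behavior when suppression is switched off (pass the body through unchanged) is an assumption.

```javascript
// Sketch: apply the tool-aware thinking rules to a request body.
// Returns a shallow copy so the caller's object is not mutated.
function rewriteThinking(body, disableThinkingWithTools = true) {
  // Assumption: with suppression disabled, the body is forwarded untouched.
  if (!disableThinkingWithTools) return body;

  const rewritten = { ...body };
  if (Array.isArray(rewritten.tools) && rewritten.tools.length > 0) {
    // Tools present: force thinking off so tool loops stay stable.
    rewritten.enable_thinking = false;
  } else {
    // No tools: drop the key so the model falls back to its own default.
    delete rewritten.enable_thinking;
  }
  return rewritten;
}

console.log(rewriteThinking({ model: "qwen3.6-plus", tools: [{ type: "function" }] }));
console.log(rewriteThinking({ model: "qwen3.6-plus", enable_thinking: false }));
```

Keeping the rule in a function with no I/O makes it straightforward to test separately from the HTTP forwarding.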
|
|
171
|
+
|
|
172
|
+
### API key
|
|
173
|
+
|
|
174
|
+
1. Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS).
|
|
175
|
+
2. Run **Chat: Manage Language Models**.
|
|
176
|
+
3. Find the **Qwen** group → **Update API Key**.
|
|
177
|
+
4. Paste your DashScope API key.
|
|
178
|
+
|
|
179
|
+
> After setting via the UI, VS Code replaces `"apiKey": ""` with a `${input:chat.lm.secret.<id>}` reference.
|
|
180
|
+
|
|
181
|
+
## Configuration Reference
|
|
182
|
+
|
|
183
|
+
### Thinking mode
|
|
184
|
+
|
|
185
|
+
The Qwen3 hybrid-thinking models default to `enable_thinking: true`, producing `reasoning_content` in responses. This is **harmless in plain chat** (you see the model's reasoning) but **can break agent/tool-calling loops**: VS Code may not preserve `reasoning_content` in follow-up tool-result messages, and the model may then reject the continuation.
|
|
186
|
+
|
|
187
|
+
| Mode | Plain chat | Tool turns |
|
|
188
|
+
| ------------------- | ------------------------------- | ----------------------------- |
|
|
189
|
+
| Direct path | Thinking OFF (always) | Thinking OFF |
|
|
190
|
+
| Proxy path | Thinking ON (default preserved) | Thinking OFF (auto-injected) |
|
|
191
|
+
| No config (default) | Thinking ON | Risk: history may be rejected |
|
|
192
|
+
|
|
193
|
+
### Vision (`qwen3.6-plus` only)
|
|
194
|
+
|
|
195
|
+
- Image input via OpenAI-compatible `content` array format (base64 data URIs).
|
|
196
|
+
- **External image URLs may fail** if DashScope's servers cannot reach them — base64-encoded images work reliably.
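
Because base64 data URIs are the reliable path, a request message can embed the image bytes directly. The sketch below builds such a message in the OpenAI-compatible `content` array shape. `buildImageMessage` is a hypothetical helper, and the default MIME type is an assumption.

```javascript
// Sketch: build a user message that carries an image as a base64 data URI.
// `imageBytes` can be a Buffer or Uint8Array (for example from fs.readFileSync).
function buildImageMessage(text, imageBytes, mimeType = "image/png") {
  const base64 = Buffer.from(imageBytes).toString("base64");
  return {
    role: "user",
    content: [
      { type: "text", text },
      { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } },
    ],
  };
}

// Two placeholder bytes stand in for real image data here.
const exampleMessage = buildImageMessage("Describe this image.", Buffer.from("hi"));
console.log(exampleMessage.content[1].image_url.url); // prints "data:image/png;base64,aGk="
```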
|
|
197
|
+
|
|
198
|
+
### Capabilities
|
|
199
|
+
|
|
200
|
+
- Streaming (SSE, `data: [DONE]` terminator).
|
|
201
|
+
- Tool calling with `tools` array and `tool_calls` response.
|
|
202
|
+
- Vision (image input) on `qwen3.6-plus` only.
|
|
203
|
+
- Non-OpenAI extras: `enable_thinking`, `thinking_budget`, `enable_search`. Set them through `requestBody` in this config, or through `extra_body` when using the OpenAI SDK.
|
|
204
|
+
|
|
205
|
+
## Troubleshooting
|
|
206
|
+
|
|
207
|
+
| Symptom | Likely cause | Fix |
|
|
208
|
+
| ----------------------------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
|
|
209
|
+
| "Connection refused" (proxy mode) | Proxy not running | `npm run proxy:qwen` |
|
|
210
|
+
| Tool loops fail with `reasoning_content` errors | Direct path missing `enable_thinking: false` | Add `enable_thinking: false` to `requestBody` |
|
|
211
|
+
| Tool loops still fail with proxy | Proxy not rewriting | Check `debug_log/qwen-proxy.ndjson` — verify `hasTools: true` requests have `rewrittenEnableThinking: false` |
|
|
212
|
+
| Vision fails with external image URL | DashScope couldn't reach the URL | Use a base64 data URI instead |
|
|
213
|
+
| 401 Unauthorized | API key region mismatch | Match your key to the regional endpoint |
|
|
214
|
+
| Intermittent `net::ERR_CONNECTION_RESET` | Transient VS Code / Electron transport | Retry; not reproducible via `curl` or Node.js |
|
|
215
|
+
| Want to switch back to direct | Proxy mode active | Revert `url` to DashScope endpoint and restore `requestBody.enable_thinking: false` |
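
The log check from the "Tool loops still fail with proxy" row can be automated. The sketch below scans NDJSON text for tool requests that were not rewritten. Only the `hasTools` and `rewrittenEnableThinking` fields come from this guide; the rest of the log schema is unknown, and `findUnrewrittenToolRequests` is a hypothetical name.

```javascript
// Sketch: return log entries where tools were present but thinking
// was not rewritten to false. Expects one JSON object per line.
function findUnrewrittenToolRequests(ndjsonText) {
  return ndjsonText
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.hasTools === true && entry.rewrittenEnableThinking !== false);
}

const exampleLog = [
  '{"hasTools":true,"rewrittenEnableThinking":false}',
  '{"hasTools":false}',
  '{"hasTools":true}',
].join("\n");

console.log(findUnrewrittenToolRequests(exampleLog).length); // prints 1
```

In practice the text would come from reading `debug_log/qwen-proxy.ndjson`, for example with `fs.readFileSync(path, "utf8")`; an empty result means every logged tool request was rewritten.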
|
|
216
|
+
|
|
217
|
+
## Pricing
|
|
218
|
+
|
|
219
|
+
For the cross-provider comparison, see [docs/pricing.md](../pricing.md). DashScope (international) rates for **non-thinking** mode:
|
|
220
|
+
|
|
221
|
+
| Model | Input (≤ 256K tokens) | Input (> 256K tokens) | Output (≤ 256K tokens) | Output (> 256K tokens) |
|
|
222
|
+
| -------------- | --------------------- | --------------------- | ---------------------- | ---------------------- |
|
|
223
|
+
| `qwen3.6-plus` | $0.50 / 1M | $2.00 / 1M | $3.00 / 1M | $6.00 / 1M |
|
|
224
|
+
| `qwen3.7-max` | $2.50 / 1M (≤ 1M) | — | $7.50 / 1M (≤ 1M) | — |
|
|
225
|
+
|
|
226
|
+
> **Free quota:** DashScope offers 1M input + 1M output tokens per model, valid for 90 days after activating Model Studio.
|
|
227
|
+
|
|
228
|
+
---
|
|
229
|
+
|
|
230
|
+
## Background & Findings
|
|
231
|
+
|
|
232
|
+
> This appendix preserves the validation narrative for future reference. It is not required to use the model.
|
|
233
|
+
|
|
234
|
+
### Why a proxy is useful (and why a static `enable_thinking: false` is enough)
|
|
235
|
+
|
|
236
|
+
Both work — pick based on your preference:
|
|
237
|
+
|
|
238
|
+
- **Direct path** is the simplest: static `enable_thinking: false` suppresses reasoning in all requests. Tool loops stay stable. Trade-off: you never see the model's thought process.
|
|
239
|
+
- **Proxy path** is dynamic: reasoning stays ON in plain chat (you see it), and the proxy automatically sets `enable_thinking: false` when `tools` is present (loops stay stable). Best of both worlds, at the cost of running a local process.
|
|
240
|
+
|
|
241
|
+
### Validation results (June 1, 2026)
|
|
242
|
+
|
|
243
|
+
#### Proxy validation (8 checks, all passed)
|
|
244
|
+
|
|
245
|
+
| # | Check | Result |
|
|
246
|
+
| --- | ---------------------------------------- | -------------------------------------------------------------------------------- |
|
|
247
|
+
| 1 | Proxy starts | ✅ `--help` prints correct usage and defaults |
|
|
248
|
+
| 2 | Health check | ✅ Returns `{"ok":true,"port":3458,...}` |
|
|
249
|
+
| 3 | Plain chat (no tools) → thinking ON | ✅ Response contains `reasoning_content`; `enable_thinking` deleted |
|
|
250
|
+
| 4 | Tool chat (tools present) → thinking OFF | ✅ No `reasoning_content`; clean `tool_calls`; `enable_thinking: false` injected |
|
|
251
|
+
| 5 | Streaming passthrough | ✅ SSE chunks arrive correctly with `text/event-stream` |
|
|
252
|
+
| 6 | Error passthrough | ✅ Invalid JSON returns HTTP 400 with useful error message |
|
|
253
|
+
| 7 | Auth passthrough | ✅ Missing key → 401; valid key → 200 |
|
|
254
|
+
| 8 | Logging | ✅ All entries redact `Authorization: Bearer <redacted>` |
|
|
255
|
+
|
|
256
|
+
#### Direct-path validation — `qwen3.7-max`
|
|
257
|
+
|
|
258
|
+
| Capability | Result | Notes |
|
|
259
|
+
| -------------------------------------------- | ------ | ---------------------------------------------------------------------- |
|
|
260
|
+
| Non-streaming chat (curl) | ✅ | HTTP 200, valid assistant message; `reasoning_content` present |
|
|
261
|
+
| Streaming chat (curl) | ✅ | HTTP 200, SSE chunks; `reasoning_content` streamed alongside `content` |
|
|
262
|
+
| Tool-enabled chat (thinking on) | ✅ | HTTP 200, `finish_reason: tool_calls` — `reasoning_content` present |
|
|
263
|
+
| Tool-enabled chat (`enable_thinking: false`) | ✅ | HTTP 200, clean OpenAI shape, no `reasoning_content`, 25 tokens vs 170 |
|
|
264
|
+
| Model appears in VS Code picker | ✅ | "Agent \| Qwen 3.7 Max" confirmed |
|
|
265
|
+
| Plain chat in VS Code | ✅ | Streaming output confirmed |
|
|
266
|
+
| Streaming in VS Code | ✅ | Token-by-token streaming confirmed |
|
|
267
|
+
| Tool / agent use in VS Code | ✅ | Browser tool invoked successfully |
|
|
268
|
+
|
|
269
|
+
#### Direct-path validation — `qwen3.6-plus`
|
|
270
|
+
|
|
271
|
+
| Capability | Result | Notes |
|
|
272
|
+
| -------------------------------------------- | ------ | ---------------------------------------------------------------------------- |
|
|
273
|
+
| Non-streaming chat (curl) | ✅ | HTTP 200; `reasoning_content` present (727 reasoning tokens) |
|
|
274
|
+
| Streaming chat (curl) | ✅ | SSE chunks with `reasoning_content` deltas streaming correctly |
|
|
275
|
+
| Tool-enabled chat (`enable_thinking: false`) | ✅ | Clean `tool_calls`, no `reasoning_content`, 25 tokens |
|
|
276
|
+
| Vision: image + text (curl, base64) | ✅ | Model correctly identified a 10×10 test pattern; `image_tokens: 66` |
|
|
277
|
+
| Vision: image + text (curl, external URL) | ❌ | `Failed to download multimodal content` — DashScope couldn't reach Wikipedia |
|
|
278
|
+
| Model appears in VS Code picker | ✅ | "Agent \| Qwen 3.6 Plus" confirmed |
|
|
279
|
+
| Plain chat in VS Code | ✅ | Streaming output confirmed |
|
|
280
|
+
| Streaming in VS Code | ✅ | Token-by-token streaming confirmed |
|
|
281
|
+
| Tool / agent use in VS Code | ✅ | Browser tool invoked to open Qwen docs and Google |
|
|
282
|
+
| Vision in VS Code | ✅ | Image attachment analyzed correctly |
|
|
283
|
+
|
|
284
|
+
#### Intermittent `ERR_CONNECTION_RESET` investigation
|
|
285
|
+
|
|
286
|
+
A `net::ERR_CONNECTION_RESET` was observed once during `qwen3.6-plus` validation, but did not reproduce on the same machine outside VS Code:
|
|
287
|
+
|
|
288
|
+
- Direct `curl` POST to DashScope Singapore → HTTP 200.
|
|
289
|
+
- Direct Node.js HTTPS POST → HTTP 200.
|
|
290
|
+
- Direct Node.js HTTPS **streaming** POST with full `qwen3.6-plus.md` content embedded → HTTP 200.
|
|
291
|
+
|
|
292
|
+
Conclusion: not a DashScope or Qwen model incompatibility. Evidence points to an intermittent VS Code / Electron transport issue or transient network interruption local to the editor process.
|
|
293
|
+
|
|
294
|
+
### Final verdict
|
|
295
|
+
|
|
296
|
+
| Criterion | `qwen3.7-max` | `qwen3.6-plus` |
|
|
297
|
+
| ---------------------- | -------------- | -------------- |
|
|
298
|
+
| Plain chat | ✅ | ✅ |
|
|
299
|
+
| Streaming chat | ✅ | ✅ |
|
|
300
|
+
| Tool-enabled agent use | ✅ | ✅ |
|
|
301
|
+
| Vision | ❌ (text-only) | ✅ |
|
|
302
|
+
| Without a proxy | ✅ | ✅ |
|
|
303
|
+
|
|
304
|
+
### Known limitations
|
|
305
|
+
|
|
306
|
+
- GitHub Copilot inline completions and semantic-search features remain outside scope.
|
|
307
|
+
- One intermittent VS Code-side `net::ERR_CONNECTION_RESET` was observed; it was not reproducible externally and is treated as a transient transport issue.
|
|
308
|
+
- External image URLs may fail if DashScope's servers cannot reach them; base64-encoded images work reliably (`qwen3.6-plus`).
|
|
309
|
+
- Vision is not supported on `qwen3.7-max` (text-generation model).
|
|
310
|
+
- `maxInputTokens` / `maxOutputTokens` have not yet been confirmed against official DashScope documentation.
|
|
311
|
+
- API keys are region-specific — a key created for one regional endpoint will not work with another.
|
|
312
|
+
|
|
313
|
+
## References
|
|
314
|
+
|
|
315
|
+
- VS Code custom endpoint docs: `https://code.visualstudio.com/docs/copilot/customization/language-models#_add-a-custom-endpoint-model`
|
|
316
|
+
- DashScope OpenAI-compatible Chat Completions overview: `https://help.aliyun.com/zh/model-studio/compatibility-of-openai-with-dashscope`
|
|
317
|
+
- DashScope model index: `https://help.aliyun.com/zh/model-studio/getting-started/models`
|
|
318
|
+
- DashScope vision model guide: `https://help.aliyun.com/zh/model-studio/vision`
|
|
319
|
+
- DashScope pricing: `https://www.alibabacloud.com/help/en/model-studio/billing-for-model-studio`
|
|
320
|
+
- Kimi K2.6 validation record (separate provider): [kimi-k2.6.md](kimi-k2.6.md)
|