copilot-custom-endpoint 1.0.0
- package/README.md +470 -0
- package/cli.mjs +54 -0
- package/package.json +50 -0
- package/proxy/kimi-proxy.mjs +158 -0
- package/proxy/qwen-proxy.mjs +114 -0
package/README.md
ADDED
|
@@ -0,0 +1,470 @@
|
|
|
1
|
+
# GitHub Copilot Custom Endpoints
|
|
2
|
+
|
|
3
|
+
> **TL;DR** — As of **June 1, 2026**, GitHub Copilot switched to usage-based billing (AI Credits), making every chat and agent session consume from your monthly allowance. Frontier models like GPT-5.5 and Opus 4.8 burn credits fast. This repo documents a practical workaround: use **cheaper, non-GitHub models** (DeepSeek, Kimi, Qwen) inside VS Code's Copilot chat — often at **5–55× lower cost** while retaining agent mode, tool calling, and streaming. We keep validated, copy-paste-ready configs and a small local proxy that smooths out provider quirks.
|
|
4
|
+
|
|
5
|
+
## What is this?
|
|
6
|
+
|
|
7
|
+
VS Code lets you add your own language-model endpoint ("Bring Your Own Key"). In practice, many providers claim "OpenAI-compatible" APIs but reject the exact request shapes that VS Code sends. This repo is a growing collection of **real, tested setups** — not just hopeful `curl` snippets.
|
|
8
|
+
|
|
9
|
+
Each provider/model gets one durable record under `docs/models/` plus any local proxy code it needs under `proxy/`.
|
|
10
|
+
|
|
11
|
+
### Why custom endpoints instead of OpenRouter?
|
|
12
|
+
|
|
13
|
+
[OpenRouter](https://openrouter.ai) is a popular unified gateway, but it is **not always an option**:
|
|
14
|
+
|
|
15
|
+
- **Corporate firewalls often block OpenRouter** (and many other cloud AI gateways) by default. If your employer's network blocks OpenRouter, you cannot use it — full stop. A custom endpoint lets you talk directly to a provider that _is_ allowed, or run a small local proxy on `localhost` that forwards through an approved egress path.
|
|
16
|
+
- **Provider-specific features** (Kimi's thinking mode, vision quirks, etc.) often need request rewriting that a generic aggregator does not support.
|
|
17
|
+
- **Cost or contract reasons** may mean your organisation already has a direct relationship with a specific provider and does not want traffic routed through a third party.
|
|
18
|
+
|
|
19
|
+
This repo is for those situations: validated, copy-paste-ready configs when OpenRouter is blocked, too expensive, or simply the wrong tool for the job.
|
|
20
|
+
|
|
21
|
+
## Quick start
|
|
22
|
+
|
|
23
|
+
| Provider | Model | Needs proxy? | Plain chat | Streaming | Tool calling | Vision |
|
|
24
|
+
| ----------------------------- | -------------- | ---------------------------------- | ---------- | --------- | ------------ | ------ |
|
|
25
|
+
| **Moonshot (Kimi)** | `kimi-k2.6` | Yes — `proxy/kimi-proxy.mjs` | ✅ | ✅ | ✅ | ✅ |
|
|
26
|
+
| **Alibaba Cloud (DashScope)** | `qwen3.6-plus` | Optional — `proxy/qwen-proxy.mjs`¹ | ✅² | ✅ | ✅ | ✅ |
|
|
27
|
+
| **Alibaba Cloud (DashScope)** | `qwen3.7-max` | Optional — `proxy/qwen-proxy.mjs`¹ | ✅² | ✅ | ✅ | ❌ |
|
|
28
|
+
| **DeepSeek** | `deepseek-v4` | No — uses a VS Code extension | ✅ | ✅ | ✅ | ✅¹ |
|
|
29
|
+
|
|
30
|
+
¹ Proxy is optional: direct path works with static `enable_thinking: false`. Proxy adds dynamic thinking suppression (thinking ON in plain chat, OFF in tool loops).
|
|
31
|
+
² With proxy: reasoning visible in plain chat. Without proxy: always suppressed.
|
|
32
|
+
|
|
33
|
+
¹ For DeepSeek: vision is supported through a vision proxy model (for example Claude or GPT-4o) that describes the image before it is sent to DeepSeek.
|
|
34
|
+
|
|
35
|
+
Pick the model you want and follow the corresponding section below.
|
|
36
|
+
|
|
37
|
+
### Config file location
|
|
38
|
+
|
|
39
|
+
The Kimi and Qwen setups require editing the same VS Code config file:
|
|
40
|
+
|
|
41
|
+
| OS | Path |
|
|
42
|
+
| ------- | ----------------------------------------------------------------- |
|
|
43
|
+
| Windows | `%APPDATA%\Code\User\chatLanguageModels.json` |
|
|
44
|
+
| macOS | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
|
|
45
|
+
| Linux | `~/.config/Code/User/chatLanguageModels.json` |
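
The table above can be read as a simple lookup. The sketch below is illustrative only: `chatModelsConfigPath` is a hypothetical helper name (not part of this package), it assumes a stable (non-Insiders) VS Code install, and it joins paths by hand so the result does not depend on the host OS.

```javascript
// Hypothetical helper: returns the expected location of
// chatLanguageModels.json for a given platform.
// `platform` uses Node's process.platform values; `env` supplies APPDATA / HOME.
function chatModelsConfigPath(platform, env) {
  const file = 'chatLanguageModels.json'
  switch (platform) {
    case 'win32':
      return `${env.APPDATA}\\Code\\User\\${file}`
    case 'darwin':
      return `${env.HOME}/Library/Application Support/Code/User/${file}`
    case 'linux':
      return `${env.HOME}/.config/Code/User/${file}`
    default:
      throw new Error(`Unsupported platform: ${platform}`)
  }
}

// Usage: resolve the path for the current machine.
// console.log(chatModelsConfigPath(process.platform, process.env))
```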
|
|
46
|
+
|
|
47
|
+
### Kimi K2.6 (Moonshot)
|
|
48
|
+
|
|
49
|
+
#### 1. Grab a Moonshot API key
|
|
50
|
+
|
|
51
|
+
Sign up at [platform.moonshot.ai](https://platform.moonshot.ai) and create an API key.
|
|
52
|
+
|
|
53
|
+
#### 2. Start the local proxy
|
|
54
|
+
|
|
55
|
+
The proxy rewrites VS Code's requests into shapes Kimi actually accepts: it forces fixed `temperature` and `top_p` values and disables "thinking" on requests that include tools.
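
To make the rewrite concrete, here is a minimal sketch of the idea. It is not the code in `proxy/kimi-proxy.mjs`: `rewriteKimiPayload` is a hypothetical name, and the values (temperature 1 with thinking, 0.6 without, `top_p` 0.95) are taken from the header comment of the shipped proxy rather than re-verified against Moonshot's API.

```javascript
// Illustrative sketch only; the real logic lives in proxy/kimi-proxy.mjs.
function rewriteKimiPayload(payload) {
  // A request counts as a tool request when it carries a non-empty `tools` array.
  const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0
  // Thinking is off if the caller disabled it or if tools are present.
  const thinkingOff = hasTools || payload.thinking?.type === 'disabled'

  const rewritten = {
    ...payload,
    top_p: 0.95,
    temperature: thinkingOff ? 0.6 : 1,
  }
  // Tool loops: force thinking off explicitly.
  if (hasTools) rewritten.thinking = { type: 'disabled' }
  return rewritten
}
```

Whatever `temperature` VS Code sends is overwritten, which is why the value in `chatLanguageModels.json` does not need to be exact.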
|
|
56
|
+
|
|
57
|
+
```bash
|
|
58
|
+
# from this repo — Kimi only
|
|
59
|
+
npm run proxy:kimi
|
|
60
|
+
# from this repo — both proxies concurrently
|
|
61
|
+
npm run proxy
|
|
62
|
+
# or with npx (after npm publish)
|
|
63
|
+
npx copilot-custom-endpoint kimi
|
|
64
|
+
npx copilot-custom-endpoint # starts both proxies
|
|
65
|
+
# or directly
|
|
66
|
+
node proxy/kimi-proxy.mjs
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
You should see:
|
|
70
|
+
|
|
71
|
+
```
|
|
72
|
+
[kimi-proxy] listening on http://127.0.0.1:3457/v1/chat/completions
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
Check it's alive:
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
curl http://127.0.0.1:3457/healthz
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Expected response:
|
|
82
|
+
|
|
83
|
+
```json
|
|
84
|
+
{
|
|
85
|
+
"ok": true,
|
|
86
|
+
"upstreamUrl": "https://api.moonshot.ai/v1/chat/completions",
|
|
87
|
+
"port": 3457,
|
|
88
|
+
"forcedTemperature": 1,
|
|
89
|
+
"forcedTopP": 0.95
|
|
90
|
+
}
|
|
91
|
+
```
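
If you would rather check the response in a script than by eye, a small validator is enough. This is a sketch under assumptions: `isHealthyKimiProxy` is a hypothetical helper, and it only checks the fields shown in the expected response above.

```javascript
// Hypothetical helper: checks a parsed /healthz body against the
// fields shown in the expected response.
function isHealthyKimiProxy(body, expectedPort = 3457) {
  return (
    body !== null &&
    typeof body === 'object' &&
    body.ok === true &&
    body.port === expectedPort &&
    typeof body.upstreamUrl === 'string'
  )
}

// Usage sketch (needs the proxy running; Node 18+ provides a global fetch):
// const res = await fetch('http://127.0.0.1:3457/healthz')
// console.log(isHealthyKimiProxy(await res.json()))
```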
|
|
92
|
+
|
|
93
|
+
> **Keep this terminal open** while you use Kimi in VS Code.
|
|
94
|
+
|
|
95
|
+
#### 3. Register the model in VS Code
|
|
96
|
+
|
|
97
|
+
Open (or create) your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-moonshot-key>`):
|
|
98
|
+
|
|
99
|
+
```json
|
|
100
|
+
{
|
|
101
|
+
"name": "Kimi",
|
|
102
|
+
"vendor": "customendpoint",
|
|
103
|
+
"apiKey": "<your-moonshot-key>",
|
|
104
|
+
"apiType": "chat-completions",
|
|
105
|
+
"models": [
|
|
106
|
+
{
|
|
107
|
+
"id": "kimi-k2.6",
|
|
108
|
+
"name": "Kimi K2.6",
|
|
109
|
+
"url": "http://127.0.0.1:3457/v1/chat/completions",
|
|
110
|
+
"requestBody": {
|
|
111
|
+
"temperature": 1
|
|
112
|
+
},
|
|
113
|
+
"toolCalling": true,
|
|
114
|
+
"vision": true,
|
|
115
|
+
"streaming": true,
|
|
116
|
+
"maxInputTokens": 262144,
|
|
117
|
+
"maxOutputTokens": 32768
|
|
118
|
+
}
|
|
119
|
+
]
|
|
120
|
+
}
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
> **Note:** The `requestBody.temperature` here is only a hint to VS Code. The proxy overrides it with the values Kimi requires: `temperature` 1.0 with thinking enabled, 0.6 with thinking disabled, and `top_p` 0.95.
|
|
124
|
+
|
|
125
|
+
#### 4. Chat!
|
|
126
|
+
|
|
127
|
+
- Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
|
|
128
|
+
- Click the model picker (top-right of the chat input).
|
|
129
|
+
- Choose **Kimi K2.6**.
|
|
130
|
+
- Ask something. Streaming, tool use, and vision all work.
|
|
131
|
+
|
|
132
|
+
#### Troubleshooting (Kimi)
|
|
133
|
+
|
|
134
|
+
| Symptom | Fix |
|
|
135
|
+
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
136
|
+
| "Connection refused" or no response | Make sure `node proxy/kimi-proxy.mjs` is still running. |
|
|
137
|
+
| `invalid temperature` / `invalid top_p` | You're talking directly to Moonshot instead of through the proxy. Double-check the `url` in `chatLanguageModels.json`. |
|
|
138
|
+
| Tool calls fail after first turn | This happens if "thinking" stays enabled during tool loops. The proxy normally disables it automatically; ensure you're on the latest `proxy/kimi-proxy.mjs`. |
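
The second row (requests going straight to Moonshot) can be checked mechanically. The sketch below is an assumption-laden example, not part of the package: `pointsAtLocalProxy` is a hypothetical helper that inspects the `url` you put in `chatLanguageModels.json`.

```javascript
// Hypothetical check: does a model entry's `url` target the local proxy?
function pointsAtLocalProxy(url, port) {
  const parsed = new URL(url)
  const isLoopback =
    parsed.hostname === '127.0.0.1' || parsed.hostname === 'localhost'
  // URL#port is a string, and is empty when the URL uses the scheme default.
  return isLoopback && parsed.port === String(port)
}

// Usage:
// pointsAtLocalProxy('http://127.0.0.1:3457/v1/chat/completions', 3457)
```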
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
### Qwen 3.6 Plus or Qwen 3.7 Max (DashScope)
|
|
143
|
+
|
|
144
|
+
These models work with the optional `proxy/qwen-proxy.mjs` for dynamic thinking suppression (reasoning visible in plain chat, suppressed in tool loops). They also work **without a proxy** using a static `enable_thinking: false` — see the [direct path alternative](#direct-path-no-proxy) below.
|
|
145
|
+
|
|
146
|
+
#### 1. Grab a DashScope API key
|
|
147
|
+
|
|
148
|
+
Sign up at [dashscope.aliyun.com](https://dashscope.aliyun.com) and create an API key.
|
|
149
|
+
|
|
150
|
+
#### 2. Start the optional local proxy (recommended)
|
|
151
|
+
|
|
152
|
+
The proxy dynamically enables thinking in plain chat and disables it during tool calls:
|
|
153
|
+
|
|
154
|
+
```bash
|
|
155
|
+
# from this repo — Qwen only
|
|
156
|
+
npm run proxy:qwen
|
|
157
|
+
# from this repo — both proxies concurrently
|
|
158
|
+
npm run proxy
|
|
159
|
+
# or with npx (after npm publish)
|
|
160
|
+
npx copilot-custom-endpoint qwen
|
|
161
|
+
npx copilot-custom-endpoint # starts both proxies
|
|
162
|
+
# or directly
|
|
163
|
+
node proxy/qwen-proxy.mjs
|
|
164
|
+
```
|
|
165
|
+
|
|
166
|
+
You should see:
|
|
167
|
+
|
|
168
|
+
```
|
|
169
|
+
[qwen-proxy] listening on http://127.0.0.1:3458/v1/chat/completions
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
Check it's alive:
|
|
173
|
+
|
|
174
|
+
```bash
|
|
175
|
+
curl http://127.0.0.1:3458/healthz
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
Expected response:
|
|
179
|
+
|
|
180
|
+
```json
|
|
181
|
+
{
|
|
182
|
+
"ok": true,
|
|
183
|
+
"upstreamUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
184
|
+
"port": 3458,
|
|
185
|
+
"disableThinkingWithTools": true
|
|
186
|
+
}
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
#### 3. Register the models in VS Code
|
|
190
|
+
|
|
191
|
+
Open (or create) your user config file (see [Config file location](#config-file-location) above) and paste this entry (replace `<your-dashscope-key>`). Point URLs at the proxy and omit `requestBody` — the proxy handles thinking dynamically:
|
|
192
|
+
|
|
193
|
+
```json
|
|
194
|
+
{
|
|
195
|
+
"name": "Qwen",
|
|
196
|
+
"vendor": "customendpoint",
|
|
197
|
+
"apiKey": "<your-dashscope-key>",
|
|
198
|
+
"apiType": "chat-completions",
|
|
199
|
+
"models": [
|
|
200
|
+
{
|
|
201
|
+
"id": "qwen3.7-max",
|
|
202
|
+
"name": "Qwen 3.7 Max",
|
|
203
|
+
"url": "http://127.0.0.1:3458/v1/chat/completions",
|
|
204
|
+
"toolCalling": true,
|
|
205
|
+
"vision": false,
|
|
206
|
+
"streaming": true
|
|
207
|
+
},
|
|
208
|
+
{
|
|
209
|
+
"id": "qwen3.6-plus",
|
|
210
|
+
"name": "Qwen 3.6 Plus",
|
|
211
|
+
"url": "http://127.0.0.1:3458/v1/chat/completions",
|
|
212
|
+
"toolCalling": true,
|
|
213
|
+
"vision": true,
|
|
214
|
+
"streaming": true
|
|
215
|
+
}
|
|
216
|
+
]
|
|
217
|
+
}
|
|
218
|
+
```
|
|
219
|
+
|
|
220
|
+
> **Keep the proxy terminal open** while using these models.
|
|
221
|
+
|
|
222
|
+
#### 4. Chat!
|
|
223
|
+
|
|
224
|
+
- Open the Copilot chat panel (`Ctrl+Alt+I` / `Cmd+Ctrl+I`).
|
|
225
|
+
- Click the model picker (top-right of the chat input).
|
|
226
|
+
- Choose **Qwen 3.6 Plus** (with vision) or **Qwen 3.7 Max** (text only).
|
|
227
|
+
- Ask something. Streaming, tool use, and vision (3.6 Plus) all work.
|
|
228
|
+
|
|
229
|
+
> **Regional endpoints:** DashScope offers endpoints in several regions. When connecting directly (no proxy), use the URL for your region. The proxy uses `dashscope-intl.aliyuncs.com` (Singapore) by default, configurable via `QWEN_UPSTREAM_URL`.
|
|
230
|
+
>
|
|
231
|
+
> - **China (Beijing):** `https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions`
|
|
232
|
+
> - **US (Virginia):** `https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions`
|
|
233
|
+
> - **Singapore:** `https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions` (proxy default)
|
|
234
|
+
>
|
|
235
|
+
> API keys are region-specific.
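
The region list above maps naturally onto a lookup table. The following is a sketch only: the region keys (`beijing`, `virginia`, `singapore`) and the function name `dashscopeChatUrl` are made up for illustration, while the URLs are the ones listed above.

```javascript
// Region keys are hypothetical labels; the URLs come from the list above.
const DASHSCOPE_CHAT_URLS = new Map([
  ['beijing', 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions'],
  ['virginia', 'https://dashscope-us.aliyuncs.com/compatible-mode/v1/chat/completions'],
  ['singapore', 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions'],
])

// Returns the chat-completions URL for a region, or throws on an unknown key.
function dashscopeChatUrl(region) {
  const url = DASHSCOPE_CHAT_URLS.get(region)
  if (url === undefined) throw new Error(`Unknown DashScope region: ${region}`)
  return url
}
```

The returned URL can be used as the model `url` on the direct path, or passed to the proxy through the `QWEN_UPSTREAM_URL` environment variable. Remember to use an API key issued for the same region.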
|
|
236
|
+
|
|
237
|
+
#### Direct path (no proxy)
|
|
238
|
+
|
|
239
|
+
If you prefer not to run the proxy, Qwen models work **directly** with DashScope by using the upstream URL and a static `enable_thinking: false` in `requestBody`:
|
|
240
|
+
|
|
241
|
+
```json
|
|
242
|
+
{
|
|
243
|
+
"name": "Qwen",
|
|
244
|
+
"vendor": "customendpoint",
|
|
245
|
+
"apiKey": "<your-dashscope-key>",
|
|
246
|
+
"apiType": "chat-completions",
|
|
247
|
+
"models": [
|
|
248
|
+
{
|
|
249
|
+
"id": "qwen3.7-max",
|
|
250
|
+
"name": "Qwen 3.7 Max",
|
|
251
|
+
"url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
252
|
+
"toolCalling": true,
|
|
253
|
+
"vision": false,
|
|
254
|
+
"streaming": true,
|
|
255
|
+
"requestBody": {
|
|
256
|
+
"enable_thinking": false
|
|
257
|
+
}
|
|
258
|
+
},
|
|
259
|
+
{
|
|
260
|
+
"id": "qwen3.6-plus",
|
|
261
|
+
"name": "Qwen 3.6 Plus",
|
|
262
|
+
"url": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
|
|
263
|
+
"toolCalling": true,
|
|
264
|
+
"vision": true,
|
|
265
|
+
"streaming": true,
|
|
266
|
+
"requestBody": {
|
|
267
|
+
"enable_thinking": false
|
|
268
|
+
}
|
|
269
|
+
}
|
|
270
|
+
]
|
|
271
|
+
}
|
|
272
|
+
```
|
|
273
|
+
|
|
274
|
+
> **Trade-off:** `enable_thinking: false` suppresses reasoning in all requests (both plain chat and tool loops). Tool loops stay stable, but you never see the model's thought process. The proxy path avoids this trade-off.
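
The difference between the two paths comes down to who decides the value of `enable_thinking`. As a rough sketch of the dynamic behaviour described above (thinking on in plain chat, off in tool loops), and assuming the proxy signals this through the same `enable_thinking` field used on the direct path, the decision could look like this. `applyQwenThinking` is a hypothetical name and this is not the code in `proxy/qwen-proxy.mjs`.

```javascript
// Illustrative sketch, not the shipped proxy code.
// Assumption: thinking is toggled via the top-level `enable_thinking` field.
function applyQwenThinking(payload) {
  // Treat a non-empty `tools` array as "this request is part of a tool loop".
  const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0
  return { ...payload, enable_thinking: !hasTools }
}
```

On the direct path there is no such per-request decision, so the static `false` applies to every request.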
|
|
275
|
+
|
|
276
|
+
#### Troubleshooting (Qwen)
|
|
277
|
+
|
|
278
|
+
| Symptom | Fix |
|
|
279
|
+
| -------------------------------------------- | --------------------------------------------------------------------------------------- |
|
|
280
|
+
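| `reasoning_content` errors during tool loops | Direct path: ensure `enable_thinking: false` is present in `requestBody` for every Qwen model. Proxy path: check that each model's `url` points at the proxy (`http://127.0.0.1:3458/v1/chat/completions`). |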
|
|
281
|
+
| Vision images fail to upload | Use base64-encoded images; external image URLs may fail if DashScope cannot reach them. |
|
|
282
|
+
|
|
283
|
+
---
|
|
284
|
+
|
|
285
|
+
### DeepSeek V4 (VS Code Extension)
|
|
286
|
+
|
|
287
|
+
DeepSeek V4 Pro & Flash are available via a **dedicated VS Code extension** rather than a raw custom endpoint. The extension plugs DeepSeek directly into Copilot Chat's model picker while preserving agent mode, tool calling, skills, and MCP support.
|
|
288
|
+
|
|
289
|
+
> **How this differs:** Unlike Kimi and Qwen (which use VS Code's built-in `chatLanguageModels.json` custom endpoint mechanism), DeepSeek uses a VS Code extension that registers itself with Copilot. The experience is the same — pick the model in chat — but the setup path goes through the extension.
|
|
290
|
+
|
|
291
|
+
#### 1. Install the Extension
|
|
292
|
+
|
|
293
|
+
- VS Code 1.116 or later.
|
|
294
|
+
- A [GitHub Copilot subscription](https://github.com/features/copilot) (Free / Pro / Enterprise all work).
|
|
295
|
+
- Install **[DeepSeek V4 for Copilot Chat](https://marketplace.visualstudio.com/items?itemName=Vizards.deepseek-v4-for-copilot)** from the VS Code Marketplace ([source](https://github.com/Vizards/deepseek-v4-for-copilot)).
|
|
296
|
+
|
|
297
|
+
#### 2. Get a DeepSeek API Key
|
|
298
|
+
|
|
299
|
+
Go to [platform.deepseek.com/api_keys](https://platform.deepseek.com/api_keys) and create an API key (starts with `sk-`).
|
|
300
|
+
|
|
301
|
+
#### 3. Configure the API Key
|
|
302
|
+
|
|
303
|
+
Open the Command Palette (`Ctrl+Shift+P`, or `Cmd+Shift+P` on macOS) and run **DeepSeek: Set API Key**, then paste your key. The key is stored in your OS keychain.
|
|
304
|
+
|
|
305
|
+
#### 4. Select the Model and Start Chatting
|
|
306
|
+
|
|
307
|
+
- Open Copilot Chat (`Ctrl+Shift+I`).
|
|
308
|
+
- Click the model picker (top-right of the chat panel).
|
|
309
|
+
- Choose **DeepSeek V4 Pro** or **DeepSeek V4 Flash**.
|
|
310
|
+
- Agent mode, tool calling, skills, and MCP all work out of the box.
|
|
311
|
+
|
|
312
|
+
#### Optional: Configure Thinking Effort
|
|
313
|
+
|
|
314
|
+
In the model picker, click the gear icon next to a DeepSeek model to choose:
|
|
315
|
+
|
|
316
|
+
- **None** — fastest, no reasoning.
|
|
317
|
+
- **High** — balanced (default).
|
|
318
|
+
- **Max** — deep reasoning for complex tasks.
|
|
319
|
+
|
|
320
|
+
#### Optional: Vision Support
|
|
321
|
+
|
|
322
|
+
DeepSeek V4 is text-only, but the extension handles images automatically — drop a screenshot into chat and it proxies through another installed Copilot model (Claude, GPT-4o) to describe the image first. Run **DeepSeek: Set Vision Proxy Model** to pick which model handles image descriptions.
|
|
323
|
+
|
|
324
|
+
> For the full official guide, see: [github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/github_copilot.md](https://github.com/deepseek-ai/awesome-deepseek-agent/blob/main/docs/github_copilot.md)
|
|
325
|
+
|
|
326
|
+
For the full research notes, tested values, and known limitations, see:
|
|
327
|
+
|
|
328
|
+
- [`docs/models/kimi-k2.6.md`](docs/models/kimi-k2.6.md)
|
|
329
|
+
- [`docs/models/qwen.md`](docs/models/qwen.md)
|
|
330
|
+
|
|
331
|
+
## Pricing comparison
|
|
332
|
+
|
|
333
|
+
> **⏰ June 1, 2026: GitHub Copilot switched to usage-based billing (AI Credits).**
|
|
334
|
+
>
|
|
335
|
+
> Before this change, Copilot was a flat subscription — no per-turn metering, so you could use chat and agent mode as much as you wanted within rate-limit bounds. Now **every interaction burns AI credits** from your monthly allowance. Agent mode and complex multi-file tasks consume significantly more tokens than simple Q&A, which means your 7,000 Pro+ credits can disappear fast if you're using frontier models.
|
|
336
|
+
>
|
|
337
|
+
> **The practical workaround:** use cheaper alternative models (DeepSeek V4 Flash, Kimi K2.6, Qwen) that are still powerful enough for coding — often at **5–55× less cost** than the Copilot defaults. The tables below show the exact comparison.
|
|
338
|
+
>
|
|
339
|
+
> 1 AI credit = $0.01 USD. All paid plans include a monthly credit allowance:
|
|
340
|
+
>
|
|
341
|
+
> | Plan | Price/mo | Base credits | Flex allotment | Total monthly |
|
|
342
|
+
> | ---- | -------- | ------------ | -------------- | ------------- |
|
|
343
|
+
> | Pro | $10 | 1,000 | 500 | **1,500** |
|
|
344
|
+
> | Pro+ | $39 | 3,900 | 3,100 | **7,000** |
|
|
345
|
+
> | Max | $100 | 10,000 | 10,000 | **20,000** |
|
|
346
|
+
>
|
|
347
|
+
> Code completions remain unlimited and **not** billed. Auto model selection gets a 10% discount.
|
|
348
|
+
|
|
349
|
+
All prices below are in **USD per 1M tokens** (non-cached). To convert to AI credits, multiply by 100 (e.g., $5.00/1M = 500 credits/1M).
|
|
350
|
+
|
|
351
|
+
### Default GitHub Copilot models
|
|
352
|
+
|
|
353
|
+
These are the models available through GitHub Copilot's model roster as of June 1, 2026.
|
|
354
|
+
|
|
355
|
+
| Model | Provider | Tier | Input (per 1M) | Cached input | Output (per 1M) | Context |
|
|
356
|
+
| --------------------- | --------- | ----------- | -------------- | ------------ | --------------- | ------- |
|
|
357
|
+
| **GPT-5.5** | OpenAI | Powerful | $5.00 | $0.50 | $30.00 | — |
|
|
358
|
+
| **Claude Opus 4.8** | Anthropic | Powerful | $5.00 | $0.50 | $25.00 | 1M |
|
|
359
|
+
| **Claude Opus 4.7** | Anthropic | Powerful | $5.00 | $0.50 | $25.00 | 1M |
|
|
360
|
+
| **GPT-5.4** | OpenAI | Versatile | $2.50 | $0.25 | $15.00 | — |
|
|
361
|
+
| **GPT-5.3-Codex** | OpenAI | Powerful | $1.75 | $0.175 | $14.00 | — |
|
|
362
|
+
| **Claude Sonnet 4.6** | Anthropic | Versatile | $3.00 | $0.30 | $15.00 | 1M |
|
|
363
|
+
| **Gemini 3.1 Pro** | Google | Powerful | $2.00¹ | $0.20 | $12.00¹ | 1M |
|
|
364
|
+
| **Claude Haiku 4.5** | Anthropic | Versatile | $1.00 | $0.10 | $5.00 | 1M |
|
|
365
|
+
| **Gemini 3.5 Flash** | Google | Lightweight | $1.50 | $0.15 | $9.00 | 1M |
|
|
366
|
+
| **Gemini 2.5 Pro** | Google | Powerful | $1.25¹ | $0.125 | $10.00¹ | 1M |
|
|
367
|
+
| **GPT-5.4 mini** | OpenAI | Lightweight | $0.75 | $0.075 | $4.50 | — |
|
|
368
|
+
| **Gemini 3 Flash** | Google | Lightweight | $0.50 | $0.05 | $3.00 | 1M |
|
|
369
|
+
| **Raptor mini** | GitHub | Versatile | $0.25 | $0.025 | $2.00 | — |
|
|
370
|
+
|
|
371
|
+
¹ Gemini 3.1 Pro and 2.5 Pro pricing applies to prompts ≤200K tokens.
|
|
372
|
+
|
|
373
|
+
### Custom-endpoint alternatives
|
|
374
|
+
|
|
375
|
+
| Model | Provider | Input (per 1M) | Output (per 1M) | Context window |
|
|
376
|
+
| --------------------- | --------- | ----------------------------- | --------------------------------------- | -------------- |
|
|
377
|
+
| **DeepSeek V4 Flash** | DeepSeek | $0.14 | $0.28 | 1M |
|
|
378
|
+
| **Kimi K2.6** | Moonshot | $0.16 | $0.95 (non-thinking) / $4.00 (thinking) | 256K |
|
|
379
|
+
| **DeepSeek V4 Pro** | DeepSeek | $1.74 | $3.48 | 1M |
|
|
380
|
+
| **Qwen 3.6 Plus** | DashScope | $0.50 (≤256K) / $2.00 (>256K) | $3.00 (≤256K) / $6.00 (>256K) | 1M |
|
|
381
|
+
| **Qwen 3.7 Max** | DashScope | $2.50 (≤1M) | $7.50 (≤1M) | 1M |
|
|
382
|
+
|
|
383
|
+
> **Notes:**
|
|
384
|
+
>
|
|
385
|
+
> - **DeepSeek V4** input pricing shown is the **cache miss** price. Cache hits are significantly cheaper ($0.0028/M for Flash, $0.0145/M for Pro).
|
|
386
|
+
> - **Gemini 3 Flash** is priced at $0.50/MTok input (text/image/video) and $1.00/MTok input for audio.
|
|
387
|
+
> - **Anthropic (Claude)** models also have a cache write cost ($6.25/MTok for Opus, $3.75/MTok for Sonnet, $1.25/MTok for Haiku). Opus 4.7+ use a new tokenizer that may use up to 35% more tokens for the same text.
|
|
388
|
+
> - **OpenAI** models support cached input at 0.1× base input rate.
|
|
389
|
+
> - **Qwen** models use **tiered pricing** — determined by total input tokens per request. Prices above are for non-thinking mode.
|
|
390
|
+
> - **Kimi K2.6** pricing is from the **Moonshot platform** (direct). Via DashScope: $0.89 input / $3.71 output.
|
|
391
|
+
> - **DashScope** offers a **free quota** of 1M input + 1M output tokens per model, valid for 90 days.
|
|
392
|
+
> - For typical Copilot chat usage (short-to-medium prompts), you'll almost always fall in the lowest pricing tier.
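
To see how tiered pricing plays out per request, here is a small sketch using the Qwen 3.6 Plus row from the table above. Two things are assumptions: that "256K" means 256,000 input tokens, and that the tier is chosen from the request's input token count alone. The names `QWEN_36_PLUS_TIERS` and `qwen36PlusRequestCost` are hypothetical.

```javascript
// USD per 1M tokens, from the Qwen 3.6 Plus row above (non-thinking mode).
// Assumption: "256K" is 256,000 tokens; check DashScope's pricing page.
const QWEN_36_PLUS_TIERS = [
  { maxInputTokens: 256_000, input: 0.5, output: 3.0 },
  { maxInputTokens: Infinity, input: 2.0, output: 6.0 },
]

// Cost in USD of a single request, with the tier picked by input size.
function qwen36PlusRequestCost(inputTokens, outputTokens) {
  const tier = QWEN_36_PLUS_TIERS.find((t) => inputTokens <= t.maxInputTokens)
  return (inputTokens * tier.input + outputTokens * tier.output) / 1_000_000
}

// Usage: a typical chat turn versus a very large prompt.
// qwen36PlusRequestCost(10_000, 2_000)
// qwen36PlusRequestCost(300_000, 2_000)
```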
|
|
393
|
+
|
|
394
|
+
**Quick cost comparison for a typical coding session** (~10K input + ~2K output tokens per turn, 50 turns):
|
|
395
|
+
|
|
396
|
+
| Model | Estimated session cost | Copilot Pro+ credits |
|
|
397
|
+
| ------------------------ | ---------------------- | -------------------- |
|
|
398
|
+
| DeepSeek V4 Flash 🏆 | ~$0.10 | — |
|
|
399
|
+
| Kimi K2.6 (non-thinking) | ~$0.18 | — |
|
|
400
|
+
| Raptor mini | ~$0.33 | ~33 |
|
|
401
|
+
| Kimi K2.6 (thinking) | ~$0.48 | — |
|
|
402
|
+
| Gemini 3 Flash | ~$0.55 | ~55 |
|
|
403
|
+
| Qwen 3.6 Plus | ~$0.55 | — |
|
|
404
|
+
| GPT-5.4 mini | ~$0.83 | ~83 |
|
|
405
|
+
| Claude Haiku 4.5 | ~$1.00 | ~100 |
|
|
406
|
+
| DeepSeek V4 Pro | ~$1.22 | — |
|
|
407
|
+
| Gemini 2.5 Pro           | ~$1.63                 | ~163                 |
|
|
408
|
+
| Gemini 3.5 Flash         | ~$1.65                 | ~165                 |
|
|
409
|
+
| Qwen 3.7 Max             | ~$2.00                 | —                    |
|
|
410
|
+
| Gemini 3.1 Pro | ~$2.20 | ~220 |
|
|
411
|
+
| GPT-5.3-Codex | ~$2.28 | ~228 |
|
|
412
|
+
| GPT-5.4 | ~$2.75 | ~275 |
|
|
413
|
+
| Claude Sonnet 4.6 | ~$3.00 | ~300 |
|
|
414
|
+
| Claude Opus 4.8 / 4.7 | ~$5.00 | ~500 |
|
|
415
|
+
| GPT-5.5 | ~$5.50 | ~550 |
|
|
416
|
+
|
|
417
|
+
> **How long do 7,000 credits last?** A Pro+ subscriber running 50-turn sessions could afford roughly **12 GPT-5.5 sessions**, **14 Opus sessions**, or **212 Raptor mini sessions** per month, or a mix of these.
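
The session estimates follow from simple arithmetic on the per-token prices. The sketch below reproduces that arithmetic under the stated workload (50 turns, about 10K input and 2K output tokens per turn) and the 1 credit = $0.01 conversion. The function names are hypothetical, cached-input discounts are ignored, and session counts are rounded down to whole sessions.

```javascript
// Estimated USD cost of one session; prices are USD per 1M tokens.
function sessionCostUsd(
  inputPerM,
  outputPerM,
  { turns = 50, inputTokens = 10_000, outputTokens = 2_000 } = {}
) {
  return (turns * (inputTokens * inputPerM + outputTokens * outputPerM)) / 1_000_000
}

// 1 AI credit = $0.01 USD.
function usdToCredits(usd) {
  return usd * 100
}

// Whole sessions that fit in a monthly credit allowance.
function sessionsPerAllowance(allowanceCredits, sessionUsd) {
  return Math.floor(allowanceCredits / usdToCredits(sessionUsd))
}

// Usage: GPT-5.5 ($5.00 in / $30.00 out) on a 7,000-credit Pro+ allowance.
// sessionsPerAllowance(7000, sessionCostUsd(5.0, 30.0))
```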
|
|
418
|
+
|
|
419
|
+
> Prices last verified: June 1, 2026. Always check the official pages for the latest rates:
|
|
420
|
+
>
|
|
421
|
+
> - [GitHub Copilot models & pricing](https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing)
|
|
422
|
+
> - [OpenAI pricing](https://openai.com/api/pricing/)
|
|
423
|
+
> - [Anthropic (Claude) pricing](https://platform.claude.com/docs/en/about-claude/pricing)
|
|
424
|
+
> - [Google Gemini pricing](https://ai.google.dev/pricing)
|
|
425
|
+
> - [DashScope pricing](https://www.alibabacloud.com/help/en/model-studio/billing-for-model-studio)
|
|
426
|
+
> - [DeepSeek pricing](https://api-docs.deepseek.com/quick_start/pricing)
|
|
427
|
+
|
|
428
|
+
## Repo layout
|
|
429
|
+
|
|
430
|
+
```
|
|
431
|
+
.
|
|
432
|
+
├── docs/models/<provider>-<model>.md # One merged record per model
|
|
433
|
+
├── proxy/                             # Local compatibility shims (Kimi and Qwen)
|
|
434
|
+
├── tests/ # Test assets (images, etc.)
|
|
435
|
+
└── debug_log/ # Runtime logs (git-ignored)
|
|
436
|
+
```
|
|
437
|
+
|
|
438
|
+
## Adding a new model
|
|
439
|
+
|
|
440
|
+
Want to validate GLM, Mimo, or something else?
|
|
441
|
+
|
|
442
|
+
1. Create `docs/models/<provider>-<model>.md`.
|
|
443
|
+
2. If the provider needs request rewriting, add a proxy script under `proxy/`.
|
|
444
|
+
3. Recommended sections for the record:
|
|
445
|
+
1. Summary
|
|
446
|
+
2. Compatibility assessment
|
|
447
|
+
3. Final working configuration
|
|
448
|
+
4. Validation summary
|
|
449
|
+
5. Known limitations
|
|
450
|
+
6. Final verdict
|
|
451
|
+
7. Sources
|
|
452
|
+
|
|
453
|
+
## Limitations
|
|
454
|
+
|
|
455
|
+
- This repo covers **chat only**. GitHub Copilot features like inline completions, semantic search, and next-edit suggestions still require a GitHub-hosted model.
|
|
456
|
+
- Each proxy is tuned for a specific provider family. Don't point the Kimi proxy at an arbitrary OpenAI-compatible endpoint and expect it to work.
|
|
457
|
+
|
|
458
|
+
---
|
|
459
|
+
|
|
460
|
+
## Support
|
|
461
|
+
|
|
462
|
+
If you find this project helpful, please consider supporting its development:
|
|
463
|
+
|
|
464
|
+
[Sponsor on GitHub](https://github.com/sponsors/tugudush)
|
|
465
|
+
|
|
466
|
+
**Solana (SOL)**
|
|
467
|
+
|
|
468
|
+
```
|
|
469
|
+
CWZccD3Ny3XotFZtnkcyzP3hapmu3ExknN1PF4rEvP3u
|
|
470
|
+
```
|
package/cli.mjs
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
import { fileURLToPath } from 'node:url'
|
|
3
|
+
import { dirname, resolve } from 'node:path'
|
|
4
|
+
import { fork } from 'node:child_process'
|
|
5
|
+
import { rmSync } from 'node:fs'
|
|
6
|
+
|
|
7
|
+
const __dirname = dirname(fileURLToPath(import.meta.url))
|
|
8
|
+
const sub = process.argv[2]
|
|
9
|
+
|
|
10
|
+
const usage = `Usage: copilot-custom-endpoint [all|kimi|qwen|clean]
|
|
11
|
+
|
|
12
|
+
Start a local proxy for VS Code Copilot custom endpoints.
|
|
13
|
+
|
|
14
|
+
copilot-custom-endpoint all Start both proxies concurrently (default)
|
|
15
|
+
copilot-custom-endpoint kimi Start the Kimi K2 proxy on port 3457
|
|
16
|
+
copilot-custom-endpoint qwen Start the Qwen 3.x proxy on port 3458
|
|
17
|
+
copilot-custom-endpoint clean Remove the debug_log/ directory
|
|
18
|
+
|
|
19
|
+
Environment variables: see --help for each proxy.
|
|
20
|
+
`
|
|
21
|
+
|
|
22
|
+
if (sub === 'clean') {
|
|
23
|
+
rmSync(resolve(process.cwd(), 'debug_log'), { recursive: true, force: true })
|
|
24
|
+
console.log('debug_log/ removed')
|
|
25
|
+
process.exit(0)
|
|
26
|
+
}
|
|
27
|
+
|
|
28
|
+
if (sub && sub !== 'kimi' && sub !== 'qwen' && sub !== 'all') {
|
|
29
|
+
console.error(usage)
|
|
30
|
+
process.exit(1)
|
|
31
|
+
}
|
|
32
|
+
|
|
33
|
+
const targets = sub === 'all' || !sub ? ['kimi', 'qwen'] : [sub]
|
|
34
|
+
|
|
35
|
+
// Spawn all target proxies and wait for all to exit.
|
|
36
|
+
// This keeps both proxies alive in "all" mode instead of exiting
|
|
37
|
+
// when the first one terminates.
|
|
38
|
+
const children = targets.map((name) => {
|
|
39
|
+
const proxyFile = resolve(__dirname, 'proxy', `${name}-proxy.mjs`)
|
|
40
|
+
return fork(proxyFile, process.argv.slice(3), { stdio: 'inherit' })
|
|
41
|
+
})
|
|
42
|
+
|
|
43
|
+
const exitCodes = await Promise.all(
|
|
44
|
+
children.map(
|
|
45
|
+
(child) =>
|
|
46
|
+
new Promise((resolve) => {
|
|
47
|
+
child.on('exit', (code) => {
|
|
48
|
+
resolve(code ?? 0)
|
|
49
|
+
})
|
|
50
|
+
})
|
|
51
|
+
)
|
|
52
|
+
)
|
|
53
|
+
|
|
54
|
+
process.exit(exitCodes.some((code) => code !== 0) ? 1 : 0)
|
package/package.json
ADDED
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "copilot-custom-endpoint",
|
|
3
|
+
"version": "1.0.0",
|
|
4
|
+
"description": "Local proxies for VS Code Copilot custom endpoints — Kimi K2 & Qwen 3.x",
|
|
5
|
+
"license": "MIT",
|
|
6
|
+
"type": "module",
|
|
7
|
+
"bin": {
|
|
8
|
+
"copilot-custom-endpoint": "cli.mjs",
|
|
9
|
+
"copilot-custom-endpoint-kimi": "proxy/kimi-proxy.mjs",
|
|
10
|
+
"copilot-custom-endpoint-qwen": "proxy/qwen-proxy.mjs"
|
|
11
|
+
},
|
|
12
|
+
"files": [
|
|
13
|
+
"cli.mjs",
|
|
14
|
+
"proxy/"
|
|
15
|
+
],
|
|
16
|
+
"scripts": {
|
|
17
|
+
"proxy": "concurrently --names kimi,qwen --prefix-colors cyan,green \"npm run proxy:kimi\" \"npm run proxy:qwen\"",
|
|
18
|
+
"proxy:kimi": "node proxy/kimi-proxy.mjs",
|
|
19
|
+
"proxy:qwen": "node proxy/qwen-proxy.mjs",
|
|
20
|
+
"clean:logs": "node -e \"import{rmSync}from'node:fs';rmSync('debug_log',{recursive:true,force:true});console.log('debug_log/ removed')\"",
|
|
21
|
+
"test": "node --test tests/**/*.test.mjs",
|
|
22
|
+
"lint": "eslint .",
|
|
23
|
+
"lint:fix": "eslint . --fix",
|
|
24
|
+
"format": "prettier --check .",
|
|
25
|
+
"format:fix": "prettier --write .",
|
|
26
|
+
"lf": "npm run lint && npm run format:fix"
|
|
27
|
+
},
|
|
28
|
+
"repository": {
|
|
29
|
+
"type": "git",
|
|
30
|
+
"url": "git+https://github.com/tugudush/copilot-custom-endpoints.git"
|
|
31
|
+
},
|
|
32
|
+
"keywords": [
|
|
33
|
+
"copilot",
|
|
34
|
+
"vscode",
|
|
35
|
+
"custom-endpoint",
|
|
36
|
+
"kimi",
|
|
37
|
+
"qwen",
|
|
38
|
+
"moonshot",
|
|
39
|
+
"dashscope",
|
|
40
|
+
"proxy"
|
|
41
|
+
],
|
|
42
|
+
"devDependencies": {
|
|
43
|
+
"@eslint/js": "^10.0.1",
|
|
44
|
+
"concurrently": "^10.0.1",
|
|
45
|
+
"eslint": "^10.4.1",
|
|
46
|
+
"eslint-config-prettier": "^10.1.8",
|
|
47
|
+
"globals": "^17.6.0",
|
|
48
|
+
"prettier": "^3.8.3"
|
|
49
|
+
}
|
|
50
|
+
}
|
package/proxy/kimi-proxy.mjs
ADDED
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
import { fileURLToPath } from 'node:url'
|
|
3
|
+
import { createProxy } from '../lib/create-proxy.mjs'
|
|
4
|
+
|
|
5
|
+
/**
|
|
6
|
+
* Supported model scope for this proxy:
|
|
7
|
+
* - Validated in this repo with `kimi-k2.6`.
|
|
8
|
+
* - Expected to work for `kimi-k2.5`, because Kimi documents the same fixed
|
|
9
|
+
* sampling and thinking behavior for `kimi-k2.6` / `kimi-k2.5`.
|
|
10
|
+
* - Not intended for `moonshot-v1` models or non-Kimi providers, because this
|
|
11
|
+
* proxy rewrites requests to K2-family-specific values:
|
|
12
|
+
* - thinking mode temperature = 1.0
|
|
13
|
+
* - non-thinking mode temperature = 0.6
|
|
14
|
+
* - top_p = 0.95
|
|
15
|
+
* - tool-enabled requests force `thinking: { type: 'disabled' }`
|
|
16
|
+
*/
|
|
17
|
+
const upstreamUrl =
|
|
18
|
+
process.env.KIMI_UPSTREAM_URL ?? 'https://api.moonshot.ai/v1/chat/completions'
|
|
19
|
+
const port = Number.parseInt(process.env.PORT ?? '3457', 10)
|
|
20
|
+
const forcedTemperature = Number(
|
|
21
|
+
process.env.KIMI_PROXY_FORCE_TEMPERATURE ?? '1'
|
|
22
|
+
)
|
|
23
|
+
const forcedNonThinkingTemperature = Number(
|
|
24
|
+
process.env.KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE ?? '0.6'
|
|
25
|
+
)
|
|
26
|
+
const forcedTopP = Number(process.env.KIMI_PROXY_FORCE_TOP_P ?? '0.95')
|
|
27
|
+
const disableThinkingWithTools =
|
|
28
|
+
(process.env.KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS ?? '1') !== '0'
|
|
29
|
+
const defaultLogPath = fileURLToPath(
|
|
30
|
+
new URL('../debug_log/kimi-proxy.ndjson', import.meta.url)
|
|
31
|
+
)
|
|
32
|
+
const logPath = process.env.KIMI_PROXY_LOG ?? defaultLogPath
|
|
33
|
+
|
|
34
|
+
if (process.argv.includes('--help')) {
|
|
35
|
+
console.log(`Kimi proxy
|
|
36
|
+
|
|
37
|
+
Starts a local HTTP proxy that rewrites the outbound chat-completions request body to use Kimi-compatible sampling values.
|
|
38
|
+
|
|
39
|
+
Environment variables:
|
|
40
|
+
PORT Local listen port. Default: 3457
|
|
41
|
+
KIMI_UPSTREAM_URL Upstream Moonshot chat-completions URL.
|
|
42
|
+
Default: https://api.moonshot.ai/v1/chat/completions
|
|
43
|
+
KIMI_PROXY_FORCE_TEMPERATURE Temperature to force when the proxy leaves thinking untouched. Default: 1
|
|
44
|
+
KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE
|
|
45
|
+
Temperature to force when the proxy disables thinking for a tool-enabled request. Default: 0.6
|
|
46
|
+
KIMI_PROXY_FORCE_TOP_P top_p to force into the request body. Default: 0.95
|
|
47
|
+
KIMI_PROXY_DISABLE_THINKING_WITH_TOOLS
|
|
48
|
+
Force thinking={"type":"disabled"} when tools are present.
|
|
49
|
+
Default: 1
|
|
50
|
+
KIMI_PROXY_LOG Path to the redacted NDJSON log file.
|
|
51
|
+
|
|
52
|
+
Suggested VS Code model URL:
|
|
53
|
+
http://127.0.0.1:3457/v1/chat/completions
|
|
54
|
+
`)
|
|
55
|
+
process.exit(0)
|
|
56
|
+
}
|
|
57
|
+
|
|
58
|
+
if (!Number.isFinite(forcedTemperature)) {
|
|
59
|
+
throw new Error(
|
|
60
|
+
`Invalid KIMI_PROXY_FORCE_TEMPERATURE: ${process.env.KIMI_PROXY_FORCE_TEMPERATURE ?? ''}`
|
|
61
|
+
)
|
|
62
|
+
}
|
|
63
|
+
|
|
64
|
+
if (!Number.isFinite(forcedNonThinkingTemperature)) {
|
|
65
|
+
throw new Error(
|
|
66
|
+
`Invalid KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE: ${process.env.KIMI_PROXY_FORCE_NON_THINKING_TEMPERATURE ?? ''}`
|
|
67
|
+
)
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
if (!Number.isFinite(forcedTopP)) {
|
|
71
|
+
throw new Error(
|
|
72
|
+
`Invalid KIMI_PROXY_FORCE_TOP_P: ${process.env.KIMI_PROXY_FORCE_TOP_P ?? ''}`
|
|
73
|
+
)
|
|
74
|
+
}
|
|
75
|
+
|
|
76
|
+
// ---- Provider-specific rewrite logic ----
|
|
77
|
+
|
|
78
|
+
function summarizePayload(payload, hasTools, rewriteInfo) {
|
|
79
|
+
const messages = Array.isArray(payload.messages) ? payload.messages : []
|
|
80
|
+
const tools = Array.isArray(payload.tools) ? payload.tools : []
|
|
81
|
+
|
|
82
|
+
return {
|
|
83
|
+
model: payload.model,
|
|
84
|
+
stream: payload.stream,
|
|
85
|
+
...rewriteInfo,
|
|
86
|
+
maxTokens:
|
|
87
|
+
payload.max_tokens ??
|
|
88
|
+
payload.max_completion_tokens ??
|
|
89
|
+
payload.max_output_tokens,
|
|
90
|
+
toolChoice: payload.tool_choice,
|
|
91
|
+
toolCount: tools.length,
|
|
92
|
+
hasTools,
|
|
93
|
+
messageCount: messages.length,
|
|
94
|
+
messageRoles: messages.map((message) => message?.role).slice(0, 16),
|
|
95
|
+
topLevelKeys: Object.keys(payload).sort()
|
|
96
|
+
}
|
|
97
|
+
}
|
|
98
|
+
|
|
99
|
+
function rewriteKimi(payload) {
|
|
100
|
+
const incomingTemperature = payload.temperature
|
|
101
|
+
const incomingTopP = payload.top_p
|
|
102
|
+
const incomingThinkingType = payload?.thinking?.type
|
|
103
|
+
const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0
|
|
104
|
+
const useNonThinkingMode = disableThinkingWithTools && hasTools
|
|
105
|
+
const rewrittenTemperature = useNonThinkingMode
|
|
106
|
+
? forcedNonThinkingTemperature
|
|
107
|
+
: forcedTemperature
|
|
108
|
+
|
|
109
|
+
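// Stash the incoming thinking type on the payload before mutation (this temporary key is still present when the summary's topLevelKeys is computed; it is deleted before forwarding)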
|
|
110
|
+
payload.__incomingThinkingType = incomingThinkingType
|
|
111
|
+
|
|
112
|
+
// Apply rewrites
|
|
113
|
+
payload.temperature = rewrittenTemperature
|
|
114
|
+
payload.top_p = forcedTopP
|
|
115
|
+
|
|
116
|
+
if (useNonThinkingMode) {
|
|
117
|
+
payload.thinking = { type: 'disabled' }
|
|
118
|
+
}
|
|
119
|
+
|
|
120
|
+
const rewrittenThinkingType = payload.thinking?.type
|
|
121
|
+
const rewriteInfo = {
|
|
122
|
+
incomingTemperature,
|
|
123
|
+
rewrittenTemperature,
|
|
124
|
+
incomingTopP,
|
|
125
|
+
rewrittenTopP: forcedTopP,
|
|
126
|
+
incomingThinkingType,
|
|
127
|
+
rewrittenThinkingType
|
|
128
|
+
}
|
|
129
|
+
|
|
130
|
+
const summary = summarizePayload(payload, hasTools, rewriteInfo)
|
|
131
|
+
|
|
132
|
+
const consoleMsg = `temperature ${String(incomingTemperature)} -> ${String(rewrittenTemperature)}, top_p ${String(incomingTopP)} -> ${String(forcedTopP)}, thinking ${String(incomingThinkingType)} -> ${String(rewrittenThinkingType)}`
|
|
133
|
+
|
|
134
|
+
// Clean up internal key before forwarding
|
|
135
|
+
delete payload.__incomingThinkingType
|
|
136
|
+
|
|
137
|
+
return { summary, consoleMsg }
|
|
138
|
+
}
|
|
139
|
+
|
|
140
|
+
// ---- Create and start ----
|
|
141
|
+
|
|
142
|
+
const { start } = createProxy({
|
|
143
|
+
upstreamUrl,
|
|
144
|
+
port,
|
|
145
|
+
logPath,
|
|
146
|
+
label: 'kimi-proxy',
|
|
147
|
+
healthCheckExtras: { forcedTemperature, forcedTopP },
|
|
148
|
+
rewriteRequest: rewriteKimi,
|
|
149
|
+
startupMessages: (_port, _upstreamUrl) => [
|
|
150
|
+
`[kimi-proxy] listening on http://127.0.0.1:${_port}/v1/chat/completions`,
|
|
151
|
+
`[kimi-proxy] forwarding to ${_upstreamUrl}`,
|
|
152
|
+
`[kimi-proxy] forcing temperature=${forcedTemperature}, non-thinking temperature=${forcedNonThinkingTemperature}, and top_p=${forcedTopP}`,
|
|
153
|
+
`[kimi-proxy] disable thinking with tools=${disableThinkingWithTools}`,
|
|
154
|
+
`[kimi-proxy] writing redacted request summaries to ${logPath}`
|
|
155
|
+
]
|
|
156
|
+
})
|
|
157
|
+
|
|
158
|
+
start()
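The rewrite rule in `rewriteKimi` can be illustrated in isolation. The following is a minimal sketch, not the package's code: `planKimiRewrite` and its option names are hypothetical, and the default values are simply copied from the environment-variable defaults shown in this file. Unlike `rewriteKimi`, it returns a copy rather than mutating the request body, which makes the rule easier to test.

```javascript
// Illustrative sketch only. `planKimiRewrite` and its option names are
// hypothetical; the defaults mirror the env-var defaults in kimi-proxy.mjs.
function planKimiRewrite(payload, options = {}) {
  const {
    thinkingTemperature = 1,
    nonThinkingTemperature = 0.6,
    topP = 0.95,
    disableThinkingWithTools = true
  } = options

  // A request counts as tool-enabled only if `tools` is a non-empty array.
  const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0
  const useNonThinkingMode = disableThinkingWithTools && hasTools

  // Shallow copy so the caller's object is left untouched.
  const rewritten = {
    ...payload,
    temperature: useNonThinkingMode ? nonThinkingTemperature : thinkingTemperature,
    top_p: topP
  }

  // Thinking is only overridden for tool-enabled requests.
  if (useNonThinkingMode) {
    rewritten.thinking = { type: 'disabled' }
  }
  return rewritten
}

const plain = planKimiRewrite({ model: 'kimi-k2.6', temperature: 0.2, messages: [] })
const withTools = planKimiRewrite({
  model: 'kimi-k2.6',
  temperature: 0.2,
  tools: [{ type: 'function', function: { name: 'read_file' } }],
  messages: []
})
console.log(plain.temperature, plain.top_p, plain.thinking)
console.log(withTools.temperature, withTools.top_p, withTools.thinking)
```

Note that the client's own `temperature` and `top_p` are always overwritten, while an incoming `thinking` object is only replaced when tools are present and the tool rewrite is switched on.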
|
package/proxy/qwen-proxy.mjs
ADDED
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
#!/usr/bin/env node
|
|
2
|
+
import { fileURLToPath } from 'node:url'
|
|
3
|
+
import { createProxy } from '../lib/create-proxy.mjs'
|
|
4
|
+
|
|
5
|
+
/**
|
|
6
|
+
* Supported model scope for this proxy:
|
|
7
|
+
* - Validated with `qwen3.6-plus` and `qwen3.7-max`.
|
|
8
|
+
* - Expected to work for any Qwen3 hybrid-thinking model (qwen3-* series)
|
|
9
|
+
* that supports the `enable_thinking` top-level field on DashScope's
|
|
10
|
+
* OpenAI-compatible surface.
|
|
11
|
+
* - Not intended for non-Qwen providers, because the rewrite assumes
|
|
12
|
+
* DashScope's `enable_thinking` behavior (no nested `thinking` object).
|
|
13
|
+
*/
|
|
14
|
+
const upstreamUrl =
|
|
15
|
+
process.env.QWEN_UPSTREAM_URL ??
|
|
16
|
+
'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions'
|
|
17
|
+
const port = Number.parseInt(process.env.PORT ?? '3458', 10)
|
|
18
|
+
const disableThinkingWithTools =
|
|
19
|
+
(process.env.QWEN_PROXY_DISABLE_THINKING_WITH_TOOLS ?? '1') !== '0'
|
|
20
|
+
const defaultLogPath = fileURLToPath(
|
|
21
|
+
new URL('../debug_log/qwen-proxy.ndjson', import.meta.url)
|
|
22
|
+
)
|
|
23
|
+
const logPath = process.env.QWEN_PROXY_LOG ?? defaultLogPath
|
|
24
|
+
|
|
25
|
+
if (process.argv.includes('--help')) {
|
|
26
|
+
console.log(`Qwen proxy
|
|
27
|
+
|
|
28
|
+
Starts a local HTTP proxy that conditionally injects enable_thinking: false
|
|
29
|
+
when the request includes a tools array, letting Qwen hybrid-thinking models
|
|
30
|
+
show reasoning in plain chat while keeping tool loops stable.
|
|
31
|
+
|
|
32
|
+
Environment variables:
|
|
33
|
+
PORT Local listen port. Default: 3458
|
|
34
|
+
QWEN_UPSTREAM_URL Upstream DashScope chat-completions URL.
|
|
35
|
+
Default: https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
|
|
36
|
+
QWEN_PROXY_DISABLE_THINKING_WITH_TOOLS
|
|
37
|
+
Inject enable_thinking: false when tools are present.
|
|
38
|
+
Default: 1
|
|
39
|
+
QWEN_PROXY_LOG Path to the redacted NDJSON log file.
|
|
40
|
+
|
|
41
|
+
Suggested VS Code model URL:
|
|
42
|
+
http://127.0.0.1:3458/v1/chat/completions
|
|
43
|
+
`)
|
|
44
|
+
process.exit(0)
|
|
45
|
+
}
|
|
46
|
+
|
|
47
|
+
// ---- Provider-specific rewrite logic ----
|
|
48
|
+
|
|
49
|
+
function summarizePayload(payload, hasTools, rewriteInfo) {
|
|
50
|
+
const messages = Array.isArray(payload.messages) ? payload.messages : []
|
|
51
|
+
const tools = Array.isArray(payload.tools) ? payload.tools : []
|
|
52
|
+
|
|
53
|
+
return {
|
|
54
|
+
model: payload.model,
|
|
55
|
+
stream: payload.stream,
|
|
56
|
+
hasTools,
|
|
57
|
+
toolCount: tools.length,
|
|
58
|
+
toolChoice: payload.tool_choice,
|
|
59
|
+
...rewriteInfo,
|
|
60
|
+
maxTokens:
|
|
61
|
+
payload.max_tokens ??
|
|
62
|
+
payload.max_completion_tokens ??
|
|
63
|
+
payload.max_output_tokens,
|
|
64
|
+
messageCount: messages.length,
|
|
65
|
+
messageRoles: messages.map((message) => message?.role).slice(0, 16),
|
|
66
|
+
topLevelKeys: Object.keys(payload).sort()
|
|
67
|
+
}
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
function rewriteQwen(payload) {
|
|
71
|
+
const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0
|
|
72
|
+
const incomingEnableThinking = payload.enable_thinking
|
|
73
|
+
|
|
74
|
+
if (disableThinkingWithTools && hasTools) {
|
|
75
|
+
// Tool-enabled request: suppress thinking to avoid reasoning_content issues
|
|
76
|
+
payload.enable_thinking = false
|
|
77
|
+
} else {
|
|
78
|
+
// Plain chat (or tool rewrite switched off): remove enable_thinking so the model uses its default (true)
|
|
79
|
+
delete payload.enable_thinking
|
|
80
|
+
}
|
|
81
|
+
|
|
82
|
+
const rewrittenEnableThinking =
|
|
83
|
+
disableThinkingWithTools && hasTools ? false : undefined
|
|
84
|
+
|
|
85
|
+
const summary = summarizePayload(payload, hasTools, {
|
|
86
|
+
incomingEnableThinking,
|
|
87
|
+
rewrittenEnableThinking
|
|
88
|
+
})
|
|
89
|
+
|
|
90
|
+
const consoleMsg = `tools=${String(hasTools)} enable_thinking=${String(incomingEnableThinking)} -> ${
|
|
91
|
+
hasTools && disableThinkingWithTools ? 'false' : '<deleted>'
|
|
92
|
+
}, model=${payload.model ?? '?'}`
|
|
93
|
+
|
|
94
|
+
return { summary, consoleMsg }
|
|
95
|
+
}
|
|
96
|
+
|
|
97
|
+
// ---- Create and start ----
|
|
98
|
+
|
|
99
|
+
const { start } = createProxy({
|
|
100
|
+
upstreamUrl,
|
|
101
|
+
port,
|
|
102
|
+
logPath,
|
|
103
|
+
label: 'qwen-proxy',
|
|
104
|
+
healthCheckExtras: { disableThinkingWithTools },
|
|
105
|
+
rewriteRequest: rewriteQwen,
|
|
106
|
+
startupMessages: (_port, _upstreamUrl) => [
|
|
107
|
+
`[qwen-proxy] listening on http://127.0.0.1:${_port}/v1/chat/completions`,
|
|
108
|
+
`[qwen-proxy] forwarding to ${_upstreamUrl}`,
|
|
109
|
+
`[qwen-proxy] disable thinking with tools=${disableThinkingWithTools}`,
|
|
110
|
+
`[qwen-proxy] writing redacted request summaries to ${logPath}`
|
|
111
|
+
]
|
|
112
|
+
})
|
|
113
|
+
|
|
114
|
+
start()
|
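The Qwen rule is simpler than the Kimi one: it touches only the top-level `enable_thinking` field. The following is a minimal sketch of that decision as a pure function; `planQwenRewrite` is a hypothetical name used for illustration, and the real `rewriteQwen` mutates the payload in place and also builds a log summary.

```javascript
// Illustrative sketch only. `planQwenRewrite` is a hypothetical helper that
// mirrors the enable_thinking decision made in qwen-proxy.mjs.
function planQwenRewrite(payload, disableThinkingWithTools = true) {
  const hasTools = Array.isArray(payload.tools) && payload.tools.length > 0

  // Drop any client-supplied enable_thinking; it is re-added only when needed.
  const { enable_thinking: _dropped, ...rest } = payload

  return disableThinkingWithTools && hasTools
    ? { ...rest, enable_thinking: false } // tool loop: suppress thinking
    : rest // otherwise: field absent, so the model's default applies
}

const chat = planQwenRewrite({ model: 'qwen3.6-plus', enable_thinking: false, messages: [] })
const agent = planQwenRewrite({
  model: 'qwen3.6-plus',
  tools: [{ type: 'function', function: { name: 'read_file' } }],
  messages: []
})
console.log('enable_thinking' in chat, agent.enable_thinking)
```

One consequence worth noticing is that a client's explicit `enable_thinking` value is never forwarded as-is: it is either replaced with `false` or removed.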