copilot-custom-endpoint 1.3.6 → 1.3.8
- package/README.md +2 -2
- package/docs/models/glm.md +0 -13
- package/package.json +1 -1
package/README.md
CHANGED
@@ -6,7 +6,7 @@

 VS Code lets you add your own language-model endpoint via a small JSON config file. Many providers advertise "OpenAI-compatible" APIs but reject the exact request shapes VS Code sends. This repo collects **real, tested setups** — one per provider — plus a tiny local proxy that smooths over the rough edges when needed.

-If [OpenRouter](https://openrouter.ai) is blocked by your network
+If [OpenRouter](https://openrouter.ai) is blocked by your network or too generic for your model's quirks, this is the workaround.

 ## How it works (4 steps)

@@ -116,7 +116,7 @@ VS Code's built-in `view_image` tool only accepts **static images** (PNG, JPG, G
 **Video Context MCP** is a small MCP server that bridges that gap. It works with **GitHub Copilot, Cursor, and Claude Code** out of the box, and:

 - **Extracts frames** from local files or remote URLs (no `ffmpeg` gymnastics required).
-- **Routes them through a multi-provider fallback chain** — `Gemini → GLM
+- **Routes them through a multi-provider fallback chain** — `Gemini → GLM 4.6V Flash→ Qwen3.7-plus → Kimi K2.6 → MiMo-V2.5` — so a single `GLM 5V Turbo` rate-limit hiccup doesn't kill your session.
 - **Answers natural-language questions** about the video grounded in actual frames: "what does the speaker click in the last 30 seconds?", "summarize the demo", "find the frame where the error appears".
 - **Extras:** timestamp search, audio transcription with speaker diarization, and video metadata (resolution, duration, codec).

package/docs/models/glm.md
CHANGED
@@ -45,9 +45,6 @@ Config file location:
 | macOS | `~/Library/Application Support/Code/User/chatLanguageModels.json` |
 | Linux | `~/.config/Code/User/chatLanguageModels.json` |

-<details>
-<summary><strong>GLM config — collapse for brevity</strong></summary>
-
 ```json
 {
 "name": "GLM",
@@ -89,8 +86,6 @@ Config file location:
 }
 ```

-</details>
-
 > **Leave `apiKey` as `""`** — set it through the Language Models UI so VS Code stores it in the OS keychain (it will replace the empty string with a `${input:chat.lm.secret.<id>}` reference).

 ### 2. API key
@@ -259,14 +254,6 @@ This file is the **research record and the user-facing setup guide**. The implem
 | 10 | `curl` tool-call follow-up turn (proves `clear_thinking`) | HTTP 200, prior `reasoning_content` is auto-stripped | ⏳ |

 > **`glm-5v-turbo` fully validated** ✅ for VS Code custom-endpoint use: plain chat ✅, streaming ✅, tool calling ✅ (tested with `open_browser_page` opening Google), vision ✅ (accurately described a daily.dev screenshot including post titles, tags, sidebar navigation, browser tabs, and ad content).
->
-> **Video input: GLM-5V-Turbo supports it natively, but VS Code's tool pipeline blocks it.** Z.ai's official docs state GLM-5V-Turbo's **Input Modality is "Video / Image / Text / File"**, and the Chat Completion API accepts **video** alongside images, audio, and files. There is even an official **"Video Object Tracking"** skill/example for `glm-5v-turbo`. However, VS Code's `view_image` tool only accepts static image formats (`png`, `jpg`, `jpeg`, `gif`, `webp`) and **rejects video files at the tool layer before they reach the model**. To test video input with GLM-5V-Turbo, use a direct API call (e.g., `curl`) with a public video URL in an `image_url` content part, or extract frames as images first (e.g., `ffmpeg -i video.mp4 -vframes 1 frame.png`). For a turnkey bridge that does this automatically inside VS Code, see [**Video Context MCP**](https://www.videocontextmcp.com/) — an MCP server that extracts frames from a video and routes them to a model provider (e.g. GLM 5V Turbo) in a fallback chain so you can ask natural-language questions about any video. See [GLM-5V-Turbo docs](https://docs.z.ai/guides/vlm/glm-5v-turbo) for the official video input examples.
->
-> **`glm-5.1` partially validated** ✅ for text-only use: plain chat ✅, streaming ✅, tool calling ✅. The remaining `curl`-based checks are pending.
-
-## Companion tools
-
-- [**Video Context MCP**](https://www.videocontextmcp.com/) — an MCP server that gives AI coding assistants (GitHub Copilot, Cursor, Claude Code) the ability to **understand video content** via natural language. Extracts frames from local or remote videos, routes them through a multi-provider fallback chain (**Gemini → GLM 5V Turbo → Qwen 3.7 Plus → Kimi K2.6 → MiMo-V2.5**), and returns answers grounded in actual video frames. Also handles summarization, timestamp search, audio transcription with speaker diarization, and video metadata. Works around the limitation that VS Code's built-in `view_image` tool only accepts static images — so it lets `glm-5v-turbo`'s native video support actually be exercised end-to-end from inside VS Code.

 ## References
