ghc-proxy 0.5.0 → 0.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -12,7 +12,7 @@ A proxy that turns your GitHub Copilot subscription into an OpenAI and Anthropic
  **TL;DR** — Install [Bun](https://bun.com/docs/installation), then run:

  ```bash
- bunx ghc-proxy@latest start --wait
+ bunx ghc-proxy@latest start
  ```

  ## Prerequisites
@@ -28,9 +28,7 @@ Before you start, make sure you have:

  1. Start the proxy:

- bunx ghc-proxy@latest start --wait
-
- > **Recommended:** The `--wait` flag queues requests instead of rejecting them with a 429 error when you hit Copilot rate limits. This is the simplest way to run the proxy for daily use.
+ bunx ghc-proxy@latest start

  2. On the first run, you will be guided through GitHub's device-code authentication flow. Follow the prompts to authorize the proxy.

@@ -38,6 +36,8 @@ Before you start, make sure you have:

  That's it. Any tool that supports the OpenAI or Anthropic API can now point to `http://localhost:4141`.

+ > **Tip:** If you set `--rate-limit`, add `--wait` to queue requests instead of rejecting them with 429 when the cooldown has not elapsed yet. See [Rate Limiting](#rate-limiting) for details.
+
  ## Using with Claude Code

  This is the most common use case. There are two ways to set it up:
@@ -73,7 +73,7 @@ Create or edit `~/.claude/settings.json` (this applies globally to all projects)
  Then simply start the proxy and use Claude Code as usual:

  ```bash
- bunx ghc-proxy@latest start --wait
+ bunx ghc-proxy@latest start
  ```

  **What each environment variable does:**
@@ -91,75 +91,6 @@ bunx ghc-proxy@latest start --wait

  See the [Claude Code settings docs](https://docs.anthropic.com/en/docs/claude-code/settings#environment-variables) for more options.

- ## What it Does
-
- ghc-proxy sits between your tools and the GitHub Copilot API:
-
- ```text
- ┌──────────────┐      ┌───────────┐      ┌───────────────────────┐
- │ Claude Code  │──────│ ghc-proxy │──────│ api.githubcopilot.com │
- │ Cursor       │      │   :4141   │      │                       │
- │ Any client   │      │           │      │                       │
- └──────────────┘      └───────────┘      └───────────────────────┘
-     OpenAI or           Translates          GitHub Copilot
-     Anthropic            between                  API
-      format              formats
- ```
-
- The proxy authenticates with GitHub using the [device code OAuth flow](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/authorizing-oauth-apps#device-flow) (the same flow VS Code uses), then exchanges the GitHub token for a short-lived Copilot token that auto-refreshes.
-
- When the Copilot token response includes `endpoints.api`, `ghc-proxy` now prefers that runtime API base automatically instead of relying only on the configured account type. This keeps enterprise/business routing aligned with the endpoint GitHub actually returned for the current token.
-
- Incoming requests hit an [Elysia](https://elysiajs.com/) server. `chat/completions` requests are validated, normalized into the shared planning pipeline, and then forwarded to Copilot. `responses` requests use a native Responses path with explicit compatibility policies. `messages` requests are routed per-model and can use native Anthropic passthrough, the Responses translation path, or the existing chat-completions fallback. The translator tracks exact vs lossy vs unsupported behavior explicitly; see the [Messages Routing and Translation Guide](./docs/messages-routing-and-translation.md) and the [Anthropic Translation Matrix](./docs/anthropic-translation-matrix.md) for the current support surface.
-
- ### Request Routing
-
- `ghc-proxy` does not force every request through one protocol. The current routing rules are:
-
- - `POST /v1/chat/completions`: OpenAI Chat Completions -> shared planning pipeline -> Copilot `/chat/completions`
- - `POST /v1/responses`: OpenAI Responses create -> native Responses handler -> Copilot `/responses`
- - `POST /v1/responses/input_tokens`: Responses input-token counting passthrough when the upstream supports it
- - `GET /v1/responses/:responseId`: Responses retrieve passthrough when the upstream supports it
- - `GET /v1/responses/:responseId/input_items`: Responses input-items passthrough when the upstream supports it
- - `DELETE /v1/responses/:responseId`: Responses delete passthrough when the upstream supports it
- - `POST /v1/messages`: Anthropic Messages -> choose the best available upstream path for the selected model:
-   - native Copilot `/v1/messages` when supported
-   - Anthropic -> Responses -> Anthropic translation when the model only supports `/responses`
-   - Anthropic -> Chat Completions -> Anthropic fallback otherwise
-
- This keeps the existing chat pipeline stable while allowing newer Copilot models to use the endpoint they actually expose.
-
- ### Endpoints
-
- **OpenAI compatible:**
-
- | Method | Path | Description |
- |--------|------|-------------|
- | `POST` | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
- | `POST` | `/v1/responses` | Create a Responses API response |
- | `POST` | `/v1/responses/input_tokens` | Count Responses input tokens when supported by Copilot upstream |
- | `GET` | `/v1/responses/:responseId` | Retrieve one response when supported by Copilot upstream |
- | `GET` | `/v1/responses/:responseId/input_items` | Retrieve response input items when supported by Copilot upstream |
- | `DELETE` | `/v1/responses/:responseId` | Delete one response when supported by Copilot upstream |
- | `GET` | `/v1/models` | List available models |
- | `POST` | `/v1/embeddings` | Generate embeddings |
-
- **Anthropic compatible:**
-
- | Method | Path | Description |
- |--------|------|-------------|
- | `POST` | `/v1/messages` | Messages API with per-model routing across native Messages, Responses translation, or chat-completions fallback |
- | `POST` | `/v1/messages/count_tokens` | Token counting |
-
- **Utility:**
-
- | Method | Path | Description |
- |--------|------|-------------|
- | `GET` | `/usage` | Copilot quota / usage monitoring |
- | `GET` | `/token` | Inspect the current Copilot token |
-
- > **Note:** The `/v1/` prefix is optional. `/chat/completions`, `/responses`, `/models`, and `/embeddings` also work.
-
  ## CLI Reference

  ghc-proxy uses a subcommand structure:
@@ -179,18 +110,18 @@ bunx ghc-proxy@latest debug # Print diagnostic info (version, paths, to
  | `--verbose` | `-v` | `false` | Enable verbose logging |
  | `--account-type` | `-a` | `individual` | `individual`, `business`, or `enterprise` |
  | `--rate-limit` | `-r` | -- | Minimum seconds between requests |
- | `--wait` | `-w` | `false` | Wait instead of rejecting when rate-limited |
+ | `--wait` | `-w` | `false` | Queue requests instead of rejecting with 429 when `--rate-limit` cooldown has not elapsed (requires `--rate-limit`) |
  | `--manual` | -- | `false` | Manually approve each request |
  | `--github-token` | `-g` | -- | Pass a GitHub token directly (from `auth`) |
  | `--claude-code` | `-c` | `false` | Generate a Claude Code launch command |
  | `--show-token` | -- | `false` | Display tokens on auth and refresh |
  | `--proxy-env` | -- | `false` | Use `HTTP_PROXY`/`HTTPS_PROXY` from env (Node.js only; Bun reads proxy env natively) |
- | `--idle-timeout` | -- | `120` | Bun server idle timeout in seconds |
- | `--upstream-timeout` | -- | `300` | Upstream request timeout in seconds (0 to disable) |
+ | `--idle-timeout` | -- | `120` | Bun server idle timeout in seconds (`0` disables; Bun max is `255`; streaming routes disable idle timeout automatically) |
+ | `--upstream-timeout` | -- | `1800` | Upstream request timeout in seconds (0 to disable) |

  ## Rate Limiting

- If you are worried about hitting Copilot rate limits:
+ If you want to throttle how often the proxy forwards requests:

  ```bash
  # Enforce a 30-second cooldown between requests
@@ -203,6 +134,8 @@ bunx ghc-proxy@latest start --rate-limit 30 --wait
  bunx ghc-proxy@latest start --manual
  ```

+ `--wait` only takes effect when `--rate-limit` is also set. Without `--rate-limit`, there is no cooldown to wait on and `--wait` has no effect.
+
  ## Account Types

  If you have a GitHub Business or Enterprise Copilot plan, pass `--account-type`:
@@ -214,6 +147,57 @@ bunx ghc-proxy@latest start --account-type enterprise

  This routes requests to the correct Copilot API endpoint for your plan. See the [GitHub docs on network routing](https://docs.github.com/en/enterprise-cloud@latest/copilot/managing-copilot/managing-github-copilot-in-your-organization/managing-access-to-github-copilot-in-your-organization/managing-github-copilot-access-to-your-organizations-network#configuring-copilot-subscription-based-network-routing-for-your-enterprise-or-organization) for details.

+ ## Configuration
+
+ The proxy reads an optional JSON config file at:
+
+ ```
+ ~/.local/share/ghc-proxy/config.json
+ ```
+
+ All fields are optional. The full schema:
+
+ | Field | Type | Default | Description |
+ |-------|------|---------|-------------|
+ | `modelRewrites` | `{ from, to }[]` | -- | Glob-pattern model substitution rules (see [Model Rewrites](#model-rewrites)) |
+ | `modelFallback` | `object` | -- | Override default model fallbacks (see [Customizing Fallbacks](#customizing-fallbacks)) |
+ | `modelFallback.claudeOpus` | `string` | `claude-opus-4.6` | Fallback for `claude-opus-*` models |
+ | `modelFallback.claudeSonnet` | `string` | `claude-sonnet-4.6` | Fallback for `claude-sonnet-*` models |
+ | `modelFallback.claudeHaiku` | `string` | `claude-haiku-4.5` | Fallback for `claude-haiku-*` models |
+ | `smallModel` | `string` | -- | Target model for compact request routing (see [Small-Model Routing](#small-model-routing)) |
+ | `compactUseSmallModel` | `boolean` | `false` | Route compact/summarization requests to `smallModel` |
+ | `contextUpgrade` | `boolean` | `true` | Auto-upgrade to extended-context model variants (see [Context-1M Auto-Upgrade](#context-1m-auto-upgrade)) |
+ | `contextUpgradeTokenThreshold` | `number` | `160000` | Token threshold for proactive context upgrade |
+ | `useFunctionApplyPatch` | `boolean` | `true` | Rewrite `apply_patch` custom tool as function tool on Responses path |
+ | `responsesApiContextManagementModels` | `string[]` | -- | Models that enable Responses context compaction |
+ | `modelReasoningEfforts` | `Record<string, string>` | -- | Per-model reasoning effort defaults for Anthropic-to-Responses translation |
+
+ Example:
+
+ ```json
+ {
+   "modelRewrites": [
+     { "from": "claude-haiku-*", "to": "gpt-4.1-mini" }
+   ],
+   "modelFallback": {
+     "claudeOpus": "claude-opus-4.6",
+     "claudeSonnet": "claude-sonnet-4.6"
+   },
+   "smallModel": "gpt-4.1-mini",
+   "compactUseSmallModel": true,
+   "contextUpgrade": true,
+   "contextUpgradeTokenThreshold": 160000,
+   "useFunctionApplyPatch": true,
+   "responsesApiContextManagementModels": ["gpt-5", "gpt-5-mini"],
+   "modelReasoningEfforts": {
+     "gpt-5": "high",
+     "gpt-5-mini": "medium"
+   }
+ }
+ ```
+
+ **Priority order** for model fallbacks: environment variable > config.json > built-in default.
+
  ## Model Mapping

  When Claude Code sends a request for a model like `claude-sonnet-4.6`, the proxy maps it to an actual model available on Copilot. The mapping logic works as follows:
@@ -251,10 +235,46 @@ Or in the proxy's **config file** (`~/.local/share/ghc-proxy/config.json`):
  }
  ```

- **Priority order:** environment variable > config.json > built-in default.
-
  > **Note:** Model fallbacks only apply to the **chat completions translation path**. The native Messages and Responses API strategies pass the model ID through to Copilot as-is.

+ ### Model Rewrites
+
+ For more general model substitution, use `modelRewrites` in the config file. Each rule maps a `from` pattern to a `to` model ID. The `from` field supports glob patterns with `*` wildcards, and the first matching rule wins.
+
+ ```json
+ {
+   "modelRewrites": [
+     { "from": "claude-haiku-*", "to": "gpt-4.1-mini" },
+     { "from": "gpt-5.4*", "to": "gpt-5.2" }
+   ]
+ }
+ ```
+
+ Unlike model fallbacks (which only apply to the chat completions path), rewrites are applied **uniformly to all three endpoints** — `/v1/messages`, `/v1/chat/completions`, and `/v1/responses`. Target model names are normalized against Copilot's known model list using dash/dot equivalence (e.g. `gpt-4.1` matches `gpt-4-1`).
+
+ Rewrites run **before** any other model policy — context upgrades, small-model routing, and strategy selection all see the rewritten model. This means a rewritten model still benefits from context-1m upgrades if the target has an upgrade rule.
+
+ ### Context-1M Auto-Upgrade
+
+ The proxy can automatically upgrade models to their extended-context (1M token) variants when the request is large. This is enabled by default.
+
+ **Proactive upgrade:** Before sending the request, the proxy estimates the input token count. If it exceeds the configured threshold (default: 160,000 tokens), the model is upgraded to its 1M variant before the request is sent.
+
+ **Reactive upgrade:** If the upstream returns a context-length error (e.g. "context length exceeded"), the proxy retries the request with the upgraded model automatically.
+
+ **Beta header support:** When a client sends an `anthropic-beta: context-*` header (e.g. `context-1m-2025-04-14`), the proxy strips the header (Copilot does not understand it) and upgrades the model to the 1M variant instead.
+
+ Current upgrade rules:
+
+ | Source Model | Upgraded Model |
+ |-------------|----------------|
+ | `claude-opus-4.6` | `claude-opus-4.6-1m` |
+
+ Configuration:
+
+ - `contextUpgrade` (boolean, default `true`) — enable or disable auto-upgrade
+ - `contextUpgradeTokenThreshold` (number, default `160000`) — token count threshold for proactive upgrade
+
  ### Small-Model Routing

  `/v1/messages` can optionally reroute specific low-value requests to a cheaper model:
@@ -268,7 +288,76 @@ The switch defaults to `false`. Routing is conservative:
  - it must preserve the original model's declared endpoint support
  - tool, thinking, and vision requests are not rerouted to a model that lacks the required capabilities

- ### Responses Compatibility
+ ## How it Works
+
+ ghc-proxy sits between your tools and the GitHub Copilot API:
+
+ ```text
+ ┌──────────────┐      ┌───────────┐      ┌───────────────────────┐
+ │ Claude Code  │──────│ ghc-proxy │──────│ api.githubcopilot.com │
+ │ Cursor       │      │   :4141   │      │                       │
+ │ Any client   │      │           │      │                       │
+ └──────────────┘      └───────────┘      └───────────────────────┘
+     OpenAI or           Translates          GitHub Copilot
+     Anthropic            between                  API
+      format              formats
+ ```
+
+ The proxy authenticates with GitHub using the [device code OAuth flow](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/authorizing-oauth-apps#device-flow) (the same flow VS Code uses), then exchanges the GitHub token for a short-lived Copilot token that auto-refreshes.
+
+ When the Copilot token response includes `endpoints.api`, `ghc-proxy` now prefers that runtime API base automatically instead of relying only on the configured account type. This keeps enterprise/business routing aligned with the endpoint GitHub actually returned for the current token.
+
+ Incoming requests hit an [Elysia](https://elysiajs.com/) server. `chat/completions` requests are validated, normalized into the shared planning pipeline, and then forwarded to Copilot. `responses` requests use a native Responses path with explicit compatibility policies. `messages` requests are routed per-model and can use native Anthropic passthrough, the Responses translation path, or the existing chat-completions fallback. The translator tracks exact vs lossy vs unsupported behavior explicitly; see the [Messages Routing and Translation Guide](./docs/messages-routing-and-translation.md) and the [Anthropic Translation Matrix](./docs/anthropic-translation-matrix.md) for the current support surface.
+
+ ### Request Routing
+
+ `ghc-proxy` does not force every request through one protocol. The current routing rules are:
+
+ - `POST /v1/chat/completions`: OpenAI Chat Completions -> shared planning pipeline -> Copilot `/chat/completions`
+ - `POST /v1/responses`: OpenAI Responses create -> native Responses handler -> Copilot `/responses`
+ - `POST /v1/responses/input_tokens`: Responses input-token counting passthrough when the upstream supports it
+ - `GET /v1/responses/:responseId`: Responses retrieve passthrough when the upstream supports it
+ - `GET /v1/responses/:responseId/input_items`: Responses input-items passthrough when the upstream supports it
+ - `DELETE /v1/responses/:responseId`: Responses delete passthrough when the upstream supports it
+ - `POST /v1/messages`: Anthropic Messages -> choose the best available upstream path for the selected model:
+   - native Copilot `/v1/messages` when supported
+   - Anthropic -> Responses -> Anthropic translation when the model only supports `/responses`
+   - Anthropic -> Chat Completions -> Anthropic fallback otherwise
+
+ This keeps the existing chat pipeline stable while allowing newer Copilot models to use the endpoint they actually expose.
+
+ ### Endpoints
+
+ **OpenAI compatible:**
+
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `POST` | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
+ | `POST` | `/v1/responses` | Create a Responses API response |
+ | `POST` | `/v1/responses/input_tokens` | Count Responses input tokens when supported by Copilot upstream |
+ | `GET` | `/v1/responses/:responseId` | Retrieve one response when supported by Copilot upstream |
+ | `GET` | `/v1/responses/:responseId/input_items` | Retrieve response input items when supported by Copilot upstream |
+ | `DELETE` | `/v1/responses/:responseId` | Delete one response when supported by Copilot upstream |
+ | `GET` | `/v1/models` | List available models |
+ | `POST` | `/v1/embeddings` | Generate embeddings |
+
+ **Anthropic compatible:**
+
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `POST` | `/v1/messages` | Messages API with per-model routing across native Messages, Responses translation, or chat-completions fallback |
+ | `POST` | `/v1/messages/count_tokens` | Token counting |
+
+ **Utility:**
+
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `GET` | `/usage` | Copilot quota / usage monitoring |
+ | `GET` | `/token` | Inspect the current Copilot token |
+
+ > **Note:** The `/v1/` prefix is optional for OpenAI-compatible endpoints (`/chat/completions`, `/responses`, `/models`, `/embeddings`). Anthropic endpoints (`/v1/messages`, `/v1/messages/count_tokens`) require the `/v1` prefix.
+
+ ## Responses Compatibility

  `/v1/responses` is designed to stay close to the OpenAI wire format while making Copilot limitations explicit:

@@ -282,33 +371,18 @@
  - external image URLs on the Responses path fail explicitly with `400`; use `file_id` or data URL image input instead
  - official `input_file` and `item_reference` input items are modeled explicitly and validated before forwarding

- Live upstream verification matters here. On March 11, 2026, a full local scan across every Copilot model that advertised `/responses` support still showed two stable vision gaps:
-
- - external image URLs were rejected uniformly enough that the proxy now rejects them locally with a clearer capability error
- - the current 1x1 PNG data URL probe was rejected upstream as invalid image data even though the fixture itself decodes as a valid PNG locally
-
- The proxy does not currently disable Responses vision wholesale because the same models still advertise vision capability in Copilot model metadata. Treat Responses vision as upstream-contract-sensitive and verify it with `matrix:live` before relying on it.
+ > See [Responses Upstream Notes](./docs/responses-upstream-notes.md) for detailed upstream compatibility observations from live testing.

- Additional real-upstream note: on March 11, 2026, `POST /responses` succeeded against the current enterprise Copilot endpoint, but `POST /responses/input_tokens`, `GET /responses/{id}`, `GET /responses/{id}/input_items`, and `DELETE /responses/{id}` all returned upstream `404`. The proxy exposes those routes because they are part of the official Responses surface, but current Copilot upstream support is not there yet. The same live matrix also showed `previous_response_id` returning upstream `400 previous_response_id is not supported` on the tested model.
+ ## Docker

- Example `config.json`:
+ Pre-built images are available on GHCR:

- ```json
- {
-   "smallModel": "gpt-4.1-mini",
-   "compactUseSmallModel": true,
-   "useFunctionApplyPatch": true,
-   "responsesApiContextManagementModels": ["gpt-5", "gpt-5-mini"],
-   "modelReasoningEfforts": {
-     "gpt-5": "high",
-     "gpt-5-mini": "medium"
-   }
- }
+ ```bash
+ docker pull ghcr.io/wxxb789/ghc-proxy
+ docker run -p 4141:4141 ghcr.io/wxxb789/ghc-proxy
  ```

- ## Docker
-
- Build and run:
+ Or build locally:

  ```bash
  docker build -t ghc-proxy .
@@ -321,7 +395,7 @@ Authentication and settings are persisted in `copilot-data/config.json` so they
  You can also pass a GitHub token via environment variable:

  ```bash
- docker run -p 4141:4141 -e GH_TOKEN=your_token ghc-proxy
+ docker run -p 4141:4141 -e GH_TOKEN=your_token ghcr.io/wxxb789/ghc-proxy
  ```

  Docker Compose:
@@ -329,7 +403,7 @@ Docker Compose:
  ```yaml
  services:
    ghc-proxy:
-     build: .
+     image: ghcr.io/wxxb789/ghc-proxy
      ports:
        - '4141:4141'
      environment:
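An aside on the `--wait` semantics documented above: in 0.5.2, `--wait` is a companion to `--rate-limit` rather than a general recommendation. The decision it controls can be sketched in TypeScript as follows (the names here are illustrative, not the proxy's actual internals):

```typescript
// Hypothetical sketch of the --rate-limit / --wait gating described in the
// README changes above. Names and shapes are illustrative only.
type RateLimitDecision = "forward" | "queue" | "reject-429";

function decide(
  rateLimitSeconds: number | undefined,
  wait: boolean,
  secondsSinceLastRequest: number,
): RateLimitDecision {
  // No --rate-limit configured: there is no cooldown, so --wait is a no-op.
  if (rateLimitSeconds === undefined) return "forward";
  // Cooldown already elapsed: forward immediately.
  if (secondsSinceLastRequest >= rateLimitSeconds) return "forward";
  // Inside the cooldown: --wait queues the request, otherwise reject with 429.
  return wait ? "queue" : "reject-429";
}

console.log(decide(undefined, true, 0));
console.log(decide(30, false, 10));
console.log(decide(30, true, 10));
```

With no `--rate-limit`, every request forwards immediately regardless of `--wait`; inside an active cooldown, `--wait` queues instead of returning 429.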
package/dist/main.mjs CHANGED
@@ -5698,6 +5698,9 @@ function fromTranslationFailure(failure) {
  type: "translation_error"
  } });
  }
+ function previewBody(text, maxLength = 500) {
+ return text.length > maxLength ? `${text.slice(0, maxLength)}…` : text;
+ }
  function isStructuredErrorPayload(value) {
  return typeof value === "object" && value !== null && "error" in value && typeof value.error === "object" && value.error !== null;
  }
@@ -5706,12 +5709,13 @@ function isStructuredErrorPayload(value) {
  * Used by CopilotClient when upstream returns a non-OK response.
  */
  async function throwUpstreamError(message, response) {
+ let rawText = "";
  let body;
  try {
- const text = await response.text();
- const json = JSON.parse(text);
+ rawText = await response.text();
+ const json = JSON.parse(rawText);
  body = isStructuredErrorPayload(json) ? json : { error: {
- message: text,
+ message: rawText,
  type: "upstream_error"
  } };
  } catch {
@@ -5720,7 +5724,13 @@ async function throwUpstreamError(message, response) {
  type: "upstream_error"
  } };
  }
- consola.error("Upstream error:", body);
+ consola.error("Upstream error:", {
+ status: response.status,
+ statusText: response.statusText,
+ url: response.url,
+ body,
+ rawBody: rawText ? previewBody(rawText) : "<empty>"
+ });
  throw new HTTPError(response.status, body);
  }

@@ -5751,10 +5761,7 @@ var CopilotClient = class {
  body: options.body,
  signal: options.signal
  });
- if (!response.ok) {
- consola.error(errorMessage, response);
- await throwUpstreamError(errorMessage, response);
- }
+ if (!response.ok) await throwUpstreamError(errorMessage, response);
  return response;
  }
  /** Fetch and parse JSON response */
@@ -6225,7 +6232,7 @@ const checkUsage = defineCommand({

  //#endregion
  //#region src/lib/version.ts
- const VERSION = "0.5.0";
+ const VERSION = "0.5.2";

  //#endregion
  //#region src/debug.ts
@@ -46602,6 +46609,19 @@ function logRequest(method, url, status, elapsed, modelInfo) {
  console.log(`${line}${formatModelMapping(modelInfo)}`);
  }

+ //#endregion
+ //#region src/lib/request-timeout.ts
+ function disableIdleTimeout(server, request) {
+ if (typeof server?.timeout === "function") server.timeout(request, 0);
+ }
+ function hasStreamingFlag(body) {
+ if (!body || typeof body !== "object") return false;
+ return body.stream === true;
+ }
+ function hasStreamingResponsesQuery(request) {
+ return new URL(request.url).searchParams.get("stream") === "true";
+ }
+
  //#endregion
  //#region src/lib/sse-adapter.ts
  /**
@@ -46698,8 +46718,8 @@ function inferModelFamily(model) {
  const baseProfile = {
  id: "base",
  family: "other",
- enableCacheControl: false,
- includeUsageOnStream: false,
+ enableCacheControl: true,
+ includeUsageOnStream: true,
  applyThinking(request) {
  const thinking = request.thinking;
  if (!thinking || thinking.type === "disabled") return {};
@@ -48454,7 +48474,10 @@ function parseAnthropicCountTokensPayload(payload) {
  //#region src/lib/validation/embeddings.ts
  const embeddingRequestSchema = object({
  input: union([string(), array(string())]),
- model: string().min(1)
+ model: string().min(1),
+ dimensions: nonNegativeIntegerSchema.optional(),
+ encoding_format: _enum(["float", "base64"]).optional(),
+ user: string().min(1).optional()
  }).loose();
  function parseEmbeddingRequest(payload) {
  return parsePayload(embeddingRequestSchema, "openai.embeddings", payload);
@@ -48895,7 +48918,8 @@ async function handleCompletionCore({ body, signal, headers }) {
  //#endregion
  //#region src/routes/chat-completions/route.ts
  function createCompletionRoutes() {
- return new Elysia().use(requestGuardPlugin).post("/chat/completions", async function* ({ body, request }) {
+ return new Elysia().use(requestGuardPlugin).post("/chat/completions", async function* ({ body, request, server }) {
+ if (hasStreamingFlag(body)) disableIdleTimeout(server, request);
  const { result, modelMapping } = await handleCompletionCore({
  body,
  signal: request.signal,
@@ -48909,12 +48933,18 @@ function createCompletionRoutes() {

  //#endregion
  //#region src/routes/embeddings/handler.ts
+ function normalizeEmbeddingRequest(payload) {
+ return {
+ ...payload,
+ input: typeof payload.input === "string" ? [payload.input] : payload.input
+ };
+ }
  /**
  * Core handler for creating embeddings.
  */
  async function handleEmbeddingsCore(body) {
  const payload = parseEmbeddingRequest(body);
- return await createCopilotClient().createEmbeddings(payload);
+ return await createCopilotClient().createEmbeddings(normalizeEmbeddingRequest(payload));
  }

  //#endregion
@@ -48942,10 +48972,16 @@ function createAnthropicAdapter() {

  //#endregion
  //#region src/routes/messages/count-tokens-handler.ts
- const CLAUDE_TOOL_OVERHEAD_TOKENS = 346;
- const GROK_TOOL_OVERHEAD_TOKENS = 480;
- const CLAUDE_ESTIMATION_FACTOR = 1.15;
- const GROK_ESTIMATION_FACTOR = 1.03;
+ const TOOL_OVERHEAD_TOKENS = {
+ claude: 346,
+ grok: 480,
+ gpt: 346
+ };
+ const ESTIMATION_FACTOR = {
+ claude: 1.15,
+ grok: 1.03,
+ gpt: 1.1
+ };
  /**
  * Core handler for counting tokens.
  */
@@ -48970,13 +49006,13 @@ async function handleCountTokensCore({ body, headers }) {
  let mcpToolExist = false;
  if (anthropicBeta?.startsWith("claude-code")) mcpToolExist = anthropicPayload.tools.some((tool) => tool.name.startsWith("mcp__"));
  if (!mcpToolExist) {
- if (anthropicPayload.model.startsWith("claude")) tokenCount.input = tokenCount.input + CLAUDE_TOOL_OVERHEAD_TOKENS;
- else if (anthropicPayload.model.startsWith("grok")) tokenCount.input = tokenCount.input + GROK_TOOL_OVERHEAD_TOKENS;
+ const overhead = TOOL_OVERHEAD_TOKENS[inferModelFamily(anthropicPayload.model)];
+ if (overhead) tokenCount.input = tokenCount.input + overhead;
  }
  }
  let finalTokenCount = tokenCount.input + tokenCount.output;
- if (anthropicPayload.model.startsWith("claude")) finalTokenCount = Math.round(finalTokenCount * CLAUDE_ESTIMATION_FACTOR);
- else if (anthropicPayload.model.startsWith("grok")) finalTokenCount = Math.round(finalTokenCount * GROK_ESTIMATION_FACTOR);
+ const factor = ESTIMATION_FACTOR[inferModelFamily(anthropicPayload.model)];
+ if (factor) finalTokenCount = Math.round(finalTokenCount * factor);
  consola.info("Token count:", finalTokenCount);
  return { input_tokens: finalTokenCount };
  }
@@ -49174,12 +49210,15 @@ function translateAssistantMessage(message) {
  continue;
  }
  if (SignatureCodec.isReasoningSignature(block.signature)) {
- flushPendingContent(pendingContent, items, {
- role: "assistant",
- phase: assistantPhase
- });
- items.push(createReasoningContent(block));
- continue;
+ const { id } = SignatureCodec.decodeReasoning(block.signature);
+ if (id) {
+ flushPendingContent(pendingContent, items, {
+ role: "assistant",
+ phase: assistantPhase
+ });
+ items.push(createReasoningContent(block));
+ continue;
+ }
  }
  }
  const converted = translateAssistantContentBlock(block);
@@ -49817,6 +49856,7 @@ var ResponsesStreamTranslator = class {
  this.closeScalarBlock(`thinking:${rawEvent.output_index}`, events);
  }
  if (rawEvent.item.type === "function_call") {
+ if (this.state.functionCallStateByOutputIndex.get(rawEvent.output_index)?.closed) return events;
  const blockIndex = this.openFunctionCallBlock({
  outputIndex: rawEvent.output_index,
  toolCallId: rawEvent.item.call_id,
@@ -50318,7 +50358,8 @@ async function handleMessagesCore({ body, signal, headers }) {
  //#endregion
  //#region src/routes/messages/route.ts
  function createMessageRoutes() {
- return new Elysia().use(requestGuardPlugin).post("/messages", async function* ({ body, request }) {
+ return new Elysia().use(requestGuardPlugin).post("/messages", async function* ({ body, request, server }) {
+ if (hasStreamingFlag(body)) disableIdleTimeout(server, request);
  const { result, modelMapping } = await handleMessagesCore({
  body,
  signal: request.signal,
@@ -50580,7 +50621,8 @@ function parseBooleanParam(value) {
  //#endregion
  //#region src/routes/responses/route.ts
  function createResponsesRoutes() {
- return new Elysia().use(requestGuardPlugin).post("/responses", async function* ({ body, request }) {
+ return new Elysia().use(requestGuardPlugin).post("/responses", async function* ({ body, request, server }) {
+ if (hasStreamingFlag(body)) disableIdleTimeout(server, request);
  const { result, modelMapping } = await handleResponsesCore({
  body,
  signal: request.signal,
@@ -50602,7 +50644,8 @@ function createResponsesRoutes() {
  headers: request.headers,
  signal: request.signal
  });
- }).get("/responses/:responseId", async ({ params, request }) => {
+ }).get("/responses/:responseId", async ({ params, request, server }) => {
+ if (hasStreamingResponsesQuery(request)) disableIdleTimeout(server, request);
  return handleRetrieveResponseCore({
  params,
  url: request.url,
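To make the behavior of the new helpers concrete, here is a standalone restatement of `previewBody` and the family-keyed token estimation from the hunks above. The `previewBody` body and the factor table are copied from the diff; `inferModelFamily` here is a simplified stand-in, since the bundled implementation is not shown in this diff:

```typescript
// previewBody: copied from the 0.5.2 diff; caps error-log bodies at 500 chars.
function previewBody(text: string, maxLength = 500): string {
  return text.length > maxLength ? `${text.slice(0, maxLength)}…` : text;
}

// Factor table copied from the diff's ESTIMATION_FACTOR refactor.
const ESTIMATION_FACTOR: Record<string, number> = {
  claude: 1.15,
  grok: 1.03,
  gpt: 1.1,
};

// Simplified stand-in for the bundle's inferModelFamily (not shown in the diff).
function inferModelFamily(model: string): string {
  if (model.startsWith("claude")) return "claude";
  if (model.startsWith("grok")) return "grok";
  if (model.startsWith("gpt")) return "gpt";
  return "other";
}

function estimateTokens(model: string, rawCount: number): number {
  // Families without a factor (e.g. "other") keep the raw count, matching the
  // diff's `if (factor) finalTokenCount = Math.round(...)` guard.
  const factor = ESTIMATION_FACTOR[inferModelFamily(model)];
  return factor ? Math.round(rawCount * factor) : rawCount;
}

console.log(previewBody("a".repeat(600)).length); // 501: 500 chars plus ellipsis
console.log(estimateTokens("claude-sonnet-4.6", 1000)); // 1150
console.log(estimateTokens("unknown-model", 1000)); // 1000, no factor applied
```

This mirrors the key change in the count-tokens handler: the per-family `if/else` chains were replaced by table lookups keyed on model family, which also gives `gpt` models an overhead and estimation factor they previously lacked.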