ghc-proxy 0.5.0 → 0.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -12,7 +12,7 @@ A proxy that turns your GitHub Copilot subscription into an OpenAI and Anthropic
  **TL;DR** — Install [Bun](https://bun.com/docs/installation), then run:

  ```bash
- bunx ghc-proxy@latest start --wait
+ bunx ghc-proxy@latest start
  ```

  ## Prerequisites
@@ -28,9 +28,7 @@ Before you start, make sure you have:

  1. Start the proxy:

- bunx ghc-proxy@latest start --wait
-
- > **Recommended:** The `--wait` flag queues requests instead of rejecting them with a 429 error when you hit Copilot rate limits. This is the simplest way to run the proxy for daily use.
+ bunx ghc-proxy@latest start

  2. On the first run, you will be guided through GitHub's device-code authentication flow. Follow the prompts to authorize the proxy.

@@ -38,6 +36,8 @@ Before you start, make sure you have:

  That's it. Any tool that supports the OpenAI or Anthropic API can now point to `http://localhost:4141`.

+ > **Tip:** If you set `--rate-limit`, add `--wait` to queue requests instead of rejecting them with 429 when the cooldown has not elapsed yet. See [Rate Limiting](#rate-limiting) for details.
+
  ## Using with Claude Code

  This is the most common use case. There are two ways to set it up:
@@ -73,7 +73,7 @@ Create or edit `~/.claude/settings.json` (this applies globally to all projects)
  Then simply start the proxy and use Claude Code as usual:

  ```bash
- bunx ghc-proxy@latest start --wait
+ bunx ghc-proxy@latest start
  ```

  **What each environment variable does:**
@@ -91,75 +91,6 @@ bunx ghc-proxy@latest start --wait

  See the [Claude Code settings docs](https://docs.anthropic.com/en/docs/claude-code/settings#environment-variables) for more options.

- ## What it Does
-
- ghc-proxy sits between your tools and the GitHub Copilot API:
-
- ```text
- ┌──────────────┐      ┌───────────┐      ┌───────────────────────┐
- │ Claude Code  │──────│ ghc-proxy │──────│ api.githubcopilot.com │
- │ Cursor       │      │   :4141   │      │                       │
- │ Any client   │      │           │      │                       │
- └──────────────┘      └───────────┘      └───────────────────────┘
-     OpenAI or           Translates          GitHub Copilot
-     Anthropic            between                  API
-      format              formats
- ```
-
- The proxy authenticates with GitHub using the [device code OAuth flow](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/authorizing-oauth-apps#device-flow) (the same flow VS Code uses), then exchanges the GitHub token for a short-lived Copilot token that auto-refreshes.
-
- When the Copilot token response includes `endpoints.api`, `ghc-proxy` now prefers that runtime API base automatically instead of relying only on the configured account type. This keeps enterprise/business routing aligned with the endpoint GitHub actually returned for the current token.
-
- Incoming requests hit an [Elysia](https://elysiajs.com/) server. `chat/completions` requests are validated, normalized into the shared planning pipeline, and then forwarded to Copilot. `responses` requests use a native Responses path with explicit compatibility policies. `messages` requests are routed per-model and can use native Anthropic passthrough, the Responses translation path, or the existing chat-completions fallback. The translator tracks exact vs lossy vs unsupported behavior explicitly; see the [Messages Routing and Translation Guide](./docs/messages-routing-and-translation.md) and the [Anthropic Translation Matrix](./docs/anthropic-translation-matrix.md) for the current support surface.
-
- ### Request Routing
-
- `ghc-proxy` does not force every request through one protocol. The current routing rules are:
-
- - `POST /v1/chat/completions`: OpenAI Chat Completions -> shared planning pipeline -> Copilot `/chat/completions`
- - `POST /v1/responses`: OpenAI Responses create -> native Responses handler -> Copilot `/responses`
- - `POST /v1/responses/input_tokens`: Responses input-token counting passthrough when the upstream supports it
- - `GET /v1/responses/:responseId`: Responses retrieve passthrough when the upstream supports it
- - `GET /v1/responses/:responseId/input_items`: Responses input-items passthrough when the upstream supports it
- - `DELETE /v1/responses/:responseId`: Responses delete passthrough when the upstream supports it
- - `POST /v1/messages`: Anthropic Messages -> choose the best available upstream path for the selected model:
-   - native Copilot `/v1/messages` when supported
-   - Anthropic -> Responses -> Anthropic translation when the model only supports `/responses`
-   - Anthropic -> Chat Completions -> Anthropic fallback otherwise
-
- This keeps the existing chat pipeline stable while allowing newer Copilot models to use the endpoint they actually expose.
-
- ### Endpoints
-
- **OpenAI compatible:**
-
- | Method | Path | Description |
- |--------|------|-------------|
- | `POST` | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
- | `POST` | `/v1/responses` | Create a Responses API response |
- | `POST` | `/v1/responses/input_tokens` | Count Responses input tokens when supported by Copilot upstream |
- | `GET` | `/v1/responses/:responseId` | Retrieve one response when supported by Copilot upstream |
- | `GET` | `/v1/responses/:responseId/input_items` | Retrieve response input items when supported by Copilot upstream |
- | `DELETE` | `/v1/responses/:responseId` | Delete one response when supported by Copilot upstream |
- | `GET` | `/v1/models` | List available models |
- | `POST` | `/v1/embeddings` | Generate embeddings |
-
- **Anthropic compatible:**
-
- | Method | Path | Description |
- |--------|------|-------------|
- | `POST` | `/v1/messages` | Messages API with per-model routing across native Messages, Responses translation, or chat-completions fallback |
- | `POST` | `/v1/messages/count_tokens` | Token counting |
-
- **Utility:**
-
- | Method | Path | Description |
- |--------|------|-------------|
- | `GET` | `/usage` | Copilot quota / usage monitoring |
- | `GET` | `/token` | Inspect the current Copilot token |
-
- > **Note:** The `/v1/` prefix is optional. `/chat/completions`, `/responses`, `/models`, and `/embeddings` also work.
-
  ## CLI Reference

  ghc-proxy uses a subcommand structure:
@@ -179,18 +110,18 @@ bunx ghc-proxy@latest debug # Print diagnostic info (version, paths, to
  | `--verbose` | `-v` | `false` | Enable verbose logging |
  | `--account-type` | `-a` | `individual` | `individual`, `business`, or `enterprise` |
  | `--rate-limit` | `-r` | -- | Minimum seconds between requests |
- | `--wait` | `-w` | `false` | Wait instead of rejecting when rate-limited |
+ | `--wait` | `-w` | `false` | Queue requests instead of rejecting with 429 when `--rate-limit` cooldown has not elapsed (requires `--rate-limit`) |
  | `--manual` | -- | `false` | Manually approve each request |
  | `--github-token` | `-g` | -- | Pass a GitHub token directly (from `auth`) |
  | `--claude-code` | `-c` | `false` | Generate a Claude Code launch command |
  | `--show-token` | -- | `false` | Display tokens on auth and refresh |
  | `--proxy-env` | -- | `false` | Use `HTTP_PROXY`/`HTTPS_PROXY` from env (Node.js only; Bun reads proxy env natively) |
- | `--idle-timeout` | -- | `120` | Bun server idle timeout in seconds |
- | `--upstream-timeout` | -- | `300` | Upstream request timeout in seconds (0 to disable) |
+ | `--idle-timeout` | -- | `120` | Bun server idle timeout in seconds (`0` disables; Bun max is `255`; streaming routes disable idle timeout automatically) |
+ | `--upstream-timeout` | -- | `1800` | Upstream request timeout in seconds (0 to disable) |

  ## Rate Limiting

- If you are worried about hitting Copilot rate limits:
+ If you want to throttle how often the proxy forwards requests:

  ```bash
  # Enforce a 30-second cooldown between requests
@@ -203,6 +134,8 @@ bunx ghc-proxy@latest start --rate-limit 30 --wait
  bunx ghc-proxy@latest start --manual
  ```

+ `--wait` only takes effect when `--rate-limit` is also set. Without `--rate-limit`, there is no cooldown to wait on and `--wait` has no effect.
+
  ## Account Types

  If you have a GitHub Business or Enterprise Copilot plan, pass `--account-type`:
@@ -214,6 +147,57 @@ bunx ghc-proxy@latest start --account-type enterprise

  This routes requests to the correct Copilot API endpoint for your plan. See the [GitHub docs on network routing](https://docs.github.com/en/enterprise-cloud@latest/copilot/managing-copilot/managing-github-copilot-in-your-organization/managing-access-to-github-copilot-in-your-organization/managing-github-copilot-access-to-your-organizations-network#configuring-copilot-subscription-based-network-routing-for-your-enterprise-or-organization) for details.

+ ## Configuration
+
+ The proxy reads an optional JSON config file at:
+
+ ```
+ ~/.local/share/ghc-proxy/config.json
+ ```
+
+ All fields are optional. The full schema:
+
+ | Field | Type | Default | Description |
+ |-------|------|---------|-------------|
+ | `modelRewrites` | `{ from, to }[]` | -- | Glob-pattern model substitution rules (see [Model Rewrites](#model-rewrites)) |
+ | `modelFallback` | `object` | -- | Override default model fallbacks (see [Customizing Fallbacks](#customizing-fallbacks)) |
+ | `modelFallback.claudeOpus` | `string` | `claude-opus-4.6` | Fallback for `claude-opus-*` models |
+ | `modelFallback.claudeSonnet` | `string` | `claude-sonnet-4.6` | Fallback for `claude-sonnet-*` models |
+ | `modelFallback.claudeHaiku` | `string` | `claude-haiku-4.5` | Fallback for `claude-haiku-*` models |
+ | `smallModel` | `string` | -- | Target model for compact request routing (see [Small-Model Routing](#small-model-routing)) |
+ | `compactUseSmallModel` | `boolean` | `false` | Route compact/summarization requests to `smallModel` |
+ | `contextUpgrade` | `boolean` | `true` | Auto-upgrade to extended-context model variants (see [Context-1M Auto-Upgrade](#context-1m-auto-upgrade)) |
+ | `contextUpgradeTokenThreshold` | `number` | `160000` | Token threshold for proactive context upgrade |
+ | `useFunctionApplyPatch` | `boolean` | `true` | Rewrite `apply_patch` custom tool as function tool on Responses path |
+ | `responsesApiContextManagementModels` | `string[]` | -- | Models that enable Responses context compaction |
+ | `modelReasoningEfforts` | `Record<string, string>` | -- | Per-model reasoning effort defaults for Anthropic-to-Responses translation |
+
+ Example:
+
+ ```json
+ {
+   "modelRewrites": [
+     { "from": "claude-haiku-*", "to": "gpt-4.1-mini" }
+   ],
+   "modelFallback": {
+     "claudeOpus": "claude-opus-4.6",
+     "claudeSonnet": "claude-sonnet-4.6"
+   },
+   "smallModel": "gpt-4.1-mini",
+   "compactUseSmallModel": true,
+   "contextUpgrade": true,
+   "contextUpgradeTokenThreshold": 160000,
+   "useFunctionApplyPatch": true,
+   "responsesApiContextManagementModels": ["gpt-5", "gpt-5-mini"],
+   "modelReasoningEfforts": {
+     "gpt-5": "high",
+     "gpt-5-mini": "medium"
+   }
+ }
+ ```
+
+ **Priority order** for model fallbacks: environment variable > config.json > built-in default.
+
  ## Model Mapping

  When Claude Code sends a request for a model like `claude-sonnet-4.6`, the proxy maps it to an actual model available on Copilot. The mapping logic works as follows:
@@ -251,10 +235,46 @@ Or in the proxy's **config file** (`~/.local/share/ghc-proxy/config.json`):
  }
  ```

- **Priority order:** environment variable > config.json > built-in default.
-
  > **Note:** Model fallbacks only apply to the **chat completions translation path**. The native Messages and Responses API strategies pass the model ID through to Copilot as-is.

+ ### Model Rewrites
+
+ For more general model substitution, use `modelRewrites` in the config file. Each rule maps a `from` pattern to a `to` model ID. The `from` field supports glob patterns with `*` wildcards, and the first matching rule wins.
+
+ ```json
+ {
+   "modelRewrites": [
+     { "from": "claude-haiku-*", "to": "gpt-4.1-mini" },
+     { "from": "gpt-5.4*", "to": "gpt-5.2" }
+   ]
+ }
+ ```
+
+ Unlike model fallbacks (which only apply to the chat completions path), rewrites are applied **uniformly to all three endpoints** — `/v1/messages`, `/v1/chat/completions`, and `/v1/responses`. Target model names are normalized against Copilot's known model list using dash/dot equivalence (e.g. `gpt-4.1` matches `gpt-4-1`).
+
+ Rewrites run **before** any other model policy — context upgrades, small-model routing, and strategy selection all see the rewritten model. This means a rewritten model still benefits from context-1m upgrades if the target has an upgrade rule.
+
+ ### Context-1M Auto-Upgrade
+
+ The proxy can automatically upgrade models to their extended-context (1M token) variants when the request is large. This is enabled by default.
+
+ **Proactive upgrade:** Before sending the request, the proxy estimates the input token count. If it exceeds the configured threshold (default: 160,000 tokens), the model is upgraded to its 1M variant before the request is sent.
+
+ **Reactive upgrade:** If the upstream returns a context-length error (e.g. "context length exceeded"), the proxy retries the request with the upgraded model automatically.
+
+ **Beta header support:** When a client sends an `anthropic-beta: context-*` header (e.g. `context-1m-2025-04-14`), the proxy strips the header (Copilot does not understand it) and upgrades the model to the 1M variant instead.
+
+ Current upgrade rules:
+
+ | Source Model | Upgraded Model |
+ |-------------|----------------|
+ | `claude-opus-4.6` | `claude-opus-4.6-1m` |
+
+ Configuration:
+
+ - `contextUpgrade` (boolean, default `true`) — enable or disable auto-upgrade
+ - `contextUpgradeTokenThreshold` (number, default `160000`) — token count threshold for proactive upgrade
+
  ### Small-Model Routing

  `/v1/messages` can optionally reroute specific low-value requests to a cheaper model:
@@ -268,7 +288,76 @@ The switch defaults to `false`. Routing is conservative:
  - it must preserve the original model's declared endpoint support
  - tool, thinking, and vision requests are not rerouted to a model that lacks the required capabilities

- ### Responses Compatibility
+ ## How it Works
+
+ ghc-proxy sits between your tools and the GitHub Copilot API:
+
+ ```text
+ ┌──────────────┐      ┌───────────┐      ┌───────────────────────┐
+ │ Claude Code  │──────│ ghc-proxy │──────│ api.githubcopilot.com │
+ │ Cursor       │      │   :4141   │      │                       │
+ │ Any client   │      │           │      │                       │
+ └──────────────┘      └───────────┘      └───────────────────────┘
+     OpenAI or           Translates          GitHub Copilot
+     Anthropic            between                  API
+      format              formats
+ ```
+
+ The proxy authenticates with GitHub using the [device code OAuth flow](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/authorizing-oauth-apps#device-flow) (the same flow VS Code uses), then exchanges the GitHub token for a short-lived Copilot token that auto-refreshes.
+
+ When the Copilot token response includes `endpoints.api`, `ghc-proxy` now prefers that runtime API base automatically instead of relying only on the configured account type. This keeps enterprise/business routing aligned with the endpoint GitHub actually returned for the current token.
+
+ Incoming requests hit an [Elysia](https://elysiajs.com/) server. `chat/completions` requests are validated, normalized into the shared planning pipeline, and then forwarded to Copilot. `responses` requests use a native Responses path with explicit compatibility policies. `messages` requests are routed per-model and can use native Anthropic passthrough, the Responses translation path, or the existing chat-completions fallback. The translator tracks exact vs lossy vs unsupported behavior explicitly; see the [Messages Routing and Translation Guide](./docs/messages-routing-and-translation.md) and the [Anthropic Translation Matrix](./docs/anthropic-translation-matrix.md) for the current support surface.
+
+ ### Request Routing
+
+ `ghc-proxy` does not force every request through one protocol. The current routing rules are:
+
+ - `POST /v1/chat/completions`: OpenAI Chat Completions -> shared planning pipeline -> Copilot `/chat/completions`
+ - `POST /v1/responses`: OpenAI Responses create -> native Responses handler -> Copilot `/responses`
+ - `POST /v1/responses/input_tokens`: Responses input-token counting passthrough when the upstream supports it
+ - `GET /v1/responses/:responseId`: Responses retrieve passthrough when the upstream supports it
+ - `GET /v1/responses/:responseId/input_items`: Responses input-items passthrough when the upstream supports it
+ - `DELETE /v1/responses/:responseId`: Responses delete passthrough when the upstream supports it
+ - `POST /v1/messages`: Anthropic Messages -> choose the best available upstream path for the selected model:
+   - native Copilot `/v1/messages` when supported
+   - Anthropic -> Responses -> Anthropic translation when the model only supports `/responses`
+   - Anthropic -> Chat Completions -> Anthropic fallback otherwise
+
+ This keeps the existing chat pipeline stable while allowing newer Copilot models to use the endpoint they actually expose.
+
+ ### Endpoints
+
+ **OpenAI compatible:**
+
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `POST` | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
+ | `POST` | `/v1/responses` | Create a Responses API response |
+ | `POST` | `/v1/responses/input_tokens` | Count Responses input tokens when supported by Copilot upstream |
+ | `GET` | `/v1/responses/:responseId` | Retrieve one response when supported by Copilot upstream |
+ | `GET` | `/v1/responses/:responseId/input_items` | Retrieve response input items when supported by Copilot upstream |
+ | `DELETE` | `/v1/responses/:responseId` | Delete one response when supported by Copilot upstream |
+ | `GET` | `/v1/models` | List available models |
+ | `POST` | `/v1/embeddings` | Generate embeddings |
+
+ **Anthropic compatible:**
+
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `POST` | `/v1/messages` | Messages API with per-model routing across native Messages, Responses translation, or chat-completions fallback |
+ | `POST` | `/v1/messages/count_tokens` | Token counting |
+
+ **Utility:**
+
+ | Method | Path | Description |
+ |--------|------|-------------|
+ | `GET` | `/usage` | Copilot quota / usage monitoring |
+ | `GET` | `/token` | Inspect the current Copilot token |
+
+ > **Note:** The `/v1/` prefix is optional for OpenAI-compatible endpoints (`/chat/completions`, `/responses`, `/models`, `/embeddings`). Anthropic endpoints (`/v1/messages`, `/v1/messages/count_tokens`) require the `/v1` prefix.
+
+ ## Responses Compatibility

  `/v1/responses` is designed to stay close to the OpenAI wire format while making Copilot limitations explicit:

@@ -282,33 +371,18 @@
  - external image URLs on the Responses path fail explicitly with `400`; use `file_id` or data URL image input instead
  - official `input_file` and `item_reference` input items are modeled explicitly and validated before forwarding

- Live upstream verification matters here. On March 11, 2026, a full local scan across every Copilot model that advertised `/responses` support still showed two stable vision gaps:
-
- - external image URLs were rejected uniformly enough that the proxy now rejects them locally with a clearer capability error
- - the current 1x1 PNG data URL probe was rejected upstream as invalid image data even though the fixture itself decodes as a valid PNG locally
-
- The proxy does not currently disable Responses vision wholesale because the same models still advertise vision capability in Copilot model metadata. Treat Responses vision as upstream-contract-sensitive and verify it with `matrix:live` before relying on it.
+ > See [Responses Upstream Notes](./docs/responses-upstream-notes.md) for detailed upstream compatibility observations from live testing.

- Additional real-upstream note: on March 11, 2026, `POST /responses` succeeded against the current enterprise Copilot endpoint, but `POST /responses/input_tokens`, `GET /responses/{id}`, `GET /responses/{id}/input_items`, and `DELETE /responses/{id}` all returned upstream `404`. The proxy exposes those routes because they are part of the official Responses surface, but current Copilot upstream support is not there yet. The same live matrix also showed `previous_response_id` returning upstream `400 previous_response_id is not supported` on the tested model.
+ ## Docker

- Example `config.json`:
+ Pre-built images are available on GHCR:

- ```json
- {
-   "smallModel": "gpt-4.1-mini",
-   "compactUseSmallModel": true,
-   "useFunctionApplyPatch": true,
-   "responsesApiContextManagementModels": ["gpt-5", "gpt-5-mini"],
-   "modelReasoningEfforts": {
-     "gpt-5": "high",
-     "gpt-5-mini": "medium"
-   }
- }
+ ```bash
+ docker pull ghcr.io/wxxb789/ghc-proxy
+ docker run -p 4141:4141 ghcr.io/wxxb789/ghc-proxy
  ```

- ## Docker
-
- Build and run:
+ Or build locally:

  ```bash
  docker build -t ghc-proxy .
@@ -321,7 +395,7 @@ Authentication and settings are persisted in `copilot-data/config.json` so they
  You can also pass a GitHub token via environment variable:

  ```bash
- docker run -p 4141:4141 -e GH_TOKEN=your_token ghc-proxy
+ docker run -p 4141:4141 -e GH_TOKEN=your_token ghcr.io/wxxb789/ghc-proxy
  ```

  Docker Compose:
@@ -329,7 +403,7 @@ Docker Compose:
  ```yaml
  services:
    ghc-proxy:
-     build: .
+     image: ghcr.io/wxxb789/ghc-proxy
      ports:
        - '4141:4141'
      environment:
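An aside on the `--wait` semantics documented above: in 0.5.2, `--wait` is a companion to `--rate-limit` rather than a general recommendation. The decision it controls can be sketched in TypeScript as follows (the names here are illustrative, not the proxy's actual internals):

```typescript
// Hypothetical sketch of the --rate-limit / --wait gating described in the
// README changes above. Names and shapes are illustrative only.
type RateLimitDecision = "forward" | "queue" | "reject-429";

function decide(
  rateLimitSeconds: number | undefined,
  wait: boolean,
  secondsSinceLastRequest: number,
): RateLimitDecision {
  // No --rate-limit configured: there is no cooldown, so --wait is a no-op.
  if (rateLimitSeconds === undefined) return "forward";
  // Cooldown already elapsed: forward immediately.
  if (secondsSinceLastRequest >= rateLimitSeconds) return "forward";
  // Inside the cooldown: --wait queues the request, otherwise reject with 429.
  return wait ? "queue" : "reject-429";
}

console.log(decide(undefined, true, 0));
console.log(decide(30, false, 10));
console.log(decide(30, true, 10));
```

With no `--rate-limit`, every request forwards immediately regardless of `--wait`; inside an active cooldown, `--wait` queues instead of returning 429.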
package/dist/main.mjs CHANGED
@@ -5698,6 +5698,9 @@ function fromTranslationFailure(failure) {
  type: "translation_error"
  } });
  }
+ function previewBody(text, maxLength = 500) {
+ return text.length > maxLength ? `${text.slice(0, maxLength)}…` : text;
+ }
  function isStructuredErrorPayload(value) {
  return typeof value === "object" && value !== null && "error" in value && typeof value.error === "object" && value.error !== null;
  }
@@ -5706,12 +5709,13 @@ function isStructuredErrorPayload(value) {
  * Used by CopilotClient when upstream returns a non-OK response.
  */
  async function throwUpstreamError(message, response) {
+ let rawText = "";
  let body;
  try {
- const text = await response.text();
- const json = JSON.parse(text);
+ rawText = await response.text();
+ const json = JSON.parse(rawText);
  body = isStructuredErrorPayload(json) ? json : { error: {
- message: text,
+ message: rawText,
  type: "upstream_error"
  } };
  } catch {
@@ -5720,7 +5724,13 @@ async function throwUpstreamError(message, response) {
  type: "upstream_error"
  } };
  }
- consola.error("Upstream error:", body);
+ consola.error("Upstream error:", {
+ status: response.status,
+ statusText: response.statusText,
+ url: response.url,
+ body,
+ rawBody: rawText ? previewBody(rawText) : "<empty>"
+ });
  throw new HTTPError(response.status, body);
  }

@@ -5751,10 +5761,7 @@ var CopilotClient = class {
  body: options.body,
  signal: options.signal
  });
- if (!response.ok) {
- consola.error(errorMessage, response);
- await throwUpstreamError(errorMessage, response);
- }
+ if (!response.ok) await throwUpstreamError(errorMessage, response);
  return response;
  }
  /** Fetch and parse JSON response */
@@ -6225,7 +6232,7 @@ const checkUsage = defineCommand({

  //#endregion
  //#region src/lib/version.ts
- const VERSION = "0.5.0";
+ const VERSION = "0.5.2";

  //#endregion
  //#region src/debug.ts
@@ -46602,6 +46609,19 @@ function logRequest(method, url, status, elapsed, modelInfo) {
  console.log(`${line}${formatModelMapping(modelInfo)}`);
  }

+ //#endregion
+ //#region src/lib/request-timeout.ts
+ function disableIdleTimeout(server, request) {
+ if (typeof server?.timeout === "function") server.timeout(request, 0);
+ }
+ function hasStreamingFlag(body) {
+ if (!body || typeof body !== "object") return false;
+ return body.stream === true;
+ }
+ function hasStreamingResponsesQuery(request) {
+ return new URL(request.url).searchParams.get("stream") === "true";
+ }
+
  //#endregion
  //#region src/lib/sse-adapter.ts
  /**
@@ -46698,8 +46718,8 @@ function inferModelFamily(model) {
  const baseProfile = {
  id: "base",
  family: "other",
- enableCacheControl: false,
- includeUsageOnStream: false,
+ enableCacheControl: true,
+ includeUsageOnStream: true,
  applyThinking(request) {
  const thinking = request.thinking;
  if (!thinking || thinking.type === "disabled") return {};
@@ -48454,7 +48474,10 @@ function parseAnthropicCountTokensPayload(payload) {
  //#region src/lib/validation/embeddings.ts
  const embeddingRequestSchema = object({
  input: union([string(), array(string())]),
- model: string().min(1)
+ model: string().min(1),
+ dimensions: nonNegativeIntegerSchema.optional(),
+ encoding_format: _enum(["float", "base64"]).optional(),
+ user: string().min(1).optional()
  }).loose();
  function parseEmbeddingRequest(payload) {
  return parsePayload(embeddingRequestSchema, "openai.embeddings", payload);
@@ -48895,7 +48918,8 @@ async function handleCompletionCore({ body, signal, headers }) {
  //#endregion
  //#region src/routes/chat-completions/route.ts
  function createCompletionRoutes() {
- return new Elysia().use(requestGuardPlugin).post("/chat/completions", async function* ({ body, request }) {
+ return new Elysia().use(requestGuardPlugin).post("/chat/completions", async function* ({ body, request, server }) {
+ if (hasStreamingFlag(body)) disableIdleTimeout(server, request);
  const { result, modelMapping } = await handleCompletionCore({
  body,
  signal: request.signal,
@@ -48909,12 +48933,18 @@ function createCompletionRoutes() {

  //#endregion
  //#region src/routes/embeddings/handler.ts
+ function normalizeEmbeddingRequest(payload) {
+ return {
+ ...payload,
+ input: typeof payload.input === "string" ? [payload.input] : payload.input
+ };
+ }
  /**
  * Core handler for creating embeddings.
  */
  async function handleEmbeddingsCore(body) {
  const payload = parseEmbeddingRequest(body);
- return await createCopilotClient().createEmbeddings(payload);
+ return await createCopilotClient().createEmbeddings(normalizeEmbeddingRequest(payload));
  }

  //#endregion
@@ -48942,10 +48972,16 @@ function createAnthropicAdapter() {

  //#endregion
  //#region src/routes/messages/count-tokens-handler.ts
- const CLAUDE_TOOL_OVERHEAD_TOKENS = 346;
- const GROK_TOOL_OVERHEAD_TOKENS = 480;
- const CLAUDE_ESTIMATION_FACTOR = 1.15;
- const GROK_ESTIMATION_FACTOR = 1.03;
+ const TOOL_OVERHEAD_TOKENS = {
+ claude: 346,
+ grok: 480,
+ gpt: 346
+ };
+ const ESTIMATION_FACTOR = {
+ claude: 1.15,
+ grok: 1.03,
+ gpt: 1.1
+ };
  /**
  * Core handler for counting tokens.
  */
@@ -48970,13 +49006,13 @@ async function handleCountTokensCore({ body, headers }) {
  let mcpToolExist = false;
  if (anthropicBeta?.startsWith("claude-code")) mcpToolExist = anthropicPayload.tools.some((tool) => tool.name.startsWith("mcp__"));
  if (!mcpToolExist) {
- if (anthropicPayload.model.startsWith("claude")) tokenCount.input = tokenCount.input + CLAUDE_TOOL_OVERHEAD_TOKENS;
- else if (anthropicPayload.model.startsWith("grok")) tokenCount.input = tokenCount.input + GROK_TOOL_OVERHEAD_TOKENS;
+ const overhead = TOOL_OVERHEAD_TOKENS[inferModelFamily(anthropicPayload.model)];
+ if (overhead) tokenCount.input = tokenCount.input + overhead;
  }
  }
  let finalTokenCount = tokenCount.input + tokenCount.output;
- if (anthropicPayload.model.startsWith("claude")) finalTokenCount = Math.round(finalTokenCount * CLAUDE_ESTIMATION_FACTOR);
- else if (anthropicPayload.model.startsWith("grok")) finalTokenCount = Math.round(finalTokenCount * GROK_ESTIMATION_FACTOR);
+ const factor = ESTIMATION_FACTOR[inferModelFamily(anthropicPayload.model)];
+ if (factor) finalTokenCount = Math.round(finalTokenCount * factor);
  consola.info("Token count:", finalTokenCount);
  return { input_tokens: finalTokenCount };
  }
@@ -49174,12 +49210,15 @@ function translateAssistantMessage(message) {
  continue;
  }
  if (SignatureCodec.isReasoningSignature(block.signature)) {
- flushPendingContent(pendingContent, items, {
- role: "assistant",
- phase: assistantPhase
- });
- items.push(createReasoningContent(block));
- continue;
+ const { id } = SignatureCodec.decodeReasoning(block.signature);
+ if (id) {
+ flushPendingContent(pendingContent, items, {
+ role: "assistant",
+ phase: assistantPhase
+ });
+ items.push(createReasoningContent(block));
+ continue;
+ }
  }
  }
  const converted = translateAssistantContentBlock(block);
@@ -49817,6 +49856,7 @@ var ResponsesStreamTranslator = class {
  this.closeScalarBlock(`thinking:${rawEvent.output_index}`, events);
  }
  if (rawEvent.item.type === "function_call") {
+ if (this.state.functionCallStateByOutputIndex.get(rawEvent.output_index)?.closed) return events;
  const blockIndex = this.openFunctionCallBlock({
  outputIndex: rawEvent.output_index,
  toolCallId: rawEvent.item.call_id,
@@ -50318,7 +50358,8 @@ async function handleMessagesCore({ body, signal, headers }) {
  //#endregion
  //#region src/routes/messages/route.ts
  function createMessageRoutes() {
- return new Elysia().use(requestGuardPlugin).post("/messages", async function* ({ body, request }) {
+ return new Elysia().use(requestGuardPlugin).post("/messages", async function* ({ body, request, server }) {
+ if (hasStreamingFlag(body)) disableIdleTimeout(server, request);
  const { result, modelMapping } = await handleMessagesCore({
  body,
  signal: request.signal,
@@ -50580,7 +50621,8 @@ function parseBooleanParam(value) {
  //#endregion
  //#region src/routes/responses/route.ts
  function createResponsesRoutes() {
- return new Elysia().use(requestGuardPlugin).post("/responses", async function* ({ body, request }) {
+ return new Elysia().use(requestGuardPlugin).post("/responses", async function* ({ body, request, server }) {
+ if (hasStreamingFlag(body)) disableIdleTimeout(server, request);
  const { result, modelMapping } = await handleResponsesCore({
  body,
  signal: request.signal,
@@ -50602,7 +50644,8 @@ function createResponsesRoutes() {
  headers: request.headers,
  signal: request.signal
  });
- }).get("/responses/:responseId", async ({ params, request }) => {
+ }).get("/responses/:responseId", async ({ params, request, server }) => {
+ if (hasStreamingResponsesQuery(request)) disableIdleTimeout(server, request);
  return handleRetrieveResponseCore({
  params,
  url: request.url,
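To make the behavior of the new helpers concrete, here is a standalone restatement of `previewBody` and the family-keyed token estimation from the hunks above. The `previewBody` body and the factor table are copied from the diff; `inferModelFamily` here is a simplified stand-in, since the bundled implementation is not shown in this diff:

```typescript
// previewBody: copied from the 0.5.2 diff; caps error-log bodies at 500 chars.
function previewBody(text: string, maxLength = 500): string {
  return text.length > maxLength ? `${text.slice(0, maxLength)}…` : text;
}

// Factor table copied from the diff's ESTIMATION_FACTOR refactor.
const ESTIMATION_FACTOR: Record<string, number> = {
  claude: 1.15,
  grok: 1.03,
  gpt: 1.1,
};

// Simplified stand-in for the bundle's inferModelFamily (not shown in the diff).
function inferModelFamily(model: string): string {
  if (model.startsWith("claude")) return "claude";
  if (model.startsWith("grok")) return "grok";
  if (model.startsWith("gpt")) return "gpt";
  return "other";
}

function estimateTokens(model: string, rawCount: number): number {
  // Families without a factor (e.g. "other") keep the raw count, matching the
  // diff's `if (factor) finalTokenCount = Math.round(...)` guard.
  const factor = ESTIMATION_FACTOR[inferModelFamily(model)];
  return factor ? Math.round(rawCount * factor) : rawCount;
}

console.log(previewBody("a".repeat(600)).length); // 501: 500 chars plus ellipsis
console.log(estimateTokens("claude-sonnet-4.6", 1000)); // 1150
console.log(estimateTokens("unknown-model", 1000)); // 1000, no factor applied
```

This mirrors the key change in the count-tokens handler: the per-family `if/else` chains were replaced by table lookups keyed on model family, which also gives `gpt` models an overhead and estimation factor they previously lacked.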