@jeffreycao/copilot-api 1.10.9 → 1.10.11

Files changed (33)
  1. package/README.md +53 -213
  2. package/README.zh-CN.md +47 -209
  3. package/dist/auth-BO_SkMVw.js +116 -0
  4. package/dist/auth-BO_SkMVw.js.map +1 -0
  5. package/dist/{check-usage-BdXGp1Wr.js → check-usage-D-W6VD7k.js} +3 -4
  6. package/dist/{check-usage-BdXGp1Wr.js.map → check-usage-D-W6VD7k.js.map} +1 -1
  7. package/dist/{proxy-DvlF9a-7.js → config-ztdkLu9o.js} +83 -70
  8. package/dist/config-ztdkLu9o.js.map +1 -0
  9. package/dist/{debug-C_TBkyUw.js → debug-BVHmoCzY.js} +17 -7
  10. package/dist/debug-BVHmoCzY.js.map +1 -0
  11. package/dist/main.js +5 -5
  12. package/dist/{mcp-CTb-DbQH.js → mcp-DZgcvqQY.js} +2 -2
  13. package/dist/{mcp-CTb-DbQH.js.map → mcp-DZgcvqQY.js.map} +1 -1
  14. package/dist/{server-FPXzFkg9.js → server-2tRe3sDu.js} +1798 -1713
  15. package/dist/server-2tRe3sDu.js.map +1 -0
  16. package/dist/{start-CbKg_0bY.js → start-CM-b3DRX.js} +4 -6
  17. package/dist/{start-CbKg_0bY.js.map → start-CM-b3DRX.js.map} +1 -1
  18. package/dist/token-BVXHiYEl.js +1875 -0
  19. package/dist/token-BVXHiYEl.js.map +1 -0
  20. package/dist/{tool-search-D3SN0jX-.js → tool-search-wA-fLduL.js} +1 -1
  21. package/dist/{tool-search-D3SN0jX-.js.map → tool-search-wA-fLduL.js.map} +1 -1
  22. package/package.json +2 -2
  23. package/dist/auth-BHa2OHXf.js +0 -45
  24. package/dist/auth-BHa2OHXf.js.map +0 -1
  25. package/dist/debug-C_TBkyUw.js.map +0 -1
  26. package/dist/paths-DC-mqCY3.js +0 -30
  27. package/dist/paths-DC-mqCY3.js.map +0 -1
  28. package/dist/proxy-DvlF9a-7.js.map +0 -1
  29. package/dist/server-FPXzFkg9.js.map +0 -1
  30. package/dist/token-Dj8XsAxn.js +0 -170
  31. package/dist/token-Dj8XsAxn.js.map +0 -1
  32. package/dist/utils-jHLgqAq2.js +0 -657
  33. package/dist/utils-jHLgqAq2.js.map +0 -1
package/README.md CHANGED
@@ -2,25 +2,6 @@
2
2
 
3
3
  English | [简体中文](./README.zh-CN.md)
4
4
 
5
- > [!WARNING]
6
- > This is a reverse-engineered proxy of GitHub Copilot API. It is not supported by GitHub, and may break unexpectedly. Use at your own risk. In the current version, if not using opencode OAuth, the device ID and machine ID will be sent to GitHub Copilot. It is not recommended to use a large number of accounts on a single device; if necessary, it is advised to run them in Docker containers.
7
-
8
- > [!WARNING]
9
- > **GitHub Security Notice:**
10
- > Excessive automated or scripted use of Copilot (including rapid or bulk requests, such as via automated tools) may trigger GitHub's abuse-detection systems.
11
- > You may receive a warning from GitHub Security, and further anomalous activity could result in temporary suspension of your Copilot access.
12
- >
13
- > GitHub prohibits use of their servers for excessive automated bulk activity or any activity that places undue burden on their infrastructure.
14
- >
15
- > Please review:
16
- >
17
- > - [GitHub Acceptable Use Policies](https://docs.github.com/site-policy/acceptable-use-policies/github-acceptable-use-policies#4-spam-and-inauthentic-activity-on-github)
18
- > - [GitHub Copilot Terms](https://docs.github.com/site-policy/github-terms/github-terms-for-additional-products-and-features#github-copilot)
19
- >
20
- > Use this proxy responsibly to avoid account restrictions.
21
-
22
- ---
23
-
24
5
  ## Important Notes
25
6
 
26
7
  > [!IMPORTANT]
@@ -28,98 +9,35 @@ English | [简体中文](./README.zh-CN.md)
28
9
  >
29
10
  > 1. **Claude Code configuration:** When using with Claude Code, please configure the model ID as `claude-opus-4-6` or `claude-opus-4.6` (without the `[1m]` suffix, exceeding GitHub Copilot's context window limit too much may lead to being banned). Example claude `settings.json` see [Manual Configuration with `settings.json`](#manual-configuration-with-settingsjson).
30
11
  >
31
- > 2. **Recommend for Opencode:** When using with opencode, we recommend starting with the opencode OAuth app. This approach behaves identically to opencode's built-in GitHub Copilot provider with no Terms of Service risk:
12
+ > 2. **Recommend for Opencode:** For opencode, prefer the opencode OAuth app. It matches opencode's built-in GitHub Copilot provider and avoids Terms of Service risk:
32
13
  > ```sh
33
14
  > npx @jeffreycao/copilot-api@latest --oauth-app=opencode start
34
15
  > ```
35
16
  >
36
- > 3. **Disable multi agent when using codex:** If you're using codex via GitHub Copilot, it's recommended to disable the multi agent feature. Currently, GitHub Copilot charges based on the last message being a user role when using codex, and the billing logic has not been adjusted.
17
+ > 3. **Built-in `codex` provider:** Run `npx @jeffreycao/copilot-api@latest auth login --provider codex` once and the gateway will persist and refresh Codex OAuth credentials automatically.
18
+ >
19
+ > 4. **Disable multi agent when using codex:** If you're using codex via GitHub Copilot, disable multi agent. Copilot currently charges codex traffic based on whether the last message is a user role, and that billing logic has not been adjusted.
20
+ >
21
+ > 5. **Note:** See [GitHub Copilot Security Notice](./NOTICE.md#github-copilot-security-notice) for the warning removed from the README header.
37
22
 
38
23
  ---
39
24
 
40
25
  ## Project Overview
41
26
 
42
- A reverse-engineered proxy for the GitHub Copilot API that exposes it as an OpenAI and Anthropic compatible service. This allows you to use GitHub Copilot with any tool that supports the OpenAI Chat Completions / Responses API or the Anthropic Messages API, including to power [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview).
27
+ A reverse-engineered GitHub Copilot integration that also works as a small AI gateway. Besides Copilot, it can route the built-in `codex` provider and configured third-party providers such as DashScope behind OpenAI- and Anthropic-compatible APIs, so tools like [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) can use one local endpoint.
43
28
 
44
- Compared with routing everything through plain Chat Completions compatibility, this proxy can prefer Copilot's native Anthropic-style Messages API for Claude-family models, preserve more native thinking/tool semantics, reduce unnecessary Premium request consumption on warmup or resumed tool turns, and expose phase-aware `gpt-5.4` / `gpt-5.3-codex` responses that are easier for users to follow.
29
+ On the GitHub Copilot path, the gateway prefers Copilot's native Anthropic-style Messages API when available, preserving more Claude-native behavior for tool-heavy workflows.
45
30
 
46
31
  ## Features
47
32
 
48
- - **OpenAI & Anthropic Compatibility**: Exposes GitHub Copilot as an OpenAI-compatible (`/v1/responses`, `/v1/chat/completions`, `/v1/models`, `/v1/embeddings`) and Anthropic-compatible (`/v1/messages`) API.
49
- - **Anthropic-First Routing for Claude Models**: When a model supports Copilot's native `/v1/messages` endpoint, the proxy prefers it over `/responses` or `/chat/completions`, preserving Anthropic-style `tool_use` / `tool_result` flows and more Claude-native behavior.
50
- - **Fewer Unnecessary Premium Requests**: Reduces wasted premium usage by routing warmup requests to `smallModel`, merging `tool_result` follow-ups back into the tool flow, and treating resumed tool turns as continuation traffic instead of fresh premium interactions.
51
- - **Phase-Aware `gpt-5.4` and `gpt-5.3-codex`**: These models can emit user-friendly commentary before deeper reasoning or tool use, so long-running coding actions are easier to understand instead of appearing as a sudden tool burst.
52
- - **Claude Native Beta Support**: On the Messages API path, supports Anthropic-native capabilities such as `interleaved-thinking`, `advanced-tool-use`, and `context-management`, which are difficult or unavailable through plain Chat Completions compatibility.
53
- - **Subagent Marker Integration**: Claude Code and opencode plugins can inject `__SUBAGENT_MARKER__...` and propagate `x-session-id` so subagent traffic keeps the correct root session and agent/user semantics.
54
- - **OpenCode via `@ai-sdk/anthropic`**: Point OpenCode at this proxy as an Anthropic provider so Anthropic Messages semantics, premium-request optimizations, and Claude-native behavior are preserved end to end.
55
- - **Claude Code Integration**: Easily configure and launch [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) to use Copilot as its backend with a simple command-line flag (`--claude-code`).
56
- - **Usage Dashboard**: A web-based dashboard to monitor your Copilot API usage, view quotas, and see detailed statistics.
57
- - **Rate Limit Control**: Manage API usage with rate-limiting options (`--rate-limit`) and a waiting mechanism (`--wait`) to prevent errors from rapid requests.
58
- - **Manual Request Approval**: Manually approve or deny each API request for fine-grained control over usage (`--manual`).
59
- - **Token Visibility**: Option to display GitHub and Copilot tokens during authentication and refresh for debugging (`--show-token`).
60
- - **Flexible Authentication**: Authenticate interactively or provide a GitHub token directly, suitable for CI/CD environments.
61
- - **Support for Different Account Types**: Works with individual, business, and enterprise GitHub Copilot plans.
62
- - **Opencode OAuth Support**: Use opencode GitHub Copilot authentication by setting `COPILOT_API_OAUTH_APP=opencode` environment variable or using `--oauth-app=opencode` command line option.
63
- - **GitHub Enterprise Support**: Connect to GHE.com by setting `COPILOT_API_ENTERPRISE_URL` environment variable (e.g., `company.ghe.com`) or using `--enterprise-url=company.ghe.com` command line option.
64
- - **Custom Data Directory**: Change the default data directory (where tokens and config are stored) by setting `COPILOT_API_HOME` environment variable or using `--api-home=/path/to/dir` command line option.
65
- - **Multi-Provider Messages Proxy Routes**: Add global provider configs and call external Anthropic-compatible or OpenAI-compatible APIs via `/:provider/v1/messages` and `/:provider/v1/models`, or send `model: "provider/model"` to the top-level `/v1/messages` API.
66
- - **Accurate Claude Token Counting**: Optionally forward `/v1/messages/count_tokens` requests for Claude models to Anthropic's free token counting endpoint for exact counts instead of GPT tokenizer estimation.
67
- - **GPT Context Management**: Configurable context compaction for long-running GPT conversations via `responsesApiContextManagementModels`, reducing unnecessary premium requests when approaching token limits. See [Configuration](#configuration-configjson) for details.
68
-
69
- ## Better Agent Semantics
70
-
71
- ### Native Anthropic Messages API when available
72
-
73
- For models that advertise Copilot support for `/v1/messages`, this project sends the request to the native Messages API first and only falls back to `/responses` or `/chat/completions` when needed.
74
-
75
- Compared with using Claude-family models only through Chat Completions compatibility, the Messages API path keeps more Anthropic-native behavior, including support for:
76
-
77
- - `interleaved-thinking-2025-05-14`
78
- - `advanced-tool-use-2025-11-20`
79
- - `context-management-2025-06-27`
80
-
81
- Supported `anthropic-beta` values are filtered and forwarded on the native Messages path, and `interleaved-thinking` is added automatically when a thinking budget is requested for non-adaptive extended thinking.
82
-
83
- ### Fewer unnecessary Premium requests
84
-
85
- The proxy includes request-accounting safeguards designed for tool-heavy coding workflows:
86
-
87
- - tool-less warmup or probe requests can be forced onto `smallModel` so background checks do not spend premium usage;
88
- - mixed `tool_result` + reminder text blocks are merged back into the `tool_result` flow instead of being counted like fresh user turns;
89
- - `x-initiator` is derived from the latest message or item, not stale assistant history.
90
-
91
- This helps resumed tool turns continue the existing workflow instead of consuming an extra Premium request as a brand-new interaction.
92
-
93
- ### Phase-aware `gpt-5.4` and `gpt-5.3-codex`
94
-
95
- By default, the built-in `extraPrompts` for `gpt-5.4` and `gpt-5.3-codex` enable intermediary-update behavior, and the proxy translates assistant turns into `phase: "commentary"` before tool calls and `phase: "final_answer"` for the final response.
96
-
97
- That gives clients a short, user-friendly explanation of what the model is about to do before deeper reasoning or tool execution begins.
98
-
99
- ### Subagent marker integration
100
-
101
- For subagent-based clients, this project can preserve root session context and correctly classify subagent-originated traffic.
102
-
103
- The marker flow uses `__SUBAGENT_MARKER__...` inside a `<system-reminder>` block together with root `x-session-id` propagation. When a marker is detected, the proxy can keep the parent session identity, infer `x-initiator: agent`, and tag the interaction as subagent traffic instead of a fresh top-level request.
104
-
105
- Plugin integrations are included for both Claude Code and opencode; see [Plugin Integrations](#plugin-integrations) below for setup details.
106
-
107
- ### Accurate Claude token counting
108
-
109
- By default, `/v1/messages/count_tokens` estimates Claude token counts using the GPT `o200k_base` tokenizer with a 1.15x multiplier. This consistently underestimates actual Claude token usage, which can cause tools like Claude Code to compact too late and hit "prompt token count exceeds limit" errors.
110
-
111
- When an Anthropic API key is configured, the proxy forwards Claude model token counting requests to [Anthropic's real `/v1/messages/count_tokens` endpoint](https://docs.anthropic.com/en/docs/build-with-claude/token-counting) instead. This returns exact counts and eliminates the estimation mismatch. Non-Claude models and failures fall back to the GPT tokenizer estimation automatically.
112
-
113
- **Setup:**
114
-
115
- 1. Create an Anthropic API account at [console.anthropic.com](https://console.anthropic.com) and add a minimum $5 credit balance (required to activate the API key, but the token counting endpoint itself is free)
116
- 2. Create an API key from Settings > API Keys
117
- 3. Configure the key via **one** of:
118
- - `config.json`: set `"anthropicApiKey": "sk-ant-..."`
119
- - Environment variable: `ANTHROPIC_API_KEY=sk-ant-...`
120
-
121
- > [!NOTE]
122
- > Anthropic's `/v1/messages/count_tokens` endpoint is **free** (no per-token cost). It is rate-limited to 100 RPM at Tier 1. The $5 credit purchase is only needed to activate API access — the token counting calls themselves cost nothing.
33
+ - **OpenAI and Anthropic compatibility**: Serve `/v1/responses`, `/v1/chat/completions`, `/v1/models`, `/v1/embeddings`, and `/v1/messages` from one local gateway.
34
+ - **One gateway for Copilot, `codex`, and external providers**: Route GitHub Copilot, the built-in `codex` provider, and configured third-party providers behind the same endpoint.
35
+ - **Agent-friendly Claude handling on Copilot**: Prefer native `/v1/messages` when available, preserve Claude-style tool flows, support Anthropic beta features, and keep subagent/session markers intact.
36
+ - **Claude Code and OpenCode integration**: Works with Claude Code and OpenCode, including direct Anthropic-compatible usage through `@ai-sdk/anthropic`.
37
+ - **Flexible auth and deployment options**: Supports interactive login or direct tokens, individual/business/enterprise plans, GitHub Enterprise, opencode OAuth, and custom data directories.
38
+ - **Local control and visibility**: Includes a usage dashboard, rate limiting, manual approval, and optional token visibility for debugging.
39
+ - **Multi-provider routing**: Expose provider-specific `/:provider/...` routes or use `model: "provider/model"` on the top-level API.
40
+ - **Better token and context management**: Supports exact Claude token counting and configurable GPT context compaction for long-running conversations.
123
41
 
124
42
  ## Prerequisites
125
43
 
@@ -189,63 +107,27 @@ Main dashboard, token usage breakdown in the bundled Electron app:
189
107
 
190
108
  ## Using with Docker
191
109
 
192
- Build image
110
+ Build the image:
193
111
 
194
112
  ```sh
195
113
  docker build -t copilot-api .
196
114
  ```
197
115
 
198
- Run the container
116
+ Run the container with a bind mount so auth data survives restarts:
199
117
 
200
118
  ```sh
201
- # Create a directory on your host to persist the GitHub token and related data
202
119
  mkdir -p ./copilot-data
203
-
204
- # Run the container with a bind mount to persist the token
205
- # This ensures your authentication survives container restarts
206
-
207
120
  docker run -p 4141:4141 -v $(pwd)/copilot-data:/root/.local/share/copilot-api copilot-api
208
121
  ```
209
122
 
210
- > **Note:**
211
- > The GitHub token and related data will be stored in `copilot-data` on your host. This is mapped to `/root/.local/share/copilot-api` inside the container, ensuring persistence across restarts.
123
+ This stores GitHub auth data in `./copilot-data` on the host, mapped to `/root/.local/share/copilot-api` in the container.
212
124
 
213
- ### Docker with Environment Variables
214
-
215
- You can pass the GitHub token directly to the container using environment variables:
125
+ Or pass a GitHub token directly:
216
126
 
217
127
  ```sh
218
- # Build with GitHub token
219
- docker build --build-arg GH_TOKEN=your_github_token_here -t copilot-api .
220
-
221
- # Run with GitHub token
222
128
  docker run -p 4141:4141 -e GH_TOKEN=your_github_token_here copilot-api
223
-
224
- # Run with additional options
225
- docker run -p 4141:4141 -e GH_TOKEN=your_token copilot-api start --verbose --port 4141
226
- ```
227
-
228
- ### Docker Compose Example
229
-
230
- ```yaml
231
- version: "3.8"
232
- services:
233
- copilot-api:
234
- build: .
235
- ports:
236
- - "4141:4141"
237
- environment:
238
- - GH_TOKEN=your_github_token_here
239
- restart: unless-stopped
240
129
  ```
241
130
 
242
- The Docker image includes:
243
-
244
- - Multi-stage build for optimized image size
245
- - Non-root user for enhanced security
246
- - Health check for container monitoring
247
- - Pinned base image version for reproducible builds
248
-
249
131
  ## Command Structure
250
132
 
251
133
  Copilot API now uses a subcommand structure with these main commands:
@@ -372,11 +254,11 @@ The following command line options are available for the `start` command:
372
254
  - **auth.adminApiKey:** Single admin key used only for `/admin/*` routes. If missing, the server generates a random key at startup and writes it back to `config.json`. Requests use the same `x-api-key` or `Authorization: Bearer` headers, but regular `auth.apiKeys` never grant access to `/admin/*`.
373
255
  - **modelMappings:** Exact `sourceModel -> targetModel` rewrites for top-level `POST /v1/messages` and `POST /v1/messages/count_tokens` requests. Omit it or leave it as `{}` to disable rewrites. Both the source and target must be non-empty strings. Targets can be regular model IDs or `provider/model` aliases such as `dashscope/qwen3.6-plus`, and the rewrite happens before provider alias parsing. The admin endpoints `GET/POST /admin/config/model-mappings` read and update only this field.
374
256
  - **extraPrompts:** Map of `model -> prompt` appended to the first system prompt when translating Anthropic-style requests to Copilot. Use this to inject guardrails or guidance per model. Missing default entries are auto-added without overwriting your custom prompts. The built-in prompts for `gpt-5.3-codex` and `gpt-5.4` enable phase-aware commentary, which lets the model emit a short user-facing progress update before tools or deeper reasoning.
375
- - **providers:** Global upstream provider map. Each provider key (for example `custom`) becomes a route prefix (`/custom/v1/messages`). Supports `type: "anthropic"` and `type: "openai-compatible"`. Top-level Anthropic clients can also use `model: "custom/model-id"` with `/v1/messages` and `/v1/messages/count_tokens`; the proxy strips the `custom/` prefix before forwarding upstream. `GET /v1/models` does not aggregate provider models; use `GET /custom/v1/models` for provider model lists.
257
+ - **providers:** Global upstream provider map. Each provider key (for example `dashscope`) becomes a route prefix (`/dashscope/v1/messages`). Supports `type: "anthropic"`, `type: "openai-compatible"`, and `type: "openai-responses"`. Top-level clients can also use `model: "dashscope/model-id"` with `/v1/messages`, `/v1/messages/count_tokens`, and `/v1/responses`; the gateway strips the `dashscope/` prefix before forwarding upstream. `GET /v1/models` does not aggregate provider models; use `GET /dashscope/v1/models` for provider model lists.
376
258
  - `enabled` defaults to `true` if omitted.
377
- - `baseUrl` should be provider API base URL without the final endpoint. For Anthropic providers, omit `/v1/messages`; for OpenAI-compatible providers, omit `/v1/chat/completions`.
378
- - `apiKey` is used as the upstream credential value.
379
- - `authType` (optional): Controls how `apiKey` is sent upstream. Supports `x-api-key` and `authorization`. Anthropic providers default to `x-api-key`; OpenAI-compatible providers default to `authorization`. When set to `authorization`, the proxy sends `Authorization: Bearer <apiKey>`.
259
+ - `baseUrl` should be provider API base URL without the final endpoint. For Anthropic providers, omit `/v1/messages`; for OpenAI-compatible providers, omit `/v1/chat/completions`; for OpenAI Responses providers, omit `/v1/responses`.
260
+ - `apiKey` is used as the upstream credential value and is required for regular providers.
261
+ - `authType` (optional): Controls how `apiKey` is sent upstream. Supports `x-api-key` and `authorization` for regular providers. Anthropic providers default to `x-api-key`; OpenAI-compatible and OpenAI Responses providers default to `authorization`. When set to `authorization`, the proxy sends `Authorization: Bearer <apiKey>`. `oauth2` is reserved for the built-in `codex` provider and is written automatically by `auth login --provider codex`.
380
262
  - `adjustInputTokens` (optional): When `true`, the proxy will adjust the `input_tokens` in the usage response by subtracting `cache_read_input_tokens` and `cache_creation_input_tokens`.
381
263
  - `models` (optional): Per-model configuration map. Each key is a model ID (matching the model name in requests), and the value is:
382
264
  - `temperature` (optional): Default temperature value used when the request does not specify one.
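
Reviewer sketch (not part of the package): the `providers` entry above describes three behaviours that are easier to see together: `provider/model` alias parsing, appending the per-type endpoint that `baseUrl` leaves out, and the default `authType` for each provider type. The TypeScript below is a minimal illustration under stated assumptions. `resolveProviderTarget`, `ProviderConfig`, and `UpstreamTarget` are hypothetical names, and the trailing-slash handling and the "unknown prefix means not an alias" rule are guesses. The built-in `codex` provider and its `oauth2` auth are left out.

```typescript
// Hypothetical types; only the config keys (type, baseUrl, apiKey, authType,
// enabled) are taken from the README text in this diff.
type ProviderType = "anthropic" | "openai-compatible" | "openai-responses";

interface ProviderConfig {
  type: ProviderType;
  baseUrl: string;
  apiKey: string;
  authType?: "x-api-key" | "authorization";
  enabled?: boolean;
}

interface UpstreamTarget {
  url: string;
  model: string;
  headers: Record<string, string>;
}

// Endpoint suffix that the README says to omit from `baseUrl` for each type.
const ENDPOINT_BY_TYPE: Record<ProviderType, string> = {
  anthropic: "/v1/messages",
  "openai-compatible": "/v1/chat/completions",
  "openai-responses": "/v1/responses",
};

function resolveProviderTarget(
  model: string,
  providers: Record<string, ProviderConfig>,
): UpstreamTarget | undefined {
  const slash = model.indexOf("/");
  if (slash <= 0) return undefined; // no "provider/" prefix

  const key = model.slice(0, slash);
  if (!Object.prototype.hasOwnProperty.call(providers, key)) return undefined;
  const provider = providers[key];
  // `enabled` defaults to true when omitted.
  if (provider.enabled === false) return undefined;

  // Anthropic providers default to x-api-key; the other two default to authorization.
  const authType =
    provider.authType ??
    (provider.type === "anthropic" ? "x-api-key" : "authorization");
  const headers: Record<string, string> =
    authType === "x-api-key"
      ? { "x-api-key": provider.apiKey }
      : { Authorization: `Bearer ${provider.apiKey}` };

  return {
    // Assumption: trailing slashes on baseUrl are trimmed before joining.
    url: provider.baseUrl.replace(/\/+$/, "") + ENDPOINT_BY_TYPE[provider.type],
    // The alias prefix is stripped before the request goes upstream.
    model: model.slice(slash + 1),
    headers,
  };
}
```

With a `dashscope` entry of type `openai-compatible`, `resolveProviderTarget("dashscope/qwen3.6-plus", providers)` would yield the model `qwen3.6-plus`, a URL ending in `/v1/chat/completions`, and an `Authorization: Bearer` header. A plain model ID such as `gpt-5-mini` returns `undefined` and would stay on the Copilot path.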
@@ -386,14 +268,14 @@ The following command line options are available for the `start` command:
386
268
  - `contextCache` (optional): Defaults to `true` for OpenAI-compatible providers. This enables Alibaba Cloud Model Studio/DashScope explicit context cache by injecting `cache_control: { "type": "ephemeral" }` on up to 4 content blocks using the Context Cache format. The cache breakpoint strategy matches opencode's main provider flow: the first 2 system messages plus the last 2 non-system messages. Marked string content is converted to text content part arrays for `system` / `user` / `assistant` / `tool` messages; existing array content is marked on the last part. Set this to `false` when the model already supports implicit caching, or when the upstream does not accept this explicit-cache extension field.
387
269
  - `supportPdf` (optional): Controls whether the model supports PDF/document content. Defaults to `false`; unsupported PDFs are converted to a text notice. Set it to `true` to send PDF/document blocks as OpenAI Chat Completions file parts.
388
270
  - `toolContentSupportType` (optional): Tool result content capabilities for that model, as an array of `array`, `image`, and `pdf`. Provider routes default to string-only tool content when omitted. If `supportPdf` is `true` but this list does not include `pdf`, file parts in tool results are moved to user role messages. This provider default does not change the Copilot main flow, which continues to support array + image and not PDF.
389
- - **smallModel:** Fallback model used for tool-less warmup messages (e.g., Claude Code probe requests) to avoid spending premium requests; defaults to gpt-5-mini.
390
- - **responsesApiContextManagementModels:** List of GPT model IDs that should receive Responses API `context_management` compaction instructions. This defaults to `[]`, so you need to opt in explicitly. A good starting point is `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. The actual compaction is handled server-side and appears to begin when usage approaches roughly 90% of the model's `maxPromptTokens`, which makes it especially useful for long-running tasks without consuming additional premium requests. In practice, the effective `compact_threshold` also appears to be fixed on the server side, so changing it in this project does not currently alter compaction behavior. At the moment, this optimization is intended for GPT-family models only.
271
+ - **smallModel:** Fallback model used for tool-less warmup messages (e.g., Claude Code probe requests); defaults to gpt-5-mini.
272
+ - **responsesApiContextManagementModels:** List of GPT model IDs that should receive Responses API `context_management` compaction instructions. This defaults to `[]`, so you need to opt in explicitly. A good starting point is `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. The actual compaction is handled server-side and appears to begin when usage approaches roughly 90% of the model's `maxPromptTokens`, which makes it especially useful for long-running tasks. In practice, the effective `compact_threshold` also appears to be fixed on the server side, so changing it in this project does not currently alter compaction behavior. At the moment, this optimization is intended for GPT-family models only.
391
273
  - **modelReasoningEfforts:** Per-model `reasoning.effort` sent to the Copilot Responses API. Allowed values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. If a model isn’t listed, `high` is used by default.
392
274
  - **useMessagesApi:** When `true`, Claude-family models that support Copilot's native `/v1/messages` endpoint will use the Messages API; otherwise they fall back to `/chat/completions`. Set to `false` to disable Messages API routing and always use `/chat/completions`. Defaults to `true`.
393
275
  - **useResponsesApiWebSocket:** When `true`, Responses API requests use Copilot's websocket transport for models that advertise `ws:/responses`; models that only advertise `/responses` continue to use HTTP. Set to `false` to disable websocket routing and use HTTP `/responses` whenever the selected model supports it. Defaults to `true`.
394
276
  - **useResponsesApiWebSearch:** When `true`, the server keeps Responses API tools with `type: "web_search"` and forwards them upstream. Set to `false` to strip those tools from `/responses` payloads. Defaults to `true`.
395
277
  - **claudeTokenMultiplier:** Multiplier applied to the fallback GPT-tokenizer estimate for Claude `/v1/messages/count_tokens` requests. Defaults to `1.15`. Increase it if your client is still compacting too late. This setting is only used when the proxy is estimating Claude tokens locally; if `anthropicApiKey` is configured and Anthropic token counting succeeds, the exact Anthropic count is returned instead.
396
- - **anthropicApiKey:** Anthropic API key used for accurate Claude token counting (see [Accurate Claude Token Counting](#accurate-claude-token-counting) below). Can also be set via the `ANTHROPIC_API_KEY` environment variable. If not set, token counting falls back to GPT tokenizer estimation.
278
+ - **anthropicApiKey:** Anthropic API key used to forward Claude `/v1/messages/count_tokens` requests to Anthropic's real token counting endpoint, which returns exact counts instead of GPT tokenizer estimates. Can also be set via the `ANTHROPIC_API_KEY` environment variable. If not set, or if the upstream call fails, token counting falls back to local GPT tokenizer estimation controlled by `claudeTokenMultiplier`.
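
Reviewer sketch (not part of the package): `claudeTokenMultiplier` and `anthropicApiKey` together describe a two-tier counting strategy: use Anthropic's exact count when a key is configured and the call succeeds, otherwise scale a local GPT-tokenizer count. The TypeScript below illustrates that fallback order only. `countClaudeTokens`, `estimateClaudeTokens`, and `ExactCounter` are hypothetical names, the GPT-tokenizer count is passed in as a plain number instead of being computed, and rounding with `Math.round` is an assumption.

```typescript
// Stand-in for a call to Anthropic's /v1/messages/count_tokens endpoint.
// Hypothetical: the real gateway's request code is not shown in this diff.
type ExactCounter = () => Promise<number>;

interface TokenCount {
  inputTokens: number;
  source: "anthropic" | "estimate";
}

// Local fallback: scale the GPT-tokenizer count by claudeTokenMultiplier
// (README default 1.15). The rounding mode is an assumption.
function estimateClaudeTokens(gptTokenCount: number, multiplier = 1.15): number {
  return Math.round(gptTokenCount * multiplier);
}

async function countClaudeTokens(
  gptTokenCount: number,
  options: { exactCounter?: ExactCounter; claudeTokenMultiplier?: number } = {},
): Promise<TokenCount> {
  if (options.exactCounter) {
    try {
      // Exact count wins whenever the upstream call succeeds.
      return { inputTokens: await options.exactCounter(), source: "anthropic" };
    } catch {
      // Upstream failure: fall through to the local estimate.
    }
  }
  return {
    inputTokens: estimateClaudeTokens(gptTokenCount, options.claudeTokenMultiplier),
    source: "estimate",
  };
}
```

Under these assumptions, a local count of 1000 GPT tokens is reported as 1150 Claude tokens when no key is configured. Raising the multiplier makes clients compact earlier, which is the tuning the `claudeTokenMultiplier` entry recommends.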
397
279
 
398
280
  Edit this file to customize prompts or swap in your own fast model. Restart the server (or rerun the command) after changes so the cached config is refreshed.
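
Reviewer sketch (not part of the package): the `contextCache` entry in this hunk describes a breakpoint strategy of "the first 2 system messages plus the last 2 non-system messages", with string content converted to a text part array and array content marked on its last part. The TypeScript below is one way to read that description. `markCacheBreakpoints`, `ChatMessage`, and `Part` are hypothetical names, and leaving messages with empty part arrays unmarked is an assumption.

```typescript
type CacheControl = { type: "ephemeral" };

interface Part {
  type: string;
  text?: string;
  cache_control?: CacheControl;
  [key: string]: unknown;
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | Part[];
}

const EPHEMERAL: CacheControl = { type: "ephemeral" };

// Returns a copy of `messages` with cache_control set on at most 4 messages:
// the first 2 system messages and the last 2 non-system messages.
function markCacheBreakpoints(messages: ChatMessage[]): ChatMessage[] {
  const systemIdx: number[] = [];
  const otherIdx: number[] = [];
  messages.forEach((m, i) => {
    (m.role === "system" ? systemIdx : otherIdx).push(i);
  });
  const marked = new Set<number>([
    ...systemIdx.slice(0, 2),
    ...otherIdx.slice(-2),
  ]);

  return messages.map((message, i) => {
    if (!marked.has(i)) return message;
    // String content becomes a single text part; arrays are shallow-copied.
    const parts: Part[] =
      typeof message.content === "string"
        ? [{ type: "text", text: message.content }]
        : message.content.map((p) => ({ ...p }));
    if (parts.length === 0) return message; // assumption: nothing to mark
    const last = parts[parts.length - 1];
    parts[parts.length - 1] = { ...last, cache_control: EPHEMERAL };
    return { ...message, content: parts };
  });
}
```

The function does not mutate its input, so the unmarked request can still be sent to an upstream that rejects this extension field, which matches the entry's advice to set `contextCache` to `false` for such providers.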
399
281
 
@@ -431,7 +313,7 @@ These endpoints mimic the OpenAI API structure.
431
313
 
432
314
  | Endpoint | Method | Description |
433
315
  | --------------------------- | ------ | ---------------------------------------------------------------- |
434
- | `POST /v1/responses` | `POST` | OpenAI Most advanced interface for generating model responses. |
316
+ | `POST /v1/responses` | `POST` | OpenAI Most advanced interface for generating model responses. Supports `provider/model` aliases for `openai-responses` providers. |
435
317
  | `POST /v1/chat/completions` | `POST` | Creates a model response for the given chat conversation. |
436
318
  | `GET /v1/models` | `GET` | Lists the currently available models. |
437
319
  | `POST /v1/embeddings` | `POST` | Creates an embedding vector representing the input text. |
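
Reviewer sketch (not part of the package): the updated `/v1/responses` row says the endpoint accepts `provider/model` aliases for `openai-responses` providers. The TypeScript below builds such a request without sending it. `buildResponsesRequest` is a hypothetical helper, `http://localhost:4141` is assumed from the port used in the Docker examples, the provider key `myresponses` in the usage note is a placeholder, and the `{ model, input }` body follows the public OpenAI Responses API shape.

```typescript
interface GatewayRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Builds (but does not send) a Responses API call against the local gateway.
// `apiKey` is optional and is sent as `x-api-key`, one of the two header
// forms the README mentions for gateway keys.
function buildResponsesRequest(
  model: string,
  input: string,
  apiKey?: string,
  baseUrl = "http://localhost:4141", // assumption: default port from the Docker examples
): GatewayRequest {
  const headers: Record<string, string> = {
    "content-type": "application/json",
  };
  if (apiKey) headers["x-api-key"] = apiKey;

  return {
    url: baseUrl.replace(/\/+$/, "") + "/v1/responses",
    init: {
      method: "POST",
      headers,
      // The alias stays in `model`; the gateway is described as stripping
      // the provider prefix before forwarding upstream.
      body: JSON.stringify({ model, input }),
    },
  };
}
```

A caller could then run `const req = buildResponsesRequest("myresponses/some-model", "Hello"); await fetch(req.url, req.init);`, assuming a provider named `myresponses` of type `openai-responses` exists in `config.json`.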
@@ -444,7 +326,7 @@ These endpoints are designed to be compatible with the Anthropic Messages API.
444
326
  | -------------------------------- | ------ | ------------------------------------------------------------ |
445
327
  | `POST /v1/messages` | `POST` | Creates a model response for a given conversation. Supports `provider/model` aliases for configured providers. |
446
328
  | `POST /v1/messages/count_tokens` | `POST` | Calculates the number of tokens for a given set of messages. Supports `provider/model` aliases for configured providers. |
447
- | `POST /:provider/v1/messages` | `POST` | Proxies Anthropic Messages requests to the configured Anthropic or OpenAI-compatible provider. |
329
+ | `POST /:provider/v1/messages` | `POST` | Proxies Anthropic Messages requests to the configured Anthropic, OpenAI-compatible, or OpenAI Responses provider. |
448
330
  | `GET /:provider/v1/models` | `GET` | Proxies model listing requests to the configured provider. |
449
331
  | `POST /:provider/v1/messages/count_tokens` | `POST` | Calculates tokens locally for provider route requests. |
450
332
 
@@ -468,75 +350,33 @@ These endpoints are reserved for local administrative actions and only accept `a
 
  ## Example Usage
 
- Using with npx:
+ Common `npx` commands:
 
  ```sh
- # Basic usage with start command
+ # Start the gateway
  npx @jeffreycao/copilot-api@latest start
 
- # Run on custom port with verbose logging
+ # Start on a custom port with verbose logging
  npx @jeffreycao/copilot-api@latest start --port 8080 --verbose
 
- # Use with a business plan GitHub account
- npx @jeffreycao/copilot-api@latest start --account-type business
-
- # Use with an enterprise plan GitHub account
- npx @jeffreycao/copilot-api@latest start --account-type enterprise
-
- # Enable manual approval for each request
- npx @jeffreycao/copilot-api@latest start --manual
+ # Run the auth flow
+ npx @jeffreycao/copilot-api@latest auth login
 
- # Set rate limit to 30 seconds between requests
- npx @jeffreycao/copilot-api@latest start --rate-limit 30
-
- # Wait instead of error when rate limit is hit
- npx @jeffreycao/copilot-api@latest start --rate-limit 30 --wait
-
- # Provide GitHub token directly
- npx @jeffreycao/copilot-api@latest start --github-token ghp_YOUR_TOKEN_HERE
-
- # Run only the auth flow
- npx @jeffreycao/copilot-api@latest auth
-
- # Run auth flow with verbose logging
- npx @jeffreycao/copilot-api@latest auth --verbose
-
- # Show your Copilot usage/quota in the terminal (no server needed)
+ # Check Copilot usage without starting the server
  npx @jeffreycao/copilot-api@latest check-usage
 
- # Display debug information for troubleshooting
- npx @jeffreycao/copilot-api@latest debug
-
- # Display debug information in JSON format
+ # Print debug information as JSON
  npx @jeffreycao/copilot-api@latest debug --json
 
- # Initialize proxy from environment variables (HTTP_PROXY, HTTPS_PROXY, etc.)
- npx @jeffreycao/copilot-api@latest start --proxy-env
-
- # Use opencode GitHub Copilot authentication
- COPILOT_API_OAUTH_APP=opencode npx @jeffreycao/copilot-api@latest start
-
- # Set custom API home directory via command line
- npx @jeffreycao/copilot-api@latest --api-home=/path/to/custom/dir start
-
- # Use GitHub Enterprise via command line
- npx @jeffreycao/copilot-api@latest --enterprise-url=company.ghe.com start
-
- # Use opencode OAuth via command line
- npx @jeffreycao/copilot-api@latest --oauth-app=opencode start
-
- # Combine multiple global options
- npx @jeffreycao/copilot-api@latest --api-home=/custom/path --oauth-app=opencode --enterprise-url=company.ghe.com start
-
  # Run the published CLI with Bun instead of Node.js
  bunx --bun @jeffreycao/copilot-api@latest start
  ```
 
  ## Using with Claude Code
 
- This proxy can be used to power [Claude Code](https://docs.anthropic.com/en/claude-code), an experimental conversational AI assistant for developers from Anthropic.
+ This AI gateway can be used to power [Claude Code](https://docs.anthropic.com/en/claude-code), an experimental conversational AI assistant for developers from Anthropic.
 
- There are two ways to configure Claude Code to use this proxy:
+ There are two ways to configure Claude Code to use this AI gateway:
 
  ### Interactive Setup with `--claude-code` flag
 
@@ -546,7 +386,7 @@ To get started, run the `start` command with the `--claude-code` flag:
  npx @jeffreycao/copilot-api@latest start --claude-code
  ```
 
- You will be prompted to select a primary model and a "small, fast" model for background tasks. After selecting the models, a command will be copied to your clipboard. This command sets the necessary environment variables for Claude Code to use the proxy.
+ You will be prompted to select a primary model and a "small, fast" model for background tasks. After selecting the models, a command will be copied to your clipboard. This command sets the necessary environment variables for Claude Code to use the gateway.
 
  Paste and run this command in a new terminal to launch Claude Code.
 
@@ -593,9 +433,9 @@ You can also read more about IDE integration here: [Add Claude Code to your IDE]
 
  ## GPT Tool Search
 
- For GPT Responses models such as `gpt-5.4+`, this proxy can expose Responses `tool_search` through a small MCP bridge. The same bridge can be used by Claude Code and opencode, as long as the client loads MCP servers and sends Anthropic Messages traffic through this proxy.
+ For GPT Responses models such as `gpt-5.4+`, this AI gateway can expose Responses `tool_search` through a small MCP bridge. The same bridge can be used by Claude Code and opencode, as long as the client loads MCP servers and sends Anthropic Messages traffic through this gateway.
 
- Do not set Claude Code's native `ENABLE_TOOL_SEARCH` for GPT models. That flag enables Claude Code's own client-side tool search mode, and it may stop forwarding deferred tool definitions. This proxy needs the full tool definitions so it can keep the small always-loaded tool set eager and translate every other tool into Responses deferred namespaces.
+ Do not set Claude Code's native `ENABLE_TOOL_SEARCH` for GPT models. That flag enables Claude Code's own client-side tool search mode, and it may stop forwarding deferred tool definitions. This gateway needs the full tool definitions so it can keep the small always-loaded tool set eager and translate every other tool into Responses deferred namespaces.
 
  If you install `tool-search@copilot-api-marketplace`, Claude Code receives this MCP bridge automatically and you can skip the manual Claude Code MCP setup below.
 
@@ -628,23 +468,23 @@ Add the tool search bridge to the MCP config used by opencode:
 
  For local development, use `bun` as the command and `["run", "./src/main.ts", "mcp"]` as the args.
 
- Internally, the proxy now configures OpenAI Responses `tool_search` in client-executed mode. Deferred tools are still exposed as searchable namespaces, but the model is explicitly asked to return the exact deferred tool names it wants to load next.
+ Internally, the gateway now configures OpenAI Responses `tool_search` in client-executed mode. Deferred tools are still exposed as searchable namespaces, but the model is explicitly asked to return the exact deferred tool names it wants to load next.
 
  The bridge uses direct tool selection, not query search. Its tool input is `names`, a comma-separated list of exact deferred tool names, for example `TaskList,TaskGet,mcp__fetch__fetch`.
 
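To make the `names` input format concrete, here is a small sketch of building and splitting a comma-separated list of exact tool names. The helper names `format_names` and `parse_names` are hypothetical, and the whitespace trimming is an assumption; the README does not show how the bridge itself parses the value.

```python
def format_names(tool_names):
    """Join exact deferred tool names into the comma-separated `names` string."""
    cleaned = [name.strip() for name in tool_names if name.strip()]
    for name in cleaned:
        if "," in name:
            # A comma inside a name would be ambiguous in a comma-separated list.
            raise ValueError(f"tool name must not contain a comma: {name!r}")
    return ",".join(cleaned)


def parse_names(names):
    """Split a `names` string back into a list (assumption: blanks are ignored)."""
    return [part.strip() for part in names.split(",") if part.strip()]


# Example names taken from the README text above.
names = format_names(["TaskList", "TaskGet", "mcp__fetch__fetch"])
print(names)
print(parse_names(names))
```

Because selection is by exact name rather than by query, a name that is not among the deferred tools would presumably match nothing, so callers may want to validate names against the known tool list first.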
  ## Using with OpenCode
 
- OpenCode already has a direct GitHub Copilot provider. Use this section when you want OpenCode to point at this proxy through `@ai-sdk/anthropic` and reuse the agent behaviors described earlier in this README.
+ OpenCode already has a direct GitHub Copilot provider. Use this section when you want OpenCode to point at this AI gateway through `@ai-sdk/anthropic` and reuse the agent behaviors described earlier in this README.
 
  ### Minimal setup
 
- Start the proxy with the OpenCode OAuth app:
+ Start the AI gateway with the OpenCode OAuth app:
 
  ```sh
  npx @jeffreycao/copilot-api@latest --oauth-app=opencode start
  ```
 
- Then point OpenCode at the proxy with `@ai-sdk/anthropic`.
+ Then point OpenCode at the gateway with `@ai-sdk/anthropic`.
 
  Example `~/.config/opencode/opencode.json`:
 
@@ -717,10 +557,10 @@ Example `~/.config/opencode/opencode.json`:
 
  Why these fields matter:
 
- - `npm: "@ai-sdk/anthropic"` is the important part. OpenCode will speak Anthropic Messages semantics to this proxy instead of flattening everything into OpenAI Chat Completions.
+ - `npm: "@ai-sdk/anthropic"` is the important part. OpenCode will speak Anthropic Messages semantics to this AI gateway instead of flattening everything into OpenAI Chat Completions.
  - `options.baseURL` should be `http://localhost:4141/v1`; the Anthropic SDK will append `/messages`, `/models`, and `/messages/count_tokens` automatically.
  - `model`, `small_model`, and `agent.*.model` let you keep `gpt-5.4` for build/plan work while routing exploration and background work to `gpt-5-mini`.
- - If you enable `auth.apiKeys` in this proxy, replace `dummy` with a real key. Otherwise any placeholder value is fine.
+ - If you enable `auth.apiKeys` in this AI gateway, replace `dummy` with a real key. Otherwise any placeholder value is fine.
 
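The `baseURL` point in the list above can be checked with a short sketch that appends the three suffixes to the base URL. The function name `anthropic_endpoints` is hypothetical, and plain string joining is an assumption about how the SDK composes paths; the point is only that the base should stop at `/v1`.

```python
def anthropic_endpoints(base_url):
    """Return the URLs that result from appending the SDK path suffixes to base_url."""
    base = base_url.rstrip("/")  # tolerate a trailing slash in the configured value
    suffixes = ("/messages", "/models", "/messages/count_tokens")
    return {suffix: base + suffix for suffix in suffixes}


# Base URL as given in the README text above.
for suffix, url in anthropic_endpoints("http://localhost:4141/v1").items():
    print(suffix, "->", url)
```

If the base were set to `http://localhost:4141` without `/v1`, the same joining would yield `/messages` at the root, which is not one of the routes listed in the endpoint table.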
  ## Plugin Integrations
 
@@ -730,11 +570,11 @@ Plugin integrations are available for Claude Code and opencode.
 
  The Claude Code integration is packaged as two plugins:
 
- - `agent-inject` injects `__SUBAGENT_MARKER__...` on `SubagentStart`, so this proxy can infer `x-initiator: agent`.
+ - `agent-inject` injects `__SUBAGENT_MARKER__...` on `SubagentStart`, so the gateway can infer `x-initiator: agent`.
  - `tool-search` registers the `tool_search` MCP bridge used for GPT Responses deferred tool loading.
 
  - Marketplace catalog in this repository: `.claude-plugin/marketplace.json`
- - Plugin sources in this repository: `claude-plugin/agent-inject`, `claude-plugin/tool-search`
+ - Plugin sources in this repository: `plugin/claude/agent-inject`, `plugin/claude/tool-search`
 
  Add the marketplace remotely:
 
@@ -749,7 +589,7 @@ Install the plugins from the marketplace:
  /plugin install tool-search@copilot-api-marketplace
  ```
 
- After installation, `agent-inject` injects `__SUBAGENT_MARKER__...` on `SubagentStart`, and this proxy uses it to infer `x-initiator: agent`.
+ After installation, `agent-inject` injects `__SUBAGENT_MARKER__...` on `SubagentStart`, and the gateway uses it to infer `x-initiator: agent`.
 
  The `agent-inject` plugin also registers a `UserPromptSubmit` hook that returns `{"continue": true}`, and it can inject `SessionStart` reminder rules through environment variables:
 
@@ -760,7 +600,7 @@ The `tool-search` plugin bundles the same MCP bridge described in [GPT Tool Sear
 
  #### Opencode plugin
 
- The subagent marker producer is packaged as an opencode plugin located at `.opencode/plugins/subagent-marker.js`.
+ The subagent marker producer is packaged as an opencode plugin located at `plugin/opencode/subagent-marker.js`.
 
  **Installation:**
 
@@ -768,7 +608,7 @@ Copy the plugin file to your opencode plugins directory:
 
  ```sh
  # Clone or download this repository, then copy the plugin
- cp .opencode/plugins/subagent-marker.js ~/.config/opencode/plugins/
+ cp plugin/opencode/subagent-marker.js ~/.config/opencode/plugins/
  ```
 
  Or manually create the file at `~/.config/opencode/plugins/subagent-marker.js` with the plugin content.
@@ -778,7 +618,7 @@ Or manually create the file at `~/.config/opencode/plugins/subagent-marker.js` w
  - Tracks sub-sessions created by subagents
  - Automatically prepends a marker system reminder (`__SUBAGENT_MARKER__...`) to subagent chat messages
  - Sets `x-session-id` header for session tracking
- - Enables this proxy to infer `x-initiator: agent` for subagent-originated requests
+ - Enables the gateway to infer `x-initiator: agent` for subagent-originated requests
 
  The plugin hooks into `session.created`, `session.deleted`, `chat.message`, and `chat.headers` events to provide seamless subagent marker functionality.
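The marker mechanism described above can be pictured with a rough sketch of the receiving side: scan the message texts for the marker prefix and pick an `x-initiator` value. This is not the gateway's actual code. The function name `infer_initiator`, the substring check, and the `user` fallback value are all assumptions; the README only states that the marker lets the gateway infer `x-initiator: agent`, and the payload that follows the prefix is not shown.

```python
# Prefix quoted in the README; whatever follows it is elided there and not modeled here.
MARKER_PREFIX = "__SUBAGENT_MARKER__"


def infer_initiator(message_texts):
    """Return "agent" if any message text carries the marker prefix, else "user".

    Assumption: "user" is the non-agent value; the README only mentions "agent".
    """
    for text in message_texts:
        if MARKER_PREFIX in text:
            return "agent"
    return "user"


# Hypothetical message contents, used only to exercise both branches.
print(infer_initiator(["__SUBAGENT_MARKER__example", "List the open tasks"]))
print(infer_initiator(["List the open tasks"]))
```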