ghc-proxy 0.5.2 → 0.5.3

package/README.md CHANGED
@@ -169,7 +169,11 @@ All fields are optional. The full schema:
  | `contextUpgrade` | `boolean` | `true` | Auto-upgrade to extended-context model variants (see [Context-1M Auto-Upgrade](#context-1m-auto-upgrade)) |
  | `contextUpgradeTokenThreshold` | `number` | `160000` | Token threshold for proactive context upgrade |
  | `useFunctionApplyPatch` | `boolean` | `true` | Rewrite `apply_patch` custom tool as function tool on Responses path |
- | `responsesApiContextManagementModels` | `string[]` | -- | Models that enable Responses context compaction |
+ | `responsesApiAutoCompactInput` | `boolean` | `false` | Automatically trim Responses `input` to the latest `compaction` item |
+ | `responsesApiAutoContextManagement` | `boolean` | `false` | Automatically inject Responses `context_management` for selected models |
+ | `responsesApiContextManagementModels` | `string[]` | -- | Models eligible for auto-injected Responses `context_management` |
+ | `responsesOfficialEmulator` | `boolean` | `false` | Enable local OpenAI-style Responses state emulation for `previous_response_id`, `conversation`, retrieve, input_items, delete, and input_tokens |
+ | `responsesOfficialEmulatorTtlSeconds` | `number` | `14400` | In-memory TTL for locally emulated Responses state |
  | `modelReasoningEfforts` | `Record<string, string>` | -- | Per-model reasoning effort defaults for Anthropic-to-Responses translation |
 
  Example:
@@ -188,7 +192,11 @@ Example:
  "contextUpgrade": true,
  "contextUpgradeTokenThreshold": 160000,
  "useFunctionApplyPatch": true,
+ "responsesApiAutoCompactInput": false,
+ "responsesApiAutoContextManagement": false,
  "responsesApiContextManagementModels": ["gpt-5", "gpt-5-mini"],
+ "responsesOfficialEmulator": false,
+ "responsesOfficialEmulatorTtlSeconds": 14400,
  "modelReasoningEfforts": {
  "gpt-5": "high",
  "gpt-5-mini": "medium"
@@ -315,10 +323,10 @@ Incoming requests hit an [Elysia](https://elysiajs.com/) server. `chat/completio
 
  - `POST /v1/chat/completions`: OpenAI Chat Completions -> shared planning pipeline -> Copilot `/chat/completions`
  - `POST /v1/responses`: OpenAI Responses create -> native Responses handler -> Copilot `/responses`
- - `POST /v1/responses/input_tokens`: Responses input-token counting passthrough when the upstream supports it
- - `GET /v1/responses/:responseId`: Responses retrieve passthrough when the upstream supports it
- - `GET /v1/responses/:responseId/input_items`: Responses input-items passthrough when the upstream supports it
- - `DELETE /v1/responses/:responseId`: Responses delete passthrough when the upstream supports it
+ - `POST /v1/responses/input_tokens`: Responses input-token counting passthrough by default, or local estimation in official emulator mode
+ - `GET /v1/responses/:responseId`: Responses retrieve passthrough by default, or local retrieval in official emulator mode
+ - `GET /v1/responses/:responseId/input_items`: Responses input-items passthrough by default, or local retrieval in official emulator mode
+ - `DELETE /v1/responses/:responseId`: Responses delete passthrough by default, or local deletion in official emulator mode
  - `POST /v1/messages`: Anthropic Messages -> choose the best available upstream path for the selected model:
  - native Copilot `/v1/messages` when supported
  - Anthropic -> Responses -> Anthropic translation when the model only supports `/responses`
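
The `/v1/messages` path selection in the hunk above can be sketched roughly as follows. This is an illustration only: the capability names are hypothetical, ghc-proxy's real model metadata is not shown in this diff, and any fallback paths beyond this hunk are elided.

```python
def pick_upstream_path(model_caps: set[str]) -> str:
    """Sketch of the /v1/messages upstream choice (hypothetical capability names)."""
    if "messages" in model_caps:
        # native Copilot /v1/messages when supported
        return "native /v1/messages"
    if "responses" in model_caps:
        # Anthropic -> Responses -> Anthropic translation
        return "responses translation"
    # further fallbacks are elided in this diff hunk
    raise ValueError("no supported upstream path shown in this sketch")
```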
@@ -334,10 +342,10 @@ This keeps the existing chat pipeline stable while allowing newer Copilot models
  |--------|------|-------------|
  | `POST` | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
  | `POST` | `/v1/responses` | Create a Responses API response |
- | `POST` | `/v1/responses/input_tokens` | Count Responses input tokens when supported by Copilot upstream |
- | `GET` | `/v1/responses/:responseId` | Retrieve one response when supported by Copilot upstream |
- | `GET` | `/v1/responses/:responseId/input_items` | Retrieve response input items when supported by Copilot upstream |
- | `DELETE` | `/v1/responses/:responseId` | Delete one response when supported by Copilot upstream |
+ | `POST` | `/v1/responses/input_tokens` | Count Responses input tokens via upstream passthrough or the local official emulator |
+ | `GET` | `/v1/responses/:responseId` | Retrieve one response via upstream passthrough or the local official emulator |
+ | `GET` | `/v1/responses/:responseId/input_items` | Retrieve response input items via upstream passthrough or the local official emulator |
+ | `DELETE` | `/v1/responses/:responseId` | Delete one response via upstream passthrough or the local official emulator |
  | `GET` | `/v1/models` | List available models |
  | `POST` | `/v1/embeddings` | Generate embeddings |
 
@@ -364,13 +372,29 @@ This keeps the existing chat pipeline stable while allowing newer Copilot models
  - requests are validated before any mutation
  - common official request fields such as `conversation`, `previous_response_id`, `max_tool_calls`, `truncation`, `user`, `prompt`, and `text` are now modeled explicitly instead of relying on loose passthrough alone
  - official `text.format` options are modeled explicitly, including `text`, `json_object`, and `json_schema`
+ - an opt-in `responsesOfficialEmulator` mode adds in-memory OpenAI-style state for `previous_response_id`, `conversation`, `GET /responses/{id}`, `GET /responses/{id}/input_items`, `DELETE /responses/{id}`, and `POST /responses/input_tokens`
+ - emulator state is memory-only and expires after `responsesOfficialEmulatorTtlSeconds` (default `14400`, or 4 hours)
+ - `background: true` is rejected explicitly while emulator mode is enabled
  - `custom` `apply_patch` can be rewritten as a function tool when `useFunctionApplyPatch` is enabled
- - per-model Responses context compaction can be enabled with `responsesApiContextManagementModels`
+ - automatic Responses `context_management` injection is disabled by default and only applies when `responsesApiAutoContextManagement` is `true` and the model matches `responsesApiContextManagementModels`
+ - automatic trimming of Responses `input` to the latest `compaction` item is disabled by default and only applies when `responsesApiAutoCompactInput` is `true`
  - reasoning defaults for Anthropic -> Responses translation can be tuned with `modelReasoningEfforts`
  - known unsupported builtin tools, such as `web_search`, fail explicitly with `400` instead of being silently removed
  - external image URLs on the Responses path fail explicitly with `400`; use `file_id` or data URL image input instead
  - official `input_file` and `item_reference` input items are modeled explicitly and validated before forwarding
 
+ Example opt-in configuration for these Responses-specific policies:
+
+ ```json
+ {
+ "responsesApiAutoContextManagement": true,
+ "responsesApiContextManagementModels": ["gpt-5"],
+ "responsesApiAutoCompactInput": true,
+ "responsesOfficialEmulator": true,
+ "responsesOfficialEmulatorTtlSeconds": 14400
+ }
+ ```
+
  > See [Responses Upstream Notes](./docs/responses-upstream-notes.md) for detailed upstream compatibility observations from live testing.
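
The emulator's memory-only, TTL-bounded state (expiring after `responsesOfficialEmulatorTtlSeconds`, default `14400` seconds) can be sketched as a minimal in-memory store. Hypothetical helper names throughout; this is not ghc-proxy's actual implementation, which is not shown in this diff.

```python
import time

class TtlStore:
    """Minimal in-memory store with TTL expiry (illustrative sketch only)."""

    def __init__(self, ttl_seconds: float = 14400, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock for testing
        self._items: dict[str, tuple[float, object]] = {}

    def put(self, response_id: str, state: object) -> None:
        # record the state together with its absolute expiry time
        self._items[response_id] = (self.clock() + self.ttl, state)

    def get(self, response_id: str):
        entry = self._items.get(response_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self.clock() >= expires_at:
            # expired: drop lazily on access
            del self._items[response_id]
            return None
        return state

    def delete(self, response_id: str) -> bool:
        return self._items.pop(response_id, None) is not None
```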
 
  ## Docker