@oh-my-pi/pi-ai 12.17.0 → 12.17.2

package/CHANGELOG.md ADDED
# Changelog

## [Unreleased]

## [12.17.2] - 2026-02-21

### Added

- Exported `getAntigravityUserAgent()` function for constructing Antigravity User-Agent headers

### Changed

- Updated default Antigravity version from 1.15.8 to 1.18.3
- Unified User-Agent header generation across Antigravity API calls to use the centralized `getAntigravityUserAgent()` function

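A centralized User-Agent builder of the kind described above might look like the following sketch. The header format and function shape here are assumptions for illustration; only the function name and the 1.18.3 default come from the changelog.

```typescript
// Hypothetical sketch of a centralized User-Agent builder like the exported
// getAntigravityUserAgent(); the "Antigravity/<version>" format is an
// assumption, not the package's actual output.
const DEFAULT_ANTIGRAVITY_VERSION = "1.18.3"; // default bumped from 1.15.8 in 12.17.2

function getAntigravityUserAgent(version: string = DEFAULT_ANTIGRAVITY_VERSION): string {
  // Every Antigravity API call shares this one builder, so a version bump
  // only has to happen in a single place.
  return `Antigravity/${version}`;
}
```

Centralizing the builder is what makes the 1.15.8 → 1.18.3 bump a one-line change.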
## [12.17.1] - 2026-02-21

### Added

- Added new export paths for provider models via `./provider-models` and `./provider-models/*`
- Added new export paths for Cursor and OpenAI Codex providers via `./providers/cursor/gen/*` and `./providers/openai-codex/*`
- Added new export paths for usage utilities via `./usage/*`
- Added new export paths for discovery and OAuth utilities via `./utils/discovery` and `./utils/oauth` with subpath exports

### Changed

- Simplified main export path to use the wildcard pattern `./src/*.ts` for broader module access
- Updated `models.json` export to include TypeScript declaration file at `./src/models.json.d.ts`
- Reorganized package.json field ordering for improved readability

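The exports layout described above could be sketched as a `package.json` fragment along these lines. The exact file layout is an assumption; only the subpaths named in the entries above are taken from the changelog.

```json
{
  "exports": {
    "./*": "./src/*.ts",
    "./models.json": {
      "types": "./src/models.json.d.ts",
      "default": "./src/models.json"
    },
    "./provider-models": "./src/provider-models/index.ts",
    "./provider-models/*": "./src/provider-models/*.ts",
    "./usage/*": "./src/usage/*.ts"
  }
}
```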
## [12.17.0] - 2026-02-21

### Fixed

- Cursor provider: bind `execHandlers` when passing handler methods to the exec protocol so handlers receive the correct `this` context (fixes "undefined is not an object (evaluating 'this.options')" when using exec tools such as web search with Cursor)

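The fix above is the classic JavaScript unbound-method pitfall. A minimal reproduction, with illustrative names (`ExecHandlers`, `webSearch` are not the package's real API):

```typescript
// Minimal reproduction of the 12.17.0 fix: passing a method reference
// detaches it from its object, so `this` is undefined inside the handler.
class ExecHandlers {
  constructor(private options: { endpoint: string }) {}

  webSearch(query: string): string {
    // Without binding, accessing `this.options` here throws:
    // "undefined is not an object (evaluating 'this.options')"
    return `${this.options.endpoint}?q=${query}`;
  }
}

const handlers = new ExecHandlers({ endpoint: "https://example.invalid/search" });

// Broken: the method loses its receiver when passed as a bare function.
const unbound = handlers.webSearch;
// Fixed: bind the handler so it keeps the correct `this` context.
const bound = handlers.webSearch.bind(handlers);
```

Binding once at registration time (as the fix does) is cheaper and less error-prone than wrapping every call site in an arrow function.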
## [12.16.0] - 2026-02-21

### Added

- Exported `readModelCache` and `writeModelCache` functions for direct SQLite-backed model cache access
- Added `<turn_aborted>` guidance marker as synthetic user message when assistant messages are aborted or errored, informing the model that tools may have partially executed
- Added support for Sonnet 4.6 models in adaptive thinking detection

### Changed

- Updated model cache schema version to support improved global model fallback resolution
- Improved GitHub Copilot model resolution to prefer provider-specific model definitions over global references when context window is larger, ensuring optimal model capabilities
- Migrated model cache from per-provider JSON files to unified SQLite database (models.db) for atomic cross-process access
- Renamed `cachePath` option to `cacheDbPath` in `ModelManagerOptions` to reflect database-backed storage
- Improved non-authoritative cache handling with 5-minute retry backoff instead of retrying on every startup
- Modified handling of aborted/errored assistant messages to preserve tool call structure instead of converting to text summaries, with synthetic 'aborted' tool results injected
- Updated tool call tracking to use status map (Resolved/Aborted) instead of separate sets for better handling of duplicate and aborted tool results

## [12.15.0] - 2026-02-20

### Fixed

- Improved error messages for OAuth token refresh failures by including detailed error information from the provider
- Separated rate limit and usage limit error handling to provide distinct user-friendly messages for ChatGPT rate limits vs subscription usage limits

### Changed

- Increased SDK retry attempts to 5 for OpenAI, Azure OpenAI, and Anthropic clients (was SDK default of 2)
- Changed 429 retry strategy for OpenAI Codex and Google Gemini CLI to use a 5-minute time budget when the server provides a retry delay, instead of a fixed attempt cap

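The time-budget strategy above can be sketched as a retry loop that honors server-provided delays only while they fit inside the overall budget. The function shape and injected clock are illustrative assumptions; only the 5-minute budget and "retry only when the server supplies a delay" behavior come from the entry above.

```typescript
// Sketch of a time-budget 429 retry: keep retrying with the server-provided
// delay until the 5-minute budget would be exceeded, then give up.
const RETRY_BUDGET_MS = 5 * 60 * 1000;

async function retryWithBudget<T>(
  attempt: () => Promise<T>,
  getRetryDelayMs: (err: unknown) => number | undefined,
  now: () => number = Date.now,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<T> {
  const deadline = now() + RETRY_BUDGET_MS;
  for (;;) {
    try {
      return await attempt();
    } catch (err) {
      const delay = getRetryDelayMs(err);
      // No server-provided delay, or the delay would blow the budget: rethrow.
      if (delay === undefined || now() + delay > deadline) throw err;
      await sleep(delay);
    }
  }
}
```

Compared with a fixed attempt cap, a time budget adapts to how long the server actually asks callers to back off.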
## [12.14.0] - 2026-02-19

### Added

- Added `gemini-3.1-pro` model to opencode provider with text and image input support
- Added `trinity-large-preview-free` model to opencode provider
- Added `google/gemini-3.1-pro-preview` model to nanogpt provider
- Added `google/gemini-3.1-pro-preview` model to openrouter provider with text and image input support
- Added `gemini-3.1-pro` model to cursor provider
- Added optional `intent` field to `ToolCall` interface for harness-level intent metadata

### Changed

- Changed `big-pickle` model API from `openai-completions` to `anthropic-messages`
- Changed `big-pickle` model baseUrl from `https://opencode.ai/zen/v1` to `https://opencode.ai/zen`
- Changed `minimax-m2.5-free` model API from `openai-completions` to `anthropic-messages`
- Changed `minimax-m2.5-free` model baseUrl from `https://opencode.ai/zen/v1` to `https://opencode.ai/zen`

### Fixed

- Fixed tool argument validation to iteratively coerce nested JSON strings across multiple passes, enabling proper handling of deeply nested JSON-serialized objects and arrays

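Multi-pass coercion of nested JSON strings, as in the fix above, can be sketched as follows. This is an illustrative reimplementation, not the package's code; the pass cap and the `{`/`[` prefix heuristic are assumptions.

```typescript
// Sketch of iterative coercion for JSON-serialized tool arguments: string
// values that themselves parse as JSON objects/arrays are unwrapped, and the
// pass repeats so nesting at any depth is handled.
function coerceNestedJson(value: unknown, maxPasses = 8): unknown {
  if (typeof value === "string") {
    const trimmed = value.trim();
    if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
      try {
        // Another pass on the parsed result handles doubly-encoded payloads.
        return maxPasses > 0 ? coerceNestedJson(JSON.parse(trimmed), maxPasses - 1) : value;
      } catch {
        return value; // Not valid JSON after all; keep the original string.
      }
    }
    return value;
  }
  if (Array.isArray(value)) return value.map(v => coerceNestedJson(v, maxPasses));
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, coerceNestedJson(v, maxPasses)]),
    );
  }
  return value;
}
```

A single-pass parse would leave a doubly-encoded payload (a JSON string inside a JSON string) as text, which is exactly the case the multi-pass fix targets.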
## [12.13.0] - 2026-02-19

### Added

- Added NanoGPT provider support with API-key login, dynamic model discovery from `https://nano-gpt.com/api/v1/models`, and text-model filtering for catalog/runtime discovery ([#111](https://github.com/can1357/oh-my-pi/issues/111))

## [12.12.3] - 2026-02-19

### Fixed

- Fixed retry logic to recognize 'unable to connect' errors as transient failures

## [12.11.3] - 2026-02-19

### Fixed

- Fixed OpenAI Codex streaming to fail truncated responses that end without a terminal completion event, preventing partial outputs from being treated as successful completions
- Fixed Codex websocket append fallback by resetting stale turn-state/model-etag session metadata when request shape diverges from appendable history

## [12.11.1] - 2026-02-19

### Added

- Added support for Claude 4.6 Opus and Sonnet models via Cursor API
- Added support for Composer 1.5 model via Cursor API
- Added support for GPT-5.1 Codex Mini and GPT-5.1 High models via Cursor API
- Added support for GPT-5.2 and GPT-5.3 Codex variants (Fast, High, Low, Extra High) via Cursor API
- Added HTTP/2 transport support for Cursor API requests (required by Cursor API)

### Changed

- Updated pricing for Claude 3.5 Sonnet model
- Updated Claude 3.5 Sonnet context window from 262,144 to 131,072 tokens
- Simplified Cursor model display names by removing '(Cursor)' suffix
- Changed Cursor API timeout from 15 seconds to 5 seconds
- Switched Cursor API transport from HTTP/1.1 to HTTP/2

## [12.11.0] - 2026-02-19

### Added

- Added `priority` field to `Model` interface for provider-assigned model prioritization
- Added `CatalogDiscoveryConfig` interface to standardize catalog discovery configuration across providers
- Added type guards `isCatalogDescriptor()` and `allowsUnauthenticatedCatalogDiscovery()` for safer descriptor handling
- Added `DEFAULT_MODEL_PER_PROVIDER` export from descriptors module for centralized default model management
- Added support for 12 new AI providers: Cloudflare AI Gateway, Hugging Face Inference, LiteLLM, Moonshot, NVIDIA, Ollama, Qianfan, Qwen Portal, Together, Venice, vLLM, and Xiaomi MiMo
- Added login flows for new providers with API key validation and OAuth token support
- Extended `KnownProvider` type to include all newly supported providers
- Added API key environment variable mappings for all new providers in the service provider map
- Added model discovery and configuration for Cloudflare AI Gateway, Hugging Face, LiteLLM, Moonshot, NVIDIA, Ollama, Qianfan, Qwen Portal, Together, Venice, vLLM, and Xiaomi MiMo

### Changed

- Refactored OAuth credential retrieval to simplify storage lifecycle management in model generation script
- Parallelized special model discovery sources (Antigravity, Codex) for improved generation performance
- Reorganized model JSON structure to place `contextWindow` and `maxTokens` before `compat` field for consistency
- Added `priority` field to OpenAI Codex models for provider-assigned model prioritization
- Refactored provider descriptors to use helper functions (`descriptor`, `catalog`, `catalogDescriptor`) for reduced code duplication
- Refactored models.dev provider descriptors to use helper functions (`simpleModelsDevDescriptor`, `openAiCompletionsDescriptor`, `anthropicMessagesDescriptor`) for improved maintainability
- Unified provider descriptors into single source of truth in `descriptors.ts` for both runtime model discovery and catalog generation, improving maintainability
- Refactored model generation script to use declarative `CatalogProviderDescriptor` interface instead of separate descriptor types, reducing code duplication
- Reorganized models.dev provider descriptors into logical groups (Bedrock, Core, Coding Plans, Specialized) for better code organization
- Simplified API resolution for OpenCode and GitHub Copilot providers using rule-based matching instead of inline conditionals
- Refactored model generation script to use declarative provider descriptors instead of inline provider-specific logic, improving maintainability and reducing code duplication
- Extracted model post-processing policies (cache pricing corrections, context window normalization) into dedicated `model-policies.ts` module for better testability and clarity
- Removed static bundled models for Ollama and vLLM from `models.json` to rely on dynamic discovery instead, reducing static catalog size
- Updated `OAuthProvider` type to include new provider identifiers
- Expanded model registry (models.json) with thousands of new model entries across all new providers
- Modified environment variable resolution to use `$pickenv` for providers with multiple possible env var names
- Updated README documentation to list all newly supported providers and their authentication requirements

## [12.10.1] - 2026-02-18

### Added

- Added Synthetic provider
- Added API-key login helpers for Synthetic and Cerebras providers

## [12.10.0] - 2026-02-18

### Breaking Changes

- Renamed public API functions: `getModel()` → `getBundledModel()`, `getModels()` → `getBundledModels()`, `getProviders()` → `getBundledProviders()`

### Added

- Exported `ModelManager` API for runtime-aware model resolution with dynamic endpoint discovery
- Exported provider-specific model manager configuration helpers for Google, OpenAI-compatible, Codex, and Cursor providers
- Exported discovery utilities for fetching models from Antigravity, Codex, Cursor, Gemini, and OpenAI-compatible endpoints
- Added `createModelManager()` function to manage bundled and dynamically discovered models with configurable refresh strategies
- Added support for on-disk model caching with TTL-based invalidation
- Added `resolveProviderModels()` function for runtime model resolution across multiple providers
- Added EU cross-region inference variants for Claude Haiku 3.5 on Bedrock
- Added Claude Sonnet 4.6 and Claude Sonnet 4.6 Thinking models to Antigravity provider
- Added GLM-5 Free model via OpenCode provider
- Added GLM-4.7-FlashX model via ZAI provider
- Added MiniMax-M2.5-highspeed model across multiple providers (minimax-code, minimax-code-cn, minimax, minimax-cn)
- Added Claude Sonnet 4.6 model to OpenRouter provider
- Added Qwen 3.5 Plus model to Vercel AI Gateway provider
- Added Claude Sonnet 4.6 model to Vercel AI Gateway provider

### Changed

- Renamed `getModel()` to `getBundledModel()` to clarify it returns compile-time bundled models only
- Renamed `getModels()` to `getBundledModels()` for consistency
- Renamed `getProviders()` to `getBundledProviders()` for consistency
- Refactored model generation script to use modular discovery functions instead of monolithic provider-specific logic
- Updated `models.json` with new model entries and pricing updates across multiple providers
- Updated pricing for `deepseek/deepseek-v3` model on OpenRouter
- Updated maxTokens from 65536 to 4096 for `deepseek/deepseek-v3` on OpenRouter
- Updated pricing and maxTokens for `mistralai/mistral-large-2411` on OpenRouter
- Updated pricing for `qwen/qwen-max` on Together AI
- Updated pricing for `qwen/qwen-vl-plus` on Together AI
- Updated pricing for `qwen/qwen-plus` on Together AI
- Updated pricing for `qwen/qwen-turbo` on Together AI
- Expanded EU cross-region inference variant support to all Claude models on Bedrock (previously limited to Haiku, Sonnet, and Opus 4.5)

## [12.8.0] - 2026-02-16

### Added

- Added `contextPromotionTarget` model property to specify preferred fallback model when context promotion is triggered
- Added automatic context promotion target assignment for Spark models to their base model equivalents
- Added support for Brave search provider with `BRAVE_API_KEY` environment variable

### Changed

- Updated Qwen model context window and max token limits for improved accuracy

## [12.7.0] - 2026-02-16

### Added

- Added DeepSeek-V3.2 model support via Amazon Bedrock
- Added GLM-5 model support via OpenCode
- Added MiniMax M2.5 model support via OpenCode

### Changed

- Updated GLM-4.5, GLM-4.5-Air, GLM-4.5-Flash, GLM-4.5V, GLM-4.6, GLM-4.6V, GLM-4.7, GLM-4.7-Flash, and GLM-5 models to use anthropic-messages API instead of openai-completions
- Updated GLM models base URL from `https://api.z.ai/api/coding/paas/v4` to `https://api.z.ai/api/anthropic`
- Updated pricing for multiple models including Mistral, Moonshot, and Qwen variants
- Updated context window and max tokens for several models to reflect accurate specifications

### Removed

- Removed `compat` field with `supportsDeveloperRole` and `thinkingFormat` properties from GLM models

## [12.6.0] - 2026-02-16

### Added

- Added source-scoped custom API and OAuth provider registration helpers for extension-defined providers

### Changed

- Expanded `Api` typing to allow extension-defined API identifiers while preserving built-in API exhaustiveness checks

### Fixed

- Fixed custom API registration to reject built-in API identifiers and prevent accidental provider overrides

## [12.2.0] - 2026-02-13

### Added

- Added automatic retry logic for WebSocket stream closures before response completion, with configurable retry budget to improve reliability on flaky connections
- Added `providerSessionState` option to enable provider-scoped mutable state persistence across agent turns
- Added WebSocket retry logic with configurable retry budget and delay via `PI_CODEX_WEBSOCKET_RETRY_BUDGET` and `PI_CODEX_WEBSOCKET_RETRY_DELAY_MS` environment variables
- Added WebSocket idle timeout detection via `PI_CODEX_WEBSOCKET_IDLE_TIMEOUT_MS` environment variable to fail stalled connections
- Added WebSocket v2 beta header support via `PI_CODEX_WEBSOCKET_V2` environment variable for newer OpenAI API versions
- Added WebSocket handshake header capture to extract and replay session metadata (turn state, models etag, reasoning flags) across SSE fallback requests
- Added `preferWebsockets` option to enable WebSocket transport for OpenAI Codex responses when supported
- Added `prewarmOpenAICodexResponses()` function to establish and reuse WebSocket connections across multiple requests
- Added `getOpenAICodexTransportDetails()` function to inspect transport layer details including WebSocket status and fallback information
- Added `getProviderDetails()` function to retrieve formatted provider configuration and transport information
- Added automatic fallback from WebSocket to SSE when connection fails, with transparent retry logic
- Added session state management to reuse WebSocket connections and enable request appending across turns
- Added support for `x-codex-turn-state` header to maintain conversation state across SSE requests

### Changed

- Changed WebSocket session state storage from global maps to provider-scoped session state for multi-agent isolation
- Changed WebSocket connection initialization to accept idle timeout configuration and handshake header callbacks
- Changed WebSocket error handling to use standardized transport error messages with `Codex websocket transport error` prefix
- Changed WebSocket retry behavior to retry transient failures before activating sticky fallback, improving reliability on flaky connections
- Changed OpenAI Codex model configuration to prefer WebSocket transport by default with `preferWebsockets: true`
- Changed header handling to use appropriate OpenAI-Beta header values for WebSocket vs SSE transports
- Perplexity OAuth token refresh now uses JWT expiry extraction instead of Socket.IO RPC, improving reliability when server is unreachable
- Removed Socket.IO client implementation for Perplexity token refresh; tokens are now validated using embedded JWT expiry claims

### Removed

- Removed `refreshPerplexityToken` export; token refresh is now handled internally via JWT expiry detection

### Fixed

- Fixed WebSocket stream retry logic to properly handle mid-stream connection closures and retry before falling back to SSE transport
- Fixed `preferWebsockets` option handling to correctly respect explicit `false` values when determining transport preference
- Fixed WebSocket append state not being reset after aborted requests, preventing stale state from affecting subsequent turns
- Fixed WebSocket append state not being reset after stream errors, preventing failed append attempts from blocking future requests
- Fixed Codex model context window metadata to use 272000 input tokens (instead of 400000 total budget) for non-Spark Codex variants

## [12.0.0] - 2026-02-12

### Added

- Added GPT-5.3 Codex Spark model with 128K context window and extended reasoning capabilities
- Added MiniMax M2.5 and M2.5 Lightning models via OpenAI-compatible API (minimax-code provider)
- Added MiniMax M2.5 and M2.5 Lightning models via OpenAI-compatible API (minimax-code-cn provider for China region)
- Added MiniMax M2.5 and M2.5 Lightning models via Anthropic API (minimax and minimax-cn providers)
- Added Llama 3.1 8B model via Cerebras API
- Added MiniMax M2.5 model via OpenRouter
- Added MiniMax M2.5 model via Vercel AI Gateway
- Added MiniMax M2.5 Free model via OpenCode
- Added Qwen3 VL 32B Instruct multimodal model via OpenRouter

### Changed

- Updated Z.ai GLM-5 pricing and context window configuration on OpenRouter
- Updated Qwen3 Max Thinking max tokens from 32768 to 65536 on OpenRouter
- Updated OpenAI GPT-5 Image Mini pricing on OpenRouter
- Updated OpenAI GPT-5 Pro pricing and context window on OpenRouter
- Updated OpenAI o4-mini pricing and context window on OpenRouter
- Updated Claude Opus 4.5 Thinking model name formatting (removed parentheses)
- Updated Claude Opus 4.6 Thinking model name formatting (removed parentheses)
- Updated Claude Sonnet 4.5 Thinking model name formatting (removed parentheses)
- Updated Gemini 2.5 Flash Thinking model name formatting (removed parentheses)
- Updated Gemini 3 Pro High and Low model name formatting (removed parentheses)
- Updated GPT-OSS 120B Medium model name formatting (removed parentheses) and context window to 131072

### Removed

- Removed GLM-5 model from Z.ai provider
- Removed Trinity Large Preview Free model from OpenCode provider
- Removed MiniMax M2.1 Free model from OpenCode provider
- Removed deprecated Anthropic model entries: `claude-3-5-haiku-latest`, `claude-3-5-haiku-20241022`, `claude-3-7-sonnet-20250219`, `claude-3-7-sonnet-latest`, `claude-3-opus-20240229`, `claude-3-sonnet-20240229` ([#33](https://github.com/can1357/oh-my-pi/issues/33))

### Fixed

- Added deprecation filter in model generation script to prevent re-adding deprecated Anthropic models ([#33](https://github.com/can1357/oh-my-pi/issues/33))

## [11.14.1] - 2026-02-12

### Added

- Added `prompt-caching-scope-2026-01-05` beta feature support

### Changed

- Updated Claude Code version header to 2.1.39
- Updated runtime version header to v24.13.1 and package version to 0.73.0
- Increased request timeout from 60s to 600s
- Reordered Accept-Encoding header values for compression preference
- Updated OAuth authorization and token endpoints to use platform.claude.com
- Expanded OAuth scopes to include `user:sessions:claude_code` and `user:mcp_servers`

### Removed

- Removed `claude-code-20250219` beta feature from default models
- Removed `fine-grained-tool-streaming-2025-05-14` beta feature

## [11.13.1] - 2026-02-12

### Added

- Added Perplexity (Pro/Max) OAuth login support via native macOS app extraction or email OTP authentication
- Added `loginPerplexity` and `refreshPerplexityToken` functions for Perplexity account integration
- Added Socket.IO v4 client implementation for authenticated WebSocket communication with Perplexity API

## [11.12.0] - 2026-02-11

### Changed

- Increased maximum retry attempts for Codex requests from 2 to 5 to improve reliability on transient failures

### Fixed

- Fixed tool result content handling in Anthropic provider to provide fallback error message when content is empty
- Improved retry delay calculation to parse delay values from error response bodies (e.g., 'Please try again in 225ms')

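Parsing a retry delay out of an error body like the one quoted above might look like this sketch. The regex and supported units are assumptions; only the 'Please try again in 225ms' example comes from the entry.

```typescript
// Sketch of extracting a retry delay from an error message such as
// "Please try again in 225ms" or "Please try again in 2s". The exact
// formats the package recognizes are not documented here.
function parseRetryDelayMs(message: string): number | undefined {
  const match = /try again in\s+(\d+(?:\.\d+)?)\s*(ms|s)\b/i.exec(message);
  if (!match) return undefined;
  const amount = Number(match[1]);
  // Normalize seconds to milliseconds so callers deal in one unit.
  return match[2].toLowerCase() === "s" ? amount * 1000 : amount;
}
```

Returning `undefined` (rather than a default) lets the caller fall back to its own backoff schedule when the body has no parsable delay.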
## [11.11.0] - 2026-02-10

### Breaking Changes

- Replaced `./models.generated` export with `./models.json`; update imports from `import { MODELS } from './models.generated'` to `import MODELS from './models.json' with { type: 'json' }`

### Added

- Added TypeScript type declarations for `models.json` to enable proper type inference when importing the JSON file

### Changed

- Updated available models in google-antigravity provider with new model variants and updated context window/token limits
- Simplified type signatures for `getModel()` and `getModels()` functions for improved usability
- Changed models export from TypeScript module to JSON format for improved performance and reduced bundle size
- Updated `@anthropic-ai/sdk` dependency from ^0.72.1 to ^0.74.0

## [11.10.0] - 2026-02-10

### Added

- Added support for Kimi K2, K2 Turbo Preview, and K2.5 models with reasoning capabilities

### Fixed

- Fixed Claude Opus 4.6 context window to 200K across all providers (was incorrectly set to 1M)
- Fixed Claude Sonnet 4 context window to 200K across multiple providers (was incorrectly set to 1M)

## [11.8.0] - 2026-02-10

### Added

- Added `auto` model alias for OpenRouter with automatic model routing
- Added `openrouter/aurora-alpha` model with reasoning capabilities
- Added `qwen/qwen3-max-thinking` model with extended context window support
- Added support for `parametersJsonSchema` in Google Gemini tool definitions for improved JSON Schema compatibility

### Changed

- Updated Claude Sonnet 4 and 4.5 context window from 1M to 200K tokens to reflect actual limits
- Updated Claude Opus 4.6 context window to 200K tokens across providers
- Changed default `reasoningSummary` for OpenAI Codex from `undefined` to `auto`
- Updated Qwen model pricing and context window specifications across multiple variants
- Modified Google Gemini CLI system instruction to use compact format
- Changed tool parameter handling for Claude models on Google Cloud Code Assist to use legacy `parameters` field for API translation

### Removed

- Removed `glm-4.7-free` model from OpenCode provider
- Removed `qwen3-coder` model from OpenCode provider
- Removed `ai21/jamba-mini-1.7` model from OpenRouter
- Removed `stepfun-ai/step3` model from OpenRouter
- Removed duplicate test suite for Google Antigravity Provider with `gemini-3-pro-high`

### Fixed

- Fixed Amazon Bedrock HTTP/1.1 handler import to use direct import instead of dynamic import
- Fixed Qwen model context window and pricing inconsistencies across OpenRouter
- Fixed cache read pricing for multiple Qwen models
- Fixed OpenAI Codex reasoning effort clamping for `gpt-5.3-codex` model

## [11.7.1] - 2026-02-07

### Added

- Added Claude Opus 4.6 Thinking model for Antigravity provider
- Added Gemini 2.5 Flash, Gemini 2.5 Flash Thinking, and Gemini 2.5 Pro models for Antigravity provider
- Added Pony Alpha model via OpenRouter

### Changed

- Updated Antigravity models to use free tier pricing (0 cost) across all models
- Changed Antigravity model fetching to dynamically load from API when credentials are available, with hardcoded fallback models
- Updated Claude Opus 4.6 context window from 200,000 to 1,000,000 tokens across Bedrock regions
- Updated Claude Opus 4.6 cache pricing from 1.5/18.75 to 0.5/6.25 for EU and US regions
- Updated Antigravity model pricing to free tier (0 cost) for Claude Opus 4.5 Thinking, Claude Sonnet 4.5 Thinking, Gemini 3 Flash, Gemini 3 Pro variants, and GPT-OSS 120B Medium
- Updated GPT-OSS 120B Medium reasoning capability from false to true
- Updated Gemini 3 Flash max tokens from 65,535 to 65,536
- Updated Claude Opus 4.5 Thinking display name formatting to include parentheses
- Updated various model pricing and context window parameters across OpenRouter and other providers
- Removed Claude Opus 4.6 20260205 model from Anthropic provider

### Fixed

- Fixed Claude Opus 4.6 model ID format by removing version suffix (:0) in Bedrock configurations
- Fixed Llama 3.1 70B Instruct pricing and context window parameters
- Fixed Mistral model pricing and cache read costs
- Fixed DeepSeek and other model pricing inconsistencies
- Fixed Qwen model pricing and token limits
- Fixed GLM model pricing and context window specifications

## [11.6.0] - 2026-02-07

### Added

- Added Bedrock cache retention support with `PI_CACHE_RETENTION` env var and per-request `cacheRetention` option
- Added adaptive thinking support for Bedrock Opus 4.6+ models
- Added `AWS_BEDROCK_SKIP_AUTH` env var to support unauthenticated Bedrock proxies
- Added `AWS_BEDROCK_FORCE_HTTP1` env var to force HTTP/1.1 for custom Bedrock endpoints
- Re-exported `Static`, `TSchema`, and `Type` from `@sinclair/typebox`

### Fixed

- Fixed OpenAI Responses storage to be disabled by default (`store: false`)
- Fixed reasoning effort clamping for gpt-5.3 Codex models (`minimal` → `low`)
- Fixed Bedrock `supportsPromptCaching` to also check model cost fields

## [11.5.1] - 2026-02-07

### Fixed

- Fixed schema normalization to handle array-valued `type` fields by converting them to a single type with nullable flag for Google provider compatibility

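The normalization described in the 11.5.1 fix can be sketched as follows. The field names follow common JSON Schema / OpenAPI conventions (`type`, `nullable`); the package's internal schema shape may differ.

```typescript
// Sketch of collapsing an array-valued `type` such as ["string", "null"]
// into a single type plus a nullable flag, for schema dialects (like
// Google's) that reject type arrays. Illustrative only.
interface NormalizedSchema {
  type?: string;
  nullable?: boolean;
  [key: string]: unknown;
}

function normalizeTypeField(
  schema: { type?: string | string[] } & Record<string, unknown>,
): NormalizedSchema {
  if (!Array.isArray(schema.type)) return schema as NormalizedSchema;
  const nonNull = schema.type.filter(t => t !== "null");
  return {
    ...schema,
    // Keep the first non-null type; mark nullable when "null" was present.
    type: nonNull[0],
    ...(schema.type.includes("null") ? { nullable: true } : {}),
  };
}
```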
## [11.3.0] - 2026-02-06

### Added

- Added `cacheRetention` option to control prompt cache retention preference ('none', 'short', 'long') across providers
- Added `maxRetryDelayMs` option to cap server-requested retry delays and fail fast when delays exceed the limit
- Added `effort` option for Anthropic Opus 4.6+ models to control adaptive thinking effort levels ('low', 'medium', 'high', 'max')
- Added support for Anthropic Opus 4.6+ adaptive thinking mode that lets Claude decide when and how much to think
- Added `PI_AI_ANTIGRAVITY_VERSION` environment variable to customize Antigravity sandbox endpoint version
- Exported `convertAnthropicMessages` function for converting message formats to Anthropic API
- Automatic fallback for Anthropic assistant-prefill requests: appends synthetic user "Continue." message when conversation ends with assistant turn to maintain API compatibility

### Changed

- Changed `supportsXhigh()` to include GPT-5.1 Codex Max and broaden Anthropic support to all Anthropic Messages API models with budget-based thinking capability
- Changed Anthropic thinking mode to use adaptive thinking for Opus 4.6+ models instead of budget-based thinking
- Changed `supportsXhigh()` to support GPT-5.2/5.3 and Anthropic Opus 4.6+ models with adaptive thinking
- Changed prompt caching to respect `cacheRetention` option and support TTL configuration for Anthropic
- Changed OpenAI tool definitions to conditionally include `strict` field only when provider supports it
- Changed Qwen model support to use `enable_thinking` boolean parameter instead of OpenAI-style reasoning_effort

### Fixed

- Fixed indentation and formatting in `convertAnthropicMessages` function
- Fixed handling of conversations ending with assistant messages on Anthropic-routed models that reject assistant prefill requests

## [11.2.3] - 2026-02-05

### Added

- Added Claude Opus 4.6 model support across multiple providers (Anthropic, Amazon Bedrock, GitHub Copilot, OpenRouter, OpenCode, Vercel AI Gateway)
- Added GPT-5.3 Codex model support for OpenAI
- Added `readSseJson` utility import for improved SSE stream handling in Google Gemini CLI provider

### Changed

- Updated Google Gemini CLI provider to use `readSseJson` utility for cleaner SSE stream parsing
- Updated pricing for Llama 3.1 405B model on Vercel AI Gateway (cache read rate adjusted)
- Updated Llama 3.1 405B context window and max tokens on Vercel AI Gateway (256000 for both)

### Removed

- Removed Kimi K2, Kimi K2 Turbo Preview, and Kimi K2.5 models
- Removed Deep Cogito Cogito V2 Preview models from OpenRouter

## [11.0.0] - 2026-02-05

### Changed

- Replaced direct `Bun.env` access with `getEnv()` utility from `@oh-my-pi/pi-utils` for consistent environment variable handling across all providers
- Updated environment variable names from `OMP_*` prefix to `PI_*` prefix for consistency (e.g., `OMP_CODING_AGENT_DIR` → `PI_CODING_AGENT_DIR`)

### Removed

- Removed automatic environment variable migration from `PI_*` to `OMP_*` prefixes via `migrate-env.ts` module

## [10.5.0] - 2026-02-04

### Changed

- Updated `@anthropic-ai/sdk` to ^0.72.1
- Updated `@aws-sdk/client-bedrock-runtime` to ^3.982.0
- Updated `@google/genai` to ^1.39.0
- Updated `@smithy/node-http-handler` to ^4.4.9
- Updated `openai` to ^6.17.0
- Updated `@types/node` to ^25.2.0

### Removed

- Removed `proxy-agent` dependency
- Removed `undici` dependency

## [9.4.0] - 2026-01-31

### Added

- Added `getEnv()` function to retrieve environment variables from `Bun.env`, `cwd/.env`, or `~/.env`
- Added support for reading .env files from home directory and current working directory
- Added support for `exa` and `perplexity` as known providers in `getEnvApiKey()`

### Changed

- Changed `getEnvApiKey()` to check `Bun.env`, `cwd/.env`, and `~/.env` files in order of precedence
- Refactored provider API key resolution to use a declarative service provider map

547
+ ## [9.2.2] - 2026-01-31
548
+
549
+ ### Added
550
+
551
+ - Added OpenCode Zen provider with API key authentication for accessing multiple AI models
552
+ - Added 4 new free models via OpenCode: glm-4.7-free, kimi-k2.5-free, minimax-m2.1-free, trinity-large-preview-free
553
+ - Added glm-4.7-flash model via Zai provider
554
+ - Added Kimi Code provider with OpenAI and Anthropic API format support
555
+ - Added prompt cache retention support with PI_CACHE_RETENTION env var
556
+ - Added overflow patterns for Bedrock, MiniMax, Kimi; reclassified 429 as rate limiting
557
+ - Added profile endpoint integration to resolve user emails with 24-hour caching
558
+ - Added automatic token refresh for expired Kimi OAuth credentials
559
+ - Added Kimi Code OAuth handler with device authorization flow
560
+ - Added Kimi Code usage provider with quota caching
561
+ - Added 4 new Kimi Code models (kimi-for-coding, kimi-k2, kimi-k2-turbo-preview, kimi-k2.5)
562
+ - Added Kimi Code provider integration with OAuth and token management
563
+ - Added tool-choice utility for mapping unified ToolChoice to provider-specific formats
564
+ - Added ToolChoice type for controlling tool selection (auto, none, any, required, function)
565
+
566
+ ### Changed
567
+
568
+ - Updated Kimi K2.5 cache read pricing from 0.1 to 0.08
569
+ - Updated MiniMax M2 pricing: input 0.6→0.6, output 3→3, cache read 0.1→0.09999999999999999
570
+ - Updated OpenRouter DeepSeek V3.1 pricing and max tokens: input 0.6→0.5, output 3→2.8, maxTokens 262144→4096
571
+ - Updated OpenRouter DeepSeek R1 pricing and max tokens: input 0.06→0.049999999999999996, output 0.24→0.19999999999999998, maxTokens 262144→4096
572
+ - Updated Anthropic Claude 3.5 Sonnet max tokens from 256000 to 65536 on OpenRouter
573
+ - Updated Vercel AI Gateway Claude 3.5 Sonnet cache read pricing from 0.125 to 0.13
574
+ - Updated Vercel AI Gateway Claude 3.5 Sonnet New cache read pricing from 0.125 to 0.13
575
+ - Updated Vercel AI Gateway GPT-5.2 cache read pricing from 0.175 to 0.18 and display name to 'GPT 5.2'
576
+ - Updated Zai GLM-4.6 cache read pricing from 0.024999999999999998 to 0.03
577
+ - Updated Zai Qwen QwQ max tokens from 66000 to 16384
578
+ - Added delta event batching and throttling (50ms, 20 updates/sec max) to AssistantMessageEventStream
579
+ - Updated MiniMax-M2 pricing: input 1.2→0.6, output 1.2→3, cacheRead 0.6→0.1
580
+
581
+ ### Removed
582
+
583
+ - Removed OpenRouter google/gemini-2.0-flash-exp:free model
584
+ - Removed Vercel AI Gateway stealth/sonoma-dusk-alpha and stealth/sonoma-sky-alpha models
585
+
586
+ ### Fixed
587
+
588
+ - Fixed rate limit issues with Kimi models by always sending max_tokens
589
+ - Added handling for sensitive stop reason from Anthropic API safety filters
590
+ - Added optional chaining for safer JSON schema property access in Anthropic provider
591
+
592
+ ## [8.6.0] - 2026-01-27
593
+
594
+ ### Changed
595
+
596
+ - Replaced JSON5 dependency with Bun.JSON5 parsing
597
+
598
+ ### Fixed
599
+
600
+ - Filtered empty user text blocks for OpenAI-compatible completions and normalized Kimi reasoning_content for OpenRouter tool-call messages
601
+
602
+ ## [8.4.0] - 2026-01-25
603
+
604
+ ### Added
605
+
606
+ - Added Azure OpenAI Responses provider with deployment mapping and resource-based base URL support
607
+
608
+ ### Changed
609
+
610
+ - Added OpenRouter routing preferences for OpenAI-compatible completions
611
+
612
+ ### Fixed
613
+
614
+ - Defaulted Google tool call arguments to empty objects when providers omit args
615
+ - Guarded Responses/Codex streaming deltas against missing content parts and handled arguments.done events
616
+
617
+ ## [8.2.1] - 2026-01-24
618
+
619
+ ### Fixed
620
+
621
+ - Fixed handling of streaming function call arguments in OpenAI responses to properly parse arguments when sent via `response.function_call_arguments.done` events
622
+
623
+ ## [8.2.0] - 2026-01-24
624
+
625
+ ### Changed
626
+
627
+ - Migrated node module imports from named to namespace imports across all packages for consistency with project guidelines
628
+
629
+ ## [8.0.0] - 2026-01-23
630
+
631
+ ### Fixed
632
+
633
+ - Fixed OpenAI Responses API 400 error "function_call without required reasoning item" when switching between models (same provider, different model). The fix omits the `id` field for function_calls from different models to avoid triggering OpenAI's reasoning/function_call pairing validation
634
+ - Fixed 400 errors when reading multiple images via GitHub Copilot's Claude models. Claude requires tool_use -> tool_result adjacency with no user messages interleaved. Images from consecutive tool results are now batched into a single user message
635
+
636
+ ## [7.0.0] - 2026-01-21
637
+
638
+ ### Added
639
+
640
+ - Added usage tracking system with normalized schema for provider quota/limit endpoints
641
+ - Added Claude usage provider for 5-hour and 7-day quota windows
642
+ - Added GitHub Copilot usage provider for chat, completions, and premium requests
643
+ - Added Google Antigravity usage provider for model quota tracking
644
+ - Added Google Gemini CLI usage provider for tier-based quota monitoring
645
+ - Added OpenAI Codex usage provider for primary and secondary rate limit windows
646
+ - Added ZAI usage provider for token and request quota tracking
647
+
648
+ ### Changed
649
+
650
+ - Updated Claude usage provider to extract account identifiers from response headers
651
+ - Updated GitHub Copilot usage provider to include account identifiers in usage reports
652
+ - Updated Google Gemini CLI usage provider to handle missing reset time gracefully
653
+
654
+ ### Fixed
655
+
656
+ - Fixed GitHub Copilot usage provider to simplify token handling and improve reliability
657
+ - Fixed GitHub Copilot usage provider to properly resolve account identifiers for OAuth credentials
658
+ - Fixed API validation errors when sending empty user messages (resume with `.`) across all providers:
659
+ - Google Cloud Code Assist (google-shared.ts)
660
+ - OpenAI Responses API (openai-responses.ts)
661
+ - OpenAI Codex Responses API (openai-codex-responses.ts)
662
+ - Cursor (cursor.ts)
663
+ - Amazon Bedrock (amazon-bedrock.ts)
664
+ - Clamped OpenAI Codex reasoning effort "minimal" to "low" for gpt-5.2 models to avoid API errors
665
+ - Fixed GitHub Copilot usage fallback to internal quota endpoints when billing usage is unavailable
666
+ - Fixed GitHub Copilot usage metadata to include account identifiers for report dedupe
667
+ - Fixed Anthropic usage metadata extraction to include account identifiers when provided by the usage endpoint
668
+ - Fixed Gemini CLI usage windows to consistently label quota windows for display suppression
669
+
670
+ ## [6.9.69] - 2026-01-21
671
+
672
+ ### Added
673
+
674
+ - Added duration and time-to-first-token (ttft) metrics to all AI provider responses
675
+ - Added performance tracking for streaming responses across all providers
676
+
677
+ ## [6.9.0] - 2026-01-21
678
+
679
+ ### Removed
680
+
681
+ - Removed openai-codex provider exports from main package index
682
+ - Removed openai-codex prompt utilities and moved them inline
683
+ - Removed vitest configuration file
684
+
685
+ ## [6.8.4] - 2026-01-21
686
+
687
+ ### Changed
688
+
689
+ - Updated prompt caching strategy to follow Anthropic's recommended hierarchy
690
+ - Fixed token usage tracking to properly handle cumulative output tokens from message_delta events
691
+ - Improved message validation to filter out empty or invalid content blocks
692
+ - Increased OAuth callback timeout from 120 seconds to 120,000 milliseconds
693
+
694
+ ## [6.8.3] - 2026-01-21
695
+
696
+ ### Added
697
+
698
+ - Added `headers` option to all providers for custom request headers
699
+ - Added `onPayload` hook to observe provider request payloads before sending
700
+ - Added `strictResponsesPairing` option for Azure OpenAI Responses API compatibility
701
+ - Added `originator` option to `loginOpenAICodex` for custom OAuth flow identification
702
+ - Added per-request `headers` and `onPayload` hooks to `StreamOptions`
703
+ - Added `originator` option to `loginOpenAICodex`
704
+
705
+ ### Fixed
706
+
707
+ - Fixed tool call ID normalization for OpenAI Responses API cross-provider handoffs
708
+ - Skipped errored or aborted assistant messages during cross-provider transforms
709
+ - Detected AWS ECS/IRSA credentials for Bedrock authentication checks
710
+ - Detected AWS ECS/IRSA credentials for Bedrock authentication checks
711
+ - Normalized Responses API tool call IDs during handoffs and refreshed handoff tests
712
+ - Enforced strict tool call/result pairing for Azure OpenAI Responses API
713
+ - Skipped errored or aborted assistant messages during cross-provider transforms
714
+
715
+ ### Security
716
+
717
+ - Enhanced AWS credential detection to support ECS task roles and IRSA web identity tokens
718
+
719
+ ## [6.8.2] - 2026-01-21
720
+
721
+ ### Fixed
722
+
723
+ - Improved error handling for aborted requests in Google Gemini CLI provider
724
+ - Enhanced OAuth callback flow to handle manual input errors gracefully
725
+ - Fixed login cancellation handling in GitHub Copilot OAuth flow
726
+ - Removed fallback manual input from OpenAI Codex OAuth flow
727
+
728
+ ### Security
729
+
730
+ - Hardened database file permissions to prevent credential leakage
731
+ - Set secure directory permissions (0o700) for credential storage
732
+
733
+ ## [6.8.0] - 2026-01-20
734
+
735
+ ### Added
736
+
737
+ - Added `logout` command to CLI for OAuth provider logout
738
+ - Added `status` command to show logged-in providers and token expiry
739
+ - Added persistent credential storage using SQLite database
740
+ - Added OAuth callback server with automatic port fallback
741
+ - Added HTML callback page with success/error states
742
+ - Added support for Cursor OAuth provider
743
+
744
+ ### Changed
745
+
746
+ - Updated Promise.withResolvers usage for better compatibility
747
+ - Replaced custom sleep implementations with Bun.sleep and abortableSleep
748
+ - Simplified SSE stream parsing using readLines utility
749
+ - Updated test framework from vitest to bun:test
750
+ - Replaced temp directory creation with TempDir API
751
+ - Changed credential storage from auth.json to ~/.omp/agent/agent.db
752
+ - Changed CLI command examples from npx to bunx
753
+ - Refactored OAuth flows to use common callback server base class
754
+ - Updated OAuth provider interfaces to use controller pattern
755
+
756
+ ### Fixed
757
+
758
+ - Fixed OAuth callback handling with improved error states
759
+ - Fixed token refresh for all OAuth providers
760
+
761
+ ## [6.7.670] - 2026-01-19
762
+
763
+ ### Changed
764
+
765
+ - Updated Claude Code compatibility headers and version
766
+ - Improved OAuth token handling with proper state generation
767
+ - Enhanced cache control for tool and user message blocks
768
+ - Simplified tool name prefixing for OAuth traffic
769
+ - Updated PKCE verifier generation for better security
770
+
771
+ ## [5.7.67] - 2026-01-18
772
+
773
+ ### Fixed
774
+
775
+ - Added error handling for unknown OAuth providers
776
+
777
+ ## [5.6.77] - 2026-01-18
778
+
779
+ ### Fixed
780
+
781
+ - Prevented duplicate tool results for errored or aborted messages when results already exist
782
+
783
+ ## [5.6.7] - 2026-01-18
784
+
785
+ ### Added
786
+
787
+ - Added automatic retry logic for OpenAI Codex responses with configurable delay and max retries
788
+ - Added tool call ID sanitization for Amazon Bedrock to ensure valid characters
789
+ - Added tool argument validation that coerces JSON-encoded strings for expected non-string types
790
+
791
+ ### Changed
792
+
793
+ - Updated environment variable prefix from PI* to OMP* for better consistency
794
+ - Added automatic migration for legacy PI* environment variables to OMP* equivalents
795
+ - Adjusted Bedrock Claude thinking budgets to reserve output tokens when maxTokens is too low
796
+
797
+ ### Fixed
798
+
799
+ - Fixed orphaned tool call handling to ensure proper tool_use/tool_result pairing for all assistant messages
800
+ - Fixed message transformation to insert synthetic tool results for errored/aborted assistant messages with tool calls
801
+ - Fixed tool prefix handling in Claude provider to use case-insensitive comparison
802
+ - Fixed Gemini 3 model handling to treat unsigned tool calls as context-only with anti-mimicry context
803
+ - Fixed message transformation to filter out empty error messages from conversation history
804
+ - Fixed OpenAI completions provider compatibility detection to use provider metadata
805
+ - Fixed OpenAI completions provider to avoid using developer role for opencode provider
806
+ - Fixed orphaned tool call handling to skip synthetic results for errored assistant messages
807
+
808
+ ## [5.5.0] - 2026-01-18
809
+
810
+ ### Changed
811
+
812
+ - Updated User-Agent header from 'opencode' to 'pi' for OpenAI Codex requests
813
+ - Simplified Codex system prompt instructions
814
+ - Removed bridge text override from Codex system prompt builder
815
+
816
+ ## [5.3.0] - 2026-01-15
817
+
818
+ ### Changed
819
+
820
+ - Replaced detailed Codex system instructions with simplified pi assistant instructions
821
+ - Updated internal documentation references to use pi-internal:// protocol
822
+
823
+ ## [5.1.0] - 2026-01-14
824
+
825
+ ### Added
826
+
827
+ - Added Amazon Bedrock provider with `bedrock-converse-stream` API for Claude models via AWS
828
+ - Added MiniMax provider with OpenAI-compatible API
829
+ - Added EU cross-region inference model variants for Claude models on Bedrock
830
+
831
+ ### Fixed
832
+
833
+ - Fixed Gemini CLI provider retries with proper error handling, retry delays from headers, and empty stream retry logic
834
+ - Fixed numbered list items showing "1." for all items when code blocks break list continuity (via `start` property)
835
+
836
+ ## [5.0.0] - 2026-01-12
837
+
838
+ ### Added
839
+
840
+ - Added support for `xhigh` thinking level in `thinkingBudgets` configuration
841
+
842
+ ### Changed
843
+
844
+ - Changed Anthropic thinking token budgets: minimal (1024→3072), low (2048→6144), medium (8192→12288), high (16384→24576)
845
+ - Changed Google thinking token budgets: minimal (1024), low (2048→4096), medium (8192), high (16384), xhigh (24575)
846
+ - Changed `supportsXhigh()` to return true for all Anthropic models
847
+
848
+ ## [4.6.0] - 2026-01-12
849
+
850
+ ### Fixed
851
+
852
+ - Fixed incorrect classification of thought signatures in Google Gemini responses—thought signatures are now correctly treated as metadata rather than thinking content indicators
853
+ - Fixed thought signature handling in Google Gemini CLI and Vertex AI streaming to properly preserve signatures across text deltas
854
+ - Fixed Google schema sanitization stripping property names that match schema keywords (e.g., "pattern", "format") from tool definitions
855
+
856
+ ## [4.4.9] - 2026-01-12
857
+
858
+ ### Fixed
859
+
860
+ - Fixed Google provider schema sanitization to strip additional unsupported JSON Schema fields (patternProperties, additionalProperties, min/max constraints, pattern, format)
861
+
862
+ ## [4.4.8] - 2026-01-12
863
+
864
+ ### Fixed
865
+
866
+ - Fixed Google provider schema sanitization to properly collapse `anyOf`/`oneOf` with const values into enum arrays
867
+ - Fixed const-to-enum conversion to infer type from the const value when type is not specified
868
+
869
+ ## [4.4.6] - 2026-01-11
870
+
871
+ ### Fixed
872
+
873
+ - Fixed tool parameter schema sanitization to only apply Google-specific transformations for Gemini models, preserving original schemas for other model types
874
+
875
+ ## [4.4.5] - 2026-01-11
876
+
877
+ ### Changed
878
+
879
+ - Exported `sanitizeSchemaForGoogle` utility function for external use
880
+
881
+ ### Fixed
882
+
883
+ - Fixed Google provider schema sanitization to strip additional unsupported JSON Schema fields ($schema, $ref, $defs, format, examples, and others)
884
+ - Fixed Google provider to ignore `additionalProperties: false` which is unsupported by the API
885
+
886
+ ## [4.4.4] - 2026-01-11
887
+
888
+ ### Fixed
889
+
890
+ - Fixed Cursor todo updates to bridge update_todos tool calls to the local todo_write tool
891
+
892
+ ## [4.3.0] - 2026-01-11
893
+
894
+ ### Added
895
+
896
+ - Added debug log filtering and display script for Cursor JSONL logs with follow mode and coalescing support
897
+ - Added protobuf definition extractor script to reconstruct .proto files from bundled JavaScript
898
+ - Added conversation state caching to persist context across multiple Cursor API requests in the same session
899
+ - Added shell streaming support for real-time stdout/stderr output during command execution
900
+ - Added JSON5 parsing for MCP tool arguments with Python-style boolean and None value normalization
901
+ - Added Cursor provider with support for Claude, GPT, and Gemini models via Cursor's agent API
902
+ - Added OAuth authentication flow for Cursor including login, token refresh, and expiry detection
903
+ - Added `cursor-agent` API type with streaming support and tool execution handlers
904
+ - Added Cursor model definitions including Claude 4.5, GPT-5.x, Gemini 3, and Grok variants
905
+ - Added model generation script to automatically fetch and update AI model definitions from models.dev and OpenRouter APIs
906
+
907
+ ### Changed
908
+
909
+ - Changed Cursor debug logging to use structured JSONL format with automatic MCP argument decoding
910
+ - Changed MCP tool argument decoding to use protobuf Value schema for improved type handling
911
+ - Changed tool advertisement to filter Cursor native tools (bash, read, write, delete, ls, grep, lsp) instead of only exposing mcp\_ prefixed tools
912
+
913
+ ### Fixed
914
+
915
+ - Fixed Cursor conversation history serialization so subagents retain task context and can call complete
916
+
917
+ ## [4.2.1] - 2026-01-11
918
+
919
+ ### Changed
920
+
921
+ - Updated `reasoningSummary` option to accept only `"auto"`, `"concise"`, `"detailed"`, or `null` (removed `"off"` and `"on"` values)
922
+ - Changed default `reasoningSummary` from `"auto"` to `"detailed"`
923
+ - OpenAI Codex: switched to bundled system prompt matching opencode, changed originator to "opencode", simplified prompt handling
924
+
925
+ ### Fixed
926
+
927
+ - Fixed Cloud Code Assist tool schema conversion to avoid unsupported `const` fields
928
+
929
+ ## [4.0.0] - 2026-01-10
930
+
931
+ ### Added
932
+
933
+ - Added `betas` option in `AnthropicOptions` for passing custom Anthropic beta feature flags
934
+ - OpenCode Zen provider support with 26 models (Claude, GPT, Gemini, Grok, Kimi, GLM, Qwen, etc.). Set `OPENCODE_API_KEY` env var to use.
935
+ - `thinkingBudgets` option in `SimpleStreamOptions` for customizing token budgets per thinking level on token-based providers
936
+ - `sessionId` option in `StreamOptions` for providers that support session-based caching. OpenAI Codex provider uses this to set `prompt_cache_key` and routing headers.
937
+ - `supportsUsageInStreaming` compatibility flag for OpenAI-compatible providers that reject `stream_options: { include_usage: true }`. Defaults to `true`. Set to `false` in model config for providers like gatewayz.ai.
938
+ - `GOOGLE_APPLICATION_CREDENTIALS` env var support for Vertex AI credential detection (standard for CI/production)
939
+ - Exported OpenAI Codex utilities: `CacheMetadata`, `getCodexInstructions`, `getModelFamily`, `ModelFamily`, `buildCodexPiBridge`, `buildCodexSystemPrompt`, `CodexSystemPrompt`
940
+ - Headless OAuth support for all callback-server providers (Google Gemini CLI, Antigravity, OpenAI Codex): paste redirect URL when browser callback is unreachable
941
+ - Cancellable GitHub Copilot device code polling via AbortSignal
942
+ - Improved error messages for OpenRouter providers by including raw metadata from upstream errors
943
+
944
+ ### Changed
945
+
946
+ - Changed Anthropic provider to include Claude Code system instruction for all API key types, not just OAuth tokens (except Haiku models)
947
+ - Changed Anthropic OAuth tool naming to use `proxy_` prefix instead of mapping to Claude Code tool names, avoiding potential name collisions
948
+ - Changed Anthropic provider to include Claude Code headers for all requests, not just OAuth tokens
949
+ - Anthropic provider now maps tool names to Claude Code's exact tool names (Read, Write, Edit, Bash, Grep, Glob) instead of using prefixed names
950
+ - OpenAI Completions provider now disables strict mode on tools to allow optional parameters without null unions
951
+
952
+ ### Fixed
953
+
954
+ - Fixed Anthropic OAuth code parsing to accept full redirect URLs in addition to raw authorization codes
955
+ - Fixed Anthropic token refresh to preserve existing refresh token when server doesn't return a new one
956
+ - Fixed thinking mode being enabled when tool_choice forces a specific tool, which is unsupported
957
+ - Fixed max_tokens being too low when thinking budget is set, now auto-adjusts to model's maxTokens
958
+ - Google Cloud Code Assist OAuth for paid subscriptions: properly handles long-running operations for project provisioning, supports `GOOGLE_CLOUD_PROJECT` / `GOOGLE_CLOUD_PROJECT_ID` env vars for paid tiers
959
+ - `os.homedir()` calls at module load time; now resolved lazily when needed
960
+ - OpenAI Responses tool strict flag to use a boolean for LM Studio compatibility
961
+ - Gemini CLI abort handling: detect native `AbortError` in retry catch block, cancel SSE reader when abort signal fires
962
+ - Antigravity provider 429 errors by aligning request payload with CLIProxyAPI v6.6.89
963
+ - Thinking block handling for cross-model conversations: thinking blocks are now converted to plain text when switching models
964
+ - OpenAI Codex context window from 400,000 to 272,000 tokens to match Codex CLI defaults
965
+ - Codex SSE error events to surface message, code, and status
966
+ - Context overflow detection for `context_length_exceeded` error codes
967
+ - Codex provider now always includes `reasoning.encrypted_content` even when custom `include` options are passed
968
+ - Codex requests now omit the `reasoning` field entirely when thinking is off
969
+ - Crash when pasting text with trailing whitespace exceeding terminal width
970
+
971
+ ## [3.37.1] - 2026-01-10
972
+
973
+ ### Added
974
+
975
+ - Added automatic type coercion for tool arguments when LLMs return JSON-encoded strings instead of native types (numbers, booleans, arrays, objects)
976
+
977
+ ### Changed
978
+
979
+ - Changed tool argument validation to attempt JSON parsing and type coercion before rejecting mismatched types
980
+ - Changed validation error messages to include both original and normalized arguments when coercion was attempted
981
+
982
+ ## [3.37.0] - 2026-01-10
983
+
984
+ ### Changed
985
+
986
+ - Enabled type coercion in JSON schema validation to automatically convert compatible types
987
+
988
+ ## [3.35.0] - 2026-01-09
989
+
990
+ ### Added
991
+
992
+ - Enhanced error messages to include retry-after timing information from API rate limit headers
993
+
994
+ ## [0.42.0] - 2026-01-09
995
+
996
+ ### Added
997
+
998
+ - Added OpenCode Zen provider support with 26 models (Claude, GPT, Gemini, Grok, Kimi, GLM, Qwen, etc.). Set `OPENCODE_API_KEY` env var to use.
999
+
1000
+ ## [0.39.0] - 2026-01-08
1001
+
1002
+ ### Fixed
1003
+
1004
+ - Fixed Gemini CLI abort handling: detect native `AbortError` in retry catch block, cancel SSE reader when abort signal fires ([#568](https://github.com/badlogic/pi-mono/pull/568) by [@tmustier](https://github.com/tmustier))
1005
+ - Fixed Antigravity provider 429 errors by aligning request payload with CLIProxyAPI v6.6.89: inject Antigravity system instruction with `role: "user"`, set `requestType: "agent"`, and use `antigravity` userAgent. Added bridge prompt to override Antigravity behavior (identity, paths, web dev guidelines) with Pi defaults. ([#571](https://github.com/badlogic/pi-mono/pull/571) by [@ben-vargas](https://github.com/ben-vargas))
1006
+ - Fixed thinking block handling for cross-model conversations: thinking blocks are now converted to plain text (no `<thinking>` tags) when switching models. Previously, `<thinking>` tags caused models to mimic the pattern and output literal tags. Also fixed empty thinking blocks causing API errors. ([#561](https://github.com/badlogic/pi-mono/issues/561))
1007
+
1008
+ ## [0.38.0] - 2026-01-08
1009
+
1010
+ ### Added
1011
+
1012
+ - `thinkingBudgets` option in `SimpleStreamOptions` for customizing token budgets per thinking level on token-based providers ([#529](https://github.com/badlogic/pi-mono/pull/529) by [@melihmucuk](https://github.com/melihmucuk))
1013
+
1014
+ ### Breaking Changes
1015
+
1016
+ - Removed OpenAI Codex model aliases (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `codex-mini-latest`, `gpt-5-codex`, `gpt-5.1-codex`, `gpt-5.1-chat-latest`). Use canonical model IDs: `gpt-5.1`, `gpt-5.1-codex-max`, `gpt-5.1-codex-mini`, `gpt-5.2`, `gpt-5.2-codex`. ([#536](https://github.com/badlogic/pi-mono/pull/536) by [@ghoulr](https://github.com/ghoulr))
1017
+
1018
+ ### Fixed
1019
+
1020
+ - Fixed OpenAI Codex context window from 400,000 to 272,000 tokens to match Codex CLI defaults and prevent 400 errors. ([#536](https://github.com/badlogic/pi-mono/pull/536) by [@ghoulr](https://github.com/ghoulr))
1021
+ - Fixed Codex SSE error events to surface message, code, and status. ([#551](https://github.com/badlogic/pi-mono/pull/551) by [@tmustier](https://github.com/tmustier))
1022
+ - Fixed context overflow detection for `context_length_exceeded` error codes.
1023
+
1024
+ ## [0.37.6] - 2026-01-06
1025
+
1026
+ ### Added
1027
+
1028
+ - Exported OpenAI Codex utilities: `CacheMetadata`, `getCodexInstructions`, `getModelFamily`, `ModelFamily`, `buildCodexPiBridge`, `buildCodexSystemPrompt`, `CodexSystemPrompt` ([#510](https://github.com/badlogic/pi-mono/pull/510) by [@mitsuhiko](https://github.com/mitsuhiko))
1029
+
1030
+ ## [0.37.3] - 2026-01-06
1031
+
1032
+ ### Added
1033
+
1034
+ - `sessionId` option in `StreamOptions` for providers that support session-based caching. OpenAI Codex provider uses this to set `prompt_cache_key` and routing headers.
1035
+
1036
+ ## [0.37.2] - 2026-01-05
1037
+
1038
+ ### Fixed
1039
+
1040
+ - Codex provider now always includes `reasoning.encrypted_content` even when custom `include` options are passed ([#484](https://github.com/badlogic/pi-mono/pull/484) by [@kim0](https://github.com/kim0))
1041
+
1042
+ ## [0.37.0] - 2026-01-05
1043
+
1044
+ ### Breaking Changes
1045
+
1046
+ - OpenAI Codex models no longer have per-thinking-level variants (e.g., `gpt-5.2-codex-high`). Use the base model ID and set thinking level separately. The Codex provider clamps reasoning effort to what each model supports internally. (initial implementation by [@ben-vargas](https://github.com/ben-vargas) in [#472](https://github.com/badlogic/pi-mono/pull/472))
1047
+
1048
+ ### Added
1049
+
1050
+ - Headless OAuth support for all callback-server providers (Google Gemini CLI, Antigravity, OpenAI Codex): paste redirect URL when browser callback is unreachable ([#428](https://github.com/badlogic/pi-mono/pull/428) by [@ben-vargas](https://github.com/ben-vargas), [#468](https://github.com/badlogic/pi-mono/pull/468) by [@crcatala](https://github.com/crcatala))
1051
+ - Cancellable GitHub Copilot device code polling via AbortSignal
1052
+
1053
+ ### Fixed
1054
+
1055
+ - Codex requests now omit the `reasoning` field entirely when thinking is off, letting the backend use its default instead of forcing a value. ([#472](https://github.com/badlogic/pi-mono/pull/472))
1056
+
1057
+ ## [0.36.0] - 2026-01-05
1058
+
1059
+ ### Added
1060
+
1061
+ - OpenAI Codex OAuth provider with Responses API streaming support: `openai-codex-responses` streaming provider with SSE parsing, tool-call handling, usage/cost tracking, and PKCE OAuth flow ([#451](https://github.com/badlogic/pi-mono/pull/451) by [@kim0](https://github.com/kim0))
1062
+
1063
+ ### Fixed
1064
+
1065
+ - Vertex AI dummy value for `getEnvApiKey()`: Returns `"<authenticated>"` when Application Default Credentials are configured (`~/.config/gcloud/application_default_credentials.json` exists) and both `GOOGLE_CLOUD_PROJECT` (or `GCLOUD_PROJECT`) and `GOOGLE_CLOUD_LOCATION` are set. This allows `streamSimple()` to work with Vertex AI without explicit `apiKey` option. The ADC credentials file existence check is cached per-process to avoid repeated filesystem access.
1066
+
1067
+ ## [0.32.3] - 2026-01-03
1068
+
1069
+ ### Fixed
1070
+
1071
+ - Google Vertex AI models no longer appear in available models list without explicit authentication. Previously, `getEnvApiKey()` returned a dummy value for `google-vertex`, causing models to show up even when Google Cloud ADC was not configured.
1072
+
1073
+ ## [0.32.0] - 2026-01-03
1074
+
1075
+ ### Added
1076
+
1077
+ - Vertex AI provider with ADC (Application Default Credentials) support. Authenticate with `gcloud auth application-default login`, set `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`, and access Gemini models via Vertex AI. ([#300](https://github.com/badlogic/pi-mono/pull/300) by [@default-anton](https://github.com/default-anton))
1078
+
1079
+ ### Fixed
1080
+
1081
+ - **Gemini CLI rate limit handling**: Added automatic retry with server-provided delay for 429 errors. Parses delay from error messages like "Your quota will reset after 39s" and waits accordingly. Falls back to exponential backoff for other transient errors. ([#370](https://github.com/badlogic/pi-mono/issues/370))
1082
+
1083
+ ## [0.31.0] - 2026-01-02
1084
+
1085
+ ### Breaking Changes
1086
+
1087
+ - **Agent API moved**: All agent functionality (`agentLoop`, `agentLoopContinue`, `AgentContext`, `AgentEvent`, `AgentTool`, `AgentToolResult`, etc.) has moved to `@mariozechner/pi-agent-core`. Import from that package instead of `@oh-my-pi/pi-ai`.
1088
+
1089
+ ### Added
1090
+
1091
+ - **`GoogleThinkingLevel` type**: Exported type that mirrors Google's `ThinkingLevel` enum values (`"THINKING_LEVEL_UNSPECIFIED" | "MINIMAL" | "LOW" | "MEDIUM" | "HIGH"`). Allows configuring Gemini thinking levels without importing from `@google/genai`.
1092
+ - **`ANTHROPIC_OAUTH_TOKEN` env var**: Now checked before `ANTHROPIC_API_KEY` in `getEnvApiKey()`, allowing OAuth tokens to take precedence.
1093
+ - **`event-stream.js` export**: `AssistantMessageEventStream` utility now exported from package index.
1094
+
1095
+ ### Changed
1096
+
1097
+ - **OAuth uses Web Crypto API**: PKCE generation and OAuth flows now use Web Crypto API (`crypto.subtle`) instead of Node.js `crypto` module. This improves browser compatibility while still working in Node.js 20+.
1098
+ - **Deterministic model generation**: `generate-models.ts` now sorts providers and models alphabetically for consistent output across runs. ([#332](https://github.com/badlogic/pi-mono/pull/332) by [@mrexodia](https://github.com/mrexodia))
1099
+
1100
+ ### Fixed
1101
+
1102
+ - **OpenAI completions empty content blocks**: Empty text or thinking blocks in assistant messages are now filtered out before sending to the OpenAI completions API, preventing validation errors. ([#344](https://github.com/badlogic/pi-mono/pull/344) by [@default-anton](https://github.com/default-anton))
1103
+ - **Thinking token duplication**: Fixed thinking content duplication with chutes.ai provider. The provider was returning thinking content in both `reasoning_content` and `reasoning` fields, causing each chunk to be processed twice. Now only the first non-empty reasoning field is used.
1104
+ - **zAi provider API mapping**: Fixed zAi models to use `openai-completions` API with correct base URL (`https://api.z.ai/api/coding/paas/v4`) instead of incorrect Anthropic API mapping. ([#344](https://github.com/badlogic/pi-mono/pull/344), [#358](https://github.com/badlogic/pi-mono/pull/358) by [@default-anton](https://github.com/default-anton))
1105
+
1106
+ ## [0.28.0] - 2025-12-25
1107
+
1108
+ ### Breaking Changes
1109
+
1110
+ - **OAuth storage removed** ([#296](https://github.com/badlogic/pi-mono/issues/296)): All storage functions (`loadOAuthCredentials`, `saveOAuthCredentials`, `setOAuthStorage`, etc.) removed. Callers are responsible for storing credentials.
1111
+ - **OAuth login functions**: `loginAnthropic`, `loginGitHubCopilot`, `loginGeminiCli`, `loginAntigravity` now return `OAuthCredentials` instead of saving to disk.
1112
+ - **refreshOAuthToken**: Now takes `(provider, credentials)` and returns new `OAuthCredentials` instead of saving.
1113
+ - **getOAuthApiKey**: Now takes `(provider, credentials)` and returns `{ newCredentials, apiKey }` or null.
1114
+ - **OAuthCredentials type**: No longer includes `type: "oauth"` discriminator. Callers add discriminator when storing.
1115
+ - **setApiKey, resolveApiKey**: Removed. Callers must manage their own API key storage/resolution.
1116
+ - **getApiKey**: Renamed to `getEnvApiKey`. Only checks environment variables for known providers.
1117
+
1118
+ ## [0.27.7] - 2025-12-24
1119
+
1120
+ ### Fixed
1121
+
1122
+ - **Thinking tag leakage**: Fixed Claude mimicking literal `</thinking>` tags in responses. Unsigned thinking blocks (from aborted streams) are now converted to plain text without `<thinking>` tags. The TUI still displays them as thinking blocks. ([#302](https://github.com/badlogic/pi-mono/pull/302) by [@nicobailon](https://github.com/nicobailon))
1123
+
1124
+ ## [0.25.1] - 2025-12-21
1125
+
1126
+ ### Added
1127
+
1128
+ - **xhigh thinking level support**: Added `supportsXhigh()` function to check if a model supports xhigh reasoning level. Also clamps xhigh to high for OpenAI models that don't support it. ([#236](https://github.com/badlogic/pi-mono/pull/236) by [@theBucky](https://github.com/theBucky))
1129
+
1130
+ ### Fixed
1131
+
1132
+ - **Gemini multimodal tool results**: Fixed images in tool results causing flaky/broken responses with Gemini models. For Gemini 3, images are now nested inside `functionResponse.parts` per the [docs](https://ai.google.dev/gemini-api/docs/function-calling#multimodal). For older models (which don't support multimodal function responses), images are sent in a separate user message.
1133
+
1134
+ - **Queued message steering**: When `getQueuedMessages` is provided, the agent loop now checks for queued user messages after each tool call and skips remaining tool calls in the current assistant message when a queued message arrives (emitting error tool results).
1135
+
1136
+ - **Double API version path in Google provider URL**: Fixed Gemini API calls returning 404 after baseUrl support was added. The SDK was appending its default apiVersion to baseUrl which already included the version path. ([#251](https://github.com/badlogic/pi-mono/pull/251) by [@shellfyred](https://github.com/shellfyred))
1137
+
1138
+ - **Anthropic SDK retries disabled**: Re-enabled SDK-level retries (default 2) for transient HTTP failures. ([#252](https://github.com/badlogic/pi-mono/issues/252))
1139
+
1140
+ ## [0.23.5] - 2025-12-19
1141
+
1142
+ ### Added
1143
+
1144
+ - **Gemini 3 Flash thinking support**: Extended thinking level support for Gemini 3 Flash models (MINIMAL, LOW, MEDIUM, HIGH) to match Pro models' capabilities. ([#212](https://github.com/badlogic/pi-mono/pull/212) by [@markusylisiurunen](https://github.com/markusylisiurunen))
1145
+
1146
+ - **GitHub Copilot thinking models**: Added thinking support for additional Copilot models (o3-mini, o1-mini, o1-preview). ([#234](https://github.com/badlogic/pi-mono/pull/234) by [@aadishv](https://github.com/aadishv))
1147
+
1148
+ ### Fixed
1149
+
1150
+ - **Gemini tool result format**: Fixed tool result format for Gemini 3 Flash Preview which strictly requires `{ output: value }` for success and `{ error: value }` for errors. Previous format using `{ result, isError }` was rejected by newer Gemini models. Also improved type safety by removing `as any` casts. ([#213](https://github.com/badlogic/pi-mono/issues/213), [#220](https://github.com/badlogic/pi-mono/pull/220))
1151
+
1152
+ - **Google baseUrl configuration**: Google provider now respects `baseUrl` configuration for custom endpoints or API proxies. ([#216](https://github.com/badlogic/pi-mono/issues/216), [#221](https://github.com/badlogic/pi-mono/pull/221) by [@theBucky](https://github.com/theBucky))
1153
+
1154
+ - **GitHub Copilot vision requests**: Added `Copilot-Vision-Request` header when sending images to GitHub Copilot models. ([#222](https://github.com/badlogic/pi-mono/issues/222))
1155
+
1156
+ - **GitHub Copilot X-Initiator header**: Fixed X-Initiator logic to check last message role instead of any message in history. This ensures proper billing when users send follow-up messages. ([#209](https://github.com/badlogic/pi-mono/issues/209))
1157
+
1158
+ ## [0.22.3] - 2025-12-16
1159
+
1160
+ ### Added
1161
+
1162
+ - **Image limits test suite**: Added comprehensive tests for provider-specific image limitations (max images, max size, max dimensions). Discovered actual limits: Anthropic (100 images, 5MB, 8000px), OpenAI (500 images, ≥25MB), Gemini (~2500 images, ≥40MB), Mistral (8 images, ~15MB), OpenRouter (~40 images context-limited, ~15MB). ([#120](https://github.com/badlogic/pi-mono/pull/120))
1163
+
1164
+ - **Tool result streaming**: Added `tool_execution_update` event and optional `onUpdate` callback to `AgentTool.execute()` for streaming tool output during execution. Tools can now emit partial results (e.g., bash stdout) that are forwarded to subscribers. ([#44](https://github.com/badlogic/pi-mono/issues/44))
1165
+
1166
+ - **X-Initiator header for GitHub Copilot**: Added X-Initiator header handling for GitHub Copilot provider to ensure correct call accounting (agent calls are not deducted from quota). Sets initiator based on last message role. ([#200](https://github.com/badlogic/pi-mono/pull/200) by [@kim0](https://github.com/kim0))
1167
+
1168
+ ### Changed
1169
+
1170
+ - **Normalized tool_execution_end result**: `tool_execution_end` event now always contains `AgentToolResult` (no longer `AgentToolResult | string`). Errors are wrapped in the standard result format.
1171
+
1172
+ ### Fixed
1173
+
1174
+ - **Reasoning disabled by default**: When `reasoning` option is not specified, thinking is now explicitly disabled for all providers. Previously, some providers like Gemini with "dynamic thinking" would use their default (thinking ON), causing unexpected token usage. This was the original intended behavior. ([#180](https://github.com/badlogic/pi-mono/pull/180) by [@markusylisiurunen](https://github.com/markusylisiurunen))
1175
+
1176
+ ## [0.22.2] - 2025-12-15
1177
+
1178
+ ### Added
1179
+
1180
+ - **Interleaved thinking for Anthropic**: Added `interleavedThinking` option to `AnthropicOptions`. When enabled, Claude 4 models can think between tool calls and reason after receiving tool results. Enabled by default (no extra token cost, just unlocks the capability). Set `interleavedThinking: false` to disable.
1181
+
1182
+ ## [0.22.1] - 2025-12-15
1183
+
1184
+ _Dedicated to Peter's shoulder ([@steipete](https://twitter.com/steipete))_
1185
+
1186
+ ### Added
1187
+
1188
+ - **Interleaved thinking for Anthropic**: Enabled interleaved thinking in the Anthropic provider, allowing Claude models to output thinking blocks interspersed with text responses.
1189
+
1190
+ ## [0.22.0] - 2025-12-15
1191
+
1192
+ ### Added
1193
+
1194
+ - **GitHub Copilot provider**: Added `github-copilot` as a known provider with models sourced from models.dev. Includes Claude, GPT, Gemini, Grok, and other models available through GitHub Copilot. ([#191](https://github.com/badlogic/pi-mono/pull/191) by [@cau1k](https://github.com/cau1k))
1195
+
1196
+ ### Fixed
1197
+
1198
+ - **GitHub Copilot gpt-5 models**: Fixed API selection for gpt-5 models to use `openai-responses` instead of `openai-completions` (gpt-5 models are not accessible via completions endpoint)
1199
+
1200
+ - **GitHub Copilot cross-model context handoff**: Fixed context handoff failing when switching between GitHub Copilot models using different APIs (e.g., gpt-5 to claude-sonnet-4). Tool call IDs from OpenAI Responses API were incompatible with other models. ([#198](https://github.com/badlogic/pi-mono/issues/198))
1201
+
1202
+ - **Gemini 3 Pro thinking levels**: Thinking level configuration now works correctly for Gemini 3 Pro models. Previously all levels mapped to -1 (minimal thinking). Now LOW/MEDIUM/HIGH properly control test-time computation. ([#176](https://github.com/badlogic/pi-mono/pull/176) by [@markusylisiurunen](https://github.com/markusylisiurunen))
1203
+
1204
+ ## [0.18.2] - 2025-12-11
1205
+
1206
+ ### Changed
1207
+
1208
+ - **Anthropic SDK retries disabled**: Set `maxRetries: 0` on Anthropic client to allow application-level retry handling. The SDK's built-in retries were interfering with coding-agent's retry logic. ([#157](https://github.com/badlogic/pi-mono/issues/157))
1209
+
1210
+ ## [0.18.1] - 2025-12-10
1211
+
1212
+ ### Added
1213
+
1214
+ - **Mistral provider**: Added support for Mistral AI models via the OpenAI-compatible API. Includes automatic handling of Mistral-specific requirements (tool call ID format). Set `MISTRAL_API_KEY` environment variable to use.
1215
+
1216
+ ### Fixed
1217
+
1218
+ - Fixed Mistral 400 errors after aborted assistant messages by skipping empty assistant messages (no content, no tool calls) ([#165](https://github.com/badlogic/pi-mono/issues/165))
1219
+
1220
+ - Removed synthetic assistant bridge message after tool results for Mistral (no longer required as of Dec 2025) ([#165](https://github.com/badlogic/pi-mono/issues/165))
1221
+
1222
+ - Fixed bug where `ANTHROPIC_API_KEY` environment variable was deleted globally after first OAuth token usage, causing subsequent prompts to fail ([#164](https://github.com/badlogic/pi-mono/pull/164))
1223
+
1224
+ ## [0.17.0] - 2025-12-09
1225
+
1226
+ ### Added
1227
+
1228
+ - **`agentLoopContinue` function**: Continue an agent loop from existing context without adding a new user message. Validates that the last message is `user` or `toolResult`. Useful for retry after context overflow or resuming from manually-added tool results.
1229
+
1230
+ ### Breaking Changes
1231
+
1232
+ - Removed provider-level tool argument validation. Validation now happens in `agentLoop` via `executeToolCalls`, allowing models to retry on validation errors. For manual tool execution, use `validateToolCall(tools, toolCall)` or `validateToolArguments(tool, toolCall)`.
1233
+
1234
+ ### Added
1235
+
1236
+ - Added `validateToolCall(tools, toolCall)` helper that finds the tool by name and validates arguments.
1237
+
1238
+ - **OpenAI compatibility overrides**: Added `compat` field to `Model` for `openai-completions` API, allowing explicit configuration of provider quirks (`supportsStore`, `supportsDeveloperRole`, `supportsReasoningEffort`, `maxTokensField`). Falls back to URL-based detection if not set. Useful for LiteLLM, custom proxies, and other non-standard endpoints. ([#133](https://github.com/badlogic/pi-mono/issues/133), thanks @fink-andreas for the initial idea and PR)
1239
+
1240
+ - **xhigh reasoning level**: Added `xhigh` to `ReasoningEffort` type for OpenAI codex-max models. For non-OpenAI providers (Anthropic, Google), `xhigh` is automatically mapped to `high`. ([#143](https://github.com/badlogic/pi-mono/issues/143))
1241
+
1242
+ ### Changed
1243
+
1244
+ - **Updated SDK versions**: OpenAI SDK 5.21.0 → 6.10.0, Anthropic SDK 0.61.0 → 0.71.2, Google GenAI SDK 1.30.0 → 1.31.0
1245
+
1246
+ ## [0.13.0] - 2025-12-06
1247
+
1248
+ ### Breaking Changes
1249
+
1250
+ - **Added `totalTokens` field to `Usage` type**: All code that constructs `Usage` objects must now include the `totalTokens` field. This field represents the total tokens processed by the LLM (input + output + cache). For OpenAI and Google, this uses native API values (`total_tokens`, `totalTokenCount`). For Anthropic, it's computed as `input + output + cacheRead + cacheWrite`.
1251
+
1252
+ ## [0.12.10] - 2025-12-04
1253
+
1254
+ ### Added
1255
+
1256
+ - Added `gpt-5.1-codex-max` model support
1257
+
1258
+ ### Fixed
1259
+
1260
+ - **OpenAI Token Counting**: Fixed `usage.input` to exclude cached tokens for OpenAI providers. Previously, `input` included cached tokens, causing double-counting when calculating total context size via `input + cacheRead`. Now `input` represents non-cached input tokens across all providers, making `input + output + cacheRead + cacheWrite` the correct formula for total context size.
1261
+
1262
+ - **Fixed Claude Opus 4.5 cache pricing** (was 3x too expensive)
1263
+ - Corrected cache_read: $1.50 → $0.50 per MTok
1264
+ - Corrected cache_write: $18.75 → $6.25 per MTok
1265
+ - Added manual override in `scripts/generate-models.ts` until upstream fix is merged
1266
+ - Submitted PR to models.dev: https://github.com/sst/models.dev/pull/439
1267
+
1268
+ ## [0.9.4] - 2025-11-26
1269
+
1270
+ Initial release with multi-provider LLM support.