@gajae-code/ai 0.1.1

Files changed (349)
  1. package/CHANGELOG.md +2644 -0
  2. package/README.md +1181 -0
  3. package/dist/types/api-registry.d.ts +30 -0
  4. package/dist/types/auth-broker/client.d.ts +66 -0
  5. package/dist/types/auth-broker/index.d.ts +5 -0
  6. package/dist/types/auth-broker/refresher.d.ts +25 -0
  7. package/dist/types/auth-broker/remote-store.d.ts +96 -0
  8. package/dist/types/auth-broker/server.d.ts +32 -0
  9. package/dist/types/auth-broker/types.d.ts +105 -0
  10. package/dist/types/auth-broker/wire-schemas.d.ts +412 -0
  11. package/dist/types/auth-gateway/http.d.ts +39 -0
  12. package/dist/types/auth-gateway/index.d.ts +3 -0
  13. package/dist/types/auth-gateway/server.d.ts +17 -0
  14. package/dist/types/auth-gateway/types.d.ts +115 -0
  15. package/dist/types/auth-storage.d.ts +641 -0
  16. package/dist/types/cli.d.ts +2 -0
  17. package/dist/types/index.d.ts +49 -0
  18. package/dist/types/model-cache.d.ts +17 -0
  19. package/dist/types/model-manager.d.ts +62 -0
  20. package/dist/types/model-thinking.d.ts +71 -0
  21. package/dist/types/models.d.ts +12 -0
  22. package/dist/types/provider-details.d.ts +24 -0
  23. package/dist/types/provider-models/bundled-references.d.ts +4 -0
  24. package/dist/types/provider-models/descriptors.d.ts +48 -0
  25. package/dist/types/provider-models/google.d.ts +20 -0
  26. package/dist/types/provider-models/index.d.ts +5 -0
  27. package/dist/types/provider-models/ollama.d.ts +7 -0
  28. package/dist/types/provider-models/openai-compat.d.ts +237 -0
  29. package/dist/types/provider-models/special.d.ts +16 -0
  30. package/dist/types/providers/amazon-bedrock.d.ts +36 -0
  31. package/dist/types/providers/anthropic-messages-server-schema.d.ts +450 -0
  32. package/dist/types/providers/anthropic-messages-server.d.ts +17 -0
  33. package/dist/types/providers/anthropic.d.ts +188 -0
  34. package/dist/types/providers/aws-credentials.d.ts +43 -0
  35. package/dist/types/providers/aws-eventstream.d.ts +38 -0
  36. package/dist/types/providers/aws-sigv4.d.ts +55 -0
  37. package/dist/types/providers/azure-openai-responses.d.ts +15 -0
  38. package/dist/types/providers/cursor/gen/agent_pb.d.ts +13022 -0
  39. package/dist/types/providers/cursor.d.ts +42 -0
  40. package/dist/types/providers/error-message.d.ts +27 -0
  41. package/dist/types/providers/github-copilot-headers.d.ts +40 -0
  42. package/dist/types/providers/gitlab-duo.d.ts +27 -0
  43. package/dist/types/providers/google-auth.d.ts +24 -0
  44. package/dist/types/providers/google-gemini-cli.d.ts +72 -0
  45. package/dist/types/providers/google-gemini-headers.d.ts +18 -0
  46. package/dist/types/providers/google-shared.d.ts +163 -0
  47. package/dist/types/providers/google-types.d.ts +138 -0
  48. package/dist/types/providers/google-vertex.d.ts +7 -0
  49. package/dist/types/providers/google.d.ts +4 -0
  50. package/dist/types/providers/grammar.d.ts +1 -0
  51. package/dist/types/providers/kimi.d.ts +27 -0
  52. package/dist/types/providers/mock.d.ts +175 -0
  53. package/dist/types/providers/ollama.d.ts +6 -0
  54. package/dist/types/providers/openai-anthropic-shim.d.ts +31 -0
  55. package/dist/types/providers/openai-chat-server-schema.d.ts +814 -0
  56. package/dist/types/providers/openai-chat-server.d.ts +16 -0
  57. package/dist/types/providers/openai-codex/constants.d.ts +26 -0
  58. package/dist/types/providers/openai-codex/request-transformer.d.ts +49 -0
  59. package/dist/types/providers/openai-codex/response-handler.d.ts +17 -0
  60. package/dist/types/providers/openai-codex-responses.d.ts +67 -0
  61. package/dist/types/providers/openai-completions-compat.d.ts +25 -0
  62. package/dist/types/providers/openai-completions.d.ts +33 -0
  63. package/dist/types/providers/openai-responses-server-schema.d.ts +392 -0
  64. package/dist/types/providers/openai-responses-server.d.ts +17 -0
  65. package/dist/types/providers/openai-responses-shared.d.ts +89 -0
  66. package/dist/types/providers/openai-responses.d.ts +32 -0
  67. package/dist/types/providers/pi-native-client.d.ts +13 -0
  68. package/dist/types/providers/pi-native-server.d.ts +68 -0
  69. package/dist/types/providers/register-builtins.d.ts +31 -0
  70. package/dist/types/providers/synthetic.d.ts +26 -0
  71. package/dist/types/providers/transform-messages.d.ts +12 -0
  72. package/dist/types/providers/vision-guard.d.ts +8 -0
  73. package/dist/types/rate-limit-utils.d.ts +19 -0
  74. package/dist/types/stream.d.ts +24 -0
  75. package/dist/types/types.d.ts +746 -0
  76. package/dist/types/usage/claude.d.ts +3 -0
  77. package/dist/types/usage/gemini.d.ts +2 -0
  78. package/dist/types/usage/github-copilot.d.ts +7 -0
  79. package/dist/types/usage/google-antigravity.d.ts +2 -0
  80. package/dist/types/usage/kimi.d.ts +2 -0
  81. package/dist/types/usage/minimax-code.d.ts +2 -0
  82. package/dist/types/usage/openai-codex.d.ts +3 -0
  83. package/dist/types/usage/shared.d.ts +1 -0
  84. package/dist/types/usage/zai.d.ts +2 -0
  85. package/dist/types/usage.d.ts +258 -0
  86. package/dist/types/utils/abort.d.ts +19 -0
  87. package/dist/types/utils/anthropic-auth.d.ts +31 -0
  88. package/dist/types/utils/discovery/antigravity.d.ts +61 -0
  89. package/dist/types/utils/discovery/codex.d.ts +38 -0
  90. package/dist/types/utils/discovery/cursor.d.ts +23 -0
  91. package/dist/types/utils/discovery/gemini.d.ts +25 -0
  92. package/dist/types/utils/discovery/index.d.ts +4 -0
  93. package/dist/types/utils/discovery/openai-compatible.d.ts +72 -0
  94. package/dist/types/utils/event-stream.d.ts +28 -0
  95. package/dist/types/utils/fireworks-model-id.d.ts +10 -0
  96. package/dist/types/utils/foundry.d.ts +1 -0
  97. package/dist/types/utils/h2-fetch.d.ts +22 -0
  98. package/dist/types/utils/http-inspector.d.ts +31 -0
  99. package/dist/types/utils/idle-iterator.d.ts +67 -0
  100. package/dist/types/utils/json-parse.d.ts +10 -0
  101. package/dist/types/utils/oauth/alibaba-coding-plan.d.ts +18 -0
  102. package/dist/types/utils/oauth/anthropic.d.ts +22 -0
  103. package/dist/types/utils/oauth/api-key-login.d.ts +35 -0
  104. package/dist/types/utils/oauth/api-key-validation.d.ts +27 -0
  105. package/dist/types/utils/oauth/callback-server.d.ts +57 -0
  106. package/dist/types/utils/oauth/cerebras.d.ts +1 -0
  107. package/dist/types/utils/oauth/cloudflare-ai-gateway.d.ts +18 -0
  108. package/dist/types/utils/oauth/cursor.d.ts +15 -0
  109. package/dist/types/utils/oauth/deepseek.d.ts +10 -0
  110. package/dist/types/utils/oauth/firepass.d.ts +1 -0
  111. package/dist/types/utils/oauth/fireworks.d.ts +1 -0
  112. package/dist/types/utils/oauth/github-copilot.d.ts +38 -0
  113. package/dist/types/utils/oauth/gitlab-duo.d.ts +3 -0
  114. package/dist/types/utils/oauth/google-antigravity.d.ts +11 -0
  115. package/dist/types/utils/oauth/google-gemini-cli.d.ts +10 -0
  116. package/dist/types/utils/oauth/google-oauth-shared.d.ts +28 -0
  117. package/dist/types/utils/oauth/huggingface.d.ts +19 -0
  118. package/dist/types/utils/oauth/index.d.ts +38 -0
  119. package/dist/types/utils/oauth/kagi.d.ts +17 -0
  120. package/dist/types/utils/oauth/kilo.d.ts +5 -0
  121. package/dist/types/utils/oauth/kimi.d.ts +21 -0
  122. package/dist/types/utils/oauth/litellm.d.ts +18 -0
  123. package/dist/types/utils/oauth/lm-studio.d.ts +17 -0
  124. package/dist/types/utils/oauth/minimax-code.d.ts +28 -0
  125. package/dist/types/utils/oauth/moonshot.d.ts +1 -0
  126. package/dist/types/utils/oauth/nanogpt.d.ts +1 -0
  127. package/dist/types/utils/oauth/nvidia.d.ts +18 -0
  128. package/dist/types/utils/oauth/ollama-cloud.d.ts +2 -0
  129. package/dist/types/utils/oauth/ollama.d.ts +18 -0
  130. package/dist/types/utils/oauth/openai-codex.d.ts +21 -0
  131. package/dist/types/utils/oauth/opencode.d.ts +18 -0
  132. package/dist/types/utils/oauth/parallel.d.ts +17 -0
  133. package/dist/types/utils/oauth/perplexity.d.ts +9 -0
  134. package/dist/types/utils/oauth/pkce.d.ts +8 -0
  135. package/dist/types/utils/oauth/qianfan.d.ts +17 -0
  136. package/dist/types/utils/oauth/qwen-portal.d.ts +19 -0
  137. package/dist/types/utils/oauth/synthetic.d.ts +1 -0
  138. package/dist/types/utils/oauth/tavily.d.ts +17 -0
  139. package/dist/types/utils/oauth/together.d.ts +1 -0
  140. package/dist/types/utils/oauth/types.d.ts +44 -0
  141. package/dist/types/utils/oauth/venice.d.ts +18 -0
  142. package/dist/types/utils/oauth/vercel-ai-gateway.d.ts +18 -0
  143. package/dist/types/utils/oauth/vllm.d.ts +16 -0
  144. package/dist/types/utils/oauth/xiaomi.d.ts +19 -0
  145. package/dist/types/utils/oauth/zai.d.ts +18 -0
  146. package/dist/types/utils/oauth/zenmux.d.ts +1 -0
  147. package/dist/types/utils/overflow.d.ts +54 -0
  148. package/dist/types/utils/parse-bind.d.ts +23 -0
  149. package/dist/types/utils/provider-response.d.ts +3 -0
  150. package/dist/types/utils/retry-after.d.ts +3 -0
  151. package/dist/types/utils/retry.d.ts +26 -0
  152. package/dist/types/utils/schema/adapt.d.ts +24 -0
  153. package/dist/types/utils/schema/compatibility.d.ts +30 -0
  154. package/dist/types/utils/schema/dereference.d.ts +11 -0
  155. package/dist/types/utils/schema/draft.d.ts +10 -0
  156. package/dist/types/utils/schema/equality.d.ts +4 -0
  157. package/dist/types/utils/schema/fields.d.ts +49 -0
  158. package/dist/types/utils/schema/index.d.ts +13 -0
  159. package/dist/types/utils/schema/json-schema-validator.d.ts +12 -0
  160. package/dist/types/utils/schema/meta-validator.d.ts +2 -0
  161. package/dist/types/utils/schema/normalize.d.ts +93 -0
  162. package/dist/types/utils/schema/spill.d.ts +8 -0
  163. package/dist/types/utils/schema/stamps.d.ts +25 -0
  164. package/dist/types/utils/schema/types.d.ts +4 -0
  165. package/dist/types/utils/schema/wire.d.ts +54 -0
  166. package/dist/types/utils/schema/zod-decontaminate.d.ts +31 -0
  167. package/dist/types/utils/sse-debug.d.ts +10 -0
  168. package/dist/types/utils/tool-call-healing.d.ts +71 -0
  169. package/dist/types/utils/tool-choice.d.ts +50 -0
  170. package/dist/types/utils/validation.d.ts +17 -0
  171. package/dist/types/utils.d.ts +28 -0
  172. package/package.json +146 -0
  173. package/src/api-registry.ts +96 -0
  174. package/src/auth-broker/client.ts +358 -0
  175. package/src/auth-broker/index.ts +5 -0
  176. package/src/auth-broker/refresher.ts +127 -0
  177. package/src/auth-broker/remote-store.ts +623 -0
  178. package/src/auth-broker/server.ts +644 -0
  179. package/src/auth-broker/types.ts +127 -0
  180. package/src/auth-broker/wire-schemas.ts +200 -0
  181. package/src/auth-gateway/http.ts +194 -0
  182. package/src/auth-gateway/index.ts +3 -0
  183. package/src/auth-gateway/server.ts +717 -0
  184. package/src/auth-gateway/types.ts +134 -0
  185. package/src/auth-storage.ts +4104 -0
  186. package/src/cli.ts +262 -0
  187. package/src/index.ts +54 -0
  188. package/src/model-cache.ts +129 -0
  189. package/src/model-manager.ts +450 -0
  190. package/src/model-thinking.ts +691 -0
  191. package/src/models.json +73853 -0
  192. package/src/models.json.d.ts +9 -0
  193. package/src/models.ts +56 -0
  194. package/src/prompts/turn-aborted-guidance.md +4 -0
  195. package/src/provider-details.ts +90 -0
  196. package/src/provider-models/bundled-references.ts +38 -0
  197. package/src/provider-models/descriptors.ts +308 -0
  198. package/src/provider-models/google.ts +91 -0
  199. package/src/provider-models/index.ts +5 -0
  200. package/src/provider-models/ollama.ts +153 -0
  201. package/src/provider-models/openai-compat.ts +2275 -0
  202. package/src/provider-models/special.ts +67 -0
  203. package/src/providers/amazon-bedrock.ts +849 -0
  204. package/src/providers/anthropic-messages-server-schema.ts +229 -0
  205. package/src/providers/anthropic-messages-server.ts +677 -0
  206. package/src/providers/anthropic.ts +2696 -0
  207. package/src/providers/aws-credentials.ts +501 -0
  208. package/src/providers/aws-eventstream.ts +185 -0
  209. package/src/providers/aws-sigv4.ts +218 -0
  210. package/src/providers/azure-openai-responses.ts +337 -0
  211. package/src/providers/cursor/gen/agent_pb.ts +15274 -0
  212. package/src/providers/cursor/proto/agent.proto +3526 -0
  213. package/src/providers/cursor/proto/buf.gen.yaml +6 -0
  214. package/src/providers/cursor/proto/buf.yaml +17 -0
  215. package/src/providers/cursor.ts +2561 -0
  216. package/src/providers/error-message.ts +21 -0
  217. package/src/providers/github-copilot-headers.ts +140 -0
  218. package/src/providers/gitlab-duo.ts +372 -0
  219. package/src/providers/google-auth.ts +252 -0
  220. package/src/providers/google-gemini-cli.ts +795 -0
  221. package/src/providers/google-gemini-headers.ts +41 -0
  222. package/src/providers/google-shared.ts +902 -0
  223. package/src/providers/google-types.ts +167 -0
  224. package/src/providers/google-vertex.ts +88 -0
  225. package/src/providers/google.ts +41 -0
  226. package/src/providers/grammar.ts +70 -0
  227. package/src/providers/kimi.ts +52 -0
  228. package/src/providers/mock.ts +500 -0
  229. package/src/providers/ollama.ts +544 -0
  230. package/src/providers/openai-anthropic-shim.ts +138 -0
  231. package/src/providers/openai-chat-server-schema.ts +243 -0
  232. package/src/providers/openai-chat-server.ts +628 -0
  233. package/src/providers/openai-codex/constants.ts +43 -0
  234. package/src/providers/openai-codex/request-transformer.ts +161 -0
  235. package/src/providers/openai-codex/response-handler.ts +81 -0
  236. package/src/providers/openai-codex-responses.ts +2598 -0
  237. package/src/providers/openai-completions-compat.ts +279 -0
  238. package/src/providers/openai-completions.ts +1853 -0
  239. package/src/providers/openai-responses-server-schema.ts +290 -0
  240. package/src/providers/openai-responses-server.ts +1183 -0
  241. package/src/providers/openai-responses-shared.ts +800 -0
  242. package/src/providers/openai-responses.ts +621 -0
  243. package/src/providers/pi-native-client.ts +228 -0
  244. package/src/providers/pi-native-server.ts +210 -0
  245. package/src/providers/register-builtins.ts +412 -0
  246. package/src/providers/synthetic.ts +50 -0
  247. package/src/providers/transform-messages.ts +309 -0
  248. package/src/providers/vision-guard.ts +31 -0
  249. package/src/rate-limit-utils.ts +84 -0
  250. package/src/stream.ts +895 -0
  251. package/src/types.ts +884 -0
  252. package/src/usage/claude.ts +431 -0
  253. package/src/usage/gemini.ts +250 -0
  254. package/src/usage/github-copilot.ts +421 -0
  255. package/src/usage/google-antigravity.ts +201 -0
  256. package/src/usage/kimi.ts +271 -0
  257. package/src/usage/minimax-code.ts +31 -0
  258. package/src/usage/openai-codex.ts +503 -0
  259. package/src/usage/shared.ts +10 -0
  260. package/src/usage/zai.ts +247 -0
  261. package/src/usage.ts +183 -0
  262. package/src/utils/abort.ts +51 -0
  263. package/src/utils/anthropic-auth.ts +87 -0
  264. package/src/utils/discovery/antigravity.ts +261 -0
  265. package/src/utils/discovery/codex.ts +371 -0
  266. package/src/utils/discovery/cursor.ts +306 -0
  267. package/src/utils/discovery/gemini.ts +248 -0
  268. package/src/utils/discovery/index.ts +4 -0
  269. package/src/utils/discovery/openai-compatible.ts +224 -0
  270. package/src/utils/event-stream.ts +142 -0
  271. package/src/utils/fireworks-model-id.ts +30 -0
  272. package/src/utils/foundry.ts +8 -0
  273. package/src/utils/h2-fetch.ts +60 -0
  274. package/src/utils/http-inspector.ts +176 -0
  275. package/src/utils/idle-iterator.ts +250 -0
  276. package/src/utils/json-parse.ts +148 -0
  277. package/src/utils/oauth/alibaba-coding-plan.ts +59 -0
  278. package/src/utils/oauth/anthropic.ts +200 -0
  279. package/src/utils/oauth/api-key-login.ts +87 -0
  280. package/src/utils/oauth/api-key-validation.ts +92 -0
  281. package/src/utils/oauth/callback-server.ts +276 -0
  282. package/src/utils/oauth/cerebras.ts +16 -0
  283. package/src/utils/oauth/cloudflare-ai-gateway.ts +48 -0
  284. package/src/utils/oauth/cursor.ts +157 -0
  285. package/src/utils/oauth/deepseek.ts +53 -0
  286. package/src/utils/oauth/firepass.ts +24 -0
  287. package/src/utils/oauth/fireworks.ts +15 -0
  288. package/src/utils/oauth/github-copilot.ts +362 -0
  289. package/src/utils/oauth/gitlab-duo.ts +123 -0
  290. package/src/utils/oauth/google-antigravity.ts +200 -0
  291. package/src/utils/oauth/google-gemini-cli.ts +256 -0
  292. package/src/utils/oauth/google-oauth-shared.ts +110 -0
  293. package/src/utils/oauth/huggingface.ts +62 -0
  294. package/src/utils/oauth/index.ts +444 -0
  295. package/src/utils/oauth/kagi.ts +47 -0
  296. package/src/utils/oauth/kilo.ts +87 -0
  297. package/src/utils/oauth/kimi.ts +254 -0
  298. package/src/utils/oauth/litellm.ts +47 -0
  299. package/src/utils/oauth/lm-studio.ts +38 -0
  300. package/src/utils/oauth/minimax-code.ts +78 -0
  301. package/src/utils/oauth/moonshot.ts +16 -0
  302. package/src/utils/oauth/nanogpt.ts +15 -0
  303. package/src/utils/oauth/nvidia.ts +70 -0
  304. package/src/utils/oauth/oauth.html +199 -0
  305. package/src/utils/oauth/ollama-cloud.ts +28 -0
  306. package/src/utils/oauth/ollama.ts +47 -0
  307. package/src/utils/oauth/openai-codex.ts +299 -0
  308. package/src/utils/oauth/opencode.ts +49 -0
  309. package/src/utils/oauth/parallel.ts +46 -0
  310. package/src/utils/oauth/perplexity.ts +206 -0
  311. package/src/utils/oauth/pkce.ts +18 -0
  312. package/src/utils/oauth/qianfan.ts +58 -0
  313. package/src/utils/oauth/qwen-portal.ts +60 -0
  314. package/src/utils/oauth/synthetic.ts +16 -0
  315. package/src/utils/oauth/tavily.ts +46 -0
  316. package/src/utils/oauth/together.ts +16 -0
  317. package/src/utils/oauth/types.ts +94 -0
  318. package/src/utils/oauth/venice.ts +59 -0
  319. package/src/utils/oauth/vercel-ai-gateway.ts +47 -0
  320. package/src/utils/oauth/vllm.ts +40 -0
  321. package/src/utils/oauth/xiaomi.ts +137 -0
  322. package/src/utils/oauth/zai.ts +60 -0
  323. package/src/utils/oauth/zenmux.ts +15 -0
  324. package/src/utils/overflow.ts +137 -0
  325. package/src/utils/parse-bind.ts +54 -0
  326. package/src/utils/provider-response.ts +30 -0
  327. package/src/utils/retry-after.ts +110 -0
  328. package/src/utils/retry.ts +54 -0
  329. package/src/utils/schema/CONSTRAINTS.md +164 -0
  330. package/src/utils/schema/adapt.ts +36 -0
  331. package/src/utils/schema/compatibility.ts +435 -0
  332. package/src/utils/schema/dereference.ts +98 -0
  333. package/src/utils/schema/draft.ts +341 -0
  334. package/src/utils/schema/equality.ts +97 -0
  335. package/src/utils/schema/fields.ts +190 -0
  336. package/src/utils/schema/index.ts +13 -0
  337. package/src/utils/schema/json-schema-validator.ts +577 -0
  338. package/src/utils/schema/meta-validator.ts +167 -0
  339. package/src/utils/schema/normalize.ts +1588 -0
  340. package/src/utils/schema/spill.ts +43 -0
  341. package/src/utils/schema/stamps.ts +97 -0
  342. package/src/utils/schema/types.ts +11 -0
  343. package/src/utils/schema/wire.ts +213 -0
  344. package/src/utils/schema/zod-decontaminate.ts +331 -0
  345. package/src/utils/sse-debug.ts +289 -0
  346. package/src/utils/tool-call-healing.ts +271 -0
  347. package/src/utils/tool-choice.ts +99 -0
  348. package/src/utils/validation.ts +1019 -0
  349. package/src/utils.ts +166 -0
package/CHANGELOG.md ADDED
@@ -0,0 +1,2644 @@
+ # Changelog
+
+ ## [Unreleased]
+
+ ## [0.1.1] - 2026-05-28
+ ### Breaking Changes
+
+ - Removed `findAnthropicAuth` from `anthropic-auth` and replaced store-driven auth discovery with `buildAnthropicAuthConfig`, requiring callers to provide an already-resolved API key before building Anthropic auth config
+
+ ### Added
+
+ - Added `AuthStorage.getOAuthAccess` to return a refreshed OAuth access token with identity metadata (`accountId`, `email`, `projectId`, `enterpriseUrl`) for callers that need the bearer token together with its identity metadata
+
+ ### Changed
+
+ - Changed OAuth selection in `AuthStorage` to treat credentials as stale when they are within 60 seconds of expiry and rotate them preemptively
+ - Changed Google Gemini CLI, Google Gemini usage, Antigravity usage, and Kimi usage flows to stop refreshing OAuth tokens directly and rely on `AuthStorage` for token rotation
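The 60-second staleness window from the `AuthStorage` entry above can be illustrated with a minimal sketch. The names `isOAuthCredentialStale`, `OAuthCredentialSketch`, `expiresAtMs`, and `STALE_WINDOW_MS` are assumptions made for illustration and are not taken from the package source; whether the real comparison is inclusive at exactly 60 seconds is also an assumption.

```typescript
// Hypothetical sketch: the names below are illustrative, not the package's API.
const STALE_WINDOW_MS = 60_000; // 60 seconds, per the changelog entry

interface OAuthCredentialSketch {
  accessToken: string;
  expiresAtMs: number; // assumed: absolute expiry as epoch milliseconds
}

// A credential counts as stale once it is within the window of its expiry,
// so it can be rotated before a request fails with an expired token.
function isOAuthCredentialStale(
  cred: OAuthCredentialSketch,
  nowMs: number = Date.now(),
  windowMs: number = STALE_WINDOW_MS,
): boolean {
  return cred.expiresAtMs - nowMs <= windowMs;
}
```

Passing `nowMs` explicitly keeps the check deterministic in tests; a caller would normally rely on the `Date.now()` default.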
+
+ ### Removed
+
+ - Removed provider-local OAuth refresh helpers from Google Gemini CLI and Google/Kimi/Antigravity usage probes, preventing direct refresh calls from those usage paths
+
+ ### Fixed
+
+ - Fixed expired OAuth handling so provider-level paths no longer attempt direct token refresh calls for expired credentials and instead rely on `AuthStorage` for rotation
+ - Fixed `google-gemini-cli` / `google-antigravity` aborting heavy reasoning runs with "Provider stream timed out while waiting for the first event" before the upstream had a chance to emit its first SSE frame. Cloud Code Assist routinely takes >100s on Gemini 3.x Pro at high thinking levels; the lazy-stream wrapper now floors the first-event watchdog at 5 minutes for these two providers when neither `StreamOptions.streamFirstEventTimeoutMs` nor `PI_STREAM_FIRST_EVENT_TIMEOUT_MS` pins a value. Other providers keep the 100s default. Internally, `getStreamIdleTimeoutMs` and `getStreamFirstEventTimeoutMs` now accept an optional per-provider `fallbackMs` so other slow-first-token providers can opt into the same widening without leaking through to the global default.
+ - Fixed Anthropic's Opus 4.7 model on Amazon Bedrock streaming no reasoning output (and appearing to hang on long reasoning runs) because Anthropic silently switched the adaptive-thinking display default to `"omitted"`. The Bedrock provider now sends `thinking.display = "summarized"` by default on Opus 4.7+ adaptive models and on budget-based Anthropic models, mirroring the existing direct-Anthropic behavior. `BedrockOptions.thinkingDisplay` (`"summarized" | "omitted"`) is exposed for callers that want to opt out, and `hideThinkingSummary` now wires through to the Bedrock case ([#1373](https://github.com/can1357/gajae-code/issues/1373)).
+ - Fixed Cursor Composer resume/tool-continuation turns failing with `Cannot send empty user message to Cursor API`. Empty current user turns now use Cursor's `resumeAction` instead of constructing an invalid `userMessageAction` ([#1376](https://github.com/can1357/gajae-code/issues/1376)).
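The first-event timeout precedence described in the `google-gemini-cli` / `google-antigravity` entry above might look roughly like the following. This is a sketch under stated assumptions: `resolveFirstEventTimeoutMs` and `parsePositiveMs` are hypothetical names (the real helper is `getStreamFirstEventTimeoutMs`, whose signature is not shown in this diff), and the ordering of the per-call option ahead of the environment variable is assumed rather than confirmed.

```typescript
// Hypothetical sketch of the precedence described above; not the package's code.
const DEFAULT_FIRST_EVENT_TIMEOUT_MS = 100_000; // 100s default for most providers
const SLOW_PROVIDER_FLOOR_MS = 300_000; // 5 minutes for slow-first-token providers

// Provider ids taken from the changelog entry.
const SLOW_FIRST_EVENT_PROVIDERS = new Set(["google-gemini-cli", "google-antigravity"]);

// Accepts only finite, positive numbers; anything else counts as "not pinned".
function parsePositiveMs(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

function resolveFirstEventTimeoutMs(
  provider: string,
  optionMs?: number, // stands in for StreamOptions.streamFirstEventTimeoutMs
  envValue?: string, // stands in for PI_STREAM_FIRST_EVENT_TIMEOUT_MS
): number {
  // Assumed order: an explicit per-call option wins, then the environment override.
  if (optionMs !== undefined) return optionMs;
  const fromEnv = parsePositiveMs(envValue);
  if (fromEnv !== undefined) return fromEnv;
  // Otherwise fall back per provider, leaving the global default untouched.
  return SLOW_FIRST_EVENT_PROVIDERS.has(provider)
    ? SLOW_PROVIDER_FLOOR_MS
    : DEFAULT_FIRST_EVENT_TIMEOUT_MS;
}
```

Keeping the wider value as a per-provider fallback, rather than raising the default, means providers that normally answer quickly still fail fast.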
+
+ ## [15.3.2] - 2026-05-25
+ ### Added
+
+ - Added `GET /v1/snapshot/stream` for live auth-broker snapshot updates via SSE with `snapshot`, `entry`, and `removed` event frames
+ - Added `AuthBrokerClient.openSnapshotStream()` for consuming SSE snapshot streams from `/v1/snapshot/stream`
+ - Added `streamSnapshots` option to `RemoteAuthCredentialStore` (default `true`) to enable or disable SSE-based snapshot synchronization
+ - Added `streamKeepaliveMs` to `startAuthBroker()` to tune heartbeat frequency for the SSE stream
+ - Added `AuthStorage.checkCredentials({ signal?, timeoutMs?, baseUrlResolver? })` that returns a per-credential `CredentialHealthResult` with tri-state `ok` (`true` / `false` / `null`-unverifiable), the credential's identity (provider, type, email/accountId, broker-refresh flag), and the upstream error string when the probe fails. Iterates sequentially over `listAuthCredentials()`, exercises OAuth refresh on expiry, then calls the per-provider `UsageProvider.fetchUsage` without swallowing errors — so callers can identify which row in a multi-account broker is producing 401s instead of getting a silently-deduplicated `fetchUsageReports` list.
+ - Added `GET /v1/credentials/check` to `startAuthGateway()` that forwards to `AuthStorage.checkCredentials` and returns `{ generatedAt, credentials }`. Gated by the same bearer as the rest of the gateway.
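To make the `snapshot`, `entry`, and `removed` event frames from the SSE entries above more concrete, here is a minimal sketch of splitting an SSE text buffer into named frames. It follows the standard Server-Sent Events framing rules only; `parseSseFrames` and `SseFrame` are hypothetical names, the payload shapes are not specified in this changelog, and treating keepalives as comment lines (`:`-prefixed) is an assumption.

```typescript
// Hypothetical sketch: a generic SSE frame splitter, not the broker client's code.
interface SseFrame {
  event: string; // e.g. "snapshot", "entry", "removed"
  data: string; // raw payload; its JSON shape is not specified here
}

// Parses a buffer that contains only complete frames (frames end with a blank line).
// A real stream reader would also need to carry over a trailing partial frame.
function parseSseFrames(buffer: string): SseFrame[] {
  const frames: SseFrame[] = [];
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Per the SSE rules, a block without data dispatches nothing.
    if (dataLines.length > 0) frames.push({ event, data: dataLines.join("\n") });
  }
  return frames;
}
```

A consumer could then switch on `frame.event` to replace the whole snapshot, upsert one entry, or drop a removed one.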
+
+ ### Changed
+
+ - Changed `RemoteAuthCredentialStore` to prefer SSE snapshot streaming and automatically fall back to long-polling when a broker returns 404 for `/v1/snapshot/stream`
+ - Changed snapshot write-refresh flow so `RemoteAuthCredentialStore` skips immediate `/v1/snapshot` refreshes when SSE streaming is active
+ - Changed broker SSE stream behavior to keep connections open with periodic keepalives and an increased server idle timeout
+
+ ## [15.3.0] - 2026-05-25
+
+ ### Added
+
+ - Added DeepSeek to the built-in API-key login provider catalog so `gjc login deepseek` stores a reusable `DEEPSEEK_API_KEY` credential for the bundled DeepSeek models.
+
+ ### Fixed
+
+ - Fixed `openai-responses` requests intermittently 400ing with `No tool call found for function call output with call_id …` after an aborted turn or a locally-rejected tool call (e.g. argument-validation failure). `convertConversationMessages` now folds orphan `function_call_output` / `custom_tool_call_output` items — those whose matching `function_call` was wiped by an earlier `dt: false` snapshot splice or never landed in any persisted provider payload — into assistant text notes, preserving the payload while keeping the request grammatically valid ([#1351](https://github.com/can1357/gajae-code/issues/1351)).
+
+ ## [15.2.4] - 2026-05-22
+
+ ### Fixed
+
+ - Fixed ChatGPT Plus/Pro (OpenAI code) OAuth login returning `Token exchange failed: 403` on Windows. When port 1455 was in use, the callback server silently fell back to a random port; OpenAI's authorization endpoint accepts any localhost redirect URI (loose validation), so the browser callback succeeds and shows "Authentication Successful", but the token endpoint rejects the non-registered port with 403. The `OpenAICodexOAuthFlow` now enforces a fixed `redirectUri` option so a busy port immediately surfaces as "port unavailable" instead of producing a confusing 403 ([#1277](https://github.com/can1357/gajae-code/issues/1277)).
+ - Improved `exchangeCodeForToken` error diagnostics: the 403 response body (`error` / `error_description` fields) is now included in the thrown message, matching the existing `refreshOpenAICodexToken` behaviour.
+
+ ### Added
+
+ - Added `ChatGPT Plus/Pro (OpenAI code, headless/device)` (`openai-code-device`) as an alternative login method for the OpenAI code provider. Uses OpenAI's device-code flow (`/api/accounts/deviceauth/usercode` → poll `/api/accounts/deviceauth/token`), which avoids a local callback server and port 1455 entirely. Credentials are stored under the existing `openai-code` provider key so all models and tooling continue to work without reconfiguration ([#1277](https://github.com/can1357/gajae-code/issues/1277)).
+
+ ## [15.2.2] - 2026-05-22
+
+ ### Fixed
+
+ - Fixed `gemini-3.1-pro-high` and `gemini-3.1-pro-low` on the `google-antigravity` provider always returning HTTP 400 from Cloud Code Assist. The `ANTIGRAVITY_SYSTEM_INSTRUCTION` identity header was not injected for these models because the internal check matched the string `"gemini-3-pro-high"` (hyphen) instead of the versioned `"gemini-3.1-pro-..."` form. The guard now matches all `gemini-3` model variants ([#1274](https://github.com/can1357/gajae-code/issues/1274)).
+
+ ## [15.2.0] - 2026-05-21
+
+ ### Fixed
+
+ - Fixed `/login` (and `/logout`, plus any `AuthStorage.set` / `remove` call) against a remote auth-broker throwing `RemoteAuthCredentialStore is read-only on the client. Use 'gjc auth-broker login <provider>' to mutate credentials.` Added three optional async write hooks to `AuthCredentialStore` (`upsertAuthCredentialRemote`, `replaceAuthCredentialsRemote`, `deleteAuthCredentialsRemote`); `RemoteAuthCredentialStore` implements them via the broker's `POST /v1/credential` and `POST /v1/credential/:id/disable` endpoints and applies the broker's authoritative post-write entries to the local snapshot. `AuthStorage` routes through the hooks when present, so OAuth and API-key logins (and logouts) initiated from a broker-backed client now persist server-side and surface immediately without waiting for the long-poll snapshot tick.
+
+ ## [15.1.9] - 2026-05-21
+
+ ### Fixed
+
+ - Fixed Ollama named tool forcing to send only the requested tool when the caller passes a named `toolChoice`, preserving `tool_choice: "required"` while preventing local models from selecting a different tool. ([#1236](https://github.com/can1357/gajae-code/issues/1236))
+ - Fixed `/btw` (and IRC background replies) returning a `BedrockException` 400 (`The toolConfig field must be defined when using toolUse and toolResult content blocks.`) on LiteLLM → Bedrock once the session has tool-call history. Two source fixes in `buildParams`: (1) `if (context.tools)` → `if (context.tools?.length)` so an explicit `context.tools = []` (the /btw opt-out) never routes through `convertTools` and never emits an empty `"tools"` array; (2) `else if (hasToolHistory(...))` → `else if (context.tools === undefined && hasToolHistory(...))` so the Anthropic-proxy sentinel that injects `tools: []` for tool-history turns is suppressed when the caller explicitly opted out, preventing it from re-introducing the empty array. As defence-in-depth, `tool_choice: "none"` is also dropped when the resolved tools list is missing or empty. ([#1227](https://github.com/can1357/gajae-code/issues/1227))
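The two `buildParams` guards and the `tool_choice: "none"` fallback described in the `/btw` entry above can be sketched as follows. Everything here is illustrative: `buildToolParams`, `ContextSketch`, `ToolSketch`, and `ParamsSketch` are hypothetical names, `hasToolHistory` is reduced to a boolean, and the real `convertTools` step is omitted.

```typescript
// Hypothetical shapes for illustration only; not the package's types.
interface ToolSketch { name: string }
interface ContextSketch { tools?: ToolSketch[] }
interface ParamsSketch { tools?: ToolSketch[]; tool_choice?: string }

function buildToolParams(
  context: ContextSketch,
  hasToolHistory: boolean,
  toolChoice?: string,
): ParamsSketch {
  const params: ParamsSketch = {};
  if (context.tools?.length) {
    // Non-empty tool list: send it.
    params.tools = context.tools;
  } else if (context.tools === undefined && hasToolHistory) {
    // Caller left tools unspecified but the history contains tool calls:
    // the proxy sentinel injects an empty list.
    params.tools = [];
  }
  // An explicit `tools: []` is an opt-out, so neither branch fires for it.

  if (toolChoice !== undefined) {
    const noTools = !params.tools || params.tools.length === 0;
    // Defence in depth: drop "none" when there is nothing to choose from.
    if (!(toolChoice === "none" && noTools)) params.tool_choice = toolChoice;
  }
  return params;
}
```

The key distinction is between `tools === undefined` (caller said nothing) and `tools: []` (caller opted out), which a plain truthiness check on `context.tools` cannot express.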
+
+ ## [15.1.8] - 2026-05-20
+ ### Added
+
+ - Added Fireworks Fire Pass as a separate `firepass` provider with API-key login flow, bundled `kimi-k2.6-turbo` model entry (Kimi K2.6 Turbo), and wire-id translation from the friendly catalog id to the `accounts/fireworks/routers/kimi-k2p6-turbo` router endpoint. Fire Pass keys (`fpk_…`) authorize only the dedicated router and reject `/v1/models`, so login validation pings chat completions against the router id directly. Extended the openai-completions Kimi-family safety net so the firepass entry inherits the per-Fireworks-docs "always send `max_tokens`" default ([Kimi K2 guide](https://docs.fireworks.ai/models/kimi-k2)); the router's accepted `reasoning_effort` set includes `xhigh`, so it is forwarded verbatim rather than remapped. See https://docs.fireworks.ai/firepass.
+
+ ### Fixed
+
+ - Fixed DeepSeek V4 direct API requests with tools to keep documented thinking mode instead of dropping reasoning: lower GJC efforts now map to DeepSeek's supported `high`, `tool_choice` is omitted, `thinking: { type: "enabled" }` and `max_tokens` are sent, and partial user `reasoningEffortMap` overrides merge with DeepSeek defaults. ([#1207](https://github.com/can1357/gajae-code/issues/1207))
+ - Fixed model cache schema v2 databases so offline refreshes preserve cached provider discoveries after upgrading to schema v3 and subsequent online refreshes can overwrite the cache. ([#1219](https://github.com/can1357/gajae-code/issues/1219))
+ - Fixed Perplexity OAuth credentials being treated as expired one hour after login. `getJwtExpiry` was fabricating `expires = now + 1h` whenever the JWT had no `exp` claim (the common case — Perplexity sessions are server-side). Once the hour elapsed, `getOAuthApiKey` would mark the cred expired and the search provider's loader would silently skip it, surfacing as "logged out". Logins with no `exp` now persist a far-future sentinel; `getOAuthApiKey` also normalizes any stale `expires` written by older builds.
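The Perplexity expiry fix above replaces a fabricated `now + 1h` value with a far-future sentinel when the JWT carries no `exp` claim. A minimal sketch of that decision, working on already-decoded claims, is shown below. `expiryFromClaims`, `JwtClaimsSketch`, and the sentinel value `FAR_FUTURE_EXPIRY_MS` are assumptions for illustration; the changelog does not state the actual sentinel, and the normalization of stale values written by older builds is not covered here.

```typescript
// Hypothetical sentinel; the actual value used by the package is not stated.
const FAR_FUTURE_EXPIRY_MS = Date.UTC(9999, 0, 1);

interface JwtClaimsSketch {
  exp?: number; // seconds since the epoch, as defined for the JWT `exp` claim
}

// Returns the expiry in epoch milliseconds. When the token has no usable `exp`,
// the session is assumed to be managed server-side, so a far-future sentinel is
// returned instead of inventing a short local lifetime.
function expiryFromClaims(claims: JwtClaimsSketch): number {
  if (typeof claims.exp === "number" && Number.isFinite(claims.exp)) {
    return claims.exp * 1000;
  }
  return FAR_FUTURE_EXPIRY_MS;
}
```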
+
+ ## [15.1.7] - 2026-05-19
+ ### Added
+
+ - Added Anthropic realization of `serviceTier: "priority"`. The anthropic-messages provider now sets `speed: "fast"` on the request and appends the `fast-mode-2026-02-01` beta to `Anthropic-Beta` whenever the caller passes `serviceTier: "priority"`. When the server rejects an unsupported model with `invalid_request_error`, the provider transparently retries the same turn without the fast-mode signal (mirroring the strict-tools fallback pattern), persists the disable via a new `providerSessionState.fastModeDisabled` flag so subsequent requests in the session skip the field, and surfaces the action via the new `AssistantMessage.disabledFeatures` array (id `"priority"`) so callers can sync user-facing toggles. A new `clearAnthropicFastModeFallback(providerSessionState)` helper lets callers re-arm priority after the auto-fallback fired.
101
+ - Added scoped `ServiceTier` values: `"openai-only"` (priority on `openai`/`openai-code`, ignored elsewhere) and `"anthropic-model-only"` (priority on direct `anthropic`, ignored on Bedrock/Vertex Anthropic models and elsewhere). A new `resolveServiceTier(serviceTier, provider)` helper computes the effective tier for the provider; existing OpenAI/Anthropic provider code routes through it, so `service_tier` and Anthropic fast-mode emission both respect scope. `getPriorityPremiumRequests` now counts Anthropic+priority as one premium request (previously zero) and continues to ignore providers that drop the field on the wire.
102
+
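The scoping rules for the new `ServiceTier` values could look roughly like the sketch below. It is not the package's `resolveServiceTier`. The function name `resolveTierSketch`, the return type, and the `amazon-bedrock` provider id used in the usage note are assumptions, and any tier values other than the three listed are ignored here.

```typescript
// Only the tier values named in the changelog entry are modeled.
type ScopedTier = "priority" | "openai-only" | "anthropic-model-only";

/** Hypothetical helper: returns "priority" when the tier applies to the provider. */
function resolveTierSketch(
  tier: ScopedTier | undefined,
  provider: string,
): "priority" | undefined {
  switch (tier) {
    case "priority":
      // Unscoped priority is passed through for every provider.
      return "priority";
    case "openai-only":
      return provider === "openai" || provider === "openai-code" ? "priority" : undefined;
    case "anthropic-model-only":
      // Applies to the direct Anthropic API only, not Bedrock or Vertex.
      return provider === "anthropic" ? "priority" : undefined;
    default:
      return undefined;
  }
}
```

For example, `resolveTierSketch("anthropic-model-only", "amazon-bedrock")` yields `undefined`, so neither `service_tier` nor the fast-mode signal would be emitted for that request.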
103
+ ### Fixed
104
+
105
+ - Fixed Anthropic fast mode (`serviceTier: "priority"`) looping on 429 `rate_limit_error: "Extra usage is required for fast mode."` for accounts without the extra-usage entitlement. `isAnthropicFastModeUnsupportedError` now matches the 429 phrasing in addition to the 400 `invalid_request_error` "does not support the `speed` parameter" case, so the provider drops `speed: "fast"` on the in-turn retry, sets `providerSessionState.fastModeDisabled` for the remainder of the session, and surfaces `disabledFeatures: ["priority"]` to the caller instead of retrying with the same payload until `PROVIDER_MAX_RETRIES` is exhausted.
106
+ - Fixed MiniMax Coding Plan CN streaming `<think>...</think>` reasoning as visible assistant text. The OpenAI-compatible stream parser now enables the existing MiniMax tag parser for both `minimax-code` and `minimax-code-cn`, so CN responses become structured `thinking` blocks instead of raw text. ([#1203](https://github.com/can1357/gajae-code/issues/1203))
107
+
108
+ ## [15.1.6] - 2026-05-19
109
+
110
+ ### Fixed
111
+
112
+ - Fixed `{}` (empty JSON Schema, the wire representation of `z.unknown()`) being passed verbatim to grammar-constrained samplers (llama.cpp, etc.) in `additionalProperties`, `items`, and other schema-valued positions across **every provider** (OpenAI, Anthropic, Google, Ollama, Bedrock, Cursor). Grammar builders treat `{}` as "generate an empty object" rather than "any JSON value", causing open-typed fields (e.g. `extra.title` from `z.record(z.string(), z.unknown())`) to always emit `{}` instead of the intended string/number/etc. `toolWireSchema` now applies a new `normalizeEmptySchemas` pass (exported) to both the Zod and TypeBox/raw-JSON-Schema branches, converting `{}` → `true` (semantically identical per JSON Schema draft 2020-12 §4.3.1) in all schema-valued positions. Strict-mode opt-out is preserved across all providers: OpenAI's `hasUnrepresentableStrictObjectMap` hits the `=== true` branch instead of the `isJsonObject({})` branch (same result); Anthropic's `normalizeAnthropicStrictSchemaNode` opts out via `additionalProperties !== false` (still true for `true`); Google's `normalizeSchemaForGoogle` strips `additionalProperties` regardless (pre-existing). ([#1179](https://github.com/can1357/gajae-code/issues/1179))
113
+ - Fixed `pi-ai login <provider>` crashing with `Unknown provider` for providers that only the `auth-storage` `login()` switch knew about (perplexity, alibaba-coding-plan, gitlab-duo, huggingface, opencode-zen/go, lm-studio, ollama, cerebras, fireworks, qianfan, synthetic, venice, litellm, moonshot, together, cloudflare/vercel ai gateways, vllm, qwen-portal, nvidia, xiaomi, and any custom OAuth provider). The CLI now delegates to `SqliteAuthCredentialStore.login()` instead of duplicating a smaller switch, so the auth-broker `gjc auth-broker login <provider>` flow works for every registered OAuth provider.
114
+
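The `{}` to `true` rewrite described for `normalizeEmptySchemas` can be sketched as follows. This is an illustration under assumptions, not the exported function: the name `normalizeEmptySketch` and the three keyword sets are hypothetical, and the real pass may cover more or fewer positions.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Assumed keyword sets: value is one schema, a list of schemas, or a map of schemas.
const SINGLE = new Set(["additionalProperties", "items", "not", "contains", "propertyNames"]);
const LIST = new Set(["anyOf", "oneOf", "allOf", "prefixItems"]);
const MAP = new Set(["properties", "patternProperties", "$defs"]);

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Normalizes a value sitting in a schema-valued position. */
function normalizeChild(value: Json): Json {
  if (!isPlainObject(value)) return value; // booleans pass through untouched
  if (Object.keys(value).length === 0) return true; // `{}` means "any JSON value"
  return normalizeEmptySketch(value);
}

function normalizeEmptySketch(schema: JsonObject): JsonObject {
  const out: JsonObject = {};
  for (const [key, value] of Object.entries(schema)) {
    if (SINGLE.has(key)) {
      out[key] = normalizeChild(value);
    } else if (LIST.has(key) && Array.isArray(value)) {
      out[key] = value.map(normalizeChild);
    } else if (MAP.has(key) && isPlainObject(value)) {
      // The map itself is not a schema, so an empty map stays `{}`.
      const mapped: JsonObject = {};
      for (const [name, sub] of Object.entries(value)) mapped[name] = normalizeChild(sub);
      out[key] = mapped;
    } else {
      out[key] = value; // `default`, `const`, `enum`, etc. are data, not schemas
    }
  }
  return out;
}
```

Only schema-valued positions are rewritten; a literal `{}` under `default` or `const`, and an empty `properties` map, are left alone.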
115
+ ## [15.1.4] - 2026-05-19
116
+ ### Changed
117
+
118
+ - Updated auth-gateway format and pi-native request handling to invalidate the failed API key and retry the provider request with a replacement key when authentication fails
119
+
120
+ ### Fixed
121
+
122
+ - Fixed OpenCode-Go and OpenCode-Zen chat-completions replay to omit stored reasoning fields on Kimi assistant tool-call messages, avoiding provider 400s for rejected `messages[].reasoning` payloads. ([#1157](https://github.com/can1357/gajae-code/issues/1157))
123
+ - Fixed OpenAI Responses and OpenAI code tool schema normalization to emit `properties: {}` for no-argument object schemas without rewriting literal payloads. ([#1147](https://github.com/can1357/gajae-code/issues/1147))
124
+ - Fixed Anthropic 400 (`unexpected tool_use_id found in tool_result blocks ... Each tool_result block must have a corresponding tool_use block in the previous message`) when handoff/compaction folds an assistant `tool_use` into the handoff summary string but leaves the matching user-side `tool_result` message in the history. `transformMessages` now indexes every `tool_use` id surviving the first pass and drops orphan `tool_result` messages whose originator was compacted away, preserving the text payload as a user-level `<stale-tool-result>` note so the model still sees what the tool returned. The note is emitted with `role: "user"` rather than `role: "developer"` so providers that elevate developer-role messages (Ollama: `developer` → `system`; OpenAI chat-completions reasoning models: `developer` → `developer`) cannot lift stale tool output to an instruction-priority tier above the surrounding user/developer messages.
125
+ - Fixed streaming authentication retry to trigger when a provider emits a 401 `error` event after a `start` event but before any replay-unsafe content is emitted
126
+ - Added `credential_process` support to the Bedrock provider's AWS credential resolver so profiles delegating to external brokers (`aws-vault`, `granted`, in-house tools) resolve instead of falling through to `Unable to resolve AWS credentials`. Parses the AWS SDK `Version: 1` JSON envelope, honors `Expiration` in the per-profile cache, propagates `AbortSignal` to the spawned helper, routes Windows `.cmd`/`.bat` helpers through `cmd.exe /c`, and ships a POSIX-shell-style tokenizer that preserves backslashes inside double quotes so Windows paths survive ([#1142](https://github.com/can1357/gajae-code/issues/1142))
127
+
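The orphan `tool_result` handling described for `transformMessages` can be sketched in a simplified form. The message shapes below are hypothetical and much flatter than the package's real types, and the closing `</stale-tool-result>` tag is an assumption about how the note is delimited.

```typescript
// Hypothetical, simplified message shapes for illustration only.
type Msg =
  | { role: "assistant"; toolUseIds: string[]; text: string }
  | { role: "toolResult"; toolUseId: string; text: string }
  | { role: "user"; text: string };

function dropOrphanToolResults(messages: Msg[]): Msg[] {
  // Index every tool_use id that is still present in the history.
  const liveIds = new Set<string>();
  for (const message of messages) {
    if (message.role === "assistant") {
      for (const id of message.toolUseIds) liveIds.add(id);
    }
  }
  return messages.map((message): Msg => {
    if (message.role !== "toolResult" || liveIds.has(message.toolUseId)) return message;
    // Originating tool_use was compacted away: keep the payload as a user-level note.
    return { role: "user", text: `<stale-tool-result>${message.text}</stale-tool-result>` };
  });
}
```

Emitting the note as `role: "user"` matches the entry's reasoning: stale tool output should never be promoted to an instruction-priority tier.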
128
+ ## [15.1.3] - 2026-05-17
129
+ ### Breaking Changes
130
+
131
+ - Changed `AuthBrokerClient.fetchSnapshot()` to return status-based results (`200` or `304`) instead of always returning a raw snapshot body, so callers now need to branch on `status`
132
+ - Renamed public schema utilities in `@gajae-code/ai/utils/schema` by replacing `sanitizeSchemaForGoogle`, `sanitizeSchemaForCCA`, `prepareSchemaForCCA`, and `sanitizeSchemaForMCP` with `normalizeSchemaForGoogle`, `normalizeSchemaForCCA`, and `normalizeSchemaForMCP`
133
+ - Added MCP schema normalization via `normalizeSchemaForMCP` for compatibility checks
134
+ - Removed the `StringEnum` helper from `@gajae-code/ai/utils/schema`. Use `z.enum([...])` directly; Zod's emitted JSON Schema is already wire-compatible with Google and other providers.
135
+ - Renamed the concrete SQLite credential store class from `AuthCredentialStore` to `SqliteAuthCredentialStore`. `AuthCredentialStore` is now the persistence interface implemented by both the SQLite store and the new `RemoteAuthCredentialStore`. Update `new AuthCredentialStore(db)` / `AuthCredentialStore.open(...)` call-sites to `SqliteAuthCredentialStore`; type-position uses (`store: AuthCredentialStore`) continue to work unchanged.
136
+
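Callers migrating to the status-based `fetchSnapshot()` result need a branch along these lines. The result type here is a guess: the changelog confirms only that callers branch on `status` with values `200` and `304`, so the `snapshot` field name and the `applySnapshot` helper are hypothetical.

```typescript
// Hypothetical result shape; only the `status` discriminant is confirmed by the changelog.
type SnapshotResult<T> = { status: 200; snapshot: T } | { status: 304 };

/** Keeps the cached snapshot on 304 and swaps in the new body on 200. */
function applySnapshot<T>(cached: T, result: SnapshotResult<T>): T {
  return result.status === 200 ? result.snapshot : cached;
}
```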
137
+ ### Added
138
+
139
+ - Added `onAuthError` to `StreamOptions` and wired `streamSimple()` to retry once with a replacement API key when the first provider response is a 401 before any assistant events are emitted
140
+ - Added generation-aware snapshot metadata (`generation`, `serverNowMs`, `refresher`, and `rotatesInMs`) to auth-broker snapshot responses to support client-side credential-rotation planning
141
+ - Added `transport: "pi-native"` on `Model` and the matching `streamPiNative` client. When `model.transport === "pi-native"`, `streamSimple` short-circuits the per-provider dispatch and POSTs the canonical `Context` to the auth-gateway's `POST /v1/pi/stream` endpoint. The response is SSE-framed `AssistantMessageEvent`s parsed by `readSseJson` and pushed verbatim into the local `AssistantMessageEventStream` — no wire-format translation, no partial-stripping reconstruction. Used by containerized gjc installs (robogjc slots, swarm extension, etc.) to route every LLM call through a credential-holding sidecar; the slot itself never sees the real provider tokens. Server-controlled fields (`apiKey`, `signal`, `fetch`, lifecycle callbacks, the provider-session map) are stripped from the wire body — `apiKey` rides in the `Authorization` header as the gateway bearer.
142
+ - Added `POST /v1/pi/stream` to the auth-gateway. Same auth + abort + model-resolution + openai-code-compat + prefix-cache plumbing as the foreign-wire routes; only the wire-format translation is skipped. Request body is `{ modelId, context, options?, stream? }` where `context` is the canonical pi-ai `Context` and `options` is `SimpleStreamOptions` with non-serializable fields stripped. Response is SSE-framed `AssistantMessageEvent` (terminated by `data: [DONE]`) when streaming, or `{ message: AssistantMessage }` JSON when `stream: false`.
143
+ - Added Vertex AI authentication via Google Application Default Credentials from `GOOGLE_APPLICATION_CREDENTIALS`, `~/.config/gcloud/application_default_credentials.json`, or metadata server tokens, with token caching and refresh skew control via `GOOGLE_VERTEX_REFRESH_SKEW_MS`
144
+ - Added support for Anthropic image message parts with `type: "url"` and `type: "file"` sources
145
+ - Added `stopSequences` and `frequencyPenalty` to shared stream options and wired them through to OpenAI request translation
146
+ - Added optional request cancellation support to auth-broker interactions by propagating `AbortSignal` into health, snapshot, usage, and refresh calls
147
+ - Added `AuthStorage.setConfigApiKey` / `removeConfigApiKey` / `clearConfigApiKeys` for config-sourced per-provider bearers (e.g. `models.yml` `providers.<name>.apiKey`). The new tier sits between runtime `--api-key` and stored credentials in `getApiKey`/`peekApiKey` resolution, so a bearer pinned in config now beats the broker's OAuth access token. Also suppresses OAuth `account_uuid` attribution when active, since outbound auth is the explicit config bearer, not OAuth. `describeCredentialSource` reports `"config override (models.yml)"` for visibility.
148
+ - Added per-model `additional_rate_limits` parsing to `openaiOpenAI codeUsageProvider`. The OpenAI code `wham/usage` endpoint surfaces a separate `GPT-5.3-OpenAI code-Spark` rate limit (`metered_feature: openai-code_bengalfox`) on Pro accounts; these now emit dedicated `openai-code:spark:{primary,secondary}` `UsageLimit` entries with `scope.tier = "spark"`, mirroring how Anthropic exposes `anthropic:7d:sonnet` separately from the umbrella `anthropic:7d` bucket. The osx-widgets client already keyed spark detection off `limit.id.includes("spark")`; this populates that contract end-to-end.
149
+ - Added `GET /v1/usage` to the auth-broker API to expose aggregated usage reports from `AuthStorage.fetchUsageReports`
150
+ - Added auth-broker usage polling response handling that returns normalized usage reports plus generation timestamp for clients (5-min per-credential cache via `AuthStorage`)
151
+ - Added the auth-broker subsystem (`@gajae-code/ai/auth-broker`) for sharing OAuth credentials across machines without leaking refresh tokens.
152
+ - `startAuthBroker(...)` boots a `Bun.serve` HTTP server exposing `GET /v1/healthz`, `GET /v1/snapshot`, `POST /v1/credential` (upsert), `POST /v1/credential/:id/refresh`, and `POST /v1/credential/:id/disable`.
153
+ - `AuthBrokerClient` is the matching HTTP client used by remote clients.
154
+ - `RemoteAuthCredentialStore` is a client-side `AuthCredentialStore` that mirrors a broker snapshot in memory; mutating methods (`replace*`, `upsert*`, `delete*ForProvider`) throw because writes are server-side only.
155
+ - `AuthBrokerRefresher` is the background refresh loop that pre-refreshes credentials within `refreshSkewMs` and disables on definitive failure (`invalid_grant` / non-network 401-403).
156
+ - Added `AuthStorage.exportSnapshot()`, `AuthStorage.upsertCredential(provider, credential)`, `AuthStorage.forceRefreshCredentialById(id)`, and `AuthStorage.disableCredentialById(id, cause)` public methods consumed by the auth-broker server.
157
+ - Added `AuthStorageOptions.refreshOAuthCredential` override so a remote-store client can route every OAuth refresh through the broker instead of the local OAuth endpoint.
158
+ - Added `REMOTE_REFRESH_SENTINEL` (`"__remote__"`) — the wire placeholder substituted for OAuth refresh tokens in broker snapshots; clients never see the real refresh token.
159
+ - Exposed the OAuth provider catalog (`getOAuthProviders`, `OAuthProvider`, `OAuthProviderInfo`) and `refreshOAuthToken` through the package barrel so the coding-agent CLI can target them without reaching into `utils/oauth`.
160
+ - Added the auth-gateway subsystem (`@gajae-code/ai/auth-gateway`) — a forward-proxy that sits between unauthenticated clients (the macOS usage widget, llm-git, robogjc containers, …) and the broker. Clients send standard provider-format requests; the gateway parses them into gjc's canonical `Context`, dispatches through pi-ai's `streamSimple()`, and translates the canonical event stream back to the matching wire format. `Authorization` is injected server-side so access tokens never leave the gateway host. Wire surface:
161
+ - `GET /healthz` — unauth liveness.
162
+ - `GET /v1/usage` — aggregated provider usage; 5-min per-credential cache via `AuthStorage.fetchUsageReports`.
163
+ - `GET /v1/models` — model catalog (scoped to providers with credentials).
164
+ - `POST /v1/chat/completions` — OpenAI chat-completions in/out.
165
+ - `POST /v1/messages` — Anthropic messages in/out (text + thinking + tool_use blocks, SSE event taxonomy preserved).
166
+ - `POST /v1/responses` — OpenAI Responses in/out (reasoning items + function_call output items, SSE pass-through).
167
+ - Added exports from `@gajae-code/ai/auth-gateway`: `startAuthGateway`, `AuthGatewayServerOptions`, `AuthGatewayBootOptions`, `AuthGatewayServerHandle`, `ModelResolver`, `DEFAULT_AUTH_GATEWAY_BIND`. Per-format `parseRequest` / `encodeResponse` / `encodeStream` triples are reachable via the `./providers/*` subpath as `openai-chat-server`, `anthropic-messages-server`, and `openai-responses-server`.
168
+ - Added `listProvidersWithEnvKey()` to enumerate every provider with an env-var fallback (used by the new migrate command in coding-agent).
169
+
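The SSE framing used by `POST /v1/pi/stream` (JSON events terminated by `data: [DONE]`) can be illustrated with a small parser. This is a sketch over a fully buffered body, not the package's `readSseJson`, which presumably works incrementally. The name `parseSseJsonSketch` is hypothetical.

```typescript
/** Parses a complete SSE body into JSON events, stopping at `data: [DONE]`. */
function parseSseJsonSketch(body: string): unknown[] {
  const events: unknown[] = [];
  // Frames are separated by a blank line.
  for (const frame of body.split(/\r?\n\r?\n/)) {
    const data = frame
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one optional leading space
      .join("\n");
    if (data === "") continue; // comment-only or empty frame
    if (data === "[DONE]") break; // terminator: ignore anything after it
    events.push(JSON.parse(data));
  }
  return events;
}
```

Because the events are already canonical, a client only needs to decode each frame and push it into its local stream. No wire-format translation is involved.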
170
+ ### Changed
171
+
172
+ - Changed `GET /v1/snapshot` to support generation-based polling with `If-None-Match` and `wait` for long-poll updates and to return `304` when no snapshot changes are available
173
+ - Changed Bedrock credential resolution for streaming calls to prefer environment keys, AWS profile/SSO credentials, and IMDSv2 fallback when available
174
+ - Changed auth-gateway parsing for OpenAI chat-completions and Responses to ignore unsupported SDK-only fields instead of rejecting requests
175
+ - Changed auth-gateway protocol handling to include CORS headers on responses and support browser-origin requests
176
+ - Changed prompt-cache handling to resolve cache keys from request metadata and headers and preserve them through protocol translation
177
+ - Changed Anthropic messages parsing to forward request `metadata` through to downstream execution
178
+ - Changed usage report caching to use a 5-minute per-credential TTL with jittered refresh timing to reduce usage endpoint rate-limit collisions
179
+ - Changed usage polling failure handling so transient errors continue serving the last known report instead of returning null and dropping the credential from usage aggregates after cache expiry
180
+ - Changed `sanitizeSchemaForGoogle` to normalize snake_case schema keys (such as `any_of` and `additional_properties`) to camelCase and auto-generate `propertyOrdering` for multi-property objects
181
+ - Changed strict-mode sanitization to resolve `$ref` nodes with sibling keys by inlining and merging referenced local definitions
182
+ - Changed strict-mode sanitization to flatten single-entry `allOf` nodes and remove the `allOf` wrapper
183
+ - Changed Anthropic tool schema normalization to preserve supported metadata keywords such as `$ref`, `$defs`, `$schema`, `enum`, `const`, `default`, `title`, and `nullable` instead of stripping them
184
+ - Changed string schema processing to retain only supported `format` values (`date-time`, `time`, `date`, `duration`, `email`, `hostname`, `uri`, `ipv4`, `ipv6`, `uuid`) and demote unsupported `format` values to `description` hints
185
+
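The `format` handling in the last entry above can be sketched as a small transform. The supported list is copied from the changelog. The function name and the exact hint text appended to `description` are assumptions.

```typescript
// Supported values as listed in the changelog entry.
const SUPPORTED_FORMATS = new Set([
  "date-time", "time", "date", "duration", "email",
  "hostname", "uri", "ipv4", "ipv6", "uuid",
]);

interface StringSchema {
  type: "string";
  format?: string;
  description?: string;
}

/** Keeps supported formats; moves unsupported ones into the description. */
function demoteUnsupportedFormat(schema: StringSchema): StringSchema {
  if (schema.format === undefined || SUPPORTED_FORMATS.has(schema.format)) return schema;
  const { format, ...rest } = schema;
  const hint = `{format: ${format}}`; // hint layout is an assumption
  return { ...rest, description: rest.description ? `${rest.description} ${hint}` : hint };
}
```

Demoting instead of dropping keeps the constraint visible to the model even when the provider cannot enforce it.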
186
+ ### Fixed
187
+
188
+ - Fixed OAuth credential refresh flow so concurrent manual and background refreshes now share one in-flight attempt per credential, and `RemoteAuthCredentialStore` now re-synchronizes before using near-expiring OAuth credentials
189
+ - Fixed stale-credential handling after auth failures by waiting for updated broker snapshots and refreshing suspect credentials through broker endpoints before continuing
190
+ - Fixed Google Generative AI startup behavior to throw a clear API-key-required error when no key is configured
191
+ - Fixed AWS Bedrock image message serialization to preserve base64 `source.bytes` payloads instead of decoding and rebuilding them
192
+ - Fixed Google provider error handling to extract the API-reported `error.message` from JSON response bodies when available
193
+ - Fixed `RemoteAuthCredentialStore.getUsageReport` to return the matching credential-specific usage report and coalesce parallel callers into one broker `/v1/usage` fetch
194
+ - Fixed auth-broker credential upload validation to reject the remote refresh-token sentinel and prevent storing a non-refresh value
195
+ - Fixed OpenAI Responses streaming output to emit `reasoning_summary_text` events and parse/send `summary_text` reasoning payloads
196
+ - Fixed Anthropic stop-sequence handling by trimming requests to the API limit of four entries before forwarding
197
+ - Fixed prompt caching behavior across protocol translations so cached-token usage is preserved when Anthropic and OpenAI requests are routed through each other
198
+ - Fixed Anthropic model usage fetching to retry transient `429` and `5xx` responses with exponential backoff, respecting `Retry-After` before returning failure
199
+ - Fixed auth-gateway request translation to preserve OpenAI Responses string/system message content, reasoning replay payloads, completed item text in stream item-done events, Anthropic tool-result ordering, and OpenAI Chat/Responses cached-token usage totals
200
+ - Fixed auth-gateway failure handling so unsupported request controls, upstream terminal errors, non-streaming aborts, and already-aborted client requests fail explicitly instead of being accepted, ignored, or encoded as successful HTTP 200 responses
201
+ - Fixed Gemini CLI / Antigravity tool schema normalization to run the full Cloud Code Assist pipeline, matching shared Google schema handling for union/object merging and nullable extraction
202
+ - Fixed stripped validation hints to be preserved as description spill text (`{key: value}` blocks) when `normalizeSchemaForGoogle` and `normalizeSchemaForCCA` drop unsupported schema keywords
203
+ - Fixed `sanitizeSchemaForGoogle` to collapse nullability forms (`type:'null'` and null-bearing `anyOf` variants) into `nullable` while preserving remaining variants
204
+ - Fixed `sanitizeSchemaForGoogle` to inline local `$defs` references instead of dropping `$ref`/`$defs` structure during Google schema sanitization
205
+ - Fixed `normalizeAnthropicToolSchema` to handle self-referential schemas without infinite recursion
206
+ - Fixed object schema normalization so explicit open-map declarations (`additionalProperties: true` and schema-valued `additionalProperties`) are preserved instead of being converted to closed objects
207
+ - Fixed unsupported schema constraints on arrays and strings (`maxItems`, `uniqueItems`, `pattern`, `minLength`, `maxLength`, and `minItems` when greater than 1) by demoting them into `description` rather than dropping them
208
+
209
+ ### Security
210
+
211
+ - Hardened auth-gateway bearer-token checks with constant-time comparison to avoid timing-side-channel leaks
212
+
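A constant-time comparison avoids returning early at the first mismatching character. The sketch below shows the idea in plain TypeScript. It is not the gateway's implementation, which the changelog does not show, and production code on Node or Bun would more likely use `timingSafeEqual` from `node:crypto`. A JIT-compiled loop gives no hard timing guarantee.

```typescript
/** Compares two strings without short-circuiting on the first difference. */
function constantTimeEqual(a: string, b: string): boolean {
  // Fold the length difference into the accumulator instead of returning early.
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end; `| 0` coerces that to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}
```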
213
+ ## [15.1.2] - 2026-05-15
214
+ ### Breaking Changes
215
+
216
+ - Rejected draft-07 tuple and dependency keywords (`items` arrays, `dependencies`, `additionalItems`) in JSON Schema validation
217
+
218
+ ### Added
219
+
220
+ - Added `responseHeaders`, `responseStatus`, and `responseRequestId` fields to `MockResponse` so mock providers can provide synthetic `ProviderResponseMetadata`
221
+ - Added `onResponse` metadata emission for mocks that sends lowercased headers and a default status of 200 before streaming when response headers are configured
222
+ - Added recursive strict-mode sanitization for array `prefixItems` entries so tuple schemas now enforce object constraints per item
223
+
224
+ ### Changed
225
+
226
+ - Normalized legacy draft-07 JSON Schema constructs used in tool parameters (`items` arrays, `additionalItems`, `definitions`, `dependencies`) to draft 2020-12 before OpenAI/Google/CCA sanitization, wire conversion, and argument validation
227
+ - Reworked OpenAI response schema adaptation to rewrite `oneOf` into `anyOf` while preserving existing `anyOf` branches
228
+ - Changed tuple array validation to validate per-index schemas from `prefixItems` and apply `items` only to remaining elements
229
+
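The tuple part of the draft-07 to draft 2020-12 upgrade maps array-form `items` to `prefixItems` and `additionalItems` to `items`. A minimal, non-recursive sketch follows. The function name is hypothetical, and `definitions` and `dependencies` are not handled here.

```typescript
type Schema = { [key: string]: unknown };

/** Rewrites draft-07 tuple keywords on a single schema node. */
function upgradeTupleKeywords(schema: Schema): Schema {
  // Object-form `items` already has the 2020-12 meaning, so leave it alone.
  if (!Array.isArray(schema["items"])) return schema;
  const { items, additionalItems, ...rest } = schema;
  const upgraded: Schema = { ...rest, prefixItems: items };
  // In 2020-12, `items` constrains the elements after the prefix.
  if (additionalItems !== undefined) upgraded["items"] = additionalItems;
  return upgraded;
}
```

After the upgrade, per-index schemas come from `prefixItems` and `items` applies only to the remaining elements, which is the validation order the last entry describes.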
230
+ ### Fixed
231
+
232
+ - Fixed validation of plain JSON Schema tool arguments that omitted a `$schema` URI so draft-07-shaped schemas now pass validation instead of being rejected
233
+ - Fixed tuple-array validation for legacy JSON Schema tool schemas to enforce `additionalItems: false` and per-position constraints after automatic draft upgrade
234
+ - Fixed Anthropic tool schema normalization to recurse into `prefixItems` so unsupported constraints inside tuple items are stripped in the generated input schema
235
+ - Fixed Anthropic tool-schema normalization stripping the body of explicit open `additionalProperties` (e.g. Zod's `z.record(z.string(), z.unknown())` compiling to `additionalProperties: {}`) by unconditionally overwriting it with `false`, which closed record-style fields and prevented models from supplying any key. The coding-agent's `resolve` tool exposes plan-approval titles via such a field, so Kimi K2 (and any other Anthropic-shaped provider) could not pass `extra: { title }`, blocking plan mode entirely ([#1104](https://github.com/can1357/gajae-code/issues/1104))
236
+ - Fixed Anthropic strict tool planning to leave tools with open `additionalProperties` maps non-strict instead of sending schemas Anthropic rejects.
237
+
238
+ ## [15.1.0] - 2026-05-15
239
+
240
+ ### Breaking Changes
241
+
242
+ - Removed TypeBox root exports (`Type`, `Static`, and `TSchema`) from the package entrypoint, so callers importing those symbols from `@gajae-code/ai` must migrate to `zod` or `@gajae-code/ai/types`
243
+
244
+ ### Added
245
+
246
+ - Added support for defining tool schemas with Zod (`z.object`, `z.string`, etc.) by allowing `Tool.parameters` to be either Zod schemas or legacy JSON Schema objects and converting them to provider wire format automatically
247
+ - Added package-level schema helpers in the `zod/v4` style by exporting `z` and `ZodType` from the root entrypoint
248
+ - Added a `mock` API provider via `createMockModel` to build `Model<"mock">` instances for fully in-memory, deterministic assistant streams in tests
249
+ - Added `streamMock` and `registerMockApi` so mock responses can be consumed through `stream()` and the global custom API registry without an external model backend
250
+ - Added async/sync response scripting with optional context-based handlers, and new `push()`/`reset()` controls to drive multi-turn mock interactions and inspect per-call invocation state
251
+ - Added support in mock responses for simulating tool calls, usage metadata, custom stop reasons, delayed emissions, and terminal error/aborted outcomes
252
+
253
+ ### Changed
254
+
255
+ - Changed Azure OpenAI Responses tool schema conversion to sanitize tool parameter schemas and rewrite `oneOf` branches as `anyOf` so tool calls remain compatible with Azure's schema expectations
256
+ - Changed `Static<S>` to extract a schema object’s `static` type when present, improving inferred tool argument types for non-Zod parameter definitions
257
+ - Changed `Static` typing behavior so it now infers argument types from Zod schemas and defaults to `unknown` for non-Zod JSON Schema parameter definitions
258
+ - Restored the default steady-state stream idle timeout to 120s (regressed in 15.0.0). 30s was too aggressive for reasoning models, slow proxies, and tool-call planning gaps, surfacing as repeated `Provider stream stalled while waiting for the next event` errors. Existing `PI_STREAM_IDLE_TIMEOUT_MS` / `PI_OPENAI_STREAM_IDLE_TIMEOUT_MS` overrides are unchanged.
259
+
260
+ ### Fixed
261
+
262
+ - Preserved top-level unknown fields in validated tool-call arguments so extra root properties are retained after schema coercion
263
+ - Fixed coercion for Zod `record` fields by parsing JSON-stringified record arguments into objects
264
+ - Validated legacy draft-07 JSON Schema tool parameters directly instead of converting through Zod, improving support for features like `$ref`, `definitions`, `nullable`, and `uniqueItems`
265
+ - Fixed Cloud Code Assist schema preparation to strip unsupported `propertyNames` and fall back to a minimal tool schema when schema meta-validation detects malformed keywords
266
+ - Fixed OpenAI Completions streaming to avoid treating non-output chunks (including role-only preambles) as progress events so idle-timeout watchdog behavior no longer hangs on no-op streamed chunks
267
+ - Fixed Cloud Code Assist schema compatibility checks by replacing strict AJV meta-schema validation with structural JSON Schema validation to avoid rejecting structurally valid tool schemas
268
+ - Fixed lazy built-in provider streams (`anthropic-messages`, `bedrock-converse-stream`, `cursor-agent`, `google-*`, `ollama-chat`, `openai-*`) prematurely aborting slow first-token responses with `Provider stream stalled while waiting for the next event`. The lazy-stream watchdog wrapper was treating the synthetic `start` event (yielded immediately by every provider before the model emits any tokens) as the first real item, which caused the watchdog to drop from `firstItemTimeoutMs` (100s) to `idleTimeoutMs` (30s) before the upstream model had produced anything. The shared `iterateWithIdleTimeout` now keeps `awaitingFirstItem` true until a real progress item arrives, and the lazy-stream wrapper marks `start` as a non-progress keepalive ([#1073](https://github.com/can1357/gajae-code/pull/1073) regression).
269
+ - Healed leaked Kimi K2 chat-template tool-call tokens (`<|tool_calls_section_begin|>` … `<|tool_call_argument_begin|>` … `<|tool_calls_section_end|>`) that some hosts (native `kimi-code` API, OpenRouter, Fireworks, etc.) emit into `delta.content` instead of structured `tool_calls`. The OpenAI-completions stream consumer now strips the markers from visible text, reconstructs the embedded calls as proper `toolCall` content blocks (stream-aware, token-boundary-safe), and promotes `finish_reason: stop` to `toolUse` when calls were healed.
270
+ - Fixed OpenAI-completions Kimi K2 healed-call promotion clobbering non-stop terminal finish reasons (`error`, `length`, `aborted`); promotion now only fires when the prior stop reason is the natural-completion `stop`
271
+ - Fixed OpenAI-completions duplicate Kimi tool calls when a single chunk delivers both leaked markers and a structured `delta.tool_calls`; the healer now strips visible markers but discards its synthesized calls so structured payloads remain the single source of truth
272
+ - Fixed Kimi tool-call healer synthesizing a bogus empty call when assistant text mentions a literal `<|tool_call_end|>` (or `<|tool_call_begin|>` / `<|tool_call_argument_begin|>`) outside an active `<|tool_calls_section_begin|>…<|tool_calls_section_end|>` section; the tokens now survive as text
273
+ - Fixed OpenAI-completions ignoring per-request `StreamOptions.streamFirstEventTimeoutMs` when configuring the underlying OpenAI SDK HTTP timeout, causing slow-before-headers providers to be aborted at the env default before the wrapping watchdog armed
274
+ - Fixed JSON Schema validator silently accepting values that violate `propertyNames`, `patternProperties`, `dependentRequired`, `dependencies`, `if`/`then`/`else`, `contains`, and `prefixItems`; the in-tree validator now enforces these keywords instead of falling through. `unevaluatedProperties`/`unevaluatedItems` remain permissive but log a one-time warning so tool authors are not surprised.
275
+ - Fixed recursive `$ref` schemas being treated as universally valid: the validator previously short-circuited on the second occurrence of any ref it had already seen, so nested values violating the referenced sub-schema passed. Cycle detection now keys on (ref, value-identity) pairs with a depth cap for primitive values, so genuine sub-tree violations are still caught.
276
+ - Fixed JSON Schema meta-validator accepting malformed `if`/`then`/`else` and `dependencies` keywords; each conditional sub-schema is now structurally validated and draft-07 `dependencies` accepts either a schema or a string array of dependent keys.
277
+ - Fixed Zod-emitted wire schemas dropping null-valued unknown root fields before `preserveUnknownRootFields` could snapshot them, so callers like `task.simple` no longer lose a `schema: null` argument and downstream rejection paths fire as intended.
278
+ - Fixed mock provider partial `Usage` to recompute `totalTokens` (and `cost.total` when cost components are supplied) when omitted, instead of reporting 0
279
+ - Fixed mock provider auto-generated tool-call IDs to use a per-instance counter (now reset by `reset()`), so test order no longer affects IDs across `createMockModel()` instances
280
+
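The lazy-stream watchdog fix above comes down to which timeout applies after each event. The sketch below models only that decision. The field names `awaitingFirstItem`, `firstItemTimeoutMs`, and `idleTimeoutMs` come from the changelog, while the function and its signature are hypothetical.

```typescript
interface WatchdogState {
  awaitingFirstItem: boolean;
}

interface Timeouts {
  firstItemTimeoutMs: number;
  idleTimeoutMs: number;
}

/** Returns the timeout to arm after an event, updating the state in place. */
function nextTimeoutMs(state: WatchdogState, isProgress: boolean, timeouts: Timeouts): number {
  // Non-progress keepalives (such as the synthetic `start`) leave the state untouched.
  if (isProgress) state.awaitingFirstItem = false;
  return state.awaitingFirstItem ? timeouts.firstItemTimeoutMs : timeouts.idleTimeoutMs;
}
```

Treating `start` as non-progress keeps the longer first-item budget in force until the model actually produces output.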
281
+ ## [15.0.2] - 2026-05-15
282
+ ### Fixed
283
+
284
+ - Fixed `StreamOptions.fetch` typing to accept fetch-compatible override functions that do not expose `preconnect`, allowing custom fetch implementations to be used without type errors across runtimes
285
+ - Fixed Moonshot Kimi K2.6 forced tool calls to send `thinking: { type: "disabled" }`, avoiding `tool_choice 'specified' is incompatible with thinking enabled` 400s while preserving the requested named tool ([#1077](https://github.com/can1357/gajae-code/issues/1077)).
286
+
287
+ ## [15.0.1] - 2026-05-14
288
+ ### Breaking Changes
289
+
290
+ - Increased the minimum Bun runtime version to `>=1.3.14` for the `@aws-?` package
291
+
292
+ ### Added
293
+
294
+ - Added `installH2Fetch` to patch `globalThis.fetch` so HTTPS requests attempt HTTP/2 over ALPN with automatic HTTP/1.1 fallback when HTTP/2 is unsupported
295
+ - Added priority service-tier traffic to the `premiumRequests` accounting on the OpenAI and OpenAI code providers. Sending `serviceTier: "priority"` now increments `usage.premiumRequests` by 1 per request, matching the existing GitHub Copilot premium-request budget semantics so downstream consumers (e.g. the `gjc stats` "Premium Reqs" card and `/usage`) reflect priority traffic alongside Copilot premium calls.
296
+
297
+ ## [15.0.0] - 2026-05-13
298
+
299
+ ### Added
300
+
301
+ - Added `AuthStorage.onCredentialDisabled(listener)` — a multi-subscriber `on/off` API for `credential_disabled` events. Returns an unsubscribe function; calling it more than once is a no-op. Multiple subscribers all receive every disable event, with synchronous and async exceptions isolated per-listener so a misbehaving subscriber cannot starve the rest of the chain. Buffer-and-replay semantics are preserved: events emitted while no listener is subscribed are buffered (FIFO, capped at 32) and replayed once to the listener that triggers the empty→non-empty transition. After every subscriber unsubscribes, subsequent disable events buffer again until the next subscribe.
302
+
303
+ ### Fixed
304
+
305
+ - Fixed OAuth credentials being silently disabled when two gjc processes (or any two `AuthStorage` instances sharing an `agent.db`) race on token refresh. Anthropic rotates refresh tokens on every use, so the loser's `invalid_grant` response previously soft-deleted the row that the winner just rotated, forcing the user to `/login` again. `#tryOAuthCredential` now re-reads the row from disk before declaring a definitive failure: if the persisted `refresh` differs from the snapshot it tried, the peer-rotated credential is reloaded and the request retries against the fresh token instead of disabling the live row.
306
+ - Closed a remaining race window in OAuth refresh-failure handling: between re-reading the credential row to check for peer rotation and the subsequent soft-delete, another process could still complete a refresh and rotate the row, leaving us to disable the freshly-rotated credential by `id`. The disable now runs as a single CAS update conditioned on the row's `data` still matching the snapshot we tried to refresh, and on `disabled_cause IS NULL`. If the CAS reports 0 rows changed (peer rotation, or row already disabled by a concurrent failure on the same snapshot), we reload from disk and retry instead of mutating the wrong row or emitting a spurious `credential_disabled` event.
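The compare-and-swap disable described above can be illustrated with an in-memory model. The sketch below is an assumption-laden stand-in, not the package's storage code: `CredentialTable`, `CredentialRow`, and `handleRefreshFailure` are hypothetical names, and a `Map` replaces the real database so the conditional update can be shown without SQL.

```typescript
interface CredentialRow {
  id: number;
  data: string; // serialized credential, including the refresh token
  disabledCause: string | null;
}

// Hypothetical in-memory stand-in for the credentials table.
class CredentialTable {
  private readonly rows = new Map<number, CredentialRow>();

  upsert(row: CredentialRow): void {
    this.rows.set(row.id, { ...row });
  }

  read(id: number): CredentialRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  /**
   * Models a single conditional update: disable the row only if its data still
   * matches the snapshot and it is not already disabled. Returns rows changed.
   */
  disableIfUnchanged(id: number, snapshotData: string, cause: string): number {
    const row = this.rows.get(id);
    if (!row || row.data !== snapshotData || row.disabledCause !== null) return 0;
    row.disabledCause = cause;
    return 1;
  }
}

type Outcome = "disabled" | "reload_and_retry";

function handleRefreshFailure(table: CredentialTable, snapshot: CredentialRow, cause: string): Outcome {
  // 0 rows changed means a peer rotated the row or already disabled it.
  const changed = table.disableIfUnchanged(snapshot.id, snapshot.data, cause);
  return changed === 1 ? "disabled" : "reload_and_retry";
}
```

Because the check and the write happen in one step, there is no window between "re-read" and "disable" in which a peer rotation can slip through.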
307
+ ### Changed
308
+ - Lowered the default steady-state stream idle timeout from 120s to 30s while preserving the existing environment overrides.
309
+
310
+ ### Fixed
311
+ - Lazy built-in provider streams now enforce the shared idle watchdog and abort stalled provider requests, so session auto-retry can continue after transient network drops instead of remaining stuck. Caller aborts still terminate as aborted.
312
+
313
+ ## [14.9.3] - 2026-05-10
314
+
315
+ ### Fixed
316
+ - Anthropic provider now retries generic transient connect failures (`unable to connect`, `fetch failed`, `connection error`, etc.) by falling back to the shared `isRetryableError` allowlist after the provider-specific patterns. Previously these errors bypassed the hand-curated regex in `isProviderRetryableError` and aborted the stream on the first attempt, while the OpenAI SDK and OpenAI code `fetchWithRetry` paths already handled them.
317
+
318
+ ## [14.9.0] - 2026-05-10
319
+
320
+ ### Added
321
+
322
+ ### Fixed
323
+ - Fixed silent forwarding of image content (for example Python plot output rendered in the terminal) to models without vision support, which produced opaque 404 errors from upstream. Image blocks are now stripped and replaced with a `[image omitted: model does not support vision]` placeholder for non-vision models, including tool-result payloads ([#967](https://github.com/can1357/gajae-code/issues/967), [#968](https://github.com/can1357/gajae-code/issues/968)).
324
+
325
+ - Added `AuthStorage` `onCredentialDisabled` callback (sync or async) so embedders can react when a credential is automatically disabled (e.g. OAuth refresh fails with `invalid_grant`) — useful for surfacing a banner or auto-launching a re-login flow instead of letting the credential silently disappear. Sync throws and async rejections are both caught and logged so a misbehaving subscriber cannot break the disable path.
326
+ - Added Anthropic OAuth `account.uuid` and `account.email_address` extraction from the `/v1/oauth/token` exchange and refresh responses; both `AnthropicOAuthFlow.exchangeToken()` and `refreshAnthropicToken()` now populate `OAuthCredentials.{accountId, email}` so downstream consumers can attribute requests to the authenticated account without a separate `/api/oauth/profile` round-trip.
327
+ - Added `onSseEvent` stream diagnostics so HTTP SSE providers can expose raw SSE frames without changing parsed model output.
328
+ - Added `streamIdleTimeoutMs` option (and `PI_STREAM_IDLE_TIMEOUT_MS` env override; `PI_OPENAI_STREAM_IDLE_TIMEOUT_MS` remains a backward-compatible alias) for a steady-state inter-event watchdog. Set to `0` to disable.
329
+ - Added a semantic-progress predicate to OpenAI Responses and OpenAI code SSE/WebSocket transports so `response.in_progress`-style keepalives no longer reset the idle deadline on stalled tool calls.
330
+
331
+ ### Changed
332
+
333
+ - Anthropic streams now enforce a steady-state idle timeout (defaults to 120s, same control as `PI_STREAM_IDLE_TIMEOUT_MS`) in addition to the first-event watchdog. Long-running responses that go fully silent between events will now surface as `Anthropic stream stalled while waiting for the next event` instead of hanging.
334
+ - Fixed `resolveAnthropicMetadataUserId()` to accept JSON-format `user_id` values that match real Anthropic Code's payload shape (`{ device_id, account_uuid, session_id, ... }` from `services/api/anthropic-model.ts:getAPIMetadata`). Previously only the synthetic `user_<hex>_account_<uuid>_session_<uuid>` cloaking format was accepted on OAuth, which caused stable session-keyed metadata supplied by callers to be discarded and replaced with fresh random entropy on every request — defeating session-count attribution on the Anthropic model OAuth path.
335
+
336
+ ## [14.8.0] - 2026-05-09
337
+
338
+ ### Fixed
339
+ - Fixed Gemini 3 Pro thinking metadata so `medium` effort is rejected with the expected error instead of being silently accepted: `ThinkingConfig` now carries an optional explicit `levels` list that survives `expandEffortRange`, letting non-contiguous supported sets (e.g. `[low, high]`) round-trip through enrichment.
340
+ - Fixed Kimi Code OAuth expiry handling to refresh access tokens 5 minutes before server expiry, avoiding daily 401s from using tokens right up to the cutoff.
341
+ - Fixed OpenAI Responses custom tool replay to preserve custom tool call item IDs with the `ctc_` prefix instead of rewriting them as `fc_` function-call IDs ([#977](https://github.com/can1357/gajae-code/issues/977)).
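The Kimi Code expiry fix in the list above comes down to treating a token as expired slightly before the server does. A minimal sketch, assuming expiry is tracked as a millisecond timestamp; `REFRESH_SKEW_MS` and `shouldRefresh` are hypothetical names, not the package's API:

```typescript
// Refresh this long before the server-side expiry (5 minutes, per the changelog entry).
const REFRESH_SKEW_MS = 5 * 60 * 1000;

/** Returns true once the current time is inside the refresh window. */
function shouldRefresh(expiresAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs >= expiresAtMs - REFRESH_SKEW_MS;
}
```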
342
+
343
+ ## [14.7.6] - 2026-05-07
344
+
345
+ ### Added
346
+
347
+ - Added `hideThinkingSummary` option to `SimpleStreamOptions`. When true, `streamSimple` requests that the underlying provider omit reasoning/thinking summaries: Anthropic receives `thinking.display = "omitted"` (where supported), and OpenAI Responses / Azure / OpenAI code providers leave `reasoning.summary` unset so the server skips emitting the human-readable summary stream entirely.
348
+
349
+ ### Changed
350
+
351
+ - Changed the OpenAI Responses, Azure OpenAI Responses, and OpenAI code providers to omit `reasoning.summary` from requests when `reasoningSummary` is explicitly `null` (previously fell back to `"auto"`).
352
+ ## [14.7.5] - 2026-05-07
353
+
354
+ ### Added
355
+
356
+ - Added `OpenAICompat.supportsMultipleSystemMessages` so chat-completions hosts can opt out of separate leading system blocks. Auto-detected as `true` for OpenAI, Azure, OpenRouter, Cerebras, Together, Fireworks, Groq, DeepSeek, Mistral, xAI, Z.ai, GitHub Copilot, and Zenmux; `false` for MiniMax, Alibaba Dashscope, and Qwen Portal whose chat templates reject follow-up system messages. Unknown OpenAI-compatible hosts (custom vLLM/local) default to `false`; users can opt back in via `compat.supportsMultipleSystemMessages: true`.
357
+
358
+ ### Fixed
359
+
360
+ - Fixed strict-template OpenAI-compatible hosts (e.g. Qwen 3.5+ via vLLM, MiniMax) rejecting follow-up `system`/`developer` messages by coalescing ordered system prompts into a single block joined by `\n\n` when `compat.supportsMultipleSystemMessages` is false. Canonical hosts continue to receive separate blocks so KV-cache reuse stays effective when only the trailing prompt changes ([#958](https://github.com/can1357/gajae-code/issues/958)).
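The coalescing rule above can be sketched as a small pure function. This is an illustration under stated assumptions: `buildSystemMessages` and `SystemMessage` are hypothetical names, the `system` role is used for every block, and blank entries are dropped as the 14.7.0 notes describe.

```typescript
interface SystemMessage {
  role: "system";
  content: string;
}

function buildSystemMessages(systemPrompt: string[], supportsMultipleSystemMessages: boolean): SystemMessage[] {
  // Blank entries are never sent to the provider.
  const prompts = systemPrompt.filter((p) => p.trim().length > 0);
  if (prompts.length === 0) return [];
  if (supportsMultipleSystemMessages) {
    // Separate blocks keep a stable prefix, so KV-cache reuse survives changes to the last prompt.
    return prompts.map((content): SystemMessage => ({ role: "system", content }));
  }
  // Strict chat templates get one block, joined in order by a blank line.
  return [{ role: "system", content: prompts.join("\n\n") }];
}
```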
361
+
362
+ ## [14.7.2] - 2026-05-06
363
+
364
+ ### Fixed
365
+
366
+ - Fixed vLLM model discovery to use `max_model_len` as the context window when the endpoint reports it.
367
+ - Fixed custom Ollama Cloud/local-proxy model aliases (for example `deepseek-v4-pro:cloud`) to inherit bundled cache-pricing metadata when the upstream model is known ([#937](https://github.com/can1357/gajae-code/issues/937)).
368
+ - Fixed local Ollama model discovery to apply `/api/show` thinking and vision capabilities in addition to native context windows ([#928](https://github.com/can1357/gajae-code/issues/928)).
369
+
370
+ ## [14.7.0] - 2026-05-04
371
+ ### Breaking Changes
372
+
373
+ - Changed `Context.systemPrompt` from a string to `string[]`, so callers must now pass an array of prompts instead of a single string
374
+ - Non-array system prompts now throw at runtime because request builders normalize system prompts as an array
375
+
376
+ ### Added
377
+
378
+ - Added support for multiple system prompts by changing `Context.systemPrompt` to an ordered string array and preserving provider-appropriate instruction precedence
379
+
380
+ ### Changed
381
+
382
+ - Changed request builders for Anthropic, OpenAI, Bedrock, Azure, Cursor, Google, and Ollama to propagate every non-empty system prompt entry without demoting durable instructions into ordinary conversation turns
383
+
384
+ ### Fixed
385
+
386
+ - Filtered out empty normalized system prompts so blank entries are no longer sent to providers
387
+ - Removed blank system prompt strings from provider payloads to avoid unnecessary empty instruction messages
388
+
389
+ ## [14.6.6] - 2026-05-04
390
+
391
+ ### Added
392
+
393
+ - Added always-on OpenRouter response caching (1h TTL) by sending `X-OpenRouter-Cache: true` and `X-OpenRouter-Cache-TTL: 3600` on every OpenRouter request, so identical requests replay from OpenRouter's edge cache for free (see https://openrouter.ai/docs/features/response-caching).
394
+
395
+ ## [14.6.4] - 2026-05-03
396
+
397
+ ### Fixed
398
+
399
+ - Fixed OpenAI code provider websocket continuations to retry with full context when `previous_response_id` expires server-side instead of surfacing `previous_response_not_found`.
400
+
401
+ ## [14.6.2] - 2026-05-03
402
+ ### Added
403
+
404
+ - Added `EventStream.fail(err)` method to terminate the async iterator with an error, enabling consumers to catch stream-level failures via `for await` without hanging
405
+
406
+ ### Fixed
407
+
408
+ - Fixed OpenAI Responses tool schema conversion to rewrite non-strict `oneOf` unions to `anyOf` before sending tools to the Responses API ([#920](https://github.com/can1357/gajae-code/issues/920))
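As a rough illustration of the `oneOf` to `anyOf` rewrite, the sketch below walks a JSON schema and renames the keyword. It is a simplification: `rewriteOneOf` is a hypothetical helper, it does not distinguish strict from non-strict tools as the real conversion does, and it would overwrite an `anyOf` that sits next to a `oneOf` on the same node.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function rewriteOneOf(node: Json, isPropertyMap: boolean = false): Json {
  if (Array.isArray(node)) return node.map((item) => rewriteOneOf(item));
  if (node === null || typeof node !== "object") return node;
  const out: { [key: string]: Json } = {};
  for (const key of Object.keys(node)) {
    // Keys directly inside a `properties` map are user-chosen names, not schema keywords.
    const isKeyword = !isPropertyMap && key === "oneOf";
    const childIsPropertyMap = !isPropertyMap && key === "properties";
    out[isKeyword ? "anyOf" : key] = rewriteOneOf(node[key], childIsPropertyMap);
  }
  return out;
}
```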
409
+
410
+ ## [14.6.0] - 2026-05-02
411
+
412
+ ### Added
413
+
414
+ - Added `disableReasoning` to stream and OpenAI completion options to force reasoning off for models that support it, sending `reasoning: { enabled: false }` for OpenRouter-compatible requests
415
+ - Added `thinkingDisplay` option to Anthropic options to control whether adaptive and explicit reasoning is returned as `summarized` or `omitted`
416
+ - Added Anthropic model compatibility flags `supportsEagerToolInputStreaming` and `supportsLongCacheRetention` for API-capability-specific request behavior
417
+
418
+ ### Changed
419
+
420
+ - Changed Anthropic request payloads to send `thinking: { type: "disabled" }` when `thinkingEnabled` is explicitly `false` on reasoning-enabled models
421
+ - Changed Anthropic cache retention handling so `cacheRetention: "long"` now uses `ttl: "1h"` only for canonical Anthropic endpoints with long-cache support
422
+ - Changed Anthropic tool schema generation to include `eager_input_streaming` only on models that advertise support
423
+ - Changed Anthropic OAuth login flow to include browser fallback guidance and richer error context when token exchange or refresh fails
424
+
425
+ ### Fixed
426
+
427
+ - Fixed Anthropic non-thinking requests to include the caller-provided `temperature` value in request payloads
428
+ - Fixed Anthropic `anthropic-model-opus-4-7` non-thinking payloads to omit sampling fields (`temperature`, `top_p`, and `top_k`)
429
+ - Fixed OpenAI code provider base URL normalization so configured base URLs with or without `/openai-code` or `/openai-code/responses` now resolve to `/openai-code/responses`
430
+ - Fixed OpenAI code provider websocket handling to parse JSON from non-string message payloads including `ArrayBuffer`, typed arrays, and `Blob` values
431
+ - Fixed OpenAI code provider websocket handshakes to replace stale `openai-beta` values with the websocket beta and avoid sending request-body headers over websocket transport
432
+ - Fixed abort tracking so caller-initiated cancellations are treated as user aborts even after local watchdog timeouts, preventing unintended automatic retries
433
+ - Fixed Anthropic stream handling to parse raw SSE envelopes directly, ignore unrelated events, and repair malformed JSON in SSE payloads
434
+ - Fixed Anthropic streaming to emit an explicit error when the SSE stream ends without a `message_stop` event
435
+ - Fixed OpenAI code provider websocket continuations to send true `previous_response_id` deltas for `store: false` transcripts, expose request stats, and default text verbosity to `low` unless explicitly overridden.
436
+ - Fixed OpenAI code provider websocket append reuse after `response.completed` terminal events.
437
+
438
+ ## [14.5.14] - 2026-05-01
439
+ ### Added
440
+
441
+ - Added package-level `google-gemini-headers` exports (`getGeminiCliHeaders`, `getGeminiCliUserAgent`, `getAntigravityHeaders`, `extractRetryDelay`, and `ANTIGRAVITY_SYSTEM_INSTRUCTION`) for header and retry handling reuse without importing full Google providers
442
+
443
+ ### Changed
444
+
445
+ - Changed package exports and streaming/provider wiring to load heavy Google/Kimi/GitLab/synthetic provider modules lazily through `register-builtins`, reducing startup import overhead from optional provider SDKs
446
+
447
+ ### Fixed
448
+
449
+ - Fixed DeepSeek V4 tool-call follow-up 400 errors from three root causes:
450
+ - Mapped `reasoning_effort` "xhigh" to "max" for DeepSeek-family models on any provider (NVIDIA, OpenCode-Go, etc.), not just `deepseek`
451
+ - Recovered `reasoning_content` from thinking blocks with valid signatures that were filtered by the non-empty-text check
452
+ - Added empty-string fallback when `reasoning_content` is genuinely absent (e.g. proxy-stripped) but the provider requires the field
453
+
454
+ ## [14.5.13] - 2026-05-01
455
+
456
+ ### Breaking Changes
457
+
458
+ - Removed `utils/oauth` re-exports from the package entrypoint, so OAuth helper imports from the root module must be updated
459
+
460
+ ## [14.5.10] - 2026-04-30
461
+
462
+ ### Added
463
+
464
+ - Added provider response metadata callbacks for Anthropic and OpenAI streaming requests.
465
+
466
+ ## [14.5.9] - 2026-04-30
467
+
468
+ ### Added
469
+
470
+ - Added `usage.reasoningTokens` to OpenAI and Google usage output when providers report reasoning/thinking tokens
471
+ - Added `usage.cttl.ephemeral5m` and `usage.cttl.ephemeral1h` to report Anthropic cache-write TTL token buckets
472
+ - Added `usage.server.webSearch` and `usage.server.webFetch` to report Anthropic server tool-call request counts
473
+
474
+ ### Fixed
475
+
476
+ - Fixed OpenAI usage attribution to avoid double-counting `reasoning_tokens` in output totals
477
+ - Fixed Anthropic streaming usage handling so a previously populated cache TTL breakdown is preserved when later events omit `cache_creation`
478
+
479
+ ## [14.5.4] - 2026-04-28
480
+
481
+ ### Changed
482
+
483
+ - Changed OpenAI custom Lark grammar payloads to strip comments and blank lines before sending provider requests.
484
+
485
+ ### Fixed
486
+
487
+ - Fixed OpenAI code provider GPT model pricing by inheriting matching OpenAI catalog rates for zero-priced discovered OpenAI code entries.
488
+
489
+ ## [14.5.3] - 2026-04-27
490
+
491
+ ### Added
492
+
493
+ - Added `fireworks` as a supported provider with API key login flow and credential storage
494
+ - Added Fireworks model catalog support with `fireworks`-scoped openai-completions models `glm-5`, `glm-5.1`, `kimi-k2.5`, `kimi-k2.6`, and `minimax-m2.7`
495
+ - Added built-in discovery wiring so providers with base URL `api.fireworks.ai` are recognized as OpenAI-compatible and can use streaming token control
496
+
497
+ ### Changed
498
+
499
+ - Updated the built-in model catalog to use corrected `contextWindow` and `maxTokens` values for many existing models instead of placeholder limits
500
+ - Updated several model cost entries, including cache-read pricing, to corrected values
501
+
502
+ ### Fixed
503
+
504
+ - Fixed Fireworks request formatting by translating between public model IDs and API wire IDs when sending OpenAI-completions requests
505
+ - Fixed OpenAI-compatible model parameter handling for Fireworks by allowing `max_tokens` to be sent during requests
506
+
507
+ ## [14.5.1] - 2026-04-26
508
+
509
+ ### Fixed
510
+
511
+ - Fixed NVIDIA NIM DeepSeek-V4 models leaking chat-template tool-call markers (e.g. `<|DSML|tool_calls|>`) into visible response text by stripping the special tokens from streamed `delta.content` ([#798](https://github.com/can1357/gajae-code/issues/798))
512
+
513
+ ## [14.4.0] - 2026-04-26
514
+
515
+ ### Added
516
+
517
+ - Added an `examples` option to `StringEnum` to include example values in the generated schema
518
+
519
+ ### Changed
520
+
521
+ - Changed Anthropic tool schema generation to strip unsupported schema fields (including `patternProperties`), add `additionalProperties: false` for object types, and apply Anthropic strict-mode limits when marking tools as strict
522
+ - Changed Anthropic strict tool planning to cap strict `tools` at twenty entries and convert excess optional/union parameters to nullable schemas to stay within provider constraints
523
+
524
+ ### Fixed
525
+
526
+ - Fixed Anthropic tool schema compilation failures by keeping the `write` tool out of the strict-tool allowlist when the full coding-agent tool set is active
527
+ - Fixed Anthropic 400 `tools.*.custom: For 'object' type, property 'minItems' is not supported` by stripping `minItems` from object-shaped JSON schema nodes (array nodes still keep supported `minItems` values)
528
+ - Fixed Anthropic tool schemas that used tuple-style arrays by stripping unsupported `maxItems` and only preserving provider-supported `minItems` values
529
+ - Fixed Anthropic and OpenRouter Anthropic tool calls that previously failed with `compiled grammar is too large` by retrying automatically without strict tool schemas and reusing non-strict mode for subsequent requests in the same provider session
530
+ - Fixed parsing of JSON tool arguments containing raw control characters inside string values (such as embedded newlines) by escaping them before JSON parsing
531
+ - Fixed `validateToolArguments` to accept stringified objects and arrays that include literal control characters inside string fields
532
+ - Fixed OpenAI code provider Spark OAuth selection to fall back to non-Pro accounts when no ChatGPT Pro account is connected, so users without a Pro account can still attempt Spark requests in case the server permits access.
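The control-character fix listed above (raw newlines and similar characters inside JSON string values) can be approximated by a single pass that tracks whether the scanner is inside a string. `escapeControlCharsInStrings` is a hypothetical name, and this sketch only covers that one repair, not the package's full argument-healing logic.

```typescript
function escapeControlCharsInStrings(raw: string): string {
  let out = "";
  let inString = false;
  let escaped = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (!inString) {
      // Outside strings, whitespace such as newlines is legal JSON and is left alone.
      if (ch === '"') inString = true;
      out += ch;
    } else if (escaped) {
      escaped = false;
      out += ch;
    } else if (ch === "\\") {
      escaped = true;
      out += ch;
    } else if (ch === '"') {
      inString = false;
      out += ch;
    } else if (raw.charCodeAt(i) < 0x20) {
      // Raw control character inside a string: emit a \uXXXX escape instead.
      out += "\\u" + ("0000" + raw.charCodeAt(i).toString(16)).slice(-4);
    } else {
      out += ch;
    }
  }
  return out;
}
```

After this pass, `JSON.parse` accepts payloads that it would otherwise reject with a "bad control character" error.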
533
+
534
+ ## [14.3.0] - 2026-04-25
535
+
536
+ ### Added
537
+
538
+ - Added support for Anthropic model Opus 4.7 (`anthropic-model-opus-4-7`) ([#726](https://github.com/can1357/gajae-code/issues/726))
539
+ - Suppresses sampling parameters (temperature/top_p/top_k) that Opus 4.7 rejects
540
+ - Enables `display: "summarized"` for adaptive thinking to restore visible thinking content
541
+
542
+ ### Fixed
543
+
544
+ - Fixed Cursor provider losing conversation history on follow-up turns (model responding "this appears to be the start of our session") by populating `ConversationStateStructure.rootPromptMessagesJson` with JSON blob IDs for the system prompt plus prior user/assistant/tool-result messages. Cursor's server builds the model prompt from `rootPromptMessagesJson`, not from the protobuf `turns[]` tree, so sending only the system prompt there caused prior turns to be dropped
545
+ - Fixed Cursor provider multi-turn conversations failing with `Connect error internal: Blob not found` on the second message by storing `ConversationStateStructure.turns`, `AgentConversationTurnStructure.user_message`, and `AgentConversationTurnStructure.steps` as content-addressed blob IDs in the KV store (matching the existing handling for `rootPromptMessagesJson`) rather than sending the raw serialized bytes inline ([#678](https://github.com/can1357/gajae-code/issues/678))
546
+
547
+ ## [14.2.1] - 2026-04-24
548
+
549
+ ### Fixed
550
+
551
+ - Fixed OpenAI code provider Spark OAuth selection to require a verified ChatGPT Pro account instead of falling back to Plus or unknown-plan accounts.
552
+
553
+ ## [14.2.0] - 2026-04-23
554
+
555
+ ### Added
556
+
557
+ - Added `gpt-5.5` to the built-in model catalog for both OpenAI Responses (`openai`) and local `litellm` (`openai-completions`) providers
558
+ - Added `gpt-image-2` to the `litellm` built-in model catalog
559
+ - Added `isCopilotTransientModelError()` and `callWithCopilotModelRetry()` helpers in `utils/retry` that detect GitHub Copilot's intermittent `HTTP 400 model_not_supported` responses for preview models (`gpt-5.3-openai-code`, `gpt-5.4`, `gpt-5.4-mini`, ...) and retry the request up to three times with backoff. OpenAI Responses, OpenAI Completions, and Anthropic provider paths now participate in this retry when the model is served through Copilot.
560
+ - Added OpenAI Responses custom-tool grammar support for patch-envelope `apply_patch` calls, including freeform streaming, history replay, and forced tool-choice mapping to the custom wire name.
561
+
562
+ ### Changed
563
+
564
+ - Updated built-in model metadata with revised `contextWindow`, `maxTokens`, and pricing values for existing entries
565
+ - Changed generated model policies to assign `applyPatchToolType: "freeform"` for first-party GPT-5 OpenAI Responses and OpenAI code models, so regenerated `models.json` preserves the `apply_patch` custom-tool metadata.
566
+ - Renamed `rewriteCopilotAuthError` to `rewriteCopilotError` and extended it to rewrite `HTTP 400 model_not_supported` after retries are exhausted with guidance about Copilot's OAuth-client-specific rollout gap (see opencode#13313).
567
+
568
+ ### Fixed
569
+
570
+ - Fixed Amazon Bedrock proxy handling to honor lowercase `http_proxy`, `https_proxy`, and `all_proxy` environment variables when using HTTP/1 fallback
571
+ - Fixed Amazon Bedrock streaming behind corporate HTTP proxies by using a proxy-aware HTTP/1 transport when `HTTPS_PROXY`, `HTTP_PROXY`, or `ALL_PROXY` is configured, including AWS SSO credential calls.
572
+ - Fixed Amazon Bedrock requests to retry once with HTTP/1 when the AWS SDK's default HTTP/2 transport fails before streaming begins.
573
+ - Fixed OpenAI Responses streaming to display thinking tokens from local providers (llama.cpp, etc.) that send raw `reasoning_text.delta` events and empty `summary` arrays in `output_item.done`. Previously, thinking content was silently dropped during streaming while non-streaming mode worked correctly.
574
+ - Synced the bundled OpenCode Go catalog with the current docs so `kimi-k2.6`, `mimo-v2.5`, and `mimo-v2.5-pro` appear in offline/default model lists.
575
+
576
+ ## [14.1.3] - 2026-04-17
577
+
578
+ ### Fixed
579
+
580
+ - Preserved user-provided `session_id` and `x-client-request-id` headers in OpenAI Responses requests instead of overriding them with automatic session-derived values
581
+ - Stopped sending `session_id` and `x-client-request-id` headers for OpenAI Responses requests when `cacheRetention` is set to `none`
582
+ - Fixed direct OpenAI Responses requests to send `session_id` and `x-client-request-id` from the same session-derived value as `prompt_cache_key`, improving prompt cache affinity for append-only sessions
583
+
584
+ ## [14.1.1] - 2026-04-14
585
+
586
+ ### Added
587
+
588
+ - Added `toolStrictMode` compatibility option (`"all_strict"` or `"none"`) to OpenAI-compatible model config to force tool schemas to be sent uniformly strict, uniformly non-strict, or keep mixed per-tool behavior
589
+
590
+ ### Changed
591
+
592
+ - Changed Cerebras OpenAI-compatible providers to default `toolStrictMode` to `"all_strict"` unless explicitly overridden
593
+
594
+ ### Fixed
595
+
596
+ - Fixed OpenAI Completions handling for providers that reject mixed `strict` flags by automatically retrying with non-strict tool schemas when an initial all-strict tool request fails with strict-format 400/422 errors
597
+ - Fixed OpenAI-completions error reporting by including captured JSON error body details such as type, param, and code when a request fails without a body in the thrown SDK error
598
+ - Fixed shell execution failure responses to preserve all result fields when sanitizing, preventing truncated metadata in stream results
599
+ - Fixed context overflow detection to recognize `model_context_window_exceeded` from z.ai / GLM providers, preventing infinite retry loops when context window is exceeded ([#638](https://github.com/can1357/gajae-code/issues/638))
600
+ - Fixed strict tool schema enforcement to preserve `additionalProperties: false` and required keys for reused nested object schemas, preventing invalid `todo_write` function schemas in OpenAI code/OpenAI requests
601
+ - Fixed GitHub Copilot reasoning regressions by preserving GPT-5.x / Anthropic model 4.x reasoning controls instead of stripping them from requests ([#773](https://github.com/can1357/gajae-code/issues/773))
602
+
603
+ ## [14.1.0] - 2026-04-11
604
+
605
+ ### Added
606
+
607
+ - Added `accountId` to usage report metadata
608
+
609
+ ### Changed
610
+
611
+ - Changed usage parsing to emit a usage report with available fields when parsing fails, rather than returning null
612
+
613
+ ### Fixed
614
+
615
+ - Fixed `planType` resolution to fall back to the raw payload `plan_type` when parsed value is absent
616
+ - Fixed usage metadata `raw` fallback to preserve the original payload when parsed raw output is missing
617
+
618
+ ## [14.0.5] - 2026-04-11
619
+
620
+ ### Changed
621
+
622
+ - Switched GitHub Copilot authentication from VS Code extension impersonation to the opencode OAuth flow, eliminating TOS concerns. Existing users will need to re-authenticate once with `/login github-copilot`.
623
+ - Simplified Copilot token handling: GitHub OAuth token is used directly for all API requests (no JWT exchange or refresh cycle).
624
+ - Changed GitHub Copilot API base URL from `api.individual.githubcopilot.com` to `api.githubcopilot.com`.
625
+ - Updated default OpenAI stream idle timeout to 120,000 milliseconds to keep stream generation alive longer
626
+
627
+ ### Fixed
628
+
629
+ - Fixed duplicate synthetic tool results being generated when a real tool result appears later in message history
630
+ - Fixed GitHub Copilot `/models` discovery to unwrap structured OAuth credentials before sending the bearer token, preserving dynamic catalog refresh for OAuth-backed callers.
631
+
632
+ ### Removed
633
+
634
+ - Removed Copilot JWT proxy-ep base URL resolution (no longer needed with opencode auth).
635
+
636
+ ## [14.0.3] - 2026-04-09
637
+
638
+ ### Fixed
639
+
640
+ - Fixed Ollama discovery cache normalization so cached models upgrade to the OpenAI Responses transport after the provider change
641
+
642
+ ## [14.0.0] - 2026-04-08
643
+
644
+ ### Breaking Changes
645
+
646
+ - Removed `coerceNullStrings` function and its automatic null-string coercion behavior from JSON parsing
647
+
648
+ ### Added
649
+
650
+ - Added support for OpenRouter provider with strict mode detection
651
+ - Added automatic cleaning of literal escape sequences (`\n`, `\t`, `\r`) in JSON parsing to handle LLM encoding confusion
652
+ - Added support for healing JSON with trailing junk after balanced containers (e.g., `]\n</invoke>`)
653
+ - Added `OPENAI_CODE_STARTUP_EVENT_CHANNEL` constant and `OpenAI codeStartupEvent` type for monitoring OpenAI code provider initialization status
654
+ - Added automatic healing of malformed JSON with single-character bracket errors at the end of strings, improving LLM tool argument parsing robustness
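The "trailing junk after balanced containers" healing mentioned in the list above can be sketched as a scan that stops at the point where the outermost bracket closes. `stripTrailingJunk` is a hypothetical name; the sketch counts nesting depth only and does not check that bracket types match, which a production healer would likely need to do.

```typescript
function stripTrailingJunk(raw: string): string {
  const text = raw.trim();
  if (text[0] !== "{" && text[0] !== "[") return raw;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Brackets inside string values must not affect the depth counter.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      // The outermost container just closed: everything after it is junk.
      if (depth === 0) return text.slice(0, i + 1);
    }
  }
  return raw; // never balanced, leave the input untouched
}
```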
655
+
656
+ ## [13.19.0] - 2026-04-05
657
+
658
+ ### Fixed
659
+
660
+ - Fixed GitHub Copilot model context window detection by correcting fallback priority for `maxContextWindowTokens` and `maxPromptTokens`
661
+ - Fixed Gemini 2.5 Pro context window detection in GitHub Copilot model limits test
662
+ - Fixed Anthropic model Opus 4.6 context window detection in GitHub Copilot model limits test
663
+ - Fixed Anthropic streaming to suppress transient SDK console errors for malformed SSE keep-alive frames so the TUI only shows surfaced provider errors
664
+
665
+ - Added environment-based credential fallback for the OpenAI code provider.
666
+
667
+ ## [13.17.6] - 2026-04-01
668
+
669
+ ### Fixed
670
+
671
+ - Fixed Anthropic first-event timeouts to exclude stream connection setup from the watchdog, preserve timeout-specific retry classification after local aborts, and reset retry state cleanly between attempts
672
+
673
+ ## [13.17.5] - 2026-04-01
674
+
675
+ ### Changed
676
+
677
+ - Increased default first-event timeout from 15s to 45s to better accommodate longer request setup times
678
+ - Modified first-event watchdog to inherit idle timeout when it exceeds the default, ensuring consistent timeout behavior across different configurations
679
+
680
+ ### Fixed
681
+
682
+ - Fixed first-event watchdog initialization timing so it no longer starts before the actual stream request is created, preventing premature timeouts during request setup
683
+ - Fixed first-event watchdog timing so OpenAI-family providers no longer count slow request setup against the first streamed event timeout, and raised the default first-event timeout to avoid false aborts after long tool turns
684
+
685
+ ## [13.17.2] - 2026-04-01
686
+
687
+ ### Fixed
688
+
689
+ - Fixed OpenAI-family first-event timeouts to preserve provider-specific timeout errors for retry classification instead of flattening them to generic aborts ([#591](https://github.com/can1357/gajae-code/issues/591))
690
+
691
+ ## [13.17.1] - 2026-04-01
692
+
693
+ ### Added
694
+
695
+ - Added `thinkingSignature` field to thinking content blocks to preserve the original reasoning field name (e.g., `reasoning_text`, `reasoning_content`) for accurate follow-up requests
696
+ - Added first-event timeout detection for streaming responses to abort stuck requests before user-visible content arrives
697
+ - Added `PI_STREAM_FIRST_EVENT_TIMEOUT_MS` environment variable to configure first-event timeout (defaults to 15 seconds or idle timeout, whichever is lower)
698
+
699
+ ### Changed
700
+
701
+ - Changed thinking block handling to track and distinguish between different reasoning field types, enabling proper field name preservation across multiple turns
702
+
703
+ ### Fixed
704
+
705
+ - Fixed Anthropic stream timeout errors to be properly retried by recognizing first-event timeout messages
706
+ - Fixed stream stall detection to distinguish between first-event timeouts and idle timeouts, enabling faster recovery for stuck connections
707
+
708
+ ### Added
709
+
710
+ - Added Vercel AI Gateway to `/login` providers for interactive API key setup
711
+
712
+ ### Fixed
713
+
714
+ - Fixed `gjc commit` failing with HTTP 400 errors when using reasoning-enabled models on OpenAI-compatible endpoints that don't support the `developer` role (e.g., GitHub Copilot, custom proxies). Now falls back to `system` role when `developer` is unsupported.
715
+
716
+ ## [13.17.0] - 2026-03-30
717
+
718
+ ### Changed
719
+
720
+ - Bumped zai provider default model from glm-4.6 to glm-5.1
721
+
722
+ ## [13.16.5] - 2026-03-29
723
+
724
+ ### Added
725
+
726
+ - Added Gemma 3 27B model support for Google Generative AI
727
+
728
+ ### Changed
729
+
730
+ - Updated Kwaipilot KAT-Coder-Pro V2 model display name and pricing information
731
+ - Updated Kwaipilot KAT-Coder-Pro V2 context window from 222,222 to 256,000 tokens and max tokens from 8,888 to 80,000
732
+
733
+ ### Fixed
734
+
735
+ - Fixed `normalizeAnthropicBaseUrl` returning an empty string instead of `undefined` when `baseUrl` is empty
736
+
737
+ ## [13.16.4] - 2026-03-28
738
+
739
+ ### Added
740
+
741
+ - Added support for Groq Compound and Compound Mini models with extended context window (131K tokens) and configurable thinking levels
742
+ - Added support for OpenAI GPT-OSS-Safeguard-20B model with reasoning capabilities across multiple providers
743
+ - Added support for Kwaipilot KAT-Coder-Pro V2 model across Kilo, NanoGPT, and OpenRouter providers
744
+ - Added support for GLM-5.1 model with extended context window (200K tokens) and max output of 131K tokens
745
+ - Added support for Qwen3.5-27B-Musica-v1 model
746
+ - Added support for zai-org/glm-5.1 model with reasoning capabilities
747
+ - Added support for Sapiens AI Agnes-1.5-Lite model with multimodal input (text and image) and reasoning
748
+ - Added support for Venice openai-gpt-54-mini model
749
+
750
+ ### Changed
751
+
752
+ - Updated Qwen QwQ 32B max tokens from 16,384 to 40,960 across multiple providers
753
+ - Updated OpenAI GPT-OSS-Safeguard-20B model name to 'Safety GPT OSS 20B' and enabled reasoning capabilities
754
+ - Updated OpenAI GPT-OSS-Safeguard-20B context window from 222,222 to 131,072 tokens and max tokens from 8,888 to 65,536
755
+ - Updated OpenRouter Qwen QwQ 32B pricing: input from 0.2 to 0.19, output from 1.17 to 1.15, cache read from 0.1 to 0.095
756
+ - Updated OpenRouter Anthropic model 3.5 Sonnet pricing: input from 0.45 to 0.42, cache read from 0.225 to 0.21
757
+
758
+ ## [13.16.3] - 2026-03-28
759
+
760
+ ### Changed
761
+
762
+ - Modified OAuth credential saving to preserve unrelated identities instead of replacing all credentials for a provider
763
+ - Updated credential identity resolution to use provider context for more accurate email deduplication
764
+
765
+ ### Fixed
766
+
767
+ - Fixed OAuth credential updates to replace matching credentials in-place rather than creating disabled rows, preventing unbounded accumulation of soft-deleted credentials
768
+
769
+ ## [13.15.0] - 2026-03-23
770
+
771
+ ### Added
772
+
773
+ - Added `isUsageLimitError()` to `rate-limit-utils` as a single source of truth for detecting usage/quota limit errors across all providers
774
+
775
+ ### Fixed
776
+
777
+ - Fixed lazy stream forwarding to properly handle final results from source streams with `result()` methods
778
+ - Fixed lazy stream error handling to convert iterator failures into terminal error results instead of silently failing
779
+ - Fixed `parseRateLimitReason` to recognize "usage limit" in error messages and correctly classify them as `QUOTA_EXHAUSTED`
780
+ - Fixed OpenAI code `fetchWithRetry` retrying 429 responses for `usage_limit_reached` errors for up to 5 minutes instead of returning immediately for credential switching
781
+ - Removed `usage.?limit` from `TRANSIENT_MESSAGE_PATTERN` in retry utils since usage limits are not transient and require credential rotation
782
+ - Fixed `parseRateLimitReason` not recognizing "usage limit" in OpenAI code error messages, causing incorrect fallback to `UNKNOWN` classification instead of `QUOTA_EXHAUSTED`
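The classification rule in this release (usage limits are quota exhaustion, not transient failures) might look roughly like the sketch below. The `...Sketch` functions, the regular expression, and the retry rule are illustrative assumptions and do not reproduce the package's `isUsageLimitError()` or `parseRateLimitReason`; only the `QUOTA_EXHAUSTED` and `UNKNOWN` labels and the `usage_limit_reached` code come from the entries above.

```typescript
// Only these two reason labels appear in the changelog; the real set may be larger.
type RateLimitReasonSketch = "QUOTA_EXHAUSTED" | "UNKNOWN";

// Assumed pattern: matches "usage limit", "usage_limit", and "usage-limit".
const USAGE_LIMIT_PATTERN = /usage[\s_-]?limit/i;

function isUsageLimitErrorSketch(message: string): boolean {
  return USAGE_LIMIT_PATTERN.test(message);
}

function parseRateLimitReasonSketch(message: string): RateLimitReasonSketch {
  return isUsageLimitErrorSketch(message) ? "QUOTA_EXHAUSTED" : "UNKNOWN";
}

function shouldRetrySameCredential(status: number, message: string): boolean {
  // Usage limits are not transient: give up immediately so the caller can rotate credentials.
  if (isUsageLimitErrorSketch(message)) return false;
  // Assumed rule for everything else: retry on 429 and on 5xx responses.
  return status === 429 || status >= 500;
}
```

Keeping the check in one function means the retry path and the classifier cannot disagree about what counts as a usage limit.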
783
+
784
+ ## [13.14.2] - 2026-03-21
785
+
786
+ ### Changed
787
+
788
+ - Updated thinking configuration format from `levels` array to `minLevel` and `maxLevel` properties for improved clarity
789
+ - Corrected context window from 400000 to 272000 tokens for GPT-5.4 mini and nano variants on OpenAI code transport
790
+ - Normalized GPT-5.4 variant priority handling to use parsed variant instead of special-casing raw model IDs
791
+ - Added support for `mini` variant in OpenAI model parsing regex
792
+
793
+ ### Fixed
794
+
795
+ - Fixed inconsistent thinking level configuration across multiple model definitions
796
+
797
+ ## [13.14.0] - 2026-03-20
798
+
799
+ ### Fixed
800
+
801
+ - Fixed resumed OpenAI Responses sessions to avoid replaying stale same-provider native history on the first follow-up after process restart ([#488](https://github.com/can1357/gajae-code/issues/488))
802
+
803
+ ### Added
804
+
805
+ - Added bundled GPT-5.4 mini model metadata for OpenAI, OpenAI code provider, and GitHub Copilot, including low-to-xhigh thinking support and GitHub Copilot premium multiplier metadata
806
+ - Added bundled GPT-5.4 nano model metadata for OpenAI and OpenAI code provider, including low-to-xhigh thinking support
807
+
808
+ ## [13.13.2] - 2026-03-18
809
+
810
+ ### Changed
811
+
812
+ - Modified tool result handling for aborted assistant messages to preserve existing tool results when already recorded, instead of always replacing them with synthetic 'aborted' results
813
+
814
+ ## [13.13.0] - 2026-03-18
815
+
816
+ ### Changed
817
+
818
+ - Changed tool argument validation to always normalize optional null values before type coercion, ensuring consistent handling of LLM-generated 'null' strings
819
+
820
+ ### Fixed
821
+
822
+ - Fixed tool argument validation to properly handle string 'null' values from LLMs on optional fields by stripping them during normalization
823
+ - Improved type safety of `validateToolCall` and `validateToolArguments` functions by returning properly typed `ToolCall["arguments"]` instead of `any`
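A rough sketch of the normalization step described above, assuming a simplified schema shape: `ObjectSchemaSketch` and `stripNullOptionals` are hypothetical names, and the real validator works on full JSON Schema rather than this reduced form.

```typescript
// Hypothetical minimal schema shape: property names plus the list of required ones.
interface ObjectSchemaSketch {
  properties: Record<string, unknown>;
  required?: string[];
}

function stripNullOptionals(
  args: Record<string, unknown>,
  schema: ObjectSchemaSketch,
): Record<string, unknown> {
  const required = new Set(schema.required ?? []);
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    const isNullLike = value === null || value === "null";
    // Drop null-like values only on optional fields; required fields are left for validation.
    if (isNullLike && !required.has(key)) continue;
    out[key] = value;
  }
  return out;
}
```

One trade-off of this approach is that an optional string field whose intended value is literally `"null"` is also dropped.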
824
+
825
+ ## [13.12.9] - 2026-03-17
826
+
827
+ ### Changed
828
+
829
+ - Extracted OpenAI compatibility detection and resolution logic into dedicated `openai-completions-compat` module for improved maintainability and reusability
830
+
831
+ ### Fixed
832
+
833
+ - Fixed `openai-responses` manual history replay to strip replay-only item IDs and preserve normalized tool `call_id` values for GitHub Copilot follow-up turns ([#457](https://github.com/can1357/gajae-code/issues/457))
834
+
835
+ ## [13.12.0] - 2026-03-14
836
+
837
+ ### Added
838
+
839
+ - Added support for `qwen-chat-template` thinking format to enable reasoning via `chat_template_kwargs.enable_thinking`
840
+ - Added `reasoningEffortMap` option to `OpenAICompat` for mapping pi-ai reasoning levels to provider-specific `reasoning_effort` values
841
+ - Added `extraBody` to `OpenAICompat` to support provider-specific request body routing fields in OpenAI-completions requests
842
+ - Added support for reading token usage from choice-level `usage` field as fallback when root-level usage is unavailable
843
+ - Added new models: DeepSeek-V3.2 (Bedrock), Llama 3.1 405B Instruct, Magistral Small 1.2, Ministral 3 3B, Mistral Large 3, Pixtral Large (25.02), NVIDIA Nemotron Nano 3 30B, and Qwen3-5-9b
844
+ - Added `close()` method to `AuthStorage` for properly closing the underlying credential store
845
+ - Added `initiatorOverride` option in OpenAI and Anthropic providers to customize message attribution
846
+
847
+ ### Changed
848
+
849
+ - Changed assistant message content serialization to always use plain string format instead of text block arrays to prevent recursive nesting in OpenAI-compatible backends
850
+ - Changed Bedrock Opus 4.6 context window from 1M to 1M and added max tokens limit of 128K
851
+ - Changed OpenCode Zen/Go Sonnet 4.0/4.5 context window from 1M to 200K
852
+ - Changed GitHub Copilot context windows from 200K to 128K for both gpt-4o and gpt-4o-mini
853
+ - Changed Anthropic model 3.5 Sonnet (Anthropic API) pricing: input from $0.5 to $0.25, output from $3 to $1.5, cache read from $0.05 to $0.025, cache write from $0 to $1
854
+ - Changed Devstral 2 model name from '135B' to '123B'
855
+ - Changed ByteDance Seed 2.0-Lite to support reasoning with effort-based thinking mode and image inputs
856
+ - Changed Qwen3-32b (Groq) reasoning effort mapping to normalize all levels to 'default'
857
+ - Changed finish_reason 'end' to map to 'stop' for improved compatibility with additional providers
858
+ - Changed Anthropic reference model merging to prioritize bundled metadata for known models while using models.dev for newly discovered IDs
859
+
860
+ ### Fixed
861
+
862
+ - Fixed reasoning_effort parameter handling to use provider-specific mappings instead of raw effort values
863
+ - Fixed assistant content serialization for GitHub Copilot and other OpenAI-compatible backends that mirror array payloads
864
+ - Fixed token usage calculation to properly extract cached tokens from both root and nested `prompt_tokens_details` fields
865
+ - Fixed stop reason mapping to handle string values and unknown finish reasons gracefully
866
+ - Fixed resource cleanup in `AuthCredentialStore.close()` to properly finalize all prepared statements before closing the database
867
+
868
+ ## [13.11.1] - 2026-03-13
869
+
870
+ ### Fixed
871
+
872
+ - Added `llama.cpp` as a local provider
873
+ - Fixed auth schema V0-to-V1 migration crash when the V0 table lacks a `disabled` column
874
+
875
+ ## [13.11.0] - 2026-03-12
876
+
877
+ ### Added
878
+
879
+ - Added support for Parallel AI provider with API key authentication
880
+ - Added `PARALLEL_API_KEY` environment variable support for Parallel provider configuration
881
+ - Added automatic websocket reconnection handling for connection limit errors, with fallback to SSE replay when content has already been emitted
882
+
883
+ ### Changed
884
+
885
+ - Enhanced `OpenAI codeProviderStreamError` to include an optional error code field for better error categorization and handling
886
+
887
+ ### Fixed
888
+
889
+ - Improved retry logic to handle HTTP/2 stream errors and internal_error responses from Anthropic API
890
+
891
+ ## [13.9.16] - 2026-03-10
892
+
893
+ ### Added
894
+
895
+ - Support for `onPayload` callback to replace provider request payloads before sending, enabling request interception and modification
896
+ - Support for structured text signature metadata with phase information (commentary/final_answer) in OpenAI and Azure OpenAI Responses providers
897
+ - Support for OpenAI code provider Spark model selection with plan-based account prioritization
898
+ - Added `modelId` option to `getApiKey()` to enable model-specific credential ranking
899
+
900
+ ### Changed
901
+
902
+ - Enhanced `onPayload` callback signature to accept model parameter and support async payload replacement
903
+ - Improved error messages for `response.failed` events to include detailed error codes, messages, and incomplete reasons
904
+ - Refactored OpenAI code provider response streaming to improve code organization and maintainability with extracted helper functions and type definitions
905
+ - Enhanced websocket fallback logic to safely replay buffered output over SSE when websocket connections fail mid-stream
906
+ - Improved error recovery for websocket streams by distinguishing between fatal connection errors and retryable stream errors
907
+ - Updated credential ranking strategy to prioritize Pro plan accounts when requesting OpenAI code provider Spark models
908
+
909
+ ### Fixed
910
+
911
+ - Fixed websocket stream recovery to properly reset output state and clear buffered items when falling back to SSE after partial output
912
+ - Fixed handling of malformed JSON messages in websocket streams to trigger immediate fallback to SSE without retry attempts
913
+
914
+ ## [13.9.13] - 2026-03-10
915
+
916
+ ### Added
917
+
918
+ - Added `isSpecialServiceTier` utility function to validate OpenAI service tier values
919
+
920
+ ## [13.9.12] - 2026-03-09
921
+
922
+ ### Added
923
+
924
+ - Added Tavily web search provider support with API key authentication
925
+
926
+ ### Fixed
927
+
928
+ - Fixed OpenAI-family streaming transports to fail with an explicit idle-timeout error instead of hanging indefinitely when the provider stops sending events mid-response
929
+ - Fixed OpenAI code provider OAuth refresh and usage-limit lookups to respect request timeouts instead of waiting indefinitely during account selection or rotation
930
+ - Fixed OpenAI code provider prewarmed websocket requests to fall back quickly when the socket connects but never starts the response stream
931
+
932
+ ## [13.9.10] - 2026-03-08
933
+
934
+ ### Added
935
+
936
+ - Added `identity_key` column to auth credentials storage for improved credential deduplication
937
+ - Added schema versioning system to auth credentials database for safer migrations
938
+ - Added automatic backfilling of identity keys during database schema migrations
939
+
940
+ ### Changed
941
+
942
+ - Changed credential deduplication logic to use single identity key instead of multiple identifiers for better performance
943
+ - Changed database schema to store normalized identity keys alongside credentials
944
+ - Changed auth schema migration to support upgrading from legacy database versions with automatic data backfill
945
+
946
+ ### Fixed
947
+
948
+ - Fixed API key credential matching to correctly identify when the same key is re-stored, preventing unnecessary row duplication on re-login
949
+ - Fixed credential deduplication to correctly handle OAuth accounts with matching emails but different account IDs
950
+ - Fixed API key replacement to reuse existing stored rows instead of accumulating disabled duplicates
951
+ - Fixed auth storage to preserve newer recorded schema versions when opened by older binaries
952
+
953
+ ## [13.9.8] - 2026-03-08
954
+
955
+ ### Fixed
956
+
957
+ - Fixed WebSocket stream fallback logic to safely replay buffered output over SSE when WebSocket fails after partial content has been streamed
958
+
959
+ ## [13.9.4] - 2026-03-07
960
+
961
+ ### Changed
962
+
963
+ - Simplified API key credential storage to always replace existing credentials on re-login instead of accumulating multiple keys
964
+ - Updated Kagi API key placeholder from `kagi_...` to `KG_...` to match current API key format
965
+ - Updated Kagi login instructions to clarify Search API access is beta-only and provide support contact
966
+ - Disabled usage reporting in streaming responses for Cerebras models due to compatibility issues
967
+
968
+ ### Fixed
969
+
970
+ - Fixed Cerebras model compatibility by preventing `stream_options` usage requests in chat completions
971
+
972
+ ## [13.9.3] - 2026-03-07
973
+
974
+ ### Breaking Changes
975
+
976
+ - Changed `reasoning` parameter from `ThinkingLevel | undefined` to `Effort | undefined` in `SimpleStreamOptions`; 'off' is no longer valid (omit the field instead)
977
+ - Removed `supportsXhigh()` function; check `model.thinking?.maxLevel` instead
978
+ - Removed `ThinkingLevel` and `ThinkingEffort` types; use `Effort` enum
979
+ - Removed `getAvailableThinkingLevels()` and `getAvailableThinkingEfforts()` functions
980
+ - Changed `transformRequestBody()` signature to require `Model` parameter as second argument for effort validation
981
+ - Removed `thinking.ts` module export; import from `model-thinking.ts` instead
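For callers migrating across these breaking changes, the two most common adjustments could be sketched like this. The types are local stand-ins (a string union instead of the package's `Effort` enum, and a reduced `ModelSketch`), and both helper names are hypothetical.

```typescript
// Local stand-ins for illustration; the real `Effort` and `Model` come from the package.
type Effort = "minimal" | "low" | "medium" | "high" | "xhigh";
type LegacyThinkingLevel = Effort | "off";

interface ModelSketch {
  thinking?: { minLevel: Effort; maxLevel: Effort };
}

// 'off' is no longer valid for `reasoning`: omit the field instead.
function toReasoningOption(level: LegacyThinkingLevel | undefined): { reasoning?: Effort } {
  if (level === undefined || level === "off") return {};
  return { reasoning: level };
}

// Stand-in for the removed supportsXhigh(): inspect the model's thinking metadata.
function allowsXhigh(model: ModelSketch): boolean {
  return model.thinking?.maxLevel === "xhigh";
}
```

The result of `toReasoningOption` can be spread into a stream options object so that the `reasoning` key is absent when reasoning is disabled.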
982
+
983
+ ### Added
984
+
985
+ - Added `incremental` flag to `OpenAIResponsesHistoryPayload` to support building conversation history from multiple assistant messages instead of replacing it
986
+ - Added `dt` flag to `OpenAIResponsesHistoryPayload` for transport-level metadata
987
+ - Added `ThinkingConfig` interface to models for canonical thinking transport metadata with min/max effort levels and provider-specific mode
988
+ - Added `thinking` field to `Model` type containing per-model thinking capabilities used to clamp and map user-facing effort levels
989
+ - Added `Effort` enum (minimal, low, medium, high, xhigh) as canonical user-facing thinking levels replacing `ThinkingLevel`
990
+ - Added `enrichModelThinking()` function to automatically populate thinking metadata on models based on their capabilities
991
+ - Added `mapEffortToAnthropicAdaptiveEffort()` function to map user effort levels to Anthropic adaptive thinking effort
992
+ - Added `mapEffortToGoogleThinkingLevel()` function to map user effort levels to Google thinking levels
993
+ - Added `requireSupportedEffort()` function to validate and clamp effort levels per model, throwing errors for unsupported combinations
994
+ - Added `clampThinkingLevelForModel()` function to clamp thinking levels to model-supported range
995
+ - Added `applyGeneratedModelPolicies()` and `linkSparkPromotionTargets()` exports from model-thinking module
996
+ - Added `serviceTier` option to control OpenAI processing priority and cost (auto, default, flex, scale, priority)
997
+ - Added `providerPayload` field to messages and responses for reconstructing transport-native history
998
+ - Added Gemini usage provider for tracking quota and tier information
999
+ - Added `getOpenAI codeAccountId()` utility to extract account ID from OpenAI code JWT tokens
1000
+ - Added email extraction from OpenAI code provider OAuth tokens for credential deduplication
1001
+
1002
+ ### Changed
1003
+
1004
+ - Changed credential disabling mechanism from boolean `disabled` flag to `disabled_cause` text field for tracking why credentials were disabled
1005
+ - Changed `deleteAuthCredential()` and `deleteAuthCredentialsForProvider()` methods to require a `disabledCause` parameter explaining the reason for disabling
1006
+ - Changed Gemini model parsing to strip `-preview` suffix for consistent model identification
1007
+ - Changed OpenAI code provider websocket error handling to detect fatal connection errors and immediately fall back to SSE without retrying
1008
+ - Changed OpenAI code provider to always use websockets v2 protocol (removed v1 support)
1009
+ - Changed `reasoning` parameter type from `ThinkingLevel` to `Effort` in `SimpleStreamOptions`, removing 'off' value (callers should omit the field instead)
1010
+ - Changed thinking configuration to use model-specific metadata instead of hardcoded provider logic for effort mapping
1011
+ - Changed OpenAI code provider request transformer to accept `Model` parameter for effort validation instead of string model ID
1012
+ - Changed Anthropic provider to use model thinking metadata for determining adaptive thinking support instead of model ID pattern matching
1013
+ - Changed Google Vertex and Google providers to use shorter variable names for thinking config construction
1014
+ - Moved thinking-related utilities from `thinking.ts` to new `model-thinking.ts` module with expanded functionality
1015
+ - Moved model policy functions from `provider-models/model-policies.ts` to `model-thinking.ts`
1016
+ - Moved `googleGeminiCliUsageProvider` from `providers/google-gemini-cli-usage.ts` to `usage/gemini.ts`
1017
+ - Changed default OpenAI model from gpt-5.1-openai-code to gpt-5.4 across all providers
1018
+ - Changed `UsageFetchContext` to remove cache and now() dependencies—usage fetchers now use Date.now() directly
1019
+ - Removed `resetInMs` field from usage windows; consumers should calculate from `resetsAt` timestamp
1020
+ - Changed OpenAI code provider credential ranking to deduplicate by email when accountId matches
1021
+ - Improved OpenAI code provider error handling with retryable error detection
1022
+
1023
+ ### Removed
1024
+
1025
+ - Removed `thinking.ts` module; use `model-thinking.ts` instead
1026
+ - Removed `provider-models/model-policies.ts` module; functionality moved to `model-thinking.ts`
1027
+ - Removed `supportsXhigh()` function from models.ts; use model.thinking metadata instead
1028
+ - Removed `ThinkingLevel` and `ThinkingEffort` types; use `Effort` enum instead
1029
+ - Removed `getAvailableThinkingLevels()` and `getAvailableThinkingEfforts()` functions
1030
+ - Removed `model-policies` export from `provider-models/index.ts`
1031
+ - Removed hardcoded thinking level clamping logic from OpenAI code provider request transformer; now uses model metadata
1032
+ - Removed `UsageCache` and `UsageCacheEntry` interfaces—caching is now handled internally by AuthStorage
1033
+ - Removed `google-gemini-cli-usage` export; use new `gemini` usage provider instead
1034
+ - Removed `resetInMs` computation from all usage providers
1035
+ - Removed cache TTL constants and cache management from usage fetchers (anthropic-model, github-copilot, google-antigravity, kimi, openai-code, zai)
1036
+
1037
+ ### Fixed
1038
+
1039
+ - Fixed credential purging to respect disabled credentials when deduplicating by email, preventing re-enablement of intentionally disabled credentials
1040
+ - Fixed OpenAI code provider websocket error reporting to include detailed error messages from error events
1041
+ - Fixed conversation history reconstruction to support incremental updates from multiple assistant messages while maintaining backward compatibility with full-snapshot payloads
1042
+ - Fixed OpenAI code provider to reject unsupported effort levels instead of silently clamping them, providing clear error messages about supported efforts
1043
+ - Fixed model cache normalization to properly apply thinking enrichment when loading cached models
1044
+ - Fixed dynamic model merging to apply thinking enrichment to merged model results
1045
+ - Fixed OpenAI code provider streaming to properly include service_tier in SSE payloads
1046
+ - Fixed type safety in OpenAI responses by removing unsafe type casts on image content blocks
1047
+ - Fixed credential purging to respect disabled credentials when deduplicating by email
1048
+ - Fixed API-key provider re-login to replace the active stored key instead of appending stale credentials that were still selected first
1049
+ - Fixed Kagi login guidance to use the correct `KG_...` key format and mention Search API beta access requirements
1050
+
1051
+ ## [13.9.2] - 2026-03-05
1052
+
1053
+ ### Added
1054
+
1055
+ - Support for redacted thinking blocks in Anthropic messages, enabling secure handling of encrypted reasoning content
1056
+ - Preservation of latest Anthropic thinking blocks and redacted thinking content during message transformation, even when switching between Anthropic models
1057
+
1058
+ ### Changed
1059
+
1060
+ - Assistant message content now includes `RedactedThinkingContent` type alongside existing text, thinking, and tool call blocks
1061
+ - Message transformation logic now preserves signed thinking blocks and redacted thinking for the latest assistant message in Anthropic conversations
1062
+
1063
+ ### Fixed
1064
+
1065
+ - Fixed Unicode normalization to consistently apply `toWellFormed()` to all text content, including thinking blocks, ensuring proper handling of malformed UTF-16 sequences
1066
+
1067
+ ## [13.9.1] - 2026-03-05
1068
+
1069
+ ### Breaking Changes
1070
+
1071
+ - Removed `THINKING_LEVELS`, `ALL_THINKING_LEVELS`, `ALL_THINKING_MODES`, `THINKING_MODE_DESCRIPTIONS`, and `THINKING_MODE_LABELS` exports
1072
+ - Renamed `formatThinking()` to `getThinkingMetadata()` with changed return type from string to `ThinkingMetadata` object
1073
+ - Renamed `getAvailableThinkingLevel()` to `getAvailableThinkingLevels()` and added default parameter
1074
+ - Renamed `getAvailableEffort()` to `getAvailableEfforts()` and added default parameter
1075
+
1076
+ ### Added
1077
+
1078
+ - Added `ThinkingMetadata` type to provide structured access to thinking mode information (value, label, description)
1079
+
1080
+ ## [13.9.0] - 2026-03-05
1081
+
1082
+ ### Added
1083
+
1084
+ - Exported new thinking module with `Effort`, `ThinkingLevel`, and `ThinkingMode` types for managing reasoning effort levels
1085
+ - Added `getAvailableEffort()` function to determine supported thinking effort levels based on model capabilities
1086
+ - Added `parseEffort()`, `parseThinkingLevel()`, and `parseThinkingMode()` functions for parsing thinking configuration strings
1087
+ - Added `THINKING_LEVELS`, `ALL_THINKING_LEVELS`, and `ALL_THINKING_MODES` constants for iterating over available thinking options
1088
+ - Added `THINKING_MODE_DESCRIPTIONS` and `THINKING_MODE_LABELS` for displaying thinking modes in user interfaces
1089
+ - Added `formatThinking()` function to format thinking modes as compact display labels
1090
+
1091
+ ### Changed
1092
+
1093
+ - Refactored thinking level handling to distinguish between `Effort` (provider-level, no "off") and `ThinkingLevel` (user-facing, includes "off")
1094
+ - Updated `ThinkingBudgets` type to use `Effort` instead of `ThinkingLevel` for more precise token budget configuration
1095
+ - Improved reasoning option handling to explicitly support "off" value for disabling reasoning across all providers
1096
+ - Simplified thinking effort mapping logic by centralizing provider-specific clamping behavior
1097
+
1098
+ ## [13.7.8] - 2026-03-04
1099
+
1100
+ ### Added
1101
+
1102
+ - Added ZenMux provider support with mixed API routing: Anthropic-owned models discovered from `https://zenmux.ai/api/v1/models` now use the Anthropic transport (`https://zenmux.ai/api/anthropic`), while other ZenMux models use the OpenAI-compatible transport.
1103
+
1104
+ ## [13.7.7] - 2026-03-04
1105
+
1106
+ ### Changed
1107
+
1108
+ - Modified response ID normalization to preserve existing item ID prefixes when truncating oversized IDs
1109
+ - Updated tool call ID normalization to use `fc_` prefix for generated item IDs instead of `item_` prefix
1110
+
1111
+ ### Fixed
1112
+
1113
+ - Fixed handling of reasoning item IDs to remain untouched during response normalization while function call IDs are properly normalized
1114
+
1115
+ ## [13.7.2] - 2026-03-04
1116
+
1117
+ ### Added
1118
+
1119
+ - Added support for Kagi API key authentication via `login kagi` command
1120
+ - Added Kagi to the list of available OAuth providers
1121
+
1122
+ ### Fixed
1123
+
1124
+ - MCP tool schemas with `$ref`/`$defs` are now dereferenced before being sent to LLM providers, fixing dangling references that left models without type definitions
1125
+ - Ajv schema validation no longer emits `console.warn()` for non-standard format keywords (e.g. `"uint"`) from MCP servers, preventing TUI corruption
1126
+ - Tool schema compilation is now cached per schema identity, eliminating redundant recompilation on every tool call
1127
+
1128
+ ## [13.6.0] - 2026-03-03
1129
+
1130
+ ### Added
1131
+
1132
+ - Added Anthropic Foundry gateway mode controlled by `ANTHROPIC_MODEL_CODE_USE_FOUNDRY`, with support for `FOUNDRY_BASE_URL`, `ANTHROPIC_FOUNDRY_API_KEY`, `ANTHROPIC_CUSTOM_HEADERS`, and optional mTLS material (`ANTHROPIC_MODEL_CODE_CLIENT_CERT`, `ANTHROPIC_MODEL_CODE_CLIENT_KEY`, `NODE_EXTRA_CA_CERTS`)
1133
+ - Added LM Studio provider support with OpenAI-compatible model discovery and OAuth login.
1134
+ - Added support for `LM_STUDIO_API_KEY` and `LM_STUDIO_BASE_URL` environment variables for authentication and custom host configuration.
1135
+
1136
+ ### Changed
1137
+
1138
+ - Anthropic key resolution now prefers `ANTHROPIC_FOUNDRY_API_KEY` over `ANTHROPIC_OAUTH_TOKEN` and `ANTHROPIC_API_KEY` when Foundry mode is enabled
1139
+ - Anthropic auth base-URL fallback now prefers `FOUNDRY_BASE_URL` when `ANTHROPIC_MODEL_CODE_USE_FOUNDRY` is enabled
1140
+
1141
+ ## [13.5.8] - 2026-03-02
1142
+
1143
+ ### Fixed
1144
+
1145
+ - Fixed schema compatibility issue where patternProperties in tool parameters caused failures when converting to legacy Antigravity format
1146
+
1147
+ ## [13.5.5] - 2026-03-01
1148
+
1149
+ ### Changed
1150
+
1151
+ - Anthropic model system-block cloaking now leaves the agent identity block uncached and applies `cache_control: { type: "ephemeral" }` to injected user system blocks without forcing `ttl: "1h"`
1152
+
1153
+ ### Fixed
1154
+
1155
+ - Anthropic request payload construction now enforces a maximum of 4 `cache_control` breakpoints (tools/system/messages priority order) before dispatch
1156
+ - Anthropic cache-control normalization now removes later `ttl: "1h"` entries when a default/5m block has already appeared earlier in evaluation order
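The 4-breakpoint cap from the first fix can be illustrated with the sketch below. It assumes a flattened payload shape (`PayloadSketch`) and assumes that "priority order" means breakpoints on tools are kept first, then system, then messages; the actual payload nests content blocks and the real selection rule may differ.

```typescript
interface CacheableBlock {
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

// Hypothetical flattened view of a request payload.
interface PayloadSketch {
  tools: CacheableBlock[];
  system: CacheableBlock[];
  messages: CacheableBlock[];
}

const MAX_CACHE_BREAKPOINTS = 4;

function enforceBreakpointLimit(payload: PayloadSketch): void {
  let kept = 0;
  // Visit blocks in the order named in the changelog entry: tools, system, messages.
  for (const block of [...payload.tools, ...payload.system, ...payload.messages]) {
    if (!block.cache_control) continue;
    if (kept < MAX_CACHE_BREAKPOINTS) kept++;
    // Past the cap: strip the marker so the request stays within the limit.
    else delete block.cache_control;
  }
}
```

The function mutates the payload in place, which fits a final normalization pass run just before dispatch.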
1157
+
1158
+ ## [13.5.3] - 2026-03-01
1159
+
1160
+ ### Fixed
1161
+
1162
+ - Fixed tool argument coercion to handle malformed JSON with trailing wrapper braces by parsing leading JSON containers
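One way to parse a leading JSON container and ignore trailing wrapper braces is a small depth-tracking scan, sketched below. The function name `parseLeadingJsonContainer` is hypothetical and this is not necessarily how the package implements the fix.

```typescript
function parseLeadingJsonContainer(input: string): unknown {
  const text = input.trimStart();
  const open = text[0];
  if (open !== "{" && open !== "[") throw new SyntaxError("expected a JSON object or array");

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string literal, brackets do not count; only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      // The first container closes here; anything after it is ignored.
      if (depth === 0) return JSON.parse(text.slice(0, i + 1));
    }
  }
  throw new SyntaxError("unterminated JSON container");
}
```

For an input such as `{"path":"a.ts","n":1}}`, the scan stops at the first balanced closing brace and hands only that prefix to `JSON.parse`.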
1163
+
1164
+ ## [13.4.0] - 2026-03-01
1165
+
1166
+ ### Breaking Changes
1167
+
1168
+ - Removed `TInput` generic parameter from `ToolResultMessage` interface and removed `$normative` property
1169
+
1170
+ ### Added
1171
+
1172
+ - `hasUnrepresentableStrictObjectMap()` pre-flight check in `tryEnforceStrictSchema`: schemas with `patternProperties` or schema-valued `additionalProperties` now degrade gracefully to non-strict mode instead of throwing during enforcement
1173
+ - `generateAnthropic modelCloakingUserId()` generates structured user IDs for Anthropic OAuth metadata (`user_{hex64}_account_{uuid}_session_{uuid}`)
1174
+ - `isAnthropic modelCloakingUserId()` validates whether a string matches the cloaking user-ID format
1175
+ - `mapStainlessOs()` and `mapStainlessArch()` map `process.platform`/`process.arch` to Stainless header values; X-Stainless-Os and X-Stainless-Arch in `anthropic-modelCodeHeaders` are now runtime-computed
1176
+ - `buildAnthropic modelCodeTlsFetchOptions()` attaches SNI and default TLS ciphers for direct `api.anthropic.com` connections
1177
+ - `createAnthropic modelBillingHeader()` generates the `x-anthropic-billing-header` block (SHA-256 payload fingerprint + random build hash)
1178
+ - `buildAnthropicSystemBlocks()` now injects a billing header block and the Anthropic model Agent SDK identity block with `ephemeral` 1h cache-control when `includeAnthropic modelCodeInstruction` is set
1179
+ - `resolveAnthropicMetadataUserId()` auto-generates a cloaking user ID for OAuth requests when `metadata.user_id` is absent or invalid
1180
+ - `AnthropicOAuthFlow` is now exported for direct use
1181
+ - OAuth callback server timeout extended from 2 min to 5 min
1182
+ - `parseGeminiCliCredentials()` parses Google Cloud credential JSON with support for legacy (`{token,projectId}`), alias (`project_id`/`refresh`/`expires`), and enriched formats
1183
+ - `shouldRefreshGeminiCliCredentials()` and proactive token refresh before requests for both Gemini CLI and Antigravity providers (60s pre-expiry buffer)
1184
+ - `normalizeAntigravityTools()` converts `parametersJsonSchema` → `parameters` in function declarations for Antigravity compatibility
1185
+ - `ANTIGRAVITY_SYSTEM_INSTRUCTION` is now exported for use by search and other consumers
1186
+ - `ANTIGRAVITY_LOAD_CODE_ASSIST_METADATA` constant exported from OAuth module with `ANTIGRAVITY` ideType
1187
+ - Antigravity project onboarding: `onboardProjectWithRetries()` provisions a new project via `onboardUser` LRO when `loadCodeAssist` returns no existing project (up to 5 attempts, 2s interval)
1188
+ - `getOAuthApiKey` now includes `refreshToken`, `expiresAt`, `email`, and `accountId` in the Gemini/Antigravity JSON credential payload to enable proactive refresh
1189
+ - Antigravity model discovery now tries the production daily endpoint first, with sandbox as fallback
1190
+ - `ANTIGRAVITY_DISCOVERY_DENYLIST` filters low-quality/internal models from discovery results
1191
+
1192
+ ### Changed
1193
+
1194
+ - Replaced `sanitizeSurrogates()` utility with native `String.prototype.toWellFormed()` for handling unpaired Unicode surrogates across all providers
1195
+ - Extended `ANTHROPIC_OAUTH_BETA` constant in the OpenAI-compat Anthropic route with `interleaved-thinking-2025-05-14`, `context-management-2025-06-27`, and `prompt-caching-scope-2026-01-05` beta flags
1196
+ - `anthropic-modelCodeVersion` bumped to `2.1.63`; `anthropic-modelCodeSystemInstruction` updated to identify as Anthropic model Agent SDK
1197
+ - `anthropic-modelCodeHeaders`: removed `X-Stainless-Helper-Method`, updated package version to `0.74.0`, runtime version to `v24.3.0`
1198
+ - `applyAnthropic modelToolPrefix` / `stripAnthropic modelToolPrefix` now accept an optional prefix override and skip Anthropic built-in tool names (`web_search`, `code_execution`, `text_editor`, `computer`)
1199
+ - Accept-Encoding header updated to `gzip, deflate, br, zstd`
1200
+ - Non-Anthropic base URLs now receive `Authorization: Bearer` regardless of OAuth status
1201
+ - Prompt-caching logic now skips applying breakpoints when any block already carries `cache_control`, instead of stripping then re-applying
1202
+ - `fine-grained-tool-streaming-2025-05-14` removed from default beta set
1203
+ - Anthropic OAuth token URL changed from `platform.anthropic-model.com` to `api.anthropic.com`
1204
+ - Anthropic OAuth scopes reduced to `org:create_api_key user:profile user:inference`
1205
+ - OAuth code exchange now strips URL fragment from callback code, using the fragment as state override when present
1206
+ - Anthropic model usage headers aligned: user-agent updated to `anthropic-model-cli/2.1.63 (external, cli)`, anthropic-beta extended with full beta set
1207
+ - Antigravity session ID format changed to signed decimal (negative int63 derived from SHA-256 of first user message, or random bounded int63)
1208
+ - Antigravity `requestId` now uses `agent-{uuid}` format; non-Antigravity requests no longer include requestId/userAgent/requestType in the payload
1209
+ - `ANTIGRAVITY_DAILY_ENDPOINT` corrected to `daily-cloudcode-pa.googleapis.com`; sandbox endpoint kept as fallback only
1210
+ - Antigravity discovery: removed `recommended`/`agentModelSorts` filter; now includes all non-internal, non-denylisted models
1211
+ - Antigravity discovery no longer sends `project` in the request body
1212
+ - Gemini/Antigravity OAuth flows no longer use PKCE (code_challenge removed)
1213
+ - Antigravity `loadCodeAssist` metadata ideType changed from `IDE_UNSPECIFIED` to `ANTIGRAVITY`
1214
+ - Antigravity `discoverProject` now uses a single canonical production endpoint; falls back to project onboarding instead of a hardcoded default project ID
1215
+ - `VALIDATED` tool calling config applied to Antigravity requests with Anthropic models
1216
+ - `maxOutputTokens` removed from Antigravity generation config for non-Anthropic models
1217
+ - System instruction injection for Antigravity scoped to Anthropic and `gemini-3-pro-high` models only
1218
+
1219
+ ### Removed
1220
+
1221
+ - Removed `sanitizeSurrogates()` utility function; use native `String.prototype.toWellFormed()` instead
1222
+
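The surrogate handling described above can be illustrated with a minimal sketch. `String.prototype.toWellFormed()` replaces each unpaired surrogate code unit with U+FFFD. The helper names `replaceLoneSurrogates` and `toWellFormedCompat` are hypothetical and are not part of this package; the manual loop only exists to show the behaviour on runtimes that lack the native method.

```typescript
// Hypothetical fallback: replace every lone surrogate code unit with U+FFFD.
function replaceLoneSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    const isHigh = code >= 0xd800 && code <= 0xdbff;
    const isLow = code >= 0xdc00 && code <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Valid pair: keep both code units and skip the low half.
        out += input.charAt(i) + input.charAt(i + 1);
        i++;
      } else {
        out += "\uFFFD";
      }
    } else if (isLow) {
      // Low surrogate that was not consumed as part of a pair.
      out += "\uFFFD";
    } else {
      out += input.charAt(i);
    }
  }
  return out;
}

// Prefer the native method when the runtime provides it (assumption: the cast
// is only here so the sketch compiles without ES2024 lib typings).
function toWellFormedCompat(input: string): string {
  const native = (input as unknown as { toWellFormed?: () => string }).toWellFormed;
  return typeof native === "function" ? native.call(input) : replaceLoneSurrogates(input);
}
```

Callers that previously imported `sanitizeSurrogates()` would presumably call `text.toWellFormed()` directly on a runtime that supports it.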
1223
+ ## [13.3.14] - 2026-02-28
1224
+
1225
+ ### Added
1226
+
1227
+ - Exported schema utilities from new `./utils/schema` module, consolidating JSON Schema handling across providers
1228
+ - Added `CredentialRankingStrategy` interface for providers to implement usage-based credential selection
1229
+ - Added `anthropic-modelRankingStrategy` for Anthropic OAuth credentials to enable smart multi-account selection based on usage windows
1230
+ - Added `openai-codeRankingStrategy` for OpenAI code provider OAuth credentials with priority boost for fresh 5-hour window starts
1231
+ - Added `adaptSchemaForStrict()` helper for unified OpenAI strict schema enforcement across providers
1232
+ - Added schema equality and merging utilities: `areJsonValuesEqual()`, `mergeCompatibleEnumSchemas()`, `mergePropertySchemas()`
1233
+ - Added Cloud Code Assist schema normalization: `copySchemaWithout()`, `stripResidualCombiners()`, `prepareSchemaForCCA()`
1234
+ - Added `sanitizeSchemaForGoogle()` and `sanitizeSchemaForCCA()` for provider-specific schema sanitization
1235
+ - Added `StringEnum()` helper for creating string enum schemas compatible with Google and other providers
1236
+ - Added `enforceStrictSchema()` and `sanitizeSchemaForStrictMode()` for OpenAI strict mode schema validation
1237
+ - Added package exports for `./utils/schema` and `./utils/schema/*` subpaths
1238
+ - Added `validateSchemaCompatibility()` to statically audit a JSON Schema against provider-specific rules (`openai-strict`, `google`, `cloud-code-assist-anthropic-model`) and return structured violations
1239
+ - Added `validateStrictSchemaEnforcement()` to verify the strict-fail-open contract: enforced schemas pass strict validation, failed schemas return the original object identity
1240
+ - Added `COMBINATOR_KEYS` (`anyOf`, `allOf`, `oneOf`) and `CCA_UNSUPPORTED_SCHEMA_FIELDS` as exported constants in `fields.ts` to eliminate duplication across modules
1241
+ - Added `tryEnforceStrictSchema` result cache (`WeakMap`) to avoid redundant sanitize + enforce work for the same schema object
1242
+ - Added comprehensive schema normalization test suite (`schema-normalization.test.ts`) covering strict mode, Google, and Cloud Code Assist normalization paths
1243
+ - Added schema compatibility validation test suite (`schema-compatibility.test.ts`) covering all three provider targets
1244
+
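As a rough illustration of the deep structural equality that utilities such as `areJsonValuesEqual()` are described as providing, the sketch below compares JSON values recursively and uses the result to deduplicate enum members. `jsonDeepEqual` and `dedupeEnum` are hypothetical names, and the package's actual implementation may differ.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical deep equality for JSON values: object key order is ignored,
// array order is significant.
function jsonDeepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (a === null || b === null) return false;
  if (typeof a !== "object" || typeof b !== "object") return false;
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    const other = b as Json[];
    return (a as Json[]).every((item, i) => jsonDeepEqual(item, other[i] as Json));
  }
  const objA = a as { [key: string]: Json };
  const objB = b as { [key: string]: Json };
  const keysA = Object.keys(objA);
  if (keysA.length !== Object.keys(objB).length) return false;
  return keysA.every(
    key =>
      Object.prototype.hasOwnProperty.call(objB, key) &&
      jsonDeepEqual(objA[key] as Json, objB[key] as Json),
  );
}

// Deduplicate enum members structurally rather than by reference.
function dedupeEnum(values: Json[]): Json[] {
  const out: Json[] = [];
  for (const value of values) {
    if (!out.some(seen => jsonDeepEqual(seen, value))) out.push(value);
  }
  return out;
}
```

Reference equality (`Object.is`) would treat two separately constructed `{ a: 1 }` objects as distinct, which is why object-valued enum members need a structural comparison.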
1245
+ ### Changed
1246
+
1247
+ - Moved schema utilities from `./utils/typebox-helpers` to new `./utils/schema` module with expanded functionality
1248
+ - Refactored OpenAI provider tool conversion to use unified `adaptSchemaForStrict()` helper across openai-code, completions, and responses
1249
+ - Updated `AuthStorage` to support generic credential ranking via `CredentialRankingStrategy` instead of OpenAI code-only logic
1250
+ - Moved Google schema sanitization functions from `google-shared.ts` to `./utils/schema` module
1251
+ - Changed export path: `./utils/typebox-helpers` → `./utils/schema` in main index
1252
+ - `sanitizeSchemaForGoogle()` / `sanitizeSchemaForCCA()` now accept a parameterized `unsupportedFields` set internally, enabling code reuse between the two sanitizers
1253
+ - `copySchemaWithout()` rewritten using object-rest destructuring for clarity
1254
+
1255
+ ### Fixed
1256
+
1257
+ - Fixed cycle detection: `WeakSet` guards added to all recursive schema traversals (`sanitizeSchemaForStrictMode`, `enforceStrictSchema`, `normalizeSchemaForCCA`, `normalizeNullablePropertiesForCloudCodeAssist`, `stripResidualCombiners`, `sanitizeSchemaImpl`, `hasResidualCloudCodeAssistIncompatibilities`) — circular schemas no longer cause infinite loops or stack overflows
1258
+ - Fixed `hasResidualCloudCodeAssistIncompatibilities`: cycle detection now returns `false` (not `true`) for already-visited nodes, eliminating false positives that forced the CCA fallback schema on valid recursive inputs
1259
+ - Fixed `stripResidualCombiners` to iterate to a fixpoint rather than making a single pass, ensuring chained combiner reductions (where one reduction enables another) are fully resolved
1260
+ - Fixed `mergeObjectCombinerVariants` required-field computation: the flattened object now takes the intersection of all variants' `required` arrays (unioned with own-level required properties that exist in the merged schema), preventing required fields from being silently dropped or over-included
1261
+ - Fixed `mergeCompatibleEnumSchemas` to use deep structural equality (`areJsonValuesEqual`) instead of `Object.is` when deduplicating object-valued enum members
1262
+ - Fixed `sanitizeSchemaForGoogle` const-to-enum deduplication to use deep equality instead of reference equality
1263
+ - Fixed `sanitizeSchemaForGoogle` type inference for `anyOf`/`oneOf`-flattened const enums: type is now derived from all variants (must agree), falling back to inference from enum values; mixed null/non-null infers the non-null type and sets `nullable`
1264
+ - Fixed `sanitizeSchemaForGoogle` recursion to spread options when descending (previously only `insideProperties`, `normalizeTypeArrayToNullable`, `stripNullableKeyword` were forwarded; new fields `unsupportedFields` and `seen` were silently dropped)
1265
+ - Fixed `sanitizeSchemaForGoogle` array-valued `type` filtering to exclude non-string entries before processing
1266
+ - Removed incorrect `additionalProperties: false` stripping from `sanitizeSchemaForGoogle` (the field is valid in Google schemas when `false`)
1267
+ - Fixed `sanitizeSchemaForStrictMode` to strip the `nullable` keyword and expand it into `anyOf: [schema, {type: "null"}]` in the output, matching what OpenAI strict mode actually expects
1268
+ - Fixed `sanitizeSchemaForStrictMode` to infer `type: "array"` when `items` is present but `type` is absent
1269
+ - Fixed `sanitizeSchemaForStrictMode` to infer a scalar `type` from uniform `enum` values when `type` is not explicitly set
1270
+ - Fixed `sanitizeSchemaForStrictMode` const-to-enum merge to use deep equality, preventing duplicate enum entries when `const` and `enum` both exist with the same value
1271
+ - Fixed `enforceStrictSchema` to drop `additionalProperties` unconditionally (previously only object-valued `additionalProperties` was recursed into; non-object values were passed through, violating strict schema requirements)
1272
+ - Fixed `enforceStrictSchema` to recurse into `$defs` and `definitions` blocks so referenced sub-schemas are also made strict-compliant
1273
+ - Fixed `enforceStrictSchema` to handle tuple-style `items` arrays (previously only single-schema `items` objects were recursed)
1274
+ - Fixed `enforceStrictSchema` double-wrapping: optional properties already expressed as `anyOf: [..., {type: "null"}]` are not wrapped again
1275
+ - Fixed `enforceStrictSchema` `Array.isArray` type-narrowing for `type` field to filter non-string entries before checking for `"object"`
1276
+
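Two of the fixes above, the `WeakSet` cycle guard and the expansion of `nullable` into `anyOf: [schema, {type: "null"}]`, can be sketched together. This is a simplified illustration under stated assumptions: `expandNullable` is a hypothetical helper, it only descends into `properties`, and it returns a revisited node unchanged rather than reproducing the package's actual handling.

```typescript
type Schema = { [key: string]: unknown };

function isSchema(value: unknown): value is Schema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical sketch: expand `nullable: true` into an anyOf with a null variant,
// guarding the recursion with a WeakSet so circular schemas terminate.
function expandNullable(schema: Schema, seen: WeakSet<object> = new WeakSet()): Schema {
  if (seen.has(schema)) return schema; // already visited: stop instead of recursing forever
  seen.add(schema);

  const { nullable, ...rest } = schema;
  const out: Schema = {};
  for (const key of Object.keys(rest)) {
    const value = rest[key];
    if (key === "properties" && isSchema(value)) {
      const props: Schema = {};
      for (const name of Object.keys(value)) {
        const child = value[name];
        props[name] = isSchema(child) ? expandNullable(child, seen) : child;
      }
      out[key] = props;
    } else {
      out[key] = value;
    }
  }
  return nullable === true ? { anyOf: [out, { type: "null" }] } : out;
}
```

The same `seen` set is threaded through every recursive call, which mirrors the changelog's note that options must be forwarded when descending; a guard created afresh at each level would never detect a cycle.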
1277
+ ## [13.3.8] - 2026-02-28
1278
+
1279
+ ### Fixed
1280
+
1281
+ - Fixed response body reuse error when handling 429 rate limit responses with retry logic
1282
+
1283
+ ## [13.3.7] - 2026-02-27
1284
+
1285
+ ### Added
1286
+
1287
+ - Added `tryEnforceStrictSchema` function that gracefully downgrades to non-strict mode when schema enforcement fails, enabling better compatibility with malformed or circular schemas
1288
+ - Added `sanitizeSchemaForStrictMode` function to normalize JSON schemas by stripping non-structural keywords, converting `const` to `enum`, and expanding type arrays into `anyOf` variants
1289
+ - Added Kilo Gateway provider support with OpenAI-compatible model discovery, OAuth `/login kilo`, and `KILO_API_KEY` environment variable support ([#193](https://github.com/can1357/gajae-code/issues/193))
1290
+
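The graceful downgrade described for `tryEnforceStrictSchema` can be pictured as a fail-open wrapper. The sketch below is an assumption about the shape of that behaviour, not the package's code: `tryEnforce`, the `StrictResult` type, and the injected `enforce` callback are all hypothetical.

```typescript
type Schema = Record<string, unknown>;
type StrictResult = { schema: Schema; strict: boolean };

// Hypothetical fail-open wrapper: if enforcement throws, fall back to the
// original schema object and report that strict mode is off.
function tryEnforce(schema: Schema, enforce: (input: Schema) => Schema): StrictResult {
  try {
    return { schema: enforce(schema), strict: true };
  } catch {
    return { schema, strict: false };
  }
}

// Example enforcer (assumption): rejects schemas without `properties`,
// otherwise closes the object to additional properties.
function exampleEnforce(input: Schema): Schema {
  if (input.type === "object" && input.properties === undefined) {
    throw new Error("object schema without properties");
  }
  return { ...input, additionalProperties: false };
}
```

Returning the original object on failure lets a caller detect the downgrade either from the `strict` flag or by an identity check against the input.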
1291
+ ### Changed
1292
+
1293
+ - Changed strict mode handling in OpenAI providers to use `tryEnforceStrictSchema` for safer schema enforcement with automatic fallback to non-strict mode
1294
+ - Enhanced `enforceStrictSchema` to properly handle schemas with type arrays containing `object` (e.g., `type: ["object", "null"]`)
1295
+
1296
+ ### Fixed
1297
+
1298
+ - Fixed `enforceStrictSchema` to properly handle malformed object schemas with required keys but missing properties
1299
+ - Fixed `enforceStrictSchema` to correctly process nested object schemas within `anyOf`, `allOf`, and `oneOf` combinators
1300
+
1301
+ ## [13.3.1] - 2026-02-26
1302
+
1303
+ ### Added
1304
+
1305
+ - Added `topP`, `topK`, `minP`, `presencePenalty`, and `repetitionPenalty` options to `StreamOptions` for fine-grained control over model sampling behavior
1306
+
1307
+ ## [13.3.0] - 2026-02-26
1308
+
1309
+ ### Changed
1310
+
1311
+ - Allowed OAuth provider logins to supply a manual authorization code handler with a default prompt when none is provided
1312
+
1313
+ ## [13.2.0] - 2026-02-23
1314
+
1315
+ ### Added
1316
+
1317
+ - Added support for GitHub Copilot provider in strict mode for both openai-completions and openai-responses tool schemas
1318
+
1319
+ ### Fixed
1320
+
1321
+ - Fixed tool descriptions being rejected when undefined by providing empty string fallback across all providers
1322
+
1323
+ ## [12.19.1] - 2026-02-22
1324
+
1325
+ ### Added
1326
+
1327
+ - Exported `isProviderRetryableError` function for detecting rate-limit and transient stream errors
1328
+ - Support for retrying malformed JSON stream-envelope parse errors from Anthropic-compatible proxy endpoints
1329
+
1330
+ ### Changed
1331
+
1332
+ - Expanded retry detection to include JSON parse errors (unterminated strings, unexpected end of input) in addition to rate-limit errors
1333
+
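A retry predicate along these lines might look like the sketch below. It is illustrative only: `looksRetryable` is a hypothetical name, and the message patterns are assumptions chosen to match the error kinds named above, not the patterns `isProviderRetryableError` actually checks.

```typescript
// Hypothetical predicate: treat HTTP 429 and truncated-JSON parse errors as retryable.
function looksRetryable(error: unknown): boolean {
  const status =
    typeof error === "object" && error !== null
      ? (error as { status?: unknown }).status
      : undefined;
  // Number() also accepts string-formatted status codes such as "429".
  if (Number(status) === 429) return true;

  const message = error instanceof Error ? error.message : String(error);
  return /rate.?limit|unterminated string|unexpected end of (json )?input/i.test(message);
}
```

A caller would typically consult such a predicate inside its retry loop and rethrow anything it does not recognise.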
1334
+ ## [12.19.0] - 2026-02-22
1335
+
1336
+ ### Added
1337
+
1338
+ - Added GitLab Duo provider with support for Anthropic model, GPT-5, and other models via GitLab AI Gateway
1339
+ - Added OAuth authentication for GitLab Duo with automatic token refresh and direct access caching
1340
+ - Added 16 new GitLab Duo models including Anthropic model Opus/Sonnet/Haiku variants and GPT-5 series models
1341
+ - Added `isOAuth` option to Anthropic provider to force OAuth bearer auth mode for proxy tokens
1342
+ - Added `streamGitLabDuo` function to route requests through GitLab AI Gateway with direct access tokens
1343
+ - Added `getGitLabDuoModels` function to retrieve available GitLab Duo model configurations
1344
+ - Added `clearGitLabDuoDirectAccessCache` function to manually clear cached direct access tokens
1345
+
1346
+ ### Changed
1347
+
1348
+ - Enhanced `getModelMapping()` to support both GitLab Duo alias IDs (e.g., `duo-chat-gpt-5-openai-code`) and canonical model IDs (e.g., `gpt-5-openai-code`) for improved model resolution flexibility
1349
+ - Migrated `AuthCredentialStore` and `AuthStorage` into `@gajae-code/ai` as shared credential primitives for downstream packages
1350
+ - Moved Anthropic auth helpers (`findAnthropicAuth`, `isOAuthToken`, `buildAnthropicSearchHeaders`, `buildAnthropicUrl`) into shared AI utilities for reuse across providers
1351
+ - Replaced `CliAuthStorage` with `AuthCredentialStore` for improved credential management with multiple credentials per provider
1352
+ - Updated models.json pricing for Anthropic model 3.5 Sonnet (input: 0.23→0.45, output: 3→2.2, added cache read: 0.225) and Anthropic model 3 Opus (input: 0.3→0.95)
1353
+ - Moved `mapAnthropicToolChoice` function from gitlab-duo provider to stream module for broader reusability
1354
+ - Enhanced HTTP status code extraction to handle string-formatted status codes in error objects
1355
+
1356
+ ### Removed
1357
+
1358
+ - Removed `CliAuthStorage` class in favor of new `AuthCredentialStore` with enhanced functionality
1359
+
1360
+ ## [12.17.2] - 2026-02-21
1361
+
1362
+ ### Added
1363
+
1364
+ - Exported `getAntigravityUserAgent()` function for constructing Antigravity User-Agent headers
1365
+
1366
+ ### Changed
1367
+
1368
+ - Updated default Antigravity version from 1.15.8 to 1.18.3
1369
+ - Unified User-Agent header generation across Antigravity API calls to use centralized `getAntigravityUserAgent()` function
1370
+
1371
+ ## [12.17.1] - 2026-02-21
1372
+
1373
+ ### Added
1374
+
1375
+ - Added new export paths for provider models via `./provider-models` and `./provider-models/*`
1376
+ - Added new export paths for the Cursor and OpenAI code providers via `./providers/cursor/gen/*` and `./providers/openai-code/*`
1377
+ - Added new export paths for usage utilities via `./usage/*`
1378
+ - Added new export paths for discovery and OAuth utilities via `./utils/discovery` and `./utils/oauth` with subpath exports
1379
+
1380
+ ### Changed
1381
+
1382
+ - Simplified main export path to use wildcard pattern `./src/*.ts` for broader module access
1383
+ - Updated `models.json` export to include TypeScript declaration file at `./src/models.json.d.ts`
1384
+ - Reorganized package.json field ordering for improved readability
1385
+
1386
+ ## [12.17.0] - 2026-02-21
1387
+
1388
+ ### Fixed
1389
+
1390
+ - Cursor provider: bind `execHandlers` when passing handler methods to the exec protocol so handlers receive correct `this` context (fixes "undefined is not an object (evaluating 'this.options')" when using exec tools such as web search with Cursor)
1391
+
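The class of bug fixed here is the general JavaScript pitfall of passing a method reference without its receiver. The sketch below reproduces it with a hypothetical `ExecHandlers` class; the names are illustrative and not taken from the Cursor provider.

```typescript
// Hypothetical handler class whose method depends on `this.options`.
class ExecHandlers {
  constructor(private readonly options: { name: string }) {}

  search(query: string): string {
    return `${this.options.name}:${query}`;
  }
}

const handlers = new ExecHandlers({ name: "web" });

// Detached reference: `this` is lost when the function is called on its own.
const unbound = handlers.search;

// Bound reference: `this` stays attached to the instance.
const bound = handlers.search.bind(handlers);
```

Calling `unbound("q")` throws because `this.options` cannot be read, while `bound("q")` behaves like `handlers.search("q")`.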
1392
+ ## [12.16.0] - 2026-02-21
1393
+
1394
+ ### Added
1395
+
1396
+ - Exported `readModelCache` and `writeModelCache` functions for direct SQLite-backed model cache access
1397
+ - Added `<turn_aborted>` guidance marker as synthetic user message when assistant messages are aborted or errored, informing the model that tools may have partially executed
1398
+ - Added support for Sonnet 4.6 models in adaptive thinking detection
1399
+
1400
+ ### Changed
1401
+
1402
+ - Updated model cache schema version to support improved global model fallback resolution
1403
+ - Improved GitHub Copilot model resolution to prefer provider-specific model definitions over global references when context window is larger, ensuring optimal model capabilities
1404
+ - Migrated model cache from per-provider JSON files to unified SQLite database (models.db) for atomic cross-process access
1405
+ - Renamed `cachePath` option to `cacheDbPath` in ModelManagerOptions to reflect database-backed storage
1406
+ - Improved non-authoritative cache handling with 5-minute retry backoff instead of retrying on every startup
1407
+ - Modified handling of aborted/errored assistant messages to preserve tool call structure instead of converting to text summaries, with synthetic 'aborted' tool results injected
1408
+ - Updated tool call tracking to use status map (Resolved/Aborted) instead of separate sets for better handling of duplicate and aborted tool results
1409
+
1410
+ ## [12.15.0] - 2026-02-20
1411
+
1412
+ ### Fixed
1413
+
1414
+ - Improved error messages for OAuth token refresh failures by including detailed error information from the provider
1415
+ - Separated rate limit and usage limit error handling to provide distinct user-friendly messages for ChatGPT rate limits vs subscription usage limits
1416
+
1417
+ ### Changed
1418
+
1419
+ - Increased SDK retry attempts to 5 for OpenAI, Azure OpenAI, and Anthropic clients (was SDK default of 2)
1420
+ - Changed 429 retry strategy for OpenAI code provider and Google Gemini CLI to use a 5-minute time budget when the server provides a retry delay, instead of a fixed attempt cap
1421
+
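A time-budget retry policy of this kind can be sketched as a pure decision function. Everything here is an assumption for illustration: `nextRetryDelay` is a hypothetical name, and the rule that the server-provided wait must fit inside the remaining budget is one plausible reading of the entry above.

```typescript
const RETRY_BUDGET_MS = 5 * 60 * 1000; // five-minute budget, as stated in the entry above

// Hypothetical policy: honour the server-provided delay only while the total
// time spent (elapsed + next wait) stays within the budget; otherwise give up.
function nextRetryDelay(
  elapsedMs: number,
  serverDelayMs: number,
  budgetMs: number = RETRY_BUDGET_MS,
): number | null {
  if (serverDelayMs < 0) return null;
  return elapsedMs + serverDelayMs <= budgetMs ? serverDelayMs : null;
}
```

Compared with a fixed attempt cap, a budget lets many short waits through while still bounding the worst-case latency of a request.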
1422
+ ## [12.14.0] - 2026-02-19
1423
+
1424
+ ### Added
1425
+
1426
+ - Added `gemini-3.1-pro` model to opencode provider with text and image input support
1427
+ - Added `trinity-large-preview-free` model to opencode provider
1428
+ - Added `google/gemini-3.1-pro-preview` model to nanogpt provider
1429
+ - Added `google/gemini-3.1-pro-preview` model to openrouter provider with text and image input support
1430
+ - Added `gemini-3.1-pro` model to cursor provider
1431
+ - Added optional `intent` field to `ToolCall` interface for harness-level intent metadata
1432
+
1433
+ ### Changed
1434
+
1435
+ - Changed `big-pickle` model API from `openai-completions` to `anthropic-messages`
1436
+ - Changed `big-pickle` model baseUrl from `https://opencode.ai/zen/v1` to `https://opencode.ai/zen`
1437
+ - Changed `minimax-m2.5-free` model API from `openai-completions` to `anthropic-messages`
1438
+ - Changed `minimax-m2.5-free` model baseUrl from `https://opencode.ai/zen/v1` to `https://opencode.ai/zen`
1439
+
1440
+ ### Fixed
1441
+
1442
+ - Fixed tool argument validation to iteratively coerce nested JSON strings across multiple passes, enabling proper handling of deeply nested JSON-serialized objects and arrays
1443
+
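The multi-pass coercion described in this fix can be sketched as follows. This is a simplified illustration: `coerceNestedJson` is a hypothetical helper that parses any string that looks like a JSON object or array, whereas a real validator would presumably consult the tool's schema before coercing a value.

```typescript
// Hypothetical sketch: repeatedly parse JSON-looking strings until a pass changes nothing.
function coerceNestedJson(value: unknown, maxPasses: number = 5): unknown {
  let current = value;
  for (let pass = 0; pass < maxPasses; pass++) {
    let changed = false;

    const visit = (node: unknown): unknown => {
      if (typeof node === "string") {
        const trimmed = node.trim();
        const first = trimmed.charAt(0);
        if (first === "{" || first === "[") {
          try {
            const parsed: unknown = JSON.parse(trimmed);
            changed = true;
            return parsed; // strings nested inside are handled by the next pass
          } catch {
            return node; // not valid JSON: leave the string untouched
          }
        }
        return node;
      }
      if (Array.isArray(node)) return node.map(visit);
      if (typeof node === "object" && node !== null) {
        const source = node as Record<string, unknown>;
        const out: Record<string, unknown> = {};
        for (const key of Object.keys(source)) out[key] = visit(source[key]);
        return out;
      }
      return node;
    };

    current = visit(current);
    if (!changed) break;
  }
  return current;
}
```

A single pass would stop at the first level of unwrapping; iterating until nothing changes (bounded by `maxPasses`) is what handles values that were serialized more than once.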
1444
+ ## [12.13.0] - 2026-02-19
1445
+
1446
+ ### Added
1447
+
1448
+ - Added NanoGPT provider support with API-key login, dynamic model discovery from `https://nano-gpt.com/api/v1/models`, and text-model filtering for catalog/runtime discovery ([#111](https://github.com/can1357/gajae-code/issues/111))
1449
+
1450
+ ## [12.12.3] - 2026-02-19
1451
+
1452
+ ### Fixed
1453
+
1454
+ - Fixed retry logic to recognize 'unable to connect' errors as transient failures
1455
+
1456
+ ## [12.11.3] - 2026-02-19
1457
+
1458
+ ### Fixed
1459
+
1460
+ - Fixed OpenAI code provider streaming to fail truncated responses that end without a terminal completion event, preventing partial outputs from being treated as successful completions
1461
+ - Fixed OpenAI code WebSocket append fallback by resetting stale turn-state/model-etag session metadata when request shape diverges from appendable history
1462
+
1463
+ ## [12.11.1] - 2026-02-19
1464
+
1465
+ ### Added
1466
+
1467
+ - Added support for Anthropic model 4.6 Opus and Sonnet models via Cursor API
1468
+ - Added support for Composer 1.5 model via Cursor API
1469
+ - Added support for GPT-5.1 OpenAI code Mini and GPT-5.1 High models via Cursor API
1470
+ - Added support for GPT-5.2 and GPT-5.3 OpenAI code variants (Fast, High, Low, Extra High) via Cursor API
1471
+ - Added HTTP/2 transport support for Cursor API requests (required by Cursor API)
1472
+
1473
+ ### Changed
1474
+
1475
+ - Updated pricing for Anthropic model 3.5 Sonnet model
1476
+ - Updated Anthropic model 3.5 Sonnet context window from 262,144 to 131,072 tokens
1477
+ - Simplified Cursor model display names by removing '(Cursor)' suffix
1478
+ - Changed Cursor API timeout from 15 seconds to 5 seconds
1479
+ - Switched Cursor API transport from HTTP/1.1 to HTTP/2
1480
+
1481
+ ## [12.11.0] - 2026-02-19
1482
+
1483
+ ### Added
1484
+
1485
+ - Added `priority` field to Model interface for provider-assigned model prioritization
1486
+ - Added `CatalogDiscoveryConfig` interface to standardize catalog discovery configuration across providers
1487
+ - Added type guards `isCatalogDescriptor()` and `allowsUnauthenticatedCatalogDiscovery()` for safer descriptor handling
1488
+ - Added `DEFAULT_MODEL_PER_PROVIDER` export from descriptors module for centralized default model management
1489
+ - Support for 12 new AI providers: Cloudflare AI Gateway, Hugging Face Inference, LiteLLM, Moonshot, NVIDIA, Ollama, Qianfan, Qwen Portal, Together, Venice, vLLM, and Xiaomi MiMo
1490
+ - Login flows for new providers with API key validation and OAuth token support
1491
+ - Extended `KnownProvider` type to include all newly supported providers
1492
+ - API key environment variable mappings for all new providers in service provider map
1493
+ - Model discovery and configuration for Cloudflare AI Gateway, Hugging Face, LiteLLM, Moonshot, NVIDIA, Ollama, Qianfan, Qwen Portal, Together, Venice, vLLM, and Xiaomi MiMo
1494
+
1495
+ ### Changed
1496
+
1497
+ - Refactored OAuth credential retrieval to simplify storage lifecycle management in model generation script
1498
+ - Parallelized special model discovery sources (Antigravity, OpenAI code) for improved generation performance
1499
+ - Reorganized model JSON structure to place `contextWindow` and `maxTokens` before `compat` field for consistency
1500
+ - Added `priority` field to OpenAI code provider models for provider-assigned model prioritization
1501
+ - Refactored provider descriptors to use helper functions (`descriptor`, `catalog`, `catalogDescriptor`) for reduced code duplication
1502
+ - Refactored models.dev provider descriptors to use helper functions (`simpleModelsDevDescriptor`, `openAiCompletionsDescriptor`, `anthropicMessagesDescriptor`) for improved maintainability
1503
+ - Unified provider descriptors into single source of truth in `descriptors.ts` for both runtime model discovery and catalog generation, improving maintainability
1504
+ - Refactored model generation script to use declarative `CatalogProviderDescriptor` interface instead of separate descriptor types, reducing code duplication
1505
+ - Reorganized models.dev provider descriptors into logical groups (Bedrock, Core, Coding Plans, Specialized) for better code organization
1506
+ - Simplified API resolution for OpenCode and GitHub Copilot providers using rule-based matching instead of inline conditionals
1507
+ - Refactored model generation script to use declarative provider descriptors instead of inline provider-specific logic, improving maintainability and reducing code duplication
1508
+ - Extracted model post-processing policies (cache pricing corrections, context window normalization) into dedicated `model-policies.ts` module for better testability and clarity
1509
+ - Removed static bundled models for Ollama and vLLM from `models.json` to rely on dynamic discovery instead, reducing static catalog size
1510
+ - Updated `OAuthProvider` type to include new provider identifiers
1511
+ - Expanded model registry (models.json) with thousands of new model entries across all new providers
1512
+ - Modified environment variable resolution to use `$pickenv` for providers with multiple possible env var names
1513
+ - Updated README documentation to list all newly supported providers and their authentication requirements
1514
+
1515
+ ## [12.10.1] - 2026-02-18
1516
+
1517
+ - Added Synthetic provider
1518
+ - Added API-key login helpers for Synthetic and Cerebras providers
1519
+
1520
+ ## [12.10.0] - 2026-02-18
1521
+
1522
+ ### Breaking Changes
1523
+
1524
+ - Renamed public API functions: `getModel()` → `getBundledModel()`, `getModels()` → `getBundledModels()`, `getProviders()` → `getBundledProviders()`
1525
+
1526
+ ### Added
1527
+
1528
+ - Exported `ModelManager` API for runtime-aware model resolution with dynamic endpoint discovery
1529
+ - Exported provider-specific model manager configuration helpers for Google, OpenAI-compatible, OpenAI code, and Cursor providers
1530
+ - Exported discovery utilities for fetching models from Antigravity, OpenAI code, Cursor, Gemini, and OpenAI-compatible endpoints
1531
+ - Added `createModelManager()` function to manage bundled and dynamically discovered models with configurable refresh strategies
1532
+ - Added support for on-disk model caching with TTL-based invalidation
1533
+ - Added `resolveProviderModels()` function for runtime model resolution across multiple providers
1534
+ - Added EU cross-region inference variants for Anthropic model Haiku 3.5 on Bedrock
1535
+ - Added Anthropic model Sonnet 4.6 and Anthropic model Sonnet 4.6 Thinking models to Antigravity provider
1536
+ - Added GLM-5 Free model via OpenCode provider
1537
+ - Added GLM-4.7-FlashX model via ZAI provider
1538
+ - Added MiniMax-M2.5-highspeed model across multiple providers (minimax-code, minimax-code-cn, minimax, minimax-cn)
1539
+ - Added Anthropic model Sonnet 4.6 model to OpenRouter provider
1540
+ - Added Qwen 3.5 Plus model to Vercel AI Gateway provider
1541
+ - Added Anthropic model Sonnet 4.6 model to Vercel AI Gateway provider
1542
+
1543
+ ### Changed
1544
+
1545
+ - Renamed `getModel()` to `getBundledModel()` to clarify it returns compile-time bundled models only
1546
+ - Renamed `getModels()` to `getBundledModels()` for consistency
1547
+ - Renamed `getProviders()` to `getBundledProviders()` for consistency
1548
+ - Refactored model generation script to use modular discovery functions instead of monolithic provider-specific logic
1549
+ - Updated models.json with new model entries and pricing updates across multiple providers
1550
+ - Updated pricing for deepseek/deepseek-v3 model on OpenRouter
1551
+ - Updated maxTokens from 65536 to 4096 for deepseek/deepseek-v3 on OpenRouter
1552
+ - Updated pricing and maxTokens for mistralai/mistral-large-2411 on OpenRouter
1553
+ - Updated pricing for qwen/qwen-max on Together AI
1554
+ - Updated pricing for qwen/qwen-vl-plus on Together AI
1555
+ - Updated pricing for qwen/qwen-plus on Together AI
1556
+ - Updated pricing for qwen/qwen-turbo on Together AI
1557
+ - Expanded EU cross-region inference variant support to all Anthropic models on Bedrock (previously limited to Haiku, Sonnet, and Opus 4.5)
1558
+
1559
+ ## [12.8.0] - 2026-02-16
1560
+
1561
+ ### Added
1562
+
1563
+ - Added `contextPromotionTarget` model property to specify preferred fallback model when context promotion is triggered
1564
+ - Added automatic context promotion target assignment for Spark models to their base model equivalents
1565
+ - Added support for Brave search provider with BRAVE_API_KEY environment variable
1566
+
1567
+ ### Changed
1568
+
1569
+ - Updated Qwen model context window and max token limits for improved accuracy
1570
+
1571
+ ## [12.7.0] - 2026-02-16
1572
+
1573
+ ### Added
1574
+
1575
+ - Added DeepSeek-V3.2 model support via Amazon Bedrock
1576
+ - Added GLM-5 model support via OpenCode
1577
+ - Added MiniMax M2.5 model support via OpenCode
1578
+
1579
+ ### Changed
1580
+
1581
+ - Updated GLM-4.5, GLM-4.5-Air, GLM-4.5-Flash, GLM-4.5V, GLM-4.6, GLM-4.6V, GLM-4.7, GLM-4.7-Flash, and GLM-5 models to use anthropic-messages API instead of openai-completions
1582
+ - Updated GLM models base URL from https://api.z.ai/api/coding/paas/v4 to https://api.z.ai/api/anthropic
1583
+ - Updated pricing for multiple models including Mistral, Moonshot, and Qwen variants
1584
+ - Updated context window and max tokens for several models to reflect accurate specifications
1585
+
1586
+ ### Removed
1587
+
1588
+ - Removed compat field with supportsDeveloperRole and thinkingFormat properties from GLM models
1589
+
1590
+ ## [12.6.0] - 2026-02-16
1591
+
1592
+ ### Added
1593
+
1594
+ - Added source-scoped custom API and OAuth provider registration helpers for extension-defined providers
1595
+
1596
+ ### Changed
1597
+
1598
+ - Expanded `Api` typing to allow extension-defined API identifiers while preserving built-in API exhaustiveness checks
1599
+
1600
+ ### Fixed
1601
+
1602
+ - Fixed custom API registration to reject built-in API identifiers and prevent accidental provider overrides
1603
+
1604
+ ## [12.2.0] - 2026-02-13
1605
+
1606
+ ### Added
1607
+
1608
+ - Added automatic retry logic for WebSocket stream closures before response completion, with configurable retry budget to improve reliability on flaky connections
1609
+ - Added `providerSessionState` option to enable provider-scoped mutable state persistence across agent turns
1610
+ - Added WebSocket retry logic with configurable retry budget and delay via `PI_OPENAI_CODE_WEBSOCKET_RETRY_BUDGET` and `PI_OPENAI_CODE_WEBSOCKET_RETRY_DELAY_MS` environment variables
1611
+ - Added WebSocket idle timeout detection via `PI_OPENAI_CODE_WEBSOCKET_IDLE_TIMEOUT_MS` environment variable to fail stalled connections
1612
+ - Added WebSocket v2 beta header support via `PI_OPENAI_CODE_WEBSOCKET_V2` environment variable for newer OpenAI API versions
1613
+ - Added WebSocket handshake header capture to extract and replay session metadata (turn state, models etag, reasoning flags) across SSE fallback requests
1614
+ - Added `preferWebsockets` option to enable WebSocket transport for OpenAI code provider responses when supported
1615
+ - Added `prewarmOpenAIOpenAI codeResponses()` function to establish and reuse WebSocket connections across multiple requests
1616
+ - Added `getOpenAIOpenAI codeTransportDetails()` function to inspect transport layer details including WebSocket status and fallback information
1617
+ - Added `getProviderDetails()` function to retrieve formatted provider configuration and transport information
1618
+ - Added automatic fallback from WebSocket to SSE when connection fails, with transparent retry logic
1619
+ - Added session state management to reuse WebSocket connections and enable request appending across turns
1620
+ - Added support for x-openai-code-turn-state header to maintain conversation state across SSE requests
1621
+
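The environment variables listed above carry numeric settings, so a consumer has to parse and validate them. The sketch below shows one defensive way to do that; `readNonNegativeInt` is a hypothetical helper, and the fallback values are placeholders, since the package's real defaults are not stated in this changelog.

```typescript
// Hypothetical helper: read a non-negative integer setting, falling back when
// the variable is unset or malformed.
function readNonNegativeInt(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = parseInt(raw, 10);
  return !isNaN(parsed) && parsed >= 0 ? parsed : fallback;
}

// Placeholder defaults (assumptions), keyed by the variable names from the entries above.
function readWebSocketSettings(env: Record<string, string | undefined>) {
  return {
    retryBudget: readNonNegativeInt(env, "PI_OPENAI_CODE_WEBSOCKET_RETRY_BUDGET", 3),
    retryDelayMs: readNonNegativeInt(env, "PI_OPENAI_CODE_WEBSOCKET_RETRY_DELAY_MS", 500),
    idleTimeoutMs: readNonNegativeInt(env, "PI_OPENAI_CODE_WEBSOCKET_IDLE_TIMEOUT_MS", 30000),
  };
}
```

In a Node-style runtime the function would be called as `readWebSocketSettings(process.env)`; taking the environment as a parameter keeps the sketch testable.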
1622
+ ### Changed
1623
+
1624
+ - Changed WebSocket session state storage from global maps to provider-scoped session state for multi-agent isolation
1625
+ - Changed WebSocket connection initialization to accept idle timeout configuration and handshake header callbacks
1626
+ - Changed WebSocket error handling to use standardized transport error messages with `OpenAI code websocket transport error` prefix
1627
+ - Changed WebSocket retry behavior to retry transient failures before activating sticky fallback, improving reliability on flaky connections
1628
+ - Changed OpenAI code provider model configuration to prefer WebSocket transport by default with `preferWebsockets: true`
1629
+ - Changed header handling to use appropriate OpenAI-Beta header values for WebSocket vs SSE transports
1630
+ - Perplexity OAuth token refresh now uses JWT expiry extraction instead of Socket.IO RPC, improving reliability when server is unreachable
1631
+ - Removed Socket.IO client implementation for Perplexity token refresh; tokens are now validated using embedded JWT expiry claims
1632
+
1633
+ ### Removed
1634
+
1635
+ - Removed `refreshPerplexityToken` export; token refresh is now handled internally via JWT expiry detection
1636
+
1637
+ ### Fixed
1638
+
1639
+ - Fixed WebSocket stream retry logic to properly handle mid-stream connection closures and retry before falling back to SSE transport
1640
+ - Fixed `preferWebsockets` option handling to correctly respect explicit `false` values when determining transport preference
1641
+ - Fixed WebSocket append state not being reset after aborted requests, preventing stale state from affecting subsequent turns
1642
+ - Fixed WebSocket append state not being reset after stream errors, preventing failed append attempts from blocking future requests
1643
+ - Fixed OpenAI code model context window metadata to use 272000 input tokens (instead of 400000 total budget) for non-Spark OpenAI code variants
1644
+
1645
+ ## [12.0.0] - 2026-02-12
1646
+
1647
+ ### Added
1648
+
1649
+ - Added GPT-5.3 OpenAI code Spark model with 128K context window and extended reasoning capabilities
1650
+ - Added MiniMax M2.5 and M2.5 Lightning models via OpenAI-compatible API (minimax-code provider)
1651
+ - Added MiniMax M2.5 and M2.5 Lightning models via OpenAI-compatible API (minimax-code-cn provider for China region)
1652
+ - Added MiniMax M2.5 and M2.5 Lightning models via Anthropic API (minimax and minimax-cn providers)
1653
+ - Added Llama 3.1 8B model via Cerebras API
1654
+ - Added MiniMax M2.5 model via OpenRouter
1655
+ - Added MiniMax M2.5 model via Vercel AI Gateway
1656
+ - Added MiniMax M2.5 Free model via OpenCode
1657
+ - Added Qwen3 VL 32B Instruct multimodal model via OpenRouter
1658
+
1659
+ ### Changed
1660
+
1661
+ - Updated Z.ai GLM-5 pricing and context window configuration on OpenRouter
1662
+ - Updated Qwen3 Max Thinking max tokens from 32768 to 65536 on OpenRouter
1663
+ - Updated OpenAI GPT-5 Image Mini pricing on OpenRouter
1664
+ - Updated OpenAI GPT-5 Pro pricing and context window on OpenRouter
1665
+ - Updated OpenAI o4-mini pricing and context window on OpenRouter
1666
+ - Updated Anthropic model Opus 4.5 Thinking model name formatting (removed parentheses)
1667
+ - Updated Anthropic model Opus 4.6 Thinking model name formatting (removed parentheses)
1668
+ - Updated Anthropic model Sonnet 4.5 Thinking model name formatting (removed parentheses)
1669
+ - Updated Gemini 2.5 Flash Thinking model name formatting (removed parentheses)
1670
+ - Updated Gemini 3 Pro High and Low model name formatting (removed parentheses)
1671
+ - Updated GPT-OSS 120B Medium model name formatting (removed parentheses) and context window to 131072
1672
+
1673
+ ### Removed
1674
+
1675
+ - Removed GLM-5 model from Z.ai provider
1676
+ - Removed Trinity Large Preview Free model from OpenCode provider
1677
+ - Removed MiniMax M2.1 Free model from OpenCode provider
1678
+ - Removed deprecated Anthropic model entries: `anthropic-model-3-5-haiku-latest`, `anthropic-model-3-5-haiku-20241022`, `anthropic-model-3-7-sonnet-20250219`, `anthropic-model-3-7-sonnet-latest`, `anthropic-model-3-opus-20240229`, `anthropic-model-3-sonnet-20240229` ([#33](https://github.com/can1357/gajae-code/issues/33))
1679
+
1680
+ ### Fixed
1681
+
1682
+ - Added deprecation filter in model generation script to prevent re-adding deprecated Anthropic models ([#33](https://github.com/can1357/gajae-code/issues/33))
1683
+
1684
+ ## [11.14.1] - 2026-02-12
1685
+
1686
+ ### Added
1687
+
1688
+ - Added prompt-caching-scope-2026-01-05 beta feature support
1689
+
1690
+ ### Changed
1691
+
1692
+ - Updated Anthropic Code version header to 2.1.39
1693
+ - Updated runtime version header to v24.13.1 and package version to 0.73.0
1694
+ - Increased request timeout from 60s to 600s
1695
+ - Reordered Accept-Encoding header values for compression preference
1696
+ - Updated OAuth authorization and token endpoints to use platform.anthropic-model.com
1697
+ - Expanded OAuth scopes to include user:sessions:anthropic-model_code and user:mcp_servers
1698
+
1699
+ ### Removed
1700
+
1701
+ - Removed anthropic-model-code-20250219 beta feature from default models
1702
+ - Removed fine-grained-tool-streaming-2025-05-14 beta feature
1703
+
1704
+ ## [11.13.1] - 2026-02-12
1705
+
1706
+ ### Added
1707
+
1708
+ - Added Perplexity (Pro/Max) OAuth login support via native macOS app extraction or email OTP authentication
1709
+ - Added `loginPerplexity` and `refreshPerplexityToken` functions for Perplexity account integration
1710
+ - Added Socket.IO v4 client implementation for authenticated WebSocket communication with Perplexity API
1711
+
1712
+ ## [11.12.0] - 2026-02-11
1713
+
1714
+ ### Changed
1715
+
1716
+ - Increased maximum retry attempts for OpenAI code requests from 2 to 5 to improve reliability on transient failures
1717
+
1718
+ ### Fixed
1719
+
1720
+ - Fixed tool result content handling in Anthropic provider to provide fallback error message when content is empty
1721
+ - Improved retry delay calculation to parse delay values from error response bodies (e.g., 'Please try again in 225ms')
1722
+
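The retry-delay fix above parses hints such as 'Please try again in 225ms' out of error response bodies. The sketch below shows one way this could work. The function name `parseRetryDelayMs` is hypothetical, and the accepted phrasing and units (`ms` and `s`) are assumptions based only on the changelog's example.

```typescript
// Hypothetical helper: extracts a retry delay in milliseconds from an error body.
// Only the "try again in <number><ms|s>" phrasing is recognized in this sketch.
function parseRetryDelayMs(body: string): number | undefined {
  const match = /try again in\s+(\d+(?:\.\d+)?)\s*(ms|s)\b/i.exec(body);
  if (!match) return undefined;
  const [, rawValue = "", rawUnit = ""] = match;
  const value = Number(rawValue);
  // Normalize seconds to milliseconds.
  return rawUnit.toLowerCase() === "ms" ? value : value * 1000;
}
```

A caller would typically prefer this value over a fixed backoff when it is present, and fall back to its own schedule when the function returns `undefined`.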
1723
+ ## [11.11.0] - 2026-02-10
1724
+
1725
+ ### Breaking Changes
1726
+
1727
+ - Replaced `./models.generated` export with `./models.json` - update imports from `import { MODELS } from './models.generated'` to `import MODELS from './models.json' with { type: 'json' }`
1728
+
1729
+ ### Added
1730
+
1731
+ - Added TypeScript type declarations for `models.json` to enable proper type inference when importing the JSON file
1732
+
1733
+ ### Changed
1734
+
1735
+ - Updated available models in google-antigravity provider with new model variants and updated context window/token limits
1736
+ - Simplified type signatures for `getModel()` and `getModels()` functions for improved usability
1737
+ - Changed models export from TypeScript module to JSON format for improved performance and reduced bundle size
1738
+ - Updated `@anthropic-ai/sdk` dependency from ^0.72.1 to ^0.74.0
1739
+
1740
+ ## [11.10.0] - 2026-02-10
1741
+
1742
+ ### Added
1743
+
1744
+ - Added support for Kimi K2, K2 Turbo Preview, and K2.5 models with reasoning capabilities
1745
+
1746
+ ### Fixed
1747
+
1748
+ - Fixed Anthropic model Opus 4.6 context window to 200K across all providers (was incorrectly set to 1M)
1749
+ - Fixed Anthropic model Sonnet 4 context window to 200K across multiple providers (was incorrectly set to 1M)
1750
+
1751
+ ## [11.8.0] - 2026-02-10
1752
+
1753
+ ### Added
1754
+
1755
+ - Added `auto` model alias for OpenRouter with automatic model routing
1756
+ - Added `openrouter/aurora-alpha` model with reasoning capabilities
1757
+ - Added `qwen/qwen3-max-thinking` model with extended context window support
1758
+ - Added support for `parametersJsonSchema` in Google Gemini tool definitions for improved JSON Schema compatibility
1759
+
1760
+ ### Changed
1761
+
1762
+ - Updated Anthropic model Sonnet 4 and 4.5 context window from 1M to 200K tokens to reflect actual limits
1763
+ - Updated Anthropic model Opus 4.6 context window to 200K tokens across providers
1764
+ - Changed default `reasoningSummary` for OpenAI code provider from `undefined` to `auto`
1765
+ - Updated Qwen model pricing and context window specifications across multiple variants
1766
+ - Modified Google Gemini CLI system instruction to use compact format
1767
+ - Changed tool parameter handling for Anthropic models on Google Cloud Code Assist to use legacy `parameters` field for API translation
1768
+
1769
+ ### Removed
1770
+
1771
+ - Removed `glm-4.7-free` model from OpenCode provider
1772
+ - Removed `qwen3-coder` model from OpenCode provider
1773
+ - Removed `ai21/jamba-mini-1.7` model from OpenRouter
1774
+ - Removed `stepfun-ai/step3` model from OpenRouter
1775
+ - Removed duplicate test suite for Google Antigravity Provider with `gemini-3-pro-high`
1776
+
1777
+ ### Fixed
1778
+
1779
+ - Fixed Amazon Bedrock HTTP/1.1 handler import to use direct import instead of dynamic import
1780
+ - Fixed Qwen model context window and pricing inconsistencies across OpenRouter
1781
+ - Fixed cache read pricing for multiple Qwen models
1782
+ - Fixed OpenAI code provider reasoning effort clamping for `gpt-5.3-openai-code` model
1783
+
1784
+ ## [11.7.1] - 2026-02-07
1785
+
1786
+ ### Added
1787
+
1788
+ - Added Anthropic model Opus 4.6 Thinking model for Antigravity provider
1789
+ - Added Gemini 2.5 Flash, Gemini 2.5 Flash Thinking, and Gemini 2.5 Pro models for Antigravity provider
1790
+ - Added Pony Alpha model via OpenRouter
1791
+
1792
+ ### Changed
1793
+
1794
+ - Updated Antigravity models to use free tier pricing (0 cost) across all models
1795
+ - Changed Antigravity model fetching to dynamically load from API when credentials are available, with hardcoded fallback models
1796
+ - Updated Anthropic model Opus 4.6 context window from 200,000 to 1,000,000 tokens across Bedrock regions
1797
+ - Updated Anthropic model Opus 4.6 cache pricing from 1.5/18.75 to 0.5/6.25 for EU and US regions
1798
+ - Updated Antigravity model pricing to free tier (0 cost) for Anthropic model Opus 4.5 Thinking, Anthropic model Sonnet 4.5 Thinking, Gemini 3 Flash, Gemini 3 Pro variants, and GPT-OSS 120B Medium
1799
+ - Updated GPT-OSS 120B Medium reasoning capability from false to true
1800
+ - Updated Gemini 3 Flash max tokens from 65,535 to 65,536
1801
+ - Updated Anthropic model Opus 4.5 Thinking display name formatting to include parentheses
1802
+ - Updated various model pricing and context window parameters across OpenRouter and other providers
1803
+ - Removed Anthropic model Opus 4.6 20260205 model from Anthropic provider
1804
+
1805
+ ### Fixed
1806
+
1807
+ - Fixed Anthropic model Opus 4.6 model ID format by removing version suffix (:0) in Bedrock configurations
1808
+ - Fixed Llama 3.1 70B Instruct pricing and context window parameters
1809
+ - Fixed Mistral model pricing and cache read costs
1810
+ - Fixed DeepSeek and other model pricing inconsistencies
1811
+ - Fixed Qwen model pricing and token limits
1812
+ - Fixed GLM model pricing and context window specifications
1813
+
1814
+ ## [11.6.0] - 2026-02-07
1815
+
1816
+ ### Added
1817
+
1818
+ - Added Bedrock cache retention support with `PI_CACHE_RETENTION` env var and per-request `cacheRetention` option
1819
+ - Added adaptive thinking support for Bedrock Opus 4.6+ models
1820
+ - Added `AWS_BEDROCK_SKIP_AUTH` env var to support unauthenticated Bedrock proxies
1821
+ - Added `AWS_BEDROCK_FORCE_HTTP1` env var to force HTTP/1.1 for custom Bedrock endpoints
1822
+ - Re-exported `Static`, `TSchema`, and `Type` from `@sinclair/typebox`
1823
+
1824
+ ### Fixed
1825
+
1826
+ - Fixed OpenAI Responses storage to be disabled by default (`store: false`)
1827
+ - Fixed reasoning effort clamping for gpt-5.3 OpenAI code models (minimal -> low)
1828
+ - Fixed Bedrock `supportsPromptCaching` to also check model cost fields
1829
+
1830
+ ## [11.5.1] - 2026-02-07
1831
+
1832
+ ### Fixed
1833
+
1834
+ - Fixed schema normalization to handle array-valued `type` fields by converting them to a single type with nullable flag for Google provider compatibility
1835
+
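The schema normalization fix above converts array-valued `type` fields into a single type plus a nullable flag. A minimal sketch of that transformation follows. The name `normalizeArrayType` is hypothetical, and keeping the first non-null type when several are listed is an assumption, since the changelog does not say how that case is resolved.

```typescript
type JsonSchema = { [key: string]: unknown };

function isRecord(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: rewrites `type: ["string", "null"]` as
// `type: "string", nullable: true`, recursing into `properties` and `items`.
function normalizeArrayType(schema: JsonSchema): JsonSchema {
  const result: JsonSchema = { ...schema };
  if (Array.isArray(schema.type)) {
    const types = schema.type.filter((t): t is string => typeof t === "string");
    const nonNull = types.filter((t) => t !== "null");
    // Assumption: the first non-null type wins when several are listed.
    result.type = nonNull[0] ?? "null";
    if (types.some((t) => t === "null")) result.nullable = true;
  }
  if (isRecord(schema.properties)) {
    const source = schema.properties;
    const props: JsonSchema = {};
    for (const key of Object.keys(source)) {
      const value = source[key];
      props[key] = isRecord(value) ? normalizeArrayType(value) : value;
    }
    result.properties = props;
  }
  if (isRecord(schema.items)) result.items = normalizeArrayType(schema.items);
  return result;
}
```

The function returns a new object at every level it rewrites, so the caller's original schema is left untouched.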
1836
+ ## [11.3.0] - 2026-02-06
1837
+
1838
+ ### Added
1839
+
1840
+ - Added `cacheRetention` option to control prompt cache retention preference ('none', 'short', 'long') across providers
1841
+ - Added `maxRetryDelayMs` option to cap server-requested retry delays and fail fast when delays exceed the limit
1842
+ - Added `effort` option for Anthropic Opus 4.6+ models to control adaptive thinking effort levels ('low', 'medium', 'high', 'max')
1843
+ - Added support for Anthropic Opus 4.6+ adaptive thinking mode that lets the model decide when and how much to think
1844
+ - Added `PI_AI_ANTIGRAVITY_VERSION` environment variable to customize Antigravity sandbox endpoint version
1845
+ - Exported `convertAnthropicMessages` function for converting message formats to Anthropic API
1846
+ - Automatic fallback for Anthropic assistant-prefill requests: appends synthetic user "Continue." message when conversation ends with assistant turn to maintain API compatibility
1847
+
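The `maxRetryDelayMs` entry above describes failing fast when a server-requested retry delay exceeds a configured limit. The sketch below illustrates that policy. The function name `resolveRetryDelayMs`, its parameter order, and the error message are hypothetical. Only the option name `maxRetryDelayMs` comes from the changelog.

```typescript
// Hypothetical helper: picks the delay to wait before the next retry.
// Throws instead of waiting when the server asks for more than the cap.
function resolveRetryDelayMs(
  serverDelayMs: number | undefined,
  fallbackDelayMs: number,
  maxRetryDelayMs?: number,
): number {
  if (
    serverDelayMs !== undefined &&
    maxRetryDelayMs !== undefined &&
    serverDelayMs > maxRetryDelayMs
  ) {
    throw new Error(
      `Server requested a retry delay of ${serverDelayMs}ms, above the ${maxRetryDelayMs}ms limit`,
    );
  }
  // Use the server hint when present, otherwise the caller's own backoff.
  return serverDelayMs ?? fallbackDelayMs;
}
```

Throwing rather than clamping surfaces long rate-limit windows to the caller immediately, which is one reading of "fail fast" in the entry above.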
1848
+ ### Changed
1849
+
1850
+ - Changed `supportsXhigh()` to include GPT-5.1 OpenAI code Max and broaden Anthropic support to all Anthropic Messages API models with budget-based thinking capability
1851
+ - Changed Anthropic thinking mode to use adaptive thinking for Opus 4.6+ models instead of budget-based thinking
1852
+ - Changed `supportsXhigh()` to support GPT-5.2/5.3 and Anthropic Opus 4.6+ models with adaptive thinking
1853
+ - Changed prompt caching to respect `cacheRetention` option and support TTL configuration for Anthropic
1854
+ - Changed OpenAI tool definitions to conditionally include `strict` field only when provider supports it
1855
+ - Changed Qwen model support to use `enable_thinking` boolean parameter instead of OpenAI-style reasoning_effort
1856
+
1857
+ ### Fixed
1858
+
1859
+ - Fixed indentation and formatting in `convertAnthropicMessages` function
1860
+ - Fixed handling of conversations ending with assistant messages on Anthropic-routed models that reject assistant prefill requests
1861
+
1862
+ ## [11.2.3] - 2026-02-05
1863
+
1864
+ ### Added
1865
+
1866
+ - Added Anthropic model Opus 4.6 model support across multiple providers (Anthropic, Amazon Bedrock, GitHub Copilot, OpenRouter, OpenCode, Vercel AI Gateway)
1867
+ - Added GPT-5.3 OpenAI code model support for OpenAI
1868
+ - Added `readSseJson` utility import for improved SSE stream handling in Google Gemini CLI provider
1869
+
1870
+ ### Changed
1871
+
1872
+ - Updated Google Gemini CLI provider to use `readSseJson` utility for cleaner SSE stream parsing
1873
+ - Updated pricing for Llama 3.1 405B model on Vercel AI Gateway (cache read rate adjusted)
1874
+ - Updated Llama 3.1 405B context window and max tokens on Vercel AI Gateway (256000 for both)
1875
+
1876
+ ### Removed
1877
+
1878
+ - Removed Kimi K2, Kimi K2 Turbo Preview, and Kimi K2.5 models
1879
+ - Removed Deep Cogito Cogito V2 Preview models from OpenRouter
1880
+
1881
+ ## [11.0.0] - 2026-02-05
1882
+
1883
+ ### Changed
1884
+
1885
+ - Replaced direct `Bun.env` access with `getEnv()` utility from `@gajae-code/utils` for consistent environment variable handling across all providers
1886
+ - Updated environment variable names from `GJC_*` prefix to `PI_*` prefix for consistency (e.g., `GJC_CODING_AGENT_DIR` → `PI_CODING_AGENT_DIR`)
1887
+
1888
+ ### Removed
1889
+
1890
+ - Removed automatic environment variable migration from `PI_*` to `GJC_*` prefixes via `migrate-env.ts` module
1891
+
1892
+ ## [10.5.0] - 2026-02-04
1893
+
1894
+ ### Changed
1895
+
1896
+ - Updated @anthropic-ai/sdk to ^0.72.1
1897
+ - Updated @aws-sdk/client-bedrock-runtime to ^3.982.0
1898
+ - Updated @google/genai to ^1.39.0
1899
+ - Updated @smithy/node-http-handler to ^4.4.9
1900
+ - Updated openai to ^6.17.0
1901
+ - Updated @types/node to ^25.2.0
1902
+
1903
+ ### Removed
1904
+
1905
+ - Removed proxy-agent dependency
1906
+ - Removed undici dependency
1907
+
1908
+ ## [9.4.0] - 2026-01-31
1909
+
1910
+ ### Added
1911
+
1912
+ - Added `getEnv()` function to retrieve environment variables from Bun.env, cwd/.env, or ~/.env
1913
+ - Added support for reading .env files from home directory and current working directory
1914
+ - Added support for `exa` and `perplexity` as known providers in `getEnvApiKey()`
1915
+
1916
+ ### Changed
1917
+
1918
+ - Changed `getEnvApiKey()` to check Bun.env, cwd/.env, and ~/.env files in order of precedence
1919
+ - Refactored provider API key resolution to use a declarative service provider map
1920
+
1921
+ ## [9.2.2] - 2026-01-31
1922
+
1923
+ ### Added
1924
+
1925
+ - Added OpenCode Zen provider with API key authentication for accessing multiple AI models
1926
+ - Added 4 new free models via OpenCode: glm-4.7-free, kimi-k2.5-free, minimax-m2.1-free, trinity-large-preview-free
1927
+ - Added glm-4.7-flash model via Zai provider
1928
+ - Added Kimi Code provider with OpenAI and Anthropic API format support
1929
+ - Added prompt cache retention support with PI_CACHE_RETENTION env var
1930
+ - Added overflow patterns for Bedrock, MiniMax, Kimi; reclassified 429 as rate limiting
1931
+ - Added profile endpoint integration to resolve user emails with 24-hour caching
1932
+ - Added automatic token refresh for expired Kimi OAuth credentials
1933
+ - Added Kimi Code OAuth handler with device authorization flow
1934
+ - Added Kimi Code usage provider with quota caching
1935
+ - Added 4 new Kimi Code models (kimi-for-coding, kimi-k2, kimi-k2-turbo-preview, kimi-k2.5)
1936
+ - Added Kimi Code provider integration with OAuth and token management
1937
+ - Added tool-choice utility for mapping unified ToolChoice to provider-specific formats
1938
+ - Added ToolChoice type for controlling tool selection (auto, none, any, required, function)
1939
+
1940
+ ### Changed
1941
+
1942
+ - Updated Kimi K2.5 cache read pricing from 0.1 to 0.08
1943
+ - Updated MiniMax M2 pricing: input 0.6→0.6, output 3→3, cache read 0.1→0.09999999999999999
1944
+ - Updated OpenRouter DeepSeek V3.1 pricing and max tokens: input 0.6→0.5, output 3→2.8, maxTokens 262144→4096
1945
+ - Updated OpenRouter DeepSeek R1 pricing and max tokens: input 0.06→0.049999999999999996, output 0.24→0.19999999999999998, maxTokens 262144→4096
1946
+ - Updated Anthropic model 3.5 Sonnet max tokens from 256000 to 65536 on OpenRouter
1947
+ - Updated Vercel AI Gateway Anthropic model 3.5 Sonnet cache read pricing from 0.125 to 0.13
1948
+ - Updated Vercel AI Gateway Anthropic model 3.5 Sonnet New cache read pricing from 0.125 to 0.13
1949
+ - Updated Vercel AI Gateway GPT-5.2 cache read pricing from 0.175 to 0.18 and display name to 'GPT 5.2'
1950
+ - Updated Zai GLM-4.6 cache read pricing from 0.024999999999999998 to 0.03
1951
+ - Updated Zai Qwen QwQ max tokens from 66000 to 16384
1952
+ - Added delta event batching and throttling (50ms, 20 updates/sec max) to AssistantMessageEventStream
1953
+ - Updated MiniMax-M2 pricing: input 1.2→0.6, output 1.2→3, cacheRead 0.6→0.1
1954
+
1955
+ ### Removed
1956
+
1957
+ - Removed OpenRouter google/gemini-2.0-flash-exp:free model
1958
+ - Removed Vercel AI Gateway stealth/sonoma-dusk-alpha and stealth/sonoma-sky-alpha models
1959
+
1960
+ ### Fixed
1961
+
1962
+ - Fixed rate limit issues with Kimi models by always sending max_tokens
1963
+ - Added handling for sensitive stop reason from Anthropic API safety filters
1964
+ - Added optional chaining for safer JSON schema property access in Anthropic provider
1965
+
1966
+ ## [8.6.0] - 2026-01-27
1967
+
1968
+ ### Changed
1969
+
1970
+ - Replaced JSON5 dependency with Bun.JSON5 parsing
1971
+
1972
+ ### Fixed
1973
+
1974
+ - Filtered empty user text blocks for OpenAI-compatible completions and normalized Kimi reasoning_content for OpenRouter tool-call messages
1975
+
1976
+ ## [8.4.0] - 2026-01-25
1977
+
1978
+ ### Added
1979
+
1980
+ - Added Azure OpenAI Responses provider with deployment mapping and resource-based base URL support
1981
+
1982
+ ### Changed
1983
+
1984
+ - Added OpenRouter routing preferences for OpenAI-compatible completions
1985
+
1986
+ ### Fixed
1987
+
1988
+ - Defaulted Google tool call arguments to empty objects when providers omit args
1989
+ - Guarded Responses/OpenAI code streaming deltas against missing content parts and handled arguments.done events
1990
+
1991
+ ## [8.2.1] - 2026-01-24
1992
+
1993
+ ### Fixed
1994
+
1995
+ - Fixed handling of streaming function call arguments in OpenAI responses to properly parse arguments when sent via `response.function_call_arguments.done` events
1996
+
1997
+ ## [8.2.0] - 2026-01-24
1998
+
1999
+ ### Changed
2000
+
2001
+ - Migrated node module imports from named to namespace imports across all packages for consistency with project guidelines
2002
+
2003
+ ## [8.0.0] - 2026-01-23
2004
+
2005
+ ### Fixed
2006
+
2007
+ - Fixed OpenAI Responses API 400 error "function_call without required reasoning item" when switching between models (same provider, different model). The fix omits the `id` field for function_calls from different models to avoid triggering OpenAI's reasoning/function_call pairing validation
2008
+ - Fixed 400 errors when reading multiple images via GitHub Copilot's Anthropic models. These models require tool_use -> tool_result adjacency with no user messages interleaved. Images from consecutive tool results are now batched into a single user message
2009
+
2010
+ ## [7.0.0] - 2026-01-21
2011
+
2012
+ ### Added
2013
+
2014
+ - Added usage tracking system with normalized schema for provider quota/limit endpoints
2015
+ - Added Anthropic model usage provider for 5-hour and 7-day quota windows
2016
+ - Added GitHub Copilot usage provider for chat, completions, and premium requests
2017
+ - Added Google Antigravity usage provider for model quota tracking
2018
+ - Added Google Gemini CLI usage provider for tier-based quota monitoring
2019
+ - Added OpenAI code provider usage provider for primary and secondary rate limit windows
2020
+ - Added ZAI usage provider for token and request quota tracking
2021
+
2022
+ ### Changed
2023
+
2024
+ - Updated Anthropic model usage provider to extract account identifiers from response headers
2025
+ - Updated GitHub Copilot usage provider to include account identifiers in usage reports
2026
+ - Updated Google Gemini CLI usage provider to handle missing reset time gracefully
2027
+
2028
+ ### Fixed
2029
+
2030
+ - Fixed GitHub Copilot usage provider to simplify token handling and improve reliability
2031
+ - Fixed GitHub Copilot usage provider to properly resolve account identifiers for OAuth credentials
2032
+ - Fixed API validation errors when sending empty user messages (resume with `.`) across all providers:
2033
+   - Google Cloud Code Assist (google-shared.ts)
2034
+   - OpenAI Responses API (openai-responses.ts)
2035
+   - OpenAI code provider Responses API (openai-code-responses.ts)
2036
+   - Cursor (cursor.ts)
2037
+   - Amazon Bedrock (amazon-bedrock.ts)
2038
+ - Clamped OpenAI code provider reasoning effort "minimal" to "low" for gpt-5.2 models to avoid API errors
2039
+ - Fixed GitHub Copilot usage fallback to internal quota endpoints when billing usage is unavailable
2040
+ - Fixed GitHub Copilot usage metadata to include account identifiers for report dedupe
2041
+ - Fixed Anthropic usage metadata extraction to include account identifiers when provided by the usage endpoint
2042
+ - Fixed Gemini CLI usage windows to consistently label quota windows for display suppression
2043
+
2044
+ ## [6.9.69] - 2026-01-21
2045
+
2046
+ ### Added
2047
+
2048
+ - Added duration and time-to-first-token (ttft) metrics to all AI provider responses
2049
+ - Added performance tracking for streaming responses across all providers
2050
+
2051
+ ## [6.9.0] - 2026-01-21
2052
+
2053
+ ### Removed
2054
+
2055
+ - Removed openai-code provider exports from main package index
2056
+ - Removed openai-code prompt utilities and moved them inline
2057
+ - Removed vitest configuration file
2058
+
2059
+ ## [6.8.4] - 2026-01-21
2060
+
2061
+ ### Changed
2062
+
2063
+ - Updated prompt caching strategy to follow Anthropic's recommended hierarchy
2064
+ - Fixed token usage tracking to properly handle cumulative output tokens from message_delta events
2065
+ - Improved message validation to filter out empty or invalid content blocks
2066
+ - Increased OAuth callback timeout from 120 seconds to 120,000 milliseconds
2067
+
2068
+ ## [6.8.3] - 2026-01-21
2069
+
2070
+ ### Added
2071
+
2072
+ - Added `headers` option to all providers for custom request headers
2073
+ - Added `onPayload` hook to observe provider request payloads before sending
2074
+ - Added `strictResponsesPairing` option for Azure OpenAI Responses API compatibility
2075
+ - Added `originator` option to `loginOpenAIOpenAI code` for custom OAuth flow identification
2076
+ - Added per-request `headers` and `onPayload` hooks to `StreamOptions`
2077
+ - Added `originator` option to `loginOpenAIOpenAI code`
2078
+
2079
+ ### Fixed
2080
+
2081
+ - Fixed tool call ID normalization for OpenAI Responses API cross-provider handoffs
2082
+ - Skipped errored or aborted assistant messages during cross-provider transforms
2083
+ - Detected AWS ECS/IRSA credentials for Bedrock authentication checks
2084
+ - Detected AWS ECS/IRSA credentials for Bedrock authentication checks
2085
+ - Normalized Responses API tool call IDs during handoffs and refreshed handoff tests
2086
+ - Enforced strict tool call/result pairing for Azure OpenAI Responses API
2087
+ - Skipped errored or aborted assistant messages during cross-provider transforms
2088
+
2089
+ ### Security
2090
+
2091
+ - Enhanced AWS credential detection to support ECS task roles and IRSA web identity tokens
2092
+
2093
+ ## [6.8.2] - 2026-01-21
2094
+
2095
+ ### Fixed
2096
+
2097
+ - Improved error handling for aborted requests in Google Gemini CLI provider
2098
+ - Enhanced OAuth callback flow to handle manual input errors gracefully
2099
+ - Fixed login cancellation handling in GitHub Copilot OAuth flow
2100
+ - Removed fallback manual input from OpenAI code provider OAuth flow
2101
+
2102
+ ### Security
2103
+
2104
+ - Hardened database file permissions to prevent credential leakage
2105
+ - Set secure directory permissions (0o700) for credential storage
2106
+
2107
+ ## [6.8.0] - 2026-01-20
2108
+
2109
+ ### Added
2110
+
2111
+ - Added `logout` command to CLI for OAuth provider logout
2112
+ - Added `status` command to show logged-in providers and token expiry
2113
+ - Added persistent credential storage using SQLite database
2114
+ - Added OAuth callback server with automatic port fallback
2115
+ - Added HTML callback page with success/error states
2116
+ - Added support for Cursor OAuth provider
2117
+
2118
+ ### Changed
2119
+
2120
+ - Updated Promise.withResolvers usage for better compatibility
2121
+ - Replaced custom sleep implementations with Bun.sleep and abortableSleep
2122
+ - Simplified SSE stream parsing using readLines utility
2123
+ - Updated test framework from vitest to bun:test
2124
+ - Replaced temp directory creation with TempDir API
2125
+ - Changed credential storage from auth.json to ~/.gjc/agent/agent.db
2126
+ - Changed CLI command examples from npx to bunx
2127
+ - Refactored OAuth flows to use common callback server base class
2128
+ - Updated OAuth provider interfaces to use controller pattern
2129
+
2130
+ ### Fixed
2131
+
2132
+ - Fixed OAuth callback handling with improved error states
2133
+ - Fixed token refresh for all OAuth providers
2134
+
2135
+ ## [6.7.670] - 2026-01-19
2136
+
2137
+ ### Changed
2138
+
2139
+ - Updated Anthropic Code compatibility headers and version
2140
+ - Improved OAuth token handling with proper state generation
2141
+ - Enhanced cache control for tool and user message blocks
2142
+ - Simplified tool name prefixing for OAuth traffic
2143
+ - Updated PKCE verifier generation for better security
2144
+
2145
+ ## [5.7.67] - 2026-01-18
2146
+
2147
+ ### Fixed
2148
+
2149
+ - Added error handling for unknown OAuth providers
2150
+
2151
+ ## [5.6.77] - 2026-01-18
2152
+
2153
+ ### Fixed
2154
+
2155
+ - Prevented duplicate tool results for errored or aborted messages when results already exist
2156
+
2157
+ ## [5.6.7] - 2026-01-18
2158
+
2159
+ ### Added
2160
+
2161
+ - Added automatic retry logic for OpenAI code provider responses with configurable delay and max retries
2162
+ - Added tool call ID sanitization for Amazon Bedrock to ensure valid characters
2163
+ - Added tool argument validation that coerces JSON-encoded strings for expected non-string types
2164
+
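The tool argument validation entry above mentions coercing JSON-encoded strings when a non-string type is expected. A minimal sketch of that behavior follows. The names `coerceArgument`, `matchesType`, and `ExpectedType` are hypothetical, and the sketch handles a single value rather than a full schema.

```typescript
type ExpectedType = "string" | "number" | "integer" | "boolean" | "object" | "array";

// Checks a decoded value against the expected JSON Schema primitive type.
function matchesType(value: unknown, expected: ExpectedType): boolean {
  switch (expected) {
    case "string":
      return typeof value === "string";
    case "number":
      return typeof value === "number";
    case "integer":
      return typeof value === "number" && Number.isInteger(value);
    case "boolean":
      return typeof value === "boolean";
    case "array":
      return Array.isArray(value);
    case "object":
      return typeof value === "object" && value !== null && !Array.isArray(value);
  }
}

// Hypothetical helper: if a model sent "[1,2]" where an array was expected,
// decode it. Values that do not parse, or parse to the wrong type, pass through unchanged.
function coerceArgument(value: unknown, expected: ExpectedType): unknown {
  if (expected === "string" || typeof value !== "string") return value;
  let parsed: unknown;
  try {
    parsed = JSON.parse(value);
  } catch {
    return value;
  }
  return matchesType(parsed, expected) ? parsed : value;
}
```

Returning the original value on any mismatch keeps the coercion conservative, so a later validation step can still report the real type error.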
2165
+ ### Changed
2166
+
2167
+ - Updated environment variable prefix from `PI_*` to `GJC_*` for better consistency
2168
+ - Added automatic migration for legacy `PI_*` environment variables to `GJC_*` equivalents
2169
+ - Adjusted Bedrock Anthropic model thinking budgets to reserve output tokens when maxTokens is too low
2170
+
2171
+ ### Fixed
2172
+
2173
+ - Fixed orphaned tool call handling to ensure proper tool_use/tool_result pairing for all assistant messages
2174
+ - Fixed message transformation to insert synthetic tool results for errored/aborted assistant messages with tool calls
2175
+ - Fixed tool prefix handling in Anthropic model provider to use case-insensitive comparison
2176
+ - Fixed Gemini 3 model handling to treat unsigned tool calls as context-only with anti-mimicry context
2177
+ - Fixed message transformation to filter out empty error messages from conversation history
2178
+ - Fixed OpenAI completions provider compatibility detection to use provider metadata
2179
+ - Fixed OpenAI completions provider to avoid using developer role for opencode provider
2180
+ - Fixed orphaned tool call handling to skip synthetic results for errored assistant messages
2181
+
2182
+ ## [5.5.0] - 2026-01-18
2183
+
2184
+ ### Changed
2185
+
2186
+ - Updated User-Agent header from 'opencode' to 'pi' for OpenAI code provider requests
2187
+ - Simplified OpenAI code system prompt instructions
2188
+ - Removed bridge text override from OpenAI code system prompt builder
2189
+
2190
+ ## [5.3.0] - 2026-01-15
2191
+
2192
+ ### Changed
2193
+
2194
+ - Replaced detailed OpenAI code system instructions with simplified pi assistant instructions
2195
+ - Updated internal documentation references to use pi-internal:// protocol
2196
+
2197
+ ## [5.1.0] - 2026-01-14
2198
+
2199
+ ### Added
2200
+
2201
+ - Added Amazon Bedrock provider with `bedrock-converse-stream` API for Anthropic models via AWS
2202
+ - Added MiniMax provider with OpenAI-compatible API
2203
+ - Added EU cross-region inference model variants for Anthropic models on Bedrock
2204
+
2205
+ ### Fixed
2206
+
2207
+ - Fixed Gemini CLI provider retries with proper error handling, retry delays from headers, and empty stream retry logic
2208
+ - Fixed numbered list items showing "1." for all items when code blocks break list continuity (via `start` property)
2209
+
2210
+ ## [5.0.0] - 2026-01-12
2211
+
2212
+ ### Added
2213
+
2214
+ - Added support for `xhigh` thinking level in `thinkingBudgets` configuration
2215
+
2216
+ ### Changed
2217
+
2218
+ - Changed Anthropic thinking token budgets: minimal (1024→3072), low (2048→6144), medium (8192→12288), high (16384→24576)
2219
+ - Changed Google thinking token budgets: minimal (1024), low (2048→4096), medium (8192), high (16384), xhigh (24575)
2220
+ - Changed `supportsXhigh()` to return true for all Anthropic models
2221
+
2222
+ ## [4.6.0] - 2026-01-12
2223
+
2224
+ ### Fixed
2225
+
2226
+ - Fixed incorrect classification of thought signatures in Google Gemini responses—thought signatures are now correctly treated as metadata rather than thinking content indicators
2227
+ - Fixed thought signature handling in Google Gemini CLI and Vertex AI streaming to properly preserve signatures across text deltas
2228
+ - Fixed Google schema sanitization stripping property names that match schema keywords (e.g., "pattern", "format") from tool definitions
2229
+
2230
+ ## [4.4.9] - 2026-01-12
2231
+
2232
+ ### Fixed
2233
+
2234
+ - Fixed Google provider schema sanitization to strip additional unsupported JSON Schema fields (patternProperties, additionalProperties, min/max constraints, pattern, format)
2235
+
2236
+ ## [4.4.8] - 2026-01-12
2237
+
2238
+ ### Fixed
2239
+
2240
+ - Fixed Google provider schema sanitization to properly collapse `anyOf`/`oneOf` with const values into enum arrays
2241
+ - Fixed const-to-enum conversion to infer type from the const value when type is not specified
2242
+
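The two fixes above describe collapsing `anyOf`/`oneOf` unions of `const` values into an `enum` array, inferring the type from the values when none is given. The sketch below illustrates the idea. The names `collapseConstUnion` and `inferType` are hypothetical, and leaving mixed-type unions unchanged is an assumption of this sketch.

```typescript
type JsonSchema = { [key: string]: unknown };

// Infers a JSON Schema primitive type from a const value.
function inferType(value: unknown): string | undefined {
  if (typeof value === "string") return "string";
  if (typeof value === "boolean") return "boolean";
  if (typeof value === "number") return Number.isInteger(value) ? "integer" : "number";
  return undefined;
}

// Hypothetical helper: turns { anyOf: [{ const: "a" }, { const: "b" }] }
// into { type: "string", enum: ["a", "b"] }. Returns the input unchanged
// when any variant is not a plain const or the variant types differ.
function collapseConstUnion(schema: JsonSchema): JsonSchema {
  const variants: unknown[] | undefined = Array.isArray(schema.anyOf)
    ? schema.anyOf
    : Array.isArray(schema.oneOf)
      ? schema.oneOf
      : undefined;
  if (!variants || variants.length === 0) return schema;

  const values: unknown[] = [];
  let type: string | undefined;
  for (const variant of variants) {
    if (typeof variant !== "object" || variant === null || !("const" in variant)) return schema;
    const entry = variant as JsonSchema;
    const variantType = typeof entry.type === "string" ? entry.type : inferType(entry.const);
    if (variantType === undefined || (type !== undefined && type !== variantType)) return schema;
    type = variantType;
    values.push(entry.const);
  }

  // Copy every key except the union keywords, then add the collapsed form.
  const rest: JsonSchema = {};
  for (const key of Object.keys(schema)) {
    if (key !== "anyOf" && key !== "oneOf") rest[key] = schema[key];
  }
  return { ...rest, type, enum: values };
}
```

For example, a schema with `description: "mode"` and two string consts comes back with the description preserved, `type: "string"`, and both values listed under `enum`.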
2243
+ ## [4.4.6] - 2026-01-11
2244
+
2245
+ ### Fixed
2246
+
2247
+ - Fixed tool parameter schema sanitization to only apply Google-specific transformations for Gemini models, preserving original schemas for other model types
2248
+
2249
+ ## [4.4.5] - 2026-01-11
2250
+
2251
+ ### Changed
2252
+
2253
+ - Exported `sanitizeSchemaForGoogle` utility function for external use
2254
+
2255
+ ### Fixed
2256
+
2257
+ - Fixed Google provider schema sanitization to strip additional unsupported JSON Schema fields ($schema, $ref, $defs, format, examples, and others)
2258
+ - Fixed Google provider to ignore `additionalProperties: false` which is unsupported by the API
2259
+
2260
+ ## [4.4.4] - 2026-01-11
2261
+
2262
+ ### Fixed
2263
+
2264
+ - Fixed Cursor todo updates to bridge update_todos tool calls to the local todo_write tool
2265
+
2266
+ ## [4.3.0] - 2026-01-11
2267
+
2268
+ ### Added
2269
+
2270
+ - Added debug log filtering and display script for Cursor JSONL logs with follow mode and coalescing support
2271
+ - Added protobuf definition extractor script to reconstruct .proto files from bundled JavaScript
2272
+ - Added conversation state caching to persist context across multiple Cursor API requests in the same session
2273
+ - Added shell streaming support for real-time stdout/stderr output during command execution
2274
+ - Added JSON5 parsing for MCP tool arguments with Python-style boolean and None value normalization
2275
+ - Added Cursor provider with support for Anthropic model, GPT, and Gemini models via Cursor's agent API
2276
+ - Added OAuth authentication flow for Cursor including login, token refresh, and expiry detection
2277
+ - Added `cursor-agent` API type with streaming support and tool execution handlers
2278
+ - Added Cursor model definitions including Anthropic model 4.5, GPT-5.x, Gemini 3, and Grok variants
2279
+ - Added model generation script to automatically fetch and update AI model definitions from models.dev and OpenRouter APIs
2280
+
2281
+ ### Changed
2282
+
2283
+ - Changed Cursor debug logging to use structured JSONL format with automatic MCP argument decoding
2284
+ - Changed MCP tool argument decoding to use protobuf Value schema for improved type handling
2285
+ - Changed tool advertisement to filter Cursor native tools (bash, read, write, delete, ls, grep, lsp) instead of only exposing `mcp_`-prefixed tools
2286
+
2287
+ ### Fixed
2288
+
2289
+ - Fixed Cursor conversation history serialization so subagents retain task context and can call complete
2290
+
2291
+ ## [4.2.1] - 2026-01-11
2292
+
2293
+ ### Changed
2294
+
2295
+ - Updated `reasoningSummary` option to accept only `"auto"`, `"concise"`, `"detailed"`, or `null` (removed `"off"` and `"on"` values)
2296
+ - Changed default `reasoningSummary` from `"auto"` to `"detailed"`
2297
+ - OpenAI code provider: switched to bundled system prompt matching opencode, changed originator to "opencode", simplified prompt handling
2298
+
2299
+ ### Fixed
2300
+
2301
+ - Fixed Cloud Code Assist tool schema conversion to avoid unsupported `const` fields
2302
+
2303
+ ## [4.0.0] - 2026-01-10
2304
+
2305
+ ### Added
2306
+
2307
+ - Added `betas` option in `AnthropicOptions` for passing custom Anthropic beta feature flags
2308
+ - OpenCode Zen provider support with 26 models (Anthropic model, GPT, Gemini, Grok, Kimi, GLM, Qwen, etc.). Set `OPENCODE_API_KEY` env var to use.
2309
+ - `thinkingBudgets` option in `SimpleStreamOptions` for customizing token budgets per thinking level on token-based providers
2310
+ - `sessionId` option in `StreamOptions` for providers that support session-based caching. The OpenAI code provider uses this to set `prompt_cache_key` and routing headers.
2311
+ - `supportsUsageInStreaming` compatibility flag for OpenAI-compatible providers that reject `stream_options: { include_usage: true }`. Defaults to `true`. Set to `false` in model config for providers like gatewayz.ai.
2312
+ - `GOOGLE_APPLICATION_CREDENTIALS` env var support for Vertex AI credential detection (standard for CI/production)
2313
+ - Exported OpenAI code provider utilities: `CacheMetadata`, `getOpenAI codeInstructions`, `getModelFamily`, `ModelFamily`, `buildOpenAI codePiBridge`, `buildOpenAI codeSystemPrompt`, `OpenAI codeSystemPrompt`
2314
+ - Headless OAuth support for all callback-server providers (Google Gemini CLI, Antigravity, OpenAI code provider): paste redirect URL when browser callback is unreachable
2315
+ - Cancellable GitHub Copilot device code polling via AbortSignal
2316
+ - Improved error messages for OpenRouter providers by including raw metadata from upstream errors
2317
+
2318
+ ### Changed
2319
+
2320
+ - Changed Anthropic provider to include Anthropic Code system instruction for all API key types, not just OAuth tokens (except Haiku models)
2321
+ - Changed Anthropic OAuth tool naming to use `proxy_` prefix instead of mapping to Anthropic Code tool names, avoiding potential name collisions
2322
+ - Changed Anthropic provider to include Anthropic Code headers for all requests, not just OAuth tokens
2323
+ - Anthropic provider now maps tool names to Anthropic Code's exact tool names (Read, Write, Edit, Bash, Grep, Glob) instead of using prefixed names
2324
+ - OpenAI Completions provider now disables strict mode on tools to allow optional parameters without null unions
2325
+
2326
+ ### Fixed
2327
+
2328
+ - Fixed Anthropic OAuth code parsing to accept full redirect URLs in addition to raw authorization codes
2329
+ - Fixed Anthropic token refresh to preserve existing refresh token when server doesn't return a new one
2330
+ - Fixed thinking mode being enabled when tool_choice forces a specific tool, which is unsupported
2331
+ - Fixed `max_tokens` being too low when a thinking budget is set; it now auto-adjusts to the model's `maxTokens`
2332
+ - Google Cloud Code Assist OAuth for paid subscriptions: properly handles long-running operations for project provisioning, supports `GOOGLE_CLOUD_PROJECT` / `GOOGLE_CLOUD_PROJECT_ID` env vars for paid tiers
2333
+ - Fixed `os.homedir()` being called at module load time; the home directory is now resolved lazily when needed
2334
+ - OpenAI Responses tool `strict` flag now uses a boolean for LM Studio compatibility
2335
+ - Gemini CLI abort handling: detect native `AbortError` in retry catch block, cancel SSE reader when abort signal fires
2336
+ - Fixed Antigravity provider 429 errors by aligning the request payload with CLIProxyAPI v6.6.89
2337
+ - Thinking block handling for cross-model conversations: thinking blocks are now converted to plain text when switching models
2338
+ - Corrected the OpenAI code provider context window from 400,000 to 272,000 tokens to match OpenAI code CLI defaults
2339
+ - OpenAI code SSE error events now surface message, code, and status
2340
+ - Context overflow detection for `context_length_exceeded` error codes
2341
+ - OpenAI code provider now always includes `reasoning.encrypted_content` even when custom `include` options are passed
2342
+ - OpenAI code requests now omit the `reasoning` field entirely when thinking is off
2343
+ - Crash when pasting text with trailing whitespace exceeding terminal width
2344
+
2345
+ ## [3.37.1] - 2026-01-10
2346
+
2347
+ ### Added
2348
+
2349
+ - Added automatic type coercion for tool arguments when LLMs return JSON-encoded strings instead of native types (numbers, booleans, arrays, objects)
2350
+
2351
+ ### Changed
2352
+
2353
+ - Changed tool argument validation to attempt JSON parsing and type coercion before rejecting mismatched types
2354
+ - Changed validation error messages to include both original and normalized arguments when coercion was attempted
2355
+
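The coercion described above can be sketched roughly as follows. The helper names `matches` and `coerceArgument` are hypothetical, and the real validator works against full JSON schemas instead of a single expected type name.

```typescript
type ExpectedType = "number" | "boolean" | "array" | "object" | "string";

// Returns true when the runtime value already has the expected type.
function matches(value: unknown, expected: ExpectedType): boolean {
  if (expected === "array") return Array.isArray(value);
  if (expected === "object") {
    return typeof value === "object" && value !== null && !Array.isArray(value);
  }
  return typeof value === expected;
}

// Hypothetical helper: if a value has the wrong type but is a string holding
// valid JSON of the expected type, return the parsed value; otherwise return
// the original value unchanged so that validation can report the mismatch.
function coerceArgument(value: unknown, expected: ExpectedType): unknown {
  if (matches(value, expected) || typeof value !== "string") return value;
  try {
    const parsed: unknown = JSON.parse(value);
    return matches(parsed, expected) ? parsed : value;
  } catch {
    return value; // not JSON at all; leave it for the validator to reject
  }
}
```

Returning the original value on failure is what lets an error message show both the original and the normalized arguments.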
2356
+ ## [3.37.0] - 2026-01-10
2357
+
2358
+ ### Changed
2359
+
2360
+ - Enabled type coercion in JSON schema validation to automatically convert compatible types
2361
+
2362
+ ## [3.35.0] - 2026-01-09
2363
+
2364
+ ### Added
2365
+
2366
+ - Enhanced error messages to include retry-after timing information from API rate limit headers
2367
+
2368
+ ## [0.42.0] - 2026-01-09
2369
+
2370
+ ### Added
2371
+
2372
+ - Added OpenCode Zen provider support with 26 models (Anthropic model, GPT, Gemini, Grok, Kimi, GLM, Qwen, etc.). Set `OPENCODE_API_KEY` env var to use.
2373
+
2374
+ ## [0.39.0] - 2026-01-08
2375
+
2376
+ ### Fixed
2377
+
2378
+ - Fixed Gemini CLI abort handling: detect native `AbortError` in retry catch block, cancel SSE reader when abort signal fires ([#568](https://github.com/badlogic/pi-mono/pull/568) by [@tmustier](https://github.com/tmustier))
2379
+ - Fixed Antigravity provider 429 errors by aligning request payload with CLIProxyAPI v6.6.89: inject Antigravity system instruction with `role: "user"`, set `requestType: "agent"`, and use `antigravity` userAgent. Added bridge prompt to override Antigravity behavior (identity, paths, web dev guidelines) with Pi defaults. ([#571](https://github.com/badlogic/pi-mono/pull/571) by [@ben-vargas](https://github.com/ben-vargas))
2380
+ - Fixed thinking block handling for cross-model conversations: thinking blocks are now converted to plain text (no `<thinking>` tags) when switching models. Previously, `<thinking>` tags caused models to mimic the pattern and output literal tags. Also fixed empty thinking blocks causing API errors. ([#561](https://github.com/badlogic/pi-mono/issues/561))
2381
+
2382
+ ## [0.38.0] - 2026-01-08
2383
+
2384
+ ### Added
2385
+
2386
+ - `thinkingBudgets` option in `SimpleStreamOptions` for customizing token budgets per thinking level on token-based providers ([#529](https://github.com/badlogic/pi-mono/pull/529) by [@melihmucuk](https://github.com/melihmucuk))
2387
+
2388
+ ### Breaking Changes
2389
+
2390
+ - Removed OpenAI code provider model aliases (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `openai-code-mini-latest`, `gpt-5-openai-code`, `gpt-5.1-openai-code`, `gpt-5.1-chat-latest`). Use canonical model IDs: `gpt-5.1`, `gpt-5.1-openai-code-max`, `gpt-5.1-openai-code-mini`, `gpt-5.2`, `gpt-5.2-openai-code`. ([#536](https://github.com/badlogic/pi-mono/pull/536) by [@ghoulr](https://github.com/ghoulr))
2391
+
2392
+ ### Fixed
2393
+
2394
+ - Fixed OpenAI code provider context window from 400,000 to 272,000 tokens to match OpenAI code CLI defaults and prevent 400 errors. ([#536](https://github.com/badlogic/pi-mono/pull/536) by [@ghoulr](https://github.com/ghoulr))
2395
+ - Fixed OpenAI code SSE error events to surface message, code, and status. ([#551](https://github.com/badlogic/pi-mono/pull/551) by [@tmustier](https://github.com/tmustier))
2396
+ - Fixed context overflow detection for `context_length_exceeded` error codes.
2397
+
2398
+ ## [0.37.6] - 2026-01-06
2399
+
2400
+ ### Added
2401
+
2402
+ - Exported OpenAI code provider utilities: `CacheMetadata`, `getOpenAI codeInstructions`, `getModelFamily`, `ModelFamily`, `buildOpenAI codePiBridge`, `buildOpenAI codeSystemPrompt`, `OpenAI codeSystemPrompt` ([#510](https://github.com/badlogic/pi-mono/pull/510) by [@mitsuhiko](https://github.com/mitsuhiko))
2403
+
2404
+ ## [0.37.3] - 2026-01-06
2405
+
2406
+ ### Added
2407
+
2408
+ - `sessionId` option in `StreamOptions` for providers that support session-based caching. The OpenAI code provider uses this to set `prompt_cache_key` and routing headers.
2409
+
2410
+ ## [0.37.2] - 2026-01-05
2411
+
2412
+ ### Fixed
2413
+
2414
+ - OpenAI code provider now always includes `reasoning.encrypted_content` even when custom `include` options are passed ([#484](https://github.com/badlogic/pi-mono/pull/484) by [@kim0](https://github.com/kim0))
2415
+
2416
+ ## [0.37.0] - 2026-01-05
2417
+
2418
+ ### Breaking Changes
2419
+
2420
+ - OpenAI code provider models no longer have per-thinking-level variants (e.g., `gpt-5.2-openai-code-high`). Use the base model ID and set thinking level separately. The OpenAI code provider clamps reasoning effort to what each model supports internally. (initial implementation by [@ben-vargas](https://github.com/ben-vargas) in [#472](https://github.com/badlogic/pi-mono/pull/472))
2421
+
2422
+ ### Added
2423
+
2424
+ - Headless OAuth support for all callback-server providers (Google Gemini CLI, Antigravity, OpenAI code provider): paste redirect URL when browser callback is unreachable ([#428](https://github.com/badlogic/pi-mono/pull/428) by [@ben-vargas](https://github.com/ben-vargas), [#468](https://github.com/badlogic/pi-mono/pull/468) by [@crcatala](https://github.com/crcatala))
2425
+ - Cancellable GitHub Copilot device code polling via AbortSignal
2426
+
2427
+ ### Fixed
2428
+
2429
+ - OpenAI code requests now omit the `reasoning` field entirely when thinking is off, letting the backend use its default instead of forcing a value. ([#472](https://github.com/badlogic/pi-mono/pull/472))
2430
+
2431
+ ## [0.36.0] - 2026-01-05
2432
+
2433
+ ### Added
2434
+
2435
+ - OpenAI code provider OAuth provider with Responses API streaming support: `openai-code-responses` streaming provider with SSE parsing, tool-call handling, usage/cost tracking, and PKCE OAuth flow ([#451](https://github.com/badlogic/pi-mono/pull/451) by [@kim0](https://github.com/kim0))
2436
+
2437
+ ### Fixed
2438
+
2439
+ - Vertex AI dummy value for `getEnvApiKey()`: Returns `"<authenticated>"` when Application Default Credentials are configured (`~/.config/gcloud/application_default_credentials.json` exists) and both `GOOGLE_CLOUD_PROJECT` (or `GCLOUD_PROJECT`) and `GOOGLE_CLOUD_LOCATION` are set. This allows `streamSimple()` to work with Vertex AI without explicit `apiKey` option. The ADC credentials file existence check is cached per-process to avoid repeated filesystem access.
2440
+
2441
+ ## [0.32.3] - 2026-01-03
2442
+
2443
+ ### Fixed
2444
+
2445
+ - Google Vertex AI models no longer appear in available models list without explicit authentication. Previously, `getEnvApiKey()` returned a dummy value for `google-vertex`, causing models to show up even when Google Cloud ADC was not configured.
2446
+
2447
+ ## [0.32.0] - 2026-01-03
2448
+
2449
+ ### Added
2450
+
2451
+ - Vertex AI provider with ADC (Application Default Credentials) support. Authenticate with `gcloud auth application-default login`, set `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`, and access Gemini models via Vertex AI. ([#300](https://github.com/badlogic/pi-mono/pull/300) by [@default-anton](https://github.com/default-anton))
2452
+
2453
+ ### Fixed
2454
+
2455
+ - **Gemini CLI rate limit handling**: Added automatic retry with server-provided delay for 429 errors. Parses delay from error messages like "Your quota will reset after 39s" and waits accordingly. Falls back to exponential backoff for other transient errors. ([#370](https://github.com/badlogic/pi-mono/issues/370))
2456
+
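As a rough illustration of the retry policy described above, the sketch below parses a server-provided delay and otherwise falls back to exponential backoff. The regular expression is based only on the one example message quoted in the entry, and the function names, base delay, and cap are assumptions.

```typescript
// Extracts a delay in milliseconds from messages such as
// "Your quota will reset after 39s". Returns undefined when no delay is found.
function parseRetryDelayMs(message: string): number | undefined {
  const match = /reset after (\d+(?:\.\d+)?)s/i.exec(message);
  return match ? Math.ceil(Number(match[1]) * 1000) : undefined;
}

// Picks the wait time for a given attempt (0-based): the server-provided delay
// when available, otherwise exponential backoff capped at maxMs.
// baseMs and maxMs are illustrative defaults, not the package's values.
function retryDelayMs(message: string, attempt: number, baseMs = 1000, maxMs = 30000): number {
  return parseRetryDelayMs(message) ?? Math.min(baseMs * 2 ** attempt, maxMs);
}
```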
2457
+ ## [0.31.0] - 2026-01-02
2458
+
2459
+ ### Breaking Changes
2460
+
2461
+ - **Agent API moved**: All agent functionality (`agentLoop`, `agentLoopContinue`, `AgentContext`, `AgentEvent`, `AgentTool`, `AgentToolResult`, etc.) has moved to `@mariozechner/pi-agent-core`. Import from that package instead of `@gajae-code/ai`.
2462
+
2463
+ ### Added
2464
+
2465
+ - **`GoogleThinkingLevel` type**: Exported type that mirrors Google's `ThinkingLevel` enum values (`"THINKING_LEVEL_UNSPECIFIED" | "MINIMAL" | "LOW" | "MEDIUM" | "HIGH"`). Allows configuring Gemini thinking levels without importing from `@google/genai`.
2466
+ - **`ANTHROPIC_OAUTH_TOKEN` env var**: Now checked before `ANTHROPIC_API_KEY` in `getEnvApiKey()`, allowing OAuth tokens to take precedence.
2467
+ - **`event-stream.js` export**: `AssistantMessageEventStream` utility now exported from package index.
2468
+
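The precedence introduced by the `ANTHROPIC_OAUTH_TOKEN` entry above amounts to a simple ordered lookup. The sketch below uses a hypothetical `anthropicKeyFromEnv` function that takes the environment as a parameter; it is not the signature of `getEnvApiKey()`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical lookup mirroring the documented precedence for Anthropic:
// the OAuth token wins over the API key when both are set.
// Empty strings are treated as unset (an assumption).
function anthropicKeyFromEnv(env: Env): string | undefined {
  return env["ANTHROPIC_OAUTH_TOKEN"] || env["ANTHROPIC_API_KEY"] || undefined;
}
```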
2469
+ ### Changed
2470
+
2471
+ - **OAuth uses Web Crypto API**: PKCE generation and OAuth flows now use Web Crypto API (`crypto.subtle`) instead of Node.js `crypto` module. This improves browser compatibility while still working in Node.js 20+.
2472
+ - **Deterministic model generation**: `generate-models.ts` now sorts providers and models alphabetically for consistent output across runs. ([#332](https://github.com/badlogic/pi-mono/pull/332) by [@mrexodia](https://github.com/mrexodia))
2473
+
2474
+ ### Fixed
2475
+
2476
+ - **OpenAI completions empty content blocks**: Empty text or thinking blocks in assistant messages are now filtered out before sending to the OpenAI completions API, preventing validation errors. ([#344](https://github.com/badlogic/pi-mono/pull/344) by [@default-anton](https://github.com/default-anton))
2477
+ - **Thinking token duplication**: Fixed thinking content duplication with chutes.ai provider. The provider was returning thinking content in both `reasoning_content` and `reasoning` fields, causing each chunk to be processed twice. Now only the first non-empty reasoning field is used.
2478
+ - **zAi provider API mapping**: Fixed zAi models to use `openai-completions` API with correct base URL (`https://api.z.ai/api/coding/paas/v4`) instead of incorrect Anthropic API mapping. ([#344](https://github.com/badlogic/pi-mono/pull/344), [#358](https://github.com/badlogic/pi-mono/pull/358) by [@default-anton](https://github.com/default-anton))
2479
+
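The thinking-token duplication fix above comes down to reading a single reasoning field per chunk. A minimal sketch, assuming a simplified `Delta` shape and a hypothetical `pickReasoning` helper; checking `reasoning_content` before `reasoning` is also an assumption, since the entry only says the first non-empty field is used.

```typescript
// Simplified stand-in for a streamed chunk delta.
interface Delta {
  reasoning_content?: string | null;
  reasoning?: string | null;
}

// Returns the first non-empty reasoning field so that a provider sending the
// same text in both fields contributes it only once.
function pickReasoning(delta: Delta): string | undefined {
  for (const field of ["reasoning_content", "reasoning"] as const) {
    const value = delta[field];
    if (typeof value === "string" && value.length > 0) return value;
  }
  return undefined;
}
```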
2480
+ ## [0.28.0] - 2025-12-25
2481
+
2482
+ ### Breaking Changes
2483
+
2484
+ - **OAuth storage removed** ([#296](https://github.com/badlogic/pi-mono/issues/296)): All storage functions (`loadOAuthCredentials`, `saveOAuthCredentials`, `setOAuthStorage`, etc.) removed. Callers are responsible for storing credentials.
2485
+ - **OAuth login functions**: `loginAnthropic`, `loginGitHubCopilot`, `loginGeminiCli`, `loginAntigravity` now return `OAuthCredentials` instead of saving to disk.
2486
+ - **refreshOAuthToken**: Now takes `(provider, credentials)` and returns new `OAuthCredentials` instead of saving.
2487
+ - **getOAuthApiKey**: Now takes `(provider, credentials)` and returns `{ newCredentials, apiKey }` or null.
2488
+ - **OAuthCredentials type**: No longer includes `type: "oauth"` discriminator. Callers add discriminator when storing.
2489
+ - **setApiKey, resolveApiKey**: Removed. Callers must manage their own API key storage/resolution.
2490
+ - **getApiKey**: Renamed to `getEnvApiKey`. Only checks environment variables for known providers.
2491
+
2492
+ ## [0.27.7] - 2025-12-24
2493
+
2494
+ ### Fixed
2495
+
2496
+ - **Thinking tag leakage**: Fixed Anthropic model mimicking literal `</thinking>` tags in responses. Unsigned thinking blocks (from aborted streams) are now converted to plain text without `<thinking>` tags. The TUI still displays them as thinking blocks. ([#302](https://github.com/badlogic/pi-mono/pull/302) by [@nicobailon](https://github.com/nicobailon))
2497
+
2498
+ ## [0.25.1] - 2025-12-21
2499
+
2500
+ ### Added
2501
+
2502
+ - **xhigh thinking level support**: Added `supportsXhigh()` function to check if a model supports xhigh reasoning level. Also clamps xhigh to high for OpenAI models that don't support it. ([#236](https://github.com/badlogic/pi-mono/pull/236) by [@theBucky](https://github.com/theBucky))
2503
+
2504
+ ### Fixed
2505
+
2506
+ - **Gemini multimodal tool results**: Fixed images in tool results causing flaky/broken responses with Gemini models. For Gemini 3, images are now nested inside `functionResponse.parts` per the [docs](https://ai.google.dev/gemini-api/docs/function-calling#multimodal). For older models (which don't support multimodal function responses), images are sent in a separate user message.
2507
+
2508
+ - **Queued message steering**: When `getQueuedMessages` is provided, the agent loop now checks for queued user messages after each tool call and skips remaining tool calls in the current assistant message when a queued message arrives (emitting error tool results).
2509
+
2510
+ - **Double API version path in Google provider URL**: Fixed Gemini API calls returning 404 after baseUrl support was added. The SDK was appending its default apiVersion to baseUrl which already included the version path. ([#251](https://github.com/badlogic/pi-mono/pull/251) by [@shellfyred](https://github.com/shellfyred))
2511
+
2512
+ - **Anthropic SDK retries re-enabled**: Re-enabled SDK-level retries (default 2) for transient HTTP failures. ([#252](https://github.com/badlogic/pi-mono/issues/252))
2513
+
2514
+ ## [0.23.5] - 2025-12-19
2515
+
2516
+ ### Added
2517
+
2518
+ - **Gemini 3 Flash thinking support**: Extended thinking level support for Gemini 3 Flash models (MINIMAL, LOW, MEDIUM, HIGH) to match Pro models' capabilities. ([#212](https://github.com/badlogic/pi-mono/pull/212) by [@markusylisiurunen](https://github.com/markusylisiurunen))
2519
+
2520
+ - **GitHub Copilot thinking models**: Added thinking support for additional Copilot models (o3-mini, o1-mini, o1-preview). ([#234](https://github.com/badlogic/pi-mono/pull/234) by [@aadishv](https://github.com/aadishv))
2521
+
2522
+ ### Fixed
2523
+
2524
+ - **Gemini tool result format**: Fixed tool result format for Gemini 3 Flash Preview which strictly requires `{ output: value }` for success and `{ error: value }` for errors. Previous format using `{ result, isError }` was rejected by newer Gemini models. Also improved type safety by removing `as any` casts. ([#213](https://github.com/badlogic/pi-mono/issues/213), [#220](https://github.com/badlogic/pi-mono/pull/220))
2525
+
2526
+ - **Google baseUrl configuration**: Google provider now respects `baseUrl` configuration for custom endpoints or API proxies. ([#216](https://github.com/badlogic/pi-mono/issues/216), [#221](https://github.com/badlogic/pi-mono/pull/221) by [@theBucky](https://github.com/theBucky))
2527
+
2528
+ - **GitHub Copilot vision requests**: Added `Copilot-Vision-Request` header when sending images to GitHub Copilot models. ([#222](https://github.com/badlogic/pi-mono/issues/222))
2529
+
2530
+ - **GitHub Copilot X-Initiator header**: Fixed X-Initiator logic to check last message role instead of any message in history. This ensures proper billing when users send follow-up messages. ([#209](https://github.com/badlogic/pi-mono/issues/209))
2531
+
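The X-Initiator fix above can be sketched as a check on the final message only. The header values `"user"` and `"agent"`, the simplified `Message` type, and the treatment of an empty history are assumptions for illustration.

```typescript
// Simplified message shape; real messages carry content as well.
interface Message {
  role: "user" | "assistant" | "toolResult";
}

// Decides the initiator from the last message only, so an earlier user message
// in the history does not mark a tool-driven follow-up request as user-initiated.
function xInitiator(messages: Message[]): "user" | "agent" {
  const last = messages[messages.length - 1];
  // Assumption: an empty history counts as user-initiated.
  return last === undefined || last.role === "user" ? "user" : "agent";
}
```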
2532
+ ## [0.22.3] - 2025-12-16
2533
+
2534
+ ### Added
2535
+
2536
+ - **Image limits test suite**: Added comprehensive tests for provider-specific image limitations (max images, max size, max dimensions). Discovered actual limits: Anthropic (100 images, 5MB, 8000px), OpenAI (500 images, ≥25MB), Gemini (~2500 images, ≥40MB), Mistral (8 images, ~15MB), OpenRouter (~40 images context-limited, ~15MB). ([#120](https://github.com/badlogic/pi-mono/pull/120))
2537
+
2538
+ - **Tool result streaming**: Added `tool_execution_update` event and optional `onUpdate` callback to `AgentTool.execute()` for streaming tool output during execution. Tools can now emit partial results (e.g., bash stdout) that are forwarded to subscribers. ([#44](https://github.com/badlogic/pi-mono/issues/44))
2539
+
2540
+ - **X-Initiator header for GitHub Copilot**: Added X-Initiator header handling for GitHub Copilot provider to ensure correct call accounting (agent calls are not deducted from quota). Sets initiator based on last message role. ([#200](https://github.com/badlogic/pi-mono/pull/200) by [@kim0](https://github.com/kim0))
2541
+
2542
+ ### Changed
2543
+
2544
+ - **Normalized tool_execution_end result**: `tool_execution_end` event now always contains `AgentToolResult` (no longer `AgentToolResult | string`). Errors are wrapped in the standard result format.
2545
+
2546
+ ### Fixed
2547
+
2548
+ - **Reasoning disabled by default**: When `reasoning` option is not specified, thinking is now explicitly disabled for all providers. Previously, some providers like Gemini with "dynamic thinking" would use their default (thinking ON), causing unexpected token usage. This was the original intended behavior. ([#180](https://github.com/badlogic/pi-mono/pull/180) by [@markusylisiurunen](https://github.com/markusylisiurunen))
2549
+
2550
+ ## [0.22.2] - 2025-12-15
2551
+
2552
+ ### Added
2553
+
2554
+ - **Interleaved thinking for Anthropic**: Added `interleavedThinking` option to `AnthropicOptions`. When enabled, Anthropic model 4 models can think between tool calls and reason after receiving tool results. Enabled by default (no extra token cost, just unlocks the capability). Set `interleavedThinking: false` to disable.
2555
+
2556
+ ## [0.22.1] - 2025-12-15
2557
+
2558
+ _Dedicated to Peter's shoulder ([@steipete](https://twitter.com/steipete))_
2559
+
2560
+ ### Added
2561
+
2562
+ - **Interleaved thinking for Anthropic**: Enabled interleaved thinking in the Anthropic provider, allowing Anthropic models to output thinking blocks interspersed with text responses.
2563
+
2564
+ ## [0.22.0] - 2025-12-15
2565
+
2566
+ ### Added
2567
+
2568
+ - **GitHub Copilot provider**: Added `github-copilot` as a known provider with models sourced from models.dev. Includes Anthropic model, GPT, Gemini, Grok, and other models available through GitHub Copilot. ([#191](https://github.com/badlogic/pi-mono/pull/191) by [@cau1k](https://github.com/cau1k))
2569
+
2570
+ ### Fixed
2571
+
2572
+ - **GitHub Copilot gpt-5 models**: Fixed API selection for gpt-5 models to use `openai-responses` instead of `openai-completions` (gpt-5 models are not accessible via completions endpoint)
2573
+
2574
+ - **GitHub Copilot cross-model context handoff**: Fixed context handoff failing when switching between GitHub Copilot models using different APIs (e.g., gpt-5 to anthropic-model-sonnet-4). Tool call IDs from OpenAI Responses API were incompatible with other models. ([#198](https://github.com/badlogic/pi-mono/issues/198))
2575
+
2576
+ - **Gemini 3 Pro thinking levels**: Thinking level configuration now works correctly for Gemini 3 Pro models. Previously all levels mapped to -1 (minimal thinking). Now LOW/MEDIUM/HIGH properly control test-time computation. ([#176](https://github.com/badlogic/pi-mono/pull/176) by [@markusylisiurunen](https://github.com/markusylisiurunen))
2577
+
2578
+ ## [0.18.2] - 2025-12-11
2579
+
2580
+ ### Changed
2581
+
2582
+ - **Anthropic SDK retries disabled**: Set `maxRetries: 0` on Anthropic client to allow application-level retry handling. The SDK's built-in retries were interfering with coding-agent's retry logic. ([#157](https://github.com/badlogic/pi-mono/issues/157))
2583
+
2584
+ ## [0.18.1] - 2025-12-10
2585
+
2586
+ ### Added
2587
+
2588
+ - **Mistral provider**: Added support for Mistral AI models via the OpenAI-compatible API. Includes automatic handling of Mistral-specific requirements (tool call ID format). Set `MISTRAL_API_KEY` environment variable to use.
2589
+
2590
+ ### Fixed
2591
+
2592
+ - Fixed Mistral 400 errors after aborted assistant messages by skipping empty assistant messages (no content, no tool calls) ([#165](https://github.com/badlogic/pi-mono/issues/165))
2593
+
2594
+ - Removed synthetic assistant bridge message after tool results for Mistral (no longer required as of Dec 2025) ([#165](https://github.com/badlogic/pi-mono/issues/165))
2595
+
2596
+ - Fixed bug where `ANTHROPIC_API_KEY` environment variable was deleted globally after first OAuth token usage, causing subsequent prompts to fail ([#164](https://github.com/badlogic/pi-mono/pull/164))
2597
+
2598
+ ## [0.17.0] - 2025-12-09
2599
+
2600
+ ### Added
2601
+
2602
+ - **`agentLoopContinue` function**: Continue an agent loop from existing context without adding a new user message. Validates that the last message is `user` or `toolResult`. Useful for retry after context overflow or resuming from manually-added tool results.
2603
+
2604
+ ### Breaking Changes
2605
+
2606
+ - Removed provider-level tool argument validation. Validation now happens in `agentLoop` via `executeToolCalls`, allowing models to retry on validation errors. For manual tool execution, use `validateToolCall(tools, toolCall)` or `validateToolArguments(tool, toolCall)`.
2607
+
2608
+ ### Added
2609
+
2610
+ - Added `validateToolCall(tools, toolCall)` helper that finds the tool by name and validates arguments.
2611
+
2612
+ - **OpenAI compatibility overrides**: Added `compat` field to `Model` for `openai-completions` API, allowing explicit configuration of provider quirks (`supportsStore`, `supportsDeveloperRole`, `supportsReasoningEffort`, `maxTokensField`). Falls back to URL-based detection if not set. Useful for LiteLLM, custom proxies, and other non-standard endpoints. ([#133](https://github.com/badlogic/pi-mono/issues/133), thanks @fink-andreas for the initial idea and PR)
2613
+
2614
+ - **xhigh reasoning level**: Added `xhigh` to `ReasoningEffort` type for OpenAI openai-code-max models. For non-OpenAI providers (Anthropic, Google), `xhigh` is automatically mapped to `high`. ([#143](https://github.com/badlogic/pi-mono/issues/143))
2615
+
2616
+ ### Changed
2617
+
2618
+ - **Updated SDK versions**: OpenAI SDK 5.21.0 → 6.10.0, Anthropic SDK 0.61.0 → 0.71.2, Google GenAI SDK 1.30.0 → 1.31.0
2619
+
2620
+ ## [0.13.0] - 2025-12-06
2621
+
2622
+ ### Breaking Changes
2623
+
2624
+ - **Added `totalTokens` field to `Usage` type**: All code that constructs `Usage` objects must now include the `totalTokens` field. This field represents the total tokens processed by the LLM (input + output + cache). For OpenAI and Google, this uses native API values (`total_tokens`, `totalTokenCount`). For Anthropic, it's computed as `input + output + cacheRead + cacheWrite`.
2625
+
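The `totalTokens` rule above can be illustrated with a small sketch. The `Usage` shape here is simplified (the real type may carry additional fields), and `withTotalTokens` is a hypothetical helper.

```typescript
// Simplified usage record; field names follow the changelog entry.
interface Usage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
}

// Uses the provider's native total when one is reported (as described for
// OpenAI and Google); otherwise computes
// input + output + cacheRead + cacheWrite (as described for Anthropic).
function withTotalTokens(parts: Omit<Usage, "totalTokens">, nativeTotal?: number): Usage {
  const computed = parts.input + parts.output + parts.cacheRead + parts.cacheWrite;
  return { ...parts, totalTokens: nativeTotal ?? computed };
}
```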
2626
+ ## [0.12.10] - 2025-12-04
2627
+
2628
+ ### Added
2629
+
2630
+ - Added `gpt-5.1-openai-code-max` model support
2631
+
2632
+ ### Fixed
2633
+
2634
+ - **OpenAI Token Counting**: Fixed `usage.input` to exclude cached tokens for OpenAI providers. Previously, `input` included cached tokens, causing double-counting when calculating total context size via `input + cacheRead`. Now `input` represents non-cached input tokens across all providers, making `input + output + cacheRead + cacheWrite` the correct formula for total context size.
2635
+
2636
+ - **Fixed Anthropic model Opus 4.5 cache pricing** (was 3x too expensive)
2637
+ - Corrected cache_read: $1.50 → $0.50 per MTok
2638
+ - Corrected cache_write: $18.75 → $6.25 per MTok
2639
+ - Added manual override in `scripts/generate-models.ts` until upstream fix is merged
2640
+ - Submitted PR to models.dev: https://github.com/sst/models.dev/pull/439
2641
+
2642
+ ## [0.9.4] - 2025-11-26
2643
+
2644
+ Initial release with multi-provider LLM support.