pi-free 2.0.12 → 2.0.14

package/CHANGELOG.md CHANGED
@@ -1,608 +1,640 @@
1
- # Changelog
2
-
3
- All notable changes to this project will be documented in this file.
4
-
5
- The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
- and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
-
8
- ## [Unreleased]
9
-
10
- ## [2.0.12] - 2026-05-13
11
-
12
- ### Added
13
-
14
- - **Novita AI provider** — OpenAI-compatible API at `api.novita.ai/openai/v1` with 100+ open-source models. Non-standard but rich metadata: per-model pricing (`input_token_price_per_m`), context size, max output tokens, reasoning/vision features, and model descriptions. 3 free models, 99 paid.
15
-
16
- - **FastRouter provider** — OpenRouter-compatible API at `api.fastrouter.ai/api/v1` with 170+ models. Always discovered (no auth needed for model listing). Full pricing, context lengths, and feature metadata. 129 text models (6 free, 123 paid) after filtering image/video. Set `FASTROUTER_API_KEY` for chat completions.
17
-
18
- - **Dynamic model fetching for OpenCode and OpenRouter** — Pi's built-in providers now get their models fetched dynamically from the API (`opencode.ai/zen/v1/models` and `openrouter.ai/api/v1/models`), same as Mistral, Groq, Cerebras, and xAI. Overwrites Pi's defaults with the full model list. OpenCode uses name-based free detection (API returns no pricing); OpenRouter uses full cost-based detection.
19
-
20
- - **API key reading from `~/.pi/agent/auth.json`** — `getOpencodeApiKey()` and `getOpenrouterApiKey()` now fall back to Pi's auth.json when the env var isn't set, matching how Pi's built-in providers read their keys.
21
-
22
- ### Changed
23
-
24
- - **`_pricingKnown` guard in `isFreeModel`** — Providers can now signal whether pricing data is authoritative. When `_pricingKnown` is explicitly `false` (API returned no pricing), `isFreeModel` falls back to name-only detection (checks for "free" in the model name). This eliminates false positives where missing pricing data was treated as $0 cost. All affected providers (ZenMux, Together, CrofAI, dynamic-built-in, fetchOpenAICompatibleModels, deepinfra, sambanova, novita) now set this flag correctly.
25
-
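A minimal TypeScript sketch of the `_pricingKnown` guard described in this entry. The `ModelInfo` shape and the layout of `cost` are assumptions for illustration; only `isFreeModel` and `_pricingKnown` are names taken from this changelog, and the real implementation may differ.

```typescript
// Hypothetical model shape; the real pi-free type is not shown in the changelog.
interface ModelInfo {
  name: string;
  cost?: { input: number; output: number };
  // false means the provider API returned no pricing for this model.
  _pricingKnown?: boolean;
}

function isFreeModel(model: ModelInfo): boolean {
  const nameSaysFree = /free/i.test(model.name);

  // Pricing is explicitly non-authoritative: do not read a missing price as $0.
  if (model._pricingKnown === false) {
    return nameSaysFree;
  }

  // Pricing is trusted here, so a zero cost counts as free.
  const input = model.cost?.input ?? 0;
  const output = model.cost?.output ?? 0;
  return (input === 0 && output === 0) || nameSaysFree;
}
```

With this guard, an unpriced model such as `{ name: "DeepSeek Chat V3.1", _pricingKnown: false }` is no longer classified as free, while a model with "free" in its name still is.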
26
- - **All providers now use `isFreeModel` consistently** — Together switched from hardcoded `cost===0` check to `isFreeModel`. DeepInfra and SambaNova switched from manual free lists to `isFreeModel` with proper `_pricingKnown` metadata. NVIDIA, Codestral, and Ollama explicitly documented as free-tier providers (`freeModels = allModels`).
27
-
28
- - **Unified OpenRouter-based providers** — Kilo, OpenRouter, and Cline now share the same `fetchOpenRouterCompatibleModels` / OpenRouter API logic.
29
-
30
- ### Removed
31
-
32
- - **`DEFAULT_MIN_SIZE_B` (30B minimum model size filter)** — Removed from `model-fetcher.ts` and `cline-models.ts`. All models are now shown regardless of parameter count. NVIDIA still uses its own 70B threshold (`NVIDIA_MIN_SIZE_B`).
33
-
34
- ### Fixed
35
-
36
- - **ZenMux false free classifications** — Models without `pricings` data (DeepSeek Chat V3.1, Kimi K2 0711, Claude 3.7 Sonnet) were incorrectly classified as free because missing pricing defaulted to $0. Fixed to 3 genuinely free models (down from 6 false positives).
37
-
38
- - **Together AI, CrofAI, dynamic-built-in missing-pricing false positives** — Same `?? 0` pattern across multiple providers could mark unpriced models as free. All now set `_pricingKnown: false` when pricing is absent from the API response.
39
-
40
- ## [2.0.10] - 2026-05-08
41
-
42
- ### Fixed
43
-
44
- - **Config wipe on JSON parse failure** — `saveConfig` used `loadConfigFile()` which returns `{}` on any parse error, causing `{ ...{}, ...updates }` to write a partial config that permanently destroyed all API keys. Now reads the raw file directly and refuses to save if corrupt. `ensureConfigFile` also refuses to overwrite corrupt files.
45
-
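The failure mode and the fix can be sketched as a pure function. `mergeConfigSafely` is a hypothetical name, and taking the raw file text as a parameter is a simplification; the actual `saveConfig` reads and writes the config file itself, and treating an empty file as an empty config is an assumption.

```typescript
type Config = Record<string, unknown>;

// raw is the current file text, or null when the file does not exist yet.
function mergeConfigSafely(raw: string | null, updates: Config): string {
  let current: Config = {};

  // Assumption: an empty or missing file is treated as an empty config.
  if (raw !== null && raw.trim() !== "") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      // Falling back to {} at this point is what wiped the stored API keys.
      throw new Error("Config file is corrupt; refusing to overwrite it");
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("Config file is not a JSON object; refusing to overwrite it");
    }
    current = parsed as Config;
  }

  // Existing keys survive; updates win on conflict.
  return JSON.stringify({ ...current, ...updates }, null, 2);
}
```

The caller writes the returned string to disk only when no error is thrown, so a corrupt file is left untouched for manual repair.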
46
- - **Built-in provider keys removed from pi-free config** — `mistral_api_key`, `groq_api_key`, `cerebras_api_key`, `xai_api_key`, and `hf_token` are no longer in `~/.pi/free.json`. These are pi's own built-in providers; their keys come from environment variables only.
47
-
48
- ## [2.0.9] - 2026-05-08
49
-
50
- ### Added
51
-
52
- - **Together AI provider** — Fast inference on 200+ open-source models (Llama, DeepSeek, Qwen, etc.) through an OpenAI-compatible API. $1 trial credit on signup, no credit card required. Set `TOGETHER_AI_API_KEY`.
53
-
54
- - **Per-model metadata for Ollama Cloud** — Fetches `/api/show` details for every Ollama Cloud model to detect real capabilities: thinking/vision support, actual context windows (up to 1M tokens), and thinking level maps (`reasoning_effort`). Models now show parameter size and quantization in display names.
55
-
56
- - **Thinking level maps** — Four curated maps (`DEFAULT`, `GPT_OSS`, `QWEN3`, `NO_OFF`) for Ollama Cloud models that map Pi's thinking levels to Ollama's `reasoning_effort` values, based on per-model API testing.
57
-
58
- - **`/ollama-cloud-refresh` command** — Re-fetch Ollama Cloud models from the API and update the provider live, no restart needed.
59
-
60
- - **Persistent Ollama Cloud cache** — Models cached via `provider-cache.ts` for fast startup. Stale cache auto-refreshes on `session_start`. Fallback models used when cache is unavailable.
61
-
62
- ### Fixed
63
-
64
- - **ZenMux pricing** — Fixed `pricings` key (was reading `pricing`, always returned $0). Now correctly extracts per-model pricing (per-million-tokens ÷ 1M). Also uses `display_name`, `input_modalities` (vision detection), and `capabilities.reasoning` from API.
65
-
66
- - **CrofAI model metadata** — Custom fetch now reads per-model `name`, `custom_reasoning`, `context_length`, `max_completion_tokens`, and per-million-token `pricing` from the API.
67
-
68
- - **DeepInfra model metadata** — Extracts real model data from the `metadata` sub-object (context_length, max_tokens, pricing, reasoning tags). Filters non-chat models (embedding, rerank, whisper).
69
-
70
- - **Ollama Cloud model names** — Enriched with parameter size and quantization (e.g., `deepseek-v4-pro (671B, Q4_0)`). Set `supportsDeveloperRole: false` (fixes GLM models silently ignoring prompts). Bumped `maxTokens` from 4096 to 32768.
71
-
72
- - **SambaNova model accuracy** — `fetchOpenAICompatibleModels` now reads per-model `context_length`, `max_completion_tokens`, and `pricing` from SambaNova's extended API response. Also reads `reasoning`, `input_modalities`, and accepts plain array responses.
73
-
74
- ### Changed
75
-
76
- - **Package scope migration** — Updated all peer dependency imports from `@mariozechner/*` to `@earendil-works/*` (`pi-ai`, `pi-coding-agent`, `pi-tui`) to match the upstream scope rename in `@earendil-works/pi` v0.74.0.
77
-
78
- ## [2.0.8] - 2026-05-07
79
-
80
- ### Added
81
-
82
- - **Codestral provider** — Mistral's code-focused model via codestral.mistral.ai.
83
- Free tier (Experiment plan): 2 req/min, 500K tokens/min, 1B tokens/month.
84
- Uses pi's built-in Mistral SDK (`mistral-conversations` API type).
85
-
86
- - **LLM7.io provider** — OpenAI-compatible API gateway routing across
87
- multiple providers (OpenAI, Mistral, Google, DeepSeek, etc.). Free tier:
88
- default/fast selectors, 100 req/hr, 20 req/min.
89
-
90
- - **DeepInfra provider** — AI inference cloud with 100+ open-source models.
91
- $5 one-time credit on signup (no credit card). Models fetched dynamically.
92
- Shown as trial credit provider in `/free-providers`.
93
-
94
- - **SambaNova provider** — Fast inference on custom RDU hardware with
95
- OpenAI-compatible API. All models accessible on free tier (no credit card):
96
- 20-480 RPM. Models include Llama 3.3 70B, DeepSeek-V3/R1, Llama 4 Maverick.
97
- Shown as freemium provider in `/free-providers`.
98
-
99
- ### Changed
100
-
101
- - **Codestral: fixed HTTP 422 error** — Switched API type from
102
- `openai-completions` to `mistral-conversations`. The OpenAI completions
103
- adapter was sending unrecognized fields (`stream_options`, `store`,
104
- `max_completion_tokens`) that Mistral's API rejects with 422.
105
-
106
- ### Fixed
107
-
108
- - **Toggle commands persist across sessions for all providers** — Providers using
109
- `setupProvider` (zenmux, crofai, llm7, sambanova, deepinfra) were always
110
- registering `freeModels` on startup, ignoring the persisted `show_paid` config.
111
- Now each provider reads its config getter and registers the correct initial
112
- model set. Fixes #149.
113
-
114
- ### Security
115
-
116
- - **Log injection prevention** — `scripts/update-benchmarks.ts` sanitizes external
117
- API data (CRLF stripping) before logging. Fixes SonarCloud S1075.
118
-
119
- ### Reliability
120
-
121
- - **Prefer `String#replaceAll()` over `String#replace()`** — Replaced all 7 flagged
122
- instances. Where regex is unnecessary (2/7), switched to string literal form.
123
- Fixes SonarCloud S4144.
124
-
125
- ### Added
126
-
127
- - **`agents.md`** — Codebase guide for AI agents covering architecture, patterns,
128
- conventions, testing, and the Pi extension API.
129
-
130
- ### Added
131
-
132
- - **Passive quota monitoring** — Extracts rate-limit headers from every
133
- provider response via `after_provider_response` event (no extra API calls).
134
- Tries 6 header format variants (`x-ratelimit-remaining`,
135
- `ratelimit-remaining-requests-day`, etc.). Shows remaining quota in the
136
- status bar with warning icons when ≤25% or ≤10%. Fixes #147.
137
-
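A rough sketch of the header parsing and threshold logic, under stated assumptions: the function names and the `QuotaLevel` type are illustrative, and how the total limit is obtained is left out because the entry does not say. Only two of the six header variants are named in the entry, so only those two appear here.

```typescript
type QuotaLevel = "ok" | "warning" | "critical";

// Header names quoted in the changelog entry; the other variants are not listed there.
const REMAINING_HEADERS = ["x-ratelimit-remaining", "ratelimit-remaining-requests-day"];

// Expects header names that are already lower-cased.
function readRemaining(headers: Record<string, string>): number | undefined {
  for (const name of REMAINING_HEADERS) {
    const value = headers[name];
    if (value === undefined) continue;
    const parsed = parseInt(value, 10);
    if (!isNaN(parsed)) return parsed;
  }
  return undefined;
}

// Maps remaining quota to the two warning thresholds from the entry.
function quotaLevel(remaining: number, limit: number): QuotaLevel {
  if (limit <= 0) return "ok"; // unknown limit: nothing to compare against
  const ratio = remaining / limit;
  if (ratio <= 0.1) return "critical";
  if (ratio <= 0.25) return "warning";
  return "ok";
}
```

Because the values come from response headers the provider already sends, this kind of check adds no extra API calls.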
138
- ### Fixed
139
-
140
- - **Missing `g` flag on `replaceAll` regexps broke model filtering** —
141
- `String.prototype.replaceAll()` requires a global RegExp; 20+ patterns in
142
- `benchmark-lookup.ts` were missing it, causing a `TypeError` that prevented
143
- models from appearing for providers like cline and kilo. Added `/g` flag to
144
- all affected patterns. Fixes #151.
145
-
146
- ### Changed
147
-
148
- - **Resolved ~280 SonarCloud issues across 21 files** — Bulk code-quality
149
- cleanup including: stripping trailing zeros from `toFixed()` (S7748),
150
- `global` → `globalThis` (S7764), `parseFloat` → `Number.parseFloat` (S7773),
151
- naming unnamed async exports (S7726), `String.raw` for path strings (S7780),
152
- top-level await over promise chains (S7785), re-export from source (S7763),
153
- `.at(-1)` over `[length-1]` (S7755), `node:fs` protocol imports (S7772),
154
- and logging user-controlled data sanitization (S5145). Fixes #148.
155
-
156
- ### Security
157
-
158
- - **Bump `basic-ftp` 5.3.0 → 5.3.1** — Patches GHSA-rpmf-866q-6p89 (high
159
- severity): malicious FTP server could cause client-side DoS via unbounded
160
- multiline control response buffering. Fixes `npm audit` finding.
161
-
162
- ### Refactored
163
-
164
- - **Extracted shared model-fetch helper** — `fetchOpenAICompatibleModels()`
165
- in `lib/util.ts` eliminates ~120 lines of duplicated fetch→parse→map
166
- boilerplate across CrofAI, DeepInfra, and SambaNova providers.
167
-
168
- ## [2.0.6] - 2026-05-02
169
-
170
- ### Security
171
-
172
- - **5x S5852 regex super-linear runtime** — Replaced all flagged regex patterns
173
- (nested quantifiers in model size extraction) with manual char-by-char string
174
- parsing in `parseModelSize()`, `normalizeSizeTokenOrder()`, and test helpers.
175
- Eliminates catastrophic backtracking risk.
176
-
177
- - **4x S4036 PATH variable security** —
178
- - `open-browser.ts`: Added `resolveExe()` helper that prefers known absolute
179
- paths (`/usr/bin/open`, `C:\Windows\System32\...\powershell.exe`) before
180
- falling back to PATH lookup
181
- - `check-extensions.mjs`: Removed hardcoded PATH override; resolved `npm` via
182
- `execFileSync` with known absolute paths
183
-
184
- - **1x S4721 command injection** — Replaced `execSync` with `execFileSync` in
185
- `resolveExe()` helper. `execFileSync` takes separate arguments and never
186
- spawns a shell, eliminating the injection vector.
187
-
188
- ### Changed
189
-
190
- - **Banner image** — Converted `banner.svg` to `banner.png` for reliable
191
- rendering across all GitHub surfaces (mobile, email, dark mode readers).
192
-
193
- ## [2.0.5] - 2026-05-02
194
-
195
- ### Added
196
-
197
- - **NVIDIA model probe auto-discovery** — Lazy auto-probe for NVIDIA models on
198
- first `session_start` (once per session). Broken 404 models detected and
199
- auto-hidden without requiring manual `/probe-nvidia`.
200
-
201
- ### Changed
202
-
203
- - **Ollama provider updates** — Improved cloud model detection and configuration.
204
-
205
- ## [2.0.4] - 2026-05-02
206
-
207
- ### Fixed
208
-
209
- - **OpenRouter key resolution no longer falls back to `free.json`** —
210
- `getOpenrouterApiKey()` now only checks the `OPENROUTER_API_KEY` environment variable.
211
- Previously it fell back to `~/.pi/free.json`, which could contain stale/revoked keys
212
- that conflict with pi's built-in OpenRouter provider (which reads from
213
- `~/.pi/agent/auth.json`).
214
-
215
- - **Removed `openrouter_api_key` from `PiFreeConfig` interface and config template** —
216
- Prevents future persistence of OpenRouter keys in `free.json`, eliminating the
217
- source of stale key conflicts for built-in providers.
218
-
219
- ## [2.0.3] - 2026-05-02
220
-
221
- ### Added
222
-
223
- - **Consistent `isFreeModel` helper with Route A/B logic** — Created a unified helper for free model detection that automatically detects whether a provider exposes pricing:
224
- - **Route A (pricing-exposed)**: Model is free if `cost === 0` OR `"free"` in name (OR logic)
225
- - **Route B (non-pricing-exposed)**: Model is free only if `"free"` in name
226
- - Dynamic detection: If ALL models have cost === 0, assumes pricing not exposed → uses Route B
227
- - If ANY model has cost > 0, assumes pricing exposed → uses Route A
228
- - All providers (Cline, Kilo, NVIDIA, Ollama, dynamic built-in) now use this consistent helper
229
-
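The Route A/B selection in this entry can be sketched as follows. This is an illustration of the 2.0.3 behaviour as described, not the project's code: `PricedModel`, `exposesPricing`, and `selectFreeModels` are hypothetical names, and the single `cost` number is a simplification.

```typescript
// Hypothetical flattened shape: a single cost figure per model.
interface PricedModel {
  name: string;
  cost: number;
}

// Any non-zero cost means the provider exposes pricing (Route A).
// If every model costs 0, pricing is assumed not to be exposed (Route B).
function exposesPricing(models: PricedModel[]): boolean {
  return models.some((m) => m.cost > 0);
}

function selectFreeModels(models: PricedModel[]): PricedModel[] {
  const routeA = exposesPricing(models);
  return models.filter((m) => {
    const nameSaysFree = /free/i.test(m.name);
    // Route A: zero cost OR "free" in the name. Route B: name only.
    return routeA ? m.cost === 0 || nameSaysFree : nameSaysFree;
  });
}
```

The weakness of this heuristic is that a provider whose listed models all cost 0 is indistinguishable from one that exposes no pricing, which is the gap the later `_pricingKnown` flag addresses.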
230
- - **CrofAI provider (PAID)** — Added new **paid** provider for CrofAI (https://crof.ai), an OpenAI-compatible LLM inference API. **Note: CrofAI is a paid provider** — users must have a CrofAI API key with credits. The provider uses Route B detection (name-only) since CrofAI's API doesn't expose per-model pricing. Only models with `"free"` in their names are marked as free (none currently).
231
-
232
- - **ZenMux provider (PAID)** — Added new **paid** provider for ZenMux AI gateway (https://zenmux.ai), a unified API for 200+ models from OpenAI, Anthropic, Google, etc. **Note: ZenMux is a paid provider** — users must have a ZenMux API key with credits. The provider uses Route A detection (OR logic) since ZenMux exposes pricing. Models marked as free only if `cost === 0` OR `"free"` in name (2 free models identified: GLM 4.7 Flash Free, GLM 4.6v Flash Free).
233
-
234
- - **Comprehensive `isFreeModel` test suite** — Added 30+ unit tests covering Route A, Route B, freemium behavior, and edge cases. Tests verify correct classification on actual OpenRouter API data (371 models, 30 free).
235
-
236
- - **Toggle commands for dynamic built-in providers** — Added `/toggle-mistral`, `/toggle-groq`,
237
- `/toggle-cerebras`, `/toggle-xai`, and `/toggle-huggingface` commands. These providers were
238
- registered with the global toggle system but lacked per-provider toggle commands, making
239
- free/paid switching inaccessible without editing config files.
240
-
241
- - **Lazy auto-probe for NVIDIA models** — Extracted `runNvidiaProbe()` into a shared function
242
- called automatically on first `session_start` (once per session). Previously, users had to
243
- manually run `/probe-nvidia` to discover 404 models. Now broken models are detected and
244
- auto-hidden on first use.
245
-
246
- ### Changed
247
-
248
- - **Cline provider now uses `isFreeModel`** — Fixed Cline to use the consistent `isFreeModel` helper instead of `m.cost.input === 0`. Previously used cost-only filtering, now uses proper OR logic for pricing-exposed providers.
249
-
250
- - **NVIDIA test expectations updated** — Updated tests to reflect strict Route B behavior (name-only detection for non-pricing-exposed providers). Added test for models with `"free"` in name being marked as free.
251
-
252
- ### Fixed
253
-
254
- - **`provider-factory.ts` — `beforeProviderRequest` hook now scoped to owning provider** —
255
- The hook was firing for **all** provider requests regardless of which provider the factory
256
- was configuring. Now checks `evt.provider !== def.providerId` and returns early if the
257
- event doesn't belong to the owning provider.
258
-
259
- - **`provider-factory.ts` — `reRegister` callback no longer corrupts stored model lists** —
260
- When toggling between free/paid modes, the callback was overwriting `stored.all` with only
261
- the filtered subset, losing the original full model list. Now preserves the original model
262
- lists for correct subsequent toggling.
263
-
264
- - **`lib/types.ts` — Removed leftover `LspTestInterface`** — Removed a test interface that
265
- was left in production code.
266
-
267
- - **`index.ts` — Removed redundant `.catch()` on deprecated Qwen provider** — The `.catch()`
268
- was unnecessary since `Promise.allSettled` already handles rejections.
269
-
270
- ### Removed
271
-
272
- - **Qwen provider (deprecated)** — Removed Qwen OAuth provider as the 1,000 req/day free tier is no longer available. Provider remains functional for existing authenticated users but new free tier registrations are not supported.
273
-
274
- - **Modal provider** — Removed single-model Modal provider (only had GLM-5.1 FP8). Users should use other providers for GLM models.
275
-
276
- - **Cloudflare provider** — Removed Cloudflare Workers AI provider as it's now built into pi core. Users can use pi's built-in Cloudflare provider instead.
277
-
278
- - **Qwen test file** — Removed `tests/qwen.test.ts` along with the deprecated provider.
279
-
280
- ## [2.0.2] - 2026-04-26
281
-
282
- ### Added
283
-
284
- - **Model matching debug logging** — Added `~/.pi/modelmatch.log` to diagnose which models get Coding Index scores and which don't:
285
- - Logs every matching attempt with provider, model ID, normalization strategy, and result
286
- - CSV-like format: `timestamp|provider|modelId|modelName|action|strategy|normalizedId|matchKey|codingIndex|details`
287
- - Provider-specific normalizers for better matching:
288
- - **NVIDIA**: Strips vendor prefixes (`meta/`, `mistralai/`, `microsoft/`, `qwen/`, etc.)
289
- - **Cloudflare**: Strips `@cf/namespace/` prefixes
290
- - **Groq**: Removes `-versatile` and numeric context suffixes (`-32768`)
291
- - **Cerebras**: Normalizes `llama3.1` → `llama-3.1`, auto-adds `instruct` suffix
292
- - **Mistral**: Strips `-latest` suffix
293
- - **Ollama**: Converts `model:tag` → `model-tag`
294
- - Common suffix stripping: `:free`, date codes (`-20250514`), versions (`-v1.1`), `-it`, `-fp8`/`-bf16`
295
-
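A hedged sketch of what provider-specific normalizers of this kind might look like. The `normalizers` table, the `normalizeModelId` function, and the exact regular expressions are assumptions chosen to match the examples in the list; the Cerebras rule and the common suffix stripping are omitted for brevity.

```typescript
// Hypothetical per-provider normalizers; the patterns are illustrative only.
const normalizers: Record<string, (id: string) => string> = {
  // "meta/llama-3.3-70b-instruct" -> "llama-3.3-70b-instruct"
  nvidia: (id) => id.replace(/^[a-z0-9_-]+\//, ""),
  // "@cf/meta/llama-3.1-8b-instruct" -> "llama-3.1-8b-instruct"
  cloudflare: (id) => id.replace(/^@cf\/[^/]+\//, ""),
  // Drops "-versatile" and a trailing numeric context suffix such as "-32768"
  groq: (id) => id.replace(/-versatile$/, "").replace(/-\d+$/, ""),
  // "mistral-large-latest" -> "mistral-large"
  mistral: (id) => id.replace(/-latest$/, ""),
  // "qwen3:32b" -> "qwen3-32b"
  ollama: (id) => id.replace(":", "-"),
};

function normalizeModelId(provider: string, modelId: string): string {
  const lowered = modelId.toLowerCase();
  const normalize = normalizers[provider];
  // Unknown providers fall through with only lower-casing applied.
  return normalize ? normalize(lowered) : lowered;
}
```

Keeping each rule in a small function per provider makes it straightforward to log which strategy produced a given `normalizedId`, which is the purpose of the debug log described in this entry.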
296
- - **Enhanced benchmark lookup** — `enhanceModelNameWithCodingIndex()` now accepts optional `provider` parameter for provider-aware normalization
297
-
298
- - **Static 404 model blocklist for NVIDIA** — Probed all 136 models from `integrate.api.nvidia.com/v1/models` and identified 57 that return 404 "Function not found" on `/v1/chat/completions`. These are now hard-filtered so they never appear in the model selector:
299
- - Covers discontinued models (`databricks/dbrx-instruct`, `meta/codellama-70b`, `meta/llama2-70b`, `ibm/granite-*`, etc.)
300
- - Covers embedding-only models listed as chat-capable (`nvidia/nv-embed-v1`, `nvidia/nv-embedqa-*`, `snowflake/arctic-embed-l`, etc.)
301
- - Covers stale API catalog entries (`mistralai/mistral-large`, `mistralai/mistral-large-2-instruct`, `writer/palmyra-*`, etc.)
302
- - Full list in `NVIDIA_KNOWN_404_MODELS` in `providers/nvidia/nvidia.ts`
303
-
304
- - **`/probe-nvidia` command** — On-demand model health check. Tests every registered NVIDIA model with a minimal `max_tokens: 1` request, auto-hides any new 404s in `~/.pi/free.json`, and re-registers the provider immediately.
305
-
306
- - **`scripts/probe-nvidia.mjs`** — Standalone Node.js script to reproduce the probe. Reads `~/.pi/free.json` for the API key, batches 20 requests at a time with 10s timeout, and prints all broken model IDs for adding to the blocklist.
307
-
308
- - **Ollama Cloud 403 handling** — Same pattern as NVIDIA 404s for Ollama Cloud:
309
- - `OLLAMA_KNOWN_403_MODELS` blocklist for models that return 403 "access denied"
310
- - `/probe-ollama` command to test all models on-demand, auto-hide broken ones, and re-register
311
- - `scripts/probe-ollama.mjs` standalone script for blocklist maintenance
312
-
313
- - **Provider-scoped hidden models** — Hidden models are now provider-specific:
314
- - Format: `"provider/model-id"` (e.g., `"ollama/kimi-k2.6"`, `"nvidia/broken-model"`)
315
- - A model hidden from one provider doesn't hide it from other providers
316
- - Backward compatible with old global `"model-id"` format
317
- - All providers updated: NVIDIA, Ollama, Cloudflare, Cline, Kilo, Modal
318
-
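The lookup rule for provider-scoped hidden models might be sketched like this. `isModelHidden` is a hypothetical helper, and matching a bare `"model-id"` entry against every provider is an assumption about how the backward-compatible format is handled.

```typescript
// hidden holds entries from the config, in either format:
//   "provider/model-id" (new, scoped) or "model-id" (old, global).
function isModelHidden(hidden: string[], provider: string, modelId: string): boolean {
  const scopedKey = `${provider}/${modelId}`;
  return hidden.some((entry) => entry === scopedKey || entry === modelId);
}
```

For example, with `["ollama/kimi-k2.6"]` in the hidden list, `kimi-k2.6` is hidden for `ollama` but stays visible for `nvidia`. Model IDs that themselves contain a slash can collide with the scoped format, so a real implementation may need extra care there.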
319
- ### Fixed
320
-
321
- - **Probe commands timeout handling** — Added `fetchWithTimeout` with 10-second timeout to `/probe-nvidia` and `/probe-ollama` commands. Prevents the coding harness from freezing when individual model probe requests hang indefinitely.
322
-
323
- - **NVIDIA provider now sends `authHeader: true`** — Explicitly enables `Authorization: Bearer` header injection. Previously relied on pi's implicit behavior which could fail in some configurations.
324
-
325
- ### Removed
326
-
327
- - **NVIDIA 404 model warning log** — Removed the `console.warn("[nvidia] Skipping known 404 model: ...")` output when filtering out known broken models. The filter still works silently; use `/probe-nvidia` to identify new 404s if needed.
328
-
329
- ### Changed
330
-
331
- - **Cloudflare provider now fetches models dynamically** — Replaced static 19-model hardcoded list with live API fetch from `api.cloudflare.com/client/v4/accounts/{account_id}/ai/models`:
332
- - Automatically discovers all 30+ text generation models (was manually maintaining 19)
333
- - Smart filtering excludes embeddings, image generation, speech, translation, and vision-only models via regex patterns
334
- - Metadata inference from model IDs: detects vision (`vision`/`multimodal`), reasoning (`r1`/`thinking`/`qwq`), context windows, and estimated costs
335
- - Fixed Mistral Small ID: changed from incorrect `@cf/mistralai/...` to correct `@cf/mistral/...`
336
- - Added new fallback models: Kimi K2.6, OpenAI GPT-OSS 120B/20B, Qwen 2.5 Coder 32B, QwQ 32B, Llama 3.2 11B Vision
337
- - Graceful fallback to expanded 18-model hardcoded list if API fetch fails
338
-
339
- - **NVIDIA provider now queries NVIDIA's API directly** — Source of truth switched from `models.dev` curated JSON to `https://integrate.api.nvidia.com/v1/models`:
340
- - Eliminates 57 missing models and 25 stale entries from the old third-party source
341
- - Models not in `models.dev` get inferred metadata (128k context, 4k output, vision/reasoning heuristics)
342
- - Added regex-based non-chat model filtering for unknown models (embeddings, whisper, reward models, safety guards, parsers, detectors, etc.)
343
- - Graceful fallback to `models.dev` if NVIDIA API is unreachable
344
- - Removed paid/free toggle filtering — NVIDIA is freemium (all models use free credits)
345
-
346
- ## [2.0.1] - 2026-04-24
347
-
348
- ### Added
349
-
350
- - **Built-in provider toggle support** (`lib/built-in-toggle.ts`) — Enables free/paid filtering for Pi's built-in providers that expose per-model pricing:
351
- - **OpenCode (`/toggle-opencode`)** — Captures built-in OpenCode models on session start and filters to free-only by default
352
- - **OpenRouter (`/toggle-openrouter`)** — Now uses the built-in toggle system for consistency
353
- - Toggle works in the current session (no restart needed)
354
- - Persisted via `opencode_show_paid` and `openrouter_show_paid` in `~/.pi/free.json`
355
-
356
- ### Changed
357
-
358
- - **OpenRouter moved to built-in toggle system** — OpenRouter is now handled by `lib/built-in-toggle.ts` alongside OpenCode for a unified approach:
359
- - Removed from `providers/dynamic-built-in/index.ts`
360
- - Eliminated duplicate toggle command registration logic
361
- - Consolidated toggle persistence with other built-in providers
362
-
363
- - **Standardized all toggle commands to `toggle-{provider}`** — Renamed from `{provider}-toggle` for consistency:
364
- - `/kilo-toggle` → `/toggle-kilo`
365
- - `/cline-toggle` → `/toggle-cline`
366
- - `/openrouter-toggle` → `/toggle-openrouter`
367
- - `/nvidia-toggle` → `/toggle-nvidia`
368
- - `/cloudflare-toggle` → `/toggle-cloudflare`
369
- - `/ollama-toggle` → `/toggle-ollama`
370
- - `/mistral-toggle` → `/toggle-mistral`
371
- - `/groq-toggle` → `/toggle-groq`
372
- - `/cerebras-toggle` → `/toggle-cerebras`
373
- - `/toggle-opencode` (new)
374
-
375
- ### Fixed
376
-
377
- - **Ollama Cloud model fetching endpoint** — Corrected the `/v1/models` → `/models` endpoint path in `providers/ollama/ollama.ts`:
378
- - The previous fix (2.0.0) incorrectly used `/v1/models`; Ollama Cloud's models endpoint is `/v1/models` for chat completions but `/models` for listing
379
- - This ensures model fetching works correctly with the OpenAI-compatible API
380
-
381
- ### Removed
382
-
383
- - **Global `/free` command** — Removed the global free-only toggle. Per-provider toggles (`/toggle-{provider}`) are now the only way to switch between free and paid models. The `/free-providers` status command remains.
384
-
385
- ## [2.0.0] - 2026-04-23
386
-
387
- ### Breaking Changes
388
-
389
- - **Removed Fireworks provider** — Fireworks is now a built-in Pi provider (added in pi 0.68.1), so the extension's Fireworks provider has been removed to avoid conflicts:
390
- - Deleted `providers/fireworks/fireworks.ts` and `tests/fireworks.test.ts`
391
- - Removed all Fireworks configuration options from `config.ts` (`fireworks_api_key`, `fireworks_show_paid`)
392
- - Users should now use Pi's built-in Fireworks support with `FIREWORKS_API_KEY`
393
-
394
- - **Renamed Ollama provider to `ollama-cloud`** — Changed provider ID from `"ollama"` to `"ollama-cloud"` to avoid collision with Pi's built-in local Ollama provider:
395
- - This prevents provider ID conflicts when both are registered
396
- - All log messages and documentation now reference "Ollama Cloud"
397
-
398
- ### Removed
399
-
400
- - **Dropped `@sinclair/typebox` peer dependency** — Pi 0.69.0 migrated from `@sinclair/typebox` to `typebox` 1.x. The extension didn't directly import this package, so it was removed from `peerDependencies` to avoid potential conflicts.
401
-
402
- ### Fixed
403
-
404
- - **Ollama Cloud API endpoint** — Fixed broken Ollama Cloud integration:
405
- - Changed `BASE_URL_OLLAMA` from `https://ollama.com` to `https://ollama.com/v1` — the OpenAI-compatible API endpoint
406
- - Fixed model fetching to use `/v1/models` instead of `/api/tags` — ensures model IDs work with chat completions endpoint
407
- - Previously calls went to HTML homepage instead of API endpoints, causing 404 errors
408
-
409
- ### Removed
410
-
411
- - **Removed paid model warning on selection** — Deleted the `model_select` event handler that showed:
412
- - `⚠️ Paid model selected (${model.id}). Use "/free off" to enable paid models.`
413
- - This warning was redundant since the global `/free` toggle and provider toggles already control model visibility
414
-
415
- - **Removed pointless `/modal-toggle` command** — Modal provider only has 1 free model (GLM-5.1 FP8), so there was nothing meaningful to toggle:
416
- - Added `skipToggle` option to `ProviderDefinition` and `ProviderSetupConfig` interfaces
417
- - Modal provider now sets `skipToggle: true` to prevent toggle command creation
418
-
419
- ### Changed
420
-
421
- - **Marked Qwen provider as fully deprecated** — Updated messaging to clarify the provider is broken:
422
- - Changed model name from `"Qwen Coder — Free 1k/day"` to `"Qwen Coder — DEPRECATED (free tier discontinued)"`
423
- - Updated all JSDoc comments to clearly state auth is broken and free tier is no longer available
424
- - Provider remains for backward compatibility but should not be used
425
-
426
- ### Added
427
-
428
- - **Cloudflare Workers AI provider** — New provider for Cloudflare's serverless GPU platform:
429
- - 50+ open-source models: Llama 4, Mistral Small 3.1, Qwen 2.5/3, DeepSeek R1, Gemma 4, Kimi K2.5/2.6, and more
430
- - **10,000 Neurons/day FREE tier** (resets daily at 00:00 UTC)
431
- - **$0.011 per 1,000 Neurons** beyond free allocation
432
- - Only requires `CLOUDFLARE_API_TOKEN` — account ID auto-derived from token
433
- - Toggle with `/cloudflare-toggle`
434
- - Create token at https://dash.cloudflare.com/profile/api-tokens
435
-
436
- - **Unified dynamic built-in providers module** — New `providers/dynamic-built-in/` module that dynamically fetches models from Pi's built-in providers when users have API keys:
437
- - **Mistral** (`MISTRAL_API_KEY`) — Fetches from `api.mistral.ai/v1/models`
438
- - **Groq** (`GROQ_API_KEY`) — Fetches from `api.groq.com/openai/v1/models`
439
- - **Cerebras** (`CEREBRAS_API_KEY`) — Fetches from `api.cerebras.ai/v1/models`
440
- - **xAI** (`XAI_API_KEY`) — Fetches from `api.x.ai/v1/models`
441
- - **Hugging Face** (`HF_TOKEN` — optional) — Fetches public + authenticated models
442
- - **OpenRouter** — Moved from `index.ts` to unified module with dynamic fetch
443
- - All integrate with global `/free` toggle and have per-provider toggle commands (`/mistral-toggle`, `/groq-toggle`, etc.)
444
-
445
- - **Global `/free` toggle system** — New centralized free/paid filtering across ALL providers:
446
- - `/free on/off/status` — Toggle free-only view globally
447
- - `/free-providers` — Show free/paid model counts by provider
448
- - `FREE_ONLY` config option and `PI_FREE_ONLY` environment variable
449
- - Providers register via `registerWithGlobalToggle()` for unified filtering
450
-
451
- ### Fixed
452
-
453
- - **Toggle commands now actually filter models from UI** — Previously, toggle commands only showed notifications but didn't remove paid models from the model picker:
454
- - **OpenRouter (`/openrouter-toggle`)**: Now uses `registerProvider`/`unregisterProvider` to actually filter models from the picker UI
455
- - **NVIDIA (`/nvidia-toggle`)**: Added dynamic `showPaid` parameter to `fetchNvidiaModels()` so toggle properly switches between free and paid model sets
456
- - **Fireworks**: Removed broken toggle command — all models are paid with no free tier, so there was nothing to toggle
457
-
458
- ### Added
459
-
460
- - **OpenRouter per-provider free model toggle** — Added `/openrouter-toggle` command for the built-in OpenRouter provider:
461
- - `/openrouter-toggle` — Switch between showing only free models vs all models (including paid)
462
- - New config flag `openrouter_show_paid` in `~/.pi/free.json` (default: `false`)
463
- - Environment variable: `OPENROUTER_SHOW_PAID=true` to show paid models by default
464
- - This brings OpenRouter (a built-in pi provider) in line with extension providers that have per-provider toggles
465
-
466
- ### Deprecated
467
-
468
- - **Qwen provider** — The 1,000 requests/day free tier is no longer available from Qwen/DashScope. The provider code remains for backward compatibility but is now deprecated:
469
- - Added `@deprecated` JSDoc tags to all Qwen-related exports
470
- - Added deprecation warning when Qwen provider loads
471
- - Added warning when `QWEN_SHOW_PAID` config is used
472
- - Consider migrating to other free providers: Kilo, Cline, NVIDIA, or Modal
473
-
474
- ### Added
475
-
476
- - **Go provider** — OpenCode Go subscription gateway (⚠️ paid only — $5 first month, then $10/month, no free tier) with models: GLM-5, Kimi K2.5, MiMo-V2-Pro, MiMo-V2-Omni, MiniMax M2.7, MiniMax M2.5
477
- - Set `OPENCODE_GO_API_KEY` or `opencode_go_api_key` in `~/.pi/free.json`
478
- - Toggle with `/go-toggle`
479
-
480
- ### Fixed
481
-
482
- - **All providers now show Coding Index scores in model selector** — Added `enhanceWithCI()` to factory-based providers (nvidia, fireworks, mistral, modal, ollama) and cline. Now all providers display CI scores in `/models` command (pi-models extension).
483
-
484
- - **All providers now show in `--list-models`** — Providers (zen, openrouter, go) that registered models only in `session_start` were missing from `pi --list-models` which runs before session starts. Added immediate registration for these providers:
485
- - **zen**: Added model caching to `~/.pi/provider-cache.json` for immediate registration + dynamic refresh
486
- - **openrouter**: Immediate model registration at extension load (like kilo/cline)
487
- - **go**: Immediate registration with static model list (no API to fetch from)
488
- - All 11 providers now visible in `--list-models`
489
-
490
- ### Changed
491
-
492
- - Updated README with clear free vs paid provider distinction (9 free + 2 paid-only: Go, Fireworks)
493
- - Added Go and Fireworks provider documentation under new "💳 Paid-Only Providers" section
494
- - Added `opencode_go_api_key` to config file template
495
- - Updated package.json description and keywords to include all 11 providers
496
-
497
- ### Added
498
-
499
- - **Provider model cache** (`lib/provider-cache.ts`) — New utility for caching provider model lists to `~/.pi/provider-cache.json`. Used by zen provider for faster startup and offline access after first successful fetch.
500
-
501
- ## [1.0.9] - 2026-04-14
502
-
503
- ### Fixed
504
-
505
- - **Qwen OAuth breaks other OAuth providers** — `modifyModels` receives all models across every registered provider, not just Qwen's. The previous `map()` stamped the Qwen dashscope `baseUrl` onto every model, causing other OAuth providers (Kilo, OpenRouter, etc.) to return 404 after a `/login qwen` flow. Now only models with `provider === PROVIDER_QWEN` are patched; others pass through unchanged.
506
-
507
- ## [1.0.8] - 2026-04-13
508
-
509
- ### Added
510
-
511
- - **Modal provider** — Free access to GLM-5.1 FP8 (128k context, 16k max output) during promotional period (free until April 30, 2026)
512
- - Requires a free Modal API key (`MODAL_API_KEY` or `modal_api_key` in `~/.pi/free.json`)
513
- - Model: `zai-org/GLM-5.1-FP8` — 128k context window, 16k max output tokens
514
- - **Qwen provider** — Free access to Qwen Coder (1,000 requests/day) via OAuth device flow
515
- - Run `/login qwen` to authenticate through Qwen Studio (chat.qwen.ai)
516
- - Uses `coder-model` alias (maps to Qwen3.6-Plus on the backend)
517
- - 131k context window, 16k max output tokens, zero cost
518
-
519
- ### Fixed
520
-
521
- - **Qwen OAuth browser launch on Windows** — URLs with `&` query params were truncated by `cmd.exe`'s `&` command separator; switched to `powershell.exe Start-Process` which passes the URL as a literal string
522
- - **Qwen API endpoint** — Replicates qwen-code's `getCurrentEndpoint()` logic: uses `resource_url` from OAuth token response (`dashscope.aliyuncs.com` for Chinese accounts, `portal.qwen.ai` for international), with fallback to `dashscope.aliyuncs.com/compatible-mode/v1`
523
- - **Qwen DashScope headers** — Added all headers required by DashScope's OpenAI-compatible API: `X-DashScope-AuthType: qwen-oauth`, `X-DashScope-CacheControl: enable`, `X-DashScope-UserAgent`, `Client-Code: QwenCode`
524
- - **Qwen modifyModels crash** — `modifyModels` must be synchronous; making it async caused the pi framework to receive a `Promise` instead of a `Model[]`, breaking `ModelRegistry.find()`
525
-
526
- ## [1.0.5] - 2025-04-03
527
-
528
- ### Fixed
529
-
530
- - **NVIDIA provider non-chat model filtering** (comment/implementation mismatch)
531
- - Added modalities-based filtering to exclude embedding, speech-to-text, OCR, and image-gen models
532
- - Filters models where `output` is not `["text"]` (e.g., image generation like `black-forest-labs/flux.1-dev`)
533
- - Filters models where `input` lacks `"text"` (e.g., OCR like `nvidia/nemoretriever-ocr-v1`, speech-to-text like `openai/whisper-large-v3`)
534
- - Updated file comment to accurately describe the filtering behavior
535
- - Added 8 comprehensive unit tests for model filtering logic
536
-
537
- ## [1.0.4] - 2025-04-03
538
-
539
- ### Fixed
540
-
541
- - **All tests now passing** (127/127)
542
- - Fixed mock paths in kilo.test.ts, zen.test.ts, ollama.test.ts
543
- - Fixed createCtxReRegister mocks in zen.test.ts and openrouter.test.ts
544
- - Fixed cline.test.ts to test actual provider re-registration behavior
545
- - Added missing DEFAULT_MIN_SIZE_B constant to openrouter mock
546
-
547
- ### Changed
548
-
549
- - **Code quality improvements**
550
- - Refactored usage modules to break circular dependency (limits.ts ↔ formatters.ts)
551
- - Created usage/types.ts with shared interfaces (FreeTierLimit, FreeTierUsage)
552
- - Bumped version to 1.0.4
553
-
554
- ## [1.0.3] - 2025-04-03
555
-
556
- ### Changed
557
-
558
- - Updated package.json metadata (name, description, keywords, repository URL)
559
- - Updated .npmignore for cleaner publishes
560
-
561
- ## [1.0.0] - 2024-03-28
562
-
563
- ### Added
564
-
565
- - Initial release with 6 providers: Kilo, Zen, OpenRouter, NVIDIA, Cline, Fireworks
566
- - Free tier usage tracking across all sessions
567
- - Provider failover with model hopping
568
- - Autocompact integration for rate limit recovery
569
- - Usage widget with glimpseui
570
- - Command toggles for free/all model filtering
571
- - Hardcoded benchmark data from Artificial Analysis
572
-
573
- ### Changed
574
-
575
- - **Major refactoring**: Split free-tier-limits.ts into usage/\* modules
576
- - usage/tracking.ts - runtime session tracking
577
- - usage/cumulative.ts - persistent storage
578
- - usage/formatters.ts - display formatting
579
- - 77% line reduction (741 → 166 lines)
580
- - **Major refactoring**: Split usage-widget.ts into widget/\* modules
581
- - widget/data.ts - data collection
582
- - widget/format.ts - formatting utilities
583
- - widget/render.ts - HTML generation
584
- - 74% line reduction (~350 → 90 lines)
585
- - **Refactoring**: Extracted functions from cline-auth.ts
586
- - fetchAuthorizeUrl() - auth URL fetching
587
- - waitForAuthCode() - callback handling
588
- - exchangeCodeForTokens() - token exchange
589
- - parseManualInput() - manual input parsing
590
- - **Refactoring**: Simplified model-hop.ts complexity
591
- - Extracted handleDowngradeDecision()
592
- - Extracted tryAlternativeModel()
593
- - **Deduplication**: Created shared modules
594
- - lib/json-persistence.ts - file I/O with caching
595
- - lib/logger.ts - structured logging
596
- - providers/model-fetcher.ts - OpenRouter-compatible fetching
597
- - Replaced ~30 console.log statements with structured logging
598
- - Fixed all 9 pre-existing test failures
599
- - fetchWithRetry now throws after last retry
600
- - Fixed auth pattern matching (added key.*not.*valid)
601
- - Updated capability ranking tests
602
- - Added resetUsageStats() for test isolation
603
-
604
- ### Fixed
605
-
606
- - fetchWithRetry() now properly throws after exhausting retries
607
- - Auth error pattern matching now handles more message variants
608
- - Test isolation for free-tier-limits tests
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [2.0.14] - 2026-06-02
11
+
12
+ ### Added
13
+
14
+ - **Routeway provider** — OpenAI-compatible gateway (`api.routeway.ai/v1`) with 219 models, 16 free (`:free` suffix). Set `ROUTEWAY_API_KEY` or add `routeway_api_key` to `~/.pi/free.json`. Toggle with `/toggle-routeway` ([#209](https://github.com/apmantza/pi-free/pull/209)).
15
+
16
+ ### Fixed
17
+
18
+ - **Cline free model merging** — Free-to-try models (e.g. `qwen3.7-plus`) from Cline's recommended list now appear in the free model picker even when absent from the main catalog ([#209](https://github.com/apmantza/pi-free/pull/209)).
19
+
20
+ - **`_pricingKnown` / `_freeKnown` authoritative flag** — Providers can now signal whether pricing data is authoritative via `_pricingKnown`. When `false`, `isFreeModel` falls back to name-based detection. Kilo's `isFree` API flag now flows through as `_freeKnown` ([#209](https://github.com/apmantza/pi-free/pull/209)).
21
+
22
+ ## [2.0.13] - 2026-05-21
23
+
24
+ ### Added
25
+
26
+ - **OpenCode static headers injection** — pi-free now injects required OpenCode headers (`x-opencode-client`, `x-opencode-session`, `x-opencode-request`, `x-opencode-project`, `User-Agent`) when capturing/re-registering pi's built-in OpenCode models **and** when dynamically fetching/registering OpenCode models from `opencode.ai/zen/v1`. Prevents requests from hanging indefinitely when pi's model generation omits these headers ([pi#4680](https://github.com/earendil-works/pi/issues/4680), [#171](https://github.com/apmantza/pi-free/issues/171), [#173](https://github.com/apmantza/pi-free/issues/173), [#174](https://github.com/apmantza/pi-free/issues/174)). Headers are now regenerated per-call with fresh session and request IDs. Uses native `ses_`/`msg_` prefixed ULID identifiers matching OpenCode's `Identifier.descending()` format to avoid daily rate-limit throttling ([#175](https://github.com/apmantza/pi-free/issues/175)).
27
+
28
+ - **OpenCode endpoint detection** — Replaced regex-based OpenCode endpoint check with a simple string comparison, reducing overhead on every streaming request.
29
+
30
+ ### Fixed
31
+
32
+ - **Lazy-load Pi AI stream providers** — Pi-ai's OpenAI completions and Anthropic stream modules are now imported lazily on first use rather than at extension load time. Eliminates start-up failures when pi-ai exports are not yet resolvable ([#177](https://github.com/apmantza/pi-free/issues/177)).
33
+
34
+ - **Subpath resolution for isolated extension context** — Pi loads pi-free from a directory tree that does not contain `@earendil-works/pi-ai` in its `node_modules`. `createRequire().resolve()` only understands CJS resolution, but pi-ai is ESM-only with strict exports. The new fallback resolves a pi-ai dependency from Pi's entry point, walks up to `node_modules`, reads `pi-ai/package.json`, and maps the `exports` field to the actual file path. Fixes module resolution for both `anthropic` and `openai-completions` subpaths. Includes integration test.
35
+
36
+ - **Security: shell injection in test** — Replaced `execSync` with `execFileSync` in the OpenCode session integration test to avoid shell injection risk.
37
+
38
+ ### Security
39
+
40
+ - **Bump `brace-expansion` 5.0.5 → 5.0.6** — Patches minor dependency vulnerability. Fixes `npm audit`. ([#172](https://github.com/apmantza/pi-free/issues/172))
41
+
42
+ ## [2.0.12] - 2026-05-13
43
+
44
+ ### Added
45
+
46
+ - **Novita AI provider** — OpenAI-compatible API at `api.novita.ai/openai/v1` with 100+ open-source models. Non-standard but rich metadata: per-model pricing (`input_token_price_per_m`), context size, max output tokens, reasoning/vision features, and model descriptions. 3 free models, 99 paid.
47
+
48
+ - **FastRouter provider** — OpenRouter-compatible API at `api.fastrouter.ai/api/v1` with 170+ models. Always discovered (no auth needed for model listing). Full pricing, context lengths, and feature metadata. 129 text models (6 free, 123 paid) after filtering image/video. Set `FASTROUTER_API_KEY` for chat completions.
49
+
50
+ - **Dynamic model fetching for OpenCode and OpenRouter** — Pi's built-in providers now get their models fetched dynamically from the API (`opencode.ai/zen/v1/models` and `openrouter.ai/api/v1/models`), same as Mistral, Groq, Cerebras, and xAI. Overwrites Pi's defaults with the full model list. OpenCode uses name-based free detection (API returns no pricing); OpenRouter uses full cost-based detection.
51
+
52
+ - **API key reading from `~/.pi/agent/auth.json`** — `getOpencodeApiKey()` and `getOpenrouterApiKey()` now fall back to Pi's auth.json when the env var isn't set, matching how Pi's built-in providers read their keys.
53
+
54
+ ### Changed
55
+
56
+ - **`_pricingKnown` guard in `isFreeModel`** — Providers can now signal whether pricing data is authoritative. When `_pricingKnown` is explicitly `false` (API returned no pricing), `isFreeModel` falls back to name-only detection (checks for "free" in the model name). This eliminates false positives where missing pricing data was treated as $0 cost. All affected providers (ZenMux, Together, CrofAI, dynamic-built-in, fetchOpenAICompatibleModels, deepinfra, sambanova, novita) now set this flag correctly.
57
+
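The guard described above can be sketched roughly as follows. This is an illustration, not the package's actual implementation: the `SketchModel` shape, the `hasFreeInName` helper, and the choice to inspect both `id` and `name` are assumptions; only `_pricingKnown` and `isFreeModel` are names taken from the changelog.

```typescript
// Hypothetical model shape for illustration; the real Model type may differ.
interface SketchModel {
  id: string;
  name: string;
  cost: { input: number; output: number };
  _pricingKnown?: boolean; // false = the API returned no pricing at all
}

// Assumed helper: name-only detection looks for "free" in the id or name.
function hasFreeInName(m: SketchModel): boolean {
  return `${m.id} ${m.name}`.toLowerCase().includes("free");
}

function isFreeModelSketch(m: SketchModel): boolean {
  // Pricing explicitly unknown: a $0 cost is only a placeholder, so ignore it.
  if (m._pricingKnown === false) {
    return hasFreeInName(m);
  }
  // Pricing is authoritative (or unspecified): zero cost OR "free" in the name.
  const zeroCost = m.cost.input === 0 && m.cost.output === 0;
  return zeroCost || hasFreeInName(m);
}
```

With this shape, a model whose pricing was absent from the API response is no longer reported as free just because its cost fields defaulted to zero.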
58
+ - **All providers now use `isFreeModel` consistently** — Together switched from hardcoded `cost===0` check to `isFreeModel`. DeepInfra and SambaNova switched from manual free lists to `isFreeModel` with proper `_pricingKnown` metadata. NVIDIA, Codestral, and Ollama explicitly documented as free-tier providers (`freeModels = allModels`).
59
+
60
+ - **Unified OpenRouter-based providers** — Kilo, OpenRouter, and Cline now share the same `fetchOpenRouterCompatibleModels` / OpenRouter API logic.
61
+
62
+ ### Removed
63
+
64
+ - **`DEFAULT_MIN_SIZE_B` (30B minimum model size filter)** — Removed from `model-fetcher.ts` and `cline-models.ts`. All models are now shown regardless of parameter count. NVIDIA still uses its own 70B threshold (`NVIDIA_MIN_SIZE_B`).
65
+
66
+ ### Fixed
67
+
68
+ - **ZenMux false free classifications** — Models without `pricings` data (DeepSeek Chat V3.1, Kimi K2 0711, Claude 3.7 Sonnet) were incorrectly classified as free because missing pricing defaulted to $0. Fixed to 3 genuinely free models (down from 6 false positives).
69
+
70
+ - **Together AI, CrofAI, dynamic-built-in missing-pricing false positives** — Same `?? 0` pattern across multiple providers could mark unpriced models as free. All now set `_pricingKnown: false` when pricing is absent from the API response.
71
+
72
+ ## [2.0.10] - 2026-05-08
73
+
74
+ ### Fixed
75
+
76
+ - **Config wipe on JSON parse failure** — `saveConfig` used `loadConfigFile()` which returns `{}` on any parse error, causing `{ ...{}, ...updates }` to write a partial config that permanently destroyed all API keys. Now reads the raw file directly and refuses to save if corrupt. `ensureConfigFile` also refuses to overwrite corrupt files.
77
+
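A minimal sketch of the "refuse to save if corrupt" behavior, written as a pure function so it can be read without any file I/O. The name `mergeConfigSafely` and the `ConfigSketch` type are hypothetical; the real `saveConfig` reads and writes `~/.pi/free.json` and may differ in detail.

```typescript
type ConfigSketch = Record<string, unknown>;

// Returns the JSON text to write, or null when the existing file content is
// corrupt and must NOT be overwritten (hypothetical helper for illustration).
function mergeConfigSafely(rawFile: string | null, updates: ConfigSketch): string | null {
  let existing: ConfigSketch = {};
  if (rawFile !== null && rawFile.trim() !== "") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(rawFile);
    } catch {
      // Corrupt JSON: bail out instead of treating the config as empty.
      return null;
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    existing = parsed as ConfigSketch;
  }
  return JSON.stringify({ ...existing, ...updates }, null, 2);
}
```

The key difference from the buggy pattern is that a parse failure is surfaced to the caller rather than silently replaced by `{}`, so existing API keys are never merged away.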
78
+ - **Built-in provider keys removed from pi-free config** — `mistral_api_key`, `groq_api_key`, `cerebras_api_key`, `xai_api_key`, and `hf_token` are no longer in `~/.pi/free.json`. These are pi's own built-in providers; their keys come from environment variables only.
79
+
80
+ ## [2.0.9] - 2026-05-08
81
+
82
+ ### Added
83
+
84
+ - **Together AI provider** — Fast inference on 200+ open-source models (Llama, DeepSeek, Qwen, etc.) through an OpenAI-compatible API. $1 trial credit on signup, no credit card required. Set `TOGETHER_AI_API_KEY`.
85
+
86
+ - **Per-model metadata for Ollama Cloud** — Fetches `/api/show` details for every Ollama Cloud model to detect real capabilities: thinking/vision support, actual context windows (up to 1M tokens), and thinking level maps (`reasoning_effort`). Models now show parameter size and quantization in display names.
87
+
88
+ - **Thinking level maps** — Four curated maps (`DEFAULT`, `GPT_OSS`, `QWEN3`, `NO_OFF`) for Ollama Cloud models that map Pi's thinking levels to Ollama's `reasoning_effort` values, based on per-model API testing.
89
+
90
+ - **`/ollama-cloud-refresh` command** — Re-fetch Ollama Cloud models from the API and update the provider live, no restart needed.
91
+
92
+ - **Persistent Ollama Cloud cache** — Models cached via `provider-cache.ts` for fast startup. Stale cache auto-refreshes on `session_start`. Fallback models used when cache is unavailable.
93
+
94
+ ### Fixed
95
+
96
+ - **ZenMux pricing** — Fixed `pricings` key (was reading `pricing`, always returned $0). Now correctly extracts per-model pricing (per-million-tokens ÷ 1M). Also uses `display_name`, `input_modalities` (vision detection), and `capabilities.reasoning` from API.
97
+
98
+ - **CrofAI model metadata** — Custom fetch now reads per-model `name`, `custom_reasoning`, `context_length`, `max_completion_tokens`, and per-million-token `pricing` from the API.
99
+
100
+ - **DeepInfra model metadata** — Extracts real model data from the `metadata` sub-object (context_length, max_tokens, pricing, reasoning tags). Filters non-chat models (embedding, rerank, whisper).
101
+
102
+ - **Ollama Cloud model names** — Enriched with parameter size and quantization (e.g., `deepseek-v4-pro (671B, Q4_0)`). Set `supportsDeveloperRole: false` (fixes GLM models silently ignoring prompts). Bumped `maxTokens` from 4096 to 32768.
103
+
104
+ - **SambaNova model accuracy** — `fetchOpenAICompatibleModels` now reads per-model `context_length`, `max_completion_tokens`, and `pricing` from SambaNova's extended API response. Also reads `reasoning`, `input_modalities`, and accepts plain array responses.
105
+
106
+ ### Changed
107
+
108
+ - **Package scope migration** — Updated all peer dependency imports from `@mariozechner/*` to `@earendil-works/*` (`pi-ai`, `pi-coding-agent`, `pi-tui`) to match the upstream scope rename in `@earendil-works/pi` v0.74.0.
109
+
110
+ ## [2.0.8] - 2026-05-07
111
+
112
+ ### Added
113
+
114
+ - **Codestral provider** — Mistral's code-focused model via codestral.mistral.ai.
115
+ Free tier (Experiment plan): 2 req/min, 500K tokens/min, 1B tokens/month.
116
+ Uses pi's built-in Mistral SDK (`mistral-conversations` API type).
117
+
118
+ - **LLM7.io provider** — OpenAI-compatible API gateway routing across
119
+ multiple providers (OpenAI, Mistral, Google, DeepSeek, etc.). Free tier:
120
+ default/fast selectors, 100 req/hr, 20 req/min.
121
+
122
+ - **DeepInfra provider** — AI inference cloud with 100+ open-source models.
123
+ $5 one-time credit on signup (no credit card). Models fetched dynamically.
124
+ Shown as trial credit provider in `/free-providers`.
125
+
126
+ - **SambaNova provider** — Fast inference on custom RDU hardware with
127
+ OpenAI-compatible API. All models accessible on free tier (no credit card):
128
+ 20-480 RPM. Models include Llama 3.3 70B, DeepSeek-V3/R1, Llama 4 Maverick.
129
+ Shown as freemium provider in `/free-providers`.
130
+
131
+ ### Changed
132
+
133
+ - **Codestral: fixed HTTP 422 error** — Switched API type from
134
+ `openai-completions` to `mistral-conversations`. The OpenAI completions
135
+ adapter was sending unrecognized fields (`stream_options`, `store`,
136
+ `max_completion_tokens`) that Mistral's API rejects with 422.
137
+
138
+ ### Fixed
139
+
140
+ - **Toggle commands persist across sessions for all providers** — Providers using
141
+ `setupProvider` (zenmux, crofai, llm7, sambanova, deepinfra) were always
142
+ registering `freeModels` on startup, ignoring the persisted `show_paid` config.
143
+ Now each provider reads its config getter and registers the correct initial
144
+ model set. Fixes #149.
145
+
146
+ ### Security
147
+
148
+ - **Log injection prevention** — `scripts/update-benchmarks.ts` sanitizes external
149
+ API data (CRLF stripping) before logging. Fixes SonarCloud S1075.
150
+
151
+ ### Reliability
152
+
153
+ - **Prefer `String#replaceAll()` over `String#replace()`** — Replaced all 7 flagged
154
+ instances. Where regex is unnecessary (2/7), switched to string literal form.
155
+ Fixes SonarCloud S4144.
156
+
157
+ ### Added
158
+
159
+ - **`agents.md`** — Codebase guide for AI agents covering architecture, patterns,
160
+ conventions, testing, and the Pi extension API.
161
+
162
+ ### Added
163
+
164
+ - **Passive quota monitoring** — Extracts rate-limit headers from every
165
+ provider response via `after_provider_response` event (no extra API calls).
166
+ Tries 6 header format variants (`x-ratelimit-remaining`,
167
+ `ratelimit-remaining-requests-day`, etc.). Shows remaining quota in the
168
+ status bar with warning icons when ≤25% or ≤10%. Fixes #147.
169
+
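The header extraction and warning thresholds could look roughly like the sketch below. Only the two header names quoted in the entry come from the changelog; the limit header names, the assumption that header keys are already lowercased, and the `quotaLevel` function itself are hypothetical.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Two of the six variants are named in the changelog; the rest are omitted here.
const REMAINING_HEADERS = ["x-ratelimit-remaining", "ratelimit-remaining-requests-day"];
// Assumed counterparts carrying the total quota.
const LIMIT_HEADERS = ["x-ratelimit-limit", "ratelimit-limit-requests-day"];

// Returns the first header in `names` that parses as a finite number.
function firstNumber(headers: HeaderMap, names: string[]): number | undefined {
  for (const name of names) {
    const raw = headers[name];
    if (raw === undefined) continue;
    const value = Number.parseFloat(raw);
    if (Number.isFinite(value)) return value;
  }
  return undefined;
}

type QuotaLevel = "ok" | "low" | "critical";

// Maps remaining/limit to the two warning thresholds mentioned above (25% and 10%).
function quotaLevel(headers: HeaderMap): QuotaLevel | undefined {
  const remaining = firstNumber(headers, REMAINING_HEADERS);
  const limit = firstNumber(headers, LIMIT_HEADERS);
  if (remaining === undefined || limit === undefined || limit <= 0) return undefined;
  const fraction = remaining / limit;
  if (fraction <= 0.1) return "critical";
  if (fraction <= 0.25) return "low";
  return "ok";
}
```

Because the data comes from responses the provider already sent, no additional API calls are needed to keep the status bar current.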
170
+ ### Fixed
171
+
172
+ - **Missing `g` flag on `replaceAll` regexps broke model filtering** —
173
+ `String.prototype.replaceAll()` requires a global RegExp; 20+ patterns in
174
+ `benchmark-lookup.ts` were missing it, causing a `TypeError` that prevented
175
+ models from appearing for providers like cline and kilo. Added `/g` flag to
176
+ all affected patterns. Fixes #151.
177
+
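The failure mode is easy to reproduce in isolation. The snippet below is a standalone illustration (the helper name `replaceAllErrorName` and the sample model ID are made up), not code from `benchmark-lookup.ts`.

```typescript
// Returns the name of the error thrown by replaceAll, or null if the call succeeds.
function replaceAllErrorName(pattern: RegExp): string | null {
  try {
    "llama-3.1-70b".replaceAll(pattern, "");
    return null;
  } catch (err) {
    return err instanceof Error ? err.name : "unknown";
  }
}

// A RegExp without the global flag makes replaceAll throw instead of replacing once.
const withoutFlag = replaceAllErrorName(/-/);

// The same pattern with /g works as expected.
const withFlag = replaceAllErrorName(/-/g);
const cleaned = "llama-3.1-70b".replaceAll(/-/g, " ");
```

Unlike `replace()`, which silently replaces only the first match when `g` is missing, `replaceAll()` rejects the pattern outright, which is why a mechanical `replace` to `replaceAll` migration can turn a quiet behavior into a hard failure.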
178
+ ### Changed
179
+
180
+ - **Resolved ~280 SonarCloud issues across 21 files** — Bulk code-quality
181
+ cleanup including: stripping trailing zeros from `toFixed()` (S7748),
182
+ `global` → `globalThis` (S7764), `parseFloat` → `Number.parseFloat` (S7773),
183
+ naming unnamed async exports (S7726), `String.raw` for path strings (S7780),
184
+ top-level await over promise chains (S7785), re-export from source (S7763),
185
+ `.at(-1)` over `[length-1]` (S7755), `node:fs` protocol imports (S7772),
186
+ and logging user-controlled data sanitization (S5145). Fixes #148.
187
+
188
+ ### Security
189
+
190
+ - **Bump `basic-ftp` 5.3.0 → 5.3.1** — Patches GHSA-rpmf-866q-6p89 (high
191
+ severity): malicious FTP server could cause client-side DoS via unbounded
192
+ multiline control response buffering. Fixes `npm audit` finding.
193
+
194
+ ### Refactored
195
+
196
+ - **Extracted shared model-fetch helper** — `fetchOpenAICompatibleModels()`
197
+ in `lib/util.ts` eliminates ~120 lines of duplicated fetch→parse→map
198
+ boilerplate across CrofAI, DeepInfra, and SambaNova providers.
199
+
200
+ ## [2.0.6] - 2026-05-02
201
+
202
+ ### Security
203
+
204
+ - **5x S5852 regex super-linear runtime** — Replaced all flagged regex patterns
205
+ (nested quantifiers in model size extraction) with manual char-by-char string
206
+ parsing in `parseModelSize()`, `normalizeSizeTokenOrder()`, and test helpers.
207
+ Eliminates catastrophic backtracking risk.
208
+
209
+ - **4x S4036 PATH variable security**
210
+ - `open-browser.ts`: Added `resolveExe()` helper that prefers known absolute
211
+ paths (`/usr/bin/open`, `C:\Windows\System32\...\powershell.exe`) before
212
+ falling back to PATH lookup
213
+ - `check-extensions.mjs`: Removed hardcoded PATH override; resolved `npm` via
214
+ `execFileSync` with known absolute paths
215
+
216
+ - **1x S4721 command injection** — Replaced `execSync` with `execFileSync` in
217
+ `resolveExe()` helper. `execFileSync` takes separate arguments and never
218
+ spawns a shell, eliminating the injection vector.
219
+
220
+ ### Changed
221
+
222
+ - **Banner image** — Converted `banner.svg` to `banner.png` for reliable
223
+ rendering across all GitHub surfaces (mobile, email, dark mode readers).
224
+
225
+ ## [2.0.5] - 2026-05-02
226
+
227
+ ### Added
228
+
229
+ - **NVIDIA model probe auto-discovery** — Lazy auto-probe for NVIDIA models on
230
+ first `session_start` (once per session). Broken 404 models detected and
231
+ auto-hidden without requiring manual `/probe-nvidia`.
232
+
233
+ ### Changed
234
+
235
+ - **Ollama provider updates** — Improved cloud model detection and configuration.
236
+
237
+ ## [2.0.4] - 2026-05-02
238
+
239
+ ### Fixed
240
+
241
+ - **OpenRouter key resolution no longer falls back to `free.json`** —
242
+ `getOpenrouterApiKey()` now only checks the `OPENROUTER_API_KEY` environment variable.
243
+ Previously it fell back to `~/.pi/free.json`, which could contain stale/revoked keys
244
+ that conflict with pi's built-in OpenRouter provider (which reads from
245
+ `~/.pi/agent/auth.json`).
246
+
247
+ - **Removed `openrouter_api_key` from `PiFreeConfig` interface and config template** —
248
+ Prevents future persistence of OpenRouter keys in `free.json`, eliminating the
249
+ source of stale key conflicts for built-in providers.
250
+
251
+ ## [2.0.3] - 2026-05-02
252
+
253
+ ### Added
254
+
255
+ - **Consistent `isFreeModel` helper with Route A/B logic** — Created a unified helper for free model detection that automatically detects whether a provider exposes pricing:
256
+ - **Route A (pricing-exposed)**: Model is free if `cost === 0` OR `"free"` in name (OR logic)
257
+ - **Route B (non-pricing-exposed)**: Model is free only if `"free"` in name
258
+ - Dynamic detection: If ALL models have cost === 0, assumes pricing not exposed → uses Route B
259
+ - If ANY model has cost > 0, assumes pricing exposed → uses Route A
260
+ - All providers (Cline, Kilo, NVIDIA, Ollama, dynamic built-in) now use this consistent helper
261
+
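The Route A/B selection in the list above can be sketched as follows. The `PricedModel` shape, the single numeric `cost` field, and the function names are simplifying assumptions for illustration; the real helper operates on pi's model type.

```typescript
// Simplified, hypothetical model shape with a single cost figure.
interface PricedModel {
  id: string;
  name: string;
  cost: number;
}

// Dynamic detection: if any model has a non-zero cost, the provider exposes pricing.
function pricingExposed(models: PricedModel[]): boolean {
  return models.some((m) => m.cost > 0);
}

function selectFreeModels(models: PricedModel[]): PricedModel[] {
  const routeA = pricingExposed(models);
  return models.filter((m) => {
    const namedFree = m.name.toLowerCase().includes("free");
    // Route A: zero cost OR "free" in name. Route B: name only.
    return routeA ? m.cost === 0 || namedFree : namedFree;
  });
}
```

Note the trade-off in the all-zero case: a provider whose models are all genuinely free is indistinguishable from one that exposes no pricing, so Route B applies and only models named "free" are selected.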
262
+ - **CrofAI provider (PAID)** — Added new **paid** provider for CrofAI (https://crof.ai), an OpenAI-compatible LLM inference API. **Note: CrofAI is a paid provider** — users must have a CrofAI API key with credits. The provider uses Route B detection (name-only) since CrofAI's API doesn't expose per-model pricing. Only models with `"free"` in their names are marked as free (none currently).
263
+
264
+ - **ZenMux provider (PAID)** — Added new **paid** provider for ZenMux AI gateway (https://zenmux.ai), a unified API for 200+ models from OpenAI, Anthropic, Google, etc. **Note: ZenMux is a paid provider** — users must have a ZenMux API key with credits. The provider uses Route A detection (OR logic) since ZenMux exposes pricing. Models marked as free only if `cost === 0` OR `"free"` in name (2 free models identified: GLM 4.7 Flash Free, GLM 4.6v Flash Free).
265
+
266
+ - **Comprehensive `isFreeModel` test suite** — Added 30+ unit tests covering Route A, Route B, freemium behavior, and edge cases. Tests verify correct classification on actual OpenRouter API data (371 models, 30 free).
267
+
268
+ - **Toggle commands for dynamic built-in providers** — Added `/toggle-mistral`, `/toggle-groq`,
269
+ `/toggle-cerebras`, `/toggle-xai`, and `/toggle-huggingface` commands. These providers were
270
+ registered with the global toggle system but lacked per-provider toggle commands, making
271
+ free/paid switching inaccessible without editing config files.
272
+
273
+ - **Lazy auto-probe for NVIDIA models** — Extracted `runNvidiaProbe()` into a shared function
274
+ called automatically on first `session_start` (once per session). Previously, users had to
275
+ manually run `/probe-nvidia` to discover 404 models. Now broken models are detected and
276
+ auto-hidden on first use.
277
+
278
+ ### Changed
279
+
280
+ - **Cline provider now uses `isFreeModel`** — Fixed Cline to use the consistent `isFreeModel` helper instead of `m.cost.input === 0`. Previously used cost-only filtering, now uses proper OR logic for pricing-exposed providers.
281
+
282
+ - **NVIDIA test expectations updated** — Updated tests to reflect strict Route B behavior (name-only detection for non-pricing-exposed providers). Added test for models with `"free"` in name being marked as free.
283
+
284
+ ### Fixed
285
+
286
+ - **`provider-factory.ts` `beforeProviderRequest` hook now scoped to owning provider** —
287
+ The hook was firing for **all** provider requests regardless of which provider the factory
288
+ was configuring. Now checks `evt.provider !== def.providerId` and returns early if the
289
+ event doesn't belong to the owning provider.
290
+
291
+ - **`provider-factory.ts` `reRegister` callback no longer corrupts stored model lists** —
292
+ When toggling between free/paid modes, the callback was overwriting `stored.all` with only
293
+ the filtered subset, losing the original full model list. Now preserves the original model
294
+ lists for correct subsequent toggling.
295
+
296
+ - **`lib/types.ts` Removed leftover `LspTestInterface`** — Removed a test interface that
297
+ was left in production code.
298
+
299
+ - **`index.ts` Removed redundant `.catch()` on deprecated Qwen provider** — The `.catch()`
300
+ was unnecessary since `Promise.allSettled` already handles rejections.
301
+
302
+ ### Removed
303
+
304
+ - **Qwen provider (deprecated)** — Removed Qwen OAuth provider as the 1,000 req/day free tier is no longer available. Provider remains functional for existing authenticated users but new free tier registrations are not supported.
305
+
306
+ - **Modal provider** — Removed single-model Modal provider (only had GLM-5.1 FP8). Users should use other providers for GLM models.
307
+
308
+ - **Cloudflare provider** — Removed Cloudflare Workers AI provider as it's now built into pi core. Users can use pi's built-in Cloudflare provider instead.
309
+
310
+ - **Qwen test file** — Removed `tests/qwen.test.ts` along with the deprecated provider.
311
+
312
+ ## [2.0.2] - 2026-04-26
313
+
314
+ ### Added
315
+
316
+ - **Model matching debug logging** — Added `~/.pi/modelmatch.log` to diagnose which models get Coding Index scores and which don't:
317
+ - Logs every matching attempt with provider, model ID, normalization strategy, and result
318
+ - CSV-like format: `timestamp|provider|modelId|modelName|action|strategy|normalizedId|matchKey|codingIndex|details`
319
+ - Provider-specific normalizers for better matching:
320
+ - **NVIDIA**: Strips vendor prefixes (`meta/`, `mistralai/`, `microsoft/`, `qwen/`, etc.)
321
+ - **Cloudflare**: Strips `@cf/namespace/` prefixes
322
+ - **Groq**: Removes `-versatile` and numeric context suffixes (`-32768`)
323
+ - **Cerebras**: Normalizes `llama3.1` → `llama-3.1`, auto-adds `instruct` suffix
324
+ - **Mistral**: Strips `-latest` suffix
325
+ - **Ollama**: Converts `model:tag` → `model-tag`
326
+ - Common suffix stripping: `:free`, date codes (`-20250514`), versions (`-v1.1`), `-it`, `-fp8`/`-bf16`
327
+
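A few of the normalization rules listed above, sketched as one function. This covers only the Mistral, Ollama, and Groq `-versatile` rules plus `:free` stripping, and the function name and rule ordering are assumptions rather than the package's actual code.

```typescript
// Removes `suffix` from the end of `value` if present.
function stripSuffix(value: string, suffix: string): string {
  return value.endsWith(suffix) ? value.slice(0, -suffix.length) : value;
}

// Hypothetical provider-aware normalizer covering a subset of the listed rules.
function normalizeModelId(provider: string, modelId: string): string {
  // Common rule first: drop the `:free` marker before provider-specific handling.
  let id = stripSuffix(modelId.toLowerCase(), ":free");
  switch (provider) {
    case "mistral":
      id = stripSuffix(id, "-latest");
      break;
    case "ollama":
      // `model:tag` becomes `model-tag`.
      id = id.split(":").join("-");
      break;
    case "groq":
      id = stripSuffix(id, "-versatile");
      break;
    default:
      break;
  }
  return id;
}
```

Keeping the rules per provider avoids over-normalizing: a `:` is a tag separator for Ollama but part of the `:free` marker elsewhere.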
328
+ - **Enhanced benchmark lookup** — `enhanceModelNameWithCodingIndex()` now accepts optional `provider` parameter for provider-aware normalization
329
+
330
+ - **Static 404 model blocklist for NVIDIA** — Probed all 136 models from `integrate.api.nvidia.com/v1/models` and identified 57 that return 404 "Function not found" on `/v1/chat/completions`. These are now hard-filtered so they never appear in the model selector:
331
+ - Covers discontinued models (`databricks/dbrx-instruct`, `meta/codellama-70b`, `meta/llama2-70b`, `ibm/granite-*`, etc.)
332
+ - Covers embedding-only models listed as chat-capable (`nvidia/nv-embed-v1`, `nvidia/nv-embedqa-*`, `snowflake/arctic-embed-l`, etc.)
333
+ - Covers stale API catalog entries (`mistralai/mistral-large`, `mistralai/mistral-large-2-instruct`, `writer/palmyra-*`, etc.)
334
+ - Full list in `NVIDIA_KNOWN_404_MODELS` in `providers/nvidia/nvidia.ts`
335
+
336
+ - **`/probe-nvidia` command** — On-demand model health check. Tests every registered NVIDIA model with a minimal `max_tokens: 1` request, auto-hides any new 404s in `~/.pi/free.json`, and re-registers the provider immediately.
337
+
338
+ - **`scripts/probe-nvidia.mjs`** — Standalone Node.js script to reproduce the probe. Reads `~/.pi/free.json` for the API key, batches 20 requests at a time with 10s timeout, and prints all broken model IDs for adding to the blocklist.
339
+
340
+ - **Ollama Cloud 403 handling** — Same pattern as NVIDIA 404s for Ollama Cloud:
341
+ - `OLLAMA_KNOWN_403_MODELS` blocklist for models that return 403 "access denied"
342
+ - `/probe-ollama` command to test all models on-demand, auto-hide broken ones, and re-register
343
+ - `scripts/probe-ollama.mjs` standalone script for blocklist maintenance
344
+
345
+ - **Provider-scoped hidden models** — Hidden models are now provider-specific:
346
+ - Format: `"provider/model-id"` (e.g., `"ollama/kimi-k2.6"`, `"nvidia/broken-model"`)
347
+ - A model hidden from one provider doesn't hide it from other providers
348
+ - Backward compatible with old global `"model-id"` format
349
+ - All providers updated: NVIDIA, Ollama, Cloudflare, Cline, Kilo, Modal
350
+
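A minimal sketch of the provider-scoped lookup, including the backward-compatible global format, is shown below. The helper name `isHidden` and its signature are assumptions; only the two entry formats come from the entry above.

```typescript
// Hypothetical helper: checks both the scoped "provider/model-id" format
// and the legacy global "model-id" format.
function isHidden(
  hidden: readonly string[],
  provider: string,
  modelId: string,
): boolean {
  const scoped = `${provider}/${modelId}`;
  return hidden.some((entry) => entry === scoped || entry === modelId);
}

const hiddenModels = ["ollama/kimi-k2.6", "legacy-model"];
const hiddenOnOllama = isHidden(hiddenModels, "ollama", "kimi-k2.6");
const hiddenOnNvidia = isHidden(hiddenModels, "nvidia", "kimi-k2.6");
```

One caveat of this simple approach: a legacy entry whose model ID itself contains a slash is indistinguishable from a scoped entry, so a real implementation may need extra disambiguation.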
351
+ ### Fixed
352
+
353
+ - **Probe commands timeout handling** — Added `fetchWithTimeout` with 10-second timeout to `/probe-nvidia` and `/probe-ollama` commands. Prevents the coding harness from freezing when individual model probe requests hang indefinitely.
354
+
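One common way to implement a helper like `fetchWithTimeout` is with `AbortController`, sketched below. The signature, the injectable `fetchImpl` parameter (added here so the helper can be exercised without a network), and the default values are assumptions; the package's actual implementation is not shown in this changelog.

```typescript
// A sketch under stated assumptions, not the package's real helper.
async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 10_000, // 10-second timeout from the changelog entry
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}
```

A hung request then rejects with an abort error instead of blocking the caller indefinitely, so the probe loop can move on to the next model.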
355
+ - **NVIDIA provider now sends `authHeader: true`** — Explicitly enables `Authorization: Bearer` header injection. Previously relied on pi's implicit behavior which could fail in some configurations.
356
+
357
+ ### Removed
358
+
359
+ - **NVIDIA 404 model warning log** — Removed the `console.warn("[nvidia] Skipping known 404 model: ...")` output when filtering out known broken models. The filter still works silently; use `/probe-nvidia` to identify new 404s if needed.
360
+
361
+ ### Changed
362
+
363
+ - **Cloudflare provider now fetches models dynamically**: Replaced the static 19-model hardcoded list with a live API fetch from `api.cloudflare.com/client/v4/accounts/{account_id}/ai/models`:
364
+ - Automatically discovers all 30+ text generation models (was manually maintaining 19)
365
+ - Smart filtering excludes embeddings, image generation, speech, translation, and vision-only models via regex patterns
366
+ - Metadata inference from model IDs: detects vision (`vision`/`multimodal`), reasoning (`r1`/`thinking`/`qwq`), context windows, and estimated costs
367
+ - Fixed Mistral Small ID: changed from incorrect `@cf/mistralai/...` to correct `@cf/mistral/...`
368
+ - Added new fallback models: Kimi K2.6, OpenAI GPT-OSS 120B/20B, Qwen 2.5 Coder 32B, QwQ 32B, Llama 3.2 11B Vision
369
+ - Graceful fallback to expanded 18-model hardcoded list if API fetch fails
370
+
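The ID-based metadata inference can be illustrated with a small sketch. The keyword lists (`vision`/`multimodal`, `r1`/`thinking`/`qwq`) come from the entry above; the function name, return shape, and the boundary handling around `r1` are assumptions.

```typescript
interface InferredCapabilities {
  vision: boolean;
  reasoning: boolean;
}

// Hypothetical helper: guesses capabilities from keywords in the model ID.
function inferCapabilities(modelId: string): InferredCapabilities {
  const id = modelId.toLowerCase();
  return {
    vision: /vision|multimodal/.test(id),
    // Require non-alphanumeric boundaries around "r1" so it does not
    // match inside longer tokens.
    reasoning: /(^|[^a-z0-9])r1([^a-z0-9]|$)|thinking|qwq/.test(id),
  };
}
```

Heuristics like this trade accuracy for coverage: they let newly listed models appear immediately, but an unusual ID can be misclassified.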
371
+ - **NVIDIA provider now queries NVIDIA's API directly** — Source of truth switched from `models.dev` curated JSON to `https://integrate.api.nvidia.com/v1/models`:
372
+ - Eliminates 57 missing models and 25 stale entries from the old third-party source
373
+ - Models not in `models.dev` get inferred metadata (128k context, 4k output, vision/reasoning heuristics)
374
+ - Added regex-based non-chat model filtering for unknown models (embeddings, whisper, reward models, safety guards, parsers, detectors, etc.)
375
+ - Graceful fallback to `models.dev` if NVIDIA API is unreachable
376
+ - Removed paid/free toggle filtering — NVIDIA is freemium (all models use free credits)
377
+
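For models missing from `models.dev`, the combination of regex filtering and inferred defaults might be sketched as below. The single combined pattern, the helper name, and the exact numeric values chosen for "128k" and "4k" are assumptions; the changelog only names the categories and the approximate sizes.

```typescript
// Assumed pattern covering the categories named in the entry
// (embeddings, whisper, reward models, safety guards, parsers, detectors).
const NON_CHAT_PATTERN = /embed|whisper|reward|guard|parse|detect/i;

interface InferredModel {
  id: string;
  contextWindow: number;
  maxTokens: number;
}

// Hypothetical helper: returns null for non-chat models, defaults otherwise.
function inferUnknownModel(id: string): InferredModel | null {
  if (NON_CHAT_PATTERN.test(id)) return null;
  // "128k context, 4k output" interpreted here as 128,000 and 4,096 tokens.
  return { id, contextWindow: 128_000, maxTokens: 4_096 };
}
```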
378
+ ## [2.0.1] - 2026-04-24
379
+
380
+ ### Added
381
+
382
+ - **Built-in provider toggle support** (`lib/built-in-toggle.ts`) — Enables free/paid filtering for Pi's built-in providers that expose per-model pricing:
383
+ - **OpenCode (`/toggle-opencode`)**: Captures built-in OpenCode models on session start and filters to free-only by default
384
+ - **OpenRouter (`/toggle-openrouter`)** — Now uses the built-in toggle system for consistency
385
+ - Toggle works in the current session (no restart needed)
386
+ - Persisted via `opencode_show_paid` and `openrouter_show_paid` in `~/.pi/free.json`
387
+
388
+ ### Changed
389
+
390
+ - **OpenRouter moved to built-in toggle system** — OpenRouter is now handled by `lib/built-in-toggle.ts` alongside OpenCode for a unified approach:
391
+ - Removed from `providers/dynamic-built-in/index.ts`
392
+ - Eliminated duplicate toggle command registration logic
393
+ - Consolidated toggle persistence with other built-in providers
394
+
395
+ - **Standardized all toggle commands to `toggle-{provider}`**: Renamed from `{provider}-toggle` for consistency:
396
+ - `/kilo-toggle` → `/toggle-kilo`
397
+ - `/cline-toggle` → `/toggle-cline`
398
+ - `/openrouter-toggle` → `/toggle-openrouter`
399
+ - `/nvidia-toggle` → `/toggle-nvidia`
400
+ - `/cloudflare-toggle` → `/toggle-cloudflare`
401
+ - `/ollama-toggle` → `/toggle-ollama`
402
+ - `/mistral-toggle` → `/toggle-mistral`
403
+ - `/groq-toggle` → `/toggle-groq`
404
+ - `/cerebras-toggle` → `/toggle-cerebras`
405
+ - `/toggle-opencode` (new)
406
+
407
+ ### Fixed
408
+
409
+ - **Ollama Cloud model fetching endpoint** — Corrected the `/v1/models` → `/models` endpoint path in `providers/ollama/ollama.ts`:
410
+ - The previous fix (2.0.0) incorrectly used `/v1/models`; Ollama Cloud's models endpoint is `/v1/models` for chat completions but `/models` for listing
411
+ - This ensures model fetching works correctly with the OpenAI-compatible API
412
+
413
+ ### Removed
414
+
415
+ - **Global `/free` command** — Removed the global free-only toggle. Per-provider toggles (`/toggle-{provider}`) are now the only way to switch between free and paid models. The `/free-providers` status command remains.
416
+
417
+ ## [2.0.0] - 2026-04-23
418
+
419
+ ### Breaking Changes
420
+
421
+ - **Removed Fireworks provider**: Fireworks is now a built-in Pi provider (added in pi 0.68.1), so the extension's Fireworks provider has been removed to avoid conflicts:
422
+ - Deleted `providers/fireworks/fireworks.ts` and `tests/fireworks.test.ts`
423
+ - Removed all Fireworks configuration options from `config.ts` (`fireworks_api_key`, `fireworks_show_paid`)
424
+ - Users should now use Pi's built-in Fireworks support with `FIREWORKS_API_KEY`
425
+
426
+ - **Renamed Ollama provider to `ollama-cloud`** — Changed provider ID from `"ollama"` to `"ollama-cloud"` to avoid collision with Pi's built-in local Ollama provider:
427
+ - This prevents provider ID conflicts when both are registered
428
+ - All log messages and documentation now reference "Ollama Cloud"
429
+
430
+ ### Removed
431
+
432
+ - **Dropped `@sinclair/typebox` peer dependency** — Pi 0.69.0 migrated from `@sinclair/typebox` to `typebox` 1.x. The extension didn't directly import this package, so it was removed from `peerDependencies` to avoid potential conflicts.
433
+
434
+ ### Fixed
435
+
436
+ - **Ollama Cloud API endpoint** — Fixed broken Ollama Cloud integration:
437
+ - Changed `BASE_URL_OLLAMA` from `https://ollama.com` to `https://ollama.com/v1` — the OpenAI-compatible API endpoint
438
+ - Fixed model fetching to use `/v1/models` instead of `/api/tags` — ensures model IDs work with chat completions endpoint
439
+ - Previously, calls went to the HTML homepage instead of the API endpoints, causing 404 errors
440
+
441
+ ### Removed
442
+
443
+ - **Removed paid model warning on selection**: Deleted the `model_select` event handler that showed:
444
+ - `⚠️ Paid model selected (${model.id}). Use "/free off" to enable paid models.`
445
+ - This warning was redundant since the global `/free` toggle and provider toggles already control model visibility
446
+
447
+ - **Removed pointless `/modal-toggle` command**: The Modal provider has only 1 free model (GLM-5.1 FP8), so there was nothing meaningful to toggle:
448
+ - Added `skipToggle` option to `ProviderDefinition` and `ProviderSetupConfig` interfaces
449
+ - Modal provider now sets `skipToggle: true` to prevent toggle command creation
450
+
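The effect of `skipToggle` can be illustrated with a small sketch. `ProviderDefinition` and `skipToggle` are named in the entry above; the `id` field, the helper `toggleCommandNames`, and the way commands are derived are assumptions, using the `{provider}-toggle` naming that was current in this release.

```typescript
interface ProviderDefinition {
  id: string; // assumed field, for illustration only
  skipToggle?: boolean; // when true, no toggle command is created
}

// Hypothetical helper: lists the toggle commands that would be registered.
function toggleCommandNames(providers: ProviderDefinition[]): string[] {
  return providers
    .filter((provider) => !provider.skipToggle)
    .map((provider) => `${provider.id}-toggle`);
}

const commands = toggleCommandNames([
  { id: "modal", skipToggle: true },
  { id: "kilo" },
]);
```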
451
+ ### Changed
452
+
453
+ - **Marked Qwen provider as fully deprecated** — Updated messaging to clarify the provider is broken:
454
+ - Changed model name from `"Qwen Coder Free 1k/day"` to `"Qwen Coder DEPRECATED (free tier discontinued)"`
455
+ - Updated all JSDoc comments to clearly state auth is broken and free tier is no longer available
456
+ - Provider remains for backward compatibility but should not be used
457
+
458
+ ### Added
459
+
460
+ - **Cloudflare Workers AI provider** — New provider for Cloudflare's serverless GPU platform:
461
+ - 50+ open-source models: Llama 4, Mistral Small 3.1, Qwen 2.5/3, DeepSeek R1, Gemma 4, Kimi K2.5/2.6, and more
462
+ - **10,000 Neurons/day FREE tier** (resets daily at 00:00 UTC)
463
+ - **$0.011 per 1,000 Neurons** beyond free allocation
464
+ - Only requires `CLOUDFLARE_API_TOKEN`; the account ID is auto-derived from the token
465
+ - Toggle with `/cloudflare-toggle`
466
+ - Create token at https://dash.cloudflare.com/profile/api-tokens
467
+
468
+ - **Unified dynamic built-in providers module** — New `providers/dynamic-built-in/` module that dynamically fetches models from Pi's built-in providers when users have API keys:
469
+ - **Mistral** (`MISTRAL_API_KEY`): Fetches from `api.mistral.ai/v1/models`
470
+ - **Groq** (`GROQ_API_KEY`): Fetches from `api.groq.com/openai/v1/models`
471
+ - **Cerebras** (`CEREBRAS_API_KEY`): Fetches from `api.cerebras.ai/v1/models`
472
+ - **xAI** (`XAI_API_KEY`): Fetches from `api.x.ai/v1/models`
473
+ - **Hugging Face** (`HF_TOKEN` — optional) — Fetches public + authenticated models
474
+ - **OpenRouter** — Moved from `index.ts` to unified module with dynamic fetch
475
+ - All integrate with global `/free` toggle and have per-provider toggle commands (`/mistral-toggle`, `/groq-toggle`, etc.)
476
+
477
+ - **Global `/free` toggle system**: New centralized free/paid filtering across ALL providers:
478
+ - `/free on/off/status` — Toggle free-only view globally
479
+ - `/free-providers` — Show free/paid model counts by provider
480
+ - `FREE_ONLY` config option and `PI_FREE_ONLY` environment variable
481
+ - Providers register via `registerWithGlobalToggle()` for unified filtering
482
+
483
+ ### Fixed
484
+
485
+ - **Toggle commands now actually filter models from UI**: Previously, toggle commands only showed notifications but didn't remove paid models from the model picker:
486
+ - **OpenRouter (`/openrouter-toggle`)**: Now uses `registerProvider`/`unregisterProvider` to actually filter models from the picker UI
487
+ - **NVIDIA (`/nvidia-toggle`)**: Added dynamic `showPaid` parameter to `fetchNvidiaModels()` so toggle properly switches between free and paid model sets
488
+ - **Fireworks**: Removed broken toggle command; all models are paid with no free tier, so there was nothing to toggle
489
+
490
+ ### Added
491
+
492
+ - **OpenRouter per-provider free model toggle**: Added `/openrouter-toggle` command for the built-in OpenRouter provider:
493
+ - `/openrouter-toggle`: Switch between showing only free models vs all models (including paid)
494
+ - New config flag `openrouter_show_paid` in `~/.pi/free.json` (default: `false`)
495
+ - Environment variable: `OPENROUTER_SHOW_PAID=true` to show paid models by default
496
+ - This brings OpenRouter (a built-in pi provider) in line with extension providers that have per-provider toggles
497
+
498
+ ### Deprecated
499
+
500
+ - **Qwen provider** — The 1,000 requests/day free tier is no longer available from Qwen/DashScope. The provider code remains for backward compatibility but is now deprecated:
501
+ - Added `@deprecated` JSDoc tags to all Qwen-related exports
502
+ - Added deprecation warning when Qwen provider loads
503
+ - Added warning when `QWEN_SHOW_PAID` config is used
504
+ - Consider migrating to other free providers: Kilo, Cline, NVIDIA, or Modal
505
+
506
+ ### Added
507
+
508
+ - **Go provider** — OpenCode Go subscription gateway (⚠️ paid only — $5 first month, then $10/month, no free tier) with models: GLM-5, Kimi K2.5, MiMo-V2-Pro, MiMo-V2-Omni, MiniMax M2.7, MiniMax M2.5
509
+ - Set `OPENCODE_GO_API_KEY` or `opencode_go_api_key` in `~/.pi/free.json`
510
+ - Toggle with `/go-toggle`
511
+
512
+ ### Fixed
513
+
514
+ - **All providers now show Coding Index scores in model selector** — Added `enhanceWithCI()` to factory-based providers (nvidia, fireworks, mistral, modal, ollama) and cline. Now all providers display CI scores in `/models` command (pi-models extension).
515
+
516
+ - **All providers now show in `--list-models`** — Providers (zen, openrouter, go) that registered models only in `session_start` were missing from `pi --list-models` which runs before session starts. Added immediate registration for these providers:
517
+ - **zen**: Added model caching to `~/.pi/provider-cache.json` for immediate registration + dynamic refresh
518
+ - **openrouter**: Immediate model registration at extension load (like kilo/cline)
519
+ - **go**: Immediate registration with static model list (no API to fetch from)
520
+ - All 11 providers now visible in `--list-models`
521
+
522
+ ### Changed
523
+
524
+ - Updated README with clear free vs paid provider distinction (9 free + 2 paid-only: Go, Fireworks)
525
+ - Added Go and Fireworks provider documentation under new "💳 Paid-Only Providers" section
526
+ - Added `opencode_go_api_key` to config file template
527
+ - Updated package.json description and keywords to include all 11 providers
528
+
529
+ ### Added
530
+
531
+ - **Provider model cache** (`lib/provider-cache.ts`): New utility for caching provider model lists to `~/.pi/provider-cache.json`. Used by zen provider for faster startup and offline access after first successful fetch.
532
+
533
+ ## [1.0.9] - 2026-04-14
534
+
535
+ ### Fixed
536
+
537
+ - **Qwen OAuth breaks other OAuth providers** — `modifyModels` receives all models across every registered provider, not just Qwen's. The previous `map()` stamped the Qwen dashscope `baseUrl` onto every model, causing other OAuth providers (Kilo, OpenRouter, etc.) to return 404 after a `/login qwen` flow. Now only models with `provider === PROVIDER_QWEN` are patched; others pass through unchanged.
538
+
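The shape of this fix can be sketched as a provider-guarded `map()`. The `Model` fields, the value of `PROVIDER_QWEN`, and the helper name are assumptions; the point illustrated is that only Qwen models are patched while all others pass through unchanged, and that the function stays synchronous.

```typescript
interface Model {
  id: string;
  provider: string;
  baseUrl: string;
}

const PROVIDER_QWEN = "qwen"; // assumed value of the constant

// Hypothetical helper: synchronous, returns a Model[] rather than a Promise.
function patchQwenModels(models: Model[], qwenBaseUrl: string): Model[] {
  return models.map((model) =>
    model.provider === PROVIDER_QWEN
      ? { ...model, baseUrl: qwenBaseUrl } // patch Qwen models only
      : model, // every other provider passes through untouched
  );
}
```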
539
+ ## [1.0.8] - 2026-04-13
540
+
541
+ ### Added
542
+
543
+ - **Modal provider**: Free access to GLM-5.1 FP8 (128k context, 16k max output) during promotional period (free until April 30, 2026)
544
+ - Requires a free Modal API key (`MODAL_API_KEY` or `modal_api_key` in `~/.pi/free.json`)
545
+ - Model: `zai-org/GLM-5.1-FP8` (128k context window, 16k max output tokens)
546
+ - **Qwen provider** — Free access to Qwen Coder (1,000 requests/day) via OAuth device flow
547
+ - Run `/login qwen` to authenticate through Qwen Studio (chat.qwen.ai)
548
+ - Uses `coder-model` alias (maps to Qwen3.6-Plus on the backend)
549
+ - 131k context window, 16k max output tokens, zero cost
550
+
551
+ ### Fixed
552
+
553
+ - **Qwen OAuth browser launch on Windows** — URLs with `&` query params were truncated by `cmd.exe`'s `&` command separator; switched to `powershell.exe Start-Process` which passes the URL as a literal string
554
+ - **Qwen API endpoint** — Replicates qwen-code's `getCurrentEndpoint()` logic: uses `resource_url` from OAuth token response (`dashscope.aliyuncs.com` for Chinese accounts, `portal.qwen.ai` for international), with fallback to `dashscope.aliyuncs.com/compatible-mode/v1`
555
+ - **Qwen DashScope headers** — Added all headers required by DashScope's OpenAI-compatible API: `X-DashScope-AuthType: qwen-oauth`, `X-DashScope-CacheControl: enable`, `X-DashScope-UserAgent`, `Client-Code: QwenCode`
556
+ - **Qwen modifyModels crash** — `modifyModels` must be synchronous; making it async caused the pi framework to receive a `Promise` instead of a `Model[]`, breaking `ModelRegistry.find()`
557
+
558
+ ## [1.0.5] - 2025-04-03
559
+
560
+ ### Fixed
561
+
562
+ - **NVIDIA provider non-chat model filtering** (comment/implementation mismatch)
563
+ - Added modalities-based filtering to exclude embedding, speech-to-text, OCR, and image-gen models
564
+ - Filters models where `output` is not `["text"]` (e.g., image generation like `black-forest-labs/flux.1-dev`)
565
+ - Filters models where `input` lacks `"text"` (e.g., OCR like `nvidia/nemoretriever-ocr-v1`, speech-to-text like `openai/whisper-large-v3`)
566
+ - Updated file comment to accurately describe the filtering behavior
567
+ - Added 8 comprehensive unit tests for model filtering logic
568
+
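The two modality rules above translate into a short predicate. The `Modalities` shape and the function name are assumptions; the rules themselves (output must be exactly `["text"]`, input must include `"text"`) come from the entry.

```typescript
interface Modalities {
  input: string[];
  output: string[];
}

// Hypothetical predicate: true only for models usable for text chat.
function isChatModel(modalities: Modalities): boolean {
  const textOnlyOutput =
    modalities.output.length === 1 && modalities.output[0] === "text";
  const acceptsText = modalities.input.includes("text");
  return textOnlyOutput && acceptsText;
}
```

Under these rules an image generator fails the output check, while OCR and speech-to-text models fail the input check even though they produce text.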
569
+ ## [1.0.4] - 2025-04-03
570
+
571
+ ### Fixed
572
+
573
+ - **All tests now passing** (127/127)
574
+ - Fixed mock paths in kilo.test.ts, zen.test.ts, ollama.test.ts
575
+ - Fixed createCtxReRegister mocks in zen.test.ts and openrouter.test.ts
576
+ - Fixed cline.test.ts to test actual provider re-registration behavior
577
+ - Added missing DEFAULT_MIN_SIZE_B constant to openrouter mock
578
+
579
+ ### Changed
580
+
581
+ - **Code quality improvements**
582
+ - Refactored usage modules to break the circular dependency between limits.ts and formatters.ts
583
+ - Created usage/types.ts with shared interfaces (FreeTierLimit, FreeTierUsage)
584
+ - Bumped version to 1.0.4
585
+
586
+ ## [1.0.3] - 2025-04-03
587
+
588
+ ### Changed
589
+
590
+ - Updated package.json metadata (name, description, keywords, repository URL)
591
+ - Updated .npmignore for cleaner publishes
592
+
593
+ ## [1.0.0] - 2024-03-28
594
+
595
+ ### Added
596
+
597
+ - Initial release with 6 providers: Kilo, Zen, OpenRouter, NVIDIA, Cline, Fireworks
598
+ - Free tier usage tracking across all sessions
599
+ - Provider failover with model hopping
600
+ - Autocompact integration for rate limit recovery
601
+ - Usage widget with glimpseui
602
+ - Command toggles for free/all model filtering
603
+ - Hardcoded benchmark data from Artificial Analysis
604
+
605
+ ### Changed
606
+
607
+ - **Major refactoring**: Split free-tier-limits.ts into usage/\* modules
608
+ - usage/tracking.ts - runtime session tracking
609
+ - usage/cumulative.ts - persistent storage
610
+ - usage/formatters.ts - display formatting
611
+ - 77% line reduction (741 → 166 lines)
612
+ - **Major refactoring**: Split usage-widget.ts into widget/\* modules
613
+ - widget/data.ts - data collection
614
+ - widget/format.ts - formatting utilities
615
+ - widget/render.ts - HTML generation
616
+ - 74% line reduction (~350 → 90 lines)
617
+ - **Refactoring**: Extracted functions from cline-auth.ts
618
+ - fetchAuthorizeUrl() - auth URL fetching
619
+ - waitForAuthCode() - callback handling
620
+ - exchangeCodeForTokens() - token exchange
621
+ - parseManualInput() - manual input parsing
622
+ - **Refactoring**: Simplified model-hop.ts complexity
623
+ - Extracted handleDowngradeDecision()
624
+ - Extracted tryAlternativeModel()
625
+ - **Deduplication**: Created shared modules
626
+ - lib/json-persistence.ts - file I/O with caching
627
+ - lib/logger.ts - structured logging
628
+ - providers/model-fetcher.ts - OpenRouter-compatible fetching
629
+ - Replaced ~30 console.log statements with structured logging
630
+ - Fixed all 9 pre-existing test failures
631
+ - fetchWithRetry now throws after last retry
632
+ - Fixed auth pattern matching (added key.*not.*valid)
633
+ - Updated capability ranking tests
634
+ - Added resetUsageStats() for test isolation
635
+
636
+ ### Fixed
637
+
638
+ - fetchWithRetry() now properly throws after exhausting retries
639
+ - Auth error pattern matching now handles more message variants
640
+ - Test isolation for free-tier-limits tests
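The "throws after exhausting retries" behaviour can be illustrated with a generic retry wrapper. This is a sketch of the general pattern, not the package's `fetchWithRetry()`: the name `retryOrThrow`, its signature, and the absence of any delay between attempts are assumptions.

```typescript
// Hypothetical generic wrapper: retries an async operation and rethrows
// the last error once every attempt has failed.
async function retryOrThrow<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  // Surfacing the final error lets callers distinguish failure from success
  // instead of silently receiving undefined.
  throw lastError;
}
```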