pi-free 2.0.8 → 2.0.9

package/CHANGELOG.md CHANGED
@@ -1,540 +1,544 @@
1
- # Changelog
2
-
3
- All notable changes to this project will be documented in this file.
4
-
5
- The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
- and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
-
8
- ## [Unreleased]
9
-
10
- ## [2.0.8] - 2026-05-07
11
-
12
- ### Added
13
-
14
- - **Codestral provider** — Mistral's code-focused model via codestral.mistral.ai.
15
- Free tier (Experiment plan): 2 req/min, 500K tokens/min, 1B tokens/month.
16
- Uses pi's built-in Mistral SDK (`mistral-conversations` API type).
17
-
18
- - **LLM7.io provider** — OpenAI-compatible API gateway routing across
19
- multiple providers (OpenAI, Mistral, Google, DeepSeek, etc.). Free tier:
20
- default/fast selectors, 100 req/hr, 20 req/min.
21
-
22
- - **DeepInfra provider** — AI inference cloud with 100+ open-source models.
23
- $5 one-time credit on signup (no credit card). Models fetched dynamically.
24
- Shown as trial credit provider in `/free-providers`.
25
-
26
- - **SambaNova provider** — Fast inference on custom RDU hardware with
27
- OpenAI-compatible API. All models accessible on free tier (no credit card):
28
- 20-480 RPM. Models include Llama 3.3 70B, DeepSeek-V3/R1, Llama 4 Maverick.
29
- Shown as freemium provider in `/free-providers`.
30
-
31
- ### Changed
32
-
33
- - **Codestral: fixed HTTP 422 error** — Switched API type from
34
- `openai-completions` to `mistral-conversations`. The OpenAI completions
35
- adapter was sending unrecognized fields (`stream_options`, `store`,
36
- `max_completion_tokens`) that Mistral's API rejects with 422.
37
-
38
- ### Fixed
39
-
40
- - **Toggle commands persist across sessions for all providers** — Providers using
41
- `setupProvider` (zenmux, crofai, llm7, sambanova, deepinfra) were always
42
- registering `freeModels` on startup, ignoring the persisted `show_paid` config.
43
- Now each provider reads its config getter and registers the correct initial
44
- model set. Fixes #149.
45
-
46
- ### Security
47
-
48
- - **Log injection prevention** — `scripts/update-benchmarks.ts` sanitizes external
49
- API data (CRLF stripping) before logging. Fixes SonarCloud S1075.
50
-
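The CRLF stripping described in the log-injection entry above can be sketched roughly as follows. `sanitizeForLog` is a hypothetical name used for illustration; the actual helper in `scripts/update-benchmarks.ts` is not shown in this diff and may differ.

```typescript
// Hypothetical helper (not taken from the package source).
// Collapses CR/LF runs so external data cannot start a forged log line.
function sanitizeForLog(value: unknown): string {
  return String(value).replace(/[\r\n]+/g, " ");
}

// Example: a model name from an external API that tries to inject a log entry.
const untrusted = "gpt-x\r\n[INFO] fake entry";
console.log(`model: ${sanitizeForLog(untrusted)}`);
```

The sanitized value stays on one line, so a log reader cannot mistake the injected text for a separate entry.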
51
- ### Reliability
52
-
53
- - **Prefer `String#replaceAll()` over `String#replace()`** — Replaced all 7 flagged
54
- instances. Where regex is unnecessary (2/7), switched to string literal form.
55
- Fixes SonarCloud S4144.
56
-
57
- ### Added
58
-
59
- - **`agents.md`** — Codebase guide for AI agents covering architecture, patterns,
60
- conventions, testing, and the Pi extension API.
61
-
62
- ### Added
63
-
64
- - **Passive quota monitoring** — Extracts rate-limit headers from every
65
- provider response via `after_provider_response` event (no extra API calls).
66
- Tries 6 header format variants (`x-ratelimit-remaining`,
67
- `ratelimit-remaining-requests-day`, etc.). Shows remaining quota in the
68
- status bar with warning icons when ≤25% or ≤10%. Fixes #147.
69
-
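A minimal sketch of the threshold logic in the quota-monitoring entry above, assuming the remaining count and the limit have already been read from response headers. All names here (`quotaLevel`, `readRemaining`) are hypothetical, and the header map is assumed to use lowercase keys; only the 25% and 10% thresholds and the two header names come from the changelog.

```typescript
type QuotaLevel = "ok" | "low" | "critical";

// Hypothetical classifier; thresholds follow the changelog entry (25% and 10%).
function quotaLevel(remaining: number, limit: number): QuotaLevel {
  if (limit <= 0) return "ok"; // unknown limit: nothing to warn about
  const ratio = remaining / limit;
  if (ratio <= 0.1) return "critical";
  if (ratio <= 0.25) return "low";
  return "ok";
}

// Try several header name variants and return the first usable value.
// Assumes header keys were lowercased by the caller.
function readRemaining(
  headers: Record<string, string>,
  names: string[],
): number | undefined {
  for (const name of names) {
    const raw = headers[name.toLowerCase()];
    if (raw === undefined) continue;
    const n = Number.parseInt(raw, 10);
    if (Number.isFinite(n) && n >= 0) return n;
  }
  return undefined;
}

const remaining = readRemaining(
  { "x-ratelimit-remaining": "42" },
  ["ratelimit-remaining-requests-day", "x-ratelimit-remaining"],
);
console.log(remaining, quotaLevel(remaining ?? 0, 500));
```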
70
- ### Fixed
71
-
72
- - **Missing `g` flag on `replaceAll` regexps broke model filtering** —
73
- `String.prototype.replaceAll()` requires a global RegExp; 20+ patterns in
74
- `benchmark-lookup.ts` were missing it, causing a `TypeError` that prevented
75
- models from appearing for providers like cline and kilo. Added `/g` flag to
76
- all affected patterns. Fixes #151.
77
-
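The failure mode in the `replaceAll` entry above is standard JavaScript behavior and can be reproduced in a few lines. The model ID below is only an example value, not taken from `benchmark-lookup.ts`.

```typescript
const modelId = "meta/llama-3.1-70b-instruct";

// With the global flag, replaceAll accepts a RegExp.
const ok = modelId.replaceAll(/-/g, " ");

// Without the flag, the call throws a TypeError at runtime.
let threw = false;
try {
  modelId.replaceAll(/-/, " ");
} catch (err) {
  threw = err instanceof TypeError;
}

console.log(ok, threw);
```

Because the error is thrown at call time rather than at parse time, a single unflagged pattern is enough to abort whatever code path reaches it.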
78
- ### Changed
79
-
80
- - **Resolved ~280 SonarCloud issues across 21 files** — Bulk code-quality
81
- cleanup including: stripping trailing zeros from `toFixed()` (S7748),
82
- `global` → `globalThis` (S7764), `parseFloat` → `Number.parseFloat` (S7773),
83
- naming unnamed async exports (S7726), `String.raw` for path strings (S7780),
84
- top-level await over promise chains (S7785), re-export from source (S7763),
85
- `.at(-1)` over `[length-1]` (S7755), `node:fs` protocol imports (S7772),
86
- and logging user-controlled data sanitization (S5145). Fixes #148.
87
-
88
- ### Security
89
-
90
- - **Bump `basic-ftp` 5.3.0 → 5.3.1** — Patches GHSA-rpmf-866q-6p89 (high
91
- severity): malicious FTP server could cause client-side DoS via unbounded
92
- multiline control response buffering. Fixes `npm audit` finding.
93
-
94
- ### Refactored
95
-
96
- - **Extracted shared model-fetch helper** — `fetchOpenAICompatibleModels()`
97
- in `lib/util.ts` eliminates ~120 lines of duplicated fetch→parse→map
98
- boilerplate across CrofAI, DeepInfra, and SambaNova providers.
99
-
100
- ## [2.0.6] - 2026-05-02
101
-
102
- ### Security
103
-
104
- - **5x S5852 regex super-linear runtime** — Replaced all flagged regex patterns
105
- (nested quantifiers in model size extraction) with manual char-by-char string
106
- parsing in `parseModelSize()`, `normalizeSizeTokenOrder()`, and test helpers.
107
- Eliminates catastrophic backtracking risk.
108
-
109
- - **4x S4036 PATH variable security**
110
- - `open-browser.ts`: Added `resolveExe()` helper that prefers known absolute
111
- paths (`/usr/bin/open`, `C:\Windows\System32\...\powershell.exe`) before
112
- falling back to PATH lookup
113
- - `check-extensions.mjs`: Removed hardcoded PATH override; resolved `npm` via
114
- `execFileSync` with known absolute paths
115
-
116
- - **1x S4721 command injection** — Replaced `execSync` with `execFileSync` in
117
- `resolveExe()` helper. `execFileSync` takes separate arguments and never
118
- spawns a shell, eliminating the injection vector.
119
-
120
- ### Changed
121
-
122
- - **Banner image** — Converted `banner.svg` to `banner.png` for reliable
123
- rendering across all GitHub surfaces (mobile, email, dark mode readers).
124
-
125
- ## [2.0.5] - 2026-05-02
126
-
127
- ### Added
128
-
129
- - **NVIDIA model probe auto-discovery** — Lazy auto-probe for NVIDIA models on
130
- first `session_start` (once per session). Broken 404 models detected and
131
- auto-hidden without requiring manual `/probe-nvidia`.
132
-
133
- ### Changed
134
-
135
- - **Ollama provider updates** — Improved cloud model detection and configuration.
136
-
137
- ## [2.0.4] - 2026-05-02
138
-
139
- ### Fixed
140
-
141
- - **OpenRouter key resolution no longer falls back to `free.json`** —
142
- `getOpenrouterApiKey()` now only checks the `OPENROUTER_API_KEY` environment variable.
143
- Previously it fell back to `~/.pi/free.json`, which could contain stale/revoked keys
144
- that conflict with pi's built-in OpenRouter provider (which reads from
145
- `~/.pi/agent/auth.json`).
146
-
147
- - **Removed `openrouter_api_key` from `PiFreeConfig` interface and config template** —
148
- Prevents future persistence of OpenRouter keys in `free.json`, eliminating the
149
- source of stale key conflicts for built-in providers.
150
-
151
- ## [2.0.3] - 2026-05-02
152
-
153
- ### Added
154
-
155
- - **Consistent `isFreeModel` helper with Route A/B logic** — Created a unified helper for free model detection that automatically detects whether a provider exposes pricing:
156
- - **Route A (pricing-exposed)**: Model is free if `cost === 0` OR `"free"` in name (OR logic)
157
- - **Route B (non-pricing-exposed)**: Model is free only if `"free"` in name
158
- - Dynamic detection: If ALL models have cost === 0, assumes pricing not exposed → uses Route B
159
- - If ANY model has cost > 0, assumes pricing exposed → uses Route A
160
- - All providers (Cline, Kilo, NVIDIA, Ollama, dynamic built-in) now use this consistent helper
161
-
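The Route A/B rules listed above can be sketched as follows. This is an illustration under assumptions, not the package's helper: the `ModelInfo` shape, the single numeric `cost` field, the two-argument signature, and the case-insensitive name match are all hypothetical.

```typescript
// Hypothetical model shape; the real type likely carries more fields.
interface ModelInfo {
  id: string;
  name: string;
  cost: number;
}

// Assumption: the "free" name check is case-insensitive.
function hasFreeInName(m: ModelInfo): boolean {
  return m.name.toLowerCase().includes("free");
}

// Pricing counts as exposed if ANY model reports a cost above zero.
function pricingExposed(all: ModelInfo[]): boolean {
  return all.some((m) => m.cost > 0);
}

function isFreeModel(m: ModelInfo, all: ModelInfo[]): boolean {
  if (pricingExposed(all)) {
    // Route A: zero cost OR "free" in the name.
    return m.cost === 0 || hasFreeInName(m);
  }
  // Route B: all costs are zero, so cost says nothing; use the name only.
  return hasFreeInName(m);
}

const priced: ModelInfo[] = [
  { id: "a", name: "Alpha", cost: 0 },
  { id: "b", name: "Beta", cost: 2 },
];
const unpriced: ModelInfo[] = [
  { id: "c", name: "Gamma", cost: 0 },
  { id: "d", name: "Delta Free", cost: 0 },
];
console.log(priced.map((m) => isFreeModel(m, priced)));
console.log(unpriced.map((m) => isFreeModel(m, unpriced)));
```

The dynamic check matters because a provider that reports zero cost for every model would otherwise have its whole catalog classified as free under Route A.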
162
- - **CrofAI provider (PAID)** — Added new **paid** provider for CrofAI (https://crof.ai), an OpenAI-compatible LLM inference API. **Note: CrofAI is a paid provider** — users must have a CrofAI API key with credits. The provider uses Route B detection (name-only) since CrofAI's API doesn't expose per-model pricing. Only models with `"free"` in their names are marked as free (none currently).
163
-
164
- - **ZenMux provider (PAID)** — Added new **paid** provider for ZenMux AI gateway (https://zenmux.ai), a unified API for 200+ models from OpenAI, Anthropic, Google, etc. **Note: ZenMux is a paid provider** — users must have a ZenMux API key with credits. The provider uses Route A detection (OR logic) since ZenMux exposes pricing. Models marked as free only if `cost === 0` OR `"free"` in name (2 free models identified: GLM 4.7 Flash Free, GLM 4.6v Flash Free).
165
-
166
- - **Comprehensive `isFreeModel` test suite** — Added 30+ unit tests covering Route A, Route B, freemium behavior, and edge cases. Tests verify correct classification on actual OpenRouter API data (371 models, 30 free).
167
-
168
- - **Toggle commands for dynamic built-in providers** — Added `/toggle-mistral`, `/toggle-groq`,
169
- `/toggle-cerebras`, `/toggle-xai`, and `/toggle-huggingface` commands. These providers were
170
- registered with the global toggle system but lacked per-provider toggle commands, making
171
- free/paid switching inaccessible without editing config files.
172
-
173
- - **Lazy auto-probe for NVIDIA models** — Extracted `runNvidiaProbe()` into a shared function
174
- called automatically on first `session_start` (once per session). Previously, users had to
175
- manually run `/probe-nvidia` to discover 404 models. Now broken models are detected and
176
- auto-hidden on first use.
177
-
178
- ### Changed
179
-
180
- - **Cline provider now uses `isFreeModel`** — Fixed Cline to use the consistent `isFreeModel` helper instead of `m.cost.input === 0`. Previously used cost-only filtering, now uses proper OR logic for pricing-exposed providers.
181
-
182
- - **NVIDIA test expectations updated** — Updated tests to reflect strict Route B behavior (name-only detection for non-pricing-exposed providers). Added test for models with `"free"` in name being marked as free.
183
-
184
- ### Fixed
185
-
186
- - **`provider-factory.ts` `beforeProviderRequest` hook now scoped to owning provider** —
187
- The hook was firing for **all** provider requests regardless of which provider the factory
188
- was configuring. Now checks `evt.provider !== def.providerId` and returns early if the
189
- event doesn't belong to the owning provider.
190
-
191
- - **`provider-factory.ts` `reRegister` callback no longer corrupts stored model lists** —
192
- When toggling between free/paid modes, the callback was overwriting `stored.all` with only
193
- the filtered subset, losing the original full model list. Now preserves the original model
194
- lists for correct subsequent toggling.
195
-
196
- - **`lib/types.ts` — Removed leftover `LspTestInterface`** — Removed a test interface that
197
- was left in production code.
198
-
199
- - **`index.ts` — Removed redundant `.catch()` on deprecated Qwen provider** — The `.catch()`
200
- was unnecessary since `Promise.allSettled` already handles rejections.
201
-
202
- ### Removed
203
-
204
- - **Qwen provider (deprecated)** — Removed Qwen OAuth provider as the 1,000 req/day free tier is no longer available. Provider remains functional for existing authenticated users but new free tier registrations are not supported.
205
-
206
- - **Modal provider** — Removed single-model Modal provider (only had GLM-5.1 FP8). Users should use other providers for GLM models.
207
-
208
- - **Cloudflare provider** — Removed Cloudflare Workers AI provider as it's now built into pi core. Users can use pi's built-in Cloudflare provider instead.
209
-
210
- - **Qwen test file** — Removed `tests/qwen.test.ts` along with the deprecated provider.
211
-
212
- ## [2.0.2] - 2026-04-26
213
-
214
- ### Added
215
-
216
- - **Model matching debug logging** — Added `~/.pi/modelmatch.log` to diagnose which models get Coding Index scores and which don't:
217
- - Logs every matching attempt with provider, model ID, normalization strategy, and result
218
- - CSV-like format: `timestamp|provider|modelId|modelName|action|strategy|normalizedId|matchKey|codingIndex|details`
219
- - Provider-specific normalizers for better matching:
220
- - **NVIDIA**: Strips vendor prefixes (`meta/`, `mistralai/`, `microsoft/`, `qwen/`, etc.)
221
- - **Cloudflare**: Strips `@cf/namespace/` prefixes
222
- - **Groq**: Removes `-versatile` and numeric context suffixes (`-32768`)
223
- - **Cerebras**: Normalizes `llama3.1` → `llama-3.1`, auto-adds `instruct` suffix
224
- - **Mistral**: Strips `-latest` suffix
225
- - **Ollama**: Converts `model:tag` → `model-tag`
226
- - Common suffix stripping: `:free`, date codes (`-20250514`), versions (`-v1.1`), `-it`, `-fp8`/`-bf16`
227
-
228
- - **Enhanced benchmark lookup** — `enhanceModelNameWithCodingIndex()` now accepts optional `provider` parameter for provider-aware normalization
229
-
230
- - **Static 404 model blocklist for NVIDIA** — Probed all 136 models from `integrate.api.nvidia.com/v1/models` and identified 57 that return 404 "Function not found" on `/v1/chat/completions`. These are now hard-filtered so they never appear in the model selector:
231
- - Covers discontinued models (`databricks/dbrx-instruct`, `meta/codellama-70b`, `meta/llama2-70b`, `ibm/granite-*`, etc.)
232
- - Covers embedding-only models listed as chat-capable (`nvidia/nv-embed-v1`, `nvidia/nv-embedqa-*`, `snowflake/arctic-embed-l`, etc.)
233
- - Covers stale API catalog entries (`mistralai/mistral-large`, `mistralai/mistral-large-2-instruct`, `writer/palmyra-*`, etc.)
234
- - Full list in `NVIDIA_KNOWN_404_MODELS` in `providers/nvidia/nvidia.ts`
235
-
236
- - **`/probe-nvidia` command** — On-demand model health check. Tests every registered NVIDIA model with a minimal `max_tokens: 1` request, auto-hides any new 404s in `~/.pi/free.json`, and re-registers the provider immediately.
237
-
238
- - **`scripts/probe-nvidia.mjs`** — Standalone Node.js script to reproduce the probe. Reads `~/.pi/free.json` for the API key, batches 20 requests at a time with 10s timeout, and prints all broken model IDs for adding to the blocklist.
239
-
240
- - **Ollama Cloud 403 handling** — Same pattern as NVIDIA 404s for Ollama Cloud:
241
- - `OLLAMA_KNOWN_403_MODELS` blocklist for models that return 403 "access denied"
242
- - `/probe-ollama` command to test all models on-demand, auto-hide broken ones, and re-register
243
- - `scripts/probe-ollama.mjs` standalone script for blocklist maintenance
244
-
245
- - **Provider-scoped hidden models** — Hidden models are now provider-specific:
246
- - Format: `"provider/model-id"` (e.g., `"ollama/kimi-k2.6"`, `"nvidia/broken-model"`)
247
- - A model hidden from one provider doesn't hide it from other providers
248
- - Backward compatible with old global `"model-id"` format
249
- - All providers updated: NVIDIA, Ollama, Cloudflare, Cline, Kilo, Modal
250
-
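One way the provider-scoped lookup with backward compatibility could work is sketched below. `isHidden` and its parameters are hypothetical; the changelog only specifies the `"provider/model-id"` format and that the old global `"model-id"` format is still honored.

```typescript
// Hypothetical lookup (not the package's implementation).
// A model is hidden if the list contains the scoped "provider/model-id" entry
// or a legacy global "model-id" entry.
// Caveat: model IDs that themselves contain "/" make the two formats ambiguous;
// this sketch does not try to resolve that.
function isHidden(provider: string, modelId: string, hidden: string[]): boolean {
  const scoped = `${provider}/${modelId}`;
  return hidden.includes(scoped) || hidden.includes(modelId);
}

const hiddenModels = ["ollama/kimi-k2.6", "legacy-model"];
console.log(isHidden("ollama", "kimi-k2.6", hiddenModels));
console.log(isHidden("nvidia", "kimi-k2.6", hiddenModels));
console.log(isHidden("nvidia", "legacy-model", hiddenModels));
```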
251
- ### Fixed
252
-
253
- - **Probe commands timeout handling** — Added `fetchWithTimeout` with 10-second timeout to `/probe-nvidia` and `/probe-ollama` commands. Prevents the coding harness from freezing when individual model probe requests hang indefinitely.
254
-
255
- - **NVIDIA provider now sends `authHeader: true`** — Explicitly enables `Authorization: Bearer` header injection. Previously relied on pi's implicit behavior which could fail in some configurations.
256
-
257
- ### Removed
258
-
259
- - **NVIDIA 404 model warning log** — Removed the `console.warn("[nvidia] Skipping known 404 model: ...")` output when filtering out known broken models. The filter still works silently; use `/probe-nvidia` to identify new 404s if needed.
260
-
261
- ### Changed
262
-
263
- - **Cloudflare provider now fetches models dynamically** — Replaced static 19-model hardcoded list with live API fetch from `api.cloudflare.com/client/v4/accounts/{account_id}/ai/models`:
264
- - Automatically discovers all 30+ text generation models (was manually maintaining 19)
265
- - Smart filtering excludes embeddings, image generation, speech, translation, and vision-only models via regex patterns
266
- - Metadata inference from model IDs: detects vision (`vision`/`multimodal`), reasoning (`r1`/`thinking`/`qwq`), context windows, and estimated costs
267
- - Fixed Mistral Small ID: changed from incorrect `@cf/mistralai/...` to correct `@cf/mistral/...`
268
- - Added new fallback models: Kimi K2.6, OpenAI GPT-OSS 120B/20B, Qwen 2.5 Coder 32B, QwQ 32B, Llama 3.2 11B Vision
269
- - Graceful fallback to expanded 18-model hardcoded list if API fetch fails
270
-
271
- - **NVIDIA provider now queries NVIDIA's API directly** — Source of truth switched from `models.dev` curated JSON to `https://integrate.api.nvidia.com/v1/models`:
272
- - Eliminates 57 missing models and 25 stale entries from the old third-party source
273
- - Models not in `models.dev` get inferred metadata (128k context, 4k output, vision/reasoning heuristics)
274
- - Added regex-based non-chat model filtering for unknown models (embeddings, whisper, reward models, safety guards, parsers, detectors, etc.)
275
- - Graceful fallback to `models.dev` if NVIDIA API is unreachable
276
- - Removed paid/free toggle filtering — NVIDIA is freemium (all models use free credits)
277
-
278
- ## [2.0.1] - 2026-04-24
279
-
280
- ### Added
281
-
282
- - **Built-in provider toggle support** (`lib/built-in-toggle.ts`) — Enables free/paid filtering for Pi's built-in providers that expose per-model pricing:
283
- - **OpenCode (`/toggle-opencode`)** — Captures built-in OpenCode models on session start and filters to free-only by default
284
- - **OpenRouter (`/toggle-openrouter`)** — Now uses the built-in toggle system for consistency
285
- - Toggle works in the current session (no restart needed)
286
- - Persisted via `opencode_show_paid` and `openrouter_show_paid` in `~/.pi/free.json`
287
-
288
- ### Changed
289
-
290
- - **OpenRouter moved to built-in toggle system** — OpenRouter is now handled by `lib/built-in-toggle.ts` alongside OpenCode for a unified approach:
291
- - Removed from `providers/dynamic-built-in/index.ts`
292
- - Eliminated duplicate toggle command registration logic
293
- - Consolidated toggle persistence with other built-in providers
294
-
295
- - **Standardized all toggle commands to `toggle-{provider}`** — Renamed from `{provider}-toggle` for consistency:
296
- - `/kilo-toggle` → `/toggle-kilo`
297
- - `/cline-toggle` → `/toggle-cline`
298
- - `/openrouter-toggle` → `/toggle-openrouter`
299
- - `/nvidia-toggle` → `/toggle-nvidia`
300
- - `/cloudflare-toggle` → `/toggle-cloudflare`
301
- - `/ollama-toggle` → `/toggle-ollama`
302
- - `/mistral-toggle` → `/toggle-mistral`
303
- - `/groq-toggle` → `/toggle-groq`
304
- - `/cerebras-toggle` → `/toggle-cerebras`
305
- - `/toggle-opencode` (new)
306
-
307
- ### Fixed
308
-
309
- - **Ollama Cloud model fetching endpoint** — Corrected the `/v1/models` → `/models` endpoint path in `providers/ollama/ollama.ts`:
310
- - The previous fix (2.0.0) incorrectly used `/v1/models`; Ollama Cloud's models endpoint is `/v1/models` for chat completions but `/models` for listing
311
- - This ensures model fetching works correctly with the OpenAI-compatible API
312
-
313
- ### Removed
314
-
315
- - **Global `/free` command** — Removed the global free-only toggle. Per-provider toggles (`/toggle-{provider}`) are now the only way to switch between free and paid models. The `/free-providers` status command remains.
316
-
317
- ## [2.0.0] - 2026-04-23
318
-
319
- ### Breaking Changes
320
-
321
- - **Removed Fireworks provider** — Fireworks is now a built-in Pi provider (added in pi 0.68.1), so the extension's Fireworks provider has been removed to avoid conflicts:
322
- - Deleted `providers/fireworks/fireworks.ts` and `tests/fireworks.test.ts`
323
- - Removed all Fireworks configuration options from `config.ts` (`fireworks_api_key`, `fireworks_show_paid`)
324
- - Users should now use Pi's built-in Fireworks support with `FIREWORKS_API_KEY`
325
-
326
- - **Renamed Ollama provider to `ollama-cloud`** — Changed provider ID from `"ollama"` to `"ollama-cloud"` to avoid collision with Pi's built-in local Ollama provider:
327
- - This prevents provider ID conflicts when both are registered
328
- - All log messages and documentation now reference "Ollama Cloud"
329
-
330
- ### Removed
331
-
332
- - **Dropped `@sinclair/typebox` peer dependency** — Pi 0.69.0 migrated from `@sinclair/typebox` to `typebox` 1.x. The extension didn't directly import this package, so it was removed from `peerDependencies` to avoid potential conflicts.
333
-
334
- ### Fixed
335
-
336
- - **Ollama Cloud API endpoint** — Fixed broken Ollama Cloud integration:
337
- - Changed `BASE_URL_OLLAMA` from `https://ollama.com` to `https://ollama.com/v1` — the OpenAI-compatible API endpoint
338
- - Fixed model fetching to use `/v1/models` instead of `/api/tags` — ensures model IDs work with chat completions endpoint
339
- - Previously calls went to HTML homepage instead of API endpoints, causing 404 errors
340
-
341
- ### Removed
342
-
343
- - **Removed paid model warning on selection** — Deleted the `model_select` event handler that showed:
344
- - `⚠️ Paid model selected (${model.id}). Use "/free off" to enable paid models.`
345
- - This warning was redundant since the global `/free` toggle and provider toggles already control model visibility
346
-
347
- - **Removed pointless `/modal-toggle` command** — Modal provider only has 1 free model (GLM-5.1 FP8), so there was nothing meaningful to toggle:
348
- - Added `skipToggle` option to `ProviderDefinition` and `ProviderSetupConfig` interfaces
349
- - Modal provider now sets `skipToggle: true` to prevent toggle command creation
350
-
351
- ### Changed
352
-
353
- - **Marked Qwen provider as fully deprecated** — Updated messaging to clarify the provider is broken:
354
- - Changed model name from `"Qwen Coder — Free 1k/day"` to `"Qwen Coder — DEPRECATED (free tier discontinued)"`
355
- - Updated all JSDoc comments to clearly state auth is broken and free tier is no longer available
356
- - Provider remains for backward compatibility but should not be used
357
-
358
- ### Added
359
-
360
- - **Cloudflare Workers AI provider** — New provider for Cloudflare's serverless GPU platform:
361
- - 50+ open-source models: Llama 4, Mistral Small 3.1, Qwen 2.5/3, DeepSeek R1, Gemma 4, Kimi K2.5/2.6, and more
362
- - **10,000 Neurons/day FREE tier** (resets daily at 00:00 UTC)
363
- - **$0.011 per 1,000 Neurons** beyond free allocation
364
- - Only requires `CLOUDFLARE_API_TOKEN` — account ID auto-derived from token
365
- - Toggle with `/cloudflare-toggle`
366
- - Create token at https://dash.cloudflare.com/profile/api-tokens
367
-
368
- - **Unified dynamic built-in providers module** — New `providers/dynamic-built-in/` module that dynamically fetches models from Pi's built-in providers when users have API keys:
369
- - **Mistral** (`MISTRAL_API_KEY`) — Fetches from `api.mistral.ai/v1/models`
370
- - **Groq** (`GROQ_API_KEY`) — Fetches from `api.groq.com/openai/v1/models`
371
- - **Cerebras** (`CEREBRAS_API_KEY`) — Fetches from `api.cerebras.ai/v1/models`
372
- - **xAI** (`XAI_API_KEY`) — Fetches from `api.x.ai/v1/models`
373
- - **Hugging Face** (`HF_TOKEN` — optional) — Fetches public + authenticated models
374
- - **OpenRouter** — Moved from `index.ts` to unified module with dynamic fetch
375
- - All integrate with global `/free` toggle and have per-provider toggle commands (`/mistral-toggle`, `/groq-toggle`, etc.)
376
-
377
- - **Global `/free` toggle system** — New centralized free/paid filtering across ALL providers:
378
- - `/free on/off/status` — Toggle free-only view globally
379
- - `/free-providers` — Show free/paid model counts by provider
380
- - `FREE_ONLY` config option and `PI_FREE_ONLY` environment variable
381
- - Providers register via `registerWithGlobalToggle()` for unified filtering
382
-
383
- ### Fixed
384
-
385
- - **Toggle commands now actually filter models from UI** — Previously, toggle commands only showed notifications but didn't remove paid models from the model picker:
386
- - **OpenRouter (`/openrouter-toggle`)**: Now uses `registerProvider`/`unregisterProvider` to actually filter models from the picker UI
387
- - **NVIDIA (`/nvidia-toggle`)**: Added dynamic `showPaid` parameter to `fetchNvidiaModels()` so toggle properly switches between free and paid model sets
388
- - **Fireworks**: Removed broken toggle command — all models are paid with no free tier, so there was nothing to toggle
389
-
390
- ### Added
391
-
392
- - **OpenRouter per-provider free model toggle** — Added `/openrouter-toggle` command for the built-in OpenRouter provider:
393
- - `/openrouter-toggle` — Switch between showing only free models vs all models (including paid)
394
- - New config flag `openrouter_show_paid` in `~/.pi/free.json` (default: `false`)
395
- - Environment variable: `OPENROUTER_SHOW_PAID=true` to show paid models by default
396
- - This brings OpenRouter (a built-in pi provider) in line with extension providers that have per-provider toggles
397
-
398
- ### Deprecated
399
-
400
- - **Qwen provider** — The 1,000 requests/day free tier is no longer available from Qwen/DashScope. The provider code remains for backward compatibility but is now deprecated:
401
- - Added `@deprecated` JSDoc tags to all Qwen-related exports
402
- - Added deprecation warning when Qwen provider loads
403
- - Added warning when `QWEN_SHOW_PAID` config is used
404
- - Consider migrating to other free providers: Kilo, Cline, NVIDIA, or Modal
405
-
406
- ### Added
407
-
408
- - **Go provider** — OpenCode Go subscription gateway (⚠️ paid only — $5 first month, then $10/month, no free tier) with models: GLM-5, Kimi K2.5, MiMo-V2-Pro, MiMo-V2-Omni, MiniMax M2.7, MiniMax M2.5
409
- - Set `OPENCODE_GO_API_KEY` or `opencode_go_api_key` in `~/.pi/free.json`
410
- - Toggle with `/go-toggle`
411
-
412
- ### Fixed
413
-
414
- - **All providers now show Coding Index scores in model selector** — Added `enhanceWithCI()` to factory-based providers (nvidia, fireworks, mistral, modal, ollama) and cline. Now all providers display CI scores in `/models` command (pi-models extension).
415
-
416
- - **All providers now show in `--list-models`** — Providers (zen, openrouter, go) that registered models only in `session_start` were missing from `pi --list-models` which runs before session starts. Added immediate registration for these providers:
417
- - **zen**: Added model caching to `~/.pi/provider-cache.json` for immediate registration + dynamic refresh
418
- - **openrouter**: Immediate model registration at extension load (like kilo/cline)
419
- - **go**: Immediate registration with static model list (no API to fetch from)
420
- - All 11 providers now visible in `--list-models`
421
-
422
- ### Changed
423
-
424
- - Updated README with clear free vs paid provider distinction (9 free + 2 paid-only: Go, Fireworks)
425
- - Added Go and Fireworks provider documentation under new "💳 Paid-Only Providers" section
426
- - Added `opencode_go_api_key` to config file template
427
- - Updated package.json description and keywords to include all 11 providers
428
-
429
- ### Added
430
-
431
- - **Provider model cache** (`lib/provider-cache.ts`) — New utility for caching provider model lists to `~/.pi/provider-cache.json`. Used by zen provider for faster startup and offline access after first successful fetch.
432
-
433
- ## [1.0.9] - 2026-04-14
434
-
435
- ### Fixed
436
-
437
- - **Qwen OAuth breaks other OAuth providers** — `modifyModels` receives all models across every registered provider, not just Qwen's. The previous `map()` stamped the Qwen dashscope `baseUrl` onto every model, causing other OAuth providers (Kilo, OpenRouter, etc.) to return 404 after a `/login qwen` flow. Now only models with `provider === PROVIDER_QWEN` are patched; others pass through unchanged.
438
-
439
- ## [1.0.8] - 2026-04-13
440
-
441
- ### Added
442
-
443
- - **Modal provider** — Free access to GLM-5.1 FP8 (128k context, 16k max output) during promotional period (free until April 30, 2026)
444
- - Requires a free Modal API key (`MODAL_API_KEY` or `modal_api_key` in `~/.pi/free.json`)
445
- - Model: `zai-org/GLM-5.1-FP8` — 128k context window, 16k max output tokens
446
- - **Qwen provider** — Free access to Qwen Coder (1,000 requests/day) via OAuth device flow
447
- - Run `/login qwen` to authenticate through Qwen Studio (chat.qwen.ai)
448
- - Uses `coder-model` alias (maps to Qwen3.6-Plus on the backend)
449
- - 131k context window, 16k max output tokens, zero cost
450
-
451
- ### Fixed
452
-
453
- - **Qwen OAuth browser launch on Windows** — URLs with `&` query params were truncated by `cmd.exe`'s `&` command separator; switched to `powershell.exe Start-Process` which passes the URL as a literal string
454
- - **Qwen API endpoint** — Replicates qwen-code's `getCurrentEndpoint()` logic: uses `resource_url` from OAuth token response (`dashscope.aliyuncs.com` for Chinese accounts, `portal.qwen.ai` for international), with fallback to `dashscope.aliyuncs.com/compatible-mode/v1`
455
- - **Qwen DashScope headers** — Added all headers required by DashScope's OpenAI-compatible API: `X-DashScope-AuthType: qwen-oauth`, `X-DashScope-CacheControl: enable`, `X-DashScope-UserAgent`, `Client-Code: QwenCode`
456
- - **Qwen modifyModels crash** — `modifyModels` must be synchronous; making it async caused the pi framework to receive a `Promise` instead of a `Model[]`, breaking `ModelRegistry.find()`
457
-
458
- ## [1.0.5] - 2025-04-03
459
-
460
- ### Fixed
461
-
462
- - **NVIDIA provider non-chat model filtering** (comment/implementation mismatch)
463
- - Added modalities-based filtering to exclude embedding, speech-to-text, OCR, and image-gen models
464
- - Filters models where `output` is not `["text"]` (e.g., image generation like `black-forest-labs/flux.1-dev`)
465
- - Filters models where `input` lacks `"text"` (e.g., OCR like `nvidia/nemoretriever-ocr-v1`, speech-to-text like `openai/whisper-large-v3`)
466
- - Updated file comment to accurately describe the filtering behavior
467
- - Added 8 comprehensive unit tests for model filtering logic
468
-
469
- ## [1.0.4] - 2025-04-03
470
-
471
- ### Fixed
472
-
473
- - **All tests now passing** (127/127)
474
- - Fixed mock paths in kilo.test.ts, zen.test.ts, ollama.test.ts
475
- - Fixed createCtxReRegister mocks in zen.test.ts and openrouter.test.ts
476
- - Fixed cline.test.ts to test actual provider re-registration behavior
477
- - Added missing DEFAULT_MIN_SIZE_B constant to openrouter mock
478
-
479
- ### Changed
480
-
481
- - **Code quality improvements**
482
- - Refactored usage modules to break circular dependency (limits.ts ↔ formatters.ts)
483
- - Created usage/types.ts with shared interfaces (FreeTierLimit, FreeTierUsage)
484
- - Bumped version to 1.0.4
485
-
486
- ## [1.0.3] - 2025-04-03
487
-
488
- ### Changed
489
-
490
- - Updated package.json metadata (name, description, keywords, repository URL)
491
- - Updated .npmignore for cleaner publishes
492
-
493
- ## [1.0.0] - 2024-03-28
494
-
495
- ### Added
496
-
497
- - Initial release with 6 providers: Kilo, Zen, OpenRouter, NVIDIA, Cline, Fireworks
498
- - Free tier usage tracking across all sessions
499
- - Provider failover with model hopping
500
- - Autocompact integration for rate limit recovery
501
- - Usage widget with glimpseui
502
- - Command toggles for free/all model filtering
503
- - Hardcoded benchmark data from Artificial Analysis
504
-
505
- ### Changed
506
-
507
- - **Major refactoring**: Split free-tier-limits.ts into usage/\* modules
508
- - usage/tracking.ts - runtime session tracking
509
- - usage/cumulative.ts - persistent storage
510
- - usage/formatters.ts - display formatting
511
- - 77% line reduction (741 → 166 lines)
512
- - **Major refactoring**: Split usage-widget.ts into widget/\* modules
513
- - widget/data.ts - data collection
514
- - widget/format.ts - formatting utilities
515
- - widget/render.ts - HTML generation
516
- - 74% line reduction (~350 → 90 lines)
517
- - **Refactoring**: Extracted functions from cline-auth.ts
518
- - fetchAuthorizeUrl() - auth URL fetching
519
- - waitForAuthCode() - callback handling
520
- - exchangeCodeForTokens() - token exchange
521
- - parseManualInput() - manual input parsing
522
- - **Refactoring**: Simplified model-hop.ts complexity
523
- - Extracted handleDowngradeDecision()
524
- - Extracted tryAlternativeModel()
525
- - **Deduplication**: Created shared modules
526
- - lib/json-persistence.ts - file I/O with caching
527
- - lib/logger.ts - structured logging
528
- - providers/model-fetcher.ts - OpenRouter-compatible fetching
529
- - Replaced ~30 console.log statements with structured logging
530
- - Fixed all 9 pre-existing test failures
531
- - fetchWithRetry now throws after last retry
532
- - Fixed auth pattern matching (added key.*not.*valid)
533
- - Updated capability ranking tests
534
- - Added resetUsageStats() for test isolation
535
-
536
- ### Fixed
537
-
538
- - fetchWithRetry() now properly throws after exhausting retries
539
- - Auth error pattern matching now handles more message variants
540
- - Test isolation for free-tier-limits tests
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ### Changed
11
+
12
+ - **Package scope migration** — Updated all peer dependency imports from `@mariozechner/*` to `@earendil-works/*` (`pi-ai`, `pi-coding-agent`, `pi-tui`) to match the upstream scope rename in `@earendil-works/pi` v0.74.0.
13
+
14
+ ## [2.0.8] - 2026-05-07
15
+
16
+ ### Added
17
+
18
+ - **Codestral provider** — Mistral's code-focused model via codestral.mistral.ai.
19
+ Free tier (Experiment plan): 2 req/min, 500K tokens/min, 1B tokens/month.
20
+ Uses pi's built-in Mistral SDK (`mistral-conversations` API type).
21
+
22
+ - **LLM7.io provider** — OpenAI-compatible API gateway routing across
23
+ multiple providers (OpenAI, Mistral, Google, DeepSeek, etc.). Free tier:
24
+ default/fast selectors, 100 req/hr, 20 req/min.
25
+
26
+ - **DeepInfra provider** — AI inference cloud with 100+ open-source models.
27
+ $5 one-time credit on signup (no credit card). Models fetched dynamically.
28
+ Shown as trial credit provider in `/free-providers`.
29
+
30
+ - **SambaNova provider** — Fast inference on custom RDU hardware with
31
+ OpenAI-compatible API. All models accessible on free tier (no credit card):
32
+ 20-480 RPM. Models include Llama 3.3 70B, DeepSeek-V3/R1, Llama 4 Maverick.
33
+ Shown as freemium provider in `/free-providers`.
34
+
35
+ ### Changed
36
+
37
+ - **Codestral: fixed HTTP 422 error** — Switched API type from
38
+ `openai-completions` to `mistral-conversations`. The OpenAI completions
39
+ adapter was sending unrecognized fields (`stream_options`, `store`,
40
+ `max_completion_tokens`) that Mistral's API rejects with 422.
41
+
42
+ ### Fixed
43
+
44
+ - **Toggle commands persist across sessions for all providers** — Providers using
45
+ `setupProvider` (zenmux, crofai, llm7, sambanova, deepinfra) were always
46
+ registering `freeModels` on startup, ignoring the persisted `show_paid` config.
47
+ Now each provider reads its config getter and registers the correct initial
48
+ model set. Fixes #149.
49
+
50
+ ### Security
51
+
52
+ - **Log injection prevention** — `scripts/update-benchmarks.ts` sanitizes external
53
+ API data (CRLF stripping) before logging. Fixes SonarCloud S1075.
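The CRLF stripping described in this entry can be sketched as follows. This is a minimal illustration, not the script's code: `sanitizeForLog` is a hypothetical name, since the changelog does not say what the helper in `scripts/update-benchmarks.ts` is called.

```typescript
// Hypothetical helper: the changelog does not name the function used in
// scripts/update-benchmarks.ts.
function sanitizeForLog(value: unknown): string {
  // Drop CR and LF so external data cannot begin a forged log line.
  return String(value).replace(/[\r\n]+/g, "");
}

// External data containing an embedded line break and a fake log entry.
const modelName = "gpt-x\r\n[INFO] forged entry";
console.log(`Fetched benchmark for ${sanitizeForLog(modelName)}`);
```

With the line breaks removed, the untrusted value stays on the same log line as the message that reports it.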
54
+
55
+ ### Reliability
56
+
57
+ - **Prefer `String#replaceAll()` over `String#replace()`** — Replaced all 7 flagged
58
+ instances. Where regex is unnecessary (2/7), switched to string literal form.
59
+ Fixes SonarCloud S4144.
60
+
61
+ ### Added
62
+
63
+ - **`agents.md`** — Codebase guide for AI agents covering architecture, patterns,
64
+ conventions, testing, and the Pi extension API.
65
+
66
+ ### Added
67
+
68
+ - **Passive quota monitoring** — Extracts rate-limit headers from every
69
+ provider response via `after_provider_response` event (no extra API calls).
70
+ Tries 6 header format variants (`x-ratelimit-remaining`,
71
+ `ratelimit-remaining-requests-day`, etc.). Shows remaining quota in the
72
+ status bar with warning icons when ≤25% or ≤10%. Fixes #147.
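A rough sketch of the header probing and the two warning thresholds is shown below. Only the two header names quoted in the entry are listed; the remaining variants, the lowercase header map, and the `limit` argument are assumptions made for illustration.

```typescript
type Severity = "ok" | "warning" | "critical";

// Only the two names quoted in the changelog; the other variants are not listed there.
const REMAINING_HEADERS = ["x-ratelimit-remaining", "ratelimit-remaining-requests-day"];

// Assumes header names were already lowercased by the caller.
function readRemaining(headers: Record<string, string | undefined>): number | undefined {
  for (const name of REMAINING_HEADERS) {
    const raw = headers[name];
    if (raw === undefined) continue;
    const value = Number.parseInt(raw, 10);
    if (!Number.isNaN(value)) return value;
  }
  return undefined; // no recognized header present
}

// Thresholds follow the entry: warn at <=25%, escalate at <=10%.
function quotaSeverity(remaining: number, limit: number): Severity {
  if (limit <= 0) return "ok"; // unknown limit: nothing to compare against
  const ratio = remaining / limit;
  if (ratio <= 0.1) return "critical";
  if (ratio <= 0.25) return "warning";
  return "ok";
}
```

Because the values come from responses the provider already sent, a check like this adds no extra API calls.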
73
+
74
+ ### Fixed
75
+
76
+ - **Missing `g` flag on `replaceAll` regexps broke model filtering** —
77
+ `String.prototype.replaceAll()` requires a global RegExp; 20+ patterns in
78
+ `benchmark-lookup.ts` were missing it, causing a `TypeError` that prevented
79
+ models from appearing for providers like cline and kilo. Added `/g` flag to
80
+ all affected patterns. Fixes #151.
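The failure mode behind this fix can be reproduced in isolation. The snippet below is illustrative only: `stripSuffix` and `tryStrip` are hypothetical helpers, not code from `benchmark-lookup.ts`.

```typescript
// Hypothetical helper for illustration; not taken from benchmark-lookup.ts.
function stripSuffix(id: string, pattern: RegExp): string {
  // replaceAll() throws a TypeError when given a RegExp without the g flag.
  return id.replaceAll(pattern, "");
}

// Runs stripSuffix and reports the error type instead of propagating it.
function tryStrip(id: string, pattern: RegExp): string {
  try {
    return stripSuffix(id, pattern);
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

const withoutFlag = tryStrip("llama-3.3-70b:free", /:free$/); // non-global pattern
const withFlag = tryStrip("llama-3.3-70b:free", /:free$/g); // global pattern
console.log(withoutFlag, withFlag);
```

The non-global pattern fails at call time rather than silently replacing only the first match, which is why a single missing flag was enough to stop models from being listed.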
81
+
82
+ ### Changed
83
+
84
+ - **Resolved ~280 SonarCloud issues across 21 files** — Bulk code-quality
85
+ cleanup including: stripping trailing zeros from `toFixed()` (S7748),
86
+ `global` → `globalThis` (S7764), `parseFloat` → `Number.parseFloat` (S7773),
87
+ naming unnamed async exports (S7726), `String.raw` for path strings (S7780),
88
+ top-level await over promise chains (S7785), re-export from source (S7763),
89
+ `.at(-1)` over `[length-1]` (S7755), `node:fs` protocol imports (S7772),
90
+ and logging user-controlled data sanitization (S5145). Fixes #148.
91
+
92
+ ### Security
93
+
94
+ - **Bump `basic-ftp` 5.3.0 → 5.3.1** — Patches GHSA-rpmf-866q-6p89 (high
95
+ severity): malicious FTP server could cause client-side DoS via unbounded
96
+ multiline control response buffering. Fixes `npm audit` finding.
97
+
98
+ ### Refactored
99
+
100
+ - **Extracted shared model-fetch helper** — `fetchOpenAICompatibleModels()`
101
+ in `lib/util.ts` eliminates ~120 lines of duplicated fetch→parse→map
102
+ boilerplate across CrofAI, DeepInfra, and SambaNova providers.
103
+
104
+ ## [2.0.6] - 2026-05-02
105
+
106
+ ### Security
107
+
108
+ - **5x S5852 regex super-linear runtime** — Replaced all flagged regex patterns
109
+ (nested quantifiers in model size extraction) with manual char-by-char string
110
+ parsing in `parseModelSize()`, `normalizeSizeTokenOrder()`, and test helpers.
111
+ Eliminates catastrophic backtracking risk.
112
+
113
+ - **4x S4036 PATH variable security**
114
+ - `open-browser.ts`: Added `resolveExe()` helper that prefers known absolute
115
+ paths (`/usr/bin/open`, `C:\Windows\System32\...\powershell.exe`) before
116
+ falling back to PATH lookup
117
+ - `check-extensions.mjs`: Removed hardcoded PATH override; resolved `npm` via
118
+ `execFileSync` with known absolute paths
119
+
120
+ - **1x S4721 command injection** — Replaced `execSync` with `execFileSync` in
121
+ `resolveExe()` helper. `execFileSync` takes separate arguments and never
122
+ spawns a shell, eliminating the injection vector.
123
+
124
+ ### Changed
125
+
126
+ - **Banner image** — Converted `banner.svg` to `banner.png` for reliable
127
+ rendering across all GitHub surfaces (mobile, email, dark mode readers).
128
+
129
+ ## [2.0.5] - 2026-05-02
130
+
131
+ ### Added
132
+
133
+ - **NVIDIA model probe auto-discovery** — Lazy auto-probe for NVIDIA models on
134
+ first `session_start` (once per session). Broken 404 models detected and
135
+ auto-hidden without requiring manual `/probe-nvidia`.
136
+
137
+ ### Changed
138
+
139
+ - **Ollama provider updates** — Improved cloud model detection and configuration.
140
+
141
+ ## [2.0.4] - 2026-05-02
142
+
143
+ ### Fixed
144
+
145
+ - **OpenRouter key resolution no longer falls back to `free.json`** —
146
+ `getOpenrouterApiKey()` now only checks the `OPENROUTER_API_KEY` environment variable.
147
+ Previously it fell back to `~/.pi/free.json`, which could contain stale/revoked keys
148
+ that conflict with pi's built-in OpenRouter provider (which reads from
149
+ `~/.pi/agent/auth.json`).
150
+
151
+ - **Removed `openrouter_api_key` from `PiFreeConfig` interface and config template** —
152
+ Prevents future persistence of OpenRouter keys in `free.json`, eliminating the
153
+ source of stale key conflicts for built-in providers.
154
+
155
+ ## [2.0.3] - 2026-05-02
156
+
157
+ ### Added
158
+
159
+ - **Consistent `isFreeModel` helper with Route A/B logic** — Created a unified helper for free model detection that automatically detects whether a provider exposes pricing:
160
+ - **Route A (pricing-exposed)**: Model is free if `cost === 0` OR `"free"` in name (OR logic)
161
+ - **Route B (non-pricing-exposed)**: Model is free only if `"free"` in name
162
+ - Dynamic detection: If ALL models have cost === 0, assumes pricing not exposed → uses Route B
163
+ - If ANY model has cost > 0, assumes pricing exposed → uses Route A
164
+ - All providers (Cline, Kilo, NVIDIA, Ollama, dynamic built-in) now use this consistent helper
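The Route A/B rules above can be sketched as follows. This is a simplified illustration under stated assumptions: the real helper's signature is not shown in the changelog, and `ModelLike` flattens cost to a single number for brevity.

```typescript
// Illustrative shape; the model type in the codebase may differ
// (for example, cost may be an object with input and output prices).
interface ModelLike {
  id: string;
  name: string;
  cost: number;
}

// Pricing is treated as exposed if any model reports a non-zero cost.
function pricingExposed(models: ModelLike[]): boolean {
  return models.some((m) => m.cost > 0);
}

function isFreeModel(model: ModelLike, allModels: ModelLike[]): boolean {
  const nameSaysFree = model.name.toLowerCase().includes("free");
  if (pricingExposed(allModels)) {
    // Route A: zero cost OR "free" in the name.
    return model.cost === 0 || nameSaysFree;
  }
  // Route B: cost is uninformative, so only the name counts.
  return nameSaysFree;
}
```

The dynamic check matters because a provider that reports `0` for every model is more likely hiding its prices than giving everything away.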
165
+
166
+ - **CrofAI provider (PAID)** — Added new **paid** provider for CrofAI (https://crof.ai), an OpenAI-compatible LLM inference API. **Note: CrofAI is a paid provider** — users must have a CrofAI API key with credits. The provider uses Route B detection (name-only) since CrofAI's API doesn't expose per-model pricing. Only models with `"free"` in their names are marked as free (none currently).
167
+
168
+ - **ZenMux provider (PAID)** — Added new **paid** provider for ZenMux AI gateway (https://zenmux.ai), a unified API for 200+ models from OpenAI, Anthropic, Google, etc. **Note: ZenMux is a paid provider** — users must have a ZenMux API key with credits. The provider uses Route A detection (OR logic) since ZenMux exposes pricing. Models marked as free only if `cost === 0` OR `"free"` in name (2 free models identified: GLM 4.7 Flash Free, GLM 4.6v Flash Free).
169
+
170
+ - **Comprehensive `isFreeModel` test suite** — Added 30+ unit tests covering Route A, Route B, freemium behavior, and edge cases. Tests verify correct classification on actual OpenRouter API data (371 models, 30 free).
171
+
172
+ - **Toggle commands for dynamic built-in providers** — Added `/toggle-mistral`, `/toggle-groq`,
173
+ `/toggle-cerebras`, `/toggle-xai`, and `/toggle-huggingface` commands. These providers were
174
+ registered with the global toggle system but lacked per-provider toggle commands, making
175
+ free/paid switching inaccessible without editing config files.
176
+
177
+ - **Lazy auto-probe for NVIDIA models** — Extracted `runNvidiaProbe()` into a shared function
178
+ called automatically on first `session_start` (once per session). Previously, users had to
179
+ manually run `/probe-nvidia` to discover 404 models. Now broken models are detected and
180
+ auto-hidden on first use.
181
+
182
+ ### Changed
183
+
184
+ - **Cline provider now uses `isFreeModel`** — Fixed Cline to use the consistent `isFreeModel` helper instead of `m.cost.input === 0`. Previously used cost-only filtering, now uses proper OR logic for pricing-exposed providers.
185
+
186
+ - **NVIDIA test expectations updated** — Updated tests to reflect strict Route B behavior (name-only detection for non-pricing-exposed providers). Added test for models with `"free"` in name being marked as free.
187
+
188
+ ### Fixed
189
+
190
+ - **`provider-factory.ts` — `beforeProviderRequest` hook now scoped to owning provider** —
191
+ The hook was firing for **all** provider requests regardless of which provider the factory
192
+ was configuring. Now checks `evt.provider !== def.providerId` and returns early if the
193
+ event doesn't belong to the owning provider.
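The scoping guard can be illustrated with a small wrapper. The event shape and the `scopeHook` name are assumptions; only the `evt.provider !== def.providerId` comparison comes from the entry.

```typescript
// Assumed minimal event shape; the real event likely carries more fields.
interface ProviderRequestEvent {
  provider: string;
}

type RequestHook = (evt: ProviderRequestEvent) => void;

// Wraps a hook so it only runs for events addressed to its own provider.
function scopeHook(providerId: string, hook: RequestHook): RequestHook {
  return (evt) => {
    if (evt.provider !== providerId) return; // not ours: return early
    hook(evt);
  };
}

let calls = 0;
const nvidiaHook = scopeHook("nvidia", () => {
  calls += 1;
});
nvidiaHook({ provider: "kilo" }); // ignored
nvidiaHook({ provider: "nvidia" }); // handled
```

Without the early return, every factory-built provider would apply its request changes to every other provider's traffic.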
194
+
195
+ - **`provider-factory.ts` — `reRegister` callback no longer corrupts stored model lists** —
196
+ When toggling between free/paid modes, the callback was overwriting `stored.all` with only
197
+ the filtered subset, losing the original full model list. Now preserves the original model
198
+ lists for correct subsequent toggling.
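One way to avoid this class of bug is to derive the registered set from the stored lists without ever writing back to them. The sketch below assumes a `free` list alongside the `stored.all` named in the entry; `modelsToRegister` is a hypothetical name.

```typescript
// `all` is named in the changelog; `free` is an assumed companion field.
interface StoredModels {
  all: readonly string[];
  free: readonly string[];
}

// Returns a fresh array so callers cannot overwrite the stored lists.
function modelsToRegister(stored: StoredModels, showPaid: boolean): string[] {
  return showPaid ? [...stored.all] : [...stored.free];
}

const stored: StoredModels = { all: ["alpha", "beta-paid"], free: ["alpha"] };
const freeView = modelsToRegister(stored, false);
const fullView = modelsToRegister(stored, true); // full list still intact
console.log(freeView.length, fullView.length);
```

Keeping the stored lists read-only means a free-to-paid toggle can always recover the full list.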
199
+
200
+ - **`lib/types.ts` — Removed leftover `LspTestInterface`** — Removed a test interface that
201
+ was left in production code.
202
+
203
+ - **`index.ts` — Removed redundant `.catch()` on deprecated Qwen provider** — The `.catch()`
204
+ was unnecessary since `Promise.allSettled` already handles rejections.
205
+
206
+ ### Removed
207
+
208
+ - **Qwen provider (deprecated)** — Removed Qwen OAuth provider as the 1,000 req/day free tier is no longer available. Provider remains functional for existing authenticated users but new free tier registrations are not supported.
209
+
210
+ - **Modal provider** — Removed single-model Modal provider (only had GLM-5.1 FP8). Users should use other providers for GLM models.
211
+
212
+ - **Cloudflare provider** — Removed Cloudflare Workers AI provider as it's now built into pi core. Users can use pi's built-in Cloudflare provider instead.
213
+
214
+ - **Qwen test file** — Removed `tests/qwen.test.ts` along with the deprecated provider.
215
+
216
+ ## [2.0.2] - 2026-04-26
217
+
218
+ ### Added
219
+
220
+ - **Model matching debug logging** — Added `~/.pi/modelmatch.log` to diagnose which models get Coding Index scores and which don't:
221
+ - Logs every matching attempt with provider, model ID, normalization strategy, and result
222
+ - CSV-like format: `timestamp|provider|modelId|modelName|action|strategy|normalizedId|matchKey|codingIndex|details`
223
+ - Provider-specific normalizers for better matching:
224
+ - **NVIDIA**: Strips vendor prefixes (`meta/`, `mistralai/`, `microsoft/`, `qwen/`, etc.)
225
+ - **Cloudflare**: Strips `@cf/namespace/` prefixes
226
+ - **Groq**: Removes `-versatile` and numeric context suffixes (`-32768`)
227
+ - **Cerebras**: Normalizes `llama3.1` → `llama-3.1`, auto-adds `instruct` suffix
228
+ - **Mistral**: Strips `-latest` suffix
229
+ - **Ollama**: Converts `model:tag` → `model-tag`
230
+ - Common suffix stripping: `:free`, date codes (`-20250514`), versions (`-v1.1`), `-it`, `-fp8`/`-bf16`
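A few of the provider-specific normalizers listed above can be approximated like this. The regular expressions are guesses that reproduce the examples in the entry, not the patterns used in the codebase; in particular, treating a trailing run of four or more digits as a context suffix is an assumption.

```typescript
// Illustrative normalizers keyed by provider; patterns are assumptions.
const normalizers: Record<string, ((id: string) => string) | undefined> = {
  mistral: (id) => id.replace(/-latest$/, ""),
  ollama: (id) => id.replace(/:/g, "-"), // model:tag -> model-tag
  groq: (id) => id.replace(/-versatile$/, "").replace(/-\d{4,}$/, ""),
  cloudflare: (id) => id.replace(/^@cf\/[^/]+\//, ""), // strip @cf/namespace/
};

// Falls back to the unchanged ID for providers without a normalizer.
function normalizeModelId(provider: string, id: string): string {
  const normalize = normalizers[provider];
  return normalize ? normalize(id) : id;
}

console.log(normalizeModelId("groq", "llama-3.3-70b-versatile"));
```

Keeping the rules in a per-provider table makes it easy to log which strategy produced a given match.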
231
+
232
+ - **Enhanced benchmark lookup** — `enhanceModelNameWithCodingIndex()` now accepts optional `provider` parameter for provider-aware normalization
233
+
234
+ - **Static 404 model blocklist for NVIDIA** — Probed all 136 models from `integrate.api.nvidia.com/v1/models` and identified 57 that return 404 "Function not found" on `/v1/chat/completions`. These are now hard-filtered so they never appear in the model selector:
235
+ - Covers discontinued models (`databricks/dbrx-instruct`, `meta/codellama-70b`, `meta/llama2-70b`, `ibm/granite-*`, etc.)
236
+ - Covers embedding-only models listed as chat-capable (`nvidia/nv-embed-v1`, `nvidia/nv-embedqa-*`, `snowflake/arctic-embed-l`, etc.)
237
+ - Covers stale API catalog entries (`mistralai/mistral-large`, `mistralai/mistral-large-2-instruct`, `writer/palmyra-*`, etc.)
238
+ - Full list in `NVIDIA_KNOWN_404_MODELS` in `providers/nvidia/nvidia.ts`
239
+
240
+ - **`/probe-nvidia` command** — On-demand model health check. Tests every registered NVIDIA model with a minimal `max_tokens: 1` request, auto-hides any new 404s in `~/.pi/free.json`, and re-registers the provider immediately.
241
+
242
+ - **`scripts/probe-nvidia.mjs`** — Standalone Node.js script to reproduce the probe. Reads `~/.pi/free.json` for the API key, batches 20 requests at a time with 10s timeout, and prints all broken model IDs for adding to the blocklist.
243
+
244
+ - **Ollama Cloud 403 handling** — Same pattern as NVIDIA 404s for Ollama Cloud:
245
+ - `OLLAMA_KNOWN_403_MODELS` blocklist for models that return 403 "access denied"
246
+ - `/probe-ollama` command to test all models on-demand, auto-hide broken ones, and re-register
247
+ - `scripts/probe-ollama.mjs` standalone script for blocklist maintenance
248
+
249
+ - **Provider-scoped hidden models** — Hidden models are now provider-specific:
250
+ - Format: `"provider/model-id"` (e.g., `"ollama/kimi-k2.6"`, `"nvidia/broken-model"`)
251
+ - A model hidden from one provider doesn't hide it from other providers
252
+ - Backward compatible with old global `"model-id"` format
253
+ - All providers updated: NVIDIA, Ollama, Cloudflare, Cline, Kilo, Modal
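The lookup implied by the scoped format and its backward compatibility might look like the sketch below. `isHidden` is a hypothetical name, and treating any bare `"model-id"` entry as a global hide is one reading of "backward compatible".

```typescript
// Hypothetical lookup; the real function name and storage are not given.
function isHidden(hidden: readonly string[], provider: string, modelId: string): boolean {
  // New format: "provider/model-id" hides the model for that provider only.
  if (hidden.includes(`${provider}/${modelId}`)) return true;
  // Legacy format: a bare "model-id" entry hides the model everywhere.
  return hidden.includes(modelId);
}

const hidden = ["ollama/kimi-k2.6", "legacy-model"];
console.log(isHidden(hidden, "ollama", "kimi-k2.6")); // scoped entry applies
console.log(isHidden(hidden, "nvidia", "kimi-k2.6")); // other provider unaffected
```

One caveat with this simple form: a model ID that itself contains a slash can collide with a scoped entry for another provider, so a real implementation may need a stricter encoding.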
254
+
255
+ ### Fixed
256
+
257
+ - **Probe commands timeout handling** — Added `fetchWithTimeout` with 10-second timeout to `/probe-nvidia` and `/probe-ollama` commands. Prevents the coding harness from freezing when individual model probe requests hang indefinitely.
258
+
259
+ - **NVIDIA provider now sends `authHeader: true`** — Explicitly enables `Authorization: Bearer` header injection. Previously relied on pi's implicit behavior which could fail in some configurations.
260
+
261
+ ### Removed
262
+
263
+ - **NVIDIA 404 model warning log** — Removed the `console.warn("[nvidia] Skipping known 404 model: ...")` output when filtering out known broken models. The filter still works silently; use `/probe-nvidia` to identify new 404s if needed.
264
+
265
+ ### Changed
266
+
267
+ - **Cloudflare provider now fetches models dynamically** — Replaced static 19-model hardcoded list with live API fetch from `api.cloudflare.com/client/v4/accounts/{account_id}/ai/models`:
268
+ - Automatically discovers all 30+ text generation models (was manually maintaining 19)
269
+ - Smart filtering excludes embeddings, image generation, speech, translation, and vision-only models via regex patterns
270
+ - Metadata inference from model IDs: detects vision (`vision`/`multimodal`), reasoning (`r1`/`thinking`/`qwq`), context windows, and estimated costs
271
+ - Fixed Mistral Small ID: changed from incorrect `@cf/mistralai/...` to correct `@cf/mistral/...`
272
+ - Added new fallback models: Kimi K2.6, OpenAI GPT-OSS 120B/20B, Qwen 2.5 Coder 32B, QwQ 32B, Llama 3.2 11B Vision
273
+ - Graceful fallback to expanded 18-model hardcoded list if API fetch fails
274
+
275
+ - **NVIDIA provider now queries NVIDIA's API directly** — Source of truth switched from `models.dev` curated JSON to `https://integrate.api.nvidia.com/v1/models`:
276
+ - Eliminates 57 missing models and 25 stale entries from the old third-party source
277
+ - Models not in `models.dev` get inferred metadata (128k context, 4k output, vision/reasoning heuristics)
278
+ - Added regex-based non-chat model filtering for unknown models (embeddings, whisper, reward models, safety guards, parsers, detectors, etc.)
279
+ - Graceful fallback to `models.dev` if NVIDIA API is unreachable
280
+ - Removed paid/free toggle filtering — NVIDIA is freemium (all models use free credits)
281
+
282
+ ## [2.0.1] - 2026-04-24
283
+
284
+ ### Added
285
+
286
+ - **Built-in provider toggle support** (`lib/built-in-toggle.ts`) — Enables free/paid filtering for Pi's built-in providers that expose per-model pricing:
287
+ - **OpenCode (`/toggle-opencode`)** — Captures built-in OpenCode models on session start and filters to free-only by default
288
+ - **OpenRouter (`/toggle-openrouter`)** — Now uses the built-in toggle system for consistency
289
+ - Toggle works in the current session (no restart needed)
290
+ - Persisted via `opencode_show_paid` and `openrouter_show_paid` in `~/.pi/free.json`
291
+
292
+ ### Changed
293
+
294
+ - **OpenRouter moved to built-in toggle system** — OpenRouter is now handled by `lib/built-in-toggle.ts` alongside OpenCode for a unified approach:
295
+ - Removed from `providers/dynamic-built-in/index.ts`
296
+ - Eliminated duplicate toggle command registration logic
297
+ - Consolidated toggle persistence with other built-in providers
298
+
299
+ - **Standardized all toggle commands to `toggle-{provider}`** — Renamed from `{provider}-toggle` for consistency:
300
+ - `/kilo-toggle` → `/toggle-kilo`
301
+ - `/cline-toggle` → `/toggle-cline`
302
+ - `/openrouter-toggle` → `/toggle-openrouter`
303
+ - `/nvidia-toggle` → `/toggle-nvidia`
304
+ - `/cloudflare-toggle` → `/toggle-cloudflare`
305
+ - `/ollama-toggle` → `/toggle-ollama`
306
+ - `/mistral-toggle` → `/toggle-mistral`
307
+ - `/groq-toggle` → `/toggle-groq`
308
+ - `/cerebras-toggle` → `/toggle-cerebras`
309
+ - `/toggle-opencode` (new)
310
+
311
+ ### Fixed
312
+
313
+ - **Ollama Cloud model fetching endpoint** — Corrected the `/v1/models` → `/models` endpoint path in `providers/ollama/ollama.ts`:
314
+ - The previous fix (2.0.0) incorrectly used `/v1/models`; Ollama Cloud's models endpoint is `/v1/models` for chat completions but `/models` for listing
315
+ - This ensures model fetching works correctly with the OpenAI-compatible API
316
+
317
+ ### Removed
318
+
319
+ - **Global `/free` command** — Removed the global free-only toggle. Per-provider toggles (`/toggle-{provider}`) are now the only way to switch between free and paid models. The `/free-providers` status command remains.
320
+
321
+ ## [2.0.0] - 2026-04-23
322
+
323
+ ### Breaking Changes
324
+
325
+ - **Removed Fireworks provider** — Fireworks is now a built-in Pi provider (added in pi 0.68.1), so the extension's Fireworks provider has been removed to avoid conflicts:
326
+ - Deleted `providers/fireworks/fireworks.ts` and `tests/fireworks.test.ts`
327
+ - Removed all Fireworks configuration options from `config.ts` (`fireworks_api_key`, `fireworks_show_paid`)
328
+ - Users should now use Pi's built-in Fireworks support with `FIREWORKS_API_KEY`
329
+
330
+ - **Renamed Ollama provider to `ollama-cloud`** — Changed provider ID from `"ollama"` to `"ollama-cloud"` to avoid collision with Pi's built-in local Ollama provider:
331
+ - This prevents provider ID conflicts when both are registered
332
+ - All log messages and documentation now reference "Ollama Cloud"
333
+
334
+ ### Removed
335
+
336
+ - **Dropped `@sinclair/typebox` peer dependency** — Pi 0.69.0 migrated from `@sinclair/typebox` to `typebox` 1.x. The extension didn't directly import this package, so it was removed from `peerDependencies` to avoid potential conflicts.
337
+
338
+ ### Fixed
339
+
340
+ - **Ollama Cloud API endpoint** — Fixed broken Ollama Cloud integration:
341
+ - Changed `BASE_URL_OLLAMA` from `https://ollama.com` to `https://ollama.com/v1` — the OpenAI-compatible API endpoint
342
+ - Fixed model fetching to use `/v1/models` instead of `/api/tags` — ensures model IDs work with chat completions endpoint
343
+ - Previously calls went to HTML homepage instead of API endpoints, causing 404 errors
344
+
345
+ ### Removed
346
+
347
+ - **Removed paid model warning on selection** — Deleted the `model_select` event handler that showed:
348
+ - `⚠️ Paid model selected (${model.id}). Use "/free off" to enable paid models.`
349
+ - This warning was redundant since the global `/free` toggle and provider toggles already control model visibility
350
+
351
+ - **Removed pointless `/modal-toggle` command** — Modal provider only has 1 free model (GLM-5.1 FP8), so there was nothing meaningful to toggle:
352
+ - Added `skipToggle` option to `ProviderDefinition` and `ProviderSetupConfig` interfaces
353
+ - Modal provider now sets `skipToggle: true` to prevent toggle command creation
354
+
355
+ ### Changed
356
+
357
+ - **Marked Qwen provider as fully deprecated** — Updated messaging to clarify the provider is broken:
358
+ - Changed model name from `"Qwen Coder — Free 1k/day"` to `"Qwen Coder — DEPRECATED (free tier discontinued)"`
359
+ - Updated all JSDoc comments to clearly state auth is broken and free tier is no longer available
360
+ - Provider remains for backward compatibility but should not be used
361
+
362
+ ### Added
363
+
364
+ - **Cloudflare Workers AI provider** — New provider for Cloudflare's serverless GPU platform:
365
+ - 50+ open-source models: Llama 4, Mistral Small 3.1, Qwen 2.5/3, DeepSeek R1, Gemma 4, Kimi K2.5/2.6, and more
366
+ - **10,000 Neurons/day FREE tier** (resets daily at 00:00 UTC)
367
+ - **$0.011 per 1,000 Neurons** beyond free allocation
368
+ - Only requires `CLOUDFLARE_API_TOKEN` — account ID auto-derived from token
369
+ - Toggle with `/cloudflare-toggle`
370
+ - Create token at https://dash.cloudflare.com/profile/api-tokens
371
+
372
+ - **Unified dynamic built-in providers module** — New `providers/dynamic-built-in/` module that dynamically fetches models from Pi's built-in providers when users have API keys:
373
+ - **Mistral** (`MISTRAL_API_KEY`) — Fetches from `api.mistral.ai/v1/models`
374
+ - **Groq** (`GROQ_API_KEY`) — Fetches from `api.groq.com/openai/v1/models`
375
+ - **Cerebras** (`CEREBRAS_API_KEY`) — Fetches from `api.cerebras.ai/v1/models`
376
+ - **xAI** (`XAI_API_KEY`) — Fetches from `api.x.ai/v1/models`
377
+ - **Hugging Face** (`HF_TOKEN` optional) — Fetches public + authenticated models
378
+ - **OpenRouter** — Moved from `index.ts` to unified module with dynamic fetch
379
+ - All integrate with global `/free` toggle and have per-provider toggle commands (`/mistral-toggle`, `/groq-toggle`, etc.)
380
+
381
+ - **Global `/free` toggle system** — New centralized free/paid filtering across ALL providers:
382
+ - `/free on/off/status` — Toggle free-only view globally
383
+ - `/free-providers` — Show free/paid model counts by provider
384
+ - `FREE_ONLY` config option and `PI_FREE_ONLY` environment variable
385
+ - Providers register via `registerWithGlobalToggle()` for unified filtering
386
+
387
+ ### Fixed
388
+
389
+ - **Toggle commands now actually filter models from UI** — Previously, toggle commands only showed notifications but didn't remove paid models from the model picker:
390
+ - **OpenRouter (`/openrouter-toggle`)**: Now uses `registerProvider`/`unregisterProvider` to actually filter models from the picker UI
391
+ - **NVIDIA (`/nvidia-toggle`)**: Added dynamic `showPaid` parameter to `fetchNvidiaModels()` so toggle properly switches between free and paid model sets
392
+ - **Fireworks**: Removed broken toggle command — all models are paid with no free tier, so there was nothing to toggle
393
+
394
+ ### Added
395
+
396
+ - **OpenRouter per-provider free model toggle** — Added `/openrouter-toggle` command for the built-in OpenRouter provider:
397
+ - `/openrouter-toggle` — Switch between showing only free models vs all models (including paid)
398
+ - New config flag `openrouter_show_paid` in `~/.pi/free.json` (default: `false`)
399
+ - Environment variable: `OPENROUTER_SHOW_PAID=true` to show paid models by default
400
+ - This brings OpenRouter (a built-in pi provider) in line with extension providers that have per-provider toggles
401
+
402
+ ### Deprecated
403
+
404
+ - **Qwen provider** — The 1,000 requests/day free tier is no longer available from Qwen/DashScope. The provider code remains for backward compatibility but is now deprecated:
405
+ - Added `@deprecated` JSDoc tags to all Qwen-related exports
406
+ - Added deprecation warning when Qwen provider loads
407
+ - Added warning when `QWEN_SHOW_PAID` config is used
408
+ - Consider migrating to other free providers: Kilo, Cline, NVIDIA, or Modal
409
+
410
+ ### Added
411
+
412
+ - **Go provider** — OpenCode Go subscription gateway (⚠️ paid only — $5 first month, then $10/month, no free tier) with models: GLM-5, Kimi K2.5, MiMo-V2-Pro, MiMo-V2-Omni, MiniMax M2.7, MiniMax M2.5
413
+ - Set `OPENCODE_GO_API_KEY` or `opencode_go_api_key` in `~/.pi/free.json`
414
+ - Toggle with `/go-toggle`
415
+
416
+ ### Fixed
417
+
418
+ - **All providers now show Coding Index scores in model selector** — Added `enhanceWithCI()` to factory-based providers (nvidia, fireworks, mistral, modal, ollama) and cline. Now all providers display CI scores in `/models` command (pi-models extension).
419
+
420
+ - **All providers now show in `--list-models`** — Providers (zen, openrouter, go) that registered models only in `session_start` were missing from `pi --list-models` which runs before session starts. Added immediate registration for these providers:
421
+ - **zen**: Added model caching to `~/.pi/provider-cache.json` for immediate registration + dynamic refresh
422
+ - **openrouter**: Immediate model registration at extension load (like kilo/cline)
423
+ - **go**: Immediate registration with static model list (no API to fetch from)
424
+ - All 11 providers now visible in `--list-models`
425
+
426
+ ### Changed
427
+
428
+ - Updated README with clear free vs paid provider distinction (9 free + 2 paid-only: Go, Fireworks)
429
+ - Added Go and Fireworks provider documentation under new "💳 Paid-Only Providers" section
430
+ - Added `opencode_go_api_key` to config file template
431
+ - Updated package.json description and keywords to include all 11 providers
432
+
433
+ ### Added
434
+
435
+ - **Provider model cache** (`lib/provider-cache.ts`) — New utility for caching provider model lists to `~/.pi/provider-cache.json`. Used by zen provider for faster startup and offline access after first successful fetch.
436
+
437
+ ## [1.0.9] - 2026-04-14
438
+
439
+ ### Fixed
440
+
441
+ - **Qwen OAuth breaks other OAuth providers** — `modifyModels` receives all models across every registered provider, not just Qwen's. The previous `map()` stamped the Qwen dashscope `baseUrl` onto every model, causing other OAuth providers (Kilo, OpenRouter, etc.) to return 404 after a `/login qwen` flow. Now only models with `provider === PROVIDER_QWEN` are patched; others pass through unchanged.
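The shape of this fix can be sketched as a synchronous, provider-filtered `map()`. The `Model` interface, the `"qwen"` constant value, and the `patchQwenBaseUrl` name are assumptions for illustration; only the `provider === PROVIDER_QWEN` condition comes from the entry.

```typescript
// Assumed minimal model shape and constant value.
interface Model {
  provider: string;
  id: string;
  baseUrl: string;
}

const PROVIDER_QWEN = "qwen";

// Synchronous on purpose: returns Model[] directly, not a Promise.
function patchQwenBaseUrl(models: Model[], qwenBaseUrl: string): Model[] {
  return models.map((m) =>
    // Only Qwen's models get the new base URL; all others pass through unchanged.
    m.provider === PROVIDER_QWEN ? { ...m, baseUrl: qwenBaseUrl } : m,
  );
}

const patched = patchQwenBaseUrl(
  [
    { provider: "qwen", id: "coder-model", baseUrl: "https://old.example" },
    { provider: "kilo", id: "some-model", baseUrl: "https://kilo.example" },
  ],
  "https://qwen.example/v1",
);
console.log(patched.map((m) => m.baseUrl));
```

The URLs here are placeholders. The point is that models belonging to other providers come back as the same objects they went in as.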
442
+
443
+ ## [1.0.8] - 2026-04-13
444
+
445
+ ### Added
446
+
447
+ - **Modal provider** — Free access to GLM-5.1 FP8 (128k context, 16k max output) during promotional period (free until April 30, 2026)
448
+ - Requires a free Modal API key (`MODAL_API_KEY` or `modal_api_key` in `~/.pi/free.json`)
449
+ - Model: `zai-org/GLM-5.1-FP8` — 128k context window, 16k max output tokens
450
+ - **Qwen provider** — Free access to Qwen Coder (1,000 requests/day) via OAuth device flow
451
+ - Run `/login qwen` to authenticate through Qwen Studio (chat.qwen.ai)
452
+ - Uses `coder-model` alias (maps to Qwen3.6-Plus on the backend)
453
+ - 131k context window, 16k max output tokens, zero cost
454
+
455
+ ### Fixed
456
+
457
+ - **Qwen OAuth browser launch on Windows** — URLs with `&` query params were truncated by `cmd.exe`'s `&` command separator; switched to `powershell.exe Start-Process` which passes the URL as a literal string
458
+ - **Qwen API endpoint** — Replicates qwen-code's `getCurrentEndpoint()` logic: uses `resource_url` from OAuth token response (`dashscope.aliyuncs.com` for Chinese accounts, `portal.qwen.ai` for international), with fallback to `dashscope.aliyuncs.com/compatible-mode/v1`
459
+ - **Qwen DashScope headers** — Added all headers required by DashScope's OpenAI-compatible API: `X-DashScope-AuthType: qwen-oauth`, `X-DashScope-CacheControl: enable`, `X-DashScope-UserAgent`, `Client-Code: QwenCode`
460
+ - **Qwen modifyModels crash** — `modifyModels` must be synchronous; making it async caused the pi framework to receive a `Promise` instead of a `Model[]`, breaking `ModelRegistry.find()`
461
+
462
+ ## [1.0.5] - 2025-04-03
463
+
464
+ ### Fixed
465
+
466
+ - **NVIDIA provider non-chat model filtering** (comment/implementation mismatch)
467
+ - Added modalities-based filtering to exclude embedding, speech-to-text, OCR, and image-gen models
468
+ - Filters models where `output` is not `["text"]` (e.g., image generation like `black-forest-labs/flux.1-dev`)
469
+ - Filters models where `input` lacks `"text"` (e.g., OCR like `nvidia/nemoretriever-ocr-v1`, speech-to-text like `openai/whisper-large-v3`)
470
+ - Updated file comment to accurately describe the filtering behavior
471
+ - Added 8 comprehensive unit tests for model filtering logic
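The two modality rules above reduce to a small predicate. The `Modalities` shape and the `isChatModel` name are assumptions; the modality values assigned to each example model are inferred from the entry's descriptions rather than taken from API data.

```typescript
// Assumed shape of the modalities metadata.
interface Modalities {
  input: string[];
  output: string[];
}

// Keeps a model only if it accepts text and produces text and nothing else.
function isChatModel(m: Modalities): boolean {
  const textOnlyOutput = m.output.length === 1 && m.output[0] === "text";
  return textOnlyOutput && m.input.includes("text");
}

// Image generation: text in, image out.
console.log(isChatModel({ input: ["text"], output: ["image"] }));
// Speech-to-text: audio in, text out.
console.log(isChatModel({ input: ["audio"], output: ["text"] }));
// Vision-capable chat model: text and image in, text out.
console.log(isChatModel({ input: ["text", "image"], output: ["text"] }));
```

Checking both sides is what separates a vision-capable chat model from an OCR or transcription model that also emits text.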
472
+
473
+ ## [1.0.4] - 2025-04-03
474
+
475
+ ### Fixed
476
+
477
+ - **All tests now passing** (127/127)
478
+ - Fixed mock paths in kilo.test.ts, zen.test.ts, ollama.test.ts
479
+ - Fixed createCtxReRegister mocks in zen.test.ts and openrouter.test.ts
480
+ - Fixed cline.test.ts to test actual provider re-registration behavior
481
+ - Added missing DEFAULT_MIN_SIZE_B constant to openrouter mock
482
+
483
+ ### Changed
484
+
485
+ - **Code quality improvements**
486
+ - Refactored usage modules to break circular dependency (limits.ts ↔ formatters.ts)
487
+ - Created usage/types.ts with shared interfaces (FreeTierLimit, FreeTierUsage)
488
+ - Bumped version to 1.0.4
489
+
490
+ ## [1.0.3] - 2025-04-03
491
+
492
+ ### Changed
493
+
494
+ - Updated package.json metadata (name, description, keywords, repository URL)
495
+ - Updated .npmignore for cleaner publishes
496
+
497
+ ## [1.0.0] - 2024-03-28
498
+
499
+ ### Added
500
+
501
+ - Initial release with 6 providers: Kilo, Zen, OpenRouter, NVIDIA, Cline, Fireworks
502
+ - Free tier usage tracking across all sessions
503
+ - Provider failover with model hopping
504
+ - Autocompact integration for rate limit recovery
505
+ - Usage widget with glimpseui
506
+ - Command toggles for free/all model filtering
507
+ - Hardcoded benchmark data from Artificial Analysis
508
+
509
+ ### Changed
510
+
511
+ - **Major refactoring**: Split free-tier-limits.ts into usage/\* modules
512
+ - usage/tracking.ts - runtime session tracking
513
+ - usage/cumulative.ts - persistent storage
514
+ - usage/formatters.ts - display formatting
515
+ - 77% line reduction (741 → 166 lines)
516
+ - **Major refactoring**: Split usage-widget.ts into widget/\* modules
517
+ - widget/data.ts - data collection
518
+ - widget/format.ts - formatting utilities
519
+ - widget/render.ts - HTML generation
520
+ - 74% line reduction (~350 → 90 lines)
521
+ - **Refactoring**: Extracted functions from cline-auth.ts
522
+ - fetchAuthorizeUrl() - auth URL fetching
523
+ - waitForAuthCode() - callback handling
524
+ - exchangeCodeForTokens() - token exchange
525
+ - parseManualInput() - manual input parsing
526
+ - **Refactoring**: Simplified model-hop.ts complexity
527
+ - Extracted handleDowngradeDecision()
528
+ - Extracted tryAlternativeModel()
529
+ - **Deduplication**: Created shared modules
530
+ - lib/json-persistence.ts - file I/O with caching
531
+ - lib/logger.ts - structured logging
532
+ - providers/model-fetcher.ts - OpenRouter-compatible fetching
533
+ - Replaced ~30 console.log statements with structured logging
534
+ - Fixed all 9 pre-existing test failures
535
+ - fetchWithRetry now throws after last retry
536
+ - Fixed auth pattern matching (added `key.*not.*valid`)
537
+ - Updated capability ranking tests
538
+ - Added resetUsageStats() for test isolation
539
+
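The "throws after last retry" behavior noted in the list above might look roughly like the following. This is a sketch under assumptions, not the package's implementation. The signature, the `maxRetries` parameter, and the injected `operation` callback are hypothetical, and any delay or backoff between attempts is left out to keep the example short.

```typescript
// Hypothetical signature; the package's real fetchWithRetry may differ.
async function fetchWithRetry<T>(
  operation: () => Promise<T>,
  maxRetries: number,
): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus up to `maxRetries` retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Remember the failure; a real implementation would likely wait here.
      lastError = error;
    }
  }
  // Retries exhausted: surface the last failure instead of returning undefined.
  throw lastError;
}
```

Rethrowing the last error lets callers such as a failover layer tell "gave up" apart from a successful but empty response.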
540
+ ### Fixed
541
+
542
+ - fetchWithRetry() now properly throws after exhausting retries
543
+ - Auth error pattern matching now handles more message variants
544
+ - Test isolation for free-tier-limits tests
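As an illustration of the auth error matching mentioned in this release, here is a minimal sketch. Only the `key.*not.*valid` pattern is taken from the changelog. The other patterns, the `AUTH_ERROR_PATTERNS` name, and the `isAuthError` helper are hypothetical.

```typescript
// Illustrative pattern list; only /key.*not.*valid/ comes from the changelog.
const AUTH_ERROR_PATTERNS: RegExp[] = [
  /unauthorized/i,
  /invalid.*api.*key/i,
  /key.*not.*valid/i,
];

// True when any pattern matches the provider's error message.
// The patterns carry no global flag, so test() keeps no state between calls.
function isAuthError(message: string): boolean {
  return AUTH_ERROR_PATTERNS.some((pattern) => pattern.test(message));
}

const matchesNewVariant = isAuthError("API key not valid. Please pass a valid API key.");
const matchesRateLimit = isAuthError("Rate limit exceeded");
```

Keeping the patterns in one list means a new message variant can be supported by adding one entry, without changing the matching logic.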