noosphere 0.2.0 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -59,103 +59,204 @@ const audio = await ai.speak({
  // audio.buffer contains the audio data
  ```

- ## Dynamic Model Auto-Fetch — Always Up-to-Date
+ ## Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)

- Noosphere **automatically discovers the latest models** from every provider's API at runtime. When Google releases a new Gemini model, when OpenAI drops GPT-5, when Anthropic publishes Claude 4 — **you get them immediately**, without updating Noosphere or any dependency.
+ Noosphere **automatically discovers the latest models from EVERY provider's API at runtime** — across **all 4 modalities** (LLM, image, video, TTS). When Google releases a new Gemini model, when OpenAI drops GPT-5, when FAL adds a new video model, when a new image model trends on HuggingFace — **you get them immediately**, without updating Noosphere or any dependency.

  ### The Problem It Solves

- Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 models in a pre-generated `models.generated.js` file. When a provider releases a new model, you'd have to wait for the library maintainer to run `npm run generate-models`, publish a new version, and then you'd `npm update`. This lag can be days or weeks.
-
- ### How It Works
-
- On the **first API call**, Noosphere queries every provider's model listing API in parallel and merges the results with the static catalog:
-
- ```
- First ai.chat() / ai.image() / ai.stream() call
-
- ├─ 1. Load static pi-ai catalog (246 models with accurate cost/context data)
-
- ├─ 2. Parallel fetch from ALL provider APIs (8 concurrent requests):
- │   ├── GET https://api.openai.com/v1/models (Bearer token)
- │   ├── GET https://api.anthropic.com/v1/models (x-api-key + anthropic-version)
- │   ├── GET https://generativelanguage.googleapis.com/... (API key in URL)
- │   ├── GET https://api.groq.com/openai/v1/models (Bearer token)
- │   ├── GET https://api.mistral.ai/v1/models (Bearer token)
- │   ├── GET https://api.x.ai/v1/models (Bearer token)
- │   ├── GET https://openrouter.ai/api/v1/models (Bearer token)
- │   └── GET https://api.cerebras.ai/v1/models (Bearer token)
-
- ├─ 3. Filter results (chat models only — exclude embeddings, TTS, whisper, etc.)
-
- ├─ 4. Deduplicate against static catalog (static wins — has accurate cost data)
-
- └─ 5. Merge: Static catalog + newly discovered models = complete model list
- ```
-
- ### What Gets Fetched Per Provider
+ Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 LLM models in a pre-generated `models.generated.js` file. HuggingFace providers typically hardcode 3–5 default models. When a provider releases a new model, you'd have to wait for the library maintainer to update and publish, then run `npm update` yourself. This lag can be days or weeks.
+
+ **Noosphere solves this for every provider and every modality simultaneously.**
+
+ ### How It Works — Complete Auto-Fetch Architecture
+
+ Noosphere has **3 independent auto-fetch systems** that run in parallel, one for each provider layer:
+
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │                     NOOSPHERE AUTO-FETCH                    │
+ ├─────────────────────────────────────────────────────────────┤
+ │                                                             │
+ │  ┌─── Pi-AI Provider (LLM) ─────────────────────────────┐   │
+ │  │ 8 parallel API calls on first chat()/stream():       │   │
+ │  │   OpenAI, Anthropic, Google, Groq, Mistral,          │   │
+ │  │   xAI, OpenRouter, Cerebras                          │   │
+ │  │ Merges with static pi-ai catalog (246 models)        │   │
+ │  │ Constructs synthetic Model objects for new ones      │   │
+ │  └──────────────────────────────────────────────────────┘   │
+ │                                                             │
+ │  ┌─── FAL Provider (Image/Video/TTS) ───────────────────┐   │
+ │  │ 1 API call on listModels():                          │   │
+ │  │   GET https://api.fal.ai/v1/models/pricing           │   │
+ │  │ → Returns ALL 867+ endpoints with live pricing       │   │
+ │  │ → Auto-classifies modality from model ID + unit      │   │
+ │  └──────────────────────────────────────────────────────┘   │
+ │                                                             │
+ │  ┌─── HuggingFace Provider (LLM/Image/TTS) ─────────────┐   │
+ │  │ 3 parallel API calls on listModels():                │   │
+ │  │   GET huggingface.co/api/models?pipeline_tag=...     │   │
+ │  │ → text-generation (top 50 trending, inference-ready) │   │
+ │  │ → text-to-image (top 50 trending, inference-ready)   │   │
+ │  │ → text-to-speech (top 30 trending, inference-ready)  │   │
+ │  │ → Includes inference provider mapping + pricing      │   │
+ │  └──────────────────────────────────────────────────────┘   │
+ │                                                             │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs
+
+ On the **first `chat()` or `stream()` call**, Pi-AI queries every LLM provider's model listing API in parallel:

  | Provider | API Endpoint | Auth | Model Filter | API Protocol |
  |---|---|---|---|---|
- | **OpenAI** | `/v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
- | **Anthropic** | `/v1/models?limit=100` | `x-api-key` + `anthropic-version` | `claude-*` | `anthropic-messages` |
- | **Google** | `/v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
- | **Groq** | `/openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
- | **Mistral** | `/v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
- | **xAI** | `/v1/models` | Bearer token | `grok*` | `openai-completions` |
- | **OpenRouter** | `/api/v1/models` | Bearer token | All (OpenRouter only lists usable models) | `openai-completions` |
- | **Cerebras** | `/v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
-
- ### Resilience Guarantees
+ | **OpenAI** | `GET /v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
+ | **Anthropic** | `GET /v1/models?limit=100` | `x-api-key` + `anthropic-version: 2023-06-01` | `claude-*` | `anthropic-messages` |
+ | **Google** | `GET /v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
+ | **Groq** | `GET /openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
+ | **Mistral** | `GET /v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
+ | **xAI** | `GET /v1/models` | Bearer token | `grok*` | `openai-completions` |
+ | **OpenRouter** | `GET /api/v1/models` | Bearer token | All (all OpenRouter models are usable) | `openai-completions` |
+ | **Cerebras** | `GET /v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |

- - **8-second timeout** per provider — slow APIs don't block everything
- - **`Promise.allSettled()`** — if one provider fails, the others still work
- - **Silent failure** — network errors are caught and ignored, static catalog always available
- - **One-time fetch** — results are cached in memory, not re-fetched on every call
- - **Zero config** — works automatically if you have API keys set
-
- ### How New Models Become Usable
-
- When a dynamically discovered model isn't in the static catalog, Noosphere constructs a **synthetic Model object** that pi-ai's `complete()` and `stream()` functions can use directly:
+ **How new LLM models become usable:** When a model isn't in the static catalog, Noosphere constructs a **synthetic `Model` object** with the correct API protocol, base URL, and inherited cost data:

  ```typescript
- // For a new model like "gpt-4.5-turbo" discovered from OpenAI's API:
+ // New model "gpt-4.5-turbo" discovered from OpenAI's /v1/models:
  {
    id: 'gpt-4.5-turbo',
    name: 'gpt-4.5-turbo',
-   api: 'openai-responses', // Correct protocol for the provider
+   api: 'openai-responses', // Correct protocol for OpenAI
    provider: 'openai',
    baseUrl: 'https://api.openai.com/v1',
-   reasoning: false, // Inferred from model ID prefix
+   reasoning: false, // Inferred from model ID prefix
    input: ['text', 'image'],
-   cost: { input: 2.5, output: 10, cacheRead: 1.25, cacheWrite: 2.5 }, // From template
-   contextWindow: 128000, // From template or provider API
-   maxTokens: 16384, // From template or provider API
+   cost: { input: 2.5, output: 10, ... }, // Inherited from template model
+   contextWindow: 128000, // From template or API response
+   maxTokens: 16384,
+ }
+ // This object is passed directly to pi-ai's complete()/stream() — works immediately
+ ```
+
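[Editor's note] The fetch-then-merge flow described above can be sketched roughly as follows. This is a simplified illustration, not Noosphere's actual code; `fetchProviderIds`, `mergeCatalogs`, and `discoverAll` are made-up names for the pattern (timeout via `AbortController`, `Promise.allSettled` across providers, static catalog wins on duplicates):

```typescript
interface CatalogModel { id: string; synthetic?: boolean }

// Fetch one provider's model list; silent failure returns [] so the
// static catalog is always still available.
async function fetchProviderIds(url: string, headers: Record<string, string>, timeoutMs = 8000): Promise<string[]> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { headers, signal: controller.signal });
    if (!res.ok) return [];
    const body = (await res.json()) as { data?: { id: string }[] };
    return (body.data ?? []).map((m) => m.id);
  } catch {
    return [];
  } finally {
    clearTimeout(timer);
  }
}

// Dedupe against the static catalog: static entries win (accurate cost data),
// unknown IDs become synthetic entries.
function mergeCatalogs(staticModels: CatalogModel[], dynamicIds: string[]): CatalogModel[] {
  const known = new Set(staticModels.map((m) => m.id));
  const discovered = dynamicIds
    .filter((id) => !known.has(id))
    .map((id) => ({ id, synthetic: true }));
  return [...staticModels, ...discovered];
}

// All providers queried concurrently; one failure never blocks the rest.
async function discoverAll(staticModels: CatalogModel[], endpoints: [string, Record<string, string>][]) {
  const settled = await Promise.allSettled(endpoints.map(([url, headers]) => fetchProviderIds(url, headers)));
  const ids = settled.flatMap((r) => (r.status === 'fulfilled' ? r.value : []));
  return mergeCatalogs(staticModels, ids);
}
```

The ordering choice (static first, discovered appended) keeps known models and their accurate pricing stable while new IDs show up at the end.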
+ ### Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API
+
+ FAL already provides a **fully dynamic catalog**. On `listModels()`, it fetches from `https://api.fal.ai/v1/models/pricing`:
+
+ ```typescript
+ // FAL returns an array with ALL available endpoints + live pricing:
+ [
+   { modelId: "fal-ai/flux-pro/v1.1-ultra", price: 0.06, unit: "per_image" },
+   { modelId: "fal-ai/kling-video/v2/master/text-to-video", price: 0.10, unit: "per_second" },
+   { modelId: "fal-ai/kokoro/american-english", price: 0.002, unit: "per_1k_chars" },
+   // ... 867+ endpoints total
+ ]
+
+ // Modality is auto-inferred from model ID + pricing unit:
+ // - unit contains 'char' OR id contains 'tts'/'kokoro'/'elevenlabs' → TTS
+ // - unit contains 'second' OR id contains 'video'/'kling'/'sora'/'veo' → Video
+ // - Everything else → Image
+ ```
+
+ **Result:** Every FAL model is always current — new endpoints appear the moment FAL publishes them. Pricing is always accurate because it comes directly from their API.
+
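[Editor's note] That classification rule is small enough to sketch directly. The heuristics are the ones quoted above; the function name `inferModality` is illustrative, not FAL's or Noosphere's exported API:

```typescript
type Modality = 'tts' | 'video' | 'image';

// Classify a FAL endpoint from its model ID and pricing unit,
// following the three heuristics listed in the README.
function inferModality(modelId: string, unit: string): Modality {
  const id = modelId.toLowerCase();
  const hasAny = (s: string, words: string[]) => words.some((w) => s.includes(w));
  if (unit.includes('char') || hasAny(id, ['tts', 'kokoro', 'elevenlabs'])) return 'tts';
  if (unit.includes('second') || hasAny(id, ['video', 'kling', 'sora', 'veo'])) return 'video';
  return 'image'; // default bucket
}
```

Checking the unit first matters: a per-second price marks a video endpoint even when the model ID carries no video-related keyword.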
+ ### Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API
+
+ Instead of 3 hardcoded defaults, HuggingFace now fetches **trending inference-ready models** from the Hub API across all 3 modalities:
+
+ ```
+ GET https://huggingface.co/api/models
+     ?pipeline_tag=text-generation         ← LLM models
+     &inference_provider=all               ← Only models available via inference API
+     &sort=trendingScore                   ← Most popular first
+     &limit=50                             ← Top 50
+     &expand[]=inferenceProviderMapping    ← Include provider routing + pricing
+ ```
+
+ | Pipeline Tag | Modality | Limit | What It Fetches |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Top 50 trending chat/completion models with active inference endpoints |
+ | `text-to-image` | Image | 50 | Top 50 trending image generation models (SDXL, Flux, etc.) |
+ | `text-to-speech` | TTS | 30 | Top 30 trending TTS models with active inference endpoints |
+
+ **What the Hub API returns per model:**
+ ```json
+ {
+   "id": "Qwen/Qwen2.5-72B-Instruct",
+   "pipeline_tag": "text-generation",
+   "likes": 1893,
+   "downloads": 4521987,
+   "inferenceProviderMapping": [
+     {
+       "provider": "together",
+       "providerId": "Qwen/Qwen2.5-72B-Instruct-Turbo",
+       "status": "live",
+       "providerDetails": {
+         "context_length": 32768,
+         "pricing": { "input": 1.2, "output": 1.2 }
+       }
+     },
+     {
+       "provider": "fireworks-ai",
+       "providerId": "accounts/fireworks/models/qwen2p5-72b-instruct",
+       "status": "live"
+     }
+   ]
  }
  ```

- **Template inheritance:** Cost and context window data come from a "template" — the first model in the static catalog for that provider. This means new models inherit approximate pricing until the static catalog is updated with exact numbers. For Google, the API returns `inputTokenLimit` and `outputTokenLimit` directly, so context window data is always accurate.
+ **Noosphere extracts from this:**
+ - Model ID → `id` field
+ - Pricing → first provider with `providerDetails.pricing`
+ - Context window → first provider with `providerDetails.context_length`
+ - Inference providers → list of available providers (Together, Fireworks, Groq, etc.)
+
+ **Three requests fire in parallel** (`Promise.allSettled`) with a **10-second timeout** each. If any fails, the 3 hardcoded defaults are always available as fallback.
+
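[Editor's note] The extraction step on one Hub API entry (with the shape shown above) can be sketched like this. `extractModelInfo` and the interfaces are illustrative names for the pattern, not Noosphere's exported API:

```typescript
interface ProviderMapping {
  provider: string;
  status: string;
  providerDetails?: { context_length?: number; pricing?: { input: number; output: number } };
}
interface HubEntry { id: string; inferenceProviderMapping?: ProviderMapping[] }

// Pull pricing, context window, and live providers out of one Hub entry:
// first provider that carries pricing wins, first with context_length wins.
function extractModelInfo(entry: HubEntry) {
  const mapping = entry.inferenceProviderMapping ?? [];
  const pricing = mapping.find((p) => p.providerDetails?.pricing)?.providerDetails?.pricing;
  const contextWindow = mapping.find((p) => p.providerDetails?.context_length)?.providerDetails?.context_length;
  const providers = mapping.filter((p) => p.status === 'live').map((p) => p.provider);
  return {
    id: entry.id,
    cost: { price: pricing?.input ?? 0, unit: pricing ? 'per_1m_tokens' : 'free' },
    contextWindow,
    providers,
  };
}
```

Models whose mapping carries no pricing at all are treated as free rather than dropped, which matches the fallback behavior described in the resilience table below.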
+ ### Resilience Guarantees (All Layers)
+
+ | Guarantee | Pi-AI (LLM) | FAL (Image/Video/TTS) | HuggingFace (LLM/Image/TTS) |
+ |---|---|---|---|
+ | **Timeout** | 8s per provider | No custom timeout | 10s per pipeline_tag |
+ | **Parallelism** | 8 concurrent requests | 1 request (returns all) | 3 concurrent requests |
+ | **Failure handling** | `Promise.allSettled` | Returns `[]` on error | `Promise.allSettled` |
+ | **Fallback** | Static pi-ai catalog (246 models) | Empty list (provider still usable by model ID) | 3 hardcoded defaults |
+ | **Caching** | One-time fetch, cached in memory | Per `listModels()` call | One-time fetch, cached in memory |
+ | **Auth required** | Yes (per-provider API keys) | Yes (FAL key) | Optional (works without token) |
+
+ ### Total Model Coverage
+
+ | Source | Modalities | Model Count | Update Frequency |
+ |---|---|---|---|
+ | Pi-AI static catalog | LLM | ~246 | On npm update |
+ | Pi-AI dynamic fetch | LLM | **All models across 8 providers** | **Every session** |
+ | FAL pricing API | Image, Video, TTS | 867+ | **Every `listModels()` call** |
+ | HuggingFace Hub API | LLM, Image, TTS | Top 130 trending | **Every session** |
+ | ComfyUI `/object_info` | Image | Local checkpoints | **Every `listModels()` call** |
+ | Local TTS `/voices` | TTS | Local voices | **Every `listModels()` call** |

  ### Force Refresh

  ```typescript
  const ai = new Noosphere();

- // Models are auto-fetched on first call:
+ // Models are auto-fetched on first call — no action needed:
  await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately

- // Force a re-fetch if you suspect new models were added mid-session:
- // (access the provider's refreshDynamicModels method via the registry)
- const models = await ai.getModels('llm');
- // Or trigger a full sync:
- await ai.syncModels();
+ // Trigger a full sync across ALL providers:
+ const result = await ai.syncModels();
+ // result = { synced: 1200+, byProvider: { 'pi-ai': 300, 'fal': 867, 'huggingface': 130, ... }, errors: [] }
+
+ // Get all models for a specific modality:
+ const imageModels = await ai.getModels('image');
+ // Returns: FAL image models + HuggingFace image models + ComfyUI models
  ```

- ### Why Not Just Use the Provider APIs Directly?
+ ### Why Hybrid (Static + Dynamic)?

  | Approach | Pros | Cons |
  |---|---|---|
- | **Static catalog only** (old) | Accurate costs, fast startup | Stale within days, miss new models |
+ | **Static catalog only** | Accurate costs, fast startup | Stale within days, misses new models |
  | **Dynamic only** | Always current | No cost data, no context window info, slow startup |
  | **Hybrid (Noosphere)** | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |

@@ -1306,15 +1407,33 @@ The `@fal-ai/client` provides additional features beyond what Noosphere surfaces

  ---

- ### Hugging Face — Open Source AI (30+ tasks)
+ ### Hugging Face — Open Source AI (30+ tasks, Dynamic Discovery)

  **Provider ID:** `huggingface`
  **Modalities:** LLM, Image, TTS
  **Library:** `@huggingface/inference`
+ **Auto-Fetch:** Yes — discovers trending inference-ready models from the Hub API

- Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
+ Access to the entire Hugging Face Hub ecosystem. Noosphere **automatically discovers the top trending models** across all 3 modalities via the Hub API, filtered to only include models with active inference provider endpoints.

- #### Default Models
+ #### Auto-Discovered Models
+
+ On the first `listModels()` call, HuggingFace fetches from:
+ ```
+ GET https://huggingface.co/api/models?inference_provider=all&pipeline_tag={tag}&sort=trendingScore&limit={n}&expand[]=inferenceProviderMapping
+ ```
+
+ | Pipeline Tag | Modality | Limit | Example Models |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Qwen2.5-72B-Instruct, Llama-3.3-70B, DeepSeek-V3, Mistral-Large |
+ | `text-to-image` | Image | 50 | FLUX.1-dev, Stable Diffusion 3.5, SDXL-Lightning, Playground v2.5 |
+ | `text-to-speech` | TTS | 30 | Kokoro-82M, Bark, MMS-TTS |
+
+ Each discovered model includes **inference provider routing** (Together, Fireworks, Groq, Replicate, etc.) and **pricing data** when available from the provider.
+
+ #### Fallback Default Models
+
+ These 3 models are always available, even if the Hub API is unreachable:

  | Modality | Default Model | Description |
  |---|---|---|
@@ -1322,7 +1441,7 @@ Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace
  | Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
  | TTS | `facebook/mms-tts-eng` | MMS TTS English |

- Any HuggingFace model ID works — just pass it as the `model` parameter:
+ Any HuggingFace model ID works — just pass it as the `model` parameter (even if it's not in the auto-discovered list):

  ```typescript
  await ai.chat({
@@ -1480,26 +1599,31 @@ const buffer = Buffer.from(await blob.arrayBuffer());
  // result.media = { format: 'wav' }
  ```

- **Model listing — curated defaults, not API discovery:**
+ **Model listing — dynamic Hub API discovery:**

  ```typescript
- // Unlike FAL (which fetches from API) or Pi-AI (which auto-generates),
- // HuggingFace returns a HARDCODED list of 3 curated models:
+ // HuggingFace now auto-fetches trending models from the Hub API:
  async listModels(modality?: Modality): Promise<ModelInfo[]> {
-   const models: ModelInfo[] = [];
-   if (!modality || modality === 'image') {
-     models.push({ id: 'stabilityai/stable-diffusion-xl-base-1.0', ... });
-   }
-   if (!modality || modality === 'tts') {
-     models.push({ id: 'facebook/mms-tts-eng', ... });
-   }
-   if (!modality || modality === 'llm') {
-     models.push({ id: 'meta-llama/Llama-3.1-8B-Instruct', ... });
-   }
-   return models;
+   if (!this.dynamicModels) await this.fetchHubModels();
+   // Returns: 3 hardcoded defaults + top 50 LLM + top 50 image + top 30 TTS
+   // All filtered by inference_provider=all (only inference-ready models)
  }
- // This means: the registry only KNOWS about 3 models by default,
- // but you can use ANY HuggingFace model by passing its ID directly.
- // The model just won't appear in getModels() or syncModels() results.
+
+ // Hub API request per modality:
+ // GET https://huggingface.co/api/models
+ //     ?pipeline_tag=text-generation
+ //     &inference_provider=all              ← Only models with active inference endpoints
+ //     &sort=trendingScore                  ← Most popular first
+ //     &limit=50
+ //     &expand[]=inferenceProviderMapping   ← Include provider routing + pricing
+
+ // Response includes per model:
+ //   - id: "Qwen/Qwen2.5-72B-Instruct"
+ //   - inferenceProviderMapping: [{ provider: "together", status: "live",
+ //       providerDetails: { context_length: 32768, pricing: { input: 1.2 } } }]
+
+ // Pricing and context_length extracted from inferenceProviderMapping
+ // 3 hardcoded defaults always included as fallback
+ // Results cached in memory after first fetch
  ```

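[Editor's note] The cache-once-with-fallback behavior described in the comments above follows a common memoization pattern, sketched here under illustrative names (`HubCatalog`, `DEFAULTS`, `discover` are not part of the package's API):

```typescript
interface ModelInfo { id: string; modality: string }

// The 3 always-available defaults, mirroring the fallback table above.
const DEFAULTS: ModelInfo[] = [
  { id: 'meta-llama/Llama-3.1-8B-Instruct', modality: 'llm' },
  { id: 'stabilityai/stable-diffusion-xl-base-1.0', modality: 'image' },
  { id: 'facebook/mms-tts-eng', modality: 'tts' },
];

class HubCatalog {
  private cache: ModelInfo[] | null = null;
  constructor(private discover: () => Promise<ModelInfo[]>) {}

  // Fetch once, cache in memory; if discovery throws, serve defaults only.
  async list(modality?: string): Promise<ModelInfo[]> {
    if (!this.cache) {
      try {
        const fetched = await this.discover();
        const seen = new Set(DEFAULTS.map((m) => m.id));
        this.cache = [...DEFAULTS, ...fetched.filter((m) => !seen.has(m.id))];
      } catch {
        this.cache = DEFAULTS; // Hub unreachable → fall back to hardcoded defaults
      }
    }
    return modality ? this.cache.filter((m) => m.modality === modality) : this.cache;
  }
}
```

Because the cache also stores the failure result, a flaky Hub doesn't cause repeated slow fetches; a force-refresh API (like `syncModels()` above) simply resets the cache.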
  #### The 17 HuggingFace Inference Providers
package/dist/index.cjs CHANGED
@@ -279,7 +279,6 @@ var UsageTracker = class {
  // src/providers/pi-ai.ts
  var import_pi_ai = require("@mariozechner/pi-ai");
  var KNOWN_PROVIDERS = ["anthropic", "google", "openai", "xai", "groq", "cerebras", "openrouter", "zai"];
- var LOCAL_PROVIDERS = /* @__PURE__ */ new Set(["ollama"]);
  var FETCH_TIMEOUT_MS = 8e3;
  var OPENAI_CHAT_PREFIXES = ["gpt-", "o1", "o3", "o4", "chatgpt-", "codex-"];
  var OPENAI_REASONING_PREFIXES = ["o1", "o3", "o4"];
@@ -420,35 +419,7 @@ var PiAiProvider = class {
      if (modality && modality !== "llm") return [];
      await this.ensureDynamicModels();
      const models = [];
-     const seenIds = /* @__PURE__ */ new Set();
-     for (const provider of KNOWN_PROVIDERS) {
-       try {
-         const providerModels = (0, import_pi_ai.getModels)(provider);
-         for (const m of providerModels) {
-           seenIds.add(m.id);
-           models.push({
-             id: m.id,
-             provider: "pi-ai",
-             name: m.name || m.id,
-             modality: "llm",
-             local: LOCAL_PROVIDERS.has(String(m.provider)),
-             cost: {
-               price: m.cost.input ?? 0,
-               unit: m.cost.input > 0 ? "per_1m_tokens" : "free"
-             },
-             capabilities: {
-               contextWindow: m.contextWindow,
-               maxTokens: m.maxTokens,
-               supportsVision: m.input.includes("image"),
-               supportsStreaming: true
-             }
-           });
-         }
-       } catch {
-       }
-     }
-     for (const [id, m] of this.dynamicModels) {
-       if (seenIds.has(id)) continue;
+     for (const [, m] of this.dynamicModels) {
      models.push({
        id: m.id,
        provider: "pi-ai",
@@ -583,24 +554,15 @@ var PiAiProvider = class {
  async ensureDynamicModels() {
    if (this.dynamicModelsFetched) return;
    this.dynamicModelsFetched = true;
-   const staticIds = /* @__PURE__ */ new Set();
-   for (const provider of KNOWN_PROVIDERS) {
-     try {
-       for (const m of (0, import_pi_ai.getModels)(provider)) {
-         staticIds.add(m.id);
-       }
-     } catch {
-     }
-   }
    const fetchPromises = [];
    for (const [providerKey, configFactory] of Object.entries(PROVIDER_APIS)) {
      const apiKey = this.keys[providerKey];
      if (!apiKey) continue;
-     fetchPromises.push(this.fetchProviderModels(configFactory(apiKey), apiKey, staticIds));
+     fetchPromises.push(this.fetchProviderModels(configFactory(apiKey), apiKey));
    }
    await Promise.allSettled(fetchPromises);
  }
- async fetchProviderModels(config, apiKey, staticIds) {
+ async fetchProviderModels(config, apiKey) {
    try {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
@@ -613,11 +575,11 @@ var PiAiProvider = class {
      if (!res.ok) return;
      const data = await res.json();
      const entries = config.extractEntries(data);
-     const templateModel = this.findStaticTemplate(config.providerName);
+     const staticTemplate = this.findStaticTemplate(config.providerName);
      for (const entry of entries) {
        const id = entry.id;
        if (!config.filterChat(id)) continue;
-       if (staticIds.has(id)) continue;
+       const staticMatch = this.findStaticModel(config.providerName, id);
        this.dynamicModels.set(id, {
          id,
          name: entry.name ?? id,
@@ -625,10 +587,10 @@ var PiAiProvider = class {
          provider: config.providerName,
          baseUrl: config.piBaseUrl,
          reasoning: config.isReasoning(id),
-         input: ["text", "image"],
-         cost: templateModel?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
-         contextWindow: entry.contextWindow ?? templateModel?.contextWindow ?? 128e3,
-         maxTokens: entry.maxTokens ?? templateModel?.maxTokens ?? 16384
+         input: staticMatch?.input ?? ["text", "image"],
+         cost: staticMatch?.cost ?? staticTemplate?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+         contextWindow: entry.contextWindow ?? staticMatch?.contextWindow ?? staticTemplate?.contextWindow ?? 128e3,
+         maxTokens: entry.maxTokens ?? staticMatch?.maxTokens ?? staticTemplate?.maxTokens ?? 16384
        });
      }
    } finally {
@@ -645,6 +607,14 @@ var PiAiProvider = class {
      return null;
    }
  }
+ findStaticModel(providerName, modelId) {
+   try {
+     const models = (0, import_pi_ai.getModels)(providerName);
+     return models.find((m) => m.id === modelId) ?? null;
+   } catch {
+     return null;
+   }
+ }
  /** Force re-fetch of dynamic models from provider APIs */
  async refreshDynamicModels() {
    this.dynamicModelsFetched = false;
@@ -652,18 +622,24 @@ var PiAiProvider = class {
    await this.ensureDynamicModels();
  }
  findModel(modelId) {
+   if (modelId) {
+     const dynamic = this.dynamicModels.get(modelId);
+     if (dynamic) return { model: dynamic, provider: String(dynamic.provider) };
+   }
+   if (!modelId) {
+     const first = this.dynamicModels.values().next();
+     if (!first.done && first.value) {
+       return { model: first.value, provider: String(first.value.provider) };
+     }
+   }
    for (const provider of KNOWN_PROVIDERS) {
      try {
        const models = (0, import_pi_ai.getModels)(provider);
-       const found = modelId ? models.find((m) => m.id === modelId) : models[0];
+       const found = modelId ? models.find((m) => m.id === modelId) : void 0;
        if (found) return { model: found, provider };
      } catch {
      }
    }
-   if (modelId) {
-     const dynamic = this.dynamicModels.get(modelId);
-     if (dynamic) return { model: dynamic, provider: String(dynamic.provider) };
-   }
    return { model: null, provider: null };
  }
  };
@@ -1037,51 +1013,102 @@ var LocalTTSProvider = class {

  // src/providers/huggingface.ts
  var import_inference = require("@huggingface/inference");
+ var HF_HUB_API = "https://huggingface.co/api/models";
+ var FETCH_TIMEOUT_MS2 = 1e4;
+ var PIPELINE_TAG_MAP = {
+   "text-generation": { modality: "llm", limit: 50 },
+   "text-to-image": { modality: "image", limit: 50 },
+   "text-to-speech": { modality: "tts", limit: 30 }
+ };
  var HuggingFaceProvider = class {
    id = "huggingface";
    name = "HuggingFace Inference";
    modalities = ["image", "tts", "llm"];
    isLocal = false;
    client;
+   token;
+   dynamicModels = null;
    constructor(token) {
+     this.token = token;
      this.client = new import_inference.HfInference(token);
    }
    async ping() {
      return true;
    }
    async listModels(modality) {
-     const models = [];
-     if (!modality || modality === "image") {
-       models.push({
-         id: "stabilityai/stable-diffusion-xl-base-1.0",
-         provider: "huggingface",
-         name: "SDXL Base",
-         modality: "image",
-         local: false,
-         cost: { price: 0, unit: "free" }
-       });
+     if (!this.dynamicModels) {
+       await this.fetchHubModels();
      }
-     if (!modality || modality === "tts") {
-       models.push({
-         id: "facebook/mms-tts-eng",
-         provider: "huggingface",
-         name: "MMS TTS English",
-         modality: "tts",
-         local: false,
-         cost: { price: 0, unit: "free" }
-       });
+     const all = this.dynamicModels ?? [];
+     if (modality) return all.filter((m) => m.modality === modality);
+     return all;
+   }
+   async fetchHubModels() {
+     const seenIds = /* @__PURE__ */ new Set();
+     const models = [];
+     const fetches = Object.entries(PIPELINE_TAG_MAP).map(
+       ([tag, { modality, limit }]) => this.fetchByPipelineTag(tag, modality, limit)
+     );
+     const results = await Promise.allSettled(fetches);
+     for (const result of results) {
+       if (result.status !== "fulfilled") continue;
+       for (const model of result.value) {
+         if (seenIds.has(model.id)) continue;
+         seenIds.add(model.id);
+         models.push(model);
+       }
      }
-     if (!modality || modality === "llm") {
-       models.push({
-         id: "meta-llama/Llama-3.1-8B-Instruct",
-         provider: "huggingface",
-         name: "Llama 3.1 8B",
-         modality: "llm",
-         local: false,
-         cost: { price: 0, unit: "free" }
-       });
+     this.dynamicModels = models;
+   }
+   async fetchByPipelineTag(pipelineTag, modality, limit) {
+     try {
+       const controller = new AbortController();
+       const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS2);
+       try {
+         const params = new URLSearchParams({
+           pipeline_tag: pipelineTag,
+           inference_provider: "all",
+           sort: "trendingScore",
+           limit: String(limit),
+           "expand[]": "inferenceProviderMapping"
+         });
+         const res = await fetch(`${HF_HUB_API}?${params}`, {
+           headers: this.token ? { Authorization: `Bearer ${this.token}` } : {},
+           signal: controller.signal
+         });
+         if (!res.ok) return [];
+         const data = await res.json();
+         return data.filter((entry) => entry.id || entry.modelId).map((entry) => {
+           const id = entry.id ?? entry.modelId;
+           const providers = (entry.inferenceProviderMapping ?? []).filter((p) => p.status === "live").map((p) => p.provider);
+           const pricingProvider = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.pricing);
+           const pricing = pricingProvider?.providerDetails?.pricing;
+           const contextLength = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.context_length)?.providerDetails?.context_length;
+           return {
+             id,
+             provider: "huggingface",
+             name: id.split("/").pop() ?? id,
+             modality,
+             local: false,
+             cost: {
+               price: pricing?.input ?? 0,
+               unit: pricing ? "per_1m_tokens" : "free"
+             },
+             capabilities: {
+               ...modality === "llm" ? {
+                 contextWindow: contextLength,
+                 supportsStreaming: true
+               } : {},
+               ...providers.length > 0 ? { inferenceProviders: providers } : {}
+             }
+           };
+         });
+       } finally {
+         clearTimeout(timer);
+       }
+     } catch {
+       return [];
      }
-     return models;
  }
  async chat(options) {
    const start = Date.now();