noosphere 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -59,103 +59,204 @@ const audio = await ai.speak({
  // audio.buffer contains the audio data
  ```
 
- ## Dynamic Model Auto-Fetch — Always Up-to-Date
+ ## Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)
 
- Noosphere **automatically discovers the latest models** from every provider's API at runtime. When Google releases a new Gemini model, when OpenAI drops GPT-5, when Anthropic publishes Claude 4 — **you get them immediately**, without updating Noosphere or any dependency.
+ Noosphere **automatically discovers the latest models from EVERY provider's API at runtime** — across **all 4 modalities** (LLM, image, video, TTS). When Google releases a new Gemini model, when OpenAI drops GPT-5, when FAL adds a new video model, when a new image model trends on HuggingFace — **you get them immediately**, without updating Noosphere or any dependency.
 
  ### The Problem It Solves
 
- Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 models in a pre-generated `models.generated.js` file. When a provider releases a new model, you'd have to wait for the library maintainer to run `npm run generate-models`, publish a new version, and then you'd `npm update`. This lag can be days or weeks.
-
- ### How It Works
-
- On the **first API call**, Noosphere queries every provider's model listing API in parallel and merges the results with the static catalog:
-
- ```
- First ai.chat() / ai.image() / ai.stream() call
- │
- ├─ 1. Load static pi-ai catalog (246 models with accurate cost/context data)
- │
- ├─ 2. Parallel fetch from ALL provider APIs (8 concurrent requests):
- │    ├── GET https://api.openai.com/v1/models (Bearer token)
- │    ├── GET https://api.anthropic.com/v1/models (x-api-key + anthropic-version)
- │    ├── GET https://generativelanguage.googleapis.com/... (API key in URL)
- │    ├── GET https://api.groq.com/openai/v1/models (Bearer token)
- │    ├── GET https://api.mistral.ai/v1/models (Bearer token)
- │    ├── GET https://api.x.ai/v1/models (Bearer token)
- │    ├── GET https://openrouter.ai/api/v1/models (Bearer token)
- │    └── GET https://api.cerebras.ai/v1/models (Bearer token)
- │
- ├─ 3. Filter results (chat models only — exclude embeddings, TTS, whisper, etc.)
- │
- ├─ 4. Deduplicate against static catalog (static wins — has accurate cost data)
- │
- └─ 5. Merge: Static catalog + newly discovered models = complete model list
- ```
-
- ### What Gets Fetched Per Provider
+ Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 LLM models in a pre-generated `models.generated.js` file. HuggingFace providers typically hardcode 3-5 default models. When a provider releases a new model, you'd have to wait for the library maintainer to update and publish, and then run `npm update` yourself. This lag can be days or weeks.
+
+ **Noosphere solves this for every provider and every modality simultaneously.**
+
+ ### How It Works: Complete Auto-Fetch Architecture
+
+ Noosphere has **3 independent auto-fetch systems** that work in parallel, one for each provider layer:
+
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │                    NOOSPHERE AUTO-FETCH                     │
+ ├─────────────────────────────────────────────────────────────┤
+ │                                                             │
+ │ ┌─── Pi-AI Provider (LLM) ─────────────────────────────┐    │
+ │ │ 8 parallel API calls on first chat()/stream():       │    │
+ │ │   OpenAI, Anthropic, Google, Groq, Mistral,          │    │
+ │ │   xAI, OpenRouter, Cerebras                          │    │
+ │ │ → Merges with static pi-ai catalog (246 models)      │    │
+ │ │ → Constructs synthetic Model objects for new ones    │    │
+ │ └──────────────────────────────────────────────────────┘    │
+ │                                                             │
+ │ ┌─── FAL Provider (Image/Video/TTS) ───────────────────┐    │
+ │ │ 1 API call on listModels():                          │    │
+ │ │   GET https://api.fal.ai/v1/models/pricing           │    │
+ │ │ → Returns ALL 867+ endpoints with live pricing       │    │
+ │ │ → Auto-classifies modality from model ID + unit      │    │
+ │ └──────────────────────────────────────────────────────┘    │
+ │                                                             │
+ │ ┌─── HuggingFace Provider (LLM/Image/TTS) ─────────────┐    │
+ │ │ 3 parallel API calls on listModels():                │    │
+ │ │   GET huggingface.co/api/models?pipeline_tag=...     │    │
+ │ │ → text-generation (top 50 trending, inference-ready) │    │
+ │ │ → text-to-image (top 50 trending, inference-ready)   │    │
+ │ │ → text-to-speech (top 30 trending, inference-ready)  │    │
+ │ │ → Includes inference provider mapping + pricing      │    │
+ │ └──────────────────────────────────────────────────────┘    │
+ │                                                             │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs
+
+ On the **first `chat()` or `stream()` call**, Pi-AI queries every LLM provider's model listing API in parallel:
 
  | Provider | API Endpoint | Auth | Model Filter | API Protocol |
  |---|---|---|---|---|
- | **OpenAI** | `/v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
- | **Anthropic** | `/v1/models?limit=100` | `x-api-key` + `anthropic-version` | `claude-*` | `anthropic-messages` |
- | **Google** | `/v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
- | **Groq** | `/openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
- | **Mistral** | `/v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
- | **xAI** | `/v1/models` | Bearer token | `grok*` | `openai-completions` |
- | **OpenRouter** | `/api/v1/models` | Bearer token | All (OpenRouter only lists usable models) | `openai-completions` |
- | **Cerebras** | `/v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
-
- ### Resilience Guarantees
+ | **OpenAI** | `GET /v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
+ | **Anthropic** | `GET /v1/models?limit=100` | `x-api-key` + `anthropic-version: 2023-06-01` | `claude-*` | `anthropic-messages` |
+ | **Google** | `GET /v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
+ | **Groq** | `GET /openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
+ | **Mistral** | `GET /v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
+ | **xAI** | `GET /v1/models` | Bearer token | `grok*` | `openai-completions` |
+ | **OpenRouter** | `GET /api/v1/models` | Bearer token | All (all OpenRouter models are usable) | `openai-completions` |
+ | **Cerebras** | `GET /v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
 
- - **8-second timeout** per provider, so slow APIs don't block everything
- - **`Promise.allSettled()`** — if one provider fails, the others still work
- - **Silent failure** — network errors are caught and ignored, static catalog always available
- - **One-time fetch** — results are cached in memory, not re-fetched on every call
- - **Zero config** — works automatically if you have API keys set
-
- ### How New Models Become Usable
-
- When a dynamically discovered model isn't in the static catalog, Noosphere constructs a **synthetic Model object** that pi-ai's `complete()` and `stream()` functions can use directly:
+ **How new LLM models become usable:** When a model isn't in the static catalog, Noosphere constructs a **synthetic `Model` object** with the correct API protocol, base URL, and inherited cost data:
 
  ```typescript
- // For a new model like "gpt-4.5-turbo" discovered from OpenAI's API:
+ // New model "gpt-4.5-turbo" discovered from OpenAI's /v1/models:
  {
    id: 'gpt-4.5-turbo',
    name: 'gpt-4.5-turbo',
-   api: 'openai-responses', // Correct protocol for the provider
+   api: 'openai-responses', // Correct protocol for OpenAI
    provider: 'openai',
    baseUrl: 'https://api.openai.com/v1',
-   reasoning: false, // Inferred from model ID prefix
+   reasoning: false, // Inferred from model ID prefix
    input: ['text', 'image'],
-   cost: { input: 2.5, output: 10, cacheRead: 1.25, cacheWrite: 2.5 }, // From template
-   contextWindow: 128000, // From template or provider API
-   maxTokens: 16384, // From template or provider API
+   cost: { input: 2.5, output: 10, ... }, // Inherited from template model
+   contextWindow: 128000, // From template or API response
+   maxTokens: 16384,
+ }
+ // This object is passed directly to pi-ai's complete()/stream() — works immediately
+ ```
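The dedup-and-merge step described above (the static catalog wins, newly discovered models are appended) can be sketched as follows. `ModelEntry` and `mergeCatalogs` are illustrative names, not Noosphere's actual internals:

```typescript
// Sketch of the merge step: static catalog entries win over dynamic ones.
// Types and helper names are illustrative, not Noosphere's real API.
type ModelEntry = { id: string; provider: string; source: 'static' | 'dynamic' };

function mergeCatalogs(staticModels: ModelEntry[], dynamicModels: ModelEntry[]): ModelEntry[] {
  const known = new Set(staticModels.map((m) => `${m.provider}/${m.id}`));
  // Keep every static entry (accurate cost data), then append only the
  // dynamically discovered models the static catalog doesn't know yet.
  const discovered = dynamicModels.filter((m) => !known.has(`${m.provider}/${m.id}`));
  return [...staticModels, ...discovered];
}
```

This is why a brand-new model is usable immediately while known models keep their exact pricing.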
+
+ ### Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API
+
+ FAL already provides a **fully dynamic catalog**. On `listModels()`, it fetches from `https://api.fal.ai/v1/models/pricing`:
+
+ ```typescript
+ // FAL returns an array with ALL available endpoints + live pricing:
+ [
+   { modelId: "fal-ai/flux-pro/v1.1-ultra", price: 0.06, unit: "per_image" },
+   { modelId: "fal-ai/kling-video/v2/master/text-to-video", price: 0.10, unit: "per_second" },
+   { modelId: "fal-ai/kokoro/american-english", price: 0.002, unit: "per_1k_chars" },
+   // ... 867+ endpoints total
+ ]
+
+ // Modality is auto-inferred from model ID + pricing unit:
+ // - unit contains 'char' OR id contains 'tts'/'kokoro'/'elevenlabs' → TTS
+ // - unit contains 'second' OR id contains 'video'/'kling'/'sora'/'veo' → Video
+ // - Everything else → Image
+ ```
+
+ **Result:** Every FAL model is always current — new endpoints appear the moment FAL publishes them. Pricing is always accurate because it comes directly from their API.
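The classification rules listed above can be sketched as a small function. `inferFalModality` is an illustrative name; the matching rules come from the comments in the snippet:

```typescript
// Sketch of the modality auto-classification described above.
// Function and type names are illustrative, not FAL's or Noosphere's API.
type FalModality = 'tts' | 'video' | 'image';

function inferFalModality(modelId: string, unit: string): FalModality {
  const id = modelId.toLowerCase();
  // TTS: character-based pricing, or a known speech model in the ID
  if (unit.includes('char') || /tts|kokoro|elevenlabs/.test(id)) return 'tts';
  // Video: per-second pricing, or a known video model in the ID
  if (unit.includes('second') || /video|kling|sora|veo/.test(id)) return 'video';
  // Everything else is treated as an image endpoint
  return 'image';
}
```

Checking TTS before video matters: a per-second-priced speech endpoint would otherwise be misfiled.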
+
+ ### Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API
+
+ Instead of 3 hardcoded defaults, HuggingFace now fetches **trending inference-ready models** from the Hub API across all 3 modalities:
+
+ ```
+ GET https://huggingface.co/api/models
+   ?pipeline_tag=text-generation        ← LLM models
+   &inference_provider=all              ← Only models available via inference API
+   &sort=trendingScore                  ← Most popular first
+   &limit=50                            ← Top 50
+   &expand[]=inferenceProviderMapping   ← Include provider routing + pricing
+ ```
+
+ | Pipeline Tag | Modality | Limit | What It Fetches |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Top 50 trending chat/completion models with active inference endpoints |
+ | `text-to-image` | Image | 50 | Top 50 trending image generation models (SDXL, Flux, etc.) |
+ | `text-to-speech` | TTS | 30 | Top 30 trending TTS models with active inference endpoints |
+
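Building that query string can be sketched with `URLSearchParams`; `buildHubUrl` is an illustrative helper, while the parameter names come from the request shown above (note that `expand[]` gets percent-encoded):

```typescript
// Sketch of constructing the Hub API query shown above.
// buildHubUrl is illustrative; the parameters mirror the request example.
function buildHubUrl(pipelineTag: string, limit: number): string {
  const params = new URLSearchParams({
    pipeline_tag: pipelineTag,
    inference_provider: 'all',
    sort: 'trendingScore',
    limit: String(limit),
    'expand[]': 'inferenceProviderMapping', // serialized as expand%5B%5D=...
  });
  return `https://huggingface.co/api/models?${params}`;
}
```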
+ **What the Hub API returns per model:**
+ ```json
+ {
+   "id": "Qwen/Qwen2.5-72B-Instruct",
+   "pipeline_tag": "text-generation",
+   "likes": 1893,
+   "downloads": 4521987,
+   "inferenceProviderMapping": [
+     {
+       "provider": "together",
+       "providerId": "Qwen/Qwen2.5-72B-Instruct-Turbo",
+       "status": "live",
+       "providerDetails": {
+         "context_length": 32768,
+         "pricing": { "input": 1.2, "output": 1.2 }
+       }
+     },
+     {
+       "provider": "fireworks-ai",
+       "providerId": "accounts/fireworks/models/qwen2p5-72b-instruct",
+       "status": "live"
+     }
+   ]
  }
  ```
 
- **Template inheritance:** Cost and context window data come from a "template" — the first model in the static catalog for that provider. This means new models inherit approximate pricing until the static catalog is updated with exact numbers. For Google, the API returns `inputTokenLimit` and `outputTokenLimit` directly, so context window data is always accurate.
+ **Noosphere extracts from this:**
+ - Model ID → `id` field
+ - Pricing → first provider with `providerDetails.pricing`
+ - Context window → first provider with `providerDetails.context_length`
+ - Inference providers → list of available providers (Together, Fireworks, Groq, etc.)
+
+ **Three requests fire in parallel** (`Promise.allSettled`) with a **10-second timeout** each. If any fails, the 3 hardcoded defaults are always available as fallback.
+
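The extraction rules in the bullet list above can be sketched like this. `HubEntry` mirrors the JSON example, but `extractModelInfo` itself is an illustrative helper, not Noosphere's export:

```typescript
// Sketch of pulling pricing / context window out of inferenceProviderMapping.
// Shapes follow the Hub API response example; the helper is illustrative.
interface ProviderMapping {
  provider: string;
  status: string;
  providerDetails?: { context_length?: number; pricing?: { input: number; output?: number } };
}
interface HubEntry { id: string; inferenceProviderMapping?: ProviderMapping[] }

function extractModelInfo(entry: HubEntry) {
  const mappings = entry.inferenceProviderMapping ?? [];
  return {
    id: entry.id,
    // First provider that reports pricing wins; otherwise the model is treated as free
    pricing: mappings.find((m) => m.providerDetails?.pricing)?.providerDetails?.pricing,
    contextWindow: mappings.find((m) => m.providerDetails?.context_length)?.providerDetails?.context_length,
    // Only providers with a live endpoint are listed as usable
    providers: mappings.filter((m) => m.status === 'live').map((m) => m.provider),
  };
}
```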
+ ### Resilience Guarantees (All Layers)
+
+ | Guarantee | Pi-AI (LLM) | FAL (Image/Video/TTS) | HuggingFace (LLM/Image/TTS) |
+ |---|---|---|---|
+ | **Timeout** | 8s per provider | No custom timeout | 10s per pipeline_tag |
+ | **Parallelism** | 8 concurrent requests | 1 request (returns all) | 3 concurrent requests |
+ | **Failure handling** | `Promise.allSettled` | Returns `[]` on error | `Promise.allSettled` |
+ | **Fallback** | Static pi-ai catalog (246 models) | Empty list (provider still usable by model ID) | 3 hardcoded defaults |
+ | **Caching** | One-time fetch, cached in memory | Per `listModels()` call | One-time fetch, cached in memory |
+ | **Auth required** | Yes (per-provider API keys) | Yes (FAL key) | Optional (works without token) |
+
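The timeout-plus-`Promise.allSettled` pattern shared by the Pi-AI and HuggingFace layers can be sketched as below; `fetchWithTimeout`, `keepFulfilled`, and `fetchAll` are illustrative helpers under those assumptions, not Noosphere exports:

```typescript
// Sketch of the shared resilience pattern: per-request timeout via AbortController,
// Promise.allSettled so one failing provider never blocks the others.
// Helper names are illustrative, not part of Noosphere's API.
async function fetchWithTimeout(url: string, timeoutMs: number): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } finally {
    clearTimeout(timer); // always clear, even on abort or error
  }
}

function keepFulfilled<T>(results: PromiseSettledResult<T>[]): T[] {
  // Failed providers are silently dropped; survivors are kept
  return results
    .filter((r): r is PromiseFulfilledResult<T> => r.status === 'fulfilled')
    .map((r) => r.value);
}

async function fetchAll(urls: string[], timeoutMs: number): Promise<unknown[]> {
  const results = await Promise.allSettled(urls.map((u) => fetchWithTimeout(u, timeoutMs)));
  return keepFulfilled(results);
}
```

This matches the table's guarantees: a slow API hits the abort timer, a failed API becomes a rejected entry that `keepFulfilled` drops, and the static fallback remains available.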
+ ### Total Model Coverage
+
+ | Source | Modalities | Model Count | Update Frequency |
+ |---|---|---|---|
+ | Pi-AI static catalog | LLM | ~246 | On npm update |
+ | Pi-AI dynamic fetch | LLM | **All models across 8 providers** | **Every session** |
+ | FAL pricing API | Image, Video, TTS | 867+ | **Every `listModels()` call** |
+ | HuggingFace Hub API | LLM, Image, TTS | Top 130 trending | **Every session** |
+ | ComfyUI `/object_info` | Image | Local checkpoints | **Every `listModels()` call** |
+ | Local TTS `/voices` | TTS | Local voices | **Every `listModels()` call** |
 
  ### Force Refresh
 
  ```typescript
  const ai = new Noosphere();
 
- // Models are auto-fetched on first call:
+ // Models are auto-fetched on first call — no action needed:
  await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately
 
- // Force a re-fetch if you suspect new models were added mid-session:
- // (access the provider's refreshDynamicModels method via the registry)
- const models = await ai.getModels('llm');
- // Or trigger a full sync:
- await ai.syncModels();
+ // Trigger a full sync across ALL providers:
+ const result = await ai.syncModels();
+ // result = { synced: 1200+, byProvider: { 'pi-ai': 300, 'fal': 867, 'huggingface': 130, ... }, errors: [] }
+
+ // Get all models for a specific modality:
+ const imageModels = await ai.getModels('image');
+ // Returns: FAL image models + HuggingFace image models + ComfyUI models
  ```
 
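A typical way to consume the `syncModels()` summary shown above is a one-line report; the `{ synced, byProvider, errors }` shape comes from the README example, while `summarizeSync` is an illustrative helper:

```typescript
// Sketch of summarizing the syncModels() result shown above.
// The result shape is from the README example; the helper is illustrative.
interface SyncResult { synced: number; byProvider: Record<string, number>; errors: string[] }

function summarizeSync(result: SyncResult): string {
  const perProvider = Object.entries(result.byProvider)
    .map(([provider, count]) => `${provider}=${count}`)
    .join(', ');
  const errorNote = result.errors.length ? `, ${result.errors.length} errors` : '';
  return `${result.synced} models (${perProvider})${errorNote}`;
}
```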
- ### Why Not Just Use the Provider APIs Directly?
+ ### Why Hybrid (Static + Dynamic)?
 
  | Approach | Pros | Cons |
  |---|---|---|
- | **Static catalog only** (old) | Accurate costs, fast startup | Stale within days, miss new models |
+ | **Static catalog only** | Accurate costs, fast startup | Stale within days, miss new models |
  | **Dynamic only** | Always current | No cost data, no context window info, slow startup |
  | **Hybrid (Noosphere)** | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |
 
@@ -1306,15 +1407,33 @@ The `@fal-ai/client` provides additional features beyond what Noosphere surfaces
 
  ---
 
- ### Hugging Face — Open Source AI (30+ tasks)
+ ### Hugging Face — Open Source AI (30+ tasks, Dynamic Discovery)
 
  **Provider ID:** `huggingface`
  **Modalities:** LLM, Image, TTS
  **Library:** `@huggingface/inference`
+ **Auto-Fetch:** Yes — discovers trending inference-ready models from the Hub API
 
- Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
+ Access to the entire Hugging Face Hub ecosystem. Noosphere **automatically discovers the top trending models** across all 3 modalities via the Hub API, filtered to only include models with active inference provider endpoints.
 
- #### Default Models
+ #### Auto-Discovered Models
+
+ On first `listModels()` call, HuggingFace fetches from:
+ ```
+ GET https://huggingface.co/api/models?inference_provider=all&pipeline_tag={tag}&sort=trendingScore&limit={n}&expand[]=inferenceProviderMapping
+ ```
+
+ | Pipeline Tag | Modality | Limit | Example Models |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Qwen2.5-72B-Instruct, Llama-3.3-70B, DeepSeek-V3, Mistral-Large |
+ | `text-to-image` | Image | 50 | FLUX.1-dev, Stable Diffusion 3.5, SDXL-Lightning, Playground v2.5 |
+ | `text-to-speech` | TTS | 30 | Kokoro-82M, Bark, MMS-TTS |
+
+ Each discovered model includes **inference provider routing** (Together, Fireworks, Groq, Replicate, etc.) and **pricing data** when available from the provider.
+
+ #### Fallback Default Models
+
+ These 3 models are always available, even if the Hub API is unreachable:
 
  | Modality | Default Model | Description |
  |---|---|---|
@@ -1322,7 +1441,7 @@ Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace
  | Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
  | TTS | `facebook/mms-tts-eng` | MMS TTS English |
 
- Any HuggingFace model ID works — just pass it as the `model` parameter:
+ Any HuggingFace model ID works — just pass it as the `model` parameter (even if it's not in the auto-discovered list):
 
  ```typescript
  await ai.chat({
@@ -1480,26 +1599,31 @@ const buffer = Buffer.from(await blob.arrayBuffer());
  // result.media = { format: 'wav' }
  ```
 
- **Model listing — curated defaults, not API discovery:**
+ **Model listing — dynamic Hub API discovery:**
  ```typescript
- // Unlike FAL (which fetches from API) or Pi-AI (which auto-generates),
- // HuggingFace returns a HARDCODED list of 3 curated models:
+ // HuggingFace now auto-fetches trending models from the Hub API:
  async listModels(modality?: Modality): Promise<ModelInfo[]> {
-   const models: ModelInfo[] = [];
-   if (!modality || modality === 'image') {
-     models.push({ id: 'stabilityai/stable-diffusion-xl-base-1.0', ... });
-   }
-   if (!modality || modality === 'tts') {
-     models.push({ id: 'facebook/mms-tts-eng', ... });
-   }
-   if (!modality || modality === 'llm') {
-     models.push({ id: 'meta-llama/Llama-3.1-8B-Instruct', ... });
-   }
-   return models;
+   if (!this.dynamicModels) await this.fetchHubModels();
+   // Returns: 3 hardcoded defaults + top 50 LLM + top 50 image + top 30 TTS
+   // All filtered by inference_provider=all (only inference-ready models)
  }
- // This means: the registry only KNOWS about 3 models by default,
- // but you can use ANY HuggingFace model by passing its ID directly.
- // The model just won't appear in getModels() or syncModels() results.
+
+ // Hub API request per modality:
+ // GET https://huggingface.co/api/models
+ //   ?pipeline_tag=text-generation
+ //   &inference_provider=all              ← Only models with active inference endpoints
+ //   &sort=trendingScore                  ← Most popular first
+ //   &limit=50
+ //   &expand[]=inferenceProviderMapping   ← Include provider routing + pricing
+
+ // Response includes per model:
+ // - id: "Qwen/Qwen2.5-72B-Instruct"
+ // - inferenceProviderMapping: [{ provider: "together", status: "live",
+ //     providerDetails: { context_length: 32768, pricing: { input: 1.2 } } }]
+
+ // Pricing and context_length extracted from inferenceProviderMapping
+ // 3 hardcoded defaults always included as fallback
+ // Results cached in memory after first fetch
  ```
 
  #### The 17 HuggingFace Inference Providers
package/dist/index.cjs CHANGED
@@ -1037,51 +1037,111 @@ var LocalTTSProvider = class {
 
  // src/providers/huggingface.ts
  var import_inference = require("@huggingface/inference");
+ var HF_HUB_API = "https://huggingface.co/api/models";
+ var FETCH_TIMEOUT_MS2 = 1e4;
+ var PIPELINE_TAG_MAP = {
+   "text-generation": { modality: "llm", limit: 50 },
+   "text-to-image": { modality: "image", limit: 50 },
+   "text-to-speech": { modality: "tts", limit: 30 }
+ };
+ var DEFAULT_MODELS = [
+   { id: "stabilityai/stable-diffusion-xl-base-1.0", provider: "huggingface", name: "SDXL Base", modality: "image", local: false, cost: { price: 0, unit: "free" } },
+   { id: "facebook/mms-tts-eng", provider: "huggingface", name: "MMS TTS English", modality: "tts", local: false, cost: { price: 0, unit: "free" } },
+   { id: "meta-llama/Llama-3.1-8B-Instruct", provider: "huggingface", name: "Llama 3.1 8B", modality: "llm", local: false, cost: { price: 0, unit: "free" } }
+ ];
  var HuggingFaceProvider = class {
    id = "huggingface";
    name = "HuggingFace Inference";
    modalities = ["image", "tts", "llm"];
    isLocal = false;
    client;
+   token;
+   dynamicModels = null;
    constructor(token) {
+     this.token = token;
      this.client = new import_inference.HfInference(token);
    }
    async ping() {
      return true;
    }
    async listModels(modality) {
+     if (!this.dynamicModels) {
+       await this.fetchHubModels();
+     }
+     const all = this.dynamicModels ?? DEFAULT_MODELS;
+     if (modality) return all.filter((m) => m.modality === modality);
+     return all;
+   }
+   async fetchHubModels() {
+     const seenIds = /* @__PURE__ */ new Set();
      const models = [];
- if (!modality || modality === "image") {
1055
- models.push({
1056
- id: "stabilityai/stable-diffusion-xl-base-1.0",
1057
- provider: "huggingface",
1058
- name: "SDXL Base",
1059
- modality: "image",
1060
- local: false,
1061
- cost: { price: 0, unit: "free" }
1062
- });
1078
+ for (const d of DEFAULT_MODELS) {
1079
+ seenIds.add(d.id);
1080
+ models.push(d);
1063
1081
  }
1064
- if (!modality || modality === "tts") {
1065
- models.push({
1066
- id: "facebook/mms-tts-eng",
1067
- provider: "huggingface",
1068
- name: "MMS TTS English",
1069
- modality: "tts",
1070
- local: false,
1071
- cost: { price: 0, unit: "free" }
1072
- });
1082
+ const fetches = Object.entries(PIPELINE_TAG_MAP).map(
1083
+ ([tag, { modality, limit }]) => this.fetchByPipelineTag(tag, modality, limit)
1084
+ );
1085
+ const results = await Promise.allSettled(fetches);
1086
+ for (const result of results) {
1087
+ if (result.status !== "fulfilled") continue;
1088
+ for (const model of result.value) {
1089
+ if (seenIds.has(model.id)) continue;
1090
+ seenIds.add(model.id);
1091
+ models.push(model);
1092
+ }
1073
1093
  }
1074
- if (!modality || modality === "llm") {
1075
- models.push({
1076
- id: "meta-llama/Llama-3.1-8B-Instruct",
1077
- provider: "huggingface",
1078
- name: "Llama 3.1 8B",
1079
- modality: "llm",
1080
- local: false,
1081
- cost: { price: 0, unit: "free" }
1082
- });
1094
+ this.dynamicModels = models;
1095
+ }
1096
+ async fetchByPipelineTag(pipelineTag, modality, limit) {
1097
+ try {
1098
+ const controller = new AbortController();
1099
+ const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS2);
1100
+ try {
1101
+ const params = new URLSearchParams({
1102
+ pipeline_tag: pipelineTag,
1103
+ inference_provider: "all",
1104
+ sort: "trendingScore",
1105
+ limit: String(limit),
1106
+ "expand[]": "inferenceProviderMapping"
1107
+ });
1108
+ const res = await fetch(`${HF_HUB_API}?${params}`, {
1109
+ headers: this.token ? { Authorization: `Bearer ${this.token}` } : {},
1110
+ signal: controller.signal
1111
+ });
1112
+ if (!res.ok) return [];
1113
+ const data = await res.json();
1114
+ return data.filter((entry) => entry.id || entry.modelId).map((entry) => {
1115
+ const id = entry.id ?? entry.modelId;
1116
+ const providers = (entry.inferenceProviderMapping ?? []).filter((p) => p.status === "live").map((p) => p.provider);
1117
+ const pricingProvider = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.pricing);
1118
+ const pricing = pricingProvider?.providerDetails?.pricing;
1119
+ const contextLength = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.context_length)?.providerDetails?.context_length;
1120
+ return {
1121
+ id,
1122
+ provider: "huggingface",
1123
+ name: id.split("/").pop() ?? id,
1124
+ modality,
1125
+ local: false,
1126
+ cost: {
1127
+ price: pricing?.input ?? 0,
1128
+ unit: pricing ? "per_1m_tokens" : "free"
1129
+ },
1130
+ capabilities: {
1131
+ ...modality === "llm" ? {
1132
+ contextWindow: contextLength,
1133
+ supportsStreaming: true
1134
+ } : {},
1135
+ ...providers.length > 0 ? { inferenceProviders: providers } : {}
1136
+ }
1137
+ };
1138
+ });
1139
+ } finally {
1140
+ clearTimeout(timer);
1141
+ }
1142
+ } catch {
1143
+ return [];
1083
1144
  }
1084
- return models;
1085
1145
  }
1086
1146
  async chat(options) {
1087
1147
  const start = Date.now();