noosphere 0.2.0 → 0.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +210 -86
- package/dist/index.cjs +88 -28
- package/dist/index.cjs.map +1 -1
- package/dist/index.js +88 -28
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -59,103 +59,204 @@ const audio = await ai.speak({
 // audio.buffer contains the audio data
 ```
 
-## Dynamic Model Auto-Fetch — Always Up-to-Date
+## Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)
 
-Noosphere **automatically discovers the latest models
+Noosphere **automatically discovers the latest models from EVERY provider's API at runtime** — across **all 4 modalities** (LLM, image, video, TTS). When Google releases a new Gemini model, when OpenAI drops GPT-5, when FAL adds a new video model, when a new image model trends on HuggingFace — **you get them immediately**, without updating Noosphere or any dependency.
 
 ### The Problem It Solves
 
-Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 models in a pre-generated `models.generated.js` file. When a provider releases a new model, you'd have to wait for the library maintainer to
-… (old lines 69–96: remainder of the previous section, truncated by the diff viewer)
+Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 LLM models in a pre-generated `models.generated.js` file. HuggingFace providers typically hardcode 3-5 default models. When a provider releases a new model, you'd have to wait for the library maintainer to update, publish, and then you'd `npm update`. This lag can be days or weeks.
+
+**Noosphere solves this for every provider and every modality simultaneously.**
+
+### How It Works — Complete Auto-Fetch Architecture
+
+Noosphere has **3 independent auto-fetch systems** that work in parallel, one for each provider layer:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    NOOSPHERE AUTO-FETCH                     │
+├─────────────────────────────────────────────────────────────┤
+│                                                             │
+│  ┌─── Pi-AI Provider (LLM) ─────────────────────────────┐   │
+│  │ 8 parallel API calls on first chat()/stream():       │   │
+│  │   OpenAI, Anthropic, Google, Groq, Mistral,          │   │
+│  │   xAI, OpenRouter, Cerebras                          │   │
+│  │ → Merges with static pi-ai catalog (246 models)      │   │
+│  │ → Constructs synthetic Model objects for new ones    │   │
+│  └──────────────────────────────────────────────────────┘   │
+│                                                             │
+│  ┌─── FAL Provider (Image/Video/TTS) ───────────────────┐   │
+│  │ 1 API call on listModels():                          │   │
+│  │   GET https://api.fal.ai/v1/models/pricing           │   │
+│  │ → Returns ALL 867+ endpoints with live pricing       │   │
+│  │ → Auto-classifies modality from model ID + unit      │   │
+│  └──────────────────────────────────────────────────────┘   │
+│                                                             │
+│  ┌─── HuggingFace Provider (LLM/Image/TTS) ─────────────┐   │
+│  │ 3 parallel API calls on listModels():                │   │
+│  │   GET huggingface.co/api/models?pipeline_tag=...     │   │
+│  │ → text-generation (top 50 trending, inference-ready) │   │
+│  │ → text-to-image (top 50 trending, inference-ready)   │   │
+│  │ → text-to-speech (top 30 trending, inference-ready)  │   │
+│  │ → Includes inference provider mapping + pricing      │   │
+│  └──────────────────────────────────────────────────────┘   │
+│                                                             │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs
+
+On the **first `chat()` or `stream()` call**, Pi-AI queries every LLM provider's model listing API in parallel:
 
 | Provider | API Endpoint | Auth | Model Filter | API Protocol |
 |---|---|---|---|---|
-| **OpenAI** |
-| **Anthropic** |
-| **Google** |
-| **Groq** |
-| **Mistral** |
-| **xAI** |
-| **OpenRouter** |
-| **Cerebras** |
-
-### Resilience Guarantees
+| **OpenAI** | `GET /v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
+| **Anthropic** | `GET /v1/models?limit=100` | `x-api-key` + `anthropic-version: 2023-06-01` | `claude-*` | `anthropic-messages` |
+| **Google** | `GET /v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
+| **Groq** | `GET /openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
+| **Mistral** | `GET /v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
+| **xAI** | `GET /v1/models` | Bearer token | `grok*` | `openai-completions` |
+| **OpenRouter** | `GET /api/v1/models` | Bearer token | All (all OpenRouter models are usable) | `openai-completions` |
+| **Cerebras** | `GET /v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
 
-
-- **`Promise.allSettled()`** — if one provider fails, the others still work
-- **Silent failure** — network errors are caught and ignored, static catalog always available
-- **One-time fetch** — results are cached in memory, not re-fetched on every call
-- **Zero config** — works automatically if you have API keys set
-
-### How New Models Become Usable
-
-When a dynamically discovered model isn't in the static catalog, Noosphere constructs a **synthetic Model object** that pi-ai's `complete()` and `stream()` functions can use directly:
+**How new LLM models become usable:** When a model isn't in the static catalog, Noosphere constructs a **synthetic `Model` object** with the correct API protocol, base URL, and inherited cost data:
 
 ```typescript
-//
+// New model "gpt-4.5-turbo" discovered from OpenAI's /v1/models:
 {
   id: 'gpt-4.5-turbo',
   name: 'gpt-4.5-turbo',
-  api: 'openai-responses',
+  api: 'openai-responses',                // Correct protocol for OpenAI
   provider: 'openai',
   baseUrl: 'https://api.openai.com/v1',
-  reasoning: false,
+  reasoning: false,                       // Inferred from model ID prefix
   input: ['text', 'image'],
-  cost: { input: 2.5, output: 10,
-  contextWindow: 128000,
-  maxTokens: 16384,
+  cost: { input: 2.5, output: 10, ... },  // Inherited from template model
+  contextWindow: 128000,                  // From template or API response
+  maxTokens: 16384,
+}
+// This object is passed directly to pi-ai's complete()/stream() — works immediately
+```
+
+### Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API
+
+FAL already provides a **fully dynamic catalog**. On `listModels()`, it fetches from `https://api.fal.ai/v1/models/pricing`:
+
+```typescript
+// FAL returns an array with ALL available endpoints + live pricing:
+[
+  { modelId: "fal-ai/flux-pro/v1.1-ultra", price: 0.06, unit: "per_image" },
+  { modelId: "fal-ai/kling-video/v2/master/text-to-video", price: 0.10, unit: "per_second" },
+  { modelId: "fal-ai/kokoro/american-english", price: 0.002, unit: "per_1k_chars" },
+  // ... 867+ endpoints total
+]
+
+// Modality is auto-inferred from model ID + pricing unit:
+// - unit contains 'char' OR id contains 'tts'/'kokoro'/'elevenlabs' → TTS
+// - unit contains 'second' OR id contains 'video'/'kling'/'sora'/'veo' → Video
+// - Everything else → Image
+```
+
+**Result:** Every FAL model is always current — new endpoints appear the moment FAL publishes them. Pricing is always accurate because it comes directly from their API.
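The three classification rules listed in the hunk above are simple enough to sketch as a standalone helper. This is an illustrative sketch only — `classifyFalModality` is a hypothetical name, not Noosphere's actual internal function:

```typescript
// Sketch of the modality-inference rules described above.
// classifyFalModality is a hypothetical name; the real implementation
// inside the FAL provider may differ.
type FalModality = "tts" | "video" | "image";

function classifyFalModality(modelId: string, unit: string): FalModality {
  const id = modelId.toLowerCase();
  // unit contains 'char' OR id mentions a known TTS family → TTS
  if (unit.includes("char") || ["tts", "kokoro", "elevenlabs"].some((k) => id.includes(k))) {
    return "tts";
  }
  // unit contains 'second' OR id mentions a known video family → Video
  if (unit.includes("second") || ["video", "kling", "sora", "veo"].some((k) => id.includes(k))) {
    return "video";
  }
  // Everything else → Image
  return "image";
}
```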
+
+### Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API
+
+Instead of 3 hardcoded defaults, HuggingFace now fetches **trending inference-ready models** from the Hub API across all 3 modalities:
+
+```
+GET https://huggingface.co/api/models
+  ?pipeline_tag=text-generation        ← LLM models
+  &inference_provider=all              ← Only models available via inference API
+  &sort=trendingScore                  ← Most popular first
+  &limit=50                            ← Top 50
+  &expand[]=inferenceProviderMapping   ← Include provider routing + pricing
+```
+
+| Pipeline Tag | Modality | Limit | What It Fetches |
+|---|---|---|---|
+| `text-generation` | LLM | 50 | Top 50 trending chat/completion models with active inference endpoints |
+| `text-to-image` | Image | 50 | Top 50 trending image generation models (SDXL, Flux, etc.) |
+| `text-to-speech` | TTS | 30 | Top 30 trending TTS models with active inference endpoints |
+
+**What the Hub API returns per model:**
+```json
+{
+  "id": "Qwen/Qwen2.5-72B-Instruct",
+  "pipeline_tag": "text-generation",
+  "likes": 1893,
+  "downloads": 4521987,
+  "inferenceProviderMapping": [
+    {
+      "provider": "together",
+      "providerId": "Qwen/Qwen2.5-72B-Instruct-Turbo",
+      "status": "live",
+      "providerDetails": {
+        "context_length": 32768,
+        "pricing": { "input": 1.2, "output": 1.2 }
+      }
+    },
+    {
+      "provider": "fireworks-ai",
+      "providerId": "accounts/fireworks/models/qwen2p5-72b-instruct",
+      "status": "live"
+    }
+  ]
 }
 ```
 
-**
+**Noosphere extracts from this:**
+- Model ID → `id` field
+- Pricing → first provider with `providerDetails.pricing`
+- Context window → first provider with `providerDetails.context_length`
+- Inference providers → list of available providers (Together, Fireworks, Groq, etc.)
+
+**Three requests fire in parallel** (`Promise.allSettled`) with a **10-second timeout** each. If any fails, the 3 hardcoded defaults are always available as fallback.
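The Hub query shown above can be assembled with the standard `URLSearchParams` API. A sketch — the `hubModelsUrl` helper is hypothetical and mirrors only the parameters the diff documents:

```typescript
// Build the Hub API query described above for one pipeline tag.
// hubModelsUrl is an illustrative helper, not part of Noosphere's API.
function hubModelsUrl(pipelineTag: string, limit: number): string {
  const params = new URLSearchParams({
    pipeline_tag: pipelineTag,
    inference_provider: "all",              // only models with live inference endpoints
    sort: "trendingScore",                  // most popular first
    limit: String(limit),
    "expand[]": "inferenceProviderMapping", // provider routing + pricing
  });
  return `https://huggingface.co/api/models?${params}`;
}
```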
+
+### Resilience Guarantees (All Layers)
+
+| Guarantee | Pi-AI (LLM) | FAL (Image/Video/TTS) | HuggingFace (LLM/Image/TTS) |
+|---|---|---|---|
+| **Timeout** | 8s per provider | No custom timeout | 10s per pipeline_tag |
+| **Parallelism** | 8 concurrent requests | 1 request (returns all) | 3 concurrent requests |
+| **Failure handling** | `Promise.allSettled` | Returns `[]` on error | `Promise.allSettled` |
+| **Fallback** | Static pi-ai catalog (246 models) | Empty list (provider still usable by model ID) | 3 hardcoded defaults |
+| **Caching** | One-time fetch, cached in memory | Per `listModels()` call | One-time fetch, cached in memory |
+| **Auth required** | Yes (per-provider API keys) | Yes (FAL key) | Optional (works without token) |
+
+### Total Model Coverage
+
+| Source | Modalities | Model Count | Update Frequency |
+|---|---|---|---|
+| Pi-AI static catalog | LLM | ~246 | On npm update |
+| Pi-AI dynamic fetch | LLM | **All models across 8 providers** | **Every session** |
+| FAL pricing API | Image, Video, TTS | 867+ | **Every `listModels()` call** |
+| HuggingFace Hub API | LLM, Image, TTS | Top 130 trending | **Every session** |
+| ComfyUI `/object_info` | Image | Local checkpoints | **Every `listModels()` call** |
+| Local TTS `/voices` | TTS | Local voices | **Every `listModels()` call** |
 
 ### Force Refresh
 
 ```typescript
 const ai = new Noosphere();
 
-// Models are auto-fetched on first call:
+// Models are auto-fetched on first call — no action needed:
 await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately
 
-//
-
-
-
-
+// Trigger a full sync across ALL providers:
+const result = await ai.syncModels();
+// result = { synced: 1200+, byProvider: { 'pi-ai': 300, 'fal': 867, 'huggingface': 130, ... }, errors: [] }
+
+// Get all models for a specific modality:
+const imageModels = await ai.getModels('image');
+// Returns: FAL image models + HuggingFace image models + ComfyUI models
 ```
 
-### Why
+### Why Hybrid (Static + Dynamic)?
 
 | Approach | Pros | Cons |
 |---|---|---|
-| **Static catalog only**
+| **Static catalog only** | Accurate costs, fast startup | Stale within days, miss new models |
 | **Dynamic only** | Always current | No cost data, no context window info, slow startup |
 | **Hybrid (Noosphere)** | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |
 
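The extraction rules documented in this hunk (pricing from the first provider carrying `providerDetails.pricing`, context window from the first carrying `context_length`, all providers with `status: "live"`) can be sketched as a small helper. `extractModelMeta` and its simplified types are hypothetical, for illustration only:

```typescript
// Simplified shape of one entry in a Hub inferenceProviderMapping array.
interface ProviderEntry {
  provider: string;
  status?: string;
  providerDetails?: {
    context_length?: number;
    pricing?: { input: number; output?: number };
  };
}

// Apply the "first provider that has it" extraction rules described above.
// Hypothetical helper; Noosphere's internal code may be organized differently.
function extractModelMeta(mapping: ProviderEntry[]) {
  const pricing = mapping.find((p) => p.providerDetails?.pricing)?.providerDetails?.pricing;
  const contextWindow = mapping.find((p) => p.providerDetails?.context_length)
    ?.providerDetails?.context_length;
  const providers = mapping.filter((p) => p.status === "live").map((p) => p.provider);
  return { pricing, contextWindow, providers };
}
```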
@@ -1306,15 +1407,33 @@ The `@fal-ai/client` provides additional features beyond what Noosphere surfaces
 
 ---
 
-### Hugging Face — Open Source AI (30+ tasks)
+### Hugging Face — Open Source AI (30+ tasks, Dynamic Discovery)
 
 **Provider ID:** `huggingface`
 **Modalities:** LLM, Image, TTS
 **Library:** `@huggingface/inference`
+**Auto-Fetch:** Yes — discovers trending inference-ready models from the Hub API
 
-Access to the entire Hugging Face Hub ecosystem.
+Access to the entire Hugging Face Hub ecosystem. Noosphere **automatically discovers the top trending models** across all 3 modalities via the Hub API, filtered to only include models with active inference provider endpoints.
 
-####
+#### Auto-Discovered Models
+
+On first `listModels()` call, HuggingFace fetches from:
+```
+GET https://huggingface.co/api/models?inference_provider=all&pipeline_tag={tag}&sort=trendingScore&limit={n}&expand[]=inferenceProviderMapping
+```
+
+| Pipeline Tag | Modality | Limit | Example Models |
+|---|---|---|---|
+| `text-generation` | LLM | 50 | Qwen2.5-72B-Instruct, Llama-3.3-70B, DeepSeek-V3, Mistral-Large |
+| `text-to-image` | Image | 50 | FLUX.1-dev, Stable Diffusion 3.5, SDXL-Lightning, Playground v2.5 |
+| `text-to-speech` | TTS | 30 | Kokoro-82M, Bark, MMS-TTS |
+
+Each discovered model includes **inference provider routing** (Together, Fireworks, Groq, Replicate, etc.) and **pricing data** when available from the provider.
+
+#### Fallback Default Models
+
+These 3 models are always available, even if the Hub API is unreachable:
 
 | Modality | Default Model | Description |
 |---|---|---|
@@ -1322,7 +1441,7 @@ Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace
 | Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
 | TTS | `facebook/mms-tts-eng` | MMS TTS English |
 
-Any HuggingFace model ID works — just pass it as the `model` parameter:
+Any HuggingFace model ID works — just pass it as the `model` parameter (even if it's not in the auto-discovered list):
 
 ```typescript
 await ai.chat({
@@ -1480,26 +1599,31 @@ const buffer = Buffer.from(await blob.arrayBuffer());
 // result.media = { format: 'wav' }
 ```
 
-**Model listing —
+**Model listing — dynamic Hub API discovery:**
 ```typescript
-//
-// HuggingFace returns a HARDCODED list of 3 curated models:
+// HuggingFace now auto-fetches trending models from the Hub API:
 async listModels(modality?: Modality): Promise<ModelInfo[]> {
-
-
-
-  }
-  if (!modality || modality === 'tts') {
-    models.push({ id: 'facebook/mms-tts-eng', ... });
-  }
-  if (!modality || modality === 'llm') {
-    models.push({ id: 'meta-llama/Llama-3.1-8B-Instruct', ... });
-  }
-  return models;
+  if (!this.dynamicModels) await this.fetchHubModels();
+  // Returns: 3 hardcoded defaults + top 50 LLM + top 50 image + top 30 TTS
+  // All filtered by inference_provider=all (only inference-ready models)
 }
-
-//
-//
+
+// Hub API request per modality:
+// GET https://huggingface.co/api/models
+//   ?pipeline_tag=text-generation
+//   &inference_provider=all              ← Only models with active inference endpoints
+//   &sort=trendingScore                  ← Most popular first
+//   &limit=50
+//   &expand[]=inferenceProviderMapping   ← Include provider routing + pricing
+
+// Response includes per model:
+// - id: "Qwen/Qwen2.5-72B-Instruct"
+// - inferenceProviderMapping: [{ provider: "together", status: "live",
+//     providerDetails: { context_length: 32768, pricing: { input: 1.2 } } }]
+
+// Pricing and context_length extracted from inferenceProviderMapping
+// 3 hardcoded defaults always included as fallback
+// Results cached in memory after first fetch
 ```
 
 #### The 17 HuggingFace Inference Providers
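The same resilience pattern recurs in every layer this README diff describes: fire the fetchers, keep whatever settles fulfilled, dedupe by ID, and always keep the hardcoded defaults. A generic sketch — the `mergeCatalogs` helper is hypothetical, not part of Noosphere's API:

```typescript
// Generic version of the fetch-with-fallback pattern described above:
// run all fetchers via Promise.allSettled, keep fulfilled results,
// dedupe by id, and always include the hardcoded defaults.
// mergeCatalogs is an illustrative name, not a Noosphere export.
async function mergeCatalogs<T extends { id: string }>(
  defaults: T[],
  fetchers: Array<() => Promise<T[]>>,
): Promise<T[]> {
  const results = await Promise.allSettled(fetchers.map((f) => f()));
  const seen = new Set(defaults.map((m) => m.id));
  const merged = [...defaults];
  for (const r of results) {
    if (r.status !== "fulfilled") continue; // silent failure: skip failed providers
    for (const m of r.value) {
      if (!seen.has(m.id)) {
        seen.add(m.id);
        merged.push(m);
      }
    }
  }
  return merged;
}
```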
package/dist/index.cjs
CHANGED
@@ -1037,51 +1037,111 @@ var LocalTTSProvider = class {
 
 // src/providers/huggingface.ts
 var import_inference = require("@huggingface/inference");
+var HF_HUB_API = "https://huggingface.co/api/models";
+var FETCH_TIMEOUT_MS2 = 1e4;
+var PIPELINE_TAG_MAP = {
+  "text-generation": { modality: "llm", limit: 50 },
+  "text-to-image": { modality: "image", limit: 50 },
+  "text-to-speech": { modality: "tts", limit: 30 }
+};
+var DEFAULT_MODELS = [
+  { id: "stabilityai/stable-diffusion-xl-base-1.0", provider: "huggingface", name: "SDXL Base", modality: "image", local: false, cost: { price: 0, unit: "free" } },
+  { id: "facebook/mms-tts-eng", provider: "huggingface", name: "MMS TTS English", modality: "tts", local: false, cost: { price: 0, unit: "free" } },
+  { id: "meta-llama/Llama-3.1-8B-Instruct", provider: "huggingface", name: "Llama 3.1 8B", modality: "llm", local: false, cost: { price: 0, unit: "free" } }
+];
 var HuggingFaceProvider = class {
   id = "huggingface";
   name = "HuggingFace Inference";
   modalities = ["image", "tts", "llm"];
   isLocal = false;
   client;
+  token;
+  dynamicModels = null;
   constructor(token) {
+    this.token = token;
     this.client = new import_inference.HfInference(token);
   }
   async ping() {
     return true;
   }
   async listModels(modality) {
+    if (!this.dynamicModels) {
+      await this.fetchHubModels();
+    }
+    const all = this.dynamicModels ?? DEFAULT_MODELS;
+    if (modality) return all.filter((m) => m.modality === modality);
+    return all;
+  }
+  async fetchHubModels() {
+    const seenIds = /* @__PURE__ */ new Set();
     const models = [];
-
-
-
-      provider: "huggingface",
-      name: "SDXL Base",
-      modality: "image",
-      local: false,
-      cost: { price: 0, unit: "free" }
-    });
+    for (const d of DEFAULT_MODELS) {
+      seenIds.add(d.id);
+      models.push(d);
     }
-… (old lines 1064–1072 removed; contents truncated by the diff viewer)
+    const fetches = Object.entries(PIPELINE_TAG_MAP).map(
+      ([tag, { modality, limit }]) => this.fetchByPipelineTag(tag, modality, limit)
+    );
+    const results = await Promise.allSettled(fetches);
+    for (const result of results) {
+      if (result.status !== "fulfilled") continue;
+      for (const model of result.value) {
+        if (seenIds.has(model.id)) continue;
+        seenIds.add(model.id);
+        models.push(model);
+      }
     }
-… (old lines 1074–1082 removed; contents truncated by the diff viewer)
+    this.dynamicModels = models;
+  }
+  async fetchByPipelineTag(pipelineTag, modality, limit) {
+    try {
+      const controller = new AbortController();
+      const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS2);
+      try {
+        const params = new URLSearchParams({
+          pipeline_tag: pipelineTag,
+          inference_provider: "all",
+          sort: "trendingScore",
+          limit: String(limit),
+          "expand[]": "inferenceProviderMapping"
+        });
+        const res = await fetch(`${HF_HUB_API}?${params}`, {
+          headers: this.token ? { Authorization: `Bearer ${this.token}` } : {},
+          signal: controller.signal
+        });
+        if (!res.ok) return [];
+        const data = await res.json();
+        return data.filter((entry) => entry.id || entry.modelId).map((entry) => {
+          const id = entry.id ?? entry.modelId;
+          const providers = (entry.inferenceProviderMapping ?? []).filter((p) => p.status === "live").map((p) => p.provider);
+          const pricingProvider = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.pricing);
+          const pricing = pricingProvider?.providerDetails?.pricing;
+          const contextLength = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.context_length)?.providerDetails?.context_length;
+          return {
+            id,
+            provider: "huggingface",
+            name: id.split("/").pop() ?? id,
+            modality,
+            local: false,
+            cost: {
+              price: pricing?.input ?? 0,
+              unit: pricing ? "per_1m_tokens" : "free"
+            },
+            capabilities: {
+              ...modality === "llm" ? {
+                contextWindow: contextLength,
+                supportsStreaming: true
+              } : {},
+              ...providers.length > 0 ? { inferenceProviders: providers } : {}
+            }
+          };
+        });
+      } finally {
+        clearTimeout(timer);
+      }
+    } catch {
+      return [];
     }
-    return models;
   }
   async chat(options) {
     const start = Date.now();