noosphere 0.1.3 → 0.2.1

package/README.md CHANGED
@@ -7,7 +7,7 @@ One import. Every model. Every modality.
  ## Features
 
  - **4 modalities** — LLM chat, image generation, video generation, and text-to-speech
- - **246+ LLM models** — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
+ - **Always up-to-date models** — Dynamic auto-fetch from ALL provider APIs at runtime (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
  - **867+ media endpoints** — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
  - **30+ HuggingFace tasks** — LLM, image, TTS, translation, summarization, classification, and more
  - **Local-first architecture** — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
@@ -59,6 +59,209 @@ const audio = await ai.speak({
  // audio.buffer contains the audio data
  ```
 
+ ## Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)
+
+ Noosphere **automatically discovers the latest models from every provider's API at runtime** — across **all 4 modalities** (LLM, image, video, TTS). When Google releases a new Gemini model, when OpenAI drops GPT-5, when FAL adds a new video model, when a new image model trends on HuggingFace — **you get them immediately**, without updating Noosphere or any dependency.
+
+ ### The Problem It Solves
+
+ Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 LLM models in a pre-generated `models.generated.js` file. HuggingFace providers typically hardcode 3-5 default models. When a provider releases a new model, you have to wait for the library maintainer to update and publish, and then run `npm update` yourself. This lag can be days or weeks.
+
+ **Noosphere solves this for every provider and every modality simultaneously.**
+
+ ### How It Works — Complete Auto-Fetch Architecture
+
+ Noosphere has **3 independent auto-fetch systems** that work in parallel, one for each provider layer:
+
+ ```
+ ┌────────────────────────────────────────────────────────────┐
+ │                    NOOSPHERE AUTO-FETCH                    │
+ ├────────────────────────────────────────────────────────────┤
+ │                                                            │
+ │  ┌─── Pi-AI Provider (LLM) ─────────────────────────────┐  │
+ │  │ 8 parallel API calls on first chat()/stream():       │  │
+ │  │   OpenAI, Anthropic, Google, Groq, Mistral,          │  │
+ │  │   xAI, OpenRouter, Cerebras                          │  │
+ │  │ → Merges with static pi-ai catalog (246 models)      │  │
+ │  │ → Constructs synthetic Model objects for new ones    │  │
+ │  └──────────────────────────────────────────────────────┘  │
+ │                                                            │
+ │  ┌─── FAL Provider (Image/Video/TTS) ───────────────────┐  │
+ │  │ 1 API call on listModels():                          │  │
+ │  │   GET https://api.fal.ai/v1/models/pricing           │  │
+ │  │ → Returns ALL 867+ endpoints with live pricing       │  │
+ │  │ → Auto-classifies modality from model ID + unit      │  │
+ │  └──────────────────────────────────────────────────────┘  │
+ │                                                            │
+ │  ┌─── HuggingFace Provider (LLM/Image/TTS) ─────────────┐  │
+ │  │ 3 parallel API calls on listModels():                │  │
+ │  │   GET huggingface.co/api/models?pipeline_tag=...     │  │
+ │  │ → text-generation (top 50 trending, inference-ready) │  │
+ │  │ → text-to-image (top 50 trending, inference-ready)   │  │
+ │  │ → text-to-speech (top 30 trending, inference-ready)  │  │
+ │  │ → Includes inference provider mapping + pricing      │  │
+ │  └──────────────────────────────────────────────────────┘  │
+ │                                                            │
+ └────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs
+
+ On the **first `chat()` or `stream()` call**, Pi-AI queries every LLM provider's model listing API in parallel:
+
+ | Provider | API Endpoint | Auth | Model Filter | API Protocol |
+ |---|---|---|---|---|
+ | **OpenAI** | `GET /v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
+ | **Anthropic** | `GET /v1/models?limit=100` | `x-api-key` + `anthropic-version: 2023-06-01` | `claude-*` | `anthropic-messages` |
+ | **Google** | `GET /v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
+ | **Groq** | `GET /openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
+ | **Mistral** | `GET /v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
+ | **xAI** | `GET /v1/models` | Bearer token | `grok*` | `openai-completions` |
+ | **OpenRouter** | `GET /api/v1/models` | Bearer token | All (all OpenRouter models are usable) | `openai-completions` |
+ | **Cerebras** | `GET /v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
+
+ **How new LLM models become usable:** When a model isn't in the static catalog, Noosphere constructs a **synthetic `Model` object** with the correct API protocol, base URL, and inherited cost data:
+
+ ```typescript
+ // New model "gpt-4.5-turbo" discovered from OpenAI's /v1/models:
+ {
+   id: 'gpt-4.5-turbo',
+   name: 'gpt-4.5-turbo',
+   api: 'openai-responses',                // Correct protocol for OpenAI
+   provider: 'openai',
+   baseUrl: 'https://api.openai.com/v1',
+   reasoning: false,                       // Inferred from model ID prefix
+   input: ['text', 'image'],
+   cost: { input: 2.5, output: 10, ... },  // Inherited from template model
+   contextWindow: 128000,                  // From template or API response
+   maxTokens: 16384,
+ }
+ // This object is passed directly to pi-ai's complete()/stream() — works immediately
+ ```
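
A rough sketch of that construction step (the helper name `buildSyntheticModel` and the exact template-inheritance rules are illustrative assumptions, not Noosphere's actual internals):

```typescript
// Illustrative sketch only: Noosphere's real merge logic lives inside its
// Pi-AI provider; names and inheritance rules here are assumptions.
interface Model {
  id: string;
  name: string;
  api: string;
  provider: string;
  baseUrl: string;
  reasoning: boolean;
  input: string[];
  cost: { input: number; output: number };
  contextWindow: number;
  maxTokens: number;
}

// Inherit protocol, base URL, cost, and limits from a known template model,
// overriding only the identity fields for the newly discovered ID.
function buildSyntheticModel(discoveredId: string, template: Model): Model {
  return {
    ...template,
    id: discoveredId,
    name: discoveredId,
    // Crude heuristic mirroring the "inferred from model ID prefix" note:
    reasoning: /^o\d/.test(discoveredId),
  };
}

const template: Model = {
  id: 'gpt-4o', name: 'gpt-4o', api: 'openai-responses', provider: 'openai',
  baseUrl: 'https://api.openai.com/v1', reasoning: false, input: ['text', 'image'],
  cost: { input: 2.5, output: 10 }, contextWindow: 128000, maxTokens: 16384,
};

const synthetic = buildSyntheticModel('gpt-4.5-turbo', template);
```

Handing such an object straight to the completion call is what makes a newly discovered model usable without a library update.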
+
+ ### Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API
+
+ FAL already provides a **fully dynamic catalog**. On `listModels()`, it fetches from `https://api.fal.ai/v1/models/pricing`:
+
+ ```typescript
+ // FAL returns an array with ALL available endpoints + live pricing:
+ [
+   { modelId: "fal-ai/flux-pro/v1.1-ultra", price: 0.06, unit: "per_image" },
+   { modelId: "fal-ai/kling-video/v2/master/text-to-video", price: 0.10, unit: "per_second" },
+   { modelId: "fal-ai/kokoro/american-english", price: 0.002, unit: "per_1k_chars" },
+   // ... 867+ endpoints total
+ ]
+
+ // Modality is auto-inferred from model ID + pricing unit:
+ //   - unit contains 'char' OR id contains 'tts'/'kokoro'/'elevenlabs' → TTS
+ //   - unit contains 'second' OR id contains 'video'/'kling'/'sora'/'veo' → Video
+ //   - Everything else → Image
+ ```
+
+ **Result:** Every FAL model is always current — new endpoints appear the moment FAL publishes them. Pricing is always accurate because it comes directly from their API.
+
+ ### Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API
+
+ Instead of 3 hardcoded defaults, HuggingFace now fetches **trending inference-ready models** from the Hub API across all 3 modalities:
+
+ ```
+ GET https://huggingface.co/api/models
+   ?pipeline_tag=text-generation          ← LLM models
+   &inference_provider=all                ← Only models available via inference API
+   &sort=trendingScore                    ← Most popular first
+   &limit=50                              ← Top 50
+   &expand[]=inferenceProviderMapping     ← Include provider routing + pricing
+ ```
+
+ | Pipeline Tag | Modality | Limit | What It Fetches |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Top 50 trending chat/completion models with active inference endpoints |
+ | `text-to-image` | Image | 50 | Top 50 trending image generation models (SDXL, Flux, etc.) |
+ | `text-to-speech` | TTS | 30 | Top 30 trending TTS models with active inference endpoints |
+
+ **What the Hub API returns per model:**
+ ```json
+ {
+   "id": "Qwen/Qwen2.5-72B-Instruct",
+   "pipeline_tag": "text-generation",
+   "likes": 1893,
+   "downloads": 4521987,
+   "inferenceProviderMapping": [
+     {
+       "provider": "together",
+       "providerId": "Qwen/Qwen2.5-72B-Instruct-Turbo",
+       "status": "live",
+       "providerDetails": {
+         "context_length": 32768,
+         "pricing": { "input": 1.2, "output": 1.2 }
+       }
+     },
+     {
+       "provider": "fireworks-ai",
+       "providerId": "accounts/fireworks/models/qwen2p5-72b-instruct",
+       "status": "live"
+     }
+   ]
+ }
+ ```
+
+ **Noosphere extracts from this:**
+ - Model ID → `id` field
+ - Pricing → first provider with `providerDetails.pricing`
+ - Context window → first provider with `providerDetails.context_length`
+ - Inference providers → list of available providers (Together, Fireworks, Groq, etc.)
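
The extraction rules above can be sketched as a small mapping function. Field names follow the Hub API response shown; the helper itself is illustrative rather than Noosphere's actual code:

```typescript
// Illustrative sketch: maps one Hub API entry to the fields Noosphere extracts.
interface ProviderMapping {
  provider: string;
  providerId?: string;
  status: string;
  providerDetails?: {
    context_length?: number;
    pricing?: { input: number; output: number };
  };
}

interface HubModel {
  id: string;
  inferenceProviderMapping?: ProviderMapping[];
}

function extractModelInfo(model: HubModel) {
  const mappings = model.inferenceProviderMapping ?? [];
  return {
    id: model.id,
    // First provider that reports pricing:
    pricing: mappings.find((m) => m.providerDetails?.pricing)?.providerDetails?.pricing,
    // First provider that reports a context window:
    contextWindow: mappings.find((m) => m.providerDetails?.context_length)
      ?.providerDetails?.context_length,
    // All available inference providers:
    providers: mappings.map((m) => m.provider),
  };
}

const info = extractModelInfo({
  id: 'Qwen/Qwen2.5-72B-Instruct',
  inferenceProviderMapping: [
    { provider: 'together', status: 'live',
      providerDetails: { context_length: 32768, pricing: { input: 1.2, output: 1.2 } } },
    { provider: 'fireworks-ai', status: 'live' },
  ],
});
```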
+
+ **Three requests fire in parallel** (`Promise.allSettled`) with a **10-second timeout** each. If any fails, the 3 hardcoded defaults are always available as fallback.
+
+ ### Resilience Guarantees (All Layers)
+
+ | Guarantee | Pi-AI (LLM) | FAL (Image/Video/TTS) | HuggingFace (LLM/Image/TTS) |
+ |---|---|---|---|
+ | **Timeout** | 8s per provider | No custom timeout | 10s per pipeline_tag |
+ | **Parallelism** | 8 concurrent requests | 1 request (returns all) | 3 concurrent requests |
+ | **Failure handling** | `Promise.allSettled` | Returns `[]` on error | `Promise.allSettled` |
+ | **Fallback** | Static pi-ai catalog (246 models) | Empty list (provider still usable by model ID) | 3 hardcoded defaults |
+ | **Caching** | One-time fetch, cached in memory | Per `listModels()` call | One-time fetch, cached in memory |
+ | **Auth required** | Yes (per-provider API keys) | Yes (FAL key) | Optional (works without token) |
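
The shared failure-handling pattern in the table reduces to one step: keep whatever `Promise.allSettled` fulfilled, and fall back to the static list when nothing succeeded. A minimal sketch (names are illustrative, not Noosphere's internals):

```typescript
// Illustrative sketch of the allSettled-with-fallback pattern described above.
type Settled<T> =
  | { status: 'fulfilled'; value: T[] }
  | { status: 'rejected'; reason: unknown };

function mergeCatalogs<T>(settled: Settled<T>[], staticCatalog: T[]): T[] {
  const fetched = settled
    .filter((r): r is { status: 'fulfilled'; value: T[] } => r.status === 'fulfilled')
    .flatMap((r) => r.value);
  // Every request failed: the static/hardcoded catalog is still usable.
  return fetched.length > 0 ? [...staticCatalog, ...fetched] : staticCatalog;
}

// In real use the input would come from something like:
//   const settled = await Promise.allSettled(providers.map((p) => fetchModels(p)));
const merged = mergeCatalogs(
  [
    { status: 'fulfilled', value: ['gpt-4.5-turbo'] },
    { status: 'rejected', reason: new Error('provider timed out') },
  ],
  ['gpt-4o'],
);
```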
+
+ ### Total Model Coverage
+
+ | Source | Modalities | Model Count | Update Frequency |
+ |---|---|---|---|
+ | Pi-AI static catalog | LLM | ~246 | On npm update |
+ | Pi-AI dynamic fetch | LLM | **All models across 8 providers** | **Every session** |
+ | FAL pricing API | Image, Video, TTS | 867+ | **Every `listModels()` call** |
+ | HuggingFace Hub API | LLM, Image, TTS | Top 130 trending | **Every session** |
+ | ComfyUI `/object_info` | Image | Local checkpoints | **Every `listModels()` call** |
+ | Local TTS `/voices` | TTS | Local voices | **Every `listModels()` call** |
+
+ ### Force Refresh
+
+ ```typescript
+ const ai = new Noosphere();
+
+ // Models are auto-fetched on first call — no action needed:
+ await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately
+
+ // Trigger a full sync across ALL providers:
+ const result = await ai.syncModels();
+ // result = { synced: 1200+, byProvider: { 'pi-ai': 300, 'fal': 867, 'huggingface': 130, ... }, errors: [] }
+
+ // Get all models for a specific modality:
+ const imageModels = await ai.getModels('image');
+ // Returns: FAL image models + HuggingFace image models + ComfyUI models
+ ```
+
+ ### Why Hybrid (Static + Dynamic)?
+
+ | Approach | Pros | Cons |
+ |---|---|---|
+ | **Static catalog only** | Accurate costs, fast startup | Stale within days, miss new models |
+ | **Dynamic only** | Always current | No cost data, no context window info, slow startup |
+ | **Hybrid (Noosphere)** | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |
+
+ ---
+
  ## Configuration
 
  API keys are resolved from the constructor config or environment variables (config takes priority):
@@ -1112,28 +1315,125 @@ The largest media generation provider with dynamic pricing fetched at runtime fr
  **Other TTS:**
  `fal-ai/f5-tts` (voice cloning), `fal-ai/dia-tts`, `fal-ai/minimax/speech-2.6-turbo`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/chatterbox/text-to-speech`, `fal-ai/index-tts-2/text-to-speech`
 
+ #### FAL Provider Internals — How It Actually Works
+
+ **Image generation** uses `fal.subscribe()` (queue-based, polls until complete):
+ ```typescript
+ // Exact request payload sent to FAL:
+ const response = await fal.subscribe(model, {
+   input: {
+     prompt: "A sunset over mountains",
+     negative_prompt: "blurry",                // from options.negativePrompt
+     image_size: { width: 1024, height: 768 }, // from options.width/height
+     seed: 42,                                 // from options.seed
+     num_inference_steps: 30,                  // from options.steps
+     guidance_scale: 7.5,                      // from options.guidanceScale
+   },
+ });
+
+ // Response parsing — URL from images array:
+ const image = response.data?.images?.[0];
+ // result.url = image?.url
+ // result.media = { width: image?.width, height: image?.height, format: 'png' }
+ ```
+
+ **Video generation** uses `fal.subscribe()`:
+ ```typescript
+ const response = await fal.subscribe(model, {
+   input: {
+     prompt: "Ocean waves",
+     image_url: "https://...", // from options.imageUrl (image-to-video)
+     duration: 5,              // from options.duration
+     fps: 24,                  // from options.fps
+   },
+ });
+
+ // Response parsing — URL from video object with fallback:
+ const video = response.data?.video;
+ // result.url = video?.url ?? response.data?.video_url
+ // Note: width/height/duration/fps come from INPUT options, not response
+ ```
+
+ **TTS** uses `fal.run()` (direct call, NOT subscribe — no queue):
+ ```typescript
+ const response = await fal.run(model, {
+   input: {
+     text: "Hello world",
+     voice: "af_heart", // from options.voice
+     speed: 1.0,        // from options.speed
+   },
+ });
+
+ // Response parsing — URL from audio object with fallback:
+ // result.url = response.data?.audio_url ?? response.data?.audio?.url
+ ```
+
+ **Pricing cache and cost tracking:**
+ ```typescript
+ // Pricing fetched dynamically from FAL API during listModels():
+ const res = await fetch('https://api.fal.ai/v1/models/pricing', {
+   headers: { Authorization: `Key ${this.apiKey}` },
+ });
+ // Returns: Array<{ modelId: string, price: number, unit: string }>
+
+ // Cached in an in-memory Map, cleared on each listModels() call:
+ private pricingCache = new Map<string, { price: number; unit: string }>();
+
+ // Cost per request pulled from cache (defaults to 0 if not cached):
+ usage: { cost: pricingCache.get(model)?.price ?? 0 }
+ ```
+
+ **Modality inference from model ID — exact string matching:**
+ ```typescript
+ inferModality(modelId: string, unit: string): Modality {
+   // TTS: unit contains 'char' OR modelId contains 'tts'/'kokoro'/'elevenlabs'
+   if (unit.includes('char') || /tts|kokoro|elevenlabs/.test(modelId)) return 'tts';
+   // Video: unit contains 'second' OR modelId contains 'video'/'kling'/'sora'/'veo'
+   if (unit.includes('second') || /video|kling|sora|veo/.test(modelId)) return 'video';
+   // Image: everything else (default)
+   return 'image';
+ }
+ ```
+
+ **Error handling:** Only `listModels()` catches errors (returns `[]`). Image/video/speak methods let FAL errors propagate directly — no wrapping.
+
  #### FAL Client Capabilities
 
  The `@fal-ai/client` provides additional features beyond what Noosphere surfaces:
 
- - **Queue API** — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
- - **Streaming API** — Real-time streaming responses via async iterators
- - **Realtime API** — WebSocket connections for interactive use (e.g., real-time image generation)
- - **Storage API** — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
- - **Retry logic** — Configurable retries with exponential backoff and jitter
- - **Request middleware** — Custom request interceptors and proxy support
+ - **Queue API** — `fal.queue.submit()`, `status()`, `result()`, `cancel()`. Supports webhooks, priority levels (`"low"` | `"normal"`), and polling/streaming status modes
+ - **Streaming API** — `fal.streaming.stream()` with async iterators, chunk-level events, configurable timeout between chunks (15s default)
+ - **Realtime API** — `fal.realtime.connect()` for WebSocket connections with msgpack encoding, throttle interval (128ms default), frame buffering (1-60 frames)
+ - **Storage API** — `fal.storage.upload()` with configurable object lifecycle: `"never"` | `"immediate"` | `"1h"` | `"1d"` | `"7d"` | `"30d"` | `"1y"`
+ - **Retry logic** — 3 retries default, exponential backoff (500ms base, 15s max), jitter enabled, retries on 408/429/500/502/503/504
+ - **Request middleware** — `withMiddleware()` for request interceptors, `withProxy()` for proxy configuration
 
  ---
 
- ### Hugging Face — Open Source AI (30+ tasks)
+ ### Hugging Face — Open Source AI (30+ tasks, Dynamic Discovery)
 
  **Provider ID:** `huggingface`
  **Modalities:** LLM, Image, TTS
  **Library:** `@huggingface/inference`
+ **Auto-Fetch:** Yes — discovers trending inference-ready models from the Hub API
+
+ Access to the entire Hugging Face Hub ecosystem. Noosphere **automatically discovers the top trending models** across all 3 modalities via the Hub API, filtered to only include models with active inference provider endpoints.
+
+ #### Auto-Discovered Models
+
+ On the first `listModels()` call, HuggingFace fetches from:
+ ```
+ GET https://huggingface.co/api/models?inference_provider=all&pipeline_tag={tag}&sort=trendingScore&limit={n}&expand[]=inferenceProviderMapping
+ ```
+
+ | Pipeline Tag | Modality | Limit | Example Models |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Qwen2.5-72B-Instruct, Llama-3.3-70B, DeepSeek-V3, Mistral-Large |
+ | `text-to-image` | Image | 50 | FLUX.1-dev, Stable Diffusion 3.5, SDXL-Lightning, Playground v2.5 |
+ | `text-to-speech` | TTS | 30 | Kokoro-82M, Bark, MMS-TTS |
 
- Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
+ Each discovered model includes **inference provider routing** (Together, Fireworks, Groq, Replicate, etc.) and **pricing data** when available from the provider.
 
- #### Default Models
+ #### Fallback Default Models
+
+ These 3 models are always available, even if the Hub API is unreachable:
 
  | Modality | Default Model | Description |
  |---|---|---|
@@ -1141,7 +1441,7 @@ Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace
  | Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
  | TTS | `facebook/mms-tts-eng` | MMS TTS English |
 
- Any HuggingFace model ID works — just pass it as the `model` parameter:
+ Any HuggingFace model ID works — just pass it as the `model` parameter (even if it's not in the auto-discovered list):
 
  ```typescript
  await ai.chat({
@@ -1212,6 +1512,265 @@ The `@huggingface/inference` library (v3.15.0) provides 30+ AI tasks, including
  - **Multimodal Input:** Images via `image_url` content chunks in chat messages
  - **17 Inference Providers:** Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
 
+ #### HuggingFace Provider Internals — How It Actually Works
+
+ The `HuggingFaceProvider` class (`src/providers/huggingface.ts`, 141 lines) wraps the `@huggingface/inference` library's `HfInference` client. Here's the exact internal flow for each modality:
+
+ **Initialization:**
+ ```typescript
+ // Constructor receives a single API token
+ constructor(token: string) {
+   this.client = new HfInference(token);
+   // HfInference stores the token internally and attaches it
+   // as Authorization: Bearer <token> to every request
+ }
+
+ // ping() always returns true — HuggingFace is considered
+ // "available" if the token was provided. No actual HTTP check.
+ async ping(): Promise<boolean> { return true; }
+ ```
+
+ **Chat Completions — exact request flow:**
+ ```typescript
+ // Default model: meta-llama/Llama-3.1-8B-Instruct
+ const model = options.model ?? 'meta-llama/Llama-3.1-8B-Instruct';
+
+ // Maps directly to HfInference.chatCompletion():
+ const response = await this.client.chatCompletion({
+   model,                            // HuggingFace model ID or inference endpoint
+   messages: options.messages,       // Array<{ role, content }> — passed directly
+   temperature: options.temperature, // 0.0 - 2.0 (optional)
+   max_tokens: options.maxTokens,    // Max output tokens (optional)
+ });
+
+ // Response parsing:
+ const choice = response.choices?.[0]; // OpenAI-compatible format
+ const usage = response.usage;         // { prompt_tokens, completion_tokens }
+ // result.content = choice?.message?.content ?? ''
+ // result.usage.input = usage?.prompt_tokens
+ // result.usage.output = usage?.completion_tokens
+ // result.usage.cost = 0 (always free for HF Inference API)
+ ```
+
+ **Image Generation — Blob-to-Buffer conversion pipeline:**
+ ```typescript
+ // Default model: stabilityai/stable-diffusion-xl-base-1.0
+ const model = options.model ?? 'stabilityai/stable-diffusion-xl-base-1.0';
+
+ // Uses textToImage() which returns a Blob object:
+ const blob = await this.client.textToImage({
+   model,
+   inputs: options.prompt,                    // The text prompt
+   parameters: {
+     negative_prompt: options.negativePrompt, // What NOT to generate
+     width: options.width,                    // Pixel width
+     height: options.height,                  // Pixel height
+     guidance_scale: options.guidanceScale,   // CFG scale
+     num_inference_steps: options.steps,      // Denoising steps
+   },
+ }, { outputType: 'blob' }); // <-- Forces Blob output (not ReadableStream)
+
+ // Blob → ArrayBuffer → Node.js Buffer conversion:
+ const buffer = Buffer.from(await blob.arrayBuffer());
+ // This is the critical step — HfInference returns a Web API Blob,
+ // which must be converted to a Node.js Buffer for downstream use.
+
+ // Result always reports PNG format regardless of actual model output:
+ // result.media = { width: options.width ?? 1024, height: options.height ?? 1024, format: 'png' }
+ ```
+
+ **Text-to-Speech — Blob-to-Buffer conversion:**
+ ```typescript
+ // Default model: facebook/mms-tts-eng
+ const model = options.model ?? 'facebook/mms-tts-eng';
+
+ // Uses textToSpeech() — simpler API, just model + text:
+ const blob = await this.client.textToSpeech({
+   model,
+   inputs: options.text, // Text to synthesize
+   // Note: No voice, speed, or format parameters — these are model-dependent
+ });
+
+ // Same Blob → Buffer conversion:
+ const buffer = Buffer.from(await blob.arrayBuffer());
+
+ // Usage tracks character count, not tokens:
+ //   result.usage = { cost: 0, input: options.text.length, unit: 'characters' }
+ //   result.media = { format: 'wav' }
+ ```
+
+ **Model listing — dynamic Hub API discovery:**
+ ```typescript
+ // HuggingFace now auto-fetches trending models from the Hub API:
+ async listModels(modality?: Modality): Promise<ModelInfo[]> {
+   if (!this.dynamicModels) await this.fetchHubModels();
+   // Returns: 3 hardcoded defaults + top 50 LLM + top 50 image + top 30 TTS
+   // All filtered by inference_provider=all (only inference-ready models)
+ }
+
+ // Hub API request per modality:
+ //   GET https://huggingface.co/api/models
+ //     ?pipeline_tag=text-generation
+ //     &inference_provider=all            ← Only models with active inference endpoints
+ //     &sort=trendingScore                ← Most popular first
+ //     &limit=50
+ //     &expand[]=inferenceProviderMapping ← Include provider routing + pricing
+
+ // Response includes per model:
+ //   - id: "Qwen/Qwen2.5-72B-Instruct"
+ //   - inferenceProviderMapping: [{ provider: "together", status: "live",
+ //       providerDetails: { context_length: 32768, pricing: { input: 1.2 } } }]
+
+ // Pricing and context_length extracted from inferenceProviderMapping
+ // 3 hardcoded defaults always included as fallback
+ // Results cached in memory after first fetch
+ ```
+
+ #### The 17 HuggingFace Inference Providers
+
+ The `@huggingface/inference` library supports routing requests through 17 different inference providers. This means a single HuggingFace model ID can be served by multiple backends with different performance/cost characteristics:
+
+ | # | Provider | Type | Strengths |
+ |---|---|---|---|
+ | 1 | `hf-inference` | HuggingFace's own | Default, free tier, rate-limited |
+ | 2 | `hf-dedicated` | Dedicated endpoints | Private, reserved GPU, guaranteed availability |
+ | 3 | `together-ai` | Together.ai | Fast inference, competitive pricing |
+ | 4 | `fireworks-ai` | Fireworks.ai | Optimized serving, function calling |
+ | 5 | `replicate` | Replicate | Pay-per-use, large model catalog |
+ | 6 | `cerebras` | Cerebras | Extreme speed (WSE-3 hardware) |
+ | 7 | `groq` | Groq | Ultra-low latency (LPU hardware) |
+ | 8 | `cohere` | Cohere | Enterprise, embeddings, RAG |
+ | 9 | `sambanova` | SambaNova | Enterprise RDU hardware |
+ | 10 | `nebius` | Nebius | European cloud infrastructure |
+ | 11 | `hyperbolic` | Hyperbolic Labs | Open-access GPU marketplace |
+ | 12 | `novita` | Novita AI | Cost-efficient inference |
+ | 13 | `ovh-cloud` | OVHcloud | European sovereign cloud |
+ | 14 | `aws` | Amazon SageMaker | AWS-managed endpoints |
+ | 15 | `azure` | Azure ML | Azure-managed endpoints |
+ | 16 | `google-vertex` | Google Vertex | GCP-managed endpoints |
+ | 17 | `deepinfra` | DeepInfra | High-throughput inference |
+
+ **Provider routing** is handled by the `@huggingface/inference` library's internal `provider` parameter:
+ ```typescript
+ // Route through a specific inference provider:
+ const response = await client.chatCompletion({
+   model: 'meta-llama/Llama-3.1-70B-Instruct',
+   provider: 'together-ai', // <-- Route through Together.ai
+   messages: [...],
+ });
+
+ // NOTE: Noosphere does NOT currently expose the `provider` parameter
+ // in its ChatOptions type. To use a specific HF inference provider,
+ // you would need a custom provider or direct @huggingface/inference usage.
+ ```
+
+ #### Using HuggingFace Locally — Dedicated Endpoints
+
+ HuggingFace Inference Endpoints let you deploy any model on dedicated GPUs. The `@huggingface/inference` library supports this via the `endpointUrl` parameter:
+
+ ```typescript
+ // Direct HfInference usage with a local/dedicated endpoint:
+ import { HfInference } from '@huggingface/inference';
+
+ const client = new HfInference('your-token');
+
+ // Point to your dedicated endpoint:
+ const response = await client.chatCompletion({
+   model: 'tgi',
+   endpointUrl: 'https://your-endpoint.endpoints.huggingface.cloud',
+   messages: [{ role: 'user', content: 'Hello' }],
+ });
+
+ // For a truly local setup with TGI (Text Generation Inference):
+ const localClient = new HfInference(); // No token needed for local
+ const localResponse = await localClient.chatCompletion({
+   model: 'tgi',
+   endpointUrl: 'http://localhost:8080', // Local TGI server
+   messages: [...],
+ });
+ ```
+
+ **Deploying HuggingFace models locally with TGI:**
+
+ ```bash
+ # 1. Run Text Generation Inference (TGI) via Docker:
+ docker run --gpus all -p 8080:80 \
+   -v /data:/data \
+   ghcr.io/huggingface/text-generation-inference:latest \
+   --model-id meta-llama/Llama-3.1-8B-Instruct
+
+ # 2. For image models, use Inference Endpoints:
+ #    Deploy via https://ui.endpoints.huggingface.co/
+ #    Select your model, GPU type, and region
+ #    Get an endpoint URL like: https://xyz123.endpoints.huggingface.cloud
+
+ # 3. For TTS models locally, use the Transformers library:
+ #    pip install transformers torch
+ #    Then run a local server that serves the model
+ ```
+
+ **Other local deployment options:**
+
+ | Method | URL Pattern | Use Case |
+ |---|---|---|
+ | TGI Docker | `http://localhost:8080` | Production local LLM serving |
+ | HF Inference Endpoints | `https://xxxx.endpoints.huggingface.cloud` | Managed dedicated GPU |
+ | vLLM with HF models | `http://localhost:8000` | High-throughput local serving |
+ | Transformers + FastAPI | Custom URL | Custom model serving |
+
+ #### Unexposed `@huggingface/inference` Parameters
+
+ The `chatCompletion()` method accepts many parameters that Noosphere's `ChatOptions` doesn't currently expose. These are available if you use the library directly:
+
+ | Parameter | Type | Description |
+ |---|---|---|
+ | `temperature` | `number` | Sampling temperature (0-2.0) — **exposed** via `ChatOptions.temperature` |
+ | `max_tokens` | `number` | Max output tokens — **exposed** via `ChatOptions.maxTokens` |
+ | `top_p` | `number` | Nucleus sampling threshold (0-1.0) — **not exposed** |
+ | `top_k` | `number` | Top-K sampling — **not exposed** |
+ | `frequency_penalty` | `number` | Penalize repeated tokens (-2.0 to 2.0) — **not exposed** |
+ | `presence_penalty` | `number` | Penalize tokens already present (-2.0 to 2.0) — **not exposed** |
+ | `repetition_penalty` | `number` | Alternative repetition penalty (>1.0 penalizes) — **not exposed** |
+ | `stop` | `string[]` | Stop sequences — **not exposed** |
+ | `seed` | `number` | Deterministic sampling seed — **not exposed** |
+ | `tools` | `Tool[]` | Function/tool definitions — **not exposed** |
+ | `tool_choice` | `string \| object` | Tool selection strategy — **not exposed** |
+ | `tool_prompt` | `string` | System prompt for tool use — **not exposed** |
+ | `response_format` | `object` | JSON schema constraints — **not exposed** |
+ | `reasoning_effort` | `string` | Thinking depth level — **not exposed** |
+ | `stream` | `boolean` | Enable streaming — **not exposed** (use `chatCompletionStream()`) |
+ | `provider` | `string` | Inference provider routing — **not exposed** |
+ | `endpointUrl` | `string` | Custom endpoint URL — **not exposed** |
+ | `n` | `number` | Number of completions — **not exposed** |
+ | `logprobs` | `boolean` | Return log probabilities — **not exposed** |
+ | `grammar` | `object` | BNF grammar constraints — **not exposed** |
+
+ **Image generation unexposed parameters:**
+ | Parameter | Type | Description |
+ |---|---|---|
+ | `negative_prompt` | `string` | **Exposed** via `ImageOptions.negativePrompt` |
+ | `width` / `height` | `number` | **Exposed** via `ImageOptions.width/height` |
+ | `guidance_scale` | `number` | **Exposed** via `ImageOptions.guidanceScale` |
+ | `num_inference_steps` | `number` | **Exposed** via `ImageOptions.steps` |
+ | `scheduler` | `string` | Diffusion scheduler type — **not exposed** |
+ | `target_size` | `object` | Target resize dimensions — **not exposed** |
+ | `clip_skip` | `number` | CLIP skip layers — **not exposed** |
+
+ #### HuggingFace Error Behavior
1760
+
1761
+ Unlike other providers, HuggingFaceProvider does **not** catch errors from the `@huggingface/inference` library. All errors propagate directly up to Noosphere's `executeWithRetry()`:
1762
+
1763
+ ```
1764
+ HfInference throws → HuggingFaceProvider propagates →
1765
+ executeWithRetry catches → Noosphere wraps as NoosphereError
1766
+ ```
1767
+
1768
+ Common error scenarios:
1769
+ - **401 Unauthorized** — Invalid or expired token → becomes `AUTH_FAILED`
1770
+ - **404 Model Not Found** — Model ID doesn't exist on HF Hub → becomes `MODEL_NOT_FOUND`
1771
+ - **429 Rate Limited** — Free tier limit exceeded → becomes `RATE_LIMITED` (retryable)
1772
+ - **503 Model Loading** — Model is cold-starting on HF Inference → becomes `PROVIDER_UNAVAILABLE` (retryable)
1773
+
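The status-to-error mapping above can be sketched as a small classifier. The `ErrorCode` union uses the codes named in the list; the function itself is a hypothetical illustration, not Noosphere source:

```typescript
type ErrorCode =
  | 'AUTH_FAILED'
  | 'MODEL_NOT_FOUND'
  | 'RATE_LIMITED'
  | 'PROVIDER_UNAVAILABLE'
  | 'GENERATION_FAILED';

// Classify an HTTP status the way the wrapping is described above.
function classifyHttpStatus(status: number): { code: ErrorCode; retryable: boolean } {
  switch (status) {
    case 401: return { code: 'AUTH_FAILED', retryable: false };
    case 404: return { code: 'MODEL_NOT_FOUND', retryable: false };
    case 429: return { code: 'RATE_LIMITED', retryable: true };        // free-tier limit
    case 503: return { code: 'PROVIDER_UNAVAILABLE', retryable: true }; // model cold start
    default:  return { code: 'GENERATION_FAILED', retryable: false };
  }
}
```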
1215
1774
  ---
1216
1775
 
1217
1776
  ### ComfyUI — Local Image Generation
@@ -1220,17 +1779,237 @@ The `@huggingface/inference` library (v3.15.0) provides 30+ AI tasks, including
1220
1779
  **Modalities:** Image, Video (planned)
1221
1780
  **Type:** Local
1222
1781
  **Default Port:** 8188
1782
+ **Source:** `src/providers/comfyui.ts` (155 lines)
1783
+
1784
+ Connects to a local ComfyUI instance for Stable Diffusion workflows. ComfyUI is a node-based UI for Stable Diffusion that exposes an HTTP API. Noosphere communicates with it via raw HTTP — no ComfyUI SDK needed.
1785
+
1786
+ #### How It Works — Complete Lifecycle
1787
+
1788
+ ```
1789
+ User calls ai.image() →
1790
+ 1. structuredClone(DEFAULT_TXT2IMG_WORKFLOW) // Deep-clone the template
1791
+ 2. Inject parameters into workflow nodes // Mutate the clone
1792
+ 3. POST /prompt { prompt: workflow } // Queue the workflow
1793
+ 4. Receive { prompt_id: "abc-123" } // Get tracking ID
1794
+ 5. POLL GET /history/abc-123 every 1000ms // Check completion
1795
+ 6. Parse outputs → find SaveImage node // Locate generated image
1796
+ 7. GET /view?filename=X&subfolder=Y&type=Z // Fetch image binary
1797
+ 8. Return Buffer // PNG buffer to caller
1798
+ ```
1799
+
1800
+ #### The Complete Workflow JSON — All 7 Nodes
1223
1801
 
1224
- Connects to a local ComfyUI instance for Stable Diffusion workflows.
1802
+ The `DEFAULT_TXT2IMG_WORKFLOW` constant defines a complete SDXL text-to-image pipeline as a ComfyUI node graph. Each key is a **node ID** (string), each value defines the node type and its connections:
1225
1803
 
1226
- #### How It Works
1804
+ ```typescript
1805
+ // Node "3": KSampler — The core diffusion sampling node
1806
+ '3': {
1807
+ class_type: 'KSampler',
1808
+ inputs: {
1809
+ seed: 0, // Random seed (overridden by options.seed)
1810
+ steps: 20, // Denoising steps (overridden by options.steps)
1811
+ cfg: 7, // CFG/guidance scale (overridden by options.guidanceScale)
1812
+ sampler_name: 'euler', // Sampling algorithm
1813
+ scheduler: 'normal', // Noise schedule
1814
+ denoise: 1, // Full denoise (1.0 = txt2img, <1.0 = img2img)
1815
+ model: ['4', 0], // ← Connection: output 0 of node "4" (checkpoint model)
1816
+ positive: ['6', 0], // ← Connection: output 0 of node "6" (positive prompt)
1817
+ negative: ['7', 0], // ← Connection: output 0 of node "7" (negative prompt)
1818
+ latent_image: ['5', 0], // ← Connection: output 0 of node "5" (empty latent)
1819
+ },
1820
+ }
1227
1821
 
1228
- 1. Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
1229
- 2. Injects your parameters (prompt, dimensions, seed, steps, guidance)
1230
- 3. POSTs the workflow to ComfyUI's `/prompt` endpoint
1231
- 4. Polls `/history/{promptId}` every second until completion (max 5 minutes)
1232
- 5. Fetches the generated image from `/view`
1233
- 6. Returns a PNG buffer
1822
+ // Node "4": CheckpointLoaderSimple — Loads the SDXL model from disk
1823
+ '4': {
1824
+ class_type: 'CheckpointLoaderSimple',
1825
+ inputs: {
1826
+ ckpt_name: 'sd_xl_base_1.0.safetensors', // Checkpoint file on disk
1827
+ // Outputs: [0]=MODEL, [1]=CLIP, [2]=VAE
1828
+ // MODEL → KSampler.model
1829
+ // CLIP → CLIPTextEncode nodes
1830
+ // VAE → VAEDecode
1831
+ },
1832
+ }
1833
+
1834
+ // Node "5": EmptyLatentImage — Creates the initial noise tensor
1835
+ '5': {
1836
+ class_type: 'EmptyLatentImage',
1837
+ inputs: {
1838
+ width: 1024, // Overridden by options.width
1839
+ height: 1024, // Overridden by options.height
1840
+ batch_size: 1, // Always 1 image per generation
1841
+ },
1842
+ }
1843
+
1844
+ // Node "6": CLIPTextEncode — Positive prompt encoding
1845
+ '6': {
1846
+ class_type: 'CLIPTextEncode',
1847
+ inputs: {
1848
+ text: '', // Overridden by options.prompt
1849
+ clip: ['4', 1], // ← Connection: output 1 of node "4" (CLIP model)
1850
+ },
1851
+ }
1852
+
1853
+ // Node "7": CLIPTextEncode — Negative prompt encoding
1854
+ '7': {
1855
+ class_type: 'CLIPTextEncode',
1856
+ inputs: {
1857
+ text: '', // Overridden by options.negativePrompt ?? ''
1858
+ clip: ['4', 1], // ← Same CLIP model as positive prompt
1859
+ },
1860
+ }
1861
+
1862
+ // Node "8": VAEDecode — Converts latent space to pixel space
1863
+ '8': {
1864
+ class_type: 'VAEDecode',
1865
+ inputs: {
1866
+ samples: ['3', 0], // ← Connection: output 0 of node "3" (sampled latents)
1867
+ vae: ['4', 2], // ← Connection: output 2 of node "4" (VAE decoder)
1868
+ },
1869
+ }
1870
+
1871
+ // Node "9": SaveImage — Saves the final image
1872
+ '9': {
1873
+ class_type: 'SaveImage',
1874
+ inputs: {
1875
+ filename_prefix: 'noosphere', // Files saved as noosphere_00001.png, etc.
1876
+ images: ['8', 0], // ← Connection: output 0 of node "8" (decoded image)
1877
+ },
1878
+ }
1879
+ ```
1880
+
1881
+ **Node connection format:** `['nodeId', outputIndex]` — this is ComfyUI's internal linking system. For example, `['4', 1]` means "output slot 1 of node 4", which is the CLIP model from CheckpointLoaderSimple.
1882
+
1883
+ **Visual pipeline flow:**
1884
+ ```
1885
+ CheckpointLoader["4"] ──MODEL──→ KSampler["3"]
1886
+ ├──CLIP──→ CLIPTextEncode["6"] (positive) ──→ KSampler["3"]
1887
+ ├──CLIP──→ CLIPTextEncode["7"] (negative) ──→ KSampler["3"]
1888
+ └──VAE───→ VAEDecode["8"]
1889
+ EmptyLatentImage["5"] ──→ KSampler["3"] ──→ VAEDecode["8"] ──→ SaveImage["9"]
1890
+ ```
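The `['nodeId', outputIndex]` convention can be demonstrated with a tiny resolver that reports which node class feeds a given input. The `WorkflowNode` type and `sourceOf` helper are simplified sketches, not part of Noosphere:

```typescript
interface WorkflowNode {
  class_type: string;
  inputs: Record<string, unknown>; // literals or ['nodeId', outputIndex] links
}

// Return the class_type of the node wired into `input`, or undefined for literals.
function sourceOf(
  workflow: Record<string, WorkflowNode>,
  nodeId: string,
  input: string,
): string | undefined {
  const ref = workflow[nodeId]?.inputs[input];
  if (Array.isArray(ref) && typeof ref[0] === 'string') {
    return workflow[ref[0]]?.class_type; // follow the link to the upstream node
  }
  return undefined; // plain value such as steps: 20
}
```

On the template above, `sourceOf(workflow, '3', 'model')` reports `CheckpointLoaderSimple`, since KSampler's `model` input is the link `['4', 0]`.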
1891
+
1892
+ #### Parameter Injection — How Options Map to Nodes
1893
+
1894
+ ```typescript
1895
+ // Deep-clone to avoid mutating the template:
1896
+ const workflow = structuredClone(DEFAULT_TXT2IMG_WORKFLOW);
1897
+
1898
+ // Direct node mutations:
1899
+ workflow['6'].inputs.text = options.prompt; // Positive prompt → Node 6
1900
+ workflow['7'].inputs.text = options.negativePrompt ?? ''; // Negative prompt → Node 7
1901
+ workflow['5'].inputs.width = options.width ?? 1024; // Width → Node 5
1902
+ workflow['5'].inputs.height = options.height ?? 1024; // Height → Node 5
1903
+
1904
+ // Conditional overrides (only if user provided them):
1905
+ if (options.seed !== undefined) workflow['3'].inputs.seed = options.seed;
1906
+ if (options.steps !== undefined) workflow['3'].inputs.steps = options.steps;
1907
+ if (options.guidanceScale !== undefined) workflow['3'].inputs.cfg = options.guidanceScale;
1908
+ // Note: sampler_name, scheduler, and denoise are NOT configurable via Noosphere.
1909
+ // They're hardcoded to euler/normal/1.0
1910
+ ```
1911
+
1912
+ #### Queue Submission — POST /prompt
1913
+
1914
+ ```typescript
1915
+ const queueRes = await fetch(`${this.baseUrl}/prompt`, {
1916
+ method: 'POST',
1917
+ headers: { 'Content-Type': 'application/json' },
1918
+ body: JSON.stringify({ prompt: workflow }),
1919
+ // ComfyUI expects: { prompt: <workflow_object>, client_id?: string }
1920
+ });
1921
+
1922
+ if (!queueRes.ok) throw new Error(`ComfyUI queue failed: ${queueRes.status}`);
1923
+
1924
+ const { prompt_id } = await queueRes.json();
1925
+ // prompt_id is a UUID like "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
1926
+ // Used to track this specific generation in the history API
1927
+ ```
1928
+
1929
+ #### Polling Mechanism — Deadline-Based with 1s Intervals
1930
+
1931
+ ```typescript
1932
+ private async pollForResult(promptId: string, maxWaitMs = 300000): Promise<ArrayBuffer> {
1933
+ const deadline = Date.now() + maxWaitMs; // 300,000ms = 5 minutes
1934
+
1935
+ while (Date.now() < deadline) {
1936
+ // Check history for our prompt
1937
+ const res = await fetch(`${this.baseUrl}/history/${promptId}`);
1938
+
1939
+ if (!res.ok) {
1940
+ await new Promise((r) => setTimeout(r, 1000)); // 1 second between polls
1941
+ continue;
1942
+ }
1943
+
1944
+ const history = await res.json();
1945
+ // History format: { [promptId]: { outputs: { [nodeId]: { images: [...] } } } }
1946
+
1947
+ const entry = history[promptId];
1948
+ if (!entry?.outputs) {
1949
+ await new Promise((r) => setTimeout(r, 1000)); // Not ready yet
1950
+ continue;
1951
+ }
1952
+
1953
+ // Search ALL output nodes for images (not just node "9"):
1954
+ for (const nodeOutput of Object.values(entry.outputs)) {
1955
+ if (nodeOutput.images?.length > 0) {
1956
+ const img = nodeOutput.images[0];
1957
+ // Fetch the actual image binary:
1958
+ const imgRes = await fetch(
1959
+ `${this.baseUrl}/view?filename=${img.filename}&subfolder=${img.subfolder}&type=${img.type}`
1960
+ );
1961
+ return imgRes.arrayBuffer();
1962
+ }
1963
+ }
1964
+
1965
+ await new Promise((r) => setTimeout(r, 1000));
1966
+ }
1967
+
1968
+ throw new Error(`ComfyUI generation timed out after ${maxWaitMs}ms`);
1969
+ }
1970
+ ```
1971
+
1972
+ **Key polling details:**
1973
+ - **Interval:** Fixed 1000ms (not configurable)
1974
+ - **Timeout:** 300,000ms = 5 minutes (hardcoded, not from `config.timeout.image`)
1975
+ - **Deadline-based:** Uses `Date.now() < deadline` comparison, NOT a retry counter
1976
+ - **Image fetch URL format:** `/view?filename=noosphere_00001_.png&subfolder=&type=output`
1977
+ - **Returns:** Raw `ArrayBuffer` → converted to `Buffer` by the caller
1978
+
1979
+ #### Auto-Detection — How ComfyUI Gets Discovered
1980
+
1981
+ During `Noosphere.init()`, if `autoDetectLocal` is true:
1982
+
1983
+ ```typescript
1984
+ // Ping the /system_stats endpoint with a 2-second timeout:
1985
+ const pingUrl = async (url: string): Promise<boolean> => {
1986
+ const controller = new AbortController();
1987
+ const timer = setTimeout(() => controller.abort(), 2000); // 2s hard timeout
1988
+ try {
1989
+ const res = await fetch(url, { signal: controller.signal });
1990
+ return res.ok;
1991
+ } finally {
1992
+ clearTimeout(timer);
1993
+ }
1994
+ };
1995
+
1996
+ // Check ComfyUI specifically:
1997
+ if (comfyuiCfg?.enabled) {
1998
+ const ok = await pingUrl(`${comfyuiCfg.host}:${comfyuiCfg.port}/system_stats`);
1999
+ if (ok) {
2000
+ this.registry.addProvider(new ComfyUIProvider({
2001
+ host: comfyuiCfg.host, // Default: 'http://localhost'
2002
+ port: comfyuiCfg.port, // Default: 8188
2003
+ }));
2004
+ }
2005
+ }
2006
+ ```
2007
+
2008
+ **Environment variable overrides:**
2009
+ ```bash
2010
+ COMFYUI_HOST=http://192.168.1.100 # Override host
2011
+ COMFYUI_PORT=8190 # Override port
2012
+ ```
1234
2013
 
1235
2014
  #### Configuration
1236
2015
 
@@ -1238,30 +2017,82 @@ Connects to a local ComfyUI instance for Stable Diffusion workflows.
1238
2017
  const ai = new Noosphere({
1239
2018
  local: {
1240
2019
  comfyui: {
1241
- enabled: true,
1242
- host: 'http://localhost',
1243
- port: 8188,
2020
+ enabled: true, // Default: true (auto-detected)
2021
+ host: 'http://localhost', // Default: 'http://localhost'
2022
+ port: 8188, // Default: 8188
1244
2023
  },
1245
2024
  },
1246
2025
  });
1247
2026
  ```
1248
2027
 
1249
- #### Default Workflow
2028
+ #### Model Discovery — /object_info as a Connectivity Check
1250
2029
 
1251
- - **Checkpoint:** `sd_xl_base_1.0.safetensors`
1252
- - **Sampler:** euler with normal scheduler
1253
- - **Default Steps:** 20
1254
- - **Default CFG/Guidance:** 7
1255
- - **Default Size:** 1024x1024
1256
- - **Max Size:** 2048x2048
1257
- - **Output:** PNG
2030
+ ```typescript
2031
+ async listModels(modality?: Modality): Promise<ModelInfo[]> {
2032
+ // Fetches ComfyUI's full node registry:
2033
+ const res = await fetch(`${this.baseUrl}/object_info`);
2034
+ if (!res.ok) return [];
2035
+
2036
+ // Does NOT parse the response — just uses it as a connectivity check.
2037
+ // Returns hardcoded model entries:
2038
+ const models: ModelInfo[] = [];
2039
+ if (!modality || modality === 'image') {
2040
+ models.push({
2041
+ id: 'comfyui-txt2img',
2042
+ provider: 'comfyui',
2043
+ name: 'ComfyUI Text-to-Image',
2044
+ modality: 'image',
2045
+ local: true,
2046
+ cost: { price: 0, unit: 'free' },
2047
+ capabilities: { maxWidth: 2048, maxHeight: 2048, supportsNegativePrompt: true },
2048
+ });
2049
+ }
2050
+ if (!modality || modality === 'video') {
2051
+ models.push({
2052
+ id: 'comfyui-txt2vid',
2053
+ provider: 'comfyui',
2054
+ name: 'ComfyUI Text-to-Video',
2055
+ modality: 'video',
2056
+ local: true,
2057
+ cost: { price: 0, unit: 'free' },
2058
+ capabilities: { maxDuration: 10, supportsImageToVideo: true },
2059
+ });
2060
+ }
2061
+ return models;
2062
+ }
2063
+ // NOTE: /object_info is fetched but the response is discarded.
2064
+ // The actual model list is hardcoded. This means even if you have
2065
+ // dozens of checkpoints in ComfyUI, Noosphere only exposes 2 model IDs.
2066
+ ```
1258
2067
 
1259
- #### Models Exposed
2068
+ #### Video Generation — Not Yet Implemented
1260
2069
 
1261
- | Model ID | Modality | Description |
1262
- |---|---|---|
1263
- | `comfyui-txt2img` | Image | Text-to-image via workflow |
1264
- | `comfyui-txt2vid` | Video | Planned (requires AnimateDiff workflow) |
2070
+ ```typescript
2071
+ async video(_options: VideoOptions): Promise<NoosphereResult> {
2072
+ throw new Error('ComfyUI video generation requires a configured AnimateDiff workflow');
2073
+ }
2074
+ // The 'comfyui-txt2vid' model ID is listed but will throw at runtime.
2075
+ // This is a placeholder for future AnimateDiff/SVD workflow templates.
2076
+ ```
2077
+
2078
+ #### Default Workflow Parameters Summary
2079
+
2080
+ | Parameter | Default | Configurable | Node |
2081
+ |---|---|---|---|
2082
+ | Checkpoint | `sd_xl_base_1.0.safetensors` | No | Node 4 |
2083
+ | Sampler | `euler` | No | Node 3 |
2084
+ | Scheduler | `normal` | No | Node 3 |
2085
+ | Denoise | `1.0` | No | Node 3 |
2086
+ | Steps | `20` | Yes (`options.steps`) | Node 3 |
2087
+ | CFG/Guidance | `7` | Yes (`options.guidanceScale`) | Node 3 |
2088
+ | Seed | `0` | Yes (`options.seed`) | Node 3 |
2089
+ | Width | `1024` | Yes (`options.width`) | Node 5 |
2090
+ | Height | `1024` | Yes (`options.height`) | Node 5 |
2091
+ | Batch Size | `1` | No | Node 5 |
2092
+ | Filename Prefix | `noosphere` | No | Node 9 |
2093
+ | Negative Prompt | `''` (empty) | Yes (`options.negativePrompt`) | Node 7 |
2094
+ | Max Size | `2048x2048` | Via options | Node 5 |
2095
+ | Output Format | PNG | No | ComfyUI default |
1265
2096
 
1266
2097
  ---
1267
2098
 
@@ -1270,99 +2101,889 @@ const ai = new Noosphere({
1270
2101
  **Provider IDs:** `piper`, `kokoro`
1271
2102
  **Modality:** TTS
1272
2103
  **Type:** Local
2104
+ **Source:** `src/providers/local-tts.ts` (112 lines)
1273
2105
 
1274
- Connects to local OpenAI-compatible TTS servers.
2106
+ The `LocalTTSProvider` is a generic adapter for any local TTS server that exposes an OpenAI-compatible `/v1/audio/speech` endpoint. Two instances are created by default — one for Piper, one for Kokoro — but the class works with ANY server implementing this protocol.
1275
2107
 
1276
2108
  #### Supported Engines
1277
2109
 
1278
- | Engine | Default Port | Health Check | Voice Discovery |
1279
- |---|---|---|---|
1280
- | Piper | 5500 | `GET /health` | `GET /voices` |
1281
- | Kokoro | 5501 | `GET /health` | `GET /v1/models` (fallback) |
2110
+ | Engine | Default Port | Health Check | Voice Discovery | Description |
2111
+ |---|---|---|---|---|
2112
+ | Piper | 5500 | `GET /health` | `GET /voices` (array) | Fast offline TTS, 30+ languages, ONNX models |
2113
+ | Kokoro | 5501 | `GET /health` | `GET /v1/models` (OpenAI format) | High-quality neural TTS |
1282
2114
 
1283
- #### API
2115
+ #### Provider Instantiation — How Instances Are Created
1284
2116
 
1285
- Uses the OpenAI-compatible TTS endpoint:
2117
+ ```typescript
2118
+ // The LocalTTSProvider constructor takes a config object:
2119
+ interface LocalTTSConfig {
2120
+ id: string; // Provider ID: 'piper' or 'kokoro'
2121
+ name: string; // Display name: 'Piper TTS' or 'Kokoro TTS'
2122
+ host: string; // Base URL host
2123
+ port: number; // Port number
2124
+ }
1286
2125
 
2126
+ // Two separate instances are created during init():
2127
+ new LocalTTSProvider({ id: 'piper', name: 'Piper TTS', host: piperCfg.host, port: piperCfg.port })
2128
+ new LocalTTSProvider({ id: 'kokoro', name: 'Kokoro TTS', host: kokoroCfg.host, port: kokoroCfg.port })
2129
+
2130
+ // Each instance is an independent provider in the registry.
2131
+ // They don't share state or config.
2132
+ // The baseUrl is constructed as: `${config.host}:${config.port}`
2133
+ // Example: "http://localhost:5500"
1287
2134
  ```
1288
- POST /v1/audio/speech
2135
+
2136
+ #### Health Check — Ping Protocol
2137
+
2138
+ ```typescript
2139
+ async ping(): Promise<boolean> {
2140
+ try {
2141
+ const res = await fetch(`${this.baseUrl}/health`);
2142
+ return res.ok; // true if HTTP 200-299
2143
+ } catch {
2144
+ return false; // Network error, connection refused, etc.
2145
+ }
2146
+ }
2147
+ // Used during auto-detection in Noosphere.init()
2148
+ // During init(), the same /health check runs under the 2-second AbortController timeout.
2149
+ // Note: /health is checked BEFORE the provider is registered.
2150
+ // If /health fails, the provider is silently skipped.
2151
+ ```
2152
+
2153
+ #### Dual Voice Discovery Mechanism
2154
+
2155
+ The `listModels()` method implements a **two-strategy fallback** to discover available voices. This is necessary because different TTS servers expose voices through different API formats:
2156
+
2157
+ ```typescript
2158
+ async listModels(modality?: Modality): Promise<ModelInfo[]> {
2159
+ if (modality && modality !== 'tts') return [];
2160
+
2161
+ let voices: Array<{ id: string; name?: string }> = [];
2162
+
2163
+ // STRATEGY 1: Piper-style /voices endpoint
2164
+ // Expected response: Array<{ id: string, name?: string, ... }>
2165
+ try {
2166
+ const res = await fetch(`${this.baseUrl}/voices`);
2167
+ if (res.ok) {
2168
+ const data = await res.json();
2169
+ if (Array.isArray(data)) {
2170
+ voices = data;
2171
+ // Success — skip fallback
2172
+ }
2173
+ }
2174
+ } catch {
2175
+ // STRATEGY 2: OpenAI-compatible /v1/models endpoint
2176
+ // Expected response: { data: Array<{ id: string, ... }> }
2177
+ const res = await fetch(`${this.baseUrl}/v1/models`);
2178
+ if (res.ok) {
2179
+ const data = await res.json();
2180
+ voices = data.data ?? [];
2181
+ }
2182
+ }
2183
+
2184
+ // Map voices to ModelInfo objects:
2185
+ return voices.map((v) => ({
2186
+ id: v.id,
2187
+ provider: this.id, // 'piper' or 'kokoro'
2188
+ name: v.name ?? v.id, // Fallback to ID if no name
2189
+ modality: 'tts' as const,
2190
+ local: true,
2191
+ cost: { price: 0, unit: 'free' },
2192
+ capabilities: {
2193
+ voices: voices.map((vv) => vv.id), // All voice IDs as capabilities
2194
+ },
2195
+ }));
2196
+ }
2197
+ ```
2198
+
2199
+ **Critical implementation detail:** The fallback is triggered by a `catch` block, NOT by checking the response. This means:
2200
+ - If `/voices` returns a **non-array** (e.g., `{}`), strategy 1 succeeds but `voices` remains empty
2201
+ - If `/voices` returns HTTP **404**, strategy 1 "succeeds" (no exception), but `res.ok` is false, so voices stays empty, AND strategy 2 is never tried
2202
+ - Strategy 2 only runs if `/voices` **throws a network error** (connection refused, DNS failure, etc.)
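A hardened variant would fall back whenever strategy 1 yields nothing, whether `/voices` threw, returned a non-2xx status, or returned a non-array. This is a sketch of that alternative, assuming the same two response shapes; it is not the shipped implementation:

```typescript
async function discoverVoices(
  baseUrl: string,
): Promise<Array<{ id: string; name?: string }>> {
  // Strategy 1: Piper-style /voices (bare array)
  try {
    const res = await fetch(`${baseUrl}/voices`);
    if (res.ok) {
      const data = await res.json();
      if (Array.isArray(data)) return data;
    }
  } catch {
    // swallow and fall through to strategy 2
  }
  // Strategy 2: OpenAI-style /v1/models ({ data: [...] })
  try {
    const res = await fetch(`${baseUrl}/v1/models`);
    if (res.ok) {
      const data = await res.json();
      return data.data ?? [];
    }
  } catch {
    // both strategies failed
  }
  return [];
}
```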
2203
+
2204
+ **Piper response format** (`GET /voices`):
2205
+ ```json
2206
+ [
2207
+ { "id": "en_US-lessac-medium", "name": "Lessac (English US)" },
2208
+ { "id": "en_US-amy-medium", "name": "Amy (English US)" },
2209
+ { "id": "de_DE-thorsten-high", "name": "Thorsten (German)" }
2210
+ ]
2211
+ ```
2212
+
2213
+ **Kokoro/OpenAI response format** (`GET /v1/models`):
2214
+ ```json
1289
2215
  {
1290
- "model": "tts-1",
1291
- "input": "Hello world",
1292
- "voice": "default",
1293
- "speed": 1.0,
1294
- "response_format": "mp3"
2216
+ "data": [
2217
+ { "id": "kokoro-v1", "object": "model" },
2218
+ { "id": "kokoro-v1-jp", "object": "model" }
2219
+ ]
1295
2220
  }
1296
2221
  ```
1297
2222
 
1298
- Supports `mp3`, `wav`, and `ogg` formats. Returns audio as a Buffer.
2223
+ #### Speech Generation — Exact HTTP Protocol
2224
+
2225
+ ```typescript
2226
+ async speak(options: SpeakOptions): Promise<NoosphereResult> {
2227
+ const start = Date.now();
2228
+
2229
+ // POST to OpenAI-compatible TTS endpoint:
2230
+ const res = await fetch(`${this.baseUrl}/v1/audio/speech`, {
2231
+ method: 'POST',
2232
+ headers: { 'Content-Type': 'application/json' },
2233
+ body: JSON.stringify({
2234
+ model: options.model ?? 'tts-1', // Default model ID
2235
+ input: options.text, // Text to synthesize
2236
+ voice: options.voice ?? 'default', // Voice selection
2237
+ speed: options.speed ?? 1.0, // Playback speed multiplier
2238
+ response_format: options.format ?? 'mp3', // Output audio format
2239
+ }),
2240
+ });
2241
+
2242
+ if (!res.ok) {
2243
+ throw new Error(`Local TTS failed: ${res.status} ${await res.text()}`);
2244
+ // Note: error includes the response body text for debugging
2245
+ }
2246
+
2247
+ // Response is raw audio binary — convert to Buffer:
2248
+ const audioBuffer = Buffer.from(await res.arrayBuffer());
2249
+
2250
+ return {
2251
+ buffer: audioBuffer,
2252
+ provider: this.id, // 'piper' or 'kokoro'
2253
+ model: options.model ?? options.voice ?? 'default', // Fallback chain
2254
+ modality: 'tts',
2255
+ latencyMs: Date.now() - start,
2256
+ usage: {
2257
+ cost: 0, // Always free (local)
2258
+ input: options.text.length, // CHARACTER count, not tokens
2259
+ unit: 'characters', // Track by characters
2260
+ },
2261
+ media: {
2262
+ format: options.format ?? 'mp3', // Matches requested format
2263
+ },
2264
+ };
2265
+ }
2266
+ ```
2267
+
2268
+ **Request/Response details:**
2269
+ | Field | Value | Notes |
2270
+ |---|---|---|
2271
+ | Method | `POST` | Always POST |
2272
+ | URL | `/v1/audio/speech` | OpenAI-compatible standard |
2273
+ | Content-Type | `application/json` | JSON body |
2274
+ | Response Content-Type | `audio/mpeg`, `audio/wav`, or `audio/ogg` | Depends on `response_format` |
2275
+ | Response Body | Raw binary audio | Converted to `Buffer` via `arrayBuffer()` |
2276
+
2277
+ **Available formats (from `SpeakOptions.format` type):**
2278
+ | Format | Typical Size | Quality | Use Case |
2279
+ |---|---|---|---|
2280
+ | `mp3` | Smallest | Lossy | Web playback, storage |
2281
+ | `wav` | Largest | Lossless | Processing, editing |
2282
+ | `ogg` | Medium | Lossy | Web playback, open format |
2283
+
2284
+ #### Usage Tracking — Character-Based
2285
+
2286
+ Local TTS tracks usage by **character count**, not tokens:
2287
+
2288
+ ```typescript
2289
+ usage: {
2290
+ cost: 0, // Always 0 for local providers
2291
+ input: options.text.length, // JavaScript string .length (UTF-16 code units)
2292
+ unit: 'characters', // Unit identifier for aggregation
2293
+ }
2294
+ // Note: .length counts UTF-16 code units, not Unicode codepoints.
2295
+ // "Hello" = 5, "🎵" = 2 (surrogate pair), "café" = 4
2296
+ ```
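The code-unit counts noted above can be checked directly in any JavaScript runtime:

```typescript
// .length counts UTF-16 code units, as noted above.
const samples: Array<[string, number]> = [
  ['Hello', 5],
  ['🎵', 2],   // astral codepoint -> surrogate pair
  ['café', 4], // precomposed U+00E9 is a single code unit
];
for (const [text, expected] of samples) {
  if (text.length !== expected) {
    throw new Error(`${JSON.stringify(text)}: got ${text.length}, expected ${expected}`);
  }
}
```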
2297
+
2298
+ This feeds into the global `UsageTracker`, so you can query TTS usage:
2299
+ ```typescript
2300
+ const usage = ai.getUsage({ modality: 'tts' });
2301
+ // usage.totalRequests = number of TTS calls
2302
+ // usage.totalCost = 0 (always free for local)
2303
+ // usage.byProvider = { piper: 0, kokoro: 0 }
2304
+ ```
2305
+
2306
+ #### Auto-Detection — Parallel Discovery
2307
+
2308
+ Both Piper and Kokoro are detected simultaneously during `init()`:
2309
+
2310
+ ```typescript
2311
+ // Inside Noosphere.init(), wrapped in Promise.allSettled():
2312
+ await Promise.allSettled([
2313
+ // ... ComfyUI detection ...
2314
+ (async () => {
2315
+ if (piperCfg?.enabled) { // enabled: true by default
2316
+ const ok = await pingUrl(`${piperCfg.host}:${piperCfg.port}/health`);
2317
+ if (ok) {
2318
+ this.registry.addProvider(new LocalTTSProvider({
2319
+ id: 'piper', name: 'Piper TTS',
2320
+ host: piperCfg.host, port: piperCfg.port,
2321
+ }));
2322
+ }
2323
+ }
2324
+ })(),
2325
+ (async () => {
2326
+ if (kokoroCfg?.enabled) { // enabled: true by default
2327
+ const ok = await pingUrl(`${kokoroCfg.host}:${kokoroCfg.port}/health`);
2328
+ if (ok) {
2329
+ this.registry.addProvider(new LocalTTSProvider({
2330
+ id: 'kokoro', name: 'Kokoro TTS',
2331
+ host: kokoroCfg.host, port: kokoroCfg.port,
2332
+ }));
2333
+ }
2334
+ }
2335
+ })(),
2336
+ ]);
2337
+ ```
2338
+
2339
+ **Environment variable overrides:**
2340
+ ```bash
2341
+ PIPER_HOST=http://192.168.1.100 PIPER_PORT=5500
2342
+ KOKORO_HOST=http://192.168.1.100 KOKORO_PORT=5501
2343
+ ```
2344
+
2345
+ #### Setting Up Local TTS Servers
2346
+
2347
+ **Piper TTS:**
2348
+ ```bash
2349
+ # Docker (recommended):
2350
+ docker run -p 5500:5500 rhasspy/wyoming-piper \
2351
+ --voice en_US-lessac-medium
2352
+
2353
+ # Or via pip:
2354
+ pip install piper-tts
2355
+ # Then run a compatible HTTP server (wyoming-piper or piper-http-server)
2356
+ ```
2357
+
2358
+ **Kokoro TTS:**
2359
+ ```bash
2360
+ # Docker:
2361
+ docker run -p 5501:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest
2362
+
2363
+ # The Kokoro server exposes OpenAI-compatible endpoints at:
2364
+ # GET /v1/models → List available voices
2365
+ # POST /v1/audio/speech → Generate speech
2366
+ # GET /health → Health check
2367
+ ```
1299
2368
 
1300
2369
  ---
1301
2370
 
1302
2371
  ## Architecture
1303
2372
 
1304
- ### Provider Resolution (Local-First)
2373
+ ### The Complete Init() Flow — What Happens When You Create a Noosphere Instance
2374
+
2375
+ ```typescript
2376
+ const ai = new Noosphere({ /* config */ });
2377
+ // At this point: config is resolved, but NO providers are registered.
2378
+ // The `initialized` flag is false.
2379
+
2380
+ await ai.chat({ messages: [...] });
2381
+ // FIRST call triggers lazy initialization via init()
2382
+ ```
2383
+
2384
+ **Initialization sequence (`src/noosphere.ts:240-322`):**
2385
+
2386
+ ```
2387
+ 1. Constructor:
2388
+ ├── resolveConfig(input) // Merge config > env > defaults
2389
+ ├── new Registry(cacheTTLMinutes) // Empty provider registry
2390
+ └── new UsageTracker(onUsage) // Empty event list
2391
+
2392
+ 2. First API call triggers init():
2393
+ ├── Set initialized = true (immediately, before any async work)
2394
+
2395
+ ├── CLOUD PROVIDER REGISTRATION (synchronous):
2396
+ │ ├── Collect all API keys from resolved config
2397
+ │ ├── If ANY LLM key exists → register PiAiProvider(allKeys)
2398
+ │ ├── If FAL key exists → register FalProvider(falKey)
2399
+ │ └── If HF token exists → register HuggingFaceProvider(token)
2400
+
2401
+ └── LOCAL SERVICE DETECTION (parallel, async):
2402
+ └── Promise.allSettled([
2403
+ pingUrl(comfyui /system_stats) → register ComfyUIProvider
2404
+ pingUrl(piper /health) → register LocalTTSProvider('piper')
2405
+ pingUrl(kokoro /health) → register LocalTTSProvider('kokoro')
2406
+ ])
2407
+ ```
2408
+
2409
+ **Key design decisions:**
2410
+ - `initialized = true` is set **before** async work, preventing concurrent init() calls
2411
+ - Cloud providers are registered **synchronously** (no network calls needed)
2412
+ - Local detection uses `Promise.allSettled()` — a failing ping doesn't block others
2413
+ - Each ping has a 2-second `AbortController` timeout
2414
+ - If auto-detection is disabled (`autoDetectLocal: false`), local providers are never registered
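The flag-before-await pattern can be sketched as a minimal guard. `LazyInit` and `ensureInit` are illustrative names, and storing the in-flight promise so later callers can await it is an assumption about the mechanism, not a quote of the source:

```typescript
class LazyInit {
  private initialized = false;
  private initPromise: Promise<void> = Promise.resolve();

  async ensureInit(detect: () => Promise<void>): Promise<void> {
    if (!this.initialized) {
      this.initialized = true;     // flipped BEFORE any await, so a second
      this.initPromise = detect(); // concurrent caller won't re-run detection
    }
    await this.initPromise; // everyone waits on the same detection pass
  }
}
```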
1305
2415
 
1306
- When you call a generation method without specifying a provider, Noosphere resolves one automatically:
2416
+ ### Configuration Resolution — Three-Layer Priority System
1307
2417
 
1308
- 1. If `model` is specified without `provider` looks up model in registry cache
1309
- 2. If a `default` is configured for the modality → uses that
1310
- 3. Otherwise → **local providers first**, then cloud providers
2418
+ The `resolveConfig()` function (`src/config.ts`, 87 lines) implements a strict priority hierarchy:
1311
2419
 
1312
2420
  ```
1313
- resolveProvider(modality):
1314
- 1. Check user-specified provider ID → return if found
1315
- 2. Check configured defaults → return if found
1316
- 3. Scan all providers:
1317
- → Return first LOCAL provider supporting this modality
1318
- → Fallback to first CLOUD provider
1319
- 4. Throw NO_PROVIDER error
2421
+ Priority: Explicit Config > Environment Variables > Built-in Defaults
1320
2422
  ```
1321
2423
 
1322
- ### Retry & Failover Logic
2424
+ **API Key Resolution:**
2425
+ ```typescript
2426
+ // For each of the 9 supported providers:
2427
+ const ENV_KEY_MAP = {
2428
+ openai: 'OPENAI_API_KEY',
2429
+ anthropic: 'ANTHROPIC_API_KEY',
2430
+ google: 'GEMINI_API_KEY',
2431
+ fal: 'FAL_KEY',
2432
+ openrouter: 'OPENROUTER_API_KEY',
2433
+ huggingface: 'HUGGINGFACE_TOKEN',
2434
+ groq: 'GROQ_API_KEY',
2435
+ mistral: 'MISTRAL_API_KEY',
2436
+ xai: 'XAI_API_KEY',
2437
+ };
1323
2438
 
2439
+ // Resolution per key:
2440
+ keys[name] = input.keys?.[name] // 1. Explicit config
2441
+ ?? process.env[envVar]; // 2. Environment variable
2442
+ // 3. undefined (no default)
1324
2443
  ```
1325
- executeWithRetry(modality, provider, fn):
1326
- for attempt = 0..maxRetries:
1327
- try: return fn()
1328
- catch:
1329
- if error is retryable AND attempts remain:
1330
- wait backoffMs * 2^attempt (exponential backoff)
1331
- retry same provider
1332
- if error is NOT GENERATION_FAILED AND failover enabled:
1333
- try each alternative provider for this modality
1334
- throw last error
2444
+
2445
+ **Local Service Resolution:**
2446
+ ```typescript
2447
+ // For each of the 4 local services:
2448
+ const LOCAL_DEFAULTS = {
2449
+ ollama: { host: 'http://localhost', port: 11434, envHost: 'OLLAMA_HOST', envPort: 'OLLAMA_PORT' },
2450
+ comfyui: { host: 'http://localhost', port: 8188, envHost: 'COMFYUI_HOST', envPort: 'COMFYUI_PORT' },
2451
+ piper: { host: 'http://localhost', port: 5500, envHost: 'PIPER_HOST', envPort: 'PIPER_PORT' },
2452
+ kokoro: { host: 'http://localhost', port: 5501, envHost: 'KOKORO_HOST', envPort: 'KOKORO_PORT' },
2453
+ };
2454
+
2455
+ // Resolution per service:
2456
+ local[name] = {
2457
+ enabled: cfgLocal?.enabled ?? true, // Default: enabled
2458
+ host: cfgLocal?.host ?? process.env[envHost] ?? defaults.host,
2459
+ port: cfgLocal?.port ?? parseInt(process.env[envPort]) ?? defaults.port,
2460
+ type: cfgLocal?.type,
2461
+ };
1335
2462
  ```
 
- **Retryable errors (same provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT`, `GENERATION_FAILED`
+ **Other config defaults:**
+
+ | Setting | Default | Environment Override |
+ |---|---|---|
+ | `autoDetectLocal` | `true` | `NOOSPHERE_AUTO_DETECT_LOCAL` |
+ | `discoveryCacheTTL` | `60` (minutes) | `NOOSPHERE_DISCOVERY_CACHE_TTL` |
+ | `retry.maxRetries` | `2` | — |
+ | `retry.backoffMs` | `1000` | — |
+ | `retry.failover` | `true` | — |
+ | `retry.retryableErrors` | `['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT']` | — |
+ | `timeout.llm` | `30000` (30s) | — |
+ | `timeout.image` | `120000` (2m) | — |
+ | `timeout.video` | `300000` (5m) | — |
+ | `timeout.tts` | `60000` (1m) | — |
 
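Assuming user-supplied options are shallow-merged over the documented defaults (a sketch under that assumption; `RetryConfig` mirrors the table above, and `resolveRetryConfig` is a hypothetical helper, not the library's exported API):

```typescript
// Shallow-merge of partial user retry options over the documented defaults.
interface RetryConfig {
  maxRetries: number;
  backoffMs: number;
  failover: boolean;
  retryableErrors: string[];
}

const RETRY_DEFAULTS: RetryConfig = {
  maxRetries: 2,
  backoffMs: 1000,
  failover: true,
  retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
};

function resolveRetryConfig(user: Partial<RetryConfig> = {}): RetryConfig {
  return { ...RETRY_DEFAULTS, ...user }; // user values override defaults
}

const cfg = resolveRetryConfig({ maxRetries: 4 });
console.log(cfg.maxRetries, cfg.backoffMs); // 4 1000
```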
- **Failover-eligible errors (cross-provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT` (NOT `GENERATION_FAILED`)
+ ### Provider Resolution — The Local-First Algorithm
 
- ### Model Registry & Caching
+ When you call a generation method without specifying a provider, Noosphere resolves one automatically through a three-stage process in `resolveProviderForModality()` (`src/noosphere.ts:324-348`):
 
- - Models are fetched from providers via `listModels()` and cached in memory
- - Cache TTL is configurable (default: 60 minutes)
- - `syncModels()` forces a refresh of all provider model lists
- - Registry tracks model → provider mappings for fast resolution
+ ```typescript
+ private resolveProviderForModality(
+   modality: Modality,
+   preferredId?: string,
+   modelId?: string,
+ ): NoosphereProvider {
+
+   // STAGE 1: Model-based resolution
+   // If model was specified WITHOUT a provider, search the registry cache
+   if (modelId && !preferredId) {
+     const resolved = this.registry.resolveModel(modelId, modality);
+     if (resolved) return resolved.provider;
+     // resolveModel() scans ALL cached models across ALL providers
+     // looking for exact match on both modelId AND modality
+   }
 
- ### Usage Tracking
+   // STAGE 2: Default-based resolution
+   // Check if user configured a default for this modality
+   if (!preferredId) {
+     const defaultCfg = this.config.defaults[modality];
+     if (defaultCfg) {
+       preferredId = defaultCfg.provider;
+       // Now fall through to Stage 3 with this preferredId
+     }
+   }
+
+   // STAGE 3: Provider registry resolution
+   const provider = this.registry.resolveProvider(modality, preferredId);
+   if (!provider) {
+     throw new NoosphereError(
+       `No provider available for modality '${modality}'`,
+       { code: 'NO_PROVIDER', ... }
+     );
+   }
+   return provider;
+ }
+ ```
+
+ **Registry.resolveProvider() — The local-first algorithm** (`src/registry.ts:31-46`):
+
+ ```typescript
+ resolveProvider(modality: Modality, preferredId?: string): NoosphereProvider | null {
+   // If a specific provider was requested:
+   if (preferredId) {
+     const p = this.providers.get(preferredId);
+     if (p && p.modalities.includes(modality)) return p;
+     return null; // NOT found — returns null, NOT a fallback
+   }
+
+   // No preference — scan with local-first priority:
+   let bestCloud: NoosphereProvider | null = null;
+
+   for (const p of this.providers.values()) {
+     if (!p.modalities.includes(modality)) continue;
+
+     // LOCAL provider found → return IMMEDIATELY (first match wins)
+     if (p.isLocal) return p;
+
+     // CLOUD provider → save as fallback (first cloud match only)
+     if (!bestCloud) bestCloud = p;
+   }
+
+   return bestCloud; // Return first cloud provider, or null
+ }
+ ```
+
+ **Resolution priority diagram:**
+ ```
+ ai.chat({ model: 'gpt-4o' })
+
+ ├─ Stage 1: Search modelCache for 'gpt-4o' with modality 'llm'
+ │    └── Found in pi-ai cache → return PiAiProvider
+
+ ├─ Stage 2: (skipped — model resolved in Stage 1)
+
+ └─ Stage 3: (skipped — already resolved)
+
+ ai.image({ prompt: 'sunset' })
+
+ ├─ Stage 1: (no model specified, skipped)
+
+ ├─ Stage 2: Check config.defaults.image → none configured
+
+ └─ Stage 3: resolveProvider('image', undefined)
+      ├── Scan providers:
+      │    ├── pi-ai: modalities=['llm'] → skip (no 'image')
+      │    ├── comfyui: modalities=['image','video'], isLocal=true → RETURN
+      │    └── (fal never reached — local wins)
+      └── Returns ComfyUIProvider (local-first)
+
+ ai.image({ prompt: 'sunset' }) // No local ComfyUI running
+
+ └─ Stage 3: resolveProvider('image', undefined)
+      ├── Scan providers:
+      │    ├── pi-ai: no 'image' → skip
+      │    ├── fal: modalities=['image','video','tts'], isLocal=false → save as bestCloud
+      │    └── huggingface: modalities=['image','tts','llm'], isLocal=false → already have bestCloud
+      └── Returns FalProvider (first cloud fallback)
+ ```
+
+ ### Retry & Failover Logic — Complete Algorithm
+
+ The `executeWithRetry()` method (`src/noosphere.ts:350-397`) implements a two-phase error-handling strategy: same-provider retries, then cross-provider failover.
+
+ ```typescript
+ private async executeWithRetry<T>(
+   modality: Modality,
+   provider: NoosphereProvider,
+   fn: () => Promise<T>,
+   failoverFnFactory?: (alt: NoosphereProvider) => (() => Promise<T>) | null,
+ ): Promise<T> {
+   const { maxRetries, backoffMs, retryableErrors, failover } = this.config.retry;
+   // Default: maxRetries=2, backoffMs=1000, failover=true
+   // retryableErrors = ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT']
+   let lastError: Error | undefined;
+
+   for (let attempt = 0; attempt <= maxRetries; attempt++) {
+     try {
+       return await fn(); // Try the primary provider
+     } catch (err) {
+       lastError = err instanceof Error ? err : new Error(String(err));
+
+       const isNoosphereErr = err instanceof NoosphereError;
+       const code = isNoosphereErr ? err.code : 'GENERATION_FAILED';
+
+       // GENERATION_FAILED is special:
+       // - Retryable on same provider (bad prompt, transient model issue)
+       // - NOT eligible for cross-provider failover
+       const isRetryable = retryableErrors.includes(code) || code === 'GENERATION_FAILED';
+       const allowsFailover = code !== 'GENERATION_FAILED' && retryableErrors.includes(code);
+
+       if (!isRetryable || attempt === maxRetries) {
+         // FAILOVER PHASE: Try other providers
+         if (failover && allowsFailover && failoverFnFactory) {
+           const altProviders = this.registry.getAllProviders()
+             .filter((p) => p.id !== provider.id && p.modalities.includes(modality));
+
+           for (const alt of altProviders) {
+             try {
+               const altFn = failoverFnFactory(alt);
+               if (altFn) return await altFn(); // Success on alternate provider
+             } catch {
+               // Continue to next alternate provider
+             }
+           }
+         }
+         break; // All retries and failovers exhausted
+       }
+
+       // RETRY: Exponential backoff on same provider
+       const delay = backoffMs * Math.pow(2, attempt);
+       // attempt=0: 1000ms, attempt=1: 2000ms (the final attempt fails over instead of waiting)
+       await new Promise((resolve) => setTimeout(resolve, delay));
+     }
+   }
+
+   throw lastError ?? new NoosphereError('Generation failed', { ... });
+ }
+ ```
+
+ **Failover function factory pattern:**
+
+ Each generation method passes a factory function that creates the right call for alternate providers:
+ ```typescript
+ // In chat():
+ (alt) => alt.chat ? () => alt.chat!(options) : null
+ // If the alternate provider has chat(), create a function to call it.
+ // If not (e.g., ComfyUI for LLM), return null → skip this provider.
+
+ // In image():
+ (alt) => alt.image ? () => alt.image!(options) : null
+
+ // In video():
+ (alt) => alt.video ? () => alt.video!(options) : null
+
+ // In speak():
+ (alt) => alt.speak ? () => alt.speak!(options) : null
+ ```
+
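The pattern is easy to exercise in isolation. A minimal mock (types simplified; `Provider` here is an illustrative stand-in, not the real `NoosphereProvider`):

```typescript
// Mock of the factory pattern: a provider lacking the capability yields null,
// so the failover loop skips it instead of calling a missing method.
type Provider = { id: string; chat?: (prompt: string) => Promise<string> };

const chatFactory =
  (prompt: string) =>
  (alt: Provider): (() => Promise<string>) | null =>
    alt.chat ? () => alt.chat!(prompt) : null;

const make = chatFactory('hi');
const noLlm: Provider = { id: 'comfyui' }; // no chat() → skipped
const llm: Provider = { id: 'hf', chat: async (p) => p.toUpperCase() };

console.log(make(noLlm)); // null
make(llm)!().then((out) => console.log(out)); // HI
```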
+ **Complete retry timeline example:**
+ ```
+ ai.chat() with provider="pi-ai", maxRetries=2, backoffMs=1000
+
+ Attempt 0:  pi-ai.chat() → RATE_LIMITED
+             wait 1000ms (1000 * 2^0)
+ Attempt 1:  pi-ai.chat() → RATE_LIMITED
+             wait 2000ms (1000 * 2^1)
+ Attempt 2:  pi-ai.chat() → RATE_LIMITED
+             // maxRetries exhausted, RATE_LIMITED allows failover
+ Failover 1: huggingface.chat() → 503 SERVICE_UNAVAILABLE
+ Failover 2: (no more providers with 'llm' modality)
+
+ throw last error (RATE_LIMITED from pi-ai)
+ ```
+
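The backoff schedule in the timeline can be computed directly. A small helper (illustrative, not part of the library) that returns the waits inserted between attempts:

```typescript
// One wait follows each failed attempt except the last,
// which proceeds to failover instead of waiting.
function backoffDelays(maxRetries: number, backoffMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => backoffMs * 2 ** attempt);
}

console.log(backoffDelays(2, 1000)); // [1000, 2000], matching the timeline above
```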
+ **Error classification matrix:**
+
+ | Error Code | Same-Provider Retry | Cross-Provider Failover | Typical Cause |
+ |---|---|---|---|
+ | `PROVIDER_UNAVAILABLE` | Yes | Yes | Server down, network error |
+ | `RATE_LIMITED` | Yes | Yes | API quota exceeded |
+ | `TIMEOUT` | Yes | Yes | Slow response |
+ | `GENERATION_FAILED` | Yes | **No** | Bad prompt, model error |
+ | `AUTH_FAILED` | No | No | Wrong API key |
+ | `MODEL_NOT_FOUND` | No | No | Invalid model ID |
+ | `INVALID_INPUT` | No | No | Bad parameters |
+ | `NO_PROVIDER` | No | No | No provider registered |
+
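The matrix reduces to two sets: retryable codes and failover-eligible codes, with `GENERATION_FAILED` in the first but not the second. A sketch (the `classify` helper is illustrative, not exported by noosphere):

```typescript
// Same-provider retry also covers GENERATION_FAILED; failover does not.
const RETRYABLE = new Set(['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT', 'GENERATION_FAILED']);
const FAILOVER_ELIGIBLE = new Set(['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT']);

function classify(code: string): { retry: boolean; failover: boolean } {
  return { retry: RETRYABLE.has(code), failover: FAILOVER_ELIGIBLE.has(code) };
}

console.log(classify('GENERATION_FAILED')); // { retry: true, failover: false }
console.log(classify('AUTH_FAILED'));       // { retry: false, failover: false }
```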
+ ### Model Registry — Internal Data Structures
+
+ The Registry (`src/registry.ts`, 137 lines) maps providers to models and handles model lookups.
+
+ **Internal state:**
+ ```typescript
+ class Registry {
+   // Provider storage: Map<providerId, providerInstance>
+   private providers = new Map<string, NoosphereProvider>();
+   // Example: { 'pi-ai' → PiAiProvider, 'fal' → FalProvider, 'comfyui' → ComfyUIProvider }
+
+   // Model cache: Map<providerId, { models: ModelInfo[], syncedAt: timestamp }>
+   private modelCache = new Map<string, CachedModels>();
+   // Example: {
+   //   'pi-ai' → { models: [246 ModelInfo objects], syncedAt: 1710000000000 },
+   //   'fal' → { models: [867 ModelInfo objects], syncedAt: 1710000000000 },
+   // }
+
+   // Cache TTL in milliseconds (converted from minutes in constructor)
+   private cacheTTLMs: number;
+   // Default: 60 * 60 * 1000 = 3,600,000ms = 1 hour
+ }
+ ```
+
+ **Cache staleness check:**
+ ```typescript
+ isCacheStale(providerId: string): boolean {
+   const cached = this.modelCache.get(providerId);
+   if (!cached) return true; // No cache = stale
+   return Date.now() - cached.syncedAt > this.cacheTTLMs;
+   // Example: if syncedAt was 61 minutes ago and TTL is 60 minutes → stale
+ }
+ ```
+
+ **Model resolution — linear scan across all caches:**
+ ```typescript
+ resolveModel(modelId: string, modality: Modality):
+   { provider: NoosphereProvider; model: ModelInfo } | null {
+
+   // Scan EVERY provider's cached models:
+   for (const [providerId, cached] of this.modelCache) {
+     const model = cached.models.find(
+       (m) => m.id === modelId && m.modality === modality
+     );
+     // Must match BOTH modelId AND modality
+     if (model) {
+       const provider = this.providers.get(providerId);
+       if (provider) return { provider, model };
+     }
+   }
+   return null;
+ }
+ // Performance: O(n) where n = total models across all providers
+ // With 246 Pi-AI + 867 FAL + 3 HuggingFace = ~1116 models to scan
+ // This is fast enough for the use case (called once per request)
+ ```
+
+ **Sync mechanism:**
+ ```typescript
+ async syncAll(): Promise<SyncResult> {
+   const byProvider: Record<string, number> = {};
+   const errors: string[] = [];
+   let synced = 0;
+
+   // Sequential sync (NOT parallel) — one provider at a time:
+   for (const provider of this.providers.values()) {
+     try {
+       const models = await provider.listModels();
+       this.modelCache.set(provider.id, {
+         models,
+         syncedAt: Date.now(),
+       });
+       byProvider[provider.id] = models.length;
+       synced += models.length;
+     } catch (err) {
+       errors.push(`${provider.id}: ${(err as Error).message}`);
+       byProvider[provider.id] = 0;
+       // Note: failed sync does NOT clear existing cache
+     }
+   }
+
+   return { synced, byProvider, errors };
+ }
+ ```
+
+ **Provider info aggregation:**
+ ```typescript
+ getProviderInfos(modality?: Modality): ProviderInfo[] {
+   // Returns summary info for each registered provider:
+   // {
+   //   id: 'pi-ai',
+   //   name: 'pi-ai (LLM Gateway)',
+   //   modalities: ['llm'],
+   //   local: false,
+   //   status: 'online', // Always 'online' — no live ping check
+   //   modelCount: 246, // From cache, or 0 if not synced
+   // }
+ }
+ ```
+
+ ### Usage Tracking — In-Memory Event Store
+
+ The `UsageTracker` (`src/tracking.ts`, 57 lines) records every API call and provides filtered aggregation.
+
+ **Internal state:**
+ ```typescript
+ class UsageTracker {
+   private events: UsageEvent[] = []; // Append-only array
+   private onUsage?: (event: UsageEvent) => void | Promise<void>; // Optional callback
+ }
+ ```
+
+ **Recording flow — every API call creates a UsageEvent:**
+
+ ```typescript
+ // On SUCCESS (in Noosphere.trackUsage):
+ const event: UsageEvent = {
+   modality: result.modality,    // 'llm' | 'image' | 'video' | 'tts'
+   provider: result.provider,    // 'pi-ai', 'fal', etc.
+   model: result.model,          // 'gpt-4o', 'flux-pro', etc.
+   cost: result.usage.cost,      // USD amount (0 for free/local)
+   latencyMs: result.latencyMs,  // Wall-clock milliseconds
+   input: result.usage.input,    // Input tokens or characters
+   output: result.usage.output,  // Output tokens (LLM only)
+   unit: result.usage.unit,      // 'tokens', 'characters', 'free'
+   timestamp: new Date().toISOString(), // ISO 8601
+   success: true,
+   metadata,                     // User-provided metadata passthrough
+ };
+
+ // On FAILURE (in Noosphere.trackError):
+ const event: UsageEvent = {
+   modality, provider,
+   model: model ?? 'unknown',
+   cost: 0,                         // No cost on failure
+   latencyMs: Date.now() - startMs, // Time until failure
+   timestamp: new Date().toISOString(),
+   success: false,
+   error: err instanceof Error ? err.message : String(err),
+   metadata,
+ };
+ ```
+
+ **Query/aggregation — filtered summary:**
+ ```typescript
+ getSummary(options?: UsageQueryOptions): UsageSummary {
+   let filtered = this.events;
+
+   // Time-range filtering:
+   if (options?.since) {
+     const since = new Date(options.since).getTime();
+     filtered = filtered.filter((e) => new Date(e.timestamp).getTime() >= since);
+   }
+   if (options?.until) {
+     const until = new Date(options.until).getTime();
+     filtered = filtered.filter((e) => new Date(e.timestamp).getTime() <= until);
+   }
+
+   // Provider/modality filtering:
+   if (options?.provider) {
+     filtered = filtered.filter((e) => e.provider === options.provider);
+   }
+   if (options?.modality) {
+     filtered = filtered.filter((e) => e.modality === options.modality);
+   }
+
+   // Aggregation:
+   const byProvider: Record<string, number> = {};
+   const byModality = { llm: 0, image: 0, video: 0, tts: 0 };
+   let totalCost = 0;
+
+   for (const event of filtered) {
+     totalCost += event.cost;
+     byProvider[event.provider] = (byProvider[event.provider] ?? 0) + event.cost;
+     byModality[event.modality] += event.cost;
+   }
+
+   return { totalCost, totalRequests: filtered.length, byProvider, byModality };
+ }
+ ```
+
+ **Usage example:**
+ ```typescript
+ // Get all usage:
+ const all = ai.getUsage();
+ // { totalCost: 0.42, totalRequests: 15, byProvider: { 'pi-ai': 0.40, 'fal': 0.02 }, byModality: { llm: 0.40, image: 0.02, video: 0, tts: 0 } }
+
+ // Get usage for last hour, LLM only:
+ const recent = ai.getUsage({
+   since: new Date(Date.now() - 3600000),
+   modality: 'llm',
+ });
+
+ // Get usage for a specific provider:
+ const falUsage = ai.getUsage({ provider: 'fal' });
+
+ // Real-time callback (set in constructor):
+ const ai = new Noosphere({
+   onUsage: (event) => {
+     console.log(`${event.provider}/${event.model}: $${event.cost} in ${event.latencyMs}ms`);
+     // Or: send to analytics, update dashboard, check budget
+   },
+ });
+ ```
+
+ **Important limitations:**
+ - Events are stored **in memory only** — lost on process restart
+ - No deduplication — each retry/failover attempt creates a separate event
+ - `clear()` wipes all history (called by `dispose()`)
+ - The `onUsage` callback is `await`ed — a slow callback blocks the response return
+
+ ### Streaming Architecture
+
+ The `stream()` method (`src/noosphere.ts:73-124`) wraps provider streams with usage tracking:
+
+ ```typescript
+ stream(options: ChatOptions): NoosphereStream {
+   // Returns IMMEDIATELY (synchronous) — no await
+   // The actual initialization happens lazily on first iteration
+
+   let innerStream: NoosphereStream | undefined;
+   let finalResult: NoosphereResult | undefined;
+   let providerRef: NoosphereProvider | undefined;
+
+   // Lazy init — runs on first for-await-of iteration:
+   const ensureInit = async () => {
+     if (!this.initialized) await this.init();
+     if (!providerRef) {
+       providerRef = this.resolveProviderForModality('llm', ...);
+       if (!providerRef.stream) throw new NoosphereError(...);
+       innerStream = providerRef.stream(options);
+     }
+   };
+
+   // Wrapped async iterator with usage tracking:
+   const wrappedIterator = {
+     async *[Symbol.asyncIterator]() {
+       await ensureInit(); // Init on first next()
+       for await (const event of innerStream!) {
+         if (event.type === 'done' && event.result) {
+           finalResult = event.result;
+           await trackUsage(event.result); // Track when complete
+         }
+         yield event; // Pass events through
+       }
+     },
+   };
+
+   return {
+     [Symbol.asyncIterator]: () => wrappedIterator[Symbol.asyncIterator](),
+
+     // result() — consume entire stream and return final result:
+     result: async () => {
+       if (finalResult) return finalResult; // Already consumed
+       for await (const event of wrappedIterator) {
+         if (event.type === 'done') return event.result!;
+         if (event.type === 'error') throw event.error;
+       }
+       throw new NoosphereError('Stream ended without result');
+     },
+
+     // abort() — signal cancellation:
+     abort: () => innerStream?.abort(),
+   };
+ }
+ ```
+
+ **Stream event types:**
+
+ | Event Type | Fields | When |
+ |---|---|---|
+ | `text_delta` | `{ type, delta: string }` | Each text token |
+ | `thinking_delta` | `{ type, delta: string }` | Each reasoning token |
+ | `done` | `{ type, result: NoosphereResult }` | Stream complete |
+ | `error` | `{ type, error: Error }` | Stream failed |
+
+ **Note:** Streaming does NOT use `executeWithRetry()`. If the stream fails, there's no automatic retry or failover. The error is yielded as an `error` event and also tracked via `trackError()`.
+
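Consuming the event types above looks the same whether the stream is real or mocked. A self-contained sketch with a fake stream (the event shapes follow the table; `fakeStream` is an illustrative stand-in for `ai.stream(...)`):

```typescript
// Accumulate text_delta events into the final text, as a consumer would.
type StreamEvent =
  | { type: 'text_delta'; delta: string }
  | { type: 'done'; result: { text: string } };

async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: 'text_delta', delta: 'Hel' };
  yield { type: 'text_delta', delta: 'lo' };
  yield { type: 'done', result: { text: 'Hello' } };
}

async function collectText(stream: AsyncIterable<StreamEvent>): Promise<string> {
  let text = '';
  for await (const event of stream) {
    if (event.type === 'text_delta') text += event.delta;
  }
  return text;
}

collectText(fakeStream()).then((text) => console.log(text)); // Hello
```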
+ ### Lifecycle Management — dispose()
+
+ ```typescript
+ async dispose(): Promise<void> {
+   // 1. Call dispose() on every registered provider (if implemented):
+   for (const provider of this.registry.getAllProviders()) {
+     if (provider.dispose) {
+       await provider.dispose();
+       // Currently: no built-in provider implements dispose()
+       // This is for custom providers that need cleanup
+     }
+   }
+
+   // 2. Clear the model cache:
+   this.registry.clearCache();
+
+   // 3. Clear usage history:
+   this.tracker.clear();
 
- Every API call (success or failure) records a `UsageEvent`:
-
- ```typescript
- interface UsageEvent {
-   modality: 'llm' | 'image' | 'video' | 'tts';
-   provider: string;
-   model: string;
-   cost: number; // USD
-   latencyMs: number;
-   input?: number; // tokens or characters
-   output?: number; // tokens
-   unit?: string;
-   timestamp: string; // ISO 8601
-   success: boolean;
-   error?: string; // error message if failed
-   metadata?: Record<string, unknown>;
+   // Note: does NOT set initialized=false
+   // After dispose(), the instance is NOT reusable for new requests
 }
 ```