noosphere 0.1.3 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -7,7 +7,7 @@ One import. Every model. Every modality.
7
7
  ## Features
8
8
 
9
9
  - **4 modalities** — LLM chat, image generation, video generation, and text-to-speech
10
- - **246+ LLM models** — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
10
+ - **Always up-to-date models** — Dynamic auto-fetch from ALL provider APIs at runtime (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
11
11
  - **867+ media endpoints** — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
12
12
  - **30+ HuggingFace tasks** — LLM, image, TTS, translation, summarization, classification, and more
13
13
  - **Local-first architecture** — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
@@ -59,6 +59,108 @@ const audio = await ai.speak({
59
59
  // audio.buffer contains the audio data
60
60
  ```
61
61
 
62
+ ## Dynamic Model Auto-Fetch — Always Up-to-Date
63
+
64
+ Noosphere **automatically discovers the latest models** from every provider's API at runtime. When Google releases a new Gemini model, when OpenAI drops GPT-5, when Anthropic publishes Claude 4 — **you get them immediately**, without updating Noosphere or any dependency.
65
+
66
+ ### The Problem It Solves
67
+
68
+ Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 models in a pre-generated `models.generated.js` file. When a provider releases a new model, you have to wait for the library maintainer to run `npm run generate-models` and publish a new version, then run `npm update` yourself. That lag can be days or weeks.
69
+
70
+ ### How It Works
71
+
72
+ On the **first API call**, Noosphere queries every provider's model listing API in parallel and merges the results with the static catalog:
73
+
74
+ ```
75
+ First ai.chat() / ai.image() / ai.stream() call
76
+
77
+ ├─ 1. Load static pi-ai catalog (246 models with accurate cost/context data)
78
+
79
+ ├─ 2. Parallel fetch from ALL provider APIs (8 concurrent requests):
80
+ │ ├── GET https://api.openai.com/v1/models (Bearer token)
81
+ │ ├── GET https://api.anthropic.com/v1/models (x-api-key + anthropic-version)
82
+ │ ├── GET https://generativelanguage.googleapis.com/... (API key in URL)
83
+ │ ├── GET https://api.groq.com/openai/v1/models (Bearer token)
84
+ │ ├── GET https://api.mistral.ai/v1/models (Bearer token)
85
+ │ ├── GET https://api.x.ai/v1/models (Bearer token)
86
+ │ ├── GET https://openrouter.ai/api/v1/models (Bearer token)
87
+ │ └── GET https://api.cerebras.ai/v1/models (Bearer token)
88
+
89
+ ├─ 3. Filter results (chat models only — exclude embeddings, TTS, whisper, etc.)
90
+
91
+ ├─ 4. Deduplicate against static catalog (static wins — has accurate cost data)
92
+
93
+ └─ 5. Merge: Static catalog + newly discovered models = complete model list
94
+ ```
95
+
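Steps 4 and 5 boil down to a set difference keyed on provider and model ID. A minimal sketch of the dedupe-and-merge (types and names here are illustrative, not Noosphere's actual internals):

```typescript
interface ModelEntry {
  id: string;
  provider: string;
}

// Static entries win: they carry vetted cost and context-window data.
// Discovered models are appended only if the static catalog lacks them.
function mergeCatalogs(staticModels: ModelEntry[], discovered: ModelEntry[]): ModelEntry[] {
  const known = new Set(staticModels.map((m) => `${m.provider}/${m.id}`));
  const fresh = discovered.filter((m) => !known.has(`${m.provider}/${m.id}`));
  return [...staticModels, ...fresh];
}
```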
96
+ ### What Gets Fetched Per Provider
97
+
98
+ | Provider | API Endpoint | Auth | Model Filter | API Protocol |
99
+ |---|---|---|---|---|
100
+ | **OpenAI** | `/v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
101
+ | **Anthropic** | `/v1/models?limit=100` | `x-api-key` + `anthropic-version` | `claude-*` | `anthropic-messages` |
102
+ | **Google** | `/v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
103
+ | **Groq** | `/openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
104
+ | **Mistral** | `/v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
105
+ | **xAI** | `/v1/models` | Bearer token | `grok*` | `openai-completions` |
106
+ | **OpenRouter** | `/api/v1/models` | Bearer token | All (OpenRouter only lists usable models) | `openai-completions` |
107
+ | **Cerebras** | `/v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
108
+
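As a concrete illustration of the OpenAI row above, the chat-model filter reduces to a prefix check. A sketch (the prefix list comes from the table; the function name is hypothetical):

```typescript
// Prefixes that mark OpenAI chat-capable models, per the table above.
const OPENAI_CHAT_PREFIXES = ['gpt-', 'o1', 'o3', 'o4', 'chatgpt-', 'codex-'];

// Excludes embeddings, whisper, TTS, dall-e, and other non-chat endpoints.
function isOpenAIChatModel(id: string): boolean {
  return OPENAI_CHAT_PREFIXES.some((p) => id.startsWith(p));
}
```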
109
+ ### Resilience Guarantees
110
+
111
+ - **8-second timeout** per provider — slow APIs don't block everything
112
+ - **`Promise.allSettled()`** — if one provider fails, the others still work
113
+ - **Silent failure** — network errors are caught and ignored; the static catalog remains available
114
+ - **One-time fetch** — results are cached in memory, not re-fetched on every call
115
+ - **Zero config** — works automatically if you have API keys set
116
+
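Taken together, these guarantees amount to "race everything, keep whatever succeeds". A minimal sketch of the pattern (names are illustrative; the real implementation additionally aborts each request after 8 seconds):

```typescript
type ProviderFetch = () => Promise<string[]>;

// One failed or slow provider never blocks the rest:
// rejected promises are simply dropped from the merged result.
async function discoverAll(fetchers: ProviderFetch[]): Promise<string[]> {
  const settled = await Promise.allSettled(fetchers.map((f) => f()));
  return settled.flatMap((r) => (r.status === 'fulfilled' ? r.value : []));
}
```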
117
+ ### How New Models Become Usable
118
+
119
+ When a dynamically discovered model isn't in the static catalog, Noosphere constructs a **synthetic Model object** that pi-ai's `complete()` and `stream()` functions can use directly:
120
+
121
+ ```typescript
122
+ // For a new model like "gpt-4.5-turbo" discovered from OpenAI's API:
123
+ {
124
+ id: 'gpt-4.5-turbo',
125
+ name: 'gpt-4.5-turbo',
126
+ api: 'openai-responses', // Correct protocol for the provider
127
+ provider: 'openai',
128
+ baseUrl: 'https://api.openai.com/v1',
129
+ reasoning: false, // Inferred from model ID prefix
130
+ input: ['text', 'image'],
131
+ cost: { input: 2.5, output: 10, cacheRead: 1.25, cacheWrite: 2.5 }, // From template
132
+ contextWindow: 128000, // From template or provider API
133
+ maxTokens: 16384, // From template or provider API
134
+ }
135
+ ```
136
+
137
+ **Template inheritance:** Cost and context window data come from a "template" — the first model in the static catalog for that provider. This means new models inherit approximate pricing until the static catalog is updated with exact numbers. For Google, the API returns `inputTokenLimit` and `outputTokenLimit` directly, so context window data is always accurate.
138
+
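The template lookup itself is simple. A hypothetical sketch (the real code lives inside the provider registry; types are illustrative):

```typescript
interface CatalogModel {
  id: string;
  provider: string;
  cost: { input: number; output: number };
  contextWindow: number;
}

// The first static-catalog model for a provider serves as the template
// from which newly discovered models inherit approximate pricing.
function templateFor(provider: string, catalog: CatalogModel[]): CatalogModel | undefined {
  return catalog.find((m) => m.provider === provider);
}
```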
139
+ ### Force Refresh
140
+
141
+ ```typescript
142
+ const ai = new Noosphere();
143
+
144
+ // Models are auto-fetched on first call:
145
+ await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately
146
+
147
+ // Force a re-fetch if you suspect new models were added mid-session:
148
+ // (access the provider's refreshDynamicModels method via the registry)
149
+ const models = await ai.getModels('llm');
150
+ // Or trigger a full sync:
151
+ await ai.syncModels();
152
+ ```
153
+
154
+ ### Why Not Just Use the Provider APIs Directly?
155
+
156
+ | Approach | Pros | Cons |
157
+ |---|---|---|
158
+ | **Static catalog only** (old) | Accurate costs, fast startup | Stale within days; misses new models |
159
+ | **Dynamic only** | Always current | No cost data, no context window info, slow startup |
160
+ | **Hybrid (Noosphere)** | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |
161
+
162
+ ---
163
+
62
164
  ## Configuration
63
165
 
64
166
  API keys are resolved from the constructor config or environment variables (config takes priority):
@@ -1112,16 +1214,95 @@ The largest media generation provider with dynamic pricing fetched at runtime fr
1112
1214
  **Other TTS:**
1113
1215
  `fal-ai/f5-tts` (voice cloning), `fal-ai/dia-tts`, `fal-ai/minimax/speech-2.6-turbo`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/chatterbox/text-to-speech`, `fal-ai/index-tts-2/text-to-speech`
1114
1216
 
1217
+ #### FAL Provider Internals — How It Actually Works
1218
+
1219
+ **Image generation** uses `fal.subscribe()` (queue-based, polls until complete):
1220
+ ```typescript
1221
+ // Exact request payload sent to FAL:
1222
+ const response = await fal.subscribe(model, {
1223
+ input: {
1224
+ prompt: "A sunset over mountains",
1225
+ negative_prompt: "blurry", // from options.negativePrompt
1226
+ image_size: { width: 1024, height: 768 }, // from options.width/height
1227
+ seed: 42, // from options.seed
1228
+ num_inference_steps: 30, // from options.steps
1229
+ guidance_scale: 7.5, // from options.guidanceScale
1230
+ },
1231
+ });
1232
+
1233
+ // Response parsing — URL from images array:
1234
+ const image = response.data?.images?.[0];
1235
+ // result.url = image?.url
1236
+ // result.media = { width: image?.width, height: image?.height, format: 'png' }
1237
+ ```
1238
+
1239
+ **Video generation** uses `fal.subscribe()`:
1240
+ ```typescript
1241
+ const response = await fal.subscribe(model, {
1242
+ input: {
1243
+ prompt: "Ocean waves",
1244
+ image_url: "https://...", // from options.imageUrl (image-to-video)
1245
+ duration: 5, // from options.duration
1246
+ fps: 24, // from options.fps
1247
+ },
1248
+ });
1249
+
1250
+ // Response parsing — URL from video object with fallback:
1251
+ const video = response.data?.video;
1252
+ // result.url = video?.url ?? response.data?.video_url
1253
+ // Note: width/height/duration/fps come from INPUT options, not response
1254
+ ```
1255
+
1256
+ **TTS** uses `fal.run()` (direct call, NOT subscribe — no queue):
1257
+ ```typescript
1258
+ const response = await fal.run(model, {
1259
+ input: {
1260
+ text: "Hello world",
1261
+ voice: "af_heart", // from options.voice
1262
+ speed: 1.0, // from options.speed
1263
+ },
1264
+ });
1265
+
1266
+ // Response parsing — URL from audio object with fallback:
1267
+ // result.url = response.data?.audio_url ?? response.data?.audio?.url
1268
+ ```
1269
+
1270
+ **Pricing cache and cost tracking:**
1271
+ ```typescript
1272
+ // Pricing fetched dynamically from FAL API during listModels():
1273
+ const res = await fetch('https://api.fal.ai/v1/models/pricing', {
1274
+ headers: { Authorization: `Key ${this.apiKey}` },
1275
+ });
1276
+ // Returns: Array<{ modelId: string, price: number, unit: string }>
1277
+
1278
+ // Cached in memory Map, cleared on each listModels() call:
1279
+ private pricingCache = new Map<string, { price: number; unit: string }>();
1280
+
1281
+ // Cost per request pulled from cache (defaults to 0 if not cached):
1282
+ usage: { cost: pricingCache.get(model)?.price ?? 0 }
1283
+ ```
1284
+
1285
+ **Modality inference from model ID — exact string matching:**
1286
+ ```typescript
1287
+ inferModality(modelId: string, unit: string): Modality {
+   const id = modelId.toLowerCase();
+   // TTS: unit contains 'char' OR modelId contains 'tts'/'kokoro'/'elevenlabs'
+   if (unit.includes('char') || ['tts', 'kokoro', 'elevenlabs'].some((s) => id.includes(s))) return 'tts';
+   // Video: unit contains 'second' OR modelId contains 'video'/'kling'/'sora'/'veo'
+   if (unit.includes('second') || ['video', 'kling', 'sora', 'veo'].some((s) => id.includes(s))) return 'video';
+   // Image: everything else (default)
+   return 'image';
+ }
1292
+ ```
1293
+
1294
+ **Error handling:** Only `listModels()` catches errors (returns `[]`). Image/video/speak methods let FAL errors propagate directly — no wrapping.
1295
+
1115
1296
  #### FAL Client Capabilities
1116
1297
 
1117
1298
  The `@fal-ai/client` provides additional features beyond what Noosphere surfaces:
1118
1299
 
1119
- - **Queue API** — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
1120
- - **Streaming API** — Real-time streaming responses via async iterators
1121
- - **Realtime API** — WebSocket connections for interactive use (e.g., real-time image generation)
1122
- - **Storage API** — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
1123
- - **Retry logic** — Configurable retries with exponential backoff and jitter
1124
- - **Request middleware** — Custom request interceptors and proxy support
1300
+ - **Queue API** — `fal.queue.submit()`, `status()`, `result()`, `cancel()`. Supports webhooks, priority levels (`"low"` | `"normal"`), and polling/streaming status modes
1301
+ - **Streaming API** — `fal.streaming.stream()` with async iterators, chunk-level events, configurable timeout between chunks (15s default)
1302
+ - **Realtime API** — `fal.realtime.connect()` for WebSocket connections with msgpack encoding, throttle interval (128ms default), frame buffering (1-60 frames)
1303
+ - **Storage API** — `fal.storage.upload()` with configurable object lifecycle: `"never"` | `"immediate"` | `"1h"` | `"1d"` | `"7d"` | `"30d"` | `"1y"`
1304
+ - **Retry logic** — 3 retries default, exponential backoff (500ms base, 15s max), jitter enabled, retries on 408/429/500/502/503/504
1305
+ - **Request middleware** — `withMiddleware()` for request interceptors, `withProxy()` for proxy configuration
1125
1306
 
1126
1307
  ---
1127
1308
 
@@ -1212,6 +1393,260 @@ The `@huggingface/inference` library (v3.15.0) provides 30+ AI tasks, including
1212
1393
  - **Multimodal Input:** Images via `image_url` content chunks in chat messages
1213
1394
  - **17 Inference Providers:** Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
1214
1395
 
1396
+ #### HuggingFace Provider Internals — How It Actually Works
1397
+
1398
+ The `HuggingFaceProvider` class (`src/providers/huggingface.ts`, 141 lines) wraps the `@huggingface/inference` library's `HfInference` client. Here's the exact internal flow for each modality:
1399
+
1400
+ **Initialization:**
1401
+ ```typescript
1402
+ // Constructor receives a single API token
1403
+ constructor(token: string) {
1404
+ this.client = new HfInference(token);
1405
+ // HfInference stores the token internally and attaches it
1406
+ // as Authorization: Bearer <token> to every request
1407
+ }
1408
+
1409
+ // ping() always returns true — HuggingFace is considered
1410
+ // "available" if the token was provided. No actual HTTP check.
1411
+ async ping(): Promise<boolean> { return true; }
1412
+ ```
1413
+
1414
+ **Chat Completions — exact request flow:**
1415
+ ```typescript
1416
+ // Default model: meta-llama/Llama-3.1-8B-Instruct
1417
+ const model = options.model ?? 'meta-llama/Llama-3.1-8B-Instruct';
1418
+
1419
+ // Maps directly to HfInference.chatCompletion():
1420
+ const response = await this.client.chatCompletion({
1421
+ model, // HuggingFace model ID or inference endpoint
1422
+ messages: options.messages, // Array<{ role, content }> — passed directly
1423
+ temperature: options.temperature, // 0.0 - 2.0 (optional)
1424
+ max_tokens: options.maxTokens, // Max output tokens (optional)
1425
+ });
1426
+
1427
+ // Response parsing:
1428
+ const choice = response.choices?.[0]; // OpenAI-compatible format
1429
+ const usage = response.usage; // { prompt_tokens, completion_tokens }
1430
+ // result.content = choice?.message?.content ?? ''
1431
+ // result.usage.input = usage?.prompt_tokens
1432
+ // result.usage.output = usage?.completion_tokens
1433
+ // result.usage.cost = 0 (always free for HF Inference API)
1434
+ ```
1435
+
1436
+ **Image Generation — Blob-to-Buffer conversion pipeline:**
1437
+ ```typescript
1438
+ // Default model: stabilityai/stable-diffusion-xl-base-1.0
1439
+ const model = options.model ?? 'stabilityai/stable-diffusion-xl-base-1.0';
1440
+
1441
+ // Uses textToImage() which returns a Blob object:
1442
+ const blob = await this.client.textToImage({
1443
+ model,
1444
+ inputs: options.prompt, // The text prompt
1445
+ parameters: {
1446
+ negative_prompt: options.negativePrompt, // What NOT to generate
1447
+ width: options.width, // Pixel width
1448
+ height: options.height, // Pixel height
1449
+ guidance_scale: options.guidanceScale, // CFG scale
1450
+ num_inference_steps: options.steps, // Denoising steps
1451
+ },
1452
+ }, { outputType: 'blob' }); // <-- Forces Blob output (not ReadableStream)
1453
+
1454
+ // Blob → ArrayBuffer → Node.js Buffer conversion:
1455
+ const buffer = Buffer.from(await blob.arrayBuffer());
1456
+ // This is the critical step — HfInference returns a Web API Blob,
1457
+ // which must be converted to a Node.js Buffer for downstream use.
1458
+
1459
+ // Result always reports PNG format regardless of actual model output:
1460
+ // result.media = { width: options.width ?? 1024, height: options.height ?? 1024, format: 'png' }
1461
+ ```
1462
+
1463
+ **Text-to-Speech — Blob-to-Buffer conversion:**
1464
+ ```typescript
1465
+ // Default model: facebook/mms-tts-eng
1466
+ const model = options.model ?? 'facebook/mms-tts-eng';
1467
+
1468
+ // Uses textToSpeech() — simpler API, just model + text:
1469
+ const blob = await this.client.textToSpeech({
1470
+ model,
1471
+ inputs: options.text, // Text to synthesize
1472
+ // Note: No voice, speed, or format parameters — these are model-dependent
1473
+ });
1474
+
1475
+ // Same Blob → Buffer conversion:
1476
+ const buffer = Buffer.from(await blob.arrayBuffer());
1477
+
1478
+ // Usage tracks character count, not tokens:
1479
+ // result.usage = { cost: 0, input: options.text.length, unit: 'characters' }
1480
+ // result.media = { format: 'wav' }
1481
+ ```
1482
+
1483
+ **Model listing — curated defaults, not API discovery:**
1484
+ ```typescript
1485
+ // Unlike FAL (which fetches from API) or Pi-AI (which auto-generates),
1486
+ // HuggingFace returns a HARDCODED list of 3 curated models:
1487
+ async listModels(modality?: Modality): Promise<ModelInfo[]> {
1488
+ const models: ModelInfo[] = [];
1489
+ if (!modality || modality === 'image') {
1490
+ models.push({ id: 'stabilityai/stable-diffusion-xl-base-1.0', ... });
1491
+ }
1492
+ if (!modality || modality === 'tts') {
1493
+ models.push({ id: 'facebook/mms-tts-eng', ... });
1494
+ }
1495
+ if (!modality || modality === 'llm') {
1496
+ models.push({ id: 'meta-llama/Llama-3.1-8B-Instruct', ... });
1497
+ }
1498
+ return models;
1499
+ }
1500
+ // This means: the registry only KNOWS about 3 models by default,
1501
+ // but you can use ANY HuggingFace model by passing its ID directly.
1502
+ // The model just won't appear in getModels() or syncModels() results.
1503
+ ```
1504
+
1505
+ #### The 17 HuggingFace Inference Providers
1506
+
1507
+ The `@huggingface/inference` library supports routing requests through 17 different inference providers. This means a single HuggingFace model ID can be served by multiple backends with different performance/cost characteristics:
1508
+
1509
+ | # | Provider | Type | Strengths |
1510
+ |---|---|---|---|
1511
+ | 1 | `hf-inference` | HuggingFace's own | Default, free tier, rate-limited |
1512
+ | 2 | `hf-dedicated` | Dedicated endpoints | Private, reserved GPU, guaranteed availability |
1513
+ | 3 | `together-ai` | Together.ai | Fast inference, competitive pricing |
1514
+ | 4 | `fireworks-ai` | Fireworks.ai | Optimized serving, function calling |
1515
+ | 5 | `replicate` | Replicate | Pay-per-use, large model catalog |
1516
+ | 6 | `cerebras` | Cerebras | Extreme speed (WSE-3 hardware) |
1517
+ | 7 | `groq` | Groq | Ultra-low latency (LPU hardware) |
1518
+ | 8 | `cohere` | Cohere | Enterprise, embeddings, RAG |
1519
+ | 9 | `sambanova` | SambaNova | Enterprise RDU hardware |
1520
+ | 10 | `nebius` | Nebius | European cloud infrastructure |
1521
+ | 11 | `hyperbolic` | Hyperbolic Labs | Open-access GPU marketplace |
1522
+ | 12 | `novita` | Novita AI | Cost-efficient inference |
1523
+ | 13 | `ovh-cloud` | OVHcloud | European sovereign cloud |
1524
+ | 14 | `aws` | Amazon SageMaker | AWS-managed endpoints |
1525
+ | 15 | `azure` | Azure ML | Azure-managed endpoints |
1526
+ | 16 | `google-vertex` | Google Vertex | GCP-managed endpoints |
1527
+ | 17 | `deepinfra` | DeepInfra | High-throughput inference |
1528
+
1529
+ **Provider routing** is handled by the `@huggingface/inference` library's internal `provider` parameter:
1530
+ ```typescript
1531
+ // Route through a specific inference provider:
1532
+ const response = await client.chatCompletion({
1533
+ model: 'meta-llama/Llama-3.1-70B-Instruct',
1534
+ provider: 'together-ai', // <-- Route through Together.ai
1535
+ messages: [...],
1536
+ });
1537
+
1538
+ // NOTE: Noosphere does NOT currently expose the `provider` parameter
1539
+ // in its ChatOptions type. To use a specific HF inference provider,
1540
+ // you would need a custom provider or direct @huggingface/inference usage.
1541
+ ```
1542
+
1543
+ #### Using HuggingFace Locally — Dedicated Endpoints
1544
+
1545
+ HuggingFace Inference Endpoints let you deploy any model on dedicated GPUs. The `@huggingface/inference` library supports this via the `endpointUrl` parameter:
1546
+
1547
+ ```typescript
1548
+ // Direct HfInference usage with a local/dedicated endpoint:
1549
+ import { HfInference } from '@huggingface/inference';
1550
+
1551
+ const client = new HfInference('your-token');
1552
+
1553
+ // Point to your dedicated endpoint:
1554
+ const response = await client.chatCompletion({
1555
+ model: 'tgi',
1556
+ endpointUrl: 'https://your-endpoint.endpoints.huggingface.cloud',
1557
+ messages: [{ role: 'user', content: 'Hello' }],
1558
+ });
1559
+
1560
+ // For a truly local setup with TGI (Text Generation Inference):
1561
+ const localClient = new HfInference(); // No token needed for local
1562
+ const response = await localClient.chatCompletion({
1563
+ model: 'tgi',
1564
+ endpointUrl: 'http://localhost:8080', // Local TGI server
1565
+ messages: [...],
1566
+ });
1567
+ ```
1568
+
1569
+ **Deploying HuggingFace models locally with TGI:**
1570
+
1571
+ ```bash
1572
+ # 1. Install Text Generation Inference (TGI):
1573
+ docker run --gpus all -p 8080:80 \
1574
+ -v /data:/data \
1575
+ ghcr.io/huggingface/text-generation-inference:latest \
1576
+ --model-id meta-llama/Llama-3.1-8B-Instruct
1577
+
1578
+ # 2. For image models, use Inference Endpoints:
1579
+ # Deploy via https://ui.endpoints.huggingface.co/
1580
+ # Select your model, GPU type, and region
1581
+ # Get an endpoint URL like: https://xyz123.endpoints.huggingface.cloud
1582
+
1583
+ # 3. For TTS models locally, use the Transformers library:
1584
+ # pip install transformers torch
1585
+ # Then run a local server that serves the model
1586
+ ```
1587
+
1588
+ **Other local deployment options:**
1589
+
1590
+ | Method | URL Pattern | Use Case |
1591
+ |---|---|---|
1592
+ | TGI Docker | `http://localhost:8080` | Production local LLM serving |
1593
+ | HF Inference Endpoints | `https://xxxx.endpoints.huggingface.cloud` | Managed dedicated GPU |
1594
+ | vLLM with HF models | `http://localhost:8000` | High-throughput local serving |
1595
+ | Transformers + FastAPI | Custom URL | Custom model serving |
1596
+
1597
+ #### Unexposed `@huggingface/inference` Parameters
1598
+
1599
+ The `chatCompletion()` method accepts many parameters that Noosphere's `ChatOptions` doesn't currently expose. These are available if you use the library directly:
1600
+
1601
+ | Parameter | Type | Description |
1602
+ |---|---|---|
1603
+ | `temperature` | `number` | Sampling temperature (0-2.0) — **exposed** via `ChatOptions.temperature` |
1604
+ | `max_tokens` | `number` | Max output tokens — **exposed** via `ChatOptions.maxTokens` |
1605
+ | `top_p` | `number` | Nucleus sampling threshold (0-1.0) — **not exposed** |
1606
+ | `top_k` | `number` | Top-K sampling — **not exposed** |
1607
+ | `frequency_penalty` | `number` | Penalize repeated tokens (-2.0 to 2.0) — **not exposed** |
1608
+ | `presence_penalty` | `number` | Penalize tokens already present (-2.0 to 2.0) — **not exposed** |
1609
+ | `repetition_penalty` | `number` | Alternative repetition penalty (>1.0 penalizes) — **not exposed** |
1610
+ | `stop` | `string[]` | Stop sequences — **not exposed** |
1611
+ | `seed` | `number` | Deterministic sampling seed — **not exposed** |
1612
+ | `tools` | `Tool[]` | Function/tool definitions — **not exposed** |
1613
+ | `tool_choice` | `string \| object` | Tool selection strategy — **not exposed** |
1614
+ | `tool_prompt` | `string` | System prompt for tool use — **not exposed** |
1615
+ | `response_format` | `object` | JSON schema constraints — **not exposed** |
1616
+ | `reasoning_effort` | `string` | Thinking depth level — **not exposed** |
1617
+ | `stream` | `boolean` | Enable streaming — **not exposed** (use `chatCompletionStream()`) |
1618
+ | `provider` | `string` | Inference provider routing — **not exposed** |
1619
+ | `endpointUrl` | `string` | Custom endpoint URL — **not exposed** |
1620
+ | `n` | `number` | Number of completions — **not exposed** |
1621
+ | `logprobs` | `boolean` | Return log probabilities — **not exposed** |
1622
+ | `grammar` | `object` | BNF grammar constraints — **not exposed** |
1623
+
1624
+ **Image generation unexposed parameters:**
1625
+ | Parameter | Type | Description |
1626
+ |---|---|---|
1627
+ | `negative_prompt` | `string` | **Exposed** via `ImageOptions.negativePrompt` |
1628
+ | `width` / `height` | `number` | **Exposed** via `ImageOptions.width/height` |
1629
+ | `guidance_scale` | `number` | **Exposed** via `ImageOptions.guidanceScale` |
1630
+ | `num_inference_steps` | `number` | **Exposed** via `ImageOptions.steps` |
1631
+ | `scheduler` | `string` | Diffusion scheduler type — **not exposed** |
1632
+ | `target_size` | `object` | Target resize dimensions — **not exposed** |
1633
+ | `clip_skip` | `number` | CLIP skip layers — **not exposed** |
1634
+
1635
+ #### HuggingFace Error Behavior
1636
+
1637
+ Unlike other providers, HuggingFaceProvider does **not** catch errors from the `@huggingface/inference` library. All errors propagate directly up to Noosphere's `executeWithRetry()`:
1638
+
1639
+ ```
1640
+ HfInference throws → HuggingFaceProvider propagates →
1641
+ executeWithRetry catches → Noosphere wraps as NoosphereError
1642
+ ```
1643
+
1644
+ Common error scenarios:
1645
+ - **401 Unauthorized** — Invalid or expired token → becomes `AUTH_FAILED`
1646
+ - **404 Model Not Found** — Model ID doesn't exist on HF Hub → becomes `MODEL_NOT_FOUND`
1647
+ - **429 Rate Limited** — Free tier limit exceeded → becomes `RATE_LIMITED` (retryable)
1648
+ - **503 Model Loading** — Model is cold-starting on HF Inference → becomes `PROVIDER_UNAVAILABLE` (retryable)
1649
+
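The scenarios above can be sketched as a plain status-code switch (the error-code names come from the list above; the function itself is a hypothetical illustration, not Noosphere's actual source):

```typescript
// Maps an HTTP status from the HuggingFace Inference API to the
// NoosphereError code it becomes after executeWithRetry wraps it.
function mapHfError(status: number): string {
  switch (status) {
    case 401: return 'AUTH_FAILED';          // invalid or expired token
    case 404: return 'MODEL_NOT_FOUND';      // model ID not on HF Hub
    case 429: return 'RATE_LIMITED';         // free tier exceeded, retryable
    case 503: return 'PROVIDER_UNAVAILABLE'; // model cold-starting, retryable
    default:  return 'PROVIDER_ERROR';       // assumed catch-all for this sketch
  }
}
```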
1215
1650
  ---
1216
1651
 
1217
1652
  ### ComfyUI — Local Image Generation
@@ -1220,17 +1655,237 @@ The `@huggingface/inference` library (v3.15.0) provides 30+ AI tasks, including
1220
1655
  **Modalities:** Image, Video (planned)
1221
1656
  **Type:** Local
1222
1657
  **Default Port:** 8188
1658
+ **Source:** `src/providers/comfyui.ts` (155 lines)
1659
+
1660
+ Connects to a local ComfyUI instance for Stable Diffusion workflows. ComfyUI is a node-based UI for Stable Diffusion that exposes an HTTP API. Noosphere communicates with it via raw HTTP — no ComfyUI SDK needed.
1223
1661
 
1224
- Connects to a local ComfyUI instance for Stable Diffusion workflows.
1662
+ #### How It Works — The Complete Lifecycle
1225
1663
 
1226
- #### How It Works
1664
+ ```
1665
+ User calls ai.image() →
1666
+ 1. structuredClone(DEFAULT_TXT2IMG_WORKFLOW) // Deep-clone the template
1667
+ 2. Inject parameters into workflow nodes // Mutate the clone
1668
+ 3. POST /prompt { prompt: workflow } // Queue the workflow
1669
+ 4. Receive { prompt_id: "abc-123" } // Get tracking ID
1670
+ 5. POLL GET /history/abc-123 every 1000ms // Check completion
1671
+ 6. Parse outputs → find SaveImage node // Locate generated image
1672
+ 7. GET /view?filename=X&subfolder=Y&type=Z // Fetch image binary
1673
+ 8. Return Buffer // PNG buffer to caller
1674
+ ```
1227
1675
 
1228
- 1. Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
1229
- 2. Injects your parameters (prompt, dimensions, seed, steps, guidance)
1230
- 3. POSTs the workflow to ComfyUI's `/prompt` endpoint
1231
- 4. Polls `/history/{promptId}` every second until completion (max 5 minutes)
1232
- 5. Fetches the generated image from `/view`
1233
- 6. Returns a PNG buffer
1676
+ #### The Complete Workflow JSON — All 7 Nodes
1677
+
1678
+ The `DEFAULT_TXT2IMG_WORKFLOW` constant defines a complete SDXL text-to-image pipeline as a ComfyUI node graph. Each key is a **node ID** (string), and each value defines the node type and its connections:
1679
+
1680
+ ```typescript
1681
+ // Node "3": KSampler — The core diffusion sampling node
1682
+ '3': {
1683
+ class_type: 'KSampler',
1684
+ inputs: {
1685
+ seed: 0, // Random seed (overridden by options.seed)
1686
+ steps: 20, // Denoising steps (overridden by options.steps)
1687
+ cfg: 7, // CFG/guidance scale (overridden by options.guidanceScale)
1688
+ sampler_name: 'euler', // Sampling algorithm
1689
+ scheduler: 'normal', // Noise schedule
1690
+ denoise: 1, // Full denoise (1.0 = txt2img, <1.0 = img2img)
1691
+ model: ['4', 0], // ← Connection: output 0 of node "4" (checkpoint model)
1692
+ positive: ['6', 0], // ← Connection: output 0 of node "6" (positive prompt)
1693
+ negative: ['7', 0], // ← Connection: output 0 of node "7" (negative prompt)
1694
+ latent_image: ['5', 0], // ← Connection: output 0 of node "5" (empty latent)
1695
+ },
1696
+ }
1697
+
1698
+ // Node "4": CheckpointLoaderSimple — Loads the SDXL model from disk
1699
+ '4': {
1700
+ class_type: 'CheckpointLoaderSimple',
1701
+ inputs: {
1702
+ ckpt_name: 'sd_xl_base_1.0.safetensors', // Checkpoint file on disk
1703
+ // Outputs: [0]=MODEL, [1]=CLIP, [2]=VAE
1704
+ // MODEL → KSampler.model
1705
+ // CLIP → CLIPTextEncode nodes
1706
+ // VAE → VAEDecode
1707
+ },
1708
+ }
1709
+
1710
+ // Node "5": EmptyLatentImage — Creates the initial noise tensor
1711
+ '5': {
1712
+ class_type: 'EmptyLatentImage',
1713
+ inputs: {
1714
+ width: 1024, // Overridden by options.width
1715
+ height: 1024, // Overridden by options.height
1716
+ batch_size: 1, // Always 1 image per generation
1717
+ },
1718
+ }
1719
+
1720
+ // Node "6": CLIPTextEncode — Positive prompt encoding
1721
+ '6': {
1722
+ class_type: 'CLIPTextEncode',
1723
+ inputs: {
1724
+ text: '', // Overridden by options.prompt
1725
+ clip: ['4', 1], // ← Connection: output 1 of node "4" (CLIP model)
1726
+ },
1727
+ }
1728
+
1729
+ // Node "7": CLIPTextEncode — Negative prompt encoding
1730
+ '7': {
1731
+ class_type: 'CLIPTextEncode',
1732
+ inputs: {
1733
+ text: '', // Overridden by options.negativePrompt ?? ''
1734
+ clip: ['4', 1], // ← Same CLIP model as positive prompt
1735
+ },
1736
+ }
1737
+
1738
+ // Node "8": VAEDecode — Converts latent space to pixel space
1739
+ '8': {
1740
+ class_type: 'VAEDecode',
1741
+ inputs: {
1742
+ samples: ['3', 0], // ← Connection: output 0 of node "3" (sampled latents)
1743
+ vae: ['4', 2], // ← Connection: output 2 of node "4" (VAE decoder)
1744
+ },
1745
+ }
1746
+
1747
+ // Node "9": SaveImage — Saves the final image
1748
+ '9': {
1749
+ class_type: 'SaveImage',
1750
+ inputs: {
1751
+ filename_prefix: 'noosphere', // Files saved as noosphere_00001.png, etc.
1752
+ images: ['8', 0], // ← Connection: output 0 of node "8" (decoded image)
1753
+ },
1754
+ }
1755
+ ```
1756
+
1757
+ **Node connection format:** `['nodeId', outputIndex]` — this is ComfyUI's internal linking system. For example, `['4', 1]` means "output slot 1 of node 4", which is the CLIP model from CheckpointLoaderSimple.
1758
+
1759
+ **Visual pipeline flow:**
1760
+ ```
1761
+ CheckpointLoader["4"] ──MODEL──→ KSampler["3"]
1762
+ ├──CLIP──→ CLIPTextEncode["6"] (positive) ──→ KSampler["3"]
1763
+ ├──CLIP──→ CLIPTextEncode["7"] (negative) ──→ KSampler["3"]
1764
+ └──VAE───→ VAEDecode["8"]
1765
+ EmptyLatentImage["5"] ──→ KSampler["3"] ──→ VAEDecode["8"] ──→ SaveImage["9"]
1766
+ ```
1767
+
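Because every `['nodeId', outputIndex]` pair must point at a node that exists in the graph, a workflow can be sanity-checked before it is POSTed to `/prompt`. A small sketch of such a check (a hypothetical helper, not part of Noosphere):

```typescript
interface WorkflowNode {
  class_type: string;
  inputs: Record<string, unknown>;
}
type Workflow = Record<string, WorkflowNode>;

// Returns "nodeId.inputName" for every connection reference
// whose target node ID is missing from the workflow.
function danglingConnections(wf: Workflow): string[] {
  const bad: string[] = [];
  for (const [id, node] of Object.entries(wf)) {
    for (const [slot, val] of Object.entries(node.inputs)) {
      const isRef = Array.isArray(val) && val.length === 2 && typeof val[0] === 'string';
      if (isRef && !((val as [string, number])[0] in wf)) {
        bad.push(`${id}.${slot}`);
      }
    }
  }
  return bad;
}
```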
1768
+ #### Parameter Injection — How Options Map to Nodes
1769
+
1770
+ ```typescript
1771
+ // Deep-clone to avoid mutating the template:
1772
+ const workflow = structuredClone(DEFAULT_TXT2IMG_WORKFLOW);
1773
+
1774
+ // Direct node mutations:
1775
+ workflow['6'].inputs.text = options.prompt; // Positive prompt → Node 6
1776
+ workflow['7'].inputs.text = options.negativePrompt ?? ''; // Negative prompt → Node 7
1777
+ workflow['5'].inputs.width = options.width ?? 1024; // Width → Node 5
1778
+ workflow['5'].inputs.height = options.height ?? 1024; // Height → Node 5
1779
+
1780
+ // Conditional overrides (only if user provided them):
1781
+ if (options.seed !== undefined) workflow['3'].inputs.seed = options.seed;
1782
+ if (options.steps !== undefined) workflow['3'].inputs.steps = options.steps;
1783
+ if (options.guidanceScale !== undefined) workflow['3'].inputs.cfg = options.guidanceScale;
1784
+ // Note: sampler_name, scheduler, and denoise are NOT configurable via Noosphere.
1785
+ // They're hardcoded to euler/normal/1.0
1786
+ ```
+
+ #### Queue Submission — POST /prompt
+
+ ```typescript
+ const queueRes = await fetch(`${this.baseUrl}/prompt`, {
+   method: 'POST',
+   headers: { 'Content-Type': 'application/json' },
+   body: JSON.stringify({ prompt: workflow }),
+   // ComfyUI expects: { prompt: <workflow_object>, client_id?: string }
+ });
+
+ if (!queueRes.ok) throw new Error(`ComfyUI queue failed: ${queueRes.status}`);
+
+ const { prompt_id } = await queueRes.json();
+ // prompt_id is a UUID like "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
+ // Used to track this specific generation in the history API
+ ```
+
+ #### Polling Mechanism — Deadline-Based with 1s Intervals
+
+ ```typescript
+ private async pollForResult(promptId: string, maxWaitMs = 300000): Promise<ArrayBuffer> {
+   const deadline = Date.now() + maxWaitMs; // 300,000ms = 5 minutes
+
+   while (Date.now() < deadline) {
+     // Check history for our prompt
+     const res = await fetch(`${this.baseUrl}/history/${promptId}`);
+
+     if (!res.ok) {
+       await new Promise((r) => setTimeout(r, 1000)); // 1 second between polls
+       continue;
+     }
+
+     const history = await res.json();
+     // History format: { [promptId]: { outputs: { [nodeId]: { images: [...] } } } }
+
+     const entry = history[promptId];
+     if (!entry?.outputs) {
+       await new Promise((r) => setTimeout(r, 1000)); // Not ready yet
+       continue;
+     }
+
+     // Search ALL output nodes for images (not just node "9"):
+     for (const nodeOutput of Object.values(entry.outputs)) {
+       if (nodeOutput.images?.length > 0) {
+         const img = nodeOutput.images[0];
+         // Fetch the actual image binary:
+         const imgRes = await fetch(
+           `${this.baseUrl}/view?filename=${img.filename}&subfolder=${img.subfolder}&type=${img.type}`
+         );
+         return imgRes.arrayBuffer();
+       }
+     }
+
+     await new Promise((r) => setTimeout(r, 1000));
+   }
+
+   throw new Error(`ComfyUI generation timed out after ${maxWaitMs}ms`);
+ }
+ ```
+
+ **Key polling details:**
+ - **Interval:** Fixed 1000ms (not configurable)
+ - **Timeout:** 300,000ms = 5 minutes (hardcoded, not from `config.timeout.image`)
+ - **Deadline-based:** Uses `Date.now() < deadline` comparison, NOT a retry counter
+ - **Image fetch URL format:** `/view?filename=noosphere_00001_.png&subfolder=&type=output`
+ - **Returns:** Raw `ArrayBuffer` → converted to `Buffer` by the caller
+
+ #### Auto-Detection — How ComfyUI Gets Discovered
+
+ During `Noosphere.init()`, if `autoDetectLocal` is true:
+
+ ```typescript
+ // Ping the /system_stats endpoint with a 2-second timeout:
+ const pingUrl = async (url: string): Promise<boolean> => {
+   const controller = new AbortController();
+   const timer = setTimeout(() => controller.abort(), 2000); // 2s hard timeout
+   try {
+     const res = await fetch(url, { signal: controller.signal });
+     return res.ok;
+   } finally {
+     clearTimeout(timer);
+   }
+ };
+
+ // Check ComfyUI specifically:
+ if (comfyuiCfg?.enabled) {
+   const ok = await pingUrl(`${comfyuiCfg.host}:${comfyuiCfg.port}/system_stats`);
+   if (ok) {
+     this.registry.addProvider(new ComfyUIProvider({
+       host: comfyuiCfg.host, // Default: 'http://localhost'
+       port: comfyuiCfg.port, // Default: 8188
+     }));
+   }
+ }
+ ```
+
+ **Environment variable overrides:**
+ ```bash
+ COMFYUI_HOST=http://192.168.1.100   # Override host
+ COMFYUI_PORT=8190                   # Override port
+ ```
 
  #### Configuration
 
@@ -1238,30 +1893,82 @@ Connects to a local ComfyUI instance for Stable Diffusion workflows.
  const ai = new Noosphere({
    local: {
      comfyui: {
-       enabled: true,
-       host: 'http://localhost',
-       port: 8188,
+       enabled: true,            // Default: true (auto-detected)
+       host: 'http://localhost', // Default: 'http://localhost'
+       port: 8188,               // Default: 8188
      },
    },
  });
  ```
 
- #### Default Workflow
+ #### Model Discovery — Dynamic via /object_info
 
- - **Checkpoint:** `sd_xl_base_1.0.safetensors`
- - **Sampler:** euler with normal scheduler
- - **Default Steps:** 20
- - **Default CFG/Guidance:** 7
- - **Default Size:** 1024x1024
- - **Max Size:** 2048x2048
- - **Output:** PNG
+ ```typescript
+ async listModels(modality?: Modality): Promise<ModelInfo[]> {
+   // Fetches ComfyUI's full node registry:
+   const res = await fetch(`${this.baseUrl}/object_info`);
+   if (!res.ok) return [];
+
+   // Does NOT parse the response — just uses it as a connectivity check.
+   // Returns hardcoded model entries:
+   const models: ModelInfo[] = [];
+   if (!modality || modality === 'image') {
+     models.push({
+       id: 'comfyui-txt2img',
+       provider: 'comfyui',
+       name: 'ComfyUI Text-to-Image',
+       modality: 'image',
+       local: true,
+       cost: { price: 0, unit: 'free' },
+       capabilities: { maxWidth: 2048, maxHeight: 2048, supportsNegativePrompt: true },
+     });
+   }
+   if (!modality || modality === 'video') {
+     models.push({
+       id: 'comfyui-txt2vid',
+       provider: 'comfyui',
+       name: 'ComfyUI Text-to-Video',
+       modality: 'video',
+       local: true,
+       cost: { price: 0, unit: 'free' },
+       capabilities: { maxDuration: 10, supportsImageToVideo: true },
+     });
+   }
+   return models;
+ }
+ // NOTE: /object_info is fetched but the response is discarded.
+ // The actual model list is hardcoded. This means even if you have
+ // dozens of checkpoints in ComfyUI, Noosphere only exposes 2 model IDs.
+ ```
 
- #### Models Exposed
+ #### Video Generation — Not Yet Implemented
 
- | Model ID | Modality | Description |
- |---|---|---|
- | `comfyui-txt2img` | Image | Text-to-image via workflow |
- | `comfyui-txt2vid` | Video | Planned (requires AnimateDiff workflow) |
+ ```typescript
+ async video(_options: VideoOptions): Promise<NoosphereResult> {
+   throw new Error('ComfyUI video generation requires a configured AnimateDiff workflow');
+ }
+ // The 'comfyui-txt2vid' model ID is listed but will throw at runtime.
+ // This is a placeholder for future AnimateDiff/SVD workflow templates.
+ ```
+
+ #### Default Workflow Parameters Summary
+
+ | Parameter | Default | Configurable | Node |
+ |---|---|---|---|
+ | Checkpoint | `sd_xl_base_1.0.safetensors` | No | Node 4 |
+ | Sampler | `euler` | No | Node 3 |
+ | Scheduler | `normal` | No | Node 3 |
+ | Denoise | `1.0` | No | Node 3 |
+ | Steps | `20` | Yes (`options.steps`) | Node 3 |
+ | CFG/Guidance | `7` | Yes (`options.guidanceScale`) | Node 3 |
+ | Seed | `0` | Yes (`options.seed`) | Node 3 |
+ | Width | `1024` | Yes (`options.width`) | Node 5 |
+ | Height | `1024` | Yes (`options.height`) | Node 5 |
+ | Batch Size | `1` | No | Node 5 |
+ | Filename Prefix | `noosphere` | No | Node 9 |
+ | Negative Prompt | `''` (empty) | Yes (`options.negativePrompt`) | Node 7 |
+ | Max Size | `2048x2048` | Via options | Node 5 |
+ | Output Format | PNG | No | ComfyUI default |
 
  ---
 
@@ -1270,99 +1977,889 @@ const ai = new Noosphere({
  **Provider IDs:** `piper`, `kokoro`
  **Modality:** TTS
  **Type:** Local
+ **Source:** `src/providers/local-tts.ts` (112 lines)
 
- Connects to local OpenAI-compatible TTS servers.
+ The `LocalTTSProvider` is a generic adapter for any local TTS server that exposes an OpenAI-compatible `/v1/audio/speech` endpoint. Two instances are created by default — one for Piper, one for Kokoro — but the class works with ANY server implementing this protocol.
 
  #### Supported Engines
 
- | Engine | Default Port | Health Check | Voice Discovery |
- |---|---|---|---|
- | Piper | 5500 | `GET /health` | `GET /voices` |
- | Kokoro | 5501 | `GET /health` | `GET /v1/models` (fallback) |
+ | Engine | Default Port | Health Check | Voice Discovery | Description |
+ |---|---|---|---|---|
+ | Piper | 5500 | `GET /health` | `GET /voices` (array) | Fast offline TTS, 30+ languages, ONNX models |
+ | Kokoro | 5501 | `GET /health` | `GET /v1/models` (OpenAI format) | High-quality neural TTS |
 
- #### API
+ #### Provider Instantiation — How Instances Are Created
 
- Uses the OpenAI-compatible TTS endpoint:
+ ```typescript
+ // The LocalTTSProvider constructor takes a config object:
+ interface LocalTTSConfig {
+   id: string;   // Provider ID: 'piper' or 'kokoro'
+   name: string; // Display name: 'Piper TTS' or 'Kokoro TTS'
+   host: string; // Base URL host
+   port: number; // Port number
+ }
+
+ // Two separate instances are created during init():
+ new LocalTTSProvider({ id: 'piper', name: 'Piper TTS', host: piperCfg.host, port: piperCfg.port })
+ new LocalTTSProvider({ id: 'kokoro', name: 'Kokoro TTS', host: kokoroCfg.host, port: kokoroCfg.port })
+
+ // Each instance is an independent provider in the registry.
+ // They don't share state or config.
+ // The baseUrl is constructed as: `${config.host}:${config.port}`
+ // Example: "http://localhost:5500"
+ ```
+
+ #### Health Check — Ping Protocol
+
+ ```typescript
+ async ping(): Promise<boolean> {
+   try {
+     const res = await fetch(`${this.baseUrl}/health`);
+     return res.ok; // true if HTTP 200-299
+   } catch {
+     return false; // Network error, connection refused, etc.
+   }
+ }
+ // Used during auto-detection in Noosphere.init(), where the check
+ // runs under the 2-second AbortController timeout.
+ // Note: /health is checked BEFORE the provider is registered.
+ // If /health fails, the provider is silently skipped.
+ ```
+
+ #### Dual Voice Discovery Mechanism
+
+ The `listModels()` method implements a **two-strategy fallback** to discover available voices. This is necessary because different TTS servers expose voices through different API formats:
+
+ ```typescript
+ async listModels(modality?: Modality): Promise<ModelInfo[]> {
+   if (modality && modality !== 'tts') return [];
+
+   let voices: Array<{ id: string; name?: string }> = [];
+
+   // STRATEGY 1: Piper-style /voices endpoint
+   // Expected response: Array<{ id: string, name?: string, ... }>
+   try {
+     const res = await fetch(`${this.baseUrl}/voices`);
+     if (res.ok) {
+       const data = await res.json();
+       if (Array.isArray(data)) {
+         voices = data; // Success — skip fallback
+       }
+     }
+   } catch {
+     // STRATEGY 2: OpenAI-compatible /v1/models endpoint
+     // Expected response: { data: Array<{ id: string, ... }> }
+     const res = await fetch(`${this.baseUrl}/v1/models`);
+     if (res.ok) {
+       const data = await res.json();
+       voices = data.data ?? [];
+     }
+   }
+
+   // Map voices to ModelInfo objects:
+   return voices.map((v) => ({
+     id: v.id,
+     provider: this.id,    // 'piper' or 'kokoro'
+     name: v.name ?? v.id, // Fallback to ID if no name
+     modality: 'tts' as const,
+     local: true,
+     cost: { price: 0, unit: 'free' },
+     capabilities: {
+       voices: voices.map((vv) => vv.id), // All voice IDs as capabilities
+     },
+   }));
+ }
+ ```
 
+ **Critical implementation detail:** The fallback is triggered by a `catch` block, NOT by checking the response. This means:
+ - If `/voices` returns a **non-array** (e.g., `{}`), strategy 1 succeeds but `voices` remains empty
+ - If `/voices` returns HTTP **404**, strategy 1 "succeeds" (no exception), but `res.ok` is false, so `voices` stays empty, AND strategy 2 is never tried
+ - Strategy 2 only runs if `/voices` **throws a network error** (connection refused, DNS failure, etc.)
+
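Because `fetch` resolves (rather than rejects) on HTTP error statuses, the 404 edge case can be demonstrated with a stubbed transport. `discoverVoices` below is a simplified stand-in for the real `listModels()`, with the fetch implementation injected so no server is required:

```typescript
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Same two-strategy shape as listModels(): /voices first, /v1/models only on throw.
async function discoverVoices(fetchImpl: FetchLike): Promise<string[]> {
  let voices: Array<{ id: string }> = [];
  try {
    const res = await fetchImpl('/voices'); // Strategy 1
    if (res.ok) {
      const data = await res.json();
      if (Array.isArray(data)) voices = data;
    }
  } catch {
    const res = await fetchImpl('/v1/models'); // Strategy 2: reached only on a thrown error
    if (res.ok) {
      voices = ((await res.json()) as { data?: Array<{ id: string }> }).data ?? [];
    }
  }
  return voices.map((v) => v.id);
}

// HTTP 404: fetch RESOLVES with ok=false, so no exception and strategy 2 never runs.
const on404: FetchLike = async () => ({ ok: false, json: async () => ({}) });

// Connection refused: fetch REJECTS, so the catch block falls through to /v1/models.
const onRefused: FetchLike = async (url) => {
  if (url === '/voices') throw new Error('ECONNREFUSED');
  return { ok: true, json: async () => ({ data: [{ id: 'kokoro-v1' }] }) };
};

(async () => {
  console.log(await discoverVoices(on404));     // [] (empty: fallback skipped)
  console.log(await discoverVoices(onRefused)); // [ 'kokoro-v1' ]
})();
```

The stub makes the caveat concrete: only a transport-level failure, never an HTTP error, activates the second strategy.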
+ **Piper response format** (`GET /voices`):
+ ```json
+ [
+   { "id": "en_US-lessac-medium", "name": "Lessac (English US)" },
+   { "id": "en_US-amy-medium", "name": "Amy (English US)" },
+   { "id": "de_DE-thorsten-high", "name": "Thorsten (German)" }
+ ]
  ```
- POST /v1/audio/speech
+
+ **Kokoro/OpenAI response format** (`GET /v1/models`):
+ ```json
  {
-   "model": "tts-1",
-   "input": "Hello world",
-   "voice": "default",
-   "speed": 1.0,
-   "response_format": "mp3"
+   "data": [
+     { "id": "kokoro-v1", "object": "model" },
+     { "id": "kokoro-v1-jp", "object": "model" }
+   ]
  }
  ```
 
- Supports `mp3`, `wav`, and `ogg` formats. Returns audio as a Buffer.
+ #### Speech Generation — Exact HTTP Protocol
+
+ ```typescript
+ async speak(options: SpeakOptions): Promise<NoosphereResult> {
+   const start = Date.now();
+
+   // POST to OpenAI-compatible TTS endpoint:
+   const res = await fetch(`${this.baseUrl}/v1/audio/speech`, {
+     method: 'POST',
+     headers: { 'Content-Type': 'application/json' },
+     body: JSON.stringify({
+       model: options.model ?? 'tts-1',          // Default model ID
+       input: options.text,                      // Text to synthesize
+       voice: options.voice ?? 'default',        // Voice selection
+       speed: options.speed ?? 1.0,              // Playback speed multiplier
+       response_format: options.format ?? 'mp3', // Output audio format
+     }),
+   });
+
+   if (!res.ok) {
+     throw new Error(`Local TTS failed: ${res.status} ${await res.text()}`);
+     // Note: the error includes the response body text for debugging
+   }
+
+   // Response is raw audio binary — convert to Buffer:
+   const audioBuffer = Buffer.from(await res.arrayBuffer());
+
+   return {
+     buffer: audioBuffer,
+     provider: this.id, // 'piper' or 'kokoro'
+     model: options.model ?? options.voice ?? 'default', // Fallback chain
+     modality: 'tts',
+     latencyMs: Date.now() - start,
+     usage: {
+       cost: 0,                    // Always free (local)
+       input: options.text.length, // CHARACTER count, not tokens
+       unit: 'characters',         // Track by characters
+     },
+     media: {
+       format: options.format ?? 'mp3', // Matches requested format
+     },
+   };
+ }
+ ```
+
+ **Request/Response details:**
+ | Field | Value | Notes |
+ |---|---|---|
+ | Method | `POST` | Always POST |
+ | URL | `/v1/audio/speech` | OpenAI-compatible standard |
+ | Content-Type | `application/json` | JSON body |
+ | Response Content-Type | `audio/mpeg`, `audio/wav`, or `audio/ogg` | Depends on `response_format` |
+ | Response Body | Raw binary audio | Converted to `Buffer` via `arrayBuffer()` |
+
+ **Available formats (from `SpeakOptions.format` type):**
+ | Format | Typical Size | Quality | Use Case |
+ |---|---|---|---|
+ | `mp3` | Smallest | Lossy | Web playback, storage |
+ | `wav` | Largest | Lossless | Processing, editing |
+ | `ogg` | Medium | Lossy | Web playback, open format |
+
+ #### Usage Tracking — Character-Based
+
+ Local TTS tracks usage by **character count**, not tokens:
+
+ ```typescript
+ usage: {
+   cost: 0,                    // Always 0 for local providers
+   input: options.text.length, // JavaScript string .length (UTF-16 code units)
+   unit: 'characters',         // Unit identifier for aggregation
+ }
+ // Note: .length counts UTF-16 code units, not Unicode codepoints.
+ // "Hello" = 5, "🎵" = 2 (surrogate pair), "café" = 4
+ ```
+
+ This feeds into the global `UsageTracker`, so you can query TTS usage:
+ ```typescript
+ const usage = ai.getUsage({ modality: 'tts' });
+ // usage.totalRequests = number of TTS calls
+ // usage.totalCost = 0 (always free for local)
+ // usage.byProvider = { piper: 0, kokoro: 0 }
+ ```
+
+ #### Auto-Detection — Parallel Discovery
+
+ Both Piper and Kokoro are detected simultaneously during `init()`:
+
+ ```typescript
+ // Inside Noosphere.init(), wrapped in Promise.allSettled():
+ await Promise.allSettled([
+   // ... ComfyUI detection ...
+   (async () => {
+     if (piperCfg?.enabled) { // enabled: true by default
+       const ok = await pingUrl(`${piperCfg.host}:${piperCfg.port}/health`);
+       if (ok) {
+         this.registry.addProvider(new LocalTTSProvider({
+           id: 'piper', name: 'Piper TTS',
+           host: piperCfg.host, port: piperCfg.port,
+         }));
+       }
+     }
+   })(),
+   (async () => {
+     if (kokoroCfg?.enabled) { // enabled: true by default
+       const ok = await pingUrl(`${kokoroCfg.host}:${kokoroCfg.port}/health`);
+       if (ok) {
+         this.registry.addProvider(new LocalTTSProvider({
+           id: 'kokoro', name: 'Kokoro TTS',
+           host: kokoroCfg.host, port: kokoroCfg.port,
+         }));
+       }
+     }
+   })(),
+ ]);
+ ```
+
+ **Environment variable overrides:**
+ ```bash
+ PIPER_HOST=http://192.168.1.100  PIPER_PORT=5500
+ KOKORO_HOST=http://192.168.1.100 KOKORO_PORT=5501
+ ```
+
+ #### Setting Up Local TTS Servers
+
+ **Piper TTS:**
+ ```bash
+ # Docker (recommended):
+ docker run -p 5500:5500 rhasspy/wyoming-piper \
+   --voice en_US-lessac-medium
+
+ # Or via pip:
+ pip install piper-tts
+ # Then run a compatible HTTP server (wyoming-piper or piper-http-server)
+ ```
+
+ **Kokoro TTS:**
+ ```bash
+ # Docker:
+ docker run -p 5501:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest
+
+ # The Kokoro server exposes OpenAI-compatible endpoints at:
+ #   GET  /v1/models        → List available voices
+ #   POST /v1/audio/speech  → Generate speech
+ #   GET  /health           → Health check
+ ```
 
  ---
 
  ## Architecture
 
- ### Provider Resolution (Local-First)
+ ### The Complete Init() Flow — What Happens When You Create a Noosphere Instance
 
- When you call a generation method without specifying a provider, Noosphere resolves one automatically:
+ ```typescript
+ const ai = new Noosphere({ /* config */ });
+ // At this point: config is resolved, but NO providers are registered.
+ // The `initialized` flag is false.
 
- 1. If `model` is specified without `provider` → looks up model in registry cache
- 2. If a `default` is configured for the modality → uses that
- 3. Otherwise → **local providers first**, then cloud providers
+ await ai.chat({ messages: [...] });
+ // FIRST call triggers lazy initialization via init()
+ ```
+
+ **Initialization sequence (`src/noosphere.ts:240-322`):**
 
  ```
- resolveProvider(modality):
-   1. Check user-specified provider ID → return if found
-   2. Check configured defaults → return if found
-   3. Scan all providers:
-      → Return first LOCAL provider supporting this modality
-      → Fallback to first CLOUD provider
-   4. Throw NO_PROVIDER error
+ 1. Constructor:
+    ├── resolveConfig(input)          // Merge config > env > defaults
+    ├── new Registry(cacheTTLMinutes) // Empty provider registry
+    └── new UsageTracker(onUsage)     // Empty event list
+
+ 2. First API call triggers init():
+    ├── Set initialized = true (immediately, before any async work)
+
+    ├── CLOUD PROVIDER REGISTRATION (synchronous):
+    │   ├── Collect all API keys from resolved config
+    │   ├── If ANY LLM key exists → register PiAiProvider(allKeys)
+    │   ├── If FAL key exists → register FalProvider(falKey)
+    │   └── If HF token exists → register HuggingFaceProvider(token)
+
+    └── LOCAL SERVICE DETECTION (parallel, async):
+        └── Promise.allSettled([
+              pingUrl(comfyui /system_stats) → register ComfyUIProvider
+              pingUrl(piper /health)         → register LocalTTSProvider('piper')
+              pingUrl(kokoro /health)        → register LocalTTSProvider('kokoro')
+            ])
  ```
 
- ### Retry & Failover Logic
+ **Key design decisions:**
+ - `initialized = true` is set **before** async work, preventing concurrent init() calls
+ - Cloud providers are registered **synchronously** (no network calls needed)
+ - Local detection uses `Promise.allSettled()` — a failing ping doesn't block others
+ - Each ping has a 2-second `AbortController` timeout
+ - If auto-detection is disabled (`autoDetectLocal: false`), local providers are never registered
+
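The flag-before-await guard can be sketched in isolation (class and method names below are illustrative, not Noosphere's actual internals):

```typescript
// Minimal sketch of the lazy-init guard: because the flag flips
// synchronously before the first await, concurrent first calls do not
// each run init().
class LazyService {
  private initialized = false;
  public initCount = 0;

  private async init(): Promise<void> {
    this.initialized = true; // set BEFORE any async work
    this.initCount++;
    await new Promise((r) => setTimeout(r, 50)); // simulated local-service detection
  }

  async request(): Promise<void> {
    if (!this.initialized) await this.init(); // lazy: the first call pays the cost
  }
}

(async () => {
  const svc = new LazyService();
  await Promise.all([svc.request(), svc.request(), svc.request()]);
  console.log(svc.initCount); // 1
})();
```

The trade-off of this pattern is that later concurrent callers proceed without waiting for the async phase to finish, which is acceptable here because detection failures simply leave providers unregistered.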
+ ### Configuration Resolution — Three-Layer Priority System
+
+ The `resolveConfig()` function (`src/config.ts`, 87 lines) implements a strict priority hierarchy:
 
  ```
- executeWithRetry(modality, provider, fn):
-   for attempt = 0..maxRetries:
-     try: return fn()
-     catch:
-       if error is retryable AND attempts remain:
-         wait backoffMs * 2^attempt (exponential backoff)
-         retry same provider
-       if error is NOT GENERATION_FAILED AND failover enabled:
-         try each alternative provider for this modality
-   throw last error
+ Priority: Explicit Config > Environment Variables > Built-in Defaults
  ```
 
- **Retryable errors (same provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT`, `GENERATION_FAILED`
+ **API Key Resolution:**
+ ```typescript
+ // For each of the 9 supported providers:
+ const ENV_KEY_MAP = {
+   openai: 'OPENAI_API_KEY',
+   anthropic: 'ANTHROPIC_API_KEY',
+   google: 'GEMINI_API_KEY',
+   fal: 'FAL_KEY',
+   openrouter: 'OPENROUTER_API_KEY',
+   huggingface: 'HUGGINGFACE_TOKEN',
+   groq: 'GROQ_API_KEY',
+   mistral: 'MISTRAL_API_KEY',
+   xai: 'XAI_API_KEY',
+ };
 
- **Failover-eligible errors (cross-provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT` (NOT `GENERATION_FAILED`)
+ // Resolution per key:
+ keys[name] = input.keys?.[name]  // 1. Explicit config
+   ?? process.env[envVar];        // 2. Environment variable
+                                  // 3. undefined (no default)
+ ```
 
- ### Model Registry & Caching
+ **Local Service Resolution:**
+ ```typescript
+ // For each of the 4 local services:
+ const LOCAL_DEFAULTS = {
+   ollama:  { host: 'http://localhost', port: 11434, envHost: 'OLLAMA_HOST',  envPort: 'OLLAMA_PORT' },
+   comfyui: { host: 'http://localhost', port: 8188,  envHost: 'COMFYUI_HOST', envPort: 'COMFYUI_PORT' },
+   piper:   { host: 'http://localhost', port: 5500,  envHost: 'PIPER_HOST',   envPort: 'PIPER_PORT' },
+   kokoro:  { host: 'http://localhost', port: 5501,  envHost: 'KOKORO_HOST',  envPort: 'KOKORO_PORT' },
+ };
 
- - Models are fetched from providers via `listModels()` and cached in memory
- - Cache TTL is configurable (default: 60 minutes)
- - `syncModels()` forces a refresh of all provider model lists
- - Registry tracks model → provider mappings for fast resolution
+ // Resolution per service:
+ local[name] = {
+   enabled: cfgLocal?.enabled ?? true, // Default: enabled
+   host: cfgLocal?.host ?? process.env[envHost] ?? defaults.host,
+   port: cfgLocal?.port ?? parseInt(process.env[envPort]) ?? defaults.port,
+   type: cfgLocal?.type,
+ };
+ ```
 
- ### Usage Tracking
+ **Other config defaults:**
+ | Setting | Default | Environment Override |
+ |---|---|---|
+ | `autoDetectLocal` | `true` | `NOOSPHERE_AUTO_DETECT_LOCAL` |
+ | `discoveryCacheTTL` | `60` (minutes) | `NOOSPHERE_DISCOVERY_CACHE_TTL` |
+ | `retry.maxRetries` | `2` | — |
+ | `retry.backoffMs` | `1000` | — |
+ | `retry.failover` | `true` | — |
+ | `retry.retryableErrors` | `['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT']` | — |
+ | `timeout.llm` | `30000` (30s) | — |
+ | `timeout.image` | `120000` (2m) | — |
+ | `timeout.video` | `300000` (5m) | — |
+ | `timeout.tts` | `60000` (1m) | — |
+
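The same three-layer chain can be exercised for a single numeric setting. `resolveNumber` is an illustrative helper (not the package's API), shown here with the `discoveryCacheTTL` defaults from the table:

```typescript
// Illustrative priority chain: explicit config > environment variable > built-in default.
function resolveNumber(explicit: number | undefined, envVar: string, fallback: number): number {
  if (explicit !== undefined) return explicit;  // 1. Explicit config wins
  const raw = process.env[envVar];
  if (raw !== undefined) {
    const parsed = Number.parseInt(raw, 10);
    if (!Number.isNaN(parsed)) return parsed;   // 2. Environment variable
  }
  return fallback;                              // 3. Built-in default
}

process.env.NOOSPHERE_DISCOVERY_CACHE_TTL = '15';
console.log(resolveNumber(undefined, 'NOOSPHERE_DISCOVERY_CACHE_TTL', 60)); // 15 (env)
console.log(resolveNumber(30, 'NOOSPHERE_DISCOVERY_CACHE_TTL', 60));        // 30 (explicit)
delete process.env.NOOSPHERE_DISCOVERY_CACHE_TTL;
console.log(resolveNumber(undefined, 'NOOSPHERE_DISCOVERY_CACHE_TTL', 60)); // 60 (default)
```

Guarding against `NaN` from `parseInt` is what makes the environment layer fall through cleanly when the variable is unset or malformed.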
+ ### Provider Resolution — Local-First Algorithm
+
+ When you call a generation method without specifying a provider, Noosphere resolves one automatically through a three-stage process in `resolveProviderForModality()` (`src/noosphere.ts:324-348`):
+
+ ```typescript
+ private resolveProviderForModality(
+   modality: Modality,
+   preferredId?: string,
+   modelId?: string,
+ ): NoosphereProvider {
+
+   // STAGE 1: Model-based resolution
+   // If model was specified WITHOUT a provider, search the registry cache
+   if (modelId && !preferredId) {
+     const resolved = this.registry.resolveModel(modelId, modality);
+     if (resolved) return resolved.provider;
+     // resolveModel() scans ALL cached models across ALL providers
+     // looking for exact match on both modelId AND modality
+   }
+
+   // STAGE 2: Default-based resolution
+   // Check if user configured a default for this modality
+   if (!preferredId) {
+     const defaultCfg = this.config.defaults[modality];
+     if (defaultCfg) {
+       preferredId = defaultCfg.provider;
+       // Now fall through to Stage 3 with this preferredId
+     }
+   }
+
+   // STAGE 3: Provider registry resolution
+   const provider = this.registry.resolveProvider(modality, preferredId);
+   if (!provider) {
+     throw new NoosphereError(
+       `No provider available for modality '${modality}'`,
+       { code: 'NO_PROVIDER', ... }
+     );
+   }
+   return provider;
+ }
+ ```
+
+ **Registry.resolveProvider() — The local-first algorithm** (`src/registry.ts:31-46`):
+
+ ```typescript
+ resolveProvider(modality: Modality, preferredId?: string): NoosphereProvider | null {
+   // If a specific provider was requested:
+   if (preferredId) {
+     const p = this.providers.get(preferredId);
+     if (p && p.modalities.includes(modality)) return p;
+     return null; // NOT found — returns null, NOT a fallback
+   }
+
+   // No preference — scan with local-first priority:
+   let bestCloud: NoosphereProvider | null = null;
+
+   for (const p of this.providers.values()) {
+     if (!p.modalities.includes(modality)) continue;
+
+     // LOCAL provider found → return IMMEDIATELY (first match wins)
+     if (p.isLocal) return p;
+
+     // CLOUD provider → save as fallback (first cloud match only)
+     if (!bestCloud) bestCloud = p;
+   }
+
+   return bestCloud; // Return first cloud provider, or null
+ }
+ ```
+
+ **Resolution priority diagram:**
+ ```
+ ai.chat({ model: 'gpt-4o' })
+
+ ├─ Stage 1: Search modelCache for 'gpt-4o' with modality 'llm'
+ │    └── Found in pi-ai cache → return PiAiProvider
+
+ ├─ Stage 2: (skipped — model resolved in Stage 1)
+
+ └─ Stage 3: (skipped — already resolved)
+
+ ai.image({ prompt: 'sunset' })
+
+ ├─ Stage 1: (no model specified, skipped)
+
+ ├─ Stage 2: Check config.defaults.image → none configured
+
+ └─ Stage 3: resolveProvider('image', undefined)
+      ├── Scan providers:
+      │    ├── pi-ai: modalities=['llm'] → skip (no 'image')
+      │    ├── comfyui: modalities=['image','video'], isLocal=true → RETURN
+      │    └── (fal never reached — local wins)
+      └── Returns ComfyUIProvider (local-first)
+
+ ai.image({ prompt: 'sunset' })   // No local ComfyUI running
+
+ └─ Stage 3: resolveProvider('image', undefined)
+      ├── Scan providers:
+      │    ├── pi-ai: no 'image' → skip
+      │    ├── fal: modalities=['image','video','tts'], isLocal=false → save as bestCloud
+      │    └── huggingface: modalities=['image','tts','llm'], isLocal=false → already have bestCloud
+      └── Returns FalProvider (first cloud fallback)
+ ```
+
+ ### Retry & Failover Logic — Complete Algorithm
+
+ The `executeWithRetry()` method (`src/noosphere.ts:350-397`) implements a two-phase error handling strategy: same-provider retries, then cross-provider failover.
+
+ ```typescript
+ private async executeWithRetry<T>(
+   modality: Modality,
+   provider: NoosphereProvider,
+   fn: () => Promise<T>,
+   failoverFnFactory?: (alt: NoosphereProvider) => (() => Promise<T>) | null,
+ ): Promise<T> {
+   const { maxRetries, backoffMs, retryableErrors, failover } = this.config.retry;
+   // Default: maxRetries=2, backoffMs=1000, failover=true
+   // retryableErrors = ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT']
+   let lastError: Error | undefined;
+
+   for (let attempt = 0; attempt <= maxRetries; attempt++) {
+     try {
+       return await fn(); // Try the primary provider
+     } catch (err) {
+       lastError = err instanceof Error ? err : new Error(String(err));
+
+       const isNoosphereErr = err instanceof NoosphereError;
+       const code = isNoosphereErr ? err.code : 'GENERATION_FAILED';
+
+       // GENERATION_FAILED is special:
+       // - Retryable on same provider (bad prompt, transient model issue)
+       // - NOT eligible for cross-provider failover
+       const isRetryable = retryableErrors.includes(code) || code === 'GENERATION_FAILED';
+       const allowsFailover = code !== 'GENERATION_FAILED' && retryableErrors.includes(code);
+
+       if (!isRetryable || attempt === maxRetries) {
+         // FAILOVER PHASE: Try other providers
+         if (failover && allowsFailover && failoverFnFactory) {
+           const altProviders = this.registry.getAllProviders()
+             .filter((p) => p.id !== provider.id && p.modalities.includes(modality));
+
+           for (const alt of altProviders) {
+             try {
+               const altFn = failoverFnFactory(alt);
+               if (altFn) return await altFn(); // Success on alternate provider
+             } catch {
+               // Continue to next alternate provider
+             }
+           }
+         }
+         break; // All retries and failovers exhausted
+       }
+
+       // RETRY: Exponential backoff on same provider
+       const delay = backoffMs * Math.pow(2, attempt);
+       // attempt=0: 1000ms, attempt=1: 2000ms, attempt=2: 4000ms
+       await new Promise((resolve) => setTimeout(resolve, delay));
+     }
+   }
+
+   throw lastError ?? new NoosphereError('Generation failed', { ... });
+ }
+ ```
+
+ **Failover function factory pattern:**
+
+ Each generation method passes a factory function that creates the right call for alternate providers:
+ ```typescript
+ // In chat():
+ (alt) => alt.chat ? () => alt.chat!(options) : null
+ // If the alternate provider has chat(), create a function to call it.
+ // If not (e.g., ComfyUI for LLM), return null → skip this provider.
+
+ // In image():
+ (alt) => alt.image ? () => alt.image!(options) : null
+
+ // In video():
+ (alt) => alt.video ? () => alt.video!(options) : null
+
+ // In speak():
+ (alt) => alt.speak ? () => alt.speak!(options) : null
+ ```
+
+ **Complete retry timeline example:**
+ ```
+ ai.chat() with provider="pi-ai", maxRetries=2, backoffMs=1000
+
+ Attempt 0: pi-ai.chat() → RATE_LIMITED
+            wait 1000ms (1000 * 2^0)
+ Attempt 1: pi-ai.chat() → RATE_LIMITED
+            wait 2000ms (1000 * 2^1)
+ Attempt 2: pi-ai.chat() → RATE_LIMITED
+            // maxRetries exhausted, RATE_LIMITED allows failover
+ Failover 1: huggingface.chat() → 503 SERVICE_UNAVAILABLE
+ Failover 2: (no more providers with 'llm' modality)
+ throw last error (RATE_LIMITED from pi-ai)
+ ```
+
2552
+ **Error classification matrix:**
2553
+
2554
+ | Error Code | Same-Provider Retry | Cross-Provider Failover | Typical Cause |
2555
+ |---|---|---|---|
2556
+ | `PROVIDER_UNAVAILABLE` | Yes | Yes | Server down, network error |
2557
+ | `RATE_LIMITED` | Yes | Yes | API quota exceeded |
2558
+ | `TIMEOUT` | Yes | Yes | Slow response |
2559
+ | `GENERATION_FAILED` | Yes | **No** | Bad prompt, model error |
2560
+ | `AUTH_FAILED` | No | No | Wrong API key |
2561
+ | `MODEL_NOT_FOUND` | No | No | Invalid model ID |
2562
+ | `INVALID_INPUT` | No | No | Bad parameters |
2563
+ | `NO_PROVIDER` | No | No | No provider registered |
2564
+
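The matrix can be expressed as a small predicate. The `classify` helper below is hypothetical, not part of the Noosphere API; it only restates the documented rules:

```typescript
// Hypothetical helper mirroring the error classification matrix above.
type ErrorCode =
  | 'PROVIDER_UNAVAILABLE' | 'RATE_LIMITED' | 'TIMEOUT' | 'GENERATION_FAILED'
  | 'AUTH_FAILED' | 'MODEL_NOT_FOUND' | 'INVALID_INPUT' | 'NO_PROVIDER';

const retryableErrors: ErrorCode[] = ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'];

function classify(code: ErrorCode): { retry: boolean; failover: boolean } {
  return {
    // GENERATION_FAILED retries on the same provider but never fails over:
    retry: retryableErrors.includes(code) || code === 'GENERATION_FAILED',
    failover: code !== 'GENERATION_FAILED' && retryableErrors.includes(code),
  };
}

// classify('RATE_LIMITED')      → { retry: true,  failover: true }
// classify('GENERATION_FAILED') → { retry: true,  failover: false }
// classify('AUTH_FAILED')       → { retry: false, failover: false }
```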
+ ### Model Registry — Internal Data Structures
+
+ The Registry (`src/registry.ts`, 137 lines) is the central nervous system that maps providers to models and handles model lookups.
+
+ **Internal state:**
+ ```typescript
+ class Registry {
+   // Provider storage: Map<providerId, providerInstance>
+   private providers = new Map<string, NoosphereProvider>();
+   // Example: { 'pi-ai' → PiAiProvider, 'fal' → FalProvider, 'comfyui' → ComfyUIProvider }
+
+   // Model cache: Map<providerId, { models: ModelInfo[], syncedAt: timestamp }>
+   private modelCache = new Map<string, CachedModels>();
+   // Example: {
+   //   'pi-ai' → { models: [246 ModelInfo objects], syncedAt: 1710000000000 },
+   //   'fal'   → { models: [867 ModelInfo objects], syncedAt: 1710000000000 },
+   // }
+
+   // Cache TTL in milliseconds (converted from minutes in constructor)
+   private cacheTTLMs: number;
+   // Default: 60 * 60 * 1000 = 3,600,000ms = 1 hour
+ }
+ ```
+
+ **Cache staleness check:**
+ ```typescript
+ isCacheStale(providerId: string): boolean {
+   const cached = this.modelCache.get(providerId);
+   if (!cached) return true; // No cache = stale
+   return Date.now() - cached.syncedAt > this.cacheTTLMs;
+   // Example: if syncedAt was 61 minutes ago and TTL is 60 minutes → stale
+ }
+ ```
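The TTL rule can be exercised standalone. This sketch mirrors the snippet above but takes `now` as a parameter so it is testable; the names are assumptions:

```typescript
// Standalone sketch of the staleness rule: a missing cache entry, or an
// entry older than the TTL, counts as stale. Names mirror the snippet above.
interface CachedModels { models: unknown[]; syncedAt: number }

function isStale(cached: CachedModels | undefined, cacheTTLMs: number, now: number): boolean {
  if (!cached) return true; // No cache = stale
  return now - cached.syncedAt > cacheTTLMs;
}

const ttl = 60 * 60 * 1000; // 1 hour, the documented default
// Synced 61 minutes ago → stale; synced 59 minutes ago → fresh:
// isStale({ models: [], syncedAt: 0 }, ttl, 61 * 60 * 1000) → true
// isStale({ models: [], syncedAt: 0 }, ttl, 59 * 60 * 1000) → false
```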
+
+ **Model resolution — linear scan across all caches:**
+ ```typescript
+ resolveModel(modelId: string, modality: Modality):
+   { provider: NoosphereProvider; model: ModelInfo } | null {
+
+   // Scan EVERY provider's cached models:
+   for (const [providerId, cached] of this.modelCache) {
+     const model = cached.models.find(
+       (m) => m.id === modelId && m.modality === modality
+     );
+     // Must match BOTH modelId AND modality
+     if (model) {
+       const provider = this.providers.get(providerId);
+       if (provider) return { provider, model };
+     }
+   }
+   return null;
+ }
+ // Performance: O(n) where n = total models across all providers
+ // With 246 Pi-AI + 867 FAL + 3 HuggingFace = ~1116 models to scan
+ // This is fast enough for the use case (called once per request)
+ ```
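The matching rule (id AND modality must both match) is easy to demonstrate with mock data; everything below is invented for illustration:

```typescript
// Miniature mock of the linear scan above: the same model id can exist
// under different modalities, so both fields must match. Data is invented.
interface ModelInfo { id: string; modality: string }

const modelCache = new Map<string, { models: ModelInfo[] }>([
  ['pi-ai', { models: [{ id: 'gpt-4o', modality: 'llm' }] }],
  ['fal', { models: [{ id: 'flux-pro', modality: 'image' }] }],
]);

function resolve(modelId: string, modality: string): string | null {
  for (const [providerId, cached] of modelCache) {
    const model = cached.models.find((m) => m.id === modelId && m.modality === modality);
    if (model) return providerId; // The real Registry returns { provider, model }
  }
  return null;
}

// resolve('flux-pro', 'image') → 'fal'
// resolve('flux-pro', 'llm')   → null (right id, wrong modality)
```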
+
+ **Sync mechanism:**
+ ```typescript
+ async syncAll(): Promise<SyncResult> {
+   const byProvider: Record<string, number> = {};
+   const errors: string[] = [];
+   let synced = 0;
+
+   // Sequential sync (NOT parallel) — one provider at a time:
+   for (const provider of this.providers.values()) {
+     try {
+       const models = await provider.listModels();
+       this.modelCache.set(provider.id, {
+         models,
+         syncedAt: Date.now(),
+       });
+       byProvider[provider.id] = models.length;
+       synced += models.length;
+     } catch (err) {
+       errors.push(`${provider.id}: ${err.message}`);
+       byProvider[provider.id] = 0;
+       // Note: failed sync does NOT clear existing cache
+     }
+   }
+
+   return { synced, byProvider, errors };
+ }
+ ```
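Because sync is sequential, one slow provider delays all the others. If that ever matters in a custom setup, the same loop could be parallelized with `Promise.allSettled`; this is a sketch of the alternative, not what the library ships:

```typescript
// Hedged sketch: a parallel variant of syncAll() using Promise.allSettled.
// Noosphere itself syncs sequentially; this only shows the tradeoff.
interface ProviderLike { id: string; listModels(): Promise<unknown[]> }

async function syncAllParallel(providers: ProviderLike[]) {
  const results = await Promise.allSettled(
    providers.map(async (p) => ({ id: p.id, models: await p.listModels() })),
  );
  const byProvider: Record<string, number> = {};
  const errors: string[] = [];
  results.forEach((r, i) => {
    if (r.status === 'fulfilled') {
      byProvider[r.value.id] = r.value.models.length;
    } else {
      byProvider[providers[i].id] = 0; // Failed sync still reports the provider
      errors.push(`${providers[i].id}: ${String(r.reason)}`);
    }
  });
  return { byProvider, errors };
}
```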
+
+ **Provider info aggregation:**
+ ```typescript
+ getProviderInfos(modality?: Modality): ProviderInfo[] {
+   // Returns summary info for each registered provider:
+   // {
+   //   id: 'pi-ai',
+   //   name: 'pi-ai (LLM Gateway)',
+   //   modalities: ['llm'],
+   //   local: false,
+   //   status: 'online', // Always 'online' — no live ping check
+   //   modelCount: 246,  // From cache, or 0 if not synced
+   // }
+ }
+ ```
+
+ ### Usage Tracking — In-Memory Event Store
+
+ The `UsageTracker` (`src/tracking.ts`, 57 lines) records every API call and provides filtered aggregation.
+
+ **Internal state:**
+ ```typescript
+ class UsageTracker {
+   private events: UsageEvent[] = []; // Append-only array
+   private onUsage?: (event: UsageEvent) => void | Promise<void>; // Optional callback
+ }
+ ```
+
+ **Recording flow — every API call creates a UsageEvent:**
+
+ ```typescript
+ // On SUCCESS (in Noosphere.trackUsage):
+ const event: UsageEvent = {
+   modality: result.modality,   // 'llm' | 'image' | 'video' | 'tts'
+   provider: result.provider,   // 'pi-ai', 'fal', etc.
+   model: result.model,         // 'gpt-4o', 'flux-pro', etc.
+   cost: result.usage.cost,     // USD amount (0 for free/local)
+   latencyMs: result.latencyMs, // Wall-clock milliseconds
+   input: result.usage.input,   // Input tokens or characters
+   output: result.usage.output, // Output tokens (LLM only)
+   unit: result.usage.unit,     // 'tokens', 'characters', 'free'
+   timestamp: new Date().toISOString(), // ISO 8601
+   success: true,
+   metadata,                    // User-provided metadata passthrough
+ };
+
+ // On FAILURE (in Noosphere.trackError):
+ const event: UsageEvent = {
+   modality, provider,
+   model: model ?? 'unknown',
+   cost: 0,                         // No cost on failure
+   latencyMs: Date.now() - startMs, // Time until failure
+   timestamp: new Date().toISOString(),
+   success: false,
+   error: err instanceof Error ? err.message : String(err),
+   metadata,
+ };
+ ```
+
+ **Query/aggregation — filtered summary:**
+ ```typescript
+ getSummary(options?: UsageQueryOptions): UsageSummary {
+   let filtered = this.events;
+
+   // Time-range filtering:
+   if (options?.since) {
+     const since = new Date(options.since).getTime();
+     filtered = filtered.filter((e) => new Date(e.timestamp).getTime() >= since);
+   }
+   if (options?.until) {
+     const until = new Date(options.until).getTime();
+     filtered = filtered.filter((e) => new Date(e.timestamp).getTime() <= until);
+   }
+
+   // Provider/modality filtering:
+   if (options?.provider) {
+     filtered = filtered.filter((e) => e.provider === options.provider);
+   }
+   if (options?.modality) {
+     filtered = filtered.filter((e) => e.modality === options.modality);
+   }
+
+   // Aggregation:
+   const byProvider: Record<string, number> = {};
+   const byModality = { llm: 0, image: 0, video: 0, tts: 0 };
+   let totalCost = 0;
+
+   for (const event of filtered) {
+     totalCost += event.cost;
+     byProvider[event.provider] = (byProvider[event.provider] ?? 0) + event.cost;
+     byModality[event.modality] += event.cost;
+   }
+
+   return { totalCost, totalRequests: filtered.length, byProvider, byModality };
+ }
+ ```
+
+ **Usage example:**
+ ```typescript
+ // Get all usage:
+ const all = ai.getUsage();
+ // { totalCost: 0.42, totalRequests: 15,
+ //   byProvider: { 'pi-ai': 0.40, 'fal': 0.02 },
+ //   byModality: { llm: 0.40, image: 0.02, video: 0, tts: 0 } }
+
+ // Get usage for the last hour, LLM only:
+ const recent = ai.getUsage({
+   since: new Date(Date.now() - 3600000),
+   modality: 'llm',
+ });
+
+ // Get usage for a specific provider:
+ const falUsage = ai.getUsage({ provider: 'fal' });
+
+ // Real-time callback (set in the constructor):
+ const ai = new Noosphere({
+   onUsage: (event) => {
+     console.log(`${event.provider}/${event.model}: $${event.cost} in ${event.latencyMs}ms`);
+     // Or: send to analytics, update a dashboard, check a budget
+   },
+ });
+ ```
+
+ **Important limitations:**
+ - Events are stored **in memory only** — lost on process restart
+ - No deduplication — each retry/failover attempt creates a separate event
+ - `clear()` wipes all history (called by `dispose()`)
+ - The `onUsage` callback is `await`ed — a slow callback blocks the response return
+
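One practical use of the documented `onUsage` option is a budget guard. The `makeBudgetGuard` helper below is hypothetical; only the `onUsage` constructor option is part of the API, and since the callback is awaited it should stay fast (here it only sums costs in memory):

```typescript
// Illustrative budget guard built on the documented onUsage callback.
// makeBudgetGuard is NOT part of Noosphere — a sketch for illustration.
interface UsageEventLike { cost: number; provider: string; success: boolean }

function makeBudgetGuard(limitUsd: number) {
  let spent = 0;
  return {
    onUsage: (event: UsageEventLike) => {
      spent += event.cost;
      if (spent > limitUsd) {
        console.warn(`Budget exceeded: $${spent.toFixed(2)} > $${limitUsd}`);
      }
    },
    spent: () => spent,
  };
}

const guard = makeBudgetGuard(1.0);
// Wire into the constructor: new Noosphere({ onUsage: guard.onUsage })
```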
+ ### Streaming Architecture
+
+ The `stream()` method (`src/noosphere.ts:73-124`) wraps provider streams with usage tracking:
+
+ ```typescript
+ stream(options: ChatOptions): NoosphereStream {
+   // Returns IMMEDIATELY (synchronous) — no await.
+   // The actual initialization happens lazily on first iteration.
+
+   let innerStream: NoosphereStream | undefined;
+   let finalResult: NoosphereResult | undefined;
+   let providerRef: NoosphereProvider | undefined;
+
+   // Lazy init — runs on the first for-await-of iteration:
+   const ensureInit = async () => {
+     if (!this.initialized) await this.init();
+     if (!providerRef) {
+       providerRef = this.resolveProviderForModality('llm', ...);
+       if (!providerRef.stream) throw new NoosphereError(...);
+       innerStream = providerRef.stream(options);
+     }
+   };
+
+   // Wrapped async iterator with usage tracking:
+   const wrappedIterator = {
+     async *[Symbol.asyncIterator]() {
+       await ensureInit(); // Init on first next()
+       for await (const event of innerStream!) {
+         if (event.type === 'done' && event.result) {
+           finalResult = event.result;
+           await trackUsage(event.result); // Track when complete
+         }
+         yield event; // Pass events through
+       }
+     },
+   };
+
+   return {
+     [Symbol.asyncIterator]: () => wrappedIterator[Symbol.asyncIterator](),
+
+     // result() — consume the entire stream and return the final result:
+     result: async () => {
+       if (finalResult) return finalResult; // Already consumed
+       for await (const event of wrappedIterator) {
+         if (event.type === 'done') return event.result!;
+         if (event.type === 'error') throw event.error;
+       }
+       throw new NoosphereError('Stream ended without result');
+     },
+
+     // abort() — signal cancellation:
+     abort: () => innerStream?.abort(),
+   };
+ }
+ ```
+
+ **Stream event types:**
+
+ | Event Type | Fields | When |
+ |---|---|---|
+ | `text_delta` | `{ type, delta: string }` | Each text token |
+ | `thinking_delta` | `{ type, delta: string }` | Each reasoning token |
+ | `done` | `{ type, result: NoosphereResult }` | Stream complete |
+ | `error` | `{ type, error: Error }` | Stream failed |
+
+ **Note:** Streaming does NOT use `executeWithRetry()`. If the stream fails, there's no automatic retry or failover. The error is yielded as an `error` event and also tracked via `trackError()`.
+
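A typical consumer switches on `event.type`. The mock generator below stands in for the stream returned by `ai.stream()`; the event shapes follow the table above:

```typescript
// Consuming the documented stream event types. The mock generator is a
// stand-in for a NoosphereStream; result.text is assumed for brevity.
type StreamEvent =
  | { type: 'text_delta'; delta: string }
  | { type: 'thinking_delta'; delta: string }
  | { type: 'done'; result: { text: string } }
  | { type: 'error'; error: Error };

async function* mockStream(): AsyncGenerator<StreamEvent> {
  yield { type: 'text_delta', delta: 'Hello, ' };
  yield { type: 'text_delta', delta: 'world' };
  yield { type: 'done', result: { text: 'Hello, world' } };
}

async function consume(stream: AsyncIterable<StreamEvent>): Promise<string> {
  let text = '';
  for await (const event of stream) {
    if (event.type === 'text_delta') text += event.delta; // render tokens as they arrive
    if (event.type === 'error') throw event.error;        // no retry/failover here
    if (event.type === 'done') return event.result.text;
  }
  throw new Error('Stream ended without a done event');
}

// consume(mockStream()) resolves to 'Hello, world'
```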
+ ### Lifecycle Management — dispose()
+
+ ```typescript
+ async dispose(): Promise<void> {
+   // 1. Call dispose() on every registered provider (if implemented):
+   for (const provider of this.registry.getAllProviders()) {
+     if (provider.dispose) {
+       await provider.dispose();
+       // Currently: no built-in provider implements dispose()
+       // This is for custom providers that need cleanup
+     }
+   }
+
+   // 2. Clear the model cache:
+   this.registry.clearCache();
+
+   // 3. Clear usage history:
+   this.tracker.clear();
 
- Every API call (success or failure) records a `UsageEvent`:
-
- ```typescript
- interface UsageEvent {
-   modality: 'llm' | 'image' | 'video' | 'tts';
-   provider: string;
-   model: string;
-   cost: number; // USD
-   latencyMs: number;
-   input?: number; // tokens or characters
-   output?: number; // tokens
-   unit?: string;
-   timestamp: string; // ISO 8601
-   success: boolean;
-   error?: string; // error message if failed
-   metadata?: Record<string, unknown>;
+   // Note: does NOT set initialized=false
+   // After dispose(), the instance is NOT reusable for new requests
  }
  ```