noosphere 0.1.1 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +820 -122
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -2,13 +2,18 @@
 
  Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.
 
+ One import. Every model. Every modality.
+
  ## Features
 
- - **Multi-modal** — LLM chat, image generation, video generation, and text-to-speech
- - **Multi-provider** — OpenAI, Anthropic, Google, Groq, Mistral, xAI, OpenRouter, FAL, Hugging Face
- - **Local-first** — Auto-detects ComfyUI, Ollama, Piper, and Kokoro running on your machine
+ - **4 modalities** — LLM chat, image generation, video generation, and text-to-speech
+ - **246+ LLM models** — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
+ - **867+ media endpoints** — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
+ - **30+ HuggingFace tasks** — LLM, image, TTS, translation, summarization, classification, and more
+ - **Local-first architecture** — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
+ - **Agentic capabilities** — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
  - **Failover & retry** — Automatic retries with exponential backoff and cross-provider failover
- - **Usage tracking** — Track costs, latency, and token counts across all providers
+ - **Usage tracking** — Real-time cost, latency, and token tracking across all providers
  - **TypeScript-first** — Full type definitions with ESM and CommonJS support
 
  ## Install
@@ -56,7 +61,7 @@ const audio = await ai.speak({
 
  ## Configuration
 
- API keys are resolved from the constructor or environment variables:
+ API keys are resolved from the constructor config or environment variables (config takes priority):
 
  ```typescript
  const ai = new Noosphere({
@@ -80,47 +85,118 @@ Or set environment variables:
  |---|---|
  | `OPENAI_API_KEY` | OpenAI |
  | `ANTHROPIC_API_KEY` | Anthropic |
- | `GEMINI_API_KEY` | Google |
- | `FAL_KEY` | FAL |
+ | `GEMINI_API_KEY` | Google Gemini |
+ | `FAL_KEY` | FAL.ai |
  | `HUGGINGFACE_TOKEN` | Hugging Face |
  | `GROQ_API_KEY` | Groq |
  | `MISTRAL_API_KEY` | Mistral |
- | `XAI_API_KEY` | xAI |
+ | `XAI_API_KEY` | xAI (Grok) |
  | `OPENROUTER_API_KEY` | OpenRouter |
 
- ## API
+ ### Full Configuration Reference
+
+ ```typescript
+ const ai = new Noosphere({
+   // API keys (or use env vars above)
+   keys: { /* ... */ },
+
+   // Default models per modality
+   defaults: {
+     llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
+     image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
+     video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
+     tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
+   },
+
+   // Local service configuration
+   autoDetectLocal: true, // env: NOOSPHERE_AUTO_DETECT_LOCAL
+   local: {
+     ollama: { enabled: true, host: 'http://localhost', port: 11434 },
+     comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
+     piper: { enabled: true, host: 'http://localhost', port: 5500 },
+     kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
+     custom: [], // additional LocalServiceConfig[]
+   },
+
+   // Retry & failover
+   retry: {
+     maxRetries: 2, // default: 2
+     backoffMs: 1000, // default: 1000 (exponential: 1s, 2s, 4s...)
+     failover: true, // default: true — try other providers on failure
+     retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
+   },
+
+   // Timeouts per modality (ms)
+   timeout: {
+     llm: 30000, // 30s
+     image: 120000, // 2min
+     video: 300000, // 5min
+     tts: 60000, // 1min
+   },
+
+   // Model discovery cache (minutes)
+   discoveryCacheTTL: 60, // env: NOOSPHERE_DISCOVERY_CACHE_TTL
+
+   // Real-time usage callback
+   onUsage: (event) => {
+     console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
+   },
+ });
+ ```
+
+ ### Local Service Environment Variables
+
+ | Variable | Default | Description |
+ |---|---|---|
+ | `OLLAMA_HOST` | `http://localhost` | Ollama server host |
+ | `OLLAMA_PORT` | `11434` | Ollama server port |
+ | `COMFYUI_HOST` | `http://localhost` | ComfyUI server host |
+ | `COMFYUI_PORT` | `8188` | ComfyUI server port |
+ | `PIPER_HOST` | `http://localhost` | Piper TTS server host |
+ | `PIPER_PORT` | `5500` | Piper TTS server port |
+ | `KOKORO_HOST` | `http://localhost` | Kokoro TTS server host |
+ | `KOKORO_PORT` | `5501` | Kokoro TTS server port |
+ | `NOOSPHERE_AUTO_DETECT_LOCAL` | `true` | Enable/disable local service auto-detection |
+ | `NOOSPHERE_DISCOVERY_CACHE_TTL` | `60` | Model cache TTL in minutes |
+
+ ---
+
+ ## API Reference
 
  ### `new Noosphere(config?)`
 
- Creates a new instance. Providers are initialized lazily on first use.
+ Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).
 
- ### Generation
+ ### Generation Methods
 
  #### `ai.chat(options): Promise<NoosphereResult>`
 
- Generate text with an LLM.
+ Generate text with any LLM. Supports 246+ models across 8 providers.
 
  ```typescript
  const result = await ai.chat({
-   provider: 'anthropic',            // optional — auto-resolved if omitted
-   model: 'claude-sonnet-4-20250514', // optional
+   provider: 'anthropic',             // optional — auto-resolved if omitted
+   model: 'claude-sonnet-4-20250514', // optional — uses default or first available
    messages: [
      { role: 'system', content: 'You are helpful.' },
      { role: 'user', content: 'Explain quantum computing' },
    ],
-   temperature: 0.7, // optional
-   maxTokens: 1024,  // optional
-   jsonMode: false,  // optional
+   temperature: 0.7, // optional (0-2)
+   maxTokens: 1024,  // optional
+   jsonMode: false,  // optional
  });
 
- console.log(result.content);    // response text
- console.log(result.thinking);   // reasoning (if supported)
- console.log(result.usage.cost); // cost in USD
+ console.log(result.content);      // response text
+ console.log(result.thinking);     // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
+ console.log(result.usage.cost);   // cost in USD
+ console.log(result.usage.input);  // input tokens
+ console.log(result.usage.output); // output tokens
+ console.log(result.latencyMs);    // response time in ms
  ```
 
  #### `ai.stream(options): NoosphereStream`
 
- Stream LLM responses.
+ Stream LLM responses token-by-token. Same options as `chat()`.
 
  ```typescript
  const stream = ai.stream({
@@ -128,67 +204,95 @@ const stream = ai.stream({
  });
 
  for await (const event of stream) {
-   if (event.type === 'text_delta') process.stdout.write(event.delta!);
-   if (event.type === 'thinking_delta') console.log('[thinking]', event.delta);
+   switch (event.type) {
+     case 'text_delta':
+       process.stdout.write(event.delta!);
+       break;
+     case 'thinking_delta':
+       console.log('[thinking]', event.delta);
+       break;
+     case 'done':
+       console.log('\n\nUsage:', event.result!.usage);
+       break;
+     case 'error':
+       console.error(event.error);
+       break;
+   }
  }
 
- // Or get the full result after streaming
+ // Or consume the full result
  const result = await stream.result();
+
+ // Abort at any time
+ stream.abort();
  ```
 
  #### `ai.image(options): Promise<NoosphereResult>`
 
- Generate images.
+ Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.
 
  ```typescript
  const result = await ai.image({
-   prompt: 'A futuristic cityscape',
-   negativePrompt: 'blurry, low quality', // optional
-   width: 1024,        // optional
-   height: 768,        // optional
-   seed: 42,           // optional
-   steps: 30,          // optional
-   guidanceScale: 7.5, // optional
+   provider: 'fal',            // optional
+   model: 'fal-ai/flux-2-pro', // optional
+   prompt: 'A futuristic cityscape at sunset',
+   negativePrompt: 'blurry, low quality', // optional
+   width: 1024,        // optional
+   height: 768,        // optional
+   seed: 42,           // optional — reproducible results
+   steps: 30,          // optional — inference steps (more = higher quality)
+   guidanceScale: 7.5, // optional — prompt adherence (higher = stricter)
  });
 
- console.log(result.url);          // image URL
- console.log(result.media?.width); // dimensions
+ console.log(result.url);           // image URL (FAL)
+ console.log(result.buffer);        // image Buffer (HuggingFace, ComfyUI)
+ console.log(result.media?.width);  // actual dimensions
+ console.log(result.media?.height);
+ console.log(result.media?.format); // 'png'
  ```
 
  #### `ai.video(options): Promise<NoosphereResult>`
 
- Generate videos.
+ Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).
 
  ```typescript
  const result = await ai.video({
+   provider: 'fal',
+   model: 'fal-ai/kling-video/v2/master/text-to-video',
    prompt: 'A bird flying through clouds',
-   imageUrl: 'https://...', // optional — image-to-video
-   duration: 5,  // optional — seconds
-   fps: 24,      // optional
-   width: 1280,  // optional
-   height: 720,  // optional
+   imageUrl: 'https://...', // optional — image-to-video
+   duration: 5,  // optional — seconds
+   fps: 24,      // optional
+   width: 1280,  // optional
+   height: 720,  // optional
  });
 
- console.log(result.url);
+ console.log(result.url);             // video URL
+ console.log(result.media?.duration); // actual duration
+ console.log(result.media?.fps);      // frames per second
+ console.log(result.media?.format);   // 'mp4'
  ```
 
  #### `ai.speak(options): Promise<NoosphereResult>`
 
- Text-to-speech synthesis.
+ Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.
 
  ```typescript
  const result = await ai.speak({
+   provider: 'fal',
+   model: 'fal-ai/kokoro/american-english',
    text: 'Hello world',
-   voice: 'alloy',  // optional
-   language: 'en',  // optional
-   speed: 1.0,      // optional
-   format: 'mp3',   // optional — 'mp3' | 'wav' | 'ogg'
+   voice: 'af_heart', // optional — voice ID
+   language: 'en',    // optional
+   speed: 1.0,        // optional
+   format: 'mp3',     // optional — 'mp3' | 'wav' | 'ogg'
  });
 
- // result.buffer contains the audio data
+ console.log(result.buffer); // audio Buffer
+ console.log(result.url);    // audio URL (FAL)
  ```
 
- ### Discovery
+ ### Discovery Methods
 
  #### `ai.getProviders(modality?): Promise<ProviderInfo[]>`
 
@@ -196,15 +300,16 @@ List available providers, optionally filtered by modality.
 
  ```typescript
  const providers = await ai.getProviders('llm');
- // [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 42 }]
+ // [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]
  ```
 
  #### `ai.getModels(modality?): Promise<ModelInfo[]>`
 
- List all available models.
+ List all available models with full metadata.
 
  ```typescript
  const models = await ai.getModels('image');
+ // Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilities
  ```
 
  #### `ai.getModel(provider, modelId): Promise<ModelInfo | null>`
@@ -213,58 +318,618 @@ Get details about a specific model.
 
  #### `ai.syncModels(): Promise<SyncResult>`
 
- Refresh model lists from all providers.
+ Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
 
  ### Usage Tracking
 
  #### `ai.getUsage(options?): UsageSummary`
 
- Get aggregated usage statistics.
+ Get aggregated usage statistics with optional filtering.
 
  ```typescript
- const usage = ai.getUsage({ since: '2024-01-01', provider: 'openai' });
- console.log(usage.totalCost);     // total USD spent
- console.log(usage.totalRequests); // number of requests
- console.log(usage.byProvider);    // { openai: 2.50, anthropic: 1.20 }
- console.log(usage.byModality);    // { llm: 3.00, image: 0.70 }
+ const usage = ai.getUsage({
+   since: '2024-01-01', // optional ISO date or Date object
+   until: '2024-12-31', // optional
+   provider: 'openai',  // optional filter by provider
+   modality: 'llm',     // optional filter by modality
+ });
+
+ console.log(usage.totalCost);     // total USD spent
+ console.log(usage.totalRequests); // number of requests
+ console.log(usage.byProvider);    // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
+ console.log(usage.byModality);    // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }
  ```
 
- Real-time usage callback:
+ ### Lifecycle
+
+ #### `ai.registerProvider(provider): void`
+
+ Register a custom provider (see [Custom Providers](#custom-providers)).
+
+ #### `ai.dispose(): Promise<void>`
+
+ Cleanup all provider resources, clear model cache, and reset usage tracker.
+
+ ### NoosphereResult
+
+ Every generation method returns a `NoosphereResult`:
 
  ```typescript
- const ai = new Noosphere({
-   onUsage: (event) => {
-     console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
-   },
+ interface NoosphereResult {
+   content?: string;    // LLM response text
+   thinking?: string;   // reasoning/thinking output (supported models)
+   url?: string;        // media URL (images, videos, audio from cloud providers)
+   buffer?: Buffer;     // media binary data (local providers, HuggingFace)
+   provider: string;    // which provider handled the request
+   model: string;       // which model was used
+   modality: Modality;  // 'llm' | 'image' | 'video' | 'tts'
+   latencyMs: number;   // request duration in milliseconds
+   usage: {
+     cost: number;      // cost in USD
+     input?: number;    // input tokens/characters
+     output?: number;   // output tokens
+     unit?: string;     // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
+   };
+   media?: {
+     width?: number;    // image/video width
+     height?: number;   // image/video height
+     duration?: number; // video/audio duration in seconds
+     format?: string;   // 'png' | 'mp4' | 'mp3' | 'wav'
+     fps?: number;      // video frames per second
+   };
+ }
+ ```
+
+ ---
+
+ ## Providers In Depth
+
+ ### Pi-AI — LLM Gateway (246+ models)
+
+ **Provider ID:** `pi-ai`
+ **Modalities:** LLM (chat + streaming)
+ **Library:** `@mariozechner/pi-ai`
+
+ A unified gateway that routes to 8 LLM providers through 4 different API protocols:
+
+ | API Protocol | Providers |
+ |---|---|
+ | `anthropic-messages` | Anthropic |
+ | `google-generative-ai` | Google |
+ | `openai-responses` | OpenAI (reasoning models) |
+ | `openai-completions` | OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |
+
+ #### Anthropic Models (19)
+
+ | Model | Context | Reasoning | Vision | Input Cost | Output Cost |
+ |---|---|---|---|---|---|
+ | `claude-opus-4-0` | 200k | Yes | Yes | $15/M | $75/M |
+ | `claude-opus-4-1` | 200k | Yes | Yes | $15/M | $75/M |
+ | `claude-sonnet-4-20250514` | 200k | Yes | Yes | $3/M | $15/M |
+ | `claude-sonnet-4-5-20250929` | 200k | Yes | Yes | $3/M | $15/M |
+ | `claude-3-7-sonnet-20250219` | 200k | Yes | Yes | $3/M | $15/M |
+ | `claude-3-5-sonnet-20241022` | 200k | No | Yes | $3/M | $15/M |
+ | `claude-haiku-4-5-20251001` | 200k | No | Yes | $0.80/M | $4/M |
+ | `claude-3-5-haiku-20241022` | 200k | No | Yes | $0.80/M | $4/M |
+ | `claude-3-haiku-20240307` | 200k | No | Yes | $0.25/M | $1.25/M |
+ | *...and 10 more variants* | | | | | |
+
+ #### OpenAI Models (24)
+
+ | Model | Context | Reasoning | Vision | Input Cost | Output Cost |
+ |---|---|---|---|---|---|
+ | `gpt-5` | 200k | Yes | Yes | $10/M | $30/M |
+ | `gpt-5-mini` | 200k | Yes | Yes | $2.50/M | $10/M |
+ | `gpt-4.1` | 128k | No | Yes | $2/M | $8/M |
+ | `gpt-4.1-mini` | 128k | No | Yes | $0.40/M | $1.60/M |
+ | `gpt-4.1-nano` | 128k | No | Yes | $0.10/M | $0.40/M |
+ | `gpt-4o` | 128k | No | Yes | $2.50/M | $10/M |
+ | `gpt-4o-mini` | 128k | No | Yes | $0.15/M | $0.60/M |
+ | `o3-pro` | 200k | Yes | Yes | $20/M | $80/M |
+ | `o3-mini` | 200k | Yes | Yes | $1.10/M | $4.40/M |
+ | `o4-mini` | 200k | Yes | Yes | $1.10/M | $4.40/M |
+ | `codex-mini-latest` | 200k | Yes | No | $1.50/M | $6/M |
+ | *...and 13 more variants* | | | | | |
+
+ #### Google Gemini Models (19)
+
+ | Model | Context | Reasoning | Vision | Cost |
+ |---|---|---|---|---|
+ | `gemini-2.5-flash` | 1M | Yes | Yes | $0.15-0.60/M |
+ | `gemini-2.5-pro` | 1M | Yes | Yes | $1.25-10/M |
+ | `gemini-2.0-flash` | 1M | No | Yes | $0.10-0.40/M |
+ | `gemini-2.0-flash-lite` | 1M | No | Yes | $0.025-0.10/M |
+ | `gemini-1.5-flash` | 1M | No | Yes | $0.075-0.30/M |
+ | `gemini-1.5-pro` | 2M | No | Yes | $1.25-5/M |
+ | *...and 13 more variants* | | | | |
+
+ #### xAI Grok Models (20)
+
+ | Model | Context | Reasoning | Vision | Input Cost |
+ |---|---|---|---|---|
+ | `grok-4` | 256k | Yes | Yes | $5/M |
+ | `grok-4-fast` | 256k | Yes | Yes | $3/M |
+ | `grok-3` | 131k | No | Yes | $3/M |
+ | `grok-3-fast` | 131k | No | Yes | $5/M |
+ | `grok-3-mini-fast-latest` | 131k | Yes | No | $0.30/M |
+ | `grok-2-vision` | 32k | No | Yes | $2/M |
+ | *...and 14 more variants* | | | | |
+
+ #### Groq Models (15)
+
+ | Model | Context | Cost |
+ |---|---|---|
+ | `llama-3.3-70b-versatile` | 128k | $0.59/M |
+ | `llama-3.1-8b-instant` | 128k | $0.05/M |
+ | `mistral-saba-24b` | 32k | $0.40/M |
+ | `qwen-qwq-32b` | 128k | $0.29/M |
+ | `deepseek-r1-distill-llama-70b` | 128k | $0.75/M |
+ | *...and 10 more* | | |
+
+ #### Cerebras Models (3)
+
+ `gpt-oss-120b`, `qwen-3-235b-a22b-instruct-2507`, `qwen-3-coder-480b`
+
+ #### Zai Models (5)
+
+ `glm-4.6`, `glm-4.5`, `glm-4.5-flash`, `glm-4.5v`, `glm-4.5-air`
+
+ #### OpenRouter (141 models)
+
+ Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via `ai.getModels('llm')`.
+
+ #### Agentic Capabilities (via Pi-AI library)
+
+ The underlying `@mariozechner/pi-ai` library exposes powerful agentic features. While Noosphere currently surfaces chat and streaming, the library provides:
+
+ **Tool Use / Function Calling:**
+ ```typescript
+ // Supported across Anthropic, OpenAI, Google, xAI, Groq
+ // Tool definitions use TypeBox schemas for runtime validation
+ interface Tool<TParameters extends TSchema = TSchema> {
+   name: string;
+   description: string;
+   parameters: TParameters; // TypeBox schema — validated at runtime with AJV
+ }
+ ```
+
+ **Reasoning / Thinking:**
+ - **Anthropic:** `thinkingEnabled`, `thinkingBudgetTokens` — Claude Opus/Sonnet extended thinking
+ - **OpenAI:** `reasoningEffort` (minimal/low/medium/high) — o1/o3/o4/GPT-5 reasoning
+ - **Google:** `thinking.enabled`, `thinking.budgetTokens` — Gemini 2.5 thinking
+ - **xAI:** Grok-4 native reasoning
+ - Thinking blocks are automatically extracted and streamed as separate `thinking_delta` events
+
+ **Vision / Multimodal Input:**
+ ```typescript
+ // Send images alongside text to vision-capable models
+ {
+   role: "user",
+   content: [
+     { type: "text", text: "What's in this image?" },
+     { type: "image", data: base64String, mimeType: "image/png" }
+   ]
+ }
+ ```
+
+ **Agent Loop:**
+ ```typescript
+ // Built-in agentic execution loop with automatic tool calling
+ import { agentLoop } from '@mariozechner/pi-ai';
+
+ const events = agentLoop(prompt, context, {
+   tools: [myTool],
+   model: getModel('anthropic', 'claude-sonnet-4-20250514'),
  });
+
+ for await (const event of events) {
+   // event.type: agent_start → turn_start → message_start →
+   // message_update → tool_execution_start → tool_execution_end →
+   // message_end → turn_end → agent_end
+ }
  ```
 
- ### Custom Providers
+ **Cost Tracking per Model:**
+ ```typescript
+ // Costs tracked per 1M tokens with cache-aware pricing
+ {
+   input: number,      // cost per 1M input tokens
+   output: number,     // cost per 1M output tokens
+   cacheRead: number,  // prompt cache hit cost
+   cacheWrite: number, // prompt cache write cost
+ }
+ ```
+
+ ---
+
+ ### FAL — Media Generation (867+ endpoints)
+
+ **Provider ID:** `fal`
+ **Modalities:** Image, Video, TTS
+ **Library:** `@fal-ai/client`
+
+ The largest media generation provider with dynamic pricing fetched at runtime from `https://api.fal.ai/v1/models/pricing`.
+
+ #### Image Models (200+)
+
+ **FLUX Family (20+ variants):**
+ | Model | Description |
+ |---|---|
+ | `fal-ai/flux/schnell` | Fast generation (default) |
+ | `fal-ai/flux/dev` | Higher quality |
+ | `fal-ai/flux-2` | Next generation |
+ | `fal-ai/flux-2-pro` | Professional quality |
+ | `fal-ai/flux-2-flex` | Flexible variant |
+ | `fal-ai/flux-2/edit` | Image editing |
+ | `fal-ai/flux-2/lora` | LoRA fine-tuning |
+ | `fal-ai/flux-pro/v1.1-ultra` | Ultra high quality |
+ | `fal-ai/flux-pro/kontext` | Context-aware generation |
+ | `fal-ai/flux-lora` | Custom style training |
+ | `fal-ai/flux-vision-upscaler` | AI upscaling |
+ | `fal-ai/flux-krea-trainer` | Model training |
+ | `fal-ai/flux-lora-fast-training` | Fast fine-tuning |
+ | `fal-ai/flux-lora-portrait-trainer` | Portrait specialist |
+
+ **Stable Diffusion:**
+ `fal-ai/stable-diffusion-v15`, `fal-ai/stable-diffusion-v35-large`, `fal-ai/stable-diffusion-v35-medium`, `fal-ai/stable-diffusion-v3-medium`
+
+ **Other Image Models:**
+ | Model | Description |
+ |---|---|
+ | `fal-ai/recraft/v3/text-to-image` | Artistic generation |
+ | `fal-ai/ideogram/v2`, `v2a`, `v3` | Ideogram series |
+ | `fal-ai/imagen3`, `fal-ai/imagen4/preview` | Google Imagen |
+ | `fal-ai/gpt-image-1` | GPT image generation |
+ | `fal-ai/gpt-image-1/edit-image` | GPT image editing |
+ | `fal-ai/reve/text-to-image` | Reve generation |
+ | `fal-ai/sana`, `fal-ai/sana/sprint` | Sana models |
+ | `fal-ai/pixart-sigma` | PixArt Sigma |
+ | `fal-ai/bria/text-to-image/base` | Bria AI |
+
+ **Pre-trained LoRA Styles:**
+ `fal-ai/flux-2-lora-gallery/sepia-vintage`, `virtual-tryon`, `satellite-view-style`, `realism`, `multiple-angles`, `hdr-style`, `face-to-full-portrait`, `digital-comic-art`, `ballpoint-pen-sketch`, `apartment-staging`, `add-background`
+
+ **Image Editing/Enhancement (30+ tools):**
+ `fal-ai/image-editing/age-progression`, `baby-version`, `background-change`, `hair-change`, `expression-change`, `object-removal`, `photo-restoration`, `style-transfer`, and many more.
+
+ #### Video Models (150+)
+
+ **Kling Video (20+ variants):**
+ | Model | Description |
+ |---|---|
+ | `fal-ai/kling-video/v2/master/text-to-video` | Default text-to-video |
+ | `fal-ai/kling-video/v2/master/image-to-video` | Image-to-video |
+ | `fal-ai/kling-video/v2.5-turbo/pro/text-to-video` | Turbo pro |
+ | `fal-ai/kling-video/o1/image-to-video` | O1 quality |
+ | `fal-ai/kling-video/o1/video-to-video/edit` | Video editing |
+ | `fal-ai/kling-video/lipsync/audio-to-video` | Lip sync |
+ | `fal-ai/kling-video/video-to-audio` | Audio extraction |
+
+ **Sora 2 (OpenAI):**
+ | Model | Description |
+ |---|---|
+ | `fal-ai/sora-2/text-to-video` | Text-to-video |
+ | `fal-ai/sora-2/text-to-video/pro` | Pro quality |
+ | `fal-ai/sora-2/image-to-video` | Image-to-video |
+ | `fal-ai/sora-2/video-to-video/remix` | Video remixing |
+
+ **VEO 3 (Google):**
+ | Model | Description |
+ |---|---|
+ | `fal-ai/veo3` | VEO 3 standard |
+ | `fal-ai/veo3/fast` | Fast variant |
+ | `fal-ai/veo3/image-to-video` | Image-to-video |
+ | `fal-ai/veo3.1` | Latest version |
+ | `fal-ai/veo3.1/reference-to-video` | Reference-guided |
+ | `fal-ai/veo3.1/first-last-frame-to-video` | Frame interpolation |
+
+ **WAN (15+ variants):**
+ `fal-ai/wan-pro/text-to-video`, `fal-ai/wan-pro/image-to-video`, `fal-ai/wan/v2.2-a14b/text-to-video`, `fal-ai/wan-vace-14b/depth`, `fal-ai/wan-vace-14b/inpainting`, `fal-ai/wan-vace-14b/pose`, `fal-ai/wan-effects`
+
+ **Pixverse (20+ variants):**
+ `fal-ai/pixverse/v5.5/text-to-video`, `fal-ai/pixverse/v5.5/image-to-video`, `fal-ai/pixverse/v5.5/effects`, `fal-ai/pixverse/lipsync`, `fal-ai/pixverse/sound-effects`
+
+ **Minimax / Hailuo:**
+ `fal-ai/minimax/hailuo-2.3/text-to-video/pro`, `fal-ai/minimax/hailuo-2.3/image-to-video/pro`, `fal-ai/minimax/video-01-director`, `fal-ai/minimax/video-01-live`
+
+ **Other Video Models:**
+ | Provider | Models |
+ |---|---|
+ | Hunyuan | `fal-ai/hunyuan-video/text-to-video`, `image-to-video`, `video-to-video`, `foley` |
+ | Pika | `fal-ai/pika/v2.2/text-to-video`, `pikascenes`, `pikaffects` |
+ | LTX | `fal-ai/ltx-2/text-to-video`, `image-to-video`, `retake-video` |
+ | Luma | `fal-ai/luma-dream-machine/ray-2`, `ray-2-flash`, `luma-photon` |
+ | Vidu | `fal-ai/vidu/q2/text-to-video`, `image-to-video/pro` |
+ | CogVideoX | `fal-ai/cogvideox-5b/text-to-video`, `video-to-video` |
+ | Seedance | `fal-ai/bytedance/seedance/v1/text-to-video`, `image-to-video` |
+ | Magi | `fal-ai/magi/text-to-video`, `extend-video` |
+
+ #### TTS / Speech Models (50+)
+
+ **Kokoro (9 languages, 20+ voices per language):**
+ | Model | Language | Example Voices |
+ |---|---|---|
+ | `fal-ai/kokoro/american-english` | English (US) | af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx |
+ | `fal-ai/kokoro/british-english` | English (UK) | British voice set |
+ | `fal-ai/kokoro/french` | French | French voice set |
+ | `fal-ai/kokoro/japanese` | Japanese | Japanese voice set |
+ | `fal-ai/kokoro/spanish` | Spanish | Spanish voice set |
+ | `fal-ai/kokoro/mandarin-chinese` | Chinese | Mandarin voice set |
+ | `fal-ai/kokoro/italian` | Italian | Italian voice set |
+ | `fal-ai/kokoro/hindi` | Hindi | Hindi voice set |
+ | `fal-ai/kokoro/brazilian-portuguese` | Portuguese | Portuguese voice set |
+
+ **ElevenLabs:**
+ | Model | Description |
+ |---|---|
+ | `fal-ai/elevenlabs/tts/eleven-v3` | Professional quality |
+ | `fal-ai/elevenlabs/tts/turbo-v2.5` | Faster inference |
+ | `fal-ai/elevenlabs/tts/multilingual-v2` | Multi-language |
+ | `fal-ai/elevenlabs/text-to-dialogue/eleven-v3` | Dialogue generation |
+ | `fal-ai/elevenlabs/sound-effects/v2` | Sound effects |
+ | `fal-ai/elevenlabs/speech-to-text` | Transcription |
+ | `fal-ai/elevenlabs/audio-isolation` | Background removal |
+
+ **Other TTS:**
+ `fal-ai/f5-tts` (voice cloning), `fal-ai/dia-tts`, `fal-ai/minimax/speech-2.6-turbo`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/chatterbox/text-to-speech`, `fal-ai/index-tts-2/text-to-speech`
+
+ #### FAL Client Capabilities
+
+ The `@fal-ai/client` provides additional features beyond what Noosphere surfaces:
+
+ - **Queue API** — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
+ - **Streaming API** — Real-time streaming responses via async iterators
+ - **Realtime API** — WebSocket connections for interactive use (e.g., real-time image generation)
+ - **Storage API** — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
+ - **Retry logic** — Configurable retries with exponential backoff and jitter
+ - **Request middleware** — Custom request interceptors and proxy support
+
+ ---
+
+ ### Hugging Face — Open Source AI (30+ tasks)
 
- Register your own provider by implementing the `NoosphereProvider` interface:
+ **Provider ID:** `huggingface`
+ **Modalities:** LLM, Image, TTS
+ **Library:** `@huggingface/inference`
+
+ Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
+
+ #### Default Models
+
+ | Modality | Default Model | Description |
+ |---|---|---|
+ | LLM | `meta-llama/Llama-3.1-8B-Instruct` | Llama 3.1 8B |
+ | Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
+ | TTS | `facebook/mms-tts-eng` | MMS TTS English |
+
+ Any HuggingFace model ID works — just pass it as the `model` parameter:
 
  ```typescript
- import type { NoosphereProvider } from 'noosphere';
+ await ai.chat({
+   provider: 'huggingface',
+   model: 'mistralai/Mixtral-8x7B-v0.1',
+   messages: [{ role: 'user', content: 'Hello' }],
+ });
+ ```
 
- const myProvider: NoosphereProvider = {
-   id: 'my-provider',
-   name: 'My Provider',
-   modalities: ['llm'],
-   isLocal: false,
+ #### Full Library Capabilities
 
-   async ping() { return true; },
-   async listModels() { return [/* ... */]; },
+ The `@huggingface/inference` library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:
+
+ **Natural Language Processing:**
+ | Task | Method | Description |
+ |---|---|---|
+ | Chat | `chatCompletion()` | OpenAI-compatible chat completions |
+ | Chat Streaming | `chatCompletionStream()` | Token-by-token streaming |
+ | Text Generation | `textGeneration()` | Raw text completion |
+ | Summarization | `summarization()` | Text summarization |
+ | Translation | `translation()` | Language translation |
+ | Question Answering | `questionAnswering()` | Extract answers from context |
+ | Text Classification | `textClassification()` | Sentiment, topic classification |
+ | Zero-Shot Classification | `zeroShotClassification()` | Classify without training |
+ | Token Classification | `tokenClassification()` | NER, POS tagging |
+ | Sentence Similarity | `sentenceSimilarity()` | Semantic similarity scores |
+ | Feature Extraction | `featureExtraction()` | Text embeddings |
+ | Fill Mask | `fillMask()` | Fill in masked tokens |
+ | Table QA | `tableQuestionAnswering()` | Answer questions about tables |
+
+ **Computer Vision:**
+ | Task | Method | Description |
+ |---|---|---|
+ | Text-to-Image | `textToImage()` | Generate images from text |
+ | Image-to-Image | `imageToImage()` | Transform/edit images |
+ | Image Captioning | `imageToText()` | Describe images |
+ | Classification | `imageClassification()` | Classify image content |
+ | Object Detection | `objectDetection()` | Detect and locate objects |
+ | Segmentation | `imageSegmentation()` | Pixel-level segmentation |
+ | Zero-Shot Image | `zeroShotImageClassification()` | Classify without training |
+ | Text-to-Video | `textToVideo()` | Generate videos |
+
+ **Audio:**
+ | Task | Method | Description |
+ |---|---|---|
+ | Text-to-Speech | `textToSpeech()` | Generate speech |
+ | Speech-to-Text | `automaticSpeechRecognition()` | Transcription |
+ | Audio Classification | `audioClassification()` | Classify sounds |
+ | Audio-to-Audio | `audioToAudio()` | Source separation, enhancement |
+
+ **Multimodal:**
+ | Task | Method | Description |
+ |---|---|---|
+ | Visual QA | `visualQuestionAnswering()` | Answer questions about images |
+ | Document QA | `documentQuestionAnswering()` | Answer questions about documents |
+
+ **Tabular:**
+ | Task | Method | Description |
+ |---|---|---|
+ | Classification | `tabularClassification()` | Classify tabular data |
+ | Regression | `tabularRegression()` | Predict continuous values |
+
+ #### HuggingFace Agentic Features
769
+
770
+ - **Tool/Function Calling:** Full support via `tools` parameter with `tool_choice` control (auto/none/required)
771
+ - **JSON Schema Responses:** `response_format: { type: 'json_schema', json_schema: {...} }`
772
+ - **Reasoning:** `reasoning_effort` parameter (none/minimal/low/medium/high/xhigh)
773
+ - **Multimodal Input:** Images via `image_url` content chunks in chat messages
774
+ - **17 Inference Providers:** Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
775
+
776
+ ---
777
+
778
+ ### ComfyUI — Local Image Generation
779
+
780
+ **Provider ID:** `comfyui`
781
+ **Modalities:** Image, Video (planned)
782
+ **Type:** Local
783
+ **Default Port:** 8188
784
+
785
+ Connects to a local ComfyUI instance for Stable Diffusion workflows.
786
+
787
+ #### How It Works
257
788
 
258
- async chat(options) {
259
- // your implementation
260
- return { content: '...', provider: 'my-provider', model: '...', modality: 'llm', latencyMs: 100, usage: { cost: 0 } };
789
+ 1. Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
790
+ 2. Injects your parameters (prompt, dimensions, seed, steps, guidance)
791
+ 3. POSTs the workflow to ComfyUI's `/prompt` endpoint
792
+ 4. Polls `/history/{promptId}` every second until completion (max 5 minutes)
793
+ 5. Fetches the generated image from `/view`
794
+ 6. Returns a PNG buffer
795
+
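The submit-and-poll pattern in steps 4–5 boils down to a generic poll-until-ready loop. A minimal sketch under stated assumptions: the `pollUntil` helper, its parameters, and its defaults are illustrative, not Noosphere's internals.

```typescript
// Generic poll-until-ready helper, mirroring step 4 above (poll every second,
// give up after 5 minutes). Names and defaults are illustrative only.
async function pollUntil<T>(
  check: () => Promise<T | undefined>, // e.g. fetch /history/{promptId} and parse
  intervalMs = 1000,                   // poll every second
  timeoutMs = 300_000,                 // max 5 minutes
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await check();
    if (result !== undefined) return result; // generation finished
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('timed out waiting for ComfyUI result');
}
```

In the real flow, `check` would hit `/history/{promptId}` and return the output filename once present; the caller then fetches the image from `/view`.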
+ #### Configuration
+
+ ```typescript
+ const ai = new Noosphere({
+   local: {
+     comfyui: {
+       enabled: true,
+       host: 'http://localhost',
+       port: 8188,
+     },
    },
- };
+ });
+ ```
 
- ai.registerProvider(myProvider);
+ #### Default Workflow
+
+ - **Checkpoint:** `sd_xl_base_1.0.safetensors`
+ - **Sampler:** euler with normal scheduler
+ - **Default Steps:** 20
+ - **Default CFG/Guidance:** 7
+ - **Default Size:** 1024x1024
+ - **Max Size:** 2048x2048
+ - **Output:** PNG
+
+ #### Models Exposed
+
+ | Model ID | Modality | Description |
+ |---|---|---|
+ | `comfyui-txt2img` | Image | Text-to-image via workflow |
+ | `comfyui-txt2vid` | Video | Planned (requires AnimateDiff workflow) |
+
+ ---
+
+ ### Local TTS — Piper & Kokoro
+
+ **Provider IDs:** `piper`, `kokoro`
+ **Modality:** TTS
+ **Type:** Local
+
+ Connects to local OpenAI-compatible TTS servers.
+
+ #### Supported Engines
+
+ | Engine | Default Port | Health Check | Voice Discovery |
+ |---|---|---|---|
+ | Piper | 5500 | `GET /health` | `GET /voices` |
+ | Kokoro | 5501 | `GET /health` | `GET /v1/models` (fallback) |
+
+ #### API
+
+ Uses the OpenAI-compatible TTS endpoint:
+
+ ```
+ POST /v1/audio/speech
+ {
+   "model": "tts-1",
+   "input": "Hello world",
+   "voice": "default",
+   "speed": 1.0,
+   "response_format": "mp3"
+ }
+ ```
+
+ Supports `mp3`, `wav`, and `ogg` formats. Returns audio as a Buffer.
+
+ ---
+
+ ## Architecture
+
+ ### Provider Resolution (Local-First)
+
+ When you call a generation method without specifying a provider, Noosphere resolves one automatically:
+
+ 1. If `model` is specified without `provider` → looks up model in registry cache
+ 2. If a `default` is configured for the modality → uses that
+ 3. Otherwise → **local providers first**, then cloud providers
+
+ ```
+ resolveProvider(modality):
+   1. Check user-specified provider ID → return if found
+   2. Check configured defaults → return if found
+   3. Scan all providers:
+      → Return first LOCAL provider supporting this modality
+      → Fallback to first CLOUD provider
+   4. Throw NO_PROVIDER error
+ ```
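In TypeScript, that resolution order can be sketched as follows. The `ProviderDesc` shape and the function signature are illustrative, not Noosphere's internal types:

```typescript
// Hypothetical sketch of the resolution order above.
interface ProviderDesc {
  id: string;
  modalities: string[];
  isLocal: boolean;
}

function resolveProvider(
  providers: ProviderDesc[],
  modality: string,
  requestedId?: string,
  defaults: Record<string, string> = {},
): ProviderDesc {
  // 1. An explicitly requested provider ID wins
  if (requestedId) {
    const match = providers.find((p) => p.id === requestedId);
    if (match) return match;
  }
  // 2. A configured default for this modality comes next
  const def = providers.find((p) => p.id === defaults[modality]);
  if (def) return def;
  // 3. Otherwise scan: first local provider, then first cloud provider
  const capable = providers.filter((p) => p.modalities.includes(modality));
  const pick = capable.find((p) => p.isLocal) ?? capable[0];
  if (!pick) throw new Error('NO_PROVIDER');
  return pick;
}
```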
+
+ ### Retry & Failover Logic
+
+ ```
+ executeWithRetry(modality, provider, fn):
+   for attempt = 0..maxRetries:
+     try: return fn()
+     catch:
+       if error is retryable AND attempts remain:
+         wait backoffMs * 2^attempt (exponential backoff)
+         retry same provider
+       if error is NOT GENERATION_FAILED AND failover enabled:
+         try each alternative provider for this modality
+   throw last error
+ ```
+
+ **Retryable errors (same provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT`, `GENERATION_FAILED`
+
+ **Failover-eligible errors (cross-provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT` (NOT `GENERATION_FAILED`)
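The retry loop itself is only a few lines. A simplified sketch with failover omitted; the `sleep` helper and the `code` property check are assumptions for this example, not the library's API:

```typescript
// Simplified sketch of the retry loop above (same-provider retries only).
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
const RETRYABLE = new Set(['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT', 'GENERATION_FAILED']);

async function executeWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 2,   // documented default
  backoffMs = 1000, // documented default
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      const code = (err as { code?: string }).code ?? '';
      if (!RETRYABLE.has(code) || attempt === maxRetries) break;
      await sleep(backoffMs * 2 ** attempt); // exponential backoff: 1s, 2s, 4s...
    }
  }
  throw lastErr;
}
```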
+
+ ### Model Registry & Caching
+
+ - Models are fetched from providers via `listModels()` and cached in memory
+ - Cache TTL is configurable (default: 60 minutes)
+ - `syncModels()` forces a refresh of all provider model lists
+ - Registry tracks model → provider mappings for fast resolution
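A TTL cache of model-to-provider mappings fits in a few lines. Purely illustrative; `ModelCache` is not an exported Noosphere class:

```typescript
// Minimal sketch of the registry cache described above.
class ModelCache {
  private entries = new Map<string, { providerId: string; expiresAt: number }>();

  constructor(private ttlMs = 60 * 60 * 1000) {} // default TTL: 60 minutes

  set(modelId: string, providerId: string): void {
    this.entries.set(modelId, { providerId, expiresAt: Date.now() + this.ttlMs });
  }

  get(modelId: string): string | undefined {
    const entry = this.entries.get(modelId);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(modelId); // stale: caller should re-run listModels()
      return undefined;
    }
    return entry.providerId;
  }
}
```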
+
+ ### Usage Tracking
+
+ Every API call (success or failure) records a `UsageEvent`:
+
+ ```typescript
+ interface UsageEvent {
+   modality: 'llm' | 'image' | 'video' | 'tts';
+   provider: string;
+   model: string;
+   cost: number;           // USD
+   latencyMs: number;
+   input?: number;         // tokens or characters
+   output?: number;        // tokens
+   unit?: string;
+   timestamp: string;      // ISO 8601
+   success: boolean;
+   error?: string;         // error message if failed
+   metadata?: Record<string, unknown>;
+ }
  ```
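Events with this shape are easy to roll up. A small illustrative helper (not part of the Noosphere API) that aggregates events into per-provider totals, using a trimmed-down event type:

```typescript
// Aggregate usage events into per-provider totals (illustrative only).
type UsageEventLite = {
  provider: string;
  cost: number;
  latencyMs: number;
  success: boolean;
};

interface ProviderStats {
  cost: number;
  calls: number;
  avgLatencyMs: number;
}

function summarize(events: UsageEventLite[]): Record<string, ProviderStats> {
  const out: Record<string, ProviderStats> = {};
  for (const e of events) {
    const s = (out[e.provider] ??= { cost: 0, calls: 0, avgLatencyMs: 0 });
    s.cost += e.cost;
    // running average of latency
    s.avgLatencyMs = (s.avgLatencyMs * s.calls + e.latencyMs) / (s.calls + 1);
    s.calls += 1;
  }
  return out;
}
```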
 
- ### Error Handling
+ ---
+
+ ## Error Handling
 
  All errors are instances of `NoosphereError`:
 
@@ -275,67 +940,100 @@ try {
    await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
  } catch (err) {
    if (err instanceof NoosphereError) {
-     console.log(err.code);           // 'RATE_LIMITED' | 'TIMEOUT' | 'AUTH_FAILED' | ...
-     console.log(err.provider);       // which provider failed
-     console.log(err.modality);       // 'llm' | 'image' | 'video' | 'tts'
-     console.log(err.isRetryable());
+     console.log(err.code);           // error code
+     console.log(err.provider);       // which provider failed
+     console.log(err.modality);       // which modality
+     console.log(err.model);          // which model (if known)
+     console.log(err.cause);          // underlying error
+     console.log(err.isRetryable());  // whether retry might help
    }
  }
  ```
 
- Error codes: `PROVIDER_UNAVAILABLE`, `MODEL_NOT_FOUND`, `AUTH_FAILED`, `RATE_LIMITED`, `TIMEOUT`, `GENERATION_FAILED`, `INVALID_INPUT`, `NO_PROVIDER`
+ ### Error Codes
 
- ### Retry & Failover
+ | Code | Description | Retryable | Failover |
+ |---|---|---|---|
+ | `PROVIDER_UNAVAILABLE` | Provider is down or unreachable | Yes | Yes |
+ | `RATE_LIMITED` | API rate limit exceeded | Yes | Yes |
+ | `TIMEOUT` | Request exceeded timeout | Yes | Yes |
+ | `GENERATION_FAILED` | Generation error (bad prompt, model issue) | Yes | No |
+ | `AUTH_FAILED` | Invalid or missing API key | No | No |
+ | `MODEL_NOT_FOUND` | Requested model doesn't exist | No | No |
+ | `INVALID_INPUT` | Bad parameters or unsupported operation | No | No |
+ | `NO_PROVIDER` | No provider available for the requested modality | No | No |
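The Retryable and Failover columns can be encoded directly as lookups. A sketch consistent with the table; the `classify` helper name is illustrative:

```typescript
// Encodes the Retryable / Failover columns of the error-code table above.
const RETRY_CODES = new Set(['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT', 'GENERATION_FAILED']);
const FAILOVER_CODES = new Set(['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT']);

function classify(code: string): { retryable: boolean; failover: boolean } {
  return { retryable: RETRY_CODES.has(code), failover: FAILOVER_CODES.has(code) };
}
```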
 
- ```typescript
- const ai = new Noosphere({
-   retry: {
-     maxRetries: 3,    // default: 2
-     backoffMs: 2000,  // default: 1000
-     failover: true,   // default: true — try other providers on failure
-     retryableErrors: ['RATE_LIMITED', 'TIMEOUT', 'PROVIDER_UNAVAILABLE'],
-   },
-   timeout: {
-     llm: 30000,     // 30s (default)
-     image: 120000,  // 2min (default)
-     video: 300000,  // 5min (default)
-     tts: 60000,     // 1min (default)
-   },
- });
- ```
+ ---
 
- ### Local Services
+ ## Custom Providers
 
- Noosphere auto-detects local AI services on startup. Configure via constructor or environment variables:
+ Extend Noosphere with your own providers:
 
  ```typescript
- const ai = new Noosphere({
-   autoDetectLocal: true,  // default
-   local: {
-     comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
-     piper: { enabled: true, host: 'http://localhost', port: 5500 },
-     kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
+ import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';
+
+ const myProvider: NoosphereProvider = {
+   // Required properties
+   id: 'my-provider',
+   name: 'My Custom Provider',
+   modalities: ['llm', 'image'] as Modality[],
+   isLocal: false,
+
+   // Required methods
+   async ping() { return true; },
+   async listModels(modality?: Modality): Promise<ModelInfo[]> {
+     return [{
+       id: 'my-model',
+       provider: 'my-provider',
+       name: 'My Model',
+       modality: 'llm',
+       local: false,
+       cost: { price: 1.0, unit: 'per_1m_tokens' },
+       capabilities: {
+         contextWindow: 128000,
+         maxTokens: 4096,
+         supportsVision: false,
+         supportsStreaming: true,
+       },
+     }];
   },
- });
- ```
 
- Environment variables: `COMFYUI_HOST`, `COMFYUI_PORT`, `PIPER_HOST`, `PIPER_PORT`, `KOKORO_HOST`, `KOKORO_PORT`, `NOOSPHERE_AUTO_DETECT_LOCAL`
+   // Optional methods: implement one per modality you support
+   async chat(options: ChatOptions): Promise<NoosphereResult> {
+     const start = Date.now();
+     // ... your implementation
+     return {
+       content: 'Response text',
+       provider: 'my-provider',
+       model: 'my-model',
+       modality: 'llm',
+       latencyMs: Date.now() - start,
+       usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
+     };
+   },
 
- ### Cleanup
+   // stream?(options): NoosphereStream
+   // image?(options): Promise<NoosphereResult>
+   // video?(options): Promise<NoosphereResult>
+   // speak?(options): Promise<NoosphereResult>
+   // dispose?(): Promise<void>
+ };
 
- ```typescript
- await ai.dispose();
+ ai.registerProvider(myProvider);
  ```
 
- ## Providers
+ ---
 
- | Provider | Modalities | Type |
- |---|---|---|
- | Pi-AI (OpenAI, Anthropic, Google, Groq, Mistral, xAI, OpenRouter) | LLM | Cloud |
- | FAL | Image, Video, TTS | Cloud |
- | Hugging Face | LLM, Image, TTS | Cloud |
- | ComfyUI | Image, Video | Local |
- | Piper / Kokoro | TTS | Local |
+ ## Provider Summary
+
+ | Provider | ID | Modalities | Type | Models | Library |
+ |---|---|---|---|---|---|
+ | Pi-AI Gateway | `pi-ai` | LLM | Cloud | 246+ | `@mariozechner/pi-ai` |
+ | FAL.ai | `fal` | Image, Video, TTS | Cloud | 867+ | `@fal-ai/client` |
+ | Hugging Face | `huggingface` | LLM, Image, TTS | Cloud | Unlimited (any HF model) | `@huggingface/inference` |
+ | ComfyUI | `comfyui` | Image | Local | SDXL workflows | Direct HTTP |
+ | Piper TTS | `piper` | TTS | Local | Piper voices | Direct HTTP |
+ | Kokoro TTS | `kokoro` | TTS | Local | Kokoro voices | Direct HTTP |
 
  ## Requirements
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "noosphere",
-   "version": "0.1.1",
+   "version": "0.1.2",
    "description": "Unified AI creation engine — text, image, video, audio across all providers",
    "type": "module",
    "main": "./dist/index.js",