noosphere 0.1.1 → 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1255 -118
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -2,13 +2,18 @@
|
|
|
2
2
|
|
|
3
3
|
Unified AI creation engine — text, image, video, and audio generation across all providers through a single interface.
|
|
4
4
|
|
|
5
|
+
One import. Every model. Every modality.
|
|
6
|
+
|
|
5
7
|
## Features
|
|
6
8
|
|
|
7
|
-
- **
|
|
8
|
-
- **
|
|
9
|
-
- **
|
|
9
|
+
- **4 modalities** — LLM chat, image generation, video generation, and text-to-speech
|
|
10
|
+
- **246+ LLM models** — via Pi-AI gateway (OpenAI, Anthropic, Google, Groq, Mistral, xAI, Cerebras, OpenRouter)
|
|
11
|
+
- **867+ media endpoints** — via FAL (Flux, SDXL, Kling, Sora 2, VEO 3, Kokoro, ElevenLabs, and hundreds more)
|
|
12
|
+
- **30+ HuggingFace tasks** — LLM, image, TTS, translation, summarization, classification, and more
|
|
13
|
+
- **Local-first architecture** — Auto-detects ComfyUI, Ollama, Piper, and Kokoro on your machine
|
|
14
|
+
- **Agentic capabilities** — Tool use, function calling, reasoning/thinking, vision, and agent loops via Pi-AI
|
|
10
15
|
- **Failover & retry** — Automatic retries with exponential backoff and cross-provider failover
|
|
11
|
-
- **Usage tracking** —
|
|
16
|
+
- **Usage tracking** — Real-time cost, latency, and token tracking across all providers
|
|
12
17
|
- **TypeScript-first** — Full type definitions with ESM and CommonJS support
|
|
13
18
|
|
|
14
19
|
## Install
|
|
@@ -56,7 +61,7 @@ const audio = await ai.speak({
|
|
|
56
61
|
|
|
57
62
|
## Configuration
|
|
58
63
|
|
|
59
|
-
API keys are resolved from the constructor or environment variables:
|
|
64
|
+
API keys are resolved from the constructor config or environment variables (config takes priority):
|
|
60
65
|
|
|
61
66
|
```typescript
|
|
62
67
|
const ai = new Noosphere({
|
|
@@ -80,47 +85,118 @@ Or set environment variables:
|
|
|
80
85
|
|---|---|
|
|
81
86
|
| `OPENAI_API_KEY` | OpenAI |
|
|
82
87
|
| `ANTHROPIC_API_KEY` | Anthropic |
|
|
83
|
-
| `GEMINI_API_KEY` | Google |
|
|
84
|
-
| `FAL_KEY` | FAL |
|
|
88
|
+
| `GEMINI_API_KEY` | Google Gemini |
|
|
89
|
+
| `FAL_KEY` | FAL.ai |
|
|
85
90
|
| `HUGGINGFACE_TOKEN` | Hugging Face |
|
|
86
91
|
| `GROQ_API_KEY` | Groq |
|
|
87
92
|
| `MISTRAL_API_KEY` | Mistral |
|
|
88
|
-
| `XAI_API_KEY` | xAI |
|
|
93
|
+
| `XAI_API_KEY` | xAI (Grok) |
|
|
89
94
|
| `OPENROUTER_API_KEY` | OpenRouter |
|
|
90
95
|
|
|
91
|
-
|
|
96
|
+
### Full Configuration Reference
|
|
97
|
+
|
|
98
|
+
```typescript
|
|
99
|
+
const ai = new Noosphere({
|
|
100
|
+
// API keys (or use env vars above)
|
|
101
|
+
keys: { /* ... */ },
|
|
102
|
+
|
|
103
|
+
// Default models per modality
|
|
104
|
+
defaults: {
|
|
105
|
+
llm: { provider: 'pi-ai', model: 'claude-sonnet-4-20250514' },
|
|
106
|
+
image: { provider: 'fal', model: 'fal-ai/flux/schnell' },
|
|
107
|
+
video: { provider: 'fal', model: 'fal-ai/kling-video/v2/master/text-to-video' },
|
|
108
|
+
tts: { provider: 'fal', model: 'fal-ai/kokoro/american-english' },
|
|
109
|
+
},
|
|
110
|
+
|
|
111
|
+
// Local service configuration
|
|
112
|
+
autoDetectLocal: true, // env: NOOSPHERE_AUTO_DETECT_LOCAL
|
|
113
|
+
local: {
|
|
114
|
+
ollama: { enabled: true, host: 'http://localhost', port: 11434 },
|
|
115
|
+
comfyui: { enabled: true, host: 'http://localhost', port: 8188 },
|
|
116
|
+
piper: { enabled: true, host: 'http://localhost', port: 5500 },
|
|
117
|
+
kokoro: { enabled: true, host: 'http://localhost', port: 5501 },
|
|
118
|
+
custom: [], // additional LocalServiceConfig[]
|
|
119
|
+
},
|
|
120
|
+
|
|
121
|
+
// Retry & failover
|
|
122
|
+
retry: {
|
|
123
|
+
maxRetries: 2, // default: 2
|
|
124
|
+
backoffMs: 1000, // default: 1000 (exponential: 1s, 2s, 4s...)
|
|
125
|
+
failover: true, // default: true — try other providers on failure
|
|
126
|
+
retryableErrors: ['PROVIDER_UNAVAILABLE', 'RATE_LIMITED', 'TIMEOUT'],
|
|
127
|
+
},
|
|
128
|
+
|
|
129
|
+
// Timeouts per modality (ms)
|
|
130
|
+
timeout: {
|
|
131
|
+
llm: 30000, // 30s
|
|
132
|
+
image: 120000, // 2min
|
|
133
|
+
video: 300000, // 5min
|
|
134
|
+
tts: 60000, // 1min
|
|
135
|
+
},
|
|
136
|
+
|
|
137
|
+
// Model discovery cache (minutes)
|
|
138
|
+
discoveryCacheTTL: 60, // env: NOOSPHERE_DISCOVERY_CACHE_TTL
|
|
139
|
+
|
|
140
|
+
// Real-time usage callback
|
|
141
|
+
onUsage: (event) => {
|
|
142
|
+
console.log(`${event.provider}/${event.model}: $${event.cost} (${event.latencyMs}ms)`);
|
|
143
|
+
},
|
|
144
|
+
});
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
### Local Service Environment Variables
|
|
148
|
+
|
|
149
|
+
| Variable | Default | Description |
|
|
150
|
+
|---|---|---|
|
|
151
|
+
| `OLLAMA_HOST` | `http://localhost` | Ollama server host |
|
|
152
|
+
| `OLLAMA_PORT` | `11434` | Ollama server port |
|
|
153
|
+
| `COMFYUI_HOST` | `http://localhost` | ComfyUI server host |
|
|
154
|
+
| `COMFYUI_PORT` | `8188` | ComfyUI server port |
|
|
155
|
+
| `PIPER_HOST` | `http://localhost` | Piper TTS server host |
|
|
156
|
+
| `PIPER_PORT` | `5500` | Piper TTS server port |
|
|
157
|
+
| `KOKORO_HOST` | `http://localhost` | Kokoro TTS server host |
|
|
158
|
+
| `KOKORO_PORT` | `5501` | Kokoro TTS server port |
|
|
159
|
+
| `NOOSPHERE_AUTO_DETECT_LOCAL` | `true` | Enable/disable local service auto-detection |
|
|
160
|
+
| `NOOSPHERE_DISCOVERY_CACHE_TTL` | `60` | Model cache TTL in minutes |
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## API Reference
|
|
92
165
|
|
|
93
166
|
### `new Noosphere(config?)`
|
|
94
167
|
|
|
95
|
-
Creates a new instance. Providers are initialized lazily on first
|
|
168
|
+
Creates a new instance. Providers are initialized lazily on first API call. Auto-detects local services via HTTP pings (2s timeout each).
|
|
96
169
|
|
|
97
|
-
### Generation
|
|
170
|
+
### Generation Methods
|
|
98
171
|
|
|
99
172
|
#### `ai.chat(options): Promise<NoosphereResult>`
|
|
100
173
|
|
|
101
|
-
Generate text with
|
|
174
|
+
Generate text with any LLM. Supports 246+ models across 8 providers.
|
|
102
175
|
|
|
103
176
|
```typescript
|
|
104
177
|
const result = await ai.chat({
|
|
105
|
-
provider: 'anthropic',
|
|
106
|
-
model: 'claude-sonnet-4-20250514',
|
|
178
|
+
provider: 'anthropic', // optional — auto-resolved if omitted
|
|
179
|
+
model: 'claude-sonnet-4-20250514', // optional — uses default or first available
|
|
107
180
|
messages: [
|
|
108
181
|
{ role: 'system', content: 'You are helpful.' },
|
|
109
182
|
{ role: 'user', content: 'Explain quantum computing' },
|
|
110
183
|
],
|
|
111
|
-
temperature: 0.7,
|
|
112
|
-
maxTokens: 1024,
|
|
113
|
-
jsonMode: false,
|
|
184
|
+
temperature: 0.7, // optional (0-2)
|
|
185
|
+
maxTokens: 1024, // optional
|
|
186
|
+
jsonMode: false, // optional
|
|
114
187
|
});
|
|
115
188
|
|
|
116
|
-
console.log(result.content);
|
|
117
|
-
console.log(result.thinking);
|
|
118
|
-
console.log(result.usage.cost);
|
|
189
|
+
console.log(result.content); // response text
|
|
190
|
+
console.log(result.thinking); // reasoning output (Claude, GPT-5, o3, Gemini, Grok-4)
|
|
191
|
+
console.log(result.usage.cost); // cost in USD
|
|
192
|
+
console.log(result.usage.input); // input tokens
|
|
193
|
+
console.log(result.usage.output); // output tokens
|
|
194
|
+
console.log(result.latencyMs); // response time in ms
|
|
119
195
|
```
|
|
120
196
|
|
|
121
197
|
#### `ai.stream(options): NoosphereStream`
|
|
122
198
|
|
|
123
|
-
Stream LLM responses.
|
|
199
|
+
Stream LLM responses token-by-token. Same options as `chat()`.
|
|
124
200
|
|
|
125
201
|
```typescript
|
|
126
202
|
const stream = ai.stream({
|
|
@@ -128,67 +204,95 @@ const stream = ai.stream({
|
|
|
128
204
|
});
|
|
129
205
|
|
|
130
206
|
for await (const event of stream) {
|
|
131
|
-
|
|
132
|
-
|
|
207
|
+
switch (event.type) {
|
|
208
|
+
case 'text_delta':
|
|
209
|
+
process.stdout.write(event.delta!);
|
|
210
|
+
break;
|
|
211
|
+
case 'thinking_delta':
|
|
212
|
+
console.log('[thinking]', event.delta);
|
|
213
|
+
break;
|
|
214
|
+
case 'done':
|
|
215
|
+
console.log('\n\nUsage:', event.result!.usage);
|
|
216
|
+
break;
|
|
217
|
+
case 'error':
|
|
218
|
+
console.error(event.error);
|
|
219
|
+
break;
|
|
220
|
+
}
|
|
133
221
|
}
|
|
134
222
|
|
|
135
|
-
// Or
|
|
223
|
+
// Or consume the full result
|
|
136
224
|
const result = await stream.result();
|
|
225
|
+
|
|
226
|
+
// Abort at any time
|
|
227
|
+
stream.abort();
|
|
137
228
|
```
|
|
138
229
|
|
|
139
230
|
#### `ai.image(options): Promise<NoosphereResult>`
|
|
140
231
|
|
|
141
|
-
Generate images.
|
|
232
|
+
Generate images. Supports 200+ image models via FAL, HuggingFace, and ComfyUI.
|
|
142
233
|
|
|
143
234
|
```typescript
|
|
144
235
|
const result = await ai.image({
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
236
|
+
provider: 'fal', // optional
|
|
237
|
+
model: 'fal-ai/flux-2-pro', // optional
|
|
238
|
+
prompt: 'A futuristic cityscape at sunset',
|
|
239
|
+
negativePrompt: 'blurry, low quality', // optional
|
|
240
|
+
width: 1024, // optional
|
|
241
|
+
height: 768, // optional
|
|
242
|
+
seed: 42, // optional — reproducible results
|
|
243
|
+
steps: 30, // optional — inference steps (more = higher quality)
|
|
244
|
+
guidanceScale: 7.5, // optional — prompt adherence (higher = stricter)
|
|
152
245
|
});
|
|
153
246
|
|
|
154
|
-
console.log(result.url);
|
|
155
|
-
console.log(result.
|
|
247
|
+
console.log(result.url); // image URL (FAL)
|
|
248
|
+
console.log(result.buffer); // image Buffer (HuggingFace, ComfyUI)
|
|
249
|
+
console.log(result.media?.width); // actual dimensions
|
|
250
|
+
console.log(result.media?.height);
|
|
251
|
+
console.log(result.media?.format); // 'png'
|
|
156
252
|
```
|
|
157
253
|
|
|
158
254
|
#### `ai.video(options): Promise<NoosphereResult>`
|
|
159
255
|
|
|
160
|
-
Generate videos.
|
|
256
|
+
Generate videos. Supports 150+ video models via FAL (Kling, Sora 2, VEO 3, WAN, Pixverse, and more).
|
|
161
257
|
|
|
162
258
|
```typescript
|
|
163
259
|
const result = await ai.video({
|
|
260
|
+
provider: 'fal',
|
|
261
|
+
model: 'fal-ai/kling-video/v2/master/text-to-video',
|
|
164
262
|
prompt: 'A bird flying through clouds',
|
|
165
|
-
imageUrl: 'https://...',
|
|
166
|
-
duration: 5,
|
|
167
|
-
fps: 24,
|
|
168
|
-
width: 1280,
|
|
169
|
-
height: 720,
|
|
263
|
+
imageUrl: 'https://...', // optional — image-to-video
|
|
264
|
+
duration: 5, // optional — seconds
|
|
265
|
+
fps: 24, // optional
|
|
266
|
+
width: 1280, // optional
|
|
267
|
+
height: 720, // optional
|
|
170
268
|
});
|
|
171
269
|
|
|
172
|
-
console.log(result.url);
|
|
270
|
+
console.log(result.url); // video URL
|
|
271
|
+
console.log(result.media?.duration); // actual duration
|
|
272
|
+
console.log(result.media?.fps); // frames per second
|
|
273
|
+
console.log(result.media?.format); // 'mp4'
|
|
173
274
|
```
|
|
174
275
|
|
|
175
276
|
#### `ai.speak(options): Promise<NoosphereResult>`
|
|
176
277
|
|
|
177
|
-
Text-to-speech synthesis.
|
|
278
|
+
Text-to-speech synthesis. Supports 50+ TTS models via FAL, HuggingFace, Piper, and Kokoro.
|
|
178
279
|
|
|
179
280
|
```typescript
|
|
180
281
|
const result = await ai.speak({
|
|
282
|
+
provider: 'fal',
|
|
283
|
+
model: 'fal-ai/kokoro/american-english',
|
|
181
284
|
text: 'Hello world',
|
|
182
|
-
voice: '
|
|
183
|
-
language: 'en',
|
|
184
|
-
speed: 1.0,
|
|
185
|
-
format: 'mp3',
|
|
285
|
+
voice: 'af_heart', // optional — voice ID
|
|
286
|
+
language: 'en', // optional
|
|
287
|
+
speed: 1.0, // optional
|
|
288
|
+
format: 'mp3', // optional — 'mp3' | 'wav' | 'ogg'
|
|
186
289
|
});
|
|
187
290
|
|
|
188
|
-
|
|
291
|
+
console.log(result.buffer); // audio Buffer
|
|
292
|
+
console.log(result.url); // audio URL (FAL)
|
|
189
293
|
```
|
|
190
294
|
|
|
191
|
-
### Discovery
|
|
295
|
+
### Discovery Methods
|
|
192
296
|
|
|
193
297
|
#### `ai.getProviders(modality?): Promise<ProviderInfo[]>`
|
|
194
298
|
|
|
@@ -196,15 +300,16 @@ List available providers, optionally filtered by modality.
|
|
|
196
300
|
|
|
197
301
|
```typescript
|
|
198
302
|
const providers = await ai.getProviders('llm');
|
|
199
|
-
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount:
|
|
303
|
+
// [{ id: 'pi-ai', name: 'Pi-AI', modalities: ['llm'], local: false, status: 'online', modelCount: 246 }]
|
|
200
304
|
```
|
|
201
305
|
|
|
202
306
|
#### `ai.getModels(modality?): Promise<ModelInfo[]>`
|
|
203
307
|
|
|
204
|
-
List all available models.
|
|
308
|
+
List all available models with full metadata.
|
|
205
309
|
|
|
206
310
|
```typescript
|
|
207
311
|
const models = await ai.getModels('image');
|
|
312
|
+
// Returns ModelInfo[] with id, provider, name, modality, local, cost, capabilities
|
|
208
313
|
```
|
|
209
314
|
|
|
210
315
|
#### `ai.getModel(provider, modelId): Promise<ModelInfo | null>`
|
|
@@ -213,129 +318,1161 @@ Get details about a specific model.
|
|
|
213
318
|
|
|
214
319
|
#### `ai.syncModels(): Promise<SyncResult>`
|
|
215
320
|
|
|
216
|
-
Refresh model lists from all providers.
|
|
321
|
+
Refresh model lists from all providers. Returns sync count, per-provider breakdown, and any errors.
|
|
217
322
|
|
|
218
323
|
### Usage Tracking
|
|
219
324
|
|
|
220
325
|
#### `ai.getUsage(options?): UsageSummary`
|
|
221
326
|
|
|
222
|
-
Get aggregated usage statistics.
|
|
327
|
+
Get aggregated usage statistics with optional filtering.
|
|
223
328
|
|
|
224
329
|
```typescript
|
|
225
|
-
const usage = ai.getUsage({
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
330
|
+
const usage = ai.getUsage({
|
|
331
|
+
since: '2024-01-01', // optional — ISO date or Date object
|
|
332
|
+
until: '2024-12-31', // optional
|
|
333
|
+
provider: 'openai', // optional — filter by provider
|
|
334
|
+
modality: 'llm', // optional — filter by modality
|
|
335
|
+
});
|
|
336
|
+
|
|
337
|
+
console.log(usage.totalCost); // total USD spent
|
|
338
|
+
console.log(usage.totalRequests); // number of requests
|
|
339
|
+
console.log(usage.byProvider); // { openai: 2.50, anthropic: 1.20, fal: 0.30 }
|
|
340
|
+
console.log(usage.byModality); // { llm: 3.00, image: 0.70, video: 0.30, tts: 0.00 }
|
|
230
341
|
```
|
|
231
342
|
|
|
232
|
-
|
|
343
|
+
### Lifecycle
|
|
344
|
+
|
|
345
|
+
#### `ai.registerProvider(provider): void`
|
|
346
|
+
|
|
347
|
+
Register a custom provider (see [Custom Providers](#custom-providers)).
|
|
348
|
+
|
|
349
|
+
#### `ai.dispose(): Promise<void>`
|
|
350
|
+
|
|
351
|
+
Cleanup all provider resources, clear model cache, and reset usage tracker.
|
|
352
|
+
|
|
353
|
+
### NoosphereResult
|
|
354
|
+
|
|
355
|
+
Every generation method returns a `NoosphereResult`:
|
|
233
356
|
|
|
234
357
|
```typescript
|
|
235
|
-
|
|
236
|
-
|
|
237
|
-
|
|
358
|
+
interface NoosphereResult {
|
|
359
|
+
content?: string; // LLM response text
|
|
360
|
+
thinking?: string; // reasoning/thinking output (supported models)
|
|
361
|
+
url?: string; // media URL (images, videos, audio from cloud providers)
|
|
362
|
+
buffer?: Buffer; // media binary data (local providers, HuggingFace)
|
|
363
|
+
provider: string; // which provider handled the request
|
|
364
|
+
model: string; // which model was used
|
|
365
|
+
modality: Modality; // 'llm' | 'image' | 'video' | 'tts'
|
|
366
|
+
latencyMs: number; // request duration in milliseconds
|
|
367
|
+
usage: {
|
|
368
|
+
cost: number; // cost in USD
|
|
369
|
+
input?: number; // input tokens/characters
|
|
370
|
+
output?: number; // output tokens
|
|
371
|
+
unit?: string; // 'tokens' | 'characters' | 'per_image' | 'per_second' | 'free'
|
|
372
|
+
};
|
|
373
|
+
media?: {
|
|
374
|
+
width?: number; // image/video width
|
|
375
|
+
height?: number; // image/video height
|
|
376
|
+
duration?: number; // video/audio duration in seconds
|
|
377
|
+
format?: string; // 'png' | 'mp4' | 'mp3' | 'wav'
|
|
378
|
+
fps?: number; // video frames per second
|
|
379
|
+
};
|
|
380
|
+
}
|
|
381
|
+
```
|
|
382
|
+
|
|
383
|
+
---
|
|
384
|
+
|
|
385
|
+
## Providers In Depth
|
|
386
|
+
|
|
387
|
+
### Pi-AI — LLM Gateway (246+ models)
|
|
388
|
+
|
|
389
|
+
**Provider ID:** `pi-ai`
|
|
390
|
+
**Modalities:** LLM (chat + streaming)
|
|
391
|
+
**Library:** `@mariozechner/pi-ai`
|
|
392
|
+
|
|
393
|
+
A unified gateway that routes to 8 LLM providers through 4 different API protocols:
|
|
394
|
+
|
|
395
|
+
| API Protocol | Providers |
|
|
396
|
+
|---|---|
|
|
397
|
+
| `anthropic-messages` | Anthropic |
|
|
398
|
+
| `google-generative-ai` | Google |
|
|
399
|
+
| `openai-responses` | OpenAI (reasoning models) |
|
|
400
|
+
| `openai-completions` | OpenAI, xAI, Groq, Cerebras, Zai, OpenRouter |
|
|
401
|
+
|
|
402
|
+
#### Anthropic Models (19)
|
|
403
|
+
|
|
404
|
+
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|
|
405
|
+
|---|---|---|---|---|---|
|
|
406
|
+
| `claude-opus-4-0` | 200k | Yes | Yes | $15/M | $75/M |
|
|
407
|
+
| `claude-opus-4-1` | 200k | Yes | Yes | $15/M | $75/M |
|
|
408
|
+
| `claude-sonnet-4-20250514` | 200k | Yes | Yes | $3/M | $15/M |
|
|
409
|
+
| `claude-sonnet-4-5-20250929` | 200k | Yes | Yes | $3/M | $15/M |
|
|
410
|
+
| `claude-3-7-sonnet-20250219` | 200k | Yes | Yes | $3/M | $15/M |
|
|
411
|
+
| `claude-3-5-sonnet-20241022` | 200k | No | Yes | $3/M | $15/M |
|
|
412
|
+
| `claude-haiku-4-5-20251001` | 200k | No | Yes | $0.80/M | $4/M |
|
|
413
|
+
| `claude-3-5-haiku-20241022` | 200k | No | Yes | $0.80/M | $4/M |
|
|
414
|
+
| `claude-3-haiku-20240307` | 200k | No | Yes | $0.25/M | $1.25/M |
|
|
415
|
+
| *...and 10 more variants* | | | | | |
|
|
416
|
+
|
|
417
|
+
#### OpenAI Models (24)
|
|
418
|
+
|
|
419
|
+
| Model | Context | Reasoning | Vision | Input Cost | Output Cost |
|
|
420
|
+
|---|---|---|---|---|---|
|
|
421
|
+
| `gpt-5` | 200k | Yes | Yes | $10/M | $30/M |
|
|
422
|
+
| `gpt-5-mini` | 200k | Yes | Yes | $2.50/M | $10/M |
|
|
423
|
+
| `gpt-4.1` | 128k | No | Yes | $2/M | $8/M |
|
|
424
|
+
| `gpt-4.1-mini` | 128k | No | Yes | $0.40/M | $1.60/M |
|
|
425
|
+
| `gpt-4.1-nano` | 128k | No | Yes | $0.10/M | $0.40/M |
|
|
426
|
+
| `gpt-4o` | 128k | No | Yes | $2.50/M | $10/M |
|
|
427
|
+
| `gpt-4o-mini` | 128k | No | Yes | $0.15/M | $0.60/M |
|
|
428
|
+
| `o3-pro` | 200k | Yes | Yes | $20/M | $80/M |
|
|
429
|
+
| `o3-mini` | 200k | Yes | Yes | $1.10/M | $4.40/M |
|
|
430
|
+
| `o4-mini` | 200k | Yes | Yes | $1.10/M | $4.40/M |
|
|
431
|
+
| `codex-mini-latest` | 200k | Yes | No | $1.50/M | $6/M |
|
|
432
|
+
| *...and 13 more variants* | | | | | |
|
|
433
|
+
|
|
434
|
+
#### Google Gemini Models (19)
|
|
435
|
+
|
|
436
|
+
| Model | Context | Reasoning | Vision | Cost |
|
|
437
|
+
|---|---|---|---|---|
|
|
438
|
+
| `gemini-2.5-flash` | 1M | Yes | Yes | $0.15-0.60/M |
|
|
439
|
+
| `gemini-2.5-pro` | 1M | Yes | Yes | $1.25-10/M |
|
|
440
|
+
| `gemini-2.0-flash` | 1M | No | Yes | $0.10-0.40/M |
|
|
441
|
+
| `gemini-2.0-flash-lite` | 1M | No | Yes | $0.025-0.10/M |
|
|
442
|
+
| `gemini-1.5-flash` | 1M | No | Yes | $0.075-0.30/M |
|
|
443
|
+
| `gemini-1.5-pro` | 2M | No | Yes | $1.25-5/M |
|
|
444
|
+
| *...and 13 more variants* | | | | |
|
|
445
|
+
|
|
446
|
+
#### xAI Grok Models (20)
|
|
447
|
+
|
|
448
|
+
| Model | Context | Reasoning | Vision | Input Cost |
|
|
449
|
+
|---|---|---|---|---|
|
|
450
|
+
| `grok-4` | 256k | Yes | Yes | $5/M |
|
|
451
|
+
| `grok-4-fast` | 256k | Yes | Yes | $3/M |
|
|
452
|
+
| `grok-3` | 131k | No | Yes | $3/M |
|
|
453
|
+
| `grok-3-fast` | 131k | No | Yes | $5/M |
|
|
454
|
+
| `grok-3-mini-fast-latest` | 131k | Yes | No | $0.30/M |
|
|
455
|
+
| `grok-2-vision` | 32k | No | Yes | $2/M |
|
|
456
|
+
| *...and 14 more variants* | | | | |
|
|
457
|
+
|
|
458
|
+
#### Groq Models (15)
|
|
459
|
+
|
|
460
|
+
| Model | Context | Cost |
|
|
461
|
+
|---|---|---|
|
|
462
|
+
| `llama-3.3-70b-versatile` | 128k | $0.59/M |
|
|
463
|
+
| `llama-3.1-8b-instant` | 128k | $0.05/M |
|
|
464
|
+
| `mistral-saba-24b` | 32k | $0.40/M |
|
|
465
|
+
| `qwen-qwq-32b` | 128k | $0.29/M |
|
|
466
|
+
| `deepseek-r1-distill-llama-70b` | 128k | $0.75/M |
|
|
467
|
+
| *...and 10 more* | | |
|
|
468
|
+
|
|
469
|
+
#### Cerebras Models (3)
|
|
470
|
+
|
|
471
|
+
`gpt-oss-120b`, `qwen-3-235b-a22b-instruct-2507`, `qwen-3-coder-480b`
|
|
472
|
+
|
|
473
|
+
#### Zai Models (5)
|
|
474
|
+
|
|
475
|
+
`glm-4.6`, `glm-4.5`, `glm-4.5-flash`, `glm-4.5v`, `glm-4.5-air`
|
|
476
|
+
|
|
477
|
+
#### OpenRouter (141 models)
|
|
478
|
+
|
|
479
|
+
Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via `ai.getModels('llm')`.
|
|
480
|
+
|
|
481
|
+
#### The Pi-AI Engine — Deep Dive
|
|
482
|
+
|
|
483
|
+
Noosphere's LLM provider is powered by `@mariozechner/pi-ai`, part of the **Pi mono-repo** by Mario Zechner (badlogic). Pi is NOT a wrapper like LangChain or Mastra — it's a **micro-framework for agentic AI** (~15K LOC, 4 npm packages) that was built from scratch as a minimalist alternative to Claude Code.
|
|
484
|
+
|
|
485
|
+
Pi consists of 4 packages in 3 tiers:
|
|
486
|
+
|
|
487
|
+
```
|
|
488
|
+
TIER 1 — FOUNDATION
|
|
489
|
+
@mariozechner/pi-ai LLM API: stream(), complete(), model registry
|
|
490
|
+
0 internal deps, talks to 20+ providers
|
|
491
|
+
|
|
492
|
+
TIER 2 — INFRASTRUCTURE
|
|
493
|
+
@mariozechner/pi-agent-core Agent loop, tool execution, lifecycle events
|
|
494
|
+
Depends on pi-ai
|
|
495
|
+
|
|
496
|
+
@mariozechner/pi-tui Terminal UI with differential rendering
|
|
497
|
+
Standalone, 0 internal deps
|
|
498
|
+
|
|
499
|
+
TIER 3 — APPLICATION
|
|
500
|
+
@mariozechner/pi-coding-agent CLI + SDK: sessions, compaction, extensions
|
|
501
|
+
Depends on all above
|
|
502
|
+
```
|
|
503
|
+
|
|
504
|
+
Noosphere uses `@mariozechner/pi-ai` (Tier 1) directly for LLM access. But the full Pi ecosystem provides capabilities that can be layered on top.
|
|
505
|
+
|
|
506
|
+
---
|
|
507
|
+
|
|
508
|
+
#### How Pi Keeps 200+ Models Updated
|
|
509
|
+
|
|
510
|
+
Pi does NOT hardcode models. It has an **auto-generation pipeline** that runs at build time:
|
|
511
|
+
|
|
512
|
+
```
|
|
513
|
+
STEP 1: FETCH (3 sources in parallel)
|
|
514
|
+
┌──────────────────┐ ┌──────────────────┐ ┌───────────────┐
|
|
515
|
+
│ models.dev │ │ OpenRouter │ │ Vercel AI │
|
|
516
|
+
│ /api.json │ │ /v1/models │ │ Gateway │
|
|
517
|
+
│ │ │ │ │ /v1/models │
|
|
518
|
+
│ Context windows │ │ Pricing ($/M) │ │ Capability │
|
|
519
|
+
│ Capabilities │ │ Availability │ │ tags │
|
|
520
|
+
│ Tool support │ │ Provider routing │ │ │
|
|
521
|
+
└────────┬─────────┘ └────────┬─────────┘ └──────┬────────┘
|
|
522
|
+
└─────────┬───────────┴────────────────────┘
|
|
523
|
+
▼
|
|
524
|
+
STEP 2: MERGE & DEDUPLICATE
|
|
525
|
+
Priority: models.dev > OpenRouter > Vercel
|
|
526
|
+
Key: provider + modelId
|
|
527
|
+
│
|
|
528
|
+
▼
|
|
529
|
+
STEP 3: FILTER
|
|
530
|
+
✅ tool_call === true
|
|
531
|
+
✅ streaming supported
|
|
532
|
+
✅ system messages supported
|
|
533
|
+
✅ not deprecated
|
|
534
|
+
│
|
|
535
|
+
▼
|
|
536
|
+
STEP 4: NORMALIZE
|
|
537
|
+
Costs → $/million tokens
|
|
538
|
+
API type → one of 4 protocols
|
|
539
|
+
Input modes → ["text"] or ["text","image"]
|
|
540
|
+
│
|
|
541
|
+
▼
|
|
542
|
+
STEP 5: PATCH (manual corrections)
|
|
543
|
+
Claude Opus: cache pricing fix
|
|
544
|
+
GPT-5.4: context window override
|
|
545
|
+
Kimi K2.5: hardcoded pricing
|
|
546
|
+
│
|
|
547
|
+
▼
|
|
548
|
+
STEP 6: GENERATE TypeScript
|
|
549
|
+
→ models.generated.ts (~330KB)
|
|
550
|
+
→ 200+ models with full type safety
|
|
551
|
+
```
|
|
552
|
+
|
|
553
|
+
Each generated model entry looks like:
|
|
554
|
+
|
|
555
|
+
```typescript
|
|
556
|
+
{
|
|
557
|
+
id: "claude-opus-4-6",
|
|
558
|
+
name: "Claude Opus 4.6",
|
|
559
|
+
api: "anthropic-messages",
|
|
560
|
+
provider: "anthropic",
|
|
561
|
+
baseUrl: "https://api.anthropic.com",
|
|
562
|
+
reasoning: true,
|
|
563
|
+
input: ["text", "image"],
|
|
564
|
+
cost: {
|
|
565
|
+
input: 15, // $15/M tokens
|
|
566
|
+
output: 75, // $75/M tokens
|
|
567
|
+
cacheRead: 1.5, // prompt cache hit
|
|
568
|
+
cacheWrite: 18.75, // prompt cache write
|
|
238
569
|
},
|
|
239
|
-
|
|
570
|
+
contextWindow: 200_000,
|
|
571
|
+
maxTokens: 32_000,
|
|
572
|
+
} satisfies Model<"anthropic-messages">
|
|
240
573
|
```
|
|
241
574
|
|
|
242
|
-
|
|
575
|
+
When a new model is released (e.g., Gemini 3.0), it appears in models.dev/OpenRouter → the script captures it → a new Pi version is published → Noosphere updates its dependency.
|
|
576
|
+
|
|
577
|
+
---
|
|
243
578
|
|
|
244
|
-
|
|
579
|
+
#### 4 API Protocols — How Pi Talks to Every Provider
|
|
580
|
+
|
|
581
|
+
Pi abstracts all LLM providers into 4 wire protocols. Each protocol handles the differences in request format, streaming format, auth headers, and response parsing:
|
|
582
|
+
|
|
583
|
+
| Protocol | Providers | Key Differences |
|
|
584
|
+
|---|---|---|
|
|
585
|
+
| `anthropic-messages` | Anthropic, AWS Bedrock | `system` as top-level field, content as `[{type:"text", text:"..."}]` blocks, `x-api-key` auth, `anthropic-beta` headers |
|
|
586
|
+
| `openai-completions` | OpenAI, xAI, Groq, Cerebras, OpenRouter, Ollama, vLLM | `system` as message with `role:"system"`, content as string, `Authorization: Bearer` auth, `tool_calls` array |
|
|
587
|
+
| `openai-responses` | OpenAI (reasoning models) | New Responses API with server-side context, `store: true`, reasoning summaries |
|
|
588
|
+
| `google-generative-ai` | Google Gemini, Vertex AI | `systemInstruction.parts[{text}]`, role `"model"` instead of `"assistant"`, `functionCall` instead of `tool_calls`, `thinkingConfig` |
|
|
589
|
+
|
|
590
|
+
The core function `streamSimple()` detects which protocol to use based on `model.api` and handles all the formatting/parsing transparently:
|
|
245
591
|
|
|
246
592
|
```typescript
|
|
247
|
-
|
|
593
|
+
// What happens inside Pi when you call Noosphere's chat():
|
|
594
|
+
async function* streamSimple(
|
|
595
|
+
model: Model, // includes model.api to determine protocol
|
|
596
|
+
context: Context, // { systemPrompt, messages, tools }
|
|
597
|
+
options?: StreamOptions // { signal, onPayload, thinkingLevel, ... }
|
|
598
|
+
): AsyncIterable<AssistantMessageEvent> {
|
|
599
|
+
// 1. Format request according to model.api protocol
|
|
600
|
+
// 2. Open SSE/WebSocket stream
|
|
601
|
+
// 3. Parse provider-specific chunks
|
|
602
|
+
// 4. Emit normalized events:
|
|
603
|
+
// → text_delta, thinking_delta, tool_call, message_end
|
|
604
|
+
}
|
|
605
|
+
```
|
|
248
606
|
|
|
249
|
-
|
|
250
|
-
id: 'my-provider',
|
|
251
|
-
name: 'My Provider',
|
|
252
|
-
modalities: ['llm'],
|
|
253
|
-
isLocal: false,
|
|
607
|
+
---
|
|
254
608
|
|
|
255
|
-
|
|
256
|
-
async listModels() { return [/* ... */]; },
|
|
609
|
+
#### Agentic Capabilities
|
|
257
610
|
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
611
|
+
These are the capabilities people get access to through the Pi-AI engine:
|
|
612
|
+
|
|
613
|
+
##### 1. Tool Use / Function Calling
|
|
614
|
+
|
|
615
|
+
Full structured tool calling supported across **all major providers**. Tool definitions use TypeBox schemas with runtime validation via AJV:
|
|
616
|
+
|
|
617
|
+
```typescript
|
|
618
|
+
import { type Tool, StringEnum } from '@mariozechner/pi-ai';
|
|
619
|
+
import { Type } from '@sinclair/typebox';
|
|
620
|
+
|
|
621
|
+
// Define a tool with typed parameters
|
|
622
|
+
const searchTool: Tool = {
|
|
623
|
+
name: 'web_search',
|
|
624
|
+
description: 'Search the web for information',
|
|
625
|
+
parameters: Type.Object({
|
|
626
|
+
query: Type.String({ description: 'Search query' }),
|
|
627
|
+
maxResults: Type.Optional(Type.Number({ default: 5 })),
|
|
628
|
+
type: StringEnum(['web', 'images', 'news'], { description: 'Search type' }),
|
|
629
|
+
}),
|
|
262
630
|
};
|
|
263
631
|
|
|
264
|
-
|
|
632
|
+
// Pass tools in context — Pi handles the rest
|
|
633
|
+
const context = {
|
|
634
|
+
systemPrompt: 'You are a helpful assistant.',
|
|
635
|
+
messages: [{ role: 'user', content: 'Search for recent AI news' }],
|
|
636
|
+
tools: [searchTool],
|
|
637
|
+
};
|
|
265
638
|
```
|
|
266
639
|
|
|
267
|
-
|
|
640
|
+
**How tool calling works internally:**
|
|
268
641
|
|
|
269
|
-
|
|
642
|
+
```
|
|
643
|
+
User prompt → LLM → "I need to call web_search"
|
|
644
|
+
│
|
|
645
|
+
▼
|
|
646
|
+
Pi validates arguments with AJV
|
|
647
|
+
against the TypeBox schema
|
|
648
|
+
│
|
|
649
|
+
┌─────┴─────┐
|
|
650
|
+
│ Valid? │
|
|
651
|
+
├─Yes───────┤
|
|
652
|
+
│ Execute │
|
|
653
|
+
│ tool │
|
|
654
|
+
├───────────┤
|
|
655
|
+
│ No │
|
|
656
|
+
│ Return │
|
|
657
|
+
│ validation│
|
|
658
|
+
│ error to │
|
|
659
|
+
│ LLM │
|
|
660
|
+
└───────────┘
|
|
661
|
+
│
|
|
662
|
+
▼
|
|
663
|
+
Tool result → back into context → LLM continues
|
|
664
|
+
```
|
|
665
|
+
|
|
666
|
+
**Provider-specific tool_choice control:**
|
|
667
|
+
- **Anthropic:** `"auto" | "any" | "none" | { type: "tool", name: "specific_tool" }`
|
|
668
|
+
- **OpenAI:** `"auto" | "none" | "required" | { type: "function", function: { name: "..." } }`
|
|
669
|
+
- **Google:** `"auto" | "none" | "any"`
|
|
670
|
+
|
|
671
|
+
**Partial JSON streaming:** During streaming, Pi parses tool call arguments incrementally using partial JSON parsing. This means you can see tool arguments being built in real-time, not just after the tool call completes.
|
|
672
|
+
|
|
673
|
+
##### 2. Reasoning / Extended Thinking
|
|
674
|
+
|
|
675
|
+
Pi provides **unified thinking support** across all providers that support it. Thinking blocks are automatically extracted, separated from regular text, and streamed as distinct events:
|
|
676
|
+
|
|
677
|
+
| Provider | Models | Control Parameters | How It Works |
|
|
678
|
+
|---|---|---|---|
|
|
679
|
+
| **Anthropic** | Claude Opus, Sonnet 4+ | `thinkingEnabled: boolean`, `thinkingBudgetTokens: number` | Extended thinking blocks in response, separate `thinking` content type |
|
|
680
|
+
| **OpenAI** | o1, o3, o4, GPT-5 | `reasoningEffort: "minimal" \| "low" \| "medium" \| "high"` | Reasoning via Responses API, `reasoningSummary: "auto" \| "detailed" \| "concise"` |
|
|
681
|
+
| **Google** | Gemini 2.5 Flash/Pro | `thinking.enabled: boolean`, `thinking.budgetTokens: number` | Thinking via `thinkingConfig`, mapped to effort levels |
|
|
682
|
+
| **xAI** | Grok-4, Grok-3-mini | Native reasoning | Automatic when model supports it |
|
|
683
|
+
|
|
684
|
+
**Cross-provider thinking portability:** When switching models mid-conversation, Pi converts thinking blocks between formats. Anthropic thinking blocks become `<thinking>` tagged text when sent to OpenAI/Google, and vice versa.
|
|
270
685
|
|
|
271
686
|
```typescript
|
|
272
|
-
|
|
687
|
+
// Thinking is automatically extracted in Noosphere responses:
|
|
688
|
+
const result = await ai.chat({
|
|
689
|
+
model: 'claude-opus-4-6',
|
|
690
|
+
messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
|
|
691
|
+
});
|
|
273
692
|
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
|
|
281
|
-
|
|
693
|
+
console.log(result.thinking); // "Let me work through this... 15! = 15 × 14 × 13!..."
|
|
694
|
+
console.log(result.content); // "15! / 13! = 15 × 14 = 210"
|
|
695
|
+
|
|
696
|
+
// During streaming, thinking arrives as separate events:
|
|
697
|
+
const stream = ai.stream({ messages: [...] });
|
|
698
|
+
for await (const event of stream) {
|
|
699
|
+
if (event.type === 'thinking_delta') console.log('[THINKING]', event.delta);
|
|
700
|
+
if (event.type === 'text_delta') console.log('[RESPONSE]', event.delta);
|
|
701
|
+
}
|
|
702
|
+
```
|
|
703
|
+
|
|
704
|
+
##### 3. Vision / Multimodal Input
|
|
705
|
+
|
|
706
|
+
Models with `input: ["text", "image"]` accept images alongside text. Pi handles the encoding and format differences per provider:
|
|
707
|
+
|
|
708
|
+
```typescript
|
|
709
|
+
// Send images to vision-capable models
|
|
710
|
+
const messages = [{
|
|
711
|
+
role: 'user',
|
|
712
|
+
content: [
|
|
713
|
+
{ type: 'text', text: 'What is in this image?' },
|
|
714
|
+
{ type: 'image', data: base64PngString, mimeType: 'image/png' },
|
|
715
|
+
],
|
|
716
|
+
}];
|
|
717
|
+
|
|
718
|
+
// Supported MIME types: image/png, image/jpeg, image/gif, image/webp
|
|
719
|
+
// Images are silently ignored when sent to non-vision models
|
|
720
|
+
```
|
|
721
|
+
|
|
722
|
+
**Vision-capable models include:** All Claude models, all GPT-4o/GPT-5 models, Gemini models, Grok-2-vision, Grok-4, and select Groq models.
|
|
723
|
+
|
|
724
|
+
##### 4. Agent Loop — Autonomous Tool Execution
|
|
725
|
+
|
|
726
|
+
The `@mariozechner/pi-agent-core` package provides a complete agent loop that automatically cycles through `prompt → LLM → tool call → result → repeat` until the task is done:
|
|
727
|
+
|
|
728
|
+
```typescript
|
|
729
|
+
import { agentLoop } from '@mariozechner/pi-agent-core';
|
|
730
|
+
|
|
731
|
+
const events = agentLoop(userMessage, agentContext, {
|
|
732
|
+
model: getModel('anthropic', 'claude-opus-4-6'),
|
|
733
|
+
tools: [searchTool, readFileTool, writeFileTool],
|
|
734
|
+
signal: abortController.signal,
|
|
735
|
+
});
|
|
736
|
+
|
|
737
|
+
for await (const event of events) {
|
|
738
|
+
switch (event.type) {
|
|
739
|
+
case 'agent_start': // Agent begins
|
|
740
|
+
case 'turn_start': // New LLM turn begins
|
|
741
|
+
case 'message_start': // LLM starts responding
|
|
742
|
+
case 'message_update': // Text/thinking delta received
|
|
743
|
+
case 'tool_execution_start': // About to execute a tool
|
|
744
|
+
case 'tool_execution_end': // Tool finished, result available
|
|
745
|
+
case 'message_end': // LLM finished this message
|
|
746
|
+
case 'turn_end': // Turn complete (may loop if tools were called)
|
|
747
|
+
case 'agent_end': // All done, final messages available
|
|
282
748
|
}
|
|
283
749
|
}
|
|
284
750
|
```
|
|
285
751
|
|
|
286
|
-
|
|
752
|
+
**The agent loop state machine:**
|
|
287
753
|
|
|
288
|
-
|
|
754
|
+
```
|
|
755
|
+
[User sends prompt]
|
|
756
|
+
│
|
|
757
|
+
▼
|
|
758
|
+
┌─[Build Context]──▶ [Check Queues]──▶ [Stream LLM]◄── streamFn()
|
|
759
|
+
│ │
|
|
760
|
+
│ ┌─────┴──────┐
|
|
761
|
+
│ │ │
|
|
762
|
+
│ text tool_call
|
|
763
|
+
│ │ │
|
|
764
|
+
│ ▼ ▼
|
|
765
|
+
│ [Done] [Execute Tool]
|
|
766
|
+
│ │
|
|
767
|
+
│ tool result
|
|
768
|
+
│ │
|
|
769
|
+
└──────────────────────────────────────────────────┘
|
|
770
|
+
(loops back to Stream LLM)
|
|
771
|
+
```
|
|
772
|
+
|
|
773
|
+
**Key design decisions:**
|
|
774
|
+
- Tools execute **sequentially** by default (parallelism can be added on top)
|
|
775
|
+
- The `streamFn` is **injectable** — you can wrap it with middleware to modify requests per-provider
|
|
776
|
+
- Tool arguments are **validated at runtime** using TypeBox + AJV before execution
|
|
777
|
+
- Aborted/failed responses preserve partial content and usage data
|
|
778
|
+
- Tool results are automatically added to the conversation context
|
|
779
|
+
|
|
780
|
+
##### 5. The `streamFn` Pattern — Injectable Middleware
|
|
781
|
+
|
|
782
|
+
This is Pi's most powerful architectural feature. The `streamFn` is the function that actually talks to the LLM, and it can be **wrapped with middleware** like Express.js request handlers:
|
|
289
783
|
|
|
290
784
|
```typescript
|
|
291
|
-
|
|
292
|
-
|
|
293
|
-
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
|
|
785
|
+
import type { StreamFn } from '@mariozechner/pi-agent-core';
|
|
786
|
+
import { streamSimple } from '@mariozechner/pi-ai';
|
|
787
|
+
|
|
788
|
+
// Start with Pi's base streaming function
|
|
789
|
+
let fn: StreamFn = streamSimple;
|
|
790
|
+
|
|
791
|
+
// Wrap it with middleware that modifies requests per-provider
|
|
792
|
+
fn = createMyCustomWrapper(fn, {
|
|
793
|
+
// Add custom headers for Anthropic
|
|
794
|
+
onPayload: (payload) => {
|
|
795
|
+
if (model.provider === 'anthropic') {
|
|
796
|
+
payload.headers['anthropic-beta'] = 'fine-grained-tool-streaming-2025-05-14';
|
|
797
|
+
}
|
|
297
798
|
},
|
|
298
|
-
|
|
299
|
-
|
|
300
|
-
|
|
301
|
-
|
|
302
|
-
|
|
799
|
+
});
|
|
800
|
+
|
|
801
|
+
// Each wrapper calls the previous one, forming a chain:
|
|
802
|
+
// request → wrapper3 → wrapper2 → wrapper1 → streamSimple → API
|
|
803
|
+
```
|
|
804
|
+
|
|
805
|
+
This pattern is what allows projects like OpenClaw to stack **16 provider-specific wrappers** on top of Pi's base streaming — adding beta headers for Anthropic, WebSocket transport for OpenAI, thinking sanitization for Google, reasoning effort headers for OpenRouter, and more — without modifying Pi's source code.
|
|
806
|
+
|
|
807
|
+
##### 6. Session Management (via pi-coding-agent)
|
|
808
|
+
|
|
809
|
+
The `@mariozechner/pi-coding-agent` package provides persistent session management with JSONL-based storage:
|
|
810
|
+
|
|
811
|
+
```typescript
|
|
812
|
+
import { createAgentSession, SessionManager } from '@mariozechner/pi-coding-agent';
|
|
813
|
+
|
|
814
|
+
// Create a session with full persistence
|
|
815
|
+
const session = await createAgentSession({
|
|
816
|
+
model: 'claude-opus-4-6',
|
|
817
|
+
tools: myTools,
|
|
818
|
+
sessionManager, // handles JSONL persistence
|
|
819
|
+
});
|
|
820
|
+
|
|
821
|
+
const result = await session.run('Build a REST API');
|
|
822
|
+
// Session is automatically saved to:
|
|
823
|
+
// ~/.pi/agent/sessions/session_abc123.jsonl
|
|
824
|
+
```
|
|
825
|
+
|
|
826
|
+
**Session file format (append-only JSONL):**
|
|
827
|
+
```jsonl
|
|
828
|
+
{"role":"user","content":"Build a REST API","timestamp":1710000000}
|
|
829
|
+
{"role":"assistant","content":"I'll create...","model":"claude-opus-4-6","usage":{...}}
|
|
830
|
+
{"role":"toolResult","toolCallId":"tc_001","toolName":"bash","content":"OK"}
|
|
831
|
+
{"type":"compaction","summary":"The user asked to build...","preservedMessages":[...]}
|
|
832
|
+
```
|
|
833
|
+
|
|
834
|
+
**Session operations:**
|
|
835
|
+
- `create()` — new session
|
|
836
|
+
- `open(id)` — restore existing session
|
|
837
|
+
- `continueRecent()` — continue the most recent session
|
|
838
|
+
- `forkFrom(id)` — create a branch (new JSONL referencing parent)
|
|
839
|
+
- `inMemory()` — RAM-only session (for SDK/testing)
|
|
840
|
+
|
|
841
|
+
##### 7. Context Compaction — Automatic Context Window Management
|
|
842
|
+
|
|
843
|
+
When the conversation approaches the model's context window limit, Pi automatically **compacts** the history:
|
|
844
|
+
|
|
845
|
+
```
|
|
846
|
+
1. DETECT: Calculate inputTokens + outputTokens vs model.contextWindow
|
|
847
|
+
2. TRIGGER: Proactively before overflow, or as recovery after overflow error
|
|
848
|
+
3. SUMMARIZE: Send history to LLM with a compaction prompt
|
|
849
|
+
4. WRITE: Append compaction entry to JSONL:
|
|
850
|
+
{"type":"compaction","summary":"...","preservedMessages":[last N messages]}
|
|
851
|
+
5. CONTINUE: Context is now summary + recent messages instead of full history
|
|
852
|
+
```
|
|
853
|
+
|
|
854
|
+
The JSONL file is **never rewritten** — compaction entries are appended, maintaining a complete audit trail.
|
|
855
|
+
|
|
856
|
+
##### 8. Cost Tracking — Cache-Aware Pricing
|
|
857
|
+
|
|
858
|
+
Pi tracks costs per-request with cache-aware pricing for providers that support prompt caching:
|
|
859
|
+
|
|
860
|
+
```typescript
|
|
861
|
+
// Every model has 4 cost dimensions:
|
|
862
|
+
{
|
|
863
|
+
input: 15, // $15 per 1M input tokens
|
|
864
|
+
output: 75, // $75 per 1M output tokens
|
|
865
|
+
cacheRead: 1.5, // $1.50 per 1M cached prompt tokens (read)
|
|
866
|
+
cacheWrite: 18.75, // $18.75 per 1M cached prompt tokens (write)
|
|
867
|
+
}
|
|
868
|
+
|
|
869
|
+
// Usage tracking on every response:
|
|
870
|
+
{
|
|
871
|
+
input: 1500, // tokens consumed as input
|
|
872
|
+
output: 800, // tokens generated
|
|
873
|
+
cacheRead: 5000, // prompt cache hits
|
|
874
|
+
cacheWrite: 1500, // prompt cache writes
|
|
875
|
+
cost: {
|
|
876
|
+
    total: 0.1181,  // total cost in USD (input + output + cacheRead + cacheWrite)
|
|
877
|
+
input: 0.0225,
|
|
878
|
+
output: 0.06,
|
|
879
|
+
cacheRead: 0.0075,
|
|
880
|
+
cacheWrite: 0.028,
|
|
303
881
|
},
|
|
882
|
+
}
|
|
883
|
+
```
|
|
884
|
+
|
|
885
|
+
**Anthropic and OpenAI** support prompt caching. For providers without caching, `cacheRead` and `cacheWrite` are always 0.
|
|
886
|
+
|
|
887
|
+
##### 9. Extension System (via pi-coding-agent)
|
|
888
|
+
|
|
889
|
+
Pi supports a plugin system where extensions can register tools, commands, and lifecycle hooks:
|
|
890
|
+
|
|
891
|
+
```typescript
|
|
892
|
+
// Extensions are TypeScript modules loaded at runtime via jiti
|
|
893
|
+
export default function(api: ExtensionAPI) {
|
|
894
|
+
// Register a custom tool
|
|
895
|
+
api.registerTool('my_tool', {
|
|
896
|
+
description: 'Does something useful',
|
|
897
|
+
parameters: { /* TypeBox schema */ },
|
|
898
|
+
execute: async (args) => 'result',
|
|
899
|
+
});
|
|
900
|
+
|
|
901
|
+
// Register a slash command
|
|
902
|
+
api.registerCommand('/mycommand', {
|
|
903
|
+
handler: async (args) => { /* ... */ },
|
|
904
|
+
description: 'Custom command',
|
|
905
|
+
});
|
|
906
|
+
|
|
907
|
+
// Hook into the agent lifecycle
|
|
908
|
+
api.on('before_agent_start', async (context) => {
|
|
909
|
+
context.systemPrompt += '\nExtra instructions';
|
|
910
|
+
});
|
|
911
|
+
|
|
912
|
+
api.on('tool_execution_end', async (event) => {
|
|
913
|
+
// Post-process tool results
|
|
914
|
+
});
|
|
915
|
+
}
|
|
916
|
+
```
|
|
917
|
+
|
|
918
|
+
**Resource discovery chain (priority):**
|
|
919
|
+
1. Project `.pi/` directory (highest)
|
|
920
|
+
2. User `~/.pi/agent/`
|
|
921
|
+
3. npm packages with Pi metadata
|
|
922
|
+
4. Built-in defaults
|
|
923
|
+
|
|
924
|
+
##### 10. The Anti-MCP Philosophy — Why Pi Uses CLI Instead
|
|
925
|
+
|
|
926
|
+
Pi explicitly **rejects MCP** (Model Context Protocol). Mario Zechner's argument, backed by benchmarks:
|
|
927
|
+
|
|
928
|
+
**The token cost problem:**
|
|
929
|
+
|
|
930
|
+
| Approach | Tools | Tokens Consumed | % of Claude's Context |
|
|
931
|
+
|---|---|---|---|
|
|
932
|
+
| Playwright MCP | 21 tools | 13,700 tokens | 6.8% |
|
|
933
|
+
| Chrome DevTools MCP | 26 tools | 18,000 tokens | 9.0% |
|
|
934
|
+
| Pi CLI + README | N/A | 225 tokens | ~0.1% |
|
|
935
|
+
|
|
936
|
+
That's a **60-80x reduction** in token consumption. With 5 MCP servers, you lose ~55,000 tokens before doing any work.
|
|
937
|
+
|
|
938
|
+
**Benchmark results (120 evaluations):**
|
|
939
|
+
|
|
940
|
+
| Approach | Avg Cost | Success Rate |
|
|
941
|
+
|---|---|---|
|
|
942
|
+
| CLI (tmux) | $0.37 | 100% |
|
|
943
|
+
| CLI (terminalcp) | $0.39 | 100% |
|
|
944
|
+
| MCP (terminalcp) | $0.48 | 100% |
|
|
945
|
+
|
|
946
|
+
Same success rate, MCP costs **30% more**.
|
|
947
|
+
|
|
948
|
+
**Pi's alternative: Progressive Disclosure via CLI tools + READMEs**
|
|
949
|
+
|
|
950
|
+
Instead of loading all tool definitions upfront, Pi's agent has `bash` as a built-in tool and discovers CLI tools only when needed:
|
|
951
|
+
|
|
952
|
+
```
|
|
953
|
+
MCP approach: Pi approach:
|
|
954
|
+
───────────── ──────────
|
|
955
|
+
Session start → Session start →
|
|
956
|
+
Load 21 Playwright tools Load 4 tools: read, write, edit, bash
|
|
957
|
+
Load 26 Chrome DevTools tools (225 tokens)
|
|
958
|
+
Load N more MCP tools
|
|
959
|
+
(~55,000 tokens wasted)
|
|
960
|
+
|
|
961
|
+
When browser needed: When browser needed:
|
|
962
|
+
Tools already loaded Agent reads SKILL.md (225 tokens)
|
|
963
|
+
(but context is polluted) Runs: browser-start.js
|
|
964
|
+
Runs: browser-nav.js https://...
|
|
965
|
+
Runs: browser-screenshot.js
|
|
966
|
+
|
|
967
|
+
When browser NOT needed: When browser NOT needed:
|
|
968
|
+
Tools still consume context 0 tokens wasted
|
|
969
|
+
```
|
|
970
|
+
|
|
971
|
+
**The 4 built-in tools** (what Pi argues is sufficient):
|
|
972
|
+
|
|
973
|
+
| Tool | What It Does | Why It's Enough |
|
|
974
|
+
|---|---|---|
|
|
975
|
+
| `read` | Read files (text + images) | Supports offset/limit for large files |
|
|
976
|
+
| `write` | Create/overwrite files | Creates directories automatically |
|
|
977
|
+
| `edit` | Replace text (oldText→newText) | Surgical edits, like a diff |
|
|
978
|
+
| `bash` | Execute any shell command | **bash can do everything else** — replaces MCP entirely |
|
|
979
|
+
|
|
980
|
+
The key insight: `bash` replaces MCP. Any CLI tool, API call, database query, or system operation can be invoked through bash. The agent reads the tool's README only when it needs it, paying tokens on-demand instead of upfront.
|
|
981
|
+
|
|
982
|
+
---
|
|
983
|
+
|
|
984
|
+
### FAL — Media Generation (867+ endpoints)
|
|
985
|
+
|
|
986
|
+
**Provider ID:** `fal`
|
|
987
|
+
**Modalities:** Image, Video, TTS
|
|
988
|
+
**Library:** `@fal-ai/client`
|
|
989
|
+
|
|
990
|
+
The largest media generation provider with dynamic pricing fetched at runtime from `https://api.fal.ai/v1/models/pricing`.
|
|
991
|
+
|
|
992
|
+
#### Image Models (200+)
|
|
993
|
+
|
|
994
|
+
**FLUX Family (20+ variants):**
|
|
995
|
+
| Model | Description |
|
|
996
|
+
|---|---|
|
|
997
|
+
| `fal-ai/flux/schnell` | Fast generation (default) |
|
|
998
|
+
| `fal-ai/flux/dev` | Higher quality |
|
|
999
|
+
| `fal-ai/flux-2` | Next generation |
|
|
1000
|
+
| `fal-ai/flux-2-pro` | Professional quality |
|
|
1001
|
+
| `fal-ai/flux-2-flex` | Flexible variant |
|
|
1002
|
+
| `fal-ai/flux-2/edit` | Image editing |
|
|
1003
|
+
| `fal-ai/flux-2/lora` | LoRA fine-tuning |
|
|
1004
|
+
| `fal-ai/flux-pro/v1.1-ultra` | Ultra high quality |
|
|
1005
|
+
| `fal-ai/flux-pro/kontext` | Context-aware generation |
|
|
1006
|
+
| `fal-ai/flux-lora` | Custom style training |
|
|
1007
|
+
| `fal-ai/flux-vision-upscaler` | AI upscaling |
|
|
1008
|
+
| `fal-ai/flux-krea-trainer` | Model training |
|
|
1009
|
+
| `fal-ai/flux-lora-fast-training` | Fast fine-tuning |
|
|
1010
|
+
| `fal-ai/flux-lora-portrait-trainer` | Portrait specialist |
|
|
1011
|
+
|
|
1012
|
+
**Stable Diffusion:**
|
|
1013
|
+
`fal-ai/stable-diffusion-v15`, `fal-ai/stable-diffusion-v35-large`, `fal-ai/stable-diffusion-v35-medium`, `fal-ai/stable-diffusion-v3-medium`
|
|
1014
|
+
|
|
1015
|
+
**Other Image Models:**
|
|
1016
|
+
| Model | Description |
|
|
1017
|
+
|---|---|
|
|
1018
|
+
| `fal-ai/recraft/v3/text-to-image` | Artistic generation |
|
|
1019
|
+
| `fal-ai/ideogram/v2`, `v2a`, `v3` | Ideogram series |
|
|
1020
|
+
| `fal-ai/imagen3`, `fal-ai/imagen4/preview` | Google Imagen |
|
|
1021
|
+
| `fal-ai/gpt-image-1` | GPT image generation |
|
|
1022
|
+
| `fal-ai/gpt-image-1/edit-image` | GPT image editing |
|
|
1023
|
+
| `fal-ai/reve/text-to-image` | Reve generation |
|
|
1024
|
+
| `fal-ai/sana`, `fal-ai/sana/sprint` | Sana models |
|
|
1025
|
+
| `fal-ai/pixart-sigma` | PixArt Sigma |
|
|
1026
|
+
| `fal-ai/bria/text-to-image/base` | Bria AI |
|
|
1027
|
+
|
|
1028
|
+
**Pre-trained LoRA Styles:**
|
|
1029
|
+
`fal-ai/flux-2-lora-gallery/sepia-vintage`, `virtual-tryon`, `satellite-view-style`, `realism`, `multiple-angles`, `hdr-style`, `face-to-full-portrait`, `digital-comic-art`, `ballpoint-pen-sketch`, `apartment-staging`, `add-background`
|
|
1030
|
+
|
|
1031
|
+
**Image Editing/Enhancement (30+ tools):**
|
|
1032
|
+
`fal-ai/image-editing/age-progression`, `baby-version`, `background-change`, `hair-change`, `expression-change`, `object-removal`, `photo-restoration`, `style-transfer`, and many more.
|
|
1033
|
+
|
|
1034
|
+
#### Video Models (150+)
|
|
1035
|
+
|
|
1036
|
+
**Kling Video (20+ variants):**
|
|
1037
|
+
| Model | Description |
|
|
1038
|
+
|---|---|
|
|
1039
|
+
| `fal-ai/kling-video/v2/master/text-to-video` | Default text-to-video |
|
|
1040
|
+
| `fal-ai/kling-video/v2/master/image-to-video` | Image-to-video |
|
|
1041
|
+
| `fal-ai/kling-video/v2.5-turbo/pro/text-to-video` | Turbo pro |
|
|
1042
|
+
| `fal-ai/kling-video/o1/image-to-video` | O1 quality |
|
|
1043
|
+
| `fal-ai/kling-video/o1/video-to-video/edit` | Video editing |
|
|
1044
|
+
| `fal-ai/kling-video/lipsync/audio-to-video` | Lip sync |
|
|
1045
|
+
| `fal-ai/kling-video/video-to-audio` | Audio extraction |
|
|
1046
|
+
|
|
1047
|
+
**Sora 2 (OpenAI):**
|
|
1048
|
+
| Model | Description |
|
|
1049
|
+
|---|---|
|
|
1050
|
+
| `fal-ai/sora-2/text-to-video` | Text-to-video |
|
|
1051
|
+
| `fal-ai/sora-2/text-to-video/pro` | Pro quality |
|
|
1052
|
+
| `fal-ai/sora-2/image-to-video` | Image-to-video |
|
|
1053
|
+
| `fal-ai/sora-2/video-to-video/remix` | Video remixing |
|
|
1054
|
+
|
|
1055
|
+
**VEO 3 (Google):**
|
|
1056
|
+
| Model | Description |
|
|
1057
|
+
|---|---|
|
|
1058
|
+
| `fal-ai/veo3` | VEO 3 standard |
|
|
1059
|
+
| `fal-ai/veo3/fast` | Fast variant |
|
|
1060
|
+
| `fal-ai/veo3/image-to-video` | Image-to-video |
|
|
1061
|
+
| `fal-ai/veo3.1` | Latest version |
|
|
1062
|
+
| `fal-ai/veo3.1/reference-to-video` | Reference-guided |
|
|
1063
|
+
| `fal-ai/veo3.1/first-last-frame-to-video` | Frame interpolation |
|
|
1064
|
+
|
|
1065
|
+
**WAN (15+ variants):**
|
|
1066
|
+
`fal-ai/wan-pro/text-to-video`, `fal-ai/wan-pro/image-to-video`, `fal-ai/wan/v2.2-a14b/text-to-video`, `fal-ai/wan-vace-14b/depth`, `fal-ai/wan-vace-14b/inpainting`, `fal-ai/wan-vace-14b/pose`, `fal-ai/wan-effects`
|
|
1067
|
+
|
|
1068
|
+
**Pixverse (20+ variants):**
|
|
1069
|
+
`fal-ai/pixverse/v5.5/text-to-video`, `fal-ai/pixverse/v5.5/image-to-video`, `fal-ai/pixverse/v5.5/effects`, `fal-ai/pixverse/lipsync`, `fal-ai/pixverse/sound-effects`
|
|
1070
|
+
|
|
1071
|
+
**Minimax / Hailuo:**
|
|
1072
|
+
`fal-ai/minimax/hailuo-2.3/text-to-video/pro`, `fal-ai/minimax/hailuo-2.3/image-to-video/pro`, `fal-ai/minimax/video-01-director`, `fal-ai/minimax/video-01-live`
|
|
1073
|
+
|
|
1074
|
+
**Other Video Models:**
|
|
1075
|
+
| Provider | Models |
|
|
1076
|
+
|---|---|
|
|
1077
|
+
| Hunyuan | `fal-ai/hunyuan-video/text-to-video`, `image-to-video`, `video-to-video`, `foley` |
|
|
1078
|
+
| Pika | `fal-ai/pika/v2.2/text-to-video`, `pikascenes`, `pikaffects` |
|
|
1079
|
+
| LTX | `fal-ai/ltx-2/text-to-video`, `image-to-video`, `retake-video` |
|
|
1080
|
+
| Luma | `fal-ai/luma-dream-machine/ray-2`, `ray-2-flash`, `luma-photon` |
|
|
1081
|
+
| Vidu | `fal-ai/vidu/q2/text-to-video`, `image-to-video/pro` |
|
|
1082
|
+
| CogVideoX | `fal-ai/cogvideox-5b/text-to-video`, `video-to-video` |
|
|
1083
|
+
| Seedance | `fal-ai/bytedance/seedance/v1/text-to-video`, `image-to-video` |
|
|
1084
|
+
| Magi | `fal-ai/magi/text-to-video`, `extend-video` |
|
|
1085
|
+
|
|
1086
|
+
#### TTS / Speech Models (50+)
|
|
1087
|
+
|
|
1088
|
+
**Kokoro (9 languages, 20+ voices per language):**
|
|
1089
|
+
| Model | Language | Example Voices |
|
|
1090
|
+
|---|---|---|
|
|
1091
|
+
| `fal-ai/kokoro/american-english` | English (US) | af_heart, af_alloy, af_bella, af_nova, am_adam, am_echo, am_onyx |
|
|
1092
|
+
| `fal-ai/kokoro/british-english` | English (UK) | British voice set |
|
|
1093
|
+
| `fal-ai/kokoro/french` | French | French voice set |
|
|
1094
|
+
| `fal-ai/kokoro/japanese` | Japanese | Japanese voice set |
|
|
1095
|
+
| `fal-ai/kokoro/spanish` | Spanish | Spanish voice set |
|
|
1096
|
+
| `fal-ai/kokoro/mandarin-chinese` | Chinese | Mandarin voice set |
|
|
1097
|
+
| `fal-ai/kokoro/italian` | Italian | Italian voice set |
|
|
1098
|
+
| `fal-ai/kokoro/hindi` | Hindi | Hindi voice set |
|
|
1099
|
+
| `fal-ai/kokoro/brazilian-portuguese` | Portuguese | Portuguese voice set |
|
|
1100
|
+
|
|
1101
|
+
**ElevenLabs:**
|
|
1102
|
+
| Model | Description |
|
|
1103
|
+
|---|---|
|
|
1104
|
+
| `fal-ai/elevenlabs/tts/eleven-v3` | Professional quality |
|
|
1105
|
+
| `fal-ai/elevenlabs/tts/turbo-v2.5` | Faster inference |
|
|
1106
|
+
| `fal-ai/elevenlabs/tts/multilingual-v2` | Multi-language |
|
|
1107
|
+
| `fal-ai/elevenlabs/text-to-dialogue/eleven-v3` | Dialogue generation |
|
|
1108
|
+
| `fal-ai/elevenlabs/sound-effects/v2` | Sound effects |
|
|
1109
|
+
| `fal-ai/elevenlabs/speech-to-text` | Transcription |
|
|
1110
|
+
| `fal-ai/elevenlabs/audio-isolation` | Background removal |
|
|
1111
|
+
|
|
1112
|
+
**Other TTS:**
|
|
1113
|
+
`fal-ai/f5-tts` (voice cloning), `fal-ai/dia-tts`, `fal-ai/minimax/speech-2.6-turbo`, `fal-ai/minimax/speech-2.6-hd`, `fal-ai/chatterbox/text-to-speech`, `fal-ai/index-tts-2/text-to-speech`
|
|
1114
|
+
|
|
1115
|
+
#### FAL Client Capabilities
|
|
1116
|
+
|
|
1117
|
+
The `@fal-ai/client` provides additional features beyond what Noosphere surfaces:
|
|
1118
|
+
|
|
1119
|
+
- **Queue API** — Submit jobs, poll status, get results, cancel. Supports webhooks and priority levels
|
|
1120
|
+
- **Streaming API** — Real-time streaming responses via async iterators
|
|
1121
|
+
- **Realtime API** — WebSocket connections for interactive use (e.g., real-time image generation)
|
|
1122
|
+
- **Storage API** — File upload with configurable TTL (1h, 1d, 7d, 30d, 1y, never)
|
|
1123
|
+
- **Retry logic** — Configurable retries with exponential backoff and jitter
|
|
1124
|
+
- **Request middleware** — Custom request interceptors and proxy support
|
|
1125
|
+
|
|
1126
|
+
---
|
|
1127
|
+
|
|
1128
|
+
### Hugging Face — Open Source AI (30+ tasks)
|
|
1129
|
+
|
|
1130
|
+
**Provider ID:** `huggingface`
|
|
1131
|
+
**Modalities:** LLM, Image, TTS
|
|
1132
|
+
**Library:** `@huggingface/inference`
|
|
1133
|
+
|
|
1134
|
+
Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
|
|
1135
|
+
|
|
1136
|
+
#### Default Models
|
|
1137
|
+
|
|
1138
|
+
| Modality | Default Model | Description |
|
|
1139
|
+
|---|---|---|
|
|
1140
|
+
| LLM | `meta-llama/Llama-3.1-8B-Instruct` | Llama 3.1 8B |
|
|
1141
|
+
| Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
|
|
1142
|
+
| TTS | `facebook/mms-tts-eng` | MMS TTS English |
|
|
1143
|
+
|
|
1144
|
+
Any HuggingFace model ID works — just pass it as the `model` parameter:
|
|
1145
|
+
|
|
1146
|
+
```typescript
|
|
1147
|
+
await ai.chat({
|
|
1148
|
+
provider: 'huggingface',
|
|
1149
|
+
model: 'mistralai/Mixtral-8x7B-v0.1',
|
|
1150
|
+
messages: [{ role: 'user', content: 'Hello' }],
|
|
304
1151
|
});
|
|
305
1152
|
```
|
|
306
1153
|
|
|
307
|
-
|
|
1154
|
+
#### Full Library Capabilities
|
|
1155
|
+
|
|
1156
|
+
The `@huggingface/inference` library (v3.15.0) provides 30+ AI tasks, including capabilities not yet surfaced by Noosphere:
|
|
1157
|
+
|
|
1158
|
+
**Natural Language Processing:**
|
|
1159
|
+
| Task | Method | Description |
|
|
1160
|
+
|---|---|---|
|
|
1161
|
+
| Chat | `chatCompletion()` | OpenAI-compatible chat completions |
|
|
1162
|
+
| Chat Streaming | `chatCompletionStream()` | Token-by-token streaming |
|
|
1163
|
+
| Text Generation | `textGeneration()` | Raw text completion |
|
|
1164
|
+
| Summarization | `summarization()` | Text summarization |
|
|
1165
|
+
| Translation | `translation()` | Language translation |
|
|
1166
|
+
| Question Answering | `questionAnswering()` | Extract answers from context |
|
|
1167
|
+
| Text Classification | `textClassification()` | Sentiment, topic classification |
|
|
1168
|
+
| Zero-Shot Classification | `zeroShotClassification()` | Classify without training |
|
|
1169
|
+
| Token Classification | `tokenClassification()` | NER, POS tagging |
|
|
1170
|
+
| Sentence Similarity | `sentenceSimilarity()` | Semantic similarity scores |
|
|
1171
|
+
| Feature Extraction | `featureExtraction()` | Text embeddings |
|
|
1172
|
+
| Fill Mask | `fillMask()` | Fill in masked tokens |
|
|
1173
|
+
| Table QA | `tableQuestionAnswering()` | Answer questions about tables |
|
|
1174
|
+
|
|
1175
|
+
**Computer Vision:**
|
|
1176
|
+
| Task | Method | Description |
|
|
1177
|
+
|---|---|---|
|
|
1178
|
+
| Text-to-Image | `textToImage()` | Generate images from text |
|
|
1179
|
+
| Image-to-Image | `imageToImage()` | Transform/edit images |
|
|
1180
|
+
| Image Captioning | `imageToText()` | Describe images |
|
|
1181
|
+
| Classification | `imageClassification()` | Classify image content |
|
|
1182
|
+
| Object Detection | `objectDetection()` | Detect and locate objects |
|
|
1183
|
+
| Segmentation | `imageSegmentation()` | Pixel-level segmentation |
|
|
1184
|
+
| Zero-Shot Image | `zeroShotImageClassification()` | Classify without training |
|
|
1185
|
+
| Text-to-Video | `textToVideo()` | Generate videos |
|
|
1186
|
+
|
|
1187
|
+
**Audio:**
|
|
1188
|
+
| Task | Method | Description |
|
|
1189
|
+
|---|---|---|
|
|
1190
|
+
| Text-to-Speech | `textToSpeech()` | Generate speech |
|
|
1191
|
+
| Speech-to-Text | `automaticSpeechRecognition()` | Transcription |
|
|
1192
|
+
| Audio Classification | `audioClassification()` | Classify sounds |
|
|
1193
|
+
| Audio-to-Audio | `audioToAudio()` | Source separation, enhancement |
|
|
1194
|
+
|
|
1195
|
+
**Multimodal:**
|
|
1196
|
+
| Task | Method | Description |
|
|
1197
|
+
|---|---|---|
|
|
1198
|
+
| Visual QA | `visualQuestionAnswering()` | Answer questions about images |
|
|
1199
|
+
| Document QA | `documentQuestionAnswering()` | Answer questions about documents |
|
|
1200
|
+
|
|
1201
|
+
**Tabular:**
|
|
1202
|
+
| Task | Method | Description |
|
|
1203
|
+
|---|---|---|
|
|
1204
|
+
| Classification | `tabularClassification()` | Classify tabular data |
|
|
1205
|
+
| Regression | `tabularRegression()` | Predict continuous values |
|
|
1206
|
+
|
|
1207
|
+
#### HuggingFace Agentic Features
|
|
1208
|
+
|
|
1209
|
+
- **Tool/Function Calling:** Full support via `tools` parameter with `tool_choice` control (auto/none/required)
|
|
1210
|
+
- **JSON Schema Responses:** `response_format: { type: 'json_schema', json_schema: {...} }`
|
|
1211
|
+
- **Reasoning:** `reasoning_effort` parameter (none/minimal/low/medium/high/xhigh)
|
|
1212
|
+
- **Multimodal Input:** Images via `image_url` content chunks in chat messages
|
|
1213
|
+
- **17 Inference Providers:** Route through Groq, Together, Fireworks, Replicate, Cerebras, Cohere, and more
|
|
1214
|
+
|
|
1215
|
+
---
|
|
1216
|
+
|
|
1217
|
+
### ComfyUI — Local Image Generation
|
|
1218
|
+
|
|
1219
|
+
**Provider ID:** `comfyui`
|
|
1220
|
+
**Modalities:** Image, Video (planned)
|
|
1221
|
+
**Type:** Local
|
|
1222
|
+
**Default Port:** 8188
|
|
308
1223
|
|
|
309
|
-
|
|
1224
|
+
Connects to a local ComfyUI instance for Stable Diffusion workflows.
|
|
1225
|
+
|
|
1226
|
+
#### How It Works
|
|
1227
|
+
|
|
1228
|
+
1. Clones a built-in txt2img workflow template (KSampler + SDXL pipeline)
|
|
1229
|
+
2. Injects your parameters (prompt, dimensions, seed, steps, guidance)
|
|
1230
|
+
3. POSTs the workflow to ComfyUI's `/prompt` endpoint
|
|
1231
|
+
4. Polls `/history/{promptId}` every second until completion (max 5 minutes)
|
|
1232
|
+
5. Fetches the generated image from `/view`
|
|
1233
|
+
6. Returns a PNG buffer
|
|
1234
|
+
|
|
1235
|
+
#### Configuration
|
|
310
1236
|
|
|
311
1237
|
```typescript
|
|
312
1238
|
const ai = new Noosphere({
|
|
313
|
-
autoDetectLocal: true, // default
|
|
314
1239
|
local: {
|
|
315
|
-
comfyui: {
|
|
316
|
-
|
|
317
|
-
|
|
1240
|
+
comfyui: {
|
|
1241
|
+
enabled: true,
|
|
1242
|
+
host: 'http://localhost',
|
|
1243
|
+
port: 8188,
|
|
1244
|
+
},
|
|
318
1245
|
},
|
|
319
1246
|
});
|
|
320
1247
|
```
|
|
321
1248
|
|
|
322
|
-
|
|
1249
|
+
#### Default Workflow
|
|
1250
|
+
|
|
1251
|
+
- **Checkpoint:** `sd_xl_base_1.0.safetensors`
|
|
1252
|
+
- **Sampler:** euler with normal scheduler
|
|
1253
|
+
- **Default Steps:** 20
|
|
1254
|
+
- **Default CFG/Guidance:** 7
|
|
1255
|
+
- **Default Size:** 1024x1024
|
|
1256
|
+
- **Max Size:** 2048x2048
|
|
1257
|
+
- **Output:** PNG
|
|
1258
|
+
|
|
1259
|
+
#### Models Exposed
|
|
1260
|
+
|
|
1261
|
+
| Model ID | Modality | Description |
|
|
1262
|
+
|---|---|---|
|
|
1263
|
+
| `comfyui-txt2img` | Image | Text-to-image via workflow |
|
|
1264
|
+
| `comfyui-txt2vid` | Video | Planned (requires AnimateDiff workflow) |
|
|
1265
|
+
|
|
1266
|
+
---
|
|
1267
|
+
|
|
1268
|
+
### Local TTS — Piper & Kokoro
|
|
1269
|
+
|
|
1270
|
+
**Provider IDs:** `piper`, `kokoro`
|
|
1271
|
+
**Modality:** TTS
|
|
1272
|
+
**Type:** Local
|
|
1273
|
+
|
|
1274
|
+
Connects to local OpenAI-compatible TTS servers.
|
|
1275
|
+
|
|
1276
|
+
#### Supported Engines
|
|
1277
|
+
|
|
1278
|
+
| Engine | Default Port | Health Check | Voice Discovery |
|
|
1279
|
+
|---|---|---|---|
|
|
1280
|
+
| Piper | 5500 | `GET /health` | `GET /voices` |
|
|
1281
|
+
| Kokoro | 5501 | `GET /health` | `GET /v1/models` (fallback) |
|
|
1282
|
+
|
|
1283
|
+
#### API
|
|
1284
|
+
|
|
1285
|
+
Uses the OpenAI-compatible TTS endpoint:
|
|
1286
|
+
|
|
1287
|
+
```
|
|
1288
|
+
POST /v1/audio/speech
|
|
1289
|
+
{
|
|
1290
|
+
"model": "tts-1",
|
|
1291
|
+
"input": "Hello world",
|
|
1292
|
+
"voice": "default",
|
|
1293
|
+
"speed": 1.0,
|
|
1294
|
+
"response_format": "mp3"
|
|
1295
|
+
}
|
|
1296
|
+
```
|
|
1297
|
+
|
|
1298
|
+
Supports `mp3`, `wav`, and `ogg` formats. Returns audio as a Buffer.
|
|
1299
|
+
|
|
1300
|
+
---
|
|
1301
|
+
|
|
1302
|
+
## Architecture
|
|
1303
|
+
|
|
1304
|
+
### Provider Resolution (Local-First)
|
|
1305
|
+
|
|
1306
|
+
When you call a generation method without specifying a provider, Noosphere resolves one automatically:
|
|
1307
|
+
|
|
1308
|
+
1. If `model` is specified without `provider` → looks up model in registry cache
|
|
1309
|
+
2. If a `default` is configured for the modality → uses that
|
|
1310
|
+
3. Otherwise → **local providers first**, then cloud providers
|
|
1311
|
+
|
|
1312
|
+
```
|
|
1313
|
+
resolveProvider(modality):
|
|
1314
|
+
1. Check user-specified provider ID → return if found
|
|
1315
|
+
2. Check configured defaults → return if found
|
|
1316
|
+
3. Scan all providers:
|
|
1317
|
+
→ Return first LOCAL provider supporting this modality
|
|
1318
|
+
→ Fallback to first CLOUD provider
|
|
1319
|
+
4. Throw NO_PROVIDER error
|
|
1320
|
+
```
|
|
1321
|
+
|
|
1322
|
+
### Retry & Failover Logic
|
|
1323
|
+
|
|
1324
|
+
```
|
|
1325
|
+
executeWithRetry(modality, provider, fn):
|
|
1326
|
+
for attempt = 0..maxRetries:
|
|
1327
|
+
try: return fn()
|
|
1328
|
+
catch:
|
|
1329
|
+
if error is retryable AND attempts remain:
|
|
1330
|
+
wait backoffMs * 2^attempt (exponential backoff)
|
|
1331
|
+
retry same provider
|
|
1332
|
+
if error is NOT GENERATION_FAILED AND failover enabled:
|
|
1333
|
+
try each alternative provider for this modality
|
|
1334
|
+
throw last error
|
|
1335
|
+
```
|
|
1336
|
+
|
|
1337
|
+
**Retryable errors (same provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT`, `GENERATION_FAILED`
|
|
1338
|
+
|
|
1339
|
+
**Failover-eligible errors (cross-provider):** `PROVIDER_UNAVAILABLE`, `RATE_LIMITED`, `TIMEOUT` (NOT `GENERATION_FAILED`)
|
|
1340
|
+
|
|
1341
|
+
### Model Registry & Caching
|
|
1342
|
+
|
|
1343
|
+
- Models are fetched from providers via `listModels()` and cached in memory
|
|
1344
|
+
- Cache TTL is configurable (default: 60 minutes)
|
|
1345
|
+
- `syncModels()` forces a refresh of all provider model lists
|
|
1346
|
+
- Registry tracks model → provider mappings for fast resolution
|
|
1347
|
+
|
|
1348
|
+
### Usage Tracking
|
|
1349
|
+
|
|
1350
|
+
Every API call (success or failure) records a `UsageEvent`:
|
|
1351
|
+
|
|
1352
|
+
```typescript
|
|
1353
|
+
interface UsageEvent {
|
|
1354
|
+
modality: 'llm' | 'image' | 'video' | 'tts';
|
|
1355
|
+
provider: string;
|
|
1356
|
+
model: string;
|
|
1357
|
+
cost: number; // USD
|
|
1358
|
+
latencyMs: number;
|
|
1359
|
+
input?: number; // tokens or characters
|
|
1360
|
+
output?: number; // tokens
|
|
1361
|
+
unit?: string;
|
|
1362
|
+
timestamp: string; // ISO 8601
|
|
1363
|
+
success: boolean;
|
|
1364
|
+
error?: string; // error message if failed
|
|
1365
|
+
metadata?: Record<string, unknown>;
|
|
1366
|
+
}
|
|
1367
|
+
```
|
|
1368
|
+
|
|
1369
|
+
---
|
|
323
1370
|
|
|
324
|
-
|
|
1371
|
+
## Error Handling
|
|
1372
|
+
|
|
1373
|
+
All errors are instances of `NoosphereError`:
|
|
325
1374
|
|
|
326
1375
|
```typescript
|
|
327
|
-
|
|
1376
|
+
import { NoosphereError } from 'noosphere';
|
|
1377
|
+
|
|
1378
|
+
try {
|
|
1379
|
+
await ai.chat({ messages: [{ role: 'user', content: 'Hello' }] });
|
|
1380
|
+
} catch (err) {
|
|
1381
|
+
if (err instanceof NoosphereError) {
|
|
1382
|
+
console.log(err.code); // error code
|
|
1383
|
+
console.log(err.provider); // which provider failed
|
|
1384
|
+
console.log(err.modality); // which modality
|
|
1385
|
+
console.log(err.model); // which model (if known)
|
|
1386
|
+
console.log(err.cause); // underlying error
|
|
1387
|
+
console.log(err.isRetryable()); // whether retry might help
|
|
1388
|
+
}
|
|
1389
|
+
}
|
|
328
1390
|
```
|
|
329
1391
|
|
|
330
|
-
|
|
1392
|
+
### Error Codes
|
|
331
1393
|
|
|
332
|
-
|
|
|
333
|
-
|
|
334
|
-
|
|
|
335
|
-
|
|
|
336
|
-
|
|
|
337
|
-
|
|
|
338
|
-
|
|
|
1394
|
+
| Code | Description | Retryable | Failover |
|
|
1395
|
+
|---|---|---|---|
|
|
1396
|
+
| `PROVIDER_UNAVAILABLE` | Provider is down or unreachable | Yes | Yes |
|
|
1397
|
+
| `RATE_LIMITED` | API rate limit exceeded | Yes | Yes |
|
|
1398
|
+
| `TIMEOUT` | Request exceeded timeout | Yes | Yes |
|
|
1399
|
+
| `GENERATION_FAILED` | Generation error (bad prompt, model issue) | Yes | No |
|
|
1400
|
+
| `AUTH_FAILED` | Invalid or missing API key | No | No |
|
|
1401
|
+
| `MODEL_NOT_FOUND` | Requested model doesn't exist | No | No |
|
|
1402
|
+
| `INVALID_INPUT` | Bad parameters or unsupported operation | No | No |
|
|
1403
|
+
| `NO_PROVIDER` | No provider available for the requested modality | No | No |
|
|
1404
|
+
|
|
1405
|
+
---
|
|
1406
|
+
|
|
1407
|
+
## Custom Providers
|
|
1408
|
+
|
|
1409
|
+
Extend Noosphere with your own providers:
|
|
1410
|
+
|
|
1411
|
+
```typescript
|
|
1412
|
+
import type { NoosphereProvider, ModelInfo, ChatOptions, NoosphereResult, Modality } from 'noosphere';
|
|
1413
|
+
|
|
1414
|
+
const myProvider: NoosphereProvider = {
|
|
1415
|
+
// Required properties
|
|
1416
|
+
id: 'my-provider',
|
|
1417
|
+
name: 'My Custom Provider',
|
|
1418
|
+
modalities: ['llm', 'image'] as Modality[],
|
|
1419
|
+
isLocal: false,
|
|
1420
|
+
|
|
1421
|
+
// Required methods
|
|
1422
|
+
async ping() { return true; },
|
|
1423
|
+
async listModels(modality?: Modality): Promise<ModelInfo[]> {
|
|
1424
|
+
return [{
|
|
1425
|
+
id: 'my-model',
|
|
1426
|
+
provider: 'my-provider',
|
|
1427
|
+
name: 'My Model',
|
|
1428
|
+
modality: 'llm',
|
|
1429
|
+
local: false,
|
|
1430
|
+
cost: { price: 1.0, unit: 'per_1m_tokens' },
|
|
1431
|
+
capabilities: {
|
|
1432
|
+
contextWindow: 128000,
|
|
1433
|
+
maxTokens: 4096,
|
|
1434
|
+
supportsVision: false,
|
|
1435
|
+
supportsStreaming: true,
|
|
1436
|
+
},
|
|
1437
|
+
}];
|
|
1438
|
+
},
|
|
1439
|
+
|
|
1440
|
+
// Optional methods — implement per modality
|
|
1441
|
+
async chat(options: ChatOptions): Promise<NoosphereResult> {
|
|
1442
|
+
const start = Date.now();
|
|
1443
|
+
// ... your implementation
|
|
1444
|
+
return {
|
|
1445
|
+
content: 'Response text',
|
|
1446
|
+
provider: 'my-provider',
|
|
1447
|
+
model: 'my-model',
|
|
1448
|
+
modality: 'llm',
|
|
1449
|
+
latencyMs: Date.now() - start,
|
|
1450
|
+
usage: { cost: 0.001, input: 100, output: 50, unit: 'tokens' },
|
|
1451
|
+
};
|
|
1452
|
+
},
|
|
1453
|
+
|
|
1454
|
+
// stream?(options): NoosphereStream
|
|
1455
|
+
// image?(options): Promise<NoosphereResult>
|
|
1456
|
+
// video?(options): Promise<NoosphereResult>
|
|
1457
|
+
// speak?(options): Promise<NoosphereResult>
|
|
1458
|
+
// dispose?(): Promise<void>
|
|
1459
|
+
};
|
|
1460
|
+
|
|
1461
|
+
ai.registerProvider(myProvider);
|
|
1462
|
+
```
|
|
1463
|
+
|
|
1464
|
+
---
|
|
1465
|
+
|
|
1466
|
+
## Provider Summary
|
|
1467
|
+
|
|
1468
|
+
| Provider | ID | Modalities | Type | Models | Library |
|
|
1469
|
+
|---|---|---|---|---|---|
|
|
1470
|
+
| Pi-AI Gateway | `pi-ai` | LLM | Cloud | 246+ | `@mariozechner/pi-ai` |
|
|
1471
|
+
| FAL.ai | `fal` | Image, Video, TTS | Cloud | 867+ | `@fal-ai/client` |
|
|
1472
|
+
| Hugging Face | `huggingface` | LLM, Image, TTS | Cloud | Unlimited (any HF model) | `@huggingface/inference` |
|
|
1473
|
+
| ComfyUI | `comfyui` | Image | Local | SDXL workflows | Direct HTTP |
|
|
1474
|
+
| Piper TTS | `piper` | TTS | Local | Piper voices | Direct HTTP |
|
|
1475
|
+
| Kokoro TTS | `kokoro` | TTS | Local | Kokoro voices | Direct HTTP |
|
|
339
1476
|
|
|
340
1477
|
## Requirements
|
|
341
1478
|
|