noosphere 0.2.0 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -59,103 +59,204 @@ const audio = await ai.speak({
  // audio.buffer contains the audio data
  ```

- ## Dynamic Model Auto-Fetch — Always Up-to-Date
+ ## Dynamic Model Auto-Fetch — Always Up-to-Date (ALL Providers, ALL Modalities)

- Noosphere **automatically discovers the latest models** from every provider's API at runtime. When Google releases a new Gemini model, when OpenAI drops GPT-5, when Anthropic publishes Claude 4 — **you get them immediately**, without updating Noosphere or any dependency.
+ Noosphere **automatically discovers the latest models from EVERY provider's API at runtime** — across **all 4 modalities** (LLM, image, video, TTS). When Google releases a new Gemini model, when OpenAI drops GPT-5, when FAL adds a new video model, when a new image model trends on HuggingFace — **you get them immediately**, without updating Noosphere or any dependency.

  ### The Problem It Solves

- Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 models in a pre-generated `models.generated.js` file. When a provider releases a new model, you'd have to wait for the library maintainer to run `npm run generate-models`, publish a new version, and then you'd `npm update`. This lag can be days or weeks.
-
- ### How It Works
-
- On the **first API call**, Noosphere queries every provider's model listing API in parallel and merges the results with the static catalog:
-
- ```
- First ai.chat() / ai.image() / ai.stream() call
-
- ├─ 1. Load static pi-ai catalog (246 models with accurate cost/context data)
-
- ├─ 2. Parallel fetch from ALL provider APIs (8 concurrent requests):
- │   ├── GET https://api.openai.com/v1/models (Bearer token)
- │   ├── GET https://api.anthropic.com/v1/models (x-api-key + anthropic-version)
- │   ├── GET https://generativelanguage.googleapis.com/... (API key in URL)
- │   ├── GET https://api.groq.com/openai/v1/models (Bearer token)
- │   ├── GET https://api.mistral.ai/v1/models (Bearer token)
- │   ├── GET https://api.x.ai/v1/models (Bearer token)
- │   ├── GET https://openrouter.ai/api/v1/models (Bearer token)
- │   └── GET https://api.cerebras.ai/v1/models (Bearer token)
-
- ├─ 3. Filter results (chat models only — exclude embeddings, TTS, whisper, etc.)
-
- ├─ 4. Deduplicate against static catalog (static wins — has accurate cost data)
-
- └─ 5. Merge: Static catalog + newly discovered models = complete model list
- ```
-
- ### What Gets Fetched Per Provider
+ Traditional AI libraries rely on **static model catalogs** hardcoded at build time. The `@mariozechner/pi-ai` dependency ships with ~246 LLM models in a pre-generated `models.generated.js` file. HuggingFace providers typically hardcode 3–5 default models. When a provider releases a new model, you'd have to wait for the library maintainer to update and publish, then run `npm update` yourself. This lag can be days or weeks.
+
+ **Noosphere solves this for every provider and every modality simultaneously.**
+
+ ### How It Works — Complete Auto-Fetch Architecture
+
+ Noosphere has **3 independent auto-fetch systems** that run in parallel, one for each provider layer:
+
+ ```
+ ┌─────────────────────────────────────────────────────────────┐
+ │                     NOOSPHERE AUTO-FETCH                    │
+ ├─────────────────────────────────────────────────────────────┤
+ │                                                             │
+ │  ┌─── Pi-AI Provider (LLM) ─────────────────────────────┐   │
+ │  │ 8 parallel API calls on first chat()/stream():       │   │
+ │  │   OpenAI, Anthropic, Google, Groq, Mistral,          │   │
+ │  │   xAI, OpenRouter, Cerebras                          │   │
+ │  │ Merges with static pi-ai catalog (246 models)        │   │
+ │  │ Constructs synthetic Model objects for new ones      │   │
+ │  └──────────────────────────────────────────────────────┘   │
+ │                                                             │
+ │  ┌─── FAL Provider (Image/Video/TTS) ───────────────────┐   │
+ │  │ 1 API call on listModels():                          │   │
+ │  │   GET https://api.fal.ai/v1/models/pricing           │   │
+ │  │ → Returns ALL 867+ endpoints with live pricing       │   │
+ │  │ → Auto-classifies modality from model ID + unit      │   │
+ │  └──────────────────────────────────────────────────────┘   │
+ │                                                             │
+ │  ┌─── HuggingFace Provider (LLM/Image/TTS) ─────────────┐   │
+ │  │ 3 parallel API calls on listModels():                │   │
+ │  │   GET huggingface.co/api/models?pipeline_tag=...     │   │
+ │  │ → text-generation (top 50 trending, inference-ready) │   │
+ │  │ → text-to-image (top 50 trending, inference-ready)   │   │
+ │  │ → text-to-speech (top 30 trending, inference-ready)  │   │
+ │  │ → Includes inference provider mapping + pricing      │   │
+ │  └──────────────────────────────────────────────────────┘   │
+ │                                                             │
+ └─────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Layer 1: LLM Auto-Fetch (Pi-AI Provider) — 8 Provider APIs
+
+ On the **first `chat()` or `stream()` call**, Pi-AI queries every LLM provider's model listing API in parallel:

  | Provider | API Endpoint | Auth | Model Filter | API Protocol |
  |---|---|---|---|---|
- | **OpenAI** | `/v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
- | **Anthropic** | `/v1/models?limit=100` | `x-api-key` + `anthropic-version` | `claude-*` | `anthropic-messages` |
- | **Google** | `/v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
- | **Groq** | `/openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
- | **Mistral** | `/v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
- | **xAI** | `/v1/models` | Bearer token | `grok*` | `openai-completions` |
- | **OpenRouter** | `/api/v1/models` | Bearer token | All (OpenRouter only lists usable models) | `openai-completions` |
- | **Cerebras** | `/v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |
-
- ### Resilience Guarantees
+ | **OpenAI** | `GET /v1/models` | Bearer token | `gpt-*`, `o1*`, `o3*`, `o4*`, `chatgpt-*`, `codex-*` | `openai-responses` |
+ | **Anthropic** | `GET /v1/models?limit=100` | `x-api-key` + `anthropic-version: 2023-06-01` | `claude-*` | `anthropic-messages` |
+ | **Google** | `GET /v1beta/models?key=KEY` | API key in URL | `gemini-*`, `gemma-*` + must support `generateContent` | `google-generative-ai` |
+ | **Groq** | `GET /openai/v1/models` | Bearer token | All (Groq only serves chat models) | `openai-completions` |
+ | **Mistral** | `GET /v1/models` | Bearer token | Exclude `*embed*` | `openai-completions` |
+ | **xAI** | `GET /v1/models` | Bearer token | `grok*` | `openai-completions` |
+ | **OpenRouter** | `GET /api/v1/models` | Bearer token | All (all OpenRouter models are usable) | `openai-completions` |
+ | **Cerebras** | `GET /v1/models` | Bearer token | All (Cerebras only serves chat models) | `openai-completions` |

- - **8-second timeout** per provider — slow APIs don't block everything
- - **`Promise.allSettled()`** — if one provider fails, the others still work
- - **Silent failure** — network errors are caught and ignored, static catalog always available
- - **One-time fetch** — results are cached in memory, not re-fetched on every call
- - **Zero config** — works automatically if you have API keys set
-
- ### How New Models Become Usable
-
- When a dynamically discovered model isn't in the static catalog, Noosphere constructs a **synthetic Model object** that pi-ai's `complete()` and `stream()` functions can use directly:
+ **How new LLM models become usable:** When a model isn't in the static catalog, Noosphere constructs a **synthetic `Model` object** with the correct API protocol, base URL, and inherited cost data:

  ```typescript
- // For a new model like "gpt-4.5-turbo" discovered from OpenAI's API:
+ // New model "gpt-4.5-turbo" discovered from OpenAI's /v1/models:
  {
    id: 'gpt-4.5-turbo',
    name: 'gpt-4.5-turbo',
-   api: 'openai-responses', // Correct protocol for the provider
+   api: 'openai-responses', // Correct protocol for OpenAI
    provider: 'openai',
    baseUrl: 'https://api.openai.com/v1',
-   reasoning: false, // Inferred from model ID prefix
+   reasoning: false, // Inferred from model ID prefix
    input: ['text', 'image'],
-   cost: { input: 2.5, output: 10, cacheRead: 1.25, cacheWrite: 2.5 }, // From template
-   contextWindow: 128000, // From template or provider API
-   maxTokens: 16384, // From template or provider API
+   cost: { input: 2.5, output: 10, ... }, // Inherited from template model
+   contextWindow: 128000, // From template or API response
+   maxTokens: 16384,
+ }
+ // This object is passed directly to pi-ai's complete()/stream() — works immediately
+ ```
+
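[Editor's note] The fetch-then-merge flow described above can be sketched roughly as follows. This is a simplified illustration, not Noosphere's actual code; `fetchProviderIds`, `mergeCatalogs`, and `discoverAll` are made-up names for the pattern (timeout via `AbortController`, `Promise.allSettled` across providers, static catalog wins on duplicates):

```typescript
interface CatalogModel { id: string; synthetic?: boolean }

// Fetch one provider's model list; silent failure returns [] so the
// static catalog is always still available.
async function fetchProviderIds(url: string, headers: Record<string, string>, timeoutMs = 8000): Promise<string[]> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { headers, signal: controller.signal });
    if (!res.ok) return [];
    const body = (await res.json()) as { data?: { id: string }[] };
    return (body.data ?? []).map((m) => m.id);
  } catch {
    return [];
  } finally {
    clearTimeout(timer);
  }
}

// Dedupe against the static catalog: static entries win (accurate cost data),
// unknown IDs become synthetic entries.
function mergeCatalogs(staticModels: CatalogModel[], dynamicIds: string[]): CatalogModel[] {
  const known = new Set(staticModels.map((m) => m.id));
  const discovered = dynamicIds
    .filter((id) => !known.has(id))
    .map((id) => ({ id, synthetic: true }));
  return [...staticModels, ...discovered];
}

// All providers queried concurrently; one failure never blocks the rest.
async function discoverAll(staticModels: CatalogModel[], endpoints: [string, Record<string, string>][]) {
  const settled = await Promise.allSettled(endpoints.map(([url, headers]) => fetchProviderIds(url, headers)));
  const ids = settled.flatMap((r) => (r.status === 'fulfilled' ? r.value : []));
  return mergeCatalogs(staticModels, ids);
}
```

The ordering choice (static first, discovered appended) keeps known models and their accurate pricing stable while new IDs show up at the end.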
+ ### Layer 2: Image/Video/TTS Auto-Fetch (FAL Provider) — Pricing API
+
+ FAL already provides a **fully dynamic catalog**. On `listModels()`, it fetches from `https://api.fal.ai/v1/models/pricing`:
+
+ ```typescript
+ // FAL returns an array with ALL available endpoints + live pricing:
+ [
+   { modelId: "fal-ai/flux-pro/v1.1-ultra", price: 0.06, unit: "per_image" },
+   { modelId: "fal-ai/kling-video/v2/master/text-to-video", price: 0.10, unit: "per_second" },
+   { modelId: "fal-ai/kokoro/american-english", price: 0.002, unit: "per_1k_chars" },
+   // ... 867+ endpoints total
+ ]
+
+ // Modality is auto-inferred from model ID + pricing unit:
+ // - unit contains 'char' OR id contains 'tts'/'kokoro'/'elevenlabs' → TTS
+ // - unit contains 'second' OR id contains 'video'/'kling'/'sora'/'veo' → Video
+ // - Everything else → Image
+ ```
+
+ **Result:** Every FAL model is always current — new endpoints appear the moment FAL publishes them. Pricing is always accurate because it comes directly from their API.
+
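[Editor's note] That classification rule is small enough to sketch directly. The heuristics are the ones quoted above; the function name `inferModality` is illustrative, not FAL's or Noosphere's exported API:

```typescript
type Modality = 'tts' | 'video' | 'image';

// Classify a FAL endpoint from its model ID and pricing unit,
// following the three heuristics listed in the README.
function inferModality(modelId: string, unit: string): Modality {
  const id = modelId.toLowerCase();
  const hasAny = (s: string, words: string[]) => words.some((w) => s.includes(w));
  if (unit.includes('char') || hasAny(id, ['tts', 'kokoro', 'elevenlabs'])) return 'tts';
  if (unit.includes('second') || hasAny(id, ['video', 'kling', 'sora', 'veo'])) return 'video';
  return 'image'; // default bucket
}
```

Checking the unit first matters: a per-second price marks a video endpoint even when the model ID carries no video-related keyword.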
+ ### Layer 3: LLM/Image/TTS Auto-Fetch (HuggingFace Provider) — Hub API
+
+ Instead of 3 hardcoded defaults, HuggingFace now fetches **trending inference-ready models** from the Hub API across all 3 modalities:
+
+ ```
+ GET https://huggingface.co/api/models
+     ?pipeline_tag=text-generation         ← LLM models
+     &inference_provider=all               ← Only models available via inference API
+     &sort=trendingScore                   ← Most popular first
+     &limit=50                             ← Top 50
+     &expand[]=inferenceProviderMapping    ← Include provider routing + pricing
+ ```
+
+ | Pipeline Tag | Modality | Limit | What It Fetches |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Top 50 trending chat/completion models with active inference endpoints |
+ | `text-to-image` | Image | 50 | Top 50 trending image generation models (SDXL, Flux, etc.) |
+ | `text-to-speech` | TTS | 30 | Top 30 trending TTS models with active inference endpoints |
+
+ **What the Hub API returns per model:**
+ ```json
+ {
+   "id": "Qwen/Qwen2.5-72B-Instruct",
+   "pipeline_tag": "text-generation",
+   "likes": 1893,
+   "downloads": 4521987,
+   "inferenceProviderMapping": [
+     {
+       "provider": "together",
+       "providerId": "Qwen/Qwen2.5-72B-Instruct-Turbo",
+       "status": "live",
+       "providerDetails": {
+         "context_length": 32768,
+         "pricing": { "input": 1.2, "output": 1.2 }
+       }
+     },
+     {
+       "provider": "fireworks-ai",
+       "providerId": "accounts/fireworks/models/qwen2p5-72b-instruct",
+       "status": "live"
+     }
+   ]
  }
  ```

- **Template inheritance:** Cost and context window data come from a "template" — the first model in the static catalog for that provider. This means new models inherit approximate pricing until the static catalog is updated with exact numbers. For Google, the API returns `inputTokenLimit` and `outputTokenLimit` directly, so context window data is always accurate.
+ **Noosphere extracts from this:**
+ - Model ID → `id` field
+ - Pricing → first provider with `providerDetails.pricing`
+ - Context window → first provider with `providerDetails.context_length`
+ - Inference providers → list of available providers (Together, Fireworks, Groq, etc.)
+
+ **Three requests fire in parallel** (`Promise.allSettled`) with a **10-second timeout** each. If any fails, the 3 hardcoded defaults are always available as fallback.
+
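[Editor's note] The extraction step on one Hub API entry (with the shape shown above) can be sketched like this. `extractModelInfo` and the interfaces are illustrative names for the pattern, not Noosphere's exported API:

```typescript
interface ProviderMapping {
  provider: string;
  status: string;
  providerDetails?: { context_length?: number; pricing?: { input: number; output: number } };
}
interface HubEntry { id: string; inferenceProviderMapping?: ProviderMapping[] }

// Pull pricing, context window, and live providers out of one Hub entry:
// first provider that carries pricing wins, first with context_length wins.
function extractModelInfo(entry: HubEntry) {
  const mapping = entry.inferenceProviderMapping ?? [];
  const pricing = mapping.find((p) => p.providerDetails?.pricing)?.providerDetails?.pricing;
  const contextWindow = mapping.find((p) => p.providerDetails?.context_length)?.providerDetails?.context_length;
  const providers = mapping.filter((p) => p.status === 'live').map((p) => p.provider);
  return {
    id: entry.id,
    cost: { price: pricing?.input ?? 0, unit: pricing ? 'per_1m_tokens' : 'free' },
    contextWindow,
    providers,
  };
}
```

Models whose mapping carries no pricing at all are treated as free rather than dropped, which matches the fallback behavior described in the resilience table below.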
+ ### Resilience Guarantees (All Layers)
+
+ | Guarantee | Pi-AI (LLM) | FAL (Image/Video/TTS) | HuggingFace (LLM/Image/TTS) |
+ |---|---|---|---|
+ | **Timeout** | 8s per provider | No custom timeout | 10s per pipeline_tag |
+ | **Parallelism** | 8 concurrent requests | 1 request (returns all) | 3 concurrent requests |
+ | **Failure handling** | `Promise.allSettled` | Returns `[]` on error | `Promise.allSettled` |
+ | **Fallback** | Static pi-ai catalog (246 models) | Empty list (provider still usable by model ID) | 3 hardcoded defaults |
+ | **Caching** | One-time fetch, cached in memory | Per `listModels()` call | One-time fetch, cached in memory |
+ | **Auth required** | Yes (per-provider API keys) | Yes (FAL key) | Optional (works without token) |
+
+ ### Total Model Coverage
+
+ | Source | Modalities | Model Count | Update Frequency |
+ |---|---|---|---|
+ | Pi-AI static catalog | LLM | ~246 | On npm update |
+ | Pi-AI dynamic fetch | LLM | **All models across 8 providers** | **Every session** |
+ | FAL pricing API | Image, Video, TTS | 867+ | **Every `listModels()` call** |
+ | HuggingFace Hub API | LLM, Image, TTS | Top 130 trending | **Every session** |
+ | ComfyUI `/object_info` | Image | Local checkpoints | **Every `listModels()` call** |
+ | Local TTS `/voices` | TTS | Local voices | **Every `listModels()` call** |

  ### Force Refresh

  ```typescript
  const ai = new Noosphere();

- // Models are auto-fetched on first call:
+ // Models are auto-fetched on first call — no action needed:
  await ai.chat({ model: 'gemini-2.5-ultra', messages: [...] }); // works immediately

- // Force a re-fetch if you suspect new models were added mid-session:
- // (access the provider's refreshDynamicModels method via the registry)
- const models = await ai.getModels('llm');
- // Or trigger a full sync:
- await ai.syncModels();
+ // Trigger a full sync across ALL providers:
+ const result = await ai.syncModels();
+ // result = { synced: 1200+, byProvider: { 'pi-ai': 300, 'fal': 867, 'huggingface': 130, ... }, errors: [] }
+
+ // Get all models for a specific modality:
+ const imageModels = await ai.getModels('image');
+ // Returns: FAL image models + HuggingFace image models + ComfyUI models
  ```

- ### Why Not Just Use the Provider APIs Directly?
+ ### Why Hybrid (Static + Dynamic)?

  | Approach | Pros | Cons |
  |---|---|---|
- | **Static catalog only** (old) | Accurate costs, fast startup | Stale within days, miss new models |
+ | **Static catalog only** | Accurate costs, fast startup | Stale within days, misses new models |
  | **Dynamic only** | Always current | No cost data, no context window info, slow startup |
  | **Hybrid (Noosphere)** | Best of both — accurate data for known models + immediate access to new ones | New models have estimated costs until catalog update |

@@ -1306,15 +1407,33 @@ The `@fal-ai/client` provides additional features beyond what Noosphere surfaces

  ---

- ### Hugging Face — Open Source AI (30+ tasks)
+ ### Hugging Face — Open Source AI (30+ tasks, Dynamic Discovery)

  **Provider ID:** `huggingface`
  **Modalities:** LLM, Image, TTS
  **Library:** `@huggingface/inference`
+ **Auto-Fetch:** Yes — discovers trending inference-ready models from the Hub API

- Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace can be used by passing its ID directly.
+ Access to the entire Hugging Face Hub ecosystem. Noosphere **automatically discovers the top trending models** across all 3 modalities via the Hub API, filtered to only include models with active inference provider endpoints.

- #### Default Models
+ #### Auto-Discovered Models
+
+ On the first `listModels()` call, HuggingFace fetches from:
+ ```
+ GET https://huggingface.co/api/models?inference_provider=all&pipeline_tag={tag}&sort=trendingScore&limit={n}&expand[]=inferenceProviderMapping
+ ```
+
+ | Pipeline Tag | Modality | Limit | Example Models |
+ |---|---|---|---|
+ | `text-generation` | LLM | 50 | Qwen2.5-72B-Instruct, Llama-3.3-70B, DeepSeek-V3, Mistral-Large |
+ | `text-to-image` | Image | 50 | FLUX.1-dev, Stable Diffusion 3.5, SDXL-Lightning, Playground v2.5 |
+ | `text-to-speech` | TTS | 30 | Kokoro-82M, Bark, MMS-TTS |
+
+ Each discovered model includes **inference provider routing** (Together, Fireworks, Groq, Replicate, etc.) and **pricing data** when available from the provider.
+
+ #### Fallback Default Models
+
+ These 3 models are always available, even if the Hub API is unreachable:

  | Modality | Default Model | Description |
  |---|---|---|
@@ -1322,7 +1441,7 @@ Access to the entire Hugging Face Hub ecosystem. Any model hosted on HuggingFace
  | Image | `stabilityai/stable-diffusion-xl-base-1.0` | SDXL Base |
  | TTS | `facebook/mms-tts-eng` | MMS TTS English |

- Any HuggingFace model ID works — just pass it as the `model` parameter:
+ Any HuggingFace model ID works — just pass it as the `model` parameter (even if it's not in the auto-discovered list):

  ```typescript
  await ai.chat({
@@ -1480,26 +1599,31 @@ const buffer = Buffer.from(await blob.arrayBuffer());
  // result.media = { format: 'wav' }
  ```

- **Model listing — curated defaults, not API discovery:**
+ **Model listing — dynamic Hub API discovery:**

  ```typescript
- // Unlike FAL (which fetches from API) or Pi-AI (which auto-generates),
- // HuggingFace returns a HARDCODED list of 3 curated models:
+ // HuggingFace now auto-fetches trending models from the Hub API:
  async listModels(modality?: Modality): Promise<ModelInfo[]> {
-   const models: ModelInfo[] = [];
-   if (!modality || modality === 'image') {
-     models.push({ id: 'stabilityai/stable-diffusion-xl-base-1.0', ... });
-   }
-   if (!modality || modality === 'tts') {
-     models.push({ id: 'facebook/mms-tts-eng', ... });
-   }
-   if (!modality || modality === 'llm') {
-     models.push({ id: 'meta-llama/Llama-3.1-8B-Instruct', ... });
-   }
-   return models;
+   if (!this.dynamicModels) await this.fetchHubModels();
+   // Returns: 3 hardcoded defaults + top 50 LLM + top 50 image + top 30 TTS
+   // All filtered by inference_provider=all (only inference-ready models)
  }
- // This means: the registry only KNOWS about 3 models by default,
- // but you can use ANY HuggingFace model by passing its ID directly.
- // The model just won't appear in getModels() or syncModels() results.
+
+ // Hub API request per modality:
+ // GET https://huggingface.co/api/models
+ //     ?pipeline_tag=text-generation
+ //     &inference_provider=all              ← Only models with active inference endpoints
+ //     &sort=trendingScore                  ← Most popular first
+ //     &limit=50
+ //     &expand[]=inferenceProviderMapping   ← Include provider routing + pricing
+
+ // Response includes per model:
+ //   - id: "Qwen/Qwen2.5-72B-Instruct"
+ //   - inferenceProviderMapping: [{ provider: "together", status: "live",
+ //       providerDetails: { context_length: 32768, pricing: { input: 1.2 } } }]
+
+ // Pricing and context_length extracted from inferenceProviderMapping
+ // 3 hardcoded defaults always included as fallback
+ // Results cached in memory after first fetch
  ```

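[Editor's note] The cache-once-with-fallback behavior described in the comments above follows a common memoization pattern, sketched here under illustrative names (`HubCatalog`, `DEFAULTS`, `discover` are not part of the package's API):

```typescript
interface ModelInfo { id: string; modality: string }

// The 3 always-available defaults, mirroring the fallback table above.
const DEFAULTS: ModelInfo[] = [
  { id: 'meta-llama/Llama-3.1-8B-Instruct', modality: 'llm' },
  { id: 'stabilityai/stable-diffusion-xl-base-1.0', modality: 'image' },
  { id: 'facebook/mms-tts-eng', modality: 'tts' },
];

class HubCatalog {
  private cache: ModelInfo[] | null = null;
  constructor(private discover: () => Promise<ModelInfo[]>) {}

  // Fetch once, cache in memory; if discovery throws, serve defaults only.
  async list(modality?: string): Promise<ModelInfo[]> {
    if (!this.cache) {
      try {
        const fetched = await this.discover();
        const seen = new Set(DEFAULTS.map((m) => m.id));
        this.cache = [...DEFAULTS, ...fetched.filter((m) => !seen.has(m.id))];
      } catch {
        this.cache = DEFAULTS; // Hub unreachable → fall back to hardcoded defaults
      }
    }
    return modality ? this.cache.filter((m) => m.modality === modality) : this.cache;
  }
}
```

Because the cache also stores the failure result, a flaky Hub doesn't cause repeated slow fetches; a force-refresh API (like `syncModels()` above) simply resets the cache.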
  #### The 17 HuggingFace Inference Providers
package/dist/index.cjs CHANGED
@@ -279,7 +279,6 @@ var UsageTracker = class {
  // src/providers/pi-ai.ts
  var import_pi_ai = require("@mariozechner/pi-ai");
  var KNOWN_PROVIDERS = ["anthropic", "google", "openai", "xai", "groq", "cerebras", "openrouter", "zai"];
- var LOCAL_PROVIDERS = /* @__PURE__ */ new Set(["ollama"]);
  var FETCH_TIMEOUT_MS = 8e3;
  var OPENAI_CHAT_PREFIXES = ["gpt-", "o1", "o3", "o4", "chatgpt-", "codex-"];
  var OPENAI_REASONING_PREFIXES = ["o1", "o3", "o4"];
@@ -420,35 +419,7 @@ var PiAiProvider = class {
      if (modality && modality !== "llm") return [];
      await this.ensureDynamicModels();
      const models = [];
-     const seenIds = /* @__PURE__ */ new Set();
-     for (const provider of KNOWN_PROVIDERS) {
-       try {
-         const providerModels = (0, import_pi_ai.getModels)(provider);
-         for (const m of providerModels) {
-           seenIds.add(m.id);
-           models.push({
-             id: m.id,
-             provider: "pi-ai",
-             name: m.name || m.id,
-             modality: "llm",
-             local: LOCAL_PROVIDERS.has(String(m.provider)),
-             cost: {
-               price: m.cost.input ?? 0,
-               unit: m.cost.input > 0 ? "per_1m_tokens" : "free"
-             },
-             capabilities: {
-               contextWindow: m.contextWindow,
-               maxTokens: m.maxTokens,
-               supportsVision: m.input.includes("image"),
-               supportsStreaming: true
-             }
-           });
-         }
-       } catch {
-       }
-     }
-     for (const [id, m] of this.dynamicModels) {
-       if (seenIds.has(id)) continue;
+     for (const [, m] of this.dynamicModels) {
      models.push({
        id: m.id,
        provider: "pi-ai",
@@ -583,24 +554,15 @@ var PiAiProvider = class {
  async ensureDynamicModels() {
    if (this.dynamicModelsFetched) return;
    this.dynamicModelsFetched = true;
-   const staticIds = /* @__PURE__ */ new Set();
-   for (const provider of KNOWN_PROVIDERS) {
-     try {
-       for (const m of (0, import_pi_ai.getModels)(provider)) {
-         staticIds.add(m.id);
-       }
-     } catch {
-     }
-   }
    const fetchPromises = [];
    for (const [providerKey, configFactory] of Object.entries(PROVIDER_APIS)) {
      const apiKey = this.keys[providerKey];
      if (!apiKey) continue;
-     fetchPromises.push(this.fetchProviderModels(configFactory(apiKey), apiKey, staticIds));
+     fetchPromises.push(this.fetchProviderModels(configFactory(apiKey), apiKey));
    }
    await Promise.allSettled(fetchPromises);
  }
- async fetchProviderModels(config, apiKey, staticIds) {
+ async fetchProviderModels(config, apiKey) {
    try {
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
@@ -613,11 +575,11 @@ var PiAiProvider = class {
      if (!res.ok) return;
      const data = await res.json();
      const entries = config.extractEntries(data);
-     const templateModel = this.findStaticTemplate(config.providerName);
+     const staticTemplate = this.findStaticTemplate(config.providerName);
      for (const entry of entries) {
        const id = entry.id;
        if (!config.filterChat(id)) continue;
-       if (staticIds.has(id)) continue;
+       const staticMatch = this.findStaticModel(config.providerName, id);
        this.dynamicModels.set(id, {
          id,
          name: entry.name ?? id,
@@ -625,10 +587,10 @@ var PiAiProvider = class {
          provider: config.providerName,
          baseUrl: config.piBaseUrl,
          reasoning: config.isReasoning(id),
-         input: ["text", "image"],
-         cost: templateModel?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
-         contextWindow: entry.contextWindow ?? templateModel?.contextWindow ?? 128e3,
-         maxTokens: entry.maxTokens ?? templateModel?.maxTokens ?? 16384
+         input: staticMatch?.input ?? ["text", "image"],
+         cost: staticMatch?.cost ?? staticTemplate?.cost ?? { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
+         contextWindow: entry.contextWindow ?? staticMatch?.contextWindow ?? staticTemplate?.contextWindow ?? 128e3,
+         maxTokens: entry.maxTokens ?? staticMatch?.maxTokens ?? staticTemplate?.maxTokens ?? 16384
        });
      }
    } finally {
@@ -645,6 +607,14 @@ var PiAiProvider = class {
      return null;
    }
  }
+ findStaticModel(providerName, modelId) {
+   try {
+     const models = (0, import_pi_ai.getModels)(providerName);
+     return models.find((m) => m.id === modelId) ?? null;
+   } catch {
+     return null;
+   }
+ }
  /** Force re-fetch of dynamic models from provider APIs */
  async refreshDynamicModels() {
    this.dynamicModelsFetched = false;
@@ -652,18 +622,24 @@ var PiAiProvider = class {
    await this.ensureDynamicModels();
  }
  findModel(modelId) {
+   if (modelId) {
+     const dynamic = this.dynamicModels.get(modelId);
+     if (dynamic) return { model: dynamic, provider: String(dynamic.provider) };
+   }
+   if (!modelId) {
+     const first = this.dynamicModels.values().next();
+     if (!first.done && first.value) {
+       return { model: first.value, provider: String(first.value.provider) };
+     }
+   }
    for (const provider of KNOWN_PROVIDERS) {
      try {
        const models = (0, import_pi_ai.getModels)(provider);
-       const found = modelId ? models.find((m) => m.id === modelId) : models[0];
+       const found = modelId ? models.find((m) => m.id === modelId) : void 0;
        if (found) return { model: found, provider };
      } catch {
      }
    }
-   if (modelId) {
-     const dynamic = this.dynamicModels.get(modelId);
-     if (dynamic) return { model: dynamic, provider: String(dynamic.provider) };
-   }
    return { model: null, provider: null };
  }
  };
@@ -1037,51 +1013,102 @@ var LocalTTSProvider = class {

  // src/providers/huggingface.ts
  var import_inference = require("@huggingface/inference");
+ var HF_HUB_API = "https://huggingface.co/api/models";
+ var FETCH_TIMEOUT_MS2 = 1e4;
+ var PIPELINE_TAG_MAP = {
+   "text-generation": { modality: "llm", limit: 50 },
+   "text-to-image": { modality: "image", limit: 50 },
+   "text-to-speech": { modality: "tts", limit: 30 }
+ };
  var HuggingFaceProvider = class {
    id = "huggingface";
    name = "HuggingFace Inference";
    modalities = ["image", "tts", "llm"];
    isLocal = false;
    client;
+   token;
+   dynamicModels = null;
    constructor(token) {
+     this.token = token;
      this.client = new import_inference.HfInference(token);
    }
    async ping() {
      return true;
    }
    async listModels(modality) {
-     const models = [];
-     if (!modality || modality === "image") {
-       models.push({
-         id: "stabilityai/stable-diffusion-xl-base-1.0",
-         provider: "huggingface",
-         name: "SDXL Base",
-         modality: "image",
-         local: false,
-         cost: { price: 0, unit: "free" }
-       });
+     if (!this.dynamicModels) {
+       await this.fetchHubModels();
      }
-     if (!modality || modality === "tts") {
-       models.push({
-         id: "facebook/mms-tts-eng",
-         provider: "huggingface",
-         name: "MMS TTS English",
-         modality: "tts",
-         local: false,
-         cost: { price: 0, unit: "free" }
-       });
+     const all = this.dynamicModels ?? [];
+     if (modality) return all.filter((m) => m.modality === modality);
+     return all;
+   }
+   async fetchHubModels() {
+     const seenIds = /* @__PURE__ */ new Set();
+     const models = [];
+     const fetches = Object.entries(PIPELINE_TAG_MAP).map(
+       ([tag, { modality, limit }]) => this.fetchByPipelineTag(tag, modality, limit)
+     );
+     const results = await Promise.allSettled(fetches);
+     for (const result of results) {
+       if (result.status !== "fulfilled") continue;
+       for (const model of result.value) {
+         if (seenIds.has(model.id)) continue;
+         seenIds.add(model.id);
+         models.push(model);
+       }
      }
-     if (!modality || modality === "llm") {
-       models.push({
-         id: "meta-llama/Llama-3.1-8B-Instruct",
-         provider: "huggingface",
-         name: "Llama 3.1 8B",
-         modality: "llm",
-         local: false,
-         cost: { price: 0, unit: "free" }
-       });
+     this.dynamicModels = models;
+   }
+   async fetchByPipelineTag(pipelineTag, modality, limit) {
+     try {
+       const controller = new AbortController();
+       const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS2);
+       try {
+         const params = new URLSearchParams({
+           pipeline_tag: pipelineTag,
+           inference_provider: "all",
+           sort: "trendingScore",
+           limit: String(limit),
+           "expand[]": "inferenceProviderMapping"
+         });
+         const res = await fetch(`${HF_HUB_API}?${params}`, {
+           headers: this.token ? { Authorization: `Bearer ${this.token}` } : {},
+           signal: controller.signal
+         });
+         if (!res.ok) return [];
+         const data = await res.json();
+         return data.filter((entry) => entry.id || entry.modelId).map((entry) => {
+           const id = entry.id ?? entry.modelId;
+           const providers = (entry.inferenceProviderMapping ?? []).filter((p) => p.status === "live").map((p) => p.provider);
+           const pricingProvider = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.pricing);
+           const pricing = pricingProvider?.providerDetails?.pricing;
+           const contextLength = (entry.inferenceProviderMapping ?? []).find((p) => p.providerDetails?.context_length)?.providerDetails?.context_length;
+           return {
+             id,
+             provider: "huggingface",
+             name: id.split("/").pop() ?? id,
+             modality,
+             local: false,
+             cost: {
+               price: pricing?.input ?? 0,
+               unit: pricing ? "per_1m_tokens" : "free"
+             },
+             capabilities: {
+               ...modality === "llm" ? {
+                 contextWindow: contextLength,
+                 supportsStreaming: true
+               } : {},
+               ...providers.length > 0 ? { inferenceProviders: providers } : {}
+             }
+           };
+         });
+       } finally {
+         clearTimeout(timer);
+       }
+     } catch {
+       return [];
      }
-     return models;
  }
  async chat(options) {
    const start = Date.now();