@push.rocks/smartai 0.13.3 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist_ts/00_commitinfo_data.js +3 -3
- package/dist_ts/index.d.ts +6 -11
- package/dist_ts/index.js +6 -12
- package/dist_ts/plugins.d.ts +10 -15
- package/dist_ts/plugins.js +13 -19
- package/dist_ts/smartai.classes.smartai.d.ts +7 -0
- package/dist_ts/smartai.classes.smartai.js +51 -0
- package/dist_ts/smartai.interfaces.d.ts +41 -0
- package/dist_ts/smartai.interfaces.js +2 -0
- package/dist_ts/smartai.middleware.anthropic.d.ts +7 -0
- package/dist_ts/smartai.middleware.anthropic.js +36 -0
- package/dist_ts/smartai.provider.ollama.d.ts +8 -0
- package/dist_ts/smartai.provider.ollama.js +378 -0
- package/dist_ts_audio/index.d.ts +9 -0
- package/dist_ts_audio/index.js +15 -0
- package/dist_ts_audio/plugins.d.ts +2 -0
- package/dist_ts_audio/plugins.js +3 -0
- package/dist_ts_document/index.d.ts +11 -0
- package/dist_ts_document/index.js +45 -0
- package/dist_ts_document/plugins.d.ts +3 -0
- package/dist_ts_document/plugins.js +4 -0
- package/dist_ts_image/index.d.ts +46 -0
- package/dist_ts_image/index.js +110 -0
- package/dist_ts_image/plugins.d.ts +3 -0
- package/dist_ts_image/plugins.js +4 -0
- package/dist_ts_research/index.d.ts +19 -0
- package/dist_ts_research/index.js +98 -0
- package/dist_ts_research/plugins.d.ts +2 -0
- package/dist_ts_research/plugins.js +3 -0
- package/dist_ts_vision/index.d.ts +8 -0
- package/dist_ts_vision/index.js +21 -0
- package/dist_ts_vision/plugins.d.ts +2 -0
- package/dist_ts_vision/plugins.js +3 -0
- package/package.json +50 -22
- package/readme.hints.md +34 -88
- package/readme.md +284 -547
- package/ts/00_commitinfo_data.ts +2 -2
- package/ts/index.ts +8 -11
- package/ts/plugins.ts +19 -35
- package/ts/smartai.classes.smartai.ts +51 -0
- package/ts/smartai.interfaces.ts +53 -0
- package/ts/smartai.middleware.anthropic.ts +38 -0
- package/ts/smartai.provider.ollama.ts +426 -0
- package/ts_audio/index.ts +24 -0
- package/ts_audio/plugins.ts +2 -0
- package/ts_document/index.ts +61 -0
- package/ts_document/plugins.ts +3 -0
- package/ts_image/index.ts +147 -0
- package/ts_image/plugins.ts +3 -0
- package/ts_research/index.ts +120 -0
- package/ts_research/plugins.ts +2 -0
- package/ts_vision/index.ts +29 -0
- package/ts_vision/plugins.ts +2 -0
- package/dist_ts/abstract.classes.multimodal.d.ts +0 -212
- package/dist_ts/abstract.classes.multimodal.js +0 -43
- package/dist_ts/classes.conversation.d.ts +0 -31
- package/dist_ts/classes.conversation.js +0 -150
- package/dist_ts/classes.smartai.d.ts +0 -59
- package/dist_ts/classes.smartai.js +0 -139
- package/dist_ts/classes.tts.d.ts +0 -6
- package/dist_ts/classes.tts.js +0 -10
- package/dist_ts/interfaces.d.ts +0 -1
- package/dist_ts/interfaces.js +0 -2
- package/dist_ts/paths.d.ts +0 -2
- package/dist_ts/paths.js +0 -4
- package/dist_ts/provider.anthropic.d.ts +0 -48
- package/dist_ts/provider.anthropic.js +0 -369
- package/dist_ts/provider.elevenlabs.d.ts +0 -43
- package/dist_ts/provider.elevenlabs.js +0 -64
- package/dist_ts/provider.exo.d.ts +0 -40
- package/dist_ts/provider.exo.js +0 -116
- package/dist_ts/provider.groq.d.ts +0 -39
- package/dist_ts/provider.groq.js +0 -178
- package/dist_ts/provider.mistral.d.ts +0 -61
- package/dist_ts/provider.mistral.js +0 -288
- package/dist_ts/provider.ollama.d.ts +0 -141
- package/dist_ts/provider.ollama.js +0 -529
- package/dist_ts/provider.openai.d.ts +0 -62
- package/dist_ts/provider.openai.js +0 -403
- package/dist_ts/provider.perplexity.d.ts +0 -37
- package/dist_ts/provider.perplexity.js +0 -215
- package/dist_ts/provider.xai.d.ts +0 -52
- package/dist_ts/provider.xai.js +0 -160
- package/ts/abstract.classes.multimodal.ts +0 -240
- package/ts/classes.conversation.ts +0 -176
- package/ts/classes.smartai.ts +0 -187
- package/ts/classes.tts.ts +0 -15
- package/ts/interfaces.ts +0 -0
- package/ts/paths.ts +0 -4
- package/ts/provider.anthropic.ts +0 -446
- package/ts/provider.elevenlabs.ts +0 -116
- package/ts/provider.exo.ts +0 -155
- package/ts/provider.groq.ts +0 -219
- package/ts/provider.mistral.ts +0 -352
- package/ts/provider.ollama.ts +0 -705
- package/ts/provider.openai.ts +0 -462
- package/ts/provider.perplexity.ts +0 -259
- package/ts/provider.xai.ts +0 -214
package/readme.md
CHANGED

# @push.rocks/smartai

**A unified provider registry for the Vercel AI SDK** 🧠⚡

[](https://www.npmjs.com/package/@push.rocks/smartai)
[](https://www.typescriptlang.org/)
[](https://opensource.org/licenses/MIT)

SmartAI gives you a single `getModel()` function that returns a standard `LanguageModelV3` for **any** supported provider — Anthropic, OpenAI, Google, Groq, Mistral, XAI, Perplexity, or Ollama. Use the returned model with the Vercel AI SDK's `generateText()`, `streamText()`, and tool ecosystem. Specialized capabilities like vision, audio, image generation, document analysis, and web research are available as dedicated subpath imports.

## Issue Reporting and Security

For reporting bugs, issues, or security vulnerabilities, please visit [community…

## 🎯 Why SmartAI?

- **🔌 One function, eight providers** — `getModel()` returns a standard `LanguageModelV3`. Switch providers by changing a string.
- **🧱 Built on Vercel AI SDK** — Uses `ai` v6 under the hood. Your model works with `generateText()`, `streamText()`, tool calling, structured output, and everything else in the AI SDK ecosystem.
- **🏠 Custom Ollama provider** — A full `LanguageModelV3` implementation for Ollama with support for `think` mode, `num_ctx`, auto-tuned temperature for Qwen models, and native tool calling.
- **💰 Anthropic prompt caching** — Automatic `cacheControl` middleware reduces cost and latency on repeated calls. Enabled by default, opt out with `promptCaching: false`.
- **📦 Modular subpath exports** — Vision, audio, image, document, and research capabilities ship as separate imports. Only import what you need.
- **⚡ Zero lock-in** — Your code uses standard AI SDK types. Swap providers without touching application logic.

## 📦 Installation

```bash
pnpm install @push.rocks/smartai
```

## 🚀 Quick Start

```typescript
import { getModel, generateText, streamText } from '@push.rocks/smartai';

// Get a model for any provider
const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

// Use it with the standard AI SDK functions
const result = await generateText({
  model,
  prompt: 'Explain quantum computing in simple terms.',
});

console.log(result.text);
```

That's it. Change `provider` to `'openai'` and `model` to `'gpt-4o'` and the rest of your code stays exactly the same.
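Because every provider yields the same `LanguageModelV3`, provider selection can be reduced to plain data. A minimal sketch of that idea (the `pickOptions` helper and its default-model table are illustrative examples, not part of the package API):

```typescript
// Illustrative: map a provider name to a ready-to-use getModel() options object.
// The helper and its default model choices are examples, not package API.
type ProviderName = 'anthropic' | 'openai' | 'google' | 'groq';

const defaultModels: Record<ProviderName, string> = {
  anthropic: 'claude-sonnet-4-5-20250929',
  openai: 'gpt-4o',
  google: 'gemini-2.0-flash',
  groq: 'llama-3.3-70b-versatile',
};

function pickOptions(provider: ProviderName, apiKey: string) {
  return { provider, model: defaultModels[provider], apiKey };
}

// e.g. const model = getModel(pickOptions('openai', process.env.OPENAI_TOKEN!));
console.log(pickOptions('openai', 'sk-test').model); // "gpt-4o"
```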
## 🔧 Core API

### `getModel(options): LanguageModelV3`

The primary export. Returns a standard `LanguageModelV3` you can use with any AI SDK function.

```typescript
import { getModel } from '@push.rocks/smartai';
import type { ISmartAiOptions } from '@push.rocks/smartai';

const options: ISmartAiOptions = {
  provider: 'anthropic', // 'anthropic' | 'openai' | 'google' | 'groq' | 'mistral' | 'xai' | 'perplexity' | 'ollama'
  model: 'claude-sonnet-4-5-20250929',
  apiKey: 'sk-ant-...',
  // Anthropic-only: prompt caching (default: true)
  promptCaching: true,
  // Ollama-only: base URL (default: http://localhost:11434)
  baseUrl: 'http://localhost:11434',
  // Ollama-only: model runtime options
  ollamaOptions: { think: true, num_ctx: 4096 },
};

const model = getModel(options);
```

### Re-exported AI SDK Functions

SmartAI re-exports the most commonly used functions from `ai` for convenience:

```typescript
import {
  getModel,
  generateText,
  streamText,
  tool,
  jsonSchema,
} from '@push.rocks/smartai';

import type {
  ModelMessage,
  ToolSet,
  StreamTextResult,
  LanguageModelV3,
} from '@push.rocks/smartai';
```

## 🤖 Supported Providers

| Provider | Package | Example Models |
|----------|---------|----------------|
| **Anthropic** | `@ai-sdk/anthropic` | `claude-sonnet-4-5-20250929`, `claude-opus-4-5-20250929` |
| **OpenAI** | `@ai-sdk/openai` | `gpt-4o`, `gpt-4o-mini`, `o3-mini` |
| **Google** | `@ai-sdk/google` | `gemini-2.0-flash`, `gemini-2.5-pro` |
| **Groq** | `@ai-sdk/groq` | `llama-3.3-70b-versatile`, `mixtral-8x7b-32768` |
| **Mistral** | `@ai-sdk/mistral` | `mistral-large-latest`, `mistral-small-latest` |
| **XAI** | `@ai-sdk/xai` | `grok-3`, `grok-3-mini` |
| **Perplexity** | `@ai-sdk/perplexity` | `sonar-pro`, `sonar` |
| **Ollama** | Custom `LanguageModelV3` | `qwen3:8b`, `llama3:8b`, `deepseek-r1` |

## 💬 Text Generation

### Generate Text

```typescript
import { getModel, generateText } from '@push.rocks/smartai';

const model = getModel({
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_TOKEN,
});

const result = await generateText({
  model,
  system: 'You are a helpful assistant.',
  prompt: 'What is 2 + 2?',
});

console.log(result.text); // "4"
```

### Stream Text

```typescript
import { getModel, streamText } from '@push.rocks/smartai';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const result = await streamText({
  model,
  prompt: 'Count from 1 to 10.',
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

### Tool Calling

```typescript
import { getModel, generateText, tool, jsonSchema } from '@push.rocks/smartai';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const result = await generateText({
  model,
  prompt: 'What is the weather in London?',
  tools: {
    getWeather: tool({
      description: 'Get weather for a location',
      parameters: jsonSchema({
        type: 'object',
        properties: {
          location: { type: 'string' },
        },
        required: ['location'],
      }),
      execute: async ({ location }) => {
        return { temperature: 18, condition: 'cloudy' };
      },
    }),
  },
});
```

## 🏠 Ollama (Local Models)

The custom Ollama provider implements `LanguageModelV3` directly, calling Ollama's native `/api/chat` endpoint. This gives you features that generic OpenAI-compatible wrappers miss:

```typescript
import { getModel, generateText } from '@push.rocks/smartai';

const model = getModel({
  provider: 'ollama',
  model: 'qwen3:8b',
  baseUrl: 'http://localhost:11434', // default
  ollamaOptions: {
    think: true,      // Enable thinking/reasoning mode
    num_ctx: 8192,    // Context window size
    temperature: 0.7, // Override default (Qwen models auto-default to 0.55)
  },
});

const result = await generateText({
  model,
  prompt: 'Solve this step by step: what is 15% of 340?',
});

console.log(result.text);
```

### Ollama Features

- **`think` mode** — Enables reasoning for models that support it (Qwen3, QwQ, DeepSeek-R1). The `think` parameter is sent at the top level of the request body as required by the Ollama API.
- **Auto-tuned temperature** — Qwen models automatically get `temperature: 0.55` when no explicit temperature is set, matching the recommended inference setting.
- **Native tool calling** — Full tool call support via Ollama's native format (not shimmed through OpenAI-compatible endpoints).
- **Streaming with reasoning** — `doStream()` emits proper `reasoning-start`, `reasoning-delta`, `reasoning-end` parts alongside text.
- **All Ollama options** — `num_ctx`, `top_k`, `top_p`, `repeat_penalty`, `num_predict`, `stop`, `seed`.
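The `think` placement described above can be pictured with a rough sketch of the `/api/chat` request body. Field names follow Ollama's public chat API; the exact payload this package constructs may differ:

```typescript
// Illustrative request shape for Ollama's native /api/chat endpoint.
// `think` sits at the top level; runtime knobs like num_ctx go under `options`.
interface OllamaRuntimeOptions {
  think?: boolean;
  num_ctx?: number;
  temperature?: number;
}

function buildOllamaChatBody(
  model: string,
  messages: { role: string; content: string }[],
  opts: OllamaRuntimeOptions = {},
) {
  const { think, ...runtime } = opts;
  return {
    model,
    messages,
    stream: true,
    ...(think !== undefined ? { think } : {}), // top-level, per the Ollama API
    options: runtime, // num_ctx, temperature, etc. nest under `options`
  };
}

const body = buildOllamaChatBody(
  'qwen3:8b',
  [{ role: 'user', content: 'hi' }],
  { think: true, num_ctx: 8192 },
);
console.log(body.think, body.options.num_ctx); // true 8192
```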
## 💰 Anthropic Prompt Caching

When using the Anthropic provider, SmartAI automatically wraps the model with caching middleware that adds `cacheControl: { type: 'ephemeral' }` to the last system message and last user message. This can significantly reduce cost and latency for repeated calls with the same system prompt.

```typescript
// Caching enabled by default
const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

// Opt out of caching
const modelNoCaching = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
  promptCaching: false,
});
```

You can also use the middleware directly:

```typescript
import { createAnthropicCachingMiddleware } from '@push.rocks/smartai';
import { wrapLanguageModel } from 'ai';

const middleware = createAnthropicCachingMiddleware();
const cachedModel = wrapLanguageModel({ model: baseModel, middleware });
```
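The middleware's observable effect can be sketched as a pure transform over the message list: walk backwards, tag the last system message and the last user message with ephemeral `cacheControl`, and leave everything else untouched. A simplified illustration, not the package's actual implementation:

```typescript
// Simplified sketch of the caching transform's effect (illustrative only).
type Role = 'system' | 'user' | 'assistant';
interface Msg {
  role: Role;
  content: string;
  providerOptions?: Record<string, unknown>;
}

function tagForCaching(messages: Msg[]): Msg[] {
  const out = messages.map((m) => ({ ...m }));
  for (const role of ['system', 'user'] as const) {
    // Find the LAST message with this role and mark it cacheable.
    for (let i = out.length - 1; i >= 0; i--) {
      if (out[i].role === role) {
        out[i].providerOptions = {
          anthropic: { cacheControl: { type: 'ephemeral' } },
        };
        break;
      }
    }
  }
  return out;
}

const tagged = tagForCaching([
  { role: 'system', content: 'You are terse.' },
  { role: 'user', content: 'First question' },
  { role: 'assistant', content: 'First answer' },
  { role: 'user', content: 'Second question' },
]);
// tagged[0] and tagged[3] carry cacheControl; tagged[1] and tagged[2] do not.
```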
## 📦 Subpath Exports

SmartAI provides specialized capabilities as separate subpath imports. Each one is a focused utility that takes a model (or API key) and does one thing well.

### 👁️ Vision — `@push.rocks/smartai/vision`

Analyze images using any vision-capable model.

```typescript
import { analyzeImage } from '@push.rocks/smartai/vision';
import { getModel } from '@push.rocks/smartai';
import * as fs from 'fs';

const model = getModel({
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_TOKEN,
});

const description = await analyzeImage({
  model,
  image: fs.readFileSync('photo.jpg'),
  prompt: 'Describe this image in detail.',
  mediaType: 'image/jpeg', // optional, defaults to 'image/jpeg'
});

console.log(description);
```

**`analyzeImage(options)`** accepts:

- `model` — Any `LanguageModelV3` with vision support
- `image` — `Buffer` or `Uint8Array`
- `prompt` — What to ask about the image
- `mediaType` — `'image/jpeg'` | `'image/png'` | `'image/webp'` | `'image/gif'`
|
|
408
|
-
###
|
|
287
|
+
### 🎙️ Audio — `@push.rocks/smartai/audio`
|
|
409
288
|
|
|
410
|
-
|
|
289
|
+
Text-to-speech using OpenAI's TTS models.
|
|
411
290
|
|
|
412
291
|
```typescript
|
|
413
|
-
|
|
414
|
-
|
|
415
|
-
|
|
416
|
-
|
|
417
|
-
|
|
418
|
-
|
|
419
|
-
|
|
420
|
-
|
|
421
|
-
//
|
|
422
|
-
|
|
292
|
+
import { textToSpeech } from '@push.rocks/smartai/audio';
|
|
293
|
+
import * as fs from 'fs';
|
|
294
|
+
|
|
295
|
+
const stream = await textToSpeech({
|
|
296
|
+
apiKey: process.env.OPENAI_TOKEN,
|
|
297
|
+
text: 'Welcome to the future of AI development!',
|
|
298
|
+
voice: 'nova', // 'alloy' | 'echo' | 'fable' | 'onyx' | 'nova' | 'shimmer'
|
|
299
|
+
model: 'tts-1-hd', // 'tts-1' | 'tts-1-hd'
|
|
300
|
+
responseFormat: 'mp3', // 'mp3' | 'opus' | 'aac' | 'flac'
|
|
301
|
+
speed: 1.0, // 0.25 to 4.0
|
|
302
|
+
});
|
|
423
303
|
|
|
424
|
-
|
|
304
|
+
stream.pipe(fs.createWriteStream('welcome.mp3'));
|
|
425
305
|
```
|
|
426
306
|
|
|
427
|
-
|
|
307
|
+
### 🎨 Image — `@push.rocks/smartai/image`
|
|
428
308
|
|
|
429
|
-
|
|
309
|
+
Generate and edit images using OpenAI's image models.
|
|
430
310
|
|
|
431
311
|
```typescript
|
|
432
|
-
|
|
433
|
-
|
|
312
|
+
import { generateImage, editImage } from '@push.rocks/smartai/image';
|
|
313
|
+
|
|
314
|
+
// Generate an image
|
|
315
|
+
const result = await generateImage({
|
|
316
|
+
apiKey: process.env.OPENAI_TOKEN,
|
|
317
|
+
prompt: 'A futuristic cityscape at sunset, digital art',
|
|
318
|
+
model: 'gpt-image-1', // 'gpt-image-1' | 'dall-e-3' | 'dall-e-2'
|
|
319
|
+
quality: 'high', // 'low' | 'medium' | 'high' | 'auto'
|
|
320
|
+
size: '1024x1024',
|
|
321
|
+
background: 'transparent', // gpt-image-1 only
|
|
322
|
+
outputFormat: 'png', // 'png' | 'jpeg' | 'webp'
|
|
323
|
+
n: 1,
|
|
434
324
|
});
|
|
435
325
|
|
|
436
|
-
|
|
437
|
-
|
|
438
|
-
const response = await supportBot.anthropicProvider.chat({
|
|
439
|
-
systemMessage: `You are a helpful customer support agent.
|
|
440
|
-
Be empathetic, professional, and solution-oriented.`,
|
|
441
|
-
userMessage: query,
|
|
442
|
-
messageHistory: history,
|
|
443
|
-
});
|
|
444
|
-
|
|
445
|
-
return response.message;
|
|
446
|
-
} catch (error) {
|
|
447
|
-
// Fallback to another provider if needed
|
|
448
|
-
return await supportBot.openaiProvider.chat({ /* ... */ });
|
|
449
|
-
}
|
|
450
|
-
}
|
|
451
|
-
```
|
|
452
|
-
|
|
453
|
-
### Create a Code Review Assistant
|
|
326
|
+
// result.images[0].b64_json — base64-encoded image data
|
|
327
|
+
const imageBuffer = Buffer.from(result.images[0].b64_json!, 'base64');
|
|
454
328
|
|
|
455
|
-
|
|
456
|
-
const
|
|
457
|
-
|
|
329
|
+
// Edit an existing image
|
|
330
|
+
const edited = await editImage({
|
|
331
|
+
apiKey: process.env.OPENAI_TOKEN,
|
|
332
|
+
image: imageBuffer,
|
|
333
|
+
prompt: 'Add a rainbow in the sky',
|
|
334
|
+
model: 'gpt-image-1',
|
|
458
335
|
});
|
|
459
|
-
|
|
460
|
-
async function reviewCode(code: string, language: string) {
|
|
461
|
-
const review = await codeReviewer.groqProvider.chat({
|
|
462
|
-
systemMessage: `You are a ${language} expert. Review code for:
|
|
463
|
-
- Security vulnerabilities
|
|
464
|
-
- Performance issues
|
|
465
|
-
- Best practices
|
|
466
|
-
-     Potential bugs`,
-    userMessage: `Review this code:\n\n${code}`,
-    messageHistory: [],
-  });
-
-  return review.message;
-}
 ```
 
-###
+### 📄 Document — `@push.rocks/smartai/document`
+
+Analyze PDF documents by converting them to images and using a vision model. Uses `@push.rocks/smartpdf` for PDF-to-PNG conversion (requires Chromium/Puppeteer).
 
 ```typescript
-
-
+import { analyzeDocuments, stopSmartpdf } from '@push.rocks/smartai/document';
+import { getModel } from '@push.rocks/smartai';
+import * as fs from 'fs';
+
+const model = getModel({
+  provider: 'anthropic',
+  model: 'claude-sonnet-4-5-20250929',
+  apiKey: process.env.ANTHROPIC_TOKEN,
 });
 
-
-
-
-
-
-
-
-  return {
-    answer: findings.answer,
-    sources: findings.sources,
-  };
-}
-```
-
-### Local AI for Sensitive Data
-
-```typescript
-const localAI = new SmartAi({
-  ollama: {
-    baseUrl: 'http://localhost:11434',
-    model: 'llama2',
-    visionModel: 'llava',
-  },
+const analysis = await analyzeDocuments({
+  model,
+  systemMessage: 'You are a legal document analyst.',
+  userMessage: 'Summarize the key terms and conditions.',
+  pdfDocuments: [fs.readFileSync('contract.pdf')],
+  messageHistory: [], // optional: prior conversation context
 });
 
-
-
-
-
-  userMessage: 'Analyze this confidential document',
-  messageHistory: [],
-  pdfDocuments: [pdfBuffer],
-});
-
-// Data never leaves your servers
-return analysis.message;
-}
+console.log(analysis);
+
+// Clean up the SmartPdf instance when done
+await stopSmartpdf();
 ```
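The `stopSmartpdf()` call above tears down the headless Chromium instance, so it should run even when `analyzeDocuments` throws. A minimal sketch of that pattern — the `withCleanup` helper is hypothetical and not exported by the package:

```typescript
// Hypothetical helper (not part of @push.rocks/smartai): guarantees a
// cleanup step runs whether the wrapped work succeeds or throws.
async function withCleanup<T>(
  work: () => Promise<T>,
  cleanup: () => Promise<void>,
): Promise<T> {
  try {
    return await work();
  } finally {
    // runs on both the success and the error path
    await cleanup();
  }
}
```

Used with the document module, this would look roughly like `await withCleanup(() => analyzeDocuments({ model, userMessage, pdfDocuments, messageHistory: [] }), stopSmartpdf)`.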
 
-
+### 🔬 Research — `@push.rocks/smartai/research`
 
-
+Perform web-search-powered research using Anthropic's `web_search_20250305` tool.
 
 ```typescript
-
-  constructor(private ai: SmartAi) {}
-
-  async query(
-    message: string,
-    requirements: {
-      speed?: boolean;
-      accuracy?: boolean;
-      cost?: boolean;
-      privacy?: boolean;
-    }
-  ) {
-    if (requirements.privacy) {
-      return this.ai.ollamaProvider.chat({ /* ... */ }); // Local only
-    }
-    if (requirements.speed) {
-      return this.ai.groqProvider.chat({ /* ... */ }); // 10x faster
-    }
-    if (requirements.accuracy) {
-      return this.ai.anthropicProvider.chat({ /* ... */ }); // Best reasoning
-    }
-    // Default fallback
-    return this.ai.openaiProvider.chat({ /* ... */ });
-  }
-}
-```
+import { research } from '@push.rocks/smartai/research';
 
-
+const result = await research({
+  apiKey: process.env.ANTHROPIC_TOKEN,
+  query: 'What are the latest developments in quantum computing?',
+  searchDepth: 'basic', // 'basic' | 'advanced' | 'deep'
+  maxSources: 10, // optional: limit number of search results
+  allowedDomains: ['nature.com', 'arxiv.org'], // optional: restrict to domains
+  blockedDomains: ['reddit.com'], // optional: exclude domains
+});
 
-
-//
-
-const stream = await ai.openaiProvider.chatStream(
-  createInputStream(userQuery)
-);
-
-// Process tokens as they arrive
-for await (const chunk of stream) {
-  updateUI(chunk); // Immediate feedback
-  await processChunk(chunk); // Parallel processing
-}
-}
+console.log(result.answer);
+console.log('Sources:', result.sources); // Array<{ url, title, snippet }>
+console.log('Queries:', result.searchQueries); // search queries the model used
 ```
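The `allowedDomains`/`blockedDomains` options are forwarded to the web-search tool. As a rough illustration of the intended semantics, here is a client-side sketch — `ISource` and `filterSources` are illustrative names, not package exports:

```typescript
// Illustrative only: what domain allow/block filtering means, expressed
// as a pure function over the { url, title, snippet } source shape.
interface ISource {
  url: string;
  title: string;
  snippet: string;
}

function filterSources(
  sources: ISource[],
  allowed?: string[],
  blocked?: string[],
): ISource[] {
  return sources.filter((source) => {
    const host = new URL(source.url).hostname;
    // a domain matches itself and any of its subdomains
    const matches = (domain: string) =>
      host === domain || host.endsWith('.' + domain);
    if (blocked?.some(matches)) return false;
    if (allowed && !allowed.some(matches)) return false;
    return true;
  });
}
```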
 
-
+## 🧪 Testing
 
-```
-
-
-
-
-
-
-
-
-
-
+```bash
+# All tests
+pnpm test
+
+# Individual test files
+tstest test/test.smartai.ts --verbose    # Core getModel + generateText + streamText
+tstest test/test.ollama.ts --verbose     # Ollama provider (mocked, no API needed)
+tstest test/test.vision.ts --verbose     # Vision analysis
+tstest test/test.image.ts --verbose      # Image generation
+tstest test/test.research.ts --verbose   # Web research
+tstest test/test.audio.ts --verbose      # Text-to-speech
+tstest test/test.document.ts --verbose   # Document analysis (needs Chromium)
 ```
 
-
-
-### Provider-Specific Options
-
-```typescript
-const ai = new SmartAi({
-  // OpenAI
-  openaiToken: 'sk-...',
-
-  // Anthropic with extended thinking
-  anthropicToken: 'sk-ant-...',
-
-  // Perplexity for research
-  perplexityToken: 'pplx-...',
+Most tests skip gracefully when API keys are not set. The Ollama tests are fully mocked and require no external services.
 
-
-  groqToken: 'gsk_...',
+## 📐 Architecture
 
-  // Mistral with OCR settings
-  mistralToken: 'your-key',
-  mistral: {
-    chatModel: 'mistral-large-latest',
-    ocrModel: 'mistral-ocr-latest',
-    tableFormat: 'markdown',
-  },
-
-  // XAI (Grok)
-  xaiToken: 'xai-...',
-
-  // ElevenLabs TTS
-  elevenlabsToken: 'sk-...',
-  elevenlabs: {
-    defaultVoiceId: '19STyYD15bswVz51nqLf',
-    defaultModelId: 'eleven_v3',
-  },
-
-  // Ollama (local)
-  ollama: {
-    baseUrl: 'http://localhost:11434',
-    model: 'llama2',
-    visionModel: 'llava',
-    defaultOptions: {
-      num_ctx: 4096,
-      temperature: 0.7,
-      top_p: 0.9,
-    },
-    defaultTimeout: 120000,
-  },
-
-  // Exo (distributed)
-  exo: {
-    baseUrl: 'http://localhost:8080/v1',
-    apiKey: 'optional-key',
-  },
-});
 ```
-
-
-
-
-
-
-
-
-
-
-
-
-
-      continue;
-    }
-  }
-  throw new Error('All providers failed');
-}
-}
+@push.rocks/smartai
+├── ts/                                 # Core package
+│   ├── index.ts                        # Re-exports getModel, AI SDK functions, types
+│   ├── smartai.classes.smartai.ts      # getModel() — provider switch
+│   ├── smartai.interfaces.ts           # ISmartAiOptions, TProvider, IOllamaModelOptions
+│   ├── smartai.provider.ollama.ts      # Custom LanguageModelV3 for Ollama
+│   ├── smartai.middleware.anthropic.ts # Prompt caching middleware
+│   └── plugins.ts                      # AI SDK provider factories
+├── ts_vision/      # @push.rocks/smartai/vision
+├── ts_audio/       # @push.rocks/smartai/audio
+├── ts_image/       # @push.rocks/smartai/image
+├── ts_document/    # @push.rocks/smartai/document
+└── ts_research/    # @push.rocks/smartai/research
 ```
 
-
-
-
-| --------------------- | -------------------- | --------------------------------------------------------- |
-| **General Purpose**   | OpenAI               | Most features, stable, well-documented                     |
-| **Complex Reasoning** | Anthropic            | Superior logical thinking, extended thinking, safer        |
-| **Document OCR**      | Mistral              | Native PDF processing, no image conversion overhead        |
-| **Research & Facts**  | Perplexity           | Web-aware, provides citations                              |
-| **Deep Research**     | OpenAI               | Deep Research API with comprehensive analysis              |
-| **Premium TTS**       | ElevenLabs           | Most natural voices, 70+ languages, v3 model               |
-| **Speed Critical**    | Groq                 | 10x faster inference, sub-second responses                 |
-| **Privacy Critical**  | Ollama               | 100% local, no data leaves your servers                    |
-| **Real-time Data**    | XAI                  | Grok with access to current information                    |
-| **Cost Sensitive**    | Ollama/Exo           | Free (local) or distributed compute                        |
-
-## 📈 Roadmap
-
-- [x] Research & Web Search API
-- [x] Image generation support (gpt-image-1, DALL-E 3, DALL-E 2)
-- [x] Extended thinking (Anthropic)
-- [x] Native PDF OCR (Mistral)
-- [ ] Streaming function calls
-- [ ] Voice input processing
-- [ ] Fine-tuning integration
-- [ ] Embedding support
-- [ ] Agent framework
-- [ ] More providers (Cohere, AI21, etc.)
+The core package is a thin registry. `getModel()` creates the appropriate `@ai-sdk/*` provider, calls it with the model ID, and returns the resulting `LanguageModelV3`. For Anthropic, it optionally wraps the model with prompt caching middleware. For Ollama, it returns a custom `LanguageModelV3` implementation that talks directly to Ollama's `/api/chat` endpoint.
+
+Subpath modules are independent — they import `ai` and provider SDKs directly, not through the core package. This keeps the dependency graph clean and allows tree-shaking.
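The dispatch described above can be pictured as a plain provider switch. This is a simplified sketch of the shape, not the actual `smartai.classes.smartai.ts` source; `describeModel` and its return strings are illustrative stand-ins for the real factory calls:

```typescript
// Illustrative sketch of the getModel() dispatch shape. The real code
// returns LanguageModelV3 instances; here we just return a description
// of which factory would handle each provider.
type TProvider = 'openai' | 'anthropic' | 'ollama';

interface IModelOptions {
  provider: TProvider;
  model: string;
}

function describeModel(options: IModelOptions): string {
  switch (options.provider) {
    case 'openai':
      return `@ai-sdk/openai -> ${options.model}`;
    case 'anthropic':
      // the real implementation may additionally wrap this model
      // with prompt-caching middleware
      return `@ai-sdk/anthropic -> ${options.model}`;
    case 'ollama':
      // the real implementation returns a custom LanguageModelV3
      // that calls Ollama's /api/chat endpoint
      return `ollama /api/chat -> ${options.model}`;
    default:
      throw new Error('unsupported provider');
  }
}
```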
 
 ## License and Legal Information
 