npm - noosphere - Versions diffs - 0.1.2 → 0.1.3 - Mend

noosphere 0.1.2 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2) hide show

package/README.md +476 -37
package/package.json +1 -1

package/README.md CHANGED Viewed

@@ -478,68 +478,507 @@ A unified gateway that routes to 8 LLM providers through 4 different API protoco
 Aggregator providing access to hundreds of additional models including Llama, Deepseek, Mistral, Qwen, and many more. Full list available via `ai.getModels('llm')`.
-#### Agentic Capabilities (via Pi-AI library)
+#### The Pi-AI Engine — Deep Dive
-The underlying `@mariozechner/pi-ai` library exposes powerful agentic features. While Noosphere currently surfaces chat and streaming, the library provides:
+Noosphere's LLM provider is powered by `@mariozechner/pi-ai`, part of the **Pi mono-repo** by Mario Zechner (badlogic). Pi is NOT a wrapper like LangChain or Mastra — it's a **micro-framework for agentic AI** (~15K LOC, 4 npm packages) that was built from scratch as a minimalist alternative to Claude Code.
+Pi consists of 4 packages in 3 tiers:
+```
+TIER 1 — FOUNDATION
+  @mariozechner/pi-ai             LLM API: stream(), complete(), model registry
+                                  0 internal deps, talks to 20+ providers
+TIER 2 — INFRASTRUCTURE
+  @mariozechner/pi-agent-core     Agent loop, tool execution, lifecycle events
+                                  Depends on pi-ai
+  @mariozechner/pi-tui            Terminal UI with differential rendering
+                                  Standalone, 0 internal deps
+TIER 3 — APPLICATION
+  @mariozechner/pi-coding-agent   CLI + SDK: sessions, compaction, extensions
+                                  Depends on all above
+```
+Noosphere uses `@mariozechner/pi-ai` (Tier 1) directly for LLM access. But the full Pi ecosystem provides capabilities that can be layered on top.
+---
+#### How Pi Keeps 200+ Models Updated
+Pi does NOT hardcode models. It has an **auto-generation pipeline** that runs at build time:
+```
+STEP 1: FETCH (3 sources in parallel)
+┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐
+│   models.dev     │  │   OpenRouter     │  │  Vercel AI    │
+│   /api.json      │  │   /v1/models     │  │  Gateway      │
+│                  │  │                  │  │  /v1/models   │
+│ Context windows  │  │ Pricing ($/M)    │  │ Capability    │
+│ Capabilities     │  │ Availability     │  │ tags          │
+│ Tool support     │  │ Provider routing │  │               │
+└────────┬─────────┘  └────────┬─────────┘  └──────┬────────┘
+         └─────────┬───────────┴────────────────────┘
+                   ▼
+STEP 2: MERGE & DEDUPLICATE
+         Priority: models.dev > OpenRouter > Vercel
+         Key: provider + modelId
+                   │
+                   ▼
+STEP 3: FILTER
+         ✅ tool_call === true
+         ✅ streaming supported
+         ✅ system messages supported
+         ✅ not deprecated
+                   │
+                   ▼
+STEP 4: NORMALIZE
+         Costs → $/million tokens
+         API type → one of 4 protocols
+         Input modes → ["text"] or ["text","image"]
+                   │
+                   ▼
+STEP 5: PATCH (manual corrections)
+         Claude Opus: cache pricing fix
+         GPT-5.4: context window override
+         Kimi K2.5: hardcoded pricing
+                   │
+                   ▼
+STEP 6: GENERATE TypeScript
+         → models.generated.ts (~330KB)
+         → 200+ models with full type safety
+```
+Each generated model entry looks like:
+```typescript
+{
+  id: "claude-opus-4-6",
+  name: "Claude Opus 4.6",
+  api: "anthropic-messages",
+  provider: "anthropic",
+  baseUrl: "https://api.anthropic.com",
+  reasoning: true,
+  input: ["text", "image"],
+  cost: {
+    input: 15,          // $15/M tokens
+    output: 75,         // $75/M tokens
+    cacheRead: 1.5,     // prompt cache hit
+    cacheWrite: 18.75,  // prompt cache write
+  },
+  contextWindow: 200_000,
+  maxTokens: 32_000,
+} satisfies Model<"anthropic-messages">
+```
+When a new model is released (e.g., Gemini 3.0), it appears in models.dev/OpenRouter → the script captures it → a new Pi version is published → Noosphere updates its dependency.
+---
+#### 4 API Protocols — How Pi Talks to Every Provider
+Pi abstracts all LLM providers into 4 wire protocols. Each protocol handles the differences in request format, streaming format, auth headers, and response parsing:
+| Protocol | Providers | Key Differences |
+|---|---|---|
+| `anthropic-messages` | Anthropic, AWS Bedrock | `system` as top-level field, content as `[{type:"text", text:"..."}]` blocks, `x-api-key` auth, `anthropic-beta` headers |
+| `openai-completions` | OpenAI, xAI, Groq, Cerebras, OpenRouter, Ollama, vLLM | `system` as message with `role:"system"`, content as string, `Authorization: Bearer` auth, `tool_calls` array |
+| `openai-responses` | OpenAI (reasoning models) | New Responses API with server-side context, `store: true`, reasoning summaries |
+| `google-generative-ai` | Google Gemini, Vertex AI | `systemInstruction.parts[{text}]`, role `"model"` instead of `"assistant"`, `functionCall` instead of `tool_calls`, `thinkingConfig` |
+The core function `streamSimple()` detects which protocol to use based on `model.api` and handles all the formatting/parsing transparently:
-**Tool Use / Function Calling:**
 ```typescript
-// Supported across Anthropic, OpenAI, Google, xAI, Groq
-// Tool definitions use TypeBox schemas for runtime validation
-interface Tool<TParameters extends TSchema = TSchema> {
-  name: string;
-  description: string;
-  parameters: TParameters;  // TypeBox schema — validated at runtime with AJV
+// What happens inside Pi when you call Noosphere's chat():
+async function* streamSimple(
+  model: Model,           // includes model.api to determine protocol
+  context: Context,       // { systemPrompt, messages, tools }
+  options?: StreamOptions  // { signal, onPayload, thinkingLevel, ... }
+): AsyncIterable<AssistantMessageEvent> {
+  // 1. Format request according to model.api protocol
+  // 2. Open SSE/WebSocket stream
+  // 3. Parse provider-specific chunks
+  // 4. Emit normalized events:
+  //    → text_delta, thinking_delta, tool_call, message_end
 }
 ```
-**Reasoning / Thinking:**
-- **Anthropic:** `thinkingEnabled`, `thinkingBudgetTokens` — Claude Opus/Sonnet extended thinking
-- **OpenAI:** `reasoningEffort` (minimal/low/medium/high) — o1/o3/o4/GPT-5 reasoning
-- **Google:** `thinking.enabled`, `thinking.budgetTokens` — Gemini 2.5 thinking
-- **xAI:** Grok-4 native reasoning
-- Thinking blocks are automatically extracted and streamed as separate `thinking_delta` events
+---
+#### Agentic Capabilities
+These are the capabilities people get access to through the Pi-AI engine:
+##### 1. Tool Use / Function Calling
+Full structured tool calling supported across **all major providers**. Tool definitions use TypeBox schemas with runtime validation via AJV:
+```typescript
+import { type Tool, StringEnum } from '@mariozechner/pi-ai';
+import { Type } from '@sinclair/typebox';
+// Define a tool with typed parameters
+const searchTool: Tool = {
+  name: 'web_search',
+  description: 'Search the web for information',
+  parameters: Type.Object({
+    query: Type.String({ description: 'Search query' }),
+    maxResults: Type.Optional(Type.Number({ default: 5 })),
+    type: StringEnum(['web', 'images', 'news'], { description: 'Search type' }),
+  }),
+};
+// Pass tools in context — Pi handles the rest
+const context = {
+  systemPrompt: 'You are a helpful assistant.',
+  messages: [{ role: 'user', content: 'Search for recent AI news' }],
+  tools: [searchTool],
+};
+```
+**How tool calling works internally:**
+```
+User prompt → LLM → "I need to call web_search"
+                         │
+                         ▼
+              Pi validates arguments with AJV
+              against the TypeBox schema
+                         │
+                   ┌─────┴─────┐
+                   │ Valid?     │
+                   ├─Yes───────┤
+                   │ Execute   │
+                   │ tool      │
+                   ├───────────┤
+                   │ No        │
+                   │ Return    │
+                   │ validation│
+                   │ error to  │
+                   │ LLM       │
+                   └───────────┘
+                         │
+                         ▼
+              Tool result → back into context → LLM continues
+```
+**Provider-specific tool_choice control:**
+- **Anthropic:** `"auto" | "any" | "none" | { type: "tool", name: "specific_tool" }`
+- **OpenAI:** `"auto" | "none" | "required" | { type: "function", function: { name: "..." } }`
+- **Google:** `"auto" | "none" | "any"`
+**Partial JSON streaming:** During streaming, Pi parses tool call arguments incrementally using partial JSON parsing. This means you can see tool arguments being built in real-time, not just after the tool call completes.
+##### 2. Reasoning / Extended Thinking
+Pi provides **unified thinking support** across all providers that support it. Thinking blocks are automatically extracted, separated from regular text, and streamed as distinct events:
+| Provider | Models | Control Parameters | How It Works |
+|---|---|---|---|
+| **Anthropic** | Claude Opus, Sonnet 4+ | `thinkingEnabled: boolean`, `thinkingBudgetTokens: number` | Extended thinking blocks in response, separate `thinking` content type |
+| **OpenAI** | o1, o3, o4, GPT-5 | `reasoningEffort: "minimal" \| "low" \| "medium" \| "high"` | Reasoning via Responses API, `reasoningSummary: "auto" \| "detailed" \| "concise"` |
+| **Google** | Gemini 2.5 Flash/Pro | `thinking.enabled: boolean`, `thinking.budgetTokens: number` | Thinking via `thinkingConfig`, mapped to effort levels |
+| **xAI** | Grok-4, Grok-3-mini | Native reasoning | Automatic when model supports it |
+**Cross-provider thinking portability:** When switching models mid-conversation, Pi converts thinking blocks between formats. Anthropic thinking blocks become `<thinking>` tagged text when sent to OpenAI/Google, and vice versa.
-**Vision / Multimodal Input:**
 ```typescript
-// Send images alongside text to vision-capable models
-{
-  role: "user",
-  content: [
-    { type: "text", text: "What's in this image?" },
-    { type: "image", data: base64String, mimeType: "image/png" }
-  ]
+// Thinking is automatically extracted in Noosphere responses:
+const result = await ai.chat({
+  model: 'claude-opus-4-6',
+  messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
+});
+console.log(result.thinking);  // "Let me work through this... 15! = 15 × 14 × 13!..."
+console.log(result.content);   // "15! / 13! = 15 × 14 = 210"
+// During streaming, thinking arrives as separate events:
+const stream = ai.stream({ messages: [...] });
+for await (const event of stream) {
+  if (event.type === 'thinking_delta') console.log('[THINKING]', event.delta);
+  if (event.type === 'text_delta') console.log('[RESPONSE]', event.delta);
 }
 ```
-**Agent Loop:**
+##### 3. Vision / Multimodal Input
+Models with `input: ["text", "image"]` accept images alongside text. Pi handles the encoding and format differences per provider:
+```typescript
+// Send images to vision-capable models
+const messages = [{
+  role: 'user',
+  content: [
+    { type: 'text', text: 'What is in this image?' },
+    { type: 'image', data: base64PngString, mimeType: 'image/png' },
+  ],
+}];
+// Supported MIME types: image/png, image/jpeg, image/gif, image/webp
+// Images are silently ignored when sent to non-vision models
+```
+**Vision-capable models include:** All Claude models, all GPT-4o/GPT-5 models, Gemini models, Grok-2-vision, Grok-4, and select Groq models.
+##### 4. Agent Loop — Autonomous Tool Execution
+The `@mariozechner/pi-agent-core` package provides a complete agent loop that automatically cycles through `prompt → LLM → tool call → result → repeat` until the task is done:
 ```typescript
-// Built-in agentic execution loop with automatic tool calling
 import { agentLoop } from '@mariozechner/pi-ai';
-const events = agentLoop(prompt, context, {
-  tools: [myTool],
-  model: getModel('anthropic', 'claude-sonnet-4-20250514'),
+const events = agentLoop(userMessage, agentContext, {
+  model: getModel('anthropic', 'claude-opus-4-6'),
+  tools: [searchTool, readFileTool, writeFileTool],
+  signal: abortController.signal,
 });
 for await (const event of events) {
-  // event.type: agent_start → turn_start → message_start →
-  //   message_update → tool_execution_start → tool_execution_end →
-  //   message_end → turn_end → agent_end
+  switch (event.type) {
+    case 'agent_start':           // Agent begins
+    case 'turn_start':            // New LLM turn begins
+    case 'message_start':         // LLM starts responding
+    case 'message_update':        // Text/thinking delta received
+    case 'tool_execution_start':  // About to execute a tool
+    case 'tool_execution_end':    // Tool finished, result available
+    case 'message_end':           // LLM finished this message
+    case 'turn_end':              // Turn complete (may loop if tools were called)
+    case 'agent_end':             // All done, final messages available
+  }
 }
 ```
-**Cost Tracking per Model:**
+**The agent loop state machine:**
+```
+[User sends prompt]
+        │
+        ▼
+  ┌─[Build Context]──▶ [Check Queues]──▶ [Stream LLM]◄── streamFn()
+  │                                           │
+  │                                     ┌─────┴──────┐
+  │                                     │            │
+  │                                   text      tool_call
+  │                                     │            │
+  │                                     ▼            ▼
+  │                                  [Done]    [Execute Tool]
+  │                                                  │
+  │                                            tool result
+  │                                                  │
+  └──────────────────────────────────────────────────┘
+                                    (loops back to Stream LLM)
+```
+**Key design decisions:**
+- Tools execute **sequentially** by default (parallelism can be added on top)
+- The `streamFn` is **injectable** — you can wrap it with middleware to modify requests per-provider
+- Tool arguments are **validated at runtime** using TypeBox + AJV before execution
+- Aborted/failed responses preserve partial content and usage data
+- Tool results are automatically added to the conversation context
+##### 5. The `streamFn` Pattern — Injectable Middleware
+This is Pi's most powerful architectural feature. The `streamFn` is the function that actually talks to the LLM, and it can be **wrapped with middleware** like Express.js request handlers:
+```typescript
+import type { StreamFn } from '@mariozechner/pi-agent-core';
+import { streamSimple } from '@mariozechner/pi-ai';
+// Start with Pi's base streaming function
+let fn: StreamFn = streamSimple;
+// Wrap it with middleware that modifies requests per-provider
+fn = createMyCustomWrapper(fn, {
+  // Add custom headers for Anthropic
+  onPayload: (payload) => {
+    if (model.provider === 'anthropic') {
+      payload.headers['anthropic-beta'] = 'fine-grained-tool-streaming-2025-05-14';
+    }
+  },
+});
+// Each wrapper calls the previous one, forming a chain:
+// request → wrapper3 → wrapper2 → wrapper1 → streamSimple → API
+```
+This pattern is what allows projects like OpenClaw to stack **16 provider-specific wrappers** on top of Pi's base streaming — adding beta headers for Anthropic, WebSocket transport for OpenAI, thinking sanitization for Google, reasoning effort headers for OpenRouter, and more — without modifying Pi's source code.
+##### 6. Session Management (via pi-coding-agent)
+The `@mariozechner/pi-coding-agent` package provides persistent session management with JSONL-based storage:
 ```typescript
-// Costs tracked per 1M tokens with cache-aware pricing
+import { createAgentSession, SessionManager } from '@mariozechner/pi-coding-agent';
+// Create a session with full persistence
+const session = await createAgentSession({
+  model: 'claude-opus-4-6',
+  tools: myTools,
+  sessionManager,  // handles JSONL persistence
+});
+const result = await session.run('Build a REST API');
+// Session is automatically saved to:
+// ~/.pi/agent/sessions/session_abc123.jsonl
+```
+**Session file format (append-only JSONL):**
+```jsonl
+{"role":"user","content":"Build a REST API","timestamp":1710000000}
+{"role":"assistant","content":"I'll create...","model":"claude-opus-4-6","usage":{...}}
+{"role":"toolResult","toolCallId":"tc_001","toolName":"bash","content":"OK"}
+{"type":"compaction","summary":"The user asked to build...","preservedMessages":[...]}
+```
+**Session operations:**
+- `create()` — new session
+- `open(id)` — restore existing session
+- `continueRecent()` — continue the most recent session
+- `forkFrom(id)` — create a branch (new JSONL referencing parent)
+- `inMemory()` — RAM-only session (for SDK/testing)
+##### 7. Context Compaction — Automatic Context Window Management
+When the conversation approaches the model's context window limit, Pi automatically **compacts** the history:
+```
+1. DETECT: Calculate inputTokens + outputTokens vs model.contextWindow
+2. TRIGGER: Proactively before overflow, or as recovery after overflow error
+3. SUMMARIZE: Send history to LLM with a compaction prompt
+4. WRITE: Append compaction entry to JSONL:
+   {"type":"compaction","summary":"...","preservedMessages":[last N messages]}
+5. CONTINUE: Context is now summary + recent messages instead of full history
+```
+The JSONL file is **never rewritten** — compaction entries are appended, maintaining a complete audit trail.
+##### 8. Cost Tracking — Cache-Aware Pricing
+Pi tracks costs per-request with cache-aware pricing for providers that support prompt caching:
+```typescript
+// Every model has 4 cost dimensions:
 {
-  input: number,       // cost per 1M input tokens
-  output: number,      // cost per 1M output tokens
-  cacheRead: number,   // prompt cache hit cost
-  cacheWrite: number,  // prompt cache write cost
+  input: 15,          // $15 per 1M input tokens
+  output: 75,         // $75 per 1M output tokens
+  cacheRead: 1.5,     // $1.50 per 1M cached prompt tokens (read)
+  cacheWrite: 18.75,  // $18.75 per 1M cached prompt tokens (write)
+}
+// Usage tracking on every response:
+{
+  input: 1500,        // tokens consumed as input
+  output: 800,        // tokens generated
+  cacheRead: 5000,    // prompt cache hits
+  cacheWrite: 1500,   // prompt cache writes
+  cost: {
+    total: 0.082,     // total cost in USD
+    input: 0.0225,
+    output: 0.06,
+    cacheRead: 0.0075,
+    cacheWrite: 0.028,
+  },
 }
 ```
+**Anthropic and OpenAI** support prompt caching. For providers without caching, `cacheRead` and `cacheWrite` are always 0.
+##### 9. Extension System (via pi-coding-agent)
+Pi supports a plugin system where extensions can register tools, commands, and lifecycle hooks:
+```typescript
+// Extensions are TypeScript modules loaded at runtime via jiti
+export default function(api: ExtensionAPI) {
+  // Register a custom tool
+  api.registerTool('my_tool', {
+    description: 'Does something useful',
+    parameters: { /* TypeBox schema */ },
+    execute: async (args) => 'result',
+  });
+  // Register a slash command
+  api.registerCommand('/mycommand', {
+    handler: async (args) => { /* ... */ },
+    description: 'Custom command',
+  });
+  // Hook into the agent lifecycle
+  api.on('before_agent_start', async (context) => {
+    context.systemPrompt += '\nExtra instructions';
+  });
+  api.on('tool_execution_end', async (event) => {
+    // Post-process tool results
+  });
+}
+```
+**Resource discovery chain (priority):**
+1. Project `.pi/` directory (highest)
+2. User `~/.pi/agent/`
+3. npm packages with Pi metadata
+4. Built-in defaults
+##### 10. The Anti-MCP Philosophy — Why Pi Uses CLI Instead
+Pi explicitly **rejects MCP** (Model Context Protocol). Mario Zechner's argument, backed by benchmarks:
+**The token cost problem:**
+| Approach | Tools | Tokens Consumed | % of Claude's Context |
+|---|---|---|---|
+| Playwright MCP | 21 tools | 13,700 tokens | 6.8% |
+| Chrome DevTools MCP | 26 tools | 18,000 tokens | 9.0% |
+| Pi CLI + README | N/A | 225 tokens | ~0.1% |
+That's a **60-80x reduction** in token consumption. With 5 MCP servers, you lose ~55,000 tokens before doing any work.
+**Benchmark results (120 evaluations):**
+| Approach | Avg Cost | Success Rate |
+|---|---|---|
+| CLI (tmux) | $0.37 | 100% |
+| CLI (terminalcp) | $0.39 | 100% |
+| MCP (terminalcp) | $0.48 | 100% |
+Same success rate, MCP costs **30% more**.
+**Pi's alternative: Progressive Disclosure via CLI tools + READMEs**
+Instead of loading all tool definitions upfront, Pi's agent has `bash` as a built-in tool and discovers CLI tools only when needed:
+```
+MCP approach:                          Pi approach:
+─────────────                          ──────────
+Session start →                        Session start →
+  Load 21 Playwright tools               Load 4 tools: read, write, edit, bash
+  Load 26 Chrome DevTools tools           (225 tokens)
+  Load N more MCP tools
+  (~55,000 tokens wasted)
+When browser needed:                   When browser needed:
+  Tools already loaded                   Agent reads SKILL.md (225 tokens)
+  (but context is polluted)              Runs: browser-start.js
+                                         Runs: browser-nav.js https://...
+                                         Runs: browser-screenshot.js
+When browser NOT needed:               When browser NOT needed:
+  Tools still consume context             0 tokens wasted
+```
+**The 4 built-in tools** (what Pi argues is sufficient):
+| Tool | What It Does | Why It's Enough |
+|---|---|---|
+| `read` | Read files (text + images) | Supports offset/limit for large files |
+| `write` | Create/overwrite files | Creates directories automatically |
+| `edit` | Replace text (oldText→newText) | Surgical edits, like a diff |
+| `bash` | Execute any shell command | **bash can do everything else** — replaces MCP entirely |
+The key insight: `bash` replaces MCP. Any CLI tool, API call, database query, or system operation can be invoked through bash. The agent reads the tool's README only when it needs it, paying tokens on-demand instead of upfront.
 ---
 ### FAL — Media Generation (867+ endpoints)

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "noosphere",
-  "version": "0.1.2",
+  "version": "0.1.3",
   "description": "Unified AI creation engine — text, image, video, audio across all providers",
   "type": "module",
   "main": "./dist/index.js",