npm - agentic-flow - Versions diffs - 1.9.2 → 1.9.4 - Mend

agentic-flow 1.9.2 → 1.9.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16) hide show

package/CHANGELOG.md +86 -0
package/README.md +104 -0
package/dist/cli-proxy.js +38 -6
package/dist/core/long-running-agent.js +219 -0
package/dist/core/provider-manager.js +434 -0
package/dist/examples/use-provider-fallback.js +176 -0
package/dist/proxy/anthropic-to-gemini.js +50 -15
package/dist/proxy/proxy/anthropic-to-gemini.js +439 -0
package/dist/proxy/utils/logger.js +59 -0
package/docs/LANDING-PAGE-PROVIDER-CONTENT.md +204 -0
package/docs/PROVIDER-FALLBACK-GUIDE.md +619 -0
package/docs/PROVIDER-FALLBACK-SUMMARY.md +418 -0
package/package.json +1 -1
package/validation/test-provider-fallback.ts +285 -0
package/wasm/reasoningbank/reasoningbank_wasm_bg.js +2 -2
package/wasm/reasoningbank/reasoningbank_wasm_bg.wasm +0 -0

package/docs/PROVIDER-FALLBACK-GUIDE.md ADDED Viewed

@@ -0,0 +1,619 @@
+# Provider Fallback & Dynamic Switching Guide
+**Production-grade LLM provider fallback for long-running agents**
+## Overview
+The `ProviderManager` and `LongRunningAgent` classes provide enterprise-grade provider fallback, health monitoring, cost optimization, and automatic recovery for long-running AI agents.
+### Key Features
+- ✅ **Automatic Fallback** - Seamless switching between providers on failure
+- ✅ **Circuit Breaker** - Prevents cascading failures with automatic recovery
+- ✅ **Health Monitoring** - Real-time provider health tracking
+- ✅ **Cost Optimization** - Intelligent provider selection based on cost/performance
+- ✅ **Retry Logic** - Exponential/linear backoff for transient errors
+- ✅ **Checkpointing** - Save/restore agent state for crash recovery
+- ✅ **Budget Control** - Hard limits on spending and runtime
+- ✅ **Performance Tracking** - Latency, success rate, and token usage metrics
+## Quick Start
+### Basic Provider Fallback
+```typescript
+import { ProviderManager, ProviderConfig } from 'agentic-flow/core/provider-manager';
+// Configure providers
+const providers: ProviderConfig[] = [
+  {
+    name: 'gemini',
+    apiKey: process.env.GOOGLE_GEMINI_API_KEY,
+    priority: 1, // Try first
+    maxRetries: 3,
+    timeout: 30000,
+    costPerToken: 0.00015,
+    enabled: true
+  },
+  {
+    name: 'anthropic',
+    apiKey: process.env.ANTHROPIC_API_KEY,
+    priority: 2, // Fallback
+    maxRetries: 3,
+    timeout: 60000,
+    costPerToken: 0.003,
+    enabled: true
+  },
+  {
+    name: 'onnx',
+    priority: 3, // Last resort (free, local)
+    maxRetries: 2,
+    timeout: 120000,
+    costPerToken: 0,
+    enabled: true
+  }
+];
+// Initialize manager
+const manager = new ProviderManager(providers, {
+  type: 'priority', // or 'cost-optimized', 'performance-optimized', 'round-robin'
+  maxFailures: 3,
+  recoveryTime: 60000,
+  retryBackoff: 'exponential'
+});
+// Execute with automatic fallback
+const { result, provider, attempts } = await manager.executeWithFallback(
+  async (providerName) => {
+    // Your LLM API call here
+    return await callLLM(providerName, prompt);
+  }
+);
+console.log(`Success with ${provider} after ${attempts} attempts`);
+```
+### Long-Running Agent
+```typescript
+import { LongRunningAgent } from 'agentic-flow/core/long-running-agent';
+// Create agent
+const agent = new LongRunningAgent({
+  agentName: 'research-agent',
+  providers,
+  fallbackStrategy: {
+    type: 'cost-optimized',
+    maxFailures: 3,
+    recoveryTime: 60000,
+    retryBackoff: 'exponential',
+    costThreshold: 0.50, // Max $0.50 per request
+    latencyThreshold: 30000 // Max 30s per request
+  },
+  checkpointInterval: 30000, // Save state every 30s
+  maxRuntime: 3600000, // Max 1 hour
+  costBudget: 5.00 // Max $5 total
+});
+await agent.start();
+// Execute tasks with automatic provider selection
+const result = await agent.executeTask({
+  name: 'analyze-code',
+  complexity: 'complex', // 'simple' | 'medium' | 'complex'
+  estimatedTokens: 5000,
+  execute: async (provider) => {
+    return await analyzeCode(provider, code);
+  }
+});
+// Get status
+const status = agent.getStatus();
+console.log(`Completed: ${status.completedTasks}, Cost: $${status.totalCost}`);
+await agent.stop();
+```
+## Fallback Strategies
+### 1. Priority-Based (Default)
+Tries providers in priority order (1 = highest).
+```typescript
+{
+  type: 'priority',
+  maxFailures: 3,
+  recoveryTime: 60000,
+  retryBackoff: 'exponential'
+}
+```
+**Use Case:** Prefer specific provider (e.g., Claude for quality)
+### 2. Cost-Optimized
+Selects cheapest provider for estimated token count.
+```typescript
+{
+  type: 'cost-optimized',
+  maxFailures: 3,
+  recoveryTime: 60000,
+  retryBackoff: 'exponential',
+  costThreshold: 0.50 // Max $0.50 per request
+}
+```
+**Use Case:** High-volume applications, budget constraints
+### 3. Performance-Optimized
+Selects provider with best latency and success rate.
+```typescript
+{
+  type: 'performance-optimized',
+  maxFailures: 3,
+  recoveryTime: 60000,
+  retryBackoff: 'exponential',
+  latencyThreshold: 30000 // Max 30s
+}
+```
+**Use Case:** Real-time applications, user-facing services
+### 4. Round-Robin
+Distributes load evenly across providers.
+```typescript
+{
+  type: 'round-robin',
+  maxFailures: 3,
+  recoveryTime: 60000,
+  retryBackoff: 'exponential'
+}
+```
+**Use Case:** Load balancing, testing multiple providers
+## Task Complexity Heuristics
+The system applies intelligent heuristics based on task complexity:
+### Simple Tasks → Prefer Gemini/ONNX
+```typescript
+await agent.executeTask({
+  name: 'format-code',
+  complexity: 'simple', // Fast, cheap providers preferred
+  estimatedTokens: 200,
+  execute: async (provider) => formatCode(code)
+});
+```
+**Rationale:** Simple tasks don't need Claude's reasoning power
+### Medium Tasks → Auto-Optimized
+```typescript
+await agent.executeTask({
+  name: 'refactor-function',
+  complexity: 'medium', // Balance cost/quality
+  estimatedTokens: 1500,
+  execute: async (provider) => refactorFunction(code)
+});
+```
+**Rationale:** Uses fallback strategy (cost/performance)
+### Complex Tasks → Prefer Claude
+```typescript
+await agent.executeTask({
+  name: 'design-architecture',
+  complexity: 'complex', // Quality matters most
+  estimatedTokens: 5000,
+  execute: async (provider) => designArchitecture(requirements)
+});
+```
+**Rationale:** Complex reasoning benefits from Claude's capabilities
+## Circuit Breaker
+Prevents cascading failures by temporarily disabling failing providers.
+### How It Works
+1. **Failure Tracking:** Count consecutive failures per provider
+2. **Threshold:** Open circuit after N failures (configurable)
+3. **Recovery:** Automatically recover after timeout
+4. **Fallback:** Use next available provider
+### Configuration
+```typescript
+{
+  maxFailures: 3, // Open circuit after 3 consecutive failures
+  recoveryTime: 60000, // Try recovery after 60 seconds
+  retryBackoff: 'exponential' // 1s, 2s, 4s, 8s, 16s...
+}
+```
+### Monitoring
+```typescript
+const health = manager.getHealth();
+health.forEach(h => {
+  console.log(`${h.provider}:`);
+  console.log(`  Circuit Breaker: ${h.circuitBreakerOpen ? 'OPEN' : 'CLOSED'}`);
+  console.log(`  Consecutive Failures: ${h.consecutiveFailures}`);
+  console.log(`  Success Rate: ${(h.successRate * 100).toFixed(1)}%`);
+});
+```
+## Cost Tracking & Optimization
+### Real-Time Cost Monitoring
+```typescript
+const costs = manager.getCostSummary();
+console.log(`Total: $${costs.total.toFixed(4)}`);
+console.log(`Tokens: ${costs.totalTokens.toLocaleString()}`);
+for (const [provider, cost] of Object.entries(costs.byProvider)) {
+  console.log(`  ${provider}: $${cost.toFixed(4)}`);
+}
+```
+### Budget Constraints
+```typescript
+const agent = new LongRunningAgent({
+  agentName: 'budget-agent',
+  providers,
+  costBudget: 10.00, // Hard limit: $10
+  // ... other config
+});
+// Agent automatically stops when budget exceeded
+```
+### Cost-Per-Provider Configuration
+```typescript
+const providers: ProviderConfig[] = [
+  {
+    name: 'gemini',
+    costPerToken: 0.00015, // $0.15 per 1M tokens
+    // ...
+  },
+  {
+    name: 'anthropic',
+    costPerToken: 0.003, // $3 per 1M tokens (Sonnet)
+    // ...
+  },
+  {
+    name: 'onnx',
+    costPerToken: 0, // FREE (local)
+    // ...
+  }
+];
+```
+## Health Monitoring
+### Automatic Health Checks
+```typescript
+const providers: ProviderConfig[] = [
+  {
+    name: 'gemini',
+    healthCheckInterval: 60000, // Check every minute
+    // ...
+  }
+];
+```
+### Manual Health Check
+```typescript
+const health = manager.getHealth();
+health.forEach(h => {
+  console.log(`${h.provider}:`);
+  console.log(`  Healthy: ${h.isHealthy}`);
+  console.log(`  Success Rate: ${(h.successRate * 100).toFixed(1)}%`);
+  console.log(`  Avg Latency: ${h.averageLatency.toFixed(0)}ms`);
+  console.log(`  Error Rate: ${(h.errorRate * 100).toFixed(1)}%`);
+});
+```
+### Metrics Collection
+```typescript
+const metrics = manager.getMetrics();
+metrics.forEach(m => {
+  console.log(`${m.provider}:`);
+  console.log(`  Total Requests: ${m.totalRequests}`);
+  console.log(`  Successful: ${m.successfulRequests}`);
+  console.log(`  Failed: ${m.failedRequests}`);
+  console.log(`  Avg Latency: ${m.averageLatency.toFixed(0)}ms`);
+  console.log(`  Total Cost: $${m.totalCost.toFixed(4)}`);
+});
+```
+## Checkpointing & Recovery
+### Automatic Checkpoints
+```typescript
+const agent = new LongRunningAgent({
+  agentName: 'checkpoint-agent',
+  providers,
+  checkpointInterval: 30000, // Save every 30 seconds
+  // ...
+});
+await agent.start();
+// Agent automatically saves checkpoints every 30s
+// On crash, restore from last checkpoint
+```
+### Manual Checkpoint Management
+```typescript
+// Get all checkpoints
+const metrics = agent.getMetrics();
+const checkpoints = metrics.checkpoints;
+// Restore from specific checkpoint
+const lastCheckpoint = checkpoints[checkpoints.length - 1];
+agent.restoreFromCheckpoint(lastCheckpoint);
+```
+### Checkpoint Data
+```typescript
+interface AgentCheckpoint {
+  timestamp: Date;
+  taskProgress: number; // 0-1
+  currentProvider: string;
+  totalCost: number;
+  totalTokens: number;
+  completedTasks: number;
+  failedTasks: number;
+  state: Record<string, any>; // Custom state
+}
+```
+## Retry Logic
+### Exponential Backoff (Recommended)
+```typescript
+{
+  retryBackoff: 'exponential'
+}
+```
+**Delays:** 1s, 2s, 4s, 8s, 16s, 30s (max)
+**Use Case:** Rate limits, transient errors
+### Linear Backoff
+```typescript
+{
+  retryBackoff: 'linear'
+}
+```
+**Delays:** 1s, 2s, 3s, 4s, 5s, 10s (max)
+**Use Case:** Predictable retry patterns
+### Retryable Errors
+Automatically retried:
+- `rate limit`
+- `timeout`
+- `connection`
+- `network`
+- HTTP 503, 502, 429
+Non-retryable errors fail immediately:
+- Authentication errors
+- Invalid requests
+- HTTP 4xx (except 429)
+## Production Best Practices
+### 1. Multi-Provider Strategy
+```typescript
+const providers: ProviderConfig[] = [
+  // Primary: Fast & cheap for simple tasks
+  { name: 'gemini', priority: 1, costPerToken: 0.00015 },
+  // Fallback: High quality for complex tasks
+  { name: 'anthropic', priority: 2, costPerToken: 0.003 },
+  // Emergency: Free local inference
+  { name: 'onnx', priority: 3, costPerToken: 0 }
+];
+```
+### 2. Cost Optimization
+```typescript
+// Use cost-optimized strategy for high-volume
+const agent = new LongRunningAgent({
+  agentName: 'production-agent',
+  providers,
+  fallbackStrategy: {
+    type: 'cost-optimized',
+    costThreshold: 0.50
+  },
+  costBudget: 100.00 // Daily budget
+});
+```
+### 3. Health Monitoring
+```typescript
+// Monitor provider health every minute
+const providers: ProviderConfig[] = [
+  {
+    name: 'gemini',
+    healthCheckInterval: 60000,
+    enabled: true
+  }
+];
+// Check health before critical operations
+const health = manager.getHealth();
+const unhealthy = health.filter(h => !h.isHealthy);
+if (unhealthy.length > 0) {
+  console.warn('Unhealthy providers:', unhealthy.map(h => h.provider));
+}
+```
+### 4. Graceful Degradation
+```typescript
+// Prefer quality, fallback to cost
+const providers: ProviderConfig[] = [
+  { name: 'anthropic', priority: 1 }, // Best quality
+  { name: 'gemini', priority: 2 },     // Cheaper fallback
+  { name: 'onnx', priority: 3 }        // Always available
+];
+```
+### 5. Circuit Breaker Tuning
+```typescript
+{
+  maxFailures: 5, // More tolerant in production
+  recoveryTime: 300000, // 5 minutes before retry
+  retryBackoff: 'exponential'
+}
+```
+## Docker Validation
+### Build Image
+```bash
+docker build -f Dockerfile.provider-fallback -t agentic-flow-provider-fallback .
+```
+### Run Tests
+```bash
+# With Gemini API key
+docker run --rm \
+  -e GOOGLE_GEMINI_API_KEY=your_key_here \
+  agentic-flow-provider-fallback
+# ONNX only (no API key needed)
+docker run --rm agentic-flow-provider-fallback
+```
+### Expected Output
+```
+✅ Provider Fallback Validation Test
+====================================
+📋 Testing Provider Manager...
+1️⃣  Building TypeScript...
+✅ Build complete
+2️⃣  Running provider fallback example...
+   Using Gemini API key: AIza...
+🚀 Starting Long-Running Agent with Provider Fallback
+📋 Task 1: Simple Code Generation (Gemini optimal)
+  Using provider: gemini
+  ✅ Result: { code: 'console.log("Hello World");', provider: 'gemini' }
+📋 Task 2: Complex Architecture Design (Claude optimal)
+  Using provider: anthropic
+  ✅ Result: { architecture: 'Event-driven microservices', provider: 'anthropic' }
+📈 Provider Health:
+gemini:
+  Healthy: true
+  Success Rate: 100.0%
+  Circuit Breaker: CLOSED
+✅ All provider fallback tests passed!
+```
+## API Reference
+### ProviderManager
+```typescript
+class ProviderManager {
+  constructor(providers: ProviderConfig[], strategy: FallbackStrategy);
+  selectProvider(
+    taskComplexity?: 'simple' | 'medium' | 'complex',
+    estimatedTokens?: number
+  ): Promise<ProviderType>;
+  executeWithFallback<T>(
+    requestFn: (provider: ProviderType) => Promise<T>,
+    taskComplexity?: 'simple' | 'medium' | 'complex',
+    estimatedTokens?: number
+  ): Promise<{ result: T; provider: ProviderType; attempts: number }>;
+  getMetrics(): ProviderMetrics[];
+  getHealth(): ProviderHealth[];
+  getCostSummary(): { total: number; byProvider: Record<ProviderType, number>; totalTokens: number };
+  destroy(): void;
+}
+```
+### LongRunningAgent
+```typescript
+class LongRunningAgent {
+  constructor(config: LongRunningAgentConfig);
+  start(): Promise<void>;
+  stop(): Promise<void>;
+  executeTask<T>(task: {
+    name: string;
+    complexity: 'simple' | 'medium' | 'complex';
+    estimatedTokens?: number;
+    execute: (provider: string) => Promise<T>;
+  }): Promise<T>;
+  getStatus(): AgentStatus;
+  getMetrics(): AgentMetrics;
+  restoreFromCheckpoint(checkpoint: AgentCheckpoint): void;
+}
+```
+## Examples
+See `src/examples/use-provider-fallback.ts` for complete working examples.
+## Support
+- **GitHub Issues:** https://github.com/ruvnet/agentic-flow/issues
+- **Documentation:** https://github.com/ruvnet/agentic-flow#readme
+- **Discord:** Coming soon
+## License
+MIT - See LICENSE file for details