agentic-flow 1.2.1 → 1.2.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/answer.md +1 -0
- package/.claude/openrouter-models-research.md +411 -0
- package/.claude/openrouter-quick-reference.md +113 -0
- package/README.md +80 -19
- package/dist/agents/claudeAgent.js +7 -5
- package/dist/cli/claude-code-wrapper.js +121 -72
- package/dist/cli-proxy.js +52 -4
- package/dist/proxy/anthropic-to-onnx.js +213 -0
- package/docs/.claude-flow/metrics/performance.json +1 -1
- package/docs/.claude-flow/metrics/task-metrics.json +3 -3
- package/docs/ONNX-PROXY-IMPLEMENTATION.md +254 -0
- package/docs/guides/PROXY-ARCHITECTURE-AND-EXTENSION.md +708 -0
- package/package.json +2 -2
package/.claude/answer.md
ADDED

@@ -0,0 +1 @@
+A program walks into a bar and orders a beer. As it is waiting for its drink, it hears a guy next to it say, 'Wow, the bartender can brew beer in just 5 minutes!' The program turns to the man and says, 'I don't know, I'm still trying to debug my couple of weeks old code and I still can't tell what it's doing. A 5 minute beer?
package/.claude/openrouter-models-research.md
ADDED

@@ -0,0 +1,411 @@
+# Best OpenRouter Models for Claude Code Tool Use
+
+**Research Date:** October 6, 2025
+**Research Focus:** Models supporting tool/function calling that are cheap, fast, and high-quality
+
+---
+
+## Executive Summary
+
+This research identifies the top 5 OpenRouter models optimized for Claude Code's tool calling requirements, balancing cost-effectiveness, speed, and quality. **Mistral Small 3.1 24B** emerges as the best overall value at $0.02/$0.04 per million tokens, while several FREE options are available including DeepSeek V3 0324 and Gemini 2.0 Flash.
+
+---
+
+## Top 5 Recommended Models
+
+### 🥇 1. Mistral Small 3.1 24B
+**Model ID:** `mistralai/mistral-small-3.1-24b`
+
+- **Cost:** $0.02/M input tokens | $0.04/M output tokens
+- **Tool Support:** ★★★★★ Excellent (optimized for function calling)
+- **Speed:** ⚡⚡⚡⚡ Fast (low-latency)
+- **Context:** 128K tokens
+- **Quality:** High
+
+**Why Choose This:**
+- Specifically optimized for function calling APIs and JSON-structured outputs
+- Best cost-to-performance ratio for tool use
+- Low-latency responses ideal for interactive Claude Code workflows
+- Excellent at structured outputs and tool implementation
+
+**Best For:** Production Claude Code deployments requiring reliable, fast tool calling at minimal cost.
+
+---
+
+### 🥈 2. Cohere Command R7B (12-2024)
+**Model ID:** `cohere/command-r7b-12-2024`
+
+- **Cost:** $0.038/M input tokens | $0.15/M output tokens
+- **Tool Support:** ★★★★★ Excellent
+- **Speed:** ⚡⚡⚡⚡⚡ Very Fast
+- **Context:** 128K tokens
+- **Quality:** High
+
+**Why Choose This:**
+- Cheapest overall option among premium tool-calling models
+- Excels at RAG, tool use, agents, and complex reasoning
+- 7B parameter model - very efficient and fast
+- Updated December 2024 with latest improvements
+
+**Best For:** Budget-conscious deployments needing excellent tool calling and agent capabilities.
+
+---
+
+### 🥉 3. Qwen Turbo
+**Model ID:** `qwen/qwen-turbo`
+
+- **Cost:** $0.05/M input tokens | $0.20/M output tokens
+- **Tool Support:** ★★★★ Good
+- **Speed:** ⚡⚡⚡⚡⚡ Very Fast (turbo-optimized)
+- **Context:** 1M tokens (!)
+- **Quality:** Good
+
+**Why Choose This:**
+- Massive 1M context window at budget pricing
+- Very fast response times
+- Good tool calling support
+- Cached tokens at $0.02/M for repeated queries
+
+**Notes:**
+- Model is deprecated (Alibaba recommends Qwen-Flash)
+- Still available and functional on OpenRouter
+- Consider `qwen/qwen-flash` as alternative
+
+**Best For:** Projects needing large context windows with tool calling at low cost.
+
+---
+
+### 4. DeepSeek Chat
+**Model ID:** `deepseek/deepseek-chat`
+
+- **Cost:** $0.23/M input tokens | $0.90/M output tokens
+- **Tool Support:** ★★★★ Good
+- **Speed:** ⚡⚡⚡⚡ Fast
+- **Context:** 131K tokens
+- **Quality:** Very High
+
+**Special Note:**
+**DeepSeek V3 0324 is available COMPLETELY FREE on OpenRouter!**
+- Model ID: `deepseek/deepseek-chat-v3-0324:free`
+- Zero cost for input and output tokens
+- Unprecedented free tier offering
+
+**Why Choose This:**
+- Strong reasoning capabilities
+- Automatic prompt caching (no config needed)
+- Good agentic workflow support
+- Chinese company - excellent multilingual support
+
+**Best For:**
+- Free tier: Experimentation and development
+- Paid tier: Production deployments needing strong reasoning
+
+---
+
+### 5. Google Gemini 2.0 Flash Experimental (FREE)
+**Model ID:** `google/gemini-2.0-flash-exp:free`
+
+- **Cost:** $0.00 (FREE tier)
+- **Tool Support:** ★★★★★ Excellent (enhanced function calling)
+- **Speed:** ⚡⚡⚡⚡⚡ Very Fast
+- **Context:** 1M tokens
+- **Quality:** Very High
+
+**Free Tier Limits:**
+- 20 requests per minute
+- 50 requests per day (if account has <$10 credits)
+- No daily limit if account has $10+ credits
+
+**Why Choose This:**
+- Completely free with generous limits
+- Enhanced function calling in 2.0 version
+- Multimodal understanding capabilities
+- Strong coding performance
+- Most popular model on OpenRouter for tool calling (5M+ requests/week)
+
+**Paid Alternative:**
+- `google/gemini-2.0-flash-001`: $0.125/M input | $0.5/M output
+- `google/gemini-2.0-flash-lite-001`: $0.075/M input | $0.3/M output
+
+**Best For:** Development, testing, and low-volume production use cases.
+
+---
+
+## Honorable Mentions
+
+### Meta Llama 3.3 70B Instruct (FREE)
+**Model ID:** `meta-llama/llama-3.3-70b-instruct:free`
+
+- **Cost:** $0.00 (FREE)
+- **Tool Support:** ★★★★ Good
+- **Speed:** ⚡⚡⚡ Moderate
+- **Context:** 128K tokens
+- **Quality:** Very High
+
+**Notes:**
+- Completely free for training/development
+- 70B parameters - strong capabilities
+- Your requests may be used for training
+- Also available: `meta-llama/llama-3.3-8b-instruct:free`
+
+---
+
+### Microsoft Phi-4
+**Model ID:** `microsoft/phi-4`
+
+- **Cost:** $0.07/M input | $0.14/M output
+- **Tool Support:** ★★★ Good
+- **Speed:** ⚡⚡⚡⚡ Fast
+- **Context:** 16K tokens
+- **Quality:** Good for size
+
+**Alternative:** `microsoft/phi-4-reasoning-plus` at $0.07/M input | $0.35/M output for enhanced reasoning.
+
+---
+
+## Tool Calling Accuracy Rankings
+
+Based on OpenRouter's official benchmarks:
+
+| Rank | Model | Accuracy | Notes |
+|------|-------|----------|-------|
+| 🥇 1 | GPT-5 | 99.7% | Highest accuracy (expensive) |
+| 🥈 2 | Claude 4.1 Opus | 99.5% | Near-perfect (expensive) |
+| - | Gemini 2.5 Flash | - | Most popular (5M+ requests/week) |
+
+**Key Insight:** While GPT-5 and Claude 4.1 Opus lead in accuracy, Gemini 2.5 Flash's popularity suggests excellent real-world performance at much lower cost.
+
+---
+
+## Cost Comparison Table
+
+| Model | Input $/M | Output $/M | Total $/M (50/50) | Free Tier |
+|-------|-----------|------------|-------------------|-----------|
+| Mistral Small 3.1 | $0.02 | $0.04 | $0.03 | ❌ |
+| Command R7B | $0.038 | $0.15 | $0.094 | ❌ |
+| Qwen Turbo | $0.05 | $0.20 | $0.125 | ❌ |
+| DeepSeek V3 0324 | $0.00 | $0.00 | $0.00 | ✅ FREE |
+| Gemini 2.0 Flash | $0.00 | $0.00 | $0.00 | ✅ FREE |
+| Llama 3.3 70B | $0.00 | $0.00 | $0.00 | ✅ FREE |
+| DeepSeek Chat (paid) | $0.23 | $0.90 | $0.565 | ❌ |
+| Phi-4 | $0.07 | $0.14 | $0.105 | ❌ |
+
+*Note: "Total $/M (50/50)" assumes equal input/output token usage*
+
+---
+
+## OpenRouter-Specific Tips
+
+### 1. Use Model Suffixes for Optimization
+
+**`:free` suffix** - Access free tier versions:
+```
+google/gemini-2.0-flash-exp:free
+meta-llama/llama-3.3-70b-instruct:free
+deepseek/deepseek-chat-v3-0324:free
+```
+
+**`:floor` suffix** - Get cheapest provider:
+```
+deepseek/deepseek-chat:floor
+```
+This automatically routes to the cheapest available provider for that model.
+
+**`:nitro` suffix** - Get fastest throughput:
+```
+anthropic/claude-3.5-sonnet:nitro
+```
+
+### 2. Filter for Tool Support
+
+Visit: `https://openrouter.ai/models?supported_parameters=tools`
+
+This shows only models with verified tool/function calling support.
+
+### 3. No Extra Charges for Tool Calling
+
+OpenRouter charges based on token usage only. Tool calling doesn't incur additional fees - you only pay for:
+- Input tokens (your prompts + tool definitions)
+- Output tokens (model responses + tool calls)
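
Since tool definitions are billed as ordinary input tokens, it helps to see where they live in the request. A minimal sketch of the OpenAI-compatible chat-completions payload that OpenRouter accepts - the `get_weather` tool and its schema are hypothetical, and nothing is sent over the network here:

```python
import json

# Hypothetical tool definition. Its JSON schema is serialized into the
# request and billed as input tokens like any other prompt text.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "mistralai/mistral-small-3.1-24b",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
}

body = json.dumps(payload)
print(f"{len(body)} bytes of request body, tool schema included")
```

Trimming verbose tool descriptions is therefore a direct way to cut input-token cost on every call.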
+
+### 4. Automatic Prompt Caching
+
+Some models (like DeepSeek) have automatic prompt caching:
+- No configuration needed
+- Reduces costs for repeated queries
+- Speeds up responses
+
+### 5. Free Tier Rate Limits
+
+For models with `:free` suffix:
+- **20 requests per minute** (all free models)
+- **50 requests per day** if account balance < $10
+- **Unlimited daily requests** if account balance ≥ $10
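
To stay under the 20 requests/minute ceiling, a client-side throttle is a cheap safeguard. A minimal sliding-window sketch - the limit mirrors the free-tier numbers above; this is not an OpenRouter SDK feature, just an illustration:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int = 20, window: float = 60.0):
        self.limit, self.window = limit, window
        self.calls: deque = deque()  # timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Sleep until the oldest call in the window expires.
            time.sleep(self.window - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

limiter = RateLimiter(limit=20, window=60.0)
# Call limiter.wait() before each request to a `:free` model.
```

Note the daily cap (50 requests below $10 balance) still applies; the limiter only smooths the per-minute burst rate.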
+
+### 6. OpenRouter Fees
+
+- **5.5% fee** ($0.80 minimum) when purchasing credits
+- **No markup** on model provider pricing
+- Pay-as-you-go credit system
+
+---
+
+## Use Case Recommendations
+
+### For Development & Testing
+**Recommendation:** `google/gemini-2.0-flash-exp:free`
+- Free tier with generous limits
+- Excellent tool calling
+- Fast responses
+- No cost during development
+
+### For Budget Production Deployments
+**Recommendation:** `mistralai/mistral-small-3.1-24b`
+- Best cost/performance ratio ($0.02/$0.04)
+- Optimized for tool calling
+- Low latency
+- Reliable quality
+
+### For Maximum Savings
+**Recommendation:** `cohere/command-r7b-12-2024`
+- Cheapest paid option ($0.038/$0.15)
+- Excellent agent capabilities
+- Very fast (7B params)
+- Strong tool use support
+
+### For Large Context Needs
+**Recommendation:** `qwen/qwen-turbo`
+- 1M context window
+- Low cost ($0.05/$0.20)
+- Fast responses
+- Good tool support
+
+### For High-Quality Reasoning
+**Recommendation:** `deepseek/deepseek-chat`
+- FREE option available (v3-0324)
+- Strong reasoning capabilities
+- Good for complex workflows
+- Automatic caching
+
+### For Multilingual Projects
+**Recommendation:** `deepseek/deepseek-chat` or `qwen/qwen-turbo`
+- Chinese models with excellent multilingual support
+- Good tool calling in multiple languages
+- Cost-effective
+
+---
+
+## Implementation Example
+
+Here's how to use these models with agentic-flow:
+
+```bash
+# Using Mistral Small 3.1 (Best Value)
+agentic-flow --agent coder \
+  --task "Create a REST API with authentication" \
+  --provider openrouter \
+  --model "mistralai/mistral-small-3.1-24b"
+
+# Using free Gemini (Development)
+agentic-flow --agent researcher \
+  --task "Analyze this codebase structure" \
+  --provider openrouter \
+  --model "google/gemini-2.0-flash-exp:free"
+
+# Using DeepSeek (Free Tier)
+agentic-flow --agent analyst \
+  --task "Review code quality" \
+  --provider openrouter \
+  --model "deepseek/deepseek-chat-v3-0324:free"
+
+# Using floor routing (Cheapest)
+agentic-flow --agent optimizer \
+  --task "Optimize database queries" \
+  --provider openrouter \
+  --model "deepseek/deepseek-chat:floor"
+```
+
+---
+
+## Key Research Findings
+
+1. **No Extra Tool Calling Fees:** OpenRouter charges only for tokens, not for tool usage
+2. **Free Tier Available:** Multiple high-quality FREE models with tool support
+3. **Cost Range:** From $0 (free) to $0.90/M output tokens
+4. **Quality Trade-offs:** Even the cheapest models (Mistral Small 3.1) offer excellent tool calling
+5. **Speed Leaders:** Qwen Turbo, Gemini 2.0 Flash, and Command R7B are fastest
+6. **Popularity != Accuracy:** Gemini 2.5 Flash is most used despite GPT-5/Claude leading accuracy
+7. **Chinese Models Competitive:** DeepSeek and Qwen offer excellent value and capabilities
+8. **Free Options Viable:** Free tier models are production-ready for many use cases
+
+---
+
+## Migration Path
+
+### From Anthropic Claude
+1. **Development:** Switch to `google/gemini-2.0-flash-exp:free`
+2. **Production:** Switch to `mistralai/mistral-small-3.1-24b`
+3. **Savings:** ~99% cost reduction (Claude Sonnet: $3/$15 vs Mistral: $0.02/$0.04)
+
+### From OpenAI GPT-4
+1. **Development:** Switch to `deepseek/deepseek-chat-v3-0324:free`
+2. **Production:** Switch to `cohere/command-r7b-12-2024`
+3. **Savings:** ~99% cost reduction (GPT-4: $30/$60 vs Command R7B: $0.038/$0.15)
+
+---
+
+## Monitoring & Optimization
+
+### Track Your Usage
+OpenRouter provides detailed analytics:
+- Token usage per model
+- Cost breakdown
+- Response times
+- Error rates
+
+### A/B Testing Recommended
+Test these models with your actual workload:
+1. Start with free tier (Gemini/DeepSeek)
+2. Compare with Mistral Small 3.1
+3. Measure: accuracy, speed, cost
+4. Choose based on your requirements
+
+### Cost Optimization Tips
+1. Use `:floor` suffix for automatic cheapest routing
+2. Enable prompt caching where available
+3. Batch requests when possible
+4. Use free tier for non-critical workloads
+5. Monitor and adjust based on actual usage patterns
+
+---
+
+## Conclusion
+
+For **Claude Code tool use** on OpenRouter, the clear winners are:
+
+**Best Overall Value:** `mistralai/mistral-small-3.1-24b`
+- Optimized for tool calling at unbeatable pricing
+
+**Best Free Option:** `google/gemini-2.0-flash-exp:free`
+- Production-ready free tier with excellent capabilities
+
+**Maximum Savings:** `cohere/command-r7b-12-2024`
+- Cheapest paid option with strong performance
+
+All three models offer excellent tool calling support, fast responses, and high-quality outputs suitable for production Claude Code deployments.
+
+---
+
+## Additional Resources
+
+- **OpenRouter Models Page:** https://openrouter.ai/models
+- **Tool Calling Docs:** https://openrouter.ai/docs/features/tool-calling
+- **Filter by Tools:** https://openrouter.ai/models?supported_parameters=tools
+- **OpenRouter Discord:** For community support and updates
+- **Model Rankings:** https://openrouter.ai/rankings
+
+---
+
+**Research Conducted By:** Claude Code Research Agent
+**Last Updated:** October 6, 2025
+**Methodology:** Web research, documentation review, pricing analysis, benchmark comparison
package/.claude/openrouter-quick-reference.md
ADDED

@@ -0,0 +1,113 @@
+# OpenRouter Models Quick Reference for Claude Code
+
+## Top 5 Models for Tool/Function Calling
+
+### 🥇 1. Mistral Small 3.1 24B - BEST VALUE
+```bash
+Model: mistralai/mistral-small-3.1-24b
+Cost: $0.02/M input | $0.04/M output
+Speed: ⚡⚡⚡⚡ Fast
+Tool Support: ★★★★★ Excellent
+```
+**Use for:** Production deployments - best cost/performance ratio
+
+---
+
+### 🥈 2. Cohere Command R7B - CHEAPEST PAID
+```bash
+Model: cohere/command-r7b-12-2024
+Cost: $0.038/M input | $0.15/M output
+Speed: ⚡⚡⚡⚡⚡ Very Fast
+Tool Support: ★★★★★ Excellent
+```
+**Use for:** Budget-conscious deployments with agent workflows
+
+---
+
+### 🥉 3. Qwen Turbo - LARGE CONTEXT
+```bash
+Model: qwen/qwen-turbo
+Cost: $0.05/M input | $0.20/M output
+Speed: ⚡⚡⚡⚡⚡ Very Fast
+Tool Support: ★★★★ Good
+Context: 1M tokens
+```
+**Use for:** Projects needing massive context windows
+
+---
+
+### 4. DeepSeek V3 0324 - FREE
+```bash
+Model: deepseek/deepseek-chat-v3-0324:free
+Cost: $0.00 (FREE!)
+Speed: ⚡⚡⚡⚡ Fast
+Tool Support: ★★★★ Good
+```
+**Use for:** Development, testing, cost-sensitive production
+
+---
+
+### 5. Gemini 2.0 Flash - FREE (MOST POPULAR)
+```bash
+Model: google/gemini-2.0-flash-exp:free
+Cost: $0.00 (FREE!)
+Speed: ⚡⚡⚡⚡⚡ Very Fast
+Tool Support: ★★★★★ Excellent
+Limits: 20 req/min, 50/day if <$10 credits
+```
+**Use for:** Development, testing, low-volume production
+
+---
+
+## Quick Command Examples
+
+```bash
+# Best value - Mistral Small 3.1
+agentic-flow --agent coder --task "..." --provider openrouter \
+  --model "mistralai/mistral-small-3.1-24b"
+
+# Free tier - Gemini
+agentic-flow --agent researcher --task "..." --provider openrouter \
+  --model "google/gemini-2.0-flash-exp:free"
+
+# Cheapest provider auto-routing
+agentic-flow --agent optimizer --task "..." --provider openrouter \
+  --model "deepseek/deepseek-chat:floor"
+```
+
+---
+
+## Cost Comparison (per Million Tokens)
+
+| Model | Input | Output | 50/50 Mix |
+|-------|-------|--------|-----------|
+| Mistral Small 3.1 | $0.02 | $0.04 | $0.03 |
+| Command R7B | $0.038 | $0.15 | $0.094 |
+| Qwen Turbo | $0.05 | $0.20 | $0.125 |
+| DeepSeek FREE | $0.00 | $0.00 | $0.00 |
+| Gemini FREE | $0.00 | $0.00 | $0.00 |
+
+---
+
+## Pro Tips
+
+1. **Use `:free` suffix** for free models
+2. **Use `:floor` suffix** for cheapest provider
+3. **Filter models:** https://openrouter.ai/models?supported_parameters=tools
+4. **No extra fees** for tool calling - only token usage
+5. **Free tier limits:** 20 req/min, 50/day (unlimited with $10+ balance)
+
+---
+
+## When to Use Which Model
+
+- **Development/Testing:** Gemini 2.0 Flash Free
+- **Production (Budget):** Mistral Small 3.1 24B
+- **Production (Cheapest):** Command R7B
+- **Large Context:** Qwen Turbo
+- **Complex Reasoning:** DeepSeek Chat
+- **Maximum Savings:** DeepSeek V3 0324 Free
+
+---
+
+Full research report: `/workspaces/agentic-flow/agentic-flow/.claude/openrouter-models-research.md`
package/README.md
CHANGED

@@ -12,14 +12,22 @@
 
 ## Introduction
 
-Agentic Flow
+I built Agentic Flow to easily switch between alternative low-cost AI models in Claude Code/Agent SDK. For those comfortable using Claude agents and commands, it lets you take what you've created and deploy fully hosted agents for real business purposes. Use Claude Code to get the agent working, then deploy it in your favorite cloud.
+
+Agentic Flow runs Claude Code agents at near zero cost without rewriting a thing. The built-in model optimizer automatically routes every task to the cheapest option that meets your quality requirements - free local models for privacy, OpenRouter for 99% cost savings, Gemini for speed, or Anthropic when quality matters most. It analyzes each task and selects the optimal model from 27+ options with a single flag, reducing API costs dramatically compared to using Claude exclusively.
+
+The system spawns specialized agents on demand through Claude Code's Task tool and MCP coordination. It orchestrates swarms of 66+ pre-built agents (researchers, coders, reviewers, testers, architects) that work in parallel, coordinate through shared memory, and auto-scale based on workload. Transparent OpenRouter and Gemini proxies translate Anthropic API calls automatically - no code changes needed. Local models run direct without proxies for maximum privacy. Switch providers with environment variables, not refactoring.
+
+Extending agent capabilities is effortless. Add custom tools and integrations through the CLI - weather data, databases, search engines, or any external service - without touching config files. Your agents instantly gain new abilities across all projects. Every tool you add becomes available to the entire agent ecosystem automatically, and all operations are logged with full traceability for auditing, debugging, and compliance. This means your agents can connect to proprietary systems, third-party APIs, or internal tools in seconds, not hours.
+
+Define routing rules through flexible policy modes: Strict mode keeps sensitive data offline, Economy mode prefers free models (99% savings), Premium mode uses Anthropic for highest quality, or create custom cost/quality thresholds. The policy defines the rules; the swarm enforces them automatically. Runs local for development, Docker for CI/CD, or Flow Nexus cloud for production scale. Agentic Flow is the framework for autonomous efficiency - one unified runner for every Claude Code agent, self-tuning, self-routing, and built for real-world deployment.
 
 **Key Capabilities:**
+- ✅ **Claude Code Mode** - Run Claude Code with OpenRouter/Gemini/ONNX (85-99% savings)
 - ✅ **66 Specialized Agents** - Pre-built experts for coding, research, review, testing, DevOps
 - ✅ **213 MCP Tools** - Memory, GitHub, neural networks, sandboxes, workflows, payments
 - ✅ **Multi-Model Router** - Anthropic, OpenRouter (100+ models), Gemini, ONNX (free local)
-- ✅ **Cost Optimization** -
-- ✅ **Standalone Proxy** - Use Gemini/OpenRouter with Claude Code at 85% cost savings
+- ✅ **Cost Optimization** - DeepSeek at $0.14/M tokens vs Claude at $15/M (99% savings)
 
 **Built On:**
 - [Claude Agent SDK](https://docs.claude.com/en/api/agent-sdk) by Anthropic

@@ -112,28 +120,67 @@ npm run mcp:stdio
 
 ---
 
-### Option 3: Claude Code
+### Option 3: Claude Code Mode (v1.2.3+)
+
+**Run Claude Code with alternative AI providers - 85-99% cost savings!**
 
-
+Automatically spawns Claude Code with proxy configuration for OpenRouter, Gemini, or ONNX models:
 
 ```bash
-#
-npx agentic-flow claude-code --provider openrouter
+# Interactive mode - Opens Claude Code UI with proxy
+npx agentic-flow claude-code --provider openrouter
+npx agentic-flow claude-code --provider gemini
+
+# Non-interactive mode - Execute task and exit
+npx agentic-flow claude-code --provider openrouter "Write a Python hello world function"
+npx agentic-flow claude-code --provider openrouter --model "deepseek/deepseek-chat" "Create REST API"
 
-#
-npx agentic-flow claude-code --provider
+# Use specific models
+npx agentic-flow claude-code --provider openrouter --model "mistralai/mistral-small"
+npx agentic-flow claude-code --provider gemini --model "gemini-2.0-flash-exp"
 
-#
-npx agentic-flow claude-code --provider
+# Local ONNX models (100% free, privacy-focused)
+npx agentic-flow claude-code --provider onnx "Analyze this codebase"
 ```
 
+**Recommended Models:**
+
+| Provider | Model | Cost/M Tokens | Context | Best For |
+|----------|-------|---------------|---------|----------|
+| OpenRouter | `deepseek/deepseek-chat` (default) | $0.14 | 128k | General tasks, best value |
+| OpenRouter | `anthropic/claude-3.5-sonnet` | $3.00 | 200k | Highest quality, complex reasoning |
+| OpenRouter | `google/gemini-2.0-flash-exp:free` | FREE | 1M | Development, testing (rate limited) |
+| Gemini | `gemini-2.0-flash-exp` | FREE | 1M | Fast responses, rate limited |
+| ONNX | `phi-4-mini-instruct` | FREE | 128k | Privacy, offline, no API needed |
+
+⚠️ **Note:** Claude Code sends 35k+ tokens in tool definitions. Models with <128k context (like Mistral Small at 32k) will fail with "context length exceeded" errors.
+
 **How it works:**
-1. ✅ Auto-
-2. ✅
-3. ✅
-4. ✅
-5. ✅
-6. ✅
+1. ✅ Auto-starts proxy server in background (OpenRouter/Gemini/ONNX)
+2. ✅ Sets `ANTHROPIC_BASE_URL` to proxy endpoint
+3. ✅ Configures provider-specific API keys transparently
+4. ✅ Spawns Claude Code with environment configured
+5. ✅ All Claude SDK features work (tools, memory, MCP, etc.)
+6. ✅ Automatic cleanup on exit
+
+**Environment Setup:**
+
+```bash
+# OpenRouter (100+ models at 85-99% savings)
+export OPENROUTER_API_KEY=sk-or-v1-...
+
+# Gemini (FREE tier available)
+export GOOGLE_GEMINI_API_KEY=AIza...
+
+# ONNX (local models, no API key needed)
+# export ONNX_MODEL_PATH=/path/to/models  # Optional
+```
+
+**Full Help:**
+
+```bash
+npx agentic-flow claude-code --help
+```
 
 **Alternative: Manual Proxy (v1.1.11)**
 

@@ -370,9 +417,9 @@ node dist/mcp/fastmcp/servers/http-sse.js
 - **stdio**: Claude Desktop, Cursor IDE, command-line tools
 - **HTTP/SSE**: Web apps, browser extensions, REST APIs, mobile apps
 
-### Add Custom MCP Servers (No Code Required)
+### Add Custom MCP Servers (No Code Required) ✨ NEW in v1.2.1
 
-Add your own MCP servers via CLI without editing code:
+Add your own MCP servers via CLI without editing code - extends agent capabilities in seconds:
 
 ```bash
 # Add MCP server (Claude Desktop style JSON config)

@@ -393,6 +440,13 @@ npx agentic-flow mcp disable weather
 
 # Remove server
 npx agentic-flow mcp remove weather
+
+# Test server configuration
+npx agentic-flow mcp test weather
+
+# Export/import configurations
+npx agentic-flow mcp export ./mcp-backup.json
+npx agentic-flow mcp import ./mcp-backup.json
 ```
 
 **Configuration stored in:** `~/.agentic-flow/mcp-config.json`

@@ -411,6 +465,13 @@ npx agentic-flow --agent researcher --task "Get weather forecast for Tokyo"
 - `weather-mcp` - Weather data
 - `database-mcp` - Database operations
 
+**v1.2.1 Improvements:**
+- ✅ CLI routing fixed - `mcp add/list/remove` commands now work correctly
+- ✅ Model optimizer filters models without tool support automatically
+- ✅ Full compatibility with Claude Desktop config format
+- ✅ Test command for validating server configurations
+- ✅ Export/import for backing up and sharing configurations
+
 **Documentation:** See [docs/guides/ADDING-MCP-SERVERS-CLI.md](docs/guides/ADDING-MCP-SERVERS-CLI.md) for complete guide.
 
 ### Using MCP Tools in Agents
package/dist/agents/claudeAgent.js
CHANGED

@@ -85,11 +85,13 @@ export async function claudeAgent(agent, input, onStream, modelOverride) {
 });
 }
 else if (provider === 'onnx') {
-// For ONNX:
-envOverrides.ANTHROPIC_API_KEY = 'local';
-
-
-
+// For ONNX: Use ANTHROPIC_BASE_URL if already set by CLI (proxy mode)
+envOverrides.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY || 'sk-ant-onnx-local-key';
+envOverrides.ANTHROPIC_BASE_URL = process.env.ANTHROPIC_BASE_URL || process.env.ONNX_PROXY_URL || 'http://localhost:3001';
+logger.info('Using ONNX local proxy', {
+proxyUrl: envOverrides.ANTHROPIC_BASE_URL,
+model: finalModel
+});
 }
 // For Anthropic provider, use existing ANTHROPIC_API_KEY (no proxy needed)
 logger.info('Multi-provider configuration', {