agentic-flow 1.2.1 β†’ 1.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1 @@
1
+ A program walks into a bar and orders a beer. As it is waiting for its drink, it hears a guy next to it say, 'Wow, the bartender can brew beer in just 5 minutes!' The program turns to the man and says, 'I don't know. I'm still debugging code I wrote a couple of weeks ago and I still can't tell what it's doing. A 5-minute beer?'
@@ -0,0 +1,411 @@
1
+ # Best OpenRouter Models for Claude Code Tool Use
2
+
3
+ **Research Date:** October 6, 2025
4
+ **Research Focus:** Models supporting tool/function calling that are cheap, fast, and high-quality
5
+
6
+ ---
7
+
8
+ ## Executive Summary
9
+
10
+ This research identifies the top 5 OpenRouter models optimized for Claude Code's tool calling requirements, balancing cost-effectiveness, speed, and quality. **Mistral Small 3.1 24B** emerges as the best overall value at $0.02/$0.04 per million tokens, while several FREE options are available including DeepSeek V3 0324 and Gemini 2.0 Flash.
11
+
12
+ ---
13
+
14
+ ## Top 5 Recommended Models
15
+
16
+ ### πŸ₯‡ 1. Mistral Small 3.1 24B
17
+ **Model ID:** `mistralai/mistral-small-3.1-24b`
18
+
19
+ - **Cost:** $0.02/M input tokens | $0.04/M output tokens
20
+ - **Tool Support:** ⭐⭐⭐⭐⭐ Excellent (optimized for function calling)
21
+ - **Speed:** ⚑⚑⚑⚑ Fast (low-latency)
22
+ - **Context:** 128K tokens
23
+ - **Quality:** High
24
+
25
+ **Why Choose This:**
26
+ - Specifically optimized for function calling APIs and JSON-structured outputs
27
+ - Best cost-to-performance ratio for tool use
28
+ - Low-latency responses ideal for interactive Claude Code workflows
29
+ - Excellent at structured outputs and tool implementation
30
+
31
+ **Best For:** Production Claude Code deployments requiring reliable, fast tool calling at minimal cost.
32
+
33
+ ---
34
+
35
+ ### πŸ₯ˆ 2. Cohere Command R7B (12-2024)
36
+ **Model ID:** `cohere/command-r7b-12-2024`
37
+
38
+ - **Cost:** $0.038/M input tokens | $0.15/M output tokens
39
+ - **Tool Support:** ⭐⭐⭐⭐⭐ Excellent
40
+ - **Speed:** ⚑⚑⚑⚑⚑ Very Fast
41
+ - **Context:** 128K tokens
42
+ - **Quality:** High
43
+
44
+ **Why Choose This:**
45
+ - Cheapest overall option among premium tool-calling models
46
+ - Excels at RAG, tool use, agents, and complex reasoning
47
+ - 7B parameter model - very efficient and fast
48
+ - Updated December 2024 with latest improvements
49
+
50
+ **Best For:** Budget-conscious deployments needing excellent tool calling and agent capabilities.
51
+
52
+ ---
53
+
54
+ ### πŸ₯‰ 3. Qwen Turbo
55
+ **Model ID:** `qwen/qwen-turbo`
56
+
57
+ - **Cost:** $0.05/M input tokens | $0.20/M output tokens
58
+ - **Tool Support:** ⭐⭐⭐⭐ Good
59
+ - **Speed:** ⚑⚑⚑⚑⚑ Very Fast (turbo-optimized)
60
+ - **Context:** 1M tokens (!)
61
+ - **Quality:** Good
62
+
63
+ **Why Choose This:**
64
+ - Massive 1M context window at budget pricing
65
+ - Very fast response times
66
+ - Good tool calling support
67
+ - Cached tokens at $0.02/M for repeated queries
68
+
69
+ **Notes:**
70
+ - Model is deprecated (Alibaba recommends Qwen-Flash)
71
+ - Still available and functional on OpenRouter
72
+ - Consider `qwen/qwen-flash` as alternative
73
+
74
+ **Best For:** Projects needing large context windows with tool calling at low cost.
75
+
76
+ ---
77
+
78
+ ### πŸ† 4. DeepSeek Chat
79
+ **Model ID:** `deepseek/deepseek-chat`
80
+
81
+ - **Cost:** $0.23/M input tokens | $0.90/M output tokens
82
+ - **Tool Support:** ⭐⭐⭐⭐ Good
83
+ - **Speed:** ⚑⚑⚑⚑ Fast
84
+ - **Context:** 131K tokens
85
+ - **Quality:** Very High
86
+
87
+ **Special Note:**
88
+ **DeepSeek V3 0324 is available COMPLETELY FREE on OpenRouter!**
89
+ - Model ID: `deepseek/deepseek-chat-v3-0324:free`
90
+ - Zero cost for input and output tokens
91
+ - Unprecedented free tier offering
92
+
93
+ **Why Choose This:**
94
+ - Strong reasoning capabilities
95
+ - Automatic prompt caching (no config needed)
96
+ - Good agentic workflow support
97
+ - Strong multilingual support (including Chinese and English)
98
+
99
+ **Best For:**
100
+ - Free tier: Experimentation and development
101
+ - Paid tier: Production deployments needing strong reasoning
102
+
103
+ ---
104
+
105
+ ### ⭐ 5. Google Gemini 2.0 Flash Experimental (FREE)
106
+ **Model ID:** `google/gemini-2.0-flash-exp:free`
107
+
108
+ - **Cost:** $0.00 (FREE tier)
109
+ - **Tool Support:** ⭐⭐⭐⭐⭐ Excellent (enhanced function calling)
110
+ - **Speed:** ⚑⚑⚑⚑⚑ Very Fast
111
+ - **Context:** 1M tokens
112
+ - **Quality:** Very High
113
+
114
+ **Free Tier Limits:**
115
+ - 20 requests per minute
116
+ - 50 requests per day (if account has <$10 credits)
117
+ - No daily limit if account has $10+ credits
118
+
119
+ **Why Choose This:**
120
+ - Completely free with generous limits
121
+ - Enhanced function calling in 2.0 version
122
+ - Multimodal understanding capabilities
123
+ - Strong coding performance
124
+ - One of the most-used tool-calling models on OpenRouter
125
+
126
+ **Paid Alternative:**
127
+ - `google/gemini-2.0-flash-001`: $0.125/M input | $0.50/M output
128
+ - `google/gemini-2.0-flash-lite-001`: $0.075/M input | $0.30/M output
129
+
130
+ **Best For:** Development, testing, and low-volume production use cases.
131
+
132
+ ---
133
+
134
+ ## Honorable Mentions
135
+
136
+ ### Meta Llama 3.3 70B Instruct (FREE)
137
+ **Model ID:** `meta-llama/llama-3.3-70b-instruct:free`
138
+
139
+ - **Cost:** $0.00 (FREE)
140
+ - **Tool Support:** ⭐⭐⭐⭐ Good
141
+ - **Speed:** ⚑⚑⚑ Moderate
142
+ - **Context:** 128K tokens
143
+ - **Quality:** Very High
144
+
145
+ **Notes:**
146
+ - Completely free for development and testing
147
+ - 70B parameters - strong capabilities
148
+ - Your requests may be used for training
149
+ - Also available: `meta-llama/llama-3.3-8b-instruct:free`
150
+
151
+ ---
152
+
153
+ ### Microsoft Phi-4
154
+ **Model ID:** `microsoft/phi-4`
155
+
156
+ - **Cost:** $0.07/M input | $0.14/M output
157
+ - **Tool Support:** ⭐⭐⭐ Good
158
+ - **Speed:** ⚑⚑⚑⚑ Fast
159
+ - **Context:** 16K tokens
160
+ - **Quality:** Good for size
161
+
162
+ **Alternative:** `microsoft/phi-4-reasoning-plus` at $0.07/M input | $0.35/M output for enhanced reasoning.
163
+
164
+ ---
165
+
166
+ ## Tool Calling Accuracy Rankings
167
+
168
+ Based on OpenRouter's official benchmarks:
169
+
170
+ | Rank | Model | Accuracy | Notes |
171
+ |------|-------|----------|-------|
172
+ | πŸ₯‡ 1 | GPT-5 | 99.7% | Highest accuracy (expensive) |
173
+ | πŸ₯ˆ 2 | Claude 4.1 Opus | 99.5% | Near-perfect (expensive) |
174
+ | πŸ† | Gemini 2.5 Flash | - | Most popular (5M+ requests/week) |
175
+
176
+ **Key Insight:** While GPT-5 and Claude 4.1 Opus lead in accuracy, Gemini 2.5 Flash's popularity suggests excellent real-world performance at much lower cost.
177
+
178
+ ---
179
+
180
+ ## Cost Comparison Table
181
+
182
+ | Model | Input $/M | Output $/M | Total $/M (50/50) | Free Tier |
183
+ |-------|-----------|------------|-------------------|-----------|
184
+ | Mistral Small 3.1 | $0.02 | $0.04 | $0.03 | ❌ |
185
+ | Command R7B | $0.038 | $0.15 | $0.094 | ❌ |
186
+ | Qwen Turbo | $0.05 | $0.20 | $0.125 | ❌ |
187
+ | DeepSeek V3 0324 | $0.00 | $0.00 | $0.00 | βœ… FREE |
188
+ | Gemini 2.0 Flash | $0.00 | $0.00 | $0.00 | βœ… FREE |
189
+ | Llama 3.3 70B | $0.00 | $0.00 | $0.00 | βœ… FREE |
190
+ | DeepSeek Chat (paid) | $0.23 | $0.90 | $0.565 | ❌ |
191
+ | Phi-4 | $0.07 | $0.14 | $0.105 | ❌ |
192
+
193
+ *Note: "Total $/M (50/50)" assumes equal input/output token usage*
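The 50/50 figure is simply the average of the input and output rates. A quick sketch of the calculation, using Mistral Small 3.1's prices from the table:

```shell
# Blended $/M price assuming an even 50/50 input/output token split.
input_price=0.02    # $/M input tokens (Mistral Small 3.1)
output_price=0.04   # $/M output tokens
blended=$(awk -v i="$input_price" -v o="$output_price" 'BEGIN { printf "%.3f", (i + o) / 2 }')
echo "$blended"
```

Adjust the split to match your real workload; agent tasks are often input-heavy because tool definitions and context count as input tokens.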
194
+
195
+ ---
196
+
197
+ ## OpenRouter-Specific Tips
198
+
199
+ ### 1. Use Model Suffixes for Optimization
200
+
201
+ **`:free` suffix** - Access free tier versions:
202
+ ```
203
+ google/gemini-2.0-flash-exp:free
204
+ meta-llama/llama-3.3-70b-instruct:free
205
+ deepseek/deepseek-chat-v3-0324:free
206
+ ```
207
+
208
+ **`:floor` suffix** - Get cheapest provider:
209
+ ```
210
+ deepseek/deepseek-chat:floor
211
+ ```
212
+ This automatically routes to the cheapest available provider for that model.
213
+
214
+ **`:nitro` suffix** - Get fastest throughput:
215
+ ```
216
+ anthropic/claude-3.5-sonnet:nitro
217
+ ```
218
+
219
+ ### 2. Filter for Tool Support
220
+
221
+ Visit: `https://openrouter.ai/models?supported_parameters=tools`
222
+
223
+ This shows only models with verified tool/function calling support.
224
+
225
+ ### 3. No Extra Charges for Tool Calling
226
+
227
+ OpenRouter charges based on token usage only. Tool calling doesn't incur additional fees - you only pay for:
228
+ - Input tokens (your prompts + tool definitions)
229
+ - Output tokens (model responses + tool calls)
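To make the billing concrete, here is a sketch of a tools-enabled request against OpenRouter's OpenAI-compatible chat completions endpoint. The `list_files` tool schema is a made-up example; the point is that everything in the `tools` array is serialized into the prompt and billed as ordinary input tokens.

```shell
# Hypothetical tool definition: its JSON schema counts toward input tokens.
payload='{
  "model": "mistralai/mistral-small-3.1-24b",
  "messages": [{"role": "user", "content": "What files changed today?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "list_files",
      "description": "List files in a directory",
      "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"]
      }
    }
  }]
}'
# Send it (requires a valid OPENROUTER_API_KEY):
# curl -s https://openrouter.ai/api/v1/chat/completions \
#   -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

Trimming verbose tool descriptions is therefore a direct way to cut input costs.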
230
+
231
+ ### 4. Automatic Prompt Caching
232
+
233
+ Some models (like DeepSeek) have automatic prompt caching:
234
+ - No configuration needed
235
+ - Reduces costs for repeated queries
236
+ - Speeds up responses
237
+
238
+ ### 5. Free Tier Rate Limits
239
+
240
+ For models with `:free` suffix:
241
+ - **20 requests per minute** (all free models)
242
+ - **50 requests per day** if account balance < $10
243
+ - **Unlimited daily requests** if account balance β‰₯ $10
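A simple way to stay under the per-minute cap is to pace requests: 20 requests/minute works out to one request every 3 seconds. A minimal sketch (the real agentic-flow call is commented out; `echo` stands in for it):

```shell
# Pace a batch of requests to a :free model at 20 req/min (1 every 3s).
min_interval=3   # seconds: 60s / 20 requests
sent=0
for i in 1 2 3; do
  # npx agentic-flow --agent researcher --task "chunk $i" \
  #   --provider openrouter --model "google/gemini-2.0-flash-exp:free"
  echo "request $i sent"
  sent=$((sent + 1))
  if [ "$i" -lt 3 ]; then sleep "$min_interval"; fi
done
```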
244
+
245
+ ### 6. OpenRouter Fees
246
+
247
+ - **5.5% fee** ($0.80 minimum) when purchasing credits
248
+ - **No markup** on model provider pricing
249
+ - Pay-as-you-go credit system
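The purchase fee is 5.5% of the amount with a $0.80 floor, so the floor dominates for small top-ups. A quick sketch:

```shell
# Credit-purchase fee: max(5.5% of amount, $0.80).
amount=10
fee=$(awk -v a="$amount" 'BEGIN { f = a * 0.055; if (f < 0.80) f = 0.80; printf "%.2f", f }')
echo "Buying \$${amount} in credits incurs a \$${fee} fee"
```

At $10 the 5.5% fee ($0.55) is below the floor, so you pay $0.80; above roughly $14.55 the percentage takes over.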
250
+
251
+ ---
252
+
253
+ ## Use Case Recommendations
254
+
255
+ ### For Development & Testing
256
+ **Recommendation:** `google/gemini-2.0-flash-exp:free`
257
+ - Free tier with generous limits
258
+ - Excellent tool calling
259
+ - Fast responses
260
+ - No cost during development
261
+
262
+ ### For Budget Production Deployments
263
+ **Recommendation:** `mistralai/mistral-small-3.1-24b`
264
+ - Best cost/performance ratio ($0.02/$0.04)
265
+ - Optimized for tool calling
266
+ - Low latency
267
+ - Reliable quality
268
+
269
+ ### For Maximum Savings
270
+ **Recommendation:** `cohere/command-r7b-12-2024`
271
+ - Cheapest paid option ($0.038/$0.15)
272
+ - Excellent agent capabilities
273
+ - Very fast (7B params)
274
+ - Strong tool use support
275
+
276
+ ### For Large Context Needs
277
+ **Recommendation:** `qwen/qwen-turbo`
278
+ - 1M context window
279
+ - Low cost ($0.05/$0.20)
280
+ - Fast responses
281
+ - Good tool support
282
+
283
+ ### For High-Quality Reasoning
284
+ **Recommendation:** `deepseek/deepseek-chat`
285
+ - FREE option available (v3-0324)
286
+ - Strong reasoning capabilities
287
+ - Good for complex workflows
288
+ - Automatic caching
289
+
290
+ ### For Multilingual Projects
291
+ **Recommendation:** `deepseek/deepseek-chat` or `qwen/qwen-turbo`
292
+ - Chinese models with excellent multilingual support
293
+ - Good tool calling in multiple languages
294
+ - Cost-effective
295
+
296
+ ---
297
+
298
+ ## Implementation Example
299
+
300
+ Here's how to use these models with agentic-flow:
301
+
302
+ ```bash
303
+ # Using Mistral Small 3.1 (Best Value)
304
+ agentic-flow --agent coder \
305
+ --task "Create a REST API with authentication" \
306
+ --provider openrouter \
307
+ --model "mistralai/mistral-small-3.1-24b"
308
+
309
+ # Using free Gemini (Development)
310
+ agentic-flow --agent researcher \
311
+ --task "Analyze this codebase structure" \
312
+ --provider openrouter \
313
+ --model "google/gemini-2.0-flash-exp:free"
314
+
315
+ # Using DeepSeek (Free Tier)
316
+ agentic-flow --agent analyst \
317
+ --task "Review code quality" \
318
+ --provider openrouter \
319
+ --model "deepseek/deepseek-chat-v3-0324:free"
320
+
321
+ # Using floor routing (Cheapest)
322
+ agentic-flow --agent optimizer \
323
+ --task "Optimize database queries" \
324
+ --provider openrouter \
325
+ --model "deepseek/deepseek-chat:floor"
326
+ ```
327
+
328
+ ---
329
+
330
+ ## Key Research Findings
331
+
332
+ 1. **No Extra Tool Calling Fees:** OpenRouter charges only for tokens, not for tool usage
333
+ 2. **Free Tier Available:** Multiple high-quality FREE models with tool support
334
+ 3. **Cost Range:** From $0 (free) to $0.90/M output tokens
335
+ 4. **Quality Trade-offs:** Even cheapest models (Mistral Small 3.1) offer excellent tool calling
336
+ 5. **Speed Leaders:** Qwen Turbo, Gemini 2.0 Flash, Command R7B are fastest
337
+ 6. **Popularity != Accuracy:** Gemini 2.5 Flash is the most used despite GPT-5 and Claude 4.1 Opus leading in accuracy
338
+ 7. **Chinese Models Competitive:** DeepSeek and Qwen offer excellent value and capabilities
339
+ 8. **Free Options Viable:** Free tier models are production-ready for many use cases
340
+
341
+ ---
342
+
343
+ ## Migration Path
344
+
345
+ ### From Anthropic Claude
346
+ 1. **Development:** Switch to `google/gemini-2.0-flash-exp:free`
347
+ 2. **Production:** Switch to `mistralai/mistral-small-3.1-24b`
348
+ 3. **Savings:** ~99% cost reduction (Claude Sonnet: $3/$15 vs Mistral: $0.02/$0.04)
349
+
350
+ ### From OpenAI GPT-4
351
+ 1. **Development:** Switch to `deepseek/deepseek-chat-v3-0324:free`
352
+ 2. **Production:** Switch to `cohere/command-r7b-12-2024`
353
+ 3. **Savings:** ~99% cost reduction (GPT-4: $30/$60 vs Command R7B: $0.038/$0.15)
354
+
355
+ ---
356
+
357
+ ## Monitoring & Optimization
358
+
359
+ ### Track Your Usage
360
+ OpenRouter provides detailed analytics:
361
+ - Token usage per model
362
+ - Cost breakdown
363
+ - Response times
364
+ - Error rates
365
+
366
+ ### A/B Testing Recommended
367
+ Test these models with your actual workload:
368
+ 1. Start with free tier (Gemini/DeepSeek)
369
+ 2. Compare with Mistral Small 3.1
370
+ 3. Measure: accuracy, speed, cost
371
+ 4. Choose based on your requirements
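The steps above can be sketched as a small harness that prints one agentic-flow command per candidate model (IDs taken from the tables above). Run the printed commands, then compare output quality, wall-clock time, and per-model cost in the OpenRouter dashboard:

```shell
# Build one benchmark command per candidate model for the same task.
task="Write a Python function that reverses a string"
count=0
for model in \
  "google/gemini-2.0-flash-exp:free" \
  "deepseek/deepseek-chat-v3-0324:free" \
  "mistralai/mistral-small-3.1-24b"; do
  echo "npx agentic-flow --agent coder --task \"$task\" --provider openrouter --model \"$model\""
  count=$((count + 1))
done
```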
372
+
373
+ ### Cost Optimization Tips
374
+ 1. Use `:floor` suffix for automatic cheapest routing
375
+ 2. Enable prompt caching where available
376
+ 3. Batch requests when possible
377
+ 4. Use free tier for non-critical workloads
378
+ 5. Monitor and adjust based on actual usage patterns
379
+
380
+ ---
381
+
382
+ ## Conclusion
383
+
384
+ For **Claude Code tool use** on OpenRouter, the clear winners are:
385
+
386
+ **πŸ† Best Overall Value:** `mistralai/mistral-small-3.1-24b`
387
+ - Optimized for tool calling at unbeatable pricing
388
+
389
+ **πŸ†“ Best Free Option:** `google/gemini-2.0-flash-exp:free`
390
+ - Production-ready free tier with excellent capabilities
391
+
392
+ **πŸ’° Maximum Savings:** `cohere/command-r7b-12-2024`
393
+ - Cheapest paid option with strong performance
394
+
395
+ All three models offer excellent tool calling support, fast responses, and high-quality outputs suitable for production Claude Code deployments.
396
+
397
+ ---
398
+
399
+ ## Additional Resources
400
+
401
+ - **OpenRouter Models Page:** https://openrouter.ai/models
402
+ - **Tool Calling Docs:** https://openrouter.ai/docs/features/tool-calling
403
+ - **Filter by Tools:** https://openrouter.ai/models?supported_parameters=tools
404
+ - **OpenRouter Discord:** For community support and updates
405
+ - **Model Rankings:** https://openrouter.ai/rankings
406
+
407
+ ---
408
+
409
+ **Research Conducted By:** Claude Code Research Agent
410
+ **Last Updated:** October 6, 2025
411
+ **Methodology:** Web research, documentation review, pricing analysis, benchmark comparison
@@ -0,0 +1,113 @@
1
+ # OpenRouter Models Quick Reference for Claude Code
2
+
3
+ ## Top 5 Models for Tool/Function Calling
4
+
5
+ ### πŸ₯‡ 1. Mistral Small 3.1 24B - BEST VALUE
6
+ ```bash
7
+ Model: mistralai/mistral-small-3.1-24b
8
+ Cost: $0.02/M input | $0.04/M output
9
+ Speed: ⚑⚑⚑⚑ Fast
10
+ Tool Support: ⭐⭐⭐⭐⭐ Excellent
11
+ ```
12
+ **Use for:** Production deployments - best cost/performance ratio
13
+
14
+ ---
15
+
16
+ ### πŸ₯ˆ 2. Cohere Command R7B - CHEAPEST PAID
17
+ ```bash
18
+ Model: cohere/command-r7b-12-2024
19
+ Cost: $0.038/M input | $0.15/M output
20
+ Speed: ⚑⚑⚑⚑⚑ Very Fast
21
+ Tool Support: ⭐⭐⭐⭐⭐ Excellent
22
+ ```
23
+ **Use for:** Budget-conscious deployments with agent workflows
24
+
25
+ ---
26
+
27
+ ### πŸ₯‰ 3. Qwen Turbo - LARGE CONTEXT
28
+ ```bash
29
+ Model: qwen/qwen-turbo
30
+ Cost: $0.05/M input | $0.20/M output
31
+ Speed: ⚑⚑⚑⚑⚑ Very Fast
32
+ Tool Support: ⭐⭐⭐⭐ Good
33
+ Context: 1M tokens
34
+ ```
35
+ **Use for:** Projects needing massive context windows
36
+
37
+ ---
38
+
39
+ ### πŸ†“ 4. DeepSeek V3 0324 - FREE
40
+ ```bash
41
+ Model: deepseek/deepseek-chat-v3-0324:free
42
+ Cost: $0.00 (FREE!)
43
+ Speed: ⚑⚑⚑⚑ Fast
44
+ Tool Support: ⭐⭐⭐⭐ Good
45
+ ```
46
+ **Use for:** Development, testing, cost-sensitive production
47
+
48
+ ---
49
+
50
+ ### ⭐ 5. Gemini 2.0 Flash - FREE (MOST POPULAR)
51
+ ```bash
52
+ Model: google/gemini-2.0-flash-exp:free
53
+ Cost: $0.00 (FREE!)
54
+ Speed: ⚑⚑⚑⚑⚑ Very Fast
55
+ Tool Support: ⭐⭐⭐⭐⭐ Excellent
56
+ Limits: 20 req/min, 50/day if <$10 credits
57
+ ```
58
+ **Use for:** Development, testing, low-volume production
59
+
60
+ ---
61
+
62
+ ## Quick Command Examples
63
+
64
+ ```bash
65
+ # Best value - Mistral Small 3.1
66
+ agentic-flow --agent coder --task "..." --provider openrouter \
67
+ --model "mistralai/mistral-small-3.1-24b"
68
+
69
+ # Free tier - Gemini
70
+ agentic-flow --agent researcher --task "..." --provider openrouter \
71
+ --model "google/gemini-2.0-flash-exp:free"
72
+
73
+ # Cheapest provider auto-routing
74
+ agentic-flow --agent optimizer --task "..." --provider openrouter \
75
+ --model "deepseek/deepseek-chat:floor"
76
+ ```
77
+
78
+ ---
79
+
80
+ ## Cost Comparison (per Million Tokens)
81
+
82
+ | Model | Input | Output | 50/50 Mix |
83
+ |-------|-------|--------|-----------|
84
+ | Mistral Small 3.1 | $0.02 | $0.04 | $0.03 |
85
+ | Command R7B | $0.038 | $0.15 | $0.094 |
86
+ | Qwen Turbo | $0.05 | $0.20 | $0.125 |
87
+ | DeepSeek FREE | $0.00 | $0.00 | $0.00 |
88
+ | Gemini FREE | $0.00 | $0.00 | $0.00 |
89
+
90
+ ---
91
+
92
+ ## Pro Tips
93
+
94
+ 1. **Use `:free` suffix** for free models
95
+ 2. **Use `:floor` suffix** for cheapest provider
96
+ 3. **Filter models:** https://openrouter.ai/models?supported_parameters=tools
97
+ 4. **No extra fees** for tool calling - only token usage
98
+ 5. **Free tier limits:** 20 req/min, 50/day (unlimited with $10+ balance)
99
+
100
+ ---
101
+
102
+ ## When to Use Which Model
103
+
104
+ - **Development/Testing:** Gemini 2.0 Flash Free
105
+ - **Production (Budget):** Mistral Small 3.1 24B
106
+ - **Production (Cheapest):** Command R7B
107
+ - **Large Context:** Qwen Turbo
108
+ - **Complex Reasoning:** DeepSeek Chat
109
+ - **Maximum Savings:** DeepSeek V3 0324 Free
110
+
111
+ ---
112
+
113
+ Full research report: `/workspaces/agentic-flow/agentic-flow/.claude/openrouter-models-research.md`
package/README.md CHANGED
@@ -12,14 +12,22 @@
12
12
 
13
13
  ## πŸ“– Introduction
14
14
 
15
- Agentic Flow is a framework for running AI agents at scale with intelligent cost optimization. It runs any Claude Code agent through the [Claude Agent SDK](https://docs.claude.com/en/api/agent-sdk), automatically routing tasks to the cheapest model that meets quality requirements.
15
+ I built Agentic Flow to easily switch between alternative low-cost AI models in Claude Code/Agent SDK. For those comfortable using Claude agents and commands, it lets you take what you've created and deploy fully hosted agents for real business purposes. Use Claude Code to get the agent working, then deploy it in your favorite cloud.
16
+
17
+ Agentic Flow runs Claude Code agents at near zero cost without rewriting a thing. The built-in model optimizer automatically routes every task to the cheapest option that meets your quality requirementsβ€”free local models for privacy, OpenRouter for 99% cost savings, Gemini for speed, or Anthropic when quality matters most. It analyzes each task and selects the optimal model from 27+ options with a single flag, reducing API costs dramatically compared to using Claude exclusively.
18
+
19
+ The system spawns specialized agents on demand through Claude Code's Task tool and MCP coordination. It orchestrates swarms of 66+ pre-built agents (researchers, coders, reviewers, testers, architects) that work in parallel, coordinate through shared memory, and auto-scale based on workload. Transparent OpenRouter and Gemini proxies translate Anthropic API calls automaticallyβ€”no code changes needed. Local models run direct without proxies for maximum privacy. Switch providers with environment variables, not refactoring.
20
+
21
+ Extending agent capabilities is effortless. Add custom tools and integrations through the CLIβ€”weather data, databases, search engines, or any external serviceβ€”without touching config files. Your agents instantly gain new abilities across all projects. Every tool you add becomes available to the entire agent ecosystem automatically, and all operations are logged with full traceability for auditing, debugging, and compliance. This means your agents can connect to proprietary systems, third-party APIs, or internal tools in seconds, not hours.
22
+
23
+ Define routing rules through flexible policy modes: Strict mode keeps sensitive data offline, Economy mode prefers free models (99% savings), Premium mode uses Anthropic for highest quality, or create custom cost/quality thresholds. The policy defines the rules; the swarm enforces them automatically. Runs local for development, Docker for CI/CD, or Flow Nexus cloud for production scale. Agentic Flow is the framework for autonomous efficiencyβ€”one unified runner for every Claude Code agent, self-tuning, self-routing, and built for real-world deployment.
16
24
 
17
25
  **Key Capabilities:**
26
+ - βœ… **Claude Code Mode** - Run Claude Code with OpenRouter/Gemini/ONNX (85-99% savings)
18
27
  - βœ… **66 Specialized Agents** - Pre-built experts for coding, research, review, testing, DevOps
19
28
  - βœ… **213 MCP Tools** - Memory, GitHub, neural networks, sandboxes, workflows, payments
20
29
  - βœ… **Multi-Model Router** - Anthropic, OpenRouter (100+ models), Gemini, ONNX (free local)
21
- - βœ… **Cost Optimization** - 85-99% savings with DeepSeek, Llama, Gemini vs Claude
22
- - βœ… **Standalone Proxy** - Use Gemini/OpenRouter with Claude Code at 85% cost savings
30
+ - βœ… **Cost Optimization** - DeepSeek at $0.14/M tokens vs Claude at $15/M (99% savings)
23
31
 
24
32
  **Built On:**
25
33
  - [Claude Agent SDK](https://docs.claude.com/en/api/agent-sdk) by Anthropic
@@ -112,28 +120,67 @@ npm run mcp:stdio
112
120
 
113
121
  ---
114
122
 
115
- ### Option 3: Claude Code Integration (NEW in v1.1.13)
123
+ ### Option 3: Claude Code Mode (v1.2.3+)
124
+
125
+ **Run Claude Code with alternative AI providers - 85-99% cost savings!**
116
126
 
117
- **Auto-start proxy + spawn Claude Code with one command:**
127
+ Automatically spawns Claude Code with proxy configuration for OpenRouter, Gemini, or ONNX models:
118
128
 
119
129
  ```bash
120
- # OpenRouter (99% cost savings)
121
- npx agentic-flow claude-code --provider openrouter "Write a Python function"
130
+ # Interactive mode - Opens Claude Code UI with proxy
131
+ npx agentic-flow claude-code --provider openrouter
132
+ npx agentic-flow claude-code --provider gemini
133
+
134
+ # Non-interactive mode - Execute task and exit
135
+ npx agentic-flow claude-code --provider openrouter "Write a Python hello world function"
136
+ npx agentic-flow claude-code --provider openrouter --model "deepseek/deepseek-chat" "Create REST API"
122
137
 
123
- # Gemini (FREE tier)
124
- npx agentic-flow claude-code --provider gemini "Create a REST API"
138
+ # Use specific models
139
+ npx agentic-flow claude-code --provider openrouter --model "mistralai/mistral-small"
140
+ npx agentic-flow claude-code --provider gemini --model "gemini-2.0-flash-exp"
125
141
 
126
- # Anthropic (direct, no proxy)
127
- npx agentic-flow claude-code --provider anthropic "Help me debug"
142
+ # Local ONNX models (100% free, privacy-focused)
143
+ npx agentic-flow claude-code --provider onnx "Analyze this codebase"
128
144
  ```
129
145
 
146
+ **Recommended Models:**
147
+
148
+ | Provider | Model | Cost/M Tokens | Context | Best For |
149
+ |----------|-------|---------------|---------|----------|
150
+ | OpenRouter | `deepseek/deepseek-chat` (default) | $0.14 | 128k | General tasks, best value |
151
+ | OpenRouter | `anthropic/claude-3.5-sonnet` | $3.00 | 200k | Highest quality, complex reasoning |
152
+ | OpenRouter | `google/gemini-2.0-flash-exp:free` | FREE | 1M | Development, testing (rate limited) |
153
+ | Gemini | `gemini-2.0-flash-exp` | FREE | 1M | Fast responses, rate limited |
154
+ | ONNX | `phi-4-mini-instruct` | FREE | 128k | Privacy, offline, no API needed |
155
+
156
+ ⚠️ **Note:** Claude Code sends 35k+ tokens in tool definitions. Models with <128k context (like Mistral Small at 32k) will fail with "context length exceeded" errors.
157
+
130
158
  **How it works:**
131
- 1. βœ… Auto-detects if proxy is running
132
- 2. βœ… Auto-starts proxy if needed (background)
133
- 3. βœ… Sets `ANTHROPIC_BASE_URL` to proxy endpoint
134
- 4. βœ… Configures provider-specific API keys
135
- 5. βœ… Spawns Claude Code with environment configured
136
- 6. βœ… Cleans up proxy on exit (optional)
159
+ 1. βœ… Auto-starts proxy server in background (OpenRouter/Gemini/ONNX)
160
+ 2. βœ… Sets `ANTHROPIC_BASE_URL` to proxy endpoint
161
+ 3. βœ… Configures provider-specific API keys transparently
162
+ 4. βœ… Spawns Claude Code with environment configured
163
+ 5. βœ… All Claude SDK features work (tools, memory, MCP, etc.)
164
+ 6. βœ… Automatic cleanup on exit
165
+
166
+ **Environment Setup:**
167
+
168
+ ```bash
169
+ # OpenRouter (100+ models at 85-99% savings)
170
+ export OPENROUTER_API_KEY=sk-or-v1-...
171
+
172
+ # Gemini (FREE tier available)
173
+ export GOOGLE_GEMINI_API_KEY=AIza...
174
+
175
+ # ONNX (local models, no API key needed)
176
+ # export ONNX_MODEL_PATH=/path/to/models # Optional
177
+ ```
178
+
179
+ **Full Help:**
180
+
181
+ ```bash
182
+ npx agentic-flow claude-code --help
183
+ ```
137
184
 
138
185
  **Alternative: Manual Proxy (v1.1.11)**
139
186
 
@@ -370,9 +417,9 @@ node dist/mcp/fastmcp/servers/http-sse.js
370
417
  - **stdio**: Claude Desktop, Cursor IDE, command-line tools
371
418
  - **HTTP/SSE**: Web apps, browser extensions, REST APIs, mobile apps
372
419
 
373
- ### Add Custom MCP Servers (No Code Required)
420
+ ### Add Custom MCP Servers (No Code Required) ✨ NEW in v1.2.1
374
421
 
375
- Add your own MCP servers via CLI without editing code:
422
+ Add your own MCP servers via CLI without editing codeβ€”extends agent capabilities in seconds:
376
423
 
377
424
  ```bash
378
425
  # Add MCP server (Claude Desktop style JSON config)
@@ -393,6 +440,13 @@ npx agentic-flow mcp disable weather
393
440
 
394
441
  # Remove server
395
442
  npx agentic-flow mcp remove weather
443
+
444
+ # Test server configuration
445
+ npx agentic-flow mcp test weather
446
+
447
+ # Export/import configurations
448
+ npx agentic-flow mcp export ./mcp-backup.json
449
+ npx agentic-flow mcp import ./mcp-backup.json
396
450
  ```
397
451
 
398
452
  **Configuration stored in:** `~/.agentic-flow/mcp-config.json`
@@ -411,6 +465,13 @@ npx agentic-flow --agent researcher --task "Get weather forecast for Tokyo"
411
465
  - `weather-mcp` - Weather data
412
466
  - `database-mcp` - Database operations
413
467
 
468
+ **v1.2.1 Improvements:**
469
+ - βœ… CLI routing fixed - `mcp add/list/remove` commands now work correctly
470
+ - βœ… Model optimizer filters models without tool support automatically
471
+ - βœ… Full compatibility with Claude Desktop config format
472
+ - βœ… Test command for validating server configurations
473
+ - βœ… Export/import for backing up and sharing configurations
474
+
414
475
  **Documentation:** See [docs/guides/ADDING-MCP-SERVERS-CLI.md](docs/guides/ADDING-MCP-SERVERS-CLI.md) for complete guide.
415
476
 
416
477
  ### Using MCP Tools in Agents
@@ -85,11 +85,13 @@ export async function claudeAgent(agent, input, onStream, modelOverride) {
85
85
  });
86
86
  }
87
87
  else if (provider === 'onnx') {
88
- // For ONNX: Local inference (TODO: implement ONNX proxy)
89
- envOverrides.ANTHROPIC_API_KEY = 'local';
90
- if (modelConfig.baseURL) {
91
- envOverrides.ANTHROPIC_BASE_URL = modelConfig.baseURL;
92
- }
88
+ // For ONNX: Use ANTHROPIC_BASE_URL if already set by CLI (proxy mode)
89
+ envOverrides.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY || 'sk-ant-onnx-local-key';
90
+ envOverrides.ANTHROPIC_BASE_URL = process.env.ANTHROPIC_BASE_URL || process.env.ONNX_PROXY_URL || 'http://localhost:3001';
91
+ logger.info('Using ONNX local proxy', {
92
+ proxyUrl: envOverrides.ANTHROPIC_BASE_URL,
93
+ model: finalModel
94
+ });
93
95
  }
94
96
  // For Anthropic provider, use existing ANTHROPIC_API_KEY (no proxy needed)
95
97
  logger.info('Multi-provider configuration', {