agentic-flow 1.2.1 → 1.2.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/answer.md +1 -0
- package/.claude/openrouter-models-research.md +411 -0
- package/.claude/openrouter-quick-reference.md +113 -0
- package/README.md +80 -19
- package/dist/agents/claudeAgent.js +7 -5
- package/dist/cli/claude-code-wrapper.js +121 -72
- package/dist/cli-proxy.js +52 -4
- package/dist/proxy/anthropic-to-onnx.js +213 -0
- package/docs/.claude-flow/metrics/performance.json +1 -1
- package/docs/.claude-flow/metrics/task-metrics.json +3 -3
- package/docs/ONNX-PROXY-IMPLEMENTATION.md +254 -0
- package/docs/guides/PROXY-ARCHITECTURE-AND-EXTENSION.md +708 -0
- package/package.json +2 -2
package/.claude/answer.md
ADDED

@@ -0,0 +1 @@
+A program walks into a bar and orders a beer. As it is waiting for its drink, it hears a guy next to it say, 'Wow, the bartender can brew beer in just 5 minutes!' The program turns to the man and says, 'I don't know, I'm still trying to debug my couple of weeks old code and I still can't tell what it's doing. A 5 minute beer?
package/.claude/openrouter-models-research.md
ADDED

@@ -0,0 +1,411 @@
+# Best OpenRouter Models for Claude Code Tool Use
+
+**Research Date:** October 6, 2025
+**Research Focus:** Models supporting tool/function calling that are cheap, fast, and high-quality
+
+---
+
+## Executive Summary
+
+This research identifies the top 5 OpenRouter models optimized for Claude Code's tool calling requirements, balancing cost-effectiveness, speed, and quality. **Mistral Small 3.1 24B** emerges as the best overall value at $0.02/$0.04 per million tokens, while several FREE options are available including DeepSeek V3 0324 and Gemini 2.0 Flash.
+
+---
+
+## Top 5 Recommended Models
+
+### 🥇 1. Mistral Small 3.1 24B
+**Model ID:** `mistralai/mistral-small-3.1-24b`
+
+- **Cost:** $0.02/M input tokens | $0.04/M output tokens
+- **Tool Support:** ★★★★★ Excellent (optimized for function calling)
+- **Speed:** ⚡⚡⚡⚡ Fast (low-latency)
+- **Context:** 128K tokens
+- **Quality:** High
+
+**Why Choose This:**
+- Specifically optimized for function calling APIs and JSON-structured outputs
+- Best cost-to-performance ratio for tool use
+- Low-latency responses ideal for interactive Claude Code workflows
+- Excellent at structured outputs and tool implementation
+
+**Best For:** Production Claude Code deployments requiring reliable, fast tool calling at minimal cost.
+
+---
+
+### 🥈 2. Cohere Command R7B (12-2024)
+**Model ID:** `cohere/command-r7b-12-2024`
+
+- **Cost:** $0.038/M input tokens | $0.15/M output tokens
+- **Tool Support:** ★★★★★ Excellent
+- **Speed:** ⚡⚡⚡⚡⚡ Very Fast
+- **Context:** 128K tokens
+- **Quality:** High
+
+**Why Choose This:**
+- Cheapest overall option among premium tool-calling models
+- Excels at RAG, tool use, agents, and complex reasoning
+- 7B parameter model - very efficient and fast
+- Updated December 2024 with latest improvements
+
+**Best For:** Budget-conscious deployments needing excellent tool calling and agent capabilities.
+
+---
+
+### 🥉 3. Qwen Turbo
+**Model ID:** `qwen/qwen-turbo`
+
+- **Cost:** $0.05/M input tokens | $0.20/M output tokens
+- **Tool Support:** ★★★★ Good
+- **Speed:** ⚡⚡⚡⚡⚡ Very Fast (turbo-optimized)
+- **Context:** 1M tokens (!)
+- **Quality:** Good
+
+**Why Choose This:**
+- Massive 1M context window at budget pricing
+- Very fast response times
+- Good tool calling support
+- Cached tokens at $0.02/M for repeated queries
+
+**Notes:**
+- Model is deprecated (Alibaba recommends Qwen-Flash)
+- Still available and functional on OpenRouter
+- Consider `qwen/qwen-flash` as alternative
+
+**Best For:** Projects needing large context windows with tool calling at low cost.
+
+---
+
+### 4. DeepSeek Chat
+**Model ID:** `deepseek/deepseek-chat`
+
+- **Cost:** $0.23/M input tokens | $0.90/M output tokens
+- **Tool Support:** ★★★★ Good
+- **Speed:** ⚡⚡⚡⚡ Fast
+- **Context:** 131K tokens
+- **Quality:** Very High
+
+**Special Note:**
+**DeepSeek V3 0324 is available COMPLETELY FREE on OpenRouter!**
+- Model ID: `deepseek/deepseek-chat-v3-0324:free`
+- Zero cost for input and output tokens
+- Unprecedented free tier offering
+
+**Why Choose This:**
+- Strong reasoning capabilities
+- Automatic prompt caching (no config needed)
+- Good agentic workflow support
+- Chinese company - excellent multilingual support
+
+**Best For:**
+- Free tier: Experimentation and development
+- Paid tier: Production deployments needing strong reasoning
+
+---
+
+### 5. Google Gemini 2.0 Flash Experimental (FREE)
+**Model ID:** `google/gemini-2.0-flash-exp:free`
+
+- **Cost:** $0.00 (FREE tier)
+- **Tool Support:** ★★★★★ Excellent (enhanced function calling)
+- **Speed:** ⚡⚡⚡⚡⚡ Very Fast
+- **Context:** 1M tokens
+- **Quality:** Very High
+
+**Free Tier Limits:**
+- 20 requests per minute
+- 50 requests per day (if account has <$10 credits)
+- No daily limit if account has $10+ credits
+
+**Why Choose This:**
+- Completely free with generous limits
+- Enhanced function calling in 2.0 version
+- Multimodal understanding capabilities
+- Strong coding performance
+- Most popular model on OpenRouter for tool calling (5M+ requests/week)
+
+**Paid Alternative:**
+- `google/gemini-2.0-flash-001`: $0.125/M input | $0.5/M output
+- `google/gemini-2.0-flash-lite-001`: $0.075/M input | $0.3/M output
+
+**Best For:** Development, testing, and low-volume production use cases.
+
+---
+
+## Honorable Mentions
+
+### Meta Llama 3.3 70B Instruct (FREE)
+**Model ID:** `meta-llama/llama-3.3-70b-instruct:free`
+
+- **Cost:** $0.00 (FREE)
+- **Tool Support:** ★★★★ Good
+- **Speed:** ⚡⚡⚡ Moderate
+- **Context:** 128K tokens
+- **Quality:** Very High
+
+**Notes:**
+- Completely free for training/development
+- 70B parameters - strong capabilities
+- Your requests may be used for training
+- Also available: `meta-llama/llama-3.3-8b-instruct:free`
+
+---
+
+### Microsoft Phi-4
+**Model ID:** `microsoft/phi-4`
+
+- **Cost:** $0.07/M input | $0.14/M output
+- **Tool Support:** ★★★ Good
+- **Speed:** ⚡⚡⚡⚡ Fast
+- **Context:** 16K tokens
+- **Quality:** Good for size
+
+**Alternative:** `microsoft/phi-4-reasoning-plus` at $0.07/M input | $0.35/M output for enhanced reasoning.
+
+---
+
+## Tool Calling Accuracy Rankings
+
+Based on OpenRouter's official benchmarks:
+
+| Rank | Model | Accuracy | Notes |
+|------|-------|----------|-------|
+| 🥇 1 | GPT-5 | 99.7% | Highest accuracy (expensive) |
+| 🥈 2 | Claude 4.1 Opus | 99.5% | Near-perfect (expensive) |
+| - | Gemini 2.5 Flash | - | Most popular (5M+ requests/week) |
+
+**Key Insight:** While GPT-5 and Claude 4.1 Opus lead in accuracy, Gemini 2.5 Flash's popularity suggests excellent real-world performance at much lower cost.
+
+---
+
+## Cost Comparison Table
+
+| Model | Input $/M | Output $/M | Total $/M (50/50) | Free Tier |
+|-------|-----------|------------|-------------------|-----------|
+| Mistral Small 3.1 | $0.02 | $0.04 | $0.03 | ❌ |
+| Command R7B | $0.038 | $0.15 | $0.094 | ❌ |
+| Qwen Turbo | $0.05 | $0.20 | $0.125 | ❌ |
+| DeepSeek V3 0324 | $0.00 | $0.00 | $0.00 | ✅ FREE |
+| Gemini 2.0 Flash | $0.00 | $0.00 | $0.00 | ✅ FREE |
+| Llama 3.3 70B | $0.00 | $0.00 | $0.00 | ✅ FREE |
+| DeepSeek Chat (paid) | $0.23 | $0.90 | $0.565 | ❌ |
+| Phi-4 | $0.07 | $0.14 | $0.105 | ❌ |
+
+*Note: "Total $/M (50/50)" assumes equal input/output token usage*
+
+---
+
+## OpenRouter-Specific Tips
+
+### 1. Use Model Suffixes for Optimization
+
+**`:free` suffix** - Access free tier versions:
+```
+google/gemini-2.0-flash-exp:free
+meta-llama/llama-3.3-70b-instruct:free
+deepseek/deepseek-chat-v3-0324:free
+```
+
+**`:floor` suffix** - Get cheapest provider:
+```
+deepseek/deepseek-chat:floor
+```
+This automatically routes to the cheapest available provider for that model.
+
+**`:nitro` suffix** - Get fastest throughput:
+```
+anthropic/claude-3.5-sonnet:nitro
+```
+
+### 2. Filter for Tool Support
+
+Visit: `https://openrouter.ai/models?supported_parameters=tools`
+
+This shows only models with verified tool/function calling support.
+
+### 3. No Extra Charges for Tool Calling
+
+OpenRouter charges based on token usage only. Tool calling doesn't incur additional fees - you only pay for:
+- Input tokens (your prompts + tool definitions)
+- Output tokens (model responses + tool calls)
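
Since tool definitions are billed as ordinary input tokens, it helps to see where they live in the request. A minimal sketch of the OpenAI-compatible chat-completions payload that OpenRouter accepts - the `get_weather` tool and its schema are hypothetical, and nothing is sent over the network here:

```python
import json

# Hypothetical tool definition. Its JSON schema is serialized into the
# request and billed as input tokens like any other prompt text.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "mistralai/mistral-small-3.1-24b",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
}

body = json.dumps(payload)
print(f"{len(body)} bytes of request body, tool schema included")
```

Trimming verbose tool descriptions is therefore a direct way to cut input-token cost on every call.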
+
+### 4. Automatic Prompt Caching
+
+Some models (like DeepSeek) have automatic prompt caching:
+- No configuration needed
+- Reduces costs for repeated queries
+- Speeds up responses
+
+### 5. Free Tier Rate Limits
+
+For models with `:free` suffix:
+- **20 requests per minute** (all free models)
+- **50 requests per day** if account balance < $10
+- **Unlimited daily requests** if account balance ≥ $10
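
To stay under the 20 requests/minute ceiling, a client-side throttle is a cheap safeguard. A minimal sliding-window sketch - the limit mirrors the free-tier numbers above; this is not an OpenRouter SDK feature, just an illustration:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""

    def __init__(self, limit: int = 20, window: float = 60.0):
        self.limit, self.window = limit, window
        self.calls: deque = deque()  # timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Sleep until the oldest call in the window expires.
            time.sleep(self.window - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

limiter = RateLimiter(limit=20, window=60.0)
# Call limiter.wait() before each request to a `:free` model.
```

Note the daily cap (50 requests below $10 balance) still applies; the limiter only smooths the per-minute burst rate.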
+
+### 6. OpenRouter Fees
+
+- **5.5% fee** ($0.80 minimum) when purchasing credits
+- **No markup** on model provider pricing
+- Pay-as-you-go credit system
+
+---
+
+## Use Case Recommendations
+
+### For Development & Testing
+**Recommendation:** `google/gemini-2.0-flash-exp:free`
+- Free tier with generous limits
+- Excellent tool calling
+- Fast responses
+- No cost during development
+
+### For Budget Production Deployments
+**Recommendation:** `mistralai/mistral-small-3.1-24b`
+- Best cost/performance ratio ($0.02/$0.04)
+- Optimized for tool calling
+- Low latency
+- Reliable quality
+
+### For Maximum Savings
+**Recommendation:** `cohere/command-r7b-12-2024`
+- Cheapest paid option ($0.038/$0.15)
+- Excellent agent capabilities
+- Very fast (7B params)
+- Strong tool use support
+
+### For Large Context Needs
+**Recommendation:** `qwen/qwen-turbo`
+- 1M context window
+- Low cost ($0.05/$0.20)
+- Fast responses
+- Good tool support
+
+### For High-Quality Reasoning
+**Recommendation:** `deepseek/deepseek-chat`
+- FREE option available (v3-0324)
+- Strong reasoning capabilities
+- Good for complex workflows
+- Automatic caching
+
+### For Multilingual Projects
+**Recommendation:** `deepseek/deepseek-chat` or `qwen/qwen-turbo`
+- Chinese models with excellent multilingual support
+- Good tool calling in multiple languages
+- Cost-effective
+
+---
+
+## Implementation Example
+
+Here's how to use these models with agentic-flow:
+
+```bash
+# Using Mistral Small 3.1 (Best Value)
+agentic-flow --agent coder \
+  --task "Create a REST API with authentication" \
+  --provider openrouter \
+  --model "mistralai/mistral-small-3.1-24b"
+
+# Using free Gemini (Development)
+agentic-flow --agent researcher \
+  --task "Analyze this codebase structure" \
+  --provider openrouter \
+  --model "google/gemini-2.0-flash-exp:free"
+
+# Using DeepSeek (Free Tier)
+agentic-flow --agent analyst \
+  --task "Review code quality" \
+  --provider openrouter \
+  --model "deepseek/deepseek-chat-v3-0324:free"
+
+# Using floor routing (Cheapest)
+agentic-flow --agent optimizer \
+  --task "Optimize database queries" \
+  --provider openrouter \
+  --model "deepseek/deepseek-chat:floor"
+```
+
+---
+
+## Key Research Findings
+
+1. **No Extra Tool Calling Fees:** OpenRouter charges only for tokens, not for tool usage
+2. **Free Tier Available:** Multiple high-quality FREE models with tool support
+3. **Cost Range:** From $0 (free) to $0.90/M output tokens
+4. **Quality Trade-offs:** Even the cheapest models (Mistral Small 3.1) offer excellent tool calling
+5. **Speed Leaders:** Qwen Turbo, Gemini 2.0 Flash, and Command R7B are fastest
+6. **Popularity != Accuracy:** Gemini 2.5 Flash is most used despite GPT-5/Claude leading accuracy
+7. **Chinese Models Competitive:** DeepSeek and Qwen offer excellent value and capabilities
+8. **Free Options Viable:** Free tier models are production-ready for many use cases
+
+---
+
+## Migration Path
+
+### From Anthropic Claude
+1. **Development:** Switch to `google/gemini-2.0-flash-exp:free`
+2. **Production:** Switch to `mistralai/mistral-small-3.1-24b`
+3. **Savings:** ~99% cost reduction (Claude Sonnet: $3/$15 vs Mistral: $0.02/$0.04)
+
+### From OpenAI GPT-4
+1. **Development:** Switch to `deepseek/deepseek-chat-v3-0324:free`
+2. **Production:** Switch to `cohere/command-r7b-12-2024`
+3. **Savings:** ~99% cost reduction (GPT-4: $30/$60 vs Command R7B: $0.038/$0.15)
+
+---
+
+## Monitoring & Optimization
+
+### Track Your Usage
+OpenRouter provides detailed analytics:
+- Token usage per model
+- Cost breakdown
+- Response times
+- Error rates
+
+### A/B Testing Recommended
+Test these models with your actual workload:
+1. Start with free tier (Gemini/DeepSeek)
+2. Compare with Mistral Small 3.1
+3. Measure: accuracy, speed, cost
+4. Choose based on your requirements
+
+### Cost Optimization Tips
+1. Use `:floor` suffix for automatic cheapest routing
+2. Enable prompt caching where available
+3. Batch requests when possible
+4. Use free tier for non-critical workloads
+5. Monitor and adjust based on actual usage patterns
+
+---
+
+## Conclusion
+
+For **Claude Code tool use** on OpenRouter, the clear winners are:
+
+**Best Overall Value:** `mistralai/mistral-small-3.1-24b`
+- Optimized for tool calling at unbeatable pricing
+
+**Best Free Option:** `google/gemini-2.0-flash-exp:free`
+- Production-ready free tier with excellent capabilities
+
+**Maximum Savings:** `cohere/command-r7b-12-2024`
+- Cheapest paid option with strong performance
+
+All three models offer excellent tool calling support, fast responses, and high-quality outputs suitable for production Claude Code deployments.
+
+---
+
+## Additional Resources
+
+- **OpenRouter Models Page:** https://openrouter.ai/models
+- **Tool Calling Docs:** https://openrouter.ai/docs/features/tool-calling
+- **Filter by Tools:** https://openrouter.ai/models?supported_parameters=tools
+- **OpenRouter Discord:** For community support and updates
+- **Model Rankings:** https://openrouter.ai/rankings
+
+---
+
+**Research Conducted By:** Claude Code Research Agent
+**Last Updated:** October 6, 2025
+**Methodology:** Web research, documentation review, pricing analysis, benchmark comparison
package/.claude/openrouter-quick-reference.md
ADDED

@@ -0,0 +1,113 @@
+# OpenRouter Models Quick Reference for Claude Code
+
+## Top 5 Models for Tool/Function Calling
+
+### 🥇 1. Mistral Small 3.1 24B - BEST VALUE
+```bash
+Model: mistralai/mistral-small-3.1-24b
+Cost: $0.02/M input | $0.04/M output
+Speed: ⚡⚡⚡⚡ Fast
+Tool Support: ★★★★★ Excellent
+```
+**Use for:** Production deployments - best cost/performance ratio
+
+---
+
+### 🥈 2. Cohere Command R7B - CHEAPEST PAID
+```bash
+Model: cohere/command-r7b-12-2024
+Cost: $0.038/M input | $0.15/M output
+Speed: ⚡⚡⚡⚡⚡ Very Fast
+Tool Support: ★★★★★ Excellent
+```
+**Use for:** Budget-conscious deployments with agent workflows
+
+---
+
+### 🥉 3. Qwen Turbo - LARGE CONTEXT
+```bash
+Model: qwen/qwen-turbo
+Cost: $0.05/M input | $0.20/M output
+Speed: ⚡⚡⚡⚡⚡ Very Fast
+Tool Support: ★★★★ Good
+Context: 1M tokens
+```
+**Use for:** Projects needing massive context windows
+
+---
+
+### 4. DeepSeek V3 0324 - FREE
+```bash
+Model: deepseek/deepseek-chat-v3-0324:free
+Cost: $0.00 (FREE!)
+Speed: ⚡⚡⚡⚡ Fast
+Tool Support: ★★★★ Good
+```
+**Use for:** Development, testing, cost-sensitive production
+
+---
+
+### 5. Gemini 2.0 Flash - FREE (MOST POPULAR)
+```bash
+Model: google/gemini-2.0-flash-exp:free
+Cost: $0.00 (FREE!)
+Speed: ⚡⚡⚡⚡⚡ Very Fast
+Tool Support: ★★★★★ Excellent
+Limits: 20 req/min, 50/day if <$10 credits
+```
+**Use for:** Development, testing, low-volume production
+
+---
+
+## Quick Command Examples
+
+```bash
+# Best value - Mistral Small 3.1
+agentic-flow --agent coder --task "..." --provider openrouter \
+  --model "mistralai/mistral-small-3.1-24b"
+
+# Free tier - Gemini
+agentic-flow --agent researcher --task "..." --provider openrouter \
+  --model "google/gemini-2.0-flash-exp:free"
+
+# Cheapest provider auto-routing
+agentic-flow --agent optimizer --task "..." --provider openrouter \
+  --model "deepseek/deepseek-chat:floor"
+```
+
+---
+
+## Cost Comparison (per Million Tokens)
+
+| Model | Input | Output | 50/50 Mix |
+|-------|-------|--------|-----------|
+| Mistral Small 3.1 | $0.02 | $0.04 | $0.03 |
+| Command R7B | $0.038 | $0.15 | $0.094 |
+| Qwen Turbo | $0.05 | $0.20 | $0.125 |
+| DeepSeek FREE | $0.00 | $0.00 | $0.00 |
+| Gemini FREE | $0.00 | $0.00 | $0.00 |
+
+---
+
+## Pro Tips
+
+1. **Use `:free` suffix** for free models
+2. **Use `:floor` suffix** for cheapest provider
+3. **Filter models:** https://openrouter.ai/models?supported_parameters=tools
+4. **No extra fees** for tool calling - only token usage
+5. **Free tier limits:** 20 req/min, 50/day (unlimited with $10+ balance)
+
+---
+
+## When to Use Which Model
+
+- **Development/Testing:** Gemini 2.0 Flash Free
+- **Production (Budget):** Mistral Small 3.1 24B
+- **Production (Cheapest):** Command R7B
+- **Large Context:** Qwen Turbo
+- **Complex Reasoning:** DeepSeek Chat
+- **Maximum Savings:** DeepSeek V3 0324 Free
+
+---
+
+Full research report: `/workspaces/agentic-flow/agentic-flow/.claude/openrouter-models-research.md`
package/README.md
CHANGED

@@ -12,14 +12,22 @@
 
 ## Introduction
 
-Agentic Flow
+I built Agentic Flow to easily switch between alternative low-cost AI models in Claude Code/Agent SDK. For those comfortable using Claude agents and commands, it lets you take what you've created and deploy fully hosted agents for real business purposes. Use Claude Code to get the agent working, then deploy it in your favorite cloud.
+
+Agentic Flow runs Claude Code agents at near zero cost without rewriting a thing. The built-in model optimizer automatically routes every task to the cheapest option that meets your quality requirements - free local models for privacy, OpenRouter for 99% cost savings, Gemini for speed, or Anthropic when quality matters most. It analyzes each task and selects the optimal model from 27+ options with a single flag, reducing API costs dramatically compared to using Claude exclusively.
+
+The system spawns specialized agents on demand through Claude Code's Task tool and MCP coordination. It orchestrates swarms of 66+ pre-built agents (researchers, coders, reviewers, testers, architects) that work in parallel, coordinate through shared memory, and auto-scale based on workload. Transparent OpenRouter and Gemini proxies translate Anthropic API calls automatically - no code changes needed. Local models run direct without proxies for maximum privacy. Switch providers with environment variables, not refactoring.
+
+Extending agent capabilities is effortless. Add custom tools and integrations through the CLI - weather data, databases, search engines, or any external service - without touching config files. Your agents instantly gain new abilities across all projects. Every tool you add becomes available to the entire agent ecosystem automatically, and all operations are logged with full traceability for auditing, debugging, and compliance. This means your agents can connect to proprietary systems, third-party APIs, or internal tools in seconds, not hours.
+
+Define routing rules through flexible policy modes: Strict mode keeps sensitive data offline, Economy mode prefers free models (99% savings), Premium mode uses Anthropic for highest quality, or create custom cost/quality thresholds. The policy defines the rules; the swarm enforces them automatically. Runs local for development, Docker for CI/CD, or Flow Nexus cloud for production scale. Agentic Flow is the framework for autonomous efficiency - one unified runner for every Claude Code agent, self-tuning, self-routing, and built for real-world deployment.
 
 **Key Capabilities:**
+- ✅ **Claude Code Mode** - Run Claude Code with OpenRouter/Gemini/ONNX (85-99% savings)
 - ✅ **66 Specialized Agents** - Pre-built experts for coding, research, review, testing, DevOps
 - ✅ **213 MCP Tools** - Memory, GitHub, neural networks, sandboxes, workflows, payments
 - ✅ **Multi-Model Router** - Anthropic, OpenRouter (100+ models), Gemini, ONNX (free local)
-- ✅ **Cost Optimization** -
-- ✅ **Standalone Proxy** - Use Gemini/OpenRouter with Claude Code at 85% cost savings
+- ✅ **Cost Optimization** - DeepSeek at $0.14/M tokens vs Claude at $15/M (99% savings)
 
 **Built On:**
 - [Claude Agent SDK](https://docs.claude.com/en/api/agent-sdk) by Anthropic

@@ -112,28 +120,67 @@ npm run mcp:stdio
 
 ---
 
-### Option 3: Claude Code
+### Option 3: Claude Code Mode (v1.2.3+)
+
+**Run Claude Code with alternative AI providers - 85-99% cost savings!**
 
-
+Automatically spawns Claude Code with proxy configuration for OpenRouter, Gemini, or ONNX models:
 
 ```bash
-#
-npx agentic-flow claude-code --provider openrouter
+# Interactive mode - Opens Claude Code UI with proxy
+npx agentic-flow claude-code --provider openrouter
+npx agentic-flow claude-code --provider gemini
+
+# Non-interactive mode - Execute task and exit
+npx agentic-flow claude-code --provider openrouter "Write a Python hello world function"
+npx agentic-flow claude-code --provider openrouter --model "deepseek/deepseek-chat" "Create REST API"
 
-#
-npx agentic-flow claude-code --provider
+# Use specific models
+npx agentic-flow claude-code --provider openrouter --model "mistralai/mistral-small"
+npx agentic-flow claude-code --provider gemini --model "gemini-2.0-flash-exp"
 
-#
-npx agentic-flow claude-code --provider
+# Local ONNX models (100% free, privacy-focused)
+npx agentic-flow claude-code --provider onnx "Analyze this codebase"
 ```
 
+**Recommended Models:**
+
+| Provider | Model | Cost/M Tokens | Context | Best For |
+|----------|-------|---------------|---------|----------|
+| OpenRouter | `deepseek/deepseek-chat` (default) | $0.14 | 128k | General tasks, best value |
+| OpenRouter | `anthropic/claude-3.5-sonnet` | $3.00 | 200k | Highest quality, complex reasoning |
+| OpenRouter | `google/gemini-2.0-flash-exp:free` | FREE | 1M | Development, testing (rate limited) |
+| Gemini | `gemini-2.0-flash-exp` | FREE | 1M | Fast responses, rate limited |
+| ONNX | `phi-4-mini-instruct` | FREE | 128k | Privacy, offline, no API needed |
+
+⚠️ **Note:** Claude Code sends 35k+ tokens in tool definitions. Models with <128k context (like Mistral Small at 32k) will fail with "context length exceeded" errors.
+
 **How it works:**
-1. ✅ Auto-
-2. ✅
-3. ✅
-4. ✅
-5. ✅
-6. ✅
+1. ✅ Auto-starts proxy server in background (OpenRouter/Gemini/ONNX)
+2. ✅ Sets `ANTHROPIC_BASE_URL` to proxy endpoint
+3. ✅ Configures provider-specific API keys transparently
+4. ✅ Spawns Claude Code with environment configured
+5. ✅ All Claude SDK features work (tools, memory, MCP, etc.)
+6. ✅ Automatic cleanup on exit
+
+**Environment Setup:**
+
+```bash
+# OpenRouter (100+ models at 85-99% savings)
+export OPENROUTER_API_KEY=sk-or-v1-...
+
+# Gemini (FREE tier available)
+export GOOGLE_GEMINI_API_KEY=AIza...
+
+# ONNX (local models, no API key needed)
+# export ONNX_MODEL_PATH=/path/to/models  # Optional
+```
+
+**Full Help:**
+
+```bash
+npx agentic-flow claude-code --help
+```
 
 **Alternative: Manual Proxy (v1.1.11)**
 

@@ -370,9 +417,9 @@ node dist/mcp/fastmcp/servers/http-sse.js
 - **stdio**: Claude Desktop, Cursor IDE, command-line tools
 - **HTTP/SSE**: Web apps, browser extensions, REST APIs, mobile apps
 
-### Add Custom MCP Servers (No Code Required)
+### Add Custom MCP Servers (No Code Required) ✨ NEW in v1.2.1
 
-Add your own MCP servers via CLI without editing code:
+Add your own MCP servers via CLI without editing code - extends agent capabilities in seconds:
 
 ```bash
 # Add MCP server (Claude Desktop style JSON config)

@@ -393,6 +440,13 @@ npx agentic-flow mcp disable weather
 
 # Remove server
 npx agentic-flow mcp remove weather
+
+# Test server configuration
+npx agentic-flow mcp test weather
+
+# Export/import configurations
+npx agentic-flow mcp export ./mcp-backup.json
+npx agentic-flow mcp import ./mcp-backup.json
 ```
 
 **Configuration stored in:** `~/.agentic-flow/mcp-config.json`

@@ -411,6 +465,13 @@ npx agentic-flow --agent researcher --task "Get weather forecast for Tokyo"
 - `weather-mcp` - Weather data
 - `database-mcp` - Database operations
 
+**v1.2.1 Improvements:**
+- ✅ CLI routing fixed - `mcp add/list/remove` commands now work correctly
+- ✅ Model optimizer filters models without tool support automatically
+- ✅ Full compatibility with Claude Desktop config format
+- ✅ Test command for validating server configurations
+- ✅ Export/import for backing up and sharing configurations
+
 **Documentation:** See [docs/guides/ADDING-MCP-SERVERS-CLI.md](docs/guides/ADDING-MCP-SERVERS-CLI.md) for complete guide.
 
 ### Using MCP Tools in Agents
package/dist/agents/claudeAgent.js
CHANGED

@@ -85,11 +85,13 @@ export async function claudeAgent(agent, input, onStream, modelOverride) {
 });
 }
 else if (provider === 'onnx') {
-// For ONNX:
-envOverrides.ANTHROPIC_API_KEY = 'local';
-
-
-
+// For ONNX: Use ANTHROPIC_BASE_URL if already set by CLI (proxy mode)
+envOverrides.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY || 'sk-ant-onnx-local-key';
+envOverrides.ANTHROPIC_BASE_URL = process.env.ANTHROPIC_BASE_URL || process.env.ONNX_PROXY_URL || 'http://localhost:3001';
+logger.info('Using ONNX local proxy', {
+proxyUrl: envOverrides.ANTHROPIC_BASE_URL,
+model: finalModel
+});
 }
 // For Anthropic provider, use existing ANTHROPIC_API_KEY (no proxy needed)
 logger.info('Multi-provider configuration', {