ai_client 0.4.3 → 0.4.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
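The models file being diffed below is a Psych-serialized array of hashes whose keys are Ruby symbols (`:id:`, `:pricing:`, and so on), with nested string keys under `:pricing:` and `:top_provider:`. A minimal sketch of loading such a file back into Ruby (the inline sample and any filename are illustrative, not taken from the gem):

```ruby
require "yaml"

# Sample in the same shape as the diffed registry file: top-level keys are
# Ruby symbols, nested pricing keys are plain strings.
sample = <<~YAML
  ---
  - :id: perplexity/r1-1776
    :context_length: 128000
    :pricing:
      prompt: '0.000002'
      completion: '0.000008'
YAML

# safe_load rejects Symbol by default; it must be explicitly permitted
# for the ":id:"-style keys to round-trip as symbols.
models = YAML.safe_load(sample, permitted_classes: [Symbol])
model  = models.first

puts model[:id]                # symbol-keyed at the top level
puts model[:pricing]["prompt"] # nested pricing keys are strings
```

Note that pricing values are quoted strings (per-token USD rates), so they survive loading without floating-point rounding.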
@@ -1,5 +1,375 @@
  ---
- - :id: qwen/qwen-turbo-2024-11-01
+ - :id: perplexity/r1-1776
+ :name: 'Perplexity: R1 1776'
+ :created: 1740004929
+ :description: |-
+ Note: As this model does not return <think> tags, thoughts will be streamed by default directly to the `content` field.
+
+ R1 1776 is a version of DeepSeek-R1 that has been post-trained to remove censorship constraints related to topics restricted by the Chinese government. The model retains its original reasoning capabilities while providing direct responses to a wider range of queries. R1 1776 is an offline chat model that does not use the perplexity search subsystem.
+
+ The model was tested on a multilingual dataset of over 1,000 examples covering sensitive topics to measure its likelihood of refusal or overly filtered responses. [Evaluation Results](https://cdn-uploads.huggingface.co/production/uploads/675c8332d01f593dc90817f5/GiN2VqC5hawUgAGJ6oHla.png) Its performance on math and reasoning benchmarks remains similar to the base R1 model. [Reasoning Performance](https://cdn-uploads.huggingface.co/production/uploads/675c8332d01f593dc90817f5/n4Z9Byqp2S7sKUvCvI40R.png)
+
+ Read more on the [Blog Post](https://perplexity.ai/hub/blog/open-sourcing-r1-1776)
+ :context_length: 128000
+ :architecture:
+ modality: text->text
+ tokenizer: DeepSeek
+ instruct_type:
+ :pricing:
+ prompt: '0.000002'
+ completion: '0.000008'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ - :id: mistralai/mistral-saba
+ :name: 'Mistral: Saba'
+ :created: 1739803239
+ :description: Mistral Saba is a 24B-parameter language model specifically designed
+ for the Middle East and South Asia, delivering accurate and contextually relevant
+ responses while maintaining efficient performance. Trained on curated regional
+ datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside
+ Arabic. This makes it a versatile option for a range of regional and multilingual
+ applications. Read more at the blog post [here](https://mistral.ai/en/news/mistral-saba)
+ :context_length: 32000
+ :architecture:
+ modality: text->text
+ tokenizer: Mistral
+ instruct_type:
+ :pricing:
+ prompt: '0.0000002'
+ completion: '0.0000006'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32000
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ - :id: cognitivecomputations/dolphin3.0-r1-mistral-24b:free
+ :name: Dolphin3.0 R1 Mistral 24B (free)
+ :created: 1739462498
+ :description: |-
+ Dolphin 3.0 R1 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
+
+ The R1 version has been trained for 3 epochs to reason using 800k reasoning traces from the Dolphin-R1 dataset.
+
+ Dolphin aims to be a general purpose reasoning instruct model, similar to the models behind ChatGPT, Claude, Gemini.
+
+ Part of the [Dolphin 3.0 Collection](https://huggingface.co/collections/cognitivecomputations/dolphin-30-677ab47f73d7ff66743979a3) Curated and trained by [Eric Hartford](https://huggingface.co/ehartford), [Ben Gitter](https://huggingface.co/bigstorm), [BlouseJury](https://huggingface.co/BlouseJury) and [Cognitive Computations](https://huggingface.co/cognitivecomputations)
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ - :id: cognitivecomputations/dolphin3.0-mistral-24b:free
+ :name: Dolphin3.0 Mistral 24B (free)
+ :created: 1739462019
+ :description: "Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned
+ models. Designed to be the ultimate general purpose local model, enabling coding,
+ math, agentic, function calling, and general use cases.\n\nDolphin aims to be
+ a general purpose instruct model, similar to the models behind ChatGPT, Claude,
+ Gemini. \n\nPart of the [Dolphin 3.0 Collection](https://huggingface.co/collections/cognitivecomputations/dolphin-30-677ab47f73d7ff66743979a3)
+ Curated and trained by [Eric Hartford](https://huggingface.co/ehartford), [Ben
+ Gitter](https://huggingface.co/bigstorm), [BlouseJury](https://huggingface.co/BlouseJury)
+ and [Cognitive Computations](https://huggingface.co/cognitivecomputations)"
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ - :id: meta-llama/llama-guard-3-8b
+ :name: Llama Guard 3 8B
+ :created: 1739401318
+ :description: |
+ Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
+
+ Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.
+ :context_length: 16384
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: none
+ :pricing:
+ prompt: '0.0000003'
+ completion: '0.0000003'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 16384
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ - :id: openai/o3-mini-high
+ :name: 'OpenAI: o3 Mini High'
+ :created: 1739372611
+ :description: "OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini)
+ with reasoning_effort set to high. \n\no3-mini is a cost-efficient language model
+ optimized for STEM reasoning tasks, particularly excelling in science, mathematics,
+ and coding. The model features three adjustable reasoning effort levels and supports
+ key developer capabilities including function calling, structured outputs, and
+ streaming, though it does not include vision processing capabilities.\n\nThe model
+ demonstrates significant improvements over its predecessor, with expert testers
+ preferring its responses 56% of the time and noting a 39% reduction in major errors
+ on complex questions. With medium reasoning effort settings, o3-mini matches the
+ performance of the larger o1 model on challenging reasoning evaluations like AIME
+ and GPQA, while maintaining lower latency and cost."
+ :context_length: 200000
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0.0000011'
+ completion: '0.0000044'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 200000
+ max_completion_tokens: 100000
+ is_moderated: true
+ :per_request_limits:
+ - :id: allenai/llama-3.1-tulu-3-405b
+ :name: Llama 3.1 Tulu 3 405B
+ :created: 1739053421
+ :description: Tülu 3 405B is the largest model in the Tülu 3 family, applying fully
+ open post-training recipes at a 405B parameter scale. Built on the Llama 3.1 405B
+ base, it leverages Reinforcement Learning with Verifiable Rewards (RLVR) to enhance
+ instruction following, MATH, GSM8K, and IFEval performance. As part of Tülu 3’s
+ fully open-source approach, it offers state-of-the-art capabilities while surpassing
+ prior open-weight models like Llama 3.1 405B Instruct and Nous Hermes 3 405B on
+ multiple benchmarks. To read more, [click here.](https://allenai.org/blog/tulu-3-405B)
+ :context_length: 16000
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0.000005'
+ completion: '0.00001'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 16000
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ - :id: deepseek/deepseek-r1-distill-llama-8b
+ :name: 'DeepSeek: R1 Distill Llama 8B'
+ :created: 1738937718
+ :description: "DeepSeek R1 Distill Llama 8B is a distilled large language model
+ based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), using outputs
+ from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation
+ techniques to achieve high performance across multiple benchmarks, including:\n\n-
+ AIME 2024 pass@1: 50.4\n- MATH-500 pass@1: 89.1\n- CodeForces Rating: 1205\n\nThe
+ model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance
+ comparable to larger frontier models.\n\nHugging Face: \n- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
+ \n- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
+ \ |"
+ :context_length: 32000
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type:
+ :pricing:
+ prompt: '0.00000004'
+ completion: '0.00000004'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32000
+ max_completion_tokens: 32000
+ is_moderated: false
+ :per_request_limits:
+ - :id: google/gemini-2.0-flash-001
+ :name: 'Google: Gemini Flash 2.0'
+ :created: 1738769413
+ :description: Gemini Flash 2.0 offers a significantly faster time to first token
+ (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining
+ quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5).
+ It introduces notable enhancements in multimodal understanding, coding capabilities,
+ complex instruction following, and function calling. These advancements come together
+ to deliver more seamless and robust agentic experiences.
+ :context_length: 1000000
+ :architecture:
+ modality: text+image->text
+ tokenizer: Gemini
+ instruct_type:
+ :pricing:
+ prompt: '0.0000001'
+ completion: '0.0000004'
+ image: '0.0000258'
+ request: '0'
+ :top_provider:
+ context_length: 1000000
+ max_completion_tokens: 8192
+ is_moderated: false
+ :per_request_limits:
+ - :id: google/gemini-2.0-flash-lite-preview-02-05:free
+ :name: 'Google: Gemini Flash Lite 2.0 Preview (free)'
+ :created: 1738768262
+ :description: Gemini Flash Lite 2.0 offers a significantly faster time to first
+ token (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining
+ quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5).
+ Because it's currently in preview, it will be **heavily rate-limited** by Google.
+ This model will move from free to paid pending a general rollout on February 24th,
+ at $0.075 / $0.30 per million input / output tokens respectively.
+ :context_length: 1000000
+ :architecture:
+ modality: text+image->text
+ tokenizer: Gemini
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 1000000
+ max_completion_tokens: 8192
+ is_moderated: false
+ :per_request_limits:
+ - :id: google/gemini-2.0-pro-exp-02-05:free
+ :name: 'Google: Gemini Pro 2.0 Experimental (free)'
+ :created: 1738768044
+ :description: |-
+ Gemini 2.0 Pro Experimental is a bleeding-edge version of the Gemini 2.0 Pro model. Because it's currently experimental, it will be **heavily rate-limited** by Google.
+
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
+
+ #multimodal
+ :context_length: 2000000
+ :architecture:
+ modality: text+image->text
+ tokenizer: Gemini
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 2000000
+ max_completion_tokens: 8192
+ is_moderated: false
+ :per_request_limits:
+ - :id: qwen/qwen-vl-plus:free
+ :name: 'Qwen: Qwen VL Plus (free)'
+ :created: 1738731255
+ :description: 'Qwen''s Enhanced Large Visual Language Model. Significantly upgraded
+ for detailed recognition capabilities and text recognition abilities, supporting
+ ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios
+ for image input. It delivers significant performance across a broad range of visual
+ tasks.
+
+ '
+ :context_length: 7500
+ :architecture:
+ modality: text+image->text
+ tokenizer: Qwen
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 7500
+ max_completion_tokens: 1500
+ is_moderated: false
+ :per_request_limits:
+ - :id: aion-labs/aion-1.0
+ :name: 'AionLabs: Aion-1.0'
+ :created: 1738697557
+ :description: Aion-1.0 is a multi-model system designed for high performance across
+ various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented
+ with additional models and techniques such as Tree of Thoughts (ToT) and Mixture
+ of Experts (MoE). It is Aion Lab's most powerful reasoning model.
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0.000004'
+ completion: '0.000008'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens: 32768
+ is_moderated: false
+ :per_request_limits:
+ - :id: aion-labs/aion-1.0-mini
+ :name: 'AionLabs: Aion-1.0-Mini'
+ :created: 1738697107
+ :description: Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1
+ model, designed for strong performance in reasoning domains such as mathematics,
+ coding, and logic. It is a modified variant of a FuseAI model that outperforms
+ R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available
+ on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview),
+ independently replicated for verification.
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0.0000007'
+ completion: '0.0000014'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens: 32768
+ is_moderated: false
+ :per_request_limits:
+ - :id: aion-labs/aion-rp-llama-3.1-8b
+ :name: 'AionLabs: Aion-RP 1.0 (8B)'
+ :created: 1738696718
+ :description: Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation
+ portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto,
+ where LLMs evaluate each other’s responses. It is a fine-tuned base model rather
+ than an instruct model, designed to produce more natural and varied writing.
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type:
+ :pricing:
+ prompt: '0.0000002'
+ completion: '0.0000002'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens: 32768
+ is_moderated: false
+ :per_request_limits:
+ - :id: qwen/qwen-turbo
  :name: 'Qwen: Qwen-Turbo'
  :created: 1738410974
  :description: Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides
@@ -19,6 +389,27 @@
  max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
+ - :id: qwen/qwen2.5-vl-72b-instruct:free
+ :name: 'Qwen: Qwen2.5 VL 72B Instruct (free)'
+ :created: 1738410311
+ :description: Qwen2.5-VL is proficient in recognizing common objects such as flowers,
+ birds, fish, and insects. It is also highly capable of analyzing texts, charts,
+ icons, graphics, and layouts within images.
+ :context_length: 131072
+ :architecture:
+ modality: text+image->text
+ tokenizer: Qwen
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 131072
+ max_completion_tokens: 2048
+ is_moderated: false
+ :per_request_limits:
  - :id: qwen/qwen-plus
  :name: 'Qwen: Qwen-Plus'
  :created: 1738409840
@@ -66,7 +457,11 @@
  :name: 'OpenAI: o3 Mini'
  :created: 1738351721
  :description: |-
- OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
+ OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding.
+
+ This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high".
+
+ The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.

  The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
  :context_length: 200000
@@ -85,7 +480,7 @@
  is_moderated: true
  :per_request_limits:
  - :id: deepseek/deepseek-r1-distill-qwen-1.5b
- :name: 'Deepseek: Deepseek R1 Distill Qwen 1.5B'
+ :name: 'DeepSeek: R1 Distill Qwen 1.5B'
  :created: 1738328067
  :description: |-
  DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks.
@@ -109,7 +504,29 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens: 2048
+ max_completion_tokens: 32768
+ is_moderated: false
+ :per_request_limits:
+ - :id: mistralai/mistral-small-24b-instruct-2501:free
+ :name: 'Mistral: Mistral Small 3 (free)'
+ :created: 1738255409
+ :description: |-
+ Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
+
+ The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. [Read the blog post about the model here.](https://mistral.ai/news/mistral-small-3/)
+ :context_length: 32000
+ :architecture:
+ modality: text->text
+ tokenizer: Mistral
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32000
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: mistralai/mistral-small-24b-instruct-2501
@@ -131,11 +548,11 @@
  request: '0'
  :top_provider:
  context_length: 32768
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: deepseek/deepseek-r1-distill-qwen-32b
- :name: 'DeepSeek: DeepSeek R1 Distill Qwen 32B'
+ :name: 'DeepSeek: R1 Distill Qwen 32B'
  :created: 1738194830
  :description: |-
  DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
@@ -159,11 +576,11 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: deepseek/deepseek-r1-distill-qwen-14b
- :name: 'DeepSeek: DeepSeek R1 Distill Qwen 14B'
+ :name: 'DeepSeek: R1 Distill Qwen 14B'
  :created: 1738193940
  :description: |-
  DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
@@ -187,7 +604,7 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens: 2048
+ max_completion_tokens: 32768
  is_moderated: false
  :per_request_limits:
  - :id: perplexity/sonar-reasoning
@@ -282,8 +699,34 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
+ - :id: deepseek/deepseek-r1-distill-llama-70b:free
+ :name: 'DeepSeek: R1 Distill Llama 70B (free)'
+ :created: 1737663169
+ :description: |-
+ DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
+
+ - AIME 2024 pass@1: 70.0
+ - MATH-500 pass@1: 94.5
+ - CodeForces Rating: 1633
+
+ The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
+ :context_length: 128000
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
  - :id: deepseek/deepseek-r1-distill-llama-70b
- :name: 'DeepSeek: DeepSeek R1 Distill Llama 70B'
+ :name: 'DeepSeek: R1 Distill Llama 70B'
  :created: 1737663169
  :description: |-
  DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
@@ -305,7 +748,7 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: google/gemini-2.0-flash-thinking-exp:free
@@ -331,7 +774,7 @@
  is_moderated: false
  :per_request_limits:
  - :id: deepseek/deepseek-r1:free
- :name: 'DeepSeek: DeepSeek R1 (free)'
+ :name: 'DeepSeek: R1 (free)'
  :created: 1737381095
  :description: |-
  DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
@@ -339,7 +782,7 @@
  Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).

  MIT licensed: Distill & commercialize freely!
- :context_length: 128000
+ :context_length: 163840
  :architecture:
  modality: text->text
  tokenizer: DeepSeek
@@ -350,12 +793,12 @@
  image: '0'
  request: '0'
  :top_provider:
- context_length: 128000
- max_completion_tokens: 4096
+ context_length: 163840
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: deepseek/deepseek-r1
- :name: 'DeepSeek: DeepSeek R1'
+ :name: 'DeepSeek: R1'
  :created: 1737381095
  :description: |-
  DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
@@ -363,43 +806,19 @@
  Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).

  MIT licensed: Distill & commercialize freely!
- :context_length: 16000
+ :context_length: 128000
  :architecture:
  modality: text->text
  tokenizer: DeepSeek
  instruct_type:
  :pricing:
- prompt: '0.00000075'
+ prompt: '0.0000008'
  completion: '0.0000024'
  image: '0'
  request: '0'
  :top_provider:
- context_length: 16000
- max_completion_tokens: 8192
- is_moderated: false
- :per_request_limits:
- - :id: deepseek/deepseek-r1:nitro
- :name: 'DeepSeek: DeepSeek R1 (nitro)'
- :created: 1737381095
- :description: |-
- DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
-
- Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).
-
- MIT licensed: Distill & commercialize freely!
- :context_length: 163840
- :architecture:
- modality: text->text
- tokenizer: DeepSeek
- instruct_type:
- :pricing:
- prompt: '0.000007'
- completion: '0.000007'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 163840
- max_completion_tokens: 32768
+ context_length: 128000
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: sophosympatheia/rogue-rose-103b-v0.2:free
@@ -491,7 +910,7 @@
  request: '0'
  :top_provider:
  context_length: 16384
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: sao10k/l3.1-70b-hanami-x1
@@ -513,6 +932,28 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
+ - :id: deepseek/deepseek-chat:free
+ :name: 'DeepSeek: DeepSeek V3 (free)'
+ :created: 1735241320
+ :description: |-
+ DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
+
+ For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).
+ :context_length: 131072
+ :architecture:
+ modality: text->text
+ tokenizer: DeepSeek
+ instruct_type:
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 131072
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
  - :id: deepseek/deepseek-chat
  :name: 'DeepSeek: DeepSeek V3'
  :created: 1735241320
@@ -520,18 +961,18 @@
  DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.

  For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).
- :context_length: 16000
+ :context_length: 131072
  :architecture:
  modality: text->text
  tokenizer: DeepSeek
  instruct_type:
  :pricing:
- prompt: '0.00000049'
- completion: '0.00000089'
+ prompt: '0.0000009'
+ completion: '0.0000009'
  image: '0'
  request: '0'
  :top_provider:
- context_length: 16000
+ context_length: 131072
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
@@ -559,7 +1000,7 @@
  4. **Performance and Benchmark Limitations:** Despite the improvements in visual reasoning, QVQ doesn’t entirely replace the capabilities of [Qwen2-VL-72B](/qwen/qwen-2-vl-72b-instruct). During multi-step visual reasoning, the model might gradually lose focus on the image content, leading to hallucinations. Moreover, QVQ doesn’t show significant improvement over [Qwen2-VL-72B](/qwen/qwen-2-vl-72b-instruct) in basic recognition tasks like identifying people, animals, or plants.

  Note: Currently, the model only supports single-round dialogues and image outputs. It does not support video inputs.
- :context_length: 128000
+ :context_length: 32000
  :architecture:
  modality: text+image->text
  tokenizer: Qwen
@@ -570,8 +1011,8 @@
  image: '0'
  request: '0'
  :top_provider:
- context_length: 128000
- max_completion_tokens: 4096
+ context_length: 32000
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: google/gemini-2.0-flash-thinking-exp-1219:free
@@ -613,7 +1054,7 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: openai/o1
@@ -641,7 +1082,7 @@
  is_moderated: true
  :per_request_limits:
  - :id: eva-unit-01/eva-llama-3.33-70b
- :name: EVA Llama 3.33 70b
+ :name: EVA Llama 3.33 70B
  :created: 1734377303
  :description: |
  EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of [Llama-3.3-70B-Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct) on mixture of synthetic and natural data.
@@ -710,9 +1151,10 @@
  - :id: cohere/command-r7b-12-2024
  :name: 'Cohere: Command R7B (12-2024)'
  :created: 1734158152
- :description: Command R7B (12-2024) is a small, fast update of the Command R+ model,
- delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks
- requiring complex reasoning and multiple steps.
+ :description: |-
+ Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps.
+
+ Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
  :context_length: 128000
  :architecture:
  modality: text->text
@@ -732,8 +1174,8 @@
  :name: 'Google: Gemini Flash 2.0 Experimental (free)'
  :created: 1733937523
  :description: Gemini Flash 2.0 offers a significantly faster time to first token
- (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining
- quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5).
+ (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining
+ quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5).
  It introduces notable enhancements in multimodal understanding, coding capabilities,
  complex instruction following, and function calling. These advancements come together
  to deliver more seamless and robust agentic experiences.
@@ -771,6 +1213,30 @@
  max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
+ - :id: meta-llama/llama-3.3-70b-instruct:free
+ :name: 'Meta: Llama 3.3 70B Instruct (free)'
+ :created: 1733506137
+ :description: |-
+ The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
+
+ Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
+
+ [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)
+ :context_length: 131072
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 131072
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
  - :id: meta-llama/llama-3.3-70b-instruct
  :name: 'Meta: Llama 3.3 70B Instruct'
  :created: 1733506137
@@ -888,25 +1354,6 @@
  request: '0'
  :top_provider:
  context_length: 32768
- max_completion_tokens:
- is_moderated: false
- :per_request_limits:
- - :id: google/gemini-exp-1121:free
- :name: 'Google: Gemini Experimental 1121 (free)'
- :created: 1732216725
- :description: Experimental release (November 21st, 2024) of Gemini.
- :context_length: 40960
- :architecture:
- modality: text+image->text
- tokenizer: Gemini
- instruct_type:
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 40960
  max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
@@ -1061,25 +1508,6 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
- - :id: google/gemini-exp-1114:free
- :name: 'Google: Gemini Experimental 1114 (free)'
- :created: 1731714740
- :description: Gemini 11-14 (2024) experimental model features "quality" improvements.
- :context_length: 40960
- :architecture:
- modality: text+image->text
- tokenizer: Gemini
- instruct_type:
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 40960
- max_completion_tokens: 8192
- is_moderated: false
- :per_request_limits:
  - :id: infermatic/mn-inferor-12b
  :name: 'Infermatic: Mistral Nemo Inferor 12B'
  :created: 1731464428
@@ -1087,19 +1515,19 @@
  Inferor 12B is a merge of top roleplay models, expert on immersive narratives and storytelling.
 
  This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [anthracite-org/magnum-v4-12b](https://openrouter.ai/anthracite-org/magnum-v4-72b) as a base.
- :context_length: 32000
+ :context_length: 16384
  :architecture:
  modality: text->text
  tokenizer: Mistral
  instruct_type: mistral
  :pricing:
- prompt: '0.00000025'
- completion: '0.0000005'
+ prompt: '0.0000008'
+ completion: '0.0000012'
  image: '0'
  request: '0'
  :top_provider:
- context_length: 32000
- max_completion_tokens:
+ context_length: 16384
+ max_completion_tokens: 4096
  is_moderated: false
  :per_request_limits:
  - :id: qwen/qwen-2.5-coder-32b-instruct
@@ -1174,7 +1602,7 @@
  is_moderated: false
  :per_request_limits:
  - :id: thedrummer/unslopnemo-12b
- :name: Unslopnemo 12b
+ :name: Unslopnemo 12B
  :created: 1731103448
  :description: UnslopNemo v4.1 is the latest addition from the creator of Rocinante,
  designed for adventure writing and role-play scenarios.
@@ -1482,6 +1910,28 @@
  request: '0'
  :top_provider:
  context_length: 32768
+ max_completion_tokens: 8192
+ is_moderated: false
+ :per_request_limits:
+ - :id: nvidia/llama-3.1-nemotron-70b-instruct:free
+ :name: 'NVIDIA: Llama 3.1 Nemotron 70B Instruct (free)'
+ :created: 1728950400
+ :description: |-
+ NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+ :context_length: 131072
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 131072
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
@@ -1648,32 +2098,6 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
- - :id: meta-llama/llama-3.2-3b-instruct:free
- :name: 'Meta: Llama 3.2 3B Instruct (free)'
- :created: 1727222400
- :description: |-
- Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
-
- Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
-
- Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
-
- Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
- :context_length: 4096
- :architecture:
- modality: text->text
- tokenizer: Llama3
- instruct_type: llama3
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 4096
- max_completion_tokens: 2048
- is_moderated: false
- :per_request_limits:
  - :id: meta-llama/llama-3.2-3b-instruct
  :name: 'Meta: Llama 3.2 3B Instruct'
  :created: 1727222400
@@ -1711,7 +2135,7 @@
  Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
 
  Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
- :context_length: 4096
+ :context_length: 131072
  :architecture:
  modality: text->text
  tokenizer: Llama3
@@ -1722,8 +2146,8 @@
  image: '0'
  request: '0'
  :top_provider:
- context_length: 4096
- max_completion_tokens: 2048
+ context_length: 131072
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: meta-llama/llama-3.2-1b-instruct
@@ -1752,8 +2176,8 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
- - :id: meta-llama/llama-3.2-90b-vision-instruct:free
- :name: 'Meta: Llama 3.2 90B Vision Instruct (free)'
+ - :id: meta-llama/llama-3.2-90b-vision-instruct
+ :name: 'Meta: Llama 3.2 90B Vision Instruct'
  :created: 1727222400
  :description: |-
  The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.
@@ -1769,41 +2193,15 @@
  tokenizer: Llama3
  instruct_type: llama3
  :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
+ prompt: '0.0000008'
+ completion: '0.0000016'
+ image: '0.0051456'
  request: '0'
  :top_provider:
  context_length: 4096
  max_completion_tokens: 2048
  is_moderated: false
  :per_request_limits:
- - :id: meta-llama/llama-3.2-90b-vision-instruct
- :name: 'Meta: Llama 3.2 90B Vision Instruct'
- :created: 1727222400
- :description: |-
- The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.
-
- This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.
-
- Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
-
- Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
- :context_length: 131072
- :architecture:
- modality: text+image->text
- tokenizer: Llama3
- instruct_type: llama3
- :pricing:
- prompt: '0.0000009'
- completion: '0.0000009'
- image: '0.001301'
- request: '0'
- :top_provider:
- context_length: 131072
- max_completion_tokens:
- is_moderated: false
- :per_request_limits:
  - :id: meta-llama/llama-3.2-11b-vision-instruct:free
  :name: 'Meta: Llama 3.2 11B Vision Instruct (free)'
  :created: 1727222400
@@ -1841,7 +2239,7 @@
  Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
 
  Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
- :context_length: 131072
+ :context_length: 16384
  :architecture:
  modality: text+image->text
  tokenizer: Llama3
@@ -1849,11 +2247,11 @@
  :pricing:
  prompt: '0.000000055'
  completion: '0.000000055'
- image: '0.00007948'
+ image: '0'
  request: '0'
  :top_provider:
- context_length: 131072
- max_completion_tokens: 4096
+ context_length: 16384
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: qwen/qwen-2.5-72b-instruct
@@ -1871,19 +2269,19 @@
  - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
 
  Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
- :context_length: 32768
+ :context_length: 128000
  :architecture:
  modality: text->text
  tokenizer: Qwen
  instruct_type: chatml
  :pricing:
- prompt: '0.00000023'
+ prompt: '0.00000013'
  completion: '0.0000004'
  image: '0'
  request: '0'
  :top_provider:
- context_length: 32768
- max_completion_tokens: 4096
+ context_length: 128000
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: qwen/qwen-2-vl-72b-instruct
@@ -2064,7 +2462,7 @@
 
  Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
 
- Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+ Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
  :context_length: 128000
  :architecture:
  modality: text->text
@@ -2088,7 +2486,7 @@
 
  Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
 
- Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+ Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
  :context_length: 128000
  :architecture:
  modality: text->text
@@ -2136,32 +2534,6 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
- - :id: google/gemini-flash-1.5-exp
- :name: 'Google: Gemini Flash 1.5 Experimental'
- :created: 1724803200
- :description: |-
- Gemini 1.5 Flash Experimental is an experimental version of the [Gemini 1.5 Flash](/models/google/gemini-flash-1.5) model.
-
- Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
-
- #multimodal
-
- Note: This model is experimental and not suited for production use-cases. It may be removed or redirected to another model in the future.
- :context_length: 1000000
- :architecture:
- modality: text+image->text
- tokenizer: Gemini
- instruct_type:
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 1000000
- max_completion_tokens: 8192
- is_moderated: false
- :per_request_limits:
  - :id: sao10k/l3.1-euryale-70b
  :name: 'Sao10K: Llama 3.1 Euryale 70B v2.2'
  :created: 1724803200
@@ -2179,7 +2551,7 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens: 4096
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: google/gemini-flash-1.5-8b-exp
@@ -2317,7 +2689,7 @@
  The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
 
  Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
- :context_length: 131000
+ :context_length: 131072
  :architecture:
  modality: text->text
  tokenizer: Llama3
@@ -2328,8 +2700,8 @@
  image: '0'
  request: '0'
  :top_provider:
- context_length: 131000
- max_completion_tokens: 131000
+ context_length: 131072
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: perplexity/llama-3.1-sonar-huge-128k-online
@@ -2396,7 +2768,7 @@
  request: '0'
  :top_provider:
  context_length: 8192
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: aetherwiing/mn-starcannon-12b
@@ -2515,30 +2887,6 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
- - :id: google/gemini-pro-1.5-exp
- :name: 'Google: Gemini Pro 1.5 Experimental'
- :created: 1722470400
- :description: |-
- Gemini 1.5 Pro Experimental is a bleeding-edge version of the [Gemini 1.5 Pro](/models/google/gemini-pro-1.5) model. Because it's currently experimental, it will be **heavily rate-limited** by Google.
-
- Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
-
- #multimodal
- :context_length: 1000000
- :architecture:
- modality: text+image->text
- tokenizer: Gemini
- instruct_type:
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 1000000
- max_completion_tokens: 8192
- is_moderated: false
- :per_request_limits:
  - :id: perplexity/llama-3.1-sonar-large-128k-chat
  :name: 'Perplexity: Llama 3.1 Sonar 70B'
  :created: 1722470400
@@ -2605,32 +2953,6 @@
  max_completion_tokens:
  is_moderated: false
  :per_request_limits:
- - :id: meta-llama/llama-3.1-405b-instruct:free
- :name: 'Meta: Llama 3.1 405B Instruct (free)'
- :created: 1721692800
- :description: |-
- The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.
-
- Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.
-
- It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.
-
- To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
- :context_length: 8000
- :architecture:
- modality: text->text
- tokenizer: Llama3
- instruct_type: llama3
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 8000
- max_completion_tokens: 4000
- is_moderated: false
- :per_request_limits:
  - :id: meta-llama/llama-3.1-405b-instruct
  :name: 'Meta: Llama 3.1 405B Instruct'
  :created: 1721692800
@@ -2654,33 +2976,7 @@
  request: '0'
  :top_provider:
  context_length: 32768
- max_completion_tokens: 4096
- is_moderated: false
- :per_request_limits:
- - :id: meta-llama/llama-3.1-405b-instruct:nitro
- :name: 'Meta: Llama 3.1 405B Instruct (nitro)'
- :created: 1721692800
- :description: |-
- The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.
-
- Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.
-
- It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.
-
- To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
- :context_length: 8000
- :architecture:
- modality: text->text
- tokenizer: Llama3
- instruct_type: llama3
- :pricing:
- prompt: '0.00001462'
- completion: '0.00001462'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 8000
- max_completion_tokens:
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: meta-llama/llama-3.1-8b-instruct:free
@@ -2692,7 +2988,7 @@
  It has demonstrated strong performance compared to leading closed-source models in human evaluations.
 
  To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
- :context_length: 8192
+ :context_length: 131072
  :architecture:
  modality: text->text
  tokenizer: Llama3
@@ -2703,8 +2999,8 @@
  image: '0'
  request: '0'
  :top_provider:
- context_length: 8192
- max_completion_tokens: 4096
+ context_length: 131072
+ max_completion_tokens:
  is_moderated: false
  :per_request_limits:
  - :id: meta-llama/llama-3.1-8b-instruct
@@ -2728,31 +3024,7 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens: 4096
- is_moderated: false
- :per_request_limits:
- - :id: meta-llama/llama-3.1-70b-instruct:free
- :name: 'Meta: Llama 3.1 70B Instruct (free)'
- :created: 1721692800
- :description: |-
- Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.
-
- It has demonstrated strong performance compared to leading closed-source models in human evaluations.
-
- To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
- :context_length: 8192
- :architecture:
- modality: text->text
- tokenizer: Llama3
- instruct_type: llama3
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 8192
- max_completion_tokens: 4096
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: meta-llama/llama-3.1-70b-instruct
@@ -2776,31 +3048,31 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens: 4096
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
- - :id: meta-llama/llama-3.1-70b-instruct:nitro
- :name: 'Meta: Llama 3.1 70B Instruct (nitro)'
- :created: 1721692800
+ - :id: mistralai/mistral-nemo:free
+ :name: 'Mistral: Mistral Nemo (free)'
+ :created: 1721347200
  :description: |-
- Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.
+ A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
 
- It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+ The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
 
- To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
- :context_length: 64000
+ It supports function calling and is released under the Apache 2.0 license.
+ :context_length: 128000
  :architecture:
  modality: text->text
- tokenizer: Llama3
- instruct_type: llama3
+ tokenizer: Mistral
+ instruct_type: mistral
  :pricing:
- prompt: '0.00000325'
- completion: '0.00000325'
+ prompt: '0'
+ completion: '0'
  image: '0'
  request: '0'
  :top_provider:
- context_length: 64000
- max_completion_tokens:
+ context_length: 128000
+ max_completion_tokens: 128000
  is_moderated: false
  :per_request_limits:
  - :id: mistralai/mistral-nemo
@@ -2824,7 +3096,7 @@
  request: '0'
  :top_provider:
  context_length: 131072
- max_completion_tokens: 4096
+ max_completion_tokens: 8192
  is_moderated: false
  :per_request_limits:
  - :id: mistralai/codestral-mamba
@@ -2877,89 +3149,37 @@
  image: '0.007225'
  request: '0'
  :top_provider:
- context_length: 128000
- max_completion_tokens: 16384
- is_moderated: true
- :per_request_limits:
- - :id: openai/gpt-4o-mini-2024-07-18
- :name: 'OpenAI: GPT-4o-mini (2024-07-18)'
- :created: 1721260800
- :description: |-
- GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.
-
- As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
-
- GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/).
-
- Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
-
- #multimodal
- :context_length: 128000
- :architecture:
- modality: text+image->text
- tokenizer: GPT
- instruct_type:
- :pricing:
- prompt: '0.00000015'
- completion: '0.0000006'
- image: '0.007225'
- request: '0'
- :top_provider:
- context_length: 128000
- max_completion_tokens: 16384
- is_moderated: true
- :per_request_limits:
- - :id: qwen/qwen-2-7b-instruct:free
- :name: Qwen 2 7B Instruct (free)
- :created: 1721088000
- :description: |-
- Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
-
- It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
-
- For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).
-
- Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
- :context_length: 8192
- :architecture:
- modality: text->text
- tokenizer: Qwen
- instruct_type: chatml
- :pricing:
- prompt: '0'
- completion: '0'
- image: '0'
- request: '0'
- :top_provider:
- context_length: 8192
- max_completion_tokens: 4096
- is_moderated: false
+ context_length: 128000
+ max_completion_tokens: 16384
+ is_moderated: true
  :per_request_limits:
- - :id: qwen/qwen-2-7b-instruct
- :name: Qwen 2 7B Instruct
- :created: 1721088000
+ - :id: openai/gpt-4o-mini-2024-07-18
+ :name: 'OpenAI: GPT-4o-mini (2024-07-18)'
+ :created: 1721260800
  :description: |-
- Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
+ GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.
 
- It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
+ As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
 
- For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).
+ GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/).
 
- Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
- :context_length: 32768
+ Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
+
+ #multimodal
+ :context_length: 128000
  :architecture:
- modality: text->text
- tokenizer: Qwen
- instruct_type: chatml
+ modality: text+image->text
+ tokenizer: GPT
+ instruct_type:
  :pricing:
- prompt: '0.000000054'
- completion: '0.000000054'
- image: '0'
+ prompt: '0.00000015'
+ completion: '0.0000006'
+ image: '0.007225'
  request: '0'
  :top_provider:
- context_length: 32768
- max_completion_tokens:
- is_moderated: false
+ context_length: 128000
+ max_completion_tokens: 16384
+ is_moderated: true
  :per_request_limits:
  - :id: google/gemma-2-27b-it
  :name: 'Google: Gemma 2 27B'
@@ -2982,7 +3202,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: alpindale/magnum-72b
@@ -3052,7 +3272,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: 01-ai/yi-large
@@ -3187,7 +3407,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: cognitivecomputations/dolphin-mixtral-8x22b
@@ -3233,13 +3453,13 @@
     tokenizer: Qwen
     instruct_type: chatml
   :pricing:
-    prompt: '0.00000034'
-    completion: '0.00000039'
+    prompt: '0.0000009'
+    completion: '0.0000009'
     image: '0'
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
 - :id: mistralai/mistral-7b-instruct:free
@@ -3283,29 +3503,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens: 4096
-    is_moderated: false
-  :per_request_limits:
-- :id: mistralai/mistral-7b-instruct:nitro
-  :name: 'Mistral: Mistral 7B Instruct (nitro)'
-  :created: 1716768000
-  :description: |-
-    A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
-
-    *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
-  :context_length: 32768
-  :architecture:
-    modality: text->text
-    tokenizer: Mistral
-    instruct_type: mistral
-  :pricing:
-    prompt: '0.00000007'
-    completion: '0.00000007'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: mistralai/mistral-7b-instruct-v0.3
@@ -3333,7 +3531,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: nousresearch/hermes-2-pro-llama-3-8b
@@ -3669,32 +3867,30 @@
     max_completion_tokens: 2048
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct:free
-  :name: 'Meta: Llama 3 8B Instruct (free)'
-  :created: 1713398400
+- :id: sao10k/fimbulvetr-11b-v2
+  :name: Fimbulvetr 11B v2
+  :created: 1713657600
   :description: |-
-    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+    Creative writing model, routed with permission. It's fast, it keeps the conversation going, and it stays in character.
 
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 8192
+    If you submit a raw prompt, you can use Alpaca or Vicuna formats.
+  :context_length: 4096
   :architecture:
     modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
+    tokenizer: Llama2
+    instruct_type: alpaca
   :pricing:
-    prompt: '0'
-    completion: '0'
+    prompt: '0.0000008'
+    completion: '0.0000012'
     image: '0'
     request: '0'
   :top_provider:
-    context_length: 8192
+    context_length: 4096
     max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct
-  :name: 'Meta: Llama 3 8B Instruct'
+- :id: meta-llama/llama-3-8b-instruct:free
+  :name: 'Meta: Llama 3 8B Instruct (free)'
   :created: 1713398400
   :description: |-
     Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
@@ -3708,8 +3904,8 @@
     tokenizer: Llama3
     instruct_type: llama3
   :pricing:
-    prompt: '0.00000003'
-    completion: '0.00000006'
+    prompt: '0'
+    completion: '0'
     image: '0'
     request: '0'
   :top_provider:
@@ -3717,32 +3913,8 @@
     max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct:extended
-  :name: 'Meta: Llama 3 8B Instruct (extended)'
-  :created: 1713398400
-  :description: |-
-    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 16384
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0.0000001875'
-    completion: '0.000001125'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 16384
-    max_completion_tokens: 2048
-    is_moderated: false
-  :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct:nitro
-  :name: 'Meta: Llama 3 8B Instruct (nitro)'
+- :id: meta-llama/llama-3-8b-instruct
+  :name: 'Meta: Llama 3 8B Instruct'
   :created: 1713398400
   :description: |-
     Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
@@ -3756,13 +3928,13 @@
     tokenizer: Llama3
     instruct_type: llama3
   :pricing:
-    prompt: '0.0000002'
-    completion: '0.0000002'
+    prompt: '0.00000003'
+    completion: '0.00000006'
     image: '0'
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: meta-llama/llama-3-70b-instruct
@@ -3786,31 +3958,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens: 4096
-    is_moderated: false
-  :per_request_limits:
-- :id: meta-llama/llama-3-70b-instruct:nitro
-  :name: 'Meta: Llama 3 70B Instruct (nitro)'
-  :created: 1713398400
-  :description: |-
-    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 8192
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0.00000088'
-    completion: '0.00000088'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: mistralai/mixtral-8x22b-instruct
@@ -3862,7 +4010,7 @@
     request: '0'
   :top_provider:
     context_length: 65536
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: microsoft/wizardlm-2-7b
@@ -3956,7 +4104,7 @@
 
     It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).
 
-    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -3980,7 +4128,7 @@
 
     It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).
 
-    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -4050,7 +4198,7 @@
   :description: |-
     Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models.
 
-    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 4096
   :architecture:
     modality: text->text
@@ -4074,7 +4222,7 @@
 
     Read the launch post [here](https://txt.cohere.com/command-r/).
 
-    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -4244,7 +4392,7 @@
 
     Read the launch post [here](https://txt.cohere.com/command-r/).
 
-    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -4554,29 +4702,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens: 4096
-    is_moderated: false
-  :per_request_limits:
-- :id: mistralai/mixtral-8x7b-instruct:nitro
-  :name: 'Mistral: Mixtral 8x7B Instruct (nitro)'
-  :created: 1702166400
-  :description: |-
-    Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.
-
-    Instruct model fine-tuned by Mistral. #moe
-  :context_length: 32768
-  :architecture:
-    modality: text->text
-    tokenizer: Mistral
-    instruct_type: mistral
-  :pricing:
-    prompt: '0.0000005'
-    completion: '0.0000005'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: openchat/openchat-7b:free
@@ -4626,7 +4752,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: neversleep/noromaid-20b
@@ -4753,7 +4879,7 @@
     request: '0'
   :top_provider:
     context_length: 4096
-    max_completion_tokens:
+    max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
 - :id: undi95/toppy-m-7b:free
@@ -4784,34 +4910,6 @@
     max_completion_tokens: 2048
     is_moderated: false
   :per_request_limits:
-- :id: undi95/toppy-m-7b:nitro
-  :name: Toppy M 7B (nitro)
-  :created: 1699574400
-  :description: |-
-    A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
-    List of merged models:
-    - NousResearch/Nous-Capybara-7B-V1.9
-    - [HuggingFaceH4/zephyr-7b-beta](/models/huggingfaceh4/zephyr-7b-beta)
-    - lemonilia/AshhLimaRP-Mistral-7B
-    - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
-    - Undi95/Mistral-pippa-sharegpt-7b-qlora
-
-    #merge #uncensored
-  :context_length: 4096
-  :architecture:
-    modality: text->text
-    tokenizer: Mistral
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.00000007'
-    completion: '0.00000007'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 4096
-    max_completion_tokens:
-    is_moderated: false
-  :per_request_limits:
 - :id: undi95/toppy-m-7b
   :name: Toppy M 7B
   :created: 1699574400
@@ -4891,6 +4989,7 @@
     - [google/gemini-flash-1.5](/google/gemini-flash-1.5)
     - [mistralai/mistral-large-2407](/mistralai/mistral-large-2407)
     - [mistralai/mistral-nemo](/mistralai/mistral-nemo)
+    - [deepseek/deepseek-r1](/deepseek/deepseek-r1)
     - [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct)
     - [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct)
     - [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct)
@@ -5175,8 +5274,8 @@
     tokenizer: Llama2
     instruct_type: alpaca
   :pricing:
-    prompt: '0.00000017'
-    completion: '0.00000017'
+    prompt: '0.00000018'
+    completion: '0.00000018'
     image: '0'
     request: '0'
   :top_provider:
@@ -5287,26 +5386,6 @@
     max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
-- :id: undi95/remm-slerp-l2-13b:extended
-  :name: ReMM SLERP 13B (extended)
-  :created: 1689984000
-  :description: 'A recreation trial of the original MythoMax-L2-B13 but with updated
-    models. #merge'
-  :context_length: 6144
-  :architecture:
-    modality: text->text
-    tokenizer: Llama2
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.000001125'
-    completion: '0.000001125'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 6144
-    max_completion_tokens: 512
-    is_moderated: false
-  :per_request_limits:
 - :id: google/palm-2-chat-bison
   :name: 'Google: PaLM 2 Chat'
   :created: 1689811200
@@ -5387,46 +5466,6 @@
     max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
-- :id: gryphe/mythomax-l2-13b:nitro
-  :name: MythoMax 13B (nitro)
-  :created: 1688256000
-  :description: 'One of the highest performing and most popular fine-tunes of Llama
-    2 13B, with rich descriptions and roleplay. #merge'
-  :context_length: 4096
-  :architecture:
-    modality: text->text
-    tokenizer: Llama2
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.0000002'
-    completion: '0.0000002'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 4096
-    max_completion_tokens:
-    is_moderated: false
-  :per_request_limits:
-- :id: gryphe/mythomax-l2-13b:extended
-  :name: MythoMax 13B (extended)
-  :created: 1688256000
-  :description: 'One of the highest performing and most popular fine-tunes of Llama
-    2 13B, with rich descriptions and roleplay. #merge'
-  :context_length: 8192
-  :architecture:
-    modality: text->text
-    tokenizer: Llama2
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.000001125'
-    completion: '0.000001125'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8192
-    max_completion_tokens: 512
-    is_moderated: false
-  :per_request_limits:
 - :id: meta-llama/llama-2-13b-chat
   :name: 'Meta: Llama 2 13B Chat'
   :created: 1687219200