ai_client 0.4.3 → 0.4.5
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -3
- data/Rakefile +1 -0
- data/lib/ai_client/configuration.rb +14 -5
- data/lib/ai_client/models.yml +659 -620
- data/lib/ai_client/ollama_extensions.rb +191 -0
- data/lib/ai_client/version.rb +1 -1
- data/lib/ai_client/xai.rb +35 -0
- data/lib/ai_client.rb +15 -9
- metadata +5 -3
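
Most of this release is a regenerated `data/lib/ai_client/models.yml`, the model catalog bundled with the gem. Each entry is a YAML mapping with symbol keys (`:id`, `:name`, `:created`, `:context_length`, `:architecture`, `:pricing`, `:top_provider`), and the pricing fields appear to be per-token USD amounts stored as strings. As a minimal sketch of reading the catalog (the load path and the filtering helper are illustrative assumptions, not part of ai_client's public API):

```ruby
require 'yaml'

# Minimal sketch: load the bundled catalog and list its free-tier entries.
# Top-level keys are Ruby symbols (":id", ":pricing", ...), so Symbol must
# be permitted explicitly when using YAML.safe_load.
models = YAML.safe_load(
  File.read('data/lib/ai_client/models.yml'), # path assumed, not the gem's API
  permitted_classes: [Symbol]
)

# Pricing values are strings such as prompt: '0.000002'; free models carry
# '0' for every component (prompt, completion, image, request).
free_models = models.select do |model|
  model[:pricing].values.all? { |price| price.to_f.zero? }
end

free_models.each do |model|
  puts format('%-60s context: %s', model[:id], model[:context_length])
end
```

Against the 0.4.5 catalog this should surface the `:free` variants added below, such as `deepseek/deepseek-r1:free` and `meta-llama/llama-3.3-70b-instruct:free`.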
data/lib/ai_client/models.yml
CHANGED
@@ -1,5 +1,375 @@
 ---
-- :id:
+- :id: perplexity/r1-1776
+  :name: 'Perplexity: R1 1776'
+  :created: 1740004929
+  :description: |-
+    Note: As this model does not return <think> tags, thoughts will be streamed by default directly to the `content` field.
+
+    R1 1776 is a version of DeepSeek-R1 that has been post-trained to remove censorship constraints related to topics restricted by the Chinese government. The model retains its original reasoning capabilities while providing direct responses to a wider range of queries. R1 1776 is an offline chat model that does not use the perplexity search subsystem.
+
+    The model was tested on a multilingual dataset of over 1,000 examples covering sensitive topics to measure its likelihood of refusal or overly filtered responses. [Evaluation Results](https://cdn-uploads.huggingface.co/production/uploads/675c8332d01f593dc90817f5/GiN2VqC5hawUgAGJ6oHla.png) Its performance on math and reasoning benchmarks remains similar to the base R1 model. [Reasoning Performance](https://cdn-uploads.huggingface.co/production/uploads/675c8332d01f593dc90817f5/n4Z9Byqp2S7sKUvCvI40R.png)
+
+    Read more on the [Blog Post](https://perplexity.ai/hub/blog/open-sourcing-r1-1776)
+  :context_length: 128000
+  :architecture:
+    modality: text->text
+    tokenizer: DeepSeek
+    instruct_type:
+  :pricing:
+    prompt: '0.000002'
+    completion: '0.000008'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 128000
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
+- :id: mistralai/mistral-saba
+  :name: 'Mistral: Saba'
+  :created: 1739803239
+  :description: Mistral Saba is a 24B-parameter language model specifically designed
+    for the Middle East and South Asia, delivering accurate and contextually relevant
+    responses while maintaining efficient performance. Trained on curated regional
+    datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside
+    Arabic. This makes it a versatile option for a range of regional and multilingual
+    applications. Read more at the blog post [here](https://mistral.ai/en/news/mistral-saba)
+  :context_length: 32000
+  :architecture:
+    modality: text->text
+    tokenizer: Mistral
+    instruct_type:
+  :pricing:
+    prompt: '0.0000002'
+    completion: '0.0000006'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32000
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
+- :id: cognitivecomputations/dolphin3.0-r1-mistral-24b:free
+  :name: Dolphin3.0 R1 Mistral 24B (free)
+  :created: 1739462498
+  :description: |-
+    Dolphin 3.0 R1 is the next generation of the Dolphin series of instruct-tuned models. Designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
+
+    The R1 version has been trained for 3 epochs to reason using 800k reasoning traces from the Dolphin-R1 dataset.
+
+    Dolphin aims to be a general purpose reasoning instruct model, similar to the models behind ChatGPT, Claude, Gemini.
+
+    Part of the [Dolphin 3.0 Collection](https://huggingface.co/collections/cognitivecomputations/dolphin-30-677ab47f73d7ff66743979a3) Curated and trained by [Eric Hartford](https://huggingface.co/ehartford), [Ben Gitter](https://huggingface.co/bigstorm), [BlouseJury](https://huggingface.co/BlouseJury) and [Cognitive Computations](https://huggingface.co/cognitivecomputations)
+  :context_length: 32768
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32768
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
+- :id: cognitivecomputations/dolphin3.0-mistral-24b:free
+  :name: Dolphin3.0 Mistral 24B (free)
+  :created: 1739462019
+  :description: "Dolphin 3.0 is the next generation of the Dolphin series of instruct-tuned
+    models. Designed to be the ultimate general purpose local model, enabling coding,
+    math, agentic, function calling, and general use cases.\n\nDolphin aims to be
+    a general purpose instruct model, similar to the models behind ChatGPT, Claude,
+    Gemini. \n\nPart of the [Dolphin 3.0 Collection](https://huggingface.co/collections/cognitivecomputations/dolphin-30-677ab47f73d7ff66743979a3)
+    Curated and trained by [Eric Hartford](https://huggingface.co/ehartford), [Ben
+    Gitter](https://huggingface.co/bigstorm), [BlouseJury](https://huggingface.co/BlouseJury)
+    and [Cognitive Computations](https://huggingface.co/cognitivecomputations)"
+  :context_length: 32768
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32768
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
+- :id: meta-llama/llama-guard-3-8b
+  :name: Llama Guard 3 8B
+  :created: 1739401318
+  :description: |
+    Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
+
+    Llama Guard 3 was aligned to safeguard against the MLCommons standardized hazards taxonomy and designed to support Llama 3.1 capabilities. Specifically, it provides content moderation in 8 languages, and was optimized to support safety and security for search and code interpreter tool calls.
+  :context_length: 16384
+  :architecture:
+    modality: text->text
+    tokenizer: Llama3
+    instruct_type: none
+  :pricing:
+    prompt: '0.0000003'
+    completion: '0.0000003'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 16384
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
+- :id: openai/o3-mini-high
+  :name: 'OpenAI: o3 Mini High'
+  :created: 1739372611
+  :description: "OpenAI o3-mini-high is the same model as [o3-mini](/openai/o3-mini)
+    with reasoning_effort set to high. \n\no3-mini is a cost-efficient language model
+    optimized for STEM reasoning tasks, particularly excelling in science, mathematics,
+    and coding. The model features three adjustable reasoning effort levels and supports
+    key developer capabilities including function calling, structured outputs, and
+    streaming, though it does not include vision processing capabilities.\n\nThe model
+    demonstrates significant improvements over its predecessor, with expert testers
+    preferring its responses 56% of the time and noting a 39% reduction in major errors
+    on complex questions. With medium reasoning effort settings, o3-mini matches the
+    performance of the larger o1 model on challenging reasoning evaluations like AIME
+    and GPQA, while maintaining lower latency and cost."
+  :context_length: 200000
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0.0000011'
+    completion: '0.0000044'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 200000
+    max_completion_tokens: 100000
+    is_moderated: true
+  :per_request_limits:
+- :id: allenai/llama-3.1-tulu-3-405b
+  :name: Llama 3.1 Tulu 3 405B
+  :created: 1739053421
+  :description: Tülu 3 405B is the largest model in the Tülu 3 family, applying fully
+    open post-training recipes at a 405B parameter scale. Built on the Llama 3.1 405B
+    base, it leverages Reinforcement Learning with Verifiable Rewards (RLVR) to enhance
+    instruction following, MATH, GSM8K, and IFEval performance. As part of Tülu 3’s
+    fully open-source approach, it offers state-of-the-art capabilities while surpassing
+    prior open-weight models like Llama 3.1 405B Instruct and Nous Hermes 3 405B on
+    multiple benchmarks. To read more, [click here.](https://allenai.org/blog/tulu-3-405B)
+  :context_length: 16000
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0.000005'
+    completion: '0.00001'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 16000
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
+- :id: deepseek/deepseek-r1-distill-llama-8b
+  :name: 'DeepSeek: R1 Distill Llama 8B'
+  :created: 1738937718
+  :description: "DeepSeek R1 Distill Llama 8B is a distilled large language model
+    based on [Llama-3.1-8B-Instruct](/meta-llama/llama-3.1-8b-instruct), using outputs
+    from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation
+    techniques to achieve high performance across multiple benchmarks, including:\n\n-
+    AIME 2024 pass@1: 50.4\n- MATH-500 pass@1: 89.1\n- CodeForces Rating: 1205\n\nThe
+    model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance
+    comparable to larger frontier models.\n\nHugging Face: \n- [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
+    \n- [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
+    \ |"
+  :context_length: 32000
+  :architecture:
+    modality: text->text
+    tokenizer: Llama3
+    instruct_type:
+  :pricing:
+    prompt: '0.00000004'
+    completion: '0.00000004'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32000
+    max_completion_tokens: 32000
+    is_moderated: false
+  :per_request_limits:
+- :id: google/gemini-2.0-flash-001
+  :name: 'Google: Gemini Flash 2.0'
+  :created: 1738769413
+  :description: Gemini Flash 2.0 offers a significantly faster time to first token
+    (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining
+    quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5).
+    It introduces notable enhancements in multimodal understanding, coding capabilities,
+    complex instruction following, and function calling. These advancements come together
+    to deliver more seamless and robust agentic experiences.
+  :context_length: 1000000
+  :architecture:
+    modality: text+image->text
+    tokenizer: Gemini
+    instruct_type:
+  :pricing:
+    prompt: '0.0000001'
+    completion: '0.0000004'
+    image: '0.0000258'
+    request: '0'
+  :top_provider:
+    context_length: 1000000
+    max_completion_tokens: 8192
+    is_moderated: false
+  :per_request_limits:
+- :id: google/gemini-2.0-flash-lite-preview-02-05:free
+  :name: 'Google: Gemini Flash Lite 2.0 Preview (free)'
+  :created: 1738768262
+  :description: Gemini Flash Lite 2.0 offers a significantly faster time to first
+    token (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining
+    quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5).
+    Because it's currently in preview, it will be **heavily rate-limited** by Google.
+    This model will move from free to paid pending a general rollout on February 24th,
+    at $0.075 / $0.30 per million input / ouput tokens respectively.
+  :context_length: 1000000
+  :architecture:
+    modality: text+image->text
+    tokenizer: Gemini
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 1000000
+    max_completion_tokens: 8192
+    is_moderated: false
+  :per_request_limits:
+- :id: google/gemini-2.0-pro-exp-02-05:free
+  :name: 'Google: Gemini Pro 2.0 Experimental (free)'
+  :created: 1738768044
+  :description: |-
+    Gemini 2.0 Pro Experimental is a bleeding-edge version of the Gemini 2.0 Pro model. Because it's currently experimental, it will be **heavily rate-limited** by Google.
+
+    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
+
+    #multimodal
+  :context_length: 2000000
+  :architecture:
+    modality: text+image->text
+    tokenizer: Gemini
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 2000000
+    max_completion_tokens: 8192
+    is_moderated: false
+  :per_request_limits:
+- :id: qwen/qwen-vl-plus:free
+  :name: 'Qwen: Qwen VL Plus (free)'
+  :created: 1738731255
+  :description: 'Qwen''s Enhanced Large Visual Language Model. Significantly upgraded
+    for detailed recognition capabilities and text recognition abilities, supporting
+    ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios
+    for image input. It delivers significant performance across a broad range of visual
+    tasks.
+
+    '
+  :context_length: 7500
+  :architecture:
+    modality: text+image->text
+    tokenizer: Qwen
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 7500
+    max_completion_tokens: 1500
+    is_moderated: false
+  :per_request_limits:
+- :id: aion-labs/aion-1.0
+  :name: 'AionLabs: Aion-1.0'
+  :created: 1738697557
+  :description: Aion-1.0 is a multi-model system designed for high performance across
+    various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented
+    with additional models and techniques such as Tree of Thoughts (ToT) and Mixture
+    of Experts (MoE). It is Aion Lab's most powerful reasoning model.
+  :context_length: 32768
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0.000004'
+    completion: '0.000008'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32768
+    max_completion_tokens: 32768
+    is_moderated: false
+  :per_request_limits:
+- :id: aion-labs/aion-1.0-mini
+  :name: 'AionLabs: Aion-1.0-Mini'
+  :created: 1738697107
+  :description: Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1
+    model, designed for strong performance in reasoning domains such as mathematics,
+    coding, and logic. It is a modified variant of a FuseAI model that outperforms
+    R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available
+    on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview),
+    independently replicated for verification.
+  :context_length: 32768
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0.0000007'
+    completion: '0.0000014'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32768
+    max_completion_tokens: 32768
+    is_moderated: false
+  :per_request_limits:
+- :id: aion-labs/aion-rp-llama-3.1-8b
+  :name: 'AionLabs: Aion-RP 1.0 (8B)'
+  :created: 1738696718
+  :description: Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation
+    portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto,
+    where LLMs evaluate each other’s responses. It is a fine-tuned base model rather
+    than an instruct model, designed to produce more natural and varied writing.
+  :context_length: 32768
+  :architecture:
+    modality: text->text
+    tokenizer: Other
+    instruct_type:
+  :pricing:
+    prompt: '0.0000002'
+    completion: '0.0000002'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32768
+    max_completion_tokens: 32768
+    is_moderated: false
+  :per_request_limits:
+- :id: qwen/qwen-turbo
   :name: 'Qwen: Qwen-Turbo'
   :created: 1738410974
   :description: Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides
@@ -19,6 +389,27 @@
     max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
+- :id: qwen/qwen2.5-vl-72b-instruct:free
+  :name: 'Qwen: Qwen2.5 VL 72B Instruct (free)'
+  :created: 1738410311
+  :description: Qwen2.5-VL is proficient in recognizing common objects such as flowers,
+    birds, fish, and insects. It is also highly capable of analyzing texts, charts,
+    icons, graphics, and layouts within images.
+  :context_length: 131072
+  :architecture:
+    modality: text+image->text
+    tokenizer: Qwen
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 131072
+    max_completion_tokens: 2048
+    is_moderated: false
+  :per_request_limits:
 - :id: qwen/qwen-plus
   :name: 'Qwen: Qwen-Plus'
   :created: 1738409840
@@ -66,7 +457,11 @@
   :name: 'OpenAI: o3 Mini'
   :created: 1738351721
   :description: |-
-    OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding.
+    OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding.
+
+    This model supports the `reasoning_effort` parameter, which can be set to "high", "medium", or "low" to control the thinking time of the model. The default is "medium". OpenRouter also offers the model slug `openai/o3-mini-high` to default the parameter to "high".
+
+    The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.
 
     The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
   :context_length: 200000
@@ -85,7 +480,7 @@
     is_moderated: true
   :per_request_limits:
 - :id: deepseek/deepseek-r1-distill-qwen-1.5b
-  :name: '
+  :name: 'DeepSeek: R1 Distill Qwen 1.5B'
   :created: 1738328067
   :description: |-
     DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks.
@@ -109,7 +504,29 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 32768
+    is_moderated: false
+  :per_request_limits:
+- :id: mistralai/mistral-small-24b-instruct-2501:free
+  :name: 'Mistral: Mistral Small 3 (free)'
+  :created: 1738255409
+  :description: |-
+    Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
+
+    The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. [Read the blog post about the model here.](https://mistral.ai/news/mistral-small-3/)
+  :context_length: 32000
+  :architecture:
+    modality: text->text
+    tokenizer: Mistral
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 32000
+    max_completion_tokens:
     is_moderated: false
   :per_request_limits:
 - :id: mistralai/mistral-small-24b-instruct-2501
@@ -131,11 +548,11 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: deepseek/deepseek-r1-distill-qwen-32b
-  :name: 'DeepSeek:
+  :name: 'DeepSeek: R1 Distill Qwen 32B'
   :created: 1738194830
   :description: |-
     DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
@@ -159,11 +576,11 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: deepseek/deepseek-r1-distill-qwen-14b
-  :name: 'DeepSeek:
+  :name: 'DeepSeek: R1 Distill Qwen 14B'
   :created: 1738193940
   :description: |-
     DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
@@ -187,7 +604,7 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 32768
     is_moderated: false
   :per_request_limits:
 - :id: perplexity/sonar-reasoning
@@ -282,8 +699,34 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
+- :id: deepseek/deepseek-r1-distill-llama-70b:free
+  :name: 'DeepSeek: R1 Distill Llama 70B (free)'
+  :created: 1737663169
+  :description: |-
+    DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
+
+    - AIME 2024 pass@1: 70.0
+    - MATH-500 pass@1: 94.5
+    - CodeForces Rating: 1633
+
+    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
+  :context_length: 128000
+  :architecture:
+    modality: text->text
+    tokenizer: Llama3
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 128000
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
 - :id: deepseek/deepseek-r1-distill-llama-70b
-  :name: 'DeepSeek:
+  :name: 'DeepSeek: R1 Distill Llama 70B'
   :created: 1737663169
   :description: |-
     DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
@@ -305,7 +748,7 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: google/gemini-2.0-flash-thinking-exp:free
@@ -331,7 +774,7 @@
     is_moderated: false
   :per_request_limits:
 - :id: deepseek/deepseek-r1:free
-  :name: 'DeepSeek:
+  :name: 'DeepSeek: R1 (free)'
   :created: 1737381095
   :description: |-
     DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
@@ -339,7 +782,7 @@
     Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).
 
     MIT licensed: Distill & commercialize freely!
-  :context_length:
+  :context_length: 163840
   :architecture:
     modality: text->text
     tokenizer: DeepSeek
@@ -350,12 +793,12 @@
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 163840
+    max_completion_tokens:
     is_moderated: false
   :per_request_limits:
 - :id: deepseek/deepseek-r1
-  :name: 'DeepSeek:
+  :name: 'DeepSeek: R1'
   :created: 1737381095
   :description: |-
     DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
@@ -363,43 +806,19 @@
     Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).
 
     MIT licensed: Distill & commercialize freely!
-  :context_length:
+  :context_length: 128000
   :architecture:
     modality: text->text
     tokenizer: DeepSeek
     instruct_type:
   :pricing:
-    prompt: '0.
+    prompt: '0.0000008'
     completion: '0.0000024'
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
-    is_moderated: false
-  :per_request_limits:
-- :id: deepseek/deepseek-r1:nitro
-  :name: 'DeepSeek: DeepSeek R1 (nitro)'
-  :created: 1737381095
-  :description: |-
-    DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
-
-    Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).
-
-    MIT licensed: Distill & commercialize freely!
-  :context_length: 163840
-  :architecture:
-    modality: text->text
-    tokenizer: DeepSeek
-    instruct_type:
-  :pricing:
-    prompt: '0.000007'
-    completion: '0.000007'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 163840
-    max_completion_tokens: 32768
+    context_length: 128000
+    max_completion_tokens:
     is_moderated: false
   :per_request_limits:
 - :id: sophosympatheia/rogue-rose-103b-v0.2:free
@@ -491,7 +910,7 @@
     request: '0'
   :top_provider:
     context_length: 16384
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: sao10k/l3.1-70b-hanami-x1
@@ -513,6 +932,28 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
+- :id: deepseek/deepseek-chat:free
+  :name: 'DeepSeek: DeepSeek V3 (free)'
+  :created: 1735241320
+  :description: |-
+    DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
+
+    For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).
+  :context_length: 131072
+  :architecture:
+    modality: text->text
+    tokenizer: DeepSeek
+    instruct_type:
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 131072
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
 - :id: deepseek/deepseek-chat
   :name: 'DeepSeek: DeepSeek V3'
   :created: 1735241320
@@ -520,18 +961,18 @@
     DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.
 
     For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).
-  :context_length:
+  :context_length: 131072
   :architecture:
     modality: text->text
     tokenizer: DeepSeek
     instruct_type:
   :pricing:
-    prompt: '0.
-    completion: '0.
+    prompt: '0.0000009'
+    completion: '0.0000009'
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
+    context_length: 131072
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
@@ -559,7 +1000,7 @@
     4. **Performance and Benchmark Limitations:** Despite the improvements in visual reasoning, QVQ doesn’t entirely replace the capabilities of [Qwen2-VL-72B](/qwen/qwen-2-vl-72b-instruct). During multi-step visual reasoning, the model might gradually lose focus on the image content, leading to hallucinations. Moreover, QVQ doesn’t show significant improvement over [Qwen2-VL-72B](/qwen/qwen-2-vl-72b-instruct) in basic recognition tasks like identifying people, animals, or plants.
 
     Note: Currently, the model only supports single-round dialogues and image outputs. It does not support video inputs.
-  :context_length:
+  :context_length: 32000
   :architecture:
     modality: text+image->text
     tokenizer: Qwen
@@ -570,8 +1011,8 @@
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 32000
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: google/gemini-2.0-flash-thinking-exp-1219:free
@@ -613,7 +1054,7 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: openai/o1
@@ -641,7 +1082,7 @@
     is_moderated: true
   :per_request_limits:
 - :id: eva-unit-01/eva-llama-3.33-70b
-  :name: EVA Llama 3.33
+  :name: EVA Llama 3.33 70B
   :created: 1734377303
   :description: |
     EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of [Llama-3.3-70B-Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct) on mixture of synthetic and natural data.
@@ -710,9 +1151,10 @@
 - :id: cohere/command-r7b-12-2024
   :name: 'Cohere: Command R7B (12-2024)'
   :created: 1734158152
-  :description:
-    delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks
-
+  :description: |-
+    Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps.
+
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -732,8 +1174,8 @@
   :name: 'Google: Gemini Flash 2.0 Experimental (free)'
   :created: 1733937523
   :description: Gemini Flash 2.0 offers a significantly faster time to first token
-    (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining
-    quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5).
+    (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining
+    quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5).
     It introduces notable enhancements in multimodal understanding, coding capabilities,
     complex instruction following, and function calling. These advancements come together
     to deliver more seamless and robust agentic experiences.
@@ -771,6 +1213,30 @@
     max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
+- :id: meta-llama/llama-3.3-70b-instruct:free
+  :name: 'Meta: Llama 3.3 70B Instruct (free)'
+  :created: 1733506137
+  :description: |-
+    The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
+
+    Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
+
+    [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)
+  :context_length: 131072
+  :architecture:
+    modality: text->text
+    tokenizer: Llama3
+    instruct_type: llama3
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 131072
+    max_completion_tokens:
+    is_moderated: false
+  :per_request_limits:
 - :id: meta-llama/llama-3.3-70b-instruct
   :name: 'Meta: Llama 3.3 70B Instruct'
   :created: 1733506137
@@ -888,25 +1354,6 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
-    is_moderated: false
-  :per_request_limits:
-- :id: google/gemini-exp-1121:free
-  :name: 'Google: Gemini Experimental 1121 (free)'
-  :created: 1732216725
-  :description: Experimental release (November 21st, 2024) of Gemini.
-  :context_length: 40960
-  :architecture:
-    modality: text+image->text
-    tokenizer: Gemini
-    instruct_type:
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 40960
     max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
@@ -1061,25 +1508,6 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
-- :id: google/gemini-exp-1114:free
-  :name: 'Google: Gemini Experimental 1114 (free)'
-  :created: 1731714740
-  :description: Gemini 11-14 (2024) experimental model features "quality" improvements.
-  :context_length: 40960
-  :architecture:
-    modality: text+image->text
-    tokenizer: Gemini
-    instruct_type:
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 40960
-    max_completion_tokens: 8192
-    is_moderated: false
-  :per_request_limits:
 - :id: infermatic/mn-inferor-12b
   :name: 'Infermatic: Mistral Nemo Inferor 12B'
   :created: 1731464428
@@ -1087,19 +1515,19 @@
     Inferor 12B is a merge of top roleplay models, expert on immersive narratives and storytelling.
 
     This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [anthracite-org/magnum-v4-12b](https://openrouter.ai/anthracite-org/magnum-v4-72b) as a base.
-  :context_length:
+  :context_length: 16384
   :architecture:
     modality: text->text
     tokenizer: Mistral
     instruct_type: mistral
   :pricing:
-    prompt: '0.
-    completion: '0.
+    prompt: '0.0000008'
+    completion: '0.0000012'
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 16384
+    max_completion_tokens: 4096
     is_moderated: false
   :per_request_limits:
 - :id: qwen/qwen-2.5-coder-32b-instruct
@@ -1174,7 +1602,7 @@
     is_moderated: false
   :per_request_limits:
 - :id: thedrummer/unslopnemo-12b
-  :name: Unslopnemo
+  :name: Unslopnemo 12B
   :created: 1731103448
   :description: UnslopNemo v4.1 is the latest addition from the creator of Rocinante,
     designed for adventure writing and role-play scenarios.
@@ -1482,6 +1910,28 @@
     request: '0'
   :top_provider:
     context_length: 32768
+    max_completion_tokens: 8192
+    is_moderated: false
+  :per_request_limits:
+- :id: nvidia/llama-3.1-nemotron-70b-instruct:free
+  :name: 'NVIDIA: Llama 3.1 Nemotron 70B Instruct (free)'
+  :created: 1728950400
+  :description: |-
+    NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.
+
+    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+  :context_length: 131072
+  :architecture:
+    modality: text->text
+    tokenizer: Llama3
+    instruct_type: llama3
+  :pricing:
+    prompt: '0'
+    completion: '0'
+    image: '0'
+    request: '0'
+  :top_provider:
+    context_length: 131072
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
@@ -1648,32 +2098,6 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3.2-3b-instruct:free
-  :name: 'Meta: Llama 3.2 3B Instruct (free)'
-  :created: 1727222400
-  :description: |-
-    Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
-
-    Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
-
-    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
-
-    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
-  :context_length: 4096
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 4096
-    max_completion_tokens: 2048
-    is_moderated: false
-  :per_request_limits:
 - :id: meta-llama/llama-3.2-3b-instruct
   :name: 'Meta: Llama 3.2 3B Instruct'
   :created: 1727222400
@@ -1711,7 +2135,7 @@
     Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
 
     Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
-  :context_length:
+  :context_length: 131072
   :architecture:
     modality: text->text
     tokenizer: Llama3
|
|
1722
2146
|
image: '0'
|
1723
2147
|
request: '0'
|
1724
2148
|
:top_provider:
|
1725
|
-
context_length:
|
1726
|
-
max_completion_tokens:
|
2149
|
+
context_length: 131072
|
2150
|
+
max_completion_tokens:
|
1727
2151
|
is_moderated: false
|
1728
2152
|
:per_request_limits:
|
1729
2153
|
- :id: meta-llama/llama-3.2-1b-instruct
|
@@ -1752,8 +2176,8 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3.2-90b-vision-instruct
-  :name: 'Meta: Llama 3.2 90B Vision Instruct
+- :id: meta-llama/llama-3.2-90b-vision-instruct
+  :name: 'Meta: Llama 3.2 90B Vision Instruct'
   :created: 1727222400
   :description: |-
     The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.
@@ -1769,41 +2193,15 @@
     tokenizer: Llama3
     instruct_type: llama3
   :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
+    prompt: '0.0000008'
+    completion: '0.0000016'
+    image: '0.0051456'
     request: '0'
   :top_provider:
     context_length: 4096
     max_completion_tokens: 2048
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3.2-90b-vision-instruct
-  :name: 'Meta: Llama 3.2 90B Vision Instruct'
-  :created: 1727222400
-  :description: |-
-    The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.
-
-    This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.
-
-    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
-
-    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
-  :context_length: 131072
-  :architecture:
-    modality: text+image->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0.0000009'
-    completion: '0.0000009'
-    image: '0.001301'
-    request: '0'
-  :top_provider:
-    context_length: 131072
-    max_completion_tokens:
-    is_moderated: false
-  :per_request_limits:
 - :id: meta-llama/llama-3.2-11b-vision-instruct:free
   :name: 'Meta: Llama 3.2 11B Vision Instruct (free)'
   :created: 1727222400
@@ -1841,7 +2239,7 @@
     Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
 
     Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
-  :context_length:
+  :context_length: 16384
   :architecture:
     modality: text+image->text
     tokenizer: Llama3
@@ -1849,11 +2247,11 @@
   :pricing:
     prompt: '0.000000055'
     completion: '0.000000055'
-    image: '0
+    image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 16384
+    max_completion_tokens:
     is_moderated: false
   :per_request_limits:
 - :id: qwen/qwen-2.5-72b-instruct
@@ -1871,19 +2269,19 @@
     - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
 
     Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
-  :context_length:
+  :context_length: 128000
   :architecture:
     modality: text->text
     tokenizer: Qwen
     instruct_type: chatml
   :pricing:
-    prompt: '0.
+    prompt: '0.00000013'
     completion: '0.0000004'
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 128000
+    max_completion_tokens:
     is_moderated: false
   :per_request_limits:
 - :id: qwen/qwen-2-vl-72b-instruct
@@ -2064,7 +2462,7 @@
 
     Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
 
-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -2088,7 +2486,7 @@
 
     Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
 
-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -2136,32 +2534,6 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
-- :id: google/gemini-flash-1.5-exp
-  :name: 'Google: Gemini Flash 1.5 Experimental'
-  :created: 1724803200
-  :description: |-
-    Gemini 1.5 Flash Experimental is an experimental version of the [Gemini 1.5 Flash](/models/google/gemini-flash-1.5) model.
-
-    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
-
-    #multimodal
-
-    Note: This model is experimental and not suited for production use-cases. It may be removed or redirected to another model in the future.
-  :context_length: 1000000
-  :architecture:
-    modality: text+image->text
-    tokenizer: Gemini
-    instruct_type:
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 1000000
-    max_completion_tokens: 8192
-    is_moderated: false
-  :per_request_limits:
 - :id: sao10k/l3.1-euryale-70b
   :name: 'Sao10K: Llama 3.1 Euryale 70B v2.2'
   :created: 1724803200
@@ -2179,7 +2551,7 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: google/gemini-flash-1.5-8b-exp
@@ -2317,7 +2689,7 @@
     The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
 
     Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
-  :context_length:
+  :context_length: 131072
   :architecture:
     modality: text->text
     tokenizer: Llama3
@@ -2328,8 +2700,8 @@
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 131072
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: perplexity/llama-3.1-sonar-huge-128k-online
@@ -2396,7 +2768,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
     is_moderated: false
   :per_request_limits:
 - :id: aetherwiing/mn-starcannon-12b
@@ -2515,30 +2887,6 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
-- :id: google/gemini-pro-1.5-exp
-  :name: 'Google: Gemini Pro 1.5 Experimental'
-  :created: 1722470400
-  :description: |-
-    Gemini 1.5 Pro Experimental is a bleeding-edge version of the [Gemini 1.5 Pro](/models/google/gemini-pro-1.5) model. Because it's currently experimental, it will be **heavily rate-limited** by Google.
-
-    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
-
-    #multimodal
-  :context_length: 1000000
-  :architecture:
-    modality: text+image->text
-    tokenizer: Gemini
-    instruct_type:
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 1000000
-    max_completion_tokens: 8192
-    is_moderated: false
-  :per_request_limits:
 - :id: perplexity/llama-3.1-sonar-large-128k-chat
   :name: 'Perplexity: Llama 3.1 Sonar 70B'
   :created: 1722470400
@@ -2605,32 +2953,6 @@
     max_completion_tokens:
     is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3.1-405b-instruct:free
-  :name: 'Meta: Llama 3.1 405B Instruct (free)'
-  :created: 1721692800
-  :description: |-
-    The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.
-
-    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 8000
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8000
-    max_completion_tokens: 4000
-    is_moderated: false
-  :per_request_limits:
 - :id: meta-llama/llama-3.1-405b-instruct
   :name: 'Meta: Llama 3.1 405B Instruct'
   :created: 1721692800
@@ -2654,33 +2976,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
-- :id: meta-llama/llama-3.1-405b-instruct:nitro
-  :name: 'Meta: Llama 3.1 405B Instruct (nitro)'
-  :created: 1721692800
-  :description: |-
-    The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.
-
-    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 8000
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0.00001462'
-    completion: '0.00001462'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8000
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: meta-llama/llama-3.1-8b-instruct:free
@@ -2692,7 +2988,7 @@
     It has demonstrated strong performance compared to leading closed-source models in human evaluations.

     To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length:
+  :context_length: 131072
   :architecture:
     modality: text->text
     tokenizer: Llama3
@@ -2703,8 +2999,8 @@
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 131072
+    max_completion_tokens:
   is_moderated: false
   :per_request_limits:
 - :id: meta-llama/llama-3.1-8b-instruct
@@ -2728,31 +3024,7 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
-- :id: meta-llama/llama-3.1-70b-instruct:free
-  :name: 'Meta: Llama 3.1 70B Instruct (free)'
-  :created: 1721692800
-  :description: |-
-    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 8192
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8192
-    max_completion_tokens: 4096
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: meta-llama/llama-3.1-70b-instruct
@@ -2776,31 +3048,31 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
-- :id:
-  :name: '
-  :created:
+- :id: mistralai/mistral-nemo:free
+  :name: 'Mistral: Mistral Nemo (free)'
+  :created: 1721347200
   :description: |-
-
+    A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.

-
+    The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

-
-  :context_length:
+    It supports function calling and is released under the Apache 2.0 license.
+  :context_length: 128000
   :architecture:
     modality: text->text
-    tokenizer:
-    instruct_type:
+    tokenizer: Mistral
+    instruct_type: mistral
   :pricing:
-    prompt: '0
-    completion: '0
+    prompt: '0'
+    completion: '0'
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
+    context_length: 128000
+    max_completion_tokens: 128000
   is_moderated: false
   :per_request_limits:
 - :id: mistralai/mistral-nemo
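The mistralai/mistral-nemo:free record filled in above is one of several `:free` variants whose pricing strings are all zero. A short sketch (reusing `models` from the earlier snippet; again plain Ruby, not ai_client API) that lists such entries:

    # Rates are stored as strings ('0', '0.0000009', ...), so parse before
    # comparing; an entry counts as free when both token rates are zero.
    free_ids = models.select { |m|
      m[:pricing].values_at("prompt", "completion").all? { |rate| rate.to_f.zero? }
    }.map { |m| m[:id] }

    puts free_ids # includes "mistralai/mistral-nemo:free", among others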
@@ -2824,7 +3096,7 @@
     request: '0'
   :top_provider:
     context_length: 131072
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: mistralai/codestral-mamba
@@ -2877,89 +3149,37 @@
     image: '0.007225'
     request: '0'
   :top_provider:
-    context_length: 128000
-    max_completion_tokens: 16384
-  is_moderated: true
-  :per_request_limits:
-- :id: openai/gpt-4o-mini-2024-07-18
-  :name: 'OpenAI: GPT-4o-mini (2024-07-18)'
-  :created: 1721260800
-  :description: |-
-    GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.
-
-    As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
-
-    GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/).
-
-    Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
-
-    #multimodal
-  :context_length: 128000
-  :architecture:
-    modality: text+image->text
-    tokenizer: GPT
-    instruct_type:
-  :pricing:
-    prompt: '0.00000015'
-    completion: '0.0000006'
-    image: '0.007225'
-    request: '0'
-  :top_provider:
-    context_length: 128000
-    max_completion_tokens: 16384
-  is_moderated: true
-  :per_request_limits:
-- :id: qwen/qwen-2-7b-instruct:free
-  :name: Qwen 2 7B Instruct (free)
-  :created: 1721088000
-  :description: |-
-    Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
-
-    It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
-
-    For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).
-
-    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
-  :context_length: 8192
-  :architecture:
-    modality: text->text
-    tokenizer: Qwen
-    instruct_type: chatml
-  :pricing:
-    prompt: '0'
-    completion: '0'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8192
-    max_completion_tokens: 4096
-  is_moderated: false
+    context_length: 128000
+    max_completion_tokens: 16384
+  is_moderated: true
   :per_request_limits:
-- :id:
-  :name:
-  :created:
+- :id: openai/gpt-4o-mini-2024-07-18
+  :name: 'OpenAI: GPT-4o-mini (2024-07-18)'
+  :created: 1721260800
   :description: |-
-
+    GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.

-
+    As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.

-
+    GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/).

-
-
+    Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
+
+    #multimodal
+  :context_length: 128000
   :architecture:
-    modality: text->text
-    tokenizer:
-    instruct_type:
+    modality: text+image->text
+    tokenizer: GPT
+    instruct_type:
   :pricing:
-    prompt: '0.
-    completion: '0.
-    image: '0'
+    prompt: '0.00000015'
+    completion: '0.0000006'
+    image: '0.007225'
     request: '0'
   :top_provider:
-    context_length:
-    max_completion_tokens:
-  is_moderated:
+    context_length: 128000
+    max_completion_tokens: 16384
+  is_moderated: true
   :per_request_limits:
 - :id: google/gemma-2-27b-it
   :name: 'Google: Gemma 2 27B'
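The pricing fields are per-token USD rates, so the re-added gpt-4o-mini entry above ('0.00000015' prompt, '0.0000006' completion) works out to $0.15 and $0.60 per million tokens, matching OpenAI's published pricing for this model. A small worked example, continuing the earlier sketch:

    mini = models.find { |m| m[:id] == "openai/gpt-4o-mini-2024-07-18" }

    prompt_rate     = mini[:pricing]["prompt"].to_f     # 1.5e-07 USD per token
    completion_rate = mini[:pricing]["completion"].to_f # 6.0e-07 USD per token

    # Cost of a hypothetical call: 10,000 prompt tokens in, 2,000 completion tokens out.
    cost = 10_000 * prompt_rate + 2_000 * completion_rate
    puts format("$%.4f", cost) # => $0.0027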
@@ -2982,7 +3202,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: alpindale/magnum-72b
@@ -3052,7 +3272,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: 01-ai/yi-large
@@ -3187,7 +3407,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: cognitivecomputations/dolphin-mixtral-8x22b
@@ -3233,13 +3453,13 @@
     tokenizer: Qwen
     instruct_type: chatml
   :pricing:
-    prompt: '0.
-    completion: '0.
+    prompt: '0.0000009'
+    completion: '0.0000009'
     image: '0'
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 4096
   is_moderated: false
   :per_request_limits:
 - :id: mistralai/mistral-7b-instruct:free
@@ -3283,29 +3503,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
-- :id: mistralai/mistral-7b-instruct:nitro
-  :name: 'Mistral: Mistral 7B Instruct (nitro)'
-  :created: 1716768000
-  :description: |-
-    A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
-
-    *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
-  :context_length: 32768
-  :architecture:
-    modality: text->text
-    tokenizer: Mistral
-    instruct_type: mistral
-  :pricing:
-    prompt: '0.00000007'
-    completion: '0.00000007'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: mistralai/mistral-7b-instruct-v0.3
@@ -3333,7 +3531,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: nousresearch/hermes-2-pro-llama-3-8b
@@ -3669,32 +3867,30 @@
     max_completion_tokens: 2048
   is_moderated: false
   :per_request_limits:
-- :id:
-  :name:
-  :created:
+- :id: sao10k/fimbulvetr-11b-v2
+  :name: Fimbulvetr 11B v2
+  :created: 1713657600
   :description: |-
-
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+    Creative writing model, routed with permission. It's fast, it keeps the conversation going, and it stays in character.

-
-  :context_length:
+    If you submit a raw prompt, you can use Alpaca or Vicuna formats.
+  :context_length: 4096
   :architecture:
     modality: text->text
-    tokenizer:
-    instruct_type:
+    tokenizer: Llama2
+    instruct_type: alpaca
   :pricing:
-    prompt: '0'
-    completion: '0'
+    prompt: '0.0000008'
+    completion: '0.0000012'
     image: '0'
     request: '0'
   :top_provider:
-    context_length:
+    context_length: 4096
     max_completion_tokens: 4096
   is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct
-  :name: 'Meta: Llama 3 8B Instruct'
+- :id: meta-llama/llama-3-8b-instruct:free
+  :name: 'Meta: Llama 3 8B Instruct (free)'
   :created: 1713398400
   :description: |-
     Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
@@ -3708,8 +3904,8 @@
     tokenizer: Llama3
     instruct_type: llama3
   :pricing:
-    prompt: '0
-    completion: '0
+    prompt: '0'
+    completion: '0'
     image: '0'
     request: '0'
   :top_provider:
@@ -3717,32 +3913,8 @@
     max_completion_tokens: 4096
   is_moderated: false
   :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct
-  :name: 'Meta: Llama 3 8B Instruct
-  :created: 1713398400
-  :description: |-
-    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 16384
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0.0000001875'
-    completion: '0.000001125'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 16384
-    max_completion_tokens: 2048
-  is_moderated: false
-  :per_request_limits:
-- :id: meta-llama/llama-3-8b-instruct:nitro
-  :name: 'Meta: Llama 3 8B Instruct (nitro)'
+- :id: meta-llama/llama-3-8b-instruct
+  :name: 'Meta: Llama 3 8B Instruct'
   :created: 1713398400
   :description: |-
     Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.
@@ -3756,13 +3928,13 @@
     tokenizer: Llama3
     instruct_type: llama3
   :pricing:
-    prompt: '0.
-    completion: '0.
+    prompt: '0.00000003'
+    completion: '0.00000006'
    image: '0'
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: meta-llama/llama-3-70b-instruct
@@ -3786,31 +3958,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
-- :id: meta-llama/llama-3-70b-instruct:nitro
-  :name: 'Meta: Llama 3 70B Instruct (nitro)'
-  :created: 1713398400
-  :description: |-
-    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases.
-
-    It has demonstrated strong performance compared to leading closed-source models in human evaluations.
-
-    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-  :context_length: 8192
-  :architecture:
-    modality: text->text
-    tokenizer: Llama3
-    instruct_type: llama3
-  :pricing:
-    prompt: '0.00000088'
-    completion: '0.00000088'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: mistralai/mixtral-8x22b-instruct
@@ -3862,7 +4010,7 @@
     request: '0'
   :top_provider:
     context_length: 65536
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: microsoft/wizardlm-2-7b
@@ -3956,7 +4104,7 @@

     It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).

-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -3980,7 +4128,7 @@

     It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).

-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -4050,7 +4198,7 @@
   :description: |-
     Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models.

-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 4096
   :architecture:
     modality: text->text
@@ -4074,7 +4222,7 @@

     Read the launch post [here](https://txt.cohere.com/command-r/).

-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -4244,7 +4392,7 @@

     Read the launch post [here](https://txt.cohere.com/command-r/).

-    Use of this model is subject to Cohere's [
+    Use of this model is subject to Cohere's [Usage Policy](https://docs.cohere.com/docs/usage-policy) and [SaaS Agreement](https://cohere.com/saas-agreement).
   :context_length: 128000
   :architecture:
     modality: text->text
@@ -4554,29 +4702,7 @@
     request: '0'
   :top_provider:
     context_length: 32768
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
-- :id: mistralai/mixtral-8x7b-instruct:nitro
-  :name: 'Mistral: Mixtral 8x7B Instruct (nitro)'
-  :created: 1702166400
-  :description: |-
-    Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.
-
-    Instruct model fine-tuned by Mistral. #moe
-  :context_length: 32768
-  :architecture:
-    modality: text->text
-    tokenizer: Mistral
-    instruct_type: mistral
-  :pricing:
-    prompt: '0.0000005'
-    completion: '0.0000005'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 32768
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: openchat/openchat-7b:free
@@ -4626,7 +4752,7 @@
     request: '0'
   :top_provider:
     context_length: 8192
-    max_completion_tokens:
+    max_completion_tokens: 8192
   is_moderated: false
   :per_request_limits:
 - :id: neversleep/noromaid-20b
@@ -4753,7 +4879,7 @@
     request: '0'
   :top_provider:
     context_length: 4096
-    max_completion_tokens:
+    max_completion_tokens: 4096
   is_moderated: false
   :per_request_limits:
 - :id: undi95/toppy-m-7b:free
@@ -4784,34 +4910,6 @@
     max_completion_tokens: 2048
   is_moderated: false
   :per_request_limits:
-- :id: undi95/toppy-m-7b:nitro
-  :name: Toppy M 7B (nitro)
-  :created: 1699574400
-  :description: |-
-    A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
-    List of merged models:
-    - NousResearch/Nous-Capybara-7B-V1.9
-    - [HuggingFaceH4/zephyr-7b-beta](/models/huggingfaceh4/zephyr-7b-beta)
-    - lemonilia/AshhLimaRP-Mistral-7B
-    - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
-    - Undi95/Mistral-pippa-sharegpt-7b-qlora
-
-    #merge #uncensored
-  :context_length: 4096
-  :architecture:
-    modality: text->text
-    tokenizer: Mistral
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.00000007'
-    completion: '0.00000007'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 4096
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
 - :id: undi95/toppy-m-7b
   :name: Toppy M 7B
   :created: 1699574400
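A pattern worth noting across this file: the `:nitro` and `:extended` routing variants (undi95/toppy-m-7b:nitro above, the MythoMax and ReMM SLERP variants further down) are removed while their base ids remain. Code pinned to a variant id will no longer find a catalog entry; a hypothetical fallback helper (not part of ai_client) could strip the suffix:

    # Resolve a possibly-retired variant id ("vendor/model:nitro") to an id
    # that still exists in the catalog; returns nil if neither form is present.
    def resolve_model_id(models, id)
      return id if models.any? { |m| m[:id] == id }
      base = id.split(":").first
      base if models.any? { |m| m[:id] == base }
    end

    resolve_model_id(models, "undi95/toppy-m-7b:nitro") # => "undi95/toppy-m-7b"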
@@ -4891,6 +4989,7 @@
     - [google/gemini-flash-1.5](/google/gemini-flash-1.5)
     - [mistralai/mistral-large-2407](/mistralai/mistral-large-2407)
     - [mistralai/mistral-nemo](/mistralai/mistral-nemo)
+    - [deepseek/deepseek-r1](/deepseek/deepseek-r1)
     - [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct)
     - [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct)
     - [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct)
@@ -5175,8 +5274,8 @@
     tokenizer: Llama2
     instruct_type: alpaca
   :pricing:
-    prompt: '0.
-    completion: '0.
+    prompt: '0.00000018'
+    completion: '0.00000018'
     image: '0'
     request: '0'
   :top_provider:
@@ -5287,26 +5386,6 @@
     max_completion_tokens: 4096
   is_moderated: false
   :per_request_limits:
-- :id: undi95/remm-slerp-l2-13b:extended
-  :name: ReMM SLERP 13B (extended)
-  :created: 1689984000
-  :description: 'A recreation trial of the original MythoMax-L2-B13 but with updated
-    models. #merge'
-  :context_length: 6144
-  :architecture:
-    modality: text->text
-    tokenizer: Llama2
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.000001125'
-    completion: '0.000001125'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 6144
-    max_completion_tokens: 512
-  is_moderated: false
-  :per_request_limits:
 - :id: google/palm-2-chat-bison
   :name: 'Google: PaLM 2 Chat'
   :created: 1689811200
@@ -5387,46 +5466,6 @@
     max_completion_tokens: 4096
   is_moderated: false
   :per_request_limits:
-- :id: gryphe/mythomax-l2-13b:nitro
-  :name: MythoMax 13B (nitro)
-  :created: 1688256000
-  :description: 'One of the highest performing and most popular fine-tunes of Llama
-    2 13B, with rich descriptions and roleplay. #merge'
-  :context_length: 4096
-  :architecture:
-    modality: text->text
-    tokenizer: Llama2
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.0000002'
-    completion: '0.0000002'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 4096
-    max_completion_tokens:
-  is_moderated: false
-  :per_request_limits:
-- :id: gryphe/mythomax-l2-13b:extended
-  :name: MythoMax 13B (extended)
-  :created: 1688256000
-  :description: 'One of the highest performing and most popular fine-tunes of Llama
-    2 13B, with rich descriptions and roleplay. #merge'
-  :context_length: 8192
-  :architecture:
-    modality: text->text
-    tokenizer: Llama2
-    instruct_type: alpaca
-  :pricing:
-    prompt: '0.000001125'
-    completion: '0.000001125'
-    image: '0'
-    request: '0'
-  :top_provider:
-    context_length: 8192
-    max_completion_tokens: 512
-  is_moderated: false
-  :per_request_limits:
 - :id: meta-llama/llama-2-13b-chat
   :name: 'Meta: Llama 2 13B Chat'
   :created: 1687219200