ai_client 0.2.3 → 0.2.5

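The diff below adds a YAML catalog of model metadata (symbol keys such as `:id` and `:pricing`, with prices stored as decimal strings). As a minimal, hypothetical sketch of how a catalog shaped like this could be consumed from Ruby (the data and selection logic here are illustrative, not the gem's actual API):

```ruby
require "yaml"

# Hypothetical excerpt mirroring the shape of the catalog added in this diff:
# top-level keys are Symbols (leading colon), nested pricing keys are Strings.
catalog = <<~YAML
  ---
  - :id: liquid/lfm-40b:free
    :pricing:
      prompt: '0'
      completion: '0'
  - :id: openai/o1-mini
    :pricing:
      prompt: '0.000003'
      completion: '0.000012'
YAML

# Symbol keys require explicitly permitting Symbol under safe loading.
models = YAML.safe_load(catalog, permitted_classes: [Symbol])

# Select the ids of models whose prompt and completion prices are both zero.
free_ids = models
  .select { |m| m[:pricing].values_at("prompt", "completion").all? { |p| p.to_f.zero? } }
  .map { |m| m[:id] }
```

Note that keys written as `:id:` in the YAML deserialize as Ruby Symbols, which `YAML.safe_load` rejects unless `permitted_classes: [Symbol]` is passed.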
@@ -0,0 +1,4839 @@
1
+ ---
2
+ - :id: google/gemini-flash-1.5-8b
3
+ :name: 'Google: Gemini 1.5 Flash-8B'
4
+ :created: 1727913600
5
+ :description: |-
6
+ Gemini 1.5 Flash-8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.
7
+
8
+ [Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/).
9
+
10
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
11
+ :context_length: 1000000
12
+ :architecture:
13
+ modality: text+image->text
14
+ tokenizer: Gemini
15
+ instruct_type:
16
+ :pricing:
17
+ prompt: '0.0000000375'
18
+ completion: '0.00000015'
19
+ image: '0'
20
+ request: '0'
21
+ :top_provider:
22
+ context_length: 1000000
23
+ max_completion_tokens: 8192
24
+ is_moderated: false
25
+ :per_request_limits:
26
+ prompt_tokens: '538608446'
27
+ completion_tokens: '134652111'
28
+ - :id: liquid/lfm-40b
29
+ :name: 'Liquid: LFM 40B MoE'
30
+ :created: 1727654400
31
+ :description: |-
32
+ Liquid's 40.3B Mixture of Experts (MoE) model. Liquid Foundation Models (LFMs) are large neural networks built with computational units rooted in dynamic systems.
33
+
34
+ LFMs are general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals.
35
+
36
+ See the [launch announcement](https://www.liquid.ai/liquid-foundation-models) for benchmarks and more info.
37
+ :context_length: 32768
38
+ :architecture:
39
+ modality: text->text
40
+ tokenizer: Other
41
+ instruct_type: vicuna
42
+ :pricing:
43
+ prompt: '0'
44
+ completion: '0'
45
+ image: '0'
46
+ request: '0'
47
+ :top_provider:
48
+ context_length: 32768
49
+ max_completion_tokens:
50
+ is_moderated: false
51
+ :per_request_limits:
52
+ prompt_tokens: Infinity
53
+ completion_tokens: Infinity
54
+ - :id: liquid/lfm-40b:free
55
+ :name: 'Liquid: LFM 40B MoE (free)'
56
+ :created: 1727654400
57
+ :description: |-
58
+ Liquid's 40.3B Mixture of Experts (MoE) model. Liquid Foundation Models (LFMs) are large neural networks built with computational units rooted in dynamic systems.
59
+
60
+ LFMs are general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals.
61
+
62
+ See the [launch announcement](https://www.liquid.ai/liquid-foundation-models) for benchmarks and more info.
63
+
64
+ _These are free, rate-limited endpoints for [LFM 40B MoE](/liquid/lfm-40b). Outputs may be cached. Read about rate limits [here](/docs/limits)._
65
+ :context_length: 32768
66
+ :architecture:
67
+ modality: text->text
68
+ tokenizer: Other
69
+ instruct_type: vicuna
70
+ :pricing:
71
+ prompt: '0'
72
+ completion: '0'
73
+ image: '0'
74
+ request: '0'
75
+ :top_provider:
76
+ context_length: 8192
77
+ max_completion_tokens: 4096
78
+ is_moderated: false
79
+ :per_request_limits:
80
+ prompt_tokens: Infinity
81
+ completion_tokens: Infinity
82
+ - :id: thedrummer/rocinante-12b
83
+ :name: Rocinante 12B
84
+ :created: 1727654400
85
+ :description: |-
86
+ Rocinante 12B is designed for engaging storytelling and rich prose.
87
+
88
+ Early testers have reported:
89
+ - Expanded vocabulary with unique and expressive word choices
90
+ - Enhanced creativity for vivid narratives
91
+ - Adventure-filled and captivating stories
92
+ :context_length: 32768
93
+ :architecture:
94
+ modality: text->text
95
+ tokenizer: Qwen
96
+ instruct_type: chatml
97
+ :pricing:
98
+ prompt: '0.00000025'
99
+ completion: '0.0000005'
100
+ image: '0'
101
+ request: '0'
102
+ :top_provider:
103
+ context_length: 32768
104
+ max_completion_tokens:
105
+ is_moderated: false
106
+ :per_request_limits:
107
+ prompt_tokens: '80791266'
108
+ completion_tokens: '40395633'
109
+ - :id: eva-unit-01/eva-qwen-2.5-14b
110
+ :name: EVA Qwen2.5 14B
111
+ :created: 1727654400
112
+ :description: |-
113
+ A model specializing in RP and creative writing, this model is based on Qwen2.5-14B, fine-tuned with a mixture of synthetic and natural data.
114
+
115
+ It is trained on 1.5M tokens of role-play data, and fine-tuned on 1.5M tokens of synthetic data.
116
+ :context_length: 32768
117
+ :architecture:
118
+ modality: text->text
119
+ tokenizer: Qwen
120
+ instruct_type: chatml
121
+ :pricing:
122
+ prompt: '0.00000025'
123
+ completion: '0.0000005'
124
+ image: '0'
125
+ request: '0'
126
+ :top_provider:
127
+ context_length: 32768
128
+ max_completion_tokens:
129
+ is_moderated: false
130
+ :per_request_limits:
131
+ prompt_tokens: '80791266'
132
+ completion_tokens: '40395633'
133
+ - :id: anthracite-org/magnum-v2-72b
134
+ :name: Magnum v2 72B
135
+ :created: 1727654400
136
+ :description: |-
137
+ From the maker of [Goliath](https://openrouter.ai/alpindale/goliath-120b), Magnum 72B is the seventh in a family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet.
138
+
139
+ The model is based on [Qwen2 72B](https://openrouter.ai/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.
140
+ :context_length: 32768
141
+ :architecture:
142
+ modality: text->text
143
+ tokenizer: Qwen
144
+ instruct_type: chatml
145
+ :pricing:
146
+ prompt: '0.00000375'
147
+ completion: '0.0000045'
148
+ image: '0'
149
+ request: '0'
150
+ :top_provider:
151
+ context_length: 32768
152
+ max_completion_tokens:
153
+ is_moderated: false
154
+ :per_request_limits:
155
+ prompt_tokens: '5386084'
156
+ completion_tokens: '4488403'
157
+ - :id: meta-llama/llama-3.2-3b-instruct:free
158
+ :name: 'Meta: Llama 3.2 3B Instruct (free)'
159
+ :created: 1727222400
160
+ :description: |-
161
+ Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
162
+
163
+ Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
164
+
165
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
166
+
167
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
168
+
169
+ _These are free, rate-limited endpoints for [Llama 3.2 3B Instruct](/meta-llama/llama-3.2-3b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
170
+ :context_length: 131072
171
+ :architecture:
172
+ modality: text->text
173
+ tokenizer: Llama3
174
+ instruct_type: llama3
175
+ :pricing:
176
+ prompt: '0'
177
+ completion: '0'
178
+ image: '0'
179
+ request: '0'
180
+ :top_provider:
181
+ context_length: 4096
182
+ max_completion_tokens: 2048
183
+ is_moderated: false
184
+ :per_request_limits:
185
+ prompt_tokens: Infinity
186
+ completion_tokens: Infinity
187
+ - :id: meta-llama/llama-3.2-3b-instruct
188
+ :name: 'Meta: Llama 3.2 3B Instruct'
189
+ :created: 1727222400
190
+ :description: |-
191
+ Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.
192
+
193
+ Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.
194
+
195
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
196
+
197
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
198
+ :context_length: 131072
199
+ :architecture:
200
+ modality: text->text
201
+ tokenizer: Llama3
202
+ instruct_type: llama3
203
+ :pricing:
204
+ prompt: '0.00000003'
205
+ completion: '0.00000005'
206
+ image: '0'
207
+ request: '0'
208
+ :top_provider:
209
+ context_length: 131072
210
+ max_completion_tokens:
211
+ is_moderated: false
212
+ :per_request_limits:
213
+ prompt_tokens: '673260558'
214
+ completion_tokens: '403956334'
215
+ - :id: meta-llama/llama-3.2-1b-instruct:free
216
+ :name: 'Meta: Llama 3.2 1B Instruct (free)'
217
+ :created: 1727222400
218
+ :description: |-
219
+ Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.
220
+
221
+ Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.
222
+
223
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
224
+
225
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
226
+
227
+ _These are free, rate-limited endpoints for [Llama 3.2 1B Instruct](/meta-llama/llama-3.2-1b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
228
+ :context_length: 131072
229
+ :architecture:
230
+ modality: text->text
231
+ tokenizer: Llama3
232
+ instruct_type: llama3
233
+ :pricing:
234
+ prompt: '0'
235
+ completion: '0'
236
+ image: '0'
237
+ request: '0'
238
+ :top_provider:
239
+ context_length: 4096
240
+ max_completion_tokens: 2048
241
+ is_moderated: false
242
+ :per_request_limits:
243
+ prompt_tokens: Infinity
244
+ completion_tokens: Infinity
245
+ - :id: meta-llama/llama-3.2-1b-instruct
246
+ :name: 'Meta: Llama 3.2 1B Instruct'
247
+ :created: 1727222400
248
+ :description: |-
249
+ Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.
250
+
251
+ Supporting eight core languages and fine-tunable for more, Llama 3.2 1B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.
252
+
253
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).
254
+
255
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
256
+ :context_length: 131072
257
+ :architecture:
258
+ modality: text->text
259
+ tokenizer: Llama3
260
+ instruct_type: llama3
261
+ :pricing:
262
+ prompt: '0.00000001'
263
+ completion: '0.00000002'
264
+ image: '0'
265
+ request: '0'
266
+ :top_provider:
267
+ context_length: 131072
268
+ max_completion_tokens:
269
+ is_moderated: false
270
+ :per_request_limits:
271
+ prompt_tokens: '2019781674'
272
+ completion_tokens: '1009890837'
273
+ - :id: meta-llama/llama-3.2-90b-vision-instruct
274
+ :name: 'Meta: Llama 3.2 90B Vision Instruct'
275
+ :created: 1727222400
276
+ :description: |-
277
+ The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.
278
+
279
+ This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.
280
+
281
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
282
+
283
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
284
+ :context_length: 131072
285
+ :architecture:
286
+ modality: text+image->text
287
+ tokenizer: Llama3
288
+ instruct_type: llama3
289
+ :pricing:
290
+ prompt: '0.00000035'
291
+ completion: '0.0000004'
292
+ image: '0.00050575'
293
+ request: '0'
294
+ :top_provider:
295
+ context_length: 8192
296
+ max_completion_tokens:
297
+ is_moderated: false
298
+ :per_request_limits:
299
+ prompt_tokens: '57708047'
300
+ completion_tokens: '50494541'
301
+ - :id: meta-llama/llama-3.2-11b-vision-instruct:free
302
+ :name: 'Meta: Llama 3.2 11B Vision Instruct (free)'
303
+ :created: 1727222400
304
+ :description: |-
305
+ Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.
306
+
307
+ Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.
308
+
309
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
310
+
311
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
312
+
313
+ _These are free, rate-limited endpoints for [Llama 3.2 11B Vision Instruct](/meta-llama/llama-3.2-11b-vision-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
314
+ :context_length: 131072
315
+ :architecture:
316
+ modality: text+image->text
317
+ tokenizer: Llama3
318
+ instruct_type: llama3
319
+ :pricing:
320
+ prompt: '0'
321
+ completion: '0'
322
+ image: '0'
323
+ request: '0'
324
+ :top_provider:
325
+ context_length: 8192
326
+ max_completion_tokens: 4096
327
+ is_moderated: false
328
+ :per_request_limits:
329
+ prompt_tokens: Infinity
330
+ completion_tokens: Infinity
331
+ - :id: meta-llama/llama-3.2-11b-vision-instruct
332
+ :name: 'Meta: Llama 3.2 11B Vision Instruct'
333
+ :created: 1727222400
334
+ :description: |-
335
+ Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.
336
+
337
+ Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.
338
+
339
+ Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).
340
+
341
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
342
+ :context_length: 131072
343
+ :architecture:
344
+ modality: text+image->text
345
+ tokenizer: Llama3
346
+ instruct_type: llama3
347
+ :pricing:
348
+ prompt: '0.000000055'
349
+ completion: '0.000000055'
350
+ image: '0.000079475'
351
+ request: '0'
352
+ :top_provider:
353
+ context_length: 131072
354
+ max_completion_tokens:
355
+ is_moderated: false
356
+ :per_request_limits:
357
+ prompt_tokens: '367233031'
358
+ completion_tokens: '367233031'
359
+ - :id: qwen/qwen-2.5-72b-instruct
360
+ :name: Qwen2.5 72B Instruct
361
+ :created: 1726704000
362
+ :description: |-
363
+ Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:
364
+
365
+ - Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
366
+
367
+ - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.
368
+
369
+ - Long-context support of up to 128K tokens, with generation of up to 8K tokens.
370
+
371
+ - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
372
+
373
+ Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
374
+ :context_length: 131072
375
+ :architecture:
376
+ modality: text+image->text
377
+ tokenizer: Qwen
378
+ instruct_type: chatml
379
+ :pricing:
380
+ prompt: '0.00000035'
381
+ completion: '0.0000004'
382
+ image: '0'
383
+ request: '0'
384
+ :top_provider:
385
+ context_length: 32000
386
+ max_completion_tokens:
387
+ is_moderated: false
388
+ :per_request_limits:
389
+ prompt_tokens: '57708047'
390
+ completion_tokens: '50494541'
391
+ - :id: qwen/qwen-2-vl-72b-instruct
392
+ :name: Qwen2-VL 72B Instruct
393
+ :created: 1726617600
394
+ :description: |-
395
+ Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements:
396
+
397
+ - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
398
+
399
+ - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
400
+
401
+ - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
402
+
403
+ - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
404
+
405
+ For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).
406
+
407
+ Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
408
+ :context_length: 32768
409
+ :architecture:
410
+ modality: text+image->text
411
+ tokenizer: Qwen
412
+ instruct_type: chatml
413
+ :pricing:
414
+ prompt: '0.0000004'
415
+ completion: '0.0000004'
416
+ image: '0.000578'
417
+ request: '0'
418
+ :top_provider:
419
+ context_length: 4096
420
+ max_completion_tokens:
421
+ is_moderated: false
422
+ :per_request_limits:
423
+ prompt_tokens: '50494541'
424
+ completion_tokens: '50494541'
425
+ - :id: neversleep/llama-3.1-lumimaid-8b
426
+ :name: Lumimaid v0.2 8B
427
+ :created: 1726358400
428
+ :description: |-
429
+ Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chat outputs were purged.
430
+
431
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
432
+ :context_length: 131072
433
+ :architecture:
434
+ modality: text->text
435
+ tokenizer: Llama3
436
+ instruct_type: llama3
437
+ :pricing:
438
+ prompt: '0.0000001875'
439
+ completion: '0.000001125'
440
+ image: '0'
441
+ request: '0'
442
+ :top_provider:
443
+ context_length: 32768
444
+ max_completion_tokens: 2048
445
+ is_moderated: false
446
+ :per_request_limits:
447
+ prompt_tokens: '107721689'
448
+ completion_tokens: '17953614'
449
+ - :id: openai/o1-mini-2024-09-12
450
+ :name: 'OpenAI: o1-mini (2024-09-12)'
451
+ :created: 1726099200
452
+ :description: |-
453
+ The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.
454
+
455
+ The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
456
+
457
+ Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
458
+ :context_length: 128000
459
+ :architecture:
460
+ modality: text->text
461
+ tokenizer: GPT
462
+ instruct_type:
463
+ :pricing:
464
+ prompt: '0.000003'
465
+ completion: '0.000012'
466
+ image: '0'
467
+ request: '0'
468
+ :top_provider:
469
+ context_length: 128000
470
+ max_completion_tokens: 65536
471
+ is_moderated: true
472
+ :per_request_limits:
473
+ prompt_tokens: '6732605'
474
+ completion_tokens: '1683151'
475
+ - :id: openai/o1-mini
476
+ :name: 'OpenAI: o1-mini'
477
+ :created: 1726099200
478
+ :description: |-
479
+ The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.
480
+
481
+ The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
482
+
483
+ Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
484
+ :context_length: 128000
485
+ :architecture:
486
+ modality: text->text
487
+ tokenizer: GPT
488
+ instruct_type:
489
+ :pricing:
490
+ prompt: '0.000003'
491
+ completion: '0.000012'
492
+ image: '0'
493
+ request: '0'
494
+ :top_provider:
495
+ context_length: 128000
496
+ max_completion_tokens: 65536
497
+ is_moderated: true
498
+ :per_request_limits:
499
+ prompt_tokens: '6732605'
500
+ completion_tokens: '1683151'
501
+ - :id: openai/o1-preview-2024-09-12
502
+ :name: 'OpenAI: o1-preview (2024-09-12)'
503
+ :created: 1726099200
504
+ :description: |-
505
+ The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.
506
+
507
+ The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
508
+
509
+ Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
510
+ :context_length: 128000
511
+ :architecture:
512
+ modality: text->text
513
+ tokenizer: GPT
514
+ instruct_type:
515
+ :pricing:
516
+ prompt: '0.000015'
517
+ completion: '0.00006'
518
+ image: '0'
519
+ request: '0'
520
+ :top_provider:
521
+ context_length: 128000
522
+ max_completion_tokens: 32768
523
+ is_moderated: true
524
+ :per_request_limits:
525
+ prompt_tokens: '1346521'
526
+ completion_tokens: '336630'
527
+ - :id: openai/o1-preview
528
+ :name: 'OpenAI: o1-preview'
529
+ :created: 1726099200
530
+ :description: |-
531
+ The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.
532
+
533
+ The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).
534
+
535
+ Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
536
+ :context_length: 128000
537
+ :architecture:
538
+ modality: text->text
539
+ tokenizer: GPT
540
+ instruct_type:
541
+ :pricing:
542
+ prompt: '0.000015'
543
+ completion: '0.00006'
544
+ image: '0'
545
+ request: '0'
546
+ :top_provider:
547
+ context_length: 128000
548
+ max_completion_tokens: 32768
549
+ is_moderated: true
550
+ :per_request_limits:
551
+ prompt_tokens: '1346521'
552
+ completion_tokens: '336630'
553
+ - :id: mistralai/pixtral-12b
554
+ :name: 'Mistral: Pixtral 12B'
555
+ :created: 1725926400
556
+ :description: 'The first image-to-text model from Mistral AI. Its weights were launched
557
+ via torrent, per their tradition: https://x.com/mistralai/status/1833758285167722836'
558
+ :context_length: 4096
559
+ :architecture:
560
+ modality: text+image->text
561
+ tokenizer: Mistral
562
+ instruct_type: mistral
563
+ :pricing:
564
+ prompt: '0.0000001'
565
+ completion: '0.0000001'
566
+ image: '0.0001445'
567
+ request: '0'
568
+ :top_provider:
569
+ context_length: 4096
570
+ max_completion_tokens:
571
+ is_moderated: false
572
+ :per_request_limits:
573
+ prompt_tokens: '201978167'
574
+ completion_tokens: '201978167'
575
+ - :id: cohere/command-r-plus-08-2024
576
+ :name: 'Cohere: Command R+ (08-2024)'
577
+ :created: 1724976000
578
+ :description: |-
579
+ command-r-plus-08-2024 is an update of the [Command R+](/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.
580
+
581
+ Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
582
+
583
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
584
+ :context_length: 128000
585
+ :architecture:
586
+ modality: text->text
587
+ tokenizer: Cohere
588
+ instruct_type:
589
+ :pricing:
590
+ prompt: '0.000002375'
591
+ completion: '0.0000095'
592
+ image: '0'
593
+ request: '0'
594
+ :top_provider:
595
+ context_length: 128000
596
+ max_completion_tokens: 4000
597
+ is_moderated: false
598
+ :per_request_limits:
599
+ prompt_tokens: '8504343'
600
+ completion_tokens: '2126085'
601
+ - :id: cohere/command-r-08-2024
602
+ :name: 'Cohere: Command R (08-2024)'
603
+ :created: 1724976000
604
+ :description: |-
605
+ command-r-08-2024 is an update of the [Command R](/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model.
606
+
607
+ Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).
608
+
609
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
610
+ :context_length: 128000
611
+ :architecture:
612
+ modality: text->text
613
+ tokenizer: Cohere
614
+ instruct_type:
615
+ :pricing:
616
+ prompt: '0.0000001425'
617
+ completion: '0.00000057'
618
+ image: '0'
619
+ request: '0'
620
+ :top_provider:
621
+ context_length: 128000
622
+ max_completion_tokens: 4000
623
+ is_moderated: false
624
+ :per_request_limits:
625
+ prompt_tokens: '141739064'
626
+ completion_tokens: '35434766'
627
+ - :id: qwen/qwen-2-vl-7b-instruct
628
+ :name: Qwen2-VL 7B Instruct
629
+ :created: 1724803200
630
+ :description: |-
631
+ Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:
632
+
633
+ - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
634
+
635
+ - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
636
+
637
+ - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
638
+
639
+ - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
640
+
641
+ For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).
642
+
643
+ Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
644
+ :context_length: 32768
645
+ :architecture:
646
+ modality: text+image->text
647
+ tokenizer: Qwen
648
+ instruct_type: chatml
649
+ :pricing:
650
+ prompt: '0.0000001'
651
+ completion: '0.0000001'
652
+ image: '0.0001445'
653
+ request: '0'
654
+ :top_provider:
655
+ context_length: 4096
656
+ max_completion_tokens:
657
+ is_moderated: false
658
+ :per_request_limits:
659
+ prompt_tokens: '201978167'
660
+ completion_tokens: '201978167'
661
+ - :id: google/gemini-flash-1.5-8b-exp
662
+ :name: 'Google: Gemini Flash 8B 1.5 Experimental'
663
+ :created: 1724803200
664
+ :description: |-
665
+ Gemini 1.5 Flash 8B Experimental is an experimental, 8B parameter version of the [Gemini 1.5 Flash](/google/gemini-flash-1.5) model.
666
+
667
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
668
+
669
+ #multimodal
670
+
671
+ Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
672
+ :context_length: 1000000
673
+ :architecture:
674
+ modality: text+image->text
675
+ tokenizer: Gemini
676
+ instruct_type:
677
+ :pricing:
678
+ prompt: '0'
679
+ completion: '0'
680
+ image: '0'
681
+ request: '0'
682
+ :top_provider:
683
+ context_length: 1000000
684
+ max_completion_tokens: 8192
685
+ is_moderated: false
686
+ :per_request_limits:
687
+ prompt_tokens: Infinity
688
+ completion_tokens: Infinity
689
+ - :id: sao10k/l3.1-euryale-70b
690
+ :name: Llama 3.1 Euryale 70B v2.2
691
+ :created: 1724803200
692
+ :description: Euryale L3.1 70B v2.2, from [Sao10k](https://ko-fi.com/sao10k),
693
+ is a model focused on creative roleplay. It is the successor of [Euryale L3 70B v2.1](/sao10k/l3-euryale-70b).
694
+ :context_length: 8192
695
+ :architecture:
696
+ modality: text->text
697
+ tokenizer: Llama3
698
+ instruct_type: llama3
699
+ :pricing:
700
+ prompt: '0.00000035'
701
+ completion: '0.0000004'
702
+ image: '0'
703
+ request: '0'
704
+ :top_provider:
705
+ context_length: 128000
706
+ max_completion_tokens:
707
+ is_moderated: false
708
+ :per_request_limits:
709
+ prompt_tokens: '57708047'
710
+ completion_tokens: '50494541'
711
+ - :id: google/gemini-flash-1.5-exp
712
+ :name: 'Google: Gemini Flash 1.5 Experimental'
713
+ :created: 1724803200
714
+ :description: |-
715
+ Gemini 1.5 Flash Experimental is an experimental version of the [Gemini 1.5 Flash](/google/gemini-flash-1.5) model.
716
+
717
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
718
+
719
+ #multimodal
720
+
721
+ Note: This model is currently experimental and not suitable for production use cases, and may be heavily rate-limited.
722
+ :context_length: 1000000
723
+ :architecture:
724
+ modality: text+image->text
725
+ tokenizer: Gemini
726
+ instruct_type:
727
+ :pricing:
728
+ prompt: '0'
729
+ completion: '0'
730
+ image: '0'
731
+ request: '0'
732
+ :top_provider:
733
+ context_length: 1000000
734
+ max_completion_tokens: 8192
735
+ is_moderated: false
736
+ :per_request_limits:
737
+ prompt_tokens: Infinity
738
+ completion_tokens: Infinity
739
+ - :id: ai21/jamba-1-5-large
740
+ :name: 'AI21: Jamba 1.5 Large'
741
+ :created: 1724371200
742
+ :description: |-
743
+ Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.
744
+
745
+ It features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.
746
+
747
+ Built on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.
748
+
749
+ Read their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.
750
+ :context_length: 256000
751
+ :architecture:
752
+ modality: text->text
753
+ tokenizer: Other
754
+ instruct_type:
755
+ :pricing:
756
+ prompt: '0.000002'
757
+ completion: '0.000008'
758
+ image: '0'
759
+ request: '0'
760
+ :top_provider:
761
+ context_length: 256000
762
+ max_completion_tokens: 4096
763
+ is_moderated: false
764
+ :per_request_limits:
765
+ prompt_tokens: '10098908'
766
+ completion_tokens: '2524727'
767
+ - :id: ai21/jamba-1-5-mini
768
+ :name: 'AI21: Jamba 1.5 Mini'
769
+ :created: 1724371200
770
+ :description: |-
771
+ Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.
772
+
773
+ It works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.
774
+
775
+ This model uses less computer memory and works faster with longer texts than previous designs.
776
+
777
+ Read their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.
778
+ :context_length: 256000
779
+ :architecture:
780
+ modality: text->text
781
+ tokenizer: Other
782
+ instruct_type:
783
+ :pricing:
784
+ prompt: '0.0000002'
785
+ completion: '0.0000004'
786
+ image: '0'
787
+ request: '0'
788
+ :top_provider:
789
+ context_length: 256000
790
+ max_completion_tokens: 4096
791
+ is_moderated: false
792
+ :per_request_limits:
793
+ prompt_tokens: '100989083'
794
+ completion_tokens: '50494541'
795
+ - :id: microsoft/phi-3.5-mini-128k-instruct
796
+ :name: Phi-3.5 Mini 128K Instruct
797
+ :created: 1724198400
798
+ :description: |-
799
+ Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as [Phi-3 Mini](/microsoft/phi-3-mini-128k-instruct).
800
+
801
+ The models underwent a rigorous enhancement process, incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with fewer than 13 billion parameters.
802
+ :context_length: 128000
803
+ :architecture:
804
+ modality: text->text
805
+ tokenizer: Other
806
+ instruct_type: phi3
807
+ :pricing:
808
+ prompt: '0.0000001'
809
+ completion: '0.0000001'
810
+ image: '0'
811
+ request: '0'
812
+ :top_provider:
813
+ context_length: 128000
814
+ max_completion_tokens:
815
+ is_moderated: false
816
+ :per_request_limits:
817
+ prompt_tokens: '201978167'
818
+ completion_tokens: '201978167'
819
+ - :id: nousresearch/hermes-3-llama-3.1-70b
820
+ :name: 'Nous: Hermes 3 70B Instruct'
821
+ :created: 1723939200
822
+ :description: |-
823
+ Hermes 3 is a generalist language model with many improvements over [Hermes 2](/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
824
+
825
+ Hermes 3 70B is a competitive, if not superior, finetune of the [Llama-3.1 70B foundation model](/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
826
+
827
+ The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
828
+ :context_length: 131072
829
+ :architecture:
830
+ modality: text->text
831
+ tokenizer: Llama3
832
+ instruct_type: chatml
833
+ :pricing:
834
+ prompt: '0.0000004'
835
+ completion: '0.0000004'
836
+ image: '0'
837
+ request: '0'
838
+ :top_provider:
839
+ context_length: 12288
840
+ max_completion_tokens:
841
+ is_moderated: false
842
+ :per_request_limits:
843
+ prompt_tokens: '50494541'
844
+ completion_tokens: '50494541'
845
+ - :id: nousresearch/hermes-3-llama-3.1-405b:free
846
+ :name: 'Nous: Hermes 3 405B Instruct (free)'
847
+ :created: 1723766400
848
+ :description: |-
849
+ Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
850
+
851
+ Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
852
+
853
+ The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
854
+
855
+ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
856
+
857
+ _These are free, rate-limited endpoints for [Hermes 3 405B Instruct](/nousresearch/hermes-3-llama-3.1-405b). Outputs may be cached. Read about rate limits [here](/docs/limits)._
858
+ :context_length: 131072
859
+ :architecture:
860
+ modality: text->text
861
+ tokenizer: Llama3
862
+ instruct_type: chatml
863
+ :pricing:
864
+ prompt: '0'
865
+ completion: '0'
866
+ image: '0'
867
+ request: '0'
868
+ :top_provider:
869
+ context_length: 8192
870
+ max_completion_tokens: 4096
871
+ is_moderated: false
872
+ :per_request_limits:
873
+ prompt_tokens: Infinity
874
+ completion_tokens: Infinity
875
+ - :id: nousresearch/hermes-3-llama-3.1-405b
876
+ :name: 'Nous: Hermes 3 405B Instruct'
877
+ :created: 1723766400
878
+ :description: |-
879
+ Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
880
+
881
+ Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
882
+
883
+ The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
884
+
885
+ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
886
+ :context_length: 131072
887
+ :architecture:
888
+ modality: text->text
889
+ tokenizer: Llama3
890
+ instruct_type: chatml
891
+ :pricing:
892
+ prompt: '0.0000045'
893
+ completion: '0.0000045'
894
+ image: '0'
895
+ request: '0'
896
+ :top_provider:
897
+ context_length: 18000
898
+ max_completion_tokens:
899
+ is_moderated: false
900
+ :per_request_limits:
901
+ prompt_tokens: '4488403'
902
+ completion_tokens: '4488403'
903
+ - :id: nousresearch/hermes-3-llama-3.1-405b:extended
904
+ :name: 'Nous: Hermes 3 405B Instruct (extended)'
905
+ :created: 1723766400
906
+ :description: |-
907
+ Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
908
+
909
+ Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
910
+
911
+ The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
912
+
913
+ Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
914
+
915
+ _These are extended-context endpoints for [Hermes 3 405B Instruct](/nousresearch/hermes-3-llama-3.1-405b). They may have higher prices._
916
+ :context_length: 128000
917
+ :architecture:
918
+ modality: text->text
919
+ tokenizer: Llama3
920
+ instruct_type: chatml
921
+ :pricing:
922
+ prompt: '0.0000045'
923
+ completion: '0.0000045'
924
+ image: '0'
925
+ request: '0'
926
+ :top_provider:
927
+ context_length: 128000
928
+ max_completion_tokens:
929
+ is_moderated: false
930
+ :per_request_limits:
931
+ prompt_tokens: '4488403'
932
+ completion_tokens: '4488403'
933
+ - :id: perplexity/llama-3.1-sonar-huge-128k-online
934
+ :name: 'Perplexity: Llama 3.1 Sonar 405B Online'
935
+ :created: 1723593600
936
+ :description: Llama 3.1 Sonar is Perplexity's latest model family. It surpasses
937
+ their earlier Sonar models in cost-efficiency, speed, and performance. The model
938
+ is built upon the Llama 3.1 405B and has internet access.
939
+ :context_length: 127072
940
+ :architecture:
941
+ modality: text->text
942
+ tokenizer: Llama3
943
+ instruct_type:
944
+ :pricing:
945
+ prompt: '0.000005'
946
+ completion: '0.000005'
947
+ image: '0'
948
+ request: '0.005'
949
+ :top_provider:
950
+ context_length: 127072
951
+ max_completion_tokens:
952
+ is_moderated: false
953
+ :per_request_limits:
954
+ prompt_tokens: '4039563'
955
+ completion_tokens: '4039563'
956
+ - :id: openai/chatgpt-4o-latest
957
+ :name: 'OpenAI: ChatGPT-4o'
958
+ :created: 1723593600
959
+ :description: |-
960
+ Dynamic model continuously updated to the current version of [GPT-4o](/openai/gpt-4o) in ChatGPT. Intended for research and evaluation.
961
+
962
+ Note: This model is currently experimental and not suitable for production use cases, and may be heavily rate-limited.
963
+ :context_length: 128000
964
+ :architecture:
965
+ modality: text+image->text
966
+ tokenizer: GPT
967
+ instruct_type:
968
+ :pricing:
969
+ prompt: '0.000005'
970
+ completion: '0.000015'
971
+ image: '0.007225'
972
+ request: '0'
973
+ :top_provider:
974
+ context_length: 128000
975
+ max_completion_tokens: 16384
976
+ is_moderated: true
977
+ :per_request_limits:
978
+ prompt_tokens: '4039563'
979
+ completion_tokens: '1346521'
980
+ - :id: sao10k/l3-lunaris-8b
981
+ :name: Llama 3 8B Lunaris
982
+ :created: 1723507200
983
+ :description: |-
984
+ Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge.
985
+
986
+ Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning.
987
+
988
+ For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.
989
+ :context_length: 8192
990
+ :architecture:
991
+ modality: text->text
992
+ tokenizer: Llama3
993
+ instruct_type: llama3
994
+ :pricing:
995
+ prompt: '0.000002'
996
+ completion: '0.000002'
997
+ image: '0'
998
+ request: '0'
999
+ :top_provider:
1000
+ context_length: 8192
1001
+ max_completion_tokens:
1002
+ is_moderated: false
1003
+ :per_request_limits:
1004
+ prompt_tokens: '10098908'
1005
+ completion_tokens: '10098908'
1006
+ - :id: aetherwiing/mn-starcannon-12b
1007
+ :name: Mistral Nemo 12B Starcannon
1008
+ :created: 1723507200
1009
+ :description: |-
1010
+ Starcannon 12B is a creative roleplay and story writing model, using [nothingiisreal/mn-celeste-12b](https://openrouter.ai/nothingiisreal/mn-celeste-12b) as a base and [intervitens/mini-magnum-12b-v1.1](https://huggingface.co/intervitens/mini-magnum-12b-v1.1) merged in using the [TIES](https://arxiv.org/abs/2306.01708) method.
1011
+
1012
+ Although more similar to Magnum overall, the model remains very creative, with a pleasant writing style. It is recommended for people wanting more variety than Magnum, and yet more verbose prose than Celeste.
1013
+ :context_length: 12000
1014
+ :architecture:
1015
+ modality: text->text
1016
+ tokenizer: Mistral
1017
+ instruct_type: chatml
1018
+ :pricing:
1019
+ prompt: '0.000002'
1020
+ completion: '0.000002'
1021
+ image: '0'
1022
+ request: '0'
1023
+ :top_provider:
1024
+ context_length: 12000
1025
+ max_completion_tokens:
1026
+ is_moderated: false
1027
+ :per_request_limits:
1028
+ prompt_tokens: '10098908'
1029
+ completion_tokens: '10098908'
1030
+ - :id: openai/gpt-4o-2024-08-06
1031
+ :name: 'OpenAI: GPT-4o (2024-08-06)'
1032
+ :created: 1722902400
1033
+ :description: |-
1034
+ The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the response_format. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/).
1035
+
1036
+ GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
1037
+
1038
+ For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
1039
+ :context_length: 128000
1040
+ :architecture:
1041
+ modality: text+image->text
1042
+ tokenizer: GPT
1043
+ instruct_type:
1044
+ :pricing:
1045
+ prompt: '0.0000025'
1046
+ completion: '0.00001'
1047
+ image: '0.0036125'
1048
+ request: '0'
1049
+ :top_provider:
1050
+ context_length: 128000
1051
+ max_completion_tokens: 16384
1052
+ is_moderated: true
1053
+ :per_request_limits:
1054
+ prompt_tokens: '8079126'
1055
+ completion_tokens: '2019781'
1056
+ - :id: meta-llama/llama-3.1-405b
1057
+ :name: 'Meta: Llama 3.1 405B (base)'
1058
+ :created: 1722556800
1059
+ :description: |-
1060
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version.
1061
+
1062
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1063
+
1064
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1065
+ :context_length: 131072
1066
+ :architecture:
1067
+ modality: text->text
1068
+ tokenizer: Llama3
1069
+ instruct_type: none
1070
+ :pricing:
1071
+ prompt: '0.000002'
1072
+ completion: '0.000002'
1073
+ image: '0'
1074
+ request: '0'
1075
+ :top_provider:
1076
+ context_length: 32768
1077
+ max_completion_tokens:
1078
+ is_moderated: false
1079
+ :per_request_limits:
1080
+ prompt_tokens: '10098908'
1081
+ completion_tokens: '10098908'
1082
+ - :id: nothingiisreal/mn-celeste-12b
1083
+ :name: Mistral Nemo 12B Celeste
1084
+ :created: 1722556800
1085
+ :description: |-
1086
+ A specialized story writing and roleplaying model based on Mistral's NeMo 12B Instruct. Fine-tuned on curated datasets including Reddit Writing Prompts and Opus Instruct 25K.
1087
+
1088
+ This model excels at creative writing, offering improved NSFW capabilities, with smarter and more active narration. It demonstrates remarkable versatility in both SFW and NSFW scenarios, with strong Out of Character (OOC) steering capabilities, allowing fine-tuned control over narrative direction and character behavior.
1089
+
1090
+ Check out the model's [HuggingFace page](https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9) for details on what parameters and prompts work best!
1091
+ :context_length: 32000
1092
+ :architecture:
1093
+ modality: text->text
1094
+ tokenizer: Mistral
1095
+ instruct_type: chatml
1096
+ :pricing:
1097
+ prompt: '0.0000015'
1098
+ completion: '0.0000015'
1099
+ image: '0'
1100
+ request: '0'
1101
+ :top_provider:
1102
+ context_length: 32000
1103
+ max_completion_tokens:
1104
+ is_moderated: false
1105
+ :per_request_limits:
1106
+ prompt_tokens: '13465211'
1107
+ completion_tokens: '13465211'
1108
+ - :id: google/gemini-pro-1.5-exp
1109
+ :name: 'Google: Gemini Pro 1.5 Experimental'
1110
+ :created: 1722470400
1111
+ :description: |-
1112
+ Gemini 1.5 Pro (0827) is an experimental version of the [Gemini 1.5 Pro](/google/gemini-pro-1.5) model.
1113
+
1114
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
1115
+
1116
+ #multimodal
1117
+
1118
+ Note: This model is currently experimental and not suitable for production use cases, and may be heavily rate-limited.
1119
+ :context_length: 1000000
1120
+ :architecture:
1121
+ modality: text+image->text
1122
+ tokenizer: Gemini
1123
+ instruct_type:
1124
+ :pricing:
1125
+ prompt: '0'
1126
+ completion: '0'
1127
+ image: '0'
1128
+ request: '0'
1129
+ :top_provider:
1130
+ context_length: 1000000
1131
+ max_completion_tokens: 8192
1132
+ is_moderated: false
1133
+ :per_request_limits:
1134
+ prompt_tokens: Infinity
1135
+ completion_tokens: Infinity
1136
+ - :id: perplexity/llama-3.1-sonar-large-128k-online
1137
+ :name: 'Perplexity: Llama 3.1 Sonar 70B Online'
1138
+ :created: 1722470400
1139
+ :description: |-
1140
+ Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
1141
+
1142
+ This is the online version of the [offline chat model](/perplexity/llama-3.1-sonar-large-128k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
1143
+ :context_length: 127072
1144
+ :architecture:
1145
+ modality: text->text
1146
+ tokenizer: Llama3
1147
+ instruct_type:
1148
+ :pricing:
1149
+ prompt: '0.000001'
1150
+ completion: '0.000001'
1151
+ image: '0'
1152
+ request: '0.005'
1153
+ :top_provider:
1154
+ context_length: 127072
1155
+ max_completion_tokens:
1156
+ is_moderated: false
1157
+ :per_request_limits:
1158
+ prompt_tokens: '20197816'
1159
+ completion_tokens: '20197816'
1160
+ - :id: perplexity/llama-3.1-sonar-large-128k-chat
1161
+ :name: 'Perplexity: Llama 3.1 Sonar 70B'
1162
+ :created: 1722470400
1163
+ :description: |-
1164
+ Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
1165
+
1166
+ This is a normal offline LLM, but the [online version](/perplexity/llama-3.1-sonar-large-128k-online) of this model has Internet access.
1167
+ :context_length: 131072
1168
+ :architecture:
1169
+ modality: text->text
1170
+ tokenizer: Llama3
1171
+ instruct_type:
1172
+ :pricing:
1173
+ prompt: '0.000001'
1174
+ completion: '0.000001'
1175
+ image: '0'
1176
+ request: '0'
1177
+ :top_provider:
1178
+ context_length: 131072
1179
+ max_completion_tokens:
1180
+ is_moderated: false
1181
+ :per_request_limits:
1182
+ prompt_tokens: '20197816'
1183
+ completion_tokens: '20197816'
1184
+ - :id: perplexity/llama-3.1-sonar-small-128k-online
1185
+ :name: 'Perplexity: Llama 3.1 Sonar 8B Online'
1186
+ :created: 1722470400
1187
+ :description: |-
1188
+ Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
1189
+
1190
+ This is the online version of the [offline chat model](/perplexity/llama-3.1-sonar-small-128k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
1191
+ :context_length: 127072
1192
+ :architecture:
1193
+ modality: text->text
1194
+ tokenizer: Llama3
1195
+ instruct_type:
1196
+ :pricing:
1197
+ prompt: '0.0000002'
1198
+ completion: '0.0000002'
1199
+ image: '0'
1200
+ request: '0.005'
1201
+ :top_provider:
1202
+ context_length: 127072
1203
+ max_completion_tokens:
1204
+ is_moderated: false
1205
+ :per_request_limits:
1206
+ prompt_tokens: '100989083'
1207
+ completion_tokens: '100989083'
1208
+ - :id: perplexity/llama-3.1-sonar-small-128k-chat
1209
+ :name: 'Perplexity: Llama 3.1 Sonar 8B'
1210
+ :created: 1722470400
1211
+ :description: |-
1212
+ Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
1213
+
1214
+ This is a normal offline LLM, but the [online version](/perplexity/llama-3.1-sonar-small-128k-online) of this model has Internet access.
1215
+ :context_length: 131072
1216
+ :architecture:
1217
+ modality: text->text
1218
+ tokenizer: Llama3
1219
+ instruct_type:
1220
+ :pricing:
1221
+ prompt: '0.0000002'
1222
+ completion: '0.0000002'
1223
+ image: '0'
1224
+ request: '0'
1225
+ :top_provider:
1226
+ context_length: 131072
1227
+ max_completion_tokens:
1228
+ is_moderated: false
1229
+ :per_request_limits:
1230
+ prompt_tokens: '100989083'
1231
+ completion_tokens: '100989083'
1232
+ - :id: meta-llama/llama-3.1-70b-instruct:free
1233
+ :name: 'Meta: Llama 3.1 70B Instruct (free)'
1234
+ :created: 1721692800
1235
+ :description: |-
1236
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue use cases.
1237
+
1238
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1239
+
1240
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1241
+
1242
+ _These are free, rate-limited endpoints for [Llama 3.1 70B Instruct](/meta-llama/llama-3.1-70b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
1243
+ :context_length: 131072
1244
+ :architecture:
1245
+ modality: text->text
1246
+ tokenizer: Llama3
1247
+ instruct_type: llama3
1248
+ :pricing:
1249
+ prompt: '0'
1250
+ completion: '0'
1251
+ image: '0'
1252
+ request: '0'
1253
+ :top_provider:
1254
+ context_length: 8192
1255
+ max_completion_tokens: 4096
1256
+ is_moderated: false
1257
+ :per_request_limits:
1258
+ prompt_tokens: Infinity
1259
+ completion_tokens: Infinity
1260
+ - :id: meta-llama/llama-3.1-70b-instruct
1261
+ :name: 'Meta: Llama 3.1 70B Instruct'
1262
+ :created: 1721692800
1263
+ :description: |-
1264
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue use cases.
1265
+
1266
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1267
+
1268
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1269
+ :context_length: 131072
1270
+ :architecture:
1271
+ modality: text->text
1272
+ tokenizer: Llama3
1273
+ instruct_type: llama3
1274
+ :pricing:
1275
+ prompt: '0.0000003'
1276
+ completion: '0.0000003'
1277
+ image: '0'
1278
+ request: '0'
1279
+ :top_provider:
1280
+ context_length: 131072
1281
+ max_completion_tokens:
1282
+ is_moderated: false
1283
+ :per_request_limits:
1284
+ prompt_tokens: '67326055'
1285
+ completion_tokens: '67326055'
1286
+ - :id: meta-llama/llama-3.1-8b-instruct:free
1287
+ :name: 'Meta: Llama 3.1 8B Instruct (free)'
1288
+ :created: 1721692800
1289
+ :description: |-
1290
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
1291
+
1292
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1293
+
1294
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1295
+
1296
+ _These are free, rate-limited endpoints for [Llama 3.1 8B Instruct](/meta-llama/llama-3.1-8b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
1297
+ :context_length: 131072
1298
+ :architecture:
1299
+ modality: text->text
1300
+ tokenizer: Llama3
1301
+ instruct_type: llama3
1302
+ :pricing:
1303
+ prompt: '0'
1304
+ completion: '0'
1305
+ image: '0'
1306
+ request: '0'
1307
+ :top_provider:
1308
+ context_length: 8192
1309
+ max_completion_tokens: 4096
1310
+ is_moderated: false
1311
+ :per_request_limits:
1312
+ prompt_tokens: Infinity
1313
+ completion_tokens: Infinity
1314
+ - :id: meta-llama/llama-3.1-8b-instruct
1315
+ :name: 'Meta: Llama 3.1 8B Instruct'
1316
+ :created: 1721692800
1317
+ :description: |-
1318
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
1319
+
1320
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1321
+
1322
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1323
+ :context_length: 131072
1324
+ :architecture:
1325
+ modality: text->text
1326
+ tokenizer: Llama3
1327
+ instruct_type: llama3
1328
+ :pricing:
1329
+ prompt: '0.000000055'
1330
+ completion: '0.000000055'
1331
+ image: '0'
1332
+ request: '0'
1333
+ :top_provider:
1334
+ context_length: 100000
1335
+ max_completion_tokens:
1336
+ is_moderated: false
1337
+ :per_request_limits:
1338
+ prompt_tokens: '367233031'
1339
+ completion_tokens: '367233031'
1340
+ - :id: meta-llama/llama-3.1-405b-instruct:free
1341
+ :name: 'Meta: Llama 3.1 405B Instruct (free)'
1342
+ :created: 1721692800
1343
+ :description: |-
1344
+ The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, this model shows the Meta AI team continuing to push the frontier of open-source LLMs.
1345
+
1346
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue use cases.
1347
+
1348
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1349
+
1350
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1351
+
1352
+ _These are free, rate-limited endpoints for [Llama 3.1 405B Instruct](/meta-llama/llama-3.1-405b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
1353
+ :context_length: 131072
1354
+ :architecture:
1355
+ modality: text->text
1356
+ tokenizer: Llama3
1357
+ instruct_type: llama3
1358
+ :pricing:
1359
+ prompt: '0'
1360
+ completion: '0'
1361
+ image: '0'
1362
+ request: '0'
1363
+ :top_provider:
1364
+ context_length: 8192
1365
+ max_completion_tokens: 4096
1366
+ is_moderated: false
1367
+ :per_request_limits:
1368
+ prompt_tokens: Infinity
1369
+ completion_tokens: Infinity
1370
+ - :id: meta-llama/llama-3.1-405b-instruct
1371
+ :name: 'Meta: Llama 3.1 405B Instruct'
1372
+ :created: 1721692800
1373
+ :description: |-
1374
+ The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, this model shows the Meta AI team continuing to push the frontier of open-source LLMs.
1375
+
1376
+ Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue use cases.
1377
+
1378
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
1379
+
1380
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
1381
+ :context_length: 131072
1382
+ :architecture:
1383
+ modality: text->text
1384
+ tokenizer: Llama3
1385
+ instruct_type: llama3
1386
+ :pricing:
1387
+ prompt: '0.00000179'
1388
+ completion: '0.00000179'
1389
+ image: '0'
1390
+ request: '0'
1391
+ :top_provider:
1392
+ context_length: 32000
1393
+ max_completion_tokens:
1394
+ is_moderated: false
1395
+ :per_request_limits:
1396
+ prompt_tokens: '11283696'
1397
+ completion_tokens: '11283696'
1398
+ - :id: mistralai/codestral-mamba
1399
+ :name: 'Mistral: Codestral Mamba'
1400
+ :created: 1721347200
1401
+ :description: |-
1402
+ A 7.3B parameter Mamba-based model designed for code and reasoning tasks.
1403
+
1404
+ - Linear time inference, allowing for theoretically infinite sequence lengths
1405
+ - 256k token context window
1406
+ - Optimized for quick responses, especially beneficial for code productivity
1407
+ - Performs comparably to state-of-the-art transformer models in code and reasoning tasks
1408
+ - Available under the Apache 2.0 license for free use, modification, and distribution
1409
+ :context_length: 256000
1410
+ :architecture:
1411
+ modality: text->text
1412
+ tokenizer: Mistral
1413
+ instruct_type: mistral
1414
+ :pricing:
1415
+ prompt: '0.00000025'
1416
+ completion: '0.00000025'
1417
+ image: '0'
1418
+ request: '0'
1419
+ :top_provider:
1420
+ context_length: 256000
1421
+ max_completion_tokens:
1422
+ is_moderated: false
1423
+ :per_request_limits:
1424
+ prompt_tokens: '80791266'
1425
+ completion_tokens: '80791266'
1426
+ - :id: mistralai/mistral-nemo
1427
+ :name: 'Mistral: Mistral Nemo'
1428
+ :created: 1721347200
1429
+ :description: |-
1430
+ A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
1431
+
1432
+ The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
1433
+
1434
+ It supports function calling and is released under the Apache 2.0 license.
1435
+ :context_length: 128000
1436
+ :architecture:
1437
+ modality: text->text
1438
+ tokenizer: Mistral
1439
+ instruct_type: mistral
1440
+ :pricing:
1441
+ prompt: '0.00000013'
1442
+ completion: '0.00000013'
1443
+ image: '0'
1444
+ request: '0'
1445
+ :top_provider:
1446
+ context_length: 128000
1447
+ max_completion_tokens:
1448
+ is_moderated: false
1449
+ :per_request_limits:
1450
+ prompt_tokens: '155367821'
1451
+ completion_tokens: '155367821'
1452
+ - :id: openai/gpt-4o-mini-2024-07-18
+   :name: 'OpenAI: GPT-4o-mini (2024-07-18)'
+   :created: 1721260800
+   :description: |-
+     GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/openai/gpt-4o), supporting both text and image inputs with text outputs.
+
+     As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
+
+     GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences in [common leaderboards](https://arena.lmsys.org/).
+
+     Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
+   :context_length: 128000
+   :architecture:
+     modality: text+image->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00000015'
+     completion: '0.0000006'
+     image: '0.007225'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens: 16384
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '134652111'
+     completion_tokens: '33663027'
+ - :id: openai/gpt-4o-mini
+   :name: 'OpenAI: GPT-4o-mini'
+   :created: 1721260800
+   :description: |-
+     GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/openai/gpt-4o), supporting both text and image inputs with text outputs.
+
+     As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.
+
+     GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences in [common leaderboards](https://arena.lmsys.org/).
+
+     Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.
+   :context_length: 128000
+   :architecture:
+     modality: text+image->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00000015'
+     completion: '0.0000006'
+     image: '0.007225'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens: 16384
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '134652111'
+     completion_tokens: '33663027'
+ - :id: qwen/qwen-2-7b-instruct:free
+   :name: Qwen 2 7B Instruct (free)
+   :created: 1721088000
+   :description: |-
+     Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
+
+     It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
+
+     For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).
+
+     Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
+
+     _These are free, rate-limited endpoints for [Qwen 2 7B Instruct](/qwen/qwen-2-7b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Qwen
+     instruct_type: chatml
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: qwen/qwen-2-7b-instruct
+   :name: Qwen 2 7B Instruct
+   :created: 1721088000
+   :description: |-
+     Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
+
+     It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
+
+     For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).
+
+     Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Qwen
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.000000054'
+     completion: '0.000000054'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '374033643'
+     completion_tokens: '374033643'
+ - :id: google/gemma-2-27b-it
+   :name: 'Google: Gemma 2 27B'
+   :created: 1720828800
+   :description: |-
+     Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini).
+
+     Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.
+
+     See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Gemini
+     instruct_type: gemma
+   :pricing:
+     prompt: '0.00000027'
+     completion: '0.00000027'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '74806728'
+     completion_tokens: '74806728'
+ - :id: alpindale/magnum-72b
+   :name: Magnum 72B
+   :created: 1720656000
+   :description: |-
+     From the maker of [Goliath](https://openrouter.ai/alpindale/goliath-120b), Magnum 72B is the first in a new family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet.
+
+     The model is based on [Qwen2 72B](https://openrouter.ai/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.
+   :context_length: 16384
+   :architecture:
+     modality: text->text
+     tokenizer: Qwen
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.00000375'
+     completion: '0.0000045'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 16384
+     max_completion_tokens: 1024
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '5386084'
+     completion_tokens: '4488403'
+ - :id: nousresearch/hermes-2-theta-llama-3-8b
+   :name: 'Nous: Hermes 2 Theta 8B'
+   :created: 1720656000
+   :description: |-
+     An experimental merge model based on Llama 3, exhibiting a very distinctive style of writing. It combines the best of [Meta's Llama 3 8B](https://openrouter.ai/meta-llama/llama-3-8b-instruct) and Nous Research's [Hermes 2 Pro](https://openrouter.ai/nousresearch/hermes-2-pro-llama-3-8b).
+
+     Hermes-2 Θ (theta) was specifically designed with a few capabilities in mind: executing function calls, generating JSON output, and most remarkably, demonstrating metacognitive abilities (contemplating the nature of thought and recognizing the diversity of cognitive processes among individuals).
+   :context_length: 16384
+   :architecture:
+     modality: text->text
+     tokenizer: Llama3
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.0000001875'
+     completion: '0.000001125'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 16384
+     max_completion_tokens: 2048
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '107721689'
+     completion_tokens: '17953614'
+ - :id: google/gemma-2-9b-it:free
+   :name: 'Google: Gemma 2 9B (free)'
+   :created: 1719532800
+   :description: |-
+     Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.
+
+     Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.
+
+     See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+
+     _These are free, rate-limited endpoints for [Gemma 2 9B](/google/gemma-2-9b-it). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Gemini
+     instruct_type: gemma
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens: 2048
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: google/gemma-2-9b-it
+   :name: 'Google: Gemma 2 9B'
+   :created: 1719532800
+   :description: |-
+     Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.
+
+     Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.
+
+     See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Gemini
+     instruct_type: gemma
+   :pricing:
+     prompt: '0.00000006'
+     completion: '0.00000006'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '336630279'
+     completion_tokens: '336630279'
+ - :id: ai21/jamba-instruct
+   :name: 'AI21: Jamba Instruct'
+   :created: 1719273600
+   :description: |-
+     The Jamba-Instruct model, introduced by AI21 Labs, is an instruction-tuned variant of their hybrid SSM-Transformer Jamba model, specifically optimized for enterprise applications.
+
+     - 256K Context Window: It can process extensive information, equivalent to a 400-page novel, which is beneficial for tasks involving large documents such as financial reports or legal documents
+     - Safety and Accuracy: Jamba-Instruct is designed with enhanced safety features to ensure secure deployment in enterprise environments, reducing the risk and cost of implementation
+
+     Read their [announcement](https://www.ai21.com/blog/announcing-jamba) to learn more.
+
+     Jamba has a knowledge cutoff of February 2024.
+   :context_length: 256000
+   :architecture:
+     modality: text->text
+     tokenizer: Other
+     instruct_type:
+   :pricing:
+     prompt: '0.0000005'
+     completion: '0.0000007'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 256000
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '40395633'
+     completion_tokens: '28854023'
+ - :id: anthropic/claude-3.5-sonnet
+   :name: 'Anthropic: Claude 3.5 Sonnet'
+   :created: 1718841600
+   :description: |-
+     Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
+
+     - Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
+     - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
+     - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
+     - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
+
+     #multimodal
+   :context_length: 200000
+   :architecture:
+     modality: text+image->text
+     tokenizer: Claude
+     instruct_type:
+   :pricing:
+     prompt: '0.000003'
+     completion: '0.000015'
+     image: '0.0048'
+     request: '0'
+   :top_provider:
+     context_length: 200000
+     max_completion_tokens: 8192
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '6732605'
+     completion_tokens: '1346521'
+ - :id: anthropic/claude-3.5-sonnet:beta
+   :name: 'Anthropic: Claude 3.5 Sonnet (self-moderated)'
+   :created: 1718841600
+   :description: |-
+     Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
+
+     - Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
+     - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
+     - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
+     - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
+
+     #multimodal
+
+     _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-3.5-sonnet) variant._
+   :context_length: 200000
+   :architecture:
+     modality: text+image->text
+     tokenizer: Claude
+     instruct_type:
+   :pricing:
+     prompt: '0.000003'
+     completion: '0.000015'
+     image: '0.0048'
+     request: '0'
+   :top_provider:
+     context_length: 200000
+     max_completion_tokens: 8192
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '6732605'
+     completion_tokens: '1346521'
+ - :id: sao10k/l3-euryale-70b
+   :name: Llama 3 Euryale 70B v2.1
+   :created: 1718668800
+   :description: |-
+     Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k).
+
+     - Better prompt adherence.
+     - Better anatomy / spatial awareness.
+     - Adapts much better to unique and custom formatting / reply formats.
+     - Very creative, lots of unique swipes.
+     - Is not restrictive during roleplays.
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Llama3
+     instruct_type: llama3
+   :pricing:
+     prompt: '0.00000035'
+     completion: '0.0000004'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '57708047'
+     completion_tokens: '50494541'
+ - :id: cognitivecomputations/dolphin-mixtral-8x22b
+   :name: "Dolphin 2.9.2 Mixtral 8x22B \U0001F42C"
+   :created: 1717804800
+   :description: |-
+     Dolphin 2.9 is designed for instruction following, conversational, and coding. This model is a finetune of [Mixtral 8x22B Instruct](/mistralai/mixtral-8x22b-instruct). It features a 64k context length and was fine-tuned with a 16k sequence length using ChatML templates.
+
+     This model is a successor to [Dolphin Mixtral 8x7B](/cognitivecomputations/dolphin-mixtral-8x7b).
+
+     The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at [erichartford.com/uncensored-models](https://erichartford.com/uncensored-models).
+
+     #moe #uncensored
+   :context_length: 65536
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.0000009'
+     completion: '0.0000009'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 16000
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '22442018'
+     completion_tokens: '22442018'
+ - :id: qwen/qwen-2-72b-instruct
+   :name: Qwen 2 72B Instruct
+   :created: 1717718400
+   :description: |-
+     Qwen2 72B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
+
+     It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.
+
+     For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).
+
+     Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Qwen
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.00000034'
+     completion: '0.00000039'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '59405343'
+     completion_tokens: '51789273'
+ - :id: nousresearch/hermes-2-pro-llama-3-8b
+   :name: 'NousResearch: Hermes 2 Pro - Llama-3 8B'
+   :created: 1716768000
+   :description: Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting
+     of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a
+     newly introduced Function Calling and JSON Mode dataset developed in-house.
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Llama3
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.00000014'
+     completion: '0.00000014'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '144270119'
+     completion_tokens: '144270119'
+ - :id: mistralai/mistral-7b-instruct-v0.3
+   :name: 'Mistral: Mistral 7B Instruct v0.3'
+   :created: 1716768000
+   :description: |-
+     A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
+
+     An improved version of [Mistral 7B Instruct v0.2](/mistralai/mistral-7b-instruct-v0.2), with the following changes:
+
+     - Extended vocabulary to 32768
+     - Supports v3 Tokenizer
+     - Supports function calling
+
+     NOTE: Support for function calling depends on the provider.
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: mistral
+   :pricing:
+     prompt: '0.000000055'
+     completion: '0.000000055'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '367233031'
+     completion_tokens: '367233031'
+ - :id: mistralai/mistral-7b-instruct:free
+   :name: 'Mistral: Mistral 7B Instruct (free)'
+   :created: 1716768000
+   :description: |-
+     A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
+
+     *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
+
+     _These are free, rate-limited endpoints for [Mistral 7B Instruct](/mistralai/mistral-7b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: mistral
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: mistralai/mistral-7b-instruct
+   :name: 'Mistral: Mistral 7B Instruct'
+   :created: 1716768000
+   :description: |-
+     A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
+
+     *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: mistral
+   :pricing:
+     prompt: '0.000000055'
+     completion: '0.000000055'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '367233031'
+     completion_tokens: '367233031'
+ - :id: mistralai/mistral-7b-instruct:nitro
+   :name: 'Mistral: Mistral 7B Instruct (nitro)'
+   :created: 1716768000
+   :description: |-
+     A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
+
+     *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
+
+     _These are higher-throughput endpoints for [Mistral 7B Instruct](/mistralai/mistral-7b-instruct). They may have higher prices._
+   :context_length: 32768
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: mistral
+   :pricing:
+     prompt: '0.00000007'
+     completion: '0.00000007'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '288540239'
+     completion_tokens: '288540239'
+ - :id: microsoft/phi-3-mini-128k-instruct:free
+   :name: Phi-3 Mini 128K Instruct (free)
+   :created: 1716681600
+   :description: |-
+     Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
+
+     At time of release, Phi-3 Mini demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date.
+
+     _These are free, rate-limited endpoints for [Phi-3 Mini 128K Instruct](/microsoft/phi-3-mini-128k-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 128000
+   :architecture:
+     modality: text->text
+     tokenizer: Other
+     instruct_type: phi3
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: microsoft/phi-3-mini-128k-instruct
+   :name: Phi-3 Mini 128K Instruct
+   :created: 1716681600
+   :description: |-
+     Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
+
+     At time of release, Phi-3 Mini demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date.
+   :context_length: 128000
+   :architecture:
+     modality: text->text
+     tokenizer: Other
+     instruct_type: phi3
+   :pricing:
+     prompt: '0.0000001'
+     completion: '0.0000001'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '201978167'
+     completion_tokens: '201978167'
+ - :id: microsoft/phi-3-medium-128k-instruct:free
+   :name: Phi-3 Medium 128K Instruct (free)
+   :created: 1716508800
+   :description: |-
+     Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
+
+     At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance.
+
+     For 4k context length, try [Phi-3 Medium 4K](/microsoft/phi-3-medium-4k-instruct).
+
+     _These are free, rate-limited endpoints for [Phi-3 Medium 128K Instruct](/microsoft/phi-3-medium-128k-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 128000
+   :architecture:
+     modality: text->text
+     tokenizer: Other
+     instruct_type: phi3
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: microsoft/phi-3-medium-128k-instruct
+   :name: Phi-3 Medium 128K Instruct
+   :created: 1716508800
+   :description: |-
+     Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.
+
+     At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance.
+
+     For 4k context length, try [Phi-3 Medium 4K](/microsoft/phi-3-medium-4k-instruct).
+   :context_length: 128000
+   :architecture:
+     modality: text->text
+     tokenizer: Other
+     instruct_type: phi3
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000001'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '20197816'
+ - :id: neversleep/llama-3-lumimaid-70b
+   :name: Llama 3 Lumimaid 70B
+   :created: 1715817600
+   :description: |-
+     The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.
+
+     To enhance its overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.
+
+     Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Llama3
+     instruct_type: llama3
+   :pricing:
+     prompt: '0.000003375'
+     completion: '0.0000045'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 2048
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '5984538'
+     completion_tokens: '4488403'
+ - :id: google/gemini-flash-1.5
+   :name: 'Google: Gemini Flash 1.5'
+   :created: 1715644800
+   :description: |-
+     Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
+
+     Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
+
+     Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
+
+     #multimodal
+   :context_length: 1000000
+   :architecture:
+     modality: text+image->text
+     tokenizer: Gemini
+     instruct_type:
+   :pricing:
+     prompt: '0.000000075'
+     completion: '0.0000003'
+     image: '0.00004'
+     request: '0'
+   :top_provider:
+     context_length: 1000000
+     max_completion_tokens: 8192
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '269304223'
+     completion_tokens: '67326055'
+ - :id: deepseek/deepseek-chat
+   :name: DeepSeek V2.5
+   :created: 1715644800
+   :description: |-
+     DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.
+
+     DeepSeek-V2 Chat is a conversational finetune of DeepSeek-V2, a Mixture-of-Experts (MoE) language model. It comprises 236B total parameters, of which 21B are activated for each token.
+
+     Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.
+
+     DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluations.
+   :context_length: 128000
+   :architecture:
+     modality: text->text
+     tokenizer: Other
+     instruct_type:
+   :pricing:
+     prompt: '0.00000014'
+     completion: '0.00000028'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '144270119'
+     completion_tokens: '72135059'
+ - :id: perplexity/llama-3-sonar-large-32k-online
+   :name: 'Perplexity: Llama3 Sonar 70B Online'
+   :created: 1715644800
+   :description: |-
+     Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
+
+     This is the online version of the [offline chat model](/perplexity/llama-3-sonar-large-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
+   :context_length: 28000
+   :architecture:
+     modality: text->text
+     tokenizer: Llama3
+     instruct_type:
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000001'
+     image: '0'
+     request: '0.005'
+   :top_provider:
+     context_length: 28000
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '20197816'
+ - :id: perplexity/llama-3-sonar-large-32k-chat
+ :name: 'Perplexity: Llama3 Sonar 70B'
+ :created: 1715644800
+ :description: |-
+ Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
+
+ This is a normal offline LLM, but the [online version](/perplexity/llama-3-sonar-large-32k-online) of this model has Internet access.
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type:
+ :pricing:
+ prompt: '0.000001'
+ completion: '0.000001'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '20197816'
+ completion_tokens: '20197816'
+ - :id: perplexity/llama-3-sonar-small-32k-online
+ :name: 'Perplexity: Llama3 Sonar 8B Online'
+ :created: 1715644800
+ :description: |-
+ Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
+
+ This is the online version of the [offline chat model](/perplexity/llama-3-sonar-small-32k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
+ :context_length: 28000
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type:
+ :pricing:
+ prompt: '0.0000002'
+ completion: '0.0000002'
+ image: '0'
+ request: '0.005'
+ :top_provider:
+ context_length: 28000
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '100989083'
+ completion_tokens: '100989083'
+ - :id: perplexity/llama-3-sonar-small-32k-chat
+ :name: 'Perplexity: Llama3 Sonar 8B'
+ :created: 1715644800
+ :description: |-
+ Llama3 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.
+
+ This is a normal offline LLM, but the [online version](/perplexity/llama-3-sonar-small-32k-online) of this model has Internet access.
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type:
+ :pricing:
+ prompt: '0.0000002'
+ completion: '0.0000002'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '100989083'
+ completion_tokens: '100989083'
+ - :id: meta-llama/llama-guard-2-8b
+ :name: 'Meta: LlamaGuard 2 8B'
+ :created: 1715558400
+ :description: |-
+ This safeguard model has 8B parameters and is based on the Llama 3 family. Just like its predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification.
+
+ LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated.
+
+ For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: none
+ :pricing:
+ prompt: '0.00000018'
+ completion: '0.00000018'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '112210093'
+ completion_tokens: '112210093'
+ - :id: openai/gpt-4o-2024-05-13
+ :name: 'OpenAI: GPT-4o (2024-05-13)'
+ :created: 1715558400
+ :description: |-
+ GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
+
+ For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
+ :context_length: 128000
+ :architecture:
+ modality: text+image->text
+ tokenizer: GPT
+ instruct_type:
+ :pricing:
+ prompt: '0.000005'
+ completion: '0.000015'
+ image: '0.007225'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 4096
+ is_moderated: true
+ :per_request_limits:
+ prompt_tokens: '4039563'
+ completion_tokens: '1346521'
+ - :id: openai/gpt-4o
+ :name: 'OpenAI: GPT-4o'
+ :created: 1715558400
+ :description: |-
+ GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
+
+ For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
+ :context_length: 128000
+ :architecture:
+ modality: text+image->text
+ tokenizer: GPT
+ instruct_type:
+ :pricing:
+ prompt: '0.0000025'
+ completion: '0.00001'
+ image: '0.0036125'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 4096
+ is_moderated: true
+ :per_request_limits:
+ prompt_tokens: '8079126'
+ completion_tokens: '2019781'
+ - :id: openai/gpt-4o:extended
+ :name: 'OpenAI: GPT-4o (extended)'
+ :created: 1715558400
+ :description: |-
+ GPT-4o Extended is an experimental variant of GPT-4o with an extended maximum output token limit. This model supports only text input to text output.
+
+ _These are extended-context endpoints for [GPT-4o](/openai/gpt-4o). They may have higher prices._
+ :context_length: 128000
+ :architecture:
+ modality: text->text
+ tokenizer: GPT
+ instruct_type:
+ :pricing:
+ prompt: '0.000006'
+ completion: '0.000018'
+ image: '0.007225'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 64000
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '3366302'
+ completion_tokens: '1122100'
+ - :id: qwen/qwen-72b-chat
+ :name: Qwen 1.5 72B Chat
+ :created: 1715212800
+ :description: |-
+ Qwen1.5 72B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
+
+ - Significant performance improvement in human preference for chat models
+ - Multilingual support of both base and chat models
+ - Stable support of 32K context length for models of all sizes
+
+ For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5).
+
+ Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Qwen
+ instruct_type: chatml
+ :pricing:
+ prompt: '0.00000081'
+ completion: '0.00000081'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '24935576'
+ completion_tokens: '24935576'
+ - :id: qwen/qwen-110b-chat
+ :name: Qwen 1.5 110B Chat
+ :created: 1715212800
+ :description: |-
+ Qwen1.5 110B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
+
+ - Significant performance improvement in human preference for chat models
+ - Multilingual support of both base and chat models
+ - Stable support of 32K context length for models of all sizes
+
+ For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5).
+
+ Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Qwen
+ instruct_type: chatml
+ :pricing:
+ prompt: '0.00000162'
+ completion: '0.00000162'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '12467788'
+ completion_tokens: '12467788'
+ - :id: neversleep/llama-3-lumimaid-8b
+ :name: Llama 3 Lumimaid 8B
+ :created: 1714780800
+ :description: |-
+ The NeverSleep team is back, with a Llama 3 8B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.
+
+ To enhance its overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
+ :context_length: 24576
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.0000001875'
+ completion: '0.000001125'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '107721689'
+ completion_tokens: '17953614'
+ - :id: neversleep/llama-3-lumimaid-8b:extended
+ :name: Llama 3 Lumimaid 8B (extended)
+ :created: 1714780800
+ :description: |-
+ The NeverSleep team is back, with a Llama 3 8B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.
+
+ To enhance its overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
+
+ _These are extended-context endpoints for [Llama 3 Lumimaid v0.1 8B](/neversleep/llama-3-lumimaid-8b). They may have higher prices._
+ :context_length: 24576
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.0000001875'
+ completion: '0.000001125'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 24576
+ max_completion_tokens: 2048
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '107721689'
+ completion_tokens: '17953614'
+ - :id: sao10k/fimbulvetr-11b-v2
+ :name: Fimbulvetr 11B v2
+ :created: 1713657600
+ :description: |-
+ Creative writing model, routed with permission. It's fast, it keeps the conversation going, and it stays in character.
+
+ If you submit a raw prompt, you can use Alpaca or Vicuna formats.
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama2
+ instruct_type: alpaca
+ :pricing:
+ prompt: '0.000000375'
+ completion: '0.0000015'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens: 2048
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '53860844'
+ completion_tokens: '13465211'
+ - :id: meta-llama/llama-3-70b-instruct
+ :name: 'Meta: Llama 3 70B Instruct'
+ :created: 1713398400
+ :description: |-
+ Meta's latest class of models (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.00000035'
+ completion: '0.0000004'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '57708047'
+ completion_tokens: '50494541'
+ - :id: meta-llama/llama-3-70b-instruct:nitro
+ :name: 'Meta: Llama 3 70B Instruct (nitro)'
+ :created: 1713398400
+ :description: |-
+ Meta's latest class of models (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high-quality dialogue use cases.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+
+ _These are higher-throughput endpoints for [Llama 3 70B Instruct](/meta-llama/llama-3-70b-instruct). They may have higher prices._
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.000000792'
+ completion: '0.000000792'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '25502293'
+ completion_tokens: '25502293'
+ - :id: meta-llama/llama-3-8b-instruct:free
+ :name: 'Meta: Llama 3 8B Instruct (free)'
+ :created: 1713398400
+ :description: |-
+ Meta's latest class of models (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+
+ _These are free, rate-limited endpoints for [Llama 3 8B Instruct](/meta-llama/llama-3-8b-instruct). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0'
+ completion: '0'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens: 4096
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: Infinity
+ completion_tokens: Infinity
+ - :id: meta-llama/llama-3-8b-instruct
+ :name: 'Meta: Llama 3 8B Instruct'
+ :created: 1713398400
+ :description: |-
+ Meta's latest class of models (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.000000055'
+ completion: '0.000000055'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '367233031'
+ completion_tokens: '367233031'
+ - :id: meta-llama/llama-3-8b-instruct:nitro
+ :name: 'Meta: Llama 3 8B Instruct (nitro)'
+ :created: 1713398400
+ :description: |-
+ Meta's latest class of models (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+
+ _These are higher-throughput endpoints for [Llama 3 8B Instruct](/meta-llama/llama-3-8b-instruct). They may have higher prices._
+ :context_length: 8192
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.000000162'
+ completion: '0.000000162'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 8192
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '124677881'
+ completion_tokens: '124677881'
+ - :id: meta-llama/llama-3-8b-instruct:extended
+ :name: 'Meta: Llama 3 8B Instruct (extended)'
+ :created: 1713398400
+ :description: |-
+ Meta's latest class of models (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high-quality dialogue use cases.
+
+ It has demonstrated strong performance compared to leading closed-source models in human evaluations.
+
+ Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
+
+ _These are extended-context endpoints for [Llama 3 8B Instruct](/meta-llama/llama-3-8b-instruct). They may have higher prices._
+ :context_length: 16384
+ :architecture:
+ modality: text->text
+ tokenizer: Llama3
+ instruct_type: llama3
+ :pricing:
+ prompt: '0.0000001875'
+ completion: '0.000001125'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 16384
+ max_completion_tokens: 2048
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '107721689'
+ completion_tokens: '17953614'
+ - :id: mistralai/mixtral-8x22b-instruct
+ :name: 'Mistral: Mixtral 8x22B Instruct'
+ :created: 1713312000
+ :description: |-
+ Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
+ - strong math, coding, and reasoning
+ - large context length (64k)
+ - fluency in English, French, Italian, German, and Spanish
+
+ See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/).
+ #moe
+ :context_length: 65536
+ :architecture:
+ modality: text->text
+ tokenizer: Mistral
+ instruct_type: mistral
+ :pricing:
+ prompt: '0.0000009'
+ completion: '0.0000009'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 65536
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '22442018'
+ completion_tokens: '22442018'
+ - :id: microsoft/wizardlm-2-7b
+ :name: WizardLM-2 7B
+ :created: 1713225600
+ :description: |-
+ WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves performance comparable to leading open-source models 10x its size.
+
+ It is a finetune of [Mistral 7B Instruct](/mistralai/mistral-7b-instruct), using the same technique as [WizardLM-2 8x22B](/microsoft/wizardlm-2-8x22b).
+
+ To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).
+
+ #moe
+ :context_length: 32000
+ :architecture:
+ modality: text->text
+ tokenizer: Mistral
+ instruct_type: vicuna
+ :pricing:
+ prompt: '0.000000055'
+ completion: '0.000000055'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32000
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '367233031'
+ completion_tokens: '367233031'
+ - :id: microsoft/wizardlm-2-8x22b
+ :name: WizardLM-2 8x22B
+ :created: 1713225600
+ :description: |-
+ WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models.
+
+ It is an instruct finetune of [Mixtral 8x22B](/mistralai/mixtral-8x22b).
+
+ To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).
+
+ #moe
+ :context_length: 65536
+ :architecture:
+ modality: text->text
+ tokenizer: Mistral
+ instruct_type: vicuna
+ :pricing:
+ prompt: '0.0000005'
+ completion: '0.0000005'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 65536
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '40395633'
+ completion_tokens: '40395633'
+ - :id: google/gemini-pro-1.5
+ :name: 'Google: Gemini Pro 1.5'
+ :created: 1712620800
+ :description: |-
+ Google's latest multimodal model, supporting image and video in text or chat prompts.
+
+ Optimized for language tasks including:
+
+ - Code generation
+ - Text generation
+ - Text editing
+ - Problem solving
+ - Recommendations
+ - Information extraction
+ - Data extraction or generation
+ - AI agents
+
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
+
+ #multimodal
+ :context_length: 2000000
+ :architecture:
+ modality: text+image->text
+ tokenizer: Gemini
+ instruct_type:
+ :pricing:
+ prompt: '0.00000125'
+ completion: '0.000005'
+ image: '0.00263'
+ request: '0'
+ :top_provider:
+ context_length: 2000000
+ max_completion_tokens: 8192
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '16158253'
+ completion_tokens: '4039563'
+ - :id: openai/gpt-4-turbo
+ :name: 'OpenAI: GPT-4 Turbo'
+ :created: 1712620800
+ :description: |-
+ The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.
+
+ Training data: up to December 2023.
+ :context_length: 128000
+ :architecture:
+ modality: text+image->text
+ tokenizer: GPT
+ instruct_type:
+ :pricing:
+ prompt: '0.00001'
+ completion: '0.00003'
+ image: '0.01445'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 4096
+ is_moderated: true
+ :per_request_limits:
+ prompt_tokens: '2019781'
+ completion_tokens: '673260'
+ - :id: cohere/command-r-plus
+ :name: 'Cohere: Command R+'
+ :created: 1712188800
+ :description: |-
+ Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG).
+
+ It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).
+
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+ :context_length: 128000
+ :architecture:
+ modality: text->text
+ tokenizer: Cohere
+ instruct_type:
+ :pricing:
+ prompt: '0.00000285'
+ completion: '0.00001425'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 4000
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '7086953'
+ completion_tokens: '1417390'
+ - :id: cohere/command-r-plus-04-2024
+ :name: 'Cohere: Command R+ (04-2024)'
+ :created: 1712016000
+ :description: |-
+ Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer use cases, and Retrieval Augmented Generation (RAG).
+
+ It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).
+
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+ :context_length: 128000
+ :architecture:
+ modality: text->text
+ tokenizer: Cohere
+ instruct_type:
+ :pricing:
+ prompt: '0.00000285'
+ completion: '0.00001425'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 4000
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '7086953'
+ completion_tokens: '1417390'
+ - :id: databricks/dbrx-instruct
+ :name: 'Databricks: DBRX 132B Instruct'
+ :created: 1711670400
+ :description: |-
+ DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and [Mixtral-8x7b](/mistralai/mixtral-8x7b) on standard industry benchmarks for language understanding, programming, math, and logic.
+
+ It uses a fine-grained mixture-of-experts (MoE) architecture. 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts.
+
+ See the launch announcement and benchmark results [here](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm).
+
+ #moe
+ :context_length: 32768
+ :architecture:
+ modality: text->text
+ tokenizer: Other
+ instruct_type: chatml
+ :pricing:
+ prompt: '0.00000108'
+ completion: '0.00000108'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 32768
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '18701682'
+ completion_tokens: '18701682'
+ - :id: sophosympatheia/midnight-rose-70b
+ :name: Midnight Rose 70B
+ :created: 1711065600
+ :description: |-
+ A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It wants to produce lengthy output by default and is the best creative writing merge produced so far by sophosympatheia.
+
+ Descending from earlier versions of Midnight Rose and [Wizard Tulu Dolphin 70B](https://huggingface.co/sophosympatheia/Wizard-Tulu-Dolphin-70B-v1.0), it inherits the best qualities of each.
+ :context_length: 4096
+ :architecture:
+ modality: text->text
+ tokenizer: Llama2
+ instruct_type: airoboros
+ :pricing:
+ prompt: '0.0000008'
+ completion: '0.0000008'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 4096
+ max_completion_tokens:
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '25247270'
+ completion_tokens: '25247270'
+ - :id: cohere/command-r
+ :name: 'Cohere: Command R'
+ :created: 1710374400
+ :description: |-
+ Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.
+
+ Read the launch post [here](https://txt.cohere.com/command-r/).
+
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+ :context_length: 128000
+ :architecture:
+ modality: text->text
+ tokenizer: Cohere
+ instruct_type:
+ :pricing:
+ prompt: '0.000000475'
+ completion: '0.000001425'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 128000
+ max_completion_tokens: 4000
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '42521719'
+ completion_tokens: '14173906'
+ - :id: cohere/command
+ :name: 'Cohere: Command'
+ :created: 1710374400
+ :description: |-
+ Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models.
+
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
+ :context_length: 4096
+ :architecture:
+ modality: text->text
+ tokenizer: Cohere
+ instruct_type:
+ :pricing:
+ prompt: '0.00000095'
+ completion: '0.0000019'
+ image: '0'
+ request: '0'
+ :top_provider:
+ context_length: 4096
+ max_completion_tokens: 4000
+ is_moderated: false
+ :per_request_limits:
+ prompt_tokens: '21260859'
+ completion_tokens: '10630429'
+ - :id: anthropic/claude-3-haiku
+ :name: 'Anthropic: Claude 3 Haiku'
+ :created: 1710288000
+ :description: |-
+ Claude 3 Haiku is Anthropic's fastest and most compact model for
+ near-instant responsiveness. Quick and accurate targeted performance.
+
+ See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)
+
+ #multimodal
+ :context_length: 200000
+ :architecture:
+ modality: text+image->text
+ tokenizer: Claude
+ instruct_type:
+ :pricing:
+ prompt: '0.00000025'
+ completion: '0.00000125'
+ image: '0.0004'
+ request: '0'
+ :top_provider:
+ context_length: 200000
+ max_completion_tokens: 4096
+ is_moderated: true
+ :per_request_limits:
+ prompt_tokens: '80791266'
+ completion_tokens: '16158253'
+ - :id: anthropic/claude-3-haiku:beta
3009
+ :name: 'Anthropic: Claude 3 Haiku (self-moderated)'
3010
+ :created: 1710288000
3011
+ :description: |-
3012
+ Claude 3 Haiku is Anthropic's fastest and most compact model for
3013
+ near-instant responsiveness. Quick and accurate targeted performance.
3014
+
3015
+ See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)
3016
+
3017
+ #multimodal
3018
+
3019
+ _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-3-haiku) variant._
3020
+ :context_length: 200000
3021
+ :architecture:
3022
+ modality: text+image->text
3023
+ tokenizer: Claude
3024
+ instruct_type:
3025
+ :pricing:
3026
+ prompt: '0.00000025'
3027
+ completion: '0.00000125'
3028
+ image: '0.0004'
3029
+ request: '0'
3030
+ :top_provider:
3031
+ context_length: 200000
3032
+ max_completion_tokens: 4096
3033
+ is_moderated: false
3034
+ :per_request_limits:
3035
+ prompt_tokens: '80791266'
3036
+ completion_tokens: '16158253'
3037
+ - :id: anthropic/claude-3-sonnet
3038
+ :name: 'Anthropic: Claude 3 Sonnet'
3039
+ :created: 1709596800
3040
+ :description: |-
3041
+ Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.
3042
+
3043
+ See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
3044
+
3045
+ #multimodal
3046
+ :context_length: 200000
3047
+ :architecture:
3048
+ modality: text+image->text
3049
+ tokenizer: Claude
3050
+ instruct_type:
3051
+ :pricing:
3052
+ prompt: '0.000003'
3053
+ completion: '0.000015'
3054
+ image: '0.0048'
3055
+ request: '0'
3056
+ :top_provider:
3057
+ context_length: 200000
3058
+ max_completion_tokens: 4096
3059
+ is_moderated: true
3060
+ :per_request_limits:
3061
+ prompt_tokens: '6732605'
3062
+ completion_tokens: '1346521'
3063
+ - :id: anthropic/claude-3-sonnet:beta
3064
+ :name: 'Anthropic: Claude 3 Sonnet (self-moderated)'
3065
+ :created: 1709596800
3066
+ :description: |-
3067
+ Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.
3068
+
3069
+ See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
3070
+
3071
+ #multimodal
3072
+
3073
+ _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-3-sonnet) variant._
3074
+ :context_length: 200000
3075
+ :architecture:
3076
+ modality: text+image->text
3077
+ tokenizer: Claude
3078
+ instruct_type:
3079
+ :pricing:
3080
+ prompt: '0.000003'
3081
+ completion: '0.000015'
3082
+ image: '0.0048'
3083
+ request: '0'
3084
+ :top_provider:
3085
+ context_length: 200000
3086
+ max_completion_tokens: 4096
3087
+ is_moderated: false
3088
+ :per_request_limits:
3089
+ prompt_tokens: '6732605'
3090
+ completion_tokens: '1346521'
3091
+ - :id: anthropic/claude-3-opus
3092
+ :name: 'Anthropic: Claude 3 Opus'
3093
+ :created: 1709596800
3094
+ :description: |-
3095
+ Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.
3096
+
3097
+ See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
3098
+
3099
+ #multimodal
3100
+ :context_length: 200000
3101
+ :architecture:
3102
+ modality: text+image->text
3103
+ tokenizer: Claude
3104
+ instruct_type:
3105
+ :pricing:
3106
+ prompt: '0.000015'
3107
+ completion: '0.000075'
3108
+ image: '0.024'
3109
+ request: '0'
3110
+ :top_provider:
3111
+ context_length: 200000
3112
+ max_completion_tokens: 4096
3113
+ is_moderated: true
3114
+ :per_request_limits:
3115
+ prompt_tokens: '1346521'
3116
+ completion_tokens: '269304'
3117
+ - :id: anthropic/claude-3-opus:beta
3118
+ :name: 'Anthropic: Claude 3 Opus (self-moderated)'
3119
+ :created: 1709596800
3120
+ :description: |-
3121
+ Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.
3122
+
3123
+ See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)
3124
+
3125
+ #multimodal
3126
+
3127
+ _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-3-opus) variant._
3128
+ :context_length: 200000
3129
+ :architecture:
3130
+ modality: text+image->text
3131
+ tokenizer: Claude
3132
+ instruct_type:
3133
+ :pricing:
3134
+ prompt: '0.000015'
3135
+ completion: '0.000075'
3136
+ image: '0.024'
3137
+ request: '0'
3138
+ :top_provider:
3139
+ context_length: 200000
3140
+ max_completion_tokens: 4096
3141
+ is_moderated: false
3142
+ :per_request_limits:
3143
+ prompt_tokens: '1346521'
3144
+ completion_tokens: '269304'
3145
+ - :id: cohere/command-r-03-2024
3146
+ :name: 'Cohere: Command R (03-2024)'
3147
+ :created: 1709341200
3148
+ :description: |-
3149
+ Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.
3150
+
3151
+ Read the launch post [here](https://txt.cohere.com/command-r/).
3152
+
3153
+ Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
3154
+ :context_length: 128000
3155
+ :architecture:
3156
+ modality: text->text
3157
+ tokenizer: Cohere
3158
+ instruct_type:
3159
+ :pricing:
3160
+ prompt: '0.000000475'
3161
+ completion: '0.000001425'
3162
+ image: '0'
3163
+ request: '0'
3164
+ :top_provider:
3165
+ context_length: 128000
3166
+ max_completion_tokens: 4000
3167
+ is_moderated: false
3168
+ :per_request_limits:
3169
+ prompt_tokens: '42521719'
3170
+ completion_tokens: '14173906'
3171
+ - :id: mistralai/mistral-large
3172
+ :name: Mistral Large
3173
+ :created: 1708905600
3174
+ :description: |-
3175
+ This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/).
3176
+
3177
+ It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, and its long context window allows precise information recall from large documents.
3178
+ :context_length: 128000
3179
+ :architecture:
3180
+ modality: text->text
3181
+ tokenizer: Mistral
3182
+ instruct_type:
3183
+ :pricing:
3184
+ prompt: '0.000002'
3185
+ completion: '0.000006'
3186
+ image: '0'
3187
+ request: '0'
3188
+ :top_provider:
3189
+ context_length: 128000
3190
+ max_completion_tokens:
3191
+ is_moderated: false
3192
+ :per_request_limits:
3193
+ prompt_tokens: '10098908'
3194
+ completion_tokens: '3366302'
3195
+ - :id: openai/gpt-4-turbo-preview
3196
+ :name: 'OpenAI: GPT-4 Turbo Preview'
3197
+ :created: 1706140800
3198
+ :description: |-
3199
+ The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023.
3200
+
3201
+ **Note:** heavily rate limited by OpenAI while in preview.
3202
+ :context_length: 128000
3203
+ :architecture:
3204
+ modality: text->text
3205
+ tokenizer: GPT
3206
+ instruct_type:
3207
+ :pricing:
3208
+ prompt: '0.00001'
3209
+ completion: '0.00003'
3210
+ image: '0'
3211
+ request: '0'
3212
+ :top_provider:
3213
+ context_length: 128000
3214
+ max_completion_tokens: 4096
3215
+ is_moderated: true
3216
+ :per_request_limits:
3217
+ prompt_tokens: '2019781'
3218
+ completion_tokens: '673260'
3219
+ - :id: openai/gpt-3.5-turbo-0613
3220
+ :name: 'OpenAI: GPT-3.5 Turbo (older v0613)'
3221
+ :created: 1706140800
3222
+ :description: |-
3223
+ GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.
3224
+
3225
+ Training data up to Sep 2021.
3226
+ :context_length: 4095
3227
+ :architecture:
3228
+ modality: text->text
3229
+ tokenizer: GPT
3230
+ instruct_type:
3231
+ :pricing:
3232
+ prompt: '0.000001'
3233
+ completion: '0.000002'
3234
+ image: '0'
3235
+ request: '0'
3236
+ :top_provider:
3237
+ context_length: 4095
3238
+ max_completion_tokens: 4096
3239
+ is_moderated: true
3240
+ :per_request_limits:
3241
+ prompt_tokens: '20197816'
3242
+ completion_tokens: '10098908'
3243
+ - :id: nousresearch/nous-hermes-2-mixtral-8x7b-dpo
3244
+ :name: 'Nous: Hermes 2 Mixtral 8x7B DPO'
3245
+ :created: 1705363200
3246
+ :description: |-
3247
+ Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the [Mixtral 8x7B MoE LLM](/mistralai/mixtral-8x7b).
3248
+
3249
+ The model was trained on over 1,000,000 entries of primarily [GPT-4](/openai/gpt-4) generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks.
3250
+
3251
+ #moe
3252
+ :context_length: 32768
3253
+ :architecture:
3254
+ modality: text->text
3255
+ tokenizer: Mistral
3256
+ instruct_type: chatml
3257
+ :pricing:
3258
+ prompt: '0.00000054'
3259
+ completion: '0.00000054'
3260
+ image: '0'
3261
+ request: '0'
3262
+ :top_provider:
3263
+ context_length: 32768
3264
+ max_completion_tokens:
3265
+ is_moderated: false
3266
+ :per_request_limits:
3267
+ prompt_tokens: '37403364'
3268
+ completion_tokens: '37403364'
3269
+ - :id: mistralai/mistral-medium
3270
+ :name: Mistral Medium
3271
+ :created: 1704844800
3272
+ :description: This is Mistral AI's closed-source, medium-sized model. It's powered
3273
+ by a closed-source prototype and excels at reasoning, code, JSON, chat, and more.
3274
+ In benchmarks, it compares with many of the flagship models of other companies.
3275
+ :context_length: 32000
3276
+ :architecture:
3277
+ modality: text->text
3278
+ tokenizer: Mistral
3279
+ instruct_type:
3280
+ :pricing:
3281
+ prompt: '0.00000275'
3282
+ completion: '0.0000081'
3283
+ image: '0'
3284
+ request: '0'
3285
+ :top_provider:
3286
+ context_length: 32000
3287
+ max_completion_tokens:
3288
+ is_moderated: false
3289
+ :per_request_limits:
3290
+ prompt_tokens: '7344660'
3291
+ completion_tokens: '2493557'
3292
+ - :id: mistralai/mistral-small
3293
+ :name: Mistral Small
3294
+ :created: 1704844800
3295
+ :description: Cost-efficient, fast, and reliable option for use cases such as translation,
3296
+ summarization, and sentiment analysis.
3297
+ :context_length: 32000
3298
+ :architecture:
3299
+ modality: text->text
3300
+ tokenizer: Mistral
3301
+ instruct_type:
3302
+ :pricing:
3303
+ prompt: '0.0000002'
3304
+ completion: '0.0000006'
3305
+ image: '0'
3306
+ request: '0'
3307
+ :top_provider:
3308
+ context_length: 32000
3309
+ max_completion_tokens:
3310
+ is_moderated: false
3311
+ :per_request_limits:
3312
+ prompt_tokens: '100989083'
3313
+ completion_tokens: '33663027'
3314
+ - :id: mistralai/mistral-tiny
3315
+ :name: Mistral Tiny
3316
+ :created: 1704844800
3317
+ :description: This model is currently powered by Mistral-7B-v0.2, and incorporates
3318
+ a "better" fine-tuning than [Mistral 7B](/mistralai/mistral-7b-instruct-v0.1),
3319
+ inspired by community work. It's best used for large batch processing tasks where
3320
+ cost is a significant factor but reasoning capabilities are not crucial.
3321
+ :context_length: 32000
3322
+ :architecture:
3323
+ modality: text->text
3324
+ tokenizer: Mistral
3325
+ instruct_type:
3326
+ :pricing:
3327
+ prompt: '0.00000025'
3328
+ completion: '0.00000025'
3329
+ image: '0'
3330
+ request: '0'
3331
+ :top_provider:
3332
+ context_length: 32000
3333
+ max_completion_tokens:
3334
+ is_moderated: false
3335
+ :per_request_limits:
3336
+ prompt_tokens: '80791266'
3337
+ completion_tokens: '80791266'
3338
+ - :id: nousresearch/nous-hermes-yi-34b
3339
+ :name: 'Nous: Hermes 2 Yi 34B'
3340
+ :created: 1704153600
3341
+ :description: |-
3342
+ Nous Hermes 2 Yi 34B was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality data from open datasets across the AI landscape.
3343
+
3344
+ Nous-Hermes 2 on Yi 34B outperforms all Nous-Hermes & Open-Hermes models of the past, achieving new heights in all benchmarks for a Nous Research LLM as well as surpassing many popular finetunes.
3345
+ :context_length: 4096
3346
+ :architecture:
3347
+ modality: text->text
3348
+ tokenizer: Yi
3349
+ instruct_type: chatml
3350
+ :pricing:
3351
+ prompt: '0.00000072'
3352
+ completion: '0.00000072'
3353
+ image: '0'
3354
+ request: '0'
3355
+ :top_provider:
3356
+ context_length: 4096
3357
+ max_completion_tokens:
3358
+ is_moderated: false
3359
+ :per_request_limits:
3360
+ prompt_tokens: '28052523'
3361
+ completion_tokens: '28052523'
3362
+ - :id: mistralai/mistral-7b-instruct-v0.2
3363
+ :name: 'Mistral: Mistral 7B Instruct v0.2'
3364
+ :created: 1703721600
3365
+ :description: |-
3366
+ A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
3367
+
3368
+ An improved version of [Mistral 7B Instruct](/mistralai/mistral-7b-instruct-v0.1), with the following changes:
3369
+
3370
+ - 32k context window (vs 8k context in v0.1)
3371
+ - Rope-theta = 1e6
3372
+ - No Sliding-Window Attention
3373
+ :context_length: 32768
3374
+ :architecture:
3375
+ modality: text->text
3376
+ tokenizer: Mistral
3377
+ instruct_type: mistral
3378
+ :pricing:
3379
+ prompt: '0.00000018'
3380
+ completion: '0.00000018'
3381
+ image: '0'
3382
+ request: '0'
3383
+ :top_provider:
3384
+ context_length: 32768
3385
+ max_completion_tokens:
3386
+ is_moderated: false
3387
+ :per_request_limits:
3388
+ prompt_tokens: '112210093'
3389
+ completion_tokens: '112210093'
3390
+ - :id: cognitivecomputations/dolphin-mixtral-8x7b
3391
+ :name: "Dolphin 2.6 Mixtral 8x7B \U0001F42C"
3392
+ :created: 1703116800
3393
+ :description: |-
3394
+ This is a 16k context fine-tune of [Mixtral-8x7b](/mistralai/mixtral-8x7b). It excels in coding tasks due to extensive training with coding data and is known for its obedience, although it lacks DPO tuning.
3395
+
3396
+ The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at [erichartford.com/uncensored-models](https://erichartford.com/uncensored-models).
3397
+
3398
+ #moe #uncensored
3399
+ :context_length: 32768
3400
+ :architecture:
3401
+ modality: text->text
3402
+ tokenizer: Mistral
3403
+ instruct_type: chatml
3404
+ :pricing:
3405
+ prompt: '0.0000005'
3406
+ completion: '0.0000005'
3407
+ image: '0'
3408
+ request: '0'
3409
+ :top_provider:
3410
+ context_length: 32768
3411
+ max_completion_tokens:
3412
+ is_moderated: false
3413
+ :per_request_limits:
3414
+ prompt_tokens: '40395633'
3415
+ completion_tokens: '40395633'
3416
+ - :id: google/gemini-pro
3417
+ :name: 'Google: Gemini Pro 1.0'
3418
+ :created: 1702425600
3419
+ :description: |-
3420
+ Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation.
3421
+
3422
+ See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/).
3423
+
3424
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
3425
+ :context_length: 32760
3426
+ :architecture:
3427
+ modality: text->text
3428
+ tokenizer: Gemini
3429
+ instruct_type:
3430
+ :pricing:
3431
+ prompt: '0.0000005'
3432
+ completion: '0.0000015'
3433
+ image: '0.0025'
3434
+ request: '0'
3435
+ :top_provider:
3436
+ context_length: 32760
3437
+ max_completion_tokens: 8192
3438
+ is_moderated: false
3439
+ :per_request_limits:
3440
+ prompt_tokens: '40395633'
3441
+ completion_tokens: '13465211'
3442
+ - :id: google/gemini-pro-vision
3443
+ :name: 'Google: Gemini Pro Vision 1.0'
3444
+ :created: 1702425600
3445
+ :description: |-
3446
+ Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response.
3447
+
3448
+ See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/).
3449
+
3450
+ Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
3451
+
3452
+ #multimodal
3453
+ :context_length: 16384
3454
+ :architecture:
3455
+ modality: text+image->text
3456
+ tokenizer: Gemini
3457
+ instruct_type:
3458
+ :pricing:
3459
+ prompt: '0.0000005'
3460
+ completion: '0.0000015'
3461
+ image: '0.0025'
3462
+ request: '0'
3463
+ :top_provider:
3464
+ context_length: 16384
3465
+ max_completion_tokens: 2048
3466
+ is_moderated: false
3467
+ :per_request_limits:
3468
+ prompt_tokens: '40395633'
3469
+ completion_tokens: '13465211'
3470
+ - :id: mistralai/mixtral-8x7b-instruct
3471
+ :name: Mixtral 8x7B Instruct
3472
+ :created: 1702166400
3473
+ :description: |-
3474
+ A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.
3475
+
3476
+ Instruct model fine-tuned by Mistral. #moe
3477
+ :context_length: 32768
3478
+ :architecture:
3479
+ modality: text->text
3480
+ tokenizer: Mistral
3481
+ instruct_type: mistral
3482
+ :pricing:
3483
+ prompt: '0.00000024'
3484
+ completion: '0.00000024'
3485
+ image: '0'
3486
+ request: '0'
3487
+ :top_provider:
3488
+ context_length: 32768
3489
+ max_completion_tokens:
3490
+ is_moderated: false
3491
+ :per_request_limits:
3492
+ prompt_tokens: '84157569'
3493
+ completion_tokens: '84157569'
3494
+ - :id: mistralai/mixtral-8x7b-instruct:nitro
3495
+ :name: Mixtral 8x7B Instruct (nitro)
3496
+ :created: 1702166400
3497
+ :description: |-
3498
+ A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.
3499
+
3500
+ Instruct model fine-tuned by Mistral. #moe
3501
+
3502
+ _These are higher-throughput endpoints for [Mixtral 8x7B Instruct](/mistralai/mixtral-8x7b-instruct). They may have higher prices._
3503
+ :context_length: 32768
3504
+ :architecture:
3505
+ modality: text->text
3506
+ tokenizer: Mistral
3507
+ instruct_type: mistral
3508
+ :pricing:
3509
+ prompt: '0.00000054'
3510
+ completion: '0.00000054'
3511
+ image: '0'
3512
+ request: '0'
3513
+ :top_provider:
3514
+ context_length: 32768
3515
+ max_completion_tokens:
3516
+ is_moderated: false
3517
+ :per_request_limits:
3518
+ prompt_tokens: '37403364'
3519
+ completion_tokens: '37403364'
3520
+ - :id: mistralai/mixtral-8x7b
3521
+ :name: Mixtral 8x7B (base)
3522
+ :created: 1702166400
3523
+ :description: |-
3524
+ A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixtral 8x7B Instruct](/mistralai/mixtral-8x7b-instruct) for an instruct-tuned model.
3525
+
3526
+ #moe
3527
+ :context_length: 32768
3528
+ :architecture:
3529
+ modality: text->text
3530
+ tokenizer: Mistral
3531
+ instruct_type: none
3532
+ :pricing:
3533
+ prompt: '0.00000054'
3534
+ completion: '0.00000054'
3535
+ image: '0'
3536
+ request: '0'
3537
+ :top_provider:
3538
+ context_length: 32768
3539
+ max_completion_tokens:
3540
+ is_moderated: false
3541
+ :per_request_limits:
3542
+ prompt_tokens: '37403364'
3543
+ completion_tokens: '37403364'
3544
+ - :id: gryphe/mythomist-7b:free
3545
+ :name: MythoMist 7B (free)
3546
+ :created: 1701907200
3547
+ :description: |-
3548
+ From the creator of [MythoMax](/gryphe/mythomax-l2-13b), merges a suite of models to reduce word anticipation, ministrations, and other undesirable words in ChatGPT roleplaying data.
3549
+
3550
+ It combines [Neural Chat 7B](/intel/neural-chat-7b), Airoboros 7B, [Toppy M 7B](/undi95/toppy-m-7b), [Zephyr 7B beta](/huggingfaceh4/zephyr-7b-beta), [Nous Capybara 34B](/nousresearch/nous-capybara-34b), [OpenHermes 2.5](/teknium/openhermes-2.5-mistral-7b), and many others.
3551
+
3552
+ #merge
3553
+
3554
+ _These are free, rate-limited endpoints for [MythoMist 7B](/gryphe/mythomist-7b). Outputs may be cached. Read about rate limits [here](/docs/limits)._
3555
+ :context_length: 32768
3556
+ :architecture:
3557
+ modality: text->text
3558
+ tokenizer: Mistral
3559
+ instruct_type: alpaca
3560
+ :pricing:
3561
+ prompt: '0'
3562
+ completion: '0'
3563
+ image: '0'
3564
+ request: '0'
3565
+ :top_provider:
3566
+ context_length: 8192
3567
+ max_completion_tokens: 4096
3568
+ is_moderated: false
3569
+ :per_request_limits:
3570
+ prompt_tokens: Infinity
3571
+ completion_tokens: Infinity
3572
+ - :id: gryphe/mythomist-7b
3573
+ :name: MythoMist 7B
3574
+ :created: 1701907200
3575
+ :description: |-
3576
+ From the creator of [MythoMax](/gryphe/mythomax-l2-13b), merges a suite of models to reduce word anticipation, ministrations, and other undesirable words in ChatGPT roleplaying data.
3577
+
3578
+ It combines [Neural Chat 7B](/intel/neural-chat-7b), Airoboros 7B, [Toppy M 7B](/undi95/toppy-m-7b), [Zephyr 7B beta](/huggingfaceh4/zephyr-7b-beta), [Nous Capybara 34B](/nousresearch/nous-capybara-34b), [OpenHermes 2.5](/teknium/openhermes-2.5-mistral-7b), and many others.
3579
+
3580
+ #merge
3581
+ :context_length: 32768
3582
+ :architecture:
3583
+ modality: text->text
3584
+ tokenizer: Mistral
3585
+ instruct_type: alpaca
3586
+ :pricing:
3587
+ prompt: '0.000000375'
3588
+ completion: '0.000000375'
3589
+ image: '0'
3590
+ request: '0'
3591
+ :top_provider:
3592
+ context_length: 32768
3593
+ max_completion_tokens: 2048
3594
+ is_moderated: false
3595
+ :per_request_limits:
3596
+ prompt_tokens: '53860844'
3597
+ completion_tokens: '53860844'
3598
+ - :id: openchat/openchat-7b:free
3599
+ :name: OpenChat 3.5 7B (free)
3600
+ :created: 1701129600
3601
+ :description: |-
3602
+ OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.
3603
+
3604
+ - For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/openchat/openchat-7b).
3605
+ - For OpenChat fine-tuned on Llama 8B, check out [OpenChat 8B](/openchat/openchat-8b).
3606
+
3607
+ #open-source
3608
+
3609
+ _These are free, rate-limited endpoints for [OpenChat 3.5 7B](/openchat/openchat-7b). Outputs may be cached. Read about rate limits [here](/docs/limits)._
3610
+ :context_length: 8192
3611
+ :architecture:
3612
+ modality: text->text
3613
+ tokenizer: Mistral
3614
+ instruct_type: openchat
3615
+ :pricing:
3616
+ prompt: '0'
3617
+ completion: '0'
3618
+ image: '0'
3619
+ request: '0'
3620
+ :top_provider:
3621
+ context_length: 8192
3622
+ max_completion_tokens: 4096
3623
+ is_moderated: false
3624
+ :per_request_limits:
3625
+ prompt_tokens: Infinity
3626
+ completion_tokens: Infinity
3627
+ - :id: openchat/openchat-7b
3628
+ :name: OpenChat 3.5 7B
3629
+ :created: 1701129600
3630
+ :description: |-
3631
+ OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.
3632
+
3633
+ - For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/openchat/openchat-7b).
3634
+ - For OpenChat fine-tuned on Llama 8B, check out [OpenChat 8B](/openchat/openchat-8b).
3635
+
3636
+ #open-source
3637
+ :context_length: 8192
3638
+ :architecture:
3639
+ modality: text->text
3640
+ tokenizer: Mistral
3641
+ instruct_type: openchat
3642
+ :pricing:
3643
+ prompt: '0.000000055'
3644
+ completion: '0.000000055'
3645
+ image: '0'
3646
+ request: '0'
3647
+ :top_provider:
3648
+ context_length: 8192
3649
+ max_completion_tokens:
3650
+ is_moderated: false
3651
+ :per_request_limits:
3652
+ prompt_tokens: '367233031'
3653
+ completion_tokens: '367233031'
3654
+ - :id: neversleep/noromaid-20b
3655
+ :name: Noromaid 20B
3656
+ :created: 1700956800
3657
+ :description: |-
3658
+ A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge.
3659
+
3660
+ #merge #uncensored
3661
+ :context_length: 8192
3662
+ :architecture:
3663
+ modality: text->text
3664
+ tokenizer: Llama2
3665
+ instruct_type: alpaca
3666
+ :pricing:
3667
+ prompt: '0.0000015'
3668
+ completion: '0.00000225'
3669
+ image: '0'
3670
+ request: '0'
3671
+ :top_provider:
3672
+ context_length: 8192
3673
+ max_completion_tokens: 2048
3674
+ is_moderated: false
3675
+ :per_request_limits:
3676
+ prompt_tokens: '13465211'
3677
+ completion_tokens: '8976807'
3678
+ - :id: anthropic/claude-instant-1.1
3679
+ :name: 'Anthropic: Claude Instant v1.1'
3680
+ :created: 1700611200
3681
+ :description: Anthropic's model for low-latency, high throughput text generation.
3682
+ Supports hundreds of pages of text.
3683
+ :context_length: 100000
3684
+ :architecture:
3685
+ modality: text->text
3686
+ tokenizer: Claude
3687
+ instruct_type: claude
3688
+ :pricing:
3689
+ prompt: '0.0000008'
3690
+ completion: '0.0000024'
3691
+ image: '0'
3692
+ request: '0'
3693
+ :top_provider:
3694
+ context_length: 100000
3695
+ max_completion_tokens: 2048
3696
+ is_moderated: true
3697
+ :per_request_limits:
3698
+ prompt_tokens: '25247270'
3699
+ completion_tokens: '8415756'
3700
+ - :id: anthropic/claude-2.1
3701
+ :name: 'Anthropic: Claude v2.1'
3702
+ :created: 1700611200
3703
+ :description: 'Claude 2 delivers advancements in key capabilities for enterprises—including
3704
+ an industry-leading 200K token context window, significant reductions in rates
3705
+ of model hallucination, system prompts and a new beta feature: tool use.'
3706
+ :context_length: 200000
3707
+ :architecture:
3708
+ modality: text->text
3709
+ tokenizer: Claude
3710
+ instruct_type:
3711
+ :pricing:
3712
+ prompt: '0.000008'
3713
+ completion: '0.000024'
3714
+ image: '0'
3715
+ request: '0'
3716
+ :top_provider:
3717
+ context_length: 200000
3718
+ max_completion_tokens: 4096
3719
+ is_moderated: true
3720
+ :per_request_limits:
3721
+ prompt_tokens: '2524727'
3722
+ completion_tokens: '841575'
3723
+ - :id: anthropic/claude-2.1:beta
3724
+ :name: 'Anthropic: Claude v2.1 (self-moderated)'
3725
+ :created: 1700611200
3726
+ :description: |-
3727
+ Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.
3728
+
3729
+ _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-2.1) variant._
3730
+ :context_length: 200000
3731
+ :architecture:
3732
+ modality: text->text
3733
+ tokenizer: Claude
3734
+ instruct_type:
3735
+ :pricing:
3736
+ prompt: '0.000008'
3737
+ completion: '0.000024'
3738
+ image: '0'
3739
+ request: '0'
3740
+ :top_provider:
3741
+ context_length: 200000
3742
+ max_completion_tokens: 4096
3743
+ is_moderated: false
3744
+ :per_request_limits:
3745
+ prompt_tokens: '2524727'
3746
+ completion_tokens: '841575'
3747
+ - :id: anthropic/claude-2
3748
+ :name: 'Anthropic: Claude v2'
3749
+ :created: 1700611200
3750
+ :description: 'Claude 2 delivers advancements in key capabilities for enterprises—including
3751
+ an industry-leading 200K token context window, significant reductions in rates
3752
+ of model hallucination, system prompts and a new beta feature: tool use.'
3753
+ :context_length: 200000
3754
+ :architecture:
3755
+ modality: text->text
3756
+ tokenizer: Claude
3757
+ instruct_type:
3758
+ :pricing:
3759
+ prompt: '0.000008'
3760
+ completion: '0.000024'
3761
+ image: '0'
3762
+ request: '0'
3763
+ :top_provider:
3764
+ context_length: 200000
3765
+ max_completion_tokens: 4096
3766
+ is_moderated: true
3767
+ :per_request_limits:
3768
+ prompt_tokens: '2524727'
3769
+ completion_tokens: '841575'
3770
+ - :id: anthropic/claude-2:beta
3771
+ :name: 'Anthropic: Claude v2 (self-moderated)'
3772
+ :created: 1700611200
3773
+ :description: |-
3774
+ Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.
3775
+
3776
+ _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-2) variant._
3777
+ :context_length: 200000
3778
+ :architecture:
3779
+ modality: text->text
3780
+ tokenizer: Claude
3781
+ instruct_type:
3782
+ :pricing:
3783
+ prompt: '0.000008'
3784
+ completion: '0.000024'
3785
+ image: '0'
3786
+ request: '0'
3787
+ :top_provider:
3788
+ context_length: 200000
3789
+ max_completion_tokens: 4096
3790
+ is_moderated: false
3791
+ :per_request_limits:
3792
+ prompt_tokens: '2524727'
3793
+ completion_tokens: '841575'
3794
+ - :id: teknium/openhermes-2.5-mistral-7b
3795
+ :name: OpenHermes 2.5 Mistral 7B
3796
+ :created: 1700438400
3797
+ :description: |-
3798
+ A continuation of [OpenHermes 2 model](/teknium/openhermes-2-mistral-7b), trained on additional code datasets.
3799
+ Potentially the most interesting finding from training on a good ratio of code instruction data (estimated at around 7-14% of the total dataset) was that it boosted several non-code benchmarks, including TruthfulQA, AGIEval, and the GPT4All suite. It did, however, reduce the BigBench benchmark score, but the overall net gain is significant.
3800
+ :context_length: 4096
3801
+ :architecture:
3802
+ modality: text->text
3803
+ tokenizer: Mistral
3804
+ instruct_type: chatml
3805
+ :pricing:
3806
+ prompt: '0.00000017'
3807
+ completion: '0.00000017'
3808
+ image: '0'
3809
+ request: '0'
3810
+ :top_provider:
3811
+ context_length: 4096
3812
+ max_completion_tokens:
3813
+ is_moderated: false
3814
+ :per_request_limits:
3815
+ prompt_tokens: '118810686'
3816
+ completion_tokens: '118810686'
3817
+ - :id: openai/gpt-4-vision-preview
+   :name: 'OpenAI: GPT-4 Vision'
+   :created: 1699833600
+   :description: |-
+     Ability to understand images, in addition to all other [GPT-4 Turbo capabilities](/openai/gpt-4-turbo). Training data: up to Apr 2023.
+
+     **Note:** heavily rate limited by OpenAI while in preview.
+
+     #multimodal
+   :context_length: 128000
+   :architecture:
+     modality: text+image->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00001'
+     completion: '0.00003'
+     image: '0.01445'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '2019781'
+     completion_tokens: '673260'
+ - :id: lizpreciatior/lzlv-70b-fp16-hf
+   :name: lzlv 70B
+   :created: 1699747200
+   :description: |-
+     A Mythomax/MLewd_13B-style merge of selected 70B models.
+     A multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combines creativity with intelligence for an enhanced experience.
+
+     #merge #uncensored
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: airoboros
+   :pricing:
+     prompt: '0.00000035'
+     completion: '0.0000004'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '57708047'
+     completion_tokens: '50494541'
+ - :id: alpindale/goliath-120b
+   :name: Goliath 120B
+   :created: 1699574400
+   :description: |-
+     A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.
+
+     Credits to
+     - [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit).
+     - [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios.
+
+     #merge
+   :context_length: 6144
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: airoboros
+   :pricing:
+     prompt: '0.000009375'
+     completion: '0.000009375'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 6144
+     max_completion_tokens: 400
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '2154433'
+     completion_tokens: '2154433'
+ - :id: undi95/toppy-m-7b:free
+   :name: Toppy M 7B (free)
+   :created: 1699574400
+   :description: |-
+     A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
+     List of merged models:
+     - NousResearch/Nous-Capybara-7B-V1.9
+     - [HuggingFaceH4/zephyr-7b-beta](/huggingfaceh4/zephyr-7b-beta)
+     - lemonilia/AshhLimaRP-Mistral-7B
+     - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
+     - Undi95/Mistral-pippa-sharegpt-7b-qlora
+
+     #merge #uncensored
+
+     _These are free, rate-limited endpoints for [Toppy M 7B](/undi95/toppy-m-7b). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens: 2048
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: undi95/toppy-m-7b
+   :name: Toppy M 7B
+   :created: 1699574400
+   :description: |-
+     A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
+     List of merged models:
+     - NousResearch/Nous-Capybara-7B-V1.9
+     - [HuggingFaceH4/zephyr-7b-beta](/huggingfaceh4/zephyr-7b-beta)
+     - lemonilia/AshhLimaRP-Mistral-7B
+     - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
+     - Undi95/Mistral-pippa-sharegpt-7b-qlora
+
+     #merge #uncensored
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.00000007'
+     completion: '0.00000007'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '288540239'
+     completion_tokens: '288540239'
+ - :id: undi95/toppy-m-7b:nitro
+   :name: Toppy M 7B (nitro)
+   :created: 1699574400
+   :description: |-
+     A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
+     List of merged models:
+     - NousResearch/Nous-Capybara-7B-V1.9
+     - [HuggingFaceH4/zephyr-7b-beta](/huggingfaceh4/zephyr-7b-beta)
+     - lemonilia/AshhLimaRP-Mistral-7B
+     - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
+     - Undi95/Mistral-pippa-sharegpt-7b-qlora
+
+     #merge #uncensored
+
+     _These are higher-throughput endpoints for [Toppy M 7B](/undi95/toppy-m-7b). They may have higher prices._
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.00000007'
+     completion: '0.00000007'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '288540239'
+     completion_tokens: '288540239'
+ - :id: openrouter/auto
+   :name: Auto (best for prompt)
+   :created: 1699401600
+   :description: |-
+     Depending on their size, subject, and complexity, your prompts will be sent to [Llama 3 70B Instruct](/models/meta-llama/llama-3-70b-instruct), [Claude 3.5 Sonnet (self-moderated)](/models/anthropic/claude-3.5-sonnet:beta) or [GPT-4o](/models/openai/gpt-4o). To see which model was used, visit [Activity](/activity).
+
+     A major redesign of this router is coming soon. Stay tuned on [Discord](https://discord.gg/fVyRaUDgxW) for updates.
+   :context_length: 200000
+   :architecture:
+     modality: text->text
+     tokenizer: Router
+     instruct_type:
+   :pricing:
+     prompt: "-1"
+     completion: "-1"
+     request: "-1"
+     image: "-1"
+   :top_provider:
+     context_length:
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+ - :id: openai/gpt-4-1106-preview
+   :name: 'OpenAI: GPT-4 Turbo (older v1106)'
+   :created: 1699228800
+   :description: |-
+     The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.
+
+     Training data: up to April 2023.
+   :context_length: 128000
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00001'
+     completion: '0.00003'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 128000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '2019781'
+     completion_tokens: '673260'
+ - :id: openai/gpt-3.5-turbo-1106
+   :name: 'OpenAI: GPT-3.5 Turbo 16k (older v1106)'
+   :created: 1699228800
+   :description: 'An older GPT-3.5 Turbo model with improved instruction following,
+     JSON mode, reproducible outputs, parallel function calling, and more. Training
+     data: up to Sep 2021.'
+   :context_length: 16385
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 16385
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '10098908'
+ - :id: google/palm-2-codechat-bison-32k
+   :name: 'Google: PaLM 2 Code Chat 32k'
+   :created: 1698969600
+   :description: PaLM 2 fine-tuned for chatbot conversations that help with code-related
+     questions.
+   :context_length: 32760
+   :architecture:
+     modality: text->text
+     tokenizer: PaLM
+     instruct_type:
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens: 8192
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '10098908'
+ - :id: google/palm-2-chat-bison-32k
+   :name: 'Google: PaLM 2 Chat 32k'
+   :created: 1698969600
+   :description: PaLM 2 is a language model by Google with improved multilingual, reasoning
+     and coding capabilities.
+   :context_length: 32760
+   :architecture:
+     modality: text->text
+     tokenizer: PaLM
+     instruct_type:
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32768
+     max_completion_tokens: 8192
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '10098908'
+ - :id: jondurbin/airoboros-l2-70b
+   :name: Airoboros 70B
+   :created: 1698537600
+   :description: |-
+     A Llama 2 70B fine-tune using synthetic data (the Airoboros dataset).
+
+     Currently based on [jondurbin/airoboros-l2-70b](https://huggingface.co/jondurbin/airoboros-l2-70b-2.2.1), but might get updated in the future.
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: airoboros
+   :pricing:
+     prompt: '0.0000005'
+     completion: '0.0000005'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '40395633'
+     completion_tokens: '40395633'
+ - :id: xwin-lm/xwin-lm-70b
+   :name: Xwin 70B
+   :created: 1697328000
+   :description: Xwin-LM aims to develop and open-source alignment tech for LLMs. Our
+     first release, built upon the [Llama2](/${Model.Llama_2_13B_Chat}) base models,
+     ranked TOP-1 on AlpacaEval. Notably, it's the first to surpass [GPT-4](/${Model.GPT_4})
+     on this benchmark. The project will be continuously updated.
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: airoboros
+   :pricing:
+     prompt: '0.00000375'
+     completion: '0.00000375'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 400
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '5386084'
+     completion_tokens: '5386084'
+ - :id: mistralai/mistral-7b-instruct-v0.1
+   :name: 'Mistral: Mistral 7B Instruct v0.1'
+   :created: 1695859200
+   :description: A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks,
+     with optimizations for speed and context length.
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: mistral
+   :pricing:
+     prompt: '0.00000018'
+     completion: '0.00000018'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '112210093'
+     completion_tokens: '112210093'
+ - :id: openai/gpt-3.5-turbo-instruct
+   :name: 'OpenAI: GPT-3.5 Turbo Instruct'
+   :created: 1695859200
+   :description: 'This model is a variant of GPT-3.5 Turbo tuned for instructional
+     prompts, omitting chat-related optimizations. Training data: up to Sep 2021.'
+   :context_length: 4095
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type: chatml
+   :pricing:
+     prompt: '0.0000015'
+     completion: '0.000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4095
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '13465211'
+     completion_tokens: '10098908'
+ - :id: pygmalionai/mythalion-13b
+   :name: 'Pygmalion: Mythalion 13B'
+   :created: 1693612800
+   :description: 'A blend of the new Pygmalion-13b and MythoMax. #merge'
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.000001125'
+     completion: '0.000001125'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 400
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '17953614'
+     completion_tokens: '17953614'
+ - :id: openai/gpt-4-32k-0314
+   :name: 'OpenAI: GPT-4 32k (older v0314)'
+   :created: 1693180800
+   :description: 'GPT-4-32k is an extended version of GPT-4, with the same capabilities
+     but quadrupled context length, allowing for processing up to 40 pages of text
+     in a single pass. This is particularly beneficial for handling longer content
+     like interacting with PDFs without an external vector database. Training data:
+     up to Sep 2021.'
+   :context_length: 32767
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00006'
+     completion: '0.00012'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32767
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '336630'
+     completion_tokens: '168315'
+ - :id: openai/gpt-4-32k
+   :name: 'OpenAI: GPT-4 32k'
+   :created: 1693180800
+   :description: 'GPT-4-32k is an extended version of GPT-4, with the same capabilities
+     but quadrupled context length, allowing for processing up to 40 pages of text
+     in a single pass. This is particularly beneficial for handling longer content
+     like interacting with PDFs without an external vector database. Training data:
+     up to Sep 2021.'
+   :context_length: 32767
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00006'
+     completion: '0.00012'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 32767
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '336630'
+     completion_tokens: '168315'
+ - :id: openai/gpt-3.5-turbo-16k
+   :name: 'OpenAI: GPT-3.5 Turbo 16k'
+   :created: 1693180800
+   :description: 'This model offers four times the context length of gpt-3.5-turbo,
+     allowing it to support approximately 20 pages of text in a single request at a
+     higher cost. Training data: up to Sep 2021.'
+   :context_length: 16385
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.000003'
+     completion: '0.000004'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 16385
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '6732605'
+     completion_tokens: '5049454'
+ - :id: nousresearch/nous-hermes-llama2-13b
+   :name: 'Nous: Hermes 13B'
+   :created: 1692489600
+   :description: A state-of-the-art language model fine-tuned on over 300k instructions
+     by Nous Research, with Teknium and Emozilla leading the fine-tuning process.
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.00000017'
+     completion: '0.00000017'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '118810686'
+     completion_tokens: '118810686'
+ - :id: huggingfaceh4/zephyr-7b-beta:free
+   :name: 'Hugging Face: Zephyr 7B (free)'
+   :created: 1690934400
+   :description: |-
+     Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](/mistralai/mistral-7b-instruct-v0.1) that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
+
+     _These are free, rate-limited endpoints for [Zephyr 7B](/huggingfaceh4/zephyr-7b-beta). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Mistral
+     instruct_type: zephyr
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens: 2048
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: mancer/weaver
+   :name: 'Mancer: Weaver (alpha)'
+   :created: 1690934400
+   :description: An attempt to recreate Claude-style verbosity, but don't expect the
+     same level of coherence or memory. Meant for use in roleplay/narrative situations.
+   :context_length: 8000
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.000001875'
+     completion: '0.00000225'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8000
+     max_completion_tokens: 1000
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '10772168'
+     completion_tokens: '8976807'
+ - :id: anthropic/claude-instant-1.0
+   :name: 'Anthropic: Claude Instant v1.0'
+   :created: 1690502400
+   :description: Anthropic's model for low-latency, high throughput text generation.
+     Supports hundreds of pages of text.
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type: claude
+   :pricing:
+     prompt: '0.0000008'
+     completion: '0.0000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '25247270'
+     completion_tokens: '8415756'
+ - :id: anthropic/claude-1.2
+   :name: 'Anthropic: Claude v1.2'
+   :created: 1690502400
+   :description: Anthropic's model for low-latency, high throughput text generation.
+     Supports hundreds of pages of text.
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type: claude
+   :pricing:
+     prompt: '0.000008'
+     completion: '0.000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '2524727'
+     completion_tokens: '841575'
+ - :id: anthropic/claude-1
+   :name: 'Anthropic: Claude v1'
+   :created: 1690502400
+   :description: Anthropic's model for low-latency, high throughput text generation.
+     Supports hundreds of pages of text.
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type: claude
+   :pricing:
+     prompt: '0.000008'
+     completion: '0.000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '2524727'
+     completion_tokens: '841575'
+ - :id: anthropic/claude-instant-1
+   :name: 'Anthropic: Claude Instant v1'
+   :created: 1690502400
+   :description: Anthropic's model for low-latency, high throughput text generation.
+     Supports hundreds of pages of text.
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type:
+   :pricing:
+     prompt: '0.0000008'
+     completion: '0.0000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '25247270'
+     completion_tokens: '8415756'
+ - :id: anthropic/claude-instant-1:beta
+   :name: 'Anthropic: Claude Instant v1 (self-moderated)'
+   :created: 1690502400
+   :description: |-
+     Anthropic's model for low-latency, high throughput text generation. Supports hundreds of pages of text.
+
+     _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-instant-1) variant._
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type:
+   :pricing:
+     prompt: '0.0000008'
+     completion: '0.0000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '25247270'
+     completion_tokens: '8415756'
+ - :id: anthropic/claude-2.0
+   :name: 'Anthropic: Claude v2.0'
+   :created: 1690502400
+   :description: Anthropic's flagship model. Superior performance on tasks that require
+     complex reasoning. Supports hundreds of pages of text.
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type:
+   :pricing:
+     prompt: '0.000008'
+     completion: '0.000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '2524727'
+     completion_tokens: '841575'
+ - :id: anthropic/claude-2.0:beta
+   :name: 'Anthropic: Claude v2.0 (self-moderated)'
+   :created: 1690502400
+   :description: |-
+     Anthropic's flagship model. Superior performance on tasks that require complex reasoning. Supports hundreds of pages of text.
+
+     _This is a faster endpoint, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the provider's side instead of OpenRouter's. For requests that pass moderation, it's identical to the [Standard](/anthropic/claude-2.0) variant._
+   :context_length: 100000
+   :architecture:
+     modality: text->text
+     tokenizer: Claude
+     instruct_type:
+   :pricing:
+     prompt: '0.000008'
+     completion: '0.000024'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 100000
+     max_completion_tokens: 4096
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '2524727'
+     completion_tokens: '841575'
+ - :id: undi95/remm-slerp-l2-13b
+   :name: ReMM SLERP 13B
+   :created: 1689984000
+   :description: 'A recreation trial of the original MythoMax-L2-B13 but with updated
+     models. #merge'
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.000001125'
+     completion: '0.000001125'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 6144
+     max_completion_tokens: 400
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '17953614'
+     completion_tokens: '17953614'
+ - :id: undi95/remm-slerp-l2-13b:extended
+   :name: ReMM SLERP 13B (extended)
+   :created: 1689984000
+   :description: |-
+     A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge
+
+     _These are extended-context endpoints for [ReMM SLERP 13B](/undi95/remm-slerp-l2-13b). They may have higher prices._
+   :context_length: 6144
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.000001125'
+     completion: '0.000001125'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 6144
+     max_completion_tokens: 400
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '17953614'
+     completion_tokens: '17953614'
+ - :id: google/palm-2-codechat-bison
+   :name: 'Google: PaLM 2 Code Chat'
+   :created: 1689811200
+   :description: PaLM 2 fine-tuned for chatbot conversations that help with code-related
+     questions.
+   :context_length: 7168
+   :architecture:
+     modality: text->text
+     tokenizer: PaLM
+     instruct_type:
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 7168
+     max_completion_tokens: 1024
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '10098908'
+ - :id: google/palm-2-chat-bison
+   :name: 'Google: PaLM 2 Chat'
+   :created: 1689811200
+   :description: PaLM 2 is a language model by Google with improved multilingual, reasoning
+     and coding capabilities.
+   :context_length: 9216
+   :architecture:
+     modality: text->text
+     tokenizer: PaLM
+     instruct_type:
+   :pricing:
+     prompt: '0.000001'
+     completion: '0.000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 9216
+     max_completion_tokens: 1024
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '20197816'
+     completion_tokens: '10098908'
+ - :id: gryphe/mythomax-l2-13b:free
+   :name: MythoMax 13B (free)
+   :created: 1688256000
+   :description: |-
+     One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
+
+     _These are free, rate-limited endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b). Outputs may be cached. Read about rate limits [here](/docs/limits)._
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0'
+     completion: '0'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens: 2048
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: Infinity
+     completion_tokens: Infinity
+ - :id: gryphe/mythomax-l2-13b
+   :name: MythoMax 13B
+   :created: 1688256000
+   :description: 'One of the highest performing and most popular fine-tunes of Llama
+     2 13B, with rich descriptions and roleplay. #merge'
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.0000001'
+     completion: '0.0000001'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '201978167'
+     completion_tokens: '201978167'
+ - :id: gryphe/mythomax-l2-13b:nitro
+   :name: MythoMax 13B (nitro)
+   :created: 1688256000
+   :description: |-
+     One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
+
+     _These are higher-throughput endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b). They may have higher prices._
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.0000002'
+     completion: '0.0000002'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '100989083'
+     completion_tokens: '100989083'
+ - :id: gryphe/mythomax-l2-13b:extended
+   :name: MythoMax 13B (extended)
+   :created: 1688256000
+   :description: |-
+     One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge
+
+     _These are extended-context endpoints for [MythoMax 13B](/gryphe/mythomax-l2-13b). They may have higher prices._
+   :context_length: 8192
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: alpaca
+   :pricing:
+     prompt: '0.000001125'
+     completion: '0.000001125'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8192
+     max_completion_tokens: 400
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '17953614'
+     completion_tokens: '17953614'
+ - :id: meta-llama/llama-2-13b-chat
+   :name: 'Meta: Llama v2 13B Chat'
+   :created: 1687219200
+   :description: A 13 billion parameter language model from Meta, fine-tuned for chat
+     completions
+   :context_length: 4096
+   :architecture:
+     modality: text->text
+     tokenizer: Llama2
+     instruct_type: llama2
+   :pricing:
+     prompt: '0.000000198'
+     completion: '0.000000198'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 4096
+     max_completion_tokens:
+     is_moderated: false
+   :per_request_limits:
+     prompt_tokens: '102009175'
+     completion_tokens: '102009175'
+ - :id: openai/gpt-4-0314
+   :name: 'OpenAI: GPT-4 (older v0314)'
+   :created: 1685232000
+   :description: 'GPT-4-0314 is the first version of GPT-4 released, with a context
+     length of 8,192 tokens, and was supported until June 14. Training data: up to
+     Sep 2021.'
+   :context_length: 8191
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00003'
+     completion: '0.00006'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8191
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '673260'
+     completion_tokens: '336630'
+ - :id: openai/gpt-4
+   :name: 'OpenAI: GPT-4'
+   :created: 1685232000
+   :description: 'OpenAI''s flagship model, GPT-4 is a large-scale multimodal language
+     model capable of solving difficult problems with greater accuracy than previous
+     models due to its broader general knowledge and advanced reasoning capabilities.
+     Training data: up to Sep 2021.'
+   :context_length: 8191
+   :architecture:
+     modality: text->text
+     tokenizer: GPT
+     instruct_type:
+   :pricing:
+     prompt: '0.00003'
+     completion: '0.00006'
+     image: '0'
+     request: '0'
+   :top_provider:
+     context_length: 8191
+     max_completion_tokens: 4096
+     is_moderated: true
+   :per_request_limits:
+     prompt_tokens: '673260'
+     completion_tokens: '336630'
+ - :id: openai/gpt-3.5-turbo-0301
4769
+ :name: 'OpenAI: GPT-3.5 Turbo (older v0301)'
4770
+ :created: 1685232000
4771
+ :description: |-
4772
+ GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.
4773
+
4774
+ Training data up to Sep 2021.
4775
+ :context_length: 4095
4776
+ :architecture:
4777
+ modality: text->text
4778
+ tokenizer: GPT
4779
+ instruct_type:
4780
+ :pricing:
4781
+ prompt: '0.000001'
4782
+ completion: '0.000002'
4783
+ image: '0'
4784
+ request: '0'
4785
+ :top_provider:
4786
+ context_length: 4095
4787
+ max_completion_tokens: 4096
4788
+ is_moderated: true
4789
+ :per_request_limits:
4790
+ prompt_tokens: '20197816'
4791
+ completion_tokens: '10098908'
4792
+ - :id: openai/gpt-3.5-turbo-0125
4793
+ :name: 'OpenAI: GPT-3.5 Turbo 16k'
4794
+ :created: 1685232000
4795
+ :description: |-
4796
+ The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021.
4797
+
4798
+ This version has a higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.
4799
+ :context_length: 16385
4800
+ :architecture:
4801
+ modality: text->text
4802
+ tokenizer: GPT
4803
+ instruct_type:
4804
+ :pricing:
4805
+ prompt: '0.0000005'
4806
+ completion: '0.0000015'
4807
+ image: '0'
4808
+ request: '0'
4809
+ :top_provider:
4810
+ context_length: 16385
4811
+ max_completion_tokens: 4096
4812
+ is_moderated: true
4813
+ :per_request_limits:
4814
+ prompt_tokens: '40395633'
4815
+ completion_tokens: '13465211'
4816
+ - :id: openai/gpt-3.5-turbo
4817
+ :name: 'OpenAI: GPT-3.5 Turbo'
4818
+ :created: 1685232000
4819
+ :description: |-
4820
+ GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.
4821
+
4822
+ Training data up to Sep 2021.
4823
+ :context_length: 16385
4824
+ :architecture:
4825
+ modality: text->text
4826
+ tokenizer: GPT
4827
+ instruct_type:
4828
+ :pricing:
4829
+ prompt: '0.0000005'
4830
+ completion: '0.0000015'
4831
+ image: '0'
4832
+ request: '0'
4833
+ :top_provider:
4834
+ context_length: 16385
4835
+ max_completion_tokens: 4096
4836
+ is_moderated: true
4837
+ :per_request_limits:
4838
+ prompt_tokens: '40395633'
4839
+ completion_tokens: '13465211'