endoreg-db 0.8.2.4__py3-none-any.whl → 0.8.2.6__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of endoreg-db might be problematic.

@@ -1,1528 +0,0 @@
1
- API reference
2
-
3
- Endpoints
4
-
5
- Generate a completion
6
- Generate a chat completion
7
- Create a Model
8
- List Local Models
9
- Show Model Information
10
- Copy a Model
11
- Delete a Model
12
- Pull a Model
13
- Push a Model
14
- Generate Embeddings
15
- List Running Models
16
- Web search
17
- Version
18
-
19
-
20
- Conventions
21
-
22
- Model names
23
- Model names follow a model:tag format, where model can have an optional namespace such as example/model. Some examples are orca-mini:3b-q4_1 and llama3:70b. The tag is optional and, if not provided, will default to latest. The tag is used to identify a specific version.
24
-
25
- Durations
26
- All durations are returned in nanoseconds.
27
-
28
- Streaming responses
29
- Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing {"stream": false} for these endpoints.
30
-
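- A minimal Python sketch of consuming such a stream from the /api/generate endpoint described below, assuming the third-party requests package is installed and an Ollama server is running locally (the model name is only an example):
-
- import json
- import requests  # assumed third-party dependency
-
- # Each streamed line is one JSON object; print tokens as they arrive.
- with requests.post(
-     "http://localhost:11434/api/generate",
-     json={"model": "llama3.2", "prompt": "Why is the sky blue?"},
-     stream=True,
- ) as resp:
-     resp.raise_for_status()
-     for line in resp.iter_lines():
-         if not line:
-             continue
-         chunk = json.loads(line)
-         print(chunk.get("response", ""), end="", flush=True)
-         if chunk.get("done"):
-             break
-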
31
- Generate a completion
32
- Copy
33
-
34
- POST /api/generate
35
-
36
- Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
37
-
38
- Parameters
39
-
40
- model: (required) the model name
41
- prompt: the prompt to generate a response for
42
- suffix: the text after the model response
43
- images: (optional) a list of base64-encoded images (for multimodal models such as llava)
44
-
45
- Advanced parameters (optional):
46
-
47
- format: the format to return a response in. Format can be json or a JSON schema
48
- options: additional model parameters listed in the documentation for the Modelfile such as temperature
49
- system: system message (overrides what is defined in the Modelfile)
50
- template: the prompt template to use (overrides what is defined in the Modelfile)
51
- stream: if false the response will be returned as a single response object, rather than a stream of objects
52
- raw: if true no formatting will be applied to the prompt. You may choose to use the raw parameter if you are specifying a full templated prompt in your request to the API
53
- keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
54
- context (deprecated): the context parameter returned from a previous request to /generate; this can be used to keep a short conversational memory
55
-
56
-
57
- Structured outputs
58
- Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the structured outputs example below.
59
-
60
- JSON mode
61
- Enable JSON mode by setting the format parameter to json. This will structure the response as a valid JSON object. See the JSON mode example below.
62
- It’s important to instruct the model to use JSON in the prompt. Otherwise, the model may generate large amounts of whitespace.
63
-
64
- Examples
65
-
66
- Generate request (Streaming)
67
- Request
68
- Copy
69
-
70
- curl http://localhost:11434/api/generate -d '{
71
- "model": "llama3.2",
72
- "prompt": "Why is the sky blue?"
73
- }'
74
-
75
- Response
76
- A stream of JSON objects is returned:
77
- Copy
78
-
79
- {
80
- "model": "llama3.2",
81
- "created_at": "2023-08-04T08:52:19.385406455-07:00",
82
- "response": "The",
83
- "done": false
84
- }
85
-
86
- The final response in the stream also includes additional data about the generation:
87
-
88
- total_duration: time spent generating the response
89
- load_duration: time spent in nanoseconds loading the model
90
- prompt_eval_count: number of tokens in the prompt
91
- prompt_eval_duration: time spent in nanoseconds evaluating the prompt
92
- eval_count: number of tokens in the response
93
- eval_duration: time in nanoseconds spent generating the response
94
- context: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
95
- response: empty if the response was streamed; if not streamed, this will contain the full response
96
-
97
- To calculate how fast the response is generated in tokens per second (tokens/s), divide eval_count by eval_duration and multiply by 10^9.
98
- Copy
99
-
100
- {
101
- "model": "llama3.2",
102
- "created_at": "2023-08-04T19:22:45.499127Z",
103
- "response": "",
104
- "done": true,
105
- "context": [1, 2, 3],
106
- "total_duration": 10706818083,
107
- "load_duration": 6338219291,
108
- "prompt_eval_count": 26,
109
- "prompt_eval_duration": 130079000,
110
- "eval_count": 259,
111
- "eval_duration": 4232710000
112
- }
113
-
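- As a quick check of the tokens-per-second formula above, a short Python sketch using the numbers from the sample response (values copied from the example, not measured):
-
- # eval_count tokens generated over eval_duration nanoseconds
- eval_count = 259
- eval_duration = 4232710000
- tokens_per_second = eval_count / eval_duration * 1e9
- print(f"{tokens_per_second:.1f} tokens/s")  # roughly 61.2 tokens/s
-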
114
-
115
- Request (No streaming)
116
- Request
117
- A response can be received in one reply when streaming is off.
118
- Copy
119
-
120
- curl http://localhost:11434/api/generate -d '{
121
- "model": "llama3.2",
122
- "prompt": "Why is the sky blue?",
123
- "stream": false
124
- }'
125
-
126
- Response
127
- If stream is set to false, the response will be a single JSON object:
128
- Copy
129
-
130
- {
131
- "model": "llama3.2",
132
- "created_at": "2023-08-04T19:22:45.499127Z",
133
- "response": "The sky is blue because it is the color of the sky.",
134
- "done": true,
135
- "context": [1, 2, 3],
136
- "total_duration": 5043500667,
137
- "load_duration": 5025959,
138
- "prompt_eval_count": 26,
139
- "prompt_eval_duration": 325953000,
140
- "eval_count": 290,
141
- "eval_duration": 4709213000
142
- }
143
-
144
-
145
- Request (with suffix)
146
- Request
147
- Copy
148
-
149
- curl http://localhost:11434/api/generate -d '{
150
- "model": "codellama:code",
151
- "prompt": "def compute_gcd(a, b):",
152
- "suffix": " return result",
153
- "options": {
154
- "temperature": 0
155
- },
156
- "stream": false
157
- }'
158
-
159
- Response
160
- Copy
161
-
162
- {
163
- "model": "codellama:code",
164
- "created_at": "2024-07-22T20:47:51.147561Z",
165
- "response": "\n if a == 0:\n return b\n else:\n return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n result = (a * b) / compute_gcd(a, b)\n",
166
- "done": true,
167
- "done_reason": "stop",
168
- "context": [...],
169
- "total_duration": 1162761250,
170
- "load_duration": 6683708,
171
- "prompt_eval_count": 17,
172
- "prompt_eval_duration": 201222000,
173
- "eval_count": 63,
174
- "eval_duration": 953997000
175
- }
176
-
177
-
178
- Request (Structured outputs)
179
- Request
180
- Copy
181
-
182
- curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
183
- "model": "llama3.1:8b",
184
- "prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
185
- "stream": false,
186
- "format": {
187
- "type": "object",
188
- "properties": {
189
- "age": {
190
- "type": "integer"
191
- },
192
- "available": {
193
- "type": "boolean"
194
- }
195
- },
196
- "required": [
197
- "age",
198
- "available"
199
- ]
200
- }
201
- }'
202
-
203
- Response
204
- Copy
205
-
206
- {
207
- "model": "llama3.1:8b",
208
- "created_at": "2024-12-06T00:48:09.983619Z",
209
- "response": "{\n \"age\": 22,\n \"available\": true\n}",
210
- "done": true,
211
- "done_reason": "stop",
212
- "context": [1, 2, 3],
213
- "total_duration": 1075509083,
214
- "load_duration": 567678166,
215
- "prompt_eval_count": 28,
216
- "prompt_eval_duration": 236000000,
217
- "eval_count": 16,
218
- "eval_duration": 269000000
219
- }
220
-
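- Because the schema-constrained answer comes back as a JSON string in the response field, a client typically parses it again. A minimal Python sketch of that step, assuming the requests package, with the model name and prompt taken from the example above:
-
- import json
- import requests  # assumed third-party dependency
-
- schema = {
-     "type": "object",
-     "properties": {"age": {"type": "integer"}, "available": {"type": "boolean"}},
-     "required": ["age", "available"],
- }
- resp = requests.post(
-     "http://localhost:11434/api/generate",
-     json={
-         "model": "llama3.1:8b",
-         "prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
-         "stream": False,
-         "format": schema,
-     },
- )
- data = json.loads(resp.json()["response"])  # parse the schema-constrained string
- print(data["age"], data["available"])
-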
221
-
222
- Request (JSON mode)
223
- When format is set to json, the output will always be a well-formed JSON object. It’s important to also instruct the model to respond in JSON.
224
- Request
225
- Copy
226
-
227
- curl http://localhost:11434/api/generate -d '{
228
- "model": "llama3.2",
229
- "prompt": "What color is the sky at different times of the day? Respond using JSON",
230
- "format": "json",
231
- "stream": false
232
- }'
233
-
234
- Response
235
- Copy
236
-
237
- {
238
- "model": "llama3.2",
239
- "created_at": "2023-11-09T21:07:55.186497Z",
240
- "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
241
- "done": true,
242
- "context": [1, 2, 3],
243
- "total_duration": 4648158584,
244
- "load_duration": 4071084,
245
- "prompt_eval_count": 36,
246
- "prompt_eval_duration": 439038000,
247
- "eval_count": 180,
248
- "eval_duration": 4196918000
249
- }
250
-
251
- The value of response will be a string containing JSON similar to:
252
- Copy
253
-
254
- {
255
- "morning": {
256
- "color": "blue"
257
- },
258
- "noon": {
259
- "color": "blue-gray"
260
- },
261
- "afternoon": {
262
- "color": "warm gray"
263
- },
264
- "evening": {
265
- "color": "orange"
266
- }
267
- }
268
-
269
-
270
- Request (with images)
271
- To submit images to multimodal models such as llava or bakllava, provide a list of base64-encoded images:
272
-
273
- Request
274
- Copy
275
-
276
- curl http://localhost:11434/api/generate -d '{
277
- "model": "llava",
278
- "prompt":"What is in this picture?",
279
- "stream": false,
280
- "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5
rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
281
- }'
282
-
283
-
284
- Response
285
- Copy
286
-
287
- {
288
- "model": "llava",
289
- "created_at": "2023-11-03T15:36:02.583064Z",
290
- "response": "A happy cartoon character, which is cute and cheerful.",
291
- "done": true,
292
- "context": [1, 2, 3],
293
- "total_duration": 2938432250,
294
- "load_duration": 2559292,
295
- "prompt_eval_count": 1,
296
- "prompt_eval_duration": 2195557000,
297
- "eval_count": 44,
298
- "eval_duration": 736432000
299
- }
300
-
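- The images field expects raw base64 strings, so a client has to encode the image bytes itself. A minimal Python sketch, assuming the requests package; the file name picture.png is only a placeholder:
-
- import base64
- import requests  # assumed third-party dependency
-
- # Encode a local image for the images field.
- with open("picture.png", "rb") as f:
-     image_b64 = base64.b64encode(f.read()).decode("ascii")
-
- resp = requests.post(
-     "http://localhost:11434/api/generate",
-     json={"model": "llava", "prompt": "What is in this picture?", "stream": False, "images": [image_b64]},
- )
- print(resp.json()["response"])
-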
301
-
302
- Request (Raw Mode)
303
- In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the raw parameter to disable templating. Also note that raw mode will not return a context.
304
- Request
305
- Copy
306
-
307
- curl http://localhost:11434/api/generate -d '{
308
- "model": "mistral",
309
- "prompt": "[INST] why is the sky blue? [/INST]",
310
- "raw": true,
311
- "stream": false
312
- }'
313
-
314
-
315
- Request (Reproducible outputs)
316
- For reproducible outputs, set seed to a number:
317
- Request
318
- Copy
319
-
320
- curl http://localhost:11434/api/generate -d '{
321
- "model": "mistral",
322
- "prompt": "Why is the sky blue?",
323
- "options": {
324
- "seed": 123
325
- }
326
- }'
327
-
328
- Response
329
- Copy
330
-
331
- {
332
- "model": "mistral",
333
- "created_at": "2023-11-03T15:36:02.583064Z",
334
- "response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
335
- "done": true,
336
- "total_duration": 8493852375,
337
- "load_duration": 6589624375,
338
- "prompt_eval_count": 14,
339
- "prompt_eval_duration": 119039000,
340
- "eval_count": 110,
341
- "eval_duration": 1779061000
342
- }
343
-
344
-
345
- Generate request (With options)
346
- If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the options parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.
347
- Request
348
- Copy
349
-
350
- curl http://localhost:11434/api/generate -d '{
351
- "model": "llama3.2",
352
- "prompt": "Why is the sky blue?",
353
- "stream": false,
354
- "options": {
355
- "num_keep": 5,
356
- "seed": 42,
357
- "num_predict": 100,
358
- "top_k": 20,
359
- "top_p": 0.9,
360
- "min_p": 0.0,
361
- "typical_p": 0.7,
362
- "repeat_last_n": 33,
363
- "temperature": 0.8,
364
- "repeat_penalty": 1.2,
365
- "presence_penalty": 1.5,
366
- "frequency_penalty": 1.0,
367
- "mirostat": 1,
368
- "mirostat_tau": 0.8,
369
- "mirostat_eta": 0.6,
370
- "penalize_newline": true,
371
- "stop": ["\n", "user:"],
372
- "numa": false,
373
- "num_ctx": 1024,
374
- "num_batch": 2,
375
- "num_gpu": 1,
376
- "main_gpu": 0,
377
- "low_vram": false,
378
- "vocab_only": false,
379
- "use_mmap": true,
380
- "use_mlock": false,
381
- "num_thread": 8
382
- }
383
- }'
384
-
385
- Response
386
- Copy
387
-
388
- {
389
- "model": "llama3.2",
390
- "created_at": "2023-08-04T19:22:45.499127Z",
391
- "response": "The sky is blue because it is the color of the sky.",
392
- "done": true,
393
- "context": [1, 2, 3],
394
- "total_duration": 4935886791,
395
- "load_duration": 534986708,
396
- "prompt_eval_count": 26,
397
- "prompt_eval_duration": 107345000,
398
- "eval_count": 237,
399
- "eval_duration": 4289432000
400
- }
401
-
402
-
403
- Load a model
404
- If an empty prompt is provided, the model will be loaded into memory.
405
- Request
406
- Copy
407
-
408
- curl http://localhost:11434/api/generate -d '{
409
- "model": "llama3.2"
410
- }'
411
-
412
- Response
413
- A single JSON object is returned:
414
- Copy
415
-
416
- {
417
- "model": "llama3.2",
418
- "created_at": "2023-12-18T19:52:07.071755Z",
419
- "response": "",
420
- "done": true
421
- }
422
-
423
-
424
- Unload a model
425
- If an empty prompt is provided and the keep_alive parameter is set to 0, a model will be unloaded from memory.
426
- Request
427
- Copy
428
-
429
- curl http://localhost:11434/api/generate -d '{
430
- "model": "llama3.2",
431
- "keep_alive": 0
432
- }'
433
-
434
- Response
435
- A single JSON object is returned:
436
- Copy
437
-
438
- {
439
- "model": "llama3.2",
440
- "created_at": "2024-09-12T03:54:03.516566Z",
441
- "response": "",
442
- "done": true,
443
- "done_reason": "unload"
444
- }
445
-
446
-
447
- Generate a chat completion
448
- Copy
449
-
450
- POST /api/chat
451
-
452
- Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled using "stream": false. The final response object will include statistics and additional data from the request.
453
-
454
- Parameters
455
-
456
- model: (required) the model name
457
- messages: the messages of the chat; this can be used to keep a chat memory
458
- tools: list of tools in JSON for the model to use if supported
459
-
460
- The message object has the following fields:
461
-
462
- role: the role of the message, either system, user, assistant, or tool
463
- content: the content of the message
464
- images (optional): a list of images to include in the message (for multimodal models such as llava)
465
- tool_calls (optional): a list of tools in JSON that the model wants to use
466
-
467
- Advanced parameters (optional):
468
-
469
- format: the format to return a response in. Format can be json or a JSON schema.
470
- options: additional model parameters listed in the documentation for the Modelfile such as temperature
471
- stream: if false the response will be returned as a single response object, rather than a stream of objects
472
- keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
473
-
474
-
475
- Structured outputs
476
- Structured outputs are supported by providing a JSON schema in the format parameter. The model will generate a response that matches the schema. See the Chat request (Structured outputs) example below.
477
-
478
- Examples
479
-
480
- Chat Request (Streaming)
481
- Request
482
- Send a chat message with a streaming response.
483
- Copy
484
-
485
- curl http://localhost:11434/api/chat -d '{
486
- "model": "llama3.2",
487
- "messages": [
488
- {
489
- "role": "user",
490
- "content": "why is the sky blue?"
491
- }
492
- ]
493
- }'
494
-
495
- Response
496
- A stream of JSON objects is returned:
497
- Copy
498
-
499
- {
500
- "model": "llama3.2",
501
- "created_at": "2023-08-04T08:52:19.385406455-07:00",
502
- "message": {
503
- "role": "assistant",
504
- "content": "The",
505
- "images": null
506
- },
507
- "done": false
508
- }
509
-
510
- Final response:
511
- Copy
512
-
513
- {
514
- "model": "llama3.2",
515
- "created_at": "2023-08-04T19:22:45.499127Z",
516
- "message": {
517
- "role": "assistant",
518
- "content": ""
519
- },
520
- "done": true,
521
- "total_duration": 4883583458,
522
- "load_duration": 1334875,
523
- "prompt_eval_count": 26,
524
- "prompt_eval_duration": 342546000,
525
- "eval_count": 282,
526
- "eval_duration": 4535599000
527
- }
528
-
529
-
530
- Chat request (No streaming)
531
- Request
532
- Copy
533
-
534
- curl http://localhost:11434/api/chat -d '{
535
- "model": "llama3.2",
536
- "messages": [
537
- {
538
- "role": "user",
539
- "content": "why is the sky blue?"
540
- }
541
- ],
542
- "stream": false
543
- }'
544
-
545
- Response
546
- Copy
547
-
548
- {
549
- "model": "llama3.2",
550
- "created_at": "2023-12-12T14:13:43.416799Z",
551
- "message": {
552
- "role": "assistant",
553
- "content": "Hello! How are you today?"
554
- },
555
- "done": true,
556
- "total_duration": 5191566416,
557
- "load_duration": 2154458,
558
- "prompt_eval_count": 26,
559
- "prompt_eval_duration": 383809000,
560
- "eval_count": 298,
561
- "eval_duration": 4799921000
562
- }
563
-
564
-
565
- Chat request (Structured outputs)
566
- Request
567
- Copy
568
-
569
- curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
570
- "model": "llama3.1",
571
- "messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
572
- "stream": false,
573
- "format": {
574
- "type": "object",
575
- "properties": {
576
- "age": {
577
- "type": "integer"
578
- },
579
- "available": {
580
- "type": "boolean"
581
- }
582
- },
583
- "required": [
584
- "age",
585
- "available"
586
- ]
587
- },
588
- "options": {
589
- "temperature": 0
590
- }
591
- }'
592
-
593
- Response
594
- Copy
595
-
596
- {
597
- "model": "llama3.1",
598
- "created_at": "2024-12-06T00:46:58.265747Z",
599
- "message": {
600
- "role": "assistant",
601
- "content": "{\"age\": 22, \"available\": false}"
602
- },
603
- "done_reason": "stop",
604
- "done": true,
605
- "total_duration": 2254970291,
606
- "load_duration": 574751416,
607
- "prompt_eval_count": 34,
608
- "prompt_eval_duration": 1502000000,
609
- "eval_count": 12,
610
- "eval_duration": 175000000
611
- }
612
-
613
-
614
- Chat request (With History)
615
- Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.
616
- Request
617
- Copy
618
-
619
- curl http://localhost:11434/api/chat -d '{
620
- "model": "llama3.2",
621
- "messages": [
622
- {
623
- "role": "user",
624
- "content": "why is the sky blue?"
625
- },
626
- {
627
- "role": "assistant",
628
- "content": "due to rayleigh scattering."
629
- },
630
- {
631
- "role": "user",
632
- "content": "how is that different than mie scattering?"
633
- }
634
- ]
635
- }'
636
-
637
- Response
638
- A stream of JSON objects is returned:
639
- Copy
640
-
641
- {
642
- "model": "llama3.2",
643
- "created_at": "2023-08-04T08:52:19.385406455-07:00",
644
- "message": {
645
- "role": "assistant",
646
- "content": "The"
647
- },
648
- "done": false
649
- }
650
-
651
- Final response:
652
- Copy
653
-
654
- {
655
- "model": "llama3.2",
656
- "created_at": "2023-08-04T19:22:45.499127Z",
657
- "done": true,
658
- "total_duration": 8113331500,
659
- "load_duration": 6396458,
660
- "prompt_eval_count": 61,
661
- "prompt_eval_duration": 398801000,
662
- "eval_count": 468,
663
- "eval_duration": 7701267000
664
- }
665
-
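- The chat memory lives entirely in the messages array, so a client keeps context by appending each assistant reply and resending the whole history. A minimal Python sketch of such a loop, assuming the requests package:
-
- import requests  # assumed third-party dependency
-
- messages = []  # the full history is resent on every turn
- for user_text in ["why is the sky blue?", "how is that different than mie scattering?"]:
-     messages.append({"role": "user", "content": user_text})
-     resp = requests.post(
-         "http://localhost:11434/api/chat",
-         json={"model": "llama3.2", "messages": messages, "stream": False},
-     )
-     reply = resp.json()["message"]
-     messages.append(reply)  # keep the assistant turn so the next request sees it
-     print(reply["content"])
-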
666
-
667
- Chat request (with images)
668
- Request
669
- Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.
670
- Copy
671
-
672
- curl http://localhost:11434/api/chat -d '{
673
- "model": "llava",
674
- "messages": [
675
- {
676
- "role": "user",
677
- "content": "what is in this image?",
678
- "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5
rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrUwKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
679
- }
680
- ]
681
- }'
682
-
683
- Response
684
- Copy
685
-
686
- {
687
- "model": "llava",
688
- "created_at": "2023-12-13T22:42:50.203334Z",
689
- "message": {
690
- "role": "assistant",
691
- "content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
692
- "images": null
693
- },
694
- "done": true,
695
- "total_duration": 1668506709,
696
- "load_duration": 1986209,
697
- "prompt_eval_count": 26,
698
- "prompt_eval_duration": 359682000,
699
- "eval_count": 83,
700
- "eval_duration": 1303285000
701
- }
702
-
703
-
704
- Chat request (Reproducible outputs)
705
- Request
706
- Copy
707
-
708
- curl http://localhost:11434/api/chat -d '{
709
- "model": "llama3.2",
710
- "messages": [
711
- {
712
- "role": "user",
713
- "content": "Hello!"
714
- }
715
- ],
716
- "options": {
717
- "seed": 101,
718
- "temperature": 0
719
- }
720
- }'
721
-
722
- Response
723
- Copy
724
-
725
- {
726
- "model": "llama3.2",
727
- "created_at": "2023-12-12T14:13:43.416799Z",
728
- "message": {
729
- "role": "assistant",
730
- "content": "Hello! How are you today?"
731
- },
732
- "done": true,
733
- "total_duration": 5191566416,
734
- "load_duration": 2154458,
735
- "prompt_eval_count": 26,
736
- "prompt_eval_duration": 383809000,
737
- "eval_count": 298,
738
- "eval_duration": 4799921000
739
- }
740
-
741
-
742
- Chat request (with tools)
743
- Request
744
- Copy
745
-
746
- curl http://localhost:11434/api/chat -d '{
747
- "model": "llama3.2",
748
- "messages": [
749
- {
750
- "role": "user",
751
- "content": "What is the weather today in Paris?"
752
- }
753
- ],
754
- "stream": false,
755
- "tools": [
756
- {
757
- "type": "function",
758
- "function": {
759
- "name": "get_current_weather",
760
- "description": "Get the current weather for a location",
761
- "parameters": {
762
- "type": "object",
763
- "properties": {
764
- "location": {
765
- "type": "string",
766
- "description": "The location to get the weather for, e.g. San Francisco, CA"
767
- },
768
- "format": {
769
- "type": "string",
770
- "description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
771
- "enum": ["celsius", "fahrenheit"]
772
- }
773
- },
774
- "required": ["location", "format"]
775
- }
776
- }
777
- }
778
- ]
779
- }'
780
-
781
- Response
782
- Copy
783
-
784
- {
785
- "model": "llama3.2",
786
- "created_at": "2024-07-22T20:33:28.123648Z",
787
- "message": {
788
- "role": "assistant",
789
- "content": "",
790
- "tool_calls": [
791
- {
792
- "function": {
793
- "name": "get_current_weather",
794
- "arguments": {
795
- "format": "celsius",
796
- "location": "Paris, FR"
797
- }
798
- }
799
- }
800
- ]
801
- },
802
- "done_reason": "stop",
803
- "done": true,
804
- "total_duration": 885095291,
805
- "load_duration": 3753500,
806
- "prompt_eval_count": 122,
807
- "prompt_eval_duration": 328493000,
808
- "eval_count": 33,
809
- "eval_duration": 552222000
810
- }
811
-
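- The model only names the tool to call; executing it and feeding the result back as a message with role tool is up to the client. A minimal Python sketch of that round trip, assuming the requests package and a hypothetical local get_current_weather implementation:
-
- import json
- import requests  # assumed third-party dependency
-
- def get_current_weather(location, format):
-     # Hypothetical stand-in for a real weather lookup.
-     return json.dumps({"location": location, "temperature": 22, "unit": format})
-
- tools = [{
-     "type": "function",
-     "function": {
-         "name": "get_current_weather",
-         "description": "Get the current weather for a location",
-         "parameters": {
-             "type": "object",
-             "properties": {
-                 "location": {"type": "string"},
-                 "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
-             },
-             "required": ["location", "format"],
-         },
-     },
- }]
- messages = [{"role": "user", "content": "What is the weather today in Paris?"}]
- url = "http://localhost:11434/api/chat"
-
- msg = requests.post(url, json={"model": "llama3.2", "messages": messages,
-                                "tools": tools, "stream": False}).json()["message"]
- messages.append(msg)
- for call in msg.get("tool_calls", []):
-     result = get_current_weather(**call["function"]["arguments"])
-     messages.append({"role": "tool", "content": result})
-
- # Second call: the model turns the tool result into a final answer.
- final = requests.post(url, json={"model": "llama3.2", "messages": messages, "stream": False})
- print(final.json()["message"]["content"])
-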
812
-
813
- Load a model
814
- If the messages array is empty, the model will be loaded into memory.
815
- Request
816
- Copy
817
-
818
- curl http://localhost:11434/api/chat -d '{
819
- "model": "llama3.2",
820
- "messages": []
821
- }'
822
-
823
- Response
824
- Copy
825
-
826
- {
827
- "model": "llama3.2",
828
- "created_at": "2024-09-12T21:17:29.110811Z",
829
- "message": {
830
- "role": "assistant",
831
- "content": ""
832
- },
833
- "done_reason": "load",
834
- "done": true
835
- }
836
-
837
-
838
- Unload a model
839
- If the messages array is empty and the keep_alive parameter is set to 0, a model will be unloaded from memory.
840
- Request
841
- Copy
842
-
843
- curl http://localhost:11434/api/chat -d '{
844
- "model": "llama3.2",
845
- "messages": [],
846
- "keep_alive": 0
847
- }'
848
-
849
- Response
850
- A single JSON object is returned:
851
- Copy
852
-
853
- {
854
- "model": "llama3.2",
855
- "created_at": "2024-09-12T21:33:17.547535Z",
856
- "message": {
857
- "role": "assistant",
858
- "content": ""
859
- },
860
- "done_reason": "unload",
861
- "done": true
862
- }
863
-
864
-
865
- Create a Model
866
- Copy
867
-
868
- POST /api/create
869
-
870
- Create a model from:
871
-
872
- another model;
873
- a safetensors directory; or
874
- a GGUF file.
875
-
876
- If you are creating a model from a safetensors directory or from a GGUF file, you must create a blob for each of the files and then use the file name and SHA256 digest associated with each blob in the files field.
877
-
878
- Parameters
879
-
880
- model: name of the model to create
881
- from: (optional) name of an existing model to create the new model from
882
- files: (optional) a dictionary of file names to SHA256 digests of blobs to create the model from
883
- adapters: (optional) a dictionary of file names to SHA256 digests of blobs for LORA adapters
884
- template: (optional) the prompt template for the model
885
- license: (optional) a string or list of strings containing the license or licenses for the model
886
- system: (optional) a string containing the system prompt for the model
887
- parameters: (optional) a dictionary of parameters for the model (see Modelfile for a list of parameters)
888
- messages: (optional) a list of message objects used to create a conversation
889
- stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects
890
- quantize (optional): quantize a non-quantized (e.g. float16) model
891
-
892
-
893
- Quantization types
894
- Type (* marks recommended types)
895
- q2_K
896
- q3_K_L
897
- q3_K_M
898
- q3_K_S
899
- q4_0
900
- q4_1
901
- q4_K_M *
902
- q4_K_S
903
- q5_0
904
- q5_1
905
- q5_K_M
906
- q5_K_S
907
- q6_K
908
- q8_0 *
909
-
910
- Examples
911
-
912
- Create a new model
913
- Create a new model from an existing model.
914
- Request
915
- Copy
916
-
917
- curl http://localhost:11434/api/create -d '{
918
- "model": "mario",
919
- "from": "llama3.2",
920
- "system": "You are Mario from Super Mario Bros."
921
- }'
922
-
923
- Response
924
- A stream of JSON objects is returned:
925
- Copy
926
-
927
- {"status":"reading model metadata"}
928
- {"status":"creating system layer"}
929
- {"status":"using already created layer sha256:22f7f8ef5f4c791c1b03d7eb414399294764d7cc82c7e94aa81a1feb80a983a2"}
930
- {"status":"using already created layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b"}
931
- {"status":"using already created layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d"}
932
- {"status":"using already created layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988"}
933
- {"status":"using already created layer sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9"}
934
- {"status":"writing layer sha256:df30045fe90f0d750db82a058109cecd6d4de9c90a3d75b19c09e5f64580bb42"}
935
- {"status":"writing layer sha256:f18a68eb09bf925bb1b669490407c1b1251c5db98dc4d3d81f3088498ea55690"}
936
- {"status":"writing manifest"}
937
- {"status":"success"}
938
-
939
-
940
- Quantize a model
941
- Quantize a non-quantized model.
942
- Request
943
- Copy
944
-
945
- curl http://localhost:11434/api/create -d '{
946
- "model": "llama3.1:quantized",
947
- "from": "llama3.1:8b-instruct-fp16",
948
- "quantize": "q4_K_M"
949
- }'
950
-
951
- Response
952
- A stream of JSON objects is returned:
953
- Copy
954
-
955
- {"status":"quantizing F16 model to Q4_K_M"}
956
- {"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"}
957
- {"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"}
958
- {"status":"using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177"}
959
- {"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
960
- {"status":"creating new layer sha256:455f34728c9b5dd3376378bfb809ee166c145b0b4c1f1a6feca069055066ef9a"}
961
- {"status":"writing manifest"}
962
- {"status":"success"}
963
-
964
-
965
- Create a model from GGUF
966
- Create a model from a GGUF file. The files parameter should be filled out with the file name and SHA256 digest of the GGUF file you wish to use. Use /api/blobs/:digest to push the GGUF file to the server before calling this API.
967
- Request
968
- Copy
969
-
970
- curl http://localhost:11434/api/create -d '{
971
- "model": "my-gguf-model",
972
- "files": {
973
- "test.gguf": "sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"
974
- }
975
- }'
976
-
977
- Response
978
- A stream of JSON objects is returned:
979
- Copy
980
-
981
- {"status":"parsing GGUF"}
982
- {"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"}
983
- {"status":"writing manifest"}
984
- {"status":"success"}
985
-
986
-
987
- Create a model from a Safetensors directory
988
- The files parameter should include a dictionary of files for the safetensors model which includes the file names and SHA256 digest of each file. Use /api/blobs/:digest to first push each of the files to the server before calling this API. Files will remain in the cache until the Ollama server is restarted.
989
- Request
990
- Copy
991
-
992
- curl http://localhost:11434/api/create -d '{
993
- "model": "fred",
994
- "files": {
995
- "config.json": "sha256:dd3443e529fb2290423a0c65c2d633e67b419d273f170259e27297219828e389",
996
- "generation_config.json": "sha256:88effbb63300dbbc7390143fbbdd9d9fa50587b37e8bfd16c8c90d4970a74a36",
997
- "special_tokens_map.json": "sha256:b7455f0e8f00539108837bfa586c4fbf424e31f8717819a6798be74bef813d05",
998
- "tokenizer.json": "sha256:bbc1904d35169c542dffbe1f7589a5994ec7426d9e5b609d07bab876f32e97ab",
999
- "tokenizer_config.json": "sha256:24e8a6dc2547164b7002e3125f10b415105644fcf02bf9ad8b674c87b1eaaed6",
1000
- "model.safetensors": "sha256:1ff795ff6a07e6a68085d206fb84417da2f083f68391c2843cd2b8ac6df8538f"
1001
- }
1002
- }'
1003
-
1004
- Response
1005
- A stream of JSON objects is returned:
1006
- Copy
1007
-
1008
- {"status":"converting model"}
1009
- {"status":"creating new layer sha256:05ca5b813af4a53d2c2922933936e398958855c44ee534858fcfd830940618b6"}
1010
- {"status":"using autodetected template llama3-instruct"}
1011
- {"status":"using existing layer sha256:56bb8bd477a519ffa694fc449c2413c6f0e1d3b1c88fa7e3c9d88d3ae49d4dcb"}
1012
- {"status":"writing manifest"}
1013
- {"status":"success"}
1014
-
1015
-
1016
- Check if a Blob Exists
1017
- Copy
1018
-
1019
- HEAD /api/blobs/:digest
1020
-
1021
- Ensures that the file blob (Binary Large Object) used when creating a model exists on the server. This checks your Ollama server and not ollama.com.
1022
-
1023
- Query Parameters
1024
-
1025
- digest: the SHA256 digest of the blob
1026
-
1027
-
1028
- Examples
1029
-
1030
- Request
1031
- Copy
1032
-
1033
- curl -I http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
1034
-
1035
-
1036
- Response
1037
- Returns 200 OK if the blob exists, 404 Not Found if it does not.
1038
-
1039
- Push a Blob
1040
- Copy
1041
-
1042
- POST /api/blobs/:digest
1043
-
1044
- Push a file to the Ollama server to create a “blob” (Binary Large Object).
1045
-
1046
- Query Parameters
1047
-
1048
- digest: the expected SHA256 digest of the file
1049
-
1050
-
1051
- Examples
1052
-
1053
- Request
1054
- Copy
1055
-
1056
- curl -T model.gguf -X POST http://localhost:11434/api/blobs/sha256:29fdb92e57cf0827ded04ae6461b5931d01fa595843f55d36f5b275a52087dd2
1057
-
1058
-
1059
- Response
1060
- Returns 201 Created if the blob was successfully created, 400 Bad Request if the digest is not as expected.
1061
-
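- Putting the blob endpoints together with /api/create: a minimal Python sketch that hashes a local GGUF file, uploads it only if the server does not already have it, and then creates a model from it. It assumes the requests package; model.gguf and my-gguf-model are placeholders:
-
- import hashlib
- import requests  # assumed third-party dependency
-
- path = "model.gguf"
- with open(path, "rb") as f:
-     digest = "sha256:" + hashlib.sha256(f.read()).hexdigest()
-
- base = "http://localhost:11434"
- if requests.head(f"{base}/api/blobs/{digest}").status_code == 404:
-     with open(path, "rb") as f:
-         requests.post(f"{base}/api/blobs/{digest}", data=f).raise_for_status()
-
- requests.post(f"{base}/api/create",
-               json={"model": "my-gguf-model", "files": {path: digest}}).raise_for_status()
-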
1062
- List Local Models
1063
- Copy
1064
-
1065
- GET /api/tags
1066
-
1067
- List models that are available locally.
1068
-
1069
- Examples
1070
-
1071
- Request
1072
- Copy
1073
-
1074
- curl http://localhost:11434/api/tags
1075
-
1076
-
1077
- Response
1078
- A single JSON object will be returned.
1079
- Copy
1080
-
1081
- {
1082
- "models": [
1083
- {
1084
- "name": "codellama:13b",
1085
- "modified_at": "2023-11-04T14:56:49.277302595-07:00",
1086
- "size": 7365960935,
1087
- "digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
1088
- "details": {
1089
- "format": "gguf",
1090
- "family": "llama",
1091
- "families": null,
1092
- "parameter_size": "13B",
1093
- "quantization_level": "Q4_0"
1094
- }
1095
- },
1096
- {
1097
- "name": "llama3:latest",
1098
- "modified_at": "2023-12-07T09:32:18.757212583-08:00",
1099
- "size": 3825819519,
1100
- "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
1101
- "details": {
1102
- "format": "gguf",
1103
- "family": "llama",
1104
- "families": null,
1105
- "parameter_size": "7B",
1106
- "quantization_level": "Q4_0"
1107
- }
1108
- }
1109
- ]
1110
- }
1111
-
1112
-
1113
- Show Model Information
1114
- Copy
1115
-
1116
- POST /api/show
1117
-
1118
- Show information about a model including details, modelfile, template, parameters, license, system prompt.
1119
-
1120
- Parameters
1121
-
1122
- model: name of the model to show
1123
- verbose: (optional) if set to true, returns full data for verbose response fields
1124
-
1125
-
1126
- Examples
1127
-
1128
- Request
1129
- Copy
1130
-
1131
- curl http://localhost:11434/api/show -d '{
1132
- "model": "llava"
1133
- }'
1134
-
1135
-
1136
- Response
1137
- Copy
1138
-
1139
- {
1140
- "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
1141
- "parameters": "num_keep 24\nstop \"<|start_header_id|>\"\nstop \"<|end_header_id|>\"\nstop \"<|eot_id|>\"",
1142
- "template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
1143
- "details": {
1144
- "parent_model": "",
1145
- "format": "gguf",
1146
- "family": "llama",
1147
- "families": ["llama"],
1148
- "parameter_size": "8.0B",
1149
- "quantization_level": "Q4_0"
1150
- },
1151
- "model_info": {
1152
- "general.architecture": "llama",
1153
- "general.file_type": 2,
1154
- "general.parameter_count": 8030261248,
1155
- "general.quantization_version": 2,
1156
- "llama.attention.head_count": 32,
1157
- "llama.attention.head_count_kv": 8,
1158
- "llama.attention.layer_norm_rms_epsilon": 0.00001,
1159
- "llama.block_count": 32,
1160
- "llama.context_length": 8192,
1161
- "llama.embedding_length": 4096,
1162
- "llama.feed_forward_length": 14336,
1163
- "llama.rope.dimension_count": 128,
1164
- "llama.rope.freq_base": 500000,
1165
- "llama.vocab_size": 128256,
1166
- "tokenizer.ggml.bos_token_id": 128000,
1167
- "tokenizer.ggml.eos_token_id": 128009,
1168
- "tokenizer.ggml.merges": [], // populates if `verbose=true`
1169
- "tokenizer.ggml.model": "gpt2",
1170
- "tokenizer.ggml.pre": "llama-bpe",
1171
- "tokenizer.ggml.token_type": [], // populates if `verbose=true`
1172
- "tokenizer.ggml.tokens": [] // populates if `verbose=true`
1173
- },
1174
- "capabilities": ["completion", "vision"]
1175
- }
1176
-
1177
-
1178
- Copy a Model
1179
- Copy
1180
-
1181
- POST /api/copy
1182
-
1183
- Copy a model. Creates a model with another name from an existing model.
1184
-
1185
- Examples
1186
-
1187
- Request
1188
- Copy
1189
-
1190
- curl http://localhost:11434/api/copy -d '{
1191
- "source": "llama3.2",
1192
- "destination": "llama3-backup"
1193
- }'
1194
-
1195
-
1196
- Response
1197
- Returns a 200 OK if successful, or a 404 Not Found if the source model doesn’t exist.
1198
-
1199
- Delete a Model
1200
- Copy
1201
-
1202
- DELETE /api/delete
1203
-
1204
- Delete a model and its data.
1205
-
1206
- Parameters
1207
-
1208
- model: model name to delete
1209
-
1210
-
1211
- Examples
1212
-
1213
- Request
1214
- Copy
1215
-
1216
- curl -X DELETE http://localhost:11434/api/delete -d '{
1217
- "model": "llama3:13b"
1218
- }'
1219
-
1220
-
1221
- Response
1222
- Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn’t exist.
1223
-
1224
- Pull a Model
1225
- Copy
1226
-
1227
- POST /api/pull
1228
-
1229
- Download a model from the ollama library. Cancelled pulls are resumed from where they left off, and multiple calls will share the same download progress.
1230
-
1231
- Parameters
1232
-
1233
- model: name of the model to pull
1234
- insecure: (optional) allow insecure connections to the library. Only use this if you are pulling from your own library during development.
1235
- stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects
1236
-
1237
-
1238
- Examples
1239
-
1240
- Request
1241
- Copy
1242
-
1243
- curl http://localhost:11434/api/pull -d '{
1244
- "model": "llama3.2"
1245
- }'
1246
-
1247
-
1248
- Response
1249
- If stream is not specified, or set to true, a stream of JSON objects is returned. The first object is the manifest:
1250
- Copy
1251
-
1252
- {
1253
- "status": "pulling manifest"
1254
- }
1255
-
1256
- Then there is a series of downloading responses. Until a download is completed, the completed key may not be included. The number of files to be downloaded depends on the number of layers specified in the manifest.
1257
- Copy
1258
-
1259
- {
1260
- "status": "downloading digestname",
1261
- "digest": "digestname",
1262
- "total": 2142590208,
1263
- "completed": 241970
1264
- }
1265
-
1266
- After all the files are downloaded, the final responses are:
1267
- Copy
1268
-
1269
- {
1270
- "status": "verifying sha256 digest"
1271
- }
1272
- {
1273
- "status": "writing manifest"
1274
- }
1275
- {
1276
- "status": "removing any unused layers"
1277
- }
1278
- {
1279
- "status": "success"
1280
- }
1281
-
1282
- If stream is set to false, then the response is a single JSON object:
1283
- Copy
1284
-
1285
- {
1286
- "status": "success"
1287
- }
1288
-
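- A minimal Python sketch of following that progress stream and turning total/completed into a percentage, assuming the requests package:
-
- import json
- import requests  # assumed third-party dependency
-
- with requests.post("http://localhost:11434/api/pull",
-                    json={"model": "llama3.2"}, stream=True) as resp:
-     for line in resp.iter_lines():
-         if not line:
-             continue
-         update = json.loads(line)
-         if "total" in update and "completed" in update:
-             pct = update["completed"] / update["total"] * 100
-             print(f'{update["status"]}: {pct:.1f}%')
-         else:
-             print(update["status"])
-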
1289
-
1290
- Push a Model
1291
- Copy
1292
-
1293
- POST /api/push
1294
-
1295
- Upload a model to a model library. Requires registering for ollama.ai and adding a public key first.
1296
-
1297
- Parameters
1298
-
1299
- model: name of the model to push in the form of <namespace>/<model>:<tag>
1300
- insecure: (optional) allow insecure connections to the library. Only use this if you are pushing to your library during development.
1301
- stream: (optional) if false the response will be returned as a single response object, rather than a stream of objects
1302
-
1303
-
1304
- Examples
1305
-
1306
- Request
1307
- Copy
1308
-
1309
- curl http://localhost:11434/api/push -d '{
1310
- "model": "mattw/pygmalion:latest"
1311
- }'
1312
-
1313
-
1314
- Response
1315
- If stream is not specified, or set to true, a stream of JSON objects is returned:
1316
- Copy
1317
-
1318
- { "status": "retrieving manifest" }
1319
-
1320
- and then:
1321
- Copy
1322
-
1323
- {
1324
- "status": "starting upload",
1325
- "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
1326
- "total": 1928429856
1327
- }
1328
-
1329
- Then there is a series of uploading responses:
1330
- Copy
1331
-
1332
- {
1333
- "status": "starting upload",
1334
- "digest": "sha256:bc07c81de745696fdf5afca05e065818a8149fb0c77266fb584d9b2cba3711ab",
1335
- "total": 1928429856
1336
- }
1337
-
1338
- Finally, when the upload is complete:
1339
- Copy
1340
-
1341
- {"status":"pushing manifest"}
1342
- {"status":"success"}
1343
-
1344
- If stream is set to false, then the response is a single JSON object:
1345
- Copy
1346
-
1347
- { "status": "success" }
1348
-
1349
-
1350
- Generate Embeddings
1351
- Copy
1352
-
1353
- POST /api/embed
1354
-
1355
- Generate embeddings from a model
1356
-
1357
- Parameters
1358
-
1359
- model: name of model to generate embeddings from
1360
- input: text or list of text to generate embeddings for
1361
-
1362
- Advanced parameters:
1363
-
1364
- truncate: truncates the end of each input to fit within context length. Returns error if false and context length is exceeded. Defaults to true
1365
- options: additional model parameters listed in the documentation for the Modelfile such as temperature
1366
- keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
1367
-
1368
-
1369
- Examples
1370
-
1371
- Request
1372
- Copy
1373
-
1374
- curl http://localhost:11434/api/embed -d '{
1375
- "model": "all-minilm",
1376
- "input": "Why is the sky blue?"
1377
- }'
1378
-
1379
-
1380
- Response
1381
- Copy
1382
-
1383
- {
1384
- "model": "all-minilm",
1385
- "embeddings": [
1386
- [
1387
- 0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
1388
- 0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
1389
- ]
1390
- ],
1391
- "total_duration": 14143917,
1392
- "load_duration": 1019500,
1393
- "prompt_eval_count": 8
1394
- }
1395
-
1396
-
1397
- Request (Multiple input)
1398
- Copy
1399
-
1400
- curl http://localhost:11434/api/embed -d '{
1401
- "model": "all-minilm",
1402
- "input": ["Why is the sky blue?", "Why is the grass green?"]
1403
- }'
1404
-
1405
-
1406
- Response
1407
- Copy
1408
-
1409
- {
1410
- "model": "all-minilm",
1411
- "embeddings": [
1412
- [
1413
- 0.010071029, -0.0017594862, 0.05007221, 0.04692972, 0.054916814,
1414
- 0.008599704, 0.105441414, -0.025878139, 0.12958129, 0.031952348
1415
- ],
1416
- [
1417
- -0.0098027075, 0.06042469, 0.025257962, -0.006364387, 0.07272725,
1418
- 0.017194884, 0.09032035, -0.051705178, 0.09951512, 0.09072481
1419
- ]
1420
- ]
1421
- }
1422
-
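- Embeddings are usually compared rather than read directly; a minimal Python sketch computing the cosine similarity of the two vectors returned above, assuming the requests package:
-
- import math
- import requests  # assumed third-party dependency
-
- resp = requests.post("http://localhost:11434/api/embed",
-                      json={"model": "all-minilm",
-                            "input": ["Why is the sky blue?", "Why is the grass green?"]})
- a, b = resp.json()["embeddings"]
-
- dot = sum(x * y for x, y in zip(a, b))
- norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
- print(f"cosine similarity: {dot / norm:.3f}")
-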
1423
-
1424
- List Running Models
1425
- Copy
1426
-
1427
- GET /api/ps
1428
-
1429
- List models that are currently loaded into memory.
1430
-
1431
- Examples
1432
-
1433
- Request
1434
- Copy
1435
-
1436
- curl http://localhost:11434/api/ps
1437
-
1438
-
1439
- Response
1440
- A single JSON object will be returned.
1441
- Copy
1442
-
1443
- {
1444
- "models": [
1445
- {
1446
- "name": "mistral:latest",
1447
- "model": "mistral:latest",
1448
- "size": 5137025024,
1449
- "digest": "2ae6f6dd7a3dd734790bbbf58b8909a606e0e7e97e94b7604e0aa7ae4490e6d8",
1450
- "details": {
1451
- "parent_model": "",
1452
- "format": "gguf",
1453
- "family": "llama",
1454
- "families": ["llama"],
1455
- "parameter_size": "7.2B",
1456
- "quantization_level": "Q4_0"
1457
- },
1458
- "expires_at": "2024-06-04T14:38:31.83753-07:00",
1459
- "size_vram": 5137025024
1460
- }
1461
- ]
1462
- }
1463
-
1464
-
1465
- Generate Embedding
1466
- This endpoint has been superseded by /api/embed
1467
- Copy
1468
-
1469
- POST /api/embeddings
1470
-
1471
- Generate embeddings from a model
1472
-
1473
- Parameters
1474
-
1475
- model: name of model to generate embeddings from
1476
- prompt: text to generate embeddings for
1477
-
1478
- Advanced parameters:
1479
-
1480
- options: additional model parameters listed in the documentation for the Modelfile such as temperature
1481
- keep_alive: controls how long the model will stay loaded into memory following the request (default: 5m)
1482
-
1483
-
1484
- Examples
1485
-
1486
- Request
1487
- Copy
1488
-
1489
- curl http://localhost:11434/api/embeddings -d '{
1490
- "model": "all-minilm",
1491
- "prompt": "Here is an article about llamas..."
1492
- }'
1493
-
1494
-
1495
- Response
1496
- Copy
1497
-
1498
- {
1499
- "embedding": [
1500
- 0.5670403838157654, 0.009260174818336964, 0.23178744316101074,
1501
- -0.2916173040866852, -0.8924556970596313, 0.8785552978515625,
1502
- -0.34576427936553955, 0.5742510557174683, -0.04222835972905159,
1503
- -0.137906014919281
1504
- ]
1505
- }
1506
-
1507
-
1508
- Version
1509
- Copy
1510
-
1511
- GET /api/version
1512
-
1513
- Retrieve the Ollama version
1514
-
1515
- Examples
1516
-
1517
- Request
1518
- Copy
1519
-
1520
- curl http://localhost:11434/api/version
1521
-
1522
-
1523
- Response
1524
- Copy
1525
-
1526
- {
1527
- "version": "0.5.1"
1528
- }