@huggingface/inference 3.13.0 → 3.13.2
- package/README.md +119 -124
- package/dist/index.cjs +100 -88
- package/dist/index.js +100 -88
- package/dist/src/lib/getDefaultTask.d.ts.map +1 -1
- package/dist/src/lib/getProviderHelper.d.ts +31 -31
- package/dist/src/lib/getProviderHelper.d.ts.map +1 -1
- package/dist/src/providers/fal-ai.d.ts +3 -17
- package/dist/src/providers/fal-ai.d.ts.map +1 -1
- package/dist/src/providers/hf-inference.d.ts +5 -1
- package/dist/src/providers/hf-inference.d.ts.map +1 -1
- package/dist/src/providers/providerHelper.d.ts +5 -1
- package/dist/src/providers/providerHelper.d.ts.map +1 -1
- package/dist/src/snippets/getInferenceSnippets.d.ts.map +1 -1
- package/dist/src/snippets/templates.exported.d.ts.map +1 -1
- package/dist/src/tasks/audio/automaticSpeechRecognition.d.ts.map +1 -1
- package/dist/src/tasks/cv/imageToImage.d.ts.map +1 -1
- package/dist/src/vendor/fetch-event-source/parse.d.ts.map +1 -1
- package/dist/test/test-files.d.ts.map +1 -1
- package/package.json +2 -2
- package/src/lib/getDefaultTask.ts +2 -1
- package/src/lib/getProviderHelper.ts +40 -36
- package/src/providers/fal-ai.ts +26 -1
- package/src/providers/hf-inference.ts +31 -2
- package/src/providers/providerHelper.ts +5 -1
- package/src/snippets/getInferenceSnippets.ts +13 -4
- package/src/snippets/templates.exported.ts +3 -1
- package/src/tasks/audio/automaticSpeechRecognition.ts +2 -32
- package/src/tasks/cv/imageToImage.ts +3 -18
package/README.md
CHANGED
@@ -1,11 +1,11 @@
 # 🤗 Hugging Face Inference
 
-A Typescript powered wrapper for
-It works with [Inference Providers (serverless)](https://huggingface.co/docs/api-inference/index) – including all supported third-party Inference Providers – and [Inference Endpoints (dedicated)](https://huggingface.co/docs/inference-endpoints/index), and even with .
+A Typescript powered wrapper that provides a unified interface to run inference across multiple services for models hosted on the Hugging Face Hub:
 
-
+1. [Inference Providers](https://huggingface.co/docs/inference-providers/index): a streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners. This new approach builds on our previous Serverless Inference API, offering more models, improved performance, and greater reliability thanks to world-class providers. Refer to the [documentation](https://huggingface.co/docs/inference-providers/index#partners) for a list of supported providers.
+2. [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index): a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.
+3. Local endpoints: you can also run inference with local inference servers like [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com/), [vLLM](https://github.com/vllm-project/vllm), [LiteLLM](https://docs.litellm.ai/docs/simple_proxy), or [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) by connecting the client to these local endpoints.
 
-You can also try out a live [interactive notebook](https://observablehq.com/@huggingface/hello-huggingface-js-inference), see some demos on [hf.co/huggingfacejs](https://huggingface.co/huggingfacejs), or watch a [Scrimba tutorial that explains how Inference Endpoints works](https://scrimba.com/scrim/cod8248f5adfd6e129582c523).
 
 ## Getting Started
 
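The three access modes introduced in the hunk above share one client surface. As a minimal sketch (the endpoint URLs below are placeholders, not real deployments, and the model name is borrowed from examples later in this README), the same `InferenceClient` API covers all three:

```typescript
import { InferenceClient } from "@huggingface/inference";

// 1. Inference Providers: pass an HF access token; requests are routed
//    through https://huggingface.co to a serverless partner.
const viaProviders = new InferenceClient("hf_...");

// 2. Inference Endpoints: point the client at a dedicated deployment
//    (placeholder URL; use the one from your endpoint's dashboard).
const viaEndpoint = new InferenceClient("hf_...", {
  endpointUrl: "https://my-endpoint.endpoints.huggingface.cloud/v1/",
});

// 3. Local endpoints: any OpenAI API-compatible local server.
const viaLocal = new InferenceClient(undefined, {
  endpointUrl: "http://localhost:8080",
});

// The call shape is identical in every case:
const out = await viaProviders.chatCompletion({
  model: "Qwen/Qwen3-32B",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(out.choices[0].message.content);
```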
@@ -42,7 +42,7 @@ const hf = new InferenceClient('your access token');
 
 Your access token should be kept private. If you need to protect it in front-end applications, we suggest setting up a proxy server that stores the access token.
 
-
+## Using Inference Providers
 
 You can send inference requests to third-party providers with the inference client.
 
@@ -50,6 +50,7 @@ Currently, we support the following providers:
 - [Fal.ai](https://fal.ai)
 - [Featherless AI](https://featherless.ai)
 - [Fireworks AI](https://fireworks.ai)
+- [HF Inference](https://huggingface.co/docs/inference-providers/providers/hf-inference)
 - [Hyperbolic](https://hyperbolic.xyz)
 - [Nebius](https://studio.nebius.ai)
 - [Novita](https://novita.ai/?utm_source=github_huggingface&utm_medium=github_readme&utm_campaign=link)
@@ -63,7 +64,8 @@ Currently, we support the following providers:
 - [Cerebras](https://cerebras.ai/)
 - [Groq](https://groq.com)
 
-To send requests to a third-party provider, you have to pass the `provider` parameter to the inference function.
+To send requests to a third-party provider, you have to pass the `provider` parameter to the inference function. The default value of the `provider` parameter is "auto", which will select the first of the providers available for the model, sorted by your preferred order in https://hf.co/settings/inference-providers.
+
 ```ts
 const accessToken = "hf_..."; // Either a HF access token, or an API key from the third-party provider (Replicate in this example)
 
@@ -75,6 +77,7 @@ await client.textToImage({
 })
 ```
 
+You also have to make sure your request is authenticated with an access token.
 When authenticated with a Hugging Face access token, the request is routed through https://huggingface.co.
 When authenticated with a third-party provider key, the request is made directly against that provider's inference API.
 
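To make the new "auto" default in the hunk above concrete: omitting `provider` is equivalent to passing `provider: "auto"`. A minimal sketch, assuming the model has at least one live provider (model name reused from the chat-completion examples below):

```typescript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient("hf_..."); // HF access token, so the request is routed through https://huggingface.co

// No `provider` given: the client falls back to "auto" and picks the first
// provider serving this model, sorted by your preferred order in
// https://hf.co/settings/inference-providers.
const out = await client.chatCompletion({
  model: "Qwen/Qwen3-32B",
  messages: [{ role: "user", content: "Hello, nice to meet you!" }],
});
console.log(out.choices[0].message.content);
```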
@@ -82,6 +85,7 @@ Only a subset of models are supported when requesting third-party providers. You
 - [Fal.ai supported models](https://huggingface.co/api/partners/fal-ai/models)
 - [Featherless AI supported models](https://huggingface.co/api/partners/featherless-ai/models)
 - [Fireworks AI supported models](https://huggingface.co/api/partners/fireworks-ai/models)
+- [HF Inference supported models](https://huggingface.co/api/partners/hf-inference/models)
 - [Hyperbolic supported models](https://huggingface.co/api/partners/hyperbolic/models)
 - [Nebius supported models](https://huggingface.co/api/partners/nebius/models)
 - [Nscale supported models](https://huggingface.co/api/partners/nscale/models)
@@ -92,7 +96,6 @@ Only a subset of models are supported when requesting third-party providers. You
 - [Cohere supported models](https://huggingface.co/api/partners/cohere/models)
 - [Cerebras supported models](https://huggingface.co/api/partners/cerebras/models)
 - [Groq supported models](https://console.groq.com/docs/models)
-- [HF Inference API (serverless)](https://huggingface.co/models?inference=warm&sort=trending)
 
 ❗**Important note:** To be compatible, the third-party API must adhere to the "standard" shape API we expect on HF model pages for each pipeline task type.
 This is not an issue for LLMs as everyone converged on the OpenAI API anyways, but can be more tricky for other tasks like "text-to-image" or "automatic-speech-recognition" where there exists no standard API. Let us know if any help is needed or if we can make things easier for you!
@@ -116,22 +119,22 @@ await textGeneration({
 
 This will enable tree-shaking by your bundler.
 
-
+### Natural Language Processing
 
-
+#### Text Generation
 
 Generates text from an input prompt.
 
-[Demo](https://huggingface.co/spaces/huggingfacejs/streaming-text-generation)
-
 ```typescript
 await hf.textGeneration({
-  model: '
+  model: 'mistralai/Mixtral-8x7B-v0.1',
+  provider: "together",
   inputs: 'The answer to the universe is'
 })
 
 for await (const output of hf.textGenerationStream({
-  model: "
+  model: "mistralai/Mixtral-8x7B-v0.1",
+  provider: "together",
   inputs: 'repeat "one two three four"',
   parameters: { max_new_tokens: 250 }
 })) {
@@ -139,16 +142,15 @@ for await (const output of hf.textGenerationStream({
 }
 ```
 
-
-
-Using the `chatCompletion` method, you can generate text with models compatible with the OpenAI Chat Completion API. All models served by [TGI](https://huggingface.co/docs/text-generation-inference/) on Hugging Face support Messages API.
+#### Chat Completion
 
-
+Generate a model response from a list of messages comprising a conversation.
 
 ```typescript
 // Non-streaming API
 const out = await hf.chatCompletion({
-  model: "
+  model: "Qwen/Qwen3-32B",
+  provider: "cerebras",
   messages: [{ role: "user", content: "Hello, nice to meet you!" }],
   max_tokens: 512,
   temperature: 0.1,
@@ -157,7 +159,8 @@ const out = await hf.chatCompletion({
 // Streaming API
 let out = "";
 for await (const chunk of hf.chatCompletionStream({
-  model: "
+  model: "Qwen/Qwen3-32B",
+  provider: "cerebras",
   messages: [
     { role: "user", content: "Can you help me solve an equation?" },
   ],
@@ -169,33 +172,18 @@ for await (const chunk of hf.chatCompletionStream({
   }
 }
 ```
+#### Feature Extraction
 
-
+This task reads some text and outputs raw float values, that are usually consumed as part of a semantic database/semantic search.
 
 ```typescript
-
-
-
-
-  model: "gpt-3.5-turbo",
-  messages: [
-    { role: "user", content: "Complete the equation 1+1= ,just the answer" },
-  ],
-  max_tokens: 500,
-  temperature: 0.1,
-  seed: 0,
-})) {
-  if (chunk.choices && chunk.choices.length > 0) {
-    out += chunk.choices[0].delta.content;
-  }
-}
-
-// For mistral AI:
-// endpointUrl: "https://api.mistral.ai"
-// model: "mistral-tiny"
+await hf.featureExtraction({
+  model: "sentence-transformers/distilbert-base-nli-mean-tokens",
+  inputs: "That is a happy person",
+});
 ```
 
-
+#### Fill Mask
 
 Tries to fill in a hole with a missing word (token to be precise).
 
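As a usage note on the relocated feature-extraction example above, here is a sketch of how those raw float values are typically consumed. The `cosine` helper is our own illustration (not part of the library), and we assume the model returns one flat embedding vector per string input:

```typescript
import { InferenceClient } from "@huggingface/inference";

const hf = new InferenceClient("hf_...");
const model = "sentence-transformers/distilbert-base-nli-mean-tokens";

// Embed two sentences with the model from the example above.
const [a, b] = (await Promise.all([
  hf.featureExtraction({ model, inputs: "That is a happy person" }),
  hf.featureExtraction({ model, inputs: "That is a very happy dog" }),
])) as [number[], number[]]; // assumption: one flat vector per input

// Illustrative helper: cosine similarity between two embedding vectors.
function cosine(x: number[], y: number[]): number {
  let dot = 0, nx = 0, ny = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  return dot / (Math.sqrt(nx) * Math.sqrt(ny));
}

console.log(cosine(a, b)); // closer to 1 means more semantically similar
```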
@@ -206,7 +194,7 @@ await hf.fillMask({
 })
 ```
 
-
+#### Summarization
 
 Summarizes longer text into shorter text. Be careful, some models have a maximum length of input.
 
@@ -221,7 +209,7 @@ await hf.summarization({
 })
 ```
 
-
+#### Question Answering
 
 Answers questions based on the context you provide.
 
@@ -235,7 +223,7 @@ await hf.questionAnswering({
 })
 ```
 
-
+#### Table Question Answering
 
 ```typescript
 await hf.tableQuestionAnswering({
@@ -252,7 +240,7 @@ await hf.tableQuestionAnswering({
 })
 ```
 
-
+#### Text Classification
 
 Often used for sentiment analysis, this method will assign labels to the given text along with a probability score of that label.
 
@@ -263,7 +251,7 @@ await hf.textClassification({
 })
 ```
 
-
+#### Token Classification
 
 Used for sentence parsing, either grammatical, or Named Entity Recognition (NER) to understand keywords contained within text.
 
@@ -274,7 +262,7 @@ await hf.tokenClassification({
 })
 ```
 
-
+#### Translation
 
 Converts text from one language to another.
 
@@ -294,7 +282,7 @@ await hf.translation({
 })
 ```
 
-
+#### Zero-Shot Classification
 
 Checks how well an input text fits into a set of labels you provide.
 
@@ -308,22 +296,7 @@ await hf.zeroShotClassification({
 })
 ```
 
-
-
-This task corresponds to any chatbot-like structure. Models tend to have shorter max_length, so please check with caution when using a given model if you need long-range dependency or not.
-
-```typescript
-await hf.conversational({
-  model: 'microsoft/DialoGPT-large',
-  inputs: {
-    past_user_inputs: ['Which movie is the best ?'],
-    generated_responses: ['It is Die Hard for sure.'],
-    text: 'Can you explain why ?'
-  }
-})
-```
-
-### Sentence Similarity
+#### Sentence Similarity
 
 Calculate the semantic similarity between one text and a list of other sentences.
 
@@ -341,9 +314,9 @@ await hf.sentenceSimilarity({
 })
 ```
 
-
+### Audio
 
-
+#### Automatic Speech Recognition
 
 Transcribes speech from an audio file.
 
@@ -356,7 +329,7 @@ await hf.automaticSpeechRecognition({
 })
 ```
 
-
+#### Audio Classification
 
 Assigns labels to the given audio along with a probability score of that label.
 
@@ -369,7 +342,7 @@ await hf.audioClassification({
 })
 ```
 
-
+#### Text To Speech
 
 Generates natural-sounding speech from text input.
 
@@ -382,7 +355,7 @@ await hf.textToSpeech({
 })
 ```
 
-
+#### Audio To Audio
 
 Outputs one or multiple generated audios from an input audio, commonly used for speech enhancement and source separation.
 
@@ -393,9 +366,9 @@ await hf.audioToAudio({
 })
 ```
 
-
+### Computer Vision
 
-
+#### Image Classification
 
 Assigns labels to a given image along with a probability score of that label.
 
@@ -408,7 +381,7 @@ await hf.imageClassification({
 })
 ```
 
-
+#### Object Detection
 
 Detects objects within an image and returns labels with corresponding bounding boxes and probability scores.
 
@@ -421,7 +394,7 @@ await hf.objectDetection({
 })
 ```
 
-
+#### Image Segmentation
 
 Detects segments within an image and returns labels with corresponding bounding boxes and probability scores.
 
@@ -432,7 +405,7 @@ await hf.imageSegmentation({
 })
 ```
 
-
+#### Image To Text
 
 Outputs text from a given image, commonly used for captioning or optical character recognition.
 
@@ -443,7 +416,7 @@ await hf.imageToText({
 })
 ```
 
-
+#### Text To Image
 
 Creates an image from a text prompt.
 
@@ -456,7 +429,7 @@ await hf.textToImage({
 })
 ```
 
-
+#### Image To Image
 
 Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain.
 
@@ -472,7 +445,7 @@ await hf.imageToImage({
 });
 ```
 
-
+#### Zero Shot Image Classification
 
 Checks how well an input image fits into a set of labels you provide.
 
@@ -488,20 +461,10 @@ await hf.zeroShotImageClassification({
 })
 ```
 
-
-
-### Feature Extraction
-
-This task reads some text and outputs raw float values, that are usually consumed as part of a semantic database/semantic search.
+### Multimodal
 
-```typescript
-await hf.featureExtraction({
-  model: "sentence-transformers/distilbert-base-nli-mean-tokens",
-  inputs: "That is a happy person",
-});
-```
 
-
+#### Visual Question Answering
 
 Visual Question Answering is the task of answering open-ended questions based on an image. They output natural language responses to natural language questions.
 
@@ -517,7 +480,7 @@ await hf.visualQuestionAnswering({
 })
 ```
 
-
+#### Document Question Answering
 
 Document question answering models take a (document, question) pair as input and return an answer in natural language.
 
@@ -533,9 +496,9 @@ await hf.documentQuestionAnswering({
 })
 ```
 
-
+### Tabular
 
-
+#### Tabular Regression
 
 Tabular regression is the task of predicting a numerical value given a set of attributes.
 
@@ -555,7 +518,7 @@ await hf.tabularRegression({
 })
 ```
 
-
+#### Tabular Classification
 
 Tabular classification is the task of classifying a target category (a group) based on a set of attributes.
 
@@ -600,48 +563,80 @@ for await (const chunk of stream) {
 }
 ```
 
-##
+## Using Inference Endpoints
 
-
+The examples we saw above use inference providers. While these prove to be very useful for prototyping
+and testing things quickly, once you're ready to deploy your model to production you'll need to use dedicated infrastructure. That's where [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) comes into play. It allows you to deploy any model and expose it as a private API. Once deployed, you'll get a URL that you can connect to:
 
 ```typescript
-
-const { generated_text } = await gpt2.textGeneration({inputs: 'The answer to the universe is'});
+import { InferenceClient } from '@huggingface/inference';
 
-
-
-  "https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.1-8B-Instruct"
-);
-const stream = ep.chatCompletionStream({
-  model: "tgi",
-  messages: [{ role: "user", content: "Complete the equation 1+1= ,just the answer" }],
-  max_tokens: 500,
-  temperature: 0.1,
-  seed: 0,
+const hf = new InferenceClient("hf_xxxxxxxxxxxxxx", {
+  endpointUrl: "https://j3z5luu0ooo76jnl.us-east-1.aws.endpoints.huggingface.cloud/v1/",
 });
-
-
-
-
-
-
-}
+
+const response = await hf.chatCompletion({
+  messages: [
+    {
+      role: "user",
+      content: "What is the capital of France?",
+    },
+  ],
+});
+
+console.log(response.choices[0].message.content);
 ```
 
-By default, all calls to the inference endpoint will wait until the model is
-
-0](https://huggingface.co/docs/inference-endpoints/en/autoscaling#scaling-to-0)
-is enabled on the endpoint, this can result in non-trivial waiting time. If
-you'd rather disable this behavior and handle the endpoint's returned 500 HTTP
-errors yourself, you can do so like so:
+By default, all calls to the inference endpoint will wait until the model is loaded. When [scaling to 0](https://huggingface.co/docs/inference-endpoints/en/autoscaling#scaling-to-0)
+is enabled on the endpoint, this can result in non-trivial waiting time. If you'd rather disable this behavior and handle the endpoint's returned 500 HTTP errors yourself, you can do so like so:
 
 ```typescript
-const
-
-
-
+const hf = new InferenceClient("hf_xxxxxxxxxxxxxx", {
+  endpointUrl: "https://j3z5luu0ooo76jnl.us-east-1.aws.endpoints.huggingface.cloud/v1/",
+});
+
+const response = await hf.chatCompletion(
+  {
+    messages: [
+      {
+        role: "user",
+        content: "What is the capital of France?",
+      },
+    ],
+  },
+  {
+    retry_on_error: false,
+  }
 );
 ```
+
+## Using local endpoints
+
+You can use `InferenceClient` to run chat completion with local inference servers (llama.cpp, vllm, litellm server, TGI, mlx, etc.) running on your own machine. The API should be OpenAI API-compatible.
+
+```typescript
+import { InferenceClient } from '@huggingface/inference';
+
+const hf = new InferenceClient(undefined, {
+  endpointUrl: "http://localhost:8080",
+});
+
+const response = await hf.chatCompletion({
+  messages: [
+    {
+      role: "user",
+      content: "What is the capital of France?",
+    },
+  ],
+});
+
+console.log(response.choices[0].message.content);
+```
+
+<Tip>
+
+Similarly to the OpenAI JS client, `InferenceClient` can be used to run Chat Completion inference with any OpenAI REST API-compatible endpoint.
+
+</Tip>
 
 ## Running tests
 
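One usage note on the <Tip> added at the end of the hunk above: since any OpenAI REST API-compatible endpoint works, the same pattern reaches other local servers too. A sketch assuming an Ollama instance exposing its OpenAI-compatible API on its default port (URL and model name are illustrative, not from this diff):

```typescript
import { InferenceClient } from "@huggingface/inference";

// Assumption: Ollama running locally with its OpenAI-compatible API.
const hf = new InferenceClient(undefined, {
  endpointUrl: "http://localhost:11434/v1",
});

const response = await hf.chatCompletion({
  model: "llama3.2", // whichever model your local server has pulled
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(response.choices[0].message.content);
```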