@huggingface/tasks 0.0.3 → 0.0.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +20 -0
- package/dist/index.d.ts +368 -46
- package/dist/index.js +117 -41
- package/dist/{index.cjs → index.mjs} +84 -67
- package/package.json +43 -33
- package/src/Types.ts +49 -43
- package/src/audio-classification/about.md +5 -5
- package/src/audio-classification/data.ts +11 -11
- package/src/audio-to-audio/about.md +4 -3
- package/src/audio-to-audio/data.ts +18 -15
- package/src/automatic-speech-recognition/about.md +5 -4
- package/src/automatic-speech-recognition/data.ts +18 -17
- package/src/const.ts +52 -44
- package/src/conversational/about.md +9 -9
- package/src/conversational/data.ts +22 -18
- package/src/depth-estimation/about.md +1 -3
- package/src/depth-estimation/data.ts +11 -11
- package/src/document-question-answering/about.md +1 -2
- package/src/document-question-answering/data.ts +22 -19
- package/src/feature-extraction/about.md +2 -3
- package/src/feature-extraction/data.ts +12 -15
- package/src/fill-mask/about.md +1 -1
- package/src/fill-mask/data.ts +16 -14
- package/src/image-classification/about.md +5 -3
- package/src/image-classification/data.ts +15 -15
- package/src/image-segmentation/about.md +4 -4
- package/src/image-segmentation/data.ts +26 -23
- package/src/image-to-image/about.md +10 -12
- package/src/image-to-image/data.ts +31 -27
- package/src/image-to-text/about.md +13 -6
- package/src/image-to-text/data.ts +20 -21
- package/src/index.ts +11 -0
- package/src/modelLibraries.ts +43 -0
- package/src/object-detection/about.md +2 -1
- package/src/object-detection/data.ts +20 -17
- package/src/pipelines.ts +619 -0
- package/src/placeholder/about.md +3 -3
- package/src/placeholder/data.ts +8 -8
- package/src/question-answering/about.md +1 -1
- package/src/question-answering/data.ts +21 -19
- package/src/reinforcement-learning/about.md +167 -176
- package/src/reinforcement-learning/data.ts +75 -78
- package/src/sentence-similarity/data.ts +29 -28
- package/src/summarization/about.md +6 -5
- package/src/summarization/data.ts +23 -20
- package/src/table-question-answering/about.md +5 -5
- package/src/table-question-answering/data.ts +35 -39
- package/src/tabular-classification/about.md +4 -6
- package/src/tabular-classification/data.ts +11 -12
- package/src/tabular-regression/about.md +14 -18
- package/src/tabular-regression/data.ts +10 -11
- package/src/tasksData.ts +47 -50
- package/src/text-classification/about.md +5 -4
- package/src/text-classification/data.ts +21 -20
- package/src/text-generation/about.md +7 -6
- package/src/text-generation/data.ts +36 -34
- package/src/text-to-image/about.md +19 -18
- package/src/text-to-image/data.ts +32 -26
- package/src/text-to-speech/about.md +4 -5
- package/src/text-to-speech/data.ts +16 -17
- package/src/text-to-video/about.md +41 -36
- package/src/text-to-video/data.ts +43 -38
- package/src/token-classification/about.md +1 -3
- package/src/token-classification/data.ts +26 -25
- package/src/translation/about.md +4 -4
- package/src/translation/data.ts +21 -21
- package/src/unconditional-image-generation/about.md +10 -5
- package/src/unconditional-image-generation/data.ts +26 -20
- package/src/video-classification/about.md +5 -1
- package/src/video-classification/data.ts +14 -14
- package/src/visual-question-answering/about.md +8 -3
- package/src/visual-question-answering/data.ts +22 -19
- package/src/zero-shot-classification/about.md +5 -4
- package/src/zero-shot-classification/data.ts +20 -20
- package/src/zero-shot-image-classification/about.md +17 -9
- package/src/zero-shot-image-classification/data.ts +12 -14
- package/tsconfig.json +18 -0
- package/assets/audio-classification/audio.wav +0 -0
- package/assets/audio-to-audio/input.wav +0 -0
- package/assets/audio-to-audio/label-0.wav +0 -0
- package/assets/audio-to-audio/label-1.wav +0 -0
- package/assets/automatic-speech-recognition/input.flac +0 -0
- package/assets/automatic-speech-recognition/wav2vec2.png +0 -0
- package/assets/contribution-guide/anatomy.png +0 -0
- package/assets/contribution-guide/libraries.png +0 -0
- package/assets/depth-estimation/depth-estimation-input.jpg +0 -0
- package/assets/depth-estimation/depth-estimation-output.png +0 -0
- package/assets/document-question-answering/document-question-answering-input.png +0 -0
- package/assets/image-classification/image-classification-input.jpeg +0 -0
- package/assets/image-segmentation/image-segmentation-input.jpeg +0 -0
- package/assets/image-segmentation/image-segmentation-output.png +0 -0
- package/assets/image-to-image/image-to-image-input.jpeg +0 -0
- package/assets/image-to-image/image-to-image-output.png +0 -0
- package/assets/image-to-image/pix2pix_examples.jpg +0 -0
- package/assets/image-to-text/savanna.jpg +0 -0
- package/assets/object-detection/object-detection-input.jpg +0 -0
- package/assets/object-detection/object-detection-output.jpg +0 -0
- package/assets/table-question-answering/tableQA.jpg +0 -0
- package/assets/text-to-image/image.jpeg +0 -0
- package/assets/text-to-speech/audio.wav +0 -0
- package/assets/text-to-video/text-to-video-output.gif +0 -0
- package/assets/unconditional-image-generation/unconditional-image-generation-output.jpeg +0 -0
- package/assets/video-classification/video-classification-input.gif +0 -0
- package/assets/visual-question-answering/elephant.jpeg +0 -0
- package/assets/zero-shot-image-classification/image-classification-input.jpeg +0 -0
- package/dist/index.d.cts +0 -145
package/src/text-to-speech/about.md
CHANGED

@@ -42,7 +42,6 @@ synthesizer = pipeline("text-to-speech", "suno/bark")
 synthesizer("Look I am generating speech in three lines of code!")
 ```

-
 You can use [huggingface.js](https://github.com/huggingface/huggingface.js) to infer summarization models on Hugging Face Hub.

 ```javascript
@@ -50,14 +49,14 @@ import { HfInference } from "@huggingface/inference";

 const inference = new HfInference(HF_ACCESS_TOKEN);
 await inference.textToSpeech({
-
-
-})
+  model: "facebook/mms-tts",
+  inputs: "text to generate speech from",
+});
 ```

 ## Useful Resources
+
 - [ML for Audio Study Group - Text to Speech Deep Dive](https://www.youtube.com/watch?v=aLBedWj-5CQ)
 - [An introduction to SpeechT5, a multi-purpose speech recognition and synthesis model](https://huggingface.co/blog/speecht5).
 - [A guide on Fine-tuning Whisper For Multilingual ASR with 🤗Transformers](https://huggingface.co/blog/fine-tune-whisper)
 - [Speech Synthesis, Recognition, and More With SpeechT5](https://huggingface.co/blog/speecht5)
-

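For reference, a minimal sketch of consuming the completed snippet's result in Node.js, assuming `textToSpeech` resolves to an audio `Blob` as in current `@huggingface/inference` releases; the token source and output filename are illustrative only.

```ts
// Sketch only: synthesize speech and write the audio to disk in Node.js.
import { writeFile } from "node:fs/promises";
import { HfInference } from "@huggingface/inference";

const inference = new HfInference(process.env.HF_ACCESS_TOKEN);

// Same call as in the snippet above; assumed to resolve to a Blob.
const audio = await inference.textToSpeech({
  model: "facebook/mms-tts",
  inputs: "text to generate speech from",
});

// Blob -> Buffer -> file.
await writeFile("speech.wav", Buffer.from(await audio.arrayBuffer()));
```
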
package/src/text-to-speech/data.ts
CHANGED

@@ -4,67 +4,66 @@ const taskData: TaskDataCustom = {
   datasets: [
     {
       description: "Thousands of short audio clips of a single speaker.",
-      id:
+      id: "lj_speech",
     },
     {
       description: "Multi-speaker English dataset.",
-      id:
+      id: "LibriTTS",
     },
   ],
   demo: {
     inputs: [
       {
-        label:
-        content:
-          "I love audio models on the Hub!",
+        label: "Input",
+        content: "I love audio models on the Hub!",
         type: "text",
       },
-
     ],
     outputs: [
       {
         filename: "audio.wav",
-        type:
+        type: "audio",
       },
     ],
   },
   metrics: [
     {
       description: "The Mel Cepstral Distortion (MCD) metric is used to calculate the quality of generated speech.",
-      id:
+      id: "mel cepstral distortion",
     },
   ],
   models: [
     {
       description: "A powerful TTS model.",
-      id:
+      id: "suno/bark",
     },
     {
       description: "A massively multi-lingual TTS model.",
-      id:
+      id: "facebook/mms-tts",
     },
     {
       description: "An end-to-end speech synthesis model.",
-      id:
+      id: "microsoft/speecht5_tts",
     },
   ],
-  spaces:
+  spaces: [
     {
       description: "An application for generate highly realistic, multilingual speech.",
-      id:
+      id: "suno/bark",
     },
     {
       description: "An application that contains multiple speech synthesis models for various languages and accents.",
-      id:
+      id: "coqui/CoquiTTS",
     },
     {
       description: "An application that synthesizes speech for various speaker types.",
-      id:
+      id: "Matthijs/speecht5-tts-demo",
     },
   ],
-  summary:
+  summary:
+    "Text-to-Speech (TTS) is the task of generating natural sounding speech given text input. TTS models can be extended to have a single model that generates speech for multiple speakers and multiple languages.",
   widgetModels: ["microsoft/speecht5_tts"],
-  youtubeId:
+  youtubeId: "NW62DpzJ274",
 };

 export default taskData;

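The data.ts hunks in this release mostly fill in previously empty fields of the same structure. For readability, here is a condensed sketch of a complete task data module of that shape; the field names, ids and demo values are taken from the diff above, while the shortened descriptions are placeholders.

```ts
// Condensed sketch of a task data module in the shape shown above.
import type { TaskDataCustom } from "../Types";

const taskData: TaskDataCustom = {
  datasets: [{ description: "Thousands of short audio clips of a single speaker.", id: "lj_speech" }],
  demo: {
    inputs: [{ label: "Input", content: "I love audio models on the Hub!", type: "text" }],
    outputs: [{ filename: "audio.wav", type: "audio" }],
  },
  metrics: [{ description: "Mel Cepstral Distortion (MCD), a quality measure for generated speech.", id: "mel cepstral distortion" }],
  models: [{ description: "A powerful TTS model.", id: "suno/bark" }],
  spaces: [{ description: "A multilingual speech synthesis demo.", id: "suno/bark" }],
  summary: "Text-to-Speech (TTS) is the task of generating natural sounding speech given text input.",
  widgetModels: ["microsoft/speecht5_tts"],
  youtubeId: "NW62DpzJ274",
};

export default taskData;
```
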
package/src/text-to-video/about.md
CHANGED

@@ -1,36 +1,41 @@
-## Use Cases
-
-### Script-based Video Generation
-
-
-
-### Content format conversion
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Text-
-
-### Video
-
-
-
-
-
-
-
-
-
-
+## Use Cases
+
+### Script-based Video Generation
+
+Text-to-video models can be used to create short-form video content from a provided text script. These models can be used to create engaging and informative marketing videos. For example, a company could use a text-to-video model to create a video that explains how their product works.
+
+### Content format conversion
+
+Text-to-video models can be used to generate videos from long-form text, including blog posts, articles, and text files. Text-to-video models can be used to create educational videos that are more engaging and interactive. An example of this is creating a video that explains a complex concept from an article.
+
+### Voice-overs and Speech
+
+Text-to-video models can be used to create an AI newscaster to deliver daily news, or for a film-maker to create a short film or a music video.
+
+## Task Variants
+Text-to-video models have different variants based on inputs and outputs.
+
+### Text-to-video Editing
+
+One text-to-video task is generating text-based video style and local attribute editing. Text-to-video editing models can make it easier to perform tasks like cropping, stabilization, color correction, resizing and audio editing consistently.
+
+### Text-to-video Search
+
+Text-to-video search is the task of retrieving videos that are relevant to a given text query. This can be challenging, as videos are a complex medium that can contain a lot of information. By using semantic analysis to extract the meaning of the text query, visual analysis to extract features from the videos, such as the objects and actions that are present in the video, and temporal analysis to categorize relationships between the objects and actions in the video, we can determine which videos are most likely to be relevant to the text query.
+
+### Text-driven Video Prediction
+
+Text-driven video prediction is the task of generating a video sequence from a text description. Text description can be anything from a simple sentence to a detailed story. The goal of this task is to generate a video that is both visually realistic and semantically consistent with the text description.
+
+### Video Translation
+
+Text-to-video translation models can translate videos from one language to another or allow to query the multilingual text-video model with non-English sentences. This can be useful for people who want to watch videos in a language that they don't understand, especially when multi-lingual captions are available for training.
+
+## Inference
+Contribute an inference snippet for text-to-video here!
+
+## Useful Resources
+
+In this area, you can insert useful resources about how to train or use a model for this task.
+
+- [Text-to-Video: The Task, Challenges and the Current State](https://huggingface.co/blog/text-to-video)

package/src/text-to-video/data.ts
CHANGED

@@ -2,96 +2,101 @@ import type { TaskDataCustom } from "../Types";

 const taskData: TaskDataCustom = {
   datasets: [
-
+    {
       description: "Microsoft Research Video to Text is a large-scale dataset for open domain video captioning",
-      id:
+      id: "iejMac/CLIP-MSR-VTT",
     },
-
+    {
       description: "UCF101 Human Actions dataset consists of 13,320 video clips from YouTube, with 101 classes.",
-      id:
+      id: "quchenyuan/UCF101-ZIP",
     },
-
+    {
       description: "A high-quality dataset for human action recognition in YouTube videos.",
-      id:
+      id: "nateraw/kinetics",
     },
-
+    {
       description: "A dataset of video clips of humans performing pre-defined basic actions with everyday objects.",
-      id:
+      id: "HuggingFaceM4/something_something_v2",
     },
-
-      description:
-
+    {
+      description:
+        "This dataset consists of text-video pairs and contains noisy samples with irrelevant video descriptions",
+      id: "HuggingFaceM4/webvid",
     },
-
+    {
       description: "A dataset of short Flickr videos for the temporal localization of events with descriptions.",
-      id:
+      id: "iejMac/CLIP-DiDeMo",
     },
-
-  demo:
-    inputs:
+  ],
+  demo: {
+    inputs: [
       {
-        label:
-        content:
-          "Darth Vader is surfing on the waves.",
+        label: "Input",
+        content: "Darth Vader is surfing on the waves.",
         type: "text",
       },
     ],
     outputs: [
       {
         filename: "text-to-video-output.gif",
-        type:
+        type: "img",
       },
     ],
   },
-  metrics:
+  metrics: [
     {
-      description:
+      description:
+        "Inception Score uses an image classification model that predicts class labels and evaluates how distinct and diverse the images are. A higher score indicates better video generation.",
       id: "is",
     },
     {
-      description:
+      description:
+        "Frechet Inception Distance uses an image classification model to obtain image embeddings. The metric compares mean and standard deviation of the embeddings of real and generated images. A smaller score indicates better video generation.",
       id: "fid",
     },
     {
-      description:
+      description:
+        "Frechet Video Distance uses a model that captures coherence for changes in frames and the quality of each frame. A smaller score indicates better video generation.",
       id: "fvd",
     },
     {
-      description:
+      description:
+        "CLIPSIM measures similarity between video frames and text using an image-text similarity model. A higher score indicates better video generation.",
       id: "clipsim",
     },
   ],
-  models:
+  models: [
     {
       description: "A strong model for video generation.",
-      id:
+      id: "PAIR/text2video-zero-controlnet-canny-arcane",
     },
     {
       description: "A robust model for text-to-video generation.",
-      id:
+      id: "damo-vilab/text-to-video-ms-1.7b",
     },
     {
       description: "A text-to-video generation model with high quality and smooth outputs.",
-      id:
-    },
-
-  spaces:
+      id: "cerspense/zeroscope_v2_576w",
+    },
+  ],
+  spaces: [
     {
       description: "An application that generates video from text.",
-      id:
+      id: "fffiloni/zeroscope",
     },
     {
       description: "An application that generates video from image and text.",
-      id:
+      id: "TempoFunk/makeavid-sd-jax",
     },
     {
       description: "An application that generates videos from text and provides multi-model support.",
-      id:
+      id: "ArtGAN/Video-Diffusion-WebUI",
     },
   ],
-  summary:
-
-
+  summary:
+    "Text-to-video models can be used in any application that requires generating consistent sequence of images from text. ",
+  widgetModels: [],
+  youtubeId: undefined,
 };

 export default taskData;

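The ids filled in above are Hugging Face Hub repository names. A rough sketch of how a consumer of this data could turn them into links; the URL patterns are assumptions based on the Hub's public routes, not part of this package.

```ts
// Sketch: turning the ids shown in the diff into Hub URLs (assumed route patterns).
const modelUrl = (id: string) => `https://huggingface.co/${id}`;
const datasetUrl = (id: string) => `https://huggingface.co/datasets/${id}`;
const spaceUrl = (id: string) => `https://huggingface.co/spaces/${id}`;

console.log(modelUrl("damo-vilab/text-to-video-ms-1.7b"));
// https://huggingface.co/damo-vilab/text-to-video-ms-1.7b
console.log(datasetUrl("HuggingFaceM4/webvid"));
// https://huggingface.co/datasets/HuggingFaceM4/webvid
console.log(spaceUrl("fffiloni/zeroscope"));
// https://huggingface.co/spaces/fffiloni/zeroscope
```
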
package/src/token-classification/about.md
CHANGED

@@ -21,8 +21,6 @@ classifier = pipeline("ner")
 classifier("Hello I'm Omar and I live in Zürich.")
 ```

-
-
 ### Part-of-Speech (PoS) Tagging
 In PoS tagging, the model recognizes parts of speech, such as nouns, pronouns, adjectives, or verbs, in a given text. The task is formulated as labeling each word with a part of the speech.

@@ -75,4 +73,4 @@ Would you like to learn more about token classification? Great! Here you can fin

 ### Documentation

-- [Token classification task guide](https://huggingface.co/docs/transformers/tasks/token_classification)
+- [Token classification task guide](https://huggingface.co/docs/transformers/tasks/token_classification)

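The `pipeline("ner")` snippet shown as context above is Python; for completeness, a hedged JavaScript-side sketch using `@huggingface/inference`, assuming its `tokenClassification` method and reusing the `dslim/bert-base-NER` checkpoint referenced elsewhere in this diff.

```ts
// Sketch: named entity recognition over the Inference API.
// Assumes @huggingface/inference exposes tokenClassification with this shape.
import { HfInference } from "@huggingface/inference";

const inference = new HfInference(process.env.HF_ACCESS_TOKEN);

const entities = await inference.tokenClassification({
  model: "dslim/bert-base-NER",
  inputs: "Hello I'm Omar and I live in Zürich.",
});

// Each entry carries an entity label plus character offsets into the input.
for (const entity of entities) {
  console.log(entity.entity_group, entity.word, entity.start, entity.end);
}
```
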
package/src/token-classification/data.ts
CHANGED

@@ -4,36 +4,35 @@ const taskData: TaskDataCustom = {
   datasets: [
     {
       description: "A widely used dataset useful to benchmark named entity recognition models.",
-      id:
+      id: "conll2003",
     },
     {
-      description:
-
+      description:
+        "A multilingual dataset of Wikipedia articles annotated for named entity recognition in over 150 different languages.",
+      id: "wikiann",
     },
   ],
   demo: {
     inputs: [
       {
-        label:
-        content:
-          "My name is Omar and I live in Zürich.",
+        label: "Input",
+        content: "My name is Omar and I live in Zürich.",
         type: "text",
       },
-
     ],
     outputs: [
       {
-        text:
+        text: "My name is Omar and I live in Zürich.",
         tokens: [
           {
-            type:
+            type: "PERSON",
             start: 11,
-            end:
+            end: 15,
           },
           {
-            type:
+            type: "GPE",
             start: 30,
-            end:
+            end: 36,
           },
         ],
         type: "text-with-tokens",
@@ -43,41 +42,43 @@ const taskData: TaskDataCustom = {
   metrics: [
     {
       description: "",
-      id:
+      id: "accuracy",
     },
     {
       description: "",
-      id:
-
+      id: "recall",
     },
     {
       description: "",
-      id:
+      id: "precision",
     },
     {
       description: "",
-      id:
+      id: "f1",
     },
   ],
   models: [
     {
-      description:
-
+      description:
+        "A robust performance model to identify people, locations, organizations and names of miscellaneous entities.",
+      id: "dslim/bert-base-NER",
     },
     {
       description: "Flair models are typically the state of the art in named entity recognition tasks.",
-      id:
+      id: "flair/ner-english",
     },
   ],
-  spaces:
+  spaces: [
     {
-      description:
-
+      description:
+        "An application that can recognizes entities, extracts noun chunks and recognizes various linguistic features of each token.",
+      id: "spacy/gradio_pipeline_visualizer",
     },
   ],
-  summary:
+  summary:
+    "Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models could be trained to identify specific entities in a text, such as dates, individuals and places; and PoS tagging would identify, for example, which words in a text are verbs, nouns, and punctuation marks.",
   widgetModels: ["dslim/bert-base-NER"],
-  youtubeId:
+  youtubeId: "wVHdVlPScxA",
 };

 export default taskData;

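The demo output added above encodes entities as character offsets into the input sentence; a quick sketch showing how those offsets line up with the `PERSON` and `GPE` tokens in the diff:

```ts
// The start/end values in the demo output are character offsets into the text.
const text = "My name is Omar and I live in Zürich.";

console.log(text.slice(11, 15)); // "Omar"   -> type: "PERSON"
console.log(text.slice(30, 36)); // "Zürich" -> type: "GPE"
```
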
package/src/translation/about.md
CHANGED

@@ -39,9 +39,9 @@ import { HfInference } from "@huggingface/inference";

 const inference = new HfInference(HF_ACCESS_TOKEN);
 await inference.translation({
-
-
-})
+  model: "t5-base",
+  inputs: "My name is Wolfgang and I live in Berlin",
+});
 ```

 ## Useful Resources
@@ -62,4 +62,4 @@ Would you like to learn more about Translation? Great! Here you can find some cu

 ### Documentation

-- [Translation task guide](https://huggingface.co/docs/transformers/tasks/translation)
+- [Translation task guide](https://huggingface.co/docs/transformers/tasks/translation)

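As a follow-up to the completed snippet, a minimal sketch of reading its result, assuming `translation` resolves to an object with a `translation_text` field (the shape used by current `@huggingface/inference` releases):

```ts
// Sketch: logging the translated text (the result shape is an assumption).
import { HfInference } from "@huggingface/inference";

const inference = new HfInference(process.env.HF_ACCESS_TOKEN);

const result = await inference.translation({
  model: "t5-base",
  inputs: "My name is Wolfgang and I live in Berlin",
});

console.log(result.translation_text);
```
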
package/src/translation/data.ts
CHANGED

@@ -4,65 +4,65 @@ const taskData: TaskDataCustom = {
   datasets: [
     {
       description: "A dataset of copyright-free books translated into 16 different languages.",
-      id:
+      id: "opus_books",
     },
     {
-      description:
-
+      description:
+        "An example of translation between programming languages. This dataset consists of functions in Java and C#.",
+      id: "code_x_glue_cc_code_to_code_trans",
     },
   ],
   demo: {
     inputs: [
       {
-        label:
-        content:
-          "My name is Omar and I live in Zürich.",
+        label: "Input",
+        content: "My name is Omar and I live in Zürich.",
         type: "text",
       },
-
     ],
     outputs: [
       {
-        label:
-        content:
-          "Mein Name ist Omar und ich wohne in Zürich.",
+        label: "Output",
+        content: "Mein Name ist Omar und ich wohne in Zürich.",
         type: "text",
       },
     ],
   },
   metrics: [
     {
-      description:
-
+      description:
+        "BLEU score is calculated by counting the number of shared single or subsequent tokens between the generated sequence and the reference. Subsequent n tokens are called “n-grams”. Unigram refers to a single token while bi-gram refers to token pairs and n-grams refer to n subsequent tokens. The score ranges from 0 to 1, where 1 means the translation perfectly matched and 0 did not match at all",
+      id: "bleu",
     },
     {
       description: "",
-      id:
+      id: "sacrebleu",
     },
   ],
   models: [
     {
       description: "A model that translates from English to French.",
-      id:
+      id: "Helsinki-NLP/opus-mt-en-fr",
     },
     {
-      description:
-
+      description:
+        "A general-purpose Transformer that can be used to translate from English to German, French, or Romanian.",
+      id: "t5-base",
     },
   ],
-  spaces:
+  spaces: [
     {
       description: "An application that can translate between 100 languages.",
-      id:
+      id: "Iker/Translate-100-languages",
     },
     {
       description: "An application that can translate between English, Spanish and Hindi.",
-      id:
+      id: "EuroPython2022/Translate-with-Bloom",
     },
   ],
-  summary:
+  summary: "Translation is the task of converting text from one language to another.",
   widgetModels: ["t5-small"],
-  youtubeId:
+  youtubeId: "1JvfrvZgi6c",
 };

 export default taskData;

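The BLEU description added above counts shared n-grams between a candidate and a reference. A toy sketch of the unigram case only; real BLEU additionally combines several n-gram orders and a brevity penalty.

```ts
// Toy clipped unigram precision: fraction of candidate tokens found in the reference.
function unigramPrecision(candidate: string, reference: string): number {
  const refCounts = new Map<string, number>();
  for (const tok of reference.toLowerCase().split(/\s+/)) {
    refCounts.set(tok, (refCounts.get(tok) ?? 0) + 1);
  }
  const candTokens = candidate.toLowerCase().split(/\s+/);
  let matched = 0;
  for (const tok of candTokens) {
    const remaining = refCounts.get(tok) ?? 0;
    if (remaining > 0) {
      matched++;
      refCounts.set(tok, remaining - 1); // clip repeated matches
    }
  }
  return candTokens.length === 0 ? 0 : matched / candTokens.length;
}

console.log(unigramPrecision("Mein Name ist Omar", "Mein Name ist Omar und ich wohne in Zürich.")); // 1
```
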
package/src/unconditional-image-generation/about.md
CHANGED

@@ -3,7 +3,7 @@
 Unconditional image generation is the task of generating new images without any specific input. The main goal of this is to create novel, original images that are not based on existing images.
 This can be used for a variety of applications, such as creating new artistic images, improving image recognition algorithms, or generating photorealistic images for virtual reality environments.

-Unconditional image generation models usually start with a
+Unconditional image generation models usually start with a _seed_ that generates a _random noise vector_. The model will then use this vector to create an output image similar to the images used for training the model.

 An example of unconditional image generation would be generating the image of a face on a model trained with the [CelebA dataset](https://huggingface.co/datasets/huggan/CelebA-HQ) or [generating a butterfly](https://huggingface.co/spaces/huggan/butterfly-gan) on a model trained with the [Smithsonian Butterflies dataset](https://huggingface.co/datasets/ceyda/smithsonian_butterflies).

@@ -14,18 +14,23 @@ An example of unconditional image generation would be generating the image of a
 Unconditional image generation can be used for a variety of applications.

 ### Artistic Expression
+
 Unconditional image generation can be used to create novel, original artwork that is not based on any existing images. This can be used to explore new creative possibilities and produce unique, imaginative images.

-### Data Augmentation
+### Data Augmentation
+
 Unconditional image generation models can be used to generate new images to improve the performance of image recognition algorithms. This makes algorithms more robust and able to handle a broader range of images.

-### Virtual Reality
+### Virtual Reality
+
 Unconditional image generation models can be used to create photorealistic images that can be used in virtual reality environments. This makes the VR experience more immersive and realistic.

-### Medical Imaging
+### Medical Imaging
+
 Unconditional image generation models can generate new medical images, such as CT or MRI scans, that can be used to train and evaluate medical imaging algorithms. This can improve the accuracy and reliability of these algorithms.

 ### Industrial Design
+
 Unconditional image generation models can generate new designs for products, such as clothing or furniture, that are not based on any existing designs. This way, designers can explore new creative possibilities and produce unique, innovative designs.

 ## Model Hosting and Inference
@@ -42,4 +47,4 @@ This section should have useful information about Model Hosting and Inference

 In this area, you can insert useful information about training the model

-This page was made possible thanks to the efforts of [Someet Sahoo](https://huggingface.co/Someet24) and [Juan Carlos Piñeros](https://huggingface.co/juancopi81).
+This page was made possible thanks to the efforts of [Someet Sahoo](https://huggingface.co/Someet24) and [Juan Carlos Piñeros](https://huggingface.co/juancopi81).

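The seed-and-noise-vector sentence completed above can be made concrete with a small sketch: a fixed seed deterministically produces the same noise vector, which is the latent a generator would then decode into an image. The PRNG and vector length here are arbitrary choices for illustration.

```ts
// Sketch: a fixed seed -> the same pseudo-random Gaussian noise vector every run.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function noiseVector(seed: number, size = 4): number[] {
  const rand = mulberry32(seed);
  // Box-Muller transform: uniform samples -> standard normal noise.
  return Array.from({ length: size }, () => {
    const u = Math.max(rand(), Number.EPSILON);
    const v = rand();
    return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  });
}

console.log(noiseVector(42)); // same seed -> same vector, the starting point for generation
```
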