labelr 0.10.0__py3-none-any.whl → 0.11.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,230 @@
1
+ Metadata-Version: 2.4
2
+ Name: labelr
3
+ Version: 0.11.0
4
+ Summary: A command-line tool to manage labeling tasks with Label Studio.
5
+ Requires-Python: >=3.10
6
+ Description-Content-Type: text/markdown
7
+ License-File: LICENSE
8
+ Requires-Dist: datasets>=3.2.0
9
+ Requires-Dist: imagehash>=4.3.1
10
+ Requires-Dist: label-studio-sdk>=1.0.8
11
+ Requires-Dist: more-itertools>=10.5.0
12
+ Requires-Dist: openfoodfacts>=2.9.0
13
+ Requires-Dist: typer>=0.15.1
14
+ Requires-Dist: google-cloud-batch==0.18.0
15
+ Requires-Dist: huggingface-hub
16
+ Requires-Dist: deepdiff>=8.6.1
17
+ Requires-Dist: rapidfuzz>=3.14.3
18
+ Requires-Dist: aiohttp
19
+ Requires-Dist: aiofiles
20
+ Requires-Dist: orjson
21
+ Requires-Dist: google-cloud-storage
22
+ Requires-Dist: gcloud-aio-storage
23
+ Requires-Dist: google-genai>=1.56.0
24
+ Requires-Dist: diskcache>=5.6.3
25
+ Provides-Extra: ultralytics
26
+ Requires-Dist: ultralytics==8.4.8; extra == "ultralytics"
27
+ Provides-Extra: fiftyone
28
+ Requires-Dist: fiftyone~=1.10.0; extra == "fiftyone"
29
+ Dynamic: license-file
30
+
31
+ # Labelr
32
+
33
+ Labelr is a command-line interface that provides a set of tools to help data scientists and machine learning engineers deal with ML data annotation, data preprocessing and format conversion.
34
+
35
+ This project started as a way to automate some of the tasks we do at Open Food Facts to manage data at different stages of the machine learning pipeline.
36
+
37
+ The CLI is currently integrated with Label Studio (for data annotation), Ultralytics (for object detection), Google Cloud Batch (for training) and Hugging Face (for model and dataset storage). It only works with a few specific tasks (object detection, image classification and structured extraction from images using an LVLM, for now), but it's meant to be extended to other tasks in the future.
38
+
39
+ For object detection and image classification models, it currently allows you to:
40
+
41
+ - create Label Studio projects
42
+ - upload images to Label Studio
43
+ - pre-annotate the tasks, either with an existing object detection model or with a zero-shot model (Yolo-World or SAM3), using Ultralytics
44
+ - perform data quality checks on Label Studio datasets
45
+ - export the data to Hugging Face or to local disk
46
+ - train the model on Google Batch (for object detection only)
47
+ - visualize the model predictions and compare them with the ground truth, using [FiftyOne](https://docs.voxel51.com/user_guide/index.html).
48
+
49
+ Labelr also supports managing datasets for fine-tuning large visual language models. It currently supports only a single task: structured extraction (JSON) from a single image.
50
+ The following features are supported:
51
+
52
+ - creating training datasets using Google Gemini Batch, from a list of images, textual instructions and a JSON schema
53
+ - uploading the dataset to Hugging Face
54
+ - manually or automatically fixing the model output using [Directus](https://directus.io/), a headless CMS used to manage the structured output
55
+ - exporting the dataset to Hugging Face
56
+
57
+ In addition, Labelr comes with two scripts that can be used to train ML models:
58
+
59
+ - in `packages/train-yolo`: the `main.py` script can be used to train an object detection model using Ultralytics. The training can be fully automated on Google Batch, and Labelr provides a CLI to launch Google Batch jobs.
60
+ - in `packages/train-unsloth`: the `main.py` script can be used to train a visual language model using Unsloth. The training is not yet automated on Google Batch, but the script can be used to train the model locally.
61
+
62
+ ## Installation
63
+
64
+ Python 3.10 or higher is required to run this CLI.
65
+
66
+ To install the CLI, simply run:
67
+
68
+ ```bash
69
+ pip install labelr
70
+ ```
71
+ We recommend installing the CLI in a virtual environment. You can use either pip or conda for that.
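+
+ For example, a minimal setup with the standard `venv` module could look like this:
+
+ ```bash
+ # create and activate a virtual environment (.venv is just a common directory name)
+ python3 -m venv .venv
+ source .venv/bin/activate
+ pip install labelr
+ ```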
72
+
73
+ There are two optional dependencies that you can install to use the CLI:
74
+ - `ultralytics`: pre-annotate object detection datasets with an Ultralytics model (YOLO, YOLO-World)
75
+ - `fiftyone`: visualize the model predictions and compare them with the ground truth, using FiftyOne.
76
+
77
+ To install the ultralytics optional dependency, you can run:
78
+
79
+ ```bash
80
+ pip install labelr[ultralytics]
81
+ ```
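+
+ Similarly, to install the fiftyone optional dependency:
+
+ ```bash
+ pip install labelr[fiftyone]
+ ```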
82
+
83
+ ## Usage
84
+
85
+ ### Label Studio integration
86
+
87
+ To create a Label Studio project, you need to have a Label Studio instance running. Launching a Label Studio instance is out of the scope of this project, but you can follow the instructions on the [Label Studio documentation](https://labelstud.io/guide/install.html).
88
+
89
+ By default, the CLI assumes you're running Label Studio locally (URL: http://127.0.0.1:8080). You can change the URL by setting the `--label-studio-url` CLI option or by updating the configuration (see the [Configuration](#configuration) section below for more information).
90
+
91
+ For all the commands that interact with Label Studio, you need to provide an API key, either with the `--api-key` option or through configuration.
92
+
93
+ #### Create a project
94
+
95
+ Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
96
+
97
+ For an object detection task, a command allows you to create the configuration file automatically:
98
+
99
+ ```bash
100
+ labelr ls create-config-file --labels 'label1' --labels 'label2' --output-file label_config.xml
101
+ ```
102
+
103
+ where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
104
+
105
+ Then, you can create a project on Label Studio with the following command:
106
+
107
+ ```bash
108
+ labelr ls create --title my_project --api-key API_KEY --config-file label_config.xml
109
+ ```
110
+
111
+ where `API_KEY` is the API key of the Label Studio instance (the API key is available on the Account page), and `label_config.xml` is the configuration file of the project.
112
+
113
+ `ls` stands for Label Studio in the CLI.
114
+
115
+ #### Create a dataset file
116
+
117
+ If you have a list of images for an object detection task, you can quickly create a dataset file with the following command:
118
+
119
+ ```bash
120
+ labelr ls create-dataset-file --input-file image_urls.txt --output-file dataset.json
121
+ ```
122
+
123
+ where `image_urls.txt` is a file containing the URLs of the images, one per line, and `dataset.json` is the output file.
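+
+ For example, you could create an input file like this (the URLs below are purely illustrative):
+
+ ```bash
+ # image_urls.txt contains one image URL per line
+ cat > image_urls.txt <<'EOF'
+ https://example.com/images/0001.jpg
+ https://example.com/images/0002.jpg
+ EOF
+ ```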
124
+
125
+ #### Import data
126
+
127
+ Next, import the generated data to a project with the following command:
128
+
129
+ ```bash
130
+ labelr ls import-data --project-id PROJECT_ID --dataset-path dataset.json
131
+ ```
132
+
133
+ where `PROJECT_ID` is the ID of the project you created.
134
+
135
+ #### Pre-annotate the data
136
+
137
+ To accelerate annotation, you can pre-annotate the images with an object detection model. We support three pre-annotation backends:
138
+
139
+ - `ultralytics`: use your own model or [Yolo-World](https://docs.ultralytics.com/models/yolo-world/), a zero-shot model that can detect any object using a text description of the object. You can specify the path or the name of the model with the `--model-name` option. If no model name is provided, the `yolov8x-worldv2.pt` model (Yolo-World) is used.
140
+ - `ultralytics_sam3`: use [SAM3](https://docs.ultralytics.com/models/sam-3/), another zero-shot model. We advise using this backend, as it's the most accurate. The `--model-name` option is ignored when this backend is used.
141
+ - `robotoff`: the ML backend of Open Food Facts (specific to Open Food Facts projects).
142
+
143
+ When using `ultralytics` or `ultralytics_sam3`, make sure you have installed the labelr package with the `ultralytics` extra.
144
+
145
+ To pre-annotate the data with an Ultralytics backend, use the following command:
146
+
147
+ ```bash
148
+ labelr ls add-prediction --project-id PROJECT_ID --backend ultralytics_sam3 --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
149
+ ```
150
+
151
+ The SAM3 model will be automatically downloaded from Hugging Face. [SAM3](https://huggingface.co/facebook/sam3) is a gated model: you must request and be granted access to it beforehand. Make sure you have been granted access before launching the command.
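+
+ If the download requires authentication, one way to provide your credentials is the Hugging Face Hub CLI (shipped with the `huggingface-hub` dependency); note this is the general Hugging Face workflow, not a labelr-specific command:
+
+ ```bash
+ # log in with a token from an account that was granted access to the gated model
+ huggingface-cli login
+ ```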
152
+
153
+ In the command above, `labels` is the list of labels to use for the object detection task (you can add as many labels as you want). You can also provide a `--label-mapping` option in case the label names of the model you use for pre-annotation differ from the names configured on your Label Studio project.
154
+
155
+
156
+ #### Add `train` and `val` split
157
+
158
+ In most machine learning projects, you need to split your data into a training and a validation set. Assigning each sample to a split is required before exporting the dataset. To do so, you can use the following command:
159
+
160
+ ```bash
161
+ labelr ls add-split --train-split 0.8 --project-id PROJECT_ID
162
+ ```
163
+
164
+ For each task in the dataset, it randomly assigns 80% of the samples to the `train` split and 20% to the `val` split. The split is saved in the `split` field of the task `data`.
165
+
166
+ You can change the train/val ratio with the `--train-split` option. You can also assign specific samples to a split. For example, you can assign the `train` split to specific tasks by storing the task IDs in a file `task_ids.txt` and running the following command:
167
+
168
+ ```bash
169
+ labelr ls add-split --split-name train --task-id-file task_ids.txt --project-id PROJECT_ID
170
+ ```
171
+
172
+ #### Perform sanity checks on the dataset
173
+
174
+ Labelr can automatically detect some common data quality issues:
175
+
176
+ - broken image URLs
177
+ - duplicate tasks (based on the image hash)
178
+ - tasks with multiple annotations
179
+
180
+ To perform a check, run:
181
+
182
+ ```bash
183
+ labelr ls check-dataset --project-id PROJECT_ID
184
+ ```
185
+
186
+ The command will report the issues found. It is non-destructive by default, but you can use the `--delete-missing-images` and `--delete-duplicate-images` options to delete the tasks with missing images or duplicate images, respectively.
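+
+ For example, to also remove the offending tasks:
+
+ ```bash
+ labelr ls check-dataset --project-id PROJECT_ID --delete-missing-images --delete-duplicate-images
+ ```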
187
+
188
+ #### Export the data
189
+
190
+ Once the data is annotated, you can export it to a Hugging Face dataset or to local disk (Ultralytics format). To export it to disk, use the following command:
191
+
192
+ ```bash
193
+ labelr datasets export --project-id PROJECT_ID --from ls --to ultralytics --output-dir output --label-names 'product,price-tag'
194
+ ```
195
+
196
+ where `output` is the directory where the data will be exported. Currently, label names must be provided, as the CLI does not support exporting label names from Label Studio yet.
197
+
198
+ To export the data to a Hugging Face dataset, use the following command:
199
+
200
+ ```bash
201
+ labelr datasets export --project-id PROJECT_ID --from ls --to huggingface --repo-id REPO_ID --label-names 'product,price-tag'
202
+ ```
203
+
204
+ where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
205
+
206
+ ### Launch training jobs
207
+
208
+ You can also launch training jobs for YOLO object detection models using datasets hosted on Hugging Face. Please refer to the [train-yolo package README](packages/train-yolo/README.md) for more details on how to use this feature.
209
+
210
+ ## Configuration
211
+
212
+ Some Labelr settings can be configured using a configuration file or through environment variables. The configuration file is located at `~/.config/labelr/config.json`.
213
+
214
+ In order of precedence, the configuration is loaded from:
215
+
216
+ - CLI command option
217
+ - environment variable
218
+ - configuration file
219
+
220
+ The following variables are currently supported:
221
+
222
+ - `label_studio_url`: URL of the Label Studio server. Can also be set with the `LABELR_LABEL_STUDIO_URL` environment variable.
223
+ - `label_studio_api_key`: API key for Label Studio. Can also be set with the `LABELR_LABEL_STUDIO_API_KEY` environment variable.
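+
+ For example, to set both values through environment variables:
+
+ ```bash
+ export LABELR_LABEL_STUDIO_URL=http://127.0.0.1:8080
+ export LABELR_LABEL_STUDIO_API_KEY=API_KEY
+ ```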
224
+
225
+
226
+ Labelr supports setting configuration values in the config file through the `config` command. For example, to set the Label Studio URL, you can run:
227
+
228
+ ```bash
229
+ labelr config label_studio_url http://127.0.0.1:8080
230
+ ```
@@ -1,36 +1,38 @@
1
1
  labelr/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
2
  labelr/__main__.py,sha256=G4e95-IfhI-lOmkOBP6kQ8wl1x_Fl7dZlLOYr90K83c,66
3
3
  labelr/annotate.py,sha256=3fJ9FYbcozcOoKuhNtzPHV8sSnp-45FsNnMc8UeBHGU,3503
4
- labelr/check.py,sha256=3wK6mE0UsKvoBNm0_lyWhCMq7gxkv5r50pvO70damXY,2476
5
- labelr/config.py,sha256=3RXF_NdkSuHvfVMGMlYmjlw45fU77zQkLX7gmZq7NxM,64
4
+ labelr/check.py,sha256=AeU0MmuLTAVCNRN35Ri_QY--JTK53UnoOgrNQ2ikPKE,4988
5
+ labelr/config.py,sha256=NnpdeyJjux8MoBGkFq-4XkMe_NgKiXxY3kD3CAl-yIk,1792
6
6
  labelr/dataset_features.py,sha256=ZC9QAUw9oKHqyUPla2h3xQFaRT9sHq8hkPNN4RDDwmo,1257
7
7
  labelr/google_genai.py,sha256=x5p98eYoI887QMBDgziFxEW9WNdZ8Cw0EHjAFQ71SaE,14728
8
- labelr/main.py,sha256=OTiJSkD_TrzQmQQm291FhknD-HQQTWfBEBgImxqL0KM,2634
8
+ labelr/main.py,sha256=9n1kzRXTMbOKcXBUSpWamdTh9l8t1_skIhQWqOaPmb0,3155
9
9
  labelr/project_config.py,sha256=CIHEcgSOfXb53naHWEBkTDm2V9m3abAu8C54VSzHjAs,1260
10
10
  labelr/types.py,sha256=8CHfLyifF_N94OYDhG-7IcWboOh9o0Z_0LBtQapT8TQ,313
11
11
  labelr/utils.py,sha256=8Yp0L2MCIdUYSjvmF4U5iiaBpaZJbYw4rHJOMhCCudE,3075
12
12
  labelr/apps/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
13
- labelr/apps/datasets.py,sha256=tAD6TZSnwh7uhkleSfDP0PFqztXC1S3Vx2aMSVCFfRU,12725
13
+ labelr/apps/datasets.py,sha256=eZ3JcyXuHsR-UuHa0wBOSCFsjeEuosn0jTWZnMgJ9BQ,17516
14
+ labelr/apps/directus.py,sha256=D_uFfBNF80ygFxgcWQKC4cdNMDNb8wZ5IP2sMCmY4-8,5918
14
15
  labelr/apps/evaluate.py,sha256=UC4CuSKa4vgR5xTBZ-dFgp_1pYnkM55s2IJgix0YtkI,1157
15
- labelr/apps/google_batch.py,sha256=Mlz5jRVcR1XzRJg2HLte3rIhiOk4xQQjjLAJsc3lJjo,9572
16
+ labelr/apps/google_batch.py,sha256=L4KsLlB7faAlBWeJz89W17r7sxdrMfd83-Kf_wQgNSM,10856
16
17
  labelr/apps/hugging_face.py,sha256=B0GaDZeUZj2A7nEeC1OtCANb0DqvBkhWwFWM_9Nm2kU,1608
17
- labelr/apps/label_studio.py,sha256=lQ7K16noA4Mnr1hc0oxya1sgGgABWnpIIJTM5ENp7so,16869
18
+ labelr/apps/label_studio.py,sha256=VJjEva9vWN8Lu7l834Q3L07HsgLXCAJiZRXRK4oe8Ks,23992
18
19
  labelr/apps/train.py,sha256=wmOSpO9JsrwCXYMgRg2srMbV5B5TvnlfhAKPqUt6wSg,7328
20
+ labelr/apps/typer_description.py,sha256=5i1T-KQHIXoeZ6nii90JzcjkZHNNQj_DUiwO3ReT1JM,284
19
21
  labelr/evaluate/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
20
22
  labelr/evaluate/object_detection.py,sha256=QJIwrDY-Vsy0-It6tZSkN3qgAlmIu2W1-kGdmibiPSQ,3349
21
23
  labelr/export/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
22
24
  labelr/export/classification.py,sha256=rnm99vGMJy1UkdXiZ8t_TgFe3CyLBBYowWwzaZeniIs,4699
23
25
  labelr/export/common.py,sha256=lJ-ZDOMKGpC48fCuEnIrA8sZBhXGZOcghBbsLM1h66o,1252
24
26
  labelr/export/llm.py,sha256=Jlopi0EQ4YUWLe_s-kTFcISTzO1QmdX-qXQxayO6E-k,3186
25
- labelr/export/object_detection.py,sha256=91ywkPago7WgbY2COQKpwjFLYAAsXeGOu7TkGHi17OU,12338
27
+ labelr/export/object_detection.py,sha256=OQdY8pfNmqzvHj__SIir9_glbd4ogIos0aZ6P8dZhiM,16230
26
28
  labelr/sample/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
27
29
  labelr/sample/classification.py,sha256=7Z5hvxG6q6wfJMYj00JWbRBhfjOyhjaL8fpJjgBi9N8,539
28
30
  labelr/sample/common.py,sha256=f0XDS6s0z6Vw4G2FDELJ1VQSe5Tsh0q3-3VU9unK9eY,431
29
31
  labelr/sample/llm.py,sha256=zAsI3TmfGCbBPv4_hNtYR4Np3yAmUDzXGAvlQLF6V6w,2474
30
- labelr/sample/object_detection.py,sha256=XZasR_k4AxzsiWdVMC2ZnyjfA14PKJPrx1U-XPr5tWQ,8427
31
- labelr-0.10.0.dist-info/licenses/LICENSE,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
32
- labelr-0.10.0.dist-info/METADATA,sha256=pS2Ipq-aICU3TluuqSNocGP5-V8ztLk6X_udwwnECPk,7243
33
- labelr-0.10.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
34
- labelr-0.10.0.dist-info/entry_points.txt,sha256=OACukVeR_2z54i8yQuWqqk_jdEHlyTwmTFOFBmxPp1k,43
35
- labelr-0.10.0.dist-info/top_level.txt,sha256=bjZo50aGZhXIcZYpYOX4sdAQcamxh8nwfEh7A9RD_Ag,7
36
- labelr-0.10.0.dist-info/RECORD,,
32
+ labelr/sample/object_detection.py,sha256=iMqiHRdzX5J24WAG2OdSY3nLzp_xRG-_Yq1LALtC840,9454
33
+ labelr-0.11.0.dist-info/licenses/LICENSE,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
34
+ labelr-0.11.0.dist-info/METADATA,sha256=pIjkMaVB-NGuExa2yw3D-g166rUfBAo_Qzpjk-LdfQs,11586
35
+ labelr-0.11.0.dist-info/WHEEL,sha256=wUyA8OaulRlbfwMtmQsvNngGrxQHAvkKcvRmdizlJi0,92
36
+ labelr-0.11.0.dist-info/entry_points.txt,sha256=OACukVeR_2z54i8yQuWqqk_jdEHlyTwmTFOFBmxPp1k,43
37
+ labelr-0.11.0.dist-info/top_level.txt,sha256=bjZo50aGZhXIcZYpYOX4sdAQcamxh8nwfEh7A9RD_Ag,7
38
+ labelr-0.11.0.dist-info/RECORD,,
@@ -1,5 +1,5 @@
1
1
  Wheel-Version: 1.0
2
- Generator: setuptools (80.9.0)
2
+ Generator: setuptools (80.10.2)
3
3
  Root-Is-Purelib: true
4
4
  Tag: py3-none-any
5
5
 
@@ -1,158 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: labelr
3
- Version: 0.10.0
4
- Summary: A command-line tool to manage labeling tasks with Label Studio.
5
- Requires-Python: >=3.10
6
- Description-Content-Type: text/markdown
7
- License-File: LICENSE
8
- Requires-Dist: datasets>=3.2.0
9
- Requires-Dist: imagehash>=4.3.1
10
- Requires-Dist: label-studio-sdk>=1.0.8
11
- Requires-Dist: more-itertools>=10.5.0
12
- Requires-Dist: openfoodfacts>=2.9.0
13
- Requires-Dist: typer>=0.15.1
14
- Requires-Dist: google-cloud-batch==0.18.0
15
- Requires-Dist: huggingface-hub
16
- Requires-Dist: deepdiff>=8.6.1
17
- Requires-Dist: rapidfuzz>=3.14.3
18
- Requires-Dist: aiohttp
19
- Requires-Dist: aiofiles
20
- Requires-Dist: orjson
21
- Requires-Dist: google-cloud-storage
22
- Requires-Dist: gcloud-aio-storage
23
- Requires-Dist: google-genai>=1.56.0
24
- Provides-Extra: ultralytics
25
- Requires-Dist: ultralytics==8.3.223; extra == "ultralytics"
26
- Provides-Extra: fiftyone
27
- Requires-Dist: fiftyone~=1.10.0; extra == "fiftyone"
28
- Dynamic: license-file
29
-
30
- # Labelr
31
-
32
- Labelr a command line interface that aims to provide a set of tools to help data scientists and machine learning engineers to deal with ML data annotation, data preprocessing and format conversion.
33
-
34
- This project started as a way to automate some of the tasks we do at Open Food Facts to manage data at different stages of the machine learning pipeline.
35
-
36
- The CLI currently is integrated with Label Studio (for data annotation), Ultralytics (for object detection) and Hugging Face (for model and dataset storage). It only works with some specific tasks (object detection only currently), but it's meant to be extended to other tasks in the future.
37
-
38
- It currently allows to:
39
-
40
- - create Label Studio projects
41
- - upload images to Label Studio
42
- - pre-annotate the tasks either with an existing object detection model run by Triton, or with Yolo-World (through Ultralytics)
43
- - perform data quality checks on Label Studio
44
- - export the data to Hugging Face Dataset or to local disk
45
-
46
- ## Installation
47
-
48
- Python 3.10 or higher is required to run this CLI.
49
-
50
- To install the CLI, simply run:
51
-
52
- ```bash
53
- pip install labelr
54
- ```
55
- We recommend to install the CLI in a virtual environment. You can either use pip or conda for that.
56
-
57
- There are two optional dependencies that you can install to use the CLI:
58
- - `ultralytics`: pre-annotate object detection datasets with an ultralytics model (yolo, yolo-world)
59
- - `triton`: pre-annotate object detection datasets using a model served by a Triton inference server
60
-
61
- To install the optional dependencies, you can run:
62
-
63
- ```bash
64
- pip install labelr[ultralytics,triton]
65
- ```
66
-
67
- ## Usage
68
-
69
- ### Label Studio integration
70
-
71
- To create a Label Studio project, you need to have a Label Studio instance running. Launching a Label Studio instance is out of the scope of this project, but you can follow the instructions on the [Label Studio documentation](https://labelstud.io/guide/install.html).
72
-
73
- By default, the CLI will use Open Food Facts Label Studio instance, but you can change the URL by setting the `--label-studio-url` CLI option.
74
-
75
- For all the commands that interact with Label Studio, you need to provide an API key using the `--api-key` CLI option. You can get an API key by logging in to the Label Studio instance and going to the Account & Settings page.
76
-
77
- #### Create a project
78
-
79
- Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
80
-
81
- For an object detection task, a command allows you to create the configuration file automatically:
82
-
83
- ```bash
84
- labelr ls create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
85
- ```
86
-
87
- where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
88
-
89
- Then, you can create a project on Label Studio with the following command:
90
-
91
- ```bash
92
- labelr ls create --title my_project --api-key API_KEY --config-file label_config.xml
93
- ```
94
-
95
- where `API_KEY` is the API key of the Label Studio instance (API key is available at Account page), and `label_config.xml` is the configuration file of the project.
96
-
97
- `ls` stands for Label Studio in the CLI.
98
-
99
- #### Create a dataset file
100
-
101
- If you have a list of images, for an object detection task, you can quickly create a dataset file with the following command:
102
-
103
- ```bash
104
- labelr ls create-dataset-file --input-file image_urls.txt --output-file dataset.json
105
- ```
106
-
107
- where `image_urls.txt` is a file containing the URLs of the images, one per line, and `dataset.json` is the output file.
108
-
109
- #### Import data
110
-
111
- Next, import the generated data to a project with the following command:
112
-
113
- ```bash
114
- labelr ls import-data --project-id PROJECT_ID --dataset-path dataset.json
115
- ```
116
-
117
- where `PROJECT_ID` is the ID of the project you created.
118
-
119
- #### Pre-annotate the data
120
-
121
- To accelerate annotation, you can pre-annotate the images with an object detection model. We support two pre-annotation backends:
122
-
123
- - Triton: you need to have a Triton server running with a model that supports object detection. The object detection model is expected to be a yolo-v8 model. You can set the URL of the Triton server with the `--triton-url` CLI option.
124
-
125
- - Ultralytics: you can use the [Yolo-World model from Ultralytics](https://github.com/ultralytics/ultralytics), Ultralytics should be installed in the same virtualenv.
126
-
127
- To pre-annotate the data with Triton, use the following command:
128
-
129
- ```bash
130
- labelr ls add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
131
- ```
132
-
133
- where `labels` is the list of labels to use for the object detection task (you can add as many labels as you want).
134
- For Ultralytics, you can also provide a `--label-mapping` option to map the labels from the model to the labels of the project.
135
-
136
- By default, for Ultralytics, the `yolov8x-worldv2.pt` model is used. You can change the model by setting the `--model-name` CLI option.
137
-
138
- #### Export the data
139
-
140
- Once the data is annotated, you can export it to a Hugging Face dataset or to local disk (Ultralytics format). To export it to disk, use the following command:
141
-
142
- ```bash
143
- labelr datasets export --project-id PROJECT_ID --from ls --to ultralytics --output-dir output --label-names 'product,price-tag'
144
- ```
145
-
146
- where `output` is the directory where the data will be exported. Currently, label names must be provided, as the CLI does not support exporting label names from Label Studio yet.
147
-
148
- To export the data to a Hugging Face dataset, use the following command:
149
-
150
- ```bash
151
- labelr datasets export --project-id PROJECT_ID --from ls --to huggingface --repo-id REPO_ID --label-names 'product,price-tag'
152
- ```
153
-
154
- where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
155
-
156
- ### Lauch training jobs
157
-
158
- You can also launch training jobs for YOLO object detection models using datasets hosted on Hugging Face. Please refer to the [train-yolo package README](packages/train-yolo/README.md) for more details on how to use this feature.