labelr 0.2.0__tar.gz → 0.4.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {labelr-0.2.0 → labelr-0.4.0}/PKG-INFO +16 -9
- labelr-0.2.0/src/labelr.egg-info/PKG-INFO → labelr-0.4.0/README.md +12 -22
- {labelr-0.2.0 → labelr-0.4.0}/pyproject.toml +3 -8
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/annotate.py +16 -15
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/datasets.py +84 -5
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/projects.py +115 -34
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/export.py +135 -23
- labelr-0.4.0/src/labelr/project_config.py +45 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/sample.py +71 -15
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/types.py +1 -0
- labelr-0.2.0/README.md → labelr-0.4.0/src/labelr.egg-info/PKG-INFO +29 -2
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/SOURCES.txt +1 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/requires.txt +1 -6
- {labelr-0.2.0 → labelr-0.4.0}/LICENSE +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/setup.cfg +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/__init__.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/__main__.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/__init__.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/users.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/check.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/config.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/main.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/dependency_links.txt +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/entry_points.txt +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/top_level.txt +0 -0
@@ -1,7 +1,7 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.4
 Name: labelr
-Version: 0.2.0
-Summary: Add your description here
+Version: 0.4.0
+Summary: A command-line tool to manage labeling tasks with Label Studio.
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
@@ -9,14 +9,11 @@ Requires-Dist: datasets>=3.2.0
 Requires-Dist: imagehash>=4.3.1
 Requires-Dist: label-studio-sdk>=1.0.8
 Requires-Dist: more-itertools>=10.5.0
-Requires-Dist: openfoodfacts>=2.3.4
-Requires-Dist: protobuf>=5.29.1
+Requires-Dist: openfoodfacts>=2.9.0
 Requires-Dist: typer>=0.15.1
 Provides-Extra: ultralytics
 Requires-Dist: ultralytics>=8.3.49; extra == "ultralytics"
-Provides-Extra: triton
-Requires-Dist: tritonclient>=2.52.0; extra == "triton"
-Requires-Dist: openfoodfacts[ml]>=2.3.4; extra == "triton"
+Dynamic: license-file
 
 # Labelr
 
@@ -67,7 +64,17 @@ For all the commands that interact with Label Studio, you need to provide an API
 
 #### Create a project
 
-Once you have a Label Studio instance running, you can create a project
+Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
+
+For an object detection task, a command allows you to create the configuration file automatically:
+
+```bash
+labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+```
+
+where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
+
+Then, you can create a project on Label Studio with the following command:
 
 ```bash
 labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
@@ -1,23 +1,3 @@
-Metadata-Version: 2.1
-Name: labelr
-Version: 0.2.0
-Summary: Add your description here
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: datasets>=3.2.0
-Requires-Dist: imagehash>=4.3.1
-Requires-Dist: label-studio-sdk>=1.0.8
-Requires-Dist: more-itertools>=10.5.0
-Requires-Dist: openfoodfacts>=2.3.4
-Requires-Dist: protobuf>=5.29.1
-Requires-Dist: typer>=0.15.1
-Provides-Extra: ultralytics
-Requires-Dist: ultralytics>=8.3.49; extra == "ultralytics"
-Provides-Extra: triton
-Requires-Dist: tritonclient>=2.52.0; extra == "triton"
-Requires-Dist: openfoodfacts[ml]>=2.3.4; extra == "triton"
-
 # Labelr
 
 Labelr a command line interface that aims to provide a set of tools to help data scientists and machine learning engineers to deal with ML data annotation, data preprocessing and format conversion.
@@ -67,7 +47,17 @@ For all the commands that interact with Label Studio, you need to provide an API
 
 #### Create a project
 
-Once you have a Label Studio instance running, you can create a project
+Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
+
+For an object detection task, a command allows you to create the configuration file automatically:
+
+```bash
+labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+```
+
+where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
+
+Then, you can create a project on Label Studio with the following command:
 
 ```bash
 labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
@@ -130,4 +120,4 @@ To export the data to a Hugging Face dataset, use the following command:
 labelr datasets export --project-id PROJECT_ID --from ls --to huggingface --repo-id REPO_ID --label-names 'product,price-tag'
 ```
 
-where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
+where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
@@ -1,7 +1,7 @@
 [project]
 name = "labelr"
-version = "0.2.0"
-description = "Add your description here"
+version = "0.4.0"
+description = "A command-line tool to manage labeling tasks with Label Studio."
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
@@ -9,8 +9,7 @@ dependencies = [
     "imagehash>=4.3.1",
     "label-studio-sdk>=1.0.8",
     "more-itertools>=10.5.0",
-    "openfoodfacts>=2.3.4",
-    "protobuf>=5.29.1",
+    "openfoodfacts>=2.9.0",
     "typer>=0.15.1",
 ]
 
@@ -21,10 +20,6 @@ labelr = "labelr.main:app"
 ultralytics = [
     "ultralytics>=8.3.49",
 ]
-triton = [
-    "tritonclient>=2.52.0",
-    "openfoodfacts[ml]>=2.3.4",
-]
 
 [tool.uv]
 package = true
@@ -1,29 +1,30 @@
 import random
 import string
 
+from openfoodfacts.types import JSONType
 from openfoodfacts.utils import get_logger
 
-try:
-    from openfoodfacts.ml.object_detection import ObjectDetectionRawResult
-    from ultralytics.engine.results import Results
-except ImportError:
-    pass
-
-
 logger = get_logger(__name__)
 
 
-def format_annotation_results_from_triton(
-    objects: list[
-
-
+def format_annotation_results_from_robotoff(
+    objects: list[JSONType],
+    image_width: int,
+    image_height: int,
+    label_mapping: dict[str, str] | None = None,
+) -> list[JSONType]:
+    """Format annotation results from Robotoff prediction endpoint into
     Label Studio format."""
     annotation_results = []
     for object_ in objects:
-
-
+        bounding_box = object_["bounding_box"]
+        label_name = object_["label"]
+
+        if label_mapping:
+            label_name = label_mapping.get(label_name, label_name)
+
         # These are relative coordinates (between 0.0 and 1.0)
-        y_min, x_min, y_max, x_max = 
+        y_min, x_min, y_max, x_max = bounding_box
         # Make sure the coordinates are within the image boundaries,
         # and convert them to percentages
        y_min = min(max(0, y_min), 1.0) * 100
@@ -51,7 +52,7 @@ def format_annotation_results_from_triton(
                     "y": y,
                     "width": width,
                     "height": height,
-                    "rectanglelabels": [
+                    "rectanglelabels": [label_name],
                 },
             },
         )
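The rewritten `annotate.py` centers on a coordinate conversion: Robotoff returns relative `[y_min, x_min, y_max, x_max]` boxes in the 0.0-1.0 range, while Label Studio expects percentage-based `x`/`y`/`width`/`height` rectangles. A minimal standalone sketch of that conversion (the helper name `to_label_studio_rect` is ours, not part of the package):

```python
def to_label_studio_rect(bounding_box: list[float], label_name: str) -> dict:
    """Convert a Robotoff-style relative bounding box
    [y_min, x_min, y_max, x_max] (values in 0.0-1.0) into the
    percentage-based rectangle dict Label Studio expects."""
    y_min, x_min, y_max, x_max = bounding_box
    # Clamp to the image boundaries and convert to percentages,
    # mirroring the min(max(0, v), 1.0) * 100 lines in the diff
    y_min = min(max(0.0, y_min), 1.0) * 100
    x_min = min(max(0.0, x_min), 1.0) * 100
    y_max = min(max(0.0, y_max), 1.0) * 100
    x_max = min(max(0.0, x_max), 1.0) * 100
    return {
        "x": x_min,
        "y": y_min,
        "width": x_max - x_min,
        "height": y_max - y_min,
        "rectanglelabels": [label_name],
    }

rect = to_label_studio_rect([0.1, 0.2, 0.5, 0.8], "product")
```

Out-of-range coordinates are clamped before the percentage conversion, so a box that slightly overflows the image still yields a valid rectangle.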
@@ -6,8 +6,11 @@ from pathlib import Path
 from typing import Annotated, Optional
 
 import typer
+from openfoodfacts import Flavor
 from openfoodfacts.utils import get_logger
 
+from labelr.export import export_from_ultralytics_to_hf
+
 from ..config import LABEL_STUDIO_DEFAULT_URL
 from ..types import ExportDestination, ExportSource, TaskType
 
@@ -130,9 +133,14 @@ def export(
     from_: Annotated[ExportSource, typer.Option("--from", help="Input source to use")],
     to: Annotated[ExportDestination, typer.Option(help="Where to export the data")],
     api_key: Annotated[Optional[str], typer.Option(envvar="LABEL_STUDIO_API_KEY")],
+    task_type: Annotated[
+        TaskType, typer.Option(help="Type of task to export")
+    ] = TaskType.object_detection,
     repo_id: Annotated[
         Optional[str],
-        typer.Option(
+        typer.Option(
+            help="Hugging Face Datasets repository ID to convert (only if --from or --to is `hf`)"
+        ),
     ] = None,
     label_names: Annotated[
         Optional[str],
@@ -146,12 +154,33 @@ def export(
         Optional[Path],
         typer.Option(help="Path to the output directory", file_okay=False),
     ] = None,
+    dataset_dir: Annotated[
+        Optional[Path],
+        typer.Option(help="Path to the dataset directory, only for Ultralytics source"),
+    ] = None,
     download_images: Annotated[
         bool,
         typer.Option(
             help="if True, don't use HF images and download images from the server"
         ),
     ] = False,
+    is_openfoodfacts_dataset: Annotated[
+        bool,
+        typer.Option(
+            help="Whether the Ultralytics dataset is an OpenFoodFacts dataset, only "
+            "for Ultralytics source. This is used to generate the correct image URLs "
+            "each image name."
+        ),
+    ] = True,
+    openfoodfacts_flavor: Annotated[
+        Flavor,
+        typer.Option(
+            help="Flavor of the Open Food Facts dataset to use for image URLs, only "
+            "for Ultralytics source if is_openfoodfacts_dataset is True. This is used to "
+            "generate the correct image URLs each image name. This option is ignored if "
+            "is_openfoodfacts_dataset is False."
+        ),
+    ] = Flavor.off,
     train_ratio: Annotated[
         float,
         typer.Option(
@@ -165,6 +194,17 @@ def export(
             help="Raise an error if an image download fails, only for Ultralytics"
         ),
     ] = True,
+    use_aws_cache: Annotated[
+        bool,
+        typer.Option(
+            help="Use the AWS S3 cache for image downloads instead of images.openfoodfacts.org, "
+            "it is ignored if the export format is not Ultralytics"
+        ),
+    ] = True,
+    merge_labels: Annotated[
+        bool,
+        typer.Option(help="Merge multiple labels into a single label"),
+    ] = False,
 ):
     """Export Label Studio annotation, either to Hugging Face Datasets or
     local files (ultralytics format)."""
@@ -179,6 +219,13 @@ def export(
     if (to == ExportDestination.hf or from_ == ExportSource.hf) and repo_id is None:
         raise typer.BadParameter("Repository ID is required for export/import with HF")
 
+    if from_ == ExportSource.ultralytics and dataset_dir is None:
+        raise typer.BadParameter(
+            "Dataset directory is required for export from Ultralytics source"
+        )
+
+    label_names_list: list[str] | None = None
+
     if label_names is None:
         if to == ExportDestination.hf:
             raise typer.BadParameter("Label names are required for HF export")
@@ -186,6 +233,9 @@ def export(
             raise typer.BadParameter(
                 "Label names are required for export from LS source"
             )
+    else:
+        label_names = typing.cast(str, label_names)
+        label_names_list = label_names.split(",")
 
     if from_ == ExportSource.ls:
         if project_id is None:
@@ -197,31 +247,60 @@ def export(
         raise typer.BadParameter("Output directory is required for Ultralytics export")
 
     if from_ == ExportSource.ls:
+        if task_type != TaskType.object_detection:
+            raise typer.BadParameter(
+                "Only object detection task is currently supported with LS source"
+            )
         ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
-        label_names = typing.cast(str, label_names)
-        label_names_list = label_names.split(",")
         if to == ExportDestination.hf:
             repo_id = typing.cast(str, repo_id)
             export_from_ls_to_hf(
-                ls,
+                ls,
+                repo_id=repo_id,
+                label_names=typing.cast(list[str], label_names_list),
+                project_id=typing.cast(int, project_id),
+                merge_labels=merge_labels,
+                use_aws_cache=use_aws_cache,
             )
         elif to == ExportDestination.ultralytics:
             export_from_ls_to_ultralytics(
                 ls,
                 typing.cast(Path, output_dir),
-                label_names_list,
+                typing.cast(list[str], label_names_list),
                 typing.cast(int, project_id),
                 train_ratio=train_ratio,
                 error_raise=error_raise,
+                merge_labels=merge_labels,
+                use_aws_cache=use_aws_cache,
             )
 
     elif from_ == ExportSource.hf:
+        if task_type != TaskType.object_detection:
+            raise typer.BadParameter(
+                "Only object detection task is currently supported with HF source"
+            )
         if to == ExportDestination.ultralytics:
             export_from_hf_to_ultralytics(
                 typing.cast(str, repo_id),
                 typing.cast(Path, output_dir),
                 download_images=download_images,
                 error_raise=error_raise,
+                use_aws_cache=use_aws_cache,
             )
         else:
             raise typer.BadParameter("Unsupported export format")
+    elif from_ == ExportSource.ultralytics:
+        if task_type != TaskType.classification:
+            raise typer.BadParameter(
+                "Only classification task is currently supported with Ultralytics source"
+            )
+        if to == ExportDestination.hf:
+            export_from_ultralytics_to_hf(
+                task_type=task_type,
+                dataset_dir=typing.cast(Path, dataset_dir),
+                repo_id=typing.cast(str, repo_id),
+                merge_labels=merge_labels,
+                label_names=typing.cast(list[str], label_names_list),
+                is_openfoodfacts_dataset=is_openfoodfacts_dataset,
+                openfoodfacts_flavor=openfoodfacts_flavor,
+            )
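The export command's argument validation reduces to a pair of rules. A standalone sketch using plain strings and `ValueError` in place of the source/destination enums and `typer.BadParameter` (the helper name `check_export_params` is ours):

```python
def check_export_params(from_: str, to: str, repo_id=None, dataset_dir=None):
    """Mirror of the two validation rules in the export command:
    HF as source or destination requires --repo-id, and the
    Ultralytics source requires --dataset-dir."""
    if (to == "hf" or from_ == "hf") and repo_id is None:
        raise ValueError("Repository ID is required for export/import with HF")
    if from_ == "ultralytics" and dataset_dir is None:
        raise ValueError(
            "Dataset directory is required for export from Ultralytics source"
        )
```

The checks run before any Label Studio or Hugging Face client is created, so bad invocations fail fast.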
@@ -9,7 +9,7 @@ from openfoodfacts.utils import get_logger
 from PIL import Image
 
 from ..annotate import (
-    format_annotation_results_from_triton,
+    format_annotation_results_from_robotoff,
     format_annotation_results_from_ultralytics,
 )
 from ..config import LABEL_STUDIO_DEFAULT_URL
@@ -92,14 +92,46 @@ def add_split(
     ],
     api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
     project_id: Annotated[int, typer.Option(help="Label Studio project ID")],
+    split_name: Annotated[
+        Optional[str],
+        typer.Option(
+            help="name of the split associated "
+            "with the task ID file. If --task-id-file is not provided, "
+            "this field is ignored."
+        ),
+    ] = None,
+    train_split_name: Annotated[
+        str,
+        typer.Option(help="name of the train split"),
+    ] = "train",
+    val_split_name: Annotated[
+        str,
+        typer.Option(help="name of the validation split"),
+    ] = "val",
+    task_id_file: Annotated[
+        Optional[Path],
+        typer.Option(help="path of a text file containing IDs of samples"),
+    ] = None,
+    overwrite: Annotated[
+        bool, typer.Option(help="overwrite existing split field")
+    ] = False,
     label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
 ):
     """Update the split field of tasks in a Label Studio project.
 
+    The behavior of this command depends on the `--task-id-file` option.
+
+    If `--task-id-file` is provided, it should contain a list of task IDs,
+    one per line. The split field of these tasks will be updated to the value
+    of `--split-name`.
+
+    If `--task-id-file` is not provided, the split field of all tasks in the
+    project will be updated based on the `train_split` probability.
     The split field is set to "train" with probability `train_split`, and "val"
-    otherwise.
-
-    are not updated
+    otherwise.
+
+    In both cases, tasks with a non-null split field are not updated unless
+    the `--overwrite` flag is provided.
     """
     import random
 
@@ -108,11 +140,29 @@ def add_split(
 
     ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
 
+    task_ids = None
+    if task_id_file is not None:
+        if split_name is None or split_name not in (train_split_name, val_split_name):
+            raise typer.BadParameter(
+                "--split-name is required when using --task-id-file"
+            )
+        task_ids = task_id_file.read_text().strip().split("\n")
+
     for task in ls.tasks.list(project=project_id, fields="all"):
         task: Task
+        task_id = task.id
+
         split = task.data.get("split")
-        if split is None:
-
+        if split is None or overwrite:
+            if task_ids and str(task_id) in task_ids:
+                split = split_name
+            else:
+                split = (
+                    train_split_name
+                    if random.random() < train_split
+                    else val_split_name
+                )
+
         logger.info("Updating task: %s, split: %s", task.id, split)
         ls.tasks.update(task.id, data={**task.data, "split": split})
 
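The per-task decision in the new add-split logic can be factored into a small pure function; this is our own restructuring for illustration (the helper name `assign_split` is not from the package):

```python
import random

def assign_split(current_split, task_id, task_ids, split_name,
                 train_split=0.8, train_split_name="train",
                 val_split_name="val", overwrite=False):
    """Sketch of the add-split decision: keep an existing split unless
    --overwrite is set; tasks listed in the ID file get --split-name;
    every other task is assigned train/val at random."""
    if current_split is not None and not overwrite:
        return current_split
    if task_ids and str(task_id) in task_ids:
        return split_name
    return train_split_name if random.random() < train_split else val_split_name
```

Keeping the branching in one function makes the precedence explicit: existing split, then explicit ID list, then random draw.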
@@ -153,30 +203,37 @@ def annotate_from_prediction(
 
 
 class PredictorBackend(enum.Enum):
-    triton = "triton"
     ultralytics = "ultralytics"
+    robotoff = "robotoff"
 
 
 @app.command()
 def add_prediction(
     api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
     project_id: Annotated[int, typer.Option(help="Label Studio Project ID")],
+    view_id: Annotated[
+        Optional[int],
+        typer.Option(
+            help="Label Studio View ID to filter tasks. If not provided, all tasks in the "
+            "project are processed."
+        ),
+    ] = None,
     model_name: Annotated[
         str,
         typer.Option(
-            help="Name of the object detection model to run (for
+            help="Name of the object detection model to run (for Robotoff server) or "
            "of the Ultralytics zero-shot model to run."
         ),
     ] = "yolov8x-worldv2.pt",
-
+    server_url: Annotated[
         Optional[str],
-        typer.Option(help="
-    ] = 
+        typer.Option(help="The Robotoff URL if the backend is robotoff"),
+    ] = "https://robotoff.openfoodfacts.org",
     backend: Annotated[
         PredictorBackend,
         typer.Option(
-            help="Prediction backend: either use
-            "the prediction or
+            help="Prediction backend: either use Ultralytics to perform "
+            "the prediction or Robotoff server."
         ),
     ] = PredictorBackend.ultralytics,
     labels: Annotated[
@@ -196,8 +253,8 @@ def add_prediction(
     threshold: Annotated[
         Optional[float],
         typer.Option(
-            help="Confidence threshold for selecting bounding boxes. The default is 0.
-            "for
+            help="Confidence threshold for selecting bounding boxes. The default is 0.3 "
+            "for robotoff backend and 0.1 for ultralytics backend."
         ),
     ] = None,
     max_det: Annotated[int, typer.Option(help="Maximum numbers of detections")] = 300,
@@ -221,9 +278,7 @@ def add_prediction(
 
     import tqdm
     from label_studio_sdk.client import LabelStudio
-    from openfoodfacts.utils import get_image_from_url
-
-    from labelr.triton.object_detection import ObjectDetectionModelRegistry
+    from openfoodfacts.utils import get_image_from_url, http_session
 
     label_mapping_dict = None
     if label_mapping:
@@ -242,8 +297,6 @@ def add_prediction(
     )
     ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
 
-    model: ObjectDetectionModelRegistry | "YOLO"
-
     if backend == PredictorBackend.ultralytics:
         from ultralytics import YOLO
 
@@ -258,18 +311,19 @@ def add_prediction(
             model.set_classes(labels)
         else:
             logger.warning("The model does not support setting classes directly.")
-    elif backend == PredictorBackend.triton:
-        if 
-            raise typer.BadParameter("
+    elif backend == PredictorBackend.robotoff:
+        if server_url is None:
+            raise typer.BadParameter("--server-url is required for Robotoff backend")
 
         if threshold is None:
-            threshold = 0.
-
-        model = ObjectDetectionModelRegistry.load(model_name)
+            threshold = 0.1
+        server_url = server_url.rstrip("/")
     else:
         raise typer.BadParameter(f"Unsupported backend: {backend}")
 
-    for task in tqdm.tqdm(
+    for task in tqdm.tqdm(
+        ls.tasks.list(project=project_id, view=view_id), desc="tasks"
+    ):
         if task.total_predictions == 0:
             image_url = task.data["image_url"]
             image = typing.cast(
@@ -286,12 +340,22 @@ def add_prediction(
                 label_studio_result = format_annotation_results_from_ultralytics(
                     results, labels, label_mapping_dict
                 )
-
-
-
-
-
-
+            elif backend == PredictorBackend.robotoff:
+                r = http_session.get(
+                    f"{server_url}/api/v1/images/predict",
+                    params={
+                        "models": model_name,
+                        "output_image": 0,
+                        "image_url": image_url,
+                    },
+                )
+                r.raise_for_status()
+                response = r.json()
+                label_studio_result = format_annotation_results_from_robotoff(
+                    response["predictions"][model_name],
+                    image.width,
+                    image.height,
+                    label_mapping_dict,
                 )
             if dry_run:
                 logger.info("image_url: %s", image_url)
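The Robotoff branch above boils down to a single GET against `/api/v1/images/predict`. A sketch that only constructs the request (so it stays network-free; the helper name `build_predict_request` is ours, the URL path and parameter names are taken from the diff):

```python
def build_predict_request(server_url: str, model_name: str, image_url: str):
    """Build the URL and query parameters that add-prediction sends to
    Robotoff's image prediction endpoint. output_image=0 asks the server
    not to render an annotated image back."""
    return (
        f"{server_url.rstrip('/')}/api/v1/images/predict",
        {"models": model_name, "output_image": 0, "image_url": image_url},
    )
```

The JSON response's `predictions[model_name]` entry is then handed to `format_annotation_results_from_robotoff` together with the image dimensions.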
@@ -339,7 +403,7 @@ def create_dataset_file(
             extra_meta["barcode"] = barcode
             off_image_id = Path(extract_source_from_url(url)).stem
             extra_meta["off_image_id"] = off_image_id
-            image_id = f"{barcode}
+            image_id = f"{barcode}_{off_image_id}"
 
         image = get_image_from_url(url, error_raise=False)
 
@@ -351,3 +415,20 @@ def create_dataset_file(
             image_id, url, image.width, image.height, extra_meta
         )
         f.write(json.dumps(label_studio_sample) + "\n")
+
+
+@app.command()
+def create_config_file(
+    output_file: Annotated[
+        Path, typer.Option(help="Path to the output label config file", exists=False)
+    ],
+    labels: Annotated[
+        list[str], typer.Option(help="List of class labels to use for the model")
+    ],
+):
+    """Create a Label Studio label config file for object detection tasks."""
+    from labelr.project_config import create_object_detection_label_config
+
+    config = create_object_detection_label_config(labels)
+    output_file.write_text(config)
+    logger.info("Label config file created: %s", output_file)
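The new `create_config_file` command delegates to `create_object_detection_label_config` in the new `labelr/project_config.py`, whose body is not shown in this diff. A hypothetical sketch of what such a generator can emit, based on Label Studio's standard `RectangleLabels` interface (both the function body and its name here are our assumption, not the package's implementation):

```python
def object_detection_label_config(labels: list[str]) -> str:
    """Hypothetical generator for a Label Studio object-detection label
    config: an Image tag bound to $image_url and one Label per class
    inside a RectangleLabels block."""
    label_tags = "\n".join(f'    <Label value="{label}"/>' for label in labels)
    return (
        "<View>\n"
        '  <Image name="image" value="$image_url"/>\n'
        '  <RectangleLabels name="label" toName="image">\n'
        f"{label_tags}\n"
        "  </RectangleLabels>\n"
        "</View>\n"
    )
```

The resulting XML is what `labelr projects create --config-file label_config.xml` would then upload when creating the project.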
@@ -3,16 +3,21 @@ import logging
 import pickle
 import random
 import tempfile
-import typing
 from pathlib import Path
 
 import datasets
 import tqdm
 from label_studio_sdk.client import LabelStudio
-from openfoodfacts.images import download_image
-from 
+from openfoodfacts.images import download_image, generate_image_url
+from openfoodfacts.types import Flavor
+from PIL import Image, ImageOps
 
-from labelr.sample import 
+from labelr.sample import (
+    HF_DS_CLASSIFICATION_FEATURES,
+    HF_DS_OBJECT_DETECTION_FEATURES,
+    format_object_detection_sample_to_hf,
+)
+from labelr.types import TaskType
 
 logger = logging.getLogger(__name__)
 
@@ -27,10 +32,15 @@ def _pickle_sample_generator(dir: Path):
 def export_from_ls_to_hf(
     ls: LabelStudio,
     repo_id: str,
-
+    label_names: list[str],
     project_id: int,
+    merge_labels: bool = False,
+    use_aws_cache: bool = True,
 ):
-
+    if merge_labels:
+        label_names = ["object"]
+
+    logger.info("Project ID: %d, label names: %s", project_id, label_names)
 
     for split in ["train", "val"]:
         logger.info("Processing split: %s", split)
@@ -45,7 +55,11 @@ def export_from_ls_to_hf(
             if task.data["split"] != split:
                 continue
             sample = format_object_detection_sample_to_hf(
-                task.data,
+                task_data=task.data,
+                annotations=task.annotations,
+                label_names=label_names,
+                merge_labels=merge_labels,
+                use_aws_cache=use_aws_cache,
             )
             if sample is not None:
                 # Save output as pickle
@@ -54,7 +68,7 @@ def export_from_ls_to_hf(
 
         hf_ds = datasets.Dataset.from_generator(
             functools.partial(_pickle_sample_generator, tmp_dir),
-            features=
+            features=HF_DS_OBJECT_DETECTION_FEATURES,
         )
         hf_ds.push_to_hub(repo_id, split=split)
 
@@ -62,10 +76,12 @@ def export_from_ls_to_hf(
 def export_from_ls_to_ultralytics(
     ls: LabelStudio,
     output_dir: Path,
-
+    label_names: list[str],
     project_id: int,
     train_ratio: float = 0.8,
     error_raise: bool = True,
+    merge_labels: bool = False,
+    use_aws_cache: bool = True,
 ):
     """Export annotations from a Label Studio project to the Ultralytics
     format.
@@ -73,7 +89,9 @@ def export_from_ls_to_ultralytics(
     The Label Studio project should be an object detection project with a
     single rectanglelabels annotation result per task.
     """
-
+    if merge_labels:
+        label_names = ["object"]
+    logger.info("Project ID: %d, label names: %s", project_id, label_names)
 
     data_dir = output_dir / "data"
     data_dir.mkdir(parents=True, exist_ok=True)
@@ -146,34 +164,37 @@ def export_from_ls_to_ultralytics(
                     y_min = value["y"] / 100
                     width = value["width"] / 100
                     height = value["height"] / 100
-
-
+                    label_name = (
+                        label_names[0] if merge_labels else value["rectanglelabels"][0]
+                    )
+                    label_id = label_names.index(label_name)
 
                     # Save the labels in the Ultralytics format:
                     # - one label per line
                     # - each line is a list of 5 elements:
-                    # - 
+                    # - label_id
                     # - x_center
                     # - y_center
                     # - width
                     # - height
                     x_center = x_min + width / 2
                     y_center = y_min + height / 2
-                    f.write(f"{
+                    f.write(f"{label_id} {x_center} {y_center} {width} {height}\n")
                     has_valid_annotation = True
 
             if has_valid_annotation:
                 download_output = download_image(
-                    image_url,
+                    image_url,
+                    return_struct=True,
+                    error_raise=error_raise,
+                    use_cache=use_aws_cache,
                 )
                 if download_output is None:
                     logger.error("Failed to download image: %s", image_url)
                     continue
 
-                _, image_bytes = typing.cast(tuple[Image.Image, bytes], download_output)
-
                 with (images_dir / split / f"{image_id}.jpg").open("wb") as f:
-                    f.write(image_bytes)
+                    f.write(download_output.image_bytes)
 
     with (output_dir / "data.yaml").open("w") as f:
         f.write("path: data\n")
@@ -181,8 +202,8 @@ def export_from_ls_to_ultralytics(
|
|
|
181
202
|
f.write("val: images/val\n")
|
|
182
203
|
f.write("test:\n")
|
|
183
204
|
f.write("names:\n")
|
|
184
|
-
for i,
|
|
185
|
-
f.write(f" {i}: {
|
|
205
|
+
for i, label_name in enumerate(label_names):
|
|
206
|
+
f.write(f" {i}: {label_name}\n")
|
|
186
207
|
|
|
187
208
|
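The label-resolution and box geometry in the hunks above are easy to sanity-check in isolation. A minimal sketch (the helper name and sample values are hypothetical; Label Studio stores rectangle coordinates as percentages of the image size):

```python
def to_yolo_line(value: dict, label_names: list[str], merge_labels: bool = False) -> str:
    # With merge_labels, every box collapses to the single first label
    # (the exporter sets label_names to ["object"] in that case).
    label_name = label_names[0] if merge_labels else value["rectanglelabels"][0]
    label_id = label_names.index(label_name)
    # Label Studio coordinates are percentages (0-100) of the image size.
    x_min, y_min = value["x"] / 100, value["y"] / 100
    width, height = value["width"] / 100, value["height"] / 100
    # Ultralytics expects the box center, not the top-left corner.
    x_center = x_min + width / 2
    y_center = y_min + height / 2
    return f"{label_id} {x_center} {y_center} {width} {height}"


line = to_yolo_line(
    {"rectanglelabels": ["price-tag"], "x": 25, "y": 50, "width": 25, "height": 25},
    ["product", "price-tag"],
)
print(line)  # 1 0.375 0.625 0.25 0.25
```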
 def export_from_hf_to_ultralytics(
@@ -190,6 +211,7 @@ def export_from_hf_to_ultralytics(
     output_dir: Path,
     download_images: bool = True,
     error_raise: bool = True,
+    use_aws_cache: bool = True,
 ):
     """Export annotations from a Hugging Face dataset project to the
     Ultralytics format.
@@ -215,14 +237,17 @@ def export_from_hf_to_ultralytics(

         if download_images:
             download_output = download_image(
-                image_url,
+                image_url,
+                return_struct=True,
+                error_raise=error_raise,
+                use_cache=use_aws_cache,
             )
             if download_output is None:
                 logger.error("Failed to download image: %s", image_url)
                 continue
-
+
             with (split_images_dir / f"{image_id}.jpg").open("wb") as f:
-                f.write(image_bytes)
+                f.write(download_output.image_bytes)
         else:
             image = sample["image"]
             image.save(split_images_dir / f"{image_id}.jpg")
@@ -268,3 +293,90 @@ def export_from_hf_to_ultralytics(
         f.write("names:\n")
         for i, category_name in enumerate(category_names):
             f.write(f" {i}: {category_name}\n")
+
+
+def export_from_ultralytics_to_hf(
+    task_type: TaskType,
+    dataset_dir: Path,
+    repo_id: str,
+    label_names: list[str],
+    merge_labels: bool = False,
+    is_openfoodfacts_dataset: bool = False,
+    openfoodfacts_flavor: Flavor = Flavor.off,
+) -> None:
+    if task_type != TaskType.classification:
+        raise NotImplementedError(
+            "Only classification task is currently supported for Ultralytics to HF export"
+        )
+
+    logger.info("Repo ID: %s, dataset_dir: %s", repo_id, dataset_dir)
+
+    if not any((dataset_dir / split).is_dir() for split in ["train", "val", "test"]):
+        raise ValueError(
+            f"Dataset directory {dataset_dir} does not contain 'train', 'val' or 'test' subdirectories"
+        )
+
+    # Save output as pickle
+    for split in ["train", "val", "test"]:
+        split_dir = dataset_dir / split
+
+        if not split_dir.is_dir():
+            logger.info("Skipping missing split directory: %s", split_dir)
+            continue
+
+        with tempfile.TemporaryDirectory() as tmp_dir_str:
+            tmp_dir = Path(tmp_dir_str)
+            for label_dir in (d for d in split_dir.iterdir() if d.is_dir()):
+                label_name = label_dir.name
+                if merge_labels:
+                    label_name = "object"
+                if label_name not in label_names:
+                    raise ValueError(
+                        "Label name %s not in provided label names (label names: %s)"
+                        % (label_name, label_names),
+                    )
+                label_id = label_names.index(label_name)
+
+                for image_path in label_dir.glob("*"):
+                    if is_openfoodfacts_dataset:
+                        image_stem_parts = image_path.stem.split("_")
+                        barcode = image_stem_parts[0]
+                        off_image_id = image_stem_parts[1]
+                        image_id = f"{barcode}_{off_image_id}"
+                        image_url = generate_image_url(
+                            barcode, off_image_id, flavor=openfoodfacts_flavor
+                        )
+                    else:
+                        image_id = image_path.stem
+                        barcode = ""
+                        off_image_id = ""
+                        image_url = ""
+                    image = Image.open(image_path)
+                    image.load()
+
+                    if image.mode != "RGB":
+                        image = image.convert("RGB")
+
+                    # Rotate image according to exif orientation using Pillow
+                    ImageOps.exif_transpose(image, in_place=True)
+                    sample = {
+                        "image_id": image_id,
+                        "image": image,
+                        "width": image.width,
+                        "height": image.height,
+                        "meta": {
+                            "barcode": barcode,
+                            "off_image_id": off_image_id,
+                            "image_url": image_url,
+                        },
+                        "category_id": label_id,
+                        "category_name": label_name,
+                    }
+                    with open(tmp_dir / f"{split}_{image_id}.pkl", "wb") as f:
+                        pickle.dump(sample, f)
+
+            hf_ds = datasets.Dataset.from_generator(
+                functools.partial(_pickle_sample_generator, tmp_dir),
+                features=HF_DS_CLASSIFICATION_FEATURES,
+            )
+            hf_ds.push_to_hub(repo_id, split=split)
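The new `export_from_ultralytics_to_hf` stages each sample as a pickle in a temporary directory before building the Hugging Face dataset from a generator. The staging round-trip can be sketched with the stdlib alone (the sample fields follow the diff; the image payload and barcode value are illustrative):

```python
import pickle
import tempfile
from pathlib import Path

sample = {
    "image_id": "3017620422003_1",  # hypothetical "{barcode}_{off_image_id}" stem
    "meta": {"barcode": "3017620422003", "off_image_id": "1", "image_url": ""},
    "category_id": 0,
    "category_name": "object",
}

with tempfile.TemporaryDirectory() as tmp_dir_str:
    tmp_dir = Path(tmp_dir_str)
    # One pickle per image, named "{split}_{image_id}.pkl" as in the exporter.
    path = tmp_dir / "train_3017620422003_1.pkl"
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == sample)  # True
```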
@@ -0,0 +1,45 @@
+COLORS = [
+    "blue",
+    "green",
+    "yellow",
+    "red",
+    "purple",
+    "orange",
+    "pink",
+    "brown",
+    "gray",
+    "black",
+    "white",
+]
+
+
+def create_object_detection_label_config(labels_names: list[str]) -> str:
+    """Create a Label Studio label configuration for object detection tasks.
+
+    The format is the following:
+    ```xml
+    <View>
+      <Image name="image" value="$image_url"/>
+      <RectangleLabels name="label" toName="image">
+        <Label value="nutrition-table" background="green"/>
+        <Label value="nutrition-table-small" background="blue"/>
+        <Label value="nutrition-table-small-energy" background="yellow"/>
+        <Label value="nutrition-table-text" background="red"/>
+      </RectangleLabels>
+    </View>
+    ```
+    """
+    if len(labels_names) > len(COLORS):
+        raise ValueError(
+            f"Too many labels ({len(labels_names)}) for the available colors ({len(COLORS)})."
+        )
+    labels_xml = "\n".join(
+        f'    <Label value="{label}" background="{color}"/>'
+        for label, color in zip(labels_names, COLORS[: len(labels_names)])
+    )
+    return f"""<View>
+  <Image name="image" value="$image_url"/>
+  <RectangleLabels name="label" toName="image">
+{labels_xml}
+  </RectangleLabels>
+</View>"""
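The generator in the new `project_config.py` can be exercised directly. The snippet below copies the function body from the diff above into a self-contained script (the label names passed in are illustrative):

```python
# Copied from the new src/labelr/project_config.py shown in the diff above.
COLORS = [
    "blue", "green", "yellow", "red", "purple", "orange",
    "pink", "brown", "gray", "black", "white",
]


def create_object_detection_label_config(labels_names: list[str]) -> str:
    if len(labels_names) > len(COLORS):
        raise ValueError(
            f"Too many labels ({len(labels_names)}) for the available colors ({len(COLORS)})."
        )
    # Each label is paired with the color at the same index.
    labels_xml = "\n".join(
        f'    <Label value="{label}" background="{color}"/>'
        for label, color in zip(labels_names, COLORS[: len(labels_names)])
    )
    return f"""<View>
  <Image name="image" value="$image_url"/>
  <RectangleLabels name="label" toName="image">
{labels_xml}
  </RectangleLabels>
</View>"""


config = create_object_detection_label_config(["product", "price-tag"])
print(config)
```

`product` gets `blue` and `price-tag` gets `green`, since colors are assigned positionally.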
@@ -3,7 +3,9 @@ import random
 import string

 import datasets
-from openfoodfacts
+from openfoodfacts import Flavor
+from openfoodfacts.barcode import normalize_barcode
+from openfoodfacts.images import download_image, generate_image_url

 logger = logging.getLogger(__name__)

@@ -62,17 +64,49 @@ def format_object_detection_sample_from_hf(hf_sample: dict, split: str) -> dict:
     annotation_results = format_annotation_results_from_hf(
         objects, image_width, image_height
     )
+    image_id = hf_sample["image_id"]
+    image_url = hf_meta["image_url"]
+    meta_kwargs = {}
+
+    if "off_image_id" in hf_meta:
+        # If `off_image_id` is present, we assume this is an Open Food Facts
+        # dataset sample.
+        # We normalize the barcode, and generate a new image URL
+        # to make sure that:
+        # - the image URL is valid with correct path
+        # - we use the images subdomain everywhere
+        off_image_id = hf_meta["off_image_id"]
+        meta_kwargs["off_image_id"] = off_image_id
+        barcode = normalize_barcode(hf_meta["barcode"])
+        meta_kwargs["barcode"] = barcode
+        image_id = f"{barcode}_{off_image_id}"
+
+        if ".openfoodfacts." in image_url:
+            flavor = Flavor.off
+        elif ".openbeautyfacts." in image_url:
+            flavor = Flavor.obf
+        elif ".openpetfoodfacts." in image_url:
+            flavor = Flavor.opf
+        elif ".openproductsfacts." in image_url:
+            flavor = Flavor.opf
+        else:
+            raise ValueError(
+                f"Unknown Open Food Facts flavor for image URL: {image_url}"
+            )
+        image_url = generate_image_url(
+            code=barcode, image_id=off_image_id, flavor=flavor
+        )
+
     return {
         "data": {
-            "image_id":
-            "image_url":
+            "image_id": image_id,
+            "image_url": image_url,
             "batch": "null",
             "split": split,
             "meta": {
                 "width": image_width,
                 "height": image_height,
-
-                "off_image_id": hf_meta["off_image_id"],
+                **meta_kwargs,
             },
         },
         "predictions": [{"result": annotation_results}],
@@ -111,7 +145,11 @@ def format_object_detection_sample_to_ls(


 def format_object_detection_sample_to_hf(
-    task_data: dict,
+    task_data: dict,
+    annotations: list[dict],
+    label_names: list[str],
+    merge_labels: bool = False,
+    use_aws_cache: bool = True,
 ) -> dict | None:
     if len(annotations) > 1:
         logger.info("More than one annotation found, skipping")
@@ -122,8 +160,8 @@ def format_object_detection_sample_to_hf(

     annotation = annotations[0]
     bboxes = []
-
-
+    bbox_label_ids = []
+    bbox_label_names = []

     for annotation_result in annotation["result"]:
         if annotation_result["type"] != "rectanglelabels":
@@ -137,12 +175,13 @@ def format_object_detection_sample_to_hf(
         x_max = x_min + width
         y_max = y_min + height
         bboxes.append([y_min, x_min, y_max, x_max])
-
-
-
+
+        label_name = label_names[0] if merge_labels else value["rectanglelabels"][0]
+        bbox_label_names.append(label_name)
+        bbox_label_ids.append(label_names.index(label_name))

     image_url = task_data["image_url"]
-    image = download_image(image_url, error_raise=False)
+    image = download_image(image_url, error_raise=False, use_cache=use_aws_cache)
     if image is None:
         logger.error("Failed to download image: %s", image_url)
         return None
@@ -159,14 +198,14 @@ def format_object_detection_sample_to_hf(
         },
         "objects": {
             "bbox": bboxes,
-            "category_id":
-            "category_name":
+            "category_id": bbox_label_ids,
+            "category_name": bbox_label_names,
         },
     }


 # The HuggingFace Dataset features
-
+HF_DS_OBJECT_DETECTION_FEATURES = datasets.Features(
     {
         "image_id": datasets.Value("string"),
         "image": datasets.features.Image(),
@@ -184,3 +223,20 @@ HF_DS_FEATURES = datasets.Features(
         },
     }
 )
+
+
+HF_DS_CLASSIFICATION_FEATURES = datasets.Features(
+    {
+        "image_id": datasets.Value("string"),
+        "image": datasets.features.Image(),
+        "width": datasets.Value("int64"),
+        "height": datasets.Value("int64"),
+        "meta": {
+            "barcode": datasets.Value("string"),
+            "off_image_id": datasets.Value("string"),
+            "image_url": datasets.Value("string"),
+        },
+        "category_id": datasets.Value("int64"),
+        "category_name": datasets.Value("string"),
+    }
+)
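In `format_object_detection_sample_to_hf` above, Label Studio rectangle percentages become a normalized `[y_min, x_min, y_max, x_max]` box for the Hugging Face dataset. A standalone sketch of that conversion (the helper name is hypothetical):

```python
def ls_rect_to_hf_bbox(value: dict) -> list[float]:
    # Label Studio stores x, y, width, height as percentages (0-100)
    # of the image dimensions.
    x_min = value["x"] / 100
    y_min = value["y"] / 100
    x_max = x_min + value["width"] / 100
    y_max = y_min + value["height"] / 100
    # The dataset stores boxes as [y_min, x_min, y_max, x_max], normalized to 0-1.
    return [y_min, x_min, y_max, x_max]


print(ls_rect_to_hf_bbox({"x": 25, "y": 50, "width": 25, "height": 25}))
# [0.5, 0.25, 0.75, 0.5]
```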
@@ -1,3 +1,20 @@
+Metadata-Version: 2.4
+Name: labelr
+Version: 0.4.0
+Summary: A command-line tool to manage labeling tasks with Label Studio.
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: datasets>=3.2.0
+Requires-Dist: imagehash>=4.3.1
+Requires-Dist: label-studio-sdk>=1.0.8
+Requires-Dist: more-itertools>=10.5.0
+Requires-Dist: openfoodfacts>=2.9.0
+Requires-Dist: typer>=0.15.1
+Provides-Extra: ultralytics
+Requires-Dist: ultralytics>=8.3.49; extra == "ultralytics"
+Dynamic: license-file
+
 # Labelr

 Labelr is a command-line interface that provides a set of tools to help data scientists and machine learning engineers deal with ML data annotation, data preprocessing, and format conversion.
@@ -47,7 +64,17 @@ For all the commands that interact with Label Studio, you need to provide an API

 #### Create a project

-Once you have a Label Studio instance running, you can create a project
+Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
+
+For an object detection task, a command allows you to create the configuration file automatically:
+
+```bash
+labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+```
+
+where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
+
+Then, you can create a project on Label Studio with the following command:

 ```bash
 labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
@@ -110,4 +137,4 @@ To export the data to a Hugging Face dataset, use the following command:
 labelr datasets export --project-id PROJECT_ID --from ls --to huggingface --repo-id REPO_ID --label-names 'product,price-tag'
 ```

-where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
\ No newline at end of file
+where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
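For reference, the Ultralytics `data.yaml` that the export commands write can be reconstructed from the `f.write` calls in the export.py hunks above. A sketch (the `train:` line is assumed to mirror `val:`, since it is not visible in this chunk of the diff, and the label names are illustrative):

```python
label_names = ["product", "price-tag"]  # illustrative

lines = [
    "path: data",
    "train: images/train",  # assumed; not shown in this chunk of the diff
    "val: images/val",
    "test:",
    "names:",
]
# One "index: name" entry per label, as in the exporter's names loop.
lines += [f"  {i}: {name}" for i, name in enumerate(label_names)]
data_yaml = "\n".join(lines) + "\n"
print(data_yaml)
```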
@@ -2,13 +2,8 @@ datasets>=3.2.0
 imagehash>=4.3.1
 label-studio-sdk>=1.0.8
 more-itertools>=10.5.0
-openfoodfacts>=2.
-protobuf>=5.29.1
+openfoodfacts>=2.9.0
 typer>=0.15.1

-[triton]
-tritonclient>=2.52.0
-openfoodfacts[ml]>=2.3.4
-
 [ultralytics]
 ultralytics>=8.3.49