labelr 0.7.0__tar.gz → 0.8.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. {labelr-0.7.0/src/labelr.egg-info → labelr-0.8.0}/PKG-INFO +8 -6
  2. {labelr-0.7.0 → labelr-0.8.0}/README.md +7 -5
  3. {labelr-0.7.0 → labelr-0.8.0}/pyproject.toml +1 -1
  4. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/apps/datasets.py +12 -25
  5. labelr-0.8.0/src/labelr/apps/evaluate.py +41 -0
  6. labelr-0.8.0/src/labelr/apps/hugging_face.py +57 -0
  7. labelr-0.7.0/src/labelr/apps/projects.py → labelr-0.8.0/src/labelr/apps/label_studio.py +65 -9
  8. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/apps/train.py +22 -4
  9. labelr-0.8.0/src/labelr/evaluate/__init__.py +0 -0
  10. labelr-0.8.0/src/labelr/evaluate/llm.py +0 -0
  11. labelr-0.8.0/src/labelr/evaluate/object_detection.py +100 -0
  12. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/export.py +1 -7
  13. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/main.py +17 -8
  14. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/sample.py +30 -4
  15. {labelr-0.7.0 → labelr-0.8.0/src/labelr.egg-info}/PKG-INFO +8 -6
  16. {labelr-0.7.0 → labelr-0.8.0}/src/labelr.egg-info/SOURCES.txt +6 -2
  17. labelr-0.7.0/src/labelr/apps/users.py +0 -36
  18. {labelr-0.7.0 → labelr-0.8.0}/LICENSE +0 -0
  19. {labelr-0.7.0 → labelr-0.8.0}/setup.cfg +0 -0
  20. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/__init__.py +0 -0
  21. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/__main__.py +0 -0
  22. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/annotate.py +0 -0
  23. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/apps/__init__.py +0 -0
  24. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/check.py +0 -0
  25. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/config.py +0 -0
  26. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/dataset_features.py +0 -0
  27. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/project_config.py +0 -0
  28. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/types.py +0 -0
  29. {labelr-0.7.0 → labelr-0.8.0}/src/labelr/utils.py +0 -0
  30. {labelr-0.7.0 → labelr-0.8.0}/src/labelr.egg-info/dependency_links.txt +0 -0
  31. {labelr-0.7.0 → labelr-0.8.0}/src/labelr.egg-info/entry_points.txt +0 -0
  32. {labelr-0.7.0 → labelr-0.8.0}/src/labelr.egg-info/requires.txt +0 -0
  33. {labelr-0.7.0 → labelr-0.8.0}/src/labelr.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: labelr
- Version: 0.7.0
+ Version: 0.8.0
  Summary: A command-line tool to manage labeling tasks with Label Studio.
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
@@ -73,7 +73,7 @@ Once you have a Label Studio instance running, you can create a project easily.
  For an object detection task, a command allows you to create the configuration file automatically:

  ```bash
- labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+ labelr ls create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
  ```

  where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
@@ -81,17 +81,19 @@ where `label1` and `label2` are the labels you want to use for the object detect
  Then, you can create a project on Label Studio with the following command:

  ```bash
- labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
+ labelr ls create --title my_project --api-key API_KEY --config-file label_config.xml
  ```

  where `API_KEY` is the API key of the Label Studio instance (API key is available at Account page), and `label_config.xml` is the configuration file of the project.

+ `ls` stands for Label Studio in the CLI.
+
  #### Create a dataset file

  If you have a list of images, for an object detection task, you can quickly create a dataset file with the following command:

  ```bash
- labelr projects create-dataset-file --input-file image_urls.txt --output-file dataset.json
+ labelr ls create-dataset-file --input-file image_urls.txt --output-file dataset.json
  ```

  where `image_urls.txt` is a file containing the URLs of the images, one per line, and `dataset.json` is the output file.
@@ -101,7 +103,7 @@ where `image_urls.txt` is a file containing the URLs of the images, one per line
  Next, import the generated data to a project with the following command:

  ```bash
- labelr projects import-data --project-id PROJECT_ID --dataset-path dataset.json
+ labelr ls import-data --project-id PROJECT_ID --dataset-path dataset.json
  ```

  where `PROJECT_ID` is the ID of the project you created.
@@ -117,7 +119,7 @@ To accelerate annotation, you can pre-annotate the images with an object detecti
  To pre-annotate the data with Triton, use the following command:

  ```bash
- labelr projects add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
+ labelr ls add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
  ```

  where `labels` is the list of labels to use for the object detection task (you can add as many labels as you want).
@@ -52,7 +52,7 @@ Once you have a Label Studio instance running, you can create a project easily.
  For an object detection task, a command allows you to create the configuration file automatically:

  ```bash
- labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+ labelr ls create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
  ```

  where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
@@ -60,17 +60,19 @@ where `label1` and `label2` are the labels you want to use for the object detect
  Then, you can create a project on Label Studio with the following command:

  ```bash
- labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
+ labelr ls create --title my_project --api-key API_KEY --config-file label_config.xml
  ```

  where `API_KEY` is the API key of the Label Studio instance (API key is available at Account page), and `label_config.xml` is the configuration file of the project.

+ `ls` stands for Label Studio in the CLI.
+
  #### Create a dataset file

  If you have a list of images, for an object detection task, you can quickly create a dataset file with the following command:

  ```bash
- labelr projects create-dataset-file --input-file image_urls.txt --output-file dataset.json
+ labelr ls create-dataset-file --input-file image_urls.txt --output-file dataset.json
  ```

  where `image_urls.txt` is a file containing the URLs of the images, one per line, and `dataset.json` is the output file.
@@ -80,7 +82,7 @@ where `image_urls.txt` is a file containing the URLs of the images, one per line
  Next, import the generated data to a project with the following command:

  ```bash
- labelr projects import-data --project-id PROJECT_ID --dataset-path dataset.json
+ labelr ls import-data --project-id PROJECT_ID --dataset-path dataset.json
  ```

  where `PROJECT_ID` is the ID of the project you created.
@@ -96,7 +98,7 @@ To accelerate annotation, you can pre-annotate the images with an object detecti
  To pre-annotate the data with Triton, use the following command:

  ```bash
- labelr projects add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
+ labelr ls add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
  ```

  where `labels` is the list of labels to use for the object detection task (you can add as many labels as you want).
@@ -1,6 +1,6 @@
  [project]
  name = "labelr"
- version = "0.7.0"
+ version = "0.8.0"
  description = "A command-line tool to manage labeling tasks with Label Studio."
  readme = "README.md"
  requires-python = ">=3.10"
@@ -1,3 +1,6 @@
+ """Commands to manage datasets local datasets and export between platforms
+ (Label Studio, HuggingFace Hub, local dataset,...)."""
+
  import json
  import random
  import shutil
@@ -21,45 +24,29 @@ logger = get_logger(__name__)

  @app.command()
  def check(
- api_key: Annotated[
- Optional[str], typer.Option(envvar="LABEL_STUDIO_API_KEY")
- ] = None,
- project_id: Annotated[
- Optional[int], typer.Option(help="Label Studio Project ID")
- ] = None,
- label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
  dataset_dir: Annotated[
- Optional[Path],
+ Path,
  typer.Option(
  help="Path to the dataset directory", exists=True, file_okay=False
  ),
- ] = None,
+ ],
  remove: Annotated[
  bool,
- typer.Option(
- help="Remove duplicate images from the dataset, only for local datasets"
- ),
+ typer.Option(help="Remove duplicate images from the dataset"),
  ] = False,
  ):
- """Check a dataset for duplicate images."""
- from label_studio_sdk.client import LabelStudio
+ """Check a local dataset in Ultralytics format for duplicate images."""

- from ..check import check_local_dataset, check_ls_dataset
+ from ..check import check_local_dataset

- if project_id is not None:
- ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
- check_ls_dataset(ls, project_id)
- elif dataset_dir is not None:
- check_local_dataset(dataset_dir, remove=remove)
- else:
- raise typer.BadParameter("Either project ID or dataset directory is required")
+ check_local_dataset(dataset_dir, remove=remove)


  @app.command()
  def split_train_test(
  task_type: TaskType, dataset_dir: Path, output_dir: Path, train_ratio: float = 0.8
  ):
- """Split a dataset into training and test sets.
+ """Split a local dataset into training and test sets.

  Only classification tasks are supported.
  """
@@ -112,7 +99,7 @@ def convert_object_detection_dataset(
  Studio format, and save it to a JSON file."""
  from datasets import load_dataset

- from labelr.sample import format_object_detection_sample_from_hf
+ from labelr.sample import format_object_detection_sample_from_hf_to_ls

  logger.info("Loading dataset: %s", repo_id)
  ds = load_dataset(repo_id)
@@ -122,7 +109,7 @@ def convert_object_detection_dataset(
  for split in ds.keys():
  logger.info("Processing split: %s", split)
  for sample in ds[split]:
- label_studio_sample = format_object_detection_sample_from_hf(
+ label_studio_sample = format_object_detection_sample_from_hf_to_ls(
  sample, split=split
  )
  f.write(json.dumps(label_studio_sample) + "\n")
@@ -0,0 +1,41 @@
+ from typing import Annotated
+
+ import typer
+
+ app = typer.Typer()
+
+
+ @app.command()
+ def visualize_object_detection(
+ hf_repo_id: Annotated[
+ str,
+ typer.Option(
+ ...,
+ help="Hugging Face repository ID of the trained model. "
+ "A `predictions.parquet` file is expected in the repo. Revision can be specified "
+ "by appending `@<revision>` to the repo ID.",
+ ),
+ ],
+ dataset_name: Annotated[
+ str | None, typer.Option(..., help="Name of the FiftyOne dataset to create.")
+ ] = None,
+ persistent: Annotated[
+ bool,
+ typer.Option(
+ ...,
+ help="Whether to make the FiftyOne dataset persistent (i.e., saved to disk).",
+ ),
+ ] = False,
+ ):
+ """Visualize object detection model predictions stored in a Hugging Face
+ repository using FiftyOne."""
+ from labelr.evaluate import object_detection
+
+ if dataset_name is None:
+ dataset_name = hf_repo_id.replace("/", "-").replace("@", "-")
+
+ object_detection.visualize(
+ hf_repo_id=hf_repo_id,
+ dataset_name=dataset_name,
+ persistent=persistent,
+ )
@@ -0,0 +1,57 @@
+ from pathlib import Path
+ from typing import Annotated
+
+ import typer
+
+ app = typer.Typer()
+
+
+ @app.command()
+ def show_hf_sample(
+ repo_id: Annotated[
+ str,
+ typer.Argument(
+ ...,
+ help="Hugging Face Datasets repo ID. The revision can be specified by "
+ "appending `@<revision>` to the repo ID.",
+ ),
+ ],
+ image_id: Annotated[
+ str,
+ typer.Argument(
+ ...,
+ help="ID of the image associated with the sample to display (field: `image_id`)",
+ ),
+ ],
+ output_image_path: Annotated[
+ Path | None,
+ typer.Option(help="Path to save the sample image (optional)", exists=False),
+ ] = None,
+ ):
+ """Display a sample from a Hugging Face Datasets repository by image ID."""
+ from labelr.utils import parse_hf_repo_id
+
+ repo_id, revision = parse_hf_repo_id(repo_id)
+
+ from datasets import load_dataset
+
+ ds = load_dataset(repo_id, revision=revision)
+
+ sample = None
+ for split in ds.keys():
+ samples = ds[split].filter(lambda x: x == image_id, input_columns="image_id")
+ if len(samples) > 0:
+ sample = samples[0]
+ break
+ if sample is None:
+ typer.echo(f"Sample with image ID {image_id} not found in dataset {repo_id}")
+ raise typer.Exit(code=1)
+
+ else:
+ for key, value in sample.items():
+ typer.echo(f"{key}: {value}")
+
+ if output_image_path is not None:
+ image = sample["image"]
+ image.save(output_image_path)
+ typer.echo(f"Image saved to {output_image_path}")
@@ -6,12 +6,7 @@ from typing import Annotated, Optional

  import typer
  from openfoodfacts.utils import get_logger
- from PIL import Image

- from ..annotate import (
- format_annotation_results_from_robotoff,
- format_annotation_results_from_ultralytics,
- )
  from ..config import LABEL_STUDIO_DEFAULT_URL

  app = typer.Typer()
@@ -43,14 +38,20 @@ def import_data(
  api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
  project_id: Annotated[int, typer.Option(help="Label Studio Project ID")],
  dataset_path: Annotated[
- Path, typer.Option(help="Path to the Label Studio dataset file", file_okay=True)
+ Path,
+ typer.Option(
+ help="Path to the Label Studio dataset JSONL file", file_okay=True
+ ),
  ],
  label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
  batch_size: int = 25,
  ):
  """Import tasks from a dataset file to a Label Studio project.

- The dataset file should contain one JSON object per line."""
+ The dataset file must be a JSONL file: it should contain one JSON object
+ per line. To generate such a file, you can use the `create-dataset-file`
+ command.
+ """
  import more_itertools
  import tqdm
  from label_studio_sdk.client import LabelStudio
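The JSONL convention the new docstring refers to (one JSON object per line) can be produced and consumed with the standard library alone. A minimal sketch, with made-up task records:

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list[dict]) -> None:
    # One JSON object per line, as expected by `import-data`
    path.write_text("".join(json.dumps(record) + "\n" for record in records))


def read_jsonl(path: Path) -> list[dict]:
    return [json.loads(line) for line in path.read_text().splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    dataset_path = Path(tmp) / "dataset.jsonl"
    tasks = [
        {"data": {"image_url": "https://example.com/1.jpg"}},
        {"data": {"image_url": "https://example.com/2.jpg"}},
    ]
    write_jsonl(dataset_path, tasks)
    assert read_jsonl(dataset_path) == tasks
```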
@@ -279,6 +280,12 @@ def add_prediction(
  import tqdm
  from label_studio_sdk.client import LabelStudio
  from openfoodfacts.utils import get_image_from_url, http_session
+ from PIL import Image
+
+ from ..annotate import (
+ format_annotation_results_from_robotoff,
+ format_annotation_results_from_ultralytics,
+ )

  label_mapping_dict = None
  if label_mapping:
@@ -375,11 +382,16 @@ def create_dataset_file(
  typer.Option(help="Path to a list of image URLs", exists=True),
  ],
  output_file: Annotated[
- Path, typer.Option(help="Path to the output JSON file", exists=False)
+ Path, typer.Option(help="Path to the output JSONL file", exists=False)
  ],
  ):
  """Create a Label Studio object detection dataset file from a list of
- image URLs."""
+ image URLs.
+
+ The output file is a JSONL file. It cannot be imported directly in Label
+ Studio (which requires a JSON file as input), the `import-data` command
+ should be used to import the generated dataset file.
+ """
  from urllib.parse import urlparse

  import tqdm
@@ -432,3 +444,47 @@ def create_config_file(
  config = create_object_detection_label_config(labels)
  output_file.write_text(config)
  logger.info("Label config file created: %s", output_file)
+
+
+ @app.command()
+ def check_dataset(
+ project_id: Annotated[int, typer.Option(help="Label Studio Project ID")],
+ api_key: Annotated[
+ Optional[str], typer.Option(envvar="LABEL_STUDIO_API_KEY")
+ ] = None,
+ label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
+ ):
+ """Check a dataset for duplicate images on Label Studio."""
+ from label_studio_sdk.client import LabelStudio
+
+ from ..check import check_ls_dataset
+
+ ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
+ check_ls_dataset(ls, project_id)
+
+
+ @app.command()
+ def list_users(
+ api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
+ label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
+ ):
+ """List all users in Label Studio."""
+ from label_studio_sdk.client import LabelStudio
+
+ ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
+
+ for user in ls.users.list():
+ print(f"{user.id:02d}: {user.email}")
+
+
+ @app.command()
+ def delete_user(
+ user_id: int,
+ api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
+ label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
+ ):
+ """Delete a user from Label Studio."""
+ from label_studio_sdk.client import LabelStudio
+
+ ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
+ ls.users.delete(user_id)
@@ -1,7 +1,6 @@
  import datetime

  import typer
- from google.cloud import batch_v1

  app = typer.Typer()

@@ -28,6 +27,11 @@ AVAILABLE_OBJECT_DETECTION_MODELS = [
  "yolo11m.pt",
  "yolo11l.pt",
  "yolo11x.pt",
+ "yolo12n.pt",
+ "yolo12s.pt",
+ "yolo12m.pt",
+ "yolo12l.pt",
+ "yolo12x.pt",
  ]


@@ -42,6 +46,9 @@ def train_object_detection(
  help="The Hugging Face token, used to push the trained model to Hugging Face Hub.",
  ),
  run_name: str = typer.Option(..., help="A name for the training run."),
+ add_date_to_run_name: bool = typer.Option(
+ True, help="Whether to append the date to the run name."
+ ),
  hf_repo_id: str = typer.Option(
  ..., help="The Hugging Face dataset repository ID to use to train."
  ),
@@ -64,6 +71,11 @@ def train_object_detection(
  f"Invalid model name '{model_name}'. Available models are: {', '.join(AVAILABLE_OBJECT_DETECTION_MODELS)}"
  )

+ datestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
+
+ if add_date_to_run_name:
+ run_name = f"{run_name}-{datestamp}"
+
  env_variables = {
  "HF_REPO_ID": hf_repo_id,
  "HF_TRAINED_MODEL_REPO_ID": hf_trained_model_repo_id,
@@ -77,8 +89,12 @@ def train_object_detection(
  "USE_AWS_IMAGE_CACHE": "False",
  "YOLO_MODEL_NAME": model_name,
  }
- job_name = "train-yolo-job"
- job_name = job_name + "-" + datetime.datetime.now().strftime("%Y%m%d%H%M%S")
+
+ job_name = f"train-yolo-job-{run_name}"
+ if not add_date_to_run_name:
+ # Ensure job name is unique by adding a datestamp if date is not added to run name
+ job_name = f"{job_name}-{datestamp}"
+
  job = launch_job(
  job_name=job_name,
  container_image_uri="europe-west9-docker.pkg.dev/robotoff/gcf-artifacts/train-yolo",
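The naming change in the two hunks above can be summarized as: the run name carries the datestamp by default, and the datestamp is appended to the job name only when it is not already part of the run name, so the GCP Batch job name stays unique either way. A pure-function sketch of that logic (the helper name is illustrative):

```python
def build_names(
    run_name: str, add_date_to_run_name: bool, datestamp: str
) -> tuple[str, str]:
    # Mirrors the run/job naming logic in train_object_detection
    if add_date_to_run_name:
        run_name = f"{run_name}-{datestamp}"
    job_name = f"train-yolo-job-{run_name}"
    if not add_date_to_run_name:
        # Keep job names unique even when the run name is stable
        job_name = f"{job_name}-{datestamp}"
    return run_name, job_name


print(build_names("price-tags", True, "20240101-120000"))
# ('price-tags-20240101-120000', 'train-yolo-job-price-tags-20240101-120000')
```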
@@ -112,7 +128,7 @@ def launch_job(
  accelerators_count: int = 1,
  region: str = "europe-west4",
  install_gpu_drivers: bool = True,
- ) -> batch_v1.Job:
+ ):
  """This method creates a Batch Job on GCP.

  Sources:
@@ -126,6 +142,8 @@ def launch_job(
  Returns:
  Batch job information.
  """
+ from google.cloud import batch_v1
+
  client = batch_v1.BatchServiceClient()

  # Define what will be done as part of the job.
File without changes
File without changes
@@ -0,0 +1,100 @@
+ import tempfile
+ from pathlib import Path
+
+ import datasets
+ import fiftyone as fo
+ from huggingface_hub import hf_hub_download
+
+ from labelr.dataset_features import OBJECT_DETECTION_DS_PREDICTION_FEATURES
+ from labelr.utils import parse_hf_repo_id
+
+
+ def convert_bbox_to_fo_format(
+ bbox: tuple[float, float, float, float],
+ ) -> tuple[float, float, float, float]:
+ # Bounding box coordinates should be relative values
+ # in [0, 1] in the following format:
+ # [top-left-x, top-left-y, width, height]
+ y_min, x_min, y_max, x_max = bbox
+ return (
+ x_min,
+ y_min,
+ (x_max - x_min),
+ (y_max - y_min),
+ )
+
+
+ def visualize(
+ hf_repo_id: str,
+ dataset_name: str,
+ persistent: bool,
+ ):
+ hf_repo_id, hf_revision = parse_hf_repo_id(hf_repo_id)
+
+ file_path = hf_hub_download(
+ hf_repo_id,
+ filename="predictions.parquet",
+ revision=hf_revision,
+ repo_type="model",
+ # local_dir="./predictions/",
+ )
+ file_path = Path(file_path).absolute()
+ prediction_dataset = datasets.load_dataset(
+ "parquet",
+ data_files=str(file_path),
+ split="train",
+ features=OBJECT_DETECTION_DS_PREDICTION_FEATURES,
+ )
+ fo_dataset = fo.Dataset(name=dataset_name, persistent=persistent)
+
+ with tempfile.TemporaryDirectory() as tmpdir_str:
+ tmp_dir = Path(tmpdir_str)
+ for i, hf_sample in enumerate(prediction_dataset):
+ image = hf_sample["image"]
+ image_path = tmp_dir / f"{i}.jpg"
+ image.save(image_path)
+ split = hf_sample["split"]
+ sample = fo.Sample(
+ filepath=image_path,
+ split=split,
+ tags=[split],
+ image=hf_sample["image_id"],
+ )
+ ground_truth_detections = [
+ fo.Detection(
+ label=hf_sample["objects"]["category_name"][i],
+ bounding_box=convert_bbox_to_fo_format(
+ bbox=hf_sample["objects"]["bbox"][i],
+ ),
+ )
+ for i in range(len(hf_sample["objects"]["bbox"]))
+ ]
+ sample["ground_truth"] = fo.Detections(detections=ground_truth_detections)
+
+ if hf_sample["detected"] is not None and hf_sample["detected"]["bbox"]:
+ model_detections = [
+ fo.Detection(
+ label=hf_sample["detected"]["category_name"][i],
+ bounding_box=convert_bbox_to_fo_format(
+ bbox=hf_sample["detected"]["bbox"][i]
+ ),
+ confidence=hf_sample["detected"]["confidence"][i],
+ )
+ for i in range(len(hf_sample["detected"]["bbox"]))
+ ]
+ sample["model"] = fo.Detections(detections=model_detections)
+
+ fo_dataset.add_sample(sample)
+
+ # View summary info about the dataset
+ print(fo_dataset)
+
+ # Print the first few samples in the dataset
+ print(fo_dataset.head())
+
+ # Visualize the dataset in the FiftyOne App
+ session = fo.launch_app(fo_dataset)
+ fo_dataset.evaluate_detections(
+ "model", gt_field="ground_truth", eval_key="eval", compute_mAP=True
+ )
+ session.wait()
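The `convert_bbox_to_fo_format` helper above takes boxes in `(y_min, x_min, y_max, x_max)` order with relative coordinates and returns FiftyOne's `[top-left-x, top-left-y, width, height]` format. A quick, dependency-free check of the arithmetic (standalone copy of the function for illustration):

```python
def convert_bbox_to_fo_format(
    bbox: tuple[float, float, float, float],
) -> tuple[float, float, float, float]:
    # (y_min, x_min, y_max, x_max) -> (x, y, width, height), all relative [0, 1]
    y_min, x_min, y_max, x_max = bbox
    return (x_min, y_min, x_max - x_min, y_max - y_min)


print(convert_bbox_to_fo_format((0.25, 0.125, 0.75, 0.625)))
# (0.125, 0.25, 0.5, 0.5)
```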
@@ -77,13 +77,7 @@ def export_from_ls_to_hf_object_detection(
  functools.partial(_pickle_sample_generator, tmp_dir),
  features=HF_DS_OBJECT_DETECTION_FEATURES,
  )
- hf_ds.push_to_hub(
- repo_id,
- split=split,
- revision=revision,
- # Create a PR if not pushing to main branch
- create_pr=revision != "main",
- )
+ hf_ds.push_to_hub(repo_id, split=split, revision=revision)


  def export_from_ls_to_ultralytics_object_detection(
@@ -4,9 +4,10 @@ import typer
  from openfoodfacts.utils import get_logger

  from labelr.apps import datasets as dataset_app
- from labelr.apps import projects as project_app
+ from labelr.apps import evaluate as evaluate_app
+ from labelr.apps import hugging_face as hf_app
+ from labelr.apps import label_studio as ls_app
  from labelr.apps import train as train_app
- from labelr.apps import users as user_app

  app = typer.Typer(pretty_exceptions_show_locals=False)

@@ -58,22 +59,30 @@ def predict(
  typer.echo(result)


- app.add_typer(user_app.app, name="users", help="Manage Label Studio users")
  app.add_typer(
- project_app.app,
- name="projects",
- help="Manage Label Studio projects (create, import data, etc.)",
+ ls_app.app,
+ name="ls",
+ help="Manage Label Studio projects (create, import data, etc.).",
+ )
+ app.add_typer(
+ hf_app.app,
+ name="hf",
+ help="Manage Hugging Face Datasets repositories.",
  )
  app.add_typer(
  dataset_app.app,
  name="datasets",
  help="Manage datasets (convert, export, check, etc.)",
  )
-
  app.add_typer(
  train_app.app,
  name="train",
- help="Train models",
+ help="Train models.",
+ )
+ app.add_typer(
+ evaluate_app.app,
+ name="evaluate",
+ help="Visualize and evaluate trained models.",
  )

  if __name__ == "__main__":
@@ -1,16 +1,19 @@
  import logging
  import random
  import string
+ import typing

  import datasets
+ import PIL
  from openfoodfacts import Flavor
  from openfoodfacts.barcode import normalize_barcode
  from openfoodfacts.images import download_image, generate_image_url
+ from PIL import ImageOps

  logger = logging.getLogger(__name__)


- def format_annotation_results_from_hf(
+ def format_annotation_results_from_hf_to_ls(
  objects: dict, image_width: int, image_height: int
  ):
  """Format annotation results from a HF object detection dataset into Label
@@ -56,12 +59,12 @@ def format_annotation_results_from_hf(
  return annotation_results


- def format_object_detection_sample_from_hf(hf_sample: dict, split: str) -> dict:
+ def format_object_detection_sample_from_hf_to_ls(hf_sample: dict, split: str) -> dict:
  hf_meta = hf_sample["meta"]
  objects = hf_sample["objects"]
  image_width = hf_sample["width"]
  image_height = hf_sample["height"]
- annotation_results = format_annotation_results_from_hf(
+ annotation_results = format_annotation_results_from_hf_to_ls(
  objects, image_width, image_height
  )
  image_id = hf_sample["image_id"]
@@ -149,8 +152,24 @@ def format_object_detection_sample_to_hf(
  annotations: list[dict],
  label_names: list[str],
  merge_labels: bool = False,
- use_aws_cache: bool = True,
+ use_aws_cache: bool = False,
  ) -> dict | None:
+ """Format a Label Studio object detection sample to Hugging Face format.
+
+ Args:
+ task_data: The task data from Label Studio.
+ annotations: The annotations from Label Studio.
+ label_names: The list of label names.
+ merge_labels: Whether to merge all labels into a single label (the
+ first label in `label_names`).
+ use_aws_cache: Whether to use AWS cache when downloading images.
+
+ Returns:
+ The formatted sample, or None in the following cases:
+ - More than one annotation is found
+ - No annotation is found
+ - An error occurs when downloading the image
+ """
  if len(annotations) > 1:
  logger.info("More than one annotation found, skipping")
  return None
@@ -186,6 +205,13 @@ def format_object_detection_sample_to_hf(
  logger.error("Failed to download image: %s", image_url)
  return None

+ # Correct image orientation using EXIF data
+ # Label Studio provides bounding boxes based on the displayed image (after
+ # eventual EXIF rotation), so we need to apply the same transformation to
+ # the image.
+ # Indeed, Hugging Face stores images without applying EXIF rotation, and
+ # EXIF data is not preserved in the dataset.
+ ImageOps.exif_transpose(typing.cast(PIL.Image.Image, image), in_place=True)
  return {
  "image_id": task_data["image_id"],
  "image": image,
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: labelr
- Version: 0.7.0
+ Version: 0.8.0
  Summary: A command-line tool to manage labeling tasks with Label Studio.
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
@@ -73,7 +73,7 @@ Once you have a Label Studio instance running, you can create a project easily.
  For an object detection task, a command allows you to create the configuration file automatically:

  ```bash
- labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+ labelr ls create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
  ```

  where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
@@ -81,17 +81,19 @@ where `label1` and `label2` are the labels you want to use for the object detect
  Then, you can create a project on Label Studio with the following command:

  ```bash
- labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
+ labelr ls create --title my_project --api-key API_KEY --config-file label_config.xml
  ```

  where `API_KEY` is the API key of the Label Studio instance (API key is available at Account page), and `label_config.xml` is the configuration file of the project.

+ `ls` stands for Label Studio in the CLI.
+
  #### Create a dataset file

  If you have a list of images, for an object detection task, you can quickly create a dataset file with the following command:

  ```bash
- labelr projects create-dataset-file --input-file image_urls.txt --output-file dataset.json
+ labelr ls create-dataset-file --input-file image_urls.txt --output-file dataset.json
  ```

  where `image_urls.txt` is a file containing the URLs of the images, one per line, and `dataset.json` is the output file.
@@ -101,7 +103,7 @@ where `image_urls.txt` is a file containing the URLs of the images, one per line
  Next, import the generated data to a project with the following command:

  ```bash
- labelr projects import-data --project-id PROJECT_ID --dataset-path dataset.json
+ labelr ls import-data --project-id PROJECT_ID --dataset-path dataset.json
  ```

  where `PROJECT_ID` is the ID of the project you created.
@@ -117,7 +119,7 @@ To accelerate annotation, you can pre-annotate the images with an object detecti
  To pre-annotate the data with Triton, use the following command:

  ```bash
- labelr projects add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
+ labelr ls add-prediction --project-id PROJECT_ID --backend ultralytics --labels 'product' --labels 'price tag' --label-mapping '{"price tag": "price-tag"}'
  ```

  where `labels` is the list of labels to use for the object detection task (you can add as many labels as you want).
@@ -21,6 +21,10 @@ src/labelr.egg-info/requires.txt
  src/labelr.egg-info/top_level.txt
  src/labelr/apps/__init__.py
  src/labelr/apps/datasets.py
- src/labelr/apps/projects.py
+ src/labelr/apps/evaluate.py
+ src/labelr/apps/hugging_face.py
+ src/labelr/apps/label_studio.py
  src/labelr/apps/train.py
- src/labelr/apps/users.py
+ src/labelr/evaluate/__init__.py
+ src/labelr/evaluate/llm.py
+ src/labelr/evaluate/object_detection.py
@@ -1,36 +0,0 @@
- from typing import Annotated
-
- import typer
-
- from ..config import LABEL_STUDIO_DEFAULT_URL
-
- app = typer.Typer()
-
- # Label Studio user management
-
-
- @app.command()
- def list(
- api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
- label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
- ):
- """List all users in Label Studio."""
- from label_studio_sdk.client import LabelStudio
-
- ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
-
- for user in ls.users.list():
- print(f"{user.id:02d}: {user.email}")
-
-
- @app.command()
- def delete(
- user_id: int,
- api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
- label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
- ):
- """Delete a user from Label Studio."""
- from label_studio_sdk.client import LabelStudio
-
- ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
- ls.users.delete(user_id)
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes