labelr 0.2.0__tar.gz → 0.4.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {labelr-0.2.0 → labelr-0.4.0}/PKG-INFO +16 -9
- labelr-0.2.0/src/labelr.egg-info/PKG-INFO → labelr-0.4.0/README.md +12 -22
- {labelr-0.2.0 → labelr-0.4.0}/pyproject.toml +3 -8
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/annotate.py +16 -15
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/datasets.py +84 -5
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/projects.py +115 -34
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/export.py +135 -23
- labelr-0.4.0/src/labelr/project_config.py +45 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/sample.py +71 -15
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/types.py +1 -0
- labelr-0.2.0/README.md → labelr-0.4.0/src/labelr.egg-info/PKG-INFO +29 -2
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/SOURCES.txt +1 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/requires.txt +1 -6
- {labelr-0.2.0 → labelr-0.4.0}/LICENSE +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/setup.cfg +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/__init__.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/__main__.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/__init__.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/apps/users.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/check.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/config.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr/main.py +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/dependency_links.txt +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/entry_points.txt +0 -0
- {labelr-0.2.0 → labelr-0.4.0}/src/labelr.egg-info/top_level.txt +0 -0
@@ -1,7 +1,7 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.4
 Name: labelr
-Version: 0.2.0
-Summary: Add your description here
+Version: 0.4.0
+Summary: A command-line tool to manage labeling tasks with Label Studio.
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
@@ -9,14 +9,11 @@ Requires-Dist: datasets>=3.2.0
 Requires-Dist: imagehash>=4.3.1
 Requires-Dist: label-studio-sdk>=1.0.8
 Requires-Dist: more-itertools>=10.5.0
-Requires-Dist: openfoodfacts>=2.3.4
-Requires-Dist: protobuf>=5.29.1
+Requires-Dist: openfoodfacts>=2.9.0
 Requires-Dist: typer>=0.15.1
 Provides-Extra: ultralytics
 Requires-Dist: ultralytics>=8.3.49; extra == "ultralytics"
-Provides-Extra: triton
-Requires-Dist: tritonclient>=2.52.0; extra == "triton"
-Requires-Dist: openfoodfacts[ml]>=2.3.4; extra == "triton"
+Dynamic: license-file
 
 # Labelr
 
@@ -67,7 +64,17 @@ For all the commands that interact with Label Studio, you need to provide an API
 
 #### Create a project
 
-Once you have a Label Studio instance running, you can create a project
+Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
+
+For an object detection task, a command allows you to create the configuration file automatically:
+
+```bash
+labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+```
+
+where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
+
+Then, you can create a project on Label Studio with the following command:
 
 ```bash
 labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
@@ -1,23 +1,3 @@
-Metadata-Version: 2.1
-Name: labelr
-Version: 0.2.0
-Summary: Add your description here
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: datasets>=3.2.0
-Requires-Dist: imagehash>=4.3.1
-Requires-Dist: label-studio-sdk>=1.0.8
-Requires-Dist: more-itertools>=10.5.0
-Requires-Dist: openfoodfacts>=2.3.4
-Requires-Dist: protobuf>=5.29.1
-Requires-Dist: typer>=0.15.1
-Provides-Extra: ultralytics
-Requires-Dist: ultralytics>=8.3.49; extra == "ultralytics"
-Provides-Extra: triton
-Requires-Dist: tritonclient>=2.52.0; extra == "triton"
-Requires-Dist: openfoodfacts[ml]>=2.3.4; extra == "triton"
-
 # Labelr
 
 Labelr a command line interface that aims to provide a set of tools to help data scientists and machine learning engineers to deal with ML data annotation, data preprocessing and format conversion.
@@ -67,7 +47,17 @@ For all the commands that interact with Label Studio, you need to provide an API
 
 #### Create a project
 
-Once you have a Label Studio instance running, you can create a project
+Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
+
+For an object detection task, a command allows you to create the configuration file automatically:
+
+```bash
+labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+```
+
+where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
+
+Then, you can create a project on Label Studio with the following command:
 
 ```bash
 labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
@@ -130,4 +120,4 @@ To export the data to a Hugging Face dataset, use the following command:
 labelr datasets export --project-id PROJECT_ID --from ls --to huggingface --repo-id REPO_ID --label-names 'product,price-tag'
 ```
 
-where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
+where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
@@ -1,7 +1,7 @@
 [project]
 name = "labelr"
-version = "0.2.0"
-description = "Add your description here"
+version = "0.4.0"
+description = "A command-line tool to manage labeling tasks with Label Studio."
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
@@ -9,8 +9,7 @@ dependencies = [
     "imagehash>=4.3.1",
     "label-studio-sdk>=1.0.8",
     "more-itertools>=10.5.0",
-    "openfoodfacts>=2.3.4",
-    "protobuf>=5.29.1",
+    "openfoodfacts>=2.9.0",
     "typer>=0.15.1",
 ]
 
@@ -21,10 +20,6 @@ labelr = "labelr.main:app"
 ultralytics = [
     "ultralytics>=8.3.49",
 ]
-triton = [
-    "tritonclient>=2.52.0",
-    "openfoodfacts[ml]>=2.3.4",
-]
 
 [tool.uv]
 package = true
@@ -1,29 +1,30 @@
 import random
 import string
 
+from openfoodfacts.types import JSONType
 from openfoodfacts.utils import get_logger
 
-try:
-    from openfoodfacts.ml.object_detection import ObjectDetectionRawResult
-    from ultralytics.engine.results import Results
-except ImportError:
-    pass
-
-
 logger = get_logger(__name__)
 
 
-def format_annotation_results_from_triton(
-    objects: list[
-
-
+def format_annotation_results_from_robotoff(
+    objects: list[JSONType],
+    image_width: int,
+    image_height: int,
+    label_mapping: dict[str, str] | None = None,
+) -> list[JSONType]:
+    """Format annotation results from Robotoff prediction endpoint into
     Label Studio format."""
     annotation_results = []
     for object_ in objects:
-
-
+        bounding_box = object_["bounding_box"]
+        label_name = object_["label"]
+
+        if label_mapping:
+            label_name = label_mapping.get(label_name, label_name)
+
         # These are relative coordinates (between 0.0 and 1.0)
-        y_min, x_min, y_max, x_max = 
+        y_min, x_min, y_max, x_max = bounding_box
         # Make sure the coordinates are within the image boundaries,
         # and convert them to percentages
        y_min = min(max(0, y_min), 1.0) * 100
@@ -51,7 +52,7 @@ def format_annotation_results_from_triton(
                     "y": y,
                     "width": width,
                     "height": height,
-                    "rectanglelabels": [
+                    "rectanglelabels": [label_name],
                 },
             },
         )
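The rewritten `annotate.py` centers on a coordinate conversion: Robotoff returns relative `[y_min, x_min, y_max, x_max]` boxes in the 0.0-1.0 range, while Label Studio expects percentage-based `x`/`y`/`width`/`height` rectangles. A minimal standalone sketch of that conversion (the helper name `to_label_studio_rect` is ours, not part of the package):

```python
def to_label_studio_rect(bounding_box: list[float], label_name: str) -> dict:
    """Convert a Robotoff-style relative bounding box
    [y_min, x_min, y_max, x_max] (values in 0.0-1.0) into the
    percentage-based rectangle dict Label Studio expects."""
    y_min, x_min, y_max, x_max = bounding_box
    # Clamp to the image boundaries and convert to percentages,
    # mirroring the min(max(0, v), 1.0) * 100 lines in the diff
    y_min = min(max(0.0, y_min), 1.0) * 100
    x_min = min(max(0.0, x_min), 1.0) * 100
    y_max = min(max(0.0, y_max), 1.0) * 100
    x_max = min(max(0.0, x_max), 1.0) * 100
    return {
        "x": x_min,
        "y": y_min,
        "width": x_max - x_min,
        "height": y_max - y_min,
        "rectanglelabels": [label_name],
    }

rect = to_label_studio_rect([0.1, 0.2, 0.5, 0.8], "product")
```

Out-of-range coordinates are clamped before the percentage conversion, so a box that slightly overflows the image still yields a valid rectangle.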
@@ -6,8 +6,11 @@ from pathlib import Path
 from typing import Annotated, Optional
 
 import typer
+from openfoodfacts import Flavor
 from openfoodfacts.utils import get_logger
 
+from labelr.export import export_from_ultralytics_to_hf
+
 from ..config import LABEL_STUDIO_DEFAULT_URL
 from ..types import ExportDestination, ExportSource, TaskType
 
@@ -130,9 +133,14 @@ def export(
     from_: Annotated[ExportSource, typer.Option("--from", help="Input source to use")],
     to: Annotated[ExportDestination, typer.Option(help="Where to export the data")],
     api_key: Annotated[Optional[str], typer.Option(envvar="LABEL_STUDIO_API_KEY")],
+    task_type: Annotated[
+        TaskType, typer.Option(help="Type of task to export")
+    ] = TaskType.object_detection,
     repo_id: Annotated[
         Optional[str],
-        typer.Option(
+        typer.Option(
+            help="Hugging Face Datasets repository ID to convert (only if --from or --to is `hf`)"
+        ),
     ] = None,
     label_names: Annotated[
         Optional[str],
@@ -146,12 +154,33 @@ def export(
         Optional[Path],
         typer.Option(help="Path to the output directory", file_okay=False),
     ] = None,
+    dataset_dir: Annotated[
+        Optional[Path],
+        typer.Option(help="Path to the dataset directory, only for Ultralytics source"),
+    ] = None,
     download_images: Annotated[
         bool,
         typer.Option(
             help="if True, don't use HF images and download images from the server"
         ),
     ] = False,
+    is_openfoodfacts_dataset: Annotated[
+        bool,
+        typer.Option(
+            help="Whether the Ultralytics dataset is an OpenFoodFacts dataset, only "
+            "for Ultralytics source. This is used to generate the correct image URLs "
+            "each image name."
+        ),
+    ] = True,
+    openfoodfacts_flavor: Annotated[
+        Flavor,
+        typer.Option(
+            help="Flavor of the Open Food Facts dataset to use for image URLs, only "
+            "for Ultralytics source if is_openfoodfacts_dataset is True. This is used to "
+            "generate the correct image URLs each image name. This option is ignored if "
+            "is_openfoodfacts_dataset is False."
+        ),
+    ] = Flavor.off,
     train_ratio: Annotated[
         float,
         typer.Option(
@@ -165,6 +194,17 @@ def export(
             help="Raise an error if an image download fails, only for Ultralytics"
         ),
     ] = True,
+    use_aws_cache: Annotated[
+        bool,
+        typer.Option(
+            help="Use the AWS S3 cache for image downloads instead of images.openfoodfacts.org, "
+            "it is ignored if the export format is not Ultralytics"
+        ),
+    ] = True,
+    merge_labels: Annotated[
+        bool,
+        typer.Option(help="Merge multiple labels into a single label"),
+    ] = False,
 ):
     """Export Label Studio annotation, either to Hugging Face Datasets or
     local files (ultralytics format)."""
@@ -179,6 +219,13 @@ def export(
     if (to == ExportDestination.hf or from_ == ExportSource.hf) and repo_id is None:
         raise typer.BadParameter("Repository ID is required for export/import with HF")
 
+    if from_ == ExportSource.ultralytics and dataset_dir is None:
+        raise typer.BadParameter(
+            "Dataset directory is required for export from Ultralytics source"
+        )
+
+    label_names_list: list[str] | None = None
+
     if label_names is None:
         if to == ExportDestination.hf:
             raise typer.BadParameter("Label names are required for HF export")
@@ -186,6 +233,9 @@ def export(
             raise typer.BadParameter(
                 "Label names are required for export from LS source"
             )
+    else:
+        label_names = typing.cast(str, label_names)
+        label_names_list = label_names.split(",")
 
     if from_ == ExportSource.ls:
         if project_id is None:
@@ -197,31 +247,60 @@ def export(
         raise typer.BadParameter("Output directory is required for Ultralytics export")
 
     if from_ == ExportSource.ls:
+        if task_type != TaskType.object_detection:
+            raise typer.BadParameter(
+                "Only object detection task is currently supported with LS source"
+            )
         ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
-        label_names = typing.cast(str, label_names)
-        label_names_list = label_names.split(",")
         if to == ExportDestination.hf:
             repo_id = typing.cast(str, repo_id)
             export_from_ls_to_hf(
-                ls,
+                ls,
+                repo_id=repo_id,
+                label_names=typing.cast(list[str], label_names_list),
+                project_id=typing.cast(int, project_id),
+                merge_labels=merge_labels,
+                use_aws_cache=use_aws_cache,
             )
         elif to == ExportDestination.ultralytics:
             export_from_ls_to_ultralytics(
                 ls,
                 typing.cast(Path, output_dir),
-                label_names_list,
+                typing.cast(list[str], label_names_list),
                 typing.cast(int, project_id),
                 train_ratio=train_ratio,
                 error_raise=error_raise,
+                merge_labels=merge_labels,
+                use_aws_cache=use_aws_cache,
             )
 
     elif from_ == ExportSource.hf:
+        if task_type != TaskType.object_detection:
+            raise typer.BadParameter(
+                "Only object detection task is currently supported with HF source"
+            )
         if to == ExportDestination.ultralytics:
             export_from_hf_to_ultralytics(
                 typing.cast(str, repo_id),
                 typing.cast(Path, output_dir),
                 download_images=download_images,
                 error_raise=error_raise,
+                use_aws_cache=use_aws_cache,
             )
         else:
             raise typer.BadParameter("Unsupported export format")
+    elif from_ == ExportSource.ultralytics:
+        if task_type != TaskType.classification:
+            raise typer.BadParameter(
+                "Only classification task is currently supported with Ultralytics source"
+            )
+        if to == ExportDestination.hf:
+            export_from_ultralytics_to_hf(
+                task_type=task_type,
+                dataset_dir=typing.cast(Path, dataset_dir),
+                repo_id=typing.cast(str, repo_id),
+                merge_labels=merge_labels,
+                label_names=typing.cast(list[str], label_names_list),
+                is_openfoodfacts_dataset=is_openfoodfacts_dataset,
+                openfoodfacts_flavor=openfoodfacts_flavor,
+            )
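The export command's argument validation reduces to a pair of rules. A standalone sketch using plain strings and `ValueError` in place of the source/destination enums and `typer.BadParameter` (the helper name `check_export_params` is ours):

```python
def check_export_params(from_: str, to: str, repo_id=None, dataset_dir=None):
    """Mirror of the two validation rules in the export command:
    HF as source or destination requires --repo-id, and the
    Ultralytics source requires --dataset-dir."""
    if (to == "hf" or from_ == "hf") and repo_id is None:
        raise ValueError("Repository ID is required for export/import with HF")
    if from_ == "ultralytics" and dataset_dir is None:
        raise ValueError(
            "Dataset directory is required for export from Ultralytics source"
        )
```

The checks run before any Label Studio or Hugging Face client is created, so bad invocations fail fast.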
@@ -9,7 +9,7 @@ from openfoodfacts.utils import get_logger
 from PIL import Image
 
 from ..annotate import (
-    format_annotation_results_from_triton,
+    format_annotation_results_from_robotoff,
     format_annotation_results_from_ultralytics,
 )
 from ..config import LABEL_STUDIO_DEFAULT_URL
@@ -92,14 +92,46 @@ def add_split(
     ],
     api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
     project_id: Annotated[int, typer.Option(help="Label Studio project ID")],
+    split_name: Annotated[
+        Optional[str],
+        typer.Option(
+            help="name of the split associated "
+            "with the task ID file. If --task-id-file is not provided, "
+            "this field is ignored."
+        ),
+    ] = None,
+    train_split_name: Annotated[
+        str,
+        typer.Option(help="name of the train split"),
+    ] = "train",
+    val_split_name: Annotated[
+        str,
+        typer.Option(help="name of the validation split"),
+    ] = "val",
+    task_id_file: Annotated[
+        Optional[Path],
+        typer.Option(help="path of a text file containing IDs of samples"),
+    ] = None,
+    overwrite: Annotated[
+        bool, typer.Option(help="overwrite existing split field")
+    ] = False,
     label_studio_url: str = LABEL_STUDIO_DEFAULT_URL,
 ):
     """Update the split field of tasks in a Label Studio project.
 
+    The behavior of this command depends on the `--task-id-file` option.
+
+    If `--task-id-file` is provided, it should contain a list of task IDs,
+    one per line. The split field of these tasks will be updated to the value
+    of `--split-name`.
+
+    If `--task-id-file` is not provided, the split field of all tasks in the
+    project will be updated based on the `train_split` probability.
     The split field is set to "train" with probability `train_split`, and "val"
-    otherwise.
-
-    are not updated
+    otherwise.
+
+    In both cases, tasks with a non-null split field are not updated unless
+    the `--overwrite` flag is provided.
     """
     import random
 
@@ -108,11 +140,29 @@ def add_split(
 
     ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
 
+    task_ids = None
+    if task_id_file is not None:
+        if split_name is None or split_name not in (train_split_name, val_split_name):
+            raise typer.BadParameter(
+                "--split-name is required when using --task-id-file"
+            )
+        task_ids = task_id_file.read_text().strip().split("\n")
+
     for task in ls.tasks.list(project=project_id, fields="all"):
         task: Task
+        task_id = task.id
+
         split = task.data.get("split")
-        if split is None:
-
+        if split is None or overwrite:
+            if task_ids and str(task_id) in task_ids:
+                split = split_name
+            else:
+                split = (
+                    train_split_name
+                    if random.random() < train_split
+                    else val_split_name
+                )
+
         logger.info("Updating task: %s, split: %s", task.id, split)
         ls.tasks.update(task.id, data={**task.data, "split": split})
 
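The per-task decision in the new add-split logic can be factored into a small pure function; this is our own restructuring for illustration (the helper name `assign_split` is not from the package):

```python
import random

def assign_split(current_split, task_id, task_ids, split_name,
                 train_split=0.8, train_split_name="train",
                 val_split_name="val", overwrite=False):
    """Sketch of the add-split decision: keep an existing split unless
    --overwrite is set; tasks listed in the ID file get --split-name;
    every other task is assigned train/val at random."""
    if current_split is not None and not overwrite:
        return current_split
    if task_ids and str(task_id) in task_ids:
        return split_name
    return train_split_name if random.random() < train_split else val_split_name
```

Keeping the branching in one function makes the precedence explicit: existing split, then explicit ID list, then random draw.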
@@ -153,30 +203,37 @@ def annotate_from_prediction(
 
 
 class PredictorBackend(enum.Enum):
-    triton = "triton"
     ultralytics = "ultralytics"
+    robotoff = "robotoff"
 
 
 @app.command()
 def add_prediction(
     api_key: Annotated[str, typer.Option(envvar="LABEL_STUDIO_API_KEY")],
     project_id: Annotated[int, typer.Option(help="Label Studio Project ID")],
+    view_id: Annotated[
+        Optional[int],
+        typer.Option(
+            help="Label Studio View ID to filter tasks. If not provided, all tasks in the "
+            "project are processed."
+        ),
+    ] = None,
     model_name: Annotated[
         str,
         typer.Option(
-            help="Name of the object detection model to run (for
+            help="Name of the object detection model to run (for Robotoff server) or "
            "of the Ultralytics zero-shot model to run."
         ),
     ] = "yolov8x-worldv2.pt",
-
+    server_url: Annotated[
         Optional[str],
-        typer.Option(help="
-    ] = 
+        typer.Option(help="The Robotoff URL if the backend is robotoff"),
+    ] = "https://robotoff.openfoodfacts.org",
     backend: Annotated[
         PredictorBackend,
         typer.Option(
-            help="Prediction backend: either use
-            "the prediction or
+            help="Prediction backend: either use Ultralytics to perform "
+            "the prediction or Robotoff server."
         ),
     ] = PredictorBackend.ultralytics,
     labels: Annotated[
@@ -196,8 +253,8 @@ def add_prediction(
     threshold: Annotated[
         Optional[float],
         typer.Option(
-            help="Confidence threshold for selecting bounding boxes. The default is 0.
-            "for
+            help="Confidence threshold for selecting bounding boxes. The default is 0.3 "
+            "for robotoff backend and 0.1 for ultralytics backend."
         ),
     ] = None,
     max_det: Annotated[int, typer.Option(help="Maximum numbers of detections")] = 300,
@@ -221,9 +278,7 @@ def add_prediction(
 
     import tqdm
     from label_studio_sdk.client import LabelStudio
-    from openfoodfacts.utils import get_image_from_url
-
-    from labelr.triton.object_detection import ObjectDetectionModelRegistry
+    from openfoodfacts.utils import get_image_from_url, http_session
 
     label_mapping_dict = None
     if label_mapping:
@@ -242,8 +297,6 @@ def add_prediction(
     )
     ls = LabelStudio(base_url=label_studio_url, api_key=api_key)
 
-    model: ObjectDetectionModelRegistry | "YOLO"
-
     if backend == PredictorBackend.ultralytics:
         from ultralytics import YOLO
 
@@ -258,18 +311,19 @@ def add_prediction(
             model.set_classes(labels)
         else:
             logger.warning("The model does not support setting classes directly.")
-    elif backend == PredictorBackend.triton:
-        if 
-            raise typer.BadParameter("
+    elif backend == PredictorBackend.robotoff:
+        if server_url is None:
+            raise typer.BadParameter("--server-url is required for Robotoff backend")
 
         if threshold is None:
-            threshold = 0.
-
-        model = ObjectDetectionModelRegistry.load(model_name)
+            threshold = 0.1
+        server_url = server_url.rstrip("/")
     else:
         raise typer.BadParameter(f"Unsupported backend: {backend}")
 
-    for task in tqdm.tqdm(
+    for task in tqdm.tqdm(
+        ls.tasks.list(project=project_id, view=view_id), desc="tasks"
+    ):
         if task.total_predictions == 0:
             image_url = task.data["image_url"]
             image = typing.cast(
@@ -286,12 +340,22 @@ def add_prediction(
                 label_studio_result = format_annotation_results_from_ultralytics(
                     results, labels, label_mapping_dict
                 )
-
-
-
-
-
-
+            elif backend == PredictorBackend.robotoff:
+                r = http_session.get(
+                    f"{server_url}/api/v1/images/predict",
+                    params={
+                        "models": model_name,
+                        "output_image": 0,
+                        "image_url": image_url,
+                    },
+                )
+                r.raise_for_status()
+                response = r.json()
+                label_studio_result = format_annotation_results_from_robotoff(
+                    response["predictions"][model_name],
+                    image.width,
+                    image.height,
+                    label_mapping_dict,
                 )
             if dry_run:
                 logger.info("image_url: %s", image_url)
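The Robotoff branch above boils down to a single GET against `/api/v1/images/predict`. A sketch that only constructs the request (so it stays network-free; the helper name `build_predict_request` is ours, the URL path and parameter names are taken from the diff):

```python
def build_predict_request(server_url: str, model_name: str, image_url: str):
    """Build the URL and query parameters that add-prediction sends to
    Robotoff's image prediction endpoint. output_image=0 asks the server
    not to render an annotated image back."""
    return (
        f"{server_url.rstrip('/')}/api/v1/images/predict",
        {"models": model_name, "output_image": 0, "image_url": image_url},
    )
```

The JSON response's `predictions[model_name]` entry is then handed to `format_annotation_results_from_robotoff` together with the image dimensions.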
@@ -339,7 +403,7 @@ def create_dataset_file(
             extra_meta["barcode"] = barcode
             off_image_id = Path(extract_source_from_url(url)).stem
             extra_meta["off_image_id"] = off_image_id
-            image_id = f"{barcode}
+            image_id = f"{barcode}_{off_image_id}"
 
         image = get_image_from_url(url, error_raise=False)
 
@@ -351,3 +415,20 @@ def create_dataset_file(
             image_id, url, image.width, image.height, extra_meta
         )
         f.write(json.dumps(label_studio_sample) + "\n")
+
+
+@app.command()
+def create_config_file(
+    output_file: Annotated[
+        Path, typer.Option(help="Path to the output label config file", exists=False)
+    ],
+    labels: Annotated[
+        list[str], typer.Option(help="List of class labels to use for the model")
+    ],
+):
+    """Create a Label Studio label config file for object detection tasks."""
+    from labelr.project_config import create_object_detection_label_config
+
+    config = create_object_detection_label_config(labels)
+    output_file.write_text(config)
+    logger.info("Label config file created: %s", output_file)
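The new `create_config_file` command delegates to `create_object_detection_label_config` in the new `labelr/project_config.py`, whose body is not shown in this diff. A hypothetical sketch of what such a generator can emit, based on Label Studio's standard `RectangleLabels` interface (both the function body and its name here are our assumption, not the package's implementation):

```python
def object_detection_label_config(labels: list[str]) -> str:
    """Hypothetical generator for a Label Studio object-detection label
    config: an Image tag bound to $image_url and one Label per class
    inside a RectangleLabels block."""
    label_tags = "\n".join(f'    <Label value="{label}"/>' for label in labels)
    return (
        "<View>\n"
        '  <Image name="image" value="$image_url"/>\n'
        '  <RectangleLabels name="label" toName="image">\n'
        f"{label_tags}\n"
        "  </RectangleLabels>\n"
        "</View>\n"
    )
```

The resulting XML is what `labelr projects create --config-file label_config.xml` would then upload when creating the project.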
@@ -3,16 +3,21 @@ import logging
 import pickle
 import random
 import tempfile
-import typing
 from pathlib import Path
 
 import datasets
 import tqdm
 from label_studio_sdk.client import LabelStudio
-from openfoodfacts.images import download_image
-from 
+from openfoodfacts.images import download_image, generate_image_url
+from openfoodfacts.types import Flavor
+from PIL import Image, ImageOps
 
-from labelr.sample import 
+from labelr.sample import (
+    HF_DS_CLASSIFICATION_FEATURES,
+    HF_DS_OBJECT_DETECTION_FEATURES,
+    format_object_detection_sample_to_hf,
+)
+from labelr.types import TaskType
 
 logger = logging.getLogger(__name__)
 
@@ -27,10 +32,15 @@ def _pickle_sample_generator(dir: Path):
 def export_from_ls_to_hf(
     ls: LabelStudio,
     repo_id: str,
-
+    label_names: list[str],
     project_id: int,
+    merge_labels: bool = False,
+    use_aws_cache: bool = True,
 ):
-
+    if merge_labels:
+        label_names = ["object"]
+
+    logger.info("Project ID: %d, label names: %s", project_id, label_names)
 
     for split in ["train", "val"]:
         logger.info("Processing split: %s", split)
@@ -45,7 +55,11 @@ def export_from_ls_to_hf(
             if task.data["split"] != split:
                 continue
             sample = format_object_detection_sample_to_hf(
-                task.data,
+                task_data=task.data,
+                annotations=task.annotations,
+                label_names=label_names,
+                merge_labels=merge_labels,
+                use_aws_cache=use_aws_cache,
             )
             if sample is not None:
                 # Save output as pickle
@@ -54,7 +68,7 @@ def export_from_ls_to_hf(
 
         hf_ds = datasets.Dataset.from_generator(
             functools.partial(_pickle_sample_generator, tmp_dir),
-            features=
+            features=HF_DS_OBJECT_DETECTION_FEATURES,
         )
         hf_ds.push_to_hub(repo_id, split=split)
 
@@ -62,10 +76,12 @@ def export_from_ls_to_hf(
 def export_from_ls_to_ultralytics(
     ls: LabelStudio,
     output_dir: Path,
-
+    label_names: list[str],
     project_id: int,
     train_ratio: float = 0.8,
     error_raise: bool = True,
+    merge_labels: bool = False,
+    use_aws_cache: bool = True,
 ):
     """Export annotations from a Label Studio project to the Ultralytics
     format.
@@ -73,7 +89,9 @@ def export_from_ls_to_ultralytics(
     The Label Studio project should be an object detection project with a
     single rectanglelabels annotation result per task.
     """
-
+    if merge_labels:
+        label_names = ["object"]
+    logger.info("Project ID: %d, label names: %s", project_id, label_names)
 
     data_dir = output_dir / "data"
     data_dir.mkdir(parents=True, exist_ok=True)
@@ -146,34 +164,37 @@ def export_from_ls_to_ultralytics(
                     y_min = value["y"] / 100
                     width = value["width"] / 100
                     height = value["height"] / 100
-
-
+                    label_name = (
+                        label_names[0] if merge_labels else value["rectanglelabels"][0]
+                    )
+                    label_id = label_names.index(label_name)
 
                     # Save the labels in the Ultralytics format:
                     # - one label per line
                     # - each line is a list of 5 elements:
-                    # - 
+                    # - label_id
                     # - x_center
                     # - y_center
                     # - width
                     # - height
                     x_center = x_min + width / 2
                     y_center = y_min + height / 2
-                    f.write(f"{
+                    f.write(f"{label_id} {x_center} {y_center} {width} {height}\n")
                     has_valid_annotation = True
 
             if has_valid_annotation:
                 download_output = download_image(
-                    image_url,
+                    image_url,
+                    return_struct=True,
+                    error_raise=error_raise,
+                    use_cache=use_aws_cache,
                 )
                 if download_output is None:
                     logger.error("Failed to download image: %s", image_url)
                     continue
 
-                _, image_bytes = typing.cast(tuple[Image.Image, bytes], download_output)
-
                 with (images_dir / split / f"{image_id}.jpg").open("wb") as f:
-                    f.write(image_bytes)
+                    f.write(download_output.image_bytes)
 
     with (output_dir / "data.yaml").open("w") as f:
         f.write("path: data\n")
@@ -181,8 +202,8 @@ def export_from_ls_to_ultralytics(
|
|
|
181
202
|
f.write("val: images/val\n")
|
|
182
203
|
f.write("test:\n")
|
|
183
204
|
f.write("names:\n")
|
|
184
|
-
for i,
|
|
185
|
-
f.write(f" {i}: {
|
|
205
|
+
for i, label_name in enumerate(label_names):
|
|
206
|
+
f.write(f" {i}: {label_name}\n")
|
|
186
207
|
|
|
187
208
|
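The label-resolution and box geometry in the hunks above are easy to sanity-check in isolation. A minimal sketch (the helper name and sample values are hypothetical; Label Studio stores rectangle coordinates as percentages of the image size):

```python
def to_yolo_line(value: dict, label_names: list[str], merge_labels: bool = False) -> str:
    # With merge_labels, every box collapses to the single first label
    # (the exporter sets label_names to ["object"] in that case).
    label_name = label_names[0] if merge_labels else value["rectanglelabels"][0]
    label_id = label_names.index(label_name)
    # Label Studio coordinates are percentages (0-100) of the image size.
    x_min, y_min = value["x"] / 100, value["y"] / 100
    width, height = value["width"] / 100, value["height"] / 100
    # Ultralytics expects the box center, not the top-left corner.
    x_center = x_min + width / 2
    y_center = y_min + height / 2
    return f"{label_id} {x_center} {y_center} {width} {height}"


line = to_yolo_line(
    {"rectanglelabels": ["price-tag"], "x": 25, "y": 50, "width": 25, "height": 25},
    ["product", "price-tag"],
)
print(line)  # 1 0.375 0.625 0.25 0.25
```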
 def export_from_hf_to_ultralytics(
@@ -190,6 +211,7 @@ def export_from_hf_to_ultralytics(
     output_dir: Path,
     download_images: bool = True,
     error_raise: bool = True,
+    use_aws_cache: bool = True,
 ):
     """Export annotations from a Hugging Face dataset project to the
     Ultralytics format.
@@ -215,14 +237,17 @@ def export_from_hf_to_ultralytics(

         if download_images:
             download_output = download_image(
-                image_url,
+                image_url,
+                return_struct=True,
+                error_raise=error_raise,
+                use_cache=use_aws_cache,
             )
             if download_output is None:
                 logger.error("Failed to download image: %s", image_url)
                 continue
-
+
             with (split_images_dir / f"{image_id}.jpg").open("wb") as f:
-                f.write(image_bytes)
+                f.write(download_output.image_bytes)
         else:
             image = sample["image"]
             image.save(split_images_dir / f"{image_id}.jpg")
@@ -268,3 +293,90 @@ def export_from_hf_to_ultralytics(
         f.write("names:\n")
         for i, category_name in enumerate(category_names):
             f.write(f" {i}: {category_name}\n")
+
+
+def export_from_ultralytics_to_hf(
+    task_type: TaskType,
+    dataset_dir: Path,
+    repo_id: str,
+    label_names: list[str],
+    merge_labels: bool = False,
+    is_openfoodfacts_dataset: bool = False,
+    openfoodfacts_flavor: Flavor = Flavor.off,
+) -> None:
+    if task_type != TaskType.classification:
+        raise NotImplementedError(
+            "Only classification task is currently supported for Ultralytics to HF export"
+        )
+
+    logger.info("Repo ID: %s, dataset_dir: %s", repo_id, dataset_dir)
+
+    if not any((dataset_dir / split).is_dir() for split in ["train", "val", "test"]):
+        raise ValueError(
+            f"Dataset directory {dataset_dir} does not contain 'train', 'val' or 'test' subdirectories"
+        )
+
+    # Save output as pickle
+    for split in ["train", "val", "test"]:
+        split_dir = dataset_dir / split
+
+        if not split_dir.is_dir():
+            logger.info("Skipping missing split directory: %s", split_dir)
+            continue
+
+        with tempfile.TemporaryDirectory() as tmp_dir_str:
+            tmp_dir = Path(tmp_dir_str)
+            for label_dir in (d for d in split_dir.iterdir() if d.is_dir()):
+                label_name = label_dir.name
+                if merge_labels:
+                    label_name = "object"
+                if label_name not in label_names:
+                    raise ValueError(
+                        "Label name %s not in provided label names (label names: %s)"
+                        % (label_name, label_names),
+                    )
+                label_id = label_names.index(label_name)
+
+                for image_path in label_dir.glob("*"):
+                    if is_openfoodfacts_dataset:
+                        image_stem_parts = image_path.stem.split("_")
+                        barcode = image_stem_parts[0]
+                        off_image_id = image_stem_parts[1]
+                        image_id = f"{barcode}_{off_image_id}"
+                        image_url = generate_image_url(
+                            barcode, off_image_id, flavor=openfoodfacts_flavor
+                        )
+                    else:
+                        image_id = image_path.stem
+                        barcode = ""
+                        off_image_id = ""
+                        image_url = ""
+                    image = Image.open(image_path)
+                    image.load()
+
+                    if image.mode != "RGB":
+                        image = image.convert("RGB")
+
+                    # Rotate image according to exif orientation using Pillow
+                    ImageOps.exif_transpose(image, in_place=True)
+                    sample = {
+                        "image_id": image_id,
+                        "image": image,
+                        "width": image.width,
+                        "height": image.height,
+                        "meta": {
+                            "barcode": barcode,
+                            "off_image_id": off_image_id,
+                            "image_url": image_url,
+                        },
+                        "category_id": label_id,
+                        "category_name": label_name,
+                    }
+                    with open(tmp_dir / f"{split}_{image_id}.pkl", "wb") as f:
+                        pickle.dump(sample, f)
+
+            hf_ds = datasets.Dataset.from_generator(
+                functools.partial(_pickle_sample_generator, tmp_dir),
+                features=HF_DS_CLASSIFICATION_FEATURES,
+            )
+            hf_ds.push_to_hub(repo_id, split=split)
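The new `export_from_ultralytics_to_hf` stages each sample as a pickle in a temporary directory before building the Hugging Face dataset from a generator. The staging round-trip can be sketched with the stdlib alone (the sample fields follow the diff; the image payload and barcode value are illustrative):

```python
import pickle
import tempfile
from pathlib import Path

sample = {
    "image_id": "3017620422003_1",  # hypothetical "{barcode}_{off_image_id}" stem
    "meta": {"barcode": "3017620422003", "off_image_id": "1", "image_url": ""},
    "category_id": 0,
    "category_name": "object",
}

with tempfile.TemporaryDirectory() as tmp_dir_str:
    tmp_dir = Path(tmp_dir_str)
    # One pickle per image, named "{split}_{image_id}.pkl" as in the exporter.
    path = tmp_dir / "train_3017620422003_1.pkl"
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == sample)  # True
```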
@@ -0,0 +1,45 @@
+COLORS = [
+    "blue",
+    "green",
+    "yellow",
+    "red",
+    "purple",
+    "orange",
+    "pink",
+    "brown",
+    "gray",
+    "black",
+    "white",
+]
+
+
+def create_object_detection_label_config(labels_names: list[str]) -> str:
+    """Create a Label Studio label configuration for object detection tasks.
+
+    The format is the following:
+    ```xml
+    <View>
+      <Image name="image" value="$image_url"/>
+      <RectangleLabels name="label" toName="image">
+        <Label value="nutrition-table" background="green"/>
+        <Label value="nutrition-table-small" background="blue"/>
+        <Label value="nutrition-table-small-energy" background="yellow"/>
+        <Label value="nutrition-table-text" background="red"/>
+      </RectangleLabels>
+    </View>
+    ```
+    """
+    if len(labels_names) > len(COLORS):
+        raise ValueError(
+            f"Too many labels ({len(labels_names)}) for the available colors ({len(COLORS)})."
+        )
+    labels_xml = "\n".join(
+        f'    <Label value="{label}" background="{color}"/>'
+        for label, color in zip(labels_names, COLORS[: len(labels_names)])
+    )
+    return f"""<View>
+  <Image name="image" value="$image_url"/>
+  <RectangleLabels name="label" toName="image">
+{labels_xml}
+  </RectangleLabels>
+</View>"""
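The generator in the new `project_config.py` can be exercised directly. The snippet below copies the function body from the diff above into a self-contained script (the label names passed in are illustrative):

```python
# Copied from the new src/labelr/project_config.py shown in the diff above.
COLORS = [
    "blue", "green", "yellow", "red", "purple", "orange",
    "pink", "brown", "gray", "black", "white",
]


def create_object_detection_label_config(labels_names: list[str]) -> str:
    if len(labels_names) > len(COLORS):
        raise ValueError(
            f"Too many labels ({len(labels_names)}) for the available colors ({len(COLORS)})."
        )
    # Each label is paired with the color at the same index.
    labels_xml = "\n".join(
        f'    <Label value="{label}" background="{color}"/>'
        for label, color in zip(labels_names, COLORS[: len(labels_names)])
    )
    return f"""<View>
  <Image name="image" value="$image_url"/>
  <RectangleLabels name="label" toName="image">
{labels_xml}
  </RectangleLabels>
</View>"""


config = create_object_detection_label_config(["product", "price-tag"])
print(config)
```

`product` gets `blue` and `price-tag` gets `green`, since colors are assigned positionally.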
@@ -3,7 +3,9 @@ import random
 import string

 import datasets
-from openfoodfacts
+from openfoodfacts import Flavor
+from openfoodfacts.barcode import normalize_barcode
+from openfoodfacts.images import download_image, generate_image_url

 logger = logging.getLogger(__name__)

@@ -62,17 +64,49 @@ def format_object_detection_sample_from_hf(hf_sample: dict, split: str) -> dict:
     annotation_results = format_annotation_results_from_hf(
         objects, image_width, image_height
     )
+    image_id = hf_sample["image_id"]
+    image_url = hf_meta["image_url"]
+    meta_kwargs = {}
+
+    if "off_image_id" in hf_meta:
+        # If `off_image_id` is present, we assume this is an Open Food Facts
+        # dataset sample.
+        # We normalize the barcode, and generate a new image URL
+        # to make sure that:
+        # - the image URL is valid with correct path
+        # - we use the images subdomain everywhere
+        off_image_id = hf_meta["off_image_id"]
+        meta_kwargs["off_image_id"] = off_image_id
+        barcode = normalize_barcode(hf_meta["barcode"])
+        meta_kwargs["barcode"] = barcode
+        image_id = f"{barcode}_{off_image_id}"
+
+        if ".openfoodfacts." in image_url:
+            flavor = Flavor.off
+        elif ".openbeautyfacts." in image_url:
+            flavor = Flavor.obf
+        elif ".openpetfoodfacts." in image_url:
+            flavor = Flavor.opf
+        elif ".openproductsfacts." in image_url:
+            flavor = Flavor.opf
+        else:
+            raise ValueError(
+                f"Unknown Open Food Facts flavor for image URL: {image_url}"
+            )
+        image_url = generate_image_url(
+            code=barcode, image_id=off_image_id, flavor=flavor
+        )
+
     return {
         "data": {
-            "image_id":
-            "image_url":
+            "image_id": image_id,
+            "image_url": image_url,
             "batch": "null",
             "split": split,
             "meta": {
                 "width": image_width,
                 "height": image_height,
-
-                "off_image_id": hf_meta["off_image_id"],
+                **meta_kwargs,
             },
         },
         "predictions": [{"result": annotation_results}],
@@ -111,7 +145,11 @@ def format_object_detection_sample_to_ls(


 def format_object_detection_sample_to_hf(
-    task_data: dict,
+    task_data: dict,
+    annotations: list[dict],
+    label_names: list[str],
+    merge_labels: bool = False,
+    use_aws_cache: bool = True,
 ) -> dict | None:
     if len(annotations) > 1:
         logger.info("More than one annotation found, skipping")
@@ -122,8 +160,8 @@ def format_object_detection_sample_to_hf(

     annotation = annotations[0]
     bboxes = []
-
-
+    bbox_label_ids = []
+    bbox_label_names = []

     for annotation_result in annotation["result"]:
         if annotation_result["type"] != "rectanglelabels":
@@ -137,12 +175,13 @@ def format_object_detection_sample_to_hf(
         x_max = x_min + width
         y_max = y_min + height
         bboxes.append([y_min, x_min, y_max, x_max])
-
-
-
+
+        label_name = label_names[0] if merge_labels else value["rectanglelabels"][0]
+        bbox_label_names.append(label_name)
+        bbox_label_ids.append(label_names.index(label_name))

     image_url = task_data["image_url"]
-    image = download_image(image_url, error_raise=False)
+    image = download_image(image_url, error_raise=False, use_cache=use_aws_cache)
     if image is None:
         logger.error("Failed to download image: %s", image_url)
         return None
@@ -159,14 +198,14 @@ def format_object_detection_sample_to_hf(
         },
         "objects": {
             "bbox": bboxes,
-            "category_id":
-            "category_name":
+            "category_id": bbox_label_ids,
+            "category_name": bbox_label_names,
         },
     }


 # The HuggingFace Dataset features
-
+HF_DS_OBJECT_DETECTION_FEATURES = datasets.Features(
     {
         "image_id": datasets.Value("string"),
         "image": datasets.features.Image(),
@@ -184,3 +223,20 @@ HF_DS_FEATURES = datasets.Features(
         },
     }
 )
+
+
+HF_DS_CLASSIFICATION_FEATURES = datasets.Features(
+    {
+        "image_id": datasets.Value("string"),
+        "image": datasets.features.Image(),
+        "width": datasets.Value("int64"),
+        "height": datasets.Value("int64"),
+        "meta": {
+            "barcode": datasets.Value("string"),
+            "off_image_id": datasets.Value("string"),
+            "image_url": datasets.Value("string"),
+        },
+        "category_id": datasets.Value("int64"),
+        "category_name": datasets.Value("string"),
+    }
+)
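In `format_object_detection_sample_to_hf` above, Label Studio rectangle percentages become a normalized `[y_min, x_min, y_max, x_max]` box for the Hugging Face dataset. A standalone sketch of that conversion (the helper name is hypothetical):

```python
def ls_rect_to_hf_bbox(value: dict) -> list[float]:
    # Label Studio stores x, y, width, height as percentages (0-100)
    # of the image dimensions.
    x_min = value["x"] / 100
    y_min = value["y"] / 100
    x_max = x_min + value["width"] / 100
    y_max = y_min + value["height"] / 100
    # The dataset stores boxes as [y_min, x_min, y_max, x_max], normalized to 0-1.
    return [y_min, x_min, y_max, x_max]


print(ls_rect_to_hf_bbox({"x": 25, "y": 50, "width": 25, "height": 25}))
# [0.5, 0.25, 0.75, 0.5]
```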
@@ -1,3 +1,20 @@
+Metadata-Version: 2.4
+Name: labelr
+Version: 0.4.0
+Summary: A command-line tool to manage labeling tasks with Label Studio.
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: datasets>=3.2.0
+Requires-Dist: imagehash>=4.3.1
+Requires-Dist: label-studio-sdk>=1.0.8
+Requires-Dist: more-itertools>=10.5.0
+Requires-Dist: openfoodfacts>=2.9.0
+Requires-Dist: typer>=0.15.1
+Provides-Extra: ultralytics
+Requires-Dist: ultralytics>=8.3.49; extra == "ultralytics"
+Dynamic: license-file
+
 # Labelr

 Labelr is a command-line interface that provides a set of tools to help data scientists and machine learning engineers deal with ML data annotation, data preprocessing, and format conversion.
@@ -47,7 +64,17 @@ For all the commands that interact with Label Studio, you need to provide an API

 #### Create a project

-Once you have a Label Studio instance running, you can create a project
+Once you have a Label Studio instance running, you can create a project easily. First, you need to create a configuration file for the project. The configuration file is an XML file that defines the labeling interface and the labels to use for the project. You can find an example of a configuration file in the [Label Studio documentation](https://labelstud.io/guide/setup).
+
+For an object detection task, a command allows you to create the configuration file automatically:
+
+```bash
+labelr projects create-config --labels 'label1' --labels 'label2' --output-file label_config.xml
+```
+
+where `label1` and `label2` are the labels you want to use for the object detection task, and `label_config.xml` is the output file that will contain the configuration.
+
+Then, you can create a project on Label Studio with the following command:

 ```bash
 labelr projects create --title my_project --api-key API_KEY --config-file label_config.xml
@@ -110,4 +137,4 @@ To export the data to a Hugging Face dataset, use the following command:
 labelr datasets export --project-id PROJECT_ID --from ls --to huggingface --repo-id REPO_ID --label-names 'product,price-tag'
 ```

-where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
\ No newline at end of file
+where `REPO_ID` is the ID of the Hugging Face repository where the dataset will be uploaded (ex: `openfoodfacts/food-detection`).
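For reference, the Ultralytics `data.yaml` that the export commands write can be reconstructed from the `f.write` calls in the export.py hunks above. A sketch (the `train:` line is assumed to mirror `val:`, since it is not visible in this chunk of the diff, and the label names are illustrative):

```python
label_names = ["product", "price-tag"]  # illustrative

lines = [
    "path: data",
    "train: images/train",  # assumed; not shown in this chunk of the diff
    "val: images/val",
    "test:",
    "names:",
]
# One "index: name" entry per label, as in the exporter's names loop.
lines += [f"  {i}: {name}" for i, name in enumerate(label_names)]
data_yaml = "\n".join(lines) + "\n"
print(data_yaml)
```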
@@ -2,13 +2,8 @@ datasets>=3.2.0
 imagehash>=4.3.1
 label-studio-sdk>=1.0.8
 more-itertools>=10.5.0
-openfoodfacts>=2.
-protobuf>=5.29.1
+openfoodfacts>=2.9.0
 typer>=0.15.1

-[triton]
-tritonclient>=2.52.0
-openfoodfacts[ml]>=2.3.4
-
 [ultralytics]
 ultralytics>=8.3.49