sinapsis-huggingface 0.1.0__py3-none-any.whl → 0.2.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (17)
  1. {sinapsis_huggingface-0.1.0.dist-info → sinapsis_huggingface-0.2.0.dist-info}/METADATA +51 -13
  2. {sinapsis_huggingface-0.1.0.dist-info → sinapsis_huggingface-0.2.0.dist-info}/RECORD +17 -12
  3. {sinapsis_huggingface-0.1.0.dist-info → sinapsis_huggingface-0.2.0.dist-info}/WHEEL +1 -1
  4. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/__init__.py +3 -0
  5. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/base_transformers.py +11 -0
  6. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/image_to_text_transformers.py +1 -0
  7. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/pali_gemma/pali_gemma_base.py +97 -0
  8. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/pali_gemma/pali_gemma_detection.py +124 -0
  9. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/pali_gemma/pali_gemma_inference.py +260 -0
  10. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/speech_to_text_transformers.py +1 -0
  11. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/summarization_transformers.py +1 -0
  12. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/text_to_speech_transformers.py +2 -1
  13. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/translation_transformers.py +1 -0
  14. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/thirdparty/__init__.py +0 -0
  15. sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/thirdparty/helpers.py +70 -0
  16. {sinapsis_huggingface-0.1.0.dist-info → sinapsis_huggingface-0.2.0.dist-info}/licenses/LICENSE +0 -0
  17. {sinapsis_huggingface-0.1.0.dist-info → sinapsis_huggingface-0.2.0.dist-info}/top_level.txt +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: sinapsis-huggingface
3
- Version: 0.1.0
3
+ Version: 0.2.0
4
4
  Summary: Package for HuggingFace-based templates
5
5
  Author-email: SinapsisAI <dev@sinapsis.tech>
6
6
  License: GNU AFFERO GENERAL PUBLIC LICENSE
@@ -822,22 +822,28 @@ The **Sinapsis web applications** provide an interactive way to explore and expe
822
822
  > [!IMPORTANT]
823
823
  > To run any of the apps, you first need to clone this repo:
824
824
 
825
+ ```bash
826
+ git clone git@github.com:Sinapsis-ai/sinapsis-huggingface.git
827
+ cd sinapsis-huggingface
828
+ ```
829
+
825
830
  > [!NOTE]
826
831
  > If you'd like to enable external app sharing in Gradio, `export GRADIO_SHARE_APP=True`
827
832
 
828
833
  > [!NOTE]
829
834
  > Agent configuration can be changed through the AGENT_CONFIG_PATH env var. You can check the available configurations in each package configs folder.
830
835
 
836
+ > [!IMPORTANT]
837
+ > Please make sure you have a valid Hugging Face access token in order to run the PaliGemma webapp. For further instructions on how to create an access token, see
838
+ > https://huggingface.co/docs/transformers.js/en/guides/private
839
+
840
+
831
841
 
832
- ```bash
833
- git clone git@github.com:Sinapsis-ai/sinapsis-huggingface.git
834
- cd sinapsis-huggingface
835
- ```
836
842
 
837
843
  <details>
838
844
  <summary id="docker"><strong><span style="font-size: 1.4em;">🐳 Build with Docker</span></strong></summary>
839
845
 
840
- **IMPORTANT** The docker image depends on the sinapsis-nvidia:base image. To build it, refer to the [official sinapsis documentation]([https://](https://github.com/Sinapsis-ai/sinapsis?tab=readme-ov-file#docker)
846
+ **IMPORTANT** The docker image depends on the sinapsis-nvidia:base image. To build it, refer to the [official sinapsis documentation](https://github.com/Sinapsis-AI/sinapsis/blob/main/README.md#docker)
841
847
 
842
848
 
843
849
  1. **Build the sinapsis-huggingface image**:
@@ -845,17 +851,35 @@ cd sinapsis-huggingface
845
851
  docker compose -f docker/compose.yaml build
846
852
  ```
847
853
  2. **Start the container**:
854
+
855
+ For the Diffusers app:
848
856
  ```bash
849
857
  docker compose -f docker/compose_diffusers.yaml up sinapsis-huggingface-diffusers-gradio -d
850
858
  ```
851
- **NOTE**: There is also a service to deploy the vision app. To do so, use:
859
+ For the Grounding-Dino app:
852
860
  ```bash
853
861
  docker compose -f docker/compose_vision.yaml up sinapsis-huggingface-vision-gradio -d
854
862
  ```
863
+ For the PaliGemma app:
864
+
865
+ ```bash
866
+ export HF_TOKEN="your_huggingface_token"
867
+ docker compose -f docker/compose_pali_gemma.yaml up sinapsis-huggingface-paligemma-gradio -d
868
+ ```
855
869
  3. **Check the status**:
870
+
871
+ For the Diffusers app:
856
872
  ```bash
857
873
  docker logs -f sinapsis-huggingface-diffusers-gradio
858
874
  ```
875
+ For the Grounding-Dino app:
876
+ ```bash
877
+ docker logs -f sinapsis-huggingface-vision-gradio
878
+ ```
879
+ For the PaliGemma app:
880
+ ```bash
881
+ docker logs -f sinapsis-huggingface-paligemma-gradio
882
+ ```
859
883
  **NOTE**: If using the vision app, please change the name of the service accordingly
860
884
 
861
885
  4. **The logs will display the URL to access the webapp, e.g.,**:
@@ -865,9 +889,19 @@ Running on local URL: http://127.0.0.1:7860
865
889
  **NOTE**: The local URL can be different, please check the logs
866
890
 
867
891
  5. **To stop the app**:
892
+
893
+ For the Diffusers app:
868
894
  ```bash
869
895
  docker compose -f docker/compose_diffusers.yaml down
870
896
  ```
897
+ For the Grounding-Dino app:
898
+ ```bash
899
+ docker compose -f docker/compose_vision.yaml down
900
+ ```
901
+ For the PaliGemma app:
902
+ ```bash
903
+ docker compose -f docker/compose_pali_gemma.yaml down
904
+ ```
871
905
  </details>
872
906
 
873
907
  <details>
@@ -886,19 +920,23 @@ uv pip install sinapsis-huggingface[all] --extra-index-url https://pypi.sinapsis
886
920
  ```
887
921
  3. Run the webapp.
888
922
 
923
+ For the Diffusers app:
889
924
  ```bash
890
925
  uv run webapps/diffusers_demo.py
891
926
  ```
892
-
893
- 4. The terminal will display the URL to access the webapp, e.g., :
927
+ For the Grounding-Dino app:
894
928
  ```bash
895
- Running on local URL: http://127.0.0.1:7860
929
+ uv run webapps/vision_demo.py
930
+ ```
931
+ For the PaliGemma app:
932
+ ```bash
933
+ export HF_TOKEN="your_huggingface_token"
934
+ uv run webapps/paligemma_demo.py
896
935
  ```
897
936
 
898
- **NOTE**: If you want to try the vision app, in step 5 change the command to:
899
-
937
+ 4. The terminal will display the URL to access the webapp, e.g.:
900
938
  ```bash
901
- python webapps/vision_demo.py
939
+ Running on local URL: http://127.0.0.1:7860
902
940
  ```
903
941
 
904
942
  </details>
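The PaliGemma webapp introduced above needs a valid Hugging Face access token. A minimal sketch of how the token could be checked before launching the demo, assuming the `huggingface_hub` client that ships with `transformers` (the token value is a placeholder):

```python
# Sketch (assumed workflow): confirm the Hugging Face token is valid before running
# webapps/paligemma_demo.py, mirroring the `export HF_TOKEN=...` step in the README above.
import os

from huggingface_hub import whoami

os.environ["HF_TOKEN"] = "your_huggingface_token"  # placeholder, same variable the app reads
print(whoami(token=os.environ["HF_TOKEN"])["name"])  # prints your account name if the token works
```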
@@ -1,4 +1,4 @@
1
- sinapsis_huggingface-0.1.0.dist-info/licenses/LICENSE,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
1
+ sinapsis_huggingface-0.2.0.dist-info/licenses/LICENSE,sha256=hIahDEOTzuHCU5J2nd07LWwkLW7Hko4UFO__ffsvB-8,34523
2
2
  sinapsis_huggingface_diffusers/src/sinapsis_huggingface_diffusers/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
3
3
  sinapsis_huggingface_diffusers/src/sinapsis_huggingface_diffusers/templates/__init__.py,sha256=9FHbS4hse9WIE-1a5jJlG-23gB3wahlULANJAWQ464c,947
4
4
  sinapsis_huggingface_diffusers/src/sinapsis_huggingface_diffusers/templates/base_diffusers.py,sha256=bJOF3w4iwd9dwtwgvaN9tIlBYpgpFL-AIM1u1Zg3Cys,8248
@@ -20,14 +20,19 @@ sinapsis_huggingface_grounding_dino/src/sinapsis_huggingface_grounding_dino/temp
20
20
  sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
21
21
  sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/helpers/__init__.py,sha256=RYEd6xTaVlItleSPoq9RVJIFgXfY6aOHqy2SIO7zwjc,168
22
22
  sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/helpers/text_to_sentences.py,sha256=teaJXoTAVzGwar9gxenBabkA9VBJd-VAxsNXlzkKMuU,1676
23
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/__init__.py,sha256=9DyCy0TMGrSwgzoa0z_xwH6idpbTwSz7yyR4kKuLEY0,852
24
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/base_transformers.py,sha256=ExoV83NIjBnJZGbWLsFpb3bcTjzhTPGDXbTaSAuAP-Q,5235
25
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/image_to_text_transformers.py,sha256=LGiKWlkATlmOGht-6CNRfHHc94fSSUIZC8Zosu7Qq3Y,2571
26
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/speech_to_text_transformers.py,sha256=eiK7Mfrbpx5qmWaS0xi3nx7cX4ngmy2pN0sWXQA90P4,2318
27
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/summarization_transformers.py,sha256=ZcvISBOfnSyjiiEDoDvejm6dh06MxgvIOGzYAseql6o,1974
28
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/text_to_speech_transformers.py,sha256=FW73tLKORq5jMpJHYWufQV5j68nQG5viCI_zoMyL4Fk,5805
29
- sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/translation_transformers.py,sha256=TmiQ3yqY8GBxTjPvpAdxp2jqFbESHr0mHRZ7SqjuCVU,2506
30
- sinapsis_huggingface-0.1.0.dist-info/METADATA,sha256=ToTFGyWmCAN8X5NtN_ABswlsJAipHIDJmJEsoXqyaw4,50408
31
- sinapsis_huggingface-0.1.0.dist-info/WHEEL,sha256=1tXe9gY0PYatrMPMDd6jXqjfpz_B-Wqm32CPfRC58XU,91
32
- sinapsis_huggingface-0.1.0.dist-info/top_level.txt,sha256=ZxHwnMjSWRceQL_6-B7GJBPxQWdlwkba-SYMVufhj5s,133
33
- sinapsis_huggingface-0.1.0.dist-info/RECORD,,
23
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/__init__.py,sha256=3BgUm6C_tRgzxh2ADMBcu6OHzR-U5Tl1eFVtU0PwxB0,1095
24
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/base_transformers.py,sha256=SJTfLHIkNidTQeh_EXdzKXEHnszWEdxiZ2F2dc0HGPc,5658
25
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/image_to_text_transformers.py,sha256=y4rOh4yYssM2PBIFdXJswPHOs0Y9sb5Bp7TSDbsCwGE,2601
26
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/speech_to_text_transformers.py,sha256=N_zQiWHcz6LiQjOfJdeOdauwxrlqI9O360v5GVE3TwQ,2348
27
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/summarization_transformers.py,sha256=b7GoMba6exEdCq9q6rOrDjoL7blxq8DKpQ_fCiOvwVM,2004
28
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/text_to_speech_transformers.py,sha256=PUpS4Kohe8D_5E5RYnYcEqFZd-koFBkm0r1Sihe_b70,5835
29
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/translation_transformers.py,sha256=okMCToQpqcKs4Y2gHyppJ6p4A3pm0drInqUMvSQw1jk,2536
30
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/pali_gemma/pali_gemma_base.py,sha256=O6pFLLR4uLtnMCV1Pn5HS7_Ab51vXdv3lkaI-UeCOsA,3707
31
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/pali_gemma/pali_gemma_detection.py,sha256=NrWxI8k7oVlLyaf7FjUuSc6J4eXK3ngM8RZwPb6tLL0,4122
32
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/templates/pali_gemma/pali_gemma_inference.py,sha256=5RBXPUgxOBEM4UHrwfcmPW5dmrktDy44pXVkce8piRs,10387
33
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/thirdparty/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
34
+ sinapsis_huggingface_transformers/src/sinapsis_huggingface_transformers/thirdparty/helpers.py,sha256=IGeYd5U2xpimpwTQW_5xm1pUYB5tqHlpq-fjwBHI4gY,2187
35
+ sinapsis_huggingface-0.2.0.dist-info/METADATA,sha256=7riD8-0RoTZkA4-zBD_e1tvaeQKlAjHQAw998kXel80,51211
36
+ sinapsis_huggingface-0.2.0.dist-info/WHEEL,sha256=CmyFI0kx5cdEMTLiONQRbGQwjIoR1aIYB7eCAQ4KPJ0,91
37
+ sinapsis_huggingface-0.2.0.dist-info/top_level.txt,sha256=ZxHwnMjSWRceQL_6-B7GJBPxQWdlwkba-SYMVufhj5s,133
38
+ sinapsis_huggingface-0.2.0.dist-info/RECORD,,
@@ -1,5 +1,5 @@
1
1
  Wheel-Version: 1.0
2
- Generator: setuptools (77.0.3)
2
+ Generator: setuptools (78.1.0)
3
3
  Root-Is-Purelib: true
4
4
  Tag: py3-none-any
5
5
 
@@ -6,6 +6,9 @@ _root_lib_path = "sinapsis_huggingface_transformers.templates"
6
6
 
7
7
  _template_lookup = {
8
8
  "ImageToTextTransformers": f"{_root_lib_path}.image_to_text_transformers",
9
+ "PaliGemmaDetection": f"{_root_lib_path}.pali_gemma.pali_gemma_detection",
10
+ "PaliGemmaInference": f"{_root_lib_path}.pali_gemma.pali_gemma_inference",
11
+ "PaliGemmaSegmentation": f"{_root_lib_path}.pali_gemma.pali_gemma_segmentation",
9
12
  "SpeechToTextTransformers": f"{_root_lib_path}.speech_to_text_transformers",
10
13
  "SummarizationTransformers": f"{_root_lib_path}.summarization_transformers",
11
14
  "TextToSpeechTransformers": f"{_root_lib_path}.text_to_speech_transformers",
@@ -63,6 +63,17 @@ class TransformersBase(Template):
63
63
  self._TORCH_DTYPE = {"float16": torch.float16, "float32": torch.float32}
64
64
  self.task: str | None = None
65
65
  self._set_seed()
66
+
67
+ def setup_pipeline(self) -> None:
68
+ """Initialize and configure the HuggingFace Transformers processing pipeline.
69
+
70
+ Raises:
71
+ ValueError: If called before the task attribute is set. The task must be
72
+ defined by the child class before pipeline initialization.
73
+ """
74
+ if self.task is None:
75
+ raise ValueError("'task' must be assigned before pipeline setup")
76
+
66
77
  self.processor = self._initialize_processor()
67
78
  self.pipeline = self.initialize_pipeline()
68
79
 
@@ -38,6 +38,7 @@ class ImageToTextTransformers(TransformersBase):
38
38
  def __init__(self, attributes: TemplateAttributeType) -> None:
39
39
  super().__init__(attributes)
40
40
  self.task = "image-to-text"
41
+ self.setup_pipeline()
41
42
 
42
43
  @staticmethod
43
44
  def _convert_to_pil(image_content: Image.Image | np.ndarray) -> Image.Image:
@@ -0,0 +1,97 @@
1
+ # -*- coding: utf-8 -*-
2
+ from abc import abstractmethod
3
+ from typing import Any, ClassVar, Literal
4
+
5
+ import torch
6
+ from sinapsis_core.data_containers.data_packet import DataContainer
7
+ from sinapsis_core.template_base import (
8
+ Template,
9
+ TemplateAttributes,
10
+ TemplateAttributeType,
11
+ )
12
+ from sinapsis_core.utils.env_var_keys import SINAPSIS_CACHE_DIR
13
+ from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
14
+
15
+
16
+ class PaliGemmaBaseAttributes(TemplateAttributes):
17
+ """Base attributes for PaliGemma models.
18
+
19
+ Attributes:
20
+ model_path (str): Path to the pretrained PaliGemma model. Can be either:
21
+ - A Hugging Face model identifier (e.g. 'google/paligemma-3b-mix-224')
22
+ - A local directory path containing the model files
23
+ processor_path (str): Path to the model processor/tokenizer. Can be either:
24
+ - A Hugging Face model identifier
25
+ - A local directory path containing the processor files
26
+ model_cache_dir (str): Directory for caching model files when downloading from Hugging Face.
27
+ device (Literal["cuda", "cpu"]): Device to run the model on. Defaults to cpu.
28
+ max_new_tokens (int): Maximum number of tokens to generate. Defaults to 200.
29
+ torch_dtype (Literal["float16", "float32"]): Model precision type. Defaults to float16.
30
+ """
31
+
32
+ model_path: str
33
+ processor_path: str
34
+ model_cache_dir: str = str(SINAPSIS_CACHE_DIR)
35
+ device: Literal["cuda", "cpu"] = "cpu"
36
+ max_new_tokens: int = 200
37
+ torch_dtype: Literal["float16", "float32"] = "float16"
38
+
39
+
40
+ class PaliGemmaBase(Template):
41
+ """Base class for PaliGemma implementations."""
42
+
43
+ AttributesBaseModel = PaliGemmaBaseAttributes
44
+ CATEGORY = "Transformers"
45
+ _TORCH_DTYPE: ClassVar[dict[str, Any]] = {"float16": torch.float16, "float32": torch.float32}
46
+
47
+ def __init__(self, attributes: TemplateAttributeType) -> None:
48
+ super().__init__(attributes)
49
+ self.model = self._setup_model()
50
+ self.processor = self._setup_processor()
51
+
52
+ def _setup_model(
53
+ self,
54
+ ) -> PaliGemmaForConditionalGeneration:
55
+ """Initialize model with proper device placement and precision settings.
56
+
57
+ Handles the loading of model components, configuring
58
+ it according to the specified device and precision requirements.
59
+
60
+ Returns:
61
+ PaliGemmaForConditionalGeneration: Initialized and configured model.
62
+ """
63
+
64
+ model = PaliGemmaForConditionalGeneration.from_pretrained(
65
+ self.attributes.model_path,
66
+ cache_dir=self.attributes.model_cache_dir,
67
+ torch_dtype=self._TORCH_DTYPE.get(self.attributes.torch_dtype),
68
+ ).to(self.attributes.device)
69
+
70
+ return model
71
+
72
+ def _setup_processor(self) -> AutoProcessor:
73
+ """Initialize processor with proper device placement and precision settings.
74
+
75
+ Handles the loading of processor components, configuring
76
+ it according to the specified cache and precision requirements.
77
+
78
+ Returns:
79
+ AutoProcessor: Initialized and configured processor.
80
+ """
81
+ processor = AutoProcessor.from_pretrained(
82
+ self.attributes.processor_path,
83
+ cache_dir=self.attributes.model_cache_dir,
84
+ torch_dtype=self._TORCH_DTYPE.get(self.attributes.torch_dtype),
85
+ )
86
+ return processor
87
+
88
+ @abstractmethod
89
+ def execute(self, container: DataContainer) -> DataContainer:
90
+ """Execute method to be implemented by child classes.
91
+
92
+ Args:
93
+ container (DataContainer): The input data container to be processed.
94
+
95
+ Returns:
96
+ DataContainer: The processed container with model outputs.
97
+ """
@@ -0,0 +1,124 @@
1
+ # -*- coding: utf-8 -*-
2
+ from dataclasses import dataclass
3
+
4
+ from sinapsis_core.data_containers.annotations import ImageAnnotations
5
+ from sinapsis_core.template_base import TemplateAttributeType
6
+ from sinapsis_huggingface_transformers.templates.pali_gemma.pali_gemma_inference import (
7
+ PaliGemmaInference,
8
+ PaliGemmaInferenceAttributes,
9
+ )
10
+ from sinapsis_huggingface_transformers.thirdparty.helpers import (
11
+ get_matches,
12
+ parse_label,
13
+ parse_location_tokens,
14
+ )
15
+
16
+
17
+ @dataclass(frozen=True)
18
+ class PaliGemmaDetectionKeys:
19
+ "Keys to use during detection"
20
+
21
+ detection_prompt: str = "detect {}"
22
+
23
+
24
+ class PaliGemmaDetectionAttributes(PaliGemmaInferenceAttributes):
25
+ """Configuration attributes for PaliGemma object detection tasks.
26
+
27
+ This class extends the base inference attributes to handle object detection specific configurations.
28
+
29
+ Attributes:
30
+ objects_to_detect (str | list[str]): Target objects to detect, can be a single string or list of strings
31
+ """
32
+
33
+ objects_to_detect: str | list[str]
34
+
35
+
36
+ class PaliGemmaDetection(PaliGemmaInference):
37
+ """Implementation of PaliGemma object detection pipeline.
38
+
39
+ The template inherits functionality from its base class, extending
40
+ the functionality to run inference on an image and to identify
41
+ the objects from the attributes.
42
+
43
+ Usage example:
44
+
45
+ agent:
46
+ name: my_test_agent
47
+ templates:
48
+ - template_name: InputTemplate
49
+ class_name: InputTemplate
50
+ attributes: {}
51
+ - template_name: PaliGemmaDetection
52
+ class_name: PaliGemmaDetection
53
+ template_input: InputTemplate
54
+ attributes:
55
+ model_path: '/path/to/paligemma/model'
56
+ processor_path: '/path/to/processor'
57
+ model_cache_dir: /path/to/cache/dir
58
+ device: 'cuda'
59
+ max_new_tokens: 200
60
+ torch_dtype: float16
61
+ prompt: <image> caption en
62
+ objects_to_detect: 'object to detect'
63
+
64
+ """
65
+
66
+ AttributesBaseModel = PaliGemmaDetectionAttributes
67
+ KEYS = PaliGemmaDetectionKeys
68
+
69
+ def __init__(self, attributes: TemplateAttributeType) -> None:
70
+ super().__init__(attributes)
71
+
72
+ objects_str = self.initialize_objects_str()
73
+ self.prompt = self.KEYS.detection_prompt.format(objects_str)
74
+
75
+ def initialize_objects_str(self) -> str:
76
+ """
77
+ Build the objects-to-detect string in the format expected by the detection prompt.
78
+
79
+ Returns:
80
+ str: String listing the objects to include in the detection prompt.
81
+ """
82
+
83
+ if isinstance(self.attributes.objects_to_detect, str):
84
+ return self.attributes.objects_to_detect
85
+ return "; ".join(self.attributes.objects_to_detect)
86
+
87
+ def _format_text_for_prompt(self, text: str) -> str:
88
+ """Formats input text as a detection prompt.
89
+
90
+ Args:
91
+ text (str): Raw text content (expected to be objects to detect)
92
+
93
+ Returns:
94
+ str: Formatted detection prompt
95
+ """
96
+ return self.KEYS.detection_prompt.format(text)
97
+
98
+ def _create_annotation(
99
+ self, caption: str, confidence: float, image_shape: tuple[int, ...]
100
+ ) -> list[ImageAnnotations]:
101
+ """Creates structured annotations from detection model outputs.
102
+
103
+ Processes the model's output caption to extract bounding box coordinates
104
+ and object labels for each detected instance.
105
+
106
+ Args:
107
+ caption (str): Raw detection output from the model
108
+ confidence (float): Confidence score for the predictions
109
+ image_shape (tuple[int, ...]): Dimensions of the input image (height, width)
110
+
111
+ Returns:
112
+ list[ImageAnnotations]: List of annotations containing bounding boxes and labels
113
+ for each detected object
114
+ """
115
+ annotations = []
116
+ matches = get_matches(caption)
117
+
118
+ for match_coord in matches:
119
+ coords = parse_location_tokens(match_coord, image_shape)
120
+ label = parse_label(match_coord)
121
+ annotation = self.create_bbox_annotation(coords, label, confidence)
122
+ annotations.append(annotation)
123
+
124
+ return annotations
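To make the detection prompt handling above concrete, a short sketch of what `initialize_objects_str` and `PaliGemmaDetectionKeys.detection_prompt` produce (the object list is an assumed example value):

```python
# Sketch: how PaliGemmaDetection builds its prompt from the objects_to_detect attribute.
objects_to_detect = ["cat", "dog"]             # assumed example; a plain string is used as-is
objects_str = "; ".join(objects_to_detect)     # initialize_objects_str for a list input
prompt = "detect {}".format(objects_str)       # PaliGemmaDetectionKeys.detection_prompt
print(prompt)                                  # -> detect cat; dog
# PaliGemma answers such prompts with location tokens like
# "<loc0100><loc0250><loc0750><loc0900> cat ; ...", which _create_annotation parses
# with the thirdparty helpers shown later in this diff.
```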
@@ -0,0 +1,260 @@
1
+ # -*- coding: utf-8 -*-
2
+ import numpy as np
3
+ import torch
4
+ from sinapsis_core.data_containers.annotations import BoundingBox, ImageAnnotations
5
+ from sinapsis_core.data_containers.data_packet import DataContainer, ImagePacket
6
+ from sinapsis_core.template_base import TemplateAttributeType
7
+ from sinapsis_data_visualization.helpers.detection_utils import bbox_xyxy_to_xywh
8
+ from sinapsis_huggingface_transformers.templates.pali_gemma.pali_gemma_base import (
9
+ PaliGemmaBase,
10
+ PaliGemmaBaseAttributes,
11
+ )
12
+ from transformers.generation.utils import GenerateOutput
13
+
14
+
15
+ class PaliGemmaInferenceAttributes(PaliGemmaBaseAttributes):
16
+ """Configuration attributes for PaliGemma inference.
17
+
18
+ Attributes:
19
+ prompt (str): Prompt to run the inference (default: "<image>caption en")
20
+
21
+ The <image> token is essential as it serves as a marker that tells the model where to look at the image
22
+ when processing the input. This token enables the model to understand the relationship between the visual
23
+ and textual components during processing.
24
+
25
+ Example prompts:
26
+ - "<image>caption en" -> Generates a basic caption in English
27
+ - "<image>What objects can you see in this image?" -> Lists objects in the image
28
+ """
29
+
30
+ prompt: str = "<image>caption en"
31
+
32
+
33
+ class PaliGemmaInference(PaliGemmaBase):
34
+ """Implementation of PaliGemma inference pipeline for image processing and caption generation.
35
+
36
+ This class handles the inference process for PaliGemma models, including image processing,
37
+ caption generation, and annotation creation. It supports both basic captioning and
38
+ detection/segmentation tasks.
39
+
40
+ Usage example:
41
+
42
+ agent:
43
+ name: my_test_agent
44
+ templates:
45
+ - template_name: InputTemplate
46
+ class_name: InputTemplate
47
+ attributes: {}
48
+ - template_name: PaliGemmaInference
49
+ class_name: PaliGemmaInference
50
+ template_input: InputTemplate
51
+ attributes:
52
+ model_path: '/path/to/paligemma/model'
53
+ processor_path: '/path/to/processor'
54
+ model_cache_dir: /path/to/cache/dir
55
+ device: 'cuda'
56
+ max_new_tokens: 200
57
+ torch_dtype: float16
58
+ prompt: <image> caption en
59
+
60
+ """
61
+
62
+ AttributesBaseModel = PaliGemmaInferenceAttributes
63
+ INPUT_IDS = "input_ids"
64
+
65
+ def __init__(self, attributes: TemplateAttributeType) -> None:
66
+ super().__init__(attributes)
67
+ self.prompt = self.attributes.prompt
68
+
69
+ def _prepare_inputs(self, image_content: np.ndarray) -> dict:
70
+ """Prepares the input for model inference by processing the image and text prompt.
71
+
72
+ Args:
73
+ image_content (np.ndarray): Raw image content to be processed as a numpy array
74
+
75
+ Returns:
76
+ dict: Processed inputs containing:
77
+ - input_ids (torch.Tensor): Token IDs for the text prompt and image tokens
78
+ - attention_mask (torch.Tensor): Binary mask indicating valid input positions (1s)
79
+ - pixel_values (torch.Tensor): Processed image tensor with normalized pixel values
80
+ in shape (batch_size, channels, height, width)
81
+
82
+ Note:
83
+ - The returned values are PyTorch tensors because the processor is called with return_tensors="pt"
84
+ """
85
+
86
+ return self.processor(
87
+ images=image_content,
88
+ text=self.prompt,
89
+ return_tensors="pt",
90
+ ).to(self.attributes.device)
91
+
92
+ def _generate_caption(self, inputs: dict) -> GenerateOutput:
93
+ """Generates caption using the model.
94
+
95
+ Args:
96
+ inputs (dict): Model inputs produced by the processor, including input IDs for the image and prompt
97
+
98
+ Returns:
99
+ GenerateOutput: A structured output containing:
100
+ - sequences: tensor with token IDs of the generated sequence
101
+ - scores: tuple of tensors with token prediction scores for each generation step
102
+ - logits: optional tensor with raw logits (None in this configuration)
103
+ - attentions: optional attention weights (None in this configuration)
104
+ - hidden_states: optional hidden states (None in this configuration)
105
+ - past_key_values: tuple of tensors containing past keys/values for attention mechanism
106
+
107
+ Configuration parameters:
108
+ - max_new_tokens: Maximum number of new tokens to generate
109
+ - return_dict_in_generate: Returns output as a structured dictionary
110
+ - output_scores: Includes prediction scores in the output
111
+ """
112
+ with torch.no_grad():
113
+ return self.model.generate(
114
+ **inputs,
115
+ max_new_tokens=self.attributes.max_new_tokens,
116
+ return_dict_in_generate=True,
117
+ output_scores=True,
118
+ )
119
+
120
+ @staticmethod
121
+ def _calculate_confidence_score(outputs: GenerateOutput) -> float:
122
+ """Calculates the confidence score from model generation outputs.
123
+
124
+ The confidence score is computed as the mean of the highest probability
125
+ for each generated token in the sequence.
126
+
127
+ Args:
128
+ outputs (GenerateOutput): Model generation output containing scores
129
+ for each generated token
130
+
131
+ Returns:
132
+ float: Average confidence score across all generated tokens
133
+ """
134
+ scores = torch.stack(outputs.scores)
135
+ probs = torch.softmax(scores, dim=-1)
136
+ token_confidences = torch.max(probs, dim=-1).values
137
+ return float(torch.mean(token_confidences).cpu())
138
+
139
+ def _decode_caption(self, outputs: GenerateOutput, input_len: int) -> str:
140
+ """Decodes the model output sequences into readable caption text.
141
+
142
+ Args:
143
+ outputs (GenerateOutput): Model generation output containing the
144
+ generated token sequences
145
+ input_len (int): Length of the input sequence to skip initial tokens
146
+
147
+ Returns:
148
+ str: Decoded caption text with special tokens removed
149
+ """
150
+ return self.processor.decode(outputs.sequences[0][input_len:], skip_special_tokens=True)
151
+
152
+ def _create_annotation(
153
+ self, caption: str, confidence: float, image_shape: tuple[int, ...]
154
+ ) -> list[ImageAnnotations]:
155
+ """Creates image annotations from the generated caption.
156
+
157
+ Args:
158
+ caption (str): Generated caption text
159
+ confidence (float): Confidence score for the prediction
160
+ image_shape (tuple[int, ...]): Shape of the input image
161
+
162
+ Returns:
163
+ list[ImageAnnotations]: List containing annotation with caption information
164
+ """
165
+
166
+ _, _ = self, image_shape
167
+ return [ImageAnnotations(text=caption, confidence_score=confidence)]
168
+
169
+ def _process_single_image(self, image_packet: ImagePacket) -> None:
170
+ """Processes a single image through the inference pipeline.
171
+
172
+ Args:
173
+ image_packet (ImagePacket): Container with image data and metadata
174
+
175
+ Returns:
176
+ None: Modifies the image_packet in place by adding annotations
177
+ """
178
+ inputs = self._prepare_inputs(image_packet.content)
179
+ outputs = self._generate_caption(inputs)
180
+ input_len = inputs[self.INPUT_IDS].shape[-1]
181
+ caption = self._decode_caption(outputs, input_len)
182
+ confidence = self._calculate_confidence_score(outputs)
183
+ annotations = self._create_annotation(caption, confidence, image_packet.content.shape)
184
+ image_packet.annotations.extend(annotations)
185
+
186
+ def _format_text_for_prompt(self, text: str) -> str:
187
+ """Formats the incoming text appropriately for the current task type.
188
+ Base implementation returns the text as-is, subclasses may override
189
+ to apply task-specific formatting.
190
+ Args:
191
+ text (str): Raw text content
192
+ Returns:
193
+ str: Formatted prompt text
194
+ """
195
+ _ = self
196
+ return text
197
+
198
+ def process_from_text_packet(self, container: DataContainer) -> None:
199
+ """
200
+ Extract prompts from the received list of text packets and use them to perform inference on each received image
201
+ packet.
202
+
203
+ Args:
204
+ container (DataContainer): Data-container with text and image packets to be processed.
205
+ """
206
+ for text_packet in container.texts:
207
+ self.prompt = self._format_text_for_prompt(text_packet.content)
208
+ if container.images:
209
+ for image_packet in container.images:
210
+ self._process_single_image(image_packet)
211
+
212
+ def process_from_prompt(self, container: DataContainer) -> None:
213
+ """
214
+ Perform inference on each received image packet using the prompt defined in the template attributes.
215
+
216
+ Args:
217
+ container (DataContainer): Data-container with image packets to be processed.
218
+ """
219
+ if container.images:
220
+ for image_packet in container.images:
221
+ self._process_single_image(image_packet)
222
+
223
+ def execute(self, container: DataContainer) -> DataContainer:
224
+ """Executes the inference pipeline on a batch of images.
225
+
226
+ If text packets are present, uses each text as input for prompt formatting.
227
+ If no text packets exist, uses the default prompt from attributes.
228
+
229
+ Args:
230
+ container (DataContainer): Container with text and image packets
231
+
232
+ Returns:
233
+ DataContainer: Processed container with added annotations
234
+ """
235
+ if container.texts:
236
+ self.process_from_text_packet(container)
237
+ else:
238
+ self.process_from_prompt(container)
239
+
240
+ return container
241
+
242
+ @staticmethod
243
+ def create_bbox_annotation(coords: tuple[float, ...], label: str, confidence: float) -> ImageAnnotations:
244
+ """Creates bounding box annotation from coordinates and metadata.
245
+
246
+ Args:
247
+ coords (tuple[float, ...]): Coordinates (x0, y0, x1, y1)
248
+ label (str): Label for the detected object
249
+ confidence (float): Confidence score for the detection
250
+
251
+ Returns:
252
+ ImageAnnotations: Annotation object with bounding box information
253
+ """
254
+ x0, y0, x1, y1 = coords
255
+ x, y, w, h = bbox_xyxy_to_xywh([x0, y0, x1, y1])
256
+ return ImageAnnotations(
257
+ label_str=label,
258
+ confidence_score=confidence,
259
+ bbox=BoundingBox(x=x, y=y, w=w, h=h),
260
+ )
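For reference, a self-contained sketch of the confidence computation performed by `_calculate_confidence_score` above, run on two made-up score tensors (toy values, not real model output):

```python
# Toy illustration: softmax the per-step logits, take the best token probability
# at each generation step, and average the result into a single confidence score.
import torch

scores = (
    torch.tensor([[2.0, 0.5, -1.0]]),  # assumed logits for generation step 1 (batch=1, vocab=3)
    torch.tensor([[0.1, 3.0, 0.2]]),   # assumed logits for generation step 2
)
stacked = torch.stack(scores)                         # (steps, batch, vocab)
probs = torch.softmax(stacked, dim=-1)                # per-step probability distributions
token_confidences = torch.max(probs, dim=-1).values   # highest probability at each step
confidence = float(torch.mean(token_confidences))     # average over steps, roughly 0.84 here
print(confidence)
```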
@@ -39,6 +39,7 @@ class SpeechToTextTransformers(TransformersBase):
39
39
  def __init__(self, attributes: TemplateAttributeType) -> None:
40
40
  super().__init__(attributes)
41
41
  self.task = "automatic-speech-recognition"
42
+ self.setup_pipeline()
42
43
 
43
44
  def transformation_method(self, container: DataContainer) -> DataContainer:
44
45
  """Speech recognition (speech-to-text) using a Transformers Pipeline.
@@ -38,6 +38,7 @@ class SummarizationTransformers(TransformersBase):
38
38
  def __init__(self, attributes: TemplateAttributeType) -> None:
39
39
  super().__init__(attributes)
40
40
  self.task = "summarization"
41
+ self.setup_pipeline()
41
42
 
42
43
  def transformation_method(self, container: DataContainer) -> DataContainer:
43
44
  """Summarize text using a Transformers Pipeline.
@@ -64,8 +64,9 @@ class TextToSpeechTransformers(TransformersBase):
64
64
 
65
65
  def __init__(self, attributes: TemplateAttributeType) -> None:
66
66
  super().__init__(attributes)
67
- self.sample_rate = self._get_sample_rate()
68
67
  self.task = "text-to-speech"
68
+ self.setup_pipeline()
69
+ self.sample_rate = self._get_sample_rate()
69
70
 
70
71
  def _get_sample_rate(self) -> int:
71
72
  """Retrieve the sample rate for the generated audio.
@@ -56,6 +56,7 @@ class TranslationTransformers(TransformersBase):
56
56
  def __init__(self, attributes: TemplateAttributeType) -> None:
57
57
  super().__init__(attributes)
58
58
  self.task = f"translation_{self.attributes.source_language}_to_{self.attributes.target_language}"
59
+ self.setup_pipeline()
59
60
 
60
61
  def transformation_method(self, container: DataContainer) -> DataContainer:
61
62
  """Translate text using a Transformers Pipeline.
@@ -0,0 +1,70 @@
1
+ # -*- coding: utf-8 -*-
2
+ """
3
+ The constants and methods declared in this file are inspired by the following source:
4
+
5
+ https://github.com/google/generative-ai-docs/blob/main/site/en/gemma/docs/paligemma/inference-with-keras.ipynb
6
+
7
+ which is Licensed under the Apache License, Version 2.0.
8
+
9
+ """
10
+
11
+ import numpy as np
12
+ import regex as re
13
+
14
+ COORDS_PATTERN: str = r"<loc(?P<y0>\d\d\d\d)><loc(?P<x0>\d\d\d\d)><loc(?P<y1>\d\d\d\d)><loc(?P<x1>\d\d\d\d)>"
15
+ LABEL_PATTERN: str = r" (?P<label>.+?)( ;|$)"
16
+
17
+ DETECTION_PATTERN: str = COORDS_PATTERN + LABEL_PATTERN
18
+
19
+ LOCATION_KEYS: tuple[str, ...] = ("y0", "x0", "y1", "x1")
20
+ LOCATION_SCALE: float = 1024.0
21
+
22
+
23
+ def parse_location_tokens(match_coord: re.Match, image_shape: tuple[int, ...]) -> np.ndarray:
24
+ """Parses location tokens from model output into normalized coordinates.
25
+
26
+ Args:
27
+ match_coord (re.Match): Regex match containing the location tokens
28
+ image_shape (tuple[int, ...]): Shape of the input image
29
+
30
+ Returns:
31
+ np.ndarray: Pixel coordinates (x0, y0, x1, y1) scaled to the image size
32
+ """
33
+ match_dict = match_coord.groupdict()
34
+ x0 = float(match_dict[LOCATION_KEYS[1]]) / LOCATION_SCALE * image_shape[1]
35
+ y0 = float(match_dict[LOCATION_KEYS[0]]) / LOCATION_SCALE * image_shape[0]
36
+ x1 = float(match_dict[LOCATION_KEYS[3]]) / LOCATION_SCALE * image_shape[1]
37
+ y1 = float(match_dict[LOCATION_KEYS[2]]) / LOCATION_SCALE * image_shape[0]
38
+ return np.array([x0, y0, x1, y1])
39
+
40
+
41
+ def parse_label(match_coord: re.Match) -> str:
42
+ """
43
+ Retrieves detection label from a regex Match object.
44
+
45
+
46
+ Args:
47
+ match_coord (Match): The Match object containing the label information.
48
+
49
+ Returns:
50
+ str: The detection label.
51
+ """
52
+ label = match_coord.groupdict().get("label")
53
+ if label is None:
54
+ return ""
55
+ return label.strip()
56
+
57
+
58
+ def get_matches(caption: str) -> re.Scanner:
59
+ """
60
+ Creates an iterable containing all the detection matches found in the
61
+ produced model caption.
62
+
63
+ Args:
64
+ caption (str): The caption produced by the PaliGemma model.
65
+
66
+ Returns:
67
+ Scanner: An iterable object containing all the regex matches.
68
+ """
69
+
70
+ return re.finditer(DETECTION_PATTERN, caption)
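Taken together, a short sketch of how these helpers turn a PaliGemma detection caption into pixel-space boxes (the caption and image shape are assumed example values; the import path matches the one used in pali_gemma_detection.py):

```python
# Sketch: parse an example detection caption with the helpers defined above.
from sinapsis_huggingface_transformers.thirdparty.helpers import (
    get_matches,
    parse_label,
    parse_location_tokens,
)

caption = "<loc0100><loc0250><loc0750><loc0900> cat"  # loc tokens are y0, x0, y1, x1 in the 0-1023 range
image_shape = (480, 640, 3)                           # (height, width, channels)

for match in get_matches(caption):
    coords = parse_location_tokens(match, image_shape)  # x scaled by width/1024, y by height/1024
    label = parse_label(match)
    print(label, coords)  # cat [156.25  46.875  562.5  351.5625] as pixel (x0, y0, x1, y1)
```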