kmodels 0.2.2__tar.gz → 0.2.3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {kmodels-0.2.2 → kmodels-0.2.3}/PKG-INFO +5 -2
- {kmodels-0.2.2 → kmodels-0.2.3}/README.md +4 -1
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/__init__.py +1 -1
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/__init__.py +2 -0
- kmodels-0.2.3/kmodels/models/deeplabv3/__init__.py +8 -0
- kmodels-0.2.3/kmodels/models/deeplabv3/deeplabv3_image_processor.py +186 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/detr/detr_image_processor.py +8 -5
- kmodels-0.2.3/kmodels/models/dfine/__init__.py +12 -0
- kmodels-0.2.3/kmodels/models/dfine/config.py +105 -0
- kmodels-0.2.3/kmodels/models/dfine/convert_dfine_hf_to_keras.py +509 -0
- kmodels-0.2.3/kmodels/models/dfine/dfine_image_processor.py +242 -0
- kmodels-0.2.3/kmodels/models/dfine/dfine_layers.py +692 -0
- kmodels-0.2.3/kmodels/models/dfine/dfine_model.py +1679 -0
- kmodels-0.2.3/kmodels/models/maxvit/__init__.py +7 -0
- kmodels-0.2.3/kmodels/models/maxvit/config.py +108 -0
- kmodels-0.2.3/kmodels/models/maxvit/convert_maxvit_timm_to_keras.py +173 -0
- kmodels-0.2.3/kmodels/models/maxvit/maxvit_layers.py +503 -0
- kmodels-0.2.3/kmodels/models/maxvit/maxvit_model.py +711 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rf_detr/rf_detr_image_processor.py +7 -4
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rt_detr/rt_detr_image_processor.py +7 -4
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/segformer/__init__.py +3 -0
- kmodels-0.2.3/kmodels/models/segformer/segformer_image_preprocessor.py +363 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/model_equivalence_tester.py +5 -5
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/version.py +1 -1
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels.egg-info/PKG-INFO +5 -2
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels.egg-info/SOURCES.txt +12 -0
- kmodels-0.2.2/kmodels/models/deeplabv3/__init__.py +0 -4
- kmodels-0.2.2/kmodels/models/segformer/segformer_image_preprocessor.py +0 -121
- {kmodels-0.2.2 → kmodels-0.2.3}/LICENSE +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/_test_runner.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/layers/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/layers/image_normalization.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/layers/layer_scale.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/layers/stochastic_depth.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/model_registry.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/cait/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/cait/cait_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/cait/cait_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/cait/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/cait/convert_cait_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/clip_image_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/clip_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/clip_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/clip_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/clip_tokenizer.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/clip/convert_clip_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convmixer/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convmixer/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convmixer/convert_convmixer_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convmixer/convmixer_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnext/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnext/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnext/convert_convnext_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnext/convnext_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnext/convnext_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnextv2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnextv2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnextv2/convert_convnextv2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/convnextv2/convnextv2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deeplabv3/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deeplabv3/convert_deeplabv3_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deeplabv3/deeplabv3_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deit/convert_deit_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/deit/deit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/densenet/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/densenet/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/densenet/convert_densenet_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/densenet/densenet_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/detr/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/detr/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/detr/convert_detr_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/detr/detr_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/detr/detr_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientformer/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientformer/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientformer/convert_efficientformer_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientformer/efficientformer_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientformer/efficientformer_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet/convert_efficientnet_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet/efficientnet_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet_lite/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet_lite/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet_lite/convert_efficientnet_lite_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnet_lite/efficientnet_lite_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnetv2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnetv2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnetv2/convert_efficientnetv2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/efficientnetv2/efficientnetv2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/eomt/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/eomt/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/eomt/convert_eomt_hf_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/eomt/eomt_image_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/eomt/eomt_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/eomt/eomt_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/flexivit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/flexivit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/flexivit/convert_flexivit_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/flexivit/flexivit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_next/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_next/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_next/convert_inception_next_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_next/inception_next_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_resnetv2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_resnetv2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_resnetv2/convert_inceptionresnetv2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inception_resnetv2/inceptionresnetv2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv3/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv3/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv3/convert_inceptionv3_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv3/inceptionv3_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv4/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv4/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv4/convert_inceptionv4_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/inceptionv4/inceptionv4_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mit/convert_mit_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mit/mit_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mit/mit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mlp_mixer/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mlp_mixer/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mlp_mixer/convert_mlpmixer_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mlp_mixer/mlp_mixer_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv2/convert_mobilenetv2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv2/mobilenetv2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv3/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv3/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv3/convert_mobilenetv3_keras_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilenetv3/mobilenetv3_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevit/convert_mobilevit_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevit/mobilevit_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevit/mobilevit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevitv2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevitv2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevitv2/convert_mobilevitv2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/mobilevitv2/mobilevitv2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/nextvit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/nextvit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/nextvit/convert_nextvit_timm_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/nextvit/nextvit_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/nextvit/nextvit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/pit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/pit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/pit/convert_pit_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/pit/pit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/poolformer/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/poolformer/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/poolformer/convert_poolformer_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/poolformer/poolformer_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/res2net/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/res2net/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/res2net/convert_res2net_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/res2net/res2net_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resmlp/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resmlp/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resmlp/convert_resmlp_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resmlp/resmlp_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resmlp/resmlp_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnet/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnet/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnet/convert_resnet_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnet/resnet_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnetv2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnetv2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnetv2/convert_resnetv2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnetv2/resnetv2_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnetv2/resnetv2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnext/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnext/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnext/convert_resnext_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/resnext/resnext_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rf_detr/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rf_detr/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rf_detr/convert_rf_detr_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rf_detr/rf_detr_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rf_detr/rf_detr_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rt_detr/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rt_detr/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rt_detr/convert_rt_detr_hf_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rt_detr/rt_detr_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/rt_detr/rt_detr_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam/convert_sam_hf_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam/sam_image_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam/sam_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam/sam_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam2/convert_sam2_hf_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam2/sam2_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/sam2/sam2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/segformer/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/segformer/convert_segformer_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/segformer/segformer_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/senet/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/senet/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/senet/convert_senet_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/senet/senet_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/convert_siglip_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/siglip_image_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/siglip_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/siglip_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/siglip_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip/siglip_tokenizer.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/convert_siglip2_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/siglip2_image_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/siglip2_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/siglip2_processor.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/siglip2/siglip2_tokenizer.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/swin/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/swin/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/swin/convert_swin_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/swin/swin_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/swin/swin_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vgg/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vgg/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vgg/convert_vgg_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vgg/vgg_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vit/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vit/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vit/convert_vit_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vit/vit_layers.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/vit/vit_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/xception/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/xception/config.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/xception/convert_xception_org_keras_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/models/xception/xception_model.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/__init__.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/custom_exception.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/file_downloader.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/model_weights_util.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/weight_split_torch_and_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels/utils/weight_transfer_torch_to_keras.py +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels.egg-info/dependency_links.txt +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels.egg-info/entry_points.txt +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels.egg-info/requires.txt +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/kmodels.egg-info/top_level.txt +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/pyproject.toml +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/setup.cfg +0 -0
- {kmodels-0.2.2 → kmodels-0.2.3}/tests/test_modelling.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: kmodels
|
|
3
|
-
Version: 0.2.2
|
|
3
|
+
Version: 0.2.3
|
|
4
4
|
Summary: Pretrained keras 3 vision models
|
|
5
5
|
Author-email: Gitesh Chawda <gitesh.ch.0912@gmail.com>
|
|
6
6
|
License: Apache License 2.0
|
|
@@ -39,7 +39,7 @@ Dynamic: license-file
|
|
|
39
39
|
|
|
40
40
|
## 📖 Introduction
|
|
41
41
|
|
|
42
|
-
Keras Models (kmodels) is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RF-DETR), segmentation (SAM, SAM2, SegFormer, DeepLabV3, EoMT), vision-language modeling (CLIP, SigLIP, SigLIP2), and more. kmodels includes custom layers and backbone support, providing flexibility and efficiency across various applications. For backbones, there are various weight variants like `in1k`, `in21k`, `fb_dist_in1k`, `ms_in22k`, `fb_in22k_ft_in1k`, `ns_jft_in1k`, `aa_in1k`, `cvnets_in1k`, `augreg_in21k_ft_in1k`, `augreg_in21k`, and many more.
|
|
42
|
+
Keras Models (kmodels) is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RF-DETR, D-FINE), segmentation (SAM, SAM2, SegFormer, DeepLabV3, EoMT), vision-language modeling (CLIP, SigLIP, SigLIP2), and more. It includes hybrid architectures like MaxViT alongside traditional CNNs and pure transformers. kmodels includes custom layers and backbone support, providing flexibility and efficiency across various applications. For backbones, there are various weight variants like `in1k`, `in21k`, `fb_dist_in1k`, `ms_in22k`, `fb_in22k_ft_in1k`, `ns_jft_in1k`, `aa_in1k`, `cvnets_in1k`, `augreg_in21k_ft_in1k`, `augreg_in21k`, and many more.
|
|
43
43
|
|
|
44
44
|
## ⚡ Installation
|
|
45
45
|
|
|
@@ -78,6 +78,7 @@ pip install -U git+https://github.com/IMvision12/keras-models
|
|
|
78
78
|
| [DETR](docs/detr.md) | End-to-end object detection with Transformers (ResNet-50/101 backbones) |
|
|
79
79
|
| [RT-DETR](docs/rt_detr.md) | Real-time DETR with ResNet-vd backbone and hybrid encoder (ResNet-18/34/50/101 variants) |
|
|
80
80
|
| [RF-DETR](docs/rf_detr.md) | Real-time detection transformer (Nano, Small, Medium, Base, Large variants) |
|
|
81
|
+
| [D-FINE](docs/dfine.md) | Fine-grained distribution refinement detector with HGNetV2 backbone (Nano/Small/Medium/Large/XLarge) |
|
|
81
82
|
|
|
82
83
|
**Vision-Language Models**
|
|
83
84
|
|
|
@@ -107,6 +108,7 @@ pip install -U git+https://github.com/IMvision12/keras-models
|
|
|
107
108
|
| Inception-ResNet-v2 | [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261) | `timm` |
|
|
108
109
|
| Inception-v3 | [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567) | `timm` |
|
|
109
110
|
| Inception-v4 | [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261) | `timm` |
|
|
111
|
+
| MaxViT | [MaxViT: Multi-Axis Vision Transformer](https://arxiv.org/abs/2204.01697) | `timm` |
|
|
110
112
|
| MiT | [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) | `transformers` |
|
|
111
113
|
| MLP-Mixer | [MLP-Mixer: An all-MLP Architecture for Vision](https://arxiv.org/abs/2105.01601) | `timm` |
|
|
112
114
|
| MobileNetV2 | [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) | `timm` |
|
|
@@ -133,6 +135,7 @@ pip install -U git+https://github.com/IMvision12/keras-models
|
|
|
133
135
|
|
|
134
136
|
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|
|
135
137
|
|---------------|-------------------|---------------------|
|
|
138
|
+
| D-FINE | [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) | `transformers` |
|
|
136
139
|
| DETR | [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) | `transformers`|
|
|
137
140
|
| RT-DETR | [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) | `transformers` |
|
|
138
141
|
| RF-DETR | [RF-DETR: Real-Time Detection Transformer](https://arxiv.org/abs/2502.18860) | `rfdetr` |
|
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
## 📖 Introduction
|
|
8
8
|
|
|
9
|
-
Keras Models (kmodels) is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RF-DETR), segmentation (SAM, SAM2, SegFormer, DeepLabV3, EoMT), vision-language modeling (CLIP, SigLIP, SigLIP2), and more. kmodels includes custom layers and backbone support, providing flexibility and efficiency across various applications. For backbones, there are various weight variants like `in1k`, `in21k`, `fb_dist_in1k`, `ms_in22k`, `fb_in22k_ft_in1k`, `ns_jft_in1k`, `aa_in1k`, `cvnets_in1k`, `augreg_in21k_ft_in1k`, `augreg_in21k`, and many more.
|
|
9
|
+
Keras Models (kmodels) is a collection of models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including classification, object detection (DETR, RT-DETR, RF-DETR, D-FINE), segmentation (SAM, SAM2, SegFormer, DeepLabV3, EoMT), vision-language modeling (CLIP, SigLIP, SigLIP2), and more. It includes hybrid architectures like MaxViT alongside traditional CNNs and pure transformers. kmodels includes custom layers and backbone support, providing flexibility and efficiency across various applications. For backbones, there are various weight variants like `in1k`, `in21k`, `fb_dist_in1k`, `ms_in22k`, `fb_in22k_ft_in1k`, `ns_jft_in1k`, `aa_in1k`, `cvnets_in1k`, `augreg_in21k_ft_in1k`, `augreg_in21k`, and many more.
|
|
10
10
|
|
|
11
11
|
## ⚡ Installation
|
|
12
12
|
|
|
@@ -45,6 +45,7 @@ pip install -U git+https://github.com/IMvision12/keras-models
|
|
|
45
45
|
| [DETR](docs/detr.md) | End-to-end object detection with Transformers (ResNet-50/101 backbones) |
|
|
46
46
|
| [RT-DETR](docs/rt_detr.md) | Real-time DETR with ResNet-vd backbone and hybrid encoder (ResNet-18/34/50/101 variants) |
|
|
47
47
|
| [RF-DETR](docs/rf_detr.md) | Real-time detection transformer (Nano, Small, Medium, Base, Large variants) |
|
|
48
|
+
| [D-FINE](docs/dfine.md) | Fine-grained distribution refinement detector with HGNetV2 backbone (Nano/Small/Medium/Large/XLarge) |
|
|
48
49
|
|
|
49
50
|
**Vision-Language Models**
|
|
50
51
|
|
|
@@ -74,6 +75,7 @@ pip install -U git+https://github.com/IMvision12/keras-models
|
|
|
74
75
|
| Inception-ResNet-v2 | [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261) | `timm` |
|
|
75
76
|
| Inception-v3 | [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567) | `timm` |
|
|
76
77
|
| Inception-v4 | [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](https://arxiv.org/abs/1602.07261) | `timm` |
|
|
78
|
+
| MaxViT | [MaxViT: Multi-Axis Vision Transformer](https://arxiv.org/abs/2204.01697) | `timm` |
|
|
77
79
|
| MiT | [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) | `transformers` |
|
|
78
80
|
| MLP-Mixer | [MLP-Mixer: An all-MLP Architecture for Vision](https://arxiv.org/abs/2105.01601) | `timm` |
|
|
79
81
|
| MobileNetV2 | [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) | `timm` |
|
|
@@ -100,6 +102,7 @@ pip install -U git+https://github.com/IMvision12/keras-models
|
|
|
100
102
|
|
|
101
103
|
| 🏷️ Model Name | 📜 Reference Paper | 📦 Source of Weights |
|
|
102
104
|
|---------------|-------------------|---------------------|
|
|
105
|
+
| D-FINE | [D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement](https://arxiv.org/abs/2410.13842) | `transformers` |
|
|
103
106
|
| DETR | [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) | `transformers`|
|
|
104
107
|
| RT-DETR | [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) | `transformers` |
|
|
105
108
|
| RF-DETR | [RF-DETR: Real-Time Detection Transformer](https://arxiv.org/abs/2502.18860) | `rfdetr` |
|
|
@@ -8,6 +8,7 @@ from kmodels.models import (
|
|
|
8
8
|
deit,
|
|
9
9
|
densenet,
|
|
10
10
|
detr,
|
|
11
|
+
dfine,
|
|
11
12
|
efficientformer,
|
|
12
13
|
efficientnet,
|
|
13
14
|
efficientnet_lite,
|
|
@@ -18,6 +19,7 @@ from kmodels.models import (
|
|
|
18
19
|
inception_resnetv2,
|
|
19
20
|
inceptionv3,
|
|
20
21
|
inceptionv4,
|
|
22
|
+
maxvit,
|
|
21
23
|
mit,
|
|
22
24
|
mlp_mixer,
|
|
23
25
|
mobilenetv2,
|
|
@@ -0,0 +1,186 @@
|
|
|
1
|
+
"""Preprocessing and postprocessing for DeepLabV3 semantic segmentation."""
|
|
2
|
+
|
|
3
|
+
from typing import Dict, List, Optional, Tuple, Union
|
|
4
|
+
|
|
5
|
+
import keras
|
|
6
|
+
import numpy as np
|
|
7
|
+
from PIL import Image
|
|
8
|
+
|
|
9
|
+
VOC_CLASSES = [
|
|
10
|
+
"background",
|
|
11
|
+
"aeroplane",
|
|
12
|
+
"bicycle",
|
|
13
|
+
"bird",
|
|
14
|
+
"boat",
|
|
15
|
+
"bottle",
|
|
16
|
+
"bus",
|
|
17
|
+
"car",
|
|
18
|
+
"cat",
|
|
19
|
+
"chair",
|
|
20
|
+
"cow",
|
|
21
|
+
"dining table",
|
|
22
|
+
"dog",
|
|
23
|
+
"horse",
|
|
24
|
+
"motorbike",
|
|
25
|
+
"person",
|
|
26
|
+
"potted plant",
|
|
27
|
+
"sheep",
|
|
28
|
+
"sofa",
|
|
29
|
+
"train",
|
|
30
|
+
"tv/monitor",
|
|
31
|
+
]
|
|
32
|
+
|
|
33
|
+
|
|
34
|
+
def DeepLabV3ImageProcessor(
    image: Union[str, np.ndarray, "Image.Image"],
    size: Optional[Dict[str, int]] = None,
    resample: str = "bilinear",
    do_rescale: bool = True,
    rescale_factor: float = 1 / 255,
    do_normalize: bool = True,
    image_mean: Optional[Tuple[float, ...]] = None,
    image_std: Optional[Tuple[float, ...]] = None,
    return_tensor: bool = True,
) -> Union["keras.KerasTensor", np.ndarray]:
    """Prepare a single image for DeepLabV3 inference.

    Loads the image (from a path, numpy array, or PIL Image), resizes it,
    optionally rescales pixel values to ``[0, 1]``, and optionally applies
    ImageNet normalization — matching the torchvision-style preprocessing
    DeepLabV3 was trained with.

    Args:
        image: File path, ``(H, W, 3)`` numpy array (a leading batch axis of
            size 1 is also accepted), or PIL Image.
        size: Target size as ``{"height": H, "width": W}``.
            Default: ``{"height": 520, "width": 520}``.
        resample: Interpolation method (``"nearest"``, ``"bilinear"``,
            or ``"bicubic"``).
        do_rescale: Whether to multiply pixel values by ``rescale_factor``.
        rescale_factor: Rescale factor (default ``1/255``).
        do_normalize: Whether to apply per-channel mean/std normalization.
        image_mean: Per-channel mean. Default: ``(0.485, 0.456, 0.406)``.
        image_std: Per-channel std. Default: ``(0.229, 0.224, 0.225)``.
        return_tensor: If True return a Keras tensor, otherwise numpy array.

    Returns:
        Preprocessed image of shape ``(1, H, W, 3)`` ready for model input.

    Raises:
        TypeError: If ``image`` is not a str, numpy array, or PIL Image.
        ValueError: If the decoded image is not shaped ``(H, W, 3)``.

    Example:
        ```python
        from kmodels.models.deeplabv3 import DeepLabV3ImageProcessor, DeepLabV3ResNet50

        model = DeepLabV3ResNet50(weights="voc")
        img = DeepLabV3ImageProcessor("photo.jpg")
        output = model(img, training=False)
        ```
    """
    if size is None:
        size = {"height": 520, "width": 520}
    mean_values = (0.485, 0.456, 0.406) if image_mean is None else image_mean
    std_values = (0.229, 0.224, 0.225) if image_std is None else image_std

    # Decode the input into a float32 (H, W, 3) numpy array.
    if isinstance(image, str):
        array = np.array(Image.open(image).convert("RGB"), dtype=np.float32)
    elif isinstance(image, Image.Image):
        array = np.array(image.convert("RGB"), dtype=np.float32)
    elif isinstance(image, np.ndarray):
        array = image.astype(np.float32)
        # Tolerate a pre-batched input by taking the first element.
        if array.ndim == 4:
            array = array[0]
    else:
        raise TypeError("Input must be a file path (str), numpy array, or PIL Image.")

    if array.ndim != 3 or array.shape[-1] != 3:
        raise ValueError(f"Expected image shape (H, W, 3), got {array.shape}")

    # Add the batch axis, then resize to the requested spatial size.
    batch = keras.ops.expand_dims(
        keras.ops.convert_to_tensor(array, dtype="float32"), axis=0
    )
    batch = keras.ops.image.resize(
        batch, size=(size["height"], size["width"]), interpolation=resample
    )

    if do_rescale:
        batch = batch * rescale_factor

    if do_normalize:
        mean = keras.ops.reshape(
            keras.ops.convert_to_tensor(mean_values, dtype="float32"), (1, 1, 1, 3)
        )
        std = keras.ops.reshape(
            keras.ops.convert_to_tensor(std_values, dtype="float32"), (1, 1, 1, 3)
        )
        batch = (batch - mean) / std

    return batch if return_tensor else keras.ops.convert_to_numpy(batch)
|
|
121
|
+
|
|
122
|
+
|
|
123
|
+
def DeepLabV3PostProcessor(
    outputs: "keras.KerasTensor",
    target_size: Optional[Tuple[int, int]] = None,
    label_names: Optional[List[str]] = None,
) -> Dict:
    """Turn raw DeepLabV3 logits into a semantic segmentation result.

    Computes the per-pixel argmax over the class axis, optionally resizes the
    class map back to the original image resolution, and maps the class
    indices present in the image to human-readable names.

    Args:
        outputs: Raw model output tensor of shape ``(1, H, W, num_classes)``.
        target_size: Original image ``(height, width)`` used to resize the
            prediction mask. If ``None``, the mask keeps the model's output
            resolution.
        label_names: Custom class-name list for mapping label indices to
            names. If ``None``, Pascal VOC class names (21 classes) are used.
            Provide this for models fine-tuned on a custom dataset.

    Returns:
        Dict with:
            - ``"segmentation"``: ``(H, W)`` integer array of class indices.
            - ``"class_names"``: Names of the unique classes detected.
            - ``"unique_classes"``: Array of unique class indices.

    Example:
        ```python
        from kmodels.models.deeplabv3 import (
            DeepLabV3ResNet50, DeepLabV3ImageProcessor, DeepLabV3PostProcessor,
        )

        model = DeepLabV3ResNet50(weights="voc")
        img = DeepLabV3ImageProcessor("photo.jpg")
        output = model(img, training=False)
        result = DeepLabV3PostProcessor(output, target_size=(orig_h, orig_w))
        print(result["class_names"])
        ```
    """
    names = VOC_CLASSES if label_names is None else label_names

    # Per-pixel argmax over the class axis of the single batch element.
    raw = keras.ops.convert_to_numpy(outputs)
    mask = np.argmax(raw[0], axis=-1)  # (H, W)

    # Resize back to the original resolution with nearest-neighbor sampling so
    # class indices are never interpolated into invalid values.
    if target_size is not None:
        height, width = target_size
        resized = Image.fromarray(mask.astype(np.uint8)).resize(
            (width, height), Image.NEAREST
        )
        mask = np.array(resized)

    present = np.unique(mask)
    readable = []
    for idx in present:
        # Fall back to a synthetic name for indices beyond the label list.
        readable.append(names[idx] if idx < len(names) else f"class_{idx}")

    return {
        "segmentation": mask,
        "class_names": readable,
        "unique_classes": present,
    }
|
|
@@ -202,6 +202,7 @@ def DETRPostProcessor(
|
|
|
202
202
|
outputs: Dict[str, keras.KerasTensor],
|
|
203
203
|
threshold: float = 0.7,
|
|
204
204
|
target_sizes: Optional[List[Tuple[int, int]]] = None,
|
|
205
|
+
label_names: Optional[List[str]] = None,
|
|
205
206
|
) -> List[Dict[str, np.ndarray]]:
|
|
206
207
|
"""Post-process raw DETR outputs into usable detections.
|
|
207
208
|
|
|
@@ -217,6 +218,9 @@ def DETRPostProcessor(
|
|
|
217
218
|
target_sizes: List of ``(height, width)`` tuples for each image in
|
|
218
219
|
the batch. Used to convert normalized boxes to pixel coordinates.
|
|
219
220
|
If None, boxes are returned in normalized ``[0, 1]`` coordinates.
|
|
221
|
+
label_names: Custom class name list for mapping label indices to
|
|
222
|
+
names. If ``None``, defaults to COCO class names. Provide this
|
|
223
|
+
when using a model fine-tuned on a custom dataset.
|
|
220
224
|
|
|
221
225
|
Returns:
|
|
222
226
|
List of dicts (one per image in the batch), each containing:
|
|
@@ -278,16 +282,15 @@ def DETRPostProcessor(
|
|
|
278
282
|
scale = np.array([img_w, img_h, img_w, img_h], dtype=np.float32)
|
|
279
283
|
xyxy_boxes = xyxy_boxes * scale
|
|
280
284
|
|
|
281
|
-
# Map label indices to
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
]
|
|
285
|
+
# Map label indices to class names
|
|
286
|
+
_names = label_names if label_names is not None else COCO_CLASSES
|
|
287
|
+
mapped_names = [_names[l] if l < len(_names) else f"class_{l}" for l in labels]
|
|
285
288
|
|
|
286
289
|
results.append(
|
|
287
290
|
{
|
|
288
291
|
"scores": scores,
|
|
289
292
|
"labels": labels,
|
|
290
|
-
"label_names":
|
|
293
|
+
"label_names": mapped_names,
|
|
291
294
|
"boxes": xyxy_boxes,
|
|
292
295
|
}
|
|
293
296
|
)
|
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
# Per-variant architecture hyperparameters for the D-FINE detector family.
# Each entry maps a model class name to its backbone channel plan
# ("stem_*"/"stage_*" keys), encoder/decoder widths and depths, and related
# settings. Variants list only the keys they override; presumably any key
# absent here falls back to a model-level default — verify against the model
# constructor.
DFINE_MODEL_CONFIG = {
    "DFineNano": {
        # Backbone stem/stage layout: input channels plus per-stage widths.
        "stem_channels": [3, 16, 16],
        "stage_in_channels": [16, 64, 256, 512],
        "stage_mid_channels": [16, 32, 64, 128],
        "stage_out_channels": [64, 256, 512, 1024],
        "stage_num_blocks": [1, 1, 2, 1],
        "stage_numb_of_layers": [3, 3, 3, 3],
        "use_lab": True,
        # Encoder/decoder sizing; Nano uses only the two deepest feature maps
        # (strides 16 and 32) instead of three.
        "encoder_in_channels": [512, 1024],
        "encoder_hidden_dim": 128,
        "d_model": 128,
        "decoder_layers": 3,
        "decoder_n_points": [6, 6],
        "hidden_expansion": 0.34,
        "ccfm_num_blocks": 2,
        "num_feature_levels": 2,
        "feat_strides": [16, 32],
        "encode_proj_layers": [1],
        "encoder_ffn_dim": 512,
        "decoder_ffn_dim": 512,
    },
    "DFineSmall": {
        # Same backbone widths as Nano, but three encoder input levels.
        "stem_channels": [3, 16, 16],
        "stage_in_channels": [16, 64, 256, 512],
        "stage_mid_channels": [16, 32, 64, 128],
        "stage_out_channels": [64, 256, 512, 1024],
        "stage_num_blocks": [1, 1, 2, 1],
        "stage_numb_of_layers": [3, 3, 3, 3],
        "use_lab": True,
        "encoder_in_channels": [256, 512, 1024],
        "decoder_layers": 3,
        "decoder_n_points": [3, 6, 3],
        "hidden_expansion": 0.5,
    },
    "DFineMedium": {
        "stem_channels": [3, 24, 32],
        "stage_in_channels": [32, 96, 384, 768],
        "stage_mid_channels": [32, 64, 128, 256],
        "stage_out_channels": [96, 384, 768, 1536],
        "stage_num_blocks": [1, 1, 3, 1],
        "stage_numb_of_layers": [4, 4, 4, 4],
        "use_lab": True,
        "encoder_in_channels": [384, 768, 1536],
        "ccfm_num_blocks": 2,
        "decoder_layers": 4,
        "decoder_n_points": [3, 6, 3],
    },
    "DFineLarge": {
        "stem_channels": [3, 32, 48],
        "stage_in_channels": [48, 128, 512, 1024],
        "stage_mid_channels": [48, 96, 192, 384],
        "stage_out_channels": [128, 512, 1024, 2048],
        "stage_num_blocks": [1, 1, 3, 1],
        "stage_numb_of_layers": [6, 6, 6, 6],
        # Large/XLarge disable the "lab" option the smaller variants enable.
        "use_lab": False,
        "encoder_in_channels": [512, 1024, 2048],
        "ccfm_num_blocks": 3,
        "decoder_layers": 6,
        "decoder_n_points": [3, 6, 3],
    },
    "DFineXLarge": {
        "stem_channels": [3, 32, 64],
        "stage_in_channels": [64, 128, 512, 1024],
        "stage_mid_channels": [64, 128, 256, 512],
        "stage_out_channels": [128, 512, 1024, 2048],
        "stage_num_blocks": [1, 2, 5, 2],
        "stage_numb_of_layers": [6, 6, 6, 6],
        "use_lab": False,
        "encoder_in_channels": [512, 1024, 2048],
        "encoder_hidden_dim": 384,
        "ccfm_num_blocks": 3,
        "decoder_layers": 6,
        "decoder_n_points": [3, 6, 3],
        "encoder_ffn_dim": 2048,
    },
}
|
|
78
|
+
|
|
79
|
+
# Pretrained-weight download locations, keyed first by model class name and
# then by the training dataset (currently "coco" only). Each leaf holds the
# URL of a Keras ".weights.h5" checkpoint hosted on the keras-models GitHub
# release page.
DFINE_WEIGHTS_CONFIG = {
    "DFineNano": {
        "coco": {
            "url": "https://github.com/IMvision12/keras-models/releases/download/D-FINE/dfine_nano_coco.weights.h5",
        },
    },
    "DFineSmall": {
        "coco": {
            "url": "https://github.com/IMvision12/keras-models/releases/download/D-FINE/dfine_small_coco.weights.h5",
        },
    },
    "DFineMedium": {
        "coco": {
            "url": "https://github.com/IMvision12/keras-models/releases/download/D-FINE/dfine_medium_coco.weights.h5",
        },
    },
    "DFineLarge": {
        "coco": {
            "url": "https://github.com/IMvision12/keras-models/releases/download/D-FINE/dfine_large_coco.weights.h5",
        },
    },
    "DFineXLarge": {
        "coco": {
            "url": "https://github.com/IMvision12/keras-models/releases/download/D-FINE/dfine_xlarge_coco.weights.h5",
        },
    },
}
|