optimum-rbln 0.9.4a2__py3-none-any.whl → 0.9.5a4__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- optimum/rbln/__init__.py +36 -0
- optimum/rbln/__version__.py +2 -2
- optimum/rbln/configuration_utils.py +35 -16
- optimum/rbln/modeling_base.py +6 -6
- optimum/rbln/ops/__init__.py +1 -0
- optimum/rbln/ops/attn.py +10 -0
- optimum/rbln/ops/flash_attn.py +8 -0
- optimum/rbln/ops/moe.py +180 -0
- optimum/rbln/ops/sliding_window_attn.py +9 -0
- optimum/rbln/transformers/__init__.py +36 -0
- optimum/rbln/transformers/modeling_attention_utils.py +118 -222
- optimum/rbln/transformers/modeling_outputs.py +25 -0
- optimum/rbln/transformers/modeling_rope_utils.py +78 -42
- optimum/rbln/transformers/models/__init__.py +28 -0
- optimum/rbln/transformers/models/bart/bart_architecture.py +24 -24
- optimum/rbln/transformers/models/colpali/colpali_architecture.py +14 -20
- optimum/rbln/transformers/models/colpali/configuration_colpali.py +12 -17
- optimum/rbln/transformers/models/colpali/modeling_colpali.py +66 -182
- optimum/rbln/transformers/models/colqwen2/configuration_colqwen2.py +38 -21
- optimum/rbln/transformers/models/colqwen2/modeling_colqwen2.py +107 -371
- optimum/rbln/transformers/models/decoderonly/__init__.py +2 -0
- optimum/rbln/transformers/models/decoderonly/configuration_decoderonly.py +118 -16
- optimum/rbln/transformers/models/decoderonly/configuration_lora.py +1 -1
- optimum/rbln/transformers/models/decoderonly/decoderonly_architecture.py +121 -48
- optimum/rbln/transformers/models/decoderonly/decoderonly_runtime_utils.py +5 -7
- optimum/rbln/transformers/models/decoderonly/modeling_decoderonly.py +75 -107
- optimum/rbln/transformers/models/exaone/exaone_architecture.py +0 -36
- optimum/rbln/transformers/models/gemma/gemma_architecture.py +1 -1
- optimum/rbln/transformers/models/gemma2/__init__.py +16 -0
- optimum/rbln/transformers/models/gemma2/configuration_gemma2.py +45 -0
- optimum/rbln/transformers/models/gemma2/gemma2_architecture.py +83 -0
- optimum/rbln/transformers/models/gemma2/modeling_gemma2.py +101 -0
- optimum/rbln/transformers/models/gemma3/gemma3_architecture.py +16 -18
- optimum/rbln/transformers/models/gemma3/modeling_gemma3.py +1 -1
- optimum/rbln/transformers/models/gpt2/gpt2_architecture.py +8 -34
- optimum/rbln/transformers/models/gpt_oss/__init__.py +16 -0
- optimum/rbln/transformers/models/gpt_oss/configuration_gpt_oss.py +41 -0
- optimum/rbln/transformers/models/gpt_oss/gpt_oss_architecture.py +122 -0
- optimum/rbln/transformers/models/gpt_oss/modeling_gpt_oss.py +165 -0
- optimum/rbln/transformers/models/grounding_dino/configuration_grounding_dino.py +8 -5
- optimum/rbln/transformers/models/grounding_dino/grounding_dino_architecture.py +6 -4
- optimum/rbln/transformers/models/llava/modeling_llava.py +0 -1
- optimum/rbln/transformers/models/midm/midm_architecture.py +29 -22
- optimum/rbln/transformers/models/opt/opt_architecture.py +1 -44
- optimum/rbln/transformers/models/paligemma/__init__.py +16 -0
- optimum/rbln/transformers/models/paligemma/configuration_paligemma.py +129 -0
- optimum/rbln/transformers/models/paligemma/modeling_paligemma.py +564 -0
- optimum/rbln/transformers/models/pegasus/pegasus_architecture.py +24 -24
- optimum/rbln/transformers/models/phi/phi_architecture.py +13 -21
- optimum/rbln/transformers/models/qwen2_5_vl/__init__.py +6 -1
- optimum/rbln/transformers/models/qwen2_5_vl/configuration_qwen2_5_vl.py +11 -1
- optimum/rbln/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py +271 -122
- optimum/rbln/transformers/models/qwen2_5_vl/qwen2_5_vl_architecture.py +43 -39
- optimum/rbln/transformers/models/qwen2_moe/__init__.py +16 -0
- optimum/rbln/transformers/models/qwen2_moe/configuration_qwen2_moe.py +38 -0
- optimum/rbln/transformers/models/qwen2_moe/modeling_qwen2_moe.py +68 -0
- optimum/rbln/transformers/models/qwen2_moe/qwen2_moe_architecture.py +94 -0
- optimum/rbln/transformers/models/qwen2_vl/__init__.py +6 -1
- optimum/rbln/transformers/models/qwen2_vl/configuration_qwen2_vl.py +11 -1
- optimum/rbln/transformers/models/qwen2_vl/modeling_qwen2_vl.py +263 -105
- optimum/rbln/transformers/models/qwen2_vl/qwen2_vl_architecture.py +26 -34
- optimum/rbln/transformers/models/qwen3/qwen3_architecture.py +7 -7
- optimum/rbln/transformers/models/qwen3_moe/__init__.py +16 -0
- optimum/rbln/transformers/models/qwen3_moe/configuration_qwen3_moe.py +38 -0
- optimum/rbln/transformers/models/qwen3_moe/modeling_qwen3_moe.py +68 -0
- optimum/rbln/transformers/models/qwen3_moe/qwen3_moe_architecture.py +100 -0
- optimum/rbln/transformers/models/seq2seq/seq2seq_architecture.py +14 -12
- optimum/rbln/transformers/models/siglip/modeling_siglip.py +4 -18
- optimum/rbln/transformers/models/swin/configuration_swin.py +1 -6
- optimum/rbln/transformers/models/t5/t5_architecture.py +15 -16
- optimum/rbln/transformers/models/time_series_transformer/time_series_transformers_architecture.py +0 -3
- optimum/rbln/transformers/models/whisper/whisper_architecture.py +0 -3
- optimum/rbln/transformers/utils/rbln_quantization.py +20 -12
- optimum/rbln/utils/import_utils.py +16 -1
- optimum/rbln/utils/runtime_utils.py +10 -6
- optimum/rbln/utils/submodule.py +24 -0
- {optimum_rbln-0.9.4a2.dist-info → optimum_rbln-0.9.5a4.dist-info}/METADATA +6 -6
- {optimum_rbln-0.9.4a2.dist-info → optimum_rbln-0.9.5a4.dist-info}/RECORD +81 -62
- optimum/rbln/transformers/models/colqwen2/colqwen2_architecture.py +0 -233
- {optimum_rbln-0.9.4a2.dist-info → optimum_rbln-0.9.5a4.dist-info}/WHEEL +0 -0
- {optimum_rbln-0.9.4a2.dist-info → optimum_rbln-0.9.5a4.dist-info}/entry_points.txt +0 -0
- {optimum_rbln-0.9.4a2.dist-info → optimum_rbln-0.9.5a4.dist-info}/licenses/LICENSE +0 -0
optimum/rbln/transformers/models/qwen3_moe/__init__.py
ADDED

@@ -0,0 +1,16 @@
+# Copyright 2025 Rebellions Inc. All rights reserved.
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .configuration_qwen3_moe import RBLNQwen3MoeForCausalLMConfig
+from .modeling_qwen3_moe import RBLNQwen3MoeForCausalLM
optimum/rbln/transformers/models/qwen3_moe/configuration_qwen3_moe.py
ADDED

@@ -0,0 +1,38 @@
+# Copyright 2025 Rebellions Inc. All rights reserved.
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from ..decoderonly.configuration_decoderonly import RBLNDecoderOnlyModelForCausalLMConfig
+
+
+class RBLNQwen3MoeForCausalLMConfig(RBLNDecoderOnlyModelForCausalLMConfig):
+    """
+    Configuration class for RBLN Qwen3 Moe models.
+    This class is an alias of RBLNDecoderOnlyModelForCausalLMConfig.
+    Example usage:
+    ```python
+    from optimum.rbln import RBLNQwen3MoeForCausalLM, RBLNQwen3MoeForCausalLMConfig
+    # Create a configuration object
+    config = RBLNQwen3MoeForCausalLMConfig(
+        batch_size=1,
+        max_seq_len=262144,
+        tensor_parallel_size=4
+    )
+    # Use the configuration with from_pretrained
+    model = RBLNQwen3MoeForCausalLM.from_pretrained(
+        "Qwen/Qwen3-30B-A3B-Thinking-2507",
+        export=True,
+        rbln_config=config
+    )
+    ```
+    """
optimum/rbln/transformers/models/qwen3_moe/modeling_qwen3_moe.py
ADDED

@@ -0,0 +1,68 @@
+# Copyright 2025 Rebellions Inc. All rights reserved.
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from ...models.decoderonly import RBLNDecoderOnlyModelForCausalLM
+from .qwen3_moe_architecture import Qwen3MoeWrapper
+
+
+class RBLNQwen3MoeForCausalLM(RBLNDecoderOnlyModelForCausalLM):
+    """
+    The Qwen3 Moe is a Mixture-of-Experts (MoE) variant of Qwen3, available as a base model and an aligned chat model.
+    This model inherits from [`RBLNDecoderOnlyModelForCausalLM`]. Check the superclass documentation for the generic methods the library implements for all its models.
+    A class to convert and run pre-trained transformers based Qwen3MoeForCausalLM model on RBLN devices.
+    It implements the methods to convert a pre-trained transformers Qwen3MoeForCausalLM model into a RBLN transformer model by:
+    - transferring the checkpoint weights of the original into an optimized RBLN graph,
+    - compiling the resulting graph using the RBLN compiler.
+    **Configuration:**
+    This model uses [`RBLNQwen3MoeForCausalLMConfig`] for configuration. When calling methods like `from_pretrained` or `from_model`,
+    the `rbln_config` parameter should be an instance of [`RBLNQwen3MoeForCausalLMConfig`] or a dictionary conforming to its structure.
+    See the [`RBLNQwen3MoeForCausalLMConfig`] class for all available configuration options.
+    Examples:
+    ```python
+    from optimum.rbln import RBLNQwen3MoeForCausalLM
+    # Simple usage using rbln_* arguments
+    # `max_seq_len` is automatically inferred from the model config
+    model = RBLNQwen3MoeForCausalLM.from_pretrained(
+        "Qwen/Qwen3-30B-A3B-Thinking-2507",
+        export=True,
+        rbln_batch_size=1,
+        rbln_tensor_parallel_size=4,
+    )
+    # Using a config dictionary
+    rbln_config = {
+        "batch_size": 1,
+        "max_seq_len": 262144,
+        "tensor_parallel_size": 4,
+    }
+    model = RBLNQwen3MoeForCausalLM.from_pretrained(
+        "Qwen/Qwen3-30B-A3B-Thinking-2507",
+        export=True,
+        rbln_config=rbln_config
+    )
+    # Using a RBLNQwen3MoeForCausalLMConfig instance (recommended for type checking)
+    from optimum.rbln import RBLNQwen3MoeForCausalLMConfig
+    config = RBLNQwen3MoeForCausalLMConfig(
+        batch_size=1,
+        max_seq_len=262144,
+        tensor_parallel_size=4
+    )
+    model = RBLNQwen3MoeForCausalLM.from_pretrained(
+        "Qwen/Qwen3-30B-A3B-Thinking-2507",
+        export=True,
+        rbln_config=config
+    )
+    ```
+    """
+
+    _decoder_wrapper_cls = Qwen3MoeWrapper
optimum/rbln/transformers/models/qwen3_moe/qwen3_moe_architecture.py
ADDED

@@ -0,0 +1,100 @@
+# Copyright 2025 Rebellions Inc. All rights reserved.
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+
+# http://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Optional
+
+import torch
+from torch import nn
+
+from ..decoderonly.configuration_decoderonly import RBLNLoRAConfig
+from ..decoderonly.decoderonly_architecture import DecoderOnlyAttention, DecoderOnlyLayer, DecoderOnlyWrapper
+
+
+class Qwen3MoeWrapper(DecoderOnlyWrapper):
+    def get_rbln_layer_class(self):
+        return Qwen3MoeLayer
+
+    def get_rbln_attn_class(self):
+        return Qwen3MoeAttention
+
+
+class Qwen3MoeAttention(DecoderOnlyAttention):
+    def __post_init__(self, self_attn):
+        self.q_proj = self_attn.q_proj
+        self.k_proj = self_attn.k_proj
+        self.v_proj = self_attn.v_proj
+        self.o_proj = self_attn.o_proj
+        self.q_norm = self_attn.q_norm
+        self.k_norm = self_attn.k_norm
+
+
+class Qwen3MoeLayer(DecoderOnlyLayer):
+    def __init__(self, layer, self_attn: DecoderOnlyAttention, lora_config: Optional[RBLNLoRAConfig] = None):
+        super().__init__(layer, self_attn, lora_config)
+        self.mlp = (
+            Qwen3MoeSparseMoeBlock(layer.mlp)
+            if layer.mlp.__class__.__name__ == "Qwen3MoeSparseMoeBlock"
+            else layer.mlp
+        )
+
+    def get_mlp(self) -> nn.Module:
+        return self.mlp
+
+
+class Qwen3MoeSparseMoeBlock(nn.Module):
+    def __init__(self, model: nn.Module):
+        super().__init__()
+        self.num_experts = model.num_experts
+        self.top_k = model.top_k
+        self.norm_topk_prob = model.norm_topk_prob
+        self.gate = model.gate
+        self.experts = Qwen3MoeMLP(model.experts, self.top_k, self.norm_topk_prob)
+
+    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+        batch_size, sequence_length, hidden_dim = hidden_states.shape
+        hidden_states = hidden_states.view(-1, hidden_dim)
+
+        # router_logits: (batch * sequence_length, n_experts)
+        router_logits = self.gate(hidden_states)
+        final_hidden_states = self.experts(hidden_states, router_logits)
+
+        final_hidden_states = final_hidden_states.reshape(batch_size, sequence_length, hidden_dim)
+        return final_hidden_states
+
+
+class Qwen3MoeMLP(nn.Module):
+    def __init__(self, expert_list, top_k, norm_topk_prob):
+        super().__init__()
+        self.hidden_size = expert_list[0].hidden_size
+        self.intermediate_size = expert_list[0].intermediate_size
+        self.top_k = top_k
+        self.norm_topk_prob = norm_topk_prob
+        self.num_experts = len(expert_list)
+        self.gate_proj = nn.Linear(self.hidden_size, self.num_experts * self.intermediate_size, bias=False)
+        self.up_proj = nn.Linear(self.hidden_size, self.num_experts * self.intermediate_size, bias=False)
+        self.down_proj = nn.Linear(self.num_experts * self.intermediate_size, self.hidden_size, bias=False)
+        self.gate_proj.weight.data = torch.stack([expert.gate_proj.weight.data for expert in expert_list], dim=0)
+        self.up_proj.weight.data = torch.stack([expert.up_proj.weight.data for expert in expert_list], dim=0)
+        self.down_proj.weight.data = torch.stack([expert.down_proj.weight.data for expert in expert_list], dim=0)
+
+    def forward(self, x, router_logits):
+        return torch.ops.rbln_custom_ops.custom_moe_glu(
+            x,
+            self.gate_proj.weight,
+            self.up_proj.weight,
+            self.down_proj.weight,
+            router_logits,
+            self.top_k,
+            self.norm_topk_prob,
+        )
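The interesting part of this new file is that all expert MLPs are folded into stacked weights and dispatched to a single fused op, `torch.ops.rbln_custom_ops.custom_moe_glu`, whose kernel is not part of this diff. As a rough mental model only, here is a minimal plain-PyTorch sketch of the routed computation it appears to replace, assuming Hugging Face's Qwen3-MoE routing semantics (softmax over router logits, top-k selection, optional renormalization); `moe_glu_reference` is a name invented for this sketch:

```python
import torch
import torch.nn.functional as F


def moe_glu_reference(x, gate_w, up_w, down_w, router_logits, top_k, norm_topk_prob):
    # x: (tokens, hidden); gate_w/up_w: (num_experts, intermediate, hidden);
    # down_w: (num_experts, hidden, intermediate) -- the stacked layout built above.
    routing = F.softmax(router_logits, dim=-1)             # (tokens, num_experts)
    topk_w, topk_idx = torch.topk(routing, top_k, dim=-1)  # (tokens, top_k)
    if norm_topk_prob:
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)

    out = torch.zeros_like(x)
    for t in range(x.size(0)):                             # naive per-token loop
        for w, e in zip(topk_w[t], topk_idx[t]):
            h = F.silu(gate_w[e] @ x[t]) * (up_w[e] @ x[t])  # SwiGLU expert MLP
            out[t] += w * (down_w[e] @ h)
    return out


# Toy shapes: 3 tokens, hidden=16, 4 experts with intermediate=32, top-2 routing.
out = moe_glu_reference(
    torch.randn(3, 16),
    torch.randn(4, 32, 16),
    torch.randn(4, 32, 16),
    torch.randn(4, 16, 32),
    router_logits=torch.randn(3, 4),
    top_k=2,
    norm_topk_prob=True,
)
print(out.shape)  # torch.Size([3, 16])
```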
optimum/rbln/transformers/models/seq2seq/seq2seq_architecture.py
CHANGED

@@ -268,13 +268,12 @@ class Seq2SeqDecoder(torch.nn.Module):
 
     def __init__(self, model, layers, **kwargs):
         super().__init__()
-        self._original_mod = model
         self.layers = nn.ModuleList(layers)
         self.embed_tokens = model.embed_tokens
-        self.final_layer_norm = getattr(model, "final_layer_norm", None)
-        self.__post_init__(**kwargs)
+        self.final_layer_norm = getattr(model, "final_layer_norm", None) or getattr(model, "layer_norm", None)
+        self.__post_init__(model, **kwargs)
 
-    def __post_init__(self, **kwargs):
+    def __post_init__(self, model: nn.Module, **kwargs):
         """
         Abstract method intended to be overridden by subclasses to modify or override
         the attributes of the original model after initialization.
@@ -344,12 +343,11 @@ class Seq2SeqDecoderLayer(torch.nn.Module):
 
     def __init__(self, decoder_layer, self_attn, cross_attn):
         super().__init__()
-        self._original_mod = decoder_layer
         self.self_attn = self_attn
         self.cross_attn = cross_attn
-        self.__post_init__()
+        self.__post_init__(decoder_layer)
 
-    def __post_init__(self, **kwargs):
+    def __post_init__(self, decoder_layer: nn.Module, **kwargs):
         """
         Abstract method intended to be overridden by subclasses to modify or override
         the attributes of the original model after initialization.
@@ -423,10 +421,9 @@ class Seq2SeqDecoderLayer(torch.nn.Module):
 class Seq2SeqSelfAttention(nn.Module):
     def __init__(self, attn, **kwargs):
         super().__init__()
-        self._original_mod = attn
-        self.__post_init__(**kwargs)
+        self.__post_init__(attn, **kwargs)
 
-    def __post_init__(self, **kwargs):
+    def __post_init__(self, attn: nn.Module, **kwargs):
         """
         Abstract method intended to be overridden by subclasses to modify or override
         the attributes of the original model after initialization.
@@ -495,8 +492,13 @@ class Seq2SeqSelfAttention(nn.Module):
 class Seq2SeqCrossAttention(nn.Module):
     def __init__(self, attn, **kwargs):
         super().__init__()
-        self._original_mod = attn
-        self.__post_init__(**kwargs)
+        self.__post_init__(attn, **kwargs)
+
+    def __post_init__(self, attn: nn.Module, **kwargs):
+        """
+        Optional post-init hook for subclasses (e.g., to register q/k/v/out projections).
+        """
+        pass
 
     def forward(
         self,
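Across these hunks the pattern is the same API shift: wrappers stop storing `self._original_mod` and instead hand the wrapped Hugging Face module to `__post_init__`. A minimal sketch of a subclass under the new signature (the projection attribute names on `attn` are illustrative, not taken from a specific model):

```python
from torch import nn

from optimum.rbln.transformers.models.seq2seq.seq2seq_architecture import Seq2SeqSelfAttention


class MySelfAttention(Seq2SeqSelfAttention):
    def __post_init__(self, attn: nn.Module, **kwargs):
        # The wrapped attention module now arrives as an argument instead of
        # being read back from self._original_mod.
        self.q_proj = attn.q_proj
        self.k_proj = attn.k_proj
        self.v_proj = attn.v_proj
        self.out_proj = attn.out_proj
```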
optimum/rbln/transformers/models/siglip/modeling_siglip.py
CHANGED

@@ -21,6 +21,7 @@ from transformers.modeling_outputs import BaseModelOutputWithPooling
 from ....configuration_utils import RBLNCompileConfig
 from ....modeling import RBLNModel
 from ....utils.logging import get_logger
+from ...modeling_outputs import _validate_output_attentions, _validate_output_hidden_states
 from .configuration_siglip import RBLNSiglipVisionModelConfig
 
 
@@ -52,7 +53,7 @@ class _SiglipVisionModel(torch.nn.Module):
             interpolate_pos_encoding=self.interpolate_pos_encoding,
             output_attentions=self.output_attentions,
         )
-        return
+        return enc_out
 
 
 class RBLNSiglipVisionModel(RBLNModel):
@@ -138,23 +139,8 @@ class RBLNSiglipVisionModel(RBLNModel):
             The model outputs. If return_dict=False is passed, returns a tuple of tensors. Otherwise, returns a BaseModelOutputWithPooling object.
         """
 
-        output_attentions = output_attentions if output_attentions is not None else self.rbln_config.output_attentions
-        output_hidden_states = (
-            output_hidden_states if output_hidden_states is not None else self.rbln_config.output_hidden_states
-        )
-
-        if output_attentions != self.rbln_config.output_attentions:
-            raise ValueError(
-                f"Variable output_attentions {output_attentions} is not equal to rbln_config.output_attentions {self.rbln_config.output_attentions} "
-                f"Please compile again with the correct argument."
-            )
-
-        if output_hidden_states != self.rbln_config.output_hidden_states:
-            raise ValueError(
-                f"Variable output_hidden_states {output_hidden_states} is not equal to rbln_config.output_hidden_states {self.rbln_config.output_hidden_states} "
-                f"Please compile again with the correct argument."
-            )
-
+        output_attentions = _validate_output_attentions(output_attentions, self.rbln_config)
+        output_hidden_states = _validate_output_hidden_states(output_hidden_states, self.rbln_config)
         if interpolate_pos_encoding != self.rbln_config.interpolate_pos_encoding:
             raise ValueError(
                 f"Variable interpolate_pos_encoding {interpolate_pos_encoding} is not equal to rbln_config.interpolate_pos_encoding {self.rbln_config.interpolate_pos_encoding} "
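The two inline checks collapse into helpers imported from the new `optimum/rbln/transformers/modeling_outputs.py` (+25 lines, not shown in full in this diff). Judging only from the code they replace, each helper plausibly resolves `None` to the compile-time default and rejects a mismatch; a sketch of that contract, not the shipped implementation:

```python
def _validate_output_hidden_states(output_hidden_states, rbln_config):
    # Resolve None to the value baked in at compile time, then refuse a
    # mismatch, mirroring the inline logic removed above.
    if output_hidden_states is None:
        output_hidden_states = rbln_config.output_hidden_states
    if output_hidden_states != rbln_config.output_hidden_states:
        raise ValueError(
            f"Variable output_hidden_states {output_hidden_states} is not equal to "
            f"rbln_config.output_hidden_states {rbln_config.output_hidden_states}. "
            "Please compile again with the correct argument."
        )
    return output_hidden_states
```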
optimum/rbln/transformers/models/swin/configuration_swin.py
CHANGED

@@ -32,11 +32,6 @@ class RBLNSwinBackboneConfig(RBLNModelForImageClassificationConfig):
         Raises:
             ValueError: If batch_size is not a positive integer.
         """
-        super().__init__(**kwargs)
-        self.batch_size = batch_size or 1
-        if not isinstance(self.batch_size, int) or self.batch_size < 0:
-            raise ValueError(f"batch_size must be a positive integer, got {self.batch_size}")
-
-        self.image_size = image_size
+        super().__init__(batch_size=batch_size, image_size=image_size, **kwargs)
         self.output_hidden_states = output_hidden_states
         self.output_attentions = output_attentions
optimum/rbln/transformers/models/t5/t5_architecture.py
CHANGED

@@ -111,9 +111,9 @@ class T5ForConditionalGeneration(Seq2SeqForConditionalGeneration):
 class T5Decoder(Seq2SeqDecoder):
     has_pos_emb = False
 
-    def __post_init__(self, dec_max_seq_len: int = None):
-        self.invert_attention_mask = self._original_mod.invert_attention_mask
-        self._dec_position_bias = self.precompute_dec_position_bias(self._original_mod, dec_max_seq_len)
+    def __post_init__(self, model: nn.Module, dec_max_seq_len: int = None):
+        self.invert_attention_mask = model.invert_attention_mask
+        self._dec_position_bias = self.precompute_dec_position_bias(model, dec_max_seq_len)
 
     def precompute_dec_position_bias(self, model, dec_max_length):
         attn_layer = model.block[0].layer[0].SelfAttention
@@ -145,13 +145,12 @@ class T5Decoder(Seq2SeqDecoder):
 class T5Block(Seq2SeqDecoderLayer):
     def __init__(self, decoder_layer, self_attn):
         super().__init__(decoder_layer, self_attn, cross_attn=None)
-        self.__post_init__()
 
-    def __post_init__(self):
-        self.self_attn_layer_norm = self._original_mod.layer[0].layer_norm
-        self.encoder_attn_layer_norm = self._original_mod.layer[1].layer_norm
-        self.cross_attn = T5CrossAttention(self._original_mod.layer[1].EncDecAttention)
-        self.ff_layer = self._original_mod.layer[2]
+    def __post_init__(self, decoder_layer: nn.Module):
+        self.self_attn_layer_norm = decoder_layer.layer[0].layer_norm
+        self.encoder_attn_layer_norm = decoder_layer.layer[1].layer_norm
+        self.cross_attn = T5CrossAttention(decoder_layer.layer[1].EncDecAttention)
+        self.ff_layer = decoder_layer.layer[2]
 
     def pre_self_attn_layer_norm(self, hidden_states):
         return self.self_attn_layer_norm(hidden_states)
@@ -167,13 +166,13 @@ class T5Block(Seq2SeqDecoderLayer):
 
 
 class T5LayerSelfAttention(Seq2SeqSelfAttention):
-    def __post_init__(self):
-        self.q_proj = self._original_mod.q
-        self.k_proj = self._original_mod.k
-        self.v_proj = self._original_mod.v
-        self.out_proj = self._original_mod.o
-        self.num_heads = self._original_mod.n_heads
-        self.head_dim = self._original_mod.key_value_proj_dim
+    def __post_init__(self, attn: nn.Module):
+        self.q_proj = attn.q
+        self.k_proj = attn.k
+        self.v_proj = attn.v
+        self.out_proj = attn.o
+        self.num_heads = attn.n_heads
+        self.head_dim = attn.key_value_proj_dim
         self.attn_decode = torch.ops.rbln_custom_ops.paged_add_softmax_attn_decode
 
     def projection(self, hidden_states) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
optimum/rbln/transformers/models/time_series_transformer/time_series_transformers_architecture.py
CHANGED

@@ -140,7 +140,6 @@ class TimeSeriesTransformersDecoderWrapper(torch.nn.Module):
 class TimeSeriesTransformersDecoder(nn.Module):
     def __init__(self, model, layers, **kwargs):
         super().__init__()
-        self._original_mod = model
         self.config = model.config
         self.layers = nn.ModuleList(layers)
         self.value_embedding = model.value_embedding
@@ -190,7 +189,6 @@ class TimeSeriesTransformersDecoder(nn.Module):
 class TimeSeriesTransformersDecoderLayer(nn.Module):
     def __init__(self, decoder_layer, self_attn, cross_attn):
         super().__init__()
-        self._original_mod = decoder_layer
         self.self_attn = self_attn
         self.encoder_attn = cross_attn
         self.embed_dim = decoder_layer.embed_dim
@@ -245,7 +243,6 @@ class TimeSeriesTransformersDecoderLayer(nn.Module):
 class TimeSeriesTransformersAttention(nn.Module):
     def __init__(self, attn, num_parallel_samples):
         super().__init__()
-        self._original_mod = attn
         self.q_proj = attn.q_proj
         self.k_proj = attn.k_proj
         self.v_proj = attn.v_proj
optimum/rbln/transformers/models/whisper/whisper_architecture.py
CHANGED

@@ -154,7 +154,6 @@ class WhisperDecoderWrapper(torch.nn.Module):
 class WhisperDecoder(nn.Module):
     def __init__(self, model, layers, **kwargs):
         super().__init__()
-        self._original_mod = model
         self.layers = nn.ModuleList(layers)
         self.embed_tokens = model.embed_tokens
         self.layer_norm = model.layer_norm
@@ -210,7 +209,6 @@ class WhisperDecoder(nn.Module):
 class WhisperDecoderLayer(nn.Module):
     def __init__(self, decoder_layer, self_attn, cross_attn):
         super().__init__()
-        self._original_mod = decoder_layer
         self.self_attn = self_attn
         self.encoder_attn = cross_attn
         self.self_attn_layer_norm = decoder_layer.self_attn_layer_norm
@@ -263,7 +261,6 @@ class WhisperDecoderLayer(nn.Module):
 class WhisperAttention(nn.Module):
     def __init__(self, attn):
         super().__init__()
-        self._original_mod = attn
         self.q_proj = attn.q_proj
         self.k_proj = attn.k_proj
         self.v_proj = attn.v_proj
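These hunks, like the seq2seq, T5, and time-series ones above, all drop the stored `_original_mod` back-reference. The diff doesn't state the motivation, but one concrete effect is easy to show: PyTorch registers any `nn.Module` assigned as an attribute as a child, so the wrapper previously exposed every wrapped tensor under two names in its `state_dict`:

```python
from torch import nn


class Attn(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)


class Wrapper(nn.Module):
    def __init__(self, attn: nn.Module):
        super().__init__()
        self._original_mod = attn  # registered as a submodule
        self.q_proj = attn.q_proj  # the reference the wrapper actually uses


print(sorted(Wrapper(Attn()).state_dict().keys()))
# ['_original_mod.q_proj.bias', '_original_mod.q_proj.weight', 'q_proj.bias', 'q_proj.weight']
```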
optimum/rbln/transformers/utils/rbln_quantization.py
CHANGED

@@ -221,11 +221,12 @@ def load_weight_files(
     cache_dir: Optional[str] = None,
     force_download: bool = False,
     local_files_only: bool = False,
+    exception_keywords: Optional[List[str]] = None,
 ) -> list[str]:
     """
     Discover and download safetensors files for the given model id.
     """
-
+    exception_keywords = exception_keywords or []
     if os.path.isdir(model_id):
         safetensor_files = glob.glob(f"{model_id}/*.safetensors")
     else:
@@ -237,17 +238,24 @@ def load_weight_files(
 
            for file in repo_files:
                if file.endswith(".safetensors"):
-                   # Download the safetensors file
-                   downloaded_file = hf_hub_download(
-                       repo_id=model_id,
-                       filename=file,
-                       revision=revision,
-                       token=use_auth_token,
-                       cache_dir=cache_dir,
-                       force_download=force_download,
-                       local_files_only=local_files_only,
-                   )
-                   safetensor_files.append(downloaded_file)
+                   exculde = False
+                   for except_key in exception_keywords:
+                       if except_key in file:
+                           exculde = True
+                           break
+
+                   if not exculde:
+                       # Download the safetensors file
+                       downloaded_file = hf_hub_download(
+                           repo_id=model_id,
+                           filename=file,
+                           revision=revision,
+                           token=use_auth_token,
+                           cache_dir=cache_dir,
+                           force_download=force_download,
+                           local_files_only=local_files_only,
+                       )
+                       safetensor_files.append(downloaded_file)
        except Exception as e:
            logger.error(f"Failed to download safetensors files from Hugging Face Hub: {e}")
            raise e
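The new `exception_keywords` parameter skips any shard whose filename contains one of the given substrings (note the shipped code spells the local flag `exculde`). A hypothetical call; the leading argument is assumed from context, since the full signature isn't shown in this hunk:

```python
# Skip e.g. "consolidated" checkpoints that some repos ship alongside the
# regular shards (keyword chosen purely for illustration).
safetensor_files = load_weight_files(
    "some-org/some-model",
    exception_keywords=["consolidated"],
)
```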
optimum/rbln/utils/import_utils.py
CHANGED

@@ -136,7 +136,22 @@ def is_rbln_available() -> bool:
 
 def check_version_compats() -> None:
     warnings.filterwarnings(action="always", category=ImportWarning, module="optimum.rbln")
-    my_version = importlib.metadata.version("optimum-rbln")
+    try:
+        my_version = importlib.metadata.version("optimum-rbln")
+    except importlib.metadata.PackageNotFoundError:
+        # Common dev case: running from source (e.g. PYTHONPATH=src) without installing the package.
+        # package metadata doesn't exist, so fall back to the in-repo version file.
+        try:
+            from optimum.rbln.__version__ import __version__ as my_version  # type: ignore
+        except Exception:
+            warnings.warn(
+                "Could not determine optimum-rbln version (package metadata missing). "
+                "If you are running from source, consider `pip install -e .` to install metadata.",
+                ImportWarning,
+                stacklevel=2,
+            )
+            return
+
     target_version = list(filter(lambda v: Version(my_version) >= Version(v), RBLN_VERSION_COMPATS.keys()))[0]
     for compat in RBLN_VERSION_COMPATS[target_version]:
         try:
optimum/rbln/utils/runtime_utils.py
CHANGED

@@ -20,6 +20,10 @@ import rebel
 import torch
 
 
+def is_compiler_supports_buffer_resize() -> bool:
+    return hasattr(rebel.RBLNCompiledModel, "exp_multiply_buffer_size")
+
+
 def get_available_dram(npu: Optional[str] = None) -> int:
     """
     Get the available DRAM size of the specified NPU.
@@ -75,12 +79,6 @@ def tp_and_devices_are_ok(
     if tensor_parallel_size is None:
         tensor_parallel_size = 1
 
-    if rebel.device_count() < tensor_parallel_size:
-        return (
-            f"Tensor parallel size {tensor_parallel_size} is greater than "
-            f"the number of available devices {rebel.device_count()}."
-        )
-
     if device is None:
         device = list(range(tensor_parallel_size))
     elif isinstance(device, int):
@@ -103,6 +101,12 @@
                 f"Device {device_id} is not a valid NPU device. Please check your NPU status with 'rbln-stat' command."
             )
 
+    if rebel.device_count() < tensor_parallel_size:
+        return (
+            f"Tensor parallel size {tensor_parallel_size} is greater than "
+            f"the number of available devices {rebel.device_count()}."
+        )
+
     if npu is not None:
         for device_id in device:
             npu_name = rebel.get_npu_name(device_id)
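`is_compiler_supports_buffer_resize` is plain `hasattr` feature detection: older `rebel` releases simply lack the experimental attribute. A hedged usage sketch (the gating site is hypothetical; the helper itself is exactly as in the hunk above):

```python
from optimum.rbln.utils.runtime_utils import is_compiler_supports_buffer_resize

# Only rely on the experimental buffer-resize API when the installed compiler
# actually exposes it; otherwise keep the default buffer sizing.
if is_compiler_supports_buffer_resize():
    print("rebel compiler supports experimental buffer resizing")
else:
    print("falling back to default buffer sizing")
```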
optimum/rbln/utils/submodule.py
CHANGED

@@ -61,12 +61,25 @@ class SubModulesMixin:
     ):
         return rbln_config
 
+    @classmethod
+    def _update_submodule_rbln_config(
+        cls,
+        submodule_name: str,
+        submodule_cls: Type["RBLNModel"],
+        model: "PreTrainedModel",
+        submodule_config: PretrainedConfig,
+        submodule_rbln_config: RBLNModelConfig,
+        preprocessors: Optional[Union["AutoFeatureExtractor", "AutoProcessor", "AutoTokenizer"]],
+    ):
+        return submodule_rbln_config
+
     @classmethod
     def _export_submodules_from_model(
         cls, model: "PreTrainedModel", model_save_dir: str, rbln_config: RBLNModelConfig, **kwargs
     ) -> List["RBLNModel"]:
         rbln_submodules = []
         submodule_prefix = getattr(cls, "_rbln_submodule_prefix", None)
+        submodule_postfix = getattr(cls, "_rbln_submodule_postfix", None)
         preprocessors = kwargs.pop("preprocessors", [])
 
         for submodule in cls._rbln_submodules:
@@ -74,6 +87,9 @@ class SubModulesMixin:
             if submodule_prefix is not None:
                 torch_submodule: PreTrainedModel = getattr(model, submodule_prefix)
                 torch_submodule = getattr(torch_submodule, submodule_name)
+            elif submodule_postfix is not None:
+                torch_submodule: PreTrainedModel = getattr(model, submodule_name)
+                torch_submodule = getattr(torch_submodule, submodule_postfix)
             else:
                 torch_submodule: PreTrainedModel = getattr(model, submodule_name)
 
@@ -92,6 +108,14 @@ class SubModulesMixin:
                 filtered_kwargs["cls_name"] = submodule_config_cls.__name__
                 submodule_rbln_config = submodule_config_cls(**filtered_kwargs)
 
+            submodule_rbln_config = cls._update_submodule_rbln_config(
+                submodule_name=submodule_name,
+                submodule_cls=submodule_cls,
+                model=model,
+                submodule_config=torch_submodule.config,
+                submodule_rbln_config=submodule_rbln_config,
+                preprocessors=preprocessors,
+            )
             setattr(rbln_config, submodule_name, submodule_rbln_config)
             submodule_rbln_config = submodule_cls._update_submodule_config(model, submodule_rbln_config, preprocessors)
 
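`_update_submodule_rbln_config` gives a model class a hook to adjust each submodule's config before it is attached to the parent `rbln_config`; the base implementation is a pass-through. An illustrative override; the class name, submodule name, and policy below are hypothetical:

```python
from optimum.rbln.modeling import RBLNModel


class RBLNMyVisionLanguageModel(RBLNModel):  # hypothetical multi-modal model
    _rbln_submodules = [{"name": "vision_tower"}]

    @classmethod
    def _update_submodule_rbln_config(
        cls, submodule_name, submodule_cls, model, submodule_config, submodule_rbln_config, preprocessors
    ):
        # Illustrative policy: pin the vision encoder to batch size 1 regardless
        # of the language model's batch size.
        if submodule_name == "vision_tower":
            submodule_rbln_config.batch_size = 1
        return submodule_rbln_config
```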
{optimum_rbln-0.9.4a2.dist-info → optimum_rbln-0.9.5a4.dist-info}/METADATA
CHANGED

@@ -1,10 +1,10 @@
 Metadata-Version: 2.4
 Name: optimum-rbln
-Version: 0.9.4a2
+Version: 0.9.5a4
 Summary: Optimum RBLN is the interface between the HuggingFace Transformers and Diffusers libraries and RBLN accelerators. It provides a set of tools enabling easy model loading and inference on single and multiple rbln device settings for different downstream tasks.
 Project-URL: Homepage, https://rebellions.ai
 Project-URL: Documentation, https://docs.rbln.ai
-Project-URL: Repository, https://github.com/
+Project-URL: Repository, https://github.com/rbln-sw/optimum-rbln
 Author-email: "Rebellions Inc." <support@rebellions.ai>
 License-Expression: Apache-2.0
 License-File: LICENSE
@@ -24,12 +24,12 @@ Classifier: Programming Language :: Python :: 3.13
 Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
 Requires-Python: <3.14,>=3.9
 Requires-Dist: accelerate>=1.0.1
-Requires-Dist: diffusers==0.
+Requires-Dist: diffusers==0.36.0
 Requires-Dist: packaging>=24.1
 Requires-Dist: torch==2.8.0
 Requires-Dist: torchaudio<=2.8.0
 Requires-Dist: torchvision<=0.23.0
-Requires-Dist: transformers==4.57.
+Requires-Dist: transformers==4.57.3
 Description-Content-Type: text/markdown
 
 
@@ -40,7 +40,7 @@ Description-Content-Type: text/markdown
 <img src="assets/rbln_logo.png" width="60%"/>
 
 [](https://badge.fury.io/py/optimum-rbln)
-[](https://github.com/
+[](https://github.com/rbln-sw/optimum-rbln/blob/main/LICENSE)
 [](https://docs.rbln.ai/software/optimum/optimum_rbln.html)
 [](CODE_OF_CONDUCT.md)
 
@@ -113,7 +113,7 @@ pip install optimum-rbln --extra-index-url https://download.pytorch.org/whl/cpu
 The below command installs `optimum-rbln` along with its dependencies.
 
 ```bash
-git clone https://github.com/
+git clone https://github.com/rbln-sw/optimum-rbln.git
 cd optimum-rbln
 ./scripts/uv-sync.sh
 ```