rxnn 0.1.83__tar.gz → 0.2.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {rxnn-0.1.83 → rxnn-0.2.1}/PKG-INFO +11 -9
- {rxnn-0.1.83 → rxnn-0.2.1}/README.md +10 -8
- {rxnn-0.1.83 → rxnn-0.2.1}/pyproject.toml +1 -1
- rxnn-0.2.1/src/rxnn/.DS_Store +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/experimental/attention.py +5 -0
- rxnn-0.2.1/src/rxnn/memory/attention.py +42 -0
- rxnn-0.2.1/src/rxnn/memory/stm.py +96 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/rxt/models.py +71 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/bml.py +2 -59
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/callbacks.py +302 -39
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/dataset.py +344 -1
- rxnn-0.2.1/src/rxnn/training/models.py +142 -0
- rxnn-0.2.1/src/rxnn/training/mrl.py +808 -0
- rxnn-0.2.1/src/rxnn/training/reward.py +111 -0
- rxnn-0.2.1/src/rxnn/training/rl.py +69 -0
- rxnn-0.2.1/src/rxnn/training/utils.py +148 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/attention.py +10 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/layers.py +6 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/models.py +16 -4
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/positional.py +7 -0
- rxnn-0.2.1/src/rxnn/transformers/sampler.py +443 -0
- rxnn-0.1.83/src/rxnn/memory/stm.py +0 -53
- rxnn-0.1.83/src/rxnn/transformers/sampler.py +0 -169
- {rxnn-0.1.83 → rxnn-0.2.1}/LICENSE +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/__init__.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/experimental/__init__.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/experimental/models.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/experimental/moe.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/memory/__init__.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/memory/norm.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/rxt/__init__.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/__init__.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/base.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/scheduler.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/tokenizer.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/__init__.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/ff.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/mask.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/transformers/moe.py +0 -0
- {rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/utils.py +0 -0
{rxnn-0.1.83 → rxnn-0.2.1}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: rxnn
-Version: 0.1.83
+Version: 0.2.1
 Summary: RxNN: Reactive Neural Networks Platform
 License: Apache-2.0
 Keywords: deep-learning,ai,machine-learning
@@ -23,8 +23,10 @@ Project-URL: Homepage, https://rxai.dev/rxnn
 Project-URL: Repository, https://github.com/RxAI-dev/rxnn/python
 Description-Content-Type: text/markdown

-<
-<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/
+<span>
+<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_rxai_v2.png" width="400" />
+<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_rxnn_v2.png" width="400" />
+</span>

 # Reactive AI - RxNN
 ## Reactive Neural Networks Platform
@@ -61,8 +63,8 @@ We are working on three new reactive architectures, that progressively advance f

 Each new architecture is based on the previous one and adding new features/abilities. They will be progressively
 released with next versions of **RxNN** framework:
-- 0.1.x: Reactive Transformer base models, Base Model Learning (pre-training/fine-tuning) & Transformers extensions (MoE Attention, Short-Term Memory, etc.)
-- 0.2.x: Memory Reinforcement Learning (MRL) for Short-Term Memory & Reactive Transformer, Attention-based Memory System details
+- 0.1.x (Released): Reactive Transformer base models, Base Model Learning (pre-training/fine-tuning) & Transformers extensions (MoE Attention, Short-Term Memory, etc.)
+- 0.2.x (Released): Memory Reinforcement Learning (MRL) for Short-Term Memory & Reactive Transformer, Attention-based Memory System details
 - 0.3.x: Reinforcement Learning from Human Feedback for Reactive models (RxRLHF), basic Tensor Reactive
 Extensions (TRX/Rust) for full Reactive Transformer, RxT-Alpha release (+following models - RxT-Beta, etc.)
 - 0.4.x: Preactor base models, Tensor Database (TDB/Rust) for Long-Term Memory, mxRAG/revRAG subsystems
@@ -126,7 +128,7 @@ Submodules:
 - `rxnn.transformers.moe` - Mixture-of-Experts feed forward layers - `MoeFeedForward` & `GatedMoeFeedForward` (recommended)
 - `rxnn.transformer.layers` - complete reactive/classic transformer layers - `ReactiveTransformerLayer` & `ClassicTransformerLayer`
 - `rxnn.transformer.models` - reactive/classic transformer models - `ReactiveTransformerEncoder`, `ReactiveTransformerDecoder` & `ClassicTransformerEncoder`, `ClassicTransformerDecoder`
-- `rxnn.transformer.sampler` - samplers for reactive models (Sampler is the integral part of reactive architectures) - `Sampler` & `
+- `rxnn.transformer.sampler` - samplers for reactive models (Sampler is the integral part of reactive architectures) - `Sampler`, `SampleDecoder`, `BatchSampler` & `BatchSampleDecoder`

 In **RxNN** models are initialized in declarative style by class composition, but then they are wrapped in imperative classes,
 to be compatible with HuggingFace **JSON** config. In example:
@@ -211,7 +213,7 @@ include **Long-Term Memory**.

 The main `ShortTermMemory` class is located in `rxnn.memory.stm` module - the usage example is in Transformers module description.

-
+> 0.2.x Memory modules docs in progress - will be released soon

 #### Training
 Training module includes **Trainers** for different training stages of reactive models and shared training utils.
@@ -233,9 +235,9 @@ Submodules:
 - `rxnn.training.callbacks` contain Trainer callbacks, for different kind of utils (more info below)
 - `rxnn.training.scheduler` includes learning rate scheduler for training
 - `rxnn.training.bml` - Base Model Learning module with Trainers for pre-training and fine-tuning
-- `rxnn.training.mrl` - Memory Reinforcement Learning module with Trainers for MRL
+- `rxnn.training.mrl` - Memory Reinforcement Learning module with Trainers for MRL
 - `rxnn.training.rxrlhf` - Reinforcement Learning from Human Feedback for Reactive Models module (from 0.3.x)
-- `rxnn.training.brl` - Behavioral Reinforcement Learning module (Reactor / from 0.7.x
+- `rxnn.training.brl` - Behavioral Reinforcement Learning module (Reactor / from 0.7.x)

 ##### Base Model Learning
 Docs in progress
{rxnn-0.1.83 → rxnn-0.2.1}/README.md

@@ -1,5 +1,7 @@
-<
-<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/
+<span>
+<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_rxai_v2.png" width="400" />
+<img src="https://raw.githubusercontent.com/RxAI-dev/RxNN/refs/heads/main/assets/logo/logo_rxnn_v2.png" width="400" />
+</span>

 # Reactive AI - RxNN
 ## Reactive Neural Networks Platform
@@ -36,8 +38,8 @@ We are working on three new reactive architectures, that progressively advance f

 Each new architecture is based on the previous one and adding new features/abilities. They will be progressively
 released with next versions of **RxNN** framework:
-- 0.1.x: Reactive Transformer base models, Base Model Learning (pre-training/fine-tuning) & Transformers extensions (MoE Attention, Short-Term Memory, etc.)
-- 0.2.x: Memory Reinforcement Learning (MRL) for Short-Term Memory & Reactive Transformer, Attention-based Memory System details
+- 0.1.x (Released): Reactive Transformer base models, Base Model Learning (pre-training/fine-tuning) & Transformers extensions (MoE Attention, Short-Term Memory, etc.)
+- 0.2.x (Released): Memory Reinforcement Learning (MRL) for Short-Term Memory & Reactive Transformer, Attention-based Memory System details
 - 0.3.x: Reinforcement Learning from Human Feedback for Reactive models (RxRLHF), basic Tensor Reactive
 Extensions (TRX/Rust) for full Reactive Transformer, RxT-Alpha release (+following models - RxT-Beta, etc.)
 - 0.4.x: Preactor base models, Tensor Database (TDB/Rust) for Long-Term Memory, mxRAG/revRAG subsystems
@@ -101,7 +103,7 @@ Submodules:
 - `rxnn.transformers.moe` - Mixture-of-Experts feed forward layers - `MoeFeedForward` & `GatedMoeFeedForward` (recommended)
 - `rxnn.transformer.layers` - complete reactive/classic transformer layers - `ReactiveTransformerLayer` & `ClassicTransformerLayer`
 - `rxnn.transformer.models` - reactive/classic transformer models - `ReactiveTransformerEncoder`, `ReactiveTransformerDecoder` & `ClassicTransformerEncoder`, `ClassicTransformerDecoder`
-- `rxnn.transformer.sampler` - samplers for reactive models (Sampler is the integral part of reactive architectures) - `Sampler` & `
+- `rxnn.transformer.sampler` - samplers for reactive models (Sampler is the integral part of reactive architectures) - `Sampler`, `SampleDecoder`, `BatchSampler` & `BatchSampleDecoder`

 In **RxNN** models are initialized in declarative style by class composition, but then they are wrapped in imperative classes,
 to be compatible with HuggingFace **JSON** config. In example:
@@ -186,7 +188,7 @@ include **Long-Term Memory**.

 The main `ShortTermMemory` class is located in `rxnn.memory.stm` module - the usage example is in Transformers module description.

-
+> 0.2.x Memory modules docs in progress - will be released soon

 #### Training
 Training module includes **Trainers** for different training stages of reactive models and shared training utils.
@@ -208,9 +210,9 @@ Submodules:
 - `rxnn.training.callbacks` contain Trainer callbacks, for different kind of utils (more info below)
 - `rxnn.training.scheduler` includes learning rate scheduler for training
 - `rxnn.training.bml` - Base Model Learning module with Trainers for pre-training and fine-tuning
-- `rxnn.training.mrl` - Memory Reinforcement Learning module with Trainers for MRL
+- `rxnn.training.mrl` - Memory Reinforcement Learning module with Trainers for MRL
 - `rxnn.training.rxrlhf` - Reinforcement Learning from Human Feedback for Reactive Models module (from 0.3.x)
-- `rxnn.training.brl` - Behavioral Reinforcement Learning module (Reactor / from 0.7.x
+- `rxnn.training.brl` - Behavioral Reinforcement Learning module (Reactor / from 0.7.x)

 ##### Base Model Learning
 Docs in progress
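
The sampler entry above now lists four exports. As a minimal, hedged note for downstream code: the class names are taken from the README line itself and the module path from the file list (`src/rxnn/transformers/sampler.py`); constructor signatures and usage are not shown in this diff.

```python
# Names from the README entry above; signatures/usage are not part of this diff.
from rxnn.transformers.sampler import Sampler, SampleDecoder, BatchSampler, BatchSampleDecoder
```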
rxnn-0.2.1/src/rxnn/.DS_Store

Binary file (no diff shown)
{rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/experimental/attention.py

@@ -287,6 +287,7 @@ class SparseQueryAttention(MultiHeadAttention):
             k = self.k_proj(key).view(b, -1, self.num_groups, head_dim).transpose(1, 2)
             v = self.v_proj(value).view(b, -1, self.num_groups, head_dim).transpose(1, 2)
         else:
+            # Relative embedding version is not working without this strange mapping - it will be removed in next versions
             group_heads = self.num_heads // self.num_groups
             query_heads = self.num_heads // self.num_query_groups
             # Process Q
@@ -457,6 +458,7 @@ def init_experimental_attention(
        dropout: float = 0.0,
        rope: RotaryPositionalEmbedding = None,
        rope_only_for_query: bool = False,
+       rope_only_for_keys: bool = False,
        use_relative_embeddings: bool = False,
        max_seq_len: int = 1024,
        use_flash_attention: bool = False,
@@ -478,6 +480,7 @@ def init_experimental_attention(
            use_relative_embeddings=use_relative_embeddings,
            max_seq_len=max_seq_len,
            rope_only_for_query=rope_only_for_query,
+           rope_only_for_keys=rope_only_for_keys,
            use_flash_attention=use_flash_attention,
            is_causal=is_causal,
            use_bias=use_bias,
@@ -493,6 +496,7 @@ def init_experimental_attention(
            use_relative_embeddings=use_relative_embeddings,
            max_seq_len=max_seq_len,
            rope_only_for_query=rope_only_for_query,
+           rope_only_for_keys=rope_only_for_keys,
            use_flash_attention=use_flash_attention,
            is_causal=is_causal,
            use_bias=use_bias,
@@ -511,6 +515,7 @@ def init_experimental_attention(
            use_relative_embeddings=use_relative_embeddings,
            max_seq_len=max_seq_len,
            rope_only_for_query=rope_only_for_query,
+           rope_only_for_keys=rope_only_for_keys,
            use_flash_attention=use_flash_attention,
            is_causal=is_causal,
            use_bias=use_bias,
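
The new `rope_only_for_keys` flag mirrors the existing `rope_only_for_query` option and is forwarded to each experimental attention variant. Below is a hedged sketch of a call, modelled on the lambda added to `rxnn.rxt.models` later in this diff; the concrete argument values are illustrative and the factory's remaining defaults are assumed, not documented here.

```python
# Illustrative call only - mirrors the invocation in RxTAlphaMemoryAttention below; values are arbitrary.
from rxnn.transformers.positional import RotaryPositionalEmbedding
from rxnn.experimental.attention import init_experimental_attention

embed_dim, att_heads, seq_len = 512, 16, 1024
rope = RotaryPositionalEmbedding(embed_dim // att_heads, seq_len)

# Non-causal Sparse Query Attention with RoPE applied to keys only,
# as used by the memory attention model added in this release.
attention = init_experimental_attention(
    embed_dim, att_heads, 'sqa', 1, rope=rope,
    use_flash_attention=True, dropout=0.0, max_seq_len=seq_len,
    is_causal=False, num_experts=None, num_query_experts=None,
    num_query_groups=8, rope_only_for_keys=True,
)
```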
rxnn-0.2.1/src/rxnn/memory/attention.py (new file)

@@ -0,0 +1,42 @@
+import torch
+import torch.nn as nn
+from .stm import ShortTermMemory
+
+class StmMemoryAttention(nn.Module):
+    def __init__(
+            self,
+            stm: ShortTermMemory,
+            attention_layers: nn.ModuleList,
+            memory_norm_layers: nn.ModuleList,
+            *args,
+            **kwargs
+    ):
+        super(StmMemoryAttention, self).__init__(*args, **kwargs)
+        self.stm = stm
+        self.attention_layers = attention_layers
+        self.memory_norm_layers = memory_norm_layers
+        assert len(self.attention_layers) == len(self.memory_norm_layers) == self.stm.memory.size(0)
+        self.num_layers = len(attention_layers)
+
+    def update_max_len(self, max_seq_len: int):
+        for i in range(self.num_layers):
+            if self.attention_layers[i].rope is not None:
+                self.attention_layers[i].rope.update_max_len(max_seq_len)
+
+    def forward(self, x: torch.Tensor, attention_mask: torch.Tensor = None) -> torch.Tensor:
+        mask = attention_mask.unsqueeze(1).unsqueeze(1).bool() if attention_mask is not None else None
+
+        new_stm = torch.zeros_like(self.stm.memory)
+        for i in range(self.num_layers):
+            layer_stm = self.stm(i)
+            # expand layer STM to batch size, if it's not in batch mode
+            if layer_stm.size(0) == 1:
+                layer_stm = layer_stm.expand(x.size(0), -1, -1)
+            encoded_layer_data = x[i]
+            normalized_layer_stm = self.memory_norm_layers[i](layer_stm)
+            new_layer_stm = self.attention_layers[i](normalized_layer_stm, encoded_layer_data, encoded_layer_data, mask=mask)
+            # self.stm.update_layer(i, new_layer_stm + layer_stm)
+            new_stm[i] = new_layer_stm + layer_stm # residual
+        self.stm.update_all(new_stm)
+        return self.stm.memory
+
rxnn-0.2.1/src/rxnn/memory/stm.py (new file)

@@ -0,0 +1,96 @@
+import torch
+import torch.nn as nn
+
+class ShortTermMemory(nn.Module):
+    """Short-term memory module for the Attention-based Memory System"""
+
+    def __init__(self, num_layers: int, embed_dim: int, stm_size: int, init_type: str = 'normal',
+                 is_trainable: bool = False, legacy_init: bool = True, *args, **kwargs):
+        super(ShortTermMemory, self).__init__(*args, **kwargs)
+        self.num_layers = num_layers
+        self.embed_dim = embed_dim
+        self.stm_size = stm_size
+        self.batch_size = 1 # setting 1 as initial batch size (it will be normally used in inference/pre-training. Bigger batches are for RL stages)
+        self.is_trainable = is_trainable
+        assert init_type in ['normal', 'standard', 'uniform', 'ones', 'zeros'], \
+            'STM init type must be one of "normal", "standard", "uniform", "ones", "zeros"'
+
+        # Legacy init - temporary option to load old models with not-batched STM (they will be loaded, updated and then the option will be removed)
+        self.legacy_init = legacy_init
+
+        self.init_type = init_type
+        stm = self._init_tensor()
+        if self.is_trainable:
+            self.memory = nn.Parameter(stm)
+        else:
+            self.register_buffer('memory', stm)
+
+    def _init_tensor(self, init_type: str = None):
+        init_type = init_type or self.init_type
+        stm_shape = (self.num_layers, self.stm_size, self.embed_dim) \
+            if self.legacy_init else (self.num_layers, self.batch_size, self.stm_size, self.embed_dim)
+        if init_type == 'normal':
+            return torch.normal(0, 0.02, stm_shape)
+        elif init_type == 'standard':
+            return torch.normal(0, 1, stm_shape)
+        elif init_type == 'uniform':
+            return torch.rand(*stm_shape) * 0.02
+        elif init_type == 'ones':
+            return torch.ones(*stm_shape)
+        else:
+            return torch.zeros(*stm_shape)
+
+    def reset_legacy_(self):
+        self.legacy_init = False
+        self.memory = self._init_tensor()
+
+    def forward(self, layer: int) -> torch.Tensor:
+        return self.memory[layer].unsqueeze(0) if self.legacy_init else self.memory[layer]
+
+    def update_layer(self, layer: int, new_stm: torch.Tensor):
+        self.memory[layer] = new_stm
+
+    def update_all(self, new_stm: torch.Tensor):
+        self.memory.copy_(new_stm)
+
+    def make_trainable(self):
+        if not self.is_trainable:
+            self.is_trainable = True
+            initial_stm = self.memory.clone()
+            del self.memory
+            self.memory = nn.Parameter(initial_stm)
+
+    def freeze(self):
+        if self.is_trainable:
+            self.requires_grad_(False)
+            trained_stm = self.memory.clone()
+            del self.memory
+            self.register_buffer('memory', trained_stm)
+
+    def reset(self, init_type: str = None):
+        self.memory = self._init_tensor(init_type)
+
+    def resize(self, new_stm_size: int, init_type: str = None):
+        self.stm_size = new_stm_size
+        self.memory = self._init_tensor(init_type)
+
+    def batched_memory(self, batch_size: int, init_type: str = None):
+        if init_type is not None:
+            assert init_type in ['normal', 'standard', 'uniform', 'ones', 'zeros'], \
+                'STM init type must be one of "normal", "standard", "uniform", "ones", "zeros"'
+            self.init_type = init_type
+        self.batch_size = batch_size
+        self.memory = self._init_tensor()
+
+    def single_memory(self, init_type: str = None, use_mean_from_batch: bool = False):
+        if init_type is not None:
+            assert init_type in ['normal', 'standard', 'uniform', 'ones', 'zeros'], \
+                'STM init type must be one of "normal", "standard", "uniform", "ones", "zeros"'
+            self.init_type = init_type
+        self.batch_size = 1
+        if use_mean_from_batch:
+            batch_mean = self.memory.mean(dim=(1, 2, 3), keepdim=True)
+            self.memory = self._init_tensor()
+            self.memory.copy_(batch_mean)
+        else:
+            self.memory = self._init_tensor()
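
A short, hedged usage sketch of the new batched STM API; the behaviour and shapes are inferred from the `ShortTermMemory` code above, and the sizes are arbitrary.

```python
# Behaviour inferred from the ShortTermMemory code above; sizes are arbitrary.
from rxnn.memory.stm import ShortTermMemory

stm = ShortTermMemory(num_layers=12, embed_dim=512, stm_size=1024)
stm.memory.shape   # (12, 1024, 512) - legacy, non-batched layout (default legacy_init=True)
stm(0).shape       # (1, 1024, 512)  - forward(layer) adds a batch dim of 1

stm.reset_legacy_()                          # switch to batched layout: (12, 1, 1024, 512)
stm.batched_memory(batch_size=16)            # (12, 16, 1024, 512) - e.g. for MRL batches
stm.single_memory(use_mean_from_batch=True)  # back to batch size 1, filled from the batch mean
stm.make_trainable()                         # promote the memory buffer to an nn.Parameter
```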
{rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/rxt/models.py

@@ -8,6 +8,8 @@ from ..transformers.layers import ReactiveTransformerLayer
 from ..transformers.models import ReactiveTransformerBase, ReactiveTransformerEncoder, ReactiveTransformerDecoder
 from ..transformers.ff import get_activation_layer
 from ..memory.stm import ShortTermMemory
+from ..memory.norm import init_memory_norm
+from ..memory.attention import StmMemoryAttention
 from ..utils import get_model_size
 from ..experimental.attention import init_experimental_attention

@@ -135,6 +137,22 @@ class RxTAlphaComponentBase(nn.Module, PyTorchModelHubMixin):
     def load_shared_memory(self, stm: ShortTermMemory):
         self.model.stm = stm

+    def freeze_without_memory(self):
+        for param in self.model.parameters():
+            param.requires_grad_(False)
+        self.model.trainable_cross_attention_(True)
+
+    def freeze_memory(self):
+        self.model.trainable_cross_attention_(False)
+
+    def unfreeze_all(self):
+        for param in self.model.parameters():
+            param.requires_grad_(True)
+
+    def update_max_len(self, max_seq_len: int):
+        for layer in self.model.layers:
+            layer.update_max_len(max_seq_len)
+
     def forward(self, x: torch.Tensor, attention_mask: torch.Tensor = None) -> Union[
         torch.Tensor, tuple[torch.Tensor, torch.Tensor]]:
         return self.model(x, attention_mask=attention_mask)
@@ -205,3 +223,56 @@ def build_rxt_alpha_for_pretraining(

     return encoder, decoder

+class RxTAlphaMemoryAttention(nn.Module, PyTorchModelHubMixin, license="apache-2.0"):
+    """RxT-Alpha (Reactive Transformer) memory attention model"""
+    def __init__(
+            self,
+            num_layers: int = 12,
+            embed_dim: int = 512,
+            att_heads: int = 16,
+            seq_len: int = 1024,
+            stm_size: int = 1024,
+            use_flash_attention: bool = True,
+            att_dropout: float = 0.0,
+            norm_type: str = 'rms',
+            att_groups: int = 1,
+            att_type: str = 'sqa',
+            att_experts: int = None,
+            att_query_experts: int = None,
+            att_query_groups: int = None,
+            **kwargs,
+    ):
+        super(RxTAlphaMemoryAttention, self).__init__(**kwargs)
+
+        assert att_type in ['mha', 'gqa', 'mqa', 'gma', 'dma', 'sqa'], 'Memory attention type could be "mha", "gqa", "mqa", "gma", "dma", "sqa".'
+
+        rope = RotaryPositionalEmbedding(embed_dim // att_heads, seq_len)
+        stm = ShortTermMemory(num_layers, embed_dim, stm_size)
+
+        if att_type in ['mha', 'gqa', 'mqa']:
+            att_init = lambda: init_attention(embed_dim, att_heads, att_type, att_groups, rope=rope,
+                                              use_flash_attention=use_flash_attention, dropout=att_dropout,
+                                              max_seq_len=seq_len, is_causal=False, rope_only_for_keys=True)
+        else:
+            att_init = lambda: init_experimental_attention(embed_dim, att_heads, att_type, att_groups, rope=rope,
+                                                           use_flash_attention=use_flash_attention, dropout=att_dropout,
+                                                           max_seq_len=seq_len, is_causal=False, num_experts=att_experts,
+                                                           num_query_experts=att_query_experts,
+                                                           num_query_groups=att_query_groups, rope_only_for_keys=True)
+
+        memory_norm_layers = nn.ModuleList([init_memory_norm(norm_type, embed_dim, stm_size) for _ in range(num_layers)])
+        attention_layers = nn.ModuleList([att_init() for _ in range(num_layers)])
+        self.model = StmMemoryAttention(stm, attention_layers, memory_norm_layers)
+
+    def load_shared_memory(self, stm: ShortTermMemory):
+        self.model.stm = stm
+
+    def update_max_len(self, max_seq_len: int):
+        self.model.update_max_len(max_seq_len)
+
+    def reset_memory(self, init_type: str = None):
+        self.model.stm.reset_memory(init_type)
+
+    def forward(self, x: torch.Tensor, attention_mask: torch.Tensor = None) -> torch.Tensor:
+        return self.model(x, attention_mask=attention_mask)
+
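
A hedged wiring sketch for the new component: construct the memory-attention model and share one `ShortTermMemory` instance with the RxT-Alpha encoder/decoder via `load_shared_memory`. Argument values are illustrative, and the forward call is left as a comment because the encoder-side tensors are not constructed here.

```python
# Hedged sketch - argument values are illustrative; only methods shown in this diff are used.
from rxnn.rxt.models import RxTAlphaMemoryAttention
from rxnn.memory.stm import ShortTermMemory

mem_attn = RxTAlphaMemoryAttention(
    num_layers=12, embed_dim=512, att_heads=16, seq_len=1024,
    stm_size=1024, att_type='gqa', att_groups=4,
)

# One STM shared by encoder, decoder and memory attention.
shared_stm = ShortTermMemory(12, 512, 1024)
mem_attn.load_shared_memory(shared_stm)
# encoder.load_shared_memory(shared_stm); decoder.load_shared_memory(shared_stm)  # on the RxT-Alpha components

# After encoding an interaction, the stacked per-layer encoder states update the STM:
# updated_memory = mem_attn(encoded_layer_states, attention_mask=attention_mask)
```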
{rxnn-0.1.83 → rxnn-0.2.1}/src/rxnn/training/bml.py

@@ -1,46 +1,12 @@
 import torch
-import torch.nn as nn
 import torch.nn.functional as F
 from torch.nn.parallel import DistributedDataParallel
 import math
-from huggingface_hub import PyTorchModelHubMixin
 from typing import Union
 import torch.distributed as dist
-from ..transformers.models import
+from ..transformers.models import ReactiveTransformerDecoder
 from ..training.base import BaseTrainer
-
-class MLMHead(nn.Module, PyTorchModelHubMixin, license="apache-2.0"):
-    def __init__(self, embed_dim: int, vocab_size: int, *args, **kwargs):
-        super(MLMHead, self).__init__(*args, **kwargs)
-        self.dense = nn.Linear(embed_dim, embed_dim)
-        self.act = nn.GELU()
-        self.layer_norm = nn.LayerNorm(embed_dim)
-        self.decoder = nn.Linear(embed_dim, vocab_size)
-
-    def forward(self, hidden_states):
-        x = self.dense(hidden_states)
-        x = self.act(x)
-        x = self.layer_norm(x)
-        return self.decoder(x)
-
-
-class MLMTrainingModel(nn.Module):
-    def __init__(
-            self,
-            encoder: ReactiveTransformerEncoder,
-            mlm_head: MLMHead,
-            *args,
-            **kwargs
-    ):
-        super(MLMTrainingModel, self).__init__(*args, **kwargs)
-        self.encoder = encoder
-        self.mlm_head = mlm_head
-
-    def forward(self, x: torch.Tensor, attention_mask: torch.Tensor = None) -> torch.Tensor:
-        h, _ = self.encoder(x, attention_mask=attention_mask)
-        y = self.mlm_head(h)
-        return y
-
+from .models import MLMTrainingModel, JointTrainingModel

 class MLMTrainer(BaseTrainer):
     def __init__(
@@ -242,29 +208,6 @@ class AutoregressiveTrainer(BaseTrainer):
         self.model.train()
         return avg_loss, metrics

-
-class JointTrainingModel(nn.Module):
-    def __init__(
-            self,
-            encoder: ReactiveTransformerEncoder,
-            decoder: ReactiveTransformerDecoder,
-            mlm_head: MLMHead,
-            *args,
-            **kwargs
-    ):
-        super(JointTrainingModel, self).__init__(*args, **kwargs)
-        self.encoder = encoder
-        self.mlm_head = mlm_head
-        self.decoder = decoder
-
-    def forward(self, x_e: torch.Tensor, x_d: torch.Tensor, attention_mask: torch.Tensor = None) -> tuple[
-        torch.Tensor, torch.Tensor]:
-        encoder_result, _ = self.encoder(x_e, attention_mask=attention_mask)
-        y_e = self.mlm_head(encoder_result)
-        y_d = self.decoder(x_d, attention_mask=attention_mask)
-        return y_e, y_d
-
-
 class JointLMTrainer(BaseTrainer):
     """"
     It's not recommended to use Joint LM Training in current implementation. More info soon