x-transformers 2.4.0__tar.gz → 2.4.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {x_transformers-2.4.0 → x_transformers-2.4.2}/PKG-INFO +13 -1
- {x_transformers-2.4.0 → x_transformers-2.4.2}/README.md +12 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/pyproject.toml +1 -1
- {x_transformers-2.4.0 → x_transformers-2.4.2}/tests/test_x_transformers.py +20 -0
- x_transformers-2.4.2/x_transformers/up_wrapper.py +225 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/x_transformers.py +1 -1
- {x_transformers-2.4.0 → x_transformers-2.4.2}/.github/FUNDING.yml +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/.github/workflows/python-publish.yml +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/.github/workflows/python-test.yaml +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/.gitignore +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/LICENSE +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/data/README.md +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/data/enwik8.gz +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/all-attention.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/attention-on-attention.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/cosine-sim-attention.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/deepnorm.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/dynamic-pos-bias-linear.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/dynamic-pos-bias-log.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/dynamic-pos-bias-sinusoidal.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/dynamic-pos-bias.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/enhanced-recurrence.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/fcm.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/ffglu.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/flash-attention.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/gate_values.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/gating.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/length-extrapolation-scale.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/macaron-1.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/macaron-2.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/memory-transformer.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/normformer.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/pia.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/qknorm-analysis.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/resi_dual.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/residual_attn.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/rezero.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/rotary.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/sandwich-2.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/sandwich.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/sandwich_norm.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/scalenorm.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/talking-heads.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/topk-attention.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/images/xval.png +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/train_belief_state.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/train_copy.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/train_entropy_tokenizer.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/train_enwik8.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/train_length_extrapolate.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/train_parity.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/__init__.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/attend.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/autoregressive_wrapper.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/belief_state_wrapper.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/continuous.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/dpo.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/entropy_based_tokenizer.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/multi_input.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/neo_mlp.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/nonautoregressive_wrapper.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/xl_autoregressive_wrapper.py +0 -0
- {x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/xval.py +0 -0
{x_transformers-2.4.0 → x_transformers-2.4.2}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: x-transformers
-Version: 2.4.0
+Version: 2.4.2
 Summary: X-Transformers
 Project-URL: Homepage, https://pypi.org/project/x-transformers/
 Project-URL: Repository, https://github.com/lucidrains/x-transformers
@@ -2495,4 +2495,16 @@ ids_out, num_out, is_number_mask = model.generate(start_ids, start_nums, 17)
 }
 ```
 
+```bibtex
+@misc{bloem2025universalpretrainingiteratedrandom,
+    title = {Universal pre-training by iterated random computation},
+    author = {Peter Bloem},
+    year = {2025},
+    eprint = {2506.20057},
+    archivePrefix = {arXiv},
+    primaryClass = {cs.LG},
+    url = {https://arxiv.org/abs/2506.20057},
+}
+```
+
 *solve intelligence... then use that to solve everything else.* - Demis Hassabis
{x_transformers-2.4.0 → x_transformers-2.4.2}/README.md

@@ -2447,4 +2447,16 @@ ids_out, num_out, is_number_mask = model.generate(start_ids, start_nums, 17)
 }
 ```
 
+```bibtex
+@misc{bloem2025universalpretrainingiteratedrandom,
+    title = {Universal pre-training by iterated random computation},
+    author = {Peter Bloem},
+    year = {2025},
+    eprint = {2506.20057},
+    archivePrefix = {arXiv},
+    primaryClass = {cs.LG},
+    url = {https://arxiv.org/abs/2506.20057},
+}
+```
+
 *solve intelligence... then use that to solve everything else.* - Demis Hassabis
{x_transformers-2.4.0 → x_transformers-2.4.2}/tests/test_x_transformers.py

@@ -1099,3 +1099,23 @@ def add_attn_pool():
     logits, intermediates = model(x, return_intermediates = True)
 
     assert intermediates.attn_pooled_tokens.shape[1] == 3
+
+def test_up():
+    from x_transformers.up_wrapper import UniversalPretrainWrapper
+
+    model = TransformerWrapper(
+        num_tokens = 256,
+        max_seq_len = 1024,
+        attn_pool = True,
+        num_attn_pool_queries = 3,
+        attn_layers = Decoder(
+            dim = 512,
+            depth = 12,
+            heads = 8
+        ),
+    )
+
+    up_wrapper = UniversalPretrainWrapper(model, seq_len = 16)
+
+    loss = up_wrapper()
+    loss.backward()
x_transformers-2.4.2/x_transformers/up_wrapper.py

@@ -0,0 +1,225 @@
+# https://arxiv.org/abs/2506.20057
+# Peter Bloem
+
+from __future__ import annotations
+from functools import partial
+from random import randrange, uniform
+
+import torch
+from torch import nn, cat, randperm
+from torch.nn import LSTM, Module
+
+from x_transformers.x_transformers import (
+    TransformerWrapper,
+    AutoregressiveWrapper
+)
+
+# functions
+
+def exists(v):
+    return v is not None
+
+def default(v, d):
+    return v if exists(v) else d
+
+def divisible_by(num, den):
+    return (num % den) == 0
+
+# random sequences, mixture of random and constant (unsure why constant is needed)
+
+def random_sequences(
+    num_tokens,
+    seq_len,
+    num_samples_random,
+    num_samples_constant,
+    shuffle = True,
+    device = None
+):
+    assert num_samples_random > 0 or num_samples_constant > 0
+
+    rand_seq = torch.randint(0, num_tokens, (num_samples_random, seq_len))
+    const_seq = torch.full((num_samples_constant, seq_len), randrange(num_tokens))
+
+    all_seq = cat((rand_seq, const_seq))
+
+    if exists(device):
+        all_seq = all_seq.to(device)
+
+    if not shuffle:
+        return all_seq
+
+    # shuffle with randperm
+
+    rand_indices = randperm(all_seq.shape[0])
+    return all_seq[rand_indices]
+
+# synthetic data generator
+
+class SyntheticDataGenerator(Module):
+    def __init__(
+        self,
+        dim,
+        num_tokens,
+        max_seq_len = 512,
+        hidden_size = None
+    ):
+        super().__init__()
+
+        self.max_seq_len = max_seq_len
+
+        self.embed = nn.Embedding(num_tokens, dim)
+
+        hidden_size = default(hidden_size, dim)
+        self.lstm = LSTM(dim, hidden_size, batch_first = True)
+
+        self.to_logits = nn.Linear(dim, num_tokens, bias = False)
+
+        self.apply(self.init_)
+
+    @torch.no_grad()
+    def init_(self, m):
+        if isinstance(m, nn.Linear):
+            m.weight *= uniform(0., 1.1) # he scales the lstm weights from 0 to 1.1
+
+    @torch.inference_mode()
+    @torch.compile
+    def generate(
+        self,
+        length,
+        seed = None,
+        condition = None,
+        temperature = 1e-4 # he uses a near greedy temperature
+    ):
+        assert exists(seed) or exists(condition)
+        prefix = [*filter(exists, (seed, condition))]
+        seq_len = self.max_seq_len
+
+        seq = torch.cat(prefix, dim = -1)
+
+        net_input = seq
+        hiddens = None
+
+        for _ in range(length):
+
+            logits, hiddens = self.forward(net_input, hiddens)
+
+            last_logit = logits[:, -1]
+            prob = (last_logit / temperature).softmax(dim = -1)
+
+            sampled = torch.multinomial(prob, 1)
+            net_input = sampled
+
+            seq = torch.cat((seq, sampled), dim = -1)
+
+        return seq[:, -seq_len:]
+
+    def forward(
+        self,
+        input,
+        hiddens = None
+    ):
+
+        tokens = self.embed(input)
+
+        embed, hidden = self.lstm(tokens, hiddens)
+
+        logits = self.to_logits(embed)
+
+        return logits, hidden
+
+# classes
+
+class UniversalPretrainWrapper(Module):
+    def __init__(
+        self,
+        model: TransformerWrapper,
+        data_generator: SyntheticDataGenerator | None = None,
+        buffer_size = None,
+        num_reset = 20,
+        batch_size = 32,
+        seq_len = 512,
+        seed_length = 8
+    ):
+        super().__init__()
+
+        self.model = model
+        self.ar_wrapped = AutoregressiveWrapper(model)
+
+        assert model.attn_layers.causal
+
+        num_tokens = model.num_tokens
+        dim = model.attn_layers.dim
+
+        if not exists(data_generator):
+            data_generator = SyntheticDataGenerator(
+                num_tokens = num_tokens,
+                dim = dim
+            )
+
+        self.seq_len = seq_len
+        self.data_generator = data_generator
+
+        self.seed_length = seed_length
+        self.batch_size = batch_size
+
+        buffer_size = default(buffer_size, batch_size * 20)
+        assert buffer_size > batch_size, f'data buffer size must be greater than batch size'
+
+        assert divisible_by(num_reset, 2)
+        self.num_reset = num_reset
+
+        self.buffer_size = buffer_size
+
+        self.random_sequences_fn = partial(random_sequences, num_tokens, seq_len)
+
+        init_data_buffer = self.random_sequences_fn(buffer_size // 2, buffer_size // 2)
+
+        self.register_buffer('synth_data_buffer', init_data_buffer)
+
+    @property
+    def device(self):
+        return self.synth_data_buffer.device
+
+    def get_rand_sequences_from_buffer(self, size = None):
+        size = default(size, self.batch_size)
+        rand_indices = randperm(self.buffer_size, device = self.device)[:size]
+        return self.synth_data_buffer[rand_indices]
+
+    def forward(self):
+        # following algorithm 1.
+
+        conditions = self.get_rand_sequences_from_buffer()
+
+        # get seeds, which appears to be random sequences with random crops of seed length
+
+        seeds = self.get_rand_sequences_from_buffer()
+
+        seq_arange = torch.arange(self.seed_length)
+        rand_offset = torch.randint(0, self.seq_len - self.seed_length, (self.batch_size,))
+        seq_start_pos = rand_offset[:, None] + seq_arange
+
+        batch_arange = torch.arange(self.batch_size, device = self.device)[:, None]
+        seeds = seeds[batch_arange, seq_start_pos]
+
+        # seed, condition to turing machine
+
+        synthetic_data = self.data_generator.generate(
+            self.seq_len,
+            condition = conditions,
+            seed = seeds
+        )
+
+        # reset
+
+        if self.num_reset > 0:
+            buffer_to_reset = self.get_rand_sequences_from_buffer(self.num_reset)
+
+            with torch.no_grad():
+                reset_sequences = self.random_sequences_fn(self.num_reset // 2, self.num_reset // 2, device = self.device)
+                buffer_to_reset.copy_(reset_sequences)
+
+        # sample yet again according to pseudocode
+
+        data = self.get_rand_sequences_from_buffer()
+
+        return self.ar_wrapped(data)
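The new `UniversalPretrainWrapper` maintains its own buffer of LSTM-generated synthetic sequences and returns an autoregressive loss each time it is called, as exercised by `test_up` above. Below is a minimal training-loop sketch built on that same usage; the optimizer, learning rate, step count, and model depth are illustrative assumptions, not part of the release.

```python
# Minimal sketch, assuming x-transformers >= 2.4.2; the optimizer and
# hyperparameters below are illustrative, not taken from the package.
import torch
from x_transformers import TransformerWrapper, Decoder
from x_transformers.up_wrapper import UniversalPretrainWrapper

model = TransformerWrapper(
    num_tokens = 256,
    max_seq_len = 1024,
    attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
)

# the wrapper keeps its own buffer of synthetic sequences, so no dataset
# is passed in - each call returns a language modeling loss
up_wrapper = UniversalPretrainWrapper(model, seq_len = 256, batch_size = 16)

optim = torch.optim.Adam(model.parameters(), lr = 3e-4)

for _ in range(100):
    loss = up_wrapper()
    loss.backward()

    optim.step()
    optim.zero_grad()
```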
{x_transformers-2.4.0 → x_transformers-2.4.2}/x_transformers/x_transformers.py

@@ -3263,7 +3263,7 @@ class TransformerWrapper(Module):
 
         # attention pool
 
-        if exists(self.attn_pool):
+        if exists(self.attn_pool) and return_intermediates:
            queries = repeat(self.attn_pool_queries, 'n d -> b n d', b = x.shape[0])
 
            attn_pooled_tokens = self.attn_pool(queries, context = x, context_mask = mask)
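With the change above, the attention-pooled tokens are computed only on the `return_intermediates = True` path and are read off the returned intermediates, as in the existing test. A minimal sketch follows; the model configuration mirrors the test, with the depth reduced here purely for brevity.

```python
# Minimal sketch, mirroring the attn_pool assertion in tests/test_x_transformers.py;
# depth is reduced here only to keep the example light.
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 256,
    max_seq_len = 1024,
    attn_pool = True,
    num_attn_pool_queries = 3,
    attn_layers = Decoder(dim = 512, depth = 2, heads = 8)
)

x = torch.randint(0, 256, (1, 1024))

# pooled tokens are only produced when intermediates are requested
logits, intermediates = model(x, return_intermediates = True)
assert intermediates.attn_pooled_tokens.shape[1] == 3
```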