dreamer4 0.0.99__tar.gz → 0.1.5__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {dreamer4-0.0.99 → dreamer4-0.1.5}/PKG-INFO +94 -3
- dreamer4-0.1.5/README.md +112 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/dreamer4/dreamer4.py +92 -32
- {dreamer4-0.0.99 → dreamer4-0.1.5}/dreamer4/trainers.py +1 -1
- {dreamer4-0.0.99 → dreamer4-0.1.5}/pyproject.toml +1 -1
- {dreamer4-0.0.99 → dreamer4-0.1.5}/tests/test_dreamer.py +6 -0
- dreamer4-0.0.99/README.md +0 -21
- {dreamer4-0.0.99 → dreamer4-0.1.5}/.github/workflows/python-publish.yml +0 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/.github/workflows/test.yml +0 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/.gitignore +0 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/LICENSE +0 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/dreamer4/__init__.py +0 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/dreamer4/mocks.py +0 -0
- {dreamer4-0.0.99 → dreamer4-0.1.5}/dreamer4-fig2.png +0 -0
{dreamer4-0.0.99 → dreamer4-0.1.5}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: dreamer4
-Version: 0.0.99
+Version: 0.1.5
 Summary: Dreamer 4
 Project-URL: Homepage, https://pypi.org/project/dreamer4/
 Project-URL: Repository, https://github.com/lucidrains/dreamer4
@@ -53,11 +53,100 @@ Description-Content-Type: text/markdown
 
 <img src="./dreamer4-fig2.png" width="400px"></img>
 
-## Dreamer 4 (wip)
+## Dreamer 4
 
 Implementation of Danijar's [latest iteration](https://arxiv.org/abs/2509.24527v1) for his [Dreamer](https://danijar.com/project/dreamer4/) line of work
 
-[Temporary Discord](https://discord.gg/MkACrrkrYR)
+[Discord channel](https://discord.gg/ab4BEk3W) for collaborating with other researchers interested in this work
+
+## Appreciation
+
+- [@dirkmcpherson](https://github.com/dirkmcpherson) for fixes to typo errors and unpassed arguments!
+
+## Install
+
+```bash
+$ pip install dreamer4
+```
+
+## Usage
+
+```python
+import torch
+from dreamer4 import VideoTokenizer, DynamicsWorldModel
+
+# video tokenizer, learned through MAE + lpips
+
+tokenizer = VideoTokenizer(
+    dim = 512,
+    dim_latent = 32,
+    patch_size = 32,
+    image_height = 256,
+    image_width = 256
+)
+
+video = torch.randn(2, 3, 10, 256, 256)
+
+# learn the tokenizer
+
+loss = tokenizer(video)
+loss.backward() # ler
+
+# dynamics world model
+
+world_model = DynamicsWorldModel(
+    dim = 512,
+    dim_latent = 32,
+    video_tokenizer = tokenizer,
+    num_discrete_actions = 4,
+    num_residual_streams = 1
+)
+
+# state, action, rewards
+
+video = torch.randn(2, 3, 10, 256, 256)
+discrete_actions = torch.randint(0, 4, (2, 10, 1))
+rewards = torch.randn(2, 10)
+
+# learn dynamics / behavior cloned model
+
+loss = world_model(
+    video = video,
+    rewards = rewards,
+    discrete_actions = discrete_actions
+)
+
+loss.backward()
+
+# do the above with much data
+
+# then generate dreams
+
+dreams = world_model.generate(
+    10,
+    batch_size = 2,
+    return_decoded_video = True,
+    return_for_policy_optimization = True
+)
+
+# learn from the dreams
+
+actor_loss, critic_loss = world_model.learn_from_experience(dreams)
+
+(actor_loss + critic_loss).backward()
+
+# learn from environment
+
+from dreamer4.mocks import MockEnv
+
+mock_env = MockEnv((256, 256), vectorized = True, num_envs = 4)
+
+experience = world_model.interact_with_env(mock_env, max_timesteps = 8, env_is_vectorized = True)
+
+actor_loss, critic_loss = world_model.learn_from_experience(experience)
+
+(actor_loss + critic_loss).backward()
+```
 
 ## Citation
 
@@ -72,3 +161,5 @@ Implementation of Danijar's [latest iteration](https://arxiv.org/abs/2509.24527v
     url = {https://arxiv.org/abs/2509.24527},
 }
 ```
+
+*the conquest of nature is to be achieved through number and measure - angels to Descartes in a dream*
dreamer4-0.1.5/README.md ADDED

@@ -0,0 +1,112 @@
+<img src="./dreamer4-fig2.png" width="400px"></img>
+
+## Dreamer 4
+
+Implementation of Danijar's [latest iteration](https://arxiv.org/abs/2509.24527v1) for his [Dreamer](https://danijar.com/project/dreamer4/) line of work
+
+[Discord channel](https://discord.gg/ab4BEk3W) for collaborating with other researchers interested in this work
+
+## Appreciation
+
+- [@dirkmcpherson](https://github.com/dirkmcpherson) for fixes to typo errors and unpassed arguments!
+
+## Install
+
+```bash
+$ pip install dreamer4
+```
+
+## Usage
+
+```python
+import torch
+from dreamer4 import VideoTokenizer, DynamicsWorldModel
+
+# video tokenizer, learned through MAE + lpips
+
+tokenizer = VideoTokenizer(
+    dim = 512,
+    dim_latent = 32,
+    patch_size = 32,
+    image_height = 256,
+    image_width = 256
+)
+
+video = torch.randn(2, 3, 10, 256, 256)
+
+# learn the tokenizer
+
+loss = tokenizer(video)
+loss.backward() # ler
+
+# dynamics world model
+
+world_model = DynamicsWorldModel(
+    dim = 512,
+    dim_latent = 32,
+    video_tokenizer = tokenizer,
+    num_discrete_actions = 4,
+    num_residual_streams = 1
+)
+
+# state, action, rewards
+
+video = torch.randn(2, 3, 10, 256, 256)
+discrete_actions = torch.randint(0, 4, (2, 10, 1))
+rewards = torch.randn(2, 10)
+
+# learn dynamics / behavior cloned model
+
+loss = world_model(
+    video = video,
+    rewards = rewards,
+    discrete_actions = discrete_actions
+)
+
+loss.backward()
+
+# do the above with much data
+
+# then generate dreams
+
+dreams = world_model.generate(
+    10,
+    batch_size = 2,
+    return_decoded_video = True,
+    return_for_policy_optimization = True
+)
+
+# learn from the dreams
+
+actor_loss, critic_loss = world_model.learn_from_experience(dreams)
+
+(actor_loss + critic_loss).backward()
+
+# learn from environment
+
+from dreamer4.mocks import MockEnv
+
+mock_env = MockEnv((256, 256), vectorized = True, num_envs = 4)
+
+experience = world_model.interact_with_env(mock_env, max_timesteps = 8, env_is_vectorized = True)
+
+actor_loss, critic_loss = world_model.learn_from_experience(experience)
+
+(actor_loss + critic_loss).backward()
+```
+
+## Citation
+
+```bibtex
+@misc{hafner2025trainingagentsinsidescalable,
+    title = {Training Agents Inside of Scalable World Models},
+    author = {Danijar Hafner and Wilson Yan and Timothy Lillicrap},
+    year = {2025},
+    eprint = {2509.24527},
+    archivePrefix = {arXiv},
+    primaryClass = {cs.AI},
+    url = {https://arxiv.org/abs/2509.24527},
+}
+```
+
+*the conquest of nature is to be achieved through number and measure - angels to Descartes in a dream*
{dreamer4-0.0.99 → dreamer4-0.1.5}/dreamer4/dreamer4.py

@@ -14,7 +14,7 @@ from torch.nested import nested_tensor
 from torch.distributions import Normal, kl
 from torch.nn import Module, ModuleList, Embedding, Parameter, Sequential, Linear, RMSNorm, Identity
 from torch import nn, cat, stack, arange, tensor, Tensor, is_tensor, full, zeros, ones, randint, rand, randn, randn_like, empty, full, linspace, arange
-from torch.utils._pytree import tree_flatten, tree_unflatten
+from torch.utils._pytree import tree_map, tree_flatten, tree_unflatten
 
 import torchvision
 from torchvision.models import VGG16_Weights
@@ -91,6 +91,14 @@ class Experience:
     agent_index: int = 0
     is_from_world_model: bool = True
 
+    def cpu(self):
+        return self.to(torch.device('cpu'))
+
+    def to(self, device):
+        experience_dict = asdict(self)
+        experience_dict = tree_map(lambda t: t.to(device) if is_tensor(t) else t, experience_dict)
+        return Experience(**experience_dict)
+
 def combine_experiences(
     exps: list[Experiences]
 ) -> Experience:
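A self-contained sketch of the device-moving pattern the new `cpu` / `to` methods rely on, with a toy dataclass standing in for the real `Experience` (field names here are illustrative): `asdict` turns the dataclass into a nested dict and `tree_map` moves only the tensor leaves, leaving ints and other metadata untouched.

```python
import torch
from dataclasses import dataclass, asdict
from torch.utils._pytree import tree_map

@dataclass
class ToyExperience:
    latents: torch.Tensor
    rewards: torch.Tensor
    agent_index: int = 0

    def to(self, device):
        # move every tensor leaf, leave non-tensors as they are
        moved = tree_map(lambda t: t.to(device) if torch.is_tensor(t) else t, asdict(self))
        return ToyExperience(**moved)

    def cpu(self):
        return self.to(torch.device('cpu'))

exp = ToyExperience(latents = torch.randn(2, 10, 32), rewards = torch.randn(2, 10))
exp = exp.cpu()  # a no-op on CPU tensors, but would bring CUDA tensors back to host
```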
@@ -1179,10 +1187,11 @@ def special_token_mask(q, k, seq_len, num_tokens, special_attend_only_itself = F
 
 def block_mask_special_tokens_right(
     seq_len,
-    num_tokens
+    num_tokens,
+    special_attend_only_itself = False
 ):
     def inner(b, h, q, k):
-        return special_token_mask(q, k, seq_len, num_tokens)
+        return special_token_mask(q, k, seq_len, num_tokens, special_attend_only_itself)
     return inner
 
 def compose_mask(mask1, mask2):
@@ -1331,6 +1340,12 @@ class Attention(Module):
         q = self.q_heads_rmsnorm(q)
         k = self.k_heads_rmsnorm(k)
 
+        # rotary
+
+        if exists(rotary_pos_emb):
+            q = apply_rotations(rotary_pos_emb, q)
+            k = apply_rotations(rotary_pos_emb, k)
+
         # caching
 
         if exists(kv_cache):
@@ -1338,12 +1353,6 @@ class Attention(Module):
             k = cat((ck, k), dim = -2)
             v = cat((cv, v), dim = -2)
 
-        # rotary
-
-        if exists(rotary_pos_emb):
-            q = apply_rotations(rotary_pos_emb, q)
-            k = apply_rotations(rotary_pos_emb, k)
-
         # attention
 
         attend_fn = default(attend_fn, naive_attend)
@@ -1493,7 +1502,8 @@ class AxialSpaceTimeTransformer(Module):
 
         # attend functions for space and time
 
-
+        has_kv_cache = exists(kv_cache)
+        use_flex = exists(flex_attention) and tokens.is_cuda and not has_kv_cache # KV cache shape breaks flex attention TODO: Fix
 
         attend_kwargs = dict(use_flex = use_flex, softclamp_value = self.attn_softclamp_value, special_attend_only_itself = self.special_attend_only_itself, device = device)
 
@@ -1505,14 +1515,12 @@ class AxialSpaceTimeTransformer(Module):
 
         time_attn_kv_caches = []
 
-        has_kv_cache = exists(kv_cache)
-
 
         if has_kv_cache:
             past_tokens, tokens = tokens[:, :-1], tokens[:, -1:]
 
             rotary_seq_len = 1
-            rotary_pos_offset = past_tokens.shape[
+            rotary_pos_offset = past_tokens.shape[1]
         else:
             rotary_seq_len = time
             rotary_pos_offset = 0
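A small sketch of the position bookkeeping the fixed line implies, with made-up shapes: when a time KV cache is present only the newest timestep is run, so its rotary position starts after however many timesteps are already cached.

```python
import torch

past_tokens = torch.randn(2, 7, 512)      # (batch, cached timesteps, dim) - illustrative shapes
rotary_seq_len = 1                        # only the newest timestep is processed
rotary_pos_offset = past_tokens.shape[1]  # 7 timesteps already sit in the cache

positions = rotary_pos_offset + torch.arange(rotary_seq_len)
print(positions)  # tensor([7])
```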
@@ -1687,6 +1695,7 @@ class VideoTokenizer(Module):
             time_block_every = time_block_every,
             num_special_spatial_tokens = num_latent_tokens,
             num_residual_streams = num_residual_streams,
+            special_attend_only_itself = True,
             final_norm = True
         )
 
@@ -1847,7 +1856,7 @@ class VideoTokenizer(Module):
 
         losses = (recon_loss, lpips_loss)
 
-        return total_loss, TokenizerLosses(losses)
+        return total_loss, TokenizerLosses(*losses)
 
 # dynamics model, axial space-time transformer
 
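For context on the `*losses` unpacking above: a minimal sketch assuming `TokenizerLosses` is a two-field namedtuple-like container (the real class lives in dreamer4.py; the field names below are guesses based on `recon_loss` and `lpips_loss`).

```python
from collections import namedtuple

# illustrative stand-in for the repo's TokenizerLosses container
TokenizerLosses = namedtuple('TokenizerLosses', ['recon', 'lpips'])

losses = (0.5, 0.1)

# TokenizerLosses(losses) would pass the whole tuple as the first field and leave the second unfilled;
# star-unpacking spreads the two losses into their named slots
named = TokenizerLosses(*losses)
print(named.recon, named.lpips)  # 0.5 0.1
```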
@@ -1900,7 +1909,9 @@ class DynamicsWorldModel(Module):
         gae_lambda = 0.95,
         ppo_eps_clip = 0.2,
         pmpo_pos_to_neg_weight = 0.5, # pos and neg equal weight
-
+        pmpo_reverse_kl = True,
+        pmpo_kl_div_loss_weight = .3,
+        normalize_advantages = None,
         value_clip = 0.4,
         policy_entropy_weight = .01,
         gae_use_accelerated = False
@@ -2102,12 +2113,13 @@ class DynamicsWorldModel(Module):
 
         self.ppo_eps_clip = ppo_eps_clip
         self.value_clip = value_clip
-        self.policy_entropy_weight =
+        self.policy_entropy_weight = policy_entropy_weight
 
         # pmpo related
 
         self.pmpo_pos_to_neg_weight = pmpo_pos_to_neg_weight
         self.pmpo_kl_div_loss_weight = pmpo_kl_div_loss_weight
+        self.pmpo_reverse_kl = pmpo_reverse_kl
 
         # rewards related
 
@@ -2124,7 +2136,7 @@ class DynamicsWorldModel(Module):
         self.flow_loss_normalizer = LossNormalizer(1)
         self.reward_loss_normalizer = LossNormalizer(multi_token_pred_len)
         self.discrete_actions_loss_normalizer = LossNormalizer(multi_token_pred_len) if num_discrete_actions > 0 else None
-        self.continuous_actions_loss_normalizer = LossNormalizer(multi_token_pred_len) if
+        self.continuous_actions_loss_normalizer = LossNormalizer(multi_token_pred_len) if num_continuous_actions > 0 else None
 
         self.latent_flow_loss_weight = latent_flow_loss_weight
 
@@ -2355,6 +2367,9 @@ class DynamicsWorldModel(Module):
         elif len(env_step_out) == 4:
             next_frame, reward, terminated, truncated = env_step_out
 
+        elif len(env_step_out) == 5:
+            next_frame, reward, terminated, truncated, info = env_step_out
+
         # update episode lens
 
         episode_lens = torch.where(is_terminated, episode_lens, episode_lens + 1)
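The new branch looks like it accommodates Gymnasium-style environments, whose `step` returns the five-tuple `(obs, reward, terminated, truncated, info)`. A hypothetical dispatch on tuple length (not the repo's exact handling of the shorter forms):

```python
def unpack_step(env_step_out):
    # accept 3-, 4- or 5-tuples from env.step(action)
    if len(env_step_out) == 3:
        next_frame, reward, terminated = env_step_out
        truncated, info = False, {}
    elif len(env_step_out) == 4:
        next_frame, reward, terminated, truncated = env_step_out
        info = {}
    elif len(env_step_out) == 5:
        next_frame, reward, terminated, truncated, info = env_step_out
    else:
        raise ValueError(f'unexpected env.step output of length {len(env_step_out)}')

    return next_frame, reward, terminated, truncated, info

print(unpack_step(('frame', 1.0, False, False, {})))  # gymnasium-style 5-tuple
```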
@@ -2423,8 +2438,12 @@ class DynamicsWorldModel(Module):
         value_optim: Optimizer | None = None,
         only_learn_policy_value_heads = True, # in the paper, they do not finetune the entire dynamics model, they just learn the heads
         use_pmpo = True,
+        normalize_advantages = None,
         eps = 1e-6
     ):
+        assert isinstance(experience, Experience)
+
+        experience = experience.to(self.device)
 
         latents = experience.latents
         actions = experience.actions
@@ -2437,7 +2456,7 @@ class DynamicsWorldModel(Module):
         step_size = experience.step_size
         agent_index = experience.agent_index
 
-        assert all([*map(exists, (old_log_probs, actions, old_values, rewards, step_size))]), 'the generations need to contain the log probs, values, and rewards for policy optimization'
+        assert all([*map(exists, (old_log_probs, actions, old_values, rewards, step_size))]), 'the generations need to contain the log probs, values, and rewards for policy optimization - world_model.generate(..., return_log_probs_and_values = True)'
 
         batch, time = latents.shape[0], latents.shape[1]
 
@@ -2451,8 +2470,8 @@ class DynamicsWorldModel(Module):
         if exists(experience.lens):
             mask_for_gae = lens_to_mask(experience.lens, time)
 
-            rewards = rewards.masked_fill(mask_for_gae, 0.)
-            old_values = old_values.masked_fill(mask_for_gae, 0.)
+            rewards = rewards.masked_fill(~mask_for_gae, 0.)
+            old_values = old_values.masked_fill(~mask_for_gae, 0.)
 
         # calculate returns
 
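The `~` matters here: if `lens_to_mask` follows the common convention of returning `True` for timesteps inside the episode, zeroing must target the complement, i.e. the padded tail. A toy check with a hypothetical `lens_to_mask` (not the repo's helper):

```python
import torch

def lens_to_mask(lens, max_len):
    # True where a timestep is within the episode length (a common convention, assumed here)
    return torch.arange(max_len)[None, :] < lens[:, None]

lens = torch.tensor([2, 4])
rewards = torch.ones(2, 4)

mask = lens_to_mask(lens, 4)              # [[T, T, F, F], [T, T, T, T]]
rewards = rewards.masked_fill(~mask, 0.)  # zero out only the padded timesteps
print(rewards)                            # tensor([[1., 1., 0., 0.], [1., 1., 1., 1.]])
```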
@@ -2487,7 +2506,7 @@ class DynamicsWorldModel(Module):
 
         # mean, var - todo - handle distributed
 
-        returns_mean, returns_var =
+        returns_mean, returns_var = returns_for_stats.mean(), returns_for_stats.var()
 
         # ema
 
@@ -2505,16 +2524,19 @@ class DynamicsWorldModel(Module):
         else:
             advantage = returns - old_values
 
-        #
+        # if using pmpo, do not normalize advantages, but can be overridden
+
+        normalize_advantages = default(normalize_advantages, not use_pmpo)
+
+        if normalize_advantages:
+            advantage = F.layer_norm(advantage, advantage.shape, eps = eps)
+
         # https://arxiv.org/abs/2410.04166v1
 
         if use_pmpo:
             pos_advantage_mask = advantage >= 0.
             neg_advantage_mask = ~pos_advantage_mask
 
-        else:
-            advantage = F.layer_norm(advantage, advantage.shape, eps = eps)
-
         # replay for the action logits and values
         # but only do so if fine tuning the entire world model for RL
 
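For reference, `F.layer_norm` over the tensor's full shape is just a compact way to standardize the advantages to zero mean and unit variance; PMPO, per the masks in the surrounding code, only consumes the sign of the advantage, which is presumably why normalization now defaults off in that path. A quick equivalence check:

```python
import torch
import torch.nn.functional as F

advantage = torch.randn(2, 10)
eps = 1e-6

normed = F.layer_norm(advantage, advantage.shape, eps = eps)

# the same whitening written out explicitly (biased variance, matching layer_norm)
manual = (advantage - advantage.mean()) / torch.sqrt(advantage.var(unbiased = False) + eps)

assert torch.allclose(normed, manual, atol = 1e-5)
```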
@@ -2578,11 +2600,18 @@ class DynamicsWorldModel(Module):
         # take care of kl
 
         if self.pmpo_kl_div_loss_weight > 0.:
+
             new_unembedded_actions = self.action_embedder.unembed(policy_embed, pred_head_index = 0)
 
+            kl_div_inputs, kl_div_targets = new_unembedded_actions, old_action_unembeds
+
             # mentioned that the "reverse direction for the prior KL" was used
+            # make optional, as observed instability in toy task
+
+            if self.pmpo_reverse_kl:
+                kl_div_inputs, kl_div_targets = kl_div_targets, kl_div_inputs
 
-            discrete_kl_div, continuous_kl_div = self.action_embedder.kl_div(
+            discrete_kl_div, continuous_kl_div = self.action_embedder.kl_div(kl_div_inputs, kl_div_targets)
 
             # accumulate discrete and continuous kl div
 
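A compact illustration of what swapping `kl_div_inputs` and `kl_div_targets` does for a categorical action head: it flips `KL(new ‖ old)` into the reverse direction `KL(old ‖ new)`. The distributions below are placeholders, not the repo's `action_embedder.kl_div`:

```python
import torch
from torch.distributions import Categorical, kl_divergence

old_logits = torch.randn(2, 10, 4)  # e.g. (batch, time, num_discrete_actions)
new_logits = torch.randn(2, 10, 4)

old_dist, new_dist = Categorical(logits = old_logits), Categorical(logits = new_logits)

forward_kl = kl_divergence(new_dist, old_dist)  # KL(new || old)
reverse_kl = kl_divergence(old_dist, new_dist)  # KL(old || new) - the argument swap

print(forward_kl.shape, reverse_kl.shape)       # torch.Size([2, 10]) twice
```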
@@ -2680,12 +2709,22 @@ class DynamicsWorldModel(Module):
         return_rewards_per_frame = False,
         return_agent_actions = False,
         return_log_probs_and_values = False,
+        return_for_policy_optimization = False,
         return_time_kv_cache = False,
         store_agent_embed = True,
         store_old_action_unembeds = True
 
     ): # (b t n d) | (b c t h w)
 
+        # handy flag for returning generations for rl
+
+        if return_for_policy_optimization:
+            return_agent_actions |= True
+            return_log_probs_and_values |= True
+            return_rewards_per_frame |= True
+
+        # more variables
+
         has_proprio = self.has_proprio
         was_training = self.training
         self.eval()
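Usage-wise this is just a convenience switch; continuing the README example above, a single `generate` call with the flag set should yield generations that `learn_from_experience` can consume directly (a sketch, reusing the `world_model` defined there):

```python
dreams = world_model.generate(
    10,
    batch_size = 2,
    return_decoded_video = True,
    return_for_policy_optimization = True  # turns on agent actions, log probs / values, per-frame rewards
)

actor_loss, critic_loss = world_model.learn_from_experience(dreams)
(actor_loss + critic_loss).backward()
```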
@@ -2755,6 +2794,19 @@ class DynamicsWorldModel(Module):
 
         curr_time_steps = latents.shape[1]
 
+        # determine whether to take an extra step if
+        # (1) using time kv cache
+        # (2) decoding anything off agent embedding (rewards, actions, etc)
+
+        take_extra_step = (
+            use_time_kv_cache or
+            return_rewards_per_frame or
+            store_agent_embed or
+            return_agent_actions
+        )
+
+        # prepare noised latent / proprio inputs
+
         noised_latent = randn((batch_size, 1, self.num_video_views, *latent_shape), device = self.device)
 
         noised_proprio = None
@@ -2762,7 +2814,10 @@ class DynamicsWorldModel(Module):
         if has_proprio:
             noised_proprio = randn((batch_size, 1, self.dim_proprio), device = self.device)
 
-        for step in range(num_steps):
+        # denoising steps
+
+        for step in range(num_steps + int(take_extra_step)):
+
             is_last_step = (step + 1) == num_steps
 
             signal_levels = full((batch_size, 1), step * step_size, dtype = torch.long, device = self.device)
@@ -2805,6 +2860,11 @@ class DynamicsWorldModel(Module):
             if use_time_kv_cache and is_last_step:
                 time_kv_cache = next_time_kv_cache
 
+            # early break if taking an extra step for agent embedding off cleaned latents for decoding
+
+            if take_extra_step and is_last_step:
+                break
+
             # maybe proprio
 
             if has_proprio:
@@ -3007,7 +3067,7 @@ class DynamicsWorldModel(Module):
         latent_is_noised = False,
         return_all_losses = False,
         return_intermediates = False,
-        add_autoregressive_action_loss =
+        add_autoregressive_action_loss = True,
         update_loss_ema = None,
         latent_has_view_dim = False
     ):
@@ -3039,8 +3099,8 @@ class DynamicsWorldModel(Module):
         if latents.ndim == 4:
             latents = rearrange(latents, 'b t v d -> b t v 1 d') # 1 latent edge case
 
-        assert latents.shape[-2:] == self.latent_shape
-        assert latents.shape[2] == self.num_video_views
+        assert latents.shape[-2:] == self.latent_shape, f'latents must have shape {self.latent_shape}, got {latents.shape[-2:]}'
+        assert latents.shape[2] == self.num_video_views, f'latents must have {self.num_video_views} views, got {latents.shape[2]}'
 
         # variables
 
@@ -3464,7 +3524,7 @@ class DynamicsWorldModel(Module):
 
         reward_losses = F.cross_entropy(reward_pred, reward_targets, reduction = 'none')
 
-        reward_losses = reward_losses.masked_fill(reward_loss_mask, 0.)
+        reward_losses = reward_losses.masked_fill(~reward_loss_mask, 0.)
 
         if is_var_len:
             reward_loss = reward_losses[loss_mask_without_last].mean(dim = 0)
@@ -3508,7 +3568,7 @@ class DynamicsWorldModel(Module):
             discrete_mask = rearrange(discrete_mask, 'b t mtp -> mtp b t')
 
         if exists(continuous_actions):
-            continuous_action_targets, continuous_mask = create_multi_token_prediction_targets(
+            continuous_action_targets, continuous_mask = create_multi_token_prediction_targets(continuous_actions, self.multi_token_pred_len)
             continuous_action_targets = rearrange(continuous_action_targets, 'b t mtp ... -> mtp b t ...')
             continuous_mask = rearrange(continuous_mask, 'b t mtp -> mtp b t')
 
{dreamer4-0.0.99 → dreamer4-0.1.5}/tests/test_dreamer.py

@@ -680,6 +680,12 @@ def test_online_rl(
 
     combined_experience = combine_experiences([one_experience, another_experience])
 
+    # quick test moving the experience to different devices
+
+    if torch.cuda.is_available():
+        combined_experience = combined_experience.to(torch.device('cuda'))
+        combined_experience = combined_experience.to(world_model_and_policy.device)
+
     if store_agent_embed:
         assert exists(combined_experience.agent_embed)
 
dreamer4-0.0.99/README.md DELETED

@@ -1,21 +0,0 @@
-<img src="./dreamer4-fig2.png" width="400px"></img>
-
-## Dreamer 4 (wip)
-
-Implementation of Danijar's [latest iteration](https://arxiv.org/abs/2509.24527v1) for his [Dreamer](https://danijar.com/project/dreamer4/) line of work
-
-[Temporary Discord](https://discord.gg/MkACrrkrYR)
-
-## Citation
-
-```bibtex
-@misc{hafner2025trainingagentsinsidescalable,
-    title = {Training Agents Inside of Scalable World Models},
-    author = {Danijar Hafner and Wilson Yan and Timothy Lillicrap},
-    year = {2025},
-    eprint = {2509.24527},
-    archivePrefix = {arXiv},
-    primaryClass = {cs.AI},
-    url = {https://arxiv.org/abs/2509.24527},
-}
-```