flaxdiff 0.1.3__tar.gz → 0.1.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (42)
  1. {flaxdiff-0.1.3/flaxdiff.egg-info → flaxdiff-0.1.5}/PKG-INFO +10 -2
  2. flaxdiff-0.1.3/PKG-INFO → flaxdiff-0.1.5/README.md +9 -14
  3. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/models/attention.py +132 -155
  4. flaxdiff-0.1.5/flaxdiff/models/autoencoder/__init__.py +0 -0
  5. flaxdiff-0.1.5/flaxdiff/models/autoencoder/autoencoder.py +14 -0
  6. flaxdiff-0.1.5/flaxdiff/models/autoencoder/diffusers.py +88 -0
  7. flaxdiff-0.1.5/flaxdiff/models/common.py +250 -0
  8. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/models/simple_unet.py +17 -256
  9. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/trainer/__init__.py +28 -45
  10. flaxdiff-0.1.5/flaxdiff/trainer/simple_trainer.py +418 -0
  11. flaxdiff-0.1.3/README.md → flaxdiff-0.1.5/flaxdiff.egg-info/PKG-INFO +22 -1
  12. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff.egg-info/SOURCES.txt +3 -0
  13. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/setup.py +1 -1
  14. flaxdiff-0.1.3/flaxdiff/models/common.py +0 -7
  15. flaxdiff-0.1.3/flaxdiff/trainer/simple_trainer.py +0 -323
  16. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/__init__.py +0 -0
  17. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/models/__init__.py +0 -0
  18. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/models/favor_fastattn.py +0 -0
  19. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/models/simple_vit.py +0 -0
  20. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/predictors/__init__.py +0 -0
  21. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/__init__.py +0 -0
  22. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/common.py +0 -0
  23. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/ddim.py +0 -0
  24. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/ddpm.py +0 -0
  25. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/euler.py +0 -0
  26. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/heun_sampler.py +0 -0
  27. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/multistep_dpm.py +0 -0
  28. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/samplers/rk4_sampler.py +0 -0
  29. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/__init__.py +0 -0
  30. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/common.py +0 -0
  31. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/continuous.py +0 -0
  32. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/cosine.py +0 -0
  33. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/discrete.py +0 -0
  34. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/exp.py +0 -0
  35. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/karras.py +0 -0
  36. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/linear.py +0 -0
  37. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/schedulers/sqrt.py +0 -0
  38. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/utils.py +0 -0
  39. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff.egg-info/dependency_links.txt +0 -0
  40. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff.egg-info/requires.txt +0 -0
  41. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff.egg-info/top_level.txt +0 -0
  42. {flaxdiff-0.1.3 → flaxdiff-0.1.5}/setup.cfg +0 -0
{flaxdiff-0.1.3/flaxdiff.egg-info → flaxdiff-0.1.5}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: flaxdiff
- Version: 0.1.3
+ Version: 0.1.5
  Summary: A versatile and easy to understand Diffusion library
  Author: Ashish Kumar Singh
  Author-email: ashishkmr472@gmail.com
@@ -27,7 +27,7 @@ The `Diffusion_flax_linen.ipynb` notebook is my main workspace for experiments.
 
  In the `example notebooks` folder, you will find comprehensive notebooks for various diffusion techniques, written entirely from scratch and are independent of the FlaxDiff library. Each notebook includes detailed explanations of the underlying mathematics and concepts, making them invaluable resources for learning and understanding diffusion models.
 
- ### Available Notebooks
+ ### Available Notebooks and Resources
 
  - **[Diffusion explained (nbviewer link)](https://nbviewer.org/github/AshishKumar4/FlaxDiff/blob/main/tutorial%20notebooks/simple%20diffusion%20flax.ipynb) [(local link)](tutorial%20notebooks/simple%20diffusion%20flax.ipynb)**
 
@@ -46,6 +46,14 @@ In the `example notebooks` folder, you will find comprehensive notebooks for var
 
  These notebooks aim to provide a very easy to understand and step-by-step guide to the various diffusion models and techniques. They are designed to be beginner-friendly, and thus although they may not adhere to the exact formulations and implementations of the original papers to make them more understandable and generalizable, I have tried my best to keep them as accurate as possible. If you find any mistakes or have any suggestions, please feel free to open an issue or a pull request.
 
+ #### Other resources
+
+ - **[Multi-host Data parallel training script in JAX](./training.py)**
+   - Training script for multi-host data parallel training in JAX, to serve as a reference for training large models on multiple GPUs/TPUs across multiple hosts. A full-fledged tutorial notebook is in the works.
+
+ - **[TPU utilities for making life easier](./tpu-tools/)**
+   - A collection of utilities and scripts to make working with TPUs easier, such as cli to create/start/stop/setup TPUs, script to setup TPU VMs (install everything you need), mounting gcs datasets etc.
+
  ## Disclaimer (and About Me)
 
  I worked as a Machine Learning Researcher at Hyperverge from 2019-2021, focusing on computer vision, specifically facial anti-spoofing and facial detection & recognition. Since switching to my current job in 2021, I haven't engaged in as much R&D work, leading me to start this pet project to revisit and relearn the fundamentals and get familiar with the state-of-the-art. My current role involves primarily Golang system engineering with some applied ML work just sprinkled in. Therefore, the code may reflect my learning journey. Please forgive any mistakes and do open an issue to let me know.
flaxdiff-0.1.3/PKG-INFO → flaxdiff-0.1.5/README.md
@@ -1,16 +1,3 @@
- Metadata-Version: 2.1
- Name: flaxdiff
- Version: 0.1.3
- Summary: A versatile and easy to understand Diffusion library
- Author: Ashish Kumar Singh
- Author-email: ashishkmr472@gmail.com
- Description-Content-Type: text/markdown
- Requires-Dist: flax>=0.8.4
- Requires-Dist: optax>=0.2.2
- Requires-Dist: jax>=0.4.28
- Requires-Dist: orbax
- Requires-Dist: clu
-
  # ![](images/logo.jpeg "FlaxDiff")
 
  ## A Versatile and simple Diffusion Library
@@ -27,7 +14,7 @@ The `Diffusion_flax_linen.ipynb` notebook is my main workspace for experiments.
 
  In the `example notebooks` folder, you will find comprehensive notebooks for various diffusion techniques, written entirely from scratch and are independent of the FlaxDiff library. Each notebook includes detailed explanations of the underlying mathematics and concepts, making them invaluable resources for learning and understanding diffusion models.
 
- ### Available Notebooks
+ ### Available Notebooks and Resources
 
  - **[Diffusion explained (nbviewer link)](https://nbviewer.org/github/AshishKumar4/FlaxDiff/blob/main/tutorial%20notebooks/simple%20diffusion%20flax.ipynb) [(local link)](tutorial%20notebooks/simple%20diffusion%20flax.ipynb)**
 
@@ -46,6 +33,14 @@ In the `example notebooks` folder, you will find comprehensive notebooks for var
 
  These notebooks aim to provide a very easy to understand and step-by-step guide to the various diffusion models and techniques. They are designed to be beginner-friendly, and thus although they may not adhere to the exact formulations and implementations of the original papers to make them more understandable and generalizable, I have tried my best to keep them as accurate as possible. If you find any mistakes or have any suggestions, please feel free to open an issue or a pull request.
 
+ #### Other resources
+
+ - **[Multi-host Data parallel training script in JAX](./training.py)**
+   - Training script for multi-host data parallel training in JAX, to serve as a reference for training large models on multiple GPUs/TPUs across multiple hosts. A full-fledged tutorial notebook is in the works.
+
+ - **[TPU utilities for making life easier](./tpu-tools/)**
+   - A collection of utilities and scripts to make working with TPUs easier, such as cli to create/start/stop/setup TPUs, script to setup TPU VMs (install everything you need), mounting gcs datasets etc.
+
  ## Disclaimer (and About Me)
 
  I worked as a Machine Learning Researcher at Hyperverge from 2019-2021, focusing on computer vision, specifically facial anti-spoofing and facial detection & recognition. Since switching to my current job in 2021, I haven't engaged in as much R&D work, leading me to start this pet project to revisit and relearn the fundamentals and get familiar with the state-of-the-art. My current role involves primarily Golang system engineering with some applied ML work just sprinkled in. Therefore, the code may reflect my learning journey. Please forgive any mistakes and do open an issue to let me know.
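The README changes above point readers to a multi-host data parallel training script (`./training.py`), which is not part of this diff. Purely as an illustration of the pattern that bullet refers to, here is a minimal sketch of multi-host data-parallel training in JAX; the placeholder model, batch shapes, and learning rate are assumptions and the actual `training.py` may be organized quite differently.

```python
# Illustrative sketch only: not taken from flaxdiff's training.py.
import functools

import jax
import jax.numpy as jnp

# On multi-host setups (e.g. TPU pod slices), every host runs this same program.
# jax.distributed.initialize() wires the hosts together; it can be skipped (or may
# need explicit coordinator arguments) for single-host runs.
jax.distributed.initialize()

@functools.partial(jax.pmap, axis_name="data")
def train_step(params, batch):
    # Placeholder linear "model" and MSE loss, just to show the collective op.
    def loss_fn(p):
        pred = batch["x"] @ p
        return jnp.mean((pred - batch["y"]) ** 2)
    grads = jax.grad(loss_fn)(params)
    # Average gradients across every device on every host.
    grads = jax.lax.pmean(grads, axis_name="data")
    return params - 1e-3 * grads

# Each host feeds only its local devices with its own shard of the global batch.
n_local = jax.local_device_count()
params = jax.device_put_replicated(jnp.zeros((8, 1)), jax.local_devices())
batch = {
    "x": jnp.ones((n_local, 16, 8)),
    "y": jnp.ones((n_local, 16, 1)),
}
params = train_step(params, batch)
```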
{flaxdiff-0.1.3 → flaxdiff-0.1.5}/flaxdiff/models/attention.py
@@ -62,8 +62,13 @@ class EfficientAttention(nn.Module):
          # x has shape [B, H * W, C]
          context = x if context is None else context
 
-         B, H, W, C = x.shape
-         x = x.reshape((B, 1, H * W, C))
+         orig_x_shape = x.shape
+         if len(x.shape) == 4:
+             B, H, W, C = x.shape
+             x = x.reshape((B, 1, H * W, C))
+         else:
+             B, SEQ, C = x.shape
+             x = x.reshape((B, 1, SEQ, C))
 
          if len(context.shape) == 4:
              B, _H, _W, _C = context.shape
@@ -93,7 +98,7 @@ class EfficientAttention(nn.Module):
 
          proj = self.proj_attn(hidden_states)
 
-         proj = proj.reshape((B, H, W, C))
+         proj = proj.reshape(orig_x_shape)
 
          return proj
 
@@ -138,8 +143,10 @@ class NormalAttention(nn.Module):
      @nn.compact
      def __call__(self, x, context=None):
          # x has shape [B, H, W, C]
-         B, H, W, C = x.shape
-         x = x.reshape((B, H*W, C))
+         orig_x_shape = x.shape
+         if len(x.shape) == 4:
+             B, H, W, C = x.shape
+             x = x.reshape((B, H*W, C))
          context = x if context is None else context
          if len(context.shape) == 4:
              context = context.reshape((B, H*W, C))
@@ -151,10 +158,10 @@ class NormalAttention(nn.Module):
              query, key, value, dtype=self.dtype, broadcast_dropout=False, dropout_rng=None, precision=self.precision
          )
          proj = self.proj_attn(hidden_states)
-         proj = proj.reshape((B, H, W, C))
+         proj = proj.reshape(orig_x_shape)
          return proj
-
- class AttentionBlock(nn.Module):
+
+ class BasicTransformerBlock(nn.Module):
      # Has self and cross attention
      query_dim: int
      heads: int = 4
@@ -193,129 +200,26 @@ class AttentionBlock(nn.Module):
              kernel_init=self.kernel_init
          )
 
-         self.ff = nn.DenseGeneral(
-             features=self.query_dim,
-             use_bias=self.use_bias,
-             precision=self.precision,
-             dtype=self.dtype,
-             kernel_init=self.kernel_init(),
-             name="ff"
-         )
+         self.ff = FlaxFeedForward(dim=self.query_dim)
          self.norm1 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
          self.norm2 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
          self.norm3 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
-         self.norm4 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
 
      @nn.compact
      def __call__(self, hidden_states, context=None):
          # self attention
-         residual = hidden_states
-         hidden_states = self.norm1(hidden_states)
-         if self.use_cross_only:
-             hidden_states = self.attention1(hidden_states, context)
-         else:
-             hidden_states = self.attention1(hidden_states)
-         hidden_states = hidden_states + residual
+         if not self.use_cross_only:
+             print("Using self attention")
+             hidden_states = hidden_states + self.attention1(self.norm1(hidden_states))
 
          # cross attention
-         residual = hidden_states
-         hidden_states = self.norm2(hidden_states)
-         hidden_states = self.attention2(hidden_states, context)
-         hidden_states = hidden_states + residual
+         hidden_states = hidden_states + self.attention2(self.norm2(hidden_states), context)
 
          # feed forward
-         residual = hidden_states
-         hidden_states = self.norm3(hidden_states)
-         hidden_states = nn.gelu(hidden_states)
-         hidden_states = self.ff(hidden_states)
-         hidden_states = hidden_states + residual
+         hidden_states = hidden_states + self.ff(self.norm3(hidden_states))
 
          return hidden_states
 
- class TransformerBlock(nn.Module):
-     heads: int = 4
-     dim_head: int = 32
-     use_linear_attention: bool = True
-     dtype: Any = jnp.float32
-     precision: Any = jax.lax.Precision.HIGH
-     use_projection: bool = False
-     use_flash_attention:bool = True
-     use_self_and_cross:bool = False
-
-     @nn.compact
-     def __call__(self, x, context=None):
-         inner_dim = self.heads * self.dim_head
-         B, H, W, C = x.shape
-         normed_x = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)(x)
-         if self.use_projection == True:
-             if self.use_linear_attention:
-                 projected_x = nn.Dense(features=inner_dim,
-                                        use_bias=False, precision=self.precision,
-                                        kernel_init=kernel_init(1.0),
-                                        dtype=self.dtype, name=f'project_in')(normed_x)
-             else:
-                 projected_x = nn.Conv(
-                     features=inner_dim, kernel_size=(1, 1),
-                     kernel_init=kernel_init(1.0),
-                     strides=(1, 1), padding='VALID', use_bias=False, dtype=self.dtype,
-                     precision=self.precision, name=f'project_in_conv',
-                 )(normed_x)
-         else:
-             projected_x = normed_x
-             inner_dim = C
-
-         context = projected_x if context is None else context
-
-         if self.use_self_and_cross:
-             projected_x = AttentionBlock(
-                 query_dim=inner_dim,
-                 heads=self.heads,
-                 dim_head=self.dim_head,
-                 name=f'Attention',
-                 precision=self.precision,
-                 use_bias=False,
-                 dtype=self.dtype,
-                 use_flash_attention=self.use_flash_attention,
-                 use_cross_only=False
-             )(projected_x, context)
-         elif self.use_flash_attention == True:
-             projected_x = EfficientAttention(
-                 query_dim=inner_dim,
-                 heads=self.heads,
-                 dim_head=self.dim_head,
-                 name=f'Attention',
-                 precision=self.precision,
-                 use_bias=False,
-                 dtype=self.dtype,
-             )(projected_x, context)
-         else:
-             projected_x = NormalAttention(
-                 query_dim=inner_dim,
-                 heads=self.heads,
-                 dim_head=self.dim_head,
-                 name=f'Attention',
-                 precision=self.precision,
-                 use_bias=False,
-             )(projected_x, context)
-
-
-         if self.use_projection == True:
-             if self.use_linear_attention:
-                 projected_x = nn.Dense(features=C, precision=self.precision,
-                                        dtype=self.dtype, use_bias=False,
-                                        kernel_init=kernel_init(1.0),
-                                        name=f'project_out')(projected_x)
-             else:
-                 projected_x = nn.Conv(
-                     features=C, kernel_size=(1, 1),
-                     kernel_init=kernel_init(1.0),
-                     strides=(1, 1), padding='VALID', use_bias=False, dtype=self.dtype,
-                     precision=self.precision, name=f'project_out_conv',
-                 )(projected_x)
-
-         out = x + projected_x
-         return out
-
  class FlaxGEGLU(nn.Module):
      r"""
      Flax implementation of a Linear layer followed by the variant of the gated linear unit activation function from
@@ -333,10 +237,11 @@ class FlaxGEGLU(nn.Module):
      dim: int
      dropout: float = 0.0
      dtype: jnp.dtype = jnp.float32
+     precision: Any = jax.lax.Precision.DEFAULT
 
      def setup(self):
          inner_dim = self.dim * 4
-         self.proj = nn.Dense(inner_dim * 2, dtype=self.dtype, precision=jax.lax.Precision.DEFAULT)
+         self.proj = nn.Dense(inner_dim * 2, dtype=self.dtype, precision=self.precision)
 
      def __call__(self, hidden_states):
          hidden_states = self.proj(hidden_states)
@@ -362,14 +267,14 @@ class FlaxFeedForward(nn.Module):
      """
 
      dim: int
-     dropout: float = 0.0
      dtype: jnp.dtype = jnp.float32
+     precision: Any = jax.lax.Precision.DEFAULT
 
      def setup(self):
          # The second linear layer needs to be called
          # net_2 for now to match the index of the Sequential layer
-         self.net_0 = FlaxGEGLU(self.dim, self.dtype)
-         self.net_2 = nn.Dense(self.dim, dtype=self.dtype, precision=jax.lax.Precision.DEFAULT)
+         self.net_0 = FlaxGEGLU(self.dim, self.dtype, precision=self.precision)
+         self.net_2 = nn.Dense(self.dim, dtype=self.dtype, precision=self.precision)
 
      def __call__(self, hidden_states):
          hidden_states = self.net_0(hidden_states)
@@ -377,55 +282,127 @@ class FlaxFeedForward(nn.Module):
          return hidden_states
 
  class BasicTransformerBlock(nn.Module):
+     # Has self and cross attention
      query_dim: int
-     heads: int
-     dim_head: int
-     dropout: float = 0.0
-     only_cross_attention: bool = False
-     dtype: jnp.dtype = jnp.float32
-     use_memory_efficient_attention: bool = False
-     split_head_dim: bool = False
-     precision: Any = jax.lax.Precision.DEFAULT
-
+     heads: int = 4
+     dim_head: int = 64
+     dtype: Any = jnp.float32
+     precision: Any = jax.lax.Precision.HIGHEST
+     use_bias: bool = True
+     kernel_init: Callable = lambda : kernel_init(1.0)
+     use_flash_attention:bool = False
+     use_cross_only:bool = False
+     only_pure_attention:bool = False
+
      def setup(self):
-         # self attention (or cross_attention if only_cross_attention is True)
-         self.attn1 = NormalAttention(
-             query_dim=self.query_dim,
+         if self.use_flash_attention:
+             attenBlock = EfficientAttention
+         else:
+             attenBlock = NormalAttention
+
+         self.attention1 = attenBlock(
+             query_dim=self.query_dim,
              heads=self.heads,
              dim_head=self.dim_head,
-             dtype=self.dtype,
+             name=f'Attention1',
              precision=self.precision,
+             use_bias=self.use_bias,
+             dtype=self.dtype,
+             kernel_init=self.kernel_init
          )
-         # cross attention
-         self.attn2 = NormalAttention(
+         self.attention2 = attenBlock(
              query_dim=self.query_dim,
              heads=self.heads,
              dim_head=self.dim_head,
-             dtype=self.dtype,
+             name=f'Attention2',
              precision=self.precision,
+             use_bias=self.use_bias,
+             dtype=self.dtype,
+             kernel_init=self.kernel_init
          )
-         self.ff = FlaxFeedForward(dim=self.query_dim, dropout=self.dropout, dtype=self.dtype)
+
+         self.ff = FlaxFeedForward(dim=self.query_dim)
          self.norm1 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
          self.norm2 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
          self.norm3 = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)
-
-     def __call__(self, hidden_states, context, deterministic=True):
+
+     @nn.compact
+     def __call__(self, hidden_states, context=None):
+         if self.only_pure_attention:
+             return self.attention2(self.norm2(hidden_states), context)
+
          # self attention
-         residual = hidden_states
-         if self.only_cross_attention:
-             hidden_states = self.attn1(self.norm1(hidden_states), context)
-         else:
-             hidden_states = self.attn1(self.norm1(hidden_states))
-         hidden_states = hidden_states + residual
-
+         if not self.use_cross_only:
+             hidden_states = hidden_states + self.attention1(self.norm1(hidden_states))
+
          # cross attention
-         residual = hidden_states
-         hidden_states = self.attn2(self.norm2(hidden_states), context)
-         hidden_states = hidden_states + residual
-
+         hidden_states = hidden_states + self.attention2(self.norm2(hidden_states), context)
          # feed forward
-         residual = hidden_states
-         hidden_states = self.ff(self.norm3(hidden_states))
-         hidden_states = hidden_states + residual
+         hidden_states = hidden_states + self.ff(self.norm3(hidden_states))
+
+         return hidden_states
 
-         return hidden_states
+ class TransformerBlock(nn.Module):
+     heads: int = 4
+     dim_head: int = 32
+     use_linear_attention: bool = True
+     dtype: Any = jnp.float32
+     precision: Any = jax.lax.Precision.HIGH
+     use_projection: bool = False
+     use_flash_attention:bool = True
+     use_self_and_cross:bool = False
+     only_pure_attention:bool = False
+
+     @nn.compact
+     def __call__(self, x, context=None):
+         inner_dim = self.heads * self.dim_head
+         B, H, W, C = x.shape
+         normed_x = nn.RMSNorm(epsilon=1e-5, dtype=self.dtype)(x)
+         if self.use_projection == True:
+             if self.use_linear_attention:
+                 projected_x = nn.Dense(features=inner_dim,
+                                        use_bias=False, precision=self.precision,
+                                        kernel_init=kernel_init(1.0),
+                                        dtype=self.dtype, name=f'project_in')(normed_x)
+             else:
+                 projected_x = nn.Conv(
+                     features=inner_dim, kernel_size=(1, 1),
+                     kernel_init=kernel_init(1.0),
+                     strides=(1, 1), padding='VALID', use_bias=False, dtype=self.dtype,
+                     precision=self.precision, name=f'project_in_conv',
+                 )(normed_x)
+         else:
+             projected_x = normed_x
+             inner_dim = C
+
+         context = projected_x if context is None else context
+
+         projected_x = BasicTransformerBlock(
+             query_dim=inner_dim,
+             heads=self.heads,
+             dim_head=self.dim_head,
+             name=f'Attention',
+             precision=self.precision,
+             use_bias=False,
+             dtype=self.dtype,
+             use_flash_attention=self.use_flash_attention,
+             use_cross_only=(not self.use_self_and_cross),
+             only_pure_attention=self.only_pure_attention
+         )(projected_x, context)
+
+         if self.use_projection == True:
+             if self.use_linear_attention:
+                 projected_x = nn.Dense(features=C, precision=self.precision,
+                                        dtype=self.dtype, use_bias=False,
+                                        kernel_init=kernel_init(1.0),
+                                        name=f'project_out')(projected_x)
+             else:
+                 projected_x = nn.Conv(
+                     features=C, kernel_size=(1, 1),
+                     kernel_init=kernel_init(1.0),
+                     strides=(1, 1), padding='VALID', use_bias=False, dtype=self.dtype,
+                     precision=self.precision, name=f'project_out_conv',
+                 )(projected_x)
+
+         out = x + projected_x
+         return out
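To illustrate what the attention refactor above means for callers: `TransformerBlock` now always routes through `BasicTransformerBlock` (self plus cross attention), picking `EfficientAttention` or `NormalAttention` via `use_flash_attention`, and the attention classes accept either `[B, H, W, C]` feature maps or `[B, SEQ, C]` sequences. Below is a rough usage sketch; the import path, shapes, and flag values are assumptions, not taken from the diff.

```python
# Hypothetical usage sketch for the reworked attention blocks (illustrative only).
import jax
import jax.numpy as jnp
from flaxdiff.models.attention import TransformerBlock  # assumed import path

block = TransformerBlock(
    heads=4,
    dim_head=32,
    use_projection=True,
    use_flash_attention=False,   # selects NormalAttention under the hood
    use_self_and_cross=True,     # self attention followed by cross attention
    only_pure_attention=False,
)

x = jnp.ones((2, 16, 16, 64))      # [B, H, W, C] feature map
context = jnp.ones((2, 77, 768))   # [B, SEQ, C] conditioning sequence
params = block.init(jax.random.PRNGKey(0), x, context)
y = block.apply(params, x, context)  # same shape as x, residual added
```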
flaxdiff-0.1.5/flaxdiff/models/autoencoder/__init__.py: File without changes
flaxdiff-0.1.5/flaxdiff/models/autoencoder/autoencoder.py (new file)
@@ -0,0 +1,14 @@
+ import jax
+ import jax.numpy as jnp
+ from flax import linen as nn
+ from typing import Dict, Callable, Sequence, Any, Union
+ import einops
+ from ..common import kernel_init, ConvLayer, Upsample, Downsample, PixelShuffle
+
+
+ class AutoEncoder:
+     def encode(self, x: jnp.ndarray, **kwargs) -> jnp.ndarray:
+         raise NotImplementedError
+
+     def decode(self, z: jnp.ndarray, **kwargs) -> jnp.ndarray:
+         raise NotImplementedError
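The new `AutoEncoder` base class above is just an interface: latent-diffusion code can target `encode`/`decode` without caring which VAE backs them. As a minimal sketch (purely illustrative, with an assumed import path), a subclass only has to fill in those two methods:

```python
# Illustrative only: a trivial AutoEncoder subclass that passes arrays through unchanged.
import jax.numpy as jnp
from flaxdiff.models.autoencoder.autoencoder import AutoEncoder  # assumed import path

class IdentityAutoEncoder(AutoEncoder):
    def encode(self, x: jnp.ndarray, **kwargs) -> jnp.ndarray:
        return x  # no compression; stands in for a real VAE encoder

    def decode(self, z: jnp.ndarray, **kwargs) -> jnp.ndarray:
        return z
```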
flaxdiff-0.1.5/flaxdiff/models/autoencoder/diffusers.py (new file)
@@ -0,0 +1,88 @@
+ import jax
+ import jax.numpy as jnp
+ from flax import linen as nn
+ from .autoencoder import AutoEncoder
+
+ """
+ This module contains an Autoencoder implementation which uses the Stable Diffusion VAE model from the HuggingFace Diffusers library.
+ """
+
+ class StableDiffusionVAE(AutoEncoder):
+     def __init__(self, modelname = "CompVis/stable-diffusion-v1-4"):
+
+         from diffusers.models.vae_flax import FlaxEncoder, FlaxDecoder
+         from diffusers import FlaxStableDiffusionPipeline
+
+         pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+             modelname,
+             revision="bf16",
+             dtype=jnp.bfloat16,
+         )
+
+         vae = pipeline.vae
+
+         enc = FlaxEncoder(
+             in_channels=vae.config.in_channels,
+             out_channels=vae.config.latent_channels,
+             down_block_types=vae.config.down_block_types,
+             block_out_channels=vae.config.block_out_channels,
+             layers_per_block=vae.config.layers_per_block,
+             act_fn=vae.config.act_fn,
+             norm_num_groups=vae.config.norm_num_groups,
+             double_z=True,
+             dtype=vae.dtype,
+         )
+
+         dec = FlaxDecoder(
+             in_channels=vae.config.latent_channels,
+             out_channels=vae.config.out_channels,
+             up_block_types=vae.config.up_block_types,
+             block_out_channels=vae.config.block_out_channels,
+             layers_per_block=vae.config.layers_per_block,
+             norm_num_groups=vae.config.norm_num_groups,
+             act_fn=vae.config.act_fn,
+             dtype=vae.dtype,
+         )
+
+         quant_conv = nn.Conv(
+             2 * vae.config.latent_channels,
+             kernel_size=(1, 1),
+             strides=(1, 1),
+             padding="VALID",
+             dtype=vae.dtype,
+         )
+
+         post_quant_conv = nn.Conv(
+             vae.config.latent_channels,
+             kernel_size=(1, 1),
+             strides=(1, 1),
+             padding="VALID",
+             dtype=vae.dtype,
+         )
+
+         self.enc = enc
+         self.dec = dec
+         self.post_quant_conv = post_quant_conv
+         self.quant_conv = quant_conv
+         self.params = params
+         self.scaling_factor = vae.scaling_factor
+
+     def encode(self, images, rngkey: jax.random.PRNGKey = None):
+         latents = self.enc.apply({"params": self.params["vae"]['encoder']}, images, deterministic=True)
+         latents = self.quant_conv.apply({"params": self.params["vae"]['quant_conv']}, latents)
+         if rngkey is not None:
+             mean, log_std = jnp.split(latents, 2, axis=-1)
+             log_std = jnp.clip(log_std, -30, 20)
+             std = jnp.exp(0.5 * log_std)
+             latents = mean + std * jax.random.normal(rngkey, mean.shape, dtype=mean.dtype)
+             print("Sampled")
+         else:
+             # return the mean
+             latents, _ = jnp.split(latents, 2, axis=-1)
+         latents *= self.scaling_factor
+         return latents
+
+     def decode(self, latents):
+         latents = (1.0 / self.scaling_factor) * latents
+         latents = self.post_quant_conv.apply({"params": self.params["vae"]['post_quant_conv']}, latents)
+         return self.dec.apply({"params": self.params["vae"]['decoder']}, latents)
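Going by the code above, `encode` maps NHWC images to scaled latents (sampling via the reparameterization trick `mean + std * noise` when a PRNG key is passed, otherwise returning the mean), and `decode` undoes the scaling before running the Flax decoder. A hedged usage sketch, assuming the `diffusers` dependency is installed and the class is importable from this module path; shapes are approximate:

```python
# Usage sketch for StableDiffusionVAE (assumed import path; weights download from HuggingFace).
import jax
import jax.numpy as jnp
from flaxdiff.models.autoencoder.diffusers import StableDiffusionVAE

vae = StableDiffusionVAE()  # loads CompVis/stable-diffusion-v1-4 in bfloat16
images = jnp.ones((1, 256, 256, 3), dtype=jnp.bfloat16)

# Stochastic encoding: pass a PRNG key to sample latents from the posterior.
latents = vae.encode(images, rngkey=jax.random.PRNGKey(0))  # roughly [1, 32, 32, 4]

# Deterministic encoding: omit the key to get the scaled posterior mean instead.
mean_latents = vae.encode(images)

recon = vae.decode(latents)  # back to image space, roughly [1, 256, 256, 3]
```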