cache-dit 0.1.7.tar.gz → 0.1.8.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of cache-dit might be problematic.

Files changed (103)
  1. {cache_dit-0.1.7 → cache_dit-0.1.8}/PKG-INFO +55 -21
  2. {cache_dit-0.1.7 → cache_dit-0.1.8}/README.md +54 -20
  3. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png +0 -0
  4. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png +0 -0
  5. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png +0 -0
  6. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png +0 -0
  7. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F1B0_R0.05_P41.6_T12.70s.png +0 -0
  8. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png +0 -0
  9. cache_dit-0.1.8/assets/U0_C1_DBPRUNE_F8B8_R0.08_P23.1_T16.14s.png +0 -0
  10. cache_dit-0.1.8/assets/U0_C1_NONE_R0.08_S0_T20.43s.png +0 -0
  11. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.03_P27.3_T6.62s.png +0 -0
  12. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.03_P27.3_T6.63s.png +0 -0
  13. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.045_P38.2_T5.81s.png +0 -0
  14. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.045_P38.2_T5.82s.png +0 -0
  15. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.06s.png +0 -0
  16. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.07s.png +0 -0
  17. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.08s.png +0 -0
  18. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.055_P45.1_T5.27s.png +0 -0
  19. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.055_P45.1_T5.28s.png +0 -0
  20. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.2_P59.5_T3.95s.png +0 -0
  21. cache_dit-0.1.8/assets/U4_C1_DBPRUNE_F1B0_R0.2_P59.5_T3.96s.png +0 -0
  22. cache_dit-0.1.8/assets/U4_C1_NONE_R0.08_S0_T7.78s.png +0 -0
  23. cache_dit-0.1.8/assets/U4_C1_NONE_R0.08_S0_T7.79s.png +0 -0
  24. {cache_dit-0.1.7 → cache_dit-0.1.8}/bench/bench.py +22 -6
  25. {cache_dit-0.1.7 → cache_dit-0.1.8}/examples/.gitignore +0 -1
  26. cache_dit-0.1.8/examples/data/cup.png +0 -0
  27. cache_dit-0.1.8/examples/data/cup_mask.png +0 -0
  28. cache_dit-0.1.8/examples/run_cogvideox.py +46 -0
  29. {cache_dit-0.1.7 → cache_dit-0.1.8}/examples/run_flux.py +5 -1
  30. cache_dit-0.1.8/examples/run_flux_fill.py +32 -0
  31. {cache_dit-0.1.7 → cache_dit-0.1.8}/examples/run_mochi.py +9 -2
  32. cache_dit-0.1.8/examples/run_wan.py +49 -0
  33. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/_version.py +2 -2
  34. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/prune_context.py +2 -2
  35. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/PKG-INFO +55 -21
  36. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/SOURCES.txt +25 -0
  37. cache_dit-0.1.7/examples/run_cogvideox.py +0 -30
  38. {cache_dit-0.1.7 → cache_dit-0.1.8}/.github/workflows/issue.yml +0 -0
  39. {cache_dit-0.1.7 → cache_dit-0.1.8}/.gitignore +0 -0
  40. {cache_dit-0.1.7 → cache_dit-0.1.8}/.pre-commit-config.yaml +0 -0
  41. {cache_dit-0.1.7 → cache_dit-0.1.8}/CONTRIBUTE.md +0 -0
  42. {cache_dit-0.1.7 → cache_dit-0.1.8}/LICENSE +0 -0
  43. {cache_dit-0.1.7 → cache_dit-0.1.8}/MANIFEST.in +0 -0
  44. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F12B12S4_R0.2_S16.png +0 -0
  45. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F12B16S4_R0.08_S6.png +0 -0
  46. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F16B16S2_R0.2_S14.png +0 -0
  47. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F16B16S4_R0.2_S13.png +0 -0
  48. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F1B0S1_R0.08_S11.png +0 -0
  49. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F1B0S1_R0.2_S19.png +0 -0
  50. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F8B0S2_R0.12_S12.png +0 -0
  51. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F8B16S1_R0.2_S18.png +0 -0
  52. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F8B8S1_R0.08_S9.png +0 -0
  53. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F8B8S1_R0.12_S12.png +0 -0
  54. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCACHE_F8B8S1_R0.15_S15.png +0 -0
  55. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBCache.png +0 -0
  56. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png +0 -0
  57. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png +0 -0
  58. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png +0 -0
  59. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png +0 -0
  60. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.07_P52.3_T12.53s.png +0 -0
  61. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.08_P52.4_T12.52s.png +0 -0
  62. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.09_P59.2_T10.81s.png +0 -0
  63. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.12_P59.5_T10.76s.png +0 -0
  64. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.12_P63.0_T9.90s.png +0 -0
  65. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.1_P62.8_T9.95s.png +0 -0
  66. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png +0 -0
  67. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/DBPRUNE_F1B0_R0.3_P63.1_T9.79s.png +0 -0
  68. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/NONE_R0.08_S0.png +0 -0
  69. {cache_dit-0.1.7 → cache_dit-0.1.8}/assets/cache-dit.png +0 -0
  70. {cache_dit-0.1.7 → cache_dit-0.1.8}/bench/.gitignore +0 -0
  71. {cache_dit-0.1.7 → cache_dit-0.1.8}/docs/.gitignore +0 -0
  72. {cache_dit-0.1.7 → cache_dit-0.1.8}/pyproject.toml +0 -0
  73. {cache_dit-0.1.7 → cache_dit-0.1.8}/pytest.ini +0 -0
  74. {cache_dit-0.1.7 → cache_dit-0.1.8}/requirements.txt +0 -0
  75. {cache_dit-0.1.7 → cache_dit-0.1.8}/setup.cfg +0 -0
  76. {cache_dit-0.1.7 → cache_dit-0.1.8}/setup.py +0 -0
  77. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/__init__.py +0 -0
  78. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/__init__.py +0 -0
  79. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dual_block_cache/__init__.py +0 -0
  80. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dual_block_cache/cache_context.py +0 -0
  81. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/__init__.py +0 -0
  82. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/cogvideox.py +0 -0
  83. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/flux.py +0 -0
  84. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/mochi.py +0 -0
  85. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/__init__.py +0 -0
  86. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/__init__.py +0 -0
  87. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/cogvideox.py +0 -0
  88. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/flux.py +0 -0
  89. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/mochi.py +0 -0
  90. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/__init__.py +0 -0
  91. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/cache_context.py +0 -0
  92. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/__init__.py +0 -0
  93. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/cogvideox.py +0 -0
  94. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/flux.py +0 -0
  95. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/mochi.py +0 -0
  96. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/wan.py +0 -0
  97. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/taylorseer.py +0 -0
  98. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/utils.py +0 -0
  99. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/logger.py +0 -0
  100. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/primitives.py +0 -0
  101. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/dependency_links.txt +0 -0
  102. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/requires.txt +0 -0
  103. {cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/top_level.txt +0 -0

{cache_dit-0.1.7 → cache_dit-0.1.8}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: cache_dit
- Version: 0.1.7
+ Version: 0.1.8
  Summary: 🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers
  Author: DefTruth, vipshop.com, etc.
  Maintainer: DefTruth, vipshop.com, etc
@@ -35,7 +35,7 @@ Dynamic: requires-python

  <div align="center">
  <p align="center">
- <h3>🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration <br>Toolbox for Diffusion Transformers</h3>
+ <h2>🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration <br>Toolbox for Diffusion Transformers</h2>
  </p>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit.png >
  <div align='center'>
@@ -44,13 +44,32 @@ Dynamic: requires-python
  <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
  <img src=https://static.pepy.tech/badge/cache-dit >
  <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
- <img src=https://img.shields.io/badge/Release-v0.1.7-brightgreen.svg >
+ <img src=https://img.shields.io/badge/Release-v0.1.8-brightgreen.svg >
  </div>
  <p align="center">
  DeepCache is for UNet not DiT. Most DiT cache speedups are complex and not training-free. CacheDiT <br>offers a set of training-free cache accelerators for DiT: 🔥DBCache, DBPrune, FBCache, etc🔥
  </p>
+ <p align="center">
+ <h3> 🔥Supported Models🔥</h2>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀FLUX.1</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀CogVideoX</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Mochi</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Wan2.1</b>: 🔜DBCache, 🔜DBPrune, ✔️FBCache🔥</a> <br> <br>
+ <b>♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️</b>
+ </p>
  </div>

+
+ <!--
+ ## 🎉Supported Models
+ <div id="supported"></div>
+ - [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+ - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+ - [🚀Mochi](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+ - [🚀Wan2.1**](https://github.com/vipshop/cache-dit/raw/main/examples): *🔜DBCache, 🔜DBPrune, ✔️FBCache*
+ -->
+
+
  ## 🤗 Introduction

  <div align="center">
@@ -102,11 +121,20 @@ These case studies demonstrate that even with relatively high thresholds (such a
  </p>
  </div>

- Moreover, **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference.
+ **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference. Moreover, **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance.
+
+ <div align="center">
+ <p align="center">
+ DBPrune + <b>torch.compile + context parallelism</b> <br>Steps: 28, "A cat holding a sign that says hello world with complex background"
+ </p>
+ </div>

- <p align="center">
- ♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️
- </p>
+ |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ |+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
+ |+compile:20.43s|16.25s|14.12s|13.41s|12s|8.86s|
+ |+L20x4:7.75s|6.62s|6.03s|5.81s|5.24s|3.93s|
+ |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|

  ## ©️Citations

@@ -136,11 +164,9 @@ The **CacheDiT** codebase was adapted from FBCache's implementation at the [Para
  - [⚡️Dynamic Block Prune](#dbprune)
  - [🎉Context Parallelism](#context-parallelism)
  - [🔥Torch Compile](#compile)
- - [🎉Supported Models](#supported)
  - [👋Contribute](#contribute)
  - [©️License](#license)

-
  ## ⚙️Installation

  <div id="installation"></div>
@@ -370,6 +396,7 @@ Then, run the python test script with `torchrun`:
  ```bash
  torchrun --nproc_per_node=4 parallel_cache.py
  ```
+ <!--

  <div align="center">
  <p align="center">
@@ -377,17 +404,18 @@ torchrun --nproc_per_node=4 parallel_cache.py
  </p>
  </div>

- |Baseline(L20x1)|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+ |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
  |:---:|:---:|:---:|:---:|:---:|:---:|
- |24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
- |8.54s (L20x4)|7.20s (L20x4)|6.61s (L20x4)|6.09s (L20x4)|5.54s (L20x4)|4.22s (L20x4)|
+ |+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
+ |+L20x4:8.54s|7.20s|6.61s|6.09s|5.54s|4.22s|
  |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=105px>|
+ -->

  ## 🔥Torch Compile

  <div id="compile"></div>

- **CacheDiT** are designed to work compatibly with `torch.compile`. For example:
+ **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance. For example:

  ```python
  apply_cache_on_pipe(
@@ -396,21 +424,27 @@ apply_cache_on_pipe(
  # Compile the Transformer module
  pipe.transformer = torch.compile(pipe.transformer)
  ```
- However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo` to achieve better performance.
-
+ However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
  ```python
  torch._dynamo.config.recompile_limit = 96 # default is 8
  torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
  ```
- Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.

- ## 🎉Supported Models
+ <!--

- <div id="supported"></div>
+ <div align="center">
+ <p align="center">
+ DBPrune + <b>torch.compile</b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
+ </p>
+ </div>

- - [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters)
- - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters)
- - [🚀Mochi](https://github.com/vipshop/cache-dit/raw/main/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters)
+ |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ |+L20x1:24.8s|19.4s|16.8s|15.9s|14.2s|10.6s|
+ |+compile:20.4s|16.5s|14.1s|13.4s|12s|8.8s|
+ |+L20x4:7.7s|6.6s|6.0s|5.8s|5.2s|3.9s|
+ |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
+ -->

  ## 👋Contribute
  <div id="contribute"></div>

{cache_dit-0.1.7 → cache_dit-0.1.8}/README.md

@@ -1,6 +1,6 @@
  <div align="center">
  <p align="center">
- <h3>🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration <br>Toolbox for Diffusion Transformers</h3>
+ <h2>🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration <br>Toolbox for Diffusion Transformers</h2>
  </p>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit.png >
  <div align='center'>
@@ -9,13 +9,32 @@
  <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
  <img src=https://static.pepy.tech/badge/cache-dit >
  <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
- <img src=https://img.shields.io/badge/Release-v0.1.7-brightgreen.svg >
+ <img src=https://img.shields.io/badge/Release-v0.1.8-brightgreen.svg >
  </div>
  <p align="center">
  DeepCache is for UNet not DiT. Most DiT cache speedups are complex and not training-free. CacheDiT <br>offers a set of training-free cache accelerators for DiT: 🔥DBCache, DBPrune, FBCache, etc🔥
  </p>
+ <p align="center">
+ <h3> 🔥Supported Models🔥</h2>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀FLUX.1</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀CogVideoX</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Mochi</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Wan2.1</b>: 🔜DBCache, 🔜DBPrune, ✔️FBCache🔥</a> <br> <br>
+ <b>♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️</b>
+ </p>
  </div>

+
+ <!--
+ ## 🎉Supported Models
+ <div id="supported"></div>
+ - [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+ - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+ - [🚀Mochi](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+ - [🚀Wan2.1**](https://github.com/vipshop/cache-dit/raw/main/examples): *🔜DBCache, 🔜DBPrune, ✔️FBCache*
+ -->
+
+
  ## 🤗 Introduction

  <div align="center">
@@ -67,11 +86,20 @@ These case studies demonstrate that even with relatively high thresholds (such a
  </p>
  </div>

- Moreover, **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference.
+ **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference. Moreover, **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance.
+
+ <div align="center">
+ <p align="center">
+ DBPrune + <b>torch.compile + context parallelism</b> <br>Steps: 28, "A cat holding a sign that says hello world with complex background"
+ </p>
+ </div>

- <p align="center">
- ♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️
- </p>
+ |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ |+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
+ |+compile:20.43s|16.25s|14.12s|13.41s|12s|8.86s|
+ |+L20x4:7.75s|6.62s|6.03s|5.81s|5.24s|3.93s|
+ |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|

  ## ©️Citations

@@ -101,11 +129,9 @@ The **CacheDiT** codebase was adapted from FBCache's implementation at the [Para
  - [⚡️Dynamic Block Prune](#dbprune)
  - [🎉Context Parallelism](#context-parallelism)
  - [🔥Torch Compile](#compile)
- - [🎉Supported Models](#supported)
  - [👋Contribute](#contribute)
  - [©️License](#license)

-
  ## ⚙️Installation

  <div id="installation"></div>
@@ -335,6 +361,7 @@ Then, run the python test script with `torchrun`:
  ```bash
  torchrun --nproc_per_node=4 parallel_cache.py
  ```
+ <!--

  <div align="center">
  <p align="center">
@@ -342,17 +369,18 @@ torchrun --nproc_per_node=4 parallel_cache.py
  </p>
  </div>

- |Baseline(L20x1)|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+ |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
  |:---:|:---:|:---:|:---:|:---:|:---:|
- |24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
- |8.54s (L20x4)|7.20s (L20x4)|6.61s (L20x4)|6.09s (L20x4)|5.54s (L20x4)|4.22s (L20x4)|
+ |+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
+ |+L20x4:8.54s|7.20s|6.61s|6.09s|5.54s|4.22s|
  |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=105px>|
+ -->

  ## 🔥Torch Compile

  <div id="compile"></div>

- **CacheDiT** are designed to work compatibly with `torch.compile`. For example:
+ **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance. For example:

  ```python
  apply_cache_on_pipe(
@@ -361,21 +389,27 @@ apply_cache_on_pipe(
  # Compile the Transformer module
  pipe.transformer = torch.compile(pipe.transformer)
  ```
- However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo` to achieve better performance.
-
+ However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
  ```python
  torch._dynamo.config.recompile_limit = 96 # default is 8
  torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
  ```
- Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.

- ## 🎉Supported Models
+ <!--

- <div id="supported"></div>
+ <div align="center">
+ <p align="center">
+ DBPrune + <b>torch.compile</b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
+ </p>
+ </div>

- - [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters)
- - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters)
- - [🚀Mochi](https://github.com/vipshop/cache-dit/raw/main/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters)
+ |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ |+L20x1:24.8s|19.4s|16.8s|15.9s|14.2s|10.6s|
+ |+compile:20.4s|16.5s|14.1s|13.4s|12s|8.8s|
+ |+L20x4:7.7s|6.6s|6.0s|5.8s|5.2s|3.9s|
+ |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
+ -->

  ## 👋Contribute
  <div id="contribute"></div>

{cache_dit-0.1.7 → cache_dit-0.1.8}/bench/bench.py

@@ -3,7 +3,7 @@ import argparse
  import torch
  import time

- from diffusers import FluxPipeline
+ from diffusers import FluxPipeline, FluxTransformer2DModel
  from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
  from cache_dit.logger import init_logger

@@ -110,6 +110,7 @@ def get_cache_options(cache_type: CacheType, args: argparse.Namespace):
  return cache_options, cache_type_str


+ @torch.no_grad()
  def main():
  args = get_args()
  logger.info(f"Arguments: {args}")
@@ -119,7 +120,9 @@ def main():
  try:
  import torch.distributed as dist
  from para_attn.context_parallel import init_context_parallel_mesh
- from para_attn.context_parallel.diffusers_adapters import parallelize_pipe
+ from para_attn.context_parallel.diffusers_adapters import (
+ parallelize_pipe,
+ )

  # Initialize distributed process group
  dist.init_process_group()
@@ -133,9 +136,10 @@ def main():
  ).to("cuda")

  parallelize_pipe(
- pipe, mesh=init_context_parallel_mesh(
+ pipe,
+ mesh=init_context_parallel_mesh(
  pipe.device.type, max_ulysses_dim_size=args.ulysses
- )
+ ),
  )
  except ImportError as e:
  logger.error(
@@ -148,7 +152,7 @@ def main():
  pipe = FluxPipeline.from_pretrained(
  os.environ.get("FLUX_DIR", "black-forest-labs/FLUX.1-dev"),
  torch_dtype=torch.bfloat16,
- ).to("cuda")
+ ).to("cuda")

  cache_options, cache_type = get_cache_options(args.cache, args)

@@ -165,7 +169,18 @@ def main():
  torch._dynamo.config.accumulated_recompile_limit = (
  2048 # default is 256
  )
- pipe.transformer = torch.compile(pipe.transformer, mode="default")
+ if isinstance(pipe.transformer, FluxTransformer2DModel):
+ logger.warning(
+ "Only compile transformer blocks not the whole model "
+ "for FluxTransformer2DModel to keep higher precision."
+ )
+ for module in pipe.transformer.transformer_blocks:
+ module.compile()
+ for module in pipe.transformer.single_transformer_blocks:
+ module.compile()
+ else:
+ logger.info("Compiling the transformer with default mode.")
+ pipe.transformer = torch.compile(pipe.transformer, mode="default")

  all_times = []
  cached_stepes = 0
@@ -238,6 +253,7 @@ def main():

  if args.ulysses is not None:
  import torch.distributed as dist
+
  dist.destroy_process_group()
  logger.info("Distributed process group destroyed.")

{cache_dit-0.1.7 → cache_dit-0.1.8}/examples/.gitignore

@@ -164,5 +164,4 @@ _version.py
  report*.html

  .DS_Store
-
  *.png

Binary files (new, no text diff): cache_dit-0.1.8/examples/data/cup.png, cache_dit-0.1.8/examples/data/cup_mask.png

cache_dit-0.1.8/examples/run_cogvideox.py (new file)

@@ -0,0 +1,46 @@
+ import os
+ import torch
+ from diffusers import CogVideoXPipeline
+ from diffusers.utils import export_to_video
+ from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
+
+ pipe = CogVideoXPipeline.from_pretrained(
+ os.environ.get(
+ "COGVIDEOX_DIR",
+ "THUDM/CogVideoX-5b",
+ ),
+ torch_dtype=torch.bfloat16,
+ ).to("cuda")
+
+ # Default options, F8B8, good balance between performance and precision
+ cache_options = CacheType.default_options(CacheType.DBCache)
+
+ apply_cache_on_pipe(pipe, **cache_options)
+
+ pipe.vae.enable_slicing()
+ pipe.vae.enable_tiling()
+
+ prompt = (
+ "A panda, dressed in a small, red jacket and a tiny hat, "
+ "sits on a wooden stool in a serene bamboo forest. The "
+ "panda's fluffy paws strum a miniature acoustic guitar, "
+ "producing soft, melodic tunes. Nearby, a few other pandas "
+ "gather, watching curiously and some clapping in rhythm. "
+ "Sunlight filters through the tall bamboo, casting a gentle "
+ "glow on the scene. The panda's face is expressive, showing "
+ "concentration and joy as it plays. The background includes "
+ "a small, flowing stream and vibrant green foliage, enhancing "
+ "the peaceful and magical atmosphere of this unique musical "
+ "performance."
+ )
+ video = pipe(
+ prompt=prompt,
+ num_videos_per_prompt=1,
+ num_inference_steps=50,
+ num_frames=49,
+ guidance_scale=6,
+ generator=torch.Generator("cuda").manual_seed(0),
+ ).frames[0]
+
+ print("Saving video to cogvideox.mp4")
+ export_to_video(video, "cogvideox.mp4", fps=8)

{cache_dit-0.1.7 → cache_dit-0.1.8}/examples/run_flux.py

@@ -1,9 +1,13 @@
+ import os
  import torch
  from diffusers import FluxPipeline
  from cache_dit.cache_factory import apply_cache_on_pipe, CacheType

  pipe = FluxPipeline.from_pretrained(
- "black-forest-labs/FLUX.1-dev",
+ os.environ.get(
+ "FLUX_DIR",
+ "black-forest-labs/FLUX.1-dev",
+ ),
  torch_dtype=torch.bfloat16,
  ).to("cuda")

cache_dit-0.1.8/examples/run_flux_fill.py (new file)

@@ -0,0 +1,32 @@
+ import os
+ import torch
+ from diffusers import FluxFillPipeline
+ from diffusers.utils import load_image
+ from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
+
+ pipe = FluxFillPipeline.from_pretrained(
+ os.environ.get(
+ "FLUX_FILL_DIR",
+ "black-forest-labs/FLUX.1-Fill-dev",
+ ),
+ torch_dtype=torch.bfloat16,
+ ).to("cuda")
+
+
+ # Default options, F8B8, good balance between performance and precision
+ cache_options = CacheType.default_options(CacheType.DBCache)
+
+ apply_cache_on_pipe(pipe, **cache_options)
+
+ image = pipe(
+ prompt="a white paper cup",
+ image=load_image("data/cup.png"),
+ mask_image=load_image("data/cup_mask.png"),
+ guidance_scale=30,
+ num_inference_steps=28,
+ max_sequence_length=512,
+ generator=torch.Generator("cuda").manual_seed(0),
+ ).images[0]
+
+ print("Saving image to flux-fill.png")
+ image.save("flux-fill.png")

{cache_dit-0.1.7 → cache_dit-0.1.8}/examples/run_mochi.py

@@ -1,10 +1,14 @@
+ import os
  import torch
  from diffusers import MochiPipeline
  from diffusers.utils import export_to_video
  from cache_dit.cache_factory import apply_cache_on_pipe, CacheType

  pipe = MochiPipeline.from_pretrained(
- "genmo/mochi-1-preview",
+ os.environ.get(
+ "MOCHI_DIR",
+ "genmo/mochi-1-preview",
+ ),
  torch_dtype=torch.bfloat16,
  ).to("cuda")

@@ -15,7 +19,10 @@ apply_cache_on_pipe(pipe, **cache_options)

  pipe.enable_vae_tiling()

- prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
+ prompt = (
+ "Close-up of a chameleon's eye, with its scaly skin "
+ "changing color. Ultra high resolution 4k."
+ )
  video = pipe(
  prompt,
  num_frames=84,

cache_dit-0.1.8/examples/run_wan.py (new file)

@@ -0,0 +1,49 @@
+ import os
+ import torch
+ from diffusers import WanPipeline
+ from diffusers.utils import export_to_video
+ from diffusers.schedulers.scheduling_unipc_multistep import (
+ UniPCMultistepScheduler,
+ )
+ from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
+
+ height, width = 480, 832
+ pipe = WanPipeline.from_pretrained(
+ os.environ.get(
+ "WAN_DIR",
+ "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
+ ),
+ torch_dtype=torch.bfloat16,
+ )
+
+ # flow shift should be 3.0 for 480p images, 5.0 for 720p images
+ if hasattr(pipe, "scheduler") and pipe.scheduler is not None:
+ # Use the UniPCMultistepScheduler with the specified flow shift
+ flow_shift = 3.0 if height == 480 else 5.0
+ pipe.scheduler = UniPCMultistepScheduler.from_config(
+ pipe.scheduler.config,
+ flow_shift=flow_shift,
+ )
+
+ pipe.to("cuda")
+
+ apply_cache_on_pipe(pipe, **CacheType.default_options(CacheType.FBCache))
+
+ # Enable memory savings
+ pipe.enable_model_cpu_offload()
+ pipe.enable_vae_tiling()
+
+ video = pipe(
+ prompt=(
+ "An astronaut dancing vigorously on the moon with earth "
+ "flying past in the background, hyperrealistic"
+ ),
+ negative_prompt="",
+ height=480,
+ width=832,
+ num_frames=81,
+ num_inference_steps=30,
+ ).frames[0]
+
+ print("Saving video to wan.mp4")
+ export_to_video(video, "wan.mp4", fps=15)

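Note that run_wan.py, like the other example scripts touched in this release, resolves its model location through `os.environ.get(...)` with a Hugging Face Hub id as the fallback (`WAN_DIR` here; `FLUX_DIR`, `FLUX_FILL_DIR`, `MOCHI_DIR`, and `COGVIDEOX_DIR` in the other scripts). A small sketch of how that lookup behaves; the local path below is purely illustrative.

```python
import os

# With the variable unset, the example falls back to the Hub id baked into it.
model_path = os.environ.get("WAN_DIR", "Wan-AI/Wan2.1-T2V-1.3B-Diffusers")
print(model_path)  # -> Wan-AI/Wan2.1-T2V-1.3B-Diffusers

# Exporting WAN_DIR before launching the script (illustrative path) redirects
# the same lookup to a local checkout instead of downloading from the Hub.
os.environ["WAN_DIR"] = "/models/Wan2.1-T2V-1.3B-Diffusers"
model_path = os.environ.get("WAN_DIR", "Wan-AI/Wan2.1-T2V-1.3B-Diffusers")
print(model_path)  # -> /models/Wan2.1-T2V-1.3B-Diffusers
```
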
{cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/_version.py

@@ -17,5 +17,5 @@ __version__: str
  __version_tuple__: VERSION_TUPLE
  version_tuple: VERSION_TUPLE

- __version__ = version = '0.1.7'
- __version_tuple__ = version_tuple = (0, 1, 7)
+ __version__ = version = '0.1.8'
+ __version_tuple__ = version_tuple = (0, 1, 8)

{cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit/cache_factory/dynamic_block_prune/prune_context.py (whitespace-only changes to two comment lines)

@@ -628,7 +628,7 @@ class DBPrunedTransformerBlocks(torch.nn.Module):
  return sorted(non_prune_blocks_ids)

  # @torch.compile(dynamic=True)
- # mark this function as compile with dynamic=True will
+ # mark this function as compile with dynamic=True will
  # cause precision degradate, so, we choose to disable it
  # now, until we find a better solution or fixed the bug.
  @torch.compiler.disable
@@ -668,7 +668,7 @@ class DBPrunedTransformerBlocks(torch.nn.Module):
  )

  # @torch.compile(dynamic=True)
- # mark this function as compile with dynamic=True will
+ # mark this function as compile with dynamic=True will
  # cause precision degradate, so, we choose to disable it
  # now, until we find a better solution or fixed the bug.
  @torch.compiler.disable

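The context lines above explain the pattern: compiling these helpers with `torch.compile(dynamic=True)` degraded precision, so they are decorated with `@torch.compiler.disable` and left in eager mode instead. A minimal, self-contained illustration of that decorator follows; the function names are hypothetical and not the library's actual methods.

```python
import torch


@torch.compiler.disable
def pick_block_ids(scores: torch.Tensor) -> torch.Tensor:
    # Always runs eagerly: torch.compile breaks the graph around this call,
    # which is the same mechanism the prune_context.py methods above rely on.
    return torch.argsort(scores)


@torch.compile
def caller(scores: torch.Tensor) -> torch.Tensor:
    ids = pick_block_ids(scores)
    return ids * 2
```
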
{cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/PKG-INFO

The hunks for this file are identical to the PKG-INFO diff shown above (the egg-info metadata is a generated copy of the top-level PKG-INFO): the version is bumped from 0.1.7 to 0.1.8, the Release badge is updated, the Supported Models block and the DBPrune + torch.compile benchmark table are added, the star request moves into the header, and the old Supported Models section is commented out.

{cache_dit-0.1.7 → cache_dit-0.1.8}/src/cache_dit.egg-info/SOURCES.txt

@@ -35,6 +35,27 @@ assets/DBPRUNE_F1B0_R0.1_P62.8_T9.95s.png
  assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png
  assets/DBPRUNE_F1B0_R0.3_P63.1_T9.79s.png
  assets/NONE_R0.08_S0.png
+ assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png
+ assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png
+ assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png
+ assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png
+ assets/U0_C1_DBPRUNE_F1B0_R0.05_P41.6_T12.70s.png
+ assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png
+ assets/U0_C1_DBPRUNE_F8B8_R0.08_P23.1_T16.14s.png
+ assets/U0_C1_NONE_R0.08_S0_T20.43s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.03_P27.3_T6.62s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.03_P27.3_T6.63s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.045_P38.2_T5.81s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.045_P38.2_T5.82s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.06s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.07s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.08s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.055_P45.1_T5.27s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.055_P45.1_T5.28s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.2_P59.5_T3.95s.png
+ assets/U4_C1_DBPRUNE_F1B0_R0.2_P59.5_T3.96s.png
+ assets/U4_C1_NONE_R0.08_S0_T7.78s.png
+ assets/U4_C1_NONE_R0.08_S0_T7.79s.png
  assets/cache-dit.png
  bench/.gitignore
  bench/bench.py
@@ -42,7 +63,11 @@ docs/.gitignore
  examples/.gitignore
  examples/run_cogvideox.py
  examples/run_flux.py
+ examples/run_flux_fill.py
  examples/run_mochi.py
+ examples/run_wan.py
+ examples/data/cup.png
+ examples/data/cup_mask.png
  src/cache_dit/__init__.py
  src/cache_dit/_version.py
  src/cache_dit/logger.py

cache_dit-0.1.7/examples/run_cogvideox.py (removed; replaced by the new cache_dit-0.1.8/examples/run_cogvideox.py above)

@@ -1,30 +0,0 @@
- import torch
- from diffusers import CogVideoXPipeline
- from diffusers.utils import export_to_video
- from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
-
- pipe = CogVideoXPipeline.from_pretrained(
- "THUDM/CogVideoX-5b",
- torch_dtype=torch.bfloat16,
- ).to("cuda")
-
- # Default options, F8B8, good balance between performance and precision
- cache_options = CacheType.default_options(CacheType.DBCache)
-
- apply_cache_on_pipe(pipe, **cache_options)
-
- pipe.vae.enable_slicing()
- pipe.vae.enable_tiling()
-
- prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
- video = pipe(
- prompt=prompt,
- num_videos_per_prompt=1,
- num_inference_steps=50,
- num_frames=49,
- guidance_scale=6,
- generator=torch.Generator("cuda").manual_seed(0),
- ).frames[0]
-
- print("Saving video to cogvideox.mp4")
- export_to_video(video, "cogvideox.mp4", fps=8)
