cache-dit 0.1.8__py3-none-any.whl → 0.2.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of cache-dit might be problematic. Click here for more details.

@@ -4,13 +4,13 @@ import functools
4
4
  import unittest
5
5
 
6
6
  import torch
7
- from diffusers import DiffusionPipeline, HunyuanVideoTransformer3DModel
7
+ from diffusers import DiffusionPipeline, WanTransformer3DModel
8
8
 
9
9
  from cache_dit.cache_factory.first_block_cache import cache_context
10
10
 
11
11
 
12
12
  def apply_cache_on_transformer(
13
- transformer: HunyuanVideoTransformer3DModel,
13
+ transformer: WanTransformer3DModel,
14
14
  ):
15
15
  if getattr(transformer, "_is_cached", False):
16
16
  return transformer
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: cache_dit
3
- Version: 0.1.8
3
+ Version: 0.2.0
4
4
  Summary: 🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers
5
5
  Author: DefTruth, vipshop.com, etc.
6
6
  Maintainer: DefTruth, vipshop.com, etc
@@ -44,31 +44,27 @@ Dynamic: requires-python
44
44
  <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
45
45
  <img src=https://static.pepy.tech/badge/cache-dit >
46
46
  <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
47
- <img src=https://img.shields.io/badge/Release-v0.1.8-brightgreen.svg >
47
+ <img src=https://img.shields.io/badge/Release-v0.2.0-brightgreen.svg >
48
48
  </div>
49
49
  <p align="center">
50
50
  DeepCache is for UNet not DiT. Most DiT cache speedups are complex and not training-free. CacheDiT <br>offers a set of training-free cache accelerators for DiT: 🔥DBCache, DBPrune, FBCache, etc🔥
51
51
  </p>
52
52
  <p align="center">
53
- <h3> 🔥Supported Models🔥</h2>
53
+ <h4> 🔥Supported Models🔥</h4>
54
54
  <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀FLUX.1</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
55
- <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀CogVideoX</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
56
55
  <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Mochi</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
57
- <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Wan2.1</b>: 🔜DBCache, 🔜DBPrune, ✔️FBCache🔥</a> <br> <br>
58
- <b>♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️</b>
56
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀CogVideoX</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
57
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀CogVideoX1.5</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
58
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Wan2.1</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
59
+ <a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀HunyuanVideo</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
59
60
  </p>
60
61
  </div>
61
62
 
63
+ ## 👋 Highlight
62
64
 
63
- <!--
64
- ## 🎉Supported Models
65
- <div id="supported"></div>
66
- - [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
67
- - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
68
- - [🚀Mochi](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
69
- - [🚀Wan2.1**](https://github.com/vipshop/cache-dit/raw/main/examples): *🔜DBCache, 🔜DBPrune, ✔️FBCache*
70
- -->
65
+ <div id="reference"></div>
71
66
 
67
+ The **CacheDiT** codebase is adapted from [FBCache](https://github.com/chengzeyi/ParaAttention/tree/main/src/para_attn/first_block_cache). Special thanks to their excellent work! The **FBCache** support for Mochi, FLUX.1, CogVideoX, Wan2.1, and HunyuanVideo is directly adapted from the original [FBCache](https://github.com/chengzeyi/ParaAttention/tree/main/src/para_attn/first_block_cache).
72
68
 
73
69
  ## 🤗 Introduction
74
70
 
@@ -110,6 +106,12 @@ These case studies demonstrate that even with relatively high thresholds (such a
110
106
 
111
107
  **DBPrune**: We have further implemented a new **Dynamic Block Prune** algorithm based on **Residual Caching** for Diffusion Transformers, referred to as DBPrune. DBPrune caches each block's hidden states and residuals, then **dynamically prunes** blocks during inference by computing the L1 distance between previous hidden states. When a block is pruned, its output is approximated using the cached residuals.
112
108
 
109
+ <div align="center">
110
+ <p align="center">
111
+ DBPrune, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
112
+ </p>
113
+ </div>
114
+
113
115
  |Baseline(L20x1)|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
114
116
  |:---:|:---:|:---:|:---:|:---:|:---:|
115
117
  |24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
@@ -117,11 +119,11 @@ These case studies demonstrate that even with relatively high thresholds (such a
117
119
 
118
120
  <div align="center">
119
121
  <p align="center">
120
- DBPrune, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
122
+ <h3>🔥 Context Parallelism and Torch Compile</h3>
121
123
  </p>
122
- </div>
124
+ </div>
123
125
 
124
- **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference. Moreover, **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance.
126
+ Moreover, **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference. By the way, CacheDiT is designed to work compatibly with **torch.compile.** You can easily use CacheDiT with torch.compile to further achieve a better performance.
125
127
 
126
128
  <div align="center">
127
129
  <p align="center">
@@ -131,11 +133,16 @@ These case studies demonstrate that even with relatively high thresholds (such a
131
133
 
132
134
  |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
133
135
  |:---:|:---:|:---:|:---:|:---:|:---:|
134
- |+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
135
- |+compile:20.43s|16.25s|14.12s|13.41s|12s|8.86s|
136
+ |+compile:20.43s|16.25s|14.12s|13.41s|12.00s|8.86s|
136
137
  |+L20x4:7.75s|6.62s|6.03s|5.81s|5.24s|3.93s|
137
138
  |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
138
139
 
140
+ <div align="center">
141
+ <p align="center">
142
+ <b>♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️</b>
143
+ </p>
144
+ </div>
145
+
139
146
  ## ©️Citations
140
147
 
141
148
  ```BibTeX
@@ -148,12 +155,6 @@ These case studies demonstrate that even with relatively high thresholds (such a
148
155
  }
149
156
  ```
150
157
 
151
- ## 👋Reference
152
-
153
- <div id="reference"></div>
154
-
155
- The **CacheDiT** codebase was adapted from FBCache's implementation at the [ParaAttention](https://github.com/chengzeyi/ParaAttention/tree/main/src/para_attn/first_block_cache). We would like to express our sincere gratitude for this excellent work!
156
-
157
158
  ## 📖Contents
158
159
 
159
160
  <div id="contents"></div>
@@ -396,26 +397,12 @@ Then, run the python test script with `torchrun`:
396
397
  ```bash
397
398
  torchrun --nproc_per_node=4 parallel_cache.py
398
399
  ```
399
- <!--
400
-
401
- <div align="center">
402
- <p align="center">
403
- DBPrune, <b> L20x4 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
404
- </p>
405
- </div>
406
-
407
- |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
408
- |:---:|:---:|:---:|:---:|:---:|:---:|
409
- |+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
410
- |+L20x4:8.54s|7.20s|6.61s|6.09s|5.54s|4.22s|
411
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=105px>|
412
- -->
413
400
 
414
401
  ## 🔥Torch Compile
415
402
 
416
403
  <div id="compile"></div>
417
404
 
418
- **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance. For example:
405
+ By the way, **CacheDiT** is designed to work compatibly with **torch.compile.** You can easily use CacheDiT with torch.compile to further achieve a better performance. For example:
419
406
 
420
407
  ```python
421
408
  apply_cache_on_pipe(
@@ -430,22 +417,6 @@ torch._dynamo.config.recompile_limit = 96 # default is 8
430
417
  torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
431
418
  ```
432
419
 
433
- <!--
434
-
435
- <div align="center">
436
- <p align="center">
437
- DBPrune + <b>torch.compile</b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
438
- </p>
439
- </div>
440
-
441
- |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
442
- |:---:|:---:|:---:|:---:|:---:|:---:|
443
- |+L20x1:24.8s|19.4s|16.8s|15.9s|14.2s|10.6s|
444
- |+compile:20.4s|16.5s|14.1s|13.4s|12s|8.8s|
445
- |+L20x4:7.7s|6.6s|6.0s|5.8s|5.2s|3.9s|
446
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
447
- -->
448
-
449
420
  ## 👋Contribute
450
421
  <div id="contribute"></div>
451
422
 
@@ -1,5 +1,5 @@
1
1
  cache_dit/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
- cache_dit/_version.py,sha256=AjUi5zEL_BoWoXMXR1FnWc3mD6FHX7snDXjDHVLoens,511
2
+ cache_dit/_version.py,sha256=iB5DfB5V6YB5Wo4JmvS-txT42QtmGaWcWp3udRT7zCI,511
3
3
  cache_dit/logger.py,sha256=dKfNe_RRk9HJwfgHGeRR1f0LbskJpKdGmISCbL9roQs,3443
4
4
  cache_dit/primitives.py,sha256=A2iG9YLot3gOsZSPp-_gyjqjLgJvWQRx8aitD4JQ23Y,3877
5
5
  cache_dit/cache_factory/__init__.py,sha256=5RNuhWakvvqrOV4vkqrEBA7d-V1LwcNSsjtW14mkqK8,5255
@@ -7,25 +7,30 @@ cache_dit/cache_factory/taylorseer.py,sha256=0W29ykJg3MnyLAB2KFicsl11Xe41cDYPgI6
7
7
  cache_dit/cache_factory/utils.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
8
8
  cache_dit/cache_factory/dual_block_cache/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
9
9
  cache_dit/cache_factory/dual_block_cache/cache_context.py,sha256=EJ-uhA2-sWMW1jNDhcBtjHDqSn8lUzfKbYoPfZDQhZU,49665
10
- cache_dit/cache_factory/dual_block_cache/diffusers_adapters/__init__.py,sha256=C6tfXHpdY8YFV3gk74dr_IpYH4bO4ItbPCQYud3NgAM,1667
10
+ cache_dit/cache_factory/dual_block_cache/diffusers_adapters/__init__.py,sha256=ySmO_0IuSm5VrYdi9ccGVHoFVkgtPZMJGq_OMoyl0Q8,2003
11
11
  cache_dit/cache_factory/dual_block_cache/diffusers_adapters/cogvideox.py,sha256=1_n-RFMiL3v2SjhSfFrPH5Mn5Dq9z4BesVK8GN_nh2g,2404
12
12
  cache_dit/cache_factory/dual_block_cache/diffusers_adapters/flux.py,sha256=UbE6nIF-EtA92QxIZVMzIssdZKQSPAVX1hchF9R8drU,2754
13
+ cache_dit/cache_factory/dual_block_cache/diffusers_adapters/hunyuan_video.py,sha256=k3fnlVRTSFkEfWAJ136EMVCeOHPklsArKWFQGMMyLuM,10102
13
14
  cache_dit/cache_factory/dual_block_cache/diffusers_adapters/mochi.py,sha256=qxMu1L3ycT8F-uxpGsmFQBY_BH1vDiGIOXgS_Qbb7dM,2391
15
+ cache_dit/cache_factory/dual_block_cache/diffusers_adapters/wan.py,sha256=M_O91sVaHSJgptv6OY5MhT4vD3gQAUTETEmP1kLdE6c,2713
14
16
  cache_dit/cache_factory/dynamic_block_prune/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
15
17
  cache_dit/cache_factory/dynamic_block_prune/prune_context.py,sha256=YRDwZ_16yjThpgVgDv6YaIB4QCE9nEkE-MOru0jOd50,35026
16
- cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/__init__.py,sha256=8IjJjZOs5XRzsj7Ni2MXpR2Z1PUyRSONIhmfAn1G0eM,1667
18
+ cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/__init__.py,sha256=WNBk4GeekjG2Ln1pAer20TVHpWyGyyHN50DdqWuPhc0,2003
17
19
  cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/cogvideox.py,sha256=ORJpdkXkgziDUo-rpebC6pUemgYaDCoeu0cwwLz175U,2407
18
20
  cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/flux.py,sha256=KbEkLSsHtS6xwLWNh3jlOlXRyGRdrI2pWV1zyQxMTj4,2757
21
+ cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/hunyuan_video.py,sha256=v3SgLJQPbEqSVy2sYVGMhptJqCe-XPXL7LIV7GEacZg,10105
19
22
  cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/mochi.py,sha256=rgeXfww-7WX6URSDg7mF1HuxSmYmoJVjMVoNGuxjwxc,2395
23
+ cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/wan.py,sha256=JCRIq7kt59nbv7t10FdfmFZd1GawUZu2tYQ_QBB8zXQ,2713
20
24
  cache_dit/cache_factory/first_block_cache/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
21
25
  cache_dit/cache_factory/first_block_cache/cache_context.py,sha256=DpDhtK095PlrvACf7sbjOt2-QpVkV1arr1qGEKJqgaQ,23502
22
- cache_dit/cache_factory/first_block_cache/diffusers_adapters/__init__.py,sha256=h80pgoZ3l8qC4rbm9KY0jSN8hOsmGgyvvFxD-xznHdw,1959
26
+ cache_dit/cache_factory/first_block_cache/diffusers_adapters/__init__.py,sha256=-FFgA2MoudEo7uDacg4aWgm1KwfLZFsEDTVxatgbq9M,2146
23
27
  cache_dit/cache_factory/first_block_cache/diffusers_adapters/cogvideox.py,sha256=qO5CWyurtwW30mvOe6cxeQPTSXLDlPJcezm72zEjDq8,2375
24
28
  cache_dit/cache_factory/first_block_cache/diffusers_adapters/flux.py,sha256=Dcd4OzABCtyQCZNX2KNnUTdVoO1E1ApM7P8gcVYzcK0,2733
29
+ cache_dit/cache_factory/first_block_cache/diffusers_adapters/hunyuan_video.py,sha256=OL7W4ukYlZz0IDmBR1zVV6XT3Mgciglj9Hqzv1wUAkQ,10092
25
30
  cache_dit/cache_factory/first_block_cache/diffusers_adapters/mochi.py,sha256=lQTClo52OwPbNEE4jiBZQhfC7hbtYqnYIABp_vbm_dk,2363
26
- cache_dit/cache_factory/first_block_cache/diffusers_adapters/wan.py,sha256=IVH-lroOzvYb4XKLk9MOw54EtijBtuzVaKcVGz0KlBA,2656
27
- cache_dit-0.1.8.dist-info/licenses/LICENSE,sha256=Dqb07Ik2dV41s9nIdMUbiRWEfDqo7-dQeRiY7kPO8PE,3769
28
- cache_dit-0.1.8.dist-info/METADATA,sha256=sAYGKro4VfeE_SHrZA8X0BcHfw9y3YY_Qcj9ONkbemE,23952
29
- cache_dit-0.1.8.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
30
- cache_dit-0.1.8.dist-info/top_level.txt,sha256=ZJDydonLEhujzz0FOkVbO-BqfzO9d_VqRHmZU-3MOZo,10
31
- cache_dit-0.1.8.dist-info/RECORD,,
31
+ cache_dit/cache_factory/first_block_cache/diffusers_adapters/wan.py,sha256=dBNzHBECAuTTA1a7kLdvZL20YzaKTAS3iciVLzKKEWA,2638
32
+ cache_dit-0.2.0.dist-info/licenses/LICENSE,sha256=Dqb07Ik2dV41s9nIdMUbiRWEfDqo7-dQeRiY7kPO8PE,3769
33
+ cache_dit-0.2.0.dist-info/METADATA,sha256=WK3Fu8euIwLlm3TXjJws9VzwNjEfcMNvkCSJRt7jEdo,21845
34
+ cache_dit-0.2.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
35
+ cache_dit-0.2.0.dist-info/top_level.txt,sha256=ZJDydonLEhujzz0FOkVbO-BqfzO9d_VqRHmZU-3MOZo,10
36
+ cache_dit-0.2.0.dist-info/RECORD,,