cache-dit 0.1.7__py3-none-any.whl → 0.1.8__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of cache-dit might be problematic.
- cache_dit/_version.py +2 -2
- cache_dit/cache_factory/dynamic_block_prune/prune_context.py +2 -2
- {cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/METADATA +55 -21
- {cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/RECORD +7 -7
- {cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/WHEEL +0 -0
- {cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/licenses/LICENSE +0 -0
- {cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/top_level.txt +0 -0
cache_dit/_version.py CHANGED

cache_dit/cache_factory/dynamic_block_prune/prune_context.py CHANGED

@@ -628,7 +628,7 @@ class DBPrunedTransformerBlocks(torch.nn.Module):
         return sorted(non_prune_blocks_ids)
 
     # @torch.compile(dynamic=True)
-    # mark this function as compile with dynamic=True will
+    # mark this function as compile with dynamic=True will
     # cause precision degradate, so, we choose to disable it
     # now, until we find a better solution or fixed the bug.
     @torch.compiler.disable
@@ -668,7 +668,7 @@ class DBPrunedTransformerBlocks(torch.nn.Module):
         )
 
     # @torch.compile(dynamic=True)
-    # mark this function as compile with dynamic=True will
+    # mark this function as compile with dynamic=True will
     # cause precision degradate, so, we choose to disable it
     # now, until we find a better solution or fixed the bug.
     @torch.compiler.disable
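The two hunks above only touch comments around `@torch.compiler.disable`. For readers skimming the diff, here is a minimal sketch of the pattern those comments describe: compiling the helper with `torch.compile(dynamic=True)` degraded precision, so the method is excluded from compilation instead. The class and method names below are hypothetical stand-ins, not the package's actual `DBPrunedTransformerBlocks` code.

```python
import torch

class PrunedBlocksSketch(torch.nn.Module):
    """Hypothetical stand-in for illustration; not the real DBPrunedTransformerBlocks."""

    # @torch.compile(dynamic=True)  # dynamic compilation of this helper hurt precision,
    @torch.compiler.disable          # so it is opted out of torch.compile graphs instead
    def _non_prune_block_ids(self, scores: torch.Tensor, keep_ratio: float) -> list:
        # Toy selection logic: keep the highest-scoring blocks and return sorted ids.
        num_keep = max(1, int(scores.numel() * keep_ratio))
        ids = torch.topk(scores, num_keep).indices
        return sorted(ids.tolist())
```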
{cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/METADATA CHANGED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cache_dit
-Version: 0.1.7
+Version: 0.1.8
 Summary: 🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers
 Author: DefTruth, vipshop.com, etc.
 Maintainer: DefTruth, vipshop.com, etc
@@ -35,7 +35,7 @@ Dynamic: requires-python
 
 <div align="center">
 <p align="center">
-<
+<h2>🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration <br>Toolbox for Diffusion Transformers</h2>
 </p>
 <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit.png >
 <div align='center'>
@@ -44,13 +44,32 @@ Dynamic: requires-python
 <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
 <img src=https://static.pepy.tech/badge/cache-dit >
 <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
-<img src=https://img.shields.io/badge/Release-v0.1.7-brightgreen.svg >
+<img src=https://img.shields.io/badge/Release-v0.1.8-brightgreen.svg >
 </div>
 <p align="center">
 DeepCache is for UNet not DiT. Most DiT cache speedups are complex and not training-free. CacheDiT <br>offers a set of training-free cache accelerators for DiT: 🔥DBCache, DBPrune, FBCache, etc🔥
 </p>
+<p align="center">
+<h3> 🔥Supported Models🔥</h2>
+<a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀FLUX.1</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+<a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀CogVideoX</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+<a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Mochi</b>: ✔️DBCache, ✔️DBPrune, ✔️FBCache🔥</a> <br>
+<a href=https://github.com/vipshop/cache-dit/raw/main/examples> <b>🚀Wan2.1</b>: 🔜DBCache, 🔜DBPrune, ✔️FBCache🔥</a> <br> <br>
+<b>♥️ Please consider to leave a ⭐️ Star to support us ~ ♥️</b>
+</p>
 </div>
 
+
+<!--
+## 🎉Supported Models
+<div id="supported"></div>
+- [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+- [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+- [🚀Mochi](https://github.com/vipshop/cache-dit/raw/main/examples): *✔️DBCache, ✔️DBPrune, ✔️FBCache*
+- [🚀Wan2.1**](https://github.com/vipshop/cache-dit/raw/main/examples): *🔜DBCache, 🔜DBPrune, ✔️FBCache*
+-->
+
+
 ## 🤗 Introduction
 
 <div align="center">
@@ -102,11 +121,20 @@ These case studies demonstrate that even with relatively high thresholds (such a
 </p>
 </div>
 
-
+**CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can easily tap into its **Context Parallelism** features for distributed inference. Moreover, **CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance.
+
+<div align="center">
+<p align="center">
+DBPrune + <b>torch.compile + context parallelism</b> <br>Steps: 28, "A cat holding a sign that says hello world with complex background"
+</p>
+</div>
 
-
-
-
+|Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+|:---:|:---:|:---:|:---:|:---:|:---:|
+|+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
+|+compile:20.43s|16.25s|14.12s|13.41s|12s|8.86s|
+|+L20x4:7.75s|6.62s|6.03s|5.81s|5.24s|3.93s|
+|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
 
 ## ©️Citations
 
@@ -136,11 +164,9 @@ The **CacheDiT** codebase was adapted from FBCache's implementation at the [Para
 - [⚡️Dynamic Block Prune](#dbprune)
 - [🎉Context Parallelism](#context-parallelism)
 - [🔥Torch Compile](#compile)
-- [🎉Supported Models](#supported)
 - [👋Contribute](#contribute)
 - [©️License](#license)
 
-
 ## ⚙️Installation
 
 <div id="installation"></div>
@@ -370,6 +396,7 @@ Then, run the python test script with `torchrun`:
 ```bash
 torchrun --nproc_per_node=4 parallel_cache.py
 ```
+<!--
 
 <div align="center">
 <p align="center">
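The `torchrun --nproc_per_node=4 parallel_cache.py` command shown in this hunk launches one process per GPU. As a rough sketch of what such a launcher-driven script typically does before enabling context parallelism and the cache: the file name, structure, and everything below are assumptions for illustration, not the repository's actual `parallel_cache.py`.

```python
# Hypothetical outline of a torchrun-launched script (not the repo's parallel_cache.py).
import os
import torch
import torch.distributed as dist

def main() -> None:
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK; init the default process group.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # ... build the diffusion pipeline on this device, enable Context Parallelism
    # (e.g. via ParaAttention) and apply the CacheDiT cache here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```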
@@ -377,17 +404,18 @@ torchrun --nproc_per_node=4 parallel_cache.py
 </p>
 </div>
 
-|Baseline
+|Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
 |:---:|:---:|:---:|:---:|:---:|:---:|
-
-
+|+L20x1:24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
+|+L20x4:8.54s|7.20s|6.61s|6.09s|5.54s|4.22s|
 |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=105px>|
+-->
 
 ## 🔥Torch Compile
 
 <div id="compile"></div>
 
-**CacheDiT** are designed to work compatibly with `torch.compile`. For example:
+**CacheDiT** are designed to work compatibly with `torch.compile`. You can easily use CacheDiT with torch.compile to further achieve a better performance. For example:
 
 ```python
 apply_cache_on_pipe(
@@ -396,21 +424,27 @@ apply_cache_on_pipe(
 # Compile the Transformer module
 pipe.transformer = torch.compile(pipe.transformer)
 ```
-However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo
-
+However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
 ```python
 torch._dynamo.config.recompile_limit = 96 # default is 8
 torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
 ```
-Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
 
-
+<!--
 
-<div
+<div align="center">
+<p align="center">
+DBPrune + <b>torch.compile</b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
+</p>
+</div>
 
-
-
-
+|Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
+|:---:|:---:|:---:|:---:|:---:|:---:|
+|+L20x1:24.8s|19.4s|16.8s|15.9s|14.2s|10.6s|
+|+compile:20.4s|16.5s|14.1s|13.4s|12s|8.8s|
+|+L20x4:7.7s|6.6s|6.0s|5.8s|5.2s|3.9s|
+|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
+-->
 
 ## 👋Contribute
 <div id="contribute"></div>
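Pieced together, the compile-related snippets in the updated README amount to the flow below. This is a sketch under stated assumptions: the `FluxPipeline` model id and the commented-out `apply_cache_on_pipe` import path are illustrative guesses; only the `torch._dynamo` settings and the `torch.compile(pipe.transformer)` call are taken from the diff itself.

```python
import torch
from diffusers import FluxPipeline  # assumption: FLUX.1 is listed among the supported models

# Raise Dynamo's recompile budget first when input shapes vary between calls;
# otherwise hitting the recompile limit silently falls back to eager mode.
torch._dynamo.config.recompile_limit = 96                 # default is 8
torch._dynamo.config.accumulated_recompile_limit = 2048   # default is 256

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative model id
    torch_dtype=torch.bfloat16,
).to("cuda")

# from cache_dit.cache_factory import apply_cache_on_pipe  # import path not shown in this diff
# apply_cache_on_pipe(pipe, ...)  # enable DBCache / DBPrune / FBCache as the README describes

# Compile only the Transformer module, as in the README snippet above.
pipe.transformer = torch.compile(pipe.transformer)
```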
{cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/RECORD CHANGED

@@ -1,5 +1,5 @@
 cache_dit/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-cache_dit/_version.py,sha256=
+cache_dit/_version.py,sha256=AjUi5zEL_BoWoXMXR1FnWc3mD6FHX7snDXjDHVLoens,511
 cache_dit/logger.py,sha256=dKfNe_RRk9HJwfgHGeRR1f0LbskJpKdGmISCbL9roQs,3443
 cache_dit/primitives.py,sha256=A2iG9YLot3gOsZSPp-_gyjqjLgJvWQRx8aitD4JQ23Y,3877
 cache_dit/cache_factory/__init__.py,sha256=5RNuhWakvvqrOV4vkqrEBA7d-V1LwcNSsjtW14mkqK8,5255
@@ -12,7 +12,7 @@ cache_dit/cache_factory/dual_block_cache/diffusers_adapters/cogvideox.py,sha256=
 cache_dit/cache_factory/dual_block_cache/diffusers_adapters/flux.py,sha256=UbE6nIF-EtA92QxIZVMzIssdZKQSPAVX1hchF9R8drU,2754
 cache_dit/cache_factory/dual_block_cache/diffusers_adapters/mochi.py,sha256=qxMu1L3ycT8F-uxpGsmFQBY_BH1vDiGIOXgS_Qbb7dM,2391
 cache_dit/cache_factory/dynamic_block_prune/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-cache_dit/cache_factory/dynamic_block_prune/prune_context.py,sha256=
+cache_dit/cache_factory/dynamic_block_prune/prune_context.py,sha256=YRDwZ_16yjThpgVgDv6YaIB4QCE9nEkE-MOru0jOd50,35026
 cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/__init__.py,sha256=8IjJjZOs5XRzsj7Ni2MXpR2Z1PUyRSONIhmfAn1G0eM,1667
 cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/cogvideox.py,sha256=ORJpdkXkgziDUo-rpebC6pUemgYaDCoeu0cwwLz175U,2407
 cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/flux.py,sha256=KbEkLSsHtS6xwLWNh3jlOlXRyGRdrI2pWV1zyQxMTj4,2757
@@ -24,8 +24,8 @@ cache_dit/cache_factory/first_block_cache/diffusers_adapters/cogvideox.py,sha256
 cache_dit/cache_factory/first_block_cache/diffusers_adapters/flux.py,sha256=Dcd4OzABCtyQCZNX2KNnUTdVoO1E1ApM7P8gcVYzcK0,2733
 cache_dit/cache_factory/first_block_cache/diffusers_adapters/mochi.py,sha256=lQTClo52OwPbNEE4jiBZQhfC7hbtYqnYIABp_vbm_dk,2363
 cache_dit/cache_factory/first_block_cache/diffusers_adapters/wan.py,sha256=IVH-lroOzvYb4XKLk9MOw54EtijBtuzVaKcVGz0KlBA,2656
-cache_dit-0.1.
-cache_dit-0.1.
-cache_dit-0.1.
-cache_dit-0.1.
-cache_dit-0.1.
+cache_dit-0.1.8.dist-info/licenses/LICENSE,sha256=Dqb07Ik2dV41s9nIdMUbiRWEfDqo7-dQeRiY7kPO8PE,3769
+cache_dit-0.1.8.dist-info/METADATA,sha256=sAYGKro4VfeE_SHrZA8X0BcHfw9y3YY_Qcj9ONkbemE,23952
+cache_dit-0.1.8.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+cache_dit-0.1.8.dist-info/top_level.txt,sha256=ZJDydonLEhujzz0FOkVbO-BqfzO9d_VqRHmZU-3MOZo,10
+cache_dit-0.1.8.dist-info/RECORD,,
{cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/WHEEL: file without changes
{cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/licenses/LICENSE: file without changes
{cache_dit-0.1.7.dist-info → cache_dit-0.1.8.dist-info}/top_level.txt: file without changes