cache-dit 0.2.19__py3-none-any.whl → 0.2.20__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of cache-dit might be problematic.

cache_dit/_version.py CHANGED
@@ -28,7 +28,7 @@ version_tuple: VERSION_TUPLE
  commit_id: COMMIT_ID
  __commit_id__: COMMIT_ID
  
- __version__ = version = '0.2.19'
- __version_tuple__ = version_tuple = (0, 2, 19)
+ __version__ = version = '0.2.20'
+ __version_tuple__ = version_tuple = (0, 2, 20)
  
  __commit_id__ = commit_id = None
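The only change in `cache_dit/_version.py` is the version bump. A quick way to confirm the installed version after upgrading, using only the Python standard library (no cache-dit API is assumed here):

```python
# Verify the installed cache_dit version matches the new wheel.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("cache_dit"))  # expected: 0.2.20 after upgrading
except PackageNotFoundError:
    print("cache_dit is not installed")
```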
{cache_dit-0.2.19.dist-info → cache_dit-0.2.20.dist-info}/METADATA RENAMED
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: cache_dit
- Version: 0.2.19
+ Version: 0.2.20
  Summary: 🤗 CacheDiT: An Unified and Training-free Cache Acceleration Toolbox for Diffusion Transformers
  Author: DefTruth, vipshop.com, etc.
  Maintainer: DefTruth, vipshop.com, etc
@@ -40,10 +40,12 @@ Dynamic: requires-dist
  Dynamic: requires-python
  
  <div align="center">
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-logo.png height="120">
+ 
  <p align="center">
- <h2>🤗 CacheDiT: An Unified and Training-free Cache Acceleration <br>Toolbox for Diffusion Transformers</h2>
+ An <b>Unified</b> and Training-free <b>Cache Acceleration</b> Toolbox for <b>Diffusion Transformers</b> <br>
+ ♥️ <b>Cache Acceleration</b> with <b>One-line</b> Code ~ ♥️
  </p>
- <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-v1.png >
  <div align='center'>
  <img src=https://img.shields.io/badge/Language-Python-brightgreen.svg >
  <img src=https://img.shields.io/badge/PRs-welcome-9cf.svg >
@@ -52,25 +54,36 @@ Dynamic: requires-python
  <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
  <img src=https://img.shields.io/badge/Release-v0.2-brightgreen.svg >
  </div>
+ <p align="center">
  🔥<b><a href="#unified">Unified Cache APIs</a> | <a href="#dbcache">DBCache</a> | <a href="#taylorseer">Hybrid TaylorSeer</a> | <a href="#cfg">Hybrid Cache CFG</a></b>🔥
- </div>
- 
- <div align="center">
+ </p>
  <p align="center">
- ♥️ Cache <b>Acceleration</b> with <b>One-line</b> Code ~ ♥️
+ 🎉Now, <b>Diffusers's</b> Pipelines <b>Coverage Ratio: 60%~70%</b>🎉
  </p>
- </div>
+ </div>
+ 
+ <!--
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-v1.png >
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-v1.png height="320px">
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-v1.png>
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/patterns.png>
+ -->
  
  ## 🔥News
  
  - [2025-08-19] 🔥[**Qwen-Image-Edit**](https://github.com/QwenLM/Qwen-Image) **2x⚡️** speedup! Check example [run_qwen_image_edit.py](./examples/run_qwen_image_edit.py).
- - [2025-08-18] 🎉Early **[Unified Cache APIs](#unified)** released! Check [Qwen-Image w/ UAPI](./examples/run_qwen_image_uapi.py) as an example.
  - [2025-08-12] 🎉First caching mechanism in [QwenLM/Qwen-Image](https://github.com/QwenLM/Qwen-Image) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check the [PR](https://github.com/QwenLM/Qwen-Image/pull/61).
  - [2025-08-11] 🔥[**Qwen-Image**](https://github.com/QwenLM/Qwen-Image) **1.8x⚡️** speedup! Please refer [run_qwen_image.py](./examples/run_qwen_image.py) as an example.
+ 
+ <details>
+ <summary> Previous News </summary>
+ 
  - [2025-08-10] 🔥[**FLUX.1-Kontext-dev**](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) is supported! Please refer [run_flux_kontext.py](./examples/run_flux_kontext.py) as an example.
  - [2025-07-18] 🎉First caching mechanism in [🤗huggingface/flux-fast](https://github.com/huggingface/flux-fast) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check the [PR](https://github.com/huggingface/flux-fast/pull/13).
  - [2025-07-13] **[🤗flux-faster](https://github.com/xlite-dev/flux-faster)** is released! **3.3x** speedup for FLUX.1 on NVIDIA L20 with **[cache-dit](https://github.com/vipshop/cache-dit)**.
  
+ </details>
+ 
  ## 📖Contents
  
  <div id="contents"></div>
@@ -81,7 +94,7 @@ Dynamic: requires-python
  - [⚡️Dual Block Cache](#dbcache)
  - [🔥Hybrid TaylorSeer](#taylorseer)
  - [⚡️Hybrid Cache CFG](#cfg)
- - [🔥Torch Compile](#compile)
+ - [⚙️Torch Compile](#compile)
  - [🛠Metrics CLI](#metrics)
  
  ## ⚙️Installation
@@ -110,11 +123,15 @@ Currently, **cache-dit** library supports almost **Any** Diffusion Transformers
  - [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀FLUX.1-Kontext-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/raw/main/examples)
+ 
+ <details>
+ <summary> More Pipelines </summary>
+ 
+ - [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀LTXVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀Allegro](https://github.com/vipshop/cache-dit/raw/main/examples)
@@ -124,22 +141,18 @@ Currently, **cache-dit** library supports almost **Any** Diffusion Transformers
  - [🚀EasyAnimate](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀SkyReelsV2](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀SD3](https://github.com/vipshop/cache-dit/raw/main/examples)
+ 
+ </details>
  
  ## 🎉Unified Cache APIs
  
  <div id="unified"></div>
  
- Currently, for any **Diffusion** models with **Transformer Blocks** that match the specific **Input/Output patterns**, we can use the **Unified Cache APIs** from **cache-dit**, namely, the `cache_dit.enable_cache(...)` API. The supported patterns are listed as follows:
+ Currently, for any **Diffusion** models with **Transformer Blocks** that match the specific **Input/Output patterns**, we can use the **Unified Cache APIs** from **cache-dit**, namely, the `cache_dit.enable_cache(...)` API. The **Unified Cache APIs** are currently in the experimental phase; please stay tuned for updates. The supported patterns are listed as follows:
  
- ```python
- (IN: hidden_states, encoder_hidden_states, ...) -> (OUT: hidden_states, encoder_hidden_states)
- (IN: hidden_states, encoder_hidden_states, ...) -> (OUT: encoder_hidden_states, hidden_states)
- (IN: hidden_states, encoder_hidden_states, ...) -> (OUT: hidden_states)
- (IN: hidden_states, ...) -> (OUT: hidden_states) # TODO, DiT, Lumina2, etc.
- ```
- 
- After the `cache_dit.enable_cache(...)` API is called, you just need to call the pipe as normal. The `pipe` param can be **any** Diffusion Pipeline. Please refer to [Qwen-Image](./examples/run_qwen_image_uapi.py) as an example. The **Unified Cache APIs** are currently in the experimental phase; please stay tuned for updates.
+ ![](https://github.com/vipshop/cache-dit/raw/main/assets/patterns.png)
  
+ After the `cache_dit.enable_cache(...)` API is called, you just need to call the pipe as normal. The `pipe` param can be **any** Diffusion Pipeline. Please refer to [Qwen-Image](./examples/run_qwen_image_uapi.py) as an example.
  ```python
  import cache_dit
  from diffusers import DiffusionPipeline
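The README's example snippet is cut off at this hunk boundary. A minimal sketch of the full flow it describes, assuming a Diffusers pipeline; the checkpoint ID and prompt below are illustrative, not taken from the package:

```python
import torch
from diffusers import DiffusionPipeline

import cache_dit

# Illustrative checkpoint (assumption): any Diffusers pipeline whose
# transformer blocks match a supported input/output pattern should work.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

# One-line cache acceleration via the unified API.
cache_dit.enable_cache(pipe)

# Call the pipeline as normal after enabling the cache.
image = pipe("A cat holding a sign that says hello world").images[0]
image.save("qwen_image_cached.png")
```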
@@ -180,47 +193,13 @@ After finishing each inference of `pipe(...)`, you can call the `cache_dit.summa
  
  <div id="dbcache"></div>
  
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-v1.png)
- 
- 
- **DBCache**: **Dual Block Caching** for Diffusion Transformers. We have enhanced `FBCache` into a more general and customizable cache algorithm, namely `DBCache`, enabling it to achieve fully `UNet-style` cache acceleration for DiT models. Different configurations of compute blocks (**F8B12**, etc.) can be customized in DBCache. Moreover, it can be entirely **training**-**free**. DBCache can strike a perfect **balance** between performance and precision!
- 
- <div align="center">
- <p align="center">
- DBCache, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
- </p>
- </div>
- 
- |Baseline(L20x1)|F1B0 (0.08)|F1B0 (0.20)|F8B8 (0.15)|F12B12 (0.20)|F16B16 (0.20)|
- |:---:|:---:|:---:|:---:|:---:|:---:|
- |24.85s|15.59s|8.58s|15.41s|15.11s|17.74s|
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.08_S11.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.2_S19.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F8B8S1_R0.15_S15.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F12B12S4_R0.2_S16.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F16B16S4_R0.2_S13.png width=105px>|
- |**Baseline(L20x1)**|**F1B0 (0.08)**|**F8B8 (0.12)**|**F8B12 (0.12)**|**F8B16 (0.20)**|**F8B20 (0.20)**|
- |27.85s|6.04s|5.88s|5.77s|6.01s|6.20s|
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/TEXTURE_NONE_R0.08.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/TEXTURE_DBCACHE_F1B0_R0.08.png width=105px> |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/TEXTURE_DBCACHE_F8B8_R0.12.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/TEXTURE_DBCACHE_F8B12_R0.12.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/TEXTURE_DBCACHE_F8B16_R0.2.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/TEXTURE_DBCACHE_F8B20_R0.2.png width=105px>|
- 
- <div align="center">
- <p align="center">
- DBCache, <b> L20x4 </b>, Steps: 20, case to show the texture recovery ability of DBCache
- </p>
- </div>
- 
- These case studies demonstrate that even with relatively high thresholds (such as 0.12, 0.15, 0.2, etc.) under the DBCache **F12B12** or **F8B16** configuration, the detailed texture of the kitten's fur, colored cloth, and the clarity of text can still be preserved. This suggests that users can leverage DBCache to effectively balance performance and precision in their workflows!
- 
+ ![](https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-fnbn-v1.png)
  
- **DBCache** provides configurable parameters for custom optimization, enabling a balanced trade-off between performance and precision:
+ **DBCache**: **Dual Block Caching** for Diffusion Transformers. Different configurations of compute blocks (**F8B12**, etc.) can be customized in DBCache, enabling a balanced trade-off between performance and precision. Moreover, it can be entirely **training**-**free**. Please check [DBCache.md](./docs/DBCache.md) docs for more design details.
  
  - **Fn**: Specifies that DBCache uses the **first n** Transformer blocks to fit the information at time step t, enabling the calculation of a more stable L1 diff and delivering more accurate information to subsequent blocks.
  - **Bn**: Further fuses approximate information in the **last n** Transformer blocks to enhance prediction accuracy. These blocks act as an auto-scaler for approximate hidden states that use residual cache.
  
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-fnbn-v1.png)
- 
- - **warmup_steps**: (default: 0) DBCache does not apply the caching strategy when the number of running steps is less than or equal to this value, ensuring the model sufficiently learns basic features during warmup.
- - **max_cached_steps**: (default: -1) DBCache disables the caching strategy when the previous cached steps exceed this value to prevent precision degradation.
- - **residual_diff_threshold**: The value of residual diff threshold, a higher value leads to faster performance at the cost of lower precision.
- 
- For a good balance between performance and precision, DBCache is configured by default with **F8B0**, 8 warmup steps, and unlimited cached steps.
- 
  ```python
  import cache_dit
  from diffusers import FluxPipeline
@@ -230,7 +209,8 @@ pipe = FluxPipeline.from_pretrained(
      torch_dtype=torch.bfloat16,
  ).to("cuda")
  
- # Default options, F8B0, good balance between performance and precision
+ # Default options, F8B0, 8 warmup steps, and unlimited cached
+ # steps for good balance between performance and precision
  cache_options = cache_dit.default_options()
  
  # Custom options, F8B8, higher precision
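The custom-options example is likewise truncated at this hunk boundary. A hedged sketch of what a custom DBCache configuration can look like; the option keys below follow the parameter names documented in the README text removed by this release (`warmup_steps`, `max_cached_steps`, `residual_diff_threshold`) and may not match the exact schema `enable_cache` accepts in 0.2.20:

```python
import torch
from diffusers import FluxPipeline

import cache_dit

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # the README's FLUX.1-dev example
    torch_dtype=torch.bfloat16,
).to("cuda")

# Start from the defaults (F8B0, 8 warmup steps, unlimited cached steps)
# and override selected knobs. Key names are assumptions taken from the
# removed README text, not a confirmed 0.2.20 schema.
cache_options = {
    **cache_dit.default_options(),
    "warmup_steps": 8,                # no caching while steps <= 8
    "max_cached_steps": -1,           # -1 disables the cap
    "residual_diff_threshold": 0.12,  # higher = faster, less precise
}

cache_dit.enable_cache(pipe, **cache_options)
```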
@@ -259,6 +239,17 @@ cache_options = {
  }
  ```
  
+ <div align="center">
+ <p align="center">
+ DBCache, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
+ </p>
+ </div>
+ 
+ |Baseline(L20x1)|F1B0 (0.08)|F1B0 (0.20)|F8B8 (0.15)|F12B12 (0.20)|F16B16 (0.20)|
+ |:---:|:---:|:---:|:---:|:---:|:---:|
+ |24.85s|15.59s|8.58s|15.41s|15.11s|17.74s|
+ |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.08_S11.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.2_S19.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F8B8S1_R0.15_S15.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F12B12S4_R0.2_S16.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F16B16S4_R0.2_S13.png width=105px>|
+ 
  ## 🔥Hybrid TaylorSeer
  
  <div id="taylorseer"></div>
@@ -326,16 +317,15 @@ cache_options = {
  }
  ```
  
- ## 🔥Torch Compile
+ ## ⚙️Torch Compile
  
  <div id="compile"></div>
  
  By the way, **cache-dit** is designed to work compatibly with **torch.compile.** You can easily use cache-dit with torch.compile to further achieve a better performance. For example:
  
  ```python
- cache_dit.enable_cache(
-     pipe, **cache_dit.default_options()
- )
+ cache_dit.enable_cache(pipe)
+ 
  # Compile the Transformer module
  pipe.transformer = torch.compile(pipe.transformer)
  ```
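Put end to end, the compile path the README describes looks like this; a sketch assuming the FLUX.1-dev checkpoint from the README's examples:

```python
import torch
from diffusers import FluxPipeline

import cache_dit

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")

# Follow the README's ordering: enable the cache first, then compile
# only the transformer module.
cache_dit.enable_cache(pipe)
pipe.transformer = torch.compile(pipe.transformer)

image = pipe("A cat holding a sign that says hello world").images[0]
```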
{cache_dit-0.2.19.dist-info → cache_dit-0.2.20.dist-info}/RECORD RENAMED
@@ -1,5 +1,5 @@
  cache_dit/__init__.py,sha256=TvZI861ipGnYaOEHJA0Og-ksRUGNCld-PGy_NgjcKZE,641
- cache_dit/_version.py,sha256=32XF9c5EeiOUdyiWeKcwkXTWZQBgtvbmKx8wZoDEW0o,706
+ cache_dit/_version.py,sha256=4N3ayuoZZJYPEGMvrxu7tnGigRTxbAdCyp5a8y7c6aw,706
  cache_dit/logger.py,sha256=0zsu42hN-3-rgGC_C29ms1IvVpV4_b4_SwJCKSenxBE,4304
  cache_dit/primitives.py,sha256=A2iG9YLot3gOsZSPp-_gyjqjLgJvWQRx8aitD4JQ23Y,3877
  cache_dit/utils.py,sha256=yybhUTGPfeCoIVZzpoefZ2ypvH8de-10UhPls81ceG4,4800
@@ -22,9 +22,9 @@ cache_dit/metrics/fid.py,sha256=9Ivtazl6mW0Bon2VXa-Ia5Xj2ewxRD3V1Qkd69zYM3Y,1706
  cache_dit/metrics/inception.py,sha256=pBVe2X6ylLPIXTG4-GWDM9DWnCviMJbJ45R3ulhktR0,12759
  cache_dit/metrics/lpips.py,sha256=I2qCNi6qJh5TRsaIsdxO0WoRX1DN7U_H3zS0oCSahYM,1032
  cache_dit/metrics/metrics.py,sha256=8jvM1sF-nDxUuwCRy44QEoo4dYVLCQVh1QyAMs4eaQY,27840
- cache_dit-0.2.19.dist-info/licenses/LICENSE,sha256=Dqb07Ik2dV41s9nIdMUbiRWEfDqo7-dQeRiY7kPO8PE,3769
- cache_dit-0.2.19.dist-info/METADATA,sha256=cCnv_b_F06xdttqdHhnbmPDpF_xRgz-O03tYfvzGGrI,20910
- cache_dit-0.2.19.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- cache_dit-0.2.19.dist-info/entry_points.txt,sha256=FX2gysXaZx6NeK1iCLMcIdP8Q4_qikkIHtEmi3oWn8o,65
- cache_dit-0.2.19.dist-info/top_level.txt,sha256=ZJDydonLEhujzz0FOkVbO-BqfzO9d_VqRHmZU-3MOZo,10
- cache_dit-0.2.19.dist-info/RECORD,,
+ cache_dit-0.2.20.dist-info/licenses/LICENSE,sha256=Dqb07Ik2dV41s9nIdMUbiRWEfDqo7-dQeRiY7kPO8PE,3769
+ cache_dit-0.2.20.dist-info/METADATA,sha256=rq7U5fZNeRMn6CvKRyuh1JcovMSwbDWGt1v6LzfaBUE,18752
+ cache_dit-0.2.20.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ cache_dit-0.2.20.dist-info/entry_points.txt,sha256=FX2gysXaZx6NeK1iCLMcIdP8Q4_qikkIHtEmi3oWn8o,65
+ cache_dit-0.2.20.dist-info/top_level.txt,sha256=ZJDydonLEhujzz0FOkVbO-BqfzO9d_VqRHmZU-3MOZo,10
+ cache_dit-0.2.20.dist-info/RECORD,,