cache-dit 0.3.1__py3-none-any.whl → 0.3.3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
- cache_dit/__init__.py +1 -0
- cache_dit/_version.py +2 -2
- cache_dit/cache_factory/__init__.py +3 -6
- cache_dit/cache_factory/block_adapters/block_adapters.py +21 -64
- cache_dit/cache_factory/cache_adapters/__init__.py +0 -1
- cache_dit/cache_factory/cache_adapters/cache_adapter.py +82 -21
- cache_dit/cache_factory/cache_blocks/__init__.py +4 -0
- cache_dit/cache_factory/cache_blocks/offload_utils.py +115 -0
- cache_dit/cache_factory/cache_blocks/pattern_base.py +3 -0
- cache_dit/cache_factory/cache_contexts/__init__.py +10 -8
- cache_dit/cache_factory/cache_contexts/cache_context.py +186 -117
- cache_dit/cache_factory/cache_contexts/cache_manager.py +63 -131
- cache_dit/cache_factory/cache_contexts/calibrators/__init__.py +132 -0
- cache_dit/cache_factory/cache_contexts/{v2/calibrators → calibrators}/foca.py +1 -1
- cache_dit/cache_factory/cache_contexts/{v2/calibrators → calibrators}/taylorseer.py +7 -2
- cache_dit/cache_factory/cache_interface.py +128 -111
- cache_dit/cache_factory/params_modifier.py +87 -0
- cache_dit/metrics/__init__.py +3 -1
- cache_dit/utils.py +12 -21
- {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/METADATA +200 -434
- {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/RECORD +27 -31
- cache_dit/cache_factory/cache_adapters/v2/__init__.py +0 -3
- cache_dit/cache_factory/cache_adapters/v2/cache_adapter_v2.py +0 -524
- cache_dit/cache_factory/cache_contexts/taylorseer.py +0 -102
- cache_dit/cache_factory/cache_contexts/v2/__init__.py +0 -13
- cache_dit/cache_factory/cache_contexts/v2/cache_context_v2.py +0 -288
- cache_dit/cache_factory/cache_contexts/v2/cache_manager_v2.py +0 -799
- cache_dit/cache_factory/cache_contexts/v2/calibrators/__init__.py +0 -81
- /cache_dit/cache_factory/cache_blocks/{utils.py → pattern_utils.py} +0 -0
- /cache_dit/cache_factory/cache_contexts/{v2/calibrators → calibrators}/base.py +0 -0
- {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/WHEEL +0 -0
- {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/entry_points.txt +0 -0
- {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/licenses/LICENSE +0 -0
- {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/top_level.txt +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: cache_dit
|
|
3
|
-
Version: 0.3.1
|
|
3
|
+
Version: 0.3.3
|
|
4
4
|
Summary: A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗Diffusers.
|
|
5
5
|
Author: DefTruth, vipshop.com, etc.
|
|
6
6
|
Maintainer: DefTruth, vipshop.com, etc
|
|
@@ -45,6 +45,8 @@ Dynamic: provides-extra
|
|
|
45
45
|
Dynamic: requires-dist
|
|
46
46
|
Dynamic: requires-python
|
|
47
47
|
|
|
48
|
+
<a href="./README.md">📚English</a> | <a href="./README_CN.md">📚中文阅读</a>
|
|
49
|
+
|
|
48
50
|
<div align="center">
|
|
49
51
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-logo.png height="120">
|
|
50
52
|
|
|
@@ -57,12 +59,12 @@ Dynamic: requires-python
|
|
|
57
59
|
<img src=https://img.shields.io/badge/PRs-welcome-9cf.svg >
|
|
58
60
|
<img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
|
|
59
61
|
<img src=https://static.pepy.tech/badge/cache-dit >
|
|
62
|
+
<img src=https://img.shields.io/github/stars/vipshop/cache-dit.svg?style=dark >
|
|
60
63
|
<img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
|
|
61
|
-
<img src=https://img.shields.io/badge/Release-v0.3-brightgreen.svg >
|
|
62
64
|
</div>
|
|
63
65
|
<p align="center">
|
|
64
|
-
<b><a href="#unified">📚Unified Cache APIs</a></b> | <a href="#forward-pattern-matching">📚Forward Pattern Matching</a> | <a href="
|
|
65
|
-
<a href="
|
|
66
|
+
<b><a href="#unified">📚Unified Cache APIs</a></b> | <a href="#forward-pattern-matching">📚Forward Pattern Matching</a> | <a href="./docs/User_Guide.md">📚Automatic Block Adapter</a><br>
|
|
67
|
+
<a href="./docs/User_Guide.md">📚Hybrid Forward Pattern</a> | <a href="#dbcache">📚DBCache</a> | <a href="./docs/User_Guide.md">📚TaylorSeer Calibrator</a> | <a href="./docs/User_Guide.md">📚Cache CFG</a><br>
|
|
66
68
|
<a href="#benchmarks">📚Text2Image DrawBench</a> | <a href="#benchmarks">📚Text2Image Distillation DrawBench</a>
|
|
67
69
|
</p>
|
|
68
70
|
<p align="center">
|
|
@@ -74,6 +76,8 @@ Dynamic: requires-python
|
|
|
74
76
|
🔥<a href="#supported">Chroma</a> | <a href="#supported">Sana</a> | <a href="#supported">Allegro</a> | <a href="#supported">Mochi</a> | <a href="#supported">SD 3/3.5</a> | <a href="#supported">Amused</a> | <a href="#supported"> ... </a> | <a href="#supported">DiT-XL</a>🔥
|
|
75
77
|
</p>
|
|
76
78
|
</div>
|
|
79
|
+
|
|
80
|
+
|
|
77
81
|
<div align='center'>
|
|
78
82
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/gifs/wan2.2.C0_Q0_NONE.gif width=124px>
|
|
79
83
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/gifs/wan2.2.C1_Q0_DBCACHE_F1B0_W2M8MC2_T1O2_R0.08.gif width=124px>
|
|
@@ -85,12 +89,6 @@ Dynamic: requires-python
|
|
|
85
89
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux.C0_Q0_NONE_T23.69s.png width=90px>
|
|
86
90
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux.C0_Q0_DBCACHE_F1B0_W4M0MC0_T1O2_R0.15_S16_T11.39s.png width=90px>
|
|
87
91
|
<p><b>🔥Qwen-Image</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.8x↑🎉 | <b>FLUX.1-dev</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:2.1x↑🎉</p>
|
|
88
|
-
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext-cat.C0_L0_Q0_NONE.png width=100px>
|
|
89
|
-
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_NONE.png width=100px>
|
|
90
|
-
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F8B0_W8M0MC0_T0O2_R0.08_S10.png width=100px>
|
|
91
|
-
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W8M0MC2_T0O2_R0.12_S12.png width=100px>
|
|
92
|
-
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W2M0MC2_T0O2_R0.15_S15.png width=100px>
|
|
93
|
-
<p><b>🔥FLUX-Kontext-dev</b> | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉</p>
|
|
94
92
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-lightning.4steps.C0_L1_Q0_NONE.png width=160px>
|
|
95
93
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-lightning.4steps.C0_L1_Q0_DBCACHE_F16B16_W2M1MC1_T0O2_R0.9_S1.png width=160px>
|
|
96
94
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/hunyuan-image-2.1.C0_L0_Q1_fp8_w8a16_wo_NONE.png width=90px>
|
|
@@ -100,7 +98,22 @@ Dynamic: requires-python
|
|
|
100
98
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-edit.C0_L0_Q0_NONE.png width=125px>
|
|
101
99
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-edit.C0_L0_Q0_DBCACHE_F8B0_W8M0MC0_T0O2_R0.08_S18.png width=125px>
|
|
102
100
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-edit.C0_L0_Q0_DBCACHE_F1B0_W8M0MC2_T0O2_R0.12_S24.png width=125px>
|
|
103
|
-
<p><b>🔥Qwen-Image-Edit</b> | Input w/o Edit | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.6x↑🎉 | 1.9x↑🎉
|
|
101
|
+
<p><b>🔥Qwen-Image-Edit</b> | Input w/o Edit | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.6x↑🎉 | 1.9x↑🎉
|
|
102
|
+
<br>♥️ Please consider leaving a <b>⭐️ Star</b> to support us ~ ♥️
|
|
103
|
+
</p>
|
|
104
|
+
</div>
|
|
105
|
+
|
|
106
|
+
<details align='center'>
|
|
107
|
+
|
|
108
|
+
<summary>Click here to show more Image/Video cases</summary>
|
|
109
|
+
|
|
110
|
+
<div align='center'>
|
|
111
|
+
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext-cat.C0_L0_Q0_NONE.png width=100px>
|
|
112
|
+
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_NONE.png width=100px>
|
|
113
|
+
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F8B0_W8M0MC0_T0O2_R0.08_S10.png width=100px>
|
|
114
|
+
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W8M0MC2_T0O2_R0.12_S12.png width=100px>
|
|
115
|
+
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W2M0MC2_T0O2_R0.15_S15.png width=100px>
|
|
116
|
+
<p><b>🔥FLUX-Kontext-dev</b> | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉</p>
|
|
104
117
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/hidream.C0_L0_Q0_NONE.png width=100px>
|
|
105
118
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/hidream.C0_L0_Q0_DBCACHE_F1B0_W8M0MC0_T0O2_R0.08_S24.png width=100px>
|
|
106
119
|
<img src=https://github.com/vipshop/cache-dit/raw/main/assets/cogview4.C0_L0_Q0_NONE.png width=100px>
|
|
@@ -160,24 +173,25 @@ Dynamic: requires-python
|
|
|
160
173
|
<p><b>🔥Amused</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.1x↑🎉 | 1.2x↑🎉 | <b>DiT-XL-256</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.8x↑🎉
|
|
161
174
|
<br>♥️ Please consider leaving a <b>⭐️ Star</b> to support us ~ ♥️</p>
|
|
162
175
|
</div>
|
|
176
|
+
</details>
|
|
163
177
|
|
|
164
178
|
## 🔥News
|
|
165
179
|
|
|
166
|
-
- [2025-09-10] 🎉Day 1 support [**HunyuanImage-2.1**](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) with **1.7x↑🎉** speedup! Check this [example](
|
|
167
|
-
- [2025-09-08] 🔥[**Qwen-Image-Lightning**](
|
|
168
|
-
- [2025-09-03] 🎉[**Wan2.2-MoE**](https://github.com/Wan-Video) **2.4x↑🎉** speedup! Please refer to [run_wan_2.2.py](
|
|
169
|
-
- [2025-08-19] 🔥[**Qwen-Image-Edit**](https://github.com/QwenLM/Qwen-Image) **2x↑🎉** speedup! Check the example: [run_qwen_image_edit.py](
|
|
170
|
-
- [2025-08-11] 🔥[**Qwen-Image**](https://github.com/QwenLM/Qwen-Image) **1.8x↑🎉** speedup! Please refer to [run_qwen_image.py](
|
|
171
|
-
- [2025-07-13] 🎉[**FLUX.1-dev**](https://github.com/xlite-dev/flux-faster) **3.3x↑🎉** speedup! NVIDIA L20 with **[cache-dit](https://github.com/vipshop/cache-dit)** + **compile + FP8 DQ**.
|
|
180
|
+
- [2025-09-10] 🎉Day 1 support [**HunyuanImage-2.1**](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) with **1.7x↑🎉** speedup! Check this [example](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_hunyuan_image_2.1.py).
|
|
181
|
+
- [2025-09-08] 🔥[**Qwen-Image-Lightning**](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image_lightning.py) **7.1/3.5 steps🎉** inference with **[DBCache: F16B16](https://github.com/vipshop/cache-dit)**.
|
|
182
|
+
- [2025-09-03] 🎉[**Wan2.2-MoE**](https://github.com/Wan-Video) **2.4x↑🎉** speedup! Please refer to [run_wan_2.2.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_wan_2.2.py) as an example.
|
|
183
|
+
- [2025-08-19] 🔥[**Qwen-Image-Edit**](https://github.com/QwenLM/Qwen-Image) **2x↑🎉** speedup! Check the example: [run_qwen_image_edit.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image_edit.py).
|
|
184
|
+
- [2025-08-11] 🔥[**Qwen-Image**](https://github.com/QwenLM/Qwen-Image) **1.8x↑🎉** speedup! Please refer to [run_qwen_image.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image.py) as an example.
|
|
172
185
|
|
|
173
186
|
<details>
|
|
174
187
|
<summary> Previous News </summary>
|
|
175
188
|
|
|
189
|
+
- [2025-07-13] 🎉[**FLUX.1-dev**](https://github.com/xlite-dev/flux-faster) **3.3x↑🎉** speedup! NVIDIA L20 with **[cache-dit](https://github.com/vipshop/cache-dit)** + **compile + FP8 DQ**.
|
|
176
190
|
- [2025-09-08] 🎉First caching mechanism in [Qwen-Image-Lightning](https://github.com/ModelTC/Qwen-Image-Lightning) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check this [PR](https://github.com/ModelTC/Qwen-Image-Lightning/pull/35).
|
|
177
191
|
- [2025-09-08] 🎉First caching mechanism in [Wan2.2](https://github.com/Wan-Video/Wan2.2) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check this [PR](https://github.com/Wan-Video/Wan2.2/pull/127) for more details.
|
|
178
192
|
- [2025-08-12] 🎉First caching mechanism in [QwenLM/Qwen-Image](https://github.com/QwenLM/Qwen-Image) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check this [PR](https://github.com/QwenLM/Qwen-Image/pull/61).
|
|
179
|
-
- [2025-09-01] 📚[**Hybird Forward Pattern**](#unified) is supported! Please check [FLUX.1-dev](
|
|
180
|
-
- [2025-08-10] 🔥[**FLUX.1-Kontext-dev**](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) is supported! Please refer [run_flux_kontext.py](
|
|
193
|
+
- [2025-09-01] 📚[**Hybrid Forward Pattern**](#unified) is supported! Please check [FLUX.1-dev](https://github.com/vipshop/cache-dit/blob/main/examples/run_flux_adapter.py) as an example.
|
|
194
|
+
- [2025-08-10] 🔥[**FLUX.1-Kontext-dev**](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) is supported! Please refer to [run_flux_kontext.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_flux_kontext.py) as an example.
|
|
181
195
|
- [2025-07-18] 🎉First caching mechanism in [🤗huggingface/flux-fast](https://github.com/huggingface/flux-fast) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check the [PR](https://github.com/huggingface/flux-fast/pull/13).
|
|
182
196
|
|
|
183
197
|
</details>
|
|
@@ -187,20 +201,14 @@ Dynamic: requires-python
|
|
|
187
201
|
<div id="contents"></div>
|
|
188
202
|
|
|
189
203
|
- [⚙️Installation](#️installation)
|
|
190
|
-
- [🔥
|
|
191
|
-
- [
|
|
192
|
-
- [🎉Unified Cache APIs](#unified)
|
|
193
|
-
- [📚Forward Pattern Matching](#forward-pattern-matching)
|
|
194
|
-
- [♥️Cache with One-line Code](#%EF%B8%8Fcache-acceleration-with-one-line-code)
|
|
195
|
-
- [🔥Automatic Block Adapter](#automatic-block-adapter)
|
|
196
|
-
- [📚Hybird Forward Pattern](#automatic-block-adapter)
|
|
197
|
-
- [📚Implement Patch Functor](#implement-patch-functor)
|
|
198
|
-
- [🤖Cache Acceleration Stats](#cache-acceleration-stats-summary)
|
|
204
|
+
- [🔥Quick Start](#quick-start)
|
|
205
|
+
- [📚Pattern Matching](#forward-pattern-matching)
|
|
199
206
|
- [⚡️Dual Block Cache](#dbcache)
|
|
200
207
|
- [🔥TaylorSeer Calibrator](#taylorseer)
|
|
201
|
-
- [
|
|
202
|
-
- [
|
|
203
|
-
- [
|
|
208
|
+
- [📚Hybrid Cache CFG](#cfg)
|
|
209
|
+
- [🔥Benchmarks](#benchmarks)
|
|
210
|
+
- [🎉User Guide](#user-guide)
|
|
211
|
+
- [©️Citations](#citations)
|
|
204
212
|
|
|
205
213
|
## ⚙️Installation
|
|
206
214
|
|
|
@@ -217,11 +225,35 @@ Or you can install the latest develop version from GitHub:
|
|
|
217
225
|
pip3 install git+https://github.com/vipshop/cache-dit.git
|
|
218
226
|
```
|
|
219
227
|
|
|
220
|
-
## 🔥
|
|
228
|
+
## 🔥Quick Start
|
|
229
|
+
|
|
230
|
+
<div id="unified"></div>
|
|
231
|
+
|
|
232
|
+
<div id="quick-start"></div>
|
|
233
|
+
|
|
234
|
+
In most cases, you only need ♥️**one line**♥️ of code: `cache_dit.enable_cache(...)`. After calling this API, just call the pipe as normal. The `pipe` param can be **any** Diffusion Pipeline. Please refer to [Qwen-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image.py) as an example.
|
|
235
|
+
|
|
236
|
+
```python
|
|
237
|
+
>>> import cache_dit
|
|
238
|
+
>>> from diffusers import DiffusionPipeline
|
|
239
|
+
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
|
|
240
|
+
>>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
|
|
241
|
+
>>> output = pipe(...) # Just call the pipe as normal.
|
|
242
|
+
>>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
|
|
243
|
+
>>> cache_dit.disable_cache(pipe) # Disable cache and run original pipe.
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
## 📚Forward Pattern Matching
|
|
221
247
|
|
|
222
248
|
<div id="supported"></div>
|
|
223
249
|
|
|
224
|
-
|
|
250
|
+
<div id="forward-pattern-matching"></div>
|
|
251
|
+
|
|
252
|
+
cache-dit works by matching specific input/output patterns as shown below.
|
|
253
|
+
|
|
254
|
+

|
|
255
|
+
|
|
256
|
+
Please check [🎉Examples](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline) for more details. Only some of the tested models are listed here.
|
|
225
257
|
|
|
226
258
|
```python
|
|
227
259
|
>>> import cache_dit
|
|
@@ -235,64 +267,128 @@ Currently, **cache-dit** library supports almost **Any** Diffusion Transformers
|
|
|
235
267
|
<details>
|
|
236
268
|
<summary> Show all pipelines </summary>
|
|
237
269
|
|
|
238
|
-
- [🚀HunyuanImage-2.1](https://github.com/vipshop/cache-dit/
|
|
239
|
-
- [🚀Qwen-Image-Lightning](https://github.com/vipshop/cache-dit/
|
|
240
|
-
- [🚀Qwen-Image-Edit](https://github.com/vipshop/cache-dit/
|
|
241
|
-
- [🚀Qwen-Image](https://github.com/vipshop/cache-dit/
|
|
242
|
-
- [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/
|
|
243
|
-
- [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/
|
|
244
|
-
- [🚀FLUX.1-Kontext-dev](https://github.com/vipshop/cache-dit/
|
|
245
|
-
- [🚀CogView4](https://github.com/vipshop/cache-dit/
|
|
246
|
-
- [🚀Wan2.2-T2V](https://github.com/vipshop/cache-dit/
|
|
247
|
-
- [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/
|
|
248
|
-
- [🚀HiDream-I1-Full](https://github.com/vipshop/cache-dit/
|
|
249
|
-
- [🚀HunyuanDiT](https://github.com/vipshop/cache-dit/
|
|
250
|
-
- [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/
|
|
251
|
-
- [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/
|
|
252
|
-
- [🚀SkyReelsV2](https://github.com/vipshop/cache-dit/
|
|
253
|
-
- [🚀Chroma1-HD](https://github.com/vipshop/cache-dit/
|
|
254
|
-
- [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/
|
|
255
|
-
- [🚀CogView3-Plus](https://github.com/vipshop/cache-dit/
|
|
256
|
-
- [🚀CogVideoX](https://github.com/vipshop/cache-dit/
|
|
257
|
-
- [🚀VisualCloze](https://github.com/vipshop/cache-dit/
|
|
258
|
-
- [🚀LTXVideo](https://github.com/vipshop/cache-dit/
|
|
259
|
-
- [🚀OmniGen](https://github.com/vipshop/cache-dit/
|
|
260
|
-
- [🚀Lumina2](https://github.com/vipshop/cache-dit/
|
|
261
|
-
- [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/
|
|
262
|
-
- [🚀AuraFlow-v0.3](https://github.com/vipshop/cache-dit/
|
|
263
|
-
- [🚀PixArt-Alpha](https://github.com/vipshop/cache-dit/
|
|
264
|
-
- [🚀PixArt-Sigma](https://github.com/vipshop/cache-dit/
|
|
265
|
-
- [🚀NVIDIA Sana](https://github.com/vipshop/cache-dit/
|
|
266
|
-
- [🚀SD-3/3.5](https://github.com/vipshop/cache-dit/
|
|
267
|
-
- [🚀ConsisID](https://github.com/vipshop/cache-dit/
|
|
268
|
-
- [🚀Allegro](https://github.com/vipshop/cache-dit/
|
|
269
|
-
- [🚀Amused](https://github.com/vipshop/cache-dit/
|
|
270
|
-
- [🚀DiT-XL](https://github.com/vipshop/cache-dit/
|
|
270
|
+
- [🚀HunyuanImage-2.1](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
271
|
+
- [🚀Qwen-Image-Lightning](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
272
|
+
- [🚀Qwen-Image-Edit](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
273
|
+
- [🚀Qwen-Image](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
274
|
+
- [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
275
|
+
- [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
276
|
+
- [🚀FLUX.1-Kontext-dev](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
277
|
+
- [🚀CogView4](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
278
|
+
- [🚀Wan2.2-T2V](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
279
|
+
- [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
280
|
+
- [🚀HiDream-I1-Full](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
281
|
+
- [🚀HunyuanDiT](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
282
|
+
- [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
283
|
+
- [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
284
|
+
- [🚀SkyReelsV2](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
285
|
+
- [🚀Chroma1-HD](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
286
|
+
- [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
287
|
+
- [🚀CogView3-Plus](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
288
|
+
- [🚀CogVideoX](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
289
|
+
- [🚀VisualCloze](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
290
|
+
- [🚀LTXVideo](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
291
|
+
- [🚀OmniGen](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
292
|
+
- [🚀Lumina2](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
293
|
+
- [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
294
|
+
- [🚀AuraFlow-v0.3](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
295
|
+
- [🚀PixArt-Alpha](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
296
|
+
- [🚀PixArt-Sigma](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
297
|
+
- [🚀NVIDIA Sana](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
298
|
+
- [🚀SD-3/3.5](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
299
|
+
- [🚀ConsisID](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
300
|
+
- [🚀Allegro](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
301
|
+
- [🚀Amused](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
302
|
+
- [🚀DiT-XL](https://github.com/vipshop/cache-dit/blob/main/examples)
|
|
271
303
|
- ...
|
|
272
304
|
|
|
273
305
|
</details>
|
|
274
306
|
|
|
275
|
-
##
|
|
307
|
+
## ⚡️DBCache: Dual Block Cache
|
|
276
308
|
|
|
277
|
-
<div id="
|
|
309
|
+
<div id="dbcache"></div>
|
|
310
|
+
|
|
311
|
+

|
|
312
|
+
|
|
313
|
+
**DBCache**: **Dual Block Caching** for Diffusion Transformers. Different configurations of compute blocks (**F8B12**, etc.) can be customized in DBCache, enabling a balanced trade-off between performance and precision. Moreover, it is entirely **training**-**free**. Please check the [DBCache](https://github.com/vipshop/cache-dit/blob/main/docs/DBCache.md) and [User Guide](https://github.com/vipshop/cache-dit/blob/main/docs/User_Guide.md#dbcache) docs for more design details.
|
|
314
|
+
|
|
315
|
+
```python
|
|
316
|
+
# Default options, F8B0, 8 warmup steps, and unlimited cached
|
|
317
|
+
# steps for good balance between performance and precision
|
|
318
|
+
cache_dit.enable_cache(pipe_or_adapter)
|
|
319
|
+
|
|
320
|
+
# Custom options, F8B8, higher precision
|
|
321
|
+
from cache_dit import BasicCacheConfig
|
|
322
|
+
|
|
323
|
+
cache_dit.enable_cache(
|
|
324
|
+
pipe_or_adapter,
|
|
325
|
+
cache_config=BasicCacheConfig(
|
|
326
|
+
max_warmup_steps=8, # steps do not cache
|
|
327
|
+
max_cached_steps=-1, # -1 means no limit
|
|
328
|
+
Fn_compute_blocks=8, # Fn, F8, etc.
|
|
329
|
+
Bn_compute_blocks=8, # Bn, B8, etc.
|
|
330
|
+
residual_diff_threshold=0.12,
|
|
331
|
+
),
|
|
332
|
+
)
|
|
333
|
+
```
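To make the options above more concrete, here is a toy sketch of how `max_warmup_steps`, `max_cached_steps` and `residual_diff_threshold` could gate a per-step cache decision. It is not cache-dit's implementation: the helper names `relative_l1_diff` and `should_reuse_cache` are hypothetical, and it assumes the decision compares the residual produced by the first `Fn` blocks with the one from the previous step using a relative L1 difference.

```python
def relative_l1_diff(current, previous):
    """Relative L1 distance between two equally sized lists of floats."""
    numerator = sum(abs(c - p) for c, p in zip(current, previous))
    denominator = sum(abs(p) for p in previous)
    return numerator / denominator if denominator else float("inf")


def should_reuse_cache(
    step,
    cached_steps,
    first_residual,
    prev_first_residual,
    max_warmup_steps=8,
    max_cached_steps=-1,
    residual_diff_threshold=0.12,
):
    """Hypothetical gate: True means the remaining blocks could reuse the cache."""
    if step < max_warmup_steps:
        return False  # warmup steps are always fully computed
    if max_cached_steps >= 0 and cached_steps >= max_cached_steps:
        return False  # cache budget exhausted (-1 means no limit)
    if prev_first_residual is None:
        return False  # nothing to compare against yet
    diff = relative_l1_diff(first_residual, prev_first_residual)
    return diff < residual_diff_threshold
```

Under these assumptions, a lower `residual_diff_threshold` would reuse the cache less often (higher precision, smaller speedup), while a higher one would reuse it more aggressively.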
|
|
334
|
+
|
|
335
|
+
## 🔥TaylorSeer Calibrator
|
|
336
|
+
|
|
337
|
+
<div id="taylorseer"></div>
|
|
338
|
+
|
|
339
|
+
The [TaylorSeer](https://huggingface.co/papers/2503.06923) algorithm further improves the precision of DBCache when many steps are cached (Hybrid TaylorSeer + DBCache). When cached features are reused across widely spaced timesteps, feature similarity in diffusion models drops substantially, which significantly harms generation quality.
|
|
278
340
|
|
|
279
|
-
|
|
341
|
+
TaylorSeer employs a differential method to approximate the higher-order derivatives of features and predict features in future timesteps with Taylor series expansion. The TaylorSeer implemented in CacheDiT supports both hidden states and residual cache types. F_pred can be a residual cache or a hidden-state cache.
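The idea described above can be illustrated with a small, self-contained sketch on scalar features. This is not the calibrator shipped in cache-dit: `finite_differences` and `taylor_predict` are hypothetical names, the features are plain floats rather than tensors, and the computed steps are assumed to be evenly spaced by `interval`.

```python
from math import factorial


def finite_differences(samples, interval):
    """Estimate [F, F', F'', ...] at the newest sample with backward differences.

    `samples` holds feature values from fully computed steps, oldest first,
    spaced `interval` steps apart.
    """
    derivatives = []
    current = list(samples)
    while current:
        derivatives.append(current[-1])
        # Each pass differences neighbouring entries, raising the order by one.
        current = [(b - a) / interval for a, b in zip(current, current[1:])]
    return derivatives


def taylor_predict(samples, interval, steps_ahead, order=1):
    """Predict the feature `steps_ahead` steps after the newest sample."""
    derivatives = finite_differences(samples, interval)[: order + 1]
    return sum(
        d * steps_ahead**i / factorial(i) for i, d in enumerate(derivatives)
    )
```

With `order=0` the sketch degenerates to plain reuse of the last computed value; with `order=1` it extrapolates linearly from the last two computed steps.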
|
|
280
342
|
|
|
281
|
-
|
|
343
|
+
```python
|
|
344
|
+
from cache_dit import BasicCacheConfig, TaylorSeerCalibratorConfig
|
|
282
345
|
|
|
283
|
-
|
|
346
|
+
cache_dit.enable_cache(
|
|
347
|
+
pipe_or_adapter,
|
|
348
|
+
# Basic DBCache w/ FnBn configurations
|
|
349
|
+
cache_config=BasicCacheConfig(
|
|
350
|
+
max_warmup_steps=8, # steps do not cache
|
|
351
|
+
max_cached_steps=-1, # -1 means no limit
|
|
352
|
+
Fn_compute_blocks=8, # Fn, F8, etc.
|
|
353
|
+
Bn_compute_blocks=8, # Bn, B8, etc.
|
|
354
|
+
residual_diff_threshold=0.12,
|
|
355
|
+
),
|
|
356
|
+
# Then, you can use the TaylorSeer Calibrator to approximate
|
|
357
|
+
# the values in cached steps, taylorseer_order default is 1.
|
|
358
|
+
calibrator_config=TaylorSeerCalibratorConfig(
|
|
359
|
+
taylorseer_order=1,
|
|
360
|
+
),
|
|
361
|
+
)
|
|
362
|
+
```
|
|
284
363
|
|
|
364
|
+
> [!TIP]
|
|
365
|
+
> The `Bn_compute_blocks` parameter of DBCache can be set to `0` if you use TaylorSeer as the calibrator to approximate hidden states. DBCache's `Bn_compute_blocks` also acts as a calibrator, so you can choose either `Bn_compute_blocks` > 0 or TaylorSeer. We recommend the TaylorSeer + DBCache FnB0 configuration.
|
|
285
366
|
|
|
286
|
-
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
|
|
292
|
-
|
|
293
|
-
|
|
367
|
+
## 📚Hybrid Cache CFG
|
|
368
|
+
|
|
369
|
+
<div id="cfg"></div>
|
|
370
|
+
|
|
371
|
+
cache-dit supports caching for CFG (classifier-free guidance). For models that fuse the CFG and non-CFG passes into a single forward step, or that do not use CFG in the forward step at all, set the `enable_separate_cfg` parameter to `False` (default: `None`). Otherwise, set it to `True`.
|
|
372
|
+
|
|
373
|
+
```python
|
|
374
|
+
from cache_dit import BasicCacheConfig
|
|
375
|
+
|
|
376
|
+
cache_dit.enable_cache(
|
|
377
|
+
pipe_or_adapter,
|
|
378
|
+
cache_config=BasicCacheConfig(
|
|
379
|
+
...,
|
|
380
|
+
# For example, set it as True for Wan 2.1/Qwen-Image
|
|
381
|
+
# and set it as False for FLUX.1, HunyuanVideo, CogVideoX, etc.
|
|
382
|
+
enable_separate_cfg=True,
|
|
383
|
+
),
|
|
384
|
+
)
|
|
385
|
+
```
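As a rough illustration of why the flag matters, the toy sketch below maps transformer forward calls to denoising steps. It is not taken from cache-dit: `route_forward_calls` is a hypothetical helper, and it assumes that a pipeline with separate CFG calls the transformer twice per step, conditional pass first (the real order depends on the pipeline).

```python
def route_forward_calls(num_calls, enable_separate_cfg):
    """Map each transformer forward call to (denoising step, branch name)."""
    routes = []
    for call in range(num_calls):
        if enable_separate_cfg:
            # Assumption: two forward calls per step, conditional pass first.
            step, branch = divmod(call, 2)
            routes.append((step, "cond" if branch == 0 else "uncond"))
        else:
            # Fused CFG or no CFG: one forward call per step.
            routes.append((call, "fused"))
    return routes
```

Under this assumption, each branch would need its own cached state so that the conditional pass never reuses features computed for the unconditional one.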
|
|
386
|
+
|
|
387
|
+
## 🔥Benchmarks
|
|
388
|
+
|
|
389
|
+
<div id="benchmarks"></div>
|
|
294
390
|
|
|
295
|
-
The comparison between **cache-dit: DBCache** and algorithms such as Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer and FoCa is as follows. Now, in the comparison with a speedup ratio less than **3x**, cache-dit achieved the best accuracy.
|
|
391
|
+
The comparison between **cache-dit: DBCache** and algorithms such as Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer and FoCa is as follows. Among configurations with a speedup ratio below **3x**, cache-dit achieves the best accuracy. Notably, cache-dit: DBCache still works on extremely few-step distilled models. For a complete benchmark, please refer to [📚Benchmarks](https://github.com/vipshop/cache-dit/raw/main/bench/).
|
|
296
392
|
|
|
297
393
|
| Method | TFLOPs(↓) | SpeedUp(↑) | ImageReward(↑) | Clip Score(↑) |
|
|
298
394
|
| --- | --- | --- | --- | --- |
|
|
@@ -350,363 +446,33 @@ NOTE: Except for DBCache, other performance data are referenced from the paper [
|
|
|
350
446
|
|
|
351
447
|
</details>
|
|
352
448
|
|
|
353
|
-
|
|
354
|
-
|
|
355
|
-
|
|
356
|
-
|
|
357
|
-
|
|
358
|
-
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
|
|
364
|
-
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
|
|
375
|
-
|
|
376
|
-
### ♥️Cache Acceleration with One-line Code
|
|
377
|
-
|
|
378
|
-
In most cases, you only need to call **one-line** of code, that is `cache_dit.enable_cache(...)`. After this API is called, you just need to call the pipe as normal. The `pipe` param can be **any** Diffusion Pipeline. Please refer to [Qwen-Image](./examples/pipeline/run_qwen_image.py) as an example.
|
|
379
|
-
|
|
380
|
-
```python
|
|
381
|
-
import cache_dit
|
|
382
|
-
from diffusers import DiffusionPipeline
|
|
383
|
-
|
|
384
|
-
# Can be any diffusion pipeline
|
|
385
|
-
pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
|
|
386
|
-
|
|
387
|
-
# One-line code with default cache options.
|
|
388
|
-
cache_dit.enable_cache(pipe)
|
|
389
|
-
|
|
390
|
-
# Just call the pipe as normal.
|
|
391
|
-
output = pipe(...)
|
|
392
|
-
|
|
393
|
-
# Disable cache and run original pipe.
|
|
394
|
-
cache_dit.disable_cache(pipe)
|
|
395
|
-
```
|
|
396
|
-
|
|
397
|
-
### 🔥Automatic Block Adapter
|
|
398
|
-
|
|
399
|
-
But in some cases, you may have a **modified** Diffusion Pipeline or Transformer that is not located in the diffusers library or not officially supported by **cache-dit** at this time. The **BlockAdapter** can help you solve this problems. Please refer to [🔥Qwen-Image w/ BlockAdapter](./examples/adapter/run_qwen_image_adapter.py) as an example.
|
|
400
|
-
|
|
401
|
-
```python
|
|
402
|
-
from cache_dit import ForwardPattern, BlockAdapter
|
|
403
|
-
|
|
404
|
-
# Use 🔥BlockAdapter with `auto` mode.
|
|
405
|
-
cache_dit.enable_cache(
|
|
406
|
-
BlockAdapter(
|
|
407
|
-
# Any DiffusionPipeline, Qwen-Image, etc.
|
|
408
|
-
pipe=pipe, auto=True,
|
|
409
|
-
# Check `📚Forward Pattern Matching` documentation and hack the code of
|
|
410
|
-
# of Qwen-Image, you will find that it has satisfied `FORWARD_PATTERN_1`.
|
|
411
|
-
forward_pattern=ForwardPattern.Pattern_1,
|
|
412
|
-
),
|
|
413
|
-
)
|
|
414
|
-
|
|
415
|
-
# Or, manually setup transformer configurations.
|
|
416
|
-
cache_dit.enable_cache(
|
|
417
|
-
BlockAdapter(
|
|
418
|
-
pipe=pipe, # Qwen-Image, etc.
|
|
419
|
-
transformer=pipe.transformer,
|
|
420
|
-
blocks=pipe.transformer.transformer_blocks,
|
|
421
|
-
forward_pattern=ForwardPattern.Pattern_1,
|
|
422
|
-
),
|
|
423
|
-
)
|
|
424
|
-
```
|
|
425
|
-
For such situations, **BlockAdapter** can help you quickly apply various cache acceleration features to your own Diffusion Pipelines and Transformers. Please check the [📚BlockAdapter.md](./docs/BlockAdapter.md) for more details.
|
|
426
|
-
|
|
427
|
-
### 📚Hybird Forward Pattern
|
|
428
|
-
|
|
429
|
-
Sometimes, a Transformer class will contain more than one transformer `blocks`. For example, **FLUX.1** (HiDream, Chroma, etc) contains transformer_blocks and single_transformer_blocks (with different forward patterns). The **BlockAdapter** can also help you solve this problem. Please refer to [📚FLUX.1](./examples/adapter/run_flux_adapter.py) as an example.
|
|
430
|
-
|
|
431
|
-
```python
|
|
432
|
-
# For diffusers <= 0.34.0, FLUX.1 transformer_blocks and
|
|
433
|
-
# single_transformer_blocks have different forward patterns.
|
|
434
|
-
cache_dit.enable_cache(
|
|
435
|
-
BlockAdapter(
|
|
436
|
-
pipe=pipe, # FLUX.1, etc.
|
|
437
|
-
transformer=pipe.transformer,
|
|
438
|
-
blocks=[
|
|
439
|
-
pipe.transformer.transformer_blocks,
|
|
440
|
-
pipe.transformer.single_transformer_blocks,
|
|
441
|
-
],
|
|
442
|
-
forward_pattern=[
|
|
443
|
-
ForwardPattern.Pattern_1,
|
|
444
|
-
ForwardPattern.Pattern_3,
|
|
445
|
-
],
|
|
446
|
-
),
|
|
447
|
-
)
|
|
448
|
-
```
|
|
449
|
-
|
|
450
|
-
Even sometimes you have more complex cases, such as **Wan 2.2 MoE**, which has more than one Transformer (namely `transformer` and `transformer_2`) in its structure. Fortunately, **cache-dit** can also handle this situation very well. Please refer to [📚Wan 2.2 MoE](./examples/pipeline/run_wan_2.2.py) as an example.
|
|
451
|
-
|
|
452
|
-
```python
|
|
453
|
-
from cache_dit import ForwardPattern, BlockAdapter, ParamsModifier
|
|
454
|
-
|
|
455
|
-
cache_dit.enable_cache(
|
|
456
|
-
BlockAdapter(
|
|
457
|
-
pipe=pipe,
|
|
458
|
-
transformer=[
|
|
459
|
-
pipe.transformer,
|
|
460
|
-
pipe.transformer_2,
|
|
461
|
-
],
|
|
462
|
-
blocks=[
|
|
463
|
-
pipe.transformer.blocks,
|
|
464
|
-
pipe.transformer_2.blocks,
|
|
465
|
-
],
|
|
466
|
-
forward_pattern=[
|
|
467
|
-
ForwardPattern.Pattern_2,
|
|
468
|
-
ForwardPattern.Pattern_2,
|
|
469
|
-
],
|
|
470
|
-
# Setup different cache params for each 'blocks'. You can
|
|
471
|
-
# pass any specific cache params to ParamsModifier; the old
|
|
472
|
-
# value will be overwritten by the new one.
|
|
473
|
-
params_modifiers=[
|
|
474
|
-
ParamsModifier(
|
|
475
|
-
max_warmup_steps=4,
|
|
476
|
-
max_cached_steps=8,
|
|
477
|
-
),
|
|
478
|
-
ParamsModifier(
|
|
479
|
-
max_warmup_steps=2,
|
|
480
|
-
max_cached_steps=20,
|
|
481
|
-
),
|
|
482
|
-
],
|
|
483
|
-
has_separate_cfg=True,
|
|
484
|
-
),
|
|
485
|
-
)
|
|
486
|
-
```
|
|
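The override behaviour described in the comment above ("the old value will be overwritten by the new one") can be pictured as a plain dictionary merge applied once per `blocks` entry. The sketch below is only an illustration and is not cache-dit's implementation: `apply_modifier` is a hypothetical helper, and the base values are taken from the defaults this README mentions elsewhere (F8B0, 8 warmup steps, unlimited cached steps).

```python
# Hypothetical illustration of per-blocks parameter overrides.
# `apply_modifier` is NOT a cache-dit API; it only mimics the documented
# "new value overwrites old value" behaviour with plain dicts.

def apply_modifier(base_params: dict, overrides: dict) -> dict:
    """Return a copy of base_params with every key in overrides replaced."""
    merged = dict(base_params)  # copy, so the shared defaults stay untouched
    merged.update(overrides)    # keys present in overrides win
    return merged


# Shared settings (the README's stated defaults).
base = {
    "Fn_compute_blocks": 8,
    "Bn_compute_blocks": 0,
    "max_warmup_steps": 8,
    "max_cached_steps": -1,
}

# One override set per transformer, mirroring the Wan 2.2 MoE example above.
modifiers = [
    {"max_warmup_steps": 4, "max_cached_steps": 8},
    {"max_warmup_steps": 2, "max_cached_steps": 20},
]

per_blocks = [apply_modifier(base, m) for m in modifiers]
for index, params in enumerate(per_blocks):
    print(index, params)
```

Keys that a modifier does not mention (here `Fn_compute_blocks` and `Bn_compute_blocks`) keep the shared value, so each transformer only needs to list what differs.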
487
|
-
### 📚Implement Patch Functor
|
|
488
|
-
|
|
489
|
-
For any pattern outside the known patterns {0...5}, cache-dit introduces a simple abstraction, the **Patch Functor**. Users can implement a subclass of Patch Functor to convert an unknown pattern into a known one; for some models, users may also need to fuse the operations inside the blocks `for` loop into the block's forward.
|
|
490
|
-
|
|
491
|
-

|
|
492
|
-
|
|
493
|
-
Some Patch functors have already been provided in cache-dit: [📚HiDreamPatchFunctor](./src/cache_dit/cache_factory/patch_functors/functor_hidream.py), [📚ChromaPatchFunctor](./src/cache_dit/cache_factory/patch_functors/functor_chroma.py), etc. After implementing Patch Functor, users need to set the `patch_functor` property of **BlockAdapter**.
|
|
494
|
-
|
|
495
|
-
```python
|
|
496
|
-
@BlockAdapterRegistry.register("HiDream")
|
|
497
|
-
def hidream_adapter(pipe, **kwargs) -> BlockAdapter:
|
|
498
|
-
from diffusers import HiDreamImageTransformer2DModel
|
|
499
|
-
from cache_dit.cache_factory.patch_functors import HiDreamPatchFunctor
|
|
500
|
-
|
|
501
|
-
assert isinstance(pipe.transformer, HiDreamImageTransformer2DModel)
|
|
502
|
-
return BlockAdapter(
|
|
503
|
-
pipe=pipe,
|
|
504
|
-
transformer=pipe.transformer,
|
|
505
|
-
blocks=[
|
|
506
|
-
pipe.transformer.double_stream_blocks,
|
|
507
|
-
pipe.transformer.single_stream_blocks,
|
|
508
|
-
],
|
|
509
|
-
forward_pattern=[
|
|
510
|
-
ForwardPattern.Pattern_0,
|
|
511
|
-
ForwardPattern.Pattern_3,
|
|
512
|
-
],
|
|
513
|
-
# NOTE: Setup your custom patch functor here.
|
|
514
|
-
patch_functor=HiDreamPatchFunctor(),
|
|
515
|
-
**kwargs,
|
|
516
|
-
)
|
|
517
|
-
```
|
|
518
|
-
|
|
519
|
-
### 🤖Cache Acceleration Stats Summary
|
|
520
|
-
|
|
521
|
-
After finishing each inference of `pipe(...)`, you can call the `cache_dit.summary()` API on pipe to get the details of the **Cache Acceleration Stats** for the current inference.
|
|
522
|
-
```python
|
|
523
|
-
stats = cache_dit.summary(pipe)
|
|
524
|
-
```
|
|
525
|
-
|
|
526
|
-
Set the `details` param to `True` to show more detailed cache stats (in markdown table format). This can help you analyze which values of the residual diff threshold would work better.
|
|
527
|
-
|
|
528
|
-
```python
|
|
529
|
-
⚡️Cache Steps and Residual Diffs Statistics: QwenImagePipeline
|
|
530
|
-
|
|
531
|
-
| Cache Steps | Diffs Min | Diffs P25 | Diffs P50 | Diffs P75 | Diffs P95 | Diffs Max |
|
|
532
|
-
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|
|
|
533
|
-
| 23 | 0.045 | 0.084 | 0.114 | 0.147 | 0.241 | 0.297 |
|
|
534
|
-
```
|
|
535
|
-
|
|
536
|
-
## ⚡️DBCache: Dual Block Cache
|
|
537
|
-
|
|
538
|
-
<div id="dbcache"></div>
|
|
539
|
-
|
|
540
|
-

|
|
541
|
-
|
|
542
|
-
**DBCache**: **Dual Block Caching** for Diffusion Transformers. Different configurations of compute blocks (**F8B12**, etc.) can be customized in DBCache, enabling a balanced trade-off between performance and precision. Moreover, it can be entirely **training**-**free**. Please check [DBCache.md](./docs/DBCache.md) docs for more design details.
|
|
543
|
-
|
|
544
|
-
- **Fn**: Specifies that DBCache uses the **first n** Transformer blocks to fit the information at time step t, enabling the calculation of a more stable L1 diff and delivering more accurate information to subsequent blocks.
|
|
545
|
-
- **Bn**: Further fuses approximate information in the **last n** Transformer blocks to enhance prediction accuracy. These blocks act as an auto-scaler for approximate hidden states that use residual cache.
|
|
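The caching decision implied by **Fn** and `residual_diff_threshold` can be sketched roughly as follows. This is a simplified illustration, not cache-dit's code: it assumes the decision compares a relative L1 difference between the current and previous outputs of the first n blocks against the threshold, uses plain Python lists in place of tensors, and ignores `max_cached_steps`. The names `relative_l1_diff` and `can_use_cache` are hypothetical.

```python
# Simplified, hypothetical sketch of a residual-diff caching decision.
# Real implementations operate on tensors and may normalize differently.

def relative_l1_diff(current, previous):
    """Sum of absolute differences, normalized by the magnitude of previous."""
    numerator = sum(abs(c - p) for c, p in zip(current, previous))
    denominator = sum(abs(p) for p in previous)
    return numerator / denominator if denominator > 0 else float("inf")


def can_use_cache(step, current, previous,
                  max_warmup_steps=8, residual_diff_threshold=0.12):
    """Return True when the remaining blocks could be served from cache."""
    if step < max_warmup_steps or previous is None:
        return False  # warmup steps are always fully computed
    return relative_l1_diff(current, previous) < residual_diff_threshold


# Output of the first n blocks at the previous and the current step (toy values).
previous = [1.0, 2.0]
print(can_use_cache(3, [1.0, 2.0], previous))    # still in warmup
print(can_use_cache(10, [1.05, 2.0], previous))  # small change
print(can_use_cache(10, [2.0, 2.0], previous))   # large change
```

Under this reading, a larger **Fn** spends more compute before the decision but bases the diff on more of the network, which is the stability the bullet above refers to.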
546
|
-
|
|
547
|
-
```python
|
|
548
|
-
import torch
import cache_dit
|
|
549
|
-
from diffusers import FluxPipeline
|
|
550
|
-
|
|
551
|
-
pipe = FluxPipeline.from_pretrained(
|
|
552
|
-
"black-forest-labs/FLUX.1-dev",
|
|
553
|
-
torch_dtype=torch.bfloat16,
|
|
554
|
-
).to("cuda")
|
|
555
|
-
|
|
556
|
-
# Default options, F8B0, 8 warmup steps, and unlimited cached
|
|
557
|
-
# steps for good balance between performance and precision
|
|
558
|
-
cache_dit.enable_cache(pipe)
|
|
559
|
-
|
|
560
|
-
# Custom options, F8B8, higher precision
|
|
561
|
-
cache_dit.enable_cache(
|
|
562
|
-
pipe,
|
|
563
|
-
max_warmup_steps=8, # steps do not cache
|
|
564
|
-
max_cached_steps=-1, # -1 means no limit
|
|
565
|
-
Fn_compute_blocks=8, # Fn, F8, etc.
|
|
566
|
-
Bn_compute_blocks=8, # Bn, B8, etc.
|
|
567
|
-
residual_diff_threshold=0.12,
|
|
568
|
-
)
|
|
569
|
-
```
|
|
570
|
-
|
|
571
|
-
<div align="center">
|
|
572
|
-
<p align="center">
|
|
573
|
-
DBCache, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
|
|
574
|
-
</p>
|
|
575
|
-
</div>
|
|
576
|
-
|
|
577
|
-
|Baseline(L20x1)|F1B0 (0.08)|F1B0 (0.20)|F8B8 (0.15)|F12B12 (0.20)|F16B16 (0.20)|
|
|
578
|
-
|:---:|:---:|:---:|:---:|:---:|:---:|
|
|
579
|
-
|24.85s|15.59s|8.58s|15.41s|15.11s|17.74s|
|
|
580
|
-
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.08_S11.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.2_S19.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F8B8S1_R0.15_S15.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F12B12S4_R0.2_S16.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F16B16S4_R0.2_S13.png width=105px>|
|
|
581
|
-
|
|
582
|
-
## 🔥TaylorSeer Calibrator
|
|
583
|
-
|
|
584
|
-
<div id="taylorseer"></div>
|
|
585
|
-
|
|
586
|
-
cache-dit supports the [TaylorSeers: From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers](https://arxiv.org/pdf/2503.06923) algorithm to further improve the precision of DBCache when the number of cached steps is large, namely **Hybrid TaylorSeer + DBCache**. At timesteps separated by large intervals, the feature similarity in diffusion models decreases substantially, which significantly harms generation quality.
|
|
587
|
-
|
|
588
|
-
$$
|
|
589
|
-
\mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)=\mathcal{F}\left(x_t^l\right)+\sum_{i=1}^m \frac{\Delta^i \mathcal{F}\left(x_t^l\right)}{i!\cdot N^i}(-k)^i
|
|
590
|
-
$$
|
|
591
|
-
|
|
592
|
-
**TaylorSeer** uses a finite-difference method to approximate the higher-order derivatives of features and predicts features at future timesteps with a Taylor series expansion. The TaylorSeer implemented in cache-dit supports both hidden-state and residual cache types. That is, $\mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)$ can be a residual cache or a hidden-state cache.
|
|
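The forecasting formula above can be evaluated on scalar features with a few lines of Python. This is a minimal sketch rather than the cache-dit calibrator: it assumes the i-th difference is a forward finite difference over features sampled every N timesteps, with `history[j]` holding the feature at timestep t + j·N, and the names `finite_differences` and `taylor_predict` are hypothetical.

```python
from math import factorial

# Hypothetical scalar sketch of the TaylorSeer forecast; the finite-difference
# convention (history[j] = F at timestep t + j*N) is an assumption.

def finite_differences(history):
    """Return [Δ^0 F, Δ^1 F, ..., Δ^m F] evaluated at the most recent step."""
    diffs = [history[0]]
    level = list(history)
    while len(level) > 1:
        # Each pass takes differences of neighbouring entries.
        level = [level[j + 1] - level[j] for j in range(len(level) - 1)]
        diffs.append(level[0])
    return diffs


def taylor_predict(history, interval, k):
    """Forecast F at timestep t - k from fully computed features in history."""
    diffs = finite_differences(history)
    prediction = diffs[0]
    for i in range(1, len(diffs)):
        prediction += diffs[i] / (factorial(i) * interval ** i) * (-k) ** i
    return prediction


# Toy feature that is linear in the timestep: F(t) = 3t + 1.
def feature(t):
    return 3 * t + 1

t, interval = 10, 4
history = [feature(t), feature(t + interval)]  # order m = 1
print(taylor_predict(history, interval, k=2), feature(t - 2))
```

With two stored features the sum stops at i = 1, which matches the `taylorseer_order=1` default mentioned in the configuration example; for a feature that is linear in the timestep this first-order forecast is exact.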
593
|
-
|
|
594
|
-
```python
|
|
595
|
-
from cache_dit import TaylorSeerCalibratorConfig
|
|
596
|
-
|
|
597
|
-
cache_dit.enable_cache(
|
|
598
|
-
pipe,
|
|
599
|
-
# Basic DBCache w/ FnBn configurations
|
|
600
|
-
max_warmup_steps=8, # steps do not cache
|
|
601
|
-
max_cached_steps=-1, # -1 means no limit
|
|
602
|
-
Fn_compute_blocks=8, # Fn, F8, etc.
|
|
603
|
-
Bn_compute_blocks=8, # Bn, B8, etc.
|
|
604
|
-
residual_diff_threshold=0.12,
|
|
605
|
-
# Then, you can use the TaylorSeer Calibrator to approximate
|
|
606
|
-
# the values in cached steps, taylorseer_order default is 1.
|
|
607
|
-
calibrator_config=TaylorSeerCalibratorConfig(
|
|
608
|
-
taylorseer_order=1,
|
|
609
|
-
),
|
|
610
|
-
)
|
|
611
|
-
```
|
|
612
|
-
|
|
613
|
-
> [!Important]
|
|
614
|
-
> Please note that if you use TaylorSeer as the calibrator for approximate hidden states, the **Bn** param of DBCache can be set to **0**. In essence, DBCache's Bn also acts as a calibrator, so you can choose either Bn > 0 or TaylorSeer. We recommend the configuration scheme of **TaylorSeer** + **DBCache FnB0**.
|
|
615
|
-
|
|
616
|
-
<div align="center">
|
|
617
|
-
<p align="center">
|
|
618
|
-
<b>DBCache F1B0 + TaylorSeer</b>, L20x1, Steps: 28, <br>"A cat holding a sign that says hello world with complex background"
|
|
619
|
-
</p>
|
|
620
|
-
</div>
|
|
621
|
-
|
|
622
|
-
|Baseline(L20x1)|F1B0 (0.12)|+TaylorSeer|F1B0 (0.15)|+TaylorSeer|+compile|
|
|
623
|
-
|:---:|:---:|:---:|:---:|:---:|:---:|
|
|
624
|
-
|24.85s|12.85s|12.86s|10.27s|10.28s|8.48s|
|
|
625
|
-
|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.12_S14_T12.85s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.12_S14_T12.86s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.15_S17_T10.27s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T10.28s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T8.48s.png width=105px>|
|
|
626
|
-
|
|
627
|
-
## ⚡️Hybrid Cache CFG
|
|
628
|
-
|
|
629
|
-
<div id="cfg"></div>
|
|
630
|
-
|
|
631
|
-
cache-dit supports caching for **CFG (classifier-free guidance)**. For models that fuse CFG and non-CFG into a single forward step, or models that do not include CFG in the forward step, please set the `enable_separate_cfg` param to **False (default, None)**. Otherwise, set it to True. For example:
|
|
632
|
-
|
|
633
|
-
```python
|
|
634
|
-
cache_dit.enable_cache(
|
|
635
|
-
pipe,
|
|
636
|
-
...,
|
|
637
|
-
# CFG: classifier free guidance or not
|
|
638
|
-
# For model that fused CFG and non-CFG into single forward step,
|
|
639
|
-
# should set enable_separate_cfg as False. For example, set it as True
|
|
640
|
-
# for Wan 2.1/Qwen-Image and set it as False for FLUX.1, HunyuanVideo,
|
|
641
|
-
# CogVideoX, Mochi, LTXVideo, Allegro, CogView3Plus, EasyAnimate, SD3, etc.
|
|
642
|
-
enable_separate_cfg=True, # Wan 2.1, Qwen-Image, CogView4, Cosmos, SkyReelsV2, etc.
|
|
643
|
-
# Compute cfg forward first or not, default False, namely,
|
|
644
|
-
# 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
|
|
645
|
-
cfg_compute_first=False,
|
|
646
|
-
# Compute separate diff values for CFG and non-CFG step,
|
|
647
|
-
# default True. If False, we will use the computed diff from
|
|
648
|
-
# current non-CFG transformer step for current CFG step.
|
|
649
|
-
cfg_diff_compute_separate=True,
|
|
650
|
-
)
|
|
651
|
-
```
|
|
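The step ordering described in the comments above (0, 2, 4, ... are non-CFG steps and 1, 3, 5, ... are CFG steps when `cfg_compute_first=False`) can be written as a small predicate. This is a sketch for intuition only: `is_cfg_step` is a hypothetical helper, and the assumption that `cfg_compute_first=True` simply swaps the two roles is inferred from the option's name rather than stated in the text.

```python
# Hypothetical helper; assumes separate CFG and non-CFG forward passes
# alternate one-to-one, as in the comment above.

def is_cfg_step(transformer_step: int, cfg_compute_first: bool = False) -> bool:
    """Classify a transformer forward call as a CFG or a non-CFG step."""
    is_odd = transformer_step % 2 == 1
    # Default: even calls are non-CFG and odd calls are CFG.
    # Assumed: cfg_compute_first=True reverses that order.
    return (not is_odd) if cfg_compute_first else is_odd


print([is_cfg_step(i) for i in range(6)])
print([is_cfg_step(i, cfg_compute_first=True) for i in range(6)])
```

Keeping the two kinds of steps apart is what lets `cfg_diff_compute_separate=True` track a separate diff value for each branch.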
652
|
-
|
|
653
|
-
## ⚙️Torch Compile
|
|
654
|
-
|
|
655
|
-
<div id="compile"></div>
|
|
656
|
-
|
|
657
|
-
**cache-dit** is designed to be compatible with **torch.compile**. You can combine cache-dit with torch.compile to further improve performance. For example:
|
|
658
|
-
|
|
659
|
-
```python
|
|
660
|
-
cache_dit.enable_cache(pipe)
|
|
661
|
-
|
|
662
|
-
# Compile the Transformer module
|
|
663
|
-
pipe.transformer = torch.compile(pipe.transformer)
|
|
664
|
-
```
|
|
665
|
-
However, users intending to use **cache-dit** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
|
|
666
|
-
```python
|
|
667
|
-
torch._dynamo.config.recompile_limit = 96 # default is 8
|
|
668
|
-
torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
|
|
669
|
-
```
|
|
670
|
-
|
|
671
|
-
Please check [perf.py](./bench/perf.py) for more details.
|
|
672
|
-
|
|
673
|
-
|
|
674
|
-
## 🛠Metrics CLI
|
|
675
|
-
|
|
676
|
-
<div id="metrics"></div>
|
|
677
|
-
|
|
678
|
-
You can utilize the APIs provided by cache-dit to quickly evaluate the accuracy losses caused by different cache configurations. For example:
|
|
679
|
-
|
|
680
|
-
```python
|
|
681
|
-
from cache_dit.metrics import compute_psnr
|
|
682
|
-
from cache_dit.metrics import compute_ssim
|
|
683
|
-
from cache_dit.metrics import compute_fid
|
|
684
|
-
from cache_dit.metrics import compute_lpips
|
|
685
|
-
from cache_dit.metrics import compute_clip_score
|
|
686
|
-
from cache_dit.metrics import compute_image_reward
|
|
687
|
-
|
|
688
|
-
psnr, n = compute_psnr("true.png", "test.png") # Num: n
|
|
689
|
-
psnr, n = compute_psnr("true_dir", "test_dir")
|
|
690
|
-
ssim, n = compute_ssim("true_dir", "test_dir")
|
|
691
|
-
fid, n = compute_fid("true_dir", "test_dir")
|
|
692
|
-
lpips, n = compute_lpips("true_dir", "test_dir")
|
|
693
|
-
clip_score, n = compute_clip_score("DrawBench200.txt", "test_dir")
|
|
694
|
-
reward, n = compute_image_reward("DrawBench200.txt", "test_dir")
|
|
695
|
-
```
|
|
696
|
-
|
|
697
|
-
Please check [test_metrics.py](./tests/test_metrics.py) for more details. Alternatively, you can use the `cache-dit-metrics-cli` tool. For example:
|
|
698
|
-
|
|
699
|
-
```bash
|
|
700
|
-
cache-dit-metrics-cli -h # show usage
|
|
701
|
-
# all: PSNR, FID, SSIM, MSE, ..., etc.
|
|
702
|
-
cache-dit-metrics-cli all -i1 true.png -i2 test.png # image
|
|
703
|
-
cache-dit-metrics-cli all -i1 true_dir -i2 test_dir # image dir
|
|
704
|
-
```
|
|
449
|
+
## 🎉User Guide
|
|
450
|
+
|
|
451
|
+
<div id="user-guide"></div>
|
|
452
|
+
|
|
453
|
+
For more advanced features such as **Unified Cache APIs**, **Forward Pattern Matching**, **Automatic Block Adapter**, **Hybrid Forward Pattern**, **DBCache**, **TaylorSeer Calibrator**, and **Hybrid Cache CFG**, please refer to the [🎉User_Guide.md](./docs/User_Guide.md) for details.
|
|
454
|
+
|
|
455
|
+
- [⚙️Installation](./docs/User_Guide.md#️installation)
|
|
456
|
+
- [🔥Benchmarks](./docs/User_Guide.md#benchmarks)
|
|
457
|
+
- [🔥Supported Pipelines](./docs/User_Guide.md#supported-pipelines)
|
|
458
|
+
- [🎉Unified Cache APIs](./docs/User_Guide.md#unified-cache-apis)
|
|
459
|
+
- [📚Forward Pattern Matching](./docs/User_Guide.md#forward-pattern-matching)
|
|
460
|
+
- [📚Cache with One-line Code](./docs/User_Guide.md#%EF%B8%8Fcache-acceleration-with-one-line-code)
|
|
461
|
+
- [🔥Automatic Block Adapter](./docs/User_Guide.md#automatic-block-adapter)
|
|
462
|
+
- [📚Hybrid Forward Pattern](./docs/User_Guide.md#hybird-forward-pattern)
|
|
463
|
+
- [📚Implement Patch Functor](./docs/User_Guide.md#implement-patch-functor)
|
|
464
|
+
- [🤖Cache Acceleration Stats](./docs/User_Guide.md#cache-acceleration-stats-summary)
|
|
465
|
+
- [⚡️Dual Block Cache](./docs/User_Guide.md#️dbcache-dual-block-cache)
|
|
466
|
+
- [🔥TaylorSeer Calibrator](./docs/User_Guide.md#taylorseer-calibrator)
|
|
467
|
+
- [⚡️Hybrid Cache CFG](./docs/User_Guide.md#️hybrid-cache-cfg)
|
|
468
|
+
- [⚙️Torch Compile](./docs/User_Guide.md#️torch-compile)
|
|
469
|
+
- [🛠Metrics CLI](./docs/User_Guide.md#metrics-cli)
|
|
470
|
+
- [📚API Documents](./docs/User_Guide.md#api-documentation)
|
|
705
471
|
|
|
706
472
|
## 👋Contribute
|
|
707
473
|
<div id="contribute"></div>
|
|
708
474
|
|
|
709
|
-
How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](
|
|
475
|
+
How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](https://github.com/vipshop/cache-dit/raw/main/CONTRIBUTE.md).
|
|
710
476
|
|
|
711
477
|
<div align='center'>
|
|
712
478
|
<a href="https://star-history.com/#vipshop/cache-dit&Date">
|
|
@@ -730,7 +496,7 @@ The **cache-dit** codebase is adapted from FBCache. Over time its codebase diver
|
|
|
730
496
|
|
|
731
497
|
```BibTeX
|
|
732
498
|
@misc{cache-dit@2025,
|
|
733
|
-
title={cache-dit: A Unified, Flexible and Training-free Cache Acceleration Framework for
|
|
499
|
+
title={cache-dit: A Unified, Flexible and Training-free Cache Acceleration Framework for Diffusers.},
|
|
734
500
|
url={https://github.com/vipshop/cache-dit.git},
|
|
735
501
|
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
|
|
736
502
|
author={vipshop.com},
|