cache-dit 0.3.1__py3-none-any.whl → 0.3.3__py3-none-any.whl

This diff shows the changes between two publicly released versions of the package, as they appear in their respective public registries. It is provided for informational purposes only.

Potentially problematic release.



Files changed (34)
  1. cache_dit/__init__.py +1 -0
  2. cache_dit/_version.py +2 -2
  3. cache_dit/cache_factory/__init__.py +3 -6
  4. cache_dit/cache_factory/block_adapters/block_adapters.py +21 -64
  5. cache_dit/cache_factory/cache_adapters/__init__.py +0 -1
  6. cache_dit/cache_factory/cache_adapters/cache_adapter.py +82 -21
  7. cache_dit/cache_factory/cache_blocks/__init__.py +4 -0
  8. cache_dit/cache_factory/cache_blocks/offload_utils.py +115 -0
  9. cache_dit/cache_factory/cache_blocks/pattern_base.py +3 -0
  10. cache_dit/cache_factory/cache_contexts/__init__.py +10 -8
  11. cache_dit/cache_factory/cache_contexts/cache_context.py +186 -117
  12. cache_dit/cache_factory/cache_contexts/cache_manager.py +63 -131
  13. cache_dit/cache_factory/cache_contexts/calibrators/__init__.py +132 -0
  14. cache_dit/cache_factory/cache_contexts/{v2/calibrators → calibrators}/foca.py +1 -1
  15. cache_dit/cache_factory/cache_contexts/{v2/calibrators → calibrators}/taylorseer.py +7 -2
  16. cache_dit/cache_factory/cache_interface.py +128 -111
  17. cache_dit/cache_factory/params_modifier.py +87 -0
  18. cache_dit/metrics/__init__.py +3 -1
  19. cache_dit/utils.py +12 -21
  20. {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/METADATA +200 -434
  21. {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/RECORD +27 -31
  22. cache_dit/cache_factory/cache_adapters/v2/__init__.py +0 -3
  23. cache_dit/cache_factory/cache_adapters/v2/cache_adapter_v2.py +0 -524
  24. cache_dit/cache_factory/cache_contexts/taylorseer.py +0 -102
  25. cache_dit/cache_factory/cache_contexts/v2/__init__.py +0 -13
  26. cache_dit/cache_factory/cache_contexts/v2/cache_context_v2.py +0 -288
  27. cache_dit/cache_factory/cache_contexts/v2/cache_manager_v2.py +0 -799
  28. cache_dit/cache_factory/cache_contexts/v2/calibrators/__init__.py +0 -81
  29. /cache_dit/cache_factory/cache_blocks/{utils.py → pattern_utils.py} +0 -0
  30. /cache_dit/cache_factory/cache_contexts/{v2/calibrators → calibrators}/base.py +0 -0
  31. {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/WHEEL +0 -0
  32. {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/entry_points.txt +0 -0
  33. {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/licenses/LICENSE +0 -0
  34. {cache_dit-0.3.1.dist-info → cache_dit-0.3.3.dist-info}/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: cache_dit
- Version: 0.3.1
+ Version: 0.3.3
  Summary: A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗Diffusers.
  Author: DefTruth, vipshop.com, etc.
  Maintainer: DefTruth, vipshop.com, etc
@@ -45,6 +45,8 @@ Dynamic: provides-extra
  Dynamic: requires-dist
  Dynamic: requires-python

+ <a href="./README.md">📚English</a> | <a href="./README_CN.md">📚中文阅读</a>
+
  <div align="center">
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-logo.png height="120">

@@ -57,12 +59,12 @@ Dynamic: requires-python
  <img src=https://img.shields.io/badge/PRs-welcome-9cf.svg >
  <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
  <img src=https://static.pepy.tech/badge/cache-dit >
+ <img src=https://img.shields.io/github/stars/vipshop/cache-dit.svg?style=dark >
  <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
- <img src=https://img.shields.io/badge/Release-v0.3-brightgreen.svg >
  </div>
  <p align="center">
- <b><a href="#unified">📚Unified Cache APIs</a></b> | <a href="#forward-pattern-matching">📚Forward Pattern Matching</a> | <a href="#automatic-block-adapter">📚Automatic Block Adapter</a><br>
- <a href="#hybird-forward-pattern">📚Hybrid Forward Pattern</a> | <a href="#dbcache">📚DBCache</a> | <a href="#taylorseer">📚TaylorSeer Calibrator</a> | <a href="#cfg">📚Cache CFG</a><br>
+ <b><a href="#unified">📚Unified Cache APIs</a></b> | <a href="#forward-pattern-matching">📚Forward Pattern Matching</a> | <a href="./docs/User_Guide.md">📚Automatic Block Adapter</a><br>
+ <a href="./docs/User_Guide.md">📚Hybrid Forward Pattern</a> | <a href="#dbcache">📚DBCache</a> | <a href="./docs/User_Guide.md">📚TaylorSeer Calibrator</a> | <a href="./docs/User_Guide.md">📚Cache CFG</a><br>
  <a href="#benchmarks">📚Text2Image DrawBench</a> | <a href="#benchmarks">📚Text2Image Distillation DrawBench</a>
  </p>
  <p align="center">
@@ -74,6 +76,8 @@ Dynamic: requires-python
  🔥<a href="#supported">Chroma</a> | <a href="#supported">Sana</a> | <a href="#supported">Allegro</a> | <a href="#supported">Mochi</a> | <a href="#supported">SD 3/3.5</a> | <a href="#supported">Amused</a> | <a href="#supported"> ... </a> | <a href="#supported">DiT-XL</a>🔥
  </p>
  </div>
+
+
  <div align='center'>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/gifs/wan2.2.C0_Q0_NONE.gif width=124px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/gifs/wan2.2.C1_Q0_DBCACHE_F1B0_W2M8MC2_T1O2_R0.08.gif width=124px>
@@ -85,12 +89,6 @@ Dynamic: requires-python
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux.C0_Q0_NONE_T23.69s.png width=90px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux.C0_Q0_DBCACHE_F1B0_W4M0MC0_T1O2_R0.15_S16_T11.39s.png width=90px>
  <p><b>🔥Qwen-Image</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.8x↑🎉 | <b>FLUX.1-dev</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:2.1x↑🎉</p>
- <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext-cat.C0_L0_Q0_NONE.png width=100px>
- <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_NONE.png width=100px>
- <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F8B0_W8M0MC0_T0O2_R0.08_S10.png width=100px>
- <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W8M0MC2_T0O2_R0.12_S12.png width=100px>
- <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W2M0MC2_T0O2_R0.15_S15.png width=100px>
- <p><b>🔥FLUX-Kontext-dev</b> | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉</p>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-lightning.4steps.C0_L1_Q0_NONE.png width=160px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-lightning.4steps.C0_L1_Q0_DBCACHE_F16B16_W2M1MC1_T0O2_R0.9_S1.png width=160px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/hunyuan-image-2.1.C0_L0_Q1_fp8_w8a16_wo_NONE.png width=90px>
@@ -100,7 +98,22 @@ Dynamic: requires-python
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-edit.C0_L0_Q0_NONE.png width=125px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-edit.C0_L0_Q0_DBCACHE_F8B0_W8M0MC0_T0O2_R0.08_S18.png width=125px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/qwen-image-edit.C0_L0_Q0_DBCACHE_F1B0_W8M0MC2_T0O2_R0.12_S24.png width=125px>
- <p><b>🔥Qwen-Image-Edit</b> | Input w/o Edit | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.6x↑🎉 | 1.9x↑🎉 </p>
+ <p><b>🔥Qwen-Image-Edit</b> | Input w/o Edit | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.6x↑🎉 | 1.9x↑🎉
+ <br>♥️ Please consider to leave a <b>⭐️ Star</b> to support us ~ ♥️
+ </p>
+ </div>
+
+ <details align='center'>
+
+ <summary>Click here to show more Image/Video cases</summary>
+
+ <div align='center'>
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext-cat.C0_L0_Q0_NONE.png width=100px>
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_NONE.png width=100px>
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F8B0_W8M0MC0_T0O2_R0.08_S10.png width=100px>
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W8M0MC2_T0O2_R0.12_S12.png width=100px>
+ <img src=https://github.com/vipshop/cache-dit/raw/main/assets/flux-kontext.C0_L0_Q0_DBCACHE_F1B0_W2M0MC2_T0O2_R0.15_S15.png width=100px>
+ <p><b>🔥FLUX-Kontext-dev</b> | Baseline | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉</p>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/hidream.C0_L0_Q0_NONE.png width=100px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/hidream.C0_L0_Q0_DBCACHE_F1B0_W8M0MC0_T0O2_R0.08_S24.png width=100px>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cogview4.C0_L0_Q0_NONE.png width=100px>
@@ -160,24 +173,25 @@ Dynamic: requires-python
  <p><b>🔥Asumed</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.1x↑🎉 | 1.2x↑🎉 | <b>DiT-XL-256</b> | <a href="https://github.com/vipshop/cache-dit">+cache-dit</a>:1.8x↑🎉
  <br>♥️ Please consider to leave a <b>⭐️ Star</b> to support us ~ ♥️</p>
  </div>
+ </details>

  ## 🔥News

- - [2025-09-10] 🎉Day 1 support [**HunyuanImage-2.1**](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) with **1.7x↑🎉** speedup! Check this [example](./examples/pipeline/run_hunyuan_image_2.1.py).
- - [2025-09-08] 🔥[**Qwen-Image-Lightning**](./examples/pipeline/run_qwen_image_lightning.py) **7.1/3.5 steps🎉** inference with **[DBCache: F16B16](https://github.com/vipshop/cache-dit)**.
- - [2025-09-03] 🎉[**Wan2.2-MoE**](https://github.com/Wan-Video) **2.4x↑🎉** speedup! Please refer to [run_wan_2.2.py](./examples/pipeline/run_wan_2.2.py) as an example.
- - [2025-08-19] 🔥[**Qwen-Image-Edit**](https://github.com/QwenLM/Qwen-Image) **2x↑🎉** speedup! Check the example: [run_qwen_image_edit.py](./examples/pipeline/run_qwen_image_edit.py).
- - [2025-08-11] 🔥[**Qwen-Image**](https://github.com/QwenLM/Qwen-Image) **1.8x↑🎉** speedup! Please refer to [run_qwen_image.py](./examples/pipeline/run_qwen_image.py) as an example.
- - [2025-07-13] 🎉[**FLUX.1-dev**](https://github.com/xlite-dev/flux-faster) **3.3x↑🎉** speedup! NVIDIA L20 with **[cache-dit](https://github.com/vipshop/cache-dit)** + **compile + FP8 DQ**.
+ - [2025-09-10] 🎉Day 1 support [**HunyuanImage-2.1**](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) with **1.7x↑🎉** speedup! Check this [example](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_hunyuan_image_2.1.py).
+ - [2025-09-08] 🔥[**Qwen-Image-Lightning**](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image_lightning.py) **7.1/3.5 steps🎉** inference with **[DBCache: F16B16](https://github.com/vipshop/cache-dit)**.
+ - [2025-09-03] 🎉[**Wan2.2-MoE**](https://github.com/Wan-Video) **2.4x↑🎉** speedup! Please refer to [run_wan_2.2.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_wan_2.2.py) as an example.
+ - [2025-08-19] 🔥[**Qwen-Image-Edit**](https://github.com/QwenLM/Qwen-Image) **2x↑🎉** speedup! Check the example: [run_qwen_image_edit.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image_edit.py).
+ - [2025-08-11] 🔥[**Qwen-Image**](https://github.com/QwenLM/Qwen-Image) **1.8x↑🎉** speedup! Please refer to [run_qwen_image.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image.py) as an example.

  <details>
  <summary> Previous News </summary>

+ - [2025-07-13] 🎉[**FLUX.1-dev**](https://github.com/xlite-dev/flux-faster) **3.3x↑🎉** speedup! NVIDIA L20 with **[cache-dit](https://github.com/vipshop/cache-dit)** + **compile + FP8 DQ**.
  - [2025-09-08] 🎉First caching mechanism in [Qwen-Image-Lightning](https://github.com/ModelTC/Qwen-Image-Lightning) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check this [PR](https://github.com/ModelTC/Qwen-Image-Lightning/pull/35).
  - [2025-09-08] 🎉First caching mechanism in [Wan2.2](https://github.com/Wan-Video/Wan2.2) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check this [PR](https://github.com/Wan-Video/Wan2.2/pull/127) for more details.
  - [2025-08-12] 🎉First caching mechanism in [QwenLM/Qwen-Image](https://github.com/QwenLM/Qwen-Image) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check this [PR](https://github.com/QwenLM/Qwen-Image/pull/61).
- - [2025-09-01] 📚[**Hybird Forward Pattern**](#unified) is supported! Please check [FLUX.1-dev](./examples/run_flux_adapter.py) as an example.
- - [2025-08-10] 🔥[**FLUX.1-Kontext-dev**](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) is supported! Please refer [run_flux_kontext.py](./examples/pipeline/run_flux_kontext.py) as an example.
+ - [2025-09-01] 📚[**Hybird Forward Pattern**](#unified) is supported! Please check [FLUX.1-dev](https://github.com/vipshop/cache-dit/blob/main/examples/run_flux_adapter.py) as an example.
+ - [2025-08-10] 🔥[**FLUX.1-Kontext-dev**](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) is supported! Please refer [run_flux_kontext.py](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_flux_kontext.py) as an example.
  - [2025-07-18] 🎉First caching mechanism in [🤗huggingface/flux-fast](https://github.com/huggingface/flux-fast) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check the [PR](https://github.com/huggingface/flux-fast/pull/13).

  </details>
@@ -187,20 +201,14 @@ Dynamic: requires-python
  <div id="contents"></div>

  - [⚙️Installation](#️installation)
- - [🔥Benchmarks](#benchmarks)
- - [🔥Supported Pipelines](#supported)
- - [🎉Unified Cache APIs](#unified)
- - [📚Forward Pattern Matching](#forward-pattern-matching)
- - [♥️Cache with One-line Code](#%EF%B8%8Fcache-acceleration-with-one-line-code)
- - [🔥Automatic Block Adapter](#automatic-block-adapter)
- - [📚Hybird Forward Pattern](#automatic-block-adapter)
- - [📚Implement Patch Functor](#implement-patch-functor)
- - [🤖Cache Acceleration Stats](#cache-acceleration-stats-summary)
+ - [🔥Quick Start](#quick-start)
+ - [📚Pattern Matching](#forward-pattern-matching)
  - [⚡️Dual Block Cache](#dbcache)
  - [🔥TaylorSeer Calibrator](#taylorseer)
- - [⚡️Hybrid Cache CFG](#cfg)
- - [⚙️Torch Compile](#compile)
- - [🛠Metrics CLI](#metrics)
+ - [📚Hybrid Cache CFG](#cfg)
+ - [🔥Benchmarks](#benchmarks)
+ - [🎉User Guide](#user-guide)
+ - [©️Citations](#citations)

  ## ⚙️Installation

@@ -217,11 +225,35 @@ Or you can install the latest develop version from GitHub:
  pip3 install git+https://github.com/vipshop/cache-dit.git
  ```

- ## 🔥Supported Pipelines
+ ## 🔥Quick Start
+
+ <div id="unified"></div>
+
+ <div id="quick-start"></div>
+
+ In most cases, you only need to call ♥️**one-line**♥️ of code, that is `cache_dit.enable_cache(...)`. After this API is called, you just need to call the pipe as normal. The `pipe` param can be **any** Diffusion Pipeline. Please refer to [Qwen-Image](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline/run_qwen_image.py) as an example.
+
+ ```python
+ >>> import cache_dit
+ >>> from diffusers import DiffusionPipeline
+ >>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
+ >>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
+ >>> output = pipe(...) # Just call the pipe as normal.
+ >>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
+ >>> cache_dit.disable_cache(pipe) # Disable cache and run original pipe.
+ ```
+
+ ## 📚Forward Pattern Matching

  <div id="supported"></div>

- Currently, **cache-dit** library supports almost **Any** Diffusion Transformers (with **Transformer Blocks** that match the specific Input and Output **patterns**). Please check [🎉Examples](./examples/pipeline) for more details. Here are just some of the tested models listed.
+ <div id="forward-pattern-matching"></div>
+
+ cache-dit works by matching specific input/output patterns as shown below.
+
+ ![](https://github.com/vipshop/cache-dit/raw/main/assets/patterns-v1.png)
+
+ Please check [🎉Examples](https://github.com/vipshop/cache-dit/blob/main/examples/pipeline) for more details. Here are just some of the tested models listed.

  ```python
  >>> import cache_dit
@@ -235,64 +267,128 @@ Currently, **cache-dit** library supports almost **Any** Diffusion Transformers
  <details>
  <summary> Show all pipelines </summary>

- - [🚀HunyuanImage-2.1](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Qwen-Image-Lightning](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Qwen-Image-Edit](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Qwen-Image](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀FLUX.1-Kontext-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀CogView4](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Wan2.2-T2V](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀HiDream-I1-Full](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀HunyuanDiT](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀SkyReelsV2](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Chroma1-HD](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀CogView3-Plus](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀VisualCloze](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀LTXVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀OmniGen](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Lumina2](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀AuraFlow-v0.3](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀PixArt-Alpha](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀PixArt-Sigma](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀NVIDIA Sana](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀SD-3/3.5](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀ConsisID](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Allegro](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀Amused](https://github.com/vipshop/cache-dit/raw/main/examples)
- - [🚀DiT-XL](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀HunyuanImage-2.1](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Qwen-Image-Lightning](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Qwen-Image-Edit](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Qwen-Image](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀FLUX.1-Kontext-dev](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀CogView4](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Wan2.2-T2V](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀HiDream-I1-Full](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀HunyuanDiT](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀SkyReelsV2](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Chroma1-HD](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀CogView3-Plus](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀CogVideoX](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀VisualCloze](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀LTXVideo](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀OmniGen](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Lumina2](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀AuraFlow-v0.3](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀PixArt-Alpha](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀PixArt-Sigma](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀NVIDIA Sana](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀SD-3/3.5](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀ConsisID](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Allegro](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀Amused](https://github.com/vipshop/cache-dit/blob/main/examples)
+ - [🚀DiT-XL](https://github.com/vipshop/cache-dit/blob/main/examples)
  - ...

  </details>

- ## 🔥Benchmarks
+ ## ⚡️DBCache: Dual Block Cache

- <div id="benchmarks"></div>
+ <div id="dbcache"></div>
+
+ ![](https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-v1.png)
+
+ **DBCache**: **Dual Block Caching** for Diffusion Transformers. Different configurations of compute blocks (**F8B12**, etc.) can be customized in DBCache, enabling a balanced trade-off between performance and precision. Moreover, it can be entirely **training**-**free**. Please Check the [DBCache](https://github.com/vipshop/cache-dit/blob/main/docs/DBCache.md) and [User Guide](https://github.com/vipshop/cache-dit/blob/main/docs/User_Guide.md#dbcache) docs for more design details.
+
+ ```python
+ # Default options, F8B0, 8 warmup steps, and unlimited cached
+ # steps for good balance between performance and precision
+ cache_dit.enable_cache(pipe_or_adapter)
+
+ # Custom options, F8B8, higher precision
+ from cache_dit import BasicCacheConfig
+
+ cache_dit.enable_cache(
+     pipe_or_adapter,
+     cache_config=BasicCacheConfig(
+         max_warmup_steps=8,  # steps do not cache
+         max_cached_steps=-1,  # -1 means no limit
+         Fn_compute_blocks=8,  # Fn, F8, etc.
+         Bn_compute_blocks=8,  # Bn, B8, etc.
+         residual_diff_threshold=0.12,
+     ),
+ )
+ ```
+
+ ## 🔥TaylorSeer Calibrator
+
+ <div id="taylorseer"></div>
+
+ The [TaylorSeers](https://huggingface.co/papers/2503.06923) algorithm further improves the precision of DBCache in cases where the cached steps are large (Hybrid TaylorSeer + DBCache). At timesteps with significant intervals, the feature similarity in diffusion models decreases substantially, significantly harming the generation quality.

- cache-dit will support more mainstream Cache acceleration algorithms in the future. More benchmarks will be released, please stay tuned for update. Here, only the results of some precision and performance benchmarks are presented. The test dataset is **DrawBench**. For a complete benchmark, please refer to [📚Benchmarks](./bench/).
+ TaylorSeer employs a differential method to approximate the higher-order derivatives of features and predict features in future timesteps with Taylor series expansion. The TaylorSeer implemented in CacheDiT supports both hidden states and residual cache types. F_pred can be a residual cache or a hidden-state cache.

- ### 📚Text2Image DrawBench: FLUX.1-dev
+ ```python
+ from cache_dit import BasicCacheConfig, TaylorSeerCalibratorConfig

- Comparisons between different FnBn compute block configurations show that **more compute blocks result in higher precision**. For example, the F8B0_W8MC0 configuration achieves the best Clip Score (33.007) and ImageReward (1.0333). **Device**: NVIDIA L20. **F**: Fn_compute_blocks, **B**: Bn_compute_blocks, 50 steps.
+ cache_dit.enable_cache(
+     pipe_or_adapter,
+     # Basic DBCache w/ FnBn configurations
+     cache_config=BasicCacheConfig(
+         max_warmup_steps=8,  # steps do not cache
+         max_cached_steps=-1,  # -1 means no limit
+         Fn_compute_blocks=8,  # Fn, F8, etc.
+         Bn_compute_blocks=8,  # Bn, B8, etc.
+         residual_diff_threshold=0.12,
+     ),
+     # Then, you can use the TaylorSeer Calibrator to approximate
+     # the values in cached steps, taylorseer_order default is 1.
+     calibrator_config=TaylorSeerCalibratorConfig(
+         taylorseer_order=1,
+     ),
+ )
+ ```

+ > [!TIP]
+ > The `Bn_compute_blocks` parameter of DBCache can be set to `0` if you use TaylorSeer as the calibrator for approximate hidden states. DBCache's `Bn_compute_blocks` also acts as a calibrator, so you can choose either `Bn_compute_blocks` > 0 or TaylorSeer. We recommend using the configuration scheme of TaylorSeer + DBCache FnB0.

- | Config | Clip Score(↑) | ImageReward(↑) | PSNR(↑) | TFLOPs(↓) | SpeedUp(↑) |
- | --- | --- | --- | --- | --- |
- | [**FLUX.1**-dev]: 50 steps | 32.9217 | 1.0412 | INF | 3726.87 | 1.00x |
- | F8B0_W4MC0_R0.08 | 32.9871 | 1.0370 | 33.8317 | 2064.81 | 1.80x |
- | F8B0_W4MC2_R0.12 | 32.9535 | 1.0185 | 32.7346 | 1935.73 | 1.93x |
- | F8B0_W4MC3_R0.12 | 32.9234 | 1.0085 | 32.5385 | 1816.58 | 2.05x |
- | F4B0_W4MC3_R0.12 | 32.8981 | 1.0130 | 31.8031 | 1507.83 | 2.47x |
- | F4B0_W4MC4_R0.12 | 32.8384 | 1.0065 | 31.5292 | 1400.08 | 2.66x |
+ ## 📚Hybrid Cache CFG
+
+ <div id="cfg"></div>
+
+ cache-dit supports caching for CFG (classifier-free guidance). For models that fuse CFG and non-CFG into a single forward step, or models that do not include CFG (classifier-free guidance) in the forward step, please set `enable_separate_cfg` parameter to `False (default, None)`. Otherwise, set it to `True`.
+
+ ```python
+ from cache_dit import BasicCacheConfig
+
+ cache_dit.enable_cache(
+     pipe_or_adapter,
+     cache_config=BasicCacheConfig(
+         ...,
+         # For example, set it as True for Wan 2.1/Qwen-Image
+         # and set it as False for FLUX.1, HunyuanVideo, CogVideoX, etc.
+         enable_separate_cfg=True,
+     ),
+ )
+ ```
+
+ ## 🔥Benchmarks
+
+ <div id="benchmarks"></div>

- The comparison between **cache-dit: DBCache** and algorithms such as Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer and FoCa is as follows. Now, in the comparison with a speedup ratio less than **3x**, cache-dit achieved the best accuracy. Please check [📚How to Reproduce?](./bench/) for more details.
+ The comparison between **cache-dit: DBCache** and algorithms such as Δ-DiT, Chipmunk, FORA, DuCa, TaylorSeer and FoCa is as follows. Now, in the comparison with a speedup ratio less than **3x**, cache-dit achieved the best accuracy. Surprisingly, cache-dit: DBCache still works in the extremely few-step distill model. For a complete benchmark, please refer to [📚Benchmarks](https://github.com/vipshop/cache-dit/raw/main/bench/).

  | Method | TFLOPs(↓) | SpeedUp(↑) | ImageReward(↑) | Clip Score(↑) |
  | --- | --- | --- | --- | --- |
@@ -350,363 +446,33 @@ NOTE: Except for DBCache, other performance data are referenced from the paper [
350
446
 
351
447
  </details>
352
448
 
353
- ### 📚Text2Image Distillation DrawBench: Qwen-Image-Lightning
354
-
355
- Surprisingly, cache-dit: DBCache still works in the extremely few-step distill model. For example, **Qwen-Image-Lightning w/ 4 steps**, with the F16B16 configuration, the PSNR is 34.8163, the Clip Score is 35.6109, and the ImageReward is 1.2614. It maintained a relatively high precision.
356
-
357
- | Config | PSNR(↑) | Clip Score(↑) | ImageReward(↑) | TFLOPs(↓) | SpeedUp() |
358
- |----------------------------|-----------|------------|--------------|----------|------------|
359
- | [**Lightning**]: 4 steps | INF | 35.5797 | 1.2630 | 274.33 | 1.00x |
360
- | F24B24_W2MC1_R0.8 | 36.3242 | 35.6224 | 1.2630 | 264.74 | 1.04x |
361
- | F16B16_W2MC1_R0.8 | 34.8163 | 35.6109 | 1.2614 | 244.25 | 1.12x |
362
- | F12B12_W2MC1_R0.8 | 33.8953 | 35.6535 | 1.2549 | 234.63 | 1.17x |
363
- | F8B8_W2MC1_R0.8 | 33.1374 | 35.7284 | 1.2517 | 224.29 | 1.22x |
364
- | F1B0_W2MC1_R0.8 | 31.8317 | 35.6651 | 1.2397 | 206.90 | 1.33x |
365
-
366
- ## 🎉Unified Cache APIs
367
-
368
- <div id="unified"></div>
369
-
370
- ### 📚Forward Pattern Matching
371
-
372
- Currently, for any **Diffusion** models with **Transformer Blocks** that match the specific **Input/Output patterns**, we can use the **Unified Cache APIs** from **cache-dit**, namely, the `cache_dit.enable_cache(...)` API. The **Unified Cache APIs** are currently in the experimental phase; please stay tuned for updates. The supported patterns are listed as follows:
373
-
374
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/patterns-v1.png)
375
-
376
- ### ♥️Cache Acceleration with One-line Code
377
-
378
- In most cases, you only need to call **one-line** of code, that is `cache_dit.enable_cache(...)`. After this API is called, you just need to call the pipe as normal. The `pipe` param can be **any** Diffusion Pipeline. Please refer to [Qwen-Image](./examples/pipeline/run_qwen_image.py) as an example.
379
-
380
- ```python
381
- import cache_dit
382
- from diffusers import DiffusionPipeline
383
-
384
- # Can be any diffusion pipeline
385
- pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
386
-
387
- # One-line code with default cache options.
388
- cache_dit.enable_cache(pipe)
389
-
390
- # Just call the pipe as normal.
391
- output = pipe(...)
392
-
393
- # Disable cache and run original pipe.
394
- cache_dit.disable_cache(pipe)
395
- ```
396
-
397
- ### 🔥Automatic Block Adapter
398
-
399
- But in some cases, you may have a **modified** Diffusion Pipeline or Transformer that is not located in the diffusers library or not officially supported by **cache-dit** at this time. The **BlockAdapter** can help you solve this problems. Please refer to [🔥Qwen-Image w/ BlockAdapter](./examples/adapter/run_qwen_image_adapter.py) as an example.
400
-
401
- ```python
402
- import cache_dit
- from cache_dit import ForwardPattern, BlockAdapter
403
-
404
- # Use 🔥BlockAdapter with `auto` mode.
405
- cache_dit.enable_cache(
406
- BlockAdapter(
407
- # Any DiffusionPipeline, Qwen-Image, etc.
408
- pipe=pipe, auto=True,
409
- # Check the `📚Forward Pattern Matching` documentation and hack the code
410
- # of Qwen-Image; you will find that it satisfies `FORWARD_PATTERN_1`.
411
- forward_pattern=ForwardPattern.Pattern_1,
412
- ),
413
- )
414
-
415
- # Or, manually setup transformer configurations.
416
- cache_dit.enable_cache(
417
- BlockAdapter(
418
- pipe=pipe, # Qwen-Image, etc.
419
- transformer=pipe.transformer,
420
- blocks=pipe.transformer.transformer_blocks,
421
- forward_pattern=ForwardPattern.Pattern_1,
422
- ),
423
- )
424
- ```
425
- For such situations, **BlockAdapter** can help you quickly apply various cache acceleration features to your own Diffusion Pipelines and Transformers. Please check the [📚BlockAdapter.md](./docs/BlockAdapter.md) for more details.
426
-
427
- ### 📚Hybrid Forward Pattern
428
-
429
- Sometimes, a Transformer class contains more than one set of transformer `blocks`. For example, **FLUX.1** (HiDream, Chroma, etc.) contains `transformer_blocks` and `single_transformer_blocks` (with different forward patterns). The **BlockAdapter** can also help you solve this problem. Please refer to [📚FLUX.1](./examples/adapter/run_flux_adapter.py) as an example.
430
-
431
- ```python
432
- # For diffusers <= 0.34.0, FLUX.1 transformer_blocks and
433
- # single_transformer_blocks have different forward patterns.
434
- cache_dit.enable_cache(
435
- BlockAdapter(
436
- pipe=pipe, # FLUX.1, etc.
437
- transformer=pipe.transformer,
438
- blocks=[
439
- pipe.transformer.transformer_blocks,
440
- pipe.transformer.single_transformer_blocks,
441
- ],
442
- forward_pattern=[
443
- ForwardPattern.Pattern_1,
444
- ForwardPattern.Pattern_3,
445
- ],
446
- ),
447
- )
448
- ```
449
-
450
- Sometimes you may have even more complex cases, such as **Wan 2.2 MoE**, which has more than one Transformer (namely `transformer` and `transformer_2`) in its structure. Fortunately, **cache-dit** can also handle this situation very well. Please refer to [📚Wan 2.2 MoE](./examples/pipeline/run_wan_2.2.py) as an example.
451
-
452
- ```python
453
- import cache_dit
- from cache_dit import ForwardPattern, BlockAdapter, ParamsModifier
454
-
455
- cache_dit.enable_cache(
456
- BlockAdapter(
457
- pipe=pipe,
458
- transformer=[
459
- pipe.transformer,
460
- pipe.transformer_2,
461
- ],
462
- blocks=[
463
- pipe.transformer.blocks,
464
- pipe.transformer_2.blocks,
465
- ],
466
- forward_pattern=[
467
- ForwardPattern.Pattern_2,
468
- ForwardPattern.Pattern_2,
469
- ],
470
- # Setup different cache params for each 'blocks'. You can
471
- # pass any specific cache params to ParamsModifier; the old
472
- # value will be overwritten by the new one.
473
- params_modifiers=[
474
- ParamsModifier(
475
- max_warmup_steps=4,
476
- max_cached_steps=8,
477
- ),
478
- ParamsModifier(
479
- max_warmup_steps=2,
480
- max_cached_steps=20,
481
- ),
482
- ],
483
- has_separate_cfg=True,
484
- ),
485
- )
486
- ```
487
- ### 📚Implement Patch Functor
488
-
489
- For any pattern not in {0...5}, we introduce the simple abstract concept of a **Patch Functor**. Users can implement a subclass of Patch Functor to convert an unknown pattern into a known one; for some models, users may also need to fuse the operations inside the blocks for loop into the block forward.
490
-
491
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/patch-functor.png)
492
-
493
- Some Patch functors have already been provided in cache-dit: [📚HiDreamPatchFunctor](./src/cache_dit/cache_factory/patch_functors/functor_hidream.py), [📚ChromaPatchFunctor](./src/cache_dit/cache_factory/patch_functors/functor_chroma.py), etc. After implementing Patch Functor, users need to set the `patch_functor` property of **BlockAdapter**.
494
-
495
- ```python
496
- @BlockAdapterRegistry.register("HiDream")
497
- def hidream_adapter(pipe, **kwargs) -> BlockAdapter:
498
- from diffusers import HiDreamImageTransformer2DModel
499
- from cache_dit.cache_factory.patch_functors import HiDreamPatchFunctor
500
-
501
- assert isinstance(pipe.transformer, HiDreamImageTransformer2DModel)
502
- return BlockAdapter(
503
- pipe=pipe,
504
- transformer=pipe.transformer,
505
- blocks=[
506
- pipe.transformer.double_stream_blocks,
507
- pipe.transformer.single_stream_blocks,
508
- ],
509
- forward_pattern=[
510
- ForwardPattern.Pattern_0,
511
- ForwardPattern.Pattern_3,
512
- ],
513
- # NOTE: Setup your custom patch functor here.
514
- patch_functor=HiDreamPatchFunctor(),
515
- **kwargs,
516
- )
517
- ```
518
-
519
- ### 🤖Cache Acceleration Stats Summary
520
-
521
- After each inference of `pipe(...)` finishes, you can call the `cache_dit.summary()` API on the pipe to get the details of the **Cache Acceleration Stats** for the current inference.
522
- ```python
523
- stats = cache_dit.summary(pipe)
524
- ```
525
-
526
- You can set the `details` param to `True` to show more cache stats in markdown table format. Sometimes, this may help you analyze which residual diff threshold values would work better.
527
-
528
- ```
529
- ⚡️Cache Steps and Residual Diffs Statistics: QwenImagePipeline
530
-
531
- | Cache Steps | Diffs Min | Diffs P25 | Diffs P50 | Diffs P75 | Diffs P95 | Diffs Max |
532
- |-------------|-----------|-----------|-----------|-----------|-----------|-----------|
533
- | 23 | 0.045 | 0.084 | 0.114 | 0.147 | 0.241 | 0.297 |
534
- ```
535
-
536
- ## ⚡️DBCache: Dual Block Cache
537
-
538
- <div id="dbcache"></div>
539
-
540
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/dbcache-v1.png)
541
-
542
- **DBCache**: **Dual Block Caching** for Diffusion Transformers. Different configurations of compute blocks (**F8B12**, etc.) can be customized in DBCache, enabling a balanced trade-off between performance and precision. Moreover, it is entirely **training-free**. Please check the [DBCache.md](./docs/DBCache.md) docs for more design details.
543
-
544
- - **Fn**: Specifies that DBCache uses the **first n** Transformer blocks to fit the information at time step t, enabling the calculation of a more stable L1 diff and delivering more accurate information to subsequent blocks.
545
- - **Bn**: Further fuses approximate information in the **last n** Transformer blocks to enhance prediction accuracy. These blocks act as an auto-scaler for approximate hidden states that use residual cache.
546
-
547
- ```python
548
- import torch
- import cache_dit
549
- from diffusers import FluxPipeline
550
-
551
- pipe = FluxPipeline.from_pretrained(
552
- "black-forest-labs/FLUX.1-dev",
553
- torch_dtype=torch.bfloat16,
554
- ).to("cuda")
555
-
556
- # Default options, F8B0, 8 warmup steps, and unlimited cached
557
- # steps for good balance between performance and precision
558
- cache_dit.enable_cache(pipe)
559
-
560
- # Custom options, F8B8, higher precision
561
- cache_dit.enable_cache(
562
- pipe,
563
- max_warmup_steps=8, # steps do not cache
564
- max_cached_steps=-1, # -1 means no limit
565
- Fn_compute_blocks=8, # Fn, F8, etc.
566
- Bn_compute_blocks=8, # Bn, B8, etc.
567
- residual_diff_threshold=0.12,
568
- )
569
- ```
570
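The `residual_diff_threshold` decision above can be illustrated with a minimal pure-Python sketch, assuming a relative L1 metric over the first Fn blocks' residuals (`relative_l1_diff` and `can_use_cache` are hypothetical names for illustration, not cache-dit APIs):

```python
# Hypothetical sketch (not cache-dit internals): the relative L1 distance
# between the current and previous residuals of the first Fn blocks decides
# whether the remaining blocks may reuse the cached residual.

def relative_l1_diff(curr, prev):
    """Mean |curr - prev|, normalized by mean |prev|."""
    num = sum(abs(c - p) for c, p in zip(curr, prev)) / len(curr)
    den = sum(abs(p) for p in prev) / len(prev)
    return num / max(den, 1e-8)

def can_use_cache(curr, prev, residual_diff_threshold=0.12):
    """Cache hit when the residual change stays under the threshold."""
    return relative_l1_diff(curr, prev) < residual_diff_threshold

prev = [1.0, -2.0, 3.0, -4.0]
print(can_use_cache([1.05, -2.1, 2.9, -3.9], prev))  # small change -> True
print(can_use_cache([2.0, -1.0, 4.0, -2.0], prev))   # large change -> False
```

Raising the threshold therefore trades precision for more cached (skipped) steps, which is why the stats summary reports the observed diff distribution.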
-
571
- <div align="center">
572
- <p align="center">
573
- DBCache, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
574
- </p>
575
- </div>
576
-
577
- |Baseline(L20x1)|F1B0 (0.08)|F1B0 (0.20)|F8B8 (0.15)|F12B12 (0.20)|F16B16 (0.20)|
578
- |:---:|:---:|:---:|:---:|:---:|:---:|
579
- |24.85s|15.59s|8.58s|15.41s|15.11s|17.74s|
580
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.08_S11.png width=105px> | <img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F1B0S1_R0.2_S19.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F8B8S1_R0.15_S15.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F12B12S4_R0.2_S16.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBCACHE_F16B16S4_R0.2_S13.png width=105px>|
581
-
582
- ## 🔥TaylorSeer Calibrator
583
-
584
- <div id="taylorseer"></div>
585
-
586
- We have supported the [TaylorSeers: From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers](https://arxiv.org/pdf/2503.06923) algorithm to further improve the precision of DBCache in cases where the cached steps are large, namely, **Hybrid TaylorSeer + DBCache**. At timesteps with significant intervals, the feature similarity in diffusion models decreases substantially, significantly harming the generation quality.
587
-
588
- $$
589
- \mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)=\mathcal{F}\left(x_t^l\right)+\sum_{i=1}^m \frac{\Delta^i \mathcal{F}\left(x_t^l\right)}{i!\cdot N^i}(-k)^i
590
- $$
591
-
592
- **TaylorSeer** employs a differential method to approximate the higher-order derivatives of features and predicts features at future timesteps via Taylor series expansion. The TaylorSeer implemented in cache-dit supports both hidden-state and residual cache types; that is, $\mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)$ can be a residual cache or a hidden-state cache.
593
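As a rough numeric illustration of the first-order case of the formula, a feature computed at full steps spaced N apart can be extrapolated k steps ahead from its finite difference (a hypothetical sketch, not cache-dit's implementation; `taylor_forecast` is an illustrative name):

```python
# First-order TaylorSeer-style forecast sketch: linear extrapolation from
# the finite difference of the last two fully computed feature values.

def taylor_forecast(f_curr, f_prev, N, k, order=1):
    """Predict the feature k steps after f_curr.

    f_curr, f_prev: feature values at the last two fully computed steps,
    which are N steps apart. Only order=1 is sketched here.
    """
    assert order == 1
    delta = f_curr - f_prev            # first-order finite difference
    return f_curr + delta * (k / N)    # linear extrapolation

# A feature evolving linearly in the step index is predicted exactly:
f = lambda t: 2.0 * t + 1.0
print(taylor_forecast(f_curr=f(10), f_prev=f(8), N=2, k=3))  # 27.0 == f(13)
```

Higher orders add further difference terms, matching the $\frac{\Delta^i \mathcal{F}}{i!\cdot N^i}(-k)^i$ terms in the formula above.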
-
594
- ```python
595
- from cache_dit import TaylorSeerCalibratorConfig
596
-
597
- cache_dit.enable_cache(
598
- pipe,
599
- # Basic DBCache w/ FnBn configurations
600
- max_warmup_steps=8, # steps do not cache
601
- max_cached_steps=-1, # -1 means no limit
602
- Fn_compute_blocks=8, # Fn, F8, etc.
603
- Bn_compute_blocks=8, # Bn, B8, etc.
604
- residual_diff_threshold=0.12,
605
- # Then, you can use the TaylorSeer Calibrator to approximate
606
- # the values in cached steps, taylorseer_order default is 1.
607
- calibrator_config=TaylorSeerCalibratorConfig(
608
- taylorseer_order=1,
609
- ),
610
- )
611
- ```
612
-
613
- > [!IMPORTANT]
614
- > Please note that if you use TaylorSeer as the calibrator for approximate hidden states, the **Bn** param of DBCache can be set to **0**. In essence, DBCache's Bn also acts as a calibrator, so you can choose either Bn > 0 or TaylorSeer. We recommend the **TaylorSeer** + **DBCache FnB0** configuration scheme.
615
-
616
- <div align="center">
617
- <p align="center">
618
- <b>DBCache F1B0 + TaylorSeer</b>, L20x1, Steps: 28, <br>"A cat holding a sign that says hello world with complex background"
619
- </p>
620
- </div>
621
-
622
- |Baseline(L20x1)|F1B0 (0.12)|+TaylorSeer|F1B0 (0.15)|+TaylorSeer|+compile|
623
- |:---:|:---:|:---:|:---:|:---:|:---:|
624
- |24.85s|12.85s|12.86s|10.27s|10.28s|8.48s|
625
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.12_S14_T12.85s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.12_S14_T12.86s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.15_S17_T10.27s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T10.28s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T8.48s.png width=105px>|
626
-
627
- ## ⚡️Hybrid Cache CFG
628
-
629
- <div id="cfg"></div>
630
-
631
- cache-dit supports caching for **CFG (classifier-free guidance)**. For models that fuse CFG and non-CFG into a single forward step, or models that do not use CFG in the forward step at all, leave the `enable_separate_cfg` param as **False** (the default is None). Otherwise, set it to **True**. For example:
632
-
633
- ```python
634
- cache_dit.enable_cache(
635
- pipe,
636
- ...,
637
- # CFG: classifier free guidance or not
638
- # For models that fuse CFG and non-CFG into a single forward step,
639
- # set enable_separate_cfg to False. For example, set it to True
640
- # for Wan 2.1/Qwen-Image and set it to False for FLUX.1, HunyuanVideo,
641
- # CogVideoX, Mochi, LTXVideo, Allegro, CogView3Plus, EasyAnimate, SD3, etc.
642
- enable_separate_cfg=True, # Wan 2.1, Qwen-Image, CogView4, Cosmos, SkyReelsV2, etc.
643
- # Compute cfg forward first or not, default False, namely,
644
- # 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
645
- cfg_compute_first=False,
646
- # Compute separate diff values for CFG and non-CFG step,
647
- # default True. If False, we will use the computed diff from
648
- # current non-CFG transformer step for current CFG step.
649
- cfg_diff_compute_separate=True,
650
- )
651
- ```
652
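The call ordering described by `cfg_compute_first` can be sketched as a simple parity rule (a hypothetical helper for illustration, not a cache-dit API):

```python
# With enable_separate_cfg=True, each denoising step issues two transformer
# calls; cfg_compute_first decides which half comes first.

def is_cfg_call(call_index: int, cfg_compute_first: bool = False) -> bool:
    """Return True if this transformer call is the CFG half of the step."""
    odd = call_index % 2 == 1
    return not odd if cfg_compute_first else odd

# Default (cfg_compute_first=False): 0, 2, 4 -> non-CFG; 1, 3, 5 -> CFG.
print([is_cfg_call(i) for i in range(4)])        # [False, True, False, True]
print([is_cfg_call(i, True) for i in range(4)])  # [True, False, True, False]
```

With `cfg_diff_compute_separate=False`, the diff computed on the non-CFG call of a step would also be reused for its CFG call.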
-
653
- ## ⚙️Torch Compile
654
-
655
- <div id="compile"></div>
656
-
657
- By the way, **cache-dit** is designed to be compatible with **torch.compile**. You can easily use cache-dit together with torch.compile to achieve even better performance. For example:
658
-
659
- ```python
660
- cache_dit.enable_cache(pipe)
661
-
662
- # Compile the Transformer module
663
- pipe.transformer = torch.compile(pipe.transformer)
664
- ```
665
- However, users intending to use **cache-dit** for DiT models with **dynamic input shapes** should consider increasing the **recompile limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
666
- ```python
667
- torch._dynamo.config.recompile_limit = 96 # default is 8
668
- torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
669
- ```
670
-
671
- Please check [perf.py](./bench/perf.py) for more details.
672
-
673
-
674
- ## 🛠Metrics CLI
675
-
676
- <div id="metrics"></div>
677
-
678
- You can utilize the APIs provided by cache-dit to quickly evaluate the accuracy losses caused by different cache configurations. For example:
679
-
680
- ```python
681
- from cache_dit.metrics import compute_psnr
682
- from cache_dit.metrics import compute_ssim
683
- from cache_dit.metrics import compute_fid
684
- from cache_dit.metrics import compute_lpips
685
- from cache_dit.metrics import compute_clip_score
686
- from cache_dit.metrics import compute_image_reward
687
-
688
- psnr, n = compute_psnr("true.png", "test.png") # Num: n
689
- psnr, n = compute_psnr("true_dir", "test_dir")
690
- ssim, n = compute_ssim("true_dir", "test_dir")
691
- fid, n = compute_fid("true_dir", "test_dir")
692
- lpips, n = compute_lpips("true_dir", "test_dir")
693
- clip_score, n = compute_clip_score("DrawBench200.txt", "test_dir")
694
- reward, n = compute_image_reward("DrawBench200.txt", "test_dir")
695
- ```
696
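As a point of reference, what `compute_psnr` measures can be sketched in pure Python (a standard PSNR over 8-bit pixel values; this is an illustrative `psnr` helper, and the actual cache-dit implementation may differ):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images,
    given here as flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Small per-pixel differences yield a high PSNR (less accuracy loss):
print(round(psnr([10, 20, 30, 40], [12, 18, 33, 39]), 2))  # ~41.6 dB
```

Higher PSNR between the baseline and cached outputs indicates a smaller accuracy loss from the chosen cache configuration.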
-
697
- Please check [test_metrics.py](./tests/test_metrics.py) for more details. Or, you can use the `cache-dit-metrics-cli` tool. For example:
698
-
699
- ```bash
700
- cache-dit-metrics-cli -h # show usage
701
- # all: PSNR, FID, SSIM, MSE, ..., etc.
702
- cache-dit-metrics-cli all -i1 true.png -i2 test.png # image
703
- cache-dit-metrics-cli all -i1 true_dir -i2 test_dir # image dir
704
- ```
449
+ ## 🎉User Guide
450
+
451
+ <div id="user-guide"></div>
452
+
453
+ For more advanced features such as **Unified Cache APIs**, **Forward Pattern Matching**, **Automatic Block Adapter**, **Hybrid Forward Pattern**, **DBCache**, **TaylorSeer Calibrator**, and **Hybrid Cache CFG**, please refer to the [🎉User_Guide.md](./docs/User_Guide.md) for details.
454
+
455
+ - [⚙️Installation](./docs/User_Guide.md#️installation)
456
+ - [🔥Benchmarks](./docs/User_Guide.md#benchmarks)
457
+ - [🔥Supported Pipelines](./docs/User_Guide.md#supported-pipelines)
458
+ - [🎉Unified Cache APIs](./docs/User_Guide.md#unified-cache-apis)
459
+ - [📚Forward Pattern Matching](./docs/User_Guide.md#forward-pattern-matching)
460
+ - [📚Cache with One-line Code](./docs/User_Guide.md#%EF%B8%8Fcache-acceleration-with-one-line-code)
461
+ - [🔥Automatic Block Adapter](./docs/User_Guide.md#automatic-block-adapter)
462
+ - [📚Hybrid Forward Pattern](./docs/User_Guide.md#hybird-forward-pattern)
463
+ - [📚Implement Patch Functor](./docs/User_Guide.md#implement-patch-functor)
464
+ - [🤖Cache Acceleration Stats](./docs/User_Guide.md#cache-acceleration-stats-summary)
465
+ - [⚡️Dual Block Cache](./docs/User_Guide.md#️dbcache-dual-block-cache)
466
+ - [🔥TaylorSeer Calibrator](./docs/User_Guide.md#taylorseer-calibrator)
467
+ - [⚡️Hybrid Cache CFG](./docs/User_Guide.md#️hybrid-cache-cfg)
468
+ - [⚙️Torch Compile](./docs/User_Guide.md#️torch-compile)
469
+ - [🛠Metrics CLI](./docs/User_Guide.md#metrics-cli)
470
+ - [📚API Documents](./docs/User_Guide.md#api-documentation)
705
471
 
706
472
  ## 👋Contribute
707
473
  <div id="contribute"></div>
708
474
 
709
- How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](./CONTRIBUTE.md).
475
+ How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](https://github.com/vipshop/cache-dit/raw/main/CONTRIBUTE.md).
710
476
 
711
477
  <div align='center'>
712
478
  <a href="https://star-history.com/#vipshop/cache-dit&Date">
@@ -730,7 +496,7 @@ The **cache-dit** codebase is adapted from FBCache. Over time its codebase diver
730
496
 
731
497
  ```BibTeX
732
498
  @misc{cache-dit@2025,
733
- title={cache-dit: A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗Diffusers.},
499
+ title={cache-dit: A Unified, Flexible and Training-free Cache Acceleration Framework for Diffusers.},
734
500
  url={https://github.com/vipshop/cache-dit.git},
735
501
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
736
502
  author={vipshop.com},