cache-dit 0.2.15__py3-none-any.whl → 0.2.17__py3-none-any.whl

This diff shows the changes between publicly released versions of this package, as they appear in their respective public registries. It is provided for informational purposes only.

Potentially problematic release: this version of cache-dit might be problematic.

Files changed (43)
  1. cache_dit/__init__.py +12 -0
  2. cache_dit/_version.py +16 -3
  3. cache_dit/cache_factory/.gitignore +2 -0
  4. cache_dit/cache_factory/__init__.py +52 -2
  5. cache_dit/cache_factory/cache_adapters.py +654 -0
  6. cache_dit/cache_factory/cache_blocks.py +487 -0
  7. cache_dit/cache_factory/{dual_block_cache/cache_context.py → cache_context.py} +11 -862
  8. cache_dit/cache_factory/patch/flux.py +249 -0
  9. cache_dit/cache_factory/utils.py +1 -1
  10. cache_dit/compile/__init__.py +1 -1
  11. cache_dit/compile/utils.py +1 -1
  12. {cache_dit-0.2.15.dist-info → cache_dit-0.2.17.dist-info}/METADATA +87 -204
  13. cache_dit-0.2.17.dist-info/RECORD +30 -0
  14. cache_dit/cache_factory/adapters.py +0 -169
  15. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/__init__.py +0 -55
  16. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/cogvideox.py +0 -87
  17. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/flux.py +0 -98
  18. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/hunyuan_video.py +0 -294
  19. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/mochi.py +0 -87
  20. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/qwen_image.py +0 -88
  21. cache_dit/cache_factory/dual_block_cache/diffusers_adapters/wan.py +0 -97
  22. cache_dit/cache_factory/dynamic_block_prune/__init__.py +0 -0
  23. cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/__init__.py +0 -51
  24. cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/cogvideox.py +0 -87
  25. cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/flux.py +0 -98
  26. cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/hunyuan_video.py +0 -294
  27. cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/mochi.py +0 -87
  28. cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/wan.py +0 -97
  29. cache_dit/cache_factory/dynamic_block_prune/prune_context.py +0 -1005
  30. cache_dit/cache_factory/first_block_cache/__init__.py +0 -0
  31. cache_dit/cache_factory/first_block_cache/cache_context.py +0 -719
  32. cache_dit/cache_factory/first_block_cache/diffusers_adapters/__init__.py +0 -57
  33. cache_dit/cache_factory/first_block_cache/diffusers_adapters/cogvideox.py +0 -89
  34. cache_dit/cache_factory/first_block_cache/diffusers_adapters/flux.py +0 -100
  35. cache_dit/cache_factory/first_block_cache/diffusers_adapters/hunyuan_video.py +0 -295
  36. cache_dit/cache_factory/first_block_cache/diffusers_adapters/mochi.py +0 -89
  37. cache_dit/cache_factory/first_block_cache/diffusers_adapters/wan.py +0 -98
  38. cache_dit-0.2.15.dist-info/RECORD +0 -50
  39. /cache_dit/cache_factory/{dual_block_cache → patch}/__init__.py +0 -0
  40. {cache_dit-0.2.15.dist-info → cache_dit-0.2.17.dist-info}/WHEEL +0 -0
  41. {cache_dit-0.2.15.dist-info → cache_dit-0.2.17.dist-info}/entry_points.txt +0 -0
  42. {cache_dit-0.2.15.dist-info → cache_dit-0.2.17.dist-info}/licenses/LICENSE +0 -0
  43. {cache_dit-0.2.15.dist-info → cache_dit-0.2.17.dist-info}/top_level.txt +0 -0
--- cache_dit-0.2.15.dist-info/METADATA
+++ cache_dit-0.2.17.dist-info/METADATA
@@ -1,7 +1,7 @@
  Metadata-Version: 2.4
  Name: cache_dit
- Version: 0.2.15
- Summary: 🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers
+ Version: 0.2.17
+ Summary: 🤗 CacheDiT: A Unified and Training-free Cache Acceleration Toolbox for Diffusion Transformers
  Author: DefTruth, vipshop.com, etc.
  Maintainer: DefTruth, vipshop.com, etc
  Project-URL: Repository, https://github.com/vipshop/cache-dit.git
@@ -41,7 +41,7 @@ Dynamic: requires-python
 
  <div align="center">
  <p align="center">
- <h2>🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration <br>Toolbox for Diffusion Transformers</h2>
+ <h2>🤗 CacheDiT: A Unified and Training-free Cache Acceleration <br>Toolbox for Diffusion Transformers</h2>
  </p>
  <img src=https://github.com/vipshop/cache-dit/raw/main/assets/cache-dit-v1.png >
  <div align='center'>
@@ -52,13 +52,23 @@ Dynamic: requires-python
  <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
  <img src=https://img.shields.io/badge/Release-v0.2-brightgreen.svg >
  </div>
- <b>🔥<a href="#dbcache">DBCache</a> | <a href="#dbprune">DBPrune</a> | <a href="#taylorseer">Hybrid TaylorSeer</a> | <a href="#cfg">Hybrid Cache CFG</a> | <a href="#fbcache">FBCache</a></b>🔥
+ 🔥<b><a href="#unified">Unified Cache APIs</a> | <a href="#dbcache">DBCache</a> | <a href="#taylorseer">Hybrid TaylorSeer</a> | <a href="#cfg">Hybrid Cache CFG</a></b>🔥
  </div>
 
- ## 🔥News
- - [2025-08-11] 🔥[Qwen-Image](./examples/run_qwen_image.py) is supported! Please check [run_qwen_image.py](./examples/run_qwen_image.py) as an example.
- - [2025-08-10] 🔥[FLUX.1-Kontext-dev](./examples/run_flux_kontext.py) is supported! Please check [run_flux_kontext.py](./examples/run_flux_kontext.py) as an example.
- - [2025-07-18] 🎉First caching mechanism in [🤗huggingface/flux-fast](https://github.com/huggingface/flux-fast) with **[cache-dit](https://github.com/vipshop/cache-dit)**, also check the [PR](https://github.com/huggingface/flux-fast/pull/13).
+ <div align="center">
+ <p align="center">
+ ♥️ Cache <b>Acceleration</b> with <b>One-line</b> Code ~ ♥️
+ </p>
+ </div>
+
+ ## 🔥News
+
+ - [2025-08-18] 🎉Early **[Unified Cache APIs](#unified)** released! Check [Qwen-Image w/ UAPI](./examples/run_qwen_image_uapi.py) as an example.
+ - [2025-08-12] 🎉First caching mechanism in [QwenLM/Qwen-Image](https://github.com/QwenLM/Qwen-Image) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check the [PR](https://github.com/QwenLM/Qwen-Image/pull/61).
+ - [2025-08-11] 🔥[Qwen-Image](https://github.com/QwenLM/Qwen-Image) is supported now! Please refer to [run_qwen_image.py](./examples/run_qwen_image.py) as an example.
+ - [2025-08-10] 🔥[FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) is supported! Please refer to [run_flux_kontext.py](./examples/run_flux_kontext.py) as an example.
+ - [2025-07-18] 🎉First caching mechanism in [🤗huggingface/flux-fast](https://github.com/huggingface/flux-fast) with **[cache-dit](https://github.com/vipshop/cache-dit)**, check the [PR](https://github.com/huggingface/flux-fast/pull/13).
  - [2025-07-13] **[🤗flux-faster](https://github.com/xlite-dev/flux-faster)** is released! **3.3x** speedup for FLUX.1 on NVIDIA L20 with `cache-dit`.
 
  ## 📖Contents
@@ -67,17 +77,12 @@ Dynamic: requires-python
 
  - [⚙️Installation](#️installation)
  - [🔥Supported Models](#supported)
+ - [🎉Unified Cache APIs](#unified)
  - [⚡️Dual Block Cache](#dbcache)
  - [🔥Hybrid TaylorSeer](#taylorseer)
  - [⚡️Hybrid Cache CFG](#cfg)
- - [🎉First Block Cache](#fbcache)
- - [⚡️Dynamic Block Prune](#dbprune)
- - [🎉Context Parallelism](#context-parallelism)
  - [🔥Torch Compile](#compile)
- - [⚙️Metrics CLI](#metrics)
- - [👋Contribute](#contribute)
- - [©️License](#license)
- - [©️Citations](#citations)
+ - [🛠Metrics CLI](#metrics)
 
  ## ⚙️Installation
 
@@ -98,6 +103,8 @@ pip3 install git+https://github.com/vipshop/cache-dit.git
 
  <div id="supported"></div>
 
+ Currently, the **cache-dit** library supports almost **Any** Diffusion Transformer (with **Transformer Blocks** that match the specific Input and Output **patterns**). Please check [🎉Unified Cache APIs](#unified) for more details. Here are just some of the tested models:
+
  - [🚀Qwen-Image](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
@@ -108,7 +115,48 @@ pip3 install git+https://github.com/vipshop/cache-dit.git
  - [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/raw/main/examples)
  - [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀LTXVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀Allegro](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀CogView3Plus](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀CogView4](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀Cosmos](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀EasyAnimate](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀SkyReelsV2](https://github.com/vipshop/cache-dit/raw/main/examples)
+ - [🚀SD3](https://github.com/vipshop/cache-dit/raw/main/examples)
+
+ ## 🎉Unified Cache APIs
+
+ <div id="unified"></div>
+
+ Currently, for any **Diffusion** model with **Transformer Blocks** that match a specific **Input/Output pattern**, we can use the **Unified Cache APIs** from **cache-dit**. The supported patterns are listed as follows:
+
+ ```bash
+ (IN: hidden_states, encoder_hidden_states, ...) -> (OUT: hidden_states, encoder_hidden_states)
+ (IN: hidden_states, encoder_hidden_states, ...) -> (OUT: encoder_hidden_states, hidden_states)
+ (IN: hidden_states, encoder_hidden_states, ...) -> (OUT: hidden_states)
+ (IN: hidden_states, ...) -> (OUT: hidden_states) # TODO, DiT, Lumina2, etc.
+ ```
+
+ Please refer to [Qwen-Image w/ UAPI](./examples/run_qwen_image_uapi.py) as an example. The `pipe` parameter can be **Any** Diffusion Pipeline. The **Unified Cache APIs** are currently in the experimental phase; please stay tuned for updates.
+
+ ```python
+ import cache_dit
+ from diffusers import DiffusionPipeline # Can be [Any] Diffusion Pipeline
 
+ pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")
+
+ # Just use the one-line code with default cache options.
+ cache_dit.enable_cache(pipe)
+
+ # Or, enable cache with custom settings according to your models.
+ cache_dit.enable_cache(
+     pipe, transformer=pipe.transformer,
+     blocks=pipe.transformer.transformer_blocks,
+     return_hidden_states_first=False,
+     **cache_dit.default_options(),
+ )
+ ```
 
  ## ⚡️DBCache: Dual Block Cache
 
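For readers skimming this diff, the Input/Output "patterns" above refer to the call signature of each transformer block the unified adapter hooks. Below is a minimal sketch of a block matching the first pattern; the class name and body are illustrative only and are not part of the cache-dit package.

```python
import torch

class IllustrativeBlock(torch.nn.Module):
    # Matches: (IN: hidden_states, encoder_hidden_states, ...)
    #       -> (OUT: hidden_states, encoder_hidden_states)
    def forward(self, hidden_states, encoder_hidden_states, **kwargs):
        # ... attention / feed-forward computation elided ...
        return hidden_states, encoder_hidden_states

# Blocks that instead return (encoder_hidden_states, hidden_states) match
# the second pattern; the `return_hidden_states_first=False` argument in
# the example above appears to declare exactly that output ordering.
```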
@@ -153,31 +201,31 @@ These case studies demonstrate that even with relatively high thresholds (such a
  - **max_cached_steps**: (default: -1) DBCache disables the caching strategy when the previous cached steps exceed this value to prevent precision degradation.
  - **residual_diff_threshold**: The value of residual diff threshold, a higher value leads to faster performance at the cost of lower precision.
 
- For a good balance between performance and precision, DBCache is configured by default with **F8B8**, 8 warmup steps, and unlimited cached steps.
+ For a good balance between performance and precision, DBCache is configured by default with **F8B0**, 8 warmup steps, and unlimited cached steps.
 
  ```python
+ import cache_dit
  from diffusers import FluxPipeline
- from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
 
  pipe = FluxPipeline.from_pretrained(
      "black-forest-labs/FLUX.1-dev",
      torch_dtype=torch.bfloat16,
  ).to("cuda")
 
- # Default options, F8B8, good balance between performance and precision
- cache_options = CacheType.default_options(CacheType.DBCache)
+ # Default options, F8B0, good balance between performance and precision
+ cache_options = cache_dit.default_options()
 
- # Custom options, F8B16, higher precision
+ # Custom options, F8B8, higher precision
  cache_options = {
-     "cache_type": CacheType.DBCache,
+     "cache_type": cache_dit.DBCache,
      "warmup_steps": 8,
-     "max_cached_steps": 8, # -1 means no limit
-     "Fn_compute_blocks": 8, # Fn, F8, etc.
-     "Bn_compute_blocks": 16, # Bn, B16, etc.
+     "max_cached_steps": -1, # -1 means no limit
+     "Fn_compute_blocks": 8, # Fn, F8, etc.
+     "Bn_compute_blocks": 8, # Bn, B8, etc.
      "residual_diff_threshold": 0.12,
  }
 
- apply_cache_on_pipe(pipe, **cache_options)
+ cache_dit.enable_cache(pipe, **cache_options)
  ```
  Moreover, users configuring higher **Bn** values (e.g., **F8B16**) while aiming to maintain good performance can specify **Bn_compute_blocks_ids** to work with Bn. DBCache will only compute the specified blocks, with the remaining estimated using the previous step's residual cache.
 
@@ -185,7 +233,7 @@ Moreover, users configuring higher **Bn** values (e.g., **F8B16**) while aiming
  # Custom options, F8B16, higher precision with good performance.
  cache_options = {
      # 0, 2, 4, ..., 14, 15, etc. [0,16)
-     "Bn_compute_blocks_ids": CacheType.range(0, 16, 2),
+     "Bn_compute_blocks_ids": cache_dit.block_range(0, 16, 2),
      # If the L1 difference is below this threshold, skip Bn blocks
      # not in `Bn_compute_blocks_ids`(1, 3,..., etc), Otherwise,
      # compute these blocks.
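The `[0,16)` comment above implies that `block_range(0, 16, 2)` selects blocks 0, 2, 4, ..., 14, 15. A minimal sketch of these semantics, assuming the new `cache_dit.block_range` keeps the behavior of the `CacheType.range` helper removed in this release (its source appears in the deleted adapters.py at the end of this diff):

```python
def block_range(start: int, end: int, step: int = 1) -> list[int]:
    if start > end or end <= 0 or step <= 1:
        return []
    # Always include block 0 and block end - 1, so the first and last
    # blocks are computed on every step.
    return sorted(set([0] + list(range(start, end, step)) + [end - 1]))

print(block_range(0, 16, 2))  # [0, 2, 4, 6, 8, 10, 12, 14, 15]
```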
@@ -203,7 +251,7 @@
  \mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)=\mathcal{F}\left(x_t^l\right)+\sum_{i=1}^m \frac{\Delta^i \mathcal{F}\left(x_t^l\right)}{i!\cdot N^i}(-k)^i
  $$
 
- **TaylorSeer** employs a differential method to approximate the higher-order derivatives of features and predict features in future timesteps with Taylor series expansion. The TaylorSeer implemented in CacheDiT supports both hidden states and residual cache types. That is $\mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)$ can be a residual cache or a hidden-state cache.
+ **TaylorSeer** employs a differential method to approximate the higher-order derivatives of features and predict features in future timesteps with Taylor series expansion. The TaylorSeer implemented in cache-dit supports both hidden states and residual cache types. That is $\mathcal{F}\_{\text {pred }, m}\left(x_{t-k}^l\right)$ can be a residual cache or a hidden-state cache.
 
  ```python
  cache_options = {
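The prediction formula above can be made concrete with a small numeric sketch. This is not the package's implementation; it only illustrates how finite differences over features cached every N timesteps approximate the derivatives in the Taylor expansion of order m:

```python
import math
import torch

def taylor_predict(history: list[torch.Tensor], k: int, N: int, m: int) -> torch.Tensor:
    # `history` holds the feature F at the last m + 1 cached steps,
    # newest first, spaced N timesteps apart.
    diffs, level = [history[0]], list(history)
    for _ in range(m):
        # i-th order finite differences approximate Δ^i F(x_t)
        level = [a - b for a, b in zip(level[:-1], level[1:])]
        diffs.append(level[0])
    # F_pred = F(x_t) + sum_i Δ^i F(x_t) / (i! * N^i) * (-k)^i
    pred = diffs[0].clone()
    for i in range(1, m + 1):
        pred = pred + diffs[i] * ((-k) ** i) / (math.factorial(i) * N**i)
    return pred
```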
@@ -240,7 +288,7 @@ cache_options = {
 
  <div id="cfg"></div>
 
- CacheDiT supports caching for **CFG (classifier-free guidance)**. For models that fuse CFG and non-CFG into a single forward step, or models that do not include CFG (classifier-free guidance) in the forward step, please set `do_separate_classifier_free_guidance` param to **False (default)**. Otherwise, set it to True. For examples:
+ cache-dit supports caching for **CFG (classifier-free guidance)**. For models that fuse CFG and non-CFG into a single forward step, or models that do not include CFG (classifier-free guidance) in the forward step, please set the `do_separate_classifier_free_guidance` param to **False (default)**. Otherwise, set it to True. For example:
 
  ```python
  cache_options = {
@@ -249,7 +297,7 @@ cache_options = {
      # should set do_separate_classifier_free_guidance as False.
      # For example, set it as True for Wan 2.1 and set it as False
      # for FLUX.1, HunyuanVideo, CogVideoX, Mochi.
-     "do_separate_classifier_free_guidance": True, # Wan 2.1
+     "do_separate_classifier_free_guidance": True, # Wan 2.1, Qwen-Image
      # Compute cfg forward first or not, default False, namely,
      # 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
      "cfg_compute_first": False,
@@ -260,185 +308,20 @@ cache_options = {
  }
  ```
 
- ## 🎉FBCache: First Block Cache
-
- <div id="fbcache"></div>
-
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/fbcache-v1.png)
-
- **DBCache** is a more general cache algorithm than **FBCache**. When Fn=1 and Bn=0, DBCache behaves identically to FBCache. Therefore, you can either use the original FBCache implementation directly or configure **DBCache** with **F1B0** settings to achieve the same functionality.
-
- ```python
- from diffusers import FluxPipeline
- from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
-
- pipe = FluxPipeline.from_pretrained(
-     "black-forest-labs/FLUX.1-dev",
-     torch_dtype=torch.bfloat16,
- ).to("cuda")
-
- # Using FBCache directly
- cache_options = CacheType.default_options(CacheType.FBCache)
-
- # Or using DBCache with F1B0.
- # Fn=1, Bn=0, means FB Cache, otherwise, Dual Block Cache
- cache_options = {
-     "cache_type": CacheType.DBCache,
-     "warmup_steps": 8,
-     "max_cached_steps": 8, # -1 means no limit
-     "Fn_compute_blocks": 1, # Fn, F1, etc.
-     "Bn_compute_blocks": 0, # Bn, B0, etc.
-     "residual_diff_threshold": 0.12,
- }
-
- apply_cache_on_pipe(pipe, **cache_options)
- ```
-
- ## ⚡️DBPrune: Dynamic Block Prune
-
- <div id="dbprune"></div>
-
- ![](https://github.com/vipshop/cache-dit/raw/main/assets/dbprune-v1.png)
-
- We have further implemented a new **Dynamic Block Prune** algorithm based on **Residual Caching** for Diffusion Transformers, which is referred to as **DBPrune**. DBPrune caches each block's hidden states and residuals, then dynamically prunes blocks during inference by computing the L1 distance between previous hidden states. When a block is pruned, its output is approximated using the cached residuals. DBPrune is currently in the experimental phase, and we kindly invite you to stay tuned for upcoming updates.
-
- ```python
- from diffusers import FluxPipeline
- from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
-
- pipe = FluxPipeline.from_pretrained(
-     "black-forest-labs/FLUX.1-dev",
-     torch_dtype=torch.bfloat16,
- ).to("cuda")
-
- # Using DBPrune with default options
- cache_options = CacheType.default_options(CacheType.DBPrune)
-
- apply_cache_on_pipe(pipe, **cache_options)
- ```
-
- We have also brought the designs from DBCache to DBPrune to make it a more general and customizable block prune algorithm. You can specify the values of **Fn** and **Bn** for higher precision, or set up the non-prune blocks list **non_prune_blocks_ids** to avoid aggressive pruning. For example:
-
- ```python
- # Custom options for DBPrune
- cache_options = {
-     "cache_type": CacheType.DBPrune,
-     "residual_diff_threshold": 0.05,
-     # Never prune the first `Fn` and last `Bn` blocks.
-     "Fn_compute_blocks": 8, # default 1
-     "Bn_compute_blocks": 8, # default 0
-     "warmup_steps": 8, # default -1
-     # Disables the pruning strategy when the previous
-     # pruned steps greater than this value.
-     "max_pruned_steps": 12, # default, -1 means no limit
-     # Enable dynamic prune threshold within step, higher
-     # `max_dynamic_prune_threshold` value may introduce a more
-     # ageressive pruning strategy.
-     "enable_dynamic_prune_threshold": True,
-     "max_dynamic_prune_threshold": 2 * 0.05,
-     # (New thresh) = mean(previous_block_diffs_within_step) * 1.25
-     # (New thresh) = ((New thresh) if (New thresh) <
-     # max_dynamic_prune_threshold else residual_diff_threshold)
-     "dynamic_prune_threshold_relax_ratio": 1.25,
-     # The step interval to update residual cache. For example,
-     # 2: means the update steps will be [0, 2, 4, ...].
-     "residual_cache_update_interval": 1,
-     # You can set non-prune blocks to avoid ageressive pruning.
-     # For example, FLUX.1 has 19 + 38 blocks, so we can set it
-     # to 0, 2, 4, ..., 56, etc.
-     "non_prune_blocks_ids": [],
- }
-
- apply_cache_on_pipe(pipe, **cache_options)
- ```
-
- > [!Important]
- > Please note that for GPUs with lower VRAM, DBPrune may not be suitable for use on video DiTs, as it caches the hidden states and residuals of each block, leading to higher GPU memory requirements. In such cases, please use DBCache, which only caches the hidden states and residuals of 2 blocks.
-
- <div align="center">
- <p align="center">
- DBPrune, <b> L20x1 </b>, Steps: 28, "A cat holding a sign that says hello world with complex background"
- </p>
- </div>
-
- |Baseline(L20x1)|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
- |:---:|:---:|:---:|:---:|:---:|:---:|
- |24.85s|19.43s|16.82s|15.95s|14.24s|10.66s|
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/NONE_R0.08_S0.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png width=105px>|
-
- ## 🎉Context Parallelism
-
- <div id="context-parallelism"></div>
-
- **CacheDiT** are **plug-and-play** solutions that works hand-in-hand with [ParaAttention](https://github.com/chengzeyi/ParaAttention). Users can **easily tap into** its **Context Parallelism** features for distributed inference. Firstly, install `para-attn` from PyPI:
-
- ```bash
- pip3 install para-attn # or install `para-attn` from sources.
- ```
-
- Then, you can run **DBCache** or **DBPrune** with **Context Parallelism** on 4 GPUs:
-
- ```python
- import torch.distributed as dist
- from diffusers import FluxPipeline
- from para_attn.context_parallel import init_context_parallel_mesh
- from para_attn.context_parallel.diffusers_adapters import parallelize_pipe
- from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
-
- # Init distributed process group
- dist.init_process_group()
- torch.cuda.set_device(dist.get_rank())
-
- pipe = FluxPipeline.from_pretrained(
-     "black-forest-labs/FLUX.1-dev",
-     torch_dtype=torch.bfloat16,
- ).to("cuda")
-
- # Context Parallel from ParaAttention
- parallelize_pipe(
-     pipe, mesh=init_context_parallel_mesh(
-         pipe.device.type, max_ulysses_dim_size=4
-     )
- )
-
- # DBPrune with default options from this library
- apply_cache_on_pipe(
-     pipe, **CacheType.default_options(CacheType.DBPrune)
- )
-
- dist.destroy_process_group()
- ```
- Then, run the python test script with `torchrun`:
- ```bash
- torchrun --nproc_per_node=4 parallel_cache.py
- ```
-
- <div align="center">
- <p align="center">
- DBPrune + <b>torch.compile + context parallelism</b> <br>Steps: 28, "A cat holding a sign that says hello world with complex background"
- </p>
- </div>
-
- |Baseline|Pruned(24%)|Pruned(35%)|Pruned(38%)|Pruned(45%)|Pruned(60%)|
- |:---:|:---:|:---:|:---:|:---:|:---:|
- |+compile:20.43s|16.25s|14.12s|13.41s|12.00s|8.86s|
- |+L20x4:7.75s|6.62s|6.03s|5.81s|5.24s|3.93s|
- |<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_NONE_R0.08_S0_T20.43s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png width=105px>|<img src=https://github.com/vipshop/cache-dit/raw/main/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png width=105px>|
-
  ## 🔥Torch Compile
 
  <div id="compile"></div>
 
- By the way, **CacheDiT** is designed to work compatibly with **torch.compile.** You can easily use CacheDiT with torch.compile to further achieve a better performance. For example:
+ By the way, **cache-dit** is designed to work compatibly with **torch.compile**. You can easily use cache-dit with torch.compile to achieve further performance gains. For example:
 
  ```python
- apply_cache_on_pipe(
-     pipe, **CacheType.default_options(CacheType.DBPrune)
+ cache_dit.enable_cache(
+     pipe, **cache_dit.default_options()
  )
  # Compile the Transformer module
  pipe.transformer = torch.compile(pipe.transformer)
  ```
- However, users intending to use **CacheDiT** for DiT with **dynamic input shapes** should consider increasing the **recompile** **limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
+ However, users intending to use **cache-dit** for DiT with **dynamic input shapes** should consider increasing the **recompile limit** of `torch._dynamo`. Otherwise, the recompile_limit error may be triggered, causing the module to fall back to eager mode.
  ```python
  torch._dynamo.config.recompile_limit = 96 # default is 8
  torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
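Taken together, the Torch Compile snippets above assemble into the following end-to-end sketch. This is an illustration rather than package documentation: it assumes a CUDA device and adds the `import torch` that the README snippets omit.

```python
import torch
import cache_dit
from diffusers import FluxPipeline

# Raise dynamo recompile limits first if input shapes are dynamic.
torch._dynamo.config.recompile_limit = 96                 # default is 8
torch._dynamo.config.accumulated_recompile_limit = 2048   # default is 256

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Enable caching before compiling the transformer.
cache_dit.enable_cache(pipe, **cache_dit.default_options())
pipe.transformer = torch.compile(pipe.transformer)
```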
@@ -447,11 +330,11 @@ torch._dynamo.config.accumulated_recompile_limit = 2048 # default is 256
  Please check [bench.py](./bench/bench.py) for more details.
 
 
- ## ⚙️Metrics CLI
+ ## 🛠Metrics CLI
 
  <div id="metrics"></div>
 
- You can utilize the APIs provided by CacheDiT to quickly evaluate the accuracy losses caused by different cache configurations. For example:
+ You can utilize the APIs provided by cache-dit to quickly evaluate the accuracy losses caused by different cache configurations. For example:
 
  ```python
  from cache_dit.metrics import compute_psnr
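The metrics example continues in the next hunk with the CLI form (`cache-dit-metrics-cli psnr -i1 true_dir -i2 test_dir`). A Python sketch mirroring that call is shown below; the directory-based call style of `compute_psnr` is an assumption inferred from the CLI flags, not a documented signature.

```python
from cache_dit.metrics import compute_psnr

# Assumed to mirror: cache-dit-metrics-cli psnr -i1 true_dir -i2 test_dir
psnr = compute_psnr("true_dir", "test_dir")  # hypothetical call style
print(f"PSNR: {psnr}")
```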
@@ -480,21 +363,21 @@ cache-dit-metrics-cli psnr -i1 true_dir -i2 test_dir # PSNR
  ## 👋Contribute
  <div id="contribute"></div>
 
- How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](https://github.com/vipshop/cache-dit/raw/main/CONTRIBUTE.md).
+ How to contribute? Star ⭐️ this repo to support us or check [CONTRIBUTE.md](./CONTRIBUTE.md).
 
  ## ©️License
 
  <div id="license"></div>
 
- The **CacheDiT** codebase is adapted from [FBCache](https://github.com/chengzeyi/ParaAttention/tree/main/src/para_attn/first_block_cache). Special thanks to their excellent work! We have followed the original License from [FBCache](https://github.com/chengzeyi/ParaAttention), please check [LICENSE](https://github.com/vipshop/cache-dit/raw/main/LICENSE) for more details.
+ The **cache-dit** codebase is adapted from FBCache. Special thanks to their excellent work! We have followed the original License from FBCache; please check [LICENSE](./LICENSE) for more details.
 
  ## ©️Citations
 
  <div id="citations"></div>
 
  ```BibTeX
- @misc{CacheDiT@2025,
-   title={CacheDiT: A Training-free and Easy-to-use cache acceleration Toolbox for Diffusion Transformers},
+ @misc{cache-dit@2025,
+   title={cache-dit: A Unified and Training-free Cache Acceleration Toolbox for Diffusion Transformers},
    url={https://github.com/vipshop/cache-dit.git},
    note={Open-source software available at https://github.com/vipshop/cache-dit.git},
    author={vipshop.com},
--- /dev/null
+++ cache_dit-0.2.17.dist-info/RECORD
@@ -0,0 +1,30 @@
+ cache_dit/__init__.py,sha256=gRJrSVrj-700qjgjwHfcHkiIHKbGm2cutP1TybxQZk4,605
+ cache_dit/_version.py,sha256=sRnPbdnyLakHrE7uBPRC_AQNPiFphtVIa4BPaftkqk4,706
+ cache_dit/logger.py,sha256=0zsu42hN-3-rgGC_C29ms1IvVpV4_b4_SwJCKSenxBE,4304
+ cache_dit/primitives.py,sha256=A2iG9YLot3gOsZSPp-_gyjqjLgJvWQRx8aitD4JQ23Y,3877
+ cache_dit/utils.py,sha256=4cFNh0asch6Zgsixq0bS1ElfwBu_6BG5ZSmaa1khjyg,144
+ cache_dit/cache_factory/.gitignore,sha256=5Cb-qT9wsTUoMJ7vACDF7ZcLpAXhi5v-xdcWSRit988,23
+ cache_dit/cache_factory/__init__.py,sha256=2td8ivq0DDzu00Kq1oPvq0Bh5C76w_gwsMfyUo2xW9U,1652
+ cache_dit/cache_factory/cache_adapters.py,sha256=ECYRvgx6ePX6Jd6sqUXmXi6kbWaqlOdvm6aZLhpedW0,23455
+ cache_dit/cache_factory/cache_blocks.py,sha256=9jgK2IT0Y_AlbhJLnhgA47lOxQNwNizDgHve45818gg,18390
+ cache_dit/cache_factory/cache_context.py,sha256=f-ihx14NXIZNakN2b_dduegRpJr5SwcPtc2PqnpDdUY,39818
+ cache_dit/cache_factory/taylorseer.py,sha256=LKSNo2ode69EVo9xrxjxAMEjz0yDGiGADeDYnEqddA8,3987
+ cache_dit/cache_factory/utils.py,sha256=iQg3dqBfQTGkvMdKeO5-YmzkQO5LBSoZ8sYKwQA_7_I,1805
+ cache_dit/cache_factory/patch/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ cache_dit/cache_factory/patch/flux.py,sha256=eTdq-3limKHgwtVCILkZTwt9FwYUhH7_VlhKnfu55BU,8999
+ cache_dit/compile/__init__.py,sha256=FcTVzCeyypl-mxlc59_ehHL3lBNiDAFsXuRoJ-5Cfi0,56
+ cache_dit/compile/utils.py,sha256=ugHrv3QRieG1xKwcg_pi3yVZF6EpSOEJjRmbnfa7VG0,3779
+ cache_dit/custom_ops/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ cache_dit/custom_ops/triton_taylorseer.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ cache_dit/metrics/__init__.py,sha256=RaUhl5dieF40RqnizGzR30qoJJ9dyMUEADwgwMaMQrE,575
+ cache_dit/metrics/config.py,sha256=ieOgD9ayz722RjVzk24bSIqS2D6o7TZjGk8KeXV-OLQ,551
+ cache_dit/metrics/fid.py,sha256=9Ivtazl6mW0Bon2VXa-Ia5Xj2ewxRD3V1Qkd69zYM3Y,17066
+ cache_dit/metrics/inception.py,sha256=pBVe2X6ylLPIXTG4-GWDM9DWnCviMJbJ45R3ulhktR0,12759
+ cache_dit/metrics/lpips.py,sha256=I2qCNi6qJh5TRsaIsdxO0WoRX1DN7U_H3zS0oCSahYM,1032
+ cache_dit/metrics/metrics.py,sha256=8jvM1sF-nDxUuwCRy44QEoo4dYVLCQVh1QyAMs4eaQY,27840
+ cache_dit-0.2.17.dist-info/licenses/LICENSE,sha256=Dqb07Ik2dV41s9nIdMUbiRWEfDqo7-dQeRiY7kPO8PE,3769
+ cache_dit-0.2.17.dist-info/METADATA,sha256=HqEAEr08N7whWcxOMOVJKThQPglCW_GAj-LcynXmIDI,19804
+ cache_dit-0.2.17.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ cache_dit-0.2.17.dist-info/entry_points.txt,sha256=FX2gysXaZx6NeK1iCLMcIdP8Q4_qikkIHtEmi3oWn8o,65
+ cache_dit-0.2.17.dist-info/top_level.txt,sha256=ZJDydonLEhujzz0FOkVbO-BqfzO9d_VqRHmZU-3MOZo,10
+ cache_dit-0.2.17.dist-info/RECORD,,
@@ -1,169 +0,0 @@
1
- from enum import Enum
2
-
3
- from diffusers import DiffusionPipeline
4
-
5
- from cache_dit.cache_factory.dual_block_cache.diffusers_adapters import (
6
- apply_db_cache_on_pipe,
7
- )
8
- from cache_dit.cache_factory.first_block_cache.diffusers_adapters import (
9
- apply_fb_cache_on_pipe,
10
- )
11
- from cache_dit.cache_factory.dynamic_block_prune.diffusers_adapters import (
12
- apply_db_prune_on_pipe,
13
- )
14
-
15
- from cache_dit.logger import init_logger
16
-
17
-
18
- logger = init_logger(__name__)
19
-
20
-
21
- class CacheType(Enum):
22
- NONE = "NONE"
23
- FBCache = "First_Block_Cache"
24
- DBCache = "Dual_Block_Cache"
25
- DBPrune = "Dynamic_Block_Prune"
26
-
27
- @staticmethod
28
- def type(cache_type: "CacheType | str") -> "CacheType":
29
- if isinstance(cache_type, CacheType):
30
- return cache_type
31
- return CacheType.cache_type(cache_type)
32
-
33
- @staticmethod
34
- def cache_type(cache_type: "CacheType | str") -> "CacheType":
35
- if cache_type is None:
36
- return CacheType.NONE
37
-
38
- if isinstance(cache_type, CacheType):
39
- return cache_type
40
- if cache_type.lower() in (
41
- "first_block_cache",
42
- "fb_cache",
43
- "fbcache",
44
- "fb",
45
- ):
46
- return CacheType.FBCache
47
- elif cache_type.lower() in (
48
- "dual_block_cache",
49
- "db_cache",
50
- "dbcache",
51
- "db",
52
- ):
53
- return CacheType.DBCache
54
- elif cache_type.lower() in (
55
- "dynamic_block_prune",
56
- "db_prune",
57
- "dbprune",
58
- "dbp",
59
- ):
60
- return CacheType.DBPrune
61
- elif cache_type.lower() in (
62
- "none_cache",
63
- "nonecache",
64
- "no_cache",
65
- "nocache",
66
- "none",
67
- "no",
68
- ):
69
- return CacheType.NONE
70
- else:
71
- raise ValueError(f"Unknown cache type: {cache_type}")
72
-
73
- @staticmethod
74
- def range(start: int, end: int, step: int = 1) -> list[int]:
75
- if start > end or end <= 0 or step <= 1:
76
- return []
77
- # Always compute 0 and end - 1 blocks for DB Cache
78
- return list(
79
- sorted(set([0] + list(range(start, end, step)) + [end - 1]))
80
- )
81
-
82
- @staticmethod
83
- def default_options(cache_type: "CacheType | str") -> dict:
84
- _no_options = {
85
- "cache_type": CacheType.NONE,
86
- }
87
-
88
- _fb_options = {
89
- "cache_type": CacheType.FBCache,
90
- "residual_diff_threshold": 0.08,
91
- "warmup_steps": 8,
92
- "max_cached_steps": 8,
93
- }
94
-
95
- _Fn_compute_blocks = 8
96
- _Bn_compute_blocks = 8
97
-
98
- _db_options = {
99
- "cache_type": CacheType.DBCache,
100
- "residual_diff_threshold": 0.12,
101
- "warmup_steps": 8,
102
- "max_cached_steps": -1, # -1 means no limit
103
- # Fn=1, Bn=0, means FB Cache, otherwise, Dual Block Cache
104
- "Fn_compute_blocks": _Fn_compute_blocks,
105
- "Bn_compute_blocks": _Bn_compute_blocks,
106
- "max_Fn_compute_blocks": 16,
107
- "max_Bn_compute_blocks": 16,
108
- "Fn_compute_blocks_ids": [], # 0, 1, 2, ..., 7, etc.
109
- "Bn_compute_blocks_ids": [], # 0, 1, 2, ..., 7, etc.
110
- }
111
-
112
- _dbp_options = {
113
- "cache_type": CacheType.DBPrune,
114
- "residual_diff_threshold": 0.08,
115
- "Fn_compute_blocks": _Fn_compute_blocks,
116
- "Bn_compute_blocks": _Bn_compute_blocks,
117
- "warmup_steps": 8,
118
- "max_pruned_steps": -1, # -1 means no limit
119
- }
120
-
121
- if cache_type == CacheType.FBCache:
122
- return _fb_options
123
- elif cache_type == CacheType.DBCache:
124
- return _db_options
125
- elif cache_type == CacheType.DBPrune:
126
- return _dbp_options
127
- elif cache_type == CacheType.NONE:
128
- return _no_options
129
- else:
130
- raise ValueError(f"Unknown cache type: {cache_type}")
131
-
132
-
133
- def apply_cache_on_pipe(pipe: DiffusionPipeline, *args, **kwargs):
134
- assert isinstance(pipe, DiffusionPipeline)
135
-
136
- if hasattr(pipe, "_is_cached") and pipe._is_cached:
137
- return pipe
138
-
139
- if hasattr(pipe, "_is_pruned") and pipe._is_pruned:
140
- return pipe
141
-
142
- cache_type = kwargs.pop("cache_type", None)
143
- if cache_type is None:
144
- logger.warning(
145
- "No cache type specified, we will use DBCache by default. "
146
- "Please specify the cache_type explicitly if you want to "
147
- "use a different cache type."
148
- )
149
- # Force to use DBCache with default cache options
150
- return apply_db_cache_on_pipe(
151
- pipe,
152
- **CacheType.default_options(CacheType.DBCache),
153
- )
154
-
155
- cache_type = CacheType.type(cache_type)
156
-
157
- if cache_type == CacheType.FBCache:
158
- return apply_fb_cache_on_pipe(pipe, *args, **kwargs)
159
- elif cache_type == CacheType.DBCache:
160
- return apply_db_cache_on_pipe(pipe, *args, **kwargs)
161
- elif cache_type == CacheType.DBPrune:
162
- return apply_db_prune_on_pipe(pipe, *args, **kwargs)
163
- elif cache_type == CacheType.NONE:
164
- logger.warning(
165
- f"Cache type is {cache_type}, no caching will be applied."
166
- )
167
- return pipe
168
- else:
169
- raise ValueError(f"Unknown cache type: {cache_type}")
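Since this release deletes `cache_dit/cache_factory/adapters.py` outright, downstream code importing `apply_cache_on_pipe` or `CacheType` from it will break on upgrade. A minimal migration sketch, mapping the removed 0.2.15 call onto the 0.2.17 API shown in the METADATA diff above:

```python
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")

# 0.2.15 (removed in this release):
#   from cache_dit.cache_factory import apply_cache_on_pipe, CacheType
#   apply_cache_on_pipe(pipe, **CacheType.default_options(CacheType.DBCache))
# 0.2.17 equivalent:
cache_dit.enable_cache(pipe, **cache_dit.default_options())
```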