cache-dit 0.2.4__tar.gz → 0.2.5__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Note: the registry flags this version of cache-dit as a potentially problematic release.
- {cache_dit-0.2.4 → cache_dit-0.2.5}/PKG-INFO +21 -8
- {cache_dit-0.2.4 → cache_dit-0.2.5}/README.md +20 -7
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/.gitignore +1 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/README.md +10 -2
- cache_dit-0.2.5/examples/data/flf2v_input_first_frame.png +0 -0
- cache_dit-0.2.5/examples/data/flf2v_input_last_frame.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/requirements.txt +1 -1
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_cogvideox.py +1 -1
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_wan.py +8 -2
- cache_dit-0.2.5/examples/run_wan_flf2v.py +191 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/_version.py +2 -2
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/cache_context.py +138 -33
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/cache_context.py +2 -2
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit.egg-info/PKG-INFO +21 -8
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit.egg-info/SOURCES.txt +3 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/.github/workflows/issue.yml +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/.gitignore +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/.pre-commit-config.yaml +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/CONTRIBUTE.md +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/LICENSE +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/MANIFEST.in +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F12B12S4_R0.2_S16.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F12B16S4_R0.08_S6.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F16B16S2_R0.2_S14.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F16B16S4_R0.2_S13.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F1B0S1_R0.08_S11.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F1B0S1_R0.2_S19.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F8B0S2_R0.12_S12.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F8B16S1_R0.2_S18.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F8B8S1_R0.08_S9.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F8B8S1_R0.12_S12.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBCACHE_F8B8S1_R0.15_S15.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.03_P24.0_T19.43s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.04_P34.6_T16.82s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.05_P38.3_T15.95s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.06_P45.2_T14.24s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.07_P52.3_T12.53s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.08_P52.4_T12.52s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.09_P59.2_T10.81s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.12_P59.5_T10.76s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.12_P63.0_T9.90s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.1_P62.8_T9.95s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.2_P59.5_T10.66s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/DBPRUNE_F1B0_R0.3_P63.1_T9.79s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/NONE_R0.08_S0.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/TEXTURE_DBCACHE_F1B0_R0.08.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/TEXTURE_DBCACHE_F8B12_R0.12.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/TEXTURE_DBCACHE_F8B16_R0.2.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/TEXTURE_DBCACHE_F8B20_R0.2.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/TEXTURE_DBCACHE_F8B8_R0.12.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/TEXTURE_NONE_R0.08.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.12_S14_T12.85s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C0_DBCACHE_F1B0S1W0T0ET0_R0.15_S17_T10.27s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.12_S14_T12.86s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C0_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T10.28s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBCACHE_F1B0S1W0T1ET1_R0.15_S17_T8.48s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F1B0_R0.03_P24.0_T16.25s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F1B0_R0.045_P38.2_T13.41s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F1B0_R0.04_P34.6_T14.12s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F1B0_R0.055_P45.1_T12.00s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F1B0_R0.05_P41.6_T12.70s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F1B0_R0.2_P59.5_T8.86s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_DBPRUNE_F8B8_R0.08_P23.1_T16.14s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U0_C1_NONE_R0.08_S0_T20.43s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.03_P27.3_T6.62s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.03_P27.3_T6.63s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.045_P38.2_T5.81s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.045_P38.2_T5.82s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.06s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.07s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.04_P34.6_T6.08s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.055_P45.1_T5.27s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.055_P45.1_T5.28s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.2_P59.5_T3.95s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_DBPRUNE_F1B0_R0.2_P59.5_T3.96s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_NONE_R0.08_S0_T7.78s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/U4_C1_NONE_R0.08_S0_T7.79s.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/cache-dit-v1.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/dbcache-fnbn-v1.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/dbcache-v1.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/dbprune-v1.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/assets/fbcache-v1.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/bench/.gitignore +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/bench/bench.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/docs/.gitignore +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/data/cup.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/data/cup_mask.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_flux.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_flux_fill.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_hunyuan_video.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_mochi.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/pyproject.toml +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/pytest.ini +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/requirements.txt +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/setup.cfg +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/setup.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/cogvideox.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/flux.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/hunyuan_video.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/mochi.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dual_block_cache/diffusers_adapters/wan.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/cogvideox.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/flux.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/hunyuan_video.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/mochi.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/diffusers_adapters/wan.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/dynamic_block_prune/prune_context.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/cogvideox.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/flux.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/hunyuan_video.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/mochi.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/first_block_cache/diffusers_adapters/wan.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/taylorseer.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/cache_factory/utils.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/compile/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/compile/utils.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/custom_ops/__init__.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/custom_ops/triton_taylorseer.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/logger.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit/primitives.py +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit.egg-info/dependency_links.txt +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit.egg-info/requires.txt +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/src/cache_dit.egg-info/top_level.txt +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/tests/.gitignore +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/tests/README.md +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/tests/taylorseer_approximation_order_2.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/tests/taylorseer_approximation_order_4.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/tests/taylorseer_approximation_test.png +0 -0
- {cache_dit-0.2.4 → cache_dit-0.2.5}/tests/test_taylorseer.py +0 -0
{cache_dit-0.2.4 → cache_dit-0.2.5}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cache_dit
-Version: 0.2.4
+Version: 0.2.5
 Summary: 🤗 CacheDiT: A Training-free and Easy-to-use Cache Acceleration Toolbox for Diffusion Transformers
 Author: DefTruth, vipshop.com, etc.
 Maintainer: DefTruth, vipshop.com, etc
@@ -44,7 +44,7 @@ Dynamic: requires-python
 <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
 <img src=https://static.pepy.tech/badge/cache-dit >
 <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
-<img src=https://img.shields.io/badge/Release-v0.2
+<img src=https://img.shields.io/badge/Release-v0.2-brightgreen.svg >
 </div>
 <p align="center">
 DeepCache is for UNet not DiT. Most DiT cache speedups are complex and not training-free. CacheDiT offers <br>a set of training-free cache accelerators for DiT: <b>🔥<a href="#dbcache">DBCache</a>, <a href="#dbprune">DBPrune</a>, <a href="#taylorseer">TaylorSeer</a>, <a href="#fbcache">FBCache</a></b>, etc🔥
@@ -169,7 +169,7 @@ The **CacheDiT** codebase is adapted from [FBCache](https://github.com/chengzeyi
 You can install the stable release of `cache-dit` from PyPI:
 
 ```bash
-pip3 install cache-dit
+pip3 install -U cache-dit
 ```
 Or you can install the latest develop version from GitHub:
 
@@ -181,11 +181,13 @@ pip3 install git+https://github.com/vipshop/cache-dit.git
 
 <div id="supported"></div>
 
-- [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/examples)
-- [🚀
+- [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/raw/main/examples)
 - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples)
 - [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/raw/main/examples)
-- [🚀Wan2.1](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/raw/main/examples)
 - [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
 
 
@@ -281,7 +283,7 @@ cache_options = {
     "taylorseer_kwargs": {
         "n_derivatives": 2, # default is 2.
     },
-    "warmup_steps": 3, # n_derivatives + 1
+    "warmup_steps": 3, # prefer: >= n_derivatives + 1
     "residual_diff_threshold": 0.12,
 }
 ```
@@ -304,12 +306,23 @@ cache_options = {
 
 <div id="cfg"></div>
 
-CacheDiT supports caching for CFG (classifier-free guidance)
+CacheDiT supports caching for **CFG (classifier-free guidance)**. For models that fuse CFG and non-CFG into a single forward step, or models that do not include CFG (classifier-free guidance) in the forward step, please set `do_separate_classifier_free_guidance` param to **False (default)**. Otherwise, set it to True. For examples:
 
 ```python
 cache_options = {
+    # CFG: classifier free guidance or not
+    # For model that fused CFG and non-CFG into single forward step,
+    # should set do_separate_classifier_free_guidance as False.
+    # For example, set it as True for Wan 2.1 and set it as False
+    # for FLUX.1, HunyuanVideo, CogVideoX, Mochi.
     "do_separate_classifier_free_guidance": True, # Wan 2.1
+    # Compute cfg forward first or not, default False, namely,
+    # 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
     "cfg_compute_first": False,
+    # Compute spearate diff values for CFG and non-CFG step,
+    # default True. If False, we will use the computed diff from
+    # current non-CFG transformer step for current CFG step.
+    "cfg_diff_compute_separate": True,
 }
 ```
 
{cache_dit-0.2.4 → cache_dit-0.2.5}/README.md

@@ -9,7 +9,7 @@
 <img src=https://img.shields.io/badge/PyPI-pass-brightgreen.svg >
 <img src=https://static.pepy.tech/badge/cache-dit >
 <img src=https://img.shields.io/badge/Python-3.10|3.11|3.12-9cf.svg >
-<img src=https://img.shields.io/badge/Release-v0.2
+<img src=https://img.shields.io/badge/Release-v0.2-brightgreen.svg >
 </div>
 <p align="center">
 DeepCache is for UNet not DiT. Most DiT cache speedups are complex and not training-free. CacheDiT offers <br>a set of training-free cache accelerators for DiT: <b>🔥<a href="#dbcache">DBCache</a>, <a href="#dbprune">DBPrune</a>, <a href="#taylorseer">TaylorSeer</a>, <a href="#fbcache">FBCache</a></b>, etc🔥
@@ -134,7 +134,7 @@ The **CacheDiT** codebase is adapted from [FBCache](https://github.com/chengzeyi
 You can install the stable release of `cache-dit` from PyPI:
 
 ```bash
-pip3 install cache-dit
+pip3 install -U cache-dit
 ```
 Or you can install the latest develop version from GitHub:
 
@@ -146,11 +146,13 @@ pip3 install git+https://github.com/vipshop/cache-dit.git
 
 <div id="supported"></div>
 
-- [🚀FLUX.1](https://github.com/vipshop/cache-dit/raw/main/examples)
-- [🚀
+- [🚀FLUX.1-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀FLUX.1-Fill-dev](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀mochi-1-preview](https://github.com/vipshop/cache-dit/raw/main/examples)
 - [🚀CogVideoX](https://github.com/vipshop/cache-dit/raw/main/examples)
 - [🚀CogVideoX1.5](https://github.com/vipshop/cache-dit/raw/main/examples)
-- [🚀Wan2.1](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀Wan2.1-T2V](https://github.com/vipshop/cache-dit/raw/main/examples)
+- [🚀Wan2.1-FLF2V](https://github.com/vipshop/cache-dit/raw/main/examples)
 - [🚀HunyuanVideo](https://github.com/vipshop/cache-dit/raw/main/examples)
 
 
@@ -246,7 +248,7 @@ cache_options = {
     "taylorseer_kwargs": {
         "n_derivatives": 2, # default is 2.
     },
-    "warmup_steps": 3, # n_derivatives + 1
+    "warmup_steps": 3, # prefer: >= n_derivatives + 1
     "residual_diff_threshold": 0.12,
 }
 ```
@@ -269,12 +271,23 @@ cache_options = {
 
 <div id="cfg"></div>
 
-CacheDiT supports caching for CFG (classifier-free guidance)
+CacheDiT supports caching for **CFG (classifier-free guidance)**. For models that fuse CFG and non-CFG into a single forward step, or models that do not include CFG (classifier-free guidance) in the forward step, please set `do_separate_classifier_free_guidance` param to **False (default)**. Otherwise, set it to True. For examples:
 
 ```python
 cache_options = {
+    # CFG: classifier free guidance or not
+    # For model that fused CFG and non-CFG into single forward step,
+    # should set do_separate_classifier_free_guidance as False.
+    # For example, set it as True for Wan 2.1 and set it as False
+    # for FLUX.1, HunyuanVideo, CogVideoX, Mochi.
     "do_separate_classifier_free_guidance": True, # Wan 2.1
+    # Compute cfg forward first or not, default False, namely,
+    # 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
     "cfg_compute_first": False,
+    # Compute spearate diff values for CFG and non-CFG step,
+    # default True. If False, we will use the computed diff from
+    # current non-CFG transformer step for current CFG step.
+    "cfg_diff_compute_separate": True,
 }
 ```
 
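For orientation, the README hunks above document the TaylorSeer warmup rule and the new CFG flags in separate snippets; the sketch below shows how they might be combined into a single cache_options dict and applied with apply_cache_on_pipe. This is not part of this release's diff: the option names and values are taken from the hunks above, while the pipeline loader and model id are illustrative assumptions.

```python
import torch
from diffusers import DiffusionPipeline

from cache_dit.cache_factory import CacheType, apply_cache_on_pipe

# Illustrative pipeline; any DiT pipeline from the supported list above is used the same way.
# The model id here is an assumption -- substitute your own checkpoint.
pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

cache_options = {
    "cache_type": CacheType.DBCache,
    # TaylorSeer: keep warmup_steps >= n_derivatives + 1, per the README note above.
    "enable_taylorseer": True,
    "taylorseer_kwargs": {"n_derivatives": 2},
    "warmup_steps": 3,
    "residual_diff_threshold": 0.12,
    # CFG flags: True for pipelines that run CFG as a separate forward pass (e.g. Wan 2.1);
    # leave the default False for pipelines that fuse CFG and non-CFG into one step
    # (FLUX.1, HunyuanVideo, CogVideoX, Mochi).
    "do_separate_classifier_free_guidance": True,
    "cfg_compute_first": False,
    "cfg_diff_compute_separate": True,
}
apply_cache_on_pipe(pipe, **cache_options)
```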
{cache_dit-0.2.4 → cache_dit-0.2.5}/examples/README.md

@@ -32,7 +32,7 @@ python3 run_cogvideox.py --cache --Fn 8 --Bn 8
 python3 run_cogvideox.py --cache --Fn 8 --Bn 0 --taylorseer
 ```
 
-- Wan2.1
+- Wan2.1 T2V
 
 ```bash
 python3 run_wan.py # baseline
@@ -40,7 +40,15 @@ python3 run_wan.py --cache --Fn 8 --Bn 8
 python3 run_wan.py --cache --Fn 8 --Bn 0 --taylorseer
 ```
 
--
+- Wan2.1 FLF2V
+
+```bash
+python3 run_wan_flf2v.py # baseline
+python3 run_wan_flf2v.py --cache --Fn 8 --Bn 8
+python3 run_wan_flf2v.py --cache --Fn 8 --Bn 0 --taylorseer
+```
+
+- mochi-1-preview
 
 ```bash
 python3 run_mochi.py # baseline
cache_dit-0.2.5/examples/data/flf2v_input_first_frame.png
Binary file

cache_dit-0.2.5/examples/data/flf2v_input_last_frame.png
Binary file
{cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_cogvideox.py

@@ -70,7 +70,7 @@ if args.cache:
         "enable_taylorseer": args.taylorseer,
         "enable_encoder_taylorseer": args.taylorseer,
         # Taylorseer cache type cache be hidden_states or residual
-        "taylorseer_cache_type": "
+        "taylorseer_cache_type": "hidden_states",
         "taylorseer_kwargs": {
             "n_derivatives": args.taylorseer_order,
         },
{cache_dit-0.2.4 → cache_dit-0.2.5}/examples/run_wan.py

@@ -63,7 +63,13 @@ if args.cache:
         # For model that fused CFG and non-CFG into single forward step,
         # should set do_separate_classifier_free_guidance as False.
         "do_separate_classifier_free_guidance": True,
+        # Compute cfg forward first or not, default False, namely,
+        # 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
         "cfg_compute_first": False,
+        # Compute spearate diff values for CFG and non-CFG step,
+        # default True. If False, we will use the computed diff from
+        # current non-CFG transformer step for current CFG step.
+        "cfg_diff_compute_separate": True,
         "enable_taylorseer": args.taylorseer,
         "enable_encoder_taylorseer": args.taylorseer,
         # Taylorseer cache type cache be hidden_states or residual
@@ -89,12 +95,12 @@ pipe.enable_model_cpu_offload()
 
 # Wan currently requires installing diffusers from source
 assert isinstance(pipe.vae, AutoencoderKLWan) # enable type check for IDE
-if diffusers.__version__ >= "0.34.0
+if diffusers.__version__ >= "0.34.0":
     pipe.vae.enable_tiling()
     pipe.vae.enable_slicing()
 else:
     print(
-        "Wan pipeline requires diffusers version >= 0.34.0
+        "Wan pipeline requires diffusers version >= 0.34.0 "
         "for vae tiling and slicing, please install diffusers "
         "from source."
     )
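One note on the version check in the hunk above (the same pattern appears in the new run_wan_flf2v.py below): comparing diffusers.__version__ against "0.34.0" as plain strings happens to work for these releases, but lexicographic ordering is not a reliable version ordering in general (for instance, "0.9.0" compares greater than "0.34.0" as a string). A small sketch of a more robust check, assuming the packaging library is available; this is a suggestion, not part of the package:

```python
import diffusers
from packaging.version import Version  # assumes the `packaging` library is installed

# Semantic version comparison instead of lexicographic string comparison.
supports_vae_tiling = Version(diffusers.__version__) >= Version("0.34.0")
if not supports_vae_tiling:
    print(
        "Wan VAE tiling/slicing needs diffusers >= 0.34.0; "
        "please install diffusers from source."
    )
```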
cache_dit-0.2.5/examples/run_wan_flf2v.py (new file)

@@ -0,0 +1,191 @@
+import os
+import time
+import torch
+import diffusers
+import argparse
+import numpy as np
+import torchvision.transforms.functional as TF
+from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
+from diffusers.utils import export_to_video, load_image
+from transformers import CLIPVisionModel
+
+from cache_dit.cache_factory import CacheType, apply_cache_on_pipe
+
+
+def get_args() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser()
+    # General arguments
+    parser.add_argument("--cache", action="store_true", default=False)
+    parser.add_argument("--taylorseer", action="store_true", default=False)
+    parser.add_argument("--taylorseer-order", "--order", type=int, default=2)
+    parser.add_argument("--Fn-compute-blocks", "--Fn", type=int, default=1)
+    parser.add_argument("--Bn-compute-blocks", "--Bn", type=int, default=0)
+    parser.add_argument("--downsample-factor", "--df", type=int, default=4)
+    parser.add_argument("--rdt", type=float, default=0.08)
+    parser.add_argument("--warmup-steps", type=int, default=0)
+    return parser.parse_args()
+
+
+def aspect_ratio_resize(image, pipe, max_area=720 * 1280):
+    aspect_ratio = image.height / image.width
+    mod_value = (
+        pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
+    )
+    height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
+    width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
+    image = image.resize((width, height))
+    return image, height, width
+
+
+def center_crop_resize(image, height, width):
+    # Calculate resize ratio to match first frame dimensions
+    resize_ratio = max(width / image.width, height / image.height)
+
+    # Resize the image
+    width = round(image.width * resize_ratio)
+    height = round(image.height * resize_ratio)
+    size = [width, height]
+    image = TF.center_crop(image, size)
+
+    return image, height, width
+
+
+def prepare_pipeline(
+    pipe: WanImageToVideoPipeline,
+    args: argparse.ArgumentParser,
+):
+    if args.cache:
+        cache_options = {
+            "cache_type": CacheType.DBCache,
+            "warmup_steps": args.warmup_steps,
+            "max_cached_steps": -1, # -1 means no limit
+            "downsample_factor": args.downsample_factor,
+            # Fn=1, Bn=0, means FB Cache, otherwise, Dual Block Cache
+            "Fn_compute_blocks": args.Fn_compute_blocks, # Fn, F8, etc.
+            "Bn_compute_blocks": args.Bn_compute_blocks, # Bn, B16, etc.
+            "residual_diff_threshold": args.rdt,
+            # releative token diff threshold, default is 0.0
+            "important_condition_threshold": 0.00,
+            # CFG: classifier free guidance or not
+            # For model that fused CFG and non-CFG into single forward step,
+            # should set do_separate_classifier_free_guidance as False.
+            "do_separate_classifier_free_guidance": True,
+            # Compute cfg forward first or not, default False, namely,
+            # 0, 2, 4, ..., -> non-CFG step; 1, 3, 5, ... -> CFG step.
+            "cfg_compute_first": False,
+            # Compute spearate diff values for CFG and non-CFG step,
+            # default True. If False, we will use the computed diff from
+            # current non-CFG transformer step for current CFG step.
+            "cfg_diff_compute_separate": True,
+            "enable_taylorseer": args.taylorseer,
+            "enable_encoder_taylorseer": args.taylorseer,
+            # Taylorseer cache type cache be hidden_states or residual
+            "taylorseer_cache_type": "residual",
+            "taylorseer_kwargs": {
+                "n_derivatives": args.taylorseer_order,
+            },
+        }
+        cache_type_str = "DBCACHE"
+        cache_type_str = (
+            f"{cache_type_str}_F{args.Fn_compute_blocks}"
+            f"B{args.Bn_compute_blocks}W{args.warmup_steps}"
+            f"T{int(args.taylorseer)}O{args.taylorseer_order}"
+        )
+        print(f"cache options:\n{cache_options}")
+
+        apply_cache_on_pipe(pipe, **cache_options)
+    else:
+        cache_type_str = "NONE"
+
+    # Enable memory savings
+    pipe.enable_model_cpu_offload()
+
+    # Wan currently requires installing diffusers from source
+    assert isinstance(pipe.vae, AutoencoderKLWan) # enable type check for IDE
+    if diffusers.__version__ >= "0.34.0":
+        pipe.vae.enable_tiling()
+        pipe.vae.enable_slicing()
+    else:
+        print(
+            "Wan pipeline requires diffusers version >= 0.34.0 "
+            "for vae tiling and slicing, please install diffusers "
+            "from source."
+        )
+
+    return cache_type_str, pipe
+
+
+def main():
+    args = get_args()
+    print(args)
+
+    model_id = os.environ.get(
+        "WAN_FLF2V_DIR",
+        "Wan-AI/Wan2.1-FLF2V-14B-720P-Diffusers",
+    )
+    image_encoder = CLIPVisionModel.from_pretrained(
+        model_id, subfolder="image_encoder", torch_dtype=torch.float32
+    )
+    vae = AutoencoderKLWan.from_pretrained(
+        model_id, subfolder="vae", torch_dtype=torch.float32
+    )
+    pipe = WanImageToVideoPipeline.from_pretrained(
+        model_id,
+        vae=vae,
+        image_encoder=image_encoder,
+        torch_dtype=torch.bfloat16,
+    )
+    pipe.to("cuda")
+
+    cache_type_str, pipe = prepare_pipeline(pipe, args)
+
+    first_frame = load_image("data/flf2v_input_first_frame.png")
+    last_frame = load_image("data/flf2v_input_last_frame.png")
+
+    first_frame, height, width = aspect_ratio_resize(first_frame, pipe)
+    if last_frame.size != first_frame.size:
+        last_frame, _, _ = center_crop_resize(last_frame, height, width)
+
+    prompt = (
+        "CG animation style, a small blue bird takes off from the ground, flapping its wings. "
+        + "The bird's feathers are delicate, with a unique pattern on its chest. The background shows "
+        + "a blue sky with white clouds under bright sunshine. The camera follows the bird upward, "
+        + "capturing its flight and the vastness of the sky from a close-up, low-angle perspective."
+    )
+
+    start = time.time()
+    output = pipe(
+        image=first_frame,
+        last_image=last_frame,
+        prompt=prompt,
+        height=height,
+        width=width,
+        guidance_scale=5.5,
+        num_frames=49,
+        num_inference_steps=35,
+        generator=torch.Generator("cpu").manual_seed(0),
+    ).frames[0]
+    end = time.time()
+
+    if hasattr(pipe.transformer, "_cached_steps"):
+        cached_steps = pipe.transformer._cached_steps
+        residual_diffs = pipe.transformer._residual_diffs
+        print(f"Cache Steps: {len(cached_steps)}, {cached_steps}")
+        print(f"Residual Diffs: {len(residual_diffs)}, {residual_diffs}")
+    if hasattr(pipe.transformer, "_cfg_cached_steps"):
+        cfg_cached_steps = pipe.transformer._cfg_cached_steps
+        cfg_residual_diffs = pipe.transformer._cfg_residual_diffs
+        print(f"CFG Cache Steps: {len(cfg_cached_steps)}, {cfg_cached_steps} ")
+        print(
+            f"CFG Residual Diffs: {len(cfg_residual_diffs)}, {cfg_residual_diffs}"
+        )
+
+    time_cost = end - start
+    save_path = f"wan.flf2v.{cache_type_str}.mp4"
+    print(f"Time cost: {time_cost:.2f}s")
+    print(f"Saving video to {save_path}")
+    export_to_video(output, save_path, fps=16)
+
+
+if __name__ == "__main__":
+    main()