diffsynth 2.0.10.tar.gz → 2.0.11.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {diffsynth-2.0.10 → diffsynth-2.0.11}/PKG-INFO +1 -1
- {diffsynth-2.0.10 → diffsynth-2.0.11}/README.md +22 -2
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/base_pipeline.py +37 -4
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/loss.py +5 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/parsers.py +6 -0
- diffsynth-2.0.11/diffsynth/diffusion/template.py +203 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/training_module.py +52 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/dinov3_image_encoder.py +8 -4
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux2_dit.py +82 -137
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/siglip2_image_encoder.py +10 -4
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/flux2_image.py +49 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/wan_video.py +1 -1
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth.egg-info/PKG-INFO +1 -1
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth.egg-info/SOURCES.txt +1 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/pyproject.toml +1 -1
- {diffsynth-2.0.10 → diffsynth-2.0.11}/LICENSE +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/configs/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/configs/model_configs.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/configs/vram_management_module_maps.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/attention/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/attention/attention.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/data/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/data/operators.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/data/unified_dataset.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/device/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/device/npu_compatible_device.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/gradient/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/gradient/gradient_checkpoint.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/loader/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/loader/config.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/loader/file.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/loader/model.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/npu_patch/npu_fused_operator.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/vram/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/vram/disk_map.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/vram/initialization.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/core/vram/layers.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/ddim_scheduler.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/flow_match.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/logger.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/diffusion/runner.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ace_step_conditioner.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ace_step_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ace_step_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ace_step_tokenizer.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ace_step_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/anima_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ernie_image_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ernie_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux2_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux2_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_controlnet.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_infiniteyou.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_ipadapter.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_lora_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_lora_patcher.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_text_encoder_clip.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_text_encoder_t5.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/flux_value_control.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/general_modules.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/joyai_image_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/joyai_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/longcat_video_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ltx2_audio_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ltx2_common.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ltx2_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ltx2_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ltx2_upsampler.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/ltx2_video_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/model_loader.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/mova_audio_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/mova_audio_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/mova_dual_tower_bridge.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/nexus_gen.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/nexus_gen_ar_model.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/nexus_gen_projector.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/qwen_image_controlnet.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/qwen_image_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/qwen_image_image2lora.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/qwen_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/qwen_image_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/sd_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/stable_diffusion_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/stable_diffusion_unet.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/stable_diffusion_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/stable_diffusion_xl_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/stable_diffusion_xl_unet.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/step1x_connector.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/step1x_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_animate_adapter.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_camera_controller.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_dit_s2v.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_image_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_mot.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_motion_controller.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_vace.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wan_video_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wantodance.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/wav2vec.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/z_image_controlnet.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/z_image_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/z_image_image2lora.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/models/z_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/ace_step.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/anima_image.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/ernie_image.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/flux_image.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/joyai_image.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/ltx2_audio_video.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/mova_audio_video.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/qwen_image.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/stable_diffusion.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/stable_diffusion_xl.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/pipelines/z_image.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/controlnet/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/controlnet/annotator.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/controlnet/controlnet_input.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/data/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/data/audio.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/data/audio_video.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/data/media_io_ltx2.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/lora/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/lora/flux.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/lora/general.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/lora/merge.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/lora/reset_rank.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/ses/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/ses/ses.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ace_step_conditioner.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ace_step_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ace_step_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ace_step_tokenizer.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/anima_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/dino_v3.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ernie_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux2_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_controlnet.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_infiniteyou.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_ipadapter.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_text_encoder_clip.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_text_encoder_t5.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/flux_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/joyai_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ltx2_audio_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ltx2_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ltx2_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/ltx2_video_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/nexus_gen.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/nexus_gen_projector.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/qwen_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/stable_diffusion_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/stable_diffusion_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/stable_diffusion_xl_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/step1x_connector.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wan_video_animate_adapter.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wan_video_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wan_video_image_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wan_video_mot.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wan_video_vace.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wan_video_vae.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/wans2v_audio_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/z_image_dit.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/state_dict_converters/z_image_text_encoder.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/xfuser/__init__.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/utils/xfuser/xdit_context_parallel.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth/version.py +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth.egg-info/dependency_links.txt +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth.egg-info/requires.txt +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/diffsynth.egg-info/top_level.txt +0 -0
- {diffsynth-2.0.10 → diffsynth-2.0.11}/setup.cfg +0 -0
README.md

@@ -34,6 +34,15 @@ We believe that a well-developed open-source code framework can lower the thresh
 
 > Currently, the development personnel of this project are limited, with most of the work handled by [Artiprocher](https://github.com/Artiprocher) and [mi804](https://github.com/mi804). Therefore, the progress of new feature development will be relatively slow, and the speed of responding to and resolving issues is limited. We apologize for this and ask developers to understand.
 
+- **April 28, 2026** 🔥 We are excited to announce the release of **Diffusion Templates**, a plugin framework designed for Diffusion models that significantly lowers the barrier to training controllable generative models. Let's explore this cutting-edge technology together!
+* Open-source code: [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio)
+* Technical report: [arXiv](https://arxiv.org/abs/2604.24351)
+* Project homepage: [GitHub](https://modelscope.github.io/diffusion-templates-web/)
+* Documentation: [English Version](https://diffsynth-studio-doc.readthedocs.io/en/latest/Diffusion_Templates/Introducing_Diffusion_Templates.html) | [Chinese Version](https://diffsynth-studio-doc.readthedocs.io/zh-cn/latest/Diffusion_Templates/Introducing_Diffusion_Templates.html)
+* Online demo: [ModelScope](https://modelscope.cn/studios/DiffSynth-Studio/Diffusion-Templates)
+* Model collections: [ModelScope](https://modelscope.cn/collections/DiffSynth-Studio/KleinBase4B-Templates) | [ModelScope International](https://modelscope.ai/collections/DiffSynth-Studio/KleinBase4B-Templates) | [HuggingFace](https://huggingface.co/collections/DiffSynth-Studio/kleinbase4b-templates)
+* Datasets: [ModelScope](https://modelscope.cn/collections/DiffSynth-Studio/ImagePulseV2) | [ModelScope International](https://modelscope.ai/collections/DiffSynth-Studio/ImagePulseV2) | [HuggingFace](https://huggingface.co/collections/DiffSynth-Studio/imagepulsev2)
+
 - **April 27, 2026** We support ACE-Step-1.5! Support includes text-to-music generation, low VRAM inference, and LoRA training capabilities. For details, please refer to the [documentation](/docs/en/Model_Details/ACE-Step.md) and [example code](/examples/ace_step/).
 
 - **April 27, 2026**: We have reinstated support for the Stable Diffusion v1.5 and SDXL models, providing academic research support exclusively for these two model types.

@@ -96,7 +105,7 @@ We believe that a well-developed open-source code framework can lower the thresh
 
 - **August 20, 2025** We open-sourced the [DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix) model, improving the editing effect of Qwen-Image-Edit on low-resolution image inputs. Please refer to [our sample code](./examples/qwen_image/model_inference/Qwen-Image-Edit-Lowres-Fix.py)
 
-- **August 19, 2025**
+- **August 19, 2025** Qwen-Image-Edit open-sourced, welcome a new member to the image editing model family!
 
 - **August 18, 2025** We trained and open-sourced the Qwen-Image inpainting ControlNet model [DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint). The model structure adopts a lightweight design. Please refer to [our sample code](./examples/qwen_image/model_inference/Qwen-Image-Blockwise-ControlNet-Inpaint.py).
 

@@ -112,7 +121,7 @@ We believe that a well-developed open-source code framework can lower the thresh
 
 - **August 5, 2025** We open-sourced the distilled acceleration model [DiffSynth-Studio/Qwen-Image-Distill-Full](https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Distill-Full) for Qwen-Image, achieving approximately 5x acceleration.
 
-- **August 4, 2025**
+- **August 4, 2025** Qwen-Image open-sourced, welcome a new member to the image generation model family!
 
 - **August 1, 2025** [FLUX.1-Krea-dev](https://www.modelscope.cn/models/black-forest-labs/FLUX.1-Krea-dev) open-sourced, a text-to-image model focused on aesthetic photography. We provided comprehensive support in a timely manner, including low VRAM layer-by-layer offload, LoRA training, and full training. For more details, please refer to [./examples/flux/](./examples/flux/).
 

@@ -479,6 +488,17 @@ Example code for FLUX.2 is available at: [/examples/flux2/](/examples/flux2/)
 |[black-forest-labs/FLUX.2-klein-9B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-9B)|[code](/examples/flux2/model_inference/FLUX.2-klein-9B.py)|[code](/examples/flux2/model_inference_low_vram/FLUX.2-klein-9B.py)|[code](/examples/flux2/model_training/full/FLUX.2-klein-9B.sh)|[code](/examples/flux2/model_training/validate_full/FLUX.2-klein-9B.py)|[code](/examples/flux2/model_training/lora/FLUX.2-klein-9B.sh)|[code](/examples/flux2/model_training/validate_lora/FLUX.2-klein-9B.py)|
 |[black-forest-labs/FLUX.2-klein-base-4B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-4B)|[code](/examples/flux2/model_inference/FLUX.2-klein-base-4B.py)|[code](/examples/flux2/model_inference_low_vram/FLUX.2-klein-base-4B.py)|[code](/examples/flux2/model_training/full/FLUX.2-klein-base-4B.sh)|[code](/examples/flux2/model_training/validate_full/FLUX.2-klein-base-4B.py)|[code](/examples/flux2/model_training/lora/FLUX.2-klein-base-4B.sh)|[code](/examples/flux2/model_training/validate_lora/FLUX.2-klein-base-4B.py)|
 |[black-forest-labs/FLUX.2-klein-base-9B](https://www.modelscope.cn/models/black-forest-labs/FLUX.2-klein-base-9B)|[code](/examples/flux2/model_inference/FLUX.2-klein-base-9B.py)|[code](/examples/flux2/model_inference_low_vram/FLUX.2-klein-base-9B.py)|[code](/examples/flux2/model_training/full/FLUX.2-klein-base-9B.sh)|[code](/examples/flux2/model_training/validate_full/FLUX.2-klein-base-9B.py)|[code](/examples/flux2/model_training/lora/FLUX.2-klein-base-9B.sh)|[code](/examples/flux2/model_training/validate_lora/FLUX.2-klein-base-9B.py)|
+|[DiffSynth-Studio/Template-KleinBase4B-Aesthetic](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Aesthetic)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Aesthetic.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Aesthetic.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Aesthetic.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Aesthetic.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-Brightness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Brightness)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Brightness.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Brightness.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Brightness.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Brightness.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-Age](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Age)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Age.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Age.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Age.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Age.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-ControlNet](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ControlNet)|[code](/examples/flux2/model_inference/Template-KleinBase4B-ControlNet.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-ControlNet.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-ControlNet.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-ControlNet.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-Edit](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Edit)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Edit.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Edit.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Edit.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Edit.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-Inpaint](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Inpaint)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Inpaint.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Inpaint.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Inpaint.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Inpaint.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-PandaMeme](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-PandaMeme)|[code](/examples/flux2/model_inference/Template-KleinBase4B-PandaMeme.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-PandaMeme.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-PandaMeme.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-PandaMeme.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-Sharpness](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Sharpness)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Sharpness.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Sharpness.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Sharpness.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Sharpness.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-SoftRGB](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-SoftRGB)|[code](/examples/flux2/model_inference/Template-KleinBase4B-SoftRGB.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-SoftRGB.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-SoftRGB.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-SoftRGB.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-Upscaler](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-Upscaler)|[code](/examples/flux2/model_inference/Template-KleinBase4B-Upscaler.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-Upscaler.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-Upscaler.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-Upscaler.py)|-|-|
+|[DiffSynth-Studio/Template-KleinBase4B-ContentRef](https://www.modelscope.cn/models/DiffSynth-Studio/Template-KleinBase4B-ContentRef)|[code](/examples/flux2/model_inference/Template-KleinBase4B-ContentRef.py)|[code](/examples/flux2/model_inference_low_vram/Template-KleinBase4B-ContentRef.py)|[code](/examples/flux2/model_training/full/Template-KleinBase4B-ContentRef.sh)|[code](/examples/flux2/model_training/validate_full/Template-KleinBase4B-ContentRef.py)|-|-|
 
 </details>
 
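The Diffusion Templates announcement above is backed by the new `TemplatePipeline` in `diffsynth/diffusion/template.py` (its full source appears further below). The following is a hedged sketch of how a template model could be driven at inference time, not one of the official examples: the base pipeline object, the `"image"` key in `template_inputs`, and the `ModelConfig(model_id=...)` construction are assumptions; the released example scripts live under /examples/flux2/ and are not part of this diff.

    import torch
    from diffsynth.core import ModelConfig
    from diffsynth.diffusion.template import TemplatePipeline

    def run_template(pipe, reference_image, prompt):
        # `pipe` is a regular DiffSynth image pipeline (e.g. FLUX.2), built the usual way;
        # `reference_image` is the condition consumed by the template's `process_inputs`.
        template_pipe = TemplatePipeline.from_pretrained(
            torch_dtype=torch.bfloat16,
            device="cuda",
            model_configs=[ModelConfig(model_id="DiffSynth-Studio/Template-KleinBase4B-Edit")],  # assumed constructor usage
        )
        # Each entry in `template_inputs` selects a template via `model_id` (an index into
        # `model_configs`); the merged outputs (kv_cache, lora, ...) are forwarded to `pipe(...)`.
        return template_pipe(
            pipe=pipe,
            template_inputs=[{"model_id": 0, "image": reference_image}],  # input keys are assumptions
            prompt=prompt,
        )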
diffsynth/diffusion/base_pipeline.py

@@ -3,12 +3,13 @@ import torch
 import numpy as np
 from einops import repeat, reduce
 from typing import Union
-from ..core import AutoTorchModule, AutoWrappedLinear, load_state_dict, ModelConfig, parse_device_type
+from ..core import AutoTorchModule, AutoWrappedLinear, load_state_dict, ModelConfig, parse_device_type, enable_vram_management
 from ..core.device.npu_compatible_device import get_device_type
 from ..utils.lora import GeneralLoRALoader
 from ..models.model_loader import ModelPool
 from ..utils.controlnet import ControlNetInput
 from ..core.device import get_device_name, IS_NPU_AVAILABLE
+from .template import load_template_model, load_template_data_processor
 
 
 class PipelineUnit:

@@ -319,14 +320,21 @@ class BasePipeline(torch.nn.Module):
 
 
     def cfg_guided_model_fn(self, model_fn, cfg_scale, inputs_shared, inputs_posi, inputs_nega, **inputs_others):
+        # Positive side forward
         if inputs_shared.get("positive_only_lora", None) is not None:
-            self.clear_lora(verbose=0)
             self.load_lora(self.dit, state_dict=inputs_shared["positive_only_lora"], verbose=0)
         noise_pred_posi = model_fn(**inputs_posi, **inputs_shared, **inputs_others)
+        if inputs_shared.get("positive_only_lora", None) is not None:
+            self.clear_lora(verbose=0)
+
         if cfg_scale != 1.0:
-
-
+            # Negative side forward
+            if inputs_shared.get("negative_only_lora", None) is not None:
+                self.load_lora(self.dit, state_dict=inputs_shared["negative_only_lora"], verbose=0)
             noise_pred_nega = model_fn(**inputs_nega, **inputs_shared, **inputs_others)
+            if inputs_shared.get("negative_only_lora", None) is not None:
+                self.clear_lora(verbose=0)
+
             if isinstance(noise_pred_posi, tuple):
                 # Separately handling different output types of latents, eg. video and audio latents.
                 noise_pred = tuple(

@@ -338,6 +346,31 @@ class BasePipeline(torch.nn.Module):
         else:
             noise_pred = noise_pred_posi
         return noise_pred
+
+
+    def load_training_template_model(self, model_config: ModelConfig = None):
+        if model_config is not None:
+            model_config.download_if_necessary()
+            self.template_model = load_template_model(model_config.path, torch_dtype=self.torch_dtype, device=self.device)
+            self.template_data_processor = load_template_data_processor(model_config.path)()
+
+
+    def enable_lora_hot_loading(self, model: torch.nn.Module):
+        if hasattr(model, "vram_management_enabled") and getattr(model, "vram_management_enabled"):
+            return model
+        module_map = {torch.nn.Linear: AutoWrappedLinear}
+        vram_config = {
+            "offload_dtype": self.torch_dtype,
+            "offload_device": self.device,
+            "onload_dtype": self.torch_dtype,
+            "onload_device": self.device,
+            "preparing_dtype": self.torch_dtype,
+            "preparing_device": self.device,
+            "computation_dtype": self.torch_dtype,
+            "computation_device": self.device,
+        }
+        model = enable_vram_management(model, module_map, vram_config=vram_config)
+        return model
 
     def compile_pipeline(self, mode: str = "default", dynamic: bool = True, fullgraph: bool = False, compile_models: list = None, **kwargs):
         """
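Two things change in base_pipeline.py beyond the new import: `cfg_guided_model_fn` now loads an optional `positive_only_lora` just for the positive forward pass and clears it immediately afterwards, and it can load a separate `negative_only_lora` for the negative pass, so each branch of classifier-free guidance sees only its own adapter. The new helpers (`load_training_template_model`, `enable_lora_hot_loading`) attach a template model to the pipeline and wrap every `torch.nn.Linear` with `AutoWrappedLinear` so LoRA weights can be swapped at run time. A hedged sketch of the setup wiring (the call site is an assumption; the real wiring is in `DiffusionTrainingModule`, shown later in this diff):

    # Hypothetical setup code; `pipe` is a BasePipeline subclass and `template_model_config` a ModelConfig.
    pipe.load_training_template_model(template_model_config)  # sets pipe.template_model / pipe.template_data_processor
    pipe.dit = pipe.enable_lora_hot_loading(pipe.dit)         # makes the DiT's Linear layers LoRA-hot-swappable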
diffsynth/diffusion/loss.py

@@ -3,6 +3,11 @@ import torch
 
 
 def FlowMatchSFTLoss(pipe: BasePipeline, **inputs):
+    if "lora" in inputs:
+        # Image-to-LoRA models need to load lora here.
+        pipe.clear_lora(verbose=0)
+        pipe.load_lora(pipe.dit, state_dict=inputs["lora"], hotload=True, verbose=0)
+
     max_timestep_boundary = int(inputs.get("max_timestep_boundary", 1) * len(pipe.scheduler.timesteps))
     min_timestep_boundary = int(inputs.get("min_timestep_boundary", 0) * len(pipe.scheduler.timesteps))
 
diffsynth/diffusion/parsers.py

@@ -60,6 +60,11 @@ def add_gradient_config(parser: argparse.ArgumentParser):
     parser.add_argument("--gradient_accumulation_steps", type=int, default=1, help="Gradient accumulation steps.")
     return parser
 
+def add_template_model_config(parser: argparse.ArgumentParser):
+    parser.add_argument("--template_model_id_or_path", type=str, default=None, help="Model ID of path of template models.")
+    parser.add_argument("--enable_lora_hot_loading", default=False, action="store_true", help="Whether to enable LoRA hot-loading. Only available for image-to-lora models.")
+    return parser
+
 def add_general_config(parser: argparse.ArgumentParser):
     parser = add_dataset_base_config(parser)
     parser = add_model_config(parser)

@@ -67,4 +72,5 @@ def add_general_config(parser: argparse.ArgumentParser):
     parser = add_output_config(parser)
     parser = add_lora_config(parser)
     parser = add_gradient_config(parser)
+    parser = add_template_model_config(parser)
     return parser
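`add_template_model_config` is chained into `add_general_config`, so training scripts built on these parsers automatically gain the two new flags. A minimal sketch (the argv values are illustrative, not taken from the package):

    import argparse
    from diffsynth.diffusion.parsers import add_template_model_config

    parser = add_template_model_config(argparse.ArgumentParser())
    args = parser.parse_args([
        "--template_model_id_or_path", "DiffSynth-Studio/Template-KleinBase4B-Edit",
        "--enable_lora_hot_loading",
    ])
    print(args.template_model_id_or_path, args.enable_lora_hot_loading)  # prints the model id and True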
diffsynth/diffusion/template.py (new file)

@@ -0,0 +1,203 @@
+import torch, os, importlib, warnings, json, inspect
+from typing import Dict, List, Tuple, Union
+from ..core import ModelConfig, load_model
+from ..core.device.npu_compatible_device import get_device_type
+from ..utils.lora.merge import merge_lora
+
+
+KVCache = Dict[str, Tuple[torch.Tensor, torch.Tensor]]
+
+
+class TemplateModel(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+
+    @torch.no_grad()
+    def process_inputs(self, **kwargs):
+        return {}
+
+    def forward(self, **kwargs):
+        raise NotImplementedError()
+
+
+def check_template_model_format(model):
+    if not hasattr(model, "process_inputs"):
+        raise NotImplementedError("`process_inputs` is not implemented in the Template model.")
+    if "kwargs" not in inspect.signature(model.process_inputs).parameters:
+        raise NotImplementedError("`**kwargs` is not included in `process_inputs`.")
+    if not hasattr(model, "forward"):
+        raise NotImplementedError("`forward` is not implemented in the Template model.")
+    if "kwargs" not in inspect.signature(model.forward).parameters:
+        raise NotImplementedError("`**kwargs` is not included in `forward`.")
+
+
+def load_template_model(path, torch_dtype=torch.bfloat16, device="cuda", verbose=1):
+    spec = importlib.util.spec_from_file_location("template_model", os.path.join(path, "model.py"))
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    template_model_path = getattr(module, 'TEMPLATE_MODEL_PATH') if hasattr(module, 'TEMPLATE_MODEL_PATH') else None
+    if template_model_path is not None:
+        # With `TEMPLATE_MODEL_PATH`, a pretrained model will be loaded.
+        model = load_model(
+            model_class=getattr(module, 'TEMPLATE_MODEL'),
+            config=getattr(module, 'TEMPLATE_MODEL_CONFIG') if hasattr(module, 'TEMPLATE_MODEL_CONFIG') else None,
+            path=os.path.join(path, getattr(module, 'TEMPLATE_MODEL_PATH')),
+            torch_dtype=torch_dtype,
+            device=device,
+        )
+    else:
+        # Without `TEMPLATE_MODEL_PATH`, a randomly initialized model or a non-model module will be loaded.
+        model = module.TEMPLATE_MODEL()
+        if hasattr(model, "to"):
+            model = model.to(dtype=torch_dtype, device=device)
+        if hasattr(model, "eval"):
+            model = model.eval()
+    check_template_model_format(model)
+    if verbose > 0:
+        metadata = {
+            "model_architecture": getattr(module, 'TEMPLATE_MODEL').__name__,
+            "code_path": os.path.join(path, "model.py"),
+            "weight_path": template_model_path,
+        }
+        print(f"Template model loaded: {json.dumps(metadata, indent=4)}")
+    return model
+
+
+def load_template_data_processor(path):
+    spec = importlib.util.spec_from_file_location("template_model", os.path.join(path, "model.py"))
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    if hasattr(module, 'TEMPLATE_DATA_PROCESSOR'):
+        processor = getattr(module, 'TEMPLATE_DATA_PROCESSOR')
+        return processor
+    else:
+        return None
+
+
+class TemplatePipeline(torch.nn.Module):
+    def __init__(
+        self,
+        torch_dtype: torch.dtype = torch.bfloat16,
+        device: Union[str, torch.device] = get_device_type(),
+        model_configs: list[ModelConfig] = [],
+        lazy_loading: bool = False,
+    ):
+        super().__init__()
+        self.torch_dtype = torch_dtype
+        self.device = device
+        self.model_configs = model_configs
+        self.lazy_loading = lazy_loading
+        if lazy_loading:
+            for model_config in model_configs:
+                TemplatePipeline.check_vram_config(model_config)
+                model_config.download_if_necessary()
+            self.models = None
+        else:
+            models = []
+            for model_config in model_configs:
+                TemplatePipeline.check_vram_config(model_config)
+                model_config.download_if_necessary()
+                model = load_template_model(model_config.path, torch_dtype=torch_dtype, device=device)
+                models.append(model)
+            self.models = torch.nn.ModuleList(models)
+
+    def merge_kv_cache(self, kv_cache_list: List[KVCache]) -> KVCache:
+        names = {}
+        for kv_cache in kv_cache_list:
+            for name in kv_cache:
+                names[name] = None
+        kv_cache_merged = {}
+        for name in names:
+            kv_list = [kv_cache.get(name) for kv_cache in kv_cache_list]
+            kv_list = [kv for kv in kv_list if kv is not None]
+            if len(kv_list) > 0:
+                k = torch.concat([kv[0] for kv in kv_list], dim=1)
+                v = torch.concat([kv[1] for kv in kv_list], dim=1)
+                kv_cache_merged[name] = (k, v)
+        return kv_cache_merged
+
+    def merge_template_cache(self, template_cache_list):
+        params = sorted(list(set(sum([list(template_cache.keys()) for template_cache in template_cache_list], []))))
+        template_cache_merged = {}
+        for param in params:
+            data = [template_cache[param] for template_cache in template_cache_list if param in template_cache]
+            if param == "kv_cache":
+                data = self.merge_kv_cache(data)
+            elif param == "lora":
+                data = merge_lora(data)
+            elif len(data) == 1:
+                data = data[0]
+            else:
+                print(f"Conflict detected: `{param}` appears in the outputs of multiple Template models. Only the first one will be retained.")
+                data = data[0]
+            template_cache_merged[param] = data
+        return template_cache_merged
+
+    @staticmethod
+    def check_vram_config(model_config: ModelConfig):
+        params = [
+            model_config.offload_device, model_config.offload_dtype,
+            model_config.onload_device, model_config.onload_dtype,
+            model_config.preparing_device, model_config.preparing_dtype,
+            model_config.computation_device, model_config.computation_dtype,
+        ]
+        for param in params:
+            if param is not None:
+                warnings.warn("TemplatePipeline doesn't support VRAM management. VRAM config will be ignored.")
+
+    @staticmethod
+    def from_pretrained(
+        torch_dtype: torch.dtype = torch.bfloat16,
+        device: Union[str, torch.device] = get_device_type(),
+        model_configs: list[ModelConfig] = [],
+        lazy_loading: bool = False,
+    ):
+        pipe = TemplatePipeline(torch_dtype, device, model_configs, lazy_loading)
+        return pipe
+
+    def fetch_model(self, model_id):
+        if self.lazy_loading:
+            model_config = self.model_configs[model_id]
+            model_config.download_if_necessary()
+            model = load_template_model(model_config.path, torch_dtype=self.torch_dtype, device=self.device)
+        else:
+            model = self.models[model_id]
+        return model
+
+    def call_single_side(self, pipe=None, inputs: List[Dict] = None):
+        model = None
+        onload_model_id = -1
+        template_cache = []
+        for i in inputs:
+            model_id = i.get("model_id", 0)
+            if model_id != onload_model_id:
+                model = self.fetch_model(model_id)
+                onload_model_id = model_id
+            cache = model.process_inputs(pipe=pipe, **i)
+            cache = model.forward(pipe=pipe, **cache)
+            template_cache.append(cache)
+        template_cache = self.merge_template_cache(template_cache)
+        return template_cache
+
+    @torch.no_grad()
+    def __call__(
+        self,
+        pipe=None,
+        template_inputs: List[Dict] = None,
+        negative_template_inputs: List[Dict] = None,
+        **kwargs,
+    ):
+        template_cache = self.call_single_side(pipe=pipe, inputs=template_inputs or [])
+        negative_template_cache = self.call_single_side(pipe=pipe, inputs=negative_template_inputs or [])
+        required_params = list(inspect.signature(pipe.__call__).parameters.keys())
+        for param in template_cache:
+            if param in required_params:
+                kwargs[param] = template_cache[param]
+            else:
+                print(f"`{param}` is not included in the inputs of `{pipe.__class__.__name__}`. This parameter will be ignored.")
+        for param in negative_template_cache:
+            if "negative_" + param in required_params:
+                kwargs["negative_" + param] = negative_template_cache[param]
+            else:
+                print(f"`{'negative_' + param}` is not included in the inputs of `{pipe.__class__.__name__}`. This parameter will be ignored.")
+        return pipe(**kwargs)
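`load_template_model` imports a `model.py` shipped inside each template repository and reads a few module-level attributes: `TEMPLATE_MODEL` (required), plus the optional `TEMPLATE_MODEL_PATH`, `TEMPLATE_MODEL_CONFIG`, and `TEMPLATE_DATA_PROCESSOR`. The sketch below shows what a minimal template package could look like. It is an illustration, not one of the released Template-KleinBase4B models, and the returned `cfg_scale` key is only an example of emitting a parameter the target pipeline's `__call__` accepts; the structural requirements (a `TEMPLATE_MODEL` class whose `process_inputs` and `forward` both take `**kwargs`) come from `check_template_model_format`.

    # model.py inside a hypothetical template repository
    import torch
    from diffsynth.diffusion.template import TemplateModel


    class MyTemplate(TemplateModel):
        @torch.no_grad()
        def process_inputs(self, strength: float = 1.0, pipe=None, **kwargs):
            # Turn raw user/dataset inputs into tensors; `**kwargs` is mandatory.
            return {"strength": torch.tensor(float(strength))}

        def forward(self, strength=None, pipe=None, **kwargs):
            # Keys of the returned dict are matched against the target pipeline's __call__ parameters;
            # the special keys "kv_cache" and "lora" get merged by TemplatePipeline / the training units.
            return {"cfg_scale": 1.0 + strength.item()}


    TEMPLATE_MODEL = MyTemplate
    # Optional: TEMPLATE_MODEL_PATH = "model.safetensors"    # load pretrained weights via load_model instead
    # Optional: TEMPLATE_MODEL_CONFIG = {...}                 # config forwarded to load_model
    # Optional: TEMPLATE_DATA_PROCESSOR = SomeProcessorClass  # pre-processing applied during training

During training, `DiffusionTrainingModule.load_training_template_model` (next file) appends pipeline units that run `process_inputs` and `forward` around each step; at inference the same `model.py` is driven through `TemplatePipeline.__call__`, which merges the outputs of all selected templates and forwards them to the wrapped pipeline.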
diffsynth/diffusion/training_module.py

@@ -6,6 +6,7 @@ from peft import LoraConfig, inject_adapter_in_model
 
 
 class GeneralUnit_RemoveCache(PipelineUnit):
+    # Only used for training
     def __init__(self, required_params=tuple(), force_remove_params_shared=tuple(), force_remove_params_posi=tuple(), force_remove_params_nega=tuple()):
         super().__init__(take_over=True)
         self.required_params = required_params

@@ -27,6 +28,47 @@ class GeneralUnit_RemoveCache(PipelineUnit):
         return inputs_shared, inputs_posi, inputs_nega
 
 
+class GeneralUnit_TemplateProcessInputs(PipelineUnit):
+    # Only used for training
+    def __init__(self, data_processor):
+        super().__init__(
+            input_params=("template_inputs",),
+            output_params=("template_inputs",),
+        )
+        self.data_processor = data_processor
+
+    def process(self, pipe, template_inputs):
+        if not hasattr(pipe, "template_model") or template_inputs is None:
+            return {}
+        if self.data_processor is not None:
+            template_inputs = self.data_processor(**template_inputs)
+        template_inputs = pipe.template_model.process_inputs(pipe=pipe, **template_inputs)
+        return {"template_inputs": template_inputs}
+
+
+class GeneralUnit_TemplateForward(PipelineUnit):
+    # Only used for training
+    def __init__(self, use_gradient_checkpointing=False, use_gradient_checkpointing_offload=False):
+        super().__init__(
+            input_params=("template_inputs",),
+            output_params=("kv_cache",),
+            onload_model_names=("template_model",)
+        )
+        self.use_gradient_checkpointing = use_gradient_checkpointing
+        self.use_gradient_checkpointing_offload = use_gradient_checkpointing_offload
+
+    def process(self, pipe, template_inputs):
+        if not hasattr(pipe, "template_model") or template_inputs is None:
+            return {}
+        template_cache = pipe.template_model.forward(
+            **template_inputs,
+            pipe=pipe,
+            use_gradient_checkpointing=self.use_gradient_checkpointing,
+            use_gradient_checkpointing_offload=self.use_gradient_checkpointing_offload,
+        )
+        return template_cache
+
+
 class DiffusionTrainingModule(torch.nn.Module):
     def __init__(self):
         super().__init__()

@@ -209,6 +251,16 @@ class DiffusionTrainingModule(torch.nn.Module):
         else:
             lora_target_modules = lora_target_modules.split(",")
         return lora_target_modules
+
+
+    def load_training_template_model(self, pipe, path_or_model_id, use_gradient_checkpointing=False, use_gradient_checkpointing_offload=False):
+        if path_or_model_id is None:
+            return pipe
+        model_config = self.parse_path_or_model_id(path_or_model_id)
+        pipe.load_training_template_model(model_config)
+        pipe.units.append(GeneralUnit_TemplateProcessInputs(pipe.template_data_processor))
+        pipe.units.append(GeneralUnit_TemplateForward(use_gradient_checkpointing, use_gradient_checkpointing_offload))
+        return pipe
 
 
     def switch_pipe_to_training_mode(
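A hedged sketch of how a concrete training module could use the new hook; the subclass, the pipeline construction, and everything except `load_training_template_model`, `enable_lora_hot_loading`, and the two parser flags are assumptions:

    from diffsynth.diffusion.training_module import DiffusionTrainingModule

    class MyTrainingModule(DiffusionTrainingModule):  # hypothetical subclass
        def __init__(self, args, pipe):
            super().__init__()
            # `pipe` is a BasePipeline already switched to training mode by the existing machinery.
            pipe = self.load_training_template_model(
                pipe,
                args.template_model_id_or_path,          # from --template_model_id_or_path
                use_gradient_checkpointing=True,
            )
            if args.enable_lora_hot_loading:             # from --enable_lora_hot_loading
                pipe.dit = pipe.enable_lora_hot_loading(pipe.dit)
            self.pipe = pipe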
diffsynth/models/dinov3_image_encoder.py

@@ -1,12 +1,16 @@
-(four original import lines; their content is truncated in the diff view, only a bare `import` fragment is visible)
+import torch, warnings
+try:
+    from transformers.models.dinov3_vit.modeling_dinov3_vit import DINOv3ViTModel
+except:
+    warnings.warn(f"Cannot import `DINOv3ViTModel`. `DINOv3ImageEncoder` is not available. Please update `transformers` by `pip install -U transformers`.")
+    DINOv3ViTModel = torch.nn.Module
 from ..core.device.npu_compatible_device import get_device_type
 
 
 class DINOv3ImageEncoder(DINOv3ViTModel):
     def __init__(self):
+        from transformers.models.dinov3_vit.modeling_dinov3_vit import DINOv3ViTConfig
+        from transformers import DINOv3ViTImageProcessor
         config = DINOv3ViTConfig(
             architectures = [
                 "DINOv3ViTModel"