PyPI - diffsynth-engine - Versions diffs - 0.1.1__tar.gz → 0.2.1__tar.gz - Mend

diffsynth-engine 0.1.1.tar.gz → 0.2.1.tar.gz

Files changed (236)

diffsynth_engine-0.2.1/.github/workflows/python-publish.yml ADDED

@@ -0,0 +1,41 @@
+name: release
+on:
+  push:
+    tags:
+      - 'v**'
+  workflow_dispatch:
+    inputs:
+      branch:
+        required: true
+        default: 'main'
+permissions:
+  contents: read
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+jobs:
+  build-and-publish:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install build
+        run: pip install build
+      - name: Build dist
+        run: python -m build
+      - name: Publish to PyPI
+        run: |
+          pip install twine
+          twine upload dist/* --skip-existing -p ${{ secrets.PYPI_API_TOKEN }}

diffsynth_engine-0.2.1/.gitignore ADDED

@@ -0,0 +1,11 @@
+*.pyc
+.idea/
+.vscode/
+__pycache__/
+tmp/
+build/
+dist/
+*.egg-info/
+.DS_Store/
+.pytest_cache/
+.ruff_cache/

diffsynth_engine-0.2.1/.pre-commit-config.yaml ADDED

@@ -0,0 +1,11 @@
+repos:
+- repo: https://github.com/astral-sh/ruff-pre-commit
+  # Ruff version.
+  rev: v0.11.5
+  hooks:
+    # Run the linter.
+    - id: ruff
+      types_or: [ python, pyi ]
+    # Run the formatter.
+    - id: ruff-format
+      types_or: [ python, pyi ]

diffsynth_engine-0.2.1/PKG-INFO ADDED

@@ -0,0 +1,34 @@
+Metadata-Version: 2.4
+Name: diffsynth_engine
+Version: 0.2.1
+Author: MuseAI x ModelScope
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+License-File: LICENSE
+Requires-Dist: torch>=2.6
+Requires-Dist: torchvision
+Requires-Dist: xformers; sys_platform == "linux"
+Requires-Dist: safetensors
+Requires-Dist: gguf
+Requires-Dist: einops
+Requires-Dist: ftfy
+Requires-Dist: regex
+Requires-Dist: sentencepiece
+Requires-Dist: tokenizers
+Requires-Dist: modelscope
+Requires-Dist: flufl.lock
+Requires-Dist: scipy
+Requires-Dist: torchsde
+Requires-Dist: pillow
+Requires-Dist: imageio[ffmpeg]
+Requires-Dist: yunchang; sys_platform == "linux"
+Provides-Extra: dev
+Requires-Dist: diffusers==0.31.0; extra == "dev"
+Requires-Dist: transformers==4.45.2; extra == "dev"
+Requires-Dist: build; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Requires-Dist: scikit-image; extra == "dev"
+Requires-Dist: pytest; extra == "dev"
+Requires-Dist: pre-commit; extra == "dev"
+Dynamic: license-file

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/README.md RENAMED

@@ -6,20 +6,20 @@
 [![GitHub pull-requests](https://img.shields.io/github/issues-pr/modelscope/DiffSynth-Engine.svg)](https://GitHub.com/modelscope/DiffSynth-Engine/pull/)
 [![GitHub latest commit](https://badgen.net/github/last-commit/modelscope/DiffSynth-Engine)](https://GitHub.com/modelscope/DiffSynth-Engine/commit/)
-Diffsynth Engine is a high-performance diffusion inference engine designed for developers.
+DiffSynth-Engine is a high-performance engine geared towards buidling efficient inference pipelines for diffusion models.
 **Key Features:**
-- **Clean and Readable Code:** Fully re-implements the Diffusion sampler and scheduler without relying on third-party libraries like k-diffusion, ldm, or sgm.
+- **Thoughtfully-Designed Implementation:** We carefully re-implemented key components in Diffusion pipelines, such as sampler and scheduler, without introducing external dependencies on libraries like k-diffusion, ldm, or sgm.
-- **Extensive Model Support:** Compatible with multiple formats (e.g., CivitAI format) of base models and LoRA models , catering to diverse use cases.
+- **Extensive Model Support:** Compatible with popular formats (e.g., CivitAI) of base models and LoRA models , catering to diverse use cases.
-- **Flexible Memory Management:** Supports various levels of model quantization (e.g., FP8, INT8)
-and offload strategies, enabling users to run large models (e.g., Flux.1 Dev) on limited GPU memory.
+- **Versatile Resource Management:** Comprehensive support for varous model quantization (e.g., FP8, INT8)
+and offloading strategies, enabling loading of larger diffusion models (e.g., Flux.1 Dev) on limited hardware budget of GPU memory.
-- **High-Performance Inference:** Optimizes the inference pipeline to achieve fast generation across various hardware environments.
+- **Optimized Performance:** Carefully-crafted inference pipeline to achieve fast generation across various hardware environments.
-- **Platform Compatibility:** Supports Windows, macOS (Apple Silicon), and Linux, ensuring a smooth experience across different operating systems.
+- **Cross-Platform Support:** Runnable on Windows, macOS (Apple Silicon), and Linux, ensuring a smooth experience across different operating systems.
 ## Quick Start
 ### Requirements
@@ -29,13 +29,13 @@ and offload strategies, enabling users to run large models (e.g., Flux.1 Dev) on
 ### Installation
-Install for PyPI (stable version)
-```python
+Install released version (from PyPI):
+```shell
 pip3 install diffsynth-engine
 ```
-Install for source (preview version)
-```python
+Install from source:
+```shell
 git clone https://github.com/modelscope/diffsynth-engine.git && cd diffsynth-engine
 pip3 install -e .
 ```
@@ -71,10 +71,10 @@ For more details, please refer to our tutorials ([English](./docs/tutorial.md),
 ## Contact
-If you have any questions or feedback, please scan the QR code or send email to muse@alibaba-inc.com.
+If you have any questions or feedback, please scan the QR code below, or send email to muse@alibaba-inc.com.
 <div style="display: flex; justify-content: space-between;">
-    <img src="assets/dingtalk.png" alt="dingtalk" style="zoom: 60%;" />
+    <img src="assets/dingtalk.png" alt="dingtalk" width="400" />
 </div>
 ## License
@@ -82,7 +82,7 @@ This project is licensed under the Apache License 2.0. See the LICENSE file for
 ## Citation
-If you use this codebase, or otherwise found our work valuable, please cite:
+If you use this codebase, or otherwise found our work helpful, please cite:
 ```bibtex
 @misc{diffsynth-engine2025,

diffsynth_engine-0.2.1/assets/dingtalk.png ADDED

Binary file

diffsynth_engine-0.2.1/assets/showcase.jpeg ADDED

Binary file

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/__init__.py RENAMED

@@ -7,11 +7,16 @@ from .pipelines import (
     SDXLModelConfig,
     SDModelConfig,
     WanModelConfig,
+    ControlNetParams,
 )
+from .models.flux import FluxControlNet
 from .utils.download import fetch_model, fetch_modelscope_model, fetch_civitai_model
 from .utils.video import load_video, save_video
+from .tools import FluxInpaintingTool, FluxOutpaintingTool
 __all__ = [
     "FluxImagePipeline",
+    "FluxControlNet",
     "SDXLImagePipeline",
     "SDImagePipeline",
     "WanVideoPipeline",
@@ -19,7 +24,12 @@ __all__ = [
     "SDXLModelConfig",
     "SDModelConfig",
     "WanModelConfig",
+    "FluxInpaintingTool",
+    "FluxOutpaintingTool",
+    "ControlNetParams",
     "fetch_model",
     "fetch_modelscope_model",
     "fetch_civitai_model",
+    "load_video",
+    "save_video",
 ]

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/algorithm/noise_scheduler/flow_match/recifited_flow.py RENAMED

@@ -5,18 +5,19 @@ from diffsynth_engine.algorithm.noise_scheduler.base_scheduler import append_zer
 class RecifitedFlowScheduler(BaseScheduler):
-    def __init__(self,
-        shift=1.0,
-        sigma_min=0.001,
+    def __init__(
+        self,
+        shift=1.0,
+        sigma_min=0.001,
         sigma_max=1.0,
-        num_train_timesteps=1000,
+        num_train_timesteps=1000,
         use_dynamic_shifting=False,
     ):
         self.shift = shift
         self.sigma_min = sigma_min
         self.sigma_max = sigma_max
-        self.num_train_timesteps = num_train_timesteps
-        self.use_dynamic_shifting = use_dynamic_shifting
+        self.num_train_timesteps = num_train_timesteps
+        self.use_dynamic_shifting = use_dynamic_shifting
     def _sigma_to_t(self, sigma):
         return sigma * self.num_train_timesteps
@@ -30,19 +31,20 @@ class RecifitedFlowScheduler(BaseScheduler):
     def _shift_sigma(self, sigma: torch.Tensor, shift: float):
         return shift * sigma / (1 + (shift - 1) * sigma)
-    def schedule(self,
-                 num_inference_steps: int,
-                 mu: float | None = None,
-                 sigma_min: float | None = None,
-                 sigma_max: float | None = None
+    def schedule(
+        self,
+        num_inference_steps: int,
+        mu: float | None = None,
+        sigma_min: float | None = None,
+        sigma_max: float | None = None,
     ):
         sigma_min = self.sigma_min if sigma_min is None else sigma_min
-        sigma_max = self.sigma_max if sigma_max is None else sigma_max
+        sigma_max = self.sigma_max if sigma_max is None else sigma_max
         sigmas = torch.linspace(sigma_max, sigma_min, num_inference_steps)
         if self.use_dynamic_shifting:
-            sigmas = self._time_shift(mu, 1.0, sigmas)            # FLUX
+            sigmas = self._time_shift(mu, 1.0, sigmas)  # FLUX
         else:
             sigmas = self._shift_sigma(sigmas, self.shift)
         timesteps = sigmas * self.num_train_timesteps
         sigmas = append_zero(sigmas)
-        return sigmas, timesteps
+        return sigmas, timesteps
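
For readers following the hunk above, the non-dynamic branch of `schedule` can be traced without `torch`. The sketch below is a plain-Python restatement under stated assumptions: `linspace` and the module-level `shift_sigma` and `schedule` functions are illustrative stand-ins, not part of the package, and the `use_dynamic_shifting` path is left out because `_time_shift` is not shown in this diff.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    # Same formula as RecifitedFlowScheduler._shift_sigma in the hunk above.
    return shift * sigma / (1 + (shift - 1) * sigma)


def linspace(start: float, stop: float, num: int) -> list[float]:
    # Hypothetical stand-in for torch.linspace, returning a Python list.
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def schedule(
    num_inference_steps: int,
    shift: float = 1.0,
    sigma_min: float = 0.001,
    sigma_max: float = 1.0,
    num_train_timesteps: int = 1000,
):
    # Evenly spaced sigmas from sigma_max down to sigma_min, then shifted.
    sigmas = [shift_sigma(s, shift) for s in linspace(sigma_max, sigma_min, num_inference_steps)]
    # Timesteps are computed before the trailing zero is appended.
    timesteps = [s * num_train_timesteps for s in sigmas]
    # Mirrors append_zero: one extra sigma of 0.0 at the end.
    return sigmas + [0.0], timesteps


if __name__ == "__main__":
    sigmas, timesteps = schedule(4, shift=3.0)
    print(sigmas)
    print(timesteps)
```

With `shift=1.0` the shift is the identity, so the sigmas stay linearly spaced; larger `shift` values push intermediate sigmas towards `sigma_max`.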

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/algorithm/noise_scheduler/stable_diffusion/ddim.py RENAMED

@@ -1,7 +1,4 @@
 import torch
-from .linear import ScaledLinearScheduler
-from ..base_scheduler import append_zero
-import numpy as np
 from diffsynth_engine.algorithm.noise_scheduler.stable_diffusion.linear import ScaledLinearScheduler
 from diffsynth_engine.algorithm.noise_scheduler.base_scheduler import append_zero

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/algorithm/noise_scheduler/stable_diffusion/sgm_uniform.py RENAMED

@@ -1,7 +1,4 @@
 import torch
-from .linear import ScaledLinearScheduler
-from ..base_scheduler import append_zero
-import numpy as np
 from diffsynth_engine.algorithm.noise_scheduler.stable_diffusion.linear import ScaledLinearScheduler
 from diffsynth_engine.algorithm.noise_scheduler.base_scheduler import append_zero

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/algorithm/sampler/flow_match/flow_match_euler.py RENAMED

@@ -2,7 +2,7 @@ import torch
 class FlowMatchEulerSampler:
-    def initialize(self, init_latents, timesteps, sigmas, mask=None):
+    def initialize(self, init_latents, timesteps, sigmas, mask=None):
         self.init_latents = init_latents
         self.timesteps = timesteps
         self.sigmas = sigmas

diffsynth_engine-0.2.1/diffsynth_engine/models/__init__.py ADDED

@@ -0,0 +1,7 @@
+from .base import PreTrainedModel, StateDictConverter
+__all__ = [
+    "PreTrainedModel",
+    "StateDictConverter",
+]

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/models/base.py RENAMED

@@ -1,22 +1,14 @@
 import os
 import torch
 import torch.nn as nn
-from typing import Dict, Union
-from safetensors.torch import load_file
+from typing import Dict, Union, List, Any
+from diffsynth_engine.utils.loader import load_file
+from diffsynth_engine.models.basic.lora import LoRALinear, LoRAConv2d
 from diffsynth_engine.models.utils import no_init_weights
-class LoRAStateDictConverter:
-    def convert(self, lora_state_dict: Dict[str, torch.Tensor]) -> Dict[str, Dict[str, torch.Tensor]]:
-        return {"lora": lora_state_dict}
-StateDictType = Dict[str, torch.Tensor]
 class StateDictConverter:
-    def convert(self, state_dict: StateDictType) -> StateDictType:
+    def convert(self, state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
         return state_dict
@@ -29,17 +21,34 @@ class PreTrainedModel(nn.Module):
     @classmethod
     def from_pretrained(cls, pretrained_model_path: Union[str, os.PathLike], device: str, dtype: torch.dtype, **kwargs):
-        state_dict = load_file(pretrained_model_path, device=device)
+        state_dict = load_file(pretrained_model_path)
         return cls.from_state_dict(state_dict, device=device, dtype=dtype, **kwargs)
     @classmethod
     def from_state_dict(cls, state_dict: Dict[str, torch.Tensor], device: str, dtype: torch.dtype, **kwargs):
         with no_init_weights():
             model = torch.nn.utils.skip_init(cls, device=device, dtype=dtype, **kwargs)
+        model.to_empty(device=device)
         model.load_state_dict(state_dict)
         model.to(device=device, dtype=dtype, non_blocking=True)
         return model
+    def load_loras(self, lora_args: List[Dict[str, Any]], fused: bool = True):
+        for args in lora_args:
+            key = args["name"]
+            module = self.get_submodule(key)
+            if not isinstance(module, (LoRALinear, LoRAConv2d)):
+                raise ValueError(f"Unsupported lora key: {key}")
+            if fused:
+                module.add_frozen_lora(**args)
+            else:
+                module.add_lora(**args)
+    def unload_loras(self):
+        for module in self.modules():
+            if isinstance(module, (LoRALinear, LoRAConv2d)):
+                module.clear()
 def split_suffix(name: str):
     suffix_list = [

diffsynth_engine-0.2.1/diffsynth_engine/models/basic/attention.py ADDED

@@ -0,0 +1,233 @@
+import torch
+import torch.nn as nn
+from einops import rearrange, repeat
+from typing import Optional
+import torch.nn.functional as F
+from diffsynth_engine.utils import logging
+from diffsynth_engine.utils.flag import (
+    FLASH_ATTN_3_AVAILABLE,
+    FLASH_ATTN_2_AVAILABLE,
+    XFORMERS_AVAILABLE,
+    SDPA_AVAILABLE,
+    SAGE_ATTN_AVAILABLE,
+    SPARGE_ATTN_AVAILABLE,
+)
+logger = logging.get_logger(__name__)
+def memory_align(x: torch.Tensor, dim=-1, alignment: int = 8):
+    padding_size = (alignment - x.shape[dim] % alignment) % alignment
+    padded_x = F.pad(x, (0, padding_size), "constant", 0)
+    return padded_x[..., : x.shape[dim]]
+if FLASH_ATTN_3_AVAILABLE:
+    from flash_attn_interface import flash_attn_func as flash_attn3
+if FLASH_ATTN_2_AVAILABLE:
+    from flash_attn import flash_attn_func as flash_attn2
+if XFORMERS_AVAILABLE:
+    from xformers.ops import memory_efficient_attention
+    def xformers_attn(q, k, v, attn_mask=None, scale=None):
+        if attn_mask is not None:
+            attn_mask = repeat(attn_mask, "S L -> B H S L", B=q.shape[0], H=q.shape[2])
+            attn_mask = memory_align(attn_mask)
+        return memory_efficient_attention(q, k, v, attn_bias=attn_mask, scale=scale)
+if SDPA_AVAILABLE:
+    def sdpa_attn(q, k, v, attn_mask=None, scale=None):
+        q = q.transpose(1, 2)
+        k = k.transpose(1, 2)
+        v = v.transpose(1, 2)
+        out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask, scale=scale)
+        return out.transpose(1, 2)
+if SAGE_ATTN_AVAILABLE:
+    from sageattention import sageattn
+    def sage_attn(q, k, v, attn_mask=None, scale=None):
+        q = q.transpose(1, 2)
+        k = k.transpose(1, 2)
+        v = v.transpose(1, 2)
+        out = sageattn(q, k, v, attn_mask=attn_mask, sm_scale=scale)
+        return out.transpose(1, 2)
+if SPARGE_ATTN_AVAILABLE:
+    from spas_sage_attn import spas_sage2_attn_meansim_cuda
+    def sparge_attn(self, q, k, v, attn_mask=None, scale=None):
+        q = q.transpose(1, 2)
+        k = k.transpose(1, 2)
+        v = v.transpose(1, 2)
+        out = spas_sage2_attn_meansim_cuda(q, k, v, attn_mask=attn_mask, scale=scale)
+        return out.transpose(1, 2)
+def eager_attn(q, k, v, attn_mask=None, scale=None):
+    q = q.transpose(1, 2)
+    k = k.transpose(1, 2)
+    v = v.transpose(1, 2)
+    scale = 1 / q.shape[-1] ** 0.5 if scale is None else scale
+    q = q * scale
+    attn = torch.matmul(q, k.transpose(-2, -1))
+    if attn_mask is not None:
+        attn = attn + attn_mask
+    attn = attn.softmax(-1)
+    out = attn @ v
+    return out.transpose(1, 2)
+def attention(
+    q,
+    k,
+    v,
+    attn_impl: Optional[str] = None,
+    attn_mask: Optional[torch.Tensor] = None,
+    scale: Optional[float] = None,
+):
+    """
+    q: [B, Lq, Nq, C1]
+    k: [B, Lk, Nk, C1]
+    v: [B, Lk, Nk, C2]
+    """
+    assert attn_impl in [
+        None,
+        "auto",
+        "eager",
+        "flash_attn_2",
+        "flash_attn_3",
+        "xformers",
+        "sdpa",
+        "sage_attn",
+        "sparge_attn",
+    ]
+    if attn_impl is None or attn_impl == "auto":
+        if FLASH_ATTN_3_AVAILABLE:
+            return flash_attn3(q, k, v, softmax_scale=scale)
+        elif FLASH_ATTN_2_AVAILABLE:
+            return flash_attn2(q, k, v, softmax_scale=scale)
+        elif XFORMERS_AVAILABLE:
+            return xformers_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        elif SDPA_AVAILABLE:
+            return sdpa_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        else:
+            return eager_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+    else:
+        if attn_impl == "eager":
+            return eager_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        elif attn_impl == "flash_attn_3":
+            return flash_attn3(q, k, v, softmax_scale=scale)
+        elif attn_impl == "flash_attn_2":
+            return flash_attn2(q, k, v, softmax_scale=scale)
+        elif attn_impl == "xformers":
+            return xformers_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        elif attn_impl == "sdpa":
+            return sdpa_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        elif attn_impl == "sage_attn":
+            return sage_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        elif attn_impl == "sparge_attn":
+            return sparge_attn(q, k, v, attn_mask=attn_mask, scale=scale)
+        else:
+            raise ValueError(f"Invalid attention implementation: {attn_impl}")
+class Attention(nn.Module):
+    def __init__(
+        self,
+        q_dim,
+        num_heads,
+        head_dim,
+        kv_dim=None,
+        bias_q=False,
+        bias_kv=False,
+        bias_out=False,
+        scale=None,
+        attn_impl: Optional[str] = None,
+        device: str = "cuda:0",
+        dtype: torch.dtype = torch.float16,
+    ):
+        super().__init__()
+        dim_inner = head_dim * num_heads
+        kv_dim = kv_dim if kv_dim is not None else q_dim
+        self.num_heads = num_heads
+        self.head_dim = head_dim
+        self.to_q = nn.Linear(q_dim, dim_inner, bias=bias_q, device=device, dtype=dtype)
+        self.to_k = nn.Linear(kv_dim, dim_inner, bias=bias_kv, device=device, dtype=dtype)
+        self.to_v = nn.Linear(kv_dim, dim_inner, bias=bias_kv, device=device, dtype=dtype)
+        self.to_out = nn.Linear(dim_inner, q_dim, bias=bias_out, device=device, dtype=dtype)
+        self.attn_impl = attn_impl
+        self.scale = scale
+    def forward(
+        self,
+        x: torch.Tensor,
+        y: Optional[torch.Tensor] = None,
+        attn_mask: Optional[torch.Tensor] = None,
+    ):
+        if y is None:
+            y = x
+        q = rearrange(self.to_q(x), "b s (n d) -> b s n d", n=self.num_heads)
+        k = rearrange(self.to_k(y), "b s (n d) -> b s n d", n=self.num_heads)
+        v = rearrange(self.to_v(y), "b s (n d) -> b s n d", n=self.num_heads)
+        out = attention(q, k, v, attn_mask=attn_mask, attn_impl=self.attn_impl, scale=self.scale)
+        out = rearrange(out, "b s n d -> b s (n d)", n=self.num_heads)
+        return self.to_out(out)
+def long_context_attention(
+    q,
+    k,
+    v,
+    attn_impl: Optional[str] = None,
+    attn_mask: Optional[torch.Tensor] = None,
+    scale: Optional[float] = None,
+):
+    """
+    q: [B, Lq, Nq, C1]
+    k: [B, Lk, Nk, C1]
+    v: [B, Lk, Nk, C2]
+    """
+    from yunchang import LongContextAttention
+    from yunchang.kernels import AttnType
+    assert attn_impl in [
+        None,
+        "auto",
+        "eager",
+        "flash_attn_2",
+        "flash_attn_3",
+        "xformers",
+        "sdpa",
+        "sage_attn",
+        "sparge_attn",
+    ]
+    if attn_impl is None or attn_impl == "auto":
+        if FLASH_ATTN_3_AVAILABLE:
+            attn_func = LongContextAttention(attn_type=AttnType.FA3)
+        elif FLASH_ATTN_2_AVAILABLE:
+            attn_func = LongContextAttention(attn_type=AttnType.FA)
+        elif SDPA_AVAILABLE:
+            attn_func = LongContextAttention(attn_type=AttnType.TORCH)
+        else:
+            raise ValueError("No available long context attention implementation")
+    else:
+        if attn_impl == "flash_attn_3":
+            attn_func = LongContextAttention(attn_type=AttnType.FA3)
+        elif attn_impl == "flash_attn_2":
+            attn_func = LongContextAttention(attn_type=AttnType.FA)
+        elif attn_impl == "sdpa":
+            attn_func = LongContextAttention(attn_type=AttnType.TORCH)
+        elif attn_impl == "sage_attn":
+            attn_func = LongContextAttention(attn_type=AttnType.SAGE_FP8)
+        elif attn_impl == "sparge_attn":
+            attn_func = LongContextAttention(attn_type=AttnType.SPARSE_SAGE)
+        else:
+            raise ValueError(f"Invalid long context attention implementation: {attn_impl}")
+    return attn_func(q, k, v, softmax_scale=scale)
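
The `eager_attn` fallback added in this file is the reference computation that the other backends accelerate. As a rough illustration only, the sketch below restates it for a single batch element and a single head using nested Python lists instead of tensors; the name `eager_attention` and the list-based layout are assumptions made for this example, not the package's API. It also subtracts the row maximum before exponentiating, a numerical-stability step that does not change the softmax result.

```python
import math


def eager_attention(q, k, v, attn_mask=None, scale=None):
    """q: [Lq][C1], k: [Lk][C1], v: [Lk][C2] as nested lists (one batch, one head)."""
    head_dim = len(q[0])
    # Default scale matches the hunk above: 1 / sqrt(head_dim).
    scale = 1 / head_dim**0.5 if scale is None else scale
    out = []
    for i, q_row in enumerate(q):
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if attn_mask is not None:
            # Additive mask: 0.0 keeps a position, -inf removes it.
            scores = [s + m for s, m in zip(scores, attn_mask[i])]
        # Softmax over keys, shifted by the row maximum for stability.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[0.0, 0.0], [0.0, 0.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(eager_attention(q, k, v))
    print(eager_attention(q, k, v, attn_mask=[[0.0, float("-inf")]]))
```

The sketch assumes at least one unmasked key per query row; a fully masked row would produce NaN, just as a softmax over all `-inf` scores does.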

{diffsynth_engine-0.1.1 → diffsynth_engine-0.2.1}/diffsynth_engine/models/basic/unet_helper.py RENAMED

@@ -51,12 +51,12 @@ class BasicTransformerBlock(nn.Module):
     def forward(self, hidden_states, encoder_hidden_states):
         # 1. Self-Attention
         norm_hidden_states = self.norm1(hidden_states)
-        attn_output = self.attn1(norm_hidden_states, encoder_hidden_states=None)
+        attn_output = self.attn1(norm_hidden_states)
         hidden_states = attn_output + hidden_states
         # 2. Cross-Attention
         norm_hidden_states = self.norm2(hidden_states)
-        attn_output = self.attn2(norm_hidden_states, encoder_hidden_states=encoder_hidden_states)
+        attn_output = self.attn2(norm_hidden_states, y=encoder_hidden_states)
         hidden_states = attn_output + hidden_states
         # 3. Feed-forward
