PyPI - fusion-bench - Versions diffs - 0.2.2__tar.gz → 0.2.4__tar.gz - Mend

fusion-bench 0.2.2tar.gz → 0.2.4tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (513)

{fusion_bench-0.2.2 → fusion_bench-0.2.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: fusion_bench
-Version: 0.2.2
+Version: 0.2.4
 Summary: A Comprehensive Benchmark of Deep Model Fusion
 Author-email: Anke Tang <tang.anke@foxmail.com>
 License: MIT License
@@ -33,7 +33,6 @@ Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: hydra-core
-Requires-Dist: torch>=2.0.0
 Requires-Dist: lightning
 Requires-Dist: transformers
 Requires-Dist: datasets
@@ -47,6 +46,8 @@ Requires-Dist: scipy
 Requires-Dist: h5py
 Requires-Dist: pytest
+<div align='center'>
 # FusionBench: A Comprehensive Benchmark/ToolKit of Deep Model Fusion
 [![arXiv](https://img.shields.io/badge/arXiv-1234.56789-b31b1b.svg)](http://arxiv.org/abs/2406.03280)
@@ -57,17 +58,23 @@ Requires-Dist: pytest
 [![Static Badge](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)
 [![Static Badge](https://img.shields.io/badge/code%20style-yamlfmt-black)](https://github.com/google/yamlfmt)
+</div>
 > [!TIP]
 > Documentation is available at [tanganke.github.io/fusion_bench/](https://tanganke.github.io/fusion_bench/).
 ## Overview
 FusionBench is a benchmark suite designed to evaluate the performance of various deep model fusion techniques. It aims to provide a comprehensive comparison of different methods on a variety of datasets and tasks.
 Projects based on FusionBench:
+<details>
+  <summary>Li Shen, Anke Tang, Enneng Yang et al. Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Oct, 2024. https://github.com/EnnengYang/Efficient-WEMoE</summary>
+  <img width="1018" alt="image" src="https://github.com/user-attachments/assets/b7e1279e-87fc-4016-8867-1bff7700e271">
+</details>
 <details>
   <summary>Jinluan Yang et al. Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace. Oct, 2024. http://arxiv.org/abs/2410.13910</summary>
@@ -111,9 +118,21 @@ In this benchmark, we evaluate the performance of different fusion methods on a
 The project is structured as follows:
 - `fusion_bench/`: the main package of the benchmark.
+  - `method`: contains the implementation of the fusion methods.
+    > **naming convention**: `fusion_bench/method/{method_name}/{variant}.py` contains the implementation of the specific method or its variants.
+      For example, `fusion_bench/method/regmean/clip_regmean.py` contains the implementation of the RegMean algorithm for CLIP vision models.
+  - `modelpool`: contains the implementation of the model pool, responsible for managing the models and dataset to be loaded.
+  - `taskpool`: contains the implementation of the task pool, responsible for evaluating the performance of models returned by the algorithm.
 - `config/`: configuration files for the benchmark. We use [Hydra](https://hydra.cc/) to manage the configurations.
+  - `method`: configuration files for the fusion methods.
+    > **naming convention**: `config/method/{method_name}/{variant}.yaml` contains the configuration for the specific method or its variants.
+  - `modelpool`: configuration files for the model pool.
+  - `taskpool`: configuration files for the task pool.
+  - `model`: configuration files for the models.
+  - `dataset`: configuration files for the datasets.
 - `docs/`: documentation for the benchmark. We use [mkdocs](https://www.mkdocs.org/) to generate the documentation. Start the documentation server locally with `mkdocs serve`. The required packages can be installed with `pip install -r mkdocs-requirements.txt`.
 - `examples/`: example scripts for running some of the experiments.
+  > **naming convention**: `examples/{method_name}/` contains the files such as bash scripts and jupyter notebooks for the specific method.
 - `tests/`: unit tests for the benchmark.
 ## A Unified Command Line Interface
@@ -126,6 +145,9 @@ Read the [CLI documentation](https://tanganke.github.io/fusion_bench/cli/fusion_
 ## Implement your own model fusion algorithm
+First, create a new Python file for the algorithm in the `fusion_bench/method` directory.
+Following the naming convention, the file should be named `{method_name_or_class}/{variant}.py`.
 ```python
 from fusion_bench import BaseModelFusionAlgorithm, BaseModelPool
@@ -158,6 +180,9 @@ class DerivedModelFusionAlgorithm(BaseModelFusionAlgorithm):
 A corresponding configuration file should be created to specify the class and hyperparameters of the algorithm.
 Here we assume the configuration file is placed at `config/method/your_algorithm_config.yaml`.
+> [!NOTE]
+> In fact, you can place your implementation anywhere you like, as long as the `_target_` in the configuration file points to the correct class.
 ```yaml
 _target_: path_to_the_module.DerivedModelFusionAlgorithm
@@ -175,6 +200,16 @@ fusion_bench \
   ... # other configurations
 ```
+### :rocket: Quick Start for Experienced Users
+We provide a project template for quickly starting a new fusion algorithm implementation here: [FusionBench Project Template](https://github.com/fusion-bench/fusion-bench-project-template).
+<div align='center'>
+Click on [<kbd>Use this template</kbd>](https://github.com/fusion-bench/fusion-bench-project-template/generate) to initialize a new repository.
+</div>
 ### FusionBench Command Generator WebUI (for v0.1.x)
 FusionBench Command Generator is a user-friendly web interface for generating FusionBench commands based on configuration files.

{fusion_bench-0.2.2 → fusion_bench-0.2.4}/README.md RENAMED

@@ -1,3 +1,5 @@
+<div align='center'>
 # FusionBench: A Comprehensive Benchmark/ToolKit of Deep Model Fusion
 [![arXiv](https://img.shields.io/badge/arXiv-1234.56789-b31b1b.svg)](http://arxiv.org/abs/2406.03280)
@@ -8,17 +10,23 @@
 [![Static Badge](https://img.shields.io/badge/code%20style-black-black)](https://github.com/psf/black)
 [![Static Badge](https://img.shields.io/badge/code%20style-yamlfmt-black)](https://github.com/google/yamlfmt)
+</div>
 > [!TIP]
 > Documentation is available at [tanganke.github.io/fusion_bench/](https://tanganke.github.io/fusion_bench/).
 ## Overview
 FusionBench is a benchmark suite designed to evaluate the performance of various deep model fusion techniques. It aims to provide a comprehensive comparison of different methods on a variety of datasets and tasks.
 Projects based on FusionBench:
+<details>
+  <summary>Li Shen, Anke Tang, Enneng Yang et al. Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Oct, 2024. https://github.com/EnnengYang/Efficient-WEMoE</summary>
+  <img width="1018" alt="image" src="https://github.com/user-attachments/assets/b7e1279e-87fc-4016-8867-1bff7700e271">
+</details>
 <details>
   <summary>Jinluan Yang et al. Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace. Oct, 2024. http://arxiv.org/abs/2410.13910</summary>
@@ -62,9 +70,21 @@ In this benchmark, we evaluate the performance of different fusion methods on a
 The project is structured as follows:
 - `fusion_bench/`: the main package of the benchmark.
+  - `method`: contains the implementation of the fusion methods.
+    > **naming convention**: `fusion_bench/method/{method_name}/{variant}.py` contains the implementation of the specific method or its variants.
+      For example, `fusion_bench/method/regmean/clip_regmean.py` contains the implementation of the RegMean algorithm for CLIP vision models.
+  - `modelpool`: contains the implementation of the model pool, responsible for managing the models and dataset to be loaded.
+  - `taskpool`: contains the implementation of the task pool, responsible for evaluating the performance of models returned by the algorithm.
 - `config/`: configuration files for the benchmark. We use [Hydra](https://hydra.cc/) to manage the configurations.
+  - `method`: configuration files for the fusion methods.
+    > **naming convention**: `config/method/{method_name}/{variant}.yaml` contains the configuration for the specific method or its variants.
+  - `modelpool`: configuration files for the model pool.
+  - `taskpool`: configuration files for the task pool.
+  - `model`: configuration files for the models.
+  - `dataset`: configuration files for the datasets.
 - `docs/`: documentation for the benchmark. We use [mkdocs](https://www.mkdocs.org/) to generate the documentation. Start the documentation server locally with `mkdocs serve`. The required packages can be installed with `pip install -r mkdocs-requirements.txt`.
 - `examples/`: example scripts for running some of the experiments.
+  > **naming convention**: `examples/{method_name}/` contains the files such as bash scripts and jupyter notebooks for the specific method.
 - `tests/`: unit tests for the benchmark.
 ## A Unified Command Line Interface
@@ -77,6 +97,9 @@ Read the [CLI documentation](https://tanganke.github.io/fusion_bench/cli/fusion_
 ## Implement your own model fusion algorithm
+First, create a new Python file for the algorithm in the `fusion_bench/method` directory.
+Following the naming convention, the file should be named `{method_name_or_class}/{variant}.py`.
 ```python
 from fusion_bench import BaseModelFusionAlgorithm, BaseModelPool
@@ -109,6 +132,9 @@ class DerivedModelFusionAlgorithm(BaseModelFusionAlgorithm):
 A corresponding configuration file should be created to specify the class and hyperparameters of the algorithm.
 Here we assume the configuration file is placed at `config/method/your_algorithm_config.yaml`.
+> [!NOTE]
+> In fact, you can place your implementation anywhere you like, as long as the `_target_` in the configuration file points to the correct class.
 ```yaml
 _target_: path_to_the_module.DerivedModelFusionAlgorithm
@@ -126,6 +152,16 @@ fusion_bench \
   ... # other configurations
 ```
+### :rocket: Quick Start for Experienced Users
+We provide a project template for quickly starting a new fusion algorithm implementation here: [FusionBench Project Template](https://github.com/fusion-bench/fusion-bench-project-template).
+<div align='center'>
+Click on [<kbd>Use this template</kbd>](https://github.com/fusion-bench/fusion-bench-project-template/generate) to initialize a new repository.
+</div>
 ### FusionBench Command Generator WebUI (for v0.1.x)
 FusionBench Command Generator is a user-friendly web interface for generating FusionBench commands based on configuration files.
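
The `_target_` key shown in the README's YAML snippet above is a dotted import path to the algorithm class. As a rough illustration of what resolving such a path involves, the sketch below imports the module and looks up the class using only the standard library. It is not Hydra's implementation (Hydra's own `hydra.utils.instantiate` does considerably more), and `resolve_target`, `instantiate_from_config`, and the use of `fractions.Fraction` as a stand-in target are assumptions made for this example.

```python
import importlib
from typing import Any, Dict


def resolve_target(dotted_path: str) -> Any:
    """Import `pkg.module.Name` and return the attribute `Name`."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def instantiate_from_config(config: Dict[str, Any]) -> Any:
    """Build an object from a config dict that has a `_target_` key.

    Every other key is passed to the target as a keyword argument.
    """
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return resolve_target(config["_target_"])(**kwargs)


# Stand-in target from the standard library (an assumption for this sketch);
# a real config would point at something like
# `path_to_the_module.DerivedModelFusionAlgorithm`.
config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
obj = instantiate_from_config(config)
print(type(obj).__name__, obj)
```

Because the class is located purely by its import path, the implementation file can live anywhere on the Python path, which is the point the README's note makes.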

{fusion_bench-0.2.2 → fusion_bench-0.2.4}/fusion_bench/__init__.py RENAMED

@@ -13,7 +13,7 @@ from . import (
     tasks,
     utils,
 )
-from .method import BaseModelFusionAlgorithm
+from .method import BaseAlgorithm, BaseModelFusionAlgorithm
 from .modelpool import BaseModelPool
 from .models import separate_io
 from .taskpool import BaseTaskPool

{fusion_bench-0.2.2 → fusion_bench-0.2.4}/fusion_bench/compat/taskpool/clip_image_classification.py RENAMED

@@ -83,6 +83,12 @@ class CLIPImageClassificationTask(ClassificationTask):
     def evaluate(self, clip_model: CLIPModel):
         """
         Evaluate the model on the image classification task.
+        Args:
+            clip_model (CLIPModel): The CLIP model to evaluate.
+        Returns:
+            dict: A dictionary containing the evaluation results.
         """
         classifier = HFCLIPClassifier(
             clip_model=clip_model, processor=self._clip_processor
@@ -151,6 +157,12 @@ class CLIPImageClassificationTaskPool(TaskPool):
     def evaluate(self, model: CLIPVisionModel):
         """
         Evaluate the model on the image classification task.
+        Args:
+            model (CLIPVisionModel): The vision model to evaluate.
+        Returns:
+            dict: A dictionary containing the evaluation results for each task.
         """
         # if the fabric is not set, and we have a GPU, create a fabric instance
         if self._fabric is None and torch.cuda.is_available():

{fusion_bench-0.2.2 → fusion_bench-0.2.4}/fusion_bench/compat/taskpool/flan_t5_glue_text_generation.py RENAMED

@@ -149,6 +149,15 @@ class FlanT5GLUETextGenerationTaskPool(LightningFabricMixin, TaskPool):
             raise ValueError(f"Unknown task {task_config.name}")
     def evaluate(self, model: T5ForConditionalGeneration):
+        """
+        Evaluate the model on the FlanT5 GLUE text generation tasks.
+        Args:
+            model (T5ForConditionalGeneration): The model to evaluate.
+        Returns:
+            dict: A dictionary containing the evaluation results for each task.
+        """
         if not isinstance(model, T5ForConditionalGeneration):
             log.warning(
                 f"Model is not an instance of T5ForConditionalGeneration, but {type(model)}"

{fusion_bench-0.2.2 → fusion_bench-0.2.4}/fusion_bench/dataset/gsm8k.py RENAMED

@@ -16,6 +16,9 @@ def load_gsm8k_question_label_data(
     {'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?',
      'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72'}
+    Args:
+        dataset_name (Literal["train", "test", "train_socratic", "test_socratic"]): The name of the dataset to load.
     Returns:
         questions (List[str]): List of questions.
         labels (List[float]): List of labels. For example, the label for the above example is `72.0`.
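
In the docstring example above, the final answer follows the `####` marker at the end of the `answer` field. One way the numeric label could be derived from such a string is sketched below. `extract_gsm8k_label` is a hypothetical helper written for illustration; the actual parsing inside `load_gsm8k_question_label_data` is not shown in this diff and may differ.

```python
def extract_gsm8k_label(answer: str) -> float:
    """Return the number that follows the final `####` marker."""
    marker = "####"
    if marker not in answer:
        raise ValueError("answer has no '####' marker")
    final = answer.rsplit(marker, 1)[1].strip()
    # Assumption: thousands separators such as "1,000" may appear.
    return float(final.replace(",", ""))


answer = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)
print(extract_gsm8k_label(answer))
```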

fusion_bench-0.2.4/fusion_bench/dataset/imdb.py ADDED

@@ -0,0 +1,11 @@
+from typing import Any, Dict, List, Optional
+from datasets import load_dataset, load_from_disk
+from transformers import PreTrainedTokenizer
+import fusion_bench
+import os
+import logging
+from trl import SFTConfig, SFTTrainer
+log = logging.getLogger(__name__)

fusion_bench-0.2.4/fusion_bench/dataset/llama/alpaca.py ADDED

@@ -0,0 +1,142 @@
+import logging
+import os
+from typing import Any, Dict, List, Optional
+from datasets import Dataset, load_dataset, load_from_disk
+from transformers import PreTrainedTokenizer
+import fusion_bench
+log = logging.getLogger(__name__)
+def tokenize_alpaca_dataset(
+    dataset: Dataset,
+    tokenizer: PreTrainedTokenizer,
+    max_length: int = 2048,
+    input_template: str = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
+    input_no_template: str = "### Instruction:\n{instruction}\n\n### Response:\n",
+    batch_size: int = 1000,
+) -> Dataset:
+    """
+    Tokenize Alpaca format dataset with customizable options in batches.
+    Args:
+        dataset: The input dataset in Alpaca format
+        tokenizer: The tokenizer to use
+        max_length: Maximum sequence length
+        input_template: Template for samples with input field
+        input_no_template: Template for samples without input field
+        batch_size: Size of batches to process at once
+    Returns:
+        Tokenized dataset
+    """
+    def prepare_samples(samples: Dict[str, List[str]]) -> Dict[str, List[List[int]]]:
+        # Format prompts based on whether input field exists
+        prompts = []
+        for instruction, input_text in zip(
+            samples["instruction"], samples.get("input", [])
+        ):
+            if input_text.strip():
+                prompt = input_template.format(
+                    instruction=instruction.strip(), input=input_text.strip()
+                )
+            else:
+                prompt = input_no_template.format(instruction=instruction.strip())
+            prompts.append(prompt)
+        responses = [output.strip() for output in samples["output"]]
+        # Tokenize prompts and responses
+        prompt_tokens = tokenizer(
+            prompts, add_special_tokens=False, padding=False, truncation=False
+        )
+        response_tokens = tokenizer(
+            responses, add_special_tokens=False, padding=False, truncation=False
+        )
+        input_ids, labels = [], []
+        # Process each sample in the batch
+        for prompt_toks, response_toks in zip(
+            prompt_tokens["input_ids"], response_tokens["input_ids"]
+        ):
+            # Create input_ids with EOS token
+            sample_input_ids = prompt_toks + response_toks + [tokenizer.eos_token_id]
+            # Create labels: -100 for prompt, actual tokens for response
+            label = [-100] * len(prompt_toks) + response_toks + [tokenizer.eos_token_id]
+            # Truncate if exceeds max length
+            if len(sample_input_ids) > max_length:
+                sample_input_ids = sample_input_ids[:max_length]
+                label = label[:max_length]
+            input_ids.append(sample_input_ids)
+            labels.append(label)
+        # Use tokenizer's padding feature for input_ids and attention_mask
+        padded_results = tokenizer.pad(
+            {"input_ids": input_ids},
+            padding=True,
+            max_length=max_length,
+            return_attention_mask=True,
+        )
+        # Pad labels with -100
+        padded_labels = []
+        for label in labels:
+            padding_length = max_length - len(label)
+            if padding_length > 0:
+                label = label + [-100] * padding_length
+            padded_labels.append(label)
+        return {
+            "input_ids": padded_results["input_ids"],
+            "attention_mask": padded_results["attention_mask"],
+            "labels": padded_labels,
+        }
+    if tokenizer.pad_token is None:
+        log.warning("Tokenizer does not have a `pad_token`. Setting it to the `eos_token`.")
+        tokenizer.pad_token = tokenizer.eos_token
+    # Process the entire dataset in batches
+    tokenized_dataset = dataset.map(
+        prepare_samples,
+        batched=True,
+        batch_size=batch_size,
+        remove_columns=dataset.column_names,
+        desc="Tokenizing dataset",
+    )
+    return tokenized_dataset
+def load_tokenized_alpaca_dataset_from_json(
+    data_files: str,
+    tokenizer: PreTrainedTokenizer,
+    max_length: int,
+    split: Optional[str] = "train",
+    cache_path: Optional[str] = None,
+):
+    if cache_path is not None and fusion_bench.utils.path.path_is_dir_and_not_empty(
+        cache_path
+    ):
+        datasets = load_from_disk(cache_path)
+        if split is None:
+            return datasets
+        else:
+            return datasets[split]
+    else:
+        assert (
+            tokenizer is not None
+        ), "Cached dataset not found. Need tokenizer to process the raw data."
+    dataset = load_dataset("json", data_files=data_files)
+    if split is not None:
+        dataset = dataset[split]
+    dataset = tokenize_alpaca_dataset(dataset, tokenizer, max_length=max_length)
+    return dataset
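
The core of `prepare_samples` above is the label construction: prompt positions are set to `-100` so that only the response tokens (and the trailing EOS) contribute to the loss, which matches the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below isolates that step on plain integer lists, with no tokenizer involved. `build_masked_example` and the token ids are made up for illustration, and unlike the code above the sketch pads `input_ids` and `labels` to the same explicit length.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # positions with this label are excluded from the loss


def build_masked_example(
    prompt_ids: List[int],
    response_ids: List[int],
    eos_id: int,
    pad_id: int,
    max_length: int,
) -> Dict[str, List[int]]:
    """Concatenate prompt and response, mask the prompt, truncate, then pad."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]

    # Truncate both sequences identically so they stay aligned.
    input_ids = input_ids[:max_length]
    labels = labels[:max_length]

    # Right-pad everything to the same fixed length.
    n_pad = max_length - len(input_ids)
    attention_mask = [1] * len(input_ids) + [0] * n_pad
    input_ids = input_ids + [pad_id] * n_pad
    labels = labels + [IGNORE_INDEX] * n_pad

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


example = build_masked_example(
    prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=2, pad_id=0, max_length=8
)
for key, value in example.items():
    print(key, value)
```

In Hugging Face tokenizers, `padding=True` pads to the longest sequence in the batch rather than to `max_length`, so padding both fields to one explicit length, as in this sketch, is one way to keep `labels` the same length as `input_ids`.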

fusion_bench-0.2.4/fusion_bench/dataset/llama/openai.py ADDED

@@ -0,0 +1,160 @@
+import logging
+from typing import Dict, List
+from datasets import Dataset
+from transformers import PreTrainedTokenizer
+log = logging.getLogger(__name__)
+def tokenize_messages_dataset(
+    dataset: Dataset,
+    tokenizer: PreTrainedTokenizer,
+    max_length: int = 2048,
+    padding: bool = True,
+    system_template: str = "### System: {message}\n",
+    user_template: str = "## User: {message}\n",
+    assistant_template: str = "## Assistant: {message}\n",
+) -> Dataset:
+    R"""
+    Tokenize dataset with messages format supporting loss calculation flags.
+    The dataset is expected to have the following format:
+    Examples:
+    ```json
+    {
+        "messages": [
+            {
+                "role": "system",
+                "content": "XXX",
+                "calculate_loss": 0
+            },
+            {
+                "role": "system",
+                "content": "XXX",
+                "calculate_loss": 0
+            },
+            {
+                "role": "user",
+                "content": "XXX",
+                "calculate_loss": 0
+            },
+            {
+                "role": "assistant",
+                "content": "XXX",
+                "calculate_loss": 1
+            }
+        ],
+        "create_info": [
+            {
+                "date": "20240830",
+                "owner": "l00470783",
+                "within_source_id": 0,
+                "describe": "...",
+                "source": [
+                    "..."
+                ],
+                "language": "zh"
+            }
+        ],
+        "feature_info": {
+            "domain": "...",
+            "tags": [
+                "..."
+            ]
+        },
+        "source_file": "..."
+    }
+    ```
+    Args:
+        dataset: Input dataset with messages format
+        tokenizer: The tokenizer to use
+        max_length: Maximum sequence length
+        system_template: Template for system messages
+        user_template: Template for user messages
+        assistant_template: Template for assistant messages
+    Returns:
+        Tokenized dataset
+    """
+    def build_prompt(messages: List[Dict[str, str]]) -> tuple[str, str]:
+        """
+        Build prompt and get response that needs loss calculation.
+        Returns conversation history and the response to calculate loss on.
+        """
+        history = ""
+        response = ""
+        for message in messages:
+            role = message["role"]
+            content = message["content"].strip()
+            calculate_loss = message.get("calculate_loss", 0)
+            # Build conversation history
+            if role == "system":
+                history += system_template.format(message=content)
+            elif role == "user":
+                history += user_template.format(message=content)
+            elif role == "assistant":
+                if calculate_loss:
+                    # If this assistant message needs loss calculation,
+                    # save it as response and don't add to history
+                    response = content
+                else:
+                    # Otherwise add to conversation history
+                    history += assistant_template.format(message=content)
+        return history, response
+    def prepare_sample(sample: dict) -> dict:
+        # Get conversation history and response
+        history, response = build_prompt(sample["messages"])
+        # Tokenize prompt and response
+        prompt_tokens = tokenizer.encode(history, add_special_tokens=False)
+        response_tokens = tokenizer.encode(response, add_special_tokens=False)
+        # Create input_ids with EOS token
+        input_ids = prompt_tokens + response_tokens + [tokenizer.eos_token_id]
+        # Create attention mask
+        attention_mask = [1] * len(input_ids)
+        # Create labels: -100 for prompt, actual tokens for response
+        labels = (
+            [-100] * len(prompt_tokens) + response_tokens + [tokenizer.eos_token_id]
+        )
+        # Truncate if exceeds max length
+        if len(input_ids) > max_length:
+            input_ids = input_ids[:max_length]
+            attention_mask = attention_mask[:max_length]
+            labels = labels[:max_length]
+        # Pad if necessary
+        if padding:
+            padding_length = max_length - len(input_ids)
+            if padding_length > 0:
+                input_ids.extend([tokenizer.pad_token_id] * padding_length)
+                attention_mask.extend([0] * padding_length)
+                labels.extend([-100] * padding_length)
+        return {
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+            "labels": labels,
+        }
+    if tokenizer.pad_token is None:
+        log.warning("Tokenizer does not have a `pad_token`. Setting it to the `eos_token`.")
+        tokenizer.pad_token = tokenizer.eos_token
+    # Process the dataset
+    tokenized_dataset = dataset.map(
+        prepare_sample, remove_columns=dataset.column_names, desc="Tokenizing dataset"
+    )
+    return tokenized_dataset
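
To make the role handling in `build_prompt` concrete, the sketch below restates it as a standalone function and runs it on a small conversation. The templates are the defaults from the signature above; `split_history_and_response` and the sample messages are invented for this example.

```python
from typing import Dict, List, Tuple, Union

Message = Dict[str, Union[str, int]]

SYSTEM_TEMPLATE = "### System: {message}\n"
USER_TEMPLATE = "## User: {message}\n"
ASSISTANT_TEMPLATE = "## Assistant: {message}\n"


def split_history_and_response(messages: List[Message]) -> Tuple[str, str]:
    """Return (conversation history, response to compute the loss on)."""
    history, response = "", ""
    for message in messages:
        role = message["role"]
        content = str(message["content"]).strip()
        if role == "system":
            history += SYSTEM_TEMPLATE.format(message=content)
        elif role == "user":
            history += USER_TEMPLATE.format(message=content)
        elif role == "assistant":
            if message.get("calculate_loss", 0):
                # Flagged assistant turns become the training target.
                response = content
            else:
                # Unflagged assistant turns are part of the context.
                history += ASSISTANT_TEMPLATE.format(message=content)
    return history, response


messages = [
    {"role": "system", "content": "Be brief.", "calculate_loss": 0},
    {"role": "user", "content": "Hi", "calculate_loss": 0},
    {"role": "assistant", "content": "Hello!", "calculate_loss": 0},
    {"role": "user", "content": "2+2?", "calculate_loss": 0},
    {"role": "assistant", "content": " 4 ", "calculate_loss": 1},
]
history, response = split_history_and_response(messages)
print(repr(history))
print(repr(response))
```

As in the code above, if several assistant messages carry `calculate_loss: 1`, only the last one is kept as the response, and the earlier flagged ones appear in neither the history nor the response.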
