fusion-bench 0.2.4.tar.gz → 0.2.6.tar.gz
This diff compares the contents of two publicly released versions of the package, as they appear on the public registry (here, PyPI). It is provided for informational purposes only.
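The per-file summary below (and the excerpted diffs at the end) can be reproduced locally. The following Python script is a minimal sketch, not the tool that generated this page: it fetches both sdists via PyPI's JSON API and prints a `path +added -removed` line per changed file. It assumes Python 3.9+ and network access; the helper names `fetch_sdist` and `read_files` are illustrative, not part of any published tooling.

```python
import difflib
import io
import json
import tarfile
from urllib.request import urlopen

def fetch_sdist(name: str, version: str) -> tarfile.TarFile:
    # PyPI's JSON API lists the release files; pick the source tarball.
    with urlopen(f"https://pypi.org/pypi/{name}/{version}/json") as resp:
        meta = json.load(resp)
    url = next(u["url"] for u in meta["urls"] if u["packagetype"] == "sdist")
    with urlopen(url) as resp:
        return tarfile.open(fileobj=io.BytesIO(resp.read()), mode="r:gz")

def read_files(tf: tarfile.TarFile) -> dict[str, list[str]]:
    # Map each path (leading "pkg-version/" component dropped) to its lines.
    files = {}
    for member in tf.getmembers():
        if member.isfile():
            path = member.name.split("/", 1)[1]
            data = tf.extractfile(member).read()
            try:
                files[path] = data.decode("utf-8").splitlines()
            except UnicodeDecodeError:
                pass  # skip binary members
    return files

old = read_files(fetch_sdist("fusion-bench", "0.2.4"))
new = read_files(fetch_sdist("fusion-bench", "0.2.6"))

for path in sorted(old.keys() | new.keys()):
    a, b = old.get(path, []), new.get(path, [])
    added = removed = 0
    for line in difflib.unified_diff(a, b, lineterm=""):
        # Skip the "---"/"+++" file headers that unified_diff emits.
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    if added or removed or path not in old or path not in new:
        print(f"{path} +{added} -{removed}")
```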
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/PKG-INFO +22 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/README.md +21 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/method/__init__.py +9 -18
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/method/base_algorithm.py +3 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/modelpool/__init__.py +10 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/taskpool/base_pool.py +1 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/taskpool/clip_image_classification.py +20 -1
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/__init__.py +6 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/arc.py +303 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/arc_agi.py +365 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/augmenters.py +1036 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/messagers.py +1355 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/np_cache.py +168 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/preprocess.py +298 -0
- fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/representers.py +1019 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/gpt2_glue.py +101 -2
- fusion_bench-0.2.6/fusion_bench/dataset/llama/collate.py +60 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/llama/squad.py +3 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/__init__.py +10 -1
- fusion_bench-0.2.6/fusion_bench/method/adamerging/__init__.py +6 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/adamerging/clip_layer_wise_adamerging.py +2 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/adamerging/entropy_loss.py +2 -2
- fusion_bench-0.2.6/fusion_bench/method/adamerging/flan_t5_layer_wise_adamerging.py +332 -0
- fusion_bench-0.2.6/fusion_bench/method/adamerging/gpt2_layer_wise_adamerging.py +351 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/adamerging/layer_wise_adamerging.py +3 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/adamerging/llama_adamerging.py +5 -0
- fusion_bench-0.2.6/fusion_bench/method/adamerging/min_norm_solvers.py +227 -0
- fusion_bench-0.2.6/fusion_bench/method/adamerging/utils.py +15 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/base_algorithm.py +3 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/concrete_subspace/clip_concrete_adamerging.py +2 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/concrete_subspace/clip_concrete_task_arithmetic.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dawe/warppers/__init__.py +1 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/depth_upscaling/depth_upscaling.py +2 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/fisher_merging/fisher_merging.py +3 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/fisher_merging/gpt2_fisher_merging.py +3 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/linear/llama_expo.py +2 -2
- fusion_bench-0.2.6/fusion_bench/method/lm_finetune/__init__.py +2 -0
- fusion_bench-0.2.6/fusion_bench/method/lm_finetune/fullfinetune_sft.py +445 -0
- fusion_bench-0.2.6/fusion_bench/method/lm_finetune/peftfinetune_sft.py +460 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/llama_random_prune.py +1 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/magnitude_diff_pruning.py +20 -21
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/data.py +1 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/eval.py +1 -1
- fusion_bench-0.2.6/fusion_bench/method/rankone_moe/__init__.py +3 -0
- fusion_bench-0.2.6/fusion_bench/method/rankone_moe/clip_rankone_moe.py +160 -0
- fusion_bench-0.2.6/fusion_bench/method/rankone_moe/rankone_moe.py +249 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/regmean/regmean.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/simple_average.py +2 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/ties_merging/ties_merging.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/trust_region/clip_task_arithmetic.py +10 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/we_moe/clip_we_moe.py +1 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/we_moe/we_moe.py +1 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/text_to_image_generation/__init__.py +1 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/clip_classification.py +4 -4
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/lightning_fabric.py +19 -8
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/__init__.py +8 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/base_pool.py +9 -9
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/causal_lm/causal_lm.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/huggingface_gpt2_classification.py +5 -1
- fusion_bench-0.2.6/fusion_bench/models/linearized/vision_model.py +122 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/model_utils/embedding.py +25 -8
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/model_utils/visual.py +1 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/patcher.py +1 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/tokenizer_loader.py +1 -5
- fusion_bench-0.2.6/fusion_bench/models/rankone_moe.py +410 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/sparse_we_moe.py +46 -16
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/wrappers/layer_wise_fusion.py +2 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/programs/fabric_fusion_program.py +16 -14
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/__init__.py +11 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/clip_vision/__init__.py +1 -0
- fusion_bench-0.2.6/fusion_bench/taskpool/clip_vision/clip_rankone_moe_taskpool.py +112 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/clip_vision/taskpool.py +6 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/flan_t5_text_generation/glue_load_dataset.py +19 -5
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/devices.py +22 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/dtype.py +64 -10
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/parameters.py +39 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/state_dict_arithmetic.py +17 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/strenum/__init__.py +2 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench.egg-info/PKG-INFO +22 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench.egg-info/SOURCES.txt +34 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/clip-vit-base-patch32_robustness_corrupted.yaml +8 -6
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/fabric/auto.yaml +6 -0
- fusion_bench-0.2.6/fusion_bench_config/fabric/llama_ddp.yaml +18 -0
- fusion_bench-0.2.6/fusion_bench_config/fabric/llama_fsdp.yaml +16 -0
- fusion_bench-0.2.6/fusion_bench_config/fabric/loggers/csv_logger.yaml +11 -0
- fusion_bench-0.2.6/fusion_bench_config/fabric/loggers/tensorboard_logger.yaml +11 -0
- fusion_bench-0.2.6/fusion_bench_config/fabric/strategy/llama_fsdp.yaml +8 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/fabric_model_fusion.yaml +0 -1
- fusion_bench-0.2.6/fusion_bench_config/method/adamerging/layer_wise_flan_t5.yaml +23 -0
- fusion_bench-0.2.6/fusion_bench_config/method/adamerging/layer_wise_gpt2.yaml +23 -0
- fusion_bench-0.2.6/fusion_bench_config/method/lm_finetune/fullfinetune_sft.yaml +40 -0
- fusion_bench-0.2.6/fusion_bench_config/method/lm_finetune/peftfinetune_sft.yaml +61 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/pruning/magnitude_diff_pruning.yaml +1 -0
- fusion_bench-0.2.6/fusion_bench_config/method/rankone_moe/rankone_moe.yaml +26 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/regmean/clip_regmean.yaml +1 -0
- fusion_bench-0.2.6/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch16_TA8_lora.yaml +53 -0
- fusion_bench-0.2.6/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch16_individual_lora.yaml +14 -0
- fusion_bench-0.2.6/fusion_bench_config/modelpool/Seq2SeqLMPool/flan-t5-base_glue_lora16_tta.yaml +68 -0
- fusion_bench-0.2.6/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip_rankone_wemoe_clip-vit-classification_TA8.yaml +18 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/flan-t5_glue_text_generation.yaml +2 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/pyproject.toml +1 -1
- fusion_bench-0.2.4/fusion_bench/method/adamerging/__init__.py +0 -3
- fusion_bench-0.2.4/fusion_bench/models/linearized/vision_model.py +0 -71
- fusion_bench-0.2.4/fusion_bench/tasks/flan_t5_text_generation/__init__.py +0 -0
- fusion_bench-0.2.4/fusion_bench_config/fabric_logger/tensorboard_logger.yaml +0 -5
- fusion_bench-0.2.4/fusion_bench_config/method/magnitude_diff_pruning.yaml +0 -5
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/LICENSE +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/modelpool/AutoModelForSeq2SeqLM.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/modelpool/base_pool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/modelpool/huggingface_clip_vision.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/taskpool/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/taskpool/flan_t5_glue_text_generation.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/constants/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/constants/paths.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/__init__.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/clip_dataset.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/gsm8k.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/image_dataset.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/imdb.py +3 -3
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/llama/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/llama/alpaca.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/llama/openai.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/llama/sharegpt.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/llama/wikitext.py +2 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/dataset/nyuv2.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/ada_svd/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/ada_svd/clip_vision.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/adamerging/clip_task_wise_adamerging.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/adamerging/task_wise_adamerging.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/analysis/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/analysis/task_vector_cos_similarity.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/analysis/task_vector_violin_plot.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/classification/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/classification/clip_finetune.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/concrete_subspace/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dare/__init__.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dare/simple_average.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dare/task_arithmetic.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dare/utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dawe/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dawe/dawe_for_clip.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dawe/warppers/dawe_model.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/depth_upscaling/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/depth_upscaling/depth_upscaling_for_llama.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/dummy.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/ensemble.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/fisher_merging/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/fisher_merging/clip_fisher_merging.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/linear/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/linear/expo.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/linear/linear_interpolation.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/linear/simple_average_for_llama.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/linear/task_arithmetic_for_llama.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/lm_finetune/causal_lm_pretrain.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/mixture_of_experts/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/mixture_of_experts/mixtral_merging.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/mixture_of_experts/mixtral_upcycling.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/model_recombination.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/llama_magnitude_prune.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/llama_wanda_prune.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/prune_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/ablate.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/layerwrapper.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/prune.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/prune_opt.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pruning/wanda_utils/sparsegpt.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pwe_moe/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pwe_moe/clip_pwe_moe.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pwe_moe/module.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pwe_moe/phn/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pwe_moe/phn/solvers.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/pwe_moe/utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/regmean/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/regmean/clip_regmean.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/regmean/gpt2_regmean.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/slerp/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/slerp/slerp.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/slerp/slerp_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/smile_upscaling/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/smile_upscaling/singular_projection_merging.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/smile_upscaling/smile_mistral_upscaling.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/smile_upscaling/smile_upscaling.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/sparse_we_moe/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/sparse_we_moe/sparse_clip_we_moe.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/sparse_we_moe/sparse_we_moe.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/sparselo/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/sparselo/sparselo.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/task_arithmetic/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/task_arithmetic/task_arithmetic.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/ties_merging/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/ties_merging/ties_merging_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/trust_region/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/trust_region/utils.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/we_moe/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/weighted_average/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/weighted_average/llama.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/method/weighted_average/weighted_average.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/method/lm_finetune → fusion_bench-0.2.6/fusion_bench/metrics}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/nyuv2/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/nyuv2/depth.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/nyuv2/loss.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/nyuv2/noise.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/nyuv2/normal.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/nyuv2/segmentation.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/text_to_image_generation/aesthetic_scorer.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/text_to_image_generation/compressibility.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/metrics/text_to_image_generation/pickscore_scorer.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/__init__.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/metrics → fusion_bench-0.2.6/fusion_bench/mixins/optim}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/optim/adamw_with_warmup.py +2 -2
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/rich_live.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/serialization.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/mixins/simple_profiler.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/PeftModelForSeq2SeqLM.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/causal_lm/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/clip_vision/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/clip_vision/modelpool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/huggingface_automodel.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/nyuv2_modelpool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/seq2seq_lm/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/modelpool/seq2seq_lm/modelpool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/hf_clip.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/mixins/optim → fusion_bench-0.2.6/fusion_bench/models/linearized}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/linearized/linearized_model_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/__init__.py +1 -1
- {fusion_bench-0.2.4/fusion_bench/models/linearized → fusion_bench-0.2.6/fusion_bench/models/llama/model_utils}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/model_utils/liger_kernel.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/model_utils/misc.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/llama/model_utils/mod.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/masks/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/masks/mask_model.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_losparse_llama/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_losparse_llama/configuration_losparse_llama.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_losparse_llama/losparse_linear.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_losparse_llama/modeling_losparse_llama.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_losparse_llama/register.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_losparse_llama/utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_smile_mistral/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_smile_mistral/configuration_smile_mistral.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_smile_mistral/modeling_smile_mistral.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/modeling_smile_mistral/register.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/models/llama/model_utils → fusion_bench-0.2.6/fusion_bench/models/nyuv2}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/nyuv2/aspp.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/nyuv2/lightning_module.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/nyuv2/resnet.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/nyuv2/resnet_dilated.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/parameter_dict.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/separate_io.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/models/nyuv2 → fusion_bench-0.2.6/fusion_bench/models/smile_moe}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/smile_moe/linear.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/we_moe.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/models/smile_moe → fusion_bench-0.2.6/fusion_bench/models/wrappers}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/wrappers/ensemble.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/models/wrappers/task_wise_fusion.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/models/wrappers → fusion_bench-0.2.6/fusion_bench/optim}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/optim/mezo.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/programs/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/programs/base_program.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/optim → fusion_bench-0.2.6/fusion_bench/scripts}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/scripts/cli.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/scripts → fusion_bench-0.2.6/fusion_bench/scripts/clip}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/scripts/clip/convert_checkpoint.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/scripts/imgui.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/scripts/nyuv2_mtl_train.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/scripts/webui.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/base_pool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/clip_vision/clip_sparse_wemoe_taskpool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/dummy.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/gpt2_text_classification.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/llama/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/llama/test_generation.py +1 -1
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/taskpool/nyuv2_taskpool.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/base_task.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/classification.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/cifar10.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/cifar100.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/clip_dataset.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/dtd.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/eurosat.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/flower102.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/gtsrb.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/imagenet.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/mnist.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/oxford_iiit_pet.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/rendered_sst2.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/resisc45.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/stanford_cars.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/stl10.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/sun397.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/svhn.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/clip_classification/tiny_imagenet.py +0 -0
- {fusion_bench-0.2.4/fusion_bench/scripts/clip → fusion_bench-0.2.6/fusion_bench/tasks/flan_t5_text_generation}/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/flan_t5_text_generation/datasets_preprocess.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/flan_t5_text_generation/glue_evaluation.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/flan_t5_text_generation/glue_preprocessors.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/tasks/flan_t5_text_generation/glue_prompt_templates.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/__init__.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/auto.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/cache_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/data.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/functools.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/hydra_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/instantiate.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/json.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/lazy_imports.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/misc.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/packages.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/path.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/pylogger.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/rich_utils.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/strenum/_name_mangler.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/strenum/_version.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/timer.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/utils/type.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench.egg-info/dependency_links.txt +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench.egg-info/entry_points.txt +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench.egg-info/requires.txt +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench.egg-info/top_level.txt +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/README.md +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/cifar10.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/cifar100.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/dtd.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/eurosat.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/gtsrb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/resisc45.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/stanford-cars.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/sun397.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/svhn.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/the_eight_tasks.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/test/tiny-imagenet.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/cifar10.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/cifar100.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/dtd.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/eurosat.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/gtsrb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/resisc45.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/stanford-cars.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/sun397.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/svhn.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/the_eight_tasks.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/train/tiny-imagenet.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/dtd.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/eurosat.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/gtsrb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/resisc45.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/stanford-cars.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/sun397.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/svhn.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/image_classification/val/the_eight_tasks.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/question_answering/search_qa.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/question_answering/test/search_qa.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/question_answering/train/MetaMathQA.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/question_answering/train/search_qa.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/question_answering/val/search_qa.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/summarization/test/xsum.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/summarization/train/xsum.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/summarization/val/xsum.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/summarization/xsum.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/text_generation/test/gsm-hard.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/text_generation/test/gsm8k.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/text_generation/test/gsm8k_question_label.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/text_generation/train/CodeAlpaca-20k.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/text_generation/train/gsm8k.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/dataset/text_generation/train/gsm8k_question_label.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/hydra/default.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/hydra/help/fusion_bench_help.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/hydra/job_logging/rich_logging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/llama_magnitude_pruning.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/llama_model_fusion.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/llama_weighted_average.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/ada_svd/clip_vision.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/adamerging/clip.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/adamerging/llama_sft.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/adamerging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/analysis/task_vector_cos_similarity.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/analysis/task_vector_violin_plot.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/clip_finetune.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/concrete_subspace/clip_concrete_layer_wise_adamerging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/concrete_subspace/clip_concrete_task_arithmetic.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/concrete_subspace/clip_concrete_task_wise_adamerging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/dare/simple_average.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/dare/task_arithmetic.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/dawe/dawe_for_clip.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/depth_upscaling.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/dummy.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/ensemble/max_model_predictor.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/ensemble/simple_ensemble.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/ensemble/weighted_ensemble.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/fisher_merging/clip_fisher_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/fisher_merging/fisher_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/fisher_merging/gpt2_fisher_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/expo.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/linear_interpolation.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/llama_expo.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/llama_expo_with_dare.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/simple_average_for_llama.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/task_arithmetic_for_llama.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/weighted_average.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/linear/weighted_average_for_llama.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/mixtral_moe_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/mixtral_moe_upscaling.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/model_recombination.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/pruning/llama_magnitude_pruning.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/pruning/llama_random_pruning.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/pruning/llama_wanda_pruning.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/pwe_moe_ls_for_clip.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/regmean/gpt2_regmean.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/regmean/regmean.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/simple_average.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/slerp/slerp.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/smile_upscaling/singular_projection_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/smile_upscaling/smile_mistral_upscaling.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/smile_upscaling/smile_upscaling.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/sparselo_pruning/llama_iterative_sparselo.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/sparselo_pruning/llama_pcp_sparselo.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/sparselo_pruning/llama_sparselo.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/task_arithmetic.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/ties_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/trust_region/clip_task_arithmetic.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/wemoe/sparse_weight_ensembling_moe.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/method/wemoe/weight_ensembling_moe.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_dtd.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_eight_tasks.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_eurosat.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_gtsrb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_resisc45.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_stanford-cars.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_sun397.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch16_svhn.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_dtd.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_eight_tasks.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_eurosat.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_gtsrb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_resisc45.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_stanford-cars.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_sun397.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-base-patch32_svhn.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_dtd.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_eight_tasks.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_eurosat.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_gtsrb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_resisc45.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_stanford-cars.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_sun397.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/clip-vit-large-patch14_svhn.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/clip-vit/generate_vit_model_config.sh +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-cola.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-cola_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-mnli.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-mnli_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-mrpc.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-mrpc_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-qnli.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-qnli_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-qqp.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-qqp_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-rte.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-rte_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-sst2.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-sst2_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-stsb.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-base_glue-stsb_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-cola_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-mnli_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-mrpc_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-qnli_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-qqp_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-rte_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-sst2_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/flan-t5-large_glue-stsb_lora-16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/model/flan-t5/generate_flan-t5.sh +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/_template.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch16_TA8.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch16_individual.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_TA8.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_TA8_control_task.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_TA8_model_only.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_generalization_exp1.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_generalization_exp2.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_individual.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_mtl.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_robustness_clean.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_robustness_corrupted.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_single_finetuned.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-base-patch32_svhn_and_mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-large-patch14_TA8.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-large-patch14_TA8_model_only.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CLIPVisionModelPool/clip-vit-large-patch14_individual.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CausalLMPool/llama_for_causallm.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CausalLMPool/simle_mixtral_exp_v4.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/CausalLMPool/single_llama_model.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/Seq2SeqLMPool/_template.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/Seq2SeqLMPool/flan-t5-base_glue.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/Seq2SeqLMPool/flan-t5-base_glue_lora16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/Seq2SeqLMPool/flan-t5-base_individual.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/Seq2SeqLMPool/flan-t5-large_glue_lora16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/automodelpool.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/gpt-2_glue.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/mixtral_moe_merging.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/mixtral_moe_upscaling.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/nyuv2_modelpool.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/smile_mistral_exp_v1.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/smile_mistral_exp_v2.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/smile_mistral_exp_v3.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/modelpool/smile_mistral_exp_v4.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/nyuv2_config.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/nyuv2_mtl_train.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/_template.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip-vit-classification_TA8.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip-vit-classification_TA8_B16.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip-vit-classification_TA8_L14.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip-vit-classification_TA8_val.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip-vit-classification_TA8_with_control_task.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/CLIPVisionModelTaskPool/clip_sparse_wemoe_clip-vit-classification_TA8.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/clip-vit-base-patch32_robustness_clean.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/clip-vit-base-patch32_robustness_corrupted.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/clip-vit-base-patch32_svhn_and_mnist.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/dummy.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/gpt-2_glue.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench_config/taskpool/nyuv2_taskpool.yaml +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/setup.cfg +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/tests/test_depth_upscaling.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/tests/test_simple_average.py +0 -0
- {fusion_bench-0.2.4 → fusion_bench-0.2.6}/tests/test_weighed_ensemble.py +0 -0
{fusion_bench-0.2.4 → fusion_bench-0.2.6}/PKG-INFO RENAMED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: fusion_bench
-Version: 0.2.4
+Version: 0.2.6
 Summary: A Comprehensive Benchmark of Deep Model Fusion
 Author-email: Anke Tang <tang.anke@foxmail.com>
 License: MIT License
@@ -50,7 +50,7 @@ Requires-Dist: pytest
 
 # FusionBench: A Comprehensive Benchmark/ToolKit of Deep Model Fusion
 
-[](http://arxiv.org/abs/2406.03280)
+[](http://arxiv.org/abs/2406.03280)
 [](https://github.com/tanganke/fusion_bench/blob/main/LICENSE)
 [](https://pypi.org/project/fusion-bench/)
 [](https://pepy.tech/project/fusion-bench)
@@ -67,7 +67,26 @@ Requires-Dist: pytest
 
 FusionBench is a benchmark suite designed to evaluate the performance of various deep model fusion techniques. It aims to provide a comprehensive comparison of different methods on a variety of datasets and tasks.
 
-Projects based on FusionBench:
+Projects based on FusionBench and news from the community (in descending order of date):
+
+<details>
+<summary>Hongling Zheng, Li Shen, Anke Tang, Yong Luo et al. Learn From Model Beyond Fine-Tuning: A Survey. Accepted for publication in Nature Machine Intelligence, Nov 2024. https://arxiv.org/abs/2310.08184</summary>
+
+> Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields
+of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access
+extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the
+development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model
+training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call
+Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface,
+so as to better understand the model structure and weights (in a black-box environment), and to generalize the model to downstream
+tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse,
+meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the
+capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the
+perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the
+survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the
+research community. The relevant papers we investigated in this article can be accessed at
+https://github.com/ruthless-man/Awesome-Learn-from-Model.
+</details>
 
 <details>
 <summary>Li Shen, Anke Tang, Enneng Yang et al. Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Oct 2024. https://github.com/EnnengYang/Efficient-WEMoE</summary>
{fusion_bench-0.2.4 → fusion_bench-0.2.6}/README.md RENAMED
@@ -2,7 +2,7 @@
 
 # FusionBench: A Comprehensive Benchmark/ToolKit of Deep Model Fusion
 
-[](http://arxiv.org/abs/2406.03280)
+[](http://arxiv.org/abs/2406.03280)
 [](https://github.com/tanganke/fusion_bench/blob/main/LICENSE)
 [](https://pypi.org/project/fusion-bench/)
 [](https://pepy.tech/project/fusion-bench)
@@ -19,7 +19,26 @@
 
 FusionBench is a benchmark suite designed to evaluate the performance of various deep model fusion techniques. It aims to provide a comprehensive comparison of different methods on a variety of datasets and tasks.
 
-Projects based on FusionBench:
+Projects based on FusionBench and news from the community (in descending order of date):
+
+<details>
+<summary>Hongling Zheng, Li Shen, Anke Tang, Yong Luo et al. Learn From Model Beyond Fine-Tuning: A Survey. Accepted for publication in Nature Machine Intelligence, Nov 2024. https://arxiv.org/abs/2310.08184</summary>
+
+> Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields
+of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access
+extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the
+development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model
+training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call
+Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface,
+so as to better understand the model structure and weights (in a black-box environment), and to generalize the model to downstream
+tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse,
+meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the
+capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the
+perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the
+survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the
+research community. The relevant papers we investigated in this article can be accessed at
+https://github.com/ruthless-man/Awesome-Learn-from-Model.
+</details>
 
 <details>
 <summary>Li Shen, Anke Tang, Enneng Yang et al. Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Oct 2024. https://github.com/EnnengYang/Efficient-WEMoE</summary>
{fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/method/__init__.py RENAMED
@@ -1,3 +1,5 @@
+import warnings
+
 from omegaconf import DictConfig
 
 from .base_algorithm import ModelFusionAlgorithm
@@ -16,35 +18,18 @@ class AlgorithmFactory:
         "clip_finetune": ".classification.clip_finetune.ImageClassificationFineTuningForCLIP",
         # analysis
         # model merging methods
-        "simple_average": ".simple_average.SimpleAverageAlgorithm",
-        "weighted_average": ".weighted_average.weighted_average.WeightedAverageAlgorithm",
-        "weighted_average_for_llama": ".weighted_average.llama.WeightedAverageForLLama",
-        "task_arithmetic": ".task_arithmetic.TaskArithmeticAlgorithm",
-        "ties_merging": ".ties_merging.ties_merging.TiesMergingAlgorithm",
         "clip_task_wise_adamerging": ".adamerging.clip_task_wise_adamerging.CLIPTaskWiseAdaMergingAlgorithm",
         "clip_layer_wise_adamerging": ".adamerging.clip_layer_wise_adamerging.CLIPLayerWiseAdaMergingAlgorithm",
         "singular_projection_merging": "fusion_bench.method.smile_upscaling.singular_projection_merging.SingularProjectionMergingAlgorithm",
-        "pwe_moe_ls_for_clip": ".pwe_moe.clip_pwe_moe.PWEMoELinearScalarizationForCLIP",
-        "pwe_moe_epo_for_clip": ".pwe_moe.clip_pwe_moe.PWEMoExactParetoOptimalForCLIP",
         # plug-and-play model merging methods
         "clip_concrete_task_arithmetic": ".concrete_subspace.clip_concrete_task_arithmetic.ConcreteTaskArithmeticAlgorithmForCLIP",
         "clip_concrete_task_wise_adamerging": ".concrete_subspace.clip_concrete_adamerging.ConcreteTaskWiseAdaMergingForCLIP",
         "clip_concrete_layer_wise_adamerging": ".concrete_subspace.clip_concrete_adamerging.ConcreteLayerWiseAdaMergingForCLIP",
         # model mixing methods
-        "depth_upscaling": ".depth_upscaling.DepthUpscalingAlgorithm",
-        "mixtral_moe_upscaling": ".mixture_of_experts.mixtral_upcycling.MixtralUpscalingAlgorithm",
-        "mixtral_for_causal_lm_moe_upscaling": ".mixture_of_experts.mixtral_upcycling.MixtralForCausalLMUpscalingAlgorithm",
-        "mixtral_moe_merging": ".mixture_of_experts.mixtral_merging.MixtralMoEMergingAlgorithm",
-        "mixtral_for_causal_lm_merging": ".mixture_of_experts.mixtral_merging.MixtralForCausalLMMergingAlgorithm",
         "clip_weight_ensembling_moe": ".we_moe.clip_we_moe.CLIPWeightEnsemblingMoEAlgorithm",
-        "model_recombination": ".model_recombination.ModelRecombinationAlgorithm",
-        "smile_upscaling": ".smile_upscaling.smile_upscaling.SmileUpscalingAlgorithm",
         "sparse_clip_weight_ensembling_moe": "fusion_bench.method.SparseCLIPWeightEnsemblingMoEAlgorithm",
         "smile_mistral_upscaling": ".smile_upscaling.smile_mistral_upscaling.SmileMistralUpscalingAlgorithm",
-
-        "magnitude_diff_pruning": ".pruning.MagnitudeDiffPruningAlgorithm",
-        "magnitude_pruning_for_llama": ".pruning.llama_magnitude_prune.MagnitudePruningForLlama",
-        "wanda_pruning_for_llama": ".pruning.llama_wanda_prune.WandaPruningForLlama",
+        "rankone_moe": ".rankone_moe.clip_rankone_moe.CLIPRankOneMoEAlgorithm",
     }
 
     @staticmethod
@@ -61,6 +46,12 @@ class AlgorithmFactory:
         Raises:
             ValueError: If 'name' attribute is not found in the configuration or does not match any known algorithm names.
         """
+        warnings.warn(
+            "AlgorithmFactory.create_algorithm() is deprecated and will be removed in future versions. "
+            "Please implement new model fusion algorithms using `fusion_bench.method.BaseModelFusionAlgorithm` instead.",
+            DeprecationWarning,
+        )
+
         from fusion_bench.utils import import_object
 
         algorithm_name = method_config.name
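The hunk above prunes most string-keyed registrations and deprecates `AlgorithmFactory.create_algorithm()` in favor of the class-based API named in the warning. A rough migration sketch, not taken from the diff: subclass `fusion_bench.method.BaseModelFusionAlgorithm` and implement `run`. The modelpool accessors used below (`model_names`, `load_model`) are assumptions to verify against the actual `BaseModelPool` interface.

```python
# Hypothetical migration sketch; only BaseModelFusionAlgorithm's import path is
# taken from the deprecation message above. `model_names` / `load_model` are
# assumed modelpool accessors -- verify them before relying on this.
import torch

from fusion_bench.method import BaseModelFusionAlgorithm


class NaiveAverageAlgorithm(BaseModelFusionAlgorithm):
    def run(self, modelpool):
        # load every model in the pool and average their floating-point weights
        models = [modelpool.load_model(name) for name in modelpool.model_names]
        merged = models[0]
        with torch.no_grad():
            for key, param in merged.state_dict().items():
                if param.is_floating_point():
                    stacked = torch.stack([m.state_dict()[key] for m in models])
                    param.copy_(stacked.mean(dim=0))
        return merged
```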
{fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/method/base_algorithm.py RENAMED
@@ -26,7 +26,6 @@ class ModelFusionAlgorithm(ABC):
             algorithm_config (Optional[DictConfig]): Configuration for the algorithm. Defaults to an empty configuration if not provided.
                 Get access to the configuration using `self.config`.
         """
-        super().__init__()
         if algorithm_config is None:
             algorithm_config = DictConfig({})
         self.config = algorithm_config
@@ -42,6 +41,9 @@ class ModelFusionAlgorithm(ABC):
         Args:
             modelpool: The pool of models to fuse.
 
+        Returns:
+            The fused model.
+
         Examples:
             >>> algorithm = SimpleAverageAlgorithm()
             >>> modelpool = ModelPool()
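The constructor behavior recorded above (a missing config falls back to an empty `DictConfig`, exposed as `self.config`) can be exercised with a minimal hypothetical subclass of this compat base class; `MyAlgorithm` and its `scaling_factor` key are illustrative only, not real options.

```python
# Sketch only: MyAlgorithm and "scaling_factor" are hypothetical names.
from omegaconf import DictConfig

from fusion_bench.compat.method import ModelFusionAlgorithm


class MyAlgorithm(ModelFusionAlgorithm):
    def run(self, modelpool):
        # self.config is whatever DictConfig was passed to __init__ (or {})
        scale = self.config.get("scaling_factor", 1.0)
        print(f"fusing with scaling_factor={scale}")
        return modelpool  # placeholder; a real run() returns the fused model


algo = MyAlgorithm(DictConfig({"scaling_factor": 0.3}))
```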
{fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/modelpool/__init__.py RENAMED
@@ -1,8 +1,10 @@
 # flake8: noqa F401
+import warnings
+
 from omegaconf import DictConfig
 
 from fusion_bench.modelpool.huggingface_gpt2_classification import (
-
+    GPT2ForSequenceClassificationPool,
 )
 from fusion_bench.modelpool.PeftModelForSeq2SeqLM import PeftModelForSeq2SeqLMPool
@@ -22,7 +24,7 @@ class ModelPoolFactory:
     _modelpool = {
         "NYUv2ModelPool": ".nyuv2_modelpool.NYUv2ModelPool",
         "huggingface_clip_vision": HuggingFaceClipVisionPool,
-        "HF_GPT2ForSequenceClassification":
+        "HF_GPT2ForSequenceClassification": GPT2ForSequenceClassificationPool,
         "AutoModelPool": ".huggingface_automodel.AutoModelPool",
         # CausalLM
         "AutoModelForCausalLMPool": ".huggingface_llm.AutoModelForCausalLMPool",
@@ -50,6 +52,12 @@ class ModelPoolFactory:
         Raises:
             ValueError: If 'type' attribute is not found in the configuration or does not match any known model pool types.
         """
+        warnings.warn(
+            "ModelPoolFactory.create_modelpool() is deprecated and will be removed in future versions. "
+            "Please implement new model pools using `fusion_bench.modelpool.BaseModelPool` instead.",
+            DeprecationWarning,
+        )
+
        from fusion_bench.utils import import_object
 
         modelpool_type = modelpool_config.get("type")
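Both factories now emit `DeprecationWarning`, which CPython hides by default outside `__main__`. A standard-library sketch for making any remaining legacy call sites fail loudly in a test suite; `legacy_call` is a stand-in, not fusion_bench code:

```python
import warnings


def legacy_call():
    # stand-in for code that still goes through AlgorithmFactory or
    # ModelPoolFactory; the real factories warn exactly like this
    warnings.warn("deprecated factory path", DeprecationWarning)


with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        legacy_call()
    except DeprecationWarning as exc:
        print(f"caught: {exc}")  # the legacy call site is now loud
```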
{fusion_bench-0.2.4 → fusion_bench-0.2.6}/fusion_bench/compat/taskpool/clip_image_classification.py RENAMED
@@ -154,7 +154,7 @@ class CLIPImageClassificationTaskPool(TaskPool):
 
         return task
 
-    def evaluate(self, model: CLIPVisionModel):
+    def evaluate(self, model: CLIPVisionModel, name=None):
         """
         Evaluate the model on the image classification task.
 
@@ -178,10 +178,29 @@ class CLIPImageClassificationTaskPool(TaskPool):
             "all_params": all_params,
             "trainable_percentage": training_params / all_params,
         }
+        if name is not None:
+            report["model_info"]["name"] = name
         for task_name in tqdm(self.task_names, desc="Evaluating tasks"):
             task = self.load_task(task_name)
             result = task.evaluate(self.clip_model)
             report[task_name] = result
+
+        # calculate the average accuracy and loss
+        if "average" not in report:
+            report["average"] = {}
+        accuracies = [
+            value["accuracy"]
+            for key, value in report.items()
+            if "accuracy" in value
+        ]
+        if len(accuracies) > 0:
+            average_accuracy = sum(accuracies) / len(accuracies)
+            report["average"]["accuracy"] = average_accuracy
+        losses = [value["loss"] for key, value in report.items() if "loss" in value]
+        if len(losses) > 0:
+            average_loss = sum(losses) / len(losses)
+            report["average"]["loss"] = average_loss
+
         log.info(f"Results for taskpool {self.config.name}: {report}")
         if self._fabric.is_global_zero and len(self._fabric._loggers) > 0:
             with open(
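Stripped of the taskpool machinery, the averaging added above reduces to filtering the per-task report entries; `model_info` is skipped because it carries neither an `accuracy` nor a `loss` key. Task names and numbers below are illustrative.

```python
# Minimal reproduction of the report averaging; values are made up.
report = {
    "model_info": {"trainable_params": 0, "all_params": 1},
    "svhn": {"accuracy": 0.95, "loss": 0.21},
    "mnist": {"accuracy": 0.99, "loss": 0.05},
}
accuracies = [v["accuracy"] for v in report.values() if "accuracy" in v]
losses = [v["loss"] for v in report.values() if "loss" in v]
report["average"] = {
    "accuracy": sum(accuracies) / len(accuracies),  # 0.97
    "loss": sum(losses) / len(losses),              # 0.13
}
print(report["average"])
```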
fusion_bench-0.2.6/fusion_bench/dataset/arc_agi/arc.py ADDED
@@ -0,0 +1,303 @@
+"""
+This module contains classes to represent ARC tasks and examples.
+
+Grid: a numpy array representing a grid
+Example: a class to represent an example (example.input and example.output are grids)
+Task: a class to represent a task (task.test_example and task.train_examples are the test and train examples)
+read_tasks_from_single_file: a function to read challenge problems and solutions from a single file
+make_submission: a function to create a submission file
+"""
+
+import dataclasses
+import glob
+import json
+import os
+from typing import List, Optional
+
+import numpy as np
+
+Grid = np.ndarray
+
+
+def to_tuple(arr):
+    return tuple(tuple(int(e) for e in row) for row in arr)
+
+
+def to_list(arr):
+    return [[int(e) for e in row] for row in arr]
+
+
+@dataclasses.dataclass
+class Example:
+    """
+    A class to represent an example.
+    """
+
+    input: Grid
+    output: Grid
+    cot: Optional[List[Grid]] = None
+
+    def input_size(self) -> int:
+        """return the size of the input grid"""
+        return self.input.size
+
+    def output_size(self) -> int:
+        """return the size of the output grid"""
+        return self.output.size
+
+    def size(self) -> int:
+        """return the size of the example"""
+        return max(self.input_size(), self.output_size())
+
+    def __hash__(self) -> int:
+        return hash((self.input.tobytes(), self.output.tobytes()))
+
+    def __repr__(self) -> str:
+        return f"Example(input={self.input}, output={self.output})"
+
+    def serialize(self) -> dict:
+        example = {"input": self.input.tolist(), "output": self.output.tolist()}
+
+        if self.cot:
+            example["cot"] = [cot.tolist() for cot in self.cot]
+
+        return example
+
+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, Example):
+            return NotImplemented
+        return np.array_equal(self.input, other.input) and np.array_equal(
+            self.output, other.output
+        )
+
+    @classmethod
+    def deserialize(cls, data: dict, test: bool = False) -> "Example":
+        input = np.array(data["input"])
+        if test:
+            output = input.copy()
+        elif "output" in data:
+            output = np.array(data["output"])
+        else:
+            output = input.copy()
+        cot = None
+        if "cot" in data:
+            cot = [np.array(c) for c in data["cot"]]
+        return cls(input, output, cot)
+
+
+@dataclasses.dataclass
+class Task:
+    """
+    A class to represent a task.
+    """
+
+    test_example: Example
+    train_examples: List[Example] = dataclasses.field(default_factory=list)
+    name: str = ""
+
+    def size(self) -> int:
+        """return the size of the task"""
+        return max(example.size() for example in self.train_examples)
+
+    def max_height(self) -> int:
+        max_x = 0
+        for example in self.train_examples:
+            x, _ = example.input.shape
+            max_x = max(max_x, x)
+            x, _ = example.output.shape
+            max_x = max(max_x, x)
+        # include the test example too
+        x, _ = self.test_example.input.shape
+        max_x = max(max_x, x)
+        x, _ = self.test_example.output.shape
+        max_x = max(max_x, x)
+        return max_x
+
+    def max_width(self) -> int:
+        max_y = 0
+        for example in self.train_examples:
+            _, y = example.input.shape
+            max_y = max(max_y, y)
+            _, y = example.output.shape
+            max_y = max(max_y, y)
+        # include the test example too
+        _, y = self.test_example.input.shape
+        max_y = max(max_y, y)
+        _, y = self.test_example.output.shape
+        max_y = max(max_y, y)
+        return max_y
+
+    def __repr__(self) -> str:
+        return f"Task(train={self.train_examples}, test={self.test_example})"
+
+    def serialize(self) -> dict:
+        return {
+            "train": [train.serialize() for train in self.train_examples],
+            "test": [self.test_example.serialize()],
+            "name": self.name,
+        }
+
+    def __hash__(self) -> int:
+        return hash((tuple(train for train in self.train_examples), self.test_example))
+
+    @classmethod
+    def deserialize(cls, data: dict, test: bool = False) -> "Task":
+        assert len(data["test"]) == 1, "Only one test example is allowed"
+        train = [Example.deserialize(train) for train in data["train"]]
+        test = Example.deserialize(data["test"][0], test=test)
+        return cls(train_examples=train, test_example=test, name=data.get("name", ""))
+
+    @classmethod
+    def read_tasks_from_dict(cls, data: dict, test: bool = False) -> List["Task"]:
+        tasks = []
+        for test_data in data["test"]:
+            task = cls.deserialize(
+                {
+                    "train": data["train"],
+                    "test": [test_data],
+                    "name": data.get("name", ""),
+                },
+                test=test,
+            )
+            tasks.append(task)
+        return tasks
+
+    def entropy(self) -> float:
+        """return the entropy of the outputs"""
+        outputs = [example.output.flatten() for example in self.train_examples]
+        outputs.append(self.test_example.output.flatten())
+        vocabulary = np.unique(np.concatenate(outputs)).tolist()
+        # find the maximum output length
+        max_output_length = max(len(output) for output in outputs)
+        probs = np.zeros((len(vocabulary), max_output_length))
+        # count the occurrences of each value at each index
+        for output in outputs:
+            for j, value in enumerate(output):
+                index_of_value = vocabulary.index(value)
+                probs[index_of_value, j] += 1
+
+        # normalize
+        probs = probs / probs.sum(axis=0)
+        # per-position entropy
+        entropy = -np.sum(probs * np.log(probs + 1e-9), axis=0)
+
+        # mean entropy
+        return np.mean(entropy)
+
+
+@dataclasses.dataclass
+class TaskWithDescription(Task):
+    description: str = ""
+
+
+def read_tasks_from_folder(task_folder: str, test: bool = False) -> List[Task]:
+    """
+    Read tasks from a folder of JSON files.
+    """
+    all_tasks = []
+    for file in glob.glob(f"{task_folder}/*.json"):
+        basename = os.path.basename(file)
+        idx = basename.replace(".json", "")
+        tasks = read_tasks_from_file(file, test=test)
+        for i, task in enumerate(tasks):
+            task.name = idx + "-" + str(i)
+        all_tasks += tasks
+    return all_tasks
+
+
+def read_tasks_from_single_file(
+    challenge_file: str, test: bool = False, solution_file: Optional[str] = None
+) -> List[Task]:
+    """
+    Read tasks from a single challenge file, optionally merging in solutions.
+    """
+    with open(challenge_file, "r", encoding="utf-8") as handle:
+        data = json.load(handle)
+
+    if solution_file is not None:
+        test = False
+        with open(solution_file, "r", encoding="utf-8") as handle:
+            solutions = json.load(handle)
+        for key, value in solutions.items():
+            for idx, solution in enumerate(value):
+                data[key]["test"][idx]["output"] = solution
+
+    all_tasks = []
+    for task_name, subtasks in data.items():
+        parsed_tasks = Task.read_tasks_from_dict(subtasks, test=test)
+        for i, task in enumerate(parsed_tasks):
+            task.name = task_name + "-" + str(i)
+            all_tasks.append(task)
+
+    return all_tasks
+
+
+def read_tasks_from_file(task_file: str, test: bool = False) -> List[Task]:
+    """
+    Read tasks from a file.
+    """
+    with open(task_file, "r", encoding="utf-8") as handle:
+        data = json.load(handle)
+
+    return Task.read_tasks_from_dict(data, test=test)
+
+
+def make_submission(
+    tasks: List[Task],
+    predictions: List[List[Grid]],
+    path: Optional[str] = None,
+    number_of_attempts: int = 2,
+) -> dict:
+    """
+    Make a submission.
+    """
+    assert len(tasks) == len(
+        predictions
+    ), "Number of tasks and predictions should be the same"
+
+    # sort by task name alphabetically to ensure the order of subtasks
+    indices = np.argsort([task.name for task in tasks])
+    tasks = [tasks[i] for i in indices]
+    predictions = [predictions[i] for i in indices]
+    # build the submissions
+    submissions = {}
+    for task, prediction in zip(tasks, predictions):
+        task_name, task_no = task.name.split("-")
+        task_no = int(task_no)
+        if task_name not in submissions:
+            submissions[task_name] = []
+
+        assert (
+            len(prediction) == number_of_attempts
+        ), "Number of attempts should be the same"
+        attempts = {
+            f"attempt_{j+1}": to_list(pred) for j, pred in enumerate(prediction)
+        }
+        while len(submissions[task_name]) <= task_no:
+            submissions[task_name].append({"attempt_1": [[0]], "attempt_2": [[0]]})
+
+        submissions[task_name][task_no] = attempts
+
+    if path is not None:
+        with open(path, "w") as handle:
+            json.dump(submissions, handle)
+
+    return submissions
+
+
+if __name__ == "__main__":
+    arc_path = "/kaggle/input/arc-prize-2024/"
+    tasks = read_tasks_from_single_file(arc_path + "arc-agi_training_challenges.json")
+    print(tasks[0])
+    tasks = read_tasks_from_single_file(
+        arc_path + "arc-agi_evaluation_challenges.json", test=True
+    )
+    print(tasks[0])
+
+    tasks = read_tasks_from_single_file(
+        arc_path + "arc-agi_evaluation_challenges.json",
+        test=True,
+        solution_file=arc_path + "arc-agi_evaluation_solutions.json",
+    )
+
+    print(tasks[0])
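A usage sketch for the new module, following its own `__main__` block: load the evaluation challenges and emit a two-attempt submission. The Kaggle path and file names come from the file itself; echoing each test input back is a placeholder "solver", not a real prediction strategy.

```python
from fusion_bench.dataset.arc_agi.arc import (
    make_submission,
    read_tasks_from_single_file,
)

arc_path = "/kaggle/input/arc-prize-2024/"
tasks = read_tasks_from_single_file(
    arc_path + "arc-agi_evaluation_challenges.json", test=True
)

# make_submission asserts exactly number_of_attempts (default 2) grids per task
predictions = [[t.test_example.input, t.test_example.input] for t in tasks]
submission = make_submission(tasks, predictions, path="submission.json")
print(len(submission), "task ids written to submission.json")
```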