EuroEval 15.8.2.tar.gz → 15.9.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.


Files changed (240)
  1. {euroeval-15.8.2 → euroeval-15.9.0}/.github/workflows/ci.yaml +1 -1
  2. {euroeval-15.8.2 → euroeval-15.9.0}/.pre-commit-config.yaml +2 -2
  3. {euroeval-15.8.2 → euroeval-15.9.0}/CHANGELOG.md +18 -0
  4. {euroeval-15.8.2 → euroeval-15.9.0}/Dockerfile.cuda +1 -2
  5. {euroeval-15.8.2 → euroeval-15.9.0}/PKG-INFO +3 -5
  6. {euroeval-15.8.2 → euroeval-15.9.0}/README.md +0 -2
  7. {euroeval-15.8.2 → euroeval-15.9.0}/docs/README.md +2 -3
  8. {euroeval-15.8.2 → euroeval-15.9.0}/makefile +4 -7
  9. {euroeval-15.8.2 → euroeval-15.9.0}/pyproject.toml +3 -3
  10. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_config_factory.py +0 -31
  11. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_modules/hf.py +26 -13
  12. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_modules/vllm.py +70 -2
  13. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmarker.py +0 -21
  14. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/cli.py +0 -10
  15. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/data_models.py +0 -5
  16. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/exceptions.py +0 -22
  17. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/human_evaluation.py +0 -1
  18. {euroeval-15.8.2 → euroeval-15.9.0}/tests/conftest.py +0 -1
  19. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_modules/test_hf.py +31 -27
  20. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_cli.py +0 -2
  21. {euroeval-15.8.2 → euroeval-15.9.0}/uv.lock +540 -162
  22. {euroeval-15.8.2 → euroeval-15.9.0}/.github/ISSUE_TEMPLATE/benchmark_dataset_request.yaml +0 -0
  23. {euroeval-15.8.2 → euroeval-15.9.0}/.github/ISSUE_TEMPLATE/bug.yaml +0 -0
  24. {euroeval-15.8.2 → euroeval-15.9.0}/.github/ISSUE_TEMPLATE/feature_request.yaml +0 -0
  25. {euroeval-15.8.2 → euroeval-15.9.0}/.github/ISSUE_TEMPLATE/model_evaluation_request.yaml +0 -0
  26. {euroeval-15.8.2 → euroeval-15.9.0}/.gitignore +0 -0
  27. {euroeval-15.8.2 → euroeval-15.9.0}/CITATION.cff +0 -0
  28. {euroeval-15.8.2 → euroeval-15.9.0}/CODE_OF_CONDUCT.md +0 -0
  29. {euroeval-15.8.2 → euroeval-15.9.0}/CONTRIBUTING.md +0 -0
  30. {euroeval-15.8.2 → euroeval-15.9.0}/LICENSE +0 -0
  31. {euroeval-15.8.2 → euroeval-15.9.0}/NEW_DATASET_GUIDE.md +0 -0
  32. {euroeval-15.8.2 → euroeval-15.9.0}/docs/CNAME +0 -0
  33. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/README.md +0 -0
  34. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/danish.md +0 -0
  35. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/dutch.md +0 -0
  36. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/english.md +0 -0
  37. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/faroese.md +0 -0
  38. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/finnish.md +0 -0
  39. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/french.md +0 -0
  40. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/german.md +0 -0
  41. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/icelandic.md +0 -0
  42. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/italian.md +0 -0
  43. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/norwegian.md +0 -0
  44. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/spanish.md +0 -0
  45. {euroeval-15.8.2 → euroeval-15.9.0}/docs/datasets/swedish.md +0 -0
  46. {euroeval-15.8.2 → euroeval-15.9.0}/docs/extras/radial_plotter.md +0 -0
  47. {euroeval-15.8.2 → euroeval-15.9.0}/docs/faq.md +0 -0
  48. {euroeval-15.8.2 → euroeval-15.9.0}/docs/gfx/favicon.png +0 -0
  49. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/danish.md +0 -0
  50. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/dutch.md +0 -0
  51. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/english.md +0 -0
  52. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/faroese.md +0 -0
  53. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/french.md +0 -0
  54. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/german.md +0 -0
  55. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/icelandic.md +0 -0
  56. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/italian.md +0 -0
  57. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/norwegian.md +0 -0
  58. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/spanish.md +0 -0
  59. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Monolingual/swedish.md +0 -0
  60. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Multilingual/european.md +0 -0
  61. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Multilingual/germanic.md +0 -0
  62. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Multilingual/mainland-scandinavian.md +0 -0
  63. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/Multilingual/romance.md +0 -0
  64. {euroeval-15.8.2 → euroeval-15.9.0}/docs/leaderboards/README.md +0 -0
  65. {euroeval-15.8.2 → euroeval-15.9.0}/docs/methodology.md +0 -0
  66. {euroeval-15.8.2 → euroeval-15.9.0}/docs/python-package.md +0 -0
  67. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/README.md +0 -0
  68. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/common-sense-reasoning.md +0 -0
  69. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/knowledge.md +0 -0
  70. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/linguistic-acceptability.md +0 -0
  71. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/named-entity-recognition.md +0 -0
  72. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/reading-comprehension.md +0 -0
  73. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/sentiment-classification.md +0 -0
  74. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/speed.md +0 -0
  75. {euroeval-15.8.2 → euroeval-15.9.0}/docs/tasks/summarization.md +0 -0
  76. {euroeval-15.8.2 → euroeval-15.9.0}/gfx/euroeval.png +0 -0
  77. {euroeval-15.8.2 → euroeval-15.9.0}/gfx/euroeval.xcf +0 -0
  78. {euroeval-15.8.2 → euroeval-15.9.0}/gfx/scandeval.png +0 -0
  79. {euroeval-15.8.2 → euroeval-15.9.0}/mkdocs.yaml +0 -0
  80. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/__init__.py +0 -0
  81. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_modules/__init__.py +0 -0
  82. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_modules/base.py +0 -0
  83. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_modules/fresh.py +0 -0
  84. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/benchmark_modules/litellm.py +0 -0
  85. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/callbacks.py +0 -0
  86. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/constants.py +0 -0
  87. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/data_loading.py +0 -0
  88. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/__init__.py +0 -0
  89. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/danish.py +0 -0
  90. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/dutch.py +0 -0
  91. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/english.py +0 -0
  92. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/faroese.py +0 -0
  93. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/finnish.py +0 -0
  94. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/french.py +0 -0
  95. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/german.py +0 -0
  96. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/icelandic.py +0 -0
  97. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/italian.py +0 -0
  98. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/norwegian.py +0 -0
  99. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/spanish.py +0 -0
  100. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/dataset_configs/swedish.py +0 -0
  101. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/enums.py +0 -0
  102. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/finetuning.py +0 -0
  103. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/generation.py +0 -0
  104. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/generation_utils.py +0 -0
  105. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/languages.py +0 -0
  106. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/model_cache.py +0 -0
  107. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/model_config.py +0 -0
  108. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/model_loading.py +0 -0
  109. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/__init__.py +0 -0
  110. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/linguistic_acceptability.py +0 -0
  111. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/multiple_choice.py +0 -0
  112. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/named_entity_recognition.py +0 -0
  113. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/reading_comprehension.py +0 -0
  114. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/sentiment_classification.py +0 -0
  115. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/prompt_templates/summarization.py +0 -0
  116. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/scores.py +0 -0
  117. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/speed_benchmark.py +0 -0
  118. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/task_group_utils/__init__.py +0 -0
  119. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/task_group_utils/multiple_choice_classification.py +0 -0
  120. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/task_group_utils/question_answering.py +0 -0
  121. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/task_group_utils/sequence_classification.py +0 -0
  122. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/task_group_utils/text_to_text.py +0 -0
  123. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/task_group_utils/token_classification.py +0 -0
  124. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/tasks.py +0 -0
  125. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/tokenization_utils.py +0 -0
  126. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/types.py +0 -0
  127. {euroeval-15.8.2 → euroeval-15.9.0}/src/euroeval/utils.py +0 -0
  128. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/constants.py +0 -0
  129. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_allocine.py +0 -0
  130. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_angry_tweets.py +0 -0
  131. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_arc.py +0 -0
  132. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_arc_is.py +0 -0
  133. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_belebele.py +0 -0
  134. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_cnn_dailymail.py +0 -0
  135. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_conll_en.py +0 -0
  136. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_conll_es.py +0 -0
  137. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_conll_nl.py +0 -0
  138. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_dane.py +0 -0
  139. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_danish_citizen_tests.py +0 -0
  140. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_dansk.py +0 -0
  141. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_danske_talemaader.py +0 -0
  142. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_danske_talemaader_old.py +0 -0
  143. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_dbrd.py +0 -0
  144. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_dutch_cola.py +0 -0
  145. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_eltec.py +0 -0
  146. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_fone.py +0 -0
  147. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_foqa.py +0 -0
  148. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_fosent.py +0 -0
  149. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_fquad.py +0 -0
  150. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_germanquad.py +0 -0
  151. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_germeval.py +0 -0
  152. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_hellaswag.py +0 -0
  153. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_hellaswag_fi.py +0 -0
  154. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_hotter_and_colder_sentiment.py +0 -0
  155. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_ice_linguistic.py +0 -0
  156. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_icelandic_error_corpus.py +0 -0
  157. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_icelandic_knowledge.py +0 -0
  158. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_icelandic_qa.py +0 -0
  159. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_icesum.py +0 -0
  160. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_ilpost_sum.py +0 -0
  161. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_jentoft.py +0 -0
  162. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_mim_gold_ner.py +0 -0
  163. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_mlqa_es.py +0 -0
  164. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_mlsum_de.py +0 -0
  165. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_mlsum_es.py +0 -0
  166. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_mmlu.py +0 -0
  167. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_multinerd-it.py +0 -0
  168. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_no_cola.py +0 -0
  169. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_no_sammendrag.py +0 -0
  170. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_nor_common_sense_qa.py +0 -0
  171. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_nordjylland_news.py +0 -0
  172. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_norec.py +0 -0
  173. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_norglm_multiqa.py +0 -0
  174. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_norglm_multisum.py +0 -0
  175. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_norne.py +0 -0
  176. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_norquad.py +0 -0
  177. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_nqii.py +0 -0
  178. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_nrk_quiz_qa.py +0 -0
  179. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_orange_sum.py +0 -0
  180. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_personal_sum.py +0 -0
  181. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_rrn.py +0 -0
  182. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_sb10k.py +0 -0
  183. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_scala.py +0 -0
  184. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_scandiqa.py +0 -0
  185. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_scandisent_fi.py +0 -0
  186. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_schibsted.py +0 -0
  187. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_sentiment_headlines_es.py +0 -0
  188. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_sentipolc16.py +0 -0
  189. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_squad.py +0 -0
  190. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_squad_it.py +0 -0
  191. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_squad_nl.py +0 -0
  192. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_squad_nl_old.py +0 -0
  193. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_sst5.py +0 -0
  194. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_suc3.py +0 -0
  195. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_swedn.py +0 -0
  196. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_swerec.py +0 -0
  197. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_turku_ner_fi.py +0 -0
  198. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_tydiqa_fi.py +0 -0
  199. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_wiki_lingua_nl.py +0 -0
  200. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_wikiann_fo.py +0 -0
  201. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_wikineural-it.py +0 -0
  202. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_winogrande_is.py +0 -0
  203. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_xlsum_fi.py +0 -0
  204. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/create_xquad_es.py +0 -0
  205. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/fix_dot_env_file.py +0 -0
  206. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/load_ud_pos.py +0 -0
  207. {euroeval-15.8.2 → euroeval-15.9.0}/src/scripts/versioning.py +0 -0
  208. {euroeval-15.8.2 → euroeval-15.9.0}/tests/__init__.py +0 -0
  209. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_config_factory.py +0 -0
  210. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_modules/__init__.py +0 -0
  211. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_modules/test_base.py +0 -0
  212. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_modules/test_fresh.py +0 -0
  213. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_modules/test_litellm.py +0 -0
  214. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmark_modules/test_vllm.py +0 -0
  215. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_benchmarker.py +0 -0
  216. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_callbacks.py +0 -0
  217. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_constants.py +0 -0
  218. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_data_loading.py +0 -0
  219. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_data_models.py +0 -0
  220. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_dataset_configs.py +0 -0
  221. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_enums.py +0 -0
  222. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_exceptions.py +0 -0
  223. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_finetuning.py +0 -0
  224. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_generation.py +0 -0
  225. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_human_evaluation.py +0 -0
  226. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_languages.py +0 -0
  227. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_model_cache.py +0 -0
  228. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_model_config.py +0 -0
  229. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_model_loading.py +0 -0
  230. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_scores.py +0 -0
  231. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_speed_benchmark.py +0 -0
  232. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_task_utils/__init__.py +0 -0
  233. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_task_utils/test_question_answering.py +0 -0
  234. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_task_utils/test_sequence_classification.py +0 -0
  235. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_task_utils/test_text_to_text.py +0 -0
  236. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_task_utils/test_token_classification.py +0 -0
  237. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_tasks.py +0 -0
  238. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_tokenization_utils.py +0 -0
  239. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_types.py +0 -0
  240. {euroeval-15.8.2 → euroeval-15.9.0}/tests/test_utils.py +0 -0
@@ -30,7 +30,7 @@ jobs:
  python-version: "3.11"
  - run: python -m pip install pre-commit
  shell: bash
- - run: pre-commit run --show-diff-on-failure --color=always
+ - run: pre-commit run --show-diff-on-failure --color=always --all-files
  shell: bash

  pytest-linux:
@@ -10,7 +10,7 @@ repos:
  - id: trailing-whitespace
  - id: debug-statements
  - repo: https://github.com/astral-sh/ruff-pre-commit
- rev: v0.11.9
+ rev: v0.11.12
  hooks:
  - id: ruff
  args:
@@ -31,7 +31,7 @@ repos:
  hooks:
  - id: nbstripout
  - repo: https://github.com/pre-commit/mirrors-mypy
- rev: v1.15.0
+ rev: v1.16.0
  hooks:
  - id: mypy
  args:
@@ -10,6 +10,24 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.



+ ## [v15.9.0] - 2025-05-31
+ ### Changed
+ - Updated `vllm` to `>=0.9.0`, as the bug in `v0.8.5` has been fixed.
+ - Removed the `--use-flash-attention` flag as well as the corresponding warning, as
+ flash attention is now built-in to vLLM and is used by default.
+
+ ### Fixed
+ - When truncating prompts with vLLM models, we now correctly truncate them down below
+ the `MAX_CONTEXT_LENGTH` (set to 5,000 tokens). We have already ensured that all
+ prompts have less than 5,000 Gemma-3 tokens, but sometimes tokenizers add a few more
+ tokens.
+ - Fixed an issue regarding model existence check when benchmarking models on custom
+ inference API servers.
+ - Fixed an issue with Phi-4 models, as they output multiple end-of-reasoning tokens, and
+ it was previously cutting off at the first one, yielding faulty final answers. We now
+ cut off at the last end-of-reasoning token, which is the correct one.
+
+
  ## [v15.8.2] - 2025-05-12
  ### Fixed
  - Catch error when caching generative model outputs, when the number of model inputs and
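The Phi-4 fix described in the changelog entry above keeps only the tokens after the final end-of-reasoning marker instead of the first one. A minimal, self-contained sketch of that cutoff logic follows; the token ID `42` and the function name are illustrative only, not EuroEval's actual identifiers.

```python
def strip_reasoning(token_ids: list[int], end_of_reasoning_token_id: int = 42) -> list[int]:
    """Keep only the tokens after the LAST end-of-reasoning token, if one is present."""
    if end_of_reasoning_token_id not in token_ids:
        return token_ids
    # Previously the cutoff used the index of the first occurrence; using the last
    # occurrence is what makes multi-marker outputs (e.g. from Phi-4) parse correctly.
    last_index = max(i for i, t in enumerate(token_ids) if t == end_of_reasoning_token_id)
    return token_ids[last_index + 1 :]


# Example: two end-of-reasoning tokens (42); only the tokens after the second are kept.
assert strip_reasoning([1, 2, 42, 3, 42, 4, 5]) == [4, 5]
```

The corresponding production change is visible in the vllm benchmark module hunk further down.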
@@ -5,8 +5,7 @@ RUN apt-get -y update && \
  apt-get -y upgrade && \
  DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends gcc python3.11 python3-pip python3-dev git-all && \
  python3 -m pip install --upgrade pip wheel && \
- python3 -m pip install euroeval[all] && \
- FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE python3 -m pip install flash-attn --no-build-isolation
+ python3 -m pip install euroeval[all]

  # Move the existing evaluation results into the container, to avoid re-running the
  # evaluation
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: EuroEval
- Version: 15.8.2
+ Version: 15.9.0
  Summary: The robust European language model benchmark.
  Project-URL: Repository, https://github.com/EuroEval/EuroEval
  Project-URL: Issues, https://github.com/EuroEval/EuroEval/issues
@@ -62,12 +62,12 @@ Requires-Dist: bitsandbytes>=0.43.1; (platform_system == 'Linux') and extra == '
  Requires-Dist: fbgemm-gpu>=1.0.0; (platform_system == 'Linux') and extra == 'all'
  Requires-Dist: gradio>=4.26.0; extra == 'all'
  Requires-Dist: outlines>=0.1.11; extra == 'all'
- Requires-Dist: vllm<0.8.5,>=0.8.3; (platform_system == 'Linux') and extra == 'all'
+ Requires-Dist: vllm>=0.9.0; (platform_system == 'Linux') and extra == 'all'
  Provides-Extra: generative
  Requires-Dist: bitsandbytes>=0.43.1; (platform_system == 'Linux') and extra == 'generative'
  Requires-Dist: fbgemm-gpu>=1.0.0; (platform_system == 'Linux') and extra == 'generative'
  Requires-Dist: outlines>=0.1.11; extra == 'generative'
- Requires-Dist: vllm<0.8.5,>=0.8.3; (platform_system == 'Linux') and extra == 'generative'
+ Requires-Dist: vllm>=0.9.0; (platform_system == 'Linux') and extra == 'generative'
  Provides-Extra: human-evaluation
  Requires-Dist: gradio>=4.26.0; extra == 'human-evaluation'
  Provides-Extra: test
@@ -97,8 +97,6 @@ ______________________________________________________________________

  - Dan Saattrup Nielsen ([@saattrupdan](https://github.com/saattrupdan),
  dan.nielsen@alexandra.dk)
- - Kenneth Enevoldsen ([@KennethEnevoldsen](https://github.com/KennethEnevoldsen),
- kenneth.enevoldsen@cas.au.dk)


  ## Installation
@@ -21,8 +21,6 @@ ______________________________________________________________________

  - Dan Saattrup Nielsen ([@saattrupdan](https://github.com/saattrupdan),
  dan.nielsen@alexandra.dk)
- - Kenneth Enevoldsen ([@KennethEnevoldsen](https://github.com/KennethEnevoldsen),
- kenneth.enevoldsen@cas.au.dk)


  ## Installation
@@ -32,6 +32,5 @@ models. It started as a hobby project including Danish, Swedish and Norwegian, b
  since grown to include 8+ European languages.

  EuroEval is maintained by [Dan Saattrup Nielsen](https://www.saattrupdan.com/) from the
- [Alexandra Institute](https://alexandra.dk) and [Kenneth
- Enevoldsen](https://www.kennethenevoldsen.com/) from [Aarhus University](https://au.dk),
- and is funded by the EU project [TrustLLM](https://trustllm.eu/).
+ [Alexandra Institute](https://alexandra.dk), and is funded by the EU project
+ [TrustLLM](https://trustllm.eu/).
@@ -44,18 +44,15 @@ install-rust:
  install-uv:
  @if [ "$(shell which uv)" = "" ]; then \
  curl -LsSf https://astral.sh/uv/install.sh | sh; \
- echo "Installed uv."; \
- else \
- echo "Updating uv..."; \
- uv self update; \
+ echo "Installed uv."; \
+ else \
+ echo "Updating uv..."; \
+ uv self update; \
  fi

  install-dependencies:
  @uv python install 3.11
  @uv sync --all-extras --python 3.11
- @if [ "${NO_FLASH_ATTN}" != "1" ] && [ $$(uname) != "Darwin" ]; then \
- uv pip install --no-build-isolation flash-attn>=2.7.0.post2; \
- fi

  setup-environment-variables:
  @uv run python src/scripts/fix_dot_env_file.py
@@ -1,6 +1,6 @@
  [project]
  name = "EuroEval"
- version = "15.8.2"
+ version = "15.9.0"
  description = "The robust European language model benchmark."
  readme = "README.md"
  authors = [
@@ -46,7 +46,7 @@ dependencies = [
  generative = [
  "outlines>=0.1.11",
  "bitsandbytes>=0.43.1; platform_system == 'Linux'",
- "vllm>=0.8.3,<0.8.5; platform_system == 'Linux'",
+ "vllm>=0.9.0; platform_system == 'Linux'",
  "fbgemm-gpu>=1.0.0; platform_system == 'Linux'",
  ]
  human_evaluation = [
@@ -55,7 +55,7 @@ human_evaluation = [
  all = [
  "outlines>=0.1.11",
  "bitsandbytes>=0.43.1; platform_system == 'Linux'",
- "vllm>=0.8.3,<0.8.5; platform_system == 'Linux'",
+ "vllm>=0.9.0; platform_system == 'Linux'",
  "fbgemm-gpu>=1.0.0; platform_system == 'Linux'",
  "gradio>=4.26.0",
  ]
@@ -1,6 +1,5 @@
  """Factory class for creating dataset configurations."""

- import importlib.util
  import logging
  import sys
  import typing as t
@@ -13,7 +12,6 @@ from .enums import Device
  from .exceptions import InvalidBenchmark
  from .languages import get_all_languages
  from .tasks import SPEED, get_all_tasks
- from .utils import log_once

  if t.TYPE_CHECKING:
  from .data_models import Language, Task
@@ -38,7 +36,6 @@ def build_benchmark_config(
  force: bool,
  verbose: bool,
  trust_remote_code: bool,
- use_flash_attention: bool | None,
  clear_model_cache: bool,
  evaluate_test_split: bool,
  few_shot: bool,
@@ -92,9 +89,6 @@ def build_benchmark_config(
  automatically set if `debug` is True.
  trust_remote_code:
  Whether to trust remote code when running the benchmark.
- use_flash_attention:
- Whether to use Flash Attention for the models. If None then it will be used
- if it is available.
  clear_model_cache:
  Whether to clear the model cache before running the benchmark.
  evaluate_test_split:
@@ -135,30 +129,6 @@ def build_benchmark_config(

  torch_device = prepare_device(device=device)

- if use_flash_attention is None:
- if torch_device.type != "cuda":
- use_flash_attention = False
- elif (
- importlib.util.find_spec("flash_attn") is None
- and importlib.util.find_spec("vllm_flash_attn") is None
- ):
- use_flash_attention = False
- if first_time and torch_device.type == "cuda":
- message = (
- "Flash attention has not been installed, so this will not be used. "
- "To install it, run `pip install -U wheel && "
- "FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE pip install flash-attn "
- "--no-build-isolation`. Alternatively, you can disable this "
- "message by setting "
- )
- if run_with_cli:
- message += "the flag `--no-use-flash-attention`."
- else:
- message += (
- "the argument `use_flash_attention=False` in the `Benchmarker`."
- )
- log_once(message=message, level=logging.INFO)
-
  # Set variable with number of iterations
  if hasattr(sys, "_called_from_test"):
  num_iterations = 1
@@ -178,7 +148,6 @@
  verbose=verbose or debug,
  device=torch_device,
  trust_remote_code=trust_remote_code,
- use_flash_attention=use_flash_attention,
  clear_model_cache=clear_model_cache,
  evaluate_test_split=evaluate_test_split,
  few_shot=few_shot,
@@ -54,13 +54,11 @@ from ..enums import (
  TaskGroup,
  )
  from ..exceptions import (
- HuggingFaceHubDown,
  InvalidBenchmark,
  InvalidModel,
  NeedsAdditionalArgument,
  NeedsEnvironmentVariable,
  NeedsExtraInstalled,
- NoInternetConnection,
  )
  from ..languages import get_all_languages
  from ..task_group_utils import (
@@ -737,9 +735,10 @@ def get_model_repo_info(
  model_info = HfApiModelInfo(id=model_id, tags=None, pipeline_tag=None)

  # If the model does not exist locally, then we get the model info from the Hugging
- # Face Hub
+ # Face Hub, if possible
  if model_info is None:
  num_attempts = 3
+ errors: list[Exception] = list()
  for _ in range(num_attempts):
  try:
  model_info = hf_api.model_info(
@@ -749,25 +748,37 @@ def get_model_repo_info(
  except (GatedRepoError, LocalTokenNotFoundError) as e:
  try:
  hf_whoami(token=token)
- logger.warning(
+ logger.debug(
  f"Could not access the model {model_id} with the revision "
  f"{revision}. The error was {str(e)!r}."
  )
  return None
  except LocalTokenNotFoundError:
- raise NeedsAdditionalArgument(
- cli_argument="--api-key",
- script_argument="api_key=<your-api-key>",
- run_with_cli=benchmark_config.run_with_cli,
+ logger.debug(
+ f"Could not access the model {model_id} with the revision "
+ f"{revision}. The error was {str(e)!r}. Please set the "
+ "`HUGGINGFACE_API_KEY` environment variable or use the "
+ "`--api-key` argument."
  )
+ return None
  except (RepositoryNotFoundError, HFValidationError):
  return None
- except (OSError, RequestException):
+ except (OSError, RequestException) as e:
  if internet_connection_available():
+ errors.append(e)
  continue
- raise NoInternetConnection()
+ logger.debug(
+ "Could not access the Hugging Face Hub. Please check your internet "
+ "connection."
+ )
+ return None
  else:
- raise HuggingFaceHubDown()
+ logger.debug(
+ f"Could not access model info for the model {model_id!r} from the "
+ f"Hugging Face Hub, after {num_attempts} attempts. The errors "
+ f"encountered were {errors!r}."
+ )
+ return None

  # Get all the Hugging Face repository tags for the model. If the model is an adapter
  # model, then we also get the tags for the base model
@@ -836,7 +847,8 @@ def get_model_repo_info(
  "Skipping since the `only_allow_safetensors` argument is set "
  "to `True`."
  )
- raise InvalidModel(msg)
+ logger.warning(msg)
+ return None

  # Also check base model if we are evaluating an adapter
  if base_model_id is not None:
@@ -856,7 +868,8 @@ def get_model_repo_info(
  " Skipping since the `only_allow_safetensors` argument is set "
  "to `True`."
  )
- raise InvalidModel(msg)
+ logging.warning(msg)
+ return None

  return HFModelInfo(
  pipeline_tag=pipeline_tag, tags=tags, adapter_base_model_id=base_model_id
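Note the pattern across the hf module hunks above: failures that used to raise (`NeedsAdditionalArgument`, `NoInternetConnection`, `HuggingFaceHubDown`, `InvalidModel`) now emit a debug or warning log and `return None`, so the caller treats the model as unavailable instead of aborting the run. A rough, self-contained sketch of that retry-and-collect-errors shape, with hypothetical names (`fetch_info` is not an EuroEval function):

```python
import logging

logger = logging.getLogger(__name__)


def get_info_or_none(fetch_info, num_attempts: int = 3):
    """Retry a flaky lookup; log the collected errors and return None instead of raising."""
    errors: list[Exception] = []
    for _ in range(num_attempts):
        try:
            return fetch_info()
        except OSError as error:  # stand-in for transient Hub/network errors
            errors.append(error)
    logger.debug(f"Giving up after {num_attempts} attempts; errors were {errors!r}")
    return None
```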
@@ -84,7 +84,12 @@ if t.TYPE_CHECKING or importlib.util.find_spec("vllm") is not None:
  destroy_distributed_environment,
  destroy_model_parallel,
  )
+ from vllm.inputs import PromptType
  from vllm.lora.request import LoRARequest
+ from vllm.model_executor.guided_decoding.guided_fields import GuidedDecodingRequest
+ from vllm.pooling_params import PoolingParams
+ from vllm.prompt_adapter.request import PromptAdapterRequest
+ from vllm.sampling_params import RequestOutputKind

  if t.TYPE_CHECKING or importlib.util.find_spec("outlines") is not None:
  from outlines.models.vllm import adapt_tokenizer
@@ -451,7 +456,9 @@ class VLLMModel(HuggingFaceEncoderModel):
  text=prompts,
  truncation=True,
  max_length=max(
- self._tokenizer.model_max_length - max_tokens, 0
+ min(self._tokenizer.model_max_length, MAX_CONTEXT_LENGTH)
+ - max_tokens,
+ 0,
  ),
  )
  prompts = self._tokenizer.batch_decode(
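The hunk above changes the truncation bound from `model_max_length - max_tokens` to `min(model_max_length, MAX_CONTEXT_LENGTH) - max_tokens`, which is what enforces the 5,000-token cap mentioned in the changelog. A quick numerical illustration with made-up values (the tokenizer limit and generation budget below are not EuroEval's actual numbers):

```python
MAX_CONTEXT_LENGTH = 5_000   # EuroEval's cap, per the changelog above
model_max_length = 131_072   # illustrative tokenizer limit
max_tokens = 256             # illustrative generation budget

old_bound = max(model_max_length - max_tokens, 0)                           # 130816
new_bound = max(min(model_max_length, MAX_CONTEXT_LENGTH) - max_tokens, 0)  # 4744
```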
@@ -491,8 +498,19 @@ class VLLMModel(HuggingFaceEncoderModel):
  output.outputs[0].token_ids for output in raw_outputs
  ]
  if self.end_of_reasoning_token_id in completion_ids[0]:
+ # Find the latest index of the end of reasoning token and slice
+ # the token IDs to only include the tokens after it
  completion_ids = [
- token_ids[token_ids.index(self.end_of_reasoning_token_id) + 1 :]
+ token_ids[
+ max(
+ [
+ i
+ for i, x in enumerate(token_ids)
+ if x == self.end_of_reasoning_token_id
+ ]
+ )
+ + 1 :
+ ]
  if self.end_of_reasoning_token_id in token_ids
  else token_ids
  for token_ids in completion_ids
@@ -814,6 +832,9 @@ def load_model_and_tokenizer(
  )

  model._run_engine = MethodType(_run_engine_with_fixed_progress_bars, model)
+ model._validate_and_add_requests = MethodType(
+ _validate_and_add_requests_with_fixed_progress_bars, model
+ )
  model.config = hf_model_config

  return model, tokenizer
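Both helpers are attached to the already-constructed vLLM `LLM` instance with `types.MethodType`, which binds a plain function as a bound method of that one object while leaving the class itself untouched. A generic sketch of the technique (the class and method names below are illustrative, not vLLM's API):

```python
from types import MethodType


class Engine:
    def run(self) -> str:
        return "original"


def patched_run(self: "Engine") -> str:
    # `self` is bound to the specific instance, exactly like a normal method.
    return "patched"


engine = Engine()
engine.run = MethodType(patched_run, engine)  # overrides this instance only
assert engine.run() == "patched"
assert Engine().run() == "original"  # other instances keep the original behaviour
```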
@@ -934,6 +955,53 @@ def _run_engine_with_fixed_progress_bars(
  return outputs


+ def _validate_and_add_requests_with_fixed_progress_bars(
+ self: "LLM",
+ prompts: "PromptType | c.Sequence[PromptType]",
+ params: "SamplingParams | c.Sequence[SamplingParams] | PoolingParams | c.Sequence[PoolingParams]", # noqa: E501
+ *,
+ use_tqdm: bool,
+ lora_request: "c.Sequence[LoRARequest] | LoRARequest | None",
+ prompt_adapter_request: "PromptAdapterRequest | None",
+ tokenization_kwargs: dict[str, t.Any] | None = None,
+ guided_options: "GuidedDecodingRequest | None" = None,
+ priority: list[int] | None = None,
+ ) -> None:
+ if isinstance(prompts, (str, dict)):
+ # Convert a single prompt to a list.
+ prompts = [prompts]
+
+ num_requests = len(prompts)
+ if isinstance(params, list) and len(params) != num_requests:
+ raise ValueError("The lengths of prompts and params must be the same.")
+ if isinstance(lora_request, list) and len(lora_request) != num_requests:
+ raise ValueError("The lengths of prompts and lora_request must be the same.")
+
+ for sp in params if isinstance(params, list) else (params,):
+ if isinstance(sp, SamplingParams):
+ self._add_guided_params(sp, guided_options)
+
+ # We only care about the final output
+ sp.output_kind = RequestOutputKind.FINAL_ONLY
+
+ # Add requests to the engine.
+ it = prompts
+ if use_tqdm:
+ it = tqdm(it, desc="Adding requests", leave=False)
+
+ for i, prompt in enumerate(it):
+ self._add_request(
+ prompt,
+ params[i] if isinstance(params, c.Sequence) else params,
+ tokenization_kwargs=tokenization_kwargs,
+ lora_request=lora_request[i]
+ if isinstance(lora_request, c.Sequence)
+ else lora_request,
+ prompt_adapter_request=prompt_adapter_request,
+ priority=priority[i] if priority else 0,
+ )
+
+
  def clear_vllm() -> None:
  """Clear the GPU memory used by the vLLM model, enabling re-initialisation."""
  with contextlib.suppress(ValueError):
@@ -72,7 +72,6 @@ class Benchmarker:
  force: bool = False,
  verbose: bool = False,
  trust_remote_code: bool = False,
- use_flash_attention: bool | None = None,
  clear_model_cache: bool = False,
  evaluate_test_split: bool = False,
  few_shot: bool = True,
@@ -129,9 +128,6 @@ class Benchmarker:
  `debug` is True. Defaults to False.
  trust_remote_code:
  Whether to trust remote code when loading models. Defaults to False.
- use_flash_attention:
- Whether to use Flash Attention. If None then it will be used if it is
- installed and the model is a decoder model. Defaults to None.
  clear_model_cache:
  Whether to clear the model cache after benchmarking each model.
  Defaults to False.
@@ -190,7 +186,6 @@ class Benchmarker:
  force=force,
  verbose=verbose,
  trust_remote_code=trust_remote_code,
- use_flash_attention=use_flash_attention,
  clear_model_cache=clear_model_cache,
  evaluate_test_split=evaluate_test_split,
  few_shot=few_shot,
@@ -243,7 +238,6 @@ class Benchmarker:
  force: bool | None = None,
  verbose: bool | None = None,
  trust_remote_code: bool | None = None,
- use_flash_attention: bool | None = None,
  clear_model_cache: bool | None = None,
  evaluate_test_split: bool | None = None,
  few_shot: bool | None = None,
@@ -311,9 +305,6 @@ class Benchmarker:
  trust_remote_code:
  Whether to trust remote code when loading models. Defaults to the value
  specified when initialising the benchmarker.
- use_flash_attention:
- Whether to use Flash Attention. Defaults to the value specified when
- initialising the benchmarker.
  clear_model_cache:
  Whether to clear the model cache after benchmarking each model. Defaults
  to the value specified when initialising the benchmarker.
@@ -359,7 +350,6 @@ class Benchmarker:
  force=force,
  verbose=verbose,
  trust_remote_code=trust_remote_code,
- use_flash_attention=use_flash_attention,
  clear_model_cache=clear_model_cache,
  evaluate_test_split=evaluate_test_split,
  few_shot=few_shot,
@@ -531,7 +521,6 @@ class Benchmarker:
  force: bool | None = None,
  verbose: bool | None = None,
  trust_remote_code: bool | None = None,
- use_flash_attention: bool | None | None = None,
  clear_model_cache: bool | None = None,
  evaluate_test_split: bool | None = None,
  few_shot: bool | None = None,
@@ -590,9 +579,6 @@ class Benchmarker:
  trust_remote_code:
  Whether to trust remote code when loading models. If None, then this
  value will not be updated.
- use_flash_attention:
- Whether to use Flash Attention. If None, then this value will not be
- updated.
  clear_model_cache:
  Whether to clear the model cache after benchmarking each model. If None,
  then this value will not be updated.
@@ -658,8 +644,6 @@ class Benchmarker:
  benchmark_config_params.verbose = verbose
  if trust_remote_code is not None:
  benchmark_config_params.trust_remote_code = trust_remote_code
- if use_flash_attention is not None:
- benchmark_config_params.use_flash_attention = use_flash_attention
  if clear_model_cache is not None:
  benchmark_config_params.clear_model_cache = clear_model_cache
  if evaluate_test_split is not None:
@@ -863,7 +847,6 @@ class Benchmarker:
  force: bool | None = None,
  verbose: bool | None = None,
  trust_remote_code: bool | None = None,
- use_flash_attention: bool | None = None,
  clear_model_cache: bool | None = None,
  evaluate_test_split: bool | None = None,
  few_shot: bool | None = None,
@@ -931,9 +914,6 @@ class Benchmarker:
  trust_remote_code:
  Whether to trust remote code when loading models. Defaults to the value
  specified when initialising the benchmarker.
- use_flash_attention:
- Whether to use Flash Attention. Defaults to the value specified when
- initialising the benchmarker.
  clear_model_cache:
  Whether to clear the model cache after benchmarking each model. Defaults
  to the value specified when initialising the benchmarker.
@@ -981,7 +961,6 @@ class Benchmarker:
  force=force,
  verbose=verbose,
  trust_remote_code=trust_remote_code,
- use_flash_attention=use_flash_attention,
  clear_model_cache=clear_model_cache,
  evaluate_test_split=evaluate_test_split,
  few_shot=few_shot,
@@ -141,14 +141,6 @@ from .tasks import get_all_tasks
  help="""Whether to trust remote code. Only set this flag if you trust the supplier
  of the model.""",
  )
- @click.option(
- "--use-flash-attention/--no-use-flash-attention",
- default=None,
- show_default=True,
- help="""Whether to use Flash Attention. If not specified then the model will use
- Flash Attention for generative models if a CUDA GPU is available and `flash-attn`
- or `vllm-flash-attn` are installed.""",
- )
  @click.option(
  "--clear-model-cache/--no-clear-model-cache",
  default=False,
@@ -225,7 +217,6 @@ def benchmark(
  verbose: bool,
  device: str | None,
  trust_remote_code: bool,
- use_flash_attention: bool | None,
  clear_model_cache: bool,
  evaluate_test_split: bool,
  few_shot: bool,
@@ -261,7 +252,6 @@ def benchmark(
  cache_dir=cache_dir,
  device=device,
  trust_remote_code=trust_remote_code,
- use_flash_attention=use_flash_attention,
  clear_model_cache=clear_model_cache,
  evaluate_test_split=evaluate_test_split,
  few_shot=few_shot,
@@ -191,9 +191,6 @@ class BenchmarkConfig:
  Whether to print verbose output.
  trust_remote_code:
  Whether to trust remote code when loading models from the Hugging Face Hub.
- use_flash_attention:
- Whether to use Flash Attention. If None then this will be used for
- generative models.
  clear_model_cache:
  Whether to clear the model cache after benchmarking each model.
  evaluate_test_split:
@@ -231,7 +228,6 @@ class BenchmarkConfig:
  device: torch.device
  verbose: bool
  trust_remote_code: bool
- use_flash_attention: bool | None
  clear_model_cache: bool
  evaluate_test_split: bool
  few_shot: bool
@@ -263,7 +259,6 @@ class BenchmarkConfigParams(pydantic.BaseModel):
  force: bool
  verbose: bool
  trust_remote_code: bool
- use_flash_attention: bool | None
  clear_model_cache: bool
  evaluate_test_split: bool
  few_shot: bool
@@ -81,28 +81,6 @@ class NaNValueInModelOutput(Exception):
  super().__init__(self.message)


- class FlashAttentionNotInstalled(Exception):
- """The `flash-attn` package has not been installed."""
-
- def __init__(
- self,
- message: str = (
- "The model you are trying to load requires Flash Attention. To use Flash "
- "Attention, please install the `flash-attn` package, which can be done by "
- "running `pip install -U wheel && FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE "
- "pip install flash-attn --no-build-isolation`."
- ),
- ) -> None:
- """Initialise the exception.
-
- Args:
- message:
- The message to display.
- """
- self.message = message
- super().__init__(self.message)
-
-
  class NeedsExtraInstalled(InvalidModel):
  """The evaluation requires extra to be installed."""

@@ -263,7 +263,6 @@ class HumanEvaluator:
  force=False,
  verbose=False,
  trust_remote_code=False,
- use_flash_attention=None,
  clear_model_cache=False,
  evaluate_test_split=False,
  few_shot=True,
@@ -80,7 +80,6 @@ def benchmark_config(
  device=device,
  verbose=False,
  trust_remote_code=True,
- use_flash_attention=False,
  clear_model_cache=False,
  evaluate_test_split=False,
  few_shot=True,