crfm-helm 0.5.2__py3-none-any.whl → 0.5.4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of crfm-helm might be problematic. Click here for more details.

Files changed (209) hide show
  1. {crfm_helm-0.5.2.dist-info → crfm_helm-0.5.4.dist-info}/METADATA +81 -112
  2. {crfm_helm-0.5.2.dist-info → crfm_helm-0.5.4.dist-info}/RECORD +165 -155
  3. {crfm_helm-0.5.2.dist-info → crfm_helm-0.5.4.dist-info}/WHEEL +1 -1
  4. helm/benchmark/adaptation/adapters/multiple_choice_joint_adapter.py +12 -5
  5. helm/benchmark/adaptation/adapters/test_generation_adapter.py +12 -12
  6. helm/benchmark/adaptation/adapters/test_language_modeling_adapter.py +8 -8
  7. helm/benchmark/adaptation/adapters/test_multiple_choice_joint_adapter.py +77 -9
  8. helm/benchmark/adaptation/common_adapter_specs.py +2 -0
  9. helm/benchmark/annotation/anthropic_red_team_annotator.py +57 -0
  10. helm/benchmark/annotation/call_center_annotator.py +258 -0
  11. helm/benchmark/annotation/financebench_annotator.py +79 -0
  12. helm/benchmark/annotation/harm_bench_annotator.py +55 -0
  13. helm/benchmark/annotation/{image2structure → image2struct}/latex_compiler_annotator.py +2 -2
  14. helm/benchmark/annotation/{image2structure → image2struct}/lilypond_compiler_annotator.py +5 -3
  15. helm/benchmark/annotation/{image2structure → image2struct}/webpage_compiler_annotator.py +5 -5
  16. helm/benchmark/annotation/live_qa_annotator.py +37 -45
  17. helm/benchmark/annotation/medication_qa_annotator.py +36 -44
  18. helm/benchmark/annotation/model_as_judge.py +96 -0
  19. helm/benchmark/annotation/simple_safety_tests_annotator.py +50 -0
  20. helm/benchmark/annotation/xstest_annotator.py +100 -0
  21. helm/benchmark/metrics/annotation_metrics.py +108 -0
  22. helm/benchmark/metrics/bhasa_metrics.py +188 -0
  23. helm/benchmark/metrics/bhasa_metrics_specs.py +10 -0
  24. helm/benchmark/metrics/code_metrics_helper.py +11 -1
  25. helm/benchmark/metrics/safety_metrics.py +79 -0
  26. helm/benchmark/metrics/summac/model_summac.py +3 -3
  27. helm/benchmark/metrics/tokens/test_ai21_token_cost_estimator.py +2 -2
  28. helm/benchmark/metrics/tokens/test_openai_token_cost_estimator.py +4 -4
  29. helm/benchmark/metrics/unitxt_metrics.py +17 -3
  30. helm/benchmark/metrics/vision_language/image_metrics.py +7 -3
  31. helm/benchmark/metrics/vision_language/image_utils.py +1 -1
  32. helm/benchmark/model_metadata_registry.py +3 -3
  33. helm/benchmark/presentation/create_plots.py +1 -1
  34. helm/benchmark/presentation/schema.py +3 -0
  35. helm/benchmark/presentation/summarize.py +106 -256
  36. helm/benchmark/presentation/test_run_entry.py +1 -0
  37. helm/benchmark/presentation/test_summarize.py +145 -3
  38. helm/benchmark/run.py +15 -0
  39. helm/benchmark/run_expander.py +83 -30
  40. helm/benchmark/run_specs/bhasa_run_specs.py +652 -0
  41. helm/benchmark/run_specs/call_center_run_specs.py +152 -0
  42. helm/benchmark/run_specs/decodingtrust_run_specs.py +8 -8
  43. helm/benchmark/run_specs/experimental_run_specs.py +52 -0
  44. helm/benchmark/run_specs/finance_run_specs.py +82 -1
  45. helm/benchmark/run_specs/safety_run_specs.py +154 -0
  46. helm/benchmark/run_specs/vlm_run_specs.py +100 -24
  47. helm/benchmark/scenarios/anthropic_red_team_scenario.py +71 -0
  48. helm/benchmark/scenarios/banking77_scenario.py +51 -0
  49. helm/benchmark/scenarios/bhasa_scenario.py +1942 -0
  50. helm/benchmark/scenarios/call_center_scenario.py +84 -0
  51. helm/benchmark/scenarios/decodingtrust_stereotype_bias_scenario.py +2 -1
  52. helm/benchmark/scenarios/ewok_scenario.py +116 -0
  53. helm/benchmark/scenarios/fin_qa_scenario.py +2 -0
  54. helm/benchmark/scenarios/financebench_scenario.py +53 -0
  55. helm/benchmark/scenarios/harm_bench_scenario.py +59 -0
  56. helm/benchmark/scenarios/raft_scenario.py +1 -1
  57. helm/benchmark/scenarios/scenario.py +1 -1
  58. helm/benchmark/scenarios/simple_safety_tests_scenario.py +33 -0
  59. helm/benchmark/scenarios/test_commonsense_scenario.py +21 -0
  60. helm/benchmark/scenarios/test_ewok_scenario.py +25 -0
  61. helm/benchmark/scenarios/test_financebench_scenario.py +26 -0
  62. helm/benchmark/scenarios/test_gsm_scenario.py +31 -0
  63. helm/benchmark/scenarios/test_legalbench_scenario.py +30 -0
  64. helm/benchmark/scenarios/test_math_scenario.py +2 -8
  65. helm/benchmark/scenarios/test_med_qa_scenario.py +30 -0
  66. helm/benchmark/scenarios/test_mmlu_scenario.py +33 -0
  67. helm/benchmark/scenarios/test_narrativeqa_scenario.py +73 -0
  68. helm/benchmark/scenarios/thai_exam_scenario.py +4 -4
  69. helm/benchmark/scenarios/vision_language/a_okvqa_scenario.py +1 -1
  70. helm/benchmark/scenarios/vision_language/bingo_scenario.py +2 -2
  71. helm/benchmark/scenarios/vision_language/crossmodal_3600_scenario.py +2 -1
  72. helm/benchmark/scenarios/vision_language/exams_v_scenario.py +104 -0
  73. helm/benchmark/scenarios/vision_language/fair_face_scenario.py +136 -0
  74. helm/benchmark/scenarios/vision_language/flickr30k_scenario.py +1 -1
  75. helm/benchmark/scenarios/vision_language/gqa_scenario.py +2 -2
  76. helm/benchmark/scenarios/vision_language/hateful_memes_scenario.py +1 -1
  77. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/chart2csv_scenario.py +1 -1
  78. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/latex_scenario.py +3 -3
  79. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/musicsheet_scenario.py +1 -1
  80. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/utils_latex.py +31 -39
  81. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/webpage/driver.py +1 -1
  82. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/webpage/utils.py +1 -1
  83. helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/webpage_scenario.py +41 -12
  84. helm/benchmark/scenarios/vision_language/math_vista_scenario.py +1 -1
  85. helm/benchmark/scenarios/vision_language/mementos_scenario.py +3 -3
  86. helm/benchmark/scenarios/vision_language/mm_safety_bench_scenario.py +2 -2
  87. helm/benchmark/scenarios/vision_language/mme_scenario.py +21 -18
  88. helm/benchmark/scenarios/vision_language/mmmu_scenario.py +1 -1
  89. helm/benchmark/scenarios/vision_language/pairs_scenario.py +1 -1
  90. helm/benchmark/scenarios/vision_language/pope_scenario.py +2 -1
  91. helm/benchmark/scenarios/vision_language/real_world_qa_scenario.py +57 -0
  92. helm/benchmark/scenarios/vision_language/seed_bench_scenario.py +7 -5
  93. helm/benchmark/scenarios/vision_language/unicorn_scenario.py +2 -2
  94. helm/benchmark/scenarios/vision_language/vibe_eval_scenario.py +6 -3
  95. helm/benchmark/scenarios/vision_language/viz_wiz_scenario.py +1 -1
  96. helm/benchmark/scenarios/vision_language/vqa_scenario.py +3 -1
  97. helm/benchmark/scenarios/xstest_scenario.py +35 -0
  98. helm/benchmark/server.py +1 -6
  99. helm/benchmark/static/schema_air_bench.yaml +750 -750
  100. helm/benchmark/static/schema_bhasa.yaml +709 -0
  101. helm/benchmark/static/schema_call_center.yaml +232 -0
  102. helm/benchmark/static/schema_cleva.yaml +768 -0
  103. helm/benchmark/static/schema_decodingtrust.yaml +444 -0
  104. helm/benchmark/static/schema_ewok.yaml +367 -0
  105. helm/benchmark/static/schema_finance.yaml +55 -9
  106. helm/benchmark/static/{schema_image2structure.yaml → schema_image2struct.yaml} +231 -90
  107. helm/benchmark/static/schema_legal.yaml +566 -0
  108. helm/benchmark/static/schema_safety.yaml +266 -0
  109. helm/benchmark/static/schema_tables.yaml +149 -8
  110. helm/benchmark/static/schema_thai.yaml +21 -0
  111. helm/benchmark/static/schema_vhelm.yaml +137 -101
  112. helm/benchmark/static_build/assets/accenture-6f97eeda.png +0 -0
  113. helm/benchmark/static_build/assets/aisingapore-6dfc9acf.png +0 -0
  114. helm/benchmark/static_build/assets/cresta-9e22b983.png +0 -0
  115. helm/benchmark/static_build/assets/cuhk-8c5631e9.png +0 -0
  116. helm/benchmark/static_build/assets/index-05c76bb1.css +1 -0
  117. helm/benchmark/static_build/assets/index-3ee38b3d.js +10 -0
  118. helm/benchmark/static_build/assets/scb10x-204bd786.png +0 -0
  119. helm/benchmark/static_build/assets/vhelm-aspects-1437d673.png +0 -0
  120. helm/benchmark/static_build/assets/vhelm-framework-a1ca3f3f.png +0 -0
  121. helm/benchmark/static_build/assets/vhelm-model-8afb7616.png +0 -0
  122. helm/benchmark/static_build/assets/wellsfargo-a86a6c4a.png +0 -0
  123. helm/benchmark/static_build/index.html +2 -2
  124. helm/benchmark/window_services/test_openai_window_service.py +8 -8
  125. helm/benchmark/window_services/tokenizer_service.py +0 -5
  126. helm/clients/ai21_client.py +71 -1
  127. helm/clients/anthropic_client.py +7 -19
  128. helm/clients/huggingface_client.py +38 -37
  129. helm/clients/nvidia_nim_client.py +35 -0
  130. helm/clients/openai_client.py +18 -4
  131. helm/clients/palmyra_client.py +24 -0
  132. helm/clients/perspective_api_client.py +11 -6
  133. helm/clients/test_client.py +4 -6
  134. helm/clients/together_client.py +22 -0
  135. helm/clients/vision_language/open_flamingo_client.py +1 -2
  136. helm/clients/vision_language/palmyra_vision_client.py +28 -13
  137. helm/common/cache.py +8 -30
  138. helm/common/images_utils.py +6 -0
  139. helm/common/key_value_store.py +9 -9
  140. helm/common/mongo_key_value_store.py +5 -4
  141. helm/common/request.py +16 -0
  142. helm/common/test_cache.py +1 -48
  143. helm/common/tokenization_request.py +0 -9
  144. helm/config/model_deployments.yaml +444 -329
  145. helm/config/model_metadata.yaml +513 -111
  146. helm/config/tokenizer_configs.yaml +140 -11
  147. helm/proxy/example_queries.py +14 -21
  148. helm/proxy/server.py +0 -9
  149. helm/proxy/services/remote_service.py +0 -6
  150. helm/proxy/services/server_service.py +6 -20
  151. helm/proxy/services/service.py +0 -6
  152. helm/proxy/token_counters/test_auto_token_counter.py +2 -2
  153. helm/tokenizers/ai21_tokenizer.py +51 -59
  154. helm/tokenizers/cohere_tokenizer.py +0 -75
  155. helm/tokenizers/huggingface_tokenizer.py +0 -1
  156. helm/tokenizers/test_ai21_tokenizer.py +48 -0
  157. helm/benchmark/data_overlap/data_overlap_spec.py +0 -86
  158. helm/benchmark/data_overlap/export_scenario_text.py +0 -119
  159. helm/benchmark/data_overlap/light_scenario.py +0 -60
  160. helm/benchmark/scenarios/vision_language/image2structure/webpage/__init__.py +0 -0
  161. helm/benchmark/static/benchmarking.css +0 -156
  162. helm/benchmark/static/benchmarking.js +0 -1705
  163. helm/benchmark/static/config.js +0 -3
  164. helm/benchmark/static/general.js +0 -122
  165. helm/benchmark/static/images/crfm-logo.png +0 -0
  166. helm/benchmark/static/images/helm-logo-simple.png +0 -0
  167. helm/benchmark/static/images/helm-logo.png +0 -0
  168. helm/benchmark/static/images/language-model-helm.png +0 -0
  169. helm/benchmark/static/images/organizations/ai21.png +0 -0
  170. helm/benchmark/static/images/organizations/anthropic.png +0 -0
  171. helm/benchmark/static/images/organizations/bigscience.png +0 -0
  172. helm/benchmark/static/images/organizations/cohere.png +0 -0
  173. helm/benchmark/static/images/organizations/eleutherai.png +0 -0
  174. helm/benchmark/static/images/organizations/google.png +0 -0
  175. helm/benchmark/static/images/organizations/meta.png +0 -0
  176. helm/benchmark/static/images/organizations/microsoft.png +0 -0
  177. helm/benchmark/static/images/organizations/nvidia.png +0 -0
  178. helm/benchmark/static/images/organizations/openai.png +0 -0
  179. helm/benchmark/static/images/organizations/together.png +0 -0
  180. helm/benchmark/static/images/organizations/tsinghua-keg.png +0 -0
  181. helm/benchmark/static/images/organizations/yandex.png +0 -0
  182. helm/benchmark/static/images/scenarios-by-metrics.png +0 -0
  183. helm/benchmark/static/images/taxonomy-scenarios.png +0 -0
  184. helm/benchmark/static/index.html +0 -68
  185. helm/benchmark/static/info-icon.png +0 -0
  186. helm/benchmark/static/json-urls.js +0 -69
  187. helm/benchmark/static/plot-captions.js +0 -27
  188. helm/benchmark/static/utils.js +0 -285
  189. helm/benchmark/static_build/assets/index-30dbceba.js +0 -10
  190. helm/benchmark/static_build/assets/index-66b02d40.css +0 -1
  191. helm/benchmark/static_build/assets/vhelm-framework-cde7618a.png +0 -0
  192. helm/benchmark/static_build/assets/vhelm-model-6d812526.png +0 -0
  193. helm/benchmark/window_services/ai21_window_service.py +0 -247
  194. helm/benchmark/window_services/cohere_window_service.py +0 -101
  195. helm/benchmark/window_services/test_ai21_window_service.py +0 -163
  196. helm/benchmark/window_services/test_cohere_window_service.py +0 -75
  197. helm/benchmark/window_services/test_cohere_window_service_utils.py +0 -8328
  198. helm/benchmark/window_services/test_ice_window_service.py +0 -327
  199. helm/tokenizers/ice_tokenizer.py +0 -30
  200. helm/tokenizers/test_ice_tokenizer.py +0 -57
  201. {crfm_helm-0.5.2.dist-info → crfm_helm-0.5.4.dist-info}/LICENSE +0 -0
  202. {crfm_helm-0.5.2.dist-info → crfm_helm-0.5.4.dist-info}/entry_points.txt +0 -0
  203. {crfm_helm-0.5.2.dist-info → crfm_helm-0.5.4.dist-info}/top_level.txt +0 -0
  204. /helm/benchmark/annotation/{image2structure → image2struct}/__init__.py +0 -0
  205. /helm/benchmark/annotation/{image2structure → image2struct}/image_compiler_annotator.py +0 -0
  206. /helm/benchmark/{data_overlap → scenarios/vision_language/image2struct}/__init__.py +0 -0
  207. /helm/benchmark/scenarios/vision_language/{image2structure/image2structure_scenario.py → image2struct/image2struct_scenario.py} +0 -0
  208. /helm/benchmark/scenarios/vision_language/{image2structure → image2struct/webpage}/__init__.py +0 -0
  209. /helm/benchmark/scenarios/vision_language/{image2structure → image2struct}/webpage/jekyll_server.py +0 -0
@@ -0,0 +1,73 @@
1
import pytest

from tempfile import TemporaryDirectory

from helm.benchmark.scenarios.narrativeqa_scenario import NarrativeQAScenario
from helm.benchmark.scenarios.scenario import CORRECT_TAG, Output, Reference


@pytest.mark.scenarios
def test_narrativeqa_scenario():
    """End-to-end smoke test for NarrativeQAScenario.

    Downloads the NarrativeQA data into a temporary directory and checks the
    instance count, the exact text of the first instance (summary + question),
    its references, and its split. Marked with the ``scenarios`` pytest marker
    because it performs a slow network download.

    NOTE(review): the expected text below must match the dataset bytes exactly,
    so dataset-side typos (e.g. "is able is convince", "journey's") are kept
    verbatim. Only the mojibake "Curaรงao" (a TIS-620 misdecode of the UTF-8
    "Curaçao") is restored to the intended character.
    """
    scenario = NarrativeQAScenario()
    with TemporaryDirectory() as tmpdir:
        instances = scenario.get_instances(tmpdir)
        assert len(instances) == 1572
        assert (
            instances[0].input.text
            == "At Madeline Hall, an old mansion-house near Southampton belonging to the wealthy de Versely family, lives"
            " an elderly spinster Miss Delmar, the aunt of the earl de Versely and Captain Delmar. Miss Delmar invites"
            " Arabella Mason, the daughter of a deceased, well-liked steward to stay with her as a lower-class guest in"
            " the house. Captain Delmar is known to visit his aunt at Madeline Hall frequently, accompanied by his"
            " valet Ben Keene, who is also a private marine. Captain Delmar eventually suggests that Ben should propose"
            " to Arabella, and the two marry in secret, to the frustration of Miss Delmar and Arabella's mother. The"
            " captain is able to smooth over the situation with his aunt, even after it is discovered that Arabella was"
            " six months pregnant at the time of the marriage. She later gives birth to a boy, who takes the Captain's"
            " Christian name and Ben's surname--the titular Percival Keene.\nThe family moves to Chatham, after Ben is"
            " ordered back with his detachment. Arabella opens up a successful shop and circulating library below her"
            " house, enlisting the help of her mother and sister, Amelia. Percival becomes well known in town from his"
            " mischievous pranks on officers and other strangers, often encouraged by his aunt Amelia. However,"
            " Percival's mother and grandmother are less fond of his disregard for manners, and insist on sending him"
            " to school after an episode in which he bites his grandmother. Percival reports to the school house of Mr."
            " O'Gallagher, a poor Irish scholar, who rules his class with a system of severe corporal punishment. Mr."
            " O'Gallagher routinely bullies Percival by stealing his lunch, leading Percival to seek revenge by"
            " poisoning his sandwiches with calomel. On Guy Fawkes Day the schoolteacher confiscates all the"
            " schoolboys' fireworks, for which Percival retaliates by setting off the collected fireworks while the"
            " teacher sits above them, leading to the total destruction of the schoolhouse and near death of the"
            " schoolmaster.\nWhen Percival is a young teenager, Captain Delmar reappears and offers him a position"
            " aboard his new navy ship, the H.M. Calliope. While preparing to enter service, Percival overhears gossip"
            " of his illegitimate birth, introducing the idea that Captain Delmar may be his father. He confronts his"
            " mother about his parentage, which she at first harshly denies but later tearfully explains the truth of"
            " her affair. Early in his service in the navy, Percival is captured during a pirate raid along with"
            " others. The pirate crew is entirely black, and the captain explains that they are primarily escaped"
            " slaves from the Americas. Percival is taken in as a cabin boy, and later dyes his skin tan in the"
            " appearance of a mulatto to please the captain who doesn't approve of white skin. The pirates often seek"
            " to take over slave trading vessels, killing every white person on board. During the taking of one such"
            " vessel, Percival is able is convince the captain to spare the lives of a wealthy Dutch merchant and his"
            " young daughter, Minnie. Eventually the H.M. Calliope takes the pirate ship, and Percival--unrecognizable"
            " with his dyed skin--is taken as a prisoner, later to convince his fellow shipman of his true"
            " identity.\nAfter his reappearance aboard the ship, Percival gains esteem among the crew and is welcomed"
            " back by the emotional Captain Delmar. His reputation continues to grow over the course of his service in"
            " conflicts with Dutch and French vessels around the island of Curacao. He also stands in for an ill"
            " Captain Delmar in a duel with a French officer, effectively saving the captain's life. At this point, the"
            " captain receives news that his older brother has died, making him the new Lord de Versely, and before"
            " returning to England he grants Perceval command of his own schooner. After another intense but successful"
            " battle with a French war ship, Percival is promoted to captain. During his service in the Navy, Percival"
            " still partakes in the merry pranks of his youth, and at one point teams up with a mulatto hotel owner in"
            " Curaçao to convince his fellow officers they've been poisoned. He also keeps correspondence with Minnie,"
            " developing a romance with the beautiful heiress.\nNear the end of the story, Percival guides his crew"
            " through a terrible storm in which many of the crew are killed and the ship is heavily damaged. After"
            " being saved by another English vessel, he receives a letter informing him of Lord de Versely's sudden"
            " death from heart complications and learns that he has been left all of his personal property. Percival is"
            " still disappointed that he can not take his father's name. He later journey's with his friend Bob Cross"
            " to Hamburg to reunite with Minnie, but is captured by French troops on the road and sentenced to"
            " execution for spying. During a skirmish between the French and the Cossacks, Percival and Cross are able"
            " to escape and continue on the road. At the end of the novel, Percival proposes to Minnie, and stands to"
            " inherit a great fortune through her father. He also receives a letter from the de Versely attorney"
            " letting him know he has been granted the arms and name of Delmar.\nQuestion: Who did Percival reunited"
            " with?"
        )

        # Both capitalizations are accepted as correct answers.
        assert instances[0].references == [
            Reference(output=Output(text="Minnie"), tags=[CORRECT_TAG]),
            Reference(output=Output(text="minnie"), tags=[CORRECT_TAG]),
        ]
        assert instances[0].split == "train"
@@ -86,9 +86,9 @@ class ThaiExamScenario(Scenario):
86
86
  super().__init__()
87
87
  self.exam = exam
88
88
 
89
- def download_thai_exam(self, path: str):
89
+ def download_thai_exam(self, path: str, revision: str):
90
90
  ensure_file_downloaded(
91
- "https://storage.googleapis.com/thai_dataset/thai_exam.tar.gz",
91
+ f"https://huggingface.co/datasets/scb10x/thai_exam/resolve/{revision}/thai_exam.tar.gz",
92
92
  target_path=path,
93
93
  unpack=True,
94
94
  )
@@ -118,8 +118,8 @@ class ThaiExamScenario(Scenario):
118
118
 
119
119
  def get_instances(self, output_path) -> List[Instance]:
120
120
  data_path: str = os.path.join(output_path, "data")
121
- self.download_thai_exam(data_path)
122
-
121
+ # ThaiExam (v1.0) revision = d78aef04ea3cc5095545e6951cb39e17c64e26a1
122
+ self.download_thai_exam(data_path, revision="d78aef04ea3cc5095545e6951cb39e17c64e26a1")
123
123
  instances: List[Instance] = []
124
124
  splits: Dict[str, str] = {
125
125
  "train": TRAIN_SPLIT,
@@ -42,7 +42,7 @@ class AOKVQAScenario(Scenario):
42
42
  name = "a_okvqa"
43
43
  description = (
44
44
  "A crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of "
45
- "commonsense and world knowledge to answer ([paper](https://arxiv.org/abs/2206.01718))."
45
+ "commonsense and world knowledge to answer ([Schwenk et al., 2022](https://arxiv.org/abs/2206.01718))."
46
46
  )
47
47
  tags = ["vision-language", "knowledge", "reasoning"]
48
48
 
@@ -51,8 +51,8 @@ class BingoScenario(Scenario):
51
51
 
52
52
  name = "bingo"
53
53
  description = (
54
- "Evaluate multimodal models on biased and inference-challenging scenarios with five subjects"
55
- " ([paper](https://arxiv.org/abs/2311.03287))."
54
+ "Evaluate multimodal models on biased and inference-challenging scenarios with five subjects "
55
+ "([Cui et al., 2023](https://arxiv.org/abs/2311.03287))."
56
56
  )
57
57
  tags = ["vision-language"]
58
58
 
@@ -75,7 +75,8 @@ class Crossmodal3600Scenario(Scenario):
75
75
  name = "crossmodal_3600"
76
76
  description = (
77
77
  "Crossmodal-3600 dataset (XM3600 in short), a geographically-diverse set of 3600 images annotated "
78
- "with human-generated reference captions in 36 languages. ([paper](https://arxiv.org/abs/2205.12522))."
78
+ "with human-generated reference captions in 36 languages. "
79
+ "([Thapliyal et al., 2022)](https://arxiv.org/abs/2205.12522))."
79
80
  )
80
81
  tags = ["vision-language", "multilinguality"]
81
82
 
@@ -0,0 +1,104 @@
1
+ from typing import List, Set
2
+ import os
3
+
4
+ from datasets import load_dataset
5
+ from tqdm import tqdm
6
+
7
+ from helm.benchmark.scenarios.scenario import (
8
+ CORRECT_TAG,
9
+ TEST_SPLIT,
10
+ TRAIN_SPLIT,
11
+ Instance,
12
+ Input,
13
+ Output,
14
+ Reference,
15
+ Scenario,
16
+ )
17
+ from helm.common.media_object import MediaObject, MultimediaObject
18
+ from helm.common.images_utils import generate_hash
19
+
20
+
21
class ExamsVScenario(Scenario):
    """
    EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

    A challenging multi-discipline multimodal multilingual exam benchmark for evaluating vision language models.
    It consists of 20,932 multiple-choice questions across 20 school disciplines covering natural science,
    social science, and other miscellaneous studies, e.g., religion, fine arts, business, etc.

    Paper: https://arxiv.org/abs/2403.10378
    Website: https://huggingface.co/datasets/Rocktim/EXAMS-V
    """

    HUGGINGFACE_DATASET_NAME: str = "Rocktim/EXAMS-V"

    # Language labels exactly as they appear in the dataset's "language" column.
    # NOTE(review): "Croation" is presumably the dataset's own (misspelled) label;
    # do not "correct" it here or no rows would match — verify against the dataset.
    VALID_LANGUAGES: Set[str] = {
        "Chinese",
        "Croation",
        "Italian",
        "Hungarian",
        "Arabic",
        "Serbian",
        "Bulgarian",
        "English",
        "German",
        "French",
        "Spanish",
        "Polish",
    }
    VALID_SUBJECT_GROUP: Set[str] = {
        "Natural Science",
        "Social Sciences",
        "Other",
    }
    VALID_TYPES: Set[str] = {"text", "image_text"}

    name = "exams_v"
    description = (
        "Multimodal and Multilingual benchmark to evaluate vision-language models across 20 school disciplines "
        "covering natural science, social science, and other miscellaneous studies "
        "([Das et al., 2024](https://arxiv.org/abs/2403.10378))."
    )
    tags = ["vision-language", "knowledge", "reasoning", "multilingual"]

    def __init__(self, language: str, subject_grouped: str, type: str) -> None:
        """
        Args:
            language: Dataset language label (one of `VALID_LANGUAGES`).
            subject_grouped: Subject group with underscores standing in for spaces
                (e.g. "Natural_Science"), validated against `VALID_SUBJECT_GROUP`.
            type: Question type, either "text" or "image_text".
        """
        super().__init__()

        # Run-spec arguments cannot contain spaces, so they are passed with "_" instead.
        subject_grouped = subject_grouped.replace("_", " ")
        assert subject_grouped in self.VALID_SUBJECT_GROUP, f"Invalid subject_grouped: {subject_grouped}"
        assert type in self.VALID_TYPES, f"Invalid type: {type}"
        assert language in self.VALID_LANGUAGES, f"Invalid language: {language}"

        self._language: str = language
        self._subject_grouped: str = subject_grouped
        self._type: str = type

    def get_instances(self, output_path: str) -> List[Instance]:
        """Download EXAMS-V and return instances matching the configured language, subject and type."""
        instances: List[Instance] = []

        for split in [TRAIN_SPLIT, TEST_SPLIT]:
            for row in tqdm(load_dataset(self.HUGGINGFACE_DATASET_NAME, split=split, cache_dir=output_path)):
                language: str = row["language"]
                subject_grouped: str = row["subject_grouped"]
                question_type: str = row["type"]  # renamed locally to avoid shadowing the builtin `type`

                # Exclude examples that do not match the specified language, subject, and type
                if language != self._language or subject_grouped != self._subject_grouped or question_type != self._type:
                    continue

                # Save the image to disk so it can be referenced by path in the multimodal prompt.
                # The filename is a content hash, so identical images are written only once.
                image = row["image"]
                image_file_name: str = generate_hash(image) + ".jpg"
                local_image_path: str = os.path.join(output_path, image_file_name)
                if not os.path.exists(local_image_path):
                    image.convert("RGB").save(local_image_path)

                content: List[MediaObject] = [
                    MediaObject(location=local_image_path, content_type="image/jpeg"),
                ]
                references: List[Reference] = [Reference(output=Output(text=row["answer_key"]), tags=[CORRECT_TAG])]
                instances.append(
                    Instance(Input(multimedia_content=MultimediaObject(content)), references=references, split=split)
                )

        return instances
@@ -0,0 +1,136 @@
1
+ from typing import List
2
+ import os
3
+
4
+ from datasets import load_dataset
5
+ from tqdm import tqdm
6
+
7
+ from helm.benchmark.scenarios.scenario import (
8
+ CORRECT_TAG,
9
+ VALID_SPLIT,
10
+ TRAIN_SPLIT,
11
+ Instance,
12
+ Input,
13
+ Output,
14
+ Reference,
15
+ Scenario,
16
+ )
17
+ from helm.common.media_object import MediaObject, MultimediaObject
18
+ from helm.common.images_utils import generate_hash
19
+
20
+
21
class FairFaceScenario(Scenario):
    """
    Identify the race, gender or age of a photo of a person.

    Citation: Kimmo Kärkkäinen and Jungseock Joo (2019),
    "FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age",
    arXiv:1908.04913.

    Paper: https://arxiv.org/abs/1908.04913
    Website: https://huggingface.co/datasets/HuggingFaceM4/FairFace
    """

    HUGGINGFACE_DATASET_NAME: str = "HuggingFaceM4/FairFace"

    RACE_ATTRIBUTE: str = "race"
    GENDER_ATTRIBUTE: str = "gender"
    AGE_ATTRIBUTE: str = "age"

    QUESTION_TEMPLATE: str = "Identify the {attribute} of the person in the image."

    RACE_SUBGROUPS: List[str] = [
        "East Asian",
        "Indian",
        "Black",
        "White",
        "Middle Eastern",
        "Latino Hispanic",
        "Southeast Asian",
    ]
    GENDER_SUBGROUPS: List[str] = ["Male", "Female"]
    AGE_SUBGROUPS: List[str] = [
        "0-2 years",
        "3-9 years",
        "10-19 years",
        "20-29 years",
        "30-39 years",
        "40-49 years",
        "50-59 years",
        "60-69 years",
        "Over 70 years",
    ]

    name = "fair_face"
    description = (
        "Identify the race, gender or age of a photo of a person "
        "([Karkkainen et al., 2019](https://arxiv.org/abs/1908.04913))."
    )
    tags = ["vision-language", "fairness"]

    def __init__(self, attribute: str, subgroup: str) -> None:
        """
        Args:
            attribute: Which label to predict ("race", "gender" or "age").
            subgroup: The subgroup to keep, with underscores standing in for
                spaces (e.g. "East_Asian"); must be valid for `attribute`.
        """
        super().__init__()

        # Each attribute has its own fixed list of label choices.
        subgroups_by_attribute = {
            self.RACE_ATTRIBUTE: self.RACE_SUBGROUPS,
            self.GENDER_ATTRIBUTE: self.GENDER_SUBGROUPS,
            self.AGE_ATTRIBUTE: self.AGE_SUBGROUPS,
        }
        if attribute not in subgroups_by_attribute:
            raise ValueError(f"Invalid attribute: {attribute}")
        subgroups: List[str] = subgroups_by_attribute[attribute]

        # Run-spec arguments use "_" in place of spaces.
        subgroup = subgroup.replace("_", " ")
        assert subgroup in subgroups, f"Invalid subgroup for {attribute} attribute: {subgroup}"
        self._subgroup_choices: List[str] = subgroups
        self._correct_subgroup_index: int = subgroups.index(subgroup)

        self._attribute: str = attribute  # Column name used to read the label from each row
        self._question: str = self.QUESTION_TEMPLATE.format(attribute=attribute)  # Prompt text for the model

    def get_instances(self, output_path: str) -> List[Instance]:
        """Download FairFace and return instances for the configured attribute subgroup."""
        instances: List[Instance] = []
        for split in [TRAIN_SPLIT, VALID_SPLIT]:
            # The Hugging Face split is named "validation", not VALID_SPLIT.
            dataset_split_name: str = "validation" if split == VALID_SPLIT else split
            dataset = load_dataset(
                self.HUGGINGFACE_DATASET_NAME,
                "1.25",
                split=dataset_split_name,
                cache_dir=output_path,
            )
            for example in tqdm(dataset):
                # Keep only rows whose label index matches the requested subgroup
                if example[self._attribute] != self._correct_subgroup_index:
                    continue

                # Persist the image so the prompt can reference it by path;
                # the content-hash filename deduplicates identical images.
                image = example["image"]
                image_file_name: str = generate_hash(image) + ".jpg"
                local_image_path: str = os.path.join(output_path, image_file_name)
                if not os.path.exists(local_image_path):
                    image.save(local_image_path)

                content: List[MediaObject] = [
                    MediaObject(location=local_image_path, content_type="image/jpeg"),
                    MediaObject(text=self._question, content_type="text/plain"),
                ]
                # One reference per possible subgroup; only the configured one is tagged correct.
                references: List[Reference] = []
                for index, choice in enumerate(self._subgroup_choices):
                    choice_tags = [CORRECT_TAG] if index == self._correct_subgroup_index else []
                    references.append(Reference(output=Output(text=choice), tags=choice_tags))
                instances.append(
                    Instance(Input(multimedia_content=MultimediaObject(content)), references=references, split=split)
                )

        return instances
@@ -41,7 +41,7 @@ class Flickr30KScenario(Scenario):
41
41
  name = "flickr30k"
42
42
  description = (
43
43
  "An image caption corpus consisting of 158,915 crowd-sourced captions describing 31,783 Flickr "
44
- "images ([paper](https://shannon.cs.illinois.edu/DenotationGraph/TACLDenotationGraph.pdf))."
44
+ "images ([Young et al., 2014](https://shannon.cs.illinois.edu/DenotationGraph/TACLDenotationGraph.pdf))."
45
45
  )
46
46
  tags = ["vision-language"]
47
47
 
@@ -31,7 +31,7 @@ class GQAScenario(Scenario):
31
31
  }
32
32
 
33
33
  Paper: https://arxiv.org/abs/1902.09506
34
- Website: https://github.com/stanford-crfm/helm/issues/1951
34
+ Website: https://cs.stanford.edu/people/dorarad/gqa/about.html
35
35
  """
36
36
 
37
37
  QUESTIONS_URL: str = "https://downloads.cs.stanford.edu/nlp/data/gqa/questions1.2.zip"
@@ -40,7 +40,7 @@ class GQAScenario(Scenario):
40
40
  name = "gqa"
41
41
  description = (
42
42
  "Questions about real-world visual reasoning and compositional QA "
43
- "([paper](https://arxiv.org/abs/1902.09506))."
43
+ "([Hudson and Manning, 2019](https://arxiv.org/abs/1902.09506))."
44
44
  )
45
45
  tags = ["vision-language", "reasoning"]
46
46
 
@@ -49,7 +49,7 @@ class HatefulMemesScenario(Scenario):
49
49
  QUESTION: str = "Is this meme mean?"
50
50
 
51
51
  name = "hateful_memes"
52
- description = "Identify if a meme is hateful or not ([paper](https://arxiv.org/abs/2005.04790))."
52
+ description = "Identify if a meme is hateful or not ([Kiela et al., 2020](https://arxiv.org/abs/2005.04790))."
53
53
  tags = ["vision-language"]
54
54
 
55
55
  def get_instances(self, output_path: str) -> List[Instance]:
@@ -20,7 +20,7 @@ class Chart2CSVScenario(Scenario):
20
20
 
21
21
  name = "chart2csv"
22
22
  description = "Convert a chart to CSV."
23
- tags = ["vision-language", "image2structure"]
23
+ tags = ["vision-language", "image2struct"]
24
24
 
25
25
  def get_instances(self, output_path: str) -> List[Instance]:
26
26
  assert os.path.exists(output_path), f"Dataset does not exist at {output_path}"
@@ -1,14 +1,14 @@
1
- from helm.benchmark.scenarios.vision_language.image2structure.utils_latex import (
1
+ from helm.benchmark.scenarios.vision_language.image2struct.utils_latex import (
2
2
  latex_to_image,
3
3
  strip_unnecessary_latex_parts,
4
4
  )
5
- from helm.benchmark.scenarios.vision_language.image2structure.image2structure_scenario import Image2StructureScenario
5
+ from helm.benchmark.scenarios.vision_language.image2struct.image2struct_scenario import Image2StructureScenario
6
6
 
7
7
 
8
8
  class LatexScenario(Image2StructureScenario):
9
9
  BASE_PROMPT = "Please provide the LaTeX code used to generate this image. Only generate the code relevant to what you see. Your code will be surrounded by all the imports necessary as well as the begin and end document delimiters." # noqa: E501
10
10
  HUGGINGFACE_DATASET_NAME = "stanford-crfm/i2s-latex"
11
- SUBSETS = ["equation", "table", "plot", "algorithm", "real"]
11
+ SUBSETS = ["equation", "table", "plot", "algorithm", "wild", "wild_legacy"]
12
12
 
13
13
  name = "image2latex"
14
14
  description = "Evaluate multimodal models on Latex generation to recreate a provided image"
@@ -1,4 +1,4 @@
1
- from helm.benchmark.scenarios.vision_language.image2structure.image2structure_scenario import Image2StructureScenario
1
+ from helm.benchmark.scenarios.vision_language.image2struct.image2struct_scenario import Image2StructureScenario
2
2
 
3
3
 
4
4
  class MusicSheetScenario(Image2StructureScenario):
@@ -5,6 +5,7 @@ import os
5
5
  import re
6
6
 
7
7
  from helm.common.optional_dependencies import handle_module_not_found_error, OptionalDependencyNotInstalled
8
+ from helm.common.hierarchical_logger import hlog
8
9
 
9
10
  try:
10
11
  from latex import build_pdf
@@ -12,14 +13,13 @@ try:
12
13
  from PIL import ImageOps
13
14
  from PIL.Image import Image
14
15
  except ModuleNotFoundError as e:
15
- handle_module_not_found_error(e, suggestions=["image2structure"])
16
+ handle_module_not_found_error(e, suggestions=["image2struct"])
16
17
 
17
18
  # LaTeX preamble
18
19
  # Make sure to install "latex-full".
19
20
  TEX_INCLUDES = r"""
20
21
  \usepackage{amsmath,amssymb,amsfonts}
21
22
  \usepackage{graphicx}
22
- \usepackage{graphicx}
23
23
  \usepackage{amsmath}
24
24
  \usepackage{xcolor}
25
25
  \usepackage{algorithm}
@@ -98,23 +98,19 @@ def pdf_to_image(
98
98
 
99
99
  def strip_unnecessary_latex_parts(latex_code: str) -> str:
100
100
  """Strip unnecessary parts of the LaTeX code."""
101
-
102
101
  # Remove comments
103
102
  minimal_latex_code = re.sub(r"%.*?\n", "\n", latex_code)
104
-
105
103
  # Remove \documentclass and any \usepackage lines
106
- minimal_latex_code = re.sub(r"\\documentclass\{.*?\}\n", "", latex_code)
107
- minimal_latex_code = re.sub(r"\\usepackage(\[.*?\])?\{.*?\}\n", "", minimal_latex_code)
108
-
104
+ minimal_latex_code = re.sub(r"\\documentclass(\[.*?\])?\{.*?\}", "", latex_code)
105
+ minimal_latex_code = re.sub(r"\\documentstyle(\[.*?\])?\{.*?\}", "", minimal_latex_code)
106
+ minimal_latex_code = re.sub(r"\\usepackage(\[.*?\])?\{.*?\}", "", minimal_latex_code)
109
107
  # Remove everything before \begin{document} and including it, and everything after \end{document}
110
108
  minimal_latex_code = re.sub(r"\\begin\{document\}\n*", "", minimal_latex_code, flags=re.DOTALL)
111
109
  minimal_latex_code = re.sub(r"\\end\{document\}.*", "", minimal_latex_code, flags=re.DOTALL)
112
-
113
110
  # Ensure \begin{...} is followed by a \n
114
111
  minimal_latex_code = re.sub(r"(\\begin\{.*?\}(\[.*?\])?)(?!\n)", r"\1\n", minimal_latex_code)
115
112
  # Ensure \end{...} has a \n before it
116
113
  minimal_latex_code = re.sub(r"(\\end\{.*?\})(?!\n)", r"\1\n", minimal_latex_code)
117
-
118
114
  # Normalize space sequences to a single space globally
119
115
  minimal_latex_code = re.sub(r" +", " ", minimal_latex_code)
120
116
  # Replace tabs with a single space
@@ -123,7 +119,6 @@ def strip_unnecessary_latex_parts(latex_code: str) -> str:
123
119
  minimal_latex_code = re.sub(r"^[ \t]+|[ \t]+$", "", minimal_latex_code, flags=re.MULTILINE)
124
120
  # Remove unnecessary whitespace - multiple empty lines and tabulations
125
121
  minimal_latex_code = re.sub(r"\n\s*\n", "\n", minimal_latex_code)
126
-
127
122
  return minimal_latex_code.strip()
128
123
 
129
124
 
@@ -226,25 +221,21 @@ def handle_latex_error(
226
221
  # Error format: "LaTeX Error: Environment <env> undefined."
227
222
  undefined_search = re.search(r"LaTeX Error: Environment (.*) undefined", str_e)
228
223
  if undefined_search:
229
- # If a package is missing and this is our first retry, then simply include TEX_INCLUDES
230
- if num_try_remaining == MAX_NUM_TRIES:
231
- fixed_code = fixed_code.replace(TEX_BEGIN_FILE, TEX_BEGIN_FILE + "\n" + TEX_INCLUDES + "\n")
232
- if num_try_remaining < MAX_NUM_TRIES or fixed_code == original_latex_code:
233
- # Here we try to manually solve the missing environment.
234
- # This is either executed on the second rety or the first if no changements
235
- # were made in the first retry.
236
- assert TEX_INCLUDES in fixed_code, "TEX_INCLUDES should be present in the code"
237
- # TEX_INCLUDES is already present, so we add the missing package
238
- # Since we cannot know the name of the package that contains the missing environment,
239
- # we simply hope that they are named the same way.
240
- env_undefined: str = undefined_search.group(1)
241
-
242
- if f"\\usepackage{{{env_undefined}}}" in fixed_code:
243
- # We already tried to include the missing package, but it probably
244
- # does not exist, so we raise an error
245
- raise RuntimeError(str(e)) from e
246
-
247
- fixed_code = fixed_code.replace(TEX_BEGIN_FILE, TEX_BEGIN_FILE + f"\n\\usepackage{{{env_undefined}}}\n")
224
+ # Here we try to manually solve the missing environment.
225
+ # This is either executed on the second rety or the first if no changements
226
+ # were made in the first retry.
227
+ assert TEX_INCLUDES in fixed_code, f"TEX_INCLUDES should be present in the code. code={fixed_code}"
228
+ # TEX_INCLUDES is already present, so we add the missing package
229
+ # Since we cannot know the name of the package that contains the missing environment,
230
+ # we simply hope that they are named the same way.
231
+ env_undefined: str = undefined_search.group(1)
232
+
233
+ if f"\\usepackage{{{env_undefined}}}" in fixed_code:
234
+ # We already tried to include the missing package, but it probably
235
+ # does not exist, so we raise an error
236
+ raise RuntimeError(str(e)) from e
237
+
238
+ fixed_code = fixed_code.replace(TEX_BEGIN_FILE, TEX_BEGIN_FILE + f"\n\\usepackage{{{env_undefined}}}\n")
248
239
 
249
240
  # Try again with the fixed code (if the fixed code is different from the original code)
250
241
  if fixed_code != original_latex_code:
@@ -313,23 +304,24 @@ def latex_to_image(
313
304
 
314
305
  # 2. Add preamble
315
306
  # 2.1. Remove \documentclass if present to make sure we use our own
316
- documentclass_search = re.search(r"\\documentclass\{(.*)\}", original_latex_code)
307
+ documentclass_search = re.search(r"\\documentclass(\[.*?\])?\{.*?\}", original_latex_code)
308
+ documentstyle_search = re.search(r"\\documentstyle(\[.*?\])?\{.*?\}", original_latex_code)
317
309
  if documentclass_search:
318
- documentclass: str = documentclass_search.group(1)
319
- original_latex_code = original_latex_code.replace(f"\\documentclass{{{documentclass}}}", TEX_BEGIN_FILE)
310
+ matching_string = documentclass_search.group()
311
+ original_latex_code = original_latex_code.replace(matching_string, TEX_BEGIN_FILE)
312
+ elif documentstyle_search:
313
+ matching_string = documentstyle_search.group()
314
+ original_latex_code = original_latex_code.replace(matching_string, TEX_BEGIN_FILE)
320
315
  else:
321
316
  # If there is no \documentclass, we add our own
322
317
  original_latex_code = TEX_BEGIN_FILE + "\n\n" + original_latex_code
323
318
 
324
- # 2.2. Add includes. In this first step, we only add includes if none are present.
325
- # We do this because if some are present, we might define them twice which can cause errors
326
- # and this section should not make the original LaTeX code fail if it was compilable.
327
- # If there are missing packages, in handle_latex_error, we will add TEX_INCLUDES after the begin document,
328
- # which might define some packages twice, but often solves the problem.
329
- if not re.search(r"\\usepackage\{.*\}", original_latex_code):
330
- original_latex_code = original_latex_code.replace(TEX_BEGIN_FILE, TEX_BEGIN_FILE + "\n" + TEX_INCLUDES + "\n")
319
+ # 2.2. Add includes. In this ste we remove all includes for the default ones.
320
+ original_latex_code = re.sub(r"\\usepackage(\[.*?\])?\{.*\}", "", original_latex_code)
321
+ original_latex_code = original_latex_code.replace(TEX_BEGIN_FILE, TEX_BEGIN_FILE + "\n" + TEX_INCLUDES + "\n")
331
322
 
332
323
  latex_code: str = original_latex_code
324
+ hlog(f"Compiling LaTeX code:\n{latex_code}")
333
325
  try:
334
326
  pdf_stream = latex_to_pdf(latex_code, assets_path=assets_path)
335
327
  image = pdf_to_image(pdf_stream, crop=crop, resize_to=resize_to)
@@ -6,7 +6,7 @@ try:
6
6
  from selenium import webdriver
7
7
  import selenium.common.exceptions
8
8
  except ModuleNotFoundError as e:
9
- handle_module_not_found_error(e, suggestions=["image2structure"])
9
+ handle_module_not_found_error(e, suggestions=["image2struct"])
10
10
 
11
11
 
12
12
  def init_driver(url: str, resolution: Tuple[int, int] = (1920, 1080)) -> webdriver.Chrome:
@@ -5,7 +5,7 @@ from helm.common.optional_dependencies import handle_module_not_found_error
5
5
  try:
6
6
  from html2text import HTML2Text
7
7
  except ModuleNotFoundError as e:
8
- handle_module_not_found_error(e, suggestions=["image2structure"])
8
+ handle_module_not_found_error(e, suggestions=["image2struct"])
9
9
 
10
10
 
11
11
  def convert_html_to_text(handler: HTML2Text, html: str) -> str: