PyPI - opencompass - Versions diffs - 0.2.3__tar.gz → 0.2.4__tar.gz - Mend

opencompass 0.2.3tar.gz → 0.2.4tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (394)

{opencompass-0.2.3 → opencompass-0.2.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: opencompass
-Version: 0.2.3
+Version: 0.2.4
 Summary: A comprehensive toolkit for large model evaluation
 Home-page: https://github.com/open-compass/opencompass
 Author: OpenCompass Contributors
@@ -11,8 +11,13 @@ Description: <div align="center">
           <br />
           <br />
-        [![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/en)
-        [![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/open-compass/opencompass/blob/main/LICENSE)
+        [![][github-release-shield]][github-release-link]
+        [![][github-releasedate-shield]][github-releasedate-link]
+        [![][github-contributors-shield]][github-contributors-link]<br>
+        [![][github-forks-shield]][github-forks-link]
+        [![][github-stars-shield]][github-stars-link]
+        [![][github-issues-shield]][github-issues-link]
+        [![][github-license-shield]][github-license-link]
         <!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
@@ -25,12 +30,18 @@ Description: <div align="center">
         English | [简体中文](README_zh-CN.md)
+        [![][github-trending-shield]][github-trending-url]
         </div>
         <p align="center">
             👋 join us on <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=opencompass" target="_blank">WeChat</a>
         </p>
+        > \[!IMPORTANT\]
+        >
+        > **Star Us**, You will receive all release notifications from GitHub without any delay ~ ⭐️
         ## 📣 OpenCompass 2.0
         We are thrilled to introduce OpenCompass 2.0, an advanced suite featuring three key components: [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).
@@ -42,6 +53,14 @@ Description: <div align="center">
         **CompassKit** is a powerful collection of evaluation toolkits specifically tailored for Large Language Models and Large Vision-language Models. It provides an extensive set of tools to assess and measure the performance of these complex models effectively. Welcome to try our toolkits for in your research and products.
+        <details>
+          <summary><kbd>Star History</kbd></summary>
+          <picture>
+            <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&theme=dark&type=Date">
+            <img width="100%" src="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&type=Date">
+          </picture>
+        </details>
         ## 🧭	Welcome
         to **OpenCompass**!
@@ -59,12 +78,9 @@ Description: <div align="center">
         ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
-        - **\[2024.02.29\]** We supported the MT-Bench, AlpacalEval and AlignBench, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html) 🔥🔥🔥.
-        - **\[2024.01.30\]** We release OpenCompass 2.0. Click  [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information ! 🔥🔥🔥.
-        - **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py), InternLM2 showed extremely strong performance in these tests, welcome to try! 🔥🔥🔥.
-        - **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8) 🔥🔥🔥.
-        - **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development.
-        - **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details!
+        - **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) 和 [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
+        - **\[2024.02.29\]** We supported the MT-Bench, AlpacalEval and AlignBench, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)
+        - **\[2024.01.30\]** We release OpenCompass 2.0. Click  [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information !
         > [More](docs/en/notes/news.md)
@@ -447,6 +463,7 @@ Description: <div align="center">
         - [InternLM](https://github.com/InternLM/InternLM)
         - [LLaMA](https://github.com/facebookresearch/llama)
+        - [LLaMA3](https://github.com/meta-llama/llama3)
         - [Vicuna](https://github.com/lm-sys/FastChat)
         - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
         - [Baichuan](https://github.com/baichuan-inc)
@@ -505,6 +522,20 @@ Description: <div align="center">
         We appreciate all contributions to improving OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/en/latest/notes/contribution_guide.html) for the best practice.
+        <!-- Copy-paste in your Readme.md file -->
+        <!-- Made with [OSS Insight](https://ossinsight.io/) -->
+        <a href="https://github.com/open-compass/opencompass/graphs/contributors" target="_blank">
+          <table>
+            <tr>
+              <th colspan="2">
+                <br><img src="https://contrib.rocks/image?repo=open-compass/opencompass"><br><br>
+              </th>
+            </tr>
+          </table>
+        </a>
         ## 🤝 Acknowledgements
         Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).
@@ -524,6 +555,23 @@ Description: <div align="center">
         <p align="right"><a href="#top">🔝Back to top</a></p>
+        [github-contributors-link]: https://github.com/open-compass/opencompass/graphs/contributors
+        [github-contributors-shield]: https://img.shields.io/github/contributors/open-compass/opencompass?color=c4f042&labelColor=black&style=flat-square
+        [github-forks-link]: https://github.com/open-compass/opencompass/network/members
+        [github-forks-shield]: https://img.shields.io/github/forks/open-compass/opencompass?color=8ae8ff&labelColor=black&style=flat-square
+        [github-issues-link]: https://github.com/open-compass/opencompass/issues
+        [github-issues-shield]: https://img.shields.io/github/issues/open-compass/opencompass?color=ff80eb&labelColor=black&style=flat-square
+        [github-license-link]: https://github.com/open-compass/opencompass/blob/main/LICENSE
+        [github-license-shield]: https://img.shields.io/github/license/open-compass/opencompass?color=white&labelColor=black&style=flat-square
+        [github-release-link]: https://github.com/open-compass/opencompass/releases
+        [github-release-shield]: https://img.shields.io/github/v/release/open-compass/opencompass?color=369eff&labelColor=black&logo=github&style=flat-square
+        [github-releasedate-link]: https://github.com/open-compass/opencompass/releases
+        [github-releasedate-shield]: https://img.shields.io/github/release-date/open-compass/opencompass?labelColor=black&style=flat-square
+        [github-stars-link]: https://github.com/open-compass/opencompass/stargazers
+        [github-stars-shield]: https://img.shields.io/github/stars/open-compass/opencompass?color=ffcb47&labelColor=black&style=flat-square
+        [github-trending-shield]: https://trendshift.io/api/badge/repositories/6630
+        [github-trending-url]: https://trendshift.io/repositories/6630
 Keywords: AI,NLP,in-context learning,large language model,evaluation,benchmark,llm
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3.8

{opencompass-0.2.3 → opencompass-0.2.4}/README.md RENAMED

@@ -3,8 +3,13 @@
   <br />
   <br />
-[![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/en)
-[![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/open-compass/opencompass/blob/main/LICENSE)
+[![][github-release-shield]][github-release-link]
+[![][github-releasedate-shield]][github-releasedate-link]
+[![][github-contributors-shield]][github-contributors-link]<br>
+[![][github-forks-shield]][github-forks-link]
+[![][github-stars-shield]][github-stars-link]
+[![][github-issues-shield]][github-issues-link]
+[![][github-license-shield]][github-license-link]
 <!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
@@ -17,12 +22,18 @@
 English | [简体中文](README_zh-CN.md)
+[![][github-trending-shield]][github-trending-url]
 </div>
 <p align="center">
     👋 join us on <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=opencompass" target="_blank">WeChat</a>
 </p>
+> \[!IMPORTANT\]
+>
+> **Star Us**, You will receive all release notifications from GitHub without any delay ~ ⭐️
 ## 📣 OpenCompass 2.0
 We are thrilled to introduce OpenCompass 2.0, an advanced suite featuring three key components: [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).
@@ -34,6 +45,14 @@ We are thrilled to introduce OpenCompass 2.0, an advanced suite featuring three
 **CompassKit** is a powerful collection of evaluation toolkits specifically tailored for Large Language Models and Large Vision-language Models. It provides an extensive set of tools to assess and measure the performance of these complex models effectively. Welcome to try our toolkits for in your research and products.
+<details>
+  <summary><kbd>Star History</kbd></summary>
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&theme=dark&type=Date">
+    <img width="100%" src="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&type=Date">
+  </picture>
+</details>
 ## 🧭	Welcome
 to **OpenCompass**!
@@ -51,12 +70,9 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
-- **\[2024.02.29\]** We supported the MT-Bench, AlpacalEval and AlignBench, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html) 🔥🔥🔥.
-- **\[2024.01.30\]** We release OpenCompass 2.0. Click  [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information ! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py), InternLM2 showed extremely strong performance in these tests, welcome to try! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8) 🔥🔥🔥.
-- **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development.
-- **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details!
+- **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) 和 [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
+- **\[2024.02.29\]** We supported the MT-Bench, AlpacalEval and AlignBench, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)
+- **\[2024.01.30\]** We release OpenCompass 2.0. Click  [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information !
 > [More](docs/en/notes/news.md)
@@ -439,6 +455,7 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - [InternLM](https://github.com/InternLM/InternLM)
 - [LLaMA](https://github.com/facebookresearch/llama)
+- [LLaMA3](https://github.com/meta-llama/llama3)
 - [Vicuna](https://github.com/lm-sys/FastChat)
 - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
 - [Baichuan](https://github.com/baichuan-inc)
@@ -497,6 +514,20 @@ Through the command line or configuration files, OpenCompass also supports evalu
 We appreciate all contributions to improving OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/en/latest/notes/contribution_guide.html) for the best practice.
+<!-- Copy-paste in your Readme.md file -->
+<!-- Made with [OSS Insight](https://ossinsight.io/) -->
+<a href="https://github.com/open-compass/opencompass/graphs/contributors" target="_blank">
+  <table>
+    <tr>
+      <th colspan="2">
+        <br><img src="https://contrib.rocks/image?repo=open-compass/opencompass"><br><br>
+      </th>
+    </tr>
+  </table>
+</a>
 ## 🤝 Acknowledgements
 Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).
@@ -515,3 +546,20 @@ Some datasets and prompt implementations are modified from [chain-of-thought-hub
 ```
 <p align="right"><a href="#top">🔝Back to top</a></p>
+[github-contributors-link]: https://github.com/open-compass/opencompass/graphs/contributors
+[github-contributors-shield]: https://img.shields.io/github/contributors/open-compass/opencompass?color=c4f042&labelColor=black&style=flat-square
+[github-forks-link]: https://github.com/open-compass/opencompass/network/members
+[github-forks-shield]: https://img.shields.io/github/forks/open-compass/opencompass?color=8ae8ff&labelColor=black&style=flat-square
+[github-issues-link]: https://github.com/open-compass/opencompass/issues
+[github-issues-shield]: https://img.shields.io/github/issues/open-compass/opencompass?color=ff80eb&labelColor=black&style=flat-square
+[github-license-link]: https://github.com/open-compass/opencompass/blob/main/LICENSE
+[github-license-shield]: https://img.shields.io/github/license/open-compass/opencompass?color=white&labelColor=black&style=flat-square
+[github-release-link]: https://github.com/open-compass/opencompass/releases
+[github-release-shield]: https://img.shields.io/github/v/release/open-compass/opencompass?color=369eff&labelColor=black&logo=github&style=flat-square
+[github-releasedate-link]: https://github.com/open-compass/opencompass/releases
+[github-releasedate-shield]: https://img.shields.io/github/release-date/open-compass/opencompass?labelColor=black&style=flat-square
+[github-stars-link]: https://github.com/open-compass/opencompass/stargazers
+[github-stars-shield]: https://img.shields.io/github/stars/open-compass/opencompass?color=ffcb47&labelColor=black&style=flat-square
+[github-trending-shield]: https://trendshift.io/api/badge/repositories/6630
+[github-trending-url]: https://trendshift.io/repositories/6630

opencompass-0.2.4/opencompass/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = '0.2.4'

{opencompass-0.2.3 → opencompass-0.2.4}/opencompass/datasets/NPHardEval/cmp_GCP_D.py RENAMED

@@ -1,6 +1,10 @@
 import ast
-import networkx as nx
+try:
+    import networkx as nx
+except ImportError:
+    nx = None
 from datasets import Dataset
 from opencompass.openicl.icl_evaluator import BaseEvaluator

{opencompass-0.2.3 → opencompass-0.2.4}/opencompass/datasets/NPHardEval/cmp_TSP_D.py RENAMED

@@ -1,7 +1,11 @@
 import ast
 import json
-import networkx as nx
+try:
+    import networkx as nx
+except ImportError:
+    nx = None
 import pandas as pd
 from datasets import Dataset

{opencompass-0.2.3 → opencompass-0.2.4}/opencompass/datasets/NPHardEval/p_SPP.py RENAMED

@@ -1,7 +1,11 @@
 import ast
 import json
-import networkx as nx
+try:
+    import networkx as nx
+except ImportError:
+    nx = None
 from datasets import Dataset
 from opencompass.openicl.icl_evaluator import BaseEvaluator
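
Note on the three NPHardEval hunks above: each turns `networkx` into an optional import by binding `nx = None` when the package is missing. The hunks do not show what happens later when `nx` is actually used. A minimal sketch of one way to surface a readable error at the point of use follows. The helpers `optional_import` and `require` are hypothetical names introduced here for illustration and are not part of opencompass.

```python
import importlib


def optional_import(module_name: str):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, name: str, feature: str):
    """Return `module`, or raise a descriptive ImportError if it is None."""
    if module is None:
        raise ImportError(
            f"{name} is required for {feature}; "
            f"install it with `pip install {name}`."
        )
    return module


# Hypothetical module name, chosen only so that the import fails here.
missing = optional_import("surely_not_installed_module_xyz")
# A standard-library module, so this import always succeeds.
json_mod = optional_import("json")
```

With this pattern, importing the dataset module stays cheap, and a caller that actually needs the graph library gets an actionable message instead of an `AttributeError` on `None`.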

opencompass-0.2.4/opencompass/datasets/QuALITY.py ADDED

@@ -0,0 +1,59 @@
+import json
+from datasets import Dataset
+from opencompass.openicl.icl_evaluator import BaseEvaluator
+from opencompass.registry import LOAD_DATASET
+from .base import BaseDataset
+@LOAD_DATASET.register_module()
+class QuALITYDataset(BaseDataset):
+    @staticmethod
+    def load(path: str):
+        dataset_list = []
+        with open(path, 'r', encoding='utf-8') as f:
+            for line in f:
+                line = json.loads(line)
+                for question in line['questions']:
+                    dataset_list.append({
+                        'article':
+                        line['article'],
+                        'question':
+                        question['question'],
+                        'A':
+                        question['options'][0],
+                        'B':
+                        question['options'][1],
+                        'C':
+                        question['options'][2],
+                        'D':
+                        question['options'][3],
+                        'gold_label':
+                        'ABCD'[question['gold_label'] - 1],
+                        'difficult':
+                        question['difficult']
+                    })
+        return Dataset.from_list(dataset_list)
+class QuALITYEvaluator(BaseEvaluator):
+    def score(self, predictions, references, test_set):
+        assert len(predictions) == len(references)
+        easy, hard, all = [], [], []
+        for pred, refer, test in zip(predictions, references, test_set):
+            if pred == refer:
+                answer = True
+            else:
+                answer = False
+            all.append(answer)
+            if test['difficult'] == 0:
+                easy.append(answer)
+            else:
+                hard.append(answer)
+        return dict(easy_acc=sum(easy) / len(easy) * 100,
+                    hard_acc=sum(hard) / len(easy) * 100,
+                    all_acc=sum(all) / len(all) * 100)
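
Note on the `QuALITYEvaluator.score` hunk above: as released, `hard_acc` divides `sum(hard)` by `len(easy)`, which looks unintended, and the local name `all` shadows the Python builtin. Either bucket being empty would also raise `ZeroDivisionError`. The sketch below shows the same per-difficulty bookkeeping with each bucket divided by its own length. It is a standalone illustration and not the shipped code. The function names `accuracy` and `score_by_difficulty`, and the choice to return `None` for an empty bucket, are assumptions.

```python
def accuracy(flags):
    """Percentage of True values in `flags`, or None for an empty bucket."""
    return sum(flags) / len(flags) * 100 if flags else None


def score_by_difficulty(predictions, references, test_set):
    """Split correctness flags into easy/hard buckets and score each one."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    easy, hard, overall = [], [], []
    for pred, ref, item in zip(predictions, references, test_set):
        is_correct = pred == ref
        overall.append(is_correct)
        # Same convention as the hunk: difficult == 0 means "easy".
        (easy if item['difficult'] == 0 else hard).append(is_correct)
    return dict(easy_acc=accuracy(easy),
                hard_acc=accuracy(hard),
                all_acc=accuracy(overall))


# Tiny made-up example: one easy item, three hard items.
result = score_by_difficulty(
    ['A', 'B', 'C', 'D'],
    ['A', 'B', 'D', 'D'],
    [{'difficult': 0}, {'difficult': 1}, {'difficult': 1}, {'difficult': 1}],
)
```

In this example the hard bucket holds two correct answers out of three, so dividing by `len(easy)` (which is 1) would report 200 percent rather than roughly 66.7.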

opencompass-0.2.4/opencompass/datasets/TheoremQA/__init__.py ADDED

@@ -0,0 +1,4 @@
+from .legacy import (TheoremQA_postprocess, TheoremQA_postprocess_v2,
+                     TheoremQADataset)
+from .main import (TheoremQA_postprocess_v3, TheoremQADatasetV3,
+                   TheoremQAEvaluatorV3)

opencompass-0.2.3/opencompass/datasets/TheoremQA.py → opencompass-0.2.4/opencompass/datasets/TheoremQA/legacy.py RENAMED

@@ -4,7 +4,7 @@ from datasets import load_dataset
 from opencompass.registry import LOAD_DATASET, TEXT_POSTPROCESSORS
-from .base import BaseDataset
+from ..base import BaseDataset
 @LOAD_DATASET.register_module()

opencompass-0.2.4/opencompass/datasets/TheoremQA/main.py ADDED

@@ -0,0 +1,66 @@
+import re
+import json
+from datasets import Dataset, DatasetDict
+from opencompass.registry import LOAD_DATASET, TEXT_POSTPROCESSORS, ICL_EVALUATORS
+from opencompass.openicl.icl_evaluator import BaseEvaluator
+from ..base import BaseDataset
+from . import utils
+from tqdm import tqdm
+@LOAD_DATASET.register_module()
+class TheoremQADatasetV3(BaseDataset):
+    @staticmethod
+    def load(path: str):
+        with open(path, 'r') as f:
+            data = json.load(f)
+        for item in data:
+            item['Answer'] = str(item['Answer'])
+        dataset = Dataset.from_list(data)
+        return dataset
+def TheoremQA_postprocess_v3(text: str) -> str:
+    answer = utils.answer_clean(["The answer is:", "The answer is", "the answer is"], text)
+    return answer
+@ICL_EVALUATORS.register_module()
+class TheoremQAEvaluatorV3(BaseEvaluator):
+    def score(self, predictions, references, test_set):
+        if len(predictions) != len(references):
+            return {"error": "preds and refrs have different length"}
+        details = []
+        correct, wrong = 0, 0
+        for index in tqdm(range(len(predictions))):
+            answer = predictions[index]
+            groundtruth = references[index]
+            answer_type = test_set[index]['Answer_type']
+            if answer_type in ['float', 'integer', 'bool']:
+                groundtruth = [groundtruth, eval(groundtruth)]
+            else:
+                groundtruth = [groundtruth, None]
+            if utils.compare_answer_with_groundtruth(answer, *groundtruth):
+                correct += 1
+                is_correct = True
+            else:
+                wrong += 1
+                is_correct = False
+            details.append(
+                {
+                    # "question": question,
+                    # "solution": output,
+                    "correct": groundtruth,
+                    "pred": answer,
+                    "is_correct": is_correct,
+                }
+            )
+        score = correct / (correct + wrong) * 100
+        return {'score': score, 'details': details}
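
Note on the `TheoremQAEvaluatorV3.score` hunk above: for the `float`, `integer` and `bool` answer types, the reference string is converted with the builtin `eval`. A more restrictive alternative, sketched below, is `ast.literal_eval`, which accepts only Python literals. This is an assumption about what a stricter variant could look like, not the released behaviour. The helper name `parse_groundtruth` is hypothetical. One difference to be aware of: an arithmetic expression such as `1/3` is evaluated by `eval` but rejected by `literal_eval`, and this sketch then falls back to `None`.

```python
import ast


def parse_groundtruth(text: str, answer_type: str):
    """Return (text, parsed_value); parsed_value is None when not a literal."""
    if answer_type not in ('float', 'integer', 'bool'):
        return text, None
    try:
        # literal_eval only accepts literals (numbers, strings, True/False,
        # tuples, lists, ...), so arbitrary expressions are rejected.
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text, None
    return text, value
```

The returned pair has the same shape as the two-element `groundtruth` list built in the hunk, so it could be unpacked into a comparison function in the same way.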

opencompass-0.2.4/opencompass/datasets/TheoremQA/number_utils.py ADDED

@@ -0,0 +1,98 @@
+import re
+import math
+from math import sqrt, sin, cos, log, pi, factorial, exp, e
+E = 2.718
+def floatify(num: str):
+    try:
+        num = float(num)
+        if num.is_integer():
+            return round(num)
+        else:
+            return num
+    except Exception:
+        return None
+def within_eps(pred: float, gt: float):
+    eps = abs(gt) * 0.04
+    if pred >= gt - eps and pred <= gt + eps:
+        return True
+    else:
+        return False
+def clean_units(pred_str: str):
+    """Clean the units in the number."""
+    def convert_pi_to_number(code_string):
+        code_string = code_string.replace('\\pi', 'π')
+        # Replace \pi or π not preceded by a digit or } with 3.14
+        code_string = re.sub(r'(?<![\d}])\\?π', '3.14', code_string)
+        # Replace instances where π is preceded by a digit but without a multiplication symbol, e.g., "3π" -> "3*3.14"
+        code_string = re.sub(r'(\d)(\\?π)', r'\1*3.14', code_string)
+        # Handle cases where π is within braces or followed by a multiplication symbol
+        # This replaces "{π}" with "3.14" directly and "3*π" with "3*3.14"
+        code_string = re.sub(r'\{(\\?π)\}', '3.14', code_string)
+        code_string = re.sub(r'\*(\\?π)', '*3.14', code_string)
+        return code_string
+    pred_str = convert_pi_to_number(pred_str)
+    pred_str = pred_str.replace('%', '/100')
+    pred_str = pred_str.replace('$', '')
+    pred_str = pred_str.replace('¥', '')
+    pred_str = pred_str.replace('°C', '')
+    pred_str = pred_str.replace(' C', '')
+    pred_str = pred_str.replace('°', '')
+    return pred_str
+def number_it(num):
+    from latex2sympy2 import latex2sympy
+    if isinstance(num, (int, float)):
+        return num
+    num = clean_units(num)
+    try:
+        num = str(latex2sympy(num))
+    except Exception:
+        pass
+    if floatify(num) is not None:
+        return floatify(num)
+    else:
+        try:
+            num = eval(num)
+            if isinstance(num, list) or isinstance(num, tuple):
+                num = num[0]
+            if floatify(num) is not None:
+                return floatify(num)
+            else:
+                return None
+        except Exception:
+            return None
+def compare_two_numbers(p, gt):
+    try:
+        if math.isnan(p):
+            return False
+        if isinstance(gt, int):
+            return round(p) == gt
+        else:
+            return within_eps(pred=p, gt=gt)
+    except Exception:
+        return False
+def compare_two_list(pred, gt):
+    if not isinstance(pred, list):
+        return False
+    elif len(pred) != len(gt):
+        return False
+    elif any([not isinstance(x, (int, float)) for x in pred]):
+        return False
+    else:
+        pred = sorted(pred)
+        gt = sorted(gt)
+        return all([compare_two_numbers(p, g) for p, g in zip(pred, gt)])
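
Note on the `number_utils.py` hunk above: `within_eps` accepts a prediction when it lies within 4 percent of the ground truth (relative to `abs(gt)`), and `compare_two_numbers` switches to rounding when the ground truth is an `int`. The sketch below restates that rule in a self-contained form so the edge cases are easy to try out. The names `within_rel_tol` and `compare_numbers` are illustrative and not taken from the package.

```python
import math


def within_rel_tol(pred: float, gt: float, rel: float = 0.04) -> bool:
    """True when pred is within rel * |gt| of gt (same rule as within_eps)."""
    return abs(pred - gt) <= abs(gt) * rel


def compare_numbers(pred, gt) -> bool:
    """Integer ground truths compare after rounding, floats with tolerance."""
    if pred is None or (isinstance(pred, float) and math.isnan(pred)):
        return False
    if isinstance(gt, int):
        return round(pred) == gt
    return within_rel_tol(pred, gt)
```

Because the tolerance scales with `abs(gt)`, a ground truth of exactly `0.0` leaves no slack at all: only a prediction of exactly zero passes.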

opencompass-0.2.4/opencompass/datasets/TheoremQA/utils.py ADDED

@@ -0,0 +1,110 @@
+import re
+from .number_utils import clean_units, compare_two_numbers, compare_two_list, number_it
+import contextlib
+import signal
+@contextlib.contextmanager
+def time_limit(seconds: float):
+    def signal_handler(signum, frame):
+        raise ValueError
+    signal.setitimer(signal.ITIMER_REAL, seconds)
+    signal.signal(signal.SIGALRM, signal_handler)
+    try:
+        yield
+    finally:
+        signal.setitimer(signal.ITIMER_REAL, 0)
+def extract_theoremqa_answer(pred: str, answer_flag: bool = True):
+    from latex2sympy2 import latex2sympy
+    if any([option in pred.lower() for option in ['yes', 'true']]):
+        pred = 'True'
+    elif any([option in pred.lower() for option in ['no', 'false']]):
+        pred = 'False'
+    elif any([option in pred.lower() for option in ['(a)', '(b)', '(c)', '(d)', '(e)', '(f)']]):
+        pass
+    else:
+        if answer_flag:
+            # Extract the numbers out of the string
+            pred = pred.split('=')[-1].strip()
+            pred = clean_units(pred)
+            try:
+                with time_limit(1):
+                    tmp = str(latex2sympy(pred))
+                    pred = str(eval(tmp))
+            except Exception:
+                if re.match(r'-?[\d\.]+\s\D+$', pred):
+                    pred = pred.split(' ')[0]
+                elif re.match(r'-?[\d\.]+\s[^\s]+$', pred):
+                    pred = pred.split(' ')[0]
+        else:
+            # desparate search over the last number
+            preds = re.findall(r'-?\d*\.?\d+', pred)
+            if(len(preds) >= 1):
+                pred = preds[-1]
+            else:
+                pred = ''
+    return pred
+def answer_clean(direct_answer_trigger_for_fewshot: tuple, pred: str):
+    pred = pred.strip('\n')
+    # Determine if this is ICL, if so, use \n\n to split the first chunk.
+    ICL = False
+    for trigger in direct_answer_trigger_for_fewshot:
+        if pred.count(trigger) > 1:
+            ICL = True
+    if ICL:
+        pred = pred.split('\n\n')[0]
+    # Split the trigger to find the answer.
+    preds = re.split('|'.join(direct_answer_trigger_for_fewshot), pred)
+    if len(preds) > 1:
+        answer_flag = True
+        pred = preds[-1]
+    else:
+        answer_flag = False
+    pred = pred.strip('\n').rstrip('.').rstrip('/').strip(' ')
+    pred = [extract_theoremqa_answer(pred, answer_flag)]
+    # If there is no candidate in list, null is set.
+    if len(pred) == 0:
+        pred = ""
+    else:
+        if answer_flag:
+            # choose the first element in list ...
+            pred = pred[0]
+        else:
+            # choose the last e
+            pred = pred[-1]
+    # Remove the period at the end, again!
+    pred = pred.rstrip('.').rstrip('/')
+    return pred
+def compare_answer_with_groundtruth(answer: str, groundtruth_str: str, groundtruth_num = None):
+    if groundtruth_str.lower() in ['(a)', '(b)', '(c)', '(d)', '(e)', '(f)']:
+        return groundtruth_str.lower() in answer.lower()
+    elif answer.lower() == groundtruth_str.lower():
+        return True
+    elif groundtruth_num is not None:
+        if isinstance(groundtruth_num, (int, float)):
+            return compare_two_numbers(number_it(answer), groundtruth_num)
+        else:
+            if answer.startswith('(') and answer.endswith(')'):
+                try:
+                    answer = list(eval(answer))
+                    answer = [number_it(a) for a in answer]
+                except Exception as e:
+                    return False
+                return compare_two_list(answer, groundtruth_num)
+            else:
+                return False
+    else:
+        return False
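
Note on the `time_limit` helper in the `utils.py` hunk above: it arms `ITIMER_REAL` before installing the `SIGALRM` handler and does not restore the previous handler on exit. The sketch below shows a variant that installs the handler first and restores it afterwards. It is an illustration under stated assumptions (Unix only, called from the main thread), not the released implementation. Raising `TimeoutError` instead of `ValueError` and the wrapper `run_with_limit` are choices made here for clarity.

```python
import contextlib
import signal
import time


@contextlib.contextmanager
def time_limit(seconds: float):
    """Raise TimeoutError if the body runs longer than `seconds`."""

    def on_alarm(signum, frame):
        raise TimeoutError(f"timed out after {seconds} s")

    # Install the handler before arming the timer, so an early alarm
    # cannot hit the default SIGALRM action.
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)  # restore old handler


def run_with_limit(func, seconds: float, default=None):
    """Call func() under a time limit; return `default` on timeout."""
    try:
        with time_limit(seconds):
            return func()
    except TimeoutError:
        return default
```

For example, `run_with_limit(lambda: time.sleep(0.5), 0.05, default='timeout')` gives up after about 50 ms and returns the default, while a fast callable returns its own result.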
