PyPI - opencompass - Versions diffs - 0.2.2__tar.gz → 0.2.4__tar.gz - Mend

opencompass 0.2.2tar.gz → 0.2.4tar.gz

Files changed (395)

{opencompass-0.2.2 → opencompass-0.2.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: opencompass
-Version: 0.2.2
+Version: 0.2.4
 Summary: A comprehensive toolkit for large model evaluation
 Home-page: https://github.com/open-compass/opencompass
 Author: OpenCompass Contributors
@@ -11,37 +11,55 @@ Description: <div align="center">
           <br />
           <br />
-        [![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/en)
-        [![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/open-compass/opencompass/blob/main/LICENSE)
+        [![][github-release-shield]][github-release-link]
+        [![][github-releasedate-shield]][github-releasedate-link]
+        [![][github-contributors-shield]][github-contributors-link]<br>
+        [![][github-forks-shield]][github-forks-link]
+        [![][github-stars-shield]][github-stars-link]
+        [![][github-issues-shield]][github-issues-link]
+        [![][github-license-shield]][github-license-link]
         <!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
         [🌐Website](https://opencompass.org.cn/) |
+        [📖CompassHub](https://hub.opencompass.org.cn/home) |
+        [📊CompassRank](https://rank.opencompass.org.cn/home) |
         [📘Documentation](https://opencompass.readthedocs.io/en/latest/) |
         [🛠️Installation](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) |
         [🤔Reporting Issues](https://github.com/open-compass/opencompass/issues/new/choose)
         English | [简体中文](README_zh-CN.md)
+        [![][github-trending-shield]][github-trending-url]
         </div>
         <p align="center">
             👋 join us on <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=opencompass" target="_blank">WeChat</a>
         </p>
-        ## 📣 OpenCompass 2023 LLM Annual Leaderboard
+        > \[!IMPORTANT\]
+        >
+        > **Star Us**, You will receive all release notifications from GitHub without any delay ~ ⭐️
-        We are honored to have witnessed the tremendous progress of artificial general intelligence together with the community in the past year, and we are also very pleased that **OpenCompass** can help numerous developers and users.
+        ## 📣 OpenCompass 2.0
-        We announce the launch of the **OpenCompass 2023 LLM Annual Leaderboard** plan. We expect to release the annual leaderboard of the LLMs in January 2024, systematically evaluating the performance of LLMs in various capabilities such as language, knowledge, reasoning, creation, long-text, and agents.
+        We are thrilled to introduce OpenCompass 2.0, an advanced suite featuring three key components: [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).
+        ![oc20](https://github.com/tonysy/opencompass/assets/7881589/90dbe1c0-c323-470a-991e-2b37ab5350b2)
-        At that time, we will release rankings for both open-source models and commercial API models, aiming to provide a comprehensive, objective, and neutral reference for the industry and research community.
+        **CompassRank** has been significantly enhanced into a leaderboard that now incorporates both open-source and proprietary benchmarks. This upgrade allows for a more comprehensive evaluation of models across the industry.
-        We sincerely invite various large models to join the OpenCompass to showcase their performance advantages in different fields. At the same time, we also welcome researchers and developers to provide valuable suggestions and contributions to jointly promote the development of the LLMs. If you have any questions or needs, please feel free to [contact us](mailto:opencompass@pjlab.org.cn). In addition, relevant evaluation contents, performance statistics, and evaluation methods will be open-source along with the leaderboard release.
+        **CompassHub** presents a pioneering benchmark browser interface, designed to simplify and expedite the exploration and utilization of an extensive array of benchmarks for researchers and practitioners alike. To enhance the visibility of your own benchmark within the community, we warmly invite you to contribute it to CompassHub. You may initiate the submission process by clicking [here](https://hub.opencompass.org.cn/dataset-submit).
-        We have provided the more details of the CompassBench 2023 in [Doc](docs/zh_cn/advanced_guides/compassbench_intro.md).
+        **CompassKit** is a powerful collection of evaluation toolkits specifically tailored for Large Language Models and Large Vision-language Models. It provides an extensive set of tools to assess and measure the performance of these complex models effectively. You are welcome to try our toolkits in your research and products.
-        Let's look forward to the release of the OpenCompass 2023 LLM Annual Leaderboard!
+        <details>
+          <summary><kbd>Star History</kbd></summary>
+          <picture>
+            <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&theme=dark&type=Date">
+            <img width="100%" src="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&type=Date">
+          </picture>
+        </details>
         ## 🧭	Welcome
@@ -60,12 +78,9 @@ Description: <div align="center">
         ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
-        - **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py), InternLM2 showed extremely strong performance in these tests, welcome to try! 🔥🔥🔥.
-        - **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8) 🔥🔥🔥.
-        - **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development. 🔥🔥🔥.
-        - **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details! 🔥🔥🔥.
-        - **\[2023.12.10\]** We have released [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), a toolkit for evaluating vision-language models (VLMs), currently support 20+ VLMs and 7 multi-modal benchmarks (including MMBench series).
-        - **\[2023.12.10\]** We have supported Mistral AI's MoE LLM: **Mixtral-8x7B-32K**. Welcome to [MixtralKit](https://github.com/open-compass/MixtralKit) for more details about inference and evaluation.
+        - **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
+        - **\[2024.02.29\]** We supported MT-Bench, AlpacaEval and AlignBench; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)
+        - **\[2024.01.30\]** We released OpenCompass 2.0. Click [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information!
         > [More](docs/en/notes/news.md)
@@ -87,7 +102,7 @@ Description: <div align="center">
         ## 📊 Leaderboard
-        We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `opencompass@pjlab.org.cn`.
+        We provide [OpenCompass Leaderboard](https://rank.opencompass.org.cn/home) for the community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `opencompass@pjlab.org.cn`.
         <p align="right"><a href="#top">🔝Back to top</a></p>
@@ -122,8 +137,8 @@ Description: <div align="center">
         ```bash
         # Download dataset to data/ folder
-        wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
-        unzip OpenCompassData-core-20231110.zip
+        wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
+        unzip OpenCompassData-core-20240207.zip
         ```
         Some third-party features, like Humaneval and Llama, may require additional steps to work properly, for detailed steps please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started/installation.html).
@@ -428,10 +443,6 @@ Description: <div align="center">
           </tbody>
         </table>
-        ## OpenCompass Ecosystem
-        <p align="right"><a href="#top">🔝Back to top</a></p>
         ## 📖 Model Support
         <table align="center">
@@ -452,6 +463,7 @@ Description: <div align="center">
         - [InternLM](https://github.com/InternLM/InternLM)
         - [LLaMA](https://github.com/facebookresearch/llama)
+        - [LLaMA3](https://github.com/meta-llama/llama3)
         - [Vicuna](https://github.com/lm-sys/FastChat)
         - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
         - [Baichuan](https://github.com/baichuan-inc)
@@ -461,12 +473,14 @@ Description: <div align="center">
         - [TigerBot](https://github.com/TigerResearch/TigerBot)
         - [Qwen](https://github.com/QwenLM/Qwen)
         - [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
+        - [Gemma](https://huggingface.co/google/gemma-7b)
         - ...
         </td>
         <td>
         - OpenAI
+        - Gemini
         - Claude
         - ZhipuAI(ChatGLM)
         - Baichuan
@@ -489,18 +503,18 @@ Description: <div align="center">
         ## 🔜 Roadmap
-        - [ ] Subjective Evaluation
+        - [x] Subjective Evaluation
           - [ ] Release CompassArena
-          - [ ] Subjective evaluation dataset.
+          - [x] Subjective evaluation.
         - [x] Long-context
-          - [ ] Long-context evaluation with extensive datasets.
+          - [x] Long-context evaluation with extensive datasets.
           - [ ] Long-context leaderboard.
-        - [ ] Coding
+        - [x] Coding
           - [ ] Coding evaluation leaderboard.
           - [x] Non-python language evaluation service.
-        - [ ] Agent
+        - [x] Agent
           - [ ] Support various agent frameworks.
-          - [ ] Evaluation of tool use of the LLMs.
+          - [x] Evaluation of tool use of the LLMs.
         - [x] Robustness
           - [x] Support various attack method
@@ -508,6 +522,20 @@ Description: <div align="center">
         We appreciate all contributions to improving OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/en/latest/notes/contribution_guide.html) for the best practice.
+        <!-- Copy-paste in your Readme.md file -->
+        <!-- Made with [OSS Insight](https://ossinsight.io/) -->
+        <a href="https://github.com/open-compass/opencompass/graphs/contributors" target="_blank">
+          <table>
+            <tr>
+              <th colspan="2">
+                <br><img src="https://contrib.rocks/image?repo=open-compass/opencompass"><br><br>
+              </th>
+            </tr>
+          </table>
+        </a>
         ## 🤝 Acknowledgements
         Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).
@@ -527,6 +555,23 @@ Description: <div align="center">
         <p align="right"><a href="#top">🔝Back to top</a></p>
+        [github-contributors-link]: https://github.com/open-compass/opencompass/graphs/contributors
+        [github-contributors-shield]: https://img.shields.io/github/contributors/open-compass/opencompass?color=c4f042&labelColor=black&style=flat-square
+        [github-forks-link]: https://github.com/open-compass/opencompass/network/members
+        [github-forks-shield]: https://img.shields.io/github/forks/open-compass/opencompass?color=8ae8ff&labelColor=black&style=flat-square
+        [github-issues-link]: https://github.com/open-compass/opencompass/issues
+        [github-issues-shield]: https://img.shields.io/github/issues/open-compass/opencompass?color=ff80eb&labelColor=black&style=flat-square
+        [github-license-link]: https://github.com/open-compass/opencompass/blob/main/LICENSE
+        [github-license-shield]: https://img.shields.io/github/license/open-compass/opencompass?color=white&labelColor=black&style=flat-square
+        [github-release-link]: https://github.com/open-compass/opencompass/releases
+        [github-release-shield]: https://img.shields.io/github/v/release/open-compass/opencompass?color=369eff&labelColor=black&logo=github&style=flat-square
+        [github-releasedate-link]: https://github.com/open-compass/opencompass/releases
+        [github-releasedate-shield]: https://img.shields.io/github/release-date/open-compass/opencompass?labelColor=black&style=flat-square
+        [github-stars-link]: https://github.com/open-compass/opencompass/stargazers
+        [github-stars-shield]: https://img.shields.io/github/stars/open-compass/opencompass?color=ffcb47&labelColor=black&style=flat-square
+        [github-trending-shield]: https://trendshift.io/api/badge/repositories/6630
+        [github-trending-url]: https://trendshift.io/repositories/6630
 Keywords: AI,NLP,in-context learning,large language model,evaluation,benchmark,llm
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3.8

{opencompass-0.2.2 → opencompass-0.2.4}/README.md RENAMED

@@ -3,37 +3,55 @@
   <br />
   <br />
-[![docs](https://readthedocs.org/projects/opencompass/badge)](https://opencompass.readthedocs.io/en)
-[![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](https://github.com/open-compass/opencompass/blob/main/LICENSE)
+[![][github-release-shield]][github-release-link]
+[![][github-releasedate-shield]][github-releasedate-link]
+[![][github-contributors-shield]][github-contributors-link]<br>
+[![][github-forks-shield]][github-forks-link]
+[![][github-stars-shield]][github-stars-link]
+[![][github-issues-shield]][github-issues-link]
+[![][github-license-shield]][github-license-link]
 <!-- [![PyPI](https://badge.fury.io/py/opencompass.svg)](https://pypi.org/project/opencompass/) -->
 [🌐Website](https://opencompass.org.cn/) |
+[📖CompassHub](https://hub.opencompass.org.cn/home) |
+[📊CompassRank](https://rank.opencompass.org.cn/home) |
 [📘Documentation](https://opencompass.readthedocs.io/en/latest/) |
 [🛠️Installation](https://opencompass.readthedocs.io/en/latest/get_started/installation.html) |
 [🤔Reporting Issues](https://github.com/open-compass/opencompass/issues/new/choose)
 English | [简体中文](README_zh-CN.md)
+[![][github-trending-shield]][github-trending-url]
 </div>
 <p align="center">
     👋 join us on <a href="https://discord.gg/KKwfEbFj7U" target="_blank">Discord</a> and <a href="https://r.vansin.top/?r=opencompass" target="_blank">WeChat</a>
 </p>
-## 📣 OpenCompass 2023 LLM Annual Leaderboard
+> \[!IMPORTANT\]
+>
+> **Star Us**, You will receive all release notifications from GitHub without any delay ~ ⭐️
-We are honored to have witnessed the tremendous progress of artificial general intelligence together with the community in the past year, and we are also very pleased that **OpenCompass** can help numerous developers and users.
+## 📣 OpenCompass 2.0
-We announce the launch of the **OpenCompass 2023 LLM Annual Leaderboard** plan. We expect to release the annual leaderboard of the LLMs in January 2024, systematically evaluating the performance of LLMs in various capabilities such as language, knowledge, reasoning, creation, long-text, and agents.
+We are thrilled to introduce OpenCompass 2.0, an advanced suite featuring three key components: [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home).
+![oc20](https://github.com/tonysy/opencompass/assets/7881589/90dbe1c0-c323-470a-991e-2b37ab5350b2)
-At that time, we will release rankings for both open-source models and commercial API models, aiming to provide a comprehensive, objective, and neutral reference for the industry and research community.
+**CompassRank** has been significantly enhanced into a leaderboard that now incorporates both open-source and proprietary benchmarks. This upgrade allows for a more comprehensive evaluation of models across the industry.
-We sincerely invite various large models to join the OpenCompass to showcase their performance advantages in different fields. At the same time, we also welcome researchers and developers to provide valuable suggestions and contributions to jointly promote the development of the LLMs. If you have any questions or needs, please feel free to [contact us](mailto:opencompass@pjlab.org.cn). In addition, relevant evaluation contents, performance statistics, and evaluation methods will be open-source along with the leaderboard release.
+**CompassHub** presents a pioneering benchmark browser interface, designed to simplify and expedite the exploration and utilization of an extensive array of benchmarks for researchers and practitioners alike. To enhance the visibility of your own benchmark within the community, we warmly invite you to contribute it to CompassHub. You may initiate the submission process by clicking [here](https://hub.opencompass.org.cn/dataset-submit).
-We have provided the more details of the CompassBench 2023 in [Doc](docs/zh_cn/advanced_guides/compassbench_intro.md).
+**CompassKit** is a powerful collection of evaluation toolkits specifically tailored for Large Language Models and Large Vision-language Models. It provides an extensive set of tools to assess and measure the performance of these complex models effectively. You are welcome to try our toolkits in your research and products.
-Let's look forward to the release of the OpenCompass 2023 LLM Annual Leaderboard!
+<details>
+  <summary><kbd>Star History</kbd></summary>
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&theme=dark&type=Date">
+    <img width="100%" src="https://api.star-history.com/svg?repos=open-compass%2Fopencompass&type=Date">
+  </picture>
+</details>
 ## 🧭	Welcome
@@ -52,12 +70,9 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
 ## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
-- **\[2024.01.17\]** We supported the evaluation of [InternLM2](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_keyset.py) and [InternLM2-Chat](https://github.com/open-compass/opencompass/blob/main/configs/eval_internlm2_chat_keyset.py), InternLM2 showed extremely strong performance in these tests, welcome to try! 🔥🔥🔥.
-- **\[2024.01.17\]** We supported the needle in a haystack test with multiple needles, more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/needleinahaystack_eval.html#id8) 🔥🔥🔥.
-- **\[2023.12.28\]** We have enabled seamless evaluation of all models developed using [LLaMA2-Accessory](https://github.com/Alpha-VLLM/LLaMA2-Accessory), a powerful toolkit for comprehensive LLM development. 🔥🔥🔥.
-- **\[2023.12.22\]** We have released [T-Eval](https://github.com/open-compass/T-Eval), a step-by-step evaluation benchmark to gauge your LLMs on tool utilization. Welcome to our [Leaderboard](https://open-compass.github.io/T-Eval/leaderboard.html) for more details! 🔥🔥🔥.
-- **\[2023.12.10\]** We have released [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), a toolkit for evaluating vision-language models (VLMs), currently support 20+ VLMs and 7 multi-modal benchmarks (including MMBench series).
-- **\[2023.12.10\]** We have supported Mistral AI's MoE LLM: **Mixtral-8x7B-32K**. Welcome to [MixtralKit](https://github.com/open-compass/MixtralKit) for more details about inference and evaluation.
+- **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
+- **\[2024.02.29\]** We supported MT-Bench, AlpacaEval and AlignBench; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html)
+- **\[2024.01.30\]** We released OpenCompass 2.0. Click [CompassKit](https://github.com/open-compass), [CompassHub](https://hub.opencompass.org.cn/home), and [CompassRank](https://rank.opencompass.org.cn/home) for more information!
 > [More](docs/en/notes/news.md)
@@ -79,7 +94,7 @@ OpenCompass is a one-stop platform for large model evaluation, aiming to provide
 ## 📊 Leaderboard
-We provide [OpenCompass Leaderboard](https://opencompass.org.cn/rank) for the community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `opencompass@pjlab.org.cn`.
+We provide [OpenCompass Leaderboard](https://rank.opencompass.org.cn/home) for the community to rank all public models and API models. If you would like to join the evaluation, please provide the model repository URL or a standard API interface to the email address `opencompass@pjlab.org.cn`.
 <p align="right"><a href="#top">🔝Back to top</a></p>
@@ -114,8 +129,8 @@ pip install -e .
 ```bash
 # Download dataset to data/ folder
-wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
-unzip OpenCompassData-core-20231110.zip
+wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
+unzip OpenCompassData-core-20240207.zip
 ```
 Some third-party features, like Humaneval and Llama, may require additional steps to work properly, for detailed steps please refer to the [Installation Guide](https://opencompass.readthedocs.io/en/latest/get_started/installation.html).
@@ -420,10 +435,6 @@ Through the command line or configuration files, OpenCompass also supports evalu
   </tbody>
 </table>
-## OpenCompass Ecosystem
-<p align="right"><a href="#top">🔝Back to top</a></p>
 ## 📖 Model Support
 <table align="center">
@@ -444,6 +455,7 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - [InternLM](https://github.com/InternLM/InternLM)
 - [LLaMA](https://github.com/facebookresearch/llama)
+- [LLaMA3](https://github.com/meta-llama/llama3)
 - [Vicuna](https://github.com/lm-sys/FastChat)
 - [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
 - [Baichuan](https://github.com/baichuan-inc)
@@ -453,12 +465,14 @@ Through the command line or configuration files, OpenCompass also supports evalu
 - [TigerBot](https://github.com/TigerResearch/TigerBot)
 - [Qwen](https://github.com/QwenLM/Qwen)
 - [BlueLM](https://github.com/vivo-ai-lab/BlueLM)
+- [Gemma](https://huggingface.co/google/gemma-7b)
 - ...
 </td>
 <td>
 - OpenAI
+- Gemini
 - Claude
 - ZhipuAI(ChatGLM)
 - Baichuan
@@ -481,18 +495,18 @@ Through the command line or configuration files, OpenCompass also supports evalu
 ## 🔜 Roadmap
-- [ ] Subjective Evaluation
+- [x] Subjective Evaluation
   - [ ] Release CompassArena
-  - [ ] Subjective evaluation dataset.
+  - [x] Subjective evaluation.
 - [x] Long-context
-  - [ ] Long-context evaluation with extensive datasets.
+  - [x] Long-context evaluation with extensive datasets.
   - [ ] Long-context leaderboard.
-- [ ] Coding
+- [x] Coding
   - [ ] Coding evaluation leaderboard.
   - [x] Non-python language evaluation service.
-- [ ] Agent
+- [x] Agent
   - [ ] Support various agent frameworks.
-  - [ ] Evaluation of tool use of the LLMs.
+  - [x] Evaluation of tool use of the LLMs.
 - [x] Robustness
   - [x] Support various attack method
@@ -500,6 +514,20 @@ Through the command line or configuration files, OpenCompass also supports evalu
 We appreciate all contributions to improving OpenCompass. Please refer to the [contributing guideline](https://opencompass.readthedocs.io/en/latest/notes/contribution_guide.html) for the best practice.
+<!-- Copy-paste in your Readme.md file -->
+<!-- Made with [OSS Insight](https://ossinsight.io/) -->
+<a href="https://github.com/open-compass/opencompass/graphs/contributors" target="_blank">
+  <table>
+    <tr>
+      <th colspan="2">
+        <br><img src="https://contrib.rocks/image?repo=open-compass/opencompass"><br><br>
+      </th>
+    </tr>
+  </table>
+</a>
 ## 🤝 Acknowledgements
 Some code in this project is cited and modified from [OpenICL](https://github.com/Shark-NLP/OpenICL).
@@ -518,3 +546,20 @@ Some datasets and prompt implementations are modified from [chain-of-thought-hub
 ```
 <p align="right"><a href="#top">🔝Back to top</a></p>
+[github-contributors-link]: https://github.com/open-compass/opencompass/graphs/contributors
+[github-contributors-shield]: https://img.shields.io/github/contributors/open-compass/opencompass?color=c4f042&labelColor=black&style=flat-square
+[github-forks-link]: https://github.com/open-compass/opencompass/network/members
+[github-forks-shield]: https://img.shields.io/github/forks/open-compass/opencompass?color=8ae8ff&labelColor=black&style=flat-square
+[github-issues-link]: https://github.com/open-compass/opencompass/issues
+[github-issues-shield]: https://img.shields.io/github/issues/open-compass/opencompass?color=ff80eb&labelColor=black&style=flat-square
+[github-license-link]: https://github.com/open-compass/opencompass/blob/main/LICENSE
+[github-license-shield]: https://img.shields.io/github/license/open-compass/opencompass?color=white&labelColor=black&style=flat-square
+[github-release-link]: https://github.com/open-compass/opencompass/releases
+[github-release-shield]: https://img.shields.io/github/v/release/open-compass/opencompass?color=369eff&labelColor=black&logo=github&style=flat-square
+[github-releasedate-link]: https://github.com/open-compass/opencompass/releases
+[github-releasedate-shield]: https://img.shields.io/github/release-date/open-compass/opencompass?labelColor=black&style=flat-square
+[github-stars-link]: https://github.com/open-compass/opencompass/stargazers
+[github-stars-shield]: https://img.shields.io/github/stars/open-compass/opencompass?color=ffcb47&labelColor=black&style=flat-square
+[github-trending-shield]: https://trendshift.io/api/badge/repositories/6630
+[github-trending-url]: https://trendshift.io/repositories/6630

opencompass-0.2.4/opencompass/__init__.py ADDED

@@ -0,0 +1 @@
+__version__ = '0.2.4'

{opencompass-0.2.2 → opencompass-0.2.4}/opencompass/datasets/NPHardEval/cmp_GCP_D.py RENAMED

@@ -1,6 +1,10 @@
 import ast
-import networkx as nx
+try:
+    import networkx as nx
+except ImportError:
+    nx = None
 from datasets import Dataset
 from opencompass.openicl.icl_evaluator import BaseEvaluator

{opencompass-0.2.2 → opencompass-0.2.4}/opencompass/datasets/NPHardEval/cmp_TSP_D.py RENAMED

@@ -1,7 +1,11 @@
 import ast
 import json
-import networkx as nx
+try:
+    import networkx as nx
+except ImportError:
+    nx = None
 import pandas as pd
 from datasets import Dataset

{opencompass-0.2.2 → opencompass-0.2.4}/opencompass/datasets/NPHardEval/p_SPP.py RENAMED

@@ -1,7 +1,11 @@
 import ast
 import json
-import networkx as nx
+try:
+    import networkx as nx
+except ImportError:
+    nx = None
 from datasets import Dataset
 from opencompass.openicl.icl_evaluator import BaseEvaluator

opencompass-0.2.4/opencompass/datasets/OpenFinData.py ADDED

@@ -0,0 +1,47 @@
+import json
+import os.path as osp
+from datasets import Dataset
+from opencompass.openicl.icl_evaluator import BaseEvaluator
+from opencompass.registry import ICL_EVALUATORS, LOAD_DATASET
+from .base import BaseDataset
+@LOAD_DATASET.register_module()
+class OpenFinDataDataset(BaseDataset):
+    @staticmethod
+    def load(path: str, name: str):
+        with open(osp.join(path, f'{name}.json'), 'r') as f:
+            data = json.load(f)
+            return Dataset.from_list(data)
+@ICL_EVALUATORS.register_module()
+class OpenFinDataKWEvaluator(BaseEvaluator):
+    def __init__(self, ):
+        super().__init__()
+    def score(self, predictions, references):
+        assert len(predictions) == len(references)
+        scores = []
+        results = dict()
+        for i in range(len(references)):
+            all_hit = True
+            judgement = references[i].split('、')
+            for item in judgement:
+                if item not in predictions[i]:
+                    all_hit = False
+                    break
+            if all_hit:
+                scores.append(True)
+            else:
+                scores.append(False)
+        results['accuracy'] = round(sum(scores) / len(scores), 4) * 100
+        return results
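
The scoring rule in `OpenFinDataKWEvaluator.score` counts a prediction as correct only when it contains every keyword of its reference, with keywords separated by `、`. The standalone sketch below restates that rule without the opencompass base classes so it can be run in isolation. The function name `keyword_accuracy` and the sample strings are illustrative assumptions, not taken from the package.

```python
def keyword_accuracy(predictions, references, sep='、'):
    """Percentage of predictions that contain every keyword of their reference."""
    assert len(predictions) == len(references)
    hits = [
        # A sample is a hit only if all reference keywords occur in the prediction.
        all(keyword in pred for keyword in ref.split(sep))
        for pred, ref in zip(predictions, references)
    ]
    return round(sum(hits) / len(hits), 4) * 100


# Illustrative data: the second prediction lacks the keyword 'profit'.
preds = ['profit and revenue both rose', 'revenue fell']
refs = ['profit、revenue', 'profit、revenue']
print(keyword_accuracy(preds, refs))  # 50.0
```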

opencompass-0.2.4/opencompass/datasets/QuALITY.py ADDED

@@ -0,0 +1,59 @@
+import json
+from datasets import Dataset
+from opencompass.openicl.icl_evaluator import BaseEvaluator
+from opencompass.registry import LOAD_DATASET
+from .base import BaseDataset
+@LOAD_DATASET.register_module()
+class QuALITYDataset(BaseDataset):
+    @staticmethod
+    def load(path: str):
+        dataset_list = []
+        with open(path, 'r', encoding='utf-8') as f:
+            for line in f:
+                line = json.loads(line)
+                for question in line['questions']:
+                    dataset_list.append({
+                        'article':
+                        line['article'],
+                        'question':
+                        question['question'],
+                        'A':
+                        question['options'][0],
+                        'B':
+                        question['options'][1],
+                        'C':
+                        question['options'][2],
+                        'D':
+                        question['options'][3],
+                        'gold_label':
+                        'ABCD'[question['gold_label'] - 1],
+                        'difficult':
+                        question['difficult']
+                    })
+        return Dataset.from_list(dataset_list)
+class QuALITYEvaluator(BaseEvaluator):
+    def score(self, predictions, references, test_set):
+        assert len(predictions) == len(references)
+        easy, hard, all = [], [], []
+        for pred, refer, test in zip(predictions, references, test_set):
+            if pred == refer:
+                answer = True
+            else:
+                answer = False
+            all.append(answer)
+            if test['difficult'] == 0:
+                easy.append(answer)
+            else:
+                hard.append(answer)
+        return dict(easy_acc=sum(easy) / len(easy) * 100,
+                    hard_acc=sum(hard) / len(hard) * 100,
+                    all_acc=sum(all) / len(all) * 100)
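
The evaluator above reports accuracy separately for easy and hard questions. The sketch below shows the same split computed standalone, with each bucket divided by its own size. Returning `0.0` for an empty bucket is an assumption added here to avoid a `ZeroDivisionError`, and the names `bucket_accuracy` and `split_accuracy` are hypothetical, not part of opencompass.

```python
def bucket_accuracy(flags):
    """Accuracy in percent for a list of booleans; 0.0 for an empty bucket."""
    return sum(flags) / len(flags) * 100 if flags else 0.0


def split_accuracy(predictions, references, difficult_flags):
    """Accuracy over easy (flag == 0), hard (any other flag) and all samples."""
    easy, hard, overall = [], [], []
    for pred, ref, difficult in zip(predictions, references, difficult_flags):
        correct = pred == ref
        overall.append(correct)
        (easy if difficult == 0 else hard).append(correct)
    return dict(easy_acc=bucket_accuracy(easy),
                hard_acc=bucket_accuracy(hard),
                all_acc=bucket_accuracy(overall))


result = split_accuracy(['A', 'B', 'C', 'D'], ['A', 'B', 'D', 'D'], [0, 0, 1, 1])
print(result)  # {'easy_acc': 100.0, 'hard_acc': 50.0, 'all_acc': 75.0}
```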

opencompass-0.2.4/opencompass/datasets/TheoremQA/__init__.py ADDED

@@ -0,0 +1,4 @@
+from .legacy import (TheoremQA_postprocess, TheoremQA_postprocess_v2,
+                     TheoremQADataset)
+from .main import (TheoremQA_postprocess_v3, TheoremQADatasetV3,
+                   TheoremQAEvaluatorV3)

opencompass-0.2.2/opencompass/datasets/TheoremQA.py → opencompass-0.2.4/opencompass/datasets/TheoremQA/legacy.py RENAMED

@@ -4,7 +4,7 @@ from datasets import load_dataset
 from opencompass.registry import LOAD_DATASET, TEXT_POSTPROCESSORS
-from .base import BaseDataset
+from ..base import BaseDataset
 @LOAD_DATASET.register_module()
@@ -24,3 +24,15 @@ def TheoremQA_postprocess(text: str) -> str:
     else:
         text = matches[0].strip().strip('.,?!\"\';:')
         return text
+def TheoremQA_postprocess_v2(text: str) -> str:
+    prediction = text.strip().strip('\n').split('\n')[-1]
+    tmp = ''
+    for entry in prediction.split(' ')[::-1]:
+        if entry == 'is' or entry == 'be' or entry == 'are' or entry.endswith(
+                ':'):
+            break
+        tmp = entry + ' ' + tmp
+    prediction = tmp.strip().strip('.')
+    return prediction
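
`TheoremQA_postprocess_v2` takes the last line of the model output and walks its words backwards until it meets `is`, `be`, `are`, or a word ending in `:`, keeping whatever follows as the answer. The sketch below restates that logic under the hypothetical name `postprocess_v2` so it runs without opencompass installed. The sample inputs are illustrative only.

```python
def postprocess_v2(text: str) -> str:
    """Standalone restatement of the v2 post-processing rule."""
    # Only the last line of the output is considered.
    prediction = text.strip().strip('\n').split('\n')[-1]
    tmp = ''
    # Collect words from the end until a copula or a 'label:' word is reached.
    for entry in prediction.split(' ')[::-1]:
        if entry in ('is', 'be', 'are') or entry.endswith(':'):
            break
        tmp = entry + ' ' + tmp
    return tmp.strip().strip('.')


print(postprocess_v2('Let us compute.\nTherefore, the answer is 3.14.'))  # 3.14
print(postprocess_v2('Answer: 42'))  # 42
```

If none of the stop words appears, the whole last line is returned, so callers may still need to normalise the result before comparing it with a reference.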
