ScandEval 16.5.0__tar.gz → 16.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (354)
  1. {scandeval-16.5.0 → scandeval-16.6.0}/.github/ISSUE_TEMPLATE/benchmark_dataset_request.yaml +2 -0
  2. {scandeval-16.5.0 → scandeval-16.6.0}/.github/ISSUE_TEMPLATE/model_evaluation_request.yaml +1 -1
  3. {scandeval-16.5.0 → scandeval-16.6.0}/.gitignore +1 -0
  4. {scandeval-16.5.0 → scandeval-16.6.0}/.pre-commit-config.yaml +1 -1
  5. {scandeval-16.5.0 → scandeval-16.6.0}/CHANGELOG.md +29 -0
  6. {scandeval-16.5.0 → scandeval-16.6.0}/PKG-INFO +4 -2
  7. {scandeval-16.5.0 → scandeval-16.6.0}/README.md +3 -1
  8. scandeval-16.6.0/docs/datasets/croatian.md +460 -0
  9. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/serbian.md +1 -1
  10. scandeval-16.6.0/docs/datasets/slovenian.md +453 -0
  11. scandeval-16.6.0/docs/leaderboards/Monolingual/ukrainian.md +26 -0
  12. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/slavic.md +1 -1
  13. {scandeval-16.5.0 → scandeval-16.6.0}/pyproject.toml +1 -1
  14. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_config_factory.py +14 -15
  15. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_modules/vllm.py +44 -4
  16. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/cli.py +0 -77
  17. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/__init__.py +10 -2
  18. scandeval-16.6.0/src/scandeval/dataset_configs/croatian.py +56 -0
  19. scandeval-16.6.0/src/scandeval/dataset_configs/slovenian.py +56 -0
  20. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/exceptions.py +20 -0
  21. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/metrics/huggingface.py +1 -1
  22. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/linguistic_acceptability.py +21 -0
  23. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/multiple_choice.py +22 -0
  24. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/named_entity_recognition.py +48 -0
  25. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/reading_comprehension.py +31 -0
  26. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/sentiment_classification.py +29 -0
  27. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/tasks.py +0 -14
  28. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/utils.py +34 -0
  29. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/constants.py +2 -0
  30. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mmlu.py +13 -8
  31. scandeval-16.6.0/src/scripts/create_mmlu_hr.py +158 -0
  32. scandeval-16.6.0/src/scripts/create_mms.py +132 -0
  33. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_multi_wiki_qa.py +2 -0
  34. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_scala.py +4 -0
  35. scandeval-16.6.0/src/scripts/create_sentinews.py +94 -0
  36. scandeval-16.6.0/src/scripts/create_ssj500k_ner.py +93 -0
  37. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_wikiann.py +1 -1
  38. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_winogrande.py +2 -0
  39. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/load_ud_pos.py +36 -0
  40. {scandeval-16.5.0 → scandeval-16.6.0}/tests/conftest.py +5 -3
  41. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_cli.py +0 -3
  42. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_data_loading.py +1 -1
  43. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_tokenisation_utils.py +0 -1
  44. {scandeval-16.5.0 → scandeval-16.6.0}/uv.lock +1 -1
  45. scandeval-16.5.0/cool_test.csv +0 -7
  46. scandeval-16.5.0/cool_train.csv +0 -13
  47. scandeval-16.5.0/cool_val.csv +0 -5
  48. scandeval-16.5.0/custom_datasets.py +0 -21
  49. scandeval-16.5.0/gfx/different-poses/pose1.png +0 -0
  50. scandeval-16.5.0/gfx/different-poses/pose2.png +0 -0
  51. scandeval-16.5.0/gfx/different-poses/pose3.png +0 -0
  52. scandeval-16.5.0/gfx/different-poses/pose4.png +0 -0
  53. scandeval-16.5.0/gfx/different-poses/pose5.png +0 -0
  54. scandeval-16.5.0/gfx/different-poses/pose6.png +0 -0
  55. scandeval-16.5.0/src/scripts/create_mms_sr.py +0 -110
  56. scandeval-16.5.0/tests/test_tasks.py +0 -43
  57. {scandeval-16.5.0 → scandeval-16.6.0}/.github/ISSUE_TEMPLATE/bug.yaml +0 -0
  58. {scandeval-16.5.0 → scandeval-16.6.0}/.github/ISSUE_TEMPLATE/feature_request.yaml +0 -0
  59. {scandeval-16.5.0 → scandeval-16.6.0}/.github/ISSUE_TEMPLATE/language_request.yaml +0 -0
  60. {scandeval-16.5.0 → scandeval-16.6.0}/.github/workflows/ci.yaml +0 -0
  61. {scandeval-16.5.0 → scandeval-16.6.0}/.markdownlint.jsonc +0 -0
  62. {scandeval-16.5.0 → scandeval-16.6.0}/CITATION.cff +0 -0
  63. {scandeval-16.5.0 → scandeval-16.6.0}/CODE_OF_CONDUCT.md +0 -0
  64. {scandeval-16.5.0 → scandeval-16.6.0}/CONTRIBUTING.md +0 -0
  65. {scandeval-16.5.0 → scandeval-16.6.0}/Dockerfile.cuda +0 -0
  66. {scandeval-16.5.0 → scandeval-16.6.0}/LICENSE +0 -0
  67. {scandeval-16.5.0 → scandeval-16.6.0}/NEW_DATASET_GUIDE.md +0 -0
  68. {scandeval-16.5.0 → scandeval-16.6.0}/docs/CNAME +0 -0
  69. {scandeval-16.5.0 → scandeval-16.6.0}/docs/README.md +0 -0
  70. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/README.md +0 -0
  71. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/bulgarian.md +0 -0
  72. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/czech.md +0 -0
  73. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/danish.md +0 -0
  74. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/dutch.md +0 -0
  75. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/english.md +0 -0
  76. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/estonian.md +0 -0
  77. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/faroese.md +0 -0
  78. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/finnish.md +0 -0
  79. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/french.md +0 -0
  80. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/german.md +0 -0
  81. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/greek.md +0 -0
  82. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/icelandic.md +0 -0
  83. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/italian.md +0 -0
  84. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/latvian.md +0 -0
  85. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/lithuanian.md +0 -0
  86. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/norwegian.md +0 -0
  87. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/polish.md +0 -0
  88. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/portuguese.md +0 -0
  89. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/slovak.md +0 -0
  90. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/spanish.md +0 -0
  91. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/swedish.md +0 -0
  92. {scandeval-16.5.0 → scandeval-16.6.0}/docs/datasets/ukrainian.md +0 -0
  93. {scandeval-16.5.0 → scandeval-16.6.0}/docs/extras/radial_plotter.md +0 -0
  94. {scandeval-16.5.0 → scandeval-16.6.0}/docs/faq.md +0 -0
  95. {scandeval-16.5.0 → scandeval-16.6.0}/docs/gfx/favicon.png +0 -0
  96. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/czech.md +0 -0
  97. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/danish.md +0 -0
  98. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/dutch.md +0 -0
  99. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/english.md +0 -0
  100. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/estonian.md +0 -0
  101. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/faroese.md +0 -0
  102. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/finnish.md +0 -0
  103. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/french.md +0 -0
  104. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/german.md +0 -0
  105. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/icelandic.md +0 -0
  106. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/italian.md +0 -0
  107. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/latvian.md +0 -0
  108. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/lithuanian.md +0 -0
  109. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/norwegian.md +0 -0
  110. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/polish.md +0 -0
  111. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/portuguese.md +0 -0
  112. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/slovak.md +0 -0
  113. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/spanish.md +0 -0
  114. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Monolingual/swedish.md +0 -0
  115. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/baltic.md +0 -0
  116. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/european.md +0 -0
  117. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/finnic.md +0 -0
  118. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/germanic.md +0 -0
  119. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/mainland-scandinavian.md +0 -0
  120. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/Multilingual/romance.md +0 -0
  121. {scandeval-16.5.0 → scandeval-16.6.0}/docs/leaderboards/README.md +0 -0
  122. {scandeval-16.5.0 → scandeval-16.6.0}/docs/methodology.md +0 -0
  123. {scandeval-16.5.0 → scandeval-16.6.0}/docs/python-package.md +0 -0
  124. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/README.md +0 -0
  125. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/common-sense-reasoning.md +0 -0
  126. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/knowledge.md +0 -0
  127. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/linguistic-acceptability.md +0 -0
  128. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/named-entity-recognition.md +0 -0
  129. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/reading-comprehension.md +0 -0
  130. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/sentiment-classification.md +0 -0
  131. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/speed.md +0 -0
  132. {scandeval-16.5.0 → scandeval-16.6.0}/docs/tasks/summarization.md +0 -0
  133. {scandeval-16.5.0 → scandeval-16.6.0}/gfx/euroeval.png +0 -0
  134. {scandeval-16.5.0 → scandeval-16.6.0}/gfx/euroeval.xcf +0 -0
  135. {scandeval-16.5.0 → scandeval-16.6.0}/gfx/scandeval.png +0 -0
  136. {scandeval-16.5.0 → scandeval-16.6.0}/makefile +0 -0
  137. {scandeval-16.5.0 → scandeval-16.6.0}/mkdocs.yaml +0 -0
  138. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/__init__.py +0 -0
  139. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_modules/__init__.py +0 -0
  140. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_modules/base.py +0 -0
  141. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_modules/fresh.py +0 -0
  142. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_modules/hf.py +0 -0
  143. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmark_modules/litellm.py +0 -0
  144. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/benchmarker.py +0 -0
  145. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/caching_utils.py +0 -0
  146. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/callbacks.py +0 -0
  147. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/constants.py +0 -0
  148. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/data_loading.py +0 -0
  149. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/data_models.py +0 -0
  150. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/bulgarian.py +0 -0
  151. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/czech.py +0 -0
  152. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/danish.py +0 -0
  153. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/dutch.py +0 -0
  154. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/english.py +0 -0
  155. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/estonian.py +0 -0
  156. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/faroese.py +0 -0
  157. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/finnish.py +0 -0
  158. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/french.py +0 -0
  159. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/german.py +0 -0
  160. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/greek.py +0 -0
  161. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/icelandic.py +0 -0
  162. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/italian.py +0 -0
  163. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/latvian.py +0 -0
  164. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/lithuanian.py +0 -0
  165. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/norwegian.py +0 -0
  166. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/polish.py +0 -0
  167. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/portuguese.py +0 -0
  168. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/serbian.py +0 -0
  169. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/slovak.py +0 -0
  170. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/spanish.py +0 -0
  171. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/swedish.py +0 -0
  172. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/dataset_configs/ukrainian.py +0 -0
  173. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/enums.py +0 -0
  174. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/finetuning.py +0 -0
  175. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/generation.py +0 -0
  176. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/generation_utils.py +0 -0
  177. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/languages.py +0 -0
  178. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/logging_utils.py +0 -0
  179. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/metrics/__init__.py +0 -0
  180. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/metrics/base.py +0 -0
  181. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/metrics/llm_as_a_judge.py +0 -0
  182. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/metrics/pipeline.py +0 -0
  183. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/metrics/speed.py +0 -0
  184. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/model_cache.py +0 -0
  185. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/model_config.py +0 -0
  186. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/model_loading.py +0 -0
  187. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/__init__.py +0 -0
  188. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/classification.py +0 -0
  189. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/summarization.py +0 -0
  190. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/prompt_templates/token_classification.py +0 -0
  191. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/scores.py +0 -0
  192. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/speed_benchmark.py +0 -0
  193. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/task_group_utils/__init__.py +0 -0
  194. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/task_group_utils/multiple_choice_classification.py +0 -0
  195. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/task_group_utils/question_answering.py +0 -0
  196. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/task_group_utils/sequence_classification.py +0 -0
  197. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/task_group_utils/text_to_text.py +0 -0
  198. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/task_group_utils/token_classification.py +0 -0
  199. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/tokenisation_utils.py +0 -0
  200. {scandeval-16.5.0 → scandeval-16.6.0}/src/scandeval/types.py +0 -0
  201. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/__init__.py +0 -0
  202. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_allocine.py +0 -0
  203. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_angry_tweets.py +0 -0
  204. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_arc.py +0 -0
  205. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_arc_is.py +0 -0
  206. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_belebele.py +0 -0
  207. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_bg_ner_bsnlp.py +0 -0
  208. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_boolq_pt.py +0 -0
  209. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_cinexio.py +0 -0
  210. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_cnn_dailymail.py +0 -0
  211. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_conll_en.py +0 -0
  212. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_conll_es.py +0 -0
  213. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_conll_nl.py +0 -0
  214. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_copa_lv.py +0 -0
  215. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_cross_domain_uk_reviews.py +0 -0
  216. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_cs_gec.py +0 -0
  217. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_csfd_sentiment.py +0 -0
  218. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_csfd_sentiment_sk.py +0 -0
  219. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_czech_news.py +0 -0
  220. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_dane.py +0 -0
  221. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_danish_citizen_tests.py +0 -0
  222. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_dansk.py +0 -0
  223. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_danske_talemaader.py +0 -0
  224. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_danske_talemaader_old.py +0 -0
  225. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_dbrd.py +0 -0
  226. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_dutch_cola.py +0 -0
  227. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_elner.py +0 -0
  228. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_eltec.py +0 -0
  229. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_err_news.py +0 -0
  230. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_estner.py +0 -0
  231. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_estonian_valence.py +0 -0
  232. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_european_values.py +0 -0
  233. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_exam_et.py +0 -0
  234. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_exams_bg.py +0 -0
  235. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_fone.py +0 -0
  236. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_foqa.py +0 -0
  237. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_fosent.py +0 -0
  238. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_fquad.py +0 -0
  239. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_fullstack_ner.py +0 -0
  240. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_germanquad.py +0 -0
  241. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_germeval.py +0 -0
  242. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_global_mmlu.py +0 -0
  243. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_goldenswag.py +0 -0
  244. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_grammar_et.py +0 -0
  245. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_greek_sa.py +0 -0
  246. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_greek_wikipedia.py +0 -0
  247. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_harem.py +0 -0
  248. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_hellaswag.py +0 -0
  249. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_hellaswag_cs.py +0 -0
  250. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_hellaswag_fi.py +0 -0
  251. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_hotter_and_colder_sentiment.py +0 -0
  252. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_ice_linguistic.py +0 -0
  253. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_icelandic_error_corpus.py +0 -0
  254. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_icelandic_knowledge.py +0 -0
  255. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_icelandic_qa.py +0 -0
  256. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_icesum.py +0 -0
  257. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_idioms_no.py +0 -0
  258. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_ilpost_sum.py +0 -0
  259. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_jentoft.py +0 -0
  260. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_kpwr_ner.py +0 -0
  261. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_latvian_lsm_summary.py +0 -0
  262. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_latvian_twitter_sentiment.py +0 -0
  263. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_life_in_the_uk.py +0 -0
  264. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_lithuanian_lrytas_summarization.py +0 -0
  265. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_llmzszl.py +0 -0
  266. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_lr_sum.py +0 -0
  267. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_lt_emotions.py +0 -0
  268. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_lt_history.py +0 -0
  269. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mim_gold_ner.py +0 -0
  270. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mlqa_es.py +0 -0
  271. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mlsum_de.py +0 -0
  272. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mlsum_es.py +0 -0
  273. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mmlu_et.py +0 -0
  274. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_mmlu_lv.py +0 -0
  275. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_multinerd-it.py +0 -0
  276. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_ner_uk.py +0 -0
  277. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_no_cola.py +0 -0
  278. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_no_sammendrag.py +0 -0
  279. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_nor_common_sense_qa.py +0 -0
  280. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_nordjylland_news.py +0 -0
  281. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_norec.py +0 -0
  282. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_norglm_multiqa.py +0 -0
  283. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_norglm_multisum.py +0 -0
  284. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_norne.py +0 -0
  285. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_norquad.py +0 -0
  286. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_nqii.py +0 -0
  287. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_nrk_quiz_qa.py +0 -0
  288. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_orange_sum.py +0 -0
  289. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_personal_sum.py +0 -0
  290. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_polemo2.py +0 -0
  291. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_poner.py +0 -0
  292. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_poquad.py +0 -0
  293. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_psc.py +0 -0
  294. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_publico.py +0 -0
  295. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_rrn.py +0 -0
  296. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_sb10k.py +0 -0
  297. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_scandiqa.py +0 -0
  298. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_scandisent_fi.py +0 -0
  299. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_schibsted.py +0 -0
  300. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_sentiment_headlines_es.py +0 -0
  301. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_sentipolc16.py +0 -0
  302. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_skolprov.py +0 -0
  303. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_sqad.py +0 -0
  304. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_squad.py +0 -0
  305. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_squad_it.py +0 -0
  306. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_squad_nl.py +0 -0
  307. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_squad_nl_old.py +0 -0
  308. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_sst2_pt.py +0 -0
  309. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_sst5.py +0 -0
  310. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_suc3.py +0 -0
  311. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_swedn.py +0 -0
  312. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_swerec.py +0 -0
  313. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_trivia_et.py +0 -0
  314. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_turku_ner_fi.py +0 -0
  315. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_tydiqa_fi.py +0 -0
  316. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_umimeto_qa.py +0 -0
  317. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_uner_sk.py +0 -0
  318. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_uner_sr.py +0 -0
  319. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_wiki_lingua_nl.py +0 -0
  320. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_wikineural-it.py +0 -0
  321. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_winogrande_et.py +0 -0
  322. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_winogrande_is.py +0 -0
  323. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_xlsum_fi.py +0 -0
  324. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/create_xquad.py +0 -0
  325. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/fix_dot_env_file.py +0 -0
  326. {scandeval-16.5.0 → scandeval-16.6.0}/src/scripts/versioning.py +0 -0
  327. {scandeval-16.5.0 → scandeval-16.6.0}/tests/__init__.py +0 -0
  328. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_benchmark_config_factory.py +0 -0
  329. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_benchmark_modules/__init__.py +0 -0
  330. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_benchmark_modules/test_hf.py +0 -0
  331. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_benchmarker.py +3 -3
  332. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_callbacks.py +0 -0
  333. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_constants.py +0 -0
  334. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_data_models.py +0 -0
  335. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_dataset_configs.py +0 -0
  336. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_enums.py +0 -0
  337. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_exceptions.py +0 -0
  338. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_finetuning.py +0 -0
  339. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_languages.py +0 -0
  340. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_model_config.py +0 -0
  341. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_model_loading.py +0 -0
  342. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scores.py +0 -0
  343. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/__init__.py +0 -0
  344. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/__init__.py +0 -0
  345. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_create_scala.py +0 -0
  346. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_data/de_gsd-ud-train.conllu.adp_det +0 -0
  347. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_data/empty.file +0 -0
  348. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_data/en_gum-ud-train.conllu.case +0 -0
  349. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_data/pl_pdb-ud-train.conllu.aux_clitic_01 +0 -0
  350. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_data/pl_pdb-ud-train.conllu.aux_clitic_02 +0 -0
  351. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_scripts/test_create_scala/test_data/pl_pdb-ud-train.conllu.aux_clitic_03 +0 -0
  352. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_speed_benchmark.py +0 -0
  353. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_types.py +0 -0
  354. {scandeval-16.5.0 → scandeval-16.6.0}/tests/test_utils.py +0 -0
@@ -25,6 +25,7 @@ body:
  description: What languages is the dataset in?
  options:
  - label: Bulgarian
+ - label: Croatian
  - label: Czech
  - label: Danish
  - label: Dutch
@@ -44,6 +45,7 @@ body:
  - label: Portuguese
  - label: Serbian
  - label: Slovak
+ - label: Slovenian
  - label: Spanish
  - label: Swedish
  - label: Ukrainian
@@ -23,7 +23,7 @@ body:
  - label: Hellenic languages (Greek)
  - label: Romance languages (French, Italian, Portuguese, Spanish)
  - label: Scandinavian languages (Danish, Faroese, Icelandic, Norwegian, Swedish)
- - label: Slavic languages (Bulgarian, Czech, Polish, Serbian, Slovak, Ukrainian)
+ - label: Slavic languages (Bulgarian, Croatian, Czech, Polish, Serbian, Slovak, Slovenian, Ukrainian)
  - label: West Germanic languages (Dutch, English, German)
  validations:
  required: true
@@ -121,6 +121,7 @@ gfx/euroeval-*.png
  gfx/euroeval-*.jpeg
  gfx/euroeval-*.jpg
  gfx/euroeval-*.xcf
+ gfx/different-poses/*
 
  # Contracts
  generated_contracts/
@@ -10,7 +10,7 @@ repos:
  - id: trailing-whitespace
  - id: debug-statements
  - repo: https://github.com/astral-sh/ruff-pre-commit
- rev: v0.14.2
+ rev: v0.14.3
  hooks:
  - id: ruff
  args:
@@ -7,10 +7,39 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [v16.6.0] - 2025-11-04
11
+
12
+ ### Added
13
+
14
+ - Added support for Croatian 🇭🇷! This includes the sentiment classification dataset
15
+ MMS-hr, the linguistic acceptability dataset ScaLA-hr, the named entity recognition
16
+ dataset WikiANN-hr, the reading comprehension dataset MultiWikiQA-hr, the knowledge
17
+ dataset MMLU-hr, and the common-sense reasoning dataset Winogrande-hr.
18
+ - Added a system dependency check for `nvcc` in the `VLLMModel.__init__` method to
19
+ ensure the CUDA Toolkit is installed. Raises an error with installation instructions
20
+ if NVCC is not available in the system PATH.
21
+
22
+ ### Changed
23
+
24
+ - Removed the `--custom-datasets-file` argument; the custom datasets file is now always
25
+ `custom_datasets.py` in the current working directory. This lets us read the file
26
+ automatically, so custom datasets can be evaluated by name alone when using the
27
+ `Benchmarker` API.
28
+
29
+ ### Fixed
30
+
31
+ - Structured generation is now disabled for classification tasks when logprobs are
32
+ disabled, forcing evaluation on the raw outputs using word edit distance instead.
33
+
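Without structured generation, a free-form model output has to be mapped back to one of the allowed labels. The block below is a minimal sketch of that idea. It assumes character-level Levenshtein distance between the lower-cased, stripped output and each label word. The function names are hypothetical, and EuroEval's exact normalisation and distance computation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions and substitutions cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def closest_label(raw_output: str, labels: list[str]) -> str:
    """Map a free-form model output to the candidate label with the smallest edit distance."""
    cleaned = raw_output.strip().lower()
    return min(labels, key=lambda label: levenshtein(cleaned, label.lower()))


print(closest_label(" Pozitivno.", ["pozitivno", "neutralno", "negativno"]))  # pozitivno
```

The distance-based fallback tolerates small deviations such as capitalisation, trailing punctuation or a slightly different inflection, which constrained decoding would otherwise have prevented.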
10
34
  ## [v16.5.0] - 2025-10-28
11
35
 
12
36
  ### Added
13
37
 
38
+ - Added support for Slovenian 🇸🇮! This includes the sentiment classification dataset
39
+ Sentinews, the linguistic acceptability dataset ScaLA-sl, the named entity recognition
40
+ dataset ssj500k-NER, the reading comprehension
41
+ dataset MultiWikiQA-sl, the knowledge dataset MMLU-sl, and the common-sense reasoning
42
+ dataset Winogrande-sl.
14
43
  - Added better support for evaluating on custom datasets, by allowing `DatasetConfig`
15
44
  objects directly in the `Benchmarker.benchmark` method. We also support custom
16
45
  datasets with the CLI, by simply defining the desired `DatasetConfig`s in a
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: ScandEval
3
- Version: 16.5.0
3
+ Version: 16.6.0
4
4
  Summary: The robust European language model benchmark.
5
5
  Project-URL: Repository, https://github.com/EuroEval/EuroEval
6
6
  Project-URL: Issues, https://github.com/EuroEval/EuroEval/issues
@@ -92,7 +92,7 @@ ______________________________________________________________________
92
92
  [![Second paper](https://img.shields.io/badge/arXiv-2406.13469-b31b1b.svg)](https://arxiv.org/abs/2406.13469)
93
93
  [![License](https://img.shields.io/github/license/EuroEval/EuroEval)](https://github.com/EuroEval/EuroEval/blob/main/LICENSE)
94
94
  [![LastCommit](https://img.shields.io/github/last-commit/EuroEval/EuroEval)](https://github.com/EuroEval/EuroEval/commits/main)
95
- [![Code Coverage](https://img.shields.io/badge/Coverage-76%25-yellowgreen.svg)](https://github.com/EuroEval/EuroEval/tree/main/tests)
95
+ [![Code Coverage](https://img.shields.io/badge/Coverage-74%25-yellow.svg)](https://github.com/EuroEval/EuroEval/tree/main/tests)
96
96
  [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](https://github.com/EuroEval/EuroEval/blob/main/CODE_OF_CONDUCT.md)
97
97
 
98
98
  ## Maintainer
@@ -295,6 +295,7 @@ from euroeval.languages import ENGLISH
295
295
 
296
296
  MY_CONFIG = DatasetConfig(
297
297
  name="my-dataset",
298
+ pretty_name="My Dataset",
298
299
  source=dict(train="train.csv", val="val.csv", test="test.csv"),
299
300
  task=TEXT_CLASSIFICATION,
300
301
  languages=[ENGLISH],
@@ -385,6 +386,7 @@ sql_generation_task = Task(
385
386
 
386
387
  MY_SQL_DATASET = DatasetConfig(
387
388
  name="my-sql-dataset",
389
+ pretty_name="My SQL Dataset",
388
390
  source=dict(train="train.csv", val="val.csv", test="test.csv"),
389
391
  task=sql_generation_task,
390
392
  languages=[ENGLISH],
@@ -20,7 +20,7 @@ ______________________________________________________________________
20
20
  [![Second paper](https://img.shields.io/badge/arXiv-2406.13469-b31b1b.svg)](https://arxiv.org/abs/2406.13469)
21
21
  [![License](https://img.shields.io/github/license/EuroEval/EuroEval)](https://github.com/EuroEval/EuroEval/blob/main/LICENSE)
22
22
  [![LastCommit](https://img.shields.io/github/last-commit/EuroEval/EuroEval)](https://github.com/EuroEval/EuroEval/commits/main)
23
- [![Code Coverage](https://img.shields.io/badge/Coverage-76%25-yellowgreen.svg)](https://github.com/EuroEval/EuroEval/tree/main/tests)
23
+ [![Code Coverage](https://img.shields.io/badge/Coverage-74%25-yellow.svg)](https://github.com/EuroEval/EuroEval/tree/main/tests)
24
24
  [![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg)](https://github.com/EuroEval/EuroEval/blob/main/CODE_OF_CONDUCT.md)
25
25
 
26
26
  ## Maintainer
@@ -223,6 +223,7 @@ from euroeval.languages import ENGLISH
223
223
 
224
224
  MY_CONFIG = DatasetConfig(
225
225
  name="my-dataset",
226
+ pretty_name="My Dataset",
226
227
  source=dict(train="train.csv", val="val.csv", test="test.csv"),
227
228
  task=TEXT_CLASSIFICATION,
228
229
  languages=[ENGLISH],
@@ -313,6 +314,7 @@ sql_generation_task = Task(
313
314
 
314
315
  MY_SQL_DATASET = DatasetConfig(
315
316
  name="my-sql-dataset",
317
+ pretty_name="My SQL Dataset",
316
318
  source=dict(train="train.csv", val="val.csv", test="test.csv"),
317
319
  task=sql_generation_task,
318
320
  languages=[ENGLISH],
@@ -0,0 +1,460 @@
1
+ # 🇭🇷 Croatian
2
+
3
+ This is an overview of all the datasets used in the Croatian part of EuroEval. The
4
+ datasets are grouped by their task - see the [task overview](/tasks) for more
5
+ information about what these constitute.
6
+
7
+ ## Sentiment Classification
8
+
9
+ ### MMS-hr
10
+
11
+ This dataset was published in [this paper](https://doi.org/10.48550/arXiv.2306.07902).
12
+ The corpus consists of 79 datasets, manually selected according to strict quality
13
+ criteria from over 350 datasets reported in the scientific literature.
14
+
15
+ The original dataset contains a single split with 77,594 Croatian samples.
16
+ We use 1,024 / 256 / 2,048 samples for our training, validation, and test splits,
17
+ respectively.
18
+ We used stratified sampling on the label column of the original dataset, so that all
19
+ splits have matching label distributions.
20
+
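A stratified split of this kind can be sketched in plain Python. The version below illustrates the idea under stated assumptions and is not the procedure EuroEval actually ran. Each sample receives a key based on its shuffled rank within its label group, so every label is spread evenly over a single ordering. That ordering is then cut into consecutive 1,024 / 256 / 2,048 slices, so each split approximately keeps the original label proportions. The function name, the `label` key and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(
    samples: list[dict],
    sizes: tuple[int, ...] = (1024, 256, 2048),
    label_key: str = "label",
    seed: int = 4242,
) -> list[list[dict]]:
    """Split samples into consecutive chunks that each roughly keep the label proportions."""
    if len(samples) < sum(sizes):
        raise ValueError("Not enough samples for the requested split sizes.")
    rng = random.Random(seed)

    # Group the samples by label.
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[label_key]].append(sample)

    # Give every sample a key in [0, 1) based on its (shuffled) rank within its label
    # group, so that each label is spread evenly over the merged ordering.
    keyed = []
    for group in groups.values():
        rng.shuffle(group)
        for rank, sample in enumerate(group):
            keyed.append(((rank + rng.random()) / len(group), sample))
    keyed.sort(key=lambda pair: pair[0])
    ordered = [sample for _, sample in keyed]

    # Cut the ordering into consecutive slices, e.g. train / val / test.
    splits, start = [], 0
    for size in sizes:
        splits.append(ordered[start : start + size])
        start += size
    return splits
```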
21
+ Here are a few examples from the training split:
22
+
23
+ ```json
24
+ {
25
+ "text": "ali kako mozete biti ovako trijezni u ovo doba ajde molim vas",
26
+ "label": "negative"
27
+ }
28
+ ```
29
+
30
+ ```json
31
+ {
32
+ "text": "RT @bsimun: Thompson okupio 100 000 ljudi u Čavoglavama. Sad će valjda platiti porez. #domoljublje #DanPobjede",
33
+ "label": "neutral"
34
+ }
35
+ ```
36
+
37
+ ```json
38
+ {
39
+ "text": "\n Šesti \"El Clásico\" za\n \n Luku Modrića\n \n bio je i najdraži. Real je dobio Barçu 3-1, a hrvatski veznjak bio je jedan od najboljih igrača \"kraljeva\".\n\n\n\n - Otkako sam u Madridu, meni je to djelovalo kao\n \n najuvjerljivija demonstracija moći\n \n . Barça je izgledala manje moćno jer je Real odigrao impresivno. Meni ta pobjeda više govori o snazi naše momčadi, o potvrdi kako forma koju iskazujemo već osam-devet utakmica nije slučajna - rekao je Luka za\n \n SN\n \n .\n\n\n - Imali su psihološku prednost zbog stanja na ljestvici i manjeg imperativa. Zato je\n \n Realov uspjeh impresivan\n \n , tim prije što smo gubili 0-1 - dodao je.\n\n\n\n Izvorni članak pročitajte u\n \n Sportskim novostima\n \n .\n \n\n\n Pohvalio suigrače\n \n\n\n -\n \n Čudesna utakmica\n \n cijele momčadi i pobjeda protiv Barcelone. Ajmo, halá Madrid! - napisao je Modrić na društvenim mrežama.\n \n",
40
+ "label": "positive"
41
+ }
42
+ ```
43
+
44
+ When evaluating generative models, we use the following setup (see the
45
+ [methodology](/methodology) for more information on how these are used):
46
+
47
+ - Number of few-shot examples: 12
48
+ - Prefix prompt:
49
+
50
+ ```text
51
+ Slijede dokumenti i njihova osjetila, koja mogu biti pozitivno, neutralno ili negativno.
52
+ ```
53
+
54
+ - Base prompt template:
55
+
56
+ ```text
57
+ Dokument: {text}
58
+ Osjetilo: {label}
59
+ ```
60
+
61
+ - Instruction-tuned prompt template:
62
+
63
+ ```text
64
+ Dokument: {text}
65
+
66
+ Klasificirajte osjećaj u dokumentu. Odgovorite samo s pozitivno, neutralno, ili negativno, i ništa drugo.
67
+ ```
68
+
69
+ - Label mapping:
70
+ - `positive` ➡️ `pozitivno`
71
+ - `neutral` ➡️ `neutralno`
72
+ - `negative` ➡️ `negativno`
73
+
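To make the roles of the prefix prompt, base prompt template and label mapping concrete, here is a rough sketch of how a few-shot prompt for a base (non-instruction-tuned) model could be assembled from them. It assumes that the parts are joined with blank lines and that the final document is left unlabelled for the model to complete. The exact layout EuroEval uses is described on the methodology page and may differ. The real setup uses 12 few-shot examples, while this sketch accepts any number.

```python
PREFIX = (
    "Slijede dokumenti i njihova osjetila, koja mogu biti pozitivno, neutralno ili "
    "negativno."
)
TEMPLATE = "Dokument: {text}\nOsjetilo: {label}"
LABEL_MAPPING = {"positive": "pozitivno", "neutral": "neutralno", "negative": "negativno"}


def build_few_shot_prompt(examples: list[dict], new_text: str) -> str:
    """Assemble prefix, labelled few-shot examples and an unlabelled query (sketch)."""
    shots = [
        TEMPLATE.format(text=ex["text"], label=LABEL_MAPPING[ex["label"]])
        for ex in examples
    ]
    # The query reuses the template with an empty label for the model to fill in.
    query = TEMPLATE.format(text=new_text, label="").rstrip()
    return "\n\n".join([PREFIX, *shots, query])


examples = [
    {
        "text": "ali kako mozete biti ovako trijezni u ovo doba ajde molim vas",
        "label": "negative",
    }
]
print(build_few_shot_prompt(examples, "Sjajna utakmica!"))
```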
74
+ You can evaluate this dataset directly as follows:
75
+
76
+ ```bash
77
+ euroeval --model <model-id> --dataset mms-hr
78
+ ```
79
+
80
+ ## Named Entity Recognition
81
+
82
+ ### WikiANN-hr
83
+
84
+ This dataset was published in [this paper](https://aclanthology.org/P17-1178/) and is
85
+ part of a cross-lingual named entity recognition framework for 282 languages from
86
+ Wikipedia. It uses silver-standard annotations transferred from English through
87
+ cross-lingual links and performs both name tagging and linking to an English knowledge
88
+ base.
89
+
90
+ The original full dataset consists of 10,000 / 10,000 / 10,000 samples for the training,
91
+ validation and test splits, respectively. We use 1,024 / 256 / 2,048 samples for our
92
+ training, validation and test splits, respectively. All the new splits are subsets of
93
+ the original splits.
94
+
95
+ Here are a few examples from the training split:
96
+
97
+ ```json
98
+ {
99
+ "tokens": ["Ubrzo", "su", "uslijedile", "narudžbe", "iz", "cijele", "Britanske", "zajednice", "naroda", "."],
100
+ "labels": ["O", "O", "O", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
101
+ }
102
+ ```
103
+
104
+ ```json
105
+ {
106
+ "tokens": ["``", "(", "Cole", "Porter", ")"],
107
+ "labels": ["O", "O", "B-PER", "I-PER", "O"]
108
+ }
109
+ ```
110
+
111
+ ```json
112
+ {
113
+ "tokens": ["'", "''", "La", "Liga", "2009.", "/", "10", "."],
114
+ "labels": ["O", "O", "B-ORG", "I-ORG", "O", "O", "O", "O"]
115
+ }
116
+ ```
117
+
118
+ When evaluating generative models, we use the following setup (see the
119
+ [methodology](/methodology) for more information on how these are used):
120
+
121
+ - Number of few-shot examples: 8
122
+ - Prefix prompt:
123
+
124
+ ```text
125
+ Sljedeće su rečenice i JSON rječnici s imenicama koje se pojavljuju u rečenicama.
126
+ ```
127
+
128
+ - Base prompt template:
129
+
130
+ ```text
131
+ Rečenica: {text}
132
+ Imenovane entiteti: {label}
133
+ ```
134
+
135
+ - Instruction-tuned prompt template:
136
+
137
+ ```text
138
+ Rečenica: {text}
139
+
140
+ Identificirajte imenovane entitete u rečenici. Prikažite ih kao JSON rječnik s ključevima 'osoba', 'mjesto', 'organizacija' i 'razno'. Vrijednosti trebaju biti popisi imenovanih entiteta navedenog tipa, točno kako se pojavljuju u rečenici.
141
+ ```
142
+
143
+ - Label mapping:
144
+ - `B-PER` ➡️ `osoba`
145
+ - `I-PER` ➡️ `osoba`
146
+ - `B-LOC` ➡️ `mjesto`
147
+ - `I-LOC` ➡️ `mjesto`
148
+ - `B-ORG` ➡️ `organizacija`
149
+ - `I-ORG` ➡️ `organizacija`
150
+ - `B-MISC` ➡️ `razno`
151
+ - `I-MISC` ➡️ `razno`
152
+
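The label mapping above implies that BIO-tagged tokens are converted into the JSON dictionary that the prompt asks for. The block below is a minimal sketch of that conversion. It assumes that multi-token entities are joined with single spaces and that an `I-` tag without a matching preceding tag starts a new entity. The function name is hypothetical.

```python
# Entity types from the label mapping above, keyed by the BIO suffix.
LABEL_MAPPING = {"PER": "osoba", "LOC": "mjesto", "ORG": "organizacija", "MISC": "razno"}


def bio_to_entity_dict(tokens: list[str], labels: list[str]) -> dict[str, list[str]]:
    """Group BIO-tagged tokens into the JSON dictionary format used in the prompts."""
    entities = {name: [] for name in LABEL_MAPPING.values()}
    span_tokens = []  # tokens of the entity currently being built
    span_type = None  # BIO suffix of that entity, e.g. "ORG"

    def close_span() -> None:
        nonlocal span_tokens, span_type
        if span_type is not None:
            entities[LABEL_MAPPING[span_type]].append(" ".join(span_tokens))
        span_tokens, span_type = [], None

    for token, label in zip(tokens, labels):
        tag_type = label[2:]
        if label.startswith("B-") or (label.startswith("I-") and tag_type != span_type):
            # A new entity starts here.
            close_span()
            span_tokens, span_type = [token], tag_type
        elif label.startswith("I-"):
            # Continuation of the current entity.
            span_tokens.append(token)
        else:
            # "O" tag: outside any entity.
            close_span()
    close_span()
    return entities
```

Applied to the first training example above, this yields `{"osoba": [], "mjesto": [], "organizacija": ["Britanske zajednice naroda"], "razno": []}`.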
153
+ You can evaluate this dataset directly as follows:
154
+
155
+ ```bash
156
+ euroeval --model <model-id> --dataset wikiann-hr
157
+ ```
158
+
159
+ ## Linguistic Acceptability
160
+
161
+ ### ScaLA-hr
162
+
163
+ This dataset was published in [this paper](https://aclanthology.org/2023.nodalida-1.20/)
164
+ and was automatically created from the [Croatian Universal Dependencies
165
+ treebank](https://github.com/UniversalDependencies/UD_Croatian-SET) by assuming that the
166
+ documents in the treebank are correct, and corrupting the samples to create
167
+ grammatically incorrect samples. The corruptions were made either by removing a word
168
+ from a sentence or by swapping two neighbouring words in a sentence. To ensure that
169
+ this does indeed make the sentence ungrammatical, a set of rules was applied to
170
+ the part-of-speech tags of the words in the sentence.
171
+
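The two corruption operations can be sketched as follows. This is a simplification with several assumptions. It splits on whitespace rather than using the treebank's tokenisation. It picks between the two operations with equal probability. It also omits the part-of-speech rules that ScaLA uses to confirm that the result is ungrammatical, so, for example, swapping two identical neighbouring words would go undetected here. The function name is hypothetical.

```python
import random


def corrupt_sentence(sentence: str, rng: random.Random) -> str:
    """Delete one word or swap two neighbouring words (simplified ScaLA-style corruption).

    The part-of-speech rules that check the result is ungrammatical are omitted.
    """
    words = sentence.split()
    if len(words) < 2:
        raise ValueError("Need at least two words to corrupt a sentence.")
    if rng.random() < 0.5:
        # Corruption 1: remove a randomly chosen word.
        del words[rng.randrange(len(words))]
    else:
        # Corruption 2: swap a randomly chosen word with its right-hand neighbour.
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)
```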
172
+ The original full dataset consists of 1,024 / 256 / 2,048 samples for training,
173
+ validation and testing, respectively (so 3,328 samples used in total). These splits are
174
+ used as-is in the framework.
175
+
176
+ Here are a few examples from the training split:
177
+
178
+ ```json
179
+ {
180
+ "text": "Nakon kratke intervencije, tijekom koje sam saznala kada se taj osjećaj prvog puta pojavio i zbog čega, sve je nestalo i već mjesecima živim bez opterećenja koji me pratilo cijelog života.",
181
+ "label": "correct"
182
+ }
183
+ ```
184
+
185
+ ```json
186
+ {
187
+ "text": "Svaki od tih sklopova, i dijelova mora biti homologiran i sukladan s ostalima.",
188
+ "label": "incorrect"
189
+ }
190
+ ```
191
+
192
+ ```json
193
+ {
194
+ "text": "Prvi među njima je Laurent Blanc, koji drži Romu na čekanju, a s Parkom prinčeva povezivan je i Fabio Capello.",
195
+ "label": "correct"
196
+ }
197
+ ```
198
+
199
+ When evaluating generative models, we use the following setup (see the
200
+ [methodology](/methodology) for more information on how these are used):
201
+
202
+ - Number of few-shot examples: 12
203
+ - Prefix prompt:
204
+
205
+ ```text
206
+ Sljedeće su rečenice i jesu li gramatički ispravne.
207
+ ```
208
+
209
+ - Base prompt template:
210
+
211
+ ```text
212
+ Rečenica: {text}
213
+ Gramatički ispravna: {label}
214
+ ```
215
+
216
+ - Instruction-tuned prompt template:
217
+
218
+ ```text
219
+ Rečenica: {text}
220
+
221
+ Odredite je li rečenica gramatički ispravna ili ne. Odgovorite s 'da' ako je ispravna, i s 'ne' ako nije. Odgovorite samo tom riječju i ničim drugim.
222
+ ```
223
+
224
+ - Label mapping:
225
+ - `correct` ➡️ `da`
226
+ - `incorrect` ➡️ `ne`
227
+
228
+ You can evaluate this dataset directly as follows:
229
+
230
+ ```bash
231
+ euroeval --model <model-id> --dataset scala-hr
232
+ ```
233
+
234
+ ## Reading Comprehension
235
+
236
+ ### MultiWikiQA-hr
237
+
238
+ This dataset was published in [this paper](https://doi.org/10.48550/arXiv.2509.04111)
239
+ and contains Wikipedia articles with LLM-generated questions and answers in 300+
240
+ languages.
241
+
242
+ The original full dataset consists of 5,000 samples in a single split. We use a 1,024 /
243
+ 256 / 2,048 split for training, validation and testing, respectively, sampled randomly.
244
+
245
+ Here are a few examples from the training split:
246
+
247
+ ```json
248
+ {
249
+ "context": "Arkadija je pokrajina u središnjem dijelu Peloponeza, Grčka.\n\nOsnovni podaci\nGlavni grad Arkadije je Tripoli; populacija pokrajine je 100 611 (podatci iz 2005.), na 38. mjestu u Grčkoj; Površina joj je 4419 km² što je čini 5. po veličini; Gustoća naseljenosti je 22,8/km²; sastoji se od 4 provincije, 22 općine i 1 županije (okruga); poštanski broj je 22, registracijske pločice s oznakom TP; službena web stranica je www.arcadia.gr.\n\nOpćine\n\nPovijest\n\nGradska naselja u Arkadiji su se razvila razmjerno kasno (Mantineja, Orhomen, Tegeja). Bili su saveznici Sparte do sloma njezine hegemonije (371. pr. Kr.), otada tvore samostalan savez pod vodstvom novoosnovanog polisa Megalopola. Samostalnost saveza dokrajčili su Makedonci. U 3. st. pr. Kr. dio gradova u Arkadiji pristupa Ahajskom, a dio Etolskom savezu. Pod rimskom vlašću od 168. pr. Kr.\n\nSimbolika Arkadije\n\nPrema grčkoj tradiciji Arkadija je postojbina Pana, domovina jednostavnih, priprostih i poštenih ljudi (pastira). Kao simbol nepokvarena i idilična života javlja se tzv. bukolska (pastirska) poezija. Obnovljena u doba renesanse pod utjecajem idiličnog romana "Arkadija" talijanskog pisca J. Sannazzara. \n\nPo Arkadiji je ime dobila i čuvena knjižnica Akademija (Accademia degli Arcadi), osnovana 1690. g. u Rimu, a pod njenim utjecajem osnovana su i mnoga slična društva diljem Italije i hrvatske obale (Zadar, Split, Dubrovnik).\n\nVanjske poveznice\n\nPan-Arkadski Kongres.\nhttp://www.arcadians.gr\nSveučilište u Patrasu, Arkadia-Project.\nArkadija, Grčka.\nNepoznata Arkadija.\nhttp://flyingbrick.freeyellow.com/arcadia.htm \nhttp://www.arcadianet.gr/en/\nhttp://www.tripolis.gr\n\nZemljopis Grčke",
250
+ "question": "Koji je naziv za pjesništvo pastira koje simbolizira neiskvareni i idiličan život?",
251
+ "answers": {
252
+ "answer_start": [1037],
253
+ "text": ["bukolska"]
254
+ }
255
+ }
256
+ ```
257
+
258
+ ```json
259
+ {
260
+ "context": "Hans Emil Alexander Gaede (Kolberg, 19. veljače 1852. - Freiburg im Breisgau, 16. rujna 1916.) je bio njemački general i vojni zapovjednik. Tijekom Prvog svjetskog rata zapovijedao je Armijskim odjelom B na Zapadnom bojištu.\n\nVojna karijera\nHans Gaede rođen je 19. veljače 1852. u Kolbergu (danas Kolobrzeg u Poljskoj). Sin je Alexandera Gaede i Emilie Franke. Gaede je u prusku vojsku stupio 1870. godine, te je sudjelovao u Prusko-francuskom ratu u kojem je i ranjen. Nakon rata pohađa Prusku vojnu akademiju, te nakon završetka iste služi u raznim vojnim jedinicama kao u i pruskom ministarstvu rata. Čin pukovnika dostigao 1897. godine kada postaje zapovjednikom i tvrđave Thorn. General bojnikom je postao 1900. godine, dok je 1904. godine promaknut u čin general poručnika kada dobiva zapovjedništvo nad 33. pješačkom divizijom smještenom u Metzu koji se tada nalazio u okviru Njemačkog Carstva. Godine 1907. Gaede je stavljen na raspolaganje.\n\nPrvi svjetski rat\nNa početku Prvog svjetskog rata Gaede je reaktiviran, te postaje zamjenikom zapovjednika XIV. korpusa koji je bio u sastavu 7. armije koja se nalazila pod zapovjedništvom Josiasa von Heeringena. U rujnu 1914. postaje zapovjednikom Armijskog odjela Gaede koji je kasnije preimenovan u Armijski odjel B koji je držao front u Gornjem Alzasu. Za zapovijedanje u borbama u Alzasu Gaede je 25. rujna 1915. godine odlikovan ordenom Pour le Mérite. U prosincu 1915. Gaedeu je na Sveučilištu u Freiburgu dodijeljen počasni doktorat.\n\nSmrt\nU rujnu 1916. godine Gaede se teško razbolio zbog čega je 3. rujna 1916. morao napustiti zapovjedništvo armijskog odjela. Umro je 16. rujna 1916. godine u 64. godini života u bolnici Freiburgu im Breisgau od posljedica operacije.\n\nVanjske poveznice\n Hans Gaede na stranici Prussianmachine.com\n Hans Gaede na stranici Deutschland14-18.de\n\nNjemački vojni zapovjednici u Prvom svjetskom ratu",
261
+ "question": "Koju nagradu je Gaede primio 25. rujna 1915.?",
262
+ "answers": {
263
+ "answer_start": [1395],
264
+ "text": ["Pour le Mérite"]
265
+ }
266
+ }
267
+ ```
268
+
269
+ ```json
270
+ {
271
+ "context": "Žiroglavci (Enteropneusta) su u klasičnoj sistematici životinjski razred s manje od 100 poznatih vrsta. Ubraja ih se u kojeno polusvitkovce (Hemichordata) i preko njih u drugousti (Deuterostomia), jer im se tijekom embrionalnog razvoja usta razvijaju a ne proizlaze iz "prausta", prvog otvora ranog embrionalnog životnog stadija, gastrule. Njihovo znanstveno ime znači, što izražava i tradicionalno mišljenje da su oni praoblik svitkovaca, u koje spadaju i kralježnjaci.\n\nNo, mjesto žiroglavaca u sistematici je danas sporno. Tako se razmatra moguća srodnost žiroglavaca ne samo sa svitkovcima, nego i s bodljikašima (Echinodermata) u koje spadaju na primjer zvjezdače (Asteroidea) i ježinci (Echinoidea). Čak se sve više smatra vjerojatnijim da žiroglavci ne čine monofiletsku skupinu, što znači da oni nisu svi potomci istih zajedničkih predaka.\n\nGrađa i izgled\nTijelo žiroglavaca je meko, crvoliko, i osim grube podjele na tri dijela, nesegmentirano. Veličinom su vrlo različiti, neke vrste su duge samo nekoliko milimetara, dok druge mogu biti duge i 2,5 metra. Boja im je različita, od bijele do tamno ljubičaste.\n \nMeđu beskralješnjacima, žiroglavci su neobični jer imaju neke osobine koje su tipične za kralježnjake: \n Njihov živčani sustav satoji se od živčanih vrpci koje se protežu leđnom i trbušnom stranom životinje. U predjelu "glave" i oko crijeva ove dvije živčane vrpce kružno su međusobno povezane a od njih se odvajaju živčane niti koje završavaju u vanjskoj koži. Leđna živčana vrpca smještena je u posebnom naboru. Zbog njegovog nastanka u embrionalnom razvoju ponekad ga se smatra homolognim leđnoj moždini svitkovaca.\n Žiroglavci imaju i do 100 ždrijelnih pukotina koje imaju isto anatomsko porijeklo kao i škrge kod riba. Voda koja im uđe na usni otvor nakon zadržavanja djelića hrane, izlazi iz tijela kroz te pukotine.\n\nHrana, životni prostor i rasprostranjenost\nŽiroglavci se hrane na dva različita načina: ili kopaju kroz sediment morskog dna, što znači da uzimaju mulj dna i probavljaju u njemu sadržan organski sadržaj (kao kišne gliste), ili filtriraju iz vode sadržane djeliće organske materijekao na primjer alge. Zbog toga žive uglavnom u ili neposredno ispod dijela izloženog plimi i oseci, na ili u morskom dnu (bentos) dijelom i do dubine od 5.000 metara, i tamo često žive u kanalićima u obliku slova U. Samo rijetke vrste žive u otvorenom moru (pelagijal). Žiroglavci žive u svim morskim područjima, od tropa pa sve do u polarna područja.\n\nRazmnožavanje\nŽiroglavci su odvojenih spolova, no izgledom se gotovo ne razlikuju. Iz oplođenog jajašca najčešće se prvo razvijaju trepetljive larve vrlo slične larvama bodljikaša. Dio životnog ciklusa prije metamorfoze provodi kao plankton hraneći se djelićima hrane koji se zadrže na trepetljikama larve i od tamo se prenose do ustiju. Kod nekih vrsta razvoj se odvija direktno, bez larvenog stadija.\n\nDrugi projekti i vanjske poveznice\nTaksonomija žiroglavaca (engleski)\nFilogeneza žiroglavaca (engleski)\n\nPolusvitkovci",
272
+ "question": "Koliki je broj poznatih vrsta žiroglavaca?",
273
+ "answers": {
274
+ "answer_start": [75],
275
+ "text": ["manje od 100"]
276
+ }
277
+ }
278
+ ```
279
+
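In these samples, `answer_start` gives the character offset at which each answer text begins in `context`. The block below is a small sanity check for that property. It assumes that offsets are Python string indices (Unicode code points). The function name is hypothetical. The excerpt in the usage example is the opening sentence of the third example above, whose answer `manje od 100` starts at offset 75.

```python
def answers_match_context(sample: dict) -> bool:
    """Check that each answer text occurs in the context at its `answer_start` offset."""
    context = sample["context"]
    answers = sample["answers"]
    return all(
        context[start : start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


sample = {
    # Opening sentence of the third MultiWikiQA-hr example above.
    "context": "Žiroglavci (Enteropneusta) su u klasičnoj sistematici životinjski razred "
    "s manje od 100 poznatih vrsta.",
    "question": "Koliki je broj poznatih vrsta žiroglavaca?",
    "answers": {"answer_start": [75], "text": ["manje od 100"]},
}
print(answers_match_context(sample))  # True
```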
280
+ When evaluating generative models, we use the following setup (see the
281
+ [methodology](/methodology) for more information on how these are used):
282
+
283
+ - Number of few-shot examples: 4
284
+ - Prefix prompt:
285
+
286
+ ```text
287
+ Sljedeći tekstovi sadrže pitanja i odgovore.
288
+ ```
289
+
290
+ - Base prompt template:
291
+
292
+ ```text
293
+ Tekst: {text}
294
+ Pitanje: {question}
295
+ Odgovor s najviše 3 riječi:
296
+ ```
297
+
298
+ - Instruction-tuned prompt template:
299
+
300
+ ```text
301
+ Tekst: {text}
302
+
303
+ Odgovorite na sljedeće pitanje o gornjem tekstu s najviše 3 riječi.
304
+
305
+ Pitanje: {question}
306
+ ```
307
+
308
+ You can evaluate this dataset directly as follows:
309
+
310
+ ```bash
311
+ euroeval --model <model-id> --dataset multi-wiki-qa-hr
312
+ ```
313
+
314
+ ## Knowledge
315
+
316
+ ### MMLU-hr
317
+
318
+ This dataset was published in
319
+ [this paper](https://doi.org/10.48550/arXiv.2410.08928) and is a machine-translated
320
+ version of the English [MMLU dataset](https://openreview.net/forum?id=d7KBjmI3GmQ).
321
+ It features questions within 57 different topics, such as elementary mathematics, US
322
+ history, and law. DeepL was used to translate the dataset to Croatian.
323
+
324
+ The original full dataset consists of 254 / 12,338 samples for validation and
325
+ testing, respectively. These splits were merged, duplicates were removed, and
326
+ new splits were created with 1,024 / 256 / 2,048 samples for training, validation, and
327
+ testing, respectively.
328
+
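The merge, deduplicate and resplit procedure can be sketched as below. The sketch assumes that duplicates are detected by an exact match on the stripped `text` field and that the new splits are drawn after a seeded shuffle. Both assumptions, the function name and the seed are illustrative, since the actual deduplication criterion is not specified here.

```python
import random


def merge_dedupe_resplit(
    *splits: list[dict],
    sizes: tuple[int, ...] = (1024, 256, 2048),
    seed: int = 4242,
) -> list[list[dict]]:
    """Merge splits, drop duplicate questions, shuffle and cut into new splits."""
    seen: set[str] = set()
    unique = []
    for split in splits:
        for sample in split:
            key = sample["text"].strip()  # assumed criterion: exact match on the text
            if key not in seen:
                seen.add(key)
                unique.append(sample)
    if len(unique) < sum(sizes):
        raise ValueError("Not enough unique samples for the requested split sizes.")
    random.Random(seed).shuffle(unique)
    new_splits, start = [], 0
    for size in sizes:
        new_splits.append(unique[start : start + size])
        start += size
    return new_splits
```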
329
+ Here are a few examples from the training split:
330
+
331
+ ```json
332
+ {
333
+ "text": "Kako se odvija lateralna komunikacija u organizaciji?\nIzbori:\na. Informacije se prenose prema gore.\nb. Informacije se prenose prema dolje.\nc. Informacije su dvosmjerni proces.\nd. Informacije se prenose između različitih odjela i funkcija.",
334
+ "label": "d"
335
+ }
336
+ ```
337
+
338
+ ```json
339
+ {
340
+ "text": "Kako astronomi misle da Jupiter generira svoju unutarnju toplinu?\nIzbori:\na. kroz egzotermne kemijske reakcije koje pretvaraju kemijsku potencijalnu energiju u toplinsku energiju\nb. nuklearna fuzija\nc. kontrakcijom koja mijenja gravitacijsku potencijalnu energiju u toplinsku energiju\nd. unutarnje trenje zbog njegove brze rotacije i diferencijalne rotacije",
341
+ "label": "c"
342
+ }
343
+ ```
344
+
345
+ ```json
346
+ {
347
+ "text": "Ako se parabola $y_1 = x^2 + 2x + 7$ i pravac $y_2 = 6x + b$ sijeku u samo jednoj točki, koja je vrijednost $b$?\nIzbori:\na. 7\nb. 3\nc. 12\nd. 4",
348
+ "label": "b"
349
+ }
350
+ ```
351
+
352
+ When evaluating generative models, we use the following setup (see the
353
+ [methodology](/methodology) for more information on how these are used):
354
+
355
+ - Number of few-shot examples: 5
356
+ - Prefix prompt:
357
+
358
+ ```text
359
+ Sljedeća su pitanja s višestrukim izborom (s odgovorima).
360
+ ```
361
+
362
+ - Base prompt template:
363
+
364
+ ```text
365
+ Pitanje: {text}
366
+ Izbori:
367
+ a. {option_a}
368
+ b. {option_b}
369
+ c. {option_c}
370
+ d. {option_d}
371
+ Odgovor: {label}
372
+ ```
373
+
374
+ - Instruction-tuned prompt template:
375
+
376
+ ```text
377
+ Pitanje: {text}
378
+
379
+ Odgovorite na gornje pitanje koristeći 'a', 'b', 'c' ili 'd', i ništa drugo.
380
+ ```
381
+
382
+ You can evaluate this dataset directly as follows:
383
+
384
+ ```bash
385
+ euroeval --model <model-id> --dataset mmlu-hr
386
+ ```
387
+
388
+ ## Common-sense Reasoning
389
+
390
+ ### Winogrande-hr
391
+
392
+ This dataset was published in
393
+ [this paper](https://doi.org/10.48550/arXiv.2506.19468) and is a translated
394
+ and filtered version of the English
395
+ [Winogrande dataset](https://doi.org/10.1145/3474381). DeepL was used to
396
+ translate the dataset to Croatian.
397
+
398
+ The original full dataset consists of 47 / 1,210 samples for training and testing, and
399
+ we use 128 of the test samples for validation, resulting in a 47 / 128 / 1,085 split for
400
+ training, validation and testing, respectively.
401
+
402
+ Here are a few examples from the training split:
403
+
404
+ ```json
405
+ {
406
+ "text": "Nisam mogao kontrolirati vlagu kao što sam kontrolirao kišu, jer je _ dolazila odasvud. Na što se odnosi praznina _?\nIzbori:\na. vlaga\nb. kiša",
407
+ "label": "a"
408
+ }
409
+ ```
410
+
411
+ ```json
412
+ {
413
+ "text": "Jessica je mislila da je Sandstorm najbolja pjesma ikad napisana, ali Patricia ju je mrzila. _ je kupila kartu za jazz koncert. Na što se odnosi praznina _?\nIzbori:\na. Jessica\nb. Patricia",
414
+ "label": "b"
415
+ }
416
+ ```
417
+
418
+ ```json
419
+ {
420
+ "text": "Termostat je pokazivao da je dolje dvadeset stupnjeva hladnije nego gore, pa je Byron ostao u _ jer mu je bilo hladno. Na što se odnosi praznina _?\nIzbori:\na. dolje\nb. gore",
421
+ "label": "b"
422
+ }
423
+ ```
424
+
425
+ When evaluating generative models, we use the following setup (see the
426
+ [methodology](/methodology) for more information on how these are used):
427
+
428
+ - Number of few-shot examples: 5
429
+ - Prefix prompt:
430
+
431
+ ```text
432
+ Sljedeća su pitanja s višestrukim izborom (s odgovorima).
433
+ ```
434
+
435
+ - Base prompt template:
436
+
437
+ ```text
438
+ Pitanje: {text}
439
+ Mogućnosti:
440
+ a. {option_a}
441
+ b. {option_b}
442
+ Odgovor: {label}
443
+ ```
444
+
445
+ - Instruction-tuned prompt template:
446
+
447
+ ```text
448
+ Pitanje: {text}
449
+ Mogućnosti:
450
+ a. {option_a}
451
+ b. {option_b}
452
+
453
+ Odgovorite na gornje pitanje koristeći 'a' ili 'b', i ništa drugo.
454
+ ```
455
+
456
+ You can evaluate this dataset directly as follows:
457
+
458
+ ```bash
459
+ euroeval --model <model-id> --dataset winogrande-hr
460
+ ```
@@ -12,7 +12,7 @@ This dataset was published in [this paper](https://doi.org/10.48550/arXiv.2306.0
12
12
  The corpus consists of 79 datasets, manually selected according to strict quality
13
13
  criteria from over 350 datasets reported in the scientific literature.
14
14
 
15
- The original dataset contains a single split with 6,165,262 samples. We use
15
+ The original dataset contains a single split with 76,368 Serbian samples. We use
16
16
  1,024 / 256 / 2,048 samples for our training, validation and test splits, respectively.
17
17
 
18
18
  Here are a few examples from the training split: