pearmut 0.0.2a2__py3-none-any.whl → 0.0.2a3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,149 @@
+ Metadata-Version: 2.4
+ Name: pearmut
+ Version: 0.0.2a3
+ Summary: A tool for evaluation of model outputs, primarily MT.
+ Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
+ License: MIT
+ Project-URL: Repository, https://github.com/zouharvi/pearmut
+ Project-URL: Issues, https://github.com/zouharvi/pearmut/issues
+ Keywords: evaluation,machine translation,human evaluation,annotation
+ Requires-Python: >=3.12
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: fastapi>=0.110.0
+ Requires-Dist: uvicorn>=0.29.0
+ Requires-Dist: wonderwords>=3.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest; extra == "dev"
+ Requires-Dist: pynpm>=0.3.0; extra == "dev"
+ Dynamic: license-file
+
+ # Pearmut 🍐
+
+ Pearmut is a **Platform for Evaluation and Reviewing of Multilingual Tasks**.
+ It is used to evaluate model outputs, primarily machine translation but also various other NLP tasks.
+ It supports multimodal data (text, video, audio, images) and a variety of annotation protocols (DA, ESA, MQM, paired ESA, etc.).
+ [![build status](https://github.com/zouharvi/pearmut/actions/workflows/ci.yml/badge.svg)](https://github.com/zouharvi/pearmut/actions/workflows/ci.yml)
+
+
+ <img width="1334" height="614" alt="image" src="https://github.com/user-attachments/assets/dde04b98-c724-4226-b926-011a89e9ce31" />
+
+
+ ## Starting a campaign
+
+ First, install the package:
+ ```bash
+ pip install pearmut
+ ```
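+
+ The package metadata also declares a `dev` extra (pytest, pynpm); if you need the development dependencies, installing them with the standard extras syntax should work:
+ ```bash
+ pip install "pearmut[dev]"
+ ```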
+
+ A campaign is described in a single JSON file.
+ The simplest type, where each user has a pre-defined list of tasks (`task-based`), is:
+ ```python
+ {
+     "campaign_id": "my campaign 4",
+     "info": {
+         "type": "task-based",
+         "template": "pointwise",
+         "protocol_score": True,  # collect scores
+         "protocol_error_spans": True,  # collect error spans
+         "protocol_error_categories": False,  # do not collect MQM categories, i.e. ESA
+     },
+     "data": [
+         [...],  # tasks for the first user
+         [...],  # tasks for the second user
+         [...],  # tasks for the third user
+         ...
+     ],
+ }
+ ```
+ In general, a task item can be anything; it is handled by the specific protocol template.
+ For the standard protocols (ESA, DA, MQM), we expect each item to be a list (i.e. a document unit) that looks as follows:
+ ```python
+ [
+     {
+         "src": "A najednou se všechna tato voda naplnila dalšími lidmi a dalšími věcmi.",  # mandatory for ESA/MQM/DA
+         "tgt": "And suddenly all the water became full of other people and other people.",  # mandatory for ESA/MQM/DA
+         ...  # all other keys will be stored, useful for your analysis
+     },
+     {
+         "src": "toto je pokračování stejného dokumentu",
+         "tgt": "this is a continuation of the same document",
+         ...
+     },
+     ...
+ ]
+ ```
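+
+ To tie the two structures together, here is a minimal, hypothetical sketch of building such a `task-based` campaign file in Python (the `doc_id` key and the file name are made up; the field names mirror the examples above):
+ ```python
+ import json
+
+ # one task = one document unit, i.e. a list of segments
+ item = [
+     {
+         "src": "A najednou se všechna tato voda naplnila dalšími lidmi a dalšími věcmi.",
+         "tgt": "And suddenly all the water became full of other people and other people.",
+         "doc_id": "example-doc-1",  # hypothetical extra key, stored as-is
+     },
+ ]
+
+ campaign = {
+     "campaign_id": "my campaign 4",
+     "info": {
+         "type": "task-based",
+         "template": "pointwise",
+         "protocol_score": True,
+         "protocol_error_spans": True,
+         "protocol_error_categories": False,
+     },
+     # one list of tasks per user; here a single user with a single task
+     "data": [[item]],
+ }
+
+ with open("my_campaign_4.json", "w", encoding="utf-8") as f:
+     json.dump(campaign, f, ensure_ascii=False, indent=2)
+ ```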
+
+ We also support dynamic allocation of annotations (`dynamic`, not yet implemented ⚠️), which is more complex and can be ignored for now:
+ ```python
+ {
+     "campaign_id": "my campaign 6",
+     "info": {
+         "type": "dynamic",
+         "template": "kway",
+         "protocol_k": 5,
+         "users": 50,
+     },
+     "data": [...],  # list of all items
+ }
+ ```
+
+ We also support a very simple allocation of annotations (`task-single`, not yet implemented ⚠️), where you pass a single list of all examples to be evaluated and all annotators process it in parallel:
+ ```python
+ {
+     "campaign_id": "my campaign 6",
+     "info": {
+         "type": "task-single",
+         "template": "pointwise",
+         "protocol_score": True,  # collect scores
+         "protocol_error_spans": True,  # collect error spans
+         "protocol_error_categories": False,  # do not collect MQM categories, i.e. ESA
+         "users": 50,
+     },
+     "data": [...],  # list of all items
+ }
+ ```
+
+ To load a campaign into the server, run the following.
+ It will fail if a campaign with the same `campaign_id` already exists, unless you specify `-o/--overwrite`.
+ It will also output a secret management link.
+ ```bash
+ pearmut add my_campaign_4.json
+ ```
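+
+ For example, re-loading an edited campaign file presumably looks like this (assuming the `--overwrite` flag can follow the file argument):
+ ```bash
+ pearmut add my_campaign_4.json --overwrite
+ ```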
+
+ Finally, you can launch the server with:
+ ```bash
+ pearmut run
+ ```
+
+ You can find example campaign files in `data/examples/`.
+
+ ## Development
+
+ To run the server and frontend locally:
+
+ ```bash
+ # watch the frontend for changes (in a separate terminal)
+ npm install web/ --prefix web/
+ npm run watch --prefix web/
+
+ # install the local package as editable
+ pip3 install -e .
+ # add existing data from WMT25; this generates annotation links
+ # and sets up progress/log files in the current working directory
+ pearmut add data/examples/wmt25_#_en-cs_CZ.json
+ pearmut add data/examples/wmt25_#_cs-de_DE.json
+ # show a management link for all loaded campaigns
+ pearmut run
+ ```
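+
+ Since the `dev` extra lists pytest, the test suite should be runnable in the usual way, e.g.:
+ ```bash
+ pip3 install -e ".[dev]"
+ python -m pytest
+ ```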
+
+ ## Misc
+
+ If you use this work in your paper, please cite it as:
+ ```bibtex
+ @misc{zouhar2025pearmut,
+   author={Vilém Zouhar and others},
+   title={Pearmut🍐: Platform for Evaluation and Reviewing of Multilingual Tasks},
+   url={https://github.com/zouharvi/pearmut/},
+   year={2025},
+ }
@@ -10,9 +10,9 @@ pearmut/static/pointwise.bundle.js,sha256=2aGddZQPxdVM73Ln9-ZJen42VeTY5fhMiAYgO1
  pearmut/static/pointwise.html,sha256=7C2IN61js9F2445whHVDptxdIfL-ntw5u4rF2OoBWzo,4436
  pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
  pearmut/static/assets/style.css,sha256=jfETRgVCohe680_30GXxbV4Zq4-B6UlXd5pZXlVLIRs,888
- pearmut-0.0.2a2.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
- pearmut-0.0.2a2.dist-info/METADATA,sha256=-8oKf7lHRMQM1V5wx9chqlh1ZPvVetuGgO8FKVPXY5c,683
- pearmut-0.0.2a2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- pearmut-0.0.2a2.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
- pearmut-0.0.2a2.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
- pearmut-0.0.2a2.dist-info/RECORD,,
+ pearmut-0.0.2a3.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+ pearmut-0.0.2a3.dist-info/METADATA,sha256=MJ93IDtFmE9_C_nFHUC_KmfiN3BRRNKrjEuuzUXasfI,4871
+ pearmut-0.0.2a3.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ pearmut-0.0.2a3.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
+ pearmut-0.0.2a3.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
+ pearmut-0.0.2a3.dist-info/RECORD,,
@@ -1,19 +0,0 @@
- Metadata-Version: 2.4
- Name: pearmut
- Version: 0.0.2a2
- Summary: A tool for evaluation of model outputs, primarily MT.
- Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
- License: MIT
- Project-URL: Repository, https://github.com/zouharvi/pearmut
- Project-URL: Issues, https://github.com/zouharvi/pearmut/issues
- Keywords: evaluation,machine translation,human evaluation,annotation
- Requires-Python: >=3.12
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: fastapi>=0.110.0
- Requires-Dist: uvicorn>=0.29.0
- Requires-Dist: wonderwords>=3.0.0
- Provides-Extra: dev
- Requires-Dist: pytest; extra == "dev"
- Requires-Dist: pynpm>=0.3.0; extra == "dev"
- Dynamic: license-file