pearmut 0.0.2a2__py3-none-any.whl → 0.0.2a3__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pearmut-0.0.2a3.dist-info/METADATA +149 -0
- {pearmut-0.0.2a2.dist-info → pearmut-0.0.2a3.dist-info}/RECORD +6 -6
- pearmut-0.0.2a2.dist-info/METADATA +0 -19
- {pearmut-0.0.2a2.dist-info → pearmut-0.0.2a3.dist-info}/WHEEL +0 -0
- {pearmut-0.0.2a2.dist-info → pearmut-0.0.2a3.dist-info}/entry_points.txt +0 -0
- {pearmut-0.0.2a2.dist-info → pearmut-0.0.2a3.dist-info}/licenses/LICENSE +0 -0
- {pearmut-0.0.2a2.dist-info → pearmut-0.0.2a3.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,149 @@
Metadata-Version: 2.4
Name: pearmut
Version: 0.0.2a3
Summary: A tool for evaluation of model outputs, primarily MT.
Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
License: MIT
Project-URL: Repository, https://github.com/zouharvi/pearmut
Project-URL: Issues, https://github.com/zouharvi/pearmut/issues
Keywords: evaluation,machine translation,human evaluation,annotation
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn>=0.29.0
Requires-Dist: wonderwords>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pynpm>=0.3.0; extra == "dev"
Dynamic: license-file

# Pearmut 🍐

Pearmut is a **Platform for Evaluation and Reviewing of Multilingual Tasks**.
It is used to evaluate model outputs, primarily for machine translation but also for various other NLP tasks.
It supports multimodal content (text, video, audio, images) and a variety of annotation protocols (DA, ESA, MQM, paired ESA, etc.).
[CI status](https://github.com/zouharvi/pearmut/actions/workflows/ci.yml)

<img width="1334" height="614" alt="image" src="https://github.com/user-attachments/assets/dde04b98-c724-4226-b926-011a89e9ce31" />

## Starting a campaign

First, install the package:
```bash
pip install pearmut
```

A campaign is described in a single JSON file.
The simplest setup, where each user has a pre-defined list of tasks (`task-based`), is:
```python
{
    "campaign_id": "my campaign 4",
    "info": {
        "type": "task-based",
        "template": "pointwise",
        "protocol_score": True,             # collect scores
        "protocol_error_spans": True,       # collect error spans
        "protocol_error_categories": False, # do not collect MQM categories, so ESA
    },
    "data": [
        [...],  # tasks for first user
        [...],  # tasks for second user
        [...],  # tasks for third user
        ...
    ],
}
```
In general, the task item can be anything and is handled by the specific protocol template.
For the standard ones (ESA, DA, MQM), we expect each item to be a list (i.e., a document unit) that looks as follows:
```python
[
    {
        "src": "A najednou se všechna tato voda naplnila dalšími lidmi a dalšími věcmi.",  # mandatory for ESA/MQM/DA
        "tgt": "And suddenly all the water became full of other people and other people.",  # mandatory for ESA/MQM/DA
        ...  # all other keys that will be stored, useful for your analysis
    },
    {
        "src": "toto je pokračování stejného dokumentu",
        "tgt": "this is a continuation of the same document",
        ...
    },
    ...
]
```
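
As a concrete illustration of the two structures above, here is a minimal, hypothetical Python sketch that assembles a `task-based` campaign from such document units and writes it out with `json.dump` (Python's `True`/`False` are serialized as JSON `true`/`false`). The variable names, the three-user split, and the output file name are illustrative assumptions, not part of the pearmut API:
```python
# Illustrative only: build a task-based campaign file matching the structure
# shown above. The names and the per-user split are assumptions, not pearmut API.
import json

document_unit = [  # one task = one document unit = a list of segments
    {"src": "A najednou se všechna tato voda naplnila dalšími lidmi a dalšími věcmi.",
     "tgt": "And suddenly all the water became full of other people and other people."},
    {"src": "toto je pokračování stejného dokumentu",
     "tgt": "this is a continuation of the same document"},
]

n_users = 3
campaign = {
    "campaign_id": "my campaign 4",
    "info": {
        "type": "task-based",
        "template": "pointwise",
        "protocol_score": True,
        "protocol_error_spans": True,
        "protocol_error_categories": False,
    },
    # one list of tasks per user; here every user gets the same single task
    "data": [[document_unit] for _ in range(n_users)],
}

with open("my_campaign_4.json", "w", encoding="utf-8") as f:
    json.dump(campaign, f, ensure_ascii=False, indent=2)
```
The resulting `my_campaign_4.json` can then be loaded with `pearmut add` as described further below.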

We also support dynamic allocation of annotations (`dynamic`, not yet available ⚠️), which is more complex and can be ignored for now:
```python
{
    "campaign_id": "my campaign 6",
    "info": {
        "type": "dynamic",
        "template": "kway",
        "protocol_k": 5,
        "users": 50,
    },
    "data": [...],  # list of all items
}
```

We also support a very simple allocation of annotations (`task-single`, not yet available ⚠️), where you simply pass a list of all examples to be evaluated and they are processed in parallel by all annotators:
```python
{
    "campaign_id": "my campaign 6",
    "info": {
        "type": "task-single",
        "template": "pointwise",
        "protocol_score": True,             # collect scores
        "protocol_error_spans": True,       # collect error spans
        "protocol_error_categories": False, # do not collect MQM categories, so ESA
        "users": 50,
    },
    "data": [...],  # list of all items
}
```

To load a campaign into the server, run the following.
It will fail if a campaign with the same `campaign_id` already exists, unless you specify `-o/--overwrite`.
It will also output a secret management link.
```bash
pearmut add my_campaign_4.json
```
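
For example, re-loading an updated version of the same file would pass the `-o/--overwrite` flag mentioned above (shown here in its short form, as an assumed usage):
```bash
# replace the already-loaded campaign with the same campaign_id
pearmut add my_campaign_4.json -o
```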

Finally, you can launch the server with:
```bash
pearmut run
```

You can see examples in `data/examples/`.

## Development

To run the server and frontend locally:

```bash
# watch the frontend for changes (in a separate terminal)
npm install web/ --prefix web/
npm run watch --prefix web/

# install the local package as editable
pip3 install -e .
# add existing data from WMT25; this generates annotation links
# and sets up progress/log files in the current working folder
pearmut add data/examples/wmt25_#_en-cs_CZ.json
pearmut add data/examples/wmt25_#_cs-de_DE.json
# shows a management link for all loaded campaigns
pearmut run
```

## Misc

If you use this work in your paper, please cite as:
```bibtex
@misc{zouhar2025pearmut,
    author={Vilém Zouhar and others},
    title={Pearmut🍐 Platform for Evaluation and Reviewing of Multilingual Tasks},
    url={https://github.com/zouharvi/pearmut/},
    year={2025},
}
```
@@ -10,9 +10,9 @@ pearmut/static/pointwise.bundle.js,sha256=2aGddZQPxdVM73Ln9-ZJen42VeTY5fhMiAYgO1
 pearmut/static/pointwise.html,sha256=7C2IN61js9F2445whHVDptxdIfL-ntw5u4rF2OoBWzo,4436
 pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
 pearmut/static/assets/style.css,sha256=jfETRgVCohe680_30GXxbV4Zq4-B6UlXd5pZXlVLIRs,888
-pearmut-0.0.
-pearmut-0.0.
-pearmut-0.0.
-pearmut-0.0.
-pearmut-0.0.
-pearmut-0.0.
+pearmut-0.0.2a3.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+pearmut-0.0.2a3.dist-info/METADATA,sha256=MJ93IDtFmE9_C_nFHUC_KmfiN3BRRNKrjEuuzUXasfI,4871
+pearmut-0.0.2a3.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+pearmut-0.0.2a3.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
+pearmut-0.0.2a3.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
+pearmut-0.0.2a3.dist-info/RECORD,,
@@ -1,19 +0,0 @@
-Metadata-Version: 2.4
-Name: pearmut
-Version: 0.0.2a2
-Summary: A tool for evaluation of model outputs, primarily MT.
-Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
-License: MIT
-Project-URL: Repository, https://github.com/zouharvi/pearmut
-Project-URL: Issues, https://github.com/zouharvi/pearmut/issues
-Keywords: evaluation,machine translation,human evaluation,annotation
-Requires-Python: >=3.12
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: fastapi>=0.110.0
-Requires-Dist: uvicorn>=0.29.0
-Requires-Dist: wonderwords>=3.0.0
-Provides-Extra: dev
-Requires-Dist: pytest; extra == "dev"
-Requires-Dist: pynpm>=0.3.0; extra == "dev"
-Dynamic: license-file
The remaining files (WHEEL, entry_points.txt, licenses/LICENSE, top_level.txt) are unchanged apart from the dist-info directory rename.