pearmut 0.2.5__tar.gz → 0.2.7__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pearmut-0.2.7/LICENSE +21 -0
- pearmut-0.2.7/PKG-INFO +330 -0
- pearmut-0.2.7/README.md +310 -0
- pearmut-0.2.7/pearmut.egg-info/PKG-INFO +330 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/pyproject.toml +2 -2
- {pearmut-0.2.5 → pearmut-0.2.7}/server/app.py +36 -29
- {pearmut-0.2.5 → pearmut-0.2.7}/server/cli.py +119 -13
- pearmut-0.2.7/server/static/dashboard.bundle.js +1 -0
- pearmut-0.2.7/server/static/listwise.bundle.js +1 -0
- pearmut-0.2.7/server/static/pointwise.bundle.js +1 -0
- pearmut-0.2.5/LICENSE +0 -201
- pearmut-0.2.5/PKG-INFO +0 -345
- pearmut-0.2.5/README.md +0 -325
- pearmut-0.2.5/pearmut.egg-info/PKG-INFO +0 -345
- pearmut-0.2.5/server/static/dashboard.bundle.js +0 -1
- pearmut-0.2.5/server/static/listwise.bundle.js +0 -1
- pearmut-0.2.5/server/static/pointwise.bundle.js +0 -1
- {pearmut-0.2.5 → pearmut-0.2.7}/pearmut.egg-info/SOURCES.txt +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/pearmut.egg-info/dependency_links.txt +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/pearmut.egg-info/entry_points.txt +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/pearmut.egg-info/requires.txt +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/pearmut.egg-info/top_level.txt +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/assignment.py +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/static/assets/favicon.svg +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/static/assets/style.css +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/static/dashboard.html +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/static/index.html +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/static/listwise.html +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/static/pointwise.html +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/server/utils.py +0 -0
- {pearmut-0.2.5 → pearmut-0.2.7}/setup.cfg +0 -0
pearmut-0.2.7/LICENSE
ADDED
MIT License

Copyright (c) 2025- Vilém Zouhar

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
pearmut-0.2.7/PKG-INFO
ADDED
Metadata-Version: 2.4
Name: pearmut
Version: 0.2.7
Summary: A tool for evaluation of model outputs, primarily MT.
Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
License: MIT
Project-URL: Repository, https://github.com/zouharvi/pearmut
Project-URL: Issues, https://github.com/zouharvi/pearmut/issues
Keywords: evaluation,machine translation,human evaluation,annotation
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn>=0.29.0
Requires-Dist: wonderwords>=3.0.0
Requires-Dist: psutil>=7.1.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# Pearmut 🍐

**Platform for Evaluation and Reviewing of Multilingual Tasks** — Evaluate model outputs for translation and NLP tasks with support for multimodal data (text, video, audio, images) and multiple annotation protocols ([DA](https://aclanthology.org/N15-1124/), [ESA](https://aclanthology.org/2024.wmt-1.131/), [ESA<sup>AI</sup>](https://aclanthology.org/2025.naacl-long.255/), [MQM](https://doi.org/10.1162/tacl_a_00437), and more!).

[](https://pypi.org/project/pearmut)

[](https://pypi.python.org/pypi/pearmut/)

[](https://pypi.org/project/pearmut/)

[](https://github.com/zouharvi/pearmut/actions/workflows/test.yml)

<img width="1000" alt="Screenshot of ESA/MQM interface" src="https://github.com/user-attachments/assets/4fb9a1cb-78ac-47e0-99cd-0870a368a0ad" />

## Table of Contents

- [Quick Start](#quick-start)
- [Campaign Configuration](#campaign-configuration)
  - [Basic Structure](#basic-structure)
  - [Assignment Types](#assignment-types)
  - [Protocol Templates](#protocol-templates)
- [Advanced Features](#advanced-features)
  - [Pre-filled Error Spans (ESA<sup>AI</sup>)](#pre-filled-error-spans-esaai)
  - [Tutorial and Attention Checks](#tutorial-and-attention-checks)
  - [Single-stream Assignment](#single-stream-assignment)
  - [Pre-defined User IDs and Tokens](#pre-defined-user-ids-and-tokens)
  - [Multimodal Annotations](#multimodal-annotations)
  - [Hosting Assets](#hosting-assets)
- [CLI Commands](#cli-commands)
- [Campaign Management](#campaign-management)
- [Development](#development)
- [Citation](#citation)

## Quick Start

Install and run locally without cloning:
```bash
pip install pearmut
# Download example campaigns
wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/esa_encs.json
wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/da_enuk.json
# Load and start
pearmut add esa_encs.json da_enuk.json
pearmut run
```

## Campaign Configuration

### Basic Structure

Campaigns are defined in JSON files (see [examples/](examples/)). The simplest configuration uses `task-based` assignment, where each user has pre-defined tasks:
```python
{
    "info": {
        "assignment": "task-based",
        "template": "pointwise",
        "protocol_score": true,  # we want scores [0...100] for each segment
        "protocol_error_spans": true,  # we want error spans
        "protocol_error_categories": false,  # we do not want error span categories
    },
    "campaign_id": "wmt25_#_en-cs_CZ",
    "data": [
        # data for the first task/user
        [
            # each evaluation item is a document
            [
                {
                    "instructions": "Evaluate translation from en to cs_CZ",  # message shown to users above the first item
                    "src": "This will be the year that Guinness loses its cool. Cheers to that!",
                    "tgt": "Tohle bude rok, kdy Guinness přijde o svůj „cool“ faktor. Na zdraví!"
                },
                {
                    "src": "I'm not sure I can remember exactly when I sensed it. Maybe it was when some...",
                    "tgt": "Nevím přesně, kdy jsem to poprvé zaznamenal. Možná to bylo ve chvíli, ..."
                }
                ...
            ],
            # more documents
            ...
        ],
        # data for the second task/user
        [
            ...
        ],
        # arbitrary number of users (each corresponds to a single URL to be shared)
    ]
}
```
Task items are protocol-specific. For the ESA/DA/MQM protocols, each item is a dictionary representing a document unit:
```python
[
    {
        "src": "A najednou se všechna tato voda naplnila dalšími lidmi a dalšími věcmi.",  # required
        "tgt": "And suddenly all the water became full of other people and other people."  # required
    },
    {
        "src": "toto je pokračování stejného dokumentu",
        "tgt": "this is a continuation of the same document"
        # additional keys are stored for analysis
    }
]
```

Load campaigns and start the server:
```bash
pearmut add my_campaign.json  # use -o/--overwrite to replace existing campaigns
pearmut run
```
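The configuration snippets above use Python-style comments for readability and are not valid JSON as written. When generating a campaign file programmatically, plain `json.dump` produces a loadable file; a minimal sketch with hypothetical data and file name:

```python
import json

# A document is a list of segments; a task is a list of documents.
documents = [
    [
        {"src": "Hello world.", "tgt": "Ahoj světe."},
        {"src": "How are you?", "tgt": "Jak se máš?"},
    ],
]

campaign = {
    "info": {
        "assignment": "task-based",
        "template": "pointwise",
        "protocol_score": True,
        "protocol_error_spans": True,
        "protocol_error_categories": False,
    },
    "campaign_id": "demo_en-cs",
    # one entry per task/user; here a single user receives all documents
    "data": [documents],
}

with open("my_campaign.json", "w", encoding="utf-8") as f:
    json.dump(campaign, f, ensure_ascii=False, indent=2)
# then: pearmut add my_campaign.json && pearmut run
```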

### Assignment Types

- **`task-based`**: Each user has predefined items
- **`single-stream`**: All users draw from a shared pool (random assignment)
- **`dynamic`**: work in progress ⚠️

### Protocol Templates

- **Pointwise**: Evaluate a single output against a single input
  - `protocol_score`: Collect scores [0-100]
  - `protocol_error_spans`: Collect error span highlights
  - `protocol_error_categories`: Collect MQM category labels
- **Listwise**: Evaluate multiple outputs simultaneously
  - Same protocol options as pointwise

## Advanced Features

### Pre-filled Error Spans (ESA<sup>AI</sup>)

Include `error_spans` to pre-fill annotations that users can review, modify, or delete:

```python
{
    "src": "The quick brown fox jumps over the lazy dog.",
    "tgt": "Rychlá hnědá liška skáče přes líného psa.",
    "error_spans": [
        {
            "start_i": 0,  # character index start (inclusive)
            "end_i": 5,  # character index end (inclusive)
            "severity": "minor",  # "minor", "major", "neutral", or null
            "category": null  # MQM category string or null
        },
        {
            "start_i": 27,
            "end_i": 32,
            "severity": "major",
            "category": null
        }
    ]
}
```

For the **listwise** template, `error_spans` is a 2D array (one per candidate). See [examples/esaai_prefilled.json](examples/esaai_prefilled.json).
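Pre-filled spans typically come from an automatic system, where off-by-one indices are easy to produce; a small sanity check against the target string catches them before the campaign is loaded. A sketch (not part of Pearmut's API; note that `end_i` is inclusive):

```python
def check_error_spans(item: dict) -> None:
    """Verify that every pre-filled span lies within the target text."""
    tgt = item["tgt"]
    for span in item.get("error_spans", []):
        # inclusive indices: both ends must be valid character positions
        assert 0 <= span["start_i"] <= span["end_i"] < len(tgt), span
        assert span["severity"] in ("minor", "major", "neutral", None)

item = {
    "tgt": "Rychlá hnědá liška skáče přes líného psa.",
    "error_spans": [
        {"start_i": 0, "end_i": 5, "severity": "minor", "category": None},
        {"start_i": 27, "end_i": 32, "severity": "major", "category": None},
    ],
}
check_error_spans(item)
first_span_text = item["tgt"][0:6]  # inclusive end 5 → slice end 6
```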

### Tutorial and Attention Checks

Add `validation` rules for tutorials or attention checks:

```python
{
    "src": "The quick brown fox jumps.",
    "tgt": "Rychlá hnědá liška skáče.",
    "validation": {
        "warning": "Please set the score between 70-80.",  # shown on failure (omit for silent logging)
        "score": [70, 80],  # required score range [min, max]
        "error_spans": [{"start_i": [0, 2], "end_i": [4, 8], "severity": "minor"}],  # expected spans
        "allow_skip": true  # show "skip tutorial" button
    }
}
```

**Types:**
- **Tutorial**: Include `allow_skip: true` and `warning` to let users skip after feedback
- **Loud attention checks**: Include `warning` without `allow_skip` to force a retry
- **Silent attention checks**: Omit `warning` to log failures without notifying the user (quality control)

For listwise, `validation` is an array (one per candidate). The dashboard shows ✅/❌ based on `validation_threshold` in `info` (an integer for the maximum failed count, a float in \[0,1\) for the maximum failed proportion; default 0).
See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json).
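The intended semantics of a `validation` rule can be sketched as follows. This is a hypothetical re-implementation for illustration only (Pearmut's actual matching logic may differ); the key point is that `start_i`/`end_i` in a rule are `[min, max]` ranges that an annotated span's indices must fall into:

```python
def passes_validation(rule: dict, score: float, spans: list[dict]) -> bool:
    """Illustrative check of a score and annotated spans against one rule."""
    if "score" in rule:
        lo, hi = rule["score"]
        if not (lo <= score <= hi):
            return False
    for expected in rule.get("error_spans", []):
        s_lo, s_hi = expected["start_i"]  # allowed range for span start
        e_lo, e_hi = expected["end_i"]    # allowed range for span end
        matched = any(
            s_lo <= span["start_i"] <= s_hi
            and e_lo <= span["end_i"] <= e_hi
            and span.get("severity") == expected.get("severity")
            for span in spans
        )
        if not matched:
            return False
    return True

rule = {
    "score": [70, 80],
    "error_spans": [{"start_i": [0, 2], "end_i": [4, 8], "severity": "minor"}],
}
ok = passes_validation(rule, 75, [{"start_i": 1, "end_i": 5, "severity": "minor"}])
bad = passes_validation(rule, 95, [])
```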

### Single-stream Assignment

All annotators draw from a shared pool with random assignment:
```python
{
    "campaign_id": "my campaign 6",
    "info": {
        "assignment": "single-stream",
        "template": "pointwise",
        "protocol_score": true,  # collect scores
        "protocol_error_spans": true,  # collect error spans
        "protocol_error_categories": false,  # do not collect MQM categories, i.e. the ESA protocol
        "users": 50,  # number of annotators (can also be a list, see below)
    },
    "data": [...],  # list of all items (shared among all annotators)
}
```

### Pre-defined User IDs and Tokens

The `users` field accepts:
- **Number** (e.g., `50`): Generate random user IDs
- **List of strings** (e.g., `["alice", "bob"]`): Use specific user IDs
- **List of dictionaries**: Specify custom tokens:
```python
{
    "info": {
        ...
        "users": [
            {"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
            {"user_id": "bob", "token_pass": "bob_done"}  # missing tokens are auto-generated
        ],
    },
    ...
}
```
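When tokens must be registered with an external system (e.g. a crowdsourcing platform) before the campaign starts, they can be pre-generated instead of auto-generated. A sketch using the standard library (`make_users` is a hypothetical helper, not a Pearmut function):

```python
import secrets

def make_users(user_ids: list[str]) -> list[dict]:
    # Pair each user ID with random pass/fail tokens, mirroring the
    # dictionary form of the "users" field shown above.
    return [
        {
            "user_id": uid,
            "token_pass": secrets.token_hex(8),
            "token_fail": secrets.token_hex(8),
        }
        for uid in user_ids
    ]

users = make_users(["alice", "bob"])
```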

### Multimodal Annotations

Support for HTML-compatible elements (YouTube embeds, `<video>` tags, images). Ensure the elements are pre-styled. See [examples/multimodal.json](examples/multimodal.json).

<img width="1000" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/77c4fa96-ee62-4e46-8e78-fd16e9007956" />

### Hosting Assets

Host local assets (audio, images, videos) using the `assets` key:

```python
{
    "campaign_id": "my_campaign",
    "info": {
        "assets": {
            "source": "videos",  # source directory
            "destination": "assets/my_videos"  # mount path (must start with "assets/")
        }
    },
    "data": [ ... ]
}
```

Files from `videos/` become accessible at `localhost:8001/assets/my_videos/`. This creates a symlink, so the source directory must exist throughout annotation. Destination paths must be unique across campaigns.
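Items then reference the mounted path from their HTML fields. A minimal sketch (the file name `clip1.mp4` is hypothetical; a file `videos/clip1.mp4` on disk would be served under the `assets/my_videos/` mount from the example above):

```python
# Hypothetical item whose target is a hosted video instead of plain text.
item = {
    "src": "Rate the fluency of the spoken translation.",
    "tgt": '<video controls width="400" src="assets/my_videos/clip1.mp4"></video>',
}
```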

## CLI Commands

- **`pearmut add <file(s)>`**: Add campaign JSON files (supports wildcards)
  - `-o/--overwrite`: Replace existing campaigns with the same ID
  - `--server <url>`: Server URL prefix (default: `http://localhost:8001`)
- **`pearmut run`**: Start the server
  - `--port <port>`: Server port (default: 8001)
  - `--server <url>`: Server URL prefix
- **`pearmut purge [campaign]`**: Remove campaign data
  - Without arguments: purge all campaigns
  - With a campaign name: purge that campaign only

## Campaign Management

The management link (shown when adding campaigns or starting the server) provides:
- An overview of annotator progress
- Access to annotation links
- Task progress reset (data is preserved)
- Download of progress and annotations

<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/800a1741-5f41-47ac-9d5d-5cbf6abfc0e6" />

At the end of annotation, a completion token is shown for verification (download the correct tokens from the dashboard). An incorrect token may be shown if quality control fails.

<img width="500" alt="Token on completion" src="https://github.com/user-attachments/assets/40eb904c-f47a-4011-aa63-9a4f1c501549" />

## Development

The server responds to data-only requests from the frontend (no template coupling). On install, the frontend is served from the pre-built `static/` directory.

### Local development:
```bash
cd pearmut
# Frontend (separate terminal, recompiles on change)
npm install --prefix web/
npm run build --prefix web/
# optionally keep running indefinitely to auto-rebuild
npm run watch --prefix web/

# Install as editable
pip3 install -e .
# Load examples
pearmut add examples/wmt25_#_en-cs_CZ.json examples/wmt25_#_cs-de_DE.json
pearmut run
```

### Creating new protocols:
1. Add HTML and TS files to `web/src`
2. Add a build rule to `webpack.config.js`
3. Reference it as `info->template` in the campaign JSON

See [web/src/pointwise.ts](web/src/pointwise.ts) for an example.

### Deployment

Either run on a public server, or run locally and tunnel the local port to a public IP/domain.

## Citation

If you use this work in your paper, please cite it as follows.
```bibtex
@misc{zouhar2025pearmut,
    author={Vilém Zouhar},
    title={Pearmut: Platform for Evaluation and Reviewing of Multilingual Tasks},
    url={https://github.com/zouharvi/pearmut/},
    year={2025},
}
```

Contributions are welcome! Please reach out to [Vilém Zouhar](mailto:vilem.zouhar@gmail.com).
pearmut-0.2.7/README.md
ADDED