pearmut 0.2.6__py3-none-any.whl → 0.2.7__py3-none-any.whl

@@ -0,0 +1,330 @@
1
+ Metadata-Version: 2.4
2
+ Name: pearmut
3
+ Version: 0.2.7
4
+ Summary: A tool for evaluation of model outputs, primarily MT.
5
+ Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
6
+ License: MIT
7
+ Project-URL: Repository, https://github.com/zouharvi/pearmut
8
+ Project-URL: Issues, https://github.com/zouharvi/pearmut/issues
9
+ Keywords: evaluation,machine translation,human evaluation,annotation
10
+ Requires-Python: >=3.12
11
+ Description-Content-Type: text/markdown
12
+ License-File: LICENSE
13
+ Requires-Dist: fastapi>=0.110.0
14
+ Requires-Dist: uvicorn>=0.29.0
15
+ Requires-Dist: wonderwords>=3.0.0
16
+ Requires-Dist: psutil>=7.1.0
17
+ Provides-Extra: dev
18
+ Requires-Dist: pytest; extra == "dev"
19
+ Dynamic: license-file
20
+
21
+ # Pearmut 🍐
22
+
23
+ **Platform for Evaluation and Reviewing of Multilingual Tasks** — Evaluate model outputs for translation and NLP tasks with support for multimodal data (text, video, audio, images) and multiple annotation protocols ([DA](https://aclanthology.org/N15-1124/), [ESA](https://aclanthology.org/2024.wmt-1.131/), [ESA<sup>AI</sup>](https://aclanthology.org/2025.naacl-long.255/), [MQM](https://doi.org/10.1162/tacl_a_00437), and more!).
24
+
25
+ [![PyPi version](https://badgen.net/pypi/v/pearmut/)](https://pypi.org/project/pearmut)
26
+ &nbsp;
27
+ [![PyPI download/month](https://img.shields.io/pypi/dm/pearmut.svg)](https://pypi.python.org/pypi/pearmut/)
28
+ &nbsp;
29
+ [![PyPi license](https://badgen.net/pypi/license/pearmut/)](https://pypi.org/project/pearmut/)
30
+ &nbsp;
31
+ [![build status](https://github.com/zouharvi/pearmut/actions/workflows/test.yml/badge.svg)](https://github.com/zouharvi/pearmut/actions/workflows/test.yml)
32
+
33
+ <img width="1000" alt="Screenshot of ESA/MQM interface" src="https://github.com/user-attachments/assets/4fb9a1cb-78ac-47e0-99cd-0870a368a0ad" />
34
+
35
+ ## Table of Contents
36
+
37
+ - [Quick Start](#quick-start)
38
+ - [Campaign Configuration](#campaign-configuration)
39
+ - [Basic Structure](#basic-structure)
40
+ - [Assignment Types](#assignment-types)
41
+ - [Protocol Templates](#protocol-templates)
42
+ - [Advanced Features](#advanced-features)
43
+ - [Pre-filled Error Spans (ESA<sup>AI</sup>)](#pre-filled-error-spans-esaai)
44
+ - [Tutorial and Attention Checks](#tutorial-and-attention-checks)
45
+ - [Pre-defined User IDs and Tokens](#pre-defined-user-ids-and-tokens)
46
+ - [Multimodal Annotations](#multimodal-annotations)
47
+ - [Hosting Assets](#hosting-assets)
48
+ - [Campaign Management](#campaign-management)
49
+ - [CLI Commands](#cli-commands)
50
+ - [Development](#development)
51
+ - [Citation](#citation)
52
+
53
+ ## Quick Start
54
+
55
+ Install and run locally without cloning:
56
+ ```bash
57
+ pip install pearmut
58
+ # Download example campaigns
59
+ wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/esa_encs.json
60
+ wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/da_enuk.json
61
+ # Load and start
62
+ pearmut add esa_encs.json da_enuk.json
63
+ pearmut run
64
+ ```
65
+
66
+ ## Campaign Configuration
67
+
68
+ ### Basic Structure
69
+
70
+ Campaigns are defined in JSON files (see [examples/](examples/)). The simplest configuration uses `task-based` assignment where each user has pre-defined tasks:
71
+ ```python
72
+ {
73
+ "info": {
74
+ "assignment": "task-based",
75
+ "template": "pointwise",
76
+ "protocol_score": true, # we want scores [0...100] for each segment
77
+ "protocol_error_spans": true, # we want error spans
78
+ "protocol_error_categories": false, # we do not want error span categories
79
+ },
80
+ "campaign_id": "wmt25_#_en-cs_CZ",
81
+ "data": [
82
+ # data for first task/user
83
+ [
84
+ [
85
+ # each evaluation item is a document
86
+ {
87
+ "instructions": "Evaluate translation from en to cs_CZ", # message to show to users above the first item
88
+ "src": "This will be the year that Guinness loses its cool. Cheers to that!",
89
+ "tgt": "Tohle bude rok, kdy Guinness přijde o svůj „cool“ faktor. Na zdraví!"
90
+ },
91
+ {
92
+ "src": "I'm not sure I can remember exactly when I sensed it. Maybe it was when some...",
93
+ "tgt": "Nevím přesně, kdy jsem to poprvé zaznamenal. Možná to bylo ve chvíli, ..."
94
+ }
95
+ ...
96
+ ],
97
+ # more documents
98
+ ...
99
+ ],
100
+ # data for second task/user
101
+ [
102
+ ...
103
+ ],
104
+ # arbitrary number of users (each corresponds to a single URL to be shared)
105
+ ]
106
+ }
107
+ ```
108
+ Task items are protocol-specific. For ESA/DA/MQM protocols, each item is a dictionary representing a document unit:
109
+ ```python
110
+ [
111
+ {
112
+ "src": "A najednou se všechna tato voda naplnila dalšími lidmi a dalšími věcmi.", # required
113
+ "tgt": "And suddenly all the water became full of other people and other people." # required
114
+ },
115
+ {
116
+ "src": "toto je pokračování stejného dokumentu",
117
+ "tgt": "this is a continuation of the same document"
118
+ # Additional keys stored for analysis
119
+ }
120
+ ]
121
+ ```
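The layout above can also be assembled programmatically. The following is a minimal sketch, not part of Pearmut: `make_task_based_campaign` is a hypothetical helper, and the sentences and the `my_campaign` ID are placeholders. It only relies on the keys documented above (`info`, `campaign_id`, `data`, and `src`/`tgt` per item).

```python
import json
from pathlib import Path


def make_task_based_campaign(campaign_id, tasks):
    """Wrap per-user tasks in the task-based campaign layout (hypothetical helper).

    `tasks` has one entry per user; each entry is a list of documents;
    each document is a list of {"src": ..., "tgt": ...} items.
    """
    for task in tasks:
        for document in task:
            for item in document:
                missing = {"src", "tgt"} - item.keys()
                if missing:
                    raise ValueError(f"item is missing required keys: {sorted(missing)}")
    return {
        "info": {
            "assignment": "task-based",
            "template": "pointwise",
            "protocol_score": True,
            "protocol_error_spans": True,
            "protocol_error_categories": False,
        },
        "campaign_id": campaign_id,
        "data": tasks,
    }


# One user with one two-item document (placeholder sentences).
document = [
    {"instructions": "Evaluate translation from cs to en",
     "src": "toto je první věta", "tgt": "this is the first sentence"},
    {"src": "toto je pokračování stejného dokumentu",
     "tgt": "this is a continuation of the same document"},
]
campaign = make_task_based_campaign("my_campaign", [[document]])

# json.dumps turns Python True/False/None into JSON true/false/null.
serialized = json.dumps(campaign, ensure_ascii=False, indent=2)
Path("my_campaign.json").write_text(serialized, encoding="utf-8")
```

The resulting `my_campaign.json` is plain JSON without comments and can be passed to `pearmut add`.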
122
+
123
+ Load campaigns and start the server:
124
+ ```bash
125
+ pearmut add my_campaign.json # Use -o/--overwrite to replace existing
126
+ pearmut run
127
+ ```
128
+
129
+ ### Assignment Types
130
+
131
+ - **`task-based`**: Each user has predefined items
132
+ - **`single-stream`**: All users draw from a shared pool (random assignment)
133
+ - **`dynamic`**: work in progress ⚠️
134
+
135
+ ### Protocol Templates
136
+
137
+ - **Pointwise**: Evaluate single output against single input
138
+ - `protocol_score`: Collect scores [0-100]
139
+ - `protocol_error_spans`: Collect error span highlights
140
+ - `protocol_error_categories`: Collect MQM category labels
141
+ - **Listwise**: Evaluate multiple outputs simultaneously
142
+ - Same protocol options as pointwise
143
+
144
+ ## Advanced Features
145
+
146
+ ### Pre-filled Error Spans (ESA<sup>AI</sup>)
147
+
148
+ Include `error_spans` to pre-fill annotations that users can review, modify, or delete:
149
+
150
+ ```python
151
+ {
152
+ "src": "The quick brown fox jumps over the lazy dog.",
153
+ "tgt": "Rychlá hnědá liška skáče přes líného psa.",
154
+ "error_spans": [
155
+ {
156
+ "start_i": 0, # character index start (inclusive)
157
+ "end_i": 5, # character index end (inclusive)
158
+ "severity": "minor", # "minor", "major", "neutral", or null
159
+ "category": null # MQM category string or null
160
+ },
161
+ {
162
+ "start_i": 27,
163
+ "end_i": 32,
164
+ "severity": "major",
165
+ "category": null
166
+ }
167
+ ]
168
+ }
169
+ ```
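Because both `start_i` and `end_i` are inclusive character indices, off-by-one mistakes are easy to make when pre-filling spans by hand. A minimal sketch of deriving them from a substring of `tgt` follows; `make_error_span` is a hypothetical helper (not a Pearmut function), and it assumes the indices count characters the same way Python strings do.

```python
def make_error_span(tgt, fragment, severity="minor", category=None):
    """Build one pre-filled error span covering the first occurrence of `fragment`.

    Hypothetical helper: both indices are inclusive, matching the comments
    in the example above.
    """
    if not fragment:
        raise ValueError("fragment must be non-empty")
    start = tgt.find(fragment)
    if start == -1:
        raise ValueError(f"{fragment!r} does not occur in the target text")
    return {
        "start_i": start,
        "end_i": start + len(fragment) - 1,  # inclusive end index
        "severity": severity,
        "category": category,  # serialized as null by json.dumps
    }


tgt = "Rychlá hnědá liška skáče přes líného psa."
item = {
    "src": "The quick brown fox jumps over the lazy dog.",
    "tgt": tgt,
    "error_spans": [
        make_error_span(tgt, "Rychlá"),
        make_error_span(tgt, "líného", severity="major"),
    ],
}
```

Slicing `tgt[span["start_i"] : span["end_i"] + 1]` recovers the highlighted text, which is a convenient sanity check before loading a campaign.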
170
+
171
+ For the **listwise** template, `error_spans` is a 2D array (one list of spans per candidate). See [examples/esaai_prefilled.json](examples/esaai_prefilled.json).
172
+
173
+ ### Tutorial and Attention Checks
174
+
175
+ Add `validation` rules for tutorials or attention checks:
176
+
177
+ ```python
178
+ {
179
+ "src": "The quick brown fox jumps.",
180
+ "tgt": "Rychlá hnědá liška skáče.",
181
+ "validation": {
182
+ "warning": "Please set score between 70-80.", # shown on failure (omit for silent logging)
183
+ "score": [70, 80], # required score range [min, max]
184
+ "error_spans": [{"start_i": [0, 2], "end_i": [4, 8], "severity": "minor"}], # expected spans
185
+ "allow_skip": true # show "skip tutorial" button
186
+ }
187
+ }
188
+ ```
189
+
190
+ **Types:**
191
+ - **Tutorial**: Include `allow_skip: true` and `warning` to let users skip after feedback
192
+ - **Loud attention checks**: Include `warning` without `allow_skip` to force retry
193
+ - **Silent attention checks**: Omit `warning` to log failures without notification (quality control)
194
+
195
+ For the listwise template, `validation` is an array (one entry per candidate). The dashboard shows ✅/❌ based on `validation_threshold` in `info`: an integer sets the maximum number of failed checks, a float in \[0,1\) sets the maximum proportion of failed checks, and the default is 0.
196
+ See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json).
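The two readings of `validation_threshold` can be illustrated with a short sketch. This is an interpretation of the description above, not Pearmut's actual implementation: the function name `within_threshold` is hypothetical, and treating the limit as inclusive (`<=`) is an assumption.

```python
def within_threshold(failed, total, validation_threshold=0):
    """Return True if an annotator would be marked as passing (hypothetical sketch).

    Integer threshold: maximum number of failed checks.
    Float threshold in [0, 1): maximum proportion of failed checks.
    """
    if isinstance(validation_threshold, bool):
        raise TypeError("validation_threshold must be an int or a float")
    if isinstance(validation_threshold, int):
        return failed <= validation_threshold  # assumption: limit is inclusive
    if not 0 <= validation_threshold < 1:
        raise ValueError("float thresholds must lie in [0, 1)")
    if total == 0:
        return True  # nothing to fail
    return failed / total <= validation_threshold


print(within_threshold(failed=1, total=10))                             # default 0
print(within_threshold(failed=2, total=10, validation_threshold=2))     # count
print(within_threshold(failed=2, total=10, validation_threshold=0.25))  # proportion
```

Under this reading, the default of 0 means that a single failed check is enough to show ❌.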
197
+
198
+ ### Single-stream Assignment
199
+
200
+ All annotators draw from a shared pool with random assignment:
201
+ ```python
202
+ {
203
+ "campaign_id": "my campaign 6",
204
+ "info": {
205
+ "assignment": "single-stream",
206
+ "template": "pointwise",
207
+ "protocol_score": true, # collect scores
208
+ "protocol_error_spans": true, # collect error spans
209
+ "protocol_error_categories": false, # do not collect MQM categories, so ESA
210
+ "users": 50, # number of annotators (can also be a list, see below)
211
+ },
212
+ "data": [...], # list of all items (shared among all annotators)
213
+ }
214
+ ```
215
+
216
+
217
+ ### Pre-defined User IDs and Tokens
218
+
219
+ The `users` field accepts:
220
+ - **Number** (e.g., `50`): Generate random user IDs
221
+ - **List of strings** (e.g., `["alice", "bob"]`): Use specific user IDs
222
+ - **List of dictionaries**: Specify custom tokens:
223
+ ```python
224
+ {
225
+ "info": {
226
+ ...
227
+ "users": [
228
+ {"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
229
+ {"user_id": "bob", "token_pass": "bob_done"} # missing tokens are auto-generated
230
+ ],
231
+ },
232
+ ...
233
+ }
234
+ ```
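When tokens need to be known in advance (for example to pay annotators through a crowdsourcing platform), the list of dictionaries can be generated rather than typed. A minimal sketch, assuming any string is acceptable as a token; `make_users` is a hypothetical helper and the 16-character hex format is an arbitrary choice, not a Pearmut requirement.

```python
import secrets


def make_users(user_ids):
    """Create `users` entries with explicit pass/fail tokens (hypothetical helper)."""
    return [
        {
            "user_id": user_id,
            "token_pass": secrets.token_hex(8),  # 16 hex characters
            "token_fail": secrets.token_hex(8),
        }
        for user_id in user_ids
    ]


users = make_users(["alice", "bob"])
# Keep a copy of the pass tokens to verify submissions later.
expected_tokens = {user["user_id"]: user["token_pass"] for user in users}
```

The `users` list can then be placed under `info` in the campaign dictionary before it is serialized to JSON.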
235
+
236
+ ### Multimodal Annotations
237
+
238
+ Items can contain HTML-compatible elements (YouTube embeds, `<video>` tags, images). Make sure the elements are pre-styled. See [examples/multimodal.json](examples/multimodal.json).
239
+
240
+ <img width="1000" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/77c4fa96-ee62-4e46-8e78-fd16e9007956" />
241
+
242
+ ### Hosting Assets
243
+
244
+ Host local assets (audio, images, videos) using the `assets` key:
245
+
246
+ ```python
247
+ {
248
+ "campaign_id": "my_campaign",
249
+ "info": {
250
+ "assets": {
251
+ "source": "videos", # Source directory
252
+ "destination": "assets/my_videos" # Mount path (must start with "assets/")
253
+ }
254
+ },
255
+ "data": [ ... ]
256
+ }
257
+ ```
258
+
259
+ Files from `videos/` become accessible at `localhost:8001/assets/my_videos/`. Pearmut creates a symlink, so the source directory must exist throughout annotation. Destination paths must be unique across campaigns.
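The constraints on the `assets` key can be checked before a campaign is added. The sketch below is not part of Pearmut: `check_assets` is a hypothetical helper, and resolving `source` relative to a `base_dir` argument is an assumption about where the directory is looked up.

```python
import tempfile
from pathlib import Path


def check_assets(assets, base_dir="."):
    """Return a list of problems found in an `assets` entry (hypothetical helper)."""
    problems = []
    if not assets["destination"].startswith("assets/"):
        problems.append('destination must start with "assets/"')
    if not (Path(base_dir) / assets["source"]).is_dir():
        problems.append(f'source directory {assets["source"]!r} does not exist')
    return problems


# Demonstration in a temporary directory that contains a `videos/` folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "videos").mkdir()
    ok = check_assets({"source": "videos", "destination": "assets/my_videos"}, base_dir=tmp)
    bad = check_assets({"source": "missing", "destination": "my_videos"}, base_dir=tmp)

print(ok)
print(bad)
```

Checking that destinations are unique across campaigns would additionally require comparing the `destination` values of all campaign files being loaded.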
260
+
261
+ ## CLI Commands
262
+
263
+ - **`pearmut add <file(s)>`**: Add campaign JSON files (supports wildcards)
264
+ - `-o/--overwrite`: Replace existing campaigns with same ID
265
+ - `--server <url>`: Server URL prefix (default: `http://localhost:8001`)
266
+ - **`pearmut run`**: Start server
267
+ - `--port <port>`: Server port (default: 8001)
268
+ - `--server <url>`: Server URL prefix
269
+ - **`pearmut purge [campaign]`**: Remove campaign data
270
+ - Without args: Purge all campaigns
271
+ - With campaign name: Purge specific campaign only
272
+
273
+ ## Campaign Management
274
+
275
+ The management link (shown when adding campaigns or running the server) provides:
276
+ - Annotator progress overview
277
+ - Access to annotation links
278
+ - Task progress reset (data preserved)
279
+ - Download progress and annotations
280
+
281
+ <img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/800a1741-5f41-47ac-9d5d-5cbf6abfc0e6" />
282
+
283
+ Completion tokens are shown at annotation end for verification (download correct tokens from dashboard). Incorrect tokens can be shown if quality control fails.
284
+
285
+ <img width="500" alt="Token on completion" src="https://github.com/user-attachments/assets/40eb904c-f47a-4011-aa63-9a4f1c501549" />
286
+
287
+ ## Development
288
+
289
+ The server responds to data-only requests from the frontend (no template coupling). On install, the frontend is served from the pre-built `static/` directory.
290
+
291
+ ### Local development:
292
+ ```bash
293
+ cd pearmut
294
+ # Frontend (separate terminal, recompiles on change)
295
+ npm install web/ --prefix web/
296
+ npm run build --prefix web/
297
+ # optionally keep running indefinitely to auto-rebuild
298
+ npm watch build --prefix web/
299
+
300
+ # Install as editable
301
+ pip3 install -e .
302
+ # Load examples
303
+ pearmut add examples/wmt25_#_en-cs_CZ.json examples/wmt25_#_cs-de_DE.json
304
+ pearmut run
305
+ ```
306
+
307
+ ### Creating new protocols:
308
+ 1. Add HTML and TS files to `web/src`
309
+ 2. Add build rule to `webpack.config.js`
310
+ 3. Reference as `info->template` in campaign JSON
311
+
312
+ See [web/src/pointwise.ts](web/src/pointwise.ts) for example.
313
+
314
+ ### Deployment
315
+
316
+ Run Pearmut on a public server, or run it locally and tunnel the local port to a public IP or domain.
317
+
318
+ ## Citation
319
+
320
+ If you use this work in your paper, please cite it as follows:
321
+ ```bibtex
322
+ @misc{zouhar2025pearmut,
323
+ author={Vilém Zouhar},
324
+ title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
325
+ url={https://github.com/zouharvi/pearmut/},
326
+ year={2025},
327
+ }
328
+ ```
329
+
330
+ Contributions are welcome! Please reach out to [Vilém Zouhar](mailto:vilem.zouhar@gmail.com).
@@ -1,19 +1,19 @@
1
- pearmut/app.py,sha256=Q-DDm5zR42hA2YS0_lej6L4DDVV1cSEbYsp66ykBrss,8261
1
+ pearmut/app.py,sha256=QrAAMQI8L922AD6biirItlwDB6gT90e1HTBH_txkPyM,8298
2
2
  pearmut/assignment.py,sha256=GvulwsPEguA_rNZB58bDKYy1wVZX9j4vnmbrKH4m0Mo,10963
3
3
  pearmut/cli.py,sha256=pXPuLeu1ow557SrpPvLR-5jc1de1rBY5bR5PmiFMJyc,17975
4
4
  pearmut/utils.py,sha256=TWcbdTehg4CNwCpc5FuEOszpQM464LY0IQHHE_Sq1Zg,5293
5
- pearmut/static/dashboard.bundle.js,sha256=GGg5lNwwgzejCi1ZAI-p2HKp-oI8DWAgfytAoTL8fNE,91782
5
+ pearmut/static/dashboard.bundle.js,sha256=9eHVyQd65pu1AJUyipKu8fWyOw5x1Jmf0KvbtAWuZm4,98636
6
6
  pearmut/static/dashboard.html,sha256=fN-B0jyeezMZP4qisGA7lmQem-FqvfDP1i5ziErQK2M,2120
7
7
  pearmut/static/index.html,sha256=SC5M-NSTnJh1UNHCC5VOP0TKkmhNn6MHlY6L4GDacpA,849
8
- pearmut/static/listwise.bundle.js,sha256=Q5bGsvUE2dD9Rzm-1ED_F5oNjx2TW733rAJ3yusEg0o,106742
8
+ pearmut/static/listwise.bundle.js,sha256=dR-CQ9r8OBhz71Usv9Lfg1RT9OMndiKZEvqPMBuDUho,109029
9
9
  pearmut/static/listwise.html,sha256=4A0a_GMVIjJmqT3lhJMT9huqvwgvrRfztt0KA0lJxKI,5308
10
- pearmut/static/pointwise.bundle.js,sha256=4iAhWIuyLK53BOr98ntWvJtKO00hNcgdCDuUviM_uK4,108818
10
+ pearmut/static/pointwise.bundle.js,sha256=gX6bfzPqOK0xMTqcZtbrLM9TbICVbiicD5c4b2bw-AM,111274
11
11
  pearmut/static/pointwise.html,sha256=2NZYyjpznXP2b4GMeDcrjRYI5hZ45l7QgI-RQjkRUqs,5024
12
12
  pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
13
13
  pearmut/static/assets/style.css,sha256=BrPnXTDr8hQ0M8T-EJlExddChzIFotlerBYMx2B8GDk,4136
14
- pearmut-0.2.6.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
15
- pearmut-0.2.6.dist-info/METADATA,sha256=-bVVeR4r6Ah0hMwjzfMR9oGKFY2X3A2_v2RUaHW8sCc,14587
16
- pearmut-0.2.6.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
17
- pearmut-0.2.6.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
18
- pearmut-0.2.6.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
19
- pearmut-0.2.6.dist-info/RECORD,,
14
+ pearmut-0.2.7.dist-info/licenses/LICENSE,sha256=GtR6RcTdRn-P23h5pKFuWSLZrLPD0ytHAwSOBt7aLpI,1071
15
+ pearmut-0.2.7.dist-info/METADATA,sha256=UMp9gm-oGgXr9Lh4o7Ele5vaUpl7K_C4aKg1mSPeo6c,11934
16
+ pearmut-0.2.7.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
17
+ pearmut-0.2.7.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
18
+ pearmut-0.2.7.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
19
+ pearmut-0.2.7.dist-info/RECORD,,
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025- Vilém Zouhar
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.