pearmut-1.0.1.tar.gz → pearmut-1.0.3.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28)
  1. {pearmut-1.0.1 → pearmut-1.0.3}/PKG-INFO +119 -65
  2. {pearmut-1.0.1 → pearmut-1.0.3}/README.md +118 -64
  3. {pearmut-1.0.1 → pearmut-1.0.3}/pearmut.egg-info/PKG-INFO +119 -65
  4. {pearmut-1.0.1 → pearmut-1.0.3}/pearmut.egg-info/SOURCES.txt +2 -2
  5. {pearmut-1.0.1 → pearmut-1.0.3}/pyproject.toml +2 -2
  6. {pearmut-1.0.1 → pearmut-1.0.3}/server/app.py +56 -25
  7. {pearmut-1.0.1 → pearmut-1.0.3}/server/assignment.py +340 -105
  8. {pearmut-1.0.1 → pearmut-1.0.3}/server/cli.py +185 -104
  9. {pearmut-1.0.1 → pearmut-1.0.3}/server/results_export.py +1 -1
  10. pearmut-1.0.3/server/static/annotate.bundle.js +1 -0
  11. pearmut-1.0.3/server/static/annotate.html +164 -0
  12. pearmut-1.0.3/server/static/dashboard.bundle.js +1 -0
  13. {pearmut-1.0.1 → pearmut-1.0.3}/server/static/dashboard.html +6 -1
  14. {pearmut-1.0.1 → pearmut-1.0.3}/server/static/index.html +1 -1
  15. {pearmut-1.0.1 → pearmut-1.0.3}/server/static/style.css +46 -0
  16. {pearmut-1.0.1 → pearmut-1.0.3}/server/utils.py +40 -21
  17. pearmut-1.0.1/server/static/basic.bundle.js +0 -1
  18. pearmut-1.0.1/server/static/basic.html +0 -133
  19. pearmut-1.0.1/server/static/dashboard.bundle.js +0 -1
  20. {pearmut-1.0.1 → pearmut-1.0.3}/LICENSE +0 -0
  21. {pearmut-1.0.1 → pearmut-1.0.3}/pearmut.egg-info/dependency_links.txt +0 -0
  22. {pearmut-1.0.1 → pearmut-1.0.3}/pearmut.egg-info/entry_points.txt +0 -0
  23. {pearmut-1.0.1 → pearmut-1.0.3}/pearmut.egg-info/requires.txt +0 -0
  24. {pearmut-1.0.1 → pearmut-1.0.3}/pearmut.egg-info/top_level.txt +0 -0
  25. {pearmut-1.0.1 → pearmut-1.0.3}/server/constants.py +0 -0
  26. {pearmut-1.0.1 → pearmut-1.0.3}/server/static/favicon.svg +0 -0
  27. {pearmut-1.0.1 → pearmut-1.0.3}/server/static/index.bundle.js +0 -0
  28. {pearmut-1.0.1 → pearmut-1.0.3}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pearmut
- Version: 1.0.1
+ Version: 1.0.3
  Summary: A tool for evaluation of model outputs, primarily MT.
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
  License: MIT
@@ -19,7 +19,7 @@ Provides-Extra: dev
  Requires-Dist: pytest; extra == "dev"
  Dynamic: license-file

- # 🍐Pearmut &nbsp; &nbsp; [![PyPi version](https://badgen.net/pypi/v/pearmut/)](https://pypi.org/project/pearmut) [![PyPI download/month](https://img.shields.io/pypi/dm/pearmut.svg)](https://pypi.python.org/pypi/pearmut/) [![PyPi license](https://badgen.net/pypi/license/pearmut/)](https://pypi.org/project/pearmut/) [![build status](https://github.com/zouharvi/pearmut/actions/workflows/test.yml/badge.svg)](https://github.com/zouharvi/pearmut/actions/workflows/test.yml)
+ # 🍐Pearmut <br> [![PyPi version](https://badgen.net/pypi/v/pearmut/)](https://pypi.org/project/pearmut) [![PyPI download/month](https://img.shields.io/pypi/dm/pearmut.svg)](https://pypi.python.org/pypi/pearmut/) [![PyPi license](https://badgen.net/pypi/license/pearmut/)](https://pypi.org/project/pearmut/) [![build status](https://github.com/zouharvi/pearmut/actions/workflows/test.yml/badge.svg)](https://github.com/zouharvi/pearmut/actions/workflows/test.yml) [![arXiv](https://img.shields.io/badge/arXiv-2601.02933-b31b1b.svg?style=flat)](https://arxiv.org/abs/2601.02933)

  **Platform for Evaluation and Reviewing of Multilingual Tasks**: Evaluate model outputs for translation and NLP tasks with support for multimodal data (text, video, audio, images) and multiple annotation protocols ([DA](https://aclanthology.org/N15-1124/), [ESA](https://aclanthology.org/2024.wmt-1.131/), [ESA<sup>AI</sup>](https://aclanthology.org/2025.naacl-long.255/), [MQM](https://doi.org/10.1162/tacl_a_00437), and more!).

@@ -35,12 +35,15 @@ Dynamic: license-file
  - [Assignment Types](#assignment-types)
  - [Advanced Features](#advanced-features)
  - [Pre-filled Error Spans (ESA<sup>AI</sup>)](#pre-filled-error-spans-esaai)
+ - [Custom MQM Taxonomy](#custom-mqm-taxonomy)
  - [Tutorial and Attention Checks](#tutorial-and-attention-checks)
+ - [Form Items for User Metadata](#form-items-for-user-metadata)
  - [Pre-defined User IDs and Tokens](#pre-defined-user-ids-and-tokens)
  - [Multimodal Annotations](#multimodal-annotations)
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [Custom Completion Messages](#custom-completion-messages)
+ - [Prolific Integration](#prolific-integration)
  - [CLI Commands](#cli-commands)
  - [Terminology](#terminology)
  - [Development](#development)
@@ -141,6 +144,22 @@ The `shuffle` parameter in campaign `info` controls this behavior:
  "data": [...]
  }
  ```
+ Documents in `data_welcome` are not shuffled, so they do not need to contain the same models in every document.
+
+ ### Showing Model Names
+
+ By default, model names are hidden to avoid biasing annotators. To display model names on top of each output block, set `show_model_names` to `true`:
+ ```python
+ {
+ "info": {
+ "assignment": "task-based",
+ "protocol": "ESA",
+ "show_model_names": true # Default: false.
+ },
+ "campaign_id": "my_campaign",
+ "data": [...]
+ }
+ ```

  ### Custom Score Sliders

@@ -163,6 +182,52 @@ For multi-dimensional evaluation tasks (e.g., assessing fluency on a Likert scal

  When `sliders` is specified, only the custom sliders are shown. Each slider must have `name`, `min`, `max`, and `step` properties. All sliders must be answered before proceeding.
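For illustration only (this snippet is not part of the released package): a minimal sketch of such a slider configuration, assuming `sliders` is a list inside the campaign `info`; the placement and the concrete values are assumptions, only the required `name`/`min`/`max`/`step` properties come from the text above.
```python
{
  "info": {
    "protocol": "DA",
    "sliders": [
      {"name": "fluency", "min": 1, "max": 5, "step": 1},   # e.g., a Likert-style scale
      {"name": "adequacy", "min": 0, "max": 100, "step": 5} # e.g., a percentage-style scale
    ]
  }
}
```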

+ ### Textfield for Post-editing/Translation
+
+ Enable a textfield for post-editing or translation tasks using the `textfield` parameter in `info`. The textfield content is stored in annotations alongside scores and error spans.
+
+ ```python
+ {
+ "info": {
+ "protocol": "DA",
+ "textfield": "prefilled" # Options: null, "hidden", "visible", "prefilled"
+ }
+ }
+ ```
+
+ **Textfield modes:**
+ - `null` or omitted: No textfield (default)
+ - `"hidden"`: Textfield hidden by default, shown by clicking a button
+ - `"visible"`: Textfield always visible
+ - `"prefilled"`: Textfield visible and pre-filled with model output for post-editing
+
+ ### Custom MQM Taxonomy
+
+ For MQM protocol campaigns, you can define a custom error taxonomy instead of using the default MQM categories. Specify `mqm_categories` in the campaign `info` section as a dictionary mapping main categories to lists of subcategories:
+
+ ```python
+ {
+ "info": {
+ "assignment": "task-based",
+ "protocol": "MQM",
+ "mqm_categories": {
+ "": [], # Empty selection option
+ "General": ["", "Accuracy", "Fluency"],
+ "Audio-specific": ["", "Inaudible", "Background noise", "Speaker overlap", "Misinterpretation"],
+ "Style": ["", "Awkward", "Embarrassing"],
+ "Unknown": [] # Category with no subcategories
+ }
+ },
+ "campaign_id": "custom_mqm_example",
+ "data": [...]
+ }
+ ```
+
+ If `mqm_categories` is not provided, the default MQM taxonomy will be used. The empty string key `""` provides an unselected state in the dropdown. Categories with empty subcategory lists (e.g., `"Unknown": []`) do not require a subcategory selection.
+
+ See [examples/custom_mqm.json](examples/custom_mqm.json) for a complete example.
+
  ### Custom Instructions

  Set campaign-level instructions using the `instructions` field in `info` (supports HTML).
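For illustration only (not taken from the released package): a minimal sketch of the `instructions` field; the HTML text and the surrounding fields are made up.
```python
{
  "info": {
    "protocol": "ESA",
    "instructions": "Read the <b>source</b>, highlight error spans in the translation, then set the overall score."
  }
}
```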
@@ -252,6 +317,34 @@ The `score_greaterthan` field specifies the index of the candidate that must hav
  See [examples/tutorial/esa_deen.json](examples/tutorial/esa_deen.json) for a mock campaign with a fully prepared ESA tutorial.
  To use it, simply extract the `data` attribute and prefix it to each task in your campaign.

+ #### Universal Tutorial Items with `data_welcome`
+
+ Use `data_welcome` to add tutorial items that users must complete before starting regular tasks. The structure is a list of documents (same as `data`). Welcome items have IDs `welcome_0`, `welcome_1`, etc. and are tracked separately via `progress_welcome`.
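For illustration only (not part of the released package): a structural sketch, assuming `data_welcome` sits next to `data` at the top level (as in the form example below) and that each welcome document has the same item structure as documents in `data`.
```python
{
  "campaign_id": "my_campaign",
  "info": {...},
  "data_welcome": [
    [...],  # welcome documents; their items are tracked as welcome_0, welcome_1, ... via progress_welcome
    [...]   # same document structure as entries of "data"
  ],
  "data": [...]
}
```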
+
+ ### Form Items for User Metadata
+
+ Collect user information (demographics, expertise) before annotation tasks using form items in `data_welcome`.
+ Form items have `text` (the label/question) and `form` (the field type: one of `null`, `"string"`, `"number"`, `"choices"`, or `"script"`).
+ Documents must be homogeneous: all form items or all evaluation items.
+
+ ```python
+ {
+ "data_welcome": [
+ [
+ {"text": "What is your native language?", "form": "string"},
+ {"text": "Rate your expertise (1-10)", "form": "number"}
+ ]
+ ]
+ }
+ ```
+
+ <img width="400" alt="Screenshot of a user form" src="https://github.com/user-attachments/assets/2310e8dc-98e9-4abf-8a27-6781b0094efe" />
+
+ It is possible to automatically collect additional information from the host system using the `"script"` field type.
+ Typically such a form document (or a sequence of them) is stored in `"data_welcome"` so that it is both mandatory and shown to all users.
+ See [examples/user_info_form.json](examples/user_info_form.json).
+

  ### Single-stream Assignment

  All annotators draw from a shared pool with random assignment:
@@ -265,11 +358,14 @@ All annotators draw from a shared pool with random assignment:
  # ESA: error spans and scores
  "protocol": "ESA",
  "users": 50, # number of annotators (can also be a list, see below)
+ "docs_per_user": 10, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for single-stream, this is the number of documents).
+

  ### Dynamic Assignment
  The `dynamic` assignment type intelligently selects items based on current model performance to focus annotation effort on top-performing models using contrastive comparisons.
@@ -286,11 +382,14 @@ All items must contain outputs from all models for this assignment type to work
  "dynamic_contrastive_models": 2, # how many models to compare per item (optional, default: 1)
  "dynamic_first": 5, # annotations per model before dynamic kicks in (optional, default: 5)
  "dynamic_backoff": 0.1, # probability of uniform sampling (optional, default: 0)
+ "docs_per_user": 20, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for dynamic, this is roughly the number of documents × models).
+
  **How it works:**
  1. Initial phase: Each model gets `dynamic_first` annotations with fully random contrastive evaluation
  2. Dynamic phase: After the initial phase, top `dynamic_top` models (by average score) are identified
@@ -378,6 +477,14 @@ When tokens are supplied, the dashboard will try to show model rankings based on

  Customize the goodbye message shown to users when they complete all annotations using the `instructions_goodbye` field in campaign info. Supports arbitrary HTML for styling and formatting with variable replacement: `${TOKEN}` (completion token) and `${USER_ID}` (user ID). Default: `"If someone asks you for a token of completion, show them: ${TOKEN}"`.
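For illustration only (not part of the released package): a goodbye-message sketch using the two documented variables; the surrounding fields and wording are invented.
```python
{
  "info": {
    "instructions_goodbye": "<h3>Thank you, ${USER_ID}!</h3> Your completion token is <b>${TOKEN}</b>."
  }
}
```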

+ ### Prolific Integration
+
+ Use task-based assignment with Prolific. For each task, Pearmut generates a unique URL which can be uploaded to Prolific's interface. Add a completion redirect to `instructions_goodbye`:
+ ```json
+ "instructions_goodbye": "<a href='https://app.prolific.com/submissions/complete?cc=${TOKEN}'>Click here to return to Prolific</a>"
+ ```
+ The `${TOKEN}` is substituted automatically, depending on whether the user passes the attention checks (see [Attention checks](#tutorial-and-attention-checks) and [Pre-defined tokens](#pre-defined-user-ids-and-tokens)).
+

  ## Terminology

  - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
@@ -401,7 +508,7 @@ Customize the goodbye message shown to users when they complete all annotations
  - **Score**: Numeric quality rating (0-100)
  - **Error Spans**: Text highlights marking errors with severity (`minor`, `major`)
  - **Error Categories**: MQM taxonomy labels for errors
- - **Template**: The annotation interface type. The `basic` template supports comparing multiple outputs simultaneously.
+ - **Template**: The annotation interface type. The `annotate` template supports comparing multiple outputs simultaneously.
  - **Assignment**: The method for distributing items to users:
    - **Task-based**: Each user has predefined items
    - **Single-stream**: Users draw from a shared pool with random assignment
@@ -432,7 +539,7 @@ pearmut run
  2. Add build rule to `webpack.config.js`
  3. Reference as `info->template` in campaign JSON

- See [web/src/basic.ts](web/src/basic.ts) for example.
+ See [web/src/annotate.ts](web/src/annotate.ts) for an example.

  ### Deployment

@@ -443,68 +550,15 @@ Run on public server or tunnel local port to public IP/domain and run locally.
  If you use this work in your paper, please cite as follows.
  ```bibtex
  @misc{zouhar2026pearmut,
- author = {Zouhar, Vilém},
- title = {Pearmut: Human Evaluation of Translation Made Trivial},
- year = {2026}
+ title={Pearmut: Human Evaluation of Translation Made Trivial},
+ author={Vilém Zouhar and Tom Kocmi},
+ year={2026},
+ eprint={2601.02933},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2601.02933},
  }
  ```

  Contributions are welcome! Please reach out to [Vilém Zouhar](mailto:vilem.zouhar@gmail.com).
-
- # Changelog
-
- - v1.0.1
-   - Support RTL languages
-   - Add boxes for references
-   - Add custom score sliders for multi-dimensional evaluation
-   - Make instructions customizable and protocol-dependent
-   - Support custom sliders
-   - Purge/reset whole tasks from dashboard
-   - Fix resetting individual users in single-stream/dynamic
-   - Fix notification stacking
-   - Add campaigns from dashboard
- - v0.3.3
-   - Rename `doc_id` to `item_id`
-   - Add Typst, LaTeX, and PDF export for model ranking tables. Hide them by default.
-   - Add dynamic assignment type with contrastive model comparison
-   - Add `instructions_goodbye` field with variable substitution
-   - Add visual anchors at 33% and 66% on sliders
-   - Add German→English ESA tutorial with attention checks
-   - Validate document model consistency before shuffle
-   - Fix UI block on any interaction
- - v0.3.2
-   - Revert seeding of user IDs
-   - Set ESA (Error Span Annotation) as default
-   - Update server IP address configuration
-   - Show approximate alignment by default
-   - Unify pointwise and listwise interfaces into `basic`
-   - Refactor protocol configuration (breaking change)
- - v0.2.11
-   - Add comment field in settings panel
-   - Add `score_gt` validation for listwise comparisons
-   - Add Content-Disposition headers for proper download filenames
-   - Add model results display to dashboard with rankings
-   - Add campaign file structure validation
-   - Purge command now unlinks assets
- - v0.2.6
-   - Add frozen annotation links feature for view-only mode
-   - Add word-level annotation mode toggle for error spans
-   - Add `[missing]` token support
-   - Improve frontend speed and cleanup toolboxes on item load
-   - Host assets via symlinks
-   - Add validation threshold for success/fail tokens
-   - Implement reset masking for annotations
-   - Allow pre-defined user IDs and tokens in campaign data
- - v0.1.1
-   - Set server defaults and add VM launch scripts
-   - Add warning dialog when navigating away with unsaved work
-   - Add tutorial validation support for pointwise and listwise
-   - Add ability to preview existing annotations via progress bar
-   - Add support for ESA<sup>AI</sup> pre-filled error_spans
-   - Rename pairwise to listwise and update layout
-   - Implement single-stream assignment type
- - v0.0.3
-   - Support multimodal inputs and outputs
-   - Add dashboard
-   - Implement ESA (Error Span Annotation) and MQM support
-
+ See changes in [CHANGELOG.md](CHANGELOG.md).