pearmut 0.2.9__py3-none-any.whl → 0.2.11__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,4 +1,4 @@
- <!doctype html><html lang="en" style="height: 100%;"><head><meta charset="UTF-8"><meta name="viewport" content="width=900px"><title>Pearmut Evaluation</title><link rel="icon" type="image/svg+xml" href="assets/favicon.svg"><link rel="stylesheet" href="assets/style.css"><style>.output_srctgt {
+ <!doctype html><html lang="en" style="height: 100%;"><head><meta charset="UTF-8"><meta name="viewport" content="width=900px"><title>Pearmut Evaluation</title><link rel="icon" type="image/svg+xml" href="favicon.svg"><link rel="stylesheet" href="style.css"><style>.output_srctgt {
  display: inline-block;
  width: calc(100% - 140px);
  vertical-align: top;
@@ -66,4 +66,4 @@
  direction: rtl;
  width: 16px;
  height: 200px;
- }</style><script defer="defer" src="pointwise.bundle.js"></script></head><body><div style="max-width: 1600px; min-width: 900px; margin-left: auto; margin-right: auto; margin-top: 20px; padding-left: 10px;"><div class="white-box" style="margin-right: 30px; background-color: #e7e2cf; padding: 5px 15px 5px 5px;"><span id="instructions_global" style="display: inline-block; font-size: 11pt; width: calc(100% - 170px);"><ul><li id="instructions_spans">Error spans:<ul><li><strong>Click</strong> on the start of an error, then <strong>click</strong> on the end to mark an error span.</li><li><strong>Click</strong> on an existing highlight to change error severity (minor/major) or remove it.</li></ul>Error severity:<ul><li><span class="instruction_sev" id="instruction_sev_minor">Minor:</span> Style, grammar, or word choice could be better.</li><li><span class="instruction_sev" id="instruction_sev_major">Major:</span> Meaning is significantly changed or is hard to understand.</li></ul><strong>Tip</strong>: Mark the general area of the error (doesn't need to be exact). Use separate highlights for different errors. Use <code>[missing]</code> at the end of a sentence for omitted content.<br></li><li id="instructions_score">Score the translation: Please use the slider and set an overall score based on meaning preservation and general quality:</li><ul><li>0: <strong>Nonsense</strong>: most information is lost.</li><li>33%: <strong>Broken</strong>: major gaps and narrative issues.</li><li>66%: <strong>Middling</strong>: minor issues with grammar or consistency.</li><li>100%: <strong>Perfect</strong>: meaning and grammar align completely with the source.</li></ul><li id="instructions_categories">Error types: After highlighting an error fragment, you will be asked to select the specific error type (main category and subcategory). 
If you are unsure about which errors fall under which categories, please consult the <a href="https://themqm.org/the-mqm-typology/" style="font-weight: bold; text-decoration: none; color: black;">typology definitions</a>.</li></ul></span><div style="width: 170px; display: inline-block; vertical-align: top; text-align: right; padding-top: 5px;"><span id="time" style="width: 135px; text-align: left; display: inline-block; font-size: 11pt;" title="Approximation of total annotation time.">Time: 0m</span> <input type="button" value="⚙️" id="button_settings" style="height: 1.5em; width: 30px;"><br><br><div id="progress" style="text-align: center;"></div><br><br><input type="button" value="Next 🛠️" id="button_next" disabled="disabled" style="width: 170px; height: 2.5em;" title="Finish annotating all examples first."> <input type="button" value="skip tutorial" id="button_skip_tutorial" style="width: 170px; font-size: 11pt; height: 30px; margin-top: 10px; display: none;" title="Skip tutorial only if you completed it already."></div></div><div id="settings_div" class="white-box" style="margin-right: 20px; margin-top: 10px; display: none; background-color: #e7e2cf; font-size: 11pt;"><input type="checkbox" id="settings_approximate_alignment"> <label for="settings_approximate_alignment">Show approximate alignment</label><br><input type="checkbox" id="settings_word_level"> <label for="settings_word_level">Word-level selection</label></div><div id="output_div" style="margin-top: 100px;"></div><br><br><br></div></body></html>
+ }</style><script defer="defer" src="pointwise.bundle.js"></script></head><body><div style="max-width: 1600px; min-width: 900px; margin-left: auto; margin-right: auto; margin-top: 20px; padding-left: 10px;"><div class="white-box" style="margin-right: 30px; background-color: #e7e2cf; padding: 5px 15px 5px 5px;"><span id="instructions_global" style="display: inline-block; font-size: 11pt; width: calc(100% - 170px);"><ul><li id="instructions_spans">Error spans:<ul><li><strong>Click</strong> on the start of an error, then <strong>click</strong> on the end to mark an error span.</li><li><strong>Click</strong> on an existing highlight to change error severity (minor/major) or remove it.</li></ul>Error severity:<ul><li><span class="instruction_sev" id="instruction_sev_minor">Minor:</span> Style, grammar, or word choice could be better.</li><li><span class="instruction_sev" id="instruction_sev_major">Major:</span> Meaning is significantly changed or is hard to understand.</li></ul><strong>Tip</strong>: Mark the general area of the error (doesn't need to be exact). Use separate highlights for different errors. Use <code>[missing]</code> at the end of a sentence for omitted content.<br></li><li id="instructions_score">Score the translation: Please use the slider and set an overall score based on meaning preservation and general quality:</li><ul><li>0: <strong>Nonsense</strong>: most information is lost.</li><li>33%: <strong>Broken</strong>: major gaps and narrative issues.</li><li>66%: <strong>Middling</strong>: minor issues with grammar or consistency.</li><li>100%: <strong>Perfect</strong>: meaning and grammar align completely with the source.</li></ul><li id="instructions_categories">Error types: After highlighting an error fragment, you will be asked to select the specific error type (main category and subcategory). 
If you are unsure about which errors fall under which categories, please consult the <a href="https://themqm.org/the-mqm-typology/" style="font-weight: bold; text-decoration: none; color: black;">typology definitions</a>.</li></ul></span><div style="width: 170px; display: inline-block; vertical-align: top; text-align: right; padding-top: 5px;"><span id="time" style="width: 135px; text-align: left; display: inline-block; font-size: 11pt;" title="Approximation of total annotation time.">Time: 0m</span> <input type="button" value="⚙️" id="button_settings" style="height: 1.5em; width: 30px;"><br><br><div id="progress" style="text-align: center;"></div><br><br><input type="button" value="Next 🛠️" id="button_next" disabled="disabled" style="width: 170px; height: 2.5em;" title="Finish annotating all examples first."> <input type="button" value="skip tutorial" id="button_skip_tutorial" style="width: 170px; font-size: 11pt; height: 30px; margin-top: 10px; display: none;" title="Skip tutorial only if you completed it already."></div></div><div id="settings_div" class="white-box" style="margin-right: 20px; margin-top: 10px; display: none; background-color: #e7e2cf; font-size: 11pt;"><input type="checkbox" id="settings_approximate_alignment"> <label for="settings_approximate_alignment">Show approximate alignment</label><br><input type="checkbox" id="settings_word_level"> <label for="settings_word_level">Word-level selection</label><br><br><textarea id="settings_comment" style="width: 95%; height: 80px; resize: vertical; margin-top: 5px; padding: 5px; border-radius: 4px; border: 1px solid #999;" placeholder="Optional: Add any comments or feedback about this item. Will be submitted with Next."></textarea></div><div id="output_div" style="margin-top: 100px;"></div><br><br><br></div></body></html>
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pearmut
- Version: 0.2.9
+ Version: 0.2.11
  Summary: A tool for evaluation of model outputs, primarily MT.
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
  License: MIT
@@ -47,9 +47,13 @@ Dynamic: license-file
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [CLI Commands](#cli-commands)
+ - [Terminology](#terminology)
  - [Development](#development)
  - [Citation](#citation)

+
+ **Error Span** — A highlighted segment of text marked as containing an error, with optional severity (`minor`, `major`, `neutral`) and MQM category labels.
+
  ## Quick Start

  Install and run locally without cloning:
@@ -193,7 +197,21 @@ Add `validation` rules for tutorials or attention checks:
  - **Silent attention checks**: Omit `warning` to log failures without notification (quality control)

  For listwise, `validation` is an array (one per candidate). Dashboard shows ✅/❌ based on `validation_threshold` in `info` (integer for max failed count, float \[0,1\) for max proportion, default 0).
- See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json).
+
+ **Listwise score comparison:** Use `score_greaterthan` to ensure one candidate scores higher than another:
+ ```python
+ {
+ "src": "AI transforms industries.",
+ "tgt": ["UI transformuje průmysly.", "Umělá inteligence mění obory."],
+ "validation": [
+ {"warning": "A has error, score 20-40.", "score": [20, 40]},
+ {"warning": "B is correct and must score higher than A.", "score": [70, 90], "score_greaterthan": 0}
+ ]
+ }
+ ```
+ The `score_greaterthan` field specifies the index of the candidate that must have a lower score than the current candidate.
+
+ See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json), [examples/tutorial_listwise.json](examples/tutorial_listwise.json), and [examples/tutorial_listwise_score_greaterthan.json](examples/tutorial_listwise_score_greaterthan.json).

  ### Single-stream Assignment

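As a rough sketch of the validation semantics added in the README hunk above: each candidate's rule constrains the annotator's score to the `score` range, and `score_greaterthan` names the index of a candidate that must end up with a lower score. The field names come from the README example; the `passes_validation` helper below is hypothetical and is not Pearmut's actual implementation.

```python
def passes_validation(scores, rules):
    """Check listwise scores against per-candidate validation rules.

    `scores` holds one numeric score per candidate; `rules` mirrors the
    `validation` array from the campaign file. Illustrative sketch only.
    """
    for i, rule in enumerate(rules):
        lo, hi = rule["score"]
        if not (lo <= scores[i] <= hi):
            return False  # score outside the allowed range
        # `score_greaterthan` gives the index of a candidate that must
        # score strictly lower than this one (equal scores fail here).
        j = rule.get("score_greaterthan")
        if j is not None and scores[i] <= scores[j]:
            return False
    return True

rules = [
    {"warning": "A has error, score 20-40.", "score": [20, 40]},
    {"warning": "B must score higher than A.", "score": [70, 90], "score_greaterthan": 0},
]
print(passes_validation([30, 80], rules))  # True: both in range, B > A
print(passes_validation([30, 25], rules))  # False: B out of range and not above A
```

Under this reading, the dashboard's `validation_threshold` would then decide how many such failures a user may accumulate before being marked ❌.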
@@ -294,6 +312,38 @@ Items need `model` field (pointwise) or `models` field (listwise) and the `proto
  ```
  See an example in [Campaign Management](#campaign-management)

+
+ ## Terminology
+
+ - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
+ - **Campaign File**: A JSON file that defines the campaign configuration, including the campaign ID, assignment type, protocol settings, and annotation data.
+ - **Campaign ID**: A unique identifier for a campaign (e.g., `"wmt25_#_en-cs_CZ"`). Used to reference and manage specific campaigns.
+ - **Task**: A unit of work assigned to a user. In task-based assignment, each task consists of a predefined set of items for a specific user.
+ - **Item** — A single annotation unit within a task. For translation evaluation, an item typically represents a document (source text and target translation). Items can contain text, images, audio, or video.
+ - **Document** — A collection of one or more segments (sentence pairs or text units) that are evaluated together as a single item.
+ - **User** / **Annotator**: A person who performs annotations in a campaign. Each user is identified by a unique user ID and accesses the campaign through a unique URL.
+ - **Attention Check** — A validation item with known correct answers used to ensure annotator quality. Can be:
+   - **Loud**: Shows warning message and forces retry on failure
+   - **Silent**: Logs failures without notifying the user (for quality control analysis)
+ - **Token** — A completion code shown to users when they finish their annotations. Tokens verify the completion and whether the user passed quality control checks:
+   - **Pass Token** (`token_pass`): Shown when user meets validation thresholds
+   - **Fail Token** (`token_fail`): Shown when user fails to meet validation requirements
+ - **Tutorial**: An instructional validation item that teaches users how to annotate. Includes `allow_skip: true` to let users skip if they have seen it before.
+ - **Validation**: Quality control rules attached to items that check if annotations match expected criteria (score ranges, error span locations, etc.). Used for tutorials and attention checks.
+ - **Model**: The system or model that generated the output being evaluated (e.g., `"GPT-4"`, `"Claude"`). Used for tracking and ranking model performance.
+ - **Dashboard**: The management interface that shows campaign progress, annotator statistics, access links, and allows downloading annotations. Accessed via a special management URL with token authentication.
+ - **Protocol**: The annotation scheme defining what data is collected:
+   - **Score**: Numeric quality rating (0-100)
+   - **Error Spans**: Text highlights marking errors
+   - **Error Categories**: MQM taxonomy labels for errors
+ - **Template**: The annotation interface type:
+   - **Pointwise**: Evaluate one output at a time
+   - **Listwise**: Compare multiple outputs simultaneously
+ - **Assignment**: The method for distributing items to users:
+   - **Task-based**: Each user has predefined items
+   - **Single-stream**: Users draw from a shared pool with random assignment
+   - **Dynamic**: Work in progress
+
  ## Development

  Server responds to data-only requests from frontend (no template coupling). Frontend served from pre-built `static/` on install.
@@ -333,7 +383,7 @@ If you use this work in your paper, please cite as following.
  author={Vilém Zouhar},
  title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
  url={https://github.com/zouharvi/pearmut/},
- year={2025},
+ year={2026},
  }
  ```

@@ -0,0 +1,19 @@
+ pearmut/app.py,sha256=Vf02dH3bB65C6mlgQBHBg3mzraLTpeizwX4skMvpHhU,10658
+ pearmut/assignment.py,sha256=5tc2h7gMFE3bs0S6M_N-wMKH3FYyja_58N8gTjEbEDw,11695
+ pearmut/cli.py,sha256=wdyBu_n_IbCT99ZANmMlit817jaUfZKeRGP7HRci3vs,17560
+ pearmut/utils.py,sha256=TWcbdTehg4CNwCpc5FuEOszpQM464LY0IQHHE_Sq1Zg,5293
+ pearmut/static/dashboard.bundle.js,sha256=fBe1_HTViHSp357qgPhvPSnGEWfC8Gex-sd0GlBcAWM,100658
+ pearmut/static/dashboard.html,sha256=HXZzoz44f7LYtAfuP7uQioxTkNmo2_fAN0v2C2s1lAs,2680
+ pearmut/static/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
+ pearmut/static/index.html,sha256=yMttallApd0T7sxngUrdwCDrtTQpRIFF0-4W0jfXejU,835
+ pearmut/static/listwise.bundle.js,sha256=n9KlagExMVYh8aB1-ds6f_sL70WcvxNDuagmKHkwW5U,110382
+ pearmut/static/listwise.html,sha256=8K6Y7CXXKHqlMflBAa3l0stjs3Tg3SMK5SC25STdbew,5571
+ pearmut/static/pointwise.bundle.js,sha256=7dcdlnO3i612prUHk16HPkPlRgqpTKzF6KFwCPPF098,109717
+ pearmut/static/pointwise.html,sha256=L_9S6thn3tsgRCtHwICgBU9CC6OSGGSVgmqS0UI8Elg,5287
+ pearmut/static/style.css,sha256=BrPnXTDr8hQ0M8T-EJlExddChzIFotlerBYMx2B8GDk,4136
+ pearmut-0.2.11.dist-info/licenses/LICENSE,sha256=GtR6RcTdRn-P23h5pKFuWSLZrLPD0ytHAwSOBt7aLpI,1071
+ pearmut-0.2.11.dist-info/METADATA,sha256=AWPF3qSmQwvHllZ8jpmZQpUUoumloDTYA7tG89VjOGw,16340
+ pearmut-0.2.11.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ pearmut-0.2.11.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
+ pearmut-0.2.11.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
+ pearmut-0.2.11.dist-info/RECORD,,
@@ -1,19 +0,0 @@
- pearmut/app.py,sha256=FV7pEf-TzMmATLLpPMIjWOwznxaND8IYX_jO_s0h9oQ,10236
- pearmut/assignment.py,sha256=GvulwsPEguA_rNZB58bDKYy1wVZX9j4vnmbrKH4m0Mo,10963
- pearmut/cli.py,sha256=oaViD9vlvFW3WIs2Ncm5DSUWRqObDZ1iTcAt1HAdaYg,17686
- pearmut/utils.py,sha256=TWcbdTehg4CNwCpc5FuEOszpQM464LY0IQHHE_Sq1Zg,5293
- pearmut/static/dashboard.bundle.js,sha256=9cjhcY57uEOaJNcrOdn4lpMpQIfsRtPlr-rcQBQ92K0,100268
- pearmut/static/dashboard.html,sha256=w1xNgLakDMxzp9iDp18SOoKHO10kB7ldvzuuwsC0zxk,2694
- pearmut/static/index.html,sha256=SC5M-NSTnJh1UNHCC5VOP0TKkmhNn6MHlY6L4GDacpA,849
- pearmut/static/listwise.bundle.js,sha256=BqHFM-o5Eg0FvCaa2-BoLtWffiH1F0yJV76wXRlGOEM,109635
- pearmut/static/listwise.html,sha256=4A0a_GMVIjJmqT3lhJMT9huqvwgvrRfztt0KA0lJxKI,5308
- pearmut/static/pointwise.bundle.js,sha256=cuCT-quJzsRzorS8ojaQ0uXbDix-niS1txKOhq8QirE,108974
- pearmut/static/pointwise.html,sha256=2NZYyjpznXP2b4GMeDcrjRYI5hZ45l7QgI-RQjkRUqs,5024
- pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
- pearmut/static/assets/style.css,sha256=BrPnXTDr8hQ0M8T-EJlExddChzIFotlerBYMx2B8GDk,4136
- pearmut-0.2.9.dist-info/licenses/LICENSE,sha256=GtR6RcTdRn-P23h5pKFuWSLZrLPD0ytHAwSOBt7aLpI,1071
- pearmut-0.2.9.dist-info/METADATA,sha256=_qNEVQtaI1wAaECmY8PTTvmHQzs5j6ks67A5VaZWZy8,12429
- pearmut-0.2.9.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- pearmut-0.2.9.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
- pearmut-0.2.9.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
- pearmut-0.2.9.dist-info/RECORD,,
File without changes
File without changes