pearmut 0.1.2__py3-none-any.whl → 0.2.0__py3-none-any.whl

@@ -66,4 +66,4 @@
66
66
  direction: rtl;
67
67
  width: 16px;
68
68
  height: 200px;
69
- }</style><script defer="defer" src="pointwise.bundle.js"></script></head><body><div style="max-width: 1600px; min-width: 900px; margin-left: auto; margin-right: auto; margin-top: 20px; padding-left: 10px;"><div class="white-box" style="margin-right: 30px; background-color: #e7e2cf; padding: 5px 15px 5px 5px;"><span id="instructions_global" style="display: inline-block; font-size: 11pt; width: calc(100% - 170px);"><ul id="instructions_spans"><li>Error spans:<ul><li><strong>Select</strong> the part of translation where you have identified a <strong>translation error</strong> (drag or click start & end).</li><li><strong>Click</strong> on the highlight to change error severity (minor/major) or remove the highlight.</li></ul>Choose error severity:<ul><li><span class="instruction_sev" id="instruction_sev_minor">Minor errors:</span> Style, grammar, word choice could be better or more natural.</li><li><span class="instruction_sev" id="instruction_sev_major">Major errors:</span>: The meaning is changed significantly and/or the part is really hard to understand.</li></ul><strong>Tip</strong>: Highlight the word or general area of the error (it doesn't need to be exact). Use separate highlights for different errors.<br></li><li id="instructions_score">Score the translation: Please use the slider and set an overall score based on meaning preservation and general quality:</li><ul><li>0: <strong>No meaning preserved</strong>: most information is lost.</li><li>33%: <strong>Some meaning preserved</strong>: major gaps and narrative issues.</li><li>66%: <strong>Most meaning preserved</strong>: minor issues with grammar or consistency.</li><li>100%: <strong>Perfect</strong>: meaning and grammar align completely with the source.</li></ul><li id="instructions_categories">Error types: After highlighting an error fragment, you will be asked to select the specific error type (main category and subcategory). 
If you are unsure about which errors fall under which categories, please consult the <a href="https://themqm.org/the-mqm-typology/" style="font-weight: bold; text-decoration: none; color: black;">typology definitions</a>.</li></ul></span><div style="width: 170px; display: inline-block; vertical-align: top; text-align: right; padding-top: 5px;"><span id="time" style="width: 135px; text-align: left; display: inline-block; font-size: 11pt;" title="Approximation of total annotation time.">Time: 0m</span> <input type="button" value="⚙️" id="button_settings" style="height: 1.5em; width: 30px;"><br><br><div id="progress" style="text-align: center;"></div><br><br><input type="button" value="Next 🛠️" id="button_next" disabled="disabled" style="width: 170px; height: 2.5em;" title="Finish annotating all examples first."> <input type="button" value="skip tutorial" id="button_skip_tutorial" style="width: 170px; font-size: 11pt; height: 30px; margin-top: 10px; display: none;" title="Skip tutorial only if you completed it already."></div></div><div id="settings_div" class="white-box" style="margin-right: 20px; margin-top: 10px; display: none; background-color: #e7e2cf; font-size: 11pt;"><input type="checkbox" id="settings_approximate_alignment"> <label for="settings_approximate_alignment">Show approximate alignment</label></div><div id="output_div" style="margin-top: 100px;"></div><br><br><br></div></body></html>
69
+ }</style><script defer="defer" src="pointwise.bundle.js"></script></head><body><div style="max-width: 1600px; min-width: 900px; margin-left: auto; margin-right: auto; margin-top: 20px; padding-left: 10px;"><div class="white-box" style="margin-right: 30px; background-color: #e7e2cf; padding: 5px 15px 5px 5px;"><span id="instructions_global" style="display: inline-block; font-size: 11pt; width: calc(100% - 170px);"><ul><li id="instructions_spans">Error spans:<ul><li><strong>Select</strong> the part of translation where you have identified a <strong>translation error</strong> (drag or click start & end).</li><li><strong>Click</strong> on the highlight to change error severity (minor/major) or remove the highlight.</li></ul>Choose error severity:<ul><li><span class="instruction_sev" id="instruction_sev_minor">Minor errors:</span> Style, grammar, word choice could be better or more natural.</li><li><span class="instruction_sev" id="instruction_sev_major">Major errors:</span>: The meaning is changed significantly and/or the part is really hard to understand.</li></ul><strong>Tip</strong>: Highlight the word or general area of the error (it doesn't need to be exact). Use separate highlights for different errors.<br></li><li id="instructions_score">Score the translation: Please use the slider and set an overall score based on meaning preservation and general quality:</li><ul><li>0: <strong>No meaning preserved</strong>: most information is lost.</li><li>33%: <strong>Some meaning preserved</strong>: major gaps and narrative issues.</li><li>66%: <strong>Most meaning preserved</strong>: minor issues with grammar or consistency.</li><li>100%: <strong>Perfect</strong>: meaning and grammar align completely with the source.</li></ul><li id="instructions_categories">Error types: After highlighting an error fragment, you will be asked to select the specific error type (main category and subcategory). 
If you are unsure about which errors fall under which categories, please consult the <a href="https://themqm.org/the-mqm-typology/" style="font-weight: bold; text-decoration: none; color: black;">typology definitions</a>.</li></ul></span><div style="width: 170px; display: inline-block; vertical-align: top; text-align: right; padding-top: 5px;"><span id="time" style="width: 135px; text-align: left; display: inline-block; font-size: 11pt;" title="Approximation of total annotation time.">Time: 0m</span> <input type="button" value="⚙️" id="button_settings" style="height: 1.5em; width: 30px;"><br><br><div id="progress" style="text-align: center;"></div><br><br><input type="button" value="Next 🛠️" id="button_next" disabled="disabled" style="width: 170px; height: 2.5em;" title="Finish annotating all examples first."> <input type="button" value="skip tutorial" id="button_skip_tutorial" style="width: 170px; font-size: 11pt; height: 30px; margin-top: 10px; display: none;" title="Skip tutorial only if you completed it already."></div></div><div id="settings_div" class="white-box" style="margin-right: 20px; margin-top: 10px; display: none; background-color: #e7e2cf; font-size: 11pt;"><input type="checkbox" id="settings_approximate_alignment"> <label for="settings_approximate_alignment">Show approximate alignment</label></div><div id="output_div" style="margin-top: 100px;"></div><br><br><br></div></body></html>
pearmut/utils.py CHANGED
@@ -3,6 +3,7 @@ import os
3
3
 
4
4
  ROOT = "."
5
5
 
6
+
6
7
  def highlight_differences(a, b):
7
8
  """
8
9
  Compares two strings and wraps their differences in HTML span tags.
@@ -30,7 +31,7 @@ def highlight_differences(a, b):
30
31
  res_a.append(f"{span_open}{a[i1:i2]}{span_close}")
31
32
  if tag in ('replace', 'insert'):
32
33
  res_b.append(f"{span_open}{b[j1:j2]}{span_close}")
33
-
34
+
34
35
  return "".join(res_a), "".join(res_b)
35
36
 
36
37
 
@@ -43,6 +44,58 @@ def load_progress_data(warn: str | None = None):
43
44
  with open(f"{ROOT}/data/progress.json", "r") as f:
44
45
  return json.load(f)
45
46
 
47
+
46
48
  def save_progress_data(data):
47
49
  with open(f"{ROOT}/data/progress.json", "w") as f:
48
- json.dump(data, f, indent=2)
50
+ json.dump(data, f, indent=2)
51
+
52
+
53
+ _logs = {}
54
+
55
+
56
+ def get_db_log(campaign_id: str) -> list[dict]:
57
+ """
58
+ Returns the up-to-date log for the given campaign_id.
59
+ """
60
+ if campaign_id not in _logs:
61
+ # create a new one if it doesn't exist
62
+ log_path = f"{ROOT}/data/outputs/{campaign_id}.jsonl"
63
+ if os.path.exists(log_path):
64
+ with open(log_path, "r") as f:
65
+ _logs[campaign_id] = [
66
+ json.loads(line) for line in f.readlines()
67
+ ]
68
+ else:
69
+ _logs[campaign_id] = []
70
+
71
+ return _logs[campaign_id]
72
+
73
+
74
+ def get_db_log_item(campaign_id: str, user_id: str | None, item_i: int | None) -> list[dict]:
75
+ """
76
+ Returns the log entries for the given campaign_id that match user_id and item_i (None matches any value).
77
+ Can be empty.
78
+ """
79
+ log = get_db_log(campaign_id)
80
+ return [
81
+ entry for entry in log
82
+ if (
83
+ (user_id is None or entry.get("user_id") == user_id) and
84
+ (item_i is None or entry.get("item_i") == item_i)
85
+ )
86
+ ]
87
+
88
+
89
+ def save_db_payload(campaign_id: str, payload: dict):
90
+ """
91
+ Appends the given payload to the log for the given campaign_id.
92
+ Saves both on disk and in-memory.
93
+ """
94
+
95
+ log_path = f"{ROOT}/data/outputs/{campaign_id}.jsonl"
96
+ with open(log_path, "a") as log_file:
97
+ log_file.write(json.dumps(payload, ensure_ascii=False,) + "\n")
98
+
99
+ log = get_db_log(campaign_id)
100
+ # mirror the payload in the in-memory log (the same dict object, no copy is made)
101
+ log.append(payload)
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pearmut
3
- Version: 0.1.2
3
+ Version: 0.2.0
4
4
  Summary: A tool for evaluation of model outputs, primarily MT.
5
5
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
6
6
  License: apache-2.0
@@ -23,7 +23,7 @@ Dynamic: license-file
23
23
 
24
24
  Pearmut is a **Platform for Evaluation and Reviewing of Multilingual Tasks**.
25
25
  It evaluates model outputs, primarily translation but also various other NLP tasks.
26
- Supports multimodality (text, video, audio, images) and a variety of annotation protocols (DA, ESA, MQM, paired ESA, etc).
26
+ Supports multimodality (text, video, audio, images) and a variety of annotation protocols ([DA](https://aclanthology.org/N15-1124/), [ESA](https://aclanthology.org/2024.wmt-1.131/), [ESA<sup>AI</sup>](https://aclanthology.org/2025.naacl-long.255/), [MQM](https://doi.org/10.1162/tacl_a_00437), paired ESA, etc).
27
27
 
28
28
  [![PyPi version](https://badgen.net/pypi/v/pearmut/)](https://pypi.org/project/pearmut)
29
29
  &nbsp;
@@ -31,7 +31,7 @@ Supports multimodality (text, video, audio, images) and a variety of annotation
31
31
  &nbsp;
32
32
  [![PyPi license](https://badgen.net/pypi/license/pearmut/)](https://pypi.org/project/pearmut/)
33
33
  &nbsp;
34
- [![build status](https://github.com/zouharvi/pearmut/actions/workflows/ci.yml/badge.svg)](https://github.com/zouharvi/pearmut/actions/workflows/ci.yml)
34
+ [![build status](https://github.com/zouharvi/pearmut/actions/workflows/test.yml/badge.svg)](https://github.com/zouharvi/pearmut/actions/workflows/test.yml)
35
35
 
36
36
  <img width="1000" alt="Screenshot of ESA/MQM interface" src="https://github.com/user-attachments/assets/f14c91a5-44d7-4248-ada9-387e95ca59d0" />
37
37
 
@@ -42,11 +42,11 @@ You do not need to clone this repository. Simply install with pip and run locall
42
42
  # install the package
43
43
  pip install pearmut
44
44
  # download two campaign definitions
45
- wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/wmt25_%23_en-cs_CZ.json
46
- wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/wmt25_%23_cs-de_DE.json
45
+ wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/esa_encs.json
46
+ wget https://raw.githubusercontent.com/zouharvi/pearmut/refs/heads/main/examples/da_enuk.json
47
47
  # load them into pearmut
48
- pearmut add wmt25_#_en-cs_CZ.json
49
- pearmut add wmt25_#_cs-de_DE.json
48
+ pearmut add esa_encs.json
49
+ pearmut add da_enuk.json
50
50
  # start pearmut (will show management links)
51
51
  pearmut run
52
52
  ```
@@ -115,6 +115,62 @@ For the standard ones (ESA, DA, MQM), we expect each item to be a dictionary (co
115
115
  ... # definition of another item (document)
116
116
  ```
117
117
 
118
+ ## Pre-filled Error Spans (ESA<sup>AI</sup> Support)
119
+
120
+ For workflows where you want to provide pre-filled error annotations (e.g., ESA<sup>AI</sup>), you can include an `error_spans` key in each item.
121
+ These spans will be loaded into the interface as existing annotations that users can review, modify, or delete.
122
+
123
+ ```python
124
+ {
125
+ "src": "The quick brown fox jumps over the lazy dog.",
126
+ "tgt": "Rychlá hnědá liška skáče přes líného psa.",
127
+ "error_spans": [
128
+ {
129
+ "start_i": 0, # character index start (inclusive)
130
+ "end_i": 5, # character index end (inclusive)
131
+ "severity": "minor", # "minor", "major", "neutral", or null
132
+ "category": null # MQM category string or null
133
+ },
134
+ {
135
+ "start_i": 27,
136
+ "end_i": 32,
137
+ "severity": "major",
138
+ "category": null
139
+ }
140
+ ]
141
+ }
142
+ ```
143
+
144
+ For the **listwise** template, `error_spans` is a 2D array where each inner array holds the error spans for the corresponding candidate.
145
+
146
+ See [examples/esaai_prefilled.json](examples/esaai_prefilled.json) for a complete example.
147
+
148
+ ## Tutorial and Attention Checks
149
+
150
+ You can add validation rules to items for tutorials or attention checks. Items with a `validation` field are checked before submission:
151
+
152
+ ```python
153
+ {
154
+ "src": "The quick brown fox jumps.",
155
+ "tgt": "Rychlá hnědá liška skáče.",
156
+ "validation": {
157
+ "warning": "Please set score between 70-80.", # shown on failure (omit for silent logging)
158
+ "score": [70, 80], # required score range [min, max]
159
+ "error_spans": [{"start_i": [0, 2], "end_i": [4, 8], "severity": "minor"}], # expected spans
160
+ "allow_skip": true # show "skip tutorial" button
161
+ }
162
+ }
163
+ ```
164
+
165
+ - Tutorial items: Include `allow_skip: true` and `warning` to let users skip after seeing the feedback
166
+ - Loud attention checks: Include `warning` without `allow_skip` to force users to retry
167
+ - Silent attention checks: Omit `warning` to silently log failures without user notification (useful for quality control with bad translations)
168
+ For the listwise template, `validation` is an array where each element corresponds to a candidate.
169
+ The dashboard shows failed/total validation checks per user.
170
+ See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json) for complete examples.
171
+
172
+ ## Single-stream Assignment
173
+
118
174
  We also support a simple allocation where all annotators draw from the same pool (`single-stream`). Items are randomly assigned to annotators from the pool of unfinished items:
119
175
  ```python
120
176
  {
@@ -138,7 +194,7 @@ We also support dynamic allocation of annotations (`dynamic`, not yet ⚠️), w
138
194
  "campaign_id": "my campaign 6",
139
195
  "info": {
140
196
  "assignment": "dynamic",
141
- "template": "kway",
197
+ "template": "listwise",
142
198
  "protocol_k": 5,
143
199
  "num_users": 50,
144
200
  },
@@ -154,6 +210,25 @@ pearmut add my_campaign_4.json
154
210
  pearmut run
155
211
  ```
156
212
 
213
+ ## Campaign options
214
+
215
+ In summary, you can select from the following assignment types:
216
+
217
+ - `task-based`: each user has a predefined set of items
218
+ - `single-stream`: all users annotate the same shared pool of items
219
+ - `dynamic`: WIP ⚠️
220
+
221
+ and independently of that select your protocol template:
222
+
223
+ - `pointwise`: evaluate a single output given a single input
224
+ - `protocol_score`: ask for score 0 to 100
225
+ - `protocol_error_spans`: ask for highlighting error spans
226
+ - `protocol_error_categories`: ask for highlighting error categories
227
+ - `listwise`: evaluate multiple outputs at the same time given a single input ⚠️
228
+ - `protocol_score`: ask for score 0 to 100
229
+ - `protocol_error_spans`: ask for highlighting error spans
230
+ - `protocol_error_categories`: ask for highlighting error categories
231
+
157
232
  ## Campaign management
158
233
 
159
234
  When adding new campaigns or launching pearmut, a management link is shown that gives an overview of annotator progress and provides easy access to the annotation links and to resetting the task progress (no data will be lost).
@@ -170,7 +245,7 @@ An intentionally incorrect token can be shown if the annotations don't pass qual
170
245
 
171
246
  We also support anything HTML-compatible both on the input and on the output.
172
247
  This includes embedded YouTube videos, or even simple `<video ` tags that point to some resource somewhere.
173
- For an example, try [examples/mock_multimodal.json](examples/mock_multimodal.json).
248
+ For an example, try [examples/multimodal.json](examples/multimodal.json).
174
249
  Tip: make sure the elements are already appropriately styled.
175
250
 
176
251
  <img width="800" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/f34a1a3e-ad95-4114-95ee-8a49e8003faf" />
@@ -209,7 +284,7 @@ If you use this work in your paper, please cite as:
209
284
  ```bibtex
210
285
  @misc{zouhar2025pearmut,
211
286
  author={Vilém Zouhar},
212
- title={Pearmut🍐 Platform for Evaluation and Reviewing of Multilingual Tasks},
287
+ title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
213
288
  url={https://github.com/zouharvi/pearmut/},
214
289
  year={2025},
215
290
  }
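The README changes above describe `error_spans` with inclusive character indices and a `validation` block with a score range and expected spans. The checking itself happens in the bundled front end, which this diff does not show, so the following is only a sketch of one plausible reading: `[min, max]` pairs are treated as inclusive ranges, and an expected span is considered matched when some submitted span starts and ends inside the given ranges with the same severity. The function names are hypothetical.

```python
import unicodedata

# Target sentence from the README example, normalized so that each accented
# letter is a single code point and the indices line up.
TGT = unicodedata.normalize("NFC", "Rychlá hnědá liška skáče přes líného psa.")


def span_text(text: str, span: dict) -> str:
    """Return the highlighted text; both start_i and end_i are inclusive."""
    return text[span["start_i"]: span["end_i"] + 1]


def in_range(value: float, bounds: list) -> bool:
    """Inclusive [min, max] check (inclusivity is an assumption)."""
    low, high = bounds
    return low <= value <= high


def span_matches(expected: dict, actual: dict) -> bool:
    """Compare one submitted span against one expected span with index ranges."""
    return (
        in_range(actual["start_i"], expected["start_i"])
        and in_range(actual["end_i"], expected["end_i"])
        and ("severity" not in expected or expected["severity"] == actual.get("severity"))
    )


def passes_validation(validation: dict, score: float, spans: list) -> bool:
    """Check a submission against the score range and every expected span."""
    if "score" in validation and not in_range(score, validation["score"]):
        return False
    return all(
        any(span_matches(expected, actual) for actual in spans)
        for expected in validation.get("error_spans", [])
    )
```

With the first span from the README example, `span_text(TGT, {"start_i": 0, "end_i": 5})` covers the six characters of the first word, which is what an inclusive end index implies.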
@@ -0,0 +1,19 @@
1
+ pearmut/app.py,sha256=D4wk5HjEDFtkDakSWUBOb8sKRsbi_dBw3yyL1n6jhpQ,7957
2
+ pearmut/assignment.py,sha256=Sycq-_6BTjpm7KPSZ02zX9aTZxOr-zaxW5QbZpQlqV8,10415
3
+ pearmut/cli.py,sha256=xB05Fq8Ic1ucSxHWYBTtqWssFz0FwoLzHO7RFAG2vcc,7684
4
+ pearmut/utils.py,sha256=gk8b4biPc9TTvZiQMQ_8xh1_FsWuwrhtPzeK3NpzhZc,2902
5
+ pearmut/static/dashboard.bundle.js,sha256=NWGQfd0kXkSkpElCukPrMIPJROE8mMIkvhRwHHMzuAA,91528
6
+ pearmut/static/dashboard.html,sha256=lleOeCqjaCHM5ZG45Q5eM8vWxW65CTmJR3PEJbUKREE,1790
7
+ pearmut/static/index.html,sha256=ieCRLK83MVe-f-gtjYiOlvE-kKd8VnFF2xgyi6FoZpU,872
8
+ pearmut/static/listwise.bundle.js,sha256=UEb1smJt4kgeZU2FUqyc7jWGYQCzV8ri-1bZJXBxGHY,104819
9
+ pearmut/static/listwise.html,sha256=1z83PNGRR_4NEQ8kYxP19Aem_ew5CAKhKtcn2zxGL3M,5212
10
+ pearmut/static/pointwise.bundle.js,sha256=zd8U5tyYb3-IhF_07njSB9Nkab76ZYTj70Q1YPBlKkU,107171
11
+ pearmut/static/pointwise.html,sha256=lvplPE-9RxA-IFWkvzMEVGdroHN68qK9hvzMSuj-mmo,5009
12
+ pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
13
+ pearmut/static/assets/style.css,sha256=SARZqqovP_2s9S5ENI7dxJ6Hacz-ztQ2zn2Hn7DwoJU,4089
14
+ pearmut-0.2.0.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
15
+ pearmut-0.2.0.dist-info/METADATA,sha256=fffWQPgx2ytlttm1vmij2gphQz9bDmnVhRakenrsyeM,12266
16
+ pearmut-0.2.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
17
+ pearmut-0.2.0.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
18
+ pearmut-0.2.0.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
19
+ pearmut-0.2.0.dist-info/RECORD,,
@@ -1,19 +0,0 @@
1
- pearmut/app.py,sha256=s_xv7Nq9dm3ObApH_Iz9myS-H_q4oXsFKqwiwVbQYuY,6740
2
- pearmut/assignment.py,sha256=IgGXmZKFASoGW8jVeXXUN3meY8Two-Txwg4nMwZEOnA,6422
3
- pearmut/cli.py,sha256=mV76uw6BywckbU7QEKIKTboukcALEdZp7l-kskJnBVA,7683
4
- pearmut/utils.py,sha256=6hfVenrVdGm1r-7uJIkWHhX9o0ztWjqPse_j_MqkgBw,1443
5
- pearmut/static/dashboard.bundle.js,sha256=6389gsHLCFh6JqiKdU3ng-Lm6VICRvfJgCSYM61H75U,91257
6
- pearmut/static/dashboard.html,sha256=tUP1yYvbKySRz0mxFtGq2Si4hTMhJkUCWeTpnq91Nf4,1789
7
- pearmut/static/index.html,sha256=ieCRLK83MVe-f-gtjYiOlvE-kKd8VnFF2xgyi6FoZpU,872
8
- pearmut/static/listwise.bundle.js,sha256=_KWKocPZjkDHHoiixKFOZzmD0qlw-nqFheBPcbED0HM,100788
9
- pearmut/static/listwise.html,sha256=zipFfGus26qWEdFbuNQmaG-NR5S1yaczv2XpD8j843U,5203
10
- pearmut/static/pointwise.bundle.js,sha256=1mks6kD4P2w7uQqeze4GttKVc-JZvsLYKRktV6Em6R0,100431
11
- pearmut/static/pointwise.html,sha256=dhmfgpWvCFB833Y4kj08_aBZyCN33SayYcS1ckL2-FU,5009
12
- pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
13
- pearmut/static/assets/style.css,sha256=-B-RySjt8qccqkwvLT0PDy6IRoE1xytLLKAFtR_S-Tg,3967
14
- pearmut-0.1.2.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
15
- pearmut-0.1.2.dist-info/METADATA,sha256=cuHpmxeRqYF9H6s5ukP6RZBEx4tzy7bzipdhmbtIBVc,8923
16
- pearmut-0.1.2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
17
- pearmut-0.1.2.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
18
- pearmut-0.1.2.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
19
- pearmut-0.1.2.dist-info/RECORD,,
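Each RECORD line above has the form `path,sha256=<digest>,size`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed, as defined by the wheel format. The sketch below shows how such an entry could be recomputed and checked against file contents. The helper names are hypothetical, and the sketch assumes paths without embedded commas.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Hash bytes the way a wheel RECORD entry does."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str):
    """Split 'path,hash,size'; the RECORD file's own entry has empty hash and size."""
    path, hash_spec, size = line.strip().rsplit(",", 2)
    return path, hash_spec, int(size) if size else None


def verify_record_line(line: str, data: bytes) -> bool:
    """Check that data matches the size and hash recorded on the line."""
    _path, hash_spec, size = parse_record_line(line)
    if not hash_spec:
        return True  # nothing to verify, e.g. the RECORD entry itself
    return size == len(data) and hash_spec == record_hash(data)
```

Comparing the recomputed value against both versions' listings gives a quick way to confirm which files actually changed between releases: entries such as `index.html` and `favicon.svg` carry identical hashes in 0.1.2 and 0.2.0.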