pearmut-0.2.2-py3-none-any.whl → pearmut-0.2.4-py3-none-any.whl

pearmut/static/pointwise.html CHANGED
@@ -66,4 +66,4 @@
   direction: rtl;
   width: 16px;
   height: 200px;
-}</style><script defer="defer" src="pointwise.bundle.js"></script></head><body><div style="max-width: 1600px; min-width: 900px; margin-left: auto; margin-right: auto; margin-top: 20px; padding-left: 10px;"><div class="white-box" style="margin-right: 30px; background-color: #e7e2cf; padding: 5px 15px 5px 5px;"><span id="instructions_global" style="display: inline-block; font-size: 11pt; width: calc(100% - 170px);"><ul><li id="instructions_spans">Error spans:<ul><li><strong>Select</strong> the part of translation where you have identified a <strong>translation error</strong> (drag or click start & end).</li><li><strong>Click</strong> on the highlight to change error severity (minor/major) or remove the highlight.</li></ul>Choose error severity:<ul><li><span class="instruction_sev" id="instruction_sev_minor">Minor errors:</span> Style, grammar, word choice could be better or more natural.</li><li><span class="instruction_sev" id="instruction_sev_major">Major errors:</span>: The meaning is changed significantly and/or the part is really hard to understand.</li></ul><strong>Tip</strong>: Highlight the word or general area of the error (it doesn't need to be exact). Use separate highlights for different errors.<br></li><li id="instructions_score">Score the translation: Please use the slider and set an overall score based on meaning preservation and general quality:</li><ul><li>0: <strong>No meaning preserved</strong>: most information is lost.</li><li>33%: <strong>Some meaning preserved</strong>: major gaps and narrative issues.</li><li>66%: <strong>Most meaning preserved</strong>: minor issues with grammar or consistency.</li><li>100%: <strong>Perfect</strong>: meaning and grammar align completely with the source.</li></ul><li id="instructions_categories">Error types: After highlighting an error fragment, you will be asked to select the specific error type (main category and subcategory). If you are unsure about which errors fall under which categories, please consult the <a href="https://themqm.org/the-mqm-typology/" style="font-weight: bold; text-decoration: none; color: black;">typology definitions</a>.</li></ul></span><div style="width: 170px; display: inline-block; vertical-align: top; text-align: right; padding-top: 5px;"><span id="time" style="width: 135px; text-align: left; display: inline-block; font-size: 11pt;" title="Approximation of total annotation time.">Time: 0m</span> <input type="button" value="⚙️" id="button_settings" style="height: 1.5em; width: 30px;"><br><br><div id="progress" style="text-align: center;"></div><br><br><input type="button" value="Next 🛠️" id="button_next" disabled="disabled" style="width: 170px; height: 2.5em;" title="Finish annotating all examples first."> <input type="button" value="skip tutorial" id="button_skip_tutorial" style="width: 170px; font-size: 11pt; height: 30px; margin-top: 10px; display: none;" title="Skip tutorial only if you completed it already."></div></div><div id="settings_div" class="white-box" style="margin-right: 20px; margin-top: 10px; display: none; background-color: #e7e2cf; font-size: 11pt;"><input type="checkbox" id="settings_approximate_alignment"> <label for="settings_approximate_alignment">Show approximate alignment</label></div><div id="output_div" style="margin-top: 100px;"></div><br><br><br></div></body></html>
+}</style><script defer="defer" src="pointwise.bundle.js"></script></head><body><div style="max-width: 1600px; min-width: 900px; margin-left: auto; margin-right: auto; margin-top: 20px; padding-left: 10px;"><div class="white-box" style="margin-right: 30px; background-color: #e7e2cf; padding: 5px 15px 5px 5px;"><span id="instructions_global" style="display: inline-block; font-size: 11pt; width: calc(100% - 170px);"><ul><li id="instructions_spans">Error spans:<ul><li><strong>Click</strong> on the start of an error, then <strong>click</strong> on the end to mark an error span.</li><li><strong>Click</strong> on an existing highlight to change error severity (minor/major) or remove it.</li></ul>Error severity:<ul><li><span class="instruction_sev" id="instruction_sev_minor">Minor:</span> Style, grammar, or word choice could be better.</li><li><span class="instruction_sev" id="instruction_sev_major">Major:</span> Meaning is significantly changed or is hard to understand.</li></ul><strong>Tip</strong>: Mark the general area of the error (doesn't need to be exact). Use separate highlights for different errors.<br></li><li id="instructions_score">Score the translation: Please use the slider and set an overall score based on meaning preservation and general quality:</li><ul><li>0: <strong>No meaning preserved</strong>: most information is lost.</li><li>33%: <strong>Some meaning preserved</strong>: major gaps and narrative issues.</li><li>66%: <strong>Most meaning preserved</strong>: minor issues with grammar or consistency.</li><li>100%: <strong>Perfect</strong>: meaning and grammar align completely with the source.</li></ul><li id="instructions_categories">Error types: After highlighting an error fragment, you will be asked to select the specific error type (main category and subcategory). If you are unsure about which errors fall under which categories, please consult the <a href="https://themqm.org/the-mqm-typology/" style="font-weight: bold; text-decoration: none; color: black;">typology definitions</a>.</li></ul></span><div style="width: 170px; display: inline-block; vertical-align: top; text-align: right; padding-top: 5px;"><span id="time" style="width: 135px; text-align: left; display: inline-block; font-size: 11pt;" title="Approximation of total annotation time.">Time: 0m</span> <input type="button" value="⚙️" id="button_settings" style="height: 1.5em; width: 30px;"><br><br><div id="progress" style="text-align: center;"></div><br><br><input type="button" value="Next 🛠️" id="button_next" disabled="disabled" style="width: 170px; height: 2.5em;" title="Finish annotating all examples first."> <input type="button" value="skip tutorial" id="button_skip_tutorial" style="width: 170px; font-size: 11pt; height: 30px; margin-top: 10px; display: none;" title="Skip tutorial only if you completed it already."></div></div><div id="settings_div" class="white-box" style="margin-right: 20px; margin-top: 10px; display: none; background-color: #e7e2cf; font-size: 11pt;"><input type="checkbox" id="settings_approximate_alignment"> <label for="settings_approximate_alignment">Show approximate alignment</label></div><div id="output_div" style="margin-top: 100px;"></div><br><br><br></div></body></html>
pearmut/utils.py CHANGED
@@ -3,6 +3,9 @@ import os
 
 ROOT = "."
 
+# Sentinel value to indicate a task reset - masks all prior annotations
+RESET_MARKER = "__RESET__"
+
 
 def highlight_differences(a, b):
     """
@@ -74,16 +77,31 @@ def get_db_log(campaign_id: str) -> list[dict]:
 def get_db_log_item(campaign_id: str, user_id: str | None, item_i: int | None) -> list[dict]:
     """
     Returns the log item for the given campaign_id, user_id and item_i.
-    Can be empty.
+    Can be empty. Respects reset markers - if a reset marker is found,
+    only entries after the last reset are returned.
     """
     log = get_db_log(campaign_id)
-    return [
+
+    # Filter matching entries
+    matching = [
         entry for entry in log
         if (
             (user_id is None or entry.get("user_id") == user_id) and
             (item_i is None or entry.get("item_i") == item_i)
         )
     ]
+
+    # Find the last reset marker for this user (if any)
+    last_reset_idx = -1
+    for i, entry in enumerate(matching):
+        if entry.get("annotations") == RESET_MARKER:
+            last_reset_idx = i
+
+    # Return only entries after the last reset
+    if last_reset_idx >= 0:
+        matching = matching[last_reset_idx + 1:]
+
+    return matching
 
 
 def save_db_payload(campaign_id: str, payload: dict):
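In effect, the new code first filters the log to the requested user/item and then discards everything up to and including the last reset entry. A minimal standalone sketch of that masking step (the log entries below are hypothetical, and the filter is re-implemented here rather than imported):

```python
RESET_MARKER = "__RESET__"

# Hypothetical in-memory log: an annotation, a reset, then a re-annotation.
log = [
    {"user_id": "alice", "item_i": 0, "annotations": {"score": 70}},
    {"user_id": "alice", "item_i": 0, "annotations": RESET_MARKER},  # task reset
    {"user_id": "alice", "item_i": 0, "annotations": {"score": 85}},
]

# Filter to the requested user/item, as in get_db_log_item
matching = [e for e in log if e["user_id"] == "alice" and e["item_i"] == 0]

# Keep only entries after the last reset marker
last_reset_idx = -1
for i, entry in enumerate(matching):
    if entry.get("annotations") == RESET_MARKER:
        last_reset_idx = i
if last_reset_idx >= 0:
    matching = matching[last_reset_idx + 1:]

print(matching)  # only the post-reset entry with {"score": 85}
```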
@@ -91,11 +109,61 @@ def save_db_payload(campaign_id: str, payload: dict):
     Saves the given payload to the log for the given campaign_id, user_id and item_i.
     Saves both on disk and in-memory.
     """
+    # Ensure the in-memory cache is initialized before writing to file
+    # to avoid reading back the same entry we're about to append
+    log = get_db_log(campaign_id)
 
     log_path = f"{ROOT}/data/outputs/{campaign_id}.jsonl"
+    os.makedirs(os.path.dirname(log_path), exist_ok=True)
     with open(log_path, "a") as log_file:
         log_file.write(json.dumps(payload, ensure_ascii=False,) + "\n")
 
-    log = get_db_log(campaign_id)
-    # copy to avoid mutation issues
     log.append(payload)
+
+
+def check_validation_threshold(
+    tasks_data: dict,
+    progress_data: dict,
+    campaign_id: str,
+    user_id: str,
+) -> bool:
+    """
+    Check if user passes the validation threshold.
+
+    The threshold is defined in campaign info as 'validation_threshold':
+    - If integer: pass if number of failed checks <= threshold
+    - If float in [0, 1): pass if proportion of failed checks <= threshold
+    - If float >= 1: always fail
+    - If None/not set: defaults to 0 (fail on any failed check)
+
+    Returns True if validation passes, False otherwise.
+    """
+    threshold = tasks_data[campaign_id]["info"].get("validation_threshold", 0)
+
+    user_progress = progress_data[campaign_id][user_id]
+    validations = user_progress.get("validations", {})
+
+    # Count failed checks (validations is dict of item_i -> list of bools)
+    total_checks = 0
+    failed_checks = 0
+    for item_validations in validations.values():
+        for check_passed in item_validations:
+            total_checks += 1
+            if not check_passed:
+                failed_checks += 1
+
+    # If no validation checks exist, pass
+    if total_checks == 0:
+        return True
+
+    # Float >= 1: always fail
+    if isinstance(threshold, float) and threshold >= 1:
+        return False
+
+    # Check threshold based on type
+    if isinstance(threshold, float):
+        # Float in [0, 1): proportion-based, pass if failed proportion <= threshold
+        return failed_checks / total_checks <= threshold
+    else:
+        # Integer: count-based, pass if failed count <= threshold
+        return failed_checks <= threshold
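A small sketch of how the threshold types behave when calling the new function (the campaign and user IDs are placeholders, the dict shapes mirror the lookups in the function body, and pearmut 0.2.4 is assumed to be installed):

```python
from pearmut.utils import check_validation_threshold

# Hypothetical progress: 4 validation checks in total, 1 failed.
progress_data = {"demo": {"alice": {"validations": {
    0: [True, True],   # item 0: both checks passed
    3: [False, True],  # item 3: one check failed
}}}}

def passes(threshold):
    tasks_data = {"demo": {"info": {"validation_threshold": threshold}}}
    return check_validation_threshold(tasks_data, progress_data, "demo", "alice")

print(passes(0))     # False: integer 0 tolerates no failed checks
print(passes(1))     # True: integer 1 tolerates one failed check
print(passes(0.25))  # True: proportion 1/4 <= 0.25
print(passes(1.0))   # False: a float >= 1 always fails
```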
pearmut-0.2.2.dist-info/METADATA → pearmut-0.2.4.dist-info/METADATA CHANGED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pearmut
-Version: 0.2.2
+Version: 0.2.4
 Summary: A tool for evaluation of model outputs, primarily MT.
 Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
 License: apache-2.0
@@ -16,7 +16,6 @@ Requires-Dist: wonderwords>=3.0.0
 Requires-Dist: psutil>=7.1.0
 Provides-Extra: dev
 Requires-Dist: pytest; extra == "dev"
-Requires-Dist: pynpm>=0.3.0; extra == "dev"
 Dynamic: license-file
 
 # Pearmut 🍐
@@ -165,8 +164,10 @@ You can add validation rules to items for tutorials or attention checks. Items w
 - Tutorial items: Include `allow_skip: true` and `warning` to let users skip after seeing the feedback
 - Loud attention checks: Include `warning` without `allow_skip` to force users to retry
 - Silent attention checks: Omit `warning` to silently log failures without user notification (useful for quality control with bad translations)
+
 For listwise template, `validation` is an array where each element corresponds to a candidate.
-The dashboard shows failed/total validation checks per user.
+The dashboard shows failed/total validation checks per user, and ✅/❌ based on whether they pass the threshold.
+Set `validation_threshold` in `info` to control pass/fail: integer for max failed count, float in [0,1) for max failed proportion.
 See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json) for complete examples.
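For instance, a campaign file could set the threshold like this (a minimal sketch; the other `info` fields are elided):

```python
{
    "campaign_id": "my_campaign",
    "info": {
        ...
        "validation_threshold": 0.1,  # pass while at most 10% of checks fail
    },
    "data": [...],
}
```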
 
 ## Single-stream Assignment
@@ -181,7 +182,7 @@ We also support a simple allocation where all annotators draw from the same pool
         "protocol_score": True, # collect scores
         "protocol_error_spans": True, # collect error spans
         "protocol_error_categories": False, # do not collect MQM categories, so ESA
-        "num_users": 50, # number of annotators
+        "users": 50, # number of annotators (can also be a list, see below)
     },
     "data": [...], # list of all items (shared among all annotators)
 }
196
197
  "assignment": "dynamic",
197
198
  "template": "listwise",
198
199
  "protocol_k": 5,
199
- "num_users": 50,
200
+ "users": 50,
200
201
  },
201
202
  "data": [...], # list of all items
202
203
  }
203
204
  ```
204
205
 
206
+ ## Pre-defined User IDs and Tokens
207
+
208
+ By default, user IDs and completion tokens are automatically generated. The `users` field can be:
209
+ - A number (e.g., `50`) to generate that many random user IDs
210
+ - A list of strings (e.g., `["alice", "bob"]`) to use specific user IDs
211
+ - A list of dictionaries to specify user IDs with custom tokens:
212
+ ```python
213
+ {
214
+ "info": {
215
+ ...
216
+ "users": [
217
+ {"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
218
+ {"user_id": "bob", "token_pass": "bob_done"} # missing tokens are auto-generated
219
+ ],
220
+ },
221
+ ...
222
+ }
223
+ ```
224
+
205
225
  To load a campaign into the server, run the following.
206
226
  It will fail if an existing campaign with the same `campaign_id` already exists, unless you specify `-o/--overwrite`.
207
227
  It will also output a secret management link. Then, launch the server:
@@ -234,8 +254,7 @@
 When adding new campaigns or launching pearmut, a management link is shown that gives an overview of annotator progress as well as easy access to the annotation links and to resetting task progress (no data will be lost).
 This is also the place where you can download all progress and collected annotations (these files also exist locally, but downloading may be more convenient).
 
-<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/82470693-a5ec-4d0e-8989-e93d5b0bb840" />
-
+<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/800a1741-5f41-47ac-9d5d-5cbf6abfc0e6" />
 
 Additionally, at the end of an annotation, a token of completion is shown, which can be compared against the correct one that you can download in the metadata from the dashboard.
 An intentionally incorrect token can be shown if the annotations don't pass quality control.
@@ -252,6 +271,39 @@ Tip: make sure the elements are already appropriately styled.
 
 <img width="1000" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/77c4fa96-ee62-4e46-8e78-fd16e9007956" />
 
+## CLI Commands
+
+Pearmut provides the following commands:
+
+- `pearmut add <file(s)>`: Add one or more campaign JSON files. Supports wildcards (e.g., `pearmut add examples/*.json`).
+  - `-o/--overwrite`: Overwrite existing campaigns with the same ID.
+  - `--server <url>`: Prefix server URL for protocol links (default: `http://localhost:8001`).
+- `pearmut run`: Start the Pearmut server.
+  - `--port <port>`: Port to run the server on (default: 8001).
+  - `--server <url>`: Prefix server URL for protocol links.
+- `pearmut purge [campaign]`: Remove campaign data.
+  - Without arguments: Purges all campaigns (tasks, outputs, progress).
+  - With campaign name: Purges only the specified campaign's data.
+
+
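Putting the commands together, a typical session might look like this (the campaign file is one of the bundled examples referenced earlier; the campaign name in the last line is hypothetical):

```
pearmut add examples/tutorial_pointwise.json -o   # load a campaign, overwriting if present
pearmut run --port 8001                           # serve the annotation and dashboard links
pearmut purge my_campaign                         # later: drop that campaign's data
```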
+## Hosting Assets
+
+If you need to host local assets (e.g., audio files, images, videos) via Pearmut, you can use the `assets` key in your campaign file.
+When present, this directory is symlinked to the `static/` directory so its contents become accessible from the server.
+
+```python
+{
+    "campaign_id": "my_campaign",
+    "info": {
+        "assets": "videos", # path to directory containing assets
+        ...
+    },
+    "data": [ ... ]
+}
+```
+
+For example, if `videos` contains `audio.mp3`, it will be accessible at `localhost:8001/assets/videos/audio.mp3`.
+The path can be absolute or relative to your current working directory.
 
 ## Development
 
pearmut-0.2.4.dist-info/RECORD ADDED
@@ -0,0 +1,19 @@
+pearmut/app.py,sha256=6dswjMC_YN6-3WHPSl8qhin6Qb2IsHXCveX9MKen-O0,8466
+pearmut/assignment.py,sha256=2dWuFacXCg65xjiEiqNPSXn4_4Z4fy5OgBolmCqgtUE,11181
+pearmut/cli.py,sha256=ff3UdCToXP_U1iKLHTAuHo9eDsK5G6d8ToVmSZ-6wYI,12582
+pearmut/utils.py,sha256=TWcbdTehg4CNwCpc5FuEOszpQM464LY0IQHHE_Sq1Zg,5293
+pearmut/static/dashboard.bundle.js,sha256=3i4o4VOZi2g2EsC6rzwz2pYO_YwncCIjnI0Gxz57Z44,91471
+pearmut/static/dashboard.html,sha256=aCYNhRZUHsVF_CXzKmzdBptEAnRTI3J5NKT4trxAots,1966
+pearmut/static/index.html,sha256=SC5M-NSTnJh1UNHCC5VOP0TKkmhNn6MHlY6L4GDacpA,849
+pearmut/static/listwise.bundle.js,sha256=kkXvg4F-xnNH8UzhuiAl1MqatwzAcs2h5r22jhnYvqE,105235
+pearmut/static/listwise.html,sha256=YZKQtB_TOt1gQKjJdwjEkcHAOiZoW2WlIFhpSr4kCo0,5163
+pearmut/static/pointwise.bundle.js,sha256=xVvarH95pYeZUqjfoXufyLzdISqkoJ4DcBshy-94WOw,107298
+pearmut/static/pointwise.html,sha256=7pf7HcyvM6t-Jze7tFYjfwTEu1C5Az1sg4e_SUbBFl0,4879
+pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
+pearmut/static/assets/style.css,sha256=BrPnXTDr8hQ0M8T-EJlExddChzIFotlerBYMx2B8GDk,4136
+pearmut-0.2.4.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
+pearmut-0.2.4.dist-info/METADATA,sha256=C8cZZDhSGEYnQOosPieoAeCoY_lb5iM8hc_7SHK4H4o,14381
+pearmut-0.2.4.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+pearmut-0.2.4.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
+pearmut-0.2.4.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
+pearmut-0.2.4.dist-info/RECORD,,
pearmut-0.2.2.dist-info/RECORD REMOVED
@@ -1,19 +0,0 @@
-pearmut/app.py,sha256=kGamXakzpuKFWQQSRV_rsFNn7rpbO-yMM09r65sdK2U,7911
-pearmut/assignment.py,sha256=Sycq-_6BTjpm7KPSZ02zX9aTZxOr-zaxW5QbZpQlqV8,10415
-pearmut/cli.py,sha256=9JFv8eop4HdpgJH9RzSWLMTo38fkoUBeMEcs1xmYiGs,7689
-pearmut/utils.py,sha256=gk8b4biPc9TTvZiQMQ_8xh1_FsWuwrhtPzeK3NpzhZc,2902
-pearmut/static/dashboard.bundle.js,sha256=tYnKv1eoDX_Ydfy7pHjFXR79SLjyHC5M3DMbrtlxPEg,91574
-pearmut/static/dashboard.html,sha256=tAFNUlrtYTJ_Bnh2Rer278eRyt_tIk8mXvN0sDcyzKE,1767
-pearmut/static/index.html,sha256=SC5M-NSTnJh1UNHCC5VOP0TKkmhNn6MHlY6L4GDacpA,849
-pearmut/static/listwise.bundle.js,sha256=tJzsHDoOsvWndDaxAcFaFlgiwimSyOXSeP1i2d4Q5n4,104842
-pearmut/static/listwise.html,sha256=evNyjPUCWPVfPSnGlzSEMhNmysH-WN4X_4drU91kBWY,5189
-pearmut/static/pointwise.bundle.js,sha256=CV3V3NcLpUPsBMhr4zVFvw_x5Udpd_jbtGcTrGUxK4g,107209
-pearmut/static/pointwise.html,sha256=snbT0UDxnKS3LEV8r832eglwzwkV0bqwY0zMWFnEUp4,4986
-pearmut/static/assets/favicon.svg,sha256=gVPxdBlyfyJVkiMfh8WLaiSyH4lpwmKZs8UiOeX8YW4,7347
-pearmut/static/assets/style.css,sha256=SARZqqovP_2s9S5ENI7dxJ6Hacz-ztQ2zn2Hn7DwoJU,4089
-pearmut-0.2.2.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
-pearmut-0.2.2.dist-info/METADATA,sha256=HIUdUB53cYuk4kBC3fywolFbG81XbVYyuqP_Jeq0KPg,12270
-pearmut-0.2.2.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-pearmut-0.2.2.dist-info/entry_points.txt,sha256=eEA9LVWsS3neQbMvL_nMvEw8I0oFudw8nQa1iqxOiWM,45
-pearmut-0.2.2.dist-info/top_level.txt,sha256=CdgtUM-SKQDt6o5g0QreO-_7XTBP9_wnHMS1P-Rl5Go,8
-pearmut-0.2.2.dist-info/RECORD,,