pearmut 0.2.9__tar.gz → 0.2.10__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (24)
  1. {pearmut-0.2.9 → pearmut-0.2.10}/PKG-INFO +38 -2
  2. {pearmut-0.2.9 → pearmut-0.2.10}/README.md +37 -1
  3. {pearmut-0.2.9 → pearmut-0.2.10}/pearmut.egg-info/PKG-INFO +38 -2
  4. {pearmut-0.2.9 → pearmut-0.2.10}/pyproject.toml +1 -1
  5. {pearmut-0.2.9 → pearmut-0.2.10}/server/app.py +10 -2
  6. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/dashboard.bundle.js +1 -1
  7. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/listwise.bundle.js +1 -1
  8. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/pointwise.bundle.js +1 -1
  9. {pearmut-0.2.9 → pearmut-0.2.10}/LICENSE +0 -0
  10. {pearmut-0.2.9 → pearmut-0.2.10}/pearmut.egg-info/SOURCES.txt +0 -0
  11. {pearmut-0.2.9 → pearmut-0.2.10}/pearmut.egg-info/dependency_links.txt +0 -0
  12. {pearmut-0.2.9 → pearmut-0.2.10}/pearmut.egg-info/entry_points.txt +0 -0
  13. {pearmut-0.2.9 → pearmut-0.2.10}/pearmut.egg-info/requires.txt +0 -0
  14. {pearmut-0.2.9 → pearmut-0.2.10}/pearmut.egg-info/top_level.txt +0 -0
  15. {pearmut-0.2.9 → pearmut-0.2.10}/server/assignment.py +0 -0
  16. {pearmut-0.2.9 → pearmut-0.2.10}/server/cli.py +0 -0
  17. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/assets/favicon.svg +0 -0
  18. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/assets/style.css +0 -0
  19. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/dashboard.html +0 -0
  20. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/index.html +0 -0
  21. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/listwise.html +0 -0
  22. {pearmut-0.2.9 → pearmut-0.2.10}/server/static/pointwise.html +0 -0
  23. {pearmut-0.2.9 → pearmut-0.2.10}/server/utils.py +0 -0
  24. {pearmut-0.2.9 → pearmut-0.2.10}/setup.cfg +0 -0
--- pearmut-0.2.9/PKG-INFO
+++ pearmut-0.2.10/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pearmut
- Version: 0.2.9
+ Version: 0.2.10
  Summary: A tool for evaluation of model outputs, primarily MT.
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
  License: MIT
@@ -47,9 +47,13 @@ Dynamic: license-file
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [CLI Commands](#cli-commands)
+ - [Terminology](#terminology)
  - [Development](#development)
  - [Citation](#citation)

+
+ **Error Span** — A highlighted segment of text marked as containing an error, with optional severity (`minor`, `major`, `neutral`) and MQM category labels.
+
  ## Quick Start

  Install and run locally without cloning:
@@ -294,6 +298,38 @@ Items need `model` field (pointwise) or `models` field (listwise) and the `proto
  ```
  See an example in [Campaign Management](#campaign-management)

+
+ ## Terminology
+
+ - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
+ - **Campaign File**: A JSON file that defines the campaign configuration, including the campaign ID, assignment type, protocol settings, and annotation data.
+ - **Campaign ID**: A unique identifier for a campaign (e.g., `"wmt25_#_en-cs_CZ"`). Used to reference and manage specific campaigns.
+ - **Task**: A unit of work assigned to a user. In task-based assignment, each task consists of a predefined set of items for a specific user.
+ - **Item** — A single annotation unit within a task. For translation evaluation, an item typically represents a document (source text and target translation). Items can contain text, images, audio, or video.
+ - **Document** — A collection of one or more segments (sentence pairs or text units) that are evaluated together as a single item.
+ - **User** / **Annotator**: A person who performs annotations in a campaign. Each user is identified by a unique user ID and accesses the campaign through a unique URL.
+ - **Attention Check** — A validation item with known correct answers used to ensure annotator quality. Can be:
+   - **Loud**: Shows warning message and forces retry on failure
+   - **Silent**: Logs failures without notifying the user (for quality control analysis)
+ - **Token** — A completion code shown to users when they finish their annotations. Tokens verify the completion and whether the user passed quality control checks:
+   - **Pass Token** (`token_pass`): Shown when user meets validation thresholds
+   - **Fail Token** (`token_fail`): Shown when user fails to meet validation requirements
+ - **Tutorial**: An instructional validation item that teaches users how to annotate. Includes `allow_skip: true` to let users skip if they have seen it before.
+ - **Validation**: Quality control rules attached to items that check if annotations match expected criteria (score ranges, error span locations, etc.). Used for tutorials and attention checks.
+ - **Model**: The system or model that generated the output being evaluated (e.g., `"GPT-4"`, `"Claude"`). Used for tracking and ranking model performance.
+ - **Dashboard**: The management interface that shows campaign progress, annotator statistics, access links, and allows downloading annotations. Accessed via a special management URL with token authentication.
+ - **Protocol**: The annotation scheme defining what data is collected:
+   - **Score**: Numeric quality rating (0-100)
+   - **Error Spans**: Text highlights marking errors
+   - **Error Categories**: MQM taxonomy labels for errors
+ - **Template**: The annotation interface type:
+   - **Pointwise**: Evaluate one output at a time
+   - **Listwise**: Compare multiple outputs simultaneously
+ - **Assignment**: The method for distributing items to users:
+   - **Task-based**: Each user has predefined items
+   - **Single-stream**: Users draw from a shared pool with random assignment
+   - **Dynamic**: Work in progress
+
  ## Development

  Server responds to data-only requests from frontend (no template coupling). Frontend served from pre-built `static/` on install.
@@ -333,7 +369,7 @@ If you use this work in your paper, please cite as following.
  author={Vilém Zouhar},
  title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
  url={https://github.com/zouharvi/pearmut/},
- year={2025},
+ year={2026},
  }
  ```

--- pearmut-0.2.9/README.md
+++ pearmut-0.2.10/README.md
@@ -27,9 +27,13 @@
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [CLI Commands](#cli-commands)
+ - [Terminology](#terminology)
  - [Development](#development)
  - [Citation](#citation)

+
+ **Error Span** — A highlighted segment of text marked as containing an error, with optional severity (`minor`, `major`, `neutral`) and MQM category labels.
+
  ## Quick Start

  Install and run locally without cloning:
@@ -274,6 +278,38 @@ Items need `model` field (pointwise) or `models` field (listwise) and the `proto
  ```
  See an example in [Campaign Management](#campaign-management)

+
+ ## Terminology
+
+ - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
+ - **Campaign File**: A JSON file that defines the campaign configuration, including the campaign ID, assignment type, protocol settings, and annotation data.
+ - **Campaign ID**: A unique identifier for a campaign (e.g., `"wmt25_#_en-cs_CZ"`). Used to reference and manage specific campaigns.
+ - **Task**: A unit of work assigned to a user. In task-based assignment, each task consists of a predefined set of items for a specific user.
+ - **Item** — A single annotation unit within a task. For translation evaluation, an item typically represents a document (source text and target translation). Items can contain text, images, audio, or video.
+ - **Document** — A collection of one or more segments (sentence pairs or text units) that are evaluated together as a single item.
+ - **User** / **Annotator**: A person who performs annotations in a campaign. Each user is identified by a unique user ID and accesses the campaign through a unique URL.
+ - **Attention Check** — A validation item with known correct answers used to ensure annotator quality. Can be:
+   - **Loud**: Shows warning message and forces retry on failure
+   - **Silent**: Logs failures without notifying the user (for quality control analysis)
+ - **Token** — A completion code shown to users when they finish their annotations. Tokens verify the completion and whether the user passed quality control checks:
+   - **Pass Token** (`token_pass`): Shown when user meets validation thresholds
+   - **Fail Token** (`token_fail`): Shown when user fails to meet validation requirements
+ - **Tutorial**: An instructional validation item that teaches users how to annotate. Includes `allow_skip: true` to let users skip if they have seen it before.
+ - **Validation**: Quality control rules attached to items that check if annotations match expected criteria (score ranges, error span locations, etc.). Used for tutorials and attention checks.
+ - **Model**: The system or model that generated the output being evaluated (e.g., `"GPT-4"`, `"Claude"`). Used for tracking and ranking model performance.
+ - **Dashboard**: The management interface that shows campaign progress, annotator statistics, access links, and allows downloading annotations. Accessed via a special management URL with token authentication.
+ - **Protocol**: The annotation scheme defining what data is collected:
+   - **Score**: Numeric quality rating (0-100)
+   - **Error Spans**: Text highlights marking errors
+   - **Error Categories**: MQM taxonomy labels for errors
+ - **Template**: The annotation interface type:
+   - **Pointwise**: Evaluate one output at a time
+   - **Listwise**: Compare multiple outputs simultaneously
+ - **Assignment**: The method for distributing items to users:
+   - **Task-based**: Each user has predefined items
+   - **Single-stream**: Users draw from a shared pool with random assignment
+   - **Dynamic**: Work in progress
+
  ## Development

  Server responds to data-only requests from frontend (no template coupling). Frontend served from pre-built `static/` on install.
@@ -313,7 +349,7 @@ If you use this work in your paper, please cite as following.
  author={Vilém Zouhar},
  title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
  url={https://github.com/zouharvi/pearmut/},
- year={2025},
+ year={2026},
  }
  ```

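To make the new Terminology section concrete, here is a minimal sketch of what a campaign file could look like. Only `model`, `allow_skip`, `token_pass`, and `token_fail` are field names confirmed by the README above; every other key (`campaign_id`, `assignment`, `tasks`) is a hypothetical placeholder, so consult the Campaign Management section of the README for the authoritative schema.

```python
import json

# Hypothetical campaign-file sketch assembled from the Terminology section.
# Keys other than "model", "allow_skip", "token_pass", and "token_fail"
# are illustrative guesses, not the documented pearmut schema.
campaign = {
    "campaign_id": "wmt25_#_en-cs_CZ",  # unique campaign identifier
    "assignment": "task-based",         # or "single-stream" / "dynamic"
    "token_pass": "COMPLETED-OK",       # shown when validation thresholds are met
    "token_fail": "COMPLETED-QC",       # shown when quality control fails
    "tasks": [
        [  # one task = the predefined item list for one user
            {"model": "GPT-4", "allow_skip": True},  # tutorial item, skippable
            {"model": "GPT-4"},                      # regular pointwise item
        ],
    ],
}

with open("campaign.json", "w") as f:
    json.dump(campaign, f, indent=2)
```

Per the hunk header above, pointwise items carry a `model` field as sketched here, while listwise items would carry a `models` list instead.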
--- pearmut-0.2.9/pearmut.egg-info/PKG-INFO
+++ pearmut-0.2.10/pearmut.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pearmut
- Version: 0.2.9
+ Version: 0.2.10
  Summary: A tool for evaluation of model outputs, primarily MT.
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
  License: MIT
@@ -47,9 +47,13 @@ Dynamic: license-file
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [CLI Commands](#cli-commands)
+ - [Terminology](#terminology)
  - [Development](#development)
  - [Citation](#citation)

+
+ **Error Span** — A highlighted segment of text marked as containing an error, with optional severity (`minor`, `major`, `neutral`) and MQM category labels.
+
  ## Quick Start

  Install and run locally without cloning:
@@ -294,6 +298,38 @@ Items need `model` field (pointwise) or `models` field (listwise) and the `proto
  ```
  See an example in [Campaign Management](#campaign-management)

+
+ ## Terminology
+
+ - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
+ - **Campaign File**: A JSON file that defines the campaign configuration, including the campaign ID, assignment type, protocol settings, and annotation data.
+ - **Campaign ID**: A unique identifier for a campaign (e.g., `"wmt25_#_en-cs_CZ"`). Used to reference and manage specific campaigns.
+ - **Task**: A unit of work assigned to a user. In task-based assignment, each task consists of a predefined set of items for a specific user.
+ - **Item** — A single annotation unit within a task. For translation evaluation, an item typically represents a document (source text and target translation). Items can contain text, images, audio, or video.
+ - **Document** — A collection of one or more segments (sentence pairs or text units) that are evaluated together as a single item.
+ - **User** / **Annotator**: A person who performs annotations in a campaign. Each user is identified by a unique user ID and accesses the campaign through a unique URL.
+ - **Attention Check** — A validation item with known correct answers used to ensure annotator quality. Can be:
+   - **Loud**: Shows warning message and forces retry on failure
+   - **Silent**: Logs failures without notifying the user (for quality control analysis)
+ - **Token** — A completion code shown to users when they finish their annotations. Tokens verify the completion and whether the user passed quality control checks:
+   - **Pass Token** (`token_pass`): Shown when user meets validation thresholds
+   - **Fail Token** (`token_fail`): Shown when user fails to meet validation requirements
+ - **Tutorial**: An instructional validation item that teaches users how to annotate. Includes `allow_skip: true` to let users skip if they have seen it before.
+ - **Validation**: Quality control rules attached to items that check if annotations match expected criteria (score ranges, error span locations, etc.). Used for tutorials and attention checks.
+ - **Model**: The system or model that generated the output being evaluated (e.g., `"GPT-4"`, `"Claude"`). Used for tracking and ranking model performance.
+ - **Dashboard**: The management interface that shows campaign progress, annotator statistics, access links, and allows downloading annotations. Accessed via a special management URL with token authentication.
+ - **Protocol**: The annotation scheme defining what data is collected:
+   - **Score**: Numeric quality rating (0-100)
+   - **Error Spans**: Text highlights marking errors
+   - **Error Categories**: MQM taxonomy labels for errors
+ - **Template**: The annotation interface type:
+   - **Pointwise**: Evaluate one output at a time
+   - **Listwise**: Compare multiple outputs simultaneously
+ - **Assignment**: The method for distributing items to users:
+   - **Task-based**: Each user has predefined items
+   - **Single-stream**: Users draw from a shared pool with random assignment
+   - **Dynamic**: Work in progress
+
  ## Development

  Server responds to data-only requests from frontend (no template coupling). Frontend served from pre-built `static/` on install.
@@ -333,7 +369,7 @@ If you use this work in your paper, please cite as following.
  author={Vilém Zouhar},
  title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
  url={https://github.com/zouharvi/pearmut/},
- year={2025},
+ year={2026},
  }
  ```

--- pearmut-0.2.9/pyproject.toml
+++ pearmut-0.2.10/pyproject.toml
@@ -1,6 +1,6 @@
  [project]
  name = "pearmut"
- version = "0.2.9"
+ version = "0.2.10"
  description = "A tool for evaluation of model outputs, primarily MT."
  readme = "README.md"
  license = { text = "MIT" }
--- pearmut-0.2.9/server/app.py
+++ pearmut-0.2.10/server/app.py
@@ -291,7 +291,11 @@ async def _download_annotations(
      with open(output_path, "r") as f:
          output[campaign_id] = [json.loads(x) for x in f.readlines()]

-     return JSONResponse(content=output, status_code=200)
+     return JSONResponse(
+         content=output,
+         status_code=200,
+         headers={"Content-Disposition": 'inline; filename="annotations.json"'}
+     )


  @app.get("/download-progress")
@@ -315,7 +319,11 @@ async def _download_progress(

          output[cid] = progress_data[cid]

-     return JSONResponse(content=output, status_code=200)
+     return JSONResponse(
+         content=output,
+         status_code=200,
+         headers={"Content-Disposition": 'inline; filename="progress.json"'}
+     )


  static_dir = f"{os.path.dirname(os.path.abspath(__file__))}/static/"
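The server-side change in this release attaches a `Content-Disposition` header to both download handlers (`_download_annotations` and the `/download-progress` route), so the returned JSON carries a sensible default filename when saved from a browser. Below is a minimal, self-contained sketch of the same pattern; the route and payload are hypothetical, but `JSONResponse` and its `headers` parameter are standard FastAPI.

```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get("/download-example")  # hypothetical route, for illustration only
async def download_example():
    output = {"example_campaign": [{"score": 87}]}
    # "inline" hints that the browser should display the JSON in-page;
    # the filename is used when the user saves it. Using "attachment"
    # instead would force a download dialog.
    return JSONResponse(
        content=output,
        status_code=200,
        headers={"Content-Disposition": 'inline; filename="example.json"'},
    )
```

Without the header, browsers saving the response fall back to a filename derived from the URL.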