pearmut 0.2.9.tar.gz → 0.2.11.tar.gz
This diff shows the changes between publicly released versions of the package as they appear in their public registry. It is provided for informational purposes only.
- {pearmut-0.2.9 → pearmut-0.2.11}/PKG-INFO +53 -3
- {pearmut-0.2.9 → pearmut-0.2.11}/README.md +52 -2
- {pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/PKG-INFO +53 -3
- {pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/SOURCES.txt +2 -2
- {pearmut-0.2.9 → pearmut-0.2.11}/pyproject.toml +1 -1
- {pearmut-0.2.9 → pearmut-0.2.11}/server/app.py +20 -2
- {pearmut-0.2.9 → pearmut-0.2.11}/server/assignment.py +24 -8
- {pearmut-0.2.9 → pearmut-0.2.11}/server/cli.py +3 -8
- pearmut-0.2.11/server/static/dashboard.bundle.js +1 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/server/static/dashboard.html +1 -1
- pearmut-0.2.11/server/static/index.html +1 -0
- pearmut-0.2.11/server/static/listwise.bundle.js +1 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/server/static/listwise.html +2 -2
- pearmut-0.2.11/server/static/pointwise.bundle.js +1 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/server/static/pointwise.html +2 -2
- pearmut-0.2.9/server/static/dashboard.bundle.js +0 -1
- pearmut-0.2.9/server/static/index.html +0 -1
- pearmut-0.2.9/server/static/listwise.bundle.js +0 -1
- pearmut-0.2.9/server/static/pointwise.bundle.js +0 -1
- {pearmut-0.2.9 → pearmut-0.2.11}/LICENSE +0 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/dependency_links.txt +0 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/entry_points.txt +0 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/requires.txt +0 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/top_level.txt +0 -0
- {pearmut-0.2.9/server/static/assets → pearmut-0.2.11/server/static}/favicon.svg +0 -0
- {pearmut-0.2.9/server/static/assets → pearmut-0.2.11/server/static}/style.css +0 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/server/utils.py +0 -0
- {pearmut-0.2.9 → pearmut-0.2.11}/setup.cfg +0 -0

{pearmut-0.2.9 → pearmut-0.2.11}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pearmut
-Version: 0.2.9
+Version: 0.2.11
 Summary: A tool for evaluation of model outputs, primarily MT.
 Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
 License: MIT
@@ -47,9 +47,13 @@ Dynamic: license-file
 - [Hosting Assets](#hosting-assets)
 - [Campaign Management](#campaign-management)
 - [CLI Commands](#cli-commands)
+- [Terminology](#terminology)
 - [Development](#development)
 - [Citation](#citation)
 
+
+**Error Span** — A highlighted segment of text marked as containing an error, with optional severity (`minor`, `major`, `neutral`) and MQM category labels.
+
 ## Quick Start
 
 Install and run locally without cloning:
@@ -193,7 +197,21 @@ Add `validation` rules for tutorials or attention checks:
 - **Silent attention checks**: Omit `warning` to log failures without notification (quality control)
 
 For listwise, `validation` is an array (one per candidate). Dashboard shows ✅/❌ based on `validation_threshold` in `info` (integer for max failed count, float \[0,1\) for max proportion, default 0).
-
+
+**Listwise score comparison:** Use `score_greaterthan` to ensure one candidate scores higher than another:
+```python
+{
+    "src": "AI transforms industries.",
+    "tgt": ["UI transformuje průmysly.", "Umělá inteligence mění obory."],
+    "validation": [
+        {"warning": "A has error, score 20-40.", "score": [20, 40]},
+        {"warning": "B is correct and must score higher than A.", "score": [70, 90], "score_greaterthan": 0}
+    ]
+}
+```
+The `score_greaterthan` field specifies the index of the candidate that must have a lower score than the current candidate.
+
+See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json), [examples/tutorial_listwise.json](examples/tutorial_listwise.json), and [examples/tutorial_listwise_score_greaterthan.json](examples/tutorial_listwise_score_greaterthan.json).
 
 ### Single-stream Assignment
 
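
To make the added rule concrete, here is a minimal editorial sketch of how a `score_greaterthan` check could be evaluated against submitted scores. This is not Pearmut's own validator; the function and variable names are illustrative, and only the `score`, `score_greaterthan`, and `warning` fields come from the diff above.

```python
# Sketch only: evaluates listwise validation rules of the shape shown above.
# Assumed names: check_listwise_validation, scores (one score per candidate).

def check_listwise_validation(scores, validation):
    """Return the warnings of all candidates whose validation rule fails."""
    warnings = []
    for i, rule in enumerate(validation):
        ok = True
        # "score": [lo, hi] -> the candidate's score must fall inside the range
        if "score" in rule:
            lo, hi = rule["score"]
            ok = ok and lo <= scores[i] <= hi
        # "score_greaterthan": j -> candidate i must be scored higher than candidate j
        if "score_greaterthan" in rule:
            ok = ok and scores[i] > scores[rule["score_greaterthan"]]
        if not ok and "warning" in rule:
            warnings.append(rule["warning"])
    return warnings


print(check_listwise_validation(
    [30, 25],  # candidate A in range, candidate B too low and not above A
    [
        {"warning": "A has error, score 20-40.", "score": [20, 40]},
        {"warning": "B is correct and must score higher than A.",
         "score": [70, 90], "score_greaterthan": 0},
    ],
))  # -> ['B is correct and must score higher than A.']
```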

@@ -294,6 +312,38 @@ Items need `model` field (pointwise) or `models` field (listwise) and the `proto
 ```
 See an example in [Campaign Management](#campaign-management)
 
+
+## Terminology
+
+- **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
+- **Campaign File**: A JSON file that defines the campaign configuration, including the campaign ID, assignment type, protocol settings, and annotation data.
+- **Campaign ID**: A unique identifier for a campaign (e.g., `"wmt25_#_en-cs_CZ"`). Used to reference and manage specific campaigns.
+- **Task**: A unit of work assigned to a user. In task-based assignment, each task consists of a predefined set of items for a specific user.
+- **Item** — A single annotation unit within a task. For translation evaluation, an item typically represents a document (source text and target translation). Items can contain text, images, audio, or video.
+- **Document** — A collection of one or more segments (sentence pairs or text units) that are evaluated together as a single item.
+- **User** / **Annotator**: A person who performs annotations in a campaign. Each user is identified by a unique user ID and accesses the campaign through a unique URL.
+- **Attention Check** — A validation item with known correct answers used to ensure annotator quality. Can be:
+  - **Loud**: Shows warning message and forces retry on failure
+  - **Silent**: Logs failures without notifying the user (for quality control analysis)
+- **Token** — A completion code shown to users when they finish their annotations. Tokens verify the completion and whether the user passed quality control checks:
+  - **Pass Token** (`token_pass`): Shown when user meets validation thresholds
+  - **Fail Token** (`token_fail`): Shown when user fails to meet validation requirements
+- **Tutorial**: An instructional validation item that teaches users how to annotate. Includes `allow_skip: true` to let users skip if they have seen it before.
+- **Validation**: Quality control rules attached to items that check if annotations match expected criteria (score ranges, error span locations, etc.). Used for tutorials and attention checks.
+- **Model**: The system or model that generated the output being evaluated (e.g., `"GPT-4"`, `"Claude"`). Used for tracking and ranking model performance.
+- **Dashboard**: The management interface that shows campaign progress, annotator statistics, access links, and allows downloading annotations. Accessed via a special management URL with token authentication.
+- **Protocol**: The annotation scheme defining what data is collected:
+  - **Score**: Numeric quality rating (0-100)
+  - **Error Spans**: Text highlights marking errors
+  - **Error Categories**: MQM taxonomy labels for errors
+- **Template**: The annotation interface type:
+  - **Pointwise**: Evaluate one output at a time
+  - **Listwise**: Compare multiple outputs simultaneously
+- **Assignment**: The method for distributing items to users:
+  - **Task-based**: Each user has predefined items
+  - **Single-stream**: Users draw from a shared pool with random assignment
+  - **Dynamic**: Work in progress
+
 ## Development
 
 Server responds to data-only requests from frontend (no template coupling). Frontend served from pre-built `static/` on install.
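
As an editorial illustration of how the terms above fit together, here is a hypothetical minimal campaign file. Only the pieces visible elsewhere in this diff are grounded in the source (per-user `data` for task-based assignment, items with `src`/`tgt`/`models`, `validation_threshold` under `info`, the `token_pass`/`token_fail` names); the top-level key names and their placement are assumptions, not the documented schema.

```python
# Hypothetical campaign-file sketch (key placement is assumed, not documented).
{
    "id": "wmt25_#_en-cs_CZ",                 # Campaign ID
    "template": "listwise",                    # Template: compare multiple outputs
    "info": {"validation_threshold": 0},       # max failed attention checks (default 0)
    "token_pass": "COMPLETION-OK",             # Pass Token shown on success
    "token_fail": "COMPLETION-CHECK-FAILED",   # Fail Token shown otherwise
    "data": {                                  # task-based assignment: items per user
        "annotator1": [
            {
                "src": "AI transforms industries.",
                "tgt": ["UI transformuje průmysly.", "Umělá inteligence mění obory."],
                "models": ["modelA", "modelB"],
            },
        ],
    },
}
```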

@@ -333,7 +383,7 @@ If you use this work in your paper, please cite as following.
   author={Vilém Zouhar},
   title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
   url={https://github.com/zouharvi/pearmut/},
-  year={
+  year={2026},
 }
 ```
 

{pearmut-0.2.9 → pearmut-0.2.11}/README.md

@@ -27,9 +27,13 @@
 - [Hosting Assets](#hosting-assets)
 - [Campaign Management](#campaign-management)
 - [CLI Commands](#cli-commands)
+- [Terminology](#terminology)
 - [Development](#development)
 - [Citation](#citation)
 
+
+**Error Span** — A highlighted segment of text marked as containing an error, with optional severity (`minor`, `major`, `neutral`) and MQM category labels.
+
 ## Quick Start
 
 Install and run locally without cloning:
@@ -173,7 +177,21 @@ Add `validation` rules for tutorials or attention checks:
 - **Silent attention checks**: Omit `warning` to log failures without notification (quality control)
 
 For listwise, `validation` is an array (one per candidate). Dashboard shows ✅/❌ based on `validation_threshold` in `info` (integer for max failed count, float \[0,1\) for max proportion, default 0).
-
+
+**Listwise score comparison:** Use `score_greaterthan` to ensure one candidate scores higher than another:
+```python
+{
+    "src": "AI transforms industries.",
+    "tgt": ["UI transformuje průmysly.", "Umělá inteligence mění obory."],
+    "validation": [
+        {"warning": "A has error, score 20-40.", "score": [20, 40]},
+        {"warning": "B is correct and must score higher than A.", "score": [70, 90], "score_greaterthan": 0}
+    ]
+}
+```
+The `score_greaterthan` field specifies the index of the candidate that must have a lower score than the current candidate.
+
+See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json), [examples/tutorial_listwise.json](examples/tutorial_listwise.json), and [examples/tutorial_listwise_score_greaterthan.json](examples/tutorial_listwise_score_greaterthan.json).
 
 ### Single-stream Assignment
 
@@ -274,6 +292,38 @@ Items need `model` field (pointwise) or `models` field (listwise) and the `proto
 ```
 See an example in [Campaign Management](#campaign-management)
 
+
+## Terminology
+
+- **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
+- **Campaign File**: A JSON file that defines the campaign configuration, including the campaign ID, assignment type, protocol settings, and annotation data.
+- **Campaign ID**: A unique identifier for a campaign (e.g., `"wmt25_#_en-cs_CZ"`). Used to reference and manage specific campaigns.
+- **Task**: A unit of work assigned to a user. In task-based assignment, each task consists of a predefined set of items for a specific user.
+- **Item** — A single annotation unit within a task. For translation evaluation, an item typically represents a document (source text and target translation). Items can contain text, images, audio, or video.
+- **Document** — A collection of one or more segments (sentence pairs or text units) that are evaluated together as a single item.
+- **User** / **Annotator**: A person who performs annotations in a campaign. Each user is identified by a unique user ID and accesses the campaign through a unique URL.
+- **Attention Check** — A validation item with known correct answers used to ensure annotator quality. Can be:
+  - **Loud**: Shows warning message and forces retry on failure
+  - **Silent**: Logs failures without notifying the user (for quality control analysis)
+- **Token** — A completion code shown to users when they finish their annotations. Tokens verify the completion and whether the user passed quality control checks:
+  - **Pass Token** (`token_pass`): Shown when user meets validation thresholds
+  - **Fail Token** (`token_fail`): Shown when user fails to meet validation requirements
+- **Tutorial**: An instructional validation item that teaches users how to annotate. Includes `allow_skip: true` to let users skip if they have seen it before.
+- **Validation**: Quality control rules attached to items that check if annotations match expected criteria (score ranges, error span locations, etc.). Used for tutorials and attention checks.
+- **Model**: The system or model that generated the output being evaluated (e.g., `"GPT-4"`, `"Claude"`). Used for tracking and ranking model performance.
+- **Dashboard**: The management interface that shows campaign progress, annotator statistics, access links, and allows downloading annotations. Accessed via a special management URL with token authentication.
+- **Protocol**: The annotation scheme defining what data is collected:
+  - **Score**: Numeric quality rating (0-100)
+  - **Error Spans**: Text highlights marking errors
+  - **Error Categories**: MQM taxonomy labels for errors
+- **Template**: The annotation interface type:
+  - **Pointwise**: Evaluate one output at a time
+  - **Listwise**: Compare multiple outputs simultaneously
+- **Assignment**: The method for distributing items to users:
+  - **Task-based**: Each user has predefined items
+  - **Single-stream**: Users draw from a shared pool with random assignment
+  - **Dynamic**: Work in progress
+
 ## Development
 
 Server responds to data-only requests from frontend (no template coupling). Frontend served from pre-built `static/` on install.
@@ -313,7 +363,7 @@ If you use this work in your paper, please cite as following.
   author={Vilém Zouhar},
   title={Pearmut: Platform for Evaluating and Reviewing of Multilingual Tasks},
   url={https://github.com/zouharvi/pearmut/},
-  year={
+  year={2026},
 }
 ```
 

{pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/PKG-INFO

(The hunks in this file are identical to the PKG-INFO diff above; the egg-info copy is regenerated from PKG-INFO at build time.)

{pearmut-0.2.9 → pearmut-0.2.11}/pearmut.egg-info/SOURCES.txt

@@ -13,10 +13,10 @@ server/cli.py
 server/utils.py
 server/static/dashboard.bundle.js
 server/static/dashboard.html
+server/static/favicon.svg
 server/static/index.html
 server/static/listwise.bundle.js
 server/static/listwise.html
 server/static/pointwise.bundle.js
 server/static/pointwise.html
-server/static/
-server/static/assets/style.css
+server/static/style.css

{pearmut-0.2.9 → pearmut-0.2.11}/server/app.py

@@ -291,7 +291,11 @@ async def _download_annotations(
     with open(output_path, "r") as f:
         output[campaign_id] = [json.loads(x) for x in f.readlines()]
 
-    return JSONResponse(
+    return JSONResponse(
+        content=output,
+        status_code=200,
+        headers={"Content-Disposition": 'inline; filename="annotations.json"'}
+    )
 
 
 @app.get("/download-progress")
@@ -315,7 +319,11 @@ async def _download_progress(
 
         output[cid] = progress_data[cid]
 
-    return JSONResponse(
+    return JSONResponse(
+        content=output,
+        status_code=200,
+        headers={"Content-Disposition": 'inline; filename="progress.json"'}
+    )
 
 
 static_dir = f"{os.path.dirname(os.path.abspath(__file__))}/static/"
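
From a client's perspective, the practical effect of the new `headers=` argument is that the JSON dump now carries a suggested filename. A minimal sketch follows; the `/download-progress` route is taken from the diff, while the local URL and the way the management token is passed are assumptions.

```python
# Sketch only: fetch the progress dump and inspect the new header.
import requests

resp = requests.get(
    "http://localhost:8000/download-progress",
    params={"token": "MANAGEMENT_TOKEN"},  # assumed parameter name
)
print(resp.headers.get("Content-Disposition"))  # inline; filename="progress.json"
progress = resp.json()  # {campaign_id: ...} as assembled in the handler above
```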

@@ -324,6 +332,16 @@ if not os.path.exists(static_dir + "index.html"):
         "Static directory not found. Please build the frontend first."
     )
 
+# Mount user assets from data/assets/
+assets_dir = f"{ROOT}/data/assets"
+os.makedirs(assets_dir, exist_ok=True)
+
+app.mount(
+    "/assets",
+    StaticFiles(directory=assets_dir, follow_symlink=True),
+    name="assets",
+)
+
 app.mount(
     "/",
     StaticFiles(directory=static_dir, html=True, follow_symlink=True),
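
The new mount means anything placed (or symlinked by the CLI, see `server/cli.py` below) under `data/assets/` is served over HTTP at `/assets/...`. A small sketch of that mapping; the server address and the example file path are assumptions.

```python
# Sketch only: data/assets/demo/cat.png on disk -> /assets/demo/cat.png over HTTP.
import requests

resp = requests.get("http://localhost:8000/assets/demo/cat.png")
print(resp.status_code)  # 200 once the file (or a campaign symlink) exists
```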

{pearmut-0.2.9 → pearmut-0.2.11}/server/assignment.py

@@ -84,9 +84,13 @@ def get_i_item_taskbased(
 
     # try to get existing annotations if any
     items_existing = get_db_log_item(campaign_id, user_id, item_i)
+    payload_existing = None
     if items_existing:
         # get the latest ones
-
+        latest_item = items_existing[-1]
+        payload_existing = {"annotations": latest_item["annotations"]}
+        if "comment" in latest_item:
+            payload_existing["comment"] = latest_item["comment"]
 
     if item_i < 0 or item_i >= len(data_all[campaign_id]["data"][user_id]):
         return JSONResponse(
@@ -107,7 +111,7 @@ def get_i_item_taskbased(
                 if k.startswith("protocol")
             },
             "payload": data_all[campaign_id]["data"][user_id][item_i]
-        } | ({"payload_existing": payload_existing} if
+        } | ({"payload_existing": payload_existing} if payload_existing else {}),
         status_code=200
     )
 
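
The `payload_existing` construction above is repeated, with minor variations, in `get_i_item_taskbased`, `get_i_item_singlestream`, `get_next_item_taskbased`, and `get_next_item_singlestream` (see the remaining hunks below). A helper along these lines could factor it out; this is only an editorial sketch, not code from the release.

```python
def build_payload_existing(items_existing):
    """Return the latest stored annotations (plus comment, if any), or None."""
    if not items_existing:
        return None
    latest_item = items_existing[-1]
    payload_existing = {"annotations": latest_item["annotations"]}
    if "comment" in latest_item:
        payload_existing["comment"] = latest_item["comment"]
    return payload_existing

# Each handler could then keep the merged-dict pattern from the diff:
# content = {...} | ({"payload_existing": p} if (p := build_payload_existing(items)) else {})
```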

@@ -127,9 +131,13 @@ def get_i_item_singlestream(
     # try to get existing annotations if any
     # note the None user_id since it is shared
     items_existing = get_db_log_item(campaign_id, None, item_i)
+    payload_existing = None
     if items_existing:
         # get the latest ones
-
+        latest_item = items_existing[-1]
+        payload_existing = {"annotations": latest_item["annotations"]}
+        if "comment" in latest_item:
+            payload_existing["comment"] = latest_item["comment"]
 
     if item_i < 0 or item_i >= len(data_all[campaign_id]["data"]):
         return JSONResponse(
@@ -150,7 +158,7 @@ def get_i_item_singlestream(
                 if k.startswith("protocol")
             },
             "payload": data_all[campaign_id]["data"][item_i]
-        } | ({"payload_existing": payload_existing} if
+        } | ({"payload_existing": payload_existing} if payload_existing else {}),
         status_code=200
     )
 
@@ -173,9 +181,13 @@ def get_next_item_taskbased(
 
     # try to get existing annotations if any
     items_existing = get_db_log_item(campaign_id, user_id, item_i)
+    payload_existing = None
     if items_existing:
         # get the latest ones
-
+        latest_item = items_existing[-1]
+        payload_existing = {"annotations": latest_item["annotations"]}
+        if "comment" in latest_item:
+            payload_existing["comment"] = latest_item["comment"]
 
     return JSONResponse(
         content={
@@ -190,7 +202,7 @@ def get_next_item_taskbased(
                 if k.startswith("protocol")
             },
             "payload": data_all[campaign_id]["data"][user_id][item_i]
-        } | ({"payload_existing": payload_existing} if
+        } | ({"payload_existing": payload_existing} if payload_existing else {}),
         status_code=200
     )
 
@@ -222,9 +234,13 @@ def get_next_item_singlestream(
     # try to get existing annotations if any
     # note the None user_id since it is shared
     items_existing = get_db_log_item(campaign_id, None, item_i)
+    payload_existing = None
     if items_existing:
         # get the latest ones
-
+        latest_item = items_existing[-1]
+        payload_existing = {"annotations": latest_item["annotations"]}
+        if "comment" in latest_item:
+            payload_existing["comment"] = latest_item["comment"]
 
     return JSONResponse(
         content={
@@ -239,7 +255,7 @@ def get_next_item_singlestream(
                 if k.startswith("protocol")
             },
             "payload": data_all[campaign_id]["data"][item_i]
-        } | ({"payload_existing": payload_existing} if
+        } | ({"payload_existing": payload_existing} if payload_existing else {}),
        status_code=200
     )
 

{pearmut-0.2.9 → pearmut-0.2.11}/server/cli.py

@@ -272,15 +272,10 @@ def _add_single_campaign(data_file, overwrite, server):
 
     if not os.path.isdir(assets_real_path):
         raise ValueError(f"Assets source path '{assets_real_path}' must be an existing directory.")
-
-    if not os.path.isdir(STATIC_DIR):
-        raise ValueError(
-            f"Static directory '{STATIC_DIR}' does not exist. "
-            "Please build the frontend first."
-        )
 
     # Symlink path is based on the destination, stripping the 'assets/' prefix
-
+    # User assets are now stored under data/assets/ instead of static/assets/
+    symlink_path = f"{ROOT}/data/{assets_destination}".rstrip("/")
 
     # Remove existing symlink if present and we are overriding the same campaign
     if os.path.lexists(symlink_path):
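
For reference, here is a hypothetical `info.assets` excerpt of a campaign file that `_add_single_campaign` would resolve. The `destination` key is read in `main()` below; the `source` key name is an assumption (the hunk above only shows the already-resolved `assets_real_path` and `assets_destination` variables).

```python
# Hypothetical campaign-file excerpt (the "source" key name is assumed).
{
    "info": {
        "assets": {
            "source": "/abs/path/to/local_images",  # must be an existing directory
            "destination": "assets/my_campaign",    # symlinked to data/assets/my_campaign
        }
    }
}
```

With the `/assets` mount added in `server/app.py`, the linked files then become reachable at `/assets/my_campaign/...`.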

@@ -392,7 +387,7 @@ def main():
         campaign_data = json.load(f)
         destination = campaign_data.get("info", {}).get("assets", {}).get("destination")
         if destination:
-            symlink_path = f"{
+            symlink_path = f"{ROOT}/data/{destination}".rstrip("/")
             if os.path.islink(symlink_path):
                 os.remove(symlink_path)
                 print(f"Assets symlink removed: {symlink_path}")