pearmut-0.2.2.tar.gz → pearmut-0.2.3.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {pearmut-0.2.2 → pearmut-0.2.3}/PKG-INFO +59 -7
- {pearmut-0.2.2 → pearmut-0.2.3}/README.md +58 -5
- {pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/PKG-INFO +59 -7
- {pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/requires.txt +0 -1
- {pearmut-0.2.2 → pearmut-0.2.3}/pyproject.toml +2 -2
- {pearmut-0.2.2 → pearmut-0.2.3}/server/app.py +19 -2
- {pearmut-0.2.2 → pearmut-0.2.3}/server/assignment.py +29 -11
- pearmut-0.2.3/server/cli.py +346 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/dashboard.bundle.js +1 -1
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/dashboard.html +1 -1
- pearmut-0.2.3/server/static/listwise.html +77 -0
- pearmut-0.2.3/server/static/pointwise.html +69 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/utils.py +72 -4
- pearmut-0.2.2/server/cli.py +0 -226
- pearmut-0.2.2/server/static/listwise.html +0 -77
- pearmut-0.2.2/server/static/pointwise.html +0 -69
- {pearmut-0.2.2 → pearmut-0.2.3}/LICENSE +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/SOURCES.txt +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/dependency_links.txt +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/entry_points.txt +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/top_level.txt +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/assets/favicon.svg +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/assets/style.css +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/index.html +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/listwise.bundle.js +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/server/static/pointwise.bundle.js +0 -0
- {pearmut-0.2.2 → pearmut-0.2.3}/setup.cfg +0 -0
{pearmut-0.2.2 → pearmut-0.2.3}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pearmut
-Version: 0.2.2
+Version: 0.2.3
 Summary: A tool for evaluation of model outputs, primarily MT.
 Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
 License: apache-2.0
@@ -16,7 +16,6 @@ Requires-Dist: wonderwords>=3.0.0
 Requires-Dist: psutil>=7.1.0
 Provides-Extra: dev
 Requires-Dist: pytest; extra == "dev"
-Requires-Dist: pynpm>=0.3.0; extra == "dev"
 Dynamic: license-file
 
 # Pearmut 🍐
@@ -165,8 +164,10 @@ You can add validation rules to items for tutorials or attention checks. Items w
 - Tutorial items: Include `allow_skip: true` and `warning` to let users skip after seeing the feedback
 - Loud attention checks: Include `warning` without `allow_skip` to force users to retry
 - Silent attention checks: Omit `warning` to silently log failures without user notification (useful for quality control with bad translations)
+
 For listwise template, `validation` is an array where each element corresponds to a candidate.
-The dashboard shows failed/total validation checks per user.
+The dashboard shows failed/total validation checks per user, and ✅/❌ based on whether they pass the threshold.
+Set `validation_threshold` in `info` to control pass/fail: integer for max failed count, float in [0,1) for max failed proportion.
 See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json) for complete examples.
 
 ## Single-stream Assignment
@@ -181,7 +182,7 @@ We also support a simple allocation where all annotators draw from the same pool
 "protocol_score": True, # collect scores
 "protocol_error_spans": True, # collect error spans
 "protocol_error_categories": False, # do not collect MQM categories, so ESA
-"…
+"users": 50, # number of annotators (can also be a list, see below)
 },
 "data": [...], # list of all items (shared among all annotators)
 }
@@ -196,12 +197,31 @@ We also support dynamic allocation of annotations (`dynamic`, not yet ⚠️), w
 "assignment": "dynamic",
 "template": "listwise",
 "protocol_k": 5,
-"…
+"users": 50,
 },
 "data": [...], # list of all items
 }
 ```
 
+## Pre-defined User IDs and Tokens
+
+By default, user IDs and completion tokens are automatically generated. The `users` field can be:
+- A number (e.g., `50`) to generate that many random user IDs
+- A list of strings (e.g., `["alice", "bob"]`) to use specific user IDs
+- A list of dictionaries to specify user IDs with custom tokens:
+```python
+{
+"info": {
+...
+"users": [
+{"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
+{"user_id": "bob", "token_pass": "bob_done"} # missing tokens are auto-generated
+],
+},
+...
+}
+```
+
 To load a campaign into the server, run the following.
 It will fail if an existing campaign with the same `campaign_id` already exists, unless you specify `-o/--overwrite`.
 It will also output a secret management link. Then, launch the server:
@@ -234,8 +254,7 @@ and independently of that select your protocol template:
 When adding new campaigns or launching pearmut, a management link is shown that gives an overview of annotator progress but also an easy access to the annotation links or resetting the task progress (no data will be lost).
 This is also the place where you can download all progress and collected annotations (these files exist also locally but this might be more convenient).
 
-<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/…
-
+<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/800a1741-5f41-47ac-9d5d-5cbf6abfc0e6" />
 
 Additionally, at the end of an annotation, a token of completion is shown which can be compared to the correct one that you can download in metadat from the dashboard.
 An intentionally incorrect token can be shown if the annotations don't pass quality control.
@@ -252,6 +271,39 @@ Tip: make sure the elements are already appropriately styled.
 
 <img width="1000" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/77c4fa96-ee62-4e46-8e78-fd16e9007956" />
 
+## CLI Commands
+
+Pearmut provides the following commands:
+
+- `pearmut add <file(s)>`: Add one or more campaign JSON files. Supports wildcards (e.g., `pearmut add examples/*.json`).
+  - `-o/--overwrite`: Overwrite existing campaigns with the same ID.
+  - `--server <url>`: Prefix server URL for protocol links (default: `http://localhost:8001`).
+- `pearmut run`: Start the Pearmut server.
+  - `--port <port>`: Port to run the server on (default: 8001).
+  - `--server <url>`: Prefix server URL for protocol links.
+- `pearmut purge [campaign]`: Remove campaign data.
+  - Without arguments: Purges all campaigns (tasks, outputs, progress).
+  - With campaign name: Purges only the specified campaign's data.
+
+
+## Hosting Assets
+
+If you need to host local assets (e.g., audio files, images, videos) via Pearmut, you can use the `assets` key in your campaign file.
+When present, this directory is symlinked to the `static/` directory so its contents become accessible from the server.
+
+```python
+{
+"campaign_id": "my_campaign",
+"info": {
+"assets": "videos", # path to directory containing assets
+...
+},
+"data": [ ... ]
+}
+```
+
+For example, if `videos` contains `audio.mp3`, it will be accessible at `localhost:8001/assets/videos/audio.mp3`.
+The path can be absolute or relative to your current working directory.
 
 ## Development
 

{pearmut-0.2.2 → pearmut-0.2.3}/README.md

@@ -144,8 +144,10 @@ You can add validation rules to items for tutorials or attention checks. Items w
 - Tutorial items: Include `allow_skip: true` and `warning` to let users skip after seeing the feedback
 - Loud attention checks: Include `warning` without `allow_skip` to force users to retry
 - Silent attention checks: Omit `warning` to silently log failures without user notification (useful for quality control with bad translations)
+
 For listwise template, `validation` is an array where each element corresponds to a candidate.
-The dashboard shows failed/total validation checks per user.
+The dashboard shows failed/total validation checks per user, and ✅/❌ based on whether they pass the threshold.
+Set `validation_threshold` in `info` to control pass/fail: integer for max failed count, float in [0,1) for max failed proportion.
 See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json) for complete examples.
 
 ## Single-stream Assignment
@@ -160,7 +162,7 @@ We also support a simple allocation where all annotators draw from the same pool
 "protocol_score": True, # collect scores
 "protocol_error_spans": True, # collect error spans
 "protocol_error_categories": False, # do not collect MQM categories, so ESA
-"…
+"users": 50, # number of annotators (can also be a list, see below)
 },
 "data": [...], # list of all items (shared among all annotators)
 }
@@ -175,12 +177,31 @@ We also support dynamic allocation of annotations (`dynamic`, not yet ⚠️), w
 "assignment": "dynamic",
 "template": "listwise",
 "protocol_k": 5,
-"…
+"users": 50,
 },
 "data": [...], # list of all items
 }
 ```
 
+## Pre-defined User IDs and Tokens
+
+By default, user IDs and completion tokens are automatically generated. The `users` field can be:
+- A number (e.g., `50`) to generate that many random user IDs
+- A list of strings (e.g., `["alice", "bob"]`) to use specific user IDs
+- A list of dictionaries to specify user IDs with custom tokens:
+```python
+{
+"info": {
+...
+"users": [
+{"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
+{"user_id": "bob", "token_pass": "bob_done"} # missing tokens are auto-generated
+],
+},
+...
+}
+```
+
 To load a campaign into the server, run the following.
 It will fail if an existing campaign with the same `campaign_id` already exists, unless you specify `-o/--overwrite`.
 It will also output a secret management link. Then, launch the server:
@@ -213,8 +234,7 @@ and independently of that select your protocol template:
 When adding new campaigns or launching pearmut, a management link is shown that gives an overview of annotator progress but also an easy access to the annotation links or resetting the task progress (no data will be lost).
 This is also the place where you can download all progress and collected annotations (these files exist also locally but this might be more convenient).
 
-<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/…
-
+<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/800a1741-5f41-47ac-9d5d-5cbf6abfc0e6" />
 
 Additionally, at the end of an annotation, a token of completion is shown which can be compared to the correct one that you can download in metadat from the dashboard.
 An intentionally incorrect token can be shown if the annotations don't pass quality control.
@@ -231,6 +251,39 @@ Tip: make sure the elements are already appropriately styled.
 
 <img width="1000" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/77c4fa96-ee62-4e46-8e78-fd16e9007956" />
 
+## CLI Commands
+
+Pearmut provides the following commands:
+
+- `pearmut add <file(s)>`: Add one or more campaign JSON files. Supports wildcards (e.g., `pearmut add examples/*.json`).
+  - `-o/--overwrite`: Overwrite existing campaigns with the same ID.
+  - `--server <url>`: Prefix server URL for protocol links (default: `http://localhost:8001`).
+- `pearmut run`: Start the Pearmut server.
+  - `--port <port>`: Port to run the server on (default: 8001).
+  - `--server <url>`: Prefix server URL for protocol links.
+- `pearmut purge [campaign]`: Remove campaign data.
+  - Without arguments: Purges all campaigns (tasks, outputs, progress).
+  - With campaign name: Purges only the specified campaign's data.
+
+
+## Hosting Assets
+
+If you need to host local assets (e.g., audio files, images, videos) via Pearmut, you can use the `assets` key in your campaign file.
+When present, this directory is symlinked to the `static/` directory so its contents become accessible from the server.
+
+```python
+{
+"campaign_id": "my_campaign",
+"info": {
+"assets": "videos", # path to directory containing assets
+...
+},
+"data": [ ... ]
+}
+```
+
+For example, if `videos` contains `audio.mp3`, it will be accessible at `localhost:8001/assets/videos/audio.mp3`.
+The path can be absolute or relative to your current working directory.
 
 ## Development
 

{pearmut-0.2.2 → pearmut-0.2.3}/pearmut.egg-info/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pearmut
-Version: 0.2.2
+Version: 0.2.3
 Summary: A tool for evaluation of model outputs, primarily MT.
 Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
 License: apache-2.0
@@ -16,7 +16,6 @@ Requires-Dist: wonderwords>=3.0.0
 Requires-Dist: psutil>=7.1.0
 Provides-Extra: dev
 Requires-Dist: pytest; extra == "dev"
-Requires-Dist: pynpm>=0.3.0; extra == "dev"
 Dynamic: license-file
 
 # Pearmut 🍐
@@ -165,8 +164,10 @@ You can add validation rules to items for tutorials or attention checks. Items w
 - Tutorial items: Include `allow_skip: true` and `warning` to let users skip after seeing the feedback
 - Loud attention checks: Include `warning` without `allow_skip` to force users to retry
 - Silent attention checks: Omit `warning` to silently log failures without user notification (useful for quality control with bad translations)
+
 For listwise template, `validation` is an array where each element corresponds to a candidate.
-The dashboard shows failed/total validation checks per user.
+The dashboard shows failed/total validation checks per user, and ✅/❌ based on whether they pass the threshold.
+Set `validation_threshold` in `info` to control pass/fail: integer for max failed count, float in [0,1) for max failed proportion.
 See [examples/tutorial_pointwise.json](examples/tutorial_pointwise.json) and [examples/tutorial_listwise.json](examples/tutorial_listwise.json) for complete examples.
 
 ## Single-stream Assignment
@@ -181,7 +182,7 @@ We also support a simple allocation where all annotators draw from the same pool
 "protocol_score": True, # collect scores
 "protocol_error_spans": True, # collect error spans
 "protocol_error_categories": False, # do not collect MQM categories, so ESA
-"…
+"users": 50, # number of annotators (can also be a list, see below)
 },
 "data": [...], # list of all items (shared among all annotators)
 }
@@ -196,12 +197,31 @@ We also support dynamic allocation of annotations (`dynamic`, not yet ⚠️), w
 "assignment": "dynamic",
 "template": "listwise",
 "protocol_k": 5,
-"…
+"users": 50,
 },
 "data": [...], # list of all items
 }
 ```
 
+## Pre-defined User IDs and Tokens
+
+By default, user IDs and completion tokens are automatically generated. The `users` field can be:
+- A number (e.g., `50`) to generate that many random user IDs
+- A list of strings (e.g., `["alice", "bob"]`) to use specific user IDs
+- A list of dictionaries to specify user IDs with custom tokens:
+```python
+{
+"info": {
+...
+"users": [
+{"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
+{"user_id": "bob", "token_pass": "bob_done"} # missing tokens are auto-generated
+],
+},
+...
+}
+```
+
 To load a campaign into the server, run the following.
 It will fail if an existing campaign with the same `campaign_id` already exists, unless you specify `-o/--overwrite`.
 It will also output a secret management link. Then, launch the server:
@@ -234,8 +254,7 @@ and independently of that select your protocol template:
 When adding new campaigns or launching pearmut, a management link is shown that gives an overview of annotator progress but also an easy access to the annotation links or resetting the task progress (no data will be lost).
 This is also the place where you can download all progress and collected annotations (these files exist also locally but this might be more convenient).
 
-<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/…
-
+<img width="800" alt="Management dashboard" src="https://github.com/user-attachments/assets/800a1741-5f41-47ac-9d5d-5cbf6abfc0e6" />
 
 Additionally, at the end of an annotation, a token of completion is shown which can be compared to the correct one that you can download in metadat from the dashboard.
 An intentionally incorrect token can be shown if the annotations don't pass quality control.
@@ -252,6 +271,39 @@ Tip: make sure the elements are already appropriately styled.
 
 <img width="1000" alt="Preview of multimodal elements in Pearmut" src="https://github.com/user-attachments/assets/77c4fa96-ee62-4e46-8e78-fd16e9007956" />
 
+## CLI Commands
+
+Pearmut provides the following commands:
+
+- `pearmut add <file(s)>`: Add one or more campaign JSON files. Supports wildcards (e.g., `pearmut add examples/*.json`).
+  - `-o/--overwrite`: Overwrite existing campaigns with the same ID.
+  - `--server <url>`: Prefix server URL for protocol links (default: `http://localhost:8001`).
+- `pearmut run`: Start the Pearmut server.
+  - `--port <port>`: Port to run the server on (default: 8001).
+  - `--server <url>`: Prefix server URL for protocol links.
+- `pearmut purge [campaign]`: Remove campaign data.
+  - Without arguments: Purges all campaigns (tasks, outputs, progress).
+  - With campaign name: Purges only the specified campaign's data.
+
+
+## Hosting Assets
+
+If you need to host local assets (e.g., audio files, images, videos) via Pearmut, you can use the `assets` key in your campaign file.
+When present, this directory is symlinked to the `static/` directory so its contents become accessible from the server.
+
+```python
+{
+"campaign_id": "my_campaign",
+"info": {
+"assets": "videos", # path to directory containing assets
+...
+},
+"data": [ ... ]
+}
+```
+
+For example, if `videos` contains `audio.mp3`, it will be accessible at `localhost:8001/assets/videos/audio.mp3`.
+The path can be absolute or relative to your current working directory.
 
 ## Development
 

{pearmut-0.2.2 → pearmut-0.2.3}/pyproject.toml

@@ -1,6 +1,6 @@
 [project]
 name = "pearmut"
-version = "0.2.2"
+version = "0.2.3"
 description = "A tool for evaluation of model outputs, primarily MT."
 readme = "README.md"
 license = { text = "apache-2.0" }
@@ -20,7 +20,7 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
-dev = ["pytest"…
+dev = ["pytest"]
 
 [project.scripts]
 pearmut = "pearmut.cli:main"

{pearmut-0.2.2 → pearmut-0.2.3}/server/app.py

@@ -9,7 +9,13 @@ from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel
 
 from .assignment import get_i_item, get_next_item, reset_task, update_progress
-from .utils import …
+from .utils import (
+    ROOT,
+    check_validation_threshold,
+    load_progress_data,
+    save_db_payload,
+    save_progress_data,
+)
 
 os.makedirs(f"{ROOT}/data/outputs", exist_ok=True)
 
@@ -151,6 +157,9 @@ async def _dashboard_data(request: DashboardDataRequest):
     if assignment not in ["task-based", "single-stream"]:
         return JSONResponse(content={"error": "Unsupported campaign assignment type"}, status_code=400)
 
+    # Get threshold info for the campaign
+    validation_threshold = tasks_data[campaign_id]["info"].get("validation_threshold")
+
     for user_id, user_val in progress_data[campaign_id].items():
         # shallow copy
         entry = dict(user_val)
@@ -159,6 +168,13 @@ async def _dashboard_data(request: DashboardDataRequest):
             for v in list(entry.get("validations", {}).values())
         ]
 
+        # Add threshold pass/fail status (only when user is complete)
+        if all(entry["progress"]):
+            entry["threshold_passed"] = check_validation_threshold(
+                tasks_data, progress_data, campaign_id, user_id
+            )
+        else:
+            entry["threshold_passed"] = None
 
         if not is_privileged:
             entry["token_correct"] = None
@@ -169,7 +185,8 @@ async def _dashboard_data(request: DashboardDataRequest):
     return JSONResponse(
         content={
             "status": "ok",
-            "data": progress_new
+            "data": progress_new,
+            "validation_threshold": validation_threshold
         },
         status_code=200
     )

{pearmut-0.2.2 → pearmut-0.2.3}/server/assignment.py

@@ -3,18 +3,23 @@ from typing import Any
 
 from fastapi.responses import JSONResponse
 
-from .utils import …
+from .utils import (
+    RESET_MARKER,
+    check_validation_threshold,
+    get_db_log_item,
+    save_db_payload,
+)
 
 
 def _completed_response(
+    tasks_data: dict,
     progress_data: dict,
     campaign_id: str,
     user_id: str,
 ) -> JSONResponse:
     """Build a completed response with progress, time, and token."""
     user_progress = progress_data[campaign_id][user_id]
-
-    is_ok = True
+    is_ok = check_validation_threshold(tasks_data, progress_data, campaign_id, user_id)
     return JSONResponse(
         content={
             "status": "completed",
@@ -161,7 +166,7 @@ def get_next_item_taskbased(
     """
     user_progress = progress_data[campaign_id][user_id]
    if all(user_progress["progress"]):
-        return _completed_response(progress_data, campaign_id, user_id)
+        return _completed_response(data_all, progress_data, campaign_id, user_id)
 
     # find first incomplete item
     item_i = min([i for i, v in enumerate(user_progress["progress"]) if not v])
@@ -208,7 +213,7 @@ def get_next_item_singlestream(
     progress = user_progress["progress"]
 
     if all(progress):
-        return _completed_response(progress_data, campaign_id, user_id)
+        return _completed_response(data_all, progress_data, campaign_id, user_id)
 
     # find a random incomplete item
     incomplete_indices = [i for i, v in enumerate(progress) if not v]
@@ -261,20 +266,33 @@ def reset_task(
 ) -> JSONResponse:
     """
     Reset the task progress for the user in the specified campaign.
+    Saves a reset marker to mask existing annotations.
     """
     assignment = tasks_data[campaign_id]["info"]["assignment"]
     if assignment == "task-based":
-        …
-        …
-        )
+        # Save reset marker for this user to mask existing annotations
+        num_items = len(tasks_data[campaign_id]["data"][user_id])
+        for item_i in range(num_items):
+            save_db_payload(campaign_id, {
+                "user_id": user_id,
+                "item_i": item_i,
+                "annotations": RESET_MARKER
+            })
+        progress_data[campaign_id][user_id]["progress"] = [False] * num_items
         _reset_user_time(progress_data, campaign_id, user_id)
         return JSONResponse(content={"status": "ok"}, status_code=200)
     elif assignment == "single-stream":
+        # Save reset markers for all items (shared pool)
+        num_items = len(tasks_data[campaign_id]["data"])
+        for item_i in range(num_items):
+            save_db_payload(campaign_id, {
+                "user_id": None,
+                "item_i": item_i,
+                "annotations": RESET_MARKER
+            })
         # for single-stream reset all progress
         for uid in progress_data[campaign_id]:
-            progress_data[campaign_id][uid]["progress"] =
-            [False]*len(tasks_data[campaign_id]["data"])
-            )
+            progress_data[campaign_id][uid]["progress"] = [False] * num_items
        _reset_user_time(progress_data, campaign_id, user_id)
         return JSONResponse(content={"status": "ok"}, status_code=200)
     else:
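The new `check_validation_threshold` helper is imported from `server/utils.py`, whose hunks are not expanded above; only its call sites are visible here. A minimal sketch of the rule the README describes (an integer caps the number of failed checks, a float in `[0,1)` caps the failed proportion) could look like the following; the body and the `validations` bookkeeping are assumptions, not the shipped implementation.

```python
# Sketch only: the real implementation lives in server/utils.py.
# Assumption: progress_data[campaign_id][user_id]["validations"] holds one entry per
# validated item, reducible to passed/failed, mirroring the dashboard's failed/total counts.
def check_validation_threshold(tasks_data, progress_data, campaign_id, user_id) -> bool:
    threshold = tasks_data[campaign_id]["info"].get("validation_threshold")
    validations = list(progress_data[campaign_id][user_id].get("validations", {}).values())
    if threshold is None or not validations:
        return True  # nothing configured or nothing checked: treat as passing
    failed = sum(1 for passed in validations if not passed)
    if isinstance(threshold, float) and 0 <= threshold < 1:
        return failed / len(validations) <= threshold  # max allowed failed proportion
    return failed <= threshold  # integer: max allowed failed count
```

The dashboard shows the same result as the ✅/❌ column, and `_completed_response` uses it as `is_ok`, presumably to pick between the pass and fail completion tokens.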
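`reset_task` now appends `RESET_MARKER` payloads through `save_db_payload` instead of deleting anything, so earlier annotations can be masked when the output log is read back. How that masking happens is part of the `server/utils.py` changes not shown above; one plausible reading-side sketch, assuming the outputs are a JSONL log replayed in order and that `RESET_MARKER` is a sentinel value, is:

```python
import json

RESET_MARKER = "__RESET__"  # assumption: the real sentinel is defined in server/utils.py

def latest_annotations(output_path: str) -> dict:
    """Replay the log; a reset marker discards everything recorded before it."""
    latest = {}
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            payload = json.loads(line)
            item_i = payload["item_i"]
            if payload.get("annotations") == RESET_MARKER:
                if payload.get("user_id") is None:
                    # single-stream reset: the marker masks this item for every user
                    latest = {k: v for k, v in latest.items() if k[1] != item_i}
                else:
                    # task-based reset: mask only this user's copy of the item
                    latest.pop((payload["user_id"], item_i), None)
            else:
                latest[(payload.get("user_id"), item_i)] = payload
    return latest
```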
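Taken together, the README additions in this release let a campaign pin down annotators, completion tokens, the quality-control threshold, and hosted assets in one `info` block. The snippet below is an illustrative combination of the documented fields, not an example shipped with the package; all concrete values (campaign id, template, threshold, user names) are made up.

```python
{
    "campaign_id": "demo_campaign",  # hypothetical id
    "info": {
        "assignment": "single-stream",
        "validation_threshold": 0.2,  # float in [0,1): at most 20% of checks may fail
        "users": [
            {"user_id": "alice", "token_pass": "alice_done", "token_fail": "alice_fail"},
            {"user_id": "bob", "token_pass": "bob_done"},  # missing tokens auto-generated
        ],
        "assets": "videos",  # served under localhost:8001/assets/videos/
    },
    "data": [...],  # items, including any validation rules
}
```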