pearmut 1.0.2__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {pearmut-1.0.2 → pearmut-1.1.0}/PKG-INFO +74 -1
  2. {pearmut-1.0.2 → pearmut-1.1.0}/README.md +74 -1
  3. {pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/PKG-INFO +74 -1
  4. {pearmut-1.0.2 → pearmut-1.1.0}/pyproject.toml +2 -2
  5. {pearmut-1.0.2 → pearmut-1.1.0}/server/app.py +8 -5
  6. {pearmut-1.0.2 → pearmut-1.1.0}/server/assignment.py +336 -82
  7. {pearmut-1.0.2 → pearmut-1.1.0}/server/cli.py +145 -82
  8. pearmut-1.1.0/server/static/annotate.bundle.js +1 -0
  9. {pearmut-1.0.2 → pearmut-1.1.0}/server/static/annotate.html +11 -7
  10. pearmut-1.1.0/server/static/dashboard.bundle.js +1 -0
  11. {pearmut-1.0.2 → pearmut-1.1.0}/server/static/dashboard.html +1 -1
  12. {pearmut-1.0.2 → pearmut-1.1.0}/server/static/index.html +1 -1
  13. {pearmut-1.0.2 → pearmut-1.1.0}/server/static/style.css +38 -0
  14. {pearmut-1.0.2 → pearmut-1.1.0}/server/utils.py +38 -21
  15. pearmut-1.0.2/server/static/annotate.bundle.js +0 -1
  16. pearmut-1.0.2/server/static/dashboard.bundle.js +0 -1
  17. {pearmut-1.0.2 → pearmut-1.1.0}/LICENSE +0 -0
  18. {pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/SOURCES.txt +0 -0
  19. {pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/dependency_links.txt +0 -0
  20. {pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/entry_points.txt +0 -0
  21. {pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/requires.txt +0 -0
  22. {pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/top_level.txt +0 -0
  23. {pearmut-1.0.2 → pearmut-1.1.0}/server/constants.py +0 -0
  24. {pearmut-1.0.2 → pearmut-1.1.0}/server/results_export.py +0 -0
  25. {pearmut-1.0.2 → pearmut-1.1.0}/server/static/favicon.svg +0 -0
  26. {pearmut-1.0.2 → pearmut-1.1.0}/server/static/index.bundle.js +0 -0
  27. {pearmut-1.0.2 → pearmut-1.1.0}/setup.cfg +0 -0
{pearmut-1.0.2 → pearmut-1.1.0}/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pearmut
- Version: 1.0.2
+ Version: 1.1.0
  Summary: A tool for evaluation of model outputs, primarily MT.
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
  License: MIT
@@ -35,12 +35,15 @@ Dynamic: license-file
  - [Assignment Types](#assignment-types)
  - [Advanced Features](#advanced-features)
  - [Pre-filled Error Spans (ESA<sup>AI</sup>)](#pre-filled-error-spans-esaai)
+ - [Custom MQM Taxonomy](#custom-mqm-taxonomy)
  - [Tutorial and Attention Checks](#tutorial-and-attention-checks)
+ - [Form Items for User Metadata](#form-items-for-user-metadata)
  - [Pre-defined User IDs and Tokens](#pre-defined-user-ids-and-tokens)
  - [Multimodal Annotations](#multimodal-annotations)
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [Custom Completion Messages](#custom-completion-messages)
+ - [Prolific Integration](#prolific-integration)
  - [CLI Commands](#cli-commands)
  - [Terminology](#terminology)
  - [Development](#development)
@@ -141,6 +144,7 @@ The `shuffle` parameter in campaign `info` controls this behavior:
  "data": [...]
  }
  ```
+ Documents in `data_welcome` are not shuffled, so they do not need to contain the same models in all documents.

  ### Showing Model Names

@@ -197,6 +201,33 @@ Enable a textfield for post-editing or translation tasks using the `textfield` p
  - `"visible"`: Textfield always visible
  - `"prefilled"`: Textfield visible and pre-filled with model output for post-editing

+ ### Custom MQM Taxonomy
+
+ For MQM protocol campaigns, you can define a custom error taxonomy instead of using the default MQM categories. Specify `mqm_categories` in the campaign `info` section as a dictionary mapping main categories to lists of subcategories:
+
+
+ ```python
+ {
+   "info": {
+     "assignment": "task-based",
+     "protocol": "MQM",
+     "mqm_categories": {
+       "": [],  # Empty selection option
+       "General": ["", "Accuracy", "Fluency"],
+       "Audio-specific": ["", "Inaudible", "Background noise", "Speaker overlap", "Misinterpretation"],
+       "Style": ["", "Awkward", "Embarrassing"],
+       "Unknown": []  # Category with no subcategories
+     }
+   },
+   "campaign_id": "custom_mqm_example",
+   "data": [...]
+ }
+ ```
+
+ If `mqm_categories` is not provided, the default MQM taxonomy will be used. The empty string key `""` provides an unselected state in the dropdown. Categories with empty subcategory lists (e.g., `"Style": []`) do not require a subcategory selection.
+
+ See [examples/custom_mqm.json](examples/custom_mqm.json) for a complete example.
+
  ### Custom Instructions

  Set campaign-level instructions using the `instructions` field in `info` (supports HTML).
@@ -286,6 +317,34 @@ The `score_greaterthan` field specifies the index of the candidate that must hav
  See [examples/tutorial/esa_deen.json](examples/tutorial/esa_deen.json) for a mock campaign with a fully prepared ESA tutorial.
  To use it, simply extract the `data` attribute and prefix it to each task in your campaign.

+ #### Universal Tutorial Items with `data_welcome`
+
+ Use `data_welcome` to add tutorial items that users must complete before starting regular tasks. The structure is a list of documents (same as `data`). Welcome items have IDs `welcome_0`, `welcome_1`, etc. and are tracked separately via `progress_welcome`.
+
+ ### Form Items for User Metadata
+
+ Collect user information (demographics, expertise) before annotation tasks using form items in `data_welcome`.
+ Form items have `text` (label/question) and `form` (field type: `null`, `"string"`, `"number"`, `"choices"`, or `"script"`).
+ Documents must be homogeneous: all form items or all evaluation items.
+
+ ```python
+ {
+   "data_welcome": [
+     [
+       {"text": "What is your native language?", "form": "string"},
+       {"text": "Rate your expertise (1-10)", "form": "number"}
+     ]
+   ]
+ }
+ ```
+
+ <img width="400" alt="Screenshot of a user form" src="https://github.com/user-attachments/assets/2310e8dc-98e9-4abf-8a27-6781b0094efe" />
+
+
+ It is possible to automatically collect additional information from the host system using the `"script"` field type.
+ Typically such a form document (or a sequence of them) would be stored in `"data_welcome"` so that it is both mandatory and shown to all users.
+ See [examples/user_info_form.json](examples/user_info_form.json).
+
  ### Single-stream Assignment

  All annotators draw from a shared pool with random assignment:
@@ -299,11 +358,14 @@ All annotators draw from a shared pool with random assignment:
  # ESA: error spans and scores
  "protocol": "ESA",
  "users": 50, # number of annotators (can also be a list, see below)
+ "docs_per_user": 10, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for single-stream, this is the number of documents).
+
  ### Dynamic Assignment

  The `dynamic` assignment type intelligently selects items based on current model performance to focus annotation effort on top-performing models using contrastive comparisons.
@@ -320,11 +382,14 @@ All items must contain outputs from all models for this assignment type to work
  "dynamic_contrastive_models": 2, # how many models to compare per item (optional, default: 1)
  "dynamic_first": 5, # annotations per model before dynamic kicks in (optional, default: 5)
  "dynamic_backoff": 0.1, # probability of uniform sampling (optional, default: 0)
+ "docs_per_user": 20, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for dynamic, this is roughly the number of documents × models).
+
  **How it works:**
  1. Initial phase: Each model gets `dynamic_first` annotations with fully random contrastive evaluation
  2. Dynamic phase: After the initial phase, top `dynamic_top` models (by average score) are identified
@@ -412,6 +477,14 @@ When tokens are supplied, the dashboard will try to show model rankings based on

  Customize the goodbye message shown to users when they complete all annotations using the `instructions_goodbye` field in campaign info. Supports arbitrary HTML for styling and formatting with variable replacement: `${TOKEN}` (completion token) and `${USER_ID}` (user ID). Default: `"If someone asks you for a token of completion, show them: ${TOKEN}"`.

+ ### Prolific Integration
+
+ Use task-based assignment with Prolific. For each task, Pearmut generates a unique URL which can be uploaded to Prolific's interface. Add a redirect (on completion) to `instructions_goodbye`:
+ ```json
+ "instructions_goodbye": "<a href='https://app.prolific.com/submissions/complete?cc=${TOKEN}'>Click here to return to Prolific</a>"
+ ```
+ The `${TOKEN}` is automatically replaced based on passing attention checks (see [Attention checks](#tutorial-and-attention-checks) and [Pre-defined tokens](#pre-defined-user-ids-and-tokens)).
+
  ## Terminology

  - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
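The new README sections above document `data_welcome`, form items, and `docs_per_user` separately; the sketch below (illustrative only, not taken from the package) combines them in one campaign file, with the regular `data` items elided.

```python
# Illustrative campaign combining the fields documented in the 1.1.0 README above.
# Only keys shown in that README are used; the concrete items inside "data" are omitted.
{
    "info": {
        "protocol": "ESA",
        "users": 10,             # number of annotators
        "docs_per_user": 5,      # show the goodbye message after 5 documents per user
    },
    "campaign_id": "example_with_welcome",
    "data_welcome": [
        [   # one welcome document containing only form items
            {"text": "What is your native language?", "form": "string"},
            {"text": "Years of translation experience", "form": "number"},
        ],
    ],
    "data": [...],               # regular annotation documents (schema not shown here)
}
```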
{pearmut-1.0.2 → pearmut-1.1.0}/README.md
@@ -14,12 +14,15 @@
  - [Assignment Types](#assignment-types)
  - [Advanced Features](#advanced-features)
  - [Pre-filled Error Spans (ESA<sup>AI</sup>)](#pre-filled-error-spans-esaai)
+ - [Custom MQM Taxonomy](#custom-mqm-taxonomy)
  - [Tutorial and Attention Checks](#tutorial-and-attention-checks)
+ - [Form Items for User Metadata](#form-items-for-user-metadata)
  - [Pre-defined User IDs and Tokens](#pre-defined-user-ids-and-tokens)
  - [Multimodal Annotations](#multimodal-annotations)
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [Custom Completion Messages](#custom-completion-messages)
+ - [Prolific Integration](#prolific-integration)
  - [CLI Commands](#cli-commands)
  - [Terminology](#terminology)
  - [Development](#development)
@@ -120,6 +123,7 @@ The `shuffle` parameter in campaign `info` controls this behavior:
  "data": [...]
  }
  ```
+ Documents in `data_welcome` are not shuffled, so they do not need to contain the same models in all documents.

  ### Showing Model Names

@@ -176,6 +180,33 @@ Enable a textfield for post-editing or translation tasks using the `textfield` p
  - `"visible"`: Textfield always visible
  - `"prefilled"`: Textfield visible and pre-filled with model output for post-editing

+ ### Custom MQM Taxonomy
+
+ For MQM protocol campaigns, you can define a custom error taxonomy instead of using the default MQM categories. Specify `mqm_categories` in the campaign `info` section as a dictionary mapping main categories to lists of subcategories:
+
+
+ ```python
+ {
+   "info": {
+     "assignment": "task-based",
+     "protocol": "MQM",
+     "mqm_categories": {
+       "": [],  # Empty selection option
+       "General": ["", "Accuracy", "Fluency"],
+       "Audio-specific": ["", "Inaudible", "Background noise", "Speaker overlap", "Misinterpretation"],
+       "Style": ["", "Awkward", "Embarrassing"],
+       "Unknown": []  # Category with no subcategories
+     }
+   },
+   "campaign_id": "custom_mqm_example",
+   "data": [...]
+ }
+ ```
+
+ If `mqm_categories` is not provided, the default MQM taxonomy will be used. The empty string key `""` provides an unselected state in the dropdown. Categories with empty subcategory lists (e.g., `"Style": []`) do not require a subcategory selection.
+
+ See [examples/custom_mqm.json](examples/custom_mqm.json) for a complete example.
+
  ### Custom Instructions

  Set campaign-level instructions using the `instructions` field in `info` (supports HTML).
@@ -265,6 +296,34 @@ The `score_greaterthan` field specifies the index of the candidate that must hav
  See [examples/tutorial/esa_deen.json](examples/tutorial/esa_deen.json) for a mock campaign with a fully prepared ESA tutorial.
  To use it, simply extract the `data` attribute and prefix it to each task in your campaign.

+ #### Universal Tutorial Items with `data_welcome`
+
+ Use `data_welcome` to add tutorial items that users must complete before starting regular tasks. The structure is a list of documents (same as `data`). Welcome items have IDs `welcome_0`, `welcome_1`, etc. and are tracked separately via `progress_welcome`.
+
+ ### Form Items for User Metadata
+
+ Collect user information (demographics, expertise) before annotation tasks using form items in `data_welcome`.
+ Form items have `text` (label/question) and `form` (field type: `null`, `"string"`, `"number"`, `"choices"`, or `"script"`).
+ Documents must be homogeneous: all form items or all evaluation items.
+
+ ```python
+ {
+   "data_welcome": [
+     [
+       {"text": "What is your native language?", "form": "string"},
+       {"text": "Rate your expertise (1-10)", "form": "number"}
+     ]
+   ]
+ }
+ ```
+
+ <img width="400" alt="Screenshot of a user form" src="https://github.com/user-attachments/assets/2310e8dc-98e9-4abf-8a27-6781b0094efe" />
+
+
+ It is possible to automatically collect additional information from the host system using the `"script"` field type.
+ Typically such a form document (or a sequence of them) would be stored in `"data_welcome"` so that it is both mandatory and shown to all users.
+ See [examples/user_info_form.json](examples/user_info_form.json).
+
  ### Single-stream Assignment

  All annotators draw from a shared pool with random assignment:
@@ -278,11 +337,14 @@ All annotators draw from a shared pool with random assignment:
  # ESA: error spans and scores
  "protocol": "ESA",
  "users": 50, # number of annotators (can also be a list, see below)
+ "docs_per_user": 10, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for single-stream, this is the number of documents).
+
  ### Dynamic Assignment

  The `dynamic` assignment type intelligently selects items based on current model performance to focus annotation effort on top-performing models using contrastive comparisons.
@@ -299,11 +361,14 @@ All items must contain outputs from all models for this assignment type to work
  "dynamic_contrastive_models": 2, # how many models to compare per item (optional, default: 1)
  "dynamic_first": 5, # annotations per model before dynamic kicks in (optional, default: 5)
  "dynamic_backoff": 0.1, # probability of uniform sampling (optional, default: 0)
+ "docs_per_user": 20, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for dynamic, this is roughly the number of documents × models).
+
  **How it works:**
  1. Initial phase: Each model gets `dynamic_first` annotations with fully random contrastive evaluation
  2. Dynamic phase: After the initial phase, top `dynamic_top` models (by average score) are identified
@@ -391,6 +456,14 @@ When tokens are supplied, the dashboard will try to show model rankings based on

  Customize the goodbye message shown to users when they complete all annotations using the `instructions_goodbye` field in campaign info. Supports arbitrary HTML for styling and formatting with variable replacement: `${TOKEN}` (completion token) and `${USER_ID}` (user ID). Default: `"If someone asks you for a token of completion, show them: ${TOKEN}"`.

+ ### Prolific Integration
+
+ Use task-based assignment with Prolific. For each task, Pearmut generates a unique URL which can be uploaded to Prolific's interface. Add a redirect (on completion) to `instructions_goodbye`:
+ ```json
+ "instructions_goodbye": "<a href='https://app.prolific.com/submissions/complete?cc=${TOKEN}'>Click here to return to Prolific</a>"
+ ```
+ The `${TOKEN}` is automatically replaced based on passing attention checks (see [Attention checks](#tutorial-and-attention-checks) and [Pre-defined tokens](#pre-defined-user-ids-and-tokens)).
+
  ## Terminology

  - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
@@ -467,4 +540,4 @@ If you use this work in your paper, please cite as following.
  ```

  Contributions are welcome! Please reach out to [Vilém Zouhar](mailto:vilem.zouhar@gmail.com).
- See changes in [CHANGELOG.md](CHANGELOG.md).
+ See changes in [CHANGELOG.md](CHANGELOG.md).
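For the Prolific workflow documented above, the redirect line is meant to sit inside the campaign `info`; a minimal sketch follows (based only on fields shown in this diff, with placeholder values and the items elided).

```python
# Sketch of an "info" block for a Prolific-facing campaign; values are placeholders.
{
    "info": {
        "assignment": "task-based",
        "protocol": "ESA",
        "instructions_goodbye": (
            "<a href='https://app.prolific.com/submissions/complete?cc=${TOKEN}'>"
            "Click here to return to Prolific</a>"
        ),
    },
    "campaign_id": "prolific_example",
    "data": [...],
}
```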
{pearmut-1.0.2 → pearmut-1.1.0}/pearmut.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: pearmut
- Version: 1.0.2
+ Version: 1.1.0
  Summary: A tool for evaluation of model outputs, primarily MT.
  Author-email: Vilém Zouhar <vilem.zouhar@gmail.com>
  License: MIT
@@ -35,12 +35,15 @@ Dynamic: license-file
  - [Assignment Types](#assignment-types)
  - [Advanced Features](#advanced-features)
  - [Pre-filled Error Spans (ESA<sup>AI</sup>)](#pre-filled-error-spans-esaai)
+ - [Custom MQM Taxonomy](#custom-mqm-taxonomy)
  - [Tutorial and Attention Checks](#tutorial-and-attention-checks)
+ - [Form Items for User Metadata](#form-items-for-user-metadata)
  - [Pre-defined User IDs and Tokens](#pre-defined-user-ids-and-tokens)
  - [Multimodal Annotations](#multimodal-annotations)
  - [Hosting Assets](#hosting-assets)
  - [Campaign Management](#campaign-management)
  - [Custom Completion Messages](#custom-completion-messages)
+ - [Prolific Integration](#prolific-integration)
  - [CLI Commands](#cli-commands)
  - [Terminology](#terminology)
  - [Development](#development)
@@ -141,6 +144,7 @@ The `shuffle` parameter in campaign `info` controls this behavior:
  "data": [...]
  }
  ```
+ Documents in `data_welcome` are not shuffled, so they do not need to contain the same models in all documents.

  ### Showing Model Names

@@ -197,6 +201,33 @@ Enable a textfield for post-editing or translation tasks using the `textfield` p
  - `"visible"`: Textfield always visible
  - `"prefilled"`: Textfield visible and pre-filled with model output for post-editing

+ ### Custom MQM Taxonomy
+
+ For MQM protocol campaigns, you can define a custom error taxonomy instead of using the default MQM categories. Specify `mqm_categories` in the campaign `info` section as a dictionary mapping main categories to lists of subcategories:
+
+
+ ```python
+ {
+   "info": {
+     "assignment": "task-based",
+     "protocol": "MQM",
+     "mqm_categories": {
+       "": [],  # Empty selection option
+       "General": ["", "Accuracy", "Fluency"],
+       "Audio-specific": ["", "Inaudible", "Background noise", "Speaker overlap", "Misinterpretation"],
+       "Style": ["", "Awkward", "Embarrassing"],
+       "Unknown": []  # Category with no subcategories
+     }
+   },
+   "campaign_id": "custom_mqm_example",
+   "data": [...]
+ }
+ ```
+
+ If `mqm_categories` is not provided, the default MQM taxonomy will be used. The empty string key `""` provides an unselected state in the dropdown. Categories with empty subcategory lists (e.g., `"Style": []`) do not require a subcategory selection.
+
+ See [examples/custom_mqm.json](examples/custom_mqm.json) for a complete example.
+
  ### Custom Instructions

  Set campaign-level instructions using the `instructions` field in `info` (supports HTML).
@@ -286,6 +317,34 @@ The `score_greaterthan` field specifies the index of the candidate that must hav
  See [examples/tutorial/esa_deen.json](examples/tutorial/esa_deen.json) for a mock campaign with a fully prepared ESA tutorial.
  To use it, simply extract the `data` attribute and prefix it to each task in your campaign.

+ #### Universal Tutorial Items with `data_welcome`
+
+ Use `data_welcome` to add tutorial items that users must complete before starting regular tasks. The structure is a list of documents (same as `data`). Welcome items have IDs `welcome_0`, `welcome_1`, etc. and are tracked separately via `progress_welcome`.
+
+ ### Form Items for User Metadata
+
+ Collect user information (demographics, expertise) before annotation tasks using form items in `data_welcome`.
+ Form items have `text` (label/question) and `form` (field type: `null`, `"string"`, `"number"`, `"choices"`, or `"script"`).
+ Documents must be homogeneous: all form items or all evaluation items.
+
+ ```python
+ {
+   "data_welcome": [
+     [
+       {"text": "What is your native language?", "form": "string"},
+       {"text": "Rate your expertise (1-10)", "form": "number"}
+     ]
+   ]
+ }
+ ```
+
+ <img width="400" alt="Screenshot of a user form" src="https://github.com/user-attachments/assets/2310e8dc-98e9-4abf-8a27-6781b0094efe" />
+
+
+ It is possible to automatically collect additional information from the host system using the `"script"` field type.
+ Typically such a form document (or a sequence of them) would be stored in `"data_welcome"` so that it is both mandatory and shown to all users.
+ See [examples/user_info_form.json](examples/user_info_form.json).
+
  ### Single-stream Assignment

  All annotators draw from a shared pool with random assignment:
@@ -299,11 +358,14 @@ All annotators draw from a shared pool with random assignment:
  # ESA: error spans and scores
  "protocol": "ESA",
  "users": 50, # number of annotators (can also be a list, see below)
+ "docs_per_user": 10, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for single-stream, this is the number of documents).
+
  ### Dynamic Assignment

  The `dynamic` assignment type intelligently selects items based on current model performance to focus annotation effort on top-performing models using contrastive comparisons.
@@ -320,11 +382,14 @@ All items must contain outputs from all models for this assignment type to work
  "dynamic_contrastive_models": 2, # how many models to compare per item (optional, default: 1)
  "dynamic_first": 5, # annotations per model before dynamic kicks in (optional, default: 5)
  "dynamic_backoff": 0.1, # probability of uniform sampling (optional, default: 0)
+ "docs_per_user": 20, # optional: show goodbye after N documents per user
  },
  "data": [...], # list of all items (shared among all annotators)
  }
  ```

+ Set `docs_per_user` to limit how many documents each user annotates before seeing the goodbye message (for dynamic, this is roughly the number of documents × models).
+
  **How it works:**
  1. Initial phase: Each model gets `dynamic_first` annotations with fully random contrastive evaluation
  2. Dynamic phase: After the initial phase, top `dynamic_top` models (by average score) are identified
@@ -412,6 +477,14 @@ When tokens are supplied, the dashboard will try to show model rankings based on

  Customize the goodbye message shown to users when they complete all annotations using the `instructions_goodbye` field in campaign info. Supports arbitrary HTML for styling and formatting with variable replacement: `${TOKEN}` (completion token) and `${USER_ID}` (user ID). Default: `"If someone asks you for a token of completion, show them: ${TOKEN}"`.

+ ### Prolific Integration
+
+ Use task-based assignment with Prolific. For each task, Pearmut generates a unique URL which can be uploaded to Prolific's interface. Add a redirect (on completion) to `instructions_goodbye`:
+ ```json
+ "instructions_goodbye": "<a href='https://app.prolific.com/submissions/complete?cc=${TOKEN}'>Click here to return to Prolific</a>"
+ ```
+ The `${TOKEN}` is automatically replaced based on passing attention checks (see [Attention checks](#tutorial-and-attention-checks) and [Pre-defined tokens](#pre-defined-user-ids-and-tokens)).
+
  ## Terminology

  - **Campaign**: An annotation project that contains configuration, data, and user assignments. Each campaign has a unique identifier and is defined in a JSON file.
{pearmut-1.0.2 → pearmut-1.1.0}/pyproject.toml
@@ -1,6 +1,6 @@
  [project]
  name = "pearmut"
- version = "1.0.2"
+ version = "1.1.0"
  description = "A tool for evaluation of model outputs, primarily MT."
  readme = "README.md"
  license = { text = "MIT" }
@@ -31,7 +31,7 @@ Repository = "https://github.com/zouharvi/pearmut"
  Issues = "https://github.com/zouharvi/pearmut/issues"

  [tool.setuptools]
- package-dir = { "pearmut" = "server" }
+ package-dir = { pearmut = "server" }
  packages = ["pearmut"]

  [build-system]
{pearmut-1.0.2 → pearmut-1.1.0}/server/app.py
@@ -49,7 +49,7 @@ for campaign_id in progress_data.keys():
  class LogResponseRequest(BaseModel):
      campaign_id: str
      user_id: str
-     item_i: int
+     item_i: int | str
      payload: dict[str, Any]


@@ -124,7 +124,7 @@ async def _get_next_item(request: NextItemRequest):
  class GetItemRequest(BaseModel):
      campaign_id: str
      user_id: str
-     item_i: int
+     item_i: int | str


  @app.post("/get-i-item")
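The widened `item_i` type lines up with the welcome items documented in the README diff above, which are addressed by string IDs such as `welcome_0`. A hypothetical request body that the updated models would now accept (values are made up):

```python
# Illustrative LogResponseRequest payload after the int | str change.
{
    "campaign_id": "example_with_welcome",
    "user_id": "user_003",
    "item_i": "welcome_0",  # string ID of a welcome/tutorial item
    "payload": {},          # annotation-specific content; schema not shown in this diff
}
```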
@@ -179,7 +179,11 @@ async def _dashboard_data(request: DashboardDataRequest):
  ]

  # Add threshold pass/fail status (only when user is complete)
- if all(entry["progress"]):
+ if (
+     tasks_data[campaign_id]["info"]["assignment"] != "dynamic" and all(v in {"completed", "completed_foreign"} for v in entry["progress"])
+ ) or (
+     tasks_data[campaign_id]["info"]["assignment"] == "dynamic" and all(v in {"completed", "completed_foreign"} for mv in entry["progress"] for v in mv.values())
+ ):
      entry["threshold_passed"] = check_validation_threshold(
          tasks_data, progress_data, campaign_id, user_id
      )
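Paraphrasing the new completion check as a standalone helper (hypothetical code, not part of the package): non-dynamic campaigns store one status string per progress entry, while dynamic campaigns store a model-to-status mapping per entry, so the two shapes are flattened differently before testing for completion.

```python
# Hypothetical helper mirroring the check added above; names are illustrative.
DONE = {"completed", "completed_foreign"}

def user_finished(assignment: str, progress: list) -> bool:
    if assignment == "dynamic":
        # dynamic campaigns: each entry maps model name -> status
        return all(status in DONE for entry in progress for status in entry.values())
    # other campaigns: each entry is a single status string
    return all(status in DONE for status in progress)
```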
@@ -376,7 +380,6 @@ async def _download_annotations(
      # NOTE: currently not checking tokens for progress download as it is non-destructive
      # token: list[str] = Query()
  ):
-
      output = {}
      for campaign_id in campaign_id:
          output_path = f"{ROOT}/data/outputs/{campaign_id}.jsonl"
@@ -403,7 +406,6 @@
  async def _download_progress(
      campaign_id: list[str] = Query(), token: list[str] = Query()
  ):
-
      if len(campaign_id) != len(token):
          return JSONResponse(
              content="Mismatched campaign_id and token count", status_code=400
@@ -435,6 +437,7 @@ if not os.path.exists(static_dir + "index.html"):
          "Static directory not found. Please build the frontend first."
      )

+
  # Serve HTML files directly without redirect
  @app.get("/annotate")
  async def serve_annotate():