PyPI - codebook-lab - Versions diffs - 1.1.1__tar.gz → 1.2.0__tar.gz - Mend

codebook-lab 1.1.1__tar.gz → 1.2.0__tar.gz

Files changed (32)

{codebook_lab-1.1.1 → codebook_lab-1.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codebook-lab
-Version: 1.1.1
+Version: 1.2.0
 Summary: An LLM annotation experiment pipeline for computational social science.
 Author: Lorcan McLaren
 License-Expression: AGPL-3.0-only
@@ -98,6 +98,7 @@ The package is organized around a small set of importable modules:
 - `codebook_lab.experiments`: high-level functions for single experiments and multi-run comparisons
 - `codebook_lab.annotate`: lower-level annotation functions
 - `codebook_lab.metrics`: evaluation and metrics functions
+- `codebook_lab.human_reliability`: human coder validation, ICR, disagreement, and ground-truth helpers
 - `codebook_lab.prompts`: prompt wrapper registry for built-in and custom prompt styles
 - `codebook_lab.examples`: helpers for bundled example tasks
 - `codebook_lab.types`: dataclasses for experiment specifications and result objects
@@ -281,6 +282,47 @@ Add multiple values to any field and the package sweeps them automatically. For
 If you are still designing a task and do not yet have human-coded labels, you can run annotation with `codebook_lab.run_annotation(...)` on an unlabeled CSV and add `ground-truth.csv` later when you want to score model performance with `codebook_lab.run_metrics(...)`.
+## Human Reliability And Adjudication
+When multiple human coders annotate the same items, CodeBook Lab can validate the coder CSVs, calculate inter-coder reliability, find disagreements, and build a consensus `ground-truth.csv`.
+```python
+from codebook_lab import build_human_ground_truth, calculate_human_reliability
+coder_csvs = {
+    "coder1": "annotations/coder1.csv",
+    "coder2": "annotations/coder2.csv",
+    "coder3": "annotations/coder3.csv",
+}
+reliability = calculate_human_reliability(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/human_reliability",
+)
+ground_truth = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/ground_truth",
+)
+```
+Each coder CSV must contain a stable item identifier column. The default is `sample_id`; pass `id_column="..."` to use a different column. By default, coder assignments are inferred from the submitted files. To validate expected coverage, pass an optional assignment CSV in either long format (`sample_id,coder_id`) or wide format (`sample_id,ra_1,ra_2,...`).
+Reliability outputs include `validation_issues.csv`, `pairwise_icr.csv`, `multirater_icr.csv`, `disagreements.csv`, and `summary.md`. Ground-truth outputs include `ground-truth.csv`, `adjudication_queue.csv`, and `validation_issues.csv`.
+Rows without a strict majority are written to `adjudication_queue.csv`. Open that queue in CodeBook Studio's adjudication mode, fill the unresolved blanks, export the completed queue, then rebuild:
+```python
+resolved = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    adjudications_csv="adjudication_queue.csv",
+    output_dir="outputs/ground_truth_resolved",
+)
+```
 ## Advanced Customization
 If you want to go beyond the default wrappers and hyperparameters, `codebook_lab/annotate.py` and `codebook_lab/prompts.py` are the main extension points.
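
The new README section says the package calculates inter-coder reliability and writes a `pairwise_icr.csv`, but the hunk does not show which statistics it uses. As a point of reference only, here is a minimal sketch of one common pairwise measure, Cohen's kappa, for two coders labelling the same items. The function name `cohen_kappa` and the sample labels are hypothetical and are not taken from `codebook_lab`.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders' labels over the same ordered items.

    Hypothetical helper for illustration; not part of codebook_lab.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Illustrative labels only.
coder1 = ["pos", "pos", "neg", "neg"]
coder2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(coder1, coder2)
```

Here the coders agree on three of four items (0.75 observed) against 0.5 expected by chance, so kappa is 0.5. The package itself may report different or additional statistics.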

{codebook_lab-1.1.1 → codebook_lab-1.2.0}/README.md RENAMED

@@ -53,6 +53,7 @@ The package is organized around a small set of importable modules:
 - `codebook_lab.experiments`: high-level functions for single experiments and multi-run comparisons
 - `codebook_lab.annotate`: lower-level annotation functions
 - `codebook_lab.metrics`: evaluation and metrics functions
+- `codebook_lab.human_reliability`: human coder validation, ICR, disagreement, and ground-truth helpers
 - `codebook_lab.prompts`: prompt wrapper registry for built-in and custom prompt styles
 - `codebook_lab.examples`: helpers for bundled example tasks
 - `codebook_lab.types`: dataclasses for experiment specifications and result objects
@@ -236,6 +237,47 @@ Add multiple values to any field and the package sweeps them automatically. For
 If you are still designing a task and do not yet have human-coded labels, you can run annotation with `codebook_lab.run_annotation(...)` on an unlabeled CSV and add `ground-truth.csv` later when you want to score model performance with `codebook_lab.run_metrics(...)`.
+## Human Reliability And Adjudication
+When multiple human coders annotate the same items, CodeBook Lab can validate the coder CSVs, calculate inter-coder reliability, find disagreements, and build a consensus `ground-truth.csv`.
+```python
+from codebook_lab import build_human_ground_truth, calculate_human_reliability
+coder_csvs = {
+    "coder1": "annotations/coder1.csv",
+    "coder2": "annotations/coder2.csv",
+    "coder3": "annotations/coder3.csv",
+}
+reliability = calculate_human_reliability(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/human_reliability",
+)
+ground_truth = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/ground_truth",
+)
+```
+Each coder CSV must contain a stable item identifier column. The default is `sample_id`; pass `id_column="..."` to use a different column. By default, coder assignments are inferred from the submitted files. To validate expected coverage, pass an optional assignment CSV in either long format (`sample_id,coder_id`) or wide format (`sample_id,ra_1,ra_2,...`).
+Reliability outputs include `validation_issues.csv`, `pairwise_icr.csv`, `multirater_icr.csv`, `disagreements.csv`, and `summary.md`. Ground-truth outputs include `ground-truth.csv`, `adjudication_queue.csv`, and `validation_issues.csv`.
+Rows without a strict majority are written to `adjudication_queue.csv`. Open that queue in CodeBook Studio's adjudication mode, fill the unresolved blanks, export the completed queue, then rebuild:
+```python
+resolved = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    adjudications_csv="adjudication_queue.csv",
+    output_dir="outputs/ground_truth_resolved",
+)
+```
 ## Advanced Customization
 If you want to go beyond the default wrappers and hyperparameters, `codebook_lab/annotate.py` and `codebook_lab/prompts.py` are the main extension points.
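
The added README text states that rows without a strict majority are written to `adjudication_queue.csv`. The rule can be sketched as follows, assuming "strict majority" means one label chosen by more than half of the coders who labelled the item. The helper `strict_majority`, the `votes` data, and the in-memory `queue` list are hypothetical; the package's actual implementation is not shown in this diff.

```python
from collections import Counter


def strict_majority(labels):
    """Return the label chosen by more than half of the coders, else None.

    Hypothetical helper for illustration; not part of codebook_lab.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Illustrative votes per item: sample_id -> labels from each coder.
votes = {
    "s1": ["yes", "yes", "no"],    # 2 of 3 agree: resolved
    "s2": ["yes", "no", "maybe"],  # three-way split: needs adjudication
    "s3": ["no", "no"],            # unanimous: resolved
    "s4": ["yes", "no"],           # 1 of 2 is not more than half: needs adjudication
}

consensus = {}
queue = []
for sample_id, labels in votes.items():
    winner = strict_majority(labels)
    if winner is None:
        queue.append(sample_id)
    else:
        consensus[sample_id] = winner
```

Under this reading, a two-coder tie is unresolved, which is why an even number of coders tends to produce a longer adjudication queue.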

{codebook_lab-1.1.1 → codebook_lab-1.2.0}/codebook_lab/__init__.py RENAMED

@@ -12,11 +12,19 @@ from .prompts import (
     list_prompt_wrappers,
     register_prompt_wrapper,
 )
-from .types import AnnotationRunResult, ExperimentRunResult, ExperimentSpec, MetricsRunResult
+from .types import (
+    AnnotationRunResult,
+    ExperimentRunResult,
+    ExperimentSpec,
+    HumanGroundTruthResult,
+    HumanReliabilityResult,
+    MetricsRunResult,
+)
 if TYPE_CHECKING:
     from .annotate import run_annotation
     from .experiments import expand_param_grid, resolve_task_dir, run_experiment, run_experiment_grid
+    from .human_reliability import build_human_ground_truth, calculate_human_reliability
     from .metrics import run_metrics
 try:
@@ -30,6 +38,8 @@ _LAZY_EXPORTS = {
     "run_annotation": (".annotate", "run_annotation"),
     "run_experiment": (".experiments", "run_experiment"),
     "run_experiment_grid": (".experiments", "run_experiment_grid"),
+    "build_human_ground_truth": (".human_reliability", "build_human_ground_truth"),
+    "calculate_human_reliability": (".human_reliability", "calculate_human_reliability"),
     "run_metrics": (".metrics", "run_metrics"),
 }
@@ -38,7 +48,11 @@ __all__ = [
     "AnnotationRunResult",
     "ExperimentRunResult",
     "ExperimentSpec",
+    "HumanGroundTruthResult",
+    "HumanReliabilityResult",
     "MetricsRunResult",
+    "build_human_ground_truth",
+    "calculate_human_reliability",
     "copy_example_task",
     "ensure_ollama_available",
     "ensure_ollama_model",
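
The hunks above register the two new functions in `_LAZY_EXPORTS` and import them only under `TYPE_CHECKING`, which suggests they are resolved on first attribute access. The resolving code is not part of this diff, so the following is only a sketch of how such a table is commonly consumed through a module-level `__getattr__` (PEP 562). The names `install_lazy_exports` and `lazy_demo` are hypothetical, and stdlib targets stand in for the package's relative module names.

```python
import importlib
import types


def install_lazy_exports(module, lazy_exports):
    """Attach a PEP 562 style __getattr__ that imports names on first access.

    Hypothetical helper for illustration; not part of codebook_lab.
    """

    def __getattr__(name):
        if name not in lazy_exports:
            raise AttributeError(
                f"module {module.__name__!r} has no attribute {name!r}"
            )
        target_module, attr = lazy_exports[name]
        # A real package would resolve relative names such as ".metrics"
        # with importlib.import_module(target_module, package=module.__name__).
        value = getattr(importlib.import_module(target_module), attr)
        # Cache on the module so later lookups skip __getattr__ entirely.
        setattr(module, name, value)
        return value

    module.__getattr__ = __getattr__


# Stand-in module with stdlib targets instead of the package's submodules.
demo = types.ModuleType("lazy_demo")
install_lazy_exports(
    demo,
    {
        "sqrt": ("math", "sqrt"),
        "dumps": ("json", "dumps"),
    },
)

result = demo.sqrt(9.0)  # "math" is imported only when sqrt is first requested
```

With this pattern, `import codebook_lab` would stay cheap, and heavier submodules load only when a caller touches one of the exported names.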
