PyPI - codebook-lab - Versions diffs - 1.1.0__tar.gz → 1.2.0__tar.gz - Mend

Files changed (36)

{codebook_lab-1.1.0 → codebook_lab-1.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codebook-lab
-Version: 1.1.0
+Version: 1.2.0
 Summary: An LLM annotation experiment pipeline for computational social science.
 Author: Lorcan McLaren
 License-Expression: AGPL-3.0-only
@@ -45,7 +45,7 @@ Dynamic: license-file
 # CodeBook Lab
-[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921)
+[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921) [![PyPI](https://img.shields.io/pypi/v/codebook-lab)](https://pypi.org/project/codebook-lab/) [![Python](https://img.shields.io/pypi/pyversions/codebook-lab)](https://pypi.org/project/codebook-lab/) [![License](https://img.shields.io/pypi/l/codebook-lab)](https://pypi.org/project/codebook-lab/)
 CodeBook Lab is an LLM annotation experiment pipeline for computational social science. It takes a codebook and labelled dataset from [CodeBook Studio](https://codebook.streamlit.app/) ([source](https://github.com/LorcanMcLaren/codebook-studio)) and runs structured experiments across the dimensions that matter for text-as-data research: model choice, model size, prompt style, zero-shot versus few-shot learning, and sampling hyperparameters — all benchmarked against human labels.
@@ -98,6 +98,7 @@ The package is organized around a small set of importable modules:
 - `codebook_lab.experiments`: high-level functions for single experiments and multi-run comparisons
 - `codebook_lab.annotate`: lower-level annotation functions
 - `codebook_lab.metrics`: evaluation and metrics functions
+- `codebook_lab.human_reliability`: human coder validation, ICR, disagreement, and ground-truth helpers
 - `codebook_lab.prompts`: prompt wrapper registry for built-in and custom prompt styles
 - `codebook_lab.examples`: helpers for bundled example tasks
 - `codebook_lab.types`: dataclasses for experiment specifications and result objects
@@ -281,6 +282,47 @@ Add multiple values to any field and the package sweeps them automatically. For
 If you are still designing a task and do not yet have human-coded labels, you can run annotation with `codebook_lab.run_annotation(...)` on an unlabeled CSV and add `ground-truth.csv` later when you want to score model performance with `codebook_lab.run_metrics(...)`.
+## Human Reliability And Adjudication
+When multiple human coders annotate the same items, CodeBook Lab can validate the coder CSVs, calculate inter-coder reliability, find disagreements, and build a consensus `ground-truth.csv`.
+```python
+from codebook_lab import build_human_ground_truth, calculate_human_reliability
+coder_csvs = {
+    "coder1": "annotations/coder1.csv",
+    "coder2": "annotations/coder2.csv",
+    "coder3": "annotations/coder3.csv",
+}
+reliability = calculate_human_reliability(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/human_reliability",
+)
+ground_truth = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/ground_truth",
+)
+```
+Each coder CSV must contain a stable item identifier column. The default is `sample_id`; pass `id_column="..."` to use a different column. By default, coder assignments are inferred from the submitted files. To validate expected coverage, pass an optional assignment CSV in either long format (`sample_id,coder_id`) or wide format (`sample_id,ra_1,ra_2,...`).
+Reliability outputs include `validation_issues.csv`, `pairwise_icr.csv`, `multirater_icr.csv`, `disagreements.csv`, and `summary.md`. Ground-truth outputs include `ground-truth.csv`, `adjudication_queue.csv`, and `validation_issues.csv`.
+Rows without a strict majority are written to `adjudication_queue.csv`. Open that queue in CodeBook Studio's adjudication mode, fill the unresolved blanks, export the completed queue, then rebuild:
+```python
+resolved = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    adjudications_csv="adjudication_queue.csv",
+    output_dir="outputs/ground_truth_resolved",
+)
+```
 ## Advanced Customization
 If you want to go beyond the default wrappers and hyperparameters, `codebook_lab/annotate.py` and `codebook_lab/prompts.py` are the main extension points.
@@ -297,7 +339,7 @@ This project is licensed under the [GNU Affero General Public License v3.0](http
 If you use CodeBook Lab in research, please cite both:
 - this software package
-- the associated preprint
+- the associated arXiv preprint
 Citation metadata is also available in the project's [`CITATION.cff`](https://github.com/LorcanMcLaren/codebook-lab/blob/main/CITATION.cff).
@@ -324,7 +366,7 @@ BibTeX:
 APSR style:
-McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. Preprint.
+McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. arXiv preprint arXiv:2603.26898. [https://arxiv.org/abs/2603.26898](https://arxiv.org/abs/2603.26898).
 BibTeX:
@@ -333,6 +375,10 @@ BibTeX:
   author = {McLaren, Lorcan and Cross, James P. and Krakowska, Zuzanna and Rauner, Robin and Schoonvelde, Martijn},
   title = {Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation},
   year = {2026},
-  note = {Preprint}
+  eprint = {2603.26898},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CL},
+  doi = {10.48550/arXiv.2603.26898},
+  url = {https://arxiv.org/abs/2603.26898}
 }
 ```
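
Note on the `pairwise_icr.csv` output named in the new README section: the diff does not show which agreement statistic codebook-lab computes. As a minimal sketch, assuming Cohen's kappa (a common pairwise measure for categorical labels), the calculation for two coders could look like this. The function name `cohen_kappa` and the sample labels are hypothetical and are not taken from the package.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four items.
coder1 = ["yes", "yes", "no", "no"]
coder2 = ["yes", "no", "no", "no"]
kappa = cohen_kappa(coder1, coder2)  # observed 0.75, expected 0.5
```

Kappa corrects raw agreement for chance, so two coders who agree on 75% of items can still score well below 0.75 when one label dominates.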

{codebook_lab-1.1.0 → codebook_lab-1.2.0}/README.md RENAMED

@@ -1,6 +1,6 @@
 # CodeBook Lab
-[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921)
+[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921) [![PyPI](https://img.shields.io/pypi/v/codebook-lab)](https://pypi.org/project/codebook-lab/) [![Python](https://img.shields.io/pypi/pyversions/codebook-lab)](https://pypi.org/project/codebook-lab/) [![License](https://img.shields.io/pypi/l/codebook-lab)](https://pypi.org/project/codebook-lab/)
 CodeBook Lab is an LLM annotation experiment pipeline for computational social science. It takes a codebook and labelled dataset from [CodeBook Studio](https://codebook.streamlit.app/) ([source](https://github.com/LorcanMcLaren/codebook-studio)) and runs structured experiments across the dimensions that matter for text-as-data research: model choice, model size, prompt style, zero-shot versus few-shot learning, and sampling hyperparameters — all benchmarked against human labels.
@@ -53,6 +53,7 @@ The package is organized around a small set of importable modules:
 - `codebook_lab.experiments`: high-level functions for single experiments and multi-run comparisons
 - `codebook_lab.annotate`: lower-level annotation functions
 - `codebook_lab.metrics`: evaluation and metrics functions
+- `codebook_lab.human_reliability`: human coder validation, ICR, disagreement, and ground-truth helpers
 - `codebook_lab.prompts`: prompt wrapper registry for built-in and custom prompt styles
 - `codebook_lab.examples`: helpers for bundled example tasks
 - `codebook_lab.types`: dataclasses for experiment specifications and result objects
@@ -236,6 +237,47 @@ Add multiple values to any field and the package sweeps them automatically. For
 If you are still designing a task and do not yet have human-coded labels, you can run annotation with `codebook_lab.run_annotation(...)` on an unlabeled CSV and add `ground-truth.csv` later when you want to score model performance with `codebook_lab.run_metrics(...)`.
+## Human Reliability And Adjudication
+When multiple human coders annotate the same items, CodeBook Lab can validate the coder CSVs, calculate inter-coder reliability, find disagreements, and build a consensus `ground-truth.csv`.
+```python
+from codebook_lab import build_human_ground_truth, calculate_human_reliability
+coder_csvs = {
+    "coder1": "annotations/coder1.csv",
+    "coder2": "annotations/coder2.csv",
+    "coder3": "annotations/coder3.csv",
+}
+reliability = calculate_human_reliability(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/human_reliability",
+)
+ground_truth = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    output_dir="outputs/ground_truth",
+)
+```
+Each coder CSV must contain a stable item identifier column. The default is `sample_id`; pass `id_column="..."` to use a different column. By default, coder assignments are inferred from the submitted files. To validate expected coverage, pass an optional assignment CSV in either long format (`sample_id,coder_id`) or wide format (`sample_id,ra_1,ra_2,...`).
+Reliability outputs include `validation_issues.csv`, `pairwise_icr.csv`, `multirater_icr.csv`, `disagreements.csv`, and `summary.md`. Ground-truth outputs include `ground-truth.csv`, `adjudication_queue.csv`, and `validation_issues.csv`.
+Rows without a strict majority are written to `adjudication_queue.csv`. Open that queue in CodeBook Studio's adjudication mode, fill the unresolved blanks, export the completed queue, then rebuild:
+```python
+resolved = build_human_ground_truth(
+    codebook_path="codebook.json",
+    coder_csvs=coder_csvs,
+    adjudications_csv="adjudication_queue.csv",
+    output_dir="outputs/ground_truth_resolved",
+)
+```
 ## Advanced Customization
 If you want to go beyond the default wrappers and hyperparameters, `codebook_lab/annotate.py` and `codebook_lab/prompts.py` are the main extension points.
@@ -252,7 +294,7 @@ This project is licensed under the [GNU Affero General Public License v3.0](http
 If you use CodeBook Lab in research, please cite both:
 - this software package
-- the associated preprint
+- the associated arXiv preprint
 Citation metadata is also available in the project's [`CITATION.cff`](https://github.com/LorcanMcLaren/codebook-lab/blob/main/CITATION.cff).
@@ -279,7 +321,7 @@ BibTeX:
 APSR style:
-McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. Preprint.
+McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. arXiv preprint arXiv:2603.26898. [https://arxiv.org/abs/2603.26898](https://arxiv.org/abs/2603.26898).
 BibTeX:
@@ -288,6 +330,10 @@ BibTeX:
   author = {McLaren, Lorcan and Cross, James P. and Krakowska, Zuzanna and Rauner, Robin and Schoonvelde, Martijn},
   title = {Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation},
   year = {2026},
-  note = {Preprint}
+  eprint = {2603.26898},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CL},
+  doi = {10.48550/arXiv.2603.26898},
+  url = {https://arxiv.org/abs/2603.26898}
 }
 ```
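
Note on the adjudication rule in the new README section, which says rows without a strict majority are written to `adjudication_queue.csv`. A minimal sketch of such a rule follows, assuming "strict majority" means more than half of the submitted labels for an item. The helper `strict_majority`, the item IDs, and the labels are hypothetical. How the package handles missing labels is not visible in this diff.

```python
from collections import Counter


def strict_majority(labels):
    """Return the label chosen by more than half of the coders, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical coder labels keyed by sample_id.
votes = {
    "item-1": ["protest", "protest", "other"],
    "item-2": ["protest", "other", "unclear"],
    "item-3": ["other", "other"],
    "item-4": ["protest", "other"],
}

ground_truth = {}
adjudication_queue = []
for sample_id, labels in votes.items():
    winner = strict_majority(labels)
    if winner is None:
        adjudication_queue.append(sample_id)  # needs a human adjudicator
    else:
        ground_truth[sample_id] = winner
```

Under this rule a two-coder split and a three-way split both go to the queue, which matches the README's instruction to fill the unresolved blanks before rebuilding.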

{codebook_lab-1.1.0 → codebook_lab-1.2.0}/codebook_lab/__init__.py RENAMED

@@ -12,11 +12,19 @@ from .prompts import (
     list_prompt_wrappers,
     register_prompt_wrapper,
 )
-from .types import AnnotationRunResult, ExperimentRunResult, ExperimentSpec, MetricsRunResult
+from .types import (
+    AnnotationRunResult,
+    ExperimentRunResult,
+    ExperimentSpec,
+    HumanGroundTruthResult,
+    HumanReliabilityResult,
+    MetricsRunResult,
+)
 if TYPE_CHECKING:
     from .annotate import run_annotation
     from .experiments import expand_param_grid, resolve_task_dir, run_experiment, run_experiment_grid
+    from .human_reliability import build_human_ground_truth, calculate_human_reliability
     from .metrics import run_metrics
 try:
@@ -30,6 +38,8 @@ _LAZY_EXPORTS = {
     "run_annotation": (".annotate", "run_annotation"),
     "run_experiment": (".experiments", "run_experiment"),
     "run_experiment_grid": (".experiments", "run_experiment_grid"),
+    "build_human_ground_truth": (".human_reliability", "build_human_ground_truth"),
+    "calculate_human_reliability": (".human_reliability", "calculate_human_reliability"),
     "run_metrics": (".metrics", "run_metrics"),
 }
@@ -38,7 +48,11 @@ __all__ = [
     "AnnotationRunResult",
     "ExperimentRunResult",
     "ExperimentSpec",
+    "HumanGroundTruthResult",
+    "HumanReliabilityResult",
     "MetricsRunResult",
+    "build_human_ground_truth",
+    "calculate_human_reliability",
     "copy_example_task",
     "ensure_ollama_available",
     "ensure_ollama_model",

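Note on the `_LAZY_EXPORTS` entries added above: each name maps to a `(module, attribute)` pair, and the `TYPE_CHECKING` imports keep the names visible to type checkers. The resolver itself is not shown in this diff. A minimal sketch of how such a table could be resolved on first access is below, assuming a module-level `__getattr__` (PEP 562). `LAZY_EXPORTS` and `resolve_lazy_export` are hypothetical names, and the table points at standard-library modules so the sketch runs without codebook-lab installed.

```python
import importlib

# Hypothetical table in the same shape as _LAZY_EXPORTS.
LAZY_EXPORTS = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}


def resolve_lazy_export(name, package=None):
    """Import the owning module on first use and return the requested attribute."""
    try:
        module_name, attr_name = LAZY_EXPORTS[name]
    except KeyError:
        raise AttributeError(f"no lazy export named {name!r}") from None
    # `package` is only needed for relative names such as ".human_reliability".
    module = importlib.import_module(module_name, package)
    return getattr(module, attr_name)


# In a package's __init__.py, a module-level __getattr__ could delegate here:
# def __getattr__(name):
#     return resolve_lazy_export(name, __name__)
```

The benefit of this pattern is that importing the package stays cheap: a heavy submodule is loaded only when one of its exported names is first used.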
{codebook_lab-1.1.0 → codebook_lab-1.2.0}/codebook_lab/annotate.py RENAMED

@@ -205,7 +205,9 @@ def generate_response(chain, prompt, char_counts, timing_data, row_num=None, ann
         structured_chain = (
             _PROMPT_TEMPLATE
-            | chain.with_structured_output(AnnotationResponse, include_raw=True)
+            | chain.with_structured_output(
+                AnnotationResponse, method="json_schema", include_raw=True
+            )
         )
         start_time = time.time()
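
Note on the hunk above: the change passes `method="json_schema"` explicitly while keeping `include_raw=True`. With `include_raw=True`, LangChain's `with_structured_output` returns a dict with the keys `raw`, `parsed`, and `parsing_error`, so a failed parse can be inspected without raising. The standard-library sketch below only imitates that result shape. It is not LangChain code, and `AnnotationSketch` and `parse_with_raw` are hypothetical names; the fields of the real `AnnotationResponse` are not shown in this diff.

```python
import json
from dataclasses import dataclass


@dataclass
class AnnotationSketch:
    # Hypothetical stand-in for AnnotationResponse.
    label: str


def parse_with_raw(raw_text):
    """Mimic the {'raw', 'parsed', 'parsing_error'} result shape for a raw model reply."""
    try:
        payload = json.loads(raw_text)
        parsed = AnnotationSketch(label=str(payload["label"]))
        return {"raw": raw_text, "parsed": parsed, "parsing_error": None}
    except (json.JSONDecodeError, KeyError, TypeError) as error:
        # Keep the raw reply so the caller can log it or retry.
        return {"raw": raw_text, "parsed": None, "parsing_error": error}


ok = parse_with_raw('{"label": "1"}')
bad = parse_with_raw("not json")
```

A caller would typically check `parsing_error` first and fall back to the raw reply when `parsed` is `None`.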
