PyPI - codebook-lab - Versions diffs - 1.0.0__tar.gz → 1.1.1__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codebook-lab
-Version: 1.0.0
+Version: 1.1.1
 Summary: An LLM annotation experiment pipeline for computational social science.
 Author: Lorcan McLaren
 License-Expression: AGPL-3.0-only
@@ -45,7 +45,7 @@ Dynamic: license-file
 # CodeBook Lab
-[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921)
+[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921) [![PyPI](https://img.shields.io/pypi/v/codebook-lab)](https://pypi.org/project/codebook-lab/) [![Python](https://img.shields.io/pypi/pyversions/codebook-lab)](https://pypi.org/project/codebook-lab/) [![License](https://img.shields.io/pypi/l/codebook-lab)](https://pypi.org/project/codebook-lab/)
 CodeBook Lab is an LLM annotation experiment pipeline for computational social science. It takes a codebook and labelled dataset from [CodeBook Studio](https://codebook.streamlit.app/) ([source](https://github.com/LorcanMcLaren/codebook-studio)) and runs structured experiments across the dimensions that matter for text-as-data research: model choice, model size, prompt style, zero-shot versus few-shot learning, and sampling hyperparameters — all benchmarked against human labels.
@@ -297,7 +297,7 @@ This project is licensed under the [GNU Affero General Public License v3.0](http
 If you use CodeBook Lab in research, please cite both:
 - this software package
-- the associated preprint
+- the associated arXiv preprint
 Citation metadata is also available in the project's [`CITATION.cff`](https://github.com/LorcanMcLaren/codebook-lab/blob/main/CITATION.cff).
@@ -324,7 +324,7 @@ BibTeX:
 APSR style:
-McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. Preprint.
+McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. arXiv preprint arXiv:2603.26898. [https://arxiv.org/abs/2603.26898](https://arxiv.org/abs/2603.26898).
 BibTeX:
@@ -333,6 +333,10 @@ BibTeX:
   author = {McLaren, Lorcan and Cross, James P. and Krakowska, Zuzanna and Rauner, Robin and Schoonvelde, Martijn},
   title = {Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation},
   year = {2026},
-  note = {Preprint}
+  eprint = {2603.26898},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CL},
+  doi = {10.48550/arXiv.2603.26898},
+  url = {https://arxiv.org/abs/2603.26898}
 }
 ```

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/README.md RENAMED

@@ -1,6 +1,6 @@
 # CodeBook Lab
-[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921)
+[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921) [![PyPI](https://img.shields.io/pypi/v/codebook-lab)](https://pypi.org/project/codebook-lab/) [![Python](https://img.shields.io/pypi/pyversions/codebook-lab)](https://pypi.org/project/codebook-lab/) [![License](https://img.shields.io/pypi/l/codebook-lab)](https://pypi.org/project/codebook-lab/)
 CodeBook Lab is an LLM annotation experiment pipeline for computational social science. It takes a codebook and labelled dataset from [CodeBook Studio](https://codebook.streamlit.app/) ([source](https://github.com/LorcanMcLaren/codebook-studio)) and runs structured experiments across the dimensions that matter for text-as-data research: model choice, model size, prompt style, zero-shot versus few-shot learning, and sampling hyperparameters — all benchmarked against human labels.
@@ -252,7 +252,7 @@ This project is licensed under the [GNU Affero General Public License v3.0](http
 If you use CodeBook Lab in research, please cite both:
 - this software package
-- the associated preprint
+- the associated arXiv preprint
 Citation metadata is also available in the project's [`CITATION.cff`](https://github.com/LorcanMcLaren/codebook-lab/blob/main/CITATION.cff).
@@ -279,7 +279,7 @@ BibTeX:
 APSR style:
-McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. Preprint.
+McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. arXiv preprint arXiv:2603.26898. [https://arxiv.org/abs/2603.26898](https://arxiv.org/abs/2603.26898).
 BibTeX:
@@ -288,6 +288,10 @@ BibTeX:
   author = {McLaren, Lorcan and Cross, James P. and Krakowska, Zuzanna and Rauner, Robin and Schoonvelde, Martijn},
   title = {Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation},
   year = {2026},
-  note = {Preprint}
+  eprint = {2603.26898},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CL},
+  doi = {10.48550/arXiv.2603.26898},
+  url = {https://arxiv.org/abs/2603.26898}
 }
 ```

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/annotate.py RENAMED

@@ -8,9 +8,24 @@ import pandas as pd
 import regex
 from codecarbon import OfflineEmissionsTracker
 from langchain_core.prompts import ChatPromptTemplate
-from langchain_ollama.llms import OllamaLLM
+from langchain_ollama.chat_models import ChatOllama
+from pydantic import BaseModel
+from .conditions import (
+    get_annotation_column_name,
+    get_annotation_entries,
+    is_annotation_applicable,
+    normalize_annotation_response_value,
+)
 from .ollama import ensure_ollama_available
+class AnnotationResponse(BaseModel):
+    """Schema used by ChatOllama structured output to guarantee valid JSON."""
+    response: str
+_PROMPT_TEMPLATE = ChatPromptTemplate.from_template("""{question}""")
 from .prompts import PromptContext, get_prompt_type_name, render_prompt
 from .types import AnnotationRunResult
@@ -55,17 +70,20 @@ class _AnnotationProgressBar:
             sys.stderr.write("\n")
             sys.stderr.flush()
+    def skip(self, count: int = 1) -> None:
+        """Reduce the remaining work estimate when prompts are skipped."""
+        if count <= 0:
+            return
+        self.total_steps = max(self.completed_steps, self.total_steps - count)
 def _count_annotations(codebook, process_textbox=False):
-    """Count how many annotation prompts will be issued for one row."""
+    """Count the maximum number of annotation prompts that could be issued for one row."""
     count = 0
-    for key, section in codebook.items():
-        if not key.startswith("section_"):
+    for _, _, _, annotation in get_annotation_entries(codebook):
+        if annotation.get("type") == "textbox" and not process_textbox:
             continue
-        for annotation in section.get("annotations", {}).values():
-            if annotation.get("type") == "textbox" and not process_textbox:
-                continue
-            count += 1
+        count += 1
     return count
 def load_codebook(codebook_path):
@@ -90,19 +108,10 @@ def get_annotation_column_names(codebook):
     Returns:
         List of column names in ``<section_name>_<annotation_name>`` format.
     """
-    annotation_columns = []
-    for key, section in codebook.items():
-        if not key.startswith("section_"):
-            continue
-        section_name = section["section_name"]
-        annotations = section.get("annotations", {})
-        for annotation in annotations.values():
-            annotation_columns.append(f"{section_name}_{annotation['name']}")
-    return annotation_columns
+    return [
+        get_annotation_column_name(section_content, annotation)
+        for _, section_content, _, annotation in get_annotation_entries(codebook)
+    ]
 def load_input_dataframe(csv_path, codebook):
     """Load the input CSV and remove any existing annotation label columns.
@@ -161,24 +170,23 @@ def setup_model(model_name, temperature=None, top_p=None):
         top_p: Optional nucleus-sampling value.
     Returns:
-        LangChain runnable that accepts ``{"question": prompt}``.
+        ``ChatOllama`` instance.  The caller builds structured-output chains
+        from this model as needed.
     """
     model_kwargs = {}
     if temperature is not None:
         model_kwargs['temperature'] = float(temperature)
     if top_p is not None:
         model_kwargs['top_p'] = float(top_p)
-    llm = OllamaLLM(model=model_name, **model_kwargs)
-    prompt_template = ChatPromptTemplate.from_template("""{question}""")
-    chain = prompt_template | llm
-    return chain
+    llm = ChatOllama(model=model_name, **model_kwargs)
+    return llm
 def generate_response(chain, prompt, char_counts, timing_data, row_num=None, annotation_name=None):
     """Run one prompt through the model and update timing/count statistics.
     Args:
-        chain: Runnable returned by :func:`setup_model`.
+        chain: ``ChatOllama`` instance returned by :func:`setup_model`.
         prompt: Fully rendered prompt string.
         char_counts: Mutable dict with ``input_chars`` and ``output_chars`` integers.
         timing_data: Mutable dict with inference timing counters.
@@ -191,28 +199,42 @@ def generate_response(chain, prompt, char_counts, timing_data, row_num=None, ann
     try:
         # Track input characters
         char_counts['input_chars'] += len(prompt)
         if row_num and annotation_name:
             logger.info("[Row %s] Sending request for: %s...", row_num, annotation_name)
+        structured_chain = (
+            _PROMPT_TEMPLATE
+            | chain.with_structured_output(
+                AnnotationResponse, method="json_schema", include_raw=True
+            )
+        )
         start_time = time.time()
-        response = chain.invoke({"question": prompt})
+        result = structured_chain.invoke({"question": prompt})
         end_time = time.time()
         inference_time = end_time - start_time
         timing_data['total_inference_time'] += inference_time
         timing_data['inference_count'] += 1
+        if result.get("parsed") is not None:
+            response = result["parsed"].model_dump_json()
+        else:
+            raw = result.get("raw")
+            response = raw.content if raw else ""
+            logger.debug("Structured parsing failed, using raw response for %s", annotation_name)
         char_counts['output_chars'] += len(response)
         if row_num and annotation_name:
             logger.info("[Row %s] %s done (%.1fs)", row_num, annotation_name, inference_time)
         return response
     except Exception as e:
         logger.warning("Error generating response: %s", e)
         return ""
-def extract_json_response(response, annotation_type, min_value=None, max_value=None):
+def extract_json_response(response, annotation_type, min_value=None, max_value=None, options=None):
     """
     Extract and validate JSON response based on annotation type
@@ -221,12 +243,22 @@ def extract_json_response(response, annotation_type, min_value=None, max_value=N
         annotation_type: Annotation type string such as ``"dropdown"`` or ``"likert"``.
         min_value: Optional integer lower bound for Likert annotations.
         max_value: Optional integer upper bound for Likert annotations.
+        options: Optional dropdown option list used to normalize categorical labels.
     Returns:
         Parsed response value coerced into the expected annotation format.
     """
     pattern = regex.compile(r'\{(?:[^{}]|(?R))*\}')
     json_strings = pattern.findall(response)
+    def normalize_dropdown_value(value):
+        return normalize_annotation_response_value(
+            {
+                "type": "dropdown",
+                "options": options or [],
+            },
+            value,
+        )
     for json_string in json_strings:
         try:
@@ -235,7 +267,7 @@ def extract_json_response(response, annotation_type, min_value=None, max_value=N
             # Validate and format based on annotation type
             if annotation_type == "dropdown":
-                return response_value
+                return normalize_dropdown_value(response_value)
             elif annotation_type == "checkbox":
                 # Convert to 1 or 0
                 if isinstance(response_value, bool):
@@ -251,7 +283,7 @@ def extract_json_response(response, annotation_type, min_value=None, max_value=N
                 return 0
             elif annotation_type == "textbox":
                 # Return as string
-                return str(response_value)
+                return str(response_value).strip()
             elif annotation_type == "likert":
                 # Validate is within range and convert to int
                 try:
@@ -266,12 +298,16 @@ def extract_json_response(response, annotation_type, min_value=None, max_value=N
                     return response_value
             # Fallback
-            return response_value
+            return str(response_value).strip() if isinstance(response_value, str) else response_value
         except json.JSONDecodeError as e:
             logger.debug("Error parsing JSON: %s", e)
     # If no valid JSON, try to extract direct response
-    if annotation_type == "checkbox":
+    stripped_response = response.strip()
+    if annotation_type == "dropdown":
+        return normalize_dropdown_value(stripped_response)
+    elif annotation_type == "checkbox":
         if "yes" in response.lower() or "true" in response.lower():
             return 1
         elif "no" in response.lower() or "false" in response.lower():
@@ -288,8 +324,10 @@ def extract_json_response(response, annotation_type, min_value=None, max_value=N
             except ValueError:
                 continue
         return (min_value + max_value) // 2  # Default to middle value
+    elif annotation_type == "textbox":
+        return stripped_response
-    return response  # Return raw response as fallback
+    return None
 def format_prompt(section_name, section_instruction, name, tooltip, annotation_type,
                options=None, min_value=None, max_value=None, example=None,
@@ -466,73 +504,73 @@ def classify_text(chain, text, codebook, prompt_type="standard", use_examples=Fa
     if timing_data is None:
         timing_data = {'total_inference_time': 0, 'inference_count': 0}
-    for key, section in codebook.items():
-        if key.startswith('section_'):
-            section_name = section['section_name']
-            section_instruction = section.get('section_instruction', '')
-            annotations = section['annotations']
-            for annotation_key, annotation in annotations.items():
-                name = annotation['name']
-                annotation_type = annotation['type']
-                # Skip textbox type annotations if process_textbox is False
-                if annotation_type == "textbox" and not process_textbox:
-                    continue
-                tooltip = annotation.get('tooltip', '')
-                example = annotation.get('example', '')
-                # Get type-specific parameters
-                options = None
-                min_value = None
-                max_value = None
-                if annotation_type == "dropdown":
-                    options = annotation.get('options', [])
-                elif annotation_type == "likert":
-                    min_value = annotation.get('min_value')
-                    max_value = annotation.get('max_value')
-                # Format prompt based on specified type and annotation type
-                prompt = format_prompt(
-                    section_name,
-                    section_instruction,
-                    name,
-                    tooltip,
-                    annotation_type,
-                    options,
-                    min_value,
-                    max_value,
-                    example,
-                    text,
-                    prompt_type=prompt_type,
-                    use_examples=use_examples
-                )
-                annotation_full_name = f"{section_name}_{name}"
-                response_text = generate_response(
-                    chain,
-                    prompt,
-                    char_counts,
-                    timing_data,
-                    row_num=row_num,
-                    annotation_name=annotation_full_name
-                )
-                response_value = extract_json_response(
-                    response_text,
-                    annotation_type,
-                    min_value,
-                    max_value
-                )
-                if response_value is not None:
-                    # Store the response with a meaningful column name
-                    column_name = f"{section_name}_{name}"
-                    responses[column_name] = response_value
-                if progress_bar is not None and row_num is not None and total_rows is not None:
-                    progress_bar.update(row_num, total_rows, annotation_full_name)
+    for section_key, section, annotation_key, annotation in get_annotation_entries(codebook):
+        section_name = section['section_name']
+        section_instruction = section.get('section_instruction', '')
+        name = annotation['name']
+        annotation_type = annotation['type']
+        annotation_full_name = f"{section_name}_{name}"
+        column_name = get_annotation_column_name(section, annotation)
+        if annotation_type == "textbox" and not process_textbox:
+            if progress_bar is not None:
+                progress_bar.skip()
+            continue
+        if not is_annotation_applicable(codebook, section_key, annotation_key, responses):
+            responses[column_name] = None
+            if progress_bar is not None:
+                progress_bar.skip()
+            continue
+        tooltip = annotation.get('tooltip', '')
+        example = annotation.get('example', '')
+        options = None
+        min_value = None
+        max_value = None
+        if annotation_type == "dropdown":
+            options = annotation.get('options', [])
+        elif annotation_type == "likert":
+            min_value = annotation.get('min_value')
+            max_value = annotation.get('max_value')
+        prompt = format_prompt(
+            section_name,
+            section_instruction,
+            name,
+            tooltip,
+            annotation_type,
+            options,
+            min_value,
+            max_value,
+            example,
+            text,
+            prompt_type=prompt_type,
+            use_examples=use_examples
+        )
+        response_text = generate_response(
+            chain,
+            prompt,
+            char_counts,
+            timing_data,
+            row_num=row_num,
+            annotation_name=annotation_full_name
+        )
+        response_value = extract_json_response(
+            response_text,
+            annotation_type,
+            min_value,
+            max_value,
+            options=options,
+        )
+        responses[column_name] = response_value if response_value is not None else None
+        if progress_bar is not None and row_num is not None and total_rows is not None:
+            progress_bar.update(row_num, total_rows, annotation_full_name)
     return responses, char_counts, timing_data
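In 1.1.1, `generate_response` no longer sends the prompt to a plain `OllamaLLM`; it pipes `_PROMPT_TEMPLATE` into `ChatOllama.with_structured_output(AnnotationResponse, method="json_schema", include_raw=True)`. With `include_raw=True`, LangChain returns a dict with `"raw"`, `"parsed"` and `"parsing_error"` keys, where `"parsed"` is `None` if validation failed. The function therefore serializes the parsed object when there is one and otherwise falls back to the raw message text. The sketch below reproduces only that fallback step in plain Python. It is a hypothetical illustration: the dataclass stands in for the pydantic model and `response_text` is not a package function. The compact separators are chosen to resemble pydantic's `model_dump_json()` output.

```python
import json
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass
class AnnotationResponse:
    """Hypothetical stand-in for the pydantic schema defined in annotate.py."""
    response: str


def response_text(result):
    """Turn an include_raw=True result dict into the string passed on for parsing."""
    parsed = result.get("parsed")
    if parsed is not None:
        # Serialize the validated object, mimicking model_dump_json()'s compact JSON.
        return json.dumps(asdict(parsed), separators=(",", ":"))
    raw = result.get("raw")
    # Fall back to the unvalidated message text, or "" when no message came back.
    return raw.content if raw is not None else ""


# A failed structured parse falls back to the raw chat message content.
fallback = {"raw": SimpleNamespace(content="Yes"), "parsed": None, "parsing_error": ValueError("bad JSON")}
print(response_text(fallback))
```

In both cases the returned string still goes through `extract_json_response`, so a parsed reply such as `{"response":"Yes"}` and a bare `Yes` are handled by the same downstream code.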

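On the parsing side, `extract_json_response` gains an `options` argument. Dropdown values are now mapped case-insensitively onto the codebook's option list, whether they come from a `{...}` object or from the stripped raw text. An answer that matches no option becomes `None`, and the final catch-all returns `None` instead of the raw reply. The package finds JSON objects with the third-party `regex` module's recursive pattern `\{(?:[^{}]|(?R))*\}`, which the standard-library `re` module cannot express. This sketch therefore swaps in a simple brace-depth scanner, which ignores the possibility of braces inside JSON strings. It also assumes the label sits under a `"response"` key, matching the `AnnotationResponse` schema; the lines that read the key are elided from this diff. All function names here are hypothetical.

```python
import json


def find_json_objects(text):
    """Return top-level balanced {...} substrings (stand-in for regex's (?R) pattern)."""
    objects, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                objects.append(text[start:i + 1])
    return objects


def normalize_dropdown(value, options):
    """Map a label onto its canonical option, ignoring case, whitespace and backticks."""
    if value is None:
        return None
    normalized = str(value).strip().strip("`").strip()
    if not normalized:
        return None
    if not options:
        return normalized  # no option list: keep the cleaned label
    lookup = {str(opt).strip().casefold(): opt for opt in options}
    return lookup.get(normalized.casefold())  # None when the label is not an option


def extract_dropdown(response, options, key="response"):
    """Dropdown branch only: first usable JSON object, else the stripped raw text."""
    for candidate in find_json_objects(response):
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and key in data:
            return normalize_dropdown(data[key], options)
    return normalize_dropdown(response.strip(), options)


opts = ["Positive", "Negative", "Neutral"]
print(extract_dropdown('Sure! {"response": " negative "}', opts))
```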
codebook_lab-1.1.1/codebook_lab/conditions.py ADDED

@@ -0,0 +1,154 @@
+from __future__ import annotations
+from typing import Any
+import pandas as pd
+def get_sorted_annotation_keys(section_content: dict[str, Any]) -> list[str]:
+    """Return annotation keys in the same stable order used by CodeBook Studio."""
+    def sort_key(annotation_key: str) -> tuple[int, int | str]:
+        suffix = annotation_key.split("_")[-1]
+        return (0, int(suffix)) if suffix.isdigit() else (1, annotation_key)
+    return sorted(section_content.get("annotations", {}).keys(), key=sort_key)
+def get_annotation_column_name(section_content: dict[str, Any], annotation: dict[str, Any]) -> str:
+    """Return the canonical CSV column name for an annotation."""
+    return f"{section_content['section_name']}_{annotation['name']}"
+def get_annotation_entries(codebook: dict[str, Any]) -> list[tuple[str, dict[str, Any], str, dict[str, Any]]]:
+    """Return all section/annotation entries in display order."""
+    entries: list[tuple[str, dict[str, Any], str, dict[str, Any]]] = []
+    for section_key, section_content in codebook.items():
+        if not section_key.startswith("section_"):
+            continue
+        for annotation_key in get_sorted_annotation_keys(section_content):
+            annotation = section_content.get("annotations", {}).get(annotation_key, {})
+            entries.append((section_key, section_content, annotation_key, annotation))
+    return entries
+def get_annotation_lookup(
+    codebook: dict[str, Any],
+) -> dict[tuple[str, str], tuple[dict[str, Any], dict[str, Any]]]:
+    """Build a lookup from stable section/annotation keys to annotation metadata."""
+    return {
+        (section_key, annotation_key): (section_content, annotation)
+        for section_key, section_content, annotation_key, annotation in get_annotation_entries(codebook)
+    }
+def get_annotation_condition(annotation: dict[str, Any]) -> dict[str, Any] | None:
+    """Return a normalized condition block when one is present."""
+    condition = annotation.get("condition")
+    if not isinstance(condition, dict):
+        return None
+    section_key = condition.get("section_key")
+    annotation_key = condition.get("annotation_key")
+    if not section_key or not annotation_key:
+        return None
+    return {
+        "section_key": section_key,
+        "annotation_key": annotation_key,
+        "value": condition.get("value"),
+    }
+def normalize_annotation_response_value(annotation: dict[str, Any], value: Any) -> Any:
+    """Coerce stored responses into stable comparable values."""
+    if pd.isna(value):
+        return None
+    annotation_type = annotation.get("type", "dropdown")
+    if annotation_type == "dropdown":
+        normalized = str(value).strip().strip("`").strip()
+        if normalized == "":
+            return None
+        options = annotation.get("options") or []
+        if not options:
+            return normalized
+        option_lookup = {str(option).strip().casefold(): option for option in options}
+        return option_lookup.get(normalized.casefold())
+    if annotation_type == "checkbox":
+        lowered = str(value).strip().lower()
+        if lowered in {"1", "true", "yes"}:
+            return 1
+        if lowered in {"0", "false", "no"}:
+            return 0
+        return value
+    if annotation_type == "likert":
+        try:
+            return int(value)
+        except (TypeError, ValueError):
+            return value
+    if annotation_type == "textbox":
+        return str(value).strip()
+    return str(value).strip()
+def is_annotation_applicable(
+    codebook: dict[str, Any],
+    section_key: str,
+    annotation_key: str,
+    response_values: dict[str, Any],
+    lookup: dict[tuple[str, str], tuple[dict[str, Any], dict[str, Any]]] | None = None,
+    visited: set[tuple[str, str]] | None = None,
+) -> bool:
+    """Return whether an annotation should be shown/generate for the current responses."""
+    lookup = lookup or get_annotation_lookup(codebook)
+    current_entry = lookup.get((section_key, annotation_key))
+    if not current_entry:
+        return True
+    _, annotation = current_entry
+    condition = get_annotation_condition(annotation)
+    if not condition:
+        return True
+    target_key = (condition["section_key"], condition["annotation_key"])
+    if target_key == (section_key, annotation_key):
+        return True
+    target_entry = lookup.get(target_key)
+    if not target_entry:
+        return True
+    visited = visited or set()
+    if (section_key, annotation_key) in visited:
+        return True
+    target_section_content, target_annotation = target_entry
+    if not is_annotation_applicable(
+        codebook,
+        condition["section_key"],
+        condition["annotation_key"],
+        response_values,
+        lookup=lookup,
+        visited=visited | {(section_key, annotation_key)},
+    ):
+        return False
+    target_column_name = get_annotation_column_name(target_section_content, target_annotation)
+    actual_value = normalize_annotation_response_value(target_annotation, response_values.get(target_column_name))
+    expected_value = normalize_annotation_response_value(target_annotation, condition.get("value"))
+    if actual_value is None:
+        return False
+    if target_annotation.get("type") == "textbox" and actual_value == "":
+        return False
+    return actual_value == expected_value
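`conditions.py` introduces conditional annotations. An annotation may carry a `condition` block that names another annotation by `section_key`/`annotation_key` and gives the `value` it must have. `is_annotation_applicable` then decides whether the annotation should be generated, given the responses collected so far. A condition holds only if its source annotation is itself applicable. This is checked recursively, with a `visited` set guarding against cycles, and a missing source answer makes the dependent annotation inapplicable. `classify_text` fills `responses` while walking `get_annotation_entries` in order, and that order sorts numeric key suffixes numerically (`annotation_2` before `annotation_10`). A condition can therefore presumably only be satisfied by an annotation that appears earlier in the codebook. The simplified sketch below mirrors this logic; all function names are hypothetical. Its `_normalize` only trims and case-folds strings, whereas the package's `normalize_annotation_response_value` also maps dropdown labels to canonical options and coerces checkbox and Likert values.

```python
def sorted_annotation_keys(annotations):
    """Numeric suffixes sort numerically; non-numeric keys come after, alphabetically."""
    def sort_key(key):
        suffix = key.split("_")[-1]
        return (0, int(suffix)) if suffix.isdigit() else (1, key)
    return sorted(annotations, key=sort_key)


def _normalize(value):
    """Simplified normalization: trim whitespace/backticks, case-fold, blanks -> None."""
    if value is None:
        return None
    text = str(value).strip().strip("`").strip()
    return text.casefold() or None


def is_applicable(codebook, section_key, annotation_key, responses, visited=frozenset()):
    """Simplified sketch of conditions.is_annotation_applicable."""
    current = (section_key, annotation_key)
    annotation = codebook[section_key]["annotations"][annotation_key]
    condition = annotation.get("condition")
    if not isinstance(condition, dict):
        return True  # unconditional annotation
    target = (condition.get("section_key"), condition.get("annotation_key"))
    target_section = codebook.get(target[0], {})
    target_annotation = target_section.get("annotations", {}).get(target[1])
    if target_annotation is None or target == current or current in visited:
        return True  # unknown targets, self-references and cycles impose no condition
    # The source question must itself have been applicable.
    if not is_applicable(codebook, target[0], target[1], responses, visited | {current}):
        return False
    column = f"{target_section['section_name']}_{target_annotation['name']}"
    actual = _normalize(responses.get(column))
    return actual is not None and actual == _normalize(condition.get("value"))


codebook = {
    "section_1": {
        "section_name": "Topic",
        "annotations": {
            "annotation_1": {"name": "is_economic", "type": "dropdown", "options": ["Yes", "No"]},
            "annotation_2": {
                "name": "economic_tone",
                "type": "dropdown",
                "options": ["Positive", "Negative"],
                "condition": {"section_key": "section_1", "annotation_key": "annotation_1", "value": "Yes"},
            },
        },
    },
}
print(is_applicable(codebook, "section_1", "annotation_2", {"Topic_is_economic": "yes"}))
```

When the check fails, `classify_text` records `None` for the dependent column and calls `progress_bar.skip()` instead of prompting the model.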

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/metrics.py RENAMED

@@ -16,6 +16,13 @@ import krippendorff
 from scipy.stats import spearmanr
 from sklearn.metrics import confusion_matrix
+from .conditions import (
+    get_annotation_column_name,
+    get_annotation_condition,
+    get_annotation_entries,
+    get_annotation_lookup,
+    normalize_annotation_response_value,
+)
 from .types import MetricsRunResult
 logger = logging.getLogger(__name__)
@@ -82,38 +89,90 @@ def extract_column_info_from_codebook(codebook_path):
     """
     with open(codebook_path, 'r') as file:
         codebook = json.load(file)
+    lookup = get_annotation_lookup(codebook)
     column_info = {}
-    for key, section in codebook.items():
-        if key.startswith('section_'):
-            section_name = section['section_name']
-            annotations = section['annotations']
-            for annotation_key, annotation in annotations.items():
-                name = annotation['name']
-                column_name = f"{section_name}_{name}"
-                # Extract annotation type and relevant properties
-                annotation_type = annotation.get('type', 'dropdown')  # Default to dropdown for backward compatibility
-                properties = {
-                    'type': annotation_type
+    for section_key, section, annotation_key, annotation in get_annotation_entries(codebook):
+        column_name = get_annotation_column_name(section, annotation)
+        annotation_type = annotation.get('type', 'dropdown')
+        properties = {
+            'type': annotation_type,
+            'section_key': section_key,
+            'annotation_key': annotation_key,
+        }
+        if annotation_type == 'dropdown':
+            properties['options'] = annotation.get('options', [])
+        elif annotation_type == 'likert':
+            properties['min_value'] = annotation.get('min_value', 0)
+            properties['max_value'] = annotation.get('max_value', 5)
+        condition = get_annotation_condition(annotation)
+        if condition:
+            source_entry = lookup.get((condition['section_key'], condition['annotation_key']))
+            if source_entry:
+                source_section, source_annotation = source_entry
+                properties['condition'] = {
+                    'source_column': get_annotation_column_name(source_section, source_annotation),
+                    'source_type': source_annotation.get('type', 'dropdown'),
+                    'value': normalize_annotation_response_value(source_annotation, condition.get('value')),
                 }
-                # Add type-specific properties
-                if annotation_type == 'dropdown':
-                    properties['options'] = annotation.get('options', [])
-                elif annotation_type == 'likert':
-                    properties['min_value'] = annotation.get('min_value', 0)
-                    properties['max_value'] = annotation.get('max_value', 5)
-                column_info[column_name] = properties
+        column_info[column_name] = properties
     logger.debug("Extracted column info from codebook: %s", column_info)
     return column_info
+def _is_row_applicable_for_column(merged_row, column, column_info, side="gt", visited=None):
+    """Return whether a conditional annotation is applicable for one merged row."""
+    info = column_info.get(column, {})
+    condition = info.get("condition")
+    if not condition:
+        return True
+    source_column = condition.get("source_column")
+    if not source_column:
+        return True
+    visited = visited or set()
+    if column in visited:
+        return True
+    if source_column in column_info and not _is_row_applicable_for_column(
+        merged_row,
+        source_column,
+        column_info,
+        side=side,
+        visited=visited | {column},
+    ):
+        return False
+    source_value = merged_row.get(f"{source_column}_{side}")
+    source_annotation = {"type": condition.get("source_type", "dropdown")}
+    actual_value = normalize_annotation_response_value(source_annotation, source_value)
+    expected_value = normalize_annotation_response_value(source_annotation, condition.get("value"))
+    if actual_value is None:
+        return False
+    if condition.get("source_type") == "textbox" and actual_value == "":
+        return False
+    return actual_value == expected_value
+def _get_applicable_row_mask(merged_df, column, column_info, side="gt"):
+    """Return a boolean mask for rows where an annotation is applicable."""
+    if "condition" not in column_info.get(column, {}):
+        return pd.Series(True, index=merged_df.index)
+    return merged_df.apply(
+        lambda row: _is_row_applicable_for_column(row, column, column_info, side=side),
+        axis=1,
+    )
 def load_data(ground_truth_path, llm_output_path, columns_to_compare):
     """Load and align ground-truth and model-output CSV files for evaluation.
@@ -413,8 +472,9 @@ def evaluate_performance(merged_df, columns_to_compare, column_info, process_tex
             reports[column] = "Textbox processing skipped."
             continue
-        y_true = merged_df[column_gt]
-        y_pred = merged_df[column_llm]
+        applicable_mask = _get_applicable_row_mask(merged_df, column, column_info, side="gt")
+        y_true = merged_df.loc[applicable_mask, column_gt]
+        y_pred = merged_df.loc[applicable_mask, column_llm]
         # Handle values based on annotation type
         if annotation_type == 'checkbox':
@@ -548,8 +608,8 @@ def evaluate_performance(merged_df, columns_to_compare, column_info, process_tex
             # For Krippendorff's alpha
             label_to_int = {label: i for i, label in enumerate(['missing'] + all_labels)}
-            y_true_encoded = np.array([label_to_int[y_true_clean[i]] for i in range(len(y_true_clean))])
-            y_pred_encoded = np.array([label_to_int[y_pred_clean[i]] for i in range(len(y_pred_clean))])
+            y_true_encoded = np.array([label_to_int[value] for value in y_true_clean.tolist()])
+            y_pred_encoded = np.array([label_to_int[value] for value in y_pred_clean.tolist()])
             data = np.array([y_true_encoded, y_pred_encoded])
             krippendorff_alpha_scores[column] = krippendorff.alpha(reliability_data=data)
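On the evaluation side, `extract_column_info_from_codebook` now stores each condition as a `source_column`, a `source_type` and a normalized `value`. `evaluate_performance` then scores a conditional column only on rows where `_get_applicable_row_mask(..., side="gt")` holds, that is, where the human-coded source answer in the `<column>_gt` field triggers the question. The stdlib sketch below applies the same filter to a list of row dicts instead of a pandas DataFrame. The `_llm` suffix for model columns and the helper names are assumptions (the diff only shows a `column_llm` variable), and value comparison is simplified to trimmed, case-folded strings.

```python
def _norm(value):
    """Trim and case-fold; treat None and blank strings as missing."""
    if value is None:
        return None
    text = str(value).strip()
    return text.casefold() or None


def is_row_applicable(row, column, column_info, side="gt", visited=frozenset()):
    """Sketch of _is_row_applicable_for_column for one row stored as a dict."""
    condition = column_info.get(column, {}).get("condition")
    if not condition or not condition.get("source_column") or column in visited:
        return True
    source = condition["source_column"]
    # Nested conditions: the source column must be applicable on this row too.
    if source in column_info and not is_row_applicable(row, source, column_info, side, visited | {column}):
        return False
    actual = _norm(row.get(f"{source}_{side}"))
    return actual is not None and actual == _norm(condition.get("value"))


def applicable_pairs(rows, column, column_info):
    """Return (y_true, y_pred) for rows where `column` applies per the ground truth."""
    kept = [row for row in rows if is_row_applicable(row, column, column_info, side="gt")]
    return [row[f"{column}_gt"] for row in kept], [row[f"{column}_llm"] for row in kept]


column_info = {
    "Topic_is_economic": {"type": "dropdown"},
    "Topic_economic_tone": {
        "type": "dropdown",
        "condition": {"source_column": "Topic_is_economic", "source_type": "dropdown", "value": "Yes"},
    },
}
rows = [
    {"Topic_is_economic_gt": "Yes", "Topic_economic_tone_gt": "Positive", "Topic_economic_tone_llm": "Positive"},
    {"Topic_is_economic_gt": "No", "Topic_economic_tone_gt": None, "Topic_economic_tone_llm": "Negative"},
    {"Topic_is_economic_gt": "yes", "Topic_economic_tone_gt": "Negative", "Topic_economic_tone_llm": "Positive"},
]
print(applicable_pairs(rows, "Topic_economic_tone", column_info))
```

The second row is dropped, so the model's answer on a text where the human coder never triggered the question does not count against it. After `merged_df.loc[applicable_mask, ...]` in the real code, the pandas index is also no longer contiguous. An integer lookup such as `y_true_clean[i]` is label-based on such an index and would likely raise `KeyError`. That is plausibly why the Krippendorff encoding now iterates over `.tolist()` values instead.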

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codebook-lab
-Version: 1.0.0
+Version: 1.1.1
 Summary: An LLM annotation experiment pipeline for computational social science.
 Author: Lorcan McLaren
 License-Expression: AGPL-3.0-only
@@ -45,7 +45,7 @@ Dynamic: license-file
 # CodeBook Lab
-[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921)
+[![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921) [![PyPI](https://img.shields.io/pypi/v/codebook-lab)](https://pypi.org/project/codebook-lab/) [![Python](https://img.shields.io/pypi/pyversions/codebook-lab)](https://pypi.org/project/codebook-lab/) [![License](https://img.shields.io/pypi/l/codebook-lab)](https://pypi.org/project/codebook-lab/)
 CodeBook Lab is an LLM annotation experiment pipeline for computational social science. It takes a codebook and labelled dataset from [CodeBook Studio](https://codebook.streamlit.app/) ([source](https://github.com/LorcanMcLaren/codebook-studio)) and runs structured experiments across the dimensions that matter for text-as-data research: model choice, model size, prompt style, zero-shot versus few-shot learning, and sampling hyperparameters — all benchmarked against human labels.
@@ -297,7 +297,7 @@ This project is licensed under the [GNU Affero General Public License v3.0](http
 If you use CodeBook Lab in research, please cite both:
 - this software package
-- the associated preprint
+- the associated arXiv preprint
 Citation metadata is also available in the project's [`CITATION.cff`](https://github.com/LorcanMcLaren/codebook-lab/blob/main/CITATION.cff).
@@ -324,7 +324,7 @@ BibTeX:
 APSR style:
-McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. Preprint.
+McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. arXiv preprint arXiv:2603.26898. [https://arxiv.org/abs/2603.26898](https://arxiv.org/abs/2603.26898).
 BibTeX:
@@ -333,6 +333,10 @@ BibTeX:
   author = {McLaren, Lorcan and Cross, James P. and Krakowska, Zuzanna and Rauner, Robin and Schoonvelde, Martijn},
   title = {Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation},
   year = {2026},
-  note = {Preprint}
+  eprint = {2603.26898},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CL},
+  doi = {10.48550/arXiv.2603.26898},
+  url = {https://arxiv.org/abs/2603.26898}
 }
 ```

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab.egg-info/SOURCES.txt RENAMED

@@ -3,6 +3,7 @@ README.md
 pyproject.toml
 codebook_lab/__init__.py
 codebook_lab/annotate.py
+codebook_lab/conditions.py
 codebook_lab/examples.py
 codebook_lab/experiments.py
 codebook_lab/metrics.py
@@ -18,10 +19,7 @@ codebook_lab.egg-info/top_level.txt
 codebook_lab/tasks/__init__.py
 codebook_lab/tasks/policy-sentiment/codebook.json
 codebook_lab/tasks/policy-sentiment/ground-truth.csv
-scripts/multi_run_example.py
-scripts/single_run_example.py
-tests/__init__.py
-tests/conftest.py
+tests/test_conditions.py
 tests/test_examples.py
 tests/test_experiments.py
 tests/test_metrics_summary.py

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "codebook-lab"
-version = "1.0.0"
+version = "1.1.1"
 description = "An LLM annotation experiment pipeline for computational social science."
 readme = "README.md"
 requires-python = ">=3.10"

codebook_lab-1.1.1/tests/test_conditions.py ADDED

@@ -0,0 +1,144 @@
+from __future__ import annotations
+import json
+import pandas as pd
+from codebook_lab.annotate import classify_text, extract_json_response
+from codebook_lab.metrics import evaluate_performance, extract_column_info_from_codebook
+def _conditional_codebook() -> dict:
+    return {
+        "header_column": "id",
+        "text_column": "text",
+        "section_1": {
+            "section_name": "1. Relevance",
+            "section_instruction": "",
+            "annotations": {
+                "annotation_1": {
+                    "name": "is_relevant",
+                    "type": "dropdown",
+                    "tooltip": "",
+                    "options": ["Yes", "No"],
+                }
+            },
+        },
+        "section_2": {
+            "section_name": "2. Stance",
+            "section_instruction": "",
+            "annotations": {
+                "annotation_1": {
+                    "name": "stance",
+                    "type": "dropdown",
+                    "tooltip": "",
+                    "options": ["Positive", "Negative"],
+                    "condition": {
+                        "section_key": "section_1",
+                        "annotation_key": "annotation_1",
+                        "value": "Yes",
+                    },
+                }
+            },
+        },
+    }
+def test_classify_text_skips_inactive_conditional_annotations(monkeypatch):
+    codebook = _conditional_codebook()
+    prompts_seen: list[str] = []
+    responses = iter(
+        [
+            '{"response": "No"}',
+        ]
+    )
+    def fake_generate_response(*args, **kwargs):
+        prompts_seen.append(kwargs.get("annotation_name", ""))
+        return next(responses)
+    monkeypatch.setattr("codebook_lab.annotate.generate_response", fake_generate_response)
+    result, _, _ = classify_text(
+        chain=object(),
+        text="Example text",
+        codebook=codebook,
+        prompt_type="standard",
+        use_examples=False,
+    )
+    assert prompts_seen == ["1. Relevance_is_relevant"]
+    assert result["1. Relevance_is_relevant"] == "No"
+    assert "2. Stance_stance" in result
+    assert result["2. Stance_stance"] is None
+def test_metrics_ignore_non_applicable_conditional_rows(tmp_path):
+    codebook = _conditional_codebook()
+    codebook_path = tmp_path / "codebook.json"
+    codebook_path.write_text(json.dumps(codebook))
+    column_info = extract_column_info_from_codebook(codebook_path)
+    merged_df = pd.DataFrame(
+        {
+            "1. Relevance_is_relevant_gt": ["No", "Yes"],
+            "1. Relevance_is_relevant_llm": ["No", "No"],
+            "2. Stance_stance_gt": [None, "Positive"],
+            "2. Stance_stance_llm": [None, None],
+        }
+    )
+    metrics = evaluate_performance(
+        merged_df=merged_df,
+        columns_to_compare=["2. Stance_stance"],
+        column_info=column_info,
+        process_textbox=False,
+    )
+    accuracy_scores = metrics[0]
+    percentage_agreement_scores = metrics[6]
+    assert accuracy_scores["2. Stance_stance"] == 0.0
+    assert percentage_agreement_scores["2. Stance_stance"] == 0.0
+def test_extract_json_response_normalizes_dropdown_options():
+    options = ["Yes", "No"]
+    assert extract_json_response(
+        '{"response": " yes\\n"}',
+        "dropdown",
+        options=options,
+    ) == "Yes"
+    assert extract_json_response("  No\n", "dropdown", options=options) == "No"
+    assert extract_json_response(
+        '{"response": "JSON"}',
+        "dropdown",
+        options=options,
+    ) is None
+    assert extract_json_response("JSON\n", "dropdown", options=options) is None
+def test_classify_text_stores_none_for_invalid_dropdown_outputs(monkeypatch):
+    codebook = _conditional_codebook()
+    responses = iter(
+        [
+            "JSON\n",
+        ]
+    )
+    def fake_generate_response(*args, **kwargs):
+        return next(responses)
+    monkeypatch.setattr("codebook_lab.annotate.generate_response", fake_generate_response)
+    result, _, _ = classify_text(
+        chain=object(),
+        text="Example text",
+        codebook=codebook,
+        prompt_type="standard",
+        use_examples=False,
+    )
+    assert result["1. Relevance_is_relevant"] is None
+    assert result["2. Stance_stance"] is None
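`test_classify_text_skips_inactive_conditional_annotations` pins down the intended control flow. The model is queried for `is_relevant`, and because the answer is `"No"` rather than the condition's `"Yes"`, `stance` is never prompted and is stored as `None`. A minimal sketch of that flow over the codebook structure used in the test follows. It assumes result keys of the form `"<section_name>_<annotation name>"` (as the test's assertions suggest) and that a condition's parent appears earlier in the codebook. `annotate_with_conditions`, `result_key`, and the `ask` callback are hypothetical stand-ins for `classify_text` and `generate_response`, not their real signatures.

```python
def result_key(codebook, section_key, annotation_key):
    """Build a result column name such as '1. Relevance_is_relevant' (assumed format)."""
    section = codebook[section_key]
    name = section["annotations"][annotation_key]["name"]
    return f"{section['section_name']}_{name}"


def annotate_with_conditions(codebook, ask):
    """Query `ask` for each annotation, skipping those whose condition is unmet."""
    results = {}
    # Assumes sections are stored in order and parents precede their dependents.
    for section_key, section in codebook.items():
        if not section_key.startswith("section_"):
            continue  # skip metadata such as header_column / text_column
        for annotation_key, annotation in section["annotations"].items():
            key = result_key(codebook, section_key, annotation_key)
            condition = annotation.get("condition")
            if condition is not None:
                parent_key = result_key(
                    codebook, condition["section_key"], condition["annotation_key"]
                )
                if results.get(parent_key) != condition["value"]:
                    results[key] = None  # inactive: record None, do not prompt
                    continue
            results[key] = ask(key, annotation)
    return results


# Trimmed version of the test's conditional codebook.
CODEBOOK = {
    "header_column": "id",
    "text_column": "text",
    "section_1": {
        "section_name": "1. Relevance",
        "annotations": {
            "annotation_1": {"name": "is_relevant", "type": "dropdown", "options": ["Yes", "No"]},
        },
    },
    "section_2": {
        "section_name": "2. Stance",
        "annotations": {
            "annotation_1": {
                "name": "stance",
                "type": "dropdown",
                "options": ["Positive", "Negative"],
                "condition": {"section_key": "section_1", "annotation_key": "annotation_1", "value": "Yes"},
            },
        },
    },
}

asked = []


def fake_ask(key, annotation):
    asked.append(key)
    return "No"  # the "model" says the text is not relevant


print(annotate_with_conditions(CODEBOOK, fake_ask))
# {'1. Relevance_is_relevant': 'No', '2. Stance_stance': None}
```

The inactive annotation is still written as `None` instead of being omitted. That matches the test's `"2. Stance_stance" in result` assertion and keeps the output columns identical across rows.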

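`test_extract_json_response_normalizes_dropdown_options` fixes the expected behaviour for dropdowns. The reply may be either a `{"response": ...}` JSON object or bare text. Whitespace is trimmed, the options are matched case-insensitively, and the canonical option is returned; anything else, such as the literal string `JSON`, yields `None`. The sketch below reproduces those four cases with the standard library. `normalize_dropdown_response` is a hypothetical name, and the diff does not show how `extract_json_response` actually implements this.

```python
import json


def normalize_dropdown_response(raw, options):
    """Map a raw model reply onto one of `options`, or return None if it matches none."""
    text = raw.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None  # not JSON: treat the reply as bare text
    if isinstance(parsed, dict) and "response" in parsed:
        text = str(parsed["response"])
    elif isinstance(parsed, str):
        text = parsed  # a bare JSON string such as '"Yes"'
    candidate = text.strip().casefold()
    for option in options:
        if option.casefold() == candidate:
            return option  # return the canonical spelling from the codebook
    return None


options = ["Yes", "No"]
print(normalize_dropdown_response('{"response": " yes\\n"}', options))  # Yes
print(normalize_dropdown_response("  No\n", options))  # No
print(normalize_dropdown_response('{"response": "JSON"}', options))  # None
print(normalize_dropdown_response("JSON\n", options))  # None
```

Returning `None` rather than the raw text keeps invalid outputs out of the label set. This is what `test_classify_text_stores_none_for_invalid_dropdown_outputs` checks end to end.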
codebook_lab-1.0.0/scripts/multi_run_example.py DELETED

@@ -1,41 +0,0 @@
-"""Run a small multi-experiment sweep with CodeBook Lab.
-This script is intentionally small so users can test the package quickly.
-Edit the grid below to explore more combinations once the basic workflow is
-working in your environment. The package will try to start a local Ollama
-server if needed and will pull any missing models automatically.
-"""
-from pathlib import Path
-from codebook_lab import run_experiment_grid
-OUTPUT_ROOT = Path("outputs")
-PARAM_GRID = {
-    "country_iso_code": "IRL",
-    "tasks": ["policy-sentiment"],
-    "models": ["gemma3:270m"],
-    "use_examples": [False, True],
-    "prompt_types": ["standard"],
-    "temperatures": [None],
-    "top_ps": [None],
-    "process_textboxes": [True],
-}
-def main() -> None:
-    """Run a small sweep and print a short summary of the completed runs."""
-    results = run_experiment_grid(
-        param_grid=PARAM_GRID,
-        output_root=OUTPUT_ROOT,
-    )
-    print(f"Completed {len(results)} experiment runs.")
-    for result in results:
-        print(f"- {result.model_id}: {result.experiment_directory}")
-if __name__ == "__main__":
-    main()

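The deleted `multi_run_example.py` mixed list-valued settings with a scalar `country_iso_code`, and its docstring invited users to edit the grid to explore more combinations. The diff does not show the internals of `run_experiment_grid`. Assuming it runs one experiment per element of the Cartesian product of the list-valued entries, the sketch below counts how many runs that grid implies. `expand_grid` is a hypothetical helper, not part of `codebook_lab`.

```python
from itertools import product

# Grid copied from the deleted scripts/multi_run_example.py.
PARAM_GRID = {
    "country_iso_code": "IRL",
    "tasks": ["policy-sentiment"],
    "models": ["gemma3:270m"],
    "use_examples": [False, True],
    "prompt_types": ["standard"],
    "temperatures": [None],
    "top_ps": [None],
    "process_textboxes": [True],
}


def expand_grid(grid):
    """Return one settings dict per combination of the list-valued grid entries."""
    swept = [key for key, value in grid.items() if isinstance(value, list)]
    fixed = {key: value for key, value in grid.items() if not isinstance(value, list)}
    runs = []
    for values in product(*(grid[key] for key in swept)):
        run = dict(fixed)  # scalar settings are shared by every run
        run.update(zip(swept, values))
        runs.append(run)
    return runs


runs = expand_grid(PARAM_GRID)
print(len(runs))  # 2: one run without few-shot examples, one with
```

Under this assumption, each extra value multiplies the run count: adding a second model would give 4 runs. That is consistent with the script's note that it was kept intentionally small.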
codebook_lab-1.0.0/scripts/single_run_example.py DELETED

@@ -1,48 +0,0 @@
-"""Run one bundled-example experiment with CodeBook Lab.
-Edit the constants below if you want to change the model, task, or output
-location. This script assumes:
-1. CodeBook Lab has been installed in the current environment, for example
-   with ``python -m pip install codebook-lab``.
-2. Ollama is installed and available on PATH.
-The package will try to start a local Ollama server if needed and will pull the
-requested model automatically before running the experiment.
-"""
-from pathlib import Path
-from codebook_lab import ExperimentSpec, run_experiment
-TASK = "policy-sentiment"
-MODEL = "gemma3:270m"
-COUNTRY_ISO_CODE = "IRL"
-OUTPUT_ROOT = Path("outputs")
-def main() -> None:
-    """Run a single experiment and print the key output locations."""
-    result = run_experiment(
-        ExperimentSpec(
-            task=TASK,
-            model=MODEL,
-            use_examples=False,
-            prompt_type="standard",
-            temperature=None,
-            top_p=None,
-            process_textbox=True,
-            country_iso_code=COUNTRY_ISO_CODE,
-        ),
-        output_root=OUTPUT_ROOT,
-    )
-    print("Completed single experiment run.")
-    print(f"Experiment directory: {result.experiment_directory}")
-    print(f"Metrics CSV: {result.metrics.output_csv}")
-    print(f"Classification report: {result.metrics.report_file}")
-if __name__ == "__main__":
-    main()

codebook_lab-1.0.0/tests/__init__.py DELETED

File without changes

codebook_lab-1.0.0/tests/conftest.py DELETED

@@ -1,13 +0,0 @@
-from __future__ import annotations
-from pathlib import Path
-import pytest
-@pytest.fixture()
-def bundled_task_dir() -> Path:
-    """Return the path to the bundled policy-sentiment example task."""
-    task_dir = Path(__file__).resolve().parent.parent / "codebook_lab" / "tasks" / "policy-sentiment"
-    assert task_dir.exists(), f"Bundled task directory not found: {task_dir}"
-    return task_dir

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/LICENSE RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/__init__.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/examples.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/experiments.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/ollama.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/prompts.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/py.typed RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/tasks/__init__.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/tasks/policy-sentiment/codebook.json RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/tasks/policy-sentiment/ground-truth.csv RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab/types.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab.egg-info/dependency_links.txt RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab.egg-info/requires.txt RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/codebook_lab.egg-info/top_level.txt RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/setup.cfg RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/tests/test_examples.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/tests/test_experiments.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/tests/test_metrics_summary.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/tests/test_package_import.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/tests/test_prompts.py RENAMED

File without changes

{codebook_lab-1.0.0 → codebook_lab-1.1.1}/tests/test_types.py RENAMED

File without changes
