codebook-lab 1.0.0__py3-none-any.whl

@@ -0,0 +1,338 @@
1
+ Metadata-Version: 2.4
2
+ Name: codebook-lab
3
+ Version: 1.0.0
4
+ Summary: An LLM annotation experiment pipeline for computational social science.
5
+ Author: Lorcan McLaren
6
+ License-Expression: AGPL-3.0-only
7
+ Project-URL: Homepage, https://github.com/LorcanMcLaren/codebook-lab
8
+ Project-URL: Repository, https://github.com/LorcanMcLaren/codebook-lab
9
+ Project-URL: Documentation, https://github.com/LorcanMcLaren/codebook-lab#readme
10
+ Project-URL: DOI, https://doi.org/10.5281/zenodo.19185921
11
+ Keywords: llm,annotation,benchmarking,computational social science,text-as-data
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Topic :: Scientific/Engineering
20
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
21
+ Requires-Python: >=3.10
22
+ Description-Content-Type: text/markdown
23
+ License-File: LICENSE
24
+ Requires-Dist: codecarbon
25
+ Requires-Dist: krippendorff
26
+ Requires-Dist: langchain-core
27
+ Requires-Dist: langchain-ollama
28
+ Requires-Dist: numpy
29
+ Requires-Dist: pandas
30
+ Requires-Dist: regex
31
+ Requires-Dist: scikit-learn
32
+ Requires-Dist: scipy
33
+ Provides-Extra: textbox
34
+ Requires-Dist: bert-score; extra == "textbox"
35
+ Requires-Dist: nltk; extra == "textbox"
36
+ Requires-Dist: python-Levenshtein; extra == "textbox"
37
+ Requires-Dist: rouge-score; extra == "textbox"
38
+ Requires-Dist: sentence-transformers; extra == "textbox"
39
+ Requires-Dist: torch; extra == "textbox"
40
+ Provides-Extra: all
41
+ Requires-Dist: codebook-lab[textbox]; extra == "all"
42
+ Provides-Extra: dev
43
+ Requires-Dist: pytest; extra == "dev"
44
+ Dynamic: license-file
45
+
46
+ # CodeBook Lab
47
+
48
+ [![DOI](https://zenodo.org/badge/1186234207.svg)](https://doi.org/10.5281/zenodo.19185921)
49
+
50
+ CodeBook Lab is an LLM annotation experiment pipeline for computational social science. It takes a codebook and labeled dataset from [CodeBook Studio](https://codebook.streamlit.app/) ([source](https://github.com/LorcanMcLaren/codebook-studio)) and runs structured experiments across the dimensions that matter for text-as-data research: model choice, model size, prompt style, zero-shot versus few-shot learning, and sampling hyperparameters. Every run is benchmarked against the same human labels.
51
+
52
+ Experiments are configured through Python objects rather than by editing pipeline code. Because the codebook and labeled data stay constant across runs, each dimension can be varied in isolation and compared against the same human labels.
53
+
54
+ For a step-by-step walkthrough covering both tools, see the [CodeBook Studio & Lab Tutorial](https://lorcanmclaren.com/codebook-tutorial.html).
55
+
56
+ ## Contents
57
+
58
+ - [How It Fits With CodeBook Studio](#how-it-fits-with-codebook-studio)
59
+ - [Package Overview](#package-overview)
60
+ - [Quickstart](#quickstart)
61
+ - [Experiment Configuration](#experiment-configuration)
62
+ - [Create Your Own Task](#create-your-own-task)
63
+ - [Advanced Customization](#advanced-customization)
64
+ - [License](#license)
65
+ - [Citation](#citation)
66
+
67
+ ## How It Fits With CodeBook Studio
68
+
69
+ [CodeBook Studio](https://codebook.streamlit.app/) defines the task. CodeBook Lab runs and evaluates the experiment.
70
+
71
+ <table>
+ <tr>
+ <td align="center"><strong>CodeBook Studio</strong></td>
+ <td align="center"></td>
+ <td align="center"><strong>CodeBook Lab</strong></td>
+ </tr>
+ <tr>
+ <td valign="top">
+ Define the annotation task<br>
+ Annotate texts with humans<br>
+ Export <code>codebook.json</code><br>
+ Save labeled data as <code>ground-truth.csv</code>
+ </td>
+ <td align="center" valign="middle">→</td>
+ <td valign="top">
+ Strip label columns automatically<br>
+ Run LLM annotation experiments<br>
+ Sweep over models, prompts, and hyperparameters<br>
+ Evaluate outputs against human labels
+ </td>
+ </tr>
+ </table>
93
+
94
+ ## Package Overview
95
+
96
+ The package is organized around a small set of importable modules:
97
+
98
+ - `codebook_lab.experiments`: high-level functions for single experiments and multi-run comparisons
99
+ - `codebook_lab.annotate`: lower-level annotation functions
100
+ - `codebook_lab.metrics`: evaluation and metrics functions
101
+ - `codebook_lab.prompts`: prompt wrapper registry for built-in and custom prompt styles
102
+ - `codebook_lab.examples`: helpers for bundled example tasks
103
+ - `codebook_lab.types`: dataclasses for experiment specifications and result objects
104
+
105
+ The package also ships with a bundled example task, `policy-sentiment`, so you can start experimenting immediately after installation.
106
+
107
+ ## Quickstart
108
+
109
+ ### 1. Create a Python environment
110
+
111
+ ```bash
+ python3 -m venv .venv
+ source .venv/bin/activate
+ python -m pip install --upgrade pip
+ python -m pip install codebook-lab
+ ```
117
+
118
+ This installs CodeBook Lab from a package index so you can import it in your own scripts, notebooks, or analysis workflows.
119
+
120
+ If you plan to generate or score `textbox` annotations, install the optional textbox dependencies as well:
121
+
122
+ ```bash
+ python -m pip install "codebook-lab[textbox]"
+ ```
125
+
126
+ ### 2. Install and start Ollama
127
+
128
+ Install Ollama on your machine, then make sure the local server is running:
129
+
130
+ ```bash
+ ollama serve
+ ```
133
+
134
+ If the default local Ollama server is not already running, CodeBook Lab will try to start it automatically when you run an experiment. It will also pull the requested Ollama model automatically if it is not already available locally.
135
+
136
+ ### 3. Choose a model and task
137
+
138
+ The package ships with a bundled example task called `policy-sentiment`. Any Ollama model available on your machine can be used.
139
+
140
+ ```python
+ task = "policy-sentiment"
+ model = "gemma3:270m"
+ ```
144
+
145
+ You can inspect or copy bundled example tasks from Python:
146
+
147
+ ```python
+ from codebook_lab import copy_example_task, list_example_tasks
+
+ print(list_example_tasks())
+ copy_example_task("policy-sentiment", "./my_tasks", overwrite=True)
+ ```
153
+
154
+ Set `country_iso_code` to the country where the compute is physically running. CodeCarbon uses it to select the emissions factor applied when converting energy use into emissions. The value should be a three-letter ISO 3166-1 alpha-3 code such as `USA`, `IRL`, or `DEU`.
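A two-letter code such as `IE` is an easy mistake to make here. The sketch below is a minimal pre-flight check on the shape of the value. `looks_like_alpha3` is a hypothetical helper, not part of `codebook_lab`, and it does not confirm that the code is actually assigned to a country.

```python
import re

# Hypothetical helper, not part of codebook_lab: checks only the *shape*
# of an ISO 3166-1 alpha-3 code (exactly three uppercase ASCII letters).
_ALPHA3_SHAPE = re.compile(r"[A-Z]{3}")


def looks_like_alpha3(code: str) -> bool:
    """Return True if `code` is exactly three uppercase ASCII letters."""
    return bool(_ALPHA3_SHAPE.fullmatch(code))


for candidate in ["IRL", "USA", "IE", "irl", "Germany"]:
    print(candidate, looks_like_alpha3(candidate))
```

Only `IRL` and `USA` pass; the alpha-2 code, the lowercase code, and the country name are rejected.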
155
+
156
+ ### 4. Run experiments from Python
157
+
158
+ Single experiment:
159
+
160
+ ```python
+ from codebook_lab import ExperimentSpec, run_experiment
+
+ result = run_experiment(
+     ExperimentSpec(
+         task="policy-sentiment",
+         model="gemma3:270m",
+         use_examples=False,
+         prompt_type="standard",
+         temperature=None,
+         top_p=None,
+         process_textbox=True,
+         country_iso_code="IRL",
+     ),
+     output_root="outputs",
+ )
+
+ print(result.experiment_directory)
+ print(result.metrics.summary_text)
+ ```
180
+
181
+ If `process_textbox=True`, CodeBook Lab will calculate textbox similarity metrics such as ROUGE, cosine similarity, and BERTScore when the optional textbox dependencies are installed. Without them, the run still completes, but textbox metrics that rely on those packages will be reported as unavailable and the warning will tell you how to install them.
182
+
183
+ Parameter sweep:
184
+
185
+ ```python
+ from codebook_lab import run_experiment_grid
+
+ results = run_experiment_grid(
+     param_grid={
+         "country_iso_code": "IRL",
+         "tasks": ["policy-sentiment"],
+         "models": ["gemma3:270m", "llama3.2:3b"],
+         "use_examples": ["false", "true"],
+         "prompt_types": ["standard", "persona"],
+         "temperatures": ["None", "0.2"],
+         "top_ps": ["None"],
+         "process_textboxes": ["true"],
+     },
+     output_root="outputs",
+ )
+
+ print(f"Completed {len(results)} runs")
+ ```
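To estimate how many runs a grid implies before launching it, the list-valued fields can be expanded by hand. This sketch assumes the sweep is a full Cartesian product of those fields, which the README suggests but does not spell out. `expand_grid` is a hypothetical helper rather than the package's own expansion logic, and the scalar `country_iso_code` is left out because it is not swept.

```python
from itertools import product

# Same list-valued fields as the sweep above; country_iso_code is a scalar
# and is therefore not part of the expansion.
param_grid = {
    "tasks": ["policy-sentiment"],
    "models": ["gemma3:270m", "llama3.2:3b"],
    "use_examples": ["false", "true"],
    "prompt_types": ["standard", "persona"],
    "temperatures": ["None", "0.2"],
    "top_ps": ["None"],
    "process_textboxes": ["true"],
}


def expand_grid(grid):
    """Hypothetical helper: one dict per combination, assuming a full Cartesian product."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


combinations = expand_grid(param_grid)
print(len(combinations))  # 1 * 2 * 2 * 2 * 2 * 1 * 1 = 16 under this assumption
print(combinations[0])
```

Each extra value in any field multiplies the total, so it is worth checking this count against the time one run takes on your hardware.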
204
+
205
+ Custom prompt wrapper:
206
+
207
+ ```python
+ from codebook_lab import ExperimentSpec, PromptContext, register_prompt_wrapper, run_experiment
+
+ def concise_wrapper(context: PromptContext) -> str:
+     return (
+         "Annotate the text as carefully as possible.\n\n"
+         f"{context.core_prompt}\n\n"
+         f'Text:\n"{context.text}"\n\n'
+         "Response:\n"
+     )
+
+ register_prompt_wrapper("concise", concise_wrapper)
+
+ result = run_experiment(
+     ExperimentSpec(
+         task="policy-sentiment",
+         model="gemma3:270m",
+         prompt_type="concise",
+         country_iso_code="IRL",
+     )
+ )
+ ```
229
+
230
+ ### 5. Inspect the outputs
231
+
232
+ Each run creates a timestamped experiment directory under `outputs/<task>/` containing:
233
+
234
+ - `output.csv`: row-level model annotations
235
+ - `config.json`: the run configuration
236
+ - `classification_reports.txt`: per-label evaluation summaries
237
+ - `emissions.csv`: CodeCarbon output
238
+ - `timing_data.json`: inference timing summary
239
+ - `char_counts.json`: prompt and response character counts
240
+
241
+ Aggregate metrics are written to `outputs/metrics/<task>_metrics_log.csv`.
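A sketch for inspecting that log using only the standard library. The README does not list the log's column names, so the example prints the header first. `load_metrics_log` and `rank_runs` are hypothetical helpers, and any column name passed to `rank_runs` must be one that actually appears in your file.

```python
import csv
from pathlib import Path


def load_metrics_log(path):
    """Read a metrics log CSV into a list of row dictionaries."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def rank_runs(rows, column, descending=True):
    """Sort rows by a numeric column, skipping rows where it is missing or non-numeric."""
    scored = []
    for row in rows:
        try:
            scored.append((float(row[column]), row))
        except (KeyError, TypeError, ValueError):
            continue
    scored.sort(key=lambda pair: pair[0], reverse=descending)
    return [row for _, row in scored]


log_path = Path("outputs/metrics/policy-sentiment_metrics_log.csv")
if log_path.exists():
    rows = load_metrics_log(log_path)
    # Inspect the real header before choosing a column to rank by.
    print(list(rows[0].keys()) if rows else "log is empty")
```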
242
+
243
+ That metrics log stores both annotation-quality metrics and run metadata. Depending on the annotation type, it can include:
244
+
245
+ - classification metrics such as accuracy, precision, recall, F1, and percentage agreement
246
+ - inter-rater style agreement metrics such as Cohen's kappa and Krippendorff's alpha
247
+ - ordinal metrics for Likert labels such as Spearman correlation and quadratic weighted kappa
248
+ - textbox metrics such as normalized Levenshtein similarity, BLEU, ROUGE, cosine similarity, and BERTScore
249
+ - resource and run metadata such as CPU model, GPU model, total inference time, average inference time, total input characters, total output characters, energy consumed in kWh, and emissions in kg CO2eq
250
+
251
+ This makes it possible to compare not only which model is most accurate, but also how fast, how costly, and how energy intensive each setup is.
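Of the agreement metrics listed above, Cohen's kappa is the one most often misread as plain accuracy. It rescales observed agreement by the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The from-scratch sketch below only illustrates that formula. It is not the package's implementation, and the labels are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical label; kappa is undefined
    return (observed - expected) / (1 - expected)


# Made-up labels for illustration only.
human = ["pos", "pos", "neg", "neg"]
model = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(human, model))  # observed 0.75, expected 0.5 -> 0.5
```

Here raw agreement is 75%, but half of that is expected by chance given how often each rater uses each label, so kappa is 0.5.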
252
+
253
+ Textbox note: normalized Levenshtein and BLEU work with the base install, but ROUGE, embedding-based cosine similarity, and BERTScore require the optional textbox extras. Install them with `python -m pip install "codebook-lab[textbox]"`.
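For intuition about the base-install textbox metric, the sketch below computes a normalized Levenshtein similarity in pure Python. It assumes one common normalization, 1 - distance / max(len(a), len(b)). The package depends on `python-Levenshtein` through the textbox extra and may normalize differently, so treat this as an illustration of the idea only.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))  # 3
print(round(normalized_similarity("kitten", "sitting"), 3))  # 0.571
```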
254
+
255
+ ## Experiment Configuration
256
+
257
+ Most multi-run setup happens through the parameter grid dictionary you pass into `run_experiment_grid(...)`.
258
+
259
+ - `tasks`: which task folders to run
260
+ - `models`: which Ollama models to evaluate (e.g. `gemma3:270m`, `llama3.2:3b`, `qwen3.5:latest`)
261
+ - `use_examples`: whether to include worked examples from the codebook in the LLM prompt (zero-shot vs. few-shot)
262
+ - `prompt_types`: which prompt wrapper to use (`standard`, `persona`, or `CoT`)
263
+ - `temperatures`: sampling temperature values (leave empty for model default)
264
+ - `top_ps`: nucleus sampling values (leave empty for model default)
265
+ - `process_textboxes`: whether textbox-style annotations should be generated and scored
266
+
267
+ When `process_textboxes` is enabled, install the optional textbox extras first if you want the full textbox metric suite:
268
+
269
+ ```bash
+ python -m pip install "codebook-lab[textbox]"
+ ```
272
+
273
+ Add multiple values to any field and the package sweeps them automatically. For a single quick run, keep one value in each field.
274
+
275
+ ## Create Your Own Task
276
+
277
+ 1. Create a local folder such as `my_tasks/my-task/`.
278
+ 2. Annotate your data in [CodeBook Studio](https://codebook.streamlit.app/) and save the labeled file as `my_tasks/my-task/ground-truth.csv`.
279
+ 3. Download the codebook JSON from Studio and save it as `my_tasks/my-task/codebook.json`.
280
+ 4. Pass `task_root="my_tasks"` and `task="my-task"` into `ExperimentSpec(...)` when you run experiments.
281
+
282
+ If you are still designing a task and do not yet have human-coded labels, you can run annotation with `codebook_lab.run_annotation(...)` on an unlabeled CSV and add `ground-truth.csv` later when you want to score model performance with `codebook_lab.run_metrics(...)`.
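Before launching a scored experiment on a custom task, it can help to confirm that the folder matches the layout described in the steps above. This is a minimal sketch. `missing_task_files` is a hypothetical helper, not a `codebook_lab` function, and it checks only that the two files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the steps above.
REQUIRED_FILES = ("codebook.json", "ground-truth.csv")


def missing_task_files(task_root, task):
    """Hypothetical helper: list the expected files absent from <task_root>/<task>/."""
    task_dir = Path(task_root) / task
    return [name for name in REQUIRED_FILES if not (task_dir / name).is_file()]


missing = missing_task_files("my_tasks", "my-task")
print("task folder looks complete" if not missing else f"missing: {missing}")
```

For annotation-only runs without human labels, a missing `ground-truth.csv` is expected, so the check is stricter than that workflow needs.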
283
+
284
+ ## Advanced Customization
285
+
286
+ If you want to go beyond the default wrappers and hyperparameters, `codebook_lab/annotate.py` and `codebook_lab/prompts.py` are the main extension points.
287
+
288
+ - To add new prompt wrappers beyond `standard`, `persona`, and `CoT`, register them from Python with `register_prompt_wrapper(...)` or extend the built-in registry in `codebook_lab/prompts.py`.
289
+ - To expose additional model hyperparameters such as `top_k`, add them to `setup_model()`, thread them through `run_annotation(...)` and `run_experiment(...)`, and add the corresponding field to the grid you pass into `run_experiment_grid(...)`.
290
+
291
+ ## License
292
+
293
+ This project is licensed under the [GNU Affero General Public License v3.0](https://github.com/LorcanMcLaren/codebook-lab/blob/main/LICENSE).
294
+
295
+ ## Citation
296
+
297
+ If you use CodeBook Lab in research, please cite both:
298
+
299
+ - this software package
300
+ - the associated preprint
301
+
302
+ Citation metadata is also available in the project's [`CITATION.cff`](https://github.com/LorcanMcLaren/codebook-lab/blob/main/CITATION.cff).
303
+
304
+ ### Software Citation
305
+
306
+ APSR style:
307
+
308
+ McLaren, Lorcan. 2026. *CodeBook Lab* (Version v1.0.0) [Computer software]. Zenodo. [https://doi.org/10.5281/zenodo.19185921](https://doi.org/10.5281/zenodo.19185921).
309
+
310
+ BibTeX:
311
+
312
+ ```bibtex
+ @software{mclaren_codebook_lab_2026,
+   author = {McLaren, Lorcan},
+   title = {CodeBook Lab},
+   year = {2026},
+   version = {v1.0.0},
+   doi = {10.5281/zenodo.19185921},
+   url = {https://doi.org/10.5281/zenodo.19185921}
+ }
+ ```
322
+
323
+ ### Preprint Citation
324
+
325
+ APSR style:
326
+
327
+ McLaren, Lorcan, James P. Cross, Zuzanna Krakowska, Robin Rauner, and Martijn Schoonvelde. 2026. *Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation*. Preprint.
328
+
329
+ BibTeX:
330
+
331
+ ```bibtex
+ @misc{mclaren_magic_words_2026,
+   author = {McLaren, Lorcan and Cross, James P. and Krakowska, Zuzanna and Rauner, Robin and Schoonvelde, Martijn},
+   title = {Magic Words or Methodical Work? Challenging Conventional Wisdom in LLM-Based Political Text Annotation},
+   year = {2026},
+   note = {Preprint}
+ }
+ ```
@@ -0,0 +1,17 @@
1
+ codebook_lab/__init__.py,sha256=aT0RsqXWzg2RWeBBRmAqJaGzs3-I7s0r-0MWxIgn_Xc,2205
2
+ codebook_lab/annotate.py,sha256=JM-9ipKh8-10gLRHrIXJ_b9WwDHK_TKCk6PCH-JQHSA,28052
3
+ codebook_lab/examples.py,sha256=d2emL3sR9FYRaEssvE2HZL8VevSQV7o7WEb8x3TUzTQ,2722
4
+ codebook_lab/experiments.py,sha256=axyF-TyjQydJcCq5VNSJCxk8OVcOKfM8YRiuUc_24Rw,12170
5
+ codebook_lab/metrics.py,sha256=rjfspbsPUaQj0nDUUeUU2eb8iuPXrm3HgxTUbL14ZoE,61943
6
+ codebook_lab/ollama.py,sha256=vwkx47f8Q4KZrb4rnYTRljhI5SxlhMGpxsxSfehUVYA,3641
7
+ codebook_lab/prompts.py,sha256=2OKLi2eyoQ3a971OxHUQ6dYnUrJTRBlweJgksRBOsE0,5739
8
+ codebook_lab/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
9
+ codebook_lab/types.py,sha256=28xIq0UkkHtxf__zhtMj43Xi-wE84zFpi8yWTEV8cgU,4469
10
+ codebook_lab/tasks/__init__.py,sha256=gKXtpZPPU-Bmie1gNVfZen504QFoGpTMw6GVsjwf6UY,59
11
+ codebook_lab/tasks/policy-sentiment/codebook.json,sha256=GpQIeWnjKhPNU_cPdEZkeDiHDM2xFbFQbm1B8OBcTRU,3720
12
+ codebook_lab/tasks/policy-sentiment/ground-truth.csv,sha256=4AWd4N0kduvWt_JyLceG6BaT2zMKnjF24cpEmlc63bs,5338
13
+ codebook_lab-1.0.0.dist-info/licenses/LICENSE,sha256=DZak_2itbUtvHzD3E7GNUYSRK6jdOJ-GqncQ2weavLA,34523
14
+ codebook_lab-1.0.0.dist-info/METADATA,sha256=hV6FXTIeyckuVnPJoz--fM9TIKtsqOeRNEvy4_Kq-Fw,13492
15
+ codebook_lab-1.0.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
16
+ codebook_lab-1.0.0.dist-info/top_level.txt,sha256=Ap2OYdWhVoV-Um55HophbUxJgEM66H7WA8md37AwOCg,13
17
+ codebook_lab-1.0.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (82.0.1)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+