pytest-grounding 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Sam Quigley
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,214 @@
1
+ Metadata-Version: 2.4
2
+ Name: pytest-grounding
3
+ Version: 0.0.1
4
+ Summary: Turn assertions about data into re-runnable, provenance-tracked claims — written and reviewed by agents.
5
+ Author-email: Sam Quigley <quigley@emerose.com>
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/emerose/pytest-grounding
8
+ Project-URL: Repository, https://github.com/emerose/pytest-grounding
9
+ Project-URL: Issues, https://github.com/emerose/pytest-grounding/issues
10
+ Keywords: pytest,provenance,grounding,claims,reproducibility,data,audit
11
+ Classifier: Framework :: Pytest
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Intended Audience :: Science/Research
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Topic :: Software Development :: Testing
18
+ Classifier: Topic :: Scientific/Engineering
19
+ Requires-Python: >=3.9
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Requires-Dist: pytest>=7.0
23
+ Provides-Extra: data
24
+ Requires-Dist: pandas>=2.0; extra == "data"
25
+ Provides-Extra: docs
26
+ Requires-Dist: pdfplumber>=0.11; extra == "docs"
27
+ Requires-Dist: python-docx>=1.1; extra == "docs"
28
+ Requires-Dist: python-pptx>=1.0; extra == "docs"
29
+ Dynamic: license-file
30
+
31
+ # grounding
32
+
33
+ **Turn assertions about data into re-runnable, provenance-tracked claims — written and reviewed by agents.**
34
+
35
+ `grounding` is a small runtime on top of pytest. A test stops being a pass/fail check on your *code* and becomes a **grounded claim**: a statement about data, automatically pinned to the exact bytes it depends on, re-checked whenever those bytes change, carrying a non-binary judgment (how strong, with what caveats) that lives in version control.
36
+
37
+ It's built for a workflow where **an agent writes the claims and a second, fresh-context agent reviews them.**
38
+
39
+ ```bash
40
+ pip install grounding # core (statement-only / quote-only)
41
+ pip install 'grounding[data]' # + CSV grounding via data()/load()
42
+ pip install 'grounding[docs]' # + document quote verification via doc()
43
+ ```
44
+
45
+ No network, no API keys, no model inside. Everything is a pure function of file bytes.
46
+
47
+ ## Why agents, specifically
48
+
49
+ When an agent asserts *"knockdown reached 53% at the high dose,"* you have two questions: **is it mechanically true** against the data, and **does the evidence actually support the claim** as worded? `grounding` splits those, and each half lands with the right reviewer:
50
+
51
+ - **The mechanical half is the test.** Re-run it; it passes or fails against sha-pinned bytes. No reviewer judgment needed — CI does it.
52
+ - **The judgment half is metadata** (`statement`, `@strength`, `@caveats`, the cited quote). A fresh-context reviewer agent reads *exactly* the bytes the author grounded — same shas, no drift — and decides whether the framing is honest.
53
+
54
+ `grounding_report.json` is the machine-readable handoff: the author agent emits it, the reviewer agent consumes it.
55
+
56
+ ## A claim is a pytest test
57
+
58
+ ```python
59
+ from grounding import data, evidence, statement, strength, caveats, kind
60
+ from scipy import stats
61
+
62
+ @kind("result")
63
+ @strength("moderate")
64
+ @caveats("n=8 per arm, single cohort; not corrected for multiple endpoints")
65
+ def test_treatment_lowers_biomarker_vs_vehicle():
66
+ """Serum biomarker at day 28: 10 mg/kg arm vs vehicle, cohort B.
67
+
68
+ Reviewer notes: groups are the prespecified arms; Welch's t-test because the
69
+ vehicle arm's spread is larger; two treated animals were excluded upstream for
70
+ dosing errors (already applied in the tidy table).
71
+ """
72
+ df = data("biomarker_day28.csv")
73
+ treated = df[df.arm == "10mpk"].biomarker
74
+ vehicle = df[df.arm == "vehicle"].biomarker
75
+
76
+ drop = 1 - treated.mean() / vehicle.mean()
77
+ t, p = stats.ttest_ind(treated, vehicle, equal_var=False)
78
+
79
+ statement(f"At day 28, the 10 mg/kg arm showed a {drop:.0%} lower serum biomarker "
80
+ f"than vehicle (Welch t = {t:.1f}, p = {p:.3f}).")
81
+ evidence(pct_drop=round(drop * 100, 1), p_value=round(p, 4))
82
+
83
+ assert p < 0.05 and drop > 0 # the qualitative claim: a real, downward effect
84
+ ```
85
+
86
+ The three layers don't repeat each other:
87
+
88
+ - **`statement()`** is the proposition, with numbers interpolated from the data — it *can't* claim a drop the table doesn't produce.
89
+ - the **docstring** is the *why and how* — context that lets a later reviewer judge the claim without re-deriving it.
90
+ - the **`assert`** guards only the qualitative shape (significant, downward); the quantity lives in the computed statement.
91
+
92
+ Run it:
93
+
94
+ ```bash
95
+ pytest --grounding-out ./out
96
+ ```
97
+
98
+ → `out/grounding_report.json`:
99
+
100
+ ```json
101
+ {
102
+ "claims": [{
103
+ "id": "test_efficacy.py::test_treatment_lowers_biomarker_vs_vehicle",
104
+ "statement": "At day 28, the 10 mg/kg arm showed a 41% lower serum biomarker than vehicle (Welch t = 3.2, p = 0.006).",
105
+ "kind": "result",
106
+ "strength": "moderate",
107
+ "caveats": "n=8 per arm, single cohort; not corrected for multiple endpoints",
108
+ "inputs": [{"kind": "data", "path": "biomarker_day28.csv", "sha256": "a17b…", "via": "tracked"}],
109
+ "evidence": {"pct_drop": 41.2, "p_value": 0.0061}
110
+ }]
111
+ }
112
+ ```
113
+
114
+ Nobody hand-wrote that provenance. `data()` recorded the read; the capture context attached it to the claim.
115
+
116
+ ## Grounding a quote in a document
117
+
118
+ ```python
119
+ from grounding import doc, statement
120
+
121
+ def test_summary_states_endpoint_met():
122
+ """Quote is from the signed CSR §10.1, not the synopsis."""
123
+ csr = doc("clinical_summary.pdf") # sha-pinned like any input
124
+ statement("The clinical study report states the primary endpoint was met.")
125
+ assert csr.contains("the primary endpoint was met")
126
+ ```
127
+
128
+ `DocRef.contains()` extracts with pinned pure-Python readers (pdf/docx/pptx) and matches whitespace/dash/Markdown-robustly, so a quote split across lines or cells still matches. The match is a pure function of the bytes. There is **no OCR**: a scanned/image-only document raises `EmptyExtraction` rather than silently reporting "not found".
129
+
130
+ ## Composing claims
131
+
132
+ `uses()` lets one claim build on earlier ones: it merges their sha-pinned inputs into this
133
+ claim's provenance (transitively) and hands back their `evidence`. The composed claim can read
134
+ no source of its own, yet `grounding trace` still walks it all the way down — change an upstream
135
+ dataset and the roll-up breaks too. Provenance is a computed DAG, never hand-maintained.
136
+
137
+ **Roll up independent results.** A program-level conclusion that rests on several per-dataset
138
+ claims — defined in different test files, over different data:
139
+
140
+ ```python
141
+ from grounding import uses, statement, strength
142
+
143
+ @strength("moderate")
144
+ def test_effect_replicates_across_cohorts():
145
+ """The biomarker drop holds in two independently-run cohorts."""
146
+ b = uses("test_treatment_lowers_biomarker_vs_vehicle") # cohort B
147
+ c = uses("test_treatment_lowers_biomarker_cohort_c") # cohort C, a different test file
148
+ statement(f"the effect replicates: {b['pct_drop']:.0f}% (cohort B) "
149
+ f"and {c['pct_drop']:.0f}% (cohort C)")
150
+ assert b["pct_drop"] > 0 and c["pct_drop"] > 0
151
+ ```
152
+
153
+ This claim touches no CSV directly, but its recorded inputs now include *both* cohorts' files,
154
+ each sha-pinned. Change either cohort's data and this roll-up — not just the two underlying
155
+ claims — shows up as drifted.
156
+
157
+ **Cross-check data against a document.** Compose a numeric claim with a quote check to assert an
158
+ external report and your own data agree — the classic transcription-drift catcher:
159
+
160
+ ```python
161
+ from grounding import doc, uses, statement, strength, kind
162
+
163
+ @kind("external")
164
+ @strength("strong")
165
+ def test_report_headline_matches_our_data():
166
+ """The CSR's stated drop matches what our tidy data produces — no transcription drift."""
167
+ ours = uses("test_treatment_lowers_biomarker_vs_vehicle")["pct_drop"]
168
+ csr = doc("clinical_summary.pdf")
169
+ statement(f"the CSR's reported reduction matches our computed {ours:.0f}% drop")
170
+ assert csr.contains(f"{ours:.0f}% reduction")
171
+ ```
172
+
173
+ This grounds the *agreement* itself: the PDF is pinned by `doc()`, the number is pinned
174
+ transitively through `uses()`, and the single assert fails if the report and the data ever
175
+ diverge. Each claim stays small and independently reviewable; higher-level claims inherit — never
176
+ re-derive — their evidence and provenance.
177
+
178
+ ## Tracing
179
+
180
+ ```bash
181
+ grounding trace ./out # re-verify every claim's inputs still match recorded shas
182
+ ```
183
+
184
+ One command answers *"is this conclusion still grounded?"* — the question a reviewer otherwise spends an afternoon on. Exit 0 if grounded, 1 if any input changed or went missing.
185
+
186
+ ## What's in the box
187
+
188
+ | Piece | What it does |
189
+ |---|---|
190
+ | **Capture context** | records every tracked read (kind, path, sha256) while a claim runs |
191
+ | **Tracked loaders** | `data()`/`load()` (CSV→DataFrame, sha-pinned), `doc()` (any document) |
192
+ | **`statement()`** | the claim's proposition — ideally computed from the data so it can't drift |
193
+ | **Quote verification** | `DocRef.contains()` — offline, deterministic; raises on unreadable sources |
194
+ | **pytest plugin** | wraps every test in a capture, emits `grounding_report.json` |
195
+ | **Judgment markers** | `@strength`, `@caveats`, `@kind`, `@reviewed` — the reviewer's surface |
196
+ | **`uses()`** | transitive claim composition |
197
+ | **Bypass guard** | flags a claim that reads data through an untracked path |
198
+ | **`grounding trace`** | walks the provenance DAG; tells you if a conclusion is still grounded |
199
+
200
+ ## Design principles
201
+
202
+ - **Deterministic & offline.** Pure function of bytes. No network, keys, or model — runs in CI and in massively parallel agent fan-out with nothing to configure.
203
+ - **Sha-pinned.** The recorded hash is of exactly the bytes parsed.
204
+ - **The test is the spec.** A claim is an ordinary pytest test; your runner, fixtures, and CI just work. Git history of `statement`/`@strength`/`@caveats` is a belief-change ledger.
205
+ - **Computed, not curated.** Provenance, composition, and (ideally) the statement itself derive from what ran, so they can't drift from reality.
206
+ - **Author/critic separation by construction.** Mechanical truth → the assert; honest framing → metadata a fresh-context reviewer judges against the same pinned evidence.
207
+
208
+ ## What it is *not*
209
+
210
+ - **Not data versioning** (DVC/lakeFS) — it pins shas of files you already have, wherever they live.
211
+ - **Not a workflow engine** — it observes reads during a test; it doesn't orchestrate them.
212
+ - **Not rendering** — turning grounded claims into a cited report (PDF/HTML) is a separate layer built *on top* of `grounding_report.json`.
213
+ - **Not storage/indexing** — the report is the wire format; building a searchable index over it is a consumer's concern.
214
+ - **Not an LLM judge** — it runs no model; judgments are recorded by the agents that use it.
@@ -0,0 +1,184 @@
1
+ # grounding
2
+
3
+ **Turn assertions about data into re-runnable, provenance-tracked claims — written and reviewed by agents.**
4
+
5
+ `grounding` is a small runtime on top of pytest. A test stops being a pass/fail check on your *code* and becomes a **grounded claim**: a statement about data, automatically pinned to the exact bytes it depends on, re-checked whenever those bytes change, carrying a non-binary judgment (how strong, with what caveats) that lives in version control.
6
+
7
+ It's built for a workflow where **an agent writes the claims and a second, fresh-context agent reviews them.**
8
+
9
+ ```bash
10
+ pip install grounding # core (statement-only / quote-only)
11
+ pip install 'grounding[data]' # + CSV grounding via data()/load()
12
+ pip install 'grounding[docs]' # + document quote verification via doc()
13
+ ```
14
+
15
+ No network, no API keys, no model inside. Everything is a pure function of file bytes.
16
+
17
+ ## Why agents, specifically
18
+
19
+ When an agent asserts *"knockdown reached 53% at the high dose,"* you have two questions: **is it mechanically true** against the data, and **does the evidence actually support the claim** as worded? `grounding` splits those, and each half lands with the right reviewer:
20
+
21
+ - **The mechanical half is the test.** Re-run it; it passes or fails against sha-pinned bytes. No reviewer judgment needed — CI does it.
22
+ - **The judgment half is metadata** (`statement`, `@strength`, `@caveats`, the cited quote). A fresh-context reviewer agent reads *exactly* the bytes the author grounded — same shas, no drift — and decides whether the framing is honest.
23
+
24
+ `grounding_report.json` is the machine-readable handoff: the author agent emits it, the reviewer agent consumes it.
25
+
26
+ ## A claim is a pytest test
27
+
28
+ ```python
29
+ from grounding import data, evidence, statement, strength, caveats, kind
30
+ from scipy import stats
31
+
32
+ @kind("result")
33
+ @strength("moderate")
34
+ @caveats("n=8 per arm, single cohort; not corrected for multiple endpoints")
35
+ def test_treatment_lowers_biomarker_vs_vehicle():
36
+ """Serum biomarker at day 28: 10 mg/kg arm vs vehicle, cohort B.
37
+
38
+ Reviewer notes: groups are the prespecified arms; Welch's t-test because the
39
+ vehicle arm's spread is larger; two treated animals were excluded upstream for
40
+ dosing errors (already applied in the tidy table).
41
+ """
42
+ df = data("biomarker_day28.csv")
43
+ treated = df[df.arm == "10mpk"].biomarker
44
+ vehicle = df[df.arm == "vehicle"].biomarker
45
+
46
+ drop = 1 - treated.mean() / vehicle.mean()
47
+ t, p = stats.ttest_ind(treated, vehicle, equal_var=False)
48
+
49
+ statement(f"At day 28, the 10 mg/kg arm showed a {drop:.0%} lower serum biomarker "
50
+ f"than vehicle (Welch t = {t:.1f}, p = {p:.3f}).")
51
+ evidence(pct_drop=round(drop * 100, 1), p_value=round(p, 4))
52
+
53
+ assert p < 0.05 and drop > 0 # the qualitative claim: a real, downward effect
54
+ ```
55
+
56
+ The three layers don't repeat each other:
57
+
58
+ - **`statement()`** is the proposition, with numbers interpolated from the data — it *can't* claim a drop the table doesn't produce.
59
+ - the **docstring** is the *why and how* — context that lets a later reviewer judge the claim without re-deriving it.
60
+ - the **`assert`** guards only the qualitative shape (significant, downward); the quantity lives in the computed statement.
61
+
62
+ Run it:
63
+
64
+ ```bash
65
+ pytest --grounding-out ./out
66
+ ```
67
+
68
+ → `out/grounding_report.json`:
69
+
70
+ ```json
71
+ {
72
+ "claims": [{
73
+ "id": "test_efficacy.py::test_treatment_lowers_biomarker_vs_vehicle",
74
+ "statement": "At day 28, the 10 mg/kg arm showed a 41% lower serum biomarker than vehicle (Welch t = 3.2, p = 0.006).",
75
+ "kind": "result",
76
+ "strength": "moderate",
77
+ "caveats": "n=8 per arm, single cohort; not corrected for multiple endpoints",
78
+ "inputs": [{"kind": "data", "path": "biomarker_day28.csv", "sha256": "a17b…", "via": "tracked"}],
79
+ "evidence": {"pct_drop": 41.2, "p_value": 0.0061}
80
+ }]
81
+ }
82
+ ```
83
+
84
+ Nobody hand-wrote that provenance. `data()` recorded the read; the capture context attached it to the claim.
85
+
86
+ ## Grounding a quote in a document
87
+
88
+ ```python
89
+ from grounding import doc, statement
90
+
91
+ def test_summary_states_endpoint_met():
92
+ """Quote is from the signed CSR §10.1, not the synopsis."""
93
+ csr = doc("clinical_summary.pdf") # sha-pinned like any input
94
+ statement("The clinical study report states the primary endpoint was met.")
95
+ assert csr.contains("the primary endpoint was met")
96
+ ```
97
+
98
+ `DocRef.contains()` extracts with pinned pure-Python readers (pdf/docx/pptx) and matches whitespace/dash/Markdown-robustly, so a quote split across lines or cells still matches. The match is a pure function of the bytes. There is **no OCR**: a scanned/image-only document raises `EmptyExtraction` rather than silently reporting "not found".
99
+
100
+ ## Composing claims
101
+
102
+ `uses()` lets one claim build on earlier ones: it merges their sha-pinned inputs into this
103
+ claim's provenance (transitively) and hands back their `evidence`. The composed claim can read
104
+ no source of its own, yet `grounding trace` still walks it all the way down — change an upstream
105
+ dataset and the roll-up breaks too. Provenance is a computed DAG, never hand-maintained.
106
+
107
+ **Roll up independent results.** A program-level conclusion that rests on several per-dataset
108
+ claims — defined in different test files, over different data:
109
+
110
+ ```python
111
+ from grounding import uses, statement, strength
112
+
113
+ @strength("moderate")
114
+ def test_effect_replicates_across_cohorts():
115
+ """The biomarker drop holds in two independently-run cohorts."""
116
+ b = uses("test_treatment_lowers_biomarker_vs_vehicle") # cohort B
117
+ c = uses("test_treatment_lowers_biomarker_cohort_c") # cohort C, a different test file
118
+ statement(f"the effect replicates: {b['pct_drop']:.0f}% (cohort B) "
119
+ f"and {c['pct_drop']:.0f}% (cohort C)")
120
+ assert b["pct_drop"] > 0 and c["pct_drop"] > 0
121
+ ```
122
+
123
+ This claim touches no CSV directly, but its recorded inputs now include *both* cohorts' files,
124
+ each sha-pinned. Change either cohort's data and this roll-up — not just the two underlying
125
+ claims — shows up as drifted.
126
+
127
+ **Cross-check data against a document.** Compose a numeric claim with a quote check to assert an
128
+ external report and your own data agree — the classic transcription-drift catcher:
129
+
130
+ ```python
131
+ from grounding import doc, uses, statement, strength, kind
132
+
133
+ @kind("external")
134
+ @strength("strong")
135
+ def test_report_headline_matches_our_data():
136
+ """The CSR's stated drop matches what our tidy data produces — no transcription drift."""
137
+ ours = uses("test_treatment_lowers_biomarker_vs_vehicle")["pct_drop"]
138
+ csr = doc("clinical_summary.pdf")
139
+ statement(f"the CSR's reported reduction matches our computed {ours:.0f}% drop")
140
+ assert csr.contains(f"{ours:.0f}% reduction")
141
+ ```
142
+
143
+ This grounds the *agreement* itself: the PDF is pinned by `doc()`, the number is pinned
144
+ transitively through `uses()`, and the single assert fails if the report and the data ever
145
+ diverge. Each claim stays small and independently reviewable; higher-level claims inherit — never
146
+ re-derive — their evidence and provenance.
147
+
148
+ ## Tracing
149
+
150
+ ```bash
151
+ grounding trace ./out # re-verify every claim's inputs still match recorded shas
152
+ ```
153
+
154
+ One command answers *"is this conclusion still grounded?"* — the question a reviewer otherwise spends an afternoon on. Exit 0 if grounded, 1 if any input changed or went missing.
155
+
156
+ ## What's in the box
157
+
158
+ | Piece | What it does |
159
+ |---|---|
160
+ | **Capture context** | records every tracked read (kind, path, sha256) while a claim runs |
161
+ | **Tracked loaders** | `data()`/`load()` (CSV→DataFrame, sha-pinned), `doc()` (any document) |
162
+ | **`statement()`** | the claim's proposition — ideally computed from the data so it can't drift |
163
+ | **Quote verification** | `DocRef.contains()` — offline, deterministic; raises on unreadable sources |
164
+ | **pytest plugin** | wraps every test in a capture, emits `grounding_report.json` |
165
+ | **Judgment markers** | `@strength`, `@caveats`, `@kind`, `@reviewed` — the reviewer's surface |
166
+ | **`uses()`** | transitive claim composition |
167
+ | **Bypass guard** | flags a claim that reads data through an untracked path |
168
+ | **`grounding trace`** | walks the provenance DAG; tells you if a conclusion is still grounded |
169
+
170
+ ## Design principles
171
+
172
+ - **Deterministic & offline.** Pure function of bytes. No network, keys, or model — runs in CI and in massively parallel agent fan-out with nothing to configure.
173
+ - **Sha-pinned.** The recorded hash is of exactly the bytes parsed.
174
+ - **The test is the spec.** A claim is an ordinary pytest test; your runner, fixtures, and CI just work. Git history of `statement`/`@strength`/`@caveats` is a belief-change ledger.
175
+ - **Computed, not curated.** Provenance, composition, and (ideally) the statement itself derive from what ran, so they can't drift from reality.
176
+ - **Author/critic separation by construction.** Mechanical truth → the assert; honest framing → metadata a fresh-context reviewer judges against the same pinned evidence.
177
+
178
+ ## What it is *not*
179
+
180
+ - **Not data versioning** (DVC/lakeFS) — it pins shas of files you already have, wherever they live.
181
+ - **Not a workflow engine** — it observes reads during a test; it doesn't orchestrate them.
182
+ - **Not rendering** — turning grounded claims into a cited report (PDF/HTML) is a separate layer built *on top* of `grounding_report.json`.
183
+ - **Not storage/indexing** — the report is the wire format; building a searchable index over it is a consumer's concern.
184
+ - **Not an LLM judge** — it runs no model; judgments are recorded by the agents that use it.
@@ -0,0 +1,64 @@
1
+ """grounding — turn assertions about data into re-runnable, provenance-tracked claims.
2
+
3
+ A claim is a pytest test. Inside it you ground a statement in sha-pinned evidence:
4
+
5
+ from grounding import data, doc, statement, evidence, strength, caveats
6
+
7
+ @strength("strong")
8
+ @caveats("single run; n=3 per dose")
9
+ def test_knockdown_at_high_dose():
10
+ df = data("measurements.csv") # sha-pinned read, recorded as provenance
11
+ hi = df[df.dose == 300].knockdown.mean()
12
+ statement(f"Knockdown reached {hi:.0f}% at the 300 nM dose") # the proposition
13
+ evidence(knockdown_pct=round(hi, 1))
14
+ assert hi > 50 # the grounding/drift check
15
+
16
+ The pytest plugin (auto-loaded via the ``pytest11`` entry point) wraps each test in a
17
+ capture, records every ``data``/``doc`` read, and emits ``grounding_report.json``. The
18
+ non-binary judgment (``@strength``/``@caveats``/``@kind``/``@reviewed``) is metadata a
19
+ reviewer judges — never a pass/fail input.
20
+
21
+ Everything here is a pure function of file bytes: no network, no key, no model.
22
+
23
+ Public API:
24
+
25
+ data / load sha-pinned CSV loader -> DataFrame(.attrs) [needs the [data] extra]
26
+ doc -> DocRef record a document; DocRef.contains() verifies a quote [needs [docs]]
27
+ statement(text) the claim's proposition (ideally computed from data)
28
+ evidence(**kv) headline numbers for the report
29
+ uses(claim_id) compose on a prior claim (transitive provenance + evidence)
30
+ strength/caveats/kind/reviewed the judgment markers
31
+ Capture / current_capture / record / registry / TRACKED_SUFFIXES the capture core
32
+ install_guard install the untracked-read bypass guard (the plugin does this)
33
+ """
34
+ from __future__ import annotations
35
+
36
+ from ._capture import (
37
+ Capture,
38
+ TRACKED_SUFFIXES,
39
+ current_capture,
40
+ record,
41
+ registry,
42
+ )
43
+ from ._text import match_phrase, sha256
44
+ from .loaders import (
45
+ DocRef,
46
+ EmptyExtraction,
47
+ UnsupportedDocFormat,
48
+ data,
49
+ doc,
50
+ load,
51
+ )
52
+ from .claim import caveats, evidence, kind, reviewed, statement, strength, uses
53
+ from .guard import install_guard
54
+
55
+ __all__ = [
56
+ "load", "data", "doc", "DocRef", "UnsupportedDocFormat", "EmptyExtraction",
57
+ "statement", "evidence", "uses",
58
+ "strength", "caveats", "kind", "reviewed",
59
+ "Capture", "current_capture", "record", "registry", "TRACKED_SUFFIXES",
60
+ "match_phrase", "sha256",
61
+ "install_guard",
62
+ ]
63
+
64
+ __version__ = "0.0.1"
@@ -0,0 +1,68 @@
1
+ """The capture context — the heart of automatic provenance.
2
+
3
+ While a claim runs, a :class:`Capture` is active in a context variable. Every tracked read
4
+ (``load``/``data``/``doc``, or an untracked read the bypass guard catches) records its
5
+ ``{kind, path, sha256}`` into it, and the claim's :func:`grounding.statement` /
6
+ :func:`grounding.evidence` write the proposition + headline numbers. A claim's id + its
7
+ captured inputs + its statement + its evidence form a *computed* record — never
8
+ hand-maintained.
9
+ """
10
+ from __future__ import annotations
11
+
12
+ import contextvars
13
+ from dataclasses import dataclass, field
14
+ from typing import Any
15
+
16
+ # Source-file kinds we consider "tracked": reading one while a capture is active is
17
+ # provenance the claim depends on. The bypass guard watches the same set.
18
+ TRACKED_SUFFIXES = {
19
+ ".csv", ".tsv", ".xlsx", ".xls", ".pdf", ".docx", ".pptx", ".ppt",
20
+ ".json", ".yaml", ".yml",
21
+ }
22
+
23
+
24
+ @dataclass
25
+ class Capture:
26
+ """Records every tracked source read + the statement and headline numbers for one
27
+ claim. The claim's id, captured inputs, statement and evidence are all computed from
28
+ what actually ran, so they can't drift from reality."""
29
+
30
+ claim_id: str | None = None
31
+ statement: str | None = None # the proposition (set by statement())
32
+ inputs: list[dict] = field(default_factory=list) # {kind, path, sha256, via}
33
+ evidence: dict[str, Any] = field(default_factory=dict)
34
+ bypassed: list[str] = field(default_factory=list) # untracked reads the guard caught
35
+ _seen: set = field(default_factory=set)
36
+
37
+ def record(self, kind: str, path, sha: str, via: str = "tracked") -> None:
38
+ key = (kind, str(path))
39
+ if key in self._seen:
40
+ return
41
+ self._seen.add(key)
42
+ self.inputs.append({"kind": kind, "path": str(path), "sha256": sha, "via": via})
43
+
44
+ def merge(self, other: "Capture") -> None:
45
+ """Pull another capture's inputs in transitively (used by :func:`grounding.uses`)."""
46
+ for inp in other.inputs:
47
+ self.record(inp["kind"], inp["path"], inp["sha256"], via="uses")
48
+
49
+
50
+ _CURRENT: contextvars.ContextVar[Capture | None] = contextvars.ContextVar(
51
+ "grounding_capture", default=None)
52
+
53
+
54
+ def current_capture() -> Capture | None:
55
+ return _CURRENT.get()
56
+
57
+
58
+ def record(kind: str, path, sha: str, via: str = "tracked") -> None:
59
+ """Record a (kind, path, sha) into the active capture, if any. Called by the tracked
60
+ loaders and the bypass guard."""
61
+ cap = _CURRENT.get()
62
+ if cap is not None:
63
+ cap.record(kind, path, sha, via)
64
+
65
+
66
+ # A session-wide registry of completed claim records, keyed by node id. Populated by the
67
+ # plugin so :func:`grounding.uses` can pull a prior claim's evidence + inputs.
68
+ registry: dict[str, dict] = {}
@@ -0,0 +1,38 @@
1
+ """The one verbatim-quote text normalizer (pure stdlib).
2
+
3
+ A single place for the text normalization that quote matching depends on. Keeping it
4
+ in one function means a verbatim quote folds to exactly one canonical form everywhere
5
+ it is compared — so a correct quote is never defeated by a glyph variant, and any future
6
+ identity/caching layer built on top stays consistent with the matcher by construction.
7
+ """
8
+ from __future__ import annotations
9
+
10
+ import unicodedata
11
+
12
+ # Unicode dash/hyphen variants that publishers and PDF extractors use interchangeably
13
+ # with ASCII "-": en/em dashes, the Unicode hyphen, non-breaking hyphen, minus sign, etc.
14
+ # Folding them (plus NFKC, which normalizes ligatures/full-width/compatibility forms) lets
15
+ # a verbatim quote match stored text without the author reproducing the exact glyph — the
16
+ # single most common reason a real, correct quote fails a naive substring check.
17
+ _DASHES = "‐‑‒–—―⁃−﹘﹣-"
18
+ _DASH_MAP = {ord(c): "-" for c in _DASHES}
19
+
20
+
21
+ def collapse_ws(s: str) -> str:
22
+ """Collapse every run of whitespace to a single space (and strip). Quote matching is
23
+ *verbatim*, but extractors split a sentence across runs/lines/cells (worst in slide
24
+ decks); normalizing both sides makes a short quote match reliably."""
25
+ return " ".join(s.split())
26
+
27
+
28
+ def fold_match(s: str) -> str:
29
+ """Normalize text for verbatim-quote matching: NFKC-normalize, fold Unicode dashes to
30
+ ASCII ``-``, drop Markdown emphasis markers (``*``/``_``), then collapse whitespace.
31
+ Case is preserved (the quote stays verbatim)."""
32
+ folded = (
33
+ unicodedata.normalize("NFKC", s)
34
+ .translate(_DASH_MAP)
35
+ .replace("*", "")
36
+ .replace("_", "")
37
+ )
38
+ return collapse_ws(folded)
@@ -0,0 +1,52 @@
1
+ """Shared text / identifier helpers (leaf module).
2
+
3
+ Small, dependency-light helpers: the sha256 hasher, the single phrase matcher
4
+ ``DocRef.contains`` delegates to, and the identifier-column preservation used by
5
+ :func:`grounding.load`. Imports only :mod:`grounding._normalize` (pure stdlib) and,
6
+ lazily, pandas inside :func:`preserve_identifier`; nothing here imports back up into the
7
+ package, so it is safe to import from anywhere.
8
+ """
9
+ from __future__ import annotations
10
+
11
+ import hashlib
12
+ import re as _re
13
+
14
+ from ._normalize import fold_match
15
+
16
+
17
+ def sha256(b: bytes) -> str:
18
+ return hashlib.sha256(b).hexdigest()
19
+
20
+
21
+ def match_phrase(phrase: str, text: str, *, normalize_ws: bool = True) -> bool:
22
+ """Substring-check ``phrase`` against ``text`` for verbatim-quote matching. With
23
+ ``normalize_ws`` (default) fold both sides first (NFKC + Unicode-dash fold + Markdown
24
+ emphasis strip + whitespace-collapse) so a correct quote isn't defeated by an en-dash,
25
+ a ligature, stored Markdown, or an extractor that split it across runs/lines/cells.
26
+ Case is preserved."""
27
+ if normalize_ws:
28
+ return fold_match(phrase) in fold_match(text)
29
+ return phrase in text
30
+
31
+
32
+ _INT_LIKE = _re.compile(r"^-?\d+$")
33
+
34
+
35
+ def preserve_identifier(col, str_col):
36
+ """Keep a column as faithful strings when pandas' numeric inference would corrupt
37
+ identifiers. Fires only when every non-blank value is a plain integer string AND
38
+ inference would alter it — a leading zero (``"01"`` -> ``1``) or a column floated by
39
+ blank cells (``"73"`` -> ``73.0``). Real measurement columns (decimals, sign-less
40
+ floats, clean blank-free integers) are left numeric and untouched."""
41
+ import pandas as pd
42
+
43
+ if not (pd.api.types.is_integer_dtype(col.dtype) or pd.api.types.is_float_dtype(col.dtype)):
44
+ return col # already object/string
45
+ nonblank = str_col[str_col != ""]
46
+ if not len(nonblank) or not nonblank.map(lambda v: bool(_INT_LIKE.match(v))).all():
47
+ return col # has decimals / non-integer text -> a real measurement column
48
+ has_leading_zero = nonblank.map(lambda v: len(v) > 1 and v.lstrip("-").startswith("0")).any()
49
+ has_blanks = (str_col == "").any()
50
+ if has_leading_zero or has_blanks:
51
+ return str_col # identifier-like; keep the exact text
52
+ return col # clean blank-free integers (counts, indices) stay numeric