saidso 0.1.0__tar.gz

@@ -0,0 +1,19 @@
1
+ __pycache__/
2
+ *.py[cod]
3
+ *.egg-info/
4
+ .eggs/
5
+ build/
6
+ dist/
7
+ .pytest_cache/
8
+ .mypy_cache/
9
+ .ruff_cache/
10
+ .venv/
11
+ venv/
12
+ env/
13
+ *.jsonl
14
+ .env
15
+ .DS_Store
16
+ attestations.jsonl
17
+ .claude/
18
+ /tests
19
+ /graphify-out
@@ -0,0 +1,36 @@
1
+ # Changelog
2
+
3
+ ## 0.1.0
4
+
5
+ First release. A grounding firewall for action-taking agents.
6
+
7
+ ### Core
8
+ - `@grounded` decorator: per-argument grounding policies, block-and-steer on
9
+ failure, attestation on success. Sync **and** async tools.
10
+ - Policies: `SPOKEN`, `CONFIRMED`, `CALLER_ID`, `INFERABLE`.
11
+ - `Transcript` buffer, `call_context` plumbing (contextvars).
12
+ - Deterministic matcher with number-word / year / date / phone / text
13
+ normalization. Uses `rapidfuzz` if installed, stdlib `difflib` otherwise
14
+ (zero required dependencies).
15
+ - `AttestationLog`: in-memory + optional JSONL provenance ledger.
16
+ - `SteerBack` return contract with auto-generated re-ask messages.
17
+ - `saidso.testing.GroundingCase`: replay harness for CI gates.
18
+
19
+ ### Production hardening
20
+ - **Fail-closed**: a matcher exception blocks the call and logs, never crashes
21
+ or lets it through.
22
+ - **Decoration-time validation**: guarding a non-existent parameter raises
23
+ immediately (typos can't leave real args unguarded); unknown policy strings
24
+ and empty policy sets raise.
25
+ - **No digit-substring over-matching**: numbers match as whole values only
26
+ (`"2"` is not grounded by `"20"`).
27
+ - **No short-string fuzzy over-matching**: tokens shorter than 4 chars require
28
+ exact word matches; multi-token values require every token to match.
29
+ - **Type coercion**: `date` / `datetime` / `int` / `float` / `bool` / `Decimal`
30
+ arguments are rendered deterministically before comparison.
31
+ - **`CONFIRMED` tolerates filler/backchannel** turns between read-back and the
32
+ caller's "yes".
33
+ - **Comma-grouped numbers** (`1,250`) parse correctly.
34
+ - `VAR_KEYWORD` (`**kwargs`) functions: guarded args resolved from the kwargs
35
+ dict.
36
+ - Observability via the `saidso` logger; `py.typed` ships type information.
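The whole-value rule and comma-grouped parsing fit in a few lines of Python. This is a hypothetical sketch, not saidso's matcher. `extract_numbers` and `number_is_spoken` are invented names, and the sketch assumes the transcript already shows numbers as digits. Turning number words such as "twenty" into digits is a separate normalization step that it leaves out.

```python
import re
from decimal import Decimal

# Either comma-grouped digits ("1,250", "12,000.50") or a plain digit run ("20", "3.5").
# The grouped form is tried first so "1,250" is read as one number, not "1" and "250";
# (?!\d) rejects a group that runs into more digits ("1,2345").
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?!\d)(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numbers(text: str) -> set:
    """Every number in `text` as a Decimal, with thousands separators removed."""
    return {Decimal(m.group().replace(",", "")) for m in _NUMBER.finditer(text)}


def number_is_spoken(value, text: str) -> bool:
    """True only if `value` equals a whole number in `text` (so 2 is not found in "20")."""
    return Decimal(str(value).replace(",", "")) in extract_numbers(text)
```

Comparing `Decimal` values rather than strings means `1250` and `1,250` are treated as the same number. Because only whole extracted numbers are compared, a value cannot match a fragment of a longer one.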
@@ -0,0 +1,72 @@
1
+ # saidso — design
2
+
3
+ ## The one primitive
4
+
5
+ A checkpoint between an agent's "call `f(args)`" and the code that runs `f`. It
6
+ refuses the call unless every guarded argument traces back to the transcript,
7
+ and it keeps a receipt for the ones that pass.
8
+
9
+ Everything else is packaging around that.
10
+
11
+ ## Module map
12
+
13
+ | Module | Responsibility |
14
+ |---|---|
15
+ | `transcript.py` | Append-only, timestamped buffer of `Turn`s (user/agent/system). |
16
+ | `normalize.py` | Deterministic normalization: number words, years, dates, phones, text. The make-or-break layer. |
17
+ | `_fuzz.py` | Fuzzy match with `rapidfuzz` when present, `difflib` fallback. Zero required deps. |
18
+ | `matcher.py` | Per-policy checkers (`SPOKEN`/`CONFIRMED`/`CALLER_ID`/`INFERABLE`) → `GroundingResult` + span. |
19
+ | `policy.py` | `Policy` enum + default thresholds. |
20
+ | `result.py` | `Span`, `GroundingResult`, `ArgFinding`, `SteerBack` (the block-and-steer contract). |
21
+ | `context.py` | `CallContext` + `contextvars` so the decorator reads transcript/metadata implicitly. |
22
+ | `attestation.py` | `Attestation` + `AttestationLog` (in-memory + optional JSONL). |
23
+ | `grounding.py` | The `@grounded` decorator: bind args → check → block or run + attest. |
24
+
25
+ ## Control flow of a guarded call
26
+
27
+ ```
28
+ call f(name=..., dob=...)
29
+   │  (decorator) bind args to names via inspect.signature
30
+   │  resolve CallContext (explicit override > contextvar > empty)
31
+   │
32
+ for each guarded arg: matcher.check(value, policy, transcript, ctx)
33
+   │
34
+   ├─ any ungrounded ─► build SteerBack(failed, grounded) ─► return it
35
+   │                     (or raise GroundingBlocked if configured)
36
+   │
37
+   └─ all grounded ──► ledger.build(action, findings) ─► run f(...) ─► return
38
+ ```
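The control flow above fits in a short, self-contained sketch, which is not saidso's implementation. `guarded`, `transcript_scope` and `spoken` are hypothetical names. In this sketch the "transcript" is just a tuple of user utterances held in a `contextvars.ContextVar`, the only check is a case-insensitive substring test, and a block returns a plain dict where saidso returns a `SteerBack`. The sketch does keep three properties this document describes: arguments are bound with `inspect.signature`, unknown parameter names fail at decoration time, and a check that raises counts as ungrounded (fail-closed).

```python
import contextvars
import functools
import inspect
from contextlib import contextmanager

# Hypothetical stand-in for CallContext: the user's utterances for the current call.
_USER_TURNS = contextvars.ContextVar("user_turns", default=())


@contextmanager
def transcript_scope(turns):
    """Make `turns` visible to guarded calls made inside the `with` block."""
    token = _USER_TURNS.set(tuple(turns))
    try:
        yield
    finally:
        _USER_TURNS.reset(token)


def spoken(value, turns) -> bool:
    """Toy check: the value appears, case-insensitively, in some user turn."""
    needle = str(value).lower()
    return any(needle in turn.lower() for turn in turns)


def guarded(**checks):
    def decorate(fn):
        sig = inspect.signature(fn)
        unknown = sorted(set(checks) - set(sig.parameters))
        if unknown:  # decoration-time validation: a typo cannot leave an arg unguarded
            raise TypeError(f"{fn.__name__}() has no parameter(s) {unknown}")

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            turns = _USER_TURNS.get()  # empty tuple outside any scope
            failed = []
            for name, check in checks.items():
                try:
                    ok = bool(check(bound.arguments[name], turns))
                except Exception:  # fail closed: a crashing check counts as ungrounded
                    ok = False
                if not ok:
                    failed.append(name)
            if failed:
                return {"blocked": True, "failed": failed}  # the body never runs
            return fn(*bound.args, **bound.kwargs)

        return wrapper

    return decorate
```

For example, a function decorated with `@guarded(name=spoken)` runs normally inside `with transcript_scope(["It's Maria Gomez"]):` when it is called with `"Maria Gomez"`, but is blocked when it is called with `"John Doe"` or called outside any scope.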
39
+
40
+ ## Why deterministic-first
41
+
42
+ The realtime voice loop has a hard latency ceiling. So matching is:
43
+
44
+ 1. **Normalize** the value and the transcript to the same shape (digits, ISO
45
+ dates, lowercased text).
46
+ 2. **Exact** (normalized substring; numbers must match as whole values) → high confidence.
47
+ 3. **Fuzzy** (`partial_ratio` ≥ threshold) → medium confidence, with the
48
+ matched turn as the span.
49
+
50
+ A verifier-model escalation for genuinely ambiguous cases is a roadmap hook,
51
+ not on the hot path.
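The three matching steps can be sketched as follows. Several parts are assumptions made for illustration. `normalize` handles only case, punctuation and whitespace, with none of the number, date or phone rules. The 0.85 threshold is arbitrary. The per-token rules from the changelog are left out. The function returns the tier, a score in [0, 1], and the index of the turn that serves as the span.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def _best_window(needle: str, hay: str) -> float:
    """Best similarity (0..1) of `needle` against any same-length slice of `hay`."""
    width = len(needle)
    if width > len(hay):
        return SequenceMatcher(None, needle, hay).ratio()
    return max(
        SequenceMatcher(None, needle, hay[i:i + width]).ratio()
        for i in range(len(hay) - width + 1)
    )


def match_value(value, turns, threshold=0.85):
    """Return (tier, score, turn_index) with tier in {"exact", "fuzzy", "none"}."""
    needle = normalize(str(value))  # step 1: normalize the value...
    if not needle:
        return ("none", 0.0, None)
    best_score, best_turn = 0.0, None
    for i, turn in enumerate(turns):
        hay = normalize(turn)  # ...and each turn to the same shape
        # Step 2: exact, i.e. the value occurs as whole words in the normalized turn.
        if f" {needle} " in f" {hay} ":
            return ("exact", 1.0, i)
        # Step 3: otherwise keep the best fuzzy score and the turn it came from.
        score = _best_window(needle, hay)
        if score > best_score:
            best_score, best_turn = score, i
    if best_score >= threshold:
        return ("fuzzy", best_score, best_turn)
    return ("none", best_score, None)
```

Returning early on an exact hit keeps the common case cheap. Only values that fail the exact test pay for the windowed fuzzy scan.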
52
+
53
+ ### Dates specifically
54
+
55
+ Full date parsing of free speech is brittle, so `SPOKEN` for a date passes if
56
+ **either** the whole turn parses to the same ISO date **or** all three
57
+ components (year, month, day — in digit *or* spoken form) appear in one turn
58
+ (`date_components_present`). The component check is the robust workhorse.
59
+
60
+ ## The trust trade-off
61
+
62
+ This is a **hard firewall**: it can block a legitimate action if the matcher
63
+ fails to recognize a real value (a false positive). That is the central risk
64
+ and the thing to measure. Thresholds are tunable per policy via
65
+ `GroundingConfig`. The steer-back design softens the cost: a wrong block just
66
+ makes the agent re-ask, rather than crashing the call.
67
+
68
+ ## Anti-priming (why example values are absent)
69
+
70
+ Tool descriptions and guard prompts deliberately never contain a placeholder
71
+ "bad" value (e.g. a sample DOB). Putting an example value in the prompt teaches
72
+ the model to emit it. The planned prompt compiler (roadmap item 2) will enforce this automatically.
@@ -0,0 +1,47 @@
1
+ # saidso — roadmap
2
+
3
+ The MVP is deliberately small and deterministic. These are the pieces that turn
4
+ it from a useful decorator into a category-defining tool.
5
+
6
+ ## 1. Verifier-model escalation
7
+ When deterministic matching lands in the ambiguous band (near the threshold),
8
+ escalate that single argument to a tiny, fast model: "Did the caller say X?
9
+ Quote the words." Keep it off the hot path; cache per-turn. Pluggable backend.
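The following hypothetical sketch shows one way to wire this. The deterministic score decides clear cases. A band around the threshold is routed to a pluggable verifier, and answers are cached per (turn, value) so each pair is verified once. `route`, `Escalator`, the 0.05 band and the backend signature are all assumptions. No model is called here: the backend is any callable.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Verdict(Enum):
    GROUNDED = "grounded"
    BLOCKED = "blocked"
    ESCALATE = "escalate"  # ambiguous: ask the verifier about this one argument


def route(score: float, threshold: float = 0.85, band: float = 0.05) -> Verdict:
    """Map a deterministic match score (0..1) to a verdict."""
    if score >= threshold + band:
        return Verdict.GROUNDED
    if score < threshold - band:
        return Verdict.BLOCKED
    return Verdict.ESCALATE


class Escalator:
    """Wraps a pluggable backend(turn_text, value) -> bool with a per-turn cache."""

    def __init__(self, backend: Callable[[str, str], bool]):
        self._backend = backend
        self._cache: Dict[Tuple[int, str], bool] = {}

    def verify(self, turn_id: int, turn_text: str, value: str) -> bool:
        key = (turn_id, value)
        if key not in self._cache:  # each (turn, value) pair reaches the backend once
            self._cache[key] = bool(self._backend(turn_text, value))
        return self._cache[key]
```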
10
+
11
+ ## 2. Anti-priming prompt compiler
12
+ Generate tool descriptions and guard prompts that enforce fidelity **without
13
+ ever naming a bad example value** (example values teach the model to emit them).
14
+ Input: the function signature + policies. Output: a description block + system
15
+ guidance. This is a hard-won, little-known lesson baked into a generator.
16
+
17
+ ## 3. Hallucination regression harness
18
+ Grow `saidso.testing.GroundingCase` into a pytest-style harness: replay real or synthetic transcripts against guarded
19
+ tools and assert **zero ungrounded commits** slipped through. Turns "we hope it
20
+ doesn't fabricate" into a CI gate. Ships with a corpus format and fixtures.
21
+
22
+ ## 4. First-class framework adapters
23
+ - **LiveKit** (priority — home turf): transcription events → `Transcript`,
24
+ auto `call_context` per session.
25
+ - **Pipecat**, **Vapi**, **LangGraph** tool nodes.
26
+ Keep each adapter thin; the core stays framework-neutral.
27
+
28
+ ## 5. Matcher precision/recall eval set
29
+ A brutal labeled set of (transcript, value, expected verdict) — accents, ASR
30
+ noise, spelled-out emails, partial confirmations. Publish precision/recall.
31
+ **This is what determines whether anyone keeps the firewall on.** Build it
32
+ before the launch tweet.
33
+
34
+ ## 6. Richer policies
35
+ - `Policy.SPOKEN(type=...)` explicit type hints (date/phone/email/name).
36
+ - `Policy.ONE_OF([...])` for enum-ish slots.
37
+ - Composable policies (`SPOKEN | CONFIRMED`).
38
+ - `email` normalization ("m as in mango, a, r, i, a, at gmail dot com").
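The sketch below shows roughly how spelled-out email normalization could work. `normalize_spoken_email` and its word table are hypothetical. It handles letter-by-letter spelling, "X as in Y" disambiguation, and the words "at", "dot", "underscore" and "dash"/"hyphen". It will mis-read an address whose literal parts are one of those words.

```python
import re

# Spoken words that stand for symbols in an address.
_SYMBOLS = {"at": "@", "dot": ".", "underscore": "_", "dash": "-", "hyphen": "-"}


def normalize_spoken_email(spoken: str) -> str:
    """Collapse a spelled-out email ("m as in mango, a, ... at gmail dot com")."""
    tokens = [t for t in re.split(r"[\s,]+", spoken.lower()) if t]
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "as" and i + 2 < len(tokens) and tokens[i + 1] == "in":
            i += 3  # skip "as in mango": the letter itself was already emitted
            continue
        out.append(_SYMBOLS.get(tokens[i], tokens[i]))
        i += 1
    return "".join(out)
```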
39
+
40
+ ## 7. Provenance ledger as a product surface
41
+ Exportable, signed attestation bundles ("authorized by these words at 00:42")
42
+ for healthcare/finance/legal audit. The likely paid layer on top of the OSS
43
+ core.
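A signed bundle could be as simple as canonical JSON plus a MAC. The sketch below is an assumption, not saidso's format, and its record fields are made up. It uses HMAC-SHA256 from the standard library, which proves integrity only to holders of the shared key. An export that third-party auditors can verify would need a public-key signature (for example Ed25519) instead.

```python
import hashlib
import hmac
import json


def _canonical(records) -> bytes:
    # Sorted keys and fixed separators so equal records always serialize identically.
    return json.dumps(records, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_bundle(records, key: bytes) -> dict:
    mac = hmac.new(key, _canonical(records), hashlib.sha256).hexdigest()
    return {"alg": "HMAC-SHA256", "records": records, "mac": mac}


def verify_bundle(bundle: dict, key: bytes) -> bool:
    expected = hmac.new(key, _canonical(bundle["records"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["mac"])  # constant-time comparison
```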
44
+
45
+ ## 8. Observability
46
+ Per-call metrics: block rate, false-block feedback loop, latency budget, policy
47
+ hit rates. A dashboard is downstream of this.
saidso-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 saidso contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
saidso-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,202 @@
1
+ Metadata-Version: 2.4
2
+ Name: saidso
3
+ Version: 0.1.0
4
+ Summary: A grounding firewall for action-taking AI agents: refuse any tool argument the user never actually said, with a transcript-linked audit trail.
5
+ Project-URL: Homepage, https://github.com/KarthikRommula/saidso
6
+ Project-URL: Issues, https://github.com/KarthikRommula/saidso/issues
7
+ Author: Karthik Rommula
8
+ License: MIT
9
+ License-File: LICENSE
10
+ Keywords: agents,ai,audit,firewall,grounding,guardrails,hallucination,llm,provenance,tool-use,voice
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Topic :: Software Development :: Libraries
16
+ Requires-Python: >=3.9
17
+ Provides-Extra: dev
18
+ Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
19
+ Requires-Dist: pytest>=7.0; extra == 'dev'
20
+ Requires-Dist: rapidfuzz>=3.0; extra == 'dev'
21
+ Provides-Extra: fast
22
+ Requires-Dist: rapidfuzz>=3.0; extra == 'fast'
23
+ Description-Content-Type: text/markdown
24
+
25
+ # saidso
26
+
27
+ **A grounding firewall for action-taking AI agents.**
28
+
29
+ `saidso` sits between an AI agent and its consequential tools (book, transfer,
30
+ prescribe, refund, update a record) and **refuses to let the agent commit any
31
+ argument that isn't grounded in what the user actually said** — with a
32
+ transcript-linked audit trail for every action that does run.
33
+
34
+ > The name is the whole idea: an action only goes through if the user *said so*.
35
+
36
+ ```python
37
+ from saidso import grounded, Policy
38
+
39
+ @grounded(
40
+     name=Policy.SPOKEN,           # must appear in the caller's speech
41
+     dob=Policy.SPOKEN,            # spoken naturally -> normalized to ISO
42
+     phone=Policy.CALLER_ID,       # comes from carrier metadata, not the mouth
43
+     visit_date=Policy.INFERABLE,  # "tomorrow" -> resolved from the clock
44
+ )
45
+ def register_patient(name, dob, phone, visit_date): ...
46
+ ```
47
+
48
+ ## The problem
49
+
50
+ LLM voice/phone agents don't just talk — they *do things*. To do them, they
51
+ call functions:
52
+
53
+ ```python
54
+ register_patient(name="John Doe", dob="1990-01-01", ...)
55
+ ```
56
+
57
+ Sometimes the model **fills in arguments the caller never said.** Today's
58
+ frameworks (LiveKit, Vapi, Pipecat, LangGraph) execute the call anyway — and a
59
+ fabricated name or date of birth lands in a real database.
60
+
61
+ Prompting ("never make up a DOB") is best-effort. It's the *suspect judging
62
+ itself*, it leaves no proof, and it silently degrades as you add tools.
63
+
64
+ `saidso` is the backstop that runs in **code, not in the prompt**: it assumes
65
+ the model *will* hallucinate and refuses to let the hallucination cause harm.
66
+
67
+ ## What it does, on every call
68
+
69
+ 1. **Block** — if an argument isn't grounded in the transcript, the function
70
+ body never runs.
71
+ 2. **Steer back** — instead of a dead error, it returns a structured message
72
+ that makes the agent *re-ask* the caller and try again, in-conversation.
73
+ 3. **Attest** — for every argument that does go through, it writes a receipt:
74
+ *this value came from these words, at this timestamp, with this confidence.*
75
+
76
+ ```text
77
+ agent -> register_patient(name='John Doe', dob='1990-01-01')
78
+ BLOCKED: body never ran.
79
+ steer-back: "I don't have your name and your date of birth from what the
80
+ caller said. Ask the caller for your name and your date of
81
+ birth, then try again. Do not guess or fill in placeholder values."
82
+ ```
83
+
84
+ ## The policies
85
+
86
+ | Policy | A value is grounded if… |
87
+ |---|---|
88
+ | `Policy.SPOKEN` | it appears in the caller's speech (digits/dates/names normalized, fuzzy-matched) |
89
+ | `Policy.CONFIRMED` | the agent read it back **and** the caller affirmed it |
90
+ | `Policy.CALLER_ID` | it matches trusted call metadata, not what was spoken |
91
+ | `Policy.INFERABLE` | it's derivable from context ("tomorrow" + clock) or was spoken |
92
+
93
+ ## Install
94
+
95
+ ```bash
96
+ pip install saidso # zero required dependencies
97
+ pip install "saidso[fast]"  # add rapidfuzz for faster matching
98
+ ```
99
+
100
+ `saidso` works with no third-party packages (stdlib `difflib` fallback) and
101
+ uses `rapidfuzz` automatically if it's installed.
102
+
103
+ ## Usage
104
+
105
+ ```python
106
+ from saidso import AttestationLog, Policy, SteerBack, Transcript, call_context, grounded
107
+
108
+ @grounded(name=Policy.SPOKEN, dob=Policy.SPOKEN)
109
+ def register_patient(name, dob):
110
+     ...  # your real DB write
111
+
112
+ # Feed the conversation as it happens:
113
+ tr = Transcript()
114
+ tr.add_user("Hi, I'd like to book an appointment.")
115
+
116
+ log = AttestationLog(path="attestations.jsonl")  # optional audit trail
117
+
118
+ with call_context(tr, ledger=log):
119
+     out = register_patient(name="John Doe", dob="1990-01-01")
120
+
121
+ if isinstance(out, SteerBack):
122
+     say_to_caller(out.message)  # your reply hook: the agent re-asks; nothing was committed
123
+ ```
124
+
125
+ When grounding passes, the body runs normally and an attestation is recorded.
126
+ By default a block **returns** a `SteerBack` (so it slots straight into a
127
+ tool-use loop); pass `GroundingConfig(raise_on_block=True)` to raise `GroundingBlocked` instead.
128
+
129
+ ### Plugging into your agent framework
130
+
131
+ - **Raw OpenAI / Anthropic tool-use** — return `steer.to_tool_message()` as the
132
+ tool result so the model self-corrects. See
133
+ [`examples/openai_tooluse.py`](examples/openai_tooluse.py).
134
+ - **LiveKit / Pipecat / Vapi** — keep a `Transcript` in sync with the
135
+ session's transcription events and open a `call_context`. See
136
+ [`examples/livekit_adapter.py`](examples/livekit_adapter.py).
137
+
138
+ ## Regression harness (CI gate)
139
+
140
+ Assert that invented values are blocked and real ones commit — turn "we hope it
141
+ doesn't fabricate" into a test:
142
+
143
+ ```python
144
+ from saidso.testing import GroundingCase
145
+
146
+ def test_invented_dob_is_blocked():
147
+     (GroundingCase(register_patient)
148
+         .user("Hi, I'd like an appointment")
149
+         .call(name="John Doe", dob="1990-01-01")
150
+         .assert_blocked("name", "dob"))
151
+
152
+ def test_real_values_commit():
153
+     (GroundingCase(register_patient)
154
+         .user("It's Maria Gomez, born January first nineteen ninety")
155
+         .call(name="Maria Gomez", dob="1990-01-01")
156
+         .assert_grounded())
157
+ ```
158
+
159
+ ## Production behaviour
160
+
161
+ - **Fail-closed.** If a grounding check ever raises, the argument is treated as
162
+ ungrounded (blocked) and the error is logged — a crash never opens the gate.
163
+ - **Validated at decoration time.** A policy naming a non-existent parameter
164
+ raises as soon as `@grounded` is applied (at import, for module-level functions), so a typo can't silently leave a real argument unguarded.
165
+ - **No silent over-matching.** Numbers must match as whole values (`"2"` is not
166
+ grounded by `"20"`); tokens shorter than 4 characters require exact word matches;
167
+ `date`, `datetime`, `int`, `float`, `bool` and `Decimal` arguments are rendered deterministically before comparison.
168
+ - **Observability.** Blocks and errors log under the `saidso` logger.
169
+
170
+ Tune thresholds or switch to raising via `GroundingConfig`:
171
+
172
+ ```python
173
+ from saidso import GroundingConfig, Policy, grounded
174
+ cfg = GroundingConfig(thresholds={Policy.SPOKEN: 0.9}, raise_on_block=True)
175
+
176
+ @grounded(cfg, name=Policy.SPOKEN)
177
+ def book(name): ...
178
+ ```
179
+
180
+ ## Run the demo
181
+
182
+ ```bash
183
+ python examples/john_doe_demo.py
184
+ ```
185
+
186
+ ## Roadmap
187
+
188
+ The MVP is deterministic-first and intentionally small. Planned next:
189
+ verifier-model escalation for ambiguous cases, the **anti-priming prompt
190
+ compiler**, a corpus format and fixtures for the **hallucination regression harness**,
191
+ and first-class framework adapters. See [`Docs/ROADMAP.md`](Docs/ROADMAP.md).
192
+
193
+ ## Development
194
+
195
+ ```bash
196
+ pip install -e ".[dev]"
197
+ pytest -q
198
+ ```
199
+
200
+ ## License
201
+
202
+ MIT — see [`LICENSE`](LICENSE).
saidso-0.1.0/README.md ADDED
@@ -0,0 +1,178 @@
1
+ # saidso
2
+
3
+ **A grounding firewall for action-taking AI agents.**
4
+
5
+ `saidso` sits between an AI agent and its consequential tools (book, transfer,
6
+ prescribe, refund, update a record) and **refuses to let the agent commit any
7
+ argument that isn't grounded in what the user actually said** — with a
8
+ transcript-linked audit trail for every action that does run.
9
+
10
+ > The name is the whole idea: an action only goes through if the user *said so*.
11
+
12
+ ```python
13
+ from saidso import grounded, Policy
14
+
15
+ @grounded(
16
+     name=Policy.SPOKEN,           # must appear in the caller's speech
17
+     dob=Policy.SPOKEN,            # spoken naturally -> normalized to ISO
18
+     phone=Policy.CALLER_ID,       # comes from carrier metadata, not the mouth
19
+     visit_date=Policy.INFERABLE,  # "tomorrow" -> resolved from the clock
20
+ )
21
+ def register_patient(name, dob, phone, visit_date): ...
22
+ ```
23
+
24
+ ## The problem
25
+
26
+ LLM voice/phone agents don't just talk — they *do things*. To do them, they
27
+ call functions:
28
+
29
+ ```python
30
+ register_patient(name="John Doe", dob="1990-01-01", ...)
31
+ ```
32
+
33
+ Sometimes the model **fills in arguments the caller never said.** Today's
34
+ frameworks (LiveKit, Vapi, Pipecat, LangGraph) execute the call anyway — and a
35
+ fabricated name or date of birth lands in a real database.
36
+
37
+ Prompting ("never make up a DOB") is best-effort. It's the *suspect judging
38
+ itself*, it leaves no proof, and it silently degrades as you add tools.
39
+
40
+ `saidso` is the backstop that runs in **code, not in the prompt**: it assumes
41
+ the model *will* hallucinate and refuses to let the hallucination cause harm.
42
+
43
+ ## What it does, on every call
44
+
45
+ 1. **Block** — if an argument isn't grounded in the transcript, the function
46
+ body never runs.
47
+ 2. **Steer back** — instead of a dead error, it returns a structured message
48
+ that makes the agent *re-ask* the caller and try again, in-conversation.
49
+ 3. **Attest** — for every argument that does go through, it writes a receipt:
50
+ *this value came from these words, at this timestamp, with this confidence.*
51
+
52
+ ```text
53
+ agent -> register_patient(name='John Doe', dob='1990-01-01')
54
+ BLOCKED: body never ran.
55
+ steer-back: "I don't have your name and your date of birth from what the
56
+ caller said. Ask the caller for your name and your date of
57
+ birth, then try again. Do not guess or fill in placeholder values."
58
+ ```
59
+
60
+ ## The policies
61
+
62
+ | Policy | A value is grounded if… |
63
+ |---|---|
64
+ | `Policy.SPOKEN` | it appears in the caller's speech (digits/dates/names normalized, fuzzy-matched) |
65
+ | `Policy.CONFIRMED` | the agent read it back **and** the caller affirmed it |
66
+ | `Policy.CALLER_ID` | it matches trusted call metadata, not what was spoken |
67
+ | `Policy.INFERABLE` | it's derivable from context ("tomorrow" + clock) or was spoken |
68
+
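The following hypothetical sketch illustrates the `CONFIRMED` rule, including the changelog's tolerance for filler turns between the read-back and the "yes". The `(role, text)` turn format, the filler and affirmation lists and the function name are assumptions. saidso's own matcher also normalizes dates, numbers and phones, which this sketch does not.

```python
import re

FILLERS = {"um", "uh", "er", "hmm", "hold on", "one sec", "let me think"}
AFFIRMATIONS = ("yes", "yeah", "yep", "correct", "that's right", "right", "exactly")


def _norm(text: str) -> str:
    """Lowercase, keep letters/digits/apostrophes, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9' ]", " ", text.lower()).split())


def is_confirmed(value, turns) -> bool:
    """turns: list of (role, text) with role 'agent' or 'user', oldest first."""
    needle = _norm(str(value))
    if not needle:
        return False
    for i, (role, text) in enumerate(turns):
        if role != "agent" or needle not in _norm(text):
            continue  # not a read-back of this value
        for later_role, later_text in turns[i + 1:]:
            if later_role != "user":
                break  # the agent moved on before the caller answered
            reply = _norm(later_text)
            if reply in FILLERS:
                continue  # filler between the read-back and the caller's answer
            if any(reply == a or reply.startswith(a + " ") for a in AFFIRMATIONS):
                return True
            break  # a substantive reply that is not a "yes"
    return False
```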
69
+ ## Install
70
+
71
+ ```bash
72
+ pip install saidso # zero required dependencies
73
+ pip install "saidso[fast]"  # add rapidfuzz for faster matching
74
+ ```
75
+
76
+ `saidso` works with no third-party packages (stdlib `difflib` fallback) and
77
+ uses `rapidfuzz` automatically if it's installed.
78
+
79
+ ## Usage
80
+
81
+ ```python
82
+ from saidso import AttestationLog, Policy, SteerBack, Transcript, call_context, grounded
83
+
84
+ @grounded(name=Policy.SPOKEN, dob=Policy.SPOKEN)
85
+ def register_patient(name, dob):
86
+     ...  # your real DB write
87
+
88
+ # Feed the conversation as it happens:
89
+ tr = Transcript()
90
+ tr.add_user("Hi, I'd like to book an appointment.")
91
+
92
+ log = AttestationLog(path="attestations.jsonl")  # optional audit trail
93
+
94
+ with call_context(tr, ledger=log):
95
+     out = register_patient(name="John Doe", dob="1990-01-01")
96
+
97
+ if isinstance(out, SteerBack):
98
+     say_to_caller(out.message)  # your reply hook: the agent re-asks; nothing was committed
99
+ ```
100
+
101
+ When grounding passes, the body runs normally and an attestation is recorded.
102
+ By default a block **returns** a `SteerBack` (so it slots straight into a
103
+ tool-use loop); pass `GroundingConfig(raise_on_block=True)` to raise `GroundingBlocked` instead.
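The sketch below makes "an attestation is recorded" concrete with a minimal in-memory ledger that can also append to a JSONL file, in the spirit of `AttestationLog(path=...)`. `Receipt`, `MiniLedger` and their fields are hypothetical and do not reproduce saidso's record schema. What matters is the shape: one JSON object per line, appended as each guarded call commits.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Receipt:
    action: str                   # e.g. the guarded function's name
    args: list                    # per-argument provenance dicts (value, policy, span, ...)
    call_id: Optional[str] = None
    ts: float = field(default_factory=time.time)


class MiniLedger:
    def __init__(self, path: Optional[str] = None):
        self.path = path
        self.records: List[Receipt] = []

    def record(self, receipt: Receipt) -> None:
        self.records.append(receipt)  # always kept in memory
        if self.path is not None:     # optionally appended as one JSON line
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(asdict(receipt), ensure_ascii=False) + "\n")
```

Append-only JSONL keeps each write independent, so a crash mid-call loses at most the record being written rather than corrupting earlier receipts.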
104
+
105
+ ### Plugging into your agent framework
106
+
107
+ - **Raw OpenAI / Anthropic tool-use** — return `steer.to_tool_message()` as the
108
+ tool result so the model self-corrects. See
109
+ [`examples/openai_tooluse.py`](examples/openai_tooluse.py).
110
+ - **LiveKit / Pipecat / Vapi** — keep a `Transcript` in sync with the
111
+ session's transcription events and open a `call_context`. See
112
+ [`examples/livekit_adapter.py`](examples/livekit_adapter.py).
113
+
114
+ ## Regression harness (CI gate)
115
+
116
+ Assert that invented values are blocked and real ones commit — turn "we hope it
117
+ doesn't fabricate" into a test:
118
+
119
+ ```python
120
+ from saidso.testing import GroundingCase
121
+
122
+ def test_invented_dob_is_blocked():
123
+     (GroundingCase(register_patient)
124
+         .user("Hi, I'd like an appointment")
125
+         .call(name="John Doe", dob="1990-01-01")
126
+         .assert_blocked("name", "dob"))
127
+
128
+ def test_real_values_commit():
129
+     (GroundingCase(register_patient)
130
+         .user("It's Maria Gomez, born January first nineteen ninety")
131
+         .call(name="Maria Gomez", dob="1990-01-01")
132
+         .assert_grounded())
133
+ ```
134
+
135
+ ## Production behaviour
136
+
137
+ - **Fail-closed.** If a grounding check ever raises, the argument is treated as
138
+ ungrounded (blocked) and the error is logged — a crash never opens the gate.
139
+ - **Validated at decoration time.** A policy naming a non-existent parameter
139
+ raises as soon as `@grounded` is applied (at import, for module-level functions), so a typo can't silently leave a real argument unguarded.
140
+ - **No silent over-matching.** Numbers must match as whole values (`"2"` is not
141
+ grounded by `"20"`); tokens shorter than 4 characters require exact word matches;
142
+ `date`, `datetime`, `int`, `float`, `bool` and `Decimal` arguments are rendered deterministically before comparison.
144
+ - **Observability.** Blocks and errors log under the `saidso` logger.
145
+
146
+ Tune thresholds or switch to raising via `GroundingConfig`:
147
+
148
+ ```python
149
+ from saidso import GroundingConfig, Policy, grounded
150
+ cfg = GroundingConfig(thresholds={Policy.SPOKEN: 0.9}, raise_on_block=True)
151
+
152
+ @grounded(cfg, name=Policy.SPOKEN)
153
+ def book(name): ...
154
+ ```
155
+
156
+ ## Run the demo
157
+
158
+ ```bash
159
+ python examples/john_doe_demo.py
160
+ ```
161
+
162
+ ## Roadmap
163
+
164
+ The MVP is deterministic-first and intentionally small. Planned next:
165
+ verifier-model escalation for ambiguous cases, the **anti-priming prompt
166
+ compiler**, a corpus format and fixtures for the **hallucination regression harness**,
167
+ and first-class framework adapters. See [`Docs/ROADMAP.md`](Docs/ROADMAP.md).
168
+
169
+ ## Development
170
+
171
+ ```bash
172
+ pip install -e ".[dev]"
173
+ pytest -q
174
+ ```
175
+
176
+ ## License
177
+
178
+ MIT — see [`LICENSE`](LICENSE).
@@ -0,0 +1,80 @@
1
+ """The John Doe demo: a naked agent invents a patient; saidso blocks it.
2
+
3
+ Run::
4
+
5
+     python examples/john_doe_demo.py
6
+
7
+ No API keys, no network. This is the 30-second story the project exists for.
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ from saidso import AttestationLog, Policy, SteerBack, Transcript, call_context, grounded
13
+
14
+ # A real action that would touch a database / EHR in production.
15
+ DB = []
16
+
17
+
18
+ @grounded(name=Policy.SPOKEN, dob=Policy.SPOKEN)
19
+ def register_patient(name, dob):
20
+     DB.append({"name": name, "dob": dob})
21
+     return {"committed": True, "name": name, "dob": dob}
22
+
23
+
24
+ def banner(title):
25
+     print("\n" + "=" * 64)
26
+     print(title)
27
+     print("=" * 64)
28
+
29
+
30
+ def scenario_naked_agent():
31
+     banner("LEFT — naked agent (no firewall)")
32
+     tr = Transcript()
33
+     tr.add_user("Hi, I'd like to book an appointment for next week.")
34
+     print("caller: \"Hi, I'd like to book an appointment for next week.\"")
35
+     # The model hallucinates a name + DOB the caller never gave.
36
+     name, dob = "John Doe", "1990-01-01"
37
+     DB.append({"name": name, "dob": dob})  # straight to the DB
38
+     print(f"agent -> register_patient(name={name!r}, dob={dob!r})")
39
+     print(f"RESULT : committed fabricated record. DB now: {DB[-1]}")
40
+
41
+
42
+ def scenario_firewalled():
43
+     DB.clear()
44
+     banner("RIGHT — same agent, behind saidso")
45
+     tr = Transcript()
46
+     tr.add_user("Hi, I'd like to book an appointment for next week.")
47
+     print("caller: \"Hi, I'd like to book an appointment for next week.\"")
48
+
49
+     with call_context(tr):
50
+         out = register_patient(name="John Doe", dob="1990-01-01")
51
+     assert isinstance(out, SteerBack)
52
+     print("agent -> register_patient(name='John Doe', dob='1990-01-01')")
53
+     print("BLOCKED: body never ran. DB is still", DB)
54
+     print(f"steer-back to agent: \"{out.message}\"")
55
+
56
+     # The agent re-asks; the caller answers; now it's grounded.
57
+     print("\n-- agent re-asks, caller answers --")
58
+     tr.add_agent("I didn't catch your name and date of birth — could you tell me?")
59
+     tr.add_user("Sure, it's Maria Gomez, born January first, nineteen ninety.")
60
+     print("caller: \"Sure, it's Maria Gomez, born January first, nineteen ninety.\"")
61
+
62
+     log = AttestationLog()
63
+     with call_context(tr, ledger=log, call_id="call-42"):
64
+         out = register_patient(name="Maria Gomez", dob="1990-01-01")
65
+     print("agent -> register_patient(name='Maria Gomez', dob='1990-01-01')")
66
+     print(f"COMMITTED: {out}")
67
+     print("\nprovenance receipt:")
68
+     for arg in log.records[0].to_dict()["args"]:
69
+         span = arg["span"]
70
+         print(
71
+             f" - {arg['arg']}={arg['value']!r} via {arg['policy']} "
72
+             f"(conf {arg['confidence']}) grounded by turn #{span['turn_id']}: "
73
+             f"\"{span['text']}\""
74
+         )
75
+
76
+
77
+ if __name__ == "__main__":
78
+     scenario_naked_agent()
79
+     scenario_firewalled()
80
+     print()