ai-assert 0.1.0__tar.gz

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Kaan Tahti

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,254 @@
Metadata-Version: 2.4
Name: ai-assert
Version: 0.1.0
Summary: Runtime constraint verification for AI outputs. Zero dependencies.
Author-email: Kaan Tahti <kaantahti@users.noreply.github.com>
License: MIT
Project-URL: Homepage, https://github.com/kaantahti/ai-assert
Project-URL: Repository, https://github.com/kaantahti/ai-assert
Project-URL: Issues, https://github.com/kaantahti/ai-assert/issues
Keywords: ai,llm,validation,constraints,verification,retry,testing,quality
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# ai_assert

**Runtime constraint verification for AI outputs. 278 lines. Zero dependencies.**

```python
from ai_assert import ai_assert, valid_json, max_length, contains

result = ai_assert(
    prompt="Return a JSON object with a 'greeting' key",
    constraints=[valid_json(), max_length(200), contains("hello")],
    generate_fn=my_llm,  # any function: str → str
    max_retries=3,
)

print(result.output)    # guaranteed valid JSON, ≤200 chars, contains "hello"
print(result.passed)    # True
print(result.attempts)  # 1–4 (retried with feedback until constraints pass)
```

## The Problem

LLMs don't reliably follow instructions. You ask for JSON, you get markdown. You ask for 100 words, you get 500. You ask for a list of 5 items, you get 7.

Every AI application handles this with ad-hoc validation scattered across the codebase. Or worse — hopes for the best.

## The Solution

`ai_assert` is a universal **check → score → retry** loop:

1. **Generate**: Call your LLM (any model, any provider)
2. **Check**: Run every constraint (multiplicative gate — ALL must pass)
3. **Retry**: If any constraint fails, feed back failure details and regenerate
4. **Return**: Verified output with full audit trail

```
Prompt → LLM → Check all constraints → All pass? → Return ✓
                       ↓ (any fail)
        Build feedback prompt → Retry (up to max_retries)
```
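The loop above fits in a few lines. A minimal sketch, assuming a `Constraint` holding a `check_fn` that returns `(passed, score, message)` — these names mirror the README's custom-constraint example but are not the package's actual internals:

```python
# Sketch of the check → score → retry loop (illustrative, not ai_assert's code).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    name: str
    check_fn: Callable[[str], tuple[bool, float, str]]  # (passed, score, message)

def assert_loop(prompt, constraints, generate_fn, max_retries=3):
    current = prompt
    for attempt in range(1, max_retries + 2):  # initial attempt + retries
        output = generate_fn(current)
        results = [(c.name, *c.check_fn(output)) for c in constraints]
        if all(passed for _, passed, _, _ in results):
            return output, attempt
        # Feed the failure messages back into the next prompt.
        feedback = "; ".join(msg for _, passed, _, msg in results if not passed)
        current = f"{prompt}\nPrevious attempt failed: {feedback}. Fix this."
    return output, attempt

# A deliberately flaky "model": too long on the first call, compliant after.
calls = []
def fake_llm(p):
    calls.append(p)
    return "too long " * 10 if len(calls) == 1 else "ok"

short = Constraint("max_length(20)", lambda s: (len(s) <= 20, 0.9, f"{len(s)} chars"))
out, attempts = assert_loop("Say ok briefly", [short], fake_llm)
print(out, attempts)  # ok 2 — the second attempt saw the "90 chars" feedback
```

The feedback prompt is what makes the retry directed rather than blind: the second call receives the exact failure message from the first.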

## Key Properties

- **Zero dependencies** — stdlib only (`dataclasses`, `typing`, `json`, `functools`)
- **Model-agnostic** — works with OpenAI, Anthropic, local models, anything with `str → str`
- **278 lines** — small enough to read in one sitting, audit in an hour
- **Multiplicative gate** — a zero in ANY constraint = failure (not averaged away)
- **Continuous scoring** — each check returns a score in `[0, 1)`, never 1.0
- **Full audit trail** — every attempt, every check result, preserved in `result.history`

## Install

```bash
pip install ai-assert
```

Or just copy `ai_assert.py` into your project. It's one file with zero dependencies.

## Quick Start

### 1. Define your generator

Any function that takes a `str` and returns a `str`:

```python
from openai import OpenAI

client = OpenAI()

def my_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

### 2. Declare constraints

```python
import json

from ai_assert import valid_json, max_length, min_length, contains, not_contains, matches_schema, custom

constraints = [
    valid_json(),                                   # must be parseable JSON
    max_length(500),                                # at most 500 chars
    matches_schema({"required": ["name", "age"]}),  # must have these keys
    not_contains("```"),                            # no markdown code fences
    custom("positive_age", lambda s: json.loads(s).get("age", 0) > 0,
           fail_msg="Age must be positive"),        # any boolean function
]
```

### 3. Call ai_assert

```python
import json

from ai_assert import ai_assert

result = ai_assert(
    prompt="Generate a user profile as JSON with name and age.",
    constraints=constraints,
    generate_fn=my_llm,
    max_retries=3,
)

if result.passed:
    user = json.loads(result.output)  # safe — guaranteed valid
    print(f"Got {user['name']}, age {user['age']}")
else:
    print(f"Failed after {result.attempts} attempts")
    for check in result.checks:
        if not check.passed:
            print(f"  ✗ {check.name}: {check.message}")
```

### 4. Or use the decorator

```python
from ai_assert import reliable, valid_json, max_length

@reliable(constraints=[valid_json(), max_length(500)], max_retries=3)
def generate_profile(prompt: str) -> str:
    return my_llm(prompt)

result = generate_profile("Generate a user profile as JSON.")
# result is an AiAssertResult — same interface as ai_assert()
```
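A decorator like `@reliable` is a thin wrapper around the same loop. A minimal sketch of the pattern (names and the simplified `check(output) → (passed, message)` shape are assumptions, not the package's implementation):

```python
import functools

def reliable_sketch(constraints, max_retries=3):
    # Hypothetical stand-in for @reliable: wraps a str → str function
    # in a check-and-retry loop.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt):
            current = prompt
            for _ in range(max_retries + 1):
                output = fn(current)
                failures = [msg for check in constraints
                            for ok, msg in [check(output)] if not ok]
                if not failures:
                    return output
                current = f"{prompt}\nFix: {'; '.join(failures)}"
            return output
        return wrapper
    return decorator

@reliable_sketch([lambda s: (s.isupper(), "must be uppercase")])
def shout(prompt):
    # Deterministic stand-in "model": uppercases the last word it is given.
    return prompt.split()[-1].upper()

print(shout("please say hi"))  # HI
```

The decorator changes nothing about the loop itself; it only moves the constraint list from the call site to the function definition.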

## Built-in Constraints

| Constraint | Description |
|---|---|
| `max_length(n)` | Output ≤ n characters |
| `min_length(n)` | Output ≥ n characters |
| `contains(s)` | Output contains substring s |
| `not_contains(s)` | Output does NOT contain substring s |
| `valid_json()` | Output is parseable JSON |
| `matches_schema(s)` | Output is JSON with required keys |
| `custom(name, fn)` | Any `str → bool` function |

### Custom constraints

```python
from ai_assert import Constraint

def word_count_between(min_w, max_w):
    def check(output):
        count = len(output.split())
        if min_w <= count <= max_w:
            return True, 0.99, f"{count} words"
        return False, 0.0, f"{count} words, need {min_w}-{max_w}"
    return Constraint(name=f"word_count({min_w}-{max_w})", check_fn=check)
```
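The inner check function is plain Python, so it can be exercised on its own before being wired into a `Constraint` (reproduced standalone here so it runs without the package installed):

```python
# Standalone copy of the check above — no ai_assert import needed.
def word_count_check(min_w, max_w):
    def check(output):
        count = len(output.split())
        if min_w <= count <= max_w:
            return True, 0.99, f"{count} words"
        return False, 0.0, f"{count} words, need {min_w}-{max_w}"
    return check

check = word_count_check(2, 5)
print(check("three word reply"))  # (True, 0.99, '3 words')
print(check("one"))               # (False, 0.0, '1 words, need 2-5')
```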

## The AiAssertResult

```python
@dataclass
class AiAssertResult:
    output: str                # the final output
    passed: bool               # did all constraints pass?
    checks: list[CheckResult]  # per-constraint results
    attempts: int              # how many tries it took
    history: list[dict]        # full audit trail of all attempts

    @property
    def composite_score(self) -> float:  # average score, always < 1.0
        ...
```

Each `CheckResult`:
```python
@dataclass
class CheckResult:
    name: str      # constraint name
    passed: bool   # pass/fail
    score: float   # continuous in [0, 1), never 1.0
    message: str   # human-readable explanation
```
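The difference between the averaged `composite_score` and the pass/fail gate shows up with plain numbers. A sketch using the documented semantics (not the package's code):

```python
# Averaged score vs. the multiplicative pass gate: one hard failure can
# still leave a decent-looking average, which is why ai_assert gates on
# all-pass rather than on the mean.
checks = [
    ("valid_json", True, 0.99),
    ("max_length(500)", True, 0.95),
    ("contains('name')", False, 0.0),  # hard failure
]
composite = sum(score for _, _, score in checks) / len(checks)
passed = all(ok for _, ok, _ in checks)
print(round(composite, 3))  # 0.647 — looks acceptable in isolation
print(passed)               # False — the gate still rejects the output
```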

## Benchmark Results

Tested on [IFEval](https://arxiv.org/abs/2311.07911) (541 prompts, 25 constraint types):

| Metric | Baseline (single-pass) | With ai_assert | Improvement |
|---|---|---|---|
| Prompt-level accuracy | 69.3% | 76.2% | **+6.9pp** |
| Constraint-level accuracy | 77.5% | 82.5% | **+5.0pp** |

- 30 prompts rescued from failure by the retry mechanism
- All improvements from runtime verification — no model changes

## Design Decisions

**Why multiplicative, not averaged?** If your JSON is invalid, it doesn't matter that the length was perfect. A zero in any dimension is system failure. This is the multiplicative gate — `all(c.passed for c in checks)`.

**Why scores in [0, 1) and not [0, 1]?** No verification achieves perfect fidelity. The score participates in correctness without claiming identity with it. This prevents false confidence in downstream aggregation.

**Why retry with feedback, not just Best-of-N?** Feedback-directed retry is more sample-efficient: the failure messages tell the model exactly what to fix, while Best-of-N generates blindly N times and picks the best. (Both approaches compose — you can use ai_assert inside a Best-of-N loop.)

**Why zero dependencies?** So you can drop it into any project. No framework lock-in. No transitive dependency hell. Copy the file and go.

## Works With

ai_assert works with any `str → str` function. Tested with:

- **OpenAI** (GPT-4o, GPT-4o-mini)
- **Anthropic** (Claude 3.5 Sonnet)
- **Local models** (Ollama, vLLM, llama.cpp)
- **Any HTTP API** (wrap the call in a function)

See [examples/basic_usage.py](examples/basic_usage.py) for integration code.
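For a local model, the wrapper is a few lines of stdlib HTTP. A sketch for an Ollama server (request/response fields follow Ollama's documented `/api/generate` endpoint; model name and host are placeholders to adjust for your setup):

```python
# Sketch: wrapping a local Ollama server as a str → str generator.
# Stdlib only, so it stays consistent with ai_assert's zero-dependency style.
import json
import urllib.request

def ollama_llm(prompt: str, model: str = "llama3",
               host: str = "http://localhost:11434") -> str:
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Pass `ollama_llm` (or a `functools.partial` with your model pinned) as `generate_fn` and the retry loop works unchanged.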

## Compared To

| Feature | ai_assert | Guardrails AI | Instructor | OpenAI Structured Outputs |
|---|---|---|---|---|
| Zero dependencies | ✅ | ❌ | ❌ | ❌ |
| Semantic constraints | ✅ | ❌ | ❌ | ❌ |
| Model-agnostic | ✅ | ✅ | ❌ (OpenAI) | ❌ (OpenAI) |
| Retry with feedback | ✅ | ✅ | ✅ | ❌ |
| Continuous scoring | ✅ | ❌ | ❌ | ❌ |
| Lines of code | 278 | ~15,000+ | ~5,000+ | N/A (server-side) |

## Contributing

Issues and PRs welcome. The codebase is 278 lines — reading the entire thing takes less time than reading this README.

## License

MIT — see [LICENSE](LICENSE).