ai-assert 0.1.0__tar.gz
- ai_assert-0.1.0/LICENSE +21 -0
- ai_assert-0.1.0/PKG-INFO +254 -0
- ai_assert-0.1.0/README.md +228 -0
- ai_assert-0.1.0/ai_assert.egg-info/PKG-INFO +254 -0
- ai_assert-0.1.0/ai_assert.egg-info/SOURCES.txt +9 -0
- ai_assert-0.1.0/ai_assert.egg-info/dependency_links.txt +1 -0
- ai_assert-0.1.0/ai_assert.egg-info/top_level.txt +1 -0
- ai_assert-0.1.0/ai_assert.py +277 -0
- ai_assert-0.1.0/pyproject.toml +45 -0
- ai_assert-0.1.0/setup.cfg +4 -0
- ai_assert-0.1.0/tests/test_ai_assert.py +721 -0
ai_assert-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Kaan Tahti

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
ai_assert-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,254 @@
Metadata-Version: 2.4
Name: ai-assert
Version: 0.1.0
Summary: Runtime constraint verification for AI outputs. Zero dependencies.
Author-email: Kaan Tahti <kaantahti@users.noreply.github.com>
License: MIT
Project-URL: Homepage, https://github.com/kaantahti/ai-assert
Project-URL: Repository, https://github.com/kaantahti/ai-assert
Project-URL: Issues, https://github.com/kaantahti/ai-assert/issues
Keywords: ai,llm,validation,constraints,verification,retry,testing,quality
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# ai_assert

**Runtime constraint verification for AI outputs. 278 lines. Zero dependencies.**

```python
from ai_assert import ai_assert, valid_json, max_length, contains

result = ai_assert(
    prompt="Return a JSON object with a 'greeting' key",
    constraints=[valid_json(), max_length(200), contains("hello")],
    generate_fn=my_llm,  # any function: str → str
    max_retries=3,
)

print(result.output)    # guaranteed valid JSON, ≤200 chars, contains "hello"
print(result.passed)    # True
print(result.attempts)  # 1–4 (retried with feedback until constraints pass)
```

## The Problem

LLMs don't reliably follow instructions. You ask for JSON, you get markdown. You ask for 100 words, you get 500. You ask for a list of 5 items, you get 7.

Every AI application handles this with ad-hoc validation scattered across the codebase. Or worse — hopes for the best.

## The Solution

`ai_assert` is a universal **check → score → retry** loop:

1. **Generate**: Call your LLM (any model, any provider)
2. **Check**: Run every constraint (multiplicative gate — ALL must pass)
3. **Retry**: If any constraint fails, feed back failure details and regenerate
4. **Return**: Verified output with full audit trail

```
Prompt → LLM → Check all constraints → All pass? → Return ✓
                          ↓ (any fail)
          Build feedback prompt → Retry (up to max_retries)
```
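
The loop above can be sketched in a few lines of plain Python. This is an illustration only (stub checks returning `(ok, message)` pairs and a stub generator, not the package's actual code):

```python
# Stand-alone sketch of the check → retry loop. ai_assert's real
# implementation differs in details (scores, history, feedback format).
def run_with_retries(prompt, checks, generate_fn, max_retries=3):
    current = prompt
    for attempt in range(1, max_retries + 2):  # 1 initial try + max_retries
        output = generate_fn(current)
        failures = [msg for ok, msg in (c(output) for c in checks) if not ok]
        if not failures:
            return output, attempt, True
        # Feed the failure details back into the next prompt.
        current = f"{prompt}\n\nYour last answer failed: {'; '.join(failures)}"
    return output, attempt, False

# Stub generator: fails once, then complies.
calls = {"n": 0}

def stub(prompt):
    calls["n"] += 1
    return "not json" if calls["n"] == 1 else '{"greeting": "hello"}'

checks = [lambda s: (s.startswith("{"), "must be JSON")]
output, attempts, ok = run_with_retries("Return JSON", checks, stub)
```

The stub fails the first attempt, receives the feedback prompt, and passes on the second, which is exactly the rescue behavior the retry mechanism provides.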

## Key Properties

- **Zero dependencies** — stdlib only (`dataclasses`, `typing`, `json`, `functools`)
- **Model-agnostic** — works with OpenAI, Anthropic, local models, anything with `str → str`
- **278 lines** — small enough to read in one sitting, audit in an hour
- **Multiplicative gate** — a zero in ANY constraint = failure (not averaged away)
- **Continuous scoring** — each check returns a score in `[0, 1)`, never 1.0
- **Full audit trail** — every attempt, every check result, preserved in `result.history`

## Install

```bash
pip install ai-assert
```

Or just copy `ai_assert.py` into your project. It's one file with zero dependencies.

## Quick Start

### 1. Define your generator

Any function that takes a `str` and returns a `str`:

```python
from openai import OpenAI

client = OpenAI()

def my_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
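
Because `generate_fn` is just a `str → str` callable, a deterministic stub also works, with no API key required. This `fake_llm` is a hypothetical stand-in for offline tests, not part of the package:

```python
CANNED = '{"name": "Ada", "age": 36}'

def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for a real model call; handy in unit tests.
    return CANNED
```

Swap `fake_llm` for `my_llm` once you are ready to hit a real model.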

### 2. Declare constraints

```python
import json

from ai_assert import valid_json, max_length, min_length, contains, not_contains, matches_schema, custom

constraints = [
    valid_json(),                                   # must be parseable JSON
    max_length(500),                                # at most 500 chars
    matches_schema({"required": ["name", "age"]}),  # must have these keys
    not_contains("```"),                            # no markdown code fences
    custom("positive_age", lambda s: json.loads(s).get("age", 0) > 0,
           fail_msg="Age must be positive"),        # any boolean function
]
```

### 3. Call ai_assert

```python
import json

from ai_assert import ai_assert

result = ai_assert(
    prompt="Generate a user profile as JSON with name and age.",
    constraints=constraints,
    generate_fn=my_llm,
    max_retries=3,
)

if result.passed:
    user = json.loads(result.output)  # safe — guaranteed valid
    print(f"Got {user['name']}, age {user['age']}")
else:
    print(f"Failed after {result.attempts} attempts")
    for check in result.checks:
        if not check.passed:
            print(f"  ✗ {check.name}: {check.message}")
```

### 4. Or use the decorator

```python
from ai_assert import reliable, valid_json, max_length

@reliable(constraints=[valid_json(), max_length(500)], max_retries=3)
def generate_profile(prompt: str) -> str:
    return my_llm(prompt)

result = generate_profile("Generate a user profile as JSON.")
# result is an AiAssertResult — same interface as ai_assert()
```

## Built-in Constraints

| Constraint | Description |
|---|---|
| `max_length(n)` | Output ≤ n characters |
| `min_length(n)` | Output ≥ n characters |
| `contains(s)` | Output contains substring s |
| `not_contains(s)` | Output does NOT contain substring s |
| `valid_json()` | Output is parseable JSON |
| `matches_schema(s)` | Output is JSON with required keys |
| `custom(name, fn)` | Any `str → bool` function |

### Custom constraints

```python
from ai_assert import Constraint

def word_count_between(min_w, max_w):
    def check(output):
        count = len(output.split())
        if min_w <= count <= max_w:
            return True, 0.99, f"{count} words"
        return False, 0.0, f"{count} words, need {min_w}-{max_w}"
    return Constraint(name=f"word_count({min_w}-{max_w})", check_fn=check)
```

## The AiAssertResult

```python
@dataclass
class AiAssertResult:
    output: str                # the final output
    passed: bool               # did all constraints pass?
    checks: list[CheckResult]  # per-constraint results
    attempts: int              # how many tries it took
    history: list[dict]        # full audit trail of all attempts

    @property
    def composite_score(self) -> float:  # average score, always < 1.0
        ...
```

Each `CheckResult`:

```python
@dataclass
class CheckResult:
    name: str      # constraint name
    passed: bool   # pass/fail
    score: float   # continuous in [0, 1), never 1.0
    message: str   # human-readable explanation
```

## Benchmark Results

Tested on [IFEval](https://arxiv.org/abs/2311.07911) (541 prompts, 25 constraint types):

| Metric | Baseline (single-pass) | With ai_assert | Improvement |
|---|---|---|---|
| Prompt-level accuracy | 69.3% | 76.2% | **+6.9pp** |
| Constraint-level accuracy | 77.5% | 82.5% | **+5.0pp** |

- 30 prompts rescued from failure by the retry mechanism
- All improvements from runtime verification — no model changes

## Design Decisions

**Why multiplicative, not averaged?** If your JSON is invalid, it doesn't matter that the length was perfect. A zero in any dimension is system failure. This is the multiplicative gate — `all(c.passed for c in checks)`.
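
The difference between gating and averaging is easy to demonstrate in isolation (a self-contained sketch with made-up scores, not the package's internals):

```python
# Made-up per-constraint scores: JSON validity failed outright,
# while the length and wording checks scored well.
scores = [0.0, 0.95, 0.99]
passed = [s > 0.0 for s in scores]

average = sum(scores) / len(scores)  # ≈ 0.65: looks "mostly fine"
gate = all(passed)                   # False: one hard failure sinks the output
```

An averaged score hides the fatal failure; the gate does not.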

**Why scores in [0, 1) and not [0, 1]?** No automated check verifies an output perfectly, so a score is treated as evidence of correctness rather than proof of it. Capping scores below 1.0 prevents false confidence when they are aggregated downstream.

**Why retry with feedback, not just Best-of-N?** Feedback-directed retry is more sample-efficient: the failure messages tell the model exactly what to fix, while Best-of-N generates blindly N times and picks the best. (Both approaches compose — you can use ai_assert inside a Best-of-N loop.)

**Why zero dependencies?** So you can drop it into any project. No framework lock-in. No transitive dependency hell. Copy the file and go.

## Works With

ai_assert works with any `str → str` function. Tested with:

- **OpenAI** (GPT-4o, GPT-4o-mini)
- **Anthropic** (Claude 3.5 Sonnet)
- **Local models** (Ollama, vLLM, llama.cpp)
- **Any HTTP API** (wrap the call in a function)

See [examples/basic_usage.py](examples/basic_usage.py) for integration code.
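
A chat-style API adapts to the `str → str` shape with a small closure. A sketch, where `send` is a hypothetical transport you would replace with your actual HTTP or SDK call:

```python
def make_generate_fn(send):
    """Adapt a chat-style API (messages in, reply dict out) to str → str."""
    def generate(prompt: str) -> str:
        reply = send([{"role": "user", "content": prompt}])
        return reply["content"]
    return generate

# Stand-in transport for demonstration; a real one would perform the HTTP call.
def echo_send(messages):
    return {"role": "assistant", "content": messages[-1]["content"].upper()}

gen = make_generate_fn(echo_send)
```

Pass `gen` as `generate_fn` and the rest of the pipeline never needs to know which provider is behind it.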

## Compared To

| Feature | ai_assert | Guardrails AI | Instructor | OpenAI Structured Outputs |
|---|---|---|---|---|
| Zero dependencies | ✅ | ❌ | ❌ | ❌ |
| Semantic constraints | ✅ | ❌ | ❌ | ❌ |
| Model-agnostic | ✅ | ✅ | ❌ (OpenAI) | ❌ (OpenAI) |
| Retry with feedback | ✅ | ✅ | ✅ | ❌ |
| Continuous scoring | ✅ | ❌ | ❌ | ❌ |
| Lines of code | 278 | ~15,000+ | ~5,000+ | N/A (server-side) |

## Contributing

Issues and PRs welcome. The codebase is 278 lines — reading the entire thing takes less time than reading this README.

## License

MIT — see [LICENSE](LICENSE).
ai_assert-0.1.0/README.md
ADDED
@@ -0,0 +1,228 @@
(content identical to the README embedded in PKG-INFO above)