spec-drift 0.1.0__tar.gz
- spec_drift-0.1.0/.github/workflows/ci.yml +61 -0
- spec_drift-0.1.0/.gitignore +14 -0
- spec_drift-0.1.0/LICENSE +21 -0
- spec_drift-0.1.0/PKG-INFO +359 -0
- spec_drift-0.1.0/README.md +331 -0
- spec_drift-0.1.0/pyproject.toml +57 -0
- spec_drift-0.1.0/spec_drift/__init__.py +751 -0
- spec_drift-0.1.0/tests/__init__.py +0 -0
- spec_drift-0.1.0/tests/test_spec_drift.py +741 -0
spec_drift-0.1.0/.github/workflows/ci.yml
ADDED
@@ -0,0 +1,61 @@
name: CI

on:
  push:
    branches: ["main", "master"]
    paths:
      - "products/spec-drift/**"
  pull_request:
    paths:
      - "products/spec-drift/**"

jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: products/spec-drift

    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install package + dev dependencies
        run: pip install -e ".[dev]"

      - name: Run tests
        run: pytest tests/ -v --tb=short

  publish:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/master' && github.event_name == 'push'
    permissions:
      id-token: write  # OIDC trusted publishing — no stored API tokens

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Build distribution
        working-directory: products/spec-drift
        run: |
          pip install hatchling
          python -m hatchling build

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: products/spec-drift/dist/
spec_drift-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 BuildWorld

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
spec_drift-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,359 @@
Metadata-Version: 2.4
Name: spec-drift
Version: 0.1.0
Summary: Semantic specification drift detector for LLM outputs — catches semantic violations Pydantic cannot see.
Project-URL: Homepage, https://github.com/Rowusuduah/spec-drift
Project-URL: Repository, https://github.com/Rowusuduah/spec-drift
Project-URL: Issues, https://github.com/Rowusuduah/spec-drift/issues
Author-email: Richmond Owusu Duah <Rowusuduah@users.noreply.github.com>
License: MIT
License-File: LICENSE
Keywords: ai,ci-cd,compliance,drift,llm,monitoring,pydantic,quality,semantic,specification
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# spec-drift

**Your LLM outputs pass Pydantic. That's not enough.**

spec-drift detects when your LLM outputs drift semantically from their declared specification — even when structural validation passes. It monitors continuous semantic compliance in production, generates drift reports, and provides a CI gate for semantic regression testing.

---

## The Problem

Every team shipping LLM features uses Pydantic or JSON Schema to validate output structure. These tools are excellent — and completely blind to semantic drift.

Consider what happens in production LLM systems over time:

**Silent model updates:** Your LLM provider silently updates the underlying model. The API contract (field names, types, schema) doesn't change. But the distribution of values inside those fields shifts. Your "sentiment" field starts returning "ambivalent" where it previously returned "neutral." Your "risk_level" classification shifts its decision boundary. Pydantic sees nothing.

**Prompt erosion:** A prompt is modified through six iterations of "just a small tweak." Each tweak passes regression tests individually. But cumulatively, the semantic profile of outputs drifts. The "reasoning" field that used to average 120 words now averages 30. Your validation still passes.

**Input distribution shift:** A new user cohort or marketing campaign brings different input patterns. The same model, same prompt, same schema — but outputs drift from the spec because they were calibrated for a different input distribution.

spec-drift catches all of these. Pydantic catches none of them.

---

## The Biblical Foundation

*"Moses then said to Aaron, 'This is what the Lord spoke of when he said: Among those who approach me I will be proved holy.'"* — Leviticus 10:3

Nadab and Abihu offered "unauthorized fire" — structurally correct (fire), instrumentally correct (censers), personally authorized (priests). But the semantic specification was violated. Every structural check passed. The semantic compliance check failed.

spec-drift applies this principle to LLM outputs: structural validation is necessary but not sufficient. Semantic specification must be declared and continuously monitored.

*BibleWorld build — PAT-037, Pivot_Score 8.63*

---

## Installation

```bash
pip install spec-drift
```

**Requirements:** Python 3.10+, Pydantic v2

---

## Quick Start

### 1. Declare a semantic spec

```python
from pydantic import BaseModel
from spec_drift import spec, SemanticConstraint

@spec(
    category=SemanticConstraint.from_authorized_values(
        ["positive", "negative", "neutral"],
        tolerance=0.02,       # max 2% outputs outside authorized set
        alert_threshold=0.10  # alert if >10% observations violate
    ),
    reasoning=SemanticConstraint.from_length_bounds(
        min_words=30,
        max_words=300,
        alert_threshold=0.15
    ),
    score=SemanticConstraint.from_distribution(
        mean=6.5,
        std=2.0,
        drift_threshold=1.0,  # alert if mean shifts >1 sigma
        alert_threshold=0.20
    )
)
class SentimentAnalysis(BaseModel):
    category: str
    reasoning: str
    score: float
```

### 2. Wrap your LLM function

```python
from spec_drift import DriftMonitor
import anthropic

client = anthropic.Anthropic()

monitor = DriftMonitor(
    spec=SentimentAnalysis,
    db_path="./spec_drift.db",
    model_version="claude-3-5-haiku-20241022",
)

@monitor.watch
def analyze_sentiment(text: str) -> SentimentAnalysis:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": f"Analyze: {text}"}]
    )
    return SentimentAnalysis.model_validate_json(response.content[0].text)
```

### 3. Check drift

```bash
# Terminal drift report (last 7 days)
spec-drift check --spec my_module.SentimentAnalysis --since 7d

# CI gate: fail build if >20% semantic violations
spec-drift ci \
  --spec my_module.SentimentAnalysis \
  --test-batch data/ci_batch.jsonl \
  --threshold 0.20 \
  --exit-code
```

---

## API Reference

### `@spec(**constraints)`

Attaches semantic constraints to a Pydantic model class.

```python
@spec(field_name=SemanticConstraint.from_authorized_values([...]))
class MyModel(BaseModel):
    field_name: str
```

### `SemanticConstraint`

#### `SemanticConstraint.from_authorized_values(authorized, tolerance, alert_threshold)`
Field values must be drawn from the authorized list (within tolerance).
- `authorized`: list of permitted values
- `tolerance`: float, max fraction of outputs outside authorized set before constraint flags
- `alert_threshold`: float, fraction of rolling observations before alert fires
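
As a rough illustration of the tolerance idea described above, the following standalone sketch computes the fraction of observed values that fall outside an authorized set and compares it with a tolerance. It does not use spec-drift itself; the helper name `outside_fraction`, the sample batch, and the strict `>` comparison are assumptions made for the example.

```python
from typing import Iterable, Sequence


def outside_fraction(values: Sequence[str], authorized: Iterable[str]) -> float:
    """Fraction of observed values that are not in the authorized set."""
    allowed = set(authorized)
    if not values:
        return 0.0  # no observations, so nothing is outside the set
    return sum(1 for v in values if v not in allowed) / len(values)


# Hypothetical batch: 2 of 50 outputs use an unauthorized label.
observed = ["positive"] * 30 + ["neutral"] * 18 + ["ambivalent"] * 2
rate = outside_fraction(observed, ["positive", "negative", "neutral"])

# Assumption: the constraint flags only when the rate is strictly above tolerance.
flagged = rate > 0.02
```

Here 4% of the batch is outside the set, so a 2% tolerance would be exceeded under the assumed comparison.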

#### `SemanticConstraint.from_length_bounds(min_words, max_words, alert_threshold)`
String field word count must be within [min_words, max_words].
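
A word-count bound of this kind can be sketched in a few lines. The version below is only an illustration: it assumes words are whitespace-delimited tokens, which may differ from how spec-drift counts them, and `within_length_bounds` is a hypothetical helper name.

```python
def within_length_bounds(text: str, min_words: int, max_words: int) -> bool:
    """True when the whitespace-delimited word count lies in [min_words, max_words]."""
    n_words = len(text.split())  # assumption: a "word" is any whitespace-separated token
    return min_words <= n_words <= max_words


too_short = "Too short to count as reasoning."  # 6 words
long_enough = " ".join(["word"] * 45)           # 45 words

short_ok = within_length_bounds(too_short, min_words=30, max_words=300)
long_ok = within_length_bounds(long_enough, min_words=30, max_words=300)
```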

#### `SemanticConstraint.from_distribution(mean, std, drift_threshold, alert_threshold)`
Numeric field should follow a distribution near (mean, std). Alerts if observed mean shifts by more than drift_threshold standard deviations.
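
One plausible reading of "shifts by more than drift_threshold standard deviations" is to express the gap between the observed mean and the declared mean in units of the declared standard deviation. The sketch below follows that reading; it is not taken from the package source, and the function name and sample scores are invented for illustration.

```python
from statistics import fmean
from typing import Sequence


def mean_shift_in_sigmas(values: Sequence[float], expected_mean: float, expected_std: float) -> float:
    """Absolute distance between the observed mean and the expected mean, in units of expected_std."""
    if expected_std <= 0:
        raise ValueError("expected_std must be positive")
    return abs(fmean(values) - expected_mean) / expected_std


# Hypothetical recent scores that sit well above the declared mean of 6.5.
scores = [9.0, 8.5, 9.5, 8.0, 9.0]
shift = mean_shift_in_sigmas(scores, expected_mean=6.5, expected_std=2.0)

# Assumption: drift is reported when the shift strictly exceeds drift_threshold.
drifted = shift > 1.0
```

With these numbers the observed mean is 8.8, a shift of about 1.15 declared standard deviations.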

#### `SemanticConstraint.from_pattern(regex, min_match_rate, alert_threshold)`
String field should match the regex pattern at min_match_rate frequency.
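
A match-rate check can be approximated as the share of observed strings that match the pattern. The sketch below assumes a full-string match (`re.fullmatch`); the package may instead search for the pattern anywhere in the string. The helper name, the ticket-ID pattern, and the sample values are hypothetical.

```python
import re
from typing import Sequence


def match_rate(values: Sequence[str], regex: str) -> float:
    """Share of values that match the regex in full."""
    if not values:
        return 1.0  # assumption: with no observations there is nothing to violate
    pattern = re.compile(regex)
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


# Hypothetical field that should look like "TICKET-1234".
ids = ["TICKET-0001", "TICKET-0042", "ticket-7", "TICKET-1234"]
rate = match_rate(ids, r"TICKET-\d{4}")

# Assumption: the constraint is violated when the rate falls below min_match_rate.
below_minimum = rate < 0.9
```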

### `DriftMonitor(spec, db_path, model_version, prompt_hash, alert_callback)`

Runtime monitor for semantic specification compliance.

#### `.watch` (decorator)
Wraps an LLM function to automatically observe its return value.
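
The decorator pattern behind a `.watch`-style method can be sketched without the library. `MiniMonitor` below is a hypothetical stand-in, not `DriftMonitor`: it only records outputs in a list, and its `observe` hands the value back unchanged so the wrapped function behaves as before.

```python
import functools
from typing import Any, Callable, List


class MiniMonitor:
    """Hypothetical stand-in for a monitor; keeps observed outputs in memory."""

    def __init__(self) -> None:
        self.seen: List[Any] = []

    def observe(self, output: Any) -> Any:
        # Record the output, then return it unchanged.
        self.seen.append(output)
        return output

    def watch(self, func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return self.observe(func(*args, **kwargs))

        return wrapper


monitor = MiniMonitor()


@monitor.watch
def fake_llm_call(text: str) -> dict:
    # Placeholder for a real model call.
    return {"category": "neutral", "echo": text}


result = fake_llm_call("hello")
```

Because the wrapper returns whatever `observe` returns, callers see the same value they would get from the undecorated function.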

#### `.observe(output) -> output`
Manually observe a Pydantic model instance. Returns the output unchanged.

#### `.drift_report(since_hours) -> dict`
Generate a semantic drift report for the last N hours.

```python
{
    "spec": "SentimentAnalysis",
    "period_hours": 168.0,
    "observations": 4523,
    "violation_rate": 0.0312,
    "severity": "low",
    "field_violation_rates": {
        "category": 0.0089,
        "reasoning": 0.0221,
        "score": 0.0002
    }
}
```
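
To make the shape of such a report concrete, here is a small sketch that aggregates per-field violation flags into overall and per-field rates. The input format (one dict of field name to boolean per observation) and the rule that an observation counts once if any field violates are assumptions, not the package's documented behavior. In the example report above, the overall rate (0.0312) happens to equal the sum of the three per-field rates, which is what this rule would give if no observation violated more than one field.

```python
from typing import Dict, List


def summarize(observations: List[Dict[str, bool]]) -> dict:
    """Aggregate per-field violation flags (True means the field violated its constraint)."""
    total = len(observations)
    if total == 0:
        return {"observations": 0, "violation_rate": 0.0, "field_violation_rates": {}}
    fields = sorted({name for obs in observations for name in obs})
    field_rates = {
        name: sum(1 for obs in observations if obs.get(name, False)) / total
        for name in fields
    }
    # Assumption: an observation counts as a violation if any of its fields violated.
    overall = sum(1 for obs in observations if any(obs.values())) / total
    return {
        "observations": total,
        "violation_rate": overall,
        "field_violation_rates": field_rates,
    }


# Hypothetical batch of four observations over two fields.
batch = [
    {"category": False, "reasoning": False},
    {"category": True, "reasoning": False},
    {"category": False, "reasoning": True},
    {"category": False, "reasoning": False},
]
report = summarize(batch)
```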

#### `run_ci_gate(monitor, test_outputs, threshold) -> (passed, report)`
Run a CI gate on a batch of test outputs. Returns (passed, report).
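
The gate logic can be pictured as a threshold test on a batch violation rate. The sketch below is not `run_ci_gate`: the function `gate`, its input (a list of per-output violation flags), the `<=` comparison, and the exit-code mapping are all assumptions chosen to mirror the `--threshold` and `--exit-code` flags shown in this README.

```python
from typing import Sequence, Tuple


def gate(violations: Sequence[bool], threshold: float) -> Tuple[bool, dict]:
    """Pass when the share of violating outputs does not exceed the threshold."""
    total = len(violations)
    rate = sum(violations) / total if total else 0.0
    passed = rate <= threshold  # assumption: a rate equal to the threshold still passes
    report = {"observations": total, "violation_rate": rate, "threshold": threshold}
    return passed, report


# Hypothetical batch: 3 of 20 test outputs violate the spec.
passed, report = gate([False] * 17 + [True] * 3, threshold=0.20)

# Assumption: a CI wrapper would turn the result into a process exit code.
exit_code = 0 if passed else 1
```

With 15% violations against a 20% threshold, the gate passes and the assumed exit code is 0.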

---

## CLI Reference

```bash
# Initialize spec-drift in a project
spec-drift init

# Calibrate a baseline from golden data
spec-drift calibrate \
  --spec my_module.SentimentAnalysis \
  --input-file data/golden_set.jsonl \
  --output baseline.db

# Drift report (table format, last 7 days)
spec-drift check \
  --spec my_module.SentimentAnalysis \
  --since 7d \
  --format table

# CI gate
spec-drift ci \
  --spec my_module.SentimentAnalysis \
  --test-batch data/ci_batch.jsonl \
  --threshold 0.20 \
  --exit-code

# Compare two model versions
spec-drift compare \
  --spec my_module.SentimentAnalysis \
  --baseline-a baseline_gpt4o.db \
  --baseline-b baseline_claude_haiku.db

# HTML report
spec-drift report \
  --spec my_module.SentimentAnalysis \
  --since 30d \
  --output report.html
```

---

## GitHub Action

```yaml
# .github/workflows/llm-spec-check.yml
name: LLM Semantic Spec Check

on: [push, pull_request]

jobs:
  spec-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: spec-drift/action@v1
        with:
          spec: my_module.SentimentAnalysis
          test-batch: data/ci_batch.jsonl
          threshold: '0.20'
```

---

## Severity Levels

| Violation Rate | Severity | Recommended Action |
|---|---|---|
| 0% | NONE | No action needed |
| < 5% | LOW | Monitor, no immediate action |
| 5-15% | MEDIUM | Investigate, create issue |
| 15-30% | HIGH | Rollback or prompt fix recommended |
| > 30% | CRITICAL | Immediate rollback |
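
The table can be read as a simple lookup from violation rate to severity label. The sketch below encodes it that way. The table leaves the shared boundaries ambiguous (15% appears in both the MEDIUM and HIGH ranges), so treating each range's lower bound as inclusive is an assumption, as is the function name `severity`.

```python
def severity(violation_rate: float) -> str:
    """Map a violation rate in [0, 1] to a severity label from the table."""
    if not 0.0 <= violation_rate <= 1.0:
        raise ValueError("violation_rate must be between 0 and 1")
    if violation_rate == 0.0:
        return "NONE"
    if violation_rate < 0.05:
        return "LOW"
    if violation_rate < 0.15:   # assumption: 5% itself counts as MEDIUM
        return "MEDIUM"
    if violation_rate <= 0.30:  # assumption: 15% counts as HIGH; the table puts only > 30% in CRITICAL
        return "HIGH"
    return "CRITICAL"


# The example drift report earlier lists a violation rate of 0.0312 with severity "low".
label = severity(0.0312)
```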

---

## Storage

spec-drift uses SQLite by default — zero infrastructure required.

```python
# Local development (default)
monitor = DriftMonitor(spec=MyModel, db_path="./spec_drift.db")

# PostgreSQL for production (coming in v0.2)
monitor = DriftMonitor(
    spec=MyModel,
    db_url="postgresql://user:pass@host/db"
)

# In-memory for testing
monitor = DriftMonitor(spec=MyModel, db_path=":memory:")
```
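
For readers unfamiliar with SQLite's `:memory:` mode, the following sketch shows how an observation store along these lines might work using only the standard library. The table name, columns, and `record` helper are hypothetical; the actual schema used by spec-drift is not shown in this README.

```python
import json
import sqlite3

# ":memory:" creates a database that lives only as long as this connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        spec_name TEXT NOT NULL,
        payload TEXT NOT NULL,      -- JSON-encoded model output
        violated INTEGER NOT NULL   -- 1 if any constraint was violated, else 0
    )
    """
)


def record(spec_name: str, payload: dict, violated: bool) -> None:
    """Store one observation (hypothetical schema)."""
    conn.execute(
        "INSERT INTO observations (spec_name, payload, violated) VALUES (?, ?, ?)",
        (spec_name, json.dumps(payload), int(violated)),
    )


record("MyModel", {"category": "neutral"}, False)
record("MyModel", {"category": "ambivalent"}, True)

count, violations = conn.execute(
    "SELECT COUNT(*), SUM(violated) FROM observations WHERE spec_name = ?",
    ("MyModel",),
).fetchone()
```

Swapping `":memory:"` for a file path would persist the same table to disk, which matches the local-development option above.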

---

## Roadmap

### v0.1 (this release)
- Core `@spec` decorator + `SemanticConstraint` DSL
- `DriftMonitor` with `.watch` and `.observe`
- SQLite observation store
- `run_ci_gate` function
- CLI: `check`, `ci`, `compare`

### v0.2
- PostgreSQL support
- Multi-field correlation monitoring
- Automatic model version detection (via LLM API response headers)
- Slack/PagerDuty alert integrations
- HTML drift reports

### v0.3
- LLM-judge semantic constraint evaluation (for complex, prose-level constraints)
- Baseline versioning with SemVer
- Team dashboard (hosted cloud option)
- Prometheus/Grafana metrics export

---

## Comparison

| Tool | Structural validation | Semantic spec monitoring | Production continuous | CI gate | Open source |
|------|---|---|---|---|---|
| Pydantic | YES | NO | NO | NO | YES |
| DeepEval | No (batch eval) | YES (point-in-time) | NO | YES | YES |
| Evidently | No (statistical drift) | NO | YES | NO | YES |
| Langfuse | NO | NO | YES (observability) | NO | YES |
| **spec-drift** | YES (via Pydantic) | **YES** | **YES** | **YES** | **YES** |

---

## Contributing

spec-drift is MIT licensed. Contributions welcome.

```bash
git clone https://github.com/bibleworld/spec-drift
cd spec-drift
pip install -e ".[dev]"
pytest tests/
```

---

## License

MIT — free to use, modify, and distribute.

---

*Built with BibleWorld — Pattern: Leviticus 10:1-3 (The Authorized Fire)*
*"Among those who approach me I will be proved holy." — Leviticus 10:3*