spec-drift 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,61 @@
+ name: CI
+
+ on:
+   push:
+     branches: ["main", "master"]
+     paths:
+       - "products/spec-drift/**"
+   pull_request:
+     paths:
+       - "products/spec-drift/**"
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     defaults:
+       run:
+         working-directory: products/spec-drift
+
+     strategy:
+       matrix:
+         python-version: ["3.10", "3.11", "3.12"]
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python ${{ matrix.python-version }}
+         uses: actions/setup-python@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+
+       - name: Install package + dev dependencies
+         run: pip install -e ".[dev]"
+
+       - name: Run tests
+         run: pytest tests/ -v --tb=short
+
+   publish:
+     needs: test
+     runs-on: ubuntu-latest
+     # trigger branches include both main and master, so gate on either
+     if: (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/master') && github.event_name == 'push'
+     permissions:
+       id-token: write  # OIDC trusted publishing — no stored API tokens
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Python
+         uses: actions/setup-python@v5
+         with:
+           python-version: "3.11"
+
+       - name: Build distribution
+         working-directory: products/spec-drift
+         run: |
+           pip install hatchling
+           python -m hatchling build
+
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
+         with:
+           packages-dir: products/spec-drift/dist/
@@ -0,0 +1,14 @@
+ __pycache__/
+ *.py[cod]
+ *.egg-info/
+ dist/
+ build/
+ .eggs/
+ *.spec_drift.db
+ spec_drift.db
+ *.db
+ .pytest_cache/
+ .coverage
+ htmlcov/
+ .venv/
+ venv/
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 BuildWorld
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,359 @@
+ Metadata-Version: 2.4
+ Name: spec-drift
+ Version: 0.1.0
+ Summary: Semantic specification drift detector for LLM outputs — catches semantic violations Pydantic cannot see.
+ Project-URL: Homepage, https://github.com/Rowusuduah/spec-drift
+ Project-URL: Repository, https://github.com/Rowusuduah/spec-drift
+ Project-URL: Issues, https://github.com/Rowusuduah/spec-drift/issues
+ Author-email: Richmond Owusu Duah <Rowusuduah@users.noreply.github.com>
+ License: MIT
+ License-File: LICENSE
+ Keywords: ai,ci-cd,compliance,drift,llm,monitoring,pydantic,quality,semantic,specification
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Topic :: Software Development :: Quality Assurance
+ Requires-Python: >=3.10
+ Requires-Dist: pydantic>=2.0
+ Provides-Extra: dev
+ Requires-Dist: black; extra == 'dev'
+ Requires-Dist: pytest-cov; extra == 'dev'
+ Requires-Dist: pytest>=7.0; extra == 'dev'
+ Requires-Dist: ruff; extra == 'dev'
+ Description-Content-Type: text/markdown
+
+ # spec-drift
+
+ **Your LLM outputs pass Pydantic. That's not enough.**
+
+ spec-drift detects when your LLM outputs drift semantically from their declared specification — even when structural validation passes. It continuously monitors semantic compliance in production, generates drift reports, and provides a CI gate for semantic regression testing.
+
+ ---
+
+ ## The Problem
+
+ Every team shipping LLM features uses Pydantic or JSON Schema to validate output structure. These tools are excellent — and completely blind to semantic drift.
+
+ Consider what happens in production LLM systems over time:
+
+ **Silent model updates:** Your LLM provider silently updates the underlying model. The API contract (field names, types, schema) doesn't change, but the distribution of values inside those fields shifts. Your "sentiment" field starts returning "ambivalent" where it previously returned "neutral." Your "risk_level" classification shifts its decision boundary. Pydantic sees nothing.
+
+ **Prompt erosion:** A prompt is modified through six iterations of "just a small tweak." Each tweak passes regression tests individually, but cumulatively the semantic profile of outputs drifts. The "reasoning" field that used to average 120 words now averages 30. Your validation still passes.
+
+ **Input distribution shift:** A new user cohort or marketing campaign brings different input patterns. Same model, same prompt, same schema — but outputs drift from the spec because it was calibrated for a different input distribution.
+
+ spec-drift catches all of these. Pydantic catches none of them.
+
+ ---
+
+ ## The Biblical Foundation
+
+ *"Moses then said to Aaron, 'This is what the Lord spoke of when he said: Among those who approach me I will be proved holy.'"* — Leviticus 10:3
+
+ Nadab and Abihu offered "unauthorized fire" — structurally correct (fire), instrumentally correct (censers), personally authorized (priests). But the semantic specification was violated. Every structural check passed. The semantic compliance check failed.
+
+ spec-drift applies this principle to LLM outputs: structural validation is necessary but not sufficient. Semantic specification must be declared and continuously monitored.
+
+ *BibleWorld build — PAT-037, Pivot_Score 8.63*
+
+ ---
+
+ ## Installation
+
+ ```bash
+ pip install spec-drift
+ ```
+
+ **Requirements:** Python 3.10+, Pydantic v2
+
+ ---
+
+ ## Quick Start
+
+ ### 1. Declare a semantic spec
+
+ ```python
+ from pydantic import BaseModel
+ from spec_drift import spec, SemanticConstraint
+
+ @spec(
+     category=SemanticConstraint.from_authorized_values(
+         ["positive", "negative", "neutral"],
+         tolerance=0.02,        # max 2% of outputs outside the authorized set
+         alert_threshold=0.10,  # alert if >10% of observations violate
+     ),
+     reasoning=SemanticConstraint.from_length_bounds(
+         min_words=30,
+         max_words=300,
+         alert_threshold=0.15,
+     ),
+     score=SemanticConstraint.from_distribution(
+         mean=6.5,
+         std=2.0,
+         drift_threshold=1.0,   # alert if mean shifts >1 sigma
+         alert_threshold=0.20,
+     ),
+ )
+ class SentimentAnalysis(BaseModel):
+     category: str
+     reasoning: str
+     score: float
+ ```
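The authorized-values constraint above reduces to a frequency check: what fraction of observed field values fall outside the permitted set, compared against `tolerance` and `alert_threshold`. As a minimal, library-independent sketch of that arithmetic (plain Python, not spec-drift's actual internals):

```python
def violation_fraction(values, authorized):
    """Fraction of observed values outside the authorized set."""
    allowed = set(authorized)
    if not values:
        return 0.0
    outside = sum(1 for v in values if v not in allowed)
    return outside / len(values)

# 1 of 5 observed values ("ambivalent") is unauthorized -> fraction 0.2,
# which would exceed both tolerance=0.02 and alert_threshold=0.10 above
observed = ["positive", "neutral", "negative", "ambivalent", "positive"]
rate = violation_fraction(observed, ["positive", "negative", "neutral"])
```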
+
+ ### 2. Wrap your LLM function
+
+ ```python
+ from spec_drift import DriftMonitor
+ import anthropic
+
+ client = anthropic.Anthropic()
+
+ monitor = DriftMonitor(
+     spec=SentimentAnalysis,
+     db_path="./spec_drift.db",
+     model_version="claude-3-5-haiku-20241022",
+ )
+
+ @monitor.watch
+ def analyze_sentiment(text: str) -> SentimentAnalysis:
+     response = client.messages.create(
+         model="claude-3-5-haiku-20241022",
+         max_tokens=500,
+         messages=[{"role": "user", "content": f"Analyze: {text}"}],
+     )
+     return SentimentAnalysis.model_validate_json(response.content[0].text)
+ ```
+
+ ### 3. Check drift
+
+ ```bash
+ # Terminal drift report (last 7 days)
+ spec-drift check --spec my_module.SentimentAnalysis --since 7d
+
+ # CI gate: fail build if >20% semantic violations
+ spec-drift ci \
+   --spec my_module.SentimentAnalysis \
+   --test-batch data/ci_batch.jsonl \
+   --threshold 0.20 \
+   --exit-code
+ ```
+
+ ---
+
+ ## API Reference
+
+ ### `@spec(**constraints)`
+
+ Attaches semantic constraints to a Pydantic model class.
+
+ ```python
+ @spec(field_name=SemanticConstraint.from_authorized_values([...]))
+ class MyModel(BaseModel):
+     field_name: str
+ ```
+
+ ### `SemanticConstraint`
+
+ #### `SemanticConstraint.from_authorized_values(authorized, tolerance, alert_threshold)`
+ Field values must be drawn from the authorized list (within tolerance).
+ - `authorized`: list of permitted values
+ - `tolerance`: float, max fraction of outputs outside the authorized set before the constraint flags
+ - `alert_threshold`: float, fraction of rolling observations before an alert fires
+
+ #### `SemanticConstraint.from_length_bounds(min_words, max_words, alert_threshold)`
+ String field word count must be within [min_words, max_words].
+
+ #### `SemanticConstraint.from_distribution(mean, std, drift_threshold, alert_threshold)`
+ Numeric field should follow a distribution near (mean, std). Alerts if the observed mean shifts by more than drift_threshold standard deviations.
+
+ #### `SemanticConstraint.from_pattern(regex, min_match_rate, alert_threshold)`
+ String field should match the regex pattern at min_match_rate frequency.
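For `from_distribution`, the drift signal is how far the observed mean has moved from the declared mean, measured in units of the declared standard deviation. A sketch of that check under assumed semantics (this is illustrative, not spec-drift's source):

```python
def mean_shift_sigmas(observed, mean, std):
    """Distance of the observed mean from the declared mean, in declared sigmas."""
    observed_mean = sum(observed) / len(observed)
    return abs(observed_mean - mean) / std

# Declared spec: mean=6.5, std=2.0, drift_threshold=1.0
scores = [8.4, 8.6, 8.5, 8.7, 8.8]  # observed mean 8.6
shift = mean_shift_sigmas(scores, mean=6.5, std=2.0)
# shift is about 1.05 sigma, exceeding drift_threshold=1.0, so an alert would fire
```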
+
+ ### `DriftMonitor(spec, db_path, model_version, prompt_hash, alert_callback)`
+
+ Runtime monitor for semantic specification compliance.
+
+ #### `.watch` (decorator)
+ Wraps an LLM function to automatically observe its return value.
+
+ #### `.observe(output) -> output`
+ Manually observe a Pydantic model instance. Returns the output unchanged.
+
+ #### `.drift_report(since_hours) -> dict`
+ Generate a semantic drift report for the last N hours.
+
+ ```python
+ {
+     "spec": "SentimentAnalysis",
+     "period_hours": 168.0,
+     "observations": 4523,
+     "violation_rate": 0.0312,
+     "severity": "low",
+     "field_violation_rates": {
+         "category": 0.0089,
+         "reasoning": 0.0221,
+         "score": 0.0002
+     }
+ }
+ ```
+
+ #### `run_ci_gate(monitor, test_outputs, threshold) -> (passed, report)`
+ Run a CI gate on a batch of test outputs. Returns (passed, report).
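The gate's pass/fail decision comes down to comparing a batch violation rate against the threshold. A minimal sketch of that logic (the `gate` helper and its report shape are illustrative, not the library's exact implementation):

```python
def gate(test_outputs, is_violation, threshold):
    """Return (passed, report) for a batch, mirroring the run_ci_gate shape."""
    violations = sum(1 for out in test_outputs if is_violation(out))
    rate = violations / len(test_outputs) if test_outputs else 0.0
    report = {"observations": len(test_outputs), "violation_rate": rate}
    return rate <= threshold, report

# 3 violations in 20 outputs -> 15% rate, under the 20% threshold: build passes
outputs = [{"ok": i >= 3} for i in range(20)]
passed, report = gate(outputs, lambda o: not o["ok"], threshold=0.20)
```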
+
+ ---
+
+ ## CLI Reference
+
+ ```bash
+ # Initialize spec-drift in a project
+ spec-drift init
+
+ # Calibrate a baseline from golden data
+ spec-drift calibrate \
+   --spec my_module.SentimentAnalysis \
+   --input-file data/golden_set.jsonl \
+   --output baseline.db
+
+ # Drift report (table format, last 7 days)
+ spec-drift check \
+   --spec my_module.SentimentAnalysis \
+   --since 7d \
+   --format table
+
+ # CI gate
+ spec-drift ci \
+   --spec my_module.SentimentAnalysis \
+   --test-batch data/ci_batch.jsonl \
+   --threshold 0.20 \
+   --exit-code
+
+ # Compare two model versions
+ spec-drift compare \
+   --spec my_module.SentimentAnalysis \
+   --baseline-a baseline_gpt4o.db \
+   --baseline-b baseline_claude_haiku.db
+
+ # HTML report
+ spec-drift report \
+   --spec my_module.SentimentAnalysis \
+   --since 30d \
+   --output report.html
+ ```
+
+ ---
+
+ ## GitHub Action
+
+ ```yaml
+ # .github/workflows/llm-spec-check.yml
+ name: LLM Semantic Spec Check
+
+ on: [push, pull_request]
+
+ jobs:
+   spec-drift:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: spec-drift/action@v1
+         with:
+           spec: my_module.SentimentAnalysis
+           test-batch: data/ci_batch.jsonl
+           threshold: '0.20'
+ ```
+
+ ---
+
+ ## Severity Levels
+
+ | Violation Rate | Severity | Recommended Action |
+ |---|---|---|
+ | 0% | NONE | No action needed |
+ | < 5% | LOW | Monitor, no immediate action |
+ | 5-15% | MEDIUM | Investigate, create issue |
+ | 15-30% | HIGH | Rollback or prompt fix recommended |
+ | > 30% | CRITICAL | Immediate rollback |
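The table above reads as a simple threshold ladder. A sketch of that mapping (the `severity` function name and exact boundary handling are assumptions, not the library's code):

```python
def severity(violation_rate):
    """Map a violation rate onto the severity ladder from the table above."""
    if violation_rate == 0:
        return "NONE"
    if violation_rate < 0.05:
        return "LOW"
    if violation_rate < 0.15:
        return "MEDIUM"
    if violation_rate <= 0.30:
        return "HIGH"
    return "CRITICAL"

# e.g. the 0.0312 violation_rate from the drift_report example maps to "LOW"
```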
+
+ ---
+
+ ## Storage
+
+ spec-drift uses SQLite by default — zero infrastructure required.
+
+ ```python
+ # Local development (default)
+ monitor = DriftMonitor(spec=MyModel, db_path="./spec_drift.db")
+
+ # PostgreSQL for production (coming in v0.2)
+ monitor = DriftMonitor(
+     spec=MyModel,
+     db_url="postgresql://user:pass@host/db",
+ )
+
+ # In-memory for testing
+ monitor = DriftMonitor(spec=MyModel, db_path=":memory:")
+ ```
+
+ ---
+
+ ## Roadmap
+
+ ### v0.1 (this release)
+ - Core `@spec` decorator + `SemanticConstraint` DSL
+ - `DriftMonitor` with `.watch` and `.observe`
+ - SQLite observation store
+ - `run_ci_gate` function
+ - CLI: `check`, `ci`, `compare`
+
+ ### v0.2
+ - PostgreSQL support
+ - Multi-field correlation monitoring
+ - Automatic model version detection (via LLM API response headers)
+ - Slack/PagerDuty alert integrations
+ - HTML drift reports
+
+ ### v0.3
+ - LLM-judge semantic constraint evaluation (for complex, prose-level constraints)
+ - Baseline versioning with SemVer
+ - Team dashboard (hosted cloud option)
+ - Prometheus/Grafana metrics export
+
+ ---
+
+ ## Comparison
+
+ | Tool | Structural validation | Semantic spec monitoring | Production continuous | CI gate | Open source |
+ |------|---|---|---|---|---|
+ | Pydantic | YES | NO | NO | NO | YES |
+ | DeepEval | NO (batch eval) | YES (point-in-time) | NO | YES | YES |
+ | Evidently | NO (statistical drift) | NO | YES | NO | YES |
+ | Langfuse | NO | NO | YES (observability) | NO | YES |
+ | **spec-drift** | YES (via Pydantic) | **YES** | **YES** | **YES** | **YES** |
+
+ ---
+
+ ## Contributing
+
+ spec-drift is MIT licensed. Contributions welcome.
+
+ ```bash
+ git clone https://github.com/Rowusuduah/spec-drift
+ cd spec-drift
+ pip install -e ".[dev]"
+ pytest tests/
+ ```
+
+ ---
+
+ ## License
+
+ MIT — free to use, modify, and distribute.
+
+ ---
+
+ *Built with BibleWorld — Pattern: Leviticus 10:1-3 (The Authorized Fire)*
+ *"Among those who approach me I will be proved holy." — Leviticus 10:3*