agent-driftwatch 0.1.0__py3-none-any.whl

@@ -0,0 +1,418 @@
1
+ Metadata-Version: 2.4
2
+ Name: agent-driftwatch
3
+ Version: 0.1.0
4
+ Summary: Real-time memory health monitor and context rot detector for AI agents
5
+ Project-URL: Homepage, https://github.com/hawks6/driftwatch
6
+ Project-URL: Repository, https://github.com/hawks6/driftwatch
7
+ Project-URL: Bug Tracker, https://github.com/hawks6/driftwatch/issues
8
+ Project-URL: Documentation, https://github.com/hawks6/driftwatch#readme
9
+ Author-email: Praveen <gpraveen6828@gmail.com>
10
+ License: MIT
11
+ License-File: LICENSE
12
+ Keywords: ai-agents,anthropic,claude,context,drift,llm,memory,monitoring
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Intended Audience :: Science/Research
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
21
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
22
+ Requires-Python: >=3.11
23
+ Requires-Dist: anthropic>=0.50.0
24
+ Requires-Dist: jsonlines>=4.0.0
25
+ Requires-Dist: numpy>=1.26.0
26
+ Requires-Dist: pydantic>=2.0.0
27
+ Requires-Dist: rich>=13.7.0
28
+ Requires-Dist: sentence-transformers<4.0.0,>=3.0.0
29
+ Requires-Dist: transformers<4.46.0,>=4.41.0
30
+ Requires-Dist: typer>=0.12.0
31
+ Provides-Extra: dev
32
+ Requires-Dist: mypy>=1.10.0; extra == 'dev'
33
+ Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
34
+ Requires-Dist: pytest>=8.0.0; extra == 'dev'
35
+ Requires-Dist: ruff>=0.4.0; extra == 'dev'
36
+ Description-Content-Type: text/markdown
37
+
38
+ # DriftWatch 🧭
39
+
40
+ > **Real-time memory health monitoring for AI agents.**
41
+ > Detect context rot before your agent goes off the rails.
42
+
43
+ [![PyPI version](https://img.shields.io/pypi/v/agent-driftwatch.svg?color=blue&label=pypi)](https://pypi.org/project/agent-driftwatch/)
44
+ [![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
45
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
46
+ [![arXiv](https://img.shields.io/badge/arXiv-2601.04170-b31b1b.svg)](https://arxiv.org/abs/2601.04170)
47
+
48
+ ```
49
+ ┌─ DriftWatch ─────────────────────────────────────────────────────────┐
50
+ │ Goal: "Conduct a comprehensive research survey on Python performance" │
51
+ ├──────────────────────────┬───────────────────────────────────────────┤
52
+ │ Health Score │ Signal Breakdown │
53
+ │ │ │
54
+ │ ████████░░░ 0.72 │ Goal Coherence ██████████░ 0.81 │
55
+ │ [HEALTHY] │ Entropy ████████░░░ 0.68 │
56
+ │ │ Memory Delta █████░░░░░░ 0.54 │
57
+ ├──────────────────────────┼───────────────────────────────────────────┤
58
+ │ Turn 12 │ Tokens: 48,230 / 200,000 (24%) │
59
+ ├──────────────────────────┴───────────────────────────────────────────┤
60
+ │ Recent: T08 0.79✓ T09 0.76✓ T10 0.68⚠ T11 0.61⚠ T12 0.72✓ │
61
+ └──────────────────────────────────────────────────────────────────────┘
62
+ ```
63
+
64
+ ---
65
+
66
+ ## The problem
67
+
68
+ Long-running AI agents don't fail all at once — they drift.
69
+ By the time your agent produces clearly wrong output, it has been silently
70
+ degrading for dozens of turns. **Context rot** is the progressive loss of
71
+ reasoning quality that starts at 60–70% context fill, not at 100%.
72
+
73
+ A 2025–2026 industry analysis found that **~65% of enterprise AI agent
74
+ failures** were caused by context drift or memory loss during multi-step
75
+ reasoning — *not* by raw context exhaustion. The degradation is measurable,
76
+ predictable, and preventable. DriftWatch is built to address all three.
77
+
78
+ ---
79
+
80
+ ## Install
81
+
82
+ ```bash
83
+ pip install agent-driftwatch
84
+ ```
85
+
86
+ Or from source:
87
+
88
+ ```bash
89
+ git clone https://github.com/hawks6/driftwatch
90
+ cd driftwatch
91
+ pip install -e .
92
+ ```
93
+
94
+ ---
95
+
96
+ ## 30-second start
97
+
98
+ ```python
99
+ import os
100
+ import anthropic
101
+ import driftwatch
102
+
103
+ # Wrap your existing Anthropic client — one line change
104
+ client = driftwatch.wrap(
105
+ anthropic.Anthropic(),
106
+ goal="Explain the key principles of clean code and give Python examples",
107
+ threshold=0.55, # trigger action below this health score
108
+ on_drift="alert", # "checkpoint" | "compact" | "alert" | callable
109
+ dashboard=True, # Rich live terminal panel
110
+ )
111
+
112
+ messages = []
113
+ topics = [
114
+ "What are the most important principles of clean code?",
115
+ "Can you give a Python example of the Single Responsibility Principle?",
116
+ "How does dependency injection improve testability?",
117
+ "What's the difference between early return and guard clauses?",
118
+ "Give me a before/after refactor of a messy Python function.",
119
+ ]
120
+
121
+ for turn, question in enumerate(topics, start=1):
122
+ messages.append({"role": "user", "content": question})
123
+
124
+ response = client.messages.create(
125
+ model="claude-sonnet-4-6",
126
+ max_tokens=512,
127
+ messages=messages,
128
+ )
129
+
130
+ messages.append({"role": "assistant", "content": response.content[0].text})
131
+
132
+ event = client.drift_history[-1]
133
+ print(f"Turn {turn} | health={event.health_score:.3f} | tokens={event.token_count:,}")
134
+ ```
135
+
136
+ Output:
137
+ ```
138
+ Turn 1 | health=0.914 | tokens=1,240
139
+ Turn 2 | health=0.882 | tokens=2,890
140
+ Turn 3 | health=0.856 | tokens=4,780
141
+ Turn 4 | health=0.824 | tokens=7,120
142
+ Turn 5 | health=0.793 | tokens=9,870
143
+ ```
144
+
145
+ ---
146
+
147
+ ## How it works
148
+
149
+ DriftWatch computes a composite **health score** (0.0–1.0) after every turn
150
+ by combining three independently validated signals:
151
+
152
+ | Signal | What it measures | Method |
153
+ |--------|-----------------|--------|
154
+ | **Goal Coherence** | How closely the agent's response aligns with the original task intent | Cosine similarity between goal embedding and last-turn embedding ([all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)) |
155
+ | **Repetition Entropy** | Whether the agent is looping or executing diverse actions | Shannon entropy over tool call names / word bigrams in a sliding window |
156
+ | **Memory Delta** | Whether the agent is introducing new facts or just repeating prior context | New-fact ratio via embedding centroid comparison |
157
+
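The two better-specified methods in the table, cosine similarity and Shannon entropy, can be sketched as follows. This is a minimal illustration, not the code in `signals.py`: the function names `cosine_similarity` and `normalized_entropy` are hypothetical, plain Python lists stand in for all-MiniLM-L6-v2 embeddings, and scaling the entropy by the window size to land in [0.0, 1.0] is an assumption.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector counts as unrelated
    return dot / (norm_a * norm_b)


def normalized_entropy(items):
    """Shannon entropy of the items in a window, scaled to [0.0, 1.0].

    Assumption: the scale factor is log2(window size), so a window of
    all-distinct items scores 1.0 and a fully repetitive one scores 0.0.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total < 2 or len(counts) == 1:
        return 0.0  # empty, single-item, or fully repetitive window
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(total)


# A looping agent repeats the same tool call; a healthy one varies them.
looping = ["read_file", "read_file", "read_file", "read_file"]
varied = ["read_file", "grep", "edit_file", "run_tests"]
print(normalized_entropy(looping), normalized_entropy(varied))
```

Cosine similarity can be negative for unrelated vectors, so a signal reported in [0.0, 1.0] would need some clamping or rescaling step that this sketch leaves out.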
158
+ The composite score is a configurable weighted average:
159
+
160
+ ```
161
+ health_score = 0.50 × goal_coherence
162
+ + 0.30 × repetition_entropy
163
+ + 0.20 × memory_delta
164
+ ```
165
+
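As a worked instance of the formula above, the sketch below recomputes the health score from the dashboard mock-up at the top of this README. The function name `composite_health` is hypothetical, and dividing by the weight sum (so that custom weights need not add up to 1.0) is an assumption rather than confirmed behaviour of `engine.py`.

```python
DEFAULT_WEIGHTS = {
    "goal_coherence": 0.50,
    "repetition_entropy": 0.30,
    "memory_delta": 0.20,
}


def composite_health(signals, weights=DEFAULT_WEIGHTS):
    """Weighted average of the signal values, normalised by the weight sum."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum(weights[name] * signals[name] for name in weights)
    return score / total_weight


# Signal values taken from the dashboard mock-up (turn 12).
turn_12 = {"goal_coherence": 0.81, "repetition_entropy": 0.68, "memory_delta": 0.54}
print(f"{composite_health(turn_12):.2f}")
```

With the default weights this gives 0.405 + 0.204 + 0.108 = 0.717, which rounds to the 0.72 shown in the mock-up.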
166
+ **Color thresholds:**
167
+ - 🟢 `>= 0.70` — Healthy
168
+ - 🟡 `0.55–0.70` — Warning (drift beginning)
169
+ - 🔴 `< 0.55` — Drift detected
170
+
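The three bands can be expressed as a small classifier. This is a sketch under the thresholds listed above; the name `classify_health` and the returned labels are hypothetical, and a score of exactly 0.55 is placed in the warning band because drift is defined as strictly below 0.55.

```python
def classify_health(score, healthy=0.70, drift=0.55):
    """Map a health score to one of the three bands listed above."""
    if score >= healthy:
        return "healthy"
    if score >= drift:
        return "warning"
    return "drift"


# Scores from the "Recent" row of the dashboard mock-up.
for turn, score in [(8, 0.79), (10, 0.68), (11, 0.61), (12, 0.72)]:
    print(f"T{turn:02d} {score:.2f} {classify_health(score)}")
```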
171
+ **Research basis:** arXiv:2601.04170 (Rath, Jan 2026) — *"Agent Drift:
172
+ Quantifying Behavioral Degradation in Multi-Agent LLM Systems"* — formally
173
+ defines semantic drift, coordination drift, and behavioral drift, and
174
+ introduces the Agent Stability Index (ASI) composite metric that DriftWatch
175
+ implements.
176
+
177
+ ---
178
+
179
+ ## Auto-compaction
180
+
181
+ When `on_drift="compact"`, DriftWatch automatically triggers Anthropic's
182
+ **compact-2026-01-12** API to summarise the conversation before continuing:
183
+
184
+ ```python
185
+ client = driftwatch.wrap(
186
+ anthropic.Anthropic(),
187
+ goal="Analyse this codebase for dead code",
188
+ on_drift="compact", # ← auto-compaction on drift
189
+ )
190
+ ```
191
+
192
+ Under the hood, when `health_score < threshold`:
193
+
194
+ ```python
195
+ # DriftWatch calls this automatically:
196
+ response = client.beta.messages.create(
197
+ betas=["compact-2026-01-12"],
198
+ model=model,
199
+ max_tokens=1024,
200
+ messages=messages,
201
+ context_management={
202
+ "edits": [{
203
+ "type": "compact_20260112",
204
+ "pause_after_compaction": True,
205
+ "instructions": "Preserve: original goal, all tool call results, "
206
+ "decisions made, files modified. "
207
+ "Discard: repeated tool outputs, exploratory tangents.",
208
+ }]
209
+ },
210
+ )
211
+ ```
212
+
213
+ The compacted summary replaces the conversation history, token count resets,
214
+ and health scores recover — all transparently. Your agent loop code doesn't
215
+ change at all.
216
+
217
+ ---
218
+
219
+ ## on_drift handlers
220
+
221
+ | Handler | Behaviour |
222
+ |---------|-----------|
223
+ | `"checkpoint"` | Save messages + DriftEvent log to `checkpoint_dir/` |
224
+ | `"compact"` | Trigger Anthropic compaction, then save checkpoint |
225
+ | `"alert"` | Print a warning to `stderr` and continue |
226
+ | `"none"` | Monitor silently, take no action |
227
+ | `callable` | Call `fn(client, event)` — fully custom handler |
228
+
229
+ ```python
230
+ def my_handler(client, event):  # send_slack_alert() and messages come from your own code
231
+ send_slack_alert(f"Agent drift detected! health={event.health_score:.2f}")
232
+ client.save_checkpoint(messages)
233
+
234
+ client = driftwatch.wrap(anthropic.Anthropic(), goal="...", on_drift=my_handler)
235
+ ```
236
+
237
+ ---
238
+
239
+ ## CLI
240
+
241
+ ### Replay a session
242
+
243
+ Visualise a saved event log as a turn-by-turn health timeline:
244
+
245
+ ```bash
246
+ driftwatch replay ./dw_checkpoints/events.jsonl
247
+ ```
248
+
249
+ ```
250
+ DriftWatch Replay — events.jsonl
251
+ Turn │ Health │ GC │ Entropy │ MemDelta │ Tokens │ Status
252
+ ──────┼────────┼───────┼─────────┼──────────┼─────────┼──────────────
253
+ 1 │ 0.92 │ 0.95 │ 0.88 │ 0.93 │ 1,240 │ ✓ healthy
254
+ 2 │ 0.89 │ 0.91 │ 0.85 │ 0.91 │ 2,890 │ ✓ healthy
255
+ ...
256
+ 10 │ 0.52 │ 0.58 │ 0.35 │ 0.42 │ 28,700 │ ✗ DRIFT
257
+ 11 │ 0.48 │ 0.54 │ 0.28 │ 0.38 │ 31,200 │ ✗ DRIFT
258
+ 12 │ 0.44 │ 0.49 │ 0.22 │ 0.33 │ 33,800 │ ★ compacted
259
+ 13 │ 0.83 │ 0.85 │ 0.78 │ 0.88 │ 4,200 │ ✓ healthy
260
+ ```
261
+
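A saved log can also be inspected without the CLI. The sketch below assumes that `events.jsonl` holds one JSON object per line and that the keys match the `DriftEvent` field names documented in this README (`turn`, `health_score`, `token_count`); the helper names `load_events` and `format_timeline` are hypothetical.

```python
import json


def load_events(lines):
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def format_timeline(events, threshold=0.55):
    """Render one text row per event, flagging scores below the threshold."""
    rows = []
    for event in events:
        status = "DRIFT" if event["health_score"] < threshold else "healthy"
        rows.append(
            f"{event['turn']:>4} | {event['health_score']:.2f} | "
            f"{event['token_count']:>7,} | {status}"
        )
    return rows


# Two rows from the replay output above, written as JSONL for illustration.
sample = [
    '{"turn": 1, "health_score": 0.92, "token_count": 1240}',
    '{"turn": 10, "health_score": 0.52, "token_count": 28700}',
]
for row in format_timeline(load_events(sample)):
    print(row)
```

To read a real log, pass an open file object to `load_events`, since iterating a text file yields its lines.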
262
+ ### Generate a session report
263
+
264
+ ```bash
265
+ driftwatch report ./dw_checkpoints/events.jsonl --format md
266
+ ```
267
+
268
+ ```markdown
269
+ # DriftWatch Session Report
270
+
271
+ | Metric | Value |
272
+ |--------|-------|
273
+ | Total turns | 20 |
274
+ | Average health | 0.741 |
275
+ | First drift turn | T10 |
276
+ | Worst health turn | T12 (0.438) |
277
+ | Drift events (< 0.55) | 3 |
278
+ | Compaction events | 1 |
279
+ ```
280
+
281
+ Or as JSON:
282
+ ```bash
283
+ driftwatch report ./dw_checkpoints/events.jsonl --format json
284
+ ```
285
+
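The report metrics can be derived from the event list alone. The sketch below is an assumption about how such a summary could be computed, not the code behind `driftwatch report`; `summarize` is a hypothetical name, and events are plain dicts keyed by the `DriftEvent` field names.

```python
def summarize(events, threshold=0.55):
    """Compute report-style metrics from a list of event dicts."""
    if not events:
        raise ValueError("no events to summarise")
    scores = [e["health_score"] for e in events]
    drift_turns = [e["turn"] for e in events if e["health_score"] < threshold]
    worst = min(events, key=lambda e: e["health_score"])
    return {
        "total_turns": len(events),
        "average_health": sum(scores) / len(scores),
        "first_drift_turn": drift_turns[0] if drift_turns else None,
        "worst_health_turn": worst["turn"],
        "drift_events": len(drift_turns),
    }


# Turns 10 to 13 from the replay example above.
events = [
    {"turn": 10, "health_score": 0.52},
    {"turn": 11, "health_score": 0.48},
    {"turn": 12, "health_score": 0.44},
    {"turn": 13, "health_score": 0.83},
]
print(summarize(events))
```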
286
+ ### Try the fixture
287
+
288
+ ```bash
289
+ driftwatch replay tests/fixtures/demo_session.jsonl
290
+ ```
291
+
292
+ ---
293
+
294
+ ## Configuration reference
295
+
296
+ ```python
297
+ client = driftwatch.wrap(
298
+ anthropic.Anthropic(),
299
+ goal="...", # required: the semantic anchor
300
+ threshold=0.55, # health score that triggers on_drift
301
+ on_drift="checkpoint", # handler (see table above)
302
+ checkpoint_dir="./dw_checkpoints", # where to save files
303
+ dashboard=True, # Rich live UI (auto-suppressed in CI)
304
+ max_context_tokens=200_000, # context window for token % display
305
+ weights={ # override composite signal weights
306
+ "goal_coherence": 0.50,
307
+ "repetition_entropy": 0.30,
308
+ "memory_delta": 0.20,
309
+ },
310
+ log_path=None, # custom JSONL log path
311
+ )
312
+ ```
313
+
314
+ ---
315
+
316
+ ## DriftEvent schema
317
+
318
+ Every turn produces a `DriftEvent` (Pydantic model):
319
+
320
+ ```python
321
+ from pydantic import BaseModel
322
+ class DriftEvent(BaseModel):
323
+ turn: int # monotonically increasing (1-based)
324
+ timestamp: datetime # UTC
325
+ goal_coherence: float # Signal 1: [0.0, 1.0]
326
+ repetition_entropy: float # Signal 2: [0.0, 1.0]
327
+ memory_delta: float # Signal 3: [0.0, 1.0]
328
+ health_score: float # weighted composite: [0.0, 1.0]
329
+ token_count: int # input_tokens from API usage
330
+ triggered_checkpoint: bool # True if on_drift handler fired
331
+ notes: str # optional annotation
332
+ ```
333
+
334
+ Access the full history:
335
+
336
+ ```python
337
+ for event in client.drift_history:
338
+ print(f"T{event.turn}: {event.health_score:.3f}")
339
+ ```
340
+
341
+ ---
342
+
343
+ ## Roadmap
344
+
345
+ - [ ] OpenAI SDK support
346
+ - [ ] LangGraph integration (`DriftWatchCallbackHandler`)
347
+ - [ ] Multi-agent drift — coordination drift signal across agent network
348
+ - [ ] GitHub Actions reporter (`driftwatch-action`)
349
+ - [ ] Prometheus metrics endpoint
350
+ - [ ] `driftwatch watch <script.py>` — subprocess injection (CLI v0.2)
351
+ - [ ] Grafana dashboard template
352
+
353
+ ---
354
+
355
+ ## Architecture
356
+
357
+ ```
358
+ driftwatch/
359
+ ├── signals.py ← 3 drift signal classes (offline, no API key)
360
+ ├── engine.py ← composite scorer + DriftEvent schema
361
+ ├── wrapper.py ← Anthropic SDK intercept layer
362
+ ├── checkpoint.py ← save/restore + compaction API
363
+ ├── dashboard.py ← Rich live terminal UI
364
+ └── cli.py ← Typer CLI (replay, report; watch planned for v0.2)
365
+ ```
366
+
367
+ DriftWatch is an **observer** — it never modifies the response your code
368
+ receives from the Anthropic SDK. It intercepts only to evaluate and log.
369
+ The sole exception is `on_drift="compact"`, which updates your `messages`
370
+ list in place after compaction (your agent continues seamlessly).
371
+
372
+ ---
373
+
374
+ ## Contributing
375
+
376
+ ```bash
377
+ git clone https://github.com/hawks6/driftwatch
378
+ cd driftwatch
379
+ pip install -e ".[dev]"
380
+ python -m pytest tests/ -v
381
+ ```
382
+
383
+ All signal tests run without an API key. PRs welcome!
384
+
385
+ ---
386
+
387
+ ## Citation
388
+
389
+ If you use DriftWatch in academic research, please cite the foundational work
390
+ this library is built on:
391
+
392
+ ```bibtex
393
+ @article{rath2026agentdrift,
394
+ title = {Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems},
395
+ author = {Rath, et al.},
396
+ journal = {arXiv preprint arXiv:2601.04170},
397
+ year = {2026}
398
+ }
399
+ ```
400
+
401
+ **Related papers:**
402
+ - arXiv:2505.02709 — *"Technical Report: Evaluating Goal Drift in Language Model Agents"*
403
+ — defines GD_actions and GD_inaction metrics
404
+ - arXiv:2510.00615 — *"ACON: Optimizing Context Compression for Long-horizon LLM Agents"*
405
+ — validates 26–54% peak token reduction with smart compression
406
+
407
+ ---
408
+
409
+ ## License
410
+
411
+ MIT — see [LICENSE](LICENSE).
412
+
413
+ ---
414
+
415
+ <p align="center">
416
+ Built with ❤️ for the AI engineering community.<br/>
417
+ If DriftWatch saved your agent, give it a ⭐
418
+ </p>
@@ -0,0 +1,12 @@
1
+ driftwatch/__init__.py,sha256=Htxr-22g5tdJGf3WF5vp-WKcKeyjrb0tp-N9suU3LRc,897
2
+ driftwatch/checkpoint.py,sha256=EaQ05-oS3LcLE_CspcCwgRX6r5CvEATD6qn2LFOyzD4,9179
3
+ driftwatch/cli.py,sha256=sNavATC-Brt1rFtJcNKSghOP56QDJdvmOkxrSz239Lk,9031
4
+ driftwatch/dashboard.py,sha256=33bxTCqnubduiZDQna99EWJmOIrf3Psf0cBkCDhnlAU,9110
5
+ driftwatch/engine.py,sha256=WMReeTeJ2UEG9zS5un_B2Cnp60j0LBuZTGXW7IVGAaA,6613
6
+ driftwatch/signals.py,sha256=yJNSayC9j4zmDHfmS2JulRHNUrd4iyWsTkA6RqnJs7o,9090
7
+ driftwatch/wrapper.py,sha256=X1ZfdvaHhSd_4w1cth_cWru5T0BqZVI9LqKD51PuV0w,13835
8
+ agent_driftwatch-0.1.0.dist-info/METADATA,sha256=FY27DRDCkjnZNwnqzf9J-5vnQ3qLdWf0MUxtjt4Y-XA,14532
9
+ agent_driftwatch-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
10
+ agent_driftwatch-0.1.0.dist-info/entry_points.txt,sha256=TC1rglbfH3rlcaSTnaT59Aao6RJixdxJJHuQcbONYRA,51
11
+ agent_driftwatch-0.1.0.dist-info/licenses/LICENSE,sha256=J161VpWy8YHSX997h8UHNVgOG8Til1UqK6ui-evf5l4,1080
12
+ agent_driftwatch-0.1.0.dist-info/RECORD,,
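Each RECORD row above has the form `path,sha256=<digest>,size`, where the digest is the SHA-256 of the file encoded as URL-safe base64 with the trailing `=` padding removed. A minimal sketch for checking an extracted file against its row follows; the helper names `record_hash` and `parse_record_line` are hypothetical.

```python
import base64
import csv
import hashlib


def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash string for a file's contents."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def parse_record_line(line: str):
    """Split one RECORD row into (path, hash, size); size is None if empty."""
    path, hash_spec, size = next(csv.reader([line]))
    return path, hash_spec, int(size) if size else None


path, expected, size = parse_record_line(
    "driftwatch/wrapper.py,sha256=X1ZfdvaHhSd_4w1cth_cWru5T0BqZVI9LqKD51PuV0w,13835"
)
print(path, size)
```

To verify a file, compare `record_hash(open(path, "rb").read())` with the parsed hash and the byte length with the parsed size. The RECORD file's own row carries an empty hash and size, which is why its line ends in `,,`.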
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.30.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ driftwatch = driftwatch.cli:main
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 DriftWatch Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
driftwatch/__init__.py ADDED
@@ -0,0 +1,39 @@
1
+ """
2
+ driftwatch/__init__.py
3
+ ──────────────────────
4
+ Public API for the DriftWatch library.
5
+
6
+ Minimal surface area by design:
7
+
8
+ import anthropic
9
+ import driftwatch
10
+
11
+ client = driftwatch.wrap(
12
+ anthropic.Anthropic(),
13
+ goal="Summarise this codebase and identify dead code",
14
+ )
15
+
16
+ # Use exactly like anthropic.Anthropic():
17
+ response = client.messages.create(
18
+ model="claude-sonnet-4-6",
19
+ max_tokens=4096,
20
+ messages=messages,
21
+ )
22
+
23
+ # Inspect health history:
24
+ for event in client.drift_history:
25
+ print(event.turn, event.health_score)
26
+ """
27
+ from __future__ import annotations
28
+
29
+ from driftwatch.engine import DriftEvent, SignalEngine
30
+ from driftwatch.wrapper import DriftWatchClient, wrap
31
+
32
+ __all__ = [
33
+ "wrap",
34
+ "DriftWatchClient",
35
+ "DriftEvent",
36
+ "SignalEngine",
37
+ ]
38
+
39
+ __version__ = "0.1.0"