agentrec 0.2.0__tar.gz

Files changed (35)
  1. agentrec-0.2.0/.claude/scheduled_tasks.lock +1 -0
  2. agentrec-0.2.0/.github/workflows/ci.yml +26 -0
  3. agentrec-0.2.0/.gitignore +12 -0
  4. agentrec-0.2.0/CHANGELOG.md +36 -0
  5. agentrec-0.2.0/LICENSE +21 -0
  6. agentrec-0.2.0/NOTICE +24 -0
  7. agentrec-0.2.0/PKG-INFO +269 -0
  8. agentrec-0.2.0/README.md +239 -0
  9. agentrec-0.2.0/agentrec/__init__.py +94 -0
  10. agentrec-0.2.0/agentrec/__main__.py +3 -0
  11. agentrec-0.2.0/agentrec/capture.py +46 -0
  12. agentrec-0.2.0/agentrec/cli.py +173 -0
  13. agentrec-0.2.0/agentrec/comparators.py +271 -0
  14. agentrec-0.2.0/agentrec/keying.py +133 -0
  15. agentrec-0.2.0/agentrec/migration.py +506 -0
  16. agentrec-0.2.0/agentrec/providers/__init__.py +248 -0
  17. agentrec-0.2.0/agentrec/providers/anthropic.py +160 -0
  18. agentrec-0.2.0/agentrec/providers/base.py +152 -0
  19. agentrec-0.2.0/agentrec/providers/openai.py +159 -0
  20. agentrec-0.2.0/agentrec/report.py +424 -0
  21. agentrec-0.2.0/agentrec/session.py +160 -0
  22. agentrec-0.2.0/agentrec/store.py +338 -0
  23. agentrec-0.2.0/agentrec/transport.py +271 -0
  24. agentrec-0.2.0/pyproject.toml +53 -0
  25. agentrec-0.2.0/requirements.txt +26 -0
  26. agentrec-0.2.0/tests/__init__.py +0 -0
  27. agentrec-0.2.0/tests/test_anthropic.py +180 -0
  28. agentrec-0.2.0/tests/test_comparators.py +143 -0
  29. agentrec-0.2.0/tests/test_filestore.py +216 -0
  30. agentrec-0.2.0/tests/test_live_record.py +66 -0
  31. agentrec-0.2.0/tests/test_migration.py +499 -0
  32. agentrec-0.2.0/tests/test_non_streaming.py +144 -0
  33. agentrec-0.2.0/tests/test_providers.py +346 -0
  34. agentrec-0.2.0/tests/test_session.py +201 -0
  35. agentrec-0.2.0/tests/test_streaming.py +303 -0
agentrec-0.2.0/.claude/scheduled_tasks.lock ADDED
@@ -0,0 +1 @@
1
+ {"sessionId":"49bf5cc6-008f-4af3-9bd0-e4b04cb0b71b","pid":29944,"acquiredAt":1781171304285}
agentrec-0.2.0/.github/workflows/ci.yml ADDED
@@ -0,0 +1,26 @@
1
+ name: CI
2
+
3
+ on:
4
+ push:
5
+ branches: [main]
6
+ pull_request:
7
+
8
+ jobs:
9
+ test:
10
+ strategy:
11
+ fail-fast: false
12
+ matrix:
13
+ os: [ubuntu-latest, windows-latest]
14
+ python-version: ["3.10", "3.11", "3.12", "3.13"]
15
+ runs-on: ${{ matrix.os }}
16
+ steps:
17
+ - uses: actions/checkout@v4
18
+ - uses: actions/setup-python@v5
19
+ with:
20
+ python-version: ${{ matrix.python-version }}
21
+ - name: Install
22
+ run: pip install -e ".[dev]"
23
+ # No API keys in CI: the live record/replay tests skip themselves and
24
+ # the offline suite (canned SSE/JSON fixtures) is what gates merges.
25
+ - name: Test
26
+ run: pytest -q
agentrec-0.2.0/.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ **/__pycache__
2
+ *.pyc
3
+ *.pyo
4
+ .pytest_cache/
5
+ *.egg-info/
6
+ dist/
7
+ .env
8
+ venv/
9
+ .venv/
10
+ corpus/
11
+ migration-report__*
12
+ examples/
agentrec-0.2.0/CHANGELOG.md ADDED
@@ -0,0 +1,36 @@
1
+ # Changelog
2
+
3
+ ## 0.2.0 — 2026-06-11
4
+
5
+ First public (PyPI) release.
6
+
7
+ ### Added
8
+ - **Model-migration report**: `agentrec migrate | report | annotate` CLI.
9
+ Replays every corpus prompt against a target model (cross-provider
10
+ OpenAI ↔ Anthropic translation included), caches target answers as
11
+ `migration__…` cassettes, and renders Markdown/HTML/console reports.
12
+ - Comparators: `exact`, `fuzzy` (offline), `embedding`, `judge` (live), all
13
+ scored side-by-side in one run.
14
+ - **Per-category breakdown**: recordings tagged via
15
+ `cassette(store, metadata={"category": "..."})` are grouped per task type
16
+ in the report.
17
+ - **Output-token columns** per row, per category, and report-wide
18
+ (baseline vs target volume and ratio) as a verbosity/cost signal.
19
+ - **Concurrent row scoring** in `run_migration` (`concurrency`, default 8),
20
+ with deterministic report order and a `progress` callback.
21
+ - **Retry with backoff** on rate-limited/overloaded target calls
22
+ (429/500/502/503/529), honouring `Retry-After`; failed responses are never
23
+ cached.
24
+ - `agentrec[compression]` extra for brotli/zstd cassette decoding.
25
+
26
+ ### Fixed
27
+ - Corpus tooling (migration, summaries) now decompresses recorded responses
28
+ per their `Content-Encoding` (gzip/deflate built in, brotli/zstd via the
29
+ extra). Replay was always correct; decoding raw chunks was not.
30
+
31
+ ## 0.1.0
32
+
33
+ Internal prototype: record/replay at the httpx transport layer (streaming SSE
34
+ and non-streaming JSON), `InMemoryStore`/`FileStore` with header redaction and
35
+ request-body secret scrubbing, request-fingerprint keying with
36
+ provider/model/semantic-key provenance, `async_client()` + `cassette` facade.
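Reviewer note: the 0.2.0 changelog entry above mentions retry with backoff on 429/500/502/503/529 responses, honouring `Retry-After`. The sketch below is only an illustration of that policy, not the package's implementation. The function name `retry_delay`, the `base` and `cap` parameters, and the assumption that header names arrive lower-cased are all hypothetical.

```python
# Hypothetical sketch of a retry-delay policy; not taken from agentrec.
RETRYABLE_STATUSES = {429, 500, 502, 503, 529}  # statuses named in the changelog


def retry_delay(status, headers, attempt, base=0.5, cap=30.0):
    """Return seconds to wait before the next attempt, or None if not retryable.

    `headers` is assumed to be a plain dict with lower-cased names.
    `attempt` is the zero-based count of attempts already made.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            # Delta-seconds form of Retry-After: wait exactly what the server asked.
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date form is not parsed here; fall back to backoff.
    # Exponential backoff, capped so long runs do not stall indefinitely.
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for the returned delay and re-issue the request, and would skip caching any response for which this function returns a delay.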
agentrec-0.2.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Pi-Wi
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
agentrec-0.2.0/NOTICE ADDED
@@ -0,0 +1,24 @@
1
+ agentrec
2
+ ========
3
+
4
+ Copyright (c) 2026 agentrec contributors.
5
+
6
+ This project is licensed under the MIT License (see LICENSE).
7
+
8
+ Third-party acknowledgements
9
+ -----------------------------
10
+
11
+ **baml_vcr** (https://github.com/BoundaryML/baml_vcr)
12
+ MIT License — Copyright (c) BoundaryML contributors
13
+
14
+ The streaming chunk capture and replay pattern in agentrec — specifically
15
+ the idea of recording each SSE byte frame as an ordered list of chunks with
16
+ relative timestamps and re-emitting them in the same order during replay —
17
+ draws inspiration from baml_vcr's approach.
18
+
19
+ agentrec's interception mechanism is deliberately different: instead of
20
+ patching a specific framework's client or collector (as baml_vcr does),
21
+ agentrec wraps httpx's AsyncBaseTransport so that interception sits below
22
+ any SDK abstraction (OpenAI, Anthropic, LangChain, etc.) and requires no
23
+ framework-specific code. Only the streaming-chunk pattern was taken as
24
+ inspiration; no source code was copied.
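Reviewer note: the NOTICE above describes recording each SSE byte frame as an ordered list of chunks with relative timestamps and re-emitting them in the same order on replay. A minimal sketch of that pattern with plain async generators might look like the following. The names `tee_stream`, `replay`, and `fake_sse` are hypothetical, and the real package does this inside httpx transports rather than on bare generators.

```python
# Hypothetical sketch of tee-style capture and ordered replay; not agentrec code.
import asyncio
import time


async def tee_stream(source, sink):
    """Yield each chunk to the caller while appending (offset_seconds, chunk) to sink."""
    start = time.monotonic()
    async for chunk in source:
        sink.append((time.monotonic() - start, chunk))
        yield chunk  # the caller sees the chunk immediately, nothing is buffered


async def replay(recorded):
    """Re-emit recorded chunks in their original order (timing is ignored here)."""
    for _offset, chunk in recorded:
        yield chunk


async def _demo():
    async def fake_sse():
        # Stand-in for a live SSE response body.
        for frame in (b"data: one\n\n", b"data: two\n\n", b"data: [DONE]\n\n"):
            yield frame

    recorded = []
    live = [c async for c in tee_stream(fake_sse(), recorded)]
    replayed = [c async for c in replay(recorded)]
    return live, replayed, recorded


live, replayed, recorded = asyncio.run(_demo())
```

Because the stored chunks are the raw bytes, a downstream SSE parser would see the same input on replay as it did live.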
agentrec-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,269 @@
1
+ Metadata-Version: 2.4
2
+ Name: agentrec
3
+ Version: 0.2.0
4
+ Summary: Framework-agnostic record/replay for LLM API interactions (streaming and non-streaming)
5
+ License-Expression: MIT
6
+ License-File: LICENSE
7
+ License-File: NOTICE
8
+ Keywords: anthropic,httpx,llm,openai,record,replay,testing,vcr
9
+ Classifier: Development Status :: 4 - Beta
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Programming Language :: Python :: 3.13
16
+ Classifier: Topic :: Software Development :: Libraries
17
+ Classifier: Topic :: Software Development :: Testing
18
+ Requires-Python: >=3.10
19
+ Requires-Dist: httpx>=0.27
20
+ Provides-Extra: compression
21
+ Requires-Dist: brotli>=1.1; extra == 'compression'
22
+ Requires-Dist: zstandard>=0.22; extra == 'compression'
23
+ Provides-Extra: dev
24
+ Requires-Dist: anthropic>=0.40; extra == 'dev'
25
+ Requires-Dist: openai>=1.30; extra == 'dev'
26
+ Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
27
+ Requires-Dist: pytest>=8.0; extra == 'dev'
28
+ Requires-Dist: python-dotenv>=1.0; extra == 'dev'
29
+ Description-Content-Type: text/markdown
30
+
31
+ # agentrec
32
+
33
+ Framework-agnostic record/replay for streaming LLM API interactions.
34
+ Records and replays at the **httpx transport layer**, so it works below
35
+ the OpenAI SDK, the Anthropic SDK, LangChain, or any other httpx-backed client —
36
+ the core depends on nothing but `httpx`.
37
+
38
+ > **Status:** beta (0.2). The record/replay mechanic is proven for streaming
39
+ > (SSE) and non-streaming (JSON) responses, for both OpenAI and Anthropic.
40
+ > On top of the recorded corpus sits a working **model-migration report**
41
+ > (see [Model-migration report](#model-migration-report)). The API may still
42
+ > change in minor releases before 1.0.
43
+ >
44
+ > **Scope limits:** record/replay works for *any* httpx-backed SDK, but the
45
+ > migration runner's cross-provider translation covers OpenAI ↔ Anthropic and
46
+ > **text-only** conversations — requests using tools, images, or other rich
47
+ > content become clearly-reasoned skipped rows rather than translations.
48
+
49
+ ---
50
+
51
+ ## Architecture
52
+
53
+ ```
54
+ agentrec/
55
+ capture.py # CapturedChunk, CapturedRequest, CapturedInteraction — storage-agnostic data
56
+ keying.py # fingerprint() — provider/model/semantic_key + the default cassette id
57
+ store.py # InteractionStore ABC + InMemoryStore + FileStore (JSON cassettes)
58
+ transport.py # RecordingTransport, ReplayTransport, AutoTransport (the low-level seam)
59
+ session.py # async_client() + cassette — the high-level, ergonomic seam
60
+ providers/ # ProviderAdapter registry: OpenAI + Anthropic request/response dialects
61
+ comparators.py # exact / fuzzy (offline), embedding / judge (live) response scoring
62
+ migration.py # run_migration() — replay the corpus against a candidate model
63
+ report.py # Markdown / HTML / console rendering of a MigrationReport
64
+ cli.py # `agentrec migrate | report | annotate`
65
+ ```
66
+
67
+ Key design commitments:
68
+
69
+ - **Tee, don't intercept-and-buffer.** `RecordingTransport` wraps the live
70
+ stream so the caller and the store both see every chunk in order, without
71
+ the recorder buffering the whole response first.
72
+ - **Raw bytes, no parsing.** Chunks are stored as the original SSE byte frames.
73
+ The SDK parser re-runs on replay and produces the same objects it would have
74
+ from the network. OpenAI SSE and Anthropic SSE look identical here — both are
75
+ byte streams — which is why one codebase covers both with no provider branches.
76
+ - **Injected store.** `InMemoryStore` (volatile) and `FileStore` (human-readable
77
+ JSON cassettes, atomic writes, secret-scrubbing) both satisfy `InteractionStore`.
78
+ A future store (Parquet corpus, S3, …) drops in without touching transport code.
79
+ - **Distinct transport classes.** `RecordingTransport` requires an inner
80
+ transport; `ReplayTransport` has none — it *cannot* accidentally touch the
81
+ network. `AutoTransport` composes the two for cassette semantics.
82
+ - **Request-derived keys.** Each interaction is keyed by a fingerprint of the
83
+ request (method + path + model + normalised body), so one transport handles
84
+ many distinct calls and the same call replays deterministically.
85
+
86
+ ---
87
+
88
+ ## Install
89
+
90
+ ```bash
91
+ pip install agentrec # core is httpx-only
92
+ pip install "agentrec[compression]" # + brotli/zstd cassette decoding
93
+
94
+ # from a checkout:
95
+ pip install -e ".[dev]" # the dev extra adds the SDKs + pytest
96
+ ```
97
+
98
+ ---
99
+
100
+ ## Quick start — the high-level seam
101
+
102
+ Build one `agentrec.async_client()` and pass it to any httpx-based SDK. Wrap
103
+ your calls in a `cassette`: `mode="auto"` replays a request if it's been
104
+ recorded, otherwise records it (true VCR-style cassette behaviour).
105
+
106
+ ```python
107
+ import agentrec
108
+ from openai import AsyncOpenAI
109
+
110
+ store = agentrec.FileStore("corpus")
111
+ http = agentrec.async_client() # honours the active cassette scope
112
+ oai = AsyncOpenAI(http_client=http)
113
+
114
+ # Streaming — every call inside is recorded once, then replayed:
115
+ @agentrec.cassette(store, mode="auto")
116
+ async def ask_stream(prompt: str) -> str:
117
+ stream = await oai.chat.completions.create(
118
+ model="gpt-4o-mini",
119
+ messages=[{"role": "user", "content": prompt}],
120
+ stream=True,
121
+ )
122
+ out = ""
123
+ async for chunk in stream:
124
+ if chunk.choices and chunk.choices[0].delta.content:
125
+ out += chunk.choices[0].delta.content
126
+ return out
127
+
128
+ # Non-streaming — works identically; the JSON body is one chunk at the transport layer:
129
+ @agentrec.cassette(store, mode="auto")
130
+ async def ask(prompt: str) -> str:
131
+ response = await oai.chat.completions.create(
132
+ model="gpt-4o-mini",
133
+ messages=[{"role": "user", "content": prompt}],
134
+ )
135
+ return response.choices[0].message.content
136
+
137
+ # Or as a context manager:
138
+ async with agentrec.cassette(store, mode="record"):
139
+ await oai.chat.completions.create(...)
140
+ ```
141
+
142
+ The same `async_client` + `cassette` works against the Anthropic SDK unchanged —
143
+ just `AsyncAnthropic(http_client=http)`.
144
+
145
+ ---
146
+
147
+ ## Lower-level seam — explicit transports
148
+
149
+ When you'd rather wire the httpx client yourself (no contextvar), use the
150
+ transports directly. `key` is optional: omit it for request-derived keying, or
151
+ pass a fixed id for a single named cassette.
152
+
153
+ ```python
154
+ import httpx
155
+ from openai import AsyncOpenAI
156
+ from agentrec import FileStore, RecordingTransport, ReplayTransport
157
+
158
+ store = FileStore("corpus")
159
+
160
+ # --- Record (needs network) ---
161
+ async with httpx.AsyncClient(
162
+ transport=RecordingTransport(httpx.AsyncHTTPTransport(), store, key="weather")
163
+ ) as http_client:
164
+ client = AsyncOpenAI(http_client=http_client)
165
+ stream = await client.chat.completions.create(..., stream=True)
166
+ async for chunk in stream:
167
+ ... # caller receives the live stream unchanged
168
+
169
+ # --- Replay (offline, no key needed if you recorded with request keying) ---
170
+ async with httpx.AsyncClient(transport=ReplayTransport(store, key="weather")) as http_client:
171
+ client = AsyncOpenAI(http_client=http_client)
172
+ stream = await client.chat.completions.create(..., stream=True)
173
+ async for chunk in stream:
174
+ ... # identical to the recorded run
175
+ ```
176
+
177
+ ---
178
+
179
+ ## Provider support
180
+
181
+ Interception is at the httpx transport, so agentrec is provider-neutral for
182
+ **any SDK that lets you pass an httpx client**:
183
+
184
+ | SDK / client | Works | How |
185
+ | ------------------------------------ | :---: | ------------------------------------------- |
186
+ | OpenAI (`openai`) | ✅ | `AsyncOpenAI(http_client=...)` |
187
+ | Anthropic (`anthropic`) | ✅ | `AsyncAnthropic(http_client=...)` |
188
+ | Most modern httpx-based SDKs / LangChain | ✅ | pass the agentrec httpx client through |
189
+ | **Non-httpx SDKs** (AWS Bedrock/boto3, some Vertex paths) | ❌ | they don't route through httpx, so the transport never sees the call — a different seam would be needed |
190
+
191
+ The boundary is "httpx-backed," not "OpenAI." If a client opens its sockets
192
+ through `botocore`/`urllib3` instead of httpx, transport interception can't see
193
+ it.
194
+
195
+ ---
196
+
197
+ ## Running the tests
198
+
199
+ ```bash
200
+ pytest -q
201
+ ```
202
+
203
+ | Test file | Needs a key? | What it proves |
204
+ | ------------------------ | ------------ | ----------------------------------------------------------------- |
205
+ | `tests/test_streaming.py` | offline + `OPENAI_API_KEY` | OpenAI SSE replay mechanic; live record→replay identity |
206
+ | `tests/test_non_streaming.py` | offline | Plain JSON (non-streaming) record/replay, auto mode, provenance |
207
+ | `tests/test_filestore.py` | offline | FileStore round-trip, redaction, hostile ids, readable cassettes |
208
+ | `tests/test_session.py` | offline | `async_client`/`cassette`, auto mode, request keying, metadata |
209
+ | `tests/test_providers.py` | offline | Adapter decoding (SSE/JSON × provider), translation, registry |
210
+ | `tests/test_comparators.py` | offline | Comparator scoring incl. mocked embedding/judge, spec parsing |
211
+ | `tests/test_migration.py` | offline | Migration end-to-end, caching, lineage metadata, report + CLI |
212
+ | `tests/test_anthropic.py` | offline + `ANTHROPIC_API_KEY` | Anthropic replay (provider-neutrality); live record→replay |
213
+ | `tests/test_live_record.py` | `OPENAI_API_KEY` | live capture against the real OpenAI API |
214
+
215
+ Key-gated tests skip cleanly when the key is absent. Live keys are read from a
216
+ project-root `.env` (via `python-dotenv`). The offline tests use canned SSE
217
+ frames and patch `httpx.AsyncHTTPTransport` so any accidental network access
218
+ fails the test.
219
+
220
+ ---
221
+
222
+ ## Model-migration report
223
+
224
+ Every recording carries provenance in `interaction.metadata`: `provider`,
225
+ `model`, `semantic_key`, and `recorded_at`. The **`semantic_key`** is a hash of
226
+ the request *without* the model (and other non-semantic fields), so two
227
+ interactions recorded against different models for the same logical prompt share
228
+ a `semantic_key`.
229
+
230
+ The migration runner builds on that: it groups the corpus by `semantic_key`,
231
+ re-asks every recorded prompt of a **target model** (cross-provider translation
232
+ included — an OpenAI-recorded prompt can be re-asked of Claude), records the
233
+ target's answers back into the corpus as `migration__…` cassettes, and scores
234
+ baseline vs. target with the selected comparators:
235
+
236
+ | Comparator | Needs network? | What it measures |
237
+ | ----------- | -------------- | ------------------------------------------------- |
238
+ | `exact` | no | normalized string equality (classification-style) |
239
+ | `fuzzy` | no | `difflib` sequence similarity |
240
+ | `embedding` | OpenAI API | cosine similarity of embeddings |
241
+ | `judge` | LLM API | an LLM scores semantic equivalence |
242
+
243
+ ```bash
244
+ # Re-ask the corpus of a candidate model and write Markdown + HTML reports:
245
+ agentrec migrate --corpus corpus --target claude-haiku-4-5 --compare exact,fuzzy,judge
246
+
247
+ # Re-render fully offline from already-recorded migration cassettes
248
+ # (offline comparators only; --strict exits 1 on any failure — a CI gate):
249
+ agentrec report --corpus corpus --target claude-haiku-4-5 --strict
250
+
251
+ # Backfill summary blocks + fingerprint metadata into older cassettes:
252
+ agentrec annotate --corpus corpus
253
+ ```
254
+
255
+ Re-running `migrate` is cheap: each target answer is itself a cassette, so
256
+ already-answered prompts are served from disk and only new prompts hit the API.
257
+ A failed (non-200) target call is never cached — a re-run retries it live.
258
+
259
+ Rows are scored concurrently (`--concurrency`, default 8). Recordings tagged
260
+ with a category — `cassette(store, metadata={"category": "extract"})` — get a
261
+ **per-category breakdown** in the report, and output-token counts per row and
262
+ per category surface verbosity/cost differences between the models.
263
+
264
+ ---
265
+
266
+ ## Attributions
267
+
268
+ See [NOTICE](NOTICE) for third-party acknowledgements, including inspiration
269
+ from **baml_vcr** for the streaming chunk capture/replay pattern.
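Reviewer note: the package description above distinguishes the request fingerprint (method + path + model + normalised body) from the `semantic_key`, which hashes the request without the model. The sketch below shows one way such keys could be derived. It is an assumption-laden illustration: the helper names, the SHA-256 digest, the 16-character truncation, and the choice to drop only `model` are all hypothetical, and the actual logic in `agentrec/keying.py` may differ.

```python
# Hypothetical sketch of request-derived keys; not the code in agentrec/keying.py.
import hashlib
import json

NON_SEMANTIC_FIELDS = {"model"}  # assumption: the real set may include more fields


def _digest(obj):
    # Canonical JSON (sorted keys, fixed separators) so key order does not matter.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def request_key(method, path, body):
    """Fingerprint of the full request, model included."""
    return _digest({"method": method.upper(), "path": path, "body": body})


def semantic_key_of(body):
    """Hash of the request body with non-semantic fields removed."""
    return _digest({k: v for k, v in body.items() if k not in NON_SEMANTIC_FIELDS})
```

Under these assumptions, the same prompt sent to two different models gets two distinct request keys but one shared semantic key, which is what lets a migration run pair a baseline answer with a target answer.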
agentrec-0.2.0/README.md ADDED
@@ -0,0 +1,239 @@
1
+ # agentrec
2
+
3
+ Framework-agnostic record/replay for streaming LLM API interactions.
4
+ Records and replays at the **httpx transport layer**, so it works below
5
+ the OpenAI SDK, the Anthropic SDK, LangChain, or any other httpx-backed client —
6
+ the core depends on nothing but `httpx`.
7
+
8
+ > **Status:** beta (0.2). The record/replay mechanic is proven for streaming
9
+ > (SSE) and non-streaming (JSON) responses, for both OpenAI and Anthropic.
10
+ > On top of the recorded corpus sits a working **model-migration report**
11
+ > (see [Model-migration report](#model-migration-report)). The API may still
12
+ > change in minor releases before 1.0.
13
+ >
14
+ > **Scope limits:** record/replay works for *any* httpx-backed SDK, but the
15
+ > migration runner's cross-provider translation covers OpenAI ↔ Anthropic and
16
+ > **text-only** conversations — requests using tools, images, or other rich
17
+ > content become clearly-reasoned skipped rows rather than translations.
18
+
19
+ ---
20
+
21
+ ## Architecture
22
+
23
+ ```
24
+ agentrec/
25
+ capture.py # CapturedChunk, CapturedRequest, CapturedInteraction — storage-agnostic data
26
+ keying.py # fingerprint() — provider/model/semantic_key + the default cassette id
27
+ store.py # InteractionStore ABC + InMemoryStore + FileStore (JSON cassettes)
28
+ transport.py # RecordingTransport, ReplayTransport, AutoTransport (the low-level seam)
29
+ session.py # async_client() + cassette — the high-level, ergonomic seam
30
+ providers/ # ProviderAdapter registry: OpenAI + Anthropic request/response dialects
31
+ comparators.py # exact / fuzzy (offline), embedding / judge (live) response scoring
32
+ migration.py # run_migration() — replay the corpus against a candidate model
33
+ report.py # Markdown / HTML / console rendering of a MigrationReport
34
+ cli.py # `agentrec migrate | report | annotate`
35
+ ```
36
+
37
+ Key design commitments:
38
+
39
+ - **Tee, don't intercept-and-buffer.** `RecordingTransport` wraps the live
40
+ stream so the caller and the store both see every chunk in order, without
41
+ the recorder buffering the whole response first.
42
+ - **Raw bytes, no parsing.** Chunks are stored as the original SSE byte frames.
43
+ The SDK parser re-runs on replay and produces the same objects it would have
44
+ from the network. OpenAI SSE and Anthropic SSE look identical here — both are
45
+ byte streams — which is why one codebase covers both with no provider branches.
46
+ - **Injected store.** `InMemoryStore` (volatile) and `FileStore` (human-readable
47
+ JSON cassettes, atomic writes, secret-scrubbing) both satisfy `InteractionStore`.
48
+ A future store (Parquet corpus, S3, …) drops in without touching transport code.
49
+ - **Distinct transport classes.** `RecordingTransport` requires an inner
50
+ transport; `ReplayTransport` has none — it *cannot* accidentally touch the
51
+ network. `AutoTransport` composes the two for cassette semantics.
52
+ - **Request-derived keys.** Each interaction is keyed by a fingerprint of the
53
+ request (method + path + model + normalised body), so one transport handles
54
+ many distinct calls and the same call replays deterministically.
55
+
56
+ ---
57
+
58
+ ## Install
59
+
60
+ ```bash
61
+ pip install agentrec # core is httpx-only
62
+ pip install "agentrec[compression]" # + brotli/zstd cassette decoding
63
+
64
+ # from a checkout:
65
+ pip install -e ".[dev]" # the dev extra adds the SDKs + pytest
66
+ ```
67
+
68
+ ---
69
+
70
+ ## Quick start — the high-level seam
71
+
72
+ Build one `agentrec.async_client()` and pass it to any httpx-based SDK. Wrap
73
+ your calls in a `cassette`: `mode="auto"` replays a request if it's been
74
+ recorded, otherwise records it (true VCR-style cassette behaviour).
75
+
76
+ ```python
77
+ import agentrec
78
+ from openai import AsyncOpenAI
79
+
80
+ store = agentrec.FileStore("corpus")
81
+ http = agentrec.async_client() # honours the active cassette scope
82
+ oai = AsyncOpenAI(http_client=http)
83
+
84
+ # Streaming — every call inside is recorded once, then replayed:
85
+ @agentrec.cassette(store, mode="auto")
86
+ async def ask_stream(prompt: str) -> str:
87
+ stream = await oai.chat.completions.create(
88
+ model="gpt-4o-mini",
89
+ messages=[{"role": "user", "content": prompt}],
90
+ stream=True,
91
+ )
92
+ out = ""
93
+ async for chunk in stream:
94
+ if chunk.choices and chunk.choices[0].delta.content:
95
+ out += chunk.choices[0].delta.content
96
+ return out
97
+
98
+ # Non-streaming — works identically; the JSON body is one chunk at the transport layer:
99
+ @agentrec.cassette(store, mode="auto")
100
+ async def ask(prompt: str) -> str:
101
+ response = await oai.chat.completions.create(
102
+ model="gpt-4o-mini",
103
+ messages=[{"role": "user", "content": prompt}],
104
+ )
105
+ return response.choices[0].message.content
106
+
107
+ # Or as a context manager:
108
+ async with agentrec.cassette(store, mode="record"):
109
+ await oai.chat.completions.create(...)
110
+ ```
111
+
112
+ The same `async_client` + `cassette` works against the Anthropic SDK unchanged —
113
+ just `AsyncAnthropic(http_client=http)`.
114
+
115
+ ---
116
+
117
+ ## Lower-level seam — explicit transports
118
+
119
+ When you'd rather wire the httpx client yourself (no contextvar), use the
120
+ transports directly. `key` is optional: omit it for request-derived keying, or
121
+ pass a fixed id for a single named cassette.
122
+
123
+ ```python
124
+ import httpx
125
+ from openai import AsyncOpenAI
126
+ from agentrec import FileStore, RecordingTransport, ReplayTransport
127
+
128
+ store = FileStore("corpus")
129
+
130
+ # --- Record (needs network) ---
131
+ async with httpx.AsyncClient(
132
+ transport=RecordingTransport(httpx.AsyncHTTPTransport(), store, key="weather")
133
+ ) as http_client:
134
+ client = AsyncOpenAI(http_client=http_client)
135
+ stream = await client.chat.completions.create(..., stream=True)
136
+ async for chunk in stream:
137
+ ... # caller receives the live stream unchanged
138
+
139
+ # --- Replay (offline, no key needed if you recorded with request keying) ---
140
+ async with httpx.AsyncClient(transport=ReplayTransport(store, key="weather")) as http_client:
141
+ client = AsyncOpenAI(http_client=http_client)
142
+ stream = await client.chat.completions.create(..., stream=True)
143
+ async for chunk in stream:
144
+ ... # identical to the recorded run
145
+ ```
146
+
147
+ ---
148
+
149
+ ## Provider support
150
+
151
+ Interception is at the httpx transport, so agentrec is provider-neutral for
152
+ **any SDK that lets you pass an httpx client**:
153
+
154
+ | SDK / client | Works | How |
155
+ | ------------------------------------ | :---: | ------------------------------------------- |
156
+ | OpenAI (`openai`) | ✅ | `AsyncOpenAI(http_client=...)` |
157
+ | Anthropic (`anthropic`) | ✅ | `AsyncAnthropic(http_client=...)` |
158
+ | Most modern httpx-based SDKs / LangChain | ✅ | pass the agentrec httpx client through |
159
+ | **Non-httpx SDKs** (AWS Bedrock/boto3, some Vertex paths) | ❌ | they don't route through httpx, so the transport never sees the call — a different seam would be needed |
160
+
161
+ The boundary is "httpx-backed," not "OpenAI." If a client opens its sockets
162
+ through `botocore`/`urllib3` instead of httpx, transport interception can't see
163
+ it.
164
+
165
+ ---
166
+
167
+ ## Running the tests
168
+
169
+ ```bash
170
+ pytest -q
171
+ ```
172
+
173
+ | Test file | Needs a key? | What it proves |
174
+ | ------------------------ | ------------ | ----------------------------------------------------------------- |
175
+ | `tests/test_streaming.py` | offline + `OPENAI_API_KEY` | OpenAI SSE replay mechanic; live record→replay identity |
176
+ | `tests/test_non_streaming.py` | offline | Plain JSON (non-streaming) record/replay, auto mode, provenance |
177
+ | `tests/test_filestore.py` | offline | FileStore round-trip, redaction, hostile ids, readable cassettes |
178
+ | `tests/test_session.py` | offline | `async_client`/`cassette`, auto mode, request keying, metadata |
179
+ | `tests/test_providers.py` | offline | Adapter decoding (SSE/JSON × provider), translation, registry |
180
+ | `tests/test_comparators.py` | offline | Comparator scoring incl. mocked embedding/judge, spec parsing |
181
+ | `tests/test_migration.py` | offline | Migration end-to-end, caching, lineage metadata, report + CLI |
182
+ | `tests/test_anthropic.py` | offline + `ANTHROPIC_API_KEY` | Anthropic replay (provider-neutrality); live record→replay |
183
+ | `tests/test_live_record.py` | `OPENAI_API_KEY` | live capture against the real OpenAI API |
184
+
185
+ Key-gated tests skip cleanly when the key is absent. Live keys are read from a
186
+ project-root `.env` (via `python-dotenv`). The offline tests use canned SSE
187
+ frames and patch `httpx.AsyncHTTPTransport` so any accidental network access
188
+ fails the test.
189
+
190
+ ---
191
+
192
+ ## Model-migration report
193
+
194
+ Every recording carries provenance in `interaction.metadata`: `provider`,
195
+ `model`, `semantic_key`, and `recorded_at`. The **`semantic_key`** is a hash of
196
+ the request *without* the model (and other non-semantic fields), so two
197
+ interactions recorded against different models for the same logical prompt share
198
+ a `semantic_key`.
199
+
200
+ The migration runner builds on that: it groups the corpus by `semantic_key`,
201
+ re-asks every recorded prompt of a **target model** (cross-provider translation
202
+ included — an OpenAI-recorded prompt can be re-asked of Claude), records the
203
+ target's answers back into the corpus as `migration__…` cassettes, and scores
204
+ baseline vs. target with the selected comparators:
205
+
206
+ | Comparator | Needs network? | What it measures |
207
+ | ----------- | -------------- | ------------------------------------------------- |
208
+ | `exact` | no | normalized string equality (classification-style) |
209
+ | `fuzzy` | no | `difflib` sequence similarity |
210
+ | `embedding` | OpenAI API | cosine similarity of embeddings |
211
+ | `judge` | LLM API | an LLM scores semantic equivalence |
212
+
213
+ ```bash
214
+ # Re-ask the corpus of a candidate model and write Markdown + HTML reports:
215
+ agentrec migrate --corpus corpus --target claude-haiku-4-5 --compare exact,fuzzy,judge
216
+
217
+ # Re-render fully offline from already-recorded migration cassettes
218
+ # (offline comparators only; --strict exits 1 on any failure — a CI gate):
219
+ agentrec report --corpus corpus --target claude-haiku-4-5 --strict
220
+
221
+ # Backfill summary blocks + fingerprint metadata into older cassettes:
222
+ agentrec annotate --corpus corpus
223
+ ```
224
+
225
+ Re-running `migrate` is cheap: each target answer is itself a cassette, so
226
+ already-answered prompts are served from disk and only new prompts hit the API.
227
+ A failed (non-200) target call is never cached — a re-run retries it live.
228
+
229
+ Rows are scored concurrently (`--concurrency`, default 8). Recordings tagged
230
+ with a category — `cassette(store, metadata={"category": "extract"})` — get a
231
+ **per-category breakdown** in the report, and output-token counts per row and
232
+ per category surface verbosity/cost differences between the models.
233
+
234
+ ---
235
+
236
+ ## Attributions
237
+
238
+ See [NOTICE](NOTICE) for third-party acknowledgements, including inspiration
239
+ from **baml_vcr** for the streaming chunk capture/replay pattern.
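Reviewer note: the comparator table in the README above lists `exact` as normalized string equality and `fuzzy` as `difflib` sequence similarity. A minimal offline sketch of those two scores follows. The function names and the normalisation rule (collapse whitespace, case-fold) are assumptions for illustration; the README does not specify how agentrec normalises text.

```python
# Hypothetical sketch of the two offline comparators; not agentrec/comparators.py.
import difflib


def normalize(text):
    # Assumed normalisation: collapse runs of whitespace and ignore case.
    return " ".join(text.split()).casefold()


def exact_score(baseline, target):
    """1.0 when the normalised strings are equal, otherwise 0.0."""
    return 1.0 if normalize(baseline) == normalize(target) else 0.0


def fuzzy_score(baseline, target):
    """Similarity ratio in [0.0, 1.0] from difflib.SequenceMatcher."""
    matcher = difflib.SequenceMatcher(None, normalize(baseline), normalize(target))
    return matcher.ratio()
```

An exact score suits classification-style outputs where any difference matters, while the fuzzy ratio gives partial credit to answers that differ only in a few words.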