deja 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/CHANGELOG.md +24 -0
- data/LICENSE +21 -0
- data/README.md +226 -0
- data/lib/deja/adapters/anthropic.rb +45 -0
- data/lib/deja/adapters/base.rb +83 -0
- data/lib/deja/cache.rb +281 -0
- data/lib/deja/configuration.rb +86 -0
- data/lib/deja/judges/anthropic.rb +32 -0
- data/lib/deja/judges/base.rb +69 -0
- data/lib/deja/requirements_cache.rb +117 -0
- data/lib/deja/rspec.rb +162 -0
- data/lib/deja/session.rb +53 -0
- data/lib/deja/version.rb +5 -0
- data/lib/deja.rb +84 -0
- metadata +120 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: c773af63e95adbc7d09f0cd1f3df714e567da9c4abb03803e58d4ae008f9c7f1
  data.tar.gz: '089eccd51f378482ffdd83c97b53e7d2a45712aadf7f42385d03045256a2484f'
SHA512:
  metadata.gz: 1e91bb15c14bcad87e0ddf77b32c6e77a6cae4313f31b2b212c9669f6c7b3fe626a852beac89f723d4ea0e67ef2e2fe79c5c22960dd9fb2e261077d214d32912
  data.tar.gz: d110bdee9a51d74de91ccef37be10ea86de0863ce0d68b66bc2b1f30d332e3534b5288a47647066192721f7de1026de6c95dd188acbc70f04fde01029cf66c01

data/CHANGELOG.md
ADDED
@@ -0,0 +1,24 @@
# Changelog

All notable changes to this project are documented here. This project adheres to
[Semantic Versioning](https://semver.org/).

## [0.1.0] - 2026-06-19

### Added
- Initial extraction from the Forge test suite.
- `use_llm_cache(id)` — record/replay Anthropic `messages.create` and
  `messages.stream` calls to a per-test YAML file.
- `expect_llm_called` and `forbid_calls` helpers.
- `cached_llm_value(id, *path)` reader.
- `meet_requirements(text)` matcher — judge a value against free-text
  requirements once, then cache the verdict.
- Pluggable provider adapters (`Deja::Adapters::Base`), so a suite can mix
  providers. Ships `:anthropic`. `use_llm_cache` installs every registered
  adapter; cache entries are tagged with `provider:`.
- `Deja.configure` with `cache_root`, `register(provider, install:, real_client:)`,
  a dedicated `judge_client`, and judge model/prompt settings.
- Anthropic SDK response structs + serialize/deserialize extracted into the
  `llm_mock_anthropic` gem (on the shared `llm_mock` contract); the Anthropic
  adapter now delegates to it. `Deja::Anthropic::*` is removed — use
  `LlmMock::Anthropic::*` for canned responses.

data/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Nate Brustein

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

data/README.md
ADDED
@@ -0,0 +1,226 @@
# Deja

Deja is a testing toolkit for code that calls LLM APIs. Your tests make real calls to real LLMs, so you can confirm the model actually gives the results you expect — you're checking genuine model behavior, not a stub you wrote. To keep that fast and repeatable, Deja records each real call once and replays it from a cache on every later run, so your suite stays deterministic and free to run in CI.

## Overview

### What Deja allows

Deja allows you to add the following coverage to your tests:

* I have application code that generates arguments for an LLM API. I want to assert on the arguments that were provided to the LLM API.
* When I pass certain arguments (e.g., a particular prompt) to an LLM, the result is non-deterministic. Even so, I have certain requirements
  as to what the result should be. I want to assert that the LLM's response meets those requirements.

With this functionality, you can do the following:

* I want to change my code and be sure that the changes I made did not affect the arguments passed to the LLM API.
* I want to iterate on my application code in ways that will change a prompt sent to an LLM until the response meets certain requirements.
* I want to change my code in a way that will change a prompt sent to an LLM and be sure that the response still meets existing requirements.
* I want to upgrade to a new model and be sure that all of my existing calls still meet existing requirements.

### How Deja works

1. You run a test locally with `ALLOW_LLM_CALL=1`.
   * When your test hits application code that triggers a call to an LLM API, the call is actually made over HTTP.
   * You assert on the arguments that were sent to the LLM API.
   * You assert in a fuzzy way on the response, like "The response should say that..."
   * Deja caches the response, keyed by the exact set of arguments. The cached response is stored in a generated file, which
     you commit to version control.
2. You run the test again.
   * When the LLM API call is triggered, the test finds the response in the cache and skips the HTTP call to the LLM API.
   * Your assertions ensure that your code still sent the expected arguments.
3. You push your code, and the tests run on CI.
   * Since the cached response is stored in version control, CI has access to it and runs the test without making any
     actual LLM calls.
4. You update code and re-run the test locally with `ALLOW_LLM_CALL=1`.
   * The updated code can change the prompt, the LLM model, or anything else that will change the arguments sent to the LLM API.
   * Since there is no cached response for the new arguments, the call to the LLM API is actually made over HTTP.
   * The new response is cached, replacing the old one.
   * Your fuzzy assertion ensures that the new response still matches your requirements.

### LLM support

Today Deja targets the [Anthropic](https://github.com/anthropics/anthropic-sdk-ruby) SDK. Support for other SDKs is coming.

The Anthropic-specific bits — the response value objects and the
serialize/deserialize that backs the cache — live in a separate gem,
[`llm_mock_anthropic`](https://github.com/nbrustein/llm_mock_anthropic) (built on
the shared [`llm_mock`](https://github.com/nbrustein/llm_mock) contract). Deja
pulls it in automatically. If a test stubs the Anthropic client directly (rather
than recording/replaying), use that gem's builders — e.g.
`LlmMock::Anthropic.message([...])` — to construct canned responses.

## Usage

### Installation

```ruby
# Gemfile
group :test do
  gem "deja"
end
```

### Setup

Require the RSpec integration, point Deja at a cache directory, and register a
provider — telling it how to swap your app's client for Deja's caching stub:

```ruby
# spec/support/deja.rb (or spec/rails_helper.rb)
require "deja/rspec"

Deja.configure do |c|
  # Whatever your app calls to get an Anthropic client. Deja hands you its
  # caching stub; you return it from that accessor for the duration of the test.
  c.register :anthropic,
    install: ->(mock_anthropic_client) { allow(AnthropicClient).to receive(:client).and_return(mock_anthropic_client) }

  # Required only if you use the `meet_requirements` matcher: the client Deja
  # uses to judge a value against its requirements. Deja picks provider-specific
  # defaults from the client's type (Anthropic is built in).
  c.judge_client { Anthropic::Client.new }

  # Optional: override the judge's defaults (model, max_tokens, system prompt) or
  # pass provider-specific args. These are merged into the judge's
  # messages.create call over its built-in defaults.
  c.judge_attrs = { model: "claude-sonnet-4-5" }
end
```

Recorded cache files go under `spec/support/deja_cache` by default. To put them
somewhere else, set `cache_root` (a String or Pathname, resolved under your
project root):

```ruby
Deja.configure { |c| c.cache_root = Rails.root.join("spec/support/cache") }
```

The `install` lambda above assumes your app funnels LLM access through a single
seam. In this example, that seam is `AnthropicClient.client`, which Deja stubs:

```ruby
class AnthropicClient
  def self.client
    Anthropic::Client.new
  end
end
```
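
Under RSpec, the `install` lambda's `allow(AnthropicClient).to receive(:client).and_return(...)` does the swapping, and RSpec undoes it after the example. Without RSpec, the underlying idea is to temporarily replace a singleton method. The sketch below shows this in plain Ruby under these assumptions:

* `AnthropicClient` is a stand-in that returns a symbol instead of a real `Anthropic::Client`.
* `with_stubbed_client` is a hypothetical helper.
* RSpec's mocks do considerably more bookkeeping than this.

```ruby
# Stand-in for the app's seam (a real app would return Anthropic::Client.new).
class AnthropicClient
  def self.client
    :real_client
  end
end

# Hypothetical helper: make AnthropicClient.client return `stub` inside the block,
# then restore the original method, even if the block raises.
def with_stubbed_client(stub)
  singleton = AnthropicClient.singleton_class
  singleton.alias_method(:__original_client, :client)
  singleton.define_method(:client) { stub }
  yield
ensure
  if singleton&.method_defined?(:__original_client)
    singleton.alias_method(:client, :__original_client)
    singleton.remove_method(:__original_client)
  end
end

inside = with_stubbed_client(:deja_stub) { AnthropicClient.client }
after = AnthropicClient.client

puts inside # => deja_stub
puts after  # => real_client
```

Routing every LLM call through one accessor like this is what makes a single `install` hook enough.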

**Optional:** Deja doesn't require WebMock — it intercepts calls at the client
seam, not at the HTTP layer. But if your suite already uses WebMock, allow the
Anthropic host so recording can reach it (and keep the allowlist tight so a
forgotten stub surfaces as a blocked request rather than a silent live call):

```ruby
WebMock.disable_net_connect!(allow_localhost: true, allow: ["api.anthropic.com"])
```

## Assert on LLM API arguments and response

```ruby
it "summarizes an article" do
  use_llm_cache("2026-04-30_17-03") # one cache file for this test

  summary = ArticleSummarizer.new(article).call # makes one LLM call, routed through Deja

  kwargs = expect_llm_called # exactly one call happened
  expect(kwargs[:system]).to include("You are a summarization assistant")

  expect(summary).to meet_requirements(<<~REQ)
    A single sentence under 200 characters that indicates that the article is about The Hitchhiker's Guide to the Galaxy
  REQ
end
```

Run it three ways:

```bash
# 1. First run — nothing cached yet:
bundle exec rspec spec/integration/article_summarizer_spec.rb
# => Deja::MissingCacheError: "Set ALLOW_LLM_CALL=1 to make the call and record it."

# 2. Record — makes the real calls and writes YAML fixtures:
ALLOW_LLM_CALL=1 bundle exec rspec spec/integration/article_summarizer_spec.rb

# 3. Every run after — replays from cache, no network:
bundle exec rspec spec/integration/article_summarizer_spec.rb
```

Commit the YAML files under `cache_root`. They're the recorded fixtures; CI
replays them with no API key.

## Use LLM response in a subsequent assertion

When your code *acts on* what the model returned, you'll often want to assert it
used that output correctly. But the output is non-deterministic, so you can't
hardcode it. `cached_llm_value` reads the actual recorded response out of the
cache file by walking keys and array indices — so your assertion and the
recording stay in sync every time you re-record.

```ruby
it "stores the topics the model extracted" do
  use_llm_cache("2026-05-01_09-15")

  # Asks the model to extract topics via a tool call, then saves them.
  ArticleTagger.new(article).call

  # Read what the model actually returned (a tool_use input) from the cache and
  # assert your code persisted exactly that — no hardcoded expectation.
  topics = cached_llm_value("2026-05-01_09-15",
    "calls", 0, "response", "tool_uses", 0, "input", "topics")

  expect(Article.last.topics).to eq(topics)
end
```

The path mirrors the recorded YAML (see [How it caches](#how-it-caches)):
`calls` → the first call → its `response` → the first `tool_uses` entry → that
tool call's `input` → the `topics` key.
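
Conceptually, walking that path works like Ruby's `Hash#dig`, which accepts both string keys and integer array indices. The sketch below runs `dig` over a hand-written YAML document. Its `tool_uses`/`input` layout follows the path in the example above, but real recordings carry more fields (the full shape is documented at the top of `lib/deja/cache.rb`). This is an analogy, not Deja's implementation: the real `cached_llm_value` may, for example, raise on a missing key instead of returning `nil`.

```ruby
require "yaml"

# Made-up recording, trimmed to the fields the path above touches.
recorded = <<~YAML
  test_suite: integration/article_tagger_spec
  calls:
    - provider: anthropic
      hash: "3f9a1c0b7e2d"
      response:
        tool_uses:
          - name: save_topics
            input:
              topics:
                - towels
                - improbability
YAML

data = YAML.safe_load(recorded)

# Walk string keys and integer indices in order, like cached_llm_value's *path.
topics = data.dig("calls", 0, "response", "tool_uses", 0, "input", "topics")
puts topics.inspect # => ["towels", "improbability"]

# With dig, a path that runs off the end of the data yields nil.
missing = data.dig("calls", 1, "response")
```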

## DSL reference

| Helper | What it does |
| --- | --- |
| `use_llm_cache(id)` | Installs the caching stub and sets the per-test cache id. Call once at the top of an example. |
| `expect_llm_called` | Asserts exactly one LLM call happened; returns its kwargs. Currently only useful when a test makes a single LLM call. |
| `forbid_llm_calls` | Installs a client that raises on any access — proves a code path never reaches the LLM. |
| `cached_llm_value(id, *path)` | Reads a value out of a recorded YAML file by walking keys/indices. |
| `meet_requirements(text)` | Matcher: asserts a value satisfies free-text requirements (judged once, cached). |

## How it caches

One YAML file per test, keyed by the id you pass to `use_llm_cache`:

```
<cache_root>/cached_calls/<spec/path>/<id>.yaml        # recorded responses
<cache_root>/meets_requirements/<spec/path>/<id>.yaml  # confirmed meet_requirements values
```

Here `<spec/path>` is the spec file's path relative to `spec/`, without the `.rb`
extension (e.g. `integration/article_summarizer_spec`).

Each request is fingerprinted with a 12-character hash: the first 12 hex digits
of a SHA-256 over the method name and the canonicalized kwargs (keys stringified
and sorted recursively; `request_options` is excluded). On replay, a miss raises
`Deja::MissingCacheError` with a unified diff against the closest recorded
request, so you can see exactly what drifted. Re-recording (`ALLOW_LLM_CALL=1`)
prunes any cached entry the test no longer reaches.
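
The fingerprint can be reproduced in a few lines. The sketch below mirrors `canonicalize`, `cache_affecting_args`, and `call_hash` from `lib/deja/cache.rb` in this diff. They are restated as standalone functions so the sketch runs without the gem, and the example kwargs are made up. It shows why the following leave the hash unchanged, while any value change alters it:

* key order,
* Symbol vs. String keys,
* `request_options`.

```ruby
require "digest"
require "json"

# Recursively stringify keys and symbols and sort hash keys, as Deja's canonicalize does.
def canonicalize(obj)
  case obj
  when Hash then obj.each_with_object({}) { |(k, v), h| h[k.to_s] = canonicalize(v) }.sort.to_h
  when Array then obj.map { |v| canonicalize(v) }
  when Symbol then obj.to_s
  else obj
  end
end

# First 12 hex chars of SHA-256 over {method, args}; request_options is dropped first.
def call_hash(method, kwargs)
  payload = canonicalize({ method: method.to_s, args: canonicalize(kwargs.except(:request_options)) })
  Digest::SHA256.hexdigest(JSON.generate(payload))[0, 12]
end

a = call_hash(:create, { model: "claude-sonnet-4-5", max_tokens: 512,
                         messages: [{ role: "user", content: "Hi" }] })
# Same request: different key order, string keys, plus request_options.
b = call_hash("create", { "messages" => [{ "content" => "Hi", "role" => "user" }],
                          "max_tokens" => 512, "model" => "claude-sonnet-4-5",
                          request_options: { timeout: 30 } })
# Changing any value changes the fingerprint.
c = call_hash(:create, { model: "claude-sonnet-4-5", max_tokens: 1024,
                         messages: [{ role: "user", content: "Hi" }] })

puts a.length # => 12
puts a == b   # => true
puts a == c   # => false
```

Because `request_options` (timeouts, retries and the like) is excluded, tuning transport settings never invalidates a recording.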

## Environment variables

| Variable | Effect |
| --- | --- |
| `ALLOW_LLM_CALL=1` | Make real calls and record/update the cache. Your real client must be able to authenticate (the Anthropic SDK reads `ANTHROPIC_API_KEY` by default). |
| `DISABLE_LLM_CACHE=1` | Bypass the cache entirely and always call live (debugging). |

Both variables are checked for presence only, not value, so setting either to `0` still turns it on.
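
The variables above, combined with the hit/miss rules listed at the top of `lib/deja/cache.rb`, reduce to a small decision. Below is a standalone sketch of that decision. It uses an in-memory Hash in place of the per-test YAML file. `MissingCache` and `fetch_with_cache` are hypothetical stand-ins, not Deja's API, though the order of checks follows `Deja::Cache.fetch`.

```ruby
# Hypothetical stand-in for Deja::MissingCacheError.
class MissingCache < StandardError; end

# Same order of checks as Deja::Cache.fetch, with a Hash instead of a YAML file.
def fetch_with_cache(store, key, env)
  return yield if env["DISABLE_LLM_CACHE"]  # bypass: always call live, record nothing
  return store[key] if store.key?(key)      # hit: replay the recorded response
  raise MissingCache, "Set ALLOW_LLM_CALL=1 to make the call and record it." unless env["ALLOW_LLM_CALL"]

  store[key] = yield                        # miss while recording: call live and record
end

store = {}
live_calls = 0
live = -> { live_calls += 1; "response ##{live_calls}" }

begin
  fetch_with_cache(store, "abc123", {}) { live.call } # CI-style run, nothing recorded yet
rescue MissingCache => e
  puts "miss: #{e.message}"
end

first  = fetch_with_cache(store, "abc123", { "ALLOW_LLM_CALL" => "1" }) { live.call }    # records
second = fetch_with_cache(store, "abc123", {}) { live.call }                            # replays
bypass = fetch_with_cache(store, "abc123", { "DISABLE_LLM_CACHE" => "1" }) { live.call } # live

puts first  # => response #1
puts second # => response #1
puts bypass # => response #2
```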

## Configuration

| Setting | Default | Purpose |
| --- | --- | --- |
| `cache_root` | `spec/support/deja_cache` | Directory for recorded YAML (under `project_root`). |
| `register(provider, install:, real_client:, as:)` | — (≥1 required) | Register a provider. `install` swaps your app's client for Deja's stub; `real_client` (optional) builds a live client for recording. |
| `project_root` | `Dir.pwd` | Base for resolving a relative `cache_root` and for relative paths in error messages. |
| `judge_client { ... }` | — (required for `meet_requirements`) | Live client used by the `meet_requirements` judge. No default. |
| `judge_attrs = { ... }` | `{}` | Attrs merged into the `meet_requirements` judge's `messages.create` call, overriding the judge's own defaults (model, token limit, system prompt). `messages` and `output_config` are reserved by the matcher. |
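
One way to picture how `judge_attrs` combines with the judge's defaults is as two `Hash#merge` calls: your attrs override the defaults, and the matcher's reserved keys are applied last. This is a sketch built on two assumptions:

* The default values below are invented for illustration.
* The matcher simply overwrites `messages` and `output_config`, rather than, say, raising if you set them.

```ruby
# Invented defaults, for illustration only; the judge's real defaults are not shown in this diff.
JUDGE_DEFAULTS = {
  model: "default-judge-model",
  max_tokens: 1024,
  system: "You judge whether a value meets requirements.",
}.freeze

# Assumption: reserved keys are set by the matcher after user attrs are merged.
def judge_request(judge_attrs, messages:, output_config:)
  JUDGE_DEFAULTS
    .merge(judge_attrs)
    .merge(messages: messages, output_config: output_config)
end

request = judge_request(
  { model: "claude-sonnet-4-5", messages: ["ignored"] },
  messages: [{ role: "user", content: "Value: ... Requirements: ..." }],
  output_config: { format: "verdict" }, # hypothetical shape
)

p request[:model]                 # => "claude-sonnet-4-5"
p request[:max_tokens]            # => 1024
p request[:messages].first[:role] # => "user"
```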

## License

MIT — see [LICENSE](LICENSE).

data/lib/deja/adapters/anthropic.rb
ADDED
@@ -0,0 +1,45 @@
# frozen_string_literal: true

require "deja/adapters/base"
require "llm_mock_anthropic"

module Deja
  module Adapters
    # Wires the Anthropic provider from llm_mock_anthropic into Deja's cache.
    # All Anthropic-SDK shape knowledge — the stub client, the response structs,
    # and serialize/deserialize — lives in LlmMock::Anthropic; this adapter just
    # routes its calls through Deja::Cache (via Base#cached_call).
    class Anthropic < Base
      def provider
        @provider ||= LlmMock::Anthropic::Provider.new
      end

      def build_mock_client
        adapter = self
        provider.build_client do |method, kwargs|
          adapter.cached_call(method, kwargs) do
            adapter.provider.call_real(adapter.real_client, method, kwargs)
          end
        end
      end

      def default_real_client
        provider.default_real_client
      end

      def prompt_for(kwargs)
        provider.prompt_for(kwargs)
      end

      def serialize(method, response)
        provider.serialize(method, response)
      end

      def deserialize(method, data)
        provider.deserialize(method, data)
      end
    end

    register_type(:anthropic, Anthropic)
  end
end
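
`build_mock_client` hands `provider.build_client` a block that receives `(method, kwargs)` for each call. The real stub comes from `llm_mock_anthropic`, whose internals are not part of this diff. The sketch below is a hypothetical, self-contained stand-in showing the general shape. It is an object exposing `messages.create` and `messages.stream` that forwards every call, tagged with its method name, to one handler block. That block plays the role `cached_call` plays in the adapter.

```ruby
# Hypothetical stand-in for the stub client returned by LlmMock::Anthropic::Provider#build_client.
class StubClient
  # Mimics the `client.messages` namespace of the Anthropic SDK.
  class Messages
    def initialize(handler)
      @handler = handler
    end

    def create(**kwargs)
      @handler.call(:create, kwargs)
    end

    def stream(**kwargs)
      @handler.call(:stream, kwargs)
    end
  end

  attr_reader :messages

  def initialize(&handler)
    @messages = Messages.new(handler)
  end
end

seen = []
client = StubClient.new do |method, kwargs|
  seen << [method, kwargs] # the Deja adapter would call cached_call here
  "canned #{method} response"
end

result = client.messages.create(model: "claude-sonnet-4-5", max_tokens: 256,
                                messages: [{ role: "user", content: "Hi" }])
puts result           # => canned create response
puts seen.first.first # => create
```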

data/lib/deja/adapters/base.rb
ADDED
@@ -0,0 +1,83 @@
# frozen_string_literal: true

module Deja
  # Adapters teach Deja how to talk to one LLM provider: the stub client's shape,
  # how to (de)serialize a response, and how to build a real client. The cache,
  # hashing, and matcher are all provider-agnostic and live elsewhere.
  module Adapters
    @registered = {}

    class << self
      # Built-in adapter classes register themselves by provider name, so
      # `c.register :anthropic, ...` knows which class to build. A new provider is
      # purely additive: a new Base subclass + one register_type call.
      def register_type(name, klass)
        @registered[name] = klass
      end

      def build(provider, key:, install:, real_client:)
        klass = @registered.fetch(provider) do
          raise Deja::Error, "Unknown provider #{provider.inspect}. Registered: #{@registered.keys.inspect}"
        end
        klass.new(key:, install:, real_client:)
      end
    end

    class Base
      attr_reader :key, :install_block

      # key — how this registration is named (usually the provider symbol)
      # install — block run in the example's context to swap the app's client
      #   for the one Deja hands it
      # real_client — optional block building a live client; falls back to the
      #   subclass default
      def initialize(key:, install:, real_client: nil)
        @key = key
        @install_block = install
        @real_client_override = real_client
      end

      def real_client
        (@real_client_override || default_real_client).call
      end

      # Wraps a single call: records it (for expect_llm_called), routes through the
      # cache, and (de)serializes via the subclass. `real_call` performs the live
      # provider call when recording.
      def cached_call(method, kwargs, &real_call)
        Deja.record_call(key, method, kwargs)
        data = Deja::Cache.fetch(method, kwargs, provider: key, prompt: prompt_for(kwargs)) do
          serialize(method, real_call.call)
        end
        deserialize(method, data)
      end

      # --- subclass hooks ---

      # The stub client object the app receives. Its methods call `cached_call`.
      def build_mock_client
        raise NotImplementedError, "#{self.class} must implement #build_mock_client"
      end

      # Provider response object -> plain Hash (must round-trip through deserialize).
      def serialize(_method, _response)
        raise NotImplementedError, "#{self.class} must implement #serialize"
      end

      # Plain Hash (from the cache) -> object shaped like the provider's response.
      def deserialize(_method, _data)
        raise NotImplementedError, "#{self.class} must implement #deserialize"
      end

      def default_real_client
        raise NotImplementedError, "#{self.class} must implement #default_real_client"
      end

      # A human-readable prompt string stored on the cache entry (purely for
      # auditing the YAML). Optional.
      def prompt_for(_kwargs)
        nil
      end
    end
  end
end
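
The `serialize`/`deserialize` hooks have one hard requirement: a response must survive a trip through the YAML cache. Below is a minimal sketch of that contract for a made-up provider, including the `YAML.dump`/`YAML.safe_load` step the cache performs. `FakeResponse` and both functions are hypothetical, not part of Deja or any SDK. The `text_response` key is chosen because `build_summary` in `cache.rb` counts entries by that key.

```ruby
require "yaml"

# Made-up provider response type: text plus token usage.
FakeResponse = Struct.new(:text, :input_tokens, :output_tokens, keyword_init: true)

# Response object -> plain Hash of strings/integers, safe for YAML.safe_load.
def serialize(_method, response)
  {
    "text_response" => response.text,
    "usage" => { "input_tokens" => response.input_tokens, "output_tokens" => response.output_tokens },
  }
end

# Plain Hash from the cache -> object shaped like the provider's response.
def deserialize(_method, data)
  FakeResponse.new(
    text: data.fetch("text_response"),
    input_tokens: data.dig("usage", "input_tokens"),
    output_tokens: data.dig("usage", "output_tokens"),
  )
end

original = FakeResponse.new(text: "Don't panic.", input_tokens: 12, output_tokens: 4)

# Simulate the cache: dump to YAML, read it back with safe_load (no custom classes allowed).
yaml = YAML.dump(serialize(:create, original))
replayed = deserialize(:create, YAML.safe_load(yaml))

puts replayed == original # => true
```

Keeping the serialized form to plain strings, numbers, arrays and hashes is what lets the cache load it with `permitted_classes: []`.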

data/lib/deja/cache.rb
ADDED
@@ -0,0 +1,281 @@
# frozen_string_literal: true

require "diff/lcs"
require "diff/lcs/hunk"
require "digest"
require "fileutils"
require "json"
require "set"
require "yaml"

module Deja
  # File-based cache for LLM responses, keyed by an id chosen per-test (see
  # `use_llm_cache(id)`). One file per test: `<cache_root>/cached_calls/<suite>/<id>.yaml`.
  # All calls a test makes land in that one file under `calls:`, each tagged with a
  # `hash` of the kwargs so we can look up the right cached response on replay.
  #
  # YAML shape:
  #   test_suite: <derived from the spec file path>
  #   test_name: <full RSpec description>
  #   summary: <human-readable counts: total / tool_use / message-only>
  #   calls:
  #     - provider: <which registered adapter produced this — e.g. anthropic>
  #       hash: <12-char fingerprint of kwargs — used for lookup>
  #       prompt: <adapter-supplied readable prompt, when present>
  #       payload: <full canonicalized kwargs — for a precise diff on miss>
  #       response: <adapter-serialized response hash; the adapter replays it>
  #
  # Behavior:
  #   DISABLE_LLM_CACHE=1 → bypass cache entirely
  #   cache hit → return cached response
  #   miss + ALLOW_LLM_CALL=1 → call live, append to the test's file
  #   miss + no ALLOW_LLM_CALL → raise Deja::MissingCacheError
  module Cache
    module_function

    def cache_dir
      Deja.configuration.cache_root.join("cached_calls")
    end

    def fetch(method, kwargs, provider:, prompt: nil)
      return yield if ENV["DISABLE_LLM_CACHE"]

      hash = call_hash(method, kwargs)
      record_touched(hash)
      entry = load_call(hash)

      if entry
        response_from_entry(entry)
      elsif ENV["ALLOW_LLM_CALL"]
        response = yield
        append_call!(provider, hash, kwargs, prompt, response)
        response
      else
        raise Deja::MissingCacheError, build_miss_message(hash, kwargs)
      end
    end

    # Builds the MissingCacheError body. When there's a cached entry whose
    # canonicalized payload is similar to the current request, we show a
    # unified diff against the cached payload so the test author can see
    # exactly what drifted between record and replay. The cache stores the
    # full canonicalized payload on each entry, so this covers `system`,
    # `messages`, `tools`, `tool_choice`, etc. — anything the hash is computed
    # over.
    def build_miss_message(hash, kwargs)
      base = "No cached LLM response with hash #{hash} in #{display_path(cache_file)}.\n" \
             "Set ALLOW_LLM_CALL=1 to make the call and record it."
      current_payload = JSON.pretty_generate(cache_affecting_args(kwargs))
      closest = closest_cached_entry(current_payload)
      return base unless closest

      cached_payload = JSON.pretty_generate(closest["payload"]) if closest["payload"]
      cached_payload ||= closest["prompt"].to_s # legacy entries: only prompt was stored
      diff = unified_diff(cached_payload, current_payload, context: 3)
      if diff.empty?
        return "#{base}\n\nClosest cached entry: #{closest['hash']} " \
               "(prompts differ outside the captured payload)"
      end

      "#{base}\n\n" \
        "Closest cached entry: #{closest['hash']}\n" \
        "--- cached payload (#{closest['hash']})\n" \
        "+++ current payload (#{hash})\n" \
        "#{diff}"
    end

    # Picks the cached entry whose stored payload (or, for legacy entries that
    # only stored `prompt`, system text) has the largest LCS overlap with the
    # current request. Returns nil when the cache file is empty.
    def closest_cached_entry(current_text)
      return nil unless cache_file.exist?

      data = YAML.safe_load(cache_file.read, permitted_classes: [], aliases: false)
      calls = data["calls"]
      return nil if calls.nil? || calls.empty?

      current_lines = current_text.lines
      calls.max_by do |c|
        cached_text = c["payload"] ? JSON.pretty_generate(c["payload"]) : c["prompt"].to_s
        Diff::LCS.lcs(cached_text.lines, current_lines).size
      end
    end

    # Returns a unified diff (with `context` lines of context) between two
    # strings, or an empty string when they're identical.
    def unified_diff(old_text, new_text, context: 2)
      old_lines = old_text.lines
      new_lines = new_text.lines
      return "" if old_lines == new_lines

      diffs = Diff::LCS.diff(old_lines, new_lines)
      return "" if diffs.empty?

      out = +""
      file_length_difference = 0
      diffs.each do |piece|
        hunk = Diff::LCS::Hunk.new(old_lines, new_lines, piece, context, file_length_difference)
        file_length_difference = hunk.file_length_difference
        out << hunk.diff(:unified).to_s
        out << "\n"
      end
      out
    end

    # Drops any call entry from the test's file whose hash wasn't looked up during
    # the example — covers the case where a kwarg edit (or a deleted call) leaves
    # an old entry unreachable. Only runs when ALLOW_LLM_CALL=1 (re-record mode);
    # cache-only runs leave the file untouched so a temporarily-disabled call
    # doesn't lose its cached response.
    def prune_untouched_in_current_example!
      return unless cache_file.exist?

      data = YAML.safe_load(cache_file.read)
      touched = touched_hashes
      fresh_calls = data["calls"].select {|c| touched.include?(c["hash"]) }
      return if fresh_calls.size == data["calls"].size

      if fresh_calls.empty?
        cache_file.delete
      else
        data["calls"] = fresh_calls
        data["summary"] = build_summary(fresh_calls)
        cache_file.write(YAML.dump(stringify(data)))
      end
    end

    def record_touched(hash)
      touched_hashes << hash
    end

    def touched_hashes
      current_example!.metadata[:touched_llm_cache_hashes] ||= Set.new
    end

    def cache_file
      cache_dir.join(test_suite, "#{current_id!}.yaml")
    end

    def call_hash(method, kwargs)
      payload = canonicalize({method: method.to_s, args: cache_affecting_args(kwargs)})
      Digest::SHA256.hexdigest(JSON.generate(payload))[0, 12]
    end

    def cache_affecting_args(kwargs)
      canonicalize(kwargs.except(:request_options))
    end

    def canonicalize(obj)
      case obj
      when Hash
        obj.each_with_object({}) {|(k, v), h| h[k.to_s] = canonicalize(v) }.sort.to_h
      when Array
        obj.map {|v| canonicalize(v) }
      when Symbol
        obj.to_s
      else
        obj
      end
    end

    def load_call(hash)
      return nil unless cache_file.exist?

      data = YAML.safe_load(cache_file.read, permitted_classes: [], aliases: false)
      data["calls"].find {|c| c["hash"] == hash }
    end

    # The recorded response hash, handed back to the adapter to deserialize.
    def response_from_entry(entry)
      entry.fetch("response")
    end

    def append_call!(provider, hash, kwargs, prompt, response)
      FileUtils.mkdir_p(cache_file.dirname)
      data = cache_file.exist? ? YAML.safe_load(cache_file.read) : new_file_data
      data["calls"] << build_call_entry(provider, hash, kwargs, prompt, response)
      data["summary"] = build_summary(data["calls"])
      cache_file.write(YAML.dump(stringify(data)))
    end

    def new_file_data
      {
        "test_suite" => test_suite,
        "test_name" => current_test_name,
        "summary" => "",
        "calls" => [],
      }
    end

    # Provider-agnostic: the adapter already serialized `response` (including any
    # readable conveniences like text_response/tool_uses). We tag the entry with
    # the provider and store the canonicalized payload so a cache miss can report
    # a precise diff.
    def build_call_entry(provider, hash, kwargs, prompt, response)
      entry = {"provider" => provider.to_s, "hash" => hash}
      entry["prompt"] = prompt unless prompt.nil?
      entry["payload"] = cache_affecting_args(kwargs)
      entry["response"] = response
      entry
    end

    def build_summary(calls)
      tool_use_count = calls.count {|c| c["response"]["tool_uses"] }
      text_only_count = calls.count {|c| c["response"]["text_response"] && !c["response"]["tool_uses"] }

      parts = [ "#{calls.size} LLM #{calls.size == 1 ? 'call' : 'calls'} made." ]
      if tool_use_count > 0
        parts << "#{tool_use_count} #{tool_use_count == 1 ? 'call' : 'calls'} returned tool use responses."
      end
      if text_only_count > 0
        parts << "#{text_only_count} #{text_only_count == 1 ? 'call' : 'calls'} returned a message response."
      end
      parts.join("\n")
    end

    # Like canonicalize but preserves insertion order so the readable header
    # (test_suite/test_name/summary/calls) stays at the top of the YAML file.
    def stringify(obj)
      case obj
      when Hash
        obj.each_with_object({}) {|(k, v), h| h[k.to_s] = stringify(v) }
      when Array
        obj.map {|v| stringify(v) }
      when Symbol
        obj.to_s
      else
        obj
      end
    end

    # Derived from the spec file path. Purely organizational — moving a test to a
    # different suite means moving its cache file, but the suite name itself has
    # no behavioral effect beyond placement.
    def test_suite
      file_path = current_example!.metadata.fetch(:file_path)
      file_path.sub(%r{^\./spec/}, "").sub(/\.rb$/, "")
    end

    def current_test_name
      current_example!.metadata.fetch(:full_description)
    end

    def current_id!
      id = current_example!.metadata[:llm_cache_id]
      raise Deja::MissingIdError, "No id set on the current example. Call use_llm_cache(id) before making LLM calls." if id.nil?

      id
    end

    def current_example!
      RSpec.current_example or raise Deja::Error, "Deja must be used inside an RSpec example"
    end

    # Renders `path` relative to the configured project_root for friendlier error
    # messages, falling back to the absolute path when it's outside the root.
    def display_path(path)
      path.relative_path_from(Deja.configuration.project_root)
    rescue ArgumentError
      path
    end
  end
end
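
`closest_cached_entry` ranks recorded entries by the length of the longest common subsequence (LCS) of lines between each stored payload and the current one, using the `diff-lcs` gem. The standalone sketch below shows the same idea without that dependency. It replaces `Diff::LCS.lcs(...).size` with a textbook dynamic-programming LCS length, and the payloads are made up. The second cached payload differs from the current request only in its `system` line, so it wins even though it is listed last.

```ruby
require "json"

# Length of the longest common subsequence of two arrays (classic O(n*m) DP, one row at a time).
def lcs_length(a, b)
  prev = Array.new(b.size + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      curr << (x == y ? prev[j] + 1 : [prev[j + 1], curr[j]].max)
    end
    prev = curr
  end
  prev.last
end

# Pick the cached payload sharing the most pretty-printed lines with the current request.
def closest(cached_payloads, current_payload)
  current_lines = JSON.pretty_generate(current_payload).lines
  cached_payloads.max_by { |p| lcs_length(JSON.pretty_generate(p).lines, current_lines) }
end

cached = [
  { "model" => "claude-haiku-4-5", "max_tokens" => 1024, "system" => "Extract topics." },
  { "model" => "claude-sonnet-4-5", "max_tokens" => 512, "system" => "Summarize the article." },
]
current = { "model" => "claude-sonnet-4-5", "max_tokens" => 512,
            "system" => "Summarize the article in one sentence." }

puts closest(cached, current)["system"] # => Summarize the article.
```

Comparing pretty-printed JSON line by line means a single edited field costs one line of overlap, which is why the nearest recording usually surfaces even in large payloads.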