deja 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: c773af63e95adbc7d09f0cd1f3df714e567da9c4abb03803e58d4ae008f9c7f1
4
+ data.tar.gz: '089eccd51f378482ffdd83c97b53e7d2a45712aadf7f42385d03045256a2484f'
5
+ SHA512:
6
+ metadata.gz: 1e91bb15c14bcad87e0ddf77b32c6e77a6cae4313f31b2b212c9669f6c7b3fe626a852beac89f723d4ea0e67ef2e2fe79c5c22960dd9fb2e261077d214d32912
7
+ data.tar.gz: d110bdee9a51d74de91ccef37be10ea86de0863ce0d68b66bc2b1f30d332e3534b5288a47647066192721f7de1026de6c95dd188acbc70f04fde01029cf66c01
data/CHANGELOG.md ADDED
@@ -0,0 +1,24 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project are documented here. This project adheres to
4
+ [Semantic Versioning](https://semver.org/).
5
+
6
+ ## [0.1.0] - 2026-06-19
7
+
8
+ ### Added
9
+ - Initial extraction from the Forge test suite.
10
+ - `use_llm_cache(id)` — record/replay Anthropic `messages.create` and
11
+ `messages.stream` calls to a per-test YAML file.
12
+ - `expect_llm_called` and `forbid_llm_calls` helpers.
13
+ - `cached_llm_value(id, *path)` reader.
14
+ - `meet_requirements(text)` matcher — judge a value against free-text
15
+ requirements once, then cache the verdict.
16
+ - Pluggable provider adapters (`Deja::Adapters::Base`), so a suite can mix
17
+ providers. Ships `:anthropic`. `use_llm_cache` installs every registered
18
+ adapter; cache entries are tagged with `provider:`.
19
+ - `Deja.configure` with `cache_root`, `register(provider, install:, real_client:)`,
20
+ a dedicated `judge_client`, and judge model/prompt settings.
21
+ - Anthropic SDK response structs + serialize/deserialize extracted into the
22
+ `llm_mock_anthropic` gem (on the shared `llm_mock` contract); the Anthropic
23
+ adapter now delegates to it. `Deja::Anthropic::*` is removed — use
24
+ `LlmMock::Anthropic::*` for canned responses.
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Nate Brustein
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,226 @@
1
+ # Deja
2
+
3
+ Deja is a testing toolkit for code that calls LLM APIs. Your tests make real calls to real LLMs, so you verify genuine model behavior rather than a stub you wrote. To keep that fast and repeatable, Deja records each real call once and replays it from a cache on every later run, so your suite stays deterministic and costs nothing to run in CI.
4
+
5
+ ## Overview
6
+
7
+ ### What Deja allows
8
+
9
+ Deja lets you add the following coverage to your tests:
10
+
11
+ * I have application code that generates arguments for an LLM API. I want to assert on the arguments that were provided to the LLM API.
12
+ * When I pass certain arguments (e.g., a certain prompt) to an LLM, the result is non-deterministic. Even so, I have certain requirements
13
+ as to what the result should be. I want to assert that the LLM's response meets those requirements.
14
+
15
+ With this functionality, you can do the following.
16
+
17
+ * I want to change my code and be sure that the changes I made did not affect the arguments passed to the LLM API.
18
+ * I want to iterate on my application code in ways that will change a prompt sent to an LLM until the response meets certain requirements.
19
+ * I want to change my code in a way that will change a prompt sent to an LLM and be sure that the response still meets existing requirements.
20
+ * I want to upgrade to a new model and be sure that all of my existing calls still meet existing requirements.
21
+
22
+ ### How Deja works
23
+
24
+ 1. You run a test locally with `ALLOW_LLM_CALL=1`.
25
+ * When your test hits application code that triggers a call to an LLM API, the call is actually made over HTTP.
26
+ * You assert on the arguments that were sent to the LLM API.
27
+ * You assert in a fuzzy way on the response, like "The response should say that...".
28
+ * Deja caches the response, keyed on the exact set of arguments. The cached response is stored in a generated file, which
29
+ you commit to version control.
30
+ 2. You run the test again.
31
+ * When the LLM API call is triggered, the test finds the response in the cache and skips the HTTP call to the LLM API.
32
+ * Your assertions ensure that your code still sent the expected arguments.
33
+ 3. You push your code and tests run on CI.
34
+ * Since the cached response is stored in version control, CI has access to it and runs the test without making any
35
+ actual LLM calls.
36
+ 4. You update code and re-run the test locally with `ALLOW_LLM_CALL=1`.
37
+ * The updated code can change the prompt, the LLM model, or anything else that changes the arguments sent to the LLM API.
38
+ * Since there is no cached response for the new arguments, the call to the LLM API is actually made over HTTP.
39
+ * The new response is cached, replacing the old one.
40
+ * Your fuzzy assertion ensures that the new response still matches your requirements.
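The record/replay decision described in the steps above can be sketched as follows. This is a simplified, in-memory illustration and not Deja's implementation: `TinyCache`, `MissingEntry`, and the `allow_live:` flag (standing in for `ALLOW_LLM_CALL=1`) are hypothetical names chosen for the example.

```ruby
require "digest"
require "json"

# Hypothetical in-memory stand-in for a record/replay cache.
class TinyCache
  MissingEntry = Class.new(StandardError)

  def initialize(allow_live:)
    @allow_live = allow_live # plays the role of ALLOW_LLM_CALL=1
    @entries = {}
  end

  # Returns the recorded response for `args`. On a miss, records the
  # block's result when live calls are allowed, otherwise raises.
  def fetch(args)
    key = Digest::SHA256.hexdigest(JSON.generate(args.sort.to_h))[0, 12]
    return @entries[key] if @entries.key?(key)
    raise MissingEntry, "no recording for #{key}; enable live calls to record" unless @allow_live

    @entries[key] = yield
  end
end

live_calls = 0
recorder = TinyCache.new(allow_live: true)
# First call misses, so the block (the "live" call) runs and is recorded.
first = recorder.fetch({model: "m", prompt: "hi"}) { live_calls += 1; "hello" }
# Same arguments in a different order hit the cache; the block is skipped.
second = recorder.fetch({prompt: "hi", model: "m"}) { live_calls += 1; "never" }
```

A cache built with `allow_live: false` raises `TinyCache::MissingEntry` on any unseen arguments, which corresponds to the CI situation in step 3.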
41
+
42
+ ### LLM support
43
+
44
+ Today Deja targets the [Anthropic](https://github.com/anthropics/anthropic-sdk-ruby) SDK. Support for other SDKs is coming.
45
+
46
+ The Anthropic-specific bits — the response value objects and the
47
+ serialize/deserialize that backs the cache — live in a separate gem,
48
+ [`llm_mock_anthropic`](https://github.com/nbrustein/llm_mock_anthropic) (built on
49
+ the shared [`llm_mock`](https://github.com/nbrustein/llm_mock) contract). Deja
50
+ pulls it in automatically. If a test stubs the Anthropic client directly (rather
51
+ than recording/replaying), use that gem's builders — e.g.
52
+ `LlmMock::Anthropic.message([...])` — to construct canned responses.
53
+
54
+ ## Usage
55
+
56
+ ### Installation
57
+
58
+ ```ruby
59
+ # Gemfile
60
+ group :test do
61
+ gem "deja"
62
+ end
63
+ ```
64
+
65
+ ### Setup
66
+
67
+ Require the RSpec integration, point Deja at a cache directory, and register a
68
+ provider — telling it how to swap your app's client for Deja's caching stub:
69
+
70
+ ```ruby
71
+ # spec/support/deja.rb (or spec/rails_helper.rb)
72
+ require "deja/rspec"
73
+
74
+ Deja.configure do |c|
75
+ # Whatever your app calls to get an Anthropic client. Deja hands you its
76
+ # caching stub; you return it from that accessor for the duration of the test.
77
+ c.register :anthropic,
78
+ install: ->(mock_anthropic_client) { allow(AnthropicClient).to receive(:client).and_return(mock_anthropic_client) }
79
+
80
+ # Required only if you use the `meet_requirements` matcher: the client Deja
81
+ # uses to judge a value against its requirements. Deja picks provider-specific
82
+ # defaults from the client's type (Anthropic is built in).
83
+ c.judge_client { Anthropic::Client.new }
84
+
85
+ # Optional: override the judge's defaults (model, max_tokens, system prompt) or
86
+ # pass provider-specific args. These are merged into the judge's
87
+ # messages.create call over its built-in defaults.
88
+ c.judge_attrs = { model: "claude-sonnet-4-5" }
89
+ end
90
+ ```
91
+
92
+ Recorded cache files go under `spec/support/deja_cache` by default. To put them
93
+ somewhere else, set `cache_root` (a String or Pathname, resolved under your
94
+ project root):
95
+
96
+ ```ruby
97
+ Deja.configure { |c| c.cache_root = Rails.root.join("spec/support/cache") }
98
+ ```
99
+
100
+ The `install` block above assumes your app funnels LLM access through a single seam. In this example,
101
+ Deja mocks out calls to `AnthropicClient.client`:
102
+
103
+ ```ruby
104
+ class AnthropicClient
105
+ def self.client
106
+ Anthropic::Client.new
107
+ end
108
+ end
109
+ ```
110
+
111
+
112
+ **Optional:** Deja doesn't require WebMock — it intercepts calls at the client
113
+ seam, not at the HTTP layer. But if your suite already uses WebMock, allow the
114
+ Anthropic host so recording can reach it (and keep the allowlist tight so a
115
+ forgotten stub surfaces as a blocked request rather than a silent live call):
116
+
117
+ ```ruby
118
+ WebMock.disable_net_connect!(allow_localhost: true, allow: ["api.anthropic.com"])
119
+ ```
120
+
121
+ ## Assert on LLM API arguments and response
122
+
123
+ ```ruby
124
+ it "summarizes an article" do
125
+ use_llm_cache("2026-04-30_17-03") # one cache file for this test
126
+
127
+ summary = ArticleSummarizer.new(article).call # makes LLM calls — routed through Deja
128
+
129
+ kwargs = expect_llm_called # exactly one call happened
130
+ expect(kwargs[:system]).to include("You are a summarization assistant")
131
+
132
+ expect(summary).to meet_requirements(<<~REQ)
133
+ A single sentence under 200 characters that indicates that the article is about The Hitchhiker's Guide to the Galaxy
134
+ REQ
135
+ end
136
+ ```
137
+
138
+ Run it three ways:
139
+
140
+ ```bash
141
+ # 1. First run — nothing cached yet:
142
+ bundle exec rspec spec/integration/article_summarizer_spec.rb
143
+ # => Deja::MissingCacheError: "Set ALLOW_LLM_CALL=1 to make the call and record it."
144
+
145
+ # 2. Record — makes the real calls and writes YAML fixtures:
146
+ ALLOW_LLM_CALL=1 bundle exec rspec spec/integration/article_summarizer_spec.rb
147
+
148
+ # 3. Every run after — replays from cache, no network:
149
+ bundle exec rspec spec/integration/article_summarizer_spec.rb
150
+ ```
151
+
152
+ Commit the YAML files under `cache_root`. They're the recorded fixtures; CI
153
+ replays them with no API key.
154
+
155
+ ## Use LLM response in a subsequent assertion
156
+
157
+ When your code *acts on* what the model returned, you'll often want to assert it
158
+ used that output correctly. But the output is non-deterministic, so you can't
159
+ hardcode it. `cached_llm_value` reads the actual recorded response out of the
160
+ cache file by walking keys and array indices — so your assertion and the
161
+ recording stay in sync every time you re-record.
162
+
163
+ ```ruby
164
+ it "stores the topics the model extracted" do
165
+ use_llm_cache("2026-05-01_09-15")
166
+
167
+ # Asks the model to extract topics via a tool call, then saves them.
168
+ ArticleTagger.new(article).call
169
+
170
+ # Read what the model actually returned (a tool_use input) from the cache and
171
+ # assert your code persisted exactly that — no hardcoded expectation.
172
+ topics = cached_llm_value("2026-05-01_09-15",
173
+ "calls", 0, "response", "tool_uses", 0, "input", "topics")
174
+
175
+ expect(Article.last.topics).to eq(topics)
176
+ end
177
+ ```
178
+
179
+ The path mirrors the recorded YAML (see [How it caches](#how-it-caches)):
180
+ `calls` → the first call → its `response` → the first `tool_uses` entry → that
181
+ tool call's `input` → the `topics` key.
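As a rough illustration of walking keys and array indices, the sketch below loads a small YAML document and follows the same path with `dig`. The recording content and the `walk_path` helper are hypothetical; this is not how `cached_llm_value` is necessarily implemented, only the general idea.

```ruby
require "yaml"

# Hypothetical recording, shaped like the path used in the example above.
recording = YAML.safe_load(<<~DOC)
  calls:
    - response:
        tool_uses:
          - input:
              topics:
                - space
                - towels
DOC

# Follows string keys and integer indices in order.
# Returns nil when a key or index along the way is absent.
def walk_path(data, *path)
  data.dig(*path)
end

topics = walk_path(recording, "calls", 0, "response", "tool_uses", 0, "input", "topics")
```

Because the path is read from the recording at test time, re-recording changes `topics` without any edit to the assertion.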
182
+
183
+ ## DSL reference
184
+
185
+ | Helper | What it does |
186
+ | --- | --- |
187
+ | `use_llm_cache(id)` | Installs the caching stub and sets the per-test cache id. Call once at the top of an example. |
188
+ | `expect_llm_called` | Asserts exactly one LLM call happened; returns its kwargs. Currently only useful where there is a single LLM call in a test. |
189
+ | `forbid_llm_calls` | Installs a client that raises on any access — proves a code path never reaches the LLM. |
190
+ | `cached_llm_value(id, *path)` | Reads a value out of a recorded YAML file by walking keys/indices. |
191
+ | `meet_requirements(text)` | Matcher: asserts a value satisfies free-text requirements (judged once, cached). |
192
+
193
+ ## How it caches
194
+
195
+ One YAML file per test, keyed by the id you pass to `use_llm_cache`:
196
+
197
+ ```
198
+ <cache_root>/cached_calls/<spec/path>/<id>.yaml # recorded responses
199
+ <cache_root>/meets_requirements/<spec/path>/<id>.yaml # confirmed meet_requirements values
200
+ ```
201
+
202
+ Each request is fingerprinted with a 12-char hash of its canonicalized kwargs.
203
+ On replay, a miss prints a unified diff against the closest recorded request so
204
+ you can see exactly what drifted. Re-recording (`ALLOW_LLM_CALL=1`) prunes any
205
+ cached entry the test no longer reaches.
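A minimal sketch of the fingerprinting idea, modeled on `call_hash` in `lib/deja/cache.rb` from this release but simplified (it does not drop `request_options`). The helper names `canonical` and `fingerprint` are illustrative only.

```ruby
require "digest"
require "json"

# Recursively stringifies keys and symbols and sorts hash keys, so that
# equivalent requests serialize to the same JSON.
def canonical(obj)
  case obj
  when Hash   then obj.map { |k, v| [k.to_s, canonical(v)] }.sort_by(&:first).to_h
  when Array  then obj.map { |v| canonical(v) }
  when Symbol then obj.to_s
  else obj
  end
end

# 12-character prefix of the SHA-256 of the canonical JSON.
def fingerprint(method, kwargs)
  payload = canonical({method: method.to_s, args: kwargs})
  Digest::SHA256.hexdigest(JSON.generate(payload))[0, 12]
end

a = fingerprint(:create, {model: "m", messages: [{role: :user, content: "hi"}]})
# Same request with string keys and a different key order.
b = fingerprint("create", {"messages" => [{"content" => "hi", "role" => "user"}], "model" => "m"})
# One character of the prompt changed.
c = fingerprint(:create, {model: "m", messages: [{role: :user, content: "hi!"}]})
```

Here `a` and `b` are equal because key order and symbol-versus-string differences are normalized away, while `c` differs, which is what turns any prompt edit into a cache miss.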
206
+
207
+ ## Environment variables
208
+
209
+ | Variable | Effect |
210
+ | --- | --- |
211
+ | `ALLOW_LLM_CALL=1` | Make real calls and record/update the cache. Your real client must be able to authenticate (the Anthropic SDK reads `ANTHROPIC_API_KEY` by default). |
212
+ | `DISABLE_LLM_CACHE=1` | Bypass the cache entirely and always call live (debugging). |
213
+
214
+ ## Configuration
215
+
216
+ | Setting | Default | Purpose |
217
+ | --- | --- | --- |
218
+ | `cache_root` | `spec/support/deja_cache` | Directory for recorded YAML (under `project_root`). |
219
+ | `register(provider, install:, real_client:, as:)` | — (≥1 required) | Register a provider. `install` swaps your app's client for Deja's stub; `real_client` (optional) builds a live client for recording. |
220
+ | `project_root` | `Dir.pwd` | Base for relative paths in error messages. |
221
+ | `judge_client { ... }` | — (required for `meet_requirements`) | Live client used by the `meet_requirements` judge. No default. |
222
+ | `judge_attrs = { ... }` | `{}` | Attrs merged into the `meet_requirements` judge's `messages.create` call, overriding the judge's own defaults (model, token limit, system prompt). `messages` and `output_config` are reserved by the matcher. |
223
+
224
+ ## License
225
+
226
+ MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,45 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "deja/adapters/base"
4
+ require "llm_mock_anthropic"
5
+
6
+ module Deja
7
+ module Adapters
8
+ # Wires the Anthropic provider from llm_mock_anthropic into Deja's cache.
9
+ # All Anthropic-SDK shape knowledge — the stub client, the response structs,
10
+ # and serialize/deserialize — lives in LlmMock::Anthropic; this adapter just
11
+ # routes its calls through Deja::Cache (via Base#cached_call).
12
+ class Anthropic < Base
13
+ def provider
14
+ @provider ||= LlmMock::Anthropic::Provider.new
15
+ end
16
+
17
+ def build_mock_client
18
+ adapter = self
19
+ provider.build_client do |method, kwargs|
20
+ adapter.cached_call(method, kwargs) do
21
+ adapter.provider.call_real(adapter.real_client, method, kwargs)
22
+ end
23
+ end
24
+ end
25
+
26
+ def default_real_client
27
+ provider.default_real_client
28
+ end
29
+
30
+ def prompt_for(kwargs)
31
+ provider.prompt_for(kwargs)
32
+ end
33
+
34
+ def serialize(method, response)
35
+ provider.serialize(method, response)
36
+ end
37
+
38
+ def deserialize(method, data)
39
+ provider.deserialize(method, data)
40
+ end
41
+ end
42
+
43
+ register_type(:anthropic, Anthropic)
44
+ end
45
+ end
@@ -0,0 +1,83 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Deja
4
+ # Adapters teach Deja how to talk to one LLM provider: the stub client's shape,
5
+ # how to (de)serialize a response, and how to build a real client. The cache,
6
+ # hashing, and matcher are all provider-agnostic and live elsewhere.
7
+ module Adapters
8
+ @registered = {}
9
+
10
+ class << self
11
+ # Built-in adapter classes register themselves by provider name, so
12
+ # `c.register :anthropic, ...` knows which class to build. A new provider is
13
+ # purely additive: a new Base subclass + one register_type call.
14
+ def register_type(name, klass)
15
+ @registered[name] = klass
16
+ end
17
+
18
+ def build(provider, key:, install:, real_client:)
19
+ klass = @registered.fetch(provider) do
20
+ raise Deja::Error, "Unknown provider #{provider.inspect}. Registered: #{@registered.keys.inspect}"
21
+ end
22
+ klass.new(key:, install:, real_client:)
23
+ end
24
+ end
25
+
26
+ class Base
27
+ attr_reader :key, :install_block
28
+
29
+ # key — how this registration is named (usually the provider symbol)
30
+ # install — block run in the example's context to swap the app's client
31
+ # for the one Deja hands it
32
+ # real_client — optional block building a live client; falls back to the
33
+ # subclass default
34
+ def initialize(key:, install:, real_client: nil)
35
+ @key = key
36
+ @install_block = install
37
+ @real_client_override = real_client
38
+ end
39
+
40
+ def real_client
41
+ (@real_client_override || default_real_client).call
42
+ end
43
+
44
+ # Wraps a single call: records it (for expect_llm_called), routes through the
45
+ # cache, and (de)serializes via the subclass. `real_call` performs the live
46
+ # provider call when recording.
47
+ def cached_call(method, kwargs, &real_call)
48
+ Deja.record_call(key, method, kwargs)
49
+ data = Deja::Cache.fetch(method, kwargs, provider: key, prompt: prompt_for(kwargs)) do
50
+ serialize(method, real_call.call)
51
+ end
52
+ deserialize(method, data)
53
+ end
54
+
55
+ # --- subclass hooks ---
56
+
57
+ # The stub client object the app receives. Its methods call `cached_call`.
58
+ def build_mock_client
59
+ raise NotImplementedError, "#{self.class} must implement #build_mock_client"
60
+ end
61
+
62
+ # Provider response object -> plain Hash (must round-trip through deserialize).
63
+ def serialize(_method, _response)
64
+ raise NotImplementedError, "#{self.class} must implement #serialize"
65
+ end
66
+
67
+ # Plain Hash (from the cache) -> object shaped like the provider's response.
68
+ def deserialize(_method, _data)
69
+ raise NotImplementedError, "#{self.class} must implement #deserialize"
70
+ end
71
+
72
+ def default_real_client
73
+ raise NotImplementedError, "#{self.class} must implement #default_real_client"
74
+ end
75
+
76
+ # A human-readable prompt string stored on the cache entry (purely for
77
+ # auditing the YAML). Optional.
78
+ def prompt_for(_kwargs)
79
+ nil
80
+ end
81
+ end
82
+ end
83
+ end
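To illustrate the "purely additive" registration pattern described in the comments above, here is a self-contained sketch. It does not load Deja: `MiniAdapters`, `EchoAdapter`, and the `:echo` provider are hypothetical stand-ins, and a real adapter would also need to implement `build_mock_client`, `serialize`, and `deserialize`.

```ruby
# Stand-in registry mirroring the register_type/build pattern.
module MiniAdapters
  @registered = {}

  class << self
    def register_type(name, klass)
      @registered[name] = klass
    end

    def build(provider, **opts)
      klass = @registered.fetch(provider) do
        raise KeyError, "Unknown provider #{provider.inspect}"
      end
      klass.new(**opts)
    end
  end

  class Base
    attr_reader :key

    def initialize(key:, real_client: nil)
      @key = key
      @real_client_override = real_client
    end

    # An explicit override wins; otherwise the subclass default is used.
    def real_client
      (@real_client_override || default_real_client).call
    end

    def default_real_client
      raise NotImplementedError, "#{self.class} must implement #default_real_client"
    end
  end

  # Hypothetical provider whose "client" is just a symbol.
  class EchoAdapter < Base
    def default_real_client
      -> { :echo_client }
    end
  end

  # Adding a provider is one subclass plus one registration call.
  register_type(:echo, EchoAdapter)
end

adapter = MiniAdapters.build(:echo, key: :echo)
custom = MiniAdapters.build(:echo, key: :echo, real_client: -> { :custom_client })
```

Both the default and the override are callables, so the client is only constructed when `real_client` is actually invoked.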
data/lib/deja/cache.rb ADDED
@@ -0,0 +1,281 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "diff/lcs"
4
+ require "diff/lcs/hunk"
5
+ require "digest"
6
+ require "fileutils"
7
+ require "set"
8
+ require "yaml"
9
+
10
+ module Deja
11
+ # File-based cache for LLM responses, keyed by an id chosen per-test (see
12
+ # `use_llm_cache(id)`). One file per test: `<cache_root>/cached_calls/<suite>/<id>.yaml`.
13
+ # All calls a test makes land in that one file under `calls:`, each tagged with a
14
+ # `hash` of the kwargs so we can look up the right cached response on replay.
15
+ #
16
+ # YAML shape:
17
+ # test_suite: <derived from the spec file path>
18
+ # test_name: <full RSpec description>
19
+ # summary: <human-readable counts: total / tool_use / message-only>
20
+ # calls:
21
+ # - provider: <which registered adapter produced this — e.g. anthropic>
22
+ # hash: <12-char fingerprint of kwargs — used for lookup>
23
+ # prompt: <adapter-supplied readable prompt, when present>
24
+ # payload: <full canonicalized kwargs — for a precise diff on miss>
25
+ # response: <adapter-serialized response hash; the adapter replays it>
26
+ #
27
+ # Behavior:
28
+ # DISABLE_LLM_CACHE=1 → bypass cache entirely
29
+ # cache hit → return cached response
30
+ # miss + ALLOW_LLM_CALL=1 → call live, append to the test's file
31
+ # miss + no ALLOW_LLM_CALL → raise Deja::MissingCacheError
32
+ module Cache
33
+ module_function
34
+
35
+ def cache_dir
36
+ Deja.configuration.cache_root.join("cached_calls")
37
+ end
38
+
39
+ def fetch(method, kwargs, provider:, prompt: nil)
40
+ return yield if ENV["DISABLE_LLM_CACHE"]
41
+
42
+ hash = call_hash(method, kwargs)
43
+ record_touched(hash)
44
+ entry = load_call(hash)
45
+
46
+ if entry
47
+ response_from_entry(entry)
48
+ elsif ENV["ALLOW_LLM_CALL"]
49
+ response = yield
50
+ append_call!(provider, hash, kwargs, prompt, response)
51
+ response
52
+ else
53
+ raise Deja::MissingCacheError, build_miss_message(hash, kwargs)
54
+ end
55
+ end
56
+
57
+ # Builds the MissingCacheError body. When there's a cached entry whose
58
+ # canonicalized payload is similar to the current request, we show a
59
+ # unified diff against the cached payload so the test author can see
60
+ # exactly what drifted between record and replay. The cache stores the
61
+ # full canonicalized payload on each entry, so this covers `system`,
62
+ # `messages`, `tools`, `tool_choice`, etc. — anything the hash is computed
63
+ # over.
64
+ def build_miss_message(hash, kwargs)
65
+ base = "No cached LLM response with hash #{hash} in #{display_path(cache_file)}.\n" \
66
+ "Set ALLOW_LLM_CALL=1 to make the call and record it."
67
+ current_payload = JSON.pretty_generate(cache_affecting_args(kwargs))
68
+ closest = closest_cached_entry(current_payload)
69
+ return base unless closest
70
+
71
+ cached_payload = JSON.pretty_generate(closest["payload"]) if closest["payload"]
72
+ cached_payload ||= closest["prompt"].to_s # legacy entries: only prompt was stored
73
+ diff = unified_diff(cached_payload, current_payload, context: 3)
74
+ if diff.empty?
75
+ return "#{base}\n\nClosest cached entry: #{closest['hash']} " \
76
+ "(prompts differ outside the captured payload)"
77
+ end
78
+
79
+ "#{base}\n\n" \
80
+ "Closest cached entry: #{closest['hash']}\n" \
81
+ "--- cached payload (#{closest['hash']})\n" \
82
+ "+++ current payload (#{hash})\n" \
83
+ "#{diff}"
84
+ end
85
+
86
+ # Picks the cached entry whose stored payload (or, for legacy entries that
87
+ # only stored `prompt`, system text) has the largest LCS overlap with the
88
+ # current request. Returns nil when the cache file is missing or has no calls.
89
+ def closest_cached_entry(current_text)
90
+ return nil unless cache_file.exist?
91
+
92
+ data = YAML.safe_load(cache_file.read, permitted_classes: [], aliases: false)
93
+ calls = data["calls"]
94
+ return nil if calls.nil? || calls.empty?
95
+
96
+ current_lines = current_text.lines
97
+ calls.max_by do |c|
98
+ cached_text = c["payload"] ? JSON.pretty_generate(c["payload"]) : c["prompt"].to_s
99
+ Diff::LCS.lcs(cached_text.lines, current_lines).size
100
+ end
101
+ end
102
+
103
+ # Returns a unified diff (with `context` lines of context) between two
104
+ # strings, or an empty string when they're identical.
105
+ def unified_diff(old_text, new_text, context: 2)
106
+ old_lines = old_text.lines
107
+ new_lines = new_text.lines
108
+ return "" if old_lines == new_lines
109
+
110
+ diffs = Diff::LCS.diff(old_lines, new_lines)
111
+ return "" if diffs.empty?
112
+
113
+ out = +""
114
+ file_length_difference = 0
115
+ diffs.each do |piece|
116
+ hunk = Diff::LCS::Hunk.new(old_lines, new_lines, piece, context, file_length_difference)
117
+ file_length_difference = hunk.file_length_difference
118
+ out << hunk.diff(:unified).to_s
119
+ out << "\n"
120
+ end
121
+ out
122
+ end
123
+
124
+ # Drops any call entry from the test's file whose hash wasn't looked up during
125
+ # the example — covers the case where a kwarg edit (or a deleted call) leaves
126
+ # an old entry unreachable. Only runs when ALLOW_LLM_CALL=1 (re-record mode);
127
+ # cache-only runs leave the file untouched so a temporarily-disabled call
128
+ # doesn't lose its cached response.
129
+ def prune_untouched_in_current_example!
130
+ return unless cache_file.exist?
131
+
132
+ data = YAML.safe_load(cache_file.read)
133
+ touched = touched_hashes
134
+ fresh_calls = data["calls"].select {|c| touched.include?(c["hash"]) }
135
+ return if fresh_calls.size == data["calls"].size
136
+
137
+ if fresh_calls.empty?
138
+ cache_file.delete
139
+ else
140
+ data["calls"] = fresh_calls
141
+ data["summary"] = build_summary(fresh_calls)
142
+ cache_file.write(YAML.dump(stringify(data)))
143
+ end
144
+ end
145
+
146
+ def record_touched(hash)
147
+ touched_hashes << hash
148
+ end
149
+
150
+ def touched_hashes
151
+ current_example!.metadata[:touched_llm_cache_hashes] ||= Set.new
152
+ end
153
+
154
+ def cache_file
155
+ cache_dir.join(test_suite, "#{current_id!}.yaml")
156
+ end
157
+
158
+ def call_hash(method, kwargs)
159
+ payload = canonicalize({method: method.to_s, args: cache_affecting_args(kwargs)})
160
+ Digest::SHA256.hexdigest(JSON.generate(payload))[0, 12]
161
+ end
162
+
163
+ def cache_affecting_args(kwargs)
164
+ canonicalize(kwargs.except(:request_options))
165
+ end
166
+
167
+ def canonicalize(obj)
168
+ case obj
169
+ when Hash
170
+ obj.each_with_object({}) {|(k, v), h| h[k.to_s] = canonicalize(v) }.sort.to_h
171
+ when Array
172
+ obj.map {|v| canonicalize(v) }
173
+ when Symbol
174
+ obj.to_s
175
+ else
176
+ obj
177
+ end
178
+ end
179
+
180
+ def load_call(hash)
181
+ return nil unless cache_file.exist?
182
+
183
+ data = YAML.safe_load(cache_file.read, permitted_classes: [], aliases: false)
184
+ data["calls"].find {|c| c["hash"] == hash }
185
+ end
186
+
187
+ # The recorded response hash, handed back to the adapter to deserialize.
188
+ def response_from_entry(entry)
189
+ entry.fetch("response")
190
+ end
191
+
192
+ def append_call!(provider, hash, kwargs, prompt, response)
193
+ FileUtils.mkdir_p(cache_file.dirname)
194
+ data = cache_file.exist? ? YAML.safe_load(cache_file.read) : new_file_data
195
+ data["calls"] << build_call_entry(provider, hash, kwargs, prompt, response)
196
+ data["summary"] = build_summary(data["calls"])
197
+ cache_file.write(YAML.dump(stringify(data)))
198
+ end
199
+
200
+ def new_file_data
201
+ {
202
+ "test_suite" => test_suite,
203
+ "test_name" => current_test_name,
204
+ "summary" => "",
205
+ "calls" => [],
206
+ }
207
+ end
208
+
209
+ # Provider-agnostic: the adapter already serialized `response` (including any
210
+ # readable conveniences like text_response/tool_uses). We tag the entry with
211
+ # the provider and store the canonicalized payload so a cache miss can report
212
+ # a precise diff.
213
+ def build_call_entry(provider, hash, kwargs, prompt, response)
214
+ entry = {"provider" => provider.to_s, "hash" => hash}
215
+ entry["prompt"] = prompt unless prompt.nil?
216
+ entry["payload"] = cache_affecting_args(kwargs)
217
+ entry["response"] = response
218
+ entry
219
+ end
220
+
221
+ def build_summary(calls)
222
+ tool_use_count = calls.count {|c| c["response"]["tool_uses"] }
223
+ text_only_count = calls.count {|c| c["response"]["text_response"] && !c["response"]["tool_uses"] }
224
+
225
+ parts = [ "#{calls.size} LLM #{calls.size == 1 ? 'call' : 'calls'} made." ]
226
+ if tool_use_count > 0
227
+ parts << "#{tool_use_count} #{tool_use_count == 1 ? 'call' : 'calls'} returned tool use responses."
228
+ end
229
+ if text_only_count > 0
230
+ parts << "#{text_only_count} #{text_only_count == 1 ? 'call' : 'calls'} returned a message response."
231
+ end
232
+ parts.join("\n")
233
+ end
234
+
235
+ # Like canonicalize but preserves insertion order so the readable header
236
+ # (test_suite/test_name/summary/calls) stays at the top of the YAML file.
237
+ def stringify(obj)
238
+ case obj
239
+ when Hash
240
+ obj.each_with_object({}) {|(k, v), h| h[k.to_s] = stringify(v) }
241
+ when Array
242
+ obj.map {|v| stringify(v) }
243
+ when Symbol
244
+ obj.to_s
245
+ else
246
+ obj
247
+ end
248
+ end
249
+
250
+ # Derived from the spec file path. Purely organizational — moving a test to a
251
+ # different suite means moving its cache file, but the suite name itself has
252
+ # no behavioral effect beyond placement.
253
+ def test_suite
254
+ file_path = current_example!.metadata.fetch(:file_path)
255
+ file_path.sub(%r{^\./spec/}, "").sub(/\.rb$/, "")
256
+ end
257
+
258
+ def current_test_name
259
+ current_example!.metadata.fetch(:full_description)
260
+ end
261
+
262
+ def current_id!
263
+ id = current_example!.metadata[:llm_cache_id]
264
+ raise Deja::MissingIdError, "No id set on the current example. Call use_llm_cache(id) before making LLM calls." if id.nil?
265
+
266
+ id
267
+ end
268
+
269
+ def current_example!
270
+ RSpec.current_example or raise Deja::Error, "Deja must be used inside an RSpec example"
271
+ end
272
+
273
+ # Renders `path` relative to the configured project_root for friendlier error
274
+ # messages, falling back to the absolute path when it's outside the root.
275
+ def display_path(path)
276
+ path.relative_path_from(Deja.configuration.project_root)
277
+ rescue ArgumentError
278
+ path
279
+ end
280
+ end
281
+ end
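The pruning rule in `prune_untouched_in_current_example!` can be illustrated in isolation. This sketch assumes the same entry shape (a `"hash"` key per call) but works on plain in-memory data; `prune_untouched` and the sample hashes are made up for the example.

```ruby
require "set"

# Keeps only the entries whose hash was looked up during the run.
def prune_untouched(calls, touched)
  calls.select { |c| touched.include?(c["hash"]) }
end

recorded = [
  {"hash" => "aaa111", "response" => {"text_response" => "old"}},
  {"hash" => "bbb222", "response" => {"text_response" => "new"}},
]

# Only the second entry was requested in this (hypothetical) run.
touched = Set["bbb222"]
kept = prune_untouched(recorded, touched)
```

The original list is left unmodified, so the caller can compare sizes to decide whether the file needs rewriting at all, as the method above does.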