@betterdb/semantic-cache 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/README.md +211 -128
  2. package/dist/SemanticCache.d.ts +85 -5
  3. package/dist/SemanticCache.js +689 -47
  4. package/dist/adapters/ai.js +6 -1
  5. package/dist/adapters/anthropic.d.ts +32 -0
  6. package/dist/adapters/anthropic.js +94 -0
  7. package/dist/adapters/langchain.js +6 -1
  8. package/dist/adapters/langgraph.d.ts +104 -0
  9. package/dist/adapters/langgraph.js +271 -0
  10. package/dist/adapters/llamaindex.d.ts +32 -0
  11. package/dist/adapters/llamaindex.js +76 -0
  12. package/dist/adapters/openai-responses.d.ts +31 -0
  13. package/dist/adapters/openai-responses.js +112 -0
  14. package/dist/adapters/openai.d.ts +42 -0
  15. package/dist/adapters/openai.js +97 -0
  16. package/dist/analytics.d.ts +24 -0
  17. package/dist/analytics.js +116 -0
  18. package/dist/cluster.d.ts +10 -0
  19. package/dist/cluster.js +43 -0
  20. package/dist/defaultCostTable.d.ts +11 -0
  21. package/dist/defaultCostTable.js +1976 -0
  22. package/dist/embed/bedrock.d.ts +32 -0
  23. package/dist/embed/bedrock.js +109 -0
  24. package/dist/embed/cohere.d.ts +34 -0
  25. package/dist/embed/cohere.js +37 -0
  26. package/dist/embed/ollama.d.ts +30 -0
  27. package/dist/embed/ollama.js +24 -0
  28. package/dist/embed/openai.d.ts +31 -0
  29. package/dist/embed/openai.js +66 -0
  30. package/dist/embed/voyage.d.ts +31 -0
  31. package/dist/embed/voyage.js +32 -0
  32. package/dist/index.d.ts +6 -1
  33. package/dist/index.js +11 -1
  34. package/dist/normalizer.d.ts +68 -0
  35. package/dist/normalizer.js +102 -0
  36. package/dist/telemetry.d.ts +3 -0
  37. package/dist/telemetry.js +18 -0
  38. package/dist/types.d.ts +107 -7
  39. package/dist/utils.d.ts +58 -0
  40. package/dist/utils.js +30 -0
  41. package/package.json +81 -6
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # @betterdb/semantic-cache
 
- A standalone, framework-agnostic semantic cache for LLM applications backed by [Valkey](https://valkey.io/) (or Redis). Uses Valkey's vector search (`valkey-search` module) for similarity matching with built-in [OpenTelemetry](https://opentelemetry.io/) tracing and [Prometheus](https://prometheus.io/) metrics via `prom-client`. The first semantic cache library designed to work natively with Valkey and BetterDB Monitor.
+ A standalone, framework-agnostic semantic cache for LLM applications backed by [Valkey](https://valkey.io/). Uses Valkey's vector search (`valkey-search` module) for similarity matching with built-in [OpenTelemetry](https://opentelemetry.io/) tracing and [Prometheus](https://prometheus.io/) metrics. Full adapter parity with [`@betterdb/agent-cache`](../agent-cache/).
 
  ## Prerequisites
 
@@ -12,153 +12,154 @@ A standalone, framework-agnostic semantic cache for LLM applications backed by [
  ## Installation
 
  ```bash
- npm install @betterdb/semantic-cache
+ npm install @betterdb/semantic-cache iovalkey
  ```
 
- You must also have `iovalkey` installed (it is a peer dependency):
-
- ```bash
- npm install iovalkey
- ```
+ `iovalkey` is a required peer dependency.
 
  ## Why @betterdb/semantic-cache
 
- As of 2026, no existing semantic cache library simultaneously satisfies all three of the following properties: **Valkey-native** support (explicitly handling `valkey-search` API differences rather than assuming Redis wire compatibility), **standalone** operation (no coupling to LangChain, LiteLLM, AWS, or any other orchestration layer), and **built-in observability** (OpenTelemetry spans and Prometheus metrics emitted at the cache operation level, not just at the HTTP or LLM call level). This package was built to fill that gap.
-
- | Library / Service | Valkey-native | Standalone | Built-in OTel + Prometheus |
- |---|---|---|---|
- | **@betterdb/semantic-cache** | ✅ | ✅ | ✅ |
- | RedisVL `SemanticCache` | ❌ Redis only | ✅ | ❌ |
- | LangChain `RedisSemanticCache` | ❌ Redis only | ❌ Requires LangChain | ❌ |
- | LiteLLM `redis-semantic` | ❌ Redis only | ❌ Requires LiteLLM | ❌ Partial (no cache metrics) |
- | `langgraph-checkpoint-aws` `ValkeyCache` | ✅ | ❌ Requires AWS + LangGraph | ❌ |
- | Mem0 + Valkey | ✅ | ❌ Full memory framework | ❌ |
- | Redis LangCache | ❌ Redis Cloud only | ❌ Managed service | ✅ Dashboard only |
- | Upstash `semantic-cache` | ❌ Upstash Vector only | ✅ | ❌ |
- | GPTCache | ❌ Abandoned (2023) | ✅ | ❌ |
-
- - **Valkey-native**: `valkey-search` has API differences from Redis's RediSearch that require explicit handling (see [Valkey Search 1.2 compatibility notes](#valkey-search-12-compatibility-notes) in the changelog). Libraries targeting Redis are not guaranteed to work correctly against self-hosted Valkey or managed Valkey services (ElastiCache, Memorystore).
- - **Standalone**: no dependency on a specific AI framework means you can use this with any LLM client — OpenAI SDK, Anthropic SDK, a local model, or a custom inference endpoint — and swap it out without changing your cache layer.
- - **Built-in OTel + Prometheus**: every `check()` and `store()` call emits a span and increments counters. You get hit rate, similarity score distribution, and latency percentiles in Grafana or any OTel-compatible backend without writing any instrumentation code. If you use [BetterDB Monitor](https://betterdb.com), these metrics are surfaced automatically alongside your other Valkey observability data.
+ The only semantic cache library that is simultaneously Valkey-native (explicit handling of `valkey-search` API differences), standalone (no coupling to any AI framework), and has built-in OpenTelemetry + Prometheus instrumentation at the cache operation level.
 
  ## Quick Start
 
  ```typescript
  import Valkey from 'iovalkey';
  import { SemanticCache } from '@betterdb/semantic-cache';
+ import { createOpenAIEmbed } from '@betterdb/semantic-cache/embed/openai';
 
  const client = new Valkey({ host: 'localhost', port: 6399 });
 
  const cache = new SemanticCache({
    client,
-   embedFn: async (text) => {
-     // Any embedding provider works: OpenAI, Voyage AI, Cohere, a local model, etc.
-     const res = await fetch('https://api.voyageai.com/v1/embeddings', {
-       method: 'POST',
-       headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${process.env.VOYAGE_API_KEY}` },
-       body: JSON.stringify({ model: 'voyage-3-lite', input: [text] }),
-     });
-     const json = await res.json();
-     return json.data[0].embedding;
-   },
+   embedFn: createOpenAIEmbed(), // or createVoyageEmbed(), createOllamaEmbed(), etc.
+   defaultThreshold: 0.15, // loosen slightly to catch paraphrases with high confidence
+   defaultTtl: 3600,
  });
 
  await cache.initialize();
 
- // Store a response
- await cache.store('What is the capital of France?', 'Paris');
-
- // Check for a semantically similar prompt
- const result = await cache.check('Capital city of France?');
- // result.hit === true, result.response === 'Paris'
- ```
-
- ## Client Lifecycle
-
- SemanticCache does **not** own the iovalkey client. You create it, you close it:
-
- ```typescript
- const client = new Valkey({ host: 'localhost', port: 6399 });
- const cache = new SemanticCache({ client, embedFn });
-
- // ... use cache ...
+ // Store with cost tracking
+ await cache.store('What is the capital of France?', 'Paris', {
+   model: 'gpt-4o-mini',
+   inputTokens: 20,
+   outputTokens: 5,
+ });
 
- // When shutting down, close the client yourself:
- await client.quit();
+ // Exact match - always high confidence
+ const exact = await cache.check('What is the capital of France?');
+ // exact.hit === true
+ // exact.confidence === 'high'
+ // exact.similarity === 0.0000
+ // exact.costSaved === 0.0000085
+
+ // Paraphrase - typically 'uncertain' at threshold 0.1, 'high' at threshold 0.15
+ const paraphrase = await cache.check('What city is the capital of France?');
+ // paraphrase.hit === true
+ // paraphrase.confidence === 'high' // at threshold 0.15
+ // paraphrase.similarity ~= 0.087 // observed with text-embedding-3-small
+ // paraphrase.costSaved === 0.0000085
  ```
 
- ## Threshold: Cosine Distance vs Cosine Similarity
+ ## Threshold and Confidence
 
- This library uses **cosine distance** (0–2 scale), not cosine similarity (0–1 scale):
+ This library uses **cosine distance** (0-2 scale, lower = more similar):
 
  | Distance | Meaning |
  |----------|---------|
- | 0 | Identical vectors |
- | 1 | Orthogonal (unrelated) |
- | 2 | Opposite vectors |
-
- A cache lookup is a **hit** when `score <= threshold`. The default threshold of `0.1` is strict — it matches only very similar prompts. Increase to `0.15–0.2` for broader matching.
-
- The relationship is: `distance = 1 - similarity`. A cosine similarity of 0.95 corresponds to a distance of 0.05.
+ | 0.00 | Identical vectors |
+ | 0.05-0.10 | Strong paraphrase |
+ | 0.10-0.20 | Loose paraphrase / related topic |
+ | 1.00 | Orthogonal (unrelated) |
 
- ### Handling uncertain hits
+ A lookup is a **hit** when `score <= threshold`. The default threshold is `0.1`.
 
- When `confidence` is `'uncertain'`, the cached response is technically above
- the similarity threshold but close to the boundary. Three common patterns:
+ ### Confidence levels
 
- **Accept and monitor**: return the cached response but track uncertain hits
- separately via the `result: 'uncertain_hit'` Prometheus label. Review them
- periodically to decide if the threshold needs adjustment.
+ | `confidence` | When | What to do |
+ |---|---|---|
+ | `high` | `score <= threshold - uncertaintyBand` (e.g. `<= 0.05`) | Return the cached response directly |
+ | `uncertain` | `threshold - band < score <= threshold` (e.g. `0.05–0.10`) | Return the response but consider flagging for review |
+ | `miss` | `score > threshold` | No hit - call the LLM |
 
- **Fall back to LLM** — treat uncertain hits as misses, call the LLM, then
- update the cache entry with `store()` using the fresh response.
+ **With real embeddings (`text-embedding-3-small`):**
+ - Exact same phrasing: `~0.000` - always `high`
+ - Close paraphrase ("Which city is the capital of France?"): `~0.08–0.09` - `uncertain` at default `0.1` threshold, `high` at `0.15`
+ - Loose paraphrase ("France's capital?"): `~0.10–0.15` - typically `miss` at `0.1`, `uncertain` at `0.15`
 
- **Prompt for feedback**: in user-facing applications, show the cached
- response but collect a thumbs up/down signal to identify false positives.
+ **Recommended thresholds by use case:**
 
- A high rate of uncertain hits (visible in the `{prefix}_requests_total`
- metric) indicates the threshold may be too loose for the query distribution.
+ | Use case | Threshold | Notes |
+ |---|---|---|
+ | FAQ / exact match only | `0.05` | Very strict, near-zero false positives |
+ | Standard Q&A | `0.10` | Default - paraphrases land as `uncertain` |
+ | Conversational / RAG | `0.15` | Paraphrases hit as `high` confidence |
+ | Broad search / recall | `0.20` | High hit rate, review uncertain hits |
 
  ## Configuration Reference
 
  | Option | Type | Default | Description |
  |--------|------|---------|-------------|
- | `name` | `string` | `'betterdb_scache'` | Index name prefix for Valkey keys |
- | `client` | `Valkey` | | iovalkey client instance (required) |
- | `embedFn` | `(text: string) => Promise<number[]>` | | Embedding function (required) |
- | `defaultThreshold` | `number` | `0.1` | Cosine distance threshold (0–2) |
- | `defaultTtl` | `number` | `undefined` | Default TTL in seconds for entries |
+ | `name` | `string` | `'betterdb_scache'` | Key prefix |
+ | `client` | `Valkey` | - | iovalkey client (required) |
+ | `embedFn` | `EmbedFn` | - | Embedding function (required) |
+ | `defaultThreshold` | `number` | `0.1` | Cosine distance threshold (0-2) |
+ | `defaultTtl` | `number` | `undefined` | Default TTL in seconds |
  | `categoryThresholds` | `Record<string, number>` | `{}` | Per-category threshold overrides |
- | `uncertaintyBand` | `number` | `0.05` | Width of the uncertainty band below threshold |
- | `telemetry.tracerName` | `string` | `'@betterdb/semantic-cache'` | OpenTelemetry tracer name |
- | `telemetry.metricsPrefix` | `string` | `'semantic_cache'` | Prometheus metric name prefix |
- | `telemetry.registry` | `Registry` | default registry | prom-client Registry for metrics |
-
- ## Observability
-
- ### Prometheus Metrics
+ | `uncertaintyBand` | `number` | `0.05` | Width of uncertainty band below threshold |
+ | `costTable` | `Record<string, ModelCost>` | `undefined` | Per-model pricing overrides |
+ | `useDefaultCostTable` | `boolean` | `true` | Use bundled LiteLLM price table (1,971 models) |
+ | `normalizer` | `BinaryNormalizer` | `defaultNormalizer` | Binary content normalizer |
+ | `embeddingCache.enabled` | `boolean` | `true` | Cache computed embeddings in Valkey |
+ | `embeddingCache.ttl` | `number` | `86400` | Embedding cache TTL (seconds) |
+ | `telemetry.tracerName` | `string` | `'@betterdb/semantic-cache'` | OTel tracer name |
+ | `telemetry.metricsPrefix` | `string` | `'semantic_cache'` | Prometheus prefix |
+ | `telemetry.registry` | `Registry` | default | prom-client Registry |
 
- All metric names are prefixed with `semantic_cache_` by default (configurable via `telemetry.metricsPrefix`).
+ ## Cost Tracking
 
- | Metric | Type | Labels | Description |
- |--------|------|--------|-------------|
- | `semantic_cache_requests_total` | Counter | `cache_name`, `result`, `category` | Total cache requests. `result` is `hit`, `miss`, or `uncertain_hit` |
- | `semantic_cache_similarity_score` | Histogram | `cache_name`, `category` | Cosine distance scores for lookups with candidates |
- | `semantic_cache_operation_duration_seconds` | Histogram | `cache_name`, `operation` | Duration of cache operations (`check`, `store`, `invalidate`, `initialize`) |
- | `semantic_cache_embedding_duration_seconds` | Histogram | `cache_name` | Duration of embedding function calls |
+ Store token counts at cache-time to get per-hit cost savings:
 
- ### OpenTelemetry Tracing
-
- Every public method emits an OTel span with relevant attributes (`cache.hit`, `cache.similarity`, `cache.threshold`, `cache.confidence`, etc.). Spans require an OpenTelemetry SDK to be configured in the host application — this library uses `@opentelemetry/api` and does not bundle an SDK.
+ ```typescript
+ await cache.store('What is the capital of France?', 'Paris', {
+   model: 'claude-haiku-4-5', // looked up in bundled LiteLLM price table
+   inputTokens: 42,
+   outputTokens: 12,
+ });
 
- ## BetterDB Monitor Integration
+ const result = await cache.check('Capital of France?');
+ console.log(result.costSaved); // e.g. 0.000064 (dollars saved on this hit)
 
- If you connect [BetterDB Monitor](https://github.com/KIvanow/monitor) to the same Valkey instance, it will automatically detect the semantic cache index and surface:
+ const stats = await cache.stats();
+ console.log(stats.costSavedMicros); // cumulative microdollars saved
+ ```
 
- - Hit rate and miss rate over time
- - Similarity score distribution
- - Cache entry count and memory usage
- - Cost savings estimates based on cache hit rates
+ Cost savings scale with the model. Observed values from live examples:
+ - `gpt-4o-mini`: ~`$0.000006` per hit (cheap model, short responses)
+ - `claude-haiku-4-5`: ~`$0.000064` per hit (~10x more expensive)
+ - `gpt-4o`: ~`$0.000100` per hit at 20 input / 5 output tokens
+
+ ## Adapters
+
+ | Import | Class/Function | Description |
+ |---|---|---|
+ | `@betterdb/semantic-cache/langchain` | `BetterDBSemanticCache` | LangChain `BaseCache` |
+ | `@betterdb/semantic-cache/ai` | `createSemanticCacheMiddleware` | Vercel AI SDK middleware |
+ | `@betterdb/semantic-cache/openai` | `prepareSemanticParams` | OpenAI Chat Completions |
+ | `@betterdb/semantic-cache/openai-responses` | `prepareSemanticParams` | OpenAI Responses API |
+ | `@betterdb/semantic-cache/anthropic` | `prepareSemanticParams` | Anthropic Messages API |
+ | `@betterdb/semantic-cache/llamaindex` | `prepareSemanticParams` | LlamaIndex ChatMessage[] |
+ | `@betterdb/semantic-cache/langgraph` | `BetterDBSemanticStore` | LangGraph BaseStore |
+
+ ## Embedding Helpers
+
+ | Import | Default model | Dimensions |
+ |---|---|---|
+ | `@betterdb/semantic-cache/embed/openai` | `text-embedding-3-small` | 1536 |
+ | `@betterdb/semantic-cache/embed/bedrock` | `amazon.titan-embed-text-v2:0` | 1024 |
+ | `@betterdb/semantic-cache/embed/voyage` | `voyage-3-lite` | 512 |
+ | `@betterdb/semantic-cache/embed/cohere` | `embed-english-v3.0` | 1024 |
+ | `@betterdb/semantic-cache/embed/ollama` | `nomic-embed-text` | 768 |
 
  ## API
 
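As a reading aid for the "Confidence levels" table added in the hunk above, here is a minimal sketch of how a cosine distance could map to `high`, `uncertain`, or `miss`. It follows the table as written; the function name `classifyConfidence` and the `Confidence` type are illustrative assumptions and are not taken from the package source.

```typescript
type Confidence = 'high' | 'uncertain' | 'miss';

// Hypothetical helper, not part of @betterdb/semantic-cache.
function classifyConfidence(
  score: number,           // cosine distance, 0-2, lower = more similar
  threshold = 0.1,         // default threshold stated in the README
  uncertaintyBand = 0.05,  // default band width stated in the README
): Confidence {
  if (score > threshold) return 'miss';
  if (score <= threshold - uncertaintyBand) return 'high';
  return 'uncertain'; // inside the band just below the threshold
}

console.log(classifyConfidence(0.0));         // exact match
console.log(classifyConfidence(0.087));       // paraphrase at the default threshold
console.log(classifyConfidence(0.087, 0.15)); // same paraphrase at a looser threshold
console.log(classifyConfidence(0.12));        // beyond the default threshold
```

Under these rules the README's paraphrase example (distance of about 0.087) lands in the uncertainty band at the default threshold of `0.1` and clears it once the threshold is raised to `0.15`.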
@@ -168,55 +169,137 @@ Creates or reconnects to the Valkey search index. Must be called before `check()
 
  ### `cache.check(prompt, options?)`
 
- Searches for a semantically similar cached prompt. Returns `{ hit, response, similarity, confidence, matchedKey, nearestMiss }`.
+ `prompt` is `string | ContentBlock[]`. Returns `CacheCheckResult`:
+
+ | Field | Description |
+ |---|---|
+ | `hit` | Whether the nearest neighbour's distance was `<= threshold` |
+ | `response` | Cached response text. Present on hit |
+ | `similarity` | Cosine distance (0-2). Present when a candidate was found |
+ | `confidence` | `'high'` / `'uncertain'` / `'miss'` |
+ | `costSaved` | Dollars saved on this hit. Present when cost was recorded at store time |
+ | `contentBlocks` | Structured response blocks. Present when stored via `storeMultipart()` |
+ | `nearestMiss` | On miss with a candidate: `{ similarity, deltaToThreshold }` |
+
+ **Options:** `threshold`, `category`, `filter`, `k`, `staleAfterModelChange`, `currentModel`, `rerank`
 
  ### `cache.store(prompt, response, options?)`
 
- Stores a prompt/response pair with its embedding vector. Returns the Valkey key.
+ `prompt` is `string | ContentBlock[]`. Returns the Valkey key.
+
+ **Options:** `ttl`, `category`, `model`, `metadata`, `inputTokens`, `outputTokens`, `temperature`, `topP`, `seed`
+
+ ### `cache.storeMultipart(prompt, blocks, options?)`
+
+ Stores structured `ContentBlock[]` as the response. On hit, `check()` returns `contentBlocks`.
+
+ ### `cache.checkBatch(prompts[], options?)`
+
+ Pipelined multi-prompt lookups. ~50-70% faster than sequential `check()` calls. Returns results in input order.
 
  ### `cache.invalidate(filter)`
 
- Deletes entries matching a valkey-search filter expression. Example: `cache.invalidate('@model:{gpt-4o}')`.
+ Delete entries matching a `valkey-search` filter (e.g. `'@model:{gpt-4o}'`).
+
+ ### `cache.invalidateByModel(model)` / `cache.invalidateByCategory(category)`
+
+ Convenience wrappers around `invalidate()`.
 
  ### `cache.stats()`
 
- Returns `{ hits, misses, total, hitRate }` from the Valkey stats hash.
+ Returns `{ hits, misses, total, hitRate, costSavedMicros }`.
 
  ### `cache.indexInfo()`
 
- Returns index metadata: `{ name, numDocs, dimension, indexingState }`.
+ Returns `{ name, numDocs, dimension, indexingState }`.
 
  ### `cache.flush()`
 
- Drops the index and all entries. Call `initialize()` again to rebuild.
+ Drops the index and all keys. Call `initialize()` again to rebuild.
 
- ## Known limitations
+ ### `cache.thresholdEffectiveness(options?)`
 
- ### Cluster mode
+ Analyzes the rolling similarity score window (last 10,000 entries, up to 7 days) and returns:
+
+ ```typescript
+ {
+   recommendation: 'tighten_threshold' | 'loosen_threshold' | 'optimal' | 'insufficient_data',
+   recommendedThreshold?: number, // present when recommendation is tighten/loosen
+   reasoning: string, // human-readable explanation
+   hitRate: number,
+   uncertainHitRate: number, // >20% triggers tighten recommendation
+   nearMissRate: number, // >30% with avg delta <0.03 triggers loosen
+   // ...
+ }
+ ```
+
+ ### `cache.thresholdEffectivenessAll(options?)`
+
+ Returns one result per category seen in the window, plus one aggregate `'all'` result.
+
+ ## Observability
+
+ ### Prometheus Metrics
+
+ | Metric | Type | Labels | Description |
+ |--------|------|--------|-------------|
+ | `{prefix}_requests_total` | Counter | `cache_name`, `result`, `category` | `result`: `hit`, `miss`, `uncertain_hit` |
+ | `{prefix}_similarity_score` | Histogram | `cache_name`, `category` | Cosine distance per lookup |
+ | `{prefix}_operation_duration_seconds` | Histogram | `cache_name`, `operation` | End-to-end latency |
+ | `{prefix}_embedding_duration_seconds` | Histogram | `cache_name` | Time in `embedFn` |
+ | `{prefix}_cost_saved_total` | Counter | `cache_name`, `category` | Dollars saved from hits |
+ | `{prefix}_embedding_cache_total` | Counter | `cache_name`, `result` | Embedding cache hit/miss |
+ | `{prefix}_stale_model_evictions_total` | Counter | `cache_name` | Evictions from `staleAfterModelChange` |
+
+ ### OpenTelemetry
+
+ Every public method emits an OTel span. Requires an OpenTelemetry SDK in the host application.
+
+ ## Examples
+
+ Runnable examples in [examples/](./examples/). All examples connect to `localhost:6399` by default (override via `VALKEY_HOST` / `VALKEY_PORT`).
+
+ | Example | API key needed | What it shows |
+ |---|---|---|
+ | `basic/` | Voyage AI (or `--mock`) | Core store/check/invalidate |
+ | `openai/` | OpenAI | Chat Completions + cost tracking |
+ | `openai-responses/` | OpenAI | Responses API adapter |
+ | `anthropic/` | Anthropic + OpenAI | Messages API, high cost savings (~$0.000064/hit) |
+ | `llamaindex/` | OpenAI | ChatMessage[] adapter |
+ | `langchain/` | OpenAI | BetterDBSemanticCache + ChatOpenAI |
+ | `vercel-ai-sdk/` | OpenAI | createSemanticCacheMiddleware |
+ | `langgraph/` | None | BetterDBSemanticStore memory |
+ | `multimodal/` | None | ContentBlock[] with text + image |
+ | `cost-tracking/` | None | Cost savings with mock embedder |
+ | `threshold-tuning/` | None | thresholdEffectiveness() |
+ | `embedding-cache/` | None | Embedding cache on/off comparison |
+ | `batch-check/` | None | checkBatch() vs sequential |
+ | `rerank/` | None | Top-k rerank hook |
 
- `@betterdb/semantic-cache` works with single-node Valkey instances and managed
- single-endpoint services (Amazon ElastiCache for Valkey, Google Cloud Memorystore
- for Valkey). It does not fully support Valkey in cluster mode.
+ ## Client Lifecycle
+
+ SemanticCache does **not** own the iovalkey client:
+
+ ```typescript
+ const client = new Valkey({ host: 'localhost', port: 6399 });
+ const cache = new SemanticCache({ client, embedFn });
+ // ... use cache ...
+ await client.quit();
+ ```
 
- The specific issue is `flush()`: it uses `SCAN` to find and delete entry keys,
- but `SCAN` in cluster mode only iterates keys on the node it is sent to. In a
- multi-node cluster, `flush()` will silently leave entry keys on other nodes
- (the FT index itself is dropped correctly).
+ ## Known Limitations
 
- `check()`, `store()`, `invalidate()`, and `stats()` are unaffected — these use
- `FT.SEARCH`, `HSET`, `DEL`, and `HINCRBY` which route correctly in cluster mode
- via the key hash slot.
+ ### Cluster mode
 
- If you need cluster support, either avoid `flush()` or implement a cluster-aware
- key sweep using the iovalkey cluster client's per-node scan capability.
- Cluster mode support is planned for a future release.
+ `flush()` fans out via `clusterScan()` across all master nodes. `FT.SEARCH` routes correctly via hash slots. `FT.CREATE` only creates the index on the receiving node - in a full cluster, create the index on each node separately.
 
  ### Streaming
 
- Streaming LLM responses are not supported. `store()` expects a complete response
- string. If your application uses streaming, accumulate the full response before
- calling `store()`. The cached response is always returned as a complete string,
- not re-streamed token-by-token.
+ `store()` requires a complete response string. The Vercel AI SDK adapter does not implement `wrapStream`. Accumulate the full streamed response before calling `store()`.
+
+ ### Schema migration (v0.1 -> v0.2)
+
+ v0.2.0 added `binary_refs`, `temperature`, `top_p`, `seed` fields to the index schema. Existing v0.1.0 indexes operate in text-only mode until `flush()` + `initialize()` rebuilds the schema.
 
  ## License
 
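The per-hit cost figures quoted in the README diff above can be sanity-checked with a small calculation. The sketch below assumes per-million-token pricing; the `ExamplePricing` shape and the `estimateCostSaved` and `toMicros` helpers are hypothetical, and the `gpt-4o` prices are assumed values for illustration only. The package's actual `ModelCost` type and bundled price table are not shown in this diff.

```typescript
// Hypothetical pricing shape; the package's real `ModelCost` type may differ.
interface ExamplePricing {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// Assumed prices for illustration only; check current provider pricing.
const examplePrices: Record<string, ExamplePricing> = {
  'gpt-4o': { inputPerMillion: 2.5, outputPerMillion: 10 },
};

// Dollars a cache hit would save: the cost of the LLM call that was skipped.
function estimateCostSaved(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number | undefined {
  const price = examplePrices[model];
  if (!price) return undefined; // unknown model: no estimate
  return (
    (inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion) /
    1_000_000
  );
}

// Convert dollars to whole microdollars, the unit the README uses for `costSavedMicros`.
function toMicros(dollars: number): number {
  return Math.round(dollars * 1_000_000);
}

const saved = estimateCostSaved('gpt-4o', 20, 5);
if (saved !== undefined) {
  console.log(saved.toFixed(6), toMicros(saved));
}
```

With these assumed prices, 20 input tokens and 5 output tokens come to $0.0001 (100 microdollars), in line with the `~$0.000100` per-hit figure the README gives for `gpt-4o`.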
package/dist/SemanticCache.d.ts CHANGED
@@ -1,4 +1,5 @@
  import type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult } from './types';
+ import { type ContentBlock } from './utils';
  export declare class SemanticCache {
      private readonly client;
      private readonly embedFn;
@@ -6,15 +7,27 @@ export declare class SemanticCache {
      private readonly indexName;
      private readonly entryPrefix;
      private readonly statsKey;
+     private readonly similarityWindowKey;
      private readonly defaultThreshold;
      private readonly defaultTtl;
      private readonly categoryThresholds;
      private readonly uncertaintyBand;
      private readonly telemetry;
+     private readonly costTable;
+     private readonly embeddingCacheEnabled;
+     private readonly embeddingCacheTtl;
+     private readonly embedKeyPrefix;
      private _initialized;
      private _dimension;
+     private _hasBinaryRefs;
      private _initPromise;
      private _initGeneration;
+     private readonly analyticsOpts;
+     private readonly usesDefaultCostTable;
+     private analytics;
+     private statsTimer;
+     private shutdownCalled;
+     private analyticsInitiated;
      /**
       * Creates a new SemanticCache instance.
       *
@@ -27,33 +40,100 @@ export declare class SemanticCache {
      constructor(options: SemanticCacheOptions);
      initialize(): Promise<void>;
      flush(): Promise<void>;
-     check(prompt: string, options?: CacheCheckOptions): Promise<CacheCheckResult>;
-     store(prompt: string, response: string, options?: CacheStoreOptions): Promise<string>;
+     /** Shut down the analytics client and cancel the stats timer. */
+     shutdown(): Promise<void>;
+     check(prompt: string | ContentBlock[], options?: CacheCheckOptions): Promise<CacheCheckResult>;
+     store(prompt: string | ContentBlock[], response: string, options?: CacheStoreOptions): Promise<string>;
+     /**
+      * Store structured content blocks as the cached response.
+      * Populates both the response field (from TextBlock text) and content_blocks (full JSON).
+      */
+     storeMultipart(prompt: string | ContentBlock[], blocks: ContentBlock[], options?: CacheStoreOptions): Promise<string>;
+     /**
+      * Check multiple prompts in parallel, using pipelined FT.SEARCH calls.
+      * Returns results in input order.
+      */
+     checkBatch(prompts: (string | ContentBlock[])[], options?: CacheCheckOptions): Promise<CacheCheckResult[]>;
      /**
       * Deletes all entries matching a valkey-search filter expression.
       *
       * **Security note:** `filter` is passed directly to FT.SEARCH. Only pass
-      * trusted, programmatically-constructed expressions, never unsanitised
+      * trusted, programmatically-constructed expressions - never unsanitised
       * user input.
       */
      invalidate(filter: string): Promise<InvalidateResult>;
+     /** Delete all entries tagged with the given model name. */
+     invalidateByModel(model: string): Promise<number>;
+     /** Delete all entries tagged with the given category. */
+     invalidateByCategory(category: string): Promise<number>;
      stats(): Promise<CacheStats>;
      indexInfo(): Promise<IndexInfo>;
+     /**
+      * Analyze the rolling similarity score window and recommend threshold adjustments.
+      */
+     thresholdEffectiveness(options?: {
+         category?: string;
+         minSamples?: number;
+     }): Promise<ThresholdEffectivenessResult>;
+     /**
+      * Returns threshold effectiveness results for every category seen in the
+      * rolling window, plus one aggregate result for all categories combined.
+      */
+     thresholdEffectivenessAll(options?: {
+         minSamples?: number;
+     }): Promise<ThresholdEffectivenessResult[]>;
+     /** @internal Default similarity threshold. */
+     get _defaultThreshold(): number;
+     /**
+      * Execute a stable FT.SEARCH for use by adapters (e.g. LangGraph).
+      * SORTBY inserted_at ASC gives stable ordering across paginated calls.
+      * @internal
+      */
+     _searchEntries(filterExpr: string, limit: number, offset: number): Promise<unknown>;
+     /**
+      * Embed text for use by adapters (e.g. LangGraph semantic search).
+      * @internal
+      */
+     _embedText(text: string): Promise<{
+         vector: number[];
+         durationSec: number;
+     }>;
      private _doInitialize;
+     private initAnalyticsSafe;
+     private captureStatsSnapshot;
      private ensureIndexAndGetDimension;
-     /** Wraps embedFn with error handling and duration tracking. */
+     /** Check if the index schema has a binary_refs field. */
+     private parseHasBinaryRefsFromInfo;
+     /** Resolve a prompt (string or ContentBlock[]) into text + binary refs. */
+     private resolvePrompt;
+     /** Wraps embedFn with error handling, duration tracking, and optional embedding cache. */
      private embed;
      /**
       * Wraps a method body in an OTel span with automatic status, end, and
       * operation duration metric. The span is passed to fn so callers can
-      * set attributes, but callers must NOT call span.end() or span.setStatus(),
+      * set attributes - but callers must NOT call span.end() or span.setStatus(),
       * as traced() handles both.
       */
      private traced;
      /** Increment stats counters via pipeline. */
      private recordStat;
+     /** Append to the rolling similarity window sorted set and trim to 10,000 entries or 7 days. */
+     private recordSimilarityWindow;
      private assertInitialized;
      private assertDimension;
      private isIndexNotFoundError;
      private parseDimensionFromInfo;
  }
+ export interface ThresholdEffectivenessResult {
+     category: string;
+     sampleCount: number;
+     currentThreshold: number;
+     hitRate: number;
+     uncertainHitRate: number;
+     nearMissRate: number;
+     avgHitSimilarity: number;
+     avgMissSimilarity: number;
+     recommendation: 'tighten_threshold' | 'loosen_threshold' | 'optimal' | 'insufficient_data';
+     recommendedThreshold?: number;
+     reasoning: string;
+ }
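A caller might act on `ThresholdEffectivenessResult` values roughly as follows. This is only a sketch: the interface is trimmed to the fields used here, the helpers `nextThreshold` and `toCategoryThresholds` are hypothetical, and the policy of adopting a recommendation only when a `recommendedThreshold` accompanies it is an assumption rather than documented behaviour. The sample values are made up for illustration.

```typescript
// Trimmed copy of the ThresholdEffectivenessResult interface declared in the diff.
interface ThresholdResultSubset {
  category: string;
  currentThreshold: number;
  recommendation: 'tighten_threshold' | 'loosen_threshold' | 'optimal' | 'insufficient_data';
  recommendedThreshold?: number;
  reasoning: string;
}

// Hypothetical helper: the threshold to use next for one result.
function nextThreshold(result: ThresholdResultSubset): number {
  const wantsChange =
    result.recommendation === 'tighten_threshold' ||
    result.recommendation === 'loosen_threshold';
  // Assumed policy: adopt a recommendation only when a concrete value comes with it.
  return wantsChange && result.recommendedThreshold !== undefined
    ? result.recommendedThreshold
    : result.currentThreshold;
}

// Hypothetical helper: build a `categoryThresholds`-style map from per-category results.
function toCategoryThresholds(results: ThresholdResultSubset[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const r of results) {
    if (r.category === 'all') continue; // skip the aggregate entry
    out[r.category] = nextThreshold(r);
  }
  return out;
}

// Made-up sample values; real results would come from thresholdEffectivenessAll().
const sampleResults: ThresholdResultSubset[] = [
  { category: 'faq', currentThreshold: 0.1, recommendation: 'tighten_threshold', recommendedThreshold: 0.08, reasoning: 'sample' },
  { category: 'chat', currentThreshold: 0.15, recommendation: 'optimal', reasoning: 'sample' },
  { category: 'all', currentThreshold: 0.1, recommendation: 'insufficient_data', reasoning: 'sample' },
];

console.log(toCategoryThresholds(sampleResults));
```

The resulting map has the same shape as the `categoryThresholds` option in the configuration table, so it could be reviewed by a person and then passed to a new `SemanticCache` instance.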