@betterdb/semantic-cache 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +223 -0
- package/dist/SemanticCache.d.ts +59 -0
- package/dist/SemanticCache.js +416 -0
- package/dist/adapters/ai.d.ts +43 -0
- package/dist/adapters/ai.js +98 -0
- package/dist/adapters/langchain.d.ts +29 -0
- package/dist/adapters/langchain.js +50 -0
- package/dist/errors.d.ts +25 -0
- package/dist/errors.js +43 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.js +9 -0
- package/dist/telemetry.d.ts +19 -0
- package/dist/telemetry.js +54 -0
- package/dist/types.d.ts +142 -0
- package/dist/types.js +2 -0
- package/dist/utils.d.ts +25 -0
- package/dist/utils.js +77 -0
- package/package.json +69 -0
package/README.md
ADDED
|
@@ -0,0 +1,223 @@
# @betterdb/semantic-cache

A standalone, framework-agnostic semantic cache for LLM applications backed by [Valkey](https://valkey.io/) (or Redis). It uses Valkey's vector search (`valkey-search` module) for similarity matching, with built-in [OpenTelemetry](https://opentelemetry.io/) tracing and [Prometheus](https://prometheus.io/) metrics via `prom-client`. The first semantic cache library designed to work natively with Valkey and BetterDB Monitor.

## Prerequisites

- **Valkey 8.0+** with the `valkey-search` module loaded
- Or **Amazon ElastiCache for Valkey** (8.0+)
- Or **Google Cloud Memorystore for Valkey**
- Node.js >= 20.0.0

## Installation

```bash
npm install @betterdb/semantic-cache
```

You must also have `iovalkey` installed (it is a peer dependency):

```bash
npm install iovalkey
```

## Why @betterdb/semantic-cache

As of 2026, no existing semantic cache library simultaneously satisfies all three of the following properties: **Valkey-native** support (explicitly handling `valkey-search` API differences rather than assuming Redis wire compatibility), **standalone** operation (no coupling to LangChain, LiteLLM, AWS, or any other orchestration layer), and **built-in observability** (OpenTelemetry spans and Prometheus metrics emitted at the cache operation level, not just at the HTTP or LLM call level). This package was built to fill that gap.

| Library / Service | Valkey-native | Standalone | Built-in OTel + Prometheus |
|---|---|---|---|
| **@betterdb/semantic-cache** | ✅ | ✅ | ✅ |
| RedisVL `SemanticCache` | ❌ Redis only | ✅ | ❌ |
| LangChain `RedisSemanticCache` | ❌ Redis only | ❌ Requires LangChain | ❌ |
| LiteLLM `redis-semantic` | ❌ Redis only | ❌ Requires LiteLLM | ❌ Partial (no cache metrics) |
| `langgraph-checkpoint-aws` `ValkeyCache` | ✅ | ❌ Requires AWS + LangGraph | ❌ |
| Mem0 + Valkey | ✅ | ❌ Full memory framework | ❌ |
| Redis LangCache | ❌ Redis Cloud only | ❌ Managed service | ✅ Dashboard only |
| Upstash `semantic-cache` | ❌ Upstash Vector only | ✅ | ❌ |
| GPTCache | ❌ Abandoned (2023) | ✅ | ❌ |

- **Valkey-native**: `valkey-search` has API differences from Redis's RediSearch that require explicit handling (see [Valkey Search 1.2 compatibility notes](#valkey-search-12-compatibility-notes) in the changelog). Libraries targeting Redis are not guaranteed to work correctly against self-hosted Valkey or managed Valkey services (ElastiCache, Memorystore).
- **Standalone**: no dependency on a specific AI framework means you can use this with any LLM client (OpenAI SDK, Anthropic SDK, a local model, or a custom inference endpoint) and swap that client out without changing your cache layer.
- **Built-in OTel + Prometheus**: every `check()` and `store()` call emits a span and increments counters. You get hit rate, similarity score distribution, and latency percentiles in Grafana or any OTel-compatible backend without writing any instrumentation code. If you use [BetterDB Monitor](https://betterdb.com), these metrics are surfaced automatically alongside your other Valkey observability data.

## Quick Start

```typescript
import Valkey from 'iovalkey';
import { SemanticCache } from '@betterdb/semantic-cache';

const client = new Valkey({ host: 'localhost', port: 6399 });

const cache = new SemanticCache({
  client,
  embedFn: async (text) => {
    // Any embedding provider works: OpenAI, Voyage AI, Cohere, a local model, etc.
    const res = await fetch('https://api.voyageai.com/v1/embeddings', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${process.env.VOYAGE_API_KEY}` },
      body: JSON.stringify({ model: 'voyage-3-lite', input: [text] }),
    });
    const json = await res.json();
    return json.data[0].embedding;
  },
});

await cache.initialize();

// Store a response
await cache.store('What is the capital of France?', 'Paris');

// Check for a semantically similar prompt
const result = await cache.check('Capital city of France?');
// result.hit === true, result.response === 'Paris'
```

## Client Lifecycle

SemanticCache does **not** own the iovalkey client. You create it, you close it:

```typescript
// embedFn is the embedding function from the Quick Start example.
const client = new Valkey({ host: 'localhost', port: 6399 });
const cache = new SemanticCache({ client, embedFn });

// ... use cache ...

// When shutting down, close the client yourself:
await client.quit();
```

## Threshold: Cosine Distance vs Cosine Similarity

This library uses **cosine distance** (0 to 2 scale), not cosine similarity (-1 to 1 scale):

| Distance | Meaning |
|----------|---------|
| 0 | Identical vectors |
| 1 | Orthogonal (unrelated) |
| 2 | Opposite vectors |

A cache lookup is a **hit** when `score <= threshold`. The default threshold of `0.1` is strict: it matches only very similar prompts. Increase it to `0.15`–`0.2` for broader matching.

The relationship is: `distance = 1 - similarity`. A cosine similarity of 0.95 corresponds to a distance of 0.05.

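The arithmetic behind the distance scale can be sketched in a few lines. This is only an illustration of the formula `distance = 1 - similarity`; the helper names below are hypothetical and are not part of the package's API, and in practice the distance is computed by `valkey-search`, not in application code.

```typescript
// Cosine similarity of two equal-length vectors, in the range [-1, 1].
// Note: a zero vector has no direction, so the result is NaN in that case.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance: 0 = identical, 1 = orthogonal, 2 = opposite.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Convert a similarity cutoff (for example 0.95) into a distance threshold.
function similarityToDistance(similarity: number): number {
  return 1 - similarity;
}
```

If you are migrating from a tool that is configured with a similarity cutoff, a conversion like `similarityToDistance` gives the value to pass as the distance threshold.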
### Handling uncertain hits

When `confidence` is `'uncertain'`, the lookup is still a hit (the score is
within the distance threshold) but the score sits close to the boundary,
inside the `uncertaintyBand`. Three common patterns:

**Accept and monitor**: return the cached response but track uncertain hits
separately via the `result: 'uncertain_hit'` Prometheus label. Review them
periodically to decide if the threshold needs adjustment.

**Fall back to LLM**: treat uncertain hits as misses, call the LLM, then
update the cache entry with `store()` using the fresh response.

**Prompt for feedback**: in user-facing applications, show the cached
response but collect a thumbs up/down signal to identify false positives.

A high rate of uncertain hits (visible in the `{prefix}_requests_total`
metric) indicates the threshold may be too loose for the query distribution.

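A minimal sketch of how a score could be classified against the threshold and the uncertainty band, assuming the band covers scores in `(threshold - uncertaintyBand, threshold]`. The exact boundary handling inside the library is not documented here, so treat the comparisons as an assumption; `classify` and `shouldCallLlm` are hypothetical helpers, and the outcome names mirror the `result` label values of the Prometheus counter.

```typescript
type Outcome = 'hit' | 'uncertain_hit' | 'miss';

// Classify a cosine distance score. Defaults mirror the documented
// defaultThreshold (0.1) and uncertaintyBand (0.05).
function classify(score: number, threshold = 0.1, uncertaintyBand = 0.05): Outcome {
  if (score > threshold) return 'miss';
  // Assumed boundary rule: scores within the band just below the threshold
  // are hits, but flagged as uncertain.
  return score > threshold - uncertaintyBand ? 'uncertain_hit' : 'hit';
}

// "Fall back to LLM" pattern: treat uncertain hits like misses.
function shouldCallLlm(outcome: Outcome): boolean {
  return outcome !== 'hit';
}
```

With the defaults, a score of `0.02` is a confident hit, `0.08` is an uncertain hit, and `0.3` is a miss.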
## Configuration Reference

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `name` | `string` | `'betterdb_scache'` | Index name prefix for Valkey keys |
| `client` | `Valkey` | (none) | iovalkey client instance (required) |
| `embedFn` | `(text: string) => Promise<number[]>` | (none) | Embedding function (required) |
| `defaultThreshold` | `number` | `0.1` | Cosine distance threshold (0–2) |
| `defaultTtl` | `number` | `undefined` | Default TTL in seconds for entries |
| `categoryThresholds` | `Record<string, number>` | `{}` | Per-category threshold overrides |
| `uncertaintyBand` | `number` | `0.05` | Width of the uncertainty band below the threshold |
| `telemetry.tracerName` | `string` | `'@betterdb/semantic-cache'` | OpenTelemetry tracer name |
| `telemetry.metricsPrefix` | `string` | `'semantic_cache'` | Prometheus metric name prefix |
| `telemetry.registry` | `Registry` | default registry | prom-client Registry for metrics |

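One plausible reading of how `categoryThresholds` interacts with `defaultThreshold` is sketched below: a category-specific value wins when present, otherwise the default applies. The `resolveThreshold` helper and the `ThresholdConfig` type are hypothetical and only illustrate that precedence; the category names are made up for the example.

```typescript
// Subset of the documented options that affect threshold selection.
interface ThresholdConfig {
  defaultThreshold?: number;
  categoryThresholds?: Record<string, number>;
}

// Pick the threshold for a lookup: per-category override first,
// then defaultThreshold, then the documented default of 0.1.
function resolveThreshold(config: ThresholdConfig, category?: string): number {
  const fallback = config.defaultThreshold ?? 0.1;
  if (category === undefined) return fallback;
  const overrides = config.categoryThresholds ?? {};
  const override = Object.prototype.hasOwnProperty.call(overrides, category)
    ? overrides[category]
    : undefined;
  return override ?? fallback;
}

// Example: looser matching for FAQ-style prompts, stricter for code.
const exampleConfig: ThresholdConfig = {
  defaultThreshold: 0.1,
  categoryThresholds: { faq: 0.15, code: 0.05 },
};
```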
## Observability

### Prometheus Metrics

All metric names are prefixed with `semantic_cache_` by default (configurable via `telemetry.metricsPrefix`).

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `semantic_cache_requests_total` | Counter | `cache_name`, `result`, `category` | Total cache requests. `result` is `hit`, `miss`, or `uncertain_hit` |
| `semantic_cache_similarity_score` | Histogram | `cache_name`, `category` | Cosine distance scores for lookups with candidates |
| `semantic_cache_operation_duration_seconds` | Histogram | `cache_name`, `operation` | Duration of cache operations (`check`, `store`, `invalidate`, `initialize`) |
| `semantic_cache_embedding_duration_seconds` | Histogram | `cache_name` | Duration of embedding function calls |

### OpenTelemetry Tracing

Every public method emits an OTel span with relevant attributes (`cache.hit`, `cache.similarity`, `cache.threshold`, `cache.confidence`, etc.). Spans require an OpenTelemetry SDK to be configured in the host application; this library uses `@opentelemetry/api` and does not bundle an SDK.

## BetterDB Monitor Integration

If you connect [BetterDB Monitor](https://github.com/KIvanow/monitor) to the same Valkey instance, it will automatically detect the semantic cache index and surface:

- Hit rate and miss rate over time
- Similarity score distribution
- Cache entry count and memory usage
- Cost savings estimates based on cache hit rates

## API

### `cache.initialize()`

Creates or reconnects to the Valkey search index. Must be called before `check()` or `store()`. Safe to call multiple times.

### `cache.check(prompt, options?)`

Searches for a semantically similar cached prompt. Returns `{ hit, response, similarity, confidence, matchedKey, nearestMiss }`.

### `cache.store(prompt, response, options?)`

Stores a prompt/response pair with its embedding vector. Returns the Valkey key.

### `cache.invalidate(filter)`

Deletes entries matching a valkey-search filter expression. Example: `cache.invalidate('@model:{gpt-4o}')`. The filter is passed directly to `FT.SEARCH`, so pass only trusted, programmatically constructed expressions, never unsanitised user input.

### `cache.stats()`

Returns `{ hits, misses, total, hitRate }` from the Valkey stats hash.

### `cache.indexInfo()`

Returns index metadata: `{ name, numDocs, dimension, indexingState }`.

### `cache.flush()`

Drops the index and all entries. Call `initialize()` again to rebuild.

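Taken together, `check()` and `store()` support a cache-aside flow: look up the prompt, return the cached response on a hit, otherwise call the model and store the result. The sketch below is one way to wrap that flow. `CacheLike`, `getOrGenerate`, and the `generate` callback are hypothetical names; `CacheLike` only declares the parts of `check()` and `store()` that the wrapper uses, so a `SemanticCache` instance is assumed to be structurally compatible.

```typescript
// Minimal structural view of the cache used by this helper.
interface CheckResult {
  hit: boolean;
  response?: string;
}

interface CacheLike {
  check(prompt: string): Promise<CheckResult>;
  store(prompt: string, response: string): Promise<string>;
}

// Return the cached response on a hit; otherwise generate, store, and return.
async function getOrGenerate(
  cache: CacheLike,
  prompt: string,
  generate: (prompt: string) => Promise<string>,
): Promise<{ response: string; fromCache: boolean }> {
  const result = await cache.check(prompt);
  if (result.hit && result.response !== undefined) {
    return { response: result.response, fromCache: true };
  }
  const response = await generate(prompt);
  await cache.store(prompt, response);
  return { response, fromCache: false };
}
```

Keeping the LLM call behind a `generate` callback means the wrapper does not depend on any particular SDK.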
## Known limitations

### Cluster mode

`@betterdb/semantic-cache` works with single-node Valkey instances and managed
single-endpoint services (Amazon ElastiCache for Valkey, Google Cloud Memorystore
for Valkey). It does not fully support Valkey in cluster mode.

The specific issue is `flush()`: it uses `SCAN` to find and delete entry keys,
but `SCAN` in cluster mode only iterates keys on the node it is sent to. In a
multi-node cluster, `flush()` will silently leave entry keys on other nodes
(the FT index itself is dropped correctly).

`check()`, `store()`, `invalidate()`, and `stats()` are unaffected. These use
`FT.SEARCH`, `HSET`, `DEL`, and `HINCRBY`, which route correctly in cluster mode
via the key hash slot.

If you need cluster support, either avoid `flush()` or implement a cluster-aware
key sweep using the iovalkey cluster client's per-node scan capability.
Cluster mode support is planned for a future release.

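A cluster-aware key sweep could look roughly like the sketch below: run `SCAN` against each primary node and delete the matching keys on that node. This is an illustration, not the library's implementation. `ScanNode` and `sweepNodes` are hypothetical names, the node list is assumed to come from the cluster client (for example something like `cluster.nodes('master')`, which iovalkey is assumed to inherit from ioredis), and the key pattern you pass must match your cache's entry prefix, which is not documented here.

```typescript
// Minimal structural view of a per-node connection.
interface ScanNode {
  scan(
    cursor: string,
    matchToken: 'MATCH',
    pattern: string,
    countToken: 'COUNT',
    count: number,
  ): Promise<[string, string[]]>;
  del(key: string): Promise<number>;
}

// Scan every node and delete keys matching the pattern.
// Keys are deleted one at a time because a multi-key DEL can fail in
// cluster mode when the keys hash to different slots.
async function sweepNodes(nodes: ScanNode[], pattern: string, batchSize = 500): Promise<number> {
  let deleted = 0;
  for (const node of nodes) {
    let cursor = '0';
    do {
      const [next, keys] = await node.scan(cursor, 'MATCH', pattern, 'COUNT', batchSize);
      for (const key of keys) {
        deleted += await node.del(key);
      }
      cursor = next;
    } while (cursor !== '0');
  }
  return deleted;
}
```

The function returns the number of keys removed, which is useful for logging after a manual flush.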
### Streaming

Streaming LLM responses are not supported. `store()` expects a complete response
string. If your application uses streaming, accumulate the full response before
calling `store()`. The cached response is always returned as a complete string,
not re-streamed token-by-token.

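Accumulating a streamed response before caching it can be sketched as follows. The example assumes the LLM client exposes the stream as an `AsyncIterable<string>` of text chunks, which varies by SDK; `StoreLike` and `streamAndStore` are hypothetical names, and `StoreLike` declares only the `store()` call that the helper needs.

```typescript
// Minimal structural view of the cache used by this helper.
interface StoreLike {
  store(prompt: string, response: string): Promise<string>;
}

// Forward chunks to the caller as they arrive, then store the full text.
async function streamAndStore(
  cache: StoreLike,
  prompt: string,
  chunks: AsyncIterable<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  let full = '';
  for await (const chunk of chunks) {
    onChunk(chunk); // e.g. write the chunk to the HTTP response
    full += chunk;
  }
  // store() is only called once the stream has completed successfully.
  await cache.store(prompt, full);
  return full;
}
```

Because `store()` runs after the loop, a stream that throws partway through leaves nothing in the cache.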
## License

MIT

package/dist/SemanticCache.d.ts
ADDED
@@ -0,0 +1,59 @@
import type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult } from './types';
export declare class SemanticCache {
    private readonly client;
    private readonly embedFn;
    private readonly name;
    private readonly indexName;
    private readonly entryPrefix;
    private readonly statsKey;
    private readonly defaultThreshold;
    private readonly defaultTtl;
    private readonly categoryThresholds;
    private readonly uncertaintyBand;
    private readonly telemetry;
    private _initialized;
    private _dimension;
    private _initPromise;
    private _initGeneration;
    /**
     * Creates a new SemanticCache instance.
     *
     * The caller owns the iovalkey client lifecycle. SemanticCache does not
     * close or disconnect the client when it is done. Call client.quit() or
     * client.disconnect() yourself when the application shuts down.
     *
     * Call initialize() before using check() or store().
     */
    constructor(options: SemanticCacheOptions);
    initialize(): Promise<void>;
    flush(): Promise<void>;
    check(prompt: string, options?: CacheCheckOptions): Promise<CacheCheckResult>;
    store(prompt: string, response: string, options?: CacheStoreOptions): Promise<string>;
    /**
     * Deletes all entries matching a valkey-search filter expression.
     *
     * **Security note:** `filter` is passed directly to FT.SEARCH. Only pass
     * trusted, programmatically-constructed expressions — never unsanitised
     * user input.
     */
    invalidate(filter: string): Promise<InvalidateResult>;
    stats(): Promise<CacheStats>;
    indexInfo(): Promise<IndexInfo>;
    private _doInitialize;
    private ensureIndexAndGetDimension;
    /** Wraps embedFn with error handling and duration tracking. */
    private embed;
    /**
     * Wraps a method body in an OTel span with automatic status, end, and
     * operation duration metric. The span is passed to fn so callers can
     * set attributes — but callers must NOT call span.end() or span.setStatus(),
     * as traced() handles both.
     */
    private traced;
    /** Increment stats counters via pipeline. */
    private recordStat;
    private assertInitialized;
    private assertDimension;
    private isIndexNotFoundError;
    private parseDimensionFromInfo;
}