@betterdb/semantic-cache 0.1.0

package/README.md ADDED
@@ -0,0 +1,223 @@
1
+ # @betterdb/semantic-cache
2
+
3
+ A standalone, framework-agnostic semantic cache for LLM applications backed by [Valkey](https://valkey.io/) (or Redis). Uses Valkey's vector search (`valkey-search` module) for similarity matching with built-in [OpenTelemetry](https://opentelemetry.io/) tracing and [Prometheus](https://prometheus.io/) metrics via `prom-client`. The first semantic cache library designed to work natively with Valkey and BetterDB Monitor.
4
+
5
+ ## Prerequisites
6
+
7
+ - **Valkey 8.0+** with the `valkey-search` module loaded
8
+ - Or **Amazon ElastiCache for Valkey** (8.0+)
9
+ - Or **Google Cloud Memorystore for Valkey**
10
+ - Node.js >= 20.0.0
11
+
12
+ ## Installation
13
+
14
+ ```bash
15
+ npm install @betterdb/semantic-cache
16
+ ```
17
+
18
+ You must also have `iovalkey` installed (it is a peer dependency):
19
+
20
+ ```bash
21
+ npm install iovalkey
22
+ ```
23
+
24
+ ## Why @betterdb/semantic-cache
25
+
26
+ As of 2026, no existing semantic cache library simultaneously satisfies all three of the following properties: **Valkey-native** support (explicitly handling `valkey-search` API differences rather than assuming Redis wire compatibility), **standalone** operation (no coupling to LangChain, LiteLLM, AWS, or any other orchestration layer), and **built-in observability** (OpenTelemetry spans and Prometheus metrics emitted at the cache operation level, not just at the HTTP or LLM call level). This package was built to fill that gap.
27
+
28
+ | Library / Service | Valkey-native | Standalone | Built-in OTel + Prometheus |
29
+ |---|---|---|---|
30
+ | **@betterdb/semantic-cache** | ✅ | ✅ | ✅ |
31
+ | RedisVL `SemanticCache` | ❌ Redis only | ✅ | ❌ |
32
+ | LangChain `RedisSemanticCache` | ❌ Redis only | ❌ Requires LangChain | ❌ |
33
+ | LiteLLM `redis-semantic` | ❌ Redis only | ❌ Requires LiteLLM | ❌ Partial (no cache metrics) |
34
+ | `langgraph-checkpoint-aws` `ValkeyCache` | ✅ | ❌ Requires AWS + LangGraph | ❌ |
35
+ | Mem0 + Valkey | ✅ | ❌ Full memory framework | ❌ |
36
+ | Redis LangCache | ❌ Redis Cloud only | ❌ Managed service | ✅ Dashboard only |
37
+ | Upstash `semantic-cache` | ❌ Upstash Vector only | ✅ | ❌ |
38
+ | GPTCache | ❌ Abandoned (2023) | ✅ | ❌ |
39
+
40
+ - **Valkey-native**: `valkey-search` has API differences from Redis's RediSearch that require explicit handling (see [Valkey Search 1.2 compatibility notes](#valkey-search-12-compatibility-notes) in the changelog). Libraries targeting Redis are not guaranteed to work correctly against self-hosted Valkey or managed Valkey services (ElastiCache, Memorystore).
41
+ - **Standalone**: no dependency on a specific AI framework means you can use this with any LLM client — OpenAI SDK, Anthropic SDK, a local model, or a custom inference endpoint — and swap it out without changing your cache layer.
42
+ - **Built-in OTel + Prometheus**: every `check()` and `store()` call emits a span and increments counters. You get hit rate, similarity score distribution, and latency percentiles in Grafana or any OTel-compatible backend without writing any instrumentation code. If you use [BetterDB Monitor](https://betterdb.com), these metrics are surfaced automatically alongside your other Valkey observability data.
43
+
44
+ ## Quick Start
45
+
46
+ ```typescript
47
+ import Valkey from 'iovalkey';
48
+ import { SemanticCache } from '@betterdb/semantic-cache';
49
+
50
+ const client = new Valkey({ host: 'localhost', port: 6399 });
51
+
52
+ const cache = new SemanticCache({
53
+ client,
54
+ embedFn: async (text) => {
55
+ // Any embedding provider works — OpenAI, Voyage AI, Cohere, a local model, etc.
56
+ const res = await fetch('https://api.voyageai.com/v1/embeddings', {
57
+ method: 'POST',
58
+ headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${process.env.VOYAGE_API_KEY}` },
59
+ body: JSON.stringify({ model: 'voyage-3-lite', input: [text] }),
60
+ });
61
+ const json = await res.json();
62
+ return json.data[0].embedding;
63
+ },
64
+ });
65
+
66
+ await cache.initialize();
67
+
68
+ // Store a response
69
+ await cache.store('What is the capital of France?', 'Paris');
70
+
71
+ // Check for a semantically similar prompt
72
+ const result = await cache.check('Capital city of France?');
73
+ // result.hit === true, result.response === 'Paris'
74
+ ```
75
+
76
+ ## Client Lifecycle
77
+
78
+ SemanticCache does **not** own the iovalkey client. You create it, you close it:
79
+
80
+ ```typescript
81
+ const client = new Valkey({ host: 'localhost', port: 6399 });
82
+ const cache = new SemanticCache({ client, embedFn });
83
+
84
+ // ... use cache ...
85
+
86
+ // When shutting down, close the client yourself:
87
+ await client.quit();
88
+ ```
89
+
90
+ ## Threshold: Cosine Distance vs Cosine Similarity
91
+
92
+ This library uses **cosine distance** (0–2 scale), not cosine similarity (-1 to 1 scale):
93
+
94
+ | Distance | Meaning |
95
+ |----------|---------|
96
+ | 0 | Identical vectors |
97
+ | 1 | Orthogonal (unrelated) |
98
+ | 2 | Opposite vectors |
99
+
100
+ A cache lookup is a **hit** when `score <= threshold`, where `score` is the cosine distance between the query and the nearest cached prompt. The default threshold of `0.1` is strict: it matches only very similar prompts. Increase it to `0.15–0.2` for broader matching.
101
+
102
+ The relationship is: `distance = 1 - similarity`. A cosine similarity of 0.95 corresponds to a distance of 0.05.
103
+
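As a rough illustration of the scale above, cosine distance can be computed directly from two embedding vectors. The helper names below (`cosineDistance`, `similarityToDistance`) are hypothetical and not part of the library, which leaves the actual computation to `valkey-search`.

```typescript
// Hypothetical helpers for illustration; not exported by @betterdb/semantic-cache.

/** Cosine distance = 1 - cosine similarity, so the result lies in [0, 2]. */
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must be non-empty and have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('cosine distance is undefined for zero vectors');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Converts a similarity-style threshold (e.g. 0.95) to the distance scale (about 0.05). */
function similarityToDistance(similarity: number): number {
  return 1 - similarity;
}
```

If you are migrating from a cache that is configured with a similarity threshold, converting it once with `similarityToDistance` avoids accidentally passing a value such as `0.95` as a distance, which would match almost everything.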
104
+ ### Handling uncertain hits
105
+
106
+ When `confidence` is `'uncertain'`, the lookup is a hit (its cosine distance is at
106
+ or below the threshold) but lies within `uncertaintyBand` of the boundary. Three common patterns:
108
+
109
+ **Accept and monitor** — return the cached response but track uncertain hits
110
+ separately via the `result: 'uncertain_hit'` Prometheus label. Review them
111
+ periodically to decide if the threshold needs adjustment.
112
+
113
+ **Fall back to LLM** — treat uncertain hits as misses, call the LLM, then
114
+ update the cache entry with `store()` using the fresh response.
115
+
116
+ **Prompt for feedback** — in user-facing applications, show the cached
117
+ response but collect a thumbs up/down signal to identify false positives.
118
+
119
+ A high rate of uncertain hits (visible in the `{prefix}_requests_total`
120
+ metric) indicates the threshold may be too loose for the query distribution.
121
+
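The "fall back to LLM" pattern could be sketched as follows. The `CacheLike` and `CheckResultLike` interfaces are structural stand-ins written for this example (the real exported types may differ, for instance in how a missing `response` is represented), and `callLlm` is a hypothetical placeholder for your model client.

```typescript
// Structural types for this sketch only. Field names follow the result shape
// documented for check(); the real exported types may carry more fields.
interface CheckResultLike {
  hit: boolean;
  response?: string;
  confidence?: string;
}

interface CacheLike {
  check(prompt: string): Promise<CheckResultLike>;
  store(prompt: string, response: string): Promise<string>;
}

/**
 * Returns a cached response only for confident hits. Misses and uncertain
 * hits both fall through to the LLM, and the fresh answer is stored.
 * `callLlm` is a hypothetical stand-in for your model client.
 */
async function answerWithFallback(
  cache: CacheLike,
  callLlm: (prompt: string) => Promise<string>,
  prompt: string,
): Promise<{ response: string; source: 'cache' | 'llm' }> {
  const result = await cache.check(prompt);
  if (result.hit && result.confidence !== 'uncertain' && result.response !== undefined) {
    return { response: result.response, source: 'cache' };
  }
  const fresh = await callLlm(prompt);
  await cache.store(prompt, fresh);
  return { response: fresh, source: 'llm' };
}
```

Returning the `source` alongside the response makes it straightforward to log how often the fallback path is taken, which complements the `uncertain_hit` label on the Prometheus counter.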
122
+ ## Configuration Reference
123
+
124
+ | Option | Type | Default | Description |
125
+ |--------|------|---------|-------------|
126
+ | `name` | `string` | `'betterdb_scache'` | Index name prefix for Valkey keys |
127
+ | `client` | `Valkey` | — | iovalkey client instance (required) |
128
+ | `embedFn` | `(text: string) => Promise<number[]>` | — | Embedding function (required) |
129
+ | `defaultThreshold` | `number` | `0.1` | Cosine distance threshold (0–2) |
130
+ | `defaultTtl` | `number` | `undefined` | Default TTL in seconds for entries |
131
+ | `categoryThresholds` | `Record<string, number>` | `{}` | Per-category threshold overrides |
132
+ | `uncertaintyBand` | `number` | `0.05` | Width of the uncertainty band below threshold |
133
+ | `telemetry.tracerName` | `string` | `'@betterdb/semantic-cache'` | OpenTelemetry tracer name |
134
+ | `telemetry.metricsPrefix` | `string` | `'semantic_cache'` | Prometheus metric name prefix |
135
+ | `telemetry.registry` | `Registry` | default registry | prom-client Registry for metrics |
136
+
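To make the interaction between `defaultThreshold`, `categoryThresholds`, and `uncertaintyBand` concrete, here is a minimal sketch of how a lookup distance might be classified. It mirrors the documented semantics but is an assumption, not the library's actual implementation; in particular, the handling of a distance that falls exactly on the lower edge of the band is a guess.

```typescript
// Illustrative only: names and boundary handling are assumptions.
type Outcome = 'hit' | 'uncertain_hit' | 'miss';

interface ThresholdConfig {
  defaultThreshold: number;
  categoryThresholds: Record<string, number>;
  uncertaintyBand: number;
}

/** Picks the per-category override when one exists, else the default. */
function resolveThreshold(cfg: ThresholdConfig, category?: string): number {
  const override =
    category !== undefined &&
    Object.prototype.hasOwnProperty.call(cfg.categoryThresholds, category)
      ? cfg.categoryThresholds[category]
      : undefined;
  return override ?? cfg.defaultThreshold;
}

/** Classifies a cosine distance against the resolved threshold. */
function classify(distance: number, cfg: ThresholdConfig, category?: string): Outcome {
  const threshold = resolveThreshold(cfg, category);
  if (distance > threshold) return 'miss';
  // A hit within `uncertaintyBand` below the threshold is treated as uncertain.
  return distance > threshold - cfg.uncertaintyBand ? 'uncertain_hit' : 'hit';
}
```

With the defaults from the table (`0.1` threshold, `0.05` band), a distance of `0.02` would be a confident hit, `0.08` an uncertain hit, and `0.12` a miss under this sketch.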
137
+ ## Observability
138
+
139
+ ### Prometheus Metrics
140
+
141
+ All metric names are prefixed with `semantic_cache_` by default (configurable via `telemetry.metricsPrefix`).
142
+
143
+ | Metric | Type | Labels | Description |
144
+ |--------|------|--------|-------------|
145
+ | `semantic_cache_requests_total` | Counter | `cache_name`, `result`, `category` | Total cache requests. `result` is `hit`, `miss`, or `uncertain_hit` |
146
+ | `semantic_cache_similarity_score` | Histogram | `cache_name`, `category` | Cosine distance scores (lower means closer, despite the metric name), recorded for lookups that return at least one candidate |
147
+ | `semantic_cache_operation_duration_seconds` | Histogram | `cache_name`, `operation` | Duration of cache operations (`check`, `store`, `invalidate`, `initialize`) |
148
+ | `semantic_cache_embedding_duration_seconds` | Histogram | `cache_name` | Duration of embedding function calls |
149
+
150
+ ### OpenTelemetry Tracing
151
+
152
+ Every public method emits an OTel span with relevant attributes (`cache.hit`, `cache.similarity`, `cache.threshold`, `cache.confidence`, etc.). Spans require an OpenTelemetry SDK to be configured in the host application — this library uses `@opentelemetry/api` and does not bundle an SDK.
153
+
154
+ ## BetterDB Monitor Integration
155
+
156
+ If you connect [BetterDB Monitor](https://github.com/KIvanow/monitor) to the same Valkey instance, it will automatically detect the semantic cache index and surface:
157
+
158
+ - Hit rate and miss rate over time
159
+ - Similarity score distribution
160
+ - Cache entry count and memory usage
161
+ - Cost savings estimates based on cache hit rates
162
+
163
+ ## API
164
+
165
+ ### `cache.initialize()`
166
+
167
+ Creates or reconnects to the Valkey search index. Must be called before `check()` or `store()`. Safe to call multiple times.
168
+
169
+ ### `cache.check(prompt, options?)`
170
+
171
+ Searches for a semantically similar cached prompt. Returns `{ hit, response, similarity, confidence, matchedKey, nearestMiss }`.
172
+
173
+ ### `cache.store(prompt, response, options?)`
174
+
175
+ Stores a prompt/response pair with its embedding vector. Returns the Valkey key.
176
+
177
+ ### `cache.invalidate(filter)`
178
+
179
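+ Deletes entries matching a valkey-search filter expression. Example: `cache.invalidate('@model:{gpt-4o}')`. The filter is passed directly to `FT.SEARCH`, so pass only trusted, programmatically constructed expressions, never unsanitised user input.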
180
+
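Because the package's type declarations note that the filter string reaches `FT.SEARCH` as-is, one conservative way to build it is to validate each component against an allowlist before interpolating. The helper below is hypothetical and deliberately strict; it is a sketch, not something the library provides.

```typescript
// Hypothetical helper; not part of @betterdb/semantic-cache.
// Accepts only a conservative allowlist of characters so that a value cannot
// inject extra query syntax into the filter expression.
const SAFE_TOKEN = /^[A-Za-z0-9_-]+$/;

function buildTagFilter(field: string, value: string): string {
  if (!SAFE_TOKEN.test(field)) {
    throw new Error(`unsafe field name: ${field}`);
  }
  if (!SAFE_TOKEN.test(value)) {
    throw new Error(`unsafe tag value: ${value}`);
  }
  return `@${field}:{${value}}`;
}
```

For example, `buildTagFilter('model', 'gpt-4o')` produces the same expression as the README example, while a value containing braces, spaces, or pipes is rejected instead of being forwarded to the search module.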
181
+ ### `cache.stats()`
182
+
183
+ Returns `{ hits, misses, total, hitRate }` from the Valkey stats hash.
184
+
185
+ ### `cache.indexInfo()`
186
+
187
+ Returns index metadata: `{ name, numDocs, dimension, indexingState }`.
188
+
189
+ ### `cache.flush()`
190
+
191
+ Drops the index and all entries. Call `initialize()` again to rebuild.
192
+
193
+ ## Known limitations
194
+
195
+ ### Cluster mode
196
+
197
+ `@betterdb/semantic-cache` works with single-node Valkey instances and managed
198
+ single-endpoint services (Amazon ElastiCache for Valkey, Google Cloud Memorystore
199
+ for Valkey). It does not fully support Valkey in cluster mode.
200
+
201
+ The specific issue is `flush()`: it uses `SCAN` to find and delete entry keys,
202
+ but `SCAN` in cluster mode only iterates keys on the node it is sent to. In a
203
+ multi-node cluster, `flush()` will silently leave entry keys on other nodes
204
+ (the FT index itself is dropped correctly).
205
+
206
+ `check()`, `store()`, `invalidate()`, and `stats()` are unaffected — these use
207
+ `FT.SEARCH`, `HSET`, `DEL`, and `HINCRBY` which route correctly in cluster mode
208
+ via the key hash slot.
209
+
210
+ If you need cluster support, either avoid `flush()` or implement a cluster-aware
211
+ key sweep using the iovalkey cluster client's per-node scan capability.
212
+ Cluster mode support is planned for a future release.
213
+
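A cluster-aware key sweep along the lines suggested above might look like the sketch below. It is written against a small structural `NodeLike` interface rather than the real client types; with an ioredis-style cluster client, the per-node connections are assumed to come from something like `cluster.nodes('master')`. The key pattern is also an assumption, since the entry prefix is internal to the library, so check the prefix your deployment actually uses.

```typescript
// Minimal structural view of a per-node client connection (assumed shape).
interface NodeLike {
  scan(
    cursor: string,
    matchToken: 'MATCH',
    pattern: string,
    countToken: 'COUNT',
    count: number,
  ): Promise<[string, string[]]>;
  del(key: string): Promise<number>;
}

/**
 * Runs SCAN on every node and deletes matching keys.
 * Returns the number of keys actually removed.
 */
async function sweepEntryKeys(
  nodes: NodeLike[],
  pattern: string, // hypothetical, e.g. '<your-cache-name>:*'
  batchSize = 500,
): Promise<number> {
  let deleted = 0;
  for (const node of nodes) {
    let cursor = '0';
    do {
      const [next, keys] = await node.scan(cursor, 'MATCH', pattern, 'COUNT', batchSize);
      // Delete one key per command: a multi-key DEL fails with CROSSSLOT
      // when the keys hash to different slots, even on the same node.
      for (const key of keys) {
        deleted += await node.del(key);
      }
      cursor = next;
    } while (cursor !== '0');
  }
  return deleted;
}
```

Deleting keys one at a time is slower than batching, but it sidesteps slot grouping entirely; pipelining the per-key deletes would be a natural optimisation if the sweep is large.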
214
+ ### Streaming
215
+
216
+ Streaming LLM responses are not supported. `store()` expects a complete response
217
+ string. If your application uses streaming, accumulate the full response before
218
+ calling `store()`. The cached response is always returned as a complete string,
219
+ not re-streamed token-by-token.
220
+
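One way to combine a streaming LLM client with the cache is to forward chunks to the caller as they arrive and store the concatenated text only after the stream completes. The sketch below assumes the stream is exposed as an `AsyncIterable<string>`; `StoreLike` and `streamAndCache` are names invented for this example.

```typescript
// Structural stand-in for the cache; only store() is needed here.
interface StoreLike {
  store(prompt: string, response: string): Promise<string>;
}

/**
 * Forwards chunks to the caller as they arrive, then caches the full text
 * once the stream has completed. If the stream throws part-way through,
 * nothing is stored, so partial responses never enter the cache.
 */
async function streamAndCache(
  cache: StoreLike,
  prompt: string,
  chunks: AsyncIterable<string>,
  onChunk: (chunk: string) => void,
): Promise<string> {
  const parts: string[] = [];
  for await (const chunk of chunks) {
    parts.push(chunk);
    onChunk(chunk);
  }
  const full = parts.join('');
  await cache.store(prompt, full);
  return full;
}
```

On a later cache hit the response comes back as a single string, so the caller needs a non-streaming rendering path for cached answers.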
221
+ ## License
222
+
223
+ MIT
@@ -0,0 +1,59 @@
1
+ import type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult } from './types';
2
+ export declare class SemanticCache {
3
+ private readonly client;
4
+ private readonly embedFn;
5
+ private readonly name;
6
+ private readonly indexName;
7
+ private readonly entryPrefix;
8
+ private readonly statsKey;
9
+ private readonly defaultThreshold;
10
+ private readonly defaultTtl;
11
+ private readonly categoryThresholds;
12
+ private readonly uncertaintyBand;
13
+ private readonly telemetry;
14
+ private _initialized;
15
+ private _dimension;
16
+ private _initPromise;
17
+ private _initGeneration;
18
+ /**
19
+ * Creates a new SemanticCache instance.
20
+ *
21
+ * The caller owns the iovalkey client lifecycle. SemanticCache does not
22
+ * close or disconnect the client when it is done. Call client.quit() or
23
+ * client.disconnect() yourself when the application shuts down.
24
+ *
25
+ * Call initialize() before using check() or store().
26
+ */
27
+ constructor(options: SemanticCacheOptions);
28
+ initialize(): Promise<void>;
29
+ flush(): Promise<void>;
30
+ check(prompt: string, options?: CacheCheckOptions): Promise<CacheCheckResult>;
31
+ store(prompt: string, response: string, options?: CacheStoreOptions): Promise<string>;
32
+ /**
33
+ * Deletes all entries matching a valkey-search filter expression.
34
+ *
35
+ * **Security note:** `filter` is passed directly to FT.SEARCH. Only pass
36
+ * trusted, programmatically-constructed expressions — never unsanitised
37
+ * user input.
38
+ */
39
+ invalidate(filter: string): Promise<InvalidateResult>;
40
+ stats(): Promise<CacheStats>;
41
+ indexInfo(): Promise<IndexInfo>;
42
+ private _doInitialize;
43
+ private ensureIndexAndGetDimension;
44
+ /** Wraps embedFn with error handling and duration tracking. */
45
+ private embed;
46
+ /**
47
+ * Wraps a method body in an OTel span with automatic status, end, and
48
+ * operation duration metric. The span is passed to fn so callers can
49
+ * set attributes — but callers must NOT call span.end() or span.setStatus(),
50
+ * as traced() handles both.
51
+ */
52
+ private traced;
53
+ /** Increment stats counters via pipeline. */
54
+ private recordStat;
55
+ private assertInitialized;
56
+ private assertDimension;
57
+ private isIndexNotFoundError;
58
+ private parseDimensionFromInfo;
59
+ }