@betterdb/semantic-cache 0.2.0 → 0.4.1
- package/README.md +75 -1
- package/dist/SemanticCache.d.ts +43 -3
- package/dist/SemanticCache.js +270 -5
- package/dist/discovery.d.ts +67 -0
- package/dist/discovery.js +140 -0
- package/dist/index.d.ts +3 -1
- package/dist/index.js +3 -1
- package/dist/telemetry.d.ts +4 -0
- package/dist/telemetry.js +25 -0
- package/dist/types.d.ts +90 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -96,6 +96,54 @@ A lookup is a **hit** when `score <= threshold`. The default threshold is `0.1`.
|
|
|
96
96
|
| Conversational / RAG | `0.15` | Paraphrases hit as `high` confidence |
|
|
97
97
|
| Broad search / recall | `0.20` | High hit rate, review uncertain hits |
|
|
98
98
|
|
|
99
|
+
## LLM-as-judge
|
|
100
|
+
|
|
101
|
+
When a hit lands in the uncertainty band (`threshold - uncertaintyBand < score <= threshold`), you can supply a `judgeFn` to adjudicate automatically instead of handling `confidence: 'uncertain'` yourself.
|
|
102
|
+
|
|
103
|
+
```typescript
|
|
104
|
+
const result = await cache.check(userPrompt, {
|
|
105
|
+
judge: {
|
|
106
|
+
judgeFn: async ({ prompt, response, similarity, threshold, category }) => {
|
|
107
|
+
// Return true to accept (confidence → 'high')
|
|
108
|
+
// Return false to reject (treated as miss with nearestMiss)
|
|
109
|
+
const verdict = await openai.chat.completions.create({
|
|
110
|
+
model: 'gpt-5-mini',
|
|
111
|
+
messages: [
|
|
112
|
+
{ role: 'system', content: 'Reply YES or NO only.' },
|
|
113
|
+
{ role: 'user', content: `Does this cached response correctly answer the prompt?\nPrompt: ${prompt}\nResponse: ${response}` },
|
|
114
|
+
],
|
|
115
|
+
});
|
|
116
|
+
return verdict.choices[0].message.content?.startsWith('YES') ?? false;
|
|
117
|
+
},
|
|
118
|
+
onError: 'accept', // fail-open on judge errors (default)
|
|
119
|
+
timeoutMs: 2000, // per-call timeout (default)
|
|
120
|
+
},
|
|
121
|
+
});
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
**When the judge is invoked:** only for `confidence === 'uncertain'` hits. High-confidence hits, misses, and the zero-candidates case bypass the judge entirely.
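As a rough illustration of when the judge would run, the sketch below classifies a cosine distance into `high`, `uncertain`, or `miss`. It mirrors the `winnerScore >= threshold - this.uncertaintyBand` expression visible in the `SemanticCache.js` portion of this diff; `classifyScore` and `judgeWouldRun` are hypothetical helpers, not part of the package API, and the band value in the usage lines is only an assumed example.

```typescript
type Confidence = 'high' | 'uncertain' | 'miss';

// Hypothetical helper (not exported by the package): classify a distance score.
// Lower scores mean closer matches; a score above the threshold is a miss.
function classifyScore(score: number, threshold: number, uncertaintyBand: number): Confidence {
  if (score > threshold) {
    return 'miss';
  }
  // Same comparison as the compiled source shown in this diff.
  return score >= threshold - uncertaintyBand ? 'uncertain' : 'high';
}

// Hypothetical helper: the judge is only consulted for uncertain hits
// and only when a judge was supplied in the check options.
function judgeWouldRun(confidence: Confidence, judgeConfigured: boolean): boolean {
  return judgeConfigured && confidence === 'uncertain';
}

// Usage with an assumed threshold of 0.1 and an assumed band of 0.03.
const examples = [0.05, 0.09, 0.12].map((score) => {
  const confidence = classifyScore(score, 0.1, 0.03);
  return { score, confidence, judged: judgeWouldRun(confidence, true) };
});
console.log(examples);
```

Only the middle score falls inside the band, so only that lookup would pay the latency of a judge call.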
|
|
125
|
+
|
|
126
|
+
**Accept path:** `result.hit === true`, `result.confidence === 'high'`.
|
|
127
|
+
|
|
128
|
+
**Reject path:** `result.hit === false`, `result.nearestMiss` populated with `deltaToThreshold <= 0` (use this to distinguish judge rejections from regular misses where `deltaToThreshold > 0`).
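A minimal sketch of how a caller might use `deltaToThreshold` to tell a judge rejection from a regular miss. The `CheckResultLike` and `NearestMissLike` shapes are simplified local stand-ins, not the package's exported types, and treating a missing `nearestMiss` as "no candidate" is an assumption made for illustration.

```typescript
// Simplified stand-ins for the result shape described above (assumptions).
interface NearestMissLike {
  similarity: number;
  threshold: number;
  deltaToThreshold: number;
  matchedKey?: string;
}

interface CheckResultLike {
  hit: boolean;
  confidence: 'high' | 'uncertain' | 'miss';
  nearestMiss?: NearestMissLike;
}

type Outcome = 'hit' | 'judge_rejected' | 'regular_miss' | 'no_candidate';

// Hypothetical helper: interpret a check() result.
function classifyResult(result: CheckResultLike): Outcome {
  if (result.hit) {
    return 'hit';
  }
  if (!result.nearestMiss) {
    return 'no_candidate';
  }
  // A rejected candidate was within the threshold, so its delta is <= 0.
  return result.nearestMiss.deltaToThreshold <= 0 ? 'judge_rejected' : 'regular_miss';
}

// Usage: a candidate at distance 0.09 with threshold 0.1 that the judge refused.
const rejected: CheckResultLike = {
  hit: false,
  confidence: 'miss',
  nearestMiss: { similarity: 0.09, threshold: 0.1, deltaToThreshold: 0.09 - 0.1 },
};
console.log(classifyResult(rejected));
```

Logging the two miss kinds separately may help when tuning thresholds, since a high rate of judge rejections suggests the threshold is admitting candidates the judge does not trust.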
|
|
129
|
+
|
|
130
|
+
**Composing with rerank:** when both `rerank` and `judge` are set, the judge receives the reranked pick's response and similarity score.
|
|
131
|
+
|
|
132
|
+
**`checkBatch()` does not support `judge`.** Call `check()` individually for prompts that need adjudication.
|
|
133
|
+
|
|
134
|
+
### CacheCheckOptions reference
|
|
135
|
+
|
|
136
|
+
| Option | Type | Default | Description |
|
|
137
|
+
|---|---|---|---|
|
|
138
|
+
| `threshold` | `number` | `defaultThreshold` | Per-request cosine distance threshold override |
|
|
139
|
+
| `category` | `string` | — | Category tag for per-category thresholds and metric labels |
|
|
140
|
+
| `filter` | `string` | — | FT.SEARCH pre-filter expression (trusted input only) |
|
|
141
|
+
| `k` | `number` | `1` | KNN neighbours to fetch (ignored when `rerank` is set) |
|
|
142
|
+
| `staleAfterModelChange` | `boolean` | `false` | Evict and miss when stored model differs from `currentModel` |
|
|
143
|
+
| `currentModel` | `string` | — | Model to compare against stored entries |
|
|
144
|
+
| `rerank` | `RerankOptions` | — | Rerank hook; see `RerankOptions` |
|
|
145
|
+
| `judge` | `JudgeOptions` | — | LLM-as-judge for borderline hits; see `JudgeOptions`. Not supported by `checkBatch()`; throws `SemanticCacheUsageError` |
|
|
146
|
+
|
|
99
147
|
## Configuration Reference
|
|
100
148
|
|
|
101
149
|
| Option | Type | Default | Description |
|
|
@@ -161,6 +209,24 @@ Cost savings scale with the model. Observed values from live examples:
|
|
|
161
209
|
| `@betterdb/semantic-cache/embed/cohere` | `embed-english-v3.0` | 1024 |
|
|
162
210
|
| `@betterdb/semantic-cache/embed/ollama` | `nomic-embed-text` | 768 |
|
|
163
211
|
|
|
212
|
+
### Discovery markers
|
|
213
|
+
|
|
214
|
+
Starting in `0.2.0`, `initialize()` writes a small advisory record to a shared `__betterdb:caches` hash on the Valkey instance so Monitor (and other tooling) can enumerate caches without configuration. A 60s-TTL heartbeat key is refreshed every 30s; `flush()` and `dispose()` remove the heartbeat immediately. No sensitive data is ever written — only cache metadata (type, prefix, version, capabilities, configured thresholds).
|
|
215
|
+
|
|
216
|
+
Opt out by passing `discovery: { enabled: false }`. See `SemanticCacheOptions.discovery` for the full set of knobs.
|
|
217
|
+
|
|
218
|
+
If your Valkey runs with ACLs, grant the library's user access to the `__betterdb:*` prefix:
|
|
219
|
+
|
|
220
|
+
```
|
|
221
|
+
ACL SETUSER <user> +@write +@read ~__betterdb:* ~<your-cache-prefix>:*
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
Discovery writes are best-effort — if the ACL denies them, the cache still functions and the `semantic_cache_discovery_write_failed_total` counter increments so operators can alert.
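For tooling that wants to enumerate caches, the sketch below parses the contents of the `__betterdb:caches` hash into typed summaries. It assumes the input is the field-to-value map an `HGETALL` call would return, with one JSON string per cache name, which matches the `hset(REGISTRY_KEY, name, JSON.stringify(metadata))` call in `discovery.js`. `parseRegistry` and `MarkerSummary` are hypothetical names, and only a subset of the marker fields is read.

```typescript
// Hypothetical summary type covering a subset of the marker fields.
interface MarkerSummary {
  name: string;
  type: string;
  prefix: string;
  version: string;
  capabilities: string[];
}

// Hypothetical helper: turn raw registry hash contents into summaries,
// skipping entries that are not valid JSON or lack the expected fields.
function parseRegistry(raw: Record<string, string>): MarkerSummary[] {
  const summaries: MarkerSummary[] = [];
  for (const [name, json] of Object.entries(raw)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(json);
    } catch {
      continue; // advisory data: ignore malformed entries
    }
    if (typeof parsed !== 'object' || parsed === null) {
      continue;
    }
    const marker = parsed as Record<string, unknown>;
    const type = marker['type'];
    const prefix = marker['prefix'];
    const version = marker['version'];
    if (typeof type !== 'string' || typeof prefix !== 'string' || typeof version !== 'string') {
      continue;
    }
    const rawCaps = marker['capabilities'];
    const capabilities = Array.isArray(rawCaps)
      ? rawCaps.filter((c): c is string => typeof c === 'string')
      : [];
    summaries.push({ name, type, prefix, version, capabilities });
  }
  return summaries;
}

// Usage with hand-written sample data (not real registry output).
const sample: Record<string, string> = {
  faq_cache: JSON.stringify({
    type: 'semantic_cache',
    prefix: 'faq_cache',
    version: '0.4.1',
    capabilities: ['invalidate', 'threshold_adjust'],
  }),
  broken: '{not json',
};
console.log(parseRegistry(sample));
```

Because the registry is advisory, the sketch skips bad entries instead of throwing.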
|
|
225
|
+
|
|
226
|
+
### `cache.dispose()`
|
|
227
|
+
|
|
228
|
+
Graceful shutdown: stops the heartbeat and deletes this instance's heartbeat key so Monitor marks the cache offline immediately. Does not drop the index or delete entries. Call from your SIGTERM handler alongside `client.quit()`.
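One possible shape for such a handler is sketched below. The two interfaces are minimal local stand-ins for the cache and the Valkey client, and `shutdownGracefully` is a hypothetical helper. The sketch calls `dispose()` before `quit()` on the assumption that the heartbeat key can only be deleted while the connection is still open.

```typescript
// Minimal stand-ins for the objects involved (assumptions, not package types).
interface DisposableCache {
  dispose(): Promise<void>;
}

interface QuittableClient {
  quit(): Promise<unknown>;
}

// Hypothetical helper: dispose the cache first, then close the connection.
// The finally block closes the client even if dispose() rejects.
async function shutdownGracefully(cache: DisposableCache, client: QuittableClient): Promise<void> {
  try {
    await cache.dispose();
  } finally {
    await client.quit();
  }
}

// In a Node.js service this might be wired up roughly as:
//   process.once('SIGTERM', () => {
//     void shutdownGracefully(cache, client).finally(() => process.exit(0));
//   });

// Usage with stubs that record the call order.
const calls: string[] = [];
const stubCache: DisposableCache = {
  dispose: async () => {
    calls.push('dispose');
  },
};
const stubClient: QuittableClient = {
  quit: async () => {
    calls.push('quit');
    return 'OK';
  },
};
const shutdownDone = shutdownGracefully(stubCache, stubClient);
```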
|
|
229
|
+
|
|
164
230
|
## API
|
|
165
231
|
|
|
166
232
|
### `cache.initialize()`
|
|
@@ -215,7 +281,15 @@ Returns `{ name, numDocs, dimension, indexingState }`.
|
|
|
215
281
|
|
|
216
282
|
### `cache.flush()`
|
|
217
283
|
|
|
218
|
-
Drops the index and all
|
|
284
|
+
Drops the index and all entries. Call `initialize()` again to rebuild. Also stops the discovery heartbeat and deletes its heartbeat key, but preserves the registry entry in `__betterdb:caches` so Monitor retains history.
|
|
285
|
+
|
|
286
|
+
### `cache.shutdown()`
|
|
287
|
+
|
|
288
|
+
Stops the analytics client, cancels the stats snapshot timer, and disposes the discovery heartbeat. Safe to call multiple times.
|
|
289
|
+
|
|
290
|
+
### `cache.dispose()`
|
|
291
|
+
|
|
292
|
+
Graceful shutdown of the discovery layer for in-process caches without destroying data. Stops the discovery heartbeat and deletes the heartbeat key; does not touch the index or entries.
|
|
219
293
|
|
|
220
294
|
### `cache.thresholdEffectiveness(options?)`
|
|
221
295
|
|
package/dist/SemanticCache.d.ts
CHANGED
|
@@ -8,15 +8,22 @@ export declare class SemanticCache {
|
|
|
8
8
|
private readonly entryPrefix;
|
|
9
9
|
private readonly statsKey;
|
|
10
10
|
private readonly similarityWindowKey;
|
|
11
|
-
private readonly
|
|
11
|
+
private readonly configKey;
|
|
12
|
+
private defaultThreshold;
|
|
12
13
|
private readonly defaultTtl;
|
|
13
|
-
private
|
|
14
|
+
private categoryThresholds;
|
|
14
15
|
private readonly uncertaintyBand;
|
|
15
16
|
private readonly telemetry;
|
|
16
17
|
private readonly costTable;
|
|
17
18
|
private readonly embeddingCacheEnabled;
|
|
18
19
|
private readonly embeddingCacheTtl;
|
|
19
20
|
private readonly embedKeyPrefix;
|
|
21
|
+
private readonly discoveryOptions;
|
|
22
|
+
private readonly _initialDefaultThreshold;
|
|
23
|
+
private readonly _initialCategoryThresholds;
|
|
24
|
+
private readonly configRefreshOptions;
|
|
25
|
+
private configRefreshTimer;
|
|
26
|
+
private discovery;
|
|
20
27
|
private _initialized;
|
|
21
28
|
private _dimension;
|
|
22
29
|
private _hasBinaryRefs;
|
|
@@ -40,8 +47,18 @@ export declare class SemanticCache {
|
|
|
40
47
|
constructor(options: SemanticCacheOptions);
|
|
41
48
|
initialize(): Promise<void>;
|
|
42
49
|
flush(): Promise<void>;
|
|
43
|
-
/**
|
|
50
|
+
/**
|
|
51
|
+
* Shut down the analytics client, cancel the stats timer, and stop the
|
|
52
|
+
* discovery heartbeat. Safe to call multiple times.
|
|
53
|
+
*/
|
|
44
54
|
shutdown(): Promise<void>;
|
|
55
|
+
/**
|
|
56
|
+
* Graceful shutdown of the discovery layer — stops the heartbeat and
|
|
57
|
+
* deletes this instance's heartbeat key so Monitor marks the cache offline
|
|
58
|
+
* immediately. Does NOT touch the registry hash, the FT index, or any
|
|
59
|
+
* entries. Safe to call multiple times.
|
|
60
|
+
*/
|
|
61
|
+
dispose(): Promise<void>;
|
|
45
62
|
check(prompt: string | ContentBlock[], options?: CacheCheckOptions): Promise<CacheCheckResult>;
|
|
46
63
|
store(prompt: string | ContentBlock[], response: string, options?: CacheStoreOptions): Promise<string>;
|
|
47
64
|
/**
|
|
@@ -82,8 +99,29 @@ export declare class SemanticCache {
|
|
|
82
99
|
thresholdEffectivenessAll(options?: {
|
|
83
100
|
minSamples?: number;
|
|
84
101
|
}): Promise<ThresholdEffectivenessResult[]>;
|
|
102
|
+
/**
|
|
103
|
+
* Refresh threshold config from Valkey. Returns true on a successful HGETALL,
|
|
104
|
+
* false if the call threw.
|
|
105
|
+
*
|
|
106
|
+
* Field semantics:
|
|
107
|
+
* - "threshold" -> updates defaultThreshold
|
|
108
|
+
* - "threshold:{category}" -> updates categoryThresholds[category]
|
|
109
|
+
* - "threshold:" (empty) -> ignored
|
|
110
|
+
* - non-numeric values -> ignored
|
|
111
|
+
* - out-of-range values -> ignored (must be 0 <= x <= 2)
|
|
112
|
+
*
|
|
113
|
+
* Categories present in memory but absent from the hash fall back to their
|
|
114
|
+
* constructor values (or are removed if no constructor override existed).
|
|
115
|
+
* The default threshold likewise falls back to its constructor value if
|
|
116
|
+
* `threshold` is absent from the hash.
|
|
117
|
+
*/
|
|
118
|
+
refreshConfig(): Promise<boolean>;
|
|
85
119
|
/** @internal Default similarity threshold. */
|
|
86
120
|
get _defaultThreshold(): number;
|
|
121
|
+
/** @internal Test-only getter. */
|
|
122
|
+
get _categoryThresholds(): Readonly<Record<string, number>>;
|
|
123
|
+
/** @internal Test-only getter. */
|
|
124
|
+
get _configRefreshIntervalMs(): number;
|
|
87
125
|
/**
|
|
88
126
|
* Execute a stable FT.SEARCH for use by adapters (e.g. LangGraph).
|
|
89
127
|
* SORTBY inserted_at ASC gives stable ordering across paginated calls.
|
|
@@ -98,7 +136,9 @@ export declare class SemanticCache {
|
|
|
98
136
|
vector: number[];
|
|
99
137
|
durationSec: number;
|
|
100
138
|
}>;
|
|
139
|
+
private startConfigRefresh;
|
|
101
140
|
private _doInitialize;
|
|
141
|
+
private registerDiscovery;
|
|
102
142
|
private initAnalyticsSafe;
|
|
103
143
|
private captureStatsSnapshot;
|
|
104
144
|
private ensureIndexAndGetDimension;
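
The field semantics documented for `refreshConfig()` in this declaration file can be sketched as a pure function over the hash contents. `applyConfigHash` and the `Thresholds` type are hypothetical names for illustration. The logic follows the compiled `refreshConfig()` body shown in the `SemanticCache.js` diff, minus the Valkey call and instance state.

```typescript
interface Thresholds {
  defaultThreshold: number;
  categoryThresholds: Record<string, number>;
}

// Hypothetical helper: compute the effective thresholds from the raw
// `{name}:__config` hash contents, starting from the constructor values.
function applyConfigHash(raw: Record<string, string>, initial: Thresholds): Thresholds {
  let nextDefault = initial.defaultThreshold;
  const nextCategory: Record<string, number> = { ...initial.categoryThresholds };
  for (const [field, value] of Object.entries(raw)) {
    const parsed = Number(value);
    // Non-numeric and out-of-range values (outside 0..2) are ignored.
    if (!Number.isFinite(parsed) || parsed < 0 || parsed > 2) {
      continue;
    }
    if (field === 'threshold') {
      nextDefault = parsed;
    } else if (field.startsWith('threshold:')) {
      const category = field.slice('threshold:'.length);
      // "threshold:" with an empty category name is ignored.
      if (category.length > 0) {
        nextCategory[category] = parsed;
      }
    }
  }
  return { defaultThreshold: nextDefault, categoryThresholds: nextCategory };
}

// Usage with sample values (illustrative only).
const effective = applyConfigHash(
  {
    threshold: '0.2',
    'threshold:faq': '0.05',
    'threshold:': '0.3',
    'threshold:bad': 'abc',
    'threshold:big': '5',
  },
  { defaultThreshold: 0.1, categoryThresholds: { rag: 0.15 } },
);
console.log(effective);
```

Starting each refresh from the constructor values is what makes a field removed from the hash fall back to its original setting instead of keeping the last refreshed value.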
|
package/dist/SemanticCache.js
CHANGED
|
@@ -10,7 +10,9 @@ const utils_1 = require("./utils");
|
|
|
10
10
|
const defaultCostTable_1 = require("./defaultCostTable");
|
|
11
11
|
const cluster_1 = require("./cluster");
|
|
12
12
|
const analytics_1 = require("./analytics");
|
|
13
|
+
const discovery_1 = require("./discovery");
|
|
13
14
|
const INVALIDATE_BATCH_SIZE = 1000;
|
|
15
|
+
const PACKAGE_VERSION = require('../package.json').version;
|
|
14
16
|
function errMsg(err) {
|
|
15
17
|
return err instanceof Error ? err.message : String(err);
|
|
16
18
|
}
|
|
@@ -22,6 +24,7 @@ class SemanticCache {
|
|
|
22
24
|
entryPrefix;
|
|
23
25
|
statsKey;
|
|
24
26
|
similarityWindowKey;
|
|
27
|
+
configKey;
|
|
25
28
|
defaultThreshold;
|
|
26
29
|
defaultTtl;
|
|
27
30
|
categoryThresholds;
|
|
@@ -31,6 +34,12 @@ class SemanticCache {
|
|
|
31
34
|
embeddingCacheEnabled;
|
|
32
35
|
embeddingCacheTtl;
|
|
33
36
|
embedKeyPrefix;
|
|
37
|
+
discoveryOptions;
|
|
38
|
+
_initialDefaultThreshold;
|
|
39
|
+
_initialCategoryThresholds;
|
|
40
|
+
configRefreshOptions;
|
|
41
|
+
configRefreshTimer;
|
|
42
|
+
discovery = null;
|
|
34
43
|
_initialized = false;
|
|
35
44
|
_dimension = 0;
|
|
36
45
|
_hasBinaryRefs = false;
|
|
@@ -59,6 +68,7 @@ class SemanticCache {
|
|
|
59
68
|
this.entryPrefix = `${this.name}:entry:`;
|
|
60
69
|
this.statsKey = `${this.name}:__stats`;
|
|
61
70
|
this.similarityWindowKey = `${this.name}:__similarity_window`;
|
|
71
|
+
this.configKey = `${this.name}:__config`;
|
|
62
72
|
this.embedKeyPrefix = `${this.name}:embed:`;
|
|
63
73
|
this.defaultThreshold = options.defaultThreshold ?? 0.1;
|
|
64
74
|
this.defaultTtl = options.defaultTtl;
|
|
@@ -85,6 +95,16 @@ class SemanticCache {
|
|
|
85
95
|
});
|
|
86
96
|
this.analyticsOpts = options.analytics;
|
|
87
97
|
this.usesDefaultCostTable = useDefault;
|
|
98
|
+
this.discoveryOptions = options.discovery ?? {};
|
|
99
|
+
// Capture constructor values as fallback when __config fields are absent
|
|
100
|
+
this._initialDefaultThreshold = this.defaultThreshold;
|
|
101
|
+
this._initialCategoryThresholds = { ...this.categoryThresholds };
|
|
102
|
+
// Refresh options
|
|
103
|
+
const refresh = options.configRefresh ?? {};
|
|
104
|
+
this.configRefreshOptions = {
|
|
105
|
+
enabled: refresh.enabled ?? true,
|
|
106
|
+
intervalMs: Math.max(1000, refresh.intervalMs ?? 30_000),
|
|
107
|
+
};
|
|
88
108
|
}
|
|
89
109
|
// -- Lifecycle --
|
|
90
110
|
async initialize() {
|
|
@@ -102,6 +122,14 @@ class SemanticCache {
|
|
|
102
122
|
this._initialized = false;
|
|
103
123
|
this._initPromise = null;
|
|
104
124
|
this._initGeneration++;
|
|
125
|
+
// Capture and null the discovery ref synchronously, before any await,
|
|
126
|
+
// so a concurrent _doInitialize() (started after _initGeneration++) can't
|
|
127
|
+
// race in and have its new manager overwritten by this flush.
|
|
128
|
+
const discoveryToStop = this.discovery;
|
|
129
|
+
this.discovery = null;
|
|
130
|
+
if (discoveryToStop) {
|
|
131
|
+
await discoveryToStop.stop({ deleteHeartbeat: true });
|
|
132
|
+
}
|
|
105
133
|
// Valkey Search 1.2 does not support the DD (Delete Documents) flag on
|
|
106
134
|
// FT.DROPINDEX. Drop the index first, then clean up keys separately.
|
|
107
135
|
try {
|
|
@@ -126,14 +154,41 @@ class SemanticCache {
|
|
|
126
154
|
await this.client.del(this.similarityWindowKey);
|
|
127
155
|
this.analytics.capture('cache_flush');
|
|
128
156
|
}
|
|
129
|
-
/**
|
|
157
|
+
/**
|
|
158
|
+
* Shut down the analytics client, cancel the stats timer, and stop the
|
|
159
|
+
* discovery heartbeat. Safe to call multiple times.
|
|
160
|
+
*/
|
|
130
161
|
async shutdown() {
|
|
131
162
|
this.shutdownCalled = true;
|
|
163
|
+
if (this.configRefreshTimer) {
|
|
164
|
+
clearInterval(this.configRefreshTimer);
|
|
165
|
+
this.configRefreshTimer = undefined;
|
|
166
|
+
}
|
|
132
167
|
if (this.statsTimer) {
|
|
133
168
|
clearInterval(this.statsTimer);
|
|
134
169
|
this.statsTimer = undefined;
|
|
135
170
|
}
|
|
136
171
|
await this.analytics.shutdown();
|
|
172
|
+
await this.dispose();
|
|
173
|
+
}
|
|
174
|
+
/**
|
|
175
|
+
* Graceful shutdown of the discovery layer — stops the heartbeat and
|
|
176
|
+
* deletes this instance's heartbeat key so Monitor marks the cache offline
|
|
177
|
+
* immediately. Does NOT touch the registry hash, the FT index, or any
|
|
178
|
+
* entries. Safe to call multiple times.
|
|
179
|
+
*/
|
|
180
|
+
async dispose() {
|
|
181
|
+
if (this.configRefreshTimer) {
|
|
182
|
+
clearInterval(this.configRefreshTimer);
|
|
183
|
+
this.configRefreshTimer = undefined;
|
|
184
|
+
}
|
|
185
|
+
if (this._initPromise) {
|
|
186
|
+
await this._initPromise.catch(() => { });
|
|
187
|
+
}
|
|
188
|
+
if (this.discovery) {
|
|
189
|
+
await this.discovery.stop({ deleteHeartbeat: true });
|
|
190
|
+
this.discovery = null;
|
|
191
|
+
}
|
|
137
192
|
}
|
|
138
193
|
// -- Public operations --
|
|
139
194
|
async check(prompt, options) {
|
|
@@ -259,14 +314,85 @@ class SemanticCache {
|
|
|
259
314
|
return { hit: false, confidence: 'miss' };
|
|
260
315
|
}
|
|
261
316
|
}
|
|
262
|
-
// All checks passed —
|
|
317
|
+
// All checks passed — compute confidence (recordSimilarityWindow moves to after judge)
|
|
318
|
+
let confidence = winnerScore >= threshold - this.uncertaintyBand ? 'uncertain' : 'high';
|
|
319
|
+
const matchedKey = winner.key;
|
|
320
|
+
// --- LLM-as-judge for borderline hits ---
|
|
321
|
+
if (options?.judge && confidence === 'uncertain') {
|
|
322
|
+
const judgeStart = performance.now();
|
|
323
|
+
const timeoutMs = options.judge.timeoutMs ?? 2000;
|
|
324
|
+
const onError = options.judge.onError ?? 'accept';
|
|
325
|
+
let decision;
|
|
326
|
+
try {
|
|
327
|
+
const accepted = await raceWithTimeout(options.judge.judgeFn({
|
|
328
|
+
prompt: promptText,
|
|
329
|
+
response: winner.fields['response'] ?? '',
|
|
330
|
+
similarity: winnerScore,
|
|
331
|
+
threshold,
|
|
332
|
+
category: category || undefined,
|
|
333
|
+
}), timeoutMs);
|
|
334
|
+
decision = accepted ? 'accept' : 'reject';
|
|
335
|
+
}
|
|
336
|
+
catch (err) {
|
|
337
|
+
const isTimeout = err instanceof JudgeTimeoutError;
|
|
338
|
+
if (onError === 'accept') {
|
|
339
|
+
decision = isTimeout ? 'timeout_accept' : 'error_accept';
|
|
340
|
+
}
|
|
341
|
+
else {
|
|
342
|
+
decision = isTimeout ? 'timeout_reject' : 'error_reject';
|
|
343
|
+
}
|
|
344
|
+
}
|
|
345
|
+
const judgeSec = (performance.now() - judgeStart) / 1000;
|
|
346
|
+
this.telemetry.metrics.judgeDecisions
|
|
347
|
+
.labels({ cache_name: this.name, category: categoryLabel, decision })
|
|
348
|
+
.inc();
|
|
349
|
+
this.telemetry.metrics.judgeDuration
|
|
350
|
+
.labels({ cache_name: this.name, category: categoryLabel, decision })
|
|
351
|
+
.observe(judgeSec);
|
|
352
|
+
span.setAttributes({
|
|
353
|
+
'cache.judge.invoked': true,
|
|
354
|
+
'cache.judge.decision': decision,
|
|
355
|
+
'cache.judge.latency_ms': judgeSec * 1000,
|
|
356
|
+
});
|
|
357
|
+
if (decision === 'accept') {
|
|
358
|
+
confidence = 'high';
|
|
359
|
+
// Fall through to hit-return path
|
|
360
|
+
}
|
|
361
|
+
else if (decision === 'error_accept' || decision === 'timeout_accept') {
|
|
362
|
+
// Preserve 'uncertain'; fall through to hit-return path
|
|
363
|
+
}
|
|
364
|
+
else {
|
|
365
|
+
// reject / error_reject / timeout_reject → treat as miss
|
|
366
|
+
await this.recordSimilarityWindow(winnerScore, 'miss', category);
|
|
367
|
+
await this.recordStat('misses');
|
|
368
|
+
this.telemetry.metrics.requestsTotal
|
|
369
|
+
.labels({ cache_name: this.name, result: 'miss', category: categoryLabel })
|
|
370
|
+
.inc();
|
|
371
|
+
span.setAttributes({
|
|
372
|
+
'cache.hit': false,
|
|
373
|
+
'cache.name': this.name,
|
|
374
|
+
'cache.category': categoryLabel,
|
|
375
|
+
});
|
|
376
|
+
return {
|
|
377
|
+
hit: false,
|
|
378
|
+
confidence: 'miss',
|
|
379
|
+
similarity: winnerScore,
|
|
380
|
+
nearestMiss: {
|
|
381
|
+
similarity: winnerScore,
|
|
382
|
+
threshold,
|
|
383
|
+
deltaToThreshold: winnerScore - threshold,
|
|
384
|
+
matchedKey,
|
|
385
|
+
},
|
|
386
|
+
};
|
|
387
|
+
}
|
|
388
|
+
}
|
|
389
|
+
// --- End judge ---
|
|
390
|
+
// Record as genuine hit (moved here from before the judge block)
|
|
263
391
|
await this.recordSimilarityWindow(winnerScore, 'hit', category);
|
|
264
|
-
const confidence = winnerScore >= threshold - this.uncertaintyBand ? 'uncertain' : 'high';
|
|
265
392
|
await this.recordStat('hits');
|
|
266
393
|
const metricResult = confidence === 'uncertain' ? 'uncertain_hit' : 'hit';
|
|
267
394
|
this.telemetry.metrics.requestsTotal
|
|
268
395
|
.labels({ cache_name: this.name, result: metricResult, category: categoryLabel }).inc();
|
|
269
|
-
const matchedKey = winner.key;
|
|
270
396
|
if (this.defaultTtl !== undefined && matchedKey) {
|
|
271
397
|
await this.client.expire(matchedKey, this.defaultTtl);
|
|
272
398
|
}
|
|
@@ -446,6 +572,9 @@ class SemanticCache {
|
|
|
446
572
|
if (options?.staleAfterModelChange) {
|
|
447
573
|
throw new errors_1.SemanticCacheUsageError("checkBatch() does not support 'staleAfterModelChange'. Use check() for stale-model eviction.");
|
|
448
574
|
}
|
|
575
|
+
if (options?.judge) {
|
|
576
|
+
throw new errors_1.SemanticCacheUsageError("checkBatch() does not support the 'judge' option. Use check() for LLM-as-judge adjudication.");
|
|
577
|
+
}
|
|
449
578
|
return this.traced('checkBatch', async (span) => {
|
|
450
579
|
// Resolve all prompts and embed in parallel
|
|
451
580
|
const resolved = await Promise.all(prompts.map((p) => this.resolvePrompt(p)));
|
|
@@ -769,9 +898,64 @@ class SemanticCache {
|
|
|
769
898
|
]);
|
|
770
899
|
return results;
|
|
771
900
|
}
|
|
901
|
+
/**
|
|
902
|
+
* Refresh threshold config from Valkey. Returns true on a successful HGETALL,
|
|
903
|
+
* false if the call threw.
|
|
904
|
+
*
|
|
905
|
+
* Field semantics:
|
|
906
|
+
* - "threshold" -> updates defaultThreshold
|
|
907
|
+
* - "threshold:{category}" -> updates categoryThresholds[category]
|
|
908
|
+
* - "threshold:" (empty) -> ignored
|
|
909
|
+
* - non-numeric values -> ignored
|
|
910
|
+
* - out-of-range values -> ignored (must be 0 <= x <= 2)
|
|
911
|
+
*
|
|
912
|
+
* Categories present in memory but absent from the hash fall back to their
|
|
913
|
+
* constructor values (or are removed if no constructor override existed).
|
|
914
|
+
* The default threshold likewise falls back to its constructor value if
|
|
915
|
+
* `threshold` is absent from the hash.
|
|
916
|
+
*/
|
|
917
|
+
async refreshConfig() {
|
|
918
|
+
let raw = null;
|
|
919
|
+
try {
|
|
920
|
+
raw = await this.client.hgetall(this.configKey);
|
|
921
|
+
}
|
|
922
|
+
catch {
|
|
923
|
+
return false;
|
|
924
|
+
}
|
|
925
|
+
let nextDefault = this._initialDefaultThreshold;
|
|
926
|
+
const nextCategory = { ...this._initialCategoryThresholds };
|
|
927
|
+
if (raw) {
|
|
928
|
+
for (const [field, value] of Object.entries(raw)) {
|
|
929
|
+
const parsed = Number(value);
|
|
930
|
+
if (!Number.isFinite(parsed) || parsed < 0 || parsed > 2) {
|
|
931
|
+
continue;
|
|
932
|
+
}
|
|
933
|
+
if (field === 'threshold') {
|
|
934
|
+
nextDefault = parsed;
|
|
935
|
+
}
|
|
936
|
+
else if (field.startsWith('threshold:')) {
|
|
937
|
+
const category = field.slice('threshold:'.length);
|
|
938
|
+
if (category.length > 0) {
|
|
939
|
+
nextCategory[category] = parsed;
|
|
940
|
+
}
|
|
941
|
+
}
|
|
942
|
+
}
|
|
943
|
+
}
|
|
944
|
+
this.defaultThreshold = nextDefault;
|
|
945
|
+
this.categoryThresholds = nextCategory;
|
|
946
|
+
return true;
|
|
947
|
+
}
|
|
772
948
|
// -- Internal helpers exposed to package adapters --
|
|
773
949
|
/** @internal Default similarity threshold. */
|
|
774
950
|
get _defaultThreshold() { return this.defaultThreshold; }
|
|
951
|
+
/** @internal Test-only getter. */
|
|
952
|
+
get _categoryThresholds() {
|
|
953
|
+
return this.categoryThresholds;
|
|
954
|
+
}
|
|
955
|
+
/** @internal Test-only getter. */
|
|
956
|
+
get _configRefreshIntervalMs() {
|
|
957
|
+
return this.configRefreshOptions.intervalMs;
|
|
958
|
+
}
|
|
775
959
|
/**
|
|
776
960
|
* Execute a stable FT.SEARCH for use by adapters (e.g. LangGraph).
|
|
777
961
|
* SORTBY inserted_at ASC gives stable ordering across paginated calls.
|
|
@@ -788,19 +972,86 @@ class SemanticCache {
|
|
|
788
972
|
return this.embed(text);
|
|
789
973
|
}
|
|
790
974
|
// -- Private helpers --
|
|
975
|
+
startConfigRefresh() {
|
|
976
|
+
if (!this.configRefreshOptions.enabled) {
|
|
977
|
+
return;
|
|
978
|
+
}
|
|
979
|
+
const tick = () => {
|
|
980
|
+
this.refreshConfig()
|
|
981
|
+
.then((ok) => {
|
|
982
|
+
if (!ok) {
|
|
983
|
+
this.telemetry.metrics.configRefreshFailed
|
|
984
|
+
.labels({ cache_name: this.name })
|
|
985
|
+
.inc();
|
|
986
|
+
}
|
|
987
|
+
})
|
|
988
|
+
.catch(() => {
|
|
989
|
+
this.telemetry.metrics.configRefreshFailed
|
|
990
|
+
.labels({ cache_name: this.name })
|
|
991
|
+
.inc();
|
|
992
|
+
});
|
|
993
|
+
};
|
|
994
|
+
// Synchronous first refresh: process started immediately after a proposal
|
|
995
|
+
// was applied picks up the change without waiting for the first tick.
|
|
996
|
+
tick();
|
|
997
|
+
this.configRefreshTimer = setInterval(tick, this.configRefreshOptions.intervalMs);
|
|
998
|
+
if (typeof this.configRefreshTimer.unref === 'function') {
|
|
999
|
+
this.configRefreshTimer.unref();
|
|
1000
|
+
}
|
|
1001
|
+
}
|
|
791
1002
|
async _doInitialize() {
|
|
792
1003
|
const gen = this._initGeneration;
|
|
793
1004
|
return this.traced('initialize', async () => {
|
|
794
1005
|
const { dim, hasBinaryRefs } = await this.ensureIndexAndGetDimension();
|
|
795
|
-
if (this._initGeneration !== gen)
|
|
1006
|
+
if (this._initGeneration !== gen) {
|
|
796
1007
|
return;
|
|
1008
|
+
}
|
|
797
1009
|
this._dimension = dim;
|
|
798
1010
|
this._hasBinaryRefs = hasBinaryRefs;
|
|
1011
|
+
// registerDiscovery() may throw SemanticCacheUsageError on a name
|
|
1012
|
+
// collision. Mark the cache initialized only after discovery succeeds
|
|
1013
|
+
// so a colliding caller cannot subsequently call check()/store()
|
|
1014
|
+
// against another owner's keys.
|
|
1015
|
+
const manager = await this.registerDiscovery();
|
|
1016
|
+
if (this._initGeneration !== gen) {
|
|
1017
|
+
if (manager) {
|
|
1018
|
+
await manager.stop({ deleteHeartbeat: true });
|
|
1019
|
+
}
|
|
1020
|
+
return;
|
|
1021
|
+
}
|
|
1022
|
+
this.discovery = manager;
|
|
799
1023
|
this._initialized = true;
|
|
1024
|
+
this.startConfigRefresh();
|
|
800
1025
|
// Fire analytics init once (not on every flush+initialize cycle)
|
|
801
1026
|
this.initAnalyticsSafe().catch(() => { });
|
|
802
1027
|
});
|
|
803
1028
|
}
|
|
1029
|
+
async registerDiscovery() {
|
|
1030
|
+
if (this.discoveryOptions.enabled === false) {
|
|
1031
|
+
return null;
|
|
1032
|
+
}
|
|
1033
|
+
const metadata = (0, discovery_1.buildSemanticMetadata)({
|
|
1034
|
+
name: this.name,
|
|
1035
|
+
version: PACKAGE_VERSION,
|
|
1036
|
+
defaultThreshold: this.defaultThreshold,
|
|
1037
|
+
categoryThresholds: this.categoryThresholds,
|
|
1038
|
+
uncertaintyBand: this.uncertaintyBand,
|
|
1039
|
+
includeCategories: this.discoveryOptions.includeCategories ?? true,
|
|
1040
|
+
});
|
|
1041
|
+
const manager = new discovery_1.DiscoveryManager({
|
|
1042
|
+
client: this.client,
|
|
1043
|
+
name: this.name,
|
|
1044
|
+
metadata,
|
|
1045
|
+
heartbeatIntervalMs: this.discoveryOptions.heartbeatIntervalMs,
|
|
1046
|
+
onWriteFailed: () => {
|
|
1047
|
+
this.telemetry.metrics.discoveryWriteFailed
|
|
1048
|
+
.labels({ cache_name: this.name })
|
|
1049
|
+
.inc();
|
|
1050
|
+
},
|
|
1051
|
+
});
|
|
1052
|
+
await manager.register();
|
|
1053
|
+
return manager;
|
|
1054
|
+
}
|
|
804
1055
|
async initAnalyticsSafe() {
|
|
805
1056
|
if (this.analyticsInitiated)
|
|
806
1057
|
return;
|
|
@@ -1056,3 +1307,17 @@ class SemanticCache {
|
|
|
1056
1307
|
}
|
|
1057
1308
|
}
|
|
1058
1309
|
exports.SemanticCache = SemanticCache;
|
|
1310
|
+
// --- Judge helpers ---
|
|
1311
|
+
class JudgeTimeoutError extends Error {
|
|
1312
|
+
constructor() {
|
|
1313
|
+
super('judgeFn timed out');
|
|
1314
|
+
this.name = 'JudgeTimeoutError';
|
|
1315
|
+
}
|
|
1316
|
+
}
|
|
1317
|
+
function raceWithTimeout(p, timeoutMs) {
|
|
1318
|
+
let timer;
|
|
1319
|
+
const timeout = new Promise((_, reject) => {
|
|
1320
|
+
timer = setTimeout(() => reject(new JudgeTimeoutError()), timeoutMs);
|
|
1321
|
+
});
|
|
1322
|
+
return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
|
|
1323
|
+
}

package/dist/discovery.d.ts
|
|
@@ -0,0 +1,67 @@
|
|
|
1
|
+
import type { Valkey } from './types';
|
|
2
|
+
export declare const PROTOCOL_VERSION = 1;
|
|
3
|
+
export declare const REGISTRY_KEY = "__betterdb:caches";
|
|
4
|
+
export declare const PROTOCOL_KEY = "__betterdb:protocol";
|
|
5
|
+
export declare const HEARTBEAT_KEY_PREFIX = "__betterdb:heartbeat:";
|
|
6
|
+
export declare const DEFAULT_HEARTBEAT_INTERVAL_MS = 30000;
|
|
7
|
+
export declare const HEARTBEAT_TTL_SECONDS = 60;
|
|
8
|
+
export declare const CACHE_TYPE: "semantic_cache";
|
|
9
|
+
export type CacheType = typeof CACHE_TYPE;
|
|
10
|
+
export interface DiscoveryOptions {
|
|
11
|
+
enabled?: boolean;
|
|
12
|
+
heartbeatIntervalMs?: number;
|
|
13
|
+
includeCategories?: boolean;
|
|
14
|
+
}
|
|
15
|
+
export interface MarkerMetadata {
|
|
16
|
+
type: CacheType;
|
|
17
|
+
prefix: string;
|
|
18
|
+
version: string;
|
|
19
|
+
protocol_version: number;
|
|
20
|
+
capabilities: string[];
|
|
21
|
+
stats_key: string;
|
|
22
|
+
started_at: string;
|
|
23
|
+
pid?: number;
|
|
24
|
+
hostname?: string;
|
|
25
|
+
[extra: string]: unknown;
|
|
26
|
+
}
|
|
27
|
+
export interface BuildSemanticMetadataInput {
|
|
28
|
+
name: string;
|
|
29
|
+
version: string;
|
|
30
|
+
defaultThreshold: number;
|
|
31
|
+
categoryThresholds: Record<string, number>;
|
|
32
|
+
uncertaintyBand: number;
|
|
33
|
+
includeCategories: boolean;
|
|
34
|
+
}
|
|
35
|
+
export declare function buildSemanticMetadata(input: BuildSemanticMetadataInput): MarkerMetadata;
|
|
36
|
+
export interface DiscoveryLogger {
|
|
37
|
+
warn: (msg: string) => void;
|
|
38
|
+
debug: (msg: string) => void;
|
|
39
|
+
}
|
|
40
|
+
export interface DiscoveryManagerDeps {
|
|
41
|
+
client: Valkey;
|
|
42
|
+
name: string;
|
|
43
|
+
metadata: MarkerMetadata;
|
|
44
|
+
heartbeatIntervalMs?: number;
|
|
45
|
+
logger?: DiscoveryLogger;
|
|
46
|
+
onWriteFailed?: () => void;
|
|
47
|
+
}
|
|
48
|
+
export declare class DiscoveryManager {
|
|
49
|
+
private readonly client;
|
|
50
|
+
private readonly name;
|
|
51
|
+
private readonly metadata;
|
|
52
|
+
private readonly heartbeatIntervalMs;
|
|
53
|
+
private readonly heartbeatKey;
|
|
54
|
+
private readonly logger;
|
|
55
|
+
private readonly onWriteFailed;
|
|
56
|
+
private heartbeatHandle;
|
|
57
|
+
constructor(deps: DiscoveryManagerDeps);
|
|
58
|
+
register(): Promise<void>;
|
|
59
|
+
stop(opts: {
|
|
60
|
+
deleteHeartbeat: boolean;
|
|
61
|
+
}): Promise<void>;
|
|
62
|
+
tickHeartbeat(): Promise<void>;
|
|
63
|
+
private startHeartbeat;
|
|
64
|
+
private safeHget;
|
|
65
|
+
private safeCall;
|
|
66
|
+
private checkCollision;
|
|
67
|
+
}

package/dist/discovery.js
|
|
@@ -0,0 +1,140 @@
|
|
|
1
|
+
"use strict";
|
|
2
|
+
Object.defineProperty(exports, "__esModule", { value: true });
|
|
3
|
+
exports.DiscoveryManager = exports.CACHE_TYPE = exports.HEARTBEAT_TTL_SECONDS = exports.DEFAULT_HEARTBEAT_INTERVAL_MS = exports.HEARTBEAT_KEY_PREFIX = exports.PROTOCOL_KEY = exports.REGISTRY_KEY = exports.PROTOCOL_VERSION = void 0;
|
|
4
|
+
exports.buildSemanticMetadata = buildSemanticMetadata;
|
|
5
|
+
const node_os_1 = require("node:os");
|
|
6
|
+
const errors_1 = require("./errors");
|
|
7
|
+
exports.PROTOCOL_VERSION = 1;
|
|
8
|
+
exports.REGISTRY_KEY = '__betterdb:caches';
|
|
9
|
+
exports.PROTOCOL_KEY = '__betterdb:protocol';
|
|
10
|
+
exports.HEARTBEAT_KEY_PREFIX = '__betterdb:heartbeat:';
|
|
11
|
+
exports.DEFAULT_HEARTBEAT_INTERVAL_MS = 30_000;
|
|
12
|
+
exports.HEARTBEAT_TTL_SECONDS = 60;
|
|
13
|
+
exports.CACHE_TYPE = 'semantic_cache';
|
|
14
|
+
function buildSemanticMetadata(input) {
|
|
15
|
+
const metadata = {
|
|
16
|
+
type: exports.CACHE_TYPE,
|
|
17
|
+
prefix: input.name,
|
|
18
|
+
version: input.version,
|
|
19
|
+
protocol_version: exports.PROTOCOL_VERSION,
|
|
20
|
+
capabilities: ['invalidate', 'similarity_distribution', 'threshold_adjust'],
|
|
21
|
+
index_name: `${input.name}:idx`,
|
|
22
|
+
stats_key: `${input.name}:__stats`,
|
|
23
|
+
config_key: `${input.name}:__config`,
|
|
24
|
+
default_threshold: input.defaultThreshold,
|
|
25
|
+
uncertainty_band: input.uncertaintyBand,
|
|
26
|
+
started_at: new Date().toISOString(),
|
|
27
|
+
pid: process.pid,
|
|
28
|
+
hostname: (0, node_os_1.hostname)(),
|
|
29
|
+
};
|
|
30
|
+
if (input.includeCategories && Object.keys(input.categoryThresholds).length > 0) {
|
|
31
|
+
metadata.category_thresholds = { ...input.categoryThresholds };
|
|
32
|
+
}
|
|
33
|
+
return metadata;
|
|
34
|
+
}
|
|
35
|
+
const noopLogger = {
|
|
36
|
+
warn: () => { },
|
|
37
|
+
debug: () => { },
|
|
38
|
+
};
|
|
39
|
+
function errMsg(err) {
|
|
40
|
+
return err instanceof Error ? err.message : String(err);
|
|
41
|
+
}
|
|
42
|
+
class DiscoveryManager {
|
|
43
|
+
client;
|
|
44
|
+
name;
|
|
45
|
+
metadata;
|
|
46
|
+
heartbeatIntervalMs;
|
|
47
|
+
heartbeatKey;
|
|
48
|
+
logger;
|
|
49
|
+
onWriteFailed;
|
|
50
|
+
heartbeatHandle = null;
|
|
51
|
+
constructor(deps) {
|
|
52
|
+
this.client = deps.client;
|
|
53
|
+
this.name = deps.name;
|
|
54
|
+
this.metadata = deps.metadata;
|
|
55
|
+
this.heartbeatIntervalMs = deps.heartbeatIntervalMs ?? exports.DEFAULT_HEARTBEAT_INTERVAL_MS;
|
|
56
|
+
this.heartbeatKey = `${exports.HEARTBEAT_KEY_PREFIX}${deps.name}`;
|
|
57
|
+
this.logger = deps.logger ?? noopLogger;
|
|
58
|
+
this.onWriteFailed = deps.onWriteFailed ?? (() => { });
|
|
59
|
+
}
|
|
60
|
+
async register() {
|
|
61
|
+
const existingJson = await this.safeHget();
|
|
62
|
+
if (existingJson !== null) {
|
|
63
|
+
this.checkCollision(existingJson);
|
|
64
|
+
}
|
|
65
|
+
await this.safeCall(() => this.client.hset(exports.REGISTRY_KEY, this.name, JSON.stringify(this.metadata)), 'HSET registry');
|
|
66
|
+
await this.safeCall(() => this.client.set(exports.PROTOCOL_KEY, String(exports.PROTOCOL_VERSION), 'NX'), 'SET protocol');
|
|
67
|
+
await this.tickHeartbeat();
|
|
68
|
+
this.startHeartbeat();
|
|
69
|
+
}
|
|
70
|
+
async stop(opts) {
|
|
71
|
+
if (this.heartbeatHandle) {
|
|
72
|
+
clearInterval(this.heartbeatHandle);
|
|
73
|
+
this.heartbeatHandle = null;
|
|
74
|
+
}
|
|
75
|
+
if (!opts.deleteHeartbeat) {
|
|
76
|
+
return;
|
|
77
|
+
}
|
|
78
|
+
try {
|
|
79
|
+
await this.client.del(this.heartbeatKey);
|
|
80
|
+
}
|
|
81
|
+
catch (err) {
|
|
82
|
+
this.logger.debug(`discovery: DEL heartbeat failed: ${errMsg(err)}`);
|
|
83
|
+
}
|
|
84
|
+
}
|
|
85
|
+
async tickHeartbeat() {
|
|
86
|
+
const now = new Date().toISOString();
|
|
87
|
+
try {
|
|
88
|
+
await this.client.set(this.heartbeatKey, now, 'EX', exports.HEARTBEAT_TTL_SECONDS);
|
|
89
|
+
}
|
|
90
|
+
catch (err) {
|
|
91
|
+
this.logger.debug(`discovery: heartbeat SET failed: ${errMsg(err)}`);
|
|
92
|
+
this.onWriteFailed();
|
|
93
|
+
}
|
|
94
|
+
}
|
|
95
|
+
startHeartbeat() {
|
|
96
|
+
if (this.heartbeatHandle) {
|
|
97
|
+
clearInterval(this.heartbeatHandle);
|
|
98
|
+
}
|
|
99
|
+
const handle = setInterval(() => {
|
|
100
|
+
void this.tickHeartbeat();
|
|
101
|
+
}, this.heartbeatIntervalMs);
|
|
102
|
+
handle.unref?.();
|
|
103
|
+
this.heartbeatHandle = handle;
|
|
104
|
+
}
|
|
105
|
+
async safeHget() {
|
|
106
|
+
try {
|
|
107
|
+
return await this.client.hget(exports.REGISTRY_KEY, this.name);
|
|
108
|
+
}
|
|
109
|
+
catch (err) {
|
|
110
|
+
this.logger.warn(`discovery: HGET registry failed: ${errMsg(err)}`);
|
|
111
|
+
this.onWriteFailed();
|
|
112
|
+
return null;
|
|
113
|
+
}
|
|
114
|
+
}
|
|
115
|
+
async safeCall(fn, label) {
|
|
116
|
+
try {
|
|
117
|
+
await fn();
|
|
118
|
+
}
|
|
119
|
+
catch (err) {
|
|
120
|
+
this.logger.warn(`discovery: ${label} failed: ${errMsg(err)}`);
|
|
121
|
+
this.onWriteFailed();
|
|
122
|
+
}
|
|
123
|
+
}
|
|
124
|
+
checkCollision(existingJson) {
|
|
125
|
+
let parsed;
|
|
126
|
+
try {
|
|
127
|
+
parsed = JSON.parse(existingJson);
|
|
128
|
+
}
|
|
129
|
+
catch {
|
|
130
|
+
return;
|
|
131
|
+
}
|
|
132
|
+
if (parsed.type && parsed.type !== exports.CACHE_TYPE) {
|
|
133
|
+
throw new errors_1.SemanticCacheUsageError(`cache name collision: '${this.name}' is already registered as type '${String(parsed.type)}' on this Valkey instance`);
|
|
134
|
+
}
|
|
135
|
+
if (parsed.version && parsed.version !== this.metadata.version) {
|
|
136
|
+
this.logger.warn(`discovery: overwriting marker for '${this.name}' (existing version ${String(parsed.version)}, this version ${this.metadata.version})`);
|
|
137
|
+
}
|
|
138
|
+
}
|
|
139
|
+
}
|
|
140
|
+
exports.DiscoveryManager = DiscoveryManager;
|
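Reader's note on the discovery markers above: `register()` writes one JSON marker per cache into the `__betterdb:caches` hash and refreshes a `__betterdb:heartbeat:{name}` key every 30 s with a 60 s TTL, so a single failed tick should not by itself let the heartbeat expire. The consumer side is not part of this diff. The sketch below is a hypothetical reader, assuming the hash contents and the list of existing heartbeat keys have already been fetched (for example with HGETALL and one EXISTS per cache). `summarizeRegistry` and `CacheMarker` are illustrative names, not package exports.

```typescript
// Key prefix copied from discovery.js above.
const HEARTBEAT_KEY_PREFIX = '__betterdb:heartbeat:';

// Hypothetical summary shape; the real marker carries more fields.
interface CacheMarker {
  name: string;
  type: string;
  version: string;
  alive: boolean;
}

// registry: field -> JSON string, as stored under __betterdb:caches.
// liveHeartbeatKeys: heartbeat keys that currently exist (not yet expired).
function summarizeRegistry(
  registry: Record<string, string>,
  liveHeartbeatKeys: readonly string[],
): CacheMarker[] {
  const live = new Set(liveHeartbeatKeys);
  const markers: CacheMarker[] = [];
  for (const [name, json] of Object.entries(registry)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(json);
    } catch {
      continue; // Skip malformed markers, mirroring checkCollision's tolerance.
    }
    if (typeof parsed !== 'object' || parsed === null) continue;
    const marker = parsed as { type?: unknown; version?: unknown };
    markers.push({
      name,
      type: String(marker.type ?? 'unknown'),
      version: String(marker.version ?? 'unknown'),
      alive: live.has(`${HEARTBEAT_KEY_PREFIX}${name}`),
    });
  }
  return markers;
}
```

A marker whose heartbeat key is missing is reported with `alive: false` rather than dropped, because `stop()` only deletes the heartbeat key and leaves the registry entry in place.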
package/dist/index.d.ts
CHANGED
|
@@ -1,8 +1,10 @@
|
|
|
1
1
|
export { SemanticCache } from './SemanticCache';
|
|
2
2
|
export type { ThresholdEffectivenessResult } from './SemanticCache';
|
|
3
3
|
export { DEFAULT_COST_TABLE } from './defaultCostTable';
|
|
4
|
-
export type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult, CacheConfidence, EmbedFn, ModelCost, RerankOptions, } from './types';
|
|
4
|
+
export type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult, CacheConfidence, EmbedFn, ModelCost, RerankOptions, JudgeOptions, ConfigRefreshOptions, } from './types';
|
|
5
5
|
export { SemanticCacheUsageError, EmbeddingError, ValkeyCommandError, } from './errors';
|
|
6
6
|
export type { ContentBlock, TextBlock, BinaryBlock, ToolCallBlock, ToolResultBlock, ReasoningBlock, BlockHints, } from './utils';
|
|
7
|
+
export { escapeTag } from './utils';
|
|
7
8
|
export type { BinaryRef, BinaryNormalizer, NormalizerConfig } from './normalizer';
|
|
8
9
|
export { hashBase64, hashBytes, hashUrl, fetchAndHash, passthrough, composeNormalizer, defaultNormalizer, } from './normalizer';
|
|
10
|
+
export type { DiscoveryOptions } from './discovery';
|
package/dist/index.js
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
"use strict";
|
|
2
2
|
Object.defineProperty(exports, "__esModule", { value: true });
|
|
3
|
-
exports.defaultNormalizer = exports.composeNormalizer = exports.passthrough = exports.fetchAndHash = exports.hashUrl = exports.hashBytes = exports.hashBase64 = exports.ValkeyCommandError = exports.EmbeddingError = exports.SemanticCacheUsageError = exports.DEFAULT_COST_TABLE = exports.SemanticCache = void 0;
|
|
3
|
+
exports.defaultNormalizer = exports.composeNormalizer = exports.passthrough = exports.fetchAndHash = exports.hashUrl = exports.hashBytes = exports.hashBase64 = exports.escapeTag = exports.ValkeyCommandError = exports.EmbeddingError = exports.SemanticCacheUsageError = exports.DEFAULT_COST_TABLE = exports.SemanticCache = void 0;
|
|
4
4
|
var SemanticCache_1 = require("./SemanticCache");
|
|
5
5
|
Object.defineProperty(exports, "SemanticCache", { enumerable: true, get: function () { return SemanticCache_1.SemanticCache; } });
|
|
6
6
|
var defaultCostTable_1 = require("./defaultCostTable");
|
|
@@ -9,6 +9,8 @@ var errors_1 = require("./errors");
|
|
|
9
9
|
Object.defineProperty(exports, "SemanticCacheUsageError", { enumerable: true, get: function () { return errors_1.SemanticCacheUsageError; } });
|
|
10
10
|
Object.defineProperty(exports, "EmbeddingError", { enumerable: true, get: function () { return errors_1.EmbeddingError; } });
|
|
11
11
|
Object.defineProperty(exports, "ValkeyCommandError", { enumerable: true, get: function () { return errors_1.ValkeyCommandError; } });
|
|
12
|
+
var utils_1 = require("./utils");
|
|
13
|
+
Object.defineProperty(exports, "escapeTag", { enumerable: true, get: function () { return utils_1.escapeTag; } });
|
|
12
14
|
var normalizer_1 = require("./normalizer");
|
|
13
15
|
Object.defineProperty(exports, "hashBase64", { enumerable: true, get: function () { return normalizer_1.hashBase64; } });
|
|
14
16
|
Object.defineProperty(exports, "hashBytes", { enumerable: true, get: function () { return normalizer_1.hashBytes; } });
|
package/dist/telemetry.d.ts
CHANGED
|
@@ -13,6 +13,10 @@ interface CacheMetrics {
|
|
|
13
13
|
costSavedTotal: Counter;
|
|
14
14
|
embeddingCacheTotal: Counter;
|
|
15
15
|
staleModelEvictions: Counter;
|
|
16
|
+
discoveryWriteFailed: Counter;
|
|
17
|
+
configRefreshFailed: Counter;
|
|
18
|
+
judgeDecisions: Counter;
|
|
19
|
+
judgeDuration: Histogram;
|
|
16
20
|
}
|
|
17
21
|
export interface Telemetry {
|
|
18
22
|
tracer: Tracer;
|
package/dist/telemetry.js
CHANGED
|
@@ -57,6 +57,27 @@ function createTelemetry(opts) {
|
|
|
57
57
|
help: 'Entries evicted due to staleAfterModelChange detection',
|
|
58
58
|
labelNames: ['cache_name'],
|
|
59
59
|
});
|
|
60
|
+
const discoveryWriteFailed = getOrCreateCounter(registry, {
|
|
61
|
+
name: `${opts.prefix}_discovery_write_failed_total`,
|
|
62
|
+
help: 'Count of failed discovery-marker writes (best-effort HGET/HSET/SET operations against __betterdb:* keys)',
|
|
63
|
+
labelNames: ['cache_name'],
|
|
64
|
+
});
|
|
65
|
+
const configRefreshFailed = getOrCreateCounter(registry, {
|
|
66
|
+
name: `${opts.prefix}_config_refresh_failed_total`,
|
|
67
|
+
help: 'Count of failed periodic config refreshes (HGETALL on __config).',
|
|
68
|
+
labelNames: ['cache_name'],
|
|
69
|
+
});
|
|
70
|
+
const judgeDecisions = getOrCreateCounter(registry, {
|
|
71
|
+
name: `${opts.prefix}_judge_decisions_total`,
|
|
72
|
+
help: 'LLM-as-judge decisions for borderline cache hits',
|
|
73
|
+
labelNames: ['cache_name', 'category', 'decision'],
|
|
74
|
+
});
|
|
75
|
+
const judgeDuration = getOrCreateHistogram(registry, {
|
|
76
|
+
name: `${opts.prefix}_judge_duration_seconds`,
|
|
77
|
+
help: 'Wall-clock duration of judgeFn invocations',
|
|
78
|
+
labelNames: ['cache_name', 'category', 'decision'],
|
|
79
|
+
buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5],
|
|
80
|
+
});
|
|
60
81
|
return {
|
|
61
82
|
tracer,
|
|
62
83
|
metrics: {
|
|
@@ -67,6 +88,10 @@ function createTelemetry(opts) {
|
|
|
67
88
|
costSavedTotal,
|
|
68
89
|
embeddingCacheTotal,
|
|
69
90
|
staleModelEvictions,
|
|
91
|
+
discoveryWriteFailed,
|
|
92
|
+
configRefreshFailed,
|
|
93
|
+
judgeDecisions,
|
|
94
|
+
judgeDuration,
|
|
70
95
|
},
|
|
71
96
|
};
|
|
72
97
|
}
|
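Reader's note on the `judgeDuration` buckets above: Prometheus histograms are cumulative, so each observation is counted in every bucket whose upper bound is greater than or equal to the observed value, plus the implicit `+Inf` bucket. The sketch below only illustrates how the bounds `[0.05, 0.1, 0.25, 0.5, 1, 2, 5]` would partition a set of judge latencies. `cumulativeBucketCounts` is an illustrative helper and does not use or reproduce prom-client.

```typescript
// Bucket upper bounds (seconds) copied from telemetry.js above.
const JUDGE_BUCKETS = [0.05, 0.1, 0.25, 0.5, 1, 2, 5];

// Returns cumulative counts keyed by upper bound, the way a Prometheus
// histogram exposes its `le` buckets.
function cumulativeBucketCounts(
  durationsSeconds: readonly number[],
  buckets: readonly number[] = JUDGE_BUCKETS,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const upperBound of buckets) {
    counts[String(upperBound)] = durationsSeconds.filter((d) => d <= upperBound).length;
  }
  counts['+Inf'] = durationsSeconds.length;
  return counts;
}

// Example: two fast verdicts, one slow verdict, one call near the 2 s default timeout.
const example = cumulativeBucketCounts([0.04, 0.2, 1.4, 2.0]);
console.log(example);
```

With the documented 2000 ms default for `timeoutMs`, calls that hit the timeout would be expected to fall in or just above the `2` bucket, which may be a reason the bounds extend to `5`.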
package/dist/types.d.ts
CHANGED
|
@@ -1,6 +1,13 @@
|
|
|
1
1
|
import type Valkey from 'iovalkey';
|
|
2
2
|
import type { Registry } from 'prom-client';
|
|
3
|
+
import type { DiscoveryOptions } from './discovery';
|
|
3
4
|
export type { Valkey };
|
|
5
|
+
export interface ConfigRefreshOptions {
|
|
6
|
+
/** Enable periodic config refresh from Valkey. Default: true. */
|
|
7
|
+
enabled?: boolean;
|
|
8
|
+
/** Refresh interval in milliseconds. Default: 30000. Minimum: 1000. */
|
|
9
|
+
intervalMs?: number;
|
|
10
|
+
}
|
|
4
11
|
export type EmbedFn = (text: string) => Promise<number[]>;
|
|
5
12
|
export interface ModelCost {
|
|
6
13
|
inputPer1k: number;
|
|
@@ -92,6 +99,20 @@ export interface SemanticCacheOptions {
|
|
|
92
99
|
/** Interval in ms for periodic stats snapshots. Default: 300_000 (5 min). 0 to disable. */
|
|
93
100
|
statsIntervalMs?: number;
|
|
94
101
|
};
|
|
102
|
+
/**
|
|
103
|
+
* Discovery-marker protocol controls. See
|
|
104
|
+
* docs/plans/specs/spec-semantic-cache-discovery-markers.md.
|
|
105
|
+
* Defaults: enabled=true, heartbeatIntervalMs=30000, includeCategories=true.
|
|
106
|
+
*/
|
|
107
|
+
discovery?: DiscoveryOptions;
|
|
108
|
+
/**
|
|
109
|
+
* Periodic refresh of in-memory threshold config from Valkey.
|
|
110
|
+
* When enabled, the cache re-reads `{name}:__config` on the configured
|
|
111
|
+
* interval. Field `threshold` updates `defaultThreshold`; fields named
|
|
112
|
+
* `threshold:{category}` update `categoryThresholds[category]`.
|
|
113
|
+
* Defaults: enabled=true, intervalMs=30000.
|
|
114
|
+
*/
|
|
115
|
+
configRefresh?: ConfigRefreshOptions;
|
|
95
116
|
}
|
|
96
117
|
export interface RerankOptions {
|
|
97
118
|
/**
|
|
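Reader's note on `configRefresh` above: the doc comment says field `threshold` updates `defaultThreshold` and fields named `threshold:{category}` update `categoryThresholds[category]`. A minimal sketch of that mapping follows, assuming the hash has already been read into a plain object (for example via HGETALL on `{name}:__config`). The validation rule (finite number within the cosine-distance range 0 to 2) is an assumption borrowed from the `threshold` option's doc comment. How the library actually validates values is not visible in this part of the diff, and `applyConfigFields` is a hypothetical name.

```typescript
interface ThresholdConfig {
  defaultThreshold: number;
  categoryThresholds: Record<string, number>;
}

const CATEGORY_FIELD_PREFIX = 'threshold:';

// Returns a new config; the input object is left untouched.
function applyConfigFields(
  current: ThresholdConfig,
  fields: Record<string, string>,
): ThresholdConfig {
  const next: ThresholdConfig = {
    defaultThreshold: current.defaultThreshold,
    categoryThresholds: { ...current.categoryThresholds },
  };
  for (const [field, raw] of Object.entries(fields)) {
    const value = Number(raw);
    // Assumption: ignore empty, non-numeric, or out-of-range values.
    if (raw.trim() === '' || !Number.isFinite(value) || value < 0 || value > 2) continue;
    if (field === 'threshold') {
      next.defaultThreshold = value;
    } else if (
      field.startsWith(CATEGORY_FIELD_PREFIX) &&
      field.length > CATEGORY_FIELD_PREFIX.length
    ) {
      next.categoryThresholds[field.slice(CATEGORY_FIELD_PREFIX.length)] = value;
    }
    // Any other field is left for other consumers of the hash.
  }
  return next;
}
```

Returning a fresh object instead of mutating in place keeps a failed or partial refresh from leaving the in-memory config half-updated, which seems in the spirit of the `configRefreshFailed` counter added in telemetry.js.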
@@ -108,6 +129,61 @@ export interface RerankOptions {
|
|
|
108
129
|
similarity: number;
|
|
109
130
|
}>) => Promise<number>;
|
|
110
131
|
}
|
|
132
|
+
/**
|
|
133
|
+
* LLM-as-judge adjudication for borderline cache hits.
|
|
134
|
+
*
|
|
135
|
+
* When set on CacheCheckOptions, a hit whose cosine distance lands in the
|
|
136
|
+
* uncertainty band (threshold - uncertaintyBand < score <= threshold) is
|
|
137
|
+
* passed to judgeFn before being returned. The judge accepts (promotes the
|
|
138
|
+
* hit to confidence: 'high') or rejects (treats it as a miss with
|
|
139
|
+
* nearestMiss populated).
|
|
140
|
+
*
|
|
141
|
+
* The judge is NOT invoked for:
|
|
142
|
+
* - high-confidence hits (score <= threshold - uncertaintyBand)
|
|
143
|
+
* - misses (score > threshold)
|
|
144
|
+
* - the no-candidates case (FT.SEARCH returned zero rows)
|
|
145
|
+
*
|
|
146
|
+
* When rerank is also set, the judge runs on the reranked pick, not the
|
|
147
|
+
* original top-1.
|
|
148
|
+
*/
|
|
149
|
+
export interface JudgeOptions {
|
|
150
|
+
/**
|
|
151
|
+
* Function that decides whether a borderline cache hit is acceptable.
|
|
152
|
+
* Return true to accept (caller receives confidence: 'high').
|
|
153
|
+
* Return false to reject (caller receives a miss with nearestMiss).
|
|
154
|
+
*
|
|
155
|
+
* The function receives the original prompt text (or the resolved text
|
|
156
|
+
* portion of a multipart prompt), the cached response, the cosine distance,
|
|
157
|
+
* the effective threshold, and the category if one was supplied to check().
|
|
158
|
+
*/
|
|
159
|
+
judgeFn: (input: {
|
|
160
|
+
prompt: string;
|
|
161
|
+
response: string;
|
|
162
|
+
similarity: number;
|
|
163
|
+
threshold: number;
|
|
164
|
+
category: string | undefined;
|
|
165
|
+
}) => Promise<boolean>;
|
|
166
|
+
/**
|
|
167
|
+
* Behavior when judgeFn throws or exceeds timeoutMs.
|
|
168
|
+
* 'accept' - return the cached response with confidence: 'uncertain'
|
|
169
|
+
* (current pre-judge behavior, fail-open).
|
|
170
|
+
* 'reject' - treat as a miss (fail-closed).
|
|
171
|
+
* Default: 'accept'.
|
|
172
|
+
*/
|
|
173
|
+
onError?: 'accept' | 'reject';
|
|
174
|
+
/**
|
|
175
|
+
* Per-call timeout in milliseconds. Default: 2000.
|
|
176
|
+
* The judge function is raced against this timeout; timeout is treated
|
|
177
|
+
* the same as a thrown error and routed through onError.
|
|
178
|
+
*
|
|
179
|
+
* Note: the underlying promise is not cancelled on timeout — JavaScript has
|
|
180
|
+
* no built-in cancellation primitive. A real LLM HTTP request will continue
|
|
181
|
+
* running in the background after the timeout fires, consuming API quota.
|
|
182
|
+
* To stop the underlying request, use an AbortController inside judgeFn and
|
|
183
|
+
* abort it when the signal you manage fires.
|
|
184
|
+
*/
|
|
185
|
+
timeoutMs?: number;
|
|
186
|
+
}
|
|
111
187
|
export interface CacheCheckOptions {
|
|
112
188
|
/** Per-request threshold override (cosine distance 0-2). Highest priority. */
|
|
113
189
|
threshold?: number;
|
|
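Reader's note on `JudgeOptions` above: the doc comment defines three score bands and says the judge runs only in the middle one. The sketch below restates those rules as code, together with the documented accept, reject, and `onError` outcomes. `classifyScore`, `applyJudgeVerdict`, and the `Decision` shape are illustrative stand-ins, not the package's internals.

```typescript
type Band = 'high' | 'uncertain' | 'miss';
type Verdict = boolean | 'error'; // 'error' stands for a throw or a timeout

interface Decision {
  outcome: 'hit' | 'miss';
  confidence?: 'high' | 'uncertain';
}

// score is a cosine distance: lower means more similar.
function classifyScore(score: number, threshold: number, uncertaintyBand: number): Band {
  if (score > threshold) return 'miss';
  if (score <= threshold - uncertaintyBand) return 'high';
  return 'uncertain'; // threshold - uncertaintyBand < score <= threshold
}

// The judge is consulted only for borderline hits.
function shouldInvokeJudge(band: Band): boolean {
  return band === 'uncertain';
}

// Maps a judge verdict to the result the caller sees, per the doc comments.
function applyJudgeVerdict(verdict: Verdict, onError: 'accept' | 'reject' = 'accept'): Decision {
  if (verdict === true) return { outcome: 'hit', confidence: 'high' };
  if (verdict === false) return { outcome: 'miss' };
  // Throw or timeout: fail-open keeps the pre-judge 'uncertain' hit.
  return onError === 'accept'
    ? { outcome: 'hit', confidence: 'uncertain' }
    : { outcome: 'miss' };
}
```

For instance, with an illustrative threshold of 0.15 and band of 0.05, a score of 0.12 lands in the uncertain band and goes to the judge, while 0.05 is returned as a high-confidence hit without any judge call.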
@@ -146,6 +222,11 @@ export interface CacheCheckOptions {
|
|
|
146
222
|
* in rerankFn yourself.
|
|
147
223
|
*/
|
|
148
224
|
rerank?: RerankOptions;
|
|
225
|
+
/**
|
|
226
|
+
* Optional LLM-as-judge adjudication for borderline hits.
|
|
227
|
+
* See JudgeOptions. Ignored on checkBatch() - call check() per prompt instead.
|
|
228
|
+
*/
|
|
229
|
+
judge?: JudgeOptions;
|
|
149
230
|
}
|
|
150
231
|
export interface CacheStoreOptions {
|
|
151
232
|
/** Per-entry TTL in seconds. Overrides SemanticCacheOptions.defaultTtl. */
|
|
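Reader's note on `timeoutMs` above: the doc comment warns that the raced promise is not cancelled and suggests managing an AbortController inside `judgeFn`. One way to do that is a small wrapper that produces a function with the documented `judgeFn` signature. This is a sketch under assumptions: `abortableJudge`, `budgetMs`, and `ask` are hypothetical names, and `ask` is expected to pass the signal on to whatever HTTP client it uses and to reject once the signal aborts.

```typescript
// Input shape copied from JudgeOptions.judgeFn in types.d.ts.
type JudgeInput = {
  prompt: string;
  response: string;
  similarity: number;
  threshold: number;
  category: string | undefined;
};

// Wraps a signal-aware judge so the underlying request is aborted after
// budgetMs instead of running on in the background.
function abortableJudge(
  budgetMs: number,
  ask: (input: JudgeInput, signal: AbortSignal) => Promise<boolean>,
): (input: JudgeInput) => Promise<boolean> {
  return async (input) => {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), budgetMs);
    try {
      return await ask(input, controller.signal);
    } finally {
      clearTimeout(timer); // Always release the timer, on success or failure.
    }
  };
}
```

Choosing `budgetMs` at or slightly below the `timeoutMs` passed to the cache means the request is torn down at about the same moment the cache gives up on it. The rejection produced by the abort would then presumably be routed through `onError` like any other thrown error.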
@@ -202,10 +283,19 @@ export interface CacheCheckResult {
|
|
|
202
283
|
/**
|
|
203
284
|
* On a miss where a candidate existed but didn't clear the threshold,
|
|
204
285
|
* describes how close it was. Useful for threshold tuning.
|
|
286
|
+
*
|
|
287
|
+
* Note: when the miss originates from a judge rejection, `deltaToThreshold`
|
|
288
|
+
* will be <= 0 because the score did clear the threshold — the judge said no.
|
|
289
|
+
* Existing non-judge misses always produce deltaToThreshold > 0.
|
|
290
|
+
* Use `deltaToThreshold <= 0` to detect judge-originated misses.
|
|
205
291
|
*/
|
|
206
292
|
nearestMiss?: {
|
|
207
293
|
similarity: number;
|
|
208
294
|
deltaToThreshold: number;
|
|
295
|
+
/** The effective threshold that was applied. Present on judge-rejection misses. */
|
|
296
|
+
threshold?: number;
|
|
297
|
+
/** The Valkey key of the entry that was rejected. Present on judge-rejection misses. */
|
|
298
|
+
matchedKey?: string;
|
|
209
299
|
};
|
|
210
300
|
/**
|
|
211
301
|
* Estimated cost saved (in dollars) by returning this cached result instead of calling the LLM.
|
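Reader's note on `nearestMiss` above: the updated doc comment says judge-originated misses carry `deltaToThreshold <= 0`, while ordinary misses always carry `deltaToThreshold > 0`. A minimal sketch of a caller-side check follows, assuming the caller already knows the result is a miss. `classifyMiss` and `MissKind` are illustrative names, and the `NearestMiss` interface is a local copy of the fields shown in the diff rather than an import from the package.

```typescript
// Local copy of the nearestMiss shape from types.d.ts.
interface NearestMiss {
  similarity: number;
  deltaToThreshold: number;
  threshold?: number; // Present on judge-rejection misses
  matchedKey?: string; // Present on judge-rejection misses
}

type MissKind = 'no_candidate' | 'over_threshold' | 'judge_rejected';

// Call only on results already known to be misses.
function classifyMiss(nearestMiss: NearestMiss | undefined): MissKind {
  if (nearestMiss === undefined) return 'no_candidate';
  // The score cleared the threshold, so the judge must have said no.
  if (nearestMiss.deltaToThreshold <= 0) return 'judge_rejected';
  return 'over_threshold';
}
```

Separating the two kinds matters for threshold tuning: an `over_threshold` miss suggests the threshold may be too strict, whereas a `judge_rejected` miss points at a candidate entry (see `matchedKey`) that was close in embedding space but judged wrong for the prompt.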
package/package.json
CHANGED