@betterdb/semantic-cache 0.2.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -96,6 +96,54 @@ A lookup is a **hit** when `score <= threshold`. The default threshold is `0.1`.
  | Conversational / RAG | `0.15` | Paraphrases hit as `high` confidence |
  | Broad search / recall | `0.20` | High hit rate, review uncertain hits |
 
+ ## LLM-as-judge
+
+ When a hit lands in the uncertainty band (`threshold - uncertaintyBand < score <= threshold`), you can supply a `judgeFn` to adjudicate automatically instead of handling `confidence: 'uncertain'` yourself.
+
+ ```typescript
+ const result = await cache.check(userPrompt, {
+ judge: {
+ judgeFn: async ({ prompt, response, similarity, threshold, category }) => {
+ // Return true to accept (confidence → 'high')
+ // Return false to reject (treated as miss with nearestMiss)
+ const verdict = await openai.chat.completions.create({
+ model: 'gpt-5-mini',
+ messages: [
+ { role: 'system', content: 'Reply YES or NO only.' },
+ { role: 'user', content: `Does this cached response correctly answer the prompt?\nPrompt: ${prompt}\nResponse: ${response}` },
+ ],
+ });
+ return verdict.choices[0].message.content?.startsWith('YES') ?? false;
+ },
+ onError: 'accept', // fail-open on judge errors (default)
+ timeoutMs: 2000, // per-call timeout (default)
+ },
+ });
+ ```
+
+ **When the judge is invoked:** only for `confidence === 'uncertain'` hits. High-confidence hits, misses, and the zero-candidates case bypass the judge entirely.
+
+ **Accept path:** `result.hit === true`, `result.confidence === 'high'`.
+
+ **Reject path:** `result.hit === false`, `result.nearestMiss` populated with `deltaToThreshold <= 0` (use this to distinguish judge rejections from regular misses where `deltaToThreshold > 0`).
+
+ **Composing with rerank:** when both `rerank` and `judge` are set, the judge receives the reranked pick's response and similarity score.
+
+ **`checkBatch()` does not support `judge`.** Call `check()` individually for prompts that need adjudication.
+
+ ### CacheCheckOptions reference
+
+ | Option | Type | Default | Description |
+ |---|---|---|---|
+ | `threshold` | `number` | `defaultThreshold` | Per-request cosine distance threshold override |
+ | `category` | `string` | — | Category tag for per-category thresholds and metric labels |
+ | `filter` | `string` | — | FT.SEARCH pre-filter expression (trusted input only) |
+ | `k` | `number` | `1` | KNN neighbours to fetch (ignored when `rerank` is set) |
+ | `staleAfterModelChange` | `boolean` | `false` | Evict and miss when stored model differs from `currentModel` |
+ | `currentModel` | `string` | — | Model to compare against stored entries |
+ | `rerank` | `RerankOptions` | — | Rerank hook; see `RerankOptions` |
+ | `judge` | `JudgeOptions` | — | LLM-as-judge for borderline hits; see `JudgeOptions`. Not supported by `checkBatch()`; throws `SemanticCacheUsageError` |
+
  ## Configuration Reference
 
  | Option | Type | Default | Description |
@@ -161,6 +209,24 @@ Cost savings scale with the model. Observed values from live examples:
  | `@betterdb/semantic-cache/embed/cohere` | `embed-english-v3.0` | 1024 |
  | `@betterdb/semantic-cache/embed/ollama` | `nomic-embed-text` | 768 |
 
+ ### Discovery markers
+
+ Starting in `0.2.0`, `initialize()` writes a small advisory record to a shared `__betterdb:caches` hash on the Valkey instance so Monitor (and other tooling) can enumerate caches without configuration. A 60s-TTL heartbeat key is refreshed every 30s; `flush()` and `dispose()` remove the heartbeat immediately. No sensitive data is ever written — only cache metadata (type, prefix, version, capabilities, configured thresholds).
+
+ Opt out by passing `discovery: { enabled: false }`. See `SemanticCacheOptions.discovery` for the full set of knobs.
+
+ If your Valkey runs with ACLs, grant the library's user access to the `__betterdb:*` prefix:
+
+ ```
+ ACL SETUSER <user> +@write +@read ~__betterdb:* ~<your-cache-prefix>:*
+ ```
+
+ Discovery writes are best-effort — if the ACL denies them, the cache still functions and the `semantic_cache_discovery_write_failed_total` counter increments so operators can alert.
+
+ ### `cache.dispose()`
+
+ Graceful shutdown: stops the heartbeat and deletes this instance's heartbeat key so Monitor marks the cache offline immediately. Does not drop the index or delete entries. Call from your SIGTERM handler alongside `client.quit()`.
+
  ## API
 
  ### `cache.initialize()`
@@ -215,7 +281,15 @@ Returns `{ name, numDocs, dimension, indexingState }`.
 
  ### `cache.flush()`
 
- Drops the index and all keys. Call `initialize()` again to rebuild.
+ Drops the index and all entries. Call `initialize()` again to rebuild. Also stops the discovery heartbeat and deletes its heartbeat key, but preserves the registry entry in `__betterdb:caches` so Monitor retains history.
+
+ ### `cache.shutdown()`
+
+ Stops the analytics client, cancels the stats snapshot timer, and disposes the discovery heartbeat. Safe to call multiple times.
+
+ ### `cache.dispose()`
+
+ Graceful shutdown of the discovery layer for in-process caches without destroying data. Stops the discovery heartbeat and deletes the heartbeat key; does not touch the index or entries.
 
  ### `cache.thresholdEffectiveness(options?)`
 
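The README asks for `dispose()` to be called from a SIGTERM handler "alongside `client.quit()`" without fixing an order. Because `dispose()` (which `shutdown()` also calls, per the `.js` changes) deletes the heartbeat key through the same Valkey client, quitting the client first would most likely make that `DEL` fail silently, leaving the cache looking live until the 60 s TTL expires. A minimal sketch of the safer ordering, assuming structural stand-ins (`ShutdownableCache`, `QuittableClient`) instead of the package's real types; `gracefulShutdown` is a hypothetical helper.

```typescript
// Structural stand-ins: the real SemanticCache exposes shutdown()/dispose(),
// and an iovalkey/ioredis-style client exposes quit().
interface ShutdownableCache {
  shutdown(): Promise<void>;
}

interface QuittableClient {
  quit(): Promise<unknown>;
}

// Hypothetical helper: stop the cache first so its heartbeat DEL still has a
// live connection, then close the client. Both steps always run; the first
// error (if any) is rethrown afterwards.
async function gracefulShutdown(cache: ShutdownableCache, client: QuittableClient): Promise<void> {
  const errors: unknown[] = [];
  try {
    await cache.shutdown();
  } catch (err) {
    errors.push(err);
  }
  try {
    await client.quit();
  } catch (err) {
    errors.push(err);
  }
  if (errors.length > 0) {
    throw errors[0];
  }
}
```

In a Node service this would typically be wired as `process.once('SIGTERM', () => void gracefulShutdown(cache, client))`.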
package/dist/SemanticCache.d.ts CHANGED
@@ -8,15 +8,22 @@ export declare class SemanticCache {
  private readonly entryPrefix;
  private readonly statsKey;
  private readonly similarityWindowKey;
- private readonly defaultThreshold;
+ private readonly configKey;
+ private defaultThreshold;
  private readonly defaultTtl;
- private readonly categoryThresholds;
+ private categoryThresholds;
  private readonly uncertaintyBand;
  private readonly telemetry;
  private readonly costTable;
  private readonly embeddingCacheEnabled;
  private readonly embeddingCacheTtl;
  private readonly embedKeyPrefix;
+ private readonly discoveryOptions;
+ private readonly _initialDefaultThreshold;
+ private readonly _initialCategoryThresholds;
+ private readonly configRefreshOptions;
+ private configRefreshTimer;
+ private discovery;
  private _initialized;
  private _dimension;
  private _hasBinaryRefs;
@@ -40,8 +47,18 @@ export declare class SemanticCache {
  constructor(options: SemanticCacheOptions);
  initialize(): Promise<void>;
  flush(): Promise<void>;
- /** Shut down the analytics client and cancel the stats timer. */
+ /**
+ * Shut down the analytics client, cancel the stats timer, and stop the
+ * discovery heartbeat. Safe to call multiple times.
+ */
  shutdown(): Promise<void>;
+ /**
+ * Graceful shutdown of the discovery layer — stops the heartbeat and
+ * deletes this instance's heartbeat key so Monitor marks the cache offline
+ * immediately. Does NOT touch the registry hash, the FT index, or any
+ * entries. Safe to call multiple times.
+ */
+ dispose(): Promise<void>;
  check(prompt: string | ContentBlock[], options?: CacheCheckOptions): Promise<CacheCheckResult>;
  store(prompt: string | ContentBlock[], response: string, options?: CacheStoreOptions): Promise<string>;
  /**
@@ -82,8 +99,29 @@ export declare class SemanticCache {
  thresholdEffectivenessAll(options?: {
  minSamples?: number;
  }): Promise<ThresholdEffectivenessResult[]>;
+ /**
+ * Refresh threshold config from Valkey. Returns true on a successful HGETALL,
+ * false if the call threw.
+ *
+ * Field semantics:
+ * - "threshold" -> updates defaultThreshold
+ * - "threshold:{category}" -> updates categoryThresholds[category]
+ * - "threshold:" (empty) -> ignored
+ * - non-numeric values -> ignored
+ * - out-of-range values -> ignored (must be 0 <= x <= 2)
+ *
+ * Categories present in memory but absent from the hash fall back to their
+ * constructor values (or are removed if no constructor override existed).
+ * The default threshold likewise falls back to its constructor value if
+ * `threshold` is absent from the hash.
+ */
+ refreshConfig(): Promise<boolean>;
  /** @internal Default similarity threshold. */
  get _defaultThreshold(): number;
+ /** @internal Test-only getter. */
+ get _categoryThresholds(): Readonly<Record<string, number>>;
+ /** @internal Test-only getter. */
+ get _configRefreshIntervalMs(): number;
  /**
  * Execute a stable FT.SEARCH for use by adapters (e.g. LangGraph).
  * SORTBY inserted_at ASC gives stable ordering across paginated calls.
@@ -98,7 +136,9 @@ export declare class SemanticCache {
  vector: number[];
  durationSec: number;
  }>;
+ private startConfigRefresh;
  private _doInitialize;
+ private registerDiscovery;
  private initAnalyticsSafe;
  private captureStatsSnapshot;
  private ensureIndexAndGetDimension;
package/dist/SemanticCache.js CHANGED
@@ -10,7 +10,9 @@ const utils_1 = require("./utils");
  const defaultCostTable_1 = require("./defaultCostTable");
  const cluster_1 = require("./cluster");
  const analytics_1 = require("./analytics");
+ const discovery_1 = require("./discovery");
  const INVALIDATE_BATCH_SIZE = 1000;
+ const PACKAGE_VERSION = require('../package.json').version;
  function errMsg(err) {
  return err instanceof Error ? err.message : String(err);
  }
@@ -22,6 +24,7 @@ class SemanticCache {
  entryPrefix;
  statsKey;
  similarityWindowKey;
+ configKey;
  defaultThreshold;
  defaultTtl;
  categoryThresholds;
@@ -31,6 +34,12 @@ class SemanticCache {
  embeddingCacheEnabled;
  embeddingCacheTtl;
  embedKeyPrefix;
+ discoveryOptions;
+ _initialDefaultThreshold;
+ _initialCategoryThresholds;
+ configRefreshOptions;
+ configRefreshTimer;
+ discovery = null;
  _initialized = false;
  _dimension = 0;
  _hasBinaryRefs = false;
@@ -59,6 +68,7 @@ class SemanticCache {
  this.entryPrefix = `${this.name}:entry:`;
  this.statsKey = `${this.name}:__stats`;
  this.similarityWindowKey = `${this.name}:__similarity_window`;
+ this.configKey = `${this.name}:__config`;
  this.embedKeyPrefix = `${this.name}:embed:`;
  this.defaultThreshold = options.defaultThreshold ?? 0.1;
  this.defaultTtl = options.defaultTtl;
@@ -85,6 +95,16 @@ class SemanticCache {
  });
  this.analyticsOpts = options.analytics;
  this.usesDefaultCostTable = useDefault;
+ this.discoveryOptions = options.discovery ?? {};
+ // Capture constructor values as fallback when __config fields are absent
+ this._initialDefaultThreshold = this.defaultThreshold;
+ this._initialCategoryThresholds = { ...this.categoryThresholds };
+ // Refresh options
+ const refresh = options.configRefresh ?? {};
+ this.configRefreshOptions = {
+ enabled: refresh.enabled ?? true,
+ intervalMs: Math.max(1000, refresh.intervalMs ?? 30_000),
+ };
  }
  // -- Lifecycle --
  async initialize() {
@@ -102,6 +122,14 @@ class SemanticCache {
  this._initialized = false;
  this._initPromise = null;
  this._initGeneration++;
+ // Capture and null the discovery ref synchronously, before any await,
+ // so a concurrent _doInitialize() (started after _initGeneration++) can't
+ // race in and have its new manager overwritten by this flush.
+ const discoveryToStop = this.discovery;
+ this.discovery = null;
+ if (discoveryToStop) {
+ await discoveryToStop.stop({ deleteHeartbeat: true });
+ }
  // Valkey Search 1.2 does not support the DD (Delete Documents) flag on
  // FT.DROPINDEX. Drop the index first, then clean up keys separately.
  try {
@@ -126,14 +154,41 @@ class SemanticCache {
  await this.client.del(this.similarityWindowKey);
  this.analytics.capture('cache_flush');
  }
- /** Shut down the analytics client and cancel the stats timer. */
+ /**
+ * Shut down the analytics client, cancel the stats timer, and stop the
+ * discovery heartbeat. Safe to call multiple times.
+ */
  async shutdown() {
  this.shutdownCalled = true;
+ if (this.configRefreshTimer) {
+ clearInterval(this.configRefreshTimer);
+ this.configRefreshTimer = undefined;
+ }
  if (this.statsTimer) {
  clearInterval(this.statsTimer);
  this.statsTimer = undefined;
  }
  await this.analytics.shutdown();
+ await this.dispose();
+ }
+ /**
+ * Graceful shutdown of the discovery layer — stops the heartbeat and
+ * deletes this instance's heartbeat key so Monitor marks the cache offline
+ * immediately. Does NOT touch the registry hash, the FT index, or any
+ * entries. Safe to call multiple times.
+ */
+ async dispose() {
+ if (this.configRefreshTimer) {
+ clearInterval(this.configRefreshTimer);
+ this.configRefreshTimer = undefined;
+ }
+ if (this._initPromise) {
+ await this._initPromise.catch(() => { });
+ }
+ if (this.discovery) {
+ await this.discovery.stop({ deleteHeartbeat: true });
+ this.discovery = null;
+ }
  }
  // -- Public operations --
  async check(prompt, options) {
@@ -259,14 +314,85 @@ class SemanticCache {
  return { hit: false, confidence: 'miss' };
  }
  }
- // All checks passed — record as a genuine hit
+ // All checks passed — compute confidence (recordSimilarityWindow moves to after judge)
+ let confidence = winnerScore >= threshold - this.uncertaintyBand ? 'uncertain' : 'high';
+ const matchedKey = winner.key;
+ // --- LLM-as-judge for borderline hits ---
+ if (options?.judge && confidence === 'uncertain') {
+ const judgeStart = performance.now();
+ const timeoutMs = options.judge.timeoutMs ?? 2000;
+ const onError = options.judge.onError ?? 'accept';
+ let decision;
+ try {
+ const accepted = await raceWithTimeout(options.judge.judgeFn({
+ prompt: promptText,
+ response: winner.fields['response'] ?? '',
+ similarity: winnerScore,
+ threshold,
+ category: category || undefined,
+ }), timeoutMs);
+ decision = accepted ? 'accept' : 'reject';
+ }
+ catch (err) {
+ const isTimeout = err instanceof JudgeTimeoutError;
+ if (onError === 'accept') {
+ decision = isTimeout ? 'timeout_accept' : 'error_accept';
+ }
+ else {
+ decision = isTimeout ? 'timeout_reject' : 'error_reject';
+ }
+ }
+ const judgeSec = (performance.now() - judgeStart) / 1000;
+ this.telemetry.metrics.judgeDecisions
+ .labels({ cache_name: this.name, category: categoryLabel, decision })
+ .inc();
+ this.telemetry.metrics.judgeDuration
+ .labels({ cache_name: this.name, category: categoryLabel, decision })
+ .observe(judgeSec);
+ span.setAttributes({
+ 'cache.judge.invoked': true,
+ 'cache.judge.decision': decision,
+ 'cache.judge.latency_ms': judgeSec * 1000,
+ });
+ if (decision === 'accept') {
+ confidence = 'high';
+ // Fall through to hit-return path
+ }
+ else if (decision === 'error_accept' || decision === 'timeout_accept') {
+ // Preserve 'uncertain'; fall through to hit-return path
+ }
+ else {
+ // reject / error_reject / timeout_reject → treat as miss
+ await this.recordSimilarityWindow(winnerScore, 'miss', category);
+ await this.recordStat('misses');
+ this.telemetry.metrics.requestsTotal
+ .labels({ cache_name: this.name, result: 'miss', category: categoryLabel })
+ .inc();
+ span.setAttributes({
+ 'cache.hit': false,
+ 'cache.name': this.name,
+ 'cache.category': categoryLabel,
+ });
+ return {
+ hit: false,
+ confidence: 'miss',
+ similarity: winnerScore,
+ nearestMiss: {
+ similarity: winnerScore,
+ threshold,
+ deltaToThreshold: winnerScore - threshold,
+ matchedKey,
+ },
+ };
+ }
+ }
+ // --- End judge ---
+ // Record as genuine hit (moved here from before the judge block)
  await this.recordSimilarityWindow(winnerScore, 'hit', category);
- const confidence = winnerScore >= threshold - this.uncertaintyBand ? 'uncertain' : 'high';
  await this.recordStat('hits');
  const metricResult = confidence === 'uncertain' ? 'uncertain_hit' : 'hit';
  this.telemetry.metrics.requestsTotal
  .labels({ cache_name: this.name, result: metricResult, category: categoryLabel }).inc();
- const matchedKey = winner.key;
  if (this.defaultTtl !== undefined && matchedKey) {
  await this.client.expire(matchedKey, this.defaultTtl);
  }
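In the `check()` hunk above, an in-threshold winner is `'uncertain'` when `winnerScore >= threshold - this.uncertaintyBand`, so the band's lower bound is inclusive; the README's `threshold - uncertaintyBand < score <= threshold` differs at exactly that boundary. A judge rejection returns `nearestMiss.deltaToThreshold = winnerScore - threshold`, which is `<= 0` because the winner had already passed the threshold. A minimal caller-side sketch of both rules, assuming a local stand-in type (`CheckResultLike`) rather than the package's exported `CacheCheckResult`; `classifyScore` and `describeOutcome` are hypothetical helpers.

```typescript
type Confidence = 'high' | 'uncertain' | 'miss';

// Local stand-in for the result fields documented in the README.
interface CheckResultLike {
  hit: boolean;
  confidence: Confidence;
  similarity?: number;
  nearestMiss?: {
    similarity: number;
    threshold: number;
    deltaToThreshold: number;
    matchedKey?: string;
  };
}

// Mirrors check(): a hit needs score <= threshold, and the uncertainty band
// includes its lower bound (score >= threshold - band).
function classifyScore(score: number, threshold: number, band: number): Confidence {
  if (score > threshold) {
    return 'miss';
  }
  return score >= threshold - band ? 'uncertain' : 'high';
}

type Outcome = 'hit' | 'uncertain_hit' | 'judge_rejected' | 'miss';

// A judge rejection carries nearestMiss.deltaToThreshold <= 0; a regular
// miss either has no nearestMiss or a positive delta.
function describeOutcome(result: CheckResultLike): Outcome {
  if (result.hit) {
    return result.confidence === 'uncertain' ? 'uncertain_hit' : 'hit';
  }
  if (result.nearestMiss !== undefined && result.nearestMiss.deltaToThreshold <= 0) {
    return 'judge_rejected';
  }
  return 'miss';
}
```

A `hit: true` result can still carry `confidence: 'uncertain'` when a judge was configured: the code keeps `'uncertain'` for `error_accept` and `timeout_accept`, so the README's accept path (`confidence === 'high'`) covers only an explicit `true` from `judgeFn`.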
@@ -446,6 +572,9 @@ class SemanticCache {
  if (options?.staleAfterModelChange) {
  throw new errors_1.SemanticCacheUsageError("checkBatch() does not support 'staleAfterModelChange'. Use check() for stale-model eviction.");
  }
+ if (options?.judge) {
+ throw new errors_1.SemanticCacheUsageError("checkBatch() does not support the 'judge' option. Use check() for LLM-as-judge adjudication.");
+ }
  return this.traced('checkBatch', async (span) => {
  // Resolve all prompts and embed in parallel
  const resolved = await Promise.all(prompts.map((p) => this.resolvePrompt(p)));
@@ -769,9 +898,64 @@ class SemanticCache {
  ]);
  return results;
  }
+ /**
+ * Refresh threshold config from Valkey. Returns true on a successful HGETALL,
+ * false if the call threw.
+ *
+ * Field semantics:
+ * - "threshold" -> updates defaultThreshold
+ * - "threshold:{category}" -> updates categoryThresholds[category]
+ * - "threshold:" (empty) -> ignored
+ * - non-numeric values -> ignored
+ * - out-of-range values -> ignored (must be 0 <= x <= 2)
+ *
+ * Categories present in memory but absent from the hash fall back to their
+ * constructor values (or are removed if no constructor override existed).
+ * The default threshold likewise falls back to its constructor value if
+ * `threshold` is absent from the hash.
+ */
+ async refreshConfig() {
+ let raw = null;
+ try {
+ raw = await this.client.hgetall(this.configKey);
+ }
+ catch {
+ return false;
+ }
+ let nextDefault = this._initialDefaultThreshold;
+ const nextCategory = { ...this._initialCategoryThresholds };
+ if (raw) {
+ for (const [field, value] of Object.entries(raw)) {
+ const parsed = Number(value);
+ if (!Number.isFinite(parsed) || parsed < 0 || parsed > 2) {
+ continue;
+ }
+ if (field === 'threshold') {
+ nextDefault = parsed;
+ }
+ else if (field.startsWith('threshold:')) {
+ const category = field.slice('threshold:'.length);
+ if (category.length > 0) {
+ nextCategory[category] = parsed;
+ }
+ }
+ }
+ }
+ this.defaultThreshold = nextDefault;
+ this.categoryThresholds = nextCategory;
+ return true;
+ }
  // -- Internal helpers exposed to package adapters --
  /** @internal Default similarity threshold. */
  get _defaultThreshold() { return this.defaultThreshold; }
+ /** @internal Test-only getter. */
+ get _categoryThresholds() {
+ return this.categoryThresholds;
+ }
+ /** @internal Test-only getter. */
+ get _configRefreshIntervalMs() {
+ return this.configRefreshOptions.intervalMs;
+ }
  /**
  * Execute a stable FT.SEARCH for use by adapters (e.g. LangGraph).
  * SORTBY inserted_at ASC gives stable ordering across paginated calls.
@@ -788,19 +972,86 @@ class SemanticCache {
  return this.embed(text);
  }
  // -- Private helpers --
+ startConfigRefresh() {
+ if (!this.configRefreshOptions.enabled) {
+ return;
+ }
+ const tick = () => {
+ this.refreshConfig()
+ .then((ok) => {
+ if (!ok) {
+ this.telemetry.metrics.configRefreshFailed
+ .labels({ cache_name: this.name })
+ .inc();
+ }
+ })
+ .catch(() => {
+ this.telemetry.metrics.configRefreshFailed
+ .labels({ cache_name: this.name })
+ .inc();
+ });
+ };
+ // Synchronous first refresh: process started immediately after a proposal
+ // was applied picks up the change without waiting for the first tick.
+ tick();
+ this.configRefreshTimer = setInterval(tick, this.configRefreshOptions.intervalMs);
+ if (typeof this.configRefreshTimer.unref === 'function') {
+ this.configRefreshTimer.unref();
+ }
+ }
  async _doInitialize() {
  const gen = this._initGeneration;
  return this.traced('initialize', async () => {
  const { dim, hasBinaryRefs } = await this.ensureIndexAndGetDimension();
- if (this._initGeneration !== gen)
+ if (this._initGeneration !== gen) {
  return;
+ }
  this._dimension = dim;
  this._hasBinaryRefs = hasBinaryRefs;
+ // registerDiscovery() may throw SemanticCacheUsageError on a name
+ // collision. Mark the cache initialized only after discovery succeeds
+ // so a colliding caller cannot subsequently call check()/store()
+ // against another owner's keys.
+ const manager = await this.registerDiscovery();
+ if (this._initGeneration !== gen) {
+ if (manager) {
+ await manager.stop({ deleteHeartbeat: true });
+ }
+ return;
+ }
+ this.discovery = manager;
  this._initialized = true;
+ this.startConfigRefresh();
  // Fire analytics init once (not on every flush+initialize cycle)
  this.initAnalyticsSafe().catch(() => { });
  });
  }
+ async registerDiscovery() {
+ if (this.discoveryOptions.enabled === false) {
+ return null;
+ }
+ const metadata = (0, discovery_1.buildSemanticMetadata)({
+ name: this.name,
+ version: PACKAGE_VERSION,
+ defaultThreshold: this.defaultThreshold,
+ categoryThresholds: this.categoryThresholds,
+ uncertaintyBand: this.uncertaintyBand,
+ includeCategories: this.discoveryOptions.includeCategories ?? true,
+ });
+ const manager = new discovery_1.DiscoveryManager({
+ client: this.client,
+ name: this.name,
+ metadata,
+ heartbeatIntervalMs: this.discoveryOptions.heartbeatIntervalMs,
+ onWriteFailed: () => {
+ this.telemetry.metrics.discoveryWriteFailed
+ .labels({ cache_name: this.name })
+ .inc();
+ },
+ });
+ await manager.register();
+ return manager;
+ }
  async initAnalyticsSafe() {
  if (this.analyticsInitiated)
  return;
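`flush()` bumps `_initGeneration` and detaches `this.discovery` synchronously before its first `await`, while `_doInitialize()` captures the generation up front and re-checks it after each `await`, stopping a just-registered `DiscoveryManager` if a flush ran in between. A stripped-down sketch of that generation-counter guard; `GenerationGuardedInit`, `register` and `release` are hypothetical stand-ins for `_doInitialize()`, `registerDiscovery()` and `manager.stop({ deleteHeartbeat: true })`.

```typescript
// Hypothetical, reduced version of the guard used by flush()/_doInitialize().
class GenerationGuardedInit<T> {
  private generation = 0;
  private current: T | null = null;
  private readonly register: () => Promise<T>;
  private readonly release: (resource: T) => Promise<void>;

  constructor(register: () => Promise<T>, release: (resource: T) => Promise<void>) {
    this.register = register;
    this.release = release;
  }

  // Returns true if the resource was published, false if it went stale.
  async init(): Promise<boolean> {
    const gen = this.generation;
    const resource = await this.register();
    if (this.generation !== gen) {
      // A reset() ran while we were awaiting: release instead of publishing.
      await this.release(resource);
      return false;
    }
    this.current = resource;
    return true;
  }

  // Like flush(): bump the generation and detach the current resource
  // synchronously, before any await, then release it.
  async reset(): Promise<void> {
    this.generation++;
    const previous = this.current;
    this.current = null;
    if (previous !== null) {
      await this.release(previous);
    }
  }

  get resource(): T | null {
    return this.current;
  }
}
```

The key design point is that the bump and the detach happen before any `await`: an init that resumes later can then only observe the new generation, never a half-flushed state.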
@@ -1056,3 +1307,17 @@ class SemanticCache {
  }
  }
  exports.SemanticCache = SemanticCache;
+ // --- Judge helpers ---
+ class JudgeTimeoutError extends Error {
+ constructor() {
+ super('judgeFn timed out');
+ this.name = 'JudgeTimeoutError';
+ }
+ }
+ function raceWithTimeout(p, timeoutMs) {
+ let timer;
+ const timeout = new Promise((_, reject) => {
+ timer = setTimeout(() => reject(new JudgeTimeoutError()), timeoutMs);
+ });
+ return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
+ }
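`check()` wraps `judgeFn` in `raceWithTimeout()` and maps the outcome plus the `onError` policy to one of six decision labels, which also become the `decision` label on the judge metrics. A typed, self-contained sketch of the same logic; `runJudge` and `applyDecision` are hypothetical names, and the decision strings are copied from `check()`. One property carries over unchanged: losing the race does not cancel `judgeFn`, so a slow LLM call keeps running in the background after the timeout fires.

```typescript
class JudgeTimeoutError extends Error {
  constructor() {
    super('judgeFn timed out');
    this.name = 'JudgeTimeoutError';
  }
}

// Settle with p, or reject with JudgeTimeoutError after timeoutMs. The timer
// is cleared either way; p itself is not cancelled if it loses the race.
function raceWithTimeout<T>(p: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new JudgeTimeoutError()), timeoutMs);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

type JudgeDecision =
  | 'accept' | 'reject'
  | 'error_accept' | 'error_reject'
  | 'timeout_accept' | 'timeout_reject';

// Hypothetical wrapper reproducing the decision mapping in check(). The call
// sits inside try, so a synchronous throw also counts as an error.
async function runJudge(
  judgeFn: () => Promise<boolean>,
  opts: { timeoutMs?: number; onError?: 'accept' | 'reject' } = {},
): Promise<JudgeDecision> {
  const timeoutMs = opts.timeoutMs ?? 2000;
  const onError = opts.onError ?? 'accept';
  try {
    const accepted = await raceWithTimeout(judgeFn(), timeoutMs);
    return accepted ? 'accept' : 'reject';
  } catch (err) {
    const isTimeout = err instanceof JudgeTimeoutError;
    if (onError === 'accept') {
      return isTimeout ? 'timeout_accept' : 'error_accept';
    }
    return isTimeout ? 'timeout_reject' : 'error_reject';
  }
}

// Only 'accept' upgrades to high confidence; *_accept keeps the hit as
// 'uncertain'; every other decision turns the lookup into a miss.
function applyDecision(d: JudgeDecision): 'high' | 'uncertain' | 'miss' {
  if (d === 'accept') return 'high';
  if (d === 'error_accept' || d === 'timeout_accept') return 'uncertain';
  return 'miss';
}
```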
package/dist/discovery.d.ts ADDED
@@ -0,0 +1,67 @@
+ import type { Valkey } from './types';
+ export declare const PROTOCOL_VERSION = 1;
+ export declare const REGISTRY_KEY = "__betterdb:caches";
+ export declare const PROTOCOL_KEY = "__betterdb:protocol";
+ export declare const HEARTBEAT_KEY_PREFIX = "__betterdb:heartbeat:";
+ export declare const DEFAULT_HEARTBEAT_INTERVAL_MS = 30000;
+ export declare const HEARTBEAT_TTL_SECONDS = 60;
+ export declare const CACHE_TYPE: "semantic_cache";
+ export type CacheType = typeof CACHE_TYPE;
+ export interface DiscoveryOptions {
+ enabled?: boolean;
+ heartbeatIntervalMs?: number;
+ includeCategories?: boolean;
+ }
+ export interface MarkerMetadata {
+ type: CacheType;
+ prefix: string;
+ version: string;
+ protocol_version: number;
+ capabilities: string[];
+ stats_key: string;
+ started_at: string;
+ pid?: number;
+ hostname?: string;
+ [extra: string]: unknown;
+ }
+ export interface BuildSemanticMetadataInput {
+ name: string;
+ version: string;
+ defaultThreshold: number;
+ categoryThresholds: Record<string, number>;
+ uncertaintyBand: number;
+ includeCategories: boolean;
+ }
+ export declare function buildSemanticMetadata(input: BuildSemanticMetadataInput): MarkerMetadata;
+ export interface DiscoveryLogger {
+ warn: (msg: string) => void;
+ debug: (msg: string) => void;
+ }
+ export interface DiscoveryManagerDeps {
+ client: Valkey;
+ name: string;
+ metadata: MarkerMetadata;
+ heartbeatIntervalMs?: number;
+ logger?: DiscoveryLogger;
+ onWriteFailed?: () => void;
+ }
+ export declare class DiscoveryManager {
+ private readonly client;
+ private readonly name;
+ private readonly metadata;
+ private readonly heartbeatIntervalMs;
+ private readonly heartbeatKey;
+ private readonly logger;
+ private readonly onWriteFailed;
+ private heartbeatHandle;
+ constructor(deps: DiscoveryManagerDeps);
+ register(): Promise<void>;
+ stop(opts: {
+ deleteHeartbeat: boolean;
+ }): Promise<void>;
+ tickHeartbeat(): Promise<void>;
+ private startHeartbeat;
+ private safeHget;
+ private safeCall;
+ private checkCollision;
+ }
package/dist/discovery.js ADDED
@@ -0,0 +1,140 @@
+ "use strict";
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.DiscoveryManager = exports.CACHE_TYPE = exports.HEARTBEAT_TTL_SECONDS = exports.DEFAULT_HEARTBEAT_INTERVAL_MS = exports.HEARTBEAT_KEY_PREFIX = exports.PROTOCOL_KEY = exports.REGISTRY_KEY = exports.PROTOCOL_VERSION = void 0;
+ exports.buildSemanticMetadata = buildSemanticMetadata;
+ const node_os_1 = require("node:os");
+ const errors_1 = require("./errors");
+ exports.PROTOCOL_VERSION = 1;
+ exports.REGISTRY_KEY = '__betterdb:caches';
+ exports.PROTOCOL_KEY = '__betterdb:protocol';
+ exports.HEARTBEAT_KEY_PREFIX = '__betterdb:heartbeat:';
+ exports.DEFAULT_HEARTBEAT_INTERVAL_MS = 30_000;
+ exports.HEARTBEAT_TTL_SECONDS = 60;
+ exports.CACHE_TYPE = 'semantic_cache';
+ function buildSemanticMetadata(input) {
+ const metadata = {
+ type: exports.CACHE_TYPE,
+ prefix: input.name,
+ version: input.version,
+ protocol_version: exports.PROTOCOL_VERSION,
+ capabilities: ['invalidate', 'similarity_distribution', 'threshold_adjust'],
+ index_name: `${input.name}:idx`,
+ stats_key: `${input.name}:__stats`,
+ config_key: `${input.name}:__config`,
+ default_threshold: input.defaultThreshold,
+ uncertainty_band: input.uncertaintyBand,
+ started_at: new Date().toISOString(),
+ pid: process.pid,
+ hostname: (0, node_os_1.hostname)(),
+ };
+ if (input.includeCategories && Object.keys(input.categoryThresholds).length > 0) {
+ metadata.category_thresholds = { ...input.categoryThresholds };
+ }
+ return metadata;
+ }
+ const noopLogger = {
+ warn: () => { },
+ debug: () => { },
+ };
+ function errMsg(err) {
+ return err instanceof Error ? err.message : String(err);
+ }
+ class DiscoveryManager {
+ client;
+ name;
+ metadata;
+ heartbeatIntervalMs;
+ heartbeatKey;
+ logger;
+ onWriteFailed;
+ heartbeatHandle = null;
+ constructor(deps) {
+ this.client = deps.client;
+ this.name = deps.name;
+ this.metadata = deps.metadata;
+ this.heartbeatIntervalMs = deps.heartbeatIntervalMs ?? exports.DEFAULT_HEARTBEAT_INTERVAL_MS;
+ this.heartbeatKey = `${exports.HEARTBEAT_KEY_PREFIX}${deps.name}`;
+ this.logger = deps.logger ?? noopLogger;
+ this.onWriteFailed = deps.onWriteFailed ?? (() => { });
+ }
+ async register() {
+ const existingJson = await this.safeHget();
+ if (existingJson !== null) {
+ this.checkCollision(existingJson);
+ }
+ await this.safeCall(() => this.client.hset(exports.REGISTRY_KEY, this.name, JSON.stringify(this.metadata)), 'HSET registry');
+ await this.safeCall(() => this.client.set(exports.PROTOCOL_KEY, String(exports.PROTOCOL_VERSION), 'NX'), 'SET protocol');
+ await this.tickHeartbeat();
+ this.startHeartbeat();
+ }
+ async stop(opts) {
+ if (this.heartbeatHandle) {
+ clearInterval(this.heartbeatHandle);
+ this.heartbeatHandle = null;
+ }
+ if (!opts.deleteHeartbeat) {
+ return;
+ }
+ try {
+ await this.client.del(this.heartbeatKey);
+ }
+ catch (err) {
+ this.logger.debug(`discovery: DEL heartbeat failed: ${errMsg(err)}`);
+ }
+ }
+ async tickHeartbeat() {
+ const now = new Date().toISOString();
+ try {
+ await this.client.set(this.heartbeatKey, now, 'EX', exports.HEARTBEAT_TTL_SECONDS);
+ }
+ catch (err) {
+ this.logger.debug(`discovery: heartbeat SET failed: ${errMsg(err)}`);
+ this.onWriteFailed();
+ }
+ }
+ startHeartbeat() {
+ if (this.heartbeatHandle) {
+ clearInterval(this.heartbeatHandle);
+ }
+ const handle = setInterval(() => {
+ void this.tickHeartbeat();
+ }, this.heartbeatIntervalMs);
+ handle.unref?.();
+ this.heartbeatHandle = handle;
+ }
+ async safeHget() {
+ try {
+ return await this.client.hget(exports.REGISTRY_KEY, this.name);
+ }
+ catch (err) {
+ this.logger.warn(`discovery: HGET registry failed: ${errMsg(err)}`);
+ this.onWriteFailed();
+ return null;
+ }
+ }
+ async safeCall(fn, label) {
+ try {
+ await fn();
+ }
+ catch (err) {
+ this.logger.warn(`discovery: ${label} failed: ${errMsg(err)}`);
+ this.onWriteFailed();
+ }
+ }
+ checkCollision(existingJson) {
+ let parsed;
+ try {
+ parsed = JSON.parse(existingJson);
+ }
+ catch {
+ return;
+ }
+ if (parsed.type && parsed.type !== exports.CACHE_TYPE) {
+ throw new errors_1.SemanticCacheUsageError(`cache name collision: '${this.name}' is already registered as type '${String(parsed.type)}' on this Valkey instance`);
+ }
+ if (parsed.version && parsed.version !== this.metadata.version) {
+ this.logger.warn(`discovery: overwriting marker for '${this.name}' (existing version ${String(parsed.version)}, this version ${this.metadata.version})`);
+ }
+ }
+ }
+ exports.DiscoveryManager = DiscoveryManager;
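The discovery protocol above is small enough for other tooling to read directly: each cache's metadata is a JSON string in the `__betterdb:caches` hash (field = cache name), and a cache counts as live while `__betterdb:heartbeat:{name}` exists, since it is re-`SET` with a 60 s TTL every 30 s by default. Because `flush()` deletes the heartbeat but keeps the registry field, a flushed cache shows up as registered but offline. A minimal reader sketch, assuming an ioredis/iovalkey-style client with `hgetall` and `exists`; the diff does not show how Monitor itself reads these keys, and `listCaches`, `RegistryClient` and `CacheListing` are hypothetical names.

```typescript
// Key names as declared by the discovery module in this diff.
const REGISTRY_KEY = '__betterdb:caches';
const HEARTBEAT_KEY_PREFIX = '__betterdb:heartbeat:';

// Minimal client surface (an assumption, not the package's Valkey type):
// HGETALL returns field -> value, EXISTS returns the number of keys found.
interface RegistryClient {
  hgetall(key: string): Promise<Record<string, string>>;
  exists(key: string): Promise<number>;
}

interface CacheListing {
  name: string;
  type?: string;
  version?: string;
  live: boolean;
}

// Hypothetical reader: one row per registry field, marked live while the
// heartbeat key exists. Unparseable markers are listed without metadata.
async function listCaches(client: RegistryClient): Promise<CacheListing[]> {
  const registry = await client.hgetall(REGISTRY_KEY);
  const rows: CacheListing[] = [];
  for (const [name, json] of Object.entries(registry)) {
    let meta: { type?: unknown; version?: unknown } = {};
    try {
      const parsed: unknown = JSON.parse(json);
      if (parsed !== null && typeof parsed === 'object') {
        meta = parsed as { type?: unknown; version?: unknown };
      }
    } catch {
      // Malformed JSON: keep the row, drop the metadata.
    }
    const live = (await client.exists(`${HEARTBEAT_KEY_PREFIX}${name}`)) > 0;
    rows.push({
      name,
      type: typeof meta.type === 'string' ? meta.type : undefined,
      version: typeof meta.version === 'string' ? meta.version : undefined,
      live,
    });
  }
  return rows.sort((a, b) => a.name.localeCompare(b.name));
}
```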
package/dist/index.d.ts CHANGED
@@ -1,8 +1,10 @@
  export { SemanticCache } from './SemanticCache';
  export type { ThresholdEffectivenessResult } from './SemanticCache';
  export { DEFAULT_COST_TABLE } from './defaultCostTable';
- export type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult, CacheConfidence, EmbedFn, ModelCost, RerankOptions, } from './types';
+ export type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult, CacheConfidence, EmbedFn, ModelCost, RerankOptions, JudgeOptions, ConfigRefreshOptions, } from './types';
  export { SemanticCacheUsageError, EmbeddingError, ValkeyCommandError, } from './errors';
  export type { ContentBlock, TextBlock, BinaryBlock, ToolCallBlock, ToolResultBlock, ReasoningBlock, BlockHints, } from './utils';
+ export { escapeTag } from './utils';
  export type { BinaryRef, BinaryNormalizer, NormalizerConfig } from './normalizer';
  export { hashBase64, hashBytes, hashUrl, fetchAndHash, passthrough, composeNormalizer, defaultNormalizer, } from './normalizer';
+ export type { DiscoveryOptions } from './discovery';
package/dist/index.js CHANGED
@@ -1,6 +1,6 @@
1
1
  "use strict";
2
2
  Object.defineProperty(exports, "__esModule", { value: true });
3
- exports.defaultNormalizer = exports.composeNormalizer = exports.passthrough = exports.fetchAndHash = exports.hashUrl = exports.hashBytes = exports.hashBase64 = exports.ValkeyCommandError = exports.EmbeddingError = exports.SemanticCacheUsageError = exports.DEFAULT_COST_TABLE = exports.SemanticCache = void 0;
3
+ exports.defaultNormalizer = exports.composeNormalizer = exports.passthrough = exports.fetchAndHash = exports.hashUrl = exports.hashBytes = exports.hashBase64 = exports.escapeTag = exports.ValkeyCommandError = exports.EmbeddingError = exports.SemanticCacheUsageError = exports.DEFAULT_COST_TABLE = exports.SemanticCache = void 0;
4
4
  var SemanticCache_1 = require("./SemanticCache");
5
5
  Object.defineProperty(exports, "SemanticCache", { enumerable: true, get: function () { return SemanticCache_1.SemanticCache; } });
6
6
  var defaultCostTable_1 = require("./defaultCostTable");
@@ -9,6 +9,8 @@ var errors_1 = require("./errors");
9
9
  Object.defineProperty(exports, "SemanticCacheUsageError", { enumerable: true, get: function () { return errors_1.SemanticCacheUsageError; } });
10
10
  Object.defineProperty(exports, "EmbeddingError", { enumerable: true, get: function () { return errors_1.EmbeddingError; } });
11
11
  Object.defineProperty(exports, "ValkeyCommandError", { enumerable: true, get: function () { return errors_1.ValkeyCommandError; } });
12
+ var utils_1 = require("./utils");
13
+ Object.defineProperty(exports, "escapeTag", { enumerable: true, get: function () { return utils_1.escapeTag; } });
12
14
  var normalizer_1 = require("./normalizer");
13
15
  Object.defineProperty(exports, "hashBase64", { enumerable: true, get: function () { return normalizer_1.hashBase64; } });
14
16
  Object.defineProperty(exports, "hashBytes", { enumerable: true, get: function () { return normalizer_1.hashBytes; } });
@@ -13,6 +13,10 @@ interface CacheMetrics {
13
13
  costSavedTotal: Counter;
14
14
  embeddingCacheTotal: Counter;
15
15
  staleModelEvictions: Counter;
16
+ discoveryWriteFailed: Counter;
17
+ configRefreshFailed: Counter;
18
+ judgeDecisions: Counter;
19
+ judgeDuration: Histogram;
16
20
  }
17
21
  export interface Telemetry {
18
22
  tracer: Tracer;
package/dist/telemetry.js CHANGED
@@ -57,6 +57,27 @@ function createTelemetry(opts) {
57
57
  help: 'Entries evicted due to staleAfterModelChange detection',
58
58
  labelNames: ['cache_name'],
59
59
  });
60
+ const discoveryWriteFailed = getOrCreateCounter(registry, {
61
+ name: `${opts.prefix}_discovery_write_failed_total`,
62
+ help: 'Count of failed discovery-marker writes (best-effort HGET/HSET/SET operations against __betterdb:* keys)',
63
+ labelNames: ['cache_name'],
64
+ });
65
+ const configRefreshFailed = getOrCreateCounter(registry, {
66
+ name: `${opts.prefix}_config_refresh_failed_total`,
67
+ help: 'Count of failed periodic config refreshes (HGETALL on __config).',
68
+ labelNames: ['cache_name'],
69
+ });
70
+ const judgeDecisions = getOrCreateCounter(registry, {
71
+ name: `${opts.prefix}_judge_decisions_total`,
72
+ help: 'LLM-as-judge decisions for borderline cache hits',
73
+ labelNames: ['cache_name', 'category', 'decision'],
74
+ });
75
+ const judgeDuration = getOrCreateHistogram(registry, {
76
+ name: `${opts.prefix}_judge_duration_seconds`,
77
+ help: 'Wall-clock duration of judgeFn invocations',
78
+ labelNames: ['cache_name', 'category', 'decision'],
79
+ buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5],
80
+ });
60
81
  return {
61
82
  tracer,
62
83
  metrics: {
@@ -67,6 +88,10 @@ function createTelemetry(opts) {
67
88
  costSavedTotal,
68
89
  embeddingCacheTotal,
69
90
  staleModelEvictions,
91
+ discoveryWriteFailed,
92
+ configRefreshFailed,
93
+ judgeDecisions,
94
+ judgeDuration,
70
95
  },
71
96
  };
72
97
  }
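The new `judge_duration_seconds` histogram uses the bucket bounds `[0.05, 0.1, 0.25, 0.5, 1, 2, 5]`. Prometheus histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its bound, and `+Inf` counts all of them. That is easy to misread when comparing judge latency to `timeoutMs`. The sketch below models only that counting rule in plain TypeScript. It does not use prom-client, and `cumulativeBucketCounts` is a hypothetical helper.

```typescript
// Bucket upper bounds copied from the judge_duration_seconds histogram above.
const JUDGE_DURATION_BUCKETS = [0.05, 0.1, 0.25, 0.5, 1, 2, 5];

// Models Prometheus' cumulative bucket semantics: an observation of v seconds
// increments every bucket whose upper bound `le` is >= v, and always `+Inf`.
function cumulativeBucketCounts(
  observationsSeconds: number[],
  buckets: number[] = JUDGE_DURATION_BUCKETS,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const le of buckets) counts.set(String(le), 0);
  counts.set('+Inf', 0);

  for (const v of observationsSeconds) {
    for (const le of buckets) {
      if (v <= le) counts.set(String(le), (counts.get(String(le)) ?? 0) + 1);
    }
    counts.set('+Inf', (counts.get('+Inf') ?? 0) + 1);
  }
  return counts;
}
```

For observations of 0.03 s, 0.3 s, 1.5 s and 7 s, the `le="0.5"` bucket reads 2 and `le="2"` reads 3. The share of observed judge calls that took at most 2 s (the default `timeoutMs`) is therefore `bucket{le="2"} / count`, or 3/4 here.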
package/dist/types.d.ts CHANGED
@@ -1,6 +1,13 @@
1
1
  import type Valkey from 'iovalkey';
2
2
  import type { Registry } from 'prom-client';
3
+ import type { DiscoveryOptions } from './discovery';
3
4
  export type { Valkey };
5
+ export interface ConfigRefreshOptions {
6
+ /** Enable periodic config refresh from Valkey. Default: true. */
7
+ enabled?: boolean;
8
+ /** Refresh interval in milliseconds. Default: 30000. Minimum: 1000. */
9
+ intervalMs?: number;
10
+ }
4
11
  export type EmbedFn = (text: string) => Promise<number[]>;
5
12
  export interface ModelCost {
6
13
  inputPer1k: number;
@@ -92,6 +99,20 @@ export interface SemanticCacheOptions {
92
99
  /** Interval in ms for periodic stats snapshots. Default: 300_000 (5 min). 0 to disable. */
93
100
  statsIntervalMs?: number;
94
101
  };
102
+ /**
103
+ * Discovery-marker protocol controls. See
104
+ * docs/plans/specs/spec-semantic-cache-discovery-markers.md.
105
+ * Defaults: enabled=true, heartbeatIntervalMs=30000, includeCategories=true.
106
+ */
107
+ discovery?: DiscoveryOptions;
108
+ /**
109
+ * Periodic refresh of in-memory threshold config from Valkey.
110
+ * When enabled, the cache re-reads `{name}:__config` on the configured
111
+ * interval. Field `threshold` updates `defaultThreshold`; fields named
112
+ * `threshold:{category}` update `categoryThresholds[category]`.
113
+ * Defaults: enabled=true, intervalMs=30000.
114
+ */
115
+ configRefresh?: ConfigRefreshOptions;
95
116
  }
96
117
  export interface RerankOptions {
97
118
  /**
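The `configRefresh` comment describes how the fields of the `{name}:__config` hash map onto thresholds: `threshold` sets `defaultThreshold`, and `threshold:{category}` sets `categoryThresholds[category]`. The sketch below applies one `HGETALL` reply, represented as a field-to-value record, to an in-memory config. `applyConfigFields` and `ThresholdConfig` are hypothetical names. Skipping empty, non-numeric, or out-of-range values (outside the 0 to 2 cosine-distance range) is an assumption of this sketch, not documented behavior.

```typescript
// Hypothetical in-memory shape; mirrors the fields named in the configRefresh comment.
interface ThresholdConfig {
  defaultThreshold: number;
  categoryThresholds: Record<string, number>;
}

const CATEGORY_PREFIX = 'threshold:';

// Applies one HGETALL reply from `{name}:__config` and returns a new config.
function applyConfigFields(
  fields: Record<string, string>,
  current: ThresholdConfig,
): ThresholdConfig {
  const next: ThresholdConfig = {
    defaultThreshold: current.defaultThreshold,
    categoryThresholds: { ...current.categoryThresholds },
  };
  for (const [field, raw] of Object.entries(fields)) {
    const value = Number(raw);
    // Assumption: empty, non-numeric or out-of-range (cosine distance 0-2) values are skipped.
    if (raw.trim() === '' || !Number.isFinite(value) || value < 0 || value > 2) continue;

    if (field === 'threshold') {
      next.defaultThreshold = value;
    } else if (field.startsWith(CATEGORY_PREFIX)) {
      const category = field.slice(CATEGORY_PREFIX.length);
      if (category !== '') next.categoryThresholds[category] = value;
    }
    // Any other field is ignored.
  }
  return next;
}
```

Returning a fresh object leaves the caller's previous config untouched, so a refresh can be inspected or discarded before it is adopted.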
@@ -108,6 +129,61 @@ export interface RerankOptions {
108
129
  similarity: number;
109
130
  }>) => Promise<number>;
110
131
  }
132
+ /**
133
+ * LLM-as-judge adjudication for borderline cache hits.
134
+ *
135
+ * When set on CacheCheckOptions, a hit whose cosine distance lands in the
136
+ * uncertainty band (threshold - uncertaintyBand < score <= threshold) is
137
+ * passed to judgeFn before being returned. The judge accepts (promotes the
138
+ * hit to confidence: 'high') or rejects (treats it as a miss with
139
+ * nearestMiss populated).
140
+ *
141
+ * The judge is NOT invoked for:
142
+ * - high-confidence hits (score <= threshold - uncertaintyBand)
143
+ * - misses (score > threshold)
144
+ * - the no-candidates case (FT.SEARCH returned zero rows)
145
+ *
146
+ * When rerank is also set, the judge runs on the reranked pick, not the
147
+ * original top-1.
148
+ */
149
+ export interface JudgeOptions {
150
+ /**
151
+ * Function that decides whether a borderline cache hit is acceptable.
152
+ * Return true to accept (caller receives confidence: 'high').
153
+ * Return false to reject (caller receives a miss with nearestMiss).
154
+ *
155
+ * The function receives the original prompt text (or the resolved text
156
+ * portion of a multipart prompt), the cached response, the cosine distance,
157
+ * the effective threshold, and the category if one was supplied to check().
158
+ */
159
+ judgeFn: (input: {
160
+ prompt: string;
161
+ response: string;
162
+ similarity: number;
163
+ threshold: number;
164
+ category: string | undefined;
165
+ }) => Promise<boolean>;
166
+ /**
167
+ * Behavior when judgeFn throws or exceeds timeoutMs.
168
+ * 'accept' - return the cached response with confidence: 'uncertain'
169
+ * (current pre-judge behavior, fail-open).
170
+ * 'reject' - treat as a miss (fail-closed).
171
+ * Default: 'accept'.
172
+ */
173
+ onError?: 'accept' | 'reject';
174
+ /**
175
+ * Per-call timeout in milliseconds. Default: 2000.
176
+ * The judge function is raced against this timeout; timeout is treated
177
+ * the same as a thrown error and routed through onError.
178
+ *
179
+ * Note: the underlying promise is not cancelled on timeout; JavaScript promises
180
+ * have no built-in cancellation. A real LLM HTTP request will continue
181
+ * running in the background after the timeout fires, consuming API quota.
182
+ * To stop the underlying request, create an AbortController inside judgeFn,
183
+ * pass its signal to your HTTP client, and abort it from your own timer.
184
+ */
185
+ timeoutMs?: number;
186
+ }
111
187
  export interface CacheCheckOptions {
112
188
  /** Per-request threshold override (cosine distance 0-2). Highest priority. */
113
189
  threshold?: number;
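The `JudgeOptions` comments describe three rules. The judge only runs when `threshold - uncertaintyBand < score <= threshold`, `judgeFn` is raced against `timeoutMs`, and a throw or timeout is routed through `onError`. The following is a minimal sketch of that control flow that assumes nothing about the package internals. `inUncertaintyBand`, `adjudicate`, and `withAbortTimeout` are hypothetical helpers, and the `'high' | 'uncertain' | 'miss'` outcome type is a simplification of `CacheCheckResult`. `withAbortTimeout` shows the AbortController pattern that the `timeoutMs` note recommends for actually stopping the underlying request.

```typescript
type JudgeInput = {
  prompt: string;
  response: string;
  similarity: number; // cosine distance of the candidate
  threshold: number;
  category: string | undefined;
};
type JudgeOutcome = 'high' | 'uncertain' | 'miss';

// The judge is only consulted for hits in (threshold - band, threshold].
function inUncertaintyBand(score: number, threshold: number, band: number): boolean {
  return score > threshold - band && score <= threshold;
}

async function adjudicate(
  judgeFn: (input: JudgeInput) => Promise<boolean>,
  input: JudgeInput,
  opts: { onError?: 'accept' | 'reject'; timeoutMs?: number } = {},
): Promise<JudgeOutcome> {
  const { onError = 'accept', timeoutMs = 2000 } = opts;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`judge timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late judge result is simply ignored.
    const accepted = await Promise.race([judgeFn(input), timeout]);
    return accepted ? 'high' : 'miss';
  } catch {
    // Thrown errors and timeouts share one path: fail open or fail closed.
    return onError === 'accept' ? 'uncertain' : 'miss';
  } finally {
    clearTimeout(timer);
  }
}

// Racing cancels nothing, so judgeFn can own an AbortController and hand
// its signal to whatever HTTP client it uses.
function withAbortTimeout<T>(ms: number, work: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return work(controller.signal).finally(() => clearTimeout(timer));
}
```

Inside a real `judgeFn`, the LLM request could be wrapped as `withAbortTimeout(1500, (signal) => fetch(url, { method: 'POST', body, signal }))`. Using a budget slightly below `timeoutMs` aborts the request before the race gives up on it, so no orphaned call keeps consuming quota.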
@@ -146,6 +222,11 @@ export interface CacheCheckOptions {
146
222
  * in rerankFn yourself.
147
223
  */
148
224
  rerank?: RerankOptions;
225
+ /**
226
+ * Optional LLM-as-judge adjudication for borderline hits.
227
+ * See JudgeOptions. Ignored on checkBatch() - call check() per prompt instead.
228
+ */
229
+ judge?: JudgeOptions;
149
230
  }
150
231
  export interface CacheStoreOptions {
151
232
  /** Per-entry TTL in seconds. Overrides SemanticCacheOptions.defaultTtl. */
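The `judge` option is ignored by `checkBatch()`, and the comment recommends calling `check()` per prompt instead. Firing every `check()` at once could start one judge LLM call per borderline prompt simultaneously, so a small concurrency cap is a reasonable companion. This is a sketch under assumptions. `CheckLike` is a hypothetical structural type that covers only the `check(prompt, options)` shape and the `hit`/`confidence` fields shown in the README, and `checkEachWithJudge` is not part of the package.

```typescript
// Hypothetical structural slice of SemanticCache: only what this sketch calls.
type CheckOptionsLike = { judge?: unknown };
type CheckResultSlice = { hit: boolean; confidence?: string };
interface CheckLike {
  check(prompt: string, options?: CheckOptionsLike): Promise<CheckResultSlice>;
}

// Runs check() for every prompt, with at most `limit` checks (and therefore
// judge calls) in flight at once. Results keep the input order.
async function checkEachWithJudge(
  cache: CheckLike,
  prompts: string[],
  options: CheckOptionsLike,
  limit = 4,
): Promise<CheckResultSlice[]> {
  const results: CheckResultSlice[] = new Array(prompts.length);
  let next = 0;
  // Each worker pulls the next index; `next++` cannot race because JS runs
  // the synchronous part of each worker to completion before switching.
  async function worker(): Promise<void> {
    while (next < prompts.length) {
      const i = next++;
      results[i] = await cache.check(prompts[i], options);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, prompts.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```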
@@ -202,10 +283,19 @@ export interface CacheCheckResult {
202
283
  /**
203
284
  * On a miss where a candidate existed but didn't clear the threshold,
204
285
  * describes how close it was. Useful for threshold tuning.
286
+ *
287
+ * Note: when the miss originates from a judge rejection, `deltaToThreshold`
288
+ * will be <= 0 because the score did clear the threshold; the judge rejected it.
289
+ * Ordinary (non-judge) misses always produce deltaToThreshold > 0.
290
+ * Use `deltaToThreshold <= 0` to detect judge-originated misses.
205
291
  */
206
292
  nearestMiss?: {
207
293
  similarity: number;
208
294
  deltaToThreshold: number;
295
+ /** The effective threshold that was applied. Present on judge-rejection misses. */
296
+ threshold?: number;
297
+ /** The Valkey key of the entry that was rejected. Present on judge-rejection misses. */
298
+ matchedKey?: string;
209
299
  };
210
300
  /**
211
301
  * Estimated cost saved (in dollars) by returning this cached result instead of calling the LLM.
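The `nearestMiss` note gives a simple rule for separating judge rejections from ordinary misses: `deltaToThreshold <= 0` means the score cleared the threshold and the judge said no. Below is a minimal sketch of a classifier built on that rule, which can help when logging or tuning thresholds. `classifyResult`, `tallyJudgeRejections`, and the `*Like` types are hypothetical. They mirror only the `hit` flag from the README and the `nearestMiss` fields shown in this hunk. Treating a miss without `nearestMiss` as "no candidate" assumes the field is omitted only when no candidate existed, which is how the doc comment describes it.

```typescript
// Hypothetical structural slices of CacheCheckResult; only the fields used here.
interface NearestMissLike {
  similarity: number;
  deltaToThreshold: number;
  threshold?: number; // present on judge-rejection misses
  matchedKey?: string; // present on judge-rejection misses
}
interface CheckResultLike {
  hit: boolean;
  nearestMiss?: NearestMissLike;
}

type ResultKind = 'hit' | 'judge-rejected' | 'outside-threshold' | 'no-candidate';

function classifyResult(result: CheckResultLike): ResultKind {
  if (result.hit) return 'hit';
  // Assumption: nearestMiss is omitted only when no candidate came back.
  if (!result.nearestMiss) return 'no-candidate';
  // Judge rejections cleared the threshold, so their delta is <= 0.
  return result.nearestMiss.deltaToThreshold <= 0 ? 'judge-rejected' : 'outside-threshold';
}

// Counts judge rejections per cached entry, e.g. to find entries the judge keeps vetoing.
function tallyJudgeRejections(results: CheckResultLike[]): Map<string, number> {
  const tally = new Map<string, number>();
  for (const r of results) {
    const key = r.nearestMiss?.matchedKey;
    if (classifyResult(r) === 'judge-rejected' && key !== undefined) {
      tally.set(key, (tally.get(key) ?? 0) + 1);
    }
  }
  return tally;
}
```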
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@betterdb/semantic-cache",
3
- "version": "0.2.0",
3
+ "version": "0.5.0",
4
4
  "description": "Valkey-native semantic cache for LLM applications with built-in OpenTelemetry and Prometheus instrumentation",
5
5
  "keywords": [
6
6
  "valkey",