npm - @betterdb/semantic-cache - Versions diffs - 0.1.0 → 0.2.0 - Mend

@betterdb/semantic-cache 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41)

package/README.md +211 -128
package/dist/SemanticCache.d.ts +85 -5
package/dist/SemanticCache.js +689 -47
package/dist/adapters/ai.js +6 -1
package/dist/adapters/anthropic.d.ts +32 -0
package/dist/adapters/anthropic.js +94 -0
package/dist/adapters/langchain.js +6 -1
package/dist/adapters/langgraph.d.ts +104 -0
package/dist/adapters/langgraph.js +271 -0
package/dist/adapters/llamaindex.d.ts +32 -0
package/dist/adapters/llamaindex.js +76 -0
package/dist/adapters/openai-responses.d.ts +31 -0
package/dist/adapters/openai-responses.js +112 -0
package/dist/adapters/openai.d.ts +42 -0
package/dist/adapters/openai.js +97 -0
package/dist/analytics.d.ts +24 -0
package/dist/analytics.js +116 -0
package/dist/cluster.d.ts +10 -0
package/dist/cluster.js +43 -0
package/dist/defaultCostTable.d.ts +11 -0
package/dist/defaultCostTable.js +1976 -0
package/dist/embed/bedrock.d.ts +32 -0
package/dist/embed/bedrock.js +109 -0
package/dist/embed/cohere.d.ts +34 -0
package/dist/embed/cohere.js +37 -0
package/dist/embed/ollama.d.ts +30 -0
package/dist/embed/ollama.js +24 -0
package/dist/embed/openai.d.ts +31 -0
package/dist/embed/openai.js +66 -0
package/dist/embed/voyage.d.ts +31 -0
package/dist/embed/voyage.js +32 -0
package/dist/index.d.ts +6 -1
package/dist/index.js +11 -1
package/dist/normalizer.d.ts +68 -0
package/dist/normalizer.js +102 -0
package/dist/telemetry.d.ts +3 -0
package/dist/telemetry.js +18 -0
package/dist/types.d.ts +107 -7
package/dist/utils.d.ts +58 -0
package/dist/utils.js +30 -0
package/package.json +81 -6

package/README.md CHANGED

@@ -1,6 +1,6 @@
 # @betterdb/semantic-cache
-A standalone, framework-agnostic semantic cache for LLM applications backed by [Valkey](https://valkey.io/) (or Redis). Uses Valkey's vector search (`valkey-search` module) for similarity matching with built-in [OpenTelemetry](https://opentelemetry.io/) tracing and [Prometheus](https://prometheus.io/) metrics via `prom-client`. The first semantic cache library designed to work natively with Valkey and BetterDB Monitor.
+A standalone, framework-agnostic semantic cache for LLM applications backed by [Valkey](https://valkey.io/). Uses Valkey's vector search (`valkey-search` module) for similarity matching with built-in [OpenTelemetry](https://opentelemetry.io/) tracing and [Prometheus](https://prometheus.io/) metrics. Full adapter parity with [`@betterdb/agent-cache`](../agent-cache/).
 ## Prerequisites
@@ -12,153 +12,154 @@ A standalone, framework-agnostic semantic cache for LLM applications backed by [
 ## Installation
 ```bash
-npm install @betterdb/semantic-cache
+npm install @betterdb/semantic-cache iovalkey
 ```
-You must also have `iovalkey` installed (it is a peer dependency):
-```bash
-npm install iovalkey
-```
+`iovalkey` is a required peer dependency.
 ## Why @betterdb/semantic-cache
-As of 2026, no existing semantic cache library simultaneously satisfies all three of the following properties: **Valkey-native** support (explicitly handling `valkey-search` API differences rather than assuming Redis wire compatibility), **standalone** operation (no coupling to LangChain, LiteLLM, AWS, or any other orchestration layer), and **built-in observability** (OpenTelemetry spans and Prometheus metrics emitted at the cache operation level, not just at the HTTP or LLM call level). This package was built to fill that gap.
-| Library / Service | Valkey-native | Standalone | Built-in OTel + Prometheus |
-|---|---|---|---|
-| **@betterdb/semantic-cache** | ✅ | ✅ | ✅ |
-| RedisVL `SemanticCache` | ❌ Redis only | ✅ | ❌ |
-| LangChain `RedisSemanticCache` | ❌ Redis only | ❌ Requires LangChain | ❌ |
-| LiteLLM `redis-semantic` | ❌ Redis only | ❌ Requires LiteLLM | ❌ Partial (no cache metrics) |
-| `langgraph-checkpoint-aws` `ValkeyCache` | ✅ | ❌ Requires AWS + LangGraph | ❌ |
-| Mem0 + Valkey | ✅ | ❌ Full memory framework | ❌ |
-| Redis LangCache | ❌ Redis Cloud only | ❌ Managed service | ✅ Dashboard only |
-| Upstash `semantic-cache` | ❌ Upstash Vector only | ✅ | ❌ |
-| GPTCache | ❌ Abandoned (2023) | ✅ | ❌ |
-- **Valkey-native**: `valkey-search` has API differences from Redis's RediSearch that require explicit handling (see [Valkey Search 1.2 compatibility notes](#valkey-search-12-compatibility-notes) in the changelog). Libraries targeting Redis are not guaranteed to work correctly against self-hosted Valkey or managed Valkey services (ElastiCache, Memorystore).
-- **Standalone**: no dependency on a specific AI framework means you can use this with any LLM client — OpenAI SDK, Anthropic SDK, a local model, or a custom inference endpoint — and swap it out without changing your cache layer.
-- **Built-in OTel + Prometheus**: every `check()` and `store()` call emits a span and increments counters. You get hit rate, similarity score distribution, and latency percentiles in Grafana or any OTel-compatible backend without writing any instrumentation code. If you use [BetterDB Monitor](https://betterdb.com), these metrics are surfaced automatically alongside your other Valkey observability data.
+The only semantic cache library that is simultaneously Valkey-native (explicit handling of `valkey-search` API differences), standalone (no coupling to any AI framework), and has built-in OpenTelemetry + Prometheus instrumentation at the cache operation level.
 ## Quick Start
 ```typescript
 import Valkey from 'iovalkey';
 import { SemanticCache } from '@betterdb/semantic-cache';
+import { createOpenAIEmbed } from '@betterdb/semantic-cache/embed/openai';
 const client = new Valkey({ host: 'localhost', port: 6399 });
 const cache = new SemanticCache({
   client,
-  embedFn: async (text) => {
-    // Any embedding provider works — OpenAI, Voyage AI, Cohere, a local model, etc.
-    const res = await fetch('https://api.voyageai.com/v1/embeddings', {
-      method: 'POST',
-      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${process.env.VOYAGE_API_KEY}` },
-      body: JSON.stringify({ model: 'voyage-3-lite', input: [text] }),
-    });
-    const json = await res.json();
-    return json.data[0].embedding;
-  },
+  embedFn: createOpenAIEmbed(), // or createVoyageEmbed(), createOllamaEmbed(), etc.
+  defaultThreshold: 0.15,       // loosen slightly to catch paraphrases with high confidence
+  defaultTtl: 3600,
 });
 await cache.initialize();
-// Store a response
-await cache.store('What is the capital of France?', 'Paris');
-// Check for a semantically similar prompt
-const result = await cache.check('Capital city of France?');
-// result.hit === true, result.response === 'Paris'
-```
-## Client Lifecycle
-SemanticCache does **not** own the iovalkey client. You create it, you close it:
-```typescript
-const client = new Valkey({ host: 'localhost', port: 6399 });
-const cache = new SemanticCache({ client, embedFn });
-// ... use cache ...
+// Store with cost tracking
+await cache.store('What is the capital of France?', 'Paris', {
+  model: 'gpt-4o-mini',
+  inputTokens: 20,
+  outputTokens: 5,
+});
-// When shutting down, close the client yourself:
-await client.quit();
+// Exact match - always high confidence
+const exact = await cache.check('What is the capital of France?');
+// exact.hit === true
+// exact.confidence === 'high'
+// exact.similarity === 0.0000
+// exact.costSaved === 0.0000085
+// Paraphrase - typically 'uncertain' at threshold 0.1, 'high' at threshold 0.15
+const paraphrase = await cache.check('What city is the capital of France?');
+// paraphrase.hit === true
+// paraphrase.confidence === 'high'    // at threshold 0.15
+// paraphrase.similarity ~= 0.087      // observed with text-embedding-3-small
+// paraphrase.costSaved === 0.0000085
 ```
-## Threshold: Cosine Distance vs Cosine Similarity
+## Threshold and Confidence
-This library uses **cosine distance** (0–2 scale), not cosine similarity (0–1 scale):
+This library uses **cosine distance** (0-2 scale, lower = more similar):
 | Distance | Meaning |
 |----------|---------|
-| 0 | Identical vectors |
-| 1 | Orthogonal (unrelated) |
-| 2 | Opposite vectors |
-A cache lookup is a **hit** when `score <= threshold`. The default threshold of `0.1` is strict — it matches only very similar prompts. Increase to `0.15–0.2` for broader matching.
-The relationship is: `distance = 1 - similarity`. A cosine similarity of 0.95 corresponds to a distance of 0.05.
+| 0.00 | Identical vectors |
+| 0.05-0.10 | Strong paraphrase |
+| 0.10-0.20 | Loose paraphrase / related topic |
+| 1.00 | Orthogonal (unrelated) |
-### Handling uncertain hits
+A lookup is a **hit** when `score <= threshold`. The default threshold is `0.1`.
-When `confidence` is `'uncertain'`, the cached response is technically above
-the similarity threshold but close to the boundary. Three common patterns:
+### Confidence levels
-**Accept and monitor** — return the cached response but track uncertain hits
-separately via the `result: 'uncertain_hit'` Prometheus label. Review them
-periodically to decide if the threshold needs adjustment.
+| `confidence` | When | What to do |
+|---|---|---|
+| `high` | `score <= threshold - uncertaintyBand` (e.g. `<= 0.05`) | Return the cached response directly |
+| `uncertain` | `threshold - band < score <= threshold` (e.g. `0.05–0.10`) | Return the response but consider flagging for review |
+| `miss` | `score > threshold` | No hit - call the LLM |
-**Fall back to LLM** — treat uncertain hits as misses, call the LLM, then
-update the cache entry with `store()` using the fresh response.
+**With real embeddings (`text-embedding-3-small`):**
+- Exact same phrasing: `~0.000` - always `high`
+- Close paraphrase ("Which city is the capital of France?"): `~0.08–0.09` - `uncertain` at default `0.1` threshold, `high` at `0.15`
+- Loose paraphrase ("France's capital?"): `~0.10–0.15` - typically `miss` at `0.1`, `uncertain` at `0.15`
-**Prompt for feedback** — in user-facing applications, show the cached
-response but collect a thumbs up/down signal to identify false positives.
+**Recommended thresholds by use case:**
-A high rate of uncertain hits (visible in the `{prefix}_requests_total`
-metric) indicates the threshold may be too loose for the query distribution.
+| Use case | Threshold | Notes |
+|---|---|---|
+| FAQ / exact match only | `0.05` | Very strict, near-zero false positives |
+| Standard Q&A | `0.10` | Default - paraphrases land as `uncertain` |
+| Conversational / RAG | `0.15` | Paraphrases hit as `high` confidence |
+| Broad search / recall | `0.20` | High hit rate, review uncertain hits |
 ## Configuration Reference
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
-| `name` | `string` | `'betterdb_scache'` | Index name prefix for Valkey keys |
-| `client` | `Valkey` | — | iovalkey client instance (required) |
-| `embedFn` | `(text: string) => Promise<number[]>` | — | Embedding function (required) |
-| `defaultThreshold` | `number` | `0.1` | Cosine distance threshold (0–2) |
-| `defaultTtl` | `number` | `undefined` | Default TTL in seconds for entries |
+| `name` | `string` | `'betterdb_scache'` | Key prefix |
+| `client` | `Valkey` | - | iovalkey client (required) |
+| `embedFn` | `EmbedFn` | - | Embedding function (required) |
+| `defaultThreshold` | `number` | `0.1` | Cosine distance threshold (0-2) |
+| `defaultTtl` | `number` | `undefined` | Default TTL in seconds |
 | `categoryThresholds` | `Record<string, number>` | `{}` | Per-category threshold overrides |
-| `uncertaintyBand` | `number` | `0.05` | Width of the uncertainty band below threshold |
-| `telemetry.tracerName` | `string` | `'@betterdb/semantic-cache'` | OpenTelemetry tracer name |
-| `telemetry.metricsPrefix` | `string` | `'semantic_cache'` | Prometheus metric name prefix |
-| `telemetry.registry` | `Registry` | default registry | prom-client Registry for metrics |
-## Observability
-### Prometheus Metrics
+| `uncertaintyBand` | `number` | `0.05` | Width of uncertainty band below threshold |
+| `costTable` | `Record<string, ModelCost>` | `undefined` | Per-model pricing overrides |
+| `useDefaultCostTable` | `boolean` | `true` | Use bundled LiteLLM price table (1,971 models) |
+| `normalizer` | `BinaryNormalizer` | `defaultNormalizer` | Binary content normalizer |
+| `embeddingCache.enabled` | `boolean` | `true` | Cache computed embeddings in Valkey |
+| `embeddingCache.ttl` | `number` | `86400` | Embedding cache TTL (seconds) |
+| `telemetry.tracerName` | `string` | `'@betterdb/semantic-cache'` | OTel tracer name |
+| `telemetry.metricsPrefix` | `string` | `'semantic_cache'` | Prometheus prefix |
+| `telemetry.registry` | `Registry` | default | prom-client Registry |
-All metric names are prefixed with `semantic_cache_` by default (configurable via `telemetry.metricsPrefix`).
+## Cost Tracking
-| Metric | Type | Labels | Description |
-|--------|------|--------|-------------|
-| `semantic_cache_requests_total` | Counter | `cache_name`, `result`, `category` | Total cache requests. `result` is `hit`, `miss`, or `uncertain_hit` |
-| `semantic_cache_similarity_score` | Histogram | `cache_name`, `category` | Cosine distance scores for lookups with candidates |
-| `semantic_cache_operation_duration_seconds` | Histogram | `cache_name`, `operation` | Duration of cache operations (`check`, `store`, `invalidate`, `initialize`) |
-| `semantic_cache_embedding_duration_seconds` | Histogram | `cache_name` | Duration of embedding function calls |
+Store token counts at cache-time to get per-hit cost savings:
-### OpenTelemetry Tracing
-Every public method emits an OTel span with relevant attributes (`cache.hit`, `cache.similarity`, `cache.threshold`, `cache.confidence`, etc.). Spans require an OpenTelemetry SDK to be configured in the host application — this library uses `@opentelemetry/api` and does not bundle an SDK.
+```typescript
+await cache.store('What is the capital of France?', 'Paris', {
+  model: 'claude-haiku-4-5',   // looked up in bundled LiteLLM price table
+  inputTokens: 42,
+  outputTokens: 12,
+});
-## BetterDB Monitor Integration
+const result = await cache.check('Capital of France?');
+console.log(result.costSaved);  // e.g. 0.000064 (dollars saved on this hit)
-If you connect [BetterDB Monitor](https://github.com/KIvanow/monitor) to the same Valkey instance, it will automatically detect the semantic cache index and surface:
+const stats = await cache.stats();
+console.log(stats.costSavedMicros); // cumulative microdollars saved
+```
-- Hit rate and miss rate over time
-- Similarity score distribution
-- Cache entry count and memory usage
-- Cost savings estimates based on cache hit rates
+Cost savings scale with the model. Observed values from live examples:
+- `gpt-4o-mini`: ~`$0.000006` per hit (cheap model, short responses)
+- `claude-haiku-4-5`: ~`$0.000064` per hit (~10x more expensive)
+- `gpt-4o`: ~`$0.000100` per hit at 20 input / 5 output tokens
+## Adapters
+| Import | Class/Function | Description |
+|---|---|---|
+| `@betterdb/semantic-cache/langchain` | `BetterDBSemanticCache` | LangChain `BaseCache` |
+| `@betterdb/semantic-cache/ai` | `createSemanticCacheMiddleware` | Vercel AI SDK middleware |
+| `@betterdb/semantic-cache/openai` | `prepareSemanticParams` | OpenAI Chat Completions |
+| `@betterdb/semantic-cache/openai-responses` | `prepareSemanticParams` | OpenAI Responses API |
+| `@betterdb/semantic-cache/anthropic` | `prepareSemanticParams` | Anthropic Messages API |
+| `@betterdb/semantic-cache/llamaindex` | `prepareSemanticParams` | LlamaIndex ChatMessage[] |
+| `@betterdb/semantic-cache/langgraph` | `BetterDBSemanticStore` | LangGraph BaseStore |
+## Embedding Helpers
+| Import | Default model | Dimensions |
+|---|---|---|
+| `@betterdb/semantic-cache/embed/openai` | `text-embedding-3-small` | 1536 |
+| `@betterdb/semantic-cache/embed/bedrock` | `amazon.titan-embed-text-v2:0` | 1024 |
+| `@betterdb/semantic-cache/embed/voyage` | `voyage-3-lite` | 512 |
+| `@betterdb/semantic-cache/embed/cohere` | `embed-english-v3.0` | 1024 |
+| `@betterdb/semantic-cache/embed/ollama` | `nomic-embed-text` | 768 |
 ## API
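Reviewer note on the confidence table added in the hunk above: its three rows reduce to two comparisons, one against `threshold` and one against `threshold - uncertaintyBand`. The following is a minimal sketch of that rule, assuming the README defaults (`0.1` and `0.05`). `classifyConfidence` is a hypothetical helper written for illustration; it is not an export of the package, and the package's internal implementation may differ.

```typescript
type Confidence = 'high' | 'uncertain' | 'miss';

// Hypothetical helper (not part of @betterdb/semantic-cache).
// distance is a cosine distance on the 0-2 scale, lower = more similar.
function classifyConfidence(
  distance: number,
  threshold = 0.1,
  uncertaintyBand = 0.05,
): Confidence {
  if (distance > threshold) return 'miss';                     // outside the threshold
  if (distance <= threshold - uncertaintyBand) return 'high';  // well inside it
  return 'uncertain';                                          // inside, but near the boundary
}

console.log(classifyConfidence(0.087));       // uncertain (default threshold 0.1)
console.log(classifyConfidence(0.087, 0.15)); // high
console.log(classifyConfidence(0.12));        // miss
```

With the README's observed paraphrase distance of about `0.087`, this reproduces the documented behaviour: `uncertain` at the default threshold and `high` once the threshold is loosened to `0.15`.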
@@ -168,55 +169,137 @@ Creates or reconnects to the Valkey search index. Must be called before `check()
 ### `cache.check(prompt, options?)`
-Searches for a semantically similar cached prompt. Returns `{ hit, response, similarity, confidence, matchedKey, nearestMiss }`.
+`prompt` is `string | ContentBlock[]`. Returns `CacheCheckResult`:
+| Field | Description |
+|---|---|
+| `hit` | Whether the nearest neighbour's distance was `<= threshold` |
+| `response` | Cached response text. Present on hit |
+| `similarity` | Cosine distance (0-2). Present when a candidate was found |
+| `confidence` | `'high'` / `'uncertain'` / `'miss'` |
+| `costSaved` | Dollars saved on this hit. Present when cost was recorded at store time |
+| `contentBlocks` | Structured response blocks. Present when stored via `storeMultipart()` |
+| `nearestMiss` | On miss with a candidate: `{ similarity, deltaToThreshold }` |
+**Options:** `threshold`, `category`, `filter`, `k`, `staleAfterModelChange`, `currentModel`, `rerank`
 ### `cache.store(prompt, response, options?)`
-Stores a prompt/response pair with its embedding vector. Returns the Valkey key.
+`prompt` is `string | ContentBlock[]`. Returns the Valkey key.
+**Options:** `ttl`, `category`, `model`, `metadata`, `inputTokens`, `outputTokens`, `temperature`, `topP`, `seed`
+### `cache.storeMultipart(prompt, blocks, options?)`
+Stores structured `ContentBlock[]` as the response. On hit, `check()` returns `contentBlocks`.
+### `cache.checkBatch(prompts[], options?)`
+Pipelined multi-prompt lookups. ~50-70% faster than sequential `check()` calls. Returns results in input order.
 ### `cache.invalidate(filter)`
-Deletes entries matching a valkey-search filter expression. Example: `cache.invalidate('@model:{gpt-4o}')`.
+Delete entries matching a `valkey-search` filter (e.g. `'@model:{gpt-4o}'`).
+### `cache.invalidateByModel(model)` / `cache.invalidateByCategory(category)`
+Convenience wrappers around `invalidate()`.
 ### `cache.stats()`
-Returns `{ hits, misses, total, hitRate }` from the Valkey stats hash.
+Returns `{ hits, misses, total, hitRate, costSavedMicros }`.
 ### `cache.indexInfo()`
-Returns index metadata: `{ name, numDocs, dimension, indexingState }`.
+Returns `{ name, numDocs, dimension, indexingState }`.
 ### `cache.flush()`
-Drops the index and all entries. Call `initialize()` again to rebuild.
+Drops the index and all keys. Call `initialize()` again to rebuild.
-## Known limitations
+### `cache.thresholdEffectiveness(options?)`
-### Cluster mode
+Analyzes the rolling similarity score window (last 10,000 entries, up to 7 days) and returns:
+```typescript
+{
+  recommendation: 'tighten_threshold' | 'loosen_threshold' | 'optimal' | 'insufficient_data',
+  recommendedThreshold?: number,  // present when recommendation is tighten/loosen
+  reasoning: string,              // human-readable explanation
+  hitRate: number,
+  uncertainHitRate: number,       // >20% triggers tighten recommendation
+  nearMissRate: number,           // >30% with avg delta <0.03 triggers loosen
+  // ...
+}
+```
+### `cache.thresholdEffectivenessAll(options?)`
+Returns one result per category seen in the window, plus one aggregate `'all'` result.
+## Observability
+### Prometheus Metrics
+| Metric | Type | Labels | Description |
+|--------|------|--------|-------------|
+| `{prefix}_requests_total` | Counter | `cache_name`, `result`, `category` | `result`: `hit`, `miss`, `uncertain_hit` |
+| `{prefix}_similarity_score` | Histogram | `cache_name`, `category` | Cosine distance per lookup |
+| `{prefix}_operation_duration_seconds` | Histogram | `cache_name`, `operation` | End-to-end latency |
+| `{prefix}_embedding_duration_seconds` | Histogram | `cache_name` | Time in `embedFn` |
+| `{prefix}_cost_saved_total` | Counter | `cache_name`, `category` | Dollars saved from hits |
+| `{prefix}_embedding_cache_total` | Counter | `cache_name`, `result` | Embedding cache hit/miss |
+| `{prefix}_stale_model_evictions_total` | Counter | `cache_name` | Evictions from `staleAfterModelChange` |
+### OpenTelemetry
+Every public method emits an OTel span. Requires an OpenTelemetry SDK in the host application.
+## Examples
+Runnable examples in [examples/](./examples/). All examples connect to `localhost:6399` by default (override via `VALKEY_HOST` / `VALKEY_PORT`).
+| Example | API key needed | What it shows |
+|---|---|---|
+| `basic/` | Voyage AI (or `--mock`) | Core store/check/invalidate |
+| `openai/` | OpenAI | Chat Completions + cost tracking |
+| `openai-responses/` | OpenAI | Responses API adapter |
+| `anthropic/` | Anthropic + OpenAI | Messages API, high cost savings (~$0.000064/hit) |
+| `llamaindex/` | OpenAI | ChatMessage[] adapter |
+| `langchain/` | OpenAI | BetterDBSemanticCache + ChatOpenAI |
+| `vercel-ai-sdk/` | OpenAI | createSemanticCacheMiddleware |
+| `langgraph/` | None | BetterDBSemanticStore memory |
+| `multimodal/` | None | ContentBlock[] with text + image |
+| `cost-tracking/` | None | Cost savings with mock embedder |
+| `threshold-tuning/` | None | thresholdEffectiveness() |
+| `embedding-cache/` | None | Embedding cache on/off comparison |
+| `batch-check/` | None | checkBatch() vs sequential |
+| `rerank/` | None | Top-k rerank hook |
-`@betterdb/semantic-cache` works with single-node Valkey instances and managed
-single-endpoint services (Amazon ElastiCache for Valkey, Google Cloud Memorystore
-for Valkey). It does not fully support Valkey in cluster mode.
+## Client Lifecycle
+SemanticCache does **not** own the iovalkey client:
+```typescript
+const client = new Valkey({ host: 'localhost', port: 6399 });
+const cache = new SemanticCache({ client, embedFn });
+// ... use cache ...
+await client.quit();
+```
-The specific issue is `flush()`: it uses `SCAN` to find and delete entry keys,
-but `SCAN` in cluster mode only iterates keys on the node it is sent to. In a
-multi-node cluster, `flush()` will silently leave entry keys on other nodes
-(the FT index itself is dropped correctly).
+## Known Limitations
-`check()`, `store()`, `invalidate()`, and `stats()` are unaffected — these use
-`FT.SEARCH`, `HSET`, `DEL`, and `HINCRBY` which route correctly in cluster mode
-via the key hash slot.
+### Cluster mode
-If you need cluster support, either avoid `flush()` or implement a cluster-aware
-key sweep using the iovalkey cluster client's per-node scan capability.
-Cluster mode support is planned for a future release.
+`flush()` fans out via `clusterScan()` across all master nodes. `FT.SEARCH` routes correctly via hash slots. `FT.CREATE` only creates the index on the receiving node - in a full cluster, create the index on each node separately.
 ### Streaming
-Streaming LLM responses are not supported. `store()` expects a complete response
-string. If your application uses streaming, accumulate the full response before
-calling `store()`. The cached response is always returned as a complete string,
-not re-streamed token-by-token.
+`store()` requires a complete response string. The Vercel AI SDK adapter does not implement `wrapStream`. Accumulate the full streamed response before calling `store()`.
+### Schema migration (v0.1 -> v0.2)
+v0.2.0 added `binary_refs`, `temperature`, `top_p`, `seed` fields to the index schema. Existing v0.1.0 indexes operate in text-only mode until `flush()` + `initialize()` rebuilds the schema.
 ## License

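Reviewer note on `costSaved` and `costSavedMicros`: the README diff does not state the formula. A plausible reading is token counts multiplied by per-token prices, with the cumulative counter kept in microdollars. The sketch below works under that assumption. `ModelCostSketch` and `estimateCostSaved` are hypothetical names, the real `ModelCost` type may have a different shape, and the prices are assumed values, chosen because they reproduce the README's `~$0.000100` figure for `gpt-4o` at 20 input / 5 output tokens.

```typescript
// Hypothetical shape; the package's actual ModelCost type may differ.
interface ModelCostSketch {
  inputCostPerToken: number;  // dollars per input token
  outputCostPerToken: number; // dollars per output token
}

// Hypothetical helper: dollars that a cache hit avoids spending on the LLM call.
function estimateCostSaved(
  cost: ModelCostSketch,
  inputTokens: number,
  outputTokens: number,
): number {
  return inputTokens * cost.inputCostPerToken + outputTokens * cost.outputCostPerToken;
}

// Assumed prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
const assumedGpt4o: ModelCostSketch = {
  inputCostPerToken: 2.5e-6,
  outputCostPerToken: 10e-6,
};

const dollars = estimateCostSaved(assumedGpt4o, 20, 5);
const micros = Math.round(dollars * 1e6); // microdollars, as in stats().costSavedMicros

console.log(dollars.toFixed(6)); // 0.000100
console.log(micros);             // 100
```

Keeping the running total as integer microdollars avoids accumulating floating-point error across many small per-hit amounts, which may be why `stats()` reports `costSavedMicros` rather than dollars.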
package/dist/SemanticCache.d.ts CHANGED

@@ -1,4 +1,5 @@
 import type { SemanticCacheOptions, CacheCheckOptions, CacheStoreOptions, CacheCheckResult, CacheStats, IndexInfo, InvalidateResult } from './types';
+import { type ContentBlock } from './utils';
 export declare class SemanticCache {
     private readonly client;
     private readonly embedFn;
@@ -6,15 +7,27 @@ export declare class SemanticCache {
     private readonly indexName;
     private readonly entryPrefix;
     private readonly statsKey;
+    private readonly similarityWindowKey;
     private readonly defaultThreshold;
     private readonly defaultTtl;
     private readonly categoryThresholds;
     private readonly uncertaintyBand;
     private readonly telemetry;
+    private readonly costTable;
+    private readonly embeddingCacheEnabled;
+    private readonly embeddingCacheTtl;
+    private readonly embedKeyPrefix;
     private _initialized;
     private _dimension;
+    private _hasBinaryRefs;
     private _initPromise;
     private _initGeneration;
+    private readonly analyticsOpts;
+    private readonly usesDefaultCostTable;
+    private analytics;
+    private statsTimer;
+    private shutdownCalled;
+    private analyticsInitiated;
     /**
      * Creates a new SemanticCache instance.
      *
@@ -27,33 +40,100 @@ export declare class SemanticCache {
     constructor(options: SemanticCacheOptions);
     initialize(): Promise<void>;
     flush(): Promise<void>;
-    check(prompt: string, options?: CacheCheckOptions): Promise<CacheCheckResult>;
-    store(prompt: string, response: string, options?: CacheStoreOptions): Promise<string>;
+    /** Shut down the analytics client and cancel the stats timer. */
+    shutdown(): Promise<void>;
+    check(prompt: string | ContentBlock[], options?: CacheCheckOptions): Promise<CacheCheckResult>;
+    store(prompt: string | ContentBlock[], response: string, options?: CacheStoreOptions): Promise<string>;
+    /**
+     * Store structured content blocks as the cached response.
+     * Populates both the response field (from TextBlock text) and content_blocks (full JSON).
+     */
+    storeMultipart(prompt: string | ContentBlock[], blocks: ContentBlock[], options?: CacheStoreOptions): Promise<string>;
+    /**
+     * Check multiple prompts in parallel, using pipelined FT.SEARCH calls.
+     * Returns results in input order.
+     */
+    checkBatch(prompts: (string | ContentBlock[])[], options?: CacheCheckOptions): Promise<CacheCheckResult[]>;
     /**
      * Deletes all entries matching a valkey-search filter expression.
      *
      * **Security note:** `filter` is passed directly to FT.SEARCH. Only pass
-     * trusted, programmatically-constructed expressions — never unsanitised
+     * trusted, programmatically-constructed expressions - never unsanitised
      * user input.
      */
     invalidate(filter: string): Promise<InvalidateResult>;
+    /** Delete all entries tagged with the given model name. */
+    invalidateByModel(model: string): Promise<number>;
+    /** Delete all entries tagged with the given category. */
+    invalidateByCategory(category: string): Promise<number>;
     stats(): Promise<CacheStats>;
     indexInfo(): Promise<IndexInfo>;
+    /**
+     * Analyze the rolling similarity score window and recommend threshold adjustments.
+     */
+    thresholdEffectiveness(options?: {
+        category?: string;
+        minSamples?: number;
+    }): Promise<ThresholdEffectivenessResult>;
+    /**
+     * Returns threshold effectiveness results for every category seen in the
+     * rolling window, plus one aggregate result for all categories combined.
+     */
+    thresholdEffectivenessAll(options?: {
+        minSamples?: number;
+    }): Promise<ThresholdEffectivenessResult[]>;
+    /** @internal Default similarity threshold. */
+    get _defaultThreshold(): number;
+    /**
+     * Execute a stable FT.SEARCH for use by adapters (e.g. LangGraph).
+     * SORTBY inserted_at ASC gives stable ordering across paginated calls.
+     * @internal
+     */
+    _searchEntries(filterExpr: string, limit: number, offset: number): Promise<unknown>;
+    /**
+     * Embed text for use by adapters (e.g. LangGraph semantic search).
+     * @internal
+     */
+    _embedText(text: string): Promise<{
+        vector: number[];
+        durationSec: number;
+    }>;
     private _doInitialize;
+    private initAnalyticsSafe;
+    private captureStatsSnapshot;
     private ensureIndexAndGetDimension;
-    /** Wraps embedFn with error handling and duration tracking. */
+    /** Check if the index schema has a binary_refs field. */
+    private parseHasBinaryRefsFromInfo;
+    /** Resolve a prompt (string or ContentBlock[]) into text + binary refs. */
+    private resolvePrompt;
+    /** Wraps embedFn with error handling, duration tracking, and optional embedding cache. */
     private embed;
     /**
      * Wraps a method body in an OTel span with automatic status, end, and
      * operation duration metric. The span is passed to fn so callers can
-     * set attributes — but callers must NOT call span.end() or span.setStatus(),
+     * set attributes - but callers must NOT call span.end() or span.setStatus(),
      * as traced() handles both.
      */
     private traced;
     /** Increment stats counters via pipeline. */
     private recordStat;
+    /** Append to the rolling similarity window sorted set and trim to 10,000 entries or 7 days. */
+    private recordSimilarityWindow;
     private assertInitialized;
     private assertDimension;
     private isIndexNotFoundError;
     private parseDimensionFromInfo;
 }
+export interface ThresholdEffectivenessResult {
+    category: string;
+    sampleCount: number;
+    currentThreshold: number;
+    hitRate: number;
+    uncertainHitRate: number;
+    nearMissRate: number;
+    avgHitSimilarity: number;
+    avgMissSimilarity: number;
+    recommendation: 'tighten_threshold' | 'loosen_threshold' | 'optimal' | 'insufficient_data';
+    recommendedThreshold?: number;
+    reasoning: string;
+}