openclaw-langcache 1.0.0

@@ -0,0 +1,260 @@
+ # Redis LangCache REST API Reference
+
+ Complete documentation for the Redis LangCache REST API.
+
+ ## Base URL
+
+ ```
+ https://{LANGCACHE_HOST}/v1/caches/{CACHE_ID}
+ ```
+
+ ## Authentication
+
+ All requests require a Bearer token in the Authorization header:
+
+ ```
+ Authorization: Bearer {API_KEY}
+ ```
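+
+ For example, a minimal Python client setup (a sketch using the `requests` library; the host, cache ID, and key values below are placeholders):
+
+ ```python
+ import requests
+
+ LANGCACHE_HOST = "api.langcache.example"  # placeholder host
+ CACHE_ID = "your-cache-id"                # placeholder cache ID
+ API_KEY = "your-api-key"                  # placeholder API key
+
+ BASE_URL = f"https://{LANGCACHE_HOST}/v1/caches/{CACHE_ID}"
+
+ # Reuse one session so every request carries the Bearer token
+ session = requests.Session()
+ session.headers.update({
+     "Authorization": f"Bearer {API_KEY}",
+     "Content-Type": "application/json",
+ })
+ ```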
+
+ ## Endpoints
+
+ ### Search for Cached Response
+
+ Search the cache for semantically similar prompts.
+
+ ```
+ POST /v1/caches/{cacheId}/entries/search
+ ```
+
+ **Request Body:**
+
+ | Field | Type | Required | Description |
+ |-------|------|----------|-------------|
+ | `prompt` | string | Yes | The prompt to search for |
+ | `similarityThreshold` | number | No | Similarity threshold (0.0-1.0). Higher = stricter matching |
+ | `searchStrategies` | array | No | `["exact"]`, `["semantic"]`, or `["exact", "semantic"]` |
+ | `attributes` | object | No | Key-value pairs to filter results |
+
+ **Example Request:**
+
+ ```json
+ {
+   "prompt": "What is semantic caching?",
+   "similarityThreshold": 0.9,
+   "searchStrategies": ["semantic"],
+   "attributes": {
+     "model": "gpt-5"
+   }
+ }
+ ```
+
+ **Response (Cache Hit):**
+
+ ```json
+ {
+   "hit": true,
+   "entryId": "abc123",
+   "prompt": "What is semantic caching?",
+   "response": "Semantic caching stores and retrieves data based on meaning similarity...",
+   "similarity": 0.95,
+   "attributes": {
+     "model": "gpt-5"
+   }
+ }
+ ```
+
+ **Response (Cache Miss):**
+
+ ```json
+ {
+   "hit": false
+ }
+ ```
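+
+ Putting the pieces together, a sketch of a search call using the `session` and `BASE_URL` defined under Authentication:
+
+ ```python
+ # Search the cache for a semantically similar prompt
+ resp = session.post(
+     f"{BASE_URL}/entries/search",
+     json={
+         "prompt": "What is semantic caching?",
+         "similarityThreshold": 0.9,
+         "searchStrategies": ["semantic"],
+         "attributes": {"model": "gpt-5"},
+     },
+ )
+ resp.raise_for_status()
+ result = resp.json()
+ if result["hit"]:
+     print(f"Cache hit ({result['similarity']:.2f}): {result['response']}")
+ else:
+     print("Cache miss")
+ ```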
+
+ ---
+
+ ### Store New Entry
+
+ Store a prompt-response pair in the cache.
+
+ ```
+ POST /v1/caches/{cacheId}/entries
+ ```
+
+ **Request Body:**
+
+ | Field | Type | Required | Description |
+ |-------|------|----------|-------------|
+ | `prompt` | string | Yes | The prompt to cache |
+ | `response` | string | Yes | The LLM response to cache |
+ | `attributes` | object | No | Key-value metadata for filtering/organization |
+
+ **Example Request:**
+
+ ```json
+ {
+   "prompt": "What is semantic caching?",
+   "response": "Semantic caching is a technique that stores and retrieves cached data based on semantic similarity rather than exact matches.",
+   "attributes": {
+     "model": "gpt-5",
+     "skill": "general-qa",
+     "version": "1.0"
+   }
+ }
+ ```
+
+ **Response:**
+
+ ```json
+ {
+   "entryId": "abc123",
+   "created": true
+ }
+ ```
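+
+ The same raw-HTTP sketch for storing an entry (again assuming the `session` and `BASE_URL` from Authentication):
+
+ ```python
+ # Store a prompt-response pair; the API returns the new entry's ID
+ resp = session.post(
+     f"{BASE_URL}/entries",
+     json={
+         "prompt": "What is semantic caching?",
+         "response": "Semantic caching is a technique that stores and retrieves "
+                     "cached data based on semantic similarity rather than exact matches.",
+         "attributes": {"model": "gpt-5", "skill": "general-qa", "version": "1.0"},
+     },
+ )
+ resp.raise_for_status()
+ entry_id = resp.json()["entryId"]
+ ```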
+
+ ---
+
+ ### Delete Entry by ID
+
+ Delete a specific cache entry.
+
+ ```
+ DELETE /v1/caches/{cacheId}/entries/{entryId}
+ ```
+
+ **Response:**
+
+ ```json
+ {
+   "deleted": true
+ }
+ ```
+
+ ---
+
+ ### Delete Entries by Attributes
+
+ Delete all entries matching the specified attributes.
+
+ ```
+ DELETE /v1/caches/{cacheId}/entries
+ ```
+
+ **Request Body:**
+
+ ```json
+ {
+   "attributes": {
+     "user_id": "123"
+   }
+ }
+ ```
+
+ **Response:**
+
+ ```json
+ {
+   "deletedCount": 15
+ }
+ ```
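+
+ Both delete endpoints in one sketch (note that the bulk delete sends a JSON body with the DELETE request, as documented above):
+
+ ```python
+ # Delete a single entry by its ID
+ session.delete(f"{BASE_URL}/entries/{entry_id}")
+
+ # Delete every entry matching the given attributes
+ resp = session.delete(
+     f"{BASE_URL}/entries",
+     json={"attributes": {"user_id": "123"}},
+ )
+ print(resp.json()["deletedCount"], "entries deleted")
+ ```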
+
+ ---
+
+ ### Flush Cache
+
+ Delete all entries in the cache.
+
+ ```
+ POST /v1/caches/{cacheId}/flush
+ ```
+
+ **Response:**
+
+ ```json
+ {
+   "flushed": true
+ }
+ ```
+
+ **Warning:** This operation cannot be undone.
+
+ ---
+
+ ## Search Strategies
+
+ | Strategy | Description |
+ |----------|-------------|
+ | `exact` | Case-insensitive exact match on prompt text |
+ | `semantic` | Vector similarity search using embeddings |
+
+ When both strategies are specified, exact match is checked first; if no exact match is found, semantic search is performed.
+
+ ## Similarity Threshold
+
+ The `similarityThreshold` parameter controls how similar a cached prompt must be to return a hit:
+
+ - `1.0` - Exact semantic match only
+ - `0.95` - Very high similarity (recommended for factual queries)
+ - `0.90` - High similarity (good default)
+ - `0.85` - Moderate similarity (allows more variation)
+ - `0.80` - Lower similarity (may return less relevant results)
+
+ The optimal threshold depends on your use case. Start with `0.90` and adjust based on hit/miss quality.
+
+ ## Attributes
+
+ Attributes are key-value pairs attached to cache entries. Use them to:
+
+ 1. **Partition the cache** - Separate entries by user, model, or context
+ 2. **Filter searches** - Only match entries with specific attributes
+ 3. **Bulk delete** - Remove all entries matching attributes
+
+ **Common attribute patterns:**
+
+ ```json
+ {
+   "model": "gpt-5.2",
+   "user_id": "user_123",
+   "skill": "code-review",
+   "language": "en",
+   "version": "2.0"
+ }
+ ```
+
+ ## Error Responses
+
+ | Status | Description |
+ |--------|-------------|
+ | 400 | Bad request (invalid JSON or missing required fields) |
+ | 401 | Unauthorized (invalid or missing API key) |
+ | 404 | Cache or entry not found |
+ | 429 | Rate limited |
+ | 500 | Internal server error |
+
+ **Error Response Format:**
+
+ ```json
+ {
+   "error": {
+     "code": "INVALID_REQUEST",
+     "message": "Missing required field: prompt"
+   }
+ }
+ ```
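+
+ A minimal handling sketch for this envelope, reusing the `resp` objects from the earlier `requests` examples:
+
+ ```python
+ # Surface the documented error envelope: {"error": {"code", "message"}}
+ if resp.status_code >= 400:
+     err = resp.json().get("error", {})
+     print(f"{resp.status_code} {err.get('code')}: {err.get('message')}")
+ ```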
+
+ ## Rate Limits
+
+ Rate limits depend on your Redis Cloud plan. Monitor the `X-RateLimit-*` headers in responses:
+
+ - `X-RateLimit-Limit` - Maximum requests per window
+ - `X-RateLimit-Remaining` - Requests remaining
+ - `X-RateLimit-Reset` - Unix timestamp when limit resets
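+
+ For example, a simple backoff sketch based on these headers (header names as listed above; the fallback values are assumptions):
+
+ ```python
+ import time
+
+ remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
+ reset_at = int(resp.headers.get("X-RateLimit-Reset", "0"))
+ if remaining == 0:
+     # Sleep until the window resets before retrying
+     time.sleep(max(0, reset_at - int(time.time())))
+ ```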
+
+ ## SDKs
+
+ Official SDKs are available for:
+
+ - **Python:** `pip install langcache`
+ - **JavaScript/Node:** `npm install @redis-ai/langcache`
+
+ See [Redis LangCache documentation](https://redis.io/docs/latest/develop/ai/langcache/) for SDK usage examples.
@@ -0,0 +1,215 @@
+ # LangCache Best Practices
+
+ Optimization techniques for effective semantic caching with Redis LangCache.
+
+ ## 1. Choose the Right Similarity Threshold
+
+ The threshold directly impacts cache hit rate vs. relevance:
+
+ | Threshold | Hit Rate | Relevance | Best For |
+ |-----------|----------|-----------|----------|
+ | 0.98+ | Very Low | Very High | Safety-critical, exact answers only |
+ | 0.93-0.97 | Low | High | Factual Q&A, documentation lookups |
+ | 0.88-0.92 | Medium | Good | General queries, support questions |
+ | 0.83-0.87 | High | Moderate | Exploratory queries, suggestions |
+ | < 0.83 | Very High | Low | Not recommended (too many false positives) |
+
+ **Recommendation:** Start at `0.90` and adjust based on observed quality.
+
+ ## 2. Use Attributes Strategically
+
+ ### Partition by Model
+
+ Different models produce different responses. Always include the model:
+
+ ```json
+ {"attributes": {"model": "gpt-5.2"}}
+ ```
+
+ ### Partition by User (When Appropriate)
+
+ For personalized responses, include the user ID:
+
+ ```json
+ {"attributes": {"user_id": "123"}}
+ ```
+
+ For shared knowledge (FAQs, docs), omit the user ID to maximize cache hits.
+
+ ### Version Your Cache
+
+ When prompts or system behavior changes, bump the version:
+
+ ```json
+ {"attributes": {"version": "2.1"}}
+ ```
+
+ This prevents stale responses from old prompt formats.
+
+ ## 3. Normalize Prompts Before Caching
+
+ Semantic similarity helps, but normalizing prompts improves hit rates:
+
+ **Before storing/searching:**
+ - Trim whitespace
+ - Lowercase (if case doesn't matter)
+ - Remove filler words ("um", "uh", "please", "can you")
+ - Standardize punctuation
+
+ ```python
+ import re
+
+ def normalize_prompt(prompt: str) -> str:
+     prompt = prompt.strip().lower()
+     prompt = re.sub(r'\s+', ' ', prompt)  # Collapse whitespace
+     prompt = re.sub(r'^(please |can you |could you )', '', prompt)  # Strip leading filler
+     return prompt
+ ```
+
+ ## 4. Don't Cache Everything
+
+ ### Good Candidates for Caching
+
+ - **Factual queries:** "What is X?", "How does Y work?"
+ - **Documentation lookups:** "Show me the API for Z"
+ - **Repeated patterns:** Common skill invocations
+ - **Static information:** Help text, feature descriptions
+
+ ### Bad Candidates for Caching
+
+ - **Time-sensitive:** "What's the weather?", "What's on my calendar?"
+ - **Context-dependent:** Responses that depend on conversation history
+ - **Personalized:** Responses tailored to user preferences/state
+ - **Creative:** Tasks where variation is desired
+ - **Stateful:** Responses that modify system state
+
+ ## 5. Implement Cache-Aside Pattern
+
+ The standard pattern for integrating LangCache:
+
+ ```python
+ import asyncio
+
+ # Assumes `langcache` and `llm` client objects, a MODEL_ID constant,
+ # and a log_cache_hit() metrics helper are defined elsewhere.
+
+ async def get_response(prompt: str, context: dict) -> str:
+     # 1. Normalize prompt
+     normalized = normalize_prompt(prompt)
+
+     # 2. Check cache
+     cached = await langcache.search(
+         prompt=normalized,
+         similarity_threshold=0.9,
+         attributes={"model": MODEL_ID}
+     )
+
+     if cached.hit:
+         log_cache_hit(cached.similarity)
+         return cached.response
+
+     # 3. Call LLM on cache miss
+     response = await llm.complete(prompt, context)
+
+     # 4. Store in cache (async, don't block response)
+     asyncio.create_task(
+         langcache.store(
+             prompt=normalized,
+             response=response,
+             attributes={"model": MODEL_ID}
+         )
+     )
+
+     return response
+ ```
+
+ ## 6. Use Hybrid Search for Exact + Semantic
+
+ For maximum efficiency, check for an exact match first:
+
+ ```json
+ {
+   "prompt": "What is Redis?",
+   "searchStrategies": ["exact", "semantic"],
+   "similarityThreshold": 0.9
+ }
+ ```
+
+ Exact matches are faster and guaranteed to be relevant; semantic search is the fallback.
+
+ ## 7. Monitor and Tune
+
+ ### Key Metrics to Track
+
+ - **Hit rate:** `cache_hits / total_requests`
+ - **Similarity distribution:** Histogram of similarity scores for hits
+ - **Miss reasons:** Why queries miss (no similar entry vs. below threshold)
+ - **Latency:** Cache lookup time vs. LLM call time
+
+ ### Tuning Workflow
+
+ 1. Start with threshold `0.90`
+ 2. Log all cache hits with similarity scores
+ 3. Review hits with similarity `0.85-0.92` - are they relevant?
+ 4. If too many irrelevant hits: raise threshold
+ 5. If too many misses on similar queries: lower threshold
+ 6. Repeat weekly as usage patterns evolve
+
+ ## 8. Handle Cache Invalidation
+
+ ### Time-Based (TTL)
+
+ Configure TTL at the cache level for automatic expiration:
+
+ - Short TTL (hours): Fast-changing information
+ - Medium TTL (days): General knowledge
+ - Long TTL (weeks): Stable documentation
+
+ ### Event-Based
+
+ Invalidate when underlying data changes:
+
+ ```python
+ # When documentation is updated
+ await langcache.delete_query(attributes={"category": "docs", "version": "1.0"})
+ ```
+
+ ### Version-Based
+
+ Instead of deleting, bump the version attribute:
+
+ ```python
+ # Old entries remain but won't match new searches
+ attributes = {"version": "2.0"}  # was "1.0"
+ ```
+
+ ## 9. Warm the Cache
+
+ For predictable queries, pre-populate the cache:
+
+ ```python
+ # Warm cache with common questions
+ faqs = [
+     ("What is OpenClaw?", "OpenClaw is an AI agent platform..."),
+     ("How do I create a skill?", "To create a skill, create a SKILL.md file..."),
+ ]
+
+ for prompt, response in faqs:
+     await langcache.store(
+         prompt=prompt,
+         response=response,
+         attributes={"category": "faq", "version": "1.0"}
+     )
+ ```
+
+ ## 10. Cost-Benefit Analysis
+
+ ### When Caching Saves Money
+
+ ```
+ Savings = (LLM_cost_per_call × cache_hits) - LangCache_cost
+ ```
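+
+ A worked instance of the formula (all figures below are hypothetical, not Redis pricing):
+
+ ```python
+ # Hypothetical monthly numbers, for illustration only
+ llm_cost_per_call = 0.03   # dollars saved per avoided LLM call
+ total_requests = 100_000
+ hit_rate = 0.25            # 25% of requests served from cache
+ langcache_cost = 200.00    # assumed flat monthly cost
+
+ cache_hits = total_requests * hit_rate
+ savings = llm_cost_per_call * cache_hits - langcache_cost
+ print(f"Monthly savings: ${savings:,.2f}")  # $550.00 with these numbers
+ ```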
+
+ Caching is worthwhile when:
+
+ - LLM calls are expensive (GPT-4, Claude Opus)
+ - Queries are repetitive
+ - Hit rate > 20-30% (depends on relative costs)
+
+ ### When Caching Adds Complexity Without Benefit
+
+ - Low query volume (< 100/day)
+ - Highly unique queries (< 10% potential hit rate)
+ - Cheap LLM model (caching overhead not worth it)
+ - Real-time requirements where stale data is unacceptable