openclaw-langcache 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +6 -4
- package/claude-skills/langcache/SKILL.md +173 -0
- package/claude-skills/langcache/examples/agent-integration.py +453 -0
- package/claude-skills/langcache/examples/basic-caching.sh +56 -0
- package/claude-skills/langcache/references/api-reference.md +260 -0
- package/claude-skills/langcache/references/best-practices.md +215 -0
- package/claude-skills/langcache/scripts/langcache.sh +528 -0
- package/package.json +14 -8
- package/scripts/postinstall.js +116 -0
package/README.md
CHANGED
@@ -1,6 +1,6 @@
-#
+# openclaw-langcache
 
-Semantic caching skill for [OpenClaw](https://openclaw.ai) using [Redis LangCache](https://redis.io/langcache/).
+Semantic caching skill for [OpenClaw](https://openclaw.ai) and [Claude Code](https://claude.ai/code) using [Redis LangCache](https://redis.io/langcache/).
 
 Reduce LLM costs and latency by caching responses for semantically similar queries, with built-in privacy and security guardrails.
 
@@ -20,10 +20,12 @@ Reduce LLM costs and latency by caching responses for semantically similar queries
 ### Via npm (Recommended)
 
 ```bash
-npm install
+npm install -g openclaw-langcache
 ```
 
-The skill will be automatically installed to
+The skill will be automatically installed to:
+- **OpenClaw**: `~/.openclaw/workspace/skills/langcache/`
+- **Claude Code**: `~/.claude/skills/langcache/`
 
 ### Via Git
 
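
After a global install, the skill is presumably copied into place by the bundled `postinstall.js` (see the file list above); a quick sanity check against the paths the README names:

```bash
# Confirm the skill landed where the README says it should
ls ~/.openclaw/workspace/skills/langcache/
ls ~/.claude/skills/langcache/
```
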
package/claude-skills/langcache/SKILL.md
ADDED
@@ -0,0 +1,173 @@
---
name: langcache
description: This skill should be used when the user asks to "enable semantic caching", "cache LLM responses", "reduce API costs", "speed up AI responses", "configure LangCache", "check the cache", or mentions Redis LangCache, semantic similarity caching, or LLM response caching. Provides integration with Redis LangCache managed service for semantic caching of prompts and responses.
version: 1.0.0
tools: Read, Bash, WebFetch
---

# Redis LangCache Semantic Caching

Integrate [Redis LangCache](https://redis.io/langcache/) for semantic caching of LLM prompts and responses. Reduces costs and latency by returning cached results for semantically similar queries.

## Prerequisites

Set credentials in environment or `~/.claude/settings.local.json`:

```bash
export LANGCACHE_HOST=your-instance.redis.cloud
export LANGCACHE_CACHE_ID=your-cache-id
export LANGCACHE_API_KEY=your-api-key
```
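
A minimal pre-flight check (plain shell, nothing assumed beyond the variables above) aborts early with a clear message if any credential is missing:

```bash
# Fail fast if any required variable is unset
: "${LANGCACHE_HOST:?not set}" "${LANGCACHE_CACHE_ID:?not set}" "${LANGCACHE_API_KEY:?not set}"
```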

## Quick Reference

| Operation | Command |
|-----------|---------|
| Search cache | `./scripts/langcache.sh search "query"` |
| Store response | `./scripts/langcache.sh store "prompt" "response"` |
| Check if blocked | `./scripts/langcache.sh check "text"` |
| Delete entry | `./scripts/langcache.sh delete --id <id>` |
| Flush cache | `./scripts/langcache.sh flush` |

## Default Caching Policy

This policy is **enforced automatically** by the CLI and integration code.

### CACHEABLE (white-list)

| Category | Examples | Threshold |
|----------|----------|-----------|
| Factual Q&A | "What is X?", "How does Y work?" | 0.90 |
| Definitions / docs | API docs, command help | 0.90 |
| Command explanations | "What does `git rebase` do?" | 0.92 |
| Reply templates | "polite no", "follow-up", "intro" | 0.88 |
| Style transforms | "make this warmer/shorter" | 0.85 |
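
These thresholds map directly onto the CLI flags; for instance, a command-explanation lookup would pair the stricter 0.92 cutoff with a matching `category` attribute (values illustrative):

```bash
./scripts/langcache.sh search "What does git rebase do?" --threshold 0.92 --attr "category=command"
```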

### NEVER CACHE (hard blocks)

| Category | Patterns | Reason |
|----------|----------|--------|
| **Temporal** | today, tomorrow, deadline, ETA, "in 20 min" | Stale immediately |
| **Credentials** | API keys, passwords, tokens, OTP/2FA | Security |
| **Identifiers** | emails, phones, account IDs, UUIDs | Privacy/PII |
| **Personal** | "my wife said", relationships, private chats | Privacy |

## Core Operations

### Search for Cached Response

Before calling an LLM, check for a semantically similar cached response:

```bash
# Basic search
./scripts/langcache.sh search "What is semantic caching?"

# With similarity threshold (0.0-1.0, higher = stricter)
./scripts/langcache.sh search "What is semantic caching?" --threshold 0.95

# With attribute filtering
./scripts/langcache.sh search "query" --attr "model=gpt-5"
```

**Response (hit):**
```json
{"hit": true, "response": "...", "similarity": 0.94, "entryId": "abc123"}
```

**Response (miss):**
```json
{"hit": false}
```

**Response (blocked):**
```json
{"hit": false, "blocked": true, "reason": "temporal_info"}
```
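
Since the CLI emits JSON, a wrapper can branch on the `hit` field. A minimal sketch, assuming `jq` is available (`call_llm` is a hypothetical stand-in for your own model call):

```bash
result=$(./scripts/langcache.sh search "What is semantic caching?")
if [ "$(jq -r '.hit' <<<"$result")" = "true" ]; then
  jq -r '.response' <<<"$result"        # serve the cached answer
else
  call_llm "What is semantic caching?"  # hypothetical: fall through to the model
fi
```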

### Store New Response

After an LLM call, cache the response for future use:

```bash
# Basic store
./scripts/langcache.sh store "What is Redis?" "Redis is an in-memory data store..."

# With attributes for organization/filtering
./scripts/langcache.sh store "prompt" "response" --attr "model=gpt-5" --attr "category=factual"
```

### Check Policy Compliance

Test if content would be blocked:

```bash
./scripts/langcache.sh check "What's on my calendar today?"
# Output: BLOCKED: temporal_info

./scripts/langcache.sh check "What is Redis?"
# Output: ALLOWED: Content can be cached
```

### Delete Entries

```bash
# By ID
./scripts/langcache.sh delete --id "abc123"

# By attributes (bulk)
./scripts/langcache.sh delete --attr "model=gpt-4"
```

### Flush Cache

```bash
./scripts/langcache.sh flush  # Interactive confirmation
```

## Integration Pattern

Recommended cache-aside pattern for agent workflows (a shell sketch follows the steps):

```
1. Receive user prompt
2. Check policy: is it cacheable?
   - If blocked → skip cache, call LLM
3. Search LangCache for similar cached response
   - If hit (similarity ≥ threshold) → return cached
4. Call LLM API
5. Store prompt + response in LangCache
6. Return response
```
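
A shell rendering of those six steps; a sketch only, where `call_llm` is a hypothetical stand-in for your provider call and `jq` is assumed installed:

```bash
#!/usr/bin/env bash
# cache-aside.sh - illustrative wrapper around langcache.sh (not part of the package)
prompt="$1"

# Step 2: policy gate - blocked content skips the cache entirely
if ./scripts/langcache.sh check "$prompt" | grep -q '^BLOCKED'; then
  call_llm "$prompt"
  exit 0
fi

# Step 3: return a sufficiently similar cached response if one exists
result=$(./scripts/langcache.sh search "$prompt")
if [ "$(jq -r '.hit' <<<"$result")" = "true" ]; then
  jq -r '.response' <<<"$result"
  exit 0
fi

# Steps 4-6: call the model, store the pair, return the fresh response
response=$(call_llm "$prompt")
./scripts/langcache.sh store "$prompt" "$response"
printf '%s\n' "$response"
```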

## Search Strategies

| Strategy | Description |
|----------|-------------|
| `semantic` (default) | Vector similarity matching |
| `exact` | Case-insensitive exact match |
| `exact,semantic` | Try exact first, fall back to semantic |

```bash
./scripts/langcache.sh search "query" --strategy "exact,semantic"
```

## Attributes for Cache Partitioning

Use attributes to organize and filter cache entries:

| Attribute | Purpose |
|-----------|---------|
| `model` | Separate caches per LLM model |
| `category` | `factual`, `template`, `style`, `command` |
| `version` | Invalidate when prompts change |
| `user_id` | Per-user isolation (if needed) |
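
Tagging entries with a `version` attribute, for example, turns a prompt-template rollover into a one-line bulk delete (values illustrative):

```bash
# Entries stored while the v2 prompt template is live
./scripts/langcache.sh store "prompt" "response" --attr "version=v2"

# After rolling the template, drop only the stale generation
./scripts/langcache.sh delete --attr "version=v1"
```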

## References

- [API Reference](references/api-reference.md) - Complete REST API documentation
- [Best Practices](references/best-practices.md) - Optimization techniques

## Examples

- [examples/basic-caching.sh](examples/basic-caching.sh) - Shell workflow
- [examples/agent-integration.py](examples/agent-integration.py) - Python pattern with policy enforcement
package/claude-skills/langcache/examples/agent-integration.py
ADDED
@@ -0,0 +1,453 @@
#!/usr/bin/env python3
"""
agent-integration.py - LangCache integration pattern for OpenClaw agents

This example demonstrates how to integrate Redis LangCache semantic caching
into an OpenClaw agent workflow with the default caching policy enforced.

Requirements:
    pip install langcache httpx

Environment variables:
    LANGCACHE_HOST     - LangCache API host
    LANGCACHE_CACHE_ID - Cache ID
    LANGCACHE_API_KEY  - API key
    OPENAI_API_KEY     - OpenAI API key (or your LLM provider)
"""

import os
import re
import asyncio
import logging
from dataclasses import dataclass, field
from typing import Optional
from enum import Enum

# LangCache SDK
from langcache import LangCache
from langcache.models import SearchStrategy

# Your LLM client (example uses OpenAI)
import httpx

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BlockReason(Enum):
    """Reasons why content was blocked from caching."""
    TEMPORAL = "temporal_info"
    CREDENTIALS = "credentials"
    IDENTIFIERS = "identifiers"
    PERSONAL = "personal_context"


# =============================================================================
# HARD BLOCK PATTERNS - These NEVER get cached
# =============================================================================

BLOCK_PATTERNS = {
    BlockReason.TEMPORAL: [
        r"\b(today|tomorrow|tonight|yesterday)\b",
        r"\b(this|next|last)\s+(week|month|year|monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
        r"\bin\s+\d+\s+(minutes?|hours?|days?|weeks?)\b",
        r"\b(deadline|eta|appointment|scheduled?|meeting\s+at)\b",
        r"\b(right\s+now|at\s+\d{1,2}(:\d{2})?\s*(am|pm)?)\b",
        r"\b(this\s+morning|this\s+afternoon|this\s+evening)\b",
    ],
    BlockReason.CREDENTIALS: [
        r"\b(api[_-]?key|api[_-]?secret|access[_-]?token)\b",
        r"\b(password|passwd|pwd)\s*[:=]",
        r"\b(secret[_-]?key|private[_-]?key)\b",
        r"\b(otp|2fa|totp|authenticator)\s*(code|token)?\b",
        r"\bbearer\s+[a-zA-Z0-9_-]+",
        r"\b(sk|pk)[_-][a-zA-Z0-9]{20,}\b",  # API key patterns like sk-xxx
    ],
    BlockReason.IDENTIFIERS: [
        r"\b\d{10,15}\b",  # Phone numbers, long numeric IDs
r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b", # Emails
        r"\b(order|account|message|chat|user|customer)[_-]?id\s*[:=]?\s*\w+",
        r"\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b",  # UUIDs
        r"\b\d{1,5}\s+\w+\s+(street|st|avenue|ave|road|rd|boulevard|blvd)\b",  # Addresses
        r"\b\d{5}(-\d{4})?\b",  # ZIP codes
        r"@[a-zA-Z0-9_]{1,15}\b",  # Social handles / JIDs
    ],
    BlockReason.PERSONAL: [
        r"\bmy\s+(wife|husband|partner|girlfriend|boyfriend|spouse)\b",
        r"\bmy\s+(mom|dad|mother|father|brother|sister|son|daughter|child|kid)\b",
        r"\bmy\s+(friend|colleague|coworker|boss|manager)\s+\w+",  # "my friend John"
        r"\b(said\s+to\s+me|told\s+me|asked\s+me|between\s+us)\b",
        r"\b(private|confidential|secret)\s+(conversation|chat|message)\b",
        r"\bin\s+(our|my)\s+(chat|conversation|thread|group)\b",
        r"\b(he|she|they)\s+(said|told|asked|mentioned)\b",  # Referencing specific people
    ],
}


@dataclass
class CacheConfig:
    """Configuration for semantic caching behavior."""
    enabled: bool = True
    model_id: str = "gpt-5"
    cache_ttl_seconds: int = 86400  # 24 hours

    # Thresholds by category
    thresholds: dict = field(default_factory=lambda: {
        "factual": 0.90,
        "template": 0.88,
        "style": 0.85,
        "command": 0.92,
        "default": 0.90,
    })


@dataclass
class CacheResult:
    """Result from cache lookup."""
    hit: bool
    response: Optional[str] = None
    similarity: Optional[float] = None
    entry_id: Optional[str] = None


@dataclass
class BlockCheckResult:
    """Result from block pattern check."""
    blocked: bool
    reason: Optional[BlockReason] = None
    matched_pattern: Optional[str] = None


class CachedAgent:
    """
    An agent wrapper that adds semantic caching to LLM calls.
    Enforces the default caching policy with hard blocks.

    Usage:
        agent = CachedAgent(config=CacheConfig())
        response = await agent.complete("What is Redis?")
    """

    def __init__(self, config: Optional[CacheConfig] = None):
        self.config = config or CacheConfig()

        # Initialize LangCache client
        self.cache = LangCache(
            server_url=f"https://{os.environ['LANGCACHE_HOST']}",
            cache_id=os.environ["LANGCACHE_CACHE_ID"],
            api_key=os.environ["LANGCACHE_API_KEY"],
        )

        # Initialize LLM client (example: OpenAI-compatible API)
        self.llm_client = httpx.AsyncClient(
            base_url="https://api.openai.com/v1",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            timeout=60.0,
        )

        # Metrics
        self.cache_hits = 0
        self.cache_misses = 0
        self.blocked_requests = {reason: 0 for reason in BlockReason}

    def _check_hard_blocks(self, text: str) -> BlockCheckResult:
        """
        Check if text contains any hard-blocked patterns.
        Returns BlockCheckResult with reason if blocked.
        """
        text_lower = text.lower()

        for reason, patterns in BLOCK_PATTERNS.items():
            for pattern in patterns:
                if re.search(pattern, text_lower, re.IGNORECASE):
                    return BlockCheckResult(
                        blocked=True,
                        reason=reason,
                        matched_pattern=pattern,
                    )

        return BlockCheckResult(blocked=False)

    def _normalize_prompt(self, prompt: str) -> str:
        """Normalize prompt for better cache hit rates."""
        normalized = prompt.strip().lower()
        normalized = re.sub(r'\s+', ' ', normalized)

        # Remove common filler phrases
        fillers = [
            r'^(please |can you |could you |would you |hey |hi |hello )',
            r'^(i want to |i need to |i\'d like to )',
            r'( please| thanks| thank you)$',
        ]
        for pattern in fillers:
            normalized = re.sub(pattern, '', normalized)

        return normalized.strip()

    def _detect_category(self, prompt: str) -> str:
        """Detect the category of a prompt for threshold selection."""
        prompt_lower = prompt.lower()

        # Template patterns
        if re.search(r"(polite|professional|formal|warmer|shorter|firmer|rewrite|rephrase)", prompt_lower):
            return "style"

        if re.search(r"(template|draft|write a|compose a|reply to)", prompt_lower):
            return "template"

        # Command patterns
        if re.search(r"(what does|how do i|explain|command|flag|option|syntax)", prompt_lower):
            return "command"

        # Default to factual
        return "factual"

    def _is_cacheable(self, prompt: str, response: str = "") -> tuple[bool, Optional[str]]:
        """
        Check if prompt/response pair should be cached.
        Returns (is_cacheable, block_reason).
        """
        if not self.config.enabled:
            return False, "caching_disabled"

        # Check prompt for hard blocks
        prompt_check = self._check_hard_blocks(prompt)
        if prompt_check.blocked:
            self.blocked_requests[prompt_check.reason] += 1
            logger.info(
                f"BLOCKED ({prompt_check.reason.value}): {prompt[:50]}... "
                f"[pattern: {prompt_check.matched_pattern}]"
            )
            return False, prompt_check.reason.value

        # Check response for hard blocks (don't cache responses with sensitive data)
        if response:
            response_check = self._check_hard_blocks(response)
            if response_check.blocked:
                self.blocked_requests[response_check.reason] += 1
                logger.info(
                    f"BLOCKED response ({response_check.reason.value}): "
                    f"[pattern: {response_check.matched_pattern}]"
                )
                return False, response_check.reason.value

        return True, None

    async def _search_cache(self, prompt: str, category: str) -> CacheResult:
        """Search for cached response with category-specific threshold."""
        try:
            threshold = self.config.thresholds.get(
                category,
                self.config.thresholds["default"]
            )

            result = await asyncio.to_thread(
                self.cache.search,
                prompt=prompt,
                similarity_threshold=threshold,
                search_strategies=[SearchStrategy.EXACT, SearchStrategy.SEMANTIC],
                attributes={"model": self.config.model_id},
            )

            if result.hit:
                # Verify cached response doesn't contain blocked content
                response_check = self._check_hard_blocks(result.response)
                if response_check.blocked:
                    logger.warning(
                        f"Cached response contains blocked content, skipping: "
                        f"{response_check.reason.value}"
                    )
                    return CacheResult(hit=False)

                return CacheResult(
                    hit=True,
                    response=result.response,
                    similarity=result.similarity,
                    entry_id=result.entry_id,
                )
        except Exception as e:
            logger.warning(f"Cache search failed: {e}")

        return CacheResult(hit=False)

    async def _store_in_cache(
        self,
        prompt: str,
        response: str,
        category: str
    ) -> None:
        """Store response in cache (fire-and-forget) if allowed."""
        # Final safety check before storing
        cacheable, reason = self._is_cacheable(prompt, response)
        if not cacheable:
            logger.debug(f"Not storing in cache: {reason}")
            return

        try:
            await asyncio.to_thread(
                self.cache.set,
                prompt=prompt,
                response=response,
                attributes={
                    "model": self.config.model_id,
                    "category": category,
                },
            )
            logger.debug(f"Cached [{category}]: {prompt[:50]}...")
        except Exception as e:
            logger.warning(f"Cache store failed: {e}")

    async def _call_llm(self, prompt: str, system_prompt: Optional[str] = None) -> str:
        """Call the LLM API."""
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})

        response = await self.llm_client.post(
            "/chat/completions",
            json={
                "model": self.config.model_id,
                "messages": messages,
                "max_tokens": 1024,
            },
        )
        response.raise_for_status()
        data = response.json()
        return data["choices"][0]["message"]["content"]

    async def complete(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        force_refresh: bool = False,
    ) -> str:
        """
        Complete a prompt with semantic caching.
        Enforces caching policy with hard blocks.

        Args:
            prompt: The user prompt
            system_prompt: Optional system prompt (not included in cache key)
            force_refresh: Skip cache and call LLM directly

        Returns:
            The LLM response (from cache or fresh)
        """
        normalized_prompt = self._normalize_prompt(prompt)
        category = self._detect_category(prompt)

        # Check if cacheable (hard blocks)
        cacheable, block_reason = self._is_cacheable(prompt)

        if not force_refresh and cacheable:
            cache_result = await self._search_cache(normalized_prompt, category)

            if cache_result.hit:
                self.cache_hits += 1
                logger.info(
                    f"Cache HIT [{category}] (similarity={cache_result.similarity:.3f}): "
                    f"{prompt[:50]}..."
                )
                return cache_result.response

        # Cache miss, blocked, or force refresh - call LLM
        self.cache_misses += 1
        if block_reason:
            logger.info(f"Cache SKIP (blocked: {block_reason}): {prompt[:50]}...")
        else:
            logger.info(f"Cache MISS [{category}]: {prompt[:50]}...")

        response = await self._call_llm(prompt, system_prompt)

        # Store in cache if allowed (async, don't block response)
        if cacheable:
            asyncio.create_task(
                self._store_in_cache(normalized_prompt, response, category)
            )

        return response

    def get_stats(self) -> dict:
        """Get cache statistics including block reasons."""
        total = self.cache_hits + self.cache_misses
        hit_rate = self.cache_hits / total if total > 0 else 0
        return {
            "hits": self.cache_hits,
            "misses": self.cache_misses,
            "total": total,
            "hit_rate": f"{hit_rate:.1%}",
            "blocked": {
                reason.value: count
                for reason, count in self.blocked_requests.items()
            },
        }


# =============================================================================
# Example usage
# =============================================================================

async def main():
    """Demonstrate cached agent with policy enforcement."""

    agent = CachedAgent(config=CacheConfig(enabled=True, model_id="gpt-5"))

    test_queries = [
        # CACHEABLE - Factual Q&A
        ("What is Redis?", "Should cache"),
        ("Explain semantic caching", "Should cache"),

        # CACHEABLE - Style transforms
        ("Make this message warmer: Thanks for your email", "Should cache"),
        ("Rewrite this to be more professional", "Should cache"),

        # CACHEABLE - Templates
        ("Write a polite decline email", "Should cache"),

        # BLOCKED - Temporal
        ("What's on my calendar today?", "BLOCKED: temporal"),
        ("Remind me in 20 minutes", "BLOCKED: temporal"),
        ("What's the deadline for this week?", "BLOCKED: temporal"),

        # BLOCKED - Credentials
        ("Store my API key sk-abc123xyz", "BLOCKED: credentials"),
        ("My password is hunter2", "BLOCKED: credentials"),

        # BLOCKED - Identifiers
        ("Send email to john@example.com", "BLOCKED: identifiers"),
        ("Call me at 5551234567", "BLOCKED: identifiers"),
        ("Order ID: 12345678", "BLOCKED: identifiers"),

        # BLOCKED - Personal context
        ("My wife said we should...", "BLOCKED: personal"),
        ("In our private chat, he mentioned...", "BLOCKED: personal"),
    ]

    print("=" * 60)
    print("LangCache Policy Enforcement Demo")
    print("=" * 60)

    for query, expected in test_queries:
        print(f"\nQuery: {query}")
        print(f"Expected: {expected}")
        try:
            response = await agent.complete(query)
            print(f"Response: {response[:80]}...")
        except Exception as e:
            print(f"Error: {e}")

    print("\n" + "=" * 60)
    print("Cache Statistics:")
    print("=" * 60)
    stats = agent.get_stats()
    print(f"Hits: {stats['hits']}")
    print(f"Misses: {stats['misses']}")
    print(f"Hit Rate: {stats['hit_rate']}")
print(f"Blocked by reason:")
    for reason, count in stats['blocked'].items():
        print(f"  - {reason}: {count}")


if __name__ == "__main__":
    asyncio.run(main())
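
To try the demo end to end (per the file's own docstring; the `...` values are placeholders for the credentials from SKILL.md and an OpenAI-compatible key):

```bash
pip install langcache httpx
export LANGCACHE_HOST=... LANGCACHE_CACHE_ID=... LANGCACHE_API_KEY=... OPENAI_API_KEY=...
python package/claude-skills/langcache/examples/agent-integration.py
```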