opencode-skills-collection 1.0.185 → 1.0.187
- package/bundled-skills/.antigravity-install-manifest.json +5 -1
- package/bundled-skills/3d-web-experience/SKILL.md +152 -37
- package/bundled-skills/agent-evaluation/SKILL.md +1088 -26
- package/bundled-skills/agent-memory-systems/SKILL.md +1037 -25
- package/bundled-skills/agent-tool-builder/SKILL.md +668 -16
- package/bundled-skills/ai-agents-architect/SKILL.md +271 -31
- package/bundled-skills/ai-product/SKILL.md +716 -26
- package/bundled-skills/ai-wrapper-product/SKILL.md +450 -44
- package/bundled-skills/algolia-search/SKILL.md +867 -15
- package/bundled-skills/autonomous-agents/SKILL.md +1033 -26
- package/bundled-skills/aws-serverless/SKILL.md +1046 -35
- package/bundled-skills/azure-functions/SKILL.md +1318 -19
- package/bundled-skills/browser-automation/SKILL.md +1065 -28
- package/bundled-skills/browser-extension-builder/SKILL.md +159 -32
- package/bundled-skills/bullmq-specialist/SKILL.md +347 -16
- package/bundled-skills/clerk-auth/SKILL.md +796 -15
- package/bundled-skills/computer-use-agents/SKILL.md +1870 -28
- package/bundled-skills/context-window-management/SKILL.md +271 -18
- package/bundled-skills/conversation-memory/SKILL.md +453 -24
- package/bundled-skills/crewai/SKILL.md +252 -46
- package/bundled-skills/discord-bot-architect/SKILL.md +1207 -34
- package/bundled-skills/docs/integrations/jetski-cortex.md +3 -3
- package/bundled-skills/docs/integrations/jetski-gemini-loader/README.md +1 -1
- package/bundled-skills/docs/maintainers/repo-growth-seo.md +3 -3
- package/bundled-skills/docs/maintainers/skills-update-guide.md +1 -1
- package/bundled-skills/docs/users/bundles.md +1 -1
- package/bundled-skills/docs/users/claude-code-skills.md +1 -1
- package/bundled-skills/docs/users/gemini-cli-skills.md +1 -1
- package/bundled-skills/docs/users/getting-started.md +1 -1
- package/bundled-skills/docs/users/kiro-integration.md +1 -1
- package/bundled-skills/docs/users/usage.md +4 -4
- package/bundled-skills/docs/users/visual-guide.md +4 -4
- package/bundled-skills/email-systems/SKILL.md +646 -26
- package/bundled-skills/faf-expert/SKILL.md +221 -0
- package/bundled-skills/faf-wizard/SKILL.md +252 -0
- package/bundled-skills/file-uploads/SKILL.md +212 -11
- package/bundled-skills/firebase/SKILL.md +646 -16
- package/bundled-skills/gcp-cloud-run/SKILL.md +1117 -32
- package/bundled-skills/graphql/SKILL.md +1026 -27
- package/bundled-skills/hubspot-integration/SKILL.md +804 -19
- package/bundled-skills/idea-darwin/SKILL.md +120 -0
- package/bundled-skills/inngest/SKILL.md +431 -16
- package/bundled-skills/interactive-portfolio/SKILL.md +342 -44
- package/bundled-skills/langfuse/SKILL.md +296 -41
- package/bundled-skills/langgraph/SKILL.md +259 -50
- package/bundled-skills/micro-saas-launcher/SKILL.md +343 -44
- package/bundled-skills/neon-postgres/SKILL.md +572 -15
- package/bundled-skills/nextjs-supabase-auth/SKILL.md +269 -21
- package/bundled-skills/notion-template-business/SKILL.md +371 -44
- package/bundled-skills/personal-tool-builder/SKILL.md +537 -44
- package/bundled-skills/plaid-fintech/SKILL.md +825 -19
- package/bundled-skills/prompt-caching/SKILL.md +438 -25
- package/bundled-skills/rag-engineer/SKILL.md +271 -29
- package/bundled-skills/salesforce-development/SKILL.md +912 -19
- package/bundled-skills/satori/SKILL.md +54 -0
- package/bundled-skills/scroll-experience/SKILL.md +381 -44
- package/bundled-skills/segment-cdp/SKILL.md +817 -19
- package/bundled-skills/shopify-apps/SKILL.md +1475 -19
- package/bundled-skills/slack-bot-builder/SKILL.md +1162 -28
- package/bundled-skills/telegram-bot-builder/SKILL.md +152 -37
- package/bundled-skills/telegram-mini-app/SKILL.md +445 -44
- package/bundled-skills/trigger-dev/SKILL.md +916 -27
- package/bundled-skills/twilio-communications/SKILL.md +1310 -28
- package/bundled-skills/upstash-qstash/SKILL.md +898 -27
- package/bundled-skills/vercel-deployment/SKILL.md +637 -39
- package/bundled-skills/viral-generator-builder/SKILL.md +132 -37
- package/bundled-skills/voice-agents/SKILL.md +937 -27
- package/bundled-skills/voice-ai-development/SKILL.md +375 -46
- package/bundled-skills/workflow-automation/SKILL.md +982 -29
- package/bundled-skills/zapier-make-patterns/SKILL.md +772 -27
- package/package.json +1 -1
|
@@ -1,21 +1,38 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: agent-memory-systems
|
|
3
|
-
description: "
|
|
3
|
+
description: "Memory is the cornerstone of intelligent agents. Without it, every
|
|
4
|
+
interaction starts from zero. This skill covers the architecture of agent
|
|
5
|
+
memory: short-term (context window), long-term (vector stores), and the
|
|
6
|
+
cognitive architectures that organize them."
|
|
4
7
|
risk: safe
|
|
5
|
-
source:
|
|
6
|
-
date_added:
|
|
8
|
+
source: vibeship-spawner-skills (Apache 2.0)
|
|
9
|
+
date_added: 2026-02-27
|
|
7
10
|
---
|
|
8
11
|
|
|
9
12
|
# Agent Memory Systems
|
|
10
13
|
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
+
Memory is the cornerstone of intelligent agents. Without it, every interaction
|
|
15
|
+
starts from zero. This skill covers the architecture of agent memory: short-term
|
|
16
|
+
(context window), long-term (vector stores), and the cognitive architectures
|
|
17
|
+
that organize them.
|
|
14
18
|
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
+
Key insight: Memory isn't just storage - it's retrieval. A million stored facts
|
|
20
|
+
mean nothing if you can't find the right one. Chunking, embedding, and retrieval
|
|
21
|
+
strategies determine whether your agent remembers or forgets.
|
|
22
|
+
|
|
23
|
+
The field is fragmented with inconsistent terminology. We use the CoALA cognitive
|
|
24
|
+
architecture framework: semantic memory (facts), episodic memory (experiences),
|
|
25
|
+
and procedural memory (how-to knowledge).
|
|
26
|
+
|
|
27
|
+
## Principles
|
|
28
|
+
|
|
29
|
+
- Memory quality = retrieval quality, not storage quantity
|
|
30
|
+
- Chunk for retrieval, not for storage
|
|
31
|
+
- Context isolation is the enemy of memory
|
|
32
|
+
- Use the right memory type for each kind of information
|
|
33
|
+
- Decay old memories - not everything should be forever
|
|
34
|
+
- Test retrieval accuracy before production
|
|
35
|
+
- Background memory formation beats real-time extraction
|
|
19
36
|
|
|
20
37
|
## Capabilities
|
|
21
38
|
|
|
@@ -30,43 +47,1038 @@ and
|
|
|
30
47
|
- memory-formation
|
|
31
48
|
- memory-decay
|
|
32
49
|
|
|
50
|
+
## Scope
|
|
51
|
+
|
|
52
|
+
- vector-database-operations → data-engineer
|
|
53
|
+
- rag-pipeline-architecture → llm-architect
|
|
54
|
+
- embedding-model-selection → ml-engineer
|
|
55
|
+
- knowledge-graph-design → knowledge-engineer
|
|
56
|
+
|
|
57
|
+
## Tooling
|
|
58
|
+
|
|
59
|
+
### Memory frameworks
|
|
60
|
+
|
|
61
|
+
- LangMem (LangChain) - When: LangGraph agents with persistent memory. Note: Semantic, episodic, procedural memory types
|
|
62
|
+
- MemGPT / Letta - When: Virtual context management, OS-style memory. Note: Hierarchical memory tiers, automatic paging
|
|
63
|
+
- Mem0 - When: User memory layer for personalization. Note: Designed for user preferences and history
|
|
64
|
+
|
|
65
|
+
### Vector stores
|
|
66
|
+
|
|
67
|
+
- Pinecone - When: Managed, enterprise-scale (billions of vectors). Note: Best query performance, highest cost
|
|
68
|
+
- Qdrant - When: Complex metadata filtering, open-source. Note: Rust-based, excellent filtering
|
|
69
|
+
- Weaviate - When: Hybrid search, knowledge graph features. Note: GraphQL interface, good for relationships
|
|
70
|
+
- ChromaDB - When: Prototyping, small/medium apps. Note: Developer-friendly, ~20ms p50 at 100K vectors
|
|
71
|
+
- pgvector - When: Already using PostgreSQL, simpler setup. Note: Good for <1M vectors, familiar tooling
|
|
72
|
+
|
|
73
|
+
### Embedding models
|
|
74
|
+
|
|
75
|
+
- OpenAI text-embedding-3-large - When: Best quality, 3072 dimensions. Note: $0.13/1M tokens
|
|
76
|
+
- OpenAI text-embedding-3-small - When: Good balance, 1536 dimensions. Note: $0.02/1M tokens, 6.5x cheaper than text-embedding-3-large
|
|
77
|
+
- nomic-embed-text-v1.5 - When: Open-source, local deployment. Note: 768 dimensions, good quality
|
|
78
|
+
- all-MiniLM-L6-v2 - When: Lightweight, fast local embedding. Note: 384 dimensions, lowest latency
|
|
79
|
+
|
|
33
80
|
## Patterns
|
|
34
81
|
|
|
35
82
|
### Memory Type Architecture
|
|
36
83
|
|
|
37
84
|
Choosing the right memory type for different information
|
|
38
85
|
|
|
86
|
+
**When to use**: Designing an agent memory system
|
|
87
|
+
|
|
88
|
+
# MEMORY TYPE ARCHITECTURE (CoALA Framework):
|
|
89
|
+
|
|
90
|
+
"""
|
|
91
|
+
Three memory types for different purposes:
|
|
92
|
+
|
|
93
|
+
1. Semantic Memory: Facts and knowledge
|
|
94
|
+
- What you know about the world
|
|
95
|
+
- User preferences, domain knowledge
|
|
96
|
+
- Stored in profiles (structured) or collections (unstructured)
|
|
97
|
+
|
|
98
|
+
2. Episodic Memory: Experiences and events
|
|
99
|
+
- What happened (timestamped events)
|
|
100
|
+
- Past conversations, task outcomes
|
|
101
|
+
- Used for learning from experience
|
|
102
|
+
|
|
103
|
+
3. Procedural Memory: How to do things
|
|
104
|
+
- Rules, skills, workflows
|
|
105
|
+
- Often implemented as few-shot examples
|
|
106
|
+
- "How did I solve this before?"
|
|
107
|
+
"""
|
|
108
|
+
|
|
109
|
+
## LangMem Implementation
|
|
110
|
+
"""
|
|
111
|
+
from langmem import MemoryStore
|
|
112
|
+
from langgraph.graph import StateGraph
|
|
113
|
+
|
|
114
|
+
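# Initialize memory store (illustrative interface: MemoryStore and its semantic/episodic/procedural attributes are a sketch, not LangMem's published API)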
|
|
115
|
+
memory = MemoryStore(
|
|
116
|
+
connection_string=os.environ["POSTGRES_URL"]
|
|
117
|
+
)
|
|
118
|
+
|
|
119
|
+
# Semantic memory: user profile
|
|
120
|
+
await memory.semantic.upsert(
|
|
121
|
+
namespace="user_profile",
|
|
122
|
+
key=user_id,
|
|
123
|
+
content={
|
|
124
|
+
"name": "Alice",
|
|
125
|
+
"preferences": ["dark mode", "concise responses"],
|
|
126
|
+
"expertise_level": "developer",
|
|
127
|
+
}
|
|
128
|
+
)
|
|
129
|
+
|
|
130
|
+
# Episodic memory: past interaction
|
|
131
|
+
await memory.episodic.add(
|
|
132
|
+
namespace="conversations",
|
|
133
|
+
content={
|
|
134
|
+
"timestamp": datetime.now(),
|
|
135
|
+
"summary": "Helped debug authentication issue",
|
|
136
|
+
"outcome": "resolved",
|
|
137
|
+
"key_insights": ["Token expiry was root cause"],
|
|
138
|
+
},
|
|
139
|
+
metadata={"user_id": user_id, "topic": "debugging"}
|
|
140
|
+
)
|
|
141
|
+
|
|
142
|
+
# Procedural memory: learned pattern
|
|
143
|
+
await memory.procedural.add(
|
|
144
|
+
namespace="skills",
|
|
145
|
+
content={
|
|
146
|
+
"task_type": "debug_auth",
|
|
147
|
+
"steps": ["Check token expiry", "Verify refresh flow"],
|
|
148
|
+
"example_interaction": few_shot_example,
|
|
149
|
+
}
|
|
150
|
+
)
|
|
151
|
+
"""
|
|
152
|
+
|
|
153
|
+
## Memory Retrieval at Runtime
|
|
154
|
+
"""
|
|
155
|
+
async def prepare_context(user_id, query):
|
|
156
|
+
# Get user profile (semantic)
|
|
157
|
+
profile = await memory.semantic.get(
|
|
158
|
+
namespace="user_profile",
|
|
159
|
+
key=user_id
|
|
160
|
+
)
|
|
161
|
+
|
|
162
|
+
# Find relevant past experiences (episodic)
|
|
163
|
+
similar_experiences = await memory.episodic.search(
|
|
164
|
+
namespace="conversations",
|
|
165
|
+
query=query,
|
|
166
|
+
filter={"user_id": user_id},
|
|
167
|
+
limit=3
|
|
168
|
+
)
|
|
169
|
+
|
|
170
|
+
# Find relevant skills (procedural)
|
|
171
|
+
relevant_skills = await memory.procedural.search(
|
|
172
|
+
namespace="skills",
|
|
173
|
+
query=query,
|
|
174
|
+
limit=2
|
|
175
|
+
)
|
|
176
|
+
|
|
177
|
+
return {
|
|
178
|
+
"profile": profile,
|
|
179
|
+
"past_experiences": similar_experiences,
|
|
180
|
+
"relevant_skills": relevant_skills,
|
|
181
|
+
}
|
|
182
|
+
"""
|
|
183
|
+
|
|
39
184
|
### Vector Store Selection Pattern
|
|
40
185
|
|
|
41
186
|
Choosing the right vector database for your use case
|
|
42
187
|
|
|
188
|
+
**When to use**: Setting up persistent memory storage
|
|
189
|
+
|
|
190
|
+
# VECTOR STORE SELECTION:
|
|
191
|
+
|
|
192
|
+
"""
|
|
193
|
+
Decision matrix:
|
|
194
|
+
|
|
195
|
+
| | Pinecone | Qdrant | Weaviate | ChromaDB | pgvector |
|
|
196
|
+
|------------|----------|--------|----------|----------|----------|
|
|
197
|
+
| Scale | Billions | 100M+ | 100M+ | 1M | 1M |
|
|
198
|
+
| Managed | Yes | Both | Both | Self | Self |
|
|
199
|
+
| Filtering | Basic | Best | Good | Basic | SQL |
|
|
200
|
+
| Hybrid | No | Yes | Best | No | Yes |
|
|
201
|
+
| Cost | High | Medium | Medium | Free | Free |
|
|
202
|
+
| Latency | 5ms | 7ms | 10ms | 20ms | 15ms |
|
|
203
|
+
"""
|
|
204
|
+
|
|
205
|
+
## Pinecone (Enterprise Scale)
|
|
206
|
+
"""
|
|
207
|
+
from pinecone import Pinecone
|
|
208
|
+
|
|
209
|
+
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
|
|
210
|
+
index = pc.Index("agent-memory")
|
|
211
|
+
|
|
212
|
+
# Upsert with metadata
|
|
213
|
+
index.upsert(
|
|
214
|
+
vectors=[
|
|
215
|
+
{
|
|
216
|
+
"id": f"memory-{uuid4()}",
|
|
217
|
+
"values": embedding,
|
|
218
|
+
"metadata": {
|
|
219
|
+
"user_id": user_id,
|
|
220
|
+
"timestamp": datetime.now().isoformat(),
|
|
221
|
+
"type": "episodic",
|
|
222
|
+
"content": memory_text,
|
|
223
|
+
}
|
|
224
|
+
}
|
|
225
|
+
],
|
|
226
|
+
namespace=namespace
|
|
227
|
+
)
|
|
228
|
+
|
|
229
|
+
# Query with filter
|
|
230
|
+
results = index.query(
|
|
231
|
+
vector=query_embedding,
|
|
232
|
+
filter={"user_id": user_id, "type": "episodic"},
|
|
233
|
+
top_k=5,
|
|
234
|
+
include_metadata=True
|
|
235
|
+
)
|
|
236
|
+
"""
|
|
237
|
+
|
|
238
|
+
## Qdrant (Complex Filtering)
|
|
239
|
+
"""
|
|
240
|
+
from qdrant_client import QdrantClient
|
|
241
|
+
from qdrant_client.models import PointStruct, Filter, FieldCondition
|
|
242
|
+
|
|
243
|
+
client = QdrantClient(url="http://localhost:6333")
|
|
244
|
+
|
|
245
|
+
# Complex filtering with Qdrant
|
|
246
|
+
results = client.search(
|
|
247
|
+
collection_name="agent_memory",
|
|
248
|
+
query_vector=query_embedding,
|
|
249
|
+
query_filter=Filter(
|
|
250
|
+
must=[
|
|
251
|
+
FieldCondition(key="user_id", match={"value": user_id}),
|
|
252
|
+
FieldCondition(key="type", match={"value": "semantic"}),
|
|
253
|
+
],
|
|
254
|
+
should=[
|
|
255
|
+
FieldCondition(key="topic", match={"any": ["auth", "security"]}),
|
|
256
|
+
]
|
|
257
|
+
),
|
|
258
|
+
limit=5
|
|
259
|
+
)
|
|
260
|
+
"""
|
|
261
|
+
|
|
262
|
+
## ChromaDB (Prototyping)
|
|
263
|
+
"""
|
|
264
|
+
import chromadb
|
|
265
|
+
|
|
266
|
+
client = chromadb.PersistentClient(path="./memory_db")
|
|
267
|
+
collection = client.get_or_create_collection("agent_memory")
|
|
268
|
+
|
|
269
|
+
# Simple and fast for prototypes
|
|
270
|
+
collection.add(
|
|
271
|
+
ids=[str(uuid4())],
|
|
272
|
+
embeddings=[embedding],
|
|
273
|
+
documents=[memory_text],
|
|
274
|
+
metadatas=[{"user_id": user_id, "type": "episodic"}]
|
|
275
|
+
)
|
|
276
|
+
|
|
277
|
+
results = collection.query(
|
|
278
|
+
query_embeddings=[query_embedding],
|
|
279
|
+
n_results=5,
|
|
280
|
+
where={"user_id": user_id}
|
|
281
|
+
)
|
|
282
|
+
"""
|
|
283
|
+
|
|
43
284
|
### Chunking Strategy Pattern
|
|
44
285
|
|
|
45
286
|
Breaking documents into retrievable chunks
|
|
46
287
|
|
|
47
|
-
|
|
288
|
+
**When to use**: Processing documents for memory storage
|
|
289
|
+
|
|
290
|
+
# CHUNKING STRATEGIES:
|
|
291
|
+
|
|
292
|
+
"""
|
|
293
|
+
The chunking dilemma:
|
|
294
|
+
- Too large: Vector loses specificity
|
|
295
|
+
- Too small: Loses context
|
|
296
|
+
|
|
297
|
+
Optimal chunk size depends on:
|
|
298
|
+
- Document type (code vs prose vs data)
|
|
299
|
+
- Query patterns (factual vs exploratory)
|
|
300
|
+
- Embedding model (each has its own sweet spot)
|
|
301
|
+
|
|
302
|
+
General guidance: 256-512 tokens for most use cases
|
|
303
|
+
"""
|
|
304
|
+
|
|
305
|
+
## Fixed-Size Chunking (Baseline)
|
|
306
|
+
"""
|
|
307
|
+
from langchain.text_splitter import RecursiveCharacterTextSplitter
|
|
308
|
+
|
|
309
|
+
splitter = RecursiveCharacterTextSplitter(
|
|
310
|
+
chunk_size=500, # Characters
|
|
311
|
+
chunk_overlap=50, # Overlap prevents cutting sentences
|
|
312
|
+
separators=["\n\n", "\n", ". ", " ", ""] # Priority order
|
|
313
|
+
)
|
|
314
|
+
|
|
315
|
+
chunks = splitter.split_text(document)
|
|
316
|
+
"""
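For comparison, the fixed-size baseline can be sketched without any library. This is a minimal illustration under stated assumptions, not LangChain's implementation: `chunk_text` is a hypothetical helper that slides a character window of `chunk_size` over the text, shares `chunk_overlap` characters between neighbouring chunks, and ignores separators entirely.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


document = "abcdefghij" * 120  # 1200 characters of stand-in text
chunks = chunk_text(document, chunk_size=500, chunk_overlap=50)
```

Because the window ignores sentence and paragraph boundaries, this version will cut mid-sentence. That is the gap the separator list in `RecursiveCharacterTextSplitter` is meant to close.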
|
|
317
|
+
|
|
318
|
+
## Semantic Chunking (Better Quality)
|
|
319
|
+
"""
|
|
320
|
+
from langchain_experimental.text_splitter import SemanticChunker
|
|
321
|
+
from langchain_openai import OpenAIEmbeddings
|
|
322
|
+
|
|
323
|
+
# Splits based on semantic similarity
|
|
324
|
+
splitter = SemanticChunker(
|
|
325
|
+
embeddings=OpenAIEmbeddings(),
|
|
326
|
+
breakpoint_threshold_type="percentile",
|
|
327
|
+
breakpoint_threshold_amount=95
|
|
328
|
+
)
|
|
329
|
+
|
|
330
|
+
chunks = splitter.split_text(document)
|
|
331
|
+
"""
|
|
332
|
+
|
|
333
|
+
## Structure-Aware Chunking (Documents with Hierarchy)
|
|
334
|
+
"""
|
|
335
|
+
from langchain.text_splitter import MarkdownHeaderTextSplitter
|
|
336
|
+
|
|
337
|
+
# Respect document structure
|
|
338
|
+
splitter = MarkdownHeaderTextSplitter(
|
|
339
|
+
headers_to_split_on=[
|
|
340
|
+
("#", "Header 1"),
|
|
341
|
+
("##", "Header 2"),
|
|
342
|
+
("###", "Header 3"),
|
|
343
|
+
]
|
|
344
|
+
)
|
|
345
|
+
|
|
346
|
+
chunks = splitter.split_text(markdown_doc)
|
|
347
|
+
# Each chunk has header metadata for context
|
|
348
|
+
"""
|
|
349
|
+
|
|
350
|
+
## Contextual Chunking (Anthropic's Approach)
|
|
351
|
+
"""
|
|
352
|
+
# Add context to each chunk before embedding
|
|
353
|
+
# Anthropic reports a 35% reduction in top-20 retrieval failures from contextual embeddings
|
|
354
|
+
|
|
355
|
+
def add_context_to_chunk(chunk, document_summary):
|
|
356
|
+
context_prompt = f'''
|
|
357
|
+
Document summary: {document_summary}
|
|
358
|
+
|
|
359
|
+
The following is a chunk from this document:
|
|
360
|
+
{chunk}
|
|
361
|
+
'''
|
|
362
|
+
return context_prompt
|
|
363
|
+
|
|
364
|
+
# Embed the contextualized chunk, not raw chunk
|
|
365
|
+
for chunk in chunks:
|
|
366
|
+
contextualized = add_context_to_chunk(chunk, summary)
|
|
367
|
+
embedding = embed(contextualized)
|
|
368
|
+
store(chunk, embedding) # Store original, embed contextualized
|
|
369
|
+
"""
|
|
370
|
+
|
|
371
|
+
## Code-Specific Chunking
|
|
372
|
+
"""
|
|
373
|
+
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter
|
|
374
|
+
|
|
375
|
+
# Language-aware splitting
|
|
376
|
+
python_splitter = RecursiveCharacterTextSplitter.from_language(
|
|
377
|
+
language=Language.PYTHON,
|
|
378
|
+
chunk_size=1000,
|
|
379
|
+
chunk_overlap=200
|
|
380
|
+
)
|
|
381
|
+
|
|
382
|
+
# Respects function/class boundaries
|
|
383
|
+
chunks = python_splitter.split_text(python_code)
|
|
384
|
+
"""
|
|
385
|
+
|
|
386
|
+
### Background Memory Formation
|
|
387
|
+
|
|
388
|
+
Processing memories asynchronously for better quality
|
|
389
|
+
|
|
390
|
+
**When to use**: You want higher recall without slowing interactions
|
|
391
|
+
|
|
392
|
+
# BACKGROUND MEMORY FORMATION:
|
|
393
|
+
|
|
394
|
+
"""
|
|
395
|
+
Real-time memory extraction slows conversations and adds
|
|
396
|
+
complexity to agent tool calls. Background processing after
|
|
397
|
+
conversations yields higher quality memories.
|
|
398
|
+
|
|
399
|
+
Pattern: Subconscious memory formation
|
|
400
|
+
"""
|
|
401
|
+
|
|
402
|
+
## LangGraph Background Processing
|
|
403
|
+
"""
|
|
404
|
+
from langgraph.graph import StateGraph
|
|
405
|
+
from langgraph.checkpoint.postgres import PostgresSaver
|
|
406
|
+
|
|
407
|
+
async def background_memory_processor(thread_id: str):
|
|
408
|
+
# Run after conversation ends or goes idle
|
|
409
|
+
conversation = await load_conversation(thread_id)
|
|
410
|
+
|
|
411
|
+
# Extract insights without time pressure
|
|
412
|
+
insights = await llm.invoke(f'''
|
|
413
|
+
Analyze this conversation and extract:
|
|
414
|
+
1. Key facts learned about the user
|
|
415
|
+
2. User preferences revealed
|
|
416
|
+
3. Tasks completed or pending
|
|
417
|
+
4. Patterns in user behavior
|
|
418
|
+
|
|
419
|
+
Be thorough - this runs in background.
|
|
420
|
+
|
|
421
|
+
Conversation:
|
|
422
|
+
{conversation}
|
|
423
|
+
''')
|
|
424
|
+
|
|
425
|
+
# Store to long-term memory
|
|
426
|
+
for insight in insights:
|
|
427
|
+
await memory.semantic.upsert(
|
|
428
|
+
namespace="user_insights",
|
|
429
|
+
key=generate_key(insight),
|
|
430
|
+
content=insight,
|
|
431
|
+
metadata={"source_thread": thread_id}
|
|
432
|
+
)
|
|
433
|
+
|
|
434
|
+
# Trigger on conversation end or idle timeout
|
|
435
|
+
@on_conversation_idle(timeout_minutes=5)
|
|
436
|
+
async def process_conversation(thread_id):
|
|
437
|
+
await background_memory_processor(thread_id)
|
|
438
|
+
"""
|
|
439
|
+
|
|
440
|
+
## Memory Consolidation (Like Sleep)
|
|
441
|
+
"""
|
|
442
|
+
# Periodically consolidate and deduplicate memories
|
|
443
|
+
|
|
444
|
+
async def consolidate_memories(user_id: str):
|
|
445
|
+
# Get all memories for user
|
|
446
|
+
memories = await memory.semantic.list(
|
|
447
|
+
namespace="user_insights",
|
|
448
|
+
filter={"user_id": user_id}
|
|
449
|
+
)
|
|
450
|
+
|
|
451
|
+
# Find similar memories (potential duplicates)
|
|
452
|
+
clusters = cluster_by_similarity(memories, threshold=0.9)
|
|
453
|
+
|
|
454
|
+
# Merge similar memories
|
|
455
|
+
for cluster in clusters:
|
|
456
|
+
if len(cluster) > 1:
|
|
457
|
+
merged = await llm.invoke(f'''
|
|
458
|
+
Consolidate these related memories into one:
|
|
459
|
+
{cluster}
|
|
460
|
+
|
|
461
|
+
Preserve all important information.
|
|
462
|
+
''')
|
|
463
|
+
await memory.semantic.upsert(
|
|
464
|
+
namespace="user_insights",
|
|
465
|
+
key=generate_key(merged),
|
|
466
|
+
content=merged
|
|
467
|
+
)
|
|
468
|
+
# Delete originals
|
|
469
|
+
for old in cluster:
|
|
470
|
+
await memory.semantic.delete(old.id)
|
|
471
|
+
"""
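The consolidation snippet above calls `cluster_by_similarity` without defining it. One possible shape for that helper is a greedy single-pass clustering over cosine similarity. This is a sketch under assumptions: it takes raw embedding vectors rather than memory objects, returns lists of indices, and compares each vector only against the first member of every existing cluster.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_similarity(embeddings, threshold=0.9):
    """Greedy clustering (assumed design): returns lists of indices into embeddings."""
    clusters = []
    for i, vec in enumerate(embeddings):
        for cluster in clusters:
            representative = embeddings[cluster[0]]  # first member stands in for the cluster
            if cosine_similarity(vec, representative) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no cluster was close enough; start a new one
    return clusters


embeddings = [
    [1.0, 0.0],    # memory 0
    [0.99, 0.05],  # memory 1: near-duplicate of memory 0
    [0.0, 1.0],    # memory 2: unrelated
]
clusters = cluster_by_similarity(embeddings, threshold=0.9)
```

Greedy assignment depends on input order and costs one comparison per existing cluster, which is usually acceptable for a per-user batch job. A large memory store would more likely use the vector store's own nearest-neighbour search.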
|
|
472
|
+
|
|
473
|
+
### Memory Decay Pattern
|
|
474
|
+
|
|
475
|
+
Forgetting old, irrelevant memories
|
|
476
|
+
|
|
477
|
+
**When to use**: Memory grows large, retrieval slows down
|
|
478
|
+
|
|
479
|
+
# MEMORY DECAY:
|
|
480
|
+
|
|
481
|
+
"""
|
|
482
|
+
Not all memories should live forever:
|
|
483
|
+
- Old preferences may be outdated
|
|
484
|
+
- Task details lose relevance
|
|
485
|
+
- Conflicting memories confuse retrieval
|
|
486
|
+
|
|
487
|
+
Implement intelligent decay based on:
|
|
488
|
+
- Recency (when was it created/accessed?)
|
|
489
|
+
- Frequency (how often is it retrieved?)
|
|
490
|
+
- Importance (is it a core fact or detail?)
|
|
491
|
+
"""
|
|
492
|
+
|
|
493
|
+
## Time-Based Decay
|
|
494
|
+
"""
|
|
495
|
+
from datetime import datetime, timedelta
|
|
496
|
+
|
|
497
|
+
async def decay_old_memories(namespace: str, max_age_days: int):
|
|
498
|
+
cutoff = datetime.now() - timedelta(days=max_age_days)
|
|
499
|
+
|
|
500
|
+
old_memories = await memory.episodic.list(
|
|
501
|
+
namespace=namespace,
|
|
502
|
+
filter={"last_accessed": {"$lt": cutoff.isoformat()}}
|
|
503
|
+
)
|
|
504
|
+
|
|
505
|
+
for mem in old_memories:
|
|
506
|
+
# Soft delete (mark as archived)
|
|
507
|
+
await memory.episodic.update(
|
|
508
|
+
id=mem.id,
|
|
509
|
+
metadata={"archived": True, "archived_at": datetime.now()}
|
|
510
|
+
)
|
|
511
|
+
"""
|
|
512
|
+
|
|
513
|
+
## Utility-Based Decay (MIRIX Approach)
|
|
514
|
+
"""
|
|
515
|
+
def calculate_memory_utility(memory):
|
|
516
|
+
'''
|
|
517
|
+
Composite utility score inspired by cognitive science:
|
|
518
|
+
- Recency: When was it last accessed?
|
|
519
|
+
- Frequency: How often is it accessed?
|
|
520
|
+
- Importance: How critical is this information?
|
|
521
|
+
'''
|
|
522
|
+
now = datetime.now()
|
|
523
|
+
|
|
524
|
+
# Recency score (exponential decay with 72h half-life)
|
|
525
|
+
hours_since_access = (now - memory.last_accessed).total_seconds() / 3600
|
|
526
|
+
recency_score = 0.5 ** (hours_since_access / 72)
|
|
527
|
+
|
|
528
|
+
# Frequency score
|
|
529
|
+
frequency_score = min(memory.access_count / 10, 1.0)
|
|
530
|
+
|
|
531
|
+
# Importance (from metadata or heuristic)
|
|
532
|
+
importance = memory.metadata.get("importance", 0.5)
|
|
533
|
+
|
|
534
|
+
# Weighted combination
|
|
535
|
+
utility = (
|
|
536
|
+
0.4 * recency_score +
|
|
537
|
+
0.3 * frequency_score +
|
|
538
|
+
0.3 * importance
|
|
539
|
+
)
|
|
540
|
+
|
|
541
|
+
return utility
|
|
542
|
+
|
|
543
|
+
async def prune_low_utility_memories(threshold=0.2):
|
|
544
|
+
all_memories = await memory.list_all()
|
|
545
|
+
for mem in all_memories:
|
|
546
|
+
if calculate_memory_utility(mem) < threshold:
|
|
547
|
+
await memory.archive(mem.id)
|
|
548
|
+
"""
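A worked example makes the weighting concrete. The sketch below repeats the same formula with two hypothetical records. `MemoryRecord` and the explicit `now` argument are assumptions added so the numbers are reproducible, not part of any memory library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MemoryRecord:
    """Hypothetical record carrying the fields the utility formula reads."""
    last_accessed: datetime
    access_count: int
    metadata: dict = field(default_factory=dict)


def memory_utility(record, now, half_life_hours=72.0):
    """0.4 * recency + 0.3 * frequency + 0.3 * importance, as in the pattern above."""
    hours = (now - record.last_accessed).total_seconds() / 3600
    recency = 0.5 ** (hours / half_life_hours)      # halves every 72 hours
    frequency = min(record.access_count / 10, 1.0)  # saturates at 10 accesses
    importance = record.metadata.get("importance", 0.5)
    return 0.4 * recency + 0.3 * frequency + 0.3 * importance


now = datetime(2026, 1, 10, 12, 0, 0)  # fixed clock so the example is deterministic
fresh = MemoryRecord(now - timedelta(hours=72), access_count=5)
stale = MemoryRecord(now - timedelta(days=30), access_count=0,
                     metadata={"importance": 0.1})

fresh_score = memory_utility(fresh, now)  # 0.4*0.5 + 0.3*0.5 + 0.3*0.5 = 0.5
stale_score = memory_utility(stale, now)  # about 0.03, below the 0.2 pruning threshold
```

With these weights, a memory that is never accessed and has default importance settles at 0.15 once recency decays away, so a threshold of 0.2 eventually archives it. Raising `importance` above roughly 0.67 keeps a memory indefinitely.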
|
|
549
|
+
|
|
550
|
+
## Sharp Edges
|
|
551
|
+
|
|
552
|
+
### Chunking Isolates Information From Its Context
|
|
553
|
+
|
|
554
|
+
Severity: CRITICAL
|
|
555
|
+
|
|
556
|
+
Situation: Processing documents for vector storage
|
|
557
|
+
|
|
558
|
+
Symptoms:
|
|
559
|
+
Retrieval finds chunks but they don't make sense alone. Agent
|
|
560
|
+
answers miss the big picture. "The function returns X" retrieved
|
|
561
|
+
without knowing which function. References to "this" without
|
|
562
|
+
knowing what "this" refers to.
|
|
563
|
+
|
|
564
|
+
Why this breaks:
|
|
565
|
+
When we chunk for AI processing, we're breaking connections,
|
|
566
|
+
reducing a holistic narrative to isolated fragments that often
|
|
567
|
+
miss the big picture. A chunk about "the configuration" without
|
|
568
|
+
context about what system is being configured is nearly useless.
|
|
569
|
+
|
|
570
|
+
Recommended fix:
|
|
571
|
+
|
|
572
|
+
## Contextual Chunking (Anthropic's approach)
|
|
573
|
+
# Add document context to each chunk before embedding
|
|
574
|
+
# Anthropic reports a 35% reduction in top-20 retrieval failures from contextual embeddings
|
|
575
|
+
|
|
576
|
+
def contextualize_chunk(chunk, document):
|
|
577
|
+
summary = summarize(document)
|
|
578
|
+
|
|
579
|
+
# LLM generates context for chunk
|
|
580
|
+
context = llm.invoke(f'''
|
|
581
|
+
Document summary: {summary}
|
|
582
|
+
|
|
583
|
+
Generate a brief context statement for this chunk
|
|
584
|
+
that would help someone understand what it refers to:
|
|
585
|
+
|
|
586
|
+
{chunk}
|
|
587
|
+
''')
|
|
588
|
+
|
|
589
|
+
return f"{context}\n\n{chunk}"
|
|
590
|
+
|
|
591
|
+
# Embed the contextualized version
|
|
592
|
+
for chunk in chunks:
|
|
593
|
+
contextualized = contextualize_chunk(chunk, full_doc)
|
|
594
|
+
embedding = embed(contextualized)
|
|
595
|
+
# Store original chunk, embed contextualized
|
|
596
|
+
store(original=chunk, embedding=embedding)
|
|
597
|
+
|
|
598
|
+
## Hierarchical Chunking
|
|
599
|
+
# Store at multiple granularities
|
|
600
|
+
chunks_small = split(doc, size=256)
|
|
601
|
+
chunks_medium = split(doc, size=512)
|
|
602
|
+
chunks_large = split(doc, size=1024)
|
|
603
|
+
|
|
604
|
+
# Retrieve at appropriate level based on query
|
|
605
|
+
|
|
606
|
+
### Chunk Size Mismatched to Query Patterns
|
|
607
|
+
|
|
608
|
+
Severity: HIGH
|
|
609
|
+
|
|
610
|
+
Situation: Configuring chunking for memory storage
|
|
611
|
+
|
|
612
|
+
Symptoms:
|
|
613
|
+
High-quality documents produce low-quality retrievals. Simple
|
|
614
|
+
questions miss relevant information. Complex questions get
|
|
615
|
+
fragments instead of complete answers.
|
|
616
|
+
|
|
617
|
+
Why this breaks:
|
|
618
|
+
Optimal chunk size depends on query patterns:
|
|
619
|
+
- Factual queries need small, specific chunks
|
|
620
|
+
- Conceptual queries need larger context
|
|
621
|
+
- Code needs function-level boundaries
|
|
48
622
|
|
|
49
|
-
|
|
623
|
+
The sweet spot varies by document type and embedding model.
|
|
624
|
+
A default of 1000 characters is tuned for no workload in particular.
|
|
50
625
|
|
|
51
|
-
|
|
626
|
+
Recommended fix:
|
|
52
627
|
|
|
53
|
-
|
|
628
|
+
## Test different sizes
|
|
629
|
+
from sklearn.metrics import recall_score
|
|
54
630
|
|
|
55
|
-
|
|
631
|
+
def evaluate_chunk_size(documents, test_queries, chunk_size):
|
|
632
|
+
chunks = split_documents(documents, size=chunk_size)
|
|
633
|
+
index = build_index(chunks)
|
|
56
634
|
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
|
|
65
|
-
|
|
635
|
+
correct_retrievals = 0
|
|
636
|
+
for query, expected_chunk in test_queries:
|
|
637
|
+
results = index.search(query, k=5)
|
|
638
|
+
if expected_chunk in results:
|
|
639
|
+
correct_retrievals += 1
|
|
640
|
+
|
|
641
|
+
return correct_retrievals / len(test_queries)
|
|
642
|
+
|
|
643
|
+
# Test multiple sizes
|
|
644
|
+
for size in [256, 512, 768, 1024]:
|
|
645
|
+
recall = evaluate_chunk_size(docs, test_queries, size)
|
|
646
|
+
print(f"Size {size}: Recall@5 = {recall:.2%}")
|
|
647
|
+
|
|
648
|
+
## Size recommendations by content type
|
|
649
|
+
CHUNK_SIZES = {
|
|
650
|
+
"documentation": 512, # Complete concepts
|
|
651
|
+
"code": 1000, # Function-level
|
|
652
|
+
"conversation": 256, # Turn-level
|
|
653
|
+
"articles": 768, # Paragraph-level
|
|
654
|
+
}
|
|
655
|
+
|
|
656
|
+
## Use overlap to prevent boundary issues
|
|
657
|
+
splitter = RecursiveCharacterTextSplitter(
|
|
658
|
+
chunk_size=512,
|
|
659
|
+
chunk_overlap=50, # 10% overlap
|
|
660
|
+
)
|
|
661
|
+
|
|
662
|
+
### Semantic Search Returns Irrelevant Results
|
|
663
|
+
|
|
664
|
+
Severity: HIGH
|
|
665
|
+
|
|
666
|
+
Situation: Querying memory for context
|
|
667
|
+
|
|
668
|
+
Symptoms:
|
|
669
|
+
Agent retrieves memories that seem related but aren't useful.
|
|
670
|
+
"Tell me about the user's preferences" returns conversation
|
|
671
|
+
about preferences in general, not this user's. High similarity
|
|
672
|
+
scores for wrong content.
|
|
673
|
+
|
|
674
|
+
Why this breaks:
|
|
675
|
+
Semantic similarity isn't the same as relevance. "The user
|
|
676
|
+
likes Python" and "Python is a programming language" are
|
|
677
|
+
semantically similar but very different types of information.
|
|
678
|
+
Without metadata filtering, retrieval matches on topic alone.
|
|
679
|
+
|
|
680
|
+
Recommended fix:
|
|
681
|
+
|
|
682
|
+
## Always filter by metadata first
|
|
683
|
+
# Don't rely on semantic similarity alone
|
|
684
|
+
|
|
685
|
+
# Bad: Only semantic search
|
|
686
|
+
results = index.query(
|
|
687
|
+
vector=query_embedding,
|
|
688
|
+
top_k=5
|
|
689
|
+
)
|
|
690
|
+
|
|
691
|
+
# Good: Filter then search
|
|
692
|
+
results = index.query(
|
|
693
|
+
vector=query_embedding,
|
|
694
|
+
filter={
|
|
695
|
+
"user_id": current_user.id,
|
|
696
|
+
"type": "preference",
|
|
697
|
+
"created_after": cutoff_date,
|
|
698
|
+
},
|
|
699
|
+
top_k=5
|
|
700
|
+
)
|
|
701
|
+
|
|
702
|
+
## Use hybrid search (semantic + keyword)
|
|
703
|
+
from qdrant_client import QdrantClient
|
|
704
|
+
|
|
705
|
+
client = QdrantClient(...)
|
|
706
|
+
|
|
707
|
+
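# Hybrid search with fusion (illustrative pseudocode: `query_text` and `fusion` are not documented `client.search` parameters; Qdrant's Query API, `query_points` with `prefetch` and RRF fusion, covers this)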
|
|
708
|
+
results = client.search(
|
|
709
|
+
collection_name="memories",
|
|
710
|
+
query_vector=semantic_embedding,
|
|
711
|
+
query_text=query, # Also keyword match
|
|
712
|
+
fusion={"method": "rrf"}, # Reciprocal Rank Fusion
|
|
713
|
+
)
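Reciprocal Rank Fusion, named in the hybrid search snippet above, is independent of any particular vector store. The following is a minimal sketch: it assumes each retriever returns an ordered list of memory IDs, and it uses `k=60`, a commonly used default rather than a tuned value. The IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic_hits = ["m1", "m2", "m3"]  # hypothetical IDs from vector search
keyword_hits = ["m3", "m1", "m4"]   # hypothetical IDs from keyword search
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

RRF uses only ranks, so the two retrievers' scores never need to be on the same scale. Items that appear in both lists (`m1`, `m3`) rise above items that appear in only one.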
|
|
714
|
+
|
|
715
|
+
## Rerank results with cross-encoder
|
|
716
|
+
from sentence_transformers import CrossEncoder
|
|
717
|
+
|
|
718
|
+
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
|
|
719
|
+
|
|
720
|
+
# Initial retrieval (recall-oriented)
|
|
721
|
+
candidates = index.query(vector=query_embedding, top_k=20)
|
|
722
|
+
|
|
723
|
+
# Rerank (precision-oriented)
|
|
724
|
+
pairs = [(query, c.text) for c in candidates]
|
|
725
|
+
scores = reranker.predict(pairs)
|
|
726
|
+
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
|
|
727
|
+
|
|
728
|
+
### Old Memories Override Current Information
|
|
729
|
+
|
|
730
|
+
Severity: HIGH
|
|
731
|
+
|
|
732
|
+
Situation: User preferences or facts change over time
|
|
733
|
+
|
|
734
|
+
Symptoms:
|
|
735
|
+
Agent uses outdated preferences. "User prefers dark mode" from
|
|
736
|
+
6 months ago overrides recent "switch to light mode" request.
|
|
737
|
+
Agent confidently uses stale data.
|
|
738
|
+
|
|
739
|
+
Why this breaks:
|
|
740
|
+
Vector stores don't have temporal awareness by default. A memory
|
|
741
|
+
from a year ago has the same retrieval weight as one from today.
|
|
742
|
+
Recent information should generally override old information
|
|
743
|
+
for preferences and mutable facts.
|
|
744
|
+
|
|
745
|
+
Recommended fix:
|
|
746
|
+
|
|
747
|
+
## Add temporal scoring
|
|
748
|
+
from datetime import datetime, timedelta
|
|
749
|
+
|
|
750
|
+
def time_decay_score(memory, half_life_days=30):
    # Score halves every half_life_days; clamp so future timestamps never score above 1
    age = max((datetime.now() - memory.created_at).days, 0)
    decay = 0.5 ** (age / half_life_days)
    return decay
|
|
754
|
+
|
|
755
|
+
def retrieve_with_recency(query, user_id):
    # Get candidates
    candidates = index.query(
        vector=embed(query),
        filter={"user_id": user_id},
        top_k=20
    )

    # Apply time decay
    for candidate in candidates:
        time_score = time_decay_score(candidate)
        candidate.final_score = candidate.similarity * 0.7 + time_score * 0.3

    # Re-sort by final score
    return sorted(candidates, key=lambda x: x.final_score, reverse=True)[:5]
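The weighting above can be checked with a self-contained sketch. The `Candidate` record, the similarity values, and the fixed `now` are invented for illustration. The 0.7/0.3 weights and the 30-day half-life are the ones used in the snippet above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Candidate:
    text: str
    similarity: float
    created_at: datetime


def time_decay_score(created_at, now, half_life_days=30):
    """Exponential decay: the score halves every half_life_days."""
    age_days = max((now - created_at).total_seconds() / 86400, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rank_with_recency(candidates, now, sim_weight=0.7, time_weight=0.3):
    def final_score(c):
        return sim_weight * c.similarity + time_weight * time_decay_score(c.created_at, now)

    return sorted(candidates, key=final_score, reverse=True)


now = datetime(2024, 7, 1)
candidates = [
    Candidate("User prefers dark mode", 0.90, now - timedelta(days=180)),
    Candidate("User asked to switch to light mode", 0.85, now - timedelta(days=1)),
]
ranked = rank_with_recency(candidates, now)
print(ranked[0].text)
```

The six-month-old memory has the higher raw similarity, but its decay term is close to zero, so the recent request ranks first.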
|
|
770
|
+
|
|
771
|
+
## Update instead of append for preferences
|
|
772
|
+
async def update_preference(user_id, category, value):
    # Delete old preference
    await memory.delete(
        filter={"user_id": user_id, "type": "preference", "category": category}
    )

    # Store new preference; metadata must carry the fields the delete filter matches on
    await memory.upsert(
        id=f"pref-{user_id}-{category}",
        content={"category": category, "value": value},
        metadata={
            "user_id": user_id,
            "type": "preference",
            "category": category,
            "updated_at": datetime.now(),
        }
    )
|
|
784
|
+
|
|
785
|
+
## Explicit versioning for facts
|
|
786
|
+
await memory.upsert(
    id=f"fact-{fact_id}-v{version}",
    content=new_fact,
    metadata={
        "version": version,
        "supersedes": previous_id,
        "valid_from": datetime.now()
    }
)
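Versioned facts only help if retrieval ignores the superseded ones. A minimal sketch of that step, assuming each stored record exposes its `id` and the `supersedes` metadata field shown above (the sample records are invented):

```python
def current_versions(records):
    """Return only the records that no other record supersedes."""
    superseded_ids = {r["supersedes"] for r in records if r.get("supersedes")}
    return [r for r in records if r["id"] not in superseded_ids]


records = [
    {"id": "fact-42-v1", "content": "User works at Acme", "supersedes": None},
    {"id": "fact-42-v2", "content": "User works at Globex", "supersedes": "fact-42-v1"},
    {"id": "fact-7-v1", "content": "User lives in Berlin", "supersedes": None},
]
current = current_versions(records)
print([r["id"] for r in current])
```

Old versions stay in the store for auditing, but only the latest link of each chain reaches the prompt.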
|
|
795
|
+
|
|
796
|
+
### Contradictory Memories Retrieved Together
|
|
797
|
+
|
|
798
|
+
Severity: MEDIUM
|
|
799
|
+
|
|
800
|
+
Situation: User has changed preferences or provided conflicting info
|
|
801
|
+
|
|
802
|
+
Symptoms:
|
|
803
|
+
Agent retrieves "user prefers dark mode" and "user prefers light
|
|
804
|
+
mode" in same context. Gives inconsistent answers. Seems confused
|
|
805
|
+
or forgetful to user.
|
|
806
|
+
|
|
807
|
+
Why this breaks:
|
|
808
|
+
Without conflict resolution, both old and new information coexist.
|
|
809
|
+
Semantic search might return both because they're both about the
|
|
810
|
+
same topic (preferences). Agent has no way to know which is current.
|
|
811
|
+
|
|
812
|
+
Recommended fix:
|
|
813
|
+
|
|
814
|
+
## Detect conflicts on storage
|
|
815
|
+
async def store_with_conflict_check(memory, user_id):
    # Find potentially conflicting memories
    similar = await index.query(
        vector=embed(memory.content),
        filter={"user_id": user_id, "type": memory.type},
        threshold=0.9,  # Very similar
        top_k=5
    )

    for existing in similar:
        if is_contradictory(memory.content, existing.content):
            # Ask for resolution
            resolution = await resolve_conflict(memory, existing)
            if resolution == "replace":
                await index.delete(existing.id)
            elif resolution == "version":
                await mark_superseded(existing.id, memory.id)

    await index.upsert(memory)
|
|
834
|
+
|
|
835
|
+
## Conflict detection heuristic
|
|
836
|
+
def is_contradictory(new_content, old_content):
    # Use LLM to detect contradiction
    result = llm.invoke(f'''
    Do these two statements contradict each other?

    Statement 1: {old_content}
    Statement 2: {new_content}

    Respond with just YES or NO.
    ''')
    # Assumes llm.invoke returns a plain string; tolerate answers such as "Yes."
    return result.strip().upper().startswith("YES")
|
|
847
|
+
|
|
848
|
+
## Periodic consolidation
|
|
849
|
+
async def consolidate_memories(user_id):
    all_memories = await index.list(filter={"user_id": user_id})
    clusters = cluster_by_topic(all_memories)

    for cluster in clusters:
        if has_conflicts(cluster):
            resolved = await llm.invoke(f'''
            These memories may conflict. Create one consolidated
            memory that represents the current truth:
            {cluster}
            ''')
            await replace_cluster(cluster, resolved)
|
|
861
|
+
|
|
862
|
+
### Retrieved Memories Exceed Context Window
|
|
863
|
+
|
|
864
|
+
Severity: MEDIUM
|
|
865
|
+
|
|
866
|
+
Situation: Retrieving too many memories at once
|
|
867
|
+
|
|
868
|
+
Symptoms:
|
|
869
|
+
Token limit errors. Agent truncates important information.
|
|
870
|
+
System prompt gets cut off. Retrieved memories compete with
|
|
871
|
+
user query for space.
|
|
872
|
+
|
|
873
|
+
Why this breaks:
|
|
874
|
+
Retrieval typically returns top-k results. If k is too high or
|
|
875
|
+
chunks are too large, retrieved context overwhelms the window.
|
|
876
|
+
Critical information (system prompt, recent messages) gets pushed
|
|
877
|
+
out.
|
|
878
|
+
|
|
879
|
+
Recommended fix:
|
|
880
|
+
|
|
881
|
+
## Budget tokens for different memory types
|
|
882
|
+
TOKEN_BUDGET = {
    "system_prompt": 500,
    "user_profile": 200,
    "recent_messages": 2000,
    "retrieved_memories": 1000,
    "current_query": 500,
    "buffer": 300,  # Safety margin
}
|
|
890
|
+
|
|
891
|
+
def budget_aware_retrieval(query, context_limit=4000):
    # Reserve fixed costs first: system prompt, safety buffer, and the query itself
    remaining = context_limit - TOKEN_BUDGET["system_prompt"] - TOKEN_BUDGET["buffer"]
    remaining -= count_tokens(query)

    # Prioritize recent messages
    recent = get_recent_messages(limit=TOKEN_BUDGET["recent_messages"])
    remaining -= count_tokens(recent)

    # Then user profile
    profile = get_user_profile(limit=TOKEN_BUDGET["user_profile"])
    remaining -= count_tokens(profile)

    # Finally retrieved memories: capped by their own budget and by what is left
    memory_budget = max(0, min(remaining, TOKEN_BUDGET["retrieved_memories"]))
    memories = retrieve_memories(query, max_tokens=memory_budget)

    return build_context(profile, recent, memories)
|
|
906
|
+
|
|
907
|
+
## Dynamic k based on chunk size
|
|
908
|
+
def retrieve_with_budget(query, max_tokens=1000):
    avg_chunk_tokens = 150  # From your data
    max_k = max(1, max_tokens // avg_chunk_tokens)  # Never ask for zero results

    results = index.query(query, top_k=max_k)

    # Trim if still over budget
    total_tokens = 0
    filtered = []
    for result in results:
        tokens = count_tokens(result.text)
        if total_tokens + tokens <= max_tokens:
            filtered.append(result)
            total_tokens += tokens
        else:
            break

    return filtered
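The trimming loop can be exercised without a vector store. This sketch uses a whitespace split as a crude, hypothetical token counter; a real tokenizer will give different counts. The sample chunks are invented.

```python
def count_tokens(text):
    # Crude whitespace proxy; swap in your model's tokenizer for real counts.
    return len(text.split())


def trim_to_budget(chunks, max_tokens):
    """Keep ranked chunks in order until the next one would exceed the budget."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop at the first overflow so rank order is preserved
        kept.append(chunk)
        used += cost
    return kept, used


ranked_chunks = [
    "user prefers concise answers",
    "user is building a Discord bot in Python",
    "user asked about rate limits last week",
]
kept, used = trim_to_budget(ranked_chunks, max_tokens=12)
print(kept, used)
```

Stopping at the first overflow keeps the highest-ranked chunks. Skipping the oversized chunk and continuing would pack the window more tightly, at the cost of admitting lower-ranked content.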
|
|
926
|
+
|
|
927
|
+
### Query and Document Embeddings From Different Models
|
|
928
|
+
|
|
929
|
+
Severity: MEDIUM
|
|
930
|
+
|
|
931
|
+
Situation: Upgrading embedding model or mixing providers
|
|
932
|
+
|
|
933
|
+
Symptoms:
|
|
934
|
+
Retrieval quality suddenly drops. Relevant documents not found.
|
|
935
|
+
Random results returned. Works for new documents, fails for old.
|
|
936
|
+
|
|
937
|
+
Why this breaks:
|
|
938
|
+
Embedding models produce different vector spaces. A query embedded
|
|
939
|
+
with text-embedding-3-small won't match documents embedded with text-embedding-ada-002.
|
|
940
|
+
Mixing models creates garbage similarity scores.
|
|
941
|
+
|
|
942
|
+
Recommended fix:
|
|
943
|
+
|
|
944
|
+
## Track embedding model in metadata
|
|
945
|
+
await index.upsert(
    id=doc_id,
    vector=embedding,
    metadata={
        "embedding_model": "text-embedding-3-small",
        "embedding_version": "2024-01",
        "content": content
    }
)
|
|
954
|
+
|
|
955
|
+
## Filter by model version on retrieval
|
|
956
|
+
results = index.query(
    vector=query_embedding,
    filter={"embedding_model": current_model},
    top_k=10
)
|
|
961
|
+
|
|
962
|
+
## Migration strategy for model upgrades
|
|
963
|
+
async def migrate_embeddings(old_model, new_model):
    # Get all documents with old model
    old_docs = await index.list(filter={"embedding_model": old_model})

    for doc in old_docs:
        # Re-embed with new model
        new_embedding = await embed(doc.content, model=new_model)

        # Update in place
        await index.update(
            id=doc.id,
            vector=new_embedding,
            metadata={"embedding_model": new_model}
        )
|
|
977
|
+
|
|
978
|
+
## Use separate collections during migration
|
|
979
|
+
# Old collection: production queries
|
|
980
|
+
# New collection: re-embedding in progress
|
|
981
|
+
# Switch over when complete
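The separate-collection approach can be sketched as a small in-memory simulation. `MigratingStore` and the fake `embed` function are hypothetical; a real system would build a second collection in the vector store and repoint reads once it is complete.

```python
def embed(text, model):
    # Stand-in for a real embedding call: tags the result with the model name.
    return (model, len(text))


class MigratingStore:
    def __init__(self, old_model, docs):
        self.docs = dict(docs)
        self.active_model = old_model
        self.collections = {
            old_model: {doc_id: embed(text, old_model) for doc_id, text in self.docs.items()}
        }

    def migrate(self, new_model):
        # Build the new collection alongside the old one; reads are unaffected.
        self.collections[new_model] = {
            doc_id: embed(text, new_model) for doc_id, text in self.docs.items()
        }
        # Switch only after every document has been re-embedded.
        if self.collections[new_model].keys() == self.collections[self.active_model].keys():
            self.active_model = new_model

    def query_collection(self):
        """Queries always read one collection, so models are never mixed."""
        return self.collections[self.active_model]


store = MigratingStore("old-model", {"a": "hello", "b": "world!"})
before = store.active_model
store.migrate("new-model")
print(before, "->", store.active_model)
```

Keeping the old collection until the switch also gives a rollback path if retrieval quality regresses with the new model.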
|
|
982
|
+
|
|
983
|
+
## Validation Checks
|
|
984
|
+
|
|
985
|
+
### In-Memory Store in Production Code
|
|
986
|
+
|
|
987
|
+
Severity: ERROR
|
|
988
|
+
|
|
989
|
+
In-memory stores lose data on restart
|
|
990
|
+
|
|
991
|
+
Message: In-memory store detected. Use persistent storage (Postgres, Qdrant, Pinecone) for production.
|
|
992
|
+
|
|
993
|
+
### Vector Upsert Without Metadata
|
|
994
|
+
|
|
995
|
+
Severity: WARNING
|
|
996
|
+
|
|
997
|
+
Vectors should have metadata for filtering
|
|
998
|
+
|
|
999
|
+
Message: Vector upsert without metadata. Add user_id, type, timestamp for proper filtering.
|
|
1000
|
+
|
|
1001
|
+
### Query Without User Filtering
|
|
1002
|
+
|
|
1003
|
+
Severity: ERROR
|
|
1004
|
+
|
|
1005
|
+
Queries should filter by user to prevent data leakage
|
|
1006
|
+
|
|
1007
|
+
Message: Vector query without user filtering. Always filter by user_id to prevent data leakage.
|
|
1008
|
+
|
|
1009
|
+
### Hardcoded Chunk Size Without Justification
|
|
1010
|
+
|
|
1011
|
+
Severity: INFO
|
|
1012
|
+
|
|
1013
|
+
Chunk size should be tested and justified
|
|
1014
|
+
|
|
1015
|
+
Message: Hardcoded chunk size. Test different sizes for your content type and measure retrieval accuracy.
|
|
1016
|
+
|
|
1017
|
+
### Chunking Without Overlap
|
|
1018
|
+
|
|
1019
|
+
Severity: WARNING
|
|
1020
|
+
|
|
1021
|
+
Chunk overlap prevents boundary issues
|
|
1022
|
+
|
|
1023
|
+
Message: Text splitting without overlap. Add chunk_overlap (10-20%) to prevent boundary issues.
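A minimal character-based splitter shows what overlap means in practice. The function name and sizes are illustrative assumptions; production splitters usually work on tokens or sentence boundaries rather than raw characters.

```python
def split_with_overlap(text, chunk_size=200, overlap=30):
    """Split text into fixed-size chunks where neighbors share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=2)
print(chunks)
```

Each chunk repeats the last two characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk.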
|
|
1024
|
+
|
|
1025
|
+
### Semantic Search Without Filters
|
|
1026
|
+
|
|
1027
|
+
Severity: WARNING
|
|
1028
|
+
|
|
1029
|
+
Pure semantic search often returns irrelevant results
|
|
1030
|
+
|
|
1031
|
+
Message: Pure semantic search. Add metadata filters (user, type, time) for better relevance.
|
|
1032
|
+
|
|
1033
|
+
### Retrieval Without Result Limit
|
|
1034
|
+
|
|
1035
|
+
Severity: WARNING
|
|
1036
|
+
|
|
1037
|
+
Unbounded retrieval can overflow context
|
|
1038
|
+
|
|
1039
|
+
Message: Retrieval without limit. Set top_k to prevent context overflow.
|
|
1040
|
+
|
|
1041
|
+
### Embeddings Without Model Version Tracking
|
|
1042
|
+
|
|
1043
|
+
Severity: WARNING
|
|
1044
|
+
|
|
1045
|
+
Track embedding model to handle migrations
|
|
1046
|
+
|
|
1047
|
+
Message: Store embedding model version in metadata to handle model migrations.
|
|
1048
|
+
|
|
1049
|
+
### Different Models for Document and Query Embedding
|
|
1050
|
+
|
|
1051
|
+
Severity: ERROR
|
|
1052
|
+
|
|
1053
|
+
Documents and queries must use same embedding model
|
|
1054
|
+
|
|
1055
|
+
Message: Ensure same embedding model for indexing and querying.
|
|
1056
|
+
|
|
1057
|
+
## Collaboration
|
|
1058
|
+
|
|
1059
|
+
### Delegation Triggers
|
|
1060
|
+
|
|
1061
|
+
- user needs vector database at scale -> data-engineer (Production vector store operations)
|
|
1062
|
+
- user needs embedding model optimization -> ml-engineer (Custom embeddings, fine-tuning)
|
|
1063
|
+
- user needs knowledge graph -> knowledge-engineer (Graph-based memory structures)
|
|
1064
|
+
- user needs RAG pipeline -> llm-architect (End-to-end retrieval augmented generation)
|
|
1065
|
+
- user needs multi-agent shared memory -> multi-agent-orchestration (Memory sharing between agents)
|
|
66
1066
|
|
|
67
1067
|
## Related Skills
|
|
68
1068
|
|
|
69
1069
|
Works well with: `autonomous-agents`, `multi-agent-orchestration`, `llm-architect`, `agent-tool-builder`
|
|
70
1070
|
|
|
71
1071
|
## When to Use
|
|
72
|
-
|
|
1072
|
+
|
|
1073
|
+
- User mentions or implies: agent memory
|
|
1074
|
+
- User mentions or implies: long-term memory
|
|
1075
|
+
- User mentions or implies: memory systems
|
|
1076
|
+
- User mentions or implies: remember across sessions
|
|
1077
|
+
- User mentions or implies: memory retrieval
|
|
1078
|
+
- User mentions or implies: episodic memory
|
|
1079
|
+
- User mentions or implies: semantic memory
|
|
1080
|
+
- User mentions or implies: vector store
|
|
1081
|
+
- User mentions or implies: rag
|
|
1082
|
+
- User mentions or implies: langmem
|
|
1083
|
+
- User mentions or implies: memgpt
|
|
1084
|
+
- User mentions or implies: conversation history
|