groundswell 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/settings.local.json +9 -0
- package/.claude/system_prompts/task-breakdown.md +100 -0
- package/PRPs/001-hierarchical-workflow-engine.md +2438 -0
- package/PRPs/PRDs/001-hierarchical-workflow-engine.md +543 -0
- package/PRPs/PRDs/002-agent-prompt.md +390 -0
- package/PRPs/PRDs/003-agent-prompt.md +943 -0
- package/PRPs/PRDs/004-agent-prompt.md +1136 -0
- package/PRPs/PRDs/tasks-001.json +492 -0
- package/PRPs/README.md +83 -0
- package/PRPs/templates/prp_base.md +222 -0
- package/README.md +218 -0
- package/docs/agent.md +422 -0
- package/docs/prompt.md +419 -0
- package/docs/workflow.md +600 -0
- package/examples/README.md +244 -0
- package/examples/examples/01-basic-workflow.ts +100 -0
- package/examples/examples/02-decorator-options.ts +217 -0
- package/examples/examples/03-parent-child.ts +241 -0
- package/examples/examples/04-observers-debugger.ts +340 -0
- package/examples/examples/05-error-handling.ts +387 -0
- package/examples/examples/06-concurrent-tasks.ts +352 -0
- package/examples/examples/07-agent-loops.ts +432 -0
- package/examples/examples/08-sdk-features.ts +667 -0
- package/examples/examples/09-reflection.ts +573 -0
- package/examples/examples/10-introspection.ts +550 -0
- package/examples/index.ts +143 -0
- package/examples/utils/helpers.ts +57 -0
- package/llms_full.txt +5890 -0
- package/package.json +63 -0
- package/plan/P1P2/PRP.md +527 -0
- package/plan/P1P2/research/LRU_CACHE_BEST_PRACTICES.md +1929 -0
- package/plan/P1P2/research/LRU_CACHE_CODE_PATTERNS.md +857 -0
- package/plan/P1P2/research/LRU_CACHE_INTEGRATION_GUIDE.md +738 -0
- package/plan/P1P2/research/LRU_CACHE_RESEARCH_INDEX.md +424 -0
- package/plan/P1P2/research/REFLECTION_INDEX.md +291 -0
- package/plan/P1P2/research/REFLECTION_RESEARCH_REPORT.md +1342 -0
- package/plan/P1P2/research/RESEARCH_SUMMARY.md +342 -0
- package/plan/P1P2/research/anthropic-sdk.md +174 -0
- package/plan/P1P2/research/async-local-storage.md +200 -0
- package/plan/P1P2/research/reflection-code-patterns.md +1205 -0
- package/plan/P1P2/research/reflection-decision-matrix.md +421 -0
- package/plan/P1P2/research/reflection-implementation-guide.md +1341 -0
- package/plan/P1P2/research/reflection-integration-guide.md +834 -0
- package/plan/P1P2/research/reflection-patterns.md +1468 -0
- package/plan/P1P2/research/reflection-quick-reference.md +558 -0
- package/plan/P1P2/research/zod-schema.md +152 -0
- package/plan/P3P4/PRP.md +1388 -0
- package/plan/P3P4/research/caching-lru.md +116 -0
- package/plan/P3P4/research/introspection-tools.md +177 -0
- package/plan/P3P4/research/reflection-patterns.md +117 -0
- package/plan/P4P5/PRP.md +1136 -0
- package/plan/P4P5/research/RESEARCH_SUMMARY.md +151 -0
- package/plan/architecture/external_deps.md +358 -0
- package/plan/architecture/system_context.md +242 -0
- package/plan/backlog.json +867 -0
- package/plan/research/INTROSPECTION_RESEARCH_SUMMARY.md +378 -0
- package/plan/research/README-INTROSPECTION.md +352 -0
- package/plan/research/agent-introspection-patterns.md +1085 -0
- package/plan/research/introspection-security-guide.md +928 -0
- package/plan/research/introspection-tool-examples.md +875 -0
- package/scripts/generate-llms-full.ts +206 -0
- package/src/__tests__/integration/agent-workflow.test.ts +256 -0
- package/src/__tests__/integration/tree-mirroring.test.ts +114 -0
- package/src/__tests__/unit/agent.test.ts +169 -0
- package/src/__tests__/unit/cache-key.test.ts +182 -0
- package/src/__tests__/unit/cache.test.ts +172 -0
- package/src/__tests__/unit/context.test.ts +138 -0
- package/src/__tests__/unit/decorators.test.ts +100 -0
- package/src/__tests__/unit/introspection-tools.test.ts +277 -0
- package/src/__tests__/unit/prompt.test.ts +135 -0
- package/src/__tests__/unit/reflection.test.ts +210 -0
- package/src/__tests__/unit/tree-debugger.test.ts +85 -0
- package/src/__tests__/unit/workflow.test.ts +81 -0
- package/src/cache/cache-key.ts +244 -0
- package/src/cache/cache.ts +236 -0
- package/src/cache/index.ts +8 -0
- package/src/core/agent.ts +573 -0
- package/src/core/context.ts +119 -0
- package/src/core/event-tree.ts +260 -0
- package/src/core/factory.ts +123 -0
- package/src/core/index.ts +17 -0
- package/src/core/logger.ts +87 -0
- package/src/core/mcp-handler.ts +184 -0
- package/src/core/prompt.ts +150 -0
- package/src/core/workflow-context.ts +349 -0
- package/src/core/workflow.ts +302 -0
- package/src/debugger/index.ts +1 -0
- package/src/debugger/tree-debugger.ts +210 -0
- package/src/decorators/index.ts +3 -0
- package/src/decorators/observed-state.ts +95 -0
- package/src/decorators/step.ts +139 -0
- package/src/decorators/task.ts +96 -0
- package/src/examples/index.ts +2 -0
- package/src/examples/tdd-orchestrator.ts +65 -0
- package/src/examples/test-cycle-workflow.ts +64 -0
- package/src/index.ts +140 -0
- package/src/reflection/index.ts +5 -0
- package/src/reflection/reflection.ts +407 -0
- package/src/tools/index.ts +36 -0
- package/src/tools/introspection.ts +464 -0
- package/src/types/agent.ts +90 -0
- package/src/types/decorators.ts +25 -0
- package/src/types/error-strategy.ts +13 -0
- package/src/types/error.ts +20 -0
- package/src/types/events.ts +74 -0
- package/src/types/index.ts +55 -0
- package/src/types/logging.ts +24 -0
- package/src/types/observer.ts +18 -0
- package/src/types/prompt.ts +40 -0
- package/src/types/reflection.ts +117 -0
- package/src/types/sdk-primitives.ts +128 -0
- package/src/types/snapshot.ts +14 -0
- package/src/types/workflow-context.ts +163 -0
- package/src/types/workflow.ts +37 -0
- package/src/utils/id.ts +11 -0
- package/src/utils/index.ts +3 -0
- package/src/utils/observable.ts +77 -0
- package/tasks.json +0 -0
- package/tsconfig.json +22 -0
- package/vitest.config.ts +16 -0
|
@@ -0,0 +1,424 @@
|
|
|
1
|
+
# LRU Cache Research for LLM Response Caching - Complete Index
|
|
2
|
+
|
|
3
|
+
**Research Period:** 2025-12-08
|
|
4
|
+
**Scope:** Best practices for implementing LRU caching in TypeScript/Node.js for LLM response caching
|
|
5
|
+
**Target:** Production-grade implementation with lru-cache v10+, deterministic key generation, and semantic caching
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Research Documents
|
|
10
|
+
|
|
11
|
+
### 1. **LRU_CACHE_BEST_PRACTICES.md** (49 KB)
|
|
12
|
+
**Comprehensive reference guide with in-depth technical details**
|
|
13
|
+
|
|
14
|
+
- Executive summary of LRU caching for LLMs
|
|
15
|
+
- lru-cache v10+ package deep dive:
|
|
16
|
+
- Core concepts and configuration options (max, maxSize, TTL)
|
|
17
|
+
- The `fetch()` method (recommended pattern)
|
|
18
|
+
- Size calculation best practices
|
|
19
|
+
- Production configuration recommendations
|
|
20
|
+
- Deterministic JSON stringification:
|
|
21
|
+
- The problem and 3+ solutions
|
|
22
|
+
- `safe-stable-stringify` (recommended)
|
|
23
|
+
- `fast-json-stable-stringify` (performance alternative)
|
|
24
|
+
- `json-stringify-deterministic`
|
|
25
|
+
- Comparison table
|
|
26
|
+
- SHA-256 hashing in Node.js:
|
|
27
|
+
- Built-in crypto module usage
|
|
28
|
+
- Performance characteristics (~20 MB/s)
|
|
29
|
+
- Stream-based hashing for large data
|
|
30
|
+
- HMAC patterns for cache validation
|
|
31
|
+
- Zod schema hashing patterns:
|
|
32
|
+
- Schema fingerprinting techniques
|
|
33
|
+
- Deep traversal for nested schemas
|
|
34
|
+
- Versioning approach (recommended)
|
|
35
|
+
- Zod v4 improvements
|
|
36
|
+
- LLM response caching architecture:
|
|
37
|
+
- Exact vs. semantic caching
|
|
38
|
+
- Hybrid approach (recommended)
|
|
39
|
+
- Multi-layer cache architecture
|
|
40
|
+
- Cache hit rate monitoring
|
|
41
|
+
- Common pitfalls and solutions (8 detailed patterns)
|
|
42
|
+
- Performance benchmarks with data
|
|
43
|
+
- Complete production-ready implementation example
|
|
44
|
+
- LangChain integration patterns
|
|
45
|
+
- Version recommendations and compatibility
|
|
46
|
+
|
|
47
|
+
**Use This For:** Understanding the full technical landscape, making architectural decisions, detailed implementation guidance.
|
|
48
|
+
|
|
49
|
+
---
|
|
50
|
+
|
|
51
|
+
### 2. **LRU_CACHE_CODE_PATTERNS.md** (19 KB)
|
|
52
|
+
**Quick reference with copy-paste ready code examples**
|
|
53
|
+
|
|
54
|
+
- Installation guide
|
|
55
|
+
- Quick start minimal example
|
|
56
|
+
- Cache key generation patterns:
|
|
57
|
+
- Simple string hashing
|
|
58
|
+
- Object hashing (most common)
|
|
59
|
+
- Composite keys with prefix
|
|
60
|
+
- Semantic cache keys
|
|
61
|
+
- Versioned keys (auto-invalidate)
|
|
62
|
+
- Configuration patterns:
|
|
63
|
+
- Development cache
|
|
64
|
+
- Production cache
|
|
65
|
+
- Memory-constrained cache
|
|
66
|
+
- High-throughput cache
|
|
67
|
+
- Persistent + in-memory hybrid
|
|
68
|
+
- Usage patterns:
|
|
69
|
+
- Basic fetch (recommended)
|
|
70
|
+
- Conditional fetch
|
|
71
|
+
- Batch operations
|
|
72
|
+
- Stale-while-revalidate
|
|
73
|
+
- Cache warming
|
|
74
|
+
- Testing patterns:
|
|
75
|
+
- Cache hit/miss testing
|
|
76
|
+
- Key generation testing
|
|
77
|
+
- Performance testing
|
|
78
|
+
- Monitoring patterns:
|
|
79
|
+
- Basic metrics collection
|
|
80
|
+
- Periodic logging
|
|
81
|
+
- Alert on low hit rate
|
|
82
|
+
- Export metrics to JSON
|
|
83
|
+
- Edge case handling:
|
|
84
|
+
- Large prompts
|
|
85
|
+
- Special characters
|
|
86
|
+
- Null/undefined values
|
|
87
|
+
- Complete integration example with service class
|
|
88
|
+
- All patterns include working TypeScript/JavaScript code
|
|
89
|
+
|
|
90
|
+
**Use This For:** Copy-paste implementation, solving specific problems, quick integration.
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
### 3. **LRU_CACHE_INTEGRATION_GUIDE.md** (18 KB)
|
|
95
|
+
**Practical integration strategies and decision matrix**
|
|
96
|
+
|
|
97
|
+
- Quick decision matrix:
|
|
98
|
+
- Exact vs. semantic vs. hybrid caching
|
|
99
|
+
- Deployment environment recommendations
|
|
100
|
+
- Package selection guide
|
|
101
|
+
- Integration strategies:
|
|
102
|
+
- Strategy 1: Minimal Integration (MVP/simple cases)
|
|
103
|
+
- Strategy 2: Service-Based Integration (recommended for production)
|
|
104
|
+
- Strategy 3: Multi-Layer Caching (multi-node deployment)
|
|
105
|
+
- Each with pros/cons and implementation examples
|
|
106
|
+
- Framework-specific integration:
|
|
107
|
+
- Groundswell workflow engine integration
|
|
108
|
+
- Plugin-based approach
|
|
109
|
+
- Decorator-based caching
|
|
110
|
+
- OpenAI integration example
|
|
111
|
+
- Migration path (4-phase rollout):
|
|
112
|
+
- Phase 1: Evaluation (Week 1)
|
|
113
|
+
- Phase 2: Pilot (Week 2-3)
|
|
114
|
+
- Phase 3: Rollout (Week 4-8)
|
|
115
|
+
- Phase 4: Optimization (Week 8+)
|
|
116
|
+
- Troubleshooting guide (6 common issues):
|
|
117
|
+
- Low cache hit rate diagnosis and solutions
|
|
118
|
+
- High memory usage fixes
|
|
119
|
+
- Stale data handling
|
|
120
|
+
- Test cache issues
|
|
121
|
+
- Race conditions
|
|
122
|
+
- Key length optimization
|
|
123
|
+
- Performance tuning checklist (10 items)
|
|
124
|
+
- Decision flowchart
|
|
125
|
+
|
|
126
|
+
**Use This For:** Choosing integration strategy, planning rollout, troubleshooting issues, framework integration.
|
|
127
|
+
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
## Key Findings Summary
|
|
131
|
+
|
|
132
|
+
### Most Important Takeaways
|
|
133
|
+
|
|
134
|
+
1. **Always use `safe-stable-stringify`** instead of native `JSON.stringify()`
|
|
135
|
+
- Native JSON.stringify has no key ordering guarantee
|
|
136
|
+
- Causes cache misses for identical logical inputs with different key order
|
|
137
|
+
- `safe-stable-stringify` is fastest deterministic option
|
|
138
|
+
|
|
139
|
+
2. **Use the `fetch()` method** instead of manual get/set
|
|
140
|
+
- Automatically deduplicates concurrent requests
|
|
141
|
+
- Prevents race conditions
|
|
142
|
+
- Recommended pattern for all use cases
|
|
143
|
+
|
|
144
|
+
3. **Configure at least one limit** (max, maxSize, or TTL)
|
|
145
|
+
- Prevents unbounded memory growth
|
|
146
|
+
- `max: 5000` for typical LLM cache
|
|
147
|
+
- `maxSize: 500MB` for production
|
|
148
|
+
|
|
149
|
+
4. **Semantic caching achieves 60-70% hit rates**
|
|
150
|
+
- Exact caching: 30-40% hit rate
|
|
151
|
+
- Semantic caching: 60-70% hit rate
|
|
152
|
+
- Requires embedding model (~10-50ms latency)
|
|
153
|
+
- Consider hybrid approach for production
|
|
154
|
+
|
|
155
|
+
5. **SHA-256 hashing via Node.js crypto is excellent**
|
|
156
|
+
- ~20 MB/s throughput (OpenSSL-backed)
|
|
157
|
+
- <0.05ms for typical LLM prompt
|
|
158
|
+
- Use for prompts > 100 characters
|
|
159
|
+
|
|
160
|
+
6. **Cache key generation must be deterministic**
|
|
161
|
+
- Object key order matters
|
|
162
|
+
- Circular references must be handled
|
|
163
|
+
- Include version numbers for auto-invalidation
|
|
164
|
+
|
|
165
|
+
7. **Zod schema hashing should use versioning** rather than dynamic hashing
|
|
166
|
+
- `_def` introspection is fragile (private API)
|
|
167
|
+
- Explicit version numbers are more stable
|
|
168
|
+
- Increment version = automatic cache invalidation
|
|
169
|
+
|
|
170
|
+
8. **Hit rate > 30% is good baseline**
|
|
171
|
+
- 30-40% for exact caching (typical)
|
|
172
|
+
- Monitor and alert below threshold
|
|
173
|
+
- Use metrics to guide optimization
|
|
174
|
+
|
|
175
|
+
### Performance Expectations
|
|
176
|
+
|
|
177
|
+
| Metric | Value | Notes |
|
|
178
|
+
|--------|-------|-------|
|
|
179
|
+
| Cache hit lookup | <0.1ms | Negligible |
|
|
180
|
+
| Deterministic stringify | <1ms | For typical prompt |
|
|
181
|
+
| SHA-256 hashing | <0.05ms | Built-in crypto |
|
|
182
|
+
| fetch() deduplication | ~0.0001ms | Minimal overhead |
|
|
183
|
+
| Hit rate (exact) | 30-40% | Expected baseline |
|
|
184
|
+
| Hit rate (semantic) | 60-70% | With embedding model |
|
|
185
|
+
| Memory per 100 responses | 1-5 MB | Depends on response size |
|
|
186
|
+
|
|
187
|
+
### Architecture Recommendation
|
|
188
|
+
|
|
189
|
+
```
|
|
190
|
+
Development:
|
|
191
|
+
LRUCache (max: 100, ttl: 1hr)
|
|
192
|
+
|
|
193
|
+
Production (Single Node):
|
|
194
|
+
L1: LRUCache (max: 5000, maxSize: 500MB, ttl: 24hrs)
|
|
195
|
+
|
|
196
|
+
Production (Multi-Node):
|
|
197
|
+
L1: LRUCache (max: 1000, maxSize: 100MB, ttl: 24hrs)
|
|
198
|
+
L2: Redis (shared across nodes)
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
## Quick Decision Guide
|
|
204
|
+
|
|
205
|
+
### Q: Which package should I use?
|
|
206
|
+
**A:** `safe-stable-stringify` (v2.4.0+) for deterministic key generation
|
|
207
|
+
|
|
208
|
+
### Q: Exact or semantic caching?
|
|
209
|
+
**A:** Start with exact caching (30-40% hit rate), add semantic if needed (60-70%)
|
|
210
|
+
|
|
211
|
+
### Q: How should I generate cache keys?
|
|
212
|
+
**A:** Use `safeStringify() + createHash('sha256') + digest('hex')`
|
|
213
|
+
|
|
214
|
+
### Q: What cache size should I use?
|
|
215
|
+
**A:** Start with `max: 5000` + `maxSize: 500MB` for production
|
|
216
|
+
|
|
217
|
+
### Q: How long should I cache responses?
|
|
218
|
+
**A:** 24 hours TTL with `updateAgeOnGet: true` for hot items
|
|
219
|
+
|
|
220
|
+
### Q: Should I cache to Redis?
|
|
221
|
+
**A:** Only if multi-node deployment; L1 in-memory is usually sufficient
|
|
222
|
+
|
|
223
|
+
### Q: How do I monitor cache performance?
|
|
224
|
+
**A:** Track hits/misses, hit rate, latency, memory usage
|
|
225
|
+
|
|
226
|
+
### Q: What's the expected hit rate?
|
|
227
|
+
**A:** 30-40% for exact caching, 60-70% for semantic
|
|
228
|
+
|
|
229
|
+
---
|
|
230
|
+
|
|
231
|
+
## Implementation Checklist
|
|
232
|
+
|
|
233
|
+
### Before Starting
|
|
234
|
+
- [ ] Node.js 18+ installed
|
|
235
|
+
- [ ] TypeScript configured (if using TS)
|
|
236
|
+
- [ ] npm/yarn ready
|
|
237
|
+
|
|
238
|
+
### Setup Phase
|
|
239
|
+
- [ ] Install `lru-cache` v10+
|
|
240
|
+
- [ ] Install `safe-stable-stringify`
|
|
241
|
+
- [ ] Install `zod` (for validation)
|
|
242
|
+
- [ ] Review architecture decision (Strategy 1/2/3)
|
|
243
|
+
|
|
244
|
+
### Implementation Phase
|
|
245
|
+
- [ ] Create cache service class
|
|
246
|
+
- [ ] Implement deterministic key generation
|
|
247
|
+
- [ ] Add to main LLM workflow
|
|
248
|
+
- [ ] Implement metrics collection
|
|
249
|
+
- [ ] Write tests (hit/miss scenarios)
|
|
250
|
+
|
|
251
|
+
### Validation Phase
|
|
252
|
+
- [ ] Test cache hit scenarios
|
|
253
|
+
- [ ] Test cache miss scenarios
|
|
254
|
+
- [ ] Monitor hit rate (target: >30%)
|
|
255
|
+
- [ ] Check memory usage
|
|
256
|
+
- [ ] Load test with concurrent requests
|
|
257
|
+
- [ ] Verify no race conditions
|
|
258
|
+
|
|
259
|
+
### Production Phase
|
|
260
|
+
- [ ] Add monitoring/alerting
|
|
261
|
+
- [ ] Set up metrics export
|
|
262
|
+
- [ ] Document cache behavior
|
|
263
|
+
- [ ] Create cache invalidation procedure
|
|
264
|
+
- [ ] Plan upgrade path for schema changes
|
|
265
|
+
|
|
266
|
+
---
|
|
267
|
+
|
|
268
|
+
## Document Map by Use Case
|
|
269
|
+
|
|
270
|
+
### "I just need to get started quickly"
|
|
271
|
+
1. Read: LRU_CACHE_CODE_PATTERNS.md → Quick Start section
|
|
272
|
+
2. Copy: The minimal example
|
|
273
|
+
3. Integrate: Into your LLM service
|
|
274
|
+
4. Done
|
|
275
|
+
|
|
276
|
+
### "I need to understand everything"
|
|
277
|
+
1. Read: LRU_CACHE_BEST_PRACTICES.md (complete)
|
|
278
|
+
2. Deep dive: Sections matching your questions
|
|
279
|
+
3. Reference: Code patterns as needed
|
|
280
|
+
4. Decide: Integration strategy from guide
|
|
281
|
+
|
|
282
|
+
### "I need to plan integration"
|
|
283
|
+
1. Read: LRU_CACHE_INTEGRATION_GUIDE.md → Decision Matrix
|
|
284
|
+
2. Choose: One of 3 strategies
|
|
285
|
+
3. Plan: Your rollout phases
|
|
286
|
+
4. Execute: Using code patterns
|
|
287
|
+
|
|
288
|
+
### "I'm having problems"
|
|
289
|
+
1. Go to: LRU_CACHE_INTEGRATION_GUIDE.md → Troubleshooting
|
|
290
|
+
2. Match: Your issue to diagnosis
|
|
291
|
+
3. Apply: Recommended solution
|
|
292
|
+
4. Test: Changes to verify fix
|
|
293
|
+
|
|
294
|
+
### "I need production monitoring"
|
|
295
|
+
1. Read: LRU_CACHE_CODE_PATTERNS.md → Monitoring Patterns
|
|
296
|
+
2. Implement: Metrics collection
|
|
297
|
+
3. Setup: Periodic logging
|
|
298
|
+
4. Configure: Alerts for low hit rate
|
|
299
|
+
|
|
300
|
+
---
|
|
301
|
+
|
|
302
|
+
## Version Recommendations
|
|
303
|
+
|
|
304
|
+
### Minimum Versions (Works, Not Recommended)
|
|
305
|
+
- lru-cache: v7+ (rewritten in TS)
|
|
306
|
+
- safe-stable-stringify: v2.0+
|
|
307
|
+
- Node.js: 14+
|
|
308
|
+
|
|
309
|
+
### Recommended Versions (Best Balance)
|
|
310
|
+
- lru-cache: v10.0.0+
|
|
311
|
+
- safe-stable-stringify: v2.4.0+
|
|
312
|
+
- zod: v3.20+
|
|
313
|
+
- Node.js: 18 LTS
|
|
314
|
+
|
|
315
|
+
### Cutting Edge (Latest Features)
|
|
316
|
+
- lru-cache: v11.2.2+
|
|
317
|
+
- safe-stable-stringify: v2.4.3+
|
|
318
|
+
- zod: v3.22+
|
|
319
|
+
- Node.js: 20+ LTS or 22+ current
|
|
320
|
+
|
|
321
|
+
---
|
|
322
|
+
|
|
323
|
+
## Research Sources
|
|
324
|
+
|
|
325
|
+
### Primary Sources
|
|
326
|
+
- lru-cache npm: https://www.npmjs.com/package/lru-cache
|
|
327
|
+
- Node.js Crypto API: https://nodejs.org/api/crypto.html
|
|
328
|
+
- safe-stable-stringify: https://github.com/davidmarkclements/fast-safe-stringify
|
|
329
|
+
- fast-json-stable-stringify: https://github.com/epoberezkin/fast-json-stable-stringify
|
|
330
|
+
- Zod Documentation: https://zod.dev/
|
|
331
|
+
|
|
332
|
+
### LLM Caching References
|
|
333
|
+
- Helicone LLM Caching Guide: https://www.helicone.ai/blog/effective-llm-caching
|
|
334
|
+
- IBM Prompt Caching: https://www.ibm.com/think/topics/prompt-caching
|
|
335
|
+
- GPTCache Library: https://github.com/zilliztech/GPTCache
|
|
336
|
+
- LangChain Caching: https://docs.langchain.com/modules/memory/
|
|
337
|
+
|
|
338
|
+
### Performance Research
|
|
339
|
+
- JavaScript Hashing Speed: https://lemire.me/blog/2025/01/11/javascript-hashing-speed-comparison-md5-versus-sha-256/
|
|
340
|
+
- Node.js Performance: https://github.com/nodejs/performance/issues/136
|
|
341
|
+
|
|
342
|
+
---
|
|
343
|
+
|
|
344
|
+
## Next Steps for Your Project
|
|
345
|
+
|
|
346
|
+
1. **Short Term (This Week)**
|
|
347
|
+
- [ ] Review LRU_CACHE_BEST_PRACTICES.md
|
|
348
|
+
- [ ] Copy minimal example from CODE_PATTERNS
|
|
349
|
+
- [ ] Integrate into dev environment
|
|
350
|
+
- [ ] Measure baseline metrics
|
|
351
|
+
|
|
352
|
+
2. **Medium Term (Next 2-4 Weeks)**
|
|
353
|
+
- [ ] Implement service-based caching (Strategy 2)
|
|
354
|
+
- [ ] Add monitoring and metrics
|
|
355
|
+
- [ ] Pilot in staging environment
|
|
356
|
+
- [ ] Tune configuration
|
|
357
|
+
|
|
358
|
+
3. **Long Term (Month 2+)**
|
|
359
|
+
- [ ] Add semantic caching if needed
|
|
360
|
+
- [ ] Multi-layer caching if multi-node
|
|
361
|
+
- [ ] Comprehensive monitoring dashboard
|
|
362
|
+
- [ ] Auto-scaling policies
|
|
363
|
+
|
|
364
|
+
---
|
|
365
|
+
|
|
366
|
+
## Questions? Common Issues?
|
|
367
|
+
|
|
368
|
+
**Q: My cache keys don't match even for identical inputs**
|
|
369
|
+
A: Use `safe-stable-stringify`, not `JSON.stringify`
|
|
370
|
+
|
|
371
|
+
**Q: Cache hit rate is too low (< 20%)**
|
|
372
|
+
A: Check for non-deterministic key generation or slight prompt variations
|
|
373
|
+
|
|
374
|
+
**Q: Memory usage is growing unbounded**
|
|
375
|
+
A: Ensure you configured `max` or `maxSize` limit
|
|
376
|
+
|
|
377
|
+
**Q: Concurrent requests calling LLM multiple times**
|
|
378
|
+
A: Use `cache.fetch()` method, not manual get/set
|
|
379
|
+
|
|
380
|
+
**Q: Schema changes aren't invalidating cache**
|
|
381
|
+
A: Use versioning in your cache key generation
|
|
382
|
+
|
|
383
|
+
**Q: How do I test cache behavior?**
|
|
384
|
+
A: See testing patterns in CODE_PATTERNS.md
|
|
385
|
+
|
|
386
|
+
---
|
|
387
|
+
|
|
388
|
+
## Document Statistics
|
|
389
|
+
|
|
390
|
+
| Document | Size | Sections | Code Examples | Tables |
|
|
391
|
+
|----------|------|----------|---|--------|
|
|
392
|
+
| Best Practices | 49 KB | 10 | 50+ | 8 |
|
|
393
|
+
| Code Patterns | 19 KB | 7 | 30+ | 2 |
|
|
394
|
+
| Integration Guide | 18 KB | 5 | 15+ | 3 |
|
|
395
|
+
| **Total** | **86 KB** | **22** | **95+** | **13** |
|
|
396
|
+
|
|
397
|
+
---
|
|
398
|
+
|
|
399
|
+
## Research Completion Summary
|
|
400
|
+
|
|
401
|
+
This comprehensive research provides:
|
|
402
|
+
|
|
403
|
+
✅ Complete API reference for lru-cache v10+
|
|
404
|
+
✅ 3 deterministic stringification solutions with comparison
|
|
405
|
+
✅ SHA-256 hashing best practices with performance data
|
|
406
|
+
✅ Zod schema hashing patterns and anti-patterns
|
|
407
|
+
✅ 8+ common pitfalls with solutions
|
|
408
|
+
✅ 3 integration strategies with decision matrix
|
|
409
|
+
✅ 95+ copy-paste ready code examples
|
|
410
|
+
✅ Testing patterns for cache behavior
|
|
411
|
+
✅ Monitoring and metrics patterns
|
|
412
|
+
✅ Troubleshooting guide with 6 common issues
|
|
413
|
+
✅ 4-phase migration/rollout plan
|
|
414
|
+
✅ Performance benchmarks and expectations
|
|
415
|
+
✅ Production recommendations
|
|
416
|
+
✅ Complete working implementation examples
|
|
417
|
+
|
|
418
|
+
**Ready for implementation!**
|
|
419
|
+
|
|
420
|
+
---
|
|
421
|
+
|
|
422
|
+
**Index Version:** 1.0
|
|
423
|
+
**Created:** 2025-12-08
|
|
424
|
+
**Status:** Complete and Ready for Use
|
|
@@ -0,0 +1,291 @@
|
|
|
1
|
+
# AI Reflection and Self-Correction: Complete Research Index
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
This research package contains a comprehensive analysis of AI reflection and self-correction patterns in agent orchestration systems, based on 2024-2025 research and industry best practices.
|
|
6
|
+
|
|
7
|
+
**Key Finding**: Reflection with external feedback (evidence-grounded) significantly improves LLM agent performance, with even minimal "try again" signals yielding 60%+ improvement rates.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Documents in This Package
|
|
12
|
+
|
|
13
|
+
### 1. REFLECTION_RESEARCH_REPORT.md
|
|
14
|
+
**Comprehensive research report covering all aspects of reflection patterns**
|
|
15
|
+
|
|
16
|
+
- **Sections**:
|
|
17
|
+
- Core reflection concepts and types
|
|
18
|
+
- Research-backed approaches (Reflexion, LATS, SPOC)
|
|
19
|
+
- Prompt templates for different scenarios
|
|
20
|
+
- Retry limits and backoff strategies
|
|
21
|
+
- Introspection tools and security
|
|
22
|
+
- Best practices from LangChain, CrewAI, AutoGPT
|
|
23
|
+
- Common pitfalls (infinite loops, context bloat)
|
|
24
|
+
- Security considerations and patterns
|
|
25
|
+
- Monitoring and observability
|
|
26
|
+
|
|
27
|
+
- **Use when**: You need detailed understanding of reflection theory and implementation
|
|
28
|
+
|
|
29
|
+
- **Key sections for reference**:
|
|
30
|
+
- Section 1: Reflection patterns overview
|
|
31
|
+
- Section 2: Implementation patterns with code examples
|
|
32
|
+
- Section 3: Introspection tools specifications
|
|
33
|
+
- Section 5: Best practices from frameworks
|
|
34
|
+
- Section 10: Key recommendations
|
|
35
|
+
|
|
36
|
+
---
|
|
37
|
+
|
|
38
|
+
### 2. reflection-implementation-guide.md
|
|
39
|
+
**Practical code examples in TypeScript and Python**
|
|
40
|
+
|
|
41
|
+
- **Contains**:
|
|
42
|
+
- Basic reflection loop (TypeScript)
|
|
43
|
+
- Reflexion pattern implementation (Python)
|
|
44
|
+
- Loop detection system (TypeScript)
|
|
45
|
+
- Context window management (Python)
|
|
46
|
+
- Security validation (TypeScript)
|
|
47
|
+
- Multi-level hierarchical reflection (Python)
|
|
48
|
+
- Metrics collection (TypeScript)
|
|
49
|
+
- Orchestration integration example
|
|
50
|
+
|
|
51
|
+
- **Use when**: Building actual reflection systems; copy-paste ready code
|
|
52
|
+
|
|
53
|
+
- **How to use**:
|
|
54
|
+
1. Select the language (Python or TypeScript)
|
|
55
|
+
2. Find the pattern matching your needs
|
|
56
|
+
3. Adapt the code to your LLM client
|
|
57
|
+
4. Integrate with your orchestration layer
|
|
58
|
+
|
|
59
|
+
- **Code quality**: Production-ready with error handling and type safety
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
### 3. reflection-quick-reference.md
|
|
64
|
+
**Quick lookup guide for decision-making and configuration**
|
|
65
|
+
|
|
66
|
+
- **Contains**:
|
|
67
|
+
- Decision tree for when to use reflection
|
|
68
|
+
- Approach selection matrix (by task type)
|
|
69
|
+
- Prompt templates (4 levels of complexity)
|
|
70
|
+
- Configuration profiles (Speed, Quality, Balanced, Safety-Critical)
|
|
71
|
+
- Stopping conditions checklist
|
|
72
|
+
- Common mistakes and fixes
|
|
73
|
+
- Performance benchmarks
|
|
74
|
+
- Token budget calculator
|
|
75
|
+
- Security checklist
|
|
76
|
+
- Integration checklist
|
|
77
|
+
- Troubleshooting guide
|
|
78
|
+
|
|
79
|
+
- **Use when**: Making decisions about reflection implementation, configuring systems
|
|
80
|
+
|
|
81
|
+
- **Most useful sections**:
|
|
82
|
+
- Decision tree at top (start here)
|
|
83
|
+
- Configuration profiles section
|
|
84
|
+
- Stopping conditions checklist
|
|
85
|
+
- Troubleshooting guide
|
|
86
|
+
|
|
87
|
+
---
|
|
88
|
+
|
|
89
|
+
## Quick Navigation by Use Case
|
|
90
|
+
|
|
91
|
+
### I'm Building a New Agent Orchestration System
|
|
92
|
+
|
|
93
|
+
1. Read Decision Tree in `reflection-quick-reference.md`
|
|
94
|
+
2. Read Approach Selection Matrix in `reflection-quick-reference.md`
|
|
95
|
+
3. Read Section 1 (Concepts) in `REFLECTION_RESEARCH_REPORT.md`
|
|
96
|
+
4. Review Configuration Profiles in `reflection-quick-reference.md`
|
|
97
|
+
5. Read Sections 2-3 in `REFLECTION_RESEARCH_REPORT.md`
|
|
98
|
+
6. Copy code from `reflection-implementation-guide.md`
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
### I'm Implementing Reflection for a Specific Task Type
|
|
103
|
+
|
|
104
|
+
**For Code Generation**: Use Reflexion approach with test feedback, max_attempts=2-3
|
|
105
|
+
|
|
106
|
+
**For Writing/Content**: Use Self-Critique or Multi-Agent, max_attempts=2-3
|
|
107
|
+
|
|
108
|
+
**For Analysis/Research**: Use Reflexion with evidence and fact-checking tools
|
|
109
|
+
|
|
110
|
+
**For Planning**: Use basic self-critique, max_attempts=1-2, keep lightweight
|
|
111
|
+
|
|
112
|
+
---
|
|
113
|
+
|
|
114
|
+
### I'm Worried About Infinite Loops
|
|
115
|
+
|
|
116
|
+
1. See Section 5.1 in REFLECTION_RESEARCH_REPORT.md
|
|
117
|
+
2. Copy loop detection code from reflection-implementation-guide.md
|
|
118
|
+
3. Implement all 4 detection strategies
|
|
119
|
+
4. Set hard max_attempts limit (never exceed 3)
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
### I'm Building a Secure System
|
|
124
|
+
|
|
125
|
+
1. Read Section 6 in REFLECTION_RESEARCH_REPORT.md
|
|
126
|
+
2. Read Section 4 in reflection-implementation-guide.md
|
|
127
|
+
3. Implement input sanitization
|
|
128
|
+
4. Set up role-based access control for introspection
|
|
129
|
+
5. Complete Security Checklist in reflection-quick-reference.md
|
|
130
|
+
|
|
131
|
+
---
|
|
132
|
+
|
|
133
|
+
## Key Research Findings
|
|
134
|
+
|
|
135
|
+
### Finding 1: External Feedback is Critical
|
|
136
|
+
|
|
137
|
+
LLM self-correction works best WITH external feedback:
|
|
138
|
+
- With external feedback (tests, facts, tool results): 80-95% effective
|
|
139
|
+
- Without external feedback (self-evaluation only): 30-50% effective
|
|
140
|
+
|
|
141
|
+
**Implication**: Always implement introspection/feedback mechanisms before reflection
|
|
142
|
+
|
|
143
|
+
### Finding 2: Minimal Signals Can Help
|
|
144
|
+
|
|
145
|
+
Even simple "Try again" signals improve performance by 60%+ across all LLMs.
|
|
146
|
+
|
|
147
|
+
**Implication**: Even basic reflection is worth implementing
|
|
148
|
+
|
|
149
|
+
### Finding 3: Self-Correction Blind Spot
|
|
150
|
+
|
|
151
|
+
LLMs struggle to correct their own errors without external guidance. However, minimal triggers like "Wait" prompts can reduce blind spots by 89.3%.
|
|
152
|
+
|
|
153
|
+
**Implication**: Use multi-agent reflection or external validation
|
|
154
|
+
|
|
155
|
+
### Finding 4: Reflection Improves with Information Richness
|
|
156
|
+
|
|
157
|
+
Reflection types ranked by effectiveness:
|
|
158
|
+
1. Instructions + Explanation + Solution (best)
|
|
159
|
+
2. Evidence-grounded with citations
|
|
160
|
+
3. Advice/suggestions
|
|
161
|
+
4. Keywords
|
|
162
|
+
5. Retry (simplest)
|
|
163
|
+
|
|
164
|
+
**Implication**: More detailed feedback yields better results
|
|
165
|
+
|
|
166
|
+
### Finding 5: Three Reflection Cycles is the Sweet Spot
|
|
167
|
+
|
|
168
|
+
After 2-3 cycles, improvement plateaus. Additional cycles add cost with minimal benefit.
|
|
169
|
+
|
|
170
|
+
**Implication**: Set max_attempts = 2-3, not higher
|
|
171
|
+
|
|
172
|
+
---
|
|
173
|
+
|
|
174
|
+
## Reflection Patterns Summary
|
|
175
|
+
|
|
176
|
+
| Pattern | Best For | Complexity | Token Cost | Effectiveness | External Feedback |
|
|
177
|
+
|---------|----------|-----------|-----------|---------------|--------------------|
|
|
178
|
+
| Retry | Quick fixes | Low | Low | 60% | No |
|
|
179
|
+
| Self-Critique | Content quality | Medium | Medium | 75% | No |
|
|
180
|
+
| Reflexion | Accuracy-critical | Medium | Medium-High | 85% | Yes (required) |
|
|
181
|
+
| Multi-Agent | Highest quality | High | High | 90% | Yes (recommended) |
|
|
182
|
+
| Tool-Interactive | Problem-solving | Medium | Medium | 85% | Yes (tools) |
|
|
183
|
+
| Hierarchical | Complex planning | High | High | 90% | Yes (rewards) |
|
|
184
|
+
|
|
185
|
+
---
|
|
186
|
+
|
|
187
|
+
## Security Patterns
|
|
188
|
+
|
|
189
|
+
| Attack Vector | Defense | Section |
|
|
190
|
+
|--------------|---------|---------|
|
|
191
|
+
| Prompt injection via feedback | Input sanitization, keyword filtering | Section 6.1 |
|
|
192
|
+
| Credential exposure in history | Credential masking, filtering | Section 6.2 |
|
|
193
|
+
| Unauthorized context access | RBAC, permission model | Section 3.2 |
|
|
194
|
+
| Agent context escape | Isolation, execution boundaries | Section 6.3 |
|
|
195
|
+
| Loop-based DoS | Loop detection, hard limits | Section 5.1 |
|
|
196
|
+
|
|
197
|
+
---
|
|
198
|
+
|
|
199
|
+
## Introspection Tools Reference
|
|
200
|
+
|
|
201
|
+
**For Agent Introspection**:
|
|
202
|
+
- `get_agent_metadata` - Agent capabilities, status
|
|
203
|
+
- `read_parent_context` - Parent goals, constraints
|
|
204
|
+
- `read_sibling_context` - Other agents' work
|
|
205
|
+
- `read_execution_history` - Own prior attempts
|
|
206
|
+
- `read_workflow_state` - Overall progress
|
|
207
|
+
- `check_resource_constraints` - Token/time remaining
|
|
208
|
+
|
|
209
|
+
All defined in Section 3.1 of REFLECTION_RESEARCH_REPORT.md
|
|
210
|
+
|
|
211
|
+
---
|
|
212
|
+
|
|
213
|
+
## Configuration Profiles Summary
|
|
214
|
+
|
|
215
|
+
### Speed-Optimized
|
|
216
|
+
- max_attempts: 1
|
|
217
|
+
- reflection: minimal
|
|
218
|
+
- Cost: lowest
|
|
219
|
+
- Quality: baseline
|
|
220
|
+
|
|
221
|
+
### Quality-Optimized
|
|
222
|
+
- max_attempts: 3
|
|
223
|
+
- reflection: evidence_grounded
|
|
224
|
+
- Cost: high
|
|
225
|
+
- Quality: best
|
|
226
|
+
|
|
227
|
+
### Balanced (Recommended)
|
|
228
|
+
- max_attempts: 2
|
|
229
|
+
- reflection: self_critique
|
|
230
|
+
- Cost: medium
|
|
231
|
+
- Quality: good
|
|
232
|
+
|
|
233
|
+
### Safety-Critical
|
|
234
|
+
- max_attempts: 3
|
|
235
|
+
- reflection: multi_agent
|
|
236
|
+
- Cost: highest
|
|
237
|
+
- Quality: highest
|
|
238
|
+
- Security: strict
|
|
239
|
+
|
|
240
|
+
---
|
|
241
|
+
|
|
242
|
+
## Common Questions Answered
|
|
243
|
+
|
|
244
|
+
**Q: Should I use reflection for every task?**
|
|
245
|
+
A: No. See Decision Tree. Skip reflection for time-critical or purely creative tasks.
|
|
246
|
+
|
|
247
|
+
**Q: How many reflection cycles should I do?**
|
|
248
|
+
A: Typically 2-3. More yields diminishing returns.
|
|
249
|
+
|
|
250
|
+
**Q: How much will reflection cost me?**
|
|
251
|
+
A: See Token Budget Calculator. Expect 2-5x token cost.
|
|
252
|
+
|
|
253
|
+
**Q: Will I get infinite loops?**
|
|
254
|
+
A: Possibly, if you don't implement loop detection (Section 5.1).
|
|
255
|
+
|
|
256
|
+
**Q: What's the best reflection template?**
|
|
257
|
+
A: Evidence-grounded (Template 2) works for most tasks.
|
|
258
|
+
|
|
259
|
+
**Q: How do I know if reflection is helping?**
|
|
260
|
+
A: Measure quality before/after (Section 9).
|
|
261
|
+
|
|
262
|
+
**Q: Is reflection safe?**
|
|
263
|
+
A: See Section 6 for security considerations.
|
|
264
|
+
|
|
265
|
+
---
|
|
266
|
+
|
|
267
|
+
## Getting Started in 5 Minutes
|
|
268
|
+
|
|
269
|
+
1. **Determine if reflection is right for you**: Decision Tree (quick-reference.md)
|
|
270
|
+
2. **Choose your approach**: Approach Selection Matrix (quick-reference.md)
|
|
271
|
+
3. **Pick a configuration**: Configuration Profiles (quick-reference.md)
|
|
272
|
+
4. **Get the code**: reflection-implementation-guide.md
|
|
273
|
+
5. **Understand the theory**: REFLECTION_RESEARCH_REPORT.md
|
|
274
|
+
|
|
275
|
+
---
|
|
276
|
+
|
|
277
|
+
## Document Files
|
|
278
|
+
|
|
279
|
+
All research files are in `./plan/P1P2/research/`:
|
|
280
|
+
|
|
281
|
+
1. `REFLECTION_RESEARCH_REPORT.md` - Main comprehensive report (15,000+ words)
|
|
282
|
+
2. `reflection-implementation-guide.md` - Code examples (Python/TypeScript)
|
|
283
|
+
3. `reflection-quick-reference.md` - Decision matrices and checklists
|
|
284
|
+
4. `REFLECTION_INDEX.md` - This file, navigation guide
|
|
285
|
+
|
|
286
|
+
---
|
|
287
|
+
|
|
288
|
+
**Created**: December 2025
|
|
289
|
+
**Version**: 1.0
|
|
290
|
+
**Status**: Comprehensive research package ready for implementation
|
|
291
|
+
|