nttp 1.4.10 → 1.4.13

# 3-Layer Caching System

NTTP uses an intelligent 3-layer cache to optimize cost and performance. Queries cascade through increasingly expensive layers until a result is found.

## Table of Contents

- [Overview](#overview)
- [Layer 1: Exact Match](#layer-1-exact-match-l1)
- [Layer 2: Semantic Match](#layer-2-semantic-match-l2)
- [Layer 3: LLM Generation](#layer-3-llm-generation-l3)
- [Cache Flow](#cache-flow)
- [Configuration](#configuration)
- [Performance Metrics](#performance-metrics)
- [Best Practices](#best-practices)

---

## Overview

```
┌─────────────────────────────────────────────────────────────┐
│                   Natural Language Query                    │
│                   "show me active users"                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  L1: EXACT MATCH                                            │
│  Hash-based lookup          $0          <1ms (in-memory)    │
│                                         ~5ms (Redis)        │
├─────────────────────────────────────────────────────────────┤
│  • Checks if exact query string was seen before             │
│  • Instant hit if query matches character-for-character     │
│  • Uses MD5 hash for O(1) lookup                            │
└─────────────────────────────────────────────────────────────┘
                               │ MISS
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  L2: SEMANTIC MATCH                                         │
│  Embedding similarity      ~$0.0001     50-100ms            │
├─────────────────────────────────────────────────────────────┤
│  • Compares semantic meaning of query                       │
│  • Matches similar phrasings:                               │
│      "show users" ≈ "get users" ≈ "list users"              │
│  • Uses OpenAI embeddings (text-embedding-3-small)          │
│  • Similarity threshold: 0.85 (configurable)                │
└─────────────────────────────────────────────────────────────┘
                               │ MISS
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  L3: LLM GENERATION                                         │
│  Full pipeline             ~$0.01       2-3s                │
├─────────────────────────────────────────────────────────────┤
│  • Parse intent with LLM                                    │
│  • Generate SQL with LLM                                    │
│  • Execute and cache result                                 │
│  • Populates L1 and L2 for future queries                   │
└─────────────────────────────────────────────────────────────┘
```

---

## Layer 1: Exact Match (L1)

### How It Works

L1 cache uses **exact string matching** via hash lookup. If you query "show users" twice, the second query hits L1 instantly.

**Storage Options:**
1. **In-Memory** (default): Fast but resets on restart
2. **Redis** (recommended): Persistent across restarts and instances

### Configuration

**In-Memory:**

```typescript
const nttp = new NTTP({
  // ... other config
  cache: {
    l1: {
      enabled: true, // Default: true
      maxSize: 1000  // Default: 1000 entries
    }
  }
});
```

**Redis (Persistent):**

```typescript
const nttp = new NTTP({
  // ... other config
  cache: {
    l1: {
      enabled: true,
      maxSize: 1000
    },
    redis: {
      url: 'redis://localhost:6379'
    }
  }
});
```

Or via environment variable:

```bash
REDIS_URL=redis://localhost:6379
```

### Performance

| Storage   | Latency | Persistence | Multi-Instance |
|-----------|---------|-------------|----------------|
| In-Memory | <1ms    | ❌ No       | ❌ No          |
| Redis     | ~5ms    | ✅ Yes      | ✅ Shared      |

### When L1 Hits

```typescript
const result1 = await nttp.query("show active users");
// L3 MISS: 2500ms - LLM generation

const result2 = await nttp.query("show active users");
// L1 HIT: 0.8ms - Exact match

console.log(result2.meta);
// {
//   cacheLayer: 1,
//   cost: 0,
//   latency: 0.8
// }
```

### Benefits

- **Zero cost** - No API calls
- **Instant response** - Sub-millisecond latency
- **Perfect reliability** - No LLM variability

### Limitations

- **Exact match only** - "show users" ≠ "get users"
- **Case sensitive** - "Show Users" ≠ "show users"
- **No typo tolerance** - "show usres" ≠ "show users"
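
These limitations follow directly from how L1 works. As an illustrative sketch (not NTTP's actual internals), an MD5-keyed exact-match cache with simple oldest-first eviction might look like this:

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch of an L1-style exact-match cache: the MD5 of the raw
// query string keys a Map, with oldest-first eviction once maxSize is hit.
class ExactMatchCache {
  private entries = new Map<string, string>();

  constructor(private maxSize = 1000) {}

  // Hashing the query verbatim is why L1 is case- and typo-sensitive.
  private key(query: string): string {
    return createHash("md5").update(query).digest("hex");
  }

  get(query: string): string | undefined {
    return this.entries.get(this.key(query));
  }

  set(query: string, sql: string): void {
    if (this.entries.size >= this.maxSize) {
      // Maps iterate in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(this.key(query), sql);
  }
}

const l1 = new ExactMatchCache(1000);
l1.set("show users", "SELECT * FROM users;");
console.log(l1.get("show users")); // → "SELECT * FROM users;"
console.log(l1.get("Show Users")); // → undefined (exact match only)
```

The Redis-backed variant replaces the in-process `Map` with Redis keys, which is what makes the cache survive restarts.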

---

## Layer 2: Semantic Match (L2)

### How It Works

L2 cache uses **embedding-based semantic similarity** to match queries with similar meaning but different wording.

**Example Matches:**
- "show users" ≈ "get users" ≈ "list users" ≈ "display users"
- "top 10 products" ≈ "10 most popular products"
- "count orders" ≈ "how many orders" ≈ "number of orders"

### Configuration

```typescript
const nttp = new NTTP({
  // ... other config
  cache: {
    l2: {
      enabled: true,                      // Default: false
      provider: 'openai',                 // Only OpenAI supported
      model: 'text-embedding-3-small',    // Default model
      apiKey: process.env.OPENAI_API_KEY, // Required
      maxSize: 500,                       // Default: 500 entries
      similarityThreshold: 0.85           // Default: 0.85 (0-1 scale)
    }
  }
});
```

Or via environment variables:

```bash
OPENAI_API_KEY=sk-...
# L2 is auto-enabled if OPENAI_API_KEY is present
```

### Performance

- **Cost:** ~$0.0001 per query (embedding generation)
- **Latency:** 50-100ms (embedding API call + cosine similarity)
- **Matching:** 0.85 similarity threshold (configurable)

### When L2 Hits

```typescript
const result1 = await nttp.query("show active users");
// L3 MISS: 2500ms - LLM generation

const result2 = await nttp.query("get active users");
// L2 HIT: 75ms - Semantic match (similarity: 0.92)

console.log(result2.meta);
// {
//   cacheLayer: 2,
//   cost: 0.0001,
//   latency: 75,
//   similarity: 0.92
// }
```

### Similarity Threshold

The `similarityThreshold` controls how strict the matching is:

- **0.95+**: Very strict - only nearly identical phrasings
- **0.85-0.95** (recommended): Moderate - same intent, different words
- **0.75-0.85**: Loose - higher hit rate but more false positives
- **<0.75**: Too loose - risky, may match unrelated queries
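
Under the hood, the threshold is compared against the cosine similarity of two embedding vectors. A minimal sketch, with made-up 3-dimension vectors standing in for real 1536-dimension embeddings:

```typescript
// Cosine similarity between two embedding vectors, the quantity the L2
// threshold is compared against. The 3-dimension vectors below are made up
// for illustration; real embeddings have 1536 dimensions.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const threshold = 0.85;
const cachedVec = [0.9, 0.1, 0.4];      // pretend embedding for "show users"
const incomingVec = [0.85, 0.15, 0.45]; // pretend embedding for "get users"

const sim = cosineSimilarity(cachedVec, incomingVec);
console.log(sim >= threshold ? "L2 HIT" : "L2 MISS"); // → "L2 HIT"
```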
218
+
219
+ **Adjusting the threshold:**
220
+
221
+ ```typescript
222
+ cache: {
223
+ l2: {
224
+ similarityThreshold: 0.90 // Stricter matching
225
+ }
226
+ }
227
+ ```
228
+
229
+ ### Cache Promotion
230
+
231
+ When L2 hits, the query is **promoted to L1** for future exact matches:
232
+
233
+ ```
234
+ Query 1: "show users" → L3 MISS → Generate SQL → Cache in L1 + L2
235
+ Query 2: "get users" → L2 HIT → Promote to L1
236
+ Query 3: "get users" → L1 HIT → Instant
237
+ ```
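
The promotion flow above can be sketched in code. Here `semanticLookup` and `generateWithLLM` are stand-ins for the real embedding and LLM calls, not NTTP's API:

```typescript
// Sketch of the L1 → L2 → L3 cascade with promotion (illustrative only).
type CacheResult = { sql: string; cacheLayer: 1 | 2 | 3 };

const l1 = new Map<string, string>();
const l2: Array<{ query: string; sql: string }> = [];

// Stand-in semantic matcher: treats queries as similar when they share the
// same last two words. A real L2 compares embeddings against a threshold.
function semanticLookup(query: string): string | undefined {
  const tail = query.split(" ").slice(-2).join(" ");
  return l2.find((entry) => entry.query.endsWith(tail))?.sql;
}

// Stand-in for the full L3 pipeline (intent parsing + SQL generation).
function generateWithLLM(query: string): string {
  return `-- generated for: ${query}\nSELECT * FROM users WHERE active = true;`;
}

function query(q: string): CacheResult {
  const exact = l1.get(q);
  if (exact !== undefined) return { sql: exact, cacheLayer: 1 };

  const similar = semanticLookup(q);
  if (similar !== undefined) {
    l1.set(q, similar); // promote to L1 for future exact hits
    return { sql: similar, cacheLayer: 2 };
  }

  const sql = generateWithLLM(q); // L3: generate, then populate both layers
  l1.set(q, sql);
  l2.push({ query: q, sql });
  return { sql, cacheLayer: 3 };
}

console.log(query("show active users").cacheLayer); // → 3
console.log(query("get active users").cacheLayer);  // → 2 (then promoted)
console.log(query("get active users").cacheLayer);  // → 1
```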

### Benefits

- **Low cost** - 100x cheaper than LLM generation
- **Handles variations** - Different phrasings match
- **Fast** - ~33x faster than LLM generation

### Limitations

- **Requires OpenAI API** - Currently the only supported provider
- **Additional cost** - $0.0001 per query vs $0 for L1
- **Slower than L1** - 75ms vs 1ms
- **False positives possible** - May match unrelated but similar-sounding queries

---

## Layer 3: LLM Generation (L3)

### How It Works

L3 is the **full pipeline**, run when neither cache layer hits:

1. **Parse Intent** - LLM extracts structured intent from natural language
2. **Generate SQL** - LLM creates safe, parameterized SQL
3. **Execute Query** - Database runs the SQL
4. **Cache Result** - Stores in L1, L2, and the schema cache
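
The four steps can be sketched with stubbed components. `parseIntent`, `generateSQL`, and `execute` below are placeholders for the LLM and database calls, not NTTP's API:

```typescript
// The four L3 steps with stubbed components (placeholders, not NTTP's API).
type Intent = { action: string; entity: string; filters: string[] };

// 1. Parse intent - a real implementation asks the LLM for structured intent.
function parseIntent(q: string): Intent {
  return {
    action: "select",
    entity: "users",
    filters: q.includes("active") ? ["active = true"] : []
  };
}

// 2. Generate SQL - a real implementation asks the LLM for parameterized SQL.
function generateSQL(intent: Intent): string {
  const where =
    intent.filters.length > 0 ? ` WHERE ${intent.filters.join(" AND ")}` : "";
  return `SELECT * FROM ${intent.entity}${where};`;
}

// 3. Execute - a real implementation runs the SQL against the database.
function execute(_sql: string): unknown[] {
  return [];
}

const cache = new Map<string, string>();

function l3Pipeline(q: string) {
  const intent = parseIntent(q);   // 1. Parse intent
  const sql = generateSQL(intent); // 2. Generate SQL
  const rows = execute(sql);       // 3. Execute
  cache.set(q, sql);               // 4. Cache so the next identical query hits L1
  return { sql, rows };
}

console.log(l3Pipeline("show active users").sql);
// → "SELECT * FROM users WHERE active = true;"
```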

### Configuration

L3 is always enabled. Configure it via the LLM settings:

```typescript
const nttp = new NTTP({
  llm: {
    provider: 'anthropic',               // See models guide
    model: 'claude-sonnet-4-5-20250929', // Recommended
    apiKey: process.env.ANTHROPIC_API_KEY,
    maxTokens: 2048                      // Default: 2048
  }
});
```

### Performance

- **Cost:** ~$0.01 per query (2 LLM calls: intent + SQL)
- **Latency:** 2-3 seconds (network + LLM processing)
- **Reliability:** 99%+ with Claude Sonnet

### When L3 Runs

```typescript
const result = await nttp.query("show active premium users from California");
// L1 MISS: Never seen this exact query
// L2 MISS: No similar cached queries
// L3: Full generation (2847ms)

console.log(result.meta);
// {
//   cacheLayer: 3,
//   cost: 0.01,
//   latency: 2847
// }
```

### Benefits

- **Handles any query** - No cache required
- **Highest quality** - Claude's reasoning produces reliable SQL
- **Self-healing** - Populates the cache for future queries

### Limitations

- **Expensive** - $0.01 vs $0.0001 (L2) vs $0 (L1)
- **Slow** - 2-3s vs 75ms (L2) vs 1ms (L1)
- **Rate limited** - Subject to LLM provider limits

---

## Cache Flow

### Complete Flow Diagram

```
┌──────────────────────────────────────────────────────────────┐
│               User Query: "show active users"                │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼
                  ┌────────────────────────┐
                  │   L1: Hash Lookup      │
                  │ Hash("show active...") │
                  └────────────────────────┘
                               │
                   ┌───────────┴───────────┐
                   │                       │
                HIT│                       │MISS
                   ▼                       ▼
           ┌──────────────┐   ┌────────────────────────┐
           │  Return SQL  │   │  L2: Semantic Search   │
           │  Cost: $0    │   │ Embed("show active...")│
           │  Time: <1ms  │   └────────────────────────┘
           └──────────────┘                │
                               ┌───────────┴───────────┐
                               │                       │
                            HIT│                       │MISS
                               ▼                       ▼
                     ┌──────────────────┐   ┌─────────────────────┐
                     │  Promote to L1   │   │ L3: LLM Generation  │
                     │  Return SQL      │   │ 1. Parse Intent     │
                     │  Cost: ~$0.0001  │   │ 2. Generate SQL     │
                     │  Time: ~75ms     │   │ 3. Execute          │
                     └──────────────────┘   │ 4. Cache in L1+L2   │
                                            │ Cost: ~$0.01        │
                                            │ Time: 2-3s          │
                                            └─────────────────────┘
```

### Example Flow

```typescript
// First user queries
await nttp.query("show active users");
// → L1 MISS → L2 MISS → L3 (2500ms, $0.01)

// Same user again
await nttp.query("show active users");
// → L1 HIT (0.8ms, $0)

// Different user, similar query
await nttp.query("get active users");
// → L1 MISS → L2 HIT (75ms, $0.0001) → Promote to L1

// The second user again
await nttp.query("get active users");
// → L1 HIT (0.8ms, $0)

// Completely new query
await nttp.query("count pending orders");
// → L1 MISS → L2 MISS → L3 (2400ms, $0.01)
```

---

## Configuration

### Minimal (L1 Only)

```typescript
const nttp = new NTTP({
  database: { /* ... */ },
  llm: { /* ... */ }
  // L1 in-memory enabled by default
});
```

**Cost:** $0 after warm-up
**Latency:** <1ms for exact matches
**Use case:** Development, single-instance apps

---

### Recommended (L1 + Redis)

```typescript
const nttp = new NTTP({
  database: { /* ... */ },
  llm: { /* ... */ },
  cache: {
    redis: {
      url: 'redis://localhost:6379'
    }
  }
});
```

**Cost:** $0 after warm-up
**Latency:** ~5ms for exact matches
**Use case:** Production, CLI tools, multi-instance deployments

---

### Maximum (L1 + Redis + L2)

```typescript
const nttp = new NTTP({
  database: { /* ... */ },
  llm: { /* ... */ },
  cache: {
    redis: { url: 'redis://localhost:6379' },
    l2: {
      enabled: true,
      provider: 'openai',
      apiKey: process.env.OPENAI_API_KEY,
      similarityThreshold: 0.85
    }
  }
});
```

**Cost:** ~$0.0001 for similar queries, $0 for exact
**Latency:** ~75ms for similar, ~5ms for exact
**Use case:** Production with high query variation

---

## Performance Metrics

### Typical Hit Rates (After Warm-up)

| Layer | Hit Rate | Cumulative |
|-------|----------|------------|
| L1    | 60-70%   | 60-70%     |
| L2    | 20-30%   | 85-95%     |
| L3    | 5-15%    | 100%       |

### Cost Comparison (1000 queries)

| Configuration | Cold Start | After Warm-up | Savings |
|---------------|------------|---------------|---------|
| No cache      | $10.00     | $10.00        | 0%      |
| L1 only       | $10.00     | $1.50         | 85%     |
| L1 + L2       | $10.00     | $0.50         | 95%     |
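
As a sanity check, the after-warm-up figures follow from the hit rates. Assuming rates at the favorable end of the typical ranges (70% L1, 25% L2, 5% L3), the blended cost lands near the table's $0.50:

```typescript
// Blended cost per 1000 queries from assumed per-layer hit rates.
// These rates are assumptions at the favorable end of the typical ranges
// above, chosen to show how the table's $0.50 figure can arise.
const perQueryCost = { l1: 0, l2: 0.0001, l3: 0.01 }; // dollars
const hitRate = { l1: 0.70, l2: 0.25, l3: 0.05 };

const blended =
  1000 *
  (hitRate.l1 * perQueryCost.l1 +
    hitRate.l2 * perQueryCost.l2 +
    hitRate.l3 * perQueryCost.l3);

console.log(`$${blended.toFixed(2)} per 1000 queries`); // close to the table's $0.50
```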
460
+
461
+ ### Latency Comparison
462
+
463
+ | Cache Layer | Avg Latency | vs L3 Speedup |
464
+ |-------------|-------------|---------------|
465
+ | L1 (memory) | 0.8ms | 3000x faster |
466
+ | L1 (Redis) | 5ms | 500x faster |
467
+ | L2 | 75ms | 35x faster |
468
+ | L3 | 2500ms | 1x (baseline) |
469
+
470
+ ---
471
+
472
+ ## Best Practices
473
+
474
+ ### 1. Always Enable Redis in Production
475
+
476
+ ```typescript
477
+ // ✅ Good: Persistent cache
478
+ cache: {
479
+ redis: { url: process.env.REDIS_URL }
480
+ }
481
+
482
+ // ❌ Bad: Cache resets on restart
483
+ cache: {
484
+ l1: { enabled: true } // In-memory only
485
+ }
486
+ ```
487
+
488
+ **Why:** CLI tools and restarts lose all cache without Redis.
489
+
490
+ ---

### 2. Enable L2 for High Query Variation

```typescript
// ✅ Good for customer-facing apps
cache: {
  redis: { url: process.env.REDIS_URL },
  l2: { enabled: true }
}

// ⚠️ Okay for internal tools with repeated queries
cache: {
  redis: { url: process.env.REDIS_URL }
}
```

**Why:** L2 handles different phrasings of the same intent.

---

### 3. Monitor Cache Performance

```typescript
const result = await nttp.query("show users");

if (result.meta) {
  console.log(`L${result.meta.cacheLayer} | $${result.meta.cost} | ${result.meta.latency}ms`);

  if (result.meta.cacheLayer === 3) {
    console.warn('⚠️ Cache miss - consider pre-warming');
  }
}
```

---

### 4. Pre-warm Cache for Common Queries

```typescript
// Warm cache on startup
const commonQueries = [
  "show active users",
  "count pending orders",
  "top 10 products by price"
];

for (const query of commonQueries) {
  await nttp.query(query);
}
```

---

### 5. Adjust L2 Threshold Based on Use Case

```typescript
// Strict matching for financial/critical queries
cache: {
  l2: { similarityThreshold: 0.92 }
}

// Loose matching for general search
cache: {
  l2: { similarityThreshold: 0.80 }
}
```

---

### 6. Set Appropriate Cache Sizes

```typescript
cache: {
  l1: {
    maxSize: 1000 // ~1MB memory, adjust based on query complexity
  },
  l2: {
    maxSize: 500 // Embeddings are larger, be conservative
  }
}
```
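
As a rough sizing guide for L2, assuming 1536-dimension vectors (text-embedding-3-small's default output size) stored as 64-bit numbers:

```typescript
// Back-of-the-envelope memory for the L2 embedding cache. Assumes
// 1536-dimension vectors (text-embedding-3-small's default output size)
// stored as 64-bit JS numbers; real overhead varies by runtime.
const dims = 1536;
const bytesPerNumber = 8;
const entries = 500;

const bytesPerEntry = dims * bytesPerNumber; // 12,288 bytes ≈ 12 KB per vector
const totalMB = (entries * bytesPerEntry) / (1024 * 1024);

console.log(`~${totalMB.toFixed(1)} MB for ${entries} embeddings`);
// → "~5.9 MB for 500 embeddings"
```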
572
+
573
+ ---
574
+
575
+ ## See Also
576
+
577
+ - [Configuration](./configuration.md) - Complete cache config reference
578
+ - [Production Guide](./production.md) - Deployment best practices
579
+ - [API Reference](./api.md) - Cache management methods