nttp 1.4.11 → 1.4.15

package/docs/models.md ADDED
# LLM Model Selection Guide

Choosing the right LLM model for NTTP based on your needs.

## Quick Recommendations

| Use Case | Provider | Model | Why |
|----------|----------|-------|-----|
| **Production (recommended)** | Anthropic | `claude-sonnet-4-5-20250929` | Best balance of quality, speed, and cost |
| **Complex queries** | Anthropic | `claude-opus-4-5-20251101` | Highest reasoning capability |
| **Development** | Anthropic | `claude-haiku-4-20250514` | Fastest, lowest cost |
| **OpenAI users** | OpenAI | `gpt-4o` | Fast and reliable |
| **Budget-conscious** | OpenAI | `gpt-3.5-turbo` | Cheapest option |

---

## Anthropic (Claude)

### claude-sonnet-4-5-20250929 ⭐ **Recommended**

**Best for:** Production, general-purpose SQL generation

**Strengths:**
- Excellent instruction following
- Strong reasoning for complex schemas
- Reliable structured outputs
- Good balance of speed and quality
- Handles multi-table joins well

**Performance:**
- Latency: ~2-3s per query
- Cost: ~$0.01 per query (2 LLM calls)
- Accuracy: 95%+ for well-defined schemas

**Example:**

```typescript
llm: {
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929',
  apiKey: process.env.ANTHROPIC_API_KEY
}
```

**When to use:**
- Production applications
- Complex multi-table databases
- When accuracy is critical
- General-purpose use

**When NOT to use:**
- Simple lookup tables (use Haiku)
- When milliseconds matter (cache instead)

---

### claude-opus-4-5-20251101

**Best for:** Complex queries, ambiguous requests

**Strengths:**
- Highest reasoning capability
- Best at understanding vague queries
- Excellent at inferring relationships
- Handles very complex schemas

**Performance:**
- Latency: ~4-6s per query
- Cost: ~$0.03 per query
- Accuracy: 98%+ even for ambiguous queries

**Example:**

```typescript
llm: {
  provider: 'anthropic',
  model: 'claude-opus-4-5-20251101',
  apiKey: process.env.ANTHROPIC_API_KEY
}
```

**When to use:**
- Very complex schemas (50+ tables)
- Ambiguous natural language queries
- When quality > speed/cost
- Research/analysis applications

**When NOT to use:**
- Simple queries
- High-throughput applications
- Budget-constrained projects

---

### claude-haiku-4-20250514

**Best for:** Development, simple schemas

**Strengths:**
- Fastest Claude model
- Lowest cost
- Good for simple queries
- Fast iteration during development

**Performance:**
- Latency: ~1-1.5s per query
- Cost: ~$0.003 per query
- Accuracy: 85-90% for simple queries

**Example:**

```typescript
llm: {
  provider: 'anthropic',
  model: 'claude-haiku-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY
}
```

**When to use:**
- Development/testing
- Simple schemas (< 10 tables)
- Basic SELECT queries
- When speed is critical

**When NOT to use:**
- Complex multi-table joins
- Ambiguous queries
- Production with complex schemas

---

## OpenAI (GPT)

### gpt-4o

**Best for:** OpenAI users, fast structured outputs

**Strengths:**
- Very fast for structured outputs
- Good SQL generation quality
- Wide availability
- Strong ecosystem

**Performance:**
- Latency: ~1.5-2s per query
- Cost: ~$0.008 per query
- Accuracy: 90-93%

**Example:**

```typescript
llm: {
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY
}
```

**When to use:**
- Already using OpenAI
- Need fast responses
- Moderate complexity queries

**When NOT to use:**
- Very complex schemas
- When highest accuracy required

---

### gpt-4-turbo

**Best for:** Balance of quality and speed

**Strengths:**
- Good reasoning
- Reliable outputs
- Faster than original GPT-4

**Performance:**
- Latency: ~2-3s per query
- Cost: ~$0.01 per query
- Accuracy: 88-92%

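**Example** (same shape as the other OpenAI entries; a sketch — verify `gpt-4-turbo` against your provider's current model list):

```typescript
llm: {
  provider: 'openai',
  model: 'gpt-4-turbo',
  apiKey: process.env.OPENAI_API_KEY
}
```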
---

### gpt-3.5-turbo

**Best for:** Budget-conscious development

**Strengths:**
- Cheapest option
- Very fast
- Good for simple queries

**Performance:**
- Latency: ~1s per query
- Cost: ~$0.002 per query
- Accuracy: 80-85%

**Example:**

```typescript
llm: {
  provider: 'openai',
  model: 'gpt-3.5-turbo',
  apiKey: process.env.OPENAI_API_KEY
}
```

**When to use:**
- Development only
- Very simple schemas
- Budget-constrained projects

**When NOT to use:**
- Production
- Complex queries
- Critical accuracy requirements

---

## Other Providers

### Cohere (command-r-plus)

**Best for:** Enterprise deployments

**Performance:**
- Latency: ~2-3s
- Cost: ~$0.01 per query
- Accuracy: 85-90%

**Example:**

```typescript
llm: {
  provider: 'cohere',
  model: 'command-r-plus',
  apiKey: process.env.COHERE_API_KEY
}
```

---

### Mistral (mistral-large-latest)

**Best for:** Open-source preference

**Performance:**
- Latency: ~2-3s
- Cost: ~$0.008 per query
- Accuracy: 85-90%

**Example:**

```typescript
llm: {
  provider: 'mistral',
  model: 'mistral-large-latest',
  apiKey: process.env.MISTRAL_API_KEY
}
```

---

### Google (gemini-pro)

**Best for:** Google Cloud users

**Performance:**
- Latency: ~2-3s
- Cost: ~$0.007 per query
- Accuracy: 85-90%

**Example:**

```typescript
llm: {
  provider: 'google',
  model: 'gemini-pro',
  apiKey: process.env.GOOGLE_API_KEY
}
```

---

## Model Comparison

### Performance Comparison

| Model | Latency | Cost/Query | Accuracy | Best For |
|-------|---------|------------|----------|----------|
| Claude Opus 4.5 | 4-6s | ~$0.03 | 98% | Complex queries |
| Claude Sonnet 4.5 ⭐ | 2-3s | ~$0.01 | 95% | Production |
| Claude Haiku 4 | 1-1.5s | ~$0.003 | 85% | Development |
| GPT-4o | 1.5-2s | ~$0.008 | 90% | OpenAI users |
| GPT-4 Turbo | 2-3s | ~$0.01 | 88% | Balanced |
| GPT-3.5 Turbo | 1s | ~$0.002 | 80% | Budget/dev |
| Cohere Command-R+ | 2-3s | ~$0.01 | 85% | Enterprise |
| Mistral Large | 2-3s | ~$0.008 | 85% | Open-source |
| Gemini Pro | 2-3s | ~$0.007 | 85% | Google Cloud |

---

### Capability Matrix

| Capability | Opus | Sonnet | Haiku | GPT-4o | GPT-3.5 |
|------------|------|--------|-------|--------|---------|
| Simple SELECT | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Complex JOINs | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐ |
| Ambiguous queries | ⭐⭐⭐ | ⭐⭐ | ⭐ | ⭐⭐ | ⭐ |
| Speed | ⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Cost efficiency | ⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |

---

## Choosing the Right Model

### Decision Tree

```
Start: What's your primary concern?

├─ Cost?
│  ├─ Development: Claude Haiku or GPT-3.5
│  └─ Production: Claude Sonnet or GPT-4o

├─ Accuracy?
│  ├─ Highest: Claude Opus
│  └─ Good enough: Claude Sonnet ⭐

├─ Speed?
│  ├─ Fastest: Claude Haiku or GPT-3.5
│  └─ Fast enough: GPT-4o or Claude Sonnet

└─ Schema complexity?
   ├─ Very complex (50+ tables): Claude Opus
   ├─ Moderate (10-50 tables): Claude Sonnet ⭐
   └─ Simple (< 10 tables): Claude Haiku or GPT-4o
```
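The decision tree above can be sketched as a small helper. This is illustrative only — `chooseModel` and its inputs are hypothetical, not part of the NTTP API:

```typescript
type Concern = 'cost' | 'accuracy' | 'speed';

interface Constraints {
  concern: Concern;
  production: boolean; // development vs production deployment
  tableCount: number;  // rough schema size
}

// Walks the decision tree above and returns a model identifier.
function chooseModel({ concern, production, tableCount }: Constraints): string {
  // Very complex schemas override everything else.
  if (tableCount >= 50) return 'claude-opus-4-5-20251101';
  if (concern === 'accuracy') return 'claude-opus-4-5-20251101';
  if (concern === 'speed' || (concern === 'cost' && !production)) {
    return 'claude-haiku-4-20250514';
  }
  // Balanced default for production workloads.
  return 'claude-sonnet-4-5-20250929';
}

chooseModel({ concern: 'cost', production: true, tableCount: 20 });
// → 'claude-sonnet-4-5-20250929'
```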

---

## Configuration Tips

### Development

Use faster, cheaper models:

```typescript
llm: {
  provider: 'anthropic',
  model: 'claude-haiku-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY
}
```

---

### Production

Use balanced, reliable models:

```typescript
llm: {
  provider: 'anthropic',
  model: 'claude-sonnet-4-5-20250929', // ⭐ Recommended
  apiKey: process.env.ANTHROPIC_API_KEY,
  maxTokens: 2048
}
```

---

### High-Complexity Production

Use the most capable model:

```typescript
llm: {
  provider: 'anthropic',
  model: 'claude-opus-4-5-20251101',
  apiKey: process.env.ANTHROPIC_API_KEY,
  maxTokens: 4096
}
```

---

## Cost Optimization

### With Caching

**First query (cold cache):**
- Cost: full model cost (e.g., ~$0.01 for Sonnet)

**Subsequent identical queries (L1 hit):**
- Cost: $0

**Similar queries (L2 hit):**
- Cost: ~$0.0001 (embedding only)

**Example: 1000 queries with a 70% L1 hit rate and 25% L2:**

```
Without cache: 1000 × $0.01 = $10.00
With cache:
  - 700 L1 hits: $0
  - 250 L2 hits: 250 × $0.0001 = $0.025
  - 50 L3 misses: 50 × $0.01 = $0.50
  Total: $0.525 (95% savings!)
```
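The arithmetic above generalizes to any hit-rate mix. A quick sketch — the rates and prices are the illustrative figures from this page, not measured values:

```typescript
// Expected cost of a workload under the three-tier cache described above:
// L1 hits are free, L2 hits pay only for an embedding, misses pay full model cost.
function cachedCost(
  queries: number,
  l1Rate: number,        // fraction answered from the exact-match cache
  l2Rate: number,        // fraction answered from the semantic cache
  modelCost: number,     // $ per uncached query
  embeddingCost = 0.0001 // $ per L2 lookup
): number {
  const l2 = queries * l2Rate * embeddingCost;
  const misses = queries * (1 - l1Rate - l2Rate) * modelCost;
  return l2 + misses;
}

cachedCost(1000, 0.7, 0.25, 0.01); // ≈ $0.525, vs $10.00 uncached
```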
414
+
415
+ **Conclusion:** Model cost matters less with good caching.
416
+
417
+ ---
418
+
419
+ ## See Also
420
+
421
+ - [Configuration](./configuration.md) - How to configure LLM
422
+ - [Caching](./caching.md) - Reduce LLM costs with caching
423
+ - [Production Guide](./production.md) - Production deployment tips