rag-lite-ts 2.0.5 → 2.1.0
- package/README.md +815 -808
- package/dist/cli/indexer.js +2 -38
- package/dist/cli/search.d.ts +1 -1
- package/dist/cli/search.js +118 -9
- package/dist/cli.js +77 -94
- package/dist/core/db.js +173 -173
- package/dist/core/ingestion.js +47 -9
- package/dist/core/lazy-dependency-loader.d.ts +3 -8
- package/dist/core/lazy-dependency-loader.js +11 -29
- package/dist/core/mode-detection-service.js +1 -1
- package/dist/core/reranking-config.d.ts +1 -1
- package/dist/core/reranking-config.js +7 -16
- package/dist/core/reranking-factory.js +3 -184
- package/dist/core/reranking-strategies.js +5 -4
- package/dist/core/search.d.ts +10 -0
- package/dist/core/search.js +34 -11
- package/dist/factories/ingestion-factory.js +3 -1
- package/dist/mcp-server.js +127 -105
- package/dist/multimodal/clip-embedder.js +70 -71
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,808 +1,815 @@
|
|
|
1
|
-
<div align="center">
|
|
2
|
-
|
|
3
|
-
# RAG-lite TS
|
|
4
|
-
|
|
5
|
-
### *Simple by default, powerful when needed*
|
|
6
|
-
|
|
7
|
-
**Local-first semantic search that actually works**
|
|
8
|
-
|
|
9
|
-
[](https://www.npmjs.com/package/rag-lite-ts)
|
|
10
|
-
[](https://opensource.org/licenses/MIT)
|
|
11
|
-
[](https://www.typescriptlang.org/)
|
|
12
|
-
[](https://nodejs.org/)
|
|
13
|
-
|
|
14
|
-
[Quick Start](#quick-start) • [Features](#features) • [Documentation](#documentation) • [Examples](#examples) • [MCP Integration](#mcp-server-integration)
|
|
15
|
-
|
|
16
|
-
</div>
|
|
17
|
-
|
|
18
|
-
---
|
|
19
|
-
|
|
20
|
-
## Why RAG-lite TS?
|
|
21
|
-
|
|
22
|
-
**Stop fighting with complex RAG frameworks.** Get semantic search running in 30 seconds:
|
|
23
|
-
|
|
24
|
-
```bash
|
|
25
|
-
npm install -g rag-lite-ts
|
|
26
|
-
raglite ingest ./docs/
|
|
27
|
-
raglite search "your query here"
|
|
28
|
-
```
|
|
29
|
-
|
|
30
|
-
**That's it.** No API keys, no cloud services, no configuration hell.
|
|
31
|
-
|
|
32
|
-
### See It In Action
|
|
33
|
-
|
|
34
|
-
```typescript
|
|
35
|
-
// 1. Ingest your docs
|
|
36
|
-
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
37
|
-
await pipeline.ingestDirectory('./docs/');
|
|
38
|
-
|
|
39
|
-
// 2. Search semantically
|
|
40
|
-
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
41
|
-
const results = await search.search('authentication flow');
|
|
42
|
-
|
|
43
|
-
// 3. Get relevant results instantly
|
|
44
|
-
console.log(results[0].text);
|
|
45
|
-
// "To authenticate users, first obtain a JWT token from the /auth endpoint..."
|
|
46
|
-
```
|
|
47
|
-
|
|
48
|
-
**Real semantic understanding** - not just keyword matching. Finds "JWT token" when you search for "authentication flow".
|
|
49
|
-
|
|
50
|
-
### What Makes It Different?
|
|
51
|
-
|
|
52
|
-
- **100% Local** - Your data never leaves your machine
|
|
53
|
-
- **Actually Fast** - Sub-100ms queries, not "eventually consistent"
|
|
54
|
-
- **Chameleon Architecture** - Automatically adapts between text and multimodal modes
|
|
55
|
-
- **True Multimodal** - Search images with text, text with images (CLIP unified space)
|
|
56
|
-
- **Zero Runtime Dependencies** - No Python, no Docker, no external services
|
|
57
|
-
- **TypeScript Native** - Full type safety, modern ESM architecture
|
|
58
|
-
- **MCP Ready** - Built-in Model Context Protocol server for AI agents
|
|
59
|
-
|
|
60
|
-

|
|
61
|
-
|
|
62
|
-
---
|
|
63
|
-
|
|
64
|
-
## What's New in 2.0
|
|
65
|
-
|
|
66
|
-
**Chameleon Multimodal Architecture** - RAG-lite TS now seamlessly adapts between text-only and multimodal search:
|
|
67
|
-
|
|
68
|
-
### Multimodal Search
|
|
69
|
-
- **CLIP Integration** - Unified 512D embedding space for text and images
|
|
70
|
-
- **Cross-Modal Search** - Find images with text queries, text with image queries
|
|
71
|
-
- **Image-to-Text Generation** - Automatic descriptions using vision-language models
|
|
72
|
-
- **Smart Reranking** -
|
|
73
|
-
|
|
74
|
-
### Architecture Improvements
|
|
75
|
-
- **Layered Architecture** - Clean separation: core (model-agnostic) → implementation (text/multimodal) → public API
|
|
76
|
-
- **Mode Persistence** - Configuration stored in database, auto-detected during search
|
|
77
|
-
- **Unified Content System** - Memory-based ingestion for AI agents, format-adaptive retrieval
|
|
78
|
-
- **Simplified APIs** - `createEmbedder()` and `createReranker()` replace complex factory patterns
|
|
79
|
-
|
|
80
|
-
### MCP Server Enhancements
|
|
81
|
-
- **Multimodal Tools** - `multimodal_search`, `ingest_image` with URL download
|
|
82
|
-
- **Base64 Image Delivery** - Automatic encoding for AI agent integration
|
|
83
|
-
- **Content-Type Filtering** - Filter results by text, image, pdf, docx
|
|
84
|
-
- **Dynamic Tool Descriptions** - Context-aware tool documentation
|
|
85
|
-
|
|
86
|
-
### Migration from 1.x
|
|
87
|
-
Existing databases need schema updates for multimodal support. Two options:
|
|
88
|
-
1. **Automatic Migration**: Use `migrateToRagLiteStructure()` function
|
|
89
|
-
2. **Fresh Start**: Re-ingest content with v2.0.0
|
|
90
|
-
|
|
91
|
-
See [CHANGELOG.md](CHANGELOG.md) for complete details.
|
|
92
|
-
|
|
93
|
-
---
|
|
94
|
-
|
|
95
|
-
## Table of Contents
|
|
96
|
-
|
|
97
|
-
- [Why RAG-lite TS?](#-why-rag-lite-ts)
|
|
98
|
-
- [Quick Start](#-quick-start)
|
|
99
|
-
- [Features](#-features)
|
|
100
|
-
- [Real-World Examples](#-real-world-examples)
|
|
101
|
-
- [How It Works](#-how-it-works)
|
|
102
|
-
- [Supported Models](#-supported-models)
|
|
103
|
-
- [Documentation](#-documentation)
|
|
104
|
-
- [MCP Server Integration](#-mcp-server-integration)
|
|
105
|
-
- [Development](#-development)
|
|
106
|
-
- [Contributing](#-contributing)
|
|
107
|
-
- [License](#-license)
|
|
108
|
-
|
|
109
|
-
## Quick Start
|
|
110
|
-
|
|
111
|
-
### Installation
|
|
112
|
-
|
|
113
|
-
```bash
|
|
114
|
-
npm install -g rag-lite-ts
|
|
115
|
-
```
|
|
116
|
-
|
|
117
|
-
### Basic Usage
|
|
118
|
-
|
|
119
|
-
```bash
|
|
120
|
-
# Ingest documents
|
|
121
|
-
raglite ingest ./docs/
|
|
122
|
-
|
|
123
|
-
# Search your documents
|
|
124
|
-
raglite search "machine learning concepts"
|
|
125
|
-
|
|
126
|
-
# Get more results with reranking
|
|
127
|
-
raglite search "API documentation" --top-k 10 --rerank
|
|
128
|
-
```
|
|
129
|
-
|
|
130
|
-
### Using Different Models
|
|
131
|
-
|
|
132
|
-
```bash
|
|
133
|
-
# Use higher quality model (auto-rebuilds if needed)
|
|
134
|
-
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2 --rebuild-if-needed
|
|
135
|
-
|
|
136
|
-
# Search automatically uses the correct model
|
|
137
|
-
raglite search "complex query"
|
|
138
|
-
```
|
|
139
|
-
|
|
140
|
-
### Content Retrieval and MCP Integration
|
|
141
|
-
|
|
142
|
-
```typescript
|
|
143
|
-
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
144
|
-
|
|
145
|
-
// Memory-based ingestion for AI agents
|
|
146
|
-
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
147
|
-
const content = Buffer.from('Document from AI agent');
|
|
148
|
-
await pipeline.ingestFromMemory(content, {
|
|
149
|
-
displayName: 'agent-document.txt'
|
|
150
|
-
});
|
|
151
|
-
|
|
152
|
-
// Format-adaptive content retrieval
|
|
153
|
-
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
154
|
-
const results = await search.search('query');
|
|
155
|
-
|
|
156
|
-
// Get file path for CLI clients
|
|
157
|
-
const filePath = await search.getContent(results[0].contentId, 'file');
|
|
158
|
-
|
|
159
|
-
// Get base64 content for MCP clients
|
|
160
|
-
const base64 = await search.getContent(results[0].contentId, 'base64');
|
|
161
|
-
```
|
|
162
|
-
|
|
163
|
-
### Multimodal Search (Text + Images)
|
|
164
|
-
|
|
165
|
-
RAG-lite TS now supports true multimodal search using CLIP's unified embedding space, enabling cross-modal search between text and images:
|
|
166
|
-
|
|
167
|
-
```bash
|
|
168
|
-
# Enable multimodal processing for text and image content
|
|
169
|
-
raglite ingest ./docs/ --mode multimodal
|
|
170
|
-
|
|
171
|
-
# Cross-modal search: Find images using text queries
|
|
172
|
-
raglite search "architecture diagram" --content-type image
|
|
173
|
-
raglite search "red sports car" --content-type image
|
|
174
|
-
|
|
175
|
-
# Find
|
|
176
|
-
raglite search
|
|
177
|
-
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
188
|
-
|
|
189
|
-
- **
|
|
190
|
-
- **
|
|
191
|
-
|
|
192
|
-
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
const
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
|
|
220
|
-
|
|
221
|
-
|
|
222
|
-
|
|
223
|
-
|
|
224
|
-
|
|
225
|
-
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
|
|
233
|
-
|
|
234
|
-
|
|
235
|
-
const
|
|
236
|
-
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
const
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
248
|
-
|
|
249
|
-
|
|
250
|
-
|
|
251
|
-
|
|
252
|
-
|
|
253
|
-
|
|
254
|
-
|
|
255
|
-
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
|
|
280
|
-
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
285
|
-
|
|
286
|
-
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
291
|
-
|
|
292
|
-
|
|
293
|
-
|
|
294
|
-
|
|
295
|
-
|
|
296
|
-
|
|
297
|
-
|
|
298
|
-
|
|
299
|
-
|
|
300
|
-
|
|
301
|
-
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
#
|
|
305
|
-
raglite
|
|
306
|
-
|
|
307
|
-
|
|
308
|
-
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
|
|
320
|
-
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
|
|
324
|
-
|
|
325
|
-
|
|
326
|
-
|
|
327
|
-
|
|
328
|
-
|
|
329
|
-
|
|
330
|
-
|
|
331
|
-
|
|
332
|
-
|
|
333
|
-
|
|
334
|
-
|
|
335
|
-
|
|
336
|
-
|
|
337
|
-
|
|
338
|
-
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
|
|
343
|
-
|
|
344
|
-
|
|
345
|
-
|
|
346
|
-
|
|
347
|
-
|
|
348
|
-
|
|
349
|
-
|
|
350
|
-
|
|
351
|
-
|
|
352
|
-
|
|
353
|
-
|
|
354
|
-
|
|
355
|
-
|
|
356
|
-
|
|
357
|
-
|
|
358
|
-
|
|
359
|
-
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
|
|
364
|
-
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
|
|
368
|
-
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
|
|
375
|
-
|
|
376
|
-
|
|
377
|
-
|
|
378
|
-
|
|
379
|
-
|
|
380
|
-
|
|
381
|
-
|
|
382
|
-
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
|
|
386
|
-
|
|
387
|
-
|
|
388
|
-
|
|
389
|
-
|
|
390
|
-
|
|
391
|
-
|
|
392
|
-
|
|
393
|
-
- **
|
|
394
|
-
- **
|
|
395
|
-
|
|
396
|
-
|
|
397
|
-
|
|
398
|
-
|
|
399
|
-
|
|
400
|
-
|
|
401
|
-
|
|
402
|
-
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
|
|
410
|
-
|
|
411
|
-
|
|
412
|
-
|
|
413
|
-
- **
|
|
414
|
-
- **
|
|
415
|
-
|
|
416
|
-
|
|
417
|
-
|
|
418
|
-
|
|
419
|
-
|
|
420
|
-
|
|
421
|
-
|
|
422
|
-
|
|
423
|
-
|
|
424
|
-
|
|
425
|
-
|
|
426
|
-
|
|
427
|
-
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
|
|
431
|
-
|
|
432
|
-
|
|
433
|
-
- **
|
|
434
|
-
- **
|
|
435
|
-
|
|
436
|
-
|
|
437
|
-
|
|
438
|
-
|
|
439
|
-
|
|
440
|
-
|
|
441
|
-
|
|
442
|
-
|
|
443
|
-
|
|
444
|
-
|
|
445
|
-
|
|
446
|
-
|
|
447
|
-
|
|
448
|
-
|
|
449
|
-
|
|
450
|
-
|
|
451
|
-
|
|
452
|
-
|
|
453
|
-
|
|
454
|
-
|
|
455
|
-
|
|
456
|
-
|
|
457
|
-
|
|
458
|
-
|
|
459
|
-
|
|
460
|
-
|
|
461
|
-
|
|
462
|
-
|
|
463
|
-
|
|
464
|
-
|
|
465
|
-
raglite ingest ./
|
|
466
|
-
|
|
467
|
-
|
|
468
|
-
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
|
|
478
|
-
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
484
|
-
|
|
485
|
-
|
|
486
|
-
|
|
487
|
-
|
|
|
488
|
-
|
|
489
|
-
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
|
|
494
|
-
|
|
495
|
-
|
|
496
|
-
|
|
497
|
-
|
|
498
|
-
|
|
499
|
-
|
|
500
|
-
|
|
501
|
-
|
|
502
|
-
|
|
503
|
-
|
|
504
|
-
|
|
505
|
-
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
511
|
-
|
|
512
|
-
|
|
513
|
-
|
|
514
|
-
|
|
515
|
-
|
|
516
|
-
|
|
517
|
-
|
|
518
|
-
|
|
519
|
-
|
|
520
|
-
|
|
521
|
-
|
|
522
|
-
|
|
523
|
-
|
|
524
|
-
|
|
525
|
-
|
|
526
|
-
|
|
527
|
-
|
|
528
|
-
|
|
529
|
-
|
|
530
|
-
|
|
531
|
-
|
|
532
|
-
|
|
533
|
-
|
|
534
|
-
|
|
535
|
-
|
|
536
|
-
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
|
|
540
|
-
|
|
541
|
-
|
|
542
|
-
|
|
543
|
-
|
|
544
|
-
|
|
545
|
-
|
|
546
|
-
|
|
547
|
-
|
|
548
|
-
|
|
549
|
-
|
|
550
|
-
###
|
|
551
|
-
|
|
552
|
-
| Model | Dims | Speed | Quality | Best For |
|
|
553
|
-
|-------|------|-------|---------|----------|
|
|
554
|
-
| `
|
|
555
|
-
| `Xenova/
|
|
556
|
-
|
|
557
|
-
###
|
|
558
|
-
|
|
559
|
-
|
|
560
|
-
|
|
561
|
-
-
|
|
562
|
-
-
|
|
563
|
-
|
|
564
|
-
|
|
565
|
-
|
|
566
|
-
|
|
567
|
-
|
|
568
|
-
|
|
569
|
-
|
|
570
|
-
|
|
571
|
-
|
|
572
|
-
|
|
573
|
-
|
|
574
|
-
|
|
575
|
-
|
|
576
|
-
|
|
577
|
-
|
|
578
|
-
|
|
579
|
-
|
|
580
|
-
|
|
581
|
-
|
|
582
|
-
|
|
583
|
-
- [
|
|
584
|
-
- [
|
|
585
|
-
|
|
586
|
-
|
|
587
|
-
|
|
588
|
-
|
|
589
|
-
|
|
590
|
-
|
|
591
|
-
|
|
592
|
-
- [
|
|
593
|
-
- [
|
|
594
|
-
|
|
595
|
-
|
|
596
|
-
|
|
597
|
-
|
|
598
|
-
|
|
599
|
-
|
|
600
|
-
|
|
601
|
-
|
|
602
|
-
|
|
603
|
-
|
|
604
|
-
|
|
605
|
-
|
|
606
|
-
|
|
607
|
-
|
|
608
|
-
|
|
609
|
-
|
|
610
|
-
|
|
|
611
|
-
|
|
612
|
-
|
|
613
|
-
|
|
614
|
-
|
|
615
|
-
|
|
616
|
-
|
|
617
|
-
|
|
618
|
-
|
|
619
|
-
|
|
620
|
-
|
|
621
|
-
|
|
622
|
-
|
|
623
|
-
|
|
624
|
-
|
|
625
|
-
|
|
626
|
-
|
|
627
|
-
|
|
628
|
-
|
|
629
|
-
|
|
630
|
-
|
|
631
|
-
|
|
632
|
-
|
|
633
|
-
|
|
634
|
-
|
|
635
|
-
|
|
636
|
-
|
|
637
|
-
|
|
638
|
-
|
|
639
|
-
|
|
640
|
-
|
|
641
|
-
|
|
642
|
-
|
|
643
|
-
|
|
644
|
-
|
|
645
|
-
|
|
646
|
-
|
|
647
|
-
|
|
648
|
-
|
|
649
|
-
|
|
650
|
-
|
|
651
|
-
|
|
652
|
-
|
|
653
|
-
"
|
|
654
|
-
"
|
|
655
|
-
|
|
656
|
-
"
|
|
657
|
-
|
|
658
|
-
|
|
659
|
-
|
|
660
|
-
|
|
661
|
-
|
|
662
|
-
|
|
663
|
-
|
|
664
|
-
|
|
665
|
-
|
|
666
|
-
|
|
667
|
-
|
|
668
|
-
|
|
669
|
-
|
|
670
|
-
|
|
671
|
-
|
|
672
|
-
|
|
673
|
-
-
|
|
674
|
-
-
|
|
675
|
-
-
|
|
676
|
-
|
|
677
|
-
|
|
678
|
-
|
|
679
|
-
|
|
680
|
-
|
|
681
|
-
|
|
682
|
-
|
|
683
|
-
|
|
684
|
-
|
|
685
|
-
|
|
686
|
-
|
|
687
|
-
|
|
688
|
-
|
|
689
|
-
|
|
690
|
-
|
|
691
|
-
|
|
692
|
-
|
|
693
|
-
|
|
694
|
-
|
|
695
|
-
|
|
696
|
-
|
|
697
|
-
|
|
698
|
-
|
|
699
|
-
npm
|
|
700
|
-
|
|
701
|
-
|
|
702
|
-
|
|
703
|
-
|
|
704
|
-
|
|
705
|
-
|
|
706
|
-
|
|
707
|
-
|
|
708
|
-
|
|
709
|
-
|
|
710
|
-
|
|
711
|
-
|
|
712
|
-
|
|
713
|
-
|
|
714
|
-
|
|
715
|
-
|
|
716
|
-
|
|
717
|
-
โโโ
|
|
718
|
-
โ โโโ
|
|
719
|
-
โ โโโ
|
|
720
|
-
โ
|
|
721
|
-
โโโ
|
|
722
|
-
โ โโโ
|
|
723
|
-
โ
|
|
724
|
-
|
|
725
|
-
โ
|
|
726
|
-
โโโ
|
|
727
|
-
|
|
728
|
-
|
|
729
|
-
|
|
730
|
-
|
|
731
|
-
|
|
732
|
-
|
|
733
|
-
|
|
734
|
-
|
|
735
|
-
|
|
736
|
-
|
|
737
|
-
|
|
738
|
-
|
|
739
|
-
|
|
740
|
-
|
|
741
|
-
|
|
742
|
-
|
|
743
|
-
|
|
744
|
-
|
|
745
|
-
|
|
746
|
-
|
|
747
|
-
|
|
748
|
-
|
|
749
|
-
|
|
750
|
-
|
|
751
|
-
|
|
752
|
-
|
|
753
|
-
|
|
754
|
-
|
|
755
|
-
|
|
756
|
-
|
|
757
|
-
|
|
758
|
-
|
|
759
|
-
|
|
760
|
-
|
|
761
|
-
|
|
762
|
-
|
|
763
|
-
|
|
764
|
-
|
|
765
|
-
|
|
766
|
-
|
|
767
|
-
|
|
768
|
-
|
|
769
|
-
|
|
770
|
-
|
|
771
|
-
|
|
772
|
-
|
|
773
|
-
|
|
774
|
-
|
|
775
|
-
|
|
776
|
-
|
|
777
|
-
|
|
778
|
-
-
|
|
779
|
-
-
|
|
780
|
-
-
|
|
781
|
-
|
|
782
|
-
|
|
783
|
-
|
|
784
|
-
|
|
785
|
-
|
|
786
|
-
|
|
787
|
-
|
|
788
|
-
|
|
789
|
-
|
|
790
|
-
|
|
791
|
-
|
|
792
|
-
|
|
793
|
-
|
|
794
|
-
|
|
795
|
-
|
|
796
|
-
|
|
797
|
-
|
|
798
|
-
|
|
799
|
-
|
|
800
|
-
|
|
801
|
-
|
|
802
|
-
|
|
803
|
-
|
|
804
|
-
|
|
805
|
-
|
|
806
|
-
|
|
807
|
-
|
|
808
|
-
|
|
1
|
+
<div align="center">
|
|
2
|
+
|
|
3
|
+
# RAG-lite TS
|
|
4
|
+
|
|
5
|
+
### *Simple by default, powerful when needed*
|
|
6
|
+
|
|
7
|
+
**Local-first semantic search that actually works**
|
|
8
|
+
|
|
9
|
+
[](https://www.npmjs.com/package/rag-lite-ts)
|
|
10
|
+
[](https://opensource.org/licenses/MIT)
|
|
11
|
+
[](https://www.typescriptlang.org/)
|
|
12
|
+
[](https://nodejs.org/)
|
|
13
|
+
|
|
14
|
+
[Quick Start](#quick-start) • [Features](#features) • [Documentation](#documentation) • [Examples](#examples) • [MCP Integration](#mcp-server-integration)
|
|
15
|
+
|
|
16
|
+
</div>
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## Why RAG-lite TS?
|
|
21
|
+
|
|
22
|
+
**Stop fighting with complex RAG frameworks.** Get semantic search running in 30 seconds:
|
|
23
|
+
|
|
24
|
+
```bash
|
|
25
|
+
npm install -g rag-lite-ts
|
|
26
|
+
raglite ingest ./docs/
|
|
27
|
+
raglite search "your query here"
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
**That's it.** No API keys, no cloud services, no configuration hell.
|
|
31
|
+
|
|
32
|
+
### See It In Action
|
|
33
|
+
|
|
34
|
+
```typescript
|
|
35
|
+
// 1. Ingest your docs
|
|
36
|
+
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
37
|
+
await pipeline.ingestDirectory('./docs/');
|
|
38
|
+
|
|
39
|
+
// 2. Search semantically
|
|
40
|
+
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
41
|
+
const results = await search.search('authentication flow');
|
|
42
|
+
|
|
43
|
+
// 3. Get relevant results instantly
|
|
44
|
+
console.log(results[0].text);
|
|
45
|
+
// "To authenticate users, first obtain a JWT token from the /auth endpoint..."
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
**Real semantic understanding** - not just keyword matching. Finds "JWT token" when you search for "authentication flow".
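To make "semantic, not keyword" matching concrete, here is a minimal sketch of how embedding-based ranking generally works: the query and each document are mapped to vectors, and results are ordered by cosine similarity. The three-dimensional vectors and the `rankBySimilarity` helper are invented for illustration; they are not rag-lite-ts APIs, and real embeddings have hundreds of dimensions.

```typescript
// Illustrative only: toy vectors stand in for real model embeddings.
interface Doc {
  text: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: return docs sorted by similarity to the query vector.
function rankBySimilarity(query: number[], docs: Doc[]): Array<Doc & { score: number }> {
  return docs
    .map(d => ({ ...d, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}

const docs: Doc[] = [
  { text: 'Obtain a JWT token from the /auth endpoint', vector: [0.9, 0.1, 0.0] },
  { text: 'Resize images before upload', vector: [0.0, 0.2, 0.9] },
];

// Pretend embedding of "authentication flow": it points in roughly the
// same direction as the JWT document, so that document ranks first.
const ranked = rankBySimilarity([0.8, 0.2, 0.1], docs);
console.log(ranked[0].text);
```

The query shares no words with the top result; it ranks first only because the two vectors point in a similar direction, which is the property a trained embedding model is meant to provide.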
|
|
49
|
+
|
|
50
|
+
### What Makes It Different?
|
|
51
|
+
|
|
52
|
+
- **100% Local** - Your data never leaves your machine
|
|
53
|
+
- **Actually Fast** - Sub-100ms queries, not "eventually consistent"
|
|
54
|
+
- **Chameleon Architecture** - Automatically adapts between text and multimodal modes
|
|
55
|
+
- **True Multimodal** - Search images with text, text with images (CLIP unified space)
|
|
56
|
+
- **Zero Runtime Dependencies** - No Python, no Docker, no external services
|
|
57
|
+
- **TypeScript Native** - Full type safety, modern ESM architecture
|
|
58
|
+
- **MCP Ready** - Built-in Model Context Protocol server for AI agents
|
|
59
|
+
|
|
60
|
+

|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## What's New in 2.0
|
|
65
|
+
|
|
66
|
+
**Chameleon Multimodal Architecture** - RAG-lite TS now seamlessly adapts between text-only and multimodal search:
|
|
67
|
+
|
|
68
|
+
### Multimodal Search
|
|
69
|
+
- **CLIP Integration** - Unified 512D embedding space for text and images
|
|
70
|
+
- **Cross-Modal Search** - Find images with text queries, text with image queries
|
|
71
|
+
- **Image-to-Text Generation** - Automatic descriptions using vision-language models
|
|
72
|
+
- **Smart Reranking** - Automatic strategy selection with cross-encoder and text-derived methods
|
|
73
|
+
|
|
74
|
+
### Architecture Improvements
|
|
75
|
+
- **Layered Architecture** - Clean separation: core (model-agnostic) → implementation (text/multimodal) → public API
|
|
76
|
+
- **Mode Persistence** - Configuration stored in database, auto-detected during search
|
|
77
|
+
- **Unified Content System** - Memory-based ingestion for AI agents, format-adaptive retrieval
|
|
78
|
+
- **Simplified APIs** - `createEmbedder()` and `createReranker()` replace complex factory patterns
|
|
79
|
+
|
|
80
|
+
### MCP Server Enhancements
|
|
81
|
+
- **Multimodal Tools** - `multimodal_search`, `ingest_image` with URL download
|
|
82
|
+
- **Base64 Image Delivery** - Automatic encoding for AI agent integration
|
|
83
|
+
- **Content-Type Filtering** - Filter results by text, image, pdf, docx
|
|
84
|
+
- **Dynamic Tool Descriptions** - Context-aware tool documentation
|
|
85
|
+
|
|
86
|
+
### Migration from 1.x
|
|
87
|
+
Existing databases need schema updates for multimodal support. Two options:
|
|
88
|
+
1. **Automatic Migration**: Use `migrateToRagLiteStructure()` function
|
|
89
|
+
2. **Fresh Start**: Re-ingest content with v2.0.0
|
|
90
|
+
|
|
91
|
+
See [CHANGELOG.md](CHANGELOG.md) for complete details.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Table of Contents
|
|
96
|
+
|
|
97
|
+
- [Why RAG-lite TS?](#-why-rag-lite-ts)
|
|
98
|
+
- [Quick Start](#-quick-start)
|
|
99
|
+
- [Features](#-features)
|
|
100
|
+
- [Real-World Examples](#-real-world-examples)
|
|
101
|
+
- [How It Works](#-how-it-works)
|
|
102
|
+
- [Supported Models](#-supported-models)
|
|
103
|
+
- [Documentation](#-documentation)
|
|
104
|
+
- [MCP Server Integration](#-mcp-server-integration)
|
|
105
|
+
- [Development](#-development)
|
|
106
|
+
- [Contributing](#-contributing)
|
|
107
|
+
- [License](#-license)
|
|
108
|
+
|
|
109
|
+
## Quick Start
|
|
110
|
+
|
|
111
|
+
### Installation
|
|
112
|
+
|
|
113
|
+
```bash
|
|
114
|
+
npm install -g rag-lite-ts
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
### Basic Usage
|
|
118
|
+
|
|
119
|
+
```bash
|
|
120
|
+
# Ingest documents
|
|
121
|
+
raglite ingest ./docs/
|
|
122
|
+
|
|
123
|
+
# Search your documents
|
|
124
|
+
raglite search "machine learning concepts"
|
|
125
|
+
|
|
126
|
+
# Get more results with reranking (use --rerank for better quality)
|
|
127
|
+
raglite search "API documentation" --top-k 10 --rerank
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
### Using Different Models
|
|
131
|
+
|
|
132
|
+
```bash
|
|
133
|
+
# Use higher quality model (auto-rebuilds if needed)
|
|
134
|
+
raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2 --rebuild-if-needed
|
|
135
|
+
|
|
136
|
+
# Search automatically uses the correct model
|
|
137
|
+
raglite search "complex query"
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
### Content Retrieval and MCP Integration
|
|
141
|
+
|
|
142
|
+
```typescript
|
|
143
|
+
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
144
|
+
|
|
145
|
+
// Memory-based ingestion for AI agents
|
|
146
|
+
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
147
|
+
const content = Buffer.from('Document from AI agent');
|
|
148
|
+
await pipeline.ingestFromMemory(content, {
|
|
149
|
+
displayName: 'agent-document.txt'
|
|
150
|
+
});
|
|
151
|
+
|
|
152
|
+
// Format-adaptive content retrieval
|
|
153
|
+
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
154
|
+
const results = await search.search('query');
|
|
155
|
+
|
|
156
|
+
// Get file path for CLI clients
|
|
157
|
+
const filePath = await search.getContent(results[0].contentId, 'file');
|
|
158
|
+
|
|
159
|
+
// Get base64 content for MCP clients
|
|
160
|
+
const base64 = await search.getContent(results[0].contentId, 'base64');
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
### Multimodal Search (Text + Images)
|
|
164
|
+
|
|
165
|
+
RAG-lite TS now supports true multimodal search using CLIP's unified embedding space, enabling cross-modal search between text and images:
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
# Enable multimodal processing for text and image content
|
|
169
|
+
raglite ingest ./docs/ --mode multimodal
|
|
170
|
+
|
|
171
|
+
# Cross-modal search: Find images using text queries
|
|
172
|
+
raglite search "architecture diagram" --content-type image
|
|
173
|
+
raglite search "red sports car" --content-type image
|
|
174
|
+
|
|
175
|
+
# Image-to-image search: Find similar images using image files
|
|
176
|
+
raglite search ./photo.jpg # Find similar images
|
|
177
|
+
raglite search ./diagram.png --top-k 5 # Find similar images with custom count
|
|
178
|
+
|
|
179
|
+
# Find text documents about visual concepts
|
|
180
|
+
raglite search "user interface design" --content-type text
|
|
181
|
+
|
|
182
|
+
# Search across both content types (default)
|
|
183
|
+
raglite search "system overview"
|
|
184
|
+
|
|
185
|
+
# Automatic reranking based on mode (text: cross-encoder, multimodal: text-derived)
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
**Key Features:**
|
|
189
|
+
- **Unified embedding space**: Text and images embedded in the same 512-dimensional CLIP space
|
|
190
|
+
- **Cross-modal search**: Text queries find semantically similar images
|
|
191
|
+
- **Automatic mode detection**: Set mode once during ingestion, automatically detected during search
|
|
192
|
+
- **Automatic reranking**: Cross-encoder for text, text-derived for multimodal, with enable/disable control
|
|
193
|
+
- **Seamless experience**: Same CLI commands work for both text-only and multimodal content
|
|
194
|
+
|
|
195
|
+
→ **[Complete Multimodal Tutorial](docs/multimodal-tutorial.md)**
|
|
196
|
+
|
|
197
|
+
### Programmatic Usage
|
|
198
|
+
|
|
199
|
+
```typescript
|
|
200
|
+
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
201
|
+
|
|
202
|
+
// Text-only mode (default)
|
|
203
|
+
const ingestion = new IngestionPipeline('./db.sqlite', './vector-index.bin');
|
|
204
|
+
await ingestion.ingestDirectory('./docs/');
|
|
205
|
+
|
|
206
|
+
// Multimodal mode (text + images)
|
|
207
|
+
const multimodalIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
|
|
208
|
+
mode: 'multimodal',
|
|
209
|
+
embeddingModel: 'Xenova/clip-vit-base-patch32',
|
|
210
|
+
rerankingStrategy: 'text-derived'
|
|
211
|
+
});
|
|
212
|
+
await multimodalIngestion.ingestDirectory('./mixed-content/');
|
|
213
|
+
|
|
214
|
+
// Search (mode auto-detected from database)
|
|
215
|
+
const search = new SearchEngine('./vector-index.bin', './db.sqlite');
|
|
216
|
+
const results = await search.search('machine learning', { top_k: 10 });
|
|
217
|
+
|
|
218
|
+
// Cross-modal search in multimodal mode
|
|
219
|
+
const imageResults = results.filter(r => r.contentType === 'image');
|
|
220
|
+
const textResults = results.filter(r => r.contentType === 'text');
|
|
221
|
+
```
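The two `filter` calls above each walk the result list once. When more content types are involved, a single-pass grouping may be more convenient. The sketch below assumes only the `contentType`, `score`, and `text` fields that the README's own examples use; `ResultLike` and `groupByContentType` are hypothetical names, not part of the package.

```typescript
// Minimal result shape assumed for this sketch (not the library's full type).
interface ResultLike {
  contentType: string;
  score: number;
  text: string;
}

// Hypothetical helper: bucket results by content type in a single pass,
// preserving the original ranking order within each bucket.
function groupByContentType<T extends ResultLike>(results: T[]): Map<string, T[]> {
  const groups = new Map<string, T[]>();
  for (const r of results) {
    const bucket = groups.get(r.contentType) ?? [];
    bucket.push(r);
    groups.set(r.contentType, bucket);
  }
  return groups;
}

// Stand-in data; in practice this would be the array returned by a search.
const sample: ResultLike[] = [
  { contentType: 'text', score: 0.91, text: 'Intro to machine learning' },
  { contentType: 'image', score: 0.84, text: 'diagram.png' },
  { contentType: 'text', score: 0.77, text: 'Training pipelines' },
];

const grouped = groupByContentType(sample);
console.log(grouped.get('text')?.length, grouped.get('image')?.length);
```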
|
|
222
|
+
|
|
223
|
+
### Memory Ingestion & Unified Content System (NEW)
|
|
224
|
+
|
|
225
|
+
```typescript
|
|
226
|
+
// Ingest content directly from memory (perfect for MCP integration)
|
|
227
|
+
const content = Buffer.from('# AI Guide\n\nComprehensive AI concepts...');
|
|
228
|
+
const contentId = await ingestion.ingestFromMemory(content, {
|
|
229
|
+
displayName: 'AI Guide.md',
|
|
230
|
+
contentType: 'text/markdown'
|
|
231
|
+
});
|
|
232
|
+
|
|
233
|
+
// Retrieve content in different formats based on client needs
|
|
234
|
+
const filePath = await search.getContent(contentId, 'file'); // For CLI clients
|
|
235
|
+
const base64Data = await search.getContent(contentId, 'base64'); // For MCP clients
|
|
236
|
+
|
|
237
|
+
// Batch content retrieval for efficiency
|
|
238
|
+
const contentIds = ['id1', 'id2', 'id3'];
|
|
239
|
+
const contents = await search.getContentBatch(contentIds, 'base64');
|
|
240
|
+
|
|
241
|
+
// Content management with deduplication
|
|
242
|
+
const stats = await ingestion.getStorageStats();
|
|
243
|
+
console.log(`Content directory: ${stats.contentDirSize} bytes, ${stats.fileCount} files`);
|
|
244
|
+
|
|
245
|
+
// Cleanup orphaned content
|
|
246
|
+
const cleanupResult = await ingestion.cleanup();
|
|
247
|
+
console.log(`Removed ${cleanupResult.removedFiles} orphaned files`);
|
|
248
|
+
```
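The README does not say how content IDs are derived or how deduplication is implemented. One common approach is content addressing, where the ID is a hash of the bytes, so identical content always maps to the same entry. The sketch below illustrates that idea together with base64 delivery; `ContentStore` is a hypothetical in-memory class and should not be read as the package's storage layer.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical in-memory store keyed by a SHA-256 digest of the content.
class ContentStore {
  private readonly items = new Map<string, Buffer>();

  // Returns the content ID. Identical bytes always map to the same ID,
  // so storing the same content twice keeps a single copy.
  put(content: Buffer): string {
    const id = createHash('sha256').update(content).digest('hex');
    if (!this.items.has(id)) {
      this.items.set(id, content);
    }
    return id;
  }

  // Base64 form, convenient for JSON-based transports.
  getBase64(id: string): string | undefined {
    return this.items.get(id)?.toString('base64');
  }

  get size(): number {
    return this.items.size;
  }
}

const store = new ContentStore();
const idA = store.put(Buffer.from('# AI Guide'));
const idB = store.put(Buffer.from('# AI Guide')); // duplicate content
console.log(idA === idB, store.size, store.getBase64(idA));
```

Hashing the bytes rather than the display name means two uploads with different names but the same content share one stored copy.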
|
|
249
|
+
|
|
250
|
+
#### Configuration Options
|
|
251
|
+
|
|
252
|
+
```typescript
|
|
253
|
+
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
254
|
+
|
|
255
|
+
// Custom model configuration
|
|
256
|
+
const search = new SearchEngine('./vector-index.bin', './db.sqlite', {
|
|
257
|
+
embeddingModel: 'Xenova/all-mpnet-base-v2',
|
|
258
|
+
enableReranking: true,
|
|
259
|
+
topK: 15
|
|
260
|
+
});
|
|
261
|
+
|
|
262
|
+
// Ingestion with custom settings
|
|
263
|
+
const ingestion = new IngestionPipeline('./db.sqlite', './vector-index.bin', {
|
|
264
|
+
embeddingModel: 'Xenova/all-mpnet-base-v2',
|
|
265
|
+
chunkSize: 400,
|
|
266
|
+
chunkOverlap: 80
|
|
267
|
+
});
|
|
268
|
+
```
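For readers unfamiliar with `chunkSize` and `chunkOverlap`, the usual meaning is a sliding window: each chunk holds up to `chunkSize` units and starts `chunkSize - chunkOverlap` units after the previous one. The sketch below shows that arithmetic on a plain array. It is an assumption about the general technique, not the package's chunker, and the README does not state which unit (tokens, words, characters) the library counts.

```typescript
// Hypothetical helper: split an array of units into overlapping windows.
function chunkWithOverlap<T>(units: T[], chunkSize: number, chunkOverlap: number): T[][] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('require 0 <= chunkOverlap < chunkSize');
  }
  const stride = chunkSize - chunkOverlap;
  const chunks: T[][] = [];
  for (let start = 0; start < units.length; start += stride) {
    chunks.push(units.slice(start, start + chunkSize));
    if (start + chunkSize >= units.length) break; // last window reached the end
  }
  return chunks;
}

// 1000 placeholder units with the settings from the example above.
const tokens = Array.from({ length: 1000 }, (_, i) => `t${i}`);
const chunks = chunkWithOverlap(tokens, 400, 80);
console.log(chunks.map(c => c.length));
```

With 1000 units, a size of 400, and an overlap of 80, the stride is 320, which yields windows of 400, 400, and 360 units; the last 80 units of each window reappear at the start of the next.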
|
|
269
|
+
|
|
270
|
+
→ **[Complete CLI Reference](docs/cli-reference.md)** | **[API Documentation](docs/api-reference.md)**
|
|
271
|
+
|
|
272
|
+
---
|
|
273
|
+
|
|
274
|
+
## Real-World Examples
|
|
275
|
+
|
|
276
|
+
<details>
|
|
277
|
+
<summary><b>Build a Documentation Search Engine</b></summary>
|
|
278
|
+
|
|
279
|
+
```typescript
|
|
280
|
+
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
281
|
+
|
|
282
|
+
// Ingest your docs once
|
|
283
|
+
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
284
|
+
await pipeline.ingestDirectory('./docs/');
|
|
285
|
+
|
|
286
|
+
// Search instantly
|
|
287
|
+
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
288
|
+
const results = await search.search('authentication flow');
|
|
289
|
+
|
|
290
|
+
results.forEach(r => {
|
|
291
|
+
console.log(`${r.metadata.title}: ${r.text}`);
|
|
292
|
+
console.log(`Relevance: ${r.score.toFixed(3)}\n`);
|
|
293
|
+
});
|
|
294
|
+
```
|
|
295
|
+
|
|
296
|
+
**Use case:** Internal documentation, API references, knowledge bases
|
|
297
|
+
|
|
298
|
+
</details>
|
|
299
|
+
|
|
300
|
+
<details>
|
|
301
|
+
<summary><b>Search Images with Natural Language</b></summary>
|
|
302
|
+
|
|
303
|
+
```bash
|
|
304
|
+
# Ingest mixed content (text + images)
|
|
305
|
+
raglite ingest ./assets/ --mode multimodal
|
|
306
|
+
|
|
307
|
+
# Find images using text descriptions
|
|
308
|
+
raglite search "architecture diagram" --content-type image
|
|
309
|
+
raglite search "team photo" --content-type image
|
|
310
|
+
raglite search "product screenshot" --content-type image
|
|
311
|
+
|
|
312
|
+
# Or find similar images using image files directly
|
|
313
|
+
raglite search ./reference-diagram.png --content-type image
|
|
314
|
+
raglite search ./sample-photo.jpg --top-k 5
|
|
315
|
+
```
|
|
316
|
+
|
|
317
|
+
**Use case:** Digital asset management, photo libraries, design systems
|
|
318
|
+
|
|
319
|
+
</details>
|
|
320
|
+
|
|
321
|
+
<details>
|
|
322
|
+
<summary><b>AI Agent with Memory</b></summary>
|
|
323
|
+
|
|
324
|
+
```typescript
|
|
325
|
+
// Agent ingests conversation context
|
|
326
|
+
const content = Buffer.from('User prefers dark mode. Uses TypeScript.');
|
|
327
|
+
await pipeline.ingestFromMemory(content, {
|
|
328
|
+
displayName: 'user-preferences.txt'
|
|
329
|
+
});
|
|
330
|
+
|
|
331
|
+
// Later, agent retrieves relevant context
|
|
332
|
+
const context = await search.search('user interface preferences');
|
|
333
|
+
// Agent now knows: "User prefers dark mode"
|
|
334
|
+
```
|
|
335
|
+
|
|
336
|
+
**Use case:** Chatbots, AI assistants, context-aware agents
|
|
337
|
+
|
|
338
|
+
</details>
|
|
339
|
+
|
|
340
|
+
<details>
|
|
341
|
+
<summary><b>Semantic Code Search</b></summary>
|
|
342
|
+
|
|
343
|
+
```typescript
|
|
344
|
+
// Index your codebase
|
|
345
|
+
await pipeline.ingestDirectory('./src/', {
|
|
346
|
+
chunkSize: 500, // Larger chunks for code
|
|
347
|
+
chunkOverlap: 100
|
|
348
|
+
});
|
|
349
|
+
|
|
350
|
+
// Find code by intent, not keywords
|
|
351
|
+
const results = await search.search('authentication middleware');
|
|
352
|
+
// Finds relevant code even if it doesn't contain those exact words
|
|
353
|
+
```
|
|
354
|
+
|
|
355
|
+
**Use case:** Code navigation, refactoring, onboarding
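
The `chunkSize` and `chunkOverlap` options above control how much text each chunk holds and how much neighbouring chunks share. As a rough illustration of that sliding window (not the library's chunker, which splits on semantic boundaries and counts tokens), here is a sketch that counts whitespace-separated words instead of tokens. `chunkWithOverlap` is a hypothetical helper:

```typescript
// Illustrative only: words stand in for tokenizer tokens.
function chunkWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Build a 1200-word demo document: "w0 w1 ... w1199".
const demoWordList: string[] = [];
for (let i = 0; i < 1200; i++) demoWordList.push(`w${i}`);
const demoChunks = chunkWithOverlap(demoWordList.join(" "), 500, 100);
console.log(demoChunks.length);
```

With a size of 500 and an overlap of 100, each window starts 400 words after the previous one, so the last 100 words of a chunk reappear at the start of the next.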
|
|
356
|
+
|
|
357
|
+
</details>
|
|
358
|
+
|
|
359
|
+
<details>
|
|
360
|
+
<summary><b>MCP Server for Claude/AI Tools</b></summary>
|
|
361
|
+
|
|
362
|
+
```json
|
|
363
|
+
{
|
|
364
|
+
"mcpServers": {
|
|
365
|
+
"my-docs": {
|
|
366
|
+
"command": "raglite-mcp",
|
|
367
|
+
"env": {
|
|
368
|
+
"RAG_DB_FILE": "./docs/db.sqlite",
|
|
369
|
+
"RAG_INDEX_FILE": "./docs/index.bin"
|
|
370
|
+
}
|
|
371
|
+
}
|
|
372
|
+
}
|
|
373
|
+
}
|
|
374
|
+
```
|
|
375
|
+
|
|
376
|
+
Now Claude can search your docs directly! Works with any MCP-compatible AI tool.
|
|
377
|
+
|
|
378
|
+
**Use case:** AI-powered documentation, intelligent assistants
|
|
379
|
+
|
|
380
|
+
</details>
|
|
381
|
+
|
|
382
|
+
---
|
|
383
|
+
|
|
384
|
+
## ✨ Features
|
|
385
|
+
|
|
386
|
+
<table>
|
|
387
|
+
<tr>
|
|
388
|
+
<td width="50%">
|
|
389
|
+
|
|
390
|
+
### 🎯 Developer Experience
|
|
391
|
+
- **One-line setup** - `new SearchEngine()` just works
|
|
392
|
+
- **TypeScript native** - Full type safety
|
|
393
|
+
- **Zero config** - Sensible defaults everywhere
|
|
394
|
+
- **Hackable** - Clean architecture, easy to extend
|
|
395
|
+
|
|
396
|
+
</td>
|
|
397
|
+
<td width="50%">
|
|
398
|
+
|
|
399
|
+
### Performance
|
|
400
|
+
- **Sub-100ms queries** - Fast vector search
|
|
401
|
+
- **Offline-first** - No network calls
|
|
402
|
+
- **Efficient chunking** - Smart semantic boundaries
|
|
403
|
+
- **Optimized models** - Multiple quality/speed options
|
|
404
|
+
|
|
405
|
+
</td>
|
|
406
|
+
</tr>
|
|
407
|
+
<tr>
|
|
408
|
+
<td width="50%">
|
|
409
|
+
|
|
410
|
+
### 🦎 Chameleon Architecture
|
|
411
|
+
- **Auto-adapting** - Text or multimodal mode
|
|
412
|
+
- **Mode persistence** - Set once, auto-detected
|
|
413
|
+
- **No fallbacks** - Reliable or clear failure
|
|
414
|
+
- **Polymorphic runtime** - Same API, different modes
|
|
415
|
+
|
|
416
|
+
</td>
|
|
417
|
+
<td width="50%">
|
|
418
|
+
|
|
419
|
+
### 🖼️ Multimodal Search
|
|
420
|
+
- **CLIP unified space** - Text and images together
|
|
421
|
+
- **Cross-modal queries** - Text finds images, vice versa
|
|
422
|
+
- **Smart reranking** - Automatic strategy selection by mode
|
|
423
|
+
- **Seamless experience** - Same commands, more power
|
|
424
|
+
|
|
425
|
+
</td>
|
|
426
|
+
</tr>
|
|
427
|
+
<tr>
|
|
428
|
+
<td width="50%">
|
|
429
|
+
|
|
430
|
+
### Integration Ready
|
|
431
|
+
- **MCP server included** - AI agent integration
|
|
432
|
+
- **Memory ingestion** - Direct buffer processing
|
|
433
|
+
- **Format-adaptive** - File paths or base64 data
|
|
434
|
+
- **Multi-instance** - Run multiple databases
|
|
435
|
+
|
|
436
|
+
</td>
|
|
437
|
+
<td width="50%">
|
|
438
|
+
|
|
439
|
+
### 🛠️ Production Ready
|
|
440
|
+
- **Content management** - Deduplication, cleanup
|
|
441
|
+
- **Model compatibility** - Auto-detection, rebuilds
|
|
442
|
+
- **Error recovery** - Clear messages, helpful hints
|
|
443
|
+
|
|
444
|
+
</td>
|
|
445
|
+
</tr>
|
|
446
|
+
</table>
|
|
447
|
+
|
|
448
|
+
### Supported File Formats
|
|
449
|
+
|
|
450
|
+
RAG-lite TS supports the following file formats with full processing implementations:
|
|
451
|
+
|
|
452
|
+
**Text Mode:**
|
|
453
|
+
- Markdown: `.md`, `.mdx`
|
|
454
|
+
- Plain text: `.txt`
|
|
455
|
+
- Documents: `.pdf`, `.docx`
|
|
456
|
+
|
|
457
|
+
**Multimodal Mode** (includes all text formats plus):
|
|
458
|
+
- Images: `.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.bmp`
|
|
459
|
+
|
|
460
|
+
All formats work seamlessly with both single file and directory ingestion:
|
|
461
|
+
|
|
462
|
+
```bash
|
|
463
|
+
# Single file ingestion
|
|
464
|
+
raglite ingest ./document.pdf
|
|
465
|
+
raglite ingest ./readme.md
|
|
466
|
+
raglite ingest ./notes.txt
|
|
467
|
+
|
|
468
|
+
# Directory ingestion (processes all supported formats)
|
|
469
|
+
raglite ingest ./docs/
|
|
470
|
+
|
|
471
|
+
# Multimodal ingestion (includes images)
|
|
472
|
+
raglite ingest ./mixed-content/ --mode multimodal
|
|
473
|
+
```
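
The format lists above amount to a simple rule: text formats are accepted in both modes, image formats only in multimodal mode. A minimal sketch of that rule follows, using the extensions listed in this section. The `isSupported` and `extensionOf` helpers are hypothetical and do not reflect how rag-lite-ts detects content types internally:

```typescript
type Mode = "text" | "multimodal";

// Extensions taken from the lists in this section.
const TEXT_EXTENSIONS = [".md", ".mdx", ".txt", ".pdf", ".docx"];
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"];

// Returns the lower-cased extension including the dot, or "" if there is none.
function extensionOf(filePath: string): string {
  const name = filePath.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot).toLowerCase() : "";
}

function isSupported(filePath: string, mode: Mode): boolean {
  const ext = extensionOf(filePath);
  if (TEXT_EXTENSIONS.indexOf(ext) !== -1) return true;
  return mode === "multimodal" && IMAGE_EXTENSIONS.indexOf(ext) !== -1;
}

console.log(isSupported("./docs/readme.md", "text"));
console.log(isSupported("./assets/diagram.png", "text"));
console.log(isSupported("./assets/diagram.png", "multimodal"));
```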
|
|
474
|
+
|
|
475
|
+
## How It Works
|
|
476
|
+
|
|
477
|
+
RAG-lite TS follows a clean, efficient pipeline:
|
|
478
|
+
|
|
479
|
+
```
|
|
480
|
+
Documents → Preprocessing → Chunking → Embedding → Storage
                                                      ↓
Results ← Reranking ← Vector Search ← Query Embedding ← Query
|
|
483
|
+
```
|
|
484
|
+
|
|
485
|
+
### Pipeline Steps
|
|
486
|
+
|
|
487
|
+
| Step | What Happens | Technologies |
|------|--------------|--------------|
| **1. Ingestion** | Reads `.md`, `.txt`, `.pdf`, `.docx`, images | Native parsers |
| **2. Preprocessing** | Cleans JSX, Mermaid, code blocks, generates image descriptions | Custom processors |
| **3. Chunking** | Splits at natural boundaries with token limits | Semantic chunking |
| **4. Embedding** | Converts text/images to vectors | transformers.js |
| **5. Storage** | Indexes vectors, stores metadata | hnswlib + SQLite |
| **6. Search** | Finds similar chunks via cosine similarity | HNSW algorithm |
| **7. Reranking** | Re-scores results for relevance | Cross-encoder/text-derived |
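
To make the search step concrete, here is a small sketch of cosine-similarity ranking over stored vectors. It scans every vector exhaustively, which is what an HNSW index approximates with a graph so that it does not have to visit every entry. The `IndexedVector` shape and the `topK` helper are hypothetical and are not the library's API:

```typescript
interface IndexedVector {
  id: string;
  vector: number[];
}

interface ScoredHit {
  id: string;
  score: number;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive scan: score every entry, sort descending, keep the best k.
function topK(query: number[], index: IndexedVector[], k: number): ScoredHit[] {
  return index
    .map((entry) => ({ id: entry.id, score: cosineSimilarity(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const demoIndex: IndexedVector[] = [
  { id: "chunk-a", vector: [1, 0, 0] },
  { id: "chunk-b", vector: [0, 1, 0] },
  { id: "chunk-c", vector: [1, 1, 0] },
];
const demoTopHits = topK([1, 0, 0], demoIndex, 2);
console.log(demoTopHits.map((hit) => hit.id));
```

In the real pipeline the vectors have 384 to 768 dimensions depending on the model, and reranking then re-scores only the short list returned by this step.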
|
|
496
|
+
|
|
497
|
+
### 🦎 Chameleon Architecture
|
|
498
|
+
|
|
499
|
+
The system **automatically adapts** based on your content:
|
|
500
|
+
|
|
501
|
+
<table>
|
|
502
|
+
<tr>
|
|
503
|
+
<td width="50%">
|
|
504
|
+
|
|
505
|
+
#### Text Mode
|
|
506
|
+
```
|
|
507
|
+
Text Docs → Sentence Transformer
    ↓
384D Vectors
    ↓
HNSW Index + SQLite
    ↓
Cross-Encoder Reranking
|
|
514
|
+
```
|
|
515
|
+
|
|
516
|
+
**Best for:** Documentation, articles, code
|
|
517
|
+
|
|
518
|
+
</td>
|
|
519
|
+
<td width="50%">
|
|
520
|
+
|
|
521
|
+
#### 🖼️ Multimodal Mode
|
|
522
|
+
```
|
|
523
|
+
Text + Images → CLIP Embedder
    ↓
512D Unified Space
    ↓
HNSW Index + SQLite
    ↓
Text-Derived Reranking
|
|
530
|
+
```
|
|
531
|
+
|
|
532
|
+
**Best for:** Mixed content, visual search
|
|
533
|
+
|
|
534
|
+
</td>
|
|
535
|
+
</tr>
|
|
536
|
+
</table>
|
|
537
|
+
|
|
538
|
+
**🎯 Key Benefits:**
|
|
539
|
+
- Set mode **once** during ingestion → Auto-detected during search
|
|
540
|
+
- **Cross-modal search** - Text queries find images, image queries find text
|
|
541
|
+
- **No fallback complexity** - Each mode works reliably or fails clearly
|
|
542
|
+
- **Same API** - Your code doesn't change between modes
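
The "set once, auto-detected" behaviour can be pictured as a small piece of persisted metadata. The sketch below keeps that metadata in memory. The `DemoModeStore` class, the default of text mode when no mode is requested, and the error on a mode mismatch are all assumptions made for illustration. The real system stores the mode in the database, and its exact behaviour may differ:

```typescript
type Mode = "text" | "multimodal";

// Hypothetical in-memory stand-in for mode metadata persisted in the database.
class DemoModeStore {
  private mode: Mode | null = null;

  // Called at ingestion time; the first call fixes the mode.
  ingest(requested?: Mode): Mode {
    if (this.mode === null) {
      this.mode = requested ?? "text"; // assumed default
    } else if (requested !== undefined && requested !== this.mode) {
      // No silent fallback: a conflicting request fails clearly.
      throw new Error(`database was created in ${this.mode} mode; rebuild to switch`);
    }
    return this.mode;
  }

  // Called at search time; no mode argument is needed.
  detectMode(): Mode {
    if (this.mode === null) throw new Error("nothing has been ingested yet");
    return this.mode;
  }
}

const demoStore = new DemoModeStore();
demoStore.ingest("multimodal");
console.log(demoStore.detectMode());
```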
|
|
543
|
+
|
|
544
|
+
→ **[Document Preprocessing Guide](docs/preprocessing.md)** | **[Model Management Details](models/README.md)**
|
|
545
|
+
|
|
546
|
+
## Supported Models
|
|
547
|
+
|
|
548
|
+
Choose the right model for your use case:
|
|
549
|
+
|
|
550
|
+
### Text Mode Models
|
|
551
|
+
|
|
552
|
+
| Model | Dims | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `sentence-transformers/all-MiniLM-L6-v2` ⭐ | 384 | ⚡⚡⚡ | ⭐⭐⭐ | General purpose (default) |
| `Xenova/all-mpnet-base-v2` | 768 | ⚡⚡ | ⭐⭐⭐⭐ | Complex queries, higher accuracy |
|
|
556
|
+
|
|
557
|
+
### 🖼️ Multimodal Models
|
|
558
|
+
|
|
559
|
+
| Model | Dims | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `Xenova/clip-vit-base-patch32` ⭐ | 512 | ⚡⚡ | ⭐⭐⭐ | Text + images (default) |
| `Xenova/clip-vit-base-patch16` | 512 | ⚡ | ⭐⭐⭐⭐ | Higher visual quality |
|
|
563
|
+
|
|
564
|
+
### ✨ Model Features
|
|
565
|
+
|
|
566
|
+
- ✅ **Auto-download** - Models cached locally on first use
- ✅ **Smart compatibility** - Detects model changes, prompts rebuilds
- ✅ **Offline support** - Pre-download for air-gapped environments
- ✅ **Zero config** - Works out of the box with sensible defaults
- ✅ **Cross-modal** - CLIP enables text ↔ image search
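
Why a model change forces a rebuild: vectors from different models are not comparable, even when their dimensions happen to match. The check below is a sketch of that reasoning using the models from the tables above. `checkCompatibility` is a hypothetical helper, not the library's detection logic:

```typescript
interface ModelInfo {
  name: string;
  dimensions: number;
}

type Compatibility = { ok: true } | { ok: false; reason: string };

// Compares the model an index was built with against the model requested now.
function checkCompatibility(indexed: ModelInfo, requested: ModelInfo): Compatibility {
  if (indexed.dimensions !== requested.dimensions) {
    return {
      ok: false,
      reason: `dimension mismatch (${indexed.dimensions} vs ${requested.dimensions}); rebuild the index`,
    };
  }
  if (indexed.name !== requested.name) {
    return {
      ok: false,
      reason: "different model with equal dimensions; vectors are not comparable, rebuild the index",
    };
  }
  return { ok: true };
}

const miniLm: ModelInfo = { name: "sentence-transformers/all-MiniLM-L6-v2", dimensions: 384 };
const mpnet: ModelInfo = { name: "Xenova/all-mpnet-base-v2", dimensions: 768 };
const clipPatch32: ModelInfo = { name: "Xenova/clip-vit-base-patch32", dimensions: 512 };
const clipPatch16: ModelInfo = { name: "Xenova/clip-vit-base-patch16", dimensions: 512 };

console.log(checkCompatibility(miniLm, mpnet));
console.log(checkCompatibility(clipPatch32, clipPatch16));
console.log(checkCompatibility(miniLm, miniLm));
```

The second case is the subtle one: both CLIP variants produce 512-dimensional vectors, so a dimension check alone would not catch the switch.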
|
|
571
|
+
|
|
572
|
+
→ **[Complete Model Guide](docs/model-guide.md)** | **[Performance Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md)**
|
|
573
|
+
|
|
574
|
+
## Documentation
|
|
575
|
+
|
|
576
|
+
<table>
|
|
577
|
+
<tr>
|
|
578
|
+
<td width="33%">
|
|
579
|
+
|
|
580
|
+
### Getting Started
|
|
581
|
+
- [CLI Reference](docs/cli-reference.md)
|
|
582
|
+
- [API Reference](docs/api-reference.md)
|
|
583
|
+
- [Multimodal Tutorial](docs/multimodal-tutorial.md)
|
|
584
|
+
- [Unified Content System](docs/unified-content-system.md)
|
|
585
|
+
|
|
586
|
+
</td>
|
|
587
|
+
<td width="33%">
|
|
588
|
+
|
|
589
|
+
### Advanced
|
|
590
|
+
- [Configuration Guide](docs/configuration.md)
|
|
591
|
+
- [Model Selection](docs/model-guide.md)
|
|
592
|
+
- [Multimodal Config](docs/multimodal-configuration.md)
|
|
593
|
+
- [Path Strategies](docs/path-strategies.md)
|
|
594
|
+
|
|
595
|
+
</td>
|
|
596
|
+
<td width="33%">
|
|
597
|
+
|
|
598
|
+
### 🛠️ Support
|
|
599
|
+
- [Troubleshooting](docs/troubleshooting.md)
|
|
600
|
+
- [Multimodal Issues](docs/multimodal-troubleshooting.md)
|
|
601
|
+
- [Content Issues](docs/unified-content-troubleshooting.md)
|
|
602
|
+
- [Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md)
|
|
603
|
+
|
|
604
|
+
</td>
|
|
605
|
+
</tr>
|
|
606
|
+
</table>
|
|
607
|
+
|
|
608
|
+
### 🎯 Quick Start by Role
|
|
609
|
+
|
|
610
|
+
| I want to... | Start here |
|--------------|------------|
| Try it out | [CLI Reference](docs/cli-reference.md) → `npm i -g rag-lite-ts` |
| Search images | [Multimodal Tutorial](docs/multimodal-tutorial.md) → `--mode multimodal` |
| Build an app | [API Reference](docs/api-reference.md) → `new SearchEngine()` |
| Integrate with AI | [MCP Guide](docs/mcp-server-multimodal-guide.md) → `raglite-mcp` |
| Optimize performance | [Model Guide](docs/model-guide.md) → Choose your model |
| Fix an issue | [Troubleshooting](docs/troubleshooting.md) → Common solutions |
|
|
618
|
+
|
|
619
|
+
**[Complete Documentation Hub](docs/README.md)**
|
|
620
|
+
|
|
621
|
+
## MCP Server Integration
|
|
622
|
+
|
|
623
|
+
**Give your AI agents semantic memory.** RAG-lite TS includes a built-in Model Context Protocol (MCP) server.
|
|
624
|
+
|
|
625
|
+
```bash
|
|
626
|
+
# Start MCP server (works with Claude, Cline, and other MCP clients)
|
|
627
|
+
raglite-mcp
|
|
628
|
+
```
|
|
629
|
+
|
|
630
|
+
### Single Instance Configuration
|
|
631
|
+
|
|
632
|
+
**MCP Configuration:**
|
|
633
|
+
```json
|
|
634
|
+
{
|
|
635
|
+
"mcpServers": {
|
|
636
|
+
"rag-lite": {
|
|
637
|
+
"command": "raglite-mcp",
|
|
638
|
+
"args": []
|
|
639
|
+
}
|
|
640
|
+
}
|
|
641
|
+
}
|
|
642
|
+
```
|
|
643
|
+
|
|
644
|
+
### Multiple Instance Configuration (NEW)
|
|
645
|
+
|
|
646
|
+
Run multiple MCP server instances for different databases with **intelligent routing**:
|
|
647
|
+
|
|
648
|
+
```json
|
|
649
|
+
{
|
|
650
|
+
"mcpServers": {
|
|
651
|
+
"rag-lite-text-docs": {
|
|
652
|
+
"command": "npx",
|
|
653
|
+
"args": ["rag-lite-mcp"],
|
|
654
|
+
"env": {
|
|
655
|
+
"RAG_DB_FILE": "./text-docs/db.sqlite",
|
|
656
|
+
"RAG_INDEX_FILE": "./text-docs/index.bin"
|
|
657
|
+
}
|
|
658
|
+
},
|
|
659
|
+
"rag-lite-multimodal-images": {
|
|
660
|
+
"command": "npx",
|
|
661
|
+
"args": ["rag-lite-mcp"],
|
|
662
|
+
"env": {
|
|
663
|
+
"RAG_DB_FILE": "./mixed-content/db.sqlite",
|
|
664
|
+
"RAG_INDEX_FILE": "./mixed-content/index.bin"
|
|
665
|
+
}
|
|
666
|
+
}
|
|
667
|
+
}
|
|
668
|
+
}
|
|
669
|
+
```
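
When the number of databases grows, a configuration like the one above can be generated rather than written by hand. The sketch below builds the same structure from a list of instance names and directories. `buildMcpConfig` is a hypothetical helper. It uses the `raglite-mcp` command from the single-instance example and assumes each directory holds `db.sqlite` and `index.bin`, as in the JSON above:

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: { RAG_DB_FILE: string; RAG_INDEX_FILE: string };
}

interface McpConfig {
  mcpServers: { [name: string]: McpServerEntry };
}

interface InstanceSpec {
  name: string; // key under "mcpServers"
  dir: string;  // directory containing db.sqlite and index.bin (assumed layout)
}

function buildMcpConfig(instances: InstanceSpec[]): McpConfig {
  const mcpServers: { [name: string]: McpServerEntry } = {};
  for (const instance of instances) {
    if (Object.prototype.hasOwnProperty.call(mcpServers, instance.name)) {
      throw new Error(`duplicate instance name: ${instance.name}`);
    }
    mcpServers[instance.name] = {
      command: "raglite-mcp",
      args: [],
      env: {
        RAG_DB_FILE: `${instance.dir}/db.sqlite`,
        RAG_INDEX_FILE: `${instance.dir}/index.bin`,
      },
    };
  }
  return { mcpServers };
}

const demoConfig = buildMcpConfig([
  { name: "rag-lite-text-docs", dir: "./text-docs" },
  { name: "rag-lite-multimodal-images", dir: "./mixed-content" },
]);
console.log(JSON.stringify(demoConfig, null, 2));
```

Each instance gets its own database and index file, which keeps a text-mode database and a multimodal database from sharing state.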
|
|
670
|
+
|
|
671
|
+
**Dynamic Tool Descriptions:**
|
|
672
|
+
Each server automatically detects and advertises its capabilities:
|
|
673
|
+
- `[TEXT MODE]` - Text-only databases clearly indicate supported file types
|
|
674
|
+
- `[MULTIMODAL MODE]` - Multimodal databases advertise image support and cross-modal search
|
|
675
|
+
- AI assistants can intelligently route queries to the appropriate database
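
As a rough picture of how a client might use those tags, the sketch below picks a server by looking for `[TEXT MODE]` or `[MULTIMODAL MODE]` in each tool description. The `ToolInfo` shape, the sample descriptions, and the `pickServer` helper are hypothetical. Real MCP clients typically leave this choice to the model reading the descriptions:

```typescript
interface ToolInfo {
  server: string;      // name of the MCP server instance
  description: string; // tool description advertised by that server
}

// Returns the first server whose description carries the matching mode tag.
function pickServer(tools: ToolInfo[], wantsImages: boolean): string | null {
  const tag = wantsImages ? "[MULTIMODAL MODE]" : "[TEXT MODE]";
  for (const tool of tools) {
    if (tool.description.indexOf(tag) !== -1) return tool.server;
  }
  return null;
}

// Sample descriptions invented for this example.
const demoTools: ToolInfo[] = [
  { server: "rag-lite-text-docs", description: "[TEXT MODE] Search text documents" },
  { server: "rag-lite-multimodal-images", description: "[MULTIMODAL MODE] Search text and images" },
];

console.log(pickServer(demoTools, true));
console.log(pickServer(demoTools, false));
```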
|
|
676
|
+
|
|
677
|
+
**Available Tools:** `search`, `ingest`, `ingest_image`, `multimodal_search`, `rebuild_index`, `get_stats`, `get_mode_info`, `list_supported_models`, `list_reranking_strategies`, `get_system_stats`
|
|
678
|
+
|
|
679
|
+
**Multimodal Features:**
|
|
680
|
+
- Search across text and image content
|
|
681
|
+
- Retrieve image content as base64 data
|
|
682
|
+
- Cross-modal search capabilities (text queries find images)
|
|
683
|
+
- Automatic mode detection from database
|
|
684
|
+
- Content type filtering
|
|
685
|
+
- Multiple reranking strategies
|
|
686
|
+
|
|
687
|
+
→ **[Complete MCP Integration Guide](docs/cli-reference.md#mcp-server)** | **[MCP Multimodal Guide](docs/mcp-server-multimodal-guide.md)** | **[Multi-Instance Setup](docs/mcp-server-multimodal-guide.md#running-multiple-mcp-server-instances)**
|
|
688
|
+
|
|
689
|
+
---
|
|
690
|
+
|
|
691
|
+
## 🛠️ Development
|
|
692
|
+
|
|
693
|
+
### Building from Source
|
|
694
|
+
|
|
695
|
+
```bash
|
|
696
|
+
# Clone and setup
|
|
697
|
+
git clone https://github.com/your-username/rag-lite-ts.git
|
|
698
|
+
cd rag-lite-ts
|
|
699
|
+
npm install
|
|
700
|
+
|
|
701
|
+
# Build and link for development
|
|
702
|
+
npm run build
|
|
703
|
+
npm link # Makes raglite/raglite-mcp available globally
|
|
704
|
+
|
|
705
|
+
# Run tests
|
|
706
|
+
npm test
|
|
707
|
+
npm run test:integration
|
|
708
|
+
```
|
|
709
|
+
|
|
710
|
+
### Project Structure
|
|
711
|
+
|
|
712
|
+
```
|
|
713
|
+
src/
├── index.ts                  # Main exports and factory functions
├── search.ts                 # Public SearchEngine API
├── ingestion.ts              # Public IngestionPipeline API
├── core/                     # Model-agnostic core layer
│   ├── search.ts             # Core search engine
│   ├── ingestion.ts          # Core ingestion pipeline
│   ├── db.ts                 # SQLite operations
│   ├── config.ts             # Configuration system
│   ├── content-manager.ts    # Content storage and management
│   └── types.ts              # Core type definitions
├── text/                     # Text-specific implementations
│   ├── embedder.ts           # Sentence-transformer embedder
│   ├── reranker.ts           # Cross-encoder reranking
│   └── tokenizer.ts          # Text tokenization
├── multimodal/               # Multimodal implementations
│   ├── embedder.ts           # CLIP embedder (text + images)
│   ├── reranker.ts           # Text-derived reranking for multimodal
│   ├── image-processor.ts    # Image description and metadata
│   └── content-types.ts      # Content type detection
├── cli.ts                    # CLI interface
├── mcp-server.ts             # MCP server
└── preprocessors/            # Content type processors

dist/                         # Compiled output
|
|
738
|
+
```
|
|
739
|
+
|
|
740
|
+
### Design Philosophy
|
|
741
|
+
|
|
742
|
+
**Simple by default, powerful when needed:**
|
|
743
|
+
- ✅ Simple constructors work immediately with sensible defaults
- ✅ Configuration options available when you need customization
- ✅ Advanced patterns available for complex use cases
- ✅ Clean architecture with minimal dependencies
- ✅ No ORMs or heavy frameworks - just TypeScript and SQLite
- ✅ Extensible design for future capabilities
|
|
749
|
+
|
|
750
|
+
This approach ensures that basic usage is effortless while providing the flexibility needed for advanced scenarios.
|
|
751
|
+
|
|
752
|
+
---
|
|
753
|
+
|
|
754
|
+
## 🤝 Contributing
|
|
755
|
+
|
|
756
|
+
We welcome contributions! Whether it's:
|
|
757
|
+
|
|
758
|
+
- Bug fixes
- New features
- Documentation improvements
- Test coverage
- Ideas and suggestions
|
|
763
|
+
|
|
764
|
+
**Guidelines:**
|
|
765
|
+
1. Fork the repository
|
|
766
|
+
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
|
767
|
+
3. Make your changes with tests
|
|
768
|
+
4. Ensure all tests pass (`npm test`)
|
|
769
|
+
5. Submit a pull request
|
|
770
|
+
|
|
771
|
+
We maintain clean architecture principles while enhancing functionality and developer experience.
|
|
772
|
+
|
|
773
|
+
---
|
|
774
|
+
|
|
775
|
+
## 🎯 Why We Built This
|
|
776
|
+
|
|
777
|
+
Existing RAG solutions are either:
|
|
778
|
+
- 🔴 **Too complex** - Require extensive setup and configuration
- 🔴 **Cloud-dependent** - Need API keys and external services
- 🔴 **Python-only** - Not ideal for TypeScript/Node.js projects
- 🔴 **Heavy** - Massive dependencies and slow startup
|
|
782
|
+
|
|
783
|
+
**RAG-lite TS is different:**
|
|
784
|
+
- ✅ **Simple** - Works out of the box with zero config
- ✅ **Local-first** - Your data stays on your machine
- ✅ **TypeScript native** - Built for modern JS/TS projects
- ✅ **Lightweight** - Fast startup, minimal dependencies
|
|
788
|
+
|
|
789
|
+
---
|
|
790
|
+
|
|
791
|
+
## Acknowledgments
|
|
792
|
+
|
|
793
|
+
Built with amazing open-source projects:
|
|
794
|
+
|
|
795
|
+
- **[transformers.js](https://github.com/xenova/transformers.js)** - Client-side ML models by Xenova
|
|
796
|
+
- **[hnswlib](https://github.com/nmslib/hnswlib)** - Fast approximate nearest neighbor search
|
|
797
|
+
- **[better-sqlite3](https://github.com/WiseLibs/better-sqlite3)** - Fast SQLite3 bindings
|
|
798
|
+
|
|
799
|
+
---
|
|
800
|
+
|
|
801
|
+
## License
|
|
802
|
+
|
|
803
|
+
MIT License - see [LICENSE](LICENSE) file for details.
|
|
804
|
+
|
|
805
|
+
---
|
|
806
|
+
|
|
807
|
+
<div align="center">
|
|
808
|
+
|
|
809
|
+
**⭐ Star us on GitHub, it helps!**
|
|
810
|
+
|
|
811
|
+
[Report Bug](https://github.com/your-username/rag-lite-ts/issues) • [Request Feature](https://github.com/your-username/rag-lite-ts/issues) • [Documentation](docs/README.md)
|
|
812
|
+
|
|
813
|
+
Made with ❤️ by developers, for developers
|
|
814
|
+
|
|
815
|
+
</div>
|