rag-lite-ts 1.0.1 → 2.0.0
- package/README.md +651 -109
- package/dist/cli/indexer.js +262 -46
- package/dist/cli/search.js +54 -32
- package/dist/cli.js +185 -28
- package/dist/config.d.ts +34 -73
- package/dist/config.js +50 -255
- package/dist/core/abstract-embedder.d.ts +125 -0
- package/dist/core/abstract-embedder.js +264 -0
- package/dist/core/actionable-error-messages.d.ts +60 -0
- package/dist/core/actionable-error-messages.js +397 -0
- package/dist/core/adapters.d.ts +93 -0
- package/dist/core/adapters.js +139 -0
- package/dist/core/batch-processing-optimizer.d.ts +155 -0
- package/dist/core/batch-processing-optimizer.js +541 -0
- package/dist/core/chunker.d.ts +119 -0
- package/dist/core/chunker.js +73 -0
- package/dist/core/cli-database-utils.d.ts +53 -0
- package/dist/core/cli-database-utils.js +239 -0
- package/dist/core/config.d.ts +102 -0
- package/dist/core/config.js +247 -0
- package/dist/core/content-errors.d.ts +111 -0
- package/dist/core/content-errors.js +362 -0
- package/dist/core/content-manager.d.ts +343 -0
- package/dist/core/content-manager.js +1504 -0
- package/dist/core/content-performance-optimizer.d.ts +150 -0
- package/dist/core/content-performance-optimizer.js +516 -0
- package/dist/core/content-resolver.d.ts +104 -0
- package/dist/core/content-resolver.js +285 -0
- package/dist/core/cross-modal-search.d.ts +164 -0
- package/dist/core/cross-modal-search.js +342 -0
- package/dist/core/database-connection-manager.d.ts +109 -0
- package/dist/core/database-connection-manager.js +304 -0
- package/dist/core/db.d.ts +245 -0
- package/dist/core/db.js +952 -0
- package/dist/core/embedder-factory.d.ts +176 -0
- package/dist/core/embedder-factory.js +338 -0
- package/dist/{error-handler.d.ts → core/error-handler.d.ts} +23 -2
- package/dist/{error-handler.js → core/error-handler.js} +51 -8
- package/dist/core/index.d.ts +59 -0
- package/dist/core/index.js +69 -0
- package/dist/core/ingestion.d.ts +213 -0
- package/dist/core/ingestion.js +812 -0
- package/dist/core/interfaces.d.ts +408 -0
- package/dist/core/interfaces.js +106 -0
- package/dist/core/lazy-dependency-loader.d.ts +152 -0
- package/dist/core/lazy-dependency-loader.js +453 -0
- package/dist/core/mode-detection-service.d.ts +150 -0
- package/dist/core/mode-detection-service.js +565 -0
- package/dist/core/mode-model-validator.d.ts +92 -0
- package/dist/core/mode-model-validator.js +203 -0
- package/dist/core/model-registry.d.ts +120 -0
- package/dist/core/model-registry.js +415 -0
- package/dist/core/model-validator.d.ts +217 -0
- package/dist/core/model-validator.js +782 -0
- package/dist/{path-manager.d.ts → core/path-manager.d.ts} +5 -0
- package/dist/{path-manager.js → core/path-manager.js} +5 -0
- package/dist/core/polymorphic-search-factory.d.ts +154 -0
- package/dist/core/polymorphic-search-factory.js +344 -0
- package/dist/core/raglite-paths.d.ts +121 -0
- package/dist/core/raglite-paths.js +145 -0
- package/dist/core/reranking-config.d.ts +42 -0
- package/dist/core/reranking-config.js +156 -0
- package/dist/core/reranking-factory.d.ts +92 -0
- package/dist/core/reranking-factory.js +591 -0
- package/dist/core/reranking-strategies.d.ts +325 -0
- package/dist/core/reranking-strategies.js +720 -0
- package/dist/core/resource-cleanup.d.ts +163 -0
- package/dist/core/resource-cleanup.js +371 -0
- package/dist/core/resource-manager.d.ts +212 -0
- package/dist/core/resource-manager.js +564 -0
- package/dist/core/search-pipeline.d.ts +111 -0
- package/dist/core/search-pipeline.js +287 -0
- package/dist/core/search.d.ts +131 -0
- package/dist/core/search.js +296 -0
- package/dist/core/streaming-operations.d.ts +145 -0
- package/dist/core/streaming-operations.js +409 -0
- package/dist/core/types.d.ts +66 -0
- package/dist/core/types.js +6 -0
- package/dist/core/universal-embedder.d.ts +177 -0
- package/dist/core/universal-embedder.js +139 -0
- package/dist/core/validation-messages.d.ts +99 -0
- package/dist/core/validation-messages.js +334 -0
- package/dist/{vector-index.d.ts → core/vector-index.d.ts} +4 -0
- package/dist/{vector-index.js → core/vector-index.js} +21 -3
- package/dist/dom-polyfills.d.ts +6 -0
- package/dist/dom-polyfills.js +40 -0
- package/dist/factories/index.d.ts +43 -0
- package/dist/factories/index.js +44 -0
- package/dist/factories/text-factory.d.ts +560 -0
- package/dist/factories/text-factory.js +968 -0
- package/dist/file-processor.d.ts +90 -4
- package/dist/file-processor.js +723 -20
- package/dist/index-manager.d.ts +3 -2
- package/dist/index-manager.js +13 -11
- package/dist/index.d.ts +72 -8
- package/dist/index.js +102 -16
- package/dist/indexer.js +1 -1
- package/dist/ingestion.d.ts +44 -154
- package/dist/ingestion.js +75 -671
- package/dist/mcp-server.d.ts +35 -3
- package/dist/mcp-server.js +1186 -79
- package/dist/multimodal/clip-embedder.d.ts +314 -0
- package/dist/multimodal/clip-embedder.js +945 -0
- package/dist/multimodal/index.d.ts +6 -0
- package/dist/multimodal/index.js +6 -0
- package/dist/preprocess.js +1 -1
- package/dist/run-error-recovery-tests.d.ts +7 -0
- package/dist/run-error-recovery-tests.js +101 -0
- package/dist/search-standalone.js +1 -1
- package/dist/search.d.ts +51 -69
- package/dist/search.js +117 -412
- package/dist/test-utils.d.ts +8 -26
- package/dist/text/chunker.d.ts +33 -0
- package/dist/{chunker.js → text/chunker.js} +98 -75
- package/dist/{embedder.d.ts → text/embedder.d.ts} +22 -1
- package/dist/{embedder.js → text/embedder.js} +84 -10
- package/dist/text/index.d.ts +8 -0
- package/dist/text/index.js +9 -0
- package/dist/text/preprocessors/index.d.ts +17 -0
- package/dist/text/preprocessors/index.js +38 -0
- package/dist/text/preprocessors/mdx.d.ts +25 -0
- package/dist/text/preprocessors/mdx.js +101 -0
- package/dist/text/preprocessors/mermaid.d.ts +68 -0
- package/dist/text/preprocessors/mermaid.js +330 -0
- package/dist/text/preprocessors/registry.d.ts +56 -0
- package/dist/text/preprocessors/registry.js +180 -0
- package/dist/text/reranker.d.ts +59 -0
- package/dist/{reranker.js → text/reranker.js} +138 -53
- package/dist/text/sentence-transformer-embedder.d.ts +96 -0
- package/dist/text/sentence-transformer-embedder.js +340 -0
- package/dist/{tokenizer.d.ts → text/tokenizer.d.ts} +1 -0
- package/dist/{tokenizer.js → text/tokenizer.js} +7 -2
- package/dist/types.d.ts +40 -1
- package/dist/utils/vector-math.d.ts +31 -0
- package/dist/utils/vector-math.js +70 -0
- package/package.json +16 -4
- package/dist/api-errors.d.ts.map +0 -1
- package/dist/api-errors.js.map +0 -1
- package/dist/chunker.d.ts +0 -47
- package/dist/chunker.d.ts.map +0 -1
- package/dist/chunker.js.map +0 -1
- package/dist/cli/indexer.d.ts.map +0 -1
- package/dist/cli/indexer.js.map +0 -1
- package/dist/cli/search.d.ts.map +0 -1
- package/dist/cli/search.js.map +0 -1
- package/dist/cli.d.ts.map +0 -1
- package/dist/cli.js.map +0 -1
- package/dist/config.d.ts.map +0 -1
- package/dist/config.js.map +0 -1
- package/dist/db.d.ts +0 -90
- package/dist/db.d.ts.map +0 -1
- package/dist/db.js +0 -340
- package/dist/db.js.map +0 -1
- package/dist/embedder.d.ts.map +0 -1
- package/dist/embedder.js.map +0 -1
- package/dist/error-handler.d.ts.map +0 -1
- package/dist/error-handler.js.map +0 -1
- package/dist/file-processor.d.ts.map +0 -1
- package/dist/file-processor.js.map +0 -1
- package/dist/index-manager.d.ts.map +0 -1
- package/dist/index-manager.js.map +0 -1
- package/dist/index.d.ts.map +0 -1
- package/dist/index.js.map +0 -1
- package/dist/indexer.d.ts.map +0 -1
- package/dist/indexer.js.map +0 -1
- package/dist/ingestion.d.ts.map +0 -1
- package/dist/ingestion.js.map +0 -1
- package/dist/mcp-server.d.ts.map +0 -1
- package/dist/mcp-server.js.map +0 -1
- package/dist/path-manager.d.ts.map +0 -1
- package/dist/path-manager.js.map +0 -1
- package/dist/preprocess.d.ts.map +0 -1
- package/dist/preprocess.js.map +0 -1
- package/dist/preprocessors/index.d.ts.map +0 -1
- package/dist/preprocessors/index.js.map +0 -1
- package/dist/preprocessors/mdx.d.ts.map +0 -1
- package/dist/preprocessors/mdx.js.map +0 -1
- package/dist/preprocessors/mermaid.d.ts.map +0 -1
- package/dist/preprocessors/mermaid.js.map +0 -1
- package/dist/preprocessors/registry.d.ts.map +0 -1
- package/dist/preprocessors/registry.js.map +0 -1
- package/dist/reranker.d.ts +0 -40
- package/dist/reranker.d.ts.map +0 -1
- package/dist/reranker.js.map +0 -1
- package/dist/resource-manager-demo.d.ts +0 -7
- package/dist/resource-manager-demo.d.ts.map +0 -1
- package/dist/resource-manager-demo.js +0 -52
- package/dist/resource-manager-demo.js.map +0 -1
- package/dist/resource-manager.d.ts +0 -129
- package/dist/resource-manager.d.ts.map +0 -1
- package/dist/resource-manager.js +0 -389
- package/dist/resource-manager.js.map +0 -1
- package/dist/search-standalone.d.ts.map +0 -1
- package/dist/search-standalone.js.map +0 -1
- package/dist/search.d.ts.map +0 -1
- package/dist/search.js.map +0 -1
- package/dist/test-utils.d.ts.map +0 -1
- package/dist/test-utils.js.map +0 -1
- package/dist/tokenizer.d.ts.map +0 -1
- package/dist/tokenizer.js.map +0 -1
- package/dist/types.d.ts.map +0 -1
- package/dist/types.js.map +0 -1
- package/dist/vector-index.d.ts.map +0 -1
- package/dist/vector-index.js.map +0 -1
package/README.md
CHANGED
@@ -1,33 +1,112 @@
-
-*A local-first, TypeScript-friendly retrieval engine*
+<div align="center">
 
-
+# 🦎 RAG-lite TS
 
-
+### *Simple by default, powerful when needed*
 
-
+**Local-first semantic search that actually works**
 
-
-
--
--
-- 🛠️ **Hackable**: Clear module boundaries and minimal dependencies
-- 📦 **Dual Interface**: CLI + MCP server entry points in one package
-- 🎯 **TypeScript**: Full type safety with ESM-only architecture
-- 🧠 **Multi-Model**: Support for multiple embedding models with automatic compatibility checking
+[](https://www.npmjs.com/package/rag-lite-ts)
+[](https://opensource.org/licenses/MIT)
+[](https://www.typescriptlang.org/)
+[](https://nodejs.org/)
 
-
+[Quick Start](#quick-start) • [Features](#features) • [Documentation](#documentation) • [Examples](#examples) • [MCP Integration](#mcp-server-integration)
 
-
-- [How It Works](#how-it-works)
-- [Supported Models](#supported-models)
-- [MCP Server Integration](#mcp-server-integration)
-- [Documentation](#documentation)
-- [Development](#development)
-- [Contributing](#contributing)
-- [License](#license)
+</div>
 
-
+---
+
+## 🎯 Why RAG-lite TS?
+
+**Stop fighting with complex RAG frameworks.** Get semantic search running in 30 seconds:
+
+```bash
+npm install -g rag-lite-ts
+raglite ingest ./docs/
+raglite search "your query here"
+```
+
+**That's it.** No API keys, no cloud services, no configuration hell.
+
+### 🎬 See It In Action
+
+```typescript
+// 1. Ingest your docs
+const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
+await pipeline.ingestDirectory('./docs/');
+
+// 2. Search semantically
+const search = new SearchEngine('./index.bin', './db.sqlite');
+const results = await search.search('authentication flow');
+
+// 3. Get relevant results instantly
+console.log(results[0].text);
+// "To authenticate users, first obtain a JWT token from the /auth endpoint..."
+```
+
+**Real semantic understanding** - not just keyword matching. Finds "JWT token" when you search for "authentication flow".
+
+### What Makes It Different?
+
+- 🏠 **100% Local** - Your data never leaves your machine
+- 🚀 **Actually Fast** - Sub-100ms queries, not "eventually consistent"
+- 🦎 **Chameleon Architecture** - Automatically adapts between text and multimodal modes
+- 🖼️ **True Multimodal** - Search images with text, text with images (CLIP unified space)
+- 📦 **Zero Runtime Dependencies** - No Python, no Docker, no external services
+- 🎯 **TypeScript Native** - Full type safety, modern ESM architecture
+- 🔌 **MCP Ready** - Built-in Model Context Protocol server for AI agents
+
+
+
+---
+
+## 🎉 What's New in 2.0
+
+**Chameleon Multimodal Architecture** - RAG-lite TS now seamlessly adapts between text-only and multimodal search:
+
+### 🖼️ Multimodal Search
+- **CLIP Integration** - Unified 512D embedding space for text and images
+- **Cross-Modal Search** - Find images with text queries, text with image queries
+- **Image-to-Text Generation** - Automatic descriptions using vision-language models
+- **Smart Reranking** - Text-derived, metadata-based, and hybrid strategies
+
+### 🏗️ Architecture Improvements
+- **Layered Architecture** - Clean separation: core (model-agnostic) → implementation (text/multimodal) → public API
+- **Mode Persistence** - Configuration stored in database, auto-detected during search
+- **Unified Content System** - Memory-based ingestion for AI agents, format-adaptive retrieval
+- **Simplified APIs** - `createEmbedder()` and `createReranker()` replace complex factory patterns
+
+### 🤖 MCP Server Enhancements
+- **Multimodal Tools** - `multimodal_search`, `ingest_image` with URL download
+- **Base64 Image Delivery** - Automatic encoding for AI agent integration
+- **Content-Type Filtering** - Filter results by text, image, pdf, docx
+- **Dynamic Tool Descriptions** - Context-aware tool documentation
+
+### 📦 Migration from 1.x
+Existing databases need schema updates for multimodal support. Two options:
+1. **Automatic Migration**: Use `migrateToRagLiteStructure()` function
+2. **Fresh Start**: Re-ingest content with v2.0.0
+
+See [CHANGELOG.md](CHANGELOG.md) for complete details.
+
+---
+
+## 📋 Table of Contents
+
+- [Why RAG-lite TS?](#-why-rag-lite-ts)
+- [Quick Start](#-quick-start)
+- [Features](#-features)
+- [Real-World Examples](#-real-world-examples)
+- [How It Works](#-how-it-works)
+- [Supported Models](#-supported-models)
+- [Documentation](#-documentation)
+- [MCP Server Integration](#-mcp-server-integration)
+- [Development](#-development)
+- [Contributing](#-contributing)
+- [License](#-license)
+
+## 🚀 Quick Start
 
 ### Installation
 
@@ -58,75 +137,465 @@ raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2 --rebuild-if-needed
 raglite search "complex query"
 ```
 
+### Content Retrieval and MCP Integration
+
+```typescript
+import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
+
+// Memory-based ingestion for AI agents
+const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
+const content = Buffer.from('Document from AI agent');
+await pipeline.ingestFromMemory(content, {
+  displayName: 'agent-document.txt'
+});
+
+// Format-adaptive content retrieval
+const search = new SearchEngine('./index.bin', './db.sqlite');
+const results = await search.search('query');
+
+// Get file path for CLI clients
+const filePath = await search.getContent(results[0].contentId, 'file');
+
+// Get base64 content for MCP clients
+const base64 = await search.getContent(results[0].contentId, 'base64');
+```
+
+### Multimodal Search (Text + Images)
+
+RAG-lite TS now supports true multimodal search using CLIP's unified embedding space, enabling cross-modal search between text and images:
+
+```bash
+# Enable multimodal processing for text and image content
+raglite ingest ./docs/ --mode multimodal
+
+# Cross-modal search: Find images using text queries
+raglite search "architecture diagram" --content-type image
+raglite search "red sports car" --content-type image
+
+# Find text documents about visual concepts
+raglite search "user interface design" --content-type text
+
+# Search across both content types (default)
+raglite search "system overview"
+
+# Use different reranking strategies for optimal results
+raglite ingest ./docs/ --mode multimodal --rerank-strategy text-derived
+```
+
+**Key Features:**
+- **Unified embedding space**: Text and images embedded in the same 512-dimensional CLIP space
+- **Cross-modal search**: Text queries find semantically similar images
+- **Automatic mode detection**: Set mode once during ingestion, automatically detected during search
+- **Multiple reranking strategies**: text-derived, metadata, hybrid, or disabled
+- **Seamless experience**: Same CLI commands work for both text-only and multimodal content
+
+→ **[Complete Multimodal Tutorial](docs/multimodal-tutorial.md)**
+
 ### Programmatic Usage
 
 ```typescript
-import { SearchEngine, IngestionPipeline
+import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
+
+// Text-only mode (default)
+const ingestion = new IngestionPipeline('./db.sqlite', './vector-index.bin');
+await ingestion.ingestDirectory('./docs/');
+
+// Multimodal mode (text + images)
+const multimodalIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
+  mode: 'multimodal',
+  embeddingModel: 'Xenova/clip-vit-base-patch32',
+  rerankingStrategy: 'text-derived'
+});
+await multimodalIngestion.ingestDirectory('./mixed-content/');
+
+// Search (mode auto-detected from database)
+const search = new SearchEngine('./vector-index.bin', './db.sqlite');
+const results = await search.search('machine learning', { top_k: 10 });
+
+// Cross-modal search in multimodal mode
+const imageResults = results.filter(r => r.contentType === 'image');
+const textResults = results.filter(r => r.contentType === 'text');
+```
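
The two `filter` calls in the added README example walk the result list twice. A minimal sketch of a single-pass alternative follows. The `ResultLike` shape and the `groupByContentType` helper are hypothetical, introduced only for illustration; the only detail taken from the README is that each result carries a `contentType` of `'text'` or `'image'`.

```typescript
// Hypothetical minimal result shape; the real SearchResult type may differ.
type ContentKind = 'text' | 'image';

interface ResultLike {
  contentType: ContentKind;
  score: number;
  text: string;
}

// Partition results by content type in one pass, preserving ranking order.
function groupByContentType(results: ResultLike[]): Record<ContentKind, ResultLike[]> {
  const groups: Record<ContentKind, ResultLike[]> = { text: [], image: [] };
  for (const r of results) {
    groups[r.contentType].push(r);
  }
  return groups;
}

// Toy data standing in for the output of a search call.
const toyResults: ResultLike[] = [
  { contentType: 'text', score: 0.91, text: 'Architecture overview' },
  { contentType: 'image', score: 0.87, text: 'diagram.png' },
  { contentType: 'text', score: 0.64, text: 'Deployment notes' },
];
const grouped = groupByContentType(toyResults);
console.log(grouped.image.length, grouped.text.length);
```

Because the input order is preserved, each group stays sorted by relevance if the original list was.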
 
-
-const embedder = await initializeEmbeddingEngine();
-const pipeline = new IngestionPipeline('./data/', embedder);
-await pipeline.ingestDirectory('./docs/');
+### Memory Ingestion & Unified Content System (NEW)
 
-
-
-const
+```typescript
+// Ingest content directly from memory (perfect for MCP integration)
+const content = Buffer.from('# AI Guide\n\nComprehensive AI concepts...');
+const contentId = await ingestion.ingestFromMemory(content, {
+  displayName: 'AI Guide.md',
+  contentType: 'text/markdown'
+});
+
+// Retrieve content in different formats based on client needs
+const filePath = await search.getContent(contentId, 'file'); // For CLI clients
+const base64Data = await search.getContent(contentId, 'base64'); // For MCP clients
+
+// Batch content retrieval for efficiency
+const contentIds = ['id1', 'id2', 'id3'];
+const contents = await search.getContentBatch(contentIds, 'base64');
+
+// Content management with deduplication
+const stats = await ingestion.getStorageStats();
+console.log(`Content directory: ${stats.contentDirSize} bytes, ${stats.fileCount} files`);
+
+// Cleanup orphaned content
+const cleanupResult = await ingestion.cleanup();
+console.log(`Removed ${cleanupResult.removedFiles} orphaned files`);
+```
+
+#### Configuration Options
+
+```typescript
+import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
+
+// Custom model configuration
+const search = new SearchEngine('./vector-index.bin', './db.sqlite', {
+  embeddingModel: 'Xenova/all-mpnet-base-v2',
+  enableReranking: true,
+  topK: 15
+});
+
+// Ingestion with custom settings
+const ingestion = new IngestionPipeline('./db.sqlite', './vector-index.bin', {
+  embeddingModel: 'Xenova/all-mpnet-base-v2',
+  chunkSize: 400,
+  chunkOverlap: 80
+});
 ```
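
To make the `chunkSize` and `chunkOverlap` options above more concrete, here is a rough sketch of how a size/overlap pair turns a document into overlapping windows. It is not the package's chunker: the README describes splitting at natural boundaries with token limits, whereas this hypothetical `chunkWithOverlap` helper uses a plain fixed-size sliding window and treats whitespace-separated words as stand-ins for tokens.

```typescript
// Hypothetical helper, not part of rag-lite-ts. Words stand in for tokens,
// and the window is fixed-size rather than boundary-aware.
function chunkWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error('expected 0 <= chunkOverlap < chunkSize');
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Ten words, windows of 4 with 1 word shared between neighbours.
const demoChunks = chunkWithOverlap('w0 w1 w2 w3 w4 w5 w6 w7 w8 w9', 4, 1);
console.log(demoChunks);
// → [ 'w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9' ]
```

With the values from the config example (400 and 80), each window would advance by 320 units and share 80 with its predecessor, under the same simplifying assumptions.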
 
 → **[Complete CLI Reference](docs/cli-reference.md)** | **[API Documentation](docs/api-reference.md)**
 
-
+---
+
+## 💡 Real-World Examples
+
+<details>
+<summary><b>🔍 Build a Documentation Search Engine</b></summary>
+
+```typescript
+import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
+
+// Ingest your docs once
+const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
+await pipeline.ingestDirectory('./docs/');
+
+// Search instantly
+const search = new SearchEngine('./index.bin', './db.sqlite');
+const results = await search.search('authentication flow');
+
+results.forEach(r => {
+  console.log(`${r.metadata.title}: ${r.text}`);
+  console.log(`Relevance: ${r.score.toFixed(3)}\n`);
+});
+```
+
+**Use case:** Internal documentation, API references, knowledge bases
+
+</details>
+
+<details>
+<summary><b>🖼️ Search Images with Natural Language</b></summary>
+
+```bash
+# Ingest mixed content (text + images)
+raglite ingest ./assets/ --mode multimodal
+
+# Find images using text descriptions
+raglite search "architecture diagram" --content-type image
+raglite search "team photo" --content-type image
+raglite search "product screenshot" --content-type image
+```
+
+**Use case:** Digital asset management, photo libraries, design systems
+
+</details>
+
+<details>
+<summary><b>🤖 AI Agent with Memory</b></summary>
+
+```typescript
+// Agent ingests conversation context
+const content = Buffer.from('User prefers dark mode. Uses TypeScript.');
+await pipeline.ingestFromMemory(content, {
+  displayName: 'user-preferences.txt'
+});
+
+// Later, agent retrieves relevant context
+const context = await search.search('user interface preferences');
+// Agent now knows: "User prefers dark mode"
+```
+
+**Use case:** Chatbots, AI assistants, context-aware agents
+
+</details>
+
+<details>
+<summary><b>📊 Semantic Code Search</b></summary>
+
+```typescript
+// Index your codebase
+await pipeline.ingestDirectory('./src/', {
+  chunkSize: 500, // Larger chunks for code
+  chunkOverlap: 100
+});
+
+// Find code by intent, not keywords
+const results = await search.search('authentication middleware');
+// Finds relevant code even if it doesn't contain those exact words
+```
+
+**Use case:** Code navigation, refactoring, onboarding
+
+</details>
+
+<details>
+<summary><b>🔌 MCP Server for Claude/AI Tools</b></summary>
+
+```json
+{
+  "mcpServers": {
+    "my-docs": {
+      "command": "raglite-mcp",
+      "env": {
+        "RAG_DB_FILE": "./docs/db.sqlite",
+        "RAG_INDEX_FILE": "./docs/index.bin"
+      }
+    }
+  }
+}
+```
+
+Now Claude can search your docs directly! Works with any MCP-compatible AI tool.
+
+**Use case:** AI-powered documentation, intelligent assistants
+
+</details>
+
+---
+
+## ✨ Features
+
+<table>
+<tr>
+<td width="50%">
+
+### 🎯 Developer Experience
+- **One-line setup** - `new SearchEngine()` just works
+- **TypeScript native** - Full type safety
+- **Zero config** - Sensible defaults everywhere
+- **Hackable** - Clean architecture, easy to extend
+
+</td>
+<td width="50%">
 
-
+### 🚀 Performance
+- **Sub-100ms queries** - Fast vector search
+- **Offline-first** - No network calls
+- **Efficient chunking** - Smart semantic boundaries
+- **Optimized models** - Multiple quality/speed options
+
+</td>
+</tr>
+<tr>
+<td width="50%">
+
+### 🦎 Chameleon Architecture
+- **Auto-adapting** - Text or multimodal mode
+- **Mode persistence** - Set once, auto-detected
+- **No fallbacks** - Reliable or clear failure
+- **Polymorphic runtime** - Same API, different modes
+
+</td>
+<td width="50%">
+
+### 🖼️ Multimodal Search
+- **CLIP unified space** - Text and images together
+- **Cross-modal queries** - Text finds images, vice versa
+- **Multiple strategies** - Text-derived, metadata, hybrid
+- **Seamless experience** - Same commands, more power
+
+</td>
+</tr>
+<tr>
+<td width="50%">
+
+### 🔌 Integration Ready
+- **MCP server included** - AI agent integration
+- **Memory ingestion** - Direct buffer processing
+- **Format-adaptive** - File paths or base64 data
+- **Multi-instance** - Run multiple databases
+
+</td>
+<td width="50%">
+
+### 🛠️ Production Ready
+- **Content management** - Deduplication, cleanup
+- **Model compatibility** - Auto-detection, rebuilds
+- **Error recovery** - Clear messages, helpful hints
+- **Battle-tested** - Used in real applications
+
+</td>
+</tr>
+</table>
+
+## 🔧 How It Works
+
+RAG-lite TS follows a clean, efficient pipeline:
+
+```
+📄 Documents → 🧹 Preprocessing → ✂️ Chunking → 🧠 Embedding → 💾 Storage
+↓
+🎯 Results ← 🔄 Reranking ← 🔍 Vector Search ← 🧠 Query Embedding ← ❓ Query
+```
+
+### Pipeline Steps
+
+| Step | What Happens | Technologies |
+|------|--------------|--------------|
+| **1. Ingestion** | Reads `.md`, `.txt`, `.pdf`, `.docx`, images | Native parsers |
+| **2. Preprocessing** | Cleans JSX, Mermaid, code blocks, generates image descriptions | Custom processors |
+| **3. Chunking** | Splits at natural boundaries with token limits | Semantic chunking |
+| **4. Embedding** | Converts text/images to vectors | transformers.js |
+| **5. Storage** | Indexes vectors, stores metadata | hnswlib + SQLite |
+| **6. Search** | Finds similar chunks via cosine similarity | HNSW algorithm |
+| **7. Reranking** | Re-scores results for relevance | Cross-encoder/metadata |
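
Step 6 in the table ranks chunks by cosine similarity between the query vector and each stored vector. The sketch below shows that scoring with an exhaustive scan over a toy index. It is an illustration only: the package is described as using an HNSW index for this step, which approximates the same ranking without comparing against every vector. The `IndexedChunk` shape, the `topK` helper, and the 3-dimensional vectors are all made up for the example.

```typescript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory index entry; real embeddings have 384 to 768 dimensions.
interface IndexedChunk {
  id: string;
  vector: number[];
}

// Brute-force top-k: score every chunk, sort descending, keep the first k.
function topK(query: number[], index: IndexedChunk[], k: number): { id: string; score: number }[] {
  return index
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const toyIndex: IndexedChunk[] = [
  { id: 'auth-guide', vector: [1, 0, 0] },
  { id: 'billing-faq', vector: [0, 1, 0] },
  { id: 'jwt-notes', vector: [0.9, 0.1, 0] },
];
const nearest = topK([1, 0, 0], toyIndex, 2);
console.log(nearest.map((hit) => hit.id));
// → [ 'auth-guide', 'jwt-notes' ]
```

The exhaustive scan costs one comparison per stored chunk, which is the cost an approximate index such as HNSW is meant to avoid on larger collections.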
|
|
463
|
+
|
|
464
|
+
### 🦎 Chameleon Architecture
|
|
465
|
+
|
|
466
|
+
The system **automatically adapts** based on your content:
|
|
467
|
+
|
|
468
|
+
<table>
|
|
469
|
+
<tr>
|
|
470
|
+
<td width="50%">
|
|
471
|
+
|
|
472
|
+
#### 📝 Text Mode
|
|
473
|
+
```
|
|
474
|
+
Text Docs → Sentence Transformer
|
|
475
|
+
↓
|
|
476
|
+
384D Vectors
|
|
477
|
+
↓
|
|
478
|
+
HNSW Index + SQLite
|
|
479
|
+
↓
|
|
480
|
+
Cross-Encoder Reranking
|
|
481
|
+
```
|
|
81
482
|
|
|
82
|
-
|
|
83
|
-
2. **Preprocessing**: Cleans content (JSX components, Mermaid diagrams, code blocks)
|
|
84
|
-
3. **Semantic Chunking**: Splits documents at natural boundaries with token limits
|
|
85
|
-
4. **Embedding Generation**: Uses transformers.js models for semantic vectors
|
|
86
|
-
5. **Vector Storage**: Fast similarity search with hnswlib-wasm
|
|
87
|
-
6. **Metadata Storage**: SQLite for document info and model compatibility
|
|
88
|
-
7. **Search**: Embeds queries and finds similar chunks using cosine similarity
|
|
89
|
-
8. **Reranking** (optional): Cross-encoder models for improved relevance
|
|
483
|
+
**Best for:** Documentation, articles, code
|
|
90
484
|
|
|
91
|
-
|
|
485
|
+
</td>
|
|
486
|
+
<td width="50%">
|
|
92
487
|
|
|
488
|
+
#### 🖼️ Multimodal Mode
|
|
93
489
|
```
|
|
94
|
-
|
|
95
|
-
|
|
96
|
-
|
|
490
|
+
Text + Images → CLIP Embedder
|
|
491
|
+
↓
|
|
492
|
+
512D Unified Space
|
|
493
|
+
↓
|
|
494
|
+
HNSW Index + SQLite
|
|
495
|
+
↓
|
|
496
|
+
Text-Derived Reranking
|
|
97
497
|
```
|
|
98
498
|
|
|
499
|
+
**Best for:** Mixed content, visual search
|
|
500
|
+
|
|
501
|
+
</td>
|
|
502
|
+
</tr>
|
|
503
|
+
</table>
|
|
504
|
+
|
|
505
|
+
**🎯 Key Benefits:**
|
|
506
|
+
- Set mode **once** during ingestion → Auto-detected during search
|
|
507
|
+
- **Cross-modal search** - Text queries find images, image queries find text
|
|
508
|
+
- **No fallback complexity** - Each mode works reliably or fails clearly
|
|
509
|
+
- **Same API** - Your code doesn't change between modes
|
|
510
|
+
|
|
99
511
|
→ **[Document Preprocessing Guide](docs/preprocessing.md)** | **[Model Management Details](models/README.md)**
|
|
100
512
|
|
|
101
|
-
## Supported Models
|
|
513
|
+
## 🧠 Supported Models
|
|
514
|
+
|
|
515
|
+
Choose the right model for your use case:
|
|
516
|
+
|
|
517
|
+
### 📝 Text Mode Models
|
|
102
518
|
|
|
103
|
-
|
|
519
|
+
| Model | Dims | Speed | Quality | Best For |
|
|
520
|
+
|-------|------|-------|---------|----------|
|
|
521
|
+
| `sentence-transformers/all-MiniLM-L6-v2` ⭐ | 384 | ⚡⚡⚡ | ⭐⭐⭐ | General purpose (default) |
|
|
522
|
+
| `Xenova/all-mpnet-base-v2` | 768 | ⚡⚡ | ⭐⭐⭐⭐ | Complex queries, higher accuracy |
|
|
104
523
|
|
|
105
|
-
|
|
106
|
-
|-------|------------|-------|----------|
|
|
107
|
-
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | Fast | General purpose (default) |
|
|
108
|
-
| `Xenova/all-mpnet-base-v2` | 768 | Slower | Higher quality, complex queries |
|
|
524
|
+
### 🖼️ Multimodal Models
|
|
109
525
|
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
-
|
|
113
|
-
-
|
|
114
|
-
|
|
526
|
+
| Model | Dims | Speed | Quality | Best For |
|
|
527
|
+
|-------|------|-------|---------|----------|
|
|
528
|
+
| `Xenova/clip-vit-base-patch32` ⭐ | 512 | ⚡⚡ | ⭐⭐⭐ | Text + images (default) |
|
|
529
|
+
| `Xenova/clip-vit-base-patch16` | 512 | ⚡ | ⭐⭐⭐⭐ | Higher visual quality |
|
|
530
|
+
|
|
531
|
+
### ✨ Model Features
|
|
532
|
+
|
|
533
|
+
- ✅ **Auto-download** - Models cached locally on first use
|
|
534
|
+
- ✅ **Smart compatibility** - Detects model changes, prompts rebuilds
|
|
535
|
+
- ✅ **Offline support** - Pre-download for air-gapped environments
|
|
536
|
+
- ✅ **Zero config** - Works out of the box with sensible defaults
|
|
537
|
+
- ✅ **Cross-modal** - CLIP enables text ↔ image search
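
To illustrate what "detects model changes, prompts rebuilds" could involve, here is a hedged sketch of a compatibility check. The dimensions are copied from the tables above; the `MODEL_DIMS` registry, the `StoredIndexInfo` shape, and `checkCompatibility` are hypothetical and are not taken from the package source.

```typescript
// Hypothetical sketch of a model-compatibility check; not the rag-lite-ts implementation.
// Dimensions are the values listed in the model tables above.
const MODEL_DIMS: Record<string, number> = {
  "sentence-transformers/all-MiniLM-L6-v2": 384,
  "Xenova/all-mpnet-base-v2": 768,
  "Xenova/clip-vit-base-patch32": 512,
  "Xenova/clip-vit-base-patch16": 512,
};

// Assumed shape of the metadata stored alongside an index.
type StoredIndexInfo = { model: string; dimensions: number };

type CompatibilityResult = { ok: true } | { ok: false; reason: string };

function checkCompatibility(
  stored: StoredIndexInfo,
  requestedModel: string
): CompatibilityResult {
  const dims = MODEL_DIMS[requestedModel];
  if (dims === undefined) {
    return { ok: false, reason: `unknown model: ${requestedModel}` };
  }
  // Compare names, not just dimensions: the two CLIP variants are both 512-D,
  // yet their vectors are not interchangeable.
  if (stored.model !== requestedModel) {
    return {
      ok: false,
      reason: `index was built with ${stored.model}; rebuild required for ${requestedModel}`,
    };
  }
  if (stored.dimensions !== dims) {
    return {
      ok: false,
      reason: `stored dimensions ${stored.dimensions} do not match expected ${dims}`,
    };
  }
  return { ok: true };
}
```

Comparing model names first matters because a dimension check alone would wrongly accept a switch between `Xenova/clip-vit-base-patch32` and `Xenova/clip-vit-base-patch16`.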
|
|
115
538
|
|
|
116
539
|
→ **[Complete Model Guide](docs/model-guide.md)** | **[Performance Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md)**
|
|
117
540
|
|
|
541
|
+
## 📚 Documentation
|
|
542
|
+
|
|
543
|
+
<table>
|
|
544
|
+
<tr>
|
|
545
|
+
<td width="33%">
|
|
546
|
+
|
|
547
|
+
### 🚀 Getting Started
|
|
548
|
+
- [CLI Reference](docs/cli-reference.md)
|
|
549
|
+
- [API Reference](docs/api-reference.md)
|
|
550
|
+
- [Multimodal Tutorial](docs/multimodal-tutorial.md)
|
|
551
|
+
- [Unified Content System](docs/unified-content-system.md)
|
|
552
|
+
|
|
553
|
+
</td>
|
|
554
|
+
<td width="33%">
|
|
118
555
|
|
|
556
|
+
### 🔧 Advanced
|
|
557
|
+
- [Configuration Guide](docs/configuration.md)
|
|
558
|
+
- [Model Selection](docs/model-guide.md)
|
|
559
|
+
- [Multimodal Config](docs/multimodal-configuration.md)
|
|
560
|
+
- [Path Strategies](docs/path-strategies.md)
|
|
119
561
|
|
|
562
|
+
</td>
|
|
563
|
+
<td width="33%">
|
|
120
564
|
|
|
121
|
-
|
|
565
|
+
### 🛠️ Support
|
|
566
|
+
- [Troubleshooting](docs/troubleshooting.md)
|
|
567
|
+
- [Multimodal Issues](docs/multimodal-troubleshooting.md)
|
|
568
|
+
- [Content Issues](docs/unified-content-troubleshooting.md)
|
|
569
|
+
- [Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md)
|
|
122
570
|
|
|
123
|
-
|
|
571
|
+
</td>
|
|
572
|
+
</tr>
|
|
573
|
+
</table>
|
|
574
|
+
|
|
575
|
+
### 🎯 Quick Start by Role
|
|
576
|
+
|
|
577
|
+
| I want to... | Start here |
|
|
578
|
+
|--------------|------------|
|
|
579
|
+
| 🆕 Try it out | [CLI Reference](docs/cli-reference.md) → `npm i -g rag-lite-ts` |
|
|
580
|
+
| 🖼️ Search images | [Multimodal Tutorial](docs/multimodal-tutorial.md) → `--mode multimodal` |
|
|
581
|
+
| 💻 Build an app | [API Reference](docs/api-reference.md) → `new SearchEngine()` |
|
|
582
|
+
| 🤖 Integrate with AI | [MCP Guide](docs/mcp-server-multimodal-guide.md) → `raglite-mcp` |
|
|
583
|
+
| ⚡ Optimize performance | [Model Guide](docs/model-guide.md) → Choose your model |
|
|
584
|
+
| 🐛 Fix an issue | [Troubleshooting](docs/troubleshooting.md) → Common solutions |
|
|
585
|
+
|
|
586
|
+
**📖 [Complete Documentation Hub](docs/README.md)**
|
|
587
|
+
|
|
588
|
+
## 🔌 MCP Server Integration
|
|
589
|
+
|
|
590
|
+
**Give your AI agents semantic memory.** RAG-lite TS includes a built-in Model Context Protocol (MCP) server.
|
|
124
591
|
|
|
125
592
|
```bash
|
|
126
|
-
# Start MCP server
|
|
593
|
+
# Start MCP server (works with Claude, Cline, and other MCP clients)
|
|
127
594
|
raglite-mcp
|
|
128
595
|
```
|
|
129
596
|
|
|
597
|
+
### Single Instance Configuration
|
|
598
|
+
|
|
130
599
|
**MCP Configuration:**
|
|
131
600
|
```json
|
|
132
601
|
{
|
|
@@ -139,39 +608,54 @@ raglite-mcp
|
|
|
139
608
|
}
|
|
140
609
|
```
|
|
141
610
|
|
|
142
|
-
|
|
611
|
+
### Multiple Instance Configuration (NEW)
|
|
143
612
|
|
|
144
|
-
|
|
613
|
+
Run multiple MCP server instances for different databases with **intelligent routing**:
|
|
145
614
|
|
|
146
|
-
|
|
615
|
+
```json
|
|
616
|
+
{
|
|
617
|
+
"mcpServers": {
|
|
618
|
+
"rag-lite-text-docs": {
|
|
619
|
+
"command": "npx",
|
|
620
|
+
"args": ["rag-lite-mcp"],
|
|
621
|
+
"env": {
|
|
622
|
+
"RAG_DB_FILE": "./text-docs/db.sqlite",
|
|
623
|
+
"RAG_INDEX_FILE": "./text-docs/index.bin"
|
|
624
|
+
}
|
|
625
|
+
},
|
|
626
|
+
"rag-lite-multimodal-images": {
|
|
627
|
+
"command": "npx",
|
|
628
|
+
"args": ["rag-lite-mcp"],
|
|
629
|
+
"env": {
|
|
630
|
+
"RAG_DB_FILE": "./mixed-content/db.sqlite",
|
|
631
|
+
"RAG_INDEX_FILE": "./mixed-content/index.bin"
|
|
632
|
+
}
|
|
633
|
+
}
|
|
634
|
+
}
|
|
635
|
+
}
|
|
636
|
+
```
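
When several instances are configured, it can help to check that each one points at its own database and index. The sketch below is a hypothetical pre-flight check for a config of the shape shown above. It assumes every instance should set both `RAG_DB_FILE` and `RAG_INDEX_FILE` explicitly, which the server itself may not require; `findConfigProblems` and the types are illustrative names only.

```typescript
// Hypothetical pre-flight check for a multi-instance MCP config; not part of rag-lite-ts.
type McpServerEntry = {
  command: string;
  args: string[];
  env?: Record<string, string>;
};
type McpConfig = { mcpServers: Record<string, McpServerEntry> };

// Returns a list of human-readable problems; an empty list means the config looks consistent.
function findConfigProblems(config: McpConfig): string[] {
  const problems: string[] = [];
  const owners = new Map<string, string>(); // file path -> first instance that used it
  for (const [name, entry] of Object.entries(config.mcpServers)) {
    for (const key of ["RAG_DB_FILE", "RAG_INDEX_FILE"]) {
      const path = entry.env?.[key];
      if (!path) {
        problems.push(`${name}: ${key} is not set`);
        continue;
      }
      const owner = owners.get(path);
      if (owner !== undefined) {
        problems.push(`${name}: ${key} reuses ${path} already used by ${owner}`);
      } else {
        owners.set(path, name);
      }
    }
  }
  return problems;
}

// The two instances from the configuration above.
const config: McpConfig = {
  mcpServers: {
    "rag-lite-text-docs": {
      command: "npx",
      args: ["rag-lite-mcp"],
      env: {
        RAG_DB_FILE: "./text-docs/db.sqlite",
        RAG_INDEX_FILE: "./text-docs/index.bin",
      },
    },
    "rag-lite-multimodal-images": {
      command: "npx",
      args: ["rag-lite-mcp"],
      env: {
        RAG_DB_FILE: "./mixed-content/db.sqlite",
        RAG_INDEX_FILE: "./mixed-content/index.bin",
      },
    },
  },
};
const problems = findConfigProblems(config);
```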
|
|
147
637
|
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
-
|
|
151
|
-
-
|
|
638
|
+
**Dynamic Tool Descriptions:**
|
|
639
|
+
Each server automatically detects and advertises its capabilities:
|
|
640
|
+
- `[TEXT MODE]` - Text-only databases clearly indicate supported file types
|
|
641
|
+
- `[MULTIMODAL MODE]` - Multimodal databases advertise image support and cross-modal search
|
|
642
|
+
- AI assistants can intelligently route queries to the appropriate database
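
As a rough illustration of how a client might use those tags for routing, the sketch below picks a server by looking for `[MULTIMODAL MODE]` in its advertised description. Only the two tags come from the text above; the `ServerInfo` shape, the sample description strings, and `pickServer` are assumptions, and a real MCP client would obtain descriptions through the protocol's tool listing rather than a hard-coded array.

```typescript
// Hypothetical routing sketch; the tags come from the README, everything else is assumed.
type ServerInfo = { name: string; toolDescription: string };

function supportsImages(server: ServerInfo): boolean {
  return server.toolDescription.includes("[MULTIMODAL MODE]");
}

// Image queries need a multimodal server; text queries prefer a text-only one
// and fall back to the first server if none is text-only.
function pickServer(
  servers: ServerInfo[],
  needsImages: boolean
): ServerInfo | undefined {
  if (needsImages) return servers.find(supportsImages);
  return servers.find((s) => !supportsImages(s)) ?? servers[0];
}

// Sample descriptions are invented for illustration.
const servers: ServerInfo[] = [
  { name: "rag-lite-text-docs", toolDescription: "[TEXT MODE] Search text documents" },
  {
    name: "rag-lite-multimodal-images",
    toolDescription: "[MULTIMODAL MODE] Search text and images",
  },
];

const forImages = pickServer(servers, true);
const forText = pickServer(servers, false);
```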
|
|
152
643
|
|
|
153
|
-
|
|
154
|
-
- **[Model Selection Guide](docs/model-guide.md)** - Embedding models and performance
|
|
155
|
-
- **[Path Storage Strategies](docs/path-strategies.md)** - Document path management
|
|
156
|
-
- **[Document Preprocessing](docs/preprocessing.md)** - Content processing options
|
|
157
|
-
- **[Troubleshooting Guide](docs/troubleshooting.md)** - Common issues and solutions
|
|
644
|
+
**Available Tools:** `search`, `ingest`, `ingest_image`, `multimodal_search`, `rebuild_index`, `get_stats`, `get_mode_info`, `list_supported_models`, `list_reranking_strategies`, `get_system_stats`
|
|
158
645
|
|
|
159
|
-
|
|
160
|
-
-
|
|
161
|
-
-
|
|
646
|
+
**Multimodal Features:**
|
|
647
|
+
- Search across text and image content
|
|
648
|
+
- Retrieve image content as base64 data
|
|
649
|
+
- Cross-modal search capabilities (text queries find images)
|
|
650
|
+
- Automatic mode detection from database
|
|
651
|
+
- Content type filtering
|
|
652
|
+
- Multiple reranking strategies
|
|
162
653
|
|
|
163
|
-
|
|
654
|
+
→ **[Complete MCP Integration Guide](docs/cli-reference.md#mcp-server)** | **[MCP Multimodal Guide](docs/mcp-server-multimodal-guide.md)** | **[Multi-Instance Setup](docs/mcp-server-multimodal-guide.md#running-multiple-mcp-server-instances)**
|
|
164
655
|
|
|
165
|
-
|
|
166
|
-
|----------|---------------|-------------------|
|
|
167
|
-
| **Getting Started** | [CLI Reference](docs/cli-reference.md) | [Configuration](docs/configuration.md) |
|
|
168
|
-
| **Model Selection** | [Model Guide](docs/model-guide.md) | [Performance Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md) |
|
|
169
|
-
| **Production Setup** | [Configuration Guide](docs/configuration.md) | [Path Strategies](docs/path-strategies.md) |
|
|
170
|
-
| **File Processing** | [Preprocessing Guide](docs/preprocessing.md) | [Troubleshooting](docs/troubleshooting.md) |
|
|
171
|
-
| **Integration** | [API Reference](docs/api-reference.md) | [Configuration](docs/configuration.md) |
|
|
172
|
-
| **Issue Resolution** | [Troubleshooting Guide](docs/troubleshooting.md) | All guides |
|
|
656
|
+
---
|
|
173
657
|
|
|
174
|
-
## Development
|
|
658
|
+
## 🛠️ Development
|
|
175
659
|
|
|
176
660
|
### Building from Source
|
|
177
661
|
|
|
@@ -194,13 +678,25 @@ npm run test:integration
|
|
|
194
678
|
|
|
195
679
|
```
|
|
196
680
|
src/
|
|
197
|
-
├── index.ts # Main exports
|
|
198
|
-
├──
|
|
199
|
-
├──
|
|
200
|
-
├──
|
|
201
|
-
├── search.ts
|
|
202
|
-
├── ingestion.ts
|
|
203
|
-
├──
|
|
681
|
+
├── index.ts # Main exports and factory functions
|
|
682
|
+
├── search.ts # Public SearchEngine API
|
|
683
|
+
├── ingestion.ts # Public IngestionPipeline API
|
|
684
|
+
├── core/ # Model-agnostic core layer
|
|
685
|
+
│ ├── search.ts # Core search engine
|
|
686
|
+
│ ├── ingestion.ts # Core ingestion pipeline
|
|
687
|
+
│ ├── db.ts # SQLite operations
|
|
688
|
+
│ ├── config.ts # Configuration system
|
|
689
|
+
│ ├── content-manager.ts # Content storage and management
|
|
690
|
+
│ └── types.ts # Core type definitions
|
|
691
|
+
├── text/ # Text-specific implementations
|
|
692
|
+
│ ├── embedder.ts # Sentence-transformer embedder
|
|
693
|
+
│ ├── reranker.ts # Cross-encoder reranking
|
|
694
|
+
│ └── tokenizer.ts # Text tokenization
|
|
695
|
+
├── multimodal/ # Multimodal implementations
|
|
696
|
+
│ ├── embedder.ts # CLIP embedder (text + images)
|
|
697
|
+
│ ├── reranker.ts # Text-derived and metadata reranking
|
|
698
|
+
│ ├── image-processor.ts # Image description and metadata
|
|
699
|
+
│ └── content-types.ts # Content type detection
|
|
204
700
|
├── cli.ts # CLI interface
|
|
205
701
|
├── mcp-server.ts # MCP server
|
|
206
702
|
└── preprocessors/ # Content type processors
|
|
@@ -210,31 +706,77 @@ dist/ # Compiled output
|
|
|
210
706
|
|
|
211
707
|
### Design Philosophy
|
|
212
708
|
|
|
213
|
-
**
|
|
214
|
-
- ✅
|
|
215
|
-
- ✅
|
|
216
|
-
- ✅
|
|
217
|
-
- ✅
|
|
218
|
-
-
|
|
709
|
+
**Simple by default, powerful when needed:**
|
|
710
|
+
- ✅ Simple constructors work immediately with sensible defaults
|
|
711
|
+
- ✅ Configuration options available when you need customization
|
|
712
|
+
- ✅ Advanced patterns available for complex use cases
|
|
713
|
+
- ✅ Clean architecture with minimal dependencies
|
|
714
|
+
- ✅ No ORMs or heavy frameworks - just TypeScript and SQLite
|
|
715
|
+
- ✅ Extensible design for future capabilities
|
|
716
|
+
|
|
717
|
+
This approach ensures that basic usage is effortless while providing the flexibility needed for advanced scenarios.
|
|
718
|
+
|
|
719
|
+
---
|
|
720
|
+
|
|
721
|
+
## 🤝 Contributing
|
|
219
722
|
|
|
220
|
-
|
|
723
|
+
We welcome contributions of all kinds:
|
|
221
724
|
|
|
222
|
-
|
|
725
|
+
- 🐛 Bug fixes
|
|
726
|
+
- ✨ New features
|
|
727
|
+
- 📝 Documentation improvements
|
|
728
|
+
- 🧪 Test coverage
|
|
729
|
+
- 💡 Ideas and suggestions
|
|
223
730
|
|
|
731
|
+
**Guidelines:**
|
|
224
732
|
1. Fork the repository
|
|
225
|
-
2. Create a feature branch
|
|
226
|
-
3. Make your changes
|
|
227
|
-
4.
|
|
228
|
-
5.
|
|
229
|
-
6. Submit a pull request
|
|
733
|
+
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
|
734
|
+
3. Make your changes with tests
|
|
735
|
+
4. Ensure all tests pass (`npm test`)
|
|
736
|
+
5. Submit a pull request
|
|
230
737
|
|
|
231
|
-
|
|
738
|
+
We maintain clean architecture principles while enhancing functionality and developer experience.
|
|
232
739
|
|
|
233
|
-
|
|
740
|
+
---
|
|
234
741
|
|
|
235
|
-
|
|
742
|
+
## 🎯 Why We Built This
|
|
743
|
+
|
|
744
|
+
Existing RAG solutions tend to be:
|
|
745
|
+
- 🔴 **Too complex** - Require extensive setup and configuration
|
|
746
|
+
- 🔴 **Cloud-dependent** - Need API keys and external services
|
|
747
|
+
- 🔴 **Python-only** - Not ideal for TypeScript/Node.js projects
|
|
748
|
+
- 🔴 **Heavy** - Massive dependencies and slow startup
|
|
749
|
+
|
|
750
|
+
**RAG-lite TS is different:**
|
|
751
|
+
- ✅ **Simple** - Works out of the box with zero config
|
|
752
|
+
- ✅ **Local-first** - Your data stays on your machine
|
|
753
|
+
- ✅ **TypeScript native** - Built for modern JS/TS projects
|
|
754
|
+
- ✅ **Lightweight** - Fast startup, minimal dependencies
|
|
755
|
+
|
|
756
|
+
---
|
|
757
|
+
|
|
758
|
+
## 🙏 Acknowledgments
|
|
236
759
|
|
|
237
|
-
|
|
760
|
+
Built with amazing open-source projects:
|
|
238
761
|
|
|
239
|
-
- **[transformers.js](https://github.com/xenova/transformers.js)** - Client-side ML models
|
|
762
|
+
- **[transformers.js](https://github.com/xenova/transformers.js)** - Client-side ML models by Xenova
|
|
240
763
|
- **[hnswlib](https://github.com/nmslib/hnswlib)** - Fast approximate nearest neighbor search
|
|
764
|
+
- **[better-sqlite3](https://github.com/WiseLibs/better-sqlite3)** - Fast SQLite3 bindings
|
|
765
|
+
|
|
766
|
+
---
|
|
767
|
+
|
|
768
|
+
## 📄 License
|
|
769
|
+
|
|
770
|
+
MIT License - see [LICENSE](LICENSE) file for details.
|
|
771
|
+
|
|
772
|
+
---
|
|
773
|
+
|
|
774
|
+
<div align="center">
|
|
775
|
+
|
|
776
|
+
**⭐ Star us on GitHub — it helps!**
|
|
777
|
+
|
|
778
|
+
[Report Bug](https://github.com/your-username/rag-lite-ts/issues) • [Request Feature](https://github.com/your-username/rag-lite-ts/issues) • [Documentation](docs/README.md)
|
|
779
|
+
|
|
780
|
+
Made with ❤️ by developers, for developers
|
|
781
|
+
|
|
782
|
+
</div>
|