rag-lite-ts 1.0.2 → 2.0.1
- package/README.md +605 -93
- package/dist/cli/indexer.js +192 -4
- package/dist/cli/search.js +50 -11
- package/dist/cli.js +183 -26
- package/dist/core/abstract-embedder.d.ts +125 -0
- package/dist/core/abstract-embedder.js +264 -0
- package/dist/core/actionable-error-messages.d.ts +60 -0
- package/dist/core/actionable-error-messages.js +397 -0
- package/dist/core/batch-processing-optimizer.d.ts +155 -0
- package/dist/core/batch-processing-optimizer.js +541 -0
- package/dist/core/binary-index-format.d.ts +52 -0
- package/dist/core/binary-index-format.js +122 -0
- package/dist/core/chunker.d.ts +2 -0
- package/dist/core/cli-database-utils.d.ts +53 -0
- package/dist/core/cli-database-utils.js +239 -0
- package/dist/core/config.js +10 -3
- package/dist/core/content-errors.d.ts +111 -0
- package/dist/core/content-errors.js +362 -0
- package/dist/core/content-manager.d.ts +343 -0
- package/dist/core/content-manager.js +1504 -0
- package/dist/core/content-performance-optimizer.d.ts +150 -0
- package/dist/core/content-performance-optimizer.js +516 -0
- package/dist/core/content-resolver.d.ts +104 -0
- package/dist/core/content-resolver.js +285 -0
- package/dist/core/cross-modal-search.d.ts +164 -0
- package/dist/core/cross-modal-search.js +342 -0
- package/dist/core/database-connection-manager.d.ts +109 -0
- package/dist/core/database-connection-manager.js +304 -0
- package/dist/core/db.d.ts +141 -2
- package/dist/core/db.js +631 -89
- package/dist/core/embedder-factory.d.ts +176 -0
- package/dist/core/embedder-factory.js +338 -0
- package/dist/core/index.d.ts +3 -1
- package/dist/core/index.js +4 -1
- package/dist/core/ingestion.d.ts +85 -15
- package/dist/core/ingestion.js +510 -45
- package/dist/core/lazy-dependency-loader.d.ts +152 -0
- package/dist/core/lazy-dependency-loader.js +453 -0
- package/dist/core/mode-detection-service.d.ts +150 -0
- package/dist/core/mode-detection-service.js +565 -0
- package/dist/core/mode-model-validator.d.ts +92 -0
- package/dist/core/mode-model-validator.js +203 -0
- package/dist/core/model-registry.d.ts +120 -0
- package/dist/core/model-registry.js +415 -0
- package/dist/core/model-validator.d.ts +217 -0
- package/dist/core/model-validator.js +782 -0
- package/dist/core/polymorphic-search-factory.d.ts +154 -0
- package/dist/core/polymorphic-search-factory.js +344 -0
- package/dist/core/raglite-paths.d.ts +121 -0
- package/dist/core/raglite-paths.js +145 -0
- package/dist/core/reranking-config.d.ts +42 -0
- package/dist/core/reranking-config.js +156 -0
- package/dist/core/reranking-factory.d.ts +92 -0
- package/dist/core/reranking-factory.js +591 -0
- package/dist/core/reranking-strategies.d.ts +325 -0
- package/dist/core/reranking-strategies.js +720 -0
- package/dist/core/resource-cleanup.d.ts +163 -0
- package/dist/core/resource-cleanup.js +371 -0
- package/dist/core/resource-manager.d.ts +212 -0
- package/dist/core/resource-manager.js +564 -0
- package/dist/core/search.d.ts +28 -1
- package/dist/core/search.js +83 -5
- package/dist/core/streaming-operations.d.ts +145 -0
- package/dist/core/streaming-operations.js +409 -0
- package/dist/core/types.d.ts +3 -0
- package/dist/core/universal-embedder.d.ts +177 -0
- package/dist/core/universal-embedder.js +139 -0
- package/dist/core/validation-messages.d.ts +99 -0
- package/dist/core/validation-messages.js +334 -0
- package/dist/core/vector-index.d.ts +1 -1
- package/dist/core/vector-index.js +37 -39
- package/dist/factories/index.d.ts +3 -1
- package/dist/factories/index.js +2 -0
- package/dist/factories/polymorphic-factory.d.ts +50 -0
- package/dist/factories/polymorphic-factory.js +159 -0
- package/dist/factories/text-factory.d.ts +128 -34
- package/dist/factories/text-factory.js +346 -97
- package/dist/file-processor.d.ts +88 -2
- package/dist/file-processor.js +720 -17
- package/dist/index.d.ts +32 -0
- package/dist/index.js +29 -0
- package/dist/ingestion.d.ts +16 -0
- package/dist/ingestion.js +21 -0
- package/dist/mcp-server.d.ts +35 -3
- package/dist/mcp-server.js +1107 -31
- package/dist/multimodal/clip-embedder.d.ts +327 -0
- package/dist/multimodal/clip-embedder.js +992 -0
- package/dist/multimodal/index.d.ts +6 -0
- package/dist/multimodal/index.js +6 -0
- package/dist/run-error-recovery-tests.d.ts +7 -0
- package/dist/run-error-recovery-tests.js +101 -0
- package/dist/search.d.ts +60 -9
- package/dist/search.js +82 -11
- package/dist/test-utils.d.ts +8 -26
- package/dist/text/chunker.d.ts +1 -0
- package/dist/text/embedder.js +15 -8
- package/dist/text/index.d.ts +1 -0
- package/dist/text/index.js +1 -0
- package/dist/text/reranker.d.ts +1 -2
- package/dist/text/reranker.js +17 -47
- package/dist/text/sentence-transformer-embedder.d.ts +96 -0
- package/dist/text/sentence-transformer-embedder.js +340 -0
- package/dist/types.d.ts +39 -0
- package/dist/utils/vector-math.d.ts +31 -0
- package/dist/utils/vector-math.js +70 -0
- package/package.json +27 -6
- package/dist/api-errors.d.ts.map +0 -1
- package/dist/api-errors.js.map +0 -1
- package/dist/cli/indexer.d.ts.map +0 -1
- package/dist/cli/indexer.js.map +0 -1
- package/dist/cli/search.d.ts.map +0 -1
- package/dist/cli/search.js.map +0 -1
- package/dist/cli.d.ts.map +0 -1
- package/dist/cli.js.map +0 -1
- package/dist/config.d.ts.map +0 -1
- package/dist/config.js.map +0 -1
- package/dist/core/adapters.d.ts.map +0 -1
- package/dist/core/adapters.js.map +0 -1
- package/dist/core/chunker.d.ts.map +0 -1
- package/dist/core/chunker.js.map +0 -1
- package/dist/core/config.d.ts.map +0 -1
- package/dist/core/config.js.map +0 -1
- package/dist/core/db.d.ts.map +0 -1
- package/dist/core/db.js.map +0 -1
- package/dist/core/error-handler.d.ts.map +0 -1
- package/dist/core/error-handler.js.map +0 -1
- package/dist/core/index.d.ts.map +0 -1
- package/dist/core/index.js.map +0 -1
- package/dist/core/ingestion.d.ts.map +0 -1
- package/dist/core/ingestion.js.map +0 -1
- package/dist/core/interfaces.d.ts.map +0 -1
- package/dist/core/interfaces.js.map +0 -1
- package/dist/core/path-manager.d.ts.map +0 -1
- package/dist/core/path-manager.js.map +0 -1
- package/dist/core/search-example.d.ts +0 -25
- package/dist/core/search-example.d.ts.map +0 -1
- package/dist/core/search-example.js +0 -138
- package/dist/core/search-example.js.map +0 -1
- package/dist/core/search-pipeline-example.d.ts +0 -21
- package/dist/core/search-pipeline-example.d.ts.map +0 -1
- package/dist/core/search-pipeline-example.js +0 -188
- package/dist/core/search-pipeline-example.js.map +0 -1
- package/dist/core/search-pipeline.d.ts.map +0 -1
- package/dist/core/search-pipeline.js.map +0 -1
- package/dist/core/search.d.ts.map +0 -1
- package/dist/core/search.js.map +0 -1
- package/dist/core/types.d.ts.map +0 -1
- package/dist/core/types.js.map +0 -1
- package/dist/core/vector-index.d.ts.map +0 -1
- package/dist/core/vector-index.js.map +0 -1
- package/dist/dom-polyfills.d.ts.map +0 -1
- package/dist/dom-polyfills.js.map +0 -1
- package/dist/examples/clean-api-examples.d.ts +0 -44
- package/dist/examples/clean-api-examples.d.ts.map +0 -1
- package/dist/examples/clean-api-examples.js +0 -206
- package/dist/examples/clean-api-examples.js.map +0 -1
- package/dist/factories/index.d.ts.map +0 -1
- package/dist/factories/index.js.map +0 -1
- package/dist/factories/text-factory.d.ts.map +0 -1
- package/dist/factories/text-factory.js.map +0 -1
- package/dist/file-processor.d.ts.map +0 -1
- package/dist/file-processor.js.map +0 -1
- package/dist/index-manager.d.ts.map +0 -1
- package/dist/index-manager.js.map +0 -1
- package/dist/index.d.ts.map +0 -1
- package/dist/index.js.map +0 -1
- package/dist/indexer.d.ts.map +0 -1
- package/dist/indexer.js.map +0 -1
- package/dist/ingestion.d.ts.map +0 -1
- package/dist/ingestion.js.map +0 -1
- package/dist/mcp-server.d.ts.map +0 -1
- package/dist/mcp-server.js.map +0 -1
- package/dist/preprocess.d.ts.map +0 -1
- package/dist/preprocess.js.map +0 -1
- package/dist/preprocessors/index.d.ts.map +0 -1
- package/dist/preprocessors/index.js.map +0 -1
- package/dist/preprocessors/mdx.d.ts.map +0 -1
- package/dist/preprocessors/mdx.js.map +0 -1
- package/dist/preprocessors/mermaid.d.ts.map +0 -1
- package/dist/preprocessors/mermaid.js.map +0 -1
- package/dist/preprocessors/registry.d.ts.map +0 -1
- package/dist/preprocessors/registry.js.map +0 -1
- package/dist/search-standalone.d.ts.map +0 -1
- package/dist/search-standalone.js.map +0 -1
- package/dist/search.d.ts.map +0 -1
- package/dist/search.js.map +0 -1
- package/dist/test-utils.d.ts.map +0 -1
- package/dist/test-utils.js.map +0 -1
- package/dist/text/chunker.d.ts.map +0 -1
- package/dist/text/chunker.js.map +0 -1
- package/dist/text/embedder.d.ts.map +0 -1
- package/dist/text/embedder.js.map +0 -1
- package/dist/text/index.d.ts.map +0 -1
- package/dist/text/index.js.map +0 -1
- package/dist/text/preprocessors/index.d.ts.map +0 -1
- package/dist/text/preprocessors/index.js.map +0 -1
- package/dist/text/preprocessors/mdx.d.ts.map +0 -1
- package/dist/text/preprocessors/mdx.js.map +0 -1
- package/dist/text/preprocessors/mermaid.d.ts.map +0 -1
- package/dist/text/preprocessors/mermaid.js.map +0 -1
- package/dist/text/preprocessors/registry.d.ts.map +0 -1
- package/dist/text/preprocessors/registry.js.map +0 -1
- package/dist/text/reranker.d.ts.map +0 -1
- package/dist/text/reranker.js.map +0 -1
- package/dist/text/tokenizer.d.ts.map +0 -1
- package/dist/text/tokenizer.js.map +0 -1
- package/dist/types.d.ts.map +0 -1
- package/dist/types.js.map +0 -1
package/README.md
CHANGED
|
@@ -1,23 +1,112 @@
|
|
|
1
|
-
|
|
2
|
-
*Simple by default, powerful when needed*
|
|
1
|
+
<div align="center">
|
|
3
2
|
|
|
4
|
-
|
|
3
|
+
# 🦎 RAG-lite TS
|
|
5
4
|
|
|
6
|
-
|
|
5
|
+
### *Simple by default, powerful when needed*
|
|
7
6
|
|
|
8
|
-
|
|
7
|
+
**Local-first semantic search that actually works**
|
|
9
8
|
|
|
10
|
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
- [Documentation](#documentation)
|
|
15
|
-
- [MCP Server Integration](#mcp-server-integration)
|
|
16
|
-
- [Development](#development)
|
|
17
|
-
- [Contributing](#contributing)
|
|
18
|
-
- [License](#license)
|
|
9
|
+
[](https://www.npmjs.com/package/rag-lite-ts)
|
|
10
|
+
[](https://opensource.org/licenses/MIT)
|
|
11
|
+
[](https://www.typescriptlang.org/)
|
|
12
|
+
[](https://nodejs.org/)
|
|
19
13
|
|
|
20
|
-
|
|
14
|
+
[Quick Start](#quick-start) • [Features](#features) • [Documentation](#documentation) • [Examples](#examples) • [MCP Integration](#mcp-server-integration)
|
|
15
|
+
|
|
16
|
+
</div>
|
|
17
|
+
|
|
18
|
+
---
|
|
19
|
+
|
|
20
|
+
## 🎯 Why RAG-lite TS?
|
|
21
|
+
|
|
22
|
+
**Stop fighting with complex RAG frameworks.** Get semantic search running in 30 seconds:
|
|
23
|
+
|
|
24
|
+
```bash
|
|
25
|
+
npm install -g rag-lite-ts
|
|
26
|
+
raglite ingest ./docs/
|
|
27
|
+
raglite search "your query here"
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
**That's it.** No API keys, no cloud services, no configuration hell.
|
|
31
|
+
|
|
32
|
+
### 🎬 See It In Action
|
|
33
|
+
|
|
34
|
+
```typescript
|
|
35
|
+
// 1. Ingest your docs
|
|
36
|
+
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
37
|
+
await pipeline.ingestDirectory('./docs/');
|
|
38
|
+
|
|
39
|
+
// 2. Search semantically
|
|
40
|
+
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
41
|
+
const results = await search.search('authentication flow');
|
|
42
|
+
|
|
43
|
+
// 3. Get relevant results instantly
|
|
44
|
+
console.log(results[0].text);
|
|
45
|
+
// "To authenticate users, first obtain a JWT token from the /auth endpoint..."
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
**Real semantic understanding** - not just keyword matching. Finds "JWT token" when you search for "authentication flow".
|
|
49
|
+
|
|
50
|
+
### What Makes It Different?
|
|
51
|
+
|
|
52
|
+
- 🏠 **100% Local** - Your data never leaves your machine
|
|
53
|
+
- 🚀 **Actually Fast** - Sub-100ms queries for typical document collections
|
|
54
|
+
- 🦎 **Chameleon Architecture** - Automatically adapts between text and multimodal modes
|
|
55
|
+
- 🖼️ **True Multimodal** - Search images with text, text with images (CLIP unified space)
|
|
56
|
+
- 📦 **Zero Runtime Dependencies** - No Python, no Docker, no external services
|
|
57
|
+
- 🎯 **TypeScript Native** - Full type safety, modern ESM architecture
|
|
58
|
+
- 🔌 **MCP Ready** - Built-in Model Context Protocol server for AI agents
|
|
59
|
+
|
|
60
|
+

|
|
61
|
+
|
|
62
|
+
---
|
|
63
|
+
|
|
64
|
+
## 🎉 What's New in 2.0
|
|
65
|
+
|
|
66
|
+
**Chameleon Multimodal Architecture** - RAG-lite TS now seamlessly adapts between text-only and multimodal search:
|
|
67
|
+
|
|
68
|
+
### 🖼️ Multimodal Search
|
|
69
|
+
- **CLIP Integration** - Unified 512D embedding space for text and images
|
|
70
|
+
- **Cross-Modal Search** - Find images with text queries, text with image queries
|
|
71
|
+
- **Image-to-Text Generation** - Automatic descriptions using vision-language models
|
|
72
|
+
- **Smart Reranking** - Text-derived, metadata-based, and hybrid strategies
|
|
73
|
+
|
|
74
|
+
### 🏗️ Architecture Improvements
|
|
75
|
+
- **Layered Architecture** - Clean separation: core (model-agnostic) → implementation (text/multimodal) → public API
|
|
76
|
+
- **Mode Persistence** - Configuration stored in database, auto-detected during search
|
|
77
|
+
- **Unified Content System** - Memory-based ingestion for AI agents, format-adaptive retrieval
|
|
78
|
+
- **Simplified APIs** - `createEmbedder()` and `createReranker()` replace complex factory patterns
|
|
79
|
+
|
|
80
|
+
### 🤖 MCP Server Enhancements
|
|
81
|
+
- **Multimodal Tools** - `multimodal_search`, `ingest_image` with URL download
|
|
82
|
+
- **Base64 Image Delivery** - Automatic encoding for AI agent integration
|
|
83
|
+
- **Content-Type Filtering** - Filter results by text, image, pdf, docx
|
|
84
|
+
- **Dynamic Tool Descriptions** - Context-aware tool documentation
|
|
85
|
+
|
|
86
|
+
### 📦 Migration from 1.x
|
|
87
|
+
Existing databases need schema updates for multimodal support. Two options:
|
|
88
|
+
1. **Automatic Migration**: Use the `migrateToRagLiteStructure()` function
|
|
89
|
+
2. **Fresh Start**: Re-ingest content with v2.0.0
|
|
90
|
+
|
|
91
|
+
See [CHANGELOG.md](CHANGELOG.md) for complete details.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## 📋 Table of Contents
|
|
96
|
+
|
|
97
|
+
- [Why RAG-lite TS?](#-why-rag-lite-ts)
|
|
98
|
+
- [Quick Start](#-quick-start)
|
|
99
|
+
- [Features](#-features)
|
|
100
|
+
- [Real-World Examples](#-real-world-examples)
|
|
101
|
+
- [How It Works](#-how-it-works)
|
|
102
|
+
- [Supported Models](#-supported-models)
|
|
103
|
+
- [Documentation](#-documentation)
|
|
104
|
+
- [MCP Server Integration](#-mcp-server-integration)
|
|
105
|
+
- [Development](#-development)
|
|
106
|
+
- [Contributing](#-contributing)
|
|
107
|
+
- [License](#-license)
|
|
108
|
+
|
|
109
|
+
## 🚀 Quick Start
|
|
21
110
|
|
|
22
111
|
### Installation
|
|
23
112
|
|
|
@@ -48,18 +137,111 @@ raglite ingest ./docs/ --model Xenova/all-mpnet-base-v2 --rebuild-if-needed
|
|
|
48
137
|
raglite search "complex query"
|
|
49
138
|
```
|
|
50
139
|
|
|
140
|
+
### Content Retrieval and MCP Integration
|
|
141
|
+
|
|
142
|
+
```typescript
|
|
143
|
+
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
144
|
+
|
|
145
|
+
// Memory-based ingestion for AI agents
|
|
146
|
+
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
147
|
+
const content = Buffer.from('Document from AI agent');
|
|
148
|
+
await pipeline.ingestFromMemory(content, {
|
|
149
|
+
displayName: 'agent-document.txt'
|
|
150
|
+
});
|
|
151
|
+
|
|
152
|
+
// Format-adaptive content retrieval
|
|
153
|
+
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
154
|
+
const results = await search.search('query');
|
|
155
|
+
|
|
156
|
+
// Get file path for CLI clients
|
|
157
|
+
const filePath = await search.getContent(results[0].contentId, 'file');
|
|
158
|
+
|
|
159
|
+
// Get base64 content for MCP clients
|
|
160
|
+
const base64 = await search.getContent(results[0].contentId, 'base64');
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
### Multimodal Search (Text + Images)
|
|
164
|
+
|
|
165
|
+
RAG-lite TS now supports true multimodal search using CLIP's unified embedding space, enabling cross-modal search between text and images:
|
|
166
|
+
|
|
167
|
+
```bash
|
|
168
|
+
# Enable multimodal processing for text and image content
|
|
169
|
+
raglite ingest ./docs/ --mode multimodal
|
|
170
|
+
|
|
171
|
+
# Cross-modal search: Find images using text queries
|
|
172
|
+
raglite search "architecture diagram" --content-type image
|
|
173
|
+
raglite search "red sports car" --content-type image
|
|
174
|
+
|
|
175
|
+
# Find text documents about visual concepts
|
|
176
|
+
raglite search "user interface design" --content-type text
|
|
177
|
+
|
|
178
|
+
# Search across both content types (default)
|
|
179
|
+
raglite search "system overview"
|
|
180
|
+
|
|
181
|
+
# Use different reranking strategies for optimal results
|
|
182
|
+
raglite ingest ./docs/ --mode multimodal --rerank-strategy text-derived
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
**Key Features:**
|
|
186
|
+
- **Unified embedding space**: Text and images embedded in the same 512-dimensional CLIP space
|
|
187
|
+
- **Cross-modal search**: Text queries find semantically similar images
|
|
188
|
+
- **Automatic mode detection**: Set mode once during ingestion, automatically detected during search
|
|
189
|
+
- **Multiple reranking strategies**: text-derived, metadata, hybrid, or disabled
|
|
190
|
+
- **Seamless experience**: Same CLI commands work for both text-only and multimodal content
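The idea behind a unified embedding space can be sketched with toy data. The snippet below is not the library's code: the 3-dimensional vectors are hand-made stand-ins for 512-dimensional CLIP embeddings, and `IndexedItem`, `rank`, and the file names are hypothetical. It only shows why a text query can rank images once both live in the same vector space, and how a content-type filter narrows the result.

```typescript
// Toy illustration of cross-modal ranking in a shared embedding space.
// The vectors are invented; real CLIP embeddings have 512 dimensions.

type ContentType = 'text' | 'image';

interface IndexedItem {
  id: string;
  contentType: ContentType;
  vector: number[];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Rank items against the query vector, optionally keeping one content type.
// filter() returns a new array, so the caller's list is not reordered.
function rank(query: number[], items: IndexedItem[], only?: ContentType): IndexedItem[] {
  return items
    .filter(item => only === undefined || item.contentType === only)
    .sort((a, b) => cosine(query, b.vector) - cosine(query, a.vector));
}

const items: IndexedItem[] = [
  { id: 'car-photo.jpg', contentType: 'image', vector: [0.9, 0.1, 0.0] },
  { id: 'diagram.png', contentType: 'image', vector: [0.0, 0.2, 0.9] },
  { id: 'car-review.md', contentType: 'text', vector: [0.8, 0.3, 0.1] },
];

// Pretend this is the embedding of the text query "red sports car".
const queryVector = [1.0, 0.2, 0.0];

const imagesOnly = rank(queryVector, items, 'image');
console.log(imagesOnly.map(item => item.id));
```

Because text and images share one space, the same `rank` call serves both the filtered and the unfiltered case; only the optional content-type argument changes.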
|
|
191
|
+
|
|
192
|
+
→ **[Complete Multimodal Tutorial](docs/multimodal-tutorial.md)**
|
|
193
|
+
|
|
51
194
|
### Programmatic Usage
|
|
52
195
|
|
|
53
196
|
```typescript
|
|
54
197
|
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
55
198
|
|
|
56
|
-
//
|
|
199
|
+
// Text-only mode (default)
|
|
57
200
|
const ingestion = new IngestionPipeline('./db.sqlite', './vector-index.bin');
|
|
58
201
|
await ingestion.ingestDirectory('./docs/');
|
|
59
202
|
|
|
60
|
-
//
|
|
203
|
+
// Multimodal mode (text + images)
|
|
204
|
+
const multimodalIngestion = new IngestionPipeline('./db.sqlite', './index.bin', {
|
|
205
|
+
mode: 'multimodal',
|
|
206
|
+
embeddingModel: 'Xenova/clip-vit-base-patch32',
|
|
207
|
+
rerankingStrategy: 'text-derived'
|
|
208
|
+
});
|
|
209
|
+
await multimodalIngestion.ingestDirectory('./mixed-content/');
|
|
210
|
+
|
|
211
|
+
// Search (mode auto-detected from database)
|
|
61
212
|
const search = new SearchEngine('./vector-index.bin', './db.sqlite');
|
|
62
213
|
const results = await search.search('machine learning', { top_k: 10 });
|
|
214
|
+
|
|
215
|
+
// Split results by content type (image results appear in multimodal mode)
|
|
216
|
+
const imageResults = results.filter(r => r.contentType === 'image');
|
|
217
|
+
const textResults = results.filter(r => r.contentType === 'text');
|
|
218
|
+
```
|
|
219
|
+
|
|
220
|
+
### Memory Ingestion & Unified Content System (NEW)
|
|
221
|
+
|
|
222
|
+
```typescript
|
|
223
|
+
// Ingest content directly from memory (perfect for MCP integration)
|
|
224
|
+
const content = Buffer.from('# AI Guide\n\nComprehensive AI concepts...');
|
|
225
|
+
const contentId = await ingestion.ingestFromMemory(content, {
|
|
226
|
+
displayName: 'AI Guide.md',
|
|
227
|
+
contentType: 'text/markdown'
|
|
228
|
+
});
|
|
229
|
+
|
|
230
|
+
// Retrieve content in different formats based on client needs
|
|
231
|
+
const filePath = await search.getContent(contentId, 'file'); // For CLI clients
|
|
232
|
+
const base64Data = await search.getContent(contentId, 'base64'); // For MCP clients
|
|
233
|
+
|
|
234
|
+
// Batch content retrieval for efficiency
|
|
235
|
+
const contentIds = ['id1', 'id2', 'id3'];
|
|
236
|
+
const contents = await search.getContentBatch(contentIds, 'base64');
|
|
237
|
+
|
|
238
|
+
// Content management with deduplication
|
|
239
|
+
const stats = await ingestion.getStorageStats();
|
|
240
|
+
console.log(`Content directory: ${stats.contentDirSize} bytes, ${stats.fileCount} files`);
|
|
241
|
+
|
|
242
|
+
// Cleanup orphaned content
|
|
243
|
+
const cleanupResult = await ingestion.cleanup();
|
|
244
|
+
console.log(`Removed ${cleanupResult.removedFiles} orphaned files`);
|
|
63
245
|
```
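Content deduplication, as mentioned in the comments above, usually comes down to deriving the content ID from the content itself. The following is a minimal sketch under that assumption, not the library's implementation: `ToyContentStore` is hypothetical, and the djb2 string hash is only a stand-in for whatever hash the real content system uses.

```typescript
// Minimal in-memory content store with hash-based deduplication.
// djb2 is used purely for illustration; it is not collision-resistant.

function djb2(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // hash * 33 + charCode, wrapped to an unsigned 32-bit integer.
    hash = (((hash << 5) + hash) + text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

class ToyContentStore {
  private readonly byId = new Map<string, { displayName: string; body: string }>();

  // The ID is derived from the body, so identical bodies share one entry.
  ingest(body: string, displayName: string): string {
    const id = djb2(body);
    if (!this.byId.has(id)) {
      this.byId.set(id, { displayName, body });
    }
    return id;
  }

  get(id: string): string | undefined {
    return this.byId.get(id)?.body;
  }

  get fileCount(): number {
    return this.byId.size;
  }
}

const store = new ToyContentStore();
const first = store.ingest('# AI Guide', 'AI Guide.md');
const second = store.ingest('# AI Guide', 'copy-of-guide.md');
console.log(first === second, store.fileCount);
```

Ingesting the same bytes twice yields the same ID and a single stored entry, which is the property that makes cleanup of orphaned content tractable.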
|
|
64
246
|
|
|
65
247
|
#### Configuration Options
|
|
@@ -84,95 +266,335 @@ const ingestion = new IngestionPipeline('./db.sqlite', './vector-index.bin', {
|
|
|
84
266
|
|
|
85
267
|
→ **[Complete CLI Reference](docs/cli-reference.md)** | **[API Documentation](docs/api-reference.md)**
|
|
86
268
|
|
|
87
|
-
|
|
269
|
+
---
|
|
88
270
|
|
|
89
|
-
|
|
90
|
-
- 🏠 **Local-first**: All processing happens offline on your machine
|
|
91
|
-
- 🚀 **Fast**: Sub-100ms queries for typical document collections
|
|
92
|
-
- 🔍 **Semantic**: Uses embeddings for meaning-based search, not just keywords
|
|
93
|
-
- 🛠️ **Flexible**: Simple constructors for basic use, advanced options when you need them
|
|
94
|
-
- 📦 **Complete**: CLI, programmatic API, and MCP server in one package
|
|
95
|
-
- 🎯 **TypeScript**: Full type safety with modern ESM architecture
|
|
96
|
-
- 🧠 **Smart**: Automatic model management and compatibility checking
|
|
271
|
+
## 💡 Real-World Examples
|
|
97
272
|
|
|
98
|
-
|
|
273
|
+
<details>
|
|
274
|
+
<summary><b>🔍 Build a Documentation Search Engine</b></summary>
|
|
99
275
|
|
|
100
|
-
|
|
276
|
+
```typescript
|
|
277
|
+
import { SearchEngine, IngestionPipeline } from 'rag-lite-ts';
|
|
101
278
|
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
4. **Embedding Generation**: Uses transformers.js models for semantic vectors
|
|
106
|
-
5. **Vector Storage**: Fast similarity search with hnswlib-wasm
|
|
107
|
-
6. **Metadata Storage**: SQLite for document info and model compatibility
|
|
108
|
-
7. **Search**: Embeds queries and finds similar chunks using cosine similarity
|
|
109
|
-
8. **Reranking** (optional): Cross-encoder models for improved relevance
|
|
279
|
+
// Ingest your docs once
|
|
280
|
+
const pipeline = new IngestionPipeline('./db.sqlite', './index.bin');
|
|
281
|
+
await pipeline.ingestDirectory('./docs/');
|
|
110
282
|
|
|
111
|
-
|
|
283
|
+
// Search instantly
|
|
284
|
+
const search = new SearchEngine('./index.bin', './db.sqlite');
|
|
285
|
+
const results = await search.search('authentication flow');
|
|
112
286
|
|
|
287
|
+
results.forEach(r => {
|
|
288
|
+
console.log(`${r.metadata.title}: ${r.text}`);
|
|
289
|
+
console.log(`Relevance: ${r.score.toFixed(3)}\n`);
|
|
290
|
+
});
|
|
113
291
|
```
|
|
114
|
-
|
|
115
|
-
|
|
116
|
-
|
|
292
|
+
|
|
293
|
+
**Use case:** Internal documentation, API references, knowledge bases
|
|
294
|
+
|
|
295
|
+
</details>
|
|
296
|
+
|
|
297
|
+
<details>
|
|
298
|
+
<summary><b>🖼️ Search Images with Natural Language</b></summary>
|
|
299
|
+
|
|
300
|
+
```bash
|
|
301
|
+
# Ingest mixed content (text + images)
|
|
302
|
+
raglite ingest ./assets/ --mode multimodal
|
|
303
|
+
|
|
304
|
+
# Find images using text descriptions
|
|
305
|
+
raglite search "architecture diagram" --content-type image
|
|
306
|
+
raglite search "team photo" --content-type image
|
|
307
|
+
raglite search "product screenshot" --content-type image
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
**Use case:** Digital asset management, photo libraries, design systems
|
|
311
|
+
|
|
312
|
+
</details>
|
|
313
|
+
|
|
314
|
+
<details>
|
|
315
|
+
<summary><b>🤖 AI Agent with Memory</b></summary>
|
|
316
|
+
|
|
317
|
+
```typescript
|
|
318
|
+
// Agent ingests conversation context
|
|
319
|
+
const content = Buffer.from('User prefers dark mode. Uses TypeScript.');
|
|
320
|
+
await pipeline.ingestFromMemory(content, {
|
|
321
|
+
displayName: 'user-preferences.txt'
|
|
322
|
+
});
|
|
323
|
+
|
|
324
|
+
// Later, agent retrieves relevant context
|
|
325
|
+
const context = await search.search('user interface preferences');
|
|
326
|
+
// Agent now knows: "User prefers dark mode"
|
|
327
|
+
```
|
|
328
|
+
|
|
329
|
+
**Use case:** Chatbots, AI assistants, context-aware agents
|
|
330
|
+
|
|
331
|
+
</details>
|
|
332
|
+
|
|
333
|
+
<details>
|
|
334
|
+
<summary><b>📊 Semantic Code Search</b></summary>
|
|
335
|
+
|
|
336
|
+
```typescript
|
|
337
|
+
// Index your codebase
|
|
338
|
+
await pipeline.ingestDirectory('./src/', {
|
|
339
|
+
chunkSize: 500, // Larger chunks for code
|
|
340
|
+
chunkOverlap: 100
|
|
341
|
+
});
|
|
342
|
+
|
|
343
|
+
// Find code by intent, not keywords
|
|
344
|
+
const results = await search.search('authentication middleware');
|
|
345
|
+
// Finds relevant code even if it doesn't contain those exact words
|
|
117
346
|
```
|
|
118
347
|
|
|
348
|
+
**Use case:** Code navigation, refactoring, onboarding
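To make the `chunkSize` and `chunkOverlap` options more concrete, here is a rough sketch of overlapping chunking. It assumes both values count whitespace-separated tokens and uses a fixed sliding window; the library itself describes its chunking as splitting at natural boundaries with token limits, so treat `chunkTokens` as a hypothetical simplification rather than the actual algorithm.

```typescript
// Sliding-window chunking over whitespace-separated tokens.
// Assumption: sizes are measured in tokens; a real tokenizer would differ.

function chunkTokens(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const tokens = text.split(/\s+/).filter(token => token.length > 0);
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(' '));
    // Stop once the window has reached the end of the input.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

const sample = 'one two three four five six seven eight nine ten';
const sampleChunks = chunkTokens(sample, 4, 1);
console.log(sampleChunks);
```

Each chunk repeats the last token of the previous one, so a sentence or function signature that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.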
|
|
349
|
+
|
|
350
|
+
</details>
|
|
351
|
+
|
|
352
|
+
<details>
|
|
353
|
+
<summary><b>🔌 MCP Server for Claude/AI Tools</b></summary>
|
|
354
|
+
|
|
355
|
+
```json
|
|
356
|
+
{
|
|
357
|
+
"mcpServers": {
|
|
358
|
+
"my-docs": {
|
|
359
|
+
"command": "raglite-mcp",
|
|
360
|
+
"env": {
|
|
361
|
+
"RAG_DB_FILE": "./docs/db.sqlite",
|
|
362
|
+
"RAG_INDEX_FILE": "./docs/index.bin"
|
|
363
|
+
}
|
|
364
|
+
}
|
|
365
|
+
}
|
|
366
|
+
}
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
Now Claude can search your docs directly! Works with any MCP-compatible AI tool.
|
|
370
|
+
|
|
371
|
+
**Use case:** AI-powered documentation, intelligent assistants
|
|
372
|
+
|
|
373
|
+
</details>
|
|
374
|
+
|
|
375
|
+
---
|
|
376
|
+
|
|
377
|
+
## ✨ Features
|
|
378
|
+
|
|
379
|
+
<table>
|
|
380
|
+
<tr>
|
|
381
|
+
<td width="50%">
|
|
382
|
+
|
|
383
|
+
### 🎯 Developer Experience
|
|
384
|
+
- **One-line setup** - `new SearchEngine()` just works
|
|
385
|
+
- **TypeScript native** - Full type safety
|
|
386
|
+
- **Zero config** - Sensible defaults everywhere
|
|
387
|
+
- **Hackable** - Clean architecture, easy to extend
|
|
388
|
+
|
|
389
|
+
</td>
|
|
390
|
+
<td width="50%">
|
|
391
|
+
|
|
392
|
+
### 🚀 Performance
|
|
393
|
+
- **Sub-100ms queries** - Fast vector search
|
|
394
|
+
- **Offline-first** - No network calls
|
|
395
|
+
- **Efficient chunking** - Smart semantic boundaries
|
|
396
|
+
- **Optimized models** - Multiple quality/speed options
|
|
397
|
+
|
|
398
|
+
</td>
|
|
399
|
+
</tr>
|
|
400
|
+
<tr>
|
|
401
|
+
<td width="50%">
|
|
402
|
+
|
|
403
|
+
### 🦎 Chameleon Architecture
|
|
404
|
+
- **Auto-adapting** - Text or multimodal mode
|
|
405
|
+
- **Mode persistence** - Set once, auto-detected
|
|
406
|
+
- **No fallbacks** - Reliable or clear failure
|
|
407
|
+
- **Polymorphic runtime** - Same API, different modes
|
|
408
|
+
|
|
409
|
+
</td>
|
|
410
|
+
<td width="50%">
|
|
411
|
+
|
|
412
|
+
### 🖼️ Multimodal Search
|
|
413
|
+
- **CLIP unified space** - Text and images together
|
|
414
|
+
- **Cross-modal queries** - Text finds images, vice versa
|
|
415
|
+
- **Multiple strategies** - Text-derived, metadata, hybrid
|
|
416
|
+
- **Seamless experience** - Same commands, more power
|
|
417
|
+
|
|
418
|
+
</td>
|
|
419
|
+
</tr>
|
|
420
|
+
<tr>
|
|
421
|
+
<td width="50%">
|
|
422
|
+
|
|
423
|
+
### 🔌 Integration Ready
|
|
424
|
+
- **MCP server included** - AI agent integration
|
|
425
|
+
- **Memory ingestion** - Direct buffer processing
|
|
426
|
+
- **Format-adaptive** - File paths or base64 data
|
|
427
|
+
- **Multi-instance** - Run multiple databases
|
|
428
|
+
|
|
429
|
+
</td>
|
|
430
|
+
<td width="50%">
|
|
431
|
+
|
|
432
|
+
### 🛠️ Production Ready
|
|
433
|
+
- **Content management** - Deduplication, cleanup
|
|
434
|
+
- **Model compatibility** - Auto-detection, rebuilds
|
|
435
|
+
- **Error recovery** - Clear messages, helpful hints
|
|
436
|
+
|
|
437
|
+
</td>
|
|
438
|
+
</tr>
|
|
439
|
+
</table>
|
|
440
|
+
|
|
441
|
+
## 🔧 How It Works
|
|
442
|
+
|
|
443
|
+
RAG-lite TS follows a clean, efficient pipeline:
|
|
444
|
+
|
|
445
|
+
```
|
|
446
|
+
📄 Documents → 🧹 Preprocessing → ✂️ Chunking → 🧠 Embedding → 💾 Storage
|
|
447
|
+
↓
|
|
448
|
+
🎯 Results ← 🔄 Reranking ← 🔍 Vector Search ← 🧠 Query Embedding ← ❓ Query
|
|
449
|
+
```
|
|
450
|
+
|
|
451
|
+
### Pipeline Steps
|
|
452
|
+
|
|
453
|
+
| Step | What Happens | Technologies |
|
|
454
|
+
|------|--------------|--------------|
|
|
455
|
+
| **1. Ingestion** | Reads `.md`, `.txt`, `.pdf`, `.docx`, images | Native parsers |
|
|
456
|
+
| **2. Preprocessing** | Cleans JSX, Mermaid, code blocks, generates image descriptions | Custom processors |
|
|
457
|
+
| **3. Chunking** | Splits at natural boundaries with token limits | Semantic chunking |
|
|
458
|
+
| **4. Embedding** | Converts text/images to vectors | transformers.js |
|
|
459
|
+
| **5. Storage** | Indexes vectors, stores metadata | hnswlib + SQLite |
|
|
460
|
+
| **6. Search** | Finds similar chunks via cosine similarity | HNSW algorithm |
|
|
461
|
+
| **7. Reranking** | Re-scores results for relevance | Cross-encoder/metadata |
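Step 6 can be illustrated with an exact, brute-force version of the search. HNSW is an approximate index that avoids scanning every vector; the sketch below scans all of them, which is fine for a handful of chunks and makes the scoring easy to follow. It relies on one well-known fact: once vectors are L2-normalized, cosine similarity is just a dot product. The names `normalize`, `topK`, and `chunkIndex` are illustrative, not part of the package.

```typescript
// Exact (brute-force) top-k search over L2-normalized vectors.
// After normalization, cosine similarity reduces to a plain dot product.

function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) throw new RangeError('Cannot normalize a zero vector');
  return v.map(x => x / norm);
}

function topK(
  query: number[],
  index: number[][],
  k: number
): { position: number; score: number }[] {
  const q = normalize(query);
  return index
    .map((vector, position) => ({
      position,
      score: vector.reduce((sum, x, i) => sum + x * q[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Normalize once at index time so queries only need dot products.
const chunkIndex = [[3, 4], [1, 0], [0, 2]].map(normalize);
const hits = topK([2, 0], chunkIndex, 2);
console.log(hits);
```

An approximate index trades a small amount of recall for much faster lookups on large collections, while a reranking step can then re-score only the few candidates that survive.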
|
|
462
|
+
|
|
463
|
+
### 🦎 Chameleon Architecture
|
|
464
|
+
|
|
465
|
+
The system **automatically adapts** based on your content:
|
|
466
|
+
|
|
467
|
+
<table>
|
|
468
|
+
<tr>
|
|
469
|
+
<td width="50%">
|
|
470
|
+
|
|
471
|
+
#### 📝 Text Mode
|
|
472
|
+
```
|
|
473
|
+
Text Docs → Sentence Transformer
|
|
474
|
+
↓
|
|
475
|
+
384D Vectors
|
|
476
|
+
↓
|
|
477
|
+
HNSW Index + SQLite
|
|
478
|
+
↓
|
|
479
|
+
Cross-Encoder Reranking
|
|
480
|
+
```
|
|
481
|
+
|
|
482
|
+
**Best for:** Documentation, articles, code
|
|
483
|
+
|
|
484
|
+
</td>
|
|
485
|
+
<td width="50%">
|
|
486
|
+
|
|
487
|
+
#### 🖼️ Multimodal Mode
|
|
488
|
+
```
|
|
489
|
+
Text + Images → CLIP Embedder
|
|
490
|
+
↓
|
|
491
|
+
512D Unified Space
|
|
492
|
+
↓
|
|
493
|
+
HNSW Index + SQLite
|
|
494
|
+
↓
|
|
495
|
+
Text-Derived Reranking
|
|
496
|
+
```
|
|
497
|
+
|
|
498
|
+
**Best for:** Mixed content, visual search
|
|
499
|
+
|
|
500
|
+
</td>
|
|
501
|
+
</tr>
|
|
502
|
+
</table>
|
|
503
|
+
|
|
504
|
+
**🎯 Key Benefits:**
|
|
505
|
+
- Set mode **once** during ingestion → Auto-detected during search
|
|
506
|
+
- **Cross-modal search** - Text queries find images, image queries find text
|
|
507
|
+
- **No fallback complexity** - Each mode works reliably or fails clearly
|
|
508
|
+
- **Same API** - Your code doesn't change between modes
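One way to picture "set once, auto-detected" is a small configuration record written at ingestion time and read back at search time. The sketch below assumes exactly that and nothing more: a plain object stands in for the database, and `SystemInfo`, `ingest`, and `openSearch` are hypothetical names. The model names and dimensions are the defaults listed in this README.

```typescript
// Sketch of "set mode once, auto-detect later" using a persisted config record.
// A plain object stands in for the database; all names here are hypothetical.

type Mode = 'text' | 'multimodal';

interface SystemInfo {
  mode: Mode;
  modelName: string;
  dimensions: number;
}

interface Embedder {
  readonly dimensions: number;
  describe(): string;
}

// Stand-in for a metadata table.
const fakeDatabase: { systemInfo?: SystemInfo } = {};

function ingest(mode: Mode): void {
  // Ingestion records the mode alongside the model that produced the vectors.
  fakeDatabase.systemInfo =
    mode === 'multimodal'
      ? { mode, modelName: 'Xenova/clip-vit-base-patch32', dimensions: 512 }
      : { mode, modelName: 'sentence-transformers/all-MiniLM-L6-v2', dimensions: 384 };
}

function openSearch(): Embedder {
  const info = fakeDatabase.systemInfo;
  // No silent fallback: a missing record is reported as a clear error.
  if (!info) throw new Error('No mode recorded; ingest content first');
  return {
    dimensions: info.dimensions,
    describe: () => `${info.mode} search with ${info.modelName}`,
  };
}

ingest('multimodal');
const searchEmbedder = openSearch();
console.log(searchEmbedder.describe());
```

The caller of `openSearch` never names a mode, which is the sense in which the same calling code works for both text-only and multimodal databases.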
|
|
509
|
+
|
|
119
510
|
→ **[Document Preprocessing Guide](docs/preprocessing.md)** | **[Model Management Details](models/README.md)**
|
|
120
511
|
|
|
121
|
-
## Supported Models
|
|
512
|
+
## 🧠 Supported Models
|
|
122
513
|
|
|
123
|
-
|
|
514
|
+
Choose the right model for your use case:
|
|
124
515
|
|
|
125
|
-
|
|
126
|
-
|-------|------------|-------|----------|
|
|
127
|
-
| `sentence-transformers/all-MiniLM-L6-v2` | 384 | Fast | General purpose (default) |
|
|
128
|
-
| `Xenova/all-mpnet-base-v2` | 768 | Slower | Higher quality, complex queries |
|
|
516
|
+
### 📝 Text Mode Models
|
|
129
517
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
-
|
|
133
|
-
-
|
|
134
|
-
|
|
518
|
+
| Model | Dims | Speed | Quality | Best For |
|
|
519
|
+
|-------|------|-------|---------|----------|
|
|
520
|
+
| `sentence-transformers/all-MiniLM-L6-v2` ⭐ | 384 | ⚡⚡⚡ | ⭐⭐⭐ | General purpose (default) |
|
|
521
|
+
| `Xenova/all-mpnet-base-v2` | 768 | ⚡⚡ | ⭐⭐⭐⭐ | Complex queries, higher accuracy |
|
|
522
|
+
|
|
523
|
+
### 🖼️ Multimodal Models
|
|
524
|
+
|
|
525
|
+
| Model | Dims | Speed | Quality | Best For |
|
|
526
|
+
|-------|------|-------|---------|----------|
|
|
527
|
+
| `Xenova/clip-vit-base-patch32` ⭐ | 512 | ⚡⚡ | ⭐⭐⭐ | Text + images (default) |
|
|
528
|
+
| `Xenova/clip-vit-base-patch16` | 512 | ⚡ | ⭐⭐⭐⭐ | Higher visual quality |
|
|
529
|
+
|
|
530
|
+
### ✨ Model Features
|
|
531
|
+
|
|
532
|
+
- ✅ **Auto-download** - Models cached locally on first use
|
|
533
|
+
- ✅ **Smart compatibility** - Detects model changes, prompts rebuilds
|
|
534
|
+
- ✅ **Offline support** - Pre-download for air-gapped environments
|
|
535
|
+
- ✅ **Zero config** - Works out of the box with sensible defaults
|
|
536
|
+
- ✅ **Cross-modal** - CLIP enables text ↔ image search
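
One way to picture the "smart compatibility" check is a comparison between the model recorded with an existing index and the model currently requested. The sketch below is illustrative only: the dimensions come from the model tables above, while `IndexMetadata`, `checkModelCompatibility`, and the messages are hypothetical and not taken from the rag-lite-ts source.

```typescript
// Dimensions as listed in the model tables; everything else is hypothetical.
const MODEL_DIMENSIONS: Record<string, number> = {
  "sentence-transformers/all-MiniLM-L6-v2": 384,
  "Xenova/all-mpnet-base-v2": 768,
  "Xenova/clip-vit-base-patch32": 512,
  "Xenova/clip-vit-base-patch16": 512,
};

interface IndexMetadata {
  modelName: string;
  dimensions: number;
}

interface Compatibility {
  ok: boolean;
  reason?: string;
}

function checkModelCompatibility(
  index: IndexMetadata,
  requestedModel: string
): Compatibility {
  const requestedDims = MODEL_DIMENSIONS[requestedModel];
  if (requestedDims === undefined) {
    return { ok: false, reason: `Unknown model: ${requestedModel}` };
  }
  if (requestedDims !== index.dimensions) {
    return {
      ok: false,
      reason: `Index holds ${index.dimensions}D vectors but ${requestedModel} produces ${requestedDims}D; rebuild required.`,
    };
  }
  if (requestedModel !== index.modelName) {
    // Equal dimensions do not imply the same embedding space.
    return {
      ok: false,
      reason: `Index was built with ${index.modelName}; rebuild required to use ${requestedModel}.`,
    };
  }
  return { ok: true };
}

const existingIndex: IndexMetadata = {
  modelName: "sentence-transformers/all-MiniLM-L6-v2",
  dimensions: 384,
};
console.log(checkModelCompatibility(existingIndex, "Xenova/all-mpnet-base-v2"));
```

The third branch matters for the two CLIP variants: both produce 512-dimensional vectors, yet vectors from one model are not comparable with vectors from the other.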
|
|
135
537
|
|
|
136
538
|
→ **[Complete Model Guide](docs/model-guide.md)** | **[Performance Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md)**
|
|
137
539
|
|
|
138
|
-
## Documentation
|
|
540
|
+
## 📚 Documentation
|
|
541
|
+
|
|
542
|
+
<table>
|
|
543
|
+
<tr>
|
|
544
|
+
<td width="33%">
|
|
139
545
|
|
|
140
|
-
###
|
|
141
|
-
-
|
|
142
|
-
-
|
|
546
|
+
### 🚀 Getting Started
|
|
547
|
+
- [CLI Reference](docs/cli-reference.md)
|
|
548
|
+
- [API Reference](docs/api-reference.md)
|
|
549
|
+
- [Multimodal Tutorial](docs/multimodal-tutorial.md)
|
|
550
|
+
- [Unified Content System](docs/unified-content-system.md)
|
|
143
551
|
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
-
|
|
552
|
+
</td>
|
|
553
|
+
<td width="33%">
|
|
554
|
+
|
|
555
|
+
### 🔧 Advanced
|
|
556
|
+
- [Configuration Guide](docs/configuration.md)
|
|
557
|
+
- [Model Selection](docs/model-guide.md)
|
|
558
|
+
- [Multimodal Config](docs/multimodal-configuration.md)
|
|
559
|
+
- [Path Strategies](docs/path-strategies.md)
|
|
560
|
+
|
|
561
|
+
</td>
|
|
562
|
+
<td width="33%">
|
|
149
563
|
|
|
150
564
|
### 🛠️ Support
|
|
151
|
-
-
|
|
565
|
+
- [Troubleshooting](docs/troubleshooting.md)
|
|
566
|
+
- [Multimodal Issues](docs/multimodal-troubleshooting.md)
|
|
567
|
+
- [Content Issues](docs/unified-content-troubleshooting.md)
|
|
568
|
+
- [Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md)
|
|
569
|
+
|
|
570
|
+
</td>
|
|
571
|
+
</tr>
|
|
572
|
+
</table>
|
|
152
573
|
|
|
153
|
-
###
|
|
154
|
-
- **[Embedding Models Comparison](docs/EMBEDDING_MODELS_COMPARISON.md)** - Detailed benchmarks
|
|
155
|
-
- **[Documentation Hub](docs/README.md)** - Complete documentation index
|
|
574
|
+
### 🎯 Quick Start by Role
|
|
156
575
|
|
|
157
|
-
|
|
576
|
+
| I want to... | Start here |
|
|
577
|
+
|--------------|------------|
|
|
578
|
+
| 🆕 Try it out | [CLI Reference](docs/cli-reference.md) → `npm i -g rag-lite-ts` |
|
|
579
|
+
| 🖼️ Search images | [Multimodal Tutorial](docs/multimodal-tutorial.md) → `--mode multimodal` |
|
|
580
|
+
| 💻 Build an app | [API Reference](docs/api-reference.md) → `new SearchEngine()` |
|
|
581
|
+
| 🤖 Integrate with AI | [MCP Guide](docs/mcp-server-multimodal-guide.md) → `raglite-mcp` |
|
|
582
|
+
| ⚡ Optimize performance | [Model Guide](docs/model-guide.md) → Choose your model |
|
|
583
|
+
| 🐛 Fix an issue | [Troubleshooting](docs/troubleshooting.md) → Common solutions |
|
|
158
584
|
|
|
159
|
-
|
|
160
|
-
|-----------|------------|------------|
|
|
161
|
-
| **New Users** | [CLI Reference](docs/cli-reference.md) | [API Reference](docs/api-reference.md) |
|
|
162
|
-
| **App Developers** | [API Reference](docs/api-reference.md) | [Configuration Guide](docs/configuration.md) |
|
|
163
|
-
| **Performance Optimizers** | [Model Guide](docs/model-guide.md) | [Performance Benchmarks](docs/EMBEDDING_MODELS_COMPARISON.md) |
|
|
164
|
-
| **Production Deployers** | [Configuration Guide](docs/configuration.md) | [Path Strategies](docs/path-strategies.md) |
|
|
165
|
-
| **Troubleshooters** | [Troubleshooting Guide](docs/troubleshooting.md) | [Preprocessing Guide](docs/preprocessing.md) |
|
|
585
|
+
**📖 [Complete Documentation Hub](docs/README.md)**
|
|
166
586
|
|
|
167
|
-
## MCP Server Integration
|
|
587
|
+
## 🔌 MCP Server Integration
|
|
168
588
|
|
|
169
|
-
RAG-lite TS includes a Model Context Protocol (MCP) server
|
|
589
|
+
**Give your AI agents semantic memory.** RAG-lite TS includes a built-in Model Context Protocol (MCP) server.
|
|
170
590
|
|
|
171
591
|
```bash
|
|
172
|
-
# Start MCP server
|
|
592
|
+
# Start MCP server (works with Claude, Cline, and other MCP clients)
|
|
173
593
|
raglite-mcp
|
|
174
594
|
```
|
|
175
595
|
|
|
596
|
+
### Single Instance Configuration
|
|
597
|
+
|
|
176
598
|
**MCP Configuration:**
|
|
177
599
|
```json
|
|
178
600
|
{
|
|
@@ -185,11 +607,54 @@ raglite-mcp
|
|
|
185
607
|
}
|
|
186
608
|
```
|
|
187
609
|
|
|
188
|
-
|
|
610
|
+
### Multiple Instance Configuration (NEW)
|
|
611
|
+
|
|
612
|
+
Run multiple MCP server instances for different databases with **intelligent routing**:
|
|
613
|
+
|
|
614
|
+
```json
|
|
615
|
+
{
|
|
616
|
+
"mcpServers": {
|
|
617
|
+
"rag-lite-text-docs": {
|
|
618
|
+
"command": "npx",
|
|
619
|
+
"args": ["rag-lite-mcp"],
|
|
620
|
+
"env": {
|
|
621
|
+
"RAG_DB_FILE": "./text-docs/db.sqlite",
|
|
622
|
+
"RAG_INDEX_FILE": "./text-docs/index.bin"
|
|
623
|
+
}
|
|
624
|
+
},
|
|
625
|
+
"rag-lite-multimodal-images": {
|
|
626
|
+
"command": "npx",
|
|
627
|
+
"args": ["rag-lite-mcp"],
|
|
628
|
+
"env": {
|
|
629
|
+
"RAG_DB_FILE": "./mixed-content/db.sqlite",
|
|
630
|
+
"RAG_INDEX_FILE": "./mixed-content/index.bin"
|
|
631
|
+
}
|
|
632
|
+
}
|
|
633
|
+
}
|
|
634
|
+
}
|
|
635
|
+
```
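
When many databases need their own server entry, the JSON above can be generated rather than written by hand. The sketch below assumes only what the example shows: one `npx` entry per database plus the `RAG_DB_FILE` and `RAG_INDEX_FILE` environment variables. `buildMcpServers` and `DatabaseLocation` are hypothetical helpers, and the `db.sqlite` / `index.bin` file names simply mirror the example.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical input shape: a server name and the directory holding its files.
interface DatabaseLocation {
  name: string;
  dir: string;
}

function buildMcpServers(
  databases: DatabaseLocation[]
): { mcpServers: Record<string, McpServerEntry> } {
  const seen = new Set<string>();
  const mcpServers: Record<string, McpServerEntry> = {};
  for (const { name, dir } of databases) {
    // Each instance needs a unique key so clients can tell them apart.
    if (seen.has(name)) {
      throw new Error(`Duplicate server name: ${name}`);
    }
    seen.add(name);
    mcpServers[name] = {
      command: "npx",
      args: ["rag-lite-mcp"], // copied verbatim from the example above
      env: {
        RAG_DB_FILE: `${dir}/db.sqlite`,
        RAG_INDEX_FILE: `${dir}/index.bin`,
      },
    };
  }
  return { mcpServers };
}

const generatedConfig = buildMcpServers([
  { name: "rag-lite-text-docs", dir: "./text-docs" },
  { name: "rag-lite-multimodal-images", dir: "./mixed-content" },
]);
console.log(JSON.stringify(generatedConfig, null, 2));
```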
|
|
636
|
+
|
|
637
|
+
**Dynamic Tool Descriptions:**
|
|
638
|
+
Each server automatically detects and advertises its capabilities:
|
|
639
|
+
- `[TEXT MODE]` - Text-only databases clearly indicate supported file types
|
|
640
|
+
- `[MULTIMODAL MODE]` - Multimodal databases advertise image support and cross-modal search
|
|
641
|
+
- AI assistants can intelligently route queries to the appropriate database
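
To illustrate how a client could act on these markers, the sketch below picks candidate servers by looking for `[TEXT MODE]` or `[MULTIMODAL MODE]` in an advertised tool description. It is an assumption-laden illustration: `ServerInfo`, `detectMode`, `pickServers`, and the sample descriptions are hypothetical, and real MCP clients typically leave this decision to the AI assistant reading the descriptions.

```typescript
type DetectedMode = "text" | "multimodal" | "unknown";

// Hypothetical view of a server as seen by a client.
interface ServerInfo {
  name: string;
  toolDescription: string;
}

function detectMode(description: string): DetectedMode {
  if (description.includes("[MULTIMODAL MODE]")) return "multimodal";
  if (description.includes("[TEXT MODE]")) return "text";
  return "unknown";
}

// Image queries need a multimodal database; text queries can use any
// server whose mode was recognised.
function pickServers(servers: ServerInfo[], wantsImages: boolean): string[] {
  return servers
    .filter((server) => {
      const mode = detectMode(server.toolDescription);
      return wantsImages ? mode === "multimodal" : mode !== "unknown";
    })
    .map((server) => server.name);
}

// Sample descriptions are invented for the example.
const advertisedServers: ServerInfo[] = [
  { name: "rag-lite-text-docs", toolDescription: "[TEXT MODE] Search text documents" },
  { name: "rag-lite-multimodal-images", toolDescription: "[MULTIMODAL MODE] Search text and images" },
];

console.log(pickServers(advertisedServers, true));
console.log(pickServers(advertisedServers, false));
```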
|
|
642
|
+
|
|
643
|
+
**Available Tools:** `search`, `ingest`, `ingest_image`, `multimodal_search`, `rebuild_index`, `get_stats`, `get_mode_info`, `list_supported_models`, `list_reranking_strategies`, `get_system_stats`
|
|
644
|
+
|
|
645
|
+
**Multimodal Features:**
|
|
646
|
+
- Search across text and image content
|
|
647
|
+
- Retrieve image content as base64 data
|
|
648
|
+
- Cross-modal search capabilities (text queries find images)
|
|
649
|
+
- Automatic mode detection from database
|
|
650
|
+
- Content type filtering
|
|
651
|
+
- Multiple reranking strategies
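
Cross-modal search with content type filtering rests on one idea: text and images are embedded into the same vector space, so a single similarity ranking covers both. The sketch below shows that idea with brute-force cosine similarity over toy 3-dimensional vectors. This is a deliberate simplification; it stands in for the HNSW index and the 512-dimensional CLIP embeddings described earlier, and `IndexedItem`, `cosineSimilarity`, and `searchSharedSpace` are hypothetical names.

```typescript
type ContentType = "text" | "image";

interface IndexedItem {
  id: string;
  contentType: ContentType;
  vector: number[];
}

interface ScoredResult {
  id: string;
  contentType: ContentType;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / Math.sqrt(normA * normB);
}

// Brute-force ranking: a stand-in for an approximate nearest-neighbour index.
function searchSharedSpace(
  query: number[],
  items: IndexedItem[],
  options: { topK: number; contentType?: ContentType }
): ScoredResult[] {
  return items
    .filter((item) => !options.contentType || item.contentType === options.contentType)
    .map((item) => ({
      id: item.id,
      contentType: item.contentType,
      score: cosineSimilarity(query, item.vector),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, options.topK);
}

// Toy data: in practice the vectors would come from the embedder.
const toyItems: IndexedItem[] = [
  { id: "cat-photo.jpg", contentType: "image", vector: [0.9, 0.1, 0] },
  { id: "cats.md", contentType: "text", vector: [0.8, 0.2, 0] },
  { id: "invoice.png", contentType: "image", vector: [0, 1, 0] },
];
const toyTextQuery = [1, 0, 0]; // pretend embedding of a text query

console.log(searchSharedSpace(toyTextQuery, toyItems, { topK: 2 }));
console.log(searchSharedSpace(toyTextQuery, toyItems, { topK: 2, contentType: "image" }));
```

The first call ranks an image and a text document against the same text query; the second restricts the ranking to images only.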
|
|
652
|
+
|
|
653
|
+
→ **[Complete MCP Integration Guide](docs/cli-reference.md#mcp-server)** | **[MCP Multimodal Guide](docs/mcp-server-multimodal-guide.md)** | **[Multi-Instance Setup](docs/mcp-server-multimodal-guide.md#running-multiple-mcp-server-instances)**
|
|
189
654
|
|
|
190
|
-
|
|
655
|
+
---
|
|
191
656
|
|
|
192
|
-
## Development
|
|
657
|
+
## 🛠️ Development
|
|
193
658
|
|
|
194
659
|
### Building from Source
|
|
195
660
|
|
|
@@ -220,13 +685,17 @@ src/
|
|
|
220
685
|
│ ├── ingestion.ts # Core ingestion pipeline
|
|
221
686
|
│ ├── db.ts # SQLite operations
|
|
222
687
|
│ ├── config.ts # Configuration system
|
|
688
|
+
│ ├── content-manager.ts # Content storage and management
|
|
223
689
|
│ └── types.ts # Core type definitions
|
|
224
|
-
├── factories/ # Factory functions for easy setup
|
|
225
|
-
│ └── text-factory.ts # Text-specific factories
|
|
226
690
|
├── text/ # Text-specific implementations
|
|
227
|
-
│ ├── embedder.ts #
|
|
228
|
-
│ ├── reranker.ts #
|
|
691
|
+
│ ├── embedder.ts # Sentence-transformer embedder
|
|
692
|
+
│ ├── reranker.ts # Cross-encoder reranking
|
|
229
693
|
│ └── tokenizer.ts # Text tokenization
|
|
694
|
+
├── multimodal/ # Multimodal implementations
|
|
695
|
+
│ ├── embedder.ts # CLIP embedder (text + images)
|
|
696
|
+
│ ├── reranker.ts # Text-derived and metadata reranking
|
|
697
|
+
│ ├── image-processor.ts # Image description and metadata
|
|
698
|
+
│ └── content-types.ts # Content type detection
|
|
230
699
|
├── cli.ts # CLI interface
|
|
231
700
|
├── mcp-server.ts # MCP server
|
|
232
701
|
└── preprocessors/ # Content type processors
|
|
@@ -246,24 +715,67 @@ dist/ # Compiled output
|
|
|
246
715
|
|
|
247
716
|
This approach ensures that basic usage is effortless while providing the flexibility needed for advanced scenarios.
|
|
248
717
|
|
|
718
|
+
---
|
|
249
719
|
|
|
720
|
+
## 🤝 Contributing
|
|
250
721
|
|
|
251
|
-
|
|
722
|
+
We welcome contributions! Whether it's:
|
|
252
723
|
|
|
724
|
+
- 🐛 Bug fixes
|
|
725
|
+
- ✨ New features
|
|
726
|
+
- 📝 Documentation improvements
|
|
727
|
+
- 🧪 Test coverage
|
|
728
|
+
- 💡 Ideas and suggestions
|
|
729
|
+
|
|
730
|
+
**Guidelines:**
|
|
253
731
|
1. Fork the repository
|
|
254
|
-
2. Create a feature branch
|
|
255
|
-
3. Make your changes
|
|
256
|
-
4.
|
|
257
|
-
5.
|
|
258
|
-
6. Submit a pull request
|
|
732
|
+
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
|
733
|
+
3. Make your changes with tests
|
|
734
|
+
4. Ensure all tests pass (`npm test`)
|
|
735
|
+
5. Submit a pull request
|
|
259
736
|
|
|
260
|
-
We
|
|
737
|
+
We maintain clean architecture principles while enhancing functionality and developer experience.
|
|
261
738
|
|
|
262
|
-
|
|
739
|
+
---
|
|
263
740
|
|
|
264
|
-
|
|
741
|
+
## 🎯 Why We Built This
|
|
742
|
+
|
|
743
|
+
Existing RAG solutions are either:
|
|
744
|
+
- 🔴 **Too complex** - Require extensive setup and configuration
|
|
745
|
+
- 🔴 **Cloud-dependent** - Need API keys and external services
|
|
746
|
+
- 🔴 **Python-only** - Not ideal for TypeScript/Node.js projects
|
|
747
|
+
- 🔴 **Heavy** - Massive dependencies and slow startup
|
|
748
|
+
|
|
749
|
+
**RAG-lite TS is different:**
|
|
750
|
+
- ✅ **Simple** - Works out of the box with zero config
|
|
751
|
+
- ✅ **Local-first** - Your data stays on your machine
|
|
752
|
+
- ✅ **TypeScript native** - Built for modern JS/TS projects
|
|
753
|
+
- ✅ **Lightweight** - Fast startup, minimal dependencies
|
|
754
|
+
|
|
755
|
+
---
|
|
265
756
|
|
|
266
|
-
##
|
|
757
|
+
## 🙏 Acknowledgments
|
|
267
758
|
|
|
268
|
-
|
|
759
|
+
Built with amazing open-source projects:
|
|
760
|
+
|
|
761
|
+
- **[transformers.js](https://github.com/xenova/transformers.js)** - Client-side ML models by Xenova
|
|
269
762
|
- **[hnswlib](https://github.com/nmslib/hnswlib)** - Fast approximate nearest neighbor search
|
|
763
|
+
- **[better-sqlite3](https://github.com/WiseLibs/better-sqlite3)** - Fast SQLite3 bindings
|
|
764
|
+
|
|
765
|
+
---
|
|
766
|
+
|
|
767
|
+
## 📄 License
|
|
768
|
+
|
|
769
|
+
MIT License - see [LICENSE](LICENSE) file for details.
|
|
770
|
+
|
|
771
|
+
---
|
|
772
|
+
|
|
773
|
+
<div align="center">
|
|
774
|
+
|
|
775
|
+
**⭐ Star us on GitHub — it helps!**
|
|
776
|
+
|
|
777
|
+
[Report Bug](https://github.com/your-username/rag-lite-ts/issues) • [Request Feature](https://github.com/your-username/rag-lite-ts/issues) • [Documentation](docs/README.md)
|
|
778
|
+
|
|
779
|
+
Made with ❤️ by developers, for developers
|
|
780
|
+
|
|
781
|
+
</div>
|