mcp-local-rag 0.4.2 → 0.5.1
- package/README.md +58 -3
- package/dist/bin/install-skills.d.ts +17 -0
- package/dist/bin/install-skills.d.ts.map +1 -0
- package/dist/bin/install-skills.js +194 -0
- package/dist/bin/install-skills.js.map +1 -0
- package/dist/embedder/index.d.ts.map +1 -1
- package/dist/embedder/index.js +2 -0
- package/dist/embedder/index.js.map +1 -1
- package/dist/parser/html-parser.d.ts +14 -0
- package/dist/parser/html-parser.d.ts.map +1 -0
- package/dist/parser/html-parser.js +99 -0
- package/dist/parser/html-parser.js.map +1 -0
- package/dist/server/index.d.ts +46 -3
- package/dist/server/index.d.ts.map +1 -1
- package/dist/server/index.js +170 -19
- package/dist/server/index.js.map +1 -1
- package/dist/server/raw-data-utils.d.ts +84 -0
- package/dist/server/raw-data-utils.d.ts.map +1 -0
- package/dist/server/raw-data-utils.js +170 -0
- package/dist/server/raw-data-utils.js.map +1 -0
- package/package.json +23 -6
- package/skills/mcp-local-rag/SKILL.md +110 -0
- package/skills/mcp-local-rag/references/html-ingestion.md +73 -0
- package/skills/mcp-local-rag/references/query-optimization.md +57 -0
- package/skills/mcp-local-rag/references/result-refinement.md +54 -0
package/README.md
CHANGED
@@ -2,6 +2,8 @@
 
 [](https://www.npmjs.com/package/mcp-local-rag)
 [](https://opensource.org/licenses/MIT)
+[](https://www.typescriptlang.org/)
+[](https://registry.modelcontextprotocol.io/)
 
 Local RAG for developers using MCP.
 Semantic search with keyword boost for exact technical terms — fully private, zero setup.
@@ -86,8 +88,8 @@ You want AI to search your documents—technical specs, research papers, interna
 
 ## Usage
 
-The server provides
-(`ingest_file`, `query_documents`, `list_files`, `delete_file`, `status`).
+The server provides 6 MCP tools: ingest file, ingest data, search, list, delete, status
+(`ingest_file`, `ingest_data`, `query_documents`, `list_files`, `delete_file`, `status`).
 
 ### Ingesting Documents
 
@@ -99,6 +101,23 @@ Supports PDF, DOCX, TXT, and Markdown. The server extracts text, splits it into
 
 Re-ingesting the same file replaces the old version automatically.
 
+### Ingesting HTML Content
+
+Use `ingest_data` to ingest HTML content retrieved by your AI assistant (via web fetch, curl, browser tools, etc.):
+
+```
+"Fetch https://example.com/docs and ingest the HTML"
+```
+
+The server extracts main content using Readability (removes navigation, ads, etc.), converts to Markdown, and indexes it. Perfect for:
+- Web documentation
+- HTML retrieved by the AI assistant
+- Clipboard content
+
+HTML is automatically cleaned—you get the article content, not the boilerplate.
+
+> **Note:** The RAG server itself doesn't fetch web content—your AI assistant retrieves it and passes the HTML to `ingest_data`. This keeps the server fully local while letting you index any content your assistant can access. Please respect website terms of service and copyright when ingesting external content.
+
 ### Searching Documents
 
 ```
@@ -169,6 +188,42 @@ When you search:
 
 The keyword boost ensures exact terms like `useEffect` or error codes rank higher when they match.
 
+## Agent Skills
+
+[Agent Skills](https://agentskills.io/) provide optimized prompts that help AI assistants use RAG tools more effectively. Install skills for better query formulation, result interpretation, and ingestion workflows:
+
+```bash
+# Claude Code (project-level)
+npx mcp-local-rag-skills --claude-code
+
+# Claude Code (user-level)
+npx mcp-local-rag-skills --claude-code --global
+
+# Codex
+npx mcp-local-rag-skills --codex
+```
+
+Skills include:
+- **Query optimization**: Better search query formulation
+- **Result interpretation**: Score thresholds and filtering guidelines
+- **HTML ingestion**: Format selection and source naming
+
+### Ensuring Skill Activation
+
+Skills are loaded automatically in most cases—AI assistants scan skill metadata and load relevant instructions when needed. For consistent behavior:
+
+**Option 1: Explicit request (natural language)**
+Before RAG operations, request in natural language:
+- "Use the mcp-local-rag skill for this search"
+- "Apply RAG best practices from skills"
+
+**Option 2: Add to agent instruction file**
+Add to your `AGENTS.md`, `CLAUDE.md`, or other agent instruction file:
+```
+When using query_documents, ingest_file, or ingest_data tools,
+apply the mcp-local-rag skill for optimal query formulation and result interpretation.
+```
+
 <details>
 <summary><strong>Configuration</strong></summary>
 
@@ -301,7 +356,7 @@ Yes, after the first model download (~90MB).
 Cloud services offer better accuracy at scale but require sending data externally. This trades some accuracy for complete privacy and zero runtime cost.
 
 **What file formats are supported?**
-PDF, DOCX, TXT, Markdown. Not yet: Excel, PowerPoint, images
+PDF, DOCX, TXT, Markdown, and HTML (via `ingest_data`). Not yet: Excel, PowerPoint, images.
 
 **Can I change the embedding model?**
 Yes, but you must delete your database and re-ingest all documents. Different models produce incompatible vector dimensions.
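The README changes above describe `ingest_data` as receiving HTML that the assistant has already fetched. As a rough sketch of what the arguments of such a call might look like, the snippet below builds a payload with the `content`, `metadata.source` and `metadata.format` fields named in the package's type declarations. The helper name `buildIngestDataArgs` and the literal format value `'html'` are assumptions made for illustration; the accepted format strings are defined in `raw-data-utils`, which is not shown here.

```javascript
// Hypothetical helper: assembles arguments for an `ingest_data` tool call.
// Field names follow IngestDataInput / IngestDataMetadata from the package's
// type declarations; the 'html' format value is an assumption.
function buildIngestDataArgs(html, sourceUrl) {
  if (typeof html !== 'string' || html.trim().length === 0) {
    throw new Error('content must be a non-empty string');
  }
  if (typeof sourceUrl !== 'string' || sourceUrl.length === 0) {
    throw new Error('source must be a non-empty string');
  }
  return {
    content: html,
    metadata: {
      source: sourceUrl, // URL or custom ID such as "clipboard://2024-12-30"
      format: 'html', // assumed ContentFormat value
    },
  };
}

const exampleArgs = buildIngestDataArgs(
  '<article><h1>Docs</h1><p>Hello</p></article>',
  'https://example.com/docs'
);
console.log(JSON.stringify(exampleArgs, null, 2));
```

The server is then expected to run extraction and indexing on `content`; nothing in this sketch fetches the page itself, which matches the README's note that fetching is the assistant's job.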
@@ -0,0 +1,17 @@
+#!/usr/bin/env node
+/**
+ * MCP Local RAG Skills Installer
+ *
+ * Installs skills to various AI coding assistants:
+ * - Claude Code (project or global)
+ * - OpenAI Codex
+ * - Custom path
+ *
+ * Usage:
+ * npx mcp-local-rag-skills --claude-code # Project-level
+ * npx mcp-local-rag-skills --claude-code --global # User-level
+ * npx mcp-local-rag-skills --codex # Codex
+ * npx mcp-local-rag-skills --path /custom/path # Custom
+ */
+export {};
+//# sourceMappingURL=install-skills.d.ts.map
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"install-skills.d.ts","sourceRoot":"","sources":["../../src/bin/install-skills.ts"],"names":[],"mappings":";AAEA;;;;;;;;;;;;;GAaG"}
|
|
@@ -0,0 +1,194 @@
+#!/usr/bin/env node
+"use strict";
+/**
+ * MCP Local RAG Skills Installer
+ *
+ * Installs skills to various AI coding assistants:
+ * - Claude Code (project or global)
+ * - OpenAI Codex
+ * - Custom path
+ *
+ * Usage:
+ * npx mcp-local-rag-skills --claude-code # Project-level
+ * npx mcp-local-rag-skills --claude-code --global # User-level
+ * npx mcp-local-rag-skills --codex # Codex
+ * npx mcp-local-rag-skills --path /custom/path # Custom
+ */
+Object.defineProperty(exports, "__esModule", { value: true });
+const node_fs_1 = require("node:fs");
+const node_os_1 = require("node:os");
+const node_path_1 = require("node:path");
+// ============================================
+// Constants
+// ============================================
+// Skills source directory (relative to dist/bin when compiled)
+// dist/bin/install-skills.js -> dist/skills/mcp-local-rag
+// But skills are actually in package root: skills/mcp-local-rag
+// So from dist/bin, go up twice: ../.. then skills/mcp-local-rag
+const SKILLS_SOURCE = (0, node_path_1.resolve)(__dirname, '..', '..', 'skills', 'mcp-local-rag');
+// Codex home directory (supports CODEX_HOME environment variable)
+// https://developers.openai.com/codex/local-config/
+const CODEX_HOME = process.env['CODEX_HOME'] || (0, node_path_1.join)((0, node_os_1.homedir)(), '.codex');
+// Installation targets
+const TARGETS = {
+    'claude-code-project': './.claude/skills/mcp-local-rag',
+    'claude-code-global': (0, node_path_1.join)((0, node_os_1.homedir)(), '.claude', 'skills', 'mcp-local-rag'),
+    'codex-project': './.codex/skills/mcp-local-rag',
+    'codex-global': (0, node_path_1.join)(CODEX_HOME, 'skills', 'mcp-local-rag'),
+};
+function parseArgs(args) {
+    const options = {
+        target: 'claude-code-project',
+        help: false,
+    };
+    for (let i = 0; i < args.length; i++) {
+        const arg = args[i];
+        switch (arg) {
+            case '--help':
+            case '-h':
+                options.help = true;
+                break;
+            case '--claude-code':
+                // Check for --global flag
+                if (args[i + 1] === '--global') {
+                    options.target = 'claude-code-global';
+                    i++; // Skip next arg
+                }
+                else {
+                    options.target = 'claude-code-project';
+                }
+                break;
+            case '--codex':
+                // Check for --project or --global flag
+                if (args[i + 1] === '--project') {
+                    options.target = 'codex-project';
+                    i++; // Skip next arg
+                }
+                else if (args[i + 1] === '--global') {
+                    options.target = 'codex-global';
+                    i++; // Skip next arg
+                }
+                else {
+                    // Default to global (matches previous behavior)
+                    options.target = 'codex-global';
+                }
+                break;
+            case '--path': {
+                const pathArg = args[i + 1];
+                if (!pathArg) {
+                    console.error('Error: --path requires a path argument');
+                    process.exit(1);
+                }
+                options.target = 'custom';
+                options.customPath = pathArg;
+                i++; // Skip next arg
+                break;
+            }
+            default:
+                if (arg?.startsWith('-')) {
+                    console.error(`Unknown option: ${arg}`);
+                    process.exit(1);
+                }
+        }
+    }
+    return options;
+}
+// ============================================
+// Help Message
+// ============================================
+function printHelp() {
+    console.log(`
+MCP Local RAG Skills Installer
+
+Usage:
+npx mcp-local-rag-skills [options]
+
+Options:
+--claude-code Install to project-level Claude Code skills
+(./.claude/skills/)
+
+--claude-code --global Install to user-level Claude Code skills
+(~/.claude/skills/)
+
+--codex Install to user-level Codex skills (default)
+($CODEX_HOME/skills/ or ~/.codex/skills/)
+
+--codex --project Install to project-level Codex skills
+(./.codex/skills/)
+
+--codex --global Install to user-level Codex skills
+($CODEX_HOME/skills/ or ~/.codex/skills/)
+
+--path <path> Install to custom path
+
+--help, -h Show this help message
+
+Examples:
+npx mcp-local-rag-skills --claude-code
+npx mcp-local-rag-skills --claude-code --global
+npx mcp-local-rag-skills --codex
+npx mcp-local-rag-skills --codex --project
+npx mcp-local-rag-skills --path ./my-skills/
+`);
+}
+// ============================================
+// Installation
+// ============================================
+function getTargetPath(options) {
+    if (options.target === 'custom') {
+        if (!options.customPath) {
+            console.error('Error: Custom path not specified');
+            process.exit(1);
+        }
+        return (0, node_path_1.resolve)(options.customPath, 'mcp-local-rag');
+    }
+    return TARGETS[options.target];
+}
+function install(targetPath) {
+    // Check source exists
+    if (!(0, node_fs_1.existsSync)(SKILLS_SOURCE)) {
+        console.error(`Error: Skills source not found at ${SKILLS_SOURCE}`);
+        process.exit(1);
+    }
+    // Create target directory
+    const targetDir = (0, node_path_1.dirname)(targetPath);
+    if (!(0, node_fs_1.existsSync)(targetDir)) {
+        (0, node_fs_1.mkdirSync)(targetDir, { recursive: true });
+        console.log(`Created directory: ${targetDir}`);
+    }
+    // Copy skills
+    (0, node_fs_1.cpSync)(SKILLS_SOURCE, targetPath, { recursive: true });
+    console.log(`Installed skills to: ${targetPath}`);
+}
+// ============================================
+// Main
+// ============================================
+function main() {
+    const args = process.argv.slice(2);
+    // Default to help if no args
+    if (args.length === 0) {
+        printHelp();
+        process.exit(0);
+    }
+    const options = parseArgs(args);
+    if (options.help) {
+        printHelp();
+        process.exit(0);
+    }
+    const targetPath = getTargetPath(options);
+    console.log('Installing MCP Local RAG skills...');
+    console.log(`Target: ${options.target}`);
+    console.log(`Path: ${targetPath}`);
+    console.log();
+    install(targetPath);
+    console.log();
+    console.log('Installation complete!');
+    console.log();
+    console.log('The following skills are now available:');
+    console.log(' - mcp-local-rag (SKILL.md)');
+    console.log(' - references/html-ingestion.md');
+    console.log(' - references/query-optimization.md');
+    console.log(' - references/result-refinement.md');
+}
+main();
+//# sourceMappingURL=install-skills.js.map
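The flag handling in `parseArgs` is position-sensitive: `--global` and `--project` are only recognised when they immediately follow `--claude-code` or `--codex`. The sketch below restates that rule as a pure function so the behaviour can be checked without touching the filesystem or calling `process.exit`. `resolveTarget` is a hypothetical name, and throwing an `Error` in place of exiting is a simplification for illustration, not how the installer itself behaves.

```javascript
// Hypothetical, simplified restatement of the installer's flag rules.
// Returns the target key; throws instead of calling process.exit.
function resolveTarget(args) {
  let target = 'claude-code-project'; // same default as the installer
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    const next = args[i + 1];
    if (arg === '--help' || arg === '-h') {
      continue; // help handling is left out of this sketch
    } else if (arg === '--claude-code') {
      if (next === '--global') {
        target = 'claude-code-global';
        i++; // consume the modifier
      } else {
        target = 'claude-code-project';
      }
    } else if (arg === '--codex') {
      if (next === '--project') {
        target = 'codex-project';
        i++;
      } else {
        // bare --codex and --codex --global both mean user-level
        target = 'codex-global';
        if (next === '--global') i++;
      }
    } else if (arg === '--path') {
      if (!next) throw new Error('--path requires a path argument');
      target = 'custom';
      i++; // consume the path value
    } else if (arg.startsWith('-')) {
      // A modifier that does not directly follow its tool flag lands here.
      throw new Error(`Unknown option: ${arg}`);
    }
  }
  return target;
}

console.log(resolveTarget(['--claude-code', '--global']));
console.log(resolveTarget(['--codex']));
console.log(resolveTarget(['--codex', '--project']));
```

Under this rule `--global --claude-code` is rejected, because `--global` is encountered first and is not a standalone option.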
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"install-skills.js","sourceRoot":"","sources":["../../src/bin/install-skills.ts"],"names":[],"mappings":";;AAEA;;;;;;;;;;;;;GAaG;;AAEH,qCAAuD;AACvD,qCAAiC;AACjC,yCAAkD;AAElD,+CAA+C;AAC/C,YAAY;AACZ,+CAA+C;AAE/C,+DAA+D;AAC/D,0DAA0D;AAC1D,gEAAgE;AAChE,iEAAiE;AACjE,MAAM,aAAa,GAAG,IAAA,mBAAO,EAAC,SAAS,EAAE,IAAI,EAAE,IAAI,EAAE,QAAQ,EAAE,eAAe,CAAC,CAAA;AAE/E,kEAAkE;AAClE,oDAAoD;AACpD,MAAM,UAAU,GAAG,OAAO,CAAC,GAAG,CAAC,YAAY,CAAC,IAAI,IAAA,gBAAI,EAAC,IAAA,iBAAO,GAAE,EAAE,QAAQ,CAAC,CAAA;AAEzE,uBAAuB;AACvB,MAAM,OAAO,GAAG;IACd,qBAAqB,EAAE,gCAAgC;IACvD,oBAAoB,EAAE,IAAA,gBAAI,EAAC,IAAA,iBAAO,GAAE,EAAE,SAAS,EAAE,QAAQ,EAAE,eAAe,CAAC;IAC3E,eAAe,EAAE,+BAA+B;IAChD,cAAc,EAAE,IAAA,gBAAI,EAAC,UAAU,EAAE,QAAQ,EAAE,eAAe,CAAC;CACnD,CAAA;AAYV,SAAS,SAAS,CAAC,IAAc;IAC/B,MAAM,OAAO,GAAY;QACvB,MAAM,EAAE,qBAAqB;QAC7B,IAAI,EAAE,KAAK;KACZ,CAAA;IAED,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,MAAM,EAAE,CAAC,EAAE,EAAE,CAAC;QACrC,MAAM,GAAG,GAAG,IAAI,CAAC,CAAC,CAAC,CAAA;QAEnB,QAAQ,GAAG,EAAE,CAAC;YACZ,KAAK,QAAQ,CAAC;YACd,KAAK,IAAI;gBACP,OAAO,CAAC,IAAI,GAAG,IAAI,CAAA;gBACnB,MAAK;YAEP,KAAK,eAAe;gBAClB,0BAA0B;gBAC1B,IAAI,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,KAAK,UAAU,EAAE,CAAC;oBAC/B,OAAO,CAAC,MAAM,GAAG,oBAAoB,CAAA;oBACrC,CAAC,EAAE,CAAA,CAAC,gBAAgB;gBACtB,CAAC;qBAAM,CAAC;oBACN,OAAO,CAAC,MAAM,GAAG,qBAAqB,CAAA;gBACxC,CAAC;gBACD,MAAK;YAEP,KAAK,SAAS;gBACZ,uCAAuC;gBACvC,IAAI,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,KAAK,WAAW,EAAE,CAAC;oBAChC,OAAO,CAAC,MAAM,GAAG,eAAe,CAAA;oBAChC,CAAC,EAAE,CAAA,CAAC,gBAAgB;gBACtB,CAAC;qBAAM,IAAI,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,KAAK,UAAU,EAAE,CAAC;oBACtC,OAAO,CAAC,MAAM,GAAG,cAAc,CAAA;oBAC/B,CAAC,EAAE,CAAA,CAAC,gBAAgB;gBACtB,CAAC;qBAAM,CAAC;oBACN,gDAAgD;oBAChD,OAAO,CAAC,MAAM,GAAG,cAAc,CAAA;gBACjC,CAAC;gBACD,MAAK;YAEP,KAAK,QAAQ,CAAC,CAAC,CAAC;gBACd,MAAM,OAAO,GAAG,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAA;gBAC3B,IAAI,CAAC,OAAO,EAAE,CAAC;oBACb,OAAO,CAAC,KAAK,CAAC,wCAAwC,CAAC,CAAA;oBACvD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;gBACjB,CAAC;gBACD,OAAO,CAAC,MAAM,GAAG,QAAQ,CAAA;gBACzB,OAAO,CAAC,UA
AU,GAAG,OAAO,CAAA;gBAC5B,CAAC,EAAE,CAAA,CAAC,gBAAgB;gBACpB,MAAK;YACP,CAAC;YAED;gBACE,IAAI,GAAG,EAAE,UAAU,CAAC,GAAG,CAAC,EAAE,CAAC;oBACzB,OAAO,CAAC,KAAK,CAAC,mBAAmB,GAAG,EAAE,CAAC,CAAA;oBACvC,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;gBACjB,CAAC;QACL,CAAC;IACH,CAAC;IAED,OAAO,OAAO,CAAA;AAChB,CAAC;AAED,+CAA+C;AAC/C,eAAe;AACf,+CAA+C;AAE/C,SAAS,SAAS;IAChB,OAAO,CAAC,GAAG,CAAC;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAgCb,CAAC,CAAA;AACF,CAAC;AAED,+CAA+C;AAC/C,eAAe;AACf,+CAA+C;AAE/C,SAAS,aAAa,CAAC,OAAgB;IACrC,IAAI,OAAO,CAAC,MAAM,KAAK,QAAQ,EAAE,CAAC;QAChC,IAAI,CAAC,OAAO,CAAC,UAAU,EAAE,CAAC;YACxB,OAAO,CAAC,KAAK,CAAC,kCAAkC,CAAC,CAAA;YACjD,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;QACjB,CAAC;QACD,OAAO,IAAA,mBAAO,EAAC,OAAO,CAAC,UAAU,EAAE,eAAe,CAAC,CAAA;IACrD,CAAC;IAED,OAAO,OAAO,CAAC,OAAO,CAAC,MAAM,CAAC,CAAA;AAChC,CAAC;AAED,SAAS,OAAO,CAAC,UAAkB;IACjC,sBAAsB;IACtB,IAAI,CAAC,IAAA,oBAAU,EAAC,aAAa,CAAC,EAAE,CAAC;QAC/B,OAAO,CAAC,KAAK,CAAC,qCAAqC,aAAa,EAAE,CAAC,CAAA;QACnE,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;IACjB,CAAC;IAED,0BAA0B;IAC1B,MAAM,SAAS,GAAG,IAAA,mBAAO,EAAC,UAAU,CAAC,CAAA;IACrC,IAAI,CAAC,IAAA,oBAAU,EAAC,SAAS,CAAC,EAAE,CAAC;QAC3B,IAAA,mBAAS,EAAC,SAAS,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAA;QACzC,OAAO,CAAC,GAAG,CAAC,sBAAsB,SAAS,EAAE,CAAC,CAAA;IAChD,CAAC;IAED,cAAc;IACd,IAAA,gBAAM,EAAC,aAAa,EAAE,UAAU,EAAE,EAAE,SAAS,EAAE,IAAI,EAAE,CAAC,CAAA;IACtD,OAAO,CAAC,GAAG,CAAC,wBAAwB,UAAU,EAAE,CAAC,CAAA;AACnD,CAAC;AAED,+CAA+C;AAC/C,OAAO;AACP,+CAA+C;AAE/C,SAAS,IAAI;IACX,MAAM,IAAI,GAAG,OAAO,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC,CAAC,CAAA;IAElC,6BAA6B;IAC7B,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACtB,SAAS,EAAE,CAAA;QACX,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;IACjB,CAAC;IAED,MAAM,OAAO,GAAG,SAAS,CAAC,IAAI,CAAC,CAAA;IAE/B,IAAI,OAAO,CAAC,IAAI,EAAE,CAAC;QACjB,SAAS,EAAE,CAAA;QACX,OAAO,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;IACjB,CAAC;IAED,MAAM,UAAU,GAAG,aAAa,CAAC,OAAO,CAAC,CAAA;IAEzC,OAAO,CAAC,GAAG,CAAC,oCAAoC,CAAC,CAAA;IACjD,OAAO,CAAC,GAAG,CAAC,WAAW,OAAO,CAAC,MAAM,EAAE,CAAC,CAAA;IACxC,OAAO,CAAC,GAAG,CAAC,SAAS,UAAU,EAAE,CAAC,CAAA;IAClC,OAAO,C
AAC,GAAG,EAAE,CAAA;IAEb,OAAO,CAAC,UAAU,CAAC,CAAA;IAEnB,OAAO,CAAC,GAAG,EAAE,CAAA;IACb,OAAO,CAAC,GAAG,CAAC,wBAAwB,CAAC,CAAA;IACrC,OAAO,CAAC,GAAG,EAAE,CAAA;IACb,OAAO,CAAC,GAAG,CAAC,yCAAyC,CAAC,CAAA;IACtD,OAAO,CAAC,GAAG,CAAC,8BAA8B,CAAC,CAAA;IAC3C,OAAO,CAAC,GAAG,CAAC,kCAAkC,CAAC,CAAA;IAC/C,OAAO,CAAC,GAAG,CAAC,sCAAsC,CAAC,CAAA;IACnD,OAAO,CAAC,GAAG,CAAC,qCAAqC,CAAC,CAAA;AACpD,CAAC;AAED,IAAI,EAAE,CAAA"}
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/embedder/index.ts"],"names":[],"mappings":"AAQA;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,6BAA6B;IAC7B,SAAS,EAAE,MAAM,CAAA;IACjB,iBAAiB;IACjB,SAAS,EAAE,MAAM,CAAA;IACjB,4BAA4B;IAC5B,QAAQ,EAAE,MAAM,CAAA;CACjB;AAMD;;GAEG;AACH,qBAAa,cAAe,SAAQ,KAAK;aAGZ,KAAK,CAAC,EAAE,KAAK;gBADtC,OAAO,EAAE,MAAM,EACU,KAAK,CAAC,EAAE,KAAK,YAAA;CAKzC;AAMD;;;;;;;GAOG;AACH,qBAAa,QAAQ;
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/embedder/index.ts"],"names":[],"mappings":"AAQA;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,6BAA6B;IAC7B,SAAS,EAAE,MAAM,CAAA;IACjB,iBAAiB;IACjB,SAAS,EAAE,MAAM,CAAA;IACjB,4BAA4B;IAC5B,QAAQ,EAAE,MAAM,CAAA;CACjB;AAMD;;GAEG;AACH,qBAAa,cAAe,SAAQ,KAAK;aAGZ,KAAK,CAAC,EAAE,KAAK;gBADtC,OAAO,EAAE,MAAM,EACU,KAAK,CAAC,EAAE,KAAK,YAAA;CAKzC;AAMD;;;;;;;GAOG;AACH,qBAAa,QAAQ;IAEnB,OAAO,CAAC,KAAK,CAAgB;IAC7B,OAAO,CAAC,WAAW,CAA6B;IAChD,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAgB;gBAE3B,MAAM,EAAE,cAAc;IAIlC;;OAEG;IACG,UAAU,IAAI,OAAO,CAAC,IAAI,CAAC;IAuBjC;;;OAGG;YACW,iBAAiB;IA+B/B;;;;;OAKG;IACG,KAAK,CAAC,IAAI,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,EAAE,CAAC;IAiC5C;;;;;OAKG;IACG,UAAU,CAAC,KAAK,EAAE,MAAM,EAAE,GAAG,OAAO,CAAC,MAAM,EAAE,EAAE,CAAC;CA0BvD"}
|
package/dist/embedder/index.js
CHANGED
@@ -30,6 +30,7 @@ exports.EmbeddingError = EmbeddingError;
  */
 class Embedder {
     constructor(config) {
+        // Using unknown to avoid TS2590 (union type too complex with @types/jsdom)
         this.model = null;
         this.initPromise = null;
         this.config = config;
@@ -47,6 +48,7 @@ class Embedder {
             transformers_1.env.cacheDir = this.config.cacheDir;
             console.error(`Embedder: Setting cache directory to "${this.config.cacheDir}"`);
             console.error(`Embedder: Loading model "${this.config.modelPath}"...`);
+            // Use type assertion to avoid TS2590 (union type too complex with @types/jsdom)
             this.model = await (0, transformers_1.pipeline)('feature-extraction', this.config.modelPath);
             console.error('Embedder: Model loaded successfully');
         }
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/embedder/index.ts"],"names":[],"mappings":";AAAA,+CAA+C;;;AAE/C,4DAAyD;AAkBzD,+CAA+C;AAC/C,gBAAgB;AAChB,+CAA+C;AAE/C;;GAEG;AACH,MAAa,cAAe,SAAQ,KAAK;IACvC,YACE,OAAe,EACU,KAAa;QAEtC,KAAK,CAAC,OAAO,CAAC,CAAA;QAFW,UAAK,GAAL,KAAK,CAAQ;QAGtC,IAAI,CAAC,IAAI,GAAG,gBAAgB,CAAA;IAC9B,CAAC;CACF;AARD,wCAQC;AAED,+CAA+C;AAC/C,iBAAiB;AACjB,+CAA+C;AAE/C;;;;;;;GAOG;AACH,MAAa,QAAQ;
|
|
1
|
+
{"version":3,"file":"index.js","sourceRoot":"","sources":["../../src/embedder/index.ts"],"names":[],"mappings":";AAAA,+CAA+C;;;AAE/C,4DAAyD;AAkBzD,+CAA+C;AAC/C,gBAAgB;AAChB,+CAA+C;AAE/C;;GAEG;AACH,MAAa,cAAe,SAAQ,KAAK;IACvC,YACE,OAAe,EACU,KAAa;QAEtC,KAAK,CAAC,OAAO,CAAC,CAAA;QAFW,UAAK,GAAL,KAAK,CAAQ;QAGtC,IAAI,CAAC,IAAI,GAAG,gBAAgB,CAAA;IAC9B,CAAC;CACF;AARD,wCAQC;AAED,+CAA+C;AAC/C,iBAAiB;AACjB,+CAA+C;AAE/C;;;;;;;GAOG;AACH,MAAa,QAAQ;IAMnB,YAAY,MAAsB;QALlC,2EAA2E;QACnE,UAAK,GAAY,IAAI,CAAA;QACrB,gBAAW,GAAyB,IAAI,CAAA;QAI9C,IAAI,CAAC,MAAM,GAAG,MAAM,CAAA;IACtB,CAAC;IAED;;OAEG;IACH,KAAK,CAAC,UAAU;QACd,8BAA8B;QAC9B,IAAI,IAAI,CAAC,KAAK,EAAE,CAAC;YACf,OAAM;QACR,CAAC;QAED,IAAI,CAAC;YACH,+CAA+C;YAC/C,kBAAG,CAAC,QAAQ,GAAG,IAAI,CAAC,MAAM,CAAC,QAAQ,CAAA;YAEnC,OAAO,CAAC,KAAK,CAAC,yCAAyC,IAAI,CAAC,MAAM,CAAC,QAAQ,GAAG,CAAC,CAAA;YAC/E,OAAO,CAAC,KAAK,CAAC,4BAA4B,IAAI,CAAC,MAAM,CAAC,SAAS,MAAM,CAAC,CAAA;YACtE,gFAAgF;YAChF,IAAI,CAAC,KAAK,GAAG,MAAM,IAAA,uBAAQ,EAAC,oBAAoB,EAAE,IAAI,CAAC,MAAM,CAAC,SAAS,CAAC,CAAA;YACxE,OAAO,CAAC,KAAK,CAAC,qCAAqC,CAAC,CAAA;QACtD,CAAC;QAAC,OAAO,KAAK,EAAE,CAAC;YACf,MAAM,IAAI,cAAc,CACtB,kCAAmC,KAAe,CAAC,OAAO,EAAE,EAC5D,KAAc,CACf,CAAA;QACH,CAAC;IACH,CAAC;IAED;;;OAGG;IACK,KAAK,CAAC,iBAAiB;QAC7B,sBAAsB;QACtB,IAAI,IAAI,CAAC,KAAK,EAAE,CAAC;YACf,OAAM;QACR,CAAC;QAED,kDAAkD;QAClD,IAAI,IAAI,CAAC,WAAW,EAAE,CAAC;YACrB,MAAM,IAAI,CAAC,WAAW,CAAA;YACtB,OAAM;QACR,CAAC;QAED,uBAAuB;QACvB,OAAO,CAAC,KAAK,CACX,+FAA+F,CAChG,CAAA;QAED,IAAI,CAAC,WAAW,GAAG,IAAI,CAAC,UAAU,EAAE,CAAC,KAAK,CAAC,CAAC,KAAK,EAAE,EAAE;YACnD,8CAA8C;YAC9C,IAAI,CAAC,WAAW,GAAG,IAAI,CAAA;YAEvB,+CAA+C;YAC/C,MAAM,IAAI,cAAc,CACtB,+CAAgD,KAAe,CAAC,OAAO,wTAAwT,IAAI,CAAC,MAAM,CAAC,QAAQ,gCAAgC,EACnb,KAAc,CACf,CAAA;QACH,CAAC,CAAC,CAAA;QAEF,MAAM,IAAI,CAAC,WAAW,CAAA;IACxB,CAAC;IAED;;;;;OAKG;IACH,KAAK,CAAC,KAAK,CAAC,IAAY;QACtB,0EAA0E;QAC1E,MAAM,IAAI,CAAC,iBAAiB,EAAE,CAAA;QAE9B,IAAI,CAAC;YACH,mEAAmE;YACnE,IAAI,IAAI,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;gBACtB,MAAM,IAAI,cAAc,CAAC,0CAA0C,CAAC,CAAA;YACtE,CAAC;YAED,uEAAuE;YACvE,8FAA8F;YAC9F,
MAAM,OAAO,GAAG,EAAE,OAAO,EAAE,MAAM,EAAE,SAAS,EAAE,IAAI,EAAE,CAAA;YACpD,MAAM,SAAS,GAAG,IAAI,CAAC,KAGa,CAAA;YACpC,MAAM,MAAM,GAAG,MAAM,SAAS,CAAC,IAAI,EAAE,OAAO,CAAC,CAAA;YAE7C,qCAAqC;YACrC,MAAM,SAAS,GAAG,KAAK,CAAC,IAAI,CAAC,MAAM,CAAC,IAAI,CAAC,CAAA;YACzC,OAAO,SAAS,CAAA;QAClB,CAAC;QAAC,OAAO,KAAK,EAAE,CAAC;YACf,IAAI,KAAK,YAAY,cAAc,EAAE,CAAC;gBACpC,MAAM,KAAK,CAAA;YACb,CAAC;YACD,MAAM,IAAI,cAAc,CACtB,iCAAkC,KAAe,CAAC,OAAO,EAAE,EAC3D,KAAc,CACf,CAAA;QACH,CAAC;IACH,CAAC;IAED;;;;;OAKG;IACH,KAAK,CAAC,UAAU,CAAC,KAAe;QAC9B,0EAA0E;QAC1E,MAAM,IAAI,CAAC,iBAAiB,EAAE,CAAA;QAE9B,IAAI,KAAK,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;YACvB,OAAO,EAAE,CAAA;QACX,CAAC;QAED,IAAI,CAAC;YACH,MAAM,UAAU,GAAe,EAAE,CAAA;YAEjC,6CAA6C;YAC7C,KAAK,IAAI,CAAC,GAAG,CAAC,EAAE,CAAC,GAAG,KAAK,CAAC,MAAM,EAAE,CAAC,IAAI,IAAI,CAAC,MAAM,CAAC,SAAS,EAAE,CAAC;gBAC7D,MAAM,KAAK,GAAG,KAAK,CAAC,KAAK,CAAC,CAAC,EAAE,CAAC,GAAG,IAAI,CAAC,MAAM,CAAC,SAAS,CAAC,CAAA;gBACvD,MAAM,eAAe,GAAG,MAAM,OAAO,CAAC,GAAG,CAAC,KAAK,CAAC,GAAG,CAAC,CAAC,IAAI,EAAE,EAAE,CAAC,IAAI,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,CAAC,CAAA;gBAChF,UAAU,CAAC,IAAI,CAAC,GAAG,eAAe,CAAC,CAAA;YACrC,CAAC;YAED,OAAO,UAAU,CAAA;QACnB,CAAC;QAAC,OAAO,KAAK,EAAE,CAAC;YACf,MAAM,IAAI,cAAc,CACtB,wCAAyC,KAAe,CAAC,OAAO,EAAE,EAClE,KAAc,CACf,CAAA;QACH,CAAC;IACH,CAAC;CACF;AA9ID,4BA8IC"}
|
|
@@ -0,0 +1,14 @@
+/**
+ * Parse HTML content and extract main content as Markdown
+ *
+ * Flow:
+ * 1. HTML string → JSDOM (DOM creation)
+ * 2. JSDOM → Readability (main content extraction, noise removal)
+ * 3. Readability result → Turndown (Markdown conversion)
+ *
+ * @param html - Raw HTML string
+ * @param url - Source URL (used for resolving relative links)
+ * @returns Markdown string of extracted content
+ */
+export declare function parseHtml(html: string, url: string): Promise<string>;
+//# sourceMappingURL=html-parser.d.ts.map
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"html-parser.d.ts","sourceRoot":"","sources":["../../src/parser/html-parser.ts"],"names":[],"mappings":"AAsDA;;;;;;;;;;;GAWG;AACH,wBAAsB,SAAS,CAAC,IAAI,EAAE,MAAM,EAAE,GAAG,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC,CAoD1E"}
|
|
@@ -0,0 +1,99 @@
+"use strict";
+// HTML Parser using Readability and Turndown
+// Extracts main content from HTML and converts to Markdown
+var __importDefault = (this && this.__importDefault) || function (mod) {
+    return (mod && mod.__esModule) ? mod : { "default": mod };
+};
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.parseHtml = parseHtml;
+const readability_1 = require("@mozilla/readability");
+const jsdom_1 = require("jsdom");
+const turndown_1 = __importDefault(require("turndown"));
+// ============================================
+// Turndown Service Configuration
+// ============================================
+/**
+ * Create and configure Turndown service for HTML to Markdown conversion
+ */
+function createTurndownService() {
+    const turndownService = new turndown_1.default({
+        headingStyle: 'atx', // Use # style headings
+        codeBlockStyle: 'fenced', // Use ``` for code blocks
+        bulletListMarker: '-', // Use - for bullet lists
+        emDelimiter: '_', // Use _ for emphasis
+        strongDelimiter: '**', // Use ** for bold
+    });
+    // Keep code blocks intact
+    turndownService.addRule('codeBlocks', {
+        filter: ['pre'],
+        replacement: (_content, node) => {
+            const element = node;
+            const codeElement = element.querySelector('code');
+            const code = codeElement ? codeElement.textContent : element.textContent;
+            const language = codeElement?.className?.replace('language-', '') || '';
+            return `\n\`\`\`${language}\n${code?.trim() || ''}\n\`\`\`\n`;
+        },
+    });
+    return turndownService;
+}
+// ============================================
+// HTML Parser
+// ============================================
+/**
+ * Parse HTML content and extract main content as Markdown
+ *
+ * Flow:
+ * 1. HTML string → JSDOM (DOM creation)
+ * 2. JSDOM → Readability (main content extraction, noise removal)
+ * 3. Readability result → Turndown (Markdown conversion)
+ *
+ * @param html - Raw HTML string
+ * @param url - Source URL (used for resolving relative links)
+ * @returns Markdown string of extracted content
+ */
+async function parseHtml(html, url) {
+    // Handle empty or whitespace-only HTML
+    if (!html || html.trim().length === 0) {
+        return '';
+    }
+    try {
+        // Create DOM from HTML string
+        const dom = new jsdom_1.JSDOM(html, {
+            url,
+            // Enable features needed for Readability
+            runScripts: 'outside-only',
+        });
+        const document = dom.window.document;
+        // Use Readability to extract main content
+        const reader = new readability_1.Readability(document, {
+            keepClasses: false,
+            debug: false,
+        });
+        const article = reader.parse();
+        // If Readability couldn't extract content, fall back to body text
+        if (!article || !article.content) {
+            // Try to get body content directly
+            const bodyContent = document.body?.innerHTML || '';
+            if (!bodyContent.trim()) {
+                return '';
+            }
+            // Convert raw body HTML to Markdown
+            const turndownService = createTurndownService();
+            return turndownService.turndown(bodyContent).trim();
+        }
+        // Convert extracted HTML content to Markdown
+        const turndownService = createTurndownService();
+        const markdown = turndownService.turndown(article.content);
+        // Add title if available
+        if (article.title) {
+            return `# ${article.title}\n\n${markdown}`.trim();
+        }
+        return markdown.trim();
+    }
+    catch (error) {
+        // Log error but don't throw - return empty string for graceful degradation
+        console.error('Failed to parse HTML:', error);
+        return '';
+    }
+}
+//# sourceMappingURL=html-parser.js.map
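The `codeBlocks` rule derives the fence language by removing the `language-` prefix from the `<code>` element's `className`. That works when `language-xyz` is the only class; when other classes are present, or when no `language-*` class exists, the remaining class text is carried into the fence label. One possible refinement, shown here as a sketch rather than as part of the package, is to pick out just the `language-*` token. The function names `extractLanguage` and `stripPrefix` are hypothetical.

```javascript
// Hypothetical helper: pull the language out of a class attribute such as
// "hljs language-python". Returns '' when no language-* class is present.
function extractLanguage(className) {
  if (!className) return '';
  const match = /(?:^|\s)language-(\S+)/.exec(className);
  return match ? match[1] : '';
}

// Behaviour of the plain string replace used in the rule, for comparison.
function stripPrefix(className) {
  return (className && className.replace('language-', '')) || '';
}

for (const cls of ['language-js', 'hljs language-python', 'highlight', '']) {
  console.log(
    JSON.stringify(cls),
    '->',
    JSON.stringify(extractLanguage(cls)),
    'vs',
    JSON.stringify(stripPrefix(cls))
  );
}
```

Whether the stricter behaviour is preferable depends on the HTML being ingested; pages that mark code with a bare class such as `python` would lose their label under this variant.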
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"html-parser.js","sourceRoot":"","sources":["../../src/parser/html-parser.ts"],"names":[],"mappings":";AAAA,6CAA6C;AAC7C,2DAA2D;;;;;AAiE3D,8BAoDC;AAnHD,sDAAkD;AAClD,iCAA6B;AAC7B,wDAAsC;AActC,+CAA+C;AAC/C,iCAAiC;AACjC,+CAA+C;AAE/C;;GAEG;AACH,SAAS,qBAAqB;IAC5B,MAAM,eAAe,GAAG,IAAI,kBAAe,CAAC;QAC1C,YAAY,EAAE,KAAK,EAAE,uBAAuB;QAC5C,cAAc,EAAE,QAAQ,EAAE,0BAA0B;QACpD,gBAAgB,EAAE,GAAG,EAAE,yBAAyB;QAChD,WAAW,EAAE,GAAG,EAAE,qBAAqB;QACvC,eAAe,EAAE,IAAI,EAAE,kBAAkB;KAC1C,CAAC,CAAA;IAEF,0BAA0B;IAC1B,eAAe,CAAC,OAAO,CAAC,YAAY,EAAE;QACpC,MAAM,EAAE,CAAC,KAAK,CAAC;QACf,WAAW,EAAE,CAAC,QAAQ,EAAE,IAAI,EAAE,EAAE;YAC9B,MAAM,OAAO,GAAG,IAAe,CAAA;YAC/B,MAAM,WAAW,GAAG,OAAO,CAAC,aAAa,CAAC,MAAM,CAAC,CAAA;YACjD,MAAM,IAAI,GAAG,WAAW,CAAC,CAAC,CAAC,WAAW,CAAC,WAAW,CAAC,CAAC,CAAC,OAAO,CAAC,WAAW,CAAA;YACxE,MAAM,QAAQ,GAAG,WAAW,EAAE,SAAS,EAAE,OAAO,CAAC,WAAW,EAAE,EAAE,CAAC,IAAI,EAAE,CAAA;YACvE,OAAO,WAAW,QAAQ,KAAK,IAAI,EAAE,IAAI,EAAE,IAAI,EAAE,YAAY,CAAA;QAC/D,CAAC;KACF,CAAC,CAAA;IAEF,OAAO,eAAe,CAAA;AACxB,CAAC;AAED,+CAA+C;AAC/C,cAAc;AACd,+CAA+C;AAE/C;;;;;;;;;;;GAWG;AACI,KAAK,UAAU,SAAS,CAAC,IAAY,EAAE,GAAW;IACvD,uCAAuC;IACvC,IAAI,CAAC,IAAI,IAAI,IAAI,CAAC,IAAI,EAAE,CAAC,MAAM,KAAK,CAAC,EAAE,CAAC;QACtC,OAAO,EAAE,CAAA;IACX,CAAC;IAED,IAAI,CAAC;QACH,8BAA8B;QAC9B,MAAM,GAAG,GAAG,IAAI,aAAK,CAAC,IAAI,EAAE;YAC1B,GAAG;YACH,yCAAyC;YACzC,UAAU,EAAE,cAAc;SAC3B,CAAC,CAAA;QAEF,MAAM,QAAQ,GAAG,GAAG,CAAC,MAAM,CAAC,QAAQ,CAAA;QAEpC,0CAA0C;QAC1C,MAAM,MAAM,GAAG,IAAI,yBAAW,CAAC,QAAQ,EAAE;YACvC,WAAW,EAAE,KAAK;YAClB,KAAK,EAAE,KAAK;SACb,CAAC,CAAA;QAEF,MAAM,OAAO,GAAG,MAAM,CAAC,KAAK,EAA8B,CAAA;QAE1D,kEAAkE;QAClE,IAAI,CAAC,OAAO,IAAI,CAAC,OAAO,CAAC,OAAO,EAAE,CAAC;YACjC,mCAAmC;YACnC,MAAM,WAAW,GAAG,QAAQ,CAAC,IAAI,EAAE,SAAS,IAAI,EAAE,CAAA;YAClD,IAAI,CAAC,WAAW,CAAC,IAAI,EAAE,EAAE,CAAC;gBACxB,OAAO,EAAE,CAAA;YACX,CAAC;YAED,oCAAoC;YACpC,MAAM,eAAe,GAAG,qBAAqB,EAAE,CAAA;YAC/C,OAAO,eAAe,CAAC,QAAQ,CAAC,WAAW,CAAC,CAAC,IAAI,EAAE,CAAA;QACrD,CAAC;QAED,6CAA6C;QAC7C,MAAM,eAAe,GAAG,qBAAqB,EAAE,CAAA;QAC/C,MAAM,QAAQ,GAAG,eAAe,CAAC,QAAQ,CAAC,
OAAO,CAAC,OAAO,CAAC,CAAA;QAE1D,yBAAyB;QACzB,IAAI,OAAO,CAAC,KAAK,EAAE,CAAC;YAClB,OAAO,KAAK,OAAO,CAAC,KAAK,OAAO,QAAQ,EAAE,CAAC,IAAI,EAAE,CAAA;QACnD,CAAC;QAED,OAAO,QAAQ,CAAC,IAAI,EAAE,CAAA;IACxB,CAAC;IAAC,OAAO,KAAK,EAAE,CAAC;QACf,2EAA2E;QAC3E,OAAO,CAAC,KAAK,CAAC,uBAAuB,EAAE,KAAK,CAAC,CAAA;QAC7C,OAAO,EAAE,CAAA;IACX,CAAC;AACH,CAAC"}
|
package/dist/server/index.d.ts
CHANGED
|
@@ -1,4 +1,5 @@
 import { type GroupingMode } from '../vectordb/index.js';
+import { type ContentFormat } from './raw-data-utils.js';
 /**
  * RAGServer configuration
  */
@@ -36,12 +37,33 @@ export interface IngestFileInput {
     /** File path */
     filePath: string;
 }
+/**
+ * ingest_data tool input metadata
+ */
+export interface IngestDataMetadata {
+    /** Source identifier: URL ("https://...") or custom ID ("clipboard://2024-12-30") */
+    source: string;
+    /** Content format */
+    format: ContentFormat;
+}
+/**
+ * ingest_data tool input
+ */
+export interface IngestDataInput {
+    /** Content to ingest (text, HTML, or Markdown) */
+    content: string;
+    /** Content metadata */
+    metadata: IngestDataMetadata;
+}
 /**
  * delete_file tool input
+ * Either filePath or source must be provided
  */
 export interface DeleteFileInput {
-    /** File path */
-    filePath
+    /** File path (for files ingested via ingest_file) */
+    filePath?: string;
+    /** Source identifier (for data ingested via ingest_data) */
+    source?: string;
 }
 /**
  * ingest_file tool output
@@ -66,6 +88,8 @@ export interface QueryResult {
     text: string;
     /** Similarity score */
     score: number;
+    /** Original source (only for raw-data files, e.g., URLs ingested via ingest_data) */
+    source?: string;
 }
 /**
  * RAG server compliant with MCP Protocol
@@ -82,6 +106,7 @@ export declare class RAGServer {
     private readonly embedder;
     private readonly chunker;
     private readonly parser;
+    private readonly dbPath;
     constructor(config: RAGServerConfig);
     /**
      * Set up MCP handlers
@@ -110,7 +135,23 @@ export declare class RAGServer {
         }];
     }>;
     /**
-     *
+     * ingest_data tool handler
+     * Saves raw content to raw-data directory and calls handleIngestFile internally
+     *
+     * For HTML content:
+     * - Parses HTML and extracts main content using Readability
+     * - Converts to Markdown for better chunking
+     * - Saves as .md file
+     */
+    handleIngestData(args: IngestDataInput): Promise<{
+        content: [{
+            type: 'text';
+            text: string;
+        }];
+    }>;
+    /**
+     * list_files tool handler
+     * Enriches raw-data files with original source information
      */
     handleListFiles(): Promise<{
         content: [{
@@ -129,6 +170,8 @@ export declare class RAGServer {
     }>;
     /**
      * delete_file tool handler
+     * Deletes chunks from VectorDB and physical raw-data files
+     * Supports both filePath (for ingest_file) and source (for ingest_data)
      */
     handleDeleteFile(args: DeleteFileInput): Promise<{
         content: [{
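The declaration changes above add an `ingest_data` input shape and make both `DeleteFileInput` fields optional, with the rule that one of `filePath` or `source` must be given. The sketch below illustrates how a caller might build those arguments. It re-declares the interfaces locally so it is self-contained. The `ContentFormat` union and the `validateDeleteInput` helper are assumptions for illustration only: `raw-data-utils.d.ts` is not shown in this section, and the server's actual validation may differ.

```typescript
// Local copies of the shapes declared in index.d.ts above.
// ASSUMPTION: ContentFormat is guessed from the "text, HTML, or Markdown"
// comment; the real union lives in raw-data-utils.d.ts.
type ContentFormat = 'text' | 'html' | 'markdown';

interface IngestDataMetadata {
  /** URL or custom ID such as "clipboard://2024-12-30" */
  source: string;
  format: ContentFormat;
}

interface IngestDataInput {
  content: string;
  metadata: IngestDataMetadata;
}

interface DeleteFileInput {
  filePath?: string;
  source?: string;
}

// HYPOTHETICAL helper, not part of the package: enforces the documented rule
// that either filePath or source must be provided, and reports which key
// would be used. filePath wins if both are set (an arbitrary choice here).
function validateDeleteInput(input: DeleteFileInput): 'filePath' | 'source' {
  if (input.filePath) return 'filePath';
  if (input.source) return 'source';
  throw new Error('Either filePath or source must be provided');
}

// Arguments for ingesting a fetched HTML page.
const ingestArgs: IngestDataInput = {
  content: '<article><h1>Title</h1><p>Body</p></article>',
  metadata: { source: 'https://example.com/post', format: 'html' },
};

// Deleting the same content later goes through `source`, not `filePath`.
const deleteArgs: DeleteFileInput = { source: ingestArgs.metadata.source };
const deleteKey = validateDeleteInput(deleteArgs);
```

Keeping the original `source` string on the ingest side is what lets the delete call use the same identifier, since content ingested this way has no caller-chosen file path.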
package/dist/server/index.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/server/index.ts"],"names":[],"mappings":"
|
|
1
|
+
{"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/server/index.ts"],"names":[],"mappings":"AAWA,OAAO,EAAE,KAAK,YAAY,EAAiC,MAAM,sBAAsB,CAAA;AACvF,OAAO,EACL,KAAK,aAAa,EAKnB,MAAM,qBAAqB,CAAA;AAM5B;;GAEG;AACH,MAAM,WAAW,eAAe;IAC9B,4BAA4B;IAC5B,MAAM,EAAE,MAAM,CAAA;IACd,iCAAiC;IACjC,SAAS,EAAE,MAAM,CAAA;IACjB,4BAA4B;IAC5B,QAAQ,EAAE,MAAM,CAAA;IAChB,8BAA8B;IAC9B,OAAO,EAAE,MAAM,CAAA;IACf,gCAAgC;IAChC,WAAW,EAAE,MAAM,CAAA;IACnB,kEAAkE;IAClE,WAAW,CAAC,EAAE,MAAM,CAAA;IACpB,qDAAqD;IACrD,QAAQ,CAAC,EAAE,YAAY,CAAA;IACvB,sFAAsF;IACtF,YAAY,CAAC,EAAE,MAAM,CAAA;CACtB;AAED;;GAEG;AACH,MAAM,WAAW,mBAAmB;IAClC,6BAA6B;IAC7B,KAAK,EAAE,MAAM,CAAA;IACb,iDAAiD;IACjD,KAAK,CAAC,EAAE,MAAM,CAAA;CACf;AAED;;GAEG;AACH,MAAM,WAAW,eAAe;IAC9B,gBAAgB;IAChB,QAAQ,EAAE,MAAM,CAAA;CACjB;AAED;;GAEG;AACH,MAAM,WAAW,kBAAkB;IACjC,qFAAqF;IACrF,MAAM,EAAE,MAAM,CAAA;IACd,qBAAqB;IACrB,MAAM,EAAE,aAAa,CAAA;CACtB;AAED;;GAEG;AACH,MAAM,WAAW,eAAe;IAC9B,kDAAkD;IAClD,OAAO,EAAE,MAAM,CAAA;IACf,uBAAuB;IACvB,QAAQ,EAAE,kBAAkB,CAAA;CAC7B;AAED;;;GAGG;AACH,MAAM,WAAW,eAAe;IAC9B,qDAAqD;IACrD,QAAQ,CAAC,EAAE,MAAM,CAAA;IACjB,4DAA4D;IAC5D,MAAM,CAAC,EAAE,MAAM,CAAA;CAChB;AAED;;GAEG;AACH,MAAM,WAAW,YAAY;IAC3B,gBAAgB;IAChB,QAAQ,EAAE,MAAM,CAAA;IAChB,kBAAkB;IAClB,UAAU,EAAE,MAAM,CAAA;IAClB,gBAAgB;IAChB,SAAS,EAAE,MAAM,CAAA;CAClB;AAED;;GAEG;AACH,MAAM,WAAW,WAAW;IAC1B,gBAAgB;IAChB,QAAQ,EAAE,MAAM,CAAA;IAChB,kBAAkB;IAClB,UAAU,EAAE,MAAM,CAAA;IAClB,WAAW;IACX,IAAI,EAAE,MAAM,CAAA;IACZ,uBAAuB;IACvB,KAAK,EAAE,MAAM,CAAA;IACb,qFAAqF;IACrF,MAAM,CAAC,EAAE,MAAM,CAAA;CAChB;AAMD;;;;;;;;GAQG;AACH,qBAAa,SAAS;IACpB,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAQ;IAC/B,OAAO,CAAC,QAAQ,CAAC,WAAW,CAAa;IACzC,OAAO,CAAC,QAAQ,CAAC,QAAQ,CAAU;IACnC,OAAO,CAAC,QAAQ,CAAC,OAAO,CAAiB;IACzC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAgB;IACvC,OAAO,CAAC,QAAQ,CAAC,MAAM,CAAQ;gBAEnB,MAAM,EAAE,eAAe;IAqCnC;;OAEG;IACH,OAAO,CAAC,aAAa;IA0IrB;;OAEG;IACG,UAAU,IAAI,OAAO,CAAC,IAAI,CAAC;IAKjC;;OAEG;IACG,oBAAoB,CACxB,IAAI,EAAE,mBAAmB,GACxB,OAAO,CAAC;QAAE,OAAO,EAAE,CAAC;YAAE,IAAI,EAAE,MAAM,CAAC;YAAC,IAAI,EAAE,MAAM,CAAA;S
AAE,CAAC,CAAA;KAAE,CAAC;IA0CzD;;OAEG;IACG,gBAAgB,CACpB,IAAI,EAAE,eAAe,GACpB,OAAO,CAAC;QAAE,OAAO,EAAE,CAAC;YAAE,IAAI,EAAE,MAAM,CAAC;YAAC,IAAI,EAAE,MAAM,CAAA;SAAE,CAAC,CAAA;KAAE,CAAC;IAmIzD;;;;;;;;OAQG;IACG,gBAAgB,CACpB,IAAI,EAAE,eAAe,GACpB,OAAO,CAAC;QAAE,OAAO,EAAE,CAAC;YAAE,IAAI,EAAE,MAAM,CAAC;YAAC,IAAI,EAAE,MAAM,CAAA;SAAE,CAAC,CAAA;KAAE,CAAC;IAyDzD;;;OAGG;IACG,eAAe,IAAI,OAAO,CAAC;QAAE,OAAO,EAAE,CAAC;YAAE,IAAI,EAAE,MAAM,CAAC;YAAC,IAAI,EAAE,MAAM,CAAA;SAAE,CAAC,CAAA;KAAE,CAAC;IA6B/E;;OAEG;IACG,YAAY,IAAI,OAAO,CAAC;QAAE,OAAO,EAAE,CAAC;YAAE,IAAI,EAAE,MAAM,CAAC;YAAC,IAAI,EAAE,MAAM,CAAA;SAAE,CAAC,CAAA;KAAE,CAAC;IAiB5E;;;;OAIG;IACG,gBAAgB,CACpB,IAAI,EAAE,eAAe,GACpB,OAAO,CAAC;QAAE,OAAO,EAAE,CAAC;YAAE,IAAI,EAAE,MAAM,CAAC;YAAC,IAAI,EAAE,MAAM,CAAA;SAAE,CAAC,CAAA;KAAE,CAAC;IA+DzD;;OAEG;IACG,GAAG,IAAI,OAAO,CAAC,IAAI,CAAC;CAK3B"}
|