@robthepcguy/rag-vault 1.5.0 → 1.5.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +0 -0
- package/README.md +1 -0
- package/dist/bin/install-skills.d.ts +0 -0
- package/dist/bin/install-skills.js +0 -0
- package/dist/chunker/index.d.ts +0 -0
- package/dist/chunker/index.js +0 -0
- package/dist/chunker/semantic-chunker.d.ts +0 -0
- package/dist/chunker/semantic-chunker.js +0 -0
- package/dist/chunker/sentence-splitter.d.ts +0 -0
- package/dist/chunker/sentence-splitter.js +0 -0
- package/dist/embedder/index.d.ts +0 -0
- package/dist/embedder/index.js +0 -0
- package/dist/errors/index.d.ts +0 -0
- package/dist/errors/index.js +0 -0
- package/dist/explainability/index.d.ts +0 -0
- package/dist/explainability/index.js +0 -0
- package/dist/explainability/keywords.d.ts +0 -0
- package/dist/explainability/keywords.js +0 -0
- package/dist/flywheel/feedback.d.ts +0 -0
- package/dist/flywheel/feedback.js +0 -0
- package/dist/flywheel/index.d.ts +0 -0
- package/dist/flywheel/index.js +0 -0
- package/dist/index.d.ts +0 -0
- package/dist/parser/html-parser.d.ts +0 -0
- package/dist/parser/html-parser.js +0 -0
- package/dist/parser/index.d.ts +0 -0
- package/dist/parser/index.js +0 -0
- package/dist/parser/pdf-filter.d.ts +0 -0
- package/dist/parser/pdf-filter.js +0 -0
- package/dist/query/index.d.ts +0 -0
- package/dist/query/index.js +0 -0
- package/dist/query/parser.d.ts +0 -0
- package/dist/query/parser.js +0 -0
- package/dist/server/index.d.ts +0 -0
- package/dist/server/index.js +0 -0
- package/dist/server/raw-data-utils.d.ts +0 -0
- package/dist/server/raw-data-utils.js +0 -0
- package/dist/server/schemas.d.ts +0 -0
- package/dist/server/schemas.js +0 -0
- package/dist/utils/config-parsers.d.ts +0 -0
- package/dist/utils/config-parsers.js +0 -0
- package/dist/utils/config.d.ts +0 -0
- package/dist/utils/config.js +0 -0
- package/dist/utils/file-utils.d.ts +0 -0
- package/dist/utils/file-utils.js +0 -0
- package/dist/utils/math.d.ts +0 -0
- package/dist/utils/math.js +0 -0
- package/dist/utils/process-handlers.d.ts +0 -0
- package/dist/utils/process-handlers.js +0 -0
- package/dist/vectordb/index.d.ts +0 -0
- package/dist/vectordb/index.js +12 -12
- package/dist/web/api-routes.d.ts +0 -0
- package/dist/web/api-routes.js +0 -0
- package/dist/web/config-routes.d.ts +0 -0
- package/dist/web/config-routes.js +0 -0
- package/dist/web/database-manager.d.ts +0 -0
- package/dist/web/database-manager.js +0 -0
- package/dist/web/http-server.d.ts +0 -0
- package/dist/web/http-server.js +0 -0
- package/dist/web/index.d.ts +0 -0
- package/dist/web/index.js +0 -0
- package/dist/web/middleware/async-handler.d.ts +0 -0
- package/dist/web/middleware/async-handler.js +0 -0
- package/dist/web/middleware/auth.d.ts +0 -0
- package/dist/web/middleware/auth.js +0 -0
- package/dist/web/middleware/error-handler.d.ts +0 -0
- package/dist/web/middleware/error-handler.js +0 -0
- package/dist/web/middleware/index.d.ts +0 -0
- package/dist/web/middleware/index.js +0 -0
- package/dist/web/middleware/rate-limit.d.ts +0 -0
- package/dist/web/middleware/rate-limit.js +0 -0
- package/dist/web/middleware/request-logger.d.ts +0 -0
- package/dist/web/middleware/request-logger.js +0 -0
- package/dist/web/types.d.ts +0 -0
- package/dist/web/types.js +0 -0
- package/package.json +37 -50
- package/skills/rag-vault/SKILL.md +111 -111
- package/skills/rag-vault/references/html-ingestion.md +73 -73
- package/skills/rag-vault/references/query-optimization.md +57 -57
- package/skills/rag-vault/references/result-refinement.md +54 -54
- package/web-ui/dist/assets/index-SBHxoAwi.js +0 -0
- package/web-ui/dist/assets/index-ej8i4PGl.css +0 -0
- package/web-ui/dist/index.html +0 -0
- package/web-ui/dist/vite.svg +0 -0
package/README.md
CHANGED
@@ -397,6 +397,7 @@ Copy the `DB_PATH` directory (default: `./lancedb/`).
 | File too large | Default limit is 100MB. Set `MAX_FILE_SIZE` higher or split the file. |
 | Path outside BASE_DIR | All file paths must be under `BASE_DIR`. Use absolute paths. |
 | MCP tools not showing | Verify config syntax, restart your AI tool completely (Cmd+Q on Mac). |
+| `mcp-publisher login github` fails with `slow_down` | Use token login instead: `mcp-publisher login github --token "$(gh auth token)"` (or pass a PAT). |
 | 401 Unauthorized | API key required. Set `RAG_API_KEY` or use correct header format. |
 | 429 Too Many Requests | Rate limited. Wait for reset or increase `RATE_LIMIT_MAX_REQUESTS`. |
 | CORS errors | Add your origin to `CORS_ORIGINS` environment variable. |
package/dist/vectordb/index.js
CHANGED
@@ -323,15 +323,15 @@ class VectorStore {
 if (tableNames.includes(this.config.tableName)) {
 // Open existing table
 this.table = await this.db.openTable(this.config.tableName);
-console.
+console.error(`VectorStore: Opened existing table "${this.config.tableName}"`);
 // Ensure FTS index exists (migration for existing databases)
 await this.ensureFtsIndex();
 }
 else {
 // Create new table (schema auto-defined on first data insertion)
-console.
+console.error(`VectorStore: Table "${this.config.tableName}" will be created on first data insertion`);
 }
-console.
+console.error(`VectorStore initialized: ${this.config.dbPath}`);
 }
 catch (error) {
 // Clean up partially initialized resources on failure
@@ -365,7 +365,7 @@ class VectorStore {
 async deleteChunks(filePath) {
 if (!this.table) {
 // If table doesn't exist, no deletion targets, return normally
-console.
+console.error('VectorStore: Skipping deletion as table does not exist');
 return;
 }
 // Validate file path before use in query to prevent SQL injection
@@ -381,7 +381,7 @@ class VectorStore {
 // so call delete directly
 // Note: Field names are case-sensitive, use backticks for camelCase fields
 await this.table.delete(`\`filePath\` = '${escapedFilePath}'`);
-console.
+console.error(`VectorStore: Deleted chunks for file "${filePath}"`);
 // Rebuild FTS index after deleting data
 await this.rebuildFtsIndex();
 }
@@ -435,7 +435,7 @@ class VectorStore {
 // Convert to LanceDB record format using explicit field mapping
 const records = chunksWithFingerprints.map(toDbRecord);
 this.table = await this.db.createTable(this.config.tableName, records);
-console.
+console.error(`VectorStore: Created table "${this.config.tableName}"`);
 // Create FTS index for hybrid search
 await this.ensureFtsIndex();
 })();
@@ -445,7 +445,7 @@ class VectorStore {
 finally {
 this.tableCreationPromise = null;
 }
-console.
+console.error(`VectorStore: Inserted ${chunks.length} chunks`);
 return;
 }
 }
@@ -454,7 +454,7 @@ class VectorStore {
 await this.table.add(records);
 // Rebuild FTS index after adding new data
 await this.rebuildFtsIndex();
-console.
+console.error(`VectorStore: Inserted ${chunks.length} chunks`);
 }
 catch (error) {
 throw new index_js_1.DatabaseError('Failed to insert chunks', error);
@@ -492,12 +492,12 @@ class VectorStore {
 name: FTS_INDEX_NAME,
 });
 this.ftsEnabled = true;
-console.
+console.error(`VectorStore: FTS index "${FTS_INDEX_NAME}" created successfully`);
 // Drop old FTS indices
 for (const idx of existingFtsIndices) {
 if (idx.name !== FTS_INDEX_NAME) {
 await this.table.dropIndex(idx.name);
-console.
+console.error(`VectorStore: Dropped old FTS index "${idx.name}"`);
 }
 }
 }
@@ -579,7 +579,7 @@ class VectorStore {
 */
 async search(queryVector, queryText, limit = 10) {
 if (!this.table) {
-console.
+console.error('VectorStore: Returning empty results as table does not exist');
 return [];
 }
 if (limit < 1 || limit > 20) {
@@ -779,7 +779,7 @@ class VectorStore {
 this.ftsEnabled = false;
 this.ftsFailureCount = 0;
 this.ftsLastFailure = null;
-console.
+console.error('VectorStore: Connection closed');
 // Propagate errors to caller after cleanup is complete
 if (errors.length > 0) {
 throw new index_js_1.DatabaseError(`Errors during close: ${errors.map((e) => e.message).join('; ')}`, errors[0]);
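Each added line in these hunks sends its diagnostic message through `console.error`, which Node.js writes to stderr. A likely motivation, assuming the server runs over the MCP stdio transport, is that stdout carries the protocol messages and must not receive free-form log text. The sketch below is illustrative only: `createStderrLogger` is a hypothetical helper that is not part of `@robthepcguy/rag-vault`, and the output stream is injectable so the example runs without touching the real process streams.

```javascript
// Hypothetical helper, not part of @robthepcguy/rag-vault.
// Writes one diagnostic line per call to stderr (or an injected stream),
// keeping stdout free for protocol traffic.
function createStderrLogger(prefix, stream = process.stderr) {
  return function log(message) {
    stream.write(`${prefix}: ${message}\n`);
  };
}

// Usage with a fake stream that records writes instead of printing them.
const captured = [];
const fakeStream = {
  write(chunk) {
    captured.push(chunk);
    return true;
  },
};
const log = createStderrLogger("VectorStore", fakeStream);
log("Connection closed");
```

After the call, `captured` holds the single line `"VectorStore: Connection closed\n"`; passing no stream falls back to `process.stderr`.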
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "@robthepcguy/rag-vault",
-"version": "1.5.
+"version": "1.5.1",
 "description": "Local RAG MCP Server - Easy-to-setup document search with minimal configuration",
 "main": "dist/index.js",
 "bin": {
@@ -41,42 +41,6 @@
 "type": "git",
 "url": "git+https://github.com/RobThePCGuy/rag-vault.git"
 },
-"scripts": {
-"build": "tsc -p tsconfig.build.json && tsc-alias -p tsconfig.build.json",
-"check": "pnpm type-check && pnpm lint && pnpm format:check",
-"check:all": "pnpm check && pnpm check:web-ui && pnpm check:unused && pnpm check:deps && pnpm build && pnpm test:unit",
-"check:deps": "madge --circular --extensions ts src",
-"check:deps:graph": "madge --extensions ts --image graph.svg src",
-"check:web-ui": "pnpm --prefix web-ui check",
-"check:unused": "node scripts/check-unused-exports.js",
-"check:unused:all": "knip",
-"cleanup:processes": "bash ./scripts/cleanup-test-processes.sh",
-"clean:dev": "rm -rf ./node_modules ./tmp ./uploads ./models ./lancedb ./dist ./package-lock.json && cd web-ui && rm -rf ./dist ./node_modules ./package-lock.json",
-"dev": "tsx src/index.ts",
-"format": "biome format --write src",
-"format:check": "biome format src",
-"lint": "biome lint src",
-"lint:fix": "biome lint --write src",
-"start": "node dist/index.js",
-"test": "vitest run",
-"test:coverage": "vitest run --coverage",
-"test:safe": "pnpm test && pnpm cleanup:processes",
-"test:watch": "vitest",
-"type-check": "tsc --noEmit",
-"audit": "pnpm audit --audit-level=moderate",
-"audit:fix": "pnpm audit --fix",
-"setup:web": "pnpm install && pnpm web:build && pnpm --prefix web-ui install && pnpm ui:build && pnpm web:start",
-"ui:build": "pnpm --prefix web-ui build",
-"ui:dev": "cd web-ui && pnpm dev",
-"web:build": "pnpm build",
-"web:dev": "concurrently -n api,ui -c blue,magenta \"pnpm web:watch\" \"pnpm --prefix web-ui dev\"",
-"web:start": "node dist/web/index.js",
-"web:watch": "tsx watch src/web/index.ts",
-"web": "tsx src/web/index.ts",
-"test:unit": "vitest run --project backend-unit --project web-ui",
-"test:integration": "RUN_EMBEDDING_INTEGRATION=1 vitest run --project backend-integration",
-"hooks:install": "git config core.hooksPath .githooks"
-},
 "dependencies": {
 "@huggingface/transformers": "^3.7.6",
 "@lancedb/lancedb": "^0.23.0",
@@ -118,17 +82,40 @@
 "node": ">=20"
 },
 "mcpName": "io.github.RobThePCGuy/rag-vault",
-"
-"
-
-
-
-"
-
-
-
-
-
-
+"scripts": {
+"build": "tsc -p tsconfig.build.json && tsc-alias -p tsconfig.build.json",
+"check": "pnpm type-check && pnpm lint && pnpm format:check",
+"check:all": "pnpm check && pnpm check:web-ui && pnpm check:unused && pnpm check:deps && pnpm build && pnpm test:unit",
+"check:deps": "madge --circular --extensions ts src",
+"check:deps:graph": "madge --extensions ts --image graph.svg src",
+"check:web-ui": "pnpm --prefix web-ui check",
+"check:unused": "node scripts/check-unused-exports.js",
+"check:unused:all": "knip",
+"cleanup:processes": "bash ./scripts/cleanup-test-processes.sh",
+"clean:dev": "rm -rf ./node_modules ./tmp ./uploads ./models ./lancedb ./dist ./package-lock.json && cd web-ui && rm -rf ./dist ./node_modules ./package-lock.json",
+"dev": "tsx src/index.ts",
+"format": "biome format --write src",
+"format:check": "biome format src",
+"lint": "biome lint src",
+"lint:fix": "biome lint --write src",
+"start": "node dist/index.js",
+"test": "vitest run",
+"test:coverage": "vitest run --coverage",
+"test:safe": "pnpm test && pnpm cleanup:processes",
+"test:watch": "vitest",
+"type-check": "tsc --noEmit",
+"audit": "pnpm audit --audit-level=moderate",
+"audit:fix": "pnpm audit --fix",
+"setup:web": "pnpm install && pnpm web:build && pnpm --prefix web-ui install && pnpm ui:build && pnpm web:start",
+"ui:build": "pnpm --prefix web-ui build",
+"ui:dev": "cd web-ui && pnpm dev",
+"web:build": "pnpm build",
+"web:dev": "concurrently -n api,ui -c blue,magenta \"pnpm web:watch\" \"pnpm --prefix web-ui dev\"",
+"web:start": "node dist/web/index.js",
+"web:watch": "tsx watch src/web/index.ts",
+"web": "tsx src/web/index.ts",
+"test:unit": "vitest run --project backend-unit --project web-ui",
+"test:integration": "RUN_EMBEDDING_INTEGRATION=1 vitest run --project backend-integration",
+"hooks:install": "git config core.hooksPath .githooks"
 }
-}
+}
@@ -1,111 +1,111 @@
----
-name: rag-vault
-description: This skill should be used when the user asks to "search documents", "query RAG", "ingest file", "ingest PDF", "save web page", "add to knowledge base", or mentions document search, semantic search, vector search, or RAG operations. Provides score interpretation (< 0.3 good, > 0.5 skip), query optimization, and ingestion guidance for query_documents, ingest_file, ingest_data tools.
-version: 1.0.0
----
-
-# RAG Vault Skills
-
-## Tools
-
-| Tool | Use When |
-|------|----------|
-| `ingest_file` | Local files (PDF, DOCX, TXT, MD, JSON, JSONL) |
-| `ingest_data` | Raw content (HTML, text) with source URL |
-| `query_documents` | Semantic + keyword hybrid search |
-| `delete_file` / `list_files` / `status` | Management |
-
-## Search: Core Rules
-
-Hybrid search combines vector (semantic) and keyword (BM25).
-
-### Score Interpretation
-
-Lower = better match. Use this to filter noise.
-
-| Score | Action |
-|-------|--------|
-| < 0.3 | Use directly |
-| 0.3-0.5 | Include if mentions same concept/entity |
-| > 0.5 | Skip unless no better results |
-
-### Limit Selection
-
-| Intent | Limit |
-|--------|-------|
-| Specific answer (function, error) | 5 |
-| General understanding | 10 |
-| Comprehensive survey | 20 |
-
-### Query Formulation
-
-| Situation | Why Transform | Action |
-|-----------|---------------|--------|
-| Specific term mentioned | Keyword search needs exact match | KEEP term |
-| Vague query | Vector search needs semantic signal | ADD context |
-| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
-| Multiple distinct topics | Single query conflates results | SPLIT queries |
-| Few/poor results | Term mismatch | EXPAND (see below) |
-
-### Query Expansion
-
-When results are few or all score > 0.5, expand query terms:
-
-- Keep original term first, add 2-4 variants
-- Types: synonyms, abbreviations, related terms, word forms
-- Example: `"config"` → `"config configuration settings configure"`
-
-Avoid over-expansion (causes topic drift).
-
-### Result Selection
-
-When to include vs skip—based on answer quality, not just score.
-
-**INCLUDE** if:
-- Directly answers the question
-- Provides necessary context
-- Score < 0.5
-
-**SKIP** if:
-- Same keyword, unrelated context
-- Score > 0.7
-- Mentions term without explanation
-
-## Ingestion
-
-### ingest_file
-```
-ingest_file({ filePath: "/absolute/path/to/document.pdf" })
-```
-
-### ingest_data
-```
-ingest_data({
-content: "<html>...</html>",
-metadata: { source: "https://example.com/page", format: "html" }
-})
-```
-
-**Format selection** — match the data you have:
-- HTML string → `format: "html"`
-- Markdown string → `format: "markdown"`
-- Other → `format: "text"`
-
-**Source format:**
-- Web page → Use URL: `https://example.com/page`
-- Other content → Use scheme: `{type}://{date}` or `{type}://{date}/{detail}`
-- Examples: `clipboard://2024-12-30`, `chat://2024-12-30/project-discussion`
-
-**HTML source options:**
-- Static page → LLM fetch
-- SPA/JS-rendered → Browser MCP
-- Auth required → Manual paste
-
-Re-ingest same source to update. Use same source in `delete_file` to remove.
-
-## References
-
-For edge cases and examples:
-- [html-ingestion.md](references/html-ingestion.md) - URL normalization, SPA handling
-- [query-optimization.md](references/query-optimization.md) - Query patterns by intent
-- [result-refinement.md](references/result-refinement.md) - Contradiction resolution, chunking
+---
+name: rag-vault
+description: This skill should be used when the user asks to "search documents", "query RAG", "ingest file", "ingest PDF", "save web page", "add to knowledge base", or mentions document search, semantic search, vector search, or RAG operations. Provides score interpretation (< 0.3 good, > 0.5 skip), query optimization, and ingestion guidance for query_documents, ingest_file, ingest_data tools.
+version: 1.0.0
+---
+
+# RAG Vault Skills
+
+## Tools
+
+| Tool | Use When |
+|------|----------|
+| `ingest_file` | Local files (PDF, DOCX, TXT, MD, JSON, JSONL) |
+| `ingest_data` | Raw content (HTML, text) with source URL |
+| `query_documents` | Semantic + keyword hybrid search |
+| `delete_file` / `list_files` / `status` | Management |
+
+## Search: Core Rules
+
+Hybrid search combines vector (semantic) and keyword (BM25).
+
+### Score Interpretation
+
+Lower = better match. Use this to filter noise.
+
+| Score | Action |
+|-------|--------|
+| < 0.3 | Use directly |
+| 0.3-0.5 | Include if mentions same concept/entity |
+| > 0.5 | Skip unless no better results |
+
+### Limit Selection
+
+| Intent | Limit |
+|--------|-------|
+| Specific answer (function, error) | 5 |
+| General understanding | 10 |
+| Comprehensive survey | 20 |
+
+### Query Formulation
+
+| Situation | Why Transform | Action |
+|-----------|---------------|--------|
+| Specific term mentioned | Keyword search needs exact match | KEEP term |
+| Vague query | Vector search needs semantic signal | ADD context |
+| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
+| Multiple distinct topics | Single query conflates results | SPLIT queries |
+| Few/poor results | Term mismatch | EXPAND (see below) |
+
+### Query Expansion
+
+When results are few or all score > 0.5, expand query terms:
+
+- Keep original term first, add 2-4 variants
+- Types: synonyms, abbreviations, related terms, word forms
+- Example: `"config"` → `"config configuration settings configure"`
+
+Avoid over-expansion (causes topic drift).
+
+### Result Selection
+
+When to include vs skip—based on answer quality, not just score.
+
+**INCLUDE** if:
+- Directly answers the question
+- Provides necessary context
+- Score < 0.5
+
+**SKIP** if:
+- Same keyword, unrelated context
+- Score > 0.7
+- Mentions term without explanation
+
+## Ingestion
+
+### ingest_file
+```
+ingest_file({ filePath: "/absolute/path/to/document.pdf" })
+```
+
+### ingest_data
+```
+ingest_data({
+content: "<html>...</html>",
+metadata: { source: "https://example.com/page", format: "html" }
+})
+```
+
+**Format selection** — match the data you have:
+- HTML string → `format: "html"`
+- Markdown string → `format: "markdown"`
+- Other → `format: "text"`
+
+**Source format:**
+- Web page → Use URL: `https://example.com/page`
+- Other content → Use scheme: `{type}://{date}` or `{type}://{date}/{detail}`
+- Examples: `clipboard://2024-12-30`, `chat://2024-12-30/project-discussion`
+
+**HTML source options:**
+- Static page → LLM fetch
+- SPA/JS-rendered → Browser MCP
+- Auth required → Manual paste
+
+Re-ingest same source to update. Use same source in `delete_file` to remove.
+
+## References
+
+For edge cases and examples:
+- [html-ingestion.md](references/html-ingestion.md) - URL normalization, SPA handling
+- [query-optimization.md](references/query-optimization.md) - Query patterns by intent
+- [result-refinement.md](references/result-refinement.md) - Contradiction resolution, chunking
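The score bands in SKILL.md's Score Interpretation table can be read as a small filtering rule over `query_documents` results. The following is a minimal sketch, not the package's own logic: the result shape `{ text, score }` and the `mentionsConcept` callback (standing in for the judgement "mentions the same concept/entity") are assumptions for illustration.

```javascript
// Sketch of the Score Interpretation table (lower score = better match).
// Assumed result shape: { text: string, score: number }.
function scoreBand(score) {
  if (score < 0.3) return "use"; // < 0.3: use directly
  if (score <= 0.5) return "check"; // 0.3-0.5: include if it mentions the same concept/entity
  return "skip"; // > 0.5: skip unless there are no better results
}

// mentionsConcept is a hypothetical caller-supplied predicate for the 0.3-0.5 band.
function selectResults(results, mentionsConcept) {
  const kept = results.filter((r) => {
    const band = scoreBand(r.score);
    return band === "use" || (band === "check" && mentionsConcept(r));
  });
  if (kept.length > 0) return kept;
  // Nothing passed: fall back to whatever was returned, best (lowest) score first.
  return [...results].sort((a, b) => a.score - b.score);
}
```

With results scored 0.2, 0.4 and 0.6, the 0.2 result is always kept, the 0.4 result is kept only if `mentionsConcept` accepts it, and the 0.6 result is dropped; weaker results come back only when nothing passes the filter.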
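SKILL.md's Query Expansion rules (expand when results are few or all score above 0.5, keep the original term first, add 2-4 variants) can be sketched as follows. The variant table and the `MIN_RESULTS` cutoff for "few" results are hypothetical; in practice the variants would come from the calling model rather than a fixed lookup.

```javascript
// Hypothetical variant table, seeded with examples from the skill docs.
const VARIANTS = {
  config: ["configuration", "settings", "configure"],
  auth: ["authentication", "login"],
};
const MIN_RESULTS = 3; // assumed threshold for "few" results
const MAX_VARIANTS = 4; // skill docs: add 2-4 variants to avoid topic drift

// Expand when results are few or every result scores worse than 0.5.
function shouldExpand(results) {
  return results.length < MIN_RESULTS || results.every((r) => r.score > 0.5);
}

function expandQuery(term, table = VARIANTS) {
  const extras = (table[term.toLowerCase()] ?? []).slice(0, MAX_VARIANTS);
  // The original term stays first so the keyword (BM25) side still matches it exactly.
  return [term, ...extras].join(" ");
}
```

For example, `expandQuery("config")` yields `"config configuration settings configure"`, the same string SKILL.md gives, while an unknown term is returned unchanged.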
@@ -1,73 +1,73 @@
-# HTML Ingestion Reference
-
-Basic usage is in SKILL.md. This covers URL handling and edge cases.
-
-## System Behavior
-
-The parser extracts main content only—navigation, ads, and boilerplate are stripped. What gets indexed is clean body text, not the full HTML.
-
-## When to Use Each Source Method
-
-| Source Type | Method | Why |
-|-------------|--------|-----|
-| Static page, public | LLM fetch | Simplest, no extra tools |
-| SPA / JS-rendered | Browser MCP | Need rendered DOM |
-| Auth required | Manual paste | Can't fetch programmatically |
-
-## URL Normalization
-
-System strips query strings and fragments:
-```
-https://example.com/page?utm=x#section → https://example.com/page
-```
-
-**When query strings matter** (pagination, dynamic IDs):
-```
-ingest_data({
-content: page1_html,
-metadata: { source: "https://example.com/results?page=1", format: "html" }
-})
-```
-Explicitly include full URL as source.
-
-## Edge Cases
-
-### Empty/Minimal Extraction
-
-Why it happens:
-- JS-rendered content (use browser MCP)
-- Non-standard HTML structure
-- Login required
-
-### SPA/Dynamic Content
-
-1. Use browser MCP to render
-2. Wait for content load
-3. Extract rendered HTML
-4. Ingest via `ingest_data`
-
-### Pages with Only Navigation
-
-Skip or fetch deeper linked pages instead.
-
-## Updating Content
-
-Re-ingest with same source to replace:
-```
-ingest_data({
-content: updated_html,
-metadata: { source: "https://example.com/page", format: "html" }
-})
-```
-
-## Search Results
-
-Results from HTML include `source` field:
-```json
-{
-"filePath": "raw-data/abc123.md",
-"source": "https://example.com/page",
-"text": "...",
-"score": 0.25
-}
-```
+# HTML Ingestion Reference
+
+Basic usage is in SKILL.md. This covers URL handling and edge cases.
+
+## System Behavior
+
+The parser extracts main content only—navigation, ads, and boilerplate are stripped. What gets indexed is clean body text, not the full HTML.
+
+## When to Use Each Source Method
+
+| Source Type | Method | Why |
+|-------------|--------|-----|
+| Static page, public | LLM fetch | Simplest, no extra tools |
+| SPA / JS-rendered | Browser MCP | Need rendered DOM |
+| Auth required | Manual paste | Can't fetch programmatically |
+
+## URL Normalization
+
+System strips query strings and fragments:
+```
+https://example.com/page?utm=x#section → https://example.com/page
+```
+
+**When query strings matter** (pagination, dynamic IDs):
+```
+ingest_data({
+content: page1_html,
+metadata: { source: "https://example.com/results?page=1", format: "html" }
+})
+```
+Explicitly include full URL as source.
+
+## Edge Cases
+
+### Empty/Minimal Extraction
+
+Why it happens:
+- JS-rendered content (use browser MCP)
+- Non-standard HTML structure
+- Login required
+
+### SPA/Dynamic Content
+
+1. Use browser MCP to render
+2. Wait for content load
+3. Extract rendered HTML
+4. Ingest via `ingest_data`
+
+### Pages with Only Navigation
+
+Skip or fetch deeper linked pages instead.
+
+## Updating Content
+
+Re-ingest with same source to replace:
+```
+ingest_data({
+content: updated_html,
+metadata: { source: "https://example.com/page", format: "html" }
+})
+```
+
+## Search Results
+
+Results from HTML include `source` field:
+```json
+{
+"filePath": "raw-data/abc123.md",
+"source": "https://example.com/page",
+"text": "...",
+"score": 0.25
+}
+```
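html-ingestion.md states that the system strips query strings and fragments from source URLs. A minimal sketch of that normalization using the WHATWG `URL` class built into Node.js follows; it mirrors the documented behavior and is not the package's actual implementation. `normalizeSourceUrl` is a hypothetical name.

```javascript
// Hypothetical helper mirroring the documented normalization, not the package's code.
function normalizeSourceUrl(raw) {
  const url = new URL(raw); // throws a TypeError for strings that are not absolute URLs
  url.search = ""; // drop "?utm=x" and any other query string
  url.hash = ""; // drop "#section"
  return url.toString();
}
```

`normalizeSourceUrl("https://example.com/page?utm=x#section")` returns `"https://example.com/page"`, matching the reference's example. `URL` also canonicalizes the rest of the address (for instance `https://example.com` becomes `https://example.com/`), so a real implementation may differ in such details.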
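For content without a URL, SKILL.md suggests a `{type}://{date}` or `{type}://{date}/{detail}` source, and both skill files pass `metadata: { source, format }` to `ingest_data`, with `format` set to `"html"`, `"markdown"` or `"text"`. The sketch below assembles such an argument object. The helper names, the use of the UTC date, and the slug rule for `detail` are assumptions for illustration, not part of the package.

```javascript
const FORMATS = new Set(["html", "markdown", "text"]); // formats listed in SKILL.md

// Hypothetical: build "{type}://{date}" or "{type}://{date}/{detail}" from the UTC date.
function buildSourceId(type, date, detail) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD
  if (!detail) return `${type}://${day}`;
  // Assumed slug rule: lowercase, runs of non-alphanumerics become single hyphens.
  const slug = detail
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${type}://${day}/${slug}`;
}

// Hypothetical: assemble the argument object for an ingest_data call.
function buildIngestDataArgs(content, source, format) {
  if (!FORMATS.has(format)) {
    throw new Error(`Unsupported format "${format}"; expected html, markdown or text`);
  }
  return { content, metadata: { source, format } };
}
```

With a date of 30 December 2024 (UTC), `buildSourceId("chat", date, "Project Discussion")` produces `chat://2024-12-30/project-discussion`, one of the example sources in SKILL.md.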
# Query Optimization Reference

Core rules are in SKILL.md. This covers patterns and edge cases.

## Query Patterns by Intent

| User Intent | Query Pattern | Why |
|-------------|---------------|-----|
| Definition/Concept | `"[term] definition concept"` | Targets explanatory content |
| How-To/Procedure | `"[action] steps example usage"` | Targets instructional content |
| API/Function | `"[function] API arguments return"` | Targets reference docs |
| Troubleshooting | `"[error] fix solution cause"` | Targets problem-solving content |

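The patterns are simple templates, so they can be applied mechanically. In this sketch the suffix terms are copied from the table. `Intent`, `INTENT_TERMS` and `buildQuery` are hypothetical names.

```typescript
type Intent = "definition" | "howto" | "api" | "troubleshooting";

// Terms copied from the table above; the helper itself is hypothetical.
const INTENT_TERMS: Record<Intent, string> = {
  definition: "definition concept",
  howto: "steps example usage",
  api: "API arguments return",
  troubleshooting: "fix solution cause",
};

// Fills the "[term]" / "[action]" / "[function]" / "[error]" slot with the user's subject.
function buildQuery(intent: Intent, subject: string): string {
  return `${subject.trim()} ${INTENT_TERMS[intent]}`;
}

console.log(buildQuery("troubleshooting", "ECONNREFUSED")); // ECONNREFUSED fix solution cause
```
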
## Multi-Query: When to Split

**Split** when "and" connects distinct topics:
```
"How do I authenticate AND handle errors?"
→ Query 1: "authentication login JWT session"
→ Query 2: "error handling exception catch"
```

**Don't split** when "and" joins parts of a single topic:
```
"How do I set up and configure the database?"
→ Single: "database setup configuration"
```

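After splitting, each sub-query runs on its own and the result lists are merged. This sketch assumes a synchronous `search` function that returns `filePath`, `text` and `score`. The function is hypothetical; substitute your actual search tool call. Lower scores mean better matches, as in the result-refinement rules. Treating identical `filePath` plus `text` as the same chunk is also an assumption.

```typescript
interface Result {
  filePath: string;
  text: string;
  score: number; // lower = better match
}

// Runs each sub-query, keeps the best-scoring copy of any chunk returned more than
// once, and returns one list sorted best-first. `search` is a hypothetical stand-in.
function runSplitQueries(queries: string[], search: (q: string) => Result[]): Result[] {
  const best = new Map<string, Result>();
  for (const q of queries) {
    for (const r of search(q)) {
      const key = `${r.filePath}\u0000${r.text}`; // same file + same text = same chunk (assumption)
      const prev = best.get(key);
      if (!prev || r.score < prev.score) best.set(key, r);
    }
  }
  return Array.from(best.values()).sort((a, b) => a.score - b.score);
}
```

Sorting the merged list by score is one choice. Keeping one list per sub-query works just as well when answering each part separately.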
## Query Expansion Examples

When results are few, or all scores are above 0.5:

| Type | Original | Expanded |
|------|----------|----------|
| Synonyms | delete | "delete remove" |
| Abbreviations | API | "API Application Programming Interface" |
| Related terms | auth | "auth authentication login" |
| Word forms | config | "config configuration configure" |

Keep the original term first. Limit expansion to 2-4 added terms.

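Both rules fit in two small functions. One decides whether to expand, and the other appends related terms after the original. The guide does not define "few", so `minResults = 3` is an assumed cutoff. The function caps additions at four, but choosing the terms is still up to you, as in the table. The function names are hypothetical.

```typescript
interface Scored {
  score: number; // lower = better match
}

// "Few" is unspecified in the guide; minResults = 3 is an assumed cutoff.
function shouldExpand(results: Scored[], minResults = 3): boolean {
  if (results.length < minResults) return true;
  return results.every((r) => r.score > 0.5); // every match is weak
}

// Original term stays first; at most 4 additions, skipping duplicates.
function expandQuery(original: string, additions: string[]): string {
  const seen = new Set(original.toLowerCase().split(/\s+/));
  const extra: string[] = [];
  for (const term of additions) {
    const key = term.toLowerCase();
    if (!seen.has(key) && extra.length < 4) {
      seen.add(key);
      extra.push(term);
    }
  }
  return [original, ...extra].join(" ");
}

console.log(expandQuery("config", ["configuration", "configure"])); // config configuration configure
```
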
## Iterative Refinement

When initial results are unsatisfactory:

| Problem | Why It Happens | Action |
|---------|----------------|--------|
| Too few results | Term mismatch | Expand query (see above) |
| Too many irrelevant results | Query too broad | Add specific terms |
| Expected result missing | Phrasing mismatch | Try alternative wording |

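The table can serve as a triage function. Every quantitative threshold here is an assumption. Fewer than 3 results counts as "too few", and more than half scoring above 0.5 counts as "too many irrelevant". "Expected result missing" needs a caller-supplied `expectedFile`, since only the caller knows what should have appeared. `nextRefinement` is a hypothetical name.

```typescript
type Refinement = "expand" | "narrow" | "rephrase" | "done";

interface Scored {
  filePath: string;
  score: number; // lower = better match
}

// Maps the table's symptoms to actions; all thresholds are assumptions.
function nextRefinement(results: Scored[], expectedFile?: string): Refinement {
  if (results.length < 3) return "expand"; // too few results: term mismatch
  const weak = results.filter((r) => r.score > 0.5).length;
  if (weak > results.length / 2) return "narrow"; // mostly irrelevant: query too broad
  if (expectedFile && !results.some((r) => r.filePath === expectedFile)) {
    return "rephrase"; // expected result missing: phrasing mismatch
  }
  return "done";
}
```
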
## Language Mixing

N-gram tokenization supports cross-language queries:
```
"API error handling" → matches both English and Japanese content
```

# Result Refinement Reference

Core rules (score, include/skip) are in SKILL.md. This covers when and how to combine multiple results.

## When to Synthesize vs Filter

Match the approach to the user's intent:

| User Intent | Approach | Why |
|-------------|----------|-----|
| Specific answer ("how to X") | Filter to the 1-2 best | Extra results add noise |
| Understanding a topic | Synthesize multiple | Builds a complete picture |
| Troubleshooting an error | Filter to the direct cause | Tangential info confuses |
| Comparing options | Synthesize with structure | Needs all perspectives |

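The mapping and the "filter" branch are easy to sketch. Intent labels and function names are hypothetical. Results are ranked with lower scores first, matching the rule that a lower score means a better match.

```typescript
type Approach = "filter" | "synthesize";
type UserIntent = "specific-answer" | "understand-topic" | "troubleshoot" | "compare";

// Intent to approach, copied from the table; intent labels are hypothetical.
const APPROACH: Record<UserIntent, Approach> = {
  "specific-answer": "filter",
  "understand-topic": "synthesize",
  troubleshoot: "filter",
  compare: "synthesize",
};

interface Result {
  filePath: string;
  text: string;
  score: number; // lower = better match
}

// Filter branch: keep the best k results (1 or 2), lowest score first.
// Copies the array so the caller's list is not reordered.
function filterBest(results: Result[], k = 2): Result[] {
  return [...results].sort((a, b) => a.score - b.score).slice(0, k);
}
```
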
## Multiple Results Handling

### Synthesis

When: the user needs a comprehensive understanding.

```
Result 1: "API accepts JSON..."
Result 2: "Auth uses Bearer tokens..."
→ Combine into a unified answer
```

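To prepare for synthesis, order the excerpts and keep each one tied to its origin, so the unified answer can still cite its parts. This is a minimal sketch. The output layout and the `synthesisContext` name are arbitrary choices, and the file paths in the example are made up.

```typescript
interface Result {
  filePath: string;
  source?: string;
  text: string;
  score: number; // lower = better match
}

// Orders excerpts best-first and tags each with its origin so the combined
// answer can still cite where each part came from.
function synthesisContext(results: Result[]): string {
  return [...results]
    .sort((a, b) => a.score - b.score)
    .map((r, i) => `[${i + 1}] (${r.source ?? r.filePath}) ${r.text.trim()}`)
    .join("\n");
}

console.log(
  synthesisContext([
    { filePath: "raw-data/api.md", text: "API accepts JSON...", score: 0.3 },
    { filePath: "raw-data/auth.md", text: "Auth uses Bearer tokens...", score: 0.2 },
  ]),
);
// [1] (raw-data/auth.md) Auth uses Bearer tokens...
// [2] (raw-data/api.md) API accepts JSON...
```
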
### Deduplication

When: results overlap significantly.

1. Pick the most complete result
2. Add only the unique information from the others

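The two steps can be approximated mechanically by treating sentences as units of information. In this sketch the longest text stands in for "most complete", and sentences are split naively on `.`, `!` and `?`. Both are assumptions, and in practice this judgment is usually made by reading. `dedupe` is a hypothetical helper.

```typescript
// Naive sentence split: runs of text ending in ., ! or ? (assumption).
function sentences(text: string): string[] {
  return (text.match(/[^.!?]+[.!?]*/g) ?? []).map((s) => s.trim()).filter((s) => s.length > 0);
}

function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// 1. Longest text stands in for "most complete" (a proxy, not a rule from the guide).
// 2. Append only sentences not already seen in an earlier text.
function dedupe(texts: string[]): string {
  if (texts.length === 0) return "";
  const [base, ...rest] = [...texts].sort((a, b) => b.length - a.length);
  const seen = new Set(sentences(base).map(normalize));
  const additions: string[] = [];
  for (const other of rest) {
    for (const s of sentences(other)) {
      const key = normalize(s);
      if (!seen.has(key)) {
        seen.add(key);
        additions.push(s);
      }
    }
  }
  return [base.trim(), ...additions].join(" ");
}
```
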
### Contradiction Resolution

When: results conflict.

Priority: prefer the lower score (lower = better match).
If unresolved → note the discrepancy to the user.

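The priority rule is a comparison, but "unresolved" needs a definition. The sketch treats scores closer than a margin as a tie to surface to the user. The 0.05 margin and the `resolveConflict` name are assumptions, not values from the guide.

```typescript
interface Result {
  filePath: string;
  text: string;
  score: number; // lower = better match
}

type Resolution =
  | { kind: "resolved"; winner: Result }
  | { kind: "unresolved"; candidates: [Result, Result] };

// Lower score wins unless the two scores are too close to call.
function resolveConflict(a: Result, b: Result, margin = 0.05): Resolution {
  if (Math.abs(a.score - b.score) < margin) {
    return { kind: "unresolved", candidates: [a, b] }; // note the discrepancy to the user
  }
  return { kind: "resolved", winner: a.score < b.score ? a : b };
}
```
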
## Chunk Context

A single chunk may lack context (for example, it refers to something "as described above").

- Note when information is partial
- Group multiple chunks from the same `filePath` into coherent sections

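Grouping can happen before composing the answer. The sketch buckets results by `filePath` and orders each bucket by score. Ordering by score rather than by position in the file is an assumption, since the example result shape (`filePath`, `source`, `text`, `score`) has no chunk-offset field. `groupByFile` is a hypothetical name.

```typescript
interface Result {
  filePath: string;
  text: string;
  score: number; // lower = better match
}

// Buckets chunks by file so related excerpts can be read as one section.
function groupByFile(results: Result[]): Map<string, Result[]> {
  const groups = new Map<string, Result[]>();
  for (const r of results) {
    const list = groups.get(r.filePath);
    if (list) list.push(r);
    else groups.set(r.filePath, [r]);
  }
  groups.forEach((list) => list.sort((a, b) => a.score - b.score)); // best chunk first
  return groups;
}
```
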
## No Results

1. Rephrase the query (alternative terms)
2. Broaden the scope
3. Check that the content was ingested (`list_files`)
4. Tell the user that no matching content was found

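The four steps form an escalation ladder. The sketch injects the tools because the search tool's name and arguments are not given here. Only `list_files` is named, and modeling it as returning an array of file identifiers is an assumption. `Tools`, `Outcome` and `searchWithFallback` are hypothetical.

```typescript
interface Result {
  filePath: string;
  text: string;
  score: number;
}

// Hypothetical stand-ins for the search tool and list_files.
interface Tools {
  search(query: string): Result[];
  listFiles(): string[]; // return shape assumed
}

type Outcome =
  | { kind: "found"; query: string; results: Result[] }
  | { kind: "nothing-ingested" }
  | { kind: "no-match" };

// Steps 1-2: try the original, then rephrased, then broader queries.
// Step 3: check ingestion. Step 4: report that nothing matched.
function searchWithFallback(tools: Tools, original: string, rephrasings: string[], broader: string[]): Outcome {
  for (const q of [original, ...rephrasings, ...broader]) {
    const results = tools.search(q);
    if (results.length > 0) return { kind: "found", query: q, results };
  }
  if (tools.listFiles().length === 0) return { kind: "nothing-ingested" };
  return { kind: "no-match" };
}
```
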