npm - @robthepcguy/rag-vault - Versions diffs - 1.5.0 → 1.5.1 - Mend

@robthepcguy/rag-vault 1.5.0 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (84)

package/LICENSE +0 -0
package/README.md +1 -0
package/dist/bin/install-skills.d.ts +0 -0
package/dist/bin/install-skills.js +0 -0
package/dist/chunker/index.d.ts +0 -0
package/dist/chunker/index.js +0 -0
package/dist/chunker/semantic-chunker.d.ts +0 -0
package/dist/chunker/semantic-chunker.js +0 -0
package/dist/chunker/sentence-splitter.d.ts +0 -0
package/dist/chunker/sentence-splitter.js +0 -0
package/dist/embedder/index.d.ts +0 -0
package/dist/embedder/index.js +0 -0
package/dist/errors/index.d.ts +0 -0
package/dist/errors/index.js +0 -0
package/dist/explainability/index.d.ts +0 -0
package/dist/explainability/index.js +0 -0
package/dist/explainability/keywords.d.ts +0 -0
package/dist/explainability/keywords.js +0 -0
package/dist/flywheel/feedback.d.ts +0 -0
package/dist/flywheel/feedback.js +0 -0
package/dist/flywheel/index.d.ts +0 -0
package/dist/flywheel/index.js +0 -0
package/dist/index.d.ts +0 -0
package/dist/parser/html-parser.d.ts +0 -0
package/dist/parser/html-parser.js +0 -0
package/dist/parser/index.d.ts +0 -0
package/dist/parser/index.js +0 -0
package/dist/parser/pdf-filter.d.ts +0 -0
package/dist/parser/pdf-filter.js +0 -0
package/dist/query/index.d.ts +0 -0
package/dist/query/index.js +0 -0
package/dist/query/parser.d.ts +0 -0
package/dist/query/parser.js +0 -0
package/dist/server/index.d.ts +0 -0
package/dist/server/index.js +0 -0
package/dist/server/raw-data-utils.d.ts +0 -0
package/dist/server/raw-data-utils.js +0 -0
package/dist/server/schemas.d.ts +0 -0
package/dist/server/schemas.js +0 -0
package/dist/utils/config-parsers.d.ts +0 -0
package/dist/utils/config-parsers.js +0 -0
package/dist/utils/config.d.ts +0 -0
package/dist/utils/config.js +0 -0
package/dist/utils/file-utils.d.ts +0 -0
package/dist/utils/file-utils.js +0 -0
package/dist/utils/math.d.ts +0 -0
package/dist/utils/math.js +0 -0
package/dist/utils/process-handlers.d.ts +0 -0
package/dist/utils/process-handlers.js +0 -0
package/dist/vectordb/index.d.ts +0 -0
package/dist/vectordb/index.js +12 -12
package/dist/web/api-routes.d.ts +0 -0
package/dist/web/api-routes.js +0 -0
package/dist/web/config-routes.d.ts +0 -0
package/dist/web/config-routes.js +0 -0
package/dist/web/database-manager.d.ts +0 -0
package/dist/web/database-manager.js +0 -0
package/dist/web/http-server.d.ts +0 -0
package/dist/web/http-server.js +0 -0
package/dist/web/index.d.ts +0 -0
package/dist/web/index.js +0 -0
package/dist/web/middleware/async-handler.d.ts +0 -0
package/dist/web/middleware/async-handler.js +0 -0
package/dist/web/middleware/auth.d.ts +0 -0
package/dist/web/middleware/auth.js +0 -0
package/dist/web/middleware/error-handler.d.ts +0 -0
package/dist/web/middleware/error-handler.js +0 -0
package/dist/web/middleware/index.d.ts +0 -0
package/dist/web/middleware/index.js +0 -0
package/dist/web/middleware/rate-limit.d.ts +0 -0
package/dist/web/middleware/rate-limit.js +0 -0
package/dist/web/middleware/request-logger.d.ts +0 -0
package/dist/web/middleware/request-logger.js +0 -0
package/dist/web/types.d.ts +0 -0
package/dist/web/types.js +0 -0
package/package.json +37 -50
package/skills/rag-vault/SKILL.md +111 -111
package/skills/rag-vault/references/html-ingestion.md +73 -73
package/skills/rag-vault/references/query-optimization.md +57 -57
package/skills/rag-vault/references/result-refinement.md +54 -54
package/web-ui/dist/assets/index-SBHxoAwi.js +0 -0
package/web-ui/dist/assets/index-ej8i4PGl.css +0 -0
package/web-ui/dist/index.html +0 -0
package/web-ui/dist/vite.svg +0 -0

package/README.md CHANGED

@@ -397,6 +397,7 @@ Copy the `DB_PATH` directory (default: `./lancedb/`).
 | File too large | Default limit is 100MB. Set `MAX_FILE_SIZE` higher or split the file. |
 | Path outside BASE_DIR | All file paths must be under `BASE_DIR`. Use absolute paths. |
 | MCP tools not showing | Verify config syntax, restart your AI tool completely (Cmd+Q on Mac). |
+| `mcp-publisher login github` fails with `slow_down` | Use token login instead: `mcp-publisher login github --token "$(gh auth token)"` (or pass a PAT). |
 | 401 Unauthorized | API key required. Set `RAG_API_KEY` or use correct header format. |
 | 429 Too Many Requests | Rate limited. Wait for reset or increase `RATE_LIMIT_MAX_REQUESTS`. |
 | CORS errors | Add your origin to `CORS_ORIGINS` environment variable. |

package/dist/vectordb/index.js CHANGED

@@ -323,15 +323,15 @@ class VectorStore {
             if (tableNames.includes(this.config.tableName)) {
                 // Open existing table
                 this.table = await this.db.openTable(this.config.tableName);
-                console.log(`VectorStore: Opened existing table "${this.config.tableName}"`);
+                console.error(`VectorStore: Opened existing table "${this.config.tableName}"`);
                 // Ensure FTS index exists (migration for existing databases)
                 await this.ensureFtsIndex();
             }
             else {
                 // Create new table (schema auto-defined on first data insertion)
-                console.log(`VectorStore: Table "${this.config.tableName}" will be created on first data insertion`);
+                console.error(`VectorStore: Table "${this.config.tableName}" will be created on first data insertion`);
             }
-            console.log(`VectorStore initialized: ${this.config.dbPath}`);
+            console.error(`VectorStore initialized: ${this.config.dbPath}`);
         }
         catch (error) {
             // Clean up partially initialized resources on failure
@@ -365,7 +365,7 @@ class VectorStore {
     async deleteChunks(filePath) {
         if (!this.table) {
             // If table doesn't exist, no deletion targets, return normally
-            console.log('VectorStore: Skipping deletion as table does not exist');
+            console.error('VectorStore: Skipping deletion as table does not exist');
             return;
         }
         // Validate file path before use in query to prevent SQL injection
@@ -381,7 +381,7 @@ class VectorStore {
             // so call delete directly
             // Note: Field names are case-sensitive, use backticks for camelCase fields
             await this.table.delete(`\`filePath\` = '${escapedFilePath}'`);
-            console.log(`VectorStore: Deleted chunks for file "${filePath}"`);
+            console.error(`VectorStore: Deleted chunks for file "${filePath}"`);
             // Rebuild FTS index after deleting data
             await this.rebuildFtsIndex();
         }
@@ -435,7 +435,7 @@ class VectorStore {
                         // Convert to LanceDB record format using explicit field mapping
                         const records = chunksWithFingerprints.map(toDbRecord);
                         this.table = await this.db.createTable(this.config.tableName, records);
-                        console.log(`VectorStore: Created table "${this.config.tableName}"`);
+                        console.error(`VectorStore: Created table "${this.config.tableName}"`);
                         // Create FTS index for hybrid search
                         await this.ensureFtsIndex();
                     })();
@@ -445,7 +445,7 @@ class VectorStore {
                     finally {
                         this.tableCreationPromise = null;
                     }
-                    console.log(`VectorStore: Inserted ${chunks.length} chunks`);
+                    console.error(`VectorStore: Inserted ${chunks.length} chunks`);
                     return;
                 }
             }
@@ -454,7 +454,7 @@ class VectorStore {
             await this.table.add(records);
             // Rebuild FTS index after adding new data
             await this.rebuildFtsIndex();
-            console.log(`VectorStore: Inserted ${chunks.length} chunks`);
+            console.error(`VectorStore: Inserted ${chunks.length} chunks`);
         }
         catch (error) {
             throw new index_js_1.DatabaseError('Failed to insert chunks', error);
@@ -492,12 +492,12 @@ class VectorStore {
             name: FTS_INDEX_NAME,
         });
         this.ftsEnabled = true;
-        console.log(`VectorStore: FTS index "${FTS_INDEX_NAME}" created successfully`);
+        console.error(`VectorStore: FTS index "${FTS_INDEX_NAME}" created successfully`);
         // Drop old FTS indices
         for (const idx of existingFtsIndices) {
             if (idx.name !== FTS_INDEX_NAME) {
                 await this.table.dropIndex(idx.name);
-                console.log(`VectorStore: Dropped old FTS index "${idx.name}"`);
+                console.error(`VectorStore: Dropped old FTS index "${idx.name}"`);
             }
         }
     }
@@ -579,7 +579,7 @@ class VectorStore {
      */
     async search(queryVector, queryText, limit = 10) {
         if (!this.table) {
-            console.log('VectorStore: Returning empty results as table does not exist');
+            console.error('VectorStore: Returning empty results as table does not exist');
             return [];
         }
         if (limit < 1 || limit > 20) {
@@ -779,7 +779,7 @@ class VectorStore {
         this.ftsEnabled = false;
         this.ftsFailureCount = 0;
         this.ftsLastFailure = null;
-        console.log('VectorStore: Connection closed');
+        console.error('VectorStore: Connection closed');
         // Propagate errors to caller after cleanup is complete
         if (errors.length > 0) {
             throw new index_js_1.DatabaseError(`Errors during close: ${errors.map((e) => e.message).join('; ')}`, errors[0]);
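
The 1.5.1 change swaps every `console.log` in `VectorStore` for `console.error`. The diff gives no rationale. A likely reason is that the package describes itself as an MCP server, and when an MCP server uses the stdio transport, stdout is reserved for protocol messages while stderr may be used for logging; a stray log line on stdout can corrupt the message stream. The sketch below illustrates that separation under that assumption. `createServerIo` and its `streams` parameter are hypothetical and not part of rag-vault.

```javascript
// Sketch: keep stdout for newline-delimited JSON-RPC messages and send
// human-readable diagnostics to stderr. Names here are illustrative only.
function createServerIo(streams = { out: process.stdout, err: process.stderr }) {
  return {
    // Diagnostics, like the VectorStore messages in the hunks above.
    log(message) {
      streams.err.write(`${message}\n`);
    },
    // Protocol messages: one JSON object per line, nothing else on stdout.
    send(message) {
      streams.out.write(`${JSON.stringify(message)}\n`);
    },
  };
}

// Usage against the real process streams:
// const io = createServerIo();
// io.log('VectorStore: Connection closed');
// io.send({ jsonrpc: '2.0', id: 1, result: {} });
```

Injecting the streams keeps the routing testable without touching the real process streams.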

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@robthepcguy/rag-vault",
-  "version": "1.5.0",
+  "version": "1.5.1",
   "description": "Local RAG MCP Server - Easy-to-setup document search with minimal configuration",
   "main": "dist/index.js",
   "bin": {
@@ -41,42 +41,6 @@
     "type": "git",
     "url": "git+https://github.com/RobThePCGuy/rag-vault.git"
   },
-  "scripts": {
-    "build": "tsc -p tsconfig.build.json && tsc-alias -p tsconfig.build.json",
-    "check": "pnpm type-check && pnpm lint && pnpm format:check",
-    "check:all": "pnpm check && pnpm check:web-ui && pnpm check:unused && pnpm check:deps && pnpm build && pnpm test:unit",
-    "check:deps": "madge --circular --extensions ts src",
-    "check:deps:graph": "madge --extensions ts --image graph.svg src",
-    "check:web-ui": "pnpm --prefix web-ui check",
-    "check:unused": "node scripts/check-unused-exports.js",
-    "check:unused:all": "knip",
-    "cleanup:processes": "bash ./scripts/cleanup-test-processes.sh",
-    "clean:dev": "rm -rf ./node_modules ./tmp ./uploads ./models ./lancedb ./dist ./package-lock.json && cd web-ui && rm -rf ./dist ./node_modules ./package-lock.json",
-    "dev": "tsx src/index.ts",
-    "format": "biome format --write src",
-    "format:check": "biome format src",
-    "lint": "biome lint src",
-    "lint:fix": "biome lint --write src",
-    "start": "node dist/index.js",
-    "test": "vitest run",
-    "test:coverage": "vitest run --coverage",
-    "test:safe": "pnpm test && pnpm cleanup:processes",
-    "test:watch": "vitest",
-    "type-check": "tsc --noEmit",
-    "audit": "pnpm audit --audit-level=moderate",
-    "audit:fix": "pnpm audit --fix",
-    "setup:web": "pnpm install && pnpm web:build && pnpm --prefix web-ui install && pnpm ui:build && pnpm web:start",
-    "ui:build": "pnpm --prefix web-ui build",
-    "ui:dev": "cd web-ui && pnpm dev",
-    "web:build": "pnpm build",
-    "web:dev": "concurrently -n api,ui -c blue,magenta \"pnpm web:watch\" \"pnpm --prefix web-ui dev\"",
-    "web:start": "node dist/web/index.js",
-    "web:watch": "tsx watch src/web/index.ts",
-    "web": "tsx src/web/index.ts",
-    "test:unit": "vitest run --project backend-unit --project web-ui",
-    "test:integration": "RUN_EMBEDDING_INTEGRATION=1 vitest run --project backend-integration",
-    "hooks:install": "git config core.hooksPath .githooks"
-  },
   "dependencies": {
     "@huggingface/transformers": "^3.7.6",
     "@lancedb/lancedb": "^0.23.0",
@@ -118,17 +82,40 @@
     "node": ">=20"
   },
   "mcpName": "io.github.RobThePCGuy/rag-vault",
-  "pnpm": {
-    "overrides": {
-      "tar": ">=7.5.7",
-      "diff": ">=4.0.4"
-    },
-    "onlyBuiltDependencies": [
-      "esbuild",
-      "onnxruntime-node",
-      "protobufjs",
-      "@robthepcguy/rag-vault",
-      "sharp"
-    ]
+  "scripts": {
+    "build": "tsc -p tsconfig.build.json && tsc-alias -p tsconfig.build.json",
+    "check": "pnpm type-check && pnpm lint && pnpm format:check",
+    "check:all": "pnpm check && pnpm check:web-ui && pnpm check:unused && pnpm check:deps && pnpm build && pnpm test:unit",
+    "check:deps": "madge --circular --extensions ts src",
+    "check:deps:graph": "madge --extensions ts --image graph.svg src",
+    "check:web-ui": "pnpm --prefix web-ui check",
+    "check:unused": "node scripts/check-unused-exports.js",
+    "check:unused:all": "knip",
+    "cleanup:processes": "bash ./scripts/cleanup-test-processes.sh",
+    "clean:dev": "rm -rf ./node_modules ./tmp ./uploads ./models ./lancedb ./dist ./package-lock.json && cd web-ui && rm -rf ./dist ./node_modules ./package-lock.json",
+    "dev": "tsx src/index.ts",
+    "format": "biome format --write src",
+    "format:check": "biome format src",
+    "lint": "biome lint src",
+    "lint:fix": "biome lint --write src",
+    "start": "node dist/index.js",
+    "test": "vitest run",
+    "test:coverage": "vitest run --coverage",
+    "test:safe": "pnpm test && pnpm cleanup:processes",
+    "test:watch": "vitest",
+    "type-check": "tsc --noEmit",
+    "audit": "pnpm audit --audit-level=moderate",
+    "audit:fix": "pnpm audit --fix",
+    "setup:web": "pnpm install && pnpm web:build && pnpm --prefix web-ui install && pnpm ui:build && pnpm web:start",
+    "ui:build": "pnpm --prefix web-ui build",
+    "ui:dev": "cd web-ui && pnpm dev",
+    "web:build": "pnpm build",
+    "web:dev": "concurrently -n api,ui -c blue,magenta \"pnpm web:watch\" \"pnpm --prefix web-ui dev\"",
+    "web:start": "node dist/web/index.js",
+    "web:watch": "tsx watch src/web/index.ts",
+    "web": "tsx src/web/index.ts",
+    "test:unit": "vitest run --project backend-unit --project web-ui",
+    "test:integration": "RUN_EMBEDDING_INTEGRATION=1 vitest run --project backend-integration",
+    "hooks:install": "git config core.hooksPath .githooks"
   }
-}
+}
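
Read together, the two `package.json` hunks bump `version` to 1.5.1, delete the `scripts` block after `repository` and re-add it line for line after `mcpName`, and drop the `pnpm` block (the `tar` and `diff` overrides and the `onlyBuiltDependencies` list) without replacing it. The final `-}`/`+}` pair shows no visible difference. One way to confirm that the `scripts` move is a pure reorder is to compare the parsed manifests key by key instead of line by line. The sketch below assumes both manifests are already parsed into objects; `diffManifestKeys` and `deepEqual` are hypothetical helpers.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Structural comparison for JSON values: object key order is ignored,
// array element order is not.
function deepEqual(a, b) {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== 'object' || typeof b !== 'object') {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every((key) => has(b, key) && deepEqual(a[key], b[key]));
}

// Report which top-level keys were added, removed, or changed in value.
// Moving a key to another position in the file does not count as a change.
function diffManifestKeys(before, after) {
  const added = [];
  const removed = [];
  const changed = [];
  for (const key of new Set([...Object.keys(before), ...Object.keys(after)])) {
    if (!has(after, key)) removed.push(key);
    else if (!has(before, key)) added.push(key);
    else if (!deepEqual(before[key], after[key])) changed.push(key);
  }
  return { added, removed, changed };
}
```

Applied to the full 1.5.0 and 1.5.1 manifests, this would be expected to list `version` as changed and `pnpm` as removed, with `scripts` unchanged.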

package/skills/rag-vault/SKILL.md CHANGED

@@ -1,111 +1,111 @@
----
-name: rag-vault
-description: This skill should be used when the user asks to "search documents", "query RAG", "ingest file", "ingest PDF", "save web page", "add to knowledge base", or mentions document search, semantic search, vector search, or RAG operations. Provides score interpretation (< 0.3 good, > 0.5 skip), query optimization, and ingestion guidance for query_documents, ingest_file, ingest_data tools.
-version: 1.0.0
----
-# RAG Vault Skills
-## Tools
-| Tool | Use When |
-|------|----------|
-| `ingest_file` | Local files (PDF, DOCX, TXT, MD, JSON, JSONL) |
-| `ingest_data` | Raw content (HTML, text) with source URL |
-| `query_documents` | Semantic + keyword hybrid search |
-| `delete_file` / `list_files` / `status` | Management |
-## Search: Core Rules
-Hybrid search combines vector (semantic) and keyword (BM25).
-### Score Interpretation
-Lower = better match. Use this to filter noise.
-| Score | Action |
-|-------|--------|
-| < 0.3 | Use directly |
-| 0.3-0.5 | Include if mentions same concept/entity |
-| > 0.5 | Skip unless no better results |
-### Limit Selection
-| Intent | Limit |
-|--------|-------|
-| Specific answer (function, error) | 5 |
-| General understanding | 10 |
-| Comprehensive survey | 20 |
-### Query Formulation
-| Situation | Why Transform | Action |
-|-----------|---------------|--------|
-| Specific term mentioned | Keyword search needs exact match | KEEP term |
-| Vague query | Vector search needs semantic signal | ADD context |
-| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
-| Multiple distinct topics | Single query conflates results | SPLIT queries |
-| Few/poor results | Term mismatch | EXPAND (see below) |
-### Query Expansion
-When results are few or all score > 0.5, expand query terms:
-- Keep original term first, add 2-4 variants
-- Types: synonyms, abbreviations, related terms, word forms
-- Example: `"config"` → `"config configuration settings configure"`
-Avoid over-expansion (causes topic drift).
-### Result Selection
-When to include vs skip—based on answer quality, not just score.
-**INCLUDE** if:
-- Directly answers the question
-- Provides necessary context
-- Score < 0.5
-**SKIP** if:
-- Same keyword, unrelated context
-- Score > 0.7
-- Mentions term without explanation
-## Ingestion
-### ingest_file
-```
-ingest_file({ filePath: "/absolute/path/to/document.pdf" })
-```
-### ingest_data
-```
-ingest_data({
-  content: "<html>...</html>",
-  metadata: { source: "https://example.com/page", format: "html" }
-})
-```
-**Format selection** — match the data you have:
-- HTML string → `format: "html"`
-- Markdown string → `format: "markdown"`
-- Other → `format: "text"`
-**Source format:**
-- Web page → Use URL: `https://example.com/page`
-- Other content → Use scheme: `{type}://{date}` or `{type}://{date}/{detail}`
-  - Examples: `clipboard://2024-12-30`, `chat://2024-12-30/project-discussion`
-**HTML source options:**
-- Static page → LLM fetch
-- SPA/JS-rendered → Browser MCP
-- Auth required → Manual paste
-Re-ingest same source to update. Use same source in `delete_file` to remove.
-## References
-For edge cases and examples:
-- [html-ingestion.md](references/html-ingestion.md) - URL normalization, SPA handling
-- [query-optimization.md](references/query-optimization.md) - Query patterns by intent
-- [result-refinement.md](references/result-refinement.md) - Contradiction resolution, chunking
+---
+name: rag-vault
+description: This skill should be used when the user asks to "search documents", "query RAG", "ingest file", "ingest PDF", "save web page", "add to knowledge base", or mentions document search, semantic search, vector search, or RAG operations. Provides score interpretation (< 0.3 good, > 0.5 skip), query optimization, and ingestion guidance for query_documents, ingest_file, ingest_data tools.
+version: 1.0.0
+---
+# RAG Vault Skills
+## Tools
+| Tool | Use When |
+|------|----------|
+| `ingest_file` | Local files (PDF, DOCX, TXT, MD, JSON, JSONL) |
+| `ingest_data` | Raw content (HTML, text) with source URL |
+| `query_documents` | Semantic + keyword hybrid search |
+| `delete_file` / `list_files` / `status` | Management |
+## Search: Core Rules
+Hybrid search combines vector (semantic) and keyword (BM25).
+### Score Interpretation
+Lower = better match. Use this to filter noise.
+| Score | Action |
+|-------|--------|
+| < 0.3 | Use directly |
+| 0.3-0.5 | Include if mentions same concept/entity |
+| > 0.5 | Skip unless no better results |
+### Limit Selection
+| Intent | Limit |
+|--------|-------|
+| Specific answer (function, error) | 5 |
+| General understanding | 10 |
+| Comprehensive survey | 20 |
+### Query Formulation
+| Situation | Why Transform | Action |
+|-----------|---------------|--------|
+| Specific term mentioned | Keyword search needs exact match | KEEP term |
+| Vague query | Vector search needs semantic signal | ADD context |
+| Error stack or code block | Long text dilutes relevance | EXTRACT core keywords |
+| Multiple distinct topics | Single query conflates results | SPLIT queries |
+| Few/poor results | Term mismatch | EXPAND (see below) |
+### Query Expansion
+When results are few or all score > 0.5, expand query terms:
+- Keep original term first, add 2-4 variants
+- Types: synonyms, abbreviations, related terms, word forms
+- Example: `"config"` → `"config configuration settings configure"`
+Avoid over-expansion (causes topic drift).
+### Result Selection
+When to include vs skip—based on answer quality, not just score.
+**INCLUDE** if:
+- Directly answers the question
+- Provides necessary context
+- Score < 0.5
+**SKIP** if:
+- Same keyword, unrelated context
+- Score > 0.7
+- Mentions term without explanation
+## Ingestion
+### ingest_file
+```
+ingest_file({ filePath: "/absolute/path/to/document.pdf" })
+```
+### ingest_data
+```
+ingest_data({
+  content: "<html>...</html>",
+  metadata: { source: "https://example.com/page", format: "html" }
+})
+```
+**Format selection** — match the data you have:
+- HTML string → `format: "html"`
+- Markdown string → `format: "markdown"`
+- Other → `format: "text"`
+**Source format:**
+- Web page → Use URL: `https://example.com/page`
+- Other content → Use scheme: `{type}://{date}` or `{type}://{date}/{detail}`
+  - Examples: `clipboard://2024-12-30`, `chat://2024-12-30/project-discussion`
+**HTML source options:**
+- Static page → LLM fetch
+- SPA/JS-rendered → Browser MCP
+- Auth required → Manual paste
+Re-ingest same source to update. Use same source in `delete_file` to remove.
+## References
+For edge cases and examples:
+- [html-ingestion.md](references/html-ingestion.md) - URL normalization, SPA handling
+- [query-optimization.md](references/query-optimization.md) - Query patterns by intent
+- [result-refinement.md](references/result-refinement.md) - Contradiction resolution, chunking
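
SKILL.md states its search heuristics as tables. The sketch below restates three of them for a client that post-processes `query_documents` results: the score bands (< 0.3 use directly, 0.3-0.5 include only if the chunk mentions the same concept, > 0.5 skip unless nothing better exists), the expansion trigger (few results, or every score above 0.5), and the expansion rule (original term first, at most four variants). It assumes results shaped like the `{ filePath, source, text, score }` example in html-ingestion.md. `selectResults`, `needsExpansion`, `expandQuery`, the case-insensitive substring test standing in for "same concept/entity", and the `minResults` default of 3 are all hypothetical.

```javascript
// Apply the SKILL.md score bands. Lower score = better match.
function selectResults(results, keyTerm) {
  const term = keyTerm.toLowerCase();
  const ranked = [...results].sort((a, b) => a.score - b.score);
  // < 0.3: use directly.
  const strong = ranked.filter((r) => r.score < 0.3);
  // 0.3-0.5: include only if the chunk mentions the key term (assumed proxy).
  const borderline = ranked.filter(
    (r) => r.score >= 0.3 && r.score <= 0.5 && r.text.toLowerCase().includes(term),
  );
  const selected = [...strong, ...borderline];
  // > 0.5 only: nothing better was found, so fall back to the ranked list.
  return selected.length > 0 ? selected : ranked;
}

// Expand when results are few or all score above 0.5.
function needsExpansion(results, minResults = 3) {
  return results.length < minResults || results.every((r) => r.score > 0.5);
}

// Keep the original term first and add at most four distinct variants.
function expandQuery(term, variants) {
  const extras = [...new Set(variants)].filter((v) => v !== term).slice(0, 4);
  return [term, ...extras].join(' ');
}
```

For example, `expandQuery('config', ['configuration', 'settings', 'configure'])` produces the SKILL.md example string `"config configuration settings configure"`.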

package/skills/rag-vault/references/html-ingestion.md CHANGED

@@ -1,73 +1,73 @@
-# HTML Ingestion Reference
-Basic usage is in SKILL.md. This covers URL handling and edge cases.
-## System Behavior
-The parser extracts main content only—navigation, ads, and boilerplate are stripped. What gets indexed is clean body text, not the full HTML.
-## When to Use Each Source Method
-| Source Type | Method | Why |
-|-------------|--------|-----|
-| Static page, public | LLM fetch | Simplest, no extra tools |
-| SPA / JS-rendered | Browser MCP | Need rendered DOM |
-| Auth required | Manual paste | Can't fetch programmatically |
-## URL Normalization
-System strips query strings and fragments:
-```
-https://example.com/page?utm=x#section → https://example.com/page
-```
-**When query strings matter** (pagination, dynamic IDs):
-```
-ingest_data({
-  content: page1_html,
-  metadata: { source: "https://example.com/results?page=1", format: "html" }
-})
-```
-Explicitly include full URL as source.
-## Edge Cases
-### Empty/Minimal Extraction
-Why it happens:
-- JS-rendered content (use browser MCP)
-- Non-standard HTML structure
-- Login required
-### SPA/Dynamic Content
-1. Use browser MCP to render
-2. Wait for content load
-3. Extract rendered HTML
-4. Ingest via `ingest_data`
-### Pages with Only Navigation
-Skip or fetch deeper linked pages instead.
-## Updating Content
-Re-ingest with same source to replace:
-```
-ingest_data({
-  content: updated_html,
-  metadata: { source: "https://example.com/page", format: "html" }
-})
-```
-## Search Results
-Results from HTML include `source` field:
-```json
-{
-  "filePath": "raw-data/abc123.md",
-  "source": "https://example.com/page",
-  "text": "...",
-  "score": 0.25
-}
-```
+# HTML Ingestion Reference
+Basic usage is in SKILL.md. This covers URL handling and edge cases.
+## System Behavior
+The parser extracts main content only—navigation, ads, and boilerplate are stripped. What gets indexed is clean body text, not the full HTML.
+## When to Use Each Source Method
+| Source Type | Method | Why |
+|-------------|--------|-----|
+| Static page, public | LLM fetch | Simplest, no extra tools |
+| SPA / JS-rendered | Browser MCP | Need rendered DOM |
+| Auth required | Manual paste | Can't fetch programmatically |
+## URL Normalization
+System strips query strings and fragments:
+```
+https://example.com/page?utm=x#section → https://example.com/page
+```
+**When query strings matter** (pagination, dynamic IDs):
+```
+ingest_data({
+  content: page1_html,
+  metadata: { source: "https://example.com/results?page=1", format: "html" }
+})
+```
+Explicitly include full URL as source.
+## Edge Cases
+### Empty/Minimal Extraction
+Why it happens:
+- JS-rendered content (use browser MCP)
+- Non-standard HTML structure
+- Login required
+### SPA/Dynamic Content
+1. Use browser MCP to render
+2. Wait for content load
+3. Extract rendered HTML
+4. Ingest via `ingest_data`
+### Pages with Only Navigation
+Skip or fetch deeper linked pages instead.
+## Updating Content
+Re-ingest with same source to replace:
+```
+ingest_data({
+  content: updated_html,
+  metadata: { source: "https://example.com/page", format: "html" }
+})
+```
+## Search Results
+Results from HTML include `source` field:
+```json
+{
+  "filePath": "raw-data/abc123.md",
+  "source": "https://example.com/page",
+  "text": "...",
+  "score": 0.25
+}
+```
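
The skill files describe two conventions for the `source` field: web sources are normalized by stripping the query string and fragment (html-ingestion.md), and non-web content uses a `{type}://{date}` or `{type}://{date}/{detail}` scheme (SKILL.md). The sketch below illustrates both with the WHATWG `URL` class built into Node.js. `normalizeSourceUrl` and `buildSourceId` are hypothetical helpers that mirror the documented behavior; they are not rag-vault's own implementation.

```javascript
// Drop the query string and fragment, keeping scheme, host, and path,
// as html-ingestion.md describes for web sources.
function normalizeSourceUrl(raw) {
  const url = new URL(raw);
  url.search = '';
  url.hash = '';
  return url.toString();
}

// Build a {type}://{date} or {type}://{date}/{detail} source identifier
// for content that has no URL (clipboard text, chat excerpts, and so on).
function buildSourceId(type, date, detail) {
  return detail ? `${type}://${date}/${detail}` : `${type}://${date}`;
}
```

Because normalization discards parameters such as `?page=1`, pages that differ only by query string would collapse to one source, which is the case the reference handles by passing the full URL explicitly.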

package/skills/rag-vault/references/query-optimization.md CHANGED

@@ -1,57 +1,57 @@
-# Query Optimization Reference
-Core rules are in SKILL.md. This covers patterns and edge cases.
-## Query Patterns by Intent
-| User Intent | Query Pattern | Why |
-|-------------|---------------|-----|
-| Definition/Concept | `"[term] definition concept"` | Targets explanatory content |
-| How-To/Procedure | `"[action] steps example usage"` | Targets instructional content |
-| API/Function | `"[function] API arguments return"` | Targets reference docs |
-| Troubleshooting | `"[error] fix solution cause"` | Targets problem-solving content |
-## Multi-Query: When to Split
-**Split** when "and" connects distinct topics:
-```
-"How do I authenticate AND handle errors?"
-→ Query 1: "authentication login JWT session"
-→ Query 2: "error handling exception catch"
-```
-**Don't split** when "and" is within single topic:
-```
-"How do I set up and configure the database?"
-→ Single: "database setup configuration"
-```
-## Query Expansion Examples
-When results are few or all score > 0.5:
-| Type | Original | Expanded |
-|------|----------|----------|
-| Synonyms | delete | "delete remove" |
-| Abbreviations | API | "API Application Programming Interface" |
-| Related terms | auth | "auth authentication login" |
-| Word forms | config | "config configuration configure" |
-Keep original term first. Limit to 2-4 additions.
-## Iterative Refinement
-When initial results are unsatisfactory:
-| Problem | Why It Happens | Action |
-|---------|----------------|--------|
-| Too few results | Term mismatch | Expand query (see above) |
-| Too many irrelevant | Query too broad | Add specific terms |
-| Missing expected | Phrasing mismatch | Try alternative wording |
-## Language Mixing
-Ngram tokenization supports cross-language queries:
-```
-"API error handling" → matches both English and Japanese content
-```
+# Query Optimization Reference
+Core rules are in SKILL.md. This covers patterns and edge cases.
+## Query Patterns by Intent
+| User Intent | Query Pattern | Why |
+|-------------|---------------|-----|
+| Definition/Concept | `"[term] definition concept"` | Targets explanatory content |
+| How-To/Procedure | `"[action] steps example usage"` | Targets instructional content |
+| API/Function | `"[function] API arguments return"` | Targets reference docs |
+| Troubleshooting | `"[error] fix solution cause"` | Targets problem-solving content |
+## Multi-Query: When to Split
+**Split** when "and" connects distinct topics:
+```
+"How do I authenticate AND handle errors?"
+→ Query 1: "authentication login JWT session"
+→ Query 2: "error handling exception catch"
+```
+**Don't split** when "and" is within single topic:
+```
+"How do I set up and configure the database?"
+→ Single: "database setup configuration"
+```
+## Query Expansion Examples
+When results are few or all score > 0.5:
+| Type | Original | Expanded |
+|------|----------|----------|
+| Synonyms | delete | "delete remove" |
+| Abbreviations | API | "API Application Programming Interface" |
+| Related terms | auth | "auth authentication login" |
+| Word forms | config | "config configuration configure" |
+Keep original term first. Limit to 2-4 additions.
+## Iterative Refinement
+When initial results are unsatisfactory:
+| Problem | Why It Happens | Action |
+|---------|----------------|--------|
+| Too few results | Term mismatch | Expand query (see above) |
+| Too many irrelevant | Query too broad | Add specific terms |
+| Missing expected | Phrasing mismatch | Try alternative wording |
+## Language Mixing
+Ngram tokenization supports cross-language queries:
+```
+"API error handling" → matches both English and Japanese content
+```
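
The query-optimization rules above can be expressed as a few small TypeScript functions. This is a minimal sketch, not code shipped in the package. The intent suffixes come from the "Query Patterns by Intent" table. The expansion trigger follows "few results or all score > 0.5". The reference does not define "few", so `minResults = 3` is an assumption. Expansion keeps the original term first and caps additions at 4. All function names here are hypothetical.

```typescript
// Query suffixes per intent, taken from the "Query Patterns by Intent" table.
type Intent = "definition" | "howto" | "api" | "troubleshooting";

const PATTERN_SUFFIX: Record<Intent, string> = {
  definition: "definition concept",
  howto: "steps example usage",
  api: "API arguments return",
  troubleshooting: "fix solution cause",
};

function queryForIntent(intent: Intent, term: string): string {
  return `${term} ${PATTERN_SUFFIX[intent]}`;
}

// Expand when there are few results, or when every score is above 0.5
// (lower score = better match). minResults = 3 is an assumed value for "few".
function needsExpansion(scores: number[], minResults = 3): boolean {
  return scores.length < minResults || scores.every((s) => s > 0.5);
}

// Keep the original term first and append at most 4 distinct additions.
function expandQuery(original: string, additions: string[]): string {
  const seen = new Set<string>([original.toLowerCase()]);
  const extra: string[] = [];
  for (const a of additions) {
    const key = a.toLowerCase();
    if (!seen.has(key) && extra.length < 4) {
      seen.add(key);
      extra.push(a);
    }
  }
  return [original, ...extra].join(" ");
}
```

For example, `expandQuery("config", ["configuration", "configure"])` yields the "Word forms" row of the expansion table. Deciding whether to split a question on "and" is a judgment about topics, so the sketch does not attempt it.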

package/skills/rag-vault/references/result-refinement.md CHANGED

@@ -1,54 +1,54 @@
-# Result Refinement Reference
-Core rules (score, include/skip) are in SKILL.md. This covers when and how to combine multiple results.
-## When to Synthesize vs Filter
-Match approach to user intent:
-| User Intent | Approach | Why |
-|-------------|----------|-----|
-| Specific answer ("how to X") | Filter to 1-2 best | Extra results add noise |
-| Understanding a topic | Synthesize multiple | Builds complete picture |
-| Troubleshooting error | Filter to direct cause | Tangential info confuses |
-| Comparing options | Synthesize with structure | Need all perspectives |
-## Multiple Results Handling
-### Synthesis
-When: User needs comprehensive understanding.
-```
-Result 1: "API accepts JSON..."
-Result 2: "Auth uses Bearer tokens..."
-→ Combine into unified answer
-```
-### Deduplication
-When: Results overlap significantly.
-1. Pick most complete result
-2. Add only unique info from others
-### Contradiction Resolution
-When: Results conflict.
-Priority: Lower score (= better match)
-If unresolved → Note discrepancy to user
-## Chunk Context
-Single chunks may lack context ("as described above").
-- Note when information is partial
-- Group multiple chunks from same `filePath` as coherent sections
-## No Results
-1. Rephrase query (alternative terms)
-2. Broaden scope
-3. Check ingestion (`list_files`)
-4. Inform user: no matching content
+# Result Refinement Reference
+Core rules (score, include/skip) are in SKILL.md. This covers when and how to combine multiple results.
+## When to Synthesize vs Filter
+Match approach to user intent:
+| User Intent | Approach | Why |
+|-------------|----------|-----|
+| Specific answer ("how to X") | Filter to 1-2 best | Extra results add noise |
+| Understanding a topic | Synthesize multiple | Builds complete picture |
+| Troubleshooting error | Filter to direct cause | Tangential info confuses |
+| Comparing options | Synthesize with structure | Need all perspectives |
+## Multiple Results Handling
+### Synthesis
+When: User needs comprehensive understanding.
+```
+Result 1: "API accepts JSON..."
+Result 2: "Auth uses Bearer tokens..."
+→ Combine into unified answer
+```
+### Deduplication
+When: Results overlap significantly.
+1. Pick most complete result
+2. Add only unique info from others
+### Contradiction Resolution
+When: Results conflict.
+Priority: Lower score (= better match)
+If unresolved → Note discrepancy to user
+## Chunk Context
+Single chunks may lack context ("as described above").
+- Note when information is partial
+- Group multiple chunks from same `filePath` as coherent sections
+## No Results
+1. Rephrase query (alternative terms)
+2. Broaden scope
+3. Check ingestion (`list_files`)
+4. Inform user: no matching content
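
The result-refinement rules above (filter to the best results, deduplicate, prefer the lower score on conflict, group chunks by `filePath`) can be sketched in TypeScript. This is an illustration under stated assumptions, not the package's implementation. The `Result` shape reuses the `filePath`, `text`, and `score` fields from the search-result example in the HTML reference. The following three points are also assumptions: "most complete" is approximated by text length, "significant overlap" by substring containment, and "unresolved" by a score tie. All function names are hypothetical.

```typescript
// Hypothetical result shape; lower score = better match.
interface Result {
  filePath: string;
  text: string;
  score: number;
}

// Specific-answer intent: keep only the 1-2 best (lowest-score) results.
function filterBest(results: Result[], keep = 2): Result[] {
  return [...results].sort((a, b) => a.score - b.score).slice(0, keep);
}

// Deduplication: start from the most complete (longest) result and keep
// others only if their text is not already contained in a kept result.
function dedupe(results: Result[]): Result[] {
  const byCompleteness = [...results].sort((a, b) => b.text.length - a.text.length);
  const kept: Result[] = [];
  for (const r of byCompleteness) {
    if (!kept.some((k) => k.text.includes(r.text))) kept.push(r);
  }
  return kept;
}

// Contradiction: prefer the lower score. On a tie, return undefined so the
// caller can note the discrepancy to the user.
function resolveConflict(a: Result, b: Result): Result | undefined {
  if (a.score === b.score) return undefined;
  return a.score < b.score ? a : b;
}

// Group chunks from the same filePath so they read as one coherent section.
function groupByFile(results: Result[]): Map<string, Result[]> {
  const groups = new Map<string, Result[]>();
  for (const r of results) {
    const list = groups.get(r.filePath) ?? [];
    list.push(r);
    groups.set(r.filePath, list);
  }
  return groups;
}
```

Substring containment catches only exact repeats. Chunks that overlap only partially will both survive `dedupe`, and they still need to be merged by hand as the reference describes.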

package/web-ui/dist/assets/index-SBHxoAwi.js CHANGED

File without changes

package/web-ui/dist/assets/index-ej8i4PGl.css CHANGED

File without changes

package/web-ui/dist/index.html CHANGED

File without changes

package/web-ui/dist/vite.svg CHANGED

File without changes