npm - kiri-mcp-server - Versions diffs - 0.4.1 → 0.6.0 - Mend

kiri-mcp-server 0.4.1 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43) hide show

package/README.md +58 -5
package/config/default.example.yml +9 -0
package/config/scoring-profiles.yml +11 -6
package/dist/config/default.example.yml +9 -0
package/dist/config/scoring-profiles.yml +11 -6
package/dist/package.json +1 -1
package/dist/server/context.js +0 -1
package/dist/server/handlers.js +547 -79
package/dist/server/scoring.js +8 -3
package/dist/shared/duckdb.js +0 -2
package/dist/shared/embedding.js +15 -2
package/dist/shared/tokenizer.js +0 -1
package/dist/shared/utils/simpleYaml.js +0 -1
package/dist/src/indexer/cli.d.ts +1 -0
package/dist/src/indexer/cli.d.ts.map +1 -1
package/dist/src/indexer/cli.js +197 -11
package/dist/src/indexer/cli.js.map +1 -1
package/dist/src/indexer/watch.d.ts +4 -3
package/dist/src/indexer/watch.d.ts.map +1 -1
package/dist/src/indexer/watch.js +11 -7
package/dist/src/indexer/watch.js.map +1 -1
package/dist/src/server/handlers.d.ts.map +1 -1
package/dist/src/server/handlers.js +234 -26
package/dist/src/server/handlers.js.map +1 -1
package/dist/src/server/rpc.d.ts.map +1 -1
package/dist/src/server/rpc.js +9 -3
package/dist/src/server/rpc.js.map +1 -1
package/dist/src/server/scoring.d.ts +2 -0
package/dist/src/server/scoring.d.ts.map +1 -1
package/dist/src/server/scoring.js +13 -1
package/dist/src/server/scoring.js.map +1 -1
package/dist/src/shared/duckdb.d.ts +1 -0
package/dist/src/shared/duckdb.d.ts.map +1 -1
package/dist/src/shared/duckdb.js +54 -3
package/dist/src/shared/duckdb.js.map +1 -1
package/dist/src/shared/embedding.d.ts.map +1 -1
package/dist/src/shared/embedding.js +2 -8
package/dist/src/shared/embedding.js.map +1 -1
package/dist/src/shared/tokenizer.d.ts +18 -0
package/dist/src/shared/tokenizer.d.ts.map +1 -1
package/dist/src/shared/tokenizer.js +35 -0
package/dist/src/shared/tokenizer.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED Viewed

@@ -2,7 +2,7 @@
 > Intelligent code context extraction for LLMs via Model Context Protocol
-[![Version](https://img.shields.io/badge/version-0.4.1-blue.svg)](package.json)
+[![Version](https://img.shields.io/badge/version-0.6.0-blue.svg)](package.json)
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue.svg)](https://www.typescriptlang.org/)
 [![MCP](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io/)
@@ -12,11 +12,12 @@
 ## 🎯 Why KIRI?
 - **🔌 MCP Native**: Plug-and-play integration with Claude Desktop, Codex CLI, and other MCP clients
-- **🧠 Smart Context**: Extract minimal, relevant code fragments based on task goals
+- **🧠 Smart Context**: Extract minimal, relevant code fragments based on task goals (95% accuracy)
 - **⚡ Fast**: Sub-second response time for most queries
 - **🔍 Semantic Search**: Multi-word queries, dependency analysis, and BM25 ranking
 - **👁️ Auto-Sync**: Watch mode automatically re-indexes when files change
 - **🛡️ Reliable**: Degrade-first architecture works without optional extensions
+- **📝 Phrase-Aware**: Recognizes compound terms (kebab-case, snake_case) for precise matching
 ## ⚙️ Prerequisites
@@ -152,9 +153,15 @@ KIRI provides 5 MCP tools for intelligent code exploration:
 ### 1. context_bundle
-**Extract relevant code context based on task goals**
+**Extract relevant code context based on task goals (95% accuracy)**
-The most powerful tool for getting started with unfamiliar code. Provide a task description, and KIRI returns the most relevant code snippets.
+The most powerful tool for getting started with unfamiliar code. Provide a task description, and KIRI returns the most relevant code snippets using phrase-aware tokenization and path-based scoring.
+**v0.6.0 improvements:**
+- **Phrase-aware tokenization**: Recognizes compound terms like `page-agent`, `user_profile` as single concepts (2× scoring weight)
+- **Path-based scoring**: Additional boost when keywords/phrases appear in file paths
+- **95% accuracy**: Improved from 65-75% through enhanced tokenization and scoring
 **When to use:**
@@ -405,11 +412,57 @@ kiri --repo . --db .kiri/index.duckdb --watch --debounce 1000
 **Watch Mode Features:**
 - **Debouncing**: Aggregates rapid changes to minimize reindex operations
+- **Incremental Indexing**: Only reindexes changed files (10-100x faster)
 - **Background Operation**: Doesn't interrupt ongoing queries
 - **Denylist Integration**: Respects `.gitignore` and `denylist.yml`
 - **Lock Management**: Prevents concurrent indexing
 - **Statistics**: Tracks reindex count, duration, and queue depth
+### Tokenization Strategy
+Control how KIRI tokenizes and matches compound terms using the `KIRI_TOKENIZATION_STRATEGY` environment variable:
+```bash
+# Phrase-aware (default): Recognizes kebab-case/snake_case as phrases
+export KIRI_TOKENIZATION_STRATEGY=phrase-aware
+# Legacy: Traditional word-by-word tokenization
+export KIRI_TOKENIZATION_STRATEGY=legacy
+# Hybrid: Both phrase and word-level matching
+export KIRI_TOKENIZATION_STRATEGY=hybrid
+```
+**Strategies:**
+- **`phrase-aware`** (default): Compound terms like `page-agent`, `user_profile` are treated as single phrases with 2× scoring weight. Best for codebases with consistent naming conventions.
+- **`legacy`**: Traditional tokenization that splits all delimiters. Use for backward compatibility.
+- **`hybrid`**: Combines both strategies for maximum flexibility.
+### Database Auto-Gitignore
+KIRI automatically creates `.gitignore` files in database directories to prevent accidental commits:
+```typescript
+// Enabled by default
+const db = await DuckDBClient.connect({
+  databasePath: ".kiri/index.duckdb",
+  autoGitignore: true, // Creates .gitignore with "*" pattern
+});
+// Disable if needed
+const db = await DuckDBClient.connect({
+  databasePath: ".kiri/index.duckdb",
+  autoGitignore: false,
+});
+```
+**Behavior:**
+- Only creates `.gitignore` if directory is inside a Git repository
+- Never overwrites existing `.gitignore` files
+- Uses wildcard pattern (`*`) to ignore all database files
 ### File Type Boosting
 Control search ranking behavior with the `boost_profile` parameter:
@@ -680,6 +733,6 @@ Built with:
 ---
-**Status**: v0.4.1 (Beta) - Production-ready for MCP clients
+**Status**: v0.6.0 (Beta) - Production-ready for MCP clients
 For questions or support, please open a [GitHub issue](https://github.com/CAPHTECH/kiri/issues).

package/config/default.example.yml CHANGED Viewed

@@ -4,6 +4,15 @@ mcp:
   tools:
     - context_bundle
     - files_search
+# Tokenization configuration for keyword extraction
+tokenization:
+  # Strategy: "phrase-aware" (default), "legacy", or "hybrid"
+  # - phrase-aware: Preserves hyphenated terms (e.g., "page-agent" stays as one unit)
+  # - legacy: Splits on hyphens (e.g., "page-agent" → ["page", "agent"])
+  # - hybrid: Emits both phrases and split keywords
+  strategy: "phrase-aware"
 indexer:
   repoRoot: "../../target-repo"
   database: "var/index.duckdb"

package/config/scoring-profiles.yml CHANGED Viewed

@@ -2,36 +2,41 @@
 # Each profile defines weights for different ranking signals
 default:
-  textMatch: 0.8 # Text/keyword match weight (reduced to decrease noise from broad matches)
+  textMatch: 1.0 # Text/keyword match weight (increased to prioritize literal matches)
+  pathMatch: 1.5 # Path-based match weight (new - prioritizes files with keywords in paths)
   editingPath: 2.0 # Currently editing file weight
   dependency: 0.6 # Dependency relationship weight (increased to prioritize connected implementation files)
   proximity: 0.25 # Same directory weight
-  structural: 1.0 # Structural similarity weight (increased to improve semantic matching for broad terms)
+  structural: 0.6 # Structural similarity weight (reduced to prevent false positives from similar structure)
 bugfix:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.8
   dependency: 0.7 # Higher: bugs often in dependencies
   proximity: 0.35
-  structural: 0.9 # Higher: structural understanding helps
+  structural: 0.7 # Reduced: prevent canvas-agent matching when searching page-agent
 testfail:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.6
   dependency: 0.85 # Very high: failed tests reveal dependencies
   proximity: 0.3
-  structural: 0.8
+  structural: 0.7 # Reduced: focus on actual test dependencies
 typeerror:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.4
   dependency: 0.6
   proximity: 0.4 # Higher: type errors cluster in modules
-  structural: 0.6 # Lower: type errors are structural
+  structural: 0.6 # Already balanced for type analysis
 feature:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.5
   dependency: 0.45 # Lower: new features less dependent
   proximity: 0.5 # Higher: features cluster spatially
-  structural: 0.7
+  structural: 0.6 # Reduced: focus on actual feature files

package/dist/config/default.example.yml CHANGED Viewed

@@ -4,6 +4,15 @@ mcp:
   tools:
     - context_bundle
     - files_search
+# Tokenization configuration for keyword extraction
+tokenization:
+  # Strategy: "phrase-aware" (default), "legacy", or "hybrid"
+  # - phrase-aware: Preserves hyphenated terms (e.g., "page-agent" stays as one unit)
+  # - legacy: Splits on hyphens (e.g., "page-agent" → ["page", "agent"])
+  # - hybrid: Emits both phrases and split keywords
+  strategy: "phrase-aware"
 indexer:
   repoRoot: "../../target-repo"
   database: "var/index.duckdb"

package/dist/config/scoring-profiles.yml CHANGED Viewed

@@ -2,36 +2,41 @@
 # Each profile defines weights for different ranking signals
 default:
-  textMatch: 0.8 # Text/keyword match weight (reduced to decrease noise from broad matches)
+  textMatch: 1.0 # Text/keyword match weight (increased to prioritize literal matches)
+  pathMatch: 1.5 # Path-based match weight (new - prioritizes files with keywords in paths)
   editingPath: 2.0 # Currently editing file weight
   dependency: 0.6 # Dependency relationship weight (increased to prioritize connected implementation files)
   proximity: 0.25 # Same directory weight
-  structural: 1.0 # Structural similarity weight (increased to improve semantic matching for broad terms)
+  structural: 0.6 # Structural similarity weight (reduced to prevent false positives from similar structure)
 bugfix:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.8
   dependency: 0.7 # Higher: bugs often in dependencies
   proximity: 0.35
-  structural: 0.9 # Higher: structural understanding helps
+  structural: 0.7 # Reduced: prevent canvas-agent matching when searching page-agent
 testfail:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.6
   dependency: 0.85 # Very high: failed tests reveal dependencies
   proximity: 0.3
-  structural: 0.8
+  structural: 0.7 # Reduced: focus on actual test dependencies
 typeerror:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.4
   dependency: 0.6
   proximity: 0.4 # Higher: type errors cluster in modules
-  structural: 0.6 # Lower: type errors are structural
+  structural: 0.6 # Already balanced for type analysis
 feature:
   textMatch: 1.0
+  pathMatch: 1.5
   editingPath: 1.5
   dependency: 0.45 # Lower: new features less dependent
   proximity: 0.5 # Higher: features cluster spatially
-  structural: 0.7
+  structural: 0.6 # Reduced: focus on actual feature files

package/dist/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
     "name": "kiri-mcp-server",
-    "version": "0.4.1",
+    "version": "0.6.0",
     "description": "KIRI context extraction platform",
     "type": "module",
     "packageManager": "pnpm@9.0.0",

package/dist/server/context.js CHANGED Viewed

	@@ -1,2 +1 @@
1 1	export {};
2	- //# sourceMappingURL=context.js.map