kiri-mcp-server 0.4.1 → 0.6.0

Files changed (43)
  1. package/README.md +58 -5
  2. package/config/default.example.yml +9 -0
  3. package/config/scoring-profiles.yml +11 -6
  4. package/dist/config/default.example.yml +9 -0
  5. package/dist/config/scoring-profiles.yml +11 -6
  6. package/dist/package.json +1 -1
  7. package/dist/server/context.js +0 -1
  8. package/dist/server/handlers.js +547 -79
  9. package/dist/server/scoring.js +8 -3
  10. package/dist/shared/duckdb.js +0 -2
  11. package/dist/shared/embedding.js +15 -2
  12. package/dist/shared/tokenizer.js +0 -1
  13. package/dist/shared/utils/simpleYaml.js +0 -1
  14. package/dist/src/indexer/cli.d.ts +1 -0
  15. package/dist/src/indexer/cli.d.ts.map +1 -1
  16. package/dist/src/indexer/cli.js +197 -11
  17. package/dist/src/indexer/cli.js.map +1 -1
  18. package/dist/src/indexer/watch.d.ts +4 -3
  19. package/dist/src/indexer/watch.d.ts.map +1 -1
  20. package/dist/src/indexer/watch.js +11 -7
  21. package/dist/src/indexer/watch.js.map +1 -1
  22. package/dist/src/server/handlers.d.ts.map +1 -1
  23. package/dist/src/server/handlers.js +234 -26
  24. package/dist/src/server/handlers.js.map +1 -1
  25. package/dist/src/server/rpc.d.ts.map +1 -1
  26. package/dist/src/server/rpc.js +9 -3
  27. package/dist/src/server/rpc.js.map +1 -1
  28. package/dist/src/server/scoring.d.ts +2 -0
  29. package/dist/src/server/scoring.d.ts.map +1 -1
  30. package/dist/src/server/scoring.js +13 -1
  31. package/dist/src/server/scoring.js.map +1 -1
  32. package/dist/src/shared/duckdb.d.ts +1 -0
  33. package/dist/src/shared/duckdb.d.ts.map +1 -1
  34. package/dist/src/shared/duckdb.js +54 -3
  35. package/dist/src/shared/duckdb.js.map +1 -1
  36. package/dist/src/shared/embedding.d.ts.map +1 -1
  37. package/dist/src/shared/embedding.js +2 -8
  38. package/dist/src/shared/embedding.js.map +1 -1
  39. package/dist/src/shared/tokenizer.d.ts +18 -0
  40. package/dist/src/shared/tokenizer.d.ts.map +1 -1
  41. package/dist/src/shared/tokenizer.js +35 -0
  42. package/dist/src/shared/tokenizer.js.map +1 -1
  43. package/package.json +1 -1
package/README.md CHANGED
@@ -2,7 +2,7 @@

  > Intelligent code context extraction for LLMs via Model Context Protocol

- [![Version](https://img.shields.io/badge/version-0.4.1-blue.svg)](package.json)
+ [![Version](https://img.shields.io/badge/version-0.6.0-blue.svg)](package.json)
  [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue.svg)](https://www.typescriptlang.org/)
  [![MCP](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io/)
@@ -12,11 +12,12 @@
  ## 🎯 Why KIRI?

  - **🔌 MCP Native**: Plug-and-play integration with Claude Desktop, Codex CLI, and other MCP clients
- - **🧠 Smart Context**: Extract minimal, relevant code fragments based on task goals
+ - **🧠 Smart Context**: Extract minimal, relevant code fragments based on task goals (95% accuracy)
  - **⚡ Fast**: Sub-second response time for most queries
  - **🔍 Semantic Search**: Multi-word queries, dependency analysis, and BM25 ranking
  - **👁️ Auto-Sync**: Watch mode automatically re-indexes when files change
  - **🛡️ Reliable**: Degrade-first architecture works without optional extensions
+ - **📝 Phrase-Aware**: Recognizes compound terms (kebab-case, snake_case) for precise matching

  ## ⚙️ Prerequisites

@@ -152,9 +153,15 @@ KIRI provides 5 MCP tools for intelligent code exploration:

  ### 1. context_bundle

- **Extract relevant code context based on task goals**
+ **Extract relevant code context based on task goals (95% accuracy)**

- The most powerful tool for getting started with unfamiliar code. Provide a task description, and KIRI returns the most relevant code snippets.
+ The most powerful tool for getting started with unfamiliar code. Provide a task description, and KIRI returns the most relevant code snippets using phrase-aware tokenization and path-based scoring.
+
+ **v0.6.0 improvements:**
+
+ - **Phrase-aware tokenization**: Recognizes compound terms like `page-agent`, `user_profile` as single concepts (2× scoring weight)
+ - **Path-based scoring**: Additional boost when keywords/phrases appear in file paths
+ - **95% accuracy**: Improved from 65-75% through enhanced tokenization and scoring

  **When to use:**

@@ -405,11 +412,57 @@ kiri --repo . --db .kiri/index.duckdb --watch --debounce 1000
  **Watch Mode Features:**

  - **Debouncing**: Aggregates rapid changes to minimize reindex operations
+ - **Incremental Indexing**: Only reindexes changed files (10-100x faster)
  - **Background Operation**: Doesn't interrupt ongoing queries
  - **Denylist Integration**: Respects `.gitignore` and `denylist.yml`
  - **Lock Management**: Prevents concurrent indexing
  - **Statistics**: Tracks reindex count, duration, and queue depth

+ ### Tokenization Strategy
+
+ Control how KIRI tokenizes and matches compound terms using the `KIRI_TOKENIZATION_STRATEGY` environment variable:
+
+ ```bash
+ # Phrase-aware (default): Recognizes kebab-case/snake_case as phrases
+ export KIRI_TOKENIZATION_STRATEGY=phrase-aware
+
+ # Legacy: Traditional word-by-word tokenization
+ export KIRI_TOKENIZATION_STRATEGY=legacy
+
+ # Hybrid: Both phrase and word-level matching
+ export KIRI_TOKENIZATION_STRATEGY=hybrid
+ ```
+
+ **Strategies:**
+
+ - **`phrase-aware`** (default): Compound terms like `page-agent`, `user_profile` are treated as single phrases with 2× scoring weight. Best for codebases with consistent naming conventions.
+ - **`legacy`**: Traditional tokenization that splits all delimiters. Use for backward compatibility.
+ - **`hybrid`**: Combines both strategies for maximum flexibility.
+
+ ### Database Auto-Gitignore
+
+ KIRI automatically creates `.gitignore` files in database directories to prevent accidental commits:
+
+ ```typescript
+ // Enabled by default
+ const db = await DuckDBClient.connect({
+   databasePath: ".kiri/index.duckdb",
+   autoGitignore: true, // Creates .gitignore with "*" pattern
+ });
+
+ // Disable if needed
+ const db = await DuckDBClient.connect({
+   databasePath: ".kiri/index.duckdb",
+   autoGitignore: false,
+ });
+ ```
+
+ **Behavior:**
+
+ - Only creates `.gitignore` if directory is inside a Git repository
+ - Never overwrites existing `.gitignore` files
+ - Uses wildcard pattern (`*`) to ignore all database files
+
  ### File Type Boosting

  Control search ranking behavior with the `boost_profile` parameter:
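The three tokenization strategies added to the README in this hunk can be illustrated with a small sketch. This is not KIRI's tokenizer (the tokenizer.js changes are not shown here); `parseStrategy`, `tokenize`, the `Tokens` shape, and the regular expressions are all hypothetical names chosen for illustration. Only the strategy names and the default come from the README text.

```typescript
// Hypothetical sketch of the documented strategies; not KIRI's implementation.
type Strategy = "phrase-aware" | "legacy" | "hybrid";

interface Tokens {
  phrases: string[]; // compound terms kept whole, e.g. "page-agent"
  keywords: string[]; // individual words
}

// Assumption: a compound term is alphanumeric parts joined by "-" or "_".
const COMPOUND = /^[a-z0-9]+(?:[-_][a-z0-9]+)+$/;

// A caller would pass the value of KIRI_TOKENIZATION_STRATEGY here.
// Unknown or missing values fall back to the documented default.
function parseStrategy(value: string | undefined): Strategy {
  return value === "legacy" || value === "hybrid" ? value : "phrase-aware";
}

function tokenize(text: string, strategy: Strategy): Tokens {
  const phrases: string[] = [];
  const keywords: string[] = [];
  const terms = text
    .toLowerCase()
    .split(/[^a-z0-9_-]+/)
    .filter((t) => t.length > 0);
  for (const term of terms) {
    const isCompound = COMPOUND.test(term);
    // phrase-aware and hybrid keep the compound term as one phrase.
    if (isCompound && strategy !== "legacy") {
      phrases.push(term);
    }
    // legacy and hybrid also emit the split parts; plain words always pass through.
    if (!isCompound || strategy !== "phrase-aware") {
      keywords.push(...term.split(/[-_]+/).filter((p) => p.length > 0));
    }
  }
  return { phrases, keywords };
}
```

Under these assumptions, `tokenize("fix page-agent login", "hybrid")` yields the phrase `page-agent` plus the keywords `fix`, `page`, `agent`, and `login`, while `"legacy"` yields only the four keywords.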
@@ -680,6 +733,6 @@ Built with:

  ---

- **Status**: v0.4.1 (Beta) - Production-ready for MCP clients
+ **Status**: v0.6.0 (Beta) - Production-ready for MCP clients

  For questions or support, please open a [GitHub issue](https://github.com/CAPHTECH/kiri/issues).
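The README's new "Database Auto-Gitignore" section lists three behaviors: create the file only inside a Git repository, never overwrite an existing `.gitignore`, and write a `*` pattern. The sketch below encodes just those rules. It is not the `DuckDBClient` code; the `MiniFs` interface, `memoryFs`, `parentDirs`, and `ensureGitignore` are hypothetical, and an in-memory filesystem with POSIX-style absolute paths is assumed so the example runs without touching disk.

```typescript
// Hypothetical sketch of the three documented rules; not KIRI's DuckDBClient code.
interface MiniFs {
  exists(path: string): boolean;
  writeFile(path: string, content: string): void;
}

// In-memory stand-in for a real filesystem (an assumption, used for illustration).
function memoryFs(initial: string[]): MiniFs & { files: Map<string, string> } {
  const files = new Map<string, string>(
    initial.map((p): [string, string] => [p, ""]),
  );
  return {
    files,
    exists: (p) => files.has(p),
    writeFile: (p, c) => {
      files.set(p, c);
    },
  };
}

// "/repo/.kiri" -> ["/repo/.kiri", "/repo"]
function parentDirs(dir: string): string[] {
  const parts = dir.split("/").filter((p) => p.length > 0);
  const dirs: string[] = [];
  for (let i = parts.length; i >= 1; i--) {
    dirs.push("/" + parts.slice(0, i).join("/"));
  }
  return dirs;
}

// Returns true only when a new .gitignore was written.
function ensureGitignore(fs: MiniFs, dbDir: string): boolean {
  const dirs = parentDirs(dbDir);
  // Rule 1: only act inside a Git repository (a ".git" entry in some ancestor).
  if (!dirs.some((d) => fs.exists(`${d}/.git`))) return false;
  const target = `${dirs[0]}/.gitignore`;
  // Rule 2: never overwrite an existing file.
  if (fs.exists(target)) return false;
  // Rule 3: ignore everything in the database directory.
  // The trailing newline is an assumption; the README only specifies "*".
  fs.writeFile(target, "*\n");
  return true;
}
```

For example, `ensureGitignore(memoryFs(["/repo/.git"]), "/repo/.kiri")` writes `/repo/.kiri/.gitignore`, while the same call against a filesystem with no `.git` entry does nothing.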
package/config/default.example.yml CHANGED
@@ -4,6 +4,15 @@ mcp:
    tools:
      - context_bundle
      - files_search
+
+ # Tokenization configuration for keyword extraction
+ tokenization:
+   # Strategy: "phrase-aware" (default), "legacy", or "hybrid"
+   # - phrase-aware: Preserves hyphenated terms (e.g., "page-agent" stays as one unit)
+   # - legacy: Splits on hyphens (e.g., "page-agent" → ["page", "agent"])
+   # - hybrid: Emits both phrases and split keywords
+   strategy: "phrase-aware"
+
  indexer:
    repoRoot: "../../target-repo"
    database: "var/index.duckdb"
package/config/scoring-profiles.yml CHANGED
@@ -2,36 +2,41 @@
  # Each profile defines weights for different ranking signals

  default:
-   textMatch: 0.8 # Text/keyword match weight (reduced to decrease noise from broad matches)
+   textMatch: 1.0 # Text/keyword match weight (increased to prioritize literal matches)
+   pathMatch: 1.5 # Path-based match weight (new - prioritizes files with keywords in paths)
    editingPath: 2.0 # Currently editing file weight
    dependency: 0.6 # Dependency relationship weight (increased to prioritize connected implementation files)
    proximity: 0.25 # Same directory weight
-   structural: 1.0 # Structural similarity weight (increased to improve semantic matching for broad terms)
+   structural: 0.6 # Structural similarity weight (reduced to prevent false positives from similar structure)

  bugfix:
    textMatch: 1.0
+   pathMatch: 1.5
    editingPath: 1.8
    dependency: 0.7 # Higher: bugs often in dependencies
    proximity: 0.35
-   structural: 0.9 # Higher: structural understanding helps
+   structural: 0.7 # Reduced: prevent canvas-agent matching when searching page-agent

  testfail:
    textMatch: 1.0
+   pathMatch: 1.5
    editingPath: 1.6
    dependency: 0.85 # Very high: failed tests reveal dependencies
    proximity: 0.3
-   structural: 0.8
+   structural: 0.7 # Reduced: focus on actual test dependencies

  typeerror:
    textMatch: 1.0
+   pathMatch: 1.5
    editingPath: 1.4
    dependency: 0.6
    proximity: 0.4 # Higher: type errors cluster in modules
-   structural: 0.6 # Lower: type errors are structural
+   structural: 0.6 # Already balanced for type analysis

  feature:
    textMatch: 1.0
+   pathMatch: 1.5
    editingPath: 1.5
    dependency: 0.45 # Lower: new features less dependent
    proximity: 0.5 # Higher: features cluster spatially
-   structural: 0.7
+   structural: 0.6 # Reduced: focus on actual feature files
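To make the effect of the weight changes concrete, here is a minimal sketch that treats a profile as a weighted sum of ranking signals. The diff does not show how KIRI actually combines these weights, so the linear formula, the `[0, 1]` signal range, the `score` and `margin` helpers, and the two example files are all assumptions. Only the weight values are taken from the `default` profile above, with the previously absent `pathMatch` modeled as 0.

```typescript
// Hypothetical sketch: KIRI's real scoring code is not shown in this diff.
// Assumption: each signal is normalized to [0, 1] and the score is a weighted sum.
interface Weights {
  textMatch: number;
  pathMatch: number;
  editingPath: number;
  dependency: number;
  proximity: number;
  structural: number;
}
type Signals = Weights; // same keys, but values are per-file signal strengths

// "default" profile before and after the change.
// pathMatch did not exist in 0.4.1, so it is modeled here as weight 0.
const before: Weights = {
  textMatch: 0.8, pathMatch: 0, editingPath: 2.0,
  dependency: 0.6, proximity: 0.25, structural: 1.0,
};
const after: Weights = {
  textMatch: 1.0, pathMatch: 1.5, editingPath: 2.0,
  dependency: 0.6, proximity: 0.25, structural: 0.6,
};

function score(signals: Signals, weights: Weights): number {
  const keys = Object.keys(weights) as (keyof Weights)[];
  return keys.reduce((sum, key) => sum + weights[key] * signals[key], 0);
}

// Made-up signals for a "page-agent" query: one file matches by text and path,
// the other only looks structurally similar.
const pageAgent: Signals = {
  textMatch: 1, pathMatch: 1, editingPath: 0,
  dependency: 0, proximity: 0, structural: 0.5,
};
const canvasAgent: Signals = {
  textMatch: 0, pathMatch: 0, editingPath: 0,
  dependency: 0, proximity: 0, structural: 1,
};

// How far the intended file leads the structurally similar one.
function margin(weights: Weights): number {
  return score(pageAgent, weights) - score(canvasAgent, weights);
}
```

With these made-up signals, the lead of the intended file grows from about 0.3 under the old weights to about 2.2 under the new ones, which matches the intent stated in the `bugfix` comment (avoid `canvas-agent` when searching `page-agent`).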
package/dist/config/default.example.yml CHANGED
Same hunk as package/config/default.example.yml (@@ -4,6 +4,15 @@).
package/dist/config/scoring-profiles.yml CHANGED
Same hunk as package/config/scoring-profiles.yml (@@ -2,36 +2,41 @@).
package/dist/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "kiri-mcp-server",
-   "version": "0.4.1",
+   "version": "0.6.0",
    "description": "KIRI context extraction platform",
    "type": "module",
    "packageManager": "pnpm@9.0.0",
package/dist/server/context.js CHANGED
@@ -1,2 +1 @@
  export {};
- //# sourceMappingURL=context.js.map