kiri-mcp-server 0.5.0 → 0.7.0

Files changed (35)
  1. package/README.md +59 -5
  2. package/config/default.example.yml +9 -0
  3. package/config/scoring-profiles.yml +21 -6
  4. package/dist/config/default.example.yml +9 -0
  5. package/dist/config/scoring-profiles.yml +21 -6
  6. package/dist/package.json +1 -1
  7. package/dist/server/context.js +0 -1
  8. package/dist/server/handlers.js +547 -79
  9. package/dist/server/scoring.js +8 -3
  10. package/dist/shared/duckdb.js +0 -2
  11. package/dist/shared/embedding.js +15 -2
  12. package/dist/shared/tokenizer.js +0 -1
  13. package/dist/shared/utils/simpleYaml.js +0 -1
  14. package/dist/src/server/handlers.d.ts.map +1 -1
  15. package/dist/src/server/handlers.js +353 -85
  16. package/dist/src/server/handlers.js.map +1 -1
  17. package/dist/src/server/rpc.d.ts.map +1 -1
  18. package/dist/src/server/rpc.js +9 -3
  19. package/dist/src/server/rpc.js.map +1 -1
  20. package/dist/src/server/scoring.d.ts +6 -0
  21. package/dist/src/server/scoring.d.ts.map +1 -1
  22. package/dist/src/server/scoring.js +29 -5
  23. package/dist/src/server/scoring.js.map +1 -1
  24. package/dist/src/shared/duckdb.d.ts +1 -0
  25. package/dist/src/shared/duckdb.d.ts.map +1 -1
  26. package/dist/src/shared/duckdb.js +54 -3
  27. package/dist/src/shared/duckdb.js.map +1 -1
  28. package/dist/src/shared/embedding.d.ts.map +1 -1
  29. package/dist/src/shared/embedding.js +2 -8
  30. package/dist/src/shared/embedding.js.map +1 -1
  31. package/dist/src/shared/tokenizer.d.ts +18 -0
  32. package/dist/src/shared/tokenizer.d.ts.map +1 -1
  33. package/dist/src/shared/tokenizer.js +35 -0
  34. package/dist/src/shared/tokenizer.js.map +1 -1
  35. package/package.json +1 -1
package/README.md CHANGED
@@ -2,7 +2,7 @@

  > Intelligent code context extraction for LLMs via Model Context Protocol

- [![Version](https://img.shields.io/badge/version-0.4.1-blue.svg)](package.json)
+ [![Version](https://img.shields.io/badge/version-0.7.0-blue.svg)](package.json)
  [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue.svg)](https://www.typescriptlang.org/)
  [![MCP](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io/)
@@ -12,11 +12,12 @@
  ## 🎯 Why KIRI?

  - **🔌 MCP Native**: Plug-and-play integration with Claude Desktop, Codex CLI, and other MCP clients
- - **🧠 Smart Context**: Extract minimal, relevant code fragments based on task goals
+ - **🧠 Smart Context**: Extract minimal, relevant code fragments based on task goals (95% accuracy)
  - **⚡ Fast**: Sub-second response time for most queries
  - **🔍 Semantic Search**: Multi-word queries, dependency analysis, and BM25 ranking
  - **👁️ Auto-Sync**: Watch mode automatically re-indexes when files change
  - **🛡️ Reliable**: Degrade-first architecture works without optional extensions
+ - **📝 Phrase-Aware**: Recognizes compound terms (kebab-case, snake_case) for precise matching

  ## ⚙️ Prerequisites

@@ -152,9 +153,16 @@ KIRI provides 5 MCP tools for intelligent code exploration:

  ### 1. context_bundle

- **Extract relevant code context based on task goals**
+ **Extract relevant code context based on task goals (95% accuracy)**

- The most powerful tool for getting started with unfamiliar code. Provide a task description, and KIRI returns the most relevant code snippets.
+ The most powerful tool for getting started with unfamiliar code. Provide a task description, and KIRI returns the most relevant code snippets using phrase-aware tokenization and path-based scoring.
+
+ **v0.7.0 improvements:**
+
+ - **Multiplicative penalties**: Documentation files now penalized by ×0.3 (70% reduction) instead of additive -2.0
+ - **Implementation prioritization**: Implementation files rank 5-10× higher than documentation (1.82 vs 0.30)
+ - **Unified boosting logic**: Consistent file ranking across `files_search` and `context_bundle`
+ - **Configurable profiles**: `boost_profile` parameter supports "default" (implementation-first), "docs" (documentation-first), or "none" (natural BM25)

  **When to use:**

@@ -405,11 +413,57 @@ kiri --repo . --db .kiri/index.duckdb --watch --debounce 1000
  **Watch Mode Features:**

  - **Debouncing**: Aggregates rapid changes to minimize reindex operations
+ - **Incremental Indexing**: Only reindexes changed files (10-100x faster)
  - **Background Operation**: Doesn't interrupt ongoing queries
  - **Denylist Integration**: Respects `.gitignore` and `denylist.yml`
  - **Lock Management**: Prevents concurrent indexing
  - **Statistics**: Tracks reindex count, duration, and queue depth

+ ### Tokenization Strategy
+
+ Control how KIRI tokenizes and matches compound terms using the `KIRI_TOKENIZATION_STRATEGY` environment variable:
+
+ ```bash
+ # Phrase-aware (default): Recognizes kebab-case/snake_case as phrases
+ export KIRI_TOKENIZATION_STRATEGY=phrase-aware
+
+ # Legacy: Traditional word-by-word tokenization
+ export KIRI_TOKENIZATION_STRATEGY=legacy
+
+ # Hybrid: Both phrase and word-level matching
+ export KIRI_TOKENIZATION_STRATEGY=hybrid
+ ```
+
+ **Strategies:**
+
+ - **`phrase-aware`** (default): Compound terms like `page-agent`, `user_profile` are treated as single phrases with 2× scoring weight. Best for codebases with consistent naming conventions.
+ - **`legacy`**: Traditional tokenization that splits all delimiters. Use for backward compatibility.
+ - **`hybrid`**: Combines both strategies for maximum flexibility.
+
+ ### Database Auto-Gitignore
+
+ KIRI automatically creates `.gitignore` files in database directories to prevent accidental commits:
+
+ ```typescript
+ // Enabled by default
+ const db = await DuckDBClient.connect({
+ databasePath: ".kiri/index.duckdb",
+ autoGitignore: true, // Creates .gitignore with "*" pattern
+ });
+
+ // Disable if needed
+ const db = await DuckDBClient.connect({
+ databasePath: ".kiri/index.duckdb",
+ autoGitignore: false,
+ });
+ ```
+
+ **Behavior:**
+
+ - Only creates `.gitignore` if directory is inside a Git repository
+ - Never overwrites existing `.gitignore` files
+ - Uses wildcard pattern (`*`) to ignore all database files
+
  ### File Type Boosting

  Control search ranking behavior with the `boost_profile` parameter:
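
The three tokenization strategies described in the README hunk above can be illustrated with a small TypeScript sketch. It is not taken from the package: `tokenize`, `matchScore`, the `COMPOUND` pattern, and the substring-based matching are assumptions chosen for illustration. Only the strategy names and the 2× phrase weight come from the README text.

```typescript
// Hypothetical sketch: names and matching rules are illustrative assumptions,
// not kiri-mcp-server's actual tokenizer.
type TokenizationStrategy = "phrase-aware" | "legacy" | "hybrid";

interface Tokens {
  phrases: string[]; // compound terms kept whole, e.g. "page-agent"
  keywords: string[]; // individual words
}

// Matches kebab-case or snake_case compounds such as "page-agent" or "user_profile".
const COMPOUND = /^[a-z0-9]+(?:[-_][a-z0-9]+)+$/;

// README: phrases are scored with 2x weight.
const PHRASE_WEIGHT = 2;

function tokenize(query: string, strategy: TokenizationStrategy): Tokens {
  const phrases: string[] = [];
  const keywords: string[] = [];
  for (const raw of query.toLowerCase().split(/\s+/).filter(Boolean)) {
    const isCompound = COMPOUND.test(raw);
    // phrase-aware and hybrid keep the compound as a single phrase.
    if (isCompound && strategy !== "legacy") {
      phrases.push(raw);
    }
    // legacy and hybrid also emit the split words; plain words are always keywords.
    if (!isCompound || strategy !== "phrase-aware") {
      keywords.push(...raw.split(/[-_]/).filter(Boolean));
    }
  }
  return { phrases, keywords };
}

// Counts substring hits in `text`, weighting phrase hits by PHRASE_WEIGHT.
function matchScore(tokens: Tokens, text: string): number {
  const haystack = text.toLowerCase();
  const phraseHits = tokens.phrases.filter((p) => haystack.includes(p)).length;
  const keywordHits = tokens.keywords.filter((k) => haystack.includes(k)).length;
  return phraseHits * PHRASE_WEIGHT + keywordHits;
}
```

Under these assumptions, a query for `page-agent` scores 0 against the path `src/canvas-agent/page.ts` with `phrase-aware` but 2 with `legacy`, because the split words `page` and `agent` each match separately.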
@@ -680,6 +734,6 @@ Built with:

  ---

- **Status**: v0.4.1 (Beta) - Production-ready for MCP clients
+ **Status**: v0.7.0 (Beta) - Production-ready for MCP clients

  For questions or support, please open a [GitHub issue](https://github.com/CAPHTECH/kiri/issues).
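
The Database Auto-Gitignore behavior listed in the README diff above (create only inside a Git repository, never overwrite, use the `*` pattern) could be sketched as follows. `ensureDbGitignore` and `isInsideGitRepo` are hypothetical helpers, and detecting a repository by walking up to a `.git` entry is an assumption; the package's `DuckDBClient` may implement this differently.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper: walks up from `dir` looking for a `.git` entry
// (a directory in normal clones, a file in worktrees and submodules).
function isInsideGitRepo(dir: string): boolean {
  let current = resolve(dir);
  for (;;) {
    if (existsSync(join(current, ".git"))) return true;
    const parent = dirname(current);
    if (parent === current) return false; // reached the filesystem root
    current = parent;
  }
}

// Hypothetical helper: returns true only when a new .gitignore was written.
function ensureDbGitignore(databasePath: string): boolean {
  const dbDir = dirname(resolve(databasePath));
  if (!isInsideGitRepo(dbDir)) return false; // outside Git: do nothing
  const target = join(dbDir, ".gitignore");
  if (existsSync(target)) return false; // never overwrite an existing file
  mkdirSync(dbDir, { recursive: true });
  // "wx" makes the write fail instead of clobbering a file created concurrently.
  writeFileSync(target, "*\n", { flag: "wx" });
  return true;
}
```

Calling `ensureDbGitignore(".kiri/index.duckdb")` twice in a repository would, in this sketch, write `.kiri/.gitignore` on the first call and return `false` on the second.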
package/config/default.example.yml CHANGED
@@ -4,6 +4,15 @@ mcp:
  tools:
  - context_bundle
  - files_search
+
+ # Tokenization configuration for keyword extraction
+ tokenization:
+ # Strategy: "phrase-aware" (default), "legacy", or "hybrid"
+ # - phrase-aware: Preserves hyphenated terms (e.g., "page-agent" stays as one unit)
+ # - legacy: Splits on hyphens (e.g., "page-agent" → ["page", "agent"])
+ # - hybrid: Emits both phrases and split keywords
+ strategy: "phrase-aware"
+
  indexer:
  repoRoot: "../../target-repo"
  database: "var/index.duckdb"
package/config/scoring-profiles.yml CHANGED
@@ -2,36 +2,51 @@
  # Each profile defines weights for different ranking signals

  default:
- textMatch: 0.8 # Text/keyword match weight (reduced to decrease noise from broad matches)
+ textMatch: 1.0 # Text/keyword match weight (increased to prioritize literal matches)
+ pathMatch: 1.5 # Path-based match weight (new - prioritizes files with keywords in paths)
  editingPath: 2.0 # Currently editing file weight
  dependency: 0.6 # Dependency relationship weight (increased to prioritize connected implementation files)
  proximity: 0.25 # Same directory weight
- structural: 1.0 # Structural similarity weight (increased to improve semantic matching for broad terms)
+ structural: 0.6 # Structural similarity weight (reduced to prevent false positives from similar structure)
+ docPenaltyMultiplier: 0.3 # Multiplicative penalty for docs (0.3 = 70% reduction, Phase 1 conservative value)
+ implBoostMultiplier: 1.3 # Multiplicative boost for implementation files (1.3 = 30% boost)

  bugfix:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.8
  dependency: 0.7 # Higher: bugs often in dependencies
  proximity: 0.35
- structural: 0.9 # Higher: structural understanding helps
+ structural: 0.7 # Reduced: prevent canvas-agent matching when searching page-agent
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3

  testfail:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.6
  dependency: 0.85 # Very high: failed tests reveal dependencies
  proximity: 0.3
- structural: 0.8
+ structural: 0.7 # Reduced: focus on actual test dependencies
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3

  typeerror:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.4
  dependency: 0.6
  proximity: 0.4 # Higher: type errors cluster in modules
- structural: 0.6 # Lower: type errors are structural
+ structural: 0.6 # Already balanced for type analysis
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3

  feature:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.5
  dependency: 0.45 # Lower: new features less dependent
  proximity: 0.5 # Higher: features cluster spatially
- structural: 0.7
+ structural: 0.6 # Reduced: focus on actual feature files
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3
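
To make the effect of `docPenaltyMultiplier` and `implBoostMultiplier` concrete, here is a minimal TypeScript sketch of how such a profile might be applied. The weights are copied from the `default` profile above; everything else is an assumption for illustration, including the weighted-sum combination, the `FileKind` classification, and the names `scoreFile`, `Signals`, and `SIGNAL_KEYS`. The real logic lives in the package's `scoring.js` and `handlers.js` and may differ.

```typescript
// Hypothetical sketch: the weighted-sum combination and all names except the
// profile keys are assumptions, not code from kiri-mcp-server.
const SIGNAL_KEYS = [
  "textMatch",
  "pathMatch",
  "editingPath",
  "dependency",
  "proximity",
  "structural",
] as const;

type SignalKey = (typeof SIGNAL_KEYS)[number];
type Signals = Record<SignalKey, number>; // raw per-file signal values
type ScoringProfile = Signals & {
  docPenaltyMultiplier: number;
  implBoostMultiplier: number;
};
type FileKind = "doc" | "impl" | "other";

// Values copied from the `default` profile in the 0.7.0 scoring-profiles.yml.
const DEFAULT_PROFILE: ScoringProfile = {
  textMatch: 1.0,
  pathMatch: 1.5,
  editingPath: 2.0,
  dependency: 0.6,
  proximity: 0.25,
  structural: 0.6,
  docPenaltyMultiplier: 0.3,
  implBoostMultiplier: 1.3,
};

function scoreFile(
  signals: Signals,
  kind: FileKind,
  profile: ScoringProfile = DEFAULT_PROFILE,
): number {
  // Weighted sum of the raw signals (assumed combination rule).
  let base = 0;
  for (const key of SIGNAL_KEYS) {
    base += signals[key] * profile[key];
  }
  // Multipliers scale with the base score instead of shifting it by a constant.
  if (kind === "doc") return base * profile.docPenaltyMultiplier;
  if (kind === "impl") return base * profile.implBoostMultiplier;
  return base;
}
```

With a raw text-match signal of 1 and all other signals at 0, this yields 1.3 for an implementation file and 0.3 for a documentation file. An additive -2.0 would instead push that documentation file to -1.0 while removing only 20% from a document with a base score of 10; a multiplier keeps the reduction at 70% regardless of the base score.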
package/dist/config/default.example.yml CHANGED
@@ -4,6 +4,15 @@ mcp:
  tools:
  - context_bundle
  - files_search
+
+ # Tokenization configuration for keyword extraction
+ tokenization:
+ # Strategy: "phrase-aware" (default), "legacy", or "hybrid"
+ # - phrase-aware: Preserves hyphenated terms (e.g., "page-agent" stays as one unit)
+ # - legacy: Splits on hyphens (e.g., "page-agent" → ["page", "agent"])
+ # - hybrid: Emits both phrases and split keywords
+ strategy: "phrase-aware"
+
  indexer:
  repoRoot: "../../target-repo"
  database: "var/index.duckdb"
package/dist/config/scoring-profiles.yml CHANGED
@@ -2,36 +2,51 @@
  # Each profile defines weights for different ranking signals

  default:
- textMatch: 0.8 # Text/keyword match weight (reduced to decrease noise from broad matches)
+ textMatch: 1.0 # Text/keyword match weight (increased to prioritize literal matches)
+ pathMatch: 1.5 # Path-based match weight (new - prioritizes files with keywords in paths)
  editingPath: 2.0 # Currently editing file weight
  dependency: 0.6 # Dependency relationship weight (increased to prioritize connected implementation files)
  proximity: 0.25 # Same directory weight
- structural: 1.0 # Structural similarity weight (increased to improve semantic matching for broad terms)
+ structural: 0.6 # Structural similarity weight (reduced to prevent false positives from similar structure)
+ docPenaltyMultiplier: 0.3 # Multiplicative penalty for docs (0.3 = 70% reduction, Phase 1 conservative value)
+ implBoostMultiplier: 1.3 # Multiplicative boost for implementation files (1.3 = 30% boost)

  bugfix:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.8
  dependency: 0.7 # Higher: bugs often in dependencies
  proximity: 0.35
- structural: 0.9 # Higher: structural understanding helps
+ structural: 0.7 # Reduced: prevent canvas-agent matching when searching page-agent
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3

  testfail:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.6
  dependency: 0.85 # Very high: failed tests reveal dependencies
  proximity: 0.3
- structural: 0.8
+ structural: 0.7 # Reduced: focus on actual test dependencies
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3

  typeerror:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.4
  dependency: 0.6
  proximity: 0.4 # Higher: type errors cluster in modules
- structural: 0.6 # Lower: type errors are structural
+ structural: 0.6 # Already balanced for type analysis
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3

  feature:
  textMatch: 1.0
+ pathMatch: 1.5
  editingPath: 1.5
  dependency: 0.45 # Lower: new features less dependent
  proximity: 0.5 # Higher: features cluster spatially
- structural: 0.7
+ structural: 0.6 # Reduced: focus on actual feature files
+ docPenaltyMultiplier: 0.3
+ implBoostMultiplier: 1.3
package/dist/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "kiri-mcp-server",
- "version": "0.5.0",
+ "version": "0.7.0",
  "description": "KIRI context extraction platform",
  "type": "module",
  "packageManager": "pnpm@9.0.0",
package/dist/server/context.js CHANGED
@@ -1,2 +1 @@
  export {};
- //# sourceMappingURL=context.js.map