@softerist/heuristic-mcp 2.1.1 → 2.1.3

package/README.md CHANGED
@@ -13,24 +13,106 @@ AI coding assistants work better when they can find relevant code quickly. Tradi
 
  This MCP server solves that by indexing your codebase with AI embeddings. Your AI assistant can search by meaning instead of exact keywords, finding relevant code even when the terminology differs.
 
- ![Example](example.png)
+ ## Example: Semantic Search in Action
 
- ## Why Use This
+ **User Query:** "How does the vector cache work?"
 
- **Better Code Understanding**
+ ### MCP Tool Call
 
- - Search finds code by concept, not just matching words
- - Works with typos and variations in terminology
- - Natural language queries like "where do we validate user input?"
+ ```json
+ {
+   "maxResults": 5,
+   "query": "vector store embeddings cache save load"
+ }
+ ```
+
+ ### Output
+
+ **Result 1 (Relevance: 72.5%)**
+
+ **File:** `lib/cache.js` **Lines:** 42-59
+
+ ```javascript
+ }
+
+ const filtered = rawVectorStore.length - this.vectorStore.length;
+ if (filtered > 0) {
+   console.error(`[Cache] Filtered ${filtered} outdated cache entries`);
+ }
+ console.error(`[Cache] Loaded ${this.vectorStore.length} cached embeddings`);
+ ```
+
+ Here's how the **vector cache** works:
+
+ ### Overview
+
+ The cache system (`lib/cache.js`) stores two things:
+
+ 1. **Vector embeddings** - AI-generated vectors for each code chunk
+ 2. **File hashes** - MD5 hashes to detect file changes
+
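The diff itself only names the cache's `vectorStore` array, its `fileHashes` map, and a `hashContent()` helper, so the following is just a minimal sketch of how those two pieces typically fit together; the entry fields are illustrative assumptions, not the package's actual schema.

```javascript
// Illustrative sketch only: field names below are assumptions; the diff shows
// only vectorStore, fileHashes, and hashContent() from lib/cache.js.
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

const hashContent = (content) => createHash("md5").update(content).digest("hex");

// 1. Vector embeddings: one entry per code chunk
const vectorStore = [
  {
    file: "lib/cache.js",
    startLine: 42,
    endLine: 59,
    embedding: [0.013, -0.082 /* ... 384 dims for all-MiniLM-L6-v2 */],
  },
];

// 2. File hashes: MD5 of each file's content at the time it was indexed
const fileHashes = new Map();

async function hasChanged(filePath) {
  const content = await readFile(filePath, "utf-8");
  const newHash = hashContent(content);
  if (fileHashes.get(filePath) === newHash) return false; // unchanged: reuse cached vectors
  fileHashes.set(filePath, newHash);
  return true; // changed: re-chunk and re-embed this file
}
```

Only files whose hash changed get re-chunked and re-embedded on the next run, which is what the "Incremental updates" bullet later in the README refers to.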
+ ## Why Heuristic MCP is Useful
+
+ The **Heuristic MCP** is highly effective because it bridges the gap between "having access to files" and "understanding the codebase." Here is why it is useful:
+
+ ### 1. It Solves the "Needle in a Haystack" Problem
+
+ To answer technical questions, I didn't have to manually browse 15 files or run generic `grep` commands.
+
+ - **Without MCP:** I would have listed directories, guessed `lib/utils.js` might be relevant, read the whole file, then checked `index.js`, etc.
+ - **With MCP:** I asked *"how does chunking work"* and it instantly returned lines 91-108 of `lib/utils.js`. It acted like a senior engineer pointing me to the exact lines of code.
 
- **Performance**
+ ### 2. It Finds "Concepts," Not Just Words
+
+ Standard tools like `grep` only find exact matches.
+
+ - If I searched for "authentication" using `grep`, I might miss a function named `verifyUserCredentials`.
+ - The **Heuristic MCP** links these concepts: in testing, `authentication` correctly matched `credentials` because of vector similarity.
+
+ ### 3. It Saves Your Context Window
+
+ AI agents have a limited memory (context window).
+
+ - Instead of reading **every file** to understand the project (which wastes thousands of tokens), the MCP lets me retrieve **only the 5-10 relevant snippets**. This leaves more room for complex reasoning and generating code.
+
+ ### 4. It Is Fast & Private
+
+ Since it runs a **local embedding model** (via Xenova's transformers.js) directly on your machine:
+
+ - **Latency is low** (under 50ms).
+ - **Privacy is 100%**: Your source code never leaves your machine to be indexed by an external cloud service.
+
+ ### Verdict
+
+ For a developer (or an AI agent) working on a large or unfamiliar project, this tool is a massive productivity booster. It essentially turns the entire codebase into a searchable knowledge base.
+
+ ## How This is Different
+
+ Most MCP servers and RAG tools are "naive": they just embed code chunks and run a vector search. **Heuristic MCP** is different because it adds **deterministic intelligence** on top of AI:
+
+ | Feature | Generic MCP / RAG Tool | Heuristic MCP |
+ | :- | :- | :- |
+ | **Ranking** | Pure similarity score | Similarity + **Call Graph Proximity** + **Recency Boost** |
+ | **Logic** | "Is this text similar?" | "Is this similar, AND used by this function, AND active?" |
+ | **Refactoring** | N/A | **`find_similar_code`** tool to detect duplicates |
+ | **Tuning** | Static (hardcoded) | **Runtime Config** (adjust ANN parameters on the fly) |
+
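The **Ranking** row above combines three signals. The package's actual weighting is not shown in this diff, so the sketch below only illustrates how vector similarity, call-graph proximity, and a recency boost might be blended; the weights and decay constants are made up for illustration.

```javascript
// Sketch of the hybrid ranking idea from the table above. The linear
// combination, weights, and decay constants are illustrative assumptions,
// not the package's actual formula.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankChunk(chunk, queryEmbedding, { callGraphDistance = Infinity, lastModifiedMs = 0 } = {}) {
  const similarity = cosineSimilarity(chunk.embedding, queryEmbedding);

  // Chunks close to the query's other matches in the call graph get a boost
  const graphBoost = Number.isFinite(callGraphDistance) ? 0.1 / (1 + callGraphDistance) : 0;

  // Files touched recently get a small boost that decays over roughly a week
  const ageDays = (Date.now() - lastModifiedMs) / 86_400_000;
  const recencyBoost = 0.05 * Math.exp(-ageDays / 7);

  return similarity + graphBoost + recencyBoost;
}
```

Results are then sorted by this combined score instead of by raw similarity alone.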
+ ### Comparison to Cursor
+
+ [Cursor](https://cursor.sh) is an excellent AI editor with built-in codebase indexing.
+
+ - **Cursor** is an *Editor*: you must use their IDE to get the features.
+ - **Heuristic MCP** is an *MCP server*: it brings Cursor-like intelligence to **any** MCP-compatible tool (Claude Desktop, multiple IDEs, agentic workflows) without locking you into a specific editor.
+ - **Transparency**: This is open-source. You know exactly how your code is indexed and where the data lives (locally).
+
+ ## Performance
 
  - Pre-indexed embeddings are faster than scanning files at runtime
  - Smart project detection skips dependencies automatically (node_modules, vendor, etc.)
  - Incremental updates - only re-processes changed files
  - Optional ANN search (HNSW) for faster queries on large codebases
 
- **Privacy**
+ ## Privacy
 
  - Everything runs locally on your machine
  - Your code never leaves your system
@@ -178,8 +260,6 @@ When you search, your query is converted to the same vector format. We use a **h
  - **Exact Keyword Matching** (BM25-inspired boost)
  - **Recency Boosting** (favoring files you're actively working on)
 
- ![How It Works](how-its-works.png)
-
  ## Examples
 
  **Natural language search:**
@@ -208,13 +288,6 @@ Query: "error handling and exceptions"
 
  Finds all try/catch blocks and error handling patterns.
 
- ## Privacy
-
- - AI model runs entirely on your machine
- - No network requests to external services
- - No telemetry or analytics
- - Cache stored locally in `.smart-coding-cache/`
-
  ## Technical Details
 
  **Embedding Model**: all-MiniLM-L6-v2 via transformers.js
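Loading this model locally follows the standard `@xenova/transformers` pattern shown below; the package's own wrapper code is not part of this diff, so the `embedText` helper name is illustrative.

```javascript
// Standard @xenova/transformers usage for all-MiniLM-L6-v2 (384-dim embeddings).
// embedText() is an illustrative helper; the package's actual wrapper is not in this diff.
import { pipeline } from "@xenova/transformers";

const extractorPromise = pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embedText(text) {
  const extractor = await extractorPromise;
  // Mean-pool the token embeddings and normalize so cosine similarity reduces to a dot product
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data); // plain array of 384 numbers
}

// Example: embed a natural-language query before searching the vector store
const queryEmbedding = await embedText("where do we validate user input?");
console.error(`[Example] Embedding dimensions: ${queryEmbedding.length}`);
```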
@@ -503,6 +503,7 @@ export class CodebaseIndexer {
  console.error("[Indexer] Force reindex requested: clearing cache");
  this.cache.setVectorStore([]);
  this.cache.fileHashes = new Map();
+ await this.cache.clearCallGraphData({ removeFile: true });
  }
 
  const totalStartTime = Date.now();
@@ -520,9 +521,10 @@ export class CodebaseIndexer {
  // Send progress: discovery complete
  this.sendProgress(5, 100, `Discovered ${files.length} files`);
 
+ const currentFilesSet = new Set(files);
+
  // Step 1.5: Prune deleted or excluded files from cache
  if (!force) {
- const currentFilesSet = new Set(files);
  const cachedFiles = Array.from(this.cache.fileHashes.keys());
  let prunedCount = 0;
 
@@ -540,24 +542,77 @@ export class CodebaseIndexer {
  }
  // If we pruned files, we should save these changes even if no other files changed
  }
+
+ const prunedCallGraph = this.cache.pruneCallGraphData(currentFilesSet);
+ if (prunedCallGraph > 0 && this.config.verbose) {
+ console.error(`[Indexer] Pruned ${prunedCallGraph} call-graph entries`);
+ }
  }
 
  // Step 2: Pre-filter unchanged files (early hash check)
  const filesToProcess = await this.preFilterFiles(files);
+ const filesToProcessSet = new Set(filesToProcess.map(entry => entry.file));
 
  if (filesToProcess.length === 0) {
  console.error("[Indexer] All files unchanged, nothing to index");
- this.sendProgress(100, 100, "All files up to date");
- await this.cache.save();
- const vectorStore = this.cache.getVectorStore();
- return {
- skipped: false,
- filesProcessed: 0,
- chunksCreated: 0,
- totalFiles: new Set(vectorStore.map(v => v.file)).size,
- totalChunks: vectorStore.length,
- message: "All files up to date"
- };
+
+ // If we have no call graph data but we have cached files, we should try to rebuild it
+ if (this.config.callGraphEnabled && this.cache.getVectorStore().length > 0) {
+ // Check for files that are in cache but missing from call graph data
+ const cachedFiles = new Set(this.cache.getVectorStore().map(c => c.file));
+ const callDataFiles = new Set(this.cache.fileCallData.keys());
+
+ const missingCallData = [];
+ for (const file of cachedFiles) {
+ if (!callDataFiles.has(file) && currentFilesSet.has(file)) {
+ missingCallData.push(file);
+ }
+ }
+
+ if (missingCallData.length > 0) {
+ console.error(`[Indexer] Found ${missingCallData.length} files missing call graph data, re-indexing...`);
+ const BATCH_SIZE = 100;
+ for (let i = 0; i < missingCallData.length; i += BATCH_SIZE) {
+ const batch = missingCallData.slice(i, i + BATCH_SIZE);
+ const results = await Promise.all(
+ batch.map(async (file) => {
+ try {
+ const stats = await fs.stat(file);
+ if (stats.isDirectory()) return null;
+ if (stats.size > this.config.maxFileSize) return null;
+ const content = await fs.readFile(file, "utf-8");
+ const hash = hashContent(content);
+ return { file, content, hash };
+ } catch {
+ return null;
+ }
+ })
+ );
+
+ for (const result of results) {
+ if (!result) continue;
+ if (filesToProcessSet.has(result.file)) continue;
+ filesToProcess.push(result);
+ filesToProcessSet.add(result.file);
+ }
+ }
+ }
+ }
+
+ // If still empty after checking for missing call data, then we are truly done
+ if (filesToProcess.length === 0) {
+ this.sendProgress(100, 100, "All files up to date");
+ await this.cache.save();
+ const vectorStore = this.cache.getVectorStore();
+ return {
+ skipped: false,
+ filesProcessed: 0,
+ chunksCreated: 0,
+ totalFiles: new Set(vectorStore.map(v => v.file)).size,
+ totalChunks: vectorStore.length,
+ message: "All files up to date"
+ };
+ }
  }
 
  // Send progress: filtering complete
package/lib/cache.js CHANGED
@@ -6,6 +6,7 @@ const CACHE_META_FILE = "meta.json";
  const ANN_META_VERSION = 1;
  const ANN_INDEX_FILE = "ann-index.bin";
  const ANN_META_FILE = "ann-meta.json";
+ const CALL_GRAPH_FILE = "call-graph.json";
 
  let hnswlibPromise = null;
  let hnswlibLoadError = null;
@@ -166,7 +167,7 @@ export class EmbeddingsCache {
  }
 
  // Load call-graph data if it exists
- const callGraphFile = path.join(this.config.cacheDirectory, "call-graph.json");
+ const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
  try {
  const callGraphData = await fs.readFile(callGraphFile, "utf8");
  const parsed = JSON.parse(callGraphData);
@@ -203,10 +204,12 @@ export class EmbeddingsCache {
  fs.writeFile(metaFile, JSON.stringify(this.cacheMeta, null, 2))
  ]);
 
- // Save call-graph data
+ // Save call-graph data (or remove stale cache if empty)
+ const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
  if (this.fileCallData.size > 0) {
- const callGraphFile = path.join(this.config.cacheDirectory, "call-graph.json");
  await fs.writeFile(callGraphFile, JSON.stringify(Object.fromEntries(this.fileCallData), null, 2));
+ } else {
+ await fs.rm(callGraphFile, { force: true });
  }
  } catch (error) {
  console.error("[Cache] Failed to save cache:", error.message);
@@ -440,9 +443,7 @@ export class EmbeddingsCache {
  this.vectorStore = [];
  this.fileHashes = new Map();
  this.invalidateAnnIndex();
- // Clear call-graph data
- this.fileCallData.clear();
- this.callGraph = null;
+ await this.clearCallGraphData();
  console.error(`[Cache] Cache cleared successfully: ${this.config.cacheDirectory}`);
  console.error("[Cache] Failed to clear cache:", error.message);
 
@@ -497,6 +498,46 @@ export class EmbeddingsCache {
 
  // ========== Call Graph Methods ==========
 
+ /**
+ * Clear all call-graph data (optionally remove persisted cache file)
+ */
+ async clearCallGraphData({ removeFile = false } = {}) {
+ this.fileCallData.clear();
+ this.callGraph = null;
+
+ if (removeFile && this.config.enableCache) {
+ const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
+ try {
+ await fs.rm(callGraphFile, { force: true });
+ } catch (error) {
+ if (this.config.verbose) {
+ console.error(`[Cache] Failed to remove call-graph cache: ${error.message}`);
+ }
+ }
+ }
+ }
+
+ /**
+ * Remove call-graph entries for files no longer in the codebase
+ */
+ pruneCallGraphData(validFiles) {
+ if (!validFiles || this.fileCallData.size === 0) return 0;
+
+ let pruned = 0;
+ for (const file of Array.from(this.fileCallData.keys())) {
+ if (!validFiles.has(file)) {
+ this.fileCallData.delete(file);
+ pruned++;
+ }
+ }
+
+ if (pruned > 0) {
+ this.callGraph = null;
+ }
+
+ return pruned;
+ }
+
  /**
  * Store call data for a file
  */
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@softerist/heuristic-mcp",
- "version": "2.1.1",
+ "version": "2.1.3",
  "description": "An enhanced MCP server providing intelligent semantic code search with find-similar-code, recency ranking, and improved chunking. Fork of smart-coding-mcp.",
  "type": "module",
  "main": "index.js",
package/example.png DELETED
Binary file
package/how-its-works.png DELETED
Binary file