@softerist/heuristic-mcp 2.1.1 → 2.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +90 -17
- package/features/index-codebase.js +67 -12
- package/lib/cache.js +47 -6
- package/package.json +1 -1
- package/example.png +0 -0
- package/how-its-works.png +0 -0
package/README.md
CHANGED

@@ -13,24 +13,106 @@ AI coding assistants work better when they can find relevant code quickly. Tradi
 
 This MCP server solves that by indexing your codebase with AI embeddings. Your AI assistant can search by meaning instead of exact keywords, finding relevant code even when the terminology differs.
 
-
+## Example: Semantic Search in Action
 
-
+**User Query:** "How vector cache working?"
 
-
+### MCP Tool Call
 
-
-
-
+```json
+{
+  "maxResults": 5,
+  "query": "vector store embeddings cache save load"
+}
+```
+
+### Output
+
+**Result 1 (Relevance: 72.5%)**
+
+**File:** `lib/cache.js` **Lines:** 42-59
+
+```javascript
+}
+
+const filtered = rawVectorStore.length - this.vectorStore.length;
+if (filtered > 0) {
+  console.error(`[Cache] Filtered ${filtered} outdated cache entries`);
+}
+console.error(`[Cache] Loaded ${this.vectorStore.length} cached embeddings`);
+```
+
+Here's how the **vector cache** works:
+
+### Overview
+
+The cache system (`lib/cache.js`) stores two things:
+
+1. **Vector embeddings** - AI-generated vectors for each code chunk
+2. **File hashes** - MD5 hashes to detect file changes
+
+## Why Heuristic MCP is Useful
+
+Yes, absolutely. The **Heuristic MCP** is highly effective because it bridges the gap between "having access to files" and "understanding the codebase." Here is why it is useful:
+
+### 1. It Solved the "Needle in a Haystack" Problem
+
+To answer your technical questions, I didn't have to manually browse 15 files or run generic `grep` commands.
+
+- **Without MCP:** I would have listed directories, guessed `lib/utils.js` might be relevant, read the whole file, then checked `index.js`, etc.
+- **With MCP:** I asked *"how does chunking work"* and it instantly returned lines 91-108 of `lib/utils.js`. It acted like a senior engineer pointing me to the exact lines of code.
 
-
+### 2. It Finds "Concepts," Not Just Words
+
+Standard tools like `grep` only find exact matches.
+
+- If I searched for "authentication" using `grep`, I might miss a function named `verifyUserCredentials`.
+- The **Heuristic MCP** links these concepts. In the test script I analyzed earlier, `authentication` correctly matched with `credentials` because of the vector similarity.
+
+### 3. It Finds "Similar Code"
+
+AI agents have a limited memory (context window).
+
+- Instead of reading **every file** to understand the project (which wastes thousands of tokens), the MCP lets me retrieve **only the 5-10 relevant snippets**. This leaves more room for complex reasoning and generating code.
+
+### 4. It Is Fast & Private
+
+Since it runs the **Local LLM** (Xenova) directly on your machine:
+
+- **Latency is near-zero** (<50ms).
+- **Privacy is 100%**: Your source code never leaves your laptop to be indexed by an external cloud service.
+
+### Verdict
+
+For a developer (or an AI agent) working on a confusing or large project, this tool is a massive productivity booster. It essentially turns the entire codebase into a searchable database of knowledge.
+
+## How This is Different
+
+Most MCP servers and RAG tools are "naive"—they just embed code chunks and run a vector search. **Heuristic MCP** is different because it adds **deterministic intelligence** on top of AI:
+
+| Feature | Generic MCP / RAG Tool | Heuristic MCP |
+| :- | :- | :- |
+| **Ranking** | Pure similarity score | Similarity + **Call Graph Proximity** + **Recency Boost** |
+| **Logic** | "Is this text similar?" | "Is this similar, AND used by this function, AND active?" |
+| **Refactoring** | N/A | **`find_similar_code`** tool to detect duplicates |
+| **Tuning** | Static (hardcoded) | **Runtime Config** (adjust ANN parameters on the fly) |
+
+### Comparison to Cursor
+
+[Cursor](https://cursor.sh) is an excellent AI editor with built-in codebase indexing.
+
+- **Cursor** is an *Editor*: You must use their IDE to get the features.
+- **Heuristic MCP** is a *Protocol*: It brings Cursor-like intelligence to **any** tool (Claude Desktop, multiple IDEs, agentic workflows) without locking you into a specific editor.
+- **Transparency**: This is open-source. You know exactly how your code is indexed and where the data lives (locally).
+
+## Performance
 
 - Pre-indexed embeddings are faster than scanning files at runtime
 - Smart project detection skips dependencies automatically (node_modules, vendor, etc.)
 - Incremental updates - only re-processes changed files
 - Optional ANN search (HNSW) for faster queries on large codebases
 
-
+## Privacy
 
 - Everything runs locally on your machine
 - Your code never leaves your system
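The cache overview added above (vector embeddings plus MD5 file hashes for change detection) reduces to comparing a stored digest against a freshly computed one. A minimal sketch of that idea using `node:crypto`; `hashContent` and `needsReindex` are illustrative names, not the package's actual API:

```javascript
import { createHash } from "node:crypto";

// Illustrative change detection via content hashes, as the README
// describes: re-embed a file only when its digest differs from the
// digest recorded at last index time. Names are hypothetical.
function hashContent(content) {
  return createHash("md5").update(content).digest("hex");
}

function needsReindex(fileHashes, file, content) {
  const fresh = hashContent(content);
  const changed = fileHashes.get(file) !== fresh;
  if (changed) fileHashes.set(file, fresh); // record for the next run
  return changed;
}

const hashes = new Map();
console.log(needsReindex(hashes, "lib/cache.js", "v1")); // true (never seen)
console.log(needsReindex(hashes, "lib/cache.js", "v1")); // false (unchanged)
console.log(needsReindex(hashes, "lib/cache.js", "v2")); // true (edited)
```

The map here plays the role of the persisted hash index; in the real package the equivalent state survives restarts by being written to the cache directory.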
@@ -178,8 +260,6 @@ When you search, your query is converted to the same vector format. We use a **h
 - **Exact Keyword Matching** (BM25-inspired boost)
 - **Recency Boosting** (favoring files you're actively working on)
 
-
-
 ## Examples
 
 **Natural language search:**

@@ -208,13 +288,6 @@ Query: "error handling and exceptions"
 
 Finds all try/catch blocks and error handling patterns.
 
-## Privacy
-
-- AI model runs entirely on your machine
-- No network requests to external services
-- No telemetry or analytics
-- Cache stored locally in `.smart-coding-cache/`
-
 ## Technical Details
 
 **Embedding Model**: all-MiniLM-L6-v2 via transformers.js
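The hunk context above describes a hybrid ranking: vector similarity combined with a BM25-inspired exact-keyword boost and a recency boost. A rough sketch of how such a combined score could look; the weights, thresholds, and names below are invented for illustration and are not heuristic-mcp's actual formula:

```javascript
// Hypothetical hybrid scoring: cosine similarity plus keyword and
// recency boosts. Weights are illustrative, not the package's values.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function hybridScore(queryVec, queryTerms, chunk, now = Date.now()) {
  const similarity = cosineSimilarity(queryVec, chunk.vector);

  // Exact-keyword boost: reward chunks containing query terms verbatim.
  const text = chunk.text.toLowerCase();
  const hits = queryTerms.filter(t => text.includes(t.toLowerCase())).length;
  const keywordBoost = 0.05 * hits;

  // Recency boost: favor files touched within the last 24 hours.
  const ageMs = now - chunk.mtimeMs;
  const recencyBoost = ageMs < 24 * 60 * 60 * 1000 ? 0.1 : 0;

  return similarity + keywordBoost + recencyBoost;
}

const chunk = {
  vector: [1, 0, 0],
  text: "export function loadCache() { /* ... */ }",
  mtimeMs: Date.now(),
};
const score = hybridScore([1, 0, 0], ["cache"], chunk);
console.log(score.toFixed(2)); // "1.15": similarity 1.0 + 0.05 keyword + 0.1 recency
```

The point of the additive form is that a chunk with a mediocre embedding match can still outrank others when it contains the query terms verbatim and was edited recently.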
package/features/index-codebase.js
CHANGED

@@ -503,6 +503,7 @@ export class CodebaseIndexer {
       console.error("[Indexer] Force reindex requested: clearing cache");
       this.cache.setVectorStore([]);
       this.cache.fileHashes = new Map();
+      await this.cache.clearCallGraphData({ removeFile: true });
     }
 
     const totalStartTime = Date.now();

@@ -520,9 +521,10 @@ export class CodebaseIndexer {
     // Send progress: discovery complete
     this.sendProgress(5, 100, `Discovered ${files.length} files`);
 
+    const currentFilesSet = new Set(files);
+
     // Step 1.5: Prune deleted or excluded files from cache
     if (!force) {
-      const currentFilesSet = new Set(files);
       const cachedFiles = Array.from(this.cache.fileHashes.keys());
       let prunedCount = 0;
 

@@ -540,24 +542,77 @@
        }
        // If we pruned files, we should save these changes even if no other files changed
      }
+
+      const prunedCallGraph = this.cache.pruneCallGraphData(currentFilesSet);
+      if (prunedCallGraph > 0 && this.config.verbose) {
+        console.error(`[Indexer] Pruned ${prunedCallGraph} call-graph entries`);
+      }
     }
 
     // Step 2: Pre-filter unchanged files (early hash check)
     const filesToProcess = await this.preFilterFiles(files);
+    const filesToProcessSet = new Set(filesToProcess.map(entry => entry.file));
 
     if (filesToProcess.length === 0) {
       console.error("[Indexer] All files unchanged, nothing to index");
-
-
-
-
-
-
-
-
-
-
-
+
+      // If we have no call graph data but we have cached files, we should try to rebuild it
+      if (this.config.callGraphEnabled && this.cache.getVectorStore().length > 0) {
+        // Check for files that are in cache but missing from call graph data
+        const cachedFiles = new Set(this.cache.getVectorStore().map(c => c.file));
+        const callDataFiles = new Set(this.cache.fileCallData.keys());
+
+        const missingCallData = [];
+        for (const file of cachedFiles) {
+          if (!callDataFiles.has(file) && currentFilesSet.has(file)) {
+            missingCallData.push(file);
+          }
+        }
+
+        if (missingCallData.length > 0) {
+          console.error(`[Indexer] Found ${missingCallData.length} files missing call graph data, re-indexing...`);
+          const BATCH_SIZE = 100;
+          for (let i = 0; i < missingCallData.length; i += BATCH_SIZE) {
+            const batch = missingCallData.slice(i, i + BATCH_SIZE);
+            const results = await Promise.all(
+              batch.map(async (file) => {
+                try {
+                  const stats = await fs.stat(file);
+                  if (stats.isDirectory()) return null;
+                  if (stats.size > this.config.maxFileSize) return null;
+                  const content = await fs.readFile(file, "utf-8");
+                  const hash = hashContent(content);
+                  return { file, content, hash };
+                } catch {
+                  return null;
+                }
+              })
+            );
+
+            for (const result of results) {
+              if (!result) continue;
+              if (filesToProcessSet.has(result.file)) continue;
+              filesToProcess.push(result);
+              filesToProcessSet.add(result.file);
+            }
+          }
+        }
+      }
+
+      // If still empty after checking for missing call data, then we are truly done
+      if (filesToProcess.length === 0) {
+        this.sendProgress(100, 100, "All files up to date");
+        await this.cache.save();
+        const vectorStore = this.cache.getVectorStore();
+        return {
+          skipped: false,
+          filesProcessed: 0,
+          chunksCreated: 0,
+          totalFiles: new Set(vectorStore.map(v => v.file)).size,
+          totalChunks: vectorStore.length,
+          message: "All files up to date"
+        };
+      }
     }
 
     // Send progress: filtering complete
package/lib/cache.js
CHANGED

@@ -6,6 +6,7 @@ const CACHE_META_FILE = "meta.json";
 const ANN_META_VERSION = 1;
 const ANN_INDEX_FILE = "ann-index.bin";
 const ANN_META_FILE = "ann-meta.json";
+const CALL_GRAPH_FILE = "call-graph.json";
 
 let hnswlibPromise = null;
 let hnswlibLoadError = null;

@@ -166,7 +167,7 @@ export class EmbeddingsCache {
     }
 
     // Load call-graph data if it exists
-    const callGraphFile = path.join(this.config.cacheDirectory,
+    const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
     try {
       const callGraphData = await fs.readFile(callGraphFile, "utf8");
       const parsed = JSON.parse(callGraphData);

@@ -203,10 +204,12 @@ export class EmbeddingsCache {
       fs.writeFile(metaFile, JSON.stringify(this.cacheMeta, null, 2))
     ]);
 
-    // Save call-graph data
+    // Save call-graph data (or remove stale cache if empty)
+    const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
     if (this.fileCallData.size > 0) {
-      const callGraphFile = path.join(this.config.cacheDirectory, "call-graph.json");
       await fs.writeFile(callGraphFile, JSON.stringify(Object.fromEntries(this.fileCallData), null, 2));
+    } else {
+      await fs.rm(callGraphFile, { force: true });
     }
   } catch (error) {
     console.error("[Cache] Failed to save cache:", error.message);

@@ -440,9 +443,7 @@ export class EmbeddingsCache {
       this.vectorStore = [];
       this.fileHashes = new Map();
       this.invalidateAnnIndex();
-
-      this.fileCallData.clear();
-      this.callGraph = null;
+      await this.clearCallGraphData();
       console.error(`[Cache] Cache cleared successfully: ${this.config.cacheDirectory}`);
     } catch (error) {
       console.error("[Cache] Failed to clear cache:", error.message);

@@ -497,6 +498,46 @@ export class EmbeddingsCache {
 
   // ========== Call Graph Methods ==========
 
+  /**
+   * Clear all call-graph data (optionally remove persisted cache file)
+   */
+  async clearCallGraphData({ removeFile = false } = {}) {
+    this.fileCallData.clear();
+    this.callGraph = null;
+
+    if (removeFile && this.config.enableCache) {
+      const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
+      try {
+        await fs.rm(callGraphFile, { force: true });
+      } catch (error) {
+        if (this.config.verbose) {
+          console.error(`[Cache] Failed to remove call-graph cache: ${error.message}`);
+        }
+      }
+    }
+  }
+
+  /**
+   * Remove call-graph entries for files no longer in the codebase
+   */
+  pruneCallGraphData(validFiles) {
+    if (!validFiles || this.fileCallData.size === 0) return 0;
+
+    let pruned = 0;
+    for (const file of Array.from(this.fileCallData.keys())) {
+      if (!validFiles.has(file)) {
+        this.fileCallData.delete(file);
+        pruned++;
+      }
+    }
+
+    if (pruned > 0) {
+      this.callGraph = null;
+    }
+
+    return pruned;
+  }
+
   /**
    * Store call data for a file
   */
package/package.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@softerist/heuristic-mcp",
-  "version": "2.1.1",
+  "version": "2.1.3",
   "description": "An enhanced MCP server providing intelligent semantic code search with find-similar-code, recency ranking, and improved chunking. Fork of smart-coding-mcp.",
   "type": "module",
   "main": "index.js",
package/example.png
DELETED
Binary file

package/how-its-works.png
DELETED
Binary file