npm - @softerist/heuristic-mcp - Versions diffs - 3.2.11 → 3.2.13 - Mend

@softerist/heuristic-mcp 3.2.11 → 3.2.13

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md +20 -357
package/features/hybrid-search.js +47 -19
package/features/index-codebase.js +66 -16
package/lib/cache.js +125 -34
package/package.json +1 -1

package/README.md

@@ -1,387 +1,50 @@
 # Heuristic MCP Server
-An enhanced MCP server for your codebase. It provides intelligent semantic search, find-similar-code, recency-aware ranking, call-graph proximity boosts, and smart chunking. Optimized for Antigravity, Cursor, Claude Desktop, and VS Code.
+Heuristic MCP adds smart code search to your editor or MCP client.
----
+## Requirements
-## Key Features
+- Node.js `18+`
+- npm (for global install)
+- Internet access at least once to download the embedding model (if install-time download is skipped, it downloads on first run)
+- 64-bit Node.js recommended for native ONNX performance; on Windows, install Microsoft Visual C++ 2015-2022 Redistributable (x64) if native bindings fail
-- Zero-touch setup: postinstall auto-registers the MCP server with supported IDEs when possible.
-- Smart indexing: detects project type and applies smart ignore patterns on top of your excludes.
-- Semantic search: find code by meaning, not just keywords.
-- Find similar code: locate near-duplicate or related patterns from a snippet.
-- Package version lookup: check latest versions from npm, PyPI, crates.io, Maven, and more.
-- Workspace switching: change workspace at runtime without restarting the server.
-- Recency ranking and call-graph boosting: surfaces fresh and related code.
-- Optional ANN index: faster candidate retrieval for large codebases.
-- Optional binary vector store: mmap-friendly cache format for large repos.
-- Flexible embedding dimensions: MRL-compatible dimension reduction (64-768d) for speed/quality tradeoffs.
----
-## Installation
-Install globally (recommended):
+## Install
 ```bash
 npm install -g @softerist/heuristic-mcp
 ```
-What happens during install:
-- Registration runs automatically (`scripts/postinstall.js`).
-- Model pre-download is attempted (`scripts/download-model.js`). If offline, it will be skipped and downloaded on first run.
-If auto-registration did not update your IDE config, run:
+Then enable it for your client:
 ```bash
 heuristic-mcp --start
 ```
----
+If your editor was already open, reload it once.
-## CLI Commands
+## How It Works
-The `heuristic-mcp` binary manages the server lifecycle.
+1. The server scans your workspace and builds a searchable index of your code.
+2. IDE AI models/MCP tools query that index using plain language so you can find relevant code quickly.
+3. Results improve as your index stays up to date with project changes.
-### Status
+## Basic Commands
 ```bash
 heuristic-mcp --status
-```
-Shows server PID(s) and cache stats.
-### Logs
-```bash
 heuristic-mcp --logs
-```
-Tails the server log for the current workspace (defaults to last 200 lines and follows).
-Optional flags:
-```bash
-heuristic-mcp --logs --tail 100
-heuristic-mcp --logs --no-follow
-```
-### Version
-```bash
-heuristic-mcp --version
-```
-### Start/Stop
-```bash
-heuristic-mcp --start
-heuristic-mcp --start antigravity
-heuristic-mcp --start codex
-heuristic-mcp --start cursor
-heuristic-mcp --start vscode
-heuristic-mcp --start windsurf
-heuristic-mcp --start warp
-heuristic-mcp --start "Claude Desktop"
 heuristic-mcp --stop
 ```
-`--start` registers (if needed) and enables the MCP server entry. `--stop` disables it so the IDE won't immediately respawn it. Restart/reload the IDE after `--start` to launch.
-Warp note: this package now targets `~/.warp/mcp_settings.json` (and `%APPDATA%\\Warp\\mcp_settings.json` on Windows when present). If no local Warp MCP config is writable yet, use Warp MCP settings/UI once to initialize it, then re-run `--start warp`.
-### Clear Cache
-```bash
-heuristic-mcp --clear-cache
-```
-Clears the cache for the current working directory (or `--workspace` if provided) and removes stale cache directories without metadata.
----
-## Configuration (`config.jsonc`)
-Configuration is loaded from your workspace root when the server runs with `--workspace`. If not provided by the IDE, the server auto-detects workspace via environment variables and current working directory. In server mode, it falls back to the package `config.jsonc` (or `config.json`) and then your current working directory.
-Example `config.jsonc`:
-```json
-{
-  "excludePatterns": ["**/legacy-code/**", "**/*.test.ts"],
-  "fileNames": ["Dockerfile", ".env.example", "Makefile"],
-  "indexing": {
-    "smartIndexing": true
-  },
-  "worker": {
-    "workerThreads": 0
-  },
-  "embedding": {
-    "embeddingModel": "jinaai/jina-embeddings-v2-base-code",
-    "embeddingBatchSize": null,
-    "embeddingProcessNumThreads": 8
-  },
-  "search": {
-    "recencyBoost": 0.1,
-    "recencyDecayDays": 30
-  },
-  "callGraph": {
-    "callGraphEnabled": true,
-    "callGraphBoost": 0.15
-  },
-  "ann": {
-    "annEnabled": true
-  },
-  "vectorStore": {
-    "vectorStoreFormat": "binary",
-    "vectorStoreContentMode": "external",
-    "vectorStoreLoadMode": "disk",
-    "contentCacheEntries": 256,
-    "vectorCacheEntries": 64
-  },
-  "memoryCleanup": {
-    "clearCacheAfterIndex": true
-  }
-}
-```
-Preferred style is namespaced keys (shown above). Legacy top-level keys are still supported for backward compatibility.
-### Embedding Model & Dimension Options
-**Default model:** `jinaai/jina-embeddings-v2-base-code` (768 dimensions)
-> **Important:** The default Jina model was **not** trained with Matryoshka Representation Learning (MRL). Dimension reduction (`embeddingDimension`) will significantly degrade search quality with this model. Only use dimension reduction with MRL-trained models.
-For faster search with smaller embeddings, switch to an MRL-compatible model:
-```json
-{
-  "embedding": {
-    "embeddingModel": "nomic-ai/nomic-embed-text-v1.5",
-    "embeddingDimension": 128
-  }
-}
-```
-**MRL-compatible models:**
-- `nomic-ai/nomic-embed-text-v1.5` — recommended for 128d/256d
-- Other models explicitly trained with Matryoshka loss
-**embeddingDimension values:** `64 | 128 | 256 | 512 | 768 | null` (null = full dimensions)
-Cache location:
-- By default, the cache is stored in a global OS cache directory under `heuristic-mcp/<hash>`.
-- You can override with `cacheDirectory` in your config file.
-### Environment Variables
-Selected overrides (prefix `SMART_CODING_`):
-Environment overrides target runtime keys and are synced back into namespaces by `lib/config.js`.
-- `SMART_CODING_VERBOSE=true|false` — enable detailed logging.
-- `SMART_CODING_WORKER_THREADS=auto|N` — worker thread count.
-- `SMART_CODING_BATCH_SIZE=100` — files per indexing batch.
-- `SMART_CODING_CHUNK_SIZE=25` — lines per chunk.
-- `SMART_CODING_MAX_RESULTS=5` — max search results.
-- `SMART_CODING_EMBEDDING_BATCH_SIZE=64` — embedding batch size (1–256, overrides auto).
-- `SMART_CODING_EMBEDDING_THREADS=8` — ONNX threads for the embedding child process.
-- `SMART_CODING_RECENCY_BOOST=0.1` — boost for recently edited files.
-- `SMART_CODING_RECENCY_DECAY_DAYS=30` — days until recency boost decays to 0.
-- `SMART_CODING_ANN_ENABLED=true|false` — enable ANN index.
-- `SMART_CODING_ANN_EF_SEARCH=64` — ANN search quality/speed tradeoff.
-- `SMART_CODING_VECTOR_STORE_FORMAT=json|binary|sqlite` — on-disk vector store format.
-- `SMART_CODING_VECTOR_STORE_CONTENT_MODE=external|inline` — where content is stored for binary format.
-- `SMART_CODING_VECTOR_STORE_LOAD_MODE=memory|disk` — vector loading strategy.
-- `SMART_CODING_CONTENT_CACHE_ENTRIES=256` — LRU entries for decoded content.
-- `SMART_CODING_VECTOR_CACHE_ENTRIES=64` — LRU entries for vectors (disk mode).
-- `SMART_CODING_CLEAR_CACHE_AFTER_INDEX=true|false` — drop in-memory vectors after indexing.
-- `SMART_CODING_UNLOAD_MODEL_AFTER_INDEX=true|false` — unload embedding model after indexing to free RAM (~500MB-1GB).
-- `SMART_CODING_EXPLICIT_GC=true|false` — opt-in to explicit GC (requires `--expose-gc`).
-- `SMART_CODING_INCREMENTAL_GC_THRESHOLD_MB=2048` — RSS threshold for running incremental GC after watcher updates (requires explicit GC).
-- `SMART_CODING_EMBEDDING_DIMENSION=64|128|256|512|768` — MRL dimension reduction (only for MRL-trained models).
-See `lib/config.js` for the full list.
-### Binary Vector Store
-Set `vectorStore.vectorStoreFormat` to `binary` to use the on-disk binary cache. This keeps vectors and content out of JS heap
-and reads on demand. Recommended for large repos.
-- `vectorStore.vectorStoreContentMode=external` keeps content in the binary file and only loads for top-N results.
-- `vectorStore.contentCacheEntries` controls the small in-memory LRU for decoded content strings.
-- `vectorStore.vectorStoreLoadMode=disk` streams vectors from disk to reduce memory usage.
-- `vectorStore.vectorCacheEntries` controls the small in-memory LRU for vectors when using disk mode.
-- `memoryCleanup.clearCacheAfterIndex=true` drops in-memory vectors after indexing and reloads lazily on next query.
-- `memoryCleanup.unloadModelAfterIndex=true` (default) unloads the embedding model after indexing to free ~500MB-1GB of RAM; the model will reload on the next search query.
-- Note: `ann.annEnabled=true` with `vectorStore.vectorStoreLoadMode=disk` can increase disk reads during ANN rebuilds on large indexes.
-### SQLite Vector Store
-Set `vectorStore.vectorStoreFormat` to `sqlite` to use SQLite for persistence. This provides:
-- ACID transactions for reliable writes
-- Simpler concurrent access
-- Standard database format for inspection
-```json
-{
-  "vectorStore": {
-    "vectorStoreFormat": "sqlite"
-  }
-}
-```
-The vectors and content are stored in `vectors.sqlite` in your cache directory. You can inspect it with any SQLite browser.
-`vectorStore.vectorStoreContentMode` and `vectorStore.vectorStoreLoadMode` are respected for SQLite (use `vectorStore.vectorStoreLoadMode=disk` to avoid loading vectors into memory).
-**Tradeoffs vs Binary:**
-- Slightly higher read overhead (SQL queries vs direct memory access)
-- Better write reliability (transactions)
-- Easier debugging (standard SQLite file)
-### Benchmarking Search
-Use the built-in script to compare memory vs latency tradeoffs:
-```bash
-node tools/scripts/benchmark-search.js --query "database connection" --runs 10
-```
-Compare modes quickly:
-```bash
-SMART_CODING_VECTOR_STORE_LOAD_MODE=memory node tools/scripts/benchmark-search.js --runs 10
-SMART_CODING_VECTOR_STORE_LOAD_MODE=disk node tools/scripts/benchmark-search.js --runs 10
-SMART_CODING_VECTOR_STORE_FORMAT=binary SMART_CODING_VECTOR_STORE_LOAD_MODE=disk node tools/scripts/benchmark-search.js --runs 10
-```
-Note: On small repos, disk mode may be slightly slower and show noisy RSS deltas; benefits are clearer on large indexes with a small `vectorStore.vectorCacheEntries`.
----
-## MCP Tools Reference
-### `a_semantic_search`
-Find code by meaning. Ideal for natural language queries like "authentication logic" or "database queries".
-### `b_index_codebase`
-Manually trigger a full reindex. Useful after large code changes.
-### `c_clear_cache`
-Clear the embeddings cache and force reindex.
-### `d_ann_config`
-Configure the ANN (Approximate Nearest Neighbor) index. Actions: `stats`, `set_ef_search`, `rebuild`.
-### `d_find_similar_code`
-Find similar code patterns given a snippet. Useful for finding duplicates or refactoring opportunities.
-### `e_check_package_version`
-Fetch the latest version of a package from its official registry.
-**Supported registries:**
-- **npm** (default): `lodash`, `@types/node`
-- **PyPI**: `pip:requests`, `pypi:django`
-- **crates.io**: `cargo:serde`, `rust:tokio`
-- **Maven**: `maven:org.springframework:spring-core`
-- **Go**: `go:github.com/gin-gonic/gin`
-- **RubyGems**: `gem:rails`
-- **NuGet**: `nuget:Newtonsoft.Json`
-- **Packagist**: `composer:laravel/framework`
-- **Hex**: `hex:phoenix`
-- **pub.dev**: `pub:flutter`
-- **Homebrew**: `brew:node`
-- **Conda**: `conda:numpy`
-### `f_set_workspace`
-Change the workspace directory at runtime. Updates search directory, cache location, and optionally triggers reindex.
-The server also attempts this automatically before each tool call when it detects a new workspace path from environment variables (for example `CODEX_WORKSPACE`, `CODEX_PROJECT_ROOT`, `WORKSPACE_FOLDER`).
-**Parameters:**
-- `workspacePath` (required): Absolute path to the new workspace
-- `reindex` (optional, default: `true`): Whether to trigger a full reindex
----
-## Troubleshooting
-**Server isn't starting**
-1. Run `heuristic-mcp --status` to check config and cache status.
-2. Run `heuristic-mcp --logs` to see startup errors.
-**Native ONNX backend unavailable (falls back to WASM)**
-If you see log lines like:
-```
-Native ONNX backend unavailable: The operating system cannot run %1.
-...onnxruntime_binding.node. Falling back to WASM.
-```
-The server will automatically disable workers and force `embedding.embeddingProcessPerBatch` to reduce memory spikes, but you
-should fix the native binding to restore stable memory usage:
-- Ensure you are running **64-bit Node.js** (`node -p "process.arch"` should be `x64`).
-- Install **Microsoft Visual C++ 2015–2022 Redistributable (x64)**.
-- Reinstall dependencies (clears locked native binaries):
-```bash
-Remove-Item -Recurse -Force node_modules\\onnxruntime-node, node_modules\\.onnxruntime-node-* -ErrorAction SilentlyContinue
-npm install
-```
-If you see a warning about **version mismatch** (e.g. "onnxruntime-node 1.23.x incompatible with transformers.js
-expectation 1.14.x"), install the matching version:
-```bash
-npm install onnxruntime-node@1.14.0
-```
-**Search returns no results**
-- Check `heuristic-mcp --status` for indexing progress.
-- If indexing shows zero files, review `excludePatterns` and `fileExtensions`.
-**Model download fails**
-- The install step tries to pre-download the model, but it can be skipped offline.
-- The server will download on first run; ensure network access at least once.
-**Clear cache**
-- Use the MCP tool `c_clear_cache`, run `heuristic-mcp --clear-cache`, or delete the cache directory. For local dev, run `npm run clean`.
-**Inspect cache**
-```bash
-node tools/scripts/cache-stats.js --workspace <path>
-```
-**Stop doesn't stick**
-- The IDE will auto-restart the server if it's still enabled in its config. `--stop` now disables the server entry for Antigravity, Cursor (including `~/.cursor/mcp.json`), Windsurf (`~/.codeium/windsurf/mcp_config.json`), Warp (`~/.warp/mcp_settings.json` and `%APPDATA%\\Warp\\mcp_settings.json` when present), Claude Desktop, and VS Code (when using common MCP settings keys). Restart the IDE after `--start` to re-enable.
+Use `heuristic-mcp --status` first if something looks off.
+Use `heuristic-mcp --cache` to see the cache status or file index progress.
----
+## Advanced Docs
-## Contributing
+Detailed configuration, tool reference, troubleshooting, and release notes are in:
-See `CONTRIBUTING.md` for guidelines.
+- [`docs/GUIDE.md`](docs/GUIDE.md)
+- [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
 License: MIT

package/features/hybrid-search.js

@@ -3,6 +3,7 @@ import fs from 'fs/promises';
 import { dotSimilarity } from '../lib/utils.js';
 import { extractSymbolsFromContent } from '../lib/call-graph.js';
 import { embedQueryInChildProcess } from '../lib/embed-query-process.js';
+import { normalizePathKey } from '../lib/path-utils.js';
 import {
   STAT_CONCURRENCY_LIMIT,
   SEARCH_BATCH_SIZE,
@@ -27,6 +28,10 @@ function alignQueryVectorDimension(vector, targetDim) {
   return sliced;
 }
+function toFileKey(file) {
+  return normalizePathKey(file);
+}
 export class HybridSearch {
   constructor(embedder, cache, config) {
     this.embedder = embedder;
@@ -36,6 +41,13 @@ export class HybridSearch {
     this._lastAccess = new Map();
   }
+  setFileModTime(file, mtimeMs) {
+    const key = toFileKey(file);
+    if (!key) return;
+    this.fileModTimes.set(key, mtimeMs);
+    this._lastAccess.set(key, Date.now());
+  }
   async getChunkContent(chunkOrIndex) {
     return await this.cache.getChunkContent(chunkOrIndex);
   }
@@ -54,20 +66,28 @@ export class HybridSearch {
   }
   async populateFileModTimes(files) {
-    const uniqueFiles = new Set(files);
+    const uniqueFilesByKey = new Map();
+    for (const file of files) {
+      const key = toFileKey(file);
+      if (!key) continue;
+      if (!uniqueFilesByKey.has(key)) {
+        uniqueFilesByKey.set(key, file);
+      }
+    }
     const missing = [];
+    const now = Date.now();
-    for (const file of uniqueFiles) {
-      if (!this.fileModTimes.has(file)) {
+    for (const [key, file] of uniqueFilesByKey) {
+      if (!this.fileModTimes.has(key)) {
         const meta = this.cache.getFileMeta(file);
         if (meta && typeof meta.mtimeMs === 'number') {
-          this.fileModTimes.set(file, meta.mtimeMs);
-          this._lastAccess.set(file, Date.now());
+          this.fileModTimes.set(key, meta.mtimeMs);
+          this._lastAccess.set(key, now);
         } else {
-          missing.push(file);
+          missing.push({ key, file });
         }
       } else {
-        this._lastAccess.set(file, Date.now());
+        this._lastAccess.set(key, now);
       }
     }
@@ -79,13 +99,15 @@ export class HybridSearch {
     const worker = async (startIdx) => {
       for (let i = startIdx; i < missing.length; i += workerCount) {
-        const file = missing[i];
+        const item = missing[i];
+        if (!item) continue;
+        const { key, file } = item;
         try {
           const stats = await fs.stat(file);
-          this.fileModTimes.set(file, stats.mtimeMs);
-          this._lastAccess.set(file, Date.now());
+          this.fileModTimes.set(key, stats.mtimeMs);
+          this._lastAccess.set(key, Date.now());
         } catch {
-          this.fileModTimes.set(file, null);
+          this.fileModTimes.set(key, null);
         }
       }
     };
@@ -109,7 +131,10 @@ export class HybridSearch {
   }
   clearFileModTime(file) {
-    this.fileModTimes.delete(file);
+    const key = toFileKey(file);
+    if (!key) return;
+    this.fileModTimes.delete(key);
+    this._lastAccess.delete(key);
   }
   async search(query, maxResults) {
@@ -259,11 +284,11 @@ export class HybridSearch {
           await this.populateFileModTimes(candidates.map((chunk) => chunk.file));
         } else {
           for (const chunk of candidates) {
-            if (!this.fileModTimes.has(chunk.file)) {
-              const meta = this.cache.getFileMeta(chunk.file);
-              if (meta && typeof meta.mtimeMs === 'number') {
-                this.fileModTimes.set(chunk.file, meta.mtimeMs);
-              }
+            const chunkKey = toFileKey(chunk.file);
+            if (!chunkKey || this.fileModTimes.has(chunkKey)) continue;
+            const meta = this.cache.getFileMeta(chunk.file);
+            if (meta && typeof meta.mtimeMs === 'number') {
+              this.setFileModTime(chunk.file, meta.mtimeMs);
             }
           }
         }
@@ -323,7 +348,8 @@ export class HybridSearch {
           }
           if (recencyBoostEnabled) {
-            const mtime = this.fileModTimes.get(chunkInfo.file);
+            const chunkKey = toFileKey(chunkInfo.file);
+            const mtime = chunkKey ? this.fileModTimes.get(chunkKey) : undefined;
             if (typeof mtime === 'number') {
               const ageMs = now - mtime;
               const recencyFactor = Math.max(0, 1 - ageMs / recencyDecayMs);
@@ -380,7 +406,9 @@ export class HybridSearch {
           const relatedFiles = await this.cache.getRelatedFiles(Array.from(symbolsFromTop));
           for (const chunk of scoredChunks) {
-            const proximity = relatedFiles.get(chunk.file);
+            const chunkKey = toFileKey(chunk.file);
+            const proximity =
+              relatedFiles.get(chunk.file) ?? (chunkKey ? relatedFiles.get(chunkKey) : undefined);
             if (proximity) {
               chunk.score += proximity * this.config.callGraphBoost;
             }
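
The hunks above replace raw file paths with normalized keys in `fileModTimes` and `_lastAccess`, and `populateFileModTimes` now deduplicates its input by key while remembering the first concrete path for each key. The diff does not include `lib/path-utils.js`, so the normalizer below (`normalizePathKeySketch`) is a hypothetical stand-in that assumes forward-slash separators and optional lowercasing. It is a minimal sketch of the first-path-wins pattern, not the package's actual implementation.

```javascript
// Hypothetical stand-in for normalizePathKey from lib/path-utils.js, which
// this diff does not show. Assumption: separators become forward slashes and
// the key is lowercased only when the file system is case-insensitive.
function normalizePathKeySketch(file, caseInsensitive = false) {
  if (typeof file !== 'string' || file.length === 0) return null;
  const unified = file.replace(/\\/g, '/');
  return caseInsensitive ? unified.toLowerCase() : unified;
}

// Keep the first concrete path seen for each key, mirroring the
// uniqueFilesByKey map built in populateFileModTimes.
function uniqueFilesByKey(files, caseInsensitive = false) {
  const byKey = new Map();
  for (const file of files) {
    const key = normalizePathKeySketch(file, caseInsensitive);
    if (!key) continue; // skip empty or non-string paths
    if (!byKey.has(key)) byKey.set(key, file);
  }
  return byKey;
}

const files = ['C:\\repo\\src\\App.js', 'c:/repo/src/app.js', 'c:/repo/src/util.js'];
const deduped = uniqueFilesByKey(files, true);
console.log(deduped.size); // 2
console.log(deduped.get('c:/repo/src/app.js')); // C:\repo\src\App.js
```

Storing the original path as the map value matters because `fs.stat` and `cache.getFileMeta` are still called with the concrete path in the diff, while only the lookup maps use the key.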

package/features/index-codebase.js

@@ -10,6 +10,7 @@ import { fileURLToPath } from 'url';
 import { smartChunk, hashContent } from '../lib/utils.js';
 import { extractCallData } from '../lib/call-graph.js';
 import { forceShutdownEmbeddingPool, isEmbeddingPoolActive } from '../lib/embed-query-process.js';
+import { normalizePathKey } from '../lib/path-utils.js';
 import ignore from 'ignore';
@@ -31,6 +32,10 @@ function normalizePath(value) {
   return value.split(path.sep).join('/');
 }
+function toFileKey(value) {
+  return normalizePathKey(value);
+}
 function globToRegExp(pattern) {
   let regex = '^';
   for (let i = 0; i < pattern.length; i += 1) {
@@ -2149,7 +2154,14 @@ export class CodebaseIndexer {
       if (this.server && this.server.hybridSearch && this.server.hybridSearch.fileModTimes) {
         for (const stat of fileStats) {
           if (stat && stat.file && typeof stat.mtimeMs === 'number') {
-            this.server.hybridSearch.fileModTimes.set(stat.file, stat.mtimeMs);
+            if (typeof this.server.hybridSearch.setFileModTime === 'function') {
+              this.server.hybridSearch.setFileModTime(stat.file, stat.mtimeMs);
+            } else {
+              const key = toFileKey(stat.file);
+              if (key) {
+                this.server.hybridSearch.fileModTimes.set(key, stat.mtimeMs);
+              }
+            }
           }
         }
       }
@@ -2233,7 +2245,16 @@ export class CodebaseIndexer {
       this.sendProgress(5, 100, `Discovered ${files.length} files`);
-      const currentFilesSet = new Set(files);
+      const currentFileKeySet = new Set();
+      const currentFilePathByKey = new Map();
+      for (const file of files) {
+        const key = toFileKey(file);
+        if (!key) continue;
+        currentFileKeySet.add(key);
+        if (!currentFilePathByKey.has(key)) {
+          currentFilePathByKey.set(key, file);
+        }
+      }
       if (!force) {
         const cachedFiles =
@@ -2241,7 +2262,8 @@ export class CodebaseIndexer {
         let prunedCount = 0;
         for (const cachedFile of cachedFiles) {
-          if (!currentFilesSet.has(cachedFile)) {
+          const cachedKey = toFileKey(cachedFile);
+          if (!cachedKey || !currentFileKeySet.has(cachedKey)) {
             this.cache.removeFileFromStore(cachedFile);
             this.cache.deleteFileHash(cachedFile);
             prunedCount++;
@@ -2254,26 +2276,48 @@ export class CodebaseIndexer {
           }
         }
-        const prunedCallGraph = this.cache.pruneCallGraphData(currentFilesSet);
+        const prunedCallGraph = this.cache.pruneCallGraphData(currentFileKeySet);
         if (prunedCallGraph > 0 && this.config.verbose) {
           console.info(`[Indexer] Pruned ${prunedCallGraph} call-graph entries`);
         }
       }
       const filesToProcess = await this.preFilterFiles(files);
-      const filesToProcessSet = new Set(filesToProcess.map((entry) => entry.file));
-      const filesToProcessByFile = new Map(filesToProcess.map((entry) => [entry.file, entry]));
+      const filesToProcessKeys = new Set();
+      const filesToProcessByKey = new Map();
+      for (const entry of filesToProcess) {
+        const key = toFileKey(entry?.file);
+        if (!key) continue;
+        filesToProcessKeys.add(key);
+        if (!filesToProcessByKey.has(key)) {
+          filesToProcessByKey.set(key, entry);
+        }
+      }
       if (this.config.callGraphEnabled && this.cache.getVectorStore().length > 0) {
-        const cachedFiles = new Set(this.cache.getVectorStore().map((c) => c.file));
-        const callDataFiles = new Set(this.cache.getFileCallDataKeys());
+        const cachedFileKeys = new Set();
+        for (const chunk of this.cache.getVectorStore()) {
+          const key = toFileKey(chunk?.file);
+          if (key) cachedFileKeys.add(key);
+        }
+        const callDataFiles = new Set();
+        for (const file of this.cache.getFileCallDataKeys()) {
+          const key = toFileKey(file);
+          if (key) callDataFiles.add(key);
+        }
         const missingCallData = [];
-        for (const file of cachedFiles) {
-          if (!callDataFiles.has(file) && currentFilesSet.has(file)) {
-            missingCallData.push(file);
-            const existing = filesToProcessByFile.get(file);
-            if (existing) existing.force = true;
+        for (const key of cachedFileKeys) {
+          if (!callDataFiles.has(key) && currentFileKeySet.has(key)) {
+            const existing = filesToProcessByKey.get(key);
+            if (existing) {
+              existing.force = true;
+              continue;
+            }
+            const concretePath = currentFilePathByKey.get(key);
+            if (concretePath) {
+              missingCallData.push({ key, file: concretePath });
+            }
           }
         }
@@ -2285,7 +2329,7 @@ export class CodebaseIndexer {
           for (let i = 0; i < missingCallData.length; i += BATCH_SIZE) {
             const batch = missingCallData.slice(i, i + BATCH_SIZE);
             const results = await Promise.all(
-              batch.map(async (file) => {
+              batch.map(async ({ file }) => {
                 try {
                   const stats = await fs.stat(file);
                   if (!stats || typeof stats.isDirectory !== 'function') {
@@ -2304,9 +2348,15 @@ export class CodebaseIndexer {
             for (const result of results) {
               if (!result) continue;
-              if (!filesToProcessSet.has(result.file)) {
+              const key = toFileKey(result.file);
+              if (!key) continue;
+              if (!filesToProcessKeys.has(key)) {
                 filesToProcess.push(result);
-                filesToProcessSet.add(result.file);
+                filesToProcessKeys.add(key);
+                filesToProcessByKey.set(key, result);
+              } else {
+                const existing = filesToProcessByKey.get(key);
+                if (existing) existing.force = existing.force || result.force === true;
               }
             }
           }
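
In the hunks above, the indexer stops comparing cached paths to discovered paths as raw strings and compares normalized keys instead, so a cached entry is pruned only when no discovered file maps to the same key. The sketch below illustrates that comparison in isolation. `toKeySketch` is a hypothetical normalizer (the real `normalizePathKey` is not part of this diff) and lowercases unconditionally, which is an assumption that matches case-insensitive file systems only.

```javascript
// Hypothetical stand-in for normalizePathKey (lib/path-utils.js is not shown
// in this diff). Assumption: forward slashes plus unconditional lowercasing.
function toKeySketch(file) {
  if (typeof file !== 'string' || file.length === 0) return null;
  return file.replace(/\\/g, '/').toLowerCase();
}

// Return the cached paths that no longer correspond to any discovered file.
// Paths that cannot be keyed are treated as stale, as in the diff's
// `!cachedKey || !currentFileKeySet.has(cachedKey)` check.
function findStaleCachedFiles(discoveredFiles, cachedFiles) {
  const currentKeys = new Set();
  for (const file of discoveredFiles) {
    const key = toKeySketch(file);
    if (key) currentKeys.add(key);
  }
  return cachedFiles.filter((cachedFile) => {
    const key = toKeySketch(cachedFile);
    return !key || !currentKeys.has(key);
  });
}

const discovered = ['src/Index.js', 'src/cache.js'];
const cached = ['SRC\\index.js', 'src/cache.js', 'src/removed.js'];
console.log(findStaleCachedFiles(discovered, cached)); // [ 'src/removed.js' ]
```

With a raw string comparison, `SRC\index.js` would have been pruned and re-indexed even though it refers to the same file as `src/Index.js` under these assumptions.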

package/lib/cache.js
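
The hunks in this file collapse file-hash entries whose paths normalize to the same key, preferring the entry with the newer `mtimeMs` and, on a tie, the larger `size`. The following is a minimal sketch of that merge rule under stated assumptions: `shouldPrefer` follows the ordering of `shouldPreferFileHashEntry` as it appears in the diff, while `collapseAliases` and the `toKey` normalizer are hypothetical names introduced here for illustration.

```javascript
function numericOrNegInfinity(value) {
  return Number.isFinite(value) ? value : Number.NEGATIVE_INFINITY;
}

// Same ordering as shouldPreferFileHashEntry in the diff: a newer mtime wins,
// then a larger size; on a full tie the existing entry is kept.
function shouldPrefer(candidate, current) {
  const candidateMtime = numericOrNegInfinity(candidate?.mtimeMs);
  const currentMtime = numericOrNegInfinity(current?.mtimeMs);
  if (candidateMtime !== currentMtime) return candidateMtime > currentMtime;
  const candidateSize = numericOrNegInfinity(candidate?.size);
  const currentSize = numericOrNegInfinity(current?.size);
  if (candidateSize !== currentSize) return candidateSize > currentSize;
  return false;
}

// Hypothetical helper: merge [path, entry] pairs whose keys coincide and
// count how many aliases were collapsed.
function collapseAliases(entries, toKey) {
  const merged = new Map();
  let collapses = 0;
  for (const [file, entry] of entries) {
    const key = toKey(file);
    if (!key) continue;
    const existing = merged.get(key);
    if (existing) {
      collapses += 1;
      if (shouldPrefer(entry, existing)) merged.set(key, entry);
    } else {
      merged.set(key, entry);
    }
  }
  return { merged, collapses };
}

// Assumed normalizer; the package's normalizePathKey is not shown in this diff.
const toKey = (file) => file.replace(/\\/g, '/').toLowerCase();
const { merged, collapses } = collapseAliases(
  [
    ['Src\\A.js', { hash: 'old', mtimeMs: 100, size: 10 }],
    ['src/a.js', { hash: 'new', mtimeMs: 200, size: 5 }],
    ['src/b.js', { hash: 'b' }],
  ],
  toKey
);
console.log(collapses); // 1
console.log(merged.get('src/a.js').hash); // new
```

Entries without a finite `mtimeMs` or `size` compare as negative infinity, so an entry that carries metadata is preferred over one that does not.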

@@ -9,6 +9,7 @@ import {
 } from './vector-store-binary.js';
 import { SqliteVectorStore } from './vector-store-sqlite.js';
 import { isNonProjectDirectory } from './config.js';
+import { normalizePathKey } from './path-utils.js';
 import {
   JSON_WORKER_THRESHOLD_BYTES,
   ANN_DIMENSION_SAMPLE_SIZE,
@@ -226,6 +227,26 @@ function serializeFileHashEntry(entry) {
   return normalizeFileHashEntry(entry);
 }
+function fileKey(filePath) {
+  return normalizePathKey(filePath);
+}
+function numericOrNegInfinity(value) {
+  return Number.isFinite(value) ? value : Number.NEGATIVE_INFINITY;
+}
+function shouldPreferFileHashEntry(candidate, current) {
+  const candidateMtime = numericOrNegInfinity(candidate?.mtimeMs);
+  const currentMtime = numericOrNegInfinity(current?.mtimeMs);
+  if (candidateMtime !== currentMtime) return candidateMtime > currentMtime;
+  const candidateSize = numericOrNegInfinity(candidate?.size);
+  const currentSize = numericOrNegInfinity(current?.size);
+  if (candidateSize !== currentSize) return candidateSize > currentSize;
+  return false;
+}
 function computeAnnCapacity(total, config) {
   const factor = typeof config.annCapacityFactor === 'number' ? config.annCapacityFactor : 1.2;
   const extra = Number.isInteger(config.annCapacityExtra) ? config.annCapacityExtra : 1024;
@@ -674,20 +695,29 @@ export class EmbeddingsCache {
       const hasCacheData = Array.isArray(cacheData);
       const hasHashData = hashData && typeof hashData === 'object';
+      let normalizedHashAliasCollapses = 0;
+      let normalizedCallGraphAliasCollapses = 0;
       if (hasCacheData) {
+        const isWin32 = process.platform === 'win32';
         const allowedExtensions = new Set(
-          (this.config.fileExtensions || []).map((ext) => `.${ext}`)
+          (this.config.fileExtensions || []).map((ext) => `.${String(ext).toLowerCase()}`)
+        );
+        const allowedFileNames = new Set(
+          (this.config.fileNames || []).map((name) =>
+            isWin32 ? String(name).toLowerCase() : String(name)
+          )
         );
-        const allowedFileNames = new Set(this.config.fileNames || []);
         const applyExtensionFilter = !this.binaryStore;
         const shouldKeepFile = (filePath) => {
-          const ext = path.extname(filePath);
+          const ext = path.extname(filePath).toLowerCase();
           if (allowedExtensions.has(ext)) return true;
-          return allowedFileNames.has(path.basename(filePath));
+          const baseName = path.basename(filePath);
+          const normalizedBaseName = isWin32 ? baseName.toLowerCase() : baseName;
+          return allowedFileNames.has(normalizedBaseName);
         };
-        const rawHashes = hasHashData ? new Map(Object.entries(hashData)) : new Map();
+        const rawHashes = hasHashData ? Object.entries(hashData) : [];
         this.vectorStore = [];
         this.fileHashes.clear();
@@ -707,8 +737,17 @@ export class EmbeddingsCache {
           for (const [file, entry] of rawHashes) {
             if (!applyExtensionFilter || shouldKeepFile(file)) {
               const normalized = normalizeFileHashEntry(entry);
-              if (normalized) {
-                this.fileHashes.set(file, normalized);
+              const key = fileKey(file);
+              if (normalized && key) {
+                const existing = this.fileHashes.get(key);
+                if (existing) {
+                  normalizedHashAliasCollapses += 1;
+                  if (shouldPreferFileHashEntry(normalized, existing)) {
+                    this.fileHashes.set(key, normalized);
+                  }
+                } else {
+                  this.fileHashes.set(key, normalized);
+                }
               }
             }
           }
@@ -739,11 +778,31 @@ export class EmbeddingsCache {
       try {
         const callGraphData = await fs.readFile(callGraphFile, 'utf8');
         const parsed = JSON.parse(callGraphData);
-        this.fileCallData = new Map(Object.entries(parsed));
+        const normalizedCallData = new Map();
+        if (parsed && typeof parsed === 'object') {
+          for (const [file, data] of Object.entries(parsed)) {
+            const key = fileKey(file);
+            if (!key) continue;
+            if (normalizedCallData.has(key)) {
+              normalizedCallGraphAliasCollapses += 1;
+            }
+            normalizedCallData.set(key, data);
+          }
+        }
+        this.fileCallData = normalizedCallData;
         if (this.config.verbose) {
           console.info(`[Cache] Loaded call-graph data for ${this.fileCallData.size} files`);
         }
       } catch {}
+      if (
+        this.config.verbose &&
+        (normalizedHashAliasCollapses > 0 || normalizedCallGraphAliasCollapses > 0)
+      ) {
+        console.info(
+          `[Cache] Normalized path-key aliases on load (file-hashes=${normalizedHashAliasCollapses}, call-graph=${normalizedCallGraphAliasCollapses})`
+        );
+      }
     } catch (error) {
       console.warn('[Cache] Failed to load cache:', error.message);
       this.clearInMemoryState();
@@ -943,8 +1002,9 @@ export class EmbeddingsCache {
       const hashEntries = {};
       for (const [file, entry] of this.fileHashes) {
         const serialized = serializeFileHashEntry(entry);
-        if (serialized) {
-          hashEntries[file] = serialized;
+        const key = fileKey(file);
+        if (serialized && key) {
+          hashEntries[key] = serialized;
         }
       }
@@ -955,9 +1015,15 @@ export class EmbeddingsCache {
       const callGraphFile = path.join(this.config.cacheDirectory, CALL_GRAPH_FILE);
       if (this.fileCallData.size > 0) {
+        const callGraphEntries = {};
+        for (const [file, data] of this.fileCallData) {
+          const key = fileKey(file);
+          if (!key) continue;
+          callGraphEntries[key] = data;
+        }
         await fs.writeFile(
           callGraphFile,
-          JSON.stringify(Object.fromEntries(this.fileCallData), null, 2)
+          JSON.stringify(callGraphEntries, null, 2)
         );
       } else {
         await fs.rm(callGraphFile, { force: true });
@@ -1071,7 +1137,9 @@ export class EmbeddingsCache {
   }
   getFileHash(file) {
-    const entry = this.fileHashes.get(file);
+    const key = fileKey(file);
+    if (!key) return undefined;
+    const entry = this.fileHashes.get(key);
     if (typeof entry === 'string') return entry;
     return entry?.hash;
   }
@@ -1095,23 +1163,31 @@ export class EmbeddingsCache {
     if (!iterator) return;
     for (const [file, entry] of iterator) {
       const normalized = normalizeFileHashEntry(entry);
-      if (normalized) {
-        this.fileHashes.set(file, normalized);
+      const key = fileKey(file);
+      if (normalized && key) {
+        const existing = this.fileHashes.get(key);
+        if (!existing || shouldPreferFileHashEntry(normalized, existing)) {
+          this.fileHashes.set(key, normalized);
+        }
       }
     }
   }
   setFileHash(file, hash, meta = null) {
+    const key = fileKey(file);
+    if (!key) return;
     const entry = { hash };
     if (meta && typeof meta === 'object') {
       if (Number.isFinite(meta.mtimeMs)) entry.mtimeMs = meta.mtimeMs;
       if (Number.isFinite(meta.size)) entry.size = meta.size;
     }
-    this.fileHashes.set(file, entry);
+    this.fileHashes.set(key, entry);
   }
   getFileMeta(file) {
-    const entry = this.fileHashes.get(file);
+    const key = fileKey(file);
+    if (!key) return null;
+    const entry = this.fileHashes.get(key);
     if (!entry) return null;
     if (typeof entry === 'string') return { hash: entry };
     return entry;
@@ -1194,16 +1270,20 @@ export class EmbeddingsCache {
   }
   deleteFileHash(file) {
-    this.fileHashes.delete(file);
+    const key = fileKey(file);
+    if (!key) return;
+    this.fileHashes.delete(key);
   }
   async removeFileFromStore(file) {
     if (!Array.isArray(this.vectorStore)) return;
+    const targetKey = fileKey(file);
+    if (!targetKey) return;
     let w = 0;
     for (let r = 0; r < this.vectorStore.length; r++) {
       const chunk = this.vectorStore[r];
-      if (chunk.file !== file) {
+      if (fileKey(chunk.file) !== targetKey) {
         chunk._index = w;
         this.vectorStore[w++] = chunk;
       }
@@ -1213,7 +1293,7 @@ export class EmbeddingsCache {
     this.invalidateAnnIndex();
     this.removeFileCallData(file);
-    this.fileHashes.delete(file);
+    this.deleteFileHash(file);
   }
   addToStore(chunk) {
@@ -1627,10 +1707,15 @@ export class EmbeddingsCache {
   pruneCallGraphData(validFiles) {
     if (!validFiles || this.fileCallData.size === 0) return 0;
+    const validKeys = new Set();
+    for (const file of validFiles) {
+      const key = fileKey(file);
+      if (key) validKeys.add(key);
+    }
     let pruned = 0;
     for (const file of Array.from(this.fileCallData.keys())) {
-      if (!validFiles.has(file)) {
+      if (!validKeys.has(fileKey(file))) {
         this.fileCallData.delete(file);
         pruned++;
       }
@@ -1641,11 +1726,15 @@ export class EmbeddingsCache {
   }
   getFileCallData(file) {
-    return this.fileCallData.get(file);
+    const key = fileKey(file);
+    if (!key) return undefined;
+    return this.fileCallData.get(key);
   }
   hasFileCallData(file) {
-    return this.fileCallData.has(file);
+    const key = fileKey(file);
+    if (!key) return false;
+    return this.fileCallData.has(key);
   }
   getFileCallDataKeys() {
@@ -1657,21 +1746,21 @@ export class EmbeddingsCache {
   }
   setFileCallData(file, data) {
-    this.fileCallData.set(file, data);
+    const key = fileKey(file);
+    if (!key) return;
+    this.fileCallData.set(key, data);
     this.callGraph = null;
   }
   setFileCallDataEntries(entries) {
-    if (entries instanceof Map) {
-      this.fileCallData = entries;
-    } else {
-      this.fileCallData.clear();
-      if (entries && typeof entries === 'object') {
-        for (const [file, data] of Object.entries(entries)) {
-          this.fileCallData.set(file, data);
-        }
-      }
-    }
+    const normalized = new Map();
+    const iterator = entries instanceof Map ? entries.entries() : Object.entries(entries || {});
+    for (const [file, data] of iterator) {
+      const key = fileKey(file);
+      if (!key) continue;
+      normalized.set(key, data);
+    }
+    this.fileCallData = normalized;
     this.callGraph = null;
   }
@@ -1681,7 +1770,9 @@ export class EmbeddingsCache {
   }
   removeFileCallData(file) {
-    this.fileCallData.delete(file);
+    const key = fileKey(file);
+    if (!key) return;
+    this.fileCallData.delete(key);
     this.callGraph = null;
   }
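
Note on the lib/cache.js hunks above: every read and write of `fileHashes` and `fileCallData` now goes through `fileKey(file)`, and entries whose keys collapse to the same normalized key are merged on load. The definitions of `fileKey` and `shouldPreferFileHashEntry` are not part of the hunks shown, so the sketch below uses hypothetical stand-ins for both (separator normalization and a newer-`mtimeMs`-wins rule are assumptions, not the package's confirmed behavior). It only illustrates the alias-collapse pattern, with `collapseAliases` as an illustrative helper name.

```javascript
// Hypothetical stand-in: the real fileKey is not shown in this diff.
// Assumption: it maps path spellings of the same file to one string key
// and returns a falsy value for unusable input.
function fileKey(file) {
  if (typeof file !== 'string' || file.length === 0) return '';
  return file.replace(/\\/g, '/');
}

// Hypothetical stand-in: the real preference rule is not shown in this diff.
// Assumption: the entry with the newer mtimeMs wins.
function shouldPreferFileHashEntry(candidate, existing) {
  const c = Number.isFinite(candidate?.mtimeMs) ? candidate.mtimeMs : -Infinity;
  const e = Number.isFinite(existing?.mtimeMs) ? existing.mtimeMs : -Infinity;
  return c > e;
}

// Mirrors the load-time loop in the diff: normalize each key, count
// collisions, and keep the preferred entry when two keys collapse.
function collapseAliases(rawEntries) {
  const fileHashes = new Map();
  let collapses = 0;
  for (const [file, entry] of Object.entries(rawEntries || {})) {
    const key = fileKey(file);
    if (!key) continue; // skip entries with no usable key
    const existing = fileHashes.get(key);
    if (existing) {
      collapses += 1;
      if (shouldPreferFileHashEntry(entry, existing)) {
        fileHashes.set(key, entry);
      }
    } else {
      fileHashes.set(key, entry);
    }
  }
  return { fileHashes, collapses };
}
```

Under these assumptions, two cached entries stored as `src\a.js` and `src/a.js` would end up as a single `src/a.js` entry, and the collapse counter would feed the verbose "Normalized path-key aliases on load" log line seen in the diff.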

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@softerist/heuristic-mcp",
-  "version": "3.2.11",
+  "version": "3.2.13",
   "description": "An enhanced MCP server providing intelligent semantic code search with find-similar-code, recency ranking, and improved chunking. Fork of smart-coding-mcp.",
   "type": "module",
   "main": "index.js",