npm - semantic-code-mcp - Versions diffs - 2.0.0 - Mend

semantic-code-mcp 2.0.0


Files changed (26)

package/LICENSE +22 -0
package/README.md +259 -0
package/config.json +85 -0
package/features/check-last-version.js +504 -0
package/features/clear-cache.js +75 -0
package/features/get-status.js +210 -0
package/features/hybrid-search.js +189 -0
package/features/index-codebase.js +999 -0
package/features/set-workspace.js +183 -0
package/index.js +297 -0
package/lib/ast-chunker.js +273 -0
package/lib/cache-factory.js +13 -0
package/lib/cache.js +157 -0
package/lib/config.js +1296 -0
package/lib/embedding-worker.js +155 -0
package/lib/gemini-embedder.js +351 -0
package/lib/ignore-patterns.js +896 -0
package/lib/milvus-cache.js +478 -0
package/lib/mrl-embedder.js +235 -0
package/lib/project-detector.js +75 -0
package/lib/resource-throttle.js +85 -0
package/lib/sqlite-cache.js +468 -0
package/lib/tokenizer.js +149 -0
package/lib/utils.js +214 -0
package/package.json +70 -0
package/reindex.js +109 -0

package/LICENSE ADDED

@@ -0,0 +1,22 @@
+MIT License
+Copyright (c) 2025 Omar Haris (original)
+Copyright (c) 2026 bitkyc08 (modifications)
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED

@@ -0,0 +1,259 @@
+# Semantic Code MCP
+[![npm version](https://img.shields.io/npm/v/semantic-code-mcp.svg)](https://www.npmjs.com/package/semantic-code-mcp)
+[![npm downloads](https://img.shields.io/npm/dm/semantic-code-mcp.svg)](https://www.npmjs.com/package/semantic-code-mcp)
+[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
+[![Node.js](https://img.shields.io/badge/Node.js-%3E%3D18-green.svg)](https://nodejs.org/)
+AI-powered semantic code search for coding agents. An MCP server that indexes your codebase with vector embeddings so AI assistants can find code by **meaning**, not just keywords.
+> Ask *"where do we handle authentication?"* and find code that uses `login`, `session`, `verifyCredentials` — even when no file contains the word "authentication."
+## Why
+Traditional `grep` and keyword search break down when you don't know the exact terms used in the codebase. Semantic search bridges that gap:
+- **Concept matching** — `"error handling"` finds `try/catch`, `onRejected`, `fallback` patterns
+- **Typo-tolerant** — `"embeding modle"` still finds embedding model code
+- **Context-aware chunking** — AST-based (Tree-sitter) or smart regex splitting preserves code structure
+- **Fast** — progressive indexing lets you search while the codebase is still being indexed
+Based on [Cursor's research](https://cursor.com/blog/semsearch) showing semantic search improves AI agent performance by 12.5%.
+## Quick Start
+```bash
+npm install -g semantic-code-mcp
+```
+Add to your MCP config:
+```json
+{
+  "mcpServers": {
+    "semantic-code-mcp": {
+      "command": "semantic-code-mcp",
+      "args": ["--workspace", "/path/to/your/project"]
+    }
+  }
+}
+```
+That's it. Your AI assistant now has semantic code search.
+## Features
+### Multi-Provider Embeddings
+| Provider | Model | Privacy | Speed |
+|----------|-------|---------|-------|
+| **Local** (default) | nomic-embed-text-v1.5 | 100% local | ~50ms/chunk |
+| **Gemini** | gemini-embedding-001 | API call | Fast, batched |
+| **OpenAI** | text-embedding-3-small | API call | Fast |
+| **OpenAI-compatible** | Any compatible endpoint | Varies | Varies |
+| **Vertex AI** | Google Cloud models | GCP | Fast |
+### Flexible Vector Storage
+- **SQLite** (default) — zero-config, single-file `.smart-coding-cache/embeddings.db`
+- **Milvus** — scalable ANN search for large codebases or shared team indexes
+### Smart Code Chunking
+Three modes to match your codebase:
+- **`smart`** (default) — regex-based, language-aware splitting
+- **`ast`** — Tree-sitter parsing for precise function/class boundaries
+- **`line`** — simple fixed-size line chunks
+### Resource Throttling
+CPU capped at 50% during indexing. Your machine stays responsive.
+## Tools
+| Tool | Description |
+|------|-------------|
+| `a_semantic_search` | Find code by meaning. Hybrid semantic + exact match scoring. |
+| `b_index_codebase` | Trigger manual reindex (normally automatic & incremental). |
+| `c_clear_cache` | Reset embeddings cache entirely. |
+| `d_check_last_version` | Look up latest package version from 20+ registries. |
+| `e_set_workspace` | Switch project at runtime without restart. |
+| `f_get_status` | Server health: version, index progress, config. |
+## IDE Setup
+| IDE / App | Guide | `${workspaceFolder}` |
+|-----------|-------|----------------------|
+| **VS Code** | [Setup](docs/ide-setup/vscode.md) | ✅ |
+| **Cursor** | [Setup](docs/ide-setup/cursor.md) | ✅ |
+| **Windsurf** | [Setup](docs/ide-setup/windsurf.md) | ❌ |
+| **Claude Desktop** | [Setup](docs/ide-setup/claude-desktop.md) | ❌ |
+| **OpenCode** | [Setup](docs/ide-setup/opencode.md) | ❌ |
+| **Raycast** | [Setup](docs/ide-setup/raycast.md) | ❌ |
+| **Antigravity** | [Setup](docs/ide-setup/antigravity.md) | ❌ |
+### Multi-Project
+```json
+{
+  "mcpServers": {
+    "code-frontend": {
+      "command": "semantic-code-mcp",
+      "args": ["--workspace", "/path/to/frontend"]
+    },
+    "code-backend": {
+      "command": "semantic-code-mcp",
+      "args": ["--workspace", "/path/to/backend"]
+    }
+  }
+}
+```
+## Configuration
+All settings via environment variables. Prefix: `SMART_CODING_`.
+### Core
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_VERBOSE` | `false` | Detailed logging |
+| `SMART_CODING_MAX_RESULTS` | `5` | Search results returned |
+| `SMART_CODING_BATCH_SIZE` | `100` | Files per parallel batch |
+| `SMART_CODING_MAX_FILE_SIZE` | `1048576` | Max file size (1MB) |
+| `SMART_CODING_CHUNK_SIZE` | `25` | Lines per chunk |
+| `SMART_CODING_CHUNKING_MODE` | `smart` | `smart` / `ast` / `line` |
+| `SMART_CODING_WATCH_FILES` | `false` | Auto-reindex on changes |
+| `SMART_CODING_AUTO_INDEX_DELAY` | `5000` | Background index delay (ms) |
+| `SMART_CODING_MAX_CPU_PERCENT` | `50` | CPU cap during indexing |
+### Embedding Provider
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_EMBEDDING_PROVIDER` | `local` | `local` / `gemini` / `openai` / `openai-compatible` / `vertex` |
+| `SMART_CODING_EMBEDDING_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Model name |
+| `SMART_CODING_EMBEDDING_DIMENSION` | `128` | MRL dimension (64–768) |
+| `SMART_CODING_DEVICE` | `auto` | `cpu` / `webgpu` / `auto` |
+### Gemini
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_GEMINI_API_KEY` | — | API key |
+| `SMART_CODING_GEMINI_MODEL` | `gemini-embedding-001` | Model |
+| `SMART_CODING_GEMINI_DIMENSIONS` | `768` | Output dimensions |
+| `SMART_CODING_GEMINI_BATCH_SIZE` | `24` | Micro-batch size |
+| `SMART_CODING_GEMINI_MAX_RETRIES` | `3` | Retry count |
+### OpenAI / Compatible
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_EMBEDDING_API_KEY` | — | API key |
+| `SMART_CODING_EMBEDDING_BASE_URL` | — | Base URL (compatible only) |
+### Vertex AI
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_VERTEX_PROJECT` | — | GCP project ID |
+| `SMART_CODING_VERTEX_LOCATION` | `us-central1` | Region |
+### Vector Store
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_VECTOR_STORE_PROVIDER` | `sqlite` | `sqlite` / `milvus` |
+| `SMART_CODING_MILVUS_ADDRESS` | — | Milvus endpoint |
+| `SMART_CODING_MILVUS_TOKEN` | — | Auth token |
+| `SMART_CODING_MILVUS_DATABASE` | `default` | Database name |
+| `SMART_CODING_MILVUS_COLLECTION` | `smart_coding_embeddings` | Collection |
+### Search Tuning
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SMART_CODING_SEMANTIC_WEIGHT` | `0.7` | Semantic vs exact weight |
+| `SMART_CODING_EXACT_MATCH_BOOST` | `1.5` | Exact match multiplier |
+### Example with Gemini + Milvus
+```json
+{
+  "mcpServers": {
+    "semantic-code-mcp": {
+      "command": "semantic-code-mcp",
+      "args": ["--workspace", "/path/to/project"],
+      "env": {
+        "SMART_CODING_EMBEDDING_PROVIDER": "gemini",
+        "SMART_CODING_GEMINI_API_KEY": "YOUR_KEY",
+        "SMART_CODING_VECTOR_STORE_PROVIDER": "milvus",
+        "SMART_CODING_MILVUS_ADDRESS": "http://localhost:19530"
+      }
+    }
+  }
+}
+```
+## Architecture
+```
+semantic-code-mcp/
+├── index.js              # MCP server entry point
+├── lib/
+│   ├── config.js         # Configuration loader
+│   ├── cache-factory.js  # SQLite / Milvus provider selection
+│   ├── cache.js          # SQLite vector store
+│   ├── milvus-cache.js   # Milvus vector store
+│   ├── mrl-embedder.js   # Local MRL embedder
+│   ├── gemini-embedder.js# Gemini API embedder
+│   ├── ast-chunker.js    # Tree-sitter AST chunking
+│   ├── tokenizer.js      # Token counting
+│   └── utils.js          # Cosine similarity, hashing, smart chunking
+├── features/
+│   ├── hybrid-search.js  # Semantic + exact match search
+│   ├── index-codebase.js # File discovery & incremental indexing
+│   ├── clear-cache.js    # Cache reset
+│   ├── check-last-version.js  # Package version lookup
+│   ├── set-workspace.js  # Runtime workspace switching
+│   └── get-status.js     # Server status
+└── test/                 # Vitest test suite
+```
+## How It Works
+```
+Your code files
+    ↓ glob + .gitignore-aware discovery
+Smart/AST chunking
+    ↓ language-aware splitting
+AI embedding (local or API)
+    ↓ vector generation
+SQLite or Milvus storage
+    ↓ incremental, hash-based updates
+Search query
+    ↓ embed query → cosine similarity → exact match boost
+Top N results with relevance scores
+```
+**Progressive indexing** — search works immediately while indexing continues in the background. Only changed files are re-indexed on subsequent runs.
+## Privacy
+- **Local mode**: everything runs on your machine. Code never leaves your system.
+- **API mode**: code chunks are sent to the embedding API for vectorization. No telemetry beyond provider API calls.
+## License
+MIT License
+Copyright (c) 2025 Omar Haris (original), bitkyc08 (modifications, 2026)
+See [LICENSE](LICENSE) for full text.
+---
+*Built on [smart-coding-mcp](https://github.com/omarHaris/smart-coding-mcp) by Omar Haris. Extended with multi-provider embeddings, Milvus ANN search, AST chunking, resource throttling, and comprehensive test suite.*
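
Note on the search pipeline: the README describes search as "embed query → cosine similarity → exact match boost", tuned by `SMART_CODING_SEMANTIC_WEIGHT` (`0.7`) and `SMART_CODING_EXACT_MATCH_BOOST` (`1.5`). The body of `features/hybrid-search.js` is not part of this excerpt, so the following is only a minimal sketch of one plausible way those two settings could combine. Every function name, the chunk shape (`content`, `vector`), and the scoring rule itself (weight the cosine score, then multiply by the boost on a case-insensitive substring hit) are assumptions for illustration, not the package's code.

```javascript
// Hypothetical sketch: none of these names come from the package source.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed scoring rule: weight the semantic score, then multiply by the
// boost when the chunk text contains the query verbatim (case-insensitive).
function hybridScore(queryText, queryVec, chunk, opts = {}) {
  const { semanticWeight = 0.7, exactMatchBoost = 1.5 } = opts;
  let score = semanticWeight * cosineSimilarity(queryVec, chunk.vector);
  if (chunk.content.toLowerCase().includes(queryText.toLowerCase())) {
    score *= exactMatchBoost;
  }
  return score;
}

// Score every chunk, sort descending, and keep the top `maxResults`.
function rankChunks(queryText, queryVec, chunks, maxResults = 5, opts = {}) {
  return chunks
    .map((chunk) => ({ ...chunk, score: hybridScore(queryText, queryVec, chunk, opts) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```

Under this assumed rule, a chunk that matches the query text literally can outrank a chunk with a somewhat higher cosine similarity, which is the general behavior the README's "exact match multiplier" wording suggests. The real implementation may add or blend the terms differently.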

package/config.json ADDED

@@ -0,0 +1,85 @@
+{
+  "searchDirectory": ".",
+  "fileExtensions": [
+    "js",
+    "ts",
+    "jsx",
+    "tsx",
+    "mjs",
+    "cjs",
+    "css",
+    "scss",
+    "sass",
+    "less",
+    "html",
+    "htm",
+    "xml",
+    "svg",
+    "py",
+    "pyw",
+    "java",
+    "kt",
+    "scala",
+    "c",
+    "cpp",
+    "h",
+    "hpp",
+    "cs",
+    "go",
+    "rs",
+    "rb",
+    "php",
+    "swift",
+    "sh",
+    "bash",
+    "json",
+    "yaml",
+    "yml",
+    "toml",
+    "sql"
+  ],
+  "excludePatterns": [
+    "**/node_modules/**",
+    "**/dist/**",
+    "**/build/**",
+    "**/.git/**",
+    "**/coverage/**",
+    "**/.next/**",
+    "**/target/**",
+    "**/vendor/**",
+    "**/.smart-coding-cache/**",
+    "**/*.rdb",
+    "**/.venv/**",
+    "**/venv/**",
+    "**/__pycache__/**",
+    "**/_legacy/**"
+  ],
+  "smartIndexing": true,
+  "chunkSize": 10,
+  "chunkOverlap": 3,
+  "batchSize": 100,
+  "maxFileSize": 1048576,
+  "maxResults": 3,
+  "enableCache": true,
+  "cacheDirectory": "./.smart-coding-cache",
+  "watchFiles": false,
+  "verbose": false,
+  "embeddingProvider": "local",
+  "embeddingModel": "nomic-ai/nomic-embed-text-v1.5",
+  "embeddingDimension": 128,
+  "device": "auto",
+  "geminiModel": "gemini-embedding-001",
+  "geminiBaseURL": "https://generativelanguage.googleapis.com/v1beta/openai",
+  "geminiDimensions": 768,
+  "geminiBatchSize": 24,
+  "geminiBatchFlushMs": 12,
+  "geminiMaxRetries": 3,
+  "geminiMaxConcurrentBatches": 50,
+  "chunkingMode": "smart",
+  "semanticWeight": 0.7,
+  "exactMatchBoost": 1.5,
+  "workerThreads": 50,
+  "maxCpuPercent": 50,
+  "batchDelay": 100,
+  "autoIndexDelay": 5000
+}
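
Note on chunk settings: `config.json` ships `chunkSize: 10` with `chunkOverlap: 3`, while the README's Core table lists `25` as the default for `SMART_CODING_CHUNK_SIZE` (and `5` versus `3` for max results). Which value wins is decided in `lib/config.js`, which is not shown in this excerpt. To make the two chunk settings concrete, here is a rough sketch of how a fixed-size line chunker with overlap typically behaves, roughly what the README calls `line` mode. The helper name `chunkLines` and its return shape are hypothetical and not taken from the package.

```javascript
// Hypothetical helper, not taken from the package: split text into chunks of
// `chunkSize` lines, where each chunk repeats the last `chunkOverlap` lines
// of the previous one.
function chunkLines(text, chunkSize = 10, chunkOverlap = 3) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const lines = text.split("\n");
  const step = chunkSize - chunkOverlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkSize, lines.length);
    chunks.push({
      startLine: start + 1, // 1-based, inclusive
      endLine: end,         // 1-based, inclusive
      content: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // last chunk reached the end of the file
  }
  return chunks;
}
```

With the shipped values, a 25-line file would yield four chunks under this sketch, covering lines 1 to 10, 8 to 17, 15 to 24, and 22 to 25. The overlap keeps a statement that straddles a chunk boundary visible in both neighbors, at the cost of embedding some lines twice.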