semantic-code-mcp 2.0.0

package/LICENSE ADDED
@@ -0,0 +1,22 @@
+ MIT License
+
+ Copyright (c) 2025 Omar Haris (original)
+ Copyright (c) 2026 bitkyc08 (modifications)
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,259 @@
+ # Semantic Code MCP
+
+ [![npm version](https://img.shields.io/npm/v/semantic-code-mcp.svg)](https://www.npmjs.com/package/semantic-code-mcp)
+ [![npm downloads](https://img.shields.io/npm/dm/semantic-code-mcp.svg)](https://www.npmjs.com/package/semantic-code-mcp)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
+ [![Node.js](https://img.shields.io/badge/Node.js-%3E%3D18-green.svg)](https://nodejs.org/)
+
+ AI-powered semantic code search for coding agents. An MCP server that indexes your codebase with vector embeddings so AI assistants can find code by **meaning**, not just keywords.
+
+ > Ask *"where do we handle authentication?"* and find code that uses `login`, `session`, `verifyCredentials` — even when no file contains the word "authentication."
+
+ ## Why
+
+ Traditional `grep` and keyword search break down when you don't know the exact terms used in the codebase. Semantic search bridges that gap:
+
+ - **Concept matching** — `"error handling"` finds `try/catch`, `onRejected`, `fallback` patterns
+ - **Typo-tolerant** — `"embeding modle"` still finds embedding model code
+ - **Context-aware chunking** — AST-based (Tree-sitter) or smart regex splitting preserves code structure
+ - **Fast** — progressive indexing lets you search while the codebase is still being indexed
+
+ Based on [Cursor's research](https://cursor.com/blog/semsearch) showing semantic search improves AI agent performance by 12.5%.
+
+ ## Quick Start
+
+ ```bash
+ npm install -g semantic-code-mcp
+ ```
+
+ Add to your MCP config:
+
+ ```json
+ {
+   "mcpServers": {
+     "semantic-code-mcp": {
+       "command": "semantic-code-mcp",
+       "args": ["--workspace", "/path/to/your/project"]
+     }
+   }
+ }
+ ```
+
+ That's it. Your AI assistant now has semantic code search.
+
+ ## Features
+
+ ### Multi-Provider Embeddings
+
+ | Provider | Model | Privacy | Speed |
+ |----------|-------|---------|-------|
+ | **Local** (default) | nomic-embed-text-v1.5 | 100% local | ~50ms/chunk |
+ | **Gemini** | gemini-embedding-001 | API call | Fast, batched |
+ | **OpenAI** | text-embedding-3-small | API call | Fast |
+ | **OpenAI-compatible** | Any compatible endpoint | Varies | Varies |
+ | **Vertex AI** | Google Cloud models | GCP | Fast |
+
+ ### Flexible Vector Storage
+
+ - **SQLite** (default) — zero-config, single-file `.smart-coding-cache/embeddings.db`
+ - **Milvus** — scalable ANN search for large codebases or shared team indexes
+
+ ### Smart Code Chunking
+
+ Three modes to match your codebase:
+
+ - **`smart`** (default) — regex-based, language-aware splitting
+ - **`ast`** — Tree-sitter parsing for precise function/class boundaries
+ - **`line`** — simple fixed-size line chunks
+
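As a rough illustration of the simplest of the three modes, the sketch below splits a file into fixed-size line chunks with a few lines of overlap between neighbours. The function name `chunkByLines` and the shape of the returned objects are hypothetical and not taken from the package; only the `chunkSize` and `chunkOverlap` parameter names mirror keys that appear in `config.json`. The `smart` and `ast` modes would pick boundaries differently.

```javascript
// Hypothetical helper (not the package's code): split source text into
// fixed-size line chunks. chunkSize and chunkOverlap are counted in lines.
function chunkByLines(text, chunkSize = 25, chunkOverlap = 3) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("require chunkSize > chunkOverlap >= 0");
  }
  const lines = text.split("\n");
  const step = chunkSize - chunkOverlap; // how far each new chunk advances
  const chunks = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkSize, lines.length);
    chunks.push({
      startLine: start + 1, // 1-based, inclusive
      endLine: end, // 1-based, inclusive
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // the last chunk reached the end of the file
  }
  return chunks;
}
```

The overlap means a function that straddles a chunk boundary still appears mostly intact in at least one chunk, at the cost of embedding some lines twice.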
+ ### Resource Throttling
+
+ CPU capped at 50% during indexing. Your machine stays responsive.
+
+ ## Tools
+
+ | Tool | Description |
+ |------|-------------|
+ | `a_semantic_search` | Find code by meaning. Hybrid semantic + exact match scoring. |
+ | `b_index_codebase` | Trigger manual reindex (normally automatic & incremental). |
+ | `c_clear_cache` | Reset embeddings cache entirely. |
+ | `d_check_last_version` | Look up latest package version from 20+ registries. |
+ | `e_set_workspace` | Switch project at runtime without restart. |
+ | `f_get_status` | Server health: version, index progress, config. |
+
+ ## IDE Setup
+
+ | IDE / App | Guide | `${workspaceFolder}` |
+ |-----------|-------|----------------------|
+ | **VS Code** | [Setup](docs/ide-setup/vscode.md) | ✅ |
+ | **Cursor** | [Setup](docs/ide-setup/cursor.md) | ✅ |
+ | **Windsurf** | [Setup](docs/ide-setup/windsurf.md) | ❌ |
+ | **Claude Desktop** | [Setup](docs/ide-setup/claude-desktop.md) | ❌ |
+ | **OpenCode** | [Setup](docs/ide-setup/opencode.md) | ❌ |
+ | **Raycast** | [Setup](docs/ide-setup/raycast.md) | ❌ |
+ | **Antigravity** | [Setup](docs/ide-setup/antigravity.md) | ❌ |
+
+ ### Multi-Project
+
+ ```json
+ {
+   "mcpServers": {
+     "code-frontend": {
+       "command": "semantic-code-mcp",
+       "args": ["--workspace", "/path/to/frontend"]
+     },
+     "code-backend": {
+       "command": "semantic-code-mcp",
+       "args": ["--workspace", "/path/to/backend"]
+     }
+   }
+ }
+ ```
+
+ ## Configuration
+
+ All settings via environment variables. Prefix: `SMART_CODING_`.
+
+ ### Core
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_VERBOSE` | `false` | Detailed logging |
+ | `SMART_CODING_MAX_RESULTS` | `5` | Search results returned |
+ | `SMART_CODING_BATCH_SIZE` | `100` | Files per parallel batch |
+ | `SMART_CODING_MAX_FILE_SIZE` | `1048576` | Max file size (1MB) |
+ | `SMART_CODING_CHUNK_SIZE` | `25` | Lines per chunk |
+ | `SMART_CODING_CHUNKING_MODE` | `smart` | `smart` / `ast` / `line` |
+ | `SMART_CODING_WATCH_FILES` | `false` | Auto-reindex on changes |
+ | `SMART_CODING_AUTO_INDEX_DELAY` | `5000` | Background index delay (ms) |
+ | `SMART_CODING_MAX_CPU_PERCENT` | `50` | CPU cap during indexing |
+
+ ### Embedding Provider
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_EMBEDDING_PROVIDER` | `local` | `local` / `gemini` / `openai` / `openai-compatible` / `vertex` |
+ | `SMART_CODING_EMBEDDING_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Model name |
+ | `SMART_CODING_EMBEDDING_DIMENSION` | `128` | MRL dimension (64–768) |
+ | `SMART_CODING_DEVICE` | `auto` | `cpu` / `webgpu` / `auto` |
+
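The "MRL dimension" setting refers to Matryoshka-style embeddings, where a shorter vector is obtained by keeping only the leading components of the full one. A minimal sketch of that idea follows, assuming the usual truncate-then-renormalize recipe; the helper name `truncateEmbedding` is hypothetical, and the package's `mrl-embedder.js` may apply additional steps that are not shown in this diff.

```javascript
// Hypothetical helper (an assumption, not the package's API): shrink a
// Matryoshka-style embedding to `dimension` components and rescale it to
// unit length so cosine similarity still behaves as expected.
function truncateEmbedding(vector, dimension = 128) {
  if (!Number.isInteger(dimension) || dimension < 1 || dimension > vector.length) {
    throw new RangeError(`dimension must be an integer in 1..${vector.length}`);
  }
  // Array.from lets the helper accept typed arrays such as Float32Array.
  const head = Array.from(vector).slice(0, dimension);
  const norm = Math.hypot(...head);
  // Leave an all-zero vector unchanged to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

Smaller dimensions reduce storage and speed up similarity search, usually at some cost in retrieval quality, which is presumably why the value is configurable.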
+ ### Gemini
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_GEMINI_API_KEY` | — | API key |
+ | `SMART_CODING_GEMINI_MODEL` | `gemini-embedding-001` | Model |
+ | `SMART_CODING_GEMINI_DIMENSIONS` | `768` | Output dimensions |
+ | `SMART_CODING_GEMINI_BATCH_SIZE` | `24` | Micro-batch size |
+ | `SMART_CODING_GEMINI_MAX_RETRIES` | `3` | Retry count |
+
+ ### OpenAI / Compatible
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_EMBEDDING_API_KEY` | — | API key |
+ | `SMART_CODING_EMBEDDING_BASE_URL` | — | Base URL (compatible only) |
+
+ ### Vertex AI
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_VERTEX_PROJECT` | — | GCP project ID |
+ | `SMART_CODING_VERTEX_LOCATION` | `us-central1` | Region |
+
+ ### Vector Store
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_VECTOR_STORE_PROVIDER` | `sqlite` | `sqlite` / `milvus` |
+ | `SMART_CODING_MILVUS_ADDRESS` | — | Milvus endpoint |
+ | `SMART_CODING_MILVUS_TOKEN` | — | Auth token |
+ | `SMART_CODING_MILVUS_DATABASE` | `default` | Database name |
+ | `SMART_CODING_MILVUS_COLLECTION` | `smart_coding_embeddings` | Collection |
+
+ ### Search Tuning
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `SMART_CODING_SEMANTIC_WEIGHT` | `0.7` | Semantic vs exact weight |
+ | `SMART_CODING_EXACT_MATCH_BOOST` | `1.5` | Exact match multiplier |
+
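To make the two tuning knobs concrete, here is one plausible way a semantic weight and an exact-match multiplier could combine into a single ranking score. This is a sketch under stated assumptions: the formula, the function names (`cosineSimilarity`, `hybridScore`, `rankChunks`) and the chunk shape `{ text, vector }` are all hypothetical, and the actual rule in `features/hybrid-search.js` is not shown in this diff and may differ.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical scoring rule (an assumption): weight the semantic similarity,
// then multiply by the boost when the chunk literally contains the query.
function hybridScore(query, queryVector, chunk, opts = {}) {
  const { semanticWeight = 0.7, exactMatchBoost = 1.5 } = opts;
  let score = semanticWeight * cosineSimilarity(queryVector, chunk.vector);
  const hasExactMatch =
    query.length > 0 && chunk.text.toLowerCase().includes(query.toLowerCase());
  if (hasExactMatch) score *= exactMatchBoost;
  return score;
}

// Score every chunk and keep the best `maxResults`, highest score first.
function rankChunks(query, queryVector, chunks, maxResults = 5, opts = {}) {
  return chunks
    .map((chunk) => ({ ...chunk, score: hybridScore(query, queryVector, chunk, opts) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```

Under this rule, raising `exactMatchBoost` favours chunks that contain the literal query text, while `semanticWeight` scales the embedding-based part of the score.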
+ ### Example with Gemini + Milvus
+
+ ```json
+ {
+   "mcpServers": {
+     "semantic-code-mcp": {
+       "command": "semantic-code-mcp",
+       "args": ["--workspace", "/path/to/project"],
+       "env": {
+         "SMART_CODING_EMBEDDING_PROVIDER": "gemini",
+         "SMART_CODING_GEMINI_API_KEY": "YOUR_KEY",
+         "SMART_CODING_VECTOR_STORE_PROVIDER": "milvus",
+         "SMART_CODING_MILVUS_ADDRESS": "http://localhost:19530"
+       }
+     }
+   }
+ }
+ ```
+
+ ## Architecture
+
+ ```
+ semantic-code-mcp/
+ ├── index.js                    # MCP server entry point
+ ├── lib/
+ │   ├── config.js               # Configuration loader
+ │   ├── cache-factory.js        # SQLite / Milvus provider selection
+ │   ├── cache.js                # SQLite vector store
+ │   ├── milvus-cache.js         # Milvus vector store
+ │   ├── mrl-embedder.js         # Local MRL embedder
+ │   ├── gemini-embedder.js      # Gemini API embedder
+ │   ├── ast-chunker.js          # Tree-sitter AST chunking
+ │   ├── tokenizer.js            # Token counting
+ │   └── utils.js                # Cosine similarity, hashing, smart chunking
+ ├── features/
+ │   ├── hybrid-search.js        # Semantic + exact match search
+ │   ├── index-codebase.js       # File discovery & incremental indexing
+ │   ├── clear-cache.js          # Cache reset
+ │   ├── check-last-version.js   # Package version lookup
+ │   ├── set-workspace.js        # Runtime workspace switching
+ │   └── get-status.js           # Server status
+ └── test/                       # Vitest test suite
+ ```
+
+ ## How It Works
+
+ ```
+ Your code files
+     ↓ glob + .gitignore-aware discovery
+ Smart/AST chunking
+     ↓ language-aware splitting
+ AI embedding (local or API)
+     ↓ vector generation
+ SQLite or Milvus storage
+     ↓ incremental, hash-based updates
+
+ Search query
+     ↓ embed query → cosine similarity → exact match boost
+ Top N results with relevance scores
+ ```
+
+ **Progressive indexing** — search works immediately while indexing continues in the background. Only changed files are re-indexed on subsequent runs.
+
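The "incremental, hash-based updates" step can be pictured as comparing a content hash per file against the hash stored on the previous run. The sketch below assumes exactly that and nothing more: `fnv1a` is a small non-cryptographic hash used here only to keep the example dependency-free (the package's own hashing in `lib/utils.js` is not shown in this diff), and `planIncrementalIndex` is a hypothetical name.

```javascript
// Stand-in content hash (an assumption): 32-bit FNV-1a over UTF-16 code units.
// A real indexer might prefer a cryptographic digest instead.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Decide what needs work on this run.
// `files` maps path -> current text; `storedHashes` maps path -> hash from the last run.
function planIncrementalIndex(files, storedHashes) {
  const toIndex = []; // new or modified files: re-chunk and re-embed
  const toRemove = []; // files that disappeared: drop their vectors
  for (const [path, text] of files) {
    if (storedHashes.get(path) !== fnv1a(text)) toIndex.push(path);
  }
  for (const path of storedHashes.keys()) {
    if (!files.has(path)) toRemove.push(path);
  }
  return { toIndex, toRemove };
}
```

Unchanged files are skipped entirely, which is what keeps subsequent runs cheap: embedding is typically the most expensive step, so avoiding it for untouched files matters more than the cost of hashing.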
+ ## Privacy
+
+ - **Local mode**: everything runs on your machine. Code never leaves your system.
+ - **API mode**: code chunks are sent to the embedding API for vectorization. No telemetry beyond provider API calls.
+
+ ## License
+
+ MIT License
+
+ Copyright (c) 2025 Omar Haris (original), bitkyc08 (modifications, 2026)
+
+ See [LICENSE](LICENSE) for full text.
+
+ ---
+
+ *Built on [smart-coding-mcp](https://github.com/omarHaris/smart-coding-mcp) by Omar Haris. Extended with multi-provider embeddings, Milvus ANN search, AST chunking, resource throttling, and comprehensive test suite.*
package/config.json ADDED
@@ -0,0 +1,85 @@
+ {
+   "searchDirectory": ".",
+   "fileExtensions": [
+     "js",
+     "ts",
+     "jsx",
+     "tsx",
+     "mjs",
+     "cjs",
+     "css",
+     "scss",
+     "sass",
+     "less",
+     "html",
+     "htm",
+     "xml",
+     "svg",
+     "py",
+     "pyw",
+     "java",
+     "kt",
+     "scala",
+     "c",
+     "cpp",
+     "h",
+     "hpp",
+     "cs",
+     "go",
+     "rs",
+     "rb",
+     "php",
+     "swift",
+     "sh",
+     "bash",
+     "json",
+     "yaml",
+     "yml",
+     "toml",
+     "sql"
+   ],
+   "excludePatterns": [
+     "**/node_modules/**",
+     "**/dist/**",
+     "**/build/**",
+     "**/.git/**",
+     "**/coverage/**",
+     "**/.next/**",
+     "**/target/**",
+     "**/vendor/**",
+     "**/.smart-coding-cache/**",
+     "**/*.rdb",
+     "**/.venv/**",
+     "**/venv/**",
+     "**/__pycache__/**",
+     "**/_legacy/**"
+   ],
+   "smartIndexing": true,
+   "chunkSize": 10,
+   "chunkOverlap": 3,
+   "batchSize": 100,
+   "maxFileSize": 1048576,
+   "maxResults": 3,
+   "enableCache": true,
+   "cacheDirectory": "./.smart-coding-cache",
+   "watchFiles": false,
+   "verbose": false,
+   "embeddingProvider": "local",
+   "embeddingModel": "nomic-ai/nomic-embed-text-v1.5",
+   "embeddingDimension": 128,
+   "device": "auto",
+   "geminiModel": "gemini-embedding-001",
+   "geminiBaseURL": "https://generativelanguage.googleapis.com/v1beta/openai",
+   "geminiDimensions": 768,
+   "geminiBatchSize": 24,
+   "geminiBatchFlushMs": 12,
+   "geminiMaxRetries": 3,
+   "geminiMaxConcurrentBatches": 50,
+   "chunkingMode": "smart",
+   "semanticWeight": 0.7,
+   "exactMatchBoost": 1.5,
+   "workerThreads": 50,
+   "maxCpuPercent": 50,
+   "batchDelay": 100,
+   "autoIndexDelay": 5000
+ }
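
The README documents `SMART_CODING_*` environment variables while this file uses camelCase keys, so some layer has to merge the two. The sketch below shows one way such a merge could look. It is an assumption throughout: the mapping table `ENV_OVERRIDES`, the function `applyEnvOverrides` and the parsing rules are hypothetical, only a handful of variables are covered, and the real logic in `lib/config.js` is not part of this excerpt.

```javascript
// Parsers for the raw string values found in the environment.
const toBoolean = (raw) => raw.toLowerCase() === "true";
const toString = (raw) => raw;

// Hypothetical mapping: variable suffix -> [config key, parser].
const ENV_OVERRIDES = {
  VERBOSE: ["verbose", toBoolean],
  MAX_RESULTS: ["maxResults", Number],
  CHUNK_SIZE: ["chunkSize", Number],
  CHUNKING_MODE: ["chunkingMode", toString],
  WATCH_FILES: ["watchFiles", toBoolean],
  SEMANTIC_WEIGHT: ["semanticWeight", Number],
  EXACT_MATCH_BOOST: ["exactMatchBoost", Number],
};

// Return a copy of `config` with any recognised, parsable overrides applied.
function applyEnvOverrides(config, env, prefix = "SMART_CODING_") {
  const merged = { ...config };
  for (const [suffix, [key, parse]] of Object.entries(ENV_OVERRIDES)) {
    const raw = env[prefix + suffix];
    if (raw === undefined || raw === "") continue; // variable not set
    const value = parse(raw);
    if (typeof value === "number" && Number.isNaN(value)) continue; // ignore unparsable numbers
    merged[key] = value;
  }
  return merged;
}
```

In Node this could be called with the parsed contents of `config.json` and `process.env`; values that are absent or fail to parse fall back to the file defaults.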