codeseeker 1.11.2 → 2.0.0
- package/README.md +170 -181
- package/dist/cli/commands/services/semantic-search-orchestrator.d.ts +36 -4
- package/dist/cli/commands/services/semantic-search-orchestrator.d.ts.map +1 -1
- package/dist/cli/commands/services/semantic-search-orchestrator.js +238 -40
- package/dist/cli/commands/services/semantic-search-orchestrator.js.map +1 -1
- package/dist/cli/services/analysis/deduplication/duplicate-code-detector.js +1 -1
- package/dist/cli/services/analysis/deduplication/duplicate-code-detector.js.map +1 -1
- package/dist/cli/services/monitoring/file-scanning/file-scanner-config.json +126 -0
- package/dist/cli/services/search/ast-chunker.d.ts +37 -0
- package/dist/cli/services/search/ast-chunker.d.ts.map +1 -0
- package/dist/cli/services/search/ast-chunker.js +171 -0
- package/dist/cli/services/search/ast-chunker.js.map +1 -0
- package/dist/mcp/indexing-service.d.ts +0 -4
- package/dist/mcp/indexing-service.d.ts.map +1 -1
- package/dist/mcp/indexing-service.js +8 -25
- package/dist/mcp/indexing-service.js.map +1 -1
- package/dist/mcp/mcp-server.d.ts +23 -9
- package/dist/mcp/mcp-server.d.ts.map +1 -1
- package/dist/mcp/mcp-server.js +370 -328
- package/dist/mcp/mcp-server.js.map +1 -1
- package/dist/storage/embedded/minisearch-text-store.d.ts.map +1 -1
- package/dist/storage/embedded/minisearch-text-store.js +3 -2
- package/dist/storage/embedded/minisearch-text-store.js.map +1 -1
- package/dist/storage/embedded/sqlite-vector-store.d.ts.map +1 -1
- package/dist/storage/embedded/sqlite-vector-store.js +7 -1
- package/dist/storage/embedded/sqlite-vector-store.js.map +1 -1
- package/dist/storage/storage-manager.d.ts +2 -1
- package/dist/storage/storage-manager.d.ts.map +1 -1
- package/dist/storage/storage-manager.js +8 -2
- package/dist/storage/storage-manager.js.map +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -1,83 +1,129 @@
|
|
|
1
1
|
# CodeSeeker
|
|
2
2
|
|
|
3
|
-
**
|
|
3
|
+
**Four-layer hybrid search and knowledge graph for AI coding assistants.**
|
|
4
|
+
BM25 + vector embeddings + RAPTOR directory summaries + graph expansion — fused into a single MCP tool that gives Claude, Copilot, and Cursor a real understanding of your codebase.
|
|
4
5
|
|
|
5
6
|
[](https://www.npmjs.com/package/codeseeker)
|
|
6
7
|
[](LICENSE)
|
|
7
8
|
[](https://www.typescriptlang.org/)
|
|
8
9
|
|
|
9
|
-
|
|
10
|
+
Works with **Claude Code**, **GitHub Copilot** (VS Code 1.99+), **Cursor**, **Windsurf**, and **Claude Desktop**.
|
|
11
|
+
Zero configuration — indexes on first use, stays in sync automatically.
|
|
10
12
|
|
|
11
|
-
|
|
13
|
+
## The Problem
|
|
12
14
|
|
|
13
|
-
|
|
15
|
+
AI assistants are powerful editors, but they navigate code like a tourist:
|
|
16
|
+
- **Grep finds text** — not meaning. `"find authentication logic"` returns every file containing the word "auth"
|
|
17
|
+
- **File reads are isolated** — Claude sees a file but not its dependencies, callers, or the patterns your team established
|
|
18
|
+
- **No memory of your project** — every session starts from scratch
|
|
14
19
|
|
|
15
|
-
|
|
20
|
+
CodeSeeker fixes this. It indexes your codebase once and gives AI assistants a queryable knowledge graph they can use on every turn.
|
|
16
21
|
|
|
17
|
-
|
|
22
|
+
## How It Works
|
|
18
23
|
|
|
19
|
-
|
|
24
|
+
A 4-stage pipeline runs on every query:
|
|
20
25
|
|
|
21
|
-
**macOS/Linux:**
|
|
22
|
-
```bash
|
|
23
|
-
curl -fsSL https://raw.githubusercontent.com/jghiringhelli/codeseeker/master/scripts/install.sh | sh
|
|
24
26
|
```
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
27
|
+
Query: "find JWT refresh token logic"
|
|
28
|
+
│
|
|
29
|
+
▼ Stage 1 — Hybrid retrieval
|
|
30
|
+
┌─────────────────────────────────────────────────────┐
|
|
31
|
+
│ BM25 (exact symbols, camelCase tokenized) │
|
|
32
|
+
│ + │
|
|
33
|
+
│ Vector search (384-dim Xenova embeddings) │
|
|
34
|
+
│ ↓ │
|
|
35
|
+
│ Reciprocal Rank Fusion: score = Σ 1/(60 + rank_i) │
|
|
36
|
+
│ Top-30 results, including RAPTOR directory nodes │
|
|
37
|
+
└─────────────────────────────────────────────────────┘
|
|
38
|
+
│
|
|
39
|
+
▼ Stage 2 — RAPTOR cascade (conditional)
|
|
40
|
+
┌─────────────────────────────────────────────────────┐
|
|
41
|
+
│ IF best directory-summary score ≥ 0.5: │
|
|
42
|
+
│ → narrow results to that directory automatically │
|
|
43
|
+
│ ELSE: all 30 results pass through unchanged │
|
|
44
|
+
│ Effect: "what does auth/ do?" scopes to auth/ │
|
|
45
|
+
│ "jwt.ts decode function" bypasses this │
|
|
46
|
+
└─────────────────────────────────────────────────────┘
|
|
47
|
+
│
|
|
48
|
+
▼ Stage 3 — Scoring and deduplication
|
|
49
|
+
┌─────────────────────────────────────────────────────┐
|
|
50
|
+
│ Dedup: keep highest-score chunk per file │
|
|
51
|
+
│ Source files: +0.10 (definition sites matter) │
|
|
52
|
+
│ Test files: −0.15 (prevent test dominance) │
|
|
53
|
+
│ Symbol boost: +0.20 (query token in filename) │
|
|
54
|
+
│ Multi-chunk: up to +0.30 (file has many hits) │
|
|
55
|
+
└─────────────────────────────────────────────────────┘
|
|
56
|
+
│
|
|
57
|
+
▼ Stage 4 — Graph expansion
|
|
58
|
+
┌─────────────────────────────────────────────────────┐
|
|
59
|
+
│ Top-10 results → follow IMPORTS/CALLS/EXTENDS edges │
|
|
60
|
+
│ Structural neighbors scored at source × 0.7 │
|
|
61
|
+
│ Avg graph connectivity: 20.8 edges/node │
|
|
62
|
+
└─────────────────────────────────────────────────────┘
|
|
63
|
+
│
|
|
64
|
+
▼
|
|
65
|
+
auth/jwt.ts (0.94), auth/refresh.ts (0.89), ...
|
|
29
66
|
```
|
|
30
67
|
|
|
31
|
-
|
|
68
|
+
The knowledge graph is built from AST-parsed imports at index time. It's what powers `analyze dependencies`, dead-code detection, and graph expansion in every search.
|
|
32
69
|
|
|
33
|
-
|
|
70
|
+
## What Makes It Different
|
|
34
71
|
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
72
|
+
| Approach | Strengths | Limitations |
|
|
73
|
+
|----------|-----------|-------------|
|
|
74
|
+
| **Grep / ripgrep** | Fast, universal | No semantic understanding |
|
|
75
|
+
| **Vector search only** | Finds similar code | Misses structural relationships |
|
|
76
|
+
| **Serena** | Precise LSP symbol navigation, 30+ languages | No semantic search, no cross-file reasoning |
|
|
77
|
+
| **Codanna** | Fast symbol lookup, good call graphs | Semantic search needs JSDoc — undocumented code gets no embeddings; no BM25, no RAPTOR, Windows experimental |
|
|
78
|
+
| **CodeSeeker** | BM25 + embedding fusion + RAPTOR + graph + coding standards + multi-language AST | Requires initial indexing (30s–5min) |
|
|
41
79
|
|
|
42
|
-
**
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
```
|
|
80
|
+
**What LSP tools can't do:**
|
|
81
|
+
- *"Find code that handles errors like this"* → semantic pattern search
|
|
82
|
+
- *"What validation approach does this project use?"* → auto-detected coding standards
|
|
83
|
+
- *"Show me everything related to authentication"* → graph traversal across indirect dependencies
|
|
47
84
|
|
|
48
|
-
**
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
```
|
|
85
|
+
**What vector-only search misses:**
|
|
86
|
+
- Direct import/export chains
|
|
87
|
+
- Class inheritance hierarchies
|
|
88
|
+
- Which files actually depend on which
|
|
53
89
|
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
90
|
+
## Installation
|
|
91
|
+
|
|
92
|
+
### Recommended: npx (no install needed)
|
|
93
|
+
|
|
94
|
+
The standard way to configure any MCP server — no global install required:
|
|
95
|
+
|
|
96
|
+
```json
|
|
97
|
+
{
|
|
98
|
+
"mcpServers": {
|
|
99
|
+
"codeseeker": {
|
|
100
|
+
"command": "npx",
|
|
101
|
+
"args": ["-y", "codeseeker", "serve", "--mcp"]
|
|
102
|
+
}
|
|
103
|
+
}
|
|
104
|
+
}
|
|
58
105
|
```
|
|
59
106
|
|
|
60
|
-
|
|
107
|
+
Add this to your MCP config file ([see below](#advanced-installation-options) for per-client locations) and restart your editor.
|
|
108
|
+
|
|
109
|
+
### npm global install
|
|
61
110
|
|
|
62
|
-
Run without installing:
|
|
63
111
|
```bash
|
|
64
|
-
|
|
65
|
-
|
|
112
|
+
npm install -g codeseeker
|
|
113
|
+
codeseeker install --vscode # or --cursor, --windsurf
|
|
66
114
|
```
|
|
67
115
|
|
|
68
116
|
### 🔌 Claude Code Plugin
|
|
69
117
|
|
|
70
|
-
|
|
118
|
+
For Claude Code CLI users — adds auto-sync hooks and slash commands:
|
|
71
119
|
|
|
72
120
|
```bash
|
|
73
121
|
/plugin install codeseeker@github:jghiringhelli/codeseeker#plugin
|
|
74
122
|
```
|
|
75
123
|
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
### ☁️ Devcontainer / GitHub Codespaces
|
|
124
|
+
Slash commands: `/codeseeker:init`, `/codeseeker:reindex`
|
|
79
125
|
|
|
80
|
-
|
|
126
|
+
### ☁️ Devcontainers / GitHub Codespaces
|
|
81
127
|
|
|
82
128
|
```json
|
|
83
129
|
{
|
|
@@ -87,121 +133,37 @@ CodeSeeker auto-installs in devcontainers! Just add `.devcontainer/devcontainer.
|
|
|
87
133
|
}
|
|
88
134
|
```
|
|
89
135
|
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
### ✅ Verify Installation
|
|
136
|
+
### ✅ Verify
|
|
93
137
|
|
|
94
138
|
Ask your AI assistant: *"What CodeSeeker tools do you have?"*
|
|
95
139
|
|
|
96
|
-
You should see: `search`, `analyze`, `index` — CodeSeeker's three
|
|
97
|
-
|
|
98
|
-
---
|
|
99
|
-
|
|
100
|
-
## The Problem
|
|
101
|
-
|
|
102
|
-
Claude Code is powerful, but it navigates your codebase like a tourist with a phrasebook:
|
|
103
|
-
- **Grep searches** find text matches, not semantic meaning
|
|
104
|
-
- **File reads** show code in isolation, missing the bigger picture
|
|
105
|
-
- **No memory** of your project's patterns—every session starts fresh
|
|
106
|
-
|
|
107
|
-
The result? Claude asks you to explain code relationships it should already know. It writes validation logic that doesn't match your existing patterns. It misses dependencies and breaks things.
|
|
108
|
-
|
|
109
|
-
## How CodeSeeker Fixes This
|
|
110
|
-
|
|
111
|
-
CodeSeeker builds a **knowledge graph** of your codebase:
|
|
112
|
-
|
|
113
|
-
```
|
|
114
|
-
┌─────────────┐ imports ┌─────────────┐
|
|
115
|
-
│ auth.ts │ ───────────────▶ │ user.ts │
|
|
116
|
-
└─────────────┘ └─────────────┘
|
|
117
|
-
│ │
|
|
118
|
-
│ calls │ extends
|
|
119
|
-
▼ ▼
|
|
120
|
-
┌─────────────┐ implements ┌─────────────┐
|
|
121
|
-
│ session.ts │ ◀─────────────── │ BaseUser.ts │
|
|
122
|
-
└─────────────┘ └─────────────┘
|
|
123
|
-
```
|
|
124
|
-
|
|
125
|
-
When you ask "add password reset to authentication", Claude doesn't just find files containing "auth"—it traverses the graph to find:
|
|
126
|
-
- What `auth.ts` imports and exports
|
|
127
|
-
- Which services call authentication functions
|
|
128
|
-
- What patterns exist in related code
|
|
129
|
-
- How your project handles similar flows
|
|
130
|
-
|
|
131
|
-
This is **Graph RAG** (Retrieval-Augmented Generation), not just vector search.
|
|
140
|
+
You should see: `search`, `analyze`, `index` — CodeSeeker's three tools.
|
|
132
141
|
|
|
133
142
|
## Advanced Installation Options
|
|
134
143
|
|
|
135
144
|
<details>
|
|
136
|
-
<summary><b>📋
|
|
137
|
-
|
|
138
|
-
### VS Code (Claude Code & GitHub Copilot)
|
|
145
|
+
<summary><b>📋 MCP Configuration by client</b></summary>
|
|
139
146
|
|
|
140
|
-
|
|
147
|
+
The MCP config JSON is the same for all clients — only the file location differs:
|
|
141
148
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
"env": {
|
|
149
|
-
"CODESEEKER_STORAGE_MODE": "embedded"
|
|
150
|
-
}
|
|
151
|
-
}
|
|
152
|
-
}
|
|
153
|
-
}
|
|
154
|
-
```
|
|
155
|
-
|
|
156
|
-
### Cursor
|
|
157
|
-
|
|
158
|
-
Add to `.cursor/mcp.json` in your project:
|
|
159
|
-
|
|
160
|
-
```json
|
|
161
|
-
{
|
|
162
|
-
"mcpServers": {
|
|
163
|
-
"codeseeker": {
|
|
164
|
-
"command": "npx",
|
|
165
|
-
"args": ["-y", "codeseeker", "serve", "--mcp"],
|
|
166
|
-
"env": {
|
|
167
|
-
"CODESEEKER_STORAGE_MODE": "embedded"
|
|
168
|
-
}
|
|
169
|
-
}
|
|
170
|
-
}
|
|
171
|
-
}
|
|
172
|
-
```
|
|
173
|
-
|
|
174
|
-
### Claude Desktop
|
|
175
|
-
|
|
176
|
-
Add to your `claude_desktop_config.json`:
|
|
177
|
-
|
|
178
|
-
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
|
|
179
|
-
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
|
|
149
|
+
| Client | Config file |
|
|
150
|
+
|--------|------------|
|
|
151
|
+
| **VS Code** (Claude Code / Copilot) | `.vscode/mcp.json` in your project, or `~/.vscode/mcp.json` globally |
|
|
152
|
+
| **Cursor** | `.cursor/mcp.json` in your project |
|
|
153
|
+
| **Claude Desktop** | `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows) |
|
|
154
|
+
| **Windsurf** | `.windsurf/mcp.json` in your project |
|
|
180
155
|
|
|
181
156
|
```json
|
|
182
157
|
{
|
|
183
158
|
"mcpServers": {
|
|
184
159
|
"codeseeker": {
|
|
185
160
|
"command": "npx",
|
|
186
|
-
"args": ["-y", "codeseeker", "serve", "--mcp"]
|
|
187
|
-
"env": {
|
|
188
|
-
"CODESEEKER_STORAGE_MODE": "embedded"
|
|
189
|
-
}
|
|
161
|
+
"args": ["-y", "codeseeker", "serve", "--mcp"]
|
|
190
162
|
}
|
|
191
163
|
}
|
|
192
164
|
}
|
|
193
165
|
```
|
|
194
166
|
|
|
195
|
-
### Global vs Project-Level Configuration
|
|
196
|
-
|
|
197
|
-
```bash
|
|
198
|
-
# Apply to all projects (user-level)
|
|
199
|
-
codeseeker install --vscode --global
|
|
200
|
-
|
|
201
|
-
# Apply to current project only
|
|
202
|
-
codeseeker install --vscode
|
|
203
|
-
```
|
|
204
|
-
|
|
205
167
|
</details>
|
|
206
168
|
|
|
207
169
|
<details>
|
|
@@ -262,38 +224,70 @@ User: "Find the authentication logic"
|
|
|
262
224
|
|
|
263
225
|
First search on a new project takes 30 seconds to several minutes (depending on size). Subsequent searches are instant.
|
|
264
226
|
|
|
265
|
-
|
|
227
|
+
---
|
|
266
228
|
|
|
267
|
-
|
|
268
|
-
|----------|--------------|-----------|-------------|
|
|
269
|
-
| **Grep/ripgrep** | Text pattern matching | Fast, universal | No semantic understanding |
|
|
270
|
-
| **Vector search only** | Embedding similarity | Finds similar code | Misses structural relationships |
|
|
271
|
-
| **LSP-based tools** | Language server protocol | Precise symbol definitions | No semantic search, no cross-file reasoning |
|
|
272
|
-
| **CodeSeeker** | Knowledge graph + hierarchical hybrid search | Semantic + structure + directory context + patterns | Requires initial indexing (30s-5min) |
|
|
229
|
+
## Search Quality Research
|
|
273
230
|
|
|
274
|
-
|
|
231
|
+
<details>
|
|
232
|
+
<summary><b>📊 Component ablation study (v2.0.0)</b> — measured impact of each retrieval layer</summary>
|
|
275
233
|
|
|
276
|
-
|
|
277
|
-
CodeSeeker generates *directory summary nodes* by mean-pooling the embeddings of all files in each folder, plus a *project root node* for the whole codebase. These live in the same index as regular file chunks:
|
|
278
|
-
- *Concrete queries* ("find JWT refresh logic") surface precise file chunks as usual
|
|
279
|
-
- *Abstract queries* ("what does the auth package do?") naturally score higher against directory summaries → instant package-level answers without enumerating 20 files
|
|
280
|
-
- *On sync*, a structural hash + cosine drift check skips regeneration for most edits — no extra cost for routine code changes
|
|
234
|
+
### Setup
|
|
281
235
|
|
|
282
|
-
|
|
283
|
-
- *"Find code that handles errors like this"* → Semantic search finds similar patterns
|
|
284
|
-
- *"What validation approach does this project use?"* → Auto-detected coding standards
|
|
285
|
-
- *"Show me everything related to authentication"* → Graph traversal across indirect dependencies
|
|
236
|
+
18 hand-labelled queries across two real-world codebases:
|
|
286
237
|
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
-
|
|
290
|
-
|
|
291
|
-
|
|
238
|
+
| Corpus | Language | Files | Queries | Query types |
|
|
239
|
+
|--------|----------|-------|---------|-------------|
|
|
240
|
+
| [Conclave](https://github.com/jghiringhelli/conclave) | TypeScript (pnpm monorepo) | 201 | 10 | Symbol lookup, cross-file chains, out-of-scope |
|
|
241
|
+
| [ImperialCommander2](https://github.com/jonwill8/ImperialCommander2) | C# / Unity | 199 | 8 | Class lookup, controller wiring, file I/O |
|
|
242
|
+
|
|
243
|
+
Each query has one or more `mustFind` targets (exact file basenames) and optional `mustNotFind` targets (scope leak check). Queries were run on a real index built from source — real Xenova embeddings, real graph, real RAPTOR L2 nodes — to reflect production conditions.
|
|
244
|
+
|
|
245
|
+
Metrics: **MRR** (Mean Reciprocal Rank), **P@1** and **P@3** (Precision at 1 and 3), **R@5** (Recall at 5), **F1@3**.
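To make these metrics concrete, here is a minimal sketch using their standard textbook definitions. It is an assumption that the benchmark script computes them the same way; the `LabelledQuery` shape, the function names, and the sample data are hypothetical and not taken from `scripts/real-bench.js`.

```typescript
// Hypothetical shape: one labelled query plus the ranked basenames a search returned.
interface LabelledQuery {
  id: string;
  mustFind: string[]; // targets that should be retrieved
  ranked: string[];   // search output, best first
}

// Reciprocal rank of the first relevant hit; 0 when no target was returned.
function reciprocalRank(q: LabelledQuery): number {
  const relevant = new Set(q.mustFind);
  const idx = q.ranked.findIndex((f) => relevant.has(f));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// Fraction of the top-k slots occupied by relevant files.
function precisionAtK(q: LabelledQuery, k: number): number {
  const relevant = new Set(q.mustFind);
  return q.ranked.slice(0, k).filter((f) => relevant.has(f)).length / k;
}

// Fraction of the targets that appear somewhere in the top k.
function recallAtK(q: LabelledQuery, k: number): number {
  if (q.mustFind.length === 0) return 0;
  const top = new Set(q.ranked.slice(0, k));
  return q.mustFind.filter((f) => top.has(f)).length / q.mustFind.length;
}

// Harmonic mean of precision and recall at the same cutoff.
function f1AtK(q: LabelledQuery, k: number): number {
  const p = precisionAtK(q, k);
  const r = recallAtK(q, k);
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

// Made-up data, not taken from the benchmark.
const q1: LabelledQuery = { id: "q1", mustFind: ["jwt.ts"], ranked: ["jwt.ts", "refresh.ts", "session.ts"] };
const q2: LabelledQuery = { id: "q2", mustFind: ["types.ts"], ranked: ["engine.ts", "types.ts", "index.ts"] };
const queries = [q1, q2];

const mrr = mean(queries.map(reciprocalRank));
const p1 = mean(queries.map((q) => precisionAtK(q, 1)));
console.log({ mrr, p1 });
```

Under these definitions, a query with a single `mustFind` target can never score above 1/3 on P@3, which is worth keeping in mind when comparing P@3 and F1@3 against R@5.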
|
|
246
|
+
|
|
247
|
+
### Ablation results
|
|
248
|
+
|
|
249
|
+
| Configuration | MRR | P@1 | P@3 | R@5 | F1@3 | Notes |
|
|
250
|
+
|--------------|-----|-----|-----|-----|------|-------|
|
|
251
|
+
| **Hybrid baseline** (BM25 + embed + RAPTOR, no graph) | **75.2%** | 61.1% | 29.6% | 91.7% | 44.4% | Production default |
|
|
252
|
+
| + graph 1-hop | 74.9% | 61.1% | 29.6% | 91.7% | 44.4% | ±0% ranking, adds structural neighbors |
|
|
253
|
+
| + graph 2-hop | 74.9% | 61.1% | 29.6% | 91.7% | 44.4% | Scope leaks on unrelated queries |
|
|
254
|
+
| No RAPTOR (graph 1-hop) | 74.9% | 61.1% | 29.6% | 91.7% | 44.4% | RAPTOR contributes +0.3% |
|
|
255
|
+
|
|
256
|
+
### What each layer actually does
|
|
257
|
+
|
|
258
|
+
**BM25 + embedding fusion (RRF)**
|
|
259
|
+
The workhorse. Handles ~94% of ranking quality on its own. BM25 catches exact symbol names and camelCase tokens; vector embeddings catch semantic similarity when names differ. Fused with Reciprocal Rank Fusion to combine both signals without manual weight tuning.
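The fusion step can be sketched as follows, using the README's formula `score = Σ 1/(60 + rank_i)`. This is an illustration only: the function names and sample rankings are hypothetical, and it assumes ranks start at 1, which the README does not state.

```typescript
// Reciprocal Rank Fusion: each ranker contributes 1 / (k + rank) per document,
// with rank starting at 1 (an assumption; the README only gives the formula).
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, index) => {
      const rank = index + 1;
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Order documents by fused score, best first.
function fuse(rankings: string[][], k = 60): string[] {
  return Array.from(reciprocalRankFusion(rankings, k).entries())
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

// Made-up rankings from a keyword ranker and a vector ranker.
const bm25Ranking = ["auth/jwt.ts", "auth/refresh.ts", "utils/token.ts"];
const vectorRanking = ["auth/refresh.ts", "auth/session.ts", "auth/jwt.ts"];

const fused = fuse([bm25Ranking, vectorRanking]);
console.log(fused);
```

Because only ranks enter the formula, the raw BM25 and cosine scores never need to be normalised against each other, which is what makes the fusion free of manual weights.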
|
|
260
|
+
|
|
261
|
+
**RAPTOR (hierarchical directory summaries)**
|
|
262
|
+
Generates per-directory embedding nodes by mean-pooling all file embeddings in a folder. Acts as a post-filter: when a directory summary scores ≥ 0.5 against the query, results are narrowed to that directory's files. Measured contribution: **+0.3% MRR** on symbol queries. Fires conservatively — only when the directory is an obvious match. Its real value is on _abstract queries_ ("what does the payments module do?") which don't appear in this benchmark; for those queries it prevents broad scattering across the entire codebase.
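As a rough sketch of the two ideas in this paragraph, the code below mean-pools file embeddings into a directory vector and applies a threshold-gated post-filter with a fallback. The `Hit` shape, the `minResults` fallback value, and the prefix-matching rule are assumptions for illustration; only the 0.5 threshold and the mean-pooling description come from the README.

```typescript
type Vector = number[];

// Element-wise mean of equally sized embeddings: the directory summary vector.
function meanPool(embeddings: Vector[]): Vector {
  if (embeddings.length === 0) return [];
  const dim = embeddings[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const e of embeddings) {
    for (let i = 0; i < dim; i++) sum[i] += e[i];
  }
  return sum.map((v) => v / embeddings.length);
}

// Hypothetical result shape; directory summaries carry a flag.
interface Hit {
  file: string;
  score: number;
  isDirectorySummary?: boolean;
}

// Narrow file hits to the best-scoring directory when its summary clears the
// threshold; otherwise (or when the narrowed set is too thin) keep all file hits.
function cascadeFilter(hits: Hit[], threshold = 0.5, minResults = 1): Hit[] {
  const dirs = hits.filter((h) => h.isDirectorySummary).sort((a, b) => b.score - a.score);
  const files = hits.filter((h) => !h.isDirectorySummary);
  const best = dirs[0];
  if (!best || best.score < threshold) return files;
  const prefix = best.file.endsWith("/") ? best.file : best.file + "/";
  const narrowed = files.filter((h) => h.file.startsWith(prefix));
  return narrowed.length >= minResults ? narrowed : files;
}

// Made-up hits for a query such as "what does auth/ do?".
const hits: Hit[] = [
  { file: "src/auth", score: 0.62, isDirectorySummary: true },
  { file: "src/auth/jwt.ts", score: 0.58 },
  { file: "src/billing/invoice.ts", score: 0.41 },
  { file: "src/auth/refresh.ts", score: 0.55 },
];

console.log(cascadeFilter(hits).map((h) => h.file));
console.log(meanPool([[1, 0], [0, 1]]));
```

Raising the threshold makes the filter fire less often: with `cascadeFilter(hits, 0.7)` the directory summary no longer qualifies and all three file hits pass through.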
|
|
292
263
|
|
|
293
|
-
|
|
264
|
+
**Knowledge graph (import/dependency edges)**
|
|
265
|
+
Average connectivity: 20.8 file→file edges per node across both TS and C# codebases. Measured ranking impact: **±0% MRR** for 1-hop expansion. The graph doesn't move MRR because the semantic layer already finds the right files — the graph's neighbors are usually already in the top-15. Its value is structural: the `analyze dependencies` action and explicit `graph` search type give Claude traversable import chains, inheritance hierarchies, and dependency paths that embeddings alone cannot provide.
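A minimal sketch of 1-hop expansion, assuming the behaviour described in the pipeline diagram (top-10 seeds, neighbors scored at source × 0.7). The adjacency-list `Graph` type and the function name are hypothetical and do not reflect the package's private `expandWithGraphNeighbors` implementation.

```typescript
interface Scored {
  file: string;
  score: number;
}

// Hypothetical adjacency list: file -> files reachable via IMPORTS/CALLS/EXTENDS edges.
type Graph = Map<string, string[]>;

// Append structural neighbors of the top-N results at a discounted score.
// Files already in the result set keep their original score.
function expandWithNeighbors(results: Scored[], graph: Graph, topN = 10, discount = 0.7): Scored[] {
  const original = new Set(results.map((r) => r.file));
  const added = new Map<string, number>();
  const seeds = results.slice().sort((a, b) => b.score - a.score).slice(0, topN);
  for (const seed of seeds) {
    for (const neighbor of graph.get(seed.file) ?? []) {
      if (original.has(neighbor)) continue;
      const candidate = seed.score * discount;
      // When several seeds reach the same neighbor, keep the best discounted score.
      if (candidate > (added.get(neighbor) ?? 0)) added.set(neighbor, candidate);
    }
  }
  const neighbors = Array.from(added, ([file, score]) => ({ file, score }));
  return results.concat(neighbors).sort((a, b) => b.score - a.score);
}

// Made-up graph and search results.
const graph: Graph = new Map([
  ["auth/jwt.ts", ["auth/refresh.ts", "auth/session.ts"]],
  ["auth/refresh.ts", ["auth/session.ts", "auth/token-store.ts"]],
]);
const searchResults: Scored[] = [
  { file: "auth/jwt.ts", score: 0.94 },
  { file: "auth/refresh.ts", score: 0.89 },
];

const expanded = expandWithNeighbors(searchResults, graph);
console.log(expanded.map((r) => r.file));
```

Because neighbors are always scored below the seed that introduced them, expansion adds context without reordering the files the hybrid stage already ranked, which is consistent with the ±0% MRR measurement.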
|
|
294
266
|
|
|
295
|
-
**
|
|
296
|
-
|
|
267
|
+
**Type boost / penalty scoring**
|
|
268
|
+
Source files get +0.10 score boost; test files get −0.15 penalty; lock files and docs get −0.05 penalty. Without this, `integration.test.ts` would rank above `dag-engine.ts` for exact symbol queries because test files import and exercise every symbol in the source. The penalty corrects this without eliminating test files from results.
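The adjustment can be sketched like this. The boost and penalty values are the ones quoted above; the `classify` rules (file-name patterns for tests, docs, and lock files) are an assumption, since the README does not show how file types are detected.

```typescript
type FileKind = "source" | "test" | "docOrLock";

// Hypothetical classifier; the real detection rules are not shown in the README.
function classify(path: string): FileKind {
  if (/\.(test|spec)\.[jt]sx?$/.test(path)) return "test";
  if (/\.(md|lock)$/.test(path) || /package-lock\.json$/.test(path)) return "docOrLock";
  return "source";
}

// Values quoted in the README: +0.10 source, -0.15 test, -0.05 docs and lock files.
const ADJUSTMENT: Record<FileKind, number> = {
  source: 0.1,
  test: -0.15,
  docOrLock: -0.05,
};

function adjustScore(path: string, baseScore: number): number {
  return baseScore + ADJUSTMENT[classify(path)];
}

// Made-up base scores: the test file starts ahead because it mentions every symbol.
const testScore = adjustScore("tests/integration.test.ts", 0.8);
const sourceScore = adjustScore("src/dag-engine.ts", 0.7);
console.log({ testScore, sourceScore });
```

In this example the source file ends up ahead of the test file after adjustment, while the test file stays in the result list with a lower score.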
|
|
269
|
+
|
|
270
|
+
**Monorepo directory exclusion fix**
|
|
271
|
+
The single highest-impact change in v1.12.0: removing `packages/` from the default exclusion list. For pnpm/yarn/lerna monorepos where all source lives under `packages/`, this exclusion was silently dropping all source files. Effect: **10% → 72% MRR** on the Conclave monorepo benchmark.
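To illustrate why this exclusion was so damaging, here is a small sketch of a directory-name exclusion filter. The `isExcluded` helper, the exclusion lists, and the file paths are hypothetical; the actual scanner configuration is not shown in the README.

```typescript
// True when any directory segment of the path is on the exclusion list.
function isExcluded(path: string, excludedDirs: Set<string>): boolean {
  const directories = path.split("/").slice(0, -1);
  return directories.some((segment) => excludedDirs.has(segment));
}

// Made-up monorepo layout where all source lives under packages/.
const repoFiles = [
  "packages/core/src/dag-engine.ts",
  "packages/cli/src/index.ts",
  "node_modules/some-dep/index.js",
];

const oldExclusions = new Set(["node_modules", "dist", "packages"]);
const newExclusions = new Set(["node_modules", "dist"]);

const indexedBefore = repoFiles.filter((f) => !isExcluded(f, oldExclusions));
const indexedAfter = repoFiles.filter((f) => !isExcluded(f, newExclusions));
console.log(indexedBefore.length, indexedAfter.length);
```

With `packages` on the list nothing from the monorepo is indexed at all, so no amount of ranking quality downstream can recover the missing files.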
|
|
272
|
+
|
|
273
|
+
### Known limitations
|
|
274
|
+
|
|
275
|
+
| Query | Target | Issue | Root cause |
|
|
276
|
+
|-------|--------|-------|-----------|
|
|
277
|
+
| `cv-prompts` | `orchestrator.ts` | rank 97+ even with 2-hop graph | `prompt-builder.test.ts` outscores `prompt-builder.ts` semantically; source file never enters top-10, so we can't graph-walk from it to `orchestrator.ts`. Test-file dominance on cross-file queries. |
|
|
278
|
+
| `cv-exec-mode` | `types.ts` | rank 11–12 | `types.ts` is a pure type-export file; low keyword density. Found within R@5 (rank ≤ 15). |
|
|
279
|
+
|
|
280
|
+
### Benchmark script
|
|
281
|
+
|
|
282
|
+
Reproduce with:
|
|
283
|
+
```bash
|
|
284
|
+
npm run build
|
|
285
|
+
node scripts/real-bench.js
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
Requires `C:\workspace\claude\conclave` and `C:\workspace\ImperialCommander2` to be present locally (or update paths in `scripts/real-bench.js`).
|
|
289
|
+
|
|
290
|
+
</details>
|
|
297
291
|
|
|
298
292
|
## Auto-Detected Coding Standards
|
|
299
293
|
|
|
@@ -474,14 +468,9 @@ All data stored locally in `.codeseeker/`. No external services required.
|
|
|
474
468
|
|
|
475
469
|
For large teams (100K+ files, shared indexes), server mode supports PostgreSQL + Neo4j. See [Storage Documentation](docs/technical/storage.md).
|
|
476
470
|
|
|
477
|
-
|
|
471
|
+
For the complete technical internals — exact scoring formulas, MCP tool schema, graph edge types, RAPTOR threshold logic, pipeline stages, analysis confidence tiers — see the **[Technical Architecture Manual](docs/technical/architecture.md)**.
|
|
478
472
|
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
**CodeSeeker is NOT a VS Code extension.** It's an MCP server that works WITH AI assistants.
|
|
482
|
-
|
|
483
|
-
✅ **Correct:** Install via npm: `npm install -g codeseeker`
|
|
484
|
-
❌ **Wrong:** Looking for it in VS Code Extensions marketplace
|
|
473
|
+
## Troubleshooting
|
|
485
474
|
|
|
486
475
|
### MCP server not connecting
|
|
487
476
|
|
|
@@ -513,16 +502,16 @@ Open an issue: [GitHub Issues](https://github.com/jghiringhelli/codeseeker/issue
|
|
|
513
502
|
|
|
514
503
|
## Supported Platforms
|
|
515
504
|
|
|
516
|
-
|
|
|
517
|
-
|
|
518
|
-
| **Claude Code** (VS Code) |
|
|
519
|
-
| **GitHub Copilot** (VS Code
|
|
520
|
-
| **Cursor** |
|
|
521
|
-
| **
|
|
522
|
-
| **
|
|
523
|
-
| **Visual Studio** |
|
|
505
|
+
| Client | MCP Support | Config |
|
|
506
|
+
|--------|-------------|--------|
|
|
507
|
+
| **Claude Code** (VS Code) | ✅ | `.vscode/mcp.json` or plugin |
|
|
508
|
+
| **GitHub Copilot** (VS Code 1.99+) | ✅ | `.vscode/mcp.json` |
|
|
509
|
+
| **Cursor** | ✅ | `.cursor/mcp.json` |
|
|
510
|
+
| **Windsurf** | ✅ | `.windsurf/mcp.json` |
|
|
511
|
+
| **Claude Desktop** | ✅ | `claude_desktop_config.json` |
|
|
512
|
+
| **Visual Studio** | ✅ | `codeseeker install --vs` |
|
|
524
513
|
|
|
525
|
-
>
|
|
514
|
+
> Claude Code and GitHub Copilot share the same `.vscode/mcp.json` — configure once, works for both.
|
|
526
515
|
|
|
527
516
|
## Support
|
|
528
517
|
|
|
@@ -44,6 +44,8 @@ export declare class SemanticSearchOrchestrator {
|
|
|
44
44
|
private vectorStore?;
|
|
45
45
|
private projectStore?;
|
|
46
46
|
private graphStore?;
|
|
47
|
+
/** Current query — set at the start of performSemanticSearch, used for symbol-name boosting */
|
|
48
|
+
private currentQuery;
|
|
47
49
|
constructor();
|
|
48
50
|
/**
|
|
49
51
|
* Initialize storage - checks if we should use embedded or server mode
|
|
@@ -64,12 +66,41 @@ export declare class SemanticSearchOrchestrator {
|
|
|
64
66
|
* Returns empty array if storage unavailable or no results - Claude handles file discovery natively
|
|
65
67
|
*/
|
|
66
68
|
performSemanticSearch(query: string, projectPath: string, searchType?: 'hybrid' | 'vector' | 'fts' | 'graph'): Promise<SemanticResult[]>;
|
|
69
|
+
/** Minimum L2 RAPTOR node score to trust its directory hint */
|
|
70
|
+
private l2Threshold;
|
|
71
|
+
/** Minimum number of results cascade must produce to skip fallback */
|
|
72
|
+
private cascadeMinResults;
|
|
73
|
+
/** Minimum top-result score cascade must produce to skip fallback */
|
|
74
|
+
private cascadeTopScore;
|
|
75
|
+
/** Override RAPTOR cascade thresholds — useful for tuning experiments. */
|
|
76
|
+
setRaptorConfig(config: {
|
|
77
|
+
l2Threshold?: number;
|
|
78
|
+
cascadeMinResults?: number;
|
|
79
|
+
cascadeTopScore?: number;
|
|
80
|
+
}): void;
|
|
67
81
|
/**
|
|
68
|
-
*
|
|
69
|
-
*
|
|
70
|
-
|
|
82
|
+
* Depth of graph neighbor expansion after hybrid search.
|
|
83
|
+
* 0 = disabled, 1 = 1-hop (default), 2 = 2-hop (cross-file chains)
|
|
84
|
+
*/
|
|
85
|
+
private graphExpansionDepth;
|
|
86
|
+
/** Configure graph expansion depth. 0 disables expansion entirely. */
|
|
87
|
+
setGraphExpansionDepth(depth: number): void;
|
|
88
|
+
/**
|
|
89
|
+
* Perform hybrid search using the storage interface abstraction.
|
|
90
|
+
* Works for both embedded (SQLite + MiniSearch) and server (PostgreSQL + pgvector) modes.
|
|
91
|
+
*
|
|
92
|
+
* RAPTOR Cascade (post-processing):
|
|
93
|
+
* 1. Run wide searchHybrid (one call, always happens)
|
|
94
|
+
* 2. Extract RAPTOR L2 nodes from raw results
|
|
95
|
+
* 3. If a high-confidence L2 node exists, post-filter real files to its dir(s)
|
|
96
|
+
* 4. If the filtered set is thin or low-confidence, fall back to full wide results
|
|
71
97
|
*/
|
|
72
98
|
private performHybridSearchViaInterface;
|
|
99
|
+
/**
|
|
100
|
+
* Apply RAPTOR cascade post-filter.
|
|
101
|
+
* Returns filtered SemanticResult[] when cascade is confident, null when falling back.
|
|
102
|
+
*/
|
|
103
|
+
private applyCascadeFilter;
|
|
73
104
|
/**
|
|
74
105
|
* Perform vector-only semantic search (pure embedding similarity, no BM25/path matching)
|
|
75
106
|
*/
|
|
@@ -84,9 +115,10 @@ export declare class SemanticSearchOrchestrator {
|
|
|
84
115
|
*/
|
|
85
116
|
private processRawResults;
|
|
86
117
|
/**
|
|
87
|
-
* Graph RAG: expand hybrid search results by following
|
|
118
|
+
* Graph RAG: expand hybrid search results by following code relationship edges.
|
|
88
119
|
* For each of the top-5 result files, lookup its graph node and collect neighbours
|
|
89
120
|
* (files connected via imports/calls/extends). Appends new files at a discounted score.
|
|
121
|
+
* @param depth 1 = 1-hop, 2 = 2-hop (follows neighbors of neighbors for cross-file chains)
|
|
90
122
|
*/
|
|
91
123
|
private expandWithGraphNeighbors;
|
|
92
124
|
/**
|
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"semantic-search-orchestrator.d.ts","sourceRoot":"","sources":["../../../../src/cli/commands/services/semantic-search-orchestrator.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;GAqBG;AASH,MAAM,WAAW,cAAc;IAC7B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,EAAE,MAAM,CAAC;IACb,UAAU,EAAE,MAAM,CAAC;IACnB,OAAO,EAAE,MAAM,CAAC;IAChB,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,8DAA8D;IAC9D,KAAK,CAAC,EAAE;QACN,WAAW,EAAE,MAAM,CAAC;QACpB,SAAS,EAAE,MAAM,CAAC;QAClB,SAAS,EAAE,OAAO,CAAC;QACnB,WAAW,EAAE,MAAM,CAAC;KACrB,CAAC;CACH;AAED,qBAAa,0BAA0B;IACrC,OAAO,CAAC,MAAM,CAAwB;IACtC,OAAO,CAAC,SAAS,CAAC,CAAS;IAC3B,OAAO,CAAC,kBAAkB,CAA4B;IACtD,OAAO,CAAC,cAAc,CAA+B;IACrD,OAAO,CAAC,WAAW,CAAkB;IACrC,OAAO,CAAC,WAAW,CAAC,CAAe;IACnC,OAAO,CAAC,YAAY,CAAC,CAAgB;IACrC,OAAO,CAAC,UAAU,CAAC,CAAc;;
|
|
1
|
+
{"version":3,"file":"semantic-search-orchestrator.d.ts","sourceRoot":"","sources":["../../../../src/cli/commands/services/semantic-search-orchestrator.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;;;GAqBG;AASH,MAAM,WAAW,cAAc;IAC7B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,EAAE,MAAM,CAAC;IACb,UAAU,EAAE,MAAM,CAAC;IACnB,OAAO,EAAE,MAAM,CAAC;IAChB,SAAS,CAAC,EAAE,MAAM,CAAC;IACnB,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,8DAA8D;IAC9D,KAAK,CAAC,EAAE;QACN,WAAW,EAAE,MAAM,CAAC;QACpB,SAAS,EAAE,MAAM,CAAC;QAClB,SAAS,EAAE,OAAO,CAAC;QACnB,WAAW,EAAE,MAAM,CAAC;KACrB,CAAC;CACH;AAED,qBAAa,0BAA0B;IACrC,OAAO,CAAC,MAAM,CAAwB;IACtC,OAAO,CAAC,SAAS,CAAC,CAAS;IAC3B,OAAO,CAAC,kBAAkB,CAA4B;IACtD,OAAO,CAAC,cAAc,CAA+B;IACrD,OAAO,CAAC,WAAW,CAAkB;IACrC,OAAO,CAAC,WAAW,CAAC,CAAe;IACnC,OAAO,CAAC,YAAY,CAAC,CAAgB;IACrC,OAAO,CAAC,UAAU,CAAC,CAAc;IAEjC,+FAA+F;IAC/F,OAAO,CAAC,YAAY,CAAc;;IAMlC;;OAEG;YACW,WAAW;IAgBzB;;OAEG;IACH,YAAY,CAAC,SAAS,EAAE,MAAM,GAAG,IAAI;IAIrC;;OAEG;YACW,gBAAgB;IA0B9B;;;;;OAKG;IACG,qBAAqB,CAAC,KAAK,EAAE,MAAM,EAAE,WAAW,EAAE,MAAM,EAAE,UAAU,GAAE,QAAQ,GAAG,QAAQ,GAAG,KAAK,GAAG,OAAkB,GAAG,OAAO,CAAC,cAAc,EAAE,CAAC;IAoDxJ,+DAA+D;IAC/D,OAAO,CAAC,WAAW,CAAO;IAC1B,sEAAsE;IACtE,OAAO,CAAC,iBAAiB,CAAK;IAC9B,qEAAqE;IACrE,OAAO,CAAC,eAAe,CAAQ;IAE/B,0EAA0E;IAC1E,eAAe,CAAC,MAAM,EAAE;QAAE,WAAW,CAAC,EAAE,MAAM,CAAC;QAAC,iBAAiB,CAAC,EAAE,MAAM,CAAC;QAAC,eAAe,CAAC,EAAE,MAAM,CAAA;KAAE,GAAG,IAAI;IAO7G;;;OAGG;IACH,OAAO,CAAC,mBAAmB,CAAK;IAEhC,sEAAsE;IACtE,sBAAsB,CAAC,KAAK,EAAE,MAAM,GAAG,IAAI;IAI3C;;;;;;;;;OASG;YACW,+BAA+B;IAkC7C;;;OAGG;IACH,OAAO,CAAC,kBAAkB;IAgE1B;;OAEG;YACW,uBAAuB;IAoBrC;;OAEG;YACW,qBAAqB;IAYnC;;;OAGG;IACH,OAAO,CAAC,iBAAiB;IAgIzB;;;;;OAKG;YACW,wBAAwB;IAyGtC;;OAEG;IACH,OAAO,CAAC,WAAW;IAKnB;;OAEG;IACH,OAAO,CAAC,aAAa;IA0BrB;;OAEG;IACH,OAAO,CAAC,iBAAiB;CAiC1B"}
|