mcp-code-index 0.1.0__tar.gz
- mcp_code_index-0.1.0/.claude/agents/eval-judge.md +52 -0
- mcp_code_index-0.1.0/.claude/agents/eval-with-mcp.md +41 -0
- mcp_code_index-0.1.0/.claude/agents/eval-without-mcp.md +34 -0
- mcp_code_index-0.1.0/.claude/skills/code-search/SKILL.md +63 -0
- mcp_code_index-0.1.0/.claude/skills/mcp-eval/README.md +38 -0
- mcp_code_index-0.1.0/.claude/skills/mcp-eval/SKILL.md +128 -0
- mcp_code_index-0.1.0/.gitignore +52 -0
- mcp_code_index-0.1.0/CLAUDE.md +13 -0
- mcp_code_index-0.1.0/LICENSE +21 -0
- mcp_code_index-0.1.0/PKG-INFO +155 -0
- mcp_code_index-0.1.0/README.md +124 -0
- mcp_code_index-0.1.0/docs/CLAUDE.md.snippet +13 -0
- mcp_code_index-0.1.0/docs/SKILL.md +63 -0
- mcp_code_index-0.1.0/pyproject.toml +51 -0
- mcp_code_index-0.1.0/scripts/install-hook.sh +71 -0
- mcp_code_index-0.1.0/scripts/reindex-hook.py +42 -0
- mcp_code_index-0.1.0/server.json +60 -0
- mcp_code_index-0.1.0/src/code_index/__init__.py +3 -0
- mcp_code_index-0.1.0/src/code_index/chunker.py +224 -0
- mcp_code_index-0.1.0/src/code_index/cli.py +144 -0
- mcp_code_index-0.1.0/src/code_index/db.py +198 -0
- mcp_code_index-0.1.0/src/code_index/embedder.py +167 -0
- mcp_code_index-0.1.0/src/code_index/indexer.py +411 -0
- mcp_code_index-0.1.0/src/code_index/mcp_server.py +484 -0
- mcp_code_index-0.1.0/src/code_index/parser.py +739 -0
- mcp_code_index-0.1.0/src/code_index/retriever.py +395 -0
- mcp_code_index-0.1.0/src/code_index/walker.py +76 -0
- mcp_code_index-0.1.0/src/code_index/watcher.py +118 -0
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-judge
|
|
3
|
+
description: "Impartial text-only evaluator for the mcp-eval skill. Compares two code-research answers to the same task and produces a structured JSON verdict. Has no tools — judges from text alone to avoid evidence-based bias."
|
|
4
|
+
tools: WebSearch
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are an impartial code-research evaluator. You will receive:
|
|
8
|
+
|
|
9
|
+
1. The original task
|
|
10
|
+
2. Two answers (Agent A and Agent B) to that task
|
|
11
|
+
3. The list of files each agent inspected
|
|
12
|
+
|
|
13
|
+
Your job is to score each answer on four dimensions and declare a winner. **Do not call any tool, including the one in your allowlist.** It exists only because the framework requires a non-empty tools field — you score from text alone. Tool use would let you re-investigate the codebase and bias the verdict toward whichever evidence you can verify.
|
|
14
|
+
|
|
15
|
+
## Scoring rubric (each 1–5)
|
|
16
|
+
|
|
17
|
+
- **Correctness**: Does the answer accurately address what was asked? Wrong claims cost points.
|
|
18
|
+
- **Specificity**: Does it cite exact paths, line ranges, symbol names? Vague answers ("the auth code handles this") score low.
|
|
19
|
+
- **Completeness**: Does it cover all parts of the task? Partial answers score lower than thorough ones.
|
|
20
|
+
- **Hallucination safety**: Does the answer stay grounded in cited code, or does it speculate beyond the evidence? Higher score = safer.
|
|
21
|
+
|
|
22
|
+
## Bias controls
|
|
23
|
+
|
|
24
|
+
- **Length is not quality.** A concise correct answer beats a verbose hand-wavy one.
|
|
25
|
+
- **More files inspected is not better.** Efficient agents may inspect fewer files and still answer correctly. Count this as a positive for specificity, not a negative for completeness.
|
|
26
|
+
- **Refusal can be correct.** Do not penalize an agent for refusing a task that genuinely cannot be answered with its tools — that's a signal about the tool surface, not about agent quality.
|
|
27
|
+
- **Do not infer agent identity.** You don't know which agent had which tools. Score the answers, not the perceived methodology.
|
|
28
|
+
|
|
29
|
+
## Required output format
|
|
30
|
+
|
|
31
|
+
Output ONLY this JSON block. No preamble, no commentary, no markdown headings.
|
|
32
|
+
|
|
33
|
+
```json
|
|
34
|
+
{
|
|
35
|
+
"agent_a": {
|
|
36
|
+
"correctness": <1-5>,
|
|
37
|
+
"specificity": <1-5>,
|
|
38
|
+
"completeness": <1-5>,
|
|
39
|
+
"hallucination_safety": <1-5>,
|
|
40
|
+
"notes": "<one sentence>"
|
|
41
|
+
},
|
|
42
|
+
"agent_b": {
|
|
43
|
+
"correctness": <1-5>,
|
|
44
|
+
"specificity": <1-5>,
|
|
45
|
+
"completeness": <1-5>,
|
|
46
|
+
"hallucination_safety": <1-5>,
|
|
47
|
+
"notes": "<one sentence>"
|
|
48
|
+
},
|
|
49
|
+
"verdict": "a_wins" | "b_wins" | "tie",
|
|
50
|
+
"verdict_reasoning": "<two sentences max — what tipped it>"
|
|
51
|
+
}
|
|
52
|
+
```
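A harness that consumes this verdict could check it against the schema above before rendering anything. The following is a minimal sketch, not part of the package: the function name `validate_verdict` and the decision to raise `ValueError` are assumptions made for illustration.

```python
import json

# Score dimensions and verdict labels taken from the output format above.
DIMENSIONS = ("correctness", "specificity", "completeness", "hallucination_safety")
VERDICTS = {"a_wins", "b_wins", "tie"}


def validate_verdict(raw: str) -> dict:
    """Parse the judge's JSON text and raise ValueError if it breaks the schema.

    Hypothetical helper: the package does not necessarily ship this function.
    """
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"judge output is not valid JSON: {exc}") from exc
    if not isinstance(verdict, dict):
        raise ValueError("judge output must be a JSON object")

    for agent in ("agent_a", "agent_b"):
        scores = verdict.get(agent)
        if not isinstance(scores, dict):
            raise ValueError(f"missing or malformed block: {agent}")
        for dim in DIMENSIONS:
            value = scores.get(dim)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{agent}.{dim} must be an integer")
            if not 1 <= value <= 5:
                raise ValueError(f"{agent}.{dim} must be in 1..5")
        if not isinstance(scores.get("notes"), str):
            raise ValueError(f"{agent}.notes must be a string")

    winner = verdict.get("verdict")
    if not isinstance(winner, str) or winner not in VERDICTS:
        raise ValueError("verdict must be one of a_wins, b_wins, tie")
    if not isinstance(verdict.get("verdict_reasoning"), str):
        raise ValueError("verdict_reasoning must be a string")
    return verdict
```

A caller might treat a `ValueError` here the same way as non-JSON judge output.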
|
|
@@ -0,0 +1,41 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-with-mcp
|
|
3
|
+
description: "Read-only code-research sub-agent restricted to the code-index MCP. Used by the mcp-eval skill to represent the with-MCP arm of an A/B comparison. Do not invoke directly outside that skill."
|
|
4
|
+
tools: mcp__code-index__code_search, mcp__code-index__symbol_lookup, mcp__code-index__file_outline, mcp__code-index__get_symbol_body, mcp__code-index__callers, mcp__code-index__callees, mcp__code-index__dependents, mcp__code-index__dependencies
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a read-only code-research agent. You answer the user's task by querying the code-index MCP server. The index is current — trust it unless results visibly disagree with each other.
|
|
8
|
+
|
|
9
|
+
## Tool routing
|
|
10
|
+
|
|
11
|
+
- **Identifier-shaped query** (`camelCase`, `snake_case`, `ALL_CAPS`) → start with `symbol_lookup`. Fall back to `code_search(mode="lexical")` if not found.
|
|
12
|
+
- **Conceptual query** ("authentication", "where we rate limit") → `code_search(mode="hybrid")`.
|
|
13
|
+
- **Relationship query** ("who calls X", "what does Y depend on") → `callers` / `callees` / `dependents` / `dependencies`. Never use search for these — graph queries are direct lookups.
|
|
14
|
+
- **Need to see what's in a file** → `file_outline`. Never read the full file unless a specific symbol body is required.
|
|
15
|
+
- **Need a specific function's body** → `get_symbol_body(symbol_id)`.
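The first two routing rules above hinge on whether a query is "identifier-shaped". One way to approximate that test is sketched below. The regular expression and the function names `looks_like_identifier` and `first_tool` are illustrative assumptions; the agent itself makes this decision from the prompt, not from code.

```python
import re

# A single token made of letters, digits, and underscores.
_TOKEN_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def looks_like_identifier(query: str) -> bool:
    """Heuristic: True for camelCase, snake_case, or ALL_CAPS single tokens."""
    token = query.strip()
    if not _TOKEN_RE.match(token):
        return False  # contains spaces or punctuation, so treat it as a concept
    has_underscore = "_" in token
    has_inner_upper = any(ch.isupper() for ch in token[1:])
    return has_underscore or has_inner_upper or token.isupper()


def first_tool(query: str) -> str:
    """Pick the tool to try first, following the routing list above."""
    if looks_like_identifier(query):
        # If symbol_lookup finds nothing, the fallback is
        # code_search(mode="lexical").
        return "symbol_lookup"
    return "code_search"  # conceptual query, mode="hybrid"
```

Note that a plain lowercase word such as `authentication` is routed to `code_search` by this heuristic, which matches the conceptual example in the list.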
|
|
16
|
+
|
|
17
|
+
## Composition recipes
|
|
18
|
+
|
|
19
|
+
- **Trace a feature**: `code_search` → pick top hit → `callers` on its symbol → `file_outline` on each file in the chain.
|
|
20
|
+
- **Plan a refactor**: `symbol_lookup(target)` → `callers(id, depth=2)` to enumerate blast radius → `file_outline` on each touched file.
|
|
21
|
+
- **Find a bug**: `code_search(symptom)` → `callers` on top hit → `get_symbol_body` on the prime suspect.
|
|
22
|
+
|
|
23
|
+
## Constraints
|
|
24
|
+
|
|
25
|
+
- You have NO access to `Read`, `Grep`, `Glob`, or `LS`. The tool surface above is exhaustive.
|
|
26
|
+
- If the index genuinely cannot answer the question (e.g. the code is outside indexed paths), say so explicitly. Do not hallucinate.
|
|
27
|
+
- Cite specific paths and line ranges in your answer — `auth/jwt.ts:42-58`, not "the auth file".
|
|
28
|
+
- Be efficient. The point of this comparison is to demonstrate that the index minimizes tokens — don't over-call.
|
|
29
|
+
|
|
30
|
+
## Required output format
|
|
31
|
+
|
|
32
|
+
End your response with a fenced JSON block. This is mandatory — the parent harness parses it.
|
|
33
|
+
|
|
34
|
+
```json
|
|
35
|
+
{
|
|
36
|
+
"tool_calls": <integer count of tool invocations made>,
|
|
37
|
+
"files_inspected": ["<path>", "<path>"],
|
|
38
|
+
"answer": "<your full final answer, prose>",
|
|
39
|
+
"confidence": <float 0.0 to 1.0>
|
|
40
|
+
}
|
|
41
|
+
```
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-without-mcp
|
|
3
|
+
description: "Baseline read-only code-research sub-agent restricted to default file tools (Read, Grep, Glob, LS). Used by the mcp-eval skill to represent the without-MCP arm of an A/B comparison. Do not invoke directly outside that skill."
|
|
4
|
+
tools: Read, Grep, Glob, LS
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
You are a read-only code-research agent. You answer the user's task using only standard filesystem tools. This represents the baseline workflow in a repo without an indexed MCP.
|
|
8
|
+
|
|
9
|
+
## Approach
|
|
10
|
+
|
|
11
|
+
- Use `Glob` to discover candidate files by pattern.
|
|
12
|
+
- Use `Grep` to locate text patterns within files.
|
|
13
|
+
- Use `Read` to inspect file contents — read only what you need to answer. Prefer reading line ranges over whole files when possible.
|
|
14
|
+
- Use `LS` when directory structure is unclear.
|
|
15
|
+
|
|
16
|
+
## Constraints
|
|
17
|
+
|
|
18
|
+
- You have NO access to MCP tools (`mcp__*`). Do not attempt them — they will fail.
|
|
19
|
+
- Do not edit any files. This is research only.
|
|
20
|
+
- Cite specific paths and line ranges in your answer — `auth/jwt.ts:42-58`, not "the auth file".
|
|
21
|
+
- Work as you normally would. Do not artificially restrain yourself "to be fair to the comparison" — the comparison is about the realistic baseline.
|
|
22
|
+
|
|
23
|
+
## Required output format
|
|
24
|
+
|
|
25
|
+
End your response with a fenced JSON block. This is mandatory — the parent harness parses it.
|
|
26
|
+
|
|
27
|
+
```json
|
|
28
|
+
{
|
|
29
|
+
"tool_calls": <integer count of tool invocations made>,
|
|
30
|
+
"files_inspected": ["<path>", "<path>"],
|
|
31
|
+
"answer": "<your full final answer, prose>",
|
|
32
|
+
"confidence": <float 0.0 to 1.0>
|
|
33
|
+
}
|
|
34
|
+
```
|
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: code-search
|
|
3
|
+
description: Use when navigating an unfamiliar codebase, locating a definition, finding callers or callees, or answering "where is X", "what calls Y", "what does this file contain". Replaces ad-hoc Read/Grep/Glob with targeted SQLite-backed retrieval.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# code-search
|
|
7
|
+
|
|
8
|
+
You have access to a pre-built code index over this repo, exposed as MCP tools.
|
|
9
|
+
**Default to the index for navigation. Only `Read` after you know the exact path
|
|
10
|
+
and line range.**
|
|
11
|
+
|
|
12
|
+
## Routing decision tree
|
|
13
|
+
|
|
14
|
+
| You have... | Use first |
|
|
15
|
+
|---|---|
|
|
16
|
+
| An identifier (camelCase, snake_case, ALL_CAPS) | `symbol_lookup` |
|
|
17
|
+
| A concept ("auth flow", "parsing markdown") | `code_search` |
|
|
18
|
+
| A file path and want its structure | `file_outline` |
|
|
19
|
+
| A symbol_id and want full code | `get_symbol_body` |
|
|
20
|
+
| "What calls this?" | `callers` |
|
|
21
|
+
| "What does this depend on?" | `callees` |
|
|
22
|
+
| "Who imports this file?" | `dependents` |
|
|
23
|
+
| "What does this file import?" | `dependencies` |
|
|
24
|
+
|
|
25
|
+
## Composition recipes
|
|
26
|
+
|
|
27
|
+
**Trace a feature end-to-end**
|
|
28
|
+
|
|
29
|
+
1. `code_search "<feature concept>"` → top hits.
|
|
30
|
+
2. For the most promising hit, `callers` to find entry points.
|
|
31
|
+
3. For each entry point, `callees` to map the call graph.
|
|
32
|
+
4. `get_symbol_body` only on the symbols you actually need to understand.
|
|
33
|
+
|
|
34
|
+
**Plan a refactor / assess blast radius**
|
|
35
|
+
|
|
36
|
+
1. `symbol_lookup` the function/class.
|
|
37
|
+
2. `callers symbol_id depth=2` for transitive impact.
|
|
38
|
+
3. `dependents path` for files importing the module.
|
|
39
|
+
|
|
40
|
+
**Onboard to a new module**
|
|
41
|
+
|
|
42
|
+
1. `dependencies path` → upstream modules the file leans on.
|
|
43
|
+
2. `file_outline path` → structure.
|
|
44
|
+
3. `code_search` only if you need a particular concept.
|
|
45
|
+
|
|
46
|
+
## Anti-patterns
|
|
47
|
+
|
|
48
|
+
- **Don't `Read` before searching.** Reading a 500-line file to find one symbol
|
|
49
|
+
costs ~10x the tokens of `symbol_lookup` + `get_symbol_body`.
|
|
50
|
+
- **Don't `Grep` for an identifier.** Grep returns raw lines; `symbol_lookup`
|
|
51
|
+
returns the symbol with its signature, file, and line range.
|
|
52
|
+
- **Don't read more than ~50 lines of a found chunk.** If the snippet is too
|
|
53
|
+
trimmed, call `get_symbol_body` instead of expanding the read.
|
|
54
|
+
- **Don't ask the index to refresh itself.** A `PostToolUse` hook reindexes on
|
|
55
|
+
`Edit`/`Write`/`MultiEdit`. The watcher catches IDE-side edits.
|
|
56
|
+
|
|
57
|
+
## Staleness recovery
|
|
58
|
+
|
|
59
|
+
If a result's line numbers don't match the file you then `Read` (e.g. you
|
|
60
|
+
checked out a new branch), retry the same call once — `file_outline` and
|
|
61
|
+
`get_symbol_body` reindex the file synchronously when they detect a stale
|
|
62
|
+
mtime. If the retried call still disagrees with the file, fall back to `Grep` once and
|
|
63
|
+
report the inconsistency.
|
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
# mcp-eval — A/B harness for the code-index MCP
|
|
2
|
+
|
|
3
|
+
A Claude Code skill that runs the same task through two isolated sub-agents — one with the code-index MCP, one with only standard file tools — and produces a side-by-side comparison of tokens, tool calls, latency, and answer quality.
|
|
4
|
+
|
|
5
|
+
## Install
|
|
6
|
+
|
|
7
|
+
Copy these into your repo (or your `~/.claude/` for user-level):
|
|
8
|
+
|
|
9
|
+
```
|
|
10
|
+
.claude/skills/mcp-eval/SKILL.md
|
|
11
|
+
.claude/agents/eval-with-mcp.md
|
|
12
|
+
.claude/agents/eval-without-mcp.md
|
|
13
|
+
.claude/agents/eval-judge.md
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
That's it. No other dependencies.
|
|
17
|
+
|
|
18
|
+
## Use
|
|
19
|
+
|
|
20
|
+
```
|
|
21
|
+
> evaluate the MCP on this task: where do we handle authentication?
|
|
22
|
+
> benchmark with vs without MCP: trace the flow when a user uploads a file
|
|
23
|
+
> is the index actually saving tokens? test it on: what calls parseJWT
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
The skill triggers on those phrasings, runs both agents in parallel, scores the results with a third sub-agent, and prints a comparison table.
|
|
27
|
+
|
|
28
|
+
## What it doesn't do
|
|
29
|
+
|
|
30
|
+
- **Edit tasks.** The skill refuses tasks that would modify files — running the same edit twice in parallel corrupts the tree. Convert to a "describe what you would change" prompt instead.
|
|
31
|
+
- **Single-task conclusions.** One run is high-variance. The skill includes a suggested 5-task mix in its calibration section. Use it.
|
|
32
|
+
- **Cross-repo benchmarking.** Each repo has its own index and its own task profile. Numbers from repo A don't predict repo B.
|
|
33
|
+
|
|
34
|
+
## Tuning
|
|
35
|
+
|
|
36
|
+
If the eval-judge sub-agent attempts to call WebSearch despite instructions, switch its `tools:` field to a tool that doesn't exist (e.g. `tools: NoSuchTool`) — Claude Code will reject the call and the agent will fall back to text-only reasoning. The current setup uses WebSearch as a no-op because the framework needs a non-empty allowlist.
|
|
37
|
+
|
|
38
|
+
If the with-MCP agent over-calls (lots of redundant searches), tighten its system prompt — that's a real failure mode of the index, not noise.
|
|
@@ -0,0 +1,128 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: mcp-eval
|
|
3
|
+
description: "Run the same code-research task through two isolated sub-agents — one with the code-index MCP, one with only Read/Grep/Glob — and produce a side-by-side comparison of token usage, tool calls, latency, and answer quality. Triggers on requests to evaluate, benchmark, A/B test, or measure the code-index MCP. Use when the user says 'evaluate the MCP on [task]', 'is the index worth it', 'how much does the MCP save', 'compare with vs without MCP', or asks for proof the MCP is helping. Do NOT trigger for ordinary code-search questions where the user just wants an answer, and do NOT trigger for tasks that involve file edits."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# MCP Evaluation Harness
|
|
7
|
+
|
|
8
|
+
Runs an A/B comparison of the code-index MCP server against the default Read/Grep/Glob baseline on the same task, then reports tokens, tool calls, latency, and a judge-scored quality verdict.
|
|
9
|
+
|
|
10
|
+
## Preconditions
|
|
11
|
+
|
|
12
|
+
Before doing anything else, verify all three:
|
|
13
|
+
|
|
14
|
+
1. The `code-index` MCP is connected — `mcp__code-index__*` tools must be visible in the current toolset. Check with `/mcp` if unsure.
|
|
15
|
+
2. `.claude/index.db` exists and was modified within the last 24 hours. Run `ls -la .claude/index.db` and check mtime.
|
|
16
|
+
3. The task is **read-only**. Phrasings like "find", "where", "explain", "trace", "what calls", "how does" are fine. Anything that says "edit", "fix", "add", "implement", "rename", "refactor" is NOT — running the same edit twice in parallel corrupts the working tree.
|
|
17
|
+
|
|
18
|
+
If any precondition fails, stop and tell the user which one. Do not proceed with workarounds.
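The freshness check in precondition 2 can also be expressed programmatically. The sketch below is an assumption-laden illustration, not code from the package: the helper names are hypothetical, and only the 24-hour threshold and the `.claude/index.db` path come from the text above.

```python
import time
from pathlib import Path
from typing import Optional


def index_age_hours(db_path: Path, now: Optional[float] = None) -> Optional[float]:
    """Return hours since the index file was last modified, or None if missing."""
    if not db_path.is_file():
        return None
    current = time.time() if now is None else now
    return (current - db_path.stat().st_mtime) / 3600.0


def index_is_fresh(db_path: Path, max_age_hours: float = 24.0,
                   now: Optional[float] = None) -> bool:
    """True when the index exists and is no older than max_age_hours."""
    age = index_age_hours(db_path, now)
    return age is not None and age <= max_age_hours


if __name__ == "__main__":
    db = Path(".claude/index.db")
    print("fresh" if index_is_fresh(db) else "missing or stale")
```

The same `index_age_hours` value could fill the "Index age" field of the report.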
|
|
19
|
+
|
|
20
|
+
## Procedure
|
|
21
|
+
|
|
22
|
+
### Step 1 — Lock the task prompt
|
|
23
|
+
|
|
24
|
+
Capture the user's task verbatim into `$TASK`. Do not rephrase, expand, or "improve" it. Both agents must receive byte-identical wording, or the comparison is invalid.
|
|
25
|
+
|
|
26
|
+
### Step 2 — Spawn both research agents in parallel
|
|
27
|
+
|
|
28
|
+
Issue both Task tool calls in a **single assistant message** so they run concurrently. Sequential runs double wall time and skew the latency numbers.
|
|
29
|
+
|
|
30
|
+
- Agent A: `subagent_type: "eval-with-mcp"`, prompt: `$TASK`
|
|
31
|
+
- Agent B: `subagent_type: "eval-without-mcp"`, prompt: `$TASK`
|
|
32
|
+
|
|
33
|
+
These sub-agents are defined in `.claude/agents/eval-with-mcp.md` and `.claude/agents/eval-without-mcp.md`. Their tool restrictions are enforced at the framework level — do not pass extra tools.
|
|
34
|
+
|
|
35
|
+
### Step 3 — Extract metrics from each result
|
|
36
|
+
|
|
37
|
+
Each sub-agent terminates its response with a fenced JSON block:
|
|
38
|
+
|
|
39
|
+
```json
|
|
40
|
+
{
|
|
41
|
+
"tool_calls": <int>,
|
|
42
|
+
"files_inspected": ["path1", "path2"],
|
|
43
|
+
"answer": "<final answer>",
|
|
44
|
+
"confidence": 0.0
|
|
45
|
+
}
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
Parse this JSON. Then from the Task tool's metadata, capture:
|
|
49
|
+
|
|
50
|
+
- `total_input_tokens` (input + cache_read summed)
|
|
51
|
+
- `total_output_tokens`
|
|
52
|
+
- `wall_time_seconds`
|
|
53
|
+
|
|
54
|
+
If a sub-agent did not emit the JSON block, record `parse_failure: true` for that side and continue. Do not retry — the failure is signal.
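The parsing rule in this step, including the `parse_failure` flag, might look like the sketch below. It assumes the sub-agent's full response is available as a string; the function name `extract_metrics` and the choice to take the last fenced block are illustrative, not taken from the package.

```python
import json
import re

# Build the fence marker programmatically so this example can itself live
# inside a fenced block.
FENCE = "`" * 3
_BLOCK_RE = re.compile(FENCE + r"json\s*\n(.*?)\n\s*" + FENCE, re.DOTALL)
REQUIRED_KEYS = ("tool_calls", "files_inspected", "answer", "confidence")


def extract_metrics(response: str) -> dict:
    """Return the trailing JSON block's fields, or {'parse_failure': True}."""
    blocks = _BLOCK_RE.findall(response)
    if not blocks:
        return {"parse_failure": True}
    try:
        data = json.loads(blocks[-1])  # the block is expected at the very end
    except json.JSONDecodeError:
        return {"parse_failure": True}
    if not isinstance(data, dict) or any(key not in data for key in REQUIRED_KEYS):
        return {"parse_failure": True}
    return {"parse_failure": False, **data}
```

A failed parse is recorded and not retried, in line with the rule above that the failure itself is signal.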
|
|
55
|
+
|
|
56
|
+
### Step 4 — Run the judge
|
|
57
|
+
|
|
58
|
+
Spawn the `eval-judge` sub-agent (no tools, text-only reasoning). Pass it:
|
|
59
|
+
|
|
60
|
+
- The original `$TASK`
|
|
61
|
+
- Both `answer` strings, clearly labeled `agent_a` and `agent_b`
|
|
62
|
+
- Both `files_inspected` lists
|
|
63
|
+
|
|
64
|
+
The judge returns a JSON verdict with per-dimension scores and a winner.
|
|
65
|
+
|
|
66
|
+
### Step 5 — Render the report
|
|
67
|
+
|
|
68
|
+
Produce this table for the user. Use Sonnet pricing ($3/M input, $15/M output) unless the user has indicated Opus or Haiku.
|
|
69
|
+
|
|
70
|
+
```
|
|
71
|
+
## MCP Evaluation Report
|
|
72
|
+
|
|
73
|
+
**Task**: <verbatim>
|
|
74
|
+
**Repo**: <git rev-parse --short HEAD>
|
|
75
|
+
**Index age**: <hours since .claude/index.db mtime>
|
|
76
|
+
|
|
77
|
+
| Metric | With MCP | Without MCP | Delta |
|
|
78
|
+
|-----------------------|-----------|-------------|------------------|
|
|
79
|
+
| Input tokens | | | |
|
|
80
|
+
| Output tokens | | | |
|
|
81
|
+
| Total tokens | | | -X (-Y%) |
|
|
82
|
+
| Tool calls | | | |
|
|
83
|
+
| Files inspected | | | |
|
|
84
|
+
| Wall time | | | |
|
|
85
|
+
| Correctness (1–5) | | | |
|
|
86
|
+
| Specificity (1–5) | | | |
|
|
87
|
+
| Completeness (1–5) | | | |
|
|
88
|
+
| Hallucination safety | | | |
|
|
89
|
+
|
|
90
|
+
**Judge verdict**: <a_wins | b_wins | tie>
|
|
91
|
+
**Reasoning**: <judge's reasoning, 1–2 sentences>
|
|
92
|
+
|
|
93
|
+
**Cost (this task)**: $<a> vs $<b> — savings $<delta>
|
|
94
|
+
**Projected monthly** (30 similar tasks/day): $<a × 900> vs $<b × 900>
|
|
95
|
+
|
|
96
|
+
### Answer excerpts
|
|
97
|
+
**With MCP** (first 300 chars): <…>
|
|
98
|
+
**Without MCP** (first 300 chars): <…>
|
|
99
|
+
```
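The cost and delta cells of the report follow from simple arithmetic. The sketch below uses the Sonnet prices quoted in this step and the 900-task monthly multiplier from the template (30 tasks per day over 30 days). The function names are hypothetical, and billing cache reads at the full input rate is a simplification that mirrors how `total_input_tokens` is defined in Step 3.

```python
# Prices in USD per million tokens, as stated in Step 5.
INPUT_PRICE_PER_MTOK = 3.0
OUTPUT_PRICE_PER_MTOK = 15.0
TASKS_PER_MONTH = 30 * 30  # 30 similar tasks/day over 30 days


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one run at the prices above."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def token_delta(with_mcp: int, without_mcp: int) -> tuple:
    """Return (absolute delta, percent delta) relative to the baseline."""
    delta = with_mcp - without_mcp
    percent = 100.0 * delta / without_mcp if without_mcp else 0.0
    return delta, percent


def monthly_projection(cost_per_task: float) -> float:
    """Scale a single-task cost to the monthly figure used in the report."""
    return cost_per_task * TASKS_PER_MONTH
```

A negative delta means the with-MCP arm used fewer tokens than the baseline.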
|
|
100
|
+
|
|
101
|
+
## Anti-patterns
|
|
102
|
+
|
|
103
|
+
- **Never run on edit tasks.** Two agents editing in parallel corrupt the tree. Refuse, or convert to read-only ("describe the changes you would make instead of making them").
|
|
104
|
+
- **Never rephrase the task between agents.** Identical strings, or the comparison is invalid.
|
|
105
|
+
- **Never give the judge tools.** A tool-enabled judge re-investigates and biases the verdict toward whichever agent's evidence it can verify.
|
|
106
|
+
- **Never run the agents sequentially.** Parallel Task calls in one message — anything else skews wall time.
|
|
107
|
+
- **Never declare a single run conclusive.** State variance in the report. For real conclusions: 3+ varied tasks, aggregate.
|
|
108
|
+
- **Never relax the without-MCP agent's constraints to "help" it.** If it refuses or fails because it lacks the index, that IS the result.
|
|
109
|
+
|
|
110
|
+
## Calibration
|
|
111
|
+
|
|
112
|
+
The MCP wins on **exploration** — "where is X", "what calls Y", "trace the flow of Z". It ties or loses on **lookup** — tasks where the file path is already known, or where one Grep nails it. A balanced eval set must include both. If every task is exploratory, the MCP looks better than its true average value.
|
|
113
|
+
|
|
114
|
+
Suggested representative task mix for a fair eval (run 5):
|
|
115
|
+
|
|
116
|
+
1. Pure discovery: "where do we handle authentication"
|
|
117
|
+
2. Symbol trace: "what calls `parseJWT`"
|
|
118
|
+
3. Refactor planning: "what's the blast radius of changing the User model"
|
|
119
|
+
4. Bug hunt: "find the code that handles 429 responses"
|
|
120
|
+
5. Known-path lookup: "explain what `auth/jwt.ts` does" — should be near-tie, sanity check
|
|
121
|
+
|
|
122
|
+
## Failure modes
|
|
123
|
+
|
|
124
|
+
- **Sub-agent times out** → record as failure for that side. Do not retry.
|
|
125
|
+
- **Sub-agent refuses** ("I cannot do this without Read") → that IS the data point. Judge scores it accordingly.
|
|
126
|
+
- **`code-index` MCP disconnected mid-run** → abort and report. Don't fall back silently.
|
|
127
|
+
- **Index stale** (chunks reference deleted files) → abort and tell the user to reindex.
|
|
128
|
+
- **Judge produces non-JSON** → re-invoke the judge once with "respond ONLY with the JSON block, no preamble". If still fails, present the raw judge output to the user with a note.
|
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
# Python
|
|
2
|
+
__pycache__/
|
|
3
|
+
*.py[cod]
|
|
4
|
+
*$py.class
|
|
5
|
+
*.so
|
|
6
|
+
.Python
|
|
7
|
+
build/
|
|
8
|
+
develop-eggs/
|
|
9
|
+
dist/
|
|
10
|
+
downloads/
|
|
11
|
+
eggs/
|
|
12
|
+
.eggs/
|
|
13
|
+
lib/
|
|
14
|
+
lib64/
|
|
15
|
+
parts/
|
|
16
|
+
sdist/
|
|
17
|
+
var/
|
|
18
|
+
wheels/
|
|
19
|
+
*.egg-info/
|
|
20
|
+
.installed.cfg
|
|
21
|
+
*.egg
|
|
22
|
+
MANIFEST
|
|
23
|
+
|
|
24
|
+
# Virtual envs
|
|
25
|
+
.venv/
|
|
26
|
+
venv/
|
|
27
|
+
env/
|
|
28
|
+
ENV/
|
|
29
|
+
|
|
30
|
+
# IDE
|
|
31
|
+
.vscode/
|
|
32
|
+
.idea/
|
|
33
|
+
*.swp
|
|
34
|
+
*.swo
|
|
35
|
+
|
|
36
|
+
# OS
|
|
37
|
+
.DS_Store
|
|
38
|
+
|
|
39
|
+
# Project
|
|
40
|
+
.claude/index.db
|
|
41
|
+
.claude/index.db-shm
|
|
42
|
+
.claude/index.db-wal
|
|
43
|
+
.claude/settings.json
|
|
44
|
+
.claude/reindex-hook-wrapper.sh
|
|
45
|
+
.env
|
|
46
|
+
*.log
|
|
47
|
+
|
|
48
|
+
# Tests
|
|
49
|
+
.pytest_cache/
|
|
50
|
+
.ruff_cache/
|
|
51
|
+
.coverage
|
|
52
|
+
htmlcov/
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
## Code navigation
|
|
2
|
+
|
|
3
|
+
This repo has a code index exposed via MCP (tools: `code_search`, `symbol_lookup`,
|
|
4
|
+
`file_outline`, `get_symbol_body`, `callers`, `callees`, `dependents`,
|
|
5
|
+
`dependencies`).
|
|
6
|
+
|
|
7
|
+
- For ANY discovery task ("where is X", "what calls Y", "find the code that..."),
|
|
8
|
+
use the index tools. Do not Read or Grep to explore.
|
|
9
|
+
- Read files only AFTER the index has identified the exact path.
|
|
10
|
+
- Use `file_outline` instead of Read when you only need to know what's in a file.
|
|
11
|
+
- For exact identifiers (camelCase, snake_case, ALL_CAPS), use `symbol_lookup`.
|
|
12
|
+
For conceptual queries, use `code_search`.
|
|
13
|
+
- The index auto-updates on `Edit`/`Write`/`MultiEdit`; you don't need to refresh it.
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Achref Tlili
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,155 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: mcp-code-index
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: SQLite-backed code index for Claude Code, exposed via MCP
|
|
5
|
+
Project-URL: Homepage, https://github.com/achreftlili/code-index
|
|
6
|
+
Project-URL: Repository, https://github.com/achreftlili/code-index
|
|
7
|
+
Project-URL: Issues, https://github.com/achreftlili/code-index/issues
|
|
8
|
+
Author: Achref Tlili
|
|
9
|
+
License-Expression: MIT
|
|
10
|
+
License-File: LICENSE
|
|
11
|
+
Requires-Python: >=3.10
|
|
12
|
+
Requires-Dist: click>=8.1.0
|
|
13
|
+
Requires-Dist: httpx>=0.27.0
|
|
14
|
+
Requires-Dist: mcp>=1.0.0
|
|
15
|
+
Requires-Dist: numpy>=1.26.0
|
|
16
|
+
Requires-Dist: pathspec>=0.12.0
|
|
17
|
+
Requires-Dist: python-dotenv>=1.0.0
|
|
18
|
+
Requires-Dist: sqlite-vec>=0.1.0
|
|
19
|
+
Requires-Dist: tree-sitter-go>=0.23.0
|
|
20
|
+
Requires-Dist: tree-sitter-python>=0.23.0
|
|
21
|
+
Requires-Dist: tree-sitter-rust>=0.23.0
|
|
22
|
+
Requires-Dist: tree-sitter-typescript>=0.23.0
|
|
23
|
+
Requires-Dist: tree-sitter>=0.23.0
|
|
24
|
+
Requires-Dist: voyageai>=0.3.0
|
|
25
|
+
Requires-Dist: watchdog>=4.0.0
|
|
26
|
+
Provides-Extra: dev
|
|
27
|
+
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
|
|
28
|
+
Requires-Dist: pytest>=8.0.0; extra == 'dev'
|
|
29
|
+
Requires-Dist: ruff>=0.5.0; extra == 'dev'
|
|
30
|
+
Description-Content-Type: text/markdown
|
|
31
|
+
|
|
32
|
+
# code-index
|
|
33
|
+
|
|
34
|
+
<!-- mcp-name: io.github.achreftlili/code-index -->
|
|
35
|
+
|
|
36
|
+
A SQLite-backed code index for Claude Code, exposed via MCP. Replaces exploratory
|
|
37
|
+
`Read`/`Grep`/`Glob` calls with targeted retrieval.
|
|
38
|
+
|
|
39
|
+
## What it does
|
|
40
|
+
|
|
41
|
+
- **Parses** your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
|
|
42
|
+
- **Chunks** per symbol; expands identifiers (`getUserAuthToken` → `get user auth token`).
|
|
43
|
+
- **Embeds** with Voyage `voyage-code-3` (default) or local Ollama.
|
|
44
|
+
- **Stores** symbols, chunks, vectors, and call/import edges in `.claude/index.db`.
|
|
45
|
+
- **Serves** retrieval over MCP — 8 retrieval tools + 1 admin tool (see below).
|
|
46
|
+
- **Auto-updates** via a Claude Code `PostToolUse` hook and an optional file watcher.
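The identifier expansion mentioned in the **Chunks** bullet can be illustrated with a short sketch. This is an assumption about how such expansion might work, not the code in `chunker.py`; the function name `expand_identifier` and the regular expression are hypothetical.

```python
import re

# Word pieces inside one identifier segment:
#   an acronym followed by a capitalized word ("HTTP" in "HTTPServer"),
#   an optionally capitalized lowercase word ("get", "User"),
#   a trailing acronym ("JWT"), or a run of digits.
_WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def expand_identifier(name: str) -> str:
    """Split camelCase, snake_case, and ALL_CAPS names into lowercase words."""
    words = []
    # Split on underscores and any other non-word characters first.
    for segment in re.split(r"[_\W]+", name):
        words.extend(_WORD_RE.findall(segment))
    return " ".join(word.lower() for word in words)
```

For example, `expand_identifier("getUserAuthToken")` returns `"get user auth token"`, matching the example in the bullet.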
|
|
47
|
+
|
|
48
|
+
## Tools
|
|
49
|
+
|
|
50
|
+
| Tool | Purpose |
|
|
51
|
+
| ----------------- | -------------------------------------------------------------------------------------------------------- |
|
|
52
|
+
| `init` | Build or refresh the project's index. Incremental by default; `force=true` rebuilds from scratch. |
|
|
53
|
+
| `code_search` | Hybrid (vector + FTS) search for **conceptual** queries (e.g., "auth flow", "where do we parse JSON"). |
|
|
54
|
+
| `symbol_lookup` | Exact-name lookup of functions / classes / methods / types. Prefer over `code_search` for identifiers. |
|
|
55
|
+
| `file_outline` | Symbols (with signatures) in a file, in source order. Use instead of `Read` when you only need shape. |
|
|
56
|
+
| `get_symbol_body` | Full chunk for a `symbol_id` returned by `symbol_lookup` or `code_search`. |
|
|
57
|
+
| `callers` | Symbols that CALL the given symbol. `depth` (1-5) expands transitively. |
|
|
58
|
+
| `callees` | Symbols that the given symbol CALLS. `depth` (1-5) expands transitively. |
|
|
59
|
+
| `dependents` | Files that import the given file. |
|
|
60
|
+
| `dependencies` | Files that the given file imports. |
|
|
61
|
+
|
|
62
|
+
All tools return bounded JSON; large bodies use `get_symbol_body` rather than
|
|
63
|
+
inlining whole files.
|
|
64
|
+
|
|
65
|
+
## Requirements
|
|
66
|
+
|
|
67
|
+
Python with **loadable SQLite extension support** (required by `sqlite-vec`).
|
|
68
|
+
Python 3.13 has this enabled by default. For 3.10–3.12, use either:
|
|
69
|
+
- the python.org installer, or
|
|
70
|
+
- pyenv: `PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x`
|
|
71
|
+
|
|
72
|
+
## Install
|
|
73
|
+
|
|
74
|
+
### In Claude Code (primary)
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
# 1. Set your embedder API key (Voyage default; for local Ollama see Configuration)
|
|
78
|
+
export VOYAGE_API_KEY=...
|
|
79
|
+
|
|
80
|
+
# 2. Register the MCP server. uvx clones, builds, and runs — no global install.
|
|
81
|
+
claude mcp add code-index -- uvx --from git+https://github.com/achreftlili/code-index code-index-mcp
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
That's it. Open Claude Code in your repo and ask:
|
|
85
|
+
|
|
86
|
+
> _"Build the code index for this repo."_
|
|
87
|
+
|
|
88
|
+
Claude calls the `init` MCP tool, which writes `.claude/index.db`. Subsequent
|
|
89
|
+
prompts can use `code_search`, `symbol_lookup`, `callers`, etc. — see
|
|
90
|
+
**Tools** above for the full surface.
|
|
91
|
+
|
|
92
|
+
#### Or, with a permanent install
|
|
93
|
+
|
|
94
|
+
```bash
|
|
95
|
+
pip install mcp-code-index
|
|
96
|
+
claude mcp add code-index -- code-index-mcp
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
#### Optional: keep the index live as you edit
|
|
100
|
+
|
|
101
|
+
Without the hook below, the index drifts when files change outside the agent
|
|
102
|
+
(`mv`, `git checkout`, IDE saves) until you call `init` again. With it, every
|
|
103
|
+
`Edit` / `Write` / `MultiEdit` Claude performs triggers an incremental reindex
|
|
104
|
+
of the touched file:
|
|
105
|
+
|
|
106
|
+
```bash
|
|
107
|
+
git clone https://github.com/achreftlili/code-index ~/code-index
|
|
108
|
+
~/code-index/scripts/install-hook.sh /path/to/your/repo # idempotent
|
|
109
|
+
```
|
|
110
|
+
|
|
111
|
+
### In other MCP-compatible agents
|
|
112
|
+
|
|
113
|
+
The server speaks standard MCP over stdio, so any client that supports MCP
|
|
114
|
+
servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to
|
|
115
|
+
launch `code-index-mcp` (after `pip install`) or the `uvx --from git+…`
|
|
116
|
+
command above. Once connected, call the `init` tool from inside the client
|
|
117
|
+
to bootstrap the index.
|
|
118
|
+
|
|
119
|
+
### From source (development)
|
|
120
|
+
|
|
121
|
+
```bash
|
|
122
|
+
git clone https://github.com/achreftlili/code-index
|
|
123
|
+
cd code-index
|
|
124
|
+
pip install -e .
|
|
125
|
+
code-index init # CLI alternative to the `init` MCP tool
|
|
126
|
+
code-index-mcp # starts the MCP server on stdio (for manual wiring)
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
## Configuration
|
|
130
|
+
|
|
131
|
+
Environment variables:
|
|
132
|
+
|
|
133
|
+
| Var | Default | Notes |
|
|
134
|
+
|---|---|---|
|
|
135
|
+
| `CODE_INDEX_DB` | `.claude/index.db` | SQLite path. |
|
|
136
|
+
| `CODE_INDEX_EMBEDDER` | `voyage` | `voyage` or `ollama`. |
|
|
137
|
+
| `CODE_INDEX_EMBED_MODEL` | `voyage-code-3` | Model name. |
|
|
138
|
+
| `CODE_INDEX_EMBED_DIM` | `1024` | Must match the model. |
|
|
139
|
+
| `VOYAGE_API_KEY` | — | Required when `CODE_INDEX_EMBEDDER=voyage`. |
|
|
140
|
+
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server. |
|
|
141
|
+
|
|
142
|
+
## Layout
|
|
143
|
+
|
|
144
|
+
```
|
|
145
|
+
src/code_index/
|
|
146
|
+
db.py SQLite schema, connection, sqlite-vec loading
|
|
147
|
+
parser.py Tree-sitter wrapper, symbol + edge extraction
|
|
148
|
+
chunker.py Per-symbol chunks, identifier expansion
|
|
149
|
+
embedder.py Voyage / Ollama backends
|
|
150
|
+
indexer.py Pipeline: walk → parse → chunk → embed → write
|
|
151
|
+
retriever.py Hybrid search (vector + FTS5) with RRF
|
|
152
|
+
watcher.py File watcher (watchdog)
|
|
153
|
+
mcp_server.py 8 retrieval tools + 1 admin tool (`init`)
|
|
154
|
+
cli.py init / reindex / watch / stats
|
|
155
|
+
```
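The layout names reciprocal rank fusion (RRF) as the way `retriever.py` merges vector and FTS5 results. A generic sketch of RRF follows. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the function name and the string chunk ids; the package's actual parameters may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def reciprocal_rank_fusion(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists into one, best first.

    Each item scores the sum over lists of 1 / (k + rank), with rank
    starting at 1. Items missing from a list contribute nothing for it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # hypothetical chunk ids from vector search
    fts_hits = ["b", "c", "d"]     # hypothetical chunk ids from FTS5
    print(reciprocal_rank_fusion([vector_hits, fts_hits]))
```

Items that appear in both lists accumulate two terms, so agreement between the two searches tends to outrank a single high placement.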
|