purecontext-mcp 1.1.0 → 1.1.1
- package/AGENT_INSTRUCTIONS.md +509 -0
- package/AGENT_INSTRUCTIONS_SHORT.md +97 -0
- package/CHANGELOG.md +212 -0
- package/docs/01-introduction.md +69 -0
- package/docs/02-installation.md +267 -0
- package/docs/03-quick-start.md +135 -0
- package/docs/04-configuration.md +214 -0
- package/docs/05-cli-reference.md +130 -0
- package/docs/06-tools-reference.md +499 -0
- package/docs/07-language-support.md +88 -0
- package/docs/08-framework-adapters.md +324 -0
- package/docs/09-dependency-graph.md +182 -0
- package/docs/10-semantic-search.md +153 -0
- package/docs/11-search-quality.md +110 -0
- package/docs/12-ai-summarization.md +106 -0
- package/docs/13-token-savings.md +110 -0
- package/docs/14-transport-modes.md +167 -0
- package/docs/15-team-setup.md +251 -0
- package/docs/16-docker.md +186 -0
- package/docs/17-web-ui.md +157 -0
- package/docs/18-git-history.md +157 -0
- package/docs/19-cross-repo.md +177 -0
- package/docs/20-architecture-analysis.md +228 -0
- package/docs/21-ecosystem-tools.md +189 -0
- package/docs/22-distribution.md +240 -0
- package/docs/23-performance.md +121 -0
- package/docs/24-security.md +144 -0
- package/docs/25-architecture-overview.md +240 -0
- package/docs/26-troubleshooting.md +234 -0
- package/docs/27-api-stability.md +114 -0
- package/docs/README.md +71 -0
- package/docs/dev/API_STABILITY.md +319 -0
- package/docs/dev/DECISIONS.md +22 -0
- package/docs/dev/DOCUMENTATION_PLAN.md +113 -0
- package/docs/dev/PHASE10_TASKS.md +476 -0
- package/docs/dev/PHASE11_TASKS.md +385 -0
- package/docs/dev/PHASE12_TASKS.md +335 -0
- package/docs/dev/PHASE13_TASKS.md +381 -0
- package/docs/dev/PHASE14_TASKS.md +371 -0
- package/docs/dev/PHASE15_TASKS.md +256 -0
- package/docs/dev/PHASE16_TASKS.md +314 -0
- package/docs/dev/PHASE17_TASKS.md +321 -0
- package/docs/dev/PHASE18_TASKS.md +345 -0
- package/docs/dev/PHASE19_TASKS.md +261 -0
- package/docs/dev/PHASE1_TASKS.md +443 -0
- package/docs/dev/PHASE20_TASKS.md +280 -0
- package/docs/dev/PHASE21_TASKS.md +355 -0
- package/docs/dev/PHASE22_TASKS.md +371 -0
- package/docs/dev/PHASE23_TASKS.md +274 -0
- package/docs/dev/PHASE24_TASKS.md +326 -0
- package/docs/dev/PHASE25_TASKS.md +452 -0
- package/docs/dev/PHASE26_TASKS.md +253 -0
- package/docs/dev/PHASE27_TASKS.md +410 -0
- package/docs/dev/PHASE2_TASKS.md +328 -0
- package/docs/dev/PHASE3_TASKS.md +571 -0
- package/docs/dev/PHASE4_TASKS.md +531 -0
- package/docs/dev/PHASE5_TASKS.md +835 -0
- package/docs/dev/PHASE6_TASKS.md +347 -0
- package/docs/dev/PHASE7_TASKS.md +257 -0
- package/docs/dev/PHASE8_TASKS.md +299 -0
- package/docs/dev/PHASE9_TASKS.md +320 -0
- package/docs/dev/PureContext_MCP_PRD_v1.0.docx +0 -0
- package/docs/dev/SELF_HOSTING.md +142 -0
- package/docs/dev/TEAM_SETUP.md +316 -0
- package/docs/dev/TELEMETRY.md +99 -0
- package/docs/dev/feature-analysis.md +305 -0
- package/docs/dev/phase-1-notes.md +3 -0
- package/guide/README.md +57 -0
- package/guide/ai-summaries.md +127 -0
- package/guide/code-health.md +190 -0
- package/guide/code-history.md +149 -0
- package/guide/finding-code.md +157 -0
- package/guide/navigating-new-code.md +121 -0
- package/guide/safe-changes.md +156 -0
- package/guide/team-setup.md +191 -0
- package/guide/web-ui.md +154 -0
- package/guide/why-purecontext.md +73 -0
- package/guide/workflow-onboarding.md +114 -0
- package/guide/workflow-pr-review.md +199 -0
- package/guide/workflow-refactoring.md +172 -0
- package/package.json +9 -2
|
@@ -0,0 +1,509 @@
|
|
|
1
|
+
# PureContext MCP — AI Agent Instructions
|
|
2
|
+
|
|
3
|
+
These instructions tell AI agents how to use PureContext MCP correctly for token-efficient code navigation. Add this file to your agent's rules (CLAUDE.md, Windsurf rules, Cursor rules, etc.).
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## What PureContext MCP is
|
|
8
|
+
|
|
9
|
+
PureContext MCP is a structured code navigation server. It indexes a codebase using tree-sitter AST parsing, stores symbol metadata in SQLite, and exposes MCP tools so you can retrieve precisely the code you need — without reading entire files.
|
|
10
|
+
|
|
11
|
+
**Token savings:** Retrieving a 45-line function by name costs ~150 tokens. Reading the 800-line file it lives in costs ~2,000 tokens, so the symbol lookup uses roughly 92% fewer tokens in this example. Across typical navigation tasks, PureContext saves 88–98% of context tokens.
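The arithmetic behind the example figures can be checked directly. This is a minimal sketch using only the two token counts quoted above; `savingsPercent` is a hypothetical helper for illustration and is not part of PureContext.

```typescript
// Percentage of tokens saved by fetching one symbol instead of its whole file.
// Both inputs are rough estimates; the function name is hypothetical.
function savingsPercent(symbolTokens: number, fileTokens: number): number {
  if (fileTokens <= 0) {
    throw new RangeError("fileTokens must be positive");
  }
  return ((fileTokens - symbolTokens) * 100) / fileTokens;
}

// Figures quoted above: ~150 tokens for the symbol, ~2,000 for the file.
const exampleSavings = savingsPercent(150, 2000); // 92.5
```

The example lands inside the quoted 88–98% range; larger files or smaller symbols push the figure toward the upper end.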
|
|
12
|
+
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
## Mandatory workflow — always follow this order
|
|
16
|
+
|
|
17
|
+
### Step 1 — Check if the project is indexed
|
|
18
|
+
|
|
19
|
+
Before doing any code navigation, call `list_repos` to see what is already indexed.
|
|
20
|
+
|
|
21
|
+
```
|
|
22
|
+
list_repos()
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
If the current project is not in the list, index it first:
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
index_folder({ path: "/absolute/path/to/project" })
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
**Never skip this step.** Every navigation tool requires a `repoId`. Both `list_repos` and `index_folder` return it; save the value and pass it in every subsequent call.
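The decision in Step 1 can be sketched as a small pure function. This is an illustration under assumptions: the text only names `repoId` in the `list_repos` output, so the `path` field name, the `IndexedRepo` and `NextStep` types, and `planFirstStep` itself are hypothetical.

```typescript
// One entry of the list_repos result. `path` is an assumed field name.
interface IndexedRepo {
  repoId: string;
  path: string;
}

type NextStep =
  | { action: "reuse"; repoId: string }
  | { action: "index"; tool: "index_folder"; args: { path: string } };

// Decide whether an existing repoId can be reused or whether
// index_folder has to run first.
function planFirstStep(repos: IndexedRepo[], projectPath: string): NextStep {
  // Ignore trailing slashes so "/work/app" and "/work/app/" compare equal.
  const normalize = (p: string): string => p.replace(/[\\\/]+$/, "");
  const wanted = normalize(projectPath);
  const match = repos.find((repo) => normalize(repo.path) === wanted);
  return match
    ? { action: "reuse", repoId: match.repoId }
    : { action: "index", tool: "index_folder", args: { path: projectPath } };
}
```

Keeping the decision separate from the MCP call makes it easy to confirm that an agent never reaches a navigation tool without a `repoId`.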
|
|
32
|
+
|
|
33
|
+
### Step 2 — Navigate by symbol, not by file
|
|
34
|
+
|
|
35
|
+
Do **not** read entire files to find code. Use the tools:
|
|
36
|
+
|
|
37
|
+
| Goal | Tool to use |
|
|
38
|
+
|------|-------------|
|
|
39
|
+
| Find a function/class/method by name | `search_symbols` |
|
|
40
|
+
| Find code by what it does | `search_semantic` |
|
|
41
|
+
| Find a literal string, comment, or config value | `search_text` |
|
|
42
|
+
| See all symbols in one file | `get_file_outline` |
|
|
43
|
+
| See the whole project structure | `get_repo_outline` |
|
|
44
|
+
|
|
45
|
+
### Step 3 — Read summaries before fetching source
|
|
46
|
+
|
|
47
|
+
`search_symbols` returns signatures and summaries — **no source code**. This is intentional. Read the `summary` field first to decide whether a symbol is relevant. Fetch the implementation only for symbols you will actually work with:
|
|
48
|
+
|
|
49
|
+
```
|
|
50
|
+
get_symbol_source({ repoId, symbolId })
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
Do not call `get_symbol_source` for every result in a search. Summaries let you navigate without reading source, which uses 10–50× fewer tokens on typical lookups.
|
|
54
|
+
|
|
55
|
+
**Trust but verify:** summaries describe intent, not contract. For modification tasks, always read the source after using the summary to navigate. An AI-generated summary describes what a function is meant to do — source code is ground truth.
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Tool reference — when to use each tool
|
|
60
|
+
|
|
61
|
+
### Indexing tools
|
|
62
|
+
|
|
63
|
+
#### `list_repos`
|
|
64
|
+
Always call this first. Returns all indexed repos with their `repoId`, path, file count, and last indexed time.
|
|
65
|
+
|
|
66
|
+
#### `index_folder`
|
|
67
|
+
Index a local directory. Returns `repoId`. Re-indexing is incremental — only changed files are re-parsed. Call it again if files have changed since the last index.
|
|
68
|
+
|
|
69
|
+
**Parameters:**
|
|
70
|
+
- `path` (required) — absolute path to project root
|
|
71
|
+
- `force` (optional) — set `true` to force re-index of all files, even unchanged ones
|
|
72
|
+
- `fileLimit` (optional) — override the configured file limit for this run
|
|
73
|
+
|
|
74
|
+
#### `resolve_repo`
|
|
75
|
+
Convert a local path to its `repoId` without indexing. Use this when you know the project is already indexed but don't have the `repoId` at hand.
|
|
76
|
+
|
|
77
|
+
#### `invalidate_cache`
|
|
78
|
+
Force a full re-index by clearing content hashes. Use when the index seems stale and `index_folder` is not picking up changes.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
### Symbol search & retrieval
|
|
83
|
+
|
|
84
|
+
#### `search_symbols` — primary navigation tool
|
|
85
|
+
Search by name fragment. Use this for almost all navigation tasks.
|
|
86
|
+
|
|
87
|
+
```json
|
|
88
|
+
{
|
|
89
|
+
"repoId": "a1b2c3d4e5f60001",
|
|
90
|
+
"query": "authenticate",
|
|
91
|
+
"kind": "function",
|
|
92
|
+
"limit": 10
|
|
93
|
+
}
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
- Returns signatures and summaries — **no source code**
|
|
97
|
+
- Use the `kind` filter to narrow results: `function`, `class`, `method`, `route`, `component`, `hook`, `middleware`, etc.
|
|
98
|
+
- `camelCase`, `snake_case`, and space-separated queries are equivalent: `processOrder`, `process_order`, and `process order` return the same results
|
|
99
|
+
- Use `mode: "hybrid"` for best recall when unsure of the exact name
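The naming-style equivalence in the list above can be pictured as a tokenization step. The sketch below only illustrates the idea; it is not PureContext's actual tokenizer, and `queryTokens` and `sameQuery` are hypothetical names.

```typescript
// Split a query into lowercase word tokens so that different naming styles
// compare equal. Illustrative only.
function queryTokens(query: string): string[] {
  return query
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // break camelCase boundaries
    .split(/[\s_]+/) // split on whitespace and underscores
    .filter((part) => part.length > 0)
    .map((part) => part.toLowerCase());
}

// Two queries are equivalent when they produce the same token sequence.
function sameQuery(a: string, b: string): boolean {
  return queryTokens(a).join(" ") === queryTokens(b).join(" ");
}
```

Under this scheme `processOrder`, `process_order`, and `process order` all reduce to the tokens `process` and `order`.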
|
|
100
|
+
|
|
101
|
+
#### `search_semantic`
|
|
102
|
+
Search by meaning, not name. Use when you know what the code does but not what it is called.
|
|
103
|
+
|
|
104
|
+
```json
|
|
105
|
+
{
|
|
106
|
+
"repoId": "...",
|
|
107
|
+
"query": "function that validates user credentials and returns a session token",
|
|
108
|
+
"mode": "hybrid",
|
|
109
|
+
"max_results": 10
|
|
110
|
+
}
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
Requires semantic search to be enabled in config. Falls back to FTS5 keyword search automatically if the HNSW index is not available.
|
|
114
|
+
|
|
115
|
+
#### `search_text`
|
|
116
|
+
Grep-style full-text search across file content. Use for finding literal strings, error messages, config values, comments, or anything that is not a symbol name.
|
|
117
|
+
|
|
118
|
+
```json
|
|
119
|
+
{
|
|
120
|
+
"repoId": "...",
|
|
121
|
+
"query": "TODO: fix this",
|
|
122
|
+
"context_lines": 3
|
|
123
|
+
}
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Do **not** use `search_text` when you are looking for a function or class — use `search_symbols` instead. `search_text` searches raw file content, not the symbol index.
|
|
127
|
+
|
|
128
|
+
#### `get_symbol_source`
|
|
129
|
+
Retrieve the source code of a specific symbol by its ID.
|
|
130
|
+
|
|
131
|
+
```json
|
|
132
|
+
{
|
|
133
|
+
"repoId": "...",
|
|
134
|
+
"symbolId": "8f3a2c1d0e4b5f9a",
|
|
135
|
+
"context_lines": 2
|
|
136
|
+
}
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
- `symbolId` comes from `search_symbols` or `get_file_outline` results
|
|
140
|
+
- Use `context_lines` to include surrounding lines for additional context
|
|
141
|
+
- Use `verify: true` when you need to confirm the source on disk matches the index (after recent file edits)
|
|
142
|
+
|
|
143
|
+
#### `get_symbols`
|
|
144
|
+
Batch-fetch multiple symbols by ID in a single call. Prefer this over calling `get_symbol_source` repeatedly when you need several symbols.
|
|
145
|
+
|
|
146
|
+
```json
|
|
147
|
+
{
|
|
148
|
+
"repoId": "...",
|
|
149
|
+
"symbolIds": ["id1", "id2", "id3"]
|
|
150
|
+
}
|
|
151
|
+
```
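Choosing between a single fetch and a batch can be reduced to a small helper. This sketch uses the tool and parameter names shown above; the `FetchCall` type and `buildFetchCall` function are hypothetical.

```typescript
type FetchCall =
  | { tool: "get_symbol_source"; args: { repoId: string; symbolId: string } }
  | { tool: "get_symbols"; args: { repoId: string; symbolIds: string[] } };

// Build the request for the symbols you decided to read: nothing for an
// empty selection, a single fetch for one ID, one batch call otherwise.
function buildFetchCall(repoId: string, symbolIds: string[]): FetchCall | null {
  const unique = Array.from(new Set(symbolIds)); // drop duplicate IDs
  const [first, ...rest] = unique;
  if (first === undefined) return null;
  if (rest.length === 0) {
    return { tool: "get_symbol_source", args: { repoId, symbolId: first } };
  }
  return { tool: "get_symbols", args: { repoId, symbolIds: unique } };
}
```

Deduplicating first means a repeated ID never costs a second payload.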
|
|
152
|
+
|
|
153
|
+
#### `get_file_content`
|
|
154
|
+
Retrieve raw file content with optional line range. Use only when you need to read a section of a file that is not a named symbol — for example, top-level imports, configuration blocks, or non-symbol prose.
|
|
155
|
+
|
|
156
|
+
```json
|
|
157
|
+
{
|
|
158
|
+
"repoId": "...",
|
|
159
|
+
"filePath": "src/config/settings.ts",
|
|
160
|
+
"startLine": 1,
|
|
161
|
+
"endLine": 40
|
|
162
|
+
}
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
Do **not** use `get_file_content` as a substitute for `get_symbol_source`. Always prefer symbol-level retrieval.
|
|
166
|
+
|
|
167
|
+
#### `get_file_outline`
|
|
168
|
+
All symbols in a single file with signatures and summaries. Use to survey a file without reading its content.
|
|
169
|
+
|
|
170
|
+
#### `get_repo_outline`
|
|
171
|
+
All files in the repo with their top-level symbols. Use to orient yourself in an unfamiliar project.
|
|
172
|
+
|
|
173
|
+
#### `get_file_tree`
|
|
174
|
+
Directory tree with file counts. Use when you need to understand the project's folder structure.
|
|
175
|
+
|
|
176
|
+
#### `find_references`
|
|
177
|
+
Find all usage sites (call sites, references) for a symbol across the repo. Use before renaming or modifying a symbol to understand all places that use it.
|
|
178
|
+
|
|
179
|
+
---
|
|
180
|
+
|
|
181
|
+
### Dependency graph tools
|
|
182
|
+
|
|
183
|
+
#### `get_context_bundle`
|
|
184
|
+
Forward-walk from a symbol — returns the symbol and everything it transitively imports. Use **before modifying a function** to understand its full context.
|
|
185
|
+
|
|
186
|
+
```json
|
|
187
|
+
{
|
|
188
|
+
"repoId": "...",
|
|
189
|
+
"symbolId": "...",
|
|
190
|
+
"maxDepth": 2,
|
|
191
|
+
"maxTokens": 4000
|
|
192
|
+
}
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
Use `maxTokens` to cap the response size when working with deeply connected code.
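How `maxDepth` and `maxTokens` interact can be shown on a toy graph. The sketch below is a generic breadth-first forward walk with a token budget; how PureContext actually traverses and truncates is not described here, so the skip-when-over-budget policy, the `Graph` type, and `sketchContextBundle` are all assumptions.

```typescript
// Toy dependency graph: each symbol ID maps to the IDs it imports.
type Graph = Record<string, string[]>;

// Breadth-first forward walk from `start`, stopping after `maxDepth` hops and
// skipping any symbol that would push the running total past `maxTokens`.
function sketchContextBundle(
  imports: Graph,
  tokenCost: Record<string, number>,
  start: string,
  maxDepth: number,
  maxTokens: number,
): { symbols: string[]; tokens: number } {
  const symbols: string[] = [];
  const seen = new Set<string>([start]);
  let frontier: string[] = [start];
  let tokens = 0;

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      const cost = tokenCost[id] ?? 0;
      if (tokens + cost > maxTokens) continue; // over budget: leave it out
      tokens += cost;
      symbols.push(id);
      for (const dep of imports[id] ?? []) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    frontier = next;
  }
  return { symbols, tokens };
}
```

A low `maxDepth` bounds how far the walk goes; a low `maxTokens` bounds how much it returns, which matters most for hub symbols with many imports.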
|
|
196
|
+
|
|
197
|
+
#### `get_blast_radius`
|
|
198
|
+
Reverse-walk — all files that transitively import a symbol. Use **before modifying or deleting a symbol** to understand what would break.
|
|
199
|
+
|
|
200
|
+
```json
|
|
201
|
+
{
|
|
202
|
+
"repoId": "...",
|
|
203
|
+
"symbolId": "...",
|
|
204
|
+
"maxDepth": 5
|
|
205
|
+
}
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
#### `find_importers`
|
|
209
|
+
Direct (one-hop) importers of a file. Faster than `get_blast_radius` when you only need the immediate importers rather than the full transitive set.
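The difference between a one-hop lookup and a transitive reverse walk is easiest to see on a toy import graph. This is a generic sketch, not PureContext's implementation; `ImportGraph` and `sketchImporters` are hypothetical names.

```typescript
// Toy import graph: each file maps to the files it imports.
type ImportGraph = Record<string, string[]>;

// Files that reach `target` through at most `maxDepth` import hops.
// maxDepth = 1 returns the direct importers only.
function sketchImporters(imports: ImportGraph, target: string, maxDepth: number): string[] {
  // Invert the graph once: file -> files that import it.
  const importedBy = new Map<string, string[]>();
  for (const file of Object.keys(imports)) {
    for (const dep of imports[file] ?? []) {
      const list = importedBy.get(dep) ?? [];
      list.push(file);
      importedBy.set(dep, list);
    }
  }

  const found: string[] = [];
  const seen = new Set<string>([target]);
  let frontier: string[] = [target];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const importer of importedBy.get(file) ?? []) {
        if (!seen.has(importer)) {
          seen.add(importer);
          found.push(importer);
          next.push(importer);
        }
      }
    }
    frontier = next;
  }
  return found;
}
```

With `maxDepth` set to 1 the walk behaves like a direct-importer lookup; raising it widens the result toward the full blast radius.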
|
|
210
|
+
|
|
211
|
+
#### `find_dead_code`
|
|
212
|
+
Exported symbols that nothing else imports. Use for cleanup audits. Note: may produce false positives for dynamic imports and symbols consumed by external npm consumers.
|
|
213
|
+
|
|
214
|
+
---
|
|
215
|
+
|
|
216
|
+
### Architecture & quality tools
|
|
217
|
+
|
|
218
|
+
#### `get_layer_violations`
|
|
219
|
+
Detect architectural import boundary violations. Use when enforcing layered architecture rules.
|
|
220
|
+
|
|
221
|
+
#### `get_quality_metrics`
|
|
222
|
+
Per-file complexity, coupling, cohesion, and documentation coverage scores. Always use this instead of making subjective assessments from reading source code. Treat complexity scores as directional signals — cyclomatic complexity is estimated from symbol count and nesting depth, not exact AST branch-counting.
|
|
223
|
+
|
|
224
|
+
#### `detect_antipatterns`
|
|
225
|
+
Detect common architectural anti-patterns (god classes, circular dependencies, dead code) across the repo. Returns structured results with severity levels and actionable locations. Only detects static patterns — cannot find runtime coupling or dynamic dispatch issues.
|
|
226
|
+
|
|
227
|
+
#### `get_architecture_doc`
|
|
228
|
+
Auto-generate an architecture summary in Markdown or Mermaid format. Requires `ai.allowRemoteAI: true`. Use early when onboarding to an unfamiliar codebase. The generated doc reflects the code as of the last index run because it derives from the actual index, not hand-written documentation; re-index first if files have changed.
|
|
229
|
+
|
|
230
|
+
**Pre-refactoring workflow:**
|
|
231
|
+
```
|
|
232
|
+
get_quality_metrics → find worst files
|
|
233
|
+
detect_antipatterns → find structural issues
|
|
234
|
+
get_blast_radius → understand impact scope
|
|
235
|
+
get_architecture_doc → generate "before" snapshot
|
|
236
|
+
[make changes]
|
|
237
|
+
detect_antipatterns → verify anti-patterns resolved
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
---
|
|
241
|
+
|
|
242
|
+
### Git & history tools
|
|
243
|
+
|
|
244
|
+
#### `get_symbol_history`
|
|
245
|
+
Symbol-level git commit history. Returns structured JSON with commits, authors, and diffs — no shell commands needed. Use to understand why a function was written the way it is, and to answer "who wrote this?" or "who should review this change?" without running `git log` or `git blame`.
|
|
246
|
+
|
|
247
|
+
**Limitations:** Rename/move breaks history continuity — symbols in renamed files start fresh history from the rename commit. After a rebase, run `invalidate_cache` + `index_folder` to rebuild accurate history.
|
|
248
|
+
|
|
249
|
+
#### `get_churn_metrics`
|
|
250
|
+
File and symbol churn metrics. Use to identify high-risk files before making changes. **Before modifying any symbol, check churn:** if `churnScore > 6`, mention this to the user and suggest extra testing. High churn usually means a file is either under active development (merge conflict risk) or chronically buggy (regression risk).
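The churn rule is a simple threshold check. The sketch below uses the `churnScore` field and the threshold of 6 quoted above; the `churnWarning` helper and its message wording are hypothetical.

```typescript
// Threshold quoted above: a churnScore above 6 deserves a warning.
const CHURN_WARNING_THRESHOLD = 6;

// Returns a message for the user, or null when no warning is needed.
// `symbolName` and the message text are illustrative.
function churnWarning(symbolName: string, churnScore: number): string | null {
  if (churnScore <= CHURN_WARNING_THRESHOLD) return null;
  return `${symbolName} has a churn score of ${churnScore}; consider extra testing before changing it.`;
}
```

A score of exactly 6 does not trigger the warning, matching the strict `>` in the rule.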
|
|
251
|
+
|
|
252
|
+
**For debugging:** Use `get_churn_metrics` to identify recently-changed symbols — recent changes are the most likely source of new bugs. This narrows the search space dramatically.
|
|
253
|
+
|
|
254
|
+
**Note:** The default `maxCommits: 500` cap means long-lived projects may lose early history. Increase `git.maxCommits` for history-sensitive workflows.
|
|
255
|
+
|
|
256
|
+
---
|
|
257
|
+
|
|
258
|
+
### Cross-repo tools
|
|
259
|
+
|
|
260
|
+
#### `search_cross_repo`
|
|
261
|
+
Search symbols across multiple indexed repositories simultaneously. Use for architectural questions like "which services handle email sending?" or "where is `UserProfile` defined?" — a single call replaces N per-repo queries.
|
|
262
|
+
|
|
263
|
+
#### `find_similar`
|
|
264
|
+
Find semantically similar code across repos using the HNSW vector index. **Before implementing new functionality**, call this to check if equivalent code already exists elsewhere in the organization. Requires semantic search enabled (`semantic.enabled: true` with a configured provider).
|
|
265
|
+
|
|
266
|
+
**Before modifying shared library code**, use `get_blast_radius` with `crossRepo: true` to understand the full downstream impact across all repos.
|
|
267
|
+
|
|
268
|
+
**Note:** `crossRepoDeps` requires explicit package name configuration — there is no auto-detection of Nx/Turborepo/Lerna workspaces. Monorepo packages must each be indexed separately with `index_folder`.
|
|
269
|
+
|
|
270
|
+
---
|
|
271
|
+
|
|
272
|
+
### Ecosystem & data tools
|
|
273
|
+
|
|
274
|
+
#### `search_columns`
|
|
275
|
+
Search column definitions across dbt models. Returns upstream/downstream lineage — not just where a column is defined, but the full chain from source tables through staging models to final fact tables. Use for data lineage questions like "where does the `revenue` column come from?"
|
|
276
|
+
|
|
277
|
+
**Note:** `search_columns` is dbt-only — it does not search columns in raw SQL `CREATE TABLE` statements. For those, use `get_symbol_source` on the `CREATE TABLE` symbol directly.
|
|
278
|
+
|
|
279
|
+
**dbt workflow notes:**
|
|
280
|
+
- Always run `index_folder` after `dbt compile` to ensure `manifest.json` is current — stale manifests produce incorrect column lineage.
|
|
281
|
+
- Use `get_context_bundle` to traverse dbt model dependencies just like code dependencies.
|
|
282
|
+
- Use `search_symbols` with `kind: "route"` to find API endpoints via the OpenAPI provider.
|
|
283
|
+
|
|
284
|
+
**Templating coverage:** Jinja preprocessing is implemented only for dbt's SQL dialect. Helm/Go templates, Ansible Jinja2, Kubernetes YAML, ERB, and Kustomize are not preprocessed — those files are indexed as raw text or skipped. Terraform is fully supported.
|
|
285
|
+
|
|
286
|
+
---
|
|
287
|
+
|
|
288
|
+
## Decision rules — which tool to pick
|
|
289
|
+
|
|
290
|
+
```
|
|
291
|
+
I need to find a symbol by name
|
|
292
|
+
→ search_symbols
|
|
293
|
+
|
|
294
|
+
I know what the code does but not its name
|
|
295
|
+
→ search_semantic (or search_symbols with mode: "hybrid")
|
|
296
|
+
|
|
297
|
+
I need to find a literal string, comment, or config value
|
|
298
|
+
→ search_text
|
|
299
|
+
|
|
300
|
+
I need the source code of a specific symbol
|
|
301
|
+
→ get_symbol_source (use symbolId from search_symbols)
|
|
302
|
+
|
|
303
|
+
I need source for several symbols at once
|
|
304
|
+
→ get_symbols (batch)
|
|
305
|
+
|
|
306
|
+
I need to understand a function's dependencies
|
|
307
|
+
→ get_context_bundle
|
|
308
|
+
|
|
309
|
+
I need to know what breaks if I change a symbol
|
|
310
|
+
→ get_blast_radius (before modifying)
|
|
311
|
+
→ find_references (for call sites specifically)
|
|
312
|
+
|
|
313
|
+
I need to survey a file's contents
|
|
314
|
+
→ get_file_outline
|
|
315
|
+
|
|
316
|
+
I need to understand the project layout
|
|
317
|
+
→ get_repo_outline or get_file_tree
|
|
318
|
+
|
|
319
|
+
I need a non-symbol section of a file (imports block, config)
|
|
320
|
+
→ get_file_content with startLine/endLine
|
|
321
|
+
```
|
|
322
|
+
|
|
323
|
+
---
|
|
324
|
+
|
|
325
|
+
## Anti-patterns — what NOT to do
|
|
326
|
+
|
|
327
|
+
**Do not read whole files to find a function.**
|
|
328
|
+
Use `search_symbols` + `get_symbol_source`. Reading an 800-line file to locate a 45-line function wastes ~1,850 tokens.
|
|
329
|
+
|
|
330
|
+
**Do not call `get_symbol_source` for every search result.**
|
|
331
|
+
Read the `signature` and `summary` from `search_symbols` first. Fetch source only for symbols you will actually work with.
|
|
332
|
+
|
|
333
|
+
**Do not skip `list_repos` at the start of a session.**
|
|
334
|
+
Nearly every tool call needs a `repoId`. Get it from `list_repos` or `index_folder`; do not guess.
|
|
335
|
+
|
|
336
|
+
**Do not use `search_text` for symbol lookups.**
|
|
337
|
+
`search_text` is a grep over raw file content. It is slower and less precise than `search_symbols` for finding named code entities.
|
|
338
|
+
|
|
339
|
+
**Do not use `get_file_content` as a fallback for reading whole files.**
|
|
340
|
+
If a symbol exists in the index, use `get_symbol_source`. Only use `get_file_content` for content that is not a named symbol.
|
|
341
|
+
|
|
342
|
+
**Do not ignore `_tokenEstimate` fields.**
|
|
343
|
+
Most responses include a `_tokenEstimate`. Use it to decide whether to fetch more context or stop.
|
|
344
|
+
|
|
345
|
+
---
|
|
346
|
+
|
|
347
|
+
## Efficient navigation patterns
|
|
348
|
+
|
|
349
|
+
### Pattern: understand an unfamiliar codebase
|
|
350
|
+
|
|
351
|
+
```
|
|
352
|
+
1. list_repos() → check if indexed
|
|
353
|
+
2. index_folder({ path }) → index if needed, get repoId
|
|
354
|
+
3. get_repo_outline({ repoId }) → survey the structure
|
|
355
|
+
4. search_symbols({ query: "main entry point concept" }) → locate key symbols
|
|
356
|
+
5. get_context_bundle({ symbolId }) → understand the entry + dependencies
|
|
357
|
+
```
|
|
358
|
+
|
|
359
|
+
### Pattern: modify a function safely
|
|
360
|
+
|
|
361
|
+
```
|
|
362
|
+
1. search_symbols({ query: "functionName", kind: "function" })
|
|
363
|
+
2. get_blast_radius({ symbolId }) → know the impact scope BEFORE touching it
|
|
364
|
+
3. get_context_bundle({ symbolId, maxDepth: 2 }) → understand its context
|
|
365
|
+
4. get_symbol_source({ symbolId }) → read the implementation
|
|
366
|
+
5. [make the change]
|
|
367
|
+
6. find_dead_code({ repoId }) → verify no orphaned exports left behind
|
|
368
|
+
```
|
|
369
|
+
|
|
370
|
+
### Pattern: find where something is called
|
|
371
|
+
|
|
372
|
+
```
|
|
373
|
+
1. search_symbols({ query: "symbolName" })
|
|
374
|
+
2. find_references({ symbolId }) → all call sites
|
|
375
|
+
3. get_symbol_source for relevant call sites
|
|
376
|
+
```
|
|
377
|
+
|
|
378
|
+
### Pattern: search when you know the concept but not the name
|
|
379
|
+
|
|
380
|
+
```
|
|
381
|
+
1. search_semantic({ query: "natural language description", mode: "hybrid" })
|
|
382
|
+
2. Review signatures and summaries in results
|
|
383
|
+
3. get_symbol_source for the best match
|
|
384
|
+
```
|
|
385
|
+
|
|
386
|
+
### Pattern: large batch of symbols
|
|
387
|
+
|
|
388
|
+
```
|
|
389
|
+
1. search_symbols({ query: "...", limit: 20 })
|
|
390
|
+
2. Filter results by signature/summary to pick the ones you need
|
|
391
|
+
3. get_symbols({ symbolIds: ["id1", "id2", "id3"] }) ← one call, not three
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
### Pattern: modify a high-risk symbol safely
|
|
395
|
+
|
|
396
|
+
```
|
|
397
|
+
1. search_symbols({ query: "functionName", kind: "function" })
|
|
398
|
+
2. get_churn_metrics({ repoId, symbolId }) → if churnScore > 6, warn user
|
|
399
|
+
3. get_symbol_history({ symbolId }) → understand recent change context
|
|
400
|
+
4. get_blast_radius({ symbolId }) → know full impact scope
|
|
401
|
+
5. get_context_bundle({ symbolId, maxDepth: 2 }) → understand dependencies
|
|
402
|
+
6. get_symbol_source({ symbolId }) → read the implementation
|
|
403
|
+
7. [make the change]
|
|
404
|
+
8. find_dead_code({ repoId }) → verify no orphaned exports
|
|
405
|
+
```
|
|
406
|
+
|
|
407
|
+
### Pattern: architecture review / onboarding
|
|
408
|
+
|
|
409
|
+
```
|
|
410
|
+
1. list_repos → index_folder if needed
|
|
411
|
+
2. get_architecture_doc({ repoId }) → generate project overview
|
|
412
|
+
3. get_quality_metrics({ repoId }) → identify weakest files
|
|
413
|
+
4. detect_antipatterns({ repoId }) → find structural issues
|
|
414
|
+
5. get_repo_outline({ repoId }) → survey specific areas
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
### Pattern: before implementing new functionality
|
|
418
|
+
|
|
419
|
+
```
|
|
420
|
+
1. find_similar({ query: "description", crossRepo: true }) → check for existing code
|
|
421
|
+
2. search_cross_repo({ query: "conceptName" }) → find related symbols across repos
|
|
422
|
+
3. get_blast_radius({ symbolId, crossRepo: true }) → understand cross-repo impact
|
|
423
|
+
```
|
|
424
|
+
|
|
425
|
+
### Pattern: debug a recent regression
|
|
426
|
+
|
|
427
|
+
```
|
|
428
|
+
1. get_churn_metrics({ repoId }) → find recently-changed files
|
|
429
|
+
2. get_symbol_history({ symbolId }) → check commits in the affected area
|
|
430
|
+
3. search_symbols in changed files → find the suspect functions
|
|
431
|
+
4. get_symbol_source → get_context_bundle → read and understand the change
|
|
432
|
+
```
|
|
433
|
+
|
|
434
|
+
### Pattern: PR review
|
|
435
|
+
|
|
436
|
+
```
|
|
437
|
+
1. [obtain list of changed files from PR]
|
|
438
|
+
2. get_symbol_history for changed symbols → understand prior context
|
|
439
|
+
3. get_churn_metrics for changed files → flag hotspots
|
|
440
|
+
4. get_blast_radius for each modified symbol → identify affected downstream code
|
|
441
|
+
5. detect_antipatterns({ repoId }) → flag new structural issues
|
|
442
|
+
```
|
|
443
|
+
|
|
444
|
+
---
|
|
445
|
+
|
|
446
|
+
## Search tips
|
|
447
|
+
|
|
448
|
+
- **camelCase and snake_case are equivalent** — `processOrder` and `process_order` return the same results.
|
|
449
|
+
- **Short queries rank better** — `auth` finds more than `authentication middleware function`.
|
|
450
|
+
- **Use `kind` to narrow results** — `kind: "function"` eliminates class/method noise.
|
|
451
|
+
- **Use `filePath` to scope** — `filePath: "src/auth/"` restricts to a directory.
|
|
452
|
+
- **Use `debug: true` to diagnose ranking** — shows BM25 scores and name boost factors.
|
|
453
|
+
- **For hybrid mode** — `semantic_weight: 0.6, keyword_weight: 0.4` is a good default when you are unsure of the exact name.
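To see what the two hybrid weights do, assume the final score is a linear blend of a semantic score and a keyword score. That formula is an assumption for illustration; the actual ranking function is not specified here, and `blendScores` is a hypothetical name.

```typescript
// Weighted blend of two relevance scores, each assumed to lie in [0, 1].
// Defaults mirror the semantic_weight / keyword_weight values quoted above.
function blendScores(
  semanticScore: number,
  keywordScore: number,
  semanticWeight = 0.6,
  keywordWeight = 0.4,
): number {
  return semanticWeight * semanticScore + keywordWeight * keywordScore;
}
```

Under this assumption, a symbol scoring 0.9 semantically and 0.2 on keywords blends to about 0.62, ahead of one scoring 0.3 and 0.9 (about 0.54), so the default weights favor meaning over exact naming.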
|
|
454
|
+
|
|
455
|
+
---
|
|
456
|
+
|
|
457
|
+
## Notes on `_tokenEstimate` and `_meta`
|
|
458
|
+
|
|
459
|
+
Every response includes:
|
|
460
|
+
|
|
461
|
+
```json
|
|
462
|
+
"_meta": {
|
|
463
|
+
"timing_ms": 3,
|
|
464
|
+
"tokens_saved": 1842,
|
|
465
|
+
"total_tokens_saved": 45231
|
|
466
|
+
}
|
|
467
|
+
```
|
|
468
|
+
|
|
469
|
+
Most responses also include `_tokenEstimate`, a rough count of tokens in the returned payload. Use this to:
|
|
470
|
+
- Decide whether to fetch additional context or stop
|
|
471
|
+
- Avoid hitting context limits by capping `maxTokens` in `get_context_bundle`
|
|
472
|
+
- Track cumulative savings with `get_savings_stats`
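One way to act on `_tokenEstimate` is to keep a running tally against a self-imposed budget. The sketch below is an agent-side illustration; the `TokenBudget` class and its policy are hypothetical and not part of PureContext.

```typescript
// Running tally of context spent, fed by the `_tokenEstimate` of each response.
class TokenBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Record a response's `_tokenEstimate`; responses without one count as 0.
  record(tokenEstimate?: number): void {
    this.used += tokenEstimate ?? 0;
  }

  // Tokens still available before the self-imposed limit is reached.
  remaining(): number {
    return Math.max(0, this.limit - this.used);
  }

  // Value to pass as `maxTokens` to get_context_bundle: never more than
  // what is left, and never more than the caller's preferred cap.
  capFor(preferredMaxTokens: number): number {
    return Math.min(preferredMaxTokens, this.remaining());
  }
}
```

When `capFor` returns 0, that is a signal to stop fetching and work with the context already gathered.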
|
|
473
|
+
|
|
474
|
+
---
|
|
475
|
+
|
|
476
|
+
## Keeping the index fresh
|
|
477
|
+
|
|
478
|
+
The file watcher triggers incremental re-indexing automatically on file changes. If you suspect the index is stale:
|
|
479
|
+
|
|
480
|
+
```
|
|
481
|
+
index_folder({ path, force: false }) → incremental (changed files only)
|
|
482
|
+
index_folder({ path, force: true }) → full re-index (all files)
|
|
483
|
+
invalidate_cache({ repoId }) → clear hashes, then index_folder
|
|
484
|
+
```
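The three options above form an escalation ladder, cheapest first. The sketch below encodes that ladder with the tool and parameter names shown above; the numbered levels, the `RefreshCall` type, and `refreshCalls` are hypothetical framing.

```typescript
type RefreshCall =
  | { tool: "index_folder"; args: { path: string; force: boolean } }
  | { tool: "invalidate_cache"; args: { repoId: string } };

// Calls to issue for each escalation level:
// 0 = incremental, 1 = forced full re-index, 2 = clear hashes then re-index.
function refreshCalls(level: 0 | 1 | 2, repoId: string, path: string): RefreshCall[] {
  const reindex = (force: boolean): RefreshCall => ({
    tool: "index_folder",
    args: { path, force },
  });
  if (level === 0) return [reindex(false)];
  if (level === 1) return [reindex(true)];
  // With hashes cleared, every file counts as changed on the next index run.
  return [{ tool: "invalidate_cache", args: { repoId } }, reindex(false)];
}
```

Starting at level 0 and escalating only when results still look stale avoids paying for a full re-parse unnecessarily.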
|
|
485
|
+
|
|
486
|
+
---
|
|
487
|
+
|
|
488
|
+
## Known limitations
|
|
489
|
+
|
|
490
|
+
These are documented gaps — understand them so you can work around them rather than being confused when a tool behaves unexpectedly.
|
|
491
|
+
|
|
492
|
+
| Area | Limitation | Workaround |
|
|
493
|
+
|------|-----------|-----------|
|
|
494
|
+
| **AI Summaries** | Summaries describe intent, not contract, and can stay stale until the next re-index. | Always verify with `get_symbol_source` before modifying. |
|
|
495
|
+
| **AI Summaries** | `get_architecture_doc` requires `ai.allowRemoteAI: true`. | `detect_antipatterns` and `get_quality_metrics` work without AI. |
|
|
496
|
+
| **Git History** | Rename/move breaks history continuity: prior history is lost after a rename. | None currently; `git log --follow` tracking is listed as future work. |
|
|
497
|
+
| **Git History** | Rebase invalidates commit hashes — re-index required after significant rebase. | Run `invalidate_cache` + `index_folder` post-rebase. |
|
|
498
|
+
| **Git History** | Default `maxCommits: 500` drops early history on long-lived projects. | Increase `git.maxCommits` in config for history-sensitive workflows. |
|
|
499
|
+
| **Git History** | No SVN/Mercurial/Perforce support. | Git is a hard requirement for history features. |
|
|
500
|
+
| **Cross-Repo** | `crossRepoDeps` is manual — no auto-detection of Nx/Turborepo/pnpm workspaces. | Explicitly list package names in each repo's config. |
|
|
501
|
+
| **Cross-Repo** | `find_similar` requires semantic search enabled and an embedding provider. | Use a local Ollama model as a zero-cost alternative. |
|
|
502
|
+
| **Cross-Repo** | MCP Resources `resources/subscribe` is not yet supported by Claude Code or Cursor. | Polling with `search_cross_repo` is the current alternative. |
|
|
503
|
+
| **Architecture** | Quality metrics use estimated complexity (nesting heuristics), not true AST branch-counting. | Treat scores as directional signals, not precise measurements. |
|
|
504
|
+
| **Architecture** | `detect_antipatterns` cannot detect runtime coupling or dynamic dispatch. | Complementary to profiling and runtime observability — not a replacement. |
|
|
505
|
+
| **Architecture** | `get_layer_violations` needs layer boundaries defined in config before it delivers value. | Requires upfront config investment. |
|
|
506
|
+
| **Ecosystem** | Jinja preprocessing is dbt SQL only — Helm, Ansible, ERB, Kustomize not supported. | Use Terraform for IaC where possible; raw file reads otherwise. |
|
|
507
|
+
| **Ecosystem** | `search_columns` is dbt-only — does not cover `CREATE TABLE` SQL columns. | Use `get_symbol_source` on the `CREATE TABLE` symbol instead. |
|
|
508
|
+
| **Ecosystem** | dbt indexer does not detect stale `manifest.json`. | Always run `dbt compile` before `index_folder` on dbt projects. |
|
|
509
|
+
| **Ecosystem** | BigQuery STRUCT/ARRAY, Snowflake QUALIFY, and DuckDB LIST/MAP may not parse fully. | Model-level symbols are still extracted even when the body fails to parse. |
|
|
@@ -0,0 +1,97 @@
|
|
|
1
|
+
# PureContext MCP — Agent Instructions
|
|
2
|
+
|
|
3
|
+
PureContext indexes codebases with tree-sitter and serves symbols via MCP. Retrieving a 45-line function by name costs ~150 tokens vs ~2,000 tokens for reading the whole file. Use these tools instead of reading files.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Mandatory first step
|
|
8
|
+
|
|
9
|
+
Always call `list_repos` before any code navigation. If the project is not listed, call `index_folder` with the absolute project path. Every other tool requires the `repoId` returned by these two calls.
|
|
10
|
+
|
|
11
|
+
---

## Pick the right tool

| I need to… | Use |
|---|---|
| Find a function/class/method by name | `search_symbols` |
| Find code by what it does (meaning, not name) | `search_semantic` |
| Find a literal string, comment, or config value | `search_text` |
| Read a symbol's implementation | `get_symbol_source` |
| Fetch several symbols at once | `get_symbols` |
| Survey all symbols in one file | `get_file_outline` |
| Survey the whole project layout | `get_repo_outline` or `get_file_tree` |
| Read a non-symbol file section (imports, config block) | `get_file_content` with `startLine`/`endLine` |
| Understand what a symbol depends on | `get_context_bundle` |
| Know what breaks if I change a symbol | `get_blast_radius` |
| Find all call sites of a symbol | `find_references` |
| Check who imports a file directly | `find_importers` |
| Find unused exports | `find_dead_code` |
| Check if similar code exists across repos | `find_similar` |
| Search all indexed repos at once | `search_cross_repo` |
| Trace a dbt column's lineage | `search_columns` |
| Understand symbol-level git history | `get_symbol_history` |
| Identify high-churn / high-risk files | `get_churn_metrics` |
| Get per-file quality scores (complexity, coupling) | `get_quality_metrics` |
| Find god classes, circular deps, dead code | `detect_antipatterns` |
| Generate an architecture overview doc | `get_architecture_doc` |

---

## Rules

**1. Never read whole files to find code.** Use `search_symbols` + `get_symbol_source`. Reading files wastes tokens.

**2. `search_symbols` returns no source.** It returns signatures and summaries only. Call `get_symbol_source` only for symbols you will actually work with, not for every result.

**3. Trust summaries, but verify before modifying.** Summaries describe intent, not contract. Use the `summary` field to navigate; always read the source before making a change.

**4. Before modifying a symbol:** call `get_churn_metrics` first. If `churnScore > 6`, warn the user. Then call `get_blast_radius` for impact scope and `get_context_bundle` for dependencies.

**5. `search_text` is grep, not symbol search.** Use it only for literal strings, comments, and values that are not named symbols.

**6. Use `get_symbols` for batches.** When you need source for multiple symbols, one `get_symbols` call beats multiple `get_symbol_source` calls.

**7. camelCase = snake_case for queries.** `processOrder`, `process_order`, and `process order` return the same results. Use `kind:` to narrow (e.g. `kind: "function"`).

**8. Use `mode: "hybrid"` when unsure of the exact name.** Combines keyword precision with semantic recall.

**9. Check for duplicates before implementing new code.** Call `find_similar` with `crossRepo: true` to discover existing implementations before writing something new.

**10. Use `get_architecture_doc` when onboarding.** Call it early on an unfamiliar codebase to build a mental model before diving into symbols.

**11. For dbt projects:** always run `dbt compile` before `index_folder`. Use `search_columns` for column lineage, `get_context_bundle` for model dependencies, and `search_symbols` with `kind: "route"` for API endpoints.

---
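Rule 7 says that `processOrder`, `process_order`, and `process order` are treated as the same query. One way such matching could work is to split each spelling into lowercase word tokens before comparing. The `normalizeQuery` helper below is an illustrative assumption, not PureContext's actual implementation.

```typescript
// Split a query into lowercase word tokens so that camelCase,
// snake_case, and space-separated spellings compare equal.
// Hypothetical helper for illustration only.
function normalizeQuery(query: string): string[] {
  return query
    // Insert a space at lower-to-upper boundaries: processOrder -> process Order
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    // Split on whitespace, underscores, and hyphens.
    .split(/[\s_\-]+/)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

// All three spellings reduce to the same token list.
const spellings = ["processOrder", "process_order", "process order"];
const normalized = spellings.map((s) => normalizeQuery(s).join(" "));
```

Under this scheme each entry of `normalized` is the same string, which is why the spelling of a query should not matter.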

## Common patterns

**Explore an unfamiliar codebase**
```
list_repos → (index_folder if missing) → get_architecture_doc → get_quality_metrics → get_repo_outline → search_symbols → get_context_bundle
```

**Modify a function safely**
```
search_symbols → get_churn_metrics → get_symbol_history → get_blast_radius → get_context_bundle → get_symbol_source → [edit]
```
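The churn check near the start of this pattern follows rule 4: warn when `churnScore > 6`, then continue with the remaining calls. Below is a minimal sketch, assuming the score has already been read from a `get_churn_metrics` response. Where the score sits in that response is not specified here, so `planModification` and `PreflightPlan` are hypothetical names.

```typescript
// Threshold taken from rule 4: warn the user when churnScore > 6.
const CHURN_WARNING_THRESHOLD = 6;

// Hypothetical result type: whether to warn, and which tools to call next.
interface PreflightPlan {
  warnUser: boolean;
  nextCalls: string[];
}

// Build the pre-edit plan for a symbol from its churn score.
function planModification(churnScore: number): PreflightPlan {
  return {
    warnUser: churnScore > CHURN_WARNING_THRESHOLD,
    // Remaining steps of the pattern, in the order listed above.
    nextCalls: [
      "get_symbol_history",
      "get_blast_radius",
      "get_context_bundle",
      "get_symbol_source",
    ],
  };
}
```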

**Find where a symbol is used**
```
search_symbols → find_references → get_symbol_source for relevant call sites
```

**Before implementing new functionality**
```
find_similar (crossRepo: true) → search_cross_repo → [only build if nothing equivalent exists]
```

**Debug a recent regression**
```
get_churn_metrics → get_symbol_history for changed symbols → search_symbols → get_symbol_source
```

**Architecture / code health review**
```
get_quality_metrics → detect_antipatterns → get_architecture_doc (before) → [refactor] → detect_antipatterns (after)
```