purecontext-mcp 1.1.0 → 1.1.2

Files changed (45)
  1. package/AGENT_INSTRUCTIONS.md +509 -0
  2. package/AGENT_INSTRUCTIONS_SHORT.md +97 -0
  3. package/CHANGELOG.md +212 -0
  4. package/docs/01-introduction.md +69 -0
  5. package/docs/02-installation.md +267 -0
  6. package/docs/03-quick-start.md +135 -0
  7. package/docs/04-configuration.md +214 -0
  8. package/docs/05-cli-reference.md +130 -0
  9. package/docs/06-tools-reference.md +499 -0
  10. package/docs/07-language-support.md +88 -0
  11. package/docs/08-framework-adapters.md +324 -0
  12. package/docs/09-dependency-graph.md +182 -0
  13. package/docs/10-semantic-search.md +153 -0
  14. package/docs/11-search-quality.md +110 -0
  15. package/docs/12-ai-summarization.md +106 -0
  16. package/docs/13-token-savings.md +110 -0
  17. package/docs/14-transport-modes.md +167 -0
  18. package/docs/15-team-setup.md +251 -0
  19. package/docs/16-docker.md +186 -0
  20. package/docs/17-web-ui.md +157 -0
  21. package/docs/18-git-history.md +157 -0
  22. package/docs/19-cross-repo.md +177 -0
  23. package/docs/20-architecture-analysis.md +228 -0
  24. package/docs/21-ecosystem-tools.md +189 -0
  25. package/docs/22-distribution.md +240 -0
  26. package/docs/23-performance.md +121 -0
  27. package/docs/24-security.md +144 -0
  28. package/docs/25-architecture-overview.md +240 -0
  29. package/docs/26-troubleshooting.md +234 -0
  30. package/docs/27-api-stability.md +114 -0
  31. package/docs/README.md +71 -0
  32. package/guide/README.md +57 -0
  33. package/guide/ai-summaries.md +127 -0
  34. package/guide/code-health.md +190 -0
  35. package/guide/code-history.md +149 -0
  36. package/guide/finding-code.md +157 -0
  37. package/guide/navigating-new-code.md +121 -0
  38. package/guide/safe-changes.md +156 -0
  39. package/guide/team-setup.md +191 -0
  40. package/guide/web-ui.md +154 -0
  41. package/guide/why-purecontext.md +73 -0
  42. package/guide/workflow-onboarding.md +114 -0
  43. package/guide/workflow-pr-review.md +199 -0
  44. package/guide/workflow-refactoring.md +172 -0
  45. package/package.json +9 -2
@@ -0,0 +1,509 @@
1
+ # PureContext MCP — AI Agent Instructions
2
+
3
+ These instructions tell AI agents how to use PureContext MCP correctly for token-efficient code navigation. Add this file to your agent's rules (CLAUDE.md, Windsurf rules, Cursor rules, etc.).
4
+
5
+ ---
6
+
7
+ ## What PureContext MCP is
8
+
9
+ PureContext MCP is a structured code navigation server. It indexes a codebase using tree-sitter AST parsing, stores symbol metadata in SQLite, and exposes MCP tools so you can retrieve precisely the code you need — without reading entire files.
10
+
11
+ **Token savings:** Retrieving a 45-line function by name costs ~150 tokens. Reading the 800-line file it lives in costs ~2,000 tokens. PureContext saves 88–98% of context tokens on typical navigation tasks.
12
+
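The arithmetic behind the token-savings claim can be checked with a short sketch. The helper names `savingsRatio` and `tokensSaved` are hypothetical and are not part of PureContext; the figures are the approximate ones quoted above.

```typescript
// Hypothetical helpers that reproduce the figures quoted in the text.

// Fraction of context tokens saved by fetching a symbol instead of its file.
function savingsRatio(symbolTokens: number, fileTokens: number): number {
  if (fileTokens <= 0) {
    throw new RangeError("fileTokens must be positive");
  }
  return 1 - symbolTokens / fileTokens;
}

// Absolute number of tokens saved for the same comparison.
function tokensSaved(symbolTokens: number, fileTokens: number): number {
  return fileTokens - symbolTokens;
}

// ~150 tokens for the 45-line function vs ~2,000 tokens for the 800-line file.
const ratio = savingsRatio(150, 2000);
console.log(`${(ratio * 100).toFixed(1)}% of tokens saved`); // 92.5% of tokens saved
console.log(`${tokensSaved(150, 2000)} tokens saved`); // 1850 tokens saved
```

With these inputs the saving is 92.5%, which sits inside the quoted 88–98% range.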
13
+ ---
14
+
15
+ ## Mandatory workflow — always follow this order
16
+
17
+ ### Step 1 — Check if the project is indexed
18
+
19
+ Before doing any code navigation, call `list_repos` to see what is already indexed.
20
+
21
+ ```
22
+ list_repos()
23
+ ```
24
+
25
+ If the current project is not in the list, index it first:
26
+
27
+ ```
28
+ index_folder({ path: "/absolute/path/to/project" })
29
+ ```
30
+
31
+ **Never skip this step.** All other tools require a `repoId`. `index_folder` returns the `repoId` you will use in every subsequent call. Save it.
32
+
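As a rough sketch of Step 1, an agent could look for the current project in the `list_repos` result and fall back to `index_folder` only when nothing matches. The `RepoEntry` shape and the `findRepoId` helper are assumptions made for illustration; the instructions say only that each listed repo carries a `repoId` and a path.

```typescript
// Assumed shape of one entry in a list_repos response. Only the existence of
// `repoId` and a path is stated in the instructions; the field names are a guess.
interface RepoEntry {
  repoId: string;
  path: string;
}

// Normalise a path for comparison: unify separators, drop trailing slashes.
function normalizePath(p: string): string {
  const unified = p.replace(/\\/g, "/").replace(/\/+$/, "");
  return unified === "" ? "/" : unified;
}

// Hypothetical helper: return the repoId for projectPath, or undefined when
// the project is not indexed yet (the caller should then run index_folder).
function findRepoId(repos: RepoEntry[], projectPath: string): string | undefined {
  const wanted = normalizePath(projectPath);
  for (const repo of repos) {
    if (normalizePath(repo.path) === wanted) {
      return repo.repoId;
    }
  }
  return undefined;
}

const listed: RepoEntry[] = [{ repoId: "a1b2c3d4e5f60001", path: "/work/shop/" }];
const repoId = findRepoId(listed, "/work/shop");
console.log(repoId === undefined ? "call index_folder first" : `reuse ${repoId}`);
```

Keeping the lookup separate from the tool calls makes it easy to save the resulting `repoId` once and reuse it for the rest of the session.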
33
+ ### Step 2 — Navigate by symbol, not by file
34
+
35
+ Do **not** read entire files to find code. Use the tools:
36
+
37
+ | Goal | Tool to use |
38
+ |------|-------------|
39
+ | Find a function/class/method by name | `search_symbols` |
40
+ | Find code by what it does | `search_semantic` |
41
+ | Find a literal string, comment, or config value | `search_text` |
42
+ | See all symbols in one file | `get_file_outline` |
43
+ | See the whole project structure | `get_repo_outline` |
44
+
45
+ ### Step 3 — Read summaries before fetching source
46
+
47
+ `search_symbols` returns signatures and summaries — **no source code**. This is intentional. Read the `summary` field first to decide whether a symbol is relevant. Fetch the implementation only for symbols you will actually work with:
48
+
49
+ ```
50
+ get_symbol_source({ repoId, symbolId })
51
+ ```
52
+
53
+ Do not call `get_symbol_source` for every result in a search. Summaries let you navigate without reading source, which cuts token use by 10–50× on typical lookups.
54
+
55
+ **Trust but verify:** summaries describe intent, not contract. For modification tasks, always read the source after using the summary to navigate. An AI-generated summary describes what a function is meant to do — source code is ground truth.
56
+
57
+ ---
58
+
59
+ ## Tool reference — when to use each tool
60
+
61
+ ### Indexing tools
62
+
63
+ #### `list_repos`
64
+ Always call this first. Returns all indexed repos with their `repoId`, path, file count, and last indexed time.
65
+
66
+ #### `index_folder`
67
+ Index a local directory. Returns `repoId`. Re-indexing is incremental — only changed files are re-parsed. Call it again if files have changed since the last index.
68
+
69
+ **Parameters:**
70
+ - `path` (required) — absolute path to project root
71
+ - `force` (optional) — set `true` to force re-index of all files, even unchanged ones
72
+ - `fileLimit` (optional) — override the configured file limit for this run
73
+
74
+ #### `resolve_repo`
75
+ Convert a local path to its `repoId` without indexing. Use this when you know the project is already indexed but don't have the `repoId` at hand.
76
+
77
+ #### `invalidate_cache`
78
+ Force a full re-index by clearing content hashes. Use when the index seems stale and `index_folder` is not picking up changes.
79
+
80
+ ---
81
+
82
+ ### Symbol search & retrieval
83
+
84
+ #### `search_symbols` — primary navigation tool
85
+ Search by name fragment. Use this for almost all navigation tasks.
86
+
87
+ ```json
88
+ {
89
+ "repoId": "a1b2c3d4e5f60001",
90
+ "query": "authenticate",
91
+ "kind": "function",
92
+ "limit": 10
93
+ }
94
+ ```
95
+
96
+ - Returns signatures and summaries — **no source code**
97
+ - Use the `kind` filter to narrow results: `function`, `class`, `method`, `route`, `component`, `hook`, `middleware`, etc.
98
+ - `camelCase`, `snake_case`, and space-separated queries are equivalent: `processOrder`, `process_order`, and `process order` return the same results
99
+ - Use `mode: "hybrid"` for best recall when unsure of the exact name
100
+
101
+ #### `search_semantic`
102
+ Search by meaning, not name. Use when you know what the code does but not what it is called.
103
+
104
+ ```json
105
+ {
106
+ "repoId": "...",
107
+ "query": "function that validates user credentials and returns a session token",
108
+ "mode": "hybrid",
109
+ "max_results": 10
110
+ }
111
+ ```
112
+
113
+ Requires semantic search to be enabled in config. Falls back to FTS5 keyword search automatically if the HNSW index is not available.
114
+
115
+ #### `search_text`
116
+ Grep-style full-text search across file content. Use for finding literal strings, error messages, config values, comments, or anything that is not a symbol name.
117
+
118
+ ```json
119
+ {
120
+ "repoId": "...",
121
+ "query": "TODO: fix this",
122
+ "context_lines": 3
123
+ }
124
+ ```
125
+
126
+ Do **not** use `search_text` when you are looking for a function or class — use `search_symbols` instead. `search_text` searches raw file content, not the symbol index.
127
+
128
+ #### `get_symbol_source`
129
+ Retrieve the source code of a specific symbol by its ID.
130
+
131
+ ```json
132
+ {
133
+ "repoId": "...",
134
+ "symbolId": "8f3a2c1d0e4b5f9a",
135
+ "context_lines": 2
136
+ }
137
+ ```
138
+
139
+ - `symbolId` comes from `search_symbols` or `get_file_outline` results
140
+ - Use `context_lines` to include surrounding lines for additional context
141
+ - Use `verify: true` when you need to confirm the source on disk matches the index (after recent file edits)
142
+
143
+ #### `get_symbols`
144
+ Batch-fetch multiple symbols by ID in a single call. Prefer this over calling `get_symbol_source` repeatedly when you need several symbols.
145
+
146
+ ```json
147
+ {
148
+ "repoId": "...",
149
+ "symbolIds": ["id1", "id2", "id3"]
150
+ }
151
+ ```
152
+
153
+ #### `get_file_content`
154
+ Retrieve raw file content with optional line range. Use only when you need to read a section of a file that is not a named symbol — for example, top-level imports, configuration blocks, or non-symbol prose.
155
+
156
+ ```json
157
+ {
158
+ "repoId": "...",
159
+ "filePath": "src/config/settings.ts",
160
+ "startLine": 1,
161
+ "endLine": 40
162
+ }
163
+ ```
164
+
165
+ Do **not** use `get_file_content` as a substitute for `get_symbol_source`. Always prefer symbol-level retrieval.
166
+
167
+ #### `get_file_outline`
168
+ All symbols in a single file with signatures and summaries. Use to survey a file without reading its content.
169
+
170
+ #### `get_repo_outline`
171
+ All files in the repo with their top-level symbols. Use to orient yourself in an unfamiliar project.
172
+
173
+ #### `get_file_tree`
174
+ Directory tree with file counts. Use when you need to understand the project's folder structure.
175
+
176
+ #### `find_references`
177
+ Find all usage sites (call sites, references) for a symbol across the repo. Use before renaming or modifying a symbol to understand all places that use it.
178
+
179
+ ---
180
+
181
+ ### Dependency graph tools
182
+
183
+ #### `get_context_bundle`
184
+ Forward-walk from a symbol — returns the symbol and everything it transitively imports. Use **before modifying a function** to understand its full context.
185
+
186
+ ```json
187
+ {
188
+ "repoId": "...",
189
+ "symbolId": "...",
190
+ "maxDepth": 2,
191
+ "maxTokens": 4000
192
+ }
193
+ ```
194
+
195
+ Use `maxTokens` to cap the response size when working with deeply connected code.
196
+
197
+ #### `get_blast_radius`
198
+ Reverse-walk — all files that transitively import a symbol. Use **before modifying or deleting a symbol** to understand what would break.
199
+
200
+ ```json
201
+ {
202
+ "repoId": "...",
203
+ "symbolId": "...",
204
+ "maxDepth": 5
205
+ }
206
+ ```
207
+
208
+ #### `find_importers`
209
+ Direct (one-hop) importers of a file. Faster than `get_blast_radius` when you only need the immediate importers.
210
+
211
+ #### `find_dead_code`
212
+ Exported symbols that nothing else imports. Use for cleanup audits. Note: may produce false positives for dynamic imports and symbols consumed by external npm consumers.
213
+
214
+ ---
215
+
216
+ ### Architecture & quality tools
217
+
218
+ #### `get_layer_violations`
219
+ Detect architectural import boundary violations. Use when enforcing layered architecture rules.
220
+
221
+ #### `get_quality_metrics`
222
+ Per-file complexity, coupling, cohesion, and documentation coverage scores. Always use this instead of making subjective assessments from reading source code. Treat complexity scores as directional signals — cyclomatic complexity is estimated from symbol count and nesting depth, not exact AST branch-counting.
223
+
224
+ #### `detect_antipatterns`
225
+ Detect common architectural anti-patterns (god classes, circular dependencies, dead code) across the repo. Returns structured results with severity levels and actionable locations. Only detects static patterns — cannot find runtime coupling or dynamic dispatch issues.
226
+
227
+ #### `get_architecture_doc`
228
+ Auto-generate an architecture summary in Markdown or Mermaid format. Requires `ai.allowRemoteAI: true`. Use early when onboarding to an unfamiliar codebase. The generated doc derives from the actual index rather than hand-written documentation, so it is only as current as the last index run.
229
+
230
+ **Pre-refactoring workflow:**
231
+ ```
232
+ get_quality_metrics → find worst files
233
+ detect_antipatterns → find structural issues
234
+ get_blast_radius → understand impact scope
235
+ get_architecture_doc → generate "before" snapshot
236
+ [make changes]
237
+ detect_antipatterns → verify anti-patterns resolved
238
+ ```
239
+
240
+ ---
241
+
242
+ ### Git & history tools
243
+
244
+ #### `get_symbol_history`
245
+ Symbol-level git commit history. Returns structured JSON with commits, authors, and diffs — no shell commands needed. Use to understand why a function was written the way it is, and to answer "who wrote this?" or "who should review this change?" without running `git log` or `git blame`.
246
+
247
+ **Limitations:** Rename/move breaks history continuity — symbols in renamed files start fresh history from the rename commit. After a rebase, run `invalidate_cache` + `index_folder` to rebuild accurate history.
248
+
249
+ #### `get_churn_metrics`
250
+ File and symbol churn metrics. Use to identify high-risk files before making changes. **Before modifying any symbol, check churn:** if `churnScore > 6`, mention this to the user and suggest extra testing. High-churn files are under active development (merge conflict risk) or chronically buggy (regression risk).
251
+
252
+ **For debugging:** Use `get_churn_metrics` to identify recently-changed symbols — recent changes are the most likely source of new bugs. This narrows the search space dramatically.
253
+
254
+ **Note:** The default `maxCommits: 500` cap means long-lived projects may lose early history. Increase `git.maxCommits` for history-sensitive workflows.
255
+
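The churn rule above can be expressed as a small guard. This is a minimal sketch: the `ChurnResult` shape and the `churnAdvice` helper are hypothetical, and only the `churnScore > 6` threshold comes from the instructions.

```typescript
// Threshold taken from the rule "if churnScore > 6" in the instructions.
const CHURN_WARNING_THRESHOLD = 6;

// Assumed response fragment; only the `churnScore` name appears in the text.
interface ChurnResult {
  churnScore: number;
}

// Hypothetical helper: return a warning for the user, or null when the
// symbol is below the threshold and no extra caution is called for.
function churnAdvice(result: ChurnResult): string | null {
  if (result.churnScore > CHURN_WARNING_THRESHOLD) {
    return `High churn (score ${result.churnScore}): suggest extra testing before this change.`;
  }
  return null;
}

console.log(churnAdvice({ churnScore: 8.2 }));
console.log(churnAdvice({ churnScore: 3 }));
```

The comparison is strict, so a score of exactly 6 does not trigger the warning, matching the wording of the rule.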
256
+ ---
257
+
258
+ ### Cross-repo tools
259
+
260
+ #### `search_cross_repo`
261
+ Search symbols across multiple indexed repositories simultaneously. Use for architectural questions like "which services handle email sending?" or "where is `UserProfile` defined?" — a single call replaces N per-repo queries.
262
+
263
+ #### `find_similar`
264
+ Find semantically similar code across repos using the HNSW vector index. **Before implementing new functionality**, call this to check if equivalent code already exists elsewhere in the organization. Requires semantic search enabled (`semantic.enabled: true` with a configured provider).
265
+
266
+ **Before modifying shared library code**, use `get_blast_radius` with `crossRepo: true` to understand the full downstream impact across all repos.
267
+
268
+ **Note:** `crossRepoDeps` requires explicit package name configuration — there is no auto-detection of Nx/Turborepo/Lerna workspaces. Monorepo packages must each be indexed separately with `index_folder`.
269
+
270
+ ---
271
+
272
+ ### Ecosystem & data tools
273
+
274
+ #### `search_columns`
275
+ Search column definitions across dbt models. Returns upstream/downstream lineage — not just where a column is defined, but the full chain from source tables through staging models to final fact tables. Use for data lineage questions like "where does the `revenue` column come from?"
276
+
277
+ **Note:** `search_columns` is dbt-only — it does not search columns in raw SQL `CREATE TABLE` statements. For those, use `get_symbol_source` on the `CREATE TABLE` symbol directly.
278
+
279
+ **dbt workflow notes:**
280
+ - Always run `index_folder` after `dbt compile` to ensure `manifest.json` is current — stale manifests produce incorrect column lineage.
281
+ - Use `get_context_bundle` to traverse dbt model dependencies just like code dependencies.
282
+ - Use `search_symbols` with `kind: "route"` to find API endpoints via the OpenAPI provider.
283
+
284
+ **Templating coverage:** Jinja preprocessing is implemented only for dbt's SQL dialect. Helm/Go templates, Ansible Jinja2, Kubernetes YAML, ERB, and Kustomize are not preprocessed — those files are indexed as raw text or skipped. Terraform is fully supported.
285
+
286
+ ---
287
+
288
+ ## Decision rules — which tool to pick
289
+
290
+ ```
291
+ I need to find a symbol by name
292
+ → search_symbols
293
+
294
+ I know what the code does but not its name
295
+ → search_semantic (or search_symbols with mode: "hybrid")
296
+
297
+ I need to find a literal string, comment, or config value
298
+ → search_text
299
+
300
+ I need the source code of a specific symbol
301
+ → get_symbol_source (use symbolId from search_symbols)
302
+
303
+ I need source for several symbols at once
304
+ → get_symbols (batch)
305
+
306
+ I need to understand a function's dependencies
307
+ → get_context_bundle
308
+
309
+ I need to know what breaks if I change a symbol
310
+ → get_blast_radius (before modifying)
311
+ → find_references (for call sites specifically)
312
+
313
+ I need to survey a file's contents
314
+ → get_file_outline
315
+
316
+ I need to understand the project layout
317
+ → get_repo_outline or get_file_tree
318
+
319
+ I need a non-symbol section of a file (imports block, config)
320
+ → get_file_content with startLine/endLine
321
+ ```
322
+
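The decision rules above amount to a lookup from goal to tool. The sketch below encodes them as a typed table; the `Goal` labels and the `pickTool` helper are hypothetical names chosen for illustration, while the tool names are the ones listed in the rules.

```typescript
// Hypothetical goal labels, one per rule in the decision list above.
type Goal =
  | "symbol-by-name"
  | "code-by-meaning"
  | "literal-text"
  | "symbol-source"
  | "batch-source"
  | "dependencies"
  | "impact"
  | "call-sites"
  | "file-survey"
  | "project-layout"
  | "non-symbol-section";

// Tool names as given in the decision rules; the mapping itself is a sketch.
const TOOL_FOR_GOAL: Record<Goal, string> = {
  "symbol-by-name": "search_symbols",
  "code-by-meaning": "search_semantic",
  "literal-text": "search_text",
  "symbol-source": "get_symbol_source",
  "batch-source": "get_symbols",
  "dependencies": "get_context_bundle",
  "impact": "get_blast_radius",
  "call-sites": "find_references",
  "file-survey": "get_file_outline",
  "project-layout": "get_repo_outline",
  "non-symbol-section": "get_file_content",
};

function pickTool(goal: Goal): string {
  return TOOL_FOR_GOAL[goal];
}

console.log(pickTool("impact"));
```

Using `Record<Goal, string>` means the compiler flags any goal that is added to the union without a corresponding tool.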
323
+ ---
324
+
325
+ ## Anti-patterns — what NOT to do
326
+
327
+ **Do not read whole files to find a function.**
328
+ Use `search_symbols` + `get_symbol_source`. Reading an 800-line file to locate a 45-line function wastes ~1,850 tokens.
329
+
330
+ **Do not call `get_symbol_source` for every search result.**
331
+ Read the `signature` and `summary` from `search_symbols` first. Fetch source only for symbols you will actually work with.
332
+
333
+ **Do not skip `list_repos` at the start of a session.**
334
+ You need a `repoId` for every tool call. Get it from `list_repos` or `index_folder` — do not guess.
335
+
336
+ **Do not use `search_text` for symbol lookups.**
337
+ `search_text` is a grep over raw file content. It is slower and less precise than `search_symbols` for finding named code entities.
338
+
339
+ **Do not use `get_file_content` as a fallback for reading whole files.**
340
+ If a symbol exists in the index, use `get_symbol_source`. Only use `get_file_content` for content that is not a named symbol.
341
+
342
+ **Do not ignore `_tokenEstimate` fields.**
343
+ Most responses include a `_tokenEstimate` field. Use it to decide whether to fetch more context or stop.
344
+
345
+ ---
346
+
347
+ ## Efficient navigation patterns
348
+
349
+ ### Pattern: understand an unfamiliar codebase
350
+
351
+ ```
352
+ 1. list_repos() → check if indexed
353
+ 2. index_folder({ path }) → index if needed, get repoId
354
+ 3. get_repo_outline({ repoId }) → survey the structure
355
+ 4. search_symbols({ query: "main entry point concept" }) → locate key symbols
356
+ 5. get_context_bundle({ symbolId }) → understand the entry + dependencies
357
+ ```
358
+
359
+ ### Pattern: modify a function safely
360
+
361
+ ```
362
+ 1. search_symbols({ query: "functionName", kind: "function" })
363
+ 2. get_blast_radius({ symbolId }) → know the impact scope BEFORE touching it
364
+ 3. get_context_bundle({ symbolId, maxDepth: 2 }) → understand its context
365
+ 4. get_symbol_source({ symbolId }) → read the implementation
366
+ 5. [make the change]
367
+ 6. find_dead_code({ repoId }) → verify no orphaned exports left behind
368
+ ```
369
+
370
+ ### Pattern: find where something is called
371
+
372
+ ```
373
+ 1. search_symbols({ query: "symbolName" })
374
+ 2. find_references({ symbolId }) → all call sites
375
+ 3. get_symbol_source for relevant call sites
376
+ ```
377
+
378
+ ### Pattern: search when you know the concept but not the name
379
+
380
+ ```
381
+ 1. search_semantic({ query: "natural language description", mode: "hybrid" })
382
+ 2. Review signatures and summaries in results
383
+ 3. get_symbol_source for the best match
384
+ ```
385
+
386
+ ### Pattern: large batch of symbols
387
+
388
+ ```
389
+ 1. search_symbols({ query: "...", limit: 20 })
390
+ 2. Filter results by signature/summary to pick the ones you need
391
+ 3. get_symbols({ symbolIds: ["id1", "id2", "id3"] }) ← one call, not three
392
+ ```
393
+
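Step 2 of the batch pattern, filtering by signature and summary before fetching, might look like the following sketch. The `SymbolHit` field names and the `buildBatchRequest` helper are assumptions; the returned object mirrors the `get_symbols` parameters shown earlier (`repoId`, `symbolIds`).

```typescript
// Assumed shape of one search_symbols result. The instructions mention
// signatures, summaries, and symbol IDs; the exact field names are a guess.
interface SymbolHit {
  symbolId: string;
  signature: string;
  summary: string;
}

// Hypothetical helper: keep hits whose signature or summary mentions the
// keyword, then build a single get_symbols request for all of them.
function buildBatchRequest(
  repoId: string,
  hits: SymbolHit[],
  keyword: string
): { repoId: string; symbolIds: string[] } {
  const needle = keyword.toLowerCase();
  const symbolIds = hits
    .filter((h) => (h.signature + " " + h.summary).toLowerCase().indexOf(needle) !== -1)
    .map((h) => h.symbolId);
  return { repoId, symbolIds };
}

const hits: SymbolHit[] = [
  { symbolId: "id1", signature: "login(user, pass)", summary: "Validates credentials" },
  { symbolId: "id2", signature: "renderFooter()", summary: "Draws the page footer" },
  { symbolId: "id3", signature: "refreshSession(token)", summary: "Re-validates a session token" },
];
console.log(buildBatchRequest("a1b2c3d4e5f60001", hits, "validates"));
```

A plain substring match is only a stand-in for the agent's own relevance judgement; the point is that one batched request replaces several `get_symbol_source` calls.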
394
+ ### Pattern: modify a high-risk symbol safely
395
+
396
+ ```
397
+ 1. search_symbols({ query: "functionName", kind: "function" })
398
+ 2. get_churn_metrics({ repoId, symbolId }) → if churnScore > 6, warn user
399
+ 3. get_symbol_history({ symbolId }) → understand recent change context
400
+ 4. get_blast_radius({ symbolId }) → know full impact scope
401
+ 5. get_context_bundle({ symbolId, maxDepth: 2 }) → understand dependencies
402
+ 6. get_symbol_source({ symbolId }) → read the implementation
403
+ 7. [make the change]
404
+ 8. find_dead_code({ repoId }) → verify no orphaned exports
405
+ ```
406
+
407
+ ### Pattern: architecture review / onboarding
408
+
409
+ ```
410
+ 1. list_repos → index_folder if needed
411
+ 2. get_architecture_doc({ repoId }) → generate project overview
412
+ 3. get_quality_metrics({ repoId }) → identify weakest files
413
+ 4. detect_antipatterns({ repoId }) → find structural issues
414
+ 5. get_repo_outline({ repoId }) → survey specific areas
415
+ ```
416
+
417
+ ### Pattern: before implementing new functionality
418
+
419
+ ```
420
+ 1. find_similar({ query: "description", crossRepo: true }) → check for existing code
421
+ 2. search_cross_repo({ query: "conceptName" }) → find related symbols across repos
422
+ 3. get_blast_radius({ symbolId, crossRepo: true }) → understand cross-repo impact
423
+ ```
424
+
425
+ ### Pattern: debug a recent regression
426
+
427
+ ```
428
+ 1. get_churn_metrics({ repoId }) → find recently-changed files
429
+ 2. get_symbol_history({ symbolId }) → check commits in the affected area
430
+ 3. search_symbols in changed files → find the suspect functions
431
+ 4. get_symbol_source → get_context_bundle → read and understand the change
432
+ ```
433
+
434
+ ### Pattern: PR review
435
+
436
+ ```
437
+ 1. [obtain list of changed files from PR]
438
+ 2. get_symbol_history for changed symbols → understand prior context
439
+ 3. get_churn_metrics for changed files → flag hotspots
440
+ 4. get_blast_radius for each modified symbol → identify affected downstream code
441
+ 5. detect_antipatterns({ repoId }) → flag new structural issues
442
+ ```
443
+
444
+ ---
445
+
446
+ ## Search tips
447
+
448
+ - **camelCase and snake_case are equivalent** — `processOrder` and `process_order` return the same results.
449
+ - **Short queries rank better** — `auth` finds more than `authentication middleware function`.
450
+ - **Use `kind` to narrow results** — `kind: "function"` eliminates class/method noise.
451
+ - **Use `filePath` to scope** — `filePath: "src/auth/"` restricts to a directory.
452
+ - **Use `debug: true` to diagnose ranking** — shows BM25 scores and name boost factors.
453
+ - **For hybrid mode** — `semantic_weight: 0.6, keyword_weight: 0.4` is a good default when you are unsure of the exact name.
454
+
455
+ ---
456
+
457
+ ## Notes on `_tokenEstimate` and `_meta`
458
+
459
+ Every response includes:
460
+
461
+ ```json
462
+ "_meta": {
463
+ "timing_ms": 3,
464
+ "tokens_saved": 1842,
465
+ "total_tokens_saved": 45231
466
+ }
467
+ ```
468
+
469
+ And most responses include `_tokenEstimate` — a rough count of tokens in the returned payload. Use this to:
470
+ - Decide whether to fetch additional context or stop
471
+ - Avoid hitting context limits by capping `maxTokens` in `get_context_bundle`
472
+ - Track cumulative savings with `get_savings_stats`
473
+
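One way to act on `_tokenEstimate` is to keep a running total and stop fetching once a budget is reached. The sketch below assumes the field is a plain number on each response object; the `takeWithinBudget` helper and the budget figure are hypothetical.

```typescript
// Assumed response fragment: `_tokenEstimate` is optional because the text
// says most, not all, responses carry it.
interface Estimated {
  _tokenEstimate?: number;
}

// Hypothetical helper: keep responses in order until adding the next one
// would push the running total past the budget.
function takeWithinBudget<T extends Estimated>(responses: T[], budget: number): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const response of responses) {
    const cost = response._tokenEstimate ?? 0; // treat a missing estimate as free
    if (used + cost > budget) {
      break;
    }
    kept.push(response);
    used += cost;
  }
  return kept;
}

const fetched = [
  { name: "bundle A", _tokenEstimate: 1500 },
  { name: "bundle B", _tokenEstimate: 2000 },
  { name: "bundle C", _tokenEstimate: 1000 },
];
console.log(takeWithinBudget(fetched, 4000).map((r) => r.name));
```

Treating a missing estimate as zero is a simplification; a stricter agent might count characters instead when the field is absent.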
474
+ ---
475
+
476
+ ## Keeping the index fresh
477
+
478
+ The file watcher triggers incremental re-indexing automatically on file changes. If you suspect the index is stale:
479
+
480
+ ```
481
+ index_folder({ path, force: false }) → incremental (changed files only)
482
+ index_folder({ path, force: true }) → full re-index (all files)
483
+ invalidate_cache({ repoId }) → clear hashes, then index_folder
484
+ ```
485
+
486
+ ---
487
+
488
+ ## Known limitations
489
+
490
+ These are documented gaps — understand them so you can work around them rather than being confused when a tool behaves unexpectedly.
491
+
492
+ | Area | Limitation | Workaround |
493
+ |------|-----------|-----------|
494
+ | **AI Summaries** | Summaries describe intent, not contract. Stale summaries exist until re-index. | Always verify with `get_symbol_source` before modifying. |
495
+ | **AI Summaries** | `get_architecture_doc` requires `ai.allowRemoteAI: true`. | `detect_antipatterns` and `get_quality_metrics` work without AI. |
496
+ | **Git History** | Rename/move breaks history continuity — prior history is lost after a rename. | Future: `git log --follow` tracking. |
497
+ | **Git History** | Rebase invalidates commit hashes — re-index required after significant rebase. | Run `invalidate_cache` + `index_folder` post-rebase. |
498
+ | **Git History** | Default `maxCommits: 500` drops early history on long-lived projects. | Increase `git.maxCommits` in config for history-sensitive workflows. |
499
+ | **Git History** | No SVN/Mercurial/Perforce support. | Git is a hard requirement for history features. |
500
+ | **Cross-Repo** | `crossRepoDeps` is manual — no auto-detection of Nx/Turborepo/pnpm workspaces. | Explicitly list package names in each repo's config. |
501
+ | **Cross-Repo** | `find_similar` requires semantic search enabled and an embedding provider. | Use a local Ollama model as a zero-cost alternative. |
502
+ | **Cross-Repo** | MCP Resources `resources/subscribe` is not yet supported by Claude Code or Cursor. | Polling with `search_cross_repo` is the current alternative. |
503
+ | **Architecture** | Quality metrics use estimated complexity (nesting heuristics), not true AST branch-counting. | Treat scores as directional signals, not precise measurements. |
504
+ | **Architecture** | `detect_antipatterns` cannot detect runtime coupling or dynamic dispatch. | Complementary to profiling and runtime observability — not a replacement. |
505
+ | **Architecture** | `get_layer_violations` needs layer boundaries defined in config before it delivers value. | Requires upfront config investment. |
506
+ | **Ecosystem** | Jinja preprocessing is dbt SQL only — Helm, Ansible, ERB, Kustomize not supported. | Use Terraform for IaC where possible; raw file reads otherwise. |
507
+ | **Ecosystem** | `search_columns` is dbt-only — does not cover `CREATE TABLE` SQL columns. | Use `get_symbol_source` on the `CREATE TABLE` symbol instead. |
508
+ | **Ecosystem** | dbt indexer does not detect stale `manifest.json`. | Always run `dbt compile` before `index_folder` on dbt projects. |
509
+ | **Ecosystem** | BigQuery STRUCT/ARRAY, Snowflake QUALIFY, and DuckDB LIST/MAP may not parse fully. | Model-level symbols are still extracted even when the body fails to parse. |
@@ -0,0 +1,97 @@
1
+ # PureContext MCP — Agent Instructions
2
+
3
+ PureContext indexes codebases with tree-sitter and serves symbols via MCP. Retrieving a 45-line function by name costs ~150 tokens vs ~2,000 tokens for reading the whole file. Use these tools instead of reading files.
4
+
5
+ ---
6
+
7
+ ## Mandatory first step
8
+
9
+ Always call `list_repos` before any code navigation. If the project is not listed, call `index_folder` with the absolute project path. Every other tool requires the `repoId` returned by these two calls.
10
+
11
+ ---
12
+
13
+ ## Pick the right tool
14
+
15
+ | I need to… | Use |
16
+ |---|---|
17
+ | Find a function/class/method by name | `search_symbols` |
18
+ | Find code by what it does (meaning, not name) | `search_semantic` |
19
+ | Find a literal string, comment, or config value | `search_text` |
20
+ | Read a symbol's implementation | `get_symbol_source` |
21
+ | Fetch several symbols at once | `get_symbols` |
22
+ | Survey all symbols in one file | `get_file_outline` |
23
+ | Survey the whole project layout | `get_repo_outline` or `get_file_tree` |
24
+ | Read a non-symbol file section (imports, config block) | `get_file_content` with `startLine`/`endLine` |
25
+ | Understand what a symbol depends on | `get_context_bundle` |
26
+ | Know what breaks if I change a symbol | `get_blast_radius` |
27
+ | Find all call sites of a symbol | `find_references` |
28
+ | Check who imports a file directly | `find_importers` |
29
+ | Find unused exports | `find_dead_code` |
30
+ | Check if similar code exists across repos | `find_similar` |
31
+ | Search all indexed repos at once | `search_cross_repo` |
32
+ | Trace a dbt column's lineage | `search_columns` |
33
+ | Understand symbol-level git history | `get_symbol_history` |
34
+ | Identify high-churn / high-risk files | `get_churn_metrics` |
35
+ | Get per-file quality scores (complexity, coupling) | `get_quality_metrics` |
36
+ | Find god classes, circular deps, dead code | `detect_antipatterns` |
37
+ | Generate an architecture overview doc | `get_architecture_doc` |
38
+
39
+ ---
40
+
41
+ ## Rules
42
+
43
+ **1. Never read whole files to find code.** Use `search_symbols` + `get_symbol_source`. Reading files wastes tokens.
44
+
45
+ **2. `search_symbols` returns no source.** It returns signatures and summaries only. Call `get_symbol_source` only for symbols you will actually work with — not for every result.
46
+
47
+ **3. Trust summaries, but verify before modifying.** Summaries describe intent, not contract. Use the `summary` field to navigate; always read the source before making a change.
48
+
49
+ **4. Before modifying a symbol:** call `get_churn_metrics` first. If `churnScore > 6`, warn the user. Then call `get_blast_radius` for impact scope and `get_context_bundle` for dependencies.
50
+
51
+ **5. `search_text` is grep, not symbol search.** Use it only for literal strings, comments, and values that are not named symbols.
52
+
53
+ **6. Use `get_symbols` for batches.** When you need source for multiple symbols, one `get_symbols` call beats multiple `get_symbol_source` calls.
54
+
55
+ **7. camelCase = snake_case for queries.** `processOrder`, `process_order`, and `process order` return the same results. Use `kind:` to narrow (e.g. `kind: "function"`).
56
+
57
+ **8. Use `mode: "hybrid"` when unsure of the exact name.** Combines keyword precision with semantic recall.
58
+
59
+ **9. Check for duplicates before implementing new code.** Call `find_similar` (cross-repo) to discover existing implementations before writing something new.
60
+
61
+ **10. Use `get_architecture_doc` when onboarding.** Call it early on an unfamiliar codebase to build a mental model before diving into symbols.
62
+
63
+ **11. For dbt projects:** always run `dbt compile` before `index_folder`. Use `search_columns` for column lineage, `get_context_bundle` for model dependencies, and `search_symbols` with `kind: "route"` for API endpoints.
64
+
65
+ ---
66
+
67
+ ## Common patterns
68
+
69
+ **Explore an unfamiliar codebase**
70
+ ```
71
+ list_repos → (index_folder if missing) → get_architecture_doc → get_quality_metrics → get_repo_outline → search_symbols → get_context_bundle
72
+ ```
73
+
74
+ **Modify a function safely**
75
+ ```
76
+ search_symbols → get_churn_metrics → get_symbol_history → get_blast_radius → get_context_bundle → get_symbol_source → [edit]
77
+ ```
78
+
79
+ **Find where a symbol is used**
80
+ ```
81
+ search_symbols → find_references → get_symbol_source for relevant call sites
82
+ ```
83
+
84
+ **Before implementing new functionality**
85
+ ```
86
+ find_similar (crossRepo: true) → search_cross_repo → [only build if nothing equivalent exists]
87
+ ```
88
+
89
+ **Debug a recent regression**
90
+ ```
91
+ get_churn_metrics → get_symbol_history for changed symbols → search_symbols → get_symbol_source
92
+ ```
93
+
94
+ **Architecture / code health review**
95
+ ```
96
+ get_quality_metrics → detect_antipatterns → get_architecture_doc (before) → [refactor] → detect_antipatterns (after)
97
+ ```