npm - purecontext-mcp - Versions diffs - 1.1.0 → 1.1.2 - Mend

purecontext-mcp 1.1.0 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (45)

package/AGENT_INSTRUCTIONS.md +509 -0
package/AGENT_INSTRUCTIONS_SHORT.md +97 -0
package/CHANGELOG.md +212 -0
package/docs/01-introduction.md +69 -0
package/docs/02-installation.md +267 -0
package/docs/03-quick-start.md +135 -0
package/docs/04-configuration.md +214 -0
package/docs/05-cli-reference.md +130 -0
package/docs/06-tools-reference.md +499 -0
package/docs/07-language-support.md +88 -0
package/docs/08-framework-adapters.md +324 -0
package/docs/09-dependency-graph.md +182 -0
package/docs/10-semantic-search.md +153 -0
package/docs/11-search-quality.md +110 -0
package/docs/12-ai-summarization.md +106 -0
package/docs/13-token-savings.md +110 -0
package/docs/14-transport-modes.md +167 -0
package/docs/15-team-setup.md +251 -0
package/docs/16-docker.md +186 -0
package/docs/17-web-ui.md +157 -0
package/docs/18-git-history.md +157 -0
package/docs/19-cross-repo.md +177 -0
package/docs/20-architecture-analysis.md +228 -0
package/docs/21-ecosystem-tools.md +189 -0
package/docs/22-distribution.md +240 -0
package/docs/23-performance.md +121 -0
package/docs/24-security.md +144 -0
package/docs/25-architecture-overview.md +240 -0
package/docs/26-troubleshooting.md +234 -0
package/docs/27-api-stability.md +114 -0
package/docs/README.md +71 -0
package/guide/README.md +57 -0
package/guide/ai-summaries.md +127 -0
package/guide/code-health.md +190 -0
package/guide/code-history.md +149 -0
package/guide/finding-code.md +157 -0
package/guide/navigating-new-code.md +121 -0
package/guide/safe-changes.md +156 -0
package/guide/team-setup.md +191 -0
package/guide/web-ui.md +154 -0
package/guide/why-purecontext.md +73 -0
package/guide/workflow-onboarding.md +114 -0
package/guide/workflow-pr-review.md +199 -0
package/guide/workflow-refactoring.md +172 -0
package/package.json +9 -2

package/docs/11-search-quality.md ADDED

@@ -0,0 +1,110 @@
+# Search Quality & Ranking
+Keyword search is backed by SQLite FTS5 with a custom relevance ranker and a camelCase/snake_case query preprocessor.
+---
+## How keyword search works
+`search_symbols` and `search_text` use SQLite's **FTS5** (Full-Text Search) virtual table, which provides:
+- BM25 ranking — frequency-adjusted term weighting
+- Fast full-text matching without table scans
+- Unicode tokenization
+Before FTS5, keyword search used SQL `LIKE` — which had 0% precision for camelCase names. FTS5 with the preprocessor below fixed this.
+---
+## Query preprocessor
+Before a query hits FTS5, it goes through a preprocessor that splits identifiers into component words:
+| Input query | Preprocessed to |
+|-------------|----------------|
+| `processOrder` | `process order` |
+| `process_order` | `process order` |
+| `HTTPClient` | `http client` |
+| `getUserById` | `get user by id` |
+| `validate-token` | `validate token` |
+| `auth validate` | `auth validate` (already split) |
+This means `processOrder` and `process_order` are equivalent queries — no need to guess the naming convention.
+**Phrase search** bypasses splitting. Use quotes for exact matching:
+```
+"processOrder"   → matches only "processOrder" literally
+```
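The splitting rules in the table above can be sketched in a few lines of Python. This is an illustration only, not the package's implementation: the function name `preprocess_query` and the exact regular expressions are assumptions chosen to reproduce the table rows and the quoted-phrase bypass.

```python
import re


def preprocess_query(query: str) -> str:
    """Split identifiers into lowercase component words (hypothetical helper)."""
    q = query.strip()
    # Quoted phrases bypass splitting and are passed through unchanged.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return q
    # Boundary between a lowercase letter or digit and an uppercase letter:
    # getUserById -> get User By Id
    q = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", q)
    # Boundary between an acronym and a following capitalized word:
    # HTTPClient -> HTTP Client
    q = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", q)
    # Underscores, hyphens and runs of whitespace collapse to one space.
    q = re.sub(r"[_\-\s]+", " ", q)
    return q.strip().lower()


for raw in ["processOrder", "process_order", "HTTPClient", "getUserById"]:
    print(raw, "->", preprocess_query(raw))
```

Handling the acronym boundary as a separate rule keeps `HTTPClient` from being split into single letters.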
+---
+## Relevance ranker
+Results are scored by a multi-factor ranker that adjusts raw BM25 scores:
+| Factor | Boost | Example |
+|--------|-------|---------|
+| Exact name match | +3.0 | query `"authenticate"` matches symbol named `authenticate` |
+| Name starts-with | +1.5 | query `"auth"` matches `authenticateUser` |
+| Symbol kind filter match | +0.5 | `kind: "function"` filters and boosts |
+| File path proximity | +0.3 | query restricted to `src/auth/**` |
+| BM25 base | 1.0 | FTS5 BM25 score |
+The ranker ensures that an exact name hit always ranks above a summary-only hit, even if the query term occurs more often in the summary text.
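A minimal sketch of how the boosts in the table might combine, assuming they are simply added to the BM25 base score (this matches the debug example, where 4.21 + 3.0 = 7.21). The function name, the parameter names, the case-insensitive comparison, and the rule that the exact-match and starts-with boosts are mutually exclusive are all assumptions, not confirmed behavior.

```python
# Boost values taken from the table above.
EXACT_NAME_BOOST = 3.0
STARTS_WITH_BOOST = 1.5
KIND_BOOST = 0.5
PATH_BOOST = 0.3


def final_score(query: str, name: str, bm25: float,
                kind_matches: bool = False, path_matches: bool = False) -> float:
    """Add ranking boosts to a BM25 base score (hypothetical helper)."""
    score = bm25
    q, n = query.lower(), name.lower()  # assumption: case-insensitive match
    if n == q:
        score += EXACT_NAME_BOOST
    elif n.startswith(q):  # assumption: only one of the two name boosts applies
        score += STARTS_WITH_BOOST
    if kind_matches:
        score += KIND_BOOST
    if path_matches:
        score += PATH_BOOST
    return score


print(round(final_score("authenticateUser", "authenticateUser", 4.21), 2))
```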
+---
+## `search_symbols` vs `search_text`
+| Tool | Searches | Returns |
+|------|---------|---------|
+| `search_symbols` | Symbol names and summaries | Symbol metadata (no source) |
+| `search_text` | Raw file content (grep-style) | File + line + context snippet |
+Use `search_symbols` for navigating by identifier. Use `search_text` when you need to find a string that isn't a symbol name — error messages, config values, comments, string literals.
+---
+## Debug mode
+Pass `debug: true` to either search tool to get the scoring breakdown in the response:
+```json
+{
+  "query": "authenticateUser",
+  "debug": true
+}
+```
+Response includes:
+```json
+{
+  "symbols": [...],
+  "_debug": {
+    "preprocessedQuery": "authenticate user",
+    "ftsMatches": 12,
+    "rankedResults": [
+      {
+        "name": "authenticateUser",
+        "bm25": 4.21,
+        "nameBoost": 3.0,
+        "finalScore": 7.21
+      }
+    ]
+  }
+}
+```
+This is useful for diagnosing why a result ranks unexpectedly high or low.
+---
+## Search tips
+- **camelCase and snake_case are equivalent** — `processOrder`, `process_order`, and `process order` all return the same results.
+- **Short queries match more** — `auth` finds more symbols than `authentication function` because a shorter query places fewer constraints on what can match.
+- **Use `kind` filter to narrow** — `kind: "function"` eliminates class/method noise.
+- **Combine with semantic search** — use `mode: "hybrid"` for the best recall when you're not sure of the exact name.
+- **Scope with `filePath`** — `filePath: "src/auth/**"` restricts to a directory.
+- **For exact strings** — use `search_text` with `is_regex: false` when you need to find a literal string in source.

package/docs/12-ai-summarization.md ADDED

@@ -0,0 +1,106 @@
+# AI Summarization
+AI summarization generates one-line descriptions for symbols that have no docstring. Summaries appear in search results and reduce the need to fetch full source.
+---
+## Summary priority chain
+For every symbol, PureContext uses the first successful source in this order:
+1. **Extracted docstring** — JSDoc `/** */`, Python `"""`, `///`, `@doc`, Haddock, etc. No AI cost.
+2. **Framework-derived** — for recognized patterns: `"Vue component UserCard"`, `"GET /api/users Nuxt server route"`. No AI cost.
+3. **AI-generated** (optional) — requires config. Batched API call to the configured provider.
+4. **Signature fallback** — if AI is disabled or fails: reformatted one-liner from the symbol signature. No AI cost.
+The result is that well-documented codebases spend almost nothing on AI summarization.
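The priority chain above amounts to "try each source in order and keep the first non-empty result". The sketch below shows that pattern. The symbol dictionary keys (`docstring`, `framework_summary`, `signature`) and all function names are hypothetical, and the AI step is stubbed to return nothing, standing in for a disabled or failed provider.

```python
from typing import Callable, Optional

Source = Callable[[dict], Optional[str]]


def from_docstring(symbol: dict) -> Optional[str]:
    return symbol.get("docstring")


def from_framework(symbol: dict) -> Optional[str]:
    return symbol.get("framework_summary")


def from_ai(symbol: dict) -> Optional[str]:
    return None  # stub: stands in for a disabled or failed provider call


def from_signature(symbol: dict) -> Optional[str]:
    # Collapse whitespace so a multi-line signature becomes a one-liner.
    return " ".join(symbol["signature"].split())


CHAIN: list[tuple[str, Source]] = [
    ("docstring", from_docstring),
    ("framework", from_framework),
    ("ai", from_ai),
    ("signature", from_signature),
]


def pick_summary(symbol: dict) -> tuple[str, str]:
    """Return (source label, summary) from the first source that yields text."""
    for label, source in CHAIN:
        text = source(symbol)
        if text:
            return label, text
    return "none", ""
```

Because the free sources come first, the AI step is only reached for symbols that have neither a docstring nor a recognized framework pattern.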
+---
+## Enabling AI summarization
+AI summarization is **disabled by default** and requires two explicit opt-ins: a `provider` other than `"none"` and `allowRemoteAI: true`.
+```json
+{
+  "ai": {
+    "provider": "anthropic",
+    "allowRemoteAI": true,
+    "apiKey": "${ANTHROPIC_API_KEY}",
+    "model": "claude-haiku-4-5-20251001",
+    "batchSize": 50
+  }
+}
+```
+`allowRemoteAI: true` is a safety gate — without it, no outbound AI API calls are made even if `provider` is set. This prevents accidental API costs during development.
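The double opt-in can be read as a simple conjunction. The sketch below assumes the config has been loaded into a dictionary shaped like the JSON above; the function name `remote_ai_enabled` is hypothetical.

```python
def remote_ai_enabled(config: dict) -> bool:
    """True only when both opt-ins are present (hypothetical helper)."""
    ai = config.get("ai", {})
    provider_set = ai.get("provider", "none") != "none"
    # The gate must be explicitly true; a missing key counts as false.
    gate_open = ai.get("allowRemoteAI", False) is True
    return provider_set and gate_open


print(remote_ai_enabled({"ai": {"provider": "anthropic"}}))
print(remote_ai_enabled({"ai": {"provider": "anthropic", "allowRemoteAI": True}}))
```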
+---
+## Supported providers
+| Provider | `ai.provider` value | Recommended model | Notes |
+|----------|---------------------|-------------------|-------|
+| Anthropic | `"anthropic"` | `claude-haiku-4-5-20251001` | Best quality, fast |
+| OpenAI | `"openai"` | `gpt-4o-mini` | Good quality, cost-effective |
+| Google Gemini | `"google"` | `gemini-flash` | Lowest cost per token |
+| OpenAI-compatible | `"openai-compatible"` | any | Ollama, LM Studio, etc. |
+| Disabled | `"none"` | — | Default — no AI calls |
+### Using a local Ollama model
+```json
+{
+  "ai": {
+    "provider": "openai-compatible",
+    "allowRemoteAI": true,
+    "endpoint": "http://localhost:11434",
+    "model": "llama3.2",
+    "batchSize": 10
+  }
+}
+```
+---
+## Batch mode
+Symbols are summarized in batches to minimize API round trips. `ai.batchSize` controls how many symbols are sent per request (default: 50).
+The batch prompt includes all symbol signatures and asks for one-line summaries for each. Responses are parsed and cached in SQLite — no re-generation on repeated `index_folder` calls for unchanged files.
+**Cost estimate:** A 1,000-symbol project with no docstrings, using Claude Haiku at ~50 symbols/batch:
+- 20 API calls
+- ~10,000 input tokens + ~2,000 output tokens per call
+- Total: ~$0.01–0.05 depending on provider
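The arithmetic behind such an estimate can be sketched as follows. The call count and per-call token figures come from the list above. The per-million-token rates in the usage line are placeholders rather than any provider's published prices, so the dollar result only illustrates the calculation.

```python
import math


def estimate_cost(symbols: int, batch_size: int,
                  input_tokens_per_call: int, output_tokens_per_call: int,
                  usd_per_million_input: float, usd_per_million_output: float):
    """Return (calls, total input tokens, total output tokens, cost in USD)."""
    calls = math.ceil(symbols / batch_size)
    total_in = calls * input_tokens_per_call
    total_out = calls * output_tokens_per_call
    cost = (total_in / 1_000_000 * usd_per_million_input
            + total_out / 1_000_000 * usd_per_million_output)
    return calls, total_in, total_out, cost


# Placeholder rates (hypothetical): substitute your provider's current prices.
calls, total_in, total_out, cost = estimate_cost(1000, 50, 10_000, 2_000, 0.10, 0.40)
print(calls, total_in, total_out, round(cost, 3))
```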
+---
+## Google Gemini Flash
+Google Gemini Flash offers the lowest cost per token for summarization. Enable it with:
+```json
+{
+  "ai": {
+    "provider": "google",
+    "allowRemoteAI": true,
+    "apiKey": "${GEMINI_API_KEY}",
+    "model": "gemini-flash",
+    "batchSize": 100
+  }
+}
+```
+Gemini Flash supports larger batches than Claude or GPT-4o-mini, reducing the number of API calls.
+---
+## Cost management tips
+- **Use the cheapest model** — summaries are short, quality difference between Haiku/Flash/mini and Opus/GPT-4o is negligible.
+- **Only undocumented symbols trigger AI** — a codebase with JSDoc on every function costs almost nothing.
+- **Summaries are cached** — re-indexing unchanged files does not re-summarize.
+- **Set `allowRemoteAI: false` during development** (the default) to avoid accidental charges.
+- **Lower `batchSize` for local models** — local models can handle fewer tokens per call.

package/docs/13-token-savings.md ADDED

@@ -0,0 +1,110 @@
+# Token Savings Tracker
+Every retrieval tool call automatically tracks how many tokens were saved compared to reading full files. The tracker is always on — no configuration required.
+---
+## How savings are calculated
+```
+tokens_saved = max(0, (rawFileBytes - responseBytes) / 4)
+```
+- `rawFileBytes` — size of the full file(s) that would have been read
+- `responseBytes` — size of the actual response (symbol source or summary)
+- `4 bytes/token` — approximation using the cl100k_base encoding
+This is a **conservative estimate** — the actual savings are often higher because agents typically need to read multiple files to locate a symbol, while PureContext returns it directly.
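The formula translates directly into code. In this sketch the function name is hypothetical, and the use of integer division is an assumption based on the whole-number `tokens_saved` values in the sample responses.

```python
BYTES_PER_TOKEN = 4  # the approximation stated above


def tokens_saved(raw_file_bytes: int, response_bytes: int) -> int:
    """Estimate tokens saved; never negative when the response is larger."""
    return max(0, (raw_file_bytes - response_bytes) // BYTES_PER_TOKEN)


print(tokens_saved(8000, 632))
print(tokens_saved(100, 500))
```

The `max(0, ...)` clamp means a response larger than the raw file reports zero savings rather than a negative number.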
+---
+## Viewing savings
+### In every response
+Savings are included in the `_meta` field of every retrieval tool response:
+```json
+{
+  "symbol": { "name": "authenticateUser", ... },
+  "source": "...",
+  "_meta": {
+    "timing_ms": 3,
+    "tokens_saved": 1842,
+    "total_tokens_saved": 45231,
+    "cost_avoided": {
+      "claude_opus_4": 0.028,
+      "claude_sonnet_4": 0.006
+    },
+    "powered_by": "PureContext MCP"
+  }
+}
+```
+### Cumulative stats
+Use the `get_savings_stats` tool to view cumulative totals:
+```json
+{}
+```
+**Response:**
+```json
+{
+  "total_tokens_saved": 1234567,
+  "equivalent_context_windows": {
+    "claude_200k": 6.17,
+    "gpt4_128k": 9.64
+  },
+  "total_cost_avoided": {
+    "claude_opus_4": 18.52,
+    "claude_sonnet_4": 3.70,
+    "claude_haiku_4": 0.99,
+    "gpt4o": 3.09,
+    "gpt4o_mini": 0.19
+  }
+}
+```
+---
+## Interpreting results
+| Savings % | What it means |
+|-----------|--------------|
+| 90–98% | Typical for well-structured codebases — agents retrieving individual symbols |
+| 70–89% | Normal — some larger functions or files being retrieved whole |
+| < 70% | Check agent tool usage — agents may be calling `get_file_content` for full files, or using `get_repo_outline` frequently |
+**`equivalent_context_windows`** shows how many full context windows worth of tokens were saved — useful for communicating the value to stakeholders.
+**`total_cost_avoided`** is the dollar equivalent at published API rates for each model. This is an estimate at the time of the release — actual rates may differ.
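Both derived fields are plain ratios of the token total. In the sketch below, the per-million-token rates are not taken from the package: they are inferred from the sample response (for example, 1,234,567 tokens at $15 per million gives about $18.52) and should be treated as assumptions. Rounding to two decimals is also an assumption, so a last digit may differ from the sample.

```python
# Window sizes implied by the field names in the sample response.
CONTEXT_WINDOWS = {"claude_200k": 200_000, "gpt4_128k": 128_000}

# Assumed USD per million input tokens, inferred from the sample figures.
USD_PER_MILLION = {
    "claude_opus_4": 15.00,
    "claude_sonnet_4": 3.00,
    "claude_haiku_4": 0.80,
    "gpt4o": 2.50,
    "gpt4o_mini": 0.15,
}


def savings_stats(total_tokens_saved: int) -> dict:
    """Derive context-window and cost equivalents from a token total."""
    return {
        "total_tokens_saved": total_tokens_saved,
        "equivalent_context_windows": {
            name: round(total_tokens_saved / size, 2)
            for name, size in CONTEXT_WINDOWS.items()
        },
        "total_cost_avoided": {
            model: round(total_tokens_saved / 1_000_000 * rate, 2)
            for model, rate in USD_PER_MILLION.items()
        },
    }


print(savings_stats(1_234_567))
```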
+---
+## Persistence
+Savings persist to `~/.purecontext/_savings.json` across sessions. They accumulate indefinitely.
+To reset the counter:
+```json
+{
+  "reset": true
+}
+```
+---
+## What does and does not count
+| Counts toward savings | Does not count |
+|----------------------|---------------|
+| `get_symbol_source` — returns partial file | `list_repos` — no file content |
+| `get_file_outline` — returns symbols, not file | `search_symbols` — no file content |
+| `get_context_bundle` — returns selected symbols | `get_file_tree` — no file content |
+| `get_blast_radius` — returns file list | `index_folder` — write operation |
+| `get_file_content` with line range | `get_file_content` without range (full file) |

package/docs/14-transport-modes.md ADDED

@@ -0,0 +1,167 @@
+# Transport Modes
+PureContext supports two transport modes: **stdio** (local, default) and **HTTP/SSE** (team/cloud).
+## stdio transport (default)
+The standard transport for Claude Code and other MCP-native clients.
+```bash
+purecontext-mcp
+```
+Claude Code spawns `purecontext-mcp` as a child process and communicates over stdin/stdout using MCP's JSON-RPC messages. No network access or authentication is required.
+**Claude Code setup:**
+```bash
+# Using npx (recommended)
+claude mcp add purecontext-mcp -- npx purecontext-mcp
+# Using global install
+claude mcp add purecontext-mcp purecontext-mcp
+```
+**Best for:** Individual developers, local development, any situation where security and simplicity matter more than sharing.
+## HTTP / SSE transport
+For browser-based clients, remote development, or multi-client setups.
+```bash
+purecontext-mcp --transport http --port 3000
+```
+Or via `config.json`:
+```json
+{
+  "transport": "http",
+  "http": {
+    "port": 3000,
+    "host": "127.0.0.1",
+    "corsOrigins": ["http://localhost:*"]
+  }
+}
+```
+**HTTP endpoints:**
+| Endpoint | Description |
+|----------|-------------|
+| `GET /health` | Server health check (always public) |
+| `POST /mcp/sse` | MCP Streamable HTTP endpoint |
+| `GET /` | Web UI (served when UI is built) |
+| `GET /admin/*` | Admin API (requires admin key) |
+**Connect Claude Code to an HTTP server** by adding this to `~/.claude/claude_desktop_config.json`:
+```json
+{
+  "mcpServers": {
+    "purecontext": {
+      "transport": "http",
+      "url": "http://localhost:3000/mcp/sse"
+    }
+  }
+}
+```
+Or via CLI:
+```bash
+claude mcp add --transport http purecontext-remote \
+  https://purecontext.mycompany.com/mcp/sse \
+  --header "Authorization: Bearer pctx_yourkey"
+```
+**Best for:** Team deployments, shared index, CI pipelines, Web UI access.
+## Both transports simultaneously (development)
+Run stdio and HTTP at the same time — useful during development to test the HTTP API while still using Claude Code via stdio:
+```bash
+purecontext-mcp --transport both
+```
+## Choosing a transport
+| Scenario | Recommended transport |
+|----------|-----------------------|
+| Solo developer, local project | `stdio` |
+| Team with shared codebase | `http` (server) |
+| CI pipeline | `http` or `stdio` with cached index |
+| Web UI access | `http` |
+| Testing both simultaneously | `both` |
+## Authentication in HTTP mode
+When binding to a non-loopback address, always enable authentication:
+```json
+{
+  "http": {
+    "host": "0.0.0.0",
+    "auth": {
+      "enabled": true,
+      "token": "${PURECONTEXT_API_TOKEN}"
+    }
+  }
+}
+```
+If `token` is empty and `enabled` is `true`, a random 32-byte hex token is generated at startup and printed to stderr. Save it immediately — it is not persisted to disk.
+All MCP requests must include:
+```
+Authorization: Bearer <token>
+```
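A sketch of the two pieces described above: generating a random 32-byte hex token and checking a bearer header against it. This is not the server's code. The function names are hypothetical, and the constant-time comparison via `hmac.compare_digest` is a design choice of this example rather than something the documentation states.

```python
import hmac
import secrets


def generate_token() -> str:
    """32 random bytes rendered as 64 hexadecimal characters."""
    return secrets.token_hex(32)


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header (hypothetical helper)."""
    value = headers.get("Authorization", "")
    scheme, _, presented = value.partition(" ")
    if scheme != "Bearer" or not presented:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


token = generate_token()
print(len(token), is_authorized({"Authorization": f"Bearer {token}"}, token))
```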
+A warning is logged at startup if the server is bound to a non-loopback address with authentication disabled.
+## TLS / HTTPS
+PureContext does not terminate TLS itself. Put it behind a reverse proxy for HTTPS in production.
+**nginx example:**
+```nginx
+server {
+    listen 443 ssl;
+    server_name purecontext.mycompany.com;
+    ssl_certificate     /etc/letsencrypt/live/purecontext.mycompany.com/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/purecontext.mycompany.com/privkey.pem;
+    location / {
+        proxy_pass http://localhost:3000;
+        proxy_http_version 1.1;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection keep-alive;
+        proxy_set_header Host $host;
+        # Disable buffering for SSE
+        proxy_buffering off;
+        proxy_cache off;
+        proxy_read_timeout 3600s;
+    }
+}
+```
+**Caddy example:**
+```
+purecontext.mycompany.com {
+    reverse_proxy localhost:3000 {
+        flush_interval -1
+    }
+}
+```
+## SSE keepalive
+The HTTP server sends a `: ping` comment over the SSE stream every 30 seconds to keep connections alive through proxies and load balancers. If your proxy has a shorter idle timeout than 30 seconds, increase it (e.g., `proxy_read_timeout 3600s` in nginx).
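On the client side, SSE lines that begin with a colon are comments and carry no event, which is why `: ping` works as a keepalive. The sketch below is a simplified parser for illustration only: it handles `data:` fields and ignores everything else, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield event data payloads, skipping comment lines such as ': ping'."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment line, used here as a keepalive
        if line == "":
            # A blank line ends the current event; emit it if it has data.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            buffer.append(value)


stream = [": ping\n", "\n", 'data: {"ok": true}\n', "\n", ": ping\n", "\n"]
print(list(iter_sse_data(stream)))
```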