purecontext-mcp 1.1.0 → 1.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/AGENT_INSTRUCTIONS.md +509 -0
  2. package/AGENT_INSTRUCTIONS_SHORT.md +97 -0
  3. package/CHANGELOG.md +212 -0
  4. package/docs/01-introduction.md +69 -0
  5. package/docs/02-installation.md +267 -0
  6. package/docs/03-quick-start.md +135 -0
  7. package/docs/04-configuration.md +214 -0
  8. package/docs/05-cli-reference.md +130 -0
  9. package/docs/06-tools-reference.md +499 -0
  10. package/docs/07-language-support.md +88 -0
  11. package/docs/08-framework-adapters.md +324 -0
  12. package/docs/09-dependency-graph.md +182 -0
  13. package/docs/10-semantic-search.md +153 -0
  14. package/docs/11-search-quality.md +110 -0
  15. package/docs/12-ai-summarization.md +106 -0
  16. package/docs/13-token-savings.md +110 -0
  17. package/docs/14-transport-modes.md +167 -0
  18. package/docs/15-team-setup.md +251 -0
  19. package/docs/16-docker.md +186 -0
  20. package/docs/17-web-ui.md +157 -0
  21. package/docs/18-git-history.md +157 -0
  22. package/docs/19-cross-repo.md +177 -0
  23. package/docs/20-architecture-analysis.md +228 -0
  24. package/docs/21-ecosystem-tools.md +189 -0
  25. package/docs/22-distribution.md +240 -0
  26. package/docs/23-performance.md +121 -0
  27. package/docs/24-security.md +144 -0
  28. package/docs/25-architecture-overview.md +240 -0
  29. package/docs/26-troubleshooting.md +234 -0
  30. package/docs/27-api-stability.md +114 -0
  31. package/docs/README.md +71 -0
  32. package/guide/README.md +57 -0
  33. package/guide/ai-summaries.md +127 -0
  34. package/guide/code-health.md +190 -0
  35. package/guide/code-history.md +149 -0
  36. package/guide/finding-code.md +157 -0
  37. package/guide/navigating-new-code.md +121 -0
  38. package/guide/safe-changes.md +156 -0
  39. package/guide/team-setup.md +191 -0
  40. package/guide/web-ui.md +154 -0
  41. package/guide/why-purecontext.md +73 -0
  42. package/guide/workflow-onboarding.md +114 -0
  43. package/guide/workflow-pr-review.md +199 -0
  44. package/guide/workflow-refactoring.md +172 -0
  45. package/package.json +9 -2
@@ -0,0 +1,189 @@
1
+ # Ecosystem & Data Tools
2
+
3
+
4
+ Ecosystem tools extend PureContext to data-centric codebases (dbt projects, SQL schemas, and OpenAPI specifications) and add a context provider framework for domain-specific integrations.
5
+
6
+ ---
7
+
8
+ ## Context provider framework
9
+
10
+ A context provider is a plugin that adds domain-specific enrichment to symbol metadata and search results. Providers are loaded automatically when their target framework is detected.
11
+
12
+ **Built-in providers:**
13
+ - **dbt provider** — enriches dbt model symbols with column lineage and upstream/downstream dependencies
14
+ - **OpenAPI provider** — enriches endpoint symbols with request/response schema details
15
+ - **SQL provider** — enriches table symbols with column definitions and foreign key relationships
16
+
17
+ **Writing a custom provider:**
18
+
19
+ ```typescript
20
+ interface ContextProvider {
21
+ name: string;
22
+ detect(projectRoot: string): Promise<boolean>;
23
+ enrich(symbol: SymbolRecord): Promise<EnrichedSymbol>;
24
+ }
25
+ ```
26
+
27
+ Register in `config.json`:
28
+
29
+ ```json
30
+ {
31
+ "contextProviders": ["my-custom-provider"]
32
+ }
33
+ ```
34
+
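A minimal sketch of a provider that satisfies the `ContextProvider` interface above. The `SymbolRecord` and `EnrichedSymbol` shapes, the `ownerFromPath` helper, the `services/<team>/` directory layout, and the injected `hasMarkerFile` check are all assumptions made for illustration; the real types and the way providers are loaded are not shown in this document.

```typescript
// Hypothetical shapes: the real SymbolRecord / EnrichedSymbol are not shown here.
interface SymbolRecord {
  id: string;
  name: string;
  kind: string;
  filePath: string;
}

interface EnrichedSymbol extends SymbolRecord {
  frameworkMeta: Record<string, unknown>;
}

// Interface as documented above.
interface ContextProvider {
  name: string;
  detect(projectRoot: string): Promise<boolean>;
  enrich(symbol: SymbolRecord): Promise<EnrichedSymbol>;
}

// Derive an owning team from a path such as "services/billing/src/invoice.ts".
// The "services/<team>/" convention is an assumption for this example.
function ownerFromPath(filePath: string): string | null {
  const match = /^services\/([^/]+)\//.exec(filePath);
  return match ? match[1] ?? null : null;
}

// The file-system check is injected so the provider stays easy to test.
function createOwnerProvider(
  hasMarkerFile: (projectRoot: string) => Promise<boolean>
): ContextProvider {
  return {
    name: "my-custom-provider",
    detect(projectRoot: string): Promise<boolean> {
      return hasMarkerFile(projectRoot);
    },
    async enrich(symbol: SymbolRecord): Promise<EnrichedSymbol> {
      // Attach the derived owner without mutating the input record.
      return { ...symbol, frameworkMeta: { owner: ownerFromPath(symbol.filePath) } };
    },
  };
}
```

The provider name matches the `contextProviders` entry in the config example; how PureContext resolves that name to a module is not covered here.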
35
+ ---
36
+
37
+ ## dbt integration
38
+
39
+ **Auto-detected by:** `dbt_project.yml` in project root.
40
+
41
+ ### What is indexed
42
+
43
+ | dbt artifact | Symbol kind | Notes |
44
+ |-------------|-------------|-------|
45
+ | Model (`.sql`) | `function` | SQL logic as source, dbt Jinja expanded |
46
+ | Source | `const` | External data source reference |
47
+ | Seed (`.csv`) | `const` | Static data table |
48
+ | Macro | `function` | Jinja macro definition |
49
+ | Exposure | `const` | Dashboard/downstream consumer |
50
+
51
+ Column definitions from `schema.yml` are stored in `frameworkMeta.columns`.
52
+
53
+ ### dbt Jinja expansion
54
+
55
+ Before parsing, dbt SQL files are pre-processed to expand Jinja templating:
56
+ - `{{ ref('orders') }}` → resolved model name
57
+ - `{{ source('raw', 'events') }}` → source reference
58
+ - `{{ config(...) }}` → stripped
59
+
60
+ This allows the SQL handler to parse the underlying SQL accurately.
61
+
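The three rewrites above can be sketched with plain regular expressions. This is an illustrative approximation, not PureContext's actual pre-processor: the `expandDbtJinja` name is hypothetical, `source()` is rendered as `schema.table` by assumption, and `config(...)` calls containing nested parentheses are not handled.

```typescript
// Expand the three dbt Jinja forms listed above so the remaining text is plain SQL.
function expandDbtJinja(sql: string): string {
  return sql
    // {{ config(...) }} -> stripped (no nested parentheses supported)
    .replace(/\{\{\s*config\([^)]*\)\s*\}\}/g, "")
    // {{ ref('orders') }} -> orders
    .replace(/\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}/g, "$1")
    // {{ source('raw', 'events') }} -> raw.events (assumed rendering)
    .replace(
      /\{\{\s*source\(\s*['"]([^'"]+)['"]\s*,\s*['"]([^'"]+)['"]\s*\)\s*\}\}/g,
      "$1.$2"
    );
}

const model = "{{ config(materialized='table') }}select * from {{ ref('orders') }} o join {{ source('raw', 'events') }} e on o.id = e.order_id";
const expanded = expandDbtJinja(model);
```

A real implementation would more likely read resolved names from `target/manifest.json`, which is why the manifest needs to be current before indexing.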
62
+ ### Configuration
63
+
64
+ ```json
65
+ {
66
+ "dbt": {
67
+ "manifestPath": "target/manifest.json",
68
+ "profilesPath": "~/.dbt/profiles.yml"
69
+ }
70
+ }
71
+ ```
72
+
73
+ Run `dbt compile` or `dbt run` before indexing to ensure `target/manifest.json` is current.
74
+
75
+ ---
76
+
77
+ ## `search_columns`
78
+
79
+ Search column definitions across dbt models and SQL tables.
80
+
81
+ **Parameters:**
82
+
83
+ | Parameter | Type | Default | Description |
84
+ |-----------|------|---------|-------------|
85
+ | `repoId` | `string` | required | Target repository |
86
+ | `query` | `string` | required | Column name fragment |
87
+ | `modelName` | `string` | — | Restrict to a specific model |
88
+
89
+ **Response:**
90
+
91
+ ```json
92
+ {
93
+ "columns": [
94
+ {
95
+ "name": "user_id",
96
+ "model": "fct_orders",
97
+ "dataType": "bigint",
98
+ "description": "Foreign key to dim_users",
99
+ "nullable": false,
100
+ "lineage": {
101
+ "upstream": ["stg_orders.user_id"],
102
+ "downstream": ["rpt_user_activity.user_id", "fct_revenue.user_id"]
103
+ }
104
+ }
105
+ ]
106
+ }
107
+ ```
108
+
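For callers that consume this response in code, the JSON above maps naturally onto a few types. The interface names and the `downstreamModels` helper below are hypothetical, derived only from the example response; they are not part of a documented client API.

```typescript
// Types inferred from the example response above (names are assumptions).
interface ColumnLineage {
  upstream: string[];
  downstream: string[];
}

interface ColumnResult {
  name: string;
  model: string;
  dataType: string;
  description: string;
  nullable: boolean;
  lineage: ColumnLineage;
}

interface SearchColumnsResponse {
  columns: ColumnResult[];
}

// Collect the distinct models that consume any of the returned columns.
function downstreamModels(response: SearchColumnsResponse): string[] {
  const models = new Set<string>();
  for (const column of response.columns) {
    for (const ref of column.lineage.downstream) {
      // Lineage entries look like "model.column"; keep the model part.
      const dot = ref.lastIndexOf(".");
      models.add(dot === -1 ? ref : ref.slice(0, dot));
    }
  }
  return Array.from(models).sort();
}

const sample: SearchColumnsResponse = {
  columns: [
    {
      name: "user_id",
      model: "fct_orders",
      dataType: "bigint",
      description: "Foreign key to dim_users",
      nullable: false,
      lineage: {
        upstream: ["stg_orders.user_id"],
        downstream: ["rpt_user_activity.user_id", "fct_revenue.user_id"],
      },
    },
  ],
};
```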
109
+ **Use cases:**
110
+ - "Find all columns named `user_id` across my dbt project"
111
+ - "What models produce the `revenue` column?"
112
+ - "What is the lineage of `order_status`?"
113
+
114
+ ---
115
+
116
+ ## OpenAPI / Swagger handler
117
+
118
+ **Auto-detected by:** `openapi.yaml`, `openapi.json`, `swagger.yaml`, or `swagger.json` in the project root, or files with `openapi: 3.x.x` content.
119
+
120
+ ### What is indexed
121
+
122
+ | OpenAPI artifact | Symbol kind | Notes |
123
+ |-----------------|-------------|-------|
124
+ | Endpoint (`GET /users`) | `route` | Path + method as name |
125
+ | Schema object | `type` | Request/response schema |
126
+ | Parameter | `const` | Query/path/header parameter |
127
+
128
+ ### Using OpenAPI symbols
129
+
130
+ ```
131
+ search_symbols(query: "users", kind: "route")
132
+ → "GET /users", "POST /users", "GET /users/{id}"
133
+
134
+ get_symbol_source(symbolId: "GET /users/{id}")
135
+ → Full endpoint definition including parameters, request body, response schemas
136
+ ```
137
+
138
+ ---
139
+
140
+ ## SQL handler
141
+
142
+ **Extensions:** `.sql` files.
143
+
144
+ **Detected separately from dbt** — the SQL handler processes raw SQL files without dbt Jinja.
145
+
146
+ ### What is indexed
147
+
148
+ | SQL statement | Symbol kind |
149
+ |--------------|-------------|
150
+ | `CREATE TABLE` | `class` |
151
+ | `CREATE VIEW` | `function` |
152
+ | `CREATE FUNCTION` | `function` |
153
+ | `CREATE PROCEDURE` | `function` |
154
+ | `CREATE INDEX` | `const` |
155
+
156
+ For dbt projects, the SQL handler works alongside the dbt provider — the provider handles Jinja expansion and column lineage, the handler handles AST parsing.
157
+
158
+ ### Example
159
+
160
+ ```
161
+ search_symbols(query: "orders", kind: "class")
162
+ → "orders" table (CREATE TABLE orders ...)
163
+
164
+ get_symbol_source(symbolId: "orders-table-id")
165
+ → Full CREATE TABLE statement with all column definitions
166
+ ```
167
+
168
+ ---
169
+
170
+ ## Combining data tools
171
+
172
+ A typical data platform exploration workflow:
173
+
174
+ ```
175
+ 1. search_columns(query: "revenue")
176
+ → Find all columns named 'revenue' and their models
177
+
178
+ 2. get_symbol_source(symbolId: "fct_revenue-model-id")
179
+ → See the SQL logic that produces the revenue column
180
+
181
+ 3. get_context_bundle(symbolId: "fct_revenue-model-id")
182
+ → Traverse upstream to understand the full lineage
183
+
184
+ 4. search_symbols(query: "revenue", kind: "route")
185
+ → Find the API endpoints that expose revenue data
186
+
187
+ 5. get_blast_radius(symbolId: "fct_revenue-model-id")
188
+ → See which dashboards and downstream models depend on this
189
+ ```
@@ -0,0 +1,240 @@
1
+ # Distribution & Platform
2
+
3
+
4
+ PureContext supports distribution and automation through index export/import, a public registry of pre-built indexes, webhooks for auto-reindex, GitHub Actions integration, and a VS Code extension.
5
+
6
+ ---
7
+
8
+ ## Index export and import
9
+
10
+ Share pre-built indexes without requiring everyone to re-index from scratch.
11
+
12
+ ### Export
13
+
14
+ ```bash
15
+ npx purecontext-mcp export --repo <repoId> --out index.pctx.tar.gz
16
+ ```
17
+
18
+ Or by path:
19
+
20
+ ```bash
21
+ npx purecontext-mcp export --path /path/to/project --out index.pctx.tar.gz
22
+ ```
23
+
24
+ The archive contains: compressed SQLite database, HNSW index (if present), and a metadata JSON file.
25
+
26
+ ### Import
27
+
28
+ ```bash
29
+ npx purecontext-mcp import --file index.pctx.tar.gz
30
+ ```
31
+
32
+ After import, the repo is immediately searchable — no re-indexing required.
33
+
34
+ ### Use cases
35
+
36
+ - **Team onboarding**: export the index after CI, share as an artifact — new developers get a pre-built index on day one
37
+ - **CI pipeline**: cache the index between runs (see GitHub Actions below)
38
+ - **Server migration**: move indexes from one server to another without re-indexing
39
+
40
+ ---
41
+
42
+ ## Public registry
43
+
44
+ Pre-built indexes for popular open-source projects are hosted on a CDN.
45
+
46
+ ### Pulling a registry index
47
+
48
+ ```bash
49
+ npx purecontext-mcp pull react@18
50
+ npx purecontext-mcp pull typescript@5
51
+ npx purecontext-mcp pull django@4.2
52
+ ```
53
+
54
+ The index is downloaded and imported automatically. Use `list_repos` to confirm it's available.
55
+
56
+ ### Available packages
57
+
58
+ ```bash
59
+ npx purecontext-mcp registry list
60
+ # Lists all available packages with versions and index sizes
61
+ ```
62
+
63
+ ### Requesting a new package
64
+
65
+ Open an issue on GitHub with the package name and version. Registry indexes are built automatically from GitHub releases using the GitHub Actions integration.
66
+
67
+ ---
68
+
69
+ ## Webhooks for auto-reindex
70
+
71
+ Configure a webhook endpoint to trigger re-indexing automatically when code is pushed to your repository.
72
+
73
+ ### Setup
74
+
75
+ 1. In your PureContext server config:
76
+
77
+ ```json
78
+ {
79
+ "webhooks": {
80
+ "enabled": true,
81
+ "secret": "${WEBHOOK_SECRET}",
82
+ "branches": ["main", "develop"]
83
+ }
84
+ }
85
+ ```
86
+
87
+ 2. In your GitHub repository settings:
88
+ - Go to Settings → Webhooks → Add webhook
89
+ - Payload URL: `https://your-server/webhook/github`
90
+ - Content type: `application/json`
91
+ - Secret: same value as `WEBHOOK_SECRET`
92
+ - Events: "Just the push event"
93
+
94
+ ### How it works
95
+
96
+ When a push is received:
97
+ 1. PureContext verifies the webhook signature (HMAC-SHA256)
98
+ 2. Checks if the pushed branch is in `webhooks.branches`
99
+ 3. Triggers an incremental re-index of the affected repo
100
+ 4. New symbols are available within seconds
101
+
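Steps 1 and 2 can be sketched as follows. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=<hex digest>` and the pushed branch as a `refs/heads/<branch>` ref; the function names here are hypothetical and this is not PureContext's actual handler.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Compute the value GitHub would send in X-Hub-Signature-256 for this body.
function signPayload(secret: string, rawBody: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Step 1: verify the signature using a constant-time comparison.
function verifySignature(secret: string, rawBody: string, header: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Step 2: check the pushed ref against the configured branch list.
function shouldReindex(ref: string, branches: string[]): boolean {
  const prefix = "refs/heads/";
  return ref.startsWith(prefix) && branches.includes(ref.slice(prefix.length));
}
```

The signature must be computed over the raw request body exactly as received; re-serializing parsed JSON can change the bytes and make verification fail.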
102
+ ### GitLab and others
103
+
104
+ Custom webhook formats are supported by mapping them to PureContext's internal format:
105
+
106
+ ```json
107
+ {
108
+ "webhooks": {
109
+ "enabled": true,
110
+ "format": "gitlab",
111
+ "secret": "${WEBHOOK_SECRET}"
112
+ }
113
+ }
114
+ ```
115
+
116
+ ---
117
+
118
+ ## GitHub Actions integration
119
+
120
+ The official `purecontext/index-action` automates index building in CI.
121
+
122
+ ### Basic usage
123
+
124
+ ```yaml
125
+ # .github/workflows/index.yml
126
+ name: Index with PureContext
127
+ on:
128
+ push:
129
+ branches: [main]
130
+
131
+ jobs:
132
+ index:
133
+ runs-on: ubuntu-latest
134
+ steps:
135
+ - uses: actions/checkout@v4
136
+
137
+ - name: Index repository
138
+ uses: purecontext/index-action@v1
139
+ with:
140
+ server-url: ${{ vars.PCTX_SERVER_URL }}
141
+ api-key: ${{ secrets.PCTX_API_KEY }}
142
+ ```
143
+
144
+ ### Caching the index in CI
145
+
146
+ ```yaml
147
+ - name: Cache PureContext index
148
+ uses: actions/cache@v4
149
+ with:
150
+ path: ~/.purecontext/indexes
151
+ key: purecontext-${{ github.sha }}
152
+ restore-keys: purecontext-
153
+
154
+ - name: Index repository
155
+ uses: purecontext/index-action@v1
156
+ with:
157
+ path: ${{ github.workspace }}
158
+ ```
159
+
160
+ With caching, only changed files are re-parsed on each run — CI index time drops to seconds after the first run.
161
+
162
+ ### Publishing to the registry
163
+
164
+ ```yaml
165
+ - name: Publish index to registry
166
+ uses: purecontext/index-action@v1
167
+ with:
168
+ action: publish
169
+ package-name: my-org/my-library
170
+ api-key: ${{ secrets.PCTX_REGISTRY_KEY }}
171
+ ```
172
+
173
+ See the full `action.yml` in the project root for all available inputs.
174
+
175
+ ---
176
+
177
+ ## VS Code extension
178
+
179
+ The PureContext VS Code extension integrates symbol search and navigation directly into the editor.
180
+
181
+ ### Installation
182
+
183
+ Search "PureContext" in the VS Code Extensions panel, or:
184
+
185
+ ```bash
186
+ code --install-extension purecontext.purecontext-vscode
187
+ ```
188
+
189
+ The source is in `vscode-extension/` in the project repo.
190
+
191
+ ### Features
192
+
193
+ | Feature | Description |
194
+ |---------|-------------|
195
+ | Symbol search | `Ctrl+Shift+P` → "PureContext: Search Symbols" |
196
+ | Hover summary | Hover over any identifier to see its PureContext summary |
197
+ | Go to definition | Uses PureContext index for faster lookup in large repos |
198
+ | Dependency graph | `Ctrl+Shift+P` → "PureContext: Show Dependencies" — opens graph panel |
199
+ | Blast radius | Right-click a symbol → "Show Blast Radius" |
200
+ | Quick outline | `Ctrl+Shift+O` with PureContext — shows AI-enriched summaries |
201
+
202
+ ### Configuration
203
+
204
+ Extension settings in VS Code match the `config.json` fields:
205
+
206
+ ```json
207
+ // .vscode/settings.json
208
+ {
209
+ "purecontext.serverUrl": "http://localhost:3000",
210
+ "purecontext.apiKey": "pctx_...",
211
+ "purecontext.enabled": true
212
+ }
213
+ ```
214
+
215
+ ---
216
+
217
+ ## Programmatic API
218
+
219
+ For building custom integrations, the `@purecontext/client` npm package provides a typed TypeScript client:
220
+
221
+ ```typescript
222
+ import { PureContextClient } from '@purecontext/client';
223
+
224
+ const client = new PureContextClient({
225
+ serverUrl: 'http://localhost:3000',
226
+ apiKey: 'pctx_...'
227
+ });
228
+
229
+ const symbols = await client.searchSymbols({
230
+ repoId: 'a1b2c3d4',
231
+ query: 'authenticate',
232
+ kind: 'function'
233
+ });
234
+ ```
235
+
236
+ All tool inputs and outputs are fully typed. Install:
237
+
238
+ ```bash
239
+ npm install @purecontext/client
240
+ ```
@@ -0,0 +1,121 @@
1
+ # Performance & Scalability
2
+
3
+
4
+ PureContext is designed to handle enterprise-scale repos (10k–50k files) using a worker thread pool for parallel tree-sitter parsing.
5
+
6
+ ---
7
+
8
+ ## Indexing speed
9
+
10
+ Typical performance on a 4-core machine:
11
+
12
+ | Repo size | First index | Incremental re-index |
13
+ |-----------|-------------|----------------------|
14
+ | 500 files | ~2 seconds | < 100ms |
15
+ | 5,000 files | ~15 seconds | < 1 second |
16
+ | 20,000 files | ~60 seconds | 1–3 seconds |
17
+ | 50,000 files | ~3 minutes | 2–10 seconds |
18
+
19
+ These numbers assume no AI summarization or semantic indexing. Both add API round-trip time.
20
+
21
+ ---
22
+
23
+ ## Worker thread pool
24
+
25
+ The bottleneck in sequential indexing is tree-sitter WASM parsing — each WASM instance is single-threaded. The worker thread pool parallelizes parsing across CPU cores.
26
+
27
+ ```
28
+ Main thread
29
+
30
+ ┌────────────┼────────────┐
31
+ ▼ ▼ ▼
32
+ Worker 1 Worker 2 Worker 3
33
+ (TypeScript) (Python) (Go)
34
+ parse + extract parse + extract parse + extract
35
+ │ │ │
36
+ └────────────┴────────────┘
37
+
38
+ Main thread
39
+ (SQLite writes)
40
+ ```
41
+
42
+ Each worker loads its own WASM grammar instances. File batches are distributed across workers by the main thread. SQLite writes are serialized on the main thread (better-sqlite3 is synchronous).
43
+
44
+ ### Configuring worker threads
45
+
46
+ ```json
47
+ {
48
+ "workerThreads": 4
49
+ }
50
+ ```
51
+
52
+ The default is `os.cpus().length - 1`, with a minimum of 1. Increase the value for CPU-bound workloads on machines with many cores, but do not exceed `os.cpus().length - 1`: one core should stay free for the main thread and the OS.
53
+
54
+ ---
55
+
56
+ ## Memory usage
57
+
58
+ | Component | Memory |
59
+ |-----------|--------|
60
+ | WASM grammars (per worker) | ~20–30 MB per grammar loaded |
61
+ | In-memory symbol cache (during indexing) | ~100 MB for 10k symbols |
62
+ | SQLite WAL mode (at rest) | ~50 MB |
63
+ | HNSW vector index (if enabled) | ~100 bytes per embedding dimension per symbol |
64
+
65
+ **Typical peak during indexing:** 200–500 MB for a 10k-file repo. Returns to ~50 MB at rest.
66
+
67
+ Workers are spawned once and reused for the lifetime of the server — no spawn/teardown overhead per index run.
68
+
69
+ ---
70
+
71
+ ## Incremental re-indexing
72
+
73
+ The content hash cache makes re-indexing very fast:
74
+
75
+ 1. Each file's SHA-256 hash is stored in the `files` table after indexing
76
+ 2. On re-index, the hash is recomputed and compared
77
+ 3. Only files with a changed hash are re-parsed
78
+ 4. Symbols for unchanged files are retained as-is
79
+
80
+ A typical `git pull` touches 10–50 files, so only those files are re-parsed. See the table under "Indexing speed" for end-to-end re-index times by repo size.
81
+
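The hash comparison in steps 1 to 3 can be sketched like this. The in-memory maps stand in for the `files` table and the file system, and the function names are hypothetical; handling of deleted files is left out.

```typescript
import { createHash } from "crypto";

// SHA-256 of a file's content, hex-encoded.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Return the paths whose current hash differs from the stored one.
// New files have no stored hash and are therefore always included.
function filesToReparse(
  storedHashes: Map<string, string>,
  currentContents: Map<string, string>
): string[] {
  const changed: string[] = [];
  currentContents.forEach((content, filePath) => {
    if (storedHashes.get(filePath) !== sha256Hex(content)) {
      changed.push(filePath);
    }
  });
  return changed.sort();
}
```

Every file is still read and hashed on each run, so the cost of an incremental re-index grows with repo size even when few files have changed.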
82
+ To force a full re-index (bypass the hash cache):
83
+
84
+ ```
85
+ Use invalidate_cache tool, then index_folder again.
86
+ ```
87
+
88
+ Or call `index_folder` with `force: true`.
89
+
90
+ ---
91
+
92
+ ## Large repo tuning
93
+
94
+ For repos with > 10,000 files:
95
+
96
+ | Setting | Recommendation |
97
+ |---------|---------------|
98
+ | `workerThreads` | Set to `os.cpus().length - 1` |
99
+ | `watchDebounceMs` | Increase to `5000` if many files change at once (e.g., code generation) |
100
+ | `excludePatterns` | Add patterns for generated files, test fixtures with large data files |
101
+ | `maxFileSizeBytes` | Keep at 1 MB or lower — parsing multi-MB files is slow and rarely useful |
102
+ | `fileLimit` | Set to `0` (unlimited) if you need the full repo indexed |
103
+
104
+ ---
105
+
106
+ ## SQLite performance
107
+
108
+ SQLite in **WAL (Write-Ahead Logging) mode** provides:
109
+ - Concurrent reads without blocking writes
110
+ - Fast writes (commits append to the WAL; with `synchronous=NORMAL`, SQLite does not fsync on every commit)
111
+ - Crash safety (WAL journal ensures atomicity)
112
+
113
+ Query performance:
114
+ - `search_symbols` with FTS5: < 5ms for 100k symbols
115
+ - `get_symbol_source`: < 1ms (single row lookup by primary key)
116
+ - `get_blast_radius` (depth 5): 5–20ms depending on graph density
117
+ - `get_context_bundle` (depth 3): 3–15ms
118
+
119
+ No tuning is needed for the SQLite layer up to ~500k symbols. At very large scale, consider periodic `VACUUM` to reclaim space from deleted symbols.
120
+
121
+
@@ -0,0 +1,144 @@
1
+ # Security
2
+
3
+
4
+ ---
5
+
6
+ ## Threat model
7
+
8
+ PureContext stores and serves source code metadata. Security measures focus on:
9
+
10
+ **Protected:**
11
+ - Symbol names, signatures, summaries — stored in SQLite
12
+ - Raw source returned by `get_symbol_source` / `get_file_content`
13
+ - Admin API (workspace/key management)
14
+
15
+ **Not in scope:**
16
+ - The source repository itself — PureContext only reads it during indexing
17
+ - Network transport — handle TLS at a reverse proxy
18
+ - Host OS security — standard server hardening applies
19
+
20
+ ---
21
+
22
+ ## Path traversal prevention
23
+
24
+ All file paths are validated before any read:
25
+
26
+ 1. Resolved to an absolute path
27
+ 2. Verified to start within the project root (the indexed directory)
28
+ 3. Symlinks that resolve outside the root are blocked unless `allowSymlinks: true`
29
+
30
+ This prevents tools like `get_file_content` from being used to read arbitrary files on the server.
31
+
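A minimal sketch of checks 1 and 2, using Node's `path` module. The `isInsideRoot` name is hypothetical, and the symlink check (step 3) is omitted; it would additionally require resolving the real path on disk, for example with `fs.realpathSync`.

```typescript
import * as path from "path";

// True when `requested`, resolved against the project root, stays inside it.
function isInsideRoot(projectRoot: string, requested: string): boolean {
  const root = path.resolve(projectRoot);
  // Step 1: resolve to an absolute path (relative inputs are taken from the root).
  const target = path.resolve(root, requested);
  // Step 2: the relative path from root to target must not climb out of the root.
  const rel = path.relative(root, target);
  if (rel === "") return true; // the root itself
  const climbsOut = rel === ".." || rel.startsWith(".." + path.sep);
  return !climbsOut && !path.isAbsolute(rel);
}
```

Comparing via `path.relative` avoids the prefix pitfall of a plain string check, where `/srv/repo-other` would pass a `startsWith("/srv/repo")` test.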
32
+ ---
33
+
34
+ ## Secret file exclusion
35
+
36
+ The following files are automatically excluded from indexing (never stored in the index):
37
+
38
+ - `.env`, `.env.*`, `.env.local`, `.env.production`
39
+ - `*.pem`, `*.key`, `*.p12`, `*.pfx`, `*.crt`, `*.cer`
40
+ - `id_rsa`, `id_ed25519`, `id_ecdsa`, `id_dsa`
41
+ - `credentials.json`, `credentials.yaml`, `secrets.json`
42
+ - `serviceAccountKey*.json`, `*-service-account.json`
43
+ - `*.token`, `*.secret`
44
+
45
+ These patterns are built into the file discovery layer and cannot be overridden by `excludePatterns`.
46
+
47
+ ---
48
+
49
+ ## Binary file detection
50
+
51
+ Files are scanned for null bytes in the first 8 KB. Files with null bytes are treated as binary and skipped — preventing large binary files (which may contain embedded secrets) from entering the index.
52
+
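The null-byte heuristic is simple enough to sketch directly. The `looksBinary` name and the way the bytes are obtained are assumptions; only the 8 KB scan window comes from the description above.

```typescript
// Treat a file as binary if a null byte appears within the first 8 KB.
function looksBinary(bytes: Uint8Array, scanLimit: number = 8192): boolean {
  const end = Math.min(bytes.length, scanLimit);
  for (let i = 0; i < end; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

Because only the first 8 KB are scanned, a file whose first null byte appears later is treated as text under this heuristic.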
53
+ ---
54
+
55
+ ## API key security
56
+
57
+ Keys are stored as **bcrypt hashes** in the auth database — plaintext is never persisted after the key is generated.
58
+
59
+ - Keys are shown once on creation — store in a password manager or CI secrets
60
+ - Key format: `pctx_<workspaceId>_<24-char-random>_<checksum>`
61
+ - The checksum allows fast format validation without a database lookup
62
+ - Rotate keys by revoking the old one and creating a new one
63
+ - Use `read` permission for agents that only query, not `write` or `admin`
64
+
65
+ ---
66
+
67
+ ## Workspace isolation
68
+
69
+ Every query is scoped to the workspace of the API key used:
70
+
71
+ - A key from workspace A cannot query repos in workspace B
72
+ - Workspace scoping is enforced in all SQL queries via `workspace_id` column
73
+ - The admin key (`PCTX_ADMIN_KEY`) bypasses workspace isolation — protect it like a root password
74
+
75
+ ---
76
+
77
+ ## Rate limiting
78
+
79
+ Per-key rate limits (token bucket algorithm):
80
+
81
+ - `rateLimit.maxTokens` — bucket capacity (default: 100)
82
+ - `rateLimit.refillRate` — tokens/second refill rate (default: 10)
83
+ - Heavy tools (e.g., `index_folder`) cost more tokens per call
84
+
85
+ When exceeded: `429 Too Many Requests` with `Retry-After` header.
86
+
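A token bucket with the two documented parameters can be sketched as below. The class shape, the injected clock, and the per-call `cost` argument are assumptions for illustration; the actual per-tool costs are not listed in this document.

```typescript
interface ConsumeResult {
  allowed: boolean;
  retryAfterSeconds: number; // value for the Retry-After header when rejected
}

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly maxTokens: number,  // rateLimit.maxTokens
    private readonly refillRate: number, // rateLimit.refillRate, tokens per second
    nowMs: number
  ) {
    this.tokens = maxTokens; // a new bucket starts full
    this.lastRefillMs = nowMs;
  }

  tryConsume(cost: number, nowMs: number): ConsumeResult {
    // Refill in proportion to elapsed time, capped at the bucket capacity.
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefillMs = nowMs;

    if (this.tokens >= cost) {
      this.tokens -= cost;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    // Not enough tokens: report how long until the deficit is refilled.
    const deficit = cost - this.tokens;
    return { allowed: false, retryAfterSeconds: Math.ceil(deficit / this.refillRate) };
  }
}
```

With the defaults (capacity 100, refill 10 per second), a client can burst up to 100 token-units of calls and then sustain about 10 per second.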
87
+ ---
88
+
89
+ ## HTTP security
90
+
91
+ - **Default host: `127.0.0.1`** — loopback only, not exposed on the network
92
+ - A warning is logged at startup if `host` is not loopback and `auth.enabled` is false
93
+ - **Timing-safe comparison** — `crypto.timingSafeEqual()` used for token comparison (prevents timing attacks)
94
+ - **Request body limit** — 1 MB maximum
95
+ - **CORS** — whitelist-controlled via `http.corsOrigins`
96
+
97
+ ---
98
+
99
+ ## Remote repository cloning
100
+
101
+ When using `index_repo`:
102
+
103
+ - Only `https://` and `http://` URLs and SSH-style `git@` addresses are accepted
104
+ - Clone tokens (`token` parameter) are never logged
105
+ - Clones are isolated under `~/.purecontext/clones/`
106
+
107
+ ---
108
+
109
+ ## Self-hosting hardening checklist
110
+
111
+ - [ ] Run behind a TLS-terminating reverse proxy (nginx, Caddy)
112
+ - [ ] Set `PCTX_ADMIN_KEY` via environment variable, never in `config.json`
113
+ - [ ] Restrict developer API keys to `read` permission where possible
114
+ - [ ] Restrict server bind address to internal network if not public-facing
115
+ - [ ] Use firewall rules to limit access to port 3000
116
+ - [ ] Monitor `/health` endpoint and set up uptime alerts
117
+ - [ ] Rotate API keys regularly
118
+ - [ ] Back up the `/data` volume (contains indexes and auth database)
119
+
120
+ ---
121
+
122
+ ## Data at rest
123
+
124
+ SQLite files are stored in `indexDir` (`~/.purecontext/indexes/` by default). No encryption at rest is applied by PureContext itself.
125
+
126
+ For sensitive codebases, use OS-level disk encryption:
127
+ - macOS: FileVault
128
+ - Windows: BitLocker
129
+ - Linux: LUKS
130
+
131
+ Docker: use encrypted volumes if the host is shared.
132
+
133
+ ---
134
+
135
+ ## Audit logging
136
+
137
+ The HTTP server logs every MCP tool call with:
138
+ - Timestamp
139
+ - API key label (not the key itself)
140
+ - Tool name
141
+ - `repoId`
142
+ - Response status and duration
143
+
144
+ At `debug` level, full request/response bodies are included. Pipe logs to your SIEM or log aggregator for audit trails.