purecontext-mcp 1.5.2 → 1.11.0

Files changed (196)
  1. package/AGENT_INSTRUCTIONS.md +18 -10
  2. package/AGENT_REFERENCE.md +684 -561
  3. package/CHANGELOG.md +567 -445
  4. package/CODE-HISTORY.md +29 -1
  5. package/FRAMEWORK-ADAPTERS.md +368 -351
  6. package/FULL-INSTALLATION-GUIDE.md +351 -341
  7. package/README.md +411 -339
  8. package/REFACTORING-SAFELY.md +338 -279
  9. package/SAFE-CHANGES.md +208 -156
  10. package/USER-GUIDE.md +3 -1
  11. package/WHY-PURECONTEXT.md +103 -73
  12. package/WORKFLOW-PR-REVIEW.md +245 -199
  13. package/dist/adapters/astro-preprocessor.d.ts +25 -0
  14. package/dist/adapters/astro-preprocessor.d.ts.map +1 -0
  15. package/dist/adapters/astro-preprocessor.js +50 -0
  16. package/dist/adapters/astro-preprocessor.js.map +1 -0
  17. package/dist/adapters/astro.d.ts +13 -0
  18. package/dist/adapters/astro.d.ts.map +1 -0
  19. package/dist/adapters/astro.js +83 -0
  20. package/dist/adapters/astro.js.map +1 -0
  21. package/dist/adapters/detect-utils.d.ts +38 -0
  22. package/dist/adapters/detect-utils.d.ts.map +1 -0
  23. package/dist/adapters/detect-utils.js +95 -0
  24. package/dist/adapters/detect-utils.js.map +1 -0
  25. package/dist/adapters/nuxt.d.ts +20 -0
  26. package/dist/adapters/nuxt.d.ts.map +1 -1
  27. package/dist/adapters/nuxt.js +128 -13
  28. package/dist/adapters/nuxt.js.map +1 -1
  29. package/dist/adapters/svelte-preprocessor.d.ts +29 -0
  30. package/dist/adapters/svelte-preprocessor.d.ts.map +1 -0
  31. package/dist/adapters/svelte-preprocessor.js +83 -0
  32. package/dist/adapters/svelte-preprocessor.js.map +1 -0
  33. package/dist/adapters/svelte.d.ts +13 -0
  34. package/dist/adapters/svelte.d.ts.map +1 -0
  35. package/dist/adapters/svelte.js +96 -0
  36. package/dist/adapters/svelte.js.map +1 -0
  37. package/dist/adapters/vue.d.ts.map +1 -1
  38. package/dist/adapters/vue.js +87 -20
  39. package/dist/adapters/vue.js.map +1 -1
  40. package/dist/bin.d.ts +16 -0
  41. package/dist/bin.d.ts.map +1 -0
  42. package/dist/bin.js +21 -0
  43. package/dist/bin.js.map +1 -0
  44. package/dist/cli/hooks.d.ts +2 -2
  45. package/dist/cli/hooks.d.ts.map +1 -1
  46. package/dist/cli/hooks.js +123 -135
  47. package/dist/cli/hooks.js.map +1 -1
  48. package/dist/cli/install-writers.d.ts.map +1 -1
  49. package/dist/cli/install-writers.js +281 -36
  50. package/dist/cli/install-writers.js.map +1 -1
  51. package/dist/cli/resolve-node.d.ts +53 -0
  52. package/dist/cli/resolve-node.d.ts.map +1 -0
  53. package/dist/cli/resolve-node.js +84 -0
  54. package/dist/cli/resolve-node.js.map +1 -0
  55. package/dist/config/config-loader.js +24 -0
  56. package/dist/config/config-loader.js.map +1 -1
  57. package/dist/config/config-schema.d.ts +71 -0
  58. package/dist/config/config-schema.d.ts.map +1 -1
  59. package/dist/config/config-schema.js +102 -0
  60. package/dist/config/config-schema.js.map +1 -1
  61. package/dist/core/db/api-keys.d.ts +1 -1
  62. package/dist/core/db/api-keys.d.ts.map +1 -1
  63. package/dist/core/db/api-keys.js +39 -39
  64. package/dist/core/db/api-keys.js.map +1 -1
  65. package/dist/core/db/co-change-store.d.ts +34 -0
  66. package/dist/core/db/co-change-store.d.ts.map +1 -0
  67. package/dist/core/db/co-change-store.js +78 -0
  68. package/dist/core/db/co-change-store.js.map +1 -0
  69. package/dist/core/db/schema.d.ts +3 -3
  70. package/dist/core/db/schema.d.ts.map +1 -1
  71. package/dist/core/db/schema.js +12 -30
  72. package/dist/core/db/schema.js.map +1 -1
  73. package/dist/core/db/sqlite-loader.d.ts +51 -0
  74. package/dist/core/db/sqlite-loader.d.ts.map +1 -0
  75. package/dist/core/db/sqlite-loader.js +94 -0
  76. package/dist/core/db/sqlite-loader.js.map +1 -0
  77. package/dist/core/db/wasm-sqlite.d.ts +4 -0
  78. package/dist/core/db/wasm-sqlite.d.ts.map +1 -0
  79. package/dist/core/db/wasm-sqlite.js +270 -0
  80. package/dist/core/db/wasm-sqlite.js.map +1 -0
  81. package/dist/core/diff-parser.d.ts.map +1 -1
  82. package/dist/core/diff-parser.js +6 -1
  83. package/dist/core/diff-parser.js.map +1 -1
  84. package/dist/core/git-log-reader.d.ts +28 -0
  85. package/dist/core/git-log-reader.d.ts.map +1 -1
  86. package/dist/core/git-log-reader.js +74 -3
  87. package/dist/core/git-log-reader.js.map +1 -1
  88. package/dist/core/index-manager.d.ts.map +1 -1
  89. package/dist/core/index-manager.js +29 -3
  90. package/dist/core/index-manager.js.map +1 -1
  91. package/dist/core/indexing-worker.d.ts +2 -0
  92. package/dist/core/indexing-worker.d.ts.map +1 -1
  93. package/dist/core/indexing-worker.js +2 -0
  94. package/dist/core/indexing-worker.js.map +1 -1
  95. package/dist/core/watcher/file-watcher.d.ts +6 -0
  96. package/dist/core/watcher/file-watcher.d.ts.map +1 -1
  97. package/dist/core/watcher/file-watcher.js +11 -1
  98. package/dist/core/watcher/file-watcher.js.map +1 -1
  99. package/dist/graph/path-resolver.js +86 -17
  100. package/dist/graph/path-resolver.js.map +1 -1
  101. package/dist/index.d.ts +3 -0
  102. package/dist/index.d.ts.map +1 -1
  103. package/dist/index.js +11 -1
  104. package/dist/index.js.map +1 -1
  105. package/dist/node-guard.d.ts +15 -0
  106. package/dist/node-guard.d.ts.map +1 -0
  107. package/dist/node-guard.js +33 -0
  108. package/dist/node-guard.js.map +1 -0
  109. package/dist/server/admin-api.d.ts +1 -1
  110. package/dist/server/admin-api.d.ts.map +1 -1
  111. package/dist/server/admin-api.js +2 -2
  112. package/dist/server/admin-api.js.map +1 -1
  113. package/dist/server/auth/api-key.d.ts +1 -1
  114. package/dist/server/auth/api-key.d.ts.map +1 -1
  115. package/dist/server/mcp-server.d.ts.map +1 -1
  116. package/dist/server/mcp-server.js +25 -0
  117. package/dist/server/mcp-server.js.map +1 -1
  118. package/dist/server/tools/analyze-diff.d.ts +8 -0
  119. package/dist/server/tools/analyze-diff.d.ts.map +1 -1
  120. package/dist/server/tools/analyze-diff.js +80 -16
  121. package/dist/server/tools/analyze-diff.js.map +1 -1
  122. package/dist/server/tools/change-synthesis.d.ts +90 -0
  123. package/dist/server/tools/change-synthesis.d.ts.map +1 -0
  124. package/dist/server/tools/change-synthesis.js +236 -0
  125. package/dist/server/tools/change-synthesis.js.map +1 -0
  126. package/dist/server/tools/co-change.d.ts +65 -0
  127. package/dist/server/tools/co-change.d.ts.map +1 -0
  128. package/dist/server/tools/co-change.js +146 -0
  129. package/dist/server/tools/co-change.js.map +1 -0
  130. package/dist/server/tools/compare-change-impact.d.ts +58 -0
  131. package/dist/server/tools/compare-change-impact.d.ts.map +1 -0
  132. package/dist/server/tools/compare-change-impact.js +0 -0
  133. package/dist/server/tools/compare-change-impact.js.map +1 -0
  134. package/dist/server/tools/find-refactoring-opportunities.d.ts +1 -1
  135. package/dist/server/tools/get-architecture-snapshot.d.ts.map +1 -1
  136. package/dist/server/tools/get-architecture-snapshot.js +28 -14
  137. package/dist/server/tools/get-architecture-snapshot.js.map +1 -1
  138. package/dist/server/tools/get-churn-metrics.d.ts.map +1 -1
  139. package/dist/server/tools/get-churn-metrics.js +1 -12
  140. package/dist/server/tools/get-churn-metrics.js.map +1 -1
  141. package/dist/server/tools/get-co-change.d.ts +37 -0
  142. package/dist/server/tools/get-co-change.d.ts.map +1 -0
  143. package/dist/server/tools/get-co-change.js +120 -0
  144. package/dist/server/tools/get-co-change.js.map +1 -0
  145. package/dist/server/tools/get-context-bundle.d.ts.map +1 -1
  146. package/dist/server/tools/get-context-bundle.js +56 -3
  147. package/dist/server/tools/get-context-bundle.js.map +1 -1
  148. package/dist/server/tools/get-entry-points.d.ts +1 -1
  149. package/dist/server/tools/get-symbol-risk.d.ts +25 -0
  150. package/dist/server/tools/get-symbol-risk.d.ts.map +1 -0
  151. package/dist/server/tools/get-symbol-risk.js +60 -0
  152. package/dist/server/tools/get-symbol-risk.js.map +1 -0
  153. package/dist/server/tools/get-symbol-source.d.ts +2 -0
  154. package/dist/server/tools/get-symbol-source.d.ts.map +1 -1
  155. package/dist/server/tools/get-symbol-source.js +18 -1
  156. package/dist/server/tools/get-symbol-source.js.map +1 -1
  157. package/dist/server/tools/index-repo.d.ts.map +1 -1
  158. package/dist/server/tools/index-repo.js +8 -2
  159. package/dist/server/tools/index-repo.js.map +1 -1
  160. package/dist/server/tools/prepare-change.d.ts +61 -0
  161. package/dist/server/tools/prepare-change.d.ts.map +1 -0
  162. package/dist/server/tools/prepare-change.js +262 -0
  163. package/dist/server/tools/prepare-change.js.map +1 -0
  164. package/dist/server/tools/search-symbols.d.ts +2 -0
  165. package/dist/server/tools/search-symbols.d.ts.map +1 -1
  166. package/dist/server/tools/search-symbols.js +33 -0
  167. package/dist/server/tools/search-symbols.js.map +1 -1
  168. package/dist/server/tools/symbol-lines.d.ts +25 -0
  169. package/dist/server/tools/symbol-lines.d.ts.map +1 -0
  170. package/dist/server/tools/symbol-lines.js +40 -0
  171. package/dist/server/tools/symbol-lines.js.map +1 -0
  172. package/dist/server/tools/symbol-risk.d.ts +109 -0
  173. package/dist/server/tools/symbol-risk.d.ts.map +1 -0
  174. package/dist/server/tools/symbol-risk.js +251 -0
  175. package/dist/server/tools/symbol-risk.js.map +1 -0
  176. package/dist/server/tools/verify-change.d.ts +40 -0
  177. package/dist/server/tools/verify-change.d.ts.map +1 -0
  178. package/dist/server/tools/verify-change.js +149 -0
  179. package/dist/server/tools/verify-change.js.map +1 -0
  180. package/dist/version.d.ts +1 -1
  181. package/dist/version.d.ts.map +1 -1
  182. package/dist/version.js +1 -1
  183. package/dist/version.js.map +1 -1
  184. package/docs/01-introduction.md +2 -2
  185. package/docs/02-installation.md +97 -89
  186. package/docs/03-quick-start.md +138 -135
  187. package/docs/04-configuration.md +247 -214
  188. package/docs/05-cli-reference.md +236 -219
  189. package/docs/06-tools-reference.md +902 -499
  190. package/docs/14-transport-modes.md +170 -167
  191. package/docs/18-git-history.md +43 -0
  192. package/docs/23-performance.md +123 -121
  193. package/docs/26-troubleshooting.md +249 -234
  194. package/grammars/README.md +88 -0
  195. package/package.json +7 -25
  196. package/AGENT_INSTRUCTIONS_SHORT.md +0 -150
@@ -1,167 +1,170 @@
1
- # Transport Modes
2
-
3
-
4
- PureContext supports two transport modes: **stdio** (local, default) and **HTTP/SSE** (team/cloud).
5
-
6
- ## stdio transport (default)
7
-
8
- The standard transport for Claude Code and other MCP-native clients.
9
-
10
- ```bash
11
- purecontext-mcp
12
- ```
13
-
14
- Claude Code spawns `purecontext-mcp` as a child process and communicates over stdin/stdout using the JSON-RPC MCP protocol. No network, no authentication required.
15
-
16
- **Claude Code setup:**
17
-
18
- ```bash
19
- # Using npx (recommended)
20
- claude mcp add purecontext-mcp -- npx purecontext-mcp
21
-
22
- # Using global install
23
- claude mcp add purecontext-mcp purecontext-mcp
24
- ```
25
-
26
- **Best for:** Individual developers, local development, any situation where security and simplicity matter more than sharing.
27
-
28
- ## HTTP / SSE transport
29
-
30
- For browser-based clients, remote development, or multi-client setups.
31
-
32
- ```bash
33
- purecontext-mcp --transport http --port 3000
34
- ```
35
-
36
- Or via `config.json`:
37
-
38
- ```json
39
- {
40
- "transport": "http",
41
- "http": {
42
- "port": 3000,
43
- "host": "127.0.0.1",
44
- "corsOrigins": ["http://localhost:*"]
45
- }
46
- }
47
- ```
48
-
49
- **HTTP endpoints:**
50
-
51
- | Endpoint | Description |
52
- |----------|-------------|
53
- | `GET /health` | Server health check (always public) |
54
- | `POST /mcp/sse` | MCP Streamable HTTP endpoint |
55
- | `GET /` | Web UI (served when UI is built) |
56
- | `GET /admin/*` | Admin API (requires admin key) |
57
-
58
- **Connect Claude Code to an HTTP server:**
59
-
60
- ```json
61
- // ~/.claude/claude_desktop_config.json
62
- {
63
- "mcpServers": {
64
- "purecontext": {
65
- "transport": "http",
66
- "url": "http://localhost:3000/mcp/sse"
67
- }
68
- }
69
- }
70
- ```
71
-
72
- Or via CLI:
73
-
74
- ```bash
75
- claude mcp add purecontext-remote \
76
- --transport http \
77
- --url https://purecontext.mycompany.com/mcp/sse \
78
- --header "Authorization: Bearer pctx_yourkey"
79
- ```
80
-
81
- **Best for:** Team deployments, shared index, CI pipelines, Web UI access.
82
-
83
- ## Both transports simultaneously (development)
84
-
85
- Run stdio and HTTP at the same time — useful during development to test the HTTP API while still using Claude Code via stdio:
86
-
87
- ```bash
88
- purecontext-mcp --transport both
89
- ```
90
-
91
- ## Choosing a transport
92
-
93
- | Scenario | Recommended transport |
94
- |----------|-----------------------|
95
- | Solo developer, local project | `stdio` |
96
- | Team with shared codebase | `http` (server) |
97
- | CI pipeline | `http` or `stdio` with cached index |
98
- | Web UI access | `http` |
99
- | Testing both simultaneously | `both` |
100
-
101
- ## Authentication in HTTP mode
102
-
103
- When binding to a non-loopback address, always enable authentication:
104
-
105
- ```json
106
- {
107
- "http": {
108
- "host": "0.0.0.0",
109
- "auth": {
110
- "enabled": true,
111
- "token": "${PURECONTEXT_API_TOKEN}"
112
- }
113
- }
114
- }
115
- ```
116
-
117
- If `token` is empty and `enabled` is `true`, a random 32-byte hex token is generated at startup and printed to stderr. Save it immediately — it is not persisted to disk.
118
-
119
- All MCP requests must include:
120
-
121
- ```
122
- Authorization: Bearer <token>
123
- ```
124
-
125
- A warning is logged at startup if the server is bound to a non-loopback address with authentication disabled.
126
-
127
- ## TLS / HTTPS
128
-
129
- PureContext does not terminate TLS itself. Put it behind a reverse proxy for HTTPS in production.
130
-
131
- **nginx example:**
132
-
133
- ```nginx
134
- server {
135
- listen 443 ssl;
136
- server_name purecontext.mycompany.com;
137
-
138
- ssl_certificate /etc/letsencrypt/live/purecontext.mycompany.com/fullchain.pem;
139
- ssl_certificate_key /etc/letsencrypt/live/purecontext.mycompany.com/privkey.pem;
140
-
141
- location / {
142
- proxy_pass http://localhost:3000;
143
- proxy_http_version 1.1;
144
- proxy_set_header Upgrade $http_upgrade;
145
- proxy_set_header Connection keep-alive;
146
- proxy_set_header Host $host;
147
- # Disable buffering for SSE
148
- proxy_buffering off;
149
- proxy_cache off;
150
- proxy_read_timeout 3600s;
151
- }
152
- }
153
- ```
154
-
155
- **Caddy example:**
156
-
157
- ```
158
- purecontext.mycompany.com {
159
- reverse_proxy localhost:3000 {
160
- flush_interval -1
161
- }
162
- }
163
- ```
164
-
165
- ## SSE keepalive
166
-
167
- The HTTP server sends a `: ping` comment over the SSE stream every 30 seconds to keep connections alive through proxies and load balancers. If your proxy has a shorter idle timeout than 30 seconds, increase it (e.g., `proxy_read_timeout 3600s` in nginx).
1
+ # Transport Modes
2
+
3
+
4
+ PureContext supports two transport modes: **stdio** (local, default) and **HTTP/SSE** (team/cloud).
5
+
6
+ ## stdio transport (default)
7
+
8
+ The standard transport for Claude Code and other MCP-native clients.
9
+
10
+ ```bash
11
+ purecontext-mcp
12
+ ```
13
+
14
+ Claude Code spawns `purecontext-mcp` as a child process and communicates over stdin/stdout using the JSON-RPC MCP protocol. No network, no authentication required.
15
+
16
+ **Claude Code setup:**
17
+
18
+ ```bash
19
+ # Recommended: installer registers the server (pinned to your global Node) + rules
20
+ npx purecontext-mcp install claude
21
+
22
+ # Or register manually with npx
23
+ claude mcp add purecontext-mcp -- npx purecontext-mcp
24
+
25
+ # Using global install
26
+ claude mcp add purecontext-mcp purecontext-mcp
27
+ ```
28
+
29
+ **Best for:** Individual developers, local development, any situation where security and simplicity matter more than sharing.
30
+
31
+ ## HTTP / SSE transport
32
+
33
+ For browser-based clients, remote development, or multi-client setups.
34
+
35
+ ```bash
36
+ purecontext-mcp --transport http --port 3000
37
+ ```
38
+
39
+ Or via `config.json`:
40
+
41
+ ```json
42
+ {
43
+ "transport": "http",
44
+ "http": {
45
+ "port": 3000,
46
+ "host": "127.0.0.1",
47
+ "corsOrigins": ["http://localhost:*"]
48
+ }
49
+ }
50
+ ```
51
+
52
+ **HTTP endpoints:**
53
+
54
+ | Endpoint | Description |
55
+ |----------|-------------|
56
+ | `GET /health` | Server health check (always public) |
57
+ | `POST /mcp/sse` | MCP Streamable HTTP endpoint |
58
+ | `GET /` | Web UI (served when UI is built) |
59
+ | `GET /admin/*` | Admin API (requires admin key) |
60
+
61
+ **Connect Claude Code to an HTTP server:**
62
+
63
+ ```json
64
+ // ~/.claude/claude_desktop_config.json
65
+ {
66
+ "mcpServers": {
67
+ "purecontext": {
68
+ "transport": "http",
69
+ "url": "http://localhost:3000/mcp/sse"
70
+ }
71
+ }
72
+ }
73
+ ```
74
+
75
+ Or via CLI:
76
+
77
+ ```bash
78
+ claude mcp add purecontext-remote \
79
+ --transport http \
80
+ --url https://purecontext.mycompany.com/mcp/sse \
81
+ --header "Authorization: Bearer pctx_yourkey"
82
+ ```
83
+
84
+ **Best for:** Team deployments, shared index, CI pipelines, Web UI access.
85
+
86
+ ## Both transports simultaneously (development)
87
+
88
+ Run stdio and HTTP at the same time — useful during development to test the HTTP API while still using Claude Code via stdio:
89
+
90
+ ```bash
91
+ purecontext-mcp --transport both
92
+ ```
93
+
94
+ ## Choosing a transport
95
+
96
+ | Scenario | Recommended transport |
97
+ |----------|-----------------------|
98
+ | Solo developer, local project | `stdio` |
99
+ | Team with shared codebase | `http` (server) |
100
+ | CI pipeline | `http` or `stdio` with cached index |
101
+ | Web UI access | `http` |
102
+ | Testing both simultaneously | `both` |
103
+
104
+ ## Authentication in HTTP mode
105
+
106
+ When binding to a non-loopback address, always enable authentication:
107
+
108
+ ```json
109
+ {
110
+ "http": {
111
+ "host": "0.0.0.0",
112
+ "auth": {
113
+ "enabled": true,
114
+ "token": "${PURECONTEXT_API_TOKEN}"
115
+ }
116
+ }
117
+ }
118
+ ```
119
+
120
+ If `token` is empty and `enabled` is `true`, a random 32-byte hex token is generated at startup and printed to stderr. Save it immediately — it is not persisted to disk.
121
+
122
+ All MCP requests must include:
123
+
124
+ ```
125
+ Authorization: Bearer <token>
126
+ ```
127
+
128
+ A warning is logged at startup if the server is bound to a non-loopback address with authentication disabled.
129
+
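Reviewer sketch (not part of the package): the token behavior described in the "Authentication in HTTP mode" section can be illustrated roughly as follows. The helper names `resolveToken` and `isAuthorized` are hypothetical, and the constant-time comparison is an assumption about good practice rather than something the diff confirms about PureContext's implementation.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical helper: keep a configured token, otherwise generate a random
// 32-byte token rendered as 64 hex characters (as the docs describe).
function resolveToken(configured: string | undefined): string {
  if (configured && configured.length > 0) return configured;
  return randomBytes(32).toString("hex");
}

// Hypothetical helper: check an `Authorization: Bearer <token>` header value.
function isAuthorized(header: string | undefined, token: string): boolean {
  const prefix = "Bearer ";
  if (!header || !header.startsWith(prefix)) return false;
  const presented = Buffer.from(header.slice(prefix.length), "utf8");
  const expected = Buffer.from(token, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (presented.length !== expected.length) return false;
  return timingSafeEqual(presented, expected);
}
```

Because a generated token lives only in memory in this sketch, restarting the process yields a new one, which matches the documented advice to save the printed token immediately.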
130
+ ## TLS / HTTPS
131
+
132
+ PureContext does not terminate TLS itself. Put it behind a reverse proxy for HTTPS in production.
133
+
134
+ **nginx example:**
135
+
136
+ ```nginx
137
+ server {
138
+ listen 443 ssl;
139
+ server_name purecontext.mycompany.com;
140
+
141
+ ssl_certificate /etc/letsencrypt/live/purecontext.mycompany.com/fullchain.pem;
142
+ ssl_certificate_key /etc/letsencrypt/live/purecontext.mycompany.com/privkey.pem;
143
+
144
+ location / {
145
+ proxy_pass http://localhost:3000;
146
+ proxy_http_version 1.1;
147
+ proxy_set_header Upgrade $http_upgrade;
148
+ proxy_set_header Connection keep-alive;
149
+ proxy_set_header Host $host;
150
+ # Disable buffering for SSE
151
+ proxy_buffering off;
152
+ proxy_cache off;
153
+ proxy_read_timeout 3600s;
154
+ }
155
+ }
156
+ ```
157
+
158
+ **Caddy example:**
159
+
160
+ ```
161
+ purecontext.mycompany.com {
162
+ reverse_proxy localhost:3000 {
163
+ flush_interval -1
164
+ }
165
+ }
166
+ ```
167
+
168
+ ## SSE keepalive
169
+
170
+ The HTTP server sends a `: ping` comment over the SSE stream every 30 seconds to keep connections alive through proxies and load balancers. If your proxy has a shorter idle timeout than 30 seconds, increase it (e.g., `proxy_read_timeout 3600s` in nginx).
@@ -43,6 +43,8 @@ This means you can ask "which commits touched `authenticateUser`?" and get an an
43
43
  | `git.maxCommits` | `500` | Maximum commits to walk back from HEAD |
44
44
  | `git.includeMergeCommits` | `false` | Include merge commits (usually noise) |
45
45
  | `git.branches` | `["main"]` | Branches to index history from |
46
+ | `git.coChangeDepth` | `300` | Commits captured at the repo root for co-change analysis (`get_co_change`, `get_symbol_risk`, bundle `historicalNeighbors`). `0` disables capture entirely. |
47
+ | `git.megaCommitThreshold` | `30` | Commits touching more files than this are excluded / down-weighted as mega-commits (reformats, lockfile sweeps) |
46
48
 
47
49
  ---
48
50
 
@@ -120,6 +122,45 @@ File and symbol churn metrics — how often things change.
120
122
 
121
123
  ---
122
124
 
125
+ ## `get_co_change` — temporal coupling
126
+
127
+ Files that historically change together with a target file or symbol. This is the signal a static dependency graph cannot derive: a route and its test, or a feature flag and the code it gates, that move together without importing each other.
128
+
129
+ Capture is a single repo-level `git log --no-merges --name-only -n N` at index time (controlled by `git.coChangeDepth`), stored in a dedicated `commit_files` table — separate from the per-file `git_metadata` history, whose last-N-per-file window is too shallow for co-change.
130
+
131
+ **Parameters:**
132
+
133
+ | Parameter | Type | Default | Description |
134
+ |-----------|------|---------|-------------|
135
+ | `repoId` | `string` | required | Target repository |
136
+ | `filePath` | `string` | — | Target file (provide this **or** `symbolId`) |
137
+ | `symbolId` | `string` | — | Target symbol — resolved to its file (git is file-granular) |
138
+ | `minSupport` | `number` | `2` | Drop partners with fewer than N shared commits |
139
+ | `dayWindow` | `number` | — | Look back N days (default: entire captured window) |
140
+ | `topN` | `number` | `20` | Max partners to return |
141
+
142
+ **Response:** ranked `partners` with `support` (shared commits), `confidence` (directional A→B probability), `lift` (association strength), and `coChangeDate`. Mega-commits are filtered and down-weighted; `signalQuality: "low"` flags shallow/sparse histories.
143
+
144
+ **Use cases:**
145
+ - "If I touch this file, what else usually changes with it?"
146
+ - "What's the test or config that moves with this code but doesn't import it?"
147
+
148
+ ---
149
+
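Reviewer sketch (not part of the package): the `support`, `confidence`, and `lift` fields described above follow the usual association-rule definitions, which can be computed from a list of per-commit file sets as below. The `Commit` and `Partner` types and the `coChange` function are hypothetical, the sketch simply excludes mega-commits instead of down-weighting them, and the ranking order is an assumption.

```typescript
// Hypothetical shape: one entry per commit, listing the files it touched.
type Commit = { files: string[] };

interface Partner {
  file: string;
  support: number;    // commits that touched both the target and this file
  confidence: number; // support / commits that touched the target
  lift: number;       // confidence / (share of all commits touching this file)
}

function coChange(
  commits: Commit[],
  target: string,
  minSupport = 2,
  megaCommitThreshold = 30,
  topN = 20,
): Partner[] {
  // Simplification: drop commits touching more files than the threshold.
  const kept = commits.filter((c) => c.files.length <= megaCommitThreshold);
  const fileCount = new Map<string, number>(); // commits per file
  const shared = new Map<string, number>();    // commits shared with target

  kept.forEach((commit) => {
    const files = Array.from(new Set(commit.files)); // de-duplicate paths
    files.forEach((f) => fileCount.set(f, (fileCount.get(f) ?? 0) + 1));
    if (files.indexOf(target) === -1) return;
    files.forEach((f) => {
      if (f !== target) shared.set(f, (shared.get(f) ?? 0) + 1);
    });
  });

  const targetCount = fileCount.get(target) ?? 0;
  if (targetCount === 0) return [];

  const partners: Partner[] = [];
  shared.forEach((support, file) => {
    if (support < minSupport) return;
    const confidence = support / targetCount;
    const baseRate = (fileCount.get(file) ?? 0) / kept.length;
    partners.push({ file, support, confidence, lift: confidence / baseRate });
  });

  // Assumed ranking: most shared commits first, then higher confidence.
  partners.sort((a, b) => b.support - a.support || b.confidence - a.confidence);
  return partners.slice(0, topN);
}
```

A lift above 1 means the partner appears alongside the target more often than its overall commit frequency would predict. Confidence is directional: the value for A→B generally differs from B→A.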
150
+ ## `get_symbol_risk` — composite change risk
151
+
152
+ A single, explainable "how risky is it to change this symbol?" verdict. Blends churn (90 d), centrality (afferent coupling + reverse blast radius), cyclomatic complexity, test-coverage gap, and co-change spread — each normalized **repo-relative** (midrank percentile) so scores compare within a repo and aren't dominated by absolute size.
153
+
154
+ **Parameters:** `{ repoId, symbolId }`
155
+
156
+ **Response:** `{ riskScore (0–100), band: "low" | "review" | "high", factors: { churn, centrality, complexity, testGap, coChange }, reasons: string[], signalQuality }`. Factor weights are tunable via `risk.weights.*` (see [Configuration](04-configuration.md)).
157
+
158
+ It always returns `factors` (raw + normalized) and human-readable `reasons[]` — never a black-box number. **Code-centered only — no author, ownership, or productivity metrics.**
159
+
160
+ **Guardrail:** before broad or automated edits to a `high` symbol, inspect its callers (`get_blast_radius`) and co-changers (`get_co_change`) first. `search_symbols` and `get_symbol_source` accept `includeRisk: true` to attach a compact `{ band, riskScore }` inline; `get_context_bundle` returns `historicalNeighbors` (co-changing files not reachable via imports) when co-change data exists.
161
+
162
+ ---
163
+
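Reviewer sketch (not part of the package): "midrank percentile" normalization, as named in the `get_symbol_risk` description, places a value by counting how many values in the repo fall below it plus half of those tied with it. The function names, the `Factors` type, and the weighted-average blend below are assumptions for illustration; the diff does not show how PureContext actually combines the factors or where the band cut-offs sit.

```typescript
// Midrank percentile in [0, 1]: (values below x + half of values equal to x) / n.
function midrankPercentile(values: number[], x: number): number {
  if (values.length === 0) return 0;
  let below = 0;
  let equal = 0;
  for (const v of values) {
    if (v < x) below++;
    else if (v === x) equal++;
  }
  return (below + 0.5 * equal) / values.length;
}

// Hypothetical factor record, mirroring the factor names in the response.
type Factors = {
  churn: number;
  centrality: number;
  complexity: number;
  testGap: number;
  coChange: number;
};

// Assumed blend: weighted average of normalized factors, scaled to 0-100.
function blendRisk(normalized: Factors, weights: Factors): number {
  const keys: (keyof Factors)[] = [
    "churn", "centrality", "complexity", "testGap", "coChange",
  ];
  let weighted = 0;
  let totalWeight = 0;
  for (const k of keys) {
    weighted += normalized[k] * weights[k];
    totalWeight += weights[k];
  }
  return totalWeight === 0 ? 0 : (100 * weighted) / totalWeight;
}
```

Ranking within the repo rather than using raw values is what keeps one very large or very active repository from pushing every symbol into the top band.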
123
164
  ## PR / diff analysis
124
165
 
125
166
  Analyze what a branch or commit range changes at the symbol level:
@@ -155,3 +196,5 @@ When git integration is enabled:
155
196
  - **Merge commits:** excluded by default to avoid noise from merge-only diffs
156
197
  - **Rebased history:** rebase changes commit hashes — a re-index is needed to pick up rebased history accurately
157
198
  - Git submodules are not indexed
199
+ - **Co-change rename continuity:** the repo-level capture does not follow renames, so a file renamed mid-history splits its co-change signal until re-indexed
200
+ - **Squash-merge monorepos:** squashed PRs collapse many logical changes into one commit, which can inflate co-change; `git.megaCommitThreshold` mitigates this and `signalQuality: "low"` flags weak signal
@@ -1,121 +1,123 @@
1
- # Performance & Scalability
2
-
3
-
4
- PureContext is designed to handle enterprise-scale repos (10k–50k files) using a worker thread pool for parallel tree-sitter parsing.
5
-
6
- ---
7
-
8
- ## Indexing speed
9
-
10
- Typical performance on a 4-core machine:
11
-
12
- | Repo size | First index | Incremental re-index |
13
- |-----------|-------------|----------------------|
14
- | 500 files | ~2 seconds | < 100ms |
15
- | 5,000 files | ~15 seconds | < 1 second |
16
- | 20,000 files | ~60 seconds | 1–3 seconds |
17
- | 50,000 files | ~3 minutes | 2–10 seconds |
18
-
19
- These numbers assume no AI summarization or semantic indexing. Both add API round-trip time.
20
-
21
- ---
22
-
23
- ## Worker thread pool
24
-
25
- The bottleneck in sequential indexing is tree-sitter WASM parsing — each WASM instance is single-threaded. The worker thread pool parallelizes parsing across CPU cores.
26
-
27
- ```
28
- Main thread
29
-
30
- ┌────────────┼────────────┐
31
- ▼ ▼ ▼
32
- Worker 1 Worker 2 Worker 3
33
- (TypeScript) (Python) (Go)
34
- parse + extract parse + extract parse + extract
35
- │ │ │
36
- └────────────┴────────────┘
37
-
38
- Main thread
39
- (SQLite writes)
40
- ```
41
-
42
- Each worker loads its own WASM grammar instances. File batches are distributed across workers by the main thread. SQLite writes are serialized on the main thread (better-sqlite3 is synchronous).
43
-
44
- ### Configuring worker threads
45
-
46
- ```json
47
- {
48
- "workerThreads": 4 // default: os.cpus().length - 1, minimum 1
49
- }
50
- ```
51
-
52
- Increase for CPU-bound workloads on machines with many cores. Do not exceed `os.cpus().length - 1` — you want to leave one core for the main thread and OS.
53
-
54
- ---
55
-
56
- ## Memory usage
57
-
58
- | Component | Memory |
59
- |-----------|--------|
60
- | WASM grammars (per worker) | ~20–30 MB per grammar loaded |
61
- | In-memory symbol cache (during indexing) | ~100 MB for 10k symbols |
62
- | SQLite WAL mode (at rest) | ~50 MB |
63
- | HNSW vector index (if enabled) | ~100 bytes per embedding dimension per symbol |
64
-
65
- **Typical peak during indexing:** 200–500 MB for a 10k-file repo. Returns to ~50 MB at rest.
66
-
67
- Workers are spawned once and reused for the lifetime of the server no spawn/teardown overhead per index run.
68
-
69
- ---
70
-
71
- ## Incremental re-indexing
72
-
73
- The content hash cache makes re-indexing very fast:
74
-
75
- 1. Each file's SHA-256 hash is stored in the `files` table after indexing
76
- 2. On re-index, the hash is recomputed and compared
77
- 3. Only files with a changed hash are re-parsed
78
- 4. Symbols for unchanged files are retained as-is
79
-
80
- A typical `git pull` touches 10–50 files — re-index completes in milliseconds.
81
-
82
- To force a full re-index (bypass the hash cache):
83
-
84
- ```
85
- Use invalidate_cache tool, then index_folder again.
86
- ```
87
-
88
- Or call `index_folder` with `force: true`.
89
-
90
- ---
91
-
92
- ## Large repo tuning
93
-
94
- For repos with > 10,000 files:
95
-
96
- | Setting | Recommendation |
97
- |---------|---------------|
98
- | `workerThreads` | Set to `os.cpus().length - 1` |
99
- | `watchDebounceMs` | Increase to `5000` if many files change at once (e.g., code generation) |
100
- | `excludePatterns` | Add patterns for generated files, test fixtures with large data files |
101
- | `maxFileSizeBytes` | Keep at 1 MB or lower parsing multi-MB files is slow and rarely useful |
102
- | `fileLimit` | Set to `0` (unlimited) if you need the full repo indexed |
103
-
104
- ---
105
-
106
- ## SQLite performance
107
-
108
- SQLite in **WAL (Write-Ahead Logging) mode** provides:
109
- - Concurrent reads without blocking writes
110
- - Fast writes (no fsync on every write in WAL mode)
111
- - Crash safety (WAL journal ensures atomicity)
112
-
113
- Query performance:
114
- - `search_symbols` with FTS5: < 5ms for 100k symbols
115
- - `get_symbol_source`: < 1ms (single row lookup by primary key)
116
- - `get_blast_radius` (depth 5): 5–20ms depending on graph density
117
- - `get_context_bundle` (depth 3): 3–15ms
118
-
119
- No tuning is needed for the SQLite layer up to ~500k symbols. At very large scale, consider periodic `VACUUM` to reclaim space from deleted symbols.
120
-
121
-
1
+ # Performance & Scalability
2
+
3
+
4
+ PureContext is designed to handle enterprise-scale repos (10k–50k files) using a worker thread pool for parallel tree-sitter parsing.
5
+
6
+ ---
7
+
8
+ ## Indexing speed
9
+
10
+ Typical performance on a 4-core machine:
11
+
12
+ | Repo size | First index | Incremental re-index |
13
+ |-----------|-------------|----------------------|
14
+ | 500 files | ~2 seconds | < 100ms |
15
+ | 5,000 files | ~15 seconds | < 1 second |
16
+ | 20,000 files | ~60 seconds | 1–3 seconds |
17
+ | 50,000 files | ~3 minutes | 2–10 seconds |
18
+
19
+ These numbers assume no AI summarization or semantic indexing. Both add API round-trip time.
20
+
21
+ ---
22
+
23
+ ## Worker thread pool
24
+
25
+ The bottleneck in sequential indexing is tree-sitter WASM parsing — each WASM instance is single-threaded. The worker thread pool parallelizes parsing across CPU cores.
26
+
27
+ ```
28
+ Main thread
29
+
30
+ ┌────────────┼────────────┐
31
+ ▼ ▼ ▼
32
+ Worker 1 Worker 2 Worker 3
33
+ (TypeScript) (Python) (Go)
34
+ parse + extract parse + extract parse + extract
35
+ │ │ │
36
+ └────────────┴────────────┘
37
+
38
+ Main thread
39
+ (SQLite writes)
40
+ ```
41
+
42
+ Each worker loads its own WASM grammar instances. File batches are distributed across workers by the main thread. SQLite writes are serialized on the main thread (better-sqlite3 is synchronous).
43
+
44
+ > **SQLite backend note:** the numbers here assume the native `better-sqlite3` engine (Node 18/20/22). On other Node versions PureContext falls back to a WASM SQLite engine (see [Installation](02-installation.md)); it is functionally identical (FTS5 included) but slower on write-heavy indexing, because the WASM database is held in memory and serialized to disk on flush rather than written natively in place. Indexing throughput is the main thing affected; query latency is much closer.
45
+
46
+ ### Configuring worker threads
47
+
48
+ ```json
49
+ {
50
+ "workerThreads": 4 // default: os.cpus().length - 1, minimum 1
51
+ }
52
+ ```
53
+
54
+ Increase for CPU-bound workloads on machines with many cores. Do not exceed `os.cpus().length - 1` — you want to leave one core for the main thread and OS.
55
+
56
+ ---
57
+
58
+ ## Memory usage
59
+
60
+ | Component | Memory |
61
+ |-----------|--------|
62
+ | WASM grammars (per worker) | ~20–30 MB per grammar loaded |
63
+ | In-memory symbol cache (during indexing) | ~100 MB for 10k symbols |
64
+ | SQLite WAL mode (at rest) | ~50 MB |
65
+ | HNSW vector index (if enabled) | ~100 bytes per embedding dimension per symbol |
66
+
67
+ **Typical peak during indexing:** 200–500 MB for a 10k-file repo. Returns to ~50 MB at rest.
68
+
69
+ Workers are spawned once and reused for the lifetime of the server — no spawn/teardown overhead per index run.
70
+
71
+ ---
72
+
73
+ ## Incremental re-indexing
74
+
75
+ The content hash cache makes re-indexing very fast:
76
+
77
+ 1. Each file's SHA-256 hash is stored in the `files` table after indexing
78
+ 2. On re-index, the hash is recomputed and compared
79
+ 3. Only files with a changed hash are re-parsed
80
+ 4. Symbols for unchanged files are retained as-is
81
+
82
+ A typical `git pull` touches 10–50 files — re-index completes in milliseconds.
83
+
84
+ To force a full re-index (bypass the hash cache):
85
+
86
+ ```
87
+ Use invalidate_cache tool, then index_folder again.
88
+ ```
89
+
90
+ Or call `index_folder` with `force: true`.
91
+
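Reviewer sketch (not part of the package): the four incremental re-indexing steps above amount to a content-hash comparison, roughly as below. The `HashCache` map stands in for the hash column of the `files` table, and `filesToReparse` and its `force` flag are hypothetical names that only mirror the documented behavior.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of a file's content, hex-encoded.
function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Hypothetical in-memory stand-in for the stored per-file hashes.
type HashCache = Map<string, string>;

// Returns the paths whose content hash changed (or all paths when forced)
// and records the new hashes, so unchanged files are skipped next time.
function filesToReparse(
  current: Map<string, string>, // path -> current file content
  cache: HashCache,
  force = false,
): string[] {
  const changed: string[] = [];
  current.forEach((content, path) => {
    const hash = sha256(content);
    if (force || cache.get(path) !== hash) {
      changed.push(path);
      cache.set(path, hash);
    }
  });
  return changed;
}
```

Hashing still reads every file, so the saving comes from skipping the parse and symbol extraction for unchanged files, not from skipping I/O.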
92
+ ---
93
+
94
+ ## Large repo tuning
95
+
96
+ For repos with > 10,000 files:
97
+
98
+ | Setting | Recommendation |
99
+ |---------|---------------|
100
+ | `workerThreads` | Set to `os.cpus().length - 1` |
101
+ | `watchDebounceMs` | Increase to `5000` if many files change at once (e.g., code generation) |
102
+ | `excludePatterns` | Add patterns for generated files, test fixtures with large data files |
103
+ | `maxFileSizeBytes` | Keep at 1 MB or lower — parsing multi-MB files is slow and rarely useful |
104
+ | `fileLimit` | Set to `0` (unlimited) if you need the full repo indexed |
105
+
106
+ ---
107
+
108
+ ## SQLite performance
109
+
110
+ SQLite in **WAL (Write-Ahead Logging) mode** provides:
111
+ - Concurrent reads without blocking writes
112
+ - Fast writes (no fsync on every write in WAL mode)
113
+ - Crash safety (WAL journal ensures atomicity)
114
+
115
+ Query performance:
116
+ - `search_symbols` with FTS5: < 5ms for 100k symbols
117
+ - `get_symbol_source`: < 1ms (single row lookup by primary key)
118
+ - `get_blast_radius` (depth 5): 5–20ms depending on graph density
119
+ - `get_context_bundle` (depth 3): 3–15ms
120
+
121
+ No tuning is needed for the SQLite layer up to ~500k symbols. At very large scale, consider periodic `VACUUM` to reclaim space from deleted symbols.
122
+
123
+