purecontext-mcp 1.2.0 → 1.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (185)
  1. package/AGENT_INSTRUCTIONS.md +110 -784
  2. package/AGENT_REFERENCE.md +561 -0
  3. package/BENCHMARKS.md +153 -0
  4. package/CHANGELOG.md +177 -6
  5. package/FRAMEWORK-ADAPTERS.md +351 -0
  6. package/FULL-INSTALLATION-GUIDE.md +341 -0
  7. package/LANGUAGE-SUPPORT.md +144 -0
  8. package/README.md +154 -16
  9. package/USER-GUIDE.md +29 -21
  10. package/dist/cli/hooks.d.ts +28 -0
  11. package/dist/cli/hooks.d.ts.map +1 -0
  12. package/dist/cli/hooks.js +570 -0
  13. package/dist/cli/hooks.js.map +1 -0
  14. package/dist/cli/install-detect.d.ts +16 -0
  15. package/dist/cli/install-detect.d.ts.map +1 -0
  16. package/dist/cli/install-detect.js +70 -0
  17. package/dist/cli/install-detect.js.map +1 -0
  18. package/dist/cli/install-writers.d.ts +59 -0
  19. package/dist/cli/install-writers.d.ts.map +1 -0
  20. package/dist/cli/install-writers.js +292 -0
  21. package/dist/cli/install-writers.js.map +1 -0
  22. package/dist/cli/install.d.ts +14 -0
  23. package/dist/cli/install.d.ts.map +1 -0
  24. package/dist/cli/install.js +150 -0
  25. package/dist/cli/install.js.map +1 -0
  26. package/dist/config/config-loader.js +3 -0
  27. package/dist/config/config-loader.js.map +1 -1
  28. package/dist/config/config-schema.d.ts +11 -0
  29. package/dist/config/config-schema.d.ts.map +1 -1
  30. package/dist/config/config-schema.js +15 -0
  31. package/dist/config/config-schema.js.map +1 -1
  32. package/dist/core/db/symbol-store.d.ts +1 -0
  33. package/dist/core/db/symbol-store.d.ts.map +1 -1
  34. package/dist/core/db/symbol-store.js +120 -6
  35. package/dist/core/db/symbol-store.js.map +1 -1
  36. package/dist/core/file-discovery.d.ts +6 -0
  37. package/dist/core/file-discovery.d.ts.map +1 -1
  38. package/dist/core/file-discovery.js +20 -13
  39. package/dist/core/file-discovery.js.map +1 -1
  40. package/dist/core/file-processor.d.ts.map +1 -1
  41. package/dist/core/file-processor.js +26 -1
  42. package/dist/core/file-processor.js.map +1 -1
  43. package/dist/core/git-log-reader.d.ts.map +1 -1
  44. package/dist/core/git-log-reader.js +21 -0
  45. package/dist/core/git-log-reader.js.map +1 -1
  46. package/dist/core/index-manager.d.ts.map +1 -1
  47. package/dist/core/index-manager.js +21 -7
  48. package/dist/core/index-manager.js.map +1 -1
  49. package/dist/core/indexing-worker.d.ts.map +1 -1
  50. package/dist/core/indexing-worker.js +14 -0
  51. package/dist/core/indexing-worker.js.map +1 -1
  52. package/dist/core/parse-dispatcher.d.ts.map +1 -1
  53. package/dist/core/parse-dispatcher.js +20 -5
  54. package/dist/core/parse-dispatcher.js.map +1 -1
  55. package/dist/core/search/query-preprocessor.d.ts +69 -3
  56. package/dist/core/search/query-preprocessor.d.ts.map +1 -1
  57. package/dist/core/search/query-preprocessor.js +450 -17
  58. package/dist/core/search/query-preprocessor.js.map +1 -1
  59. package/dist/core/search/relevance-ranker.d.ts +60 -5
  60. package/dist/core/search/relevance-ranker.d.ts.map +1 -1
  61. package/dist/core/search/relevance-ranker.js +931 -33
  62. package/dist/core/search/relevance-ranker.js.map +1 -1
  63. package/dist/core/test-mapper.d.ts.map +1 -1
  64. package/dist/core/test-mapper.js +7 -1
  65. package/dist/core/test-mapper.js.map +1 -1
  66. package/dist/core/types.d.ts +28 -1
  67. package/dist/core/types.d.ts.map +1 -1
  68. package/dist/handlers/angular-html.d.ts +3 -0
  69. package/dist/handlers/angular-html.d.ts.map +1 -0
  70. package/dist/handlers/angular-html.js +215 -0
  71. package/dist/handlers/angular-html.js.map +1 -0
  72. package/dist/handlers/c.d.ts.map +1 -1
  73. package/dist/handlers/c.js +19 -0
  74. package/dist/handlers/c.js.map +1 -1
  75. package/dist/handlers/cpp-macro-registry.d.ts +21 -0
  76. package/dist/handlers/cpp-macro-registry.d.ts.map +1 -0
  77. package/dist/handlers/cpp-macro-registry.js +44 -0
  78. package/dist/handlers/cpp-macro-registry.js.map +1 -0
  79. package/dist/handlers/cpp.d.ts.map +1 -1
  80. package/dist/handlers/cpp.js +579 -10
  81. package/dist/handlers/cpp.js.map +1 -1
  82. package/dist/handlers/csharp.d.ts.map +1 -1
  83. package/dist/handlers/csharp.js +39 -2
  84. package/dist/handlers/csharp.js.map +1 -1
  85. package/dist/handlers/css.d.ts +3 -0
  86. package/dist/handlers/css.d.ts.map +1 -0
  87. package/dist/handlers/css.js +154 -0
  88. package/dist/handlers/css.js.map +1 -0
  89. package/dist/handlers/erlang.d.ts.map +1 -1
  90. package/dist/handlers/erlang.js +8 -1
  91. package/dist/handlers/erlang.js.map +1 -1
  92. package/dist/handlers/fortran.js +1 -1
  93. package/dist/handlers/fortran.js.map +1 -1
  94. package/dist/handlers/go.d.ts.map +1 -1
  95. package/dist/handlers/go.js +87 -2
  96. package/dist/handlers/go.js.map +1 -1
  97. package/dist/handlers/handler-registry.d.ts.map +1 -1
  98. package/dist/handlers/handler-registry.js +4 -0
  99. package/dist/handlers/handler-registry.js.map +1 -1
  100. package/dist/handlers/hcl.d.ts +3 -0
  101. package/dist/handlers/hcl.d.ts.map +1 -0
  102. package/dist/handlers/hcl.js +193 -0
  103. package/dist/handlers/hcl.js.map +1 -0
  104. package/dist/handlers/java.d.ts.map +1 -1
  105. package/dist/handlers/java.js +33 -16
  106. package/dist/handlers/java.js.map +1 -1
  107. package/dist/handlers/kotlin.d.ts.map +1 -1
  108. package/dist/handlers/kotlin.js +48 -3
  109. package/dist/handlers/kotlin.js.map +1 -1
  110. package/dist/handlers/less.d.ts +3 -0
  111. package/dist/handlers/less.d.ts.map +1 -0
  112. package/dist/handlers/less.js +255 -0
  113. package/dist/handlers/less.js.map +1 -0
  114. package/dist/handlers/objective-c.d.ts.map +1 -1
  115. package/dist/handlers/objective-c.js +122 -64
  116. package/dist/handlers/objective-c.js.map +1 -1
  117. package/dist/handlers/openapi.d.ts.map +1 -1
  118. package/dist/handlers/openapi.js +30 -5
  119. package/dist/handlers/openapi.js.map +1 -1
  120. package/dist/handlers/php.d.ts.map +1 -1
  121. package/dist/handlers/php.js +287 -41
  122. package/dist/handlers/php.js.map +1 -1
  123. package/dist/handlers/protobuf.d.ts.map +1 -1
  124. package/dist/handlers/protobuf.js +1 -0
  125. package/dist/handlers/protobuf.js.map +1 -1
  126. package/dist/handlers/python.d.ts.map +1 -1
  127. package/dist/handlers/python.js +1 -3
  128. package/dist/handlers/python.js.map +1 -1
  129. package/dist/handlers/ruby-dsl.d.ts +23 -0
  130. package/dist/handlers/ruby-dsl.d.ts.map +1 -0
  131. package/dist/handlers/ruby-dsl.js +251 -0
  132. package/dist/handlers/ruby-dsl.js.map +1 -0
  133. package/dist/handlers/ruby.d.ts.map +1 -1
  134. package/dist/handlers/ruby.js +29 -4
  135. package/dist/handlers/ruby.js.map +1 -1
  136. package/dist/handlers/rust.d.ts.map +1 -1
  137. package/dist/handlers/rust.js +98 -2
  138. package/dist/handlers/rust.js.map +1 -1
  139. package/dist/handlers/scss.d.ts +3 -0
  140. package/dist/handlers/scss.d.ts.map +1 -0
  141. package/dist/handlers/scss.js +290 -0
  142. package/dist/handlers/scss.js.map +1 -0
  143. package/dist/handlers/sql.d.ts.map +1 -1
  144. package/dist/handlers/sql.js +37 -18
  145. package/dist/handlers/sql.js.map +1 -1
  146. package/dist/handlers/typescript.d.ts.map +1 -1
  147. package/dist/handlers/typescript.js +65 -17
  148. package/dist/handlers/typescript.js.map +1 -1
  149. package/dist/handlers/xml.d.ts.map +1 -1
  150. package/dist/handlers/xml.js +35 -2
  151. package/dist/handlers/xml.js.map +1 -1
  152. package/dist/index.d.ts.map +1 -1
  153. package/dist/index.js +91 -0
  154. package/dist/index.js.map +1 -1
  155. package/dist/server/mcp-server.d.ts.map +1 -1
  156. package/dist/server/mcp-server.js +10 -0
  157. package/dist/server/mcp-server.js.map +1 -1
  158. package/dist/server/tools/detect-antipatterns.d.ts +1 -1
  159. package/dist/server/tools/get-architecture-snapshot.d.ts +1 -1
  160. package/dist/server/tools/get-entry-points.d.ts +1 -1
  161. package/dist/server/tools/get-lexical-scope-matches.d.ts +54 -0
  162. package/dist/server/tools/get-lexical-scope-matches.d.ts.map +1 -0
  163. package/dist/server/tools/get-lexical-scope-matches.js +470 -0
  164. package/dist/server/tools/get-lexical-scope-matches.js.map +1 -0
  165. package/dist/server/tools/search-symbols.d.ts +10 -0
  166. package/dist/server/tools/search-symbols.d.ts.map +1 -1
  167. package/dist/server/tools/search-symbols.js +353 -8
  168. package/dist/server/tools/search-symbols.js.map +1 -1
  169. package/dist/server/tools/trace-invocation-chain.d.ts +53 -0
  170. package/dist/server/tools/trace-invocation-chain.d.ts.map +1 -0
  171. package/dist/server/tools/trace-invocation-chain.js +280 -0
  172. package/dist/server/tools/trace-invocation-chain.js.map +1 -0
  173. package/dist/version.d.ts +1 -1
  174. package/dist/version.js +1 -1
  175. package/docs/02-installation.md +43 -245
  176. package/docs/05-cli-reference.md +89 -0
  177. package/docs/07-language-support.md +73 -50
  178. package/docs/08-framework-adapters.md +7 -2
  179. package/docs/15-team-setup.md +70 -200
  180. package/docs/17-web-ui.md +73 -93
  181. package/docs/README.md +60 -39
  182. package/docs/dev/benchmark-findings-eu-za-tebe.md +210 -0
  183. package/docs/dev/phase-35-coverage-audit.md +469 -0
  184. package/package.json +6 -3
  185. package/user-manual.md +0 -2466
package/docs/17-web-ui.md CHANGED
@@ -1,68 +1,66 @@
1
- # Web UI
1
+ # Web UI — Reference
2
2
 
3
+ This is the reference page: build commands, configuration flags, keyboard shortcuts, heatmap metrics, and graph-viewer controls.
3
4
 
4
- The Web UI provides a visual interface for exploring indexed codebases. It is served by the same process as the MCP server when HTTP transport is active.
5
+ For the **user-friendly tour** (when to use the UI vs the chat, what each view is good for, workflow examples), see [`WEB-UI.md`](../WEB-UI.md) at the project root.
5
6
 
6
7
  ---
7
8
 
8
- ## Accessing the Web UI
9
+ ## Activating the UI
9
10
 
10
- The Web UI is available at `http://localhost:3000` (or your server URL) when running in HTTP mode:
11
+ The UI is served by the same process as the MCP server, but only when HTTP transport is active:
11
12
 
12
13
  ```bash
13
14
  purecontext-mcp --transport http --port 3000
15
+ # Web UI: http://localhost:3000
16
+ # MCP endpoint: http://localhost:3000/mcp/sse
14
17
  ```
15
18
 
16
- Then open `http://localhost:3000` in a browser.
17
-
18
- ### Building the UI
19
-
20
- The UI is pre-built in the npm package. For development or rebuilding from source:
19
+ The UI is pre-built into the npm package. For source builds:
21
20
 
22
21
  ```bash
23
22
  npm run build:ui # build only the UI
24
23
  npm run build # build everything
25
- npm run dev # watch mode: TypeScript + Vite dev server with hot reload
24
+ npm run dev # watch mode: TypeScript + Vite dev server with HMR
26
25
  ```
27
26
 
28
27
  ---
29
28
 
30
- ## Repository browser
29
+ ## Configuration
31
30
 
32
- - List all indexed repositories with symbol counts, file counts, and language breakdown
33
- - Collapsible file tree with file type icons
34
- - Click any file to open its symbol outline
35
-
36
- ---
31
+ | Field | Default | Description |
32
+ |-------|--------:|-------------|
33
+ | `webUI.enabled` | `true` | Set `false` to disable UI even in HTTP mode (API-only) |
34
+ | `webUI.theme` | `"system"` | Default theme: `"light"` / `"dark"` / `"system"`; users can override |
35
+ | `webUI.basePath` | `"/"` | Mount the UI under a subpath (e.g., `/purecontext`) |
36
+ | `webUI.maxGraphNodes` | `500` | Hard cap on graph viewer node count for performance |
37
37
 
38
- ## Symbol search
39
-
40
- - Real-time search with 300ms debounce — results appear as you type
41
- - Filter by: symbol kind, language, file path pattern
42
- - Keyboard navigation: arrow keys to move through results, Enter to open
43
- - Query term highlighting in results
44
- - Switches between keyword and semantic mode (if semantic search is enabled)
38
+ When deployed behind a reverse proxy at a subpath, set `webUI.basePath` to match the proxy path.
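As a rough illustration of the configuration table above, the sketch below models the four `webUI.*` fields and their documented defaults in TypeScript and shows a subpath deployment. The `WebUIConfig` type and the `resolveWebUIConfig` helper are hypothetical names used only for illustration, not part of the package's API; the assumption is that the real settings live under a `webUI` key in `config.json`.

```typescript
// Hypothetical type mirroring the documented webUI.* fields.
interface WebUIConfig {
  enabled: boolean;
  theme: "light" | "dark" | "system";
  basePath: string;
  maxGraphNodes: number;
}

// Defaults as listed in the configuration table.
const WEB_UI_DEFAULTS: WebUIConfig = {
  enabled: true,
  theme: "system",
  basePath: "/",
  maxGraphNodes: 500,
};

// Illustrative helper (an assumption): fill in any field the user left out.
function resolveWebUIConfig(overrides: Partial<WebUIConfig> = {}): WebUIConfig {
  return { ...WEB_UI_DEFAULTS, ...overrides };
}

// Example: UI mounted behind a reverse proxy at /purecontext.
const proxied = resolveWebUIConfig({ basePath: "/purecontext" });
console.log(JSON.stringify({ webUI: proxied }, null, 2));
```

Only `basePath` is overridden here; the other three fields keep the defaults from the table.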
45
39
 
46
40
  ---
47
41
 
48
- ## Symbol viewer
42
+ ## Keyboard shortcuts
49
43
 
50
- - Syntax-highlighted source code (powered by Shiki — VS Code-quality highlighting)
51
- - Line numbers with anchors (shareable URLs)
52
- - Light/dark theme toggle (preference persisted in localStorage)
53
- - **Related symbols panel**: importers, dependencies, same-file symbols
44
+ | Shortcut | Action |
45
+ |----------|--------|
46
+ | `/` | Focus search bar |
47
+ | `↑` / `↓` | Navigate search results |
48
+ | `Enter` | Open selected symbol |
49
+ | `Esc` | Close panels / clear search |
50
+ | `G` | Open graph view for current symbol |
51
+ | `B` | Show blast radius for current symbol |
52
+ | `H` | Toggle heatmap overlay |
53
+ | `T` | Toggle light/dark theme |
54
54
 
55
55
  ---
56
56
 
57
- ## Dependency graph viewer
58
-
59
- An interactive force-directed graph of file and symbol dependencies.
57
+ ## Graph viewer
60
58
 
61
59
  ### Controls
62
60
 
63
61
  | Action | Control |
64
62
  |--------|---------|
65
- | Pan | Click and drag |
63
+ | Pan | Click and drag background |
66
64
  | Zoom | Scroll wheel |
67
65
  | Fit to view | Double-click background |
68
66
  | Select node | Click |
@@ -70,88 +68,70 @@ An interactive force-directed graph of file and symbol dependencies.
70
68
  | Forward walk | Enable "Dependencies" mode |
71
69
  | Reverse walk | Enable "Importers" mode |
72
70
 
73
- ### Layout options
74
-
75
- - **Force-directed** (default) — physics simulation, nodes cluster by connectivity
76
- - **Hierarchical** — root at top, dependencies flow downward
77
- - **Radial** — selected node at center, connected nodes radiate outward
71
+ ### Layouts
78
72
 
79
- ### Depth slider
73
+ | Layout | Behavior |
74
+ |--------|----------|
75
+ | Force-directed (default) | Physics simulation; nodes cluster by connectivity |
76
+ | Hierarchical | Root at top, dependencies flow downward |
77
+ | Radial | Selected node at center; connected nodes radiate outward |
80
78
 
81
- Adjust traversal depth (1–5 hops). Higher depth reveals transitive dependencies but may produce large graphs.
79
+ ### Filters and overlays
82
80
 
83
- ### Blast radius view
84
-
85
- Switch to "Blast radius" mode to see everything that depends on the selected node — color gradient from red (direct impact) to yellow (indirect).
81
+ | Feature | Description |
82
+ |---------|-------------|
83
+ | Depth slider | Traversal depth 1–5 hops |
84
+ | Language filter | Show only nodes of a specific language |
85
+ | Kind filter | Show only files/symbols of a specific kind |
86
+ | Cycle detection | Highlight circular dependency cycles in red |
87
+ | Blast-radius mode | Color gradient: red (direct impact) → yellow (indirect) |
88
+ | Export | Save graph as SVG or PNG |
89
+ | Minimap | Overview panel for large graphs |
86
90
 
87
91
  ---
88
92
 
89
93
  ## Architecture heatmap
90
94
 
91
- An overlay on the file tree that color-codes files by a selected metric:
92
-
93
- | Metric | Color scale | Use case |
94
- |--------|-------------|----------|
95
- | Churn | blue (stable) → red (high churn) | Identify high-risk files before a refactor |
96
- | Complexity | green → orange → red | Find over-complex files that need attention |
97
- | Quality score | green (high) → red (low) | Prioritize technical debt |
98
- | Test coverage | green (covered) → red (uncovered) | Requires external coverage report |
95
+ Color-codes files by a chosen metric.
99
96
 
100
- Click any cell in the heatmap to open the file's symbol outline.
97
+ | Metric | Color scale | Source |
98
+ |--------|-------------|--------|
99
+ | Churn | blue (stable) → red (high churn) | git log history |
100
+ | Complexity | green → orange → red | per-file cyclomatic complexity |
101
+ | Quality score | green (high) → red (low) | aggregated metrics |
102
+ | Test coverage | green (covered) → red (uncovered) | uploaded lcov file |
101
103
 
102
104
  ---
103
105
 
104
- ## Symbol timeline
106
+ ## Test coverage upload
105
107
 
106
- Per-symbol git history visualized as a timeline. Shows:
107
- - When the symbol was created (first commit where it appears)
108
- - Each commit that modified the symbol (with author, date, message)
109
- - When the symbol was deleted (if applicable)
108
+ The coverage overlay needs an lcov-format report:
110
109
 
111
- Requires git history integration enabled (see [Git & History Integration](18-git-history.md)).
110
+ 1. Run your test suite with coverage output (`vitest --coverage`, `pytest --cov`, `jest --coverage`, etc.)
111
+ 2. Export as lcov: typical output paths are `coverage/lcov.info` or `coverage.info`
112
+ 3. In the UI: Settings → Coverage → Upload lcov file
112
113
 
113
- ---
114
-
115
- ## Test coverage overlay
116
-
117
- Overlays test coverage data on the file tree. Requires an lcov-format coverage report:
118
-
119
- 1. Run your test suite with coverage output (`npx vitest --coverage`, `pytest --cov`, etc.)
120
- 2. Export as lcov: `coverage.info` / `lcov.info`
121
- 3. In PureContext Web UI: Settings → Coverage → Upload lcov file
122
-
123
- Files are color-coded by coverage percentage. Click a file to see line-level coverage in the source viewer.
114
+ Coverage data is stored per workspace and persists across UI sessions.
124
115
 
125
116
  ---
126
117
 
127
- ## Multi-repo workspace
128
-
129
- When multiple repos are indexed, the sidebar shows a repo switcher. Cross-repo search results appear in a unified list with the source repo identified for each result.
118
+ ## URL conventions
130
119
 
131
- ---
120
+ | Pattern | Purpose |
121
+ |---------|---------|
122
+ | `/r/:repoId` | Repository home |
123
+ | `/r/:repoId/f/:filePath` | File outline |
124
+ | `/r/:repoId/s/:symbolId` | Symbol viewer |
125
+ | `/r/:repoId/s/:symbolId#L42` | Symbol viewer with line anchor |
126
+ | `/r/:repoId/graph?root=:symbolId&depth=3` | Graph viewer with preset |
127
+ | `/r/:repoId/heatmap?metric=churn` | Heatmap with preset metric |
132
128
 
133
- ## Advanced graph controls
134
-
135
- Additional controls available in the graph viewer:
136
-
137
- | Feature | Description |
138
- |---------|-------------|
139
- | Language filter | Show only nodes of a specific language |
140
- | Kind filter | Show only files/symbols of a specific kind |
141
- | Cycle detection | Highlight circular dependency cycles in red |
142
- | Export | Save graph as SVG or PNG |
143
- | Minimap | Overview panel for large graphs |
129
+ URLs are stable — link them in PR descriptions or share with teammates.
144
130
 
145
131
  ---
146
132
 
147
- ## Keyboard shortcuts
133
+ ## Related reference
148
134
 
149
- | Shortcut | Action |
150
- |----------|--------|
151
- | `/` | Focus search bar |
152
- | `↑` / `↓` | Navigate search results |
153
- | `Enter` | Open selected symbol |
154
- | `Esc` | Close panels / clear search |
155
- | `G` | Open graph view for current symbol |
156
- | `B` | Show blast radius for current symbol |
157
- | `H` | Toggle heatmap overlay |
135
+ - [Transport Modes](14-transport-modes.md): required HTTP setup for the UI to activate
136
+ - [Git & History Integration](18-git-history.md) — powers the symbol timeline and churn heatmap
137
+ - [Configuration](04-configuration.md): full `webUI.*` schema
package/docs/README.md CHANGED
@@ -1,71 +1,92 @@
1
- # PureContext MCP — User Manual
1
+ # PureContext MCP — Reference Manual
2
2
 
3
- PureContext MCP indexes your codebase and gives AI agents a way to navigate it without reading entire files. Instead of loading hundreds of lines of code to find one function, Claude (or any other MCP-compatible AI) can search by name, retrieve just the symbol it needs, and understand the dependency chain — all in a fraction of the tokens.
3
+ This is the **reference manual**: parameter-level documentation for every tool, configuration option, language handler, framework adapter, and deployment option.
4
4
 
5
- This manual covers everything from installation through advanced features. Use the sections below to navigate to what you need, or read in order for a full introduction.
5
+ For the **user guide** (narrative explanations, worked examples, and real-world workflows), see [`USER-GUIDE.md`](../USER-GUIDE.md) and the `WHY-PURECONTEXT.md` / `FINDING-CODE.md` / `WORKFLOW-*.md` files at the project root.
6
6
 
7
- ---
7
+ Each row below has two columns: the reference page in this directory, and the user-friendly companion at the project root when one exists.
8
8
 
9
- ## Getting Started
9
+ ---
10
10
 
11
- These three sections get you from zero to a working setup.
11
+ ## Getting started
12
12
 
13
- - [Introduction](01-introduction.md): What PureContext is, why token efficiency matters, key concepts
14
- - [Installation](02-installation.md) — Install via npm, verify your setup, upgrade and uninstall
15
- - [Quick Start](03-quick-start.md) — Index a project and search your first symbol in minutes
13
+ | Reference | Companion |
14
+ |-----------|-----------|
15
+ | [Introduction](01-introduction.md) — concise spec, glossary, key concepts | [Why PureContext](../WHY-PURECONTEXT.md): narrative case |
16
+ | [Installation](02-installation.md) — prereqs, support matrix, verify, upgrade | [Full Installation Guide](../FULL-INSTALLATION-GUIDE.md) — per-IDE walkthrough |
17
+ | [Quick Start](03-quick-start.md) — index a project and search in minutes | [Navigating a New Codebase](../NAVIGATING-NEW-CODE.md) — day-one workflow |
16
18
 
17
19
  ---
18
20
 
19
- ## Reference
20
-
21
- Complete reference material for configuration, the CLI, and every MCP tool.
21
+ ## Core reference
22
22
 
23
- - [Configuration](04-configuration.md) — Full `config.json` schema, every field explained, environment variable overrides
24
- - [CLI Reference](05-cli-reference.md) — Every command and flag: `config --init`, `--health`, `--transport`, and more
23
+ - [Configuration](04-configuration.md) — Full `config.json` schema and environment variable overrides
24
+ - [CLI Reference](05-cli-reference.md) — Every command and flag (`config --init`, `--health`, `--transport`, etc.)
25
25
  - [MCP Tools Reference](06-tools-reference.md) — Every tool with inputs, outputs, and examples — grouped by category
26
26
 
27
27
  ---
28
28
 
29
- ## Language & Framework Support
29
+ ## Language and framework support
30
30
 
31
- - [Language Support](07-language-support.md): All 34 supported languages, including what gets indexed and known limitations
32
- - [Framework Adapters](08-framework-adapters.md) — Vue, React, Nuxt, Next.js, Angular, NestJS, Express, Django, Rails, Spring, and 20+ more
31
+ | Reference | Companion |
32
+ |-----------|-----------|
33
+ | [Language Support](07-language-support.md) — symbol-kind matrix, visibility filters, grammar notes | [Language Support](../LANGUAGE-SUPPORT.md) — narrative tour by category |
34
+ | [Framework Adapters](08-framework-adapters.md) — detection rules, extracted kinds, `frameworkMeta` | [Framework Adapters](../FRAMEWORK-ADAPTERS.md) — what each adapter changes in practice |
33
35
 
34
36
  ---
35
37
 
36
- ## Core Features
38
+ ## Core features
37
39
 
38
- - [Dependency Graph Tools](09-dependency-graph.md) — Find what a symbol depends on, what depends on it, and what is dead code
39
- - [Semantic Search](10-semantic-search.md) — Search by meaning rather than name using HNSW vector index
40
- - [Search Quality & Ranking](11-search-quality.md) — How FTS5, camelCase splitting, and relevance ranking work; search tips
41
- - [AI Summarization](12-ai-summarization.md) — Auto-generate symbol descriptions with Anthropic, OpenAI, or Gemini
42
- - [Token Savings Tracker](13-token-savings.md) — See exactly how many tokens (and dollars) PureContext saves per session
40
+ - [Dependency Graph Tools](09-dependency-graph.md) — what a symbol depends on, what depends on it, dead-code detection
41
+ - [Semantic Search](10-semantic-search.md) — HNSW vector index, embedding providers, hybrid mode
42
+ - [Search Quality & Ranking](11-search-quality.md) — FTS5, camelCase splitting, relevance ranking
43
+ - [AI Summarization](12-ai-summarization.md) — provider config, batch sizes, cost model
44
+ - [Token Savings Tracker](13-token-savings.md) — per-session token (and dollar) accounting
45
+
46
+ Companion narratives: [Finding Code](../FINDING-CODE.md), [AI Summaries](../AI-SUMMARIES.md), [AST-Level Search](../AST-SEARCH.md), [Code Intelligence](../CODE-INTELLIGENCE.md).
43
47
 
44
48
  ---
45
49
 
46
50
  ## Deployment
47
51
 
48
- - [Transport Modes](14-transport-modes.md): stdio (local) vs HTTP/SSE (team/browser); TLS via reverse proxy
49
- - [Team Setup & Multi-Tenant](15-team-setup.md) — Shared server, workspaces, API keys, rate limiting
50
- - [Docker Deployment](16-docker.md) — `docker run`, Docker Compose, volumes, environment variables, health checks
52
+ | Reference | Companion |
53
+ |-----------|-----------|
54
+ | [Transport Modes](14-transport-modes.md) — stdio vs HTTP/SSE, TLS via reverse proxy | — |
55
+ | [Team Setup & Multi-Tenant](15-team-setup.md) — permissions, rate limit, admin API reference | [Using PureContext with a Team](../TEAM-SETUP.md) — narrative deployment |
56
+ | [Docker Deployment](16-docker.md) — image tags, compose, volumes, env vars, healthchecks | — |
57
+
58
+ ---
59
+
60
+ ## Advanced features
61
+
62
+ | Reference | Companion |
63
+ |-----------|-----------|
64
+ | [Web UI](17-web-ui.md) — config flags, keyboard shortcuts, URL conventions | [The Web UI](../WEB-UI.md) — when to leave the chat |
65
+ | [Git & History Integration](18-git-history.md) — symbol history, churn, diff analysis | [Code History](../CODE-HISTORY.md) — narrative |
66
+ | [Cross-Repo Intelligence](19-cross-repo.md) — multi-repo search, similarity, MCP Resources | — |
67
+ | [AI-Powered Architecture Analysis](20-architecture-analysis.md) — metrics, anti-patterns, auto-docs | [Code Health](../CODE-HEALTH.md), [Health Dashboards](../HEALTH-DASHBOARDS.md), [Visualizing Code Structure](../VISUALIZING-CODE.md) |
68
+ | [Ecosystem & Data Tools](21-ecosystem-tools.md) — dbt, OpenAPI handler, SQL handler, column search | — |
69
+ | [Distribution & Platform](22-distribution.md) — export/import, registry, webhooks, GitHub Actions | — |
70
+
71
+ Companion narratives also relevant here: [Making Changes Safely](../SAFE-CHANGES.md), [Understanding Code Relationships](../UNDERSTANDING-RELATIONSHIPS.md), [Refactoring Safely](../REFACTORING-SAFELY.md).
51
72
 
52
73
  ---
53
74
 
54
- ## Advanced Features
75
+ ## Operations and stability
55
76
 
56
- - [Web UI](17-web-ui.md) — Visual graph viewer, heatmap, symbol timeline, test coverage overlay
57
- - [Git & History Integration](18-git-history.md) — Symbol-level commit history, churn metrics, PR diff analysis
58
- - [Cross-Repo Intelligence](19-cross-repo.md) — Search across multiple repos, find similar code, MCP Resources
59
- - [AI-Powered Architecture Analysis](20-architecture-analysis.md) — Quality metrics, anti-pattern detection, auto-generated architecture docs
60
- - [Ecosystem & Data Tools](21-ecosystem-tools.md) — dbt integration, OpenAPI/Swagger handler, SQL handler, column search
61
- - [Distribution & Platform](22-distribution.md) — Index export/import, public registry, webhooks, GitHub Actions, VS Code extension
77
+ - [Performance & Scalability](23-performance.md) — worker thread pool, large-repo tuning, memory
78
+ - [Security](24-security.md) — API key model, workspace isolation, path-traversal protections, hardening
79
+ - [Troubleshooting](26-troubleshooting.md) — common errors, `--health` output, debug logging
80
+ - [Architecture Overview](25-architecture-overview.md) — three-layer design, data flow, SQLite schema
81
+ - [API Stability & Changelog](27-api-stability.md) — semver policy, stable vs experimental tools, version history
62
82
 
63
83
  ---
64
84
 
65
- ## Operations & Reference
85
+ ## End-to-end workflows
86
+
87
+ The user-guide root has narrative walkthroughs for full real-world scenarios:
66
88
 
67
- - [Performance & Scalability](23-performance.md) — Worker thread pool, large repo tuning, memory usage
68
- - [Security](24-security.md) — API key model, workspace isolation, path traversal prevention, hardening checklist
69
- - [Troubleshooting](26-troubleshooting.md) — Common errors, `--health` output, debug logging, re-indexing from scratch
70
- - [Architecture Overview](25-architecture-overview.md) — How PureContext works internally: three-layer design, data flow, SQLite schema
71
- - [API Stability & Changelog](27-api-stability.md) — Semver policy, stable vs experimental tools, version history
89
+ - [Onboarding to a New Codebase](../WORKFLOW-ONBOARDING.md)
90
+ - [Refactoring Legacy Code](../WORKFLOW-REFACTORING.md)
91
+ - [Reviewing a Pull Request](../WORKFLOW-PR-REVIEW.md)
92
+ - [Running a Tech Debt Sprint](../WORKFLOW-TECH-DEBT.md)
package/docs/dev/benchmark-findings-eu-za-tebe.md ADDED
@@ -0,0 +1,210 @@
1
+ # Benchmark Findings — PureContext vs jCodeMunch (eu-za-tebe)
2
+
3
+ **Date:** 2026-05-14 (re-measured 2026-05-15 after Phase 34 body-snippet indexing)
4
+ **Project:** eu-za-tebe (PHP/CodeIgniter 3, Twig, HMVC modules)
5
+ **PureContext version:** 1.2.0
6
+ **jCodeMunch version:** 1.80.1
7
+ **Harness:** `benchmarks/harness/run_benchmark.ts`
8
+ **Results:** `benchmarks/eu-za-tebe/results/`
9
+
10
+ ---
11
+
12
+ ## 1. Scorecard
13
+
14
+ | Dimension | Metric | PureContext | jCodeMunch | Winner |
15
+ |-----------|--------|------------:|------------:|--------|
16
+ | **0 — Indexing** | Speed (files/sec) | 193 | 106 | PC |
17
+ | | Symbols/sec | 1,466 | 2,833 | JC |
18
+ | | Files indexed | 565 | 824 | JC |
19
+ | | Symbols found | 4,291 | 21,984 | JC |
20
+ | **1 — Token efficiency** | Avg reduction | 99.9% | 99.8% | PC |
21
+ | | Avg ratio vs baseline | 1,060× | 696× | PC |
22
+ | **2 — Search quality** | Precision@1 | **0.0%** | **28.0%** | JC |
23
+ | | Precision@3 | **0.0%** | **32.0%** | JC |
24
+ | | Recall@5 | **0.0%** | **32.0%** | JC |
25
+ | | Median search latency | 0.8ms | 57ms | PC |
26
+ | **3 — Coverage** | Total symbols | 4,291 | 21,984 | JC |
27
+ | | Symbols/kLOC | 38.8 | 198.7 | JC |
28
+
29
+ ---
30
+
31
+ ## 2. Gap 1 — Search Quality: PureContext 0% vs jCodeMunch 28%
32
+
33
+ ### Root cause
34
+
35
+ All 25 ground-truth queries are **natural-language descriptions** (e.g. "execute parameterized query and return single database row"). PureContext's FTS5 search operates on **symbol name + signature + summary**. On this PHP project, summaries are just the raw signature repeated (no docstrings), so:
36
+
37
+ - Indexed content for `CIR_Model::get_row`: `"CIR_Model get_row public function get_row($query, $input)"`
38
+ - Query: `"execute parameterized query and return single database row"`
39
+ - FTS5 MATCH fails because tokens like `"parameterized"`, `"database"`, `"single"`, `"return"` are absent from the index
40
+
41
+ This is **not a bug** — it is the expected behavior of keyword search against undocumented code. It is, however, a serious product gap.
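The mismatch described in the root-cause bullets can be reproduced with a few lines of TypeScript. This is a minimal sketch, assuming a simple tokenizer (lowercase, split on every non-alphanumeric character); the tokenizer PureContext actually configures for FTS5 may differ, and `tokenizeForOverlap` is a hypothetical name.

```typescript
// Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
// The real FTS5 tokenizer configuration may differ.
function tokenizeForOverlap(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0),
  );
}

const indexedTokens = tokenizeForOverlap(
  "CIR_Model get_row public function get_row($query, $input)",
);
const queryTokens = tokenizeForOverlap(
  "execute parameterized query and return single database row",
);

// Tokens the query requires that the indexed text cannot supply.
const missingTokens = Array.from(queryTokens).filter((t) => !indexedTokens.has(t));
console.log(missingTokens);
```

Under this tokenizer only `query` and `row` overlap, so a match that requires every query token cannot succeed.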
42
+
43
+ ### Why jCodeMunch scores 28%
44
+
45
+ jCodeMunch's BM25 search appears to index **function body content** in addition to names and signatures. For example, `insert_row` calls `$this->db->insert()` internally, so the word `"insert"` appears multiple times with high weight. It also indexes variable names (`$table`, `$fields`, `$values`), which overlap with the query "insert new record into database table with fields and values".
46
+
47
+ jCodeMunch still missed 17/25 queries (68% miss rate), indicating that even richer keyword indexing isn't enough for pure natural-language queries against undocumented code.
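The reported percentages are consistent with a hit-rate reading of the metrics over 25 queries: 7/25 = 28%, 8/25 = 32%, and 17/25 = 68% missed. The sketch below shows that calculation, assuming one expected symbol per query. The names are hypothetical and the rank distribution is made up to match the scorecard; it is not taken from the harness or its results.

```typescript
// Rank of the expected symbol for each query (1-based), or null when it was not returned.
type Rank = number | null;

// Fraction of queries whose expected symbol appears within the top k results.
function hitRateAtK(ranks: Rank[], k: number): number {
  if (ranks.length === 0) return 0;
  const hits = ranks.filter((r) => r !== null && r <= k).length;
  return hits / ranks.length;
}

// Made-up distribution that reproduces the reported jCodeMunch percentages:
// 7 hits at rank 1, 1 hit at rank 2, 17 misses.
const exampleRanks: Rank[] = [
  ...new Array<Rank>(7).fill(1),
  2,
  ...new Array<Rank>(17).fill(null),
];

for (const k of [1, 3, 5]) {
  console.log(`hit rate @${k}: ${(hitRateAtK(exampleRanks, k) * 100).toFixed(1)}%`);
}
```

Whether the harness defines Precision@k and Recall@5 exactly this way is an assumption; the sketch only shows that the published figures fit this interpretation.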
48
+
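The percentages in this section follow from simple hit counting over the 25 queries. The sketch below assumes each query has exactly one expected symbol and that the harness scores a hit when that symbol appears at rank k or better; the harness's exact metric definition is not shown here, and the rank list is a hypothetical distribution chosen only to reproduce the reported jCodeMunch numbers.

```python
from typing import Optional, Sequence

def hit_rate_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose expected symbol appears at rank <= k (None = not returned)."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

# Hypothetical ranks: 7 queries hit at rank 1, 1 at rank 2, 17 never returned.
jc_ranks = [1] * 7 + [2] + [None] * 17

for k in (1, 3, 5):
    print(f"hit rate @{k}: {hit_rate_at_k(jc_ranks, k):.1%}")

miss_rate = sum(1 for r in jc_ranks if r is None) / len(jc_ranks)
print(f"miss rate: {miss_rate:.0%}")
```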
49
+ ### Concrete examples
50
+
51
+ | Query | Expected | PC result | JC result | Why JC wins |
52
+ |-------|----------|-----------|-----------|-------------|
53
+ | "insert new record into database table with fields and values" | `CIR_Model::insert_row` | miss | rank 1 | "insert", "table", "fields", "values" in signature |
54
+ | "render twig template and return output as string" | `Twig::render` | miss | rank 1 | "render", "twig", "template", "string" in docstring |
55
+ | "set content language and slug for a localized page" | `CIR_Controller::localize` | miss | rank 1 | "localize" in name, "language"/"slug" in body |
56
+ | "fetch scalar value from database query result" | `CIR_Model::get_value` | miss | miss | "scalar" absent from both indexes |
57
+
58
+ ### Fix options (ranked by impact)
59
+
60
+ 1. **Enable semantic search for undocumented code** (highest impact)
61
+ Embed symbol content (name + signature + body snippet) into the HNSW vector index. Natural-language queries then find the right symbol even without docstrings. This is already implemented — the gap is that semantic search is disabled by default and requires an embedding provider. Consider enabling it with the bundled local ONNX model by default.
62
+
63
+ 2. **Index function body snippets into FTS5** (medium impact, no config needed)
64
+ Currently FTS5 indexes only name + signature + summary. Indexing the first ~10 lines of each function body (variable names, return statements, called methods) would dramatically improve recall for undocumented code. This alone would likely close a large portion of the gap without needing embeddings.
65
+
66
+ 3. **Query expansion in the preprocessor** (low impact, low risk)
67
+ When a query contains no exact or prefix name matches, fall back to individual token OR-matching rather than AND-matching. Currently FTS5 requires all tokens to match; if any one token is absent, the result is 0. Switching to OR / BM25 scoring for long queries would surface partial matches.
68
+
69
+ 4. **AI summary generation at index time** (medium impact, cost/latency tradeoff)
70
+ When `ai.allowRemoteAI: true`, generate a one-sentence natural-language description of each function at index time. This is already supported but opt-in. Making it the default (with a local model fallback) would close most of the gap.
71
+
72
+ ---
73
+
74
+ ## 3. Gap 2 — Symbol Coverage: PureContext 4,291 vs jCodeMunch 21,984
75
+
76
+ ### Root cause
77
+
78
+ The 5× coverage gap has two distinct causes (A and B), plus a reporting mismatch (C) that inflates the apparent gap:
79
+
80
+ **A. File scope difference**
81
+
82
+ | Tool | Files indexed | Files skipped | Reason for skips |
83
+ |------|--------------|---------------|-----------------|
84
+ | PureContext | 565 (of 2,656 eligible) | 2,091 | Incremental: unchanged since last full index |
85
+ | jCodeMunch | 824 | 4,167 gitignore + 756 wrong extension | Fresh index, stricter gitignore |
86
+
87
+ Note: PureContext counts are misleading here because the "incremental" run only re-processed changed files. The first full index found 2,661 files. jCodeMunch's 824 is a genuinely smaller set because it applies stricter gitignore rules and skips more extension types (no Twig templates, no SCSS as code, etc.).
88
+
89
+ **B. Symbol extraction depth**
90
+
91
+ | Metric | PureContext | jCodeMunch |
92
+ |--------|-------------|------------|
93
+ | Symbols per file (PHP) | ~7.5 | ~34 |
94
+ | What is extracted | class, method, function, const, interface | Same + local variables, constants, imports, inline lambdas, config keys |
95
+ | Function body | Not indexed | Indexed for search |
96
+
97
+ jCodeMunch extracts ~4.5× more symbols per PHP file because it indexes finer-grained constructs: local variable assignments, inline anonymous functions, config array keys, and possibly PHP `define()` constants. PureContext emits a `const` symbol only for class constants, so `define()` constants are not counted.
98
+
99
+ **C. Dim 3 table is not apples-to-apples**
100
+
101
+ PureContext reports by **symbol kind** (class/method/function/const). jCodeMunch reports by **language** (php/javascript/css). This made the side-by-side table in the benchmark report misleading — jCodeMunch's 21,984 includes 638 PHP *files* as entries, not just their symbols. The actual comparable number may be lower.
102
+
103
+ **Action required:** Re-run jCodeMunch with `detail_level: 'full'` on a known file and count actual distinct named symbols to get a fair apples-to-apples coverage number.
104
+
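One way to do that count is to normalize both tools' output to `(kind, name)` pairs before comparing. This is a minimal sketch: the `kind` and `name` fields and the sample entries are assumptions for illustration, not the actual output schema of either tool.

```python
from collections import Counter

def distinct_symbols(entries: list[dict]) -> set[tuple[str, str]]:
    """Keep only named code symbols, dropping file-level entries and duplicates."""
    return {
        (e["kind"], e["name"])
        for e in entries
        if e.get("kind") != "file" and e.get("name")
    }

def count_by_kind(symbols: set[tuple[str, str]]) -> Counter:
    """Count symbols per kind so both tools can be reported on the same axis."""
    return Counter(kind for kind, _ in symbols)

# Hypothetical output for one file from each tool.
pc = [
    {"kind": "class", "name": "CIR_Model"},
    {"kind": "method", "name": "get_row"},
    {"kind": "method", "name": "insert_row"},
]
jc = [
    {"kind": "file", "name": "application/core/CIR_Model.php"},
    {"kind": "class", "name": "CIR_Model"},
    {"kind": "method", "name": "get_row"},
    {"kind": "method", "name": "get_row"},  # duplicate entry
    {"kind": "method", "name": "insert_row"},
    {"kind": "variable", "name": "$table"},
]

pc_set, jc_set = distinct_symbols(pc), distinct_symbols(jc)
print("PureContext:", len(pc_set), dict(count_by_kind(pc_set)))
print("jCodeMunch:", len(jc_set), dict(count_by_kind(jc_set)))
print("only in jCodeMunch:", sorted(jc_set - pc_set))
```

With file entries and duplicates removed, the remaining difference is the set of symbols that one tool genuinely extracts and the other does not.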
105
+ ### Fix options (ranked by effort vs impact)
106
+
107
+ 1. **Index function body content for FTS5** (already listed under Gap 1 — fixes both gaps)
108
+ Body content doesn't increase `symbol_count` but dramatically improves searchability.
109
+
110
+ 2. **Extract PHP class properties and constants** (medium effort, high value)
111
+ PureContext's PHP handler currently extracts classes, methods, functions, and class constants. PHP `define()` constants and class property declarations with PHPDoc are not extracted. Adding them would increase symbol density significantly.
112
+
113
+ 3. **Extract anonymous functions and closures** (low effort)
114
+ PHP closures assigned to variables (e.g., `$handler = function($req) {...}`) are common in CI3 hooks and route definitions. Treating them as `function` symbols with the variable name would add meaningful symbols.
115
+
116
+ 4. **Index PHP config arrays as structured data** (medium effort, niche value)
117
+ CodeIgniter stores configuration in PHP arrays (`routes.php`, `config.php`). Treating array keys as indexed entries (similar to the OpenAPI/dbt adapters) would let agents answer questions such as "what is the base URL config key?".
118
+
119
+ 5. **Re-check PHP handler symbol kinds** (low effort, quick win)
120
+ Audit what `phpHandler.extractSymbols()` currently emits against what jCodeMunch finds in the same files. Run both on `application/core/CIR_Model.php` and compare the symbol lists. Any symbol that appears only in jCodeMunch's list is an extraction gap in the PHP handler.
121
+
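Fix option 3 can be prototyped before touching the PHP handler. The sketch below uses a regular expression to find closures assigned to variables and reports the variable name and line number. It is an illustration only: a regex also matches inside strings and comments and ignores `fn` arrow functions, so a real handler would work from the parse tree. The `find_named_closures` helper and the sample source are hypothetical.

```python
import re

# Matches `$name = function (` and `$name = static function (`.
CLOSURE_ASSIGNMENT = re.compile(
    r"\$(?P<name>[A-Za-z_]\w*)\s*=\s*(?:static\s+)?function\s*\("
)

def find_named_closures(php_source: str) -> list[tuple[str, int]]:
    """Return (variable name, 1-based line number) for each closure assignment."""
    results = []
    for m in CLOSURE_ASSIGNMENT.finditer(php_source):
        line = php_source.count("\n", 0, m.start()) + 1
        results.append((m.group("name"), line))
    return results

sample = """<?php
$handler = function($req) {
    return $req;
};
$config['base_url'] = 'https://example.test/';
$log = static function ($msg) use ($handler) {
    error_log($msg);
};
"""

closures = find_named_closures(sample)
print(closures)
```

The array assignment on line 5 is skipped because the variable is followed by `[` rather than `=`.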
122
+ ---
123
+
124
+ ## 4. Token Efficiency — PureContext wins, but context matters
125
+
126
+ PureContext achieved 1,060× average compression vs 696× for jCodeMunch. However, jCodeMunch's compression is still excellent (99.8% reduction). The difference comes from:
127
+
128
+ - jCodeMunch returns more results per search (richer symbol set = more candidates shown)
129
+ - jCodeMunch's `get_symbol_source` response includes docstrings, hash, line range metadata (~414 tokens per symbol vs ~250 for PureContext)
130
+ - The compact `#MUNCH/1` format does not apply here: jCodeMunch uses it for search results, but its `get_symbol_source` returns full JSON
131
+
132
+ **Implication:** The token efficiency gap is not a deficiency in either tool — it reflects different trade-offs between completeness and compression.
133
+
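The two headline numbers (compression ratio and percentage reduction) describe the same quantity on different scales, which is why a large difference in ratio shows up as a fraction of a percentage point. A short sketch of the conversion, applied to the average ratios above; the table's "Avg reduction" row is presumably averaged per task, so it need not match these values exactly.

```python
def reduction_from_ratio(ratio: float) -> float:
    """Convert a compression ratio (baseline / actual) into a fractional reduction."""
    return 1.0 - 1.0 / ratio

for name, ratio in [("PureContext", 1060), ("jCodeMunch", 696)]:
    print(f"{name}: {ratio}x -> {reduction_from_ratio(ratio):.2%} fewer tokens")
```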
134
+ ---
135
+
136
+ ## 5. Indexing Speed
137
+
138
+ - PureContext: **193 files/sec** (incremental run; first full index was ~419 files/sec)
139
+ - jCodeMunch: **106 files/sec** (fresh full index, 824 files in 8.8s)
140
+
141
+ PureContext is faster at indexing. The incremental hash-based approach means re-indexing a large repo after small changes is near-instant.
142
+
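Hash-based incremental indexing can be sketched as follows: hash each file's content and re-process only the files whose hash is new or has changed. The choice of SHA-256 and the function names are assumptions for illustration; the hash PureContext actually stores is not specified here.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 assumed for this sketch)."""
    return hashlib.sha256(data).hexdigest()

def files_to_reindex(current: dict[str, bytes], stored: dict[str, str]) -> list[str]:
    """Return paths whose content hash is missing from, or differs from, the stored index."""
    return sorted(
        path for path, data in current.items()
        if stored.get(path) != content_hash(data)
    )

# Hypothetical state: two files indexed earlier, one edited and one added since.
stored = {
    "a.php": content_hash(b"<?php echo 1;"),
    "b.php": content_hash(b"<?php echo 2;"),
}
current = {
    "a.php": b"<?php echo 1;",    # unchanged, skipped
    "b.php": b"<?php echo 22;",   # edited, re-indexed
    "c.php": b"<?php echo 3;",    # new, indexed
}

changed = files_to_reindex(current, stored)
print(changed)
```

Unchanged files cost one hash computation and no parsing, which is why a run after small edits touches only a fraction of the repository.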
143
+ ---
144
+
145
+ ## 6. What the Ground Truth Revealed About Query Design
146
+
147
+ The 25 ground-truth queries were written as long natural-language sentences. This is representative of how AI agents actually query code navigation tools. Key learnings:
148
+
149
+ - Short queries with the exact symbol name (e.g. "insert_row") work well for both tools
150
+ - Long natural-language queries without semantic search work poorly for BOTH tools on undocumented PHP
151
+ - jCodeMunch's advantage comes from indexing function body content, not from better NLP
152
+ - Adding semantic search to PureContext would likely flip the Dim 2 winner even on this project
153
+
154
+ **For future benchmarks:** Keep the long natural-language format — it's the realistic test case. Do NOT shorten queries to keyword fragments just to make PureContext score higher; that would misrepresent real agent usage.
155
+
156
+ ---
157
+
158
+ ## 7. Phase 34 Post-Mortem (2026-05-15)
159
+
160
+ Phase 34 implemented body snippet indexing (first ~200 bytes of function/method bodies into FTS5) and re-measured P@1. Result: **still 0%**.
161
+
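Taking "the first ~200 bytes" of a body needs care with multi-byte characters, because a byte-level cut can land inside a UTF-8 sequence. A minimal sketch of one safe way to truncate; the `body_snippet` helper is hypothetical and not the Phase 34 implementation.

```python
def body_snippet(body: str, max_bytes: int = 200) -> str:
    """Return at most max_bytes of UTF-8 from the start of body, dropping any partial character."""
    raw = body.encode("utf-8")[:max_bytes]
    # errors="ignore" discards a trailing incomplete multi-byte sequence.
    return raw.decode("utf-8", errors="ignore")

ascii_body = "return $this->db->query($query, $input)->row();"
accented = "é" * 150  # 2 bytes per character, 300 bytes in total

print(body_snippet(ascii_body))
print(len(body_snippet(accented)), len(body_snippet(accented, max_bytes=201)))
```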
162
+ ### Root cause — FTS5 AND semantics
163
+
164
+ Body snippets ARE correctly indexed and searchable (21 unit tests pass). The blocker is that FTS5's default query mode is strict AND: every token in the query must appear in the document for it to be returned.
165
+
166
+ All 25 ground-truth queries are natural-language sentences containing English connectives ("and", "with", "from", "into", "as", "by", "on", "for") that never appear as tokens in PHP code. Example:
167
+
168
+ - Query: `"execute parameterized query and return single database row"`
169
+ - FTS needs ALL of: `execute`, `parameterized`, `query`, `and`, `return`, `single`, `database`, `row`
170
+ - `get_row` body has: `query` ✓, `row` ✓, `return` ✓ — but `execute`, `parameterized`, `and`, `single`, `database` are absent → zero results
171
+
172
+ Body snippets bring body tokens into the FTS index correctly, but AND semantics keep those tokens from ever contributing to ranking: a single query token that is absent from a document (typically an English connective) acts as a hard filter, so the document is excluded before BM25 scoring runs.
173
+
174
+ ### Why the P@1 ≥ 20% target was not reached in Phase 34
175
+
176
+ The ≥20% target requires OR-mode fallback (Phase 37) to be effective. Without OR-fallback:
177
+ - AND mode: all 25 queries fail because they contain English connectives absent from code
178
+ - LIKE fallback: also fails (multi-word natural language strings never match symbol names)
179
+
180
+ Phase 34 is a necessary prerequisite for Phase 37. Once Phase 37 implements OR-fallback (retry the FTS query in OR mode when AND returns zero results), the body snippet tokens will be usable for BM25 ranking and P@1 should climb to ≥20%.
181
+
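The OR-fallback described above can be sketched with SQLite's FTS5 directly. This assumes a Python `sqlite3` build with FTS5 enabled; the `symbols` table layout, the `search` helper, and the function bodies in the sample rows are hypothetical and do not reflect PureContext's schema.

```python
import re
import sqlite3

def fts_terms(query: str) -> list[str]:
    """Split a natural-language query into double-quoted FTS5 string terms."""
    return [f'"{t}"' for t in re.findall(r"[0-9a-z]+", query.lower())]

def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    terms = fts_terms(query)
    if not terms:
        return []
    sql = "SELECT name FROM symbols WHERE symbols MATCH ? ORDER BY bm25(symbols) LIMIT ?"
    # First attempt: implicit AND, every term must be present.
    rows = conn.execute(sql, (" ".join(terms), limit)).fetchall()
    if not rows:
        # Fallback: OR mode, so BM25 can rank partial matches.
        rows = conn.execute(sql, (" OR ".join(terms), limit)).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, signature, body)")
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", [
    ("get_row", "public function get_row($query, $input)",
     "return $this->db->query($query, $input)->row();"),
    ("insert_row", "public function insert_row($table, $fields, $values)",
     "$this->db->insert($table, $data);"),
    ("render", "public function render($template, $data)",
     "return $this->twig->render($template, $data);"),
])

nl_query = "execute parameterized query and return single database row"
results = search(conn, nl_query)
print(results)
```

In AND mode the query returns nothing because `execute`, `parameterized`, and the other absent tokens filter out every row. The OR retry lets the tokens that do appear in names, signatures, and body snippets decide the ranking.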
182
+ ### What DID improve in Phase 34
183
+
184
+ - Body snippet content is now in the FTS index for all functions and methods (PHP + TypeScript)
185
+ - The benchmark harness was updated to use FTS+bodySnippets for Dim 2 (previously used LIKE-only)
186
+ - 21 unit tests verify the extraction and FTS integration end-to-end
187
+
188
+ ## 8. Priority Action Items
189
+
190
+ | Priority | Gap | Fix | Effort |
191
+ |----------|-----|-----|--------|
192
+ | P0 | Search quality (0%) | OR-fallback when FTS AND returns zero results (Phase 37) | Low |
193
+ | P0 | Search quality (0%) | Enable local ONNX semantic search by default | Low (already built) |
194
+ | P1 | Coverage (5× gap) | Audit PHP handler vs jCodeMunch on same file | Low |
195
+ | P1 | Coverage | Extract PHP `define()` constants + class properties | Medium |
196
+ | P2 | Coverage | Extract PHP closures assigned to variables | Low |
197
+ | P2 | Dim 3 accuracy | Fix apples-to-apples coverage comparison in harness | Low |
198
+
199
+ ---
200
+
201
+ ## 9. Files
202
+
203
+ | File | Description |
204
+ |------|-------------|
205
+ | `benchmarks/harness/run_benchmark.ts` | Dual comparison harness |
206
+ | `benchmarks/eu-za-tebe/tasks.json` | 5 keyword queries for Dim 1 |
207
+ | `benchmarks/eu-za-tebe/ground-truth.json` | 25 natural-language queries for Dim 2 |
208
+ | `benchmarks/eu-za-tebe/results/purecontext.json` | Raw PureContext results |
209
+ | `benchmarks/eu-za-tebe/results/jcodemunch.json` | Raw jCodeMunch results |
210
+ | `benchmarks/eu-za-tebe/results/comparison.md` | Generated side-by-side report |