magector 1.2.11 → 1.2.13

Files changed (3)
  1. package/README.md +402 -154
  2. package/package.json +5 -5
  3. package/src/mcp-server.js +95 -80
package/README.md CHANGED
@@ -7,7 +7,7 @@ Magector indexes an entire Magento 2 codebase and lets you search it with natura
7
7
  [![Rust](https://img.shields.io/badge/rust-1.75+-orange.svg)](https://www.rust-lang.org)
8
8
  [![Node.js](https://img.shields.io/badge/node-18+-green.svg)](https://nodejs.org)
9
9
  [![Magento](https://img.shields.io/badge/magento-2.4.x-blue.svg)](https://magento.com)
10
- [![Accuracy](https://img.shields.io/badge/accuracy-96.1%25-brightgreen.svg)](#validation)
10
+ [![Accuracy](https://img.shields.io/badge/accuracy-94.9%25-brightgreen.svg)](#validation)
11
11
  [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
12
12
 
13
13
  ---
@@ -21,7 +21,7 @@ Magento 2 has **18,000+ source files** across hundreds of modules. Finding the r
21
21
  | `grep` / `ripgrep` | No | No | 100-500ms |
22
22
  | IDE search | No | No | 200-1000ms |
23
23
  | GitHub search | Partial | No | 500-2000ms |
24
- | **Magector** | **Yes** | **Yes** | **15-45ms** |
24
+ | **Magector** | **Yes** | **Yes** | **10-45ms** |
25
25
 
26
26
  Magector understands that a query about *"payment capture"* should return `Sales/Model/Order/Payment/Operations/CaptureOperation.php`, not just files containing the word "capture".
27
27
 
@@ -29,37 +29,41 @@ Magector understands that a query about *"payment capture"* should return `Sales
29
29
 
30
30
  ## Magector vs Built-in AI Search
31
31
 
32
- Claude Code and Cursor both have built-in code search but they rely on keyword matching (`grep`/`ripgrep`) and file-tree heuristics. On a Magento 2 codebase with 18,000+ files, that approach breaks down fast.
32
+ Claude Code and Cursor both have built-in code search -- but they rely on keyword matching (`grep`/`ripgrep`) and file-tree heuristics. On a Magento 2 codebase with 18,000+ files, that approach breaks down fast.
33
33
 
34
34
  | Capability | Claude Code / Cursor (built-in) | Magector |
35
35
  |---|---|---|
36
36
  | **Search method** | Keyword grep / ripgrep | Semantic vector search (ONNX embeddings) |
37
- | **Understands intent** | No literal string matching only | Yes "payment capture" finds `CaptureOperation.php` |
38
- | **Magento pattern awareness** | None treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
39
- | **Query speed (18K files)** | 200-1000ms per grep pass; multiple rounds needed | 15-45ms single pass |
40
- | **Context window cost** | Reads many wrong files burns tokens | Returns ranked results AI reads only what matters |
41
- | **Works offline** | Yes | Yes local ONNX model, no API calls |
37
+ | **Understands intent** | No -- literal string matching only | Yes -- "payment capture" finds `CaptureOperation.php` |
38
+ | **Magento pattern awareness** | None -- treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
39
+ | **Query speed (36K vectors)** | 200-1000ms per grep pass; multiple rounds needed | 10-45ms single pass |
40
+ | **Context window cost** | Reads many wrong files, burns tokens | Returns structured JSON with ranked results, methods, and snippets |
41
+ | **Works offline** | Yes | Yes -- local ONNX model, no API calls |
42
42
  | **Setup** | Built-in | `npx magector init` (one command) |
43
43
 
44
44
  ### What this means in practice
45
45
 
46
- Without Magector, asking Claude Code or Cursor *"how are checkout totals calculated?"* triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls `magento_search("checkout totals calculation")` and gets the exact files ranked by relevance in one step saving tokens and time.
46
+ Without Magector, asking Claude Code or Cursor *"how are checkout totals calculated?"* triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls `magento_search("checkout totals calculation")` and gets the exact files ranked by relevance in one step -- saving tokens and time.
47
47
 
48
- **Magector doesn't replace your AI tool it gives it a better search engine.**
48
+ **Magector doesn't replace your AI tool -- it gives it a better search engine.**
49
49
 
50
50
  ---
51
51
 
52
52
  ## Features
53
53
 
54
54
  - **Semantic search** -- find code by meaning, not exact keywords
55
- - **96.1% accuracy** -- validated with 557 test cases across 50+ categories
56
- - **ONNX embeddings** -- native 384-dim transformer embeddings via ONNX Runtime for higher quality search
57
- - **Parallel processing** -- batch embedding with parallel intelligence for faster indexing
55
+ - **94.9% accuracy** -- validated with 101 E2E test queries across 16 tool categories, plus 557 Rust-level test cases
56
+ - **Hybrid search** -- combines semantic vector similarity with keyword re-ranking for best-of-both-worlds results
57
+ - **Structured JSON output** -- results include file path, class name, methods list, role badges, and content snippets for minimal round-trips
58
+ - **Persistent serve mode** -- keeps ONNX model and HNSW index resident in memory, eliminating cold-start latency
59
+ - **ONNX embeddings** -- native 384-dim transformer embeddings via ONNX Runtime
60
+ - **36K+ vectors** -- indexes the complete Magento 2 codebase including framework internals
58
61
  - **Magento-aware** -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns
59
62
  - **AST-powered** -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance
63
+ - **Cross-tool discovery** -- tool descriptions include keywords and "See also" references so AI clients find the right tool on the first try
60
64
  - **Diff analysis** -- risk scoring and change classification for git commits and staged changes
61
65
  - **Complexity analysis** -- cyclomatic complexity, function count, and hotspot detection across modules
62
- - **Fast** -- 15-45ms queries, batched ONNX embedding with adaptive thread scaling
66
+ - **Fast** -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
63
67
  - **MCP server** -- 19 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
64
68
  - **Clean architecture** -- Rust core handles all indexing/search, Node.js MCP server delegates to it
65
69
 
@@ -67,52 +71,48 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
67
71
 
68
72
  ## Architecture
69
73
 
74
+ ```mermaid
75
+ block-beta
76
+ columns 2
77
+ block:rust["Rust Core"]:1
78
+ A["Tree-sitter AST Parser\nPHP + JS"]
79
+ B["Magento Pattern Detection\n20+ patterns"]
80
+ C["ONNX Embedder\nMiniLM-L6-v2 · 384 dim"]
81
+ D["HNSW Vector DB\n+ Hybrid Reranking"]
82
+ end
83
+ block:node["Node.js Layer"]:1
84
+ E["MCP Server\n19 tools · JSON output"]
85
+ F["Persistent Serve Process\nstdin/stdout JSON"]
86
+ G["CLI Interface\ninit · index · search · serve"]
87
+ end
88
+
89
+ style rust fill:#f4a460,color:#000
90
+ style node fill:#68b684,color:#000
70
91
  ```
71
- ┌──────────────────────────────────────────┐
72
- │                 Magector                 │
73
- ├──────────────────┬───────────────────────┤
74
- │    Rust Core     │     Node.js Layer     │
75
- │                  │                       │
76
- │  ┌────────────┐  │  ┌─────────────────┐  │
77
- │  │ Tree-sitter│  │  │   MCP Server    │  │
78
- │  │ AST Parser │  │  │   (19 tools)    │  │
79
- │  │ PHP + JS   │  │  └────────┬────────┘  │
80
- │  └─────┬──────┘  │           │           │
81
- │        │         │  ┌────────┴────────┐  │
82
- │  ┌─────┴──────┐  │  │  CLI Interface  │  │
83
- │  │  Magento   │  │  │  index/search/  │  │
84
- │  │  Pattern   │  │  │    validate     │  │
85
- │  │ Detection  │  │  └─────────────────┘  │
86
- │  └─────┬──────┘  │                       │
87
- │        │         │                       │
88
- │  ┌─────┴──────┐  │                       │
89
- │  │   ONNX     │  │                       │
90
- │  │  Embedder  │  │                       │
91
- │  │ MiniLM-L6  │  │                       │
92
- │  └─────┬──────┘  │                       │
93
- │        │         │                       │
94
- │  ┌─────┴──────┐  │                       │
95
- │  │   HNSW     │  │                       │
96
- │  │ Vector DB  │  │                       │
97
- │  └────────────┘  │                       │
98
- └──────────────────┴───────────────────────┘
99
- ```
100
-
101
- ### Embedding Pipeline
102
-
103
- ```
104
- Source File ──▶ Tree-sitter AST ──▶ Magento Pattern Detection ──▶ Search Text Enrichment
105
- │ │
106
- │ ▼
107
- │ ONNX Runtime
108
- │ (MiniLM-L6-v2)
109
- │ │
110
- │ ▼
111
- │ 384-dim embedding
112
- │ │
113
- ▼ ▼
114
- Metadata ─────────────────────────────────────────────────────▶ HNSW Index
115
- (path, class, namespace, type, methods, patterns) (17,891 vectors)
92
+
93
+ ### Indexing Pipeline
94
+
95
+ ```mermaid
96
+ flowchart LR
97
+ A[Source File\n.php .xml .js .phtml .graphqls] --> B[Tree-sitter\nAST Parser]
98
+ B --> C[Magento Pattern\nDetection]
99
+ C --> D[Search Text\nEnrichment]
100
+ D --> E[ONNX Runtime\nMiniLM-L6-v2]
101
+ E --> F[384-dim\nEmbedding]
102
+ A --> G[Metadata\npath · class · namespace\nmethods · patterns]
103
+ F --> H[(HNSW Index\n35,795 vectors)]
104
+ G --> H
105
+ ```
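The first node of the indexing pipeline lists the file extensions that get indexed. As a rough sketch of that gate, the snippet below maps an extension to a coarse file type. The function name `classifyFile` and the type labels are assumptions made for illustration; only the extension list comes from the README.

```javascript
// Extensions taken from the indexing diagram; the labels are hypothetical.
const INDEXED_EXTENSIONS = new Map([
  [".php", "php"],
  [".phtml", "template"],
  [".xml", "xml"],
  [".js", "javascript"],
  [".graphqls", "graphql"],
]);

// Returns a coarse file type, or null when the file would be skipped.
function classifyFile(filePath) {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return null;
  const ext = filePath.slice(dot).toLowerCase();
  return INDEXED_EXTENSIONS.get(ext) ?? null;
}

console.log(classifyFile("app/code/Vendor/Module/etc/schema.graphqls"));
console.log(classifyFile("README.md"));
```

A lookup table keeps the set of indexed types in one place, so adding a new extension would not touch the traversal logic.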
106
+
107
+ ### Search Pipeline
108
+
109
+ ```mermaid
110
+ flowchart LR
111
+ Q[Query Text] --> E1[Pattern Synonym\nEnrichment]
112
+ E1 --> E2[ONNX Embedding\n384-dim vector]
113
+ E2 --> H[HNSW\nNearest Neighbor]
114
+ H --> R[Hybrid\nReranking]
115
+ R --> J[Structured JSON\npath · class · methods\nbadges · snippet]
116
116
  ```
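The "Pattern Synonym Enrichment" stage in the diagram above can be pictured as a simple text expansion before embedding. The sketch below is an illustration only: the `enrichQuery` helper is hypothetical, and the synonym table contains just the one "controller" example that the README gives, not the real table used by the Rust core.

```javascript
// Hypothetical synonym table; only the "controller" entry is documented.
const PATTERN_SYNONYMS = new Map([
  ["controller", "action execute http request dispatch"],
]);

// Appends the synonyms of every known pattern word found in the query.
function enrichQuery(query) {
  const extras = query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => PATTERN_SYNONYMS.has(word))
    .map((word) => PATTERN_SYNONYMS.get(word));
  return [query, ...extras].join(" ");
}

console.log(enrichQuery("checkout controller"));
console.log(enrichQuery("product price"));
```

Queries without a known pattern word pass through unchanged, so enrichment only affects the embedding when it has something to add.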
117
117
 
118
118
  ### Components
@@ -120,12 +120,12 @@ Source File ──▶ Tree-sitter AST ──▶ Magento Pattern Detection ──
120
120
  | Component | Technology | Purpose |
121
121
  |-----------|-----------|---------|
122
122
  | Embeddings | `ort` (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
123
- | Vector search | `hnsw_rs` | Approximate nearest neighbor |
123
+ | Vector search | `hnsw_rs` + hybrid reranking | Approximate nearest neighbor + keyword boosting |
124
124
  | PHP parsing | `tree-sitter-php` | Class, method, namespace extraction |
125
125
  | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
126
126
  | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
127
- | CLI | `clap` | Command-line interface |
128
- | MCP server | `@modelcontextprotocol/sdk` | AI tool integration |
127
+ | CLI | `clap` | Command-line interface (index, search, serve, validate) |
128
+ | MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |
129
129
 
130
130
  ---
131
131
 
@@ -142,14 +142,17 @@ cd /path/to/your/magento2
142
142
  npx magector init
143
143
  ```
144
144
 
145
- This single command:
146
- - Verifies the Magento project
147
- - Downloads the ONNX model (~86MB, cached globally in `~/.magector/models/`)
148
- - Indexes the entire codebase
149
- - Detects your IDE (Cursor / Claude Code)
150
- - Writes MCP server configuration
151
- - Writes IDE rules (`.cursorrules` / `CLAUDE.md`)
152
- - Adds `magector.db` to `.gitignore`
145
+ This single command handles the entire setup:
146
+
147
+ ```mermaid
148
+ flowchart LR
149
+ A["npx magector init"] --> B[Verify Magento\nProject]
150
+ B --> C[Download ONNX Model\n~86MB · cached]
151
+ C --> D[Index Codebase\n36K+ vectors]
152
+ D --> E[Detect IDE\nCursor / Claude Code]
153
+ E --> F[Write Config\nMCP + IDE rules]
154
+ F --> G[Update .gitignore]
155
+ ```
153
156
 
154
157
  ### 2. Search
155
158
 
@@ -182,6 +185,7 @@ magector-core <COMMAND>
182
185
  Commands:
183
186
  index Index a Magento codebase
184
187
  search Search the index semantically
188
+ serve Start persistent server mode (stdin/stdout JSON protocol)
185
189
  validate Run validation suite (downloads Magento if needed)
186
190
  download Download Magento 2 Open Source
187
191
  stats Show index statistics
@@ -211,6 +215,31 @@ Options:
211
215
  -f, --format <FORMAT> Output format: text, json [default: text]
212
216
  ```
213
217
 
218
+ #### `serve`
219
+
220
+ ```bash
221
+ magector-core serve [OPTIONS]
222
+
223
+ Options:
224
+ -d, --database <PATH> Index database path [default: ./magector.db]
225
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
226
+ ```
227
+
228
+ Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.
229
+
230
+ **Protocol (one JSON object per line):**
231
+
232
+ ```json
233
+ // Request:
234
+ {"command":"search","query":"product price","limit":10}
235
+
236
+ // Response:
237
+ {"ok":true,"data":[{"id":123,"score":0.85,"metadata":{...}}]}
238
+
239
+ // Stats request:
240
+ {"command":"stats"}
241
+ ```
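To make the one-object-per-line protocol concrete, here is a minimal sketch of how a client might encode requests and decode responses. The helper names `encodeRequest` and `decodeResponse` are hypothetical, and the handling of an `ok: false` response is an assumption, since the README only shows the success shape.

```javascript
// Serializes a command as a single JSON line terminated by "\n".
function encodeRequest(command, params = {}) {
  return JSON.stringify({ command, ...params }) + "\n";
}

// Parses one response line and returns its data payload.
// Treating a falsy "ok" as an error is an assumption, not documented behavior.
function decodeResponse(line) {
  const message = JSON.parse(line);
  if (!message.ok) {
    throw new Error("serve process reported failure: " + line.trim());
  }
  return message.data;
}

const searchLine = encodeRequest("search", { query: "product price", limit: 10 });
const hits = decodeResponse('{"ok":true,"data":[{"id":123,"score":0.85,"metadata":{}}]}');

console.log(searchLine.trim());
console.log(hits[0].score);
```

In a real client the encoded line would be written to the serve process's stdin and each stdout line passed to the decoder.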
242
+
214
243
  ### Node.js CLI
215
244
 
216
245
  ```bash
@@ -236,45 +265,112 @@ npx magector help # Show help
236
265
 
237
266
  ## MCP Server Tools
238
267
 
239
- The MCP server exposes 19 tools for AI-assisted Magento development:
268
+ The MCP server exposes 19 tools for AI-assisted Magento development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
269
+
270
+ ### Output Format
271
+
272
+ All search tools return structured JSON:
273
+
274
+ ```json
275
+ {
276
+ "results": [
277
+ {
278
+ "rank": 1,
279
+ "score": 0.892,
280
+ "path": "vendor/magento/module-catalog/Model/ProductRepository.php",
281
+ "module": "Magento_Catalog",
282
+ "className": "ProductRepository",
283
+ "namespace": "Magento\\Catalog\\Model",
284
+ "methods": ["save", "getById", "getList", "delete", "deleteById"],
285
+ "magentoType": "repository",
286
+ "fileType": "php",
287
+ "badges": ["repository"],
288
+ "snippet": "class ProductRepository implements ProductRepositoryInterface..."
289
+ }
290
+ ],
291
+ "count": 1
292
+ }
293
+ ```
294
+
295
+ **Key fields:**
296
+ - `methods` -- list of method names in the class (avoids needing to read the file)
297
+ - `badges` -- role indicators: `plugin`, `controller`, `observer`, `repository`, `graphql-resolver`, `model`, `block`
298
+ - `snippet` -- first 300 characters of indexed content for quick assessment
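As an illustration of why the `methods` and `badges` fields reduce file reads, the sketch below filters a response in memory. The sample object mirrors the output format shown above; the helper `findByBadgeAndMethod` is hypothetical and not part of Magector.

```javascript
// Sample shaped like the documented output format (values are illustrative).
const response = {
  results: [
    {
      rank: 1,
      score: 0.892,
      path: "vendor/magento/module-catalog/Model/ProductRepository.php",
      className: "ProductRepository",
      methods: ["save", "getById", "getList", "delete", "deleteById"],
      badges: ["repository"],
      snippet: "class ProductRepository implements ProductRepositoryInterface...",
    },
  ],
  count: 1,
};

// Keeps results that carry the badge and expose the method, without opening any file.
function findByBadgeAndMethod(resp, badge, method) {
  return resp.results
    .filter((r) => r.badges.includes(badge) && r.methods.includes(method))
    .map((r) => ({ path: r.path, className: r.className, score: r.score }));
}

const matches = findByBadgeAndMethod(response, "repository", "getById");
console.log(matches);
```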
240
299
 
241
300
  ### Search Tools
242
301
 
243
302
  | Tool | Description |
244
303
  |------|-------------|
245
- | `magento_search` | Semantic code search with natural language queries |
246
- | `magento_find_class` | Find PHP class, interface, or trait by name |
304
+ | `magento_search` | Semantic search -- find any PHP class, method, XML config, template, or GraphQL schema by natural language |
305
+ | `magento_find_class` | Find PHP class, interface, abstract class, or trait by name |
247
306
  | `magento_find_method` | Find method implementations across the codebase |
248
307
 
249
308
  ### Magento-Specific Finders
250
309
 
251
310
  | Tool | Description |
252
311
  |------|-------------|
253
- | `magento_find_config` | Find XML configuration files (di.xml, events.xml, etc.) |
254
- | `magento_find_template` | Find PHTML template files |
255
- | `magento_find_plugin` | Find interceptor plugins and their targets |
256
- | `magento_find_observer` | Find event observers |
257
- | `magento_find_controller` | Find controllers by route or action |
258
- | `magento_find_block` | Find Block classes |
259
- | `magento_find_graphql` | Find GraphQL resolvers and schema |
260
- | `magento_find_api` | Find REST API endpoints and webapi.xml routes |
261
- | `magento_find_cron` | Find cron job definitions |
262
- | `magento_find_db_schema` | Find database table definitions |
312
+ | `magento_find_config` | Find XML configuration (di.xml, events.xml, routes.xml, system.xml, webapi.xml, module.xml, layout) |
313
+ | `magento_find_template` | Find PHTML template files for frontend or admin rendering |
314
+ | `magento_find_plugin` | Find interceptor plugins (before/after/around methods) and di.xml declarations |
315
+ | `magento_find_observer` | Find event observers and events.xml declarations |
316
+ | `magento_find_preference` | Find DI preference overrides -- which class implements an interface |
317
+ | `magento_find_controller` | Find MVC controllers by frontend or admin route path |
318
+ | `magento_find_block` | Find Block classes for view rendering |
319
+ | `magento_find_graphql` | Find GraphQL schema definitions, resolvers, types, queries, and mutations |
320
+ | `magento_find_api` | Find REST/SOAP API endpoints in webapi.xml |
321
+ | `magento_find_cron` | Find cron job definitions in crontab.xml |
322
+ | `magento_find_db_schema` | Find database table definitions in db_schema.xml (declarative schema) |
263
323
 
264
324
  ### Analysis Tools
265
325
 
266
326
  | Tool | Description |
267
327
  |------|-------------|
268
328
  | `magento_analyze_diff` | Analyze git diffs for risk scoring and change classification |
269
- | `magento_complexity` | Analyze code complexity (cyclomatic, function count, lines) |
329
+ | `magento_complexity` | Analyze cyclomatic complexity, function count, and line count |
270
330
 
271
331
  ### Utility Tools
272
332
 
273
333
  | Tool | Description |
274
334
  |------|-------------|
275
- | `magento_module_structure` | Show module directory structure |
335
+ | `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
276
336
  | `magento_index` | Trigger re-indexing of the codebase |
277
- | `magento_stats` | View index statistics (ONNX, parallel mode) |
337
+ | `magento_stats` | View index statistics |
338
+
339
+ ### Tool Cross-References
340
+
341
+ Each tool description includes "See also" hints to help AI clients chain tools effectively:
342
+
343
+ ```mermaid
344
+ graph LR
345
+ class["find_class"] --> plugin["find_plugin"]
346
+ class --> pref["find_preference"]
347
+ class --> method["find_method"]
348
+ config["find_config"] --> observer["find_observer"]
349
+ config --> pref
350
+ config --> api["find_api"]
351
+ plugin --> class
352
+ plugin --> method
353
+ template["find_template"] --> block["find_block"]
354
+ block --> template
355
+ block --> config
356
+ db["find_db_schema"] --> class
357
+ graphql["find_graphql"] --> class
358
+ graphql --> method
359
+ controller["find_controller"] --> config
360
+
361
+ style class fill:#4a90d9,color:#fff
362
+ style method fill:#4a90d9,color:#fff
363
+ style config fill:#e8a838,color:#000
364
+ style plugin fill:#d94a4a,color:#fff
365
+ style observer fill:#d94a4a,color:#fff
366
+ style pref fill:#e8a838,color:#000
367
+ style api fill:#e8a838,color:#000
368
+ style template fill:#68b684,color:#000
369
+ style block fill:#68b684,color:#000
370
+ style db fill:#9b59b6,color:#fff
371
+ style graphql fill:#9b59b6,color:#fff
372
+ style controller fill:#4a90d9,color:#fff
373
+ ```
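The edges in the graph above can also be read as a lookup table. The sketch below transcribes them into a map so that a client could suggest follow-up tools; the `seeAlso` helper and the map itself are illustrative assumptions, not the structure used in `mcp-server.js`.

```javascript
// Edges transcribed from the cross-reference graph (short names as in the diagram).
const SEE_ALSO = new Map([
  ["find_class", ["find_plugin", "find_preference", "find_method"]],
  ["find_config", ["find_observer", "find_preference", "find_api"]],
  ["find_plugin", ["find_class", "find_method"]],
  ["find_template", ["find_block"]],
  ["find_block", ["find_template", "find_config"]],
  ["find_db_schema", ["find_class"]],
  ["find_graphql", ["find_class", "find_method"]],
  ["find_controller", ["find_config"]],
]);

// Returns suggested follow-up tools, or an empty list when none are listed.
function seeAlso(tool) {
  return SEE_ALSO.get(tool) ?? [];
}

console.log(seeAlso("find_plugin"));
console.log(seeAlso("find_cron"));
```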
278
374
 
279
375
  ### Query Examples
280
376
 
@@ -282,11 +378,18 @@ The MCP server exposes 19 tools for AI-assisted Magento development:
282
378
  magento_search("how are checkout totals calculated")
283
379
  magento_search("product price with tier pricing and catalog rules")
284
380
  magento_find_class("ProductRepositoryInterface")
381
+ magento_find_method("getById")
285
382
  magento_find_config("di.xml plugin for ProductRepository")
286
- magento_find_plugin("save method")
383
+ magento_find_plugin({ targetClass: "Topmenu" })
287
384
  magento_find_observer("sales_order_place_after")
288
- magento_find_api("products REST endpoint")
289
- magento_find_graphql("cart mutation resolver")
385
+ magento_find_preference("StoreManagerInterface")
386
+ magento_find_api("/V1/orders")
387
+ magento_find_controller("catalog/product/view")
388
+ magento_find_graphql("placeOrder")
389
+ magento_find_db_schema("sales_order")
390
+ magento_find_cron("indexer")
391
+ magento_find_block("cart totals")
392
+ magento_find_template("minicart")
290
393
  magento_analyze_diff({ commitHash: "abc123" })
291
394
  magento_complexity({ module: "Magento_Catalog", threshold: 10 })
292
395
  ```
@@ -310,43 +413,74 @@ Pre-built binaries are provided for the following platforms:
310
413
 
311
414
  ## Validation
312
415
 
313
- Magector is validated against the complete Magento 2.4.7 codebase with **557 test cases** across **50+ categories**.
314
-
315
- ### Overall Results
316
-
317
- | Metric | Value |
318
- |--------|-------|
319
- | **Accuracy** | **96.1%** |
320
- | Tests passed | 535 / 557 |
321
- | Index size | 17,891 vectors |
322
- | Query time | 15-45ms |
323
- | Indexing time | ~3 minutes |
416
+ Magector is validated at two levels:
324
417
 
325
- ### Category Performance
418
+ 1. **E2E MCP accuracy tests** -- 101 queries across 16 tool categories via stdio JSON-RPC
419
+ 2. **Rust-level validation** -- 557 test cases across 50+ categories against Magento 2.4.7
326
420
 
327
- **100% accuracy (34 categories):**
328
- Controllers, Blocks, Observers, GraphQL, API, Shipping, Tax, Payment, EAV, Indexers, Cron, Email, Import, Export, Cache, Queue, Admin, CMS, Promotions, Debugging, Architecture, Order Management, Plugin Advanced, GraphQL Advanced, API Advanced, Admin Advanced, Email Advanced, Cron Advanced, Queue Advanced, Import Advanced, Payment Advanced, URL Rewrite, SEO, Marketing
421
+ ### E2E Accuracy (MCP Tools)
329
422
 
330
- **90-99% accuracy:**
331
- Catalog Product (96%), Customer Advanced (95%), Checkout Flow (95%), Shipping Advanced (93.3%), Category (93.3%), Frontend JS (90%), Search (90%)
332
-
333
- **Known limitations:**
334
- - XML configuration file search (di.xml, plugin configs) -- semantic search favors PHP files with richer content
335
- - Very generic single-word queries -- include more context for better results
423
+ ```mermaid
424
+ ---
425
+ config:
426
+ themeVariables:
427
+ pie1: "#4caf50"
428
+ pie2: "#ff9800"
429
+ pie3: "#2196f3"
430
+ pie4: "#9c27b0"
431
+ ---
432
+ pie title Accuracy Breakdown (94.9/100)
433
+ "Pass Rate (100%)" : 100
434
+ "Precision (93.2%)" : 93.2
435
+ "MRR (99.2%)" : 99.2
436
+ "NDCG@10 (85.5%)" : 85.5
437
+ ```
336
438
 
337
- ### Running Validation
439
+ | Metric | Value |
440
+ |--------|-------|
441
+ | **Grade** | **A (94.9/100)** |
442
+ | **Pass rate** | 101/101 (100%) |
443
+ | **Precision** | 93.2% |
444
+ | **MRR** | 99.2% |
445
+ | **NDCG@10** | 85.5% |
446
+ | **Index size** | 35,795 vectors |
447
+ | **Query time** | 10-45ms |
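For readers unfamiliar with the ranking metrics in this table, the sketch below computes them using their standard textbook definitions. This is an assumption about how the accuracy suite scores results; the README does not spell out the exact formulas, and here the ideal ordering for NDCG is derived from the retrieved list only.

```javascript
// Reciprocal rank: 1 / position of the first relevant result, 0 if none.
function reciprocalRank(relevance) {
  const idx = relevance.findIndex((rel) => rel > 0);
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// MRR: mean of the reciprocal ranks over all queries.
function meanReciprocalRank(queries) {
  const total = queries.reduce((sum, rel) => sum + reciprocalRank(rel), 0);
  return queries.length === 0 ? 0 : total / queries.length;
}

// Discounted cumulative gain over the top k results.
function dcgAtK(relevance, k) {
  return relevance
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG divided by the DCG of the best possible ordering of the same list.
function ndcgAtK(relevance, k = 10) {
  const ideal = [...relevance].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevance, k) / idealDcg;
}

// Binary relevance of the top results for two hypothetical queries.
const queryA = [1, 0, 1];
const queryB = [0, 1, 0];
console.log(meanReciprocalRank([queryA, queryB]));
console.log(ndcgAtK(queryA).toFixed(3));
```

MRR rewards putting the first relevant hit near the top, while NDCG also penalizes relevant hits that appear lower than they could have.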
448
+
449
+ #### Per-Tool Performance
450
+
451
+ | Tool | Pass | Precision | MRR | NDCG |
452
+ |------|------|-----------|-----|------|
453
+ | find_class | 100% | 100% | 100% | 100% |
454
+ | find_method | 100% | 89% | 100% | 87% |
455
+ | find_controller | 100% | 100% | 100% | -- |
456
+ | find_observer | 100% | 100% | 100% | 100% |
457
+ | find_plugin | 100% | 96% | 100% | 100% |
458
+ | find_preference | 100% | 100% | 100% | 100% |
459
+ | find_api | 100% | 100% | 100% | 100% |
460
+ | find_cron | 100% | 100% | 100% | 100% |
461
+ | find_db_schema | 100% | 100% | 100% | 100% |
462
+ | find_graphql | 100% | 100% | 100% | 100% |
463
+ | find_block | 100% | 100% | 100% | 100% |
464
+ | find_config | 100% | 89% | 89% | 93% |
465
+ | find_template | 100% | 84% | 100% | 100% |
466
+ | search | 100% | 99% | 100% | 100% |
467
+
468
+ ### Integration Tests
469
+
470
+ 62 integration tests covering MCP protocol compliance, tool schemas, tool calls, analysis tools, and stdout JSON integrity.
471
+
472
+ ### Running Tests
338
473
 
339
474
  ```bash
340
- # Full validation (downloads Magento, indexes, validates)
341
- cd rust-core
342
- cargo run --release -- validate
475
+ # E2E accuracy tests (101 queries, requires indexed codebase)
476
+ npm run test:accuracy
477
+ npm run test:accuracy:verbose
343
478
 
344
- # Skip indexing (use existing index)
345
- cargo run --release -- validate -m ./magento2 --skip-index
479
+ # Integration tests (62 tests)
480
+ npm test
346
481
 
347
- # Node.js validation suite
348
- npm run validate
349
- npm run validate:verbose
482
+ # Rust validation (557 test cases)
483
+ cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
350
484
  ```
351
485
 
352
486
  ---
@@ -357,7 +491,7 @@ npm run validate:verbose
357
491
  magector/
358
492
  ├── src/ # Node.js source
359
493
  │ ├── cli.js # CLI entry point (npx magector <command>)
360
- │ ├── mcp-server.js # MCP server (19 tools, delegates to Rust core)
494
+ │ ├── mcp-server.js # MCP server (19 tools, structured JSON output)
361
495
  │ ├── binary.js # Platform binary resolver
362
496
  │ ├── model.js # ONNX model resolver/downloader
363
497
  │ ├── init.js # Full init command (index + IDE config)
@@ -372,7 +506,10 @@ magector/
372
506
  │ ├── test-data-generator.js
373
507
  │ └── accuracy-calculator.js
374
508
  ├── tests/ # Automated tests
375
- │ └── mcp-server.test.js # MCP server tests (Rust core + analysis tools)
509
+ │ ├── mcp-server.test.js # Integration tests (62 tests)
510
+ │ ├── mcp-accuracy.test.js # E2E accuracy tests (101 queries)
511
+ │ └── results/ # Test result artifacts
512
+ │ └── accuracy-report.json
376
513
  ├── platforms/ # Platform-specific binary packages
377
514
  │ ├── darwin-arm64/ # macOS ARM (Apple Silicon)
378
515
  │ ├── linux-x64/ # Linux x64
@@ -381,11 +518,11 @@ magector/
381
518
  ├── rust-core/ # Rust high-performance core
382
519
  │ ├── Cargo.toml
383
520
  │ ├── src/
384
- │ │ ├── main.rs # Rust CLI (index, search, validate)
521
+ │ │ ├── main.rs # Rust CLI (index, search, serve, validate)
385
522
  │ │ ├── lib.rs # Library exports
386
523
  │ │ ├── indexer.rs # Core indexing with progress output
387
524
  │ │ ├── embedder.rs # ONNX embedding (MiniLM-L6-v2)
388
- │ │ ├── vectordb.rs # HNSW vector database
525
+ │ │ ├── vectordb.rs # HNSW vector database + hybrid search
389
526
  │ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
390
527
  │ │ ├── magento.rs # Magento pattern detection (Rust)
391
528
  │ │ └── validation.rs # 557 test cases, validation framework
@@ -424,32 +561,100 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
424
561
  1. Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")
425
562
  2. The enriched query is embedded into the same 384-dimensional vector space
426
563
  3. HNSW finds the nearest neighbors by cosine similarity
427
- 4. Results are ranked and returned with file path, class name, Magento type, and relevance score
564
+ 4. **Hybrid reranking** boosts results with keyword matches in path and search text
565
+ 5. Results are returned as structured JSON with file path, class name, methods, role badges, and content snippet
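Step 4 above can be sketched as a small post-processing pass over the nearest-neighbor candidates. The version below is a simplified illustration, not the Rust implementation: the function name `hybridRerank`, the fixed boost of 0.05 per matched term, and the sample scores are all assumptions.

```javascript
// Adds a small bonus for every query term found in the path or search text,
// then re-sorts by the boosted score. The boost value is an assumed constant.
function hybridRerank(query, candidates, boostPerTerm = 0.05) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return candidates
    .map((c) => {
      const haystack = (c.path + " " + c.searchText).toLowerCase();
      const matched = terms.filter((t) => haystack.includes(t)).length;
      return { ...c, score: c.score + boostPerTerm * matched };
    })
    .sort((a, b) => b.score - a.score);
}

// Illustrative candidates with made-up semantic scores.
const reranked = hybridRerank("totals collector", [
  { path: "Quote/Model/Quote/Address/Total/Subtotal.php", searchText: "subtotal address total", score: 0.56 },
  { path: "Quote/Model/Quote/TotalsCollector.php", searchText: "totals collector quote", score: 0.55 },
]);

console.log(reranked.map((r) => r.path));
```

Here the second candidate starts slightly behind on semantic score but moves to the top because both query terms appear in its path.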
428
566
 
429
- ### 3. MCP Integration
567
+ ### 3. Persistent Serve Mode
430
568
 
431
- The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.
569
+ The MCP server spawns a persistent Rust process (`magector-core serve`) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses are returned via stdout -- eliminating the ~2.6s cold-start overhead of loading the model per query. If the serve process is unavailable, the server falls back to a single-shot `execFileSync` call.
432
570
 
571
+ ```mermaid
572
+ flowchart TB
573
+ subgraph startup ["Startup (once)"]
574
+ S1[Load ONNX Model\n~500ms] --> S2[Load HNSW Index\n~1s]
575
+ S2 --> S3["Send ready signal\n{ok:true, ready:true}"]
576
+ end
577
+
578
+ subgraph query ["Per Query (~10-45ms)"]
579
+ Q1["stdin: JSON query"] --> Q2[Embed query\n~2ms]
580
+ Q2 --> Q3[HNSW search\n~5-15ms]
581
+ Q3 --> Q4[Hybrid rerank\n~1ms]
582
+ Q4 --> Q5["stdout: JSON response"]
583
+ end
584
+
585
+ startup --> query
586
+
587
+ subgraph fallback ["Fallback (if serve unavailable)"]
588
+ F1["execFileSync\n~2.6s cold start"]
589
+ end
590
+
591
+ style startup fill:#e8f4e8,color:#000
592
+ style query fill:#e8e8f4,color:#000
593
+ style fallback fill:#f4e8e8,color:#000
433
594
  ```
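The fallback branch in the diagram above amounts to a try-fast-path-then-slow-path pattern. The sketch below shows that shape with stub transports; `searchWithFallback` and both stubs are hypothetical names, and the real server's error handling may differ.

```javascript
// Tries the resident serve process first and falls back to a one-off call.
// Both search functions are supplied by the caller.
async function searchWithFallback(query, persistentSearch, singleShotSearch) {
  try {
    // Fast path: model and index are already loaded in the serve process.
    return { via: "serve", data: await persistentSearch(query) };
  } catch (err) {
    // Slow path: a single-shot invocation that pays the cold-start cost.
    return { via: "single-shot", data: await singleShotSearch(query) };
  }
}

// Stubs standing in for the real transports.
const brokenServe = async () => {
  throw new Error("serve process unavailable");
};
const singleShot = async (q) => [{ path: "stub/result.php", query: q }];

const fallbackResult = searchWithFallback("checkout totals", brokenServe, singleShot);
fallbackResult.then((r) => console.log(r.via));
```

Reporting which path answered (`via`) makes it easy to notice when queries are silently paying the cold-start penalty.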
434
- Developer: "How does checkout totals calculation work?"
435
-
436
-
437
- AI Assistant ──▶ magento_search("checkout totals collector calculate")
438
-
439
-
440
- MCP Server ──▶ magector-core search (Rust) ──▶ HNSW lookup ──▶ Ranked results
441
-
442
-
443
- Results:
444
- 1. Quote/Model/Quote/TotalsCollector.php (0.554)
445
- 2. Quote/Model/Quote/Address/Total/Collector.php (0.524)
446
- 3. Quote/Model/Quote/Address/Total/Subtotal.php (0.517)
595
+
596
+ ### 4. MCP Integration
597
+
598
+ The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.
599
+
600
+ ```mermaid
601
+ sequenceDiagram
602
+ participant Dev as Developer
603
+ participant AI as AI Assistant
604
+ participant MCP as MCP Server<br/>(Node.js)
605
+ participant Rust as Persistent Rust<br/>Process
606
+ participant HNSW as HNSW Index<br/>(35K vectors)
607
+
608
+ Dev->>AI: "How does checkout totals calculation work?"
609
+ AI->>MCP: magento_search("checkout totals collector")
610
+ MCP->>Rust: {"command":"search","query":"...","limit":10}
611
+ Rust->>HNSW: Embed query → nearest neighbor
612
+ HNSW-->>Rust: Top candidates + scores
613
+ Rust-->>MCP: {"ok":true,"data":[...]}
614
+ MCP-->>AI: Structured JSON with paths,<br/>methods, badges, snippets
615
+ AI-->>Dev: TotalsCollector.php,<br/>Address/Total/Collector.php, ...
447
616
  ```
448
617
 
449
618
  ---
450
619
 
451
620
  ## Magento Patterns Detected
452
621
 
622
+ ```mermaid
623
+ mindmap
624
+ root((Magento 2\nPatterns))
625
+ PHP Classes
626
+ Controller
627
+ Model
628
+ Repository
629
+ Block
630
+ Helper
631
+ ViewModel
632
+ Console Command
633
+ Data Provider
634
+ Interception
635
+ Plugin
636
+ Observer
637
+ Preference
638
+ XML Config
639
+ di.xml
640
+ events.xml
641
+ webapi.xml
642
+ routes.xml
643
+ system.xml
644
+ layout XML
645
+ module.xml
646
+ crontab.xml
647
+ db_schema.xml
648
+ Frontend
649
+ PHTML Template
650
+ JavaScript
651
+ GraphQL Schema
652
+ GraphQL Resolver
653
+ Database
654
+ Setup Patch
655
+ Declarative Schema
656
+ ```
657
+
453
658
  Magector understands these Magento 2 architectural patterns:
454
659
 
455
660
  | Pattern | Detection Method | Example |
@@ -490,7 +695,7 @@ Copy `.cursorrules` to your Magento project root for optimized AI-assisted devel
490
695
 
491
696
  ### Model Configuration
492
697
 
493
- The ONNX model (`all-MiniLM-L6-v2`) is automatically downloaded on first run to `rust-core/models/`. To use a different location:
698
+ The ONNX model (`all-MiniLM-L6-v2`) is automatically downloaded on first run to `~/.magector/models/`. To use a different location:
494
699
 
495
700
  ```bash
496
701
  magector-core index -m /path/to/magento -c /custom/model/path
@@ -535,16 +740,20 @@ cargo run --release -- validate
535
740
  ### Testing
536
741
 
537
742
  ```bash
538
- # Run MCP server auto tests (129 tests, requires indexed codebase)
743
+ # Integration tests (62 tests, requires indexed codebase)
539
744
  npm test
540
745
 
746
+ # E2E accuracy tests (101 queries)
747
+ npm run test:accuracy
748
+ npm run test:accuracy:verbose
749
+
541
750
  # Run without index (unit + schema tests only)
542
751
  npm run test:no-index
543
752
 
544
- # Run Rust unit tests
753
+ # Rust unit tests
545
754
  cd rust-core && cargo test
546
755
 
547
- # Run Rust validation (557 test cases)
756
+ # Rust validation (557 test cases)
548
757
  cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
549
758
  ```
550
759
 
@@ -553,18 +762,23 @@ cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
  1. Add pattern detection in `rust-core/src/magento.rs`
  2. Add search text enrichment in `rust-core/src/indexer.rs`
  3. Add validation test cases in `rust-core/src/validation.rs`
- 4. Rebuild and run validation to verify:
+ 4. Add E2E accuracy test cases in `tests/mcp-accuracy.test.js`
+ 5. Rebuild and run validation to verify:
 
  ```bash
  cargo build --release
  ./target/release/magector-core validate -m ./magento2 --skip-index
+ npm run test:accuracy
  ```
 
  ### Adding MCP Tools
 
  1. Define the tool schema in `src/mcp-server.js` (ListToolsRequestSchema handler)
- 2. Implement the handler in the CallToolRequestSchema handler
- 3. Test with Claude Code or the MCP inspector
+ 2. Include keyword-rich descriptions and cross-tool "See also" references
+ 3. Implement the handler in the CallToolRequestSchema handler
+ 4. Return structured JSON via `formatSearchResults()`
+ 5. Add E2E test cases in `tests/mcp-accuracy.test.js`
+ 6. Test with Claude Code or the MCP inspector
 
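As a rough illustration of steps 1 and 2 in the updated list, a tool entry can be written as a plain object with `name`, `description`, and `inputSchema`. The tool name `magento_find_example`, the `magento_search` cross-reference, and the `checkToolDefinition` helper are all hypothetical; they are not taken from `src/mcp-server.js`.

```javascript
// Hypothetical tool definition; the name and wording are illustrative only.
const exampleTool = {
  name: 'magento_find_example',
  description:
    'Find example classes by name or keyword (hypothetical tool). ' +
    'See also: magento_search for free-text queries.',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'Class name or keyword' },
      limit: { type: 'number', description: 'Maximum results', default: 10 },
    },
    required: ['query'],
  },
};

// Hypothetical sanity check that a test could run on every tool definition.
// Returns a list of human-readable problems (empty when the definition is fine).
function checkToolDefinition(tool) {
  const problems = [];
  if (!/^[a-z0-9_]+$/.test(tool.name)) {
    problems.push('name must be snake_case');
  }
  if (!tool.description.includes('See also')) {
    problems.push('missing "See also" reference');
  }
  if (tool.inputSchema.type !== 'object') {
    problems.push('inputSchema.type must be "object"');
  }
  for (const key of tool.inputSchema.required ?? []) {
    if (!(key in tool.inputSchema.properties)) {
      problems.push(`required "${key}" is not declared in properties`);
    }
  }
  return problems;
}
```

A check like this could sit next to the E2E cases so that a new tool without a cross-reference fails early, before it is tried in an AI client.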
  ---
 
@@ -582,8 +796,10 @@ cargo build --release
 
  - **Algorithm:** HNSW (Hierarchical Navigable Small World)
  - **Library:** `hnsw_rs`
+ - **Parameters:** M=32, max_layers=16, ef_construction=200
  - **Distance metric:** Cosine similarity
- - **Persistence:** JSON serialization (HNSW + metadata)
+ - **Hybrid search:** Semantic nearest-neighbor + keyword reranking in path and search text
+ - **Persistence:** Bincode binary serialization
 
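The hybrid search bullet added above can be pictured with a small sketch. It assumes each candidate already carries a cosine score from the nearest-neighbor lookup; the `keywordBoost` weight, the field names (`path`, `searchText`, `semanticScore`), and the additive scoring formula are assumptions for illustration. The actual reranker lives in the Rust core and may work differently.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rerank semantic candidates by adding a small bonus for each query term
// found in the file path or the enriched search text.
// keywordBoost is a hypothetical weight, not a value from the project.
function hybridRerank(queryTerms, candidates, keywordBoost = 0.05) {
  return candidates
    .map((c) => {
      const haystack = `${c.path} ${c.searchText}`.toLowerCase();
      const hits = queryTerms.filter((t) =>
        haystack.includes(t.toLowerCase())
      ).length;
      return { ...c, score: c.semanticScore + keywordBoost * hits };
    })
    .sort((x, y) => y.score - x.score);
}
```

With this shape, a file whose path contains the query terms can overtake a candidate that is slightly closer in embedding space but shares no keywords with the query.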
  ### Index Structure
 
@@ -596,13 +812,15 @@ struct IndexMetadata {
      magento_type: String, // controller, model, block, plugin, ...
      class_name: Option<String>,
      namespace: Option<String>,
-     methods: Vec<String>,
-     search_text: String, // Enriched searchable text
+     methods: Vec<String>, // extracted method names
+     search_text: String, // enriched searchable text
      is_controller: bool,
      is_plugin: bool,
      is_observer: bool,
      is_model: bool,
      is_block: bool,
+     is_repository: bool,
+     is_resolver: bool,
      // ... 20+ pattern flags
  }
  ```
@@ -611,8 +829,9 @@ struct IndexMetadata {
 
  | Operation | Time | Notes |
  |-----------|------|-------|
- | Full index (18K files) | ~1 min | Parallel parsing + batched ONNX embedding |
- | Single query | 15-45ms | HNSW approximate nearest neighbor |
+ | Full index (36K vectors) | ~1 min | Parallel parsing + batched ONNX embedding |
+ | Single query (warm) | 10-45ms | Persistent serve process, HNSW + rerank |
+ | Single query (cold) | ~2.6s | Includes ONNX model + index load |
  | Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
  | Batch embedding (32) | ~30ms | Batched ONNX inference |
  | Model load | ~500ms | One-time at startup |
@@ -620,19 +839,48 @@ struct IndexMetadata {
 
  ### Performance Optimizations
 
+ - **Persistent serve mode** -- Rust process keeps ONNX model + HNSW index in memory via stdin/stdout JSON protocol
+ - **Query cache** -- LRU cache (200 entries) avoids re-embedding identical queries
+ - **Hybrid reranking** -- combines semantic similarity with keyword matching for better precision
  - **Batched ONNX embedding** -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding
- - **Dynamic thread scaling** -- ONNX intra-op threads scale to CPU core count (vs. hardcoded 4)
+ - **Dynamic thread scaling** -- ONNX intra-op threads scale to CPU core count
  - **Thread-local AST parsers** -- each rayon thread gets its own tree-sitter parser (no mutex contention)
  - **Bincode persistence** -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)
- - **Adaptive HNSW capacity** -- pre-sized to actual vector count (no wasted memory)
+ - **Adaptive HNSW capacity** -- pre-sized to actual vector count
  - **Parallel HNSW insert** -- batch insert uses hnsw_rs parallel insertion on load and index
- - **Optimized file discovery** -- no symlink following, uses cached DirEntry metadata
+ - **Tuned ef_search** -- optimized search parameters for 36K vector index (ef_search=50 for search, 64 for hybrid)
 
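The query cache bullet added above describes a 200-entry LRU cache that skips re-embedding identical queries. A minimal sketch of that idea follows; the `LruCache` class, the `cachedEmbedder` wrapper, and the `embed` callback are hypothetical names, and the README does not say whether the real cache sits in the Node or the Rust process.

```javascript
// Minimal LRU cache built on Map, which iterates keys in insertion order.
class LruCache {
  constructor(capacity = 200) {
    this.capacity = capacity;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

// Wrap an embedding function so identical query strings are embedded once.
// `embed` is a hypothetical synchronous function: string -> vector.
function cachedEmbedder(embed, capacity = 200) {
  const cache = new LruCache(capacity);
  return (query) => {
    const hit = cache.get(query);
    if (hit !== undefined) return hit;
    const vector = embed(query);
    cache.set(query, vector);
    return vector;
  };
}
```

Keying on the raw query string keeps the sketch simple; normalizing case or whitespace before lookup would raise the hit rate at the cost of treating slightly different queries as identical.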
  ---
 
  ## Roadmap
 
+ ```mermaid
+ gantt
+ title Magector Development Roadmap
+ dateFormat YYYY-MM
+ axisFormat %b %Y
+ section Completed
+ Hybrid search (semantic + keyword) :done, 2025-01, 2025-02
+ Persistent serve mode :done, 2025-02, 2025-03
+ Structured JSON output :done, 2025-03, 2025-03
+ Cross-tool discovery hints :done, 2025-03, 2025-03
+ E2E accuracy test suite (101 queries) :done, 2025-03, 2025-03
+ section Planned
+ Method-level chunking :active, 2025-04, 2025-05
+ Query intent classification :2025-05, 2025-06
+ Vector-level file type filtering :2025-06, 2025-07
+ Incremental indexing :2025-07, 2025-08
+ VSCode extension :2025-08, 2025-10
+ Web UI for browsing results :2025-10, 2025-12
+ Magento Commerce support :2026-01, 2026-03
+ ```
+
  - [x] Hybrid search (semantic + keyword re-ranking)
+ - [x] Persistent serve mode (eliminates cold-start latency)
+ - [x] Structured JSON output (methods, badges, snippets)
+ - [x] Cross-tool discovery hints for AI clients
+ - [x] E2E accuracy test suite (101 queries)
+ - [ ] Method-level chunking (per-method vectors for direct method search)
  - [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
  - [ ] Filtered search by file type at the vector level
  - [ ] Incremental indexing (only re-index changed files)
@@ -655,7 +903,7 @@ Contributions are welcome. Please:
  1. Fork the repository
  2. Create a feature branch (`git checkout -b feature/improvement`)
  3. Add tests for new functionality
- 4. Run validation to ensure accuracy doesn't regress
+ 4. Run validation to ensure accuracy doesn't regress: `npm run test:accuracy`
  5. Submit a pull request
 
  ---