magector 1.2.12 → 1.2.14

Files changed (2)
  1. package/README.md +389 -154
  2. package/package.json +5 -5
package/README.md CHANGED
@@ -7,7 +7,7 @@ Magector indexes an entire Magento 2 codebase and lets you search it with natura
7
7
  [![Rust](https://img.shields.io/badge/rust-1.75+-orange.svg)](https://www.rust-lang.org)
8
8
  [![Node.js](https://img.shields.io/badge/node-18+-green.svg)](https://nodejs.org)
9
9
  [![Magento](https://img.shields.io/badge/magento-2.4.x-blue.svg)](https://magento.com)
10
- [![Accuracy](https://img.shields.io/badge/accuracy-96.1%25-brightgreen.svg)](#validation)
10
+ [![Accuracy](https://img.shields.io/badge/accuracy-94.9%25-brightgreen.svg)](#validation)
11
11
  [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
12
12
 
13
13
  ---
@@ -21,7 +21,7 @@ Magento 2 has **18,000+ source files** across hundreds of modules. Finding the r
21
21
  | `grep` / `ripgrep` | No | No | 100-500ms |
22
22
  | IDE search | No | No | 200-1000ms |
23
23
  | GitHub search | Partial | No | 500-2000ms |
24
- | **Magector** | **Yes** | **Yes** | **15-45ms** |
24
+ | **Magector** | **Yes** | **Yes** | **10-45ms** |
25
25
 
26
26
  Magector understands that a query about *"payment capture"* should return `Sales/Model/Order/Payment/Operations/CaptureOperation.php`, not just files containing the word "capture".
27
27
 
@@ -29,37 +29,41 @@ Magector understands that a query about *"payment capture"* should return `Sales
29
29
 
30
30
  ## Magector vs Built-in AI Search
31
31
 
32
- Claude Code and Cursor both have built-in code search but they rely on keyword matching (`grep`/`ripgrep`) and file-tree heuristics. On a Magento 2 codebase with 18,000+ files, that approach breaks down fast.
32
+ Claude Code and Cursor both have built-in code search -- but they rely on keyword matching (`grep`/`ripgrep`) and file-tree heuristics. On a Magento 2 codebase with 18,000+ files, that approach breaks down fast.
33
33
 
34
34
  | Capability | Claude Code / Cursor (built-in) | Magector |
35
35
  |---|---|---|
36
36
  | **Search method** | Keyword grep / ripgrep | Semantic vector search (ONNX embeddings) |
37
- | **Understands intent** | No literal string matching only | Yes "payment capture" finds `CaptureOperation.php` |
38
- | **Magento pattern awareness** | None treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
39
- | **Query speed (18K files)** | 200-1000ms per grep pass; multiple rounds needed | 15-45ms single pass |
40
- | **Context window cost** | Reads many wrong files burns tokens | Returns ranked results AI reads only what matters |
41
- | **Works offline** | Yes | Yes local ONNX model, no API calls |
37
+ | **Understands intent** | No -- literal string matching only | Yes -- "payment capture" finds `CaptureOperation.php` |
38
+ | **Magento pattern awareness** | None -- treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
39
+ | **Query speed (36K vectors)** | 200-1000ms per grep pass; multiple rounds needed | 10-45ms single pass |
40
+ | **Context window cost** | Reads many wrong files, burns tokens | Returns structured JSON with ranked results, methods, and snippets |
41
+ | **Works offline** | Yes | Yes -- local ONNX model, no API calls |
42
42
  | **Setup** | Built-in | `npx magector init` (one command) |
43
43
 
44
44
  ### What this means in practice
45
45
 
46
- Without Magector, asking Claude Code or Cursor *"how are checkout totals calculated?"* triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls `magento_search("checkout totals calculation")` and gets the exact files ranked by relevance in one step saving tokens and time.
46
+ Without Magector, asking Claude Code or Cursor *"how are checkout totals calculated?"* triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls `magento_search("checkout totals calculation")` and gets the exact files ranked by relevance in one step -- saving tokens and time.
47
47
 
48
- **Magector doesn't replace your AI tool it gives it a better search engine.**
48
+ **Magector doesn't replace your AI tool -- it gives it a better search engine.**
49
49
 
50
50
  ---
51
51
 
52
52
  ## Features
53
53
 
54
54
  - **Semantic search** -- find code by meaning, not exact keywords
55
- - **96.1% accuracy** -- validated with 557 test cases across 50+ categories
56
- - **ONNX embeddings** -- native 384-dim transformer embeddings via ONNX Runtime for higher quality search
57
- - **Parallel processing** -- batch embedding with parallel intelligence for faster indexing
55
+ - **94.9% accuracy** -- validated with 101 E2E test queries across 16 tool categories, plus 557 Rust-level test cases
56
+ - **Hybrid search** -- combines semantic vector similarity with keyword re-ranking for best-of-both-worlds results
57
+ - **Structured JSON output** -- results include file path, class name, methods list, role badges, and content snippets for minimal round-trips
58
+ - **Persistent serve mode** -- keeps ONNX model and HNSW index resident in memory, eliminating cold-start latency
59
+ - **ONNX embeddings** -- native 384-dim transformer embeddings via ONNX Runtime
60
+ - **36K+ vectors** -- indexes the complete Magento 2 codebase including framework internals
58
61
  - **Magento-aware** -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns
59
62
  - **AST-powered** -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance
63
+ - **Cross-tool discovery** -- tool descriptions include keywords and "See also" references so AI clients find the right tool on the first try
60
64
  - **Diff analysis** -- risk scoring and change classification for git commits and staged changes
61
65
  - **Complexity analysis** -- cyclomatic complexity, function count, and hotspot detection across modules
62
- - **Fast** -- 15-45ms queries, batched ONNX embedding with adaptive thread scaling
66
+ - **Fast** -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
63
67
  - **MCP server** -- 19 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
64
68
  - **Clean architecture** -- Rust core handles all indexing/search, Node.js MCP server delegates to it
65
69
 
@@ -67,52 +71,50 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
67
71
 
68
72
  ## Architecture
69
73
 
74
+ ```mermaid
75
+ flowchart TD
76
+ subgraph rust ["Rust Core"]
77
+ A["AST Parser · PHP + JS"]
78
+ B["Pattern Detection · 20+"]
79
+ C["ONNX Embedder · 384d"]
80
+ D["HNSW + Reranking"]
81
+ A --> B --> C --> D
82
+ end
83
+ subgraph node ["Node.js Layer"]
84
+ E["MCP Server · 19 tools"]
85
+ F["Persistent Serve"]
86
+ G["CLI · init/index/search"]
87
+ E --> F
88
+ G --> F
89
+ end
90
+ node -->|stdin/stdout JSON| rust
91
+
92
+ style rust fill:#f4a460,color:#000
93
+ style node fill:#68b684,color:#000
70
94
  ```
71
- ┌──────────────────────────────────────────┐
72
- │ Magector │
73
- ├──────────────────┬───────────────────────┤
74
- │ Rust Core │ Node.js Layer │
75
- │ │ │
76
- ┌────────────┐ │ ┌─────────────────┐ │
77
- Tree-sitter│ │ │ MCP Server │ │
78
- AST Parser │ │ │ (19 tools) │ │
79
- PHP + JS │ │ └────────┬────────┘ │
80
- └─────┬──────┘ │ │ │
81
- │ │ │ ┌────────┴────────┐ │
82
- ┌─────┴──────┐ │ │ CLI Interface │ │
83
- │ │ Magento │ │ │ index/search/ │ │
84
- │ │ Pattern │ │ │ validate │ │
85
- │ │ Detection │ │ └─────────────────┘ │
86
- │ └─────┬──────┘ │ │
87
- │ │ │ │
88
- │ ┌─────┴──────┐ │ │
89
- ONNX │ │ │
90
- Embedder │ │ │
91
- MiniLM-L6 │ │ │
92
- └─────┬──────┘ │ │
93
- │ │ │ │
94
- │ ┌─────┴──────┐ │ │
95
- │ │ HNSW │ │ │
96
- │ │ Vector DB │ │ │
97
- │ └────────────┘ │ │
98
- └──────────────────┴───────────────────────┘
99
- ```
100
-
101
- ### Embedding Pipeline
102
-
103
- ```
104
- Source File ──▶ Tree-sitter AST ──▶ Magento Pattern Detection ──▶ Search Text Enrichment
105
- │ │
106
- │ ▼
107
- │ ONNX Runtime
108
- │ (MiniLM-L6-v2)
109
- │ │
110
- │ ▼
111
- │ 384-dim embedding
112
- │ │
113
- ▼ ▼
114
- Metadata ─────────────────────────────────────────────────────▶ HNSW Index
115
- (path, class, namespace, type, methods, patterns) (17,891 vectors)
95
+
96
+ ### Indexing Pipeline
97
+
98
+ ```mermaid
99
+ flowchart TD
100
+ A[Source File] --> B[AST Parser]
101
+ B --> C[Pattern Detection]
102
+ C --> D[Text Enrichment]
103
+ D --> E[ONNX Embedding]
104
+ E --> F[(HNSW Index)]
105
+ A --> G[Metadata]
106
+ G --> F
107
+ ```
108
+
109
+ ### Search Pipeline
110
+
111
+ ```mermaid
112
+ flowchart TD
113
+ Q[Query] --> E1[Synonym Enrichment]
114
+ E1 --> E2[ONNX Embedding]
115
+ E2 --> H[HNSW Search]
116
+ H --> R[Hybrid Reranking]
117
+ R --> J[Structured JSON]
116
118
  ```
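The first stage of the search pipeline above, synonym enrichment, can be sketched in a few lines. This is a minimal illustration, not the package's implementation: the table and the function name `enrichQuery` are hypothetical, and only the "controller" synonyms are taken from the README text in this diff.

```javascript
// Illustrative synonym table. Only the "controller" entry comes from the README;
// a real table would cover the other Magento patterns as well.
const PATTERN_SYNONYMS = new Map([
  ["controller", "action execute http request dispatch"],
]);

// Append the synonyms of every known pattern word found in the query.
function enrichQuery(query) {
  const extras = query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => PATTERN_SYNONYMS.has(word))
    .map((word) => PATTERN_SYNONYMS.get(word));
  return [query, ...extras].join(" ");
}
```

The enriched string, rather than the raw query, would then be passed to the embedding step, so that a short query lands closer to files whose indexed text uses the pattern's vocabulary.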
117
119
 
118
120
  ### Components
@@ -120,12 +122,12 @@ Source File ──▶ Tree-sitter AST ──▶ Magento Pattern Detection ──
120
122
  | Component | Technology | Purpose |
121
123
  |-----------|-----------|---------|
122
124
  | Embeddings | `ort` (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
123
- | Vector search | `hnsw_rs` | Approximate nearest neighbor |
125
+ | Vector search | `hnsw_rs` + hybrid reranking | Approximate nearest neighbor + keyword boosting |
124
126
  | PHP parsing | `tree-sitter-php` | Class, method, namespace extraction |
125
127
  | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
126
128
  | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
127
- | CLI | `clap` | Command-line interface |
128
- | MCP server | `@modelcontextprotocol/sdk` | AI tool integration |
129
+ | CLI | `clap` | Command-line interface (index, search, serve, validate) |
130
+ | MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |
129
131
 
130
132
  ---
131
133
 
@@ -142,14 +144,17 @@ cd /path/to/your/magento2
142
144
  npx magector init
143
145
  ```
144
146
 
145
- This single command:
146
- - Verifies the Magento project
147
- - Downloads the ONNX model (~86MB, cached globally in `~/.magector/models/`)
148
- - Indexes the entire codebase
149
- - Detects your IDE (Cursor / Claude Code)
150
- - Writes MCP server configuration
151
- - Writes IDE rules (`.cursorrules` / `CLAUDE.md`)
152
- - Adds `magector.db` to `.gitignore`
147
+ This single command handles the entire setup:
148
+
149
+ ```mermaid
150
+ flowchart TD
151
+ A["npx magector init"] --> B[Verify Project]
152
+ B --> C[Download Model]
153
+ C --> D[Index Codebase]
154
+ D --> E[Detect IDE]
155
+ E --> F[Write Config]
156
+ F --> G[Update .gitignore]
157
+ ```
153
158
 
154
159
  ### 2. Search
155
160
 
@@ -182,6 +187,7 @@ magector-core <COMMAND>
182
187
  Commands:
183
188
  index Index a Magento codebase
184
189
  search Search the index semantically
190
+ serve Start persistent server mode (stdin/stdout JSON protocol)
185
191
  validate Run validation suite (downloads Magento if needed)
186
192
  download Download Magento 2 Open Source
187
193
  stats Show index statistics
@@ -211,6 +217,31 @@ Options:
211
217
  -f, --format <FORMAT> Output format: text, json [default: text]
212
218
  ```
213
219
 
220
+ #### `serve`
221
+
222
+ ```bash
223
+ magector-core serve [OPTIONS]
224
+
225
+ Options:
226
+ -d, --database <PATH> Index database path [default: ./magector.db]
227
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
228
+ ```
229
+
230
+ Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.
231
+
232
+ **Protocol (one JSON object per line):**
233
+
234
+ ```json
235
+ // Request:
236
+ {"command":"search","query":"product price","limit":10}
237
+
238
+ // Response:
239
+ {"ok":true,"data":[{"id":123,"score":0.85,"metadata":{...}}]}
240
+
241
+ // Stats request:
242
+ {"command":"stats"}
243
+ ```
244
+
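The line protocol added in this hunk is simple enough to wrap in two small helpers. The sketch below assumes only the request and response shapes shown in the README; the helper names are hypothetical, and the error shape is not documented, so a failed response is reported as the raw line.

```javascript
// Hypothetical helpers for the `magector-core serve` line protocol.
// Only the JSON shapes come from the README; the function names are illustrative.

// Build one request line: a single JSON object terminated by a newline.
function encodeSearchRequest(query, limit = 10) {
  return JSON.stringify({ command: "search", query, limit }) + "\n";
}

// Parse one response line and return its `data` array; throw if `ok` is not true.
function decodeResponse(line) {
  const msg = JSON.parse(line);
  if (msg.ok !== true) {
    // The error shape is not documented, so the raw line is surfaced as-is.
    throw new Error("serve request failed: " + line.trim());
  }
  return msg.data;
}

const requestLine = encodeSearchRequest("product price");
const hits = decodeResponse('{"ok":true,"data":[{"id":123,"score":0.85,"metadata":{}}]}');
```

A client would write `requestLine` to the child process's stdin and pass each line read from its stdout to `decodeResponse`.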
214
245
  ### Node.js CLI
215
246
 
216
247
  ```bash
@@ -236,45 +267,112 @@ npx magector help # Show help
236
267
 
237
268
  ## MCP Server Tools
238
269
 
239
- The MCP server exposes 19 tools for AI-assisted Magento development:
270
+ The MCP server exposes 19 tools for AI-assisted Magento development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
271
+
272
+ ### Output Format
273
+
274
+ All search tools return structured JSON:
275
+
276
+ ```json
277
+ {
278
+ "results": [
279
+ {
280
+ "rank": 1,
281
+ "score": 0.892,
282
+ "path": "vendor/magento/module-catalog/Model/ProductRepository.php",
283
+ "module": "Magento_Catalog",
284
+ "className": "ProductRepository",
285
+ "namespace": "Magento\\Catalog\\Model",
286
+ "methods": ["save", "getById", "getList", "delete", "deleteById"],
287
+ "magentoType": "repository",
288
+ "fileType": "php",
289
+ "badges": ["repository"],
290
+ "snippet": "class ProductRepository implements ProductRepositoryInterface..."
291
+ }
292
+ ],
293
+ "count": 1
294
+ }
295
+ ```
296
+
297
+ **Key fields:**
298
+ - `methods` -- list of method names in the class (avoids needing to read the file)
299
+ - `badges` -- role indicators: `plugin`, `controller`, `observer`, `repository`, `graphql-resolver`, `model`, `block`
300
+ - `snippet` -- first 300 characters of indexed content for quick assessment
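To make the output format concrete, the following sketch builds a result entry from a raw hit. It is an assumption-laden illustration: the flag-to-badge mapping, the use of `search_text` as the snippet source, and the names `toResultEntry` and `formatResults` are hypothetical (the README only names the fields and the 300-character snippet length). The `module` and `fileType` fields are omitted because the diff does not say how they are derived.

```javascript
// Assumed mapping from metadata flags to the badge names listed in the README.
const BADGE_FLAGS = [
  ["is_plugin", "plugin"],
  ["is_controller", "controller"],
  ["is_observer", "observer"],
  ["is_repository", "repository"],
  ["is_resolver", "graphql-resolver"],
  ["is_model", "model"],
  ["is_block", "block"],
];

// Convert one raw hit ({ score, metadata }) into the documented result shape.
function toResultEntry(hit, rank) {
  const meta = hit.metadata;
  return {
    rank,
    score: hit.score,
    path: meta.path,
    className: meta.class_name ?? null,
    namespace: meta.namespace ?? null,
    methods: meta.methods ?? [],
    magentoType: meta.magento_type,
    badges: BADGE_FLAGS.filter(([flag]) => meta[flag]).map(([, badge]) => badge),
    // Assumption: the "indexed content" is the enriched search text.
    snippet: (meta.search_text ?? "").slice(0, 300),
  };
}

// Wrap ranked entries in the { results, count } envelope.
function formatResults(hits) {
  const results = hits.map((hit, i) => toResultEntry(hit, i + 1));
  return { results, count: results.length };
}
```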
240
301
 
241
302
  ### Search Tools
242
303
 
243
304
  | Tool | Description |
244
305
  |------|-------------|
245
- | `magento_search` | Semantic code search with natural language queries |
246
- | `magento_find_class` | Find PHP class, interface, or trait by name |
306
+ | `magento_search` | Semantic search -- find any PHP class, method, XML config, template, or GraphQL schema by natural language |
307
+ | `magento_find_class` | Find PHP class, interface, abstract class, or trait by name |
247
308
  | `magento_find_method` | Find method implementations across the codebase |
248
309
 
249
310
  ### Magento-Specific Finders
250
311
 
251
312
  | Tool | Description |
252
313
  |------|-------------|
253
- | `magento_find_config` | Find XML configuration files (di.xml, events.xml, etc.) |
254
- | `magento_find_template` | Find PHTML template files |
255
- | `magento_find_plugin` | Find interceptor plugins and their targets |
256
- | `magento_find_observer` | Find event observers |
257
- | `magento_find_controller` | Find controllers by route or action |
258
- | `magento_find_block` | Find Block classes |
259
- | `magento_find_graphql` | Find GraphQL resolvers and schema |
260
- | `magento_find_api` | Find REST API endpoints and webapi.xml routes |
261
- | `magento_find_cron` | Find cron job definitions |
262
- | `magento_find_db_schema` | Find database table definitions |
314
+ | `magento_find_config` | Find XML configuration (di.xml, events.xml, routes.xml, system.xml, webapi.xml, module.xml, layout) |
315
+ | `magento_find_template` | Find PHTML template files for frontend or admin rendering |
316
+ | `magento_find_plugin` | Find interceptor plugins (before/after/around methods) and di.xml declarations |
317
+ | `magento_find_observer` | Find event observers and events.xml declarations |
318
+ | `magento_find_preference` | Find DI preference overrides -- which class implements an interface |
319
+ | `magento_find_controller` | Find MVC controllers by frontend or admin route path |
320
+ | `magento_find_block` | Find Block classes for view rendering |
321
+ | `magento_find_graphql` | Find GraphQL schema definitions, resolvers, types, queries, and mutations |
322
+ | `magento_find_api` | Find REST/SOAP API endpoints in webapi.xml |
323
+ | `magento_find_cron` | Find cron job definitions in crontab.xml |
324
+ | `magento_find_db_schema` | Find database table definitions in db_schema.xml (declarative schema) |
263
325
 
264
326
  ### Analysis Tools
265
327
 
266
328
  | Tool | Description |
267
329
  |------|-------------|
268
330
  | `magento_analyze_diff` | Analyze git diffs for risk scoring and change classification |
269
- | `magento_complexity` | Analyze code complexity (cyclomatic, function count, lines) |
331
+ | `magento_complexity` | Analyze cyclomatic complexity, function count, and line count |
270
332
 
271
333
  ### Utility Tools
272
334
 
273
335
  | Tool | Description |
274
336
  |------|-------------|
275
- | `magento_module_structure` | Show module directory structure |
337
+ | `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
276
338
  | `magento_index` | Trigger re-indexing of the codebase |
277
- | `magento_stats` | View index statistics (ONNX, parallel mode) |
339
+ | `magento_stats` | View index statistics |
340
+
341
+ ### Tool Cross-References
342
+
343
+ Each tool description includes "See also" hints to help AI clients chain tools effectively:
344
+
345
+ ```mermaid
346
+ graph TD
347
+ cls["find_class"] --> plg["find_plugin"]
348
+ cls --> prf["find_preference"]
349
+ cls --> mtd["find_method"]
350
+ cfg["find_config"] --> obs["find_observer"]
351
+ cfg --> prf
352
+ cfg --> api["find_api"]
353
+ plg --> cls
354
+ plg --> mtd
355
+ tpl["find_template"] --> blk["find_block"]
356
+ blk --> tpl
357
+ blk --> cfg
358
+ dbs["find_db_schema"] --> cls
359
+ gql["find_graphql"] --> cls
360
+ gql --> mtd
361
+ ctl["find_controller"] --> cfg
362
+
363
+ style cls fill:#4a90d9,color:#fff
364
+ style mtd fill:#4a90d9,color:#fff
365
+ style cfg fill:#e8a838,color:#000
366
+ style plg fill:#d94a4a,color:#fff
367
+ style obs fill:#d94a4a,color:#fff
368
+ style prf fill:#e8a838,color:#000
369
+ style api fill:#e8a838,color:#000
370
+ style tpl fill:#68b684,color:#000
371
+ style blk fill:#68b684,color:#000
372
+ style dbs fill:#9b59b6,color:#fff
373
+ style gql fill:#9b59b6,color:#fff
374
+ style ctl fill:#4a90d9,color:#fff
375
+ ```
278
376
 
279
377
  ### Query Examples
280
378
 
@@ -282,11 +380,18 @@ The MCP server exposes 19 tools for AI-assisted Magento development:
282
380
  magento_search("how are checkout totals calculated")
283
381
  magento_search("product price with tier pricing and catalog rules")
284
382
  magento_find_class("ProductRepositoryInterface")
383
+ magento_find_method("getById")
285
384
  magento_find_config("di.xml plugin for ProductRepository")
286
- magento_find_plugin("save method")
385
+ magento_find_plugin({ targetClass: "Topmenu" })
287
386
  magento_find_observer("sales_order_place_after")
288
- magento_find_api("products REST endpoint")
289
- magento_find_graphql("cart mutation resolver")
387
+ magento_find_preference("StoreManagerInterface")
388
+ magento_find_api("/V1/orders")
389
+ magento_find_controller("catalog/product/view")
390
+ magento_find_graphql("placeOrder")
391
+ magento_find_db_schema("sales_order")
392
+ magento_find_cron("indexer")
393
+ magento_find_block("cart totals")
394
+ magento_find_template("minicart")
290
395
  magento_analyze_diff({ commitHash: "abc123" })
291
396
  magento_complexity({ module: "Magento_Catalog", threshold: 10 })
292
397
  ```
@@ -310,43 +415,70 @@ Pre-built binaries are provided for the following platforms:
310
415
 
311
416
  ## Validation
312
417
 
313
- Magector is validated against the complete Magento 2.4.7 codebase with **557 test cases** across **50+ categories**.
314
-
315
- ### Overall Results
316
-
317
- | Metric | Value |
318
- |--------|-------|
319
- | **Accuracy** | **96.1%** |
320
- | Tests passed | 535 / 557 |
321
- | Index size | 17,891 vectors |
322
- | Query time | 15-45ms |
323
- | Indexing time | ~3 minutes |
324
-
325
- ### Category Performance
418
+ Magector is validated at two levels:
326
419
 
327
- **100% accuracy (34 categories):**
328
- Controllers, Blocks, Observers, GraphQL, API, Shipping, Tax, Payment, EAV, Indexers, Cron, Email, Import, Export, Cache, Queue, Admin, CMS, Promotions, Debugging, Architecture, Order Management, Plugin Advanced, GraphQL Advanced, API Advanced, Admin Advanced, Email Advanced, Cron Advanced, Queue Advanced, Import Advanced, Payment Advanced, URL Rewrite, SEO, Marketing
420
+ 1. **E2E MCP accuracy tests** -- 101 queries across 16 tool categories via stdio JSON-RPC
421
+ 2. **Rust-level validation** -- 557 test cases across 50+ categories against Magento 2.4.7
329
422
 
330
- **90-99% accuracy:**
331
- Catalog Product (96%), Customer Advanced (95%), Checkout Flow (95%), Shipping Advanced (93.3%), Category (93.3%), Frontend JS (90%), Search (90%)
423
+ ### E2E Accuracy (MCP Tools)
332
424
 
333
- **Known limitations:**
334
- - XML configuration file search (di.xml, plugin configs) -- semantic search favors PHP files with richer content
335
- - Very generic single-word queries -- include more context for better results
425
+ ```mermaid
426
+ ---
427
+ config:
428
+ themeVariables:
429
+ pie1: "#4caf50"
430
+ pie2: "#f44336"
431
+ ---
432
+ pie title Test Pass Rate (101 queries)
433
+ "Passed (101)" : 101
434
+ "Failed (0)" : 0
435
+ ```
336
436
 
337
- ### Running Validation
437
+ | Metric | Value |
438
+ |--------|-------|
439
+ | **Grade** | **A (94.9/100)** |
440
+ | **Pass rate** | 101/101 (100%) |
441
+ | **Precision** | 93.2% |
442
+ | **MRR** | 99.2% |
443
+ | **NDCG@10** | 85.5% |
444
+ | **Index size** | 35,795 vectors |
445
+ | **Query time** | 10-45ms |
446
+
447
+ #### Per-Tool Performance
448
+
449
+ | Tool | Pass | Precision | MRR | NDCG |
450
+ |------|------|-----------|-----|------|
451
+ | find_class | 100% | 100% | 100% | 100% |
452
+ | find_method | 100% | 89% | 100% | 87% |
453
+ | find_controller | 100% | 100% | 100% | -- |
454
+ | find_observer | 100% | 100% | 100% | 100% |
455
+ | find_plugin | 100% | 96% | 100% | 100% |
456
+ | find_preference | 100% | 100% | 100% | 100% |
457
+ | find_api | 100% | 100% | 100% | 100% |
458
+ | find_cron | 100% | 100% | 100% | 100% |
459
+ | find_db_schema | 100% | 100% | 100% | 100% |
460
+ | find_graphql | 100% | 100% | 100% | 100% |
461
+ | find_block | 100% | 100% | 100% | 100% |
462
+ | find_config | 100% | 89% | 89% | 93% |
463
+ | find_template | 100% | 84% | 100% | 100% |
464
+ | search | 100% | 99% | 100% | 100% |
465
+
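For readers unfamiliar with the MRR and NDCG columns, the sketch below shows the standard definitions. It is not the package's accuracy calculator: it assumes each query's results are given as an array of relevance grades in ranked order (1 for relevant, 0 for not), and it normalizes NDCG against the ideal ordering of the returned results only.

```javascript
// Reciprocal rank for one query: 1 / position of the first relevant result, or 0 if none.
function reciprocalRank(relevance) {
  const first = relevance.findIndex((rel) => rel > 0);
  return first === -1 ? 0 : 1 / (first + 1);
}

// Mean reciprocal rank over several queries.
function meanReciprocalRank(queries) {
  if (queries.length === 0) return 0;
  const total = queries.reduce((sum, rel) => sum + reciprocalRank(rel), 0);
  return total / queries.length;
}

// Discounted cumulative gain over the top k results.
function dcg(relevance, k) {
  return relevance
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG divided by the DCG of the best possible ordering of the same results.
function ndcg(relevance, k = 10) {
  const ideal = dcg([...relevance].sort((a, b) => b - a), k);
  return ideal === 0 ? 0 : dcg(relevance, k) / ideal;
}
```

Under these definitions, an MRR near 100% means the first relevant file is almost always ranked first, while a lower NDCG@10 indicates that some relevant files appear below irrelevant ones further down the list.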
466
+ ### Integration Tests
467
+
468
+ 62 integration tests covering MCP protocol compliance, tool schemas, tool calls, analysis tools, and stdout JSON integrity.
469
+
470
+ ### Running Tests
338
471
 
339
472
  ```bash
340
- # Full validation (downloads Magento, indexes, validates)
341
- cd rust-core
342
- cargo run --release -- validate
473
+ # E2E accuracy tests (101 queries, requires indexed codebase)
474
+ npm run test:accuracy
475
+ npm run test:accuracy:verbose
343
476
 
344
- # Skip indexing (use existing index)
345
- cargo run --release -- validate -m ./magento2 --skip-index
477
+ # Integration tests (62 tests)
478
+ npm test
346
479
 
347
- # Node.js validation suite
348
- npm run validate
349
- npm run validate:verbose
480
+ # Rust validation (557 test cases)
481
+ cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
350
482
  ```
351
483
 
352
484
  ---
@@ -357,7 +489,7 @@ npm run validate:verbose
357
489
  magector/
358
490
  ├── src/ # Node.js source
359
491
  │ ├── cli.js # CLI entry point (npx magector <command>)
360
- │ ├── mcp-server.js # MCP server (19 tools, delegates to Rust core)
492
+ │ ├── mcp-server.js # MCP server (19 tools, structured JSON output)
361
493
  │ ├── binary.js # Platform binary resolver
362
494
  │ ├── model.js # ONNX model resolver/downloader
363
495
  │ ├── init.js # Full init command (index + IDE config)
@@ -372,7 +504,10 @@ magector/
372
504
  │ ├── test-data-generator.js
373
505
  │ └── accuracy-calculator.js
374
506
  ├── tests/ # Automated tests
375
- │ └── mcp-server.test.js # MCP server tests (Rust core + analysis tools)
507
+ │ ├── mcp-server.test.js # Integration tests (62 tests)
508
+ │ ├── mcp-accuracy.test.js # E2E accuracy tests (101 queries)
509
+ │ └── results/ # Test result artifacts
510
+ │ └── accuracy-report.json
376
511
  ├── platforms/ # Platform-specific binary packages
377
512
  │ ├── darwin-arm64/ # macOS ARM (Apple Silicon)
378
513
  │ ├── linux-x64/ # Linux x64
@@ -381,11 +516,11 @@ magector/
381
516
  ├── rust-core/ # Rust high-performance core
382
517
  │ ├── Cargo.toml
383
518
  │ ├── src/
384
- │ │ ├── main.rs # Rust CLI (index, search, validate)
519
+ │ │ ├── main.rs # Rust CLI (index, search, serve, validate)
385
520
  │ │ ├── lib.rs # Library exports
386
521
  │ │ ├── indexer.rs # Core indexing with progress output
387
522
  │ │ ├── embedder.rs # ONNX embedding (MiniLM-L6-v2)
388
- │ │ ├── vectordb.rs # HNSW vector database
523
+ │ │ ├── vectordb.rs # HNSW vector database + hybrid search
389
524
  │ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
390
525
  │ │ ├── magento.rs # Magento pattern detection (Rust)
391
526
  │ │ └── validation.rs # 557 test cases, validation framework
@@ -424,32 +559,88 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
424
559
  1. Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")
425
560
  2. The enriched query is embedded into the same 384-dimensional vector space
426
561
  3. HNSW finds the nearest neighbors by cosine similarity
427
- 4. Results are ranked and returned with file path, class name, Magento type, and relevance score
562
+ 4. **Hybrid reranking** boosts results with keyword matches in path and search text
563
+ 5. Results are returned as structured JSON with file path, class name, methods, role badges, and content snippet
564
+
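Step 4 above, hybrid reranking, can be illustrated with a small function that adds a keyword bonus to each candidate's semantic score. The boost weights and the candidate shape (`path`, `searchText`, `score`) are assumptions made for this sketch; the README states only that keyword matches in the path and search text raise a result's rank.

```javascript
// Hypothetical boost per matched query term; the real weights are not documented.
const PATH_BOOST = 0.1;
const TEXT_BOOST = 0.05;

// Add a keyword bonus to each candidate's semantic score, then sort descending.
function rerank(query, candidates) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return candidates
    .map((candidate) => {
      const path = candidate.path.toLowerCase();
      const text = candidate.searchText.toLowerCase();
      let boost = 0;
      for (const term of terms) {
        if (path.includes(term)) boost += PATH_BOOST;
        if (text.includes(term)) boost += TEXT_BOOST;
      }
      return { ...candidate, score: candidate.score + boost };
    })
    .sort((a, b) => b.score - a.score);
}
```

With additive boosts like these, a file whose path literally contains the query terms can overtake a candidate that scored slightly higher on embedding similarity alone.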
565
+ ### 3. Persistent Serve Mode
566
+
567
+ The MCP server spawns a persistent Rust process (`magector-core serve`) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses returned via stdout -- eliminating the ~2.6s cold-start overhead of loading the model per query. Falls back to single-shot `execFileSync` if the serve process is unavailable.
568
+
569
+ ```mermaid
570
+ flowchart TD
571
+ subgraph startup ["Startup (once)"]
572
+ S1[Load Model] --> S2[Load Index]
573
+ S2 --> S3[Ready Signal]
574
+ end
575
+ subgraph query ["Per Query (10-45ms)"]
576
+ Q1[stdin JSON] --> Q2[Embed]
577
+ Q2 --> Q3[HNSW Search]
578
+ Q3 --> Q4[Rerank]
579
+ Q4 --> Q5[stdout JSON]
580
+ end
581
+ startup --> query
582
+ subgraph fallback ["Fallback"]
583
+ F1[execFileSync ~2.6s]
584
+ end
585
+
586
+ style startup fill:#e8f4e8,color:#000
587
+ style query fill:#e8e8f4,color:#000
588
+ style fallback fill:#f4e8e8,color:#000
589
+ ```
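The fallback behavior in the diagram above amounts to a try-the-warm-path-first wrapper. The sketch below is deliberately synchronous and takes both strategies as caller-supplied functions (hypothetical names `viaServe` and `viaSingleShot`); the real MCP server communicates with a child process and is presumably asynchronous.

```javascript
// Try the persistent serve path first; on any failure use the slower single-shot path.
// Both strategies are injected so the wrapper itself has no process-handling code.
function searchWithFallback(query, viaServe, viaSingleShot) {
  try {
    return { source: "serve", results: viaServe(query) };
  } catch (err) {
    // The serve process is unavailable or returned an error: pay the cold-start cost.
    return { source: "single-shot", results: viaSingleShot(query) };
  }
}
```

Returning the `source` alongside the results makes it easy to log how often the cold path is taken.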
428
590
 
429
- ### 3. MCP Integration
591
+ ### 4. MCP Integration
430
592
 
431
593
  The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.
432
594
 
433
- ```
434
- Developer: "How does checkout totals calculation work?"
435
-
436
-
437
- AI Assistant ──▶ magento_search("checkout totals collector calculate")
438
-
439
-
440
- MCP Server ──▶ magector-core search (Rust) ──▶ HNSW lookup ──▶ Ranked results
441
-
442
-
443
- Results:
444
- 1. Quote/Model/Quote/TotalsCollector.php (0.554)
445
- 2. Quote/Model/Quote/Address/Total/Collector.php (0.524)
446
- 3. Quote/Model/Quote/Address/Total/Subtotal.php (0.517)
595
+ ```mermaid
596
+ sequenceDiagram
597
+ participant Dev
598
+ participant AI
599
+ participant MCP
600
+ participant Rust
601
+ participant HNSW
602
+
603
+ Dev->>AI: "checkout totals?"
604
+ AI->>MCP: magento_search(...)
605
+ MCP->>Rust: JSON query
606
+ Rust->>HNSW: embed + search
607
+ HNSW-->>Rust: candidates
608
+ Rust-->>MCP: JSON results
609
+ MCP-->>AI: paths, methods, badges
610
+ AI-->>Dev: TotalsCollector.php
447
611
  ```
448
612
 
449
613
  ---
450
614
 
451
615
  ## Magento Patterns Detected
452
616
 
617
+ ```mermaid
618
+ mindmap
619
+ root((Patterns))
620
+ PHP
621
+ Controller
622
+ Model
623
+ Repository
624
+ Block
625
+ Helper
626
+ ViewModel
627
+ Interception
628
+ Plugin
629
+ Observer
630
+ Preference
631
+ XML
632
+ di.xml
633
+ events.xml
634
+ webapi.xml
635
+ routes.xml
636
+ crontab.xml
637
+ db_schema.xml
638
+ Frontend
639
+ Template
640
+ JavaScript
641
+ GraphQL
642
+ ```
643
+
453
644
  Magector understands these Magento 2 architectural patterns:
454
645
 
455
646
  | Pattern | Detection Method | Example |
@@ -490,7 +681,7 @@ Copy `.cursorrules` to your Magento project root for optimized AI-assisted devel
490
681
 
491
682
  ### Model Configuration
492
683
 
493
- The ONNX model (`all-MiniLM-L6-v2`) is automatically downloaded on first run to `rust-core/models/`. To use a different location:
684
+ The ONNX model (`all-MiniLM-L6-v2`) is automatically downloaded on first run to `~/.magector/models/`. To use a different location:
494
685
 
495
686
  ```bash
496
687
  magector-core index -m /path/to/magento -c /custom/model/path
@@ -535,16 +726,20 @@ cargo run --release -- validate
535
726
  ### Testing
536
727
 
537
728
  ```bash
538
- # Run MCP server auto tests (129 tests, requires indexed codebase)
729
+ # Integration tests (62 tests, requires indexed codebase)
539
730
  npm test
540
731
 
732
+ # E2E accuracy tests (101 queries)
733
+ npm run test:accuracy
734
+ npm run test:accuracy:verbose
735
+
541
736
  # Run without index (unit + schema tests only)
542
737
  npm run test:no-index
543
738
 
544
- # Run Rust unit tests
739
+ # Rust unit tests
545
740
  cd rust-core && cargo test
546
741
 
547
- # Run Rust validation (557 test cases)
742
+ # Rust validation (557 test cases)
548
743
  cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
549
744
  ```
550
745
 
@@ -553,18 +748,23 @@ cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
553
748
  1. Add pattern detection in `rust-core/src/magento.rs`
554
749
  2. Add search text enrichment in `rust-core/src/indexer.rs`
555
750
  3. Add validation test cases in `rust-core/src/validation.rs`
556
- 4. Rebuild and run validation to verify:
751
+ 4. Add E2E accuracy test cases in `tests/mcp-accuracy.test.js`
752
+ 5. Rebuild and run validation to verify:
557
753
 
558
754
  ```bash
559
755
  cargo build --release
560
756
  ./target/release/magector-core validate -m ./magento2 --skip-index
757
+ npm run test:accuracy
561
758
  ```
562
759
 
563
760
  ### Adding MCP Tools
564
761
 
565
762
  1. Define the tool schema in `src/mcp-server.js` (ListToolsRequestSchema handler)
566
- 2. Implement the handler in the CallToolRequestSchema handler
567
- 3. Test with Claude Code or the MCP inspector
763
+ 2. Include keyword-rich descriptions and cross-tool "See also" references
764
+ 3. Implement the handler in the CallToolRequestSchema handler
765
+ 4. Return structured JSON via `formatSearchResults()`
766
+ 5. Add E2E test cases in `tests/mcp-accuracy.test.js`
767
+ 6. Test with Claude Code or the MCP inspector
568
768
 
569
769
  ---
570
770
 
@@ -582,8 +782,10 @@ cargo build --release
582
782
 
583
783
  - **Algorithm:** HNSW (Hierarchical Navigable Small World)
584
784
  - **Library:** `hnsw_rs`
785
+ - **Parameters:** M=32, max_layers=16, ef_construction=200
585
786
  - **Distance metric:** Cosine similarity
586
- - **Persistence:** JSON serialization (HNSW + metadata)
787
+ - **Hybrid search:** Semantic nearest-neighbor + keyword reranking in path and search text
788
+ - **Persistence:** Bincode binary serialization
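As a reference point for the distance metric listed above, this is what an exact cosine-similarity search looks like. HNSW returns an approximation of this ranking without scanning every vector. The function names are illustrative, and the code is a brute-force baseline, not how `hnsw_rs` works internally.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // define similarity with a zero vector as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k by scanning every vector: O(n) per query, the cost HNSW avoids.
function bruteForceTopK(query, vectors, k) {
  return vectors
    .map((vector, id) => ({ id, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```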
587
789
 
588
790
  ### Index Structure
589
791
 
@@ -596,13 +798,15 @@ struct IndexMetadata {
  magento_type: String, // controller, model, block, plugin, ...
  class_name: Option<String>,
  namespace: Option<String>,
- methods: Vec<String>,
- search_text: String, // Enriched searchable text
+ methods: Vec<String>, // extracted method names
+ search_text: String, // enriched searchable text
  is_controller: bool,
  is_plugin: bool,
  is_observer: bool,
  is_model: bool,
  is_block: bool,
+ is_repository: bool,
+ is_resolver: bool,
  // ... 20+ pattern flags
  }
  ```
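
To show how the pattern flags added in this hunk might be consumed, here is a minimal sketch. It uses a trimmed copy of `IndexMetadata` containing only fields visible in the diff; the helper `repositories_with_method` is hypothetical and is not part of the package.

```rust
/// Trimmed copy of the struct from the README; the real one has more fields and flags.
#[derive(Debug, Clone, Default)]
struct IndexMetadata {
    magento_type: String,
    class_name: Option<String>,
    methods: Vec<String>,
    is_repository: bool,
    is_resolver: bool,
}

/// Hypothetical helper: keep only repository entries that declare `method`.
fn repositories_with_method<'a>(
    items: &'a [IndexMetadata],
    method: &str,
) -> Vec<&'a IndexMetadata> {
    items
        .iter()
        .filter(|m| m.is_repository && m.methods.iter().any(|name| name == method))
        .collect()
}
```

Boolean flags like these make post-filtering a cheap field check once the vector search has produced candidates; whether Magector filters this way is not stated in the diff.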
@@ -611,8 +815,9 @@ struct IndexMetadata {
 
  | Operation | Time | Notes |
  |-----------|------|-------|
- | Full index (18K files) | ~1 min | Parallel parsing + batched ONNX embedding |
- | Single query | 15-45ms | HNSW approximate nearest neighbor |
+ | Full index (36K vectors) | ~1 min | Parallel parsing + batched ONNX embedding |
+ | Single query (warm) | 10-45ms | Persistent serve process, HNSW + rerank |
+ | Single query (cold) | ~2.6s | Includes ONNX model + index load |
  | Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
  | Batch embedding (32) | ~30ms | Batched ONNX inference |
  | Model load | ~500ms | One-time at startup |
@@ -620,19 +825,49 @@ struct IndexMetadata {
 
  ### Performance Optimizations

+ - **Persistent serve mode** -- Rust process keeps ONNX model + HNSW index in memory via stdin/stdout JSON protocol
+ - **Query cache** -- LRU cache (200 entries) avoids re-embedding identical queries
+ - **Hybrid reranking** -- combines semantic similarity with keyword matching for better precision
  - **Batched ONNX embedding** -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding
- - **Dynamic thread scaling** -- ONNX intra-op threads scale to CPU core count (vs. hardcoded 4)
+ - **Dynamic thread scaling** -- ONNX intra-op threads scale to CPU core count
  - **Thread-local AST parsers** -- each rayon thread gets its own tree-sitter parser (no mutex contention)
  - **Bincode persistence** -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)
- - **Adaptive HNSW capacity** -- pre-sized to actual vector count (no wasted memory)
+ - **Adaptive HNSW capacity** -- pre-sized to actual vector count
  - **Parallel HNSW insert** -- batch insert uses hnsw_rs parallel insertion on load and index
- - **Optimized file discovery** -- no symlink following, uses cached DirEntry metadata
+ - **Tuned ef_search** -- optimized search parameters for 36K vector index (ef_search=50 for search, 64 for hybrid)
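
The "Query cache" bullet above describes an LRU cache that avoids re-embedding identical queries. A minimal std-only sketch of that idea follows; the `QueryCache` type and its method names are assumptions for illustration, not the package's code, and the linear-time recency update is chosen for brevity, which is acceptable at a capacity of around 200 entries.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache keyed by query text; values are embedding vectors.
struct QueryCache {
    capacity: usize,
    map: HashMap<String, Vec<f32>>,
    order: VecDeque<String>, // front = least recently used
}

impl QueryCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `key` to the most-recently-used position (O(n) scan).
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }

    /// Return a cached embedding and mark the query as recently used.
    fn get(&mut self, key: &str) -> Option<Vec<f32>> {
        let hit = self.map.get(key).cloned();
        if hit.is_some() {
            self.touch(key);
        }
        hit
    }

    /// Insert or update an entry, evicting the least recently used one when full.
    fn put(&mut self, key: &str, value: Vec<f32>) {
        if self.capacity == 0 {
            return;
        }
        if !self.map.contains_key(key) && self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key.to_string(), value);
        self.touch(key);
    }
}
```

A caller would check `get(query)` before running the embedding model and call `put(query, embedding)` on a miss, so repeated queries skip the roughly 2ms embedding step listed in the performance table.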
 
  ---

  ## Roadmap

+ ```mermaid
+ gantt
+ title Roadmap
+ dateFormat YYYY-MM
+ axisFormat %b
+ section Done
+ Hybrid search :done, 2025-01, 30d
+ Serve mode :done, 2025-02, 30d
+ JSON output :done, 2025-03, 15d
+ Cross-tool hints :done, 2025-03, 15d
+ E2E tests :done, 2025-03, 15d
+ section Next
+ Method chunking :active, 2025-04, 30d
+ Intent detection :2025-05, 30d
+ Type filtering :2025-06, 30d
+ Incremental index :2025-07, 30d
+ section Future
+ VSCode extension :2025-08, 60d
+ Web UI :2025-10, 60d
+ Commerce support :2026-01, 60d
+ ```
+
  - [x] Hybrid search (semantic + keyword re-ranking)
+ - [x] Persistent serve mode (eliminates cold-start latency)
+ - [x] Structured JSON output (methods, badges, snippets)
+ - [x] Cross-tool discovery hints for AI clients
+ - [x] E2E accuracy test suite (101 queries)
+ - [ ] Method-level chunking (per-method vectors for direct method search)
  - [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
  - [ ] Filtered search by file type at the vector level
  - [ ] Incremental indexing (only re-index changed files)
@@ -655,7 +890,7 @@ Contributions are welcome. Please:
  1. Fork the repository
  2. Create a feature branch (`git checkout -b feature/improvement`)
  3. Add tests for new functionality
- 4. Run validation to ensure accuracy doesn't regress
+ 4. Run validation to ensure accuracy doesn't regress: `npm run test:accuracy`
  5. Submit a pull request

  ---
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "magector",
- "version": "1.2.12",
+ "version": "1.2.14",
  "description": "Semantic code search for Magento 2 — index, search, MCP server",
  "type": "module",
  "main": "src/mcp-server.js",
@@ -33,10 +33,10 @@
  "ruvector": "^0.1.96"
  },
  "optionalDependencies": {
- "@magector/cli-darwin-arm64": "1.2.12",
- "@magector/cli-linux-x64": "1.2.12",
- "@magector/cli-linux-arm64": "1.2.12",
- "@magector/cli-win32-x64": "1.2.12"
+ "@magector/cli-darwin-arm64": "1.2.14",
+ "@magector/cli-linux-x64": "1.2.14",
+ "@magector/cli-linux-arm64": "1.2.14",
+ "@magector/cli-win32-x64": "1.2.14"
  },
  "keywords": [
  "magento",