magector 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Magector Contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,627 @@
1
+ # Magector
2
+
3
+ **Semantic code search engine for Magento 2, powered by ONNX embeddings and HNSW vector search.**
4
+
5
+ Magector indexes an entire Magento 2 codebase and lets you search it with natural language. Instead of grepping for keywords, ask questions like *"how are checkout totals calculated?"* or *"where is the product price determined?"* and get ranked, relevant results in under 50ms.
6
+
7
+ [![Rust](https://img.shields.io/badge/rust-1.75+-orange.svg)](https://www.rust-lang.org)
8
+ [![Node.js](https://img.shields.io/badge/node-18+-green.svg)](https://nodejs.org)
9
+ [![Magento](https://img.shields.io/badge/magento-2.4.x-blue.svg)](https://magento.com)
10
+ [![Accuracy](https://img.shields.io/badge/accuracy-94.4%25-brightgreen.svg)](#validation)
11
+ [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)
12
+
13
+ ---
14
+
15
+ ## Why Magector
16
+
17
+ Magento 2 has **18,000+ source files** across hundreds of modules. Finding the right code is slow:
18
+
19
+ | Approach | Finds semantic matches | Understands Magento patterns | Speed (18K files) |
20
+ |----------|:---------------------:|:---------------------------:|:-----------------:|
21
+ | `grep` / `ripgrep` | No | No | 100-500ms |
22
+ | IDE search | No | No | 200-1000ms |
23
+ | GitHub search | Partial | No | 500-2000ms |
24
+ | **Magector** | **Yes** | **Yes** | **15-45ms** |
25
+
26
+ Magector understands that a query about *"payment capture"* should return `Sales/Model/Order/Payment/Operations/CaptureOperation.php`, not just files containing the word "capture".
27
+
28
+ ---
29
+
30
+ ## Features
31
+
32
+ - **Semantic search** -- find code by meaning, not exact keywords
33
+ - **94.4% accuracy** -- validated with 557 test cases across 50+ categories
34
+ - **ONNX embeddings** -- native 384-dim transformer embeddings via ONNX Runtime for higher quality search
35
+ - **Parallel processing** -- parallel file parsing and batched embedding for faster indexing
36
+ - **Magento-aware** -- understands controllers, plugins, observers, blocks, resolvers, repositories, and 20+ Magento patterns
37
+ - **AST-powered** -- tree-sitter parsing for PHP and JavaScript extracts classes, methods, namespaces, and inheritance
38
+ - **Diff analysis** -- risk scoring and change classification for git commits and staged changes
39
+ - **Complexity analysis** -- cyclomatic complexity, function count, and hotspot detection across modules
40
+ - **Fast** -- 15-45ms queries, batched ONNX embedding with adaptive thread scaling
41
+ - **MCP server** -- 19 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
42
+ - **Clean architecture** -- Rust core handles all indexing/search, Node.js MCP server delegates to it
43
+
44
+ ---
45
+
46
+ ## Architecture
47
+
48
+ ```
49
+ ┌──────────────────────────────────────────┐
50
+ │ Magector │
51
+ ├──────────────────┬───────────────────────┤
52
+ │ Rust Core │ Node.js Layer │
53
+ │ │ │
54
+ │ ┌────────────┐ │ ┌─────────────────┐ │
55
+ │ │ Tree-sitter│ │ │ MCP Server │ │
56
+ │ │ AST Parser │ │ │ (19 tools) │ │
57
+ │ │ PHP + JS │ │ └────────┬────────┘ │
58
+ │ └─────┬──────┘ │ │ │
59
+ │ │ │ ┌────────┴────────┐ │
60
+ │ ┌─────┴──────┐ │ │ CLI Interface │ │
61
+ │ │ Magento │ │ │ index/search/ │ │
62
+ │ │ Pattern │ │ │ validate │ │
63
+ │ │ Detection │ │ └─────────────────┘ │
64
+ │ └─────┬──────┘ │ │
65
+ │ │ │ │
66
+ │ ┌─────┴──────┐ │ │
67
+ │ │ ONNX │ │ │
68
+ │ │ Embedder │ │ │
69
+ │ │ MiniLM-L6 │ │ │
70
+ │ └─────┬──────┘ │ │
71
+ │ │ │ │
72
+ │ ┌─────┴──────┐ │ │
73
+ │ │ HNSW │ │ │
74
+ │ │ Vector DB │ │ │
75
+ │ └────────────┘ │ │
76
+ └──────────────────┴───────────────────────┘
77
+ ```
78
+
79
+ ### Embedding Pipeline
80
+
81
+ ```
82
+ Source File ──▶ Tree-sitter AST ──▶ Magento Pattern Detection ──▶ Search Text Enrichment
83
+ │ │
84
+ │ ▼
85
+ │ ONNX Runtime
86
+ │ (MiniLM-L6-v2)
87
+ │ │
88
+ │ ▼
89
+ │ 384-dim embedding
90
+ │ │
91
+ ▼ ▼
92
+ Metadata ─────────────────────────────────────────────────────▶ HNSW Index
93
+ (path, class, namespace, type, methods, patterns) (17,891 vectors)
94
+ ```
95
+
96
+ ### Components
97
+
98
+ | Component | Technology | Purpose |
99
+ |-----------|-----------|---------|
100
+ | Embeddings | `ort` (ONNX Runtime) | all-MiniLM-L6-v2, 384 dimensions |
101
+ | Vector search | `hnsw_rs` | Approximate nearest neighbor |
102
+ | PHP parsing | `tree-sitter-php` | Class, method, namespace extraction |
103
+ | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
104
+ | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
105
+ | CLI | `clap` | Command-line interface |
106
+ | MCP server | `@modelcontextprotocol/sdk` | AI tool integration |
107
+
108
+ ---
109
+
110
+ ## Quick Start
111
+
112
+ ### Prerequisites
113
+
114
+ - [Node.js 18+](https://nodejs.org)
115
+
116
+ ### 1. Initialize in Your Magento Project
117
+
118
+ ```bash
119
+ cd /path/to/your/magento2
120
+ npx magector init
121
+ ```
122
+
123
+ This single command:
124
+ - Verifies the Magento project
125
+ - Downloads the ONNX model (~86MB, cached globally in `~/.magector/models/`)
126
+ - Indexes the entire codebase
127
+ - Detects your IDE (Cursor / Claude Code)
128
+ - Writes MCP server configuration
129
+ - Writes IDE rules (`.cursorrules` / `CLAUDE.md`)
130
+ - Adds `magector.db` to `.gitignore`
131
+
132
+ ### 2. Search
133
+
134
+ ```bash
135
+ npx magector search "product price calculation"
136
+ npx magector search "checkout totals collector" -l 20
137
+ ```
138
+
139
+ ### 3. Re-index After Changes
140
+
141
+ ```bash
142
+ npx magector index
143
+ ```
144
+
145
+ ### 4. IDE Setup Only (Skip Indexing)
146
+
147
+ ```bash
148
+ npx magector setup
149
+ ```
150
+
151
+ ---
152
+
153
+ ## CLI Reference
154
+
155
+ ### Rust Core CLI
156
+
157
+ ```
158
+ magector-core <COMMAND>
159
+
160
+ Commands:
161
+ index Index a Magento codebase
162
+ search Search the index semantically
163
+ validate Run validation suite (downloads Magento if needed)
164
+ download Download Magento 2 Open Source
165
+ stats Show index statistics
166
+ embed Generate embedding for text
167
+ ```
168
+
169
+ #### `index`
170
+
171
+ ```bash
172
+ magector-core index [OPTIONS]
173
+
174
+ Options:
175
+ -m, --magento-root <PATH> Path to Magento root directory
176
+ -d, --database <PATH> Index database path [default: ./magector.db]
177
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
178
+ -v, --verbose Enable verbose output
179
+ ```
180
+
181
+ #### `search`
182
+
183
+ ```bash
184
+ magector-core search <QUERY> [OPTIONS]
185
+
186
+ Options:
187
+ -d, --database <PATH> Index database path [default: ./magector.db]
188
+ -l, --limit <N> Number of results [default: 10]
189
+ -f, --format <FORMAT> Output format: text, json [default: text]
190
+ ```
191
+
192
+ ### Node.js CLI
193
+
194
+ ```bash
195
+ npx magector init [path] # Full setup: index + IDE config
196
+ npx magector index [path] # Index (or re-index) Magento codebase
197
+ npx magector search <query> # Search indexed code
198
+ npx magector stats # Show indexer statistics
199
+ npx magector setup [path] # IDE setup only (no indexing)
200
+ npx magector mcp # Start MCP server
201
+ npx magector help # Show help
202
+ ```
203
+
204
+ ### Environment Variables
205
+
206
+ | Variable | Description | Default |
207
+ |----------|-------------|---------|
208
+ | `MAGENTO_ROOT` | Path to Magento installation | Current directory |
209
+ | `MAGECTOR_DB` | Path to index database | `./magector.db` |
210
+ | `MAGECTOR_BIN` | Path to magector-core binary | Auto-detected |
211
+ | `MAGECTOR_MODELS` | Path to ONNX model directory | `~/.magector/models/` |
212
+
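The defaults in the table above can be sketched as a small resolver. This is a minimal illustration, not Magector's actual code: the `Settings` struct, the `resolve_settings` function, and the `lookup`/`home` parameters are hypothetical names, and the real resolution may well live in the Node.js layer rather than in Rust.

```rust
use std::path::PathBuf;

/// Resolved settings, one field per variable in the table (hypothetical type).
#[derive(Debug, PartialEq)]
struct Settings {
    magento_root: PathBuf,
    database: PathBuf,
    binary: Option<PathBuf>, // None means "auto-detect"
    models: PathBuf,
}

/// `lookup` abstracts the environment so the defaults can be exercised
/// without touching process state; `home` is the user's home directory.
fn resolve_settings(lookup: impl Fn(&str) -> Option<String>, home: &str) -> Settings {
    let path_or = |key: &str, default: PathBuf| -> PathBuf {
        lookup(key).map(PathBuf::from).unwrap_or(default)
    };
    Settings {
        magento_root: path_or("MAGENTO_ROOT", PathBuf::from(".")),
        database: path_or("MAGECTOR_DB", PathBuf::from("./magector.db")),
        binary: lookup("MAGECTOR_BIN").map(PathBuf::from),
        models: path_or(
            "MAGECTOR_MODELS",
            PathBuf::from(home).join(".magector").join("models"),
        ),
    }
}
```

In real use, `lookup` would be `|key| std::env::var(key).ok()`.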
213
+ ---
214
+
215
+ ## MCP Server Tools
216
+
217
+ The MCP server exposes 19 tools for AI-assisted Magento development:
218
+
219
+ ### Search Tools
220
+
221
+ | Tool | Description |
222
+ |------|-------------|
223
+ | `magento_search` | Semantic code search with natural language queries |
224
+ | `magento_find_class` | Find PHP class, interface, or trait by name |
225
+ | `magento_find_method` | Find method implementations across the codebase |
226
+
227
+ ### Magento-Specific Finders
228
+
229
+ | Tool | Description |
230
+ |------|-------------|
231
+ | `magento_find_config` | Find XML configuration files (di.xml, events.xml, etc.) |
232
+ | `magento_find_template` | Find PHTML template files |
233
+ | `magento_find_plugin` | Find interceptor plugins and their targets |
234
+ | `magento_find_observer` | Find event observers |
235
+ | `magento_find_controller` | Find controllers by route or action |
236
+ | `magento_find_block` | Find Block classes |
237
+ | `magento_find_graphql` | Find GraphQL resolvers and schema |
238
+ | `magento_find_api` | Find REST API endpoints and webapi.xml routes |
239
+ | `magento_find_cron` | Find cron job definitions |
240
+ | `magento_find_db_schema` | Find database table definitions |
241
+
242
+ ### Analysis Tools
243
+
244
+ | Tool | Description |
245
+ |------|-------------|
246
+ | `magento_analyze_diff` | Analyze git diffs for risk scoring and change classification |
247
+ | `magento_complexity` | Analyze code complexity (cyclomatic, function count, lines) |
248
+
249
+ ### Utility Tools
250
+
251
+ | Tool | Description |
252
+ |------|-------------|
253
+ | `magento_module_structure` | Show module directory structure |
254
+ | `magento_index` | Trigger re-indexing of the codebase |
255
+ | `magento_stats` | View index statistics (ONNX, parallel mode) |
256
+
257
+ ### Query Examples
258
+
259
+ ```
260
+ magento_search("how are checkout totals calculated")
261
+ magento_search("product price with tier pricing and catalog rules")
262
+ magento_find_class("ProductRepositoryInterface")
263
+ magento_find_config("di.xml plugin for ProductRepository")
264
+ magento_find_plugin("save method")
265
+ magento_find_observer("sales_order_place_after")
266
+ magento_find_api("products REST endpoint")
267
+ magento_find_graphql("cart mutation resolver")
268
+ magento_analyze_diff({ commitHash: "abc123" })
269
+ magento_complexity({ module: "Magento_Catalog", threshold: 10 })
270
+ ```
271
+
272
+ ---
273
+
274
+ ## Validation
275
+
276
+ Magector is validated against the complete Magento 2.4.7 codebase with **557 test cases** across **50+ categories**.
277
+
278
+ ### Overall Results
279
+
280
+ | Metric | Value |
281
+ |--------|-------|
282
+ | **Accuracy** | **94.4%** |
283
+ | Tests passed | 526 / 557 |
284
+ | Index size | 17,891 vectors |
285
+ | Query time | 15-45ms |
286
+ | Indexing time | ~3 minutes |
287
+
288
+ ### Category Performance
289
+
290
+ **100% accuracy (34 categories):**
291
+ Controllers, Blocks, Observers, GraphQL, API, Shipping, Tax, Payment, EAV, Indexers, Cron, Email, Import, Export, Cache, Queue, Admin, CMS, Promotions, Debugging, Architecture, Order Management, Plugin Advanced, GraphQL Advanced, API Advanced, Admin Advanced, Email Advanced, Cron Advanced, Queue Advanced, Import Advanced, Payment Advanced, URL Rewrite, SEO, Marketing
292
+
293
+ **90-99% accuracy:**
294
+ Catalog Product (96%), Customer Advanced (95%), Checkout Flow (95%), Shipping Advanced (93.3%), Category (93.3%), Frontend JS (90%), Search (90%)
295
+
296
+ **Known limitations:**
297
+ - XML configuration file search (di.xml, plugin configs) -- semantic search favors PHP files with richer content
298
+ - Very generic single-word queries -- include more context for better results
299
+
300
+ ### Running Validation
301
+
302
+ ```bash
303
+ # Full validation (downloads Magento, indexes, validates)
304
+ cd rust-core
305
+ cargo run --release -- validate
306
+
307
+ # Skip indexing (use existing index)
308
+ cargo run --release -- validate -m ./magento2 --skip-index
309
+
310
+ # Node.js validation suite
311
+ npm run validate
312
+ npm run validate:verbose
313
+ ```
314
+
315
+ ---
316
+
317
+ ## Project Structure
318
+
319
+ ```
320
+ magector/
321
+ ├── src/ # Node.js source
322
+ │ ├── cli.js # CLI entry point (npx magector <command>)
323
+ │ ├── mcp-server.js # MCP server (19 tools, delegates to Rust core)
324
+ │ ├── binary.js # Platform binary resolver
325
+ │ ├── model.js # ONNX model resolver/downloader
326
+ │ ├── init.js # Full init command (index + IDE config)
327
+ │ ├── magento-patterns.js # Magento pattern detection (JS)
328
+ │ ├── templates/ # IDE rules templates
329
+ │ │ ├── cursorrules.js # .cursorrules content
330
+ │ │ └── claude-md.js # CLAUDE.md content
331
+ │ └── validation/ # JS validation suite
332
+ │ ├── validator.js
333
+ │ ├── benchmark.js
334
+ │ ├── test-queries.js
335
+ │ ├── test-data-generator.js
336
+ │ └── accuracy-calculator.js
337
+ ├── tests/ # Automated tests
338
+ │ └── mcp-server.test.js # MCP server tests (Rust core + analysis tools)
339
+ ├── platforms/ # Platform-specific binary packages
340
+ │ ├── darwin-arm64/ # macOS ARM (Apple Silicon)
341
+ │ ├── darwin-x64/ # macOS Intel
342
+ │ ├── linux-x64/ # Linux x64
343
+ │ ├── linux-arm64/ # Linux ARM64
344
+ │ └── win32-x64/ # Windows x64
345
+ ├── rust-core/ # Rust high-performance core
346
+ │ ├── Cargo.toml
347
+ │ ├── src/
348
+ │ │ ├── main.rs # Rust CLI (index, search, validate)
349
+ │ │ ├── lib.rs # Library exports
350
+ │ │ ├── indexer.rs # Core indexing with progress output
351
+ │ │ ├── embedder.rs # ONNX embedding (MiniLM-L6-v2)
352
+ │ │ ├── vectordb.rs # HNSW vector database
353
+ │ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
354
+ │ │ ├── magento.rs # Magento pattern detection (Rust)
355
+ │ │ └── validation.rs # 557 test cases, validation framework
356
+ │ └── models/ # ONNX model files (auto-downloaded)
357
+ │ ├── all-MiniLM-L6-v2.onnx
358
+ │ └── tokenizer.json
359
+ ├── .github/
360
+ │ └── workflows/
361
+ │ └── release.yml # Cross-compile + publish CI
362
+ ├── scripts/
363
+ │ └── setup.sh # Claude Code MCP setup script
364
+ ├── config/
365
+ │ └── mcp-config.json # MCP server configuration template
366
+ ├── package.json
367
+ ├── .gitignore
368
+ ├── LICENSE
369
+ └── README.md
370
+ ```
371
+
372
+ ---
373
+
374
+ ## How It Works
375
+
376
+ ### 1. Indexing
377
+
378
+ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a Magento codebase:
379
+
380
+ 1. **AST parsing** -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
381
+ 2. **Pattern detection** -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
382
+ 3. **Search text enrichment** -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
383
+ 4. **Embedding** -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
384
+ 5. **Indexing** -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
385
+
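Step 3 above (search text enrichment) can be pictured with the sketch below. It assumes a simplified `FileInfo` struct and an illustrative keyword table; both are hypothetical, since the README does not list the actual keyword sets or the exact text layout used by `indexer.rs`.

```rust
/// Hypothetical subset of the metadata produced by AST parsing.
struct FileInfo<'a> {
    class_name: Option<&'a str>,
    namespace: Option<&'a str>,
    methods: &'a [&'a str],
    magento_type: &'a str,
}

/// Illustrative keywords per pattern; the real lists are an assumption here.
fn pattern_keywords(magento_type: &str) -> &'static [&'static str] {
    match magento_type {
        "controller" => &["controller", "action", "execute"],
        "observer" => &["observer", "event", "listener"],
        "plugin" => &["plugin", "interceptor", "before", "after", "around"],
        _ => &[],
    }
}

/// Joins AST metadata and pattern keywords into one text to be embedded.
fn build_search_text(info: &FileInfo) -> String {
    let mut parts: Vec<&str> = Vec::new();
    parts.extend(info.class_name);
    parts.extend(info.namespace);
    parts.extend(info.methods.iter().copied());
    parts.extend(pattern_keywords(info.magento_type).iter().copied());
    parts.join(" ")
}
```

The point of the enrichment is that the embedded text carries Magento vocabulary even when the file itself never spells it out.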
386
+ ### 2. Searching
387
+
388
+ 1. Query text is enriched with pattern synonyms (e.g., "controller" adds "action execute http request dispatch")
389
+ 2. The enriched query is embedded into the same 384-dimensional vector space
390
+ 3. HNSW finds the nearest neighbors by cosine similarity
391
+ 4. Results are ranked and returned with file path, class name, Magento type, and relevance score
392
+
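As a rough illustration of step 1 above, the sketch below appends synonyms when a query contains a known pattern word. Only the `controller` entry is taken from this README; the `SYNONYMS` table shape and the `enrich_query` name are assumptions, and the real matching rules may differ.

```rust
/// Synonym table; only the "controller" entry comes from the README.
const SYNONYMS: &[(&str, &str)] = &[
    ("controller", "action execute http request dispatch"),
];

/// Appends the synonyms of every pattern word found in the query.
fn enrich_query(query: &str) -> String {
    let lower = query.to_lowercase();
    let mut enriched = query.to_string();
    for (term, extra) in SYNONYMS {
        // Match whole words only, case-insensitively.
        if lower.split_whitespace().any(|word| word == *term) {
            enriched.push(' ');
            enriched.push_str(extra);
        }
    }
    enriched
}
```

Queries without a known pattern word pass through unchanged.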
393
+ ### 3. MCP Integration
394
+
395
+ The MCP server delegates all search/index operations to the Rust core binary. Analysis tools (diff, complexity) use ruvector JS modules directly.
396
+
397
+ ```
398
+ Developer: "How does checkout totals calculation work?"
399
+
400
+
401
+ AI Assistant ──▶ magento_search("checkout totals collector calculate")
402
+
403
+
404
+ MCP Server ──▶ magector-core search (Rust) ──▶ HNSW lookup ──▶ Ranked results
405
+
406
+
407
+ Results:
408
+ 1. Quote/Model/Quote/TotalsCollector.php (0.554)
409
+ 2. Quote/Model/Quote/Address/Total/Collector.php (0.524)
410
+ 3. Quote/Model/Quote/Address/Total/Subtotal.php (0.517)
411
+ ```
412
+
413
+ ---
414
+
415
+ ## Magento Patterns Detected
416
+
417
+ Magector understands these Magento 2 architectural patterns:
418
+
419
+ | Pattern | Detection Method | Example |
420
+ |---------|-----------------|---------|
421
+ | Controller | Path + `execute()` method | `Controller/Adminhtml/Order/View.php` |
422
+ | Model | Path + extends `AbstractModel` | `Model/Product.php` |
423
+ | Repository | Path + implements `RepositoryInterface` | `Model/ProductRepository.php` |
424
+ | Block | Path + extends `AbstractBlock` | `Block/Product/View.php` |
425
+ | Plugin | Path + before/after/around methods | `Plugin/Product/SavePlugin.php` |
426
+ | Observer | Path + implements `ObserverInterface` | `Observer/ProductSaveObserver.php` |
427
+ | GraphQL Resolver | Path + implements `ResolverInterface` | `Model/Resolver/Products.php` |
428
+ | Helper | Path under `Helper/` | `Helper/Data.php` |
429
+ | Cron | Path under `Cron/` | `Cron/CleanExpiredQuotes.php` |
430
+ | Console Command | Path + extends `Command` | `Console/Command/IndexerReindex.php` |
431
+ | Data Provider | Path + `DataProvider` | `Ui/DataProvider/Product/Listing.php` |
432
+ | ViewModel | Path + implements `ArgumentInterface` | `ViewModel/Product/Breadcrumbs.php` |
433
+ | Setup Patch | Path + `Patch/Data` or `Patch/Schema` | `Setup/Patch/Data/AddAttribute.php` |
434
+ | di.xml | Path matching | `etc/di.xml`, `etc/frontend/di.xml` |
435
+ | events.xml | Path matching | `etc/events.xml` |
436
+ | webapi.xml | Path matching | `etc/webapi.xml` |
437
+ | layout XML | Path under `layout/` | `view/frontend/layout/catalog_product_view.xml` |
438
+ | Template | `.phtml` extension | `view/frontend/templates/product/view.phtml` |
439
+ | JavaScript | `.js` with AMD/ES6 detection | `view/frontend/web/js/view/minicart.js` |
440
+ | GraphQL Schema | `.graphqls` extension | `etc/schema.graphqls` |
441
+
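The purely path-based rows of the table can be sketched as follows. This is a simplified illustration: the returned labels and the `detect_by_path` name are hypothetical, and rules that also inspect class contents (such as `extends AbstractModel`) are deliberately left out.

```rust
/// Classifies a file using only the path-based rules from the table.
/// Returns None when a rule would need to look inside the file.
fn detect_by_path(path: &str) -> Option<&'static str> {
    let file_name = path.rsplit('/').next().unwrap_or(path);
    let has_dir = |dir: &str| path.split('/').any(|segment| segment == dir);

    if file_name.ends_with(".phtml") {
        Some("template")
    } else if file_name.ends_with(".graphqls") {
        Some("graphql_schema")
    } else if file_name == "di.xml" {
        Some("di_xml")
    } else if file_name == "events.xml" {
        Some("events_xml")
    } else if file_name == "webapi.xml" {
        Some("webapi_xml")
    } else if file_name.ends_with(".xml") && has_dir("layout") {
        Some("layout_xml")
    } else if file_name.ends_with(".php") && has_dir("Helper") {
        Some("helper")
    } else if file_name.ends_with(".php") && has_dir("Cron") {
        Some("cron")
    } else {
        None
    }
}
```

Checking the file name rather than the full path lets `etc/di.xml` and `etc/frontend/di.xml` match the same rule.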
442
+ ---
443
+
444
+ ## Configuration
445
+
446
+ ### Cursor IDE Rules
447
+
448
+ Copy `.cursorrules` to your Magento project root for optimized AI-assisted development. The rules instruct the AI to:
449
+
450
+ 1. Use Magector MCP tools before reading files manually
451
+ 2. Write effective semantic queries
452
+ 3. Follow Magento development patterns
453
+ 4. Interpret search results correctly
454
+
455
+ ### Model Configuration
456
+
457
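+ The ONNX model (`all-MiniLM-L6-v2`) is downloaded automatically on first run. `npx magector` caches it in `~/.magector/models/`; `magector-core` defaults to `./models` (`rust-core/models/` when run from the source tree). To use a different location: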
458
+
459
+ ```bash
460
+ magector-core index -m /path/to/magento -c /custom/model/path
461
+ ```
462
+
463
+ ---
464
+
465
+ ## Development
466
+
467
+ ### Building from Source
468
+
469
+ ```bash
470
+ git clone https://github.com/krejcif/magector.git
471
+ cd magector
472
+
473
+ # Install Node.js dependencies
474
+ npm install
475
+
476
+ # Build the Rust core
477
+ cd rust-core
478
+ cargo build --release
479
+ cd ..
480
+
481
+ # The CLI will automatically find the dev binary at rust-core/target/release/magector-core
482
+ node src/cli.js help
483
+ ```
484
+
485
+ ### Building
486
+
487
+ ```bash
488
+ # Rust core
489
+ cd rust-core
490
+ cargo build --release
491
+
492
+ # Run unit tests
493
+ cargo test
494
+
495
+ # Run validation
496
+ cargo run --release -- validate
497
+ ```
498
+
499
+ ### Testing
500
+
501
+ ```bash
502
+ # Run MCP server automated tests (129 tests, requires indexed codebase)
503
+ npm test
504
+
505
+ # Run without index (unit + schema tests only)
506
+ npm run test:no-index
507
+
508
+ # Run Rust unit tests
509
+ cd rust-core && cargo test
510
+
511
+ # Run Rust validation (557 test cases)
512
+ cd rust-core && cargo run --release -- validate -m ./magento2 --skip-index
513
+ ```
514
+
515
+ ### Adding New Magento Patterns
516
+
517
+ 1. Add pattern detection in `rust-core/src/magento.rs`
518
+ 2. Add search text enrichment in `rust-core/src/indexer.rs`
519
+ 3. Add validation test cases in `rust-core/src/validation.rs`
520
+ 4. Rebuild and run validation to verify:
521
+
522
+ ```bash
523
+ cargo build --release
524
+ ./target/release/magector-core validate -m ./magento2 --skip-index
525
+ ```
526
+
527
+ ### Adding MCP Tools
528
+
529
+ 1. Define the tool schema in `src/mcp-server.js` (ListToolsRequestSchema handler)
530
+ 2. Implement the handler in the CallToolRequestSchema handler
531
+ 3. Test with Claude Code or the MCP inspector
532
+
533
+ ---
534
+
535
+ ## Technical Details
536
+
537
+ ### Embedding Model
538
+
539
+ - **Model:** all-MiniLM-L6-v2
540
+ - **Dimensions:** 384
541
+ - **Pooling:** Mean pooling with attention mask
542
+ - **Normalization:** L2 normalized
543
+ - **Runtime:** ONNX Runtime (via `ort` crate)
544
+
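For readers unfamiliar with the pooling and normalization steps listed above, here is a minimal sketch in plain Rust. It assumes the per-token vectors have already been produced by the model; the function names are illustrative and not taken from `embedder.rs`.

```rust
/// Mean-pools per-token vectors, counting only tokens whose mask is 1.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u8]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, Vec::len);
    let mut sum = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (token, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 1 {
            for (acc, value) in sum.iter_mut().zip(token) {
                *acc += *value;
            }
            count += 1.0;
        }
    }
    let denom = count.max(1e-9); // guard against an all-padding input
    sum.iter().map(|v| v / denom).collect()
}

/// Scales a vector to unit length (L2 norm of 1).
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    v.iter().map(|x| x / norm).collect()
}
```

Once vectors are L2-normalized, cosine similarity between them reduces to a plain dot product.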
545
+ ### Vector Index
546
+
547
+ - **Algorithm:** HNSW (Hierarchical Navigable Small World)
548
+ - **Library:** `hnsw_rs`
549
+ - **Distance metric:** Cosine similarity
550
+ - **Persistence:** Bincode binary serialization (HNSW + metadata)
551
+
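To make the distance metric concrete, the sketch below ranks vectors by cosine similarity with an exhaustive scan. This is deliberately not HNSW: it is a brute-force stand-in that returns the exact neighbors HNSW approximates, and the function names are illustrative only.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exhaustive top-k: returns (index, score) pairs, best match first.
fn top_k(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending by score
    scored.truncate(k);
    scored
}
```

A scan like this touches every vector on every query; HNSW trades a small amount of recall to avoid that cost.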
552
+ ### Index Structure
553
+
554
+ Each indexed file produces a vector entry with metadata:
555
+
556
+ ```rust
557
+ struct IndexMetadata {
558
+ path: String,
559
+ file_type: String, // php, xml, js, template, graphql
560
+ magento_type: String, // controller, model, block, plugin, ...
561
+ class_name: Option<String>,
562
+ namespace: Option<String>,
563
+ methods: Vec<String>,
564
+ search_text: String, // Enriched searchable text
565
+ is_controller: bool,
566
+ is_plugin: bool,
567
+ is_observer: bool,
568
+ is_model: bool,
569
+ is_block: bool,
570
+ // ... 20+ pattern flags
571
+ }
572
+ ```
573
+
574
+ ### Performance Characteristics
575
+
576
+ | Operation | Time | Notes |
577
+ |-----------|------|-------|
578
+ | Full index (18K files) | ~1 min | Parallel parsing + batched ONNX embedding |
579
+ | Single query | 15-45ms | HNSW approximate nearest neighbor |
580
+ | Embedding generation | ~2ms | ONNX Runtime with CoreML/CUDA |
581
+ | Batch embedding (32) | ~30ms | Batched ONNX inference |
582
+ | Model load | ~500ms | One-time at startup |
583
+ | Index save/load | <1s | Bincode binary serialization |
584
+
585
+ ### Performance Optimizations
586
+
587
+ - **Batched ONNX embedding** -- 32 texts per inference call (vs. 1-at-a-time), 3-5x faster embedding
588
+ - **Dynamic thread scaling** -- ONNX intra-op threads scale to CPU core count (vs. hardcoded 4)
589
+ - **Thread-local AST parsers** -- each rayon thread gets its own tree-sitter parser (no mutex contention)
590
+ - **Bincode persistence** -- binary serialization replaces JSON (3-5x faster save/load, ~5x smaller files)
591
+ - **Adaptive HNSW capacity** -- pre-sized to actual vector count (no wasted memory)
592
+ - **Parallel HNSW insert** -- batch insert uses hnsw_rs parallel insertion on load and index
593
+ - **Optimized file discovery** -- no symlink following, uses cached DirEntry metadata
594
+
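The batching optimization above can be sketched as a loop over fixed-size chunks. Here `embed_batch` is a hypothetical stand-in for one batched ONNX inference call, and `embed_all` is an illustrative name rather than a function from the codebase.

```rust
const BATCH_SIZE: usize = 32;

/// Embeds `texts` by calling `embed_batch` once per chunk of up to 32 texts.
/// `embed_batch` must return one vector per input text, in order.
fn embed_all<F>(texts: &[String], mut embed_batch: F) -> Vec<Vec<f32>>
where
    F: FnMut(&[String]) -> Vec<Vec<f32>>,
{
    let mut embeddings = Vec::with_capacity(texts.len());
    for batch in texts.chunks(BATCH_SIZE) {
        embeddings.extend(embed_batch(batch));
    }
    embeddings
}
```

With 70 input texts this makes three inference calls (32, 32, and 6 texts) instead of 70 single-text calls.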
595
+ ---
596
+
597
+ ## Roadmap
598
+
599
+ - [ ] Hybrid search (semantic + BM25 keyword matching)
600
+ - [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
601
+ - [ ] Filtered search by file type at the vector level
602
+ - [ ] Incremental indexing (only re-index changed files)
603
+ - [ ] VSCode extension
604
+ - [ ] Web UI for browsing results
605
+ - [ ] Support for Magento 2 Commerce (B2B, Staging modules)
606
+
607
+ ---
608
+
609
+ ## License
610
+
611
+ MIT License. See [LICENSE](LICENSE) for details.
612
+
613
+ ---
614
+
615
+ ## Contributing
616
+
617
+ Contributions are welcome. Please:
618
+
619
+ 1. Fork the repository
620
+ 2. Create a feature branch (`git checkout -b feature/improvement`)
621
+ 3. Add tests for new functionality
622
+ 4. Run validation to ensure accuracy doesn't regress
623
+ 5. Submit a pull request
624
+
625
+ ---
626
+
627
+ Built with Rust and Node.js for the Magento community.
@@ -0,0 +1,13 @@
1
+ {
2
+ "mcpServers": {
3
+ "magector": {
4
+ "command": "node",
5
+ "args": ["src/mcp-server.js"],
6
+ "cwd": "/Users/file/Code/magector",
7
+ "env": {
8
+ "MAGENTO_ROOT": "/path/to/your/magento",
9
+ "MAGECTOR_DB": "/Users/file/Code/magector/magector.db"
10
+ }
11
+ }
12
+ }
13
+ }