magector 1.4.3 → 1.5.1
- package/README.md +162 -106
- package/package.json +8 -6
- package/src/cli.js +63 -4
- package/src/init.js +58 -23
- package/src/mcp-server.js +261 -25
- package/src/update.js +158 -0
package/README.md
CHANGED
|
@@ -1,8 +1,8 @@
 # Magector
 
-**
+**Technology-aware MCP server for Magento 2 and Adobe Commerce with intelligent indexing and search.**
 
-Magector
+Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 21 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks *"how are checkout totals calculated?"* and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).
 
 [](https://www.rust-lang.org)
 [](https://nodejs.org)
@@ -15,38 +15,26 @@ Magector indexes an entire Magento 2 or Adobe Commerce codebase and lets you sea
 
 ## Why Magector
 
-Magento 2 and Adobe Commerce have **18,000+
+Magento 2 and Adobe Commerce have **18,000+ PHP, XML, JS, PHTML, and GraphQL files** spread across hundreds of modules. The codebase relies heavily on indirection — plugins intercept methods defined in other modules, observers react to events dispatched elsewhere, `di.xml` rewires interfaces to concrete classes, and layout XML stitches blocks and templates together. No single file tells the full story.
 
-
-|----------|:---------------------:|:---------------------------:|:-----------------:|
-| `grep` / `ripgrep` | No | No | 100-500ms |
-| IDE search | No | No | 200-1000ms |
-| GitHub search | Partial | No | 500-2000ms |
-| **Magector** | **Yes** | **Yes** | **10-45ms** |
+Generic search tools — `grep`, IDE search, or the keyword matching built into AI assistants — can't bridge this gap. They find literal strings but can't connect *"how does checkout calculate totals?"* to `TotalsCollector.php` when the word "totals" appears in hundreds of unrelated files.
 
-Magector
+Magector solves this with three layers of intelligence:
 
-
+1. **Semantic vector index** — every file is embedded into a 384-dimensional space (ONNX, all-MiniLM-L6-v2) where meaning matters more than keywords. A search for *"payment capture"* returns `CaptureOperation.php` because the embeddings are close, not because the file contains the word "capture".
 
-
+2. **Magento technology awareness** — 20+ pattern detectors identify plugins, observers, controllers, blocks, cron jobs, GraphQL resolvers, DI preferences, layout XML, and more. Every search result is enriched with what kind of Magento component it is, so the AI client understands the code's role in the system.
 
-
+3. **Adaptive learning (SONA)** — Magector tracks which results you actually use and adjusts future rankings with MicroLoRA feedback, getting smarter over time without any API calls.
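As a rough illustration of the first layer in the list above, nearest-neighbor ranking over embeddings comes down to comparing vectors by cosine similarity. The sketch below uses hypothetical 3-dimensional toy vectors (the real embeddings are 384-dimensional and come from the ONNX model); the numbers are invented for illustration, and the brute-force scan only stands in for the HNSW index.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query vector, best first.
function rank(queryVec, docs) {
  return docs
    .map((d) => ({ path: d.path, score: cosine(queryVec, d.vec) }))
    .sort((x, y) => y.score - x.score);
}

// Hypothetical toy embeddings (not real model output).
const docs = [
  { path: "CaptureOperation.php", vec: [0.9, 0.1, 0.0] },
  { path: "TotalsCollector.php", vec: [0.1, 0.9, 0.1] },
  { path: "Minicart.phtml", vec: [0.0, 0.2, 0.9] },
];
const results = rank([0.8, 0.2, 0.1], docs);
console.log(results[0].path);
```

The query vector points mostly along the first axis, so the file whose toy embedding does the same ranks first even though no keyword comparison takes place.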
 
-
-|---|---|---|
-| **Search method** | Keyword grep / ripgrep | Semantic vector search (ONNX embeddings) |
-| **Understands intent** | No -- literal string matching only | Yes -- "payment capture" finds `CaptureOperation.php` |
-| **Magento pattern awareness** | None -- treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
-| **Query speed (36K vectors)** | 200-1000ms per grep pass; multiple rounds needed | 10-45ms single pass |
-| **Context window cost** | Reads many wrong files, burns tokens | Returns structured JSON with ranked results, methods, and snippets |
-| **Works offline** | Yes | Yes -- local ONNX model, no API calls |
-| **Setup** | Built-in | `npx magector init` (one command) |
+The result: your AI assistant calls one MCP tool and gets ranked, pattern-enriched results in 10-45ms — instead of burning tokens grepping through dozens of wrong files. High relevance accuracy means the AI reads fewer, more targeted files, which optimizes context window usage, reduces API costs, and accelerates development cycles.
 
-
-
-
-
-
+| Approach | Semantic matches | Magento-aware | Speed (18K files) |
+|----------|:---------------------:|:---------------------------:|:-----------------:|
+| `grep` / `ripgrep` | No | No | 100-500ms |
+| IDE search | No | No | 200-1000ms |
+| GitHub search | Partial | No | 500-2000ms |
+| **Magector** | **Yes** | **Yes** | **10-45ms** |
 
 ---
 
@@ -69,7 +57,8 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
 - **Diff analysis** -- risk scoring and change classification for git commits and staged changes
 - **Complexity analysis** -- cyclomatic complexity, function count, and hotspot detection across modules
 - **Fast** -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
-- **
+- **LLM description enrichment** -- generate natural-language descriptions of di.xml files using Claude, stored in SQLite, and prepend them to embedding text so descriptions influence vector search ranking (not just post-retrieval display)
+- **MCP server** -- 21 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
 - **Clean architecture** -- Rust core handles all indexing/search, Node.js MCP server delegates to it
 
 ---
@@ -77,22 +66,27 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
 ## Architecture
 
 ```mermaid
-flowchart
-subgraph rust ["Rust Core"]
-A["AST Parser · PHP + JS"]
-B["Pattern Detection · 20+"]
-C["ONNX Embedder · 384d"]
-D["HNSW + Reranking"]
-A --> B --> C --> D
-end
+flowchart LR
 subgraph node ["Node.js Layer"]
-
-
-
-
+direction TB
+G["CLI<br/>init · index · search · describe"]
+E["MCP Server<br/>21 tools · LRU cache"]
+F["Persistent Serve Process"]
 G --> F
+E --> F
+end
+
+F -->|"stdin/stdout JSON"| rust
+
+subgraph rust ["Rust Core"]
+direction TB
+A["AST Parser<br/>PHP · JS · XML"]
+B["Pattern Detection<br/>20+ Magento patterns"]
+B2["Description Enrichment<br/>LLM-powered di.xml summaries"]
+C["ONNX Embedder<br/>all-MiniLM-L6-v2 · 384d"]
+D["HNSW Vector Search<br/>hybrid reranking · SONA"]
+A --> B --> B2 --> C --> D
 end
-node -->|stdin/stdout JSON| rust
 
 style rust fill:#f4a460,color:#000
 style node fill:#68b684,color:#000
@@ -101,26 +95,28 @@ flowchart TD
 ### Indexing Pipeline
 
 ```mermaid
-flowchart
-A[Source File] --> B[AST Parser]
-B --> C[Pattern Detection]
-C --> D[Text Enrichment]
-D -->
-
-
-
+flowchart LR
+A["Source File"] --> B["AST Parser"]
+B --> C["Pattern Detection"]
+C --> D["Text Enrichment"]
+D --> D2{"Descriptions DB?"}
+D2 -->|Yes| D3["Prepend LLM Description"]
+D2 -->|No| E["ONNX Embedding"]
+D3 --> E
+E --> F[("HNSW Index")]
+A --> G["Metadata"] --> F
 ```
 
 ### Search Pipeline
 
 ```mermaid
-flowchart
-Q[Query] --> E1[Synonym Enrichment]
-E1 --> E2[ONNX Embedding]
-E2 --> H[HNSW Search]
-H --> R[Hybrid Reranking]
-R --> SA[SONA Adjustment
-SA --> J[Structured JSON]
+flowchart LR
+Q["Query"] --> E1["Synonym Enrichment"]
+E1 --> E2["ONNX Embedding"]
+E2 --> H["HNSW Search"]
+H --> R["Hybrid Reranking"]
+R --> SA["SONA Adjustment"]
+SA --> J["Structured JSON"]
 ```
 
 ### Components
@@ -133,6 +129,7 @@ flowchart TD
 | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
 | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
 | CLI | `clap` | Command-line interface (index, search, serve, validate) |
+| Descriptions | `rusqlite` (bundled SQLite) | LLM-generated di.xml descriptions stored in SQLite, prepended to embeddings |
 | SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
 | MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |
 
@@ -154,13 +151,14 @@ npx magector init
 This single command handles the entire setup:
 
 ```mermaid
-flowchart
-A["npx magector init"] --> B[Verify
-B --> C[Download Model]
-C --> D[Index
-D --> E[Detect IDE]
-E -->
-
+flowchart LR
+A["npx magector init"] --> B["Verify<br/>Project"]
+B --> C["Download<br/>ONNX Model"]
+C --> D["Index<br/>Codebase"]
+D --> E["Detect IDE<br/>Cursor · Claude Code"]
+E --> E2["API Key<br/>(optional)"]
+E2 --> F["Write MCP<br/>Config"]
+F --> G["Update<br/>.gitignore"]
 ```
 
 ### 2. Search
@@ -195,6 +193,7 @@ Commands:
 index Index a Magento codebase
 search Search the index semantically
 serve Start persistent server mode (stdin/stdout JSON protocol)
+describe Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY)
 validate Run validation suite (downloads Magento if needed)
 download Download Magento 2 Open Source
 stats Show index statistics
@@ -207,33 +206,50 @@ Commands:
 magector-core index [OPTIONS]
 
 Options:
--m, --magento-root <PATH>
--d, --database <PATH>
--c, --model-cache <PATH>
-
+-m, --magento-root <PATH> Path to Magento root directory
+-d, --database <PATH> Index database path [default: ./.magector/index.db]
+-c, --model-cache <PATH> Model cache directory [default: ./models]
+--descriptions-db <PATH> Path to descriptions SQLite DB (descriptions are prepended to embeddings)
+-v, --verbose Enable verbose output
 ```
 
+When `--descriptions-db` is provided (or auto-detected as `sqlite.db` next to the index), descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
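The prepending step described in that added README paragraph can be pictured as plain string concatenation. A minimal sketch, assuming a hypothetical helper `buildEmbeddingText` and a made-up description; only the `"Description: {text}\n\n"` prefix format comes from the README, and the actual implementation lives in the Rust core.

```javascript
// Build the text handed to the embedder.
// Hypothetical helper for illustration, not Magector's actual code.
function buildEmbeddingText(fileContent, description) {
  if (!description) return fileContent; // no description stored for this file
  return `Description: ${description}\n\n${fileContent}`;
}

const diXml = '<config><preference for="A" type="B"/></config>';
const text = buildEmbeddingText(diXml, "Rewires interface A to class B.");
console.log(text);
```

Placing the description first matters because a model with a fixed token window only sees the start of long inputs, so text appended at the end of a large di.xml file could be cut off before embedding.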
+
 #### `search`
 
 ```bash
 magector-core search <QUERY> [OPTIONS]
 
 Options:
--d, --database <PATH> Index database path [default:
+-d, --database <PATH> Index database path [default: ./.magector/index.db]
 -l, --limit <N> Number of results [default: 10]
 -f, --format <FORMAT> Output format: text, json [default: text]
 ```
 
+#### `describe`
+
+```bash
+magector-core describe [OPTIONS]
+
+Options:
+-m, --magento-root <PATH> Path to Magento root directory
+-o, --output <PATH> Output SQLite database [default: ./.magector/sqlite.db]
+--force Re-describe all files (ignore cache)
+```
+
+Generates natural-language descriptions of di.xml files using the Anthropic API (Claude Sonnet). Requires `ANTHROPIC_API_KEY` environment variable. Descriptions are stored in a SQLite database and used during indexing to enrich embeddings. Only files with changed content hashes are re-described (incremental by default).
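Incremental re-description by content hash could look roughly like the following. This is a sketch under assumptions: the hash function (a 32-bit FNV-1a here), the in-memory `Map` standing in for the SQLite table, and all function names are hypothetical, since the README does not say which hash Magector uses or how the table is laid out.

```javascript
// 32-bit FNV-1a over UTF-16 code units; a stand-in for whatever
// content hash the real tool computes.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Pick the files that need a fresh description.
// `store` maps path -> { hash, description }; `force` mirrors the --force flag.
function selectStale(files, store, force = false) {
  return files.filter((f) => {
    const prev = store.get(f.path);
    return force || !prev || prev.hash !== fnv1a(f.content);
  });
}

const store = new Map();
const files = [
  { path: "app/code/Vendor/A/etc/di.xml", content: "<config>a</config>" },
  { path: "app/code/Vendor/B/etc/di.xml", content: "<config>b</config>" },
];

// First run: nothing is cached, so both files are selected and recorded.
for (const f of selectStale(files, store)) {
  store.set(f.path, { hash: fnv1a(f.content), description: "(generated text)" });
}

// Second run after editing only B: only B is selected again.
files[1].content = "<config>b2</config>";
const stale = selectStale(files, store).map((f) => f.path);
console.log(stale);
```

Keying the cache on a content hash rather than a modification time means a file that is touched but not changed does not trigger another API call.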
+
 #### `serve`
 
 ```bash
 magector-core serve [OPTIONS]
 
 Options:
--d, --database <PATH>
--c, --model-cache <PATH>
--m, --magento-root <PATH>
---
+-d, --database <PATH> Index database path [default: ./.magector/index.db]
+-c, --model-cache <PATH> Model cache directory [default: ./models]
+-m, --magento-root <PATH> Magento root (enables file watcher)
+--descriptions-db <PATH> Path to descriptions SQLite DB
+--watch-interval <SECS> File watcher poll interval [default: 60]
 ```
 
 Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.
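On the client side, talking to such a process amounts to writing one JSON request and parsing one JSON response. The sketch below assumes one JSON document per line (the README does not spell out the framing) and uses hypothetical helper names; the `ok`/`data` envelope and the `descriptions` command are taken from the README's protocol examples, while the `error` field is an assumption.

```javascript
// Serialize a request as a single line of JSON.
// Newline framing is an assumption, not documented behavior.
function encodeRequest(command, params = {}) {
  return JSON.stringify({ command, ...params }) + "\n";
}

// Parse a response line and unwrap the { ok, data } envelope.
function decodeResponse(line) {
  const msg = JSON.parse(line);
  if (!msg.ok) {
    // The shape of failure responses is not shown in the README; `error` is a guess.
    throw new Error(msg.error || "serve process reported failure");
  }
  return msg.data;
}

const request = encodeRequest("descriptions");
const data = decodeResponse(
  '{"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}'
);
console.log(request.trim(), data.tracked_files);
```

In a real client the encoded line would be written to the child process's stdin and the reply read from its stdout; that plumbing is left out so the sketch runs without the `magector-core` binary.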
@@ -257,6 +273,16 @@ When `--magento-root` is provided, a background file watcher polls for changed f
 // Response:
 {"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}
 
+// Descriptions (all LLM descriptions from SQLite DB):
+{"command":"descriptions"}
+// Response:
+{"ok":true,"data":{"app/code/Magento/Catalog/etc/di.xml":{"hash":"...","description":"...","model":"claude-sonnet-4-5-20250929","timestamp":1769875137},...}}
+
+// Describe (generate descriptions + auto-reindex affected files):
+{"command":"describe"}
+// Response:
+{"ok":true,"data":{"files_found":371,"described":5,"skipped":366,"errors":0,"described_paths":["app/code/..."]}}
+
 // SONA feedback:
 {"command":"feedback","signals":[{"type":"refinement_to_plugin","query":"checkout totals","timestamp":1700000000000}]}
 // Response:
@@ -275,26 +301,30 @@ When `--magento-root` is provided, a background file watcher polls for changed f
 npx magector init [path] # Full setup: index + IDE config
 npx magector index [path] # Index (or re-index) Magento codebase
 npx magector search <query> # Search indexed code
+npx magector describe [path] # Generate LLM descriptions for di.xml files
 npx magector stats # Show indexer statistics
 npx magector setup [path] # IDE setup only (no indexing)
 npx magector mcp # Start MCP server
 npx magector help # Show help
 ```
 
+The `describe` command and `magento_describe` MCP tool require an Anthropic API key. During `npx magector init`, you are prompted to paste your key (optional). If provided, it is stored in the MCP config file as the `ANTHROPIC_API_KEY` environment variable so the MCP server can use it automatically. You can also set it manually later by adding `"ANTHROPIC_API_KEY": "sk-..."` to the `env` section in `.mcp.json` or `~/.cursor/mcp.json`.
+
 ### Environment Variables
 
 | Variable | Description | Default |
 |----------|-------------|---------|
 | `MAGENTO_ROOT` | Path to Magento installation | Current directory |
-| `MAGECTOR_DB` | Path to index database |
+| `MAGECTOR_DB` | Path to index database | `./.magector/index.db` |
 | `MAGECTOR_BIN` | Path to magector-core binary | Auto-detected |
 | `MAGECTOR_MODELS` | Path to ONNX model directory | `~/.magector/models/` |
+| `ANTHROPIC_API_KEY` | API key for description generation (`describe` command) | — |
 
 ---
 
 ## MCP Server Tools
 
-The MCP server exposes
+The MCP server exposes 21 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
 
 ### Output Format
 
@@ -371,6 +401,7 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
 |------|-------------|
 | `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
 | `magento_index` | Trigger re-indexing of the codebase |
+| `magento_describe` | Generate LLM descriptions for di.xml files (requires `ANTHROPIC_API_KEY`), stored in SQLite, auto-reindexes affected files |
 | `magento_stats` | View index statistics |
 
 ### Tool Cross-References
@@ -378,7 +409,7 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
 Each tool description includes "See also" hints to help AI clients chain tools effectively:
 
 ```mermaid
-graph
+graph LR
 cls["find_class"] --> plg["find_plugin"]
 cls --> prf["find_preference"]
 cls --> mtd["find_method"]
@@ -436,6 +467,7 @@ magento_find_block("cart totals")
 magento_find_template("minicart")
 magento_analyze_diff({ commitHash: "abc123" })
 magento_complexity({ module: "Magento_Catalog", threshold: 10 })
+magento_describe()
 magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
 magento_trace_flow({ entryPoint: "/V1/products" })
 magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
@@ -492,7 +524,7 @@ pie title Test Pass Rate (101 queries)
 
 ### Integration Tests
 
-
+66 integration tests covering MCP protocol compliance, tool schemas, tool calls (including `magento_describe`), analysis tools, and stdout JSON integrity.
 
 ### Running Tests
 
@@ -501,7 +533,7 @@ pie title Test Pass Rate (101 queries)
 npm run test:accuracy
 npm run test:accuracy:verbose
 
-# Integration tests (
+# Integration tests (66 tests)
 npm test
 
 # SONA/MicroLoRA benefit evaluation (180 queries, baseline vs post-training)
@@ -539,6 +571,7 @@ magector/
 │ ├── mcp-accuracy.test.js # E2E accuracy tests (101 queries)
 │ ├── mcp-sona.test.js # SONA feedback integration tests (8 tests)
 │ ├── mcp-sona-eval.test.js # SONA/MicroLoRA benefit evaluation (180 queries)
+│ ├── describe-benefit-eval.test.js # Description enrichment benefit evaluation
 │ └── results/ # Test result artifacts
 │ ├── accuracy-report.json
 │ └── sona-eval-report.json
@@ -558,6 +591,7 @@ magector/
 │ │ ├── watcher.rs # File watcher for incremental re-indexing
 │ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
 │ │ ├── magento.rs # Magento pattern detection (Rust)
+│ │ ├── describe.rs # LLM description generation + SQLite storage
 │ │ ├── sona.rs # SONA feedback learning + MicroLoRA + EWC++
 │ │ └── validation.rs # 557 test cases, validation framework
 │ └── models/ # ONNX model files (auto-downloaded)
@@ -587,8 +621,9 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
 1. **AST parsing** -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
 2. **Pattern detection** -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
 3. **Search text enrichment** -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
-4. **
-5. **
+4. **Description enrichment** -- If a descriptions SQLite DB is present, LLM-generated natural-language descriptions are prepended to the embedding text as `"Description: {text}\n\n"`, placing semantic DI concepts (preferences, plugins, virtual types, subsystem names) within the 256-token ONNX window
+5. **Embedding** -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
+6. **Indexing** -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
 
 ### 2. Searching
 
@@ -604,20 +639,16 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
 The MCP server spawns a persistent Rust process (`magector-core serve`) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses returned via stdout -- eliminating the ~2.6s cold-start overhead of loading the model per query. Falls back to single-shot `execFileSync` if the serve process is unavailable.
 
 ```mermaid
-flowchart
+flowchart LR
 subgraph startup ["Startup (once)"]
-S1[Load Model] --> S2[Load Index]
-S2 --> S3[Ready Signal]
+S1["Load Model"] --> S2["Load Index"] --> S3["Ready Signal"]
 end
+startup --> query
 subgraph query ["Per Query (10-45ms)"]
-Q1[stdin JSON] --> Q2[Embed]
-Q2 --> Q3[HNSW Search]
-Q3 --> Q4[Rerank]
-Q4 --> Q5[stdout JSON]
+Q1["stdin JSON"] --> Q2["Embed"] --> Q3["HNSW Search"] --> Q4["Rerank"] --> Q5["stdout JSON"]
 end
-startup --> query
 subgraph fallback ["Fallback"]
-F1[execFileSync ~2.6s]
+F1["execFileSync ~2.6s"]
 end
 
 style startup fill:#e8f4e8,color:#000
@@ -632,17 +663,12 @@ When the serve process is started with `--magento-root`, a background thread pol
 Since `hnsw_rs` does not support point deletion, Magector uses a **tombstone** strategy: old vectors for modified/deleted files are marked as tombstoned and filtered out of search results. New vectors are appended. When tombstoned entries exceed 20% of total vectors, the HNSW graph is automatically rebuilt (compacted) to reclaim memory and restore search performance.
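The tombstone bookkeeping can be sketched independently of HNSW. In the toy index below a plain array stands in for the graph, and the class and method names are hypothetical; only the idea of marking old vectors, filtering them from results, and rebuilding once tombstones exceed 20% of all vectors comes from the README.

```javascript
class TombstoneIndex {
  constructor(compactRatio = 0.2) {
    this.entries = []; // each entry: { path, vec, dead }
    this.compactRatio = compactRatio;
    this.rebuilds = 0;
  }

  // Insert or replace the vector for a file: tombstone the old one, append the new one.
  upsert(path, vec) {
    this.remove(path, false);
    this.entries.push({ path, vec, dead: false });
    this.maybeCompact();
  }

  // Mark every vector belonging to a file as tombstoned.
  remove(path, compact = true) {
    for (const e of this.entries) if (e.path === path) e.dead = true;
    if (compact) this.maybeCompact();
  }

  // Only live entries are visible to searches.
  live() {
    return this.entries.filter((e) => !e.dead);
  }

  // Rebuild (drop tombstoned entries) once they exceed the threshold.
  maybeCompact() {
    const dead = this.entries.length - this.live().length;
    if (this.entries.length > 0 && dead / this.entries.length > this.compactRatio) {
      this.entries = this.live();
      this.rebuilds++;
    }
  }
}

const index = new TombstoneIndex();
for (let i = 0; i < 10; i++) index.upsert(`file${i}.php`, [i]);
index.upsert("file0.php", [100]); // 11 entries, 1 tombstoned: below threshold
index.upsert("file1.php", [101]); // 12 entries, 2 tombstoned: below threshold
index.upsert("file2.php", [102]); // 13 entries, 3 tombstoned: triggers a rebuild
console.log(index.entries.length, index.rebuilds);
```

Deferring the rebuild until a threshold is crossed trades some wasted memory for not paying the full reconstruction cost on every file change.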
 
 ```mermaid
-flowchart
-W1[Sleep 60s] --> W2[Scan Filesystem]
-W2 --> W3{Changes?}
+flowchart LR
+W1["Sleep 60s"] --> W2["Scan Filesystem"] --> W3{"Changes?"}
 W3 -->|No| W1
-W3 -->|Yes| W4[Tombstone Old Vectors]
-
-
-W6 --> W7{Tombstone > 20%?}
-W7 -->|Yes| W8[Compact / Rebuild HNSW]
-W7 -->|No| W9[Save to Disk]
-W8 --> W9
+W3 -->|Yes| W4["Tombstone Old Vectors"] --> W5["Parse + Embed New Files"] --> W6["Append to HNSW"] --> W7{"Tombstone > 20%?"}
+W7 -->|Yes| W8["Compact / Rebuild HNSW"] --> W9["Save to Disk"]
+W7 -->|No| W9
 W9 --> W1
 
 style W4 fill:#f4e8e8,color:#000
@@ -694,7 +720,7 @@ The MCP server tracks sequences of tool calls and sends feedback signals to the
 - Learning rate decays with repeated observations (diminishing returns)
 - Learned weights are keyed by normalized, order-independent query term hashes
 - Always active -- no feature flags or build-time opt-in required
-- Persisted via bincode to `<db_path>.sona`
+- Persisted via bincode to `<db_path>.sona` (e.g., `.magector/index.db.sona`)
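An order-independent key of the kind mentioned in the list above can be derived by normalizing and sorting the query terms before hashing. A sketch with hypothetical normalization rules (lowercasing, splitting on non-alphanumerics, de-duplicating); the README does not specify the exact normalization or hash function, so the joined string is used directly as the key here.

```javascript
// Reduce a free-text query to a canonical key that ignores case,
// punctuation, duplicate terms, and word order.
function queryKey(query) {
  const terms = query
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
  return [...new Set(terms)].sort().join(" ");
}

console.log(queryKey("Checkout  totals") === queryKey("totals, checkout"));
```

With such a key, feedback recorded for one phrasing of a query also applies to reorderings of the same terms.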
 
 **SONA v2: MicroLoRA + EWC++**
 
@@ -714,6 +740,34 @@ SONA v2 adds embedding-level adaptation via a MicroLoRA adapter and Elastic Weig
 cd rust-core && cargo build --release
 ```
 
+### 7. LLM Description Enrichment
+
+Magector can generate natural-language descriptions of di.xml files using the Anthropic API and embed them directly into the vector index. This significantly improves search ranking for semantic queries about dependency injection.
+
+**Workflow:**
+
+```bash
+# 1. Generate descriptions (one-time, incremental — only re-describes changed files)
+ANTHROPIC_API_KEY=sk-... npx magector describe /path/to/magento
+
+# 2. Re-index with descriptions embedded into vectors
+npx magector index /path/to/magento
+```
+
+Or via the MCP tool: `magento_describe()` generates descriptions and auto-reindexes affected files in one step.
+
+**How it works:** Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (`.magector/sqlite.db`). During indexing, descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.
+
+**Measured impact** (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):
+
+| Metric | Without Descriptions | With Descriptions | Delta |
+|--------|---------------------|-------------------|-------|
+| Precision@K | 1.6% | 20.3% | **+18.7%** |
+| MRR | 0.031 | 0.330 | **+0.30** |
+| NDCG@10 | 0.037 | 0.369 | **+0.33** |
+| di.xml results/query | 0.2 | 3.0 | **+2.8** |
+| Query win rate | — | — | **76%** |
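For readers unfamiliar with the metrics in the table, the sketch below shows how precision@K and MRR are conventionally computed from ranked results. The queries and relevance judgments are invented toy data, not the experiment's; the README does not state the K used or how relevance was judged.

```javascript
// Fraction of the top-k results that are relevant.
function precisionAtK(ranked, relevant, k) {
  const top = ranked.slice(0, k);
  return top.filter((r) => relevant.has(r)).length / k;
}

// Reciprocal rank of the first relevant result (0 if none is found).
function reciprocalRank(ranked, relevant) {
  const i = ranked.findIndex((r) => relevant.has(r));
  return i === -1 ? 0 : 1 / (i + 1);
}

// Mean of a per-query metric.
const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

// Two hypothetical queries, each with a single relevant file.
const runs = [
  { ranked: ["a.php", "di.xml", "b.php"], relevant: new Set(["di.xml"]) },
  { ranked: ["di.xml", "c.php", "d.php"], relevant: new Set(["di.xml"]) },
];
const mrr = mean(runs.map((q) => reciprocalRank(q.ranked, q.relevant)));
const p3 = mean(runs.map((q) => precisionAtK(q.ranked, q.relevant, 3)));
console.log(mrr, p3);
```

MRR rewards placing the first relevant file near the top, while precision@K measures how much of the returned list is relevant at all, which is why the two can move by different amounts.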
+
 ---
 
 ## Magento Patterns Detected
@@ -830,7 +884,7 @@ cargo run --release -- validate
 ### Testing
 
 ```bash
-# Integration tests (
+# Integration tests (66 tests, requires indexed codebase)
 npm test
 
 # E2E accuracy tests (101 queries)
@@ -840,7 +894,7 @@ npm run test:accuracy:verbose
 # Run without index (unit + schema tests only)
 npm run test:no-index
 
-# Rust unit tests (
+# Rust unit tests (37 tests including SONA + descriptions)
 cd rust-core && cargo test
 
 # SONA integration tests (8 tests)
@@ -968,12 +1022,13 @@ gantt
 SONA feedback :done, 2025-04, 30d
 Incremental index :done, 2025-04, 30d
 SONA v2 MicroLoRA :done, 2025-05, 15d
-
-
-
+LLM descriptions :done, 2025-06, 30d
+Method chunking :active, 2025-07, 30d
+Intent detection :2025-08, 30d
+Type filtering :2025-09, 30d
 section Future
-VSCode extension :2025-
-Web UI :2025-
+VSCode extension :2025-10, 60d
+Web UI :2025-12, 60d
 ```
 
 - [x] Hybrid search (semantic + keyword re-ranking)
@@ -984,6 +1039,7 @@ gantt
 - [x] Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)
 - [x] SONA feedback learning (search rankings adapt to MCP tool call patterns)
 - [x] SONA v2 with MicroLoRA + EWC++ (embedding-level adaptation, prevents catastrophic forgetting)
+- [x] LLM description enrichment (generate di.xml descriptions via Claude, store in SQLite, embed into vectors for improved search ranking)
 - [ ] Method-level chunking (per-method vectors for direct method search)
 - [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
 - [ ] Filtered search by file type at the vector level
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
 {
 "name": "magector",
-"version": "1.
+"version": "1.5.1",
 "description": "Semantic code search for Magento 2 — index, search, MCP server",
 "type": "module",
 "main": "src/mcp-server.js",
@@ -25,7 +25,9 @@
 "test:accuracy": "node tests/mcp-accuracy.test.js",
 "test:accuracy:verbose": "node tests/mcp-accuracy.test.js --verbose",
 "test:sona-eval": "node tests/mcp-sona-eval.test.js",
-"test:sona-eval:verbose": "node tests/mcp-sona-eval.test.js --verbose"
+"test:sona-eval:verbose": "node tests/mcp-sona-eval.test.js --verbose",
+"test:describe-eval": "node tests/describe-benefit-eval.test.js",
+"test:describe-eval:verbose": "node tests/describe-benefit-eval.test.js --verbose"
 },
 "dependencies": {
 "@modelcontextprotocol/sdk": "^1.0.0",
@@ -35,10 +37,10 @@
 "ruvector": "^0.1.96"
 },
 "optionalDependencies": {
-"@magector/cli-darwin-arm64": "1.
-"@magector/cli-linux-x64": "1.
-"@magector/cli-linux-arm64": "1.
-"@magector/cli-win32-x64": "1.
+"@magector/cli-darwin-arm64": "1.5.1",
+"@magector/cli-linux-x64": "1.5.1",
+"@magector/cli-linux-arm64": "1.5.1",
+"@magector/cli-win32-x64": "1.5.1"
 },
 "keywords": [
 "magento",