magector 1.4.3 → 1.5.1

package/README.md CHANGED
@@ -1,8 +1,8 @@
1
1
  # Magector
2
2
 
3
- **Semantic code search engine for Magento 2 and Adobe Commerce, powered by ONNX embeddings and HNSW vector search.**
3
+ **Technology-aware MCP server for Magento 2 and Adobe Commerce with intelligent indexing and search.**
4
4
 
5
- Magector indexes an entire Magento 2 or Adobe Commerce codebase and lets you search it with natural language. Instead of grepping for keywords, ask questions like *"how are checkout totals calculated?"* or *"where is the product price determined?"* and get ranked, relevant results in under 50ms.
5
+ Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 21 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks *"how are checkout totals calculated?"* and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).
6
6
 
7
7
  [![Rust](https://img.shields.io/badge/rust-1.75+-orange.svg)](https://www.rust-lang.org)
8
8
  [![Node.js](https://img.shields.io/badge/node-18+-green.svg)](https://nodejs.org)
@@ -15,38 +15,26 @@ Magector indexes an entire Magento 2 or Adobe Commerce codebase and lets you sea
15
15
 
16
16
  ## Why Magector
17
17
 
18
- Magento 2 and Adobe Commerce have **18,000+ source files** across hundreds of modules. Finding the right code is slow:
18
+ Magento 2 and Adobe Commerce have **18,000+ PHP, XML, JS, PHTML, and GraphQL files** spread across hundreds of modules. The codebase relies heavily on indirection — plugins intercept methods defined in other modules, observers react to events dispatched elsewhere, `di.xml` rewires interfaces to concrete classes, and layout XML stitches blocks and templates together. No single file tells the full story.
19
19
 
20
- | Approach | Finds semantic matches | Understands Magento patterns | Speed (18K files) |
21
- |----------|:---------------------:|:---------------------------:|:-----------------:|
22
- | `grep` / `ripgrep` | No | No | 100-500ms |
23
- | IDE search | No | No | 200-1000ms |
24
- | GitHub search | Partial | No | 500-2000ms |
25
- | **Magector** | **Yes** | **Yes** | **10-45ms** |
20
+ Generic search tools (`grep`, IDE search, or the keyword matching built into AI assistants) can't bridge this gap. They find literal strings but can't connect *"how does checkout calculate totals?"* to `TotalsCollector.php` when the word "totals" appears in hundreds of unrelated files.
26
21
 
27
- Magector understands that a query about *"payment capture"* should return `Sales/Model/Order/Payment/Operations/CaptureOperation.php`, not just files containing the word "capture".
22
+ Magector solves this with three layers of intelligence:
28
23
 
29
- ---
24
+ 1. **Semantic vector index** — every file is embedded into a 384-dimensional space (ONNX, all-MiniLM-L6-v2) where meaning matters more than keywords. A search for *"payment capture"* returns `CaptureOperation.php` because the embeddings are close, not because the file contains the word "capture".
30
25
 
31
- ## Magector vs Built-in AI Search
26
+ 2. **Magento technology awareness** — 20+ pattern detectors identify plugins, observers, controllers, blocks, cron jobs, GraphQL resolvers, DI preferences, layout XML, and more. Every search result is enriched with what kind of Magento component it is, so the AI client understands the code's role in the system.
32
27
 
33
- Claude Code and Cursor both have built-in code search -- but they rely on keyword matching (`grep`/`ripgrep`) and file-tree heuristics. On a Magento 2 / Adobe Commerce codebase with 18,000+ files, that approach breaks down fast.
28
+ 3. **Adaptive learning (SONA)** -- Magector tracks which results you actually use and adjusts future rankings with MicroLoRA feedback, getting smarter over time without any API calls.
34
29
 
35
- | Capability | Claude Code / Cursor (built-in) | Magector |
36
- |---|---|---|
37
- | **Search method** | Keyword grep / ripgrep | Semantic vector search (ONNX embeddings) |
38
- | **Understands intent** | No -- literal string matching only | Yes -- "payment capture" finds `CaptureOperation.php` |
39
- | **Magento pattern awareness** | None -- treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
40
- | **Query speed (36K vectors)** | 200-1000ms per grep pass; multiple rounds needed | 10-45ms single pass |
41
- | **Context window cost** | Reads many wrong files, burns tokens | Returns structured JSON with ranked results, methods, and snippets |
42
- | **Works offline** | Yes | Yes -- local ONNX model, no API calls |
43
- | **Setup** | Built-in | `npx magector init` (one command) |
30
+ The result: your AI assistant calls one MCP tool and gets ranked, pattern-enriched results in 10-45ms instead of burning tokens grepping through dozens of wrong files. High relevance accuracy means the AI reads fewer, more targeted files, which optimizes context window usage, reduces API costs, and accelerates development cycles.
44
31
 
45
- ### What this means in practice
46
-
47
- Without Magector, asking Claude Code or Cursor *"how are checkout totals calculated?"* triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls `magento_search("checkout totals calculation")` and gets the exact files ranked by relevance in one step -- saving tokens and time.
48
-
49
- **Magector doesn't replace your AI tool -- it gives it a better search engine.**
32
+ | Approach | Semantic matches | Magento-aware | Speed (18K files) |
33
+ |----------|:---------------------:|:---------------------------:|:-----------------:|
34
+ | `grep` / `ripgrep` | No | No | 100-500ms |
35
+ | IDE search | No | No | 200-1000ms |
36
+ | GitHub search | Partial | No | 500-2000ms |
37
+ | **Magector** | **Yes** | **Yes** | **10-45ms** |
50
38
 
51
39
  ---
52
40
 
@@ -69,7 +57,8 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
69
57
  - **Diff analysis** -- risk scoring and change classification for git commits and staged changes
70
58
  - **Complexity analysis** -- cyclomatic complexity, function count, and hotspot detection across modules
71
59
  - **Fast** -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
72
- - **MCP server** -- 20 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
60
+ - **LLM description enrichment** -- generate natural-language descriptions of di.xml files using Claude, stored in SQLite, and prepend them to embedding text so descriptions influence vector search ranking (not just post-retrieval display)
61
+ - **MCP server** -- 21 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
73
62
  - **Clean architecture** -- Rust core handles all indexing/search, Node.js MCP server delegates to it
74
63
 
75
64
  ---
@@ -77,22 +66,27 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
77
66
  ## Architecture
78
67
 
79
68
  ```mermaid
80
- flowchart TD
81
- subgraph rust ["Rust Core"]
82
- A["AST Parser · PHP + JS"]
83
- B["Pattern Detection · 20+"]
84
- C["ONNX Embedder · 384d"]
85
- D["HNSW + Reranking"]
86
- A --> B --> C --> D
87
- end
69
+ flowchart LR
88
70
  subgraph node ["Node.js Layer"]
89
- E["MCP Server · 20 tools"]
90
- F["Persistent Serve"]
91
- G["CLI · init/index/search"]
92
- E --> F
71
+ direction TB
72
+ G["CLI<br/>init · index · search · describe"]
73
+ E["MCP Server<br/>21 tools · LRU cache"]
74
+ F["Persistent Serve Process"]
93
75
  G --> F
76
+ E --> F
77
+ end
78
+
79
+ F -->|"stdin/stdout JSON"| rust
80
+
81
+ subgraph rust ["Rust Core"]
82
+ direction TB
83
+ A["AST Parser<br/>PHP · JS · XML"]
84
+ B["Pattern Detection<br/>20+ Magento patterns"]
85
+ B2["Description Enrichment<br/>LLM-powered di.xml summaries"]
86
+ C["ONNX Embedder<br/>all-MiniLM-L6-v2 · 384d"]
87
+ D["HNSW Vector Search<br/>hybrid reranking · SONA"]
88
+ A --> B --> B2 --> C --> D
94
89
  end
95
- node -->|stdin/stdout JSON| rust
96
90
 
97
91
  style rust fill:#f4a460,color:#000
98
92
  style node fill:#68b684,color:#000
@@ -101,26 +95,28 @@ flowchart TD
101
95
  ### Indexing Pipeline
102
96
 
103
97
  ```mermaid
104
- flowchart TD
105
- A[Source File] --> B[AST Parser]
106
- B --> C[Pattern Detection]
107
- C --> D[Text Enrichment]
108
- D --> E[ONNX Embedding]
109
- E --> F[(HNSW Index)]
110
- A --> G[Metadata]
111
- G --> F
98
+ flowchart LR
99
+ A["Source File"] --> B["AST Parser"]
100
+ B --> C["Pattern Detection"]
101
+ C --> D["Text Enrichment"]
102
+ D --> D2{"Descriptions DB?"}
103
+ D2 -->|Yes| D3["Prepend LLM Description"]
104
+ D2 -->|No| E["ONNX Embedding"]
105
+ D3 --> E
106
+ E --> F[("HNSW Index")]
107
+ A --> G["Metadata"] --> F
112
108
  ```
113
109
 
114
110
  ### Search Pipeline
115
111
 
116
112
  ```mermaid
117
- flowchart TD
118
- Q[Query] --> E1[Synonym Enrichment]
119
- E1 --> E2[ONNX Embedding]
120
- E2 --> H[HNSW Search]
121
- H --> R[Hybrid Reranking]
122
- R --> SA[SONA Adjustment + MicroLoRA]
123
- SA --> J[Structured JSON]
113
+ flowchart LR
114
+ Q["Query"] --> E1["Synonym Enrichment"]
115
+ E1 --> E2["ONNX Embedding"]
116
+ E2 --> H["HNSW Search"]
117
+ H --> R["Hybrid Reranking"]
118
+ R --> SA["SONA Adjustment"]
119
+ SA --> J["Structured JSON"]
124
120
  ```
125
121
 
126
122
  ### Components
@@ -133,6 +129,7 @@ flowchart TD
133
129
  | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
134
130
  | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
135
131
  | CLI | `clap` | Command-line interface (index, search, serve, validate) |
132
+ | Descriptions | `rusqlite` (bundled SQLite) | LLM-generated di.xml descriptions stored in SQLite, prepended to embeddings |
136
133
  | SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
137
134
  | MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |
138
135
 
@@ -154,13 +151,14 @@ npx magector init
154
151
  This single command handles the entire setup:
155
152
 
156
153
  ```mermaid
157
- flowchart TD
158
- A["npx magector init"] --> B[Verify Project]
159
- B --> C[Download Model]
160
- C --> D[Index Codebase]
161
- D --> E[Detect IDE]
162
- E --> F[Write Config]
163
- F --> G[Update .gitignore]
154
+ flowchart LR
155
+ A["npx magector init"] --> B["Verify<br/>Project"]
156
+ B --> C["Download<br/>ONNX Model"]
157
+ C --> D["Index<br/>Codebase"]
158
+ D --> E["Detect IDE<br/>Cursor · Claude Code"]
159
+ E --> E2["API Key<br/>(optional)"]
160
+ E2 --> F["Write MCP<br/>Config"]
161
+ F --> G["Update<br/>.gitignore"]
164
162
  ```
165
163
 
166
164
  ### 2. Search
@@ -195,6 +193,7 @@ Commands:
195
193
  index Index a Magento codebase
196
194
  search Search the index semantically
197
195
  serve Start persistent server mode (stdin/stdout JSON protocol)
196
+ describe Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY)
198
197
  validate Run validation suite (downloads Magento if needed)
199
198
  download Download Magento 2 Open Source
200
199
  stats Show index statistics
@@ -207,33 +206,50 @@ Commands:
207
206
  magector-core index [OPTIONS]
208
207
 
209
208
  Options:
210
- -m, --magento-root <PATH> Path to Magento root directory
211
- -d, --database <PATH> Index database path [default: ./magector.db]
212
- -c, --model-cache <PATH> Model cache directory [default: ./models]
213
- -v, --verbose Enable verbose output
209
+ -m, --magento-root <PATH> Path to Magento root directory
210
+ -d, --database <PATH> Index database path [default: ./.magector/index.db]
211
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
212
+ --descriptions-db <PATH> Path to descriptions SQLite DB (descriptions are prepended to embeddings)
213
+ -v, --verbose Enable verbose output
214
214
  ```
215
215
 
216
+ When `--descriptions-db` is provided (or auto-detected as `sqlite.db` next to the index), descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
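The prepending step described above can be pictured with a minimal sketch. The function name `buildEmbeddingText` and the whitespace trimming are assumptions made for illustration; the actual logic lives in the Rust core and may differ.

```javascript
// Hypothetical helper, not part of Magector's API: builds the text that
// would be handed to the embedder for one file.
function buildEmbeddingText(fileContent, description) {
  // Without a stored description, the raw file content is embedded as-is.
  if (typeof description !== "string" || description.trim() === "") {
    return fileContent;
  }
  // Prefix format taken from the README: "Description: {text}\n\n".
  // Trimming the description is an assumption of this sketch.
  return `Description: ${description.trim()}\n\n${fileContent}`;
}

const sample = buildEmbeddingText(
  '<config><preference for="A" type="B"/></config>',
  "Maps interface A to implementation B."
);
console.log(sample);
```

Because the description comes first, its terms are the ones most likely to survive truncation to the model's token window.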
217
+
216
218
  #### `search`
217
219
 
218
220
  ```bash
219
221
  magector-core search <QUERY> [OPTIONS]
220
222
 
221
223
  Options:
222
- -d, --database <PATH> Index database path [default: ./magector.db]
224
+ -d, --database <PATH> Index database path [default: ./.magector/index.db]
223
225
  -l, --limit <N> Number of results [default: 10]
224
226
  -f, --format <FORMAT> Output format: text, json [default: text]
225
227
  ```
226
228
 
229
+ #### `describe`
230
+
231
+ ```bash
232
+ magector-core describe [OPTIONS]
233
+
234
+ Options:
235
+ -m, --magento-root <PATH> Path to Magento root directory
236
+ -o, --output <PATH> Output SQLite database [default: ./.magector/sqlite.db]
237
+ --force Re-describe all files (ignore cache)
238
+ ```
239
+
240
+ Generates natural-language descriptions of di.xml files using the Anthropic API (Claude Sonnet). Requires the `ANTHROPIC_API_KEY` environment variable. Descriptions are stored in a SQLite database and used during indexing to enrich embeddings. Only files with changed content hashes are re-described (incremental by default).
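The incremental behaviour can be sketched as a comparison of content hashes. This is an illustration only: `selectFilesToDescribe` and the `Map`-based inputs are hypothetical, and the hash algorithm Magector uses is not stated in the README.

```javascript
// Hypothetical sketch: decide which di.xml files need a fresh description.
// currentHashes: Map of path -> hash of the file as it is on disk now.
// storedHashes:  Map of path -> hash recorded when the file was last described.
function selectFilesToDescribe(currentHashes, storedHashes, force = false) {
  const toDescribe = [];
  const skipped = [];
  for (const [path, hash] of currentHashes) {
    // A file is re-described when forced, never seen before, or changed.
    if (force || storedHashes.get(path) !== hash) {
      toDescribe.push(path);
    } else {
      skipped.push(path);
    }
  }
  return { toDescribe, skipped };
}

const plan = selectFilesToDescribe(
  new Map([["app/code/Vendor/A/etc/di.xml", "h1"], ["app/code/Vendor/B/etc/di.xml", "h2"]]),
  new Map([["app/code/Vendor/A/etc/di.xml", "h1"]])
);
console.log(plan.toDescribe, plan.skipped);
```

Passing `force = true` mirrors the `--force` flag: every file is selected regardless of its stored hash.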
241
+
227
242
  #### `serve`
228
243
 
229
244
  ```bash
230
245
  magector-core serve [OPTIONS]
231
246
 
232
247
  Options:
233
- -d, --database <PATH> Index database path [default: ./magector.db]
234
- -c, --model-cache <PATH> Model cache directory [default: ./models]
235
- -m, --magento-root <PATH> Magento root (enables file watcher)
236
- --watch-interval <SECS> File watcher poll interval [default: 60]
248
+ -d, --database <PATH> Index database path [default: ./.magector/index.db]
249
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
250
+ -m, --magento-root <PATH> Magento root (enables file watcher)
251
+ --descriptions-db <PATH> Path to descriptions SQLite DB
252
+ --watch-interval <SECS> File watcher poll interval [default: 60]
237
253
  ```
238
254
 
239
255
  Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.
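A client talking to this process needs to serialize requests and unwrap responses. The sketch below assumes one JSON object per line (the newline framing is a guess) and relies only on the `command`, `ok`, and `data` fields that appear in the documented examples. The helper names are hypothetical.

```javascript
// Hypothetical client-side helpers for the stdin/stdout JSON protocol.
// Assumption: each request and each response occupies exactly one line.
function encodeRequest(command, params = {}) {
  // `command` is written last so that params can never override it.
  return JSON.stringify({ ...params, command }) + "\n";
}

function parseResponse(line) {
  const message = JSON.parse(line);
  if (message.ok !== true) {
    // The shape of failure responses is not documented in this README,
    // so the raw line is surfaced instead of guessing at an error field.
    throw new Error(`serve request failed: ${line.trim()}`);
  }
  return message.data;
}

console.log(encodeRequest("describe"));
console.log(parseResponse('{"ok":true,"data":{"running":true,"interval_secs":60}}'));
```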
@@ -257,6 +273,16 @@ When `--magento-root` is provided, a background file watcher polls for changed f
257
273
  // Response:
258
274
  {"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}
259
275
 
276
+ // Descriptions (all LLM descriptions from SQLite DB):
277
+ {"command":"descriptions"}
278
+ // Response:
279
+ {"ok":true,"data":{"app/code/Magento/Catalog/etc/di.xml":{"hash":"...","description":"...","model":"claude-sonnet-4-5-20250929","timestamp":1769875137},...}}
280
+
281
+ // Describe (generate descriptions + auto-reindex affected files):
282
+ {"command":"describe"}
283
+ // Response:
284
+ {"ok":true,"data":{"files_found":371,"described":5,"skipped":366,"errors":0,"described_paths":["app/code/..."]}}
285
+
260
286
  // SONA feedback:
261
287
  {"command":"feedback","signals":[{"type":"refinement_to_plugin","query":"checkout totals","timestamp":1700000000000}]}
262
288
  // Response:
@@ -275,26 +301,30 @@ When `--magento-root` is provided, a background file watcher polls for changed f
275
301
  npx magector init [path] # Full setup: index + IDE config
276
302
  npx magector index [path] # Index (or re-index) Magento codebase
277
303
  npx magector search <query> # Search indexed code
304
+ npx magector describe [path] # Generate LLM descriptions for di.xml files
278
305
  npx magector stats # Show indexer statistics
279
306
  npx magector setup [path] # IDE setup only (no indexing)
280
307
  npx magector mcp # Start MCP server
281
308
  npx magector help # Show help
282
309
  ```
283
310
 
311
+ The `describe` command and `magento_describe` MCP tool require an Anthropic API key. During `npx magector init`, you are prompted to paste your key (optional). If provided, it is stored in the MCP config file as the `ANTHROPIC_API_KEY` environment variable so the MCP server can use it automatically. You can also set it manually later by adding `"ANTHROPIC_API_KEY": "sk-..."` to the `env` section in `.mcp.json` or `~/.cursor/mcp.json`.
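Adding the key by hand amounts to a small edit of the config object. The sketch below assumes the common `mcpServers` layout for `.mcp.json` and a server entry named `magector`; both are assumptions, so check the file that `npx magector init` actually wrote.

```javascript
// Hypothetical helper: returns a copy of an MCP config with the API key
// added to one server's `env` section. The input object is not mutated.
function withAnthropicKey(config, serverName, apiKey) {
  const servers = config.mcpServers ?? {};
  const server = servers[serverName];
  if (!server) {
    throw new Error(`no MCP server entry named "${serverName}"`);
  }
  return {
    ...config,
    mcpServers: {
      ...servers,
      [serverName]: {
        ...server,
        env: { ...(server.env ?? {}), ANTHROPIC_API_KEY: apiKey },
      },
    },
  };
}

// Example config; the command, args, and paths are placeholders.
const existing = {
  mcpServers: {
    magector: { command: "npx", args: ["magector", "mcp"], env: { MAGENTO_ROOT: "/srv/magento" } },
  },
};
console.log(JSON.stringify(withAnthropicKey(existing, "magector", "sk-..."), null, 2));
```

Since the key ends up in a plain-text file, keep that file out of version control.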
312
+
284
313
  ### Environment Variables
285
314
 
286
315
  | Variable | Description | Default |
287
316
  |----------|-------------|---------|
288
317
  | `MAGENTO_ROOT` | Path to Magento installation | Current directory |
289
- | `MAGECTOR_DB` | Path to index database | `./magector.db` |
318
+ | `MAGECTOR_DB` | Path to index database | `./.magector/index.db` |
290
319
  | `MAGECTOR_BIN` | Path to magector-core binary | Auto-detected |
291
320
  | `MAGECTOR_MODELS` | Path to ONNX model directory | `~/.magector/models/` |
321
+ | `ANTHROPIC_API_KEY` | API key for description generation (`describe` command) | — |
292
322
 
293
323
  ---
294
324
 
295
325
  ## MCP Server Tools
296
326
 
297
- The MCP server exposes 20 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
327
+ The MCP server exposes 21 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
298
328
 
299
329
  ### Output Format
300
330
 
@@ -371,6 +401,7 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
371
401
  |------|-------------|
372
402
  | `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
373
403
  | `magento_index` | Trigger re-indexing of the codebase |
404
+ | `magento_describe` | Generate LLM descriptions for di.xml files (requires `ANTHROPIC_API_KEY`), stored in SQLite, auto-reindexes affected files |
374
405
  | `magento_stats` | View index statistics |
375
406
 
376
407
  ### Tool Cross-References
@@ -378,7 +409,7 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
378
409
  Each tool description includes "See also" hints to help AI clients chain tools effectively:
379
410
 
380
411
  ```mermaid
381
- graph TD
412
+ graph LR
382
413
  cls["find_class"] --> plg["find_plugin"]
383
414
  cls --> prf["find_preference"]
384
415
  cls --> mtd["find_method"]
@@ -436,6 +467,7 @@ magento_find_block("cart totals")
436
467
  magento_find_template("minicart")
437
468
  magento_analyze_diff({ commitHash: "abc123" })
438
469
  magento_complexity({ module: "Magento_Catalog", threshold: 10 })
470
+ magento_describe()
439
471
  magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
440
472
  magento_trace_flow({ entryPoint: "/V1/products" })
441
473
  magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
@@ -492,7 +524,7 @@ pie title Test Pass Rate (101 queries)
492
524
 
493
525
  ### Integration Tests
494
526
 
495
- 64 integration tests covering MCP protocol compliance, tool schemas, tool calls, analysis tools, and stdout JSON integrity.
527
+ 66 integration tests covering MCP protocol compliance, tool schemas, tool calls (including `magento_describe`), analysis tools, and stdout JSON integrity.
496
528
 
497
529
  ### Running Tests
498
530
 
@@ -501,7 +533,7 @@ pie title Test Pass Rate (101 queries)
501
533
  npm run test:accuracy
502
534
  npm run test:accuracy:verbose
503
535
 
504
- # Integration tests (64 tests)
536
+ # Integration tests (66 tests)
505
537
  npm test
506
538
 
507
539
  # SONA/MicroLoRA benefit evaluation (180 queries, baseline vs post-training)
@@ -539,6 +571,7 @@ magector/
539
571
  │ ├── mcp-accuracy.test.js # E2E accuracy tests (101 queries)
540
572
  │ ├── mcp-sona.test.js # SONA feedback integration tests (8 tests)
541
573
  │ ├── mcp-sona-eval.test.js # SONA/MicroLoRA benefit evaluation (180 queries)
574
+ │ ├── describe-benefit-eval.test.js # Description enrichment benefit evaluation
542
575
  │ └── results/ # Test result artifacts
543
576
  │ ├── accuracy-report.json
544
577
  │ └── sona-eval-report.json
@@ -558,6 +591,7 @@ magector/
558
591
  │ │ ├── watcher.rs # File watcher for incremental re-indexing
559
592
  │ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
560
593
  │ │ ├── magento.rs # Magento pattern detection (Rust)
594
+ │ │ ├── describe.rs # LLM description generation + SQLite storage
561
595
  │ │ ├── sona.rs # SONA feedback learning + MicroLoRA + EWC++
562
596
  │ │ └── validation.rs # 557 test cases, validation framework
563
597
  │ └── models/ # ONNX model files (auto-downloaded)
@@ -587,8 +621,9 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
587
621
  1. **AST parsing** -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
588
622
  2. **Pattern detection** -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
589
623
  3. **Search text enrichment** -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
590
- 4. **Embedding** -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
591
- 5. **Indexing** -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
624
+ 4. **Description enrichment** -- If a descriptions SQLite DB is present, LLM-generated natural-language descriptions are prepended to the embedding text as `"Description: {text}\n\n"`, placing semantic DI concepts (preferences, plugins, virtual types, subsystem names) within the 256-token ONNX window
625
+ 5. **Embedding** -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
626
+ 6. **Indexing** -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
592
627
 
593
628
  ### 2. Searching
594
629
 
@@ -604,20 +639,16 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
604
639
  The MCP server spawns a persistent Rust process (`magector-core serve`) that keeps the ONNX model and HNSW index loaded in memory. Queries are sent as JSON over stdin and responses returned via stdout -- eliminating the ~2.6s cold-start overhead of loading the model per query. Falls back to single-shot `execFileSync` if the serve process is unavailable.
605
640
 
606
641
  ```mermaid
607
- flowchart TD
642
+ flowchart LR
608
643
  subgraph startup ["Startup (once)"]
609
- S1[Load Model] --> S2[Load Index]
610
- S2 --> S3[Ready Signal]
644
+ S1["Load Model"] --> S2["Load Index"] --> S3["Ready Signal"]
611
645
  end
646
+ startup --> query
612
647
  subgraph query ["Per Query (10-45ms)"]
613
- Q1[stdin JSON] --> Q2[Embed]
614
- Q2 --> Q3[HNSW Search]
615
- Q3 --> Q4[Rerank]
616
- Q4 --> Q5[stdout JSON]
648
+ Q1["stdin JSON"] --> Q2["Embed"] --> Q3["HNSW Search"] --> Q4["Rerank"] --> Q5["stdout JSON"]
617
649
  end
618
- startup --> query
619
650
  subgraph fallback ["Fallback"]
620
- F1[execFileSync ~2.6s]
651
+ F1["execFileSync ~2.6s"]
621
652
  end
622
653
 
623
654
  style startup fill:#e8f4e8,color:#000
@@ -632,17 +663,12 @@ When the serve process is started with `--magento-root`, a background thread pol
632
663
  Since `hnsw_rs` does not support point deletion, Magector uses a **tombstone** strategy: old vectors for modified/deleted files are marked as tombstoned and filtered out of search results. New vectors are appended. When tombstoned entries exceed 20% of total vectors, the HNSW graph is automatically rebuilt (compacted) to reclaim memory and restore search performance.
633
664
 
634
665
  ```mermaid
635
- flowchart TD
636
- W1[Sleep 60s] --> W2[Scan Filesystem]
637
- W2 --> W3{Changes?}
666
+ flowchart LR
667
+ W1["Sleep 60s"] --> W2["Scan Filesystem"] --> W3{"Changes?"}
638
668
  W3 -->|No| W1
639
- W3 -->|Yes| W4[Tombstone Old Vectors]
640
- W4 --> W5[Parse + Embed New Files]
641
- W5 --> W6[Append to HNSW]
642
- W6 --> W7{Tombstone > 20%?}
643
- W7 -->|Yes| W8[Compact / Rebuild HNSW]
644
- W7 -->|No| W9[Save to Disk]
645
- W8 --> W9
669
+ W3 -->|Yes| W4["Tombstone Old Vectors"] --> W5["Parse + Embed New Files"] --> W6["Append to HNSW"] --> W7{"Tombstone > 20%?"}
670
+ W7 -->|Yes| W8["Compact / Rebuild HNSW"] --> W9["Save to Disk"]
671
+ W7 -->|No| W9
646
672
  W9 --> W1
647
673
 
648
674
  style W4 fill:#f4e8e8,color:#000
@@ -694,7 +720,7 @@ The MCP server tracks sequences of tool calls and sends feedback signals to the
694
720
  - Learning rate decays with repeated observations (diminishing returns)
695
721
  - Learned weights are keyed by normalized, order-independent query term hashes
696
722
  - Always active -- no feature flags or build-time opt-in required
697
- - Persisted via bincode to `<db_path>.sona`
723
+ - Persisted via bincode to `<db_path>.sona` (e.g., `.magector/index.db.sona`)
698
724
 
699
725
  **SONA v2: MicroLoRA + EWC++**
700
726
 
@@ -714,6 +740,34 @@ SONA v2 adds embedding-level adaptation via a MicroLoRA adapter and Elastic Weig
714
740
  cd rust-core && cargo build --release
715
741
  ```
716
742
 
743
+ ### 7. LLM Description Enrichment
744
+
745
+ Magector can generate natural-language descriptions of di.xml files using the Anthropic API and embed them directly into the vector index. This significantly improves search ranking for semantic queries about dependency injection.
746
+
747
+ **Workflow:**
748
+
749
+ ```bash
750
+ # 1. Generate descriptions (one-time, incremental — only re-describes changed files)
751
+ ANTHROPIC_API_KEY=sk-... npx magector describe /path/to/magento
752
+
753
+ # 2. Re-index with descriptions embedded into vectors
754
+ npx magector index /path/to/magento
755
+ ```
756
+
757
+ Or via the MCP tool: `magento_describe()` generates descriptions and auto-reindexes affected files in one step.
758
+
759
+ **How it works:** Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (`.magector/sqlite.db`). During indexing, descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.
760
+
761
+ **Measured impact** (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):
762
+
763
+ | Metric | Without Descriptions | With Descriptions | Delta |
764
+ |--------|---------------------|-------------------|-------|
765
+ | Precision@K | 1.6% | 20.3% | **+18.7 pp** |
766
+ | MRR | 0.031 | 0.330 | **+0.30** |
767
+ | NDCG@10 | 0.037 | 0.369 | **+0.33** |
768
+ | di.xml results/query | 0.2 | 3.0 | **+2.8** |
769
+ | Query win rate | — | — | **76%** |
770
+
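For readers unfamiliar with the metrics in the table, the sketch below shows the standard definitions of Precision@K and MRR. It is not the project's evaluation script; the function names and input shapes are assumptions for illustration.

```javascript
// Precision@K: the fraction of the top-k results that are relevant.
function precisionAtK(ranked, relevant, k) {
  const hits = ranked.slice(0, k).filter((path) => relevant.has(path)).length;
  return hits / k;
}

// Reciprocal rank: 1 / position of the first relevant result, or 0 if none.
function reciprocalRank(ranked, relevant) {
  const index = ranked.findIndex((path) => relevant.has(path));
  return index === -1 ? 0 : 1 / (index + 1);
}

// MRR: the mean of the reciprocal ranks over all queries.
function meanReciprocalRank(queries) {
  if (queries.length === 0) return 0;
  const sum = queries.reduce((acc, q) => acc + reciprocalRank(q.ranked, q.relevant), 0);
  return sum / queries.length;
}

// Toy data: the relevant file is ranked third for this query.
const ranked = ["a.php", "b.php", "etc/di.xml", "c.php"];
const relevant = new Set(["etc/di.xml"]);
console.log(precisionAtK(ranked, relevant, 4), reciprocalRank(ranked, relevant));
```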
717
771
  ---
718
772
 
719
773
  ## Magento Patterns Detected
@@ -830,7 +884,7 @@ cargo run --release -- validate
830
884
  ### Testing
831
885
 
832
886
  ```bash
833
- # Integration tests (64 tests, requires indexed codebase)
887
+ # Integration tests (66 tests, requires indexed codebase)
834
888
  npm test
835
889
 
836
890
  # E2E accuracy tests (101 queries)
@@ -840,7 +894,7 @@ npm run test:accuracy:verbose
840
894
  # Run without index (unit + schema tests only)
841
895
  npm run test:no-index
842
896
 
843
- # Rust unit tests (33 tests including SONA)
897
+ # Rust unit tests (37 tests including SONA + descriptions)
844
898
  cd rust-core && cargo test
845
899
 
846
900
  # SONA integration tests (8 tests)
@@ -968,12 +1022,13 @@ gantt
968
1022
  SONA feedback :done, 2025-04, 30d
969
1023
  Incremental index :done, 2025-04, 30d
970
1024
  SONA v2 MicroLoRA :done, 2025-05, 15d
971
- Method chunking :active, 2025-06, 30d
972
- Intent detection :2025-07, 30d
973
- Type filtering :2025-08, 30d
1025
+ LLM descriptions :done, 2025-06, 30d
1026
+ Method chunking :active, 2025-07, 30d
1027
+ Intent detection :2025-08, 30d
1028
+ Type filtering :2025-09, 30d
974
1029
  section Future
975
- VSCode extension :2025-09, 60d
976
- Web UI :2025-11, 60d
1030
+ VSCode extension :2025-10, 60d
1031
+ Web UI :2025-12, 60d
977
1032
  ```
978
1033
 
979
1034
  - [x] Hybrid search (semantic + keyword re-ranking)
@@ -984,6 +1039,7 @@ gantt
984
1039
  - [x] Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)
985
1040
  - [x] SONA feedback learning (search rankings adapt to MCP tool call patterns)
986
1041
  - [x] SONA v2 with MicroLoRA + EWC++ (embedding-level adaptation, prevents catastrophic forgetting)
1042
+ - [x] LLM description enrichment (generate di.xml descriptions via Claude, store in SQLite, embed into vectors for improved search ranking)
987
1043
  - [ ] Method-level chunking (per-method vectors for direct method search)
988
1044
  - [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
989
1045
  - [ ] Filtered search by file type at the vector level
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "magector",
3
- "version": "1.4.3",
3
+ "version": "1.5.1",
4
4
  "description": "Semantic code search for Magento 2 — index, search, MCP server",
5
5
  "type": "module",
6
6
  "main": "src/mcp-server.js",
@@ -25,7 +25,9 @@
25
25
  "test:accuracy": "node tests/mcp-accuracy.test.js",
26
26
  "test:accuracy:verbose": "node tests/mcp-accuracy.test.js --verbose",
27
27
  "test:sona-eval": "node tests/mcp-sona-eval.test.js",
28
- "test:sona-eval:verbose": "node tests/mcp-sona-eval.test.js --verbose"
28
+ "test:sona-eval:verbose": "node tests/mcp-sona-eval.test.js --verbose",
29
+ "test:describe-eval": "node tests/describe-benefit-eval.test.js",
30
+ "test:describe-eval:verbose": "node tests/describe-benefit-eval.test.js --verbose"
29
31
  },
30
32
  "dependencies": {
31
33
  "@modelcontextprotocol/sdk": "^1.0.0",
@@ -35,10 +37,10 @@
35
37
  "ruvector": "^0.1.96"
36
38
  },
37
39
  "optionalDependencies": {
38
- "@magector/cli-darwin-arm64": "1.4.3",
39
- "@magector/cli-linux-x64": "1.4.3",
40
- "@magector/cli-linux-arm64": "1.4.3",
41
- "@magector/cli-win32-x64": "1.4.3"
40
+ "@magector/cli-darwin-arm64": "1.5.1",
41
+ "@magector/cli-linux-x64": "1.5.1",
42
+ "@magector/cli-linux-arm64": "1.5.1",
43
+ "@magector/cli-win32-x64": "1.5.1"
42
44
  },
43
45
  "keywords": [
44
46
  "magento",