magector 1.4.3 → 1.5.0

package/README.md CHANGED
@@ -1,8 +1,8 @@
1
1
  # Magector
2
2
 
3
- **Semantic code search engine for Magento 2 and Adobe Commerce, powered by ONNX embeddings and HNSW vector search.**
3
+ **Technology-aware MCP server for Magento 2 and Adobe Commerce with intelligent indexing and search.**
4
4
 
5
- Magector indexes an entire Magento 2 or Adobe Commerce codebase and lets you search it with natural language. Instead of grepping for keywords, ask questions like *"how are checkout totals calculated?"* or *"where is the product price determined?"* and get ranked, relevant results in under 50ms.
5
+ Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 21 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks *"how are checkout totals calculated?"* and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).
6
6
 
7
7
  [![Rust](https://img.shields.io/badge/rust-1.75+-orange.svg)](https://www.rust-lang.org)
8
8
  [![Node.js](https://img.shields.io/badge/node-18+-green.svg)](https://nodejs.org)
@@ -15,38 +15,26 @@ Magector indexes an entire Magento 2 or Adobe Commerce codebase and lets you sea
15
15
 
16
16
  ## Why Magector
17
17
 
18
- Magento 2 and Adobe Commerce have **18,000+ source files** across hundreds of modules. Finding the right code is slow:
18
+ Magento 2 and Adobe Commerce have **18,000+ PHP, XML, JS, PHTML, and GraphQL files** spread across hundreds of modules. The codebase relies heavily on indirection — plugins intercept methods defined in other modules, observers react to events dispatched elsewhere, `di.xml` rewires interfaces to concrete classes, and layout XML stitches blocks and templates together. No single file tells the full story.
19
19
 
20
- | Approach | Finds semantic matches | Understands Magento patterns | Speed (18K files) |
21
- |----------|:---------------------:|:---------------------------:|:-----------------:|
22
- | `grep` / `ripgrep` | No | No | 100-500ms |
23
- | IDE search | No | No | 200-1000ms |
24
- | GitHub search | Partial | No | 500-2000ms |
25
- | **Magector** | **Yes** | **Yes** | **10-45ms** |
20
+ Generic search tools (`grep`, IDE search, or the keyword matching built into AI assistants) can't bridge this gap. They find literal strings but can't connect *"how does checkout calculate totals?"* to `TotalsCollector.php` when the word "totals" appears in hundreds of unrelated files.
26
21
 
27
- Magector understands that a query about *"payment capture"* should return `Sales/Model/Order/Payment/Operations/CaptureOperation.php`, not just files containing the word "capture".
22
+ Magector solves this with three layers of intelligence:
28
23
 
29
- ---
30
-
31
- ## Magector vs Built-in AI Search
24
+ 1. **Semantic vector index** — every file is embedded into a 384-dimensional space (ONNX, all-MiniLM-L6-v2) where meaning matters more than keywords. A search for *"payment capture"* returns `CaptureOperation.php` because the embeddings are close, not because the file contains the word "capture".
32
25
 
33
- Claude Code and Cursor both have built-in code search -- but they rely on keyword matching (`grep`/`ripgrep`) and file-tree heuristics. On a Magento 2 / Adobe Commerce codebase with 18,000+ files, that approach breaks down fast.
26
+ 2. **Magento technology awareness** -- 20+ pattern detectors identify plugins, observers, controllers, blocks, cron jobs, GraphQL resolvers, DI preferences, layout XML, and more. Every search result is enriched with what kind of Magento component it is, so the AI client understands the code's role in the system.
34
27
 
35
- | Capability | Claude Code / Cursor (built-in) | Magector |
36
- |---|---|---|
37
- | **Search method** | Keyword grep / ripgrep | Semantic vector search (ONNX embeddings) |
38
- | **Understands intent** | No -- literal string matching only | Yes -- "payment capture" finds `CaptureOperation.php` |
39
- | **Magento pattern awareness** | None -- treats all PHP the same | Detects controllers, plugins, observers, blocks, resolvers, cron, and 20+ patterns |
40
- | **Query speed (36K vectors)** | 200-1000ms per grep pass; multiple rounds needed | 10-45ms single pass |
41
- | **Context window cost** | Reads many wrong files, burns tokens | Returns structured JSON with ranked results, methods, and snippets |
42
- | **Works offline** | Yes | Yes -- local ONNX model, no API calls |
43
- | **Setup** | Built-in | `npx magector init` (one command) |
28
+ 3. **Adaptive learning (SONA)** -- Magector tracks which results you actually use and adjusts future rankings with MicroLoRA feedback, getting smarter over time without any API calls.
44
29
 
45
- ### What this means in practice
30
+ The result: your AI assistant calls one MCP tool and gets ranked, pattern-enriched results in 10-45ms — instead of burning tokens grepping through dozens of wrong files. High relevance accuracy means the AI reads fewer, more targeted files, which optimizes context window usage, reduces API costs, and accelerates development cycles.
46
31
 
47
- Without Magector, asking Claude Code or Cursor *"how are checkout totals calculated?"* triggers multiple grep searches, reads dozens of files, and still may miss the right ones. With Magector, the AI calls `magento_search("checkout totals calculation")` and gets the exact files ranked by relevance in one step -- saving tokens and time.
48
-
49
- **Magector doesn't replace your AI tool -- it gives it a better search engine.**
32
+ | Approach | Semantic matches | Magento-aware | Speed (18K files) |
33
+ |----------|:---------------------:|:---------------------------:|:-----------------:|
34
+ | `grep` / `ripgrep` | No | No | 100-500ms |
35
+ | IDE search | No | No | 200-1000ms |
36
+ | GitHub search | Partial | No | 500-2000ms |
37
+ | **Magector** | **Yes** | **Yes** | **10-45ms** |
50
38
 
51
39
  ---
52
40
 
@@ -69,7 +57,8 @@ Without Magector, asking Claude Code or Cursor *"how are checkout totals calcula
69
57
  - **Diff analysis** -- risk scoring and change classification for git commits and staged changes
70
58
  - **Complexity analysis** -- cyclomatic complexity, function count, and hotspot detection across modules
71
59
  - **Fast** -- 10-45ms queries via persistent serve process, batched ONNX embedding with adaptive thread scaling
72
- - **MCP server** -- 20 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
60
+ - **LLM description enrichment** -- generate natural-language descriptions of di.xml files using Claude, stored in SQLite, and prepend them to embedding text so descriptions influence vector search ranking (not just post-retrieval display)
61
+ - **MCP server** -- 21 tools integrating with Claude Code, Cursor, and any MCP-compatible AI tool
73
62
  - **Clean architecture** -- Rust core handles all indexing/search, Node.js MCP server delegates to it
74
63
 
75
64
  ---
@@ -81,14 +70,15 @@ flowchart TD
81
70
  subgraph rust ["Rust Core"]
82
71
  A["AST Parser · PHP + JS"]
83
72
  B["Pattern Detection · 20+"]
73
+ B2["Description Enrichment"]
84
74
  C["ONNX Embedder · 384d"]
85
75
  D["HNSW + Reranking"]
86
- A --> B --> C --> D
76
+ A --> B --> B2 --> C --> D
87
77
  end
88
78
  subgraph node ["Node.js Layer"]
89
- E["MCP Server · 20 tools"]
79
+ E["MCP Server · 21 tools"]
90
80
  F["Persistent Serve"]
91
- G["CLI · init/index/search"]
81
+ G["CLI · init/index/search/describe"]
92
82
  E --> F
93
83
  G --> F
94
84
  end
@@ -105,7 +95,10 @@ flowchart TD
105
95
  A[Source File] --> B[AST Parser]
106
96
  B --> C[Pattern Detection]
107
97
  C --> D[Text Enrichment]
108
- D --> E[ONNX Embedding]
98
+ D --> D2{Description DB?}
99
+ D2 -->|Yes| D3["Prepend Description"]
100
+ D2 -->|No| E[ONNX Embedding]
101
+ D3 --> E
109
102
  E --> F[(HNSW Index)]
110
103
  A --> G[Metadata]
111
104
  G --> F
@@ -133,6 +126,7 @@ flowchart TD
133
126
  | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
134
127
  | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
135
128
  | CLI | `clap` | Command-line interface (index, search, serve, validate) |
129
+ | Descriptions | `rusqlite` (bundled SQLite) | LLM-generated di.xml descriptions stored in SQLite, prepended to embeddings |
136
130
  | SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
137
131
  | MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |
138
132
 
@@ -195,6 +189,7 @@ Commands:
195
189
  index Index a Magento codebase
196
190
  search Search the index semantically
197
191
  serve Start persistent server mode (stdin/stdout JSON protocol)
192
+ describe Generate LLM descriptions for di.xml files (requires ANTHROPIC_API_KEY)
198
193
  validate Run validation suite (downloads Magento if needed)
199
194
  download Download Magento 2 Open Source
200
195
  stats Show index statistics
@@ -207,33 +202,50 @@ Commands:
207
202
  magector-core index [OPTIONS]
208
203
 
209
204
  Options:
210
- -m, --magento-root <PATH> Path to Magento root directory
211
- -d, --database <PATH> Index database path [default: ./magector.db]
212
- -c, --model-cache <PATH> Model cache directory [default: ./models]
213
- -v, --verbose Enable verbose output
205
+ -m, --magento-root <PATH> Path to Magento root directory
206
+ -d, --database <PATH> Index database path [default: ./.magector/index.db]
207
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
208
+ --descriptions-db <PATH> Path to descriptions SQLite DB (descriptions are prepended to embeddings)
209
+ -v, --verbose Enable verbose output
214
210
  ```
215
211
 
212
+ When `--descriptions-db` is provided (or auto-detected as `sqlite.db` next to the index), descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
213
+
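The prefixing rule described above can be sketched in a few lines of JavaScript. This is an illustration only: the real logic lives in the Rust core, and `buildEmbeddingText` and `firstWords` are hypothetical names. The whitespace split is a crude stand-in for the model's tokenizer, used here just to show why text placed first is the text that survives truncation.

```javascript
// Hypothetical helper: applies the "Description: {text}\n\n" prefix from the README.
function buildEmbeddingText(fileContent, description) {
  // Files without a stored description are embedded unchanged.
  if (!description) return fileContent;
  return `Description: ${description}\n\n${fileContent}`;
}

// Crude proxy for a token window: keeps only the first `limit` whitespace-separated words.
// The actual model uses its own tokenizer, so real cut-off points will differ.
function firstWords(text, limit) {
  return text.split(/\s+/).filter(Boolean).slice(0, limit).join(' ');
}

const xml = '<config><preference for="A" type="B"/></config>';
const enriched = buildEmbeddingText(xml, 'Maps interface A to concrete class B.');
console.log(firstWords(enriched, 256));
```

Because the description comes first, its wording is the least likely part of the input to be cut off by a fixed-size window.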
216
214
  #### `search`
217
215
 
218
216
  ```bash
219
217
  magector-core search <QUERY> [OPTIONS]
220
218
 
221
219
  Options:
222
- -d, --database <PATH> Index database path [default: ./magector.db]
220
+ -d, --database <PATH> Index database path [default: ./.magector/index.db]
223
221
  -l, --limit <N> Number of results [default: 10]
224
222
  -f, --format <FORMAT> Output format: text, json [default: text]
225
223
  ```
226
224
 
225
+ #### `describe`
226
+
227
+ ```bash
228
+ magector-core describe [OPTIONS]
229
+
230
+ Options:
231
+ -m, --magento-root <PATH> Path to Magento root directory
232
+ -o, --output <PATH> Output SQLite database [default: ./.magector/sqlite.db]
233
+ --force Re-describe all files (ignore cache)
234
+ ```
235
+
236
+ Generates natural-language descriptions of di.xml files using the Anthropic API (Claude Sonnet). Requires `ANTHROPIC_API_KEY` environment variable. Descriptions are stored in a SQLite database and used during indexing to enrich embeddings. Only files with changed content hashes are re-described (incremental by default).
237
+
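The incremental behaviour described above (skip files whose content hash is unchanged, unless `--force` is given) could look roughly like the sketch below. The function name, the shape of the cache, and the idea that hashes are compared as plain strings are all assumptions made for illustration; the actual implementation is in the Rust core and is not shown in this diff.

```javascript
// Hypothetical sketch of the "only re-describe changed files" rule.
// files: array of { path, hash } for the di.xml files found on disk.
// cache: Map from path to { hash, description } loaded from the descriptions DB.
function selectFilesToDescribe(files, cache, force = false) {
  return files.filter((file) => {
    // --force ignores the cache entirely.
    if (force) return true;
    const cached = cache.get(file.path);
    // Describe new files and files whose content hash changed.
    return !cached || cached.hash !== file.hash;
  });
}

const cache = new Map([
  ['app/code/Vendor/A/etc/di.xml', { hash: 'aaa', description: 'cached text' }],
]);
const files = [
  { path: 'app/code/Vendor/A/etc/di.xml', hash: 'aaa' }, // unchanged, skipped
  { path: 'app/code/Vendor/B/etc/di.xml', hash: 'bbb' }, // new, described
];
console.log(selectFilesToDescribe(files, cache).map((f) => f.path));
```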
227
238
  #### `serve`
228
239
 
229
240
  ```bash
230
241
  magector-core serve [OPTIONS]
231
242
 
232
243
  Options:
233
- -d, --database <PATH> Index database path [default: ./magector.db]
234
- -c, --model-cache <PATH> Model cache directory [default: ./models]
235
- -m, --magento-root <PATH> Magento root (enables file watcher)
236
- --watch-interval <SECS> File watcher poll interval [default: 60]
244
+ -d, --database <PATH> Index database path [default: ./.magector/index.db]
245
+ -c, --model-cache <PATH> Model cache directory [default: ./models]
246
+ -m, --magento-root <PATH> Magento root (enables file watcher)
247
+ --descriptions-db <PATH> Path to descriptions SQLite DB
248
+ --watch-interval <SECS> File watcher poll interval [default: 60]
237
249
  ```
238
250
 
239
251
  Starts a persistent process that reads JSON queries from stdin and writes JSON responses to stdout. Keeps the ONNX model and HNSW index resident in memory for fast repeated queries.
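A client for this protocol mostly needs to frame requests and unwrap responses. The sketch below assumes one JSON object per line in each direction and assumes that failures are reported with `"ok": false`; neither detail is confirmed by the text here, and `encodeRequest` and `decodeResponse` are hypothetical helper names.

```javascript
// Assumption: requests are newline-delimited JSON objects written to the process's stdin.
function encodeRequest(request) {
  return JSON.stringify(request) + '\n';
}

// Assumption: each stdout line is a JSON envelope of the form {"ok": ..., "data": ...}.
function decodeResponse(line) {
  const response = JSON.parse(line);
  if (!response.ok) {
    throw new Error(`serve returned an error: ${line}`);
  }
  return response.data;
}

// Example using the "descriptions" command added in this release.
console.log(JSON.stringify(encodeRequest({ command: 'descriptions' })));
console.log(decodeResponse('{"ok":true,"data":{"running":true,"tracked_files":18234}}'));
```

In a real client the encoded string would be written to the stdin of a spawned `magector-core serve` process and responses read line by line from its stdout.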
@@ -257,6 +269,16 @@ When `--magento-root` is provided, a background file watcher polls for changed f
257
269
  // Response:
258
270
  {"ok":true,"data":{"running":true,"tracked_files":18234,"last_scan_changes":3,"interval_secs":60}}
259
271
 
272
+ // Descriptions (all LLM descriptions from SQLite DB):
273
+ {"command":"descriptions"}
274
+ // Response:
275
+ {"ok":true,"data":{"app/code/Magento/Catalog/etc/di.xml":{"hash":"...","description":"...","model":"claude-sonnet-4-5-20250929","timestamp":1769875137},...}}
276
+
277
+ // Describe (generate descriptions + auto-reindex affected files):
278
+ {"command":"describe"}
279
+ // Response:
280
+ {"ok":true,"data":{"files_found":371,"described":5,"skipped":366,"errors":0,"described_paths":["app/code/..."]}}
281
+
260
282
  // SONA feedback:
261
283
  {"command":"feedback","signals":[{"type":"refinement_to_plugin","query":"checkout totals","timestamp":1700000000000}]}
262
284
  // Response:
@@ -275,26 +297,30 @@ When `--magento-root` is provided, a background file watcher polls for changed f
275
297
  npx magector init [path] # Full setup: index + IDE config
276
298
  npx magector index [path] # Index (or re-index) Magento codebase
277
299
  npx magector search <query> # Search indexed code
300
+ npx magector describe [path] # Generate LLM descriptions for di.xml files
278
301
  npx magector stats # Show indexer statistics
279
302
  npx magector setup [path] # IDE setup only (no indexing)
280
303
  npx magector mcp # Start MCP server
281
304
  npx magector help # Show help
282
305
  ```
283
306
 
307
+ The `describe` command requires `ANTHROPIC_API_KEY`. After running `describe`, the next `index` automatically picks up the descriptions DB and embeds them into the vectors.
308
+
284
309
  ### Environment Variables
285
310
 
286
311
  | Variable | Description | Default |
287
312
  |----------|-------------|---------|
288
313
  | `MAGENTO_ROOT` | Path to Magento installation | Current directory |
289
- | `MAGECTOR_DB` | Path to index database | `./magector.db` |
314
+ | `MAGECTOR_DB` | Path to index database | `./.magector/index.db` |
290
315
  | `MAGECTOR_BIN` | Path to magector-core binary | Auto-detected |
291
316
  | `MAGECTOR_MODELS` | Path to ONNX model directory | `~/.magector/models/` |
317
+ | `ANTHROPIC_API_KEY` | API key for description generation (`describe` command) | — |
292
318
 
293
319
  ---
294
320
 
295
321
  ## MCP Server Tools
296
322
 
297
- The MCP server exposes 20 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
323
+ The MCP server exposes 21 tools for AI-assisted Magento 2 and Adobe Commerce development. All search tools return **structured JSON** with file paths, class names, methods, role badges, and content snippets -- enabling AI clients to parse results programmatically and minimize file-read round-trips.
298
324
 
299
325
  ### Output Format
300
326
 
@@ -371,6 +397,7 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
371
397
  |------|-------------|
372
398
  | `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
373
399
  | `magento_index` | Trigger re-indexing of the codebase |
400
+ | `magento_describe` | Generate LLM descriptions for di.xml files (requires `ANTHROPIC_API_KEY`), stored in SQLite, auto-reindexes affected files |
374
401
  | `magento_stats` | View index statistics |
375
402
 
376
403
  ### Tool Cross-References
@@ -436,6 +463,7 @@ magento_find_block("cart totals")
436
463
  magento_find_template("minicart")
437
464
  magento_analyze_diff({ commitHash: "abc123" })
438
465
  magento_complexity({ module: "Magento_Catalog", threshold: 10 })
466
+ magento_describe()
439
467
  magento_trace_flow({ entryPoint: "checkout/cart/add", depth: "deep" })
440
468
  magento_trace_flow({ entryPoint: "/V1/products" })
441
469
  magento_trace_flow({ entryPoint: "placeOrder", entryType: "graphql" })
@@ -492,7 +520,7 @@ pie title Test Pass Rate (101 queries)
492
520
 
493
521
  ### Integration Tests
494
522
 
495
- 64 integration tests covering MCP protocol compliance, tool schemas, tool calls, analysis tools, and stdout JSON integrity.
523
+ 66 integration tests covering MCP protocol compliance, tool schemas, tool calls (including `magento_describe`), analysis tools, and stdout JSON integrity.
496
524
 
497
525
  ### Running Tests
498
526
 
@@ -501,7 +529,7 @@ pie title Test Pass Rate (101 queries)
501
529
  npm run test:accuracy
502
530
  npm run test:accuracy:verbose
503
531
 
504
- # Integration tests (64 tests)
532
+ # Integration tests (66 tests)
505
533
  npm test
506
534
 
507
535
  # SONA/MicroLoRA benefit evaluation (180 queries, baseline vs post-training)
@@ -539,6 +567,7 @@ magector/
539
567
  │ ├── mcp-accuracy.test.js # E2E accuracy tests (101 queries)
540
568
  │ ├── mcp-sona.test.js # SONA feedback integration tests (8 tests)
541
569
  │ ├── mcp-sona-eval.test.js # SONA/MicroLoRA benefit evaluation (180 queries)
570
+ │ ├── describe-benefit-eval.test.js # Description enrichment benefit evaluation
542
571
  │ └── results/ # Test result artifacts
543
572
  │ ├── accuracy-report.json
544
573
  │ └── sona-eval-report.json
@@ -558,6 +587,7 @@ magector/
558
587
  │ │ ├── watcher.rs # File watcher for incremental re-indexing
559
588
  │ │ ├── ast.rs # Tree-sitter AST (PHP + JS)
560
589
  │ │ ├── magento.rs # Magento pattern detection (Rust)
590
+ │ │ ├── describe.rs # LLM description generation + SQLite storage
561
591
  │ │ ├── sona.rs # SONA feedback learning + MicroLoRA + EWC++
562
592
  │ │ └── validation.rs # 557 test cases, validation framework
563
593
  │ └── models/ # ONNX model files (auto-downloaded)
@@ -587,8 +617,9 @@ Magector scans every `.php`, `.js`, `.xml`, `.phtml`, and `.graphqls` file in a
587
617
  1. **AST parsing** -- Tree-sitter extracts class names, namespaces, methods, inheritance, and interface implementations from PHP and JavaScript files
588
618
  2. **Pattern detection** -- Identifies Magento-specific patterns: controllers, models, repositories, plugins, observers, blocks, GraphQL resolvers, admin grids, cron jobs, and more
589
619
  3. **Search text enrichment** -- Combines AST metadata with Magento pattern keywords to create semantically rich text representations
590
- 4. **Embedding** -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
591
- 5. **Indexing** -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
620
+ 4. **Description enrichment** -- If a descriptions SQLite DB is present, LLM-generated natural-language descriptions are prepended to the embedding text as `"Description: {text}\n\n"`, placing semantic DI concepts (preferences, plugins, virtual types, subsystem names) within the 256-token ONNX window
621
+ 5. **Embedding** -- ONNX Runtime generates 384-dimensional vectors using all-MiniLM-L6-v2
622
+ 6. **Indexing** -- Vectors are stored in an HNSW index for sub-millisecond approximate nearest neighbor search
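To make the last two steps concrete, here is a toy nearest-neighbour search over embedding vectors. It is an exact brute-force scan, not HNSW, and it assumes cosine similarity as the distance measure, which this section does not state. The three-dimensional vectors and the `nearest` helper are illustrative only; real vectors have 384 dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search: scores every entry, sorts by descending similarity.
// An HNSW index approximates this result without scanning every vector.
function nearest(query, entries, k) {
  return entries
    .map((entry) => ({ path: entry.path, score: cosineSimilarity(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const index = [
  { path: 'Model/CaptureOperation.php', vector: [0.9, 0.1, 0.0] },
  { path: 'Block/Minicart.php', vector: [0.0, 0.2, 0.9] },
];
console.log(nearest([1, 0, 0], index, 1));
```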
592
623
 
593
624
  ### 2. Searching
594
625
 
@@ -694,7 +725,7 @@ The MCP server tracks sequences of tool calls and sends feedback signals to the
694
725
  - Learning rate decays with repeated observations (diminishing returns)
695
726
  - Learned weights are keyed by normalized, order-independent query term hashes
696
727
  - Always active -- no feature flags or build-time opt-in required
697
- - Persisted via bincode to `<db_path>.sona`
728
+ - Persisted via bincode to `<db_path>.sona` (e.g., `.magector/index.db.sona`)
698
729
 
699
730
  **SONA v2: MicroLoRA + EWC++**
700
731
 
@@ -714,6 +745,34 @@ SONA v2 adds embedding-level adaptation via a MicroLoRA adapter and Elastic Weig
714
745
  cd rust-core && cargo build --release
715
746
  ```
716
747
 
748
+ ### 7. LLM Description Enrichment
749
+
750
+ Magector can generate natural-language descriptions of di.xml files using the Anthropic API and embed them directly into the vector index. This significantly improves search ranking for semantic queries about dependency injection.
751
+
752
+ **Workflow:**
753
+
754
+ ```bash
755
+ # 1. Generate descriptions (one-time, incremental — only re-describes changed files)
756
+ ANTHROPIC_API_KEY=sk-... npx magector describe /path/to/magento
757
+
758
+ # 2. Re-index with descriptions embedded into vectors
759
+ npx magector index /path/to/magento
760
+ ```
761
+
762
+ Or via the MCP tool: `magento_describe()` generates descriptions and auto-reindexes affected files in one step.
763
+
764
+ **How it works:** Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (`.magector/sqlite.db`). During indexing, descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.
765
+
766
+ **Measured impact** (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):
767
+
768
+ | Metric | Without Descriptions | With Descriptions | Delta |
769
+ |--------|---------------------|-------------------|-------|
770
+ | Precision@K | 1.6% | 20.3% | **+18.7 pp** |
771
+ | MRR | 0.031 | 0.330 | **+0.30** |
772
+ | NDCG@10 | 0.037 | 0.369 | **+0.33** |
773
+ | di.xml results/query | 0.2 | 3.0 | **+2.8** |
774
+ | Query win rate | — | — | **76%** |
775
+
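For readers unfamiliar with the ranking metrics in the table, the sketch below shows the textbook definitions of MRR and NDCG@K with binary relevance. The evaluation script itself is not shown in this diff, so its exact definitions may differ; in particular, this version computes the ideal ordering from the returned list only, which is a simplification.

```javascript
// flags: array of 0/1 relevance flags for one query's results, best-ranked first.
function reciprocalRank(flags) {
  const firstHit = flags.indexOf(1);
  return firstHit === -1 ? 0 : 1 / (firstHit + 1);
}

// Mean of the reciprocal ranks over all queries.
function meanReciprocalRank(queries) {
  if (queries.length === 0) return 0;
  const sum = queries.reduce((acc, flags) => acc + reciprocalRank(flags), 0);
  return sum / queries.length;
}

// Discounted cumulative gain over the first k positions.
function dcgAtK(flags, k) {
  return flags.slice(0, k).reduce((acc, rel, i) => acc + rel / Math.log2(i + 2), 0);
}

// DCG normalised by the DCG of the best possible ordering of the same flags.
function ndcgAtK(flags, k) {
  const ideal = [...flags].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(flags, k) / idealDcg;
}

console.log(meanReciprocalRank([[1, 0], [0, 1]]));
console.log(ndcgAtK([0, 1], 10));
```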
717
776
  ---
718
777
 
719
778
  ## Magento Patterns Detected
@@ -830,7 +889,7 @@ cargo run --release -- validate
830
889
  ### Testing
831
890
 
832
891
  ```bash
833
- # Integration tests (64 tests, requires indexed codebase)
892
+ # Integration tests (66 tests, requires indexed codebase)
834
893
  npm test
835
894
 
836
895
  # E2E accuracy tests (101 queries)
@@ -840,7 +899,7 @@ npm run test:accuracy:verbose
840
899
  # Run without index (unit + schema tests only)
841
900
  npm run test:no-index
842
901
 
843
- # Rust unit tests (33 tests including SONA)
902
+ # Rust unit tests (37 tests including SONA + descriptions)
844
903
  cd rust-core && cargo test
845
904
 
846
905
  # SONA integration tests (8 tests)
@@ -968,12 +1027,13 @@ gantt
968
1027
  SONA feedback :done, 2025-04, 30d
969
1028
  Incremental index :done, 2025-04, 30d
970
1029
  SONA v2 MicroLoRA :done, 2025-05, 15d
971
- Method chunking :active, 2025-06, 30d
972
- Intent detection :2025-07, 30d
973
- Type filtering :2025-08, 30d
1030
+ LLM descriptions :done, 2025-06, 30d
1031
+ Method chunking :active, 2025-07, 30d
1032
+ Intent detection :2025-08, 30d
1033
+ Type filtering :2025-09, 30d
974
1034
  section Future
975
- VSCode extension :2025-09, 60d
976
- Web UI :2025-11, 60d
1035
+ VSCode extension :2025-10, 60d
1036
+ Web UI :2025-12, 60d
977
1037
  ```
978
1038
 
979
1039
  - [x] Hybrid search (semantic + keyword re-ranking)
@@ -984,6 +1044,7 @@ gantt
984
1044
  - [x] Adobe Commerce support (B2B, Staging, and all Commerce-specific modules)
985
1045
  - [x] SONA feedback learning (search rankings adapt to MCP tool call patterns)
986
1046
  - [x] SONA v2 with MicroLoRA + EWC++ (embedding-level adaptation, prevents catastrophic forgetting)
1047
+ - [x] LLM description enrichment (generate di.xml descriptions via Claude, store in SQLite, embed into vectors for improved search ranking)
987
1048
  - [ ] Method-level chunking (per-method vectors for direct method search)
988
1049
  - [ ] Query intent classification (auto-detect "give me XML" vs "give me PHP")
989
1050
  - [ ] Filtered search by file type at the vector level
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "magector",
3
- "version": "1.4.3",
3
+ "version": "1.5.0",
4
4
  "description": "Semantic code search for Magento 2 — index, search, MCP server",
5
5
  "type": "module",
6
6
  "main": "src/mcp-server.js",
@@ -25,7 +25,9 @@
25
25
  "test:accuracy": "node tests/mcp-accuracy.test.js",
26
26
  "test:accuracy:verbose": "node tests/mcp-accuracy.test.js --verbose",
27
27
  "test:sona-eval": "node tests/mcp-sona-eval.test.js",
28
- "test:sona-eval:verbose": "node tests/mcp-sona-eval.test.js --verbose"
28
+ "test:sona-eval:verbose": "node tests/mcp-sona-eval.test.js --verbose",
29
+ "test:describe-eval": "node tests/describe-benefit-eval.test.js",
30
+ "test:describe-eval:verbose": "node tests/describe-benefit-eval.test.js --verbose"
29
31
  },
30
32
  "dependencies": {
31
33
  "@modelcontextprotocol/sdk": "^1.0.0",
@@ -35,10 +37,10 @@
35
37
  "ruvector": "^0.1.96"
36
38
  },
37
39
  "optionalDependencies": {
38
- "@magector/cli-darwin-arm64": "1.4.3",
39
- "@magector/cli-linux-x64": "1.4.3",
40
- "@magector/cli-linux-arm64": "1.4.3",
41
- "@magector/cli-win32-x64": "1.4.3"
40
+ "@magector/cli-darwin-arm64": "1.5.0",
41
+ "@magector/cli-linux-x64": "1.5.0",
42
+ "@magector/cli-linux-arm64": "1.5.0",
43
+ "@magector/cli-win32-x64": "1.5.0"
42
44
  },
43
45
  "keywords": [
44
46
  "magento",
package/src/cli.js CHANGED
@@ -6,6 +6,7 @@
6
6
  * The CLI resolves the binary and model paths, then shells out.
7
7
  */
8
8
  import { execFileSync, spawn } from 'child_process';
9
+ import { existsSync, mkdirSync } from 'fs';
9
10
  import path from 'path';
10
11
  import { resolveBinary } from './binary.js';
11
12
  import { ensureModels, resolveModels } from './model.js';
@@ -22,6 +23,7 @@ Usage:
22
23
  npx magector init [path] Full setup: index + IDE config
23
24
  npx magector index [path] Index (or re-index) Magento codebase
24
25
  npx magector search <query> Search indexed code
26
+ npx magector describe [path] Generate LLM descriptions for di.xml files
25
27
  npx magector mcp Start MCP server (for Claude Code / Cursor)
26
28
  npx magector stats Show index statistics
27
29
  npx magector setup [path] IDE setup only (no indexing)
@@ -33,7 +35,7 @@ Options:
33
35
 
34
36
  Environment Variables:
35
37
  MAGENTO_ROOT Path to Magento installation (default: cwd)
36
- MAGECTOR_DB Path to index database (default: ./magector.db)
38
+ MAGECTOR_DB Path to index database (default: ./.magector/index.db)
37
39
  MAGECTOR_BIN Path to magector-core binary
38
40
  MAGECTOR_MODELS Path to ONNX model directory
39
41
 
@@ -48,7 +50,7 @@ Examples:
48
50
 
49
51
  function getConfig() {
50
52
  return {
51
- dbPath: process.env.MAGECTOR_DB || './magector.db',
53
+ dbPath: process.env.MAGECTOR_DB || './.magector/index.db',
52
54
  magentoRoot: process.env.MAGENTO_ROOT || process.cwd()
53
55
  };
54
56
  }
@@ -62,6 +64,8 @@ function parseArgs(argv) {
62
64
  opts.format = argv[++i];
63
65
  } else if (argv[i] === '-v' || argv[i] === '--verbose') {
64
66
  opts.verbose = true;
67
+ } else if (argv[i] === '--force') {
68
+ opts.force = true;
65
69
  }
66
70
  }
67
71
  return opts;
@@ -76,13 +80,23 @@ async function runIndex(targetPath) {
76
80
  console.log(`\nIndexing: ${path.resolve(root)}`);
77
81
  console.log(`Database: ${path.resolve(config.dbPath)}\n`);
78
82
 
83
+ // Ensure .magector/ directory exists
84
+ const magectorDir = path.resolve(root, '.magector');
85
+ mkdirSync(magectorDir, { recursive: true });
86
+
79
87
  try {
80
- execFileSync(binary, [
88
+ const indexArgs = [
81
89
  'index',
82
90
  '-m', path.resolve(root),
83
91
  '-d', path.resolve(config.dbPath),
84
92
  '-c', modelPath
85
- ], { timeout: 600000, stdio: 'inherit' });
93
+ ];
94
+ // Pass descriptions DB if it exists
95
+ const descDbPath = path.resolve(root, '.magector', 'sqlite.db');
96
+ if (existsSync(descDbPath)) {
97
+ indexArgs.push('--descriptions-db', descDbPath);
98
+ }
99
+ execFileSync(binary, indexArgs, { timeout: 600000, stdio: 'inherit' });
86
100
  console.log('\nIndexing complete.');
87
101
  } catch (err) {
88
102
  if (err.status) {
@@ -140,6 +154,43 @@ function runStats() {
140
154
  }
141
155
  }
142
156
 
157
+ async function runDescribe(targetPath) {
158
+ const config = getConfig();
159
+ const root = targetPath || config.magentoRoot;
160
+ const binary = resolveBinary();
161
+ const opts = parseArgs(args.slice(1));
162
+ mkdirSync(path.resolve(root, '.magector'), { recursive: true });
163
+ const outputPath = path.resolve(root, '.magector', 'sqlite.db');
164
+
165
+ if (!process.env.ANTHROPIC_API_KEY) {
166
+ console.error('Error: ANTHROPIC_API_KEY environment variable is required for description generation.');
167
+ process.exit(1);
168
+ }
169
+
170
+ console.log(`\nGenerating LLM descriptions for di.xml files`);
171
+ console.log(`Magento root: ${path.resolve(root)}`);
172
+ console.log(`Output: ${outputPath}\n`);
173
+
174
+ const describeArgs = [
175
+ 'describe',
176
+ '-m', path.resolve(root),
177
+ '-o', outputPath
178
+ ];
179
+ if (opts.force) describeArgs.push('--force');
180
+
181
+ try {
182
+ execFileSync(binary, describeArgs, { timeout: 3600000, stdio: 'inherit' });
183
+ console.log('\nDescription generation complete.');
184
+ } catch (err) {
185
+ if (err.status) {
186
+ console.error('Description generation failed.');
187
+ process.exit(err.status);
188
+ }
189
+ console.error(`Description error: ${err.message}`);
190
+ process.exit(1);
191
+ }
192
+ }
193
+
143
194
  async function main() {
144
195
  switch (command) {
145
196
  case 'init':
@@ -165,6 +216,10 @@ async function main() {
165
216
  await import('./mcp-server.js');
166
217
  break;
167
218
 
219
+ case 'describe':
220
+ await runDescribe(args[1]);
221
+ break;
222
+
168
223
  case 'stats':
169
224
  runStats();
170
225
  break;
package/src/init.js CHANGED
@@ -159,24 +159,20 @@ function writeRules(projectPath, ides) {
159
159
  }
160
160
 
161
161
  /**
162
- * Add magector.db and magector.log to .gitignore if not already present.
162
+ * Add .magector/ to .gitignore if not already present.
163
163
  */
164
164
  function updateGitignore(projectPath) {
165
165
  const giPath = path.join(projectPath, '.gitignore');
166
166
  let updated = false;
167
167
  if (existsSync(giPath)) {
168
168
  const content = readFileSync(giPath, 'utf-8');
169
- if (!content.includes('magector.db')) {
170
- appendFileSync(giPath, '\n# Magector index\nmagector.db\n');
171
- updated = true;
172
- }
173
- if (!content.includes('magector.log')) {
174
- appendFileSync(giPath, 'magector.log\n');
169
+ if (!content.includes('.magector/')) {
170
+ appendFileSync(giPath, '\n# Magector data\n.magector/\n');
175
171
  updated = true;
176
172
  }
177
173
  return updated;
178
174
  }
179
- writeFileSync(giPath, '# Magector index\nmagector.db\nmagector.log\n');
175
+ writeFileSync(giPath, '# Magector data\n.magector/\n');
180
176
  return true;
181
177
  }
182
178
 
@@ -185,7 +181,8 @@ function updateGitignore(projectPath) {
185
181
  */
186
182
  export async function init(projectPath) {
187
183
  projectPath = path.resolve(projectPath || process.cwd());
188
- const dbPath = path.join(projectPath, 'magector.db');
184
+ mkdirSync(path.join(projectPath, '.magector'), { recursive: true });
185
+ const dbPath = path.join(projectPath, '.magector', 'index.db');
189
186
 
190
187
  console.log('\nMagector Init\n');
191
188
 
@@ -264,7 +261,7 @@ export async function init(projectPath) {
264
261
  // 8. Update .gitignore
265
262
  const giUpdated = updateGitignore(projectPath);
266
263
  if (giUpdated) {
267
- console.log('\nUpdated .gitignore with magector.db and magector.log');
264
+ console.log('\nUpdated .gitignore with .magector/');
268
265
  }
269
266
 
270
267
  // 9. Get stats and print summary
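Review note on the `.gitignore` change: the new check is a substring test (`content.includes('.magector/')`), so a line such as `old.magector/` would also count as "already present". A minimal sketch of an exact-line variant follows. `ensureIgnored` is a hypothetical helper and not part of the package. It is written as a pure function over the file text so it can be checked without touching the filesystem, and its separator handling differs slightly from the diff, which always prepends a newline when appending.

```javascript
// Hypothetical helper (not in the package): returns the .gitignore text with
// `entry` appended under a comment header, or the text unchanged when some
// line already equals `entry` exactly.
function ensureIgnored(content, entry, header = '# Magector data') {
  const lines = content.split(/\r?\n/).map((line) => line.trim());
  if (lines.includes(entry)) return content;
  // Only add a separating newline when the existing text does not end in one.
  const separator = content === '' || content.endsWith('\n') ? '' : '\n';
  return `${content}${separator}${header}\n${entry}\n`;
}
```

Calling it twice with the same entry returns the same text, which mirrors the "if not already present" intent stated in the doc comment.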
@@ -294,7 +291,7 @@ export async function init(projectPath) {
294
291
  */
295
292
  export async function setup(projectPath) {
296
293
  projectPath = path.resolve(projectPath || process.cwd());
297
- const dbPath = path.join(projectPath, 'magector.db');
294
+ const dbPath = path.join(projectPath, '.magector', 'index.db');
298
295
 
299
296
  console.log('\nMagector IDE Setup\n');
300
297
 
package/src/mcp-server.js CHANGED
@@ -16,7 +16,7 @@ import {
16
16
  } from '@modelcontextprotocol/sdk/types.js';
17
17
  import { execFileSync, spawn } from 'child_process';
18
18
  import { createInterface } from 'readline';
19
- import { existsSync, statSync, unlinkSync, appendFileSync, writeFileSync } from 'fs';
19
+ import { existsSync, statSync, unlinkSync, appendFileSync, writeFileSync, readFileSync, mkdirSync } from 'fs';
20
20
  import { stat } from 'fs/promises';
21
21
  import { glob } from 'glob';
22
22
  import path from 'path';
@@ -33,17 +33,42 @@ import { resolveBinary } from './binary.js';
33
33
  import { resolveModels } from './model.js';
34
34
 
35
35
  const config = {
36
- dbPath: process.env.MAGECTOR_DB || './magector.db',
36
+ dbPath: process.env.MAGECTOR_DB || './.magector/index.db',
37
37
  magentoRoot: process.env.MAGENTO_ROOT || process.cwd(),
38
38
  watchInterval: parseInt(process.env.MAGECTOR_WATCH_INTERVAL, 10) || 300,
39
39
  get rustBinary() { return resolveBinary(); },
40
40
  get modelCache() { return resolveModels() || process.env.MAGECTOR_MODELS || './models'; }
41
41
  };
42
42
 
43
+ // ─── LLM Descriptions ────────────────────────────────────────────
44
+ // Loaded from SQLite via serve "descriptions" command (generated by `npx magector describe`)
45
+
46
+ let descriptionMap = {};
47
+
48
+ async function loadDescriptions() {
49
+ // Try via serve process first
50
+ if (serveProcess && serveReady) {
51
+ try {
52
+ const resp = await serveQuery('descriptions', {}, 10000);
53
+ if (resp.ok && resp.data) {
54
+ descriptionMap = resp.data;
55
+ logToFile('INFO', `Loaded ${Object.keys(descriptionMap).length} LLM descriptions via serve`);
56
+ return;
57
+ }
58
+ } catch {
59
+ // Fall through
60
+ }
61
+ }
62
+
63
+ }
64
+
43
65
  // ─── Logging ─────────────────────────────────────────────────────
44
- // All activity is logged to magector.log in the project root (MAGENTO_ROOT).
66
+ // All activity is logged to .magector/magector.log.
67
+
68
+ // Ensure .magector/ directory exists for log and data files
69
+ try { mkdirSync(path.join(config.magentoRoot, '.magector'), { recursive: true }); } catch {}
45
70
 
46
- const LOG_PATH = path.join(config.magentoRoot, 'magector.log');
71
+ const LOG_PATH = path.join(config.magentoRoot, '.magector', 'magector.log');
47
72
 
48
73
  function logToFile(level, message) {
49
74
  const ts = new Date().toISOString();
@@ -125,7 +150,7 @@ function checkDbFormat() {
125
150
  }
126
151
 
127
152
  /**
128
- * Start a background re-index process. Logs to magector.log in project root.
153
+ * Start a background re-index process. Logs to .magector/magector.log.
129
154
  * MCP tools return an informative error while this is running.
130
155
  */
131
156
  function startBackgroundReindex() {
@@ -145,12 +170,17 @@ function startBackgroundReindex() {
145
170
  logToFile('WARN', `Database format incompatible. Starting background re-index.`);
146
171
  console.error(`Database format incompatible. Starting background re-index (log: ${LOG_PATH})`);
147
172
 
148
- reindexProcess = spawn(config.rustBinary, [
173
+ const reindexArgs = [
149
174
  'index',
150
175
  '-m', config.magentoRoot,
151
176
  '-d', config.dbPath,
152
177
  '-c', config.modelCache
153
- ], {
178
+ ];
179
+ const bgDescDbPath = path.join(config.magentoRoot, '.magector', 'sqlite.db');
180
+ if (existsSync(bgDescDbPath)) {
181
+ reindexArgs.push('--descriptions-db', bgDescDbPath);
182
+ }
183
+ reindexProcess = spawn(config.rustBinary, reindexArgs, {
154
184
  stdio: ['pipe', 'pipe', 'pipe'],
155
185
  env: rustEnv,
156
186
  });
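Review note: the same "append `--descriptions-db` only if the file exists" logic appears in this hunk, in `startServeProcess`, and in `rustIndex`. One way to keep the three call sites in step is a small argument builder, sketched below. `buildIndexArgs` and its `exists` parameter are assumptions for illustration. The subcommand and flags (`index`, `-m`, `-d`, `-c`, `--descriptions-db`) are the ones shown in the diff.

```javascript
// Hypothetical helper (not in the package): builds the argument list for the
// Rust binary's `index` subcommand. `exists` is injected so the function can
// be exercised without a real filesystem; in the server it would be
// fs.existsSync.
function buildIndexArgs({ magentoRoot, dbPath, modelCache }, descriptionsDbPath, exists) {
  const args = ['index', '-m', magentoRoot, '-d', dbPath, '-c', modelCache];
  if (descriptionsDbPath && exists(descriptionsDbPath)) {
    args.push('--descriptions-db', descriptionsDbPath);
  }
  return args;
}
```

The returned array could then be passed to `spawn` or `execFileSync` in place of the inline arrays.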
@@ -291,13 +321,18 @@ function startServeProcess() {
291
321
  if (config.magentoRoot && existsSync(config.magentoRoot)) {
292
322
  args.push('-m', config.magentoRoot, '--watch-interval', String(config.watchInterval));
293
323
  }
324
+ // Pass descriptions DB path if it exists
325
+ const descDbPath = path.join(config.magentoRoot || '.', '.magector', 'sqlite.db');
326
+ if (existsSync(descDbPath)) {
327
+ args.push('--descriptions-db', descDbPath);
328
+ }
294
329
  const proc = spawn(config.rustBinary, args,
295
330
  { stdio: ['pipe', 'pipe', 'pipe'], env: rustEnv });
296
331
 
297
332
  proc.on('error', () => { serveProcess = null; serveReady = false; if (serveReadyResolve) { serveReadyResolve(false); serveReadyResolve = null; } });
298
333
  proc.on('exit', () => { serveProcess = null; serveReady = false; if (serveReadyResolve) { serveReadyResolve(false); serveReadyResolve = null; } });
299
334
  proc.stderr.on('data', (d) => {
300
- // Log serve process stderr (watcher events, tracing, errors) to magector.log
335
+ // Log serve process stderr (watcher events, tracing, errors) to .magector/magector.log
301
336
  // Strip ANSI escape codes for clean log output
302
337
  const text = d.toString().replace(/\x1b\[[0-9;]*m/g, '').trim();
303
338
  if (text) logToFile('SERVE', text);
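Review note: the regular expression in this handler removes SGR colour sequences (`ESC [ ... m`) only. Other escape sequences, such as cursor movement, would pass through to the log. A small sketch of the same cleaning step as a standalone function, with `cleanLogChunk` as a hypothetical name:

```javascript
// Hypothetical helper (not in the package): mirrors the stderr cleaning in the
// diff. Strips ANSI SGR (colour/style) sequences and surrounding whitespace.
function cleanLogChunk(chunk) {
  return chunk.toString().replace(/\x1b\[[0-9;]*m/g, '').trim();
}
```

For example, `cleanLogChunk('\x1b[31mERROR\x1b[0m watcher failed\n')` yields `'ERROR watcher failed'`.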
@@ -398,12 +433,18 @@ function rustSearch(query, limit = 10) {
398
433
 
399
434
  function rustIndex(magentoRoot) {
400
435
  searchCache.clear(); // invalidate cache on reindex
401
- const result = execFileSync(config.rustBinary, [
436
+ const indexArgs = [
402
437
  'index',
403
438
  '-m', magentoRoot,
404
439
  '-d', config.dbPath,
405
440
  '-c', config.modelCache
406
- ], { encoding: 'utf-8', timeout: 600000, stdio: ['pipe', 'pipe', 'pipe'], env: rustEnv });
441
+ ];
442
+ // Pass descriptions DB if it exists
443
+ const descDbPath = path.join(magentoRoot, '.magector', 'sqlite.db');
444
+ if (existsSync(descDbPath)) {
445
+ indexArgs.push('--descriptions-db', descDbPath);
446
+ }
447
+ const result = execFileSync(config.rustBinary, indexArgs, { encoding: 'utf-8', timeout: 600000, stdio: ['pipe', 'pipe', 'pipe'], env: rustEnv });
407
448
  return result;
408
449
  }
409
450
 
@@ -477,6 +518,7 @@ function normalizeResult(r) {
477
518
  isModel: meta.is_model || meta.isModel,
478
519
  isBlock: meta.is_block || meta.isBlock,
479
520
  area: meta.area,
521
+ description: descriptionMap[meta.path]?.description || null,
480
522
  score: r.score
481
523
  };
482
524
  }
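Review note: the added `description` field relies on optional chaining, so a path with no entry in `descriptionMap` resolves to `null` rather than throwing. The sketch below isolates that lookup. `attachDescription` is a hypothetical name, and the map shape (path to an object with a `description` property) is inferred from the expression in the diff.

```javascript
// Hypothetical helper (not in the package): returns a copy of `result` with a
// `description` field taken from `descriptionMap`, or null when the path has
// no entry or the entry has an empty description.
function attachDescription(result, descriptionMap) {
  const description = descriptionMap[result.path]?.description || null;
  return { ...result, description };
}
```

Because missing entries become `null`, the later `if (r.description)` guard in `formatSearchResults` omits the field from output for files that were never described.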
@@ -809,6 +851,7 @@ function formatSearchResults(results) {
809
851
  if (r.magentoType) entry.magentoType = r.magentoType;
810
852
  if (r.type) entry.fileType = r.type;
811
853
  if (r.area && r.area !== 'global') entry.area = r.area;
854
+ if (r.description) entry.description = r.description;
812
855
 
813
856
  // Badges — concise role indicators
814
857
  const badges = [];
@@ -1163,6 +1206,20 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
1163
1206
  }
1164
1207
  }
1165
1208
  },
1209
+ {
1210
+ name: 'magento_describe',
1211
+ description: 'Generate LLM-powered natural language descriptions for di.xml files using Claude Sonnet via the Anthropic API. Requires ANTHROPIC_API_KEY env var. Descriptions are cached and automatically attached to search results for di.xml files.',
1212
+ inputSchema: {
1213
+ type: 'object',
1214
+ properties: {
1215
+ force: {
1216
+ type: 'boolean',
1217
+ description: 'Force regeneration of all descriptions, ignoring cached hashes. Default: false.',
1218
+ default: false
1219
+ }
1220
+ }
1221
+ }
1222
+ },
1166
1223
  {
1167
1224
  name: 'magento_trace_flow',
1168
1225
  description: 'Trace Magento execution flow from an entry point (route, API endpoint, GraphQL mutation, event, or cron job). Chains multiple searches to map controller → plugins → observers → templates for a given request path. Use this to understand how a request is processed end-to-end.',
@@ -1203,7 +1260,7 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
1203
1260
  return {
1204
1261
  content: [{
1205
1262
  type: 'text',
1206
- text: 'Re-indexing in progress. The database format was incompatible and is being rebuilt automatically. Check magector.log for progress. Search tools will be available once re-indexing completes.'
1263
+ text: 'Re-indexing in progress. The database format was incompatible and is being rebuilt automatically. Check .magector/magector.log for progress. Search tools will be available once re-indexing completes.'
1207
1264
  }],
1208
1265
  isError: true,
1209
1266
  };
@@ -1670,6 +1727,73 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
1670
1727
  return { content: [{ type: 'text', text }] };
1671
1728
  }
1672
1729
 
1730
+ case 'magento_describe': {
1731
+ if (!config.magentoRoot || !existsSync(config.magentoRoot)) {
1732
+ return {
1733
+ content: [{ type: 'text', text: 'Error: MAGENTO_ROOT not set or directory not found. Set the MAGENTO_ROOT environment variable.' }],
1734
+ isError: true
1735
+ };
1736
+ }
1737
+
1738
+ if (!process.env.ANTHROPIC_API_KEY) {
1739
+ return {
1740
+ content: [{ type: 'text', text: 'Error: ANTHROPIC_API_KEY environment variable is required for description generation.' }],
1741
+ isError: true
1742
+ };
1743
+ }
1744
+
1745
+ // Try via serve process first, fall back to direct exec
1746
+ const descOutput = path.join(config.magentoRoot, '.magector', 'sqlite.db');
1747
+ const force = args.force || false;
1748
+
1749
+ if (serveProcess && serveReady) {
1750
+ try {
1751
+ const resp = await serveQuery('describe', {
1752
+ magento_root: config.magentoRoot,
1753
+ output: descOutput,
1754
+ force
1755
+ }, 3600000);
1756
+ if (resp.ok) {
1757
+ // Reload descriptions after generation
1758
+ await loadDescriptions();
1759
+ searchCache.clear(); // Invalidate cache since embeddings changed
1760
+ return {
1761
+ content: [{
1762
+ type: 'text',
1763
+ text: `Description generation complete (auto-reindexed).\nTotal: ${resp.data.total_files}, Generated: ${resp.data.generated}, Skipped: ${resp.data.skipped}, Errors: ${resp.data.errors}`
1764
+ }]
1765
+ };
1766
+ }
1767
+ } catch {
1768
+ // Fall through to direct exec
1769
+ }
1770
+ }
1771
+
1772
+ // Direct execution fallback
1773
+ try {
1774
+ const descArgs = [
1775
+ 'describe',
1776
+ '-m', config.magentoRoot,
1777
+ '-o', descOutput
1778
+ ];
1779
+ if (force) descArgs.push('--force');
1780
+
1781
+ const output = execFileSync(config.rustBinary, descArgs, {
1782
+ encoding: 'utf-8', timeout: 3600000, stdio: ['pipe', 'pipe', 'pipe'], env: rustEnv
1783
+ });
1784
+ // Reload descriptions after generation
1785
+ await loadDescriptions();
1786
+ return {
1787
+ content: [{ type: 'text', text: `Description generation complete.\n${output}` }]
1788
+ };
1789
+ } catch (err) {
1790
+ return {
1791
+ content: [{ type: 'text', text: `Description generation failed: ${err.message}` }],
1792
+ isError: true
1793
+ };
1794
+ }
1795
+ }
1796
+
1673
1797
  case 'magento_trace_flow': {
1674
1798
  const entryPoint = args.entryPoint;
1675
1799
  const entryType = args.entryType || 'auto';
@@ -1775,6 +1899,9 @@ async function main() {
1775
1899
  }
1776
1900
  }
1777
1901
 
1902
+ // Load LLM descriptions (after serve process is ready for SQLite access)
1903
+ await loadDescriptions();
1904
+
1778
1905
  const transport = new StdioServerTransport();
1779
1906
  await server.connect(transport);
1780
1907
  logToFile('INFO', 'Magector MCP server started (Rust core backend)');