magector 2.15.0 → 2.16.0

package/README.md CHANGED
@@ -5,7 +5,7 @@
5
5
  Magector is a Model Context Protocol (MCP) server that deeply understands Magento 2 and Adobe Commerce. It builds a semantic vector index of your entire codebase — 18,000+ files across hundreds of modules — and exposes 46 tools that let AI assistants search, navigate, and understand the code with domain-specific intelligence. Instead of grepping for keywords, your AI asks *"how are checkout totals calculated?"* and gets ranked, relevant results in under 50ms, enriched with Magento pattern detection (plugins, observers, controllers, DI preferences, layout XML, and 20+ more).
6
6
 
7
7
  [![Rust](https://img.shields.io/badge/rust-1.75+-orange.svg)](https://www.rust-lang.org)
8
- [![Node.js](https://img.shields.io/badge/node-22.5+-green.svg)](https://nodejs.org)
8
+ [![Node.js](https://img.shields.io/badge/node-18+-green.svg)](https://nodejs.org)
9
9
  [![Magento](https://img.shields.io/badge/magento-2.4.x-blue.svg)](https://magento.com)
10
10
  [![Adobe Commerce](https://img.shields.io/badge/adobe%20commerce-supported-blue.svg)](https://business.adobe.com/products/magento/magento-commerce.html)
11
11
  [![Accuracy](https://img.shields.io/badge/accuracy-99.2%25-brightgreen.svg)](#validation)
@@ -129,19 +129,48 @@ flowchart LR
129
129
  | JS parsing | `tree-sitter-javascript` | AMD/ES6 module detection |
130
130
  | Pattern detection | Custom Rust | 20+ Magento-specific patterns |
131
131
  | CLI | `clap` | Command-line interface (index, search, serve, validate) |
132
- | Descriptions | `rusqlite` (bundled SQLite) | LLM-generated di.xml descriptions stored in `.magector/sqlite.db`, prepended to embeddings |
133
- | Null-safety index | `node:sqlite` (Node.js 22.5+ built-in) | Method-chain enrichment index in `.magector/enrichment.db` — O(1) null-risk queries |
132
+ | Unified metadata | `rusqlite` (bundled SQLite) | LLM descriptions, method-chain enrichment, process state, cache — all in `.magector/data.db` |
134
133
  | SONA | Custom Rust | Feedback learning with MicroLoRA + EWC++ |
135
134
  | MCP server | `@modelcontextprotocol/sdk` | AI tool integration with structured JSON output |
136
135
 
137
136
  ---
138
137
 
138
+ ## Security
139
+
140
+ Magector operates on source code indexed from potentially-untrusted `vendor/` dependencies and is driven by an LLM that may be manipulated via prompt injection in indexed comments, docblocks, or markdown. The following hardening applies as of **v2.15.1**:
141
+
142
+ ### Path traversal protection
143
+
144
+ All tools that accept a `path` argument (`magento_read`, `magento_grep`, `magento_ast_search`, `magento_find_dataobject_issues`) route the input through `safePath()` / `safeRelPath()` helpers in `src/mcp-server.js`. These:
145
+
146
+ 1. Resolve the argument against `MAGENTO_ROOT` with `path.resolve()`, which normalizes `..` segments (symlinks are not followed during validation).
147
+ 2. Reject any resolved path that does not lie inside `MAGENTO_ROOT`.
148
+
149
+ This prevents a hostile `vendor/` comment from steering the LLM into a call such as `magento_read` on `../../home/user/.ssh/id_rsa`. Both the standalone case handlers and their `magento_batch` counterparts share the same chokepoint.
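
The containment check can be tried in isolation. The sketch below mirrors the `safePath()` helper added to `src/mcp-server.js` in this diff, with two assumptions made for the example only: it uses `path.posix` so the results are the same on any host OS, and `/srv/magento` is a hypothetical `MAGENTO_ROOT`.

```javascript
const path = require('node:path');

// Mirrors safePath() from this diff; path.posix is an assumption made here
// so the example is deterministic regardless of the host platform.
function safePath(root, rel) {
  if (rel === undefined || rel === null) return null;
  const rootAbs = path.posix.resolve(root);
  const joined = path.posix.resolve(rootAbs, String(rel));
  // Accept the root itself, or anything strictly below "root + separator".
  if (joined !== rootAbs && !joined.startsWith(rootAbs + path.posix.sep)) {
    return null;
  }
  return joined;
}

const root = '/srv/magento'; // hypothetical MAGENTO_ROOT

// Inside the root: returned as an absolute path.
console.log(safePath(root, 'vendor/magento/module-sales/etc/di.xml'));
// Relative escape via "..": rejected.
console.log(safePath(root, '../../home/user/.ssh/id_rsa')); // null
// Absolute path outside the root: rejected.
console.log(safePath(root, '/etc/passwd')); // null
// Sibling directory sharing the root's prefix: rejected.
console.log(safePath(root, '../magento-evil/file.php')); // null
```

The last case shows why the comparison appends the path separator to the root: a plain prefix check on `/srv/magento` would also accept `/srv/magento-evil`.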
150
+
151
+ ### Shell injection hardening in auto-update
152
+
153
+ `src/update.js` fetches the `latest` field from the npm registry and re-execs itself with the new version string. Previously this was interpolated into a shell command; a tampered registry response could inject shell metacharacters. As of v2.15.1:
154
+
155
+ - The re-exec passes argv as an **array** to a no-shell spawner (no intermediate shell).
156
+ - A semver-strict `isSafeVersion()` validator rejects any version string that contains shell metacharacters or does not match the `X.Y.Z` / `X.Y.Z-prerelease` form.
157
+ - Fails closed: the auto-update is silently skipped rather than run with a malformed version.
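
The implementation of `isSafeVersion()` in `src/update.js` is not part of this diff, so the following is only a minimal sketch of what a strict validator of this kind could look like. The regular expression, the length cap, and the `buildUpdateArgv()` helper (including its `--yes` and `magector@<version>` arguments) are assumptions made for illustration, not the package's actual code.

```javascript
// Hypothetical pattern: X.Y.Z with an optional -prerelease suffix made of
// alphanumeric identifiers separated by "." or "-". Nothing else is allowed,
// so shell metacharacters, spaces, and newlines can never match.
const SAFE_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z]+(?:[.-][0-9A-Za-z]+)*)?$/;

function isSafeVersion(version) {
  return typeof version === 'string'
    && version.length <= 64 // arbitrary cap, an assumption of this sketch
    && SAFE_VERSION.test(version);
}

// Hypothetical helper: builds the argv array for a no-shell spawn.
// Returns null so that the caller can skip the update (fail closed).
function buildUpdateArgv(version) {
  if (!isSafeVersion(version)) return null;
  return ['--yes', `magector@${version}`];
}

console.log(isSafeVersion('2.16.0'));           // true
console.log(isSafeVersion('2.16.0-beta.1'));    // true
console.log(isSafeVersion('2.16.0; rm -rf /')); // false
console.log(buildUpdateArgv('$(whoami)'));      // null
```

Because the version travels as one element of an argv array rather than inside a command string, a value that somehow passed validation would still reach the child process as a single argument and never be interpreted by a shell.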
158
+
159
+ ### Unix socket permissions
160
+
161
+ The serve-proxy Unix socket at `.magector/serve.sock` is created with `chmod 0600` immediately after `listen()`. On multi-user systems, another local account can no longer connect and query the vector index (which would leak indexed source snippets). The chmod is best-effort: on platforms that don't support it, the failure is logged to `.magector/magector.log` and the proxy keeps running.
162
+
163
+ ### Reporting vulnerabilities
164
+
165
+ If you find a security issue, please open an issue on the GitHub repo and mark it as security-related. Do not post reproducers that leak actual source contents from private codebases.
166
+
167
+ ---
168
+
139
169
  ## Quick Start
140
170
 
141
171
  ### Prerequisites
142
172
 
143
- - [Node.js 22.5+](https://nodejs.org) — required for built-in `node:sqlite` (used by `magento_enrich` / `magento_find_null_risks`)
144
- - [semgrep](https://semgrep.dev) (optional) — required for `magento_ast_search`: `pip install semgrep`
173
+ - [Node.js 18+](https://nodejs.org)
145
174
 
146
175
  ### 1. Initialize in Your Project
147
176
 
@@ -215,7 +244,7 @@ Options:
215
244
  -v, --verbose Enable verbose output
216
245
  ```
217
246
 
218
- When `--descriptions-db` is provided (or auto-detected as `sqlite.db` next to the index), descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
247
+ When `--descriptions-db` is provided (or auto-detected as `data.db` next to the index), descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content. This places semantic terms within the 256-token ONNX window, significantly improving retrieval of di.xml files for natural-language queries.
219
248
 
220
249
  #### `search`
221
250
 
@@ -235,7 +264,7 @@ magector-core describe [OPTIONS]
235
264
 
236
265
  Options:
237
266
  -m, --magento-root <PATH> Path to Magento root directory
238
- -o, --output <PATH> Output SQLite database [default: ./.magector/sqlite.db]
267
+ -o, --output <PATH> Output SQLite database [default: ./.magector/data.db]
239
268
  --force Re-describe all files (ignore cache)
240
269
  ```
241
270
 
@@ -473,7 +502,7 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
473
502
  |------|-------------|
474
503
  | `magento_module_structure` | Show complete module structure -- controllers, models, blocks, plugins, observers, configs |
475
504
  | `magento_index` | Trigger re-indexing of the codebase (also kicks off background enrichment) |
476
- | `magento_describe` | Generate LLM descriptions for di.xml files (requires `ANTHROPIC_API_KEY`), stored in `.magector/sqlite.db`, auto-reindexes affected files |
505
+ | `magento_describe` | Generate LLM descriptions for di.xml files (requires `ANTHROPIC_API_KEY`), stored in `.magector/data.db`, auto-reindexes affected files |
477
506
  | `magento_stats` | View index statistics |
478
507
  | `magento_batch` | Execute multiple tool queries in parallel in one MCP roundtrip. Supports all search, find, grep, read, and null-risk tools. Use to avoid N×3-5s roundtrip overhead. |
479
508
  | `magento_grep` | Exact text/regex search across PHP/XML/PHTML files (`grep -rn -E` internally). Supports `filesOnly` mode (like `grep -l`), `context` lines, `ignoreCase`, `include` patterns. **(v2.9)** |
@@ -486,10 +515,10 @@ Auto-detects entry type from pattern (`/V1/...` → API, `snake_case` → event,
486
515
 
487
516
  | Tool | Description |
488
517
  |------|-------------|
489
- | `magento_ast_search` | Structural PHP code search using [semgrep](https://semgrep.dev). Understands PHP AST matches by structure regardless of variable names, ignores comments/strings. Pattern syntax: `$X` = any expression, `$Y` = any identifier, `...` = any args. Example: `$X->getPayment()->$Y(...)`. Requires `semgrep`. **(v2.12)** |
490
- | `magento_enrich` | Build the method-chain enrichment index. Scans all `vendor/` PHP files for `->firstMethod()->secondMethod()` chains and detects null guards in surrounding code. Stores results in `.magector/enrichment.db` (SQLite, `node:sqlite`). Runs automatically after `magento_index`. **(v2.13)** |
518
+ | `magento_ast_search` | Structural PHP code search using tree-sitter. Named patterns: `dataobject-set-null` (detects the `setX(null)` anti-pattern) and `unchecked-method-chain` (detects `$this->dep->method()` chains). The pattern argument is an enum, not free text. Executed in the Rust serve process, with no external dependency. **(v2.16)** |
519
+ | `magento_enrich` | Build the method-chain enrichment index. Scans all `vendor/` PHP files for `->firstMethod()->secondMethod()` chains and detects null guards in surrounding code. Stores results in `.magector/data.db` (SQLite, via Rust serve). Runs automatically after `magento_index`. **(v2.13, moved to Rust v2.16)** |
491
520
  | `magento_find_null_risks` | Query the enrichment index for method chains without null guards. O(1) SQLite query instead of file scanning. Pass `firstMethod` to filter (e.g., `"getPayment"` → all `->getPayment()->anything()` without null guard). Requires `magento_enrich`. **(v2.13)** |
492
- | `magento_find_dataobject_issues` | Detect `setX(null)` anti-pattern on Magento `DataObject` subclasses. `setX(null)` stores `['x' => null]` in `_data` — `hasX()` (via `array_key_exists`) returns `true` even when the value is `null`, creating false-positive guard conditions. Use during field-lifecycle audits or when debugging "value persists but shouldn't" bugs. Requires `semgrep`. **(v2.15)** |
521
+ | `magento_find_dataobject_issues` | Detect `setX(null)` anti-pattern on Magento `DataObject` subclasses. `setX(null)` stores `['x' => null]` in `_data` — `hasX()` (via `array_key_exists`) returns `true` even when the value is `null`, creating false-positive guard conditions. Use during field-lifecycle audits or when debugging "value persists but shouldn't" bugs. Uses tree-sitter. **(v2.15, tree-sitter v2.16)** |
493
522
 
494
523
  ### Search Enhancements (v2.1)
495
524
 
@@ -877,7 +906,7 @@ npx magector index /path/to/magento
877
906
 
878
907
  Or via the MCP tool: `magento_describe()` generates descriptions and auto-reindexes affected files in one step.
879
908
 
880
- **How it works:** Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (`.magector/sqlite.db`). During indexing, descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.
909
+ **How it works:** Each di.xml file is sent to Claude Sonnet with a prompt optimized for semantic search retrieval. The resulting description (~70 words) is stored in a SQLite database (`.magector/data.db`). During indexing, descriptions are prepended to the embedding text as `"Description: {text}\n\n"` before the raw file content, placing semantic terms (preferences, plugins, virtual types, subsystem names) within the ONNX model's 256-token window.
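
The prepending step described above amounts to simple string composition. The real indexing runs in the Rust core and is not shown in this diff; the JavaScript below is only an illustrative sketch, and `buildEmbeddingText()` is a hypothetical name.

```javascript
// Hypothetical helper illustrating the documented text layout:
// "Description: {text}\n\n" followed by the raw file content.
function buildEmbeddingText(fileContent, description) {
  if (!description) return fileContent; // no stored description: embed the file as-is
  return `Description: ${description}\n\n${fileContent}`;
}

const diXml = '<config><preference for="A" type="B"/></config>';
console.log(buildEmbeddingText(diXml, 'Maps interface A to implementation B.'));
```

Putting the description first matters because the embedding model only sees the first 256 tokens: the semantic terms land inside that window even when the XML that follows gets truncated.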
881
910
 
882
911
  **Measured impact** (A/B experiment, 25 queries, Magento 2.4.7, 17,891 vectors, 371 described files):
883
912
 
@@ -1218,8 +1247,8 @@ All MCP server activity is logged to `.magector/magector.log` in the Magento pro
1218
1247
  | Level | Meaning |
1219
1248
  |-------|---------|
1220
1249
  | `INFO` | Normal operations: startup config, tool completion, search fallbacks, enrichment progress |
1221
- | `WARN` | Recoverable issues: slow grep queries (>5s), missing enrichment.db, file read errors, serve process disconnects |
1222
- | `ERR` | Failures: semgrep crashes, transaction rollbacks, serve process errors, tool execution errors |
1250
+ | `WARN` | Recoverable issues: slow grep queries (>5s), missing data.db, file read errors, serve process disconnects |
1251
+ | `ERR` | Failures: AST query errors, transaction rollbacks, serve process errors, tool execution errors |
1223
1252
  | `REQ` | Every tool call with full input parameters (JSON) |
1224
1253
  | `RES` | Tool completion with elapsed time in milliseconds |
1225
1254
  | `QUERY` | Rust serve process queries (search, feedback) |
@@ -1240,7 +1269,7 @@ grep '\[RES\]' .magector/magector.log | tail -20
1240
1269
  # Enrichment/null-risk analysis
1241
1270
  grep 'enrich:\|null_risks:' .magector/magector.log | tail -20
1242
1271
 
1243
- # AST search (semgrep) issues
1272
+ # AST search (tree-sitter) issues
1244
1273
  grep 'ast_search:' .magector/magector.log | tail -20
1245
1274
 
1246
1275
  # Batch query breakdown (per-tool timing)
@@ -1257,7 +1286,7 @@ grep 'server starting\|Config:\|primary\|Serve process' .magector/magector.log |
1257
1286
 
1258
1287
  Every tool call logs `[REQ]` with input parameters and `[RES]` with elapsed time. Additionally:
1259
1288
 
1260
- - **`magento_ast_search`** — semgrep pattern, target path, execution time, result count, semgrep errors
1289
+ - **`magento_ast_search`** — tree-sitter pattern, target path, execution time, result count, query errors
1261
1290
  - **`magento_enrich`** — file count, progress every 10k files, read errors, transaction failures, final summary
1262
1291
  - **`magento_find_null_risks`** — query parameters, result count, query timing, missing DB warnings
1263
1292
  - **`magento_batch`** — query list on entry, per-sub-tool timing and errors
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "magector",
3
- "version": "2.15.0",
3
+ "version": "2.16.0",
4
4
  "description": "Semantic code search for Magento 2 — index, search, MCP server",
5
5
  "type": "module",
6
6
  "main": "src/mcp-server.js",
@@ -33,10 +33,10 @@
33
33
  "ruvector": "^0.1.96"
34
34
  },
35
35
  "optionalDependencies": {
36
- "@magector/cli-darwin-arm64": "2.15.0",
37
- "@magector/cli-linux-x64": "2.15.0",
38
- "@magector/cli-linux-arm64": "2.15.0",
39
- "@magector/cli-win32-x64": "2.15.0"
36
+ "@magector/cli-darwin-arm64": "2.16.0",
37
+ "@magector/cli-linux-x64": "2.16.0",
38
+ "@magector/cli-linux-arm64": "2.16.0",
39
+ "@magector/cli-win32-x64": "2.16.0"
40
40
  },
41
41
  "keywords": [
42
42
  "magento",
package/src/mcp-server.js CHANGED
@@ -17,7 +17,7 @@ import {
17
17
  import { execFileSync, spawn } from 'child_process';
18
18
  import { createInterface } from 'readline';
19
19
  import { createServer as createNetServer, createConnection } from 'net';
20
- import { existsSync, statSync, unlinkSync, copyFileSync, renameSync, appendFileSync, writeFileSync, readFileSync, mkdirSync, openSync, closeSync, constants as fsConstants } from 'fs';
20
+ import { existsSync, statSync, unlinkSync, copyFileSync, renameSync, appendFileSync, writeFileSync, readFileSync, mkdirSync, openSync, closeSync, chmodSync, constants as fsConstants } from 'fs';
21
21
  import { stat } from 'fs/promises';
22
22
  import { glob } from 'glob';
23
23
  import path from 'path';
@@ -142,6 +142,9 @@ function extractJson(stdout) {
142
142
 
143
143
  // ─── PID File & Orphan Cleanup ──────────────────────────────────
144
144
  // Track the serve process PID to clean up orphans on restart.
145
+ // Primary state lives in data.db (state_processes / state_cache tables).
146
+ // File-based paths are kept as fallback for operations that run before
147
+ // the serve process (and thus data.db) is available.
145
148
 
146
149
  const PID_PATH = path.join(config.magentoRoot, '.magector', 'serve.pid');
147
150
  const REINDEX_PID_PATH = path.join(config.magentoRoot, '.magector', 'reindex.pid');
@@ -149,6 +152,38 @@ const SOCK_PATH = path.join(config.magentoRoot, '.magector', 'serve.sock');
149
152
  const FORMAT_CACHE_PATH = path.join(config.magentoRoot, '.magector', 'format-ok.json');
150
153
  const PRIMARY_LOCK_PATH = path.join(config.magentoRoot, '.magector', 'primary.lock');
151
154
 
155
+ // ─── Path Safety ────────────────────────────────────────────────
156
+ // All tool handlers accept user-supplied paths relative to MAGENTO_ROOT.
157
+ // Without validation, `../../../etc/passwd` would escape the project.
158
+ // A hostile indexed file could prompt-inject the LLM into requesting such a
159
+ // path and leak host files. These helpers are the single chokepoint.
160
+
161
+ /**
162
+ * Resolve a user-supplied path against a trusted root and verify it stays
163
+ * inside the root. Returns the absolute resolved path or null on escape.
164
+ */
165
+ function safePath(root, rel) {
166
+ if (rel === undefined || rel === null) return null;
167
+ const rootAbs = path.resolve(root);
168
+ const joined = path.resolve(rootAbs, String(rel));
169
+ if (joined !== rootAbs && !joined.startsWith(rootAbs + path.sep)) {
170
+ return null;
171
+ }
172
+ return joined;
173
+ }
174
+
175
+ /**
176
+ * Like safePath but returns the relative form (used for tools that invoke
177
+ * external processes with cwd=root and want a relative path argument).
178
+ * '.' means "the root itself".
179
+ */
180
+ function safeRelPath(root, rel) {
181
+ const abs = safePath(root, rel);
182
+ if (!abs) return null;
183
+ const r = path.relative(path.resolve(root), abs);
184
+ return r === '' ? '.' : r;
185
+ }
186
+
152
187
  /**
153
188
  * Expand brace patterns in include globs for GNU grep compatibility.
154
189
  * GNU grep --include does NOT support brace expansion (that's a shell feature).
@@ -190,12 +225,15 @@ function expandIncludePattern(include) {
190
225
  /**
191
226
  * Try to acquire the primary lock (O_EXCL = atomic create-or-fail).
192
227
  * Returns true if we are the primary instance, false if another instance holds the lock.
228
+ * Uses file-based O_EXCL for atomicity (runs before serve/DB is available).
229
+ * Also writes lock state to data.db when serve becomes available.
193
230
  */
194
231
  function tryAcquirePrimaryLock() {
195
232
  try {
196
233
  const fd = openSync(PRIMARY_LOCK_PATH, fsConstants.O_WRONLY | fsConstants.O_CREAT | fsConstants.O_EXCL);
197
234
  writeFileSync(fd, String(process.pid));
198
235
  closeSync(fd);
236
+ // DB write deferred — serve is not available during lock acquisition
199
237
  return true;
200
238
  } catch {
201
239
  // Lock file exists — check if holder is alive
@@ -226,6 +264,19 @@ function tryAcquirePrimaryLock() {
226
264
  }
227
265
  }
228
266
 
267
+ /**
268
+ * Write primary lock state to data.db (called after serve becomes available).
269
+ * This mirrors the file-based lock so other processes can query DB for lock owner.
270
+ */
271
+ function persistPrimaryLockToDb() {
272
+ if (serveProcess && serveReady) {
273
+ serveQuery('cache_set', {
274
+ key: 'primary_lock',
275
+ value: JSON.stringify({ pid: process.pid, timestamp: Date.now() })
276
+ }, 5000).catch(() => {});
277
+ }
278
+ }
279
+
229
280
  function releasePrimaryLock() {
230
281
  try {
231
282
  // Only remove if we own it
@@ -234,13 +285,27 @@ function releasePrimaryLock() {
234
285
  unlinkSync(PRIMARY_LOCK_PATH);
235
286
  }
236
287
  } catch {}
288
+ // Also clean DB state — fire-and-forget
289
+ if (serveProcess && serveReady) {
290
+ serveQuery('cache_set', {
291
+ key: 'primary_lock',
292
+ value: JSON.stringify({ pid: null, released: true })
293
+ }, 5000).catch(() => {});
294
+ }
237
295
  }
238
296
 
239
297
  /**
240
298
  * Write the serve process PID to disk so future instances can clean up orphans.
299
+ * Also writes to data.db via serve command when the serve process is available.
300
+ * Note: the Rust serve process writes its own PID to data.db on startup, so
301
+ * the file is primarily a fallback for the brief window before serve is ready.
241
302
  */
242
303
  function writePidFile(pid) {
243
304
  try { writeFileSync(PID_PATH, `${pid}\n${__pkg.version}`); } catch {}
305
+ // Async DB write — fire-and-forget, serve process also writes its own PID
306
+ if (serveProcess && serveReady) {
307
+ serveQuery('process_set', { name: 'serve', pid, version: __pkg.version }, 5000).catch(() => {});
308
+ }
244
309
  }
245
310
 
246
311
  function getServePidVersion() {
@@ -254,21 +319,35 @@ function getServePidVersion() {
254
319
 
255
320
  function removePidFile() {
256
321
  try { if (existsSync(PID_PATH)) unlinkSync(PID_PATH); } catch {}
322
+ // Also clean DB state — fire-and-forget
323
+ if (serveProcess && serveReady) {
324
+ serveQuery('process_remove', { name: 'serve' }, 5000).catch(() => {});
325
+ }
257
326
  }
258
327
 
259
328
  function writeReindexPidFile(pid) {
260
329
  try { writeFileSync(REINDEX_PID_PATH, String(pid)); } catch {}
330
+ // Also persist in data.db when serve is available
331
+ if (serveProcess && serveReady) {
332
+ serveQuery('process_set', { name: 'reindex', pid }, 5000).catch(() => {});
333
+ }
261
334
  }
262
335
 
263
336
  function removeReindexPidFile() {
264
337
  try { if (existsSync(REINDEX_PID_PATH)) unlinkSync(REINDEX_PID_PATH); } catch {}
338
+ // Also clean DB state
339
+ if (serveProcess && serveReady) {
340
+ serveQuery('process_remove', { name: 'reindex' }, 5000).catch(() => {});
341
+ }
265
342
  }
266
343
 
267
344
  /**
268
345
  * Check if another reindex process is already running (from another MCP instance).
346
+ * Checks data.db first (via serve query), falls back to PID file.
269
347
  * Returns the PID if alive, null otherwise.
270
348
  */
271
349
  function getRunningReindexPid() {
350
+ // Try file-based check (synchronous, always available)
272
351
  try {
273
352
  if (!existsSync(REINDEX_PID_PATH)) return null;
274
353
  const pid = parseInt(readFileSync(REINDEX_PID_PATH, 'utf-8').trim(), 10);
@@ -287,6 +366,7 @@ function getRunningReindexPid() {
287
366
  * Returns the PID if alive, null if stale/missing.
288
367
  * Does NOT kill it — multiple MCP instances can share one serve process
289
368
  * by sending queries to it via stdin (each instance starts its own).
369
+ * Tries file-based check (synchronous, always available).
290
370
  */
291
371
  function getExistingServePid() {
292
372
  try {
@@ -305,6 +385,7 @@ function getExistingServePid() {
305
385
  * Kill any stale serve process from a previous MCP server instance.
306
386
  * Only called during cleanup (exit/SIGTERM), not during startup —
307
387
  * multiple concurrent MCP instances each run their own serve process.
388
+ * Reads PID from file (DB may not be available if serve is dead).
308
389
  */
309
390
  function killStaleServeProcess() {
310
391
  try {
@@ -344,8 +425,13 @@ let warmupInProgress = true; // true until checkDbFormat + serve process ready
344
425
 
345
426
  /**
346
427
  * Check if the database file is compatible with the current binary.
347
- * Uses a cached result file to avoid running stats (30-60s) on every startup.
428
+ * Uses a cached result to avoid running stats (30-60s) on every startup.
348
429
  * Cache key: binary path mtime + db file mtime + db size.
430
+ *
431
+ * Cache lookup order:
432
+ * 1. data.db state_cache (via serve query, if serve is available)
433
+ * 2. format-ok.json file (fallback — runs before serve starts)
434
+ * Both locations are written on cache miss.
349
435
  */
350
436
  async function checkDbFormat() {
351
437
  if (!existsSync(config.dbPath)) return true;
@@ -357,10 +443,27 @@ async function checkDbFormat() {
357
443
  // Check cached result — avoids 40s stats command on every MCP startup
358
444
  const binaryStat = statSync(config.rustBinary);
359
445
  const cacheKey = `${binaryStat.mtimeMs}|${fstat.mtimeMs}|${fstat.size}`;
446
+
447
+ // Try DB cache first (if serve process is running)
448
+ const queryFn = globalServeQuery || ((serveProcess && serveReady) ? serveQuery : null);
449
+ if (queryFn) {
450
+ try {
451
+ const resp = await queryFn('cache_get', { key: 'format_ok' }, 5000);
452
+ if (resp.ok && resp.data) {
453
+ const cached = JSON.parse(resp.data.value);
454
+ if (cached.key === cacheKey) {
455
+ logToFile('INFO', `Format check cached (DB): ${cached.ok ? 'compatible' : 'incompatible'}`);
456
+ return cached.ok;
457
+ }
458
+ }
459
+ } catch { /* DB cache miss or unavailable */ }
460
+ }
461
+
462
+ // Fall back to file cache
360
463
  try {
361
464
  const cached = JSON.parse(readFileSync(FORMAT_CACHE_PATH, 'utf-8'));
362
465
  if (cached.key === cacheKey) {
363
- logToFile('INFO', `Format check cached: ${cached.ok ? 'compatible' : 'incompatible'}`);
466
+ logToFile('INFO', `Format check cached (file): ${cached.ok ? 'compatible' : 'incompatible'}`);
364
467
  return cached.ok;
365
468
  }
366
469
  } catch { /* no cache or invalid */ }
@@ -380,8 +483,14 @@ async function checkDbFormat() {
380
483
  const vectors = parseInt(result.match(/Total vectors:\s*(\d+)/)?.[1] || '0');
381
484
  const ok = vectors > 0;
382
485
 
383
- // Write cache
384
- try { writeFileSync(FORMAT_CACHE_PATH, JSON.stringify({ key: cacheKey, ok })); } catch {}
486
+ // Write cache to both file and DB
487
+ const cacheValue = JSON.stringify({ key: cacheKey, ok });
488
+ try { writeFileSync(FORMAT_CACHE_PATH, cacheValue); } catch {}
489
+ // Async DB write — fire-and-forget (serve may not be up yet on first startup)
490
+ const queryFn2 = globalServeQuery || ((serveProcess && serveReady) ? serveQuery : null);
491
+ if (queryFn2) {
492
+ queryFn2('cache_set', { key: 'format_ok', value: cacheValue }, 5000).catch(() => {});
493
+ }
385
494
  return ok;
386
495
  } catch {
387
496
  return false;
@@ -685,6 +794,8 @@ function startServeProcess() {
685
794
  serveReady = true;
686
795
  logToFile('INFO', `Serve process ready (PID ${proc.pid})`);
687
796
  if (serveReadyResolve) { serveReadyResolve(true); serveReadyResolve = null; }
797
+ // Now that serve is up, persist primary lock state to data.db
798
+ persistPrimaryLockToDb();
688
799
  return;
689
800
  }
690
801
 
@@ -737,7 +848,13 @@ function startSocketProxy() {
737
848
  logToFile('WARN', `Socket proxy error: ${err.message}`);
738
849
  });
739
850
  socketServer.listen(SOCK_PATH, () => {
740
- logToFile('INFO', `Socket proxy listening on ${SOCK_PATH}`);
851
+ // Restrict socket to the owning user — without this, on multi-user
852
+ // systems any local account could connect to the serve proxy and query
853
+ // the index (leaking indexed code snippets to other local users).
854
+ try { chmodSync(SOCK_PATH, 0o600); } catch (err) {
855
+ logToFile('WARN', `Failed to chmod socket to 0600: ${err.message}`);
856
+ }
857
+ logToFile('INFO', `Socket proxy listening on ${SOCK_PATH} (mode 0600)`);
741
858
  });
742
859
  }
743
860
 
@@ -3464,298 +3581,35 @@ async function traceCallChain(startClass, startMethod, maxDepth = 3) {
3464
3581
  }
3465
3582
 
3466
3583
  // ─── Method Chain Enrichment ────────────────────────────────────
3467
- // Scans PHP files for two-step method chains (->first()->second()) and detects
3468
- // null guards in surrounding code. Results stored in SQLite enrichment.db for
3469
- // instant O(1) queries — eliminates 20+ grep calls for null-risk analyses.
3470
-
3471
- const ENRICHMENT_DB_PATH = (root) => path.join(root, '.magector', 'enrichment.db');
3472
-
3473
- /**
3474
- * Detect null guard for a chained call in surrounding lines.
3475
- * Checks ±guardRadius lines for: null checks, ?->, ??, isset()
3476
- */
3477
- function hasNullGuard(lines, matchLineIdx, receiverExpr, guardRadius = 6) {
3478
- const start = Math.max(0, matchLineIdx - guardRadius);
3479
- const end = Math.min(lines.length - 1, matchLineIdx + guardRadius);
3480
- const matchLine = lines[matchLineIdx] || '';
3481
- const window = lines.slice(start, end + 1).join('\n');
3482
-
3483
- // ?-> only counts if it's on the same line as the chain (avoid false positives from unrelated variables)
3484
- if (matchLine.includes('?->')) return true;
3485
- if (/\?\?|\?:/.test(window)) return true;
3486
-
3487
- if (receiverExpr) {
3488
- const esc = receiverExpr.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
3489
- if (new RegExp(`(?:is_null\\s*\\(\\s*${esc}|${esc}\\s*(?:===|!==)\\s*null|!\\s*${esc}\\s*[,)]|isset\\s*\\(\\s*${esc})`, 'i').test(window)) return true;
3490
- }
3491
- return false;
3492
- }
3493
-
3494
- /**
3495
- * Scan vendor/ PHP files for ->first()->second() chains and store null-safety
3496
- * analysis in enrichment.db. Called by magento_enrich and after magento_index.
3497
- */
3498
- async function enrichMethodChains(root) {
3499
- const dbPath = ENRICHMENT_DB_PATH(root);
3500
- logToFile('INFO', `enrich: starting method-chain scan, db=${dbPath}`);
3501
- const enrichStart = Date.now();
3502
-
3503
- // Use node:sqlite (built-in, no deps)
3504
- let DatabaseSync;
3505
- try {
3506
- ({ DatabaseSync } = await import('node:sqlite'));
3507
- } catch {
3508
- logToFile('ERR', 'enrich: node:sqlite not available — requires Node.js 22.5+');
3509
- throw new Error('node:sqlite not available — requires Node.js 22.5+');
3510
- }
3511
-
3512
- const db = new DatabaseSync(dbPath);
3513
- db.exec('PRAGMA journal_mode = WAL');
3514
- db.exec(`
3515
- CREATE TABLE IF NOT EXISTS method_chains (
3516
- id INTEGER PRIMARY KEY AUTOINCREMENT,
3517
- file TEXT NOT NULL,
3518
- line INTEGER NOT NULL,
3519
- chain TEXT NOT NULL,
3520
- first_method TEXT NOT NULL,
3521
- second_method TEXT NOT NULL,
3522
- has_null_guard INTEGER NOT NULL DEFAULT 0,
3523
- updated_at INTEGER NOT NULL
3524
- );
3525
- CREATE INDEX IF NOT EXISTS idx_first_method ON method_chains (first_method);
3526
- CREATE INDEX IF NOT EXISTS idx_null_guard ON method_chains (has_null_guard, first_method);
3527
- `);
3528
-
3529
- // Two-step chain: $var->firstMethod(...)->secondMethod(
3530
- // Captures: receiver ($var), firstMethod, secondMethod
3531
- const chainRegex = /(\$\w+)\s*->\s*(\w+)\s*\([^)]{0,60}\)\s*->\s*(\w+)\s*\(/g;
3532
- const now = Date.now();
3533
-
3534
- const phpFiles = await glob('vendor/**/*.php', { cwd: root, absolute: true, nodir: true });
3535
- logToFile('INFO', `enrich: found ${phpFiles.length} PHP files in vendor/`);
3536
- let scanned = 0;
3537
- let chains = 0;
3538
- let readErrors = 0;
3539
-
3540
- const insertStmt = db.prepare(
3541
- 'INSERT INTO method_chains (file, line, chain, first_method, second_method, has_null_guard, updated_at) VALUES (?,?,?,?,?,?,?)'
3542
- );
3543
- const deleteFile = db.prepare('DELETE FROM method_chains WHERE file = ?');
3544
-
3545
- // Build line-offset index for O(1) line number lookups
3546
- function buildLineIndex(content) {
3547
- const offsets = [0];
3548
- let idx = 0;
3549
- while ((idx = content.indexOf('\n', idx)) !== -1) {
3550
- idx++;
3551
- offsets.push(idx);
3552
- }
3553
- return offsets;
3554
- }
3555
-
3556
- function lineFromOffset(offsets, charIndex) {
3557
- let lo = 0, hi = offsets.length - 1;
3558
- while (lo < hi) {
3559
- const mid = (lo + hi + 1) >> 1;
3560
- if (offsets[mid] <= charIndex) lo = mid; else hi = mid - 1;
3561
- }
3562
- return lo + 1; // 1-based
3563
- }
3564
-
3565
- // Progress logging every 10k files
3566
- const progressInterval = 10000;
3567
-
3568
- db.exec('BEGIN');
3569
- try {
3570
- for (const phpFile of phpFiles) {
3571
- let content;
3572
- try { content = readFileSync(phpFile, 'utf-8'); } catch (err) {
3573
- readErrors++;
3574
- if (readErrors <= 5) logToFile('WARN', `enrich: cannot read ${phpFile}: ${err.code || err.message}`);
3575
- continue;
3576
- }
3577
- if (!content.includes('->')) continue;
3578
-
3579
- const relPath = phpFile.replace(root + '/', '');
3580
- const lines = content.split('\n');
3581
- const lineOffsets = buildLineIndex(content);
3582
- const rows = [];
3583
-
3584
- chainRegex.lastIndex = 0;
3585
- let m;
3586
- while ((m = chainRegex.exec(content)) !== null) {
3587
- const lineNum = lineFromOffset(lineOffsets, m.index);
3588
- rows.push({
3589
- file: relPath, line: lineNum,
3590
- chain: `->${m[2]}()->${m[3]}()`,
3591
- firstMethod: m[2], secondMethod: m[3],
3592
- hasNullGuard: hasNullGuard(lines, lineNum - 1, m[1]) ? 1 : 0
3593
- });
3594
- chains++;
3595
- }
3596
-
3597
- if (rows.length > 0) {
3598
- deleteFile.run(relPath);
3599
- for (const r of rows) {
3600
- insertStmt.run(r.file, r.line, r.chain, r.firstMethod, r.secondMethod, r.hasNullGuard, now);
3601
- }
3602
- }
3603
- scanned++;
3604
- if (scanned % progressInterval === 0) {
3605
- logToFile('INFO', `enrich: progress ${scanned}/${phpFiles.length} files, ${chains} chains so far (${Date.now() - enrichStart}ms)`);
3606
- }
3607
- }
3608
- db.exec('COMMIT');
3609
- } catch (err) {
3610
- logToFile('ERR', `enrich: transaction failed at file ${scanned}/${phpFiles.length}: ${err.message}`);
3611
- db.exec('ROLLBACK');
3612
- throw err;
3613
- }
3614
-
3615
- db.close();
3616
- const enrichElapsed = Date.now() - enrichStart;
3617
- logToFile('INFO', `enrich: complete — ${scanned} files scanned, ${chains} chains indexed, ${readErrors} read errors, ${enrichElapsed}ms`);
3618
- return { scanned, chains };
3619
- }
3620
-
3621
- /**
3622
- * Query enrichment.db for unsafe method chains (no null guard).
3623
- */
3624
- async function queryNullRisks(root, firstMethod, limit = 100) {
3625
- const dbPath = ENRICHMENT_DB_PATH(root);
3626
- if (!existsSync(dbPath)) {
3627
- logToFile('WARN', `null_risks: enrichment.db not found at ${dbPath} — run magento_enrich first`);
3628
- return null;
3629
- }
3630
-
3631
- let DatabaseSync;
3632
- try {
3633
- ({ DatabaseSync } = await import('node:sqlite'));
3634
- } catch (err) {
3635
- logToFile('ERR', `null_risks: node:sqlite not available: ${err.message}`);
3636
- return null;
3637
- }
3638
-
3639
- const queryStart = Date.now();
3640
- logToFile('INFO', `null_risks: querying firstMethod=${firstMethod || '(all)'} limit=${limit}`);
3641
- const db = new DatabaseSync(dbPath, { open: true });
3642
- let rows;
3643
- try {
3644
- if (firstMethod) {
3645
- rows = db.prepare(
3646
- 'SELECT file, line, chain, second_method FROM method_chains WHERE has_null_guard = 0 AND first_method = ? ORDER BY file, line LIMIT ?'
3647
- ).all(firstMethod, limit);
3648
- } else {
3649
- rows = db.prepare(
3650
- 'SELECT file, line, chain, first_method, second_method FROM method_chains WHERE has_null_guard = 0 ORDER BY first_method, file, line LIMIT ?'
3651
- ).all(limit);
3652
- }
3653
- } finally {
3654
- db.close();
3655
- }
3656
- logToFile('INFO', `null_risks: ${rows.length} unsafe chain(s) found in ${Date.now() - queryStart}ms`);
3657
- return rows;
3658
- }
3584
+ // Enrichment logic has been moved to the Rust serve process (enrich / enrich_query commands).
3585
+ // Use serveQuery('enrich', { magento_root }) and serveQuery('enrich_query', { first_method, limit }).
3659
3586
 
3660
- // ─── AST Search (semgrep) ───────────────────────────────────────
3587
+ // ─── AST Search (tree-sitter via Rust serve) ────────────────────
3661
3588
 
3662
- async function astSearch(pattern, searchPath, lang, maxResults) {
3589
+ async function astSearch(patternName, searchPath, maxResults) {
3663
3590
  const root = config.magentoRoot;
3664
3591
  if (!root) throw new Error('MAGENTO_ROOT not set');
3665
3592
 
3666
- const targetPath = searchPath ? path.join(root, searchPath) : root;
3667
- const semgrepLang = lang || 'php';
3668
- const limit = Math.min(maxResults || 50, 200);
3669
-
3670
- logToFile('INFO', `ast_search: pattern="${pattern}" path="${searchPath || '.'}" lang=${semgrepLang} limit=${limit}`);
3671
- const astStart = Date.now();
3672
-
3673
- // Semgrep's default ignore list includes "vendor/" which is exactly what we need to scan.
3674
- // Semgrep resolves .semgrepignore from the git repo root, NOT the scan directory.
3675
- // An empty .semgrepignore at root overrides the defaults: https://semgrep.dev/docs/ignoring-files-folders-code/
3676
- const semgrepIgnorePath = path.join(root, '.semgrepignore');
3677
- let createdSemgrepIgnore = false;
3678
- if (!existsSync(semgrepIgnorePath)) {
3679
- try {
3680
- writeFileSync(semgrepIgnorePath, '# Magector: scan vendor/ and all project files\n');
3681
- createdSemgrepIgnore = true;
3682
- logToFile('INFO', `ast_search: created temporary .semgrepignore at ${root}`);
3683
- } catch (err) {
3684
- logToFile('WARN', `ast_search: failed to create .semgrepignore: ${err.message}`);
3685
- }
3686
- }
3593
+ const safeSp = searchPath ? safeRelPath(root, searchPath) : '.';
3594
+ if (searchPath && !safeSp) throw new Error(`Path escapes project root: ${searchPath}`);
3687
3595
 
3688
- const semgrepArgs = [
3689
- '--pattern', pattern,
3690
- '--lang', semgrepLang,
3691
- '--json',
3692
- '--no-git-ignore',
3693
- targetPath
3694
- ];
3695
-
3696
- let rawOutput;
3697
- try {
3698
- rawOutput = execFileSync('semgrep', semgrepArgs, {
3699
- encoding: 'utf-8',
3700
- timeout: 60000,
3701
- maxBuffer: 20 * 1024 * 1024,
3702
- stdio: ['pipe', 'pipe', 'pipe'],
3703
- env: { ...process.env, PATH: process.env.PATH + ':/home/swed/.local/bin' }
3704
- });
3705
- } catch (err) {
3706
- // semgrep exits non-zero when it has findings — stdout still contains valid JSON
3707
- rawOutput = err.stdout || '';
3708
- if (!rawOutput) {
3709
- const errMsg = (err.stderr || err.message || '').slice(0, 500);
3710
- logToFile('ERR', `ast_search: semgrep failed after ${Date.now() - astStart}ms: ${errMsg}`);
3711
- throw new Error(`semgrep failed: ${errMsg}`);
3712
- }
3713
- } finally {
3714
- if (createdSemgrepIgnore) { try { unlinkSync(semgrepIgnorePath); } catch { /* best effort */ } }
3715
- }
3596
+ const limit = Math.min(maxResults || 50, 200);
3597
+ logToFile('INFO', `ast_search: pattern="${patternName}" path="${safeSp}" limit=${limit}`);
3598
+ const start = Date.now();
3716
3599
 
3717
- let parsed;
3718
- try {
3719
- parsed = JSON.parse(rawOutput);
3720
- } catch {
3721
- logToFile('ERR', `ast_search: failed to parse semgrep JSON output (${rawOutput.length} bytes)`);
3722
- throw new Error(`Failed to parse semgrep output. First 300 chars: ${rawOutput.slice(0, 300)}`);
3723
- }
3600
+ const resp = await serveQuery('ast_query', { pattern: patternName, path: safeSp, limit }, 60000);
3601
+ if (!resp.ok) throw new Error(resp.error || 'ast_query failed');
3724
3602
 
3725
- const findings = (parsed.results || []).slice(0, limit);
3726
- const astElapsed = Date.now() - astStart;
3727
- logToFile('INFO', `ast_search: ${findings.length} match(es) in ${astElapsed}ms (semgrep returned ${(parsed.results || []).length} total)`);
3728
- if (parsed.errors && parsed.errors.length > 0) {
3729
- logToFile('WARN', `ast_search: semgrep reported ${parsed.errors.length} error(s): ${parsed.errors.slice(0, 3).map(e => e.message || e.type || JSON.stringify(e)).join('; ')}`);
3730
- }
3731
- return findings.map(r => {
3732
- // semgrep >=1.100 may return "requires login" in r.extra.lines for unlicensed installs.
3733
- // Fall back to r.extra.message which contains the matched expression (always available).
3734
- const rawLines = r.extra?.lines || '';
3735
- const snippet = (rawLines && rawLines !== 'requires login')
3736
- ? rawLines.trim()
3737
- : (r.extra?.message || '').trim();
3738
- return {
3739
- file: r.path.replace(root + '/', ''),
3740
- line: r.start.line,
3741
- endLine: r.end.line,
3742
- snippet
3743
- };
3744
- });
3603
+ const elapsed = Date.now() - start;
3604
+ const results = resp.data || [];
3605
+ logToFile('INFO', `ast_search: ${results.length} match(es) in ${elapsed}ms`);
3606
+ return results;
3745
3607
  }
3746
3608
 
3747
3609
  // ─── DataObject set-null Anti-pattern Detection ─────────────────
3748
3610
 
3749
3611
  async function findDataObjectIssues(searchPath, maxResults) {
3750
- // Detects DataObject::setX(null) anti-pattern:
3751
- // setX(null) stores ['x' => null] in _data — key EXISTS with null value.
3752
- // hasX() / hasData('x') calls array_key_exists() → returns true even for null.
3753
- // This causes false-positive guard conditions: hasX() passes, getX() returns null.
3754
- // Correct way to fully clear: unsetData('x') removes the key entirely.
3755
- const allResults = await astSearch('$X->$SETTER(null)', searchPath, 'php', 500);
3756
- const setterNullRegex = /->set[A-Z]\w+\s*\(\s*null\s*\)/;
3757
- const limit = Math.min(maxResults || 100, 500);
3758
- return allResults.filter(r => setterNullRegex.test(r.snippet)).slice(0, limit);
3612
+ return astSearch('dataobject-set-null', searchPath, maxResults || 100);
3759
3613
  }
3760
3614
 
3761
3615
  // ─── MCP Server ─────────────────────────────────────────────────
@@ -4490,23 +4344,19 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
4490
4344
  },
4491
4345
  {
4492
4346
  name: 'magento_ast_search',
4493
- description: 'Structural PHP code search using semgrep patterns. Unlike magento_grep (text-based), this understands PHP AST — matches code structure regardless of variable names, ignores comments/strings, understands operator precedence. Use when grep gives false positives or you need structural awareness. Pattern syntax: $X = any expression/variable, $Y = any identifier, ... = any arguments. Examples: "$ORDER->getPayment()->$M(...)" finds all method calls on payment regardless of variable name; "$X->getPayment()->$Y(...)" finds all two-step chains involving getPayment. ⚡ For multi-query workflows use magento_batch.',
4347
+ description: 'Structural PHP code search using tree-sitter AST queries. Unlike magento_grep (text-based), this understands PHP AST — matches code structure regardless of variable names, ignores comments/strings. Available named patterns: "dataobject-set-null" (finds ->setX(null) anti-pattern calls), "unchecked-method-chain" (finds ->a()->b() chains without null guards). Uses Rust tree-sitter for fast, accurate parsing. ⚡ For multi-query workflows use magento_batch.',
4494
4348
  inputSchema: {
4495
4349
  type: 'object',
4496
4350
  properties: {
4497
4351
  pattern: {
4498
4352
  type: 'string',
4499
- description: 'Semgrep PHP pattern. $X = any expr, $Y = any identifier, ... = any args. Examples: "$X->getPayment()->$Y(...)", "if ($X !== null) { ... $X->$Y(...) }", "$X = $Y->getPayment(); ... $X->$Z(...)"'
4353
+ enum: ['dataobject-set-null', 'unchecked-method-chain'],
4354
+ description: 'Named AST query pattern. "dataobject-set-null": finds ->setX(null) calls on DataObjects. "unchecked-method-chain": finds ->a()->b() chains without null guards.'
4500
4355
  },
4501
4356
  path: {
4502
4357
  type: 'string',
4503
4358
  description: 'Subdirectory to search (relative to MAGENTO_ROOT). Default: entire codebase. Example: "vendor/acme/"'
4504
4359
  },
4505
- lang: {
4506
- type: 'string',
4507
- description: 'Language to search (default: php). Options: php, xml, js.',
4508
- default: 'php'
4509
- },
4510
4360
  maxResults: {
4511
4361
  type: 'number',
4512
4362
  description: 'Maximum matches to return (default: 50, max: 200)',
@@ -4536,7 +4386,7 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({
4536
4386
  },
4537
4387
  {
4538
4388
  name: 'magento_enrich',
4539
- description: 'Build the method-chain enrichment index. Scans all vendor/ PHP files for two-step method chains (->firstMethod()->secondMethod()) and analyses whether each call has a null guard in surrounding code. Results stored in .magector/enrichment.db. Run this once after magento_index, then use magento_find_null_risks for instant O(1) null-safety queries instead of 20+ grep calls. Also runs automatically after magento_index completes.',
4389
+ description: 'Build the method-chain enrichment index. Scans all vendor/ PHP files for two-step method chains (->firstMethod()->secondMethod()) and analyses whether each call has a null guard in surrounding code. Results stored in .magector/data.db. Run this once after magento_index, then use magento_find_null_risks for instant O(1) null-safety queries instead of 20+ grep calls. Also runs automatically after magento_index completes.',
4540
4390
  inputSchema: { type: 'object', properties: {} }
4541
4391
  },
4542
4392
  {
@@ -4886,10 +4736,14 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
4886
4736
  case 'magento_index': {
4887
4737
  const root = args.path || config.magentoRoot;
4888
4738
  const output = rustIndex(root);
4889
- // Auto-enrich after indexing: runs in background, doesn't block response
4739
+ // Auto-enrich after indexing: runs in background via Rust serve, doesn't block response
4890
4740
  logToFile('INFO', 'Auto-enrich: starting in background after index');
4891
- enrichMethodChains(root).then(({ scanned, chains }) => {
4892
- logToFile('INFO', `Auto-enrich complete: ${scanned} files, ${chains} chains`);
4741
+ serveQuery('enrich', { magento_root: root }, 120000).then(resp => {
4742
+ if (resp.ok) {
4743
+ logToFile('INFO', `Auto-enrich complete: ${resp.data.scanned} files, ${resp.data.chains} chains`);
4744
+ } else {
4745
+ logToFile('WARN', `Auto-enrich failed: ${resp.error}`);
4746
+ }
4893
4747
  }).catch(err => {
4894
4748
  logToFile('WARN', `Auto-enrich failed: ${err.message}`);
4895
4749
  });
@@ -6466,7 +6320,12 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6466
6320
  break;
6467
6321
  }
6468
6322
  case 'magento_read': {
6469
- const filePath = path.join(config.magentoRoot, a.path);
6323
+ const filePath = safePath(config.magentoRoot, a.path);
6324
+ if (!filePath) {
6325
+ logToFile('WARN', `batch read: rejected path traversal attempt: "${a.path}"`);
6326
+ text = `Path escapes project root: ${a.path}`;
6327
+ break;
6328
+ }
6470
6329
  let fileContent;
6471
6330
  try { fileContent = readFileSync(filePath, 'utf-8'); } catch {
6472
6331
  text = `File not found: ${a.path}`;
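The `safePath` and `safeRelPath` helpers called in these hunks are not defined in this part of the diff. The following is a minimal sketch of how such guards might work, assuming `safePath` returns an absolute path or `null` and `safeRelPath` returns a root-relative path or `null`. Only the call shapes come from the diff; the function bodies are hypothetical, and the sketch uses CommonJS `require` for brevity.

```javascript
const path = require('node:path');

// Hypothetical sketch: resolve userPath against root and return the absolute
// path, or null when the result would land outside root.
function safePath(root, userPath) {
  if (typeof userPath !== 'string') return null;
  const absRoot = path.resolve(root);
  const resolved = path.resolve(absRoot, userPath);
  const rel = path.relative(absRoot, resolved);
  // rel begins with '..' when resolved sits above root; on Windows it is
  // absolute when the two paths are on different drives.
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) return null;
  return resolved;
}

// Hypothetical sketch: same check, but return a root-relative path
// ('.' when userPath points at the root itself).
function safeRelPath(root, userPath) {
  const resolved = safePath(root, userPath);
  if (resolved === null) return null;
  return path.relative(path.resolve(root), resolved) || '.';
}
```

This sketch does a purely lexical check and does not follow symlinks; a stricter variant would presumably compare `fs.realpathSync` results instead.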
@@ -6491,32 +6350,79 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6491
6350
  break;
6492
6351
  }
6493
6352
  case 'magento_grep': {
6494
- const searchPath = a.path || '.';
6353
+ const searchPath = safeRelPath(config.magentoRoot, a.path || '.');
6354
+ if (!searchPath) {
6355
+ logToFile('WARN', `batch grep: rejected path traversal attempt: "${a.path}"`);
6356
+ text = `Path escapes project root: ${a.path}`;
6357
+ break;
6358
+ }
6495
6359
  const include = a.include || '*.php';
6496
6360
  const maxRes = Math.min(a.maxResults || 30, 100);
6497
6361
  const batchCtx = a.context !== undefined ? a.context : 4;
6498
6362
  const batchFilesOnly = a.filesOnly || false;
6499
- const gArgs = batchFilesOnly ? ['-rl', '-E'] : ['-rn', '-E'];
6500
- if (a.ignoreCase) gArgs.push('-i');
6501
- if (!batchFilesOnly && batchCtx > 0) gArgs.push('-C', String(batchCtx));
6502
- for (const pat of expandIncludePattern(include)) gArgs.push('--include=' + pat);
6503
- gArgs.push('--', a.pattern, searchPath);
6504
- let out;
6505
- try {
6506
- out = execFileSync('grep', gArgs, { cwd: config.magentoRoot, encoding: 'utf-8', timeout: 15000, maxBuffer: 5 * 1024 * 1024, stdio: ['pipe', 'pipe', 'pipe'] });
6507
- } catch (err) { out = err.stdout || ''; }
6508
- const gLines = out.trim().split('\n').filter(Boolean);
6509
- if (batchFilesOnly) {
6510
- text = `Files matching \`${a.pattern}\` (${gLines.length}):\n`;
6511
- for (const gl of gLines.slice(0, maxRes)) text += gl + '\n';
6512
- } else {
6513
- text = `Found ${gLines.length} matches${gLines.length > maxRes ? ` (showing ${maxRes})` : ''}:\n`;
6514
- for (const gl of gLines.slice(0, maxRes)) text += gl + '\n';
6363
+
6364
+ // Try Rust serve grep first
6365
+ const batchQueryFn = globalServeQuery || ((serveProcess && serveReady) ? serveQuery : null);
6366
+ let batchGrepDone = false;
6367
+ if (batchQueryFn) {
6368
+ try {
6369
+ const bResp = await batchQueryFn('grep', {
6370
+ pattern: a.pattern,
6371
+ magento_root: config.magentoRoot,
6372
+ path: searchPath,
6373
+ include,
6374
+ context: batchCtx,
6375
+ max_results: maxRes,
6376
+ files_only: batchFilesOnly,
6377
+ ignore_case: a.ignoreCase || false
6378
+ }, 15000);
6379
+ if (bResp.ok && bResp.data) {
6380
+ const bMatches = bResp.data.matches || [];
6381
+ const bTotal = bResp.data.total || bMatches.length;
6382
+ if (batchFilesOnly) {
6383
+ text = `Files matching \`${a.pattern}\` (${bTotal}):\n`;
6384
+ for (const m of bMatches) text += (m.file || m) + '\n';
6385
+ } else {
6386
+ text = `Found ${bTotal} matches${bTotal > maxRes ? ` (showing ${maxRes})` : ''}:\n`;
6387
+ for (const m of bMatches) {
6388
+ if (m.is_context) {
6389
+ text += `${m.file}-${m.line}-${m.text}\n`;
6390
+ } else {
6391
+ text += `${m.file}:${m.line}:${m.text}\n`;
6392
+ }
6393
+ }
6394
+ }
6395
+ batchGrepDone = true;
6396
+ }
6397
+ } catch (bErr) {
6398
+ logToFile('WARN', `batch grep: serve query failed, falling back to external grep: ${bErr.message}`);
6399
+ }
6400
+ }
6401
+
6402
+ // Fallback: external GNU grep (cold-start path or serve error)
6403
+ if (!batchGrepDone) {
6404
+ const gArgs = batchFilesOnly ? ['-rl', '-E'] : ['-rn', '-E'];
6405
+ if (a.ignoreCase) gArgs.push('-i');
6406
+ if (!batchFilesOnly && batchCtx > 0) gArgs.push('-C', String(batchCtx));
6407
+ for (const pat of expandIncludePattern(include)) gArgs.push('--include=' + pat);
6408
+ gArgs.push('--', a.pattern, searchPath);
6409
+ let out;
6410
+ try {
6411
+ out = execFileSync('grep', gArgs, { cwd: config.magentoRoot, encoding: 'utf-8', timeout: 15000, maxBuffer: 5 * 1024 * 1024, stdio: ['pipe', 'pipe', 'pipe'] });
6412
+ } catch (err) { out = err.stdout || ''; }
6413
+ const gLines = out.trim().split('\n').filter(Boolean);
6414
+ if (batchFilesOnly) {
6415
+ text = `Files matching \`${a.pattern}\` (${gLines.length}):\n`;
6416
+ for (const gl of gLines.slice(0, maxRes)) text += gl + '\n';
6417
+ } else {
6418
+ text = `Found ${gLines.length} matches${gLines.length > maxRes ? ` (showing ${maxRes})` : ''}:\n`;
6419
+ for (const gl of gLines.slice(0, maxRes)) text += gl + '\n';
6420
+ }
6515
6421
  }
6516
6422
  break;
6517
6423
  }
6518
6424
  case 'magento_ast_search': {
6519
- const astResults = await astSearch(a.pattern, a.path, a.lang, a.maxResults);
6425
+ const astResults = await astSearch(a.pattern, a.path, a.maxResults);
6520
6426
  if (astResults.length === 0) {
6521
6427
  text = `No matches for pattern: \`${a.pattern}\``;
6522
6428
  } else {
@@ -6536,15 +6442,19 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6536
6442
  break;
6537
6443
  }
6538
6444
  case 'magento_find_null_risks': {
6539
- const bRoot = config.magentoRoot;
6540
6445
  const bLimit = Math.min(a.limit || 100, 500);
6541
- const bRows = bRoot ? await queryNullRisks(bRoot, a.firstMethod || null, bLimit) : null;
6542
- if (!bRows) { text = '⚠️ Run magento_enrich first.'; break; }
6543
- if (bRows.length === 0) { text = 'No unsafe chains found.'; break; }
6544
- text = `Found ${bRows.length} unsafe chain(s):\n`;
6545
- for (const r of bRows.slice(0, 50)) {
6546
- const chain = r.chain || `->${r.first_method}()->${r.second_method}()`;
6547
- text += `${r.file}:${r.line}: ${chain}\n`;
6446
+ try {
6447
+ const bResp = await serveQuery('enrich_query', { first_method: a.firstMethod || null, limit: bLimit }, 30000);
6448
+ if (!bResp.ok) { text = `⚠️ ${bResp.error || 'Query failed'}`; break; }
6449
+ const bRows = bResp.data || [];
6450
+ if (bRows.length === 0) { text = 'No unsafe chains found.'; break; }
6451
+ text = `Found ${bRows.length} unsafe chain(s):\n`;
6452
+ for (const r of bRows.slice(0, 50)) {
6453
+ const chain = r.chain || `->${r.first_method}()->${r.second_method}()`;
6454
+ text += `${r.file}:${r.line}: ${chain}\n`;
6455
+ }
6456
+ } catch (bErr) {
6457
+ text = '⚠️ Run magento_enrich first.';
6548
6458
  }
6549
6459
  break;
6550
6460
  }
@@ -6569,11 +6479,60 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6569
6479
  case 'magento_grep': {
6570
6480
  const root = config.magentoRoot;
6571
6481
  if (!root) return { content: [{ type: 'text', text: 'MAGENTO_ROOT not set.' }], isError: true };
6572
- const searchPath = args.path || '.';
6482
+ const searchPath = safeRelPath(root, args.path || '.');
6483
+ if (!searchPath) {
6484
+ logToFile('WARN', `grep: rejected path traversal attempt: "${args.path}"`);
6485
+ return { content: [{ type: 'text', text: `Path escapes project root: ${args.path}` }], isError: true };
6486
+ }
6573
6487
  const include = args.include || '*.php';
6574
6488
  const maxResults = Math.min(args.maxResults || 50, 200);
6575
6489
  const ctxLines = args.context !== undefined ? args.context : 4;
6576
6490
  const filesOnly = args.filesOnly || false;
6491
+ const grepStart = Date.now();
6492
+
6493
+ // Try Rust serve grep first
6494
+ const queryFn = globalServeQuery || ((serveProcess && serveReady) ? serveQuery : null);
6495
+ let output = null;
6496
+ if (queryFn) {
6497
+ try {
6498
+ const resp = await queryFn('grep', {
6499
+ pattern: args.pattern,
6500
+ magento_root: root,
6501
+ path: searchPath,
6502
+ include,
6503
+ context: ctxLines,
6504
+ max_results: maxResults,
6505
+ files_only: filesOnly,
6506
+ ignore_case: args.ignoreCase || false
6507
+ }, 30000);
6508
+ if (resp.ok && resp.data) {
6509
+ const matches = resp.data.matches || [];
6510
+ const total = resp.data.total || matches.length;
6511
+ const grepElapsed = Date.now() - grepStart;
6512
+ if (grepElapsed > 5000) logToFile('WARN', `grep: slow query "${args.pattern}" — ${total} matches in ${grepElapsed}ms`);
6513
+
6514
+ if (filesOnly) {
6515
+ let text = `## grep (files only): \`${args.pattern}\`\nFound **${total}** file(s)${total > maxResults ? ` (showing first ${maxResults})` : ''}. Use magento_read with methodName to read specific methods.\n\n`;
6516
+ for (const m of matches) text += (m.file || m) + '\n';
6517
+ return { content: [{ type: 'text', text }] };
6518
+ } else {
6519
+ let text = `## grep: \`${args.pattern}\`\nFound **${total}** matches${total > maxResults ? ` (showing first ${maxResults})` : ''}\n\n`;
6520
+ for (const m of matches) {
6521
+ if (m.is_context) {
6522
+ text += `${m.file}-${m.line}-${m.text}\n`;
6523
+ } else {
6524
+ text += `${m.file}:${m.line}:${m.text}\n`;
6525
+ }
6526
+ }
6527
+ return { content: [{ type: 'text', text }] };
6528
+ }
6529
+ }
6530
+ } catch (err) {
6531
+ logToFile('WARN', `grep: serve query failed, falling back to external grep: ${err.message}`);
6532
+ }
6533
+ }
6534
+
6535
+ // Fallback: external GNU grep (cold-start path or serve error)
6577
6536
  const grepArgs = filesOnly ? ['-rl', '-E'] : ['-rn', '-E'];
6578
6537
  if (args.ignoreCase) grepArgs.push('-i');
6579
6538
  if (!filesOnly && ctxLines > 0) grepArgs.push('-C', String(ctxLines));
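The serve-backed branch above prints matches in GNU grep's convention: `file:line:text` for matching lines and `file-line-text` for context lines. A minimal sketch of that formatting step, assuming each match has the shape `{ file, line, text, is_context }` implied by the diff; `formatGrepMatches` is a hypothetical name, not a function from the package.

```javascript
// Hypothetical helper: render serve `grep` matches the way GNU grep prints them.
// Assumes each match looks like { file, line, text, is_context }; in files-only
// mode an entry may also be a bare path string, as the diff's `m.file || m` suggests.
function formatGrepMatches(matches, filesOnly = false) {
  let out = '';
  for (const m of matches) {
    if (filesOnly) {
      out += (m.file || m) + '\n';
    } else {
      // Context lines use '-' as the separator, matching lines use ':'.
      const sep = m.is_context ? '-' : ':';
      out += `${m.file}${sep}${m.line}${sep}${m.text}\n`;
    }
  }
  return out;
}
```

Keeping the separators identical to GNU grep means output from the Rust serve path and from the external-grep fallback can be read the same way by the caller.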
@@ -6581,8 +6540,6 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6581
6540
  grepArgs.push('--include=' + pat);
6582
6541
  }
6583
6542
  grepArgs.push('--', args.pattern, searchPath);
6584
- let output;
6585
- const grepStart = Date.now();
6586
6543
  try {
6587
6544
  output = execFileSync('grep', grepArgs, {
6588
6545
  cwd: root, encoding: 'utf-8', timeout: 30000,
@@ -6756,7 +6713,11 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6756
6713
  case 'magento_read': {
6757
6714
  const root = config.magentoRoot;
6758
6715
  if (!root) return { content: [{ type: 'text', text: 'MAGENTO_ROOT not set.' }], isError: true };
6759
- const filePath = path.join(root, args.path);
6716
+ const filePath = safePath(root, args.path);
6717
+ if (!filePath) {
6718
+ logToFile('WARN', `read: rejected path traversal attempt: "${args.path}"`);
6719
+ return { content: [{ type: 'text', text: `Path escapes project root: ${args.path}` }], isError: true };
6720
+ }
6760
6721
  let content;
6761
6722
  try { content = readFileSync(filePath, 'utf-8'); } catch (err) {
6762
6723
  logToFile('WARN', `read: file not found: ${args.path} (${err.code || err.message})`);
@@ -6799,7 +6760,7 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6799
6760
  }
6800
6761
 
6801
6762
  case 'magento_ast_search': {
6802
- const astResults = await astSearch(args.pattern, args.path, args.lang, args.maxResults);
6763
+ const astResults = await astSearch(args.pattern, args.path, args.maxResults);
6803
6764
  if (astResults.length === 0) {
6804
6765
  return { content: [{ type: 'text', text: `## magento_ast_search: \`${args.pattern}\`\n\nNo matches found.` }] };
6805
6766
  }
@@ -6834,8 +6795,12 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6834
6795
  if (!root) return { content: [{ type: 'text', text: 'MAGENTO_ROOT not set.' }], isError: true };
6835
6796
  let text = `## magento_enrich\n\nScanning vendor/ PHP files for method chains...\n`;
6836
6797
  try {
6837
- const { scanned, chains } = await enrichMethodChains(root);
6838
- text += `\n✅ **Done**\n- Files scanned: ${scanned}\n- Method chains indexed: ${chains}\n- Null-risk index saved to: \`.magector/enrichment.db\`\n\nUse \`magento_find_null_risks\` to query unsafe chains.`;
6798
+ const resp = await serveQuery('enrich', { magento_root: root }, 120000);
6799
+ if (resp.ok) {
6800
+ text += `\n✅ **Done**\n- Files scanned: ${resp.data.scanned}\n- Method chains indexed: ${resp.data.chains}\n- Null-risk index saved to: \`.magector/data.db\``;
6801
+ } else {
6802
+ text += `\n❌ Error: ${resp.error}`;
6803
+ }
6839
6804
  } catch (err) {
6840
6805
  text += `\n❌ Error: ${err.message}`;
6841
6806
  }
@@ -6846,32 +6811,37 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => {
6846
6811
  const root = config.magentoRoot;
6847
6812
  if (!root) return { content: [{ type: 'text', text: 'MAGENTO_ROOT not set.' }], isError: true };
6848
6813
  const limit = Math.min(args.limit || 100, 500);
6849
- const rows = await queryNullRisks(root, args.firstMethod || null, limit);
6850
- if (rows === null) {
6851
- return { content: [{ type: 'text', text: `## magento_find_null_risks\n\n⚠️ Enrichment index not found. Run \`magento_enrich\` first to build the method-chain index.` }] };
6852
- }
6853
- if (rows.length === 0) {
6814
+ try {
6815
+ const resp = await serveQuery('enrich_query', { first_method: args.firstMethod || null, limit }, 30000);
6816
+ if (!resp.ok) {
6817
+ return { content: [{ type: 'text', text: `## magento_find_null_risks\n\n⚠️ ${resp.error || 'Query failed'}` }] };
6818
+ }
6819
+ const rows = resp.data || [];
6820
+ if (rows.length === 0) {
6821
+ const filter = args.firstMethod ? ` for \`->${args.firstMethod}()\`` : '';
6822
+ return { content: [{ type: 'text', text: `## magento_find_null_risks${filter}\n\nNo unsafe chains found. All detected chains have null guards.` }] };
6823
+ }
6854
6824
  const filter = args.firstMethod ? ` for \`->${args.firstMethod}()\`` : '';
6855
- return { content: [{ type: 'text', text: `## magento_find_null_risks${filter}\n\nNo unsafe chains found. All detected chains have null guards.` }] };
6856
- }
6857
- const filter = args.firstMethod ? ` for \`->${args.firstMethod}()\`` : '';
6858
- let text = `## magento_find_null_risks${filter}\n\nFound **${rows.length}** chain(s) without null guard:\n\n`;
6859
- // Group by chain type for readability
6860
- const byChain = {};
6861
- for (const r of rows) {
6862
- const key = r.chain || `->${r.first_method}()->${r.second_method}()`;
6863
- if (!byChain[key]) byChain[key] = [];
6864
- byChain[key].push(r);
6865
- }
6866
- for (const [chain, sites] of Object.entries(byChain)) {
6867
- text += `### \`${chain}\` (${sites.length} site${sites.length > 1 ? 's' : ''})\n`;
6868
- for (const s of sites.slice(0, 20)) {
6869
- text += `- \`${s.file}:${s.line}\`\n`;
6825
+ let text = `## magento_find_null_risks${filter}\n\nFound **${rows.length}** chain(s) without null guard:\n\n`;
6826
+ // Group by chain type for readability
6827
+ const byChain = {};
6828
+ for (const r of rows) {
6829
+ const key = r.chain || `->${r.first_method}()->${r.second_method}()`;
6830
+ if (!byChain[key]) byChain[key] = [];
6831
+ byChain[key].push(r);
6870
6832
  }
6871
- if (sites.length > 20) text += `- ... and ${sites.length - 20} more\n`;
6872
- text += '\n';
6833
+ for (const [chain, sites] of Object.entries(byChain)) {
6834
+ text += `### \`${chain}\` (${sites.length} site${sites.length > 1 ? 's' : ''})\n`;
6835
+ for (const s of sites.slice(0, 20)) {
6836
+ text += `- \`${s.file}:${s.line}\`\n`;
6837
+ }
6838
+ if (sites.length > 20) text += `- ... and ${sites.length - 20} more\n`;
6839
+ text += '\n';
6840
+ }
6841
+ return { content: [{ type: 'text', text }] };
6842
+ } catch (err) {
6843
+ return { content: [{ type: 'text', text: `## magento_find_null_risks\n\n⚠️ Enrichment index not found. Run \`magento_enrich\` first.` }] };
6873
6844
  }
6874
- return { content: [{ type: 'text', text }] };
6875
6845
  }
6876
6846
 
6877
6847
  default:
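The grouping step in the `magento_find_null_risks` handler can be isolated as below. This is a sketch only: `groupByChain` and `renderGroups` are hypothetical names, the row shape `{ file, line, chain, first_method, second_method }` is taken from the diff, and a `Map` is swapped in for the handler's plain object.

```javascript
// Hypothetical helper: bucket rows by their chain string, falling back to
// first_method/second_method when `chain` is absent (as the handler does).
function groupByChain(rows) {
  const byChain = new Map();
  for (const r of rows) {
    const key = r.chain || `->${r.first_method}()->${r.second_method}()`;
    if (!byChain.has(key)) byChain.set(key, []);
    byChain.get(key).push(r);
  }
  return byChain;
}

// Hypothetical helper: one heading per chain, at most `perGroup` sites listed,
// with a trailing "... and N more" line when a group is truncated.
function renderGroups(byChain, perGroup = 20) {
  let text = '';
  for (const [chain, sites] of byChain) {
    text += `### \`${chain}\` (${sites.length} site${sites.length > 1 ? 's' : ''})\n`;
    for (const s of sites.slice(0, perGroup)) text += `- \`${s.file}:${s.line}\`\n`;
    if (sites.length > perGroup) text += `- ... and ${sites.length - perGroup} more\n`;
    text += '\n';
  }
  return text;
}
```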
package/src/update.js CHANGED
@@ -8,7 +8,7 @@
8
8
  * Never blocks the CLI on failure — network errors are silently ignored.
9
9
  */
10
10
  import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
11
- import { execSync } from 'child_process';
11
+ import { execFileSync } from 'child_process';
12
12
  import { homedir } from 'os';
13
13
  import path from 'path';
14
14
  import { fileURLToPath } from 'url';
@@ -140,14 +140,29 @@ export async function checkForUpdate(command, originalArgs) {
140
140
  }
141
141
  }
142
142
 
143
+ /**
144
+ * Validate a semver string to prevent shell injection via malicious registry
145
+ * responses. Only digits, dots, dashes and alphanumerics allowed (semver prerelease).
146
+ * Example: "1.2.3", "1.2.3-beta.1", "2.0.0-rc.9" — yes. "1; rm -rf ~" — no.
147
+ */
148
+ function isSafeVersion(v) {
149
+ return typeof v === 'string' && /^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$/.test(v);
150
+ }
151
+
143
152
  /**
144
153
  * Re-exec the current command with the latest version.
145
154
  */
146
155
  function reExec(current, latest, originalArgs) {
156
+ // Defensive: reject anything that doesn't look like a real semver so a
157
+ // compromised npm registry response can't inject shell metacharacters.
158
+ if (!isSafeVersion(latest)) {
159
+ return; // silently skip — never block CLI on update check
160
+ }
147
161
  console.log(`\n⬆ Updating magector: v${current} → v${latest}...\n`);
148
162
  try {
149
- const cmd = `npx -y magector@${latest} ${originalArgs.join(' ')}`;
150
- execSync(cmd, {
163
+ // execFileSync with an argv array (no shell) — originalArgs are passed as
164
+ // individual argv entries, so spaces/metachars in them can't expand.
165
+ execFileSync('npx', ['-y', `magector@${latest}`, ...originalArgs], {
151
166
  stdio: 'inherit',
152
167
  env: { ...process.env, MAGECTOR_NO_UPDATE: '1' }
153
168
  });
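The version check added in this hunk can be exercised on its own. The sketch below copies the `isSafeVersion` pattern from the diff; `buildNpxArgs` is a hypothetical helper that only shows the argv array the diff passes to `execFileSync('npx', ...)`, and the sample inputs are illustrative.

```javascript
// Same pattern as isSafeVersion in the diff: MAJOR.MINOR.PATCH with an optional
// prerelease suffix made of alphanumerics, dots and dashes.
function isSafeVersion(v) {
  return typeof v === 'string' && /^[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?$/.test(v);
}

// Hypothetical helper: build the argv for npx, or return null when the version
// string is rejected. Each original argument stays a separate argv entry, so
// no shell ever interprets spaces or metacharacters inside it.
function buildNpxArgs(latest, originalArgs) {
  if (!isSafeVersion(latest)) return null;
  return ['-y', `magector@${latest}`, ...originalArgs];
}
```

Note that this pattern also rejects semver build metadata such as `2.16.0+build.5`, since `+` is not in the allowed character set.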