muaddib-scanner 2.11.48 → 2.11.52

package/README.md CHANGED
@@ -30,7 +30,7 @@
30
30
 
31
31
  npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
32
32
 
33
- MUAD'DIB combines **17 parallel scanners** (234 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (16 compound rules), **ML classifiers** (XGBoost), and gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages.
33
+ MUAD'DIB combines **20 parallel scanners** (262 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is **currently inactive** (see [Evaluation Metrics](#evaluation-metrics) → ML Classifier section).
34
34
 
35
35
  ---
36
36
 
@@ -169,14 +169,14 @@ muaddib scrape # Full IOC refresh (~5min)
169
169
  muaddib diff HEAD~1 # Compare threats with previous commit
170
170
  muaddib init-hooks # Pre-commit hooks (husky/pre-commit/git)
171
171
  muaddib scan . --breakdown # Explainable score decomposition
172
- muaddib replay # Ground truth validation (61/65 TPR@3)
172
+ muaddib replay # Ground truth validation (90/94 TPR@3, v2.11.48)
173
173
  ```
174
174
 
175
175
  ---
176
176
 
177
177
  ## Features
178
178
 
179
- ### 17 parallel scanners
179
+ ### 20 parallel scanners
180
180
 
181
181
  | Scanner | Detection |
182
182
  |---------|-----------|
@@ -198,10 +198,13 @@ muaddib replay # Ground truth validation (61/65 TPR@3)
198
198
  | Anti-Forensic AST (intel-triage P1.2) | XOR loop + self-delete + decoy write compound (csec autodelete) |
199
199
  | Stub Package (intel-triage P1.3) | Tiny main file + external dep URL + lifecycle hook (ltidi chain) |
200
200
  | Monorepo Scanner | Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix) |
201
+ | Trusted-Dep-Diff (opt-in) | Diff against trusted dep tarballs from registry (v2.10.x) |
202
+ | Python Source (PYSRC) | Import-time / install-time RCE patterns in `__init__.py` / `setup.py` (v2.11.41 — closes TrapDoor PyPI gap) |
203
+ | Python AST (PYAST) | Tree-sitter-Python AST with taint-aware detectors (v2.11.42+) |
201
204
 
202
- ### 234 detection rules
205
+ ### 259 detection rules
203
206
 
204
- All rules (229 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
207
+ All rules (254 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21147) for the complete rules reference.
205
208
 
206
209
  ### Detected campaigns
207
210
 
@@ -275,7 +278,7 @@ With pre-commit framework:
275
278
  ```yaml
276
279
  repos:
277
280
  - repo: https://github.com/DNSZLSK/muad-dib
278
- rev: v2.11.24
281
+ rev: v2.11.48
279
282
  hooks:
280
283
  - id: muaddib-scan
281
284
  ```
@@ -284,33 +287,57 @@ repos:
284
287
 
285
288
  ## Evaluation Metrics
286
289
 
290
+ Latest measurement: **v2.11.48** (2026-05-26, Track D + PyPI download fix). Ground truth holds 96 samples (94 in-scope, 2 out-of-scope protestware). This run measures the full 94 in-scope set after the 2026-05-25 enrichment (Track C synthetic for the new PYSRC/PYAST/AST-092/AICONF-004/PKG-022 rules, Track A real-world tarballs recovered from VPS archive, Track B reconstructions from the in-house security-review benchmark).
291
+
292
+ ### Operational metrics (what an operator actually gets)
293
+
294
+ These are the numbers a user gets when running `muaddib scan` against npm or PyPI packages. The pipeline executes scanners + FP caps only — no ML filter is applied (see ML Classifier note below).
295
+
287
296
  | Metric | Result | Details |
288
297
  |--------|--------|---------|
298
+ | **Wild TPR** (Datadog 17K) | **92.8%** (13,538/14,587 in-scope) | 17,922 packages. 3,335 skipped (no JS). By category: compromised_lib 97.8%, malicious_intent 92.1% — last measurement v2.9.4, independent of GT. |
299
+ | **TPR@3** (detection rate, v2.11.48) | **95.74%** (90/94 in-scope) | Full GT re-measurement. Threshold=3: any signal. 13 PyPI samples (was 0). 4 misses incl. 3 browser-only (lottie-player, polyfill-io, trojanized-jquery). |
300
+ | **TPR@20** (alert rate, v2.11.48) | **88.30%** (83/94 in-scope) | Operational alert threshold=20. **+3.1pp vs v2.11.47** — Track D `recon_exfil_direct_ip` compound (MUADDIB-COMPOUND-016) closed the GT-095 gap (risk 3→50) and boosted GT-091 byvendors / GT-092 heloo131313 through `linux_fingerprint_exec`. |
301
+ | **FPR rules** (Benign curated, v2.11.48 measure) | **1.10%** (6/545 scanned, 548 total) | **Unchanged after Track D** — the new compound + types created zero new FPs (sameFile gate + public-IP-only filter). Drop from 15.6% (v2.10.95) is attributable to FP caps F1-F14 (v2.10.97 → v2.11.31). 6 remaining FPs are real (meteor, prisma, @prisma/client, drizzle-orm, scrypt, liquid). |
302
+ | **FPR** (Benign random, v2.11.48) | **2.50%** (5/200) | 200 random npm packages, unchanged. |
303
+ | **FPR PyPI** (v2.11.48, first honest measurement) | **9.68%** (12/124 scanned, 132 total) | **Track D fixed the PyPI downloader** — removed `pip --no-binary :all:` flag (forced compile of wheel-only packages, timed out 38% of the time) + added `.whl` extraction via `extractArchive()`. Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s `PACK_TIMEOUT_MS`. |
304
+ | **ADR** (Adversarial + Holdout, v2.11.48) | **96.26%** (103/107) | 67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95. |
305
+
306
+ **3913 tests** across 109 files. **262 rules** (257 RULES + 5 PARANOID — Track D added 3: AST-093, AST-094, COMPOUND-016).
307
+
308
+ **Known issues (v2.11.48):**
309
+ - *PyPI cap at 35/100*: Python samples are capped at `riskScore=35` even when `globalRiskScore=100`. Confirmed empirically: all 12 PyPI FPs score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI malware detection at higher thresholds. Track E target.
310
+
311
+ ### ML Classifier (offline only)
312
+
313
+ `src/ml/classifier.js` is **not wired into `muaddib scan`**. The XGBoost model is currently exercised only by `muaddib evaluate` (offline metric replay) and `muaddib monitor` (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see `src/monitor/queue.js:628`). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.
314
+
315
+ > **Static evaluation caveats:**
316
+ > - TPR measured on the full 94 in-scope samples from the 96-sample ground truth (2 out-of-scope protestware GT-005/GT-009 with `min_threats=0`)
317
+ > - TPR@3 = detection rate (any signal); TPR@20 = operational alert threshold
318
+ > - FPR rules measured on 548 curated popular npm packages (not a random sample)
319
+ > - FPR PyPI: 124/132 scanned (8 download fails on >500MB giants — torch/tensorflow/ansible/…). Smaller N than npm.
320
+ > - ADR measured with global threshold (score >= 20) as of v2.6.5
321
+
322
+ See [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) for the full experimental protocol, holdout history, and Datadog benchmark details.
323
+
324
+ ### ML Classifier — R&D, currently inactive
325
+
326
+ > **Status (2026-04-08 → present):** The XGBoost classifier (`src/ml/classifier.js`) is **not wired into `muaddib scan`** at all, and in `muaddib monitor` it runs in **LOG-ONLY mode** since 2026-04-08 — the trained model collapsed (predicts p≈0.002 for every input, including clearly malicious lifecycle+exec+staged_payload patterns) and was disabled pending retrain on balanced JSONL data. The metrics below come from offline `muaddib evaluate` replay against a frozen bench. They describe what the model *would* contribute if it worked, **not** what an operator gets today.
327
+
328
+ | Metric (offline `evaluate` replay) | Result | Details |
329
+ |--------|--------|---------|
289
330
  | **ML FPR** | **2.85%** (239/8,393 holdout) | XGBoost retrained on 56,564 samples, 64 features, threshold=0.710 |
290
331
  | **ML TPR** | **99.93%** (2,918/2,920 holdout) | 377 confirmed_malicious via OSSF/GHSA/npm correlation |
291
- | **Wild TPR** (Datadog 17K) | **92.8%** (13,538/14,587 in-scope) | 17,922 packages. 3,335 skipped (no JS). By category: compromised_lib 97.8%, malicious_intent 92.1% |
292
- | **TPR@3** (detection rate) | **93.85%** (61/65) | 67 real attacks (65 active, 2 out-of-scope: GT-005 colors, GT-009 faker — protestware with min_threats=0). Threshold=3: any signal |
293
- | **TPR@20** (alert rate) | **86.2%** (56/65) | Operational alert threshold=20, aligned with ADR/FPR |
294
- | **FPR rules** (Benign curated, v2.10.95 measure) | **15.6%** (85/545 scanned, 548 total) | npm packages, real source via `npm pack`; v2.10.74 estimated 6-9% reduction did NOT materialize on rebuilt corpus |
295
- | **FPR after ML** (v2.10.95 measure) | **10.28%** (56/545 scanned) | ML filters 29/30 T1 benign, 0 GT/ADR suppressed |
296
- | **FPR** (Benign random, v2.10.95 measure) | **7.0%** (14/200) | 200 random npm packages, stratified sampling |
297
- | **ADR** (Adversarial + Holdout) | **96.3%** (103/107) | 67 adversarial + 40 holdout (107 available on disk), global threshold=20 |
298
-
299
- **3664 tests** across 93 files. **234 rules** (229 RULES + 5 PARANOID).
332
+ | **FPR after ML T1** (offline replay, v2.11.48) | **1.10%** (6/545 scanned) | Classifier filters 0/6 raw FPs in this run (filtered 1 at v2.11.47). Not applied during real scans — `muaddib scan` never invokes the classifier. |
300
333
 
301
- > **ML retrain methodology (v2.10.51):**
334
+ > **Retrain methodology (v2.10.51):**
302
335
  > - Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)
303
336
  > - Dataset: 56,564 samples (14,602 malicious, 41,962 clean). Stratified 80/20 split
304
337
  > - Grid search: depth=4, estimators=300, lr=0.05. AUC-ROC=0.999, F1=0.960
305
338
  > - Leaky feature filter: 23 dead/leaky features removed (source-identity proxies)
306
339
  >
307
- > **Static evaluation caveats:**
308
- > - TPR measured on 65 active Node.js attack samples (2 out-of-scope: GT-005 colors, GT-009 faker, both protestware with min_threats=0; from 67 total)
309
- > - TPR@3 = detection rate (any signal); TPR@20 = operational alert threshold
310
- > - FPR measured on 532 curated popular npm packages (not a random sample)
311
- > - ADR measured with global threshold (score >= 20) as of v2.6.5
312
-
313
- See [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) for the full experimental protocol, holdout history, and Datadog benchmark details.
340
+ > The shadow model continues to log predictions in `muaddib monitor` for retraining validation. When the next model passes shadow validation, the LOG-ONLY guard in `src/monitor/queue.js:660` will be flipped and the metrics above will move back into the operational table.
314
341
 
315
342
  ---
316
343
 
@@ -344,11 +371,11 @@ npm test
344
371
 
345
372
  ### Testing
346
373
 
347
- - **3664 tests** across 93 modular test files
374
+ - **3913 tests** across 109 modular test files
348
375
  - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
349
376
  - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
350
- - **Ground truth validation** - 67 real-world attacks (93.85% TPR@3, 86.2% TPR@20 — v2.10.95 measure)
351
- - **False positive validation** (v2.10.95 measure) - 15.6% FPR rules (85/545 scanned), 10.28% after ML (56/545 scanned), 7.0% on 200 random
377
+ - **Ground truth validation** - 96 real-world attacks (95.74% TPR@3, 88.30% TPR@20 — v2.11.48 full measure on 94 in-scope)
378
+ - **False positive validation** (v2.11.48 measure) - 1.10% FPR rules (6/545 scanned), 2.50% on 200 random, 9.68% on 124/132 PyPI (first honest measurement post-Track-D download fix). ML classifier currently inactive — see Evaluation Metrics → ML Classifier.
352
379
 
353
380
  ---
354
381
 
@@ -365,7 +392,7 @@ npm test
365
392
  - [Documentation Index](docs/INDEX.md) - All documentation in one place
366
393
  - [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) - Experimental protocol, holdout scores
367
394
  - [Threat Model](docs/threat-model.md) - What MUAD'DIB detects and doesn't detect
368
- - [Security Policy](SECURITY.md) - Detection rules reference (234 rules)
395
+ - [Security Policy](SECURITY.md) - Detection rules reference (259 rules)
369
396
  - [Security Audit](docs/SECURITY_AUDIT.md) - Bypass validation report
370
397
  - [FP Analysis](docs/EVALUATION.md) - Historical false positive analysis
371
398
 
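Reviewer note: the headline rates in the updated Evaluation Metrics table are plain ratios of the counts printed next to them, so they can be recomputed as a quick sanity check. The sketch below does only that arithmetic; the `pct` helper and the `rates` object are illustrative names and are not part of the package.

```javascript
// Recompute the published percentages from the hit/total counts in the README table.
// `pct` and `rates` are hypothetical names used only for this check.
const pct = (hits, total) => ((100 * hits) / total).toFixed(2);

const rates = {
  'TPR@3': pct(90, 94),       // in-scope ground-truth samples with any signal
  'TPR@20': pct(83, 94),      // in-scope samples at the alert threshold
  'FPR rules': pct(6, 545),   // curated benign npm packages scanned
  'FPR random': pct(5, 200),  // random npm packages
  'FPR PyPI': pct(12, 124),   // PyPI packages actually scanned
  'ADR': pct(103, 107),       // adversarial + holdout
};

for (const [name, value] of Object.entries(rates)) {
  console.log(`${name}: ${value}%`);
}
```

The recomputed values agree with the table to two decimals. The "+3.1pp vs v2.11.47" delta cannot be checked this way because the v2.11.47 counts are not shown in this diff.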
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "muaddib-scanner",
3
- "version": "2.11.48",
3
+ "version": "2.11.52",
4
4
  "description": "Supply-chain threat detection & response for npm & PyPI/Python",
5
5
  "main": "src/index.js",
6
6
  "bin": {
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "target": "node_modules",
3
- "timestamp": "2026-05-26T08:43:39.544Z",
3
+ "timestamp": "2026-05-26T21:21:36.874Z",
4
4
  "threats": [
5
5
  {
6
6
  "type": "string_mutation_obfuscation",
@@ -141,7 +141,10 @@ async function getWeeklyDownloads(packageName) {
141
141
  }
142
142
  try {
143
143
  const url = `https://api.npmjs.org/downloads/point/last-week/${encodeURIComponent(packageName)}`;
144
- const body = await httpsGet(url, 3000);
144
+ // Routed via _deps so tests can stub the downloads endpoint independently
145
+ // of the registry endpoint (Stage 2.1 added parallel-fetch from
146
+ // preResolveNpmBatch).
147
+ const body = await _deps.httpsGet(url, 3000);
145
148
  const data = JSON.parse(body);
146
149
  const downloads = typeof data.downloads === 'number' ? data.downloads : -1;
147
150
  downloadsCache.set(packageName, { downloads, fetchedAt: Date.now() });
@@ -158,12 +161,15 @@ function getNpmTarballUrl(pkgData) {
158
161
  }
159
162
 
160
163
  async function getPyPITarballUrl(packageName, packageVersion = '') {
161
- // Per-version endpoint when we know the version (e.g. from the XML-RPC changelog) —
162
- // guarantees we scan the artifact that just landed, not whatever became "latest"
163
- // between event detection and scan. Falls back to /pypi/<name>/json (latest) otherwise.
164
- const url = packageVersion
165
- ? `https://pypi.org/pypi/${encodeURIComponent(packageName)}/${encodeURIComponent(packageVersion)}/json`
166
- : `https://pypi.org/pypi/${encodeURIComponent(packageName)}/json`;
164
+ // Always hit the package-level endpoint. It contains:
165
+ // - info.version → latest version
166
+ // - urls → files for the latest version
167
+ // - releases → files for ALL versions (so we can find packageVersion's
168
+ // exact artifact, same anti-race guarantee as the per-
169
+ // version endpoint used to provide)
170
+ // We extract triage metadata (age_days, version_count) from `releases` in
171
+ // the same round-trip — keeps Stage 2's PyPI cost at 1 HTTP call.
172
+ const url = `https://pypi.org/pypi/${encodeURIComponent(packageName)}/json`;
167
173
  const body = await _deps.httpsGet(url);
168
174
  let data;
169
175
  try {
@@ -171,20 +177,58 @@ async function getPyPITarballUrl(packageName, packageVersion = '') {
171
177
  } catch (e) {
172
178
  throw new Error(`Invalid JSON from PyPI for ${packageName}: ${e.message}`);
173
179
  }
174
- const version = (data.info && data.info.version) || packageVersion || '';
175
- const urls = data.urls || [];
176
- // Prefer sdist (.tar.gz)
177
- const sdist = urls.find(u => u.packagetype === 'sdist' && u.url);
178
- if (sdist) return { url: sdist.url, version };
179
- // Fallback: any .tar.gz
180
- const tarGz = urls.find(u => u.url && u.url.endsWith('.tar.gz'));
181
- if (tarGz) return { url: tarGz.url, version };
182
- // Fallback: wheel (.whl) extracted via adm-zip in queue.js, not tar.
183
- // Legacy .egg / .tar.bz2 / .exe installers intentionally NOT returned —
184
- // they were the cause of ~2773 tar_failed/day before this fix.
185
- const wheel = urls.find(u => u.url && (u.url.endsWith('.whl') || u.url.endsWith('.zip')));
186
- if (wheel) return { url: wheel.url, version };
187
- return { url: null, version };
180
+
181
+ const latestVersion = (data.info && data.info.version) || '';
182
+ const version = packageVersion || latestVersion;
183
+ const releases = (data && data.releases) || {};
184
+
185
+ // Pick files for the requested version (preserves the original anti-race
186
+ // guarantee we scan the exact version flagged by the changelog). If
187
+ // absent (e.g. lazy resolution without a known version), use latest urls.
188
+ const files = (packageVersion && Array.isArray(releases[packageVersion]))
189
+ ? releases[packageVersion]
190
+ : (Array.isArray(data.urls) ? data.urls : []);
191
+
192
+ // Tarball selection priority unchanged: sdist > .tar.gz > .whl/.zip.
193
+ // Legacy .egg / .tar.bz2 / .exe intentionally not returned (they were the
194
+ // cause of ~2773 tar_failed/day before the original fix).
195
+ let tarballUrl = null;
196
+ const sdist = files.find(u => u && u.packagetype === 'sdist' && u.url);
197
+ if (sdist) {
198
+ tarballUrl = sdist.url;
199
+ } else {
200
+ const tarGz = files.find(u => u && u.url && u.url.endsWith('.tar.gz'));
201
+ if (tarGz) {
202
+ tarballUrl = tarGz.url;
203
+ } else {
204
+ const wheel = files.find(u => u && u.url && (u.url.endsWith('.whl') || u.url.endsWith('.zip')));
205
+ if (wheel) tarballUrl = wheel.url;
206
+ }
207
+ }
208
+
209
+ // Stage 2 triage metadata: derived from `releases` once per fetch.
210
+ const versionCount = Object.keys(releases).length;
211
+ let earliestUpload = Number.MAX_SAFE_INTEGER;
212
+ for (const v of Object.keys(releases)) {
213
+ const versionFiles = releases[v];
214
+ if (!Array.isArray(versionFiles)) continue;
215
+ for (const f of versionFiles) {
216
+ if (f && f.upload_time) {
217
+ const ts = Date.parse(f.upload_time);
218
+ if (Number.isFinite(ts) && ts < earliestUpload) earliestUpload = ts;
219
+ }
220
+ }
221
+ }
222
+ const ageDays = earliestUpload !== Number.MAX_SAFE_INTEGER
223
+ ? Math.floor((Date.now() - earliestUpload) / 86_400_000)
224
+ : null;
225
+
226
+ return {
227
+ url: tarballUrl,
228
+ version,
229
+ age_days: ageDays,
230
+ version_count: versionCount,
231
+ };
188
232
  }
189
233
 
190
234
  // --- RSS parsing ---
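Reviewer note: the rewritten `getPyPITarballUrl` does two things with a single package-level response. It picks an artifact by priority (sdist, then any `.tar.gz`, then `.whl`/`.zip`), and it derives `age_days` and `version_count` from the `releases` map. The following is a minimal standalone sketch of that logic, not the package's code: `pickArtifact`, `summarizeReleases`, and the sample data are hypothetical, and the sample timestamps carry an explicit `Z` so that `Date.parse` is timezone-independent here.

```javascript
const MS_PER_DAY = 86_400_000;

// Artifact priority as described in the diff: sdist > any .tar.gz > .whl/.zip.
// Legacy formats (.egg, .tar.bz2, .exe) never match and yield null.
function pickArtifact(files) {
  const sdist = files.find(f => f && f.packagetype === 'sdist' && f.url);
  if (sdist) return sdist.url;
  const tarGz = files.find(f => f && f.url && f.url.endsWith('.tar.gz'));
  if (tarGz) return tarGz.url;
  const wheel = files.find(
    f => f && f.url && (f.url.endsWith('.whl') || f.url.endsWith('.zip'))
  );
  return wheel ? wheel.url : null;
}

// Triage metadata: number of versions and days since the earliest upload.
function summarizeReleases(releases, now = Date.now()) {
  let earliest = Infinity;
  for (const files of Object.values(releases)) {
    if (!Array.isArray(files)) continue;
    for (const f of files) {
      const ts = f && f.upload_time ? Date.parse(f.upload_time) : NaN;
      if (Number.isFinite(ts) && ts < earliest) earliest = ts;
    }
  }
  return {
    version_count: Object.keys(releases).length,
    age_days: Number.isFinite(earliest)
      ? Math.floor((now - earliest) / MS_PER_DAY)
      : null,
  };
}

// Made-up sample shaped like the `releases` field of a package-level response.
const releases = {
  '1.0.0': [
    { packagetype: 'bdist_wheel', url: 'https://example.invalid/demo-1.0.0-py3-none-any.whl', upload_time: '2026-05-01T00:00:00Z' },
  ],
  '1.1.0': [
    { packagetype: 'bdist_wheel', url: 'https://example.invalid/demo-1.1.0-py3-none-any.whl', upload_time: '2026-05-11T00:00:00Z' },
    { packagetype: 'sdist', url: 'https://example.invalid/demo-1.1.0.tar.gz', upload_time: '2026-05-11T00:00:00Z' },
  ],
};
const now = Date.parse('2026-05-26T00:00:00Z');

console.log(pickArtifact(releases['1.1.0']));
console.log(summarizeReleases(releases, now));
```

One design point worth checking in review: `version_count` counts every key in `releases`, including versions whose file list is empty, while `age_days` only considers versions that still have at least one file with an `upload_time`.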
@@ -372,7 +416,7 @@ async function getNpmLatestTarball(packageName) {
372
416
  await acquireRegistrySlot();
373
417
  let body;
374
418
  try {
375
- body = await httpsGet(url);
419
+ body = await _deps.httpsGet(url);
376
420
  } finally {
377
421
  releaseRegistrySlot();
378
422
  }
@@ -388,11 +432,153 @@ async function getNpmLatestTarball(packageName) {
388
432
  version: '', tarball: null, unpackedSize: 0, scripts: {},
389
433
  homepage: '', description: '',
390
434
  latestTagVersion: null, recentVersions: [],
435
+ age_days: null, version_count: 0,
391
436
  };
392
437
  }
438
+ // Stage 2.1 — extract reputation signals from the packument we already have,
439
+ // so triageRisk in queue.js doesn't have to refetch metadata via
440
+ // getPackageMetadata. Two fields are derivable from the packument alone:
441
+ // - age_days : time.created (package creation timestamp)
442
+ // - version_count : Object.keys(versions).length (excludes unpublished
443
+ // tombstones kept only in `time`)
444
+ // weekly_downloads requires a separate api.npmjs.org call and is fetched in
445
+ // parallel by preResolveNpmBatch (it has its own cache + no semaphore).
446
+ const createdAt = (packument && packument.time && packument.time.created) || null;
447
+ result.age_days = createdAt
448
+ ? Math.floor((Date.now() - new Date(createdAt).getTime()) / 86_400_000)
449
+ : null;
450
+ result.version_count = (packument && packument.versions)
451
+ ? Object.keys(packument.versions).length : 0;
393
452
  return result;
394
453
  }
395
454
 
455
+ // --- Pre-resolution helpers ---
456
+ //
457
+ // Resolve tarball URLs and metadata at ingestion time so scan workers do not
458
+ // each pay a separate registry round-trip. Best-effort: any failure leaves
459
+ // item.tarballUrl untouched (null) so resolveTarballAndScan() in queue.js
460
+ // falls back to its existing lazy-resolution path (zero scan loss).
461
+ //
462
+ // HTTP throttling: getNpmLatestTarball / getPyPITarballUrl already acquire
463
+ // the shared REGISTRY_SEMAPHORE_MAX=20 slot + 30 req/sec token bucket, so
464
+ // fan-out is naturally bounded — bursts queue up rather than overrun the
465
+ // registry. We still chunk explicitly below so the Promise closures don't
466
+ // pile up on a 1000-item catch-up batch (each waiting on the semaphore
467
+ // holds ~10KB of state; 1000 of them is a needless heap spike).
468
+ const PRE_RESOLVE_CHUNK_SIZE = 50;
469
+
470
+ // If a scanQueue is provided, items are pushed onto it as soon as their chunk
471
+ // finishes resolution — so a crash mid-batch only loses the current chunk's
472
+ // in-flight work, not all the chunks that already completed. When scanQueue
473
+ // is omitted (unit tests, lib usage), items are only mutated in place and the
474
+ // caller decides when to push.
475
+ async function preResolveNpmBatch(items, stats, scanQueue) {
476
+ if (!items || items.length === 0) return;
477
+ const start = Date.now();
478
+ let resolved = 0;
479
+ let alreadyResolved = 0;
480
+ let failed = 0;
481
+ for (let i = 0; i < items.length; i += PRE_RESOLVE_CHUNK_SIZE) {
482
+ const chunk = items.slice(i, i + PRE_RESOLVE_CHUNK_SIZE);
483
+ await Promise.all(chunk.map(async (item) => {
484
+ if (item.tarballUrl) { alreadyResolved++; return; }
485
+ try {
486
+ // Stage 2.1 — fetch downloads in parallel with the packument. The
487
+ // downloads endpoint (api.npmjs.org) is not on the registry semaphore
488
+ // and has its own internal cache, so this is effectively free in the
489
+ // warm-cache case and adds at most one parallel HTTP otherwise.
490
+ const [npmInfo, weeklyDownloads] = await Promise.all([
491
+ getNpmLatestTarball(item.name),
492
+ getWeeklyDownloads(item.name).catch(() => null)
493
+ ]);
494
+ if (npmInfo && npmInfo.tarball) {
495
+ item.tarballUrl = npmInfo.tarball;
496
+ if (!item.version) item.version = npmInfo.version || '';
497
+ if (!item.unpackedSize) item.unpackedSize = npmInfo.unpackedSize || 0;
498
+ if (!item.registryScripts) item.registryScripts = npmInfo.scripts || null;
499
+ // weekly_downloads is best-effort. getWeeklyDownloads returns -1 on
500
+ // failure; normalize that to null so triageRisk treats it as missing
501
+ // (rather than silently biasing the reputation factor toward "suspect").
502
+ npmInfo.weekly_downloads = (typeof weeklyDownloads === 'number' && weeklyDownloads >= 0)
503
+ ? weeklyDownloads : null;
504
+ // Stash full packument-derived metadata for resolveTarballAndScan so
505
+ // the worker can run ATO-signature, burst-extras, and fast-track logic
506
+ // without a second registry call. Stage 2.1 enriches this with
507
+ // age_days / version_count (from getNpmLatestTarball) and
508
+ // weekly_downloads (from getWeeklyDownloads) so the triage block in
509
+ // queue.js can read meta directly without re-fetching.
510
+ item._npmInfo = npmInfo;
511
+ resolved++;
512
+ } else {
513
+ failed++;
514
+ }
515
+ } catch {
516
+ // Silent: worker will retry via lazy resolution. Logging here would
517
+ // double-count errors that the worker already surfaces.
518
+ failed++;
519
+ }
520
+ }));
521
+ // Crash resilience: surface this chunk to the queue now, before the next
522
+ // chunk starts. If the process dies between chunks we still keep the work
523
+ // already done. Items keep their original order because chunks complete
524
+ // sequentially.
525
+ if (scanQueue) {
526
+ for (const item of chunk) scanQueue.push(item);
527
+ }
528
+ }
529
+ if (stats) {
530
+ stats.npmPreResolved = (stats.npmPreResolved || 0) + resolved;
531
+ stats.npmPreResolveFailed = (stats.npmPreResolveFailed || 0) + failed;
532
+ }
533
+ if (items.length >= 5) {
534
+ const elapsed = Date.now() - start;
535
+ console.log(`[MONITOR] PRE-RESOLVE npm: ${resolved}/${items.length} in ${elapsed}ms (${failed} → lazy fallback${alreadyResolved ? `, ${alreadyResolved} already resolved` : ''})`);
536
+ }
537
+ }
538
+
539
+ async function preResolvePyPIBatch(items, stats, scanQueue) {
540
+ if (!items || items.length === 0) return;
541
+ const start = Date.now();
542
+ let resolved = 0;
543
+ let alreadyResolved = 0;
544
+ let failed = 0;
545
+ for (let i = 0; i < items.length; i += PRE_RESOLVE_CHUNK_SIZE) {
546
+ const chunk = items.slice(i, i + PRE_RESOLVE_CHUNK_SIZE);
547
+ await Promise.all(chunk.map(async (item) => {
548
+ if (item.tarballUrl) { alreadyResolved++; return; }
549
+ try {
550
+ const pypiInfo = await getPyPITarballUrl(item.name, item.version || '');
551
+ if (pypiInfo && pypiInfo.url) {
552
+ item.tarballUrl = pypiInfo.url;
553
+ if (!item.version && pypiInfo.version) item.version = pypiInfo.version;
554
+ // Stage 2 triage signals: stash age_days + version_count for
555
+ // triageRisk() to read in queue.js without a second registry call.
556
+ item._pypiInfo = {
557
+ age_days: pypiInfo.age_days,
558
+ version_count: pypiInfo.version_count,
559
+ };
560
+ resolved++;
561
+ } else {
562
+ failed++;
563
+ }
564
+ } catch {
565
+ failed++;
566
+ }
567
+ }));
568
+ if (scanQueue) {
569
+ for (const item of chunk) scanQueue.push(item);
570
+ }
571
+ }
572
+ if (stats) {
573
+ stats.pypiPreResolved = (stats.pypiPreResolved || 0) + resolved;
574
+ stats.pypiPreResolveFailed = (stats.pypiPreResolveFailed || 0) + failed;
575
+ }
576
+ if (items.length >= 5) {
577
+ const elapsed = Date.now() - start;
578
+ console.log(`[MONITOR] PRE-RESOLVE pypi: ${resolved}/${items.length} in ${elapsed}ms (${failed} → lazy fallback${alreadyResolved ? `, ${alreadyResolved} already resolved` : ''})`);
579
+ }
580
+ }
581
+
396
582
  // --- npm polling ---
397
583
 
398
584
  /**
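Reviewer note: both `preResolveNpmBatch` and `preResolvePyPIBatch` follow the same pattern. They resolve items in fixed-size chunks with `Promise.all`, swallow per-item failures so the item keeps `tarballUrl: null`, and push each finished chunk to the queue before starting the next one. The sketch below isolates that pattern under simplifying assumptions: `resolveInChunks`, `fakeResolve`, and `demo` are hypothetical names, the queue is a plain array, and no registry semaphore or rate limiter is modelled.

```javascript
// Resolve items chunk by chunk; push every item (resolved or not) once its
// chunk settles, so a crash between chunks keeps the work already done.
async function resolveInChunks(items, resolveOne, sink, chunkSize = 50) {
  let resolved = 0;
  let failed = 0;
  for (let i = 0; i < items.length; i += chunkSize) {
    const chunk = items.slice(i, i + chunkSize);
    await Promise.all(chunk.map(async (item) => {
      if (item.tarballUrl) return; // already resolved upstream
      try {
        const url = await resolveOne(item);
        if (url) {
          item.tarballUrl = url;
          resolved++;
        } else {
          failed++;
        }
      } catch {
        // Leave tarballUrl null; a later lazy lookup is expected to retry.
        failed++;
      }
    }));
    if (sink) {
      for (const item of chunk) sink.push(item);
    }
  }
  return { resolved, failed };
}

// Demo with a stubbed resolver that fails for one item.
async function demo() {
  const items = ['a', 'b', 'c', 'd', 'e'].map(name => ({ name, tarballUrl: null }));
  const queue = [];
  const fakeResolve = async (item) => {
    if (item.name === 'c') throw new Error('simulated registry timeout');
    return `https://example.invalid/${item.name}.tgz`;
  };
  const counts = await resolveInChunks(items, fakeResolve, queue, 2);
  return { counts, queue };
}

demo().then(({ counts, queue }) => {
  console.log(counts, queue.map(i => i.name).join(','));
});
```

Because chunks are awaited one after another, items reach the queue in their original order, and a failed item is still queued with `tarballUrl` left as `null`. That is the property the diff relies on for its "zero scan loss" claim.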
@@ -481,6 +667,10 @@ async function pollNpmChanges(state, scanQueue, stats) {
481
667
  stats.npmPublishEventsSeen = (stats.npmPublishEventsSeen || 0) + data.results.length;
482
668
 
483
669
  let queued = 0;
670
+ // Collect items into a local batch so we can pre-resolve tarball URLs in
671
+ // parallel before pushing to scanQueue. Items reach workers with metadata
672
+ // already attached → workers skip the per-scan registry round-trip.
673
+ const newItems = [];
484
674
  for (const change of data.results) {
485
675
  // Skip deleted packages
486
676
  if (change.deleted) continue;
@@ -547,11 +737,10 @@ async function pollNpmChanges(state, scanQueue, stats) {
547
737
  // Layer 3: Evaluate if this package should be cached
548
738
  const cacheTrigger = evaluateCacheTrigger(name, docMeta, change.doc || null);
549
739
 
550
- // Layer 2: Extract tarball URL from CouchDB doc (eliminates lazy resolution 404 race)
551
- // NOTE: fastTrack flag is computed in resolveTarballAndScan() AFTER metadata
552
- // resolution via getNpmLatestTarball(). It cannot be computed here because
553
- // post-May 2025, include_docs is deprecated and change.doc is always null.
554
- scanQueue.push({
740
+ // Post-May 2025: change.doc is always null, so docMeta is null and tarballUrl
741
+ // starts as null. preResolveNpmBatch below fills tarballUrl + metadata via
742
+ // a parallel registry fetch so workers do not pay the round-trip per scan.
743
+ newItems.push({
555
744
  name,
556
745
  version: docMeta ? docMeta.version : '',
557
746
  ecosystem: 'npm',
@@ -564,6 +753,11 @@ async function pollNpmChanges(state, scanQueue, stats) {
564
753
  queued++;
565
754
  }
566
755
 
756
+ // Parallel pre-resolution, pushed chunk by chunk for crash resilience.
757
+ // Failures leave tarballUrl=null so the existing lazy-resolution path in
758
+ // resolveTarballAndScan() picks up the slack — zero scan loss.
759
+ await preResolveNpmBatch(newItems, stats, scanQueue);
760
+
567
761
  // Update seq in memory only — disk persistence is handled by daemon.js
568
762
  // after both queue and seq are saved atomically (prevents data loss on crash).
569
763
  if (data.last_seq != null) {
@@ -623,6 +817,7 @@ async function pollNpmRss(state, scanQueue, stats) {
623
817
  // falls back to RSS.
624
818
  stats.npmPublishEventsSeen = (stats.npmPublishEventsSeen || 0) + newPackages.length;
625
819
 
820
+ const newItems = [];
626
821
  for (const name of newPackages) {
627
822
  if (name === SELF_PACKAGE_NAME) {
628
823
  console.log(`[MONITOR] SKIPPED (self): ${name}`);
@@ -666,15 +861,18 @@ async function pollNpmRss(state, scanQueue, stats) {
666
861
  }
667
862
  }
668
863
 
669
- // Queue npm packages — tarball URL resolved during scan
670
- scanQueue.push({
864
+ newItems.push({
671
865
  name,
672
866
  version: '',
673
867
  ecosystem: 'npm',
674
- tarballUrl: null // resolved lazily via resolveTarballAndScan (no CouchDB doc in RSS)
868
+ tarballUrl: null // pre-resolved below; lazy fallback preserved on failure
675
869
  });
676
870
  }
677
871
 
872
+ // Parallel pre-resolution with per-chunk push → crash-resilient and saves
873
+ // the worker's per-scan registry round-trip when it succeeds.
874
+ await preResolveNpmBatch(newItems, stats, scanQueue);
875
+
678
876
  // Remember the most recent package (first in RSS)
679
877
  if (packages.length > 0) {
680
878
  state.npmLastPackage = packages[0];
@@ -901,6 +1099,7 @@ async function pollPyPIChangelog(state, scanQueue, stats) {
901
1099
  const seen = new Set();
902
1100
  let queued = 0;
903
1101
  let maxSerial = lastSerial;
1102
+ const newItems = [];
904
1103
 
905
1104
  for (const ev of events) {
906
1105
  if (ev.serial > maxSerial) maxSerial = ev.serial;
@@ -932,16 +1131,20 @@ async function pollPyPIChangelog(state, scanQueue, stats) {
932
1131
  }
933
1132
  } catch { /* IOC load failure is non-fatal */ }
934
1133
 
935
- scanQueue.push({
1134
+ newItems.push({
936
1135
  name: ev.name,
937
1136
  version: ev.version,
938
1137
  ecosystem: 'pypi',
939
- tarballUrl: null, // resolved lazily via getPyPITarballUrl()
1138
+ tarballUrl: null, // pre-resolved below; lazy fallback preserved
940
1139
  isIOCMatch: isKnownIOC
941
1140
  });
942
1141
  queued++;
943
1142
  }
944
1143
 
1144
+ // Parallel pre-resolution with per-chunk push to scanQueue. Failures keep
1145
+ // tarballUrl=null so resolveTarballAndScan() falls back to lazy lookup.
1146
+ await preResolvePyPIBatch(newItems, stats, scanQueue);
1147
+
945
1148
  // Persist the serial both in memory and on disk before returning.
946
1149
  // daemon.js also flushes state.json after the queue is saved, but writing the
947
1150
  // dedicated serial file here means a crash between the two flush points costs
@@ -996,17 +1199,22 @@ async function pollPyPIRss(state, scanQueue) {
996
1199
  }
997
1200
  }
998
1201
 
1202
+ const newItems = [];
999
1203
  for (const name of newPackages) {
1000
1204
  console.log(`[MONITOR] New pypi (rss): ${name}`);
1001
- // Queue PyPI packages — tarball URL resolved during scan
1002
- scanQueue.push({
1205
+ newItems.push({
1003
1206
  name,
1004
1207
  version: '',
1005
1208
  ecosystem: 'pypi',
1006
- tarballUrl: null // resolved lazily in scanPackage wrapper
1209
+ tarballUrl: null // pre-resolved below; lazy fallback preserved
1007
1210
  });
1008
1211
  }
1009
1212
 
1213
+ // pollPyPIRss does not have a stats arg today; pass {} so the helper still
1214
+ // runs but per-poll counters are dropped. The PRE-RESOLVE log line gives
1215
+ // operational visibility regardless. scanQueue is passed for per-chunk push.
1216
+ await preResolvePyPIBatch(newItems, {}, scanQueue);
1217
+
1010
1218
  // Remember the most recent package (first in RSS)
1011
1219
  if (packages.length > 0) {
1012
1220
  state.pypiLastPackage = packages[0];
@@ -1119,6 +1327,8 @@ module.exports = {
1119
1327
  getNpmTarballUrl,
1120
1328
  getPyPITarballUrl,
1121
1329
  getNpmLatestTarball,
1330
+ preResolveNpmBatch,
1331
+ preResolvePyPIBatch,
1122
1332
 
1123
1333
  // RSS parsing
1124
1334
  parseNpmRss,
@@ -73,6 +73,7 @@ const {
73
73
  buildCanaryExfiltrationWebhookEmbed,
74
74
  getWebhookUrl,
75
75
  computeReputationFactor,
76
+ triageRisk,
76
77
  computeRiskLevel,
77
78
  sendDailyReport,
78
79
  alertedPackageRules,
@@ -127,6 +128,22 @@ const LARGE_PACKAGE_SIZE = 10 * 1024 * 1024; // 10MB
127
128
  const FIRST_PUBLISH_SANDBOX_MAX_QUEUE = parseInt(process.env.MUADDIB_FIRST_PUBLISH_SANDBOX_MAX_QUEUE, 10) || 10;
128
129
  const FIRST_PUBLISH_SANDBOX_ENABLED = process.env.MUADDIB_FIRST_PUBLISH_SANDBOX !== '0';
129
130
 
131
+ // Stage 3 — sandbox gate. Static-score threshold below which T1b/T2 packages
132
+ // are NOT sandboxed (static result alone is authoritative). Tightens the prior
133
+ // "T1b sandbox if score >= 25 or queue < 20" to remove low-signal sandbox runs
134
+ // that consume slots without producing actionable findings (the dominant cost
135
+ // in the queue-saturation diagnostic). Validated by axon-enterprise@1.0.0
136
+ // (static 52, sandbox confirmed 100) — gate >= 40 still catches it.
137
+ // T1a (high-confidence malice) bypasses this gate; it's mandatory.
138
+ // Override via env var to widen the gate (lower threshold) for a short
139
+ // rollback window without redeploying. Clamped to [0, 100].
140
+ function computeSandboxScoreThreshold(envValue) {
141
+ const parsed = parseInt(envValue, 10);
142
+ const value = Number.isFinite(parsed) ? parsed : 40;
143
+ return Math.max(0, Math.min(100, value));
144
+ }
145
+ const SANDBOX_SCORE_THRESHOLD = computeSandboxScoreThreshold(process.env.MUADDIB_SANDBOX_SCORE_THRESHOLD);
146
+
130
147
  // --- Bundled tooling false-positive filter ---
131
148
 
132
149
  const KNOWN_BUNDLED_FILES = ['yarn.js', 'webpack.js', 'terser.js', 'esbuild.js', 'polyfills.js'];
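The clamping behaviour of `computeSandboxScoreThreshold` in the hunk above is easiest to see on a few inputs. The sketch below copies the helper as it appears in the diff and runs it over sample environment values; the `samples` array is illustrative only and is not part of the package.

```javascript
// Standalone copy of the helper shown in the hunk above.
function computeSandboxScoreThreshold(envValue) {
  const parsed = parseInt(envValue, 10);
  // Unset, empty, or non-numeric values fall back to the default gate of 40.
  const value = Number.isFinite(parsed) ? parsed : 40;
  // Clamp to [0, 100] so a typo cannot push the gate out of the score range.
  return Math.max(0, Math.min(100, value));
}

// Illustrative inputs (assumed for this sketch).
const samples = [undefined, '', 'abc', '25', '0', '-5', '150', '40.9'];
const thresholds = samples.map((raw) => computeSandboxScoreThreshold(raw));

samples.forEach((raw, i) => {
  console.log(`${JSON.stringify(raw)} -> ${thresholds[i]}`);
});
```

Note that `'0'` parses to a finite number and is kept, so setting `MUADDIB_SANDBOX_SCORE_THRESHOLD=0` opens the gate fully rather than falling back to 40.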
@@ -444,7 +461,11 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
444
461
  version,
445
462
  ecosystem,
446
463
  monitorMode: true,
447
- trustedDepDiff: true
464
+ trustedDepDiff: true,
465
+ // Stage 2: set by processQueueItem when MUADDIB_TRIAGE_MODE=enforce.
466
+ // Defaults to 'full' so any CLI/test caller that bypasses triage gets
467
+ // the full 20-scanner pipeline (unchanged behaviour).
468
+ scanMode: (meta && meta.scanMode) || 'full'
448
469
  };
449
470
  result = await runScanInWorker(extractedDir, STATIC_SCAN_TIMEOUT_MS, scanContext);
450
471
  } catch (staticErr) {
@@ -733,14 +754,16 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
733
754
  }
734
755
 
735
756
  // T1a: mandatory sandbox (HC malice types, TIER1_TYPES non-LOW, lifecycle + intent compound)
736
- // T1b: conditional sandbox (HIGH/CRITICAL without HC type bundler FP zone)
737
- // sandbox only if score >= 25 (significant risk) or queue pressure is low
738
- // T2: sandbox if queue < 50 (as before)
757
+ // T1b: conditional sandbox gated by SANDBOX_SCORE_THRESHOLD (Stage 3).
758
+ // Previously gated at >= 25 OR queue < 20; tightened to >= 40 by
759
+ // default because the 25-39 band produced no decisive sandbox
760
+ // findings in 4 months of prod data (axon-enterprise was at 52).
761
+ // T2: conditional sandbox — same score gate AND queue < 50.
739
762
  let sandboxResult = null;
740
763
  const shouldSandbox = !skipSandboxLargePackage && isSandboxEnabled() && sandboxAvailable && (
741
764
  tier === '1a' ||
742
- (tier === '1b' && (riskScore >= 25 || scanQueue.length < 20)) ||
743
- (tier === 2 && scanQueue.length < 50)
765
+ (tier === '1b' && riskScore >= SANDBOX_SCORE_THRESHOLD) ||
766
+ (tier === 2 && riskScore >= SANDBOX_SCORE_THRESHOLD && scanQueue.length < 50)
744
767
  );
745
768
 
746
769
  if (shouldSandbox) {
@@ -808,8 +831,12 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
808
831
  } catch (err) {
809
832
  console.error(`[MONITOR] SANDBOX error for ${name}@${version}: ${err.message}`);
810
833
  }
811
- } else if (tier === '1b' && sandboxAvailable) {
812
- console.log(`[MONITOR] SANDBOX DEFERRED (T1b, score=${riskScore} < 25, queue ${scanQueue.length} >= 20): ${name}@${version}`);
834
+ } else if (tier === '1b' && sandboxAvailable && riskScore >= SANDBOX_SCORE_THRESHOLD) {
835
+ // Stage 3 defer only when the score crosses the gate. Below the
836
+ // threshold, sandbox is skipped entirely (static result is final).
837
+ // This stops the deferred-queue from filling with low-score items
838
+ // that would never produce decisive sandbox findings.
839
+ console.log(`[MONITOR] SANDBOX DEFERRED (T1b, score=${riskScore}, queue ${scanQueue.length}): ${name}@${version}`);
813
840
  enqueueDeferred({
814
841
  name, version, ecosystem, tier, riskScore, tarballUrl,
815
842
  enqueuedAt: Date.now(),
@@ -818,10 +845,14 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
818
845
  retries: 0
819
846
  });
820
847
  stats.sandboxDeferred = (stats.sandboxDeferred || 0) + 1;
848
+ } else if (tier === '1b' && sandboxAvailable) {
849
+ // Below SANDBOX_SCORE_THRESHOLD — no sandbox, no defer.
850
+ console.log(`[MONITOR] SANDBOX GATED (T1b, score=${riskScore} < ${SANDBOX_SCORE_THRESHOLD}): ${name}@${version}`);
851
+ stats.sandboxGated = (stats.sandboxGated || 0) + 1;
821
852
  } else if (tier === '1b') {
822
853
  console.log(`[MONITOR] SANDBOX SKIPPED (T1b, no Docker): ${name}@${version}`);
823
- } else if (tier === 2 && sandboxAvailable) {
824
- console.log(`[MONITOR] SANDBOX DEFERRED (T2, queue ${scanQueue.length} >= 50): ${name}@${version}`);
854
+ } else if (tier === 2 && sandboxAvailable && riskScore >= SANDBOX_SCORE_THRESHOLD) {
855
+ console.log(`[MONITOR] SANDBOX DEFERRED (T2, score=${riskScore}, queue ${scanQueue.length}): ${name}@${version}`);
825
856
  enqueueDeferred({
826
857
  name, version, ecosystem, tier, riskScore, tarballUrl,
827
858
  enqueuedAt: Date.now(),
@@ -830,6 +861,11 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
830
861
  retries: 0
831
862
  });
832
863
  stats.sandboxDeferred = (stats.sandboxDeferred || 0) + 1;
864
+ } else if (tier === 2 && sandboxAvailable) {
865
+ // Below SANDBOX_SCORE_THRESHOLD — T2 was already passive; staying
866
+ // static-only matches the existing T3 behaviour.
867
+ console.log(`[MONITOR] SANDBOX GATED (T2, score=${riskScore} < ${SANDBOX_SCORE_THRESHOLD}): ${name}@${version}`);
868
+ stats.sandboxGated = (stats.sandboxGated || 0) + 1;
833
869
  } else if (tier === 2) {
834
870
  console.log(`[MONITOR] SANDBOX SKIPPED (T2, no Docker): ${name}@${version}`);
835
871
  }
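The sandbox branches in the hunks above can be summarised as a single decision. The sketch below is a simplified model, not the package's code: `sandboxDisposition` and its option names are hypothetical, and `sandboxEnabled` / `largePackageSkip` stand in for `isSandboxEnabled()` and `skipSandboxLargePackage`. Branches outside the lines shown in this diff return `'none'`.

```javascript
// Hypothetical helper that mirrors the shouldSandbox expression and the
// else-if chain (DEFERRED / GATED / SKIPPED) from the diff.
function sandboxDisposition({
  tier,
  riskScore,
  queueLength = 0,
  sandboxAvailable = true,
  sandboxEnabled = true,
  largePackageSkip = false,
  threshold = 40, // default SANDBOX_SCORE_THRESHOLD
}) {
  const aboveGate = riskScore >= threshold;
  const shouldSandbox = !largePackageSkip && sandboxEnabled && sandboxAvailable && (
    tier === '1a' ||
    (tier === '1b' && aboveGate) ||
    (tier === 2 && aboveGate && queueLength < 50)
  );
  if (shouldSandbox) return 'run';
  if (tier === '1b' || tier === 2) {
    if (!sandboxAvailable) return 'skipped'; // "no Docker" log line
    return aboveGate ? 'deferred' : 'gated'; // deferred queue vs static-only
  }
  return 'none'; // not covered by the hunks shown here
}

console.log(sandboxDisposition({ tier: '1b', riskScore: 52 }));
console.log(sandboxDisposition({ tier: '1b', riskScore: 30 }));
console.log(sandboxDisposition({ tier: 2, riskScore: 45, queueLength: 60 }));
```

Under this model a T2 package above the gate is deferred only when the queue holds 50 or more items, while a T1b package above the gate is deferred only when the sandbox is disabled or the large-package skip applies.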
@@ -1114,65 +1150,78 @@ async function processQueue(scanQueue, stats, dailyAlerts, recentlyScanned, down
1114
1150
  async function resolveTarballAndScan(item, stats, dailyAlerts, recentlyScanned, downloadsCache, scanQueue, sandboxAvailable, signal) {
1115
1151
  if (signal && signal.aborted) return;
1116
1152
 
1117
- if (item.ecosystem === 'npm' && !item.tarballUrl) {
1153
+ if (item.ecosystem === 'npm') {
1154
+ // Pre-resolve at ingestion (ingestion.js:preResolveNpmBatch) attaches
1155
+ // _npmInfo when it succeeds. Lazy path runs only when pre-resolve was
1156
+ // skipped or failed — in which case _npmInfo is absent and tarballUrl is
1157
+ // null. Either way, ATO / burst-extras / fast-track logic below runs on
1158
+ // whichever npmInfo we have, preserving full behavior.
1159
+ let npmInfo = item._npmInfo || null;
1118
1160
  try {
1119
- const npmInfo = await getNpmLatestTarball(item.name);
1120
- if (!npmInfo.tarball) {
1121
- console.log(`[MONITOR] SKIP: ${item.name} — no tarball URL found on npm`);
1122
- return;
1123
- }
1124
- item.tarballUrl = npmInfo.tarball;
1125
- if (npmInfo.version) item.version = npmInfo.version;
1126
- if (npmInfo.unpackedSize) item.unpackedSize = npmInfo.unpackedSize;
1127
- if (npmInfo.scripts) item.registryScripts = npmInfo.scripts;
1128
-
1129
- // ATO signature: most-recently-published version differs from current
1130
- // dist-tags.latest. Pattern observed in TeamPCP / @antv 2026-05-19:
1131
- // attacker publishes 1-2 versions per package but does NOT bump the latest
1132
- // tag. semver resolution on `npm install <pkg>@^x.y` still pulls the
1133
- // malicious version. The mismatch is a strong ATO signal — legitimate
1134
- // maintainers almost always move latest when publishing.
1135
- if (npmInfo.latestTagVersion && npmInfo.version && npmInfo.version !== npmInfo.latestTagVersion) {
1136
- item.atoSignal = true;
1137
- console.log(`[MONITOR] ATO SIGNAL: ${item.name}@${item.version} published but dist-tags.latest=${npmInfo.latestTagVersion}`);
1161
+ if (!item.tarballUrl) {
1162
+ npmInfo = await getNpmLatestTarball(item.name);
1163
+ if (!npmInfo.tarball) {
1164
+ console.log(`[MONITOR] SKIP: ${item.name} — no tarball URL found on npm`);
1165
+ return;
1166
+ }
1167
+ item.tarballUrl = npmInfo.tarball;
1168
+ if (npmInfo.version) item.version = npmInfo.version;
1169
+ if (npmInfo.unpackedSize) item.unpackedSize = npmInfo.unpackedSize;
1170
+ if (npmInfo.scripts) item.registryScripts = npmInfo.scripts;
1138
1171
  }
1139
1172
 
1140
- // Burst-publish coverage: enqueue extra versions published in the same
1141
- // recent window. Single change event in the CouchDB feed can correspond
1142
- // to multiple version publishes when the attacker fires several in a
1143
- // burst (TeamPCP averaged ~2 versions per package). Without this we'd
1144
- // only scan whichever version happened to be the most recent at resolution
1145
- // time, racing the publish stream.
1146
- const recents = Array.isArray(npmInfo.recentVersions) ? npmInfo.recentVersions : [];
1147
- for (const recent of recents) {
1148
- if (!recent || !recent.tarball || !recent.version) continue;
1149
- const dedupeKey = `${item.name}@${recent.version}`;
1150
- if (recentlyScanned.has(dedupeKey)) continue;
1151
- scanQueue.push({
1152
- name: item.name,
1153
- version: recent.version,
1154
- ecosystem: 'npm',
1155
- tarballUrl: recent.tarball,
1156
- unpackedSize: recent.unpackedSize || 0,
1157
- registryScripts: recent.scripts || null,
1158
- atoSignal: item.atoSignal === true,
1159
- isATOBurstExtra: true,
1160
- });
1161
- }
1173
+ if (npmInfo) {
1174
+ // ATO signature: most-recently-published version differs from current
1175
+ // dist-tags.latest. Pattern observed in TeamPCP / @antv 2026-05-19:
1176
+ // attacker publishes 1-2 versions per package but does NOT bump the latest
1177
+ // tag. semver resolution on `npm install <pkg>@^x.y` still pulls the
1178
+ // malicious version. The mismatch is a strong ATO signal — legitimate
1179
+ // maintainers almost always move latest when publishing.
1180
+ if (npmInfo.latestTagVersion && item.version && item.version !== npmInfo.latestTagVersion) {
1181
+ item.atoSignal = true;
1182
+ console.log(`[MONITOR] ATO SIGNAL: ${item.name}@${item.version} published but dist-tags.latest=${npmInfo.latestTagVersion}`);
1183
+ }
1162
1184
 
1163
- // Fast-track decision: large packages (>15MB) with no lifecycle scripts and no IOC match.
1164
- // Computed HERE (after metadata resolution), not at ingestion time post-May 2025
1165
- // CouchDB changes feed has no docs, so metadata is only available after lazy fetch.
1166
- // Fast-track packages get: quick static scan (package.json + shell only), no AST,
1167
- // no sandbox, no LLM, no archiving. Exits in ~2-3s instead of 30-300s.
1168
- // ATO-signalled packages bypass fast-track regardless of size — we want
1169
- // the full pipeline (AST + sandbox) on anything that smells like an ATO.
1170
- const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;
1171
- if (!item.isIOCMatch && !item.atoSignal && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
1172
- const scripts = item.registryScripts || {};
1173
- if (!scripts.preinstall && !scripts.postinstall && !scripts.install) {
1174
- item.fastTrack = true;
1185
+ // Burst-publish coverage: enqueue extra versions published in the same
1186
+ // recent window. Single change event in the CouchDB feed can correspond
1187
+ // to multiple version publishes when the attacker fires several in a
1188
+ // burst (TeamPCP averaged ~2 versions per package). Without this we'd
1189
+ // only scan whichever version happened to be the most recent at resolution
1190
+ // time, racing the publish stream.
1191
+ const recents = Array.isArray(npmInfo.recentVersions) ? npmInfo.recentVersions : [];
1192
+ for (const recent of recents) {
1193
+ if (!recent || !recent.tarball || !recent.version) continue;
1194
+ const dedupeKey = `${item.name}@${recent.version}`;
1195
+ if (recentlyScanned.has(dedupeKey)) continue;
1196
+ scanQueue.push({
1197
+ name: item.name,
1198
+ version: recent.version,
1199
+ ecosystem: 'npm',
1200
+ tarballUrl: recent.tarball,
1201
+ unpackedSize: recent.unpackedSize || 0,
1202
+ registryScripts: recent.scripts || null,
1203
+ atoSignal: item.atoSignal === true,
1204
+ isATOBurstExtra: true,
1205
+ });
1206
+ }
1207
+
1208
+ // Fast-track decision: large packages (>15MB) with no lifecycle scripts and no IOC match.
1209
+ // Fast-track packages get: quick static scan (package.json + shell only), no AST,
1210
+ // no sandbox, no LLM, no archiving. Exits in ~2-3s instead of 30-300s.
1211
+ // ATO-signalled packages bypass fast-track regardless of size — we want
1212
+ // the full pipeline (AST + sandbox) on anything that smells like an ATO.
1213
+ const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;
1214
+ if (!item.isIOCMatch && !item.atoSignal && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
1215
+ const scripts = item.registryScripts || {};
1216
+ if (!scripts.preinstall && !scripts.postinstall && !scripts.install) {
1217
+ item.fastTrack = true;
1218
+ }
1175
1219
  }
1220
+
1221
+ // Free the packument-derived metadata once the per-item decisions are
1222
+ // made — keeps queue items lean (a 28k-item queue × full packument JSON
1223
+ // would be tens of MB of useless heap).
1224
+ if (item._npmInfo) delete item._npmInfo;
1176
1225
  }
1177
1226
  } catch (err) {
1178
1227
  console.error(`[MONITOR] ERROR resolving npm tarball for ${item.name}: ${err.message}`);
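The ATO-signal and fast-track decisions above interact: a version that differs from `dist-tags.latest` disables fast-track even for very large packages. A minimal sketch of the two checks, assuming a hypothetical `applyNpmDecisions` helper that returns a copy instead of mutating the queue item as the real code does:

```javascript
const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;

// Hypothetical pure version of the per-item decisions in resolveTarballAndScan.
function applyNpmDecisions(item, npmInfo) {
  const out = { ...item };
  // ATO signal: the version being scanned is not the one tagged "latest".
  if (npmInfo.latestTagVersion && out.version && out.version !== npmInfo.latestTagVersion) {
    out.atoSignal = true;
  }
  // Fast-track: large, no IOC match, no ATO signal, no install-time scripts.
  if (!out.isIOCMatch && !out.atoSignal && (out.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
    const scripts = out.registryScripts || {};
    if (!scripts.preinstall && !scripts.postinstall && !scripts.install) {
      out.fastTrack = true;
    }
  }
  return out;
}

const big = { name: 'demo-pkg', version: '2.0.0', unpackedSize: 20 * 1024 * 1024 };
const benign = applyNpmDecisions(big, { latestTagVersion: '2.0.0' });
const suspicious = applyNpmDecisions(big, { latestTagVersion: '1.9.0' });
const scripted = applyNpmDecisions(
  { ...big, registryScripts: { postinstall: 'node setup.js' } },
  { latestTagVersion: '2.0.0' }
);

console.log(benign.fastTrack, suspicious.atoSignal, scripted.fastTrack);
```

The package name and versions are made up for the example. Order matters: the ATO check must run before the fast-track check, which is how the diff arranges it.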
@@ -1265,11 +1314,52 @@ async function resolveTarballAndScan(item, stats, dailyAlerts, recentlyScanned,
1265
1314
  // Abort check: if timeout fired during temporal checks, skip the expensive scan
1266
1315
  if (signal && signal.aborted) return;
1267
1316
 
1317
+ // Stage 2 — Pass A triage. Decides whether the static scan runs all 20
1318
+ // scanners or a quick_scan subset. Defaults to full when:
1319
+ // - env MUADDIB_TRIAGE_MODE !== 'enforce' (off | shadow | unset)
1320
+ // - the item is fastTrack-elected (already a more aggressive subset)
1321
+ // - any suspect signal flips triageRisk to 'full'
1322
+ // Shadow mode computes + logs the decision but still runs full — safe way
1323
+ // to observe classification share before flipping enforce.
1324
+ const triageMode = (process.env.MUADDIB_TRIAGE_MODE || 'off').toLowerCase();
1325
+ let effectiveScanMode = 'full';
1326
+ if (triageMode !== 'off' && !item.fastTrack) {
1327
+ let triageMeta = null;
1328
+ if (item.ecosystem === 'npm') {
1329
+ // Stage 2.1 — Stage 1 pre-resolve already fetched the packument and
1330
+ // (Stage 2.1) computed age_days + version_count, plus parallel-fetched
1331
+ // weekly_downloads. Read those directly to skip the second
1332
+ // registry round-trip via getPackageMetadata. Fallback to the lazy
1333
+ // metadata fetch only when _npmInfo is absent (lazy-resolve path).
1334
+ if (item._npmInfo) {
1335
+ triageMeta = {
1336
+ age_days: item._npmInfo.age_days,
1337
+ version_count: item._npmInfo.version_count,
1338
+ weekly_downloads: item._npmInfo.weekly_downloads,
1339
+ };
1340
+ } else {
1341
+ try {
1342
+ const { getPackageMetadata } = require('../scanner/npm-registry.js');
1343
+ triageMeta = await getPackageMetadata(item.name);
1344
+ } catch { /* metadata unavailable → triageRisk will see null and pick 'full' */ }
1345
+ }
1346
+ } else if (item.ecosystem === 'pypi') {
1347
+ triageMeta = item._pypiInfo || null;
1348
+ }
1349
+ const triage = triageRisk(item, triageMeta);
1350
+ item.scanMode = triage.mode;
1351
+ stats.triageQuick = (stats.triageQuick || 0) + (triage.mode === 'quick' ? 1 : 0);
1352
+ stats.triageFull = (stats.triageFull || 0) + (triage.mode === 'full' ? 1 : 0);
1353
+ console.log(`[TRIAGE] ${item.name}@${item.version || '?'}: mode=${triage.mode} reasons=[${triage.reasons.join(',') || 'none'}]`);
1354
+ if (triageMode === 'enforce') effectiveScanMode = triage.mode;
1355
+ }
1356
+
1268
1357
  const scanResult = await scanPackage(item.name, item.version, item.ecosystem, item.tarballUrl, {
1269
1358
  unpackedSize: item.unpackedSize || 0,
1270
1359
  registryScripts: item.registryScripts || null,
1271
1360
  _cacheTrigger: item._cacheTrigger || null,
1272
- fastTrack: item.fastTrack || false
1361
+ fastTrack: item.fastTrack || false,
1362
+ scanMode: effectiveScanMode
1273
1363
  }, stats, dailyAlerts, recentlyScanned, downloadsCache, scanQueue, sandboxAvailable);
1274
1364
  const sandboxResult = scanResult && scanResult.sandboxResult;
1275
1365
  const staticClean = scanResult && scanResult.staticClean;
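The three triage modes differ only in whether the decision is computed and whether it is applied. The sketch below models that with a hypothetical `resolveEffectiveScanMode` helper; in the real code the decision comes from `triageRisk`, while here it is passed in ready-made to keep the example self-contained.

```javascript
// Hypothetical helper modelling the MUADDIB_TRIAGE_MODE handling above.
function resolveEffectiveScanMode(envValue, item, triageDecision) {
  const triageMode = (envValue || 'off').toLowerCase();
  // 'off' (or unset) and fast-tracked items never compute a triage decision.
  if (triageMode === 'off' || item.fastTrack) {
    return { triageMode, computed: false, effective: 'full' };
  }
  // Any other value computes and logs the decision; only 'enforce' applies it.
  const effective = triageMode === 'enforce' ? triageDecision.mode : 'full';
  return { triageMode, computed: true, effective };
}

const decision = { mode: 'quick', reasons: [] };
const unset = resolveEffectiveScanMode(undefined, {}, decision);
const shadow = resolveEffectiveScanMode('shadow', {}, decision);
const enforce = resolveEffectiveScanMode('ENFORCE', {}, decision);
const fastTracked = resolveEffectiveScanMode('enforce', { fastTrack: true }, decision);

console.log(unset.effective, shadow.effective, enforce.effective, fastTracked.effective);
```

One consequence of the `!== 'off'` test in the diff is that a misspelled mode behaves like shadow mode: the decision is logged and counted, but the full pipeline still runs.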
@@ -1367,6 +1457,8 @@ module.exports = {
1367
1457
  LARGE_PACKAGE_SIZE,
1368
1458
  FIRST_PUBLISH_SANDBOX_MAX_QUEUE,
1369
1459
  FIRST_PUBLISH_SANDBOX_ENABLED,
1460
+ SANDBOX_SCORE_THRESHOLD,
1461
+ computeSandboxScoreThreshold,
1370
1462
  KNOWN_BUNDLED_FILES,
1371
1463
  KNOWN_BUNDLED_PATHS,
1372
1464
  ML_EXCLUDED_DIRS,
@@ -304,6 +304,72 @@ function computeReputationFactor(metadata) {
304
304
  return Math.max(0.10, Math.min(1.5, factor));
305
305
  }
306
306
 
307
+ /**
308
+ * True if the package declares an install-time lifecycle script that executes
309
+ * code on `npm install`. These hooks are the principal vehicle for malicious
310
+ * payloads (preinstall / postinstall / install). PyPI's setup.py equivalent is
311
+ * handled separately via `meta.has_setup_py` in triageRisk.
312
+ *
313
+ * Reads from both `item.registryScripts` (set by changes-stream docMeta when
314
+ * available) and `item._npmInfo.scripts` (set by Stage 1's preResolveNpmBatch).
315
+ *
316
+ * @param {Object} item - queue item
317
+ * @returns {boolean}
318
+ */
319
+ function hasDangerousLifecycle(item) {
320
+ if (!item) return false;
321
+ const direct = item.registryScripts;
322
+ if (direct && (direct.preinstall || direct.postinstall || direct.install)) return true;
323
+ const stashed = item._npmInfo && item._npmInfo.scripts;
324
+ if (stashed && (stashed.preinstall || stashed.postinstall || stashed.install)) return true;
325
+ return false;
326
+ }
327
+
328
+ /**
329
+ * Pass A triage: choose between full pipeline (20 scanners) and quick_scan
330
+ * subset for a queued package. Default is `quick`; any suspect signal flips
331
+ * to `full`. Used by the monitor only — CLI scans default to full elsewhere.
332
+ *
333
+ * Tiers (any reason → full):
334
+ * T0 IOC match / ATO signal / install-time lifecycle → known or high-prob threat
335
+ * T1 No registry metadata available → cannot establish trust, default safe
336
+ * T2 (npm) computeReputationFactor(meta) >= 1.0 → composite signal of new /
337
+ * low-download / few-versions package, subsumes individual checks
338
+ * T3 (PyPI) direct age < 30d or version_count < 5 → PyPI has no download
339
+ * stats, so we cannot reuse the npm composite; use the direct fields the
340
+ * PyPI JSON API exposes.
341
+ *
342
+ * Returning the reasons list (not just the mode) makes shadow-mode logs
343
+ * actionable for tuning.
344
+ *
345
+ * @param {Object} item - queue item
346
+ * @param {Object|null} meta - registry metadata {age_days, version_count, weekly_downloads, has_setup_py?}
347
+ * @returns {{mode: 'full'|'quick', reasons: string[]}}
348
+ */
349
+ function triageRisk(item, meta) {
350
+ const reasons = [];
351
+ const ecosystem = (item && item.ecosystem) || null;
352
+
353
+ if (item && item.isIOCMatch) reasons.push('ioc_match');
354
+ if (item && item.atoSignal) reasons.push('ato_signal');
355
+ if (hasDangerousLifecycle(item)) reasons.push('lifecycle_scripts');
356
+
357
+ if (!meta) {
358
+ reasons.push('no_metadata');
359
+ } else if (ecosystem === 'npm') {
360
+ const factor = computeReputationFactor(meta);
361
+ if (factor >= 1.0) reasons.push(`reputation_factor=${factor.toFixed(2)}`);
362
+ } else if (ecosystem === 'pypi') {
363
+ // PyPI has no weekly_downloads source today, so we cannot reuse
364
+ // computeReputationFactor as-is. Use direct signals instead.
365
+ if ((meta.age_days || 0) < 30) reasons.push('pypi_age<30d');
366
+ if ((meta.version_count || 0) < 5) reasons.push('pypi_version_count<5');
367
+ if (meta.has_setup_py === true) reasons.push('pypi_setup_py');
368
+ }
369
+
370
+ return { mode: reasons.length ? 'full' : 'quick', reasons };
371
+ }
372
+
307
373
  /**
308
374
  * Persist a CRITICAL/HIGH alert to logs/alerts/YYYY-MM-DD-HH-mm-ss-<package>.json
309
375
  * Same payload as webhook — enables offline FPR/TPR trend analysis.
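To see how `triageRisk` accumulates reasons, the sketch below reproduces its logic with one deliberate change: the reputation function is passed as a third argument so the example runs without the rest of the module. `stubFactor` is a made-up stand-in and does not reflect how `computeReputationFactor` actually scores packages.

```javascript
function hasDangerousLifecycle(item) {
  if (!item) return false;
  const direct = item.registryScripts;
  if (direct && (direct.preinstall || direct.postinstall || direct.install)) return true;
  const stashed = item._npmInfo && item._npmInfo.scripts;
  if (stashed && (stashed.preinstall || stashed.postinstall || stashed.install)) return true;
  return false;
}

// Same logic as the diff, except reputationFactor is injected (assumption).
function triageRisk(item, meta, reputationFactor) {
  const reasons = [];
  const ecosystem = (item && item.ecosystem) || null;
  if (item && item.isIOCMatch) reasons.push('ioc_match');
  if (item && item.atoSignal) reasons.push('ato_signal');
  if (hasDangerousLifecycle(item)) reasons.push('lifecycle_scripts');
  if (!meta) {
    reasons.push('no_metadata');
  } else if (ecosystem === 'npm') {
    const factor = reputationFactor(meta);
    if (factor >= 1.0) reasons.push(`reputation_factor=${factor.toFixed(2)}`);
  } else if (ecosystem === 'pypi') {
    if ((meta.age_days || 0) < 30) reasons.push('pypi_age<30d');
    if ((meta.version_count || 0) < 5) reasons.push('pypi_version_count<5');
    if (meta.has_setup_py === true) reasons.push('pypi_setup_py');
  }
  return { mode: reasons.length ? 'full' : 'quick', reasons };
}

// Hypothetical stand-in for computeReputationFactor.
const stubFactor = (meta) => (meta.age_days < 30 ? 1.2 : 0.5);

const established = triageRisk({ ecosystem: 'pypi' }, { age_days: 400, version_count: 12 }, stubFactor);
const fresh = triageRisk({ ecosystem: 'pypi' }, { age_days: 3, version_count: 1, has_setup_py: true }, stubFactor);
const scripted = triageRisk(
  { ecosystem: 'npm', registryScripts: { postinstall: 'node x.js' } },
  { age_days: 900 },
  stubFactor
);
const unknown = triageRisk({ ecosystem: 'npm' }, null, stubFactor);

console.log(established, fresh, scripted, unknown);
```

On the PyPI path a missing `age_days` or `version_count` is treated as 0, so incomplete metadata escalates to `full` rather than silently passing as `quick`.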
@@ -1237,6 +1303,8 @@ module.exports = {
1237
1303
  computeRiskLevel,
1238
1304
  computeRiskScore,
1239
1305
  computeReputationFactor,
1306
+ hasDangerousLifecycle,
1307
+ triageRisk,
1240
1308
  persistAlert,
1241
1309
  persistDailyReport,
1242
1310
  computeAlertPriority,
@@ -227,41 +227,80 @@ async function execute(targetPath, options, pythonDeps, warnings) {
227
227
  'scanPythonAST'
228
228
  ];
229
229
 
230
+ // Stage 2 quick_scan subset (monitor-only, set via options.scanMode='quick'
231
+ // by queue.js when MUADDIB_TRIAGE_MODE=enforce). The subset keeps the heavy
232
+ // detectors that anchor TPR on the 96-sample GT (analyzeAST covers 70/96,
233
+ // analyzeDataFlow covers 31/96 — non-negotiable), the cheap high-signal
234
+ // lifecycle/IOC scanners, and the Python detectors (PyPI samples need them;
235
+ // npm exit immediately on a depth-1 readdir, so the cost is negligible).
236
+ // Excluded: scanAntiForensic (45s timeout, never the unique trigger on GT),
237
+ // scanHashes (cheap but GT samples are rebuilt, so hashes drift),
238
+ // scanStubPackage, scanMonorepo, scanTrustedDepDiff (opt-in registry diff),
239
+ // checkPyPITyposquatting (subsumed by scanTyposquatting for npm; PyPI
240
+ // typosquats already get full via triage signals). CLI mode and shadow mode
241
+ // never set scanMode so the default branch runs all 20 scanners — fully
242
+ // backwards-compatible.
243
+ const QUICK_SCAN_ALLOWLIST = new Set([
244
+ 'scanPackageJson',
245
+ 'scanShellScripts',
246
+ 'analyzeAST',
247
+ 'detectObfuscation',
248
+ 'scanDependencies',
249
+ 'analyzeDataFlow',
250
+ 'scanTyposquatting',
251
+ 'scanGitHubActions',
252
+ 'matchPythonIOCs',
253
+ 'scanEntropy',
254
+ 'scanIocStrings',
255
+ 'scanPythonSource',
256
+ 'scanPythonAST',
257
+ 'scanAIConfig'
258
+ ]);
259
+ const isQuick = options.scanMode === 'quick';
260
+ function ifEnabled(name, fn) {
261
+ if (isQuick && !QUICK_SCAN_ALLOWLIST.has(name)) return Promise.resolve([]);
262
+ return fn();
263
+ }
264
+ if (isQuick) {
265
+ const skipped = SCANNER_NAMES.filter(n => !QUICK_SCAN_ALLOWLIST.has(n));
266
+ debugLog(`[EXECUTOR] scanMode=quick — skipping ${skipped.length} scanners: ${skipped.join(', ')}`);
267
+ }
268
+
230
269
  const settledResults = await Promise.allSettled([
231
- yieldThen(() => scanPackageJson(targetPath)),
232
- yieldThen(() => scanShellScripts(targetPath)),
233
- withTimeout(() => analyzeAST(targetPath, { deobfuscate: deobfuscateFn }), 'analyzeAST'),
234
- yieldThen(() => detectObfuscation(targetPath)),
235
- yieldThen(() => scanDependencies(targetPath)),
236
- yieldThen(() => scanHashes(targetPath)),
237
- withTimeout(() => analyzeDataFlow(targetPath, { deobfuscate: deobfuscateFn }), 'analyzeDataFlow'),
238
- yieldThen(() => scanTyposquatting(targetPath)),
239
- yieldThen(() => scanGitHubActions(targetPath)),
240
- yieldThen(() => matchPythonIOCs(pythonDeps, targetPath)),
241
- yieldThen(() => checkPyPITyposquatting(pythonDeps, targetPath)),
242
- withTimeout(() => scanEntropy(targetPath, { entropyThreshold: options.entropyThreshold || undefined }), 'scanEntropy'),
243
- yieldThen(() => scanAIConfig(targetPath)),
244
- yieldThen(() => scanIocStrings(targetPath)),
245
- withTimeout(() => scanAntiForensic(targetPath), 'scanAntiForensic'),
246
- yieldThen(() => scanStubPackage(targetPath)),
247
- yieldThen(() => scanMonorepo(targetPath)),
270
+ ifEnabled('scanPackageJson', () => yieldThen(() => scanPackageJson(targetPath))),
271
+ ifEnabled('scanShellScripts', () => yieldThen(() => scanShellScripts(targetPath))),
272
+ ifEnabled('analyzeAST', () => withTimeout(() => analyzeAST(targetPath, { deobfuscate: deobfuscateFn }), 'analyzeAST')),
273
+ ifEnabled('detectObfuscation', () => yieldThen(() => detectObfuscation(targetPath))),
274
+ ifEnabled('scanDependencies', () => yieldThen(() => scanDependencies(targetPath))),
275
+ ifEnabled('scanHashes', () => yieldThen(() => scanHashes(targetPath))),
276
+ ifEnabled('analyzeDataFlow', () => withTimeout(() => analyzeDataFlow(targetPath, { deobfuscate: deobfuscateFn }), 'analyzeDataFlow')),
277
+ ifEnabled('scanTyposquatting', () => yieldThen(() => scanTyposquatting(targetPath))),
278
+ ifEnabled('scanGitHubActions', () => yieldThen(() => scanGitHubActions(targetPath))),
279
+ ifEnabled('matchPythonIOCs', () => yieldThen(() => matchPythonIOCs(pythonDeps, targetPath))),
280
+ ifEnabled('checkPyPITyposquatting', () => yieldThen(() => checkPyPITyposquatting(pythonDeps, targetPath))),
281
+ ifEnabled('scanEntropy', () => withTimeout(() => scanEntropy(targetPath, { entropyThreshold: options.entropyThreshold || undefined }), 'scanEntropy')),
282
+ ifEnabled('scanAIConfig', () => yieldThen(() => scanAIConfig(targetPath))),
283
+ ifEnabled('scanIocStrings', () => yieldThen(() => scanIocStrings(targetPath))),
284
+ ifEnabled('scanAntiForensic', () => withTimeout(() => scanAntiForensic(targetPath), 'scanAntiForensic')),
285
+ ifEnabled('scanStubPackage', () => yieldThen(() => scanStubPackage(targetPath))),
286
+ ifEnabled('scanMonorepo', () => yieldThen(() => scanMonorepo(targetPath))),
248
287
  // Opt-in scanner — short-circuits to [] unless options.trustedDepDiff or
249
288
  // options.monitorMode is set. CLI runs without flags pay no cost (no I/O).
250
289
  // Wrapped in withTimeout as defense in depth: scanner has its own 10s + 5s × N
251
290
  // internal timeouts, but a registry slowdown with many added deps could exceed
252
291
  // the static-scan budget without this cap.
253
- withTimeout(() => scanTrustedDepDiff(targetPath, options), 'scanTrustedDepDiff'),
292
+ ifEnabled('scanTrustedDepDiff', () => withTimeout(() => scanTrustedDepDiff(targetPath, options), 'scanTrustedDepDiff')),
254
293
  // PYSRC-001..008 (v2.11.25, TrapDoor PyPI gap). Detect import-time RCE
255
294
  // in __init__.py / setup.py / top-level .py files. Runs always — not gated
256
295
  // on detectPythonProject() because an attacker can ship a malicious __init__.py
257
296
  // without a requirements.txt. Walker is cheap (just a depth-1 readdir).
258
- yieldThen(() => scanPythonSource(targetPath)),
297
+ ifEnabled('scanPythonSource', () => yieldThen(() => scanPythonSource(targetPath))),
259
298
  // PYAST-001..008 (v2.11.42+, npm/PyPI parity Phase 1). Full Python CST
260
299
  // analysis via tree-sitter-python WASM. Scope-aware module-level detection
261
300
  // of cmdclass override, exec, subprocess shell=True, pickle.loads,
262
301
  // __import__ dangerous, entry_points. Parser init happens at pre-analysis
263
302
  // stage above; this call is sync from the caller's POV.
264
- yieldThen(() => scanPythonAST(targetPath))
303
+ ifEnabled('scanPythonAST', () => yieldThen(() => scanPythonAST(targetPath)))
265
304
  ]);
266
305
 
267
306
  // Extract results: use empty array for rejected scanners, log errors
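The `ifEnabled` wrapper keeps the positional layout of the `Promise.allSettled` array intact: a skipped scanner still occupies its slot, but resolves to an empty findings array without being invoked. A minimal sketch of the pattern, assuming stub scanners and a two-entry allowlist that are purely illustrative; `makeGate` and `runScan` are hypothetical names, and the `invoked` array exists only to make the gating observable.

```javascript
// Truncated allowlist for the sketch; the real set is in the diff above.
const QUICK_SCAN_ALLOWLIST = new Set(['scanPackageJson', 'analyzeAST']);

function makeGate(scanMode, invoked) {
  const isQuick = scanMode === 'quick';
  return function ifEnabled(name, fn) {
    // Skipped scanners keep their array slot but contribute no findings.
    if (isQuick && !QUICK_SCAN_ALLOWLIST.has(name)) return Promise.resolve([]);
    invoked.push(name);
    return fn();
  };
}

// Stub scanners (assumptions); one rejects to show allSettled isolation.
const scanners = {
  scanPackageJson: async () => [{ type: 'demo_finding' }],
  analyzeAST: async () => [],
  scanHashes: async () => { throw new Error('hash database unavailable'); },
};

function runScan(scanMode) {
  const invoked = [];
  const ifEnabled = makeGate(scanMode, invoked);
  const settled = Promise.allSettled(
    Object.entries(scanners).map(([name, fn]) => ifEnabled(name, fn))
  );
  return { invoked, settled };
}

const quickRun = runScan('quick');
const fullRun = runScan('full');

quickRun.settled.then((results) => {
  console.log('quick:', results.map((r) => r.status));
});
fullRun.settled.then((results) => {
  console.log('full:', results.map((r) => r.status));
});
```

Because every slot is still filled, result-extraction code that indexes into `settledResults` by position does not need to change when quick mode is active.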
@@ -86,6 +86,15 @@ const PLAYBOOKS = {
86
86
  detached_process:
87
87
  'spawn/fork with {detached: true} detected. The child process outlives npm install and runs the payload in the background. Check running processes: ps aux | grep node. Kill the suspicious process.',
88
88
 
89
+ linux_fingerprint_exec:
90
+ 'execSync/spawn of a Linux reconnaissance command (id, uname, lsb_release, hostname, whoami). On its own, this may be legitimate telemetry. Combined with a network send, it is fingerprinting for C2 grouping: check the context (compound recon_exfil_direct_ip if a public literal IP is present in the same file).',
91
+
92
+ direct_ip_exfil:
93
+ 'C2 endpoint hardcoded as a public IPv4 literal (bypasses DNS resolution). Check the file that contains the IP: if combined with linux_fingerprint_exec or credential_regex_harvest, it is very likely an attacker C2. Geolocate the IP and cross-check it against threat intel.',
94
+
95
+ recon_exfil_direct_ip:
96
+ 'CRITICAL: Linux system fingerprint (id/uname/lsb_release/hostname/whoami) + exfil to a public IPv4 literal in the same file. Targeted C2 grouping pattern (marginfi campaign, May 2026; design-system-coopeuch). Isolate the machine, block the IP at the firewall, and capture outbound traffic for forensics.',
97
+
89
98
  known_malicious_package:
90
99
  'CRITICAL: Remove immediately. rm -rf node_modules && npm cache clean --force && npm install',
91
100
 
@@ -783,6 +783,19 @@ const RULES = {
783
783
  references: ['https://attack.mitre.org/techniques/T1195/002/'],
784
784
  mitre: 'T1195.002'
785
785
  },
786
+ recon_exfil_direct_ip: {
787
+ id: 'MUADDIB-COMPOUND-016',
788
+ name: 'Linux Fingerprint + Direct-IP Exfil',
789
+ severity: 'CRITICAL',
790
+ confidence: 'high',
791
+ domain: 'malware',
792
+ description: 'execSync(id|uname|lsb_release|hostname|whoami) + http/https to a public IPv4 literal in the same file: device fingerprinting for targeted C2 grouping. Pattern observed in the marginfi campaign (May 2026) and the design-system-coopeuch reconstruction. Track D: closes the gap surfaced by GT-095.',
793
+ references: [
794
+ 'https://attack.mitre.org/techniques/T1082/',
795
+ 'https://attack.mitre.org/techniques/T1041/'
796
+ ],
797
+ mitre: 'T1082'
798
+ },
786
799
 
787
800
  // Package.json script patterns
788
801
  curl_pipe_sh: {
@@ -1113,6 +1126,33 @@ const RULES = {
1113
1126
  ],
1114
1127
  mitre: 'T1564'
1115
1128
  },
1129
+ linux_fingerprint_exec: {
1130
+ id: 'MUADDIB-AST-093',
1131
+ name: 'Linux System Reconnaissance Exec',
1132
+ severity: 'HIGH',
1133
+ confidence: 'high',
1134
+ domain: 'malware',
1135
+ description: 'execSync/exec/spawn of a Linux reconnaissance command (id, uname, lsb_release, hostname, whoami). Pattern observed in direct-IP-exfil malware (marginfi cluster, design-system-coopeuch) that collects a device fingerprint before C2 exfil. HIGH on its own (telemetry SDKs may legitimately call hostname); escalates to CRITICAL when compounded with direct_ip_exfil in the same file.',
1136
+ references: [
+ 'https://attack.mitre.org/techniques/T1082/',
+ 'https://attack.mitre.org/techniques/T1592/'
+ ],
+ mitre: 'T1082'
+ },
+ direct_ip_exfil: {
+ id: 'MUADDIB-AST-094',
+ name: 'Direct IP Exfiltration Endpoint',
+ severity: 'HIGH',
+ confidence: 'high',
+ domain: 'malware',
+ description: 'Literal public IPv4 address used as a C2 endpoint (URL http://1.2.3.4:port/path, or a bare IP in a host:/hostname: option). Bypassing DNS resolution is a targeted-attack pattern. Skipped ranges: 127/8 (localhost), 169.254/16 (link-local, incl. IMDS), 10/8 + 172.16/12 + 192.168/16 (RFC 1918 private). RFC 5737 documentation ranges are flagged (no legitimate runtime use).',
+ references: [
+ 'https://attack.mitre.org/techniques/T1071/001/',
+ 'https://attack.mitre.org/techniques/T1041/',
+ 'https://datatracker.ietf.org/doc/html/rfc5737'
+ ],
+ mitre: 'T1041'
+ },
  dangerous_call_function: {
  id: 'MUADDIB-AST-005',
  name: 'new Function() Constructor',
@@ -349,6 +349,21 @@ function handleCallExpression(node, ctx) {
  file: ctx.relFile
  });
  }
+
+ // AST-NNN: linux_fingerprint_exec (Track D, v2.11.48+) — recon command
+ // pattern observed on direct-IP-exfil malware (marginfi cluster, GT-095
+ // design-system-coopeuch). HIGH alone (telemetry SDKs may legitimately
+ // call hostname); CRITICAL when compounded with direct_ip_exfil in the
+ // same file (`recon_exfil_direct_ip` in SCORING_COMPOUNDS).
+ if (/^\s*(id|uname|lsb_release|hostname|whoami)(\s|$)/.test(cmdStr)) {
+ const firstTok = cmdStr.trim().split(/\s+/)[0];
+ ctx.threats.push({
+ type: 'linux_fingerprint_exec',
+ severity: 'HIGH',
+ message: `${execName || memberExec}("${cmdStr.slice(0, 60)}") — Linux system reconnaissance (${firstTok}) used for device fingerprinting / C2 grouping.`,
+ file: ctx.relFile
+ });
+ }
  }
  }
 
@@ -424,7 +439,7 @@ function handleCallExpression(node, ctx) {
  }
 
  // Detect spawn/execFile of shell processes
- if ((callName === 'spawn' || callName === 'execFile') && node.arguments.length >= 1) {
+ if ((callName === 'spawn' || callName === 'execFile' || callName === 'spawnSync' || callName === 'execFileSync') && node.arguments.length >= 1) {
  const shellArg = node.arguments[0];
  if (shellArg.type === 'Literal' && typeof shellArg.value === 'string') {
  const shellBin = shellArg.value.toLowerCase();
@@ -436,6 +451,16 @@ function handleCallExpression(node, ctx) {
  file: ctx.relFile
  });
  }
+ // AST-NNN: linux_fingerprint_exec (Track D, v2.11.48+) — spawn form,
+ // first arg is the bare command (e.g. `spawn('uname', ['-a'])`).
+ if (['id', 'uname', 'lsb_release', 'hostname', 'whoami'].includes(shellBin)) {
+ ctx.threats.push({
+ type: 'linux_fingerprint_exec',
+ severity: 'HIGH',
+ message: `${callName}('${shellArg.value}', ...) — Linux system reconnaissance (${shellBin}) used for device fingerprinting / C2 grouping.`,
+ file: ctx.relFile
+ });
+ }
  }
  // Also check when shell is computed via os.platform() ternary
  if (shellArg.type === 'ConditionalExpression') {
@@ -73,6 +73,43 @@ function handleLiteral(node, ctx) {
  }
  }
 
+ // AST-NNN: direct_ip_exfil (Track D, v2.11.48+) — IPv4 literal used as
+ // C2 endpoint (URL form `http://1.2.3.4:port/path` OR bare IP literal
+ // outside the safe ranges). Pattern observed on marginfi cluster
+ // (72.62.71.201), design-system-coopeuch GT-095 (direct IP exfil, no
+ // OAST cover), and similar manual-review MALWARE. HIGH alone — combined
+ // with linux_fingerprint_exec in the same file, escalates to CRITICAL
+ // via `recon_exfil_direct_ip` compound.
+ //
+ // Safe ranges (skipped, no fire):
+ // 0.0.0.0 bind-all / server listen address (fastify/express default)
+ // 127.0.0.0/8 localhost
+ // 169.254.0.0/16 link-local (incl. cloud IMDS — separate rules cover abuse)
+ // 10.0.0.0/8 RFC 1918 private
+ // 172.16.0.0/12 RFC 1918 private
+ // 192.168.0.0/16 RFC 1918 private
+ // 255.255.255.255 broadcast
+ // RFC 5737 documentation ranges (192.0.2.x, 198.51.100.x, 203.0.113.x)
+ // are intentionally flagged — no legitimate runtime use, lets our GT
+ // reconstruction fixtures exercise the rule.
+ const IP_SAFE_RE = /^(0\.0\.0\.0$|127\.|10\.|192\.168\.|169\.254\.|172\.(1[6-9]|2[0-9]|3[01])\.|255\.255\.255\.255$)/;
+ const urlIpMatch = node.value.match(/^https?:\/\/((?:\d{1,3}\.){3}\d{1,3})(?::\d+)?(?:\/|$)/);
+ const bareIpMatch = node.value.match(/^((?:\d{1,3}\.){3}\d{1,3})$/);
+ const candidateIp = (urlIpMatch && urlIpMatch[1]) || (bareIpMatch && bareIpMatch[1]) || null;
+ if (candidateIp && !IP_SAFE_RE.test(candidateIp)) {
+ // Validate each octet ≤ 255 to avoid matching '999.999.999.999' style noise
+ const octets = candidateIp.split('.').map(n => parseInt(n, 10));
+ if (octets.every(o => o >= 0 && o <= 255)) {
+ const form = urlIpMatch ? 'URL' : 'bare IPv4 literal';
+ ctx.threats.push({
+ type: 'direct_ip_exfil',
+ severity: 'HIGH',
+ message: `Hardcoded ${form} ${candidateIp} — direct-IP exfil endpoint (no DNS, no OAST cover). Classic C2 / dep-confusion pattern.`,
+ file: ctx.relFile
+ });
+ }
+ }
+
  // Ollama LLM local: polymorphic engine indicator (PhantomRaven Wave 4)
  // Port 11434 is Ollama's default port. Legitimate packages don't call local LLMs.
  if (/(?:localhost|127\.0\.0\.1):11434/.test(node.value)) {
package/src/scoring.js CHANGED
@@ -654,6 +654,20 @@ const SCORING_COMPOUNDS = [
  fileFrom: 'function_constructor_require',
  sameFile: true
  },
+ // Track D (v2.11.48+) — recon_exfil_direct_ip. Closes GT-095 gap
+ // (design-system-coopeuch reconstruction scoring 3 alone, MALWARE per
+ // in-house review). Pattern: execSync(id|uname|lsb_release|hostname|whoami)
+ // + http(s) call to a direct IPv4 literal (no DNS, no OAST). Same file
+ // gates this to attacker-targeted device fingerprinting; legit telemetry
+ // SDKs talk to named endpoints and never co-occur with bare-IP exfil.
+ {
+ type: 'recon_exfil_direct_ip',
+ requires: ['linux_fingerprint_exec', 'direct_ip_exfil'],
+ severity: 'CRITICAL',
+ message: 'Linux system fingerprint (id/uname/lsb_release/hostname/whoami) + direct-IP exfil in same file — targeted device fingerprinting for C2 grouping (scoring compound).',
+ fileFrom: 'direct_ip_exfil',
+ sameFile: true
+ },
  ];
 
  // v2.11.11: Extract static require/import targets from a JS file (1 level).