muaddib-scanner 2.11.20 → 2.11.23

package/README.md CHANGED
@@ -30,7 +30,7 @@
30
30
 
31
31
  npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
32
32
 
33
- MUAD'DIB combines **16 parallel scanners** (223 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring**, **ML classifiers** (XGBoost), and gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages.
33
+ MUAD'DIB combines **17 parallel scanners** (234 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (16 compound rules), **ML classifiers** (XGBoost), and gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages.
34
34
 
35
35
  ---
36
36
 
@@ -176,7 +176,7 @@ muaddib replay # Ground truth validation (61/65 TPR@3)
176
176
 
177
177
  ## Features
178
178
 
179
- ### 16 parallel scanners
179
+ ### 17 parallel scanners
180
180
 
181
181
  | Scanner | Detection |
182
182
  |---------|-----------|
@@ -197,10 +197,11 @@ muaddib replay # Ground truth validation (61/65 TPR@3)
197
197
  | IOC Strings (intel-triage P1.1) | YARA-style string matching (Axios 2026, TeamPCP, GlassWorm, CanisterSprawl) |
198
198
  | Anti-Forensic AST (intel-triage P1.2) | XOR loop + self-delete + decoy write compound (csec autodelete) |
199
199
  | Stub Package (intel-triage P1.3) | Tiny main file + external dep URL + lifecycle hook (ltidi chain) |
200
+ | Monorepo Scanner | Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix) |
200
201
 
201
- ### 223 detection rules
202
+ ### 234 detection rules
202
203
 
203
- All rules are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
204
+ All rules (229 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
204
205
 
205
206
  ### Detected campaigns
206
207
 
@@ -274,7 +275,7 @@ With pre-commit framework:
274
275
  ```yaml
275
276
  repos:
276
277
  - repo: https://github.com/DNSZLSK/muad-dib
277
- rev: v2.11.6
278
+ rev: v2.11.23
278
279
  hooks:
279
280
  - id: muaddib-scan
280
281
  ```
@@ -295,7 +296,7 @@ repos:
295
296
  | **FPR** (Benign random, v2.10.95 measure) | **7.0%** (14/200) | 200 random npm packages, stratified sampling |
296
297
  | **ADR** (Adversarial + Holdout) | **96.3%** (103/107) | 67 adversarial + 40 holdout (107 available on disk), global threshold=20 |
297
298
 
298
- **3529 tests** across 89 files. **223 rules** (218 RULES + 5 PARANOID).
299
+ **3594 tests** across 93 files. **234 rules** (229 RULES + 5 PARANOID).
299
300
 
300
301
  > **ML retrain methodology (v2.10.51):**
301
302
  > - Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)
@@ -343,7 +344,7 @@ npm test
343
344
 
344
345
  ### Testing
345
346
 
346
- - **3529 tests** across 89 modular test files
347
+ - **3594 tests** across 93 modular test files
347
348
  - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
348
349
  - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
349
350
  - **Ground truth validation** - 67 real-world attacks (93.85% TPR@3, 86.2% TPR@20 — v2.10.95 measure)
@@ -364,7 +365,7 @@ npm test
364
365
  - [Documentation Index](docs/INDEX.md) - All documentation in one place
365
366
  - [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) - Experimental protocol, holdout scores
366
367
  - [Threat Model](docs/threat-model.md) - What MUAD'DIB detects and doesn't detect
367
- - [Security Policy](SECURITY.md) - Detection rules reference (223 rules)
368
+ - [Security Policy](SECURITY.md) - Detection rules reference (234 rules)
368
369
  - [Security Audit](docs/SECURITY_AUDIT.md) - Bypass validation report
369
370
  - [FP Analysis](docs/EVALUATION.md) - Historical false positive analysis
370
371
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "muaddib-scanner",
3
- "version": "2.11.20",
3
+ "version": "2.11.23",
4
4
  "description": "Supply-chain threat detection & response for npm & PyPI/Python",
5
5
  "main": "src/index.js",
6
6
  "bin": {
package/src/ml/feature-extractor.js CHANGED
@@ -552,6 +552,246 @@ function placeholderAntiDepConfusion(result, meta) {
552
552
  return true;
553
553
  }
554
554
 
555
+ // ============================================================================
556
+ // Feature 9 — mcp_server_env_access (v2.11.22, audit week3 cluster, 25 FP)
557
+ // ============================================================================
558
+ //
559
+ // Targets legitimate MCP installers / servers (Cachly, Roadmapfy, Llama
560
+ // Ventures, Flomenco, Supericons, cf-memory-mcp, mcp-memory-service, etc.)
561
+ // that currently score 75-99 from `mcp_config_injection` CRITICAL +
562
+ // env_access + credential_regex_harvest triple-stacking on legitimate
563
+ // provider-key reads. The conjunction below discriminates them from
564
+ // SANDWORM_MODE droppers (which also emit mcp_config_injection) by requiring
565
+ // the package to (a) self-identify as MCP, (b) be opt-in (no lifecycle
566
+ // hook), (c) read ONLY known provider API keys (not .npmrc / .aws / SSH),
567
+ // and (d) show no third-party exfil capability.
568
+
569
+ // Provider API key env vars that legitimate MCP installers read to populate
570
+ // the .mcp.json server config they write to the user's tool dirs.
571
+ // Case-sensitive exact-name match; pattern match for *_API_KEY / *_TOKEN /
572
+ // MCP_* / *_API_TOKEN is allowed via PROVIDER_KEY_SUFFIX_RE; other *_KEY names must be in the literal set.
573
+ const KNOWN_PROVIDER_KEYS_LITERAL = new Set([
574
+ 'ANTHROPIC_API_KEY', 'OPENAI_API_KEY', 'GEMINI_API_KEY', 'GOOGLE_API_KEY',
575
+ 'GOOGLE_GENERATIVE_AI_API_KEY',
576
+ 'STRIPE_SECRET_KEY', 'STRIPE_PUBLISHABLE_KEY', 'STRIPE_API_KEY', 'STRIPE_KEY',
577
+ 'BRAVE_API_KEY', 'FIGMA_TOKEN', 'FIGMA_ACCESS_TOKEN', 'POSTHOG_KEY',
578
+ 'PERPLEXITY_API_KEY', 'GROQ_API_KEY', 'COHERE_API_KEY', 'MISTRAL_API_KEY',
579
+ 'OPENROUTER_API_KEY', 'TOGETHER_API_KEY', 'DEEPSEEK_API_KEY', 'XAI_API_KEY',
580
+ 'SUPABASE_ANON_KEY', 'SUPABASE_URL', 'CLAUDE_API_KEY', 'CLAUDE_KEY',
581
+ 'ANTHROPIC_AUTH_TOKEN', 'ANTHROPIC_BASE_URL', 'OPENAI_BASE_URL'
582
+ ]);
583
+ const PROVIDER_KEY_SUFFIX_RE = /^(?:MCP_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY|[A-Z][A-Z0-9_]*_TOKEN|[A-Z][A-Z0-9_]*_API_TOKEN)$/;
584
+
585
+ // Infra / build env vars that any well-behaved package can read without
586
+ // disqualifying F9 (their presence doesn't indicate credential harvest).
587
+ const F9_INFRA_KEYS = new Set([
588
+ 'HOME', 'USERPROFILE', 'XDG_CONFIG_HOME', 'XDG_DATA_HOME', 'XDG_CACHE_HOME',
589
+ 'PATH', 'NODE_ENV', 'NODE_PATH', 'DEBUG', 'CI', 'CWD', 'PWD',
590
+ 'APPDATA', 'LOCALAPPDATA', 'TEMP', 'TMP', 'TMPDIR', 'SHELL',
591
+ 'LANG', 'LC_ALL', 'TERM', 'COLORTERM'
592
+ ]);
593
+
594
+ // Credential file paths that a malicious MCP dropper would harvest.
595
+ // Appearance in any threat message disqualifies F9.
596
+ const F9_CREDENTIAL_FILE_RE = /\.npmrc\b|\.aws[\/\\](?:credentials|config)\b|\bid_rsa\b|\bid_ed25519\b|\.ssh[\/\\]|\.kube[\/\\]config\b|\.docker[\/\\]config\b|\.netrc\b|\.git-credentials\b|wallet\.dat\b|\bsecret_token\b/i;
597
+
598
+ // Threat types that signal third-party network egress. F9 disqualifies on
599
+ // any of these — a legit MCP installer writes .mcp.json and reads env, it
600
+ // does NOT download payloads or call back to attacker hosts.
601
+ const F9_EXFIL_TYPES = new Set([
602
+ 'suspicious_domain',
603
+ 'suspicious_dataflow',
604
+ 'remote_code_load',
605
+ 'intent_credential_exfil',
606
+ 'intent_command_exfil',
607
+ 'fetch_decrypt_exec',
608
+ 'reverse_shell',
609
+ 'binary_dropper',
610
+ 'download_exec_binary',
611
+ 'curl_env_exfil',
612
+ 'curl_exec',
613
+ 'external_tarball_dep',
614
+ 'dependency_url_suspicious',
615
+ 'blockchain_c2_resolution',
616
+ 'dns_exfil'
617
+ ]);
618
+
619
+ // MCP identity signals — package SELF-identifies as an MCP installer/server.
620
+ const MCP_NAME_RE = /(?:^|[/_-])mcp(?:[_-]|$)|claude[_-]plugin[_-]mcp|mcp[_-](?:server|init|bridge|installer|memory|plugin|core|router|host|client|gateway|relay|stdio|transport|orchestrator)/i;
621
+ const MCP_DESC_RE = /\bmodel context protocol\b|\bmcp[ -](?:server|installer|bridge|plugin|memory|core|gateway|relay|orchestrator|transport)\b|\b(?:claude|cursor|windsurf)[ -]mcp\b/i;
622
+
623
+ function _f9Keywords(meta) {
624
+ const m = (meta && meta.registryMeta) || {};
625
+ return Array.isArray(m.keywords) ? m.keywords.map(k => String(k).toLowerCase()) : [];
626
+ }
627
+
628
+ function _f9HasMcpIdentity(meta) {
629
+ if (!meta) return false;
630
+ const name = String(meta.name || '').toLowerCase();
631
+ if (MCP_NAME_RE.test(name)) return true;
632
+ const desc = (meta.registryMeta && meta.registryMeta.description) || meta.description || '';
633
+ if (MCP_DESC_RE.test(desc)) return true;
634
+ const kw = _f9Keywords(meta);
635
+ for (const k of kw) {
636
+ if (k === 'mcp' || k === 'model-context-protocol' || k === 'model context protocol' ||
637
+ k.startsWith('mcp-') || k.startsWith('mcp_')) return true;
638
+ }
639
+ const bin = meta.registryMeta && meta.registryMeta.bin;
640
+ if (bin && typeof bin === 'object') {
641
+ for (const b of Object.keys(bin)) {
642
+ if (/mcp/i.test(b)) return true;
643
+ }
644
+ } else if (typeof bin === 'string' && /mcp/i.test(bin)) {
645
+ return true;
646
+ }
647
+ return false;
648
+ }
649
+
650
+ /**
651
+ * Feature 9 — TRUE iff the package self-identifies as an MCP installer/server
652
+ * AND emits `mcp_config_injection` (legit scaffolding signal) AND has no
653
+ * install lifecycle script AND its env_access / credential_regex_harvest
654
+ * threats cite ONLY known provider API keys (Anthropic/OpenAI/Stripe/etc.)
655
+ * — never credential files like .npmrc, .aws/credentials, or SSH keys —
656
+ * AND shows no third-party exfil capability.
657
+ *
658
+ * Targets the v2.11 audit week3 cluster of 25 legitimate MCP plugin
659
+ * installers that currently score 75-99 from mcp_config_injection +
660
+ * env_access + credential_regex_harvest triple-stacking. Cap to 30 (MEDIUM).
661
+ *
662
+ * Mutually exclusive with SANDWORM_MODE MCP droppers: condition C3 blocks
663
+ * preinstall/postinstall droppers; C4 blocks .npmrc/SSH/AWS harvests; C5
664
+ * blocks downloaders. None of the 15 MALWARE + 29 PENTEST samples in the
665
+ * week3 audit satisfy all five conditions simultaneously.
666
+ *
667
+ * Covers 25 FP (8.7% of audit week3 FP corpus).
668
+ */
669
+ function mcpServerEnvAccess(result, meta) {
670
+ // C1 — MCP identity
671
+ if (!_f9HasMcpIdentity(meta)) return false;
672
+ const threats = (result && result.threats) || [];
673
+ if (threats.length === 0) return false;
674
+ // C2 — mcp_config_injection present (the positive signal that the package
675
+ // actually does MCP work, not just claims to)
676
+ if (!threats.some(t => t.type === 'mcp_config_injection')) return false;
677
+ // C3 — no install lifecycle hook
678
+ if (hasLifecycleScripts(meta)) return false;
679
+ // C4 — env_access / credential_regex_harvest must cite only known provider
680
+ // keys (literal whitelist + suffix pattern) or infra vars; never credential
681
+ // file paths
682
+ for (const t of threats) {
683
+ if (t.type !== 'env_access' && t.type !== 'credential_regex_harvest' &&
684
+ t.type !== 'env_charcode_reconstruction') continue;
685
+ const msg = String(t.message || '');
686
+ if (F9_CREDENTIAL_FILE_RE.test(msg)) return false;
687
+ // Extract candidate env var names from the message
688
+ const candidates = msg.match(/\b[A-Z][A-Z0-9_]{2,}\b/g);
689
+ if (!candidates) continue;
690
+ for (const v of candidates) {
691
+ if (KNOWN_PROVIDER_KEYS_LITERAL.has(v)) continue;
692
+ if (PROVIDER_KEY_SUFFIX_RE.test(v)) continue;
693
+ if (F9_INFRA_KEYS.has(v)) continue;
694
+ // Unknown all-caps token in a credential threat message — could be an
695
+ // attacker-specific var. Don't vouch for legitimacy.
696
+ return false;
697
+ }
698
+ }
699
+ // C5 — no third-party exfil capability
700
+ for (const t of threats) {
701
+ if (F9_EXFIL_TYPES.has(t.type)) return false;
702
+ }
703
+ return true;
704
+ }
705
+
706
+ // ============================================================================
707
+ // Feature 10 — vendor_cli_sdk (v2.11.23, audit week3 cluster, 96 FP)
708
+ // ============================================================================
709
+ //
710
+ // Targets the largest residual FP cluster from the audit 2026-05-week3
711
+ // (96 entries, 33.6% of FP): legitimate vendor / community CLIs and SDKs
712
+ // that fire `credential_regex_harvest` + `env_access` on their OWN
713
+ // in-package credential handling (Stripe checkout, OAuth-PKCE, bearer
714
+ // tokens to vendor APIs, .env template scaffolding). Examples observed:
715
+ // @nocobase/cli-v1, @posterly/cli, @super-hands/cli, codeapp-js-cli
716
+ // (Microsoft Power Apps), nodebb-plugin-flawless-donations (Stripe),
717
+ // @aiyiran/myclaw (Chinese OpenClaw wrapper), usegrain (scaffolder),
718
+ // @tapestry-mud/cli, db-model-router, etc.
719
+ //
720
+ // Discriminator vs vendor-impersonating malware: SANDWORM_MODE droppers
721
+ // (a) typically have no `bin` entry (they install via lifecycle hook,
722
+ // not user-invoked CLI), (b) emit `mcp_config_injection` (F9 catches
723
+ // those), (c) cite credential file paths (.npmrc / .ssh / .aws), (d)
724
+ // emit third-party exfil threats. F10's conjunction requires NONE of
725
+ // these and additionally requires a vendor identity hint (homepage or
726
+ // scoped name).
727
+
728
+ function _f10HasBinEntry(meta) {
729
+ const bin = meta && meta.registryMeta && meta.registryMeta.bin;
730
+ if (!bin) return false;
731
+ if (typeof bin === 'string' && bin.trim().length > 0) return true;
732
+ if (typeof bin === 'object' && Object.keys(bin).length > 0) return true;
733
+ return false;
734
+ }
735
+
736
+ function _f10HasVendorIdentity(meta) {
737
+ if (!meta) return false;
738
+ if (getHomepageHost(meta)) return true;
739
+ const name = meta.name && String(meta.name);
740
+ if (name && name.startsWith('@') && name.includes('/')) return true;
741
+ return false;
742
+ }
743
+
744
+ /**
745
+ * Feature 10 — TRUE iff the package looks structurally like a legitimate
746
+ * vendor / community CLI / SDK whose credential-handling threats are
747
+ * intrinsic to its functionality, not an exfil vector.
748
+ *
749
+ * Conjunction of 7 conditions (see file header for SANDWORM_MODE
750
+ * discriminator rationale):
751
+ *
752
+ * C1 has `bin` entry — CLI signal
753
+ * C2 credential_regex_harvest, env_access, env_charcode_reconstruction, OR credential_tampering fires
754
+ * C3 no `mcp_config_injection` — F9 catches MCP installers
755
+ * C4 no install lifecycle hook — legit CLIs are opt-in
756
+ * C5 no third-party exfil threat (15 types)
757
+ * C6 no credential file path (.npmrc/.ssh/.aws) in any threat message
758
+ * C7 vendor identity present (homepage host OR scoped @vendor/name)
759
+ *
760
+ * Cap value 35 (CRITICAL → MEDIUM-HIGH boundary). Reuses the F9 constants
761
+ * F9_EXFIL_TYPES and F9_CREDENTIAL_FILE_RE for C5/C6.
762
+ *
763
+ * Covers up to 96 FP (33.6% of audit week3 FP corpus). Estimated effective
764
+ * coverage 60-75 after the conjunction filters (some week3 entries lack
765
+ * a bin field, e.g. design-system asset packages — those fall under F1
766
+ * `bundle_without_install_scripts` instead).
767
+ */
768
+ function vendorCliSdk(result, meta) {
769
+ // C1 — has bin entry
770
+ if (!_f10HasBinEntry(meta)) return false;
771
+ const threats = (result && result.threats) || [];
772
+ if (threats.length === 0) return false;
773
+ // C2 — at least one credential-noise threat (the FP source)
774
+ const hasCredentialNoise = threats.some(t =>
775
+ t.type === 'credential_regex_harvest' ||
776
+ t.type === 'env_access' ||
777
+ t.type === 'env_charcode_reconstruction' ||
778
+ t.type === 'credential_tampering'
779
+ );
780
+ if (!hasCredentialNoise) return false;
781
+ // C3 — no mcp_config_injection (F9 territory)
782
+ if (threats.some(t => t.type === 'mcp_config_injection')) return false;
783
+ // C4 — no install lifecycle hook
784
+ if (hasLifecycleScripts(meta)) return false;
785
+ // C5 + C6 — scan threats for exfil signal and credential-file mentions
786
+ for (const t of threats) {
787
+ if (F9_EXFIL_TYPES.has(t.type)) return false; // C5
788
+ if (F9_CREDENTIAL_FILE_RE.test(String(t.message || ''))) return false; // C6
789
+ }
790
+ // C7 — vendor identity
791
+ if (!_f10HasVendorIdentity(meta)) return false;
792
+ return true;
793
+ }
794
+
555
795
  /**
556
796
  * Feature 8 — TRUE iff the package declares at least one install
557
797
  * lifecycle script AND the scan shows no network egress capability
@@ -702,6 +942,11 @@ function extractFeatures(result, meta) {
702
942
  // See ml-retrain/ml-auc-v2.10.96.md for details.
703
943
  features.install_script_no_network_egress = 0; // installScriptNoNetworkEgress(result, meta) ? 1 : 0;
704
944
 
945
+ // --- v2.11.22 Feature 9 (audit week3 cluster — 25 FP) ---
946
+ features.mcp_server_env_access = mcpServerEnvAccess(result, meta) ? 1 : 0;
947
+ // --- v2.11.23 Feature 10 (audit week3 cluster — up to 96 FP) ---
948
+ features.vendor_cli_sdk = vendorCliSdk(result, meta) ? 1 : 0;
949
+
705
950
  return features;
706
951
  }
707
952
 
@@ -779,5 +1024,7 @@ module.exports = {
779
1024
  typosquatScopedPackage,
780
1025
  obfuscationWithoutVector,
781
1026
  placeholderAntiDepConfusion,
782
- installScriptNoNetworkEgress
1027
+ installScriptNoNetworkEgress,
1028
+ mcpServerEnvAccess,
1029
+ vendorCliSdk
783
1030
  };
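
Reviewer note on condition C4 of `mcpServerEnvAccess`: the sketch below restates the env-var name check in isolation so its accept/reject behavior is easier to see. `classifyEnvNames` and the trimmed constant sets are hypothetical names used for illustration only and are not part of the package; only the suffix regex and the candidate-extraction regex are copied from the hunk above.

```javascript
// Trimmed, illustrative copies of the constants in the diff (assumption: the
// real sets are larger; only a few entries are kept here).
const PROVIDER_LITERALS = new Set(['ANTHROPIC_API_KEY', 'STRIPE_SECRET_KEY', 'POSTHOG_KEY']);
const INFRA_KEYS = new Set(['HOME', 'PATH', 'NODE_ENV']);
// Copied verbatim from PROVIDER_KEY_SUFFIX_RE in the diff.
const PROVIDER_SUFFIX_RE = /^(?:MCP_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY|[A-Z][A-Z0-9_]*_TOKEN|[A-Z][A-Z0-9_]*_API_TOKEN)$/;

// Hypothetical helper: extracts all-caps tokens from a threat message and
// reports which ones none of the three allow-lists would vouch for.
function classifyEnvNames(message) {
  const candidates = String(message || '').match(/\b[A-Z][A-Z0-9_]{2,}\b/g) || [];
  const unknown = candidates.filter(v =>
    !PROVIDER_LITERALS.has(v) &&
    !PROVIDER_SUFFIX_RE.test(v) &&
    !INFRA_KEYS.has(v));
  // C4 passes only when every extracted token is accounted for.
  return { candidates, unknown, passes: unknown.length === 0 };
}

console.log(classifyEnvNames('reads process.env.OPENAI_API_KEY'));
console.log(classifyEnvNames('reads process.env.MY_SECRET_KEY'));
console.log(classifyEnvNames('reads process.env.NPM_TOKEN'));
```

Under this reading, `MY_SECRET_KEY` is rejected because bare `*_KEY` names are accepted only through the literal set, while `NPM_TOKEN` and `GITHUB_TOKEN` pass via the `*_TOKEN` branch of the regex. That may be worth a second look given the stated goal of admitting only provider API keys.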
@@ -317,43 +317,136 @@ function parsePyPIRss(xml) {
317
317
 
318
318
  // --- CouchDB doc extraction ---
319
319
 
320
+ // Burst-publish window: extra versions published within this window before the
321
+ // most-recent one are also enqueued for scanning. Covers the case where an
322
+ // account-takeover attacker publishes several versions in a short burst.
323
+ const RECENT_PUBLISH_WINDOW_MS = 24 * 60 * 60 * 1000;
324
+ const RECENT_PUBLISH_MAX = 5;
325
+
320
326
  /**
321
- * Layer 2: Extract the latest version's tarball URL from a CouchDB changes document
322
- * (when using include_docs=true). Eliminates the separate registry roundtrip
323
- * that can 404 if the package is unpublished between detection and scan.
327
+ * Pure function: pick the most-recently-published version from a packument and
328
+ * return its metadata, plus context useful for ATO detection.
329
+ *
330
+ * Critical: we sort by `time[version]` publish timestamp, NOT `dist-tags.latest`.
331
+ * Account-takeover attacks (TeamPCP / @antv 2026-05-19, SAP, every Shai-Hulud
332
+ * derivative) publish malicious versions without moving the latest tag — semver
333
+ * resolution on `npm install` will still pull them. Selecting by latest tag
334
+ * scans the wrong (clean) version and lets the malicious tarball ship.
324
335
  *
325
- * @param {Object} doc - CouchDB document (change.doc)
326
- * @returns {{ version: string, tarball: string|null, unpackedSize: number, scripts: Object }|null}
336
+ * Falls back to `dist-tags.latest` only when `time` is missing or yields no
337
+ * usable entries (very old legacy packages).
338
+ *
339
+ * @param {Object} packument - npm packument (full /<pkg> response or CouchDB doc)
340
+ * @param {Object} [options]
341
+ * @param {number} [options.recentWindowMs=86400000] - window for collecting extra recent versions
342
+ * @param {number} [options.maxRecent=5] - hard cap on extras returned
343
+ * @returns {Object|null} - {
344
+ * version, tarball, unpackedSize, scripts, homepage, description,
345
+ * latestTagVersion, // dist-tags.latest (may differ from `version` under ATO)
346
+ * recentVersions: [{ version, tarball, unpackedSize, scripts }, ...]
347
+ * } or null if no usable version found
327
348
  */
328
- function extractTarballFromDoc(doc) {
329
- try {
330
- if (!doc || !doc.versions || !doc['dist-tags']) return null;
331
-
332
- const latestTag = doc['dist-tags'].latest;
333
- if (!latestTag) return null;
349
+ function selectMostRecentVersion(packument, options = {}) {
350
+ const recentWindowMs = options.recentWindowMs != null ? options.recentWindowMs : RECENT_PUBLISH_WINDOW_MS;
351
+ const maxRecent = options.maxRecent != null ? options.maxRecent : RECENT_PUBLISH_MAX;
352
+
353
+ if (!packument || typeof packument !== 'object') return null;
354
+ const versions = packument.versions || {};
355
+ const time = packument.time || {};
356
+ const distTags = packument['dist-tags'] || {};
357
+ const latestTagVersion = (typeof distTags.latest === 'string') ? distTags.latest : null;
358
+
359
+ // Build [version, timestamp] pairs from `time`, skipping non-version keys
360
+ // (created/modified) and entries for unpublished versions (present in `time`
361
+ // but absent from `versions` — npm leaves the tombstone after `npm unpublish`).
362
+ const versionTimes = [];
363
+ for (const [v, tsStr] of Object.entries(time)) {
364
+ if (v === 'created' || v === 'modified') continue;
365
+ if (!versions[v]) continue;
366
+ const ts = Date.parse(tsStr);
367
+ if (!Number.isFinite(ts)) continue;
368
+ versionTimes.push([v, ts]);
369
+ }
334
370
 
335
- const versionData = doc.versions[latestTag];
336
- if (!versionData) return null;
371
+ let mostRecentVersion = null;
372
+ if (versionTimes.length > 0) {
373
+ versionTimes.sort((a, b) => b[1] - a[1]);
374
+ mostRecentVersion = versionTimes[0][0];
375
+ } else if (latestTagVersion && versions[latestTagVersion]) {
376
+ // Legacy fallback: no usable time data, accept dist-tag latest
377
+ mostRecentVersion = latestTagVersion;
378
+ }
379
+ if (!mostRecentVersion) return null;
380
+
381
+ const versionData = versions[mostRecentVersion];
382
+ if (!versionData) return null;
383
+
384
+ const result = {
385
+ version: versionData.version || mostRecentVersion,
386
+ tarball: (versionData.dist && versionData.dist.tarball) || null,
387
+ unpackedSize: (versionData.dist && versionData.dist.unpackedSize) || 0,
388
+ scripts: versionData.scripts || {},
389
+ homepage: (typeof versionData.homepage === 'string') ? versionData.homepage : '',
390
+ description: (typeof versionData.description === 'string') ? versionData.description : '',
391
+ latestTagVersion,
392
+ recentVersions: [],
393
+ };
394
+
395
+ // Burst extras: other versions published within the recent window, excluding
396
+ // the most-recent one. Bounded by maxRecent. Each extra carries enough
397
+ // metadata for the queue to enqueue it directly without re-fetching the packument.
398
+ if (versionTimes.length > 1) {
399
+ const cutoff = versionTimes[0][1] - recentWindowMs;
400
+ for (let i = 1; i < versionTimes.length && result.recentVersions.length < maxRecent; i++) {
401
+ const [v, ts] = versionTimes[i];
402
+ if (ts < cutoff) break; // sorted desc, so once we cross the cutoff we're done
403
+ const vData = versions[v];
404
+ if (!vData) continue;
405
+ result.recentVersions.push({
406
+ version: vData.version || v,
407
+ tarball: (vData.dist && vData.dist.tarball) || null,
408
+ unpackedSize: (vData.dist && vData.dist.unpackedSize) || 0,
409
+ scripts: vData.scripts || {},
410
+ });
411
+ }
412
+ }
337
413
 
338
- const tarball = (versionData.dist && versionData.dist.tarball) || null;
339
- const unpackedSize = (versionData.dist && versionData.dist.unpackedSize) || 0;
340
- const version = versionData.version || latestTag;
341
- const scripts = versionData.scripts || {};
342
- const homepage = (typeof versionData.homepage === 'string') ? versionData.homepage : '';
343
- const description = (typeof versionData.description === 'string') ? versionData.description : '';
414
+ return result;
415
+ }
344
416
 
345
- return { version, tarball, unpackedSize, scripts, homepage, description };
417
+ /**
418
+ * Layer 2: Extract metadata for the most-recently-published version from a
419
+ * CouchDB changes document (when using include_docs=true). Eliminates the
420
+ * separate registry roundtrip that can 404 if the package is unpublished
421
+ * between detection and scan.
422
+ *
423
+ * Currently dead code post-May 2025 CouchDB migration (include_docs deprecated,
424
+ * change.doc is always null). Kept defensive in case the registry restores it
425
+ * or a different upstream mirror provides docs again.
426
+ *
427
+ * @param {Object} doc - CouchDB document (change.doc), structurally a packument
428
+ */
429
+ function extractTarballFromDoc(doc) {
430
+ try {
431
+ if (!doc || !doc.versions || !doc['dist-tags']) return null;
432
+ return selectMostRecentVersion(doc);
346
433
  } catch {
347
434
  return null; // Parse failure -> fallback to lazy resolution
348
435
  }
349
436
  }
350
437
 
351
438
  /**
352
- * Fetch latest version metadata for an npm package.
353
- * Returns { version, tarball } or null on failure.
439
+ * Fetch most-recently-published version metadata for an npm package.
440
+ *
441
+ * Uses the full packument (`registry.npmjs.org/<pkg>`) rather than the `/latest`
442
+ * endpoint so we can detect ATO attacks that publish without moving the latest
443
+ * dist-tag (see selectMostRecentVersion for full threat model).
444
+ *
445
+ * Returned object includes `latestTagVersion` and `recentVersions` so callers
446
+ * can flag the ATO signature and enqueue burst extras for scanning.
354
447
  */
355
448
  async function getNpmLatestTarball(packageName) {
356
- const url = `https://registry.npmjs.org/${encodeURIComponent(packageName)}/latest`;
449
+ const url = `https://registry.npmjs.org/${encodeURIComponent(packageName)}`;
357
450
  await acquireRegistrySlot();
358
451
  let body;
359
452
  try {
@@ -361,19 +454,21 @@ async function getNpmLatestTarball(packageName) {
361
454
  } finally {
362
455
  releaseRegistrySlot();
363
456
  }
364
- let data;
457
+ let packument;
365
458
  try {
366
- data = JSON.parse(body);
459
+ packument = JSON.parse(body);
367
460
  } catch (e) {
368
461
  throw new Error(`Invalid JSON from npm registry for ${packageName}: ${e.message}`);
369
462
  }
370
- const version = data.version || '';
371
- const tarball = (data.dist && data.dist.tarball) || null;
372
- const unpackedSize = (data.dist && data.dist.unpackedSize) || 0;
373
- const scripts = (data.scripts) || {};
374
- const homepage = (typeof data.homepage === 'string') ? data.homepage : '';
375
- const description = (typeof data.description === 'string') ? data.description : '';
376
- return { version, tarball, unpackedSize, scripts, homepage, description };
463
+ const result = selectMostRecentVersion(packument);
464
+ if (!result) {
465
+ return {
466
+ version: '', tarball: null, unpackedSize: 0, scripts: {},
467
+ homepage: '', description: '',
468
+ latestTagVersion: null, recentVersions: [],
469
+ };
470
+ }
471
+ return result;
377
472
  }
378
473
 
379
474
  // --- npm polling ---
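
Reviewer note on `selectMostRecentVersion`: a minimal sketch of the selection rule on a toy packument, assuming the behavior described in the hunk above (order by `time[version]`, skip `created`/`modified` and unpublished tombstones, collect extras inside a 24-hour window). `pickByPublishTime` and the packument contents are hypothetical; the sketch omits the `dist-tags.latest` fallback and the per-version metadata that the real function returns.

```javascript
// Toy packument: 1.2.2 was published last, but dist-tags.latest still points
// at 2.0.0. Version 0.9.0 is a tombstone (present in `time`, absent from
// `versions`), as left behind by an unpublish.
const packument = {
  'dist-tags': { latest: '2.0.0' },
  versions: {
    '1.2.0': { version: '1.2.0' },
    '2.0.0': { version: '2.0.0' },
    '1.2.1': { version: '1.2.1' },
    '1.2.2': { version: '1.2.2' },
  },
  time: {
    created: '2024-01-01T00:00:00.000Z',
    modified: '2026-01-10T10:05:00.000Z',
    '0.9.0': '2023-06-01T00:00:00.000Z',
    '1.2.0': '2024-01-01T00:00:00.000Z',
    '2.0.0': '2025-03-01T00:00:00.000Z',
    '1.2.1': '2026-01-10T10:00:00.000Z',
    '1.2.2': '2026-01-10T10:05:00.000Z',
  },
};

// Hypothetical condensed version of the selection logic.
function pickByPublishTime(p, windowMs = 24 * 60 * 60 * 1000, maxRecent = 5) {
  const ordered = Object.entries(p.time)
    .filter(([v]) => v !== 'created' && v !== 'modified' && p.versions[v])
    .map(([v, ts]) => [v, Date.parse(ts)])
    .filter(([, ts]) => Number.isFinite(ts))
    .sort((a, b) => b[1] - a[1]); // newest first
  if (ordered.length === 0) return null;
  const [newest, newestTs] = ordered[0];
  const extras = ordered.slice(1)
    .filter(([, ts]) => ts >= newestTs - windowMs) // same burst window
    .slice(0, maxRecent)
    .map(([v]) => v);
  return { version: newest, latestTag: p['dist-tags'].latest, extras };
}

const picked = pickByPublishTime(packument);
console.log(picked);
```

On this toy input the newest publish is `1.2.2`, the tag still says `2.0.0`, and `1.2.1` is the only burst extra, which is the mismatch the monitor hunk treats as an ATO signal.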
@@ -1075,6 +1170,9 @@ module.exports = {
1075
1170
 
1076
1171
  // CouchDB doc extraction
1077
1172
  extractTarballFromDoc,
1173
+ selectMostRecentVersion,
1174
+ RECENT_PUBLISH_WINDOW_MS,
1175
+ RECENT_PUBLISH_MAX,
1078
1176
 
1079
1177
  // Polling functions
1080
1178
  pollNpmChanges,
@@ -1118,13 +1118,49 @@ async function resolveTarballAndScan(item, stats, dailyAlerts, recentlyScanned,
1118
1118
  if (npmInfo.unpackedSize) item.unpackedSize = npmInfo.unpackedSize;
1119
1119
  if (npmInfo.scripts) item.registryScripts = npmInfo.scripts;
1120
1120
 
1121
+ // ATO signature: most-recently-published version differs from current
1122
+ // dist-tags.latest. Pattern observed in TeamPCP / @antv 2026-05-19:
1123
+ // attacker publishes 1-2 versions per package but does NOT bump the latest
1124
+ // tag. semver resolution on `npm install <pkg>@^x.y` still pulls the
1125
+ // malicious version. The mismatch is a strong ATO signal — legitimate
1126
+ // maintainers almost always move latest when publishing.
1127
+ if (npmInfo.latestTagVersion && npmInfo.version && npmInfo.version !== npmInfo.latestTagVersion) {
1128
+ item.atoSignal = true;
1129
+ console.log(`[MONITOR] ATO SIGNAL: ${item.name}@${item.version} published but dist-tags.latest=${npmInfo.latestTagVersion}`);
1130
+ }
1131
+
1132
+ // Burst-publish coverage: enqueue extra versions published in the same
1133
+ // recent window. Single change event in the CouchDB feed can correspond
1134
+ // to multiple version publishes when the attacker fires several in a
1135
+ // burst (TeamPCP averaged ~2 versions per package). Without this we'd
1136
+ // only scan whichever version happened to be the most recent at resolution
1137
+ // time, racing the publish stream.
1138
+ const recents = Array.isArray(npmInfo.recentVersions) ? npmInfo.recentVersions : [];
1139
+ for (const recent of recents) {
1140
+ if (!recent || !recent.tarball || !recent.version) continue;
1141
+ const dedupeKey = `${item.name}@${recent.version}`;
1142
+ if (recentlyScanned.has(dedupeKey)) continue;
1143
+ scanQueue.push({
1144
+ name: item.name,
1145
+ version: recent.version,
1146
+ ecosystem: 'npm',
1147
+ tarballUrl: recent.tarball,
1148
+ unpackedSize: recent.unpackedSize || 0,
1149
+ registryScripts: recent.scripts || null,
1150
+ atoSignal: item.atoSignal === true,
1151
+ isATOBurstExtra: true,
1152
+ });
1153
+ }
1154
+
1121
1155
  // Fast-track decision: large packages (>15MB) with no lifecycle scripts and no IOC match.
1122
1156
  // Computed HERE (after metadata resolution), not at ingestion time — post-May 2025
1123
1157
  // CouchDB changes feed has no docs, so metadata is only available after lazy fetch.
1124
1158
  // Fast-track packages get: quick static scan (package.json + shell only), no AST,
1125
1159
  // no sandbox, no LLM, no archiving. Exits in ~2-3s instead of 30-300s.
1160
+ // ATO-signalled packages bypass fast-track regardless of size — we want
1161
+ // the full pipeline (AST + sandbox) on anything that smells like an ATO.
1126
1162
  const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;
1127
- if (!item.isIOCMatch && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
1163
+ if (!item.isIOCMatch && !item.atoSignal && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
1128
1164
  const scripts = item.registryScripts || {};
1129
1165
  if (!scripts.preinstall && !scripts.postinstall && !scripts.install) {
1130
1166
  item.fastTrack = true;
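
Reviewer note on the fast-track change: the sketch below condenses the two decisions in this hunk into one pure function to show how the ATO signal now gates fast-tracking. `decideRoute` is a hypothetical helper written for this note; the field names follow the diff, but the surrounding queue and logging are left out.

```javascript
const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;

// Hypothetical helper: returns the ATO flag and the fast-track decision for
// one queue item, given the resolved registry metadata.
function decideRoute(item, npmInfo) {
  // ATO signal: newest published version differs from dist-tags.latest.
  const atoSignal = Boolean(
    npmInfo.latestTagVersion &&
    npmInfo.version &&
    npmInfo.version !== npmInfo.latestTagVersion
  );
  const scripts = item.registryScripts || {};
  const hasInstallHook = Boolean(
    scripts.preinstall || scripts.postinstall || scripts.install
  );
  // Fast-track only large packages with no IOC match, no ATO signal,
  // and no install lifecycle hook.
  const fastTrack =
    !item.isIOCMatch &&
    !atoSignal &&
    (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES &&
    !hasInstallHook;
  return { atoSignal, fastTrack };
}

const bigItem = { unpackedSize: 20 * 1024 * 1024, registryScripts: {} };
console.log(decideRoute(bigItem, { version: '2.0.0', latestTagVersion: '2.0.0' }));
console.log(decideRoute(bigItem, { version: '1.2.2', latestTagVersion: '2.0.0' }));
```

The same 20 MB package is fast-tracked when its newest version matches the tag and sent through the full pipeline when it does not.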
@@ -166,6 +166,9 @@ async function process(threats, targetPath, options, pythonDeps, warnings, scann
166
166
  homepage: pkgData.homepage || (typeof pkgData.repository === 'string' ? pkgData.repository : (pkgData.repository && pkgData.repository.url) || ''),
167
167
  dependencies: pkgData.dependencies,
168
168
  devDependencies: pkgData.devDependencies,
169
+ // v2.11.22 — used by F9 (mcp_server_env_access) identity check.
170
+ keywords: Array.isArray(pkgData.keywords) ? pkgData.keywords : undefined,
171
+ bin: pkgData.bin,
169
172
  };
170
173
  }
171
174
  } catch { /* graceful fallback */ }
package/src/scoring.js CHANGED
@@ -1483,6 +1483,8 @@ const {
1483
1483
  typosquatScopedPackage,
1484
1484
  obfuscationWithoutVector,
1485
1485
  placeholderAntiDepConfusion,
1486
+ mcpServerEnvAccess,
1487
+ vendorCliSdk,
1486
1488
  } = require('./ml/feature-extractor.js');
1487
1489
 
1488
1490
  /**
@@ -1501,6 +1503,9 @@ function applyContextualFPCaps(result, pkgMeta) {
1501
1503
  homepage: (pkgMeta && pkgMeta.homepage) || '',
1502
1504
  dependencies: (pkgMeta && pkgMeta.dependencies),
1503
1505
  devDependencies: (pkgMeta && pkgMeta.devDependencies),
1506
+ // v2.11.22 — used by F9 (mcp_server_env_access) identity check.
1507
+ keywords: (pkgMeta && pkgMeta.keywords),
1508
+ bin: (pkgMeta && pkgMeta.bin),
1504
1509
  },
1505
1510
  };
1506
1511
 
@@ -1518,6 +1523,10 @@ function applyContextualFPCaps(result, pkgMeta) {
1518
1523
  if (networkDestinationFirstParty(result, meta)) {
1519
1524
  applied.push({ feature: 'network_destination_first_party', cap: 30 });
1520
1525
  }
1526
+ // F9: legit MCP installer/server with env_access on provider keys → MAX 30
1527
+ if (mcpServerEnvAccess(result, meta)) {
1528
+ applied.push({ feature: 'mcp_server_env_access', cap: 30 });
1529
+ }
1521
1530
  // F2: binary installer from GitHub Releases → MAX 35
1522
1531
  if (installUrlGithubReleases(result)) {
1523
1532
  applied.push({ feature: 'install_url_github_releases', cap: 35 });
@@ -1530,6 +1539,10 @@ function applyContextualFPCaps(result, pkgMeta) {
1530
1539
  if (obfuscationWithoutVector(result)) {
1531
1540
  applied.push({ feature: 'obfuscation_without_vector', cap: 35 });
1532
1541
  }
1542
+ // F10: legit vendor CLI/SDK with intrinsic credential handling → MAX 35
1543
+ if (vendorCliSdk(result, meta)) {
1544
+ applied.push({ feature: 'vendor_cli_sdk', cap: 35 });
1545
+ }
1533
1546
  // F5: typosquat on scoped package → suppress typosquat points
1534
1547
  if (typosquatScopedPackage(result, meta)) {
1535
1548
  applied.push({ feature: 'typosquat_scoped_package', cap: -1 });