
muaddib-scanner 2.11.20 → 2.11.23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/README.md +9 -8
package/package.json +1 -1
package/src/ml/feature-extractor.js +248 -1
package/src/monitor/ingestion.js +130 -32
package/src/monitor/queue.js +37 -1
package/src/pipeline/processor.js +3 -0
package/src/scoring.js +13 -0

package/README.md CHANGED

@@ -30,7 +30,7 @@
 npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
-MUAD'DIB combines **16 parallel scanners** (223 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring**, **ML classifiers** (XGBoost), and gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages.
+MUAD'DIB combines **17 parallel scanners** (234 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (16 compound rules), **ML classifiers** (XGBoost), and gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages.
 ---
@@ -176,7 +176,7 @@ muaddib replay                     # Ground truth validation (61/65 TPR@3)
 ## Features
-### 16 parallel scanners
+### 17 parallel scanners
 | Scanner | Detection |
 |---------|-----------|
@@ -197,10 +197,11 @@ muaddib replay                     # Ground truth validation (61/65 TPR@3)
 | IOC Strings (intel-triage P1.1) | YARA-style string matching (Axios 2026, TeamPCP, GlassWorm, CanisterSprawl) |
 | Anti-Forensic AST (intel-triage P1.2) | XOR loop + self-delete + decoy write compound (csec autodelete) |
 | Stub Package (intel-triage P1.3) | Tiny main file + external dep URL + lifecycle hook (ltidi chain) |
+| Monorepo Scanner | Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix) |
-### 223 detection rules
+### 234 detection rules
-All rules are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
+All rules (229 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
 ### Detected campaigns
@@ -274,7 +275,7 @@ With pre-commit framework:
 ```yaml
 repos:
   - repo: https://github.com/DNSZLSK/muad-dib
-    rev: v2.11.6
+    rev: v2.11.23
     hooks:
       - id: muaddib-scan
 ```
@@ -295,7 +296,7 @@ repos:
 | **FPR** (Benign random, v2.10.95 measure) | **7.0%** (14/200) | 200 random npm packages, stratified sampling |
 | **ADR** (Adversarial + Holdout) | **96.3%** (103/107) | 67 adversarial + 40 holdout (107 available on disk), global threshold=20 |
-**3529 tests** across 89 files. **223 rules** (218 RULES + 5 PARANOID).
+**3594 tests** across 93 files. **234 rules** (229 RULES + 5 PARANOID).
 > **ML retrain methodology (v2.10.51):**
 > - Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)
@@ -343,7 +344,7 @@ npm test
 ### Testing
-- **3529 tests** across 89 modular test files
+- **3594 tests** across 93 modular test files
 - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
 - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
 - **Ground truth validation** - 67 real-world attacks (93.85% TPR@3, 86.2% TPR@20 — v2.10.95 measure)
@@ -364,7 +365,7 @@ npm test
 - [Documentation Index](docs/INDEX.md) - All documentation in one place
 - [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) - Experimental protocol, holdout scores
 - [Threat Model](docs/threat-model.md) - What MUAD'DIB detects and doesn't detect
-- [Security Policy](SECURITY.md) - Detection rules reference (223 rules)
+- [Security Policy](SECURITY.md) - Detection rules reference (234 rules)
 - [Security Audit](docs/SECURITY_AUDIT.md) - Bypass validation report
 - [FP Analysis](docs/EVALUATION.md) - Historical false positive analysis

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "muaddib-scanner",
-  "version": "2.11.20",
+  "version": "2.11.23",
   "description": "Supply-chain threat detection & response for npm & PyPI/Python",
   "main": "src/index.js",
   "bin": {

package/src/ml/feature-extractor.js CHANGED

@@ -552,6 +552,246 @@ function placeholderAntiDepConfusion(result, meta) {
   return true;
 }
+// ============================================================================
+// Feature 9 — mcp_server_env_access (v2.11.22, audit week3 cluster, 25 FP)
+// ============================================================================
+//
+// Targets legitimate MCP installers / servers (Cachly, Roadmapfy, Llama
+// Ventures, Flomenco, Supericons, cf-memory-mcp, mcp-memory-service, etc.)
+// that currently score 75-99 from `mcp_config_injection` CRITICAL +
+// env_access + credential_regex_harvest triple-stacking on legitimate
+// provider-key reads. The conjunction below discriminates them from
+// SANDWORM_MODE droppers (which also emit mcp_config_injection) by requiring
+// the package to (a) self-identify as MCP, (b) be opt-in (no lifecycle
+// hook), (c) read ONLY known provider API keys (not .npmrc / .aws / SSH),
+// and (d) show no third-party exfil capability.
+// Provider API key env vars that legitimate MCP installers read to populate
+// the .mcp.json server config they write to the user's tool dirs.
+// Case-sensitive exact-name match; pattern match for *_API_KEY / *_TOKEN /
+// MCP_* / .*_KEY is allowed via PROVIDER_KEY_SUFFIX_RE.
+const KNOWN_PROVIDER_KEYS_LITERAL = new Set([
+  'ANTHROPIC_API_KEY', 'OPENAI_API_KEY', 'GEMINI_API_KEY', 'GOOGLE_API_KEY',
+  'GOOGLE_GENERATIVE_AI_API_KEY',
+  'STRIPE_SECRET_KEY', 'STRIPE_PUBLISHABLE_KEY', 'STRIPE_API_KEY', 'STRIPE_KEY',
+  'BRAVE_API_KEY', 'FIGMA_TOKEN', 'FIGMA_ACCESS_TOKEN', 'POSTHOG_KEY',
+  'PERPLEXITY_API_KEY', 'GROQ_API_KEY', 'COHERE_API_KEY', 'MISTRAL_API_KEY',
+  'OPENROUTER_API_KEY', 'TOGETHER_API_KEY', 'DEEPSEEK_API_KEY', 'XAI_API_KEY',
+  'SUPABASE_ANON_KEY', 'SUPABASE_URL', 'CLAUDE_API_KEY', 'CLAUDE_KEY',
+  'ANTHROPIC_AUTH_TOKEN', 'ANTHROPIC_BASE_URL', 'OPENAI_BASE_URL'
+]);
+const PROVIDER_KEY_SUFFIX_RE = /^(?:MCP_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY|[A-Z][A-Z0-9_]*_TOKEN|[A-Z][A-Z0-9_]*_API_TOKEN)$/;
+// Infra / build env vars that any well-behaved package can read without
+// disqualifying F9 (their presence doesn't indicate credential harvest).
+const F9_INFRA_KEYS = new Set([
+  'HOME', 'USERPROFILE', 'XDG_CONFIG_HOME', 'XDG_DATA_HOME', 'XDG_CACHE_HOME',
+  'PATH', 'NODE_ENV', 'NODE_PATH', 'DEBUG', 'CI', 'CWD', 'PWD',
+  'APPDATA', 'LOCALAPPDATA', 'TEMP', 'TMP', 'TMPDIR', 'SHELL',
+  'LANG', 'LC_ALL', 'TERM', 'COLORTERM'
+]);
+// Credential file paths that a malicious MCP dropper would harvest.
+// Appearance in any threat message disqualifies F9.
+const F9_CREDENTIAL_FILE_RE = /\.npmrc\b|\.aws[\/\\](?:credentials|config)\b|\bid_rsa\b|\bid_ed25519\b|\.ssh[\/\\]|\.kube[\/\\]config\b|\.docker[\/\\]config\b|\.netrc\b|\.git-credentials\b|wallet\.dat\b|\bsecret_token\b/i;
+// Threat types that signal third-party network egress. F9 disqualifies on
+// any of these — a legit MCP installer writes .mcp.json and reads env, it
+// does NOT download payloads or call back to attacker hosts.
+const F9_EXFIL_TYPES = new Set([
+  'suspicious_domain',
+  'suspicious_dataflow',
+  'remote_code_load',
+  'intent_credential_exfil',
+  'intent_command_exfil',
+  'fetch_decrypt_exec',
+  'reverse_shell',
+  'binary_dropper',
+  'download_exec_binary',
+  'curl_env_exfil',
+  'curl_exec',
+  'external_tarball_dep',
+  'dependency_url_suspicious',
+  'blockchain_c2_resolution',
+  'dns_exfil'
+]);
+// MCP identity signals — package SELF-identifies as an MCP installer/server.
+const MCP_NAME_RE = /(?:^|[/_-])mcp(?:[_-]|$)|claude[_-]plugin[_-]mcp|mcp[_-](?:server|init|bridge|installer|memory|plugin|core|router|host|client|gateway|relay|stdio|transport|orchestrator)/i;
+const MCP_DESC_RE = /\bmodel context protocol\b|\bmcp[ -](?:server|installer|bridge|plugin|memory|core|gateway|relay|orchestrator|transport)\b|\b(?:claude|cursor|windsurf)[ -]mcp\b/i;
+function _f9Keywords(meta) {
+  const m = (meta && meta.registryMeta) || {};
+  return Array.isArray(m.keywords) ? m.keywords.map(k => String(k).toLowerCase()) : [];
+}
+function _f9HasMcpIdentity(meta) {
+  if (!meta) return false;
+  const name = String(meta.name || '').toLowerCase();
+  if (MCP_NAME_RE.test(name)) return true;
+  const desc = (meta.registryMeta && meta.registryMeta.description) || meta.description || '';
+  if (MCP_DESC_RE.test(desc)) return true;
+  const kw = _f9Keywords(meta);
+  for (const k of kw) {
+    if (k === 'mcp' || k === 'model-context-protocol' || k === 'model context protocol' ||
+        k.startsWith('mcp-') || k.startsWith('mcp_')) return true;
+  }
+  const bin = meta.registryMeta && meta.registryMeta.bin;
+  if (bin && typeof bin === 'object') {
+    for (const b of Object.keys(bin)) {
+      if (/mcp/i.test(b)) return true;
+    }
+  } else if (typeof bin === 'string' && /mcp/i.test(bin)) {
+    return true;
+  }
+  return false;
+}
+/**
+ * Feature 9 — TRUE iff the package self-identifies as an MCP installer/server
+ * AND emits `mcp_config_injection` (legit scaffolding signal) AND has no
+ * install lifecycle script AND its env_access / credential_regex_harvest
+ * threats cite ONLY known provider API keys (Anthropic/OpenAI/Stripe/etc.)
+ * — never credential files like .npmrc, .aws/credentials, or SSH keys —
+ * AND shows no third-party exfil capability.
+ *
+ * Targets the v2.11 audit week3 cluster of 25 legitimate MCP plugin
+ * installers that currently score 75-99 from mcp_config_injection +
+ * env_access + credential_regex_harvest triple-stacking. Cap to 30 (MEDIUM).
+ *
+ * Mutually exclusive with SANDWORM_MODE MCP droppers: condition C3 blocks
+ * preinstall/postinstall droppers; C4 blocks .npmrc/SSH/AWS harvests; C5
+ * blocks downloaders. None of the 15 MALWARE + 29 PENTEST samples in the
+ * week3 audit satisfy all five conditions simultaneously.
+ *
+ * Covers 25 FP (8.7% of audit week3 FP corpus).
+ */
+function mcpServerEnvAccess(result, meta) {
+  // C1 — MCP identity
+  if (!_f9HasMcpIdentity(meta)) return false;
+  const threats = (result && result.threats) || [];
+  if (threats.length === 0) return false;
+  // C2 — mcp_config_injection present (the positive signal that the package
+  // actually does MCP work, not just claims to)
+  if (!threats.some(t => t.type === 'mcp_config_injection')) return false;
+  // C3 — no install lifecycle hook
+  if (hasLifecycleScripts(meta)) return false;
+  // C4 — env_access / credential_regex_harvest must cite only known provider
+  // keys (literal whitelist + suffix pattern) or infra vars; never credential
+  // file paths
+  for (const t of threats) {
+    if (t.type !== 'env_access' && t.type !== 'credential_regex_harvest' &&
+        t.type !== 'env_charcode_reconstruction') continue;
+    const msg = String(t.message || '');
+    if (F9_CREDENTIAL_FILE_RE.test(msg)) return false;
+    // Extract candidate env var names from the message
+    const candidates = msg.match(/\b[A-Z][A-Z0-9_]{2,}\b/g);
+    if (!candidates) continue;
+    for (const v of candidates) {
+      if (KNOWN_PROVIDER_KEYS_LITERAL.has(v)) continue;
+      if (PROVIDER_KEY_SUFFIX_RE.test(v)) continue;
+      if (F9_INFRA_KEYS.has(v)) continue;
+      // Unknown all-caps token in a credential threat message — could be an
+      // attacker-specific var. Don't vouch for legitimacy.
+      return false;
+    }
+  }
+  // C5 — no third-party exfil capability
+  for (const t of threats) {
+    if (F9_EXFIL_TYPES.has(t.type)) return false;
+  }
+  return true;
+}
+// ============================================================================
+// Feature 10 — vendor_cli_sdk (v2.11.23, audit week3 cluster, 96 FP)
+// ============================================================================
+//
+// Targets the largest residual FP cluster from the audit 2026-05-week3
+// (96 entries, 33.6% of FP): legitimate vendor / community CLIs and SDKs
+// that fire `credential_regex_harvest` + `env_access` on their OWN
+// in-package credential handling (Stripe checkout, OAuth-PKCE, bearer
+// tokens to vendor APIs, .env template scaffolding). Examples observed:
+// @nocobase/cli-v1, @posterly/cli, @super-hands/cli, codeapp-js-cli
+// (Microsoft Power Apps), nodebb-plugin-flawless-donations (Stripe),
+// @aiyiran/myclaw (Chinese OpenClaw wrapper), usegrain (scaffolder),
+// @tapestry-mud/cli, db-model-router, etc.
+//
+// Discriminator vs vendor-impersonating malware: SANDWORM_MODE droppers
+// (a) typically have no `bin` entry (they install via lifecycle hook,
+// not user-invoked CLI), (b) emit `mcp_config_injection` (F9 catches
+// those), (c) cite credential file paths (.npmrc / .ssh / .aws), (d)
+// emit third-party exfil threats. F10's conjunction requires NONE of
+// these and additionally requires a vendor identity hint (homepage or
+// scoped name).
+function _f10HasBinEntry(meta) {
+  const bin = meta && meta.registryMeta && meta.registryMeta.bin;
+  if (!bin) return false;
+  if (typeof bin === 'string' && bin.trim().length > 0) return true;
+  if (typeof bin === 'object' && Object.keys(bin).length > 0) return true;
+  return false;
+}
+function _f10HasVendorIdentity(meta) {
+  if (!meta) return false;
+  if (getHomepageHost(meta)) return true;
+  const name = meta.name && String(meta.name);
+  if (name && name.startsWith('@') && name.includes('/')) return true;
+  return false;
+}
+/**
+ * Feature 10 — TRUE iff the package looks structurally like a legitimate
+ * vendor / community CLI / SDK whose credential-handling threats are
+ * intrinsic to its functionality, not an exfil vector.
+ *
+ * Conjunction of 7 conditions (see file header for SANDWORM_MODE
+ * discriminator rationale):
+ *
+ *   C1  has `bin` entry                       — CLI signal
+ *   C2  credential_regex_harvest OR env_access fires
+ *   C3  no `mcp_config_injection`             — F9 catches MCP installers
+ *   C4  no install lifecycle hook             — legit CLIs are opt-in
+ *   C5  no third-party exfil threat (15 types)
+ *   C6  no credential file path (.npmrc/.ssh/.aws) in any threat message
+ *   C7  vendor identity present (homepage host OR scoped @vendor/name)
+ *
+ * Cap value 35 (CRITICAL → MEDIUM-HIGH boundary). Reuses the F9 constants
+ * F9_EXFIL_TYPES and F9_CREDENTIAL_FILE_RE for C5/C6.
+ *
+ * Covers up to 96 FP (33.6% of audit week3 FP corpus). Estimated effective
+ * coverage 60-75 after the conjunction filters (some week3 entries lack
+ * a bin field, e.g. design-system asset packages — those fall under F1
+ * `bundle_without_install_scripts` instead).
+ */
+function vendorCliSdk(result, meta) {
+  // C1 — has bin entry
+  if (!_f10HasBinEntry(meta)) return false;
+  const threats = (result && result.threats) || [];
+  if (threats.length === 0) return false;
+  // C2 — at least one credential-noise threat (the FP source)
+  const hasCredentialNoise = threats.some(t =>
+    t.type === 'credential_regex_harvest' ||
+    t.type === 'env_access' ||
+    t.type === 'env_charcode_reconstruction' ||
+    t.type === 'credential_tampering'
+  );
+  if (!hasCredentialNoise) return false;
+  // C3 — no mcp_config_injection (F9 territory)
+  if (threats.some(t => t.type === 'mcp_config_injection')) return false;
+  // C4 — no install lifecycle hook
+  if (hasLifecycleScripts(meta)) return false;
+  // C5 + C6 — scan threats for exfil signal and credential-file mentions
+  for (const t of threats) {
+    if (F9_EXFIL_TYPES.has(t.type)) return false;       // C5
+    if (F9_CREDENTIAL_FILE_RE.test(String(t.message || ''))) return false;  // C6
+  }
+  // C7 — vendor identity
+  if (!_f10HasVendorIdentity(meta)) return false;
+  return true;
+}
 /**
  * Feature 8 — TRUE iff the package declares at least one install
  * lifecycle script AND the scan shows no network egress capability
@@ -702,6 +942,11 @@ function extractFeatures(result, meta) {
   // See ml-retrain/ml-auc-v2.10.96.md for details.
   features.install_script_no_network_egress = 0; // installScriptNoNetworkEgress(result, meta) ? 1 : 0;
+  // --- v2.11.22 Feature 9 (audit week3 cluster — 25 FP) ---
+  features.mcp_server_env_access = mcpServerEnvAccess(result, meta) ? 1 : 0;
+  // --- v2.11.23 Feature 10 (audit week3 cluster — up to 96 FP) ---
+  features.vendor_cli_sdk = vendorCliSdk(result, meta) ? 1 : 0;
   return features;
 }
@@ -779,5 +1024,7 @@ module.exports = {
   typosquatScopedPackage,
   obfuscationWithoutVector,
   placeholderAntiDepConfusion,
-  installScriptNoNetworkEgress
+  installScriptNoNetworkEgress,
+  mcpServerEnvAccess,
+  vendorCliSdk
 };
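
Note on F9 condition C4: the token classification above can be exercised in isolation. The following is a minimal sketch, not the package's own code. The helper name `classifyEnvTokens` is hypothetical, the two sets are abridged copies of the constants in the diff, and the regexes are copied as they appear in the diff.

```javascript
// Sketch of F9 condition C4 (env var token classification).
// ASSUMPTION: classifyEnvTokens is a hypothetical helper; the sets below are
// abridged from KNOWN_PROVIDER_KEYS_LITERAL and F9_INFRA_KEYS in the diff.
const KNOWN_PROVIDER_KEYS_LITERAL = new Set([
  'ANTHROPIC_API_KEY', 'OPENAI_API_KEY', 'POSTHOG_KEY', 'SUPABASE_URL'
]);
const PROVIDER_KEY_SUFFIX_RE = /^(?:MCP_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY|[A-Z][A-Z0-9_]*_TOKEN|[A-Z][A-Z0-9_]*_API_TOKEN)$/;
const F9_INFRA_KEYS = new Set(['HOME', 'PATH', 'NODE_ENV', 'CI']);

// Returns one verdict per all-caps token found in a threat message.
// In mcpServerEnvAccess, a single 'unknown' verdict disqualifies the package.
function classifyEnvTokens(message) {
  const candidates = String(message || '').match(/\b[A-Z][A-Z0-9_]{2,}\b/g) || [];
  return candidates.map(name => {
    if (KNOWN_PROVIDER_KEYS_LITERAL.has(name)) return { name, verdict: 'provider-literal' };
    if (PROVIDER_KEY_SUFFIX_RE.test(name)) return { name, verdict: 'provider-pattern' };
    if (F9_INFRA_KEYS.has(name)) return { name, verdict: 'infra' };
    return { name, verdict: 'unknown' };
  });
}

console.log(classifyEnvTokens('reads process.env.OPENAI_API_KEY and process.env.HOME'));
console.log(classifyEnvTokens('reads process.env.AWS_SECRET_ACCESS_KEY'));
console.log(classifyEnvTokens('reads process.env.NPM_TOKEN'));
```

Two things are visible from the regex as written in the diff. It has no bare `_KEY` alternative, so a name such as `POSTHOG_KEY` passes only through the literal set, and `AWS_SECRET_ACCESS_KEY` is classified as unknown. Any `*_TOKEN` name, including `NPM_TOKEN`, passes the suffix pattern, so C4 on its own does not reject it. The other conditions (C3, C5, and the credential file regex) still apply.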

package/src/monitor/ingestion.js CHANGED

@@ -317,43 +317,136 @@ function parsePyPIRss(xml) {
 // --- CouchDB doc extraction ---
+// Burst-publish window: extra versions published within this window before the
+// most-recent one are also enqueued for scanning. Covers the case where an
+// account-takeover attacker publishes several versions in a short burst.
+const RECENT_PUBLISH_WINDOW_MS = 24 * 60 * 60 * 1000;
+const RECENT_PUBLISH_MAX = 5;
 /**
- * Layer 2: Extract the latest version's tarball URL from a CouchDB changes document
- * (when using include_docs=true). Eliminates the separate registry roundtrip
- * that can 404 if the package is unpublished between detection and scan.
+ * Pure function: pick the most-recently-published version from a packument and
+ * return its metadata, plus context useful for ATO detection.
+ *
+ * Critical: we sort by `time[version]` publish timestamp, NOT `dist-tags.latest`.
+ * Account-takeover attacks (TeamPCP / @antv 2026-05-19, SAP, every Shai-Hulud
+ * derivative) publish malicious versions without moving the latest tag — semver
+ * resolution on `npm install` will still pull them. Selecting by latest tag
+ * scans the wrong (clean) version and lets the malicious tarball ship.
  *
- * @param {Object} doc - CouchDB document (change.doc)
- * @returns {{ version: string, tarball: string|null, unpackedSize: number, scripts: Object }|null}
+ * Falls back to `dist-tags.latest` only when `time` is missing or yields no
+ * usable entries (very old legacy packages).
+ *
+ * @param {Object} packument - npm packument (full /<pkg> response or CouchDB doc)
+ * @param {Object} [options]
+ * @param {number} [options.recentWindowMs=86400000] - window for collecting extra recent versions
+ * @param {number} [options.maxRecent=5] - hard cap on extras returned
+ * @returns {Object|null} - {
+ *   version, tarball, unpackedSize, scripts, homepage, description,
+ *   latestTagVersion,       // dist-tags.latest (may differ from `version` under ATO)
+ *   recentVersions: [{ version, tarball, unpackedSize, scripts }, ...]
+ * } or null if no usable version found
  */
-function extractTarballFromDoc(doc) {
-  try {
-    if (!doc || !doc.versions || !doc['dist-tags']) return null;
-    const latestTag = doc['dist-tags'].latest;
-    if (!latestTag) return null;
+function selectMostRecentVersion(packument, options = {}) {
+  const recentWindowMs = options.recentWindowMs != null ? options.recentWindowMs : RECENT_PUBLISH_WINDOW_MS;
+  const maxRecent = options.maxRecent != null ? options.maxRecent : RECENT_PUBLISH_MAX;
+  if (!packument || typeof packument !== 'object') return null;
+  const versions = packument.versions || {};
+  const time = packument.time || {};
+  const distTags = packument['dist-tags'] || {};
+  const latestTagVersion = (typeof distTags.latest === 'string') ? distTags.latest : null;
+  // Build [version, timestamp] pairs from `time`, skipping non-version keys
+  // (created/modified) and entries for unpublished versions (present in `time`
+  // but absent from `versions` — npm leaves the tombstone after `npm unpublish`).
+  const versionTimes = [];
+  for (const [v, tsStr] of Object.entries(time)) {
+    if (v === 'created' || v === 'modified') continue;
+    if (!versions[v]) continue;
+    const ts = Date.parse(tsStr);
+    if (!Number.isFinite(ts)) continue;
+    versionTimes.push([v, ts]);
+  }
-    const versionData = doc.versions[latestTag];
-    if (!versionData) return null;
+  let mostRecentVersion = null;
+  if (versionTimes.length > 0) {
+    versionTimes.sort((a, b) => b[1] - a[1]);
+    mostRecentVersion = versionTimes[0][0];
+  } else if (latestTagVersion && versions[latestTagVersion]) {
+    // Legacy fallback: no usable time data, accept dist-tag latest
+    mostRecentVersion = latestTagVersion;
+  }
+  if (!mostRecentVersion) return null;
+  const versionData = versions[mostRecentVersion];
+  if (!versionData) return null;
+  const result = {
+    version: versionData.version || mostRecentVersion,
+    tarball: (versionData.dist && versionData.dist.tarball) || null,
+    unpackedSize: (versionData.dist && versionData.dist.unpackedSize) || 0,
+    scripts: versionData.scripts || {},
+    homepage: (typeof versionData.homepage === 'string') ? versionData.homepage : '',
+    description: (typeof versionData.description === 'string') ? versionData.description : '',
+    latestTagVersion,
+    recentVersions: [],
+  };
+  // Burst extras: other versions published within the recent window, excluding
+  // the most-recent one. Bounded by maxRecent. Each extra carries enough
+  // metadata for the queue to enqueue it directly without re-fetching the packument.
+  if (versionTimes.length > 1) {
+    const cutoff = versionTimes[0][1] - recentWindowMs;
+    for (let i = 1; i < versionTimes.length && result.recentVersions.length < maxRecent; i++) {
+      const [v, ts] = versionTimes[i];
+      if (ts < cutoff) break; // sorted desc, so once we cross the cutoff we're done
+      const vData = versions[v];
+      if (!vData) continue;
+      result.recentVersions.push({
+        version: vData.version || v,
+        tarball: (vData.dist && vData.dist.tarball) || null,
+        unpackedSize: (vData.dist && vData.dist.unpackedSize) || 0,
+        scripts: vData.scripts || {},
+      });
+    }
+  }
-    const tarball = (versionData.dist && versionData.dist.tarball) || null;
-    const unpackedSize = (versionData.dist && versionData.dist.unpackedSize) || 0;
-    const version = versionData.version || latestTag;
-    const scripts = versionData.scripts || {};
-    const homepage = (typeof versionData.homepage === 'string') ? versionData.homepage : '';
-    const description = (typeof versionData.description === 'string') ? versionData.description : '';
+  return result;
+}
-    return { version, tarball, unpackedSize, scripts, homepage, description };
+/**
+ * Layer 2: Extract metadata for the most-recently-published version from a
+ * CouchDB changes document (when using include_docs=true). Eliminates the
+ * separate registry roundtrip that can 404 if the package is unpublished
+ * between detection and scan.
+ *
+ * Currently dead code post-May 2025 CouchDB migration (include_docs deprecated,
+ * change.doc is always null). Kept defensive in case the registry restores it
+ * or a different upstream mirror provides docs again.
+ *
+ * @param {Object} doc - CouchDB document (change.doc), structurally a packument
+ */
+function extractTarballFromDoc(doc) {
+  try {
+    if (!doc || !doc.versions || !doc['dist-tags']) return null;
+    return selectMostRecentVersion(doc);
   } catch {
     return null; // Parse failure -> fallback to lazy resolution
   }
 }
 /**
- * Fetch latest version metadata for an npm package.
- * Returns { version, tarball } or null on failure.
+ * Fetch most-recently-published version metadata for an npm package.
+ *
+ * Uses the full packument (`registry.npmjs.org/<pkg>`) rather than the `/latest`
+ * endpoint so we can detect ATO attacks that publish without moving the latest
+ * dist-tag (see selectMostRecentVersion for full threat model).
+ *
+ * Returned object includes `latestTagVersion` and `recentVersions` so callers
+ * can flag the ATO signature and enqueue burst extras for scanning.
  */
 async function getNpmLatestTarball(packageName) {
-  const url = `https://registry.npmjs.org/${encodeURIComponent(packageName)}/latest`;
+  const url = `https://registry.npmjs.org/${encodeURIComponent(packageName)}`;
   await acquireRegistrySlot();
   let body;
   try {
@@ -361,19 +454,21 @@ async function getNpmLatestTarball(packageName) {
   } finally {
     releaseRegistrySlot();
   }
-  let data;
+  let packument;
   try {
-    data = JSON.parse(body);
+    packument = JSON.parse(body);
   } catch (e) {
     throw new Error(`Invalid JSON from npm registry for ${packageName}: ${e.message}`);
   }
-  const version = data.version || '';
-  const tarball = (data.dist && data.dist.tarball) || null;
-  const unpackedSize = (data.dist && data.dist.unpackedSize) || 0;
-  const scripts = (data.scripts) || {};
-  const homepage = (typeof data.homepage === 'string') ? data.homepage : '';
-  const description = (typeof data.description === 'string') ? data.description : '';
-  return { version, tarball, unpackedSize, scripts, homepage, description };
+  const result = selectMostRecentVersion(packument);
+  if (!result) {
+    return {
+      version: '', tarball: null, unpackedSize: 0, scripts: {},
+      homepage: '', description: '',
+      latestTagVersion: null, recentVersions: [],
+    };
+  }
+  return result;
 }
 // --- npm polling ---
@@ -1075,6 +1170,9 @@ module.exports = {
   // CouchDB doc extraction
   extractTarballFromDoc,
+  selectMostRecentVersion,
+  RECENT_PUBLISH_WINDOW_MS,
+  RECENT_PUBLISH_MAX,
   // Polling functions
   pollNpmChanges,
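
Note on version selection: the behavioral change in this file is that the scanned version is chosen by publish timestamp instead of `dist-tags.latest`. The sketch below is a simplified stand-in for `selectMostRecentVersion`, assuming the same packument shape. The name `pickByPublishTime`, the reduced return shape, and the sample packument are hypothetical and only illustrate the selection and burst-window logic.

```javascript
// Simplified sketch of time-based version selection.
// ASSUMPTION: pickByPublishTime and its return shape are illustrative only;
// the real function in the diff also returns tarball, scripts, homepage, etc.
function pickByPublishTime(packument, windowMs = 24 * 60 * 60 * 1000, maxRecent = 5) {
  if (!packument || typeof packument !== 'object') return null;
  const versions = packument.versions || {};
  const time = packument.time || {};
  const latestTag = (packument['dist-tags'] || {}).latest || null;

  // Keep only real, still-published versions with a parseable timestamp.
  const byTime = Object.entries(time)
    .filter(([v]) => v !== 'created' && v !== 'modified' && versions[v])
    .map(([v, ts]) => [v, Date.parse(ts)])
    .filter(([, ts]) => Number.isFinite(ts))
    .sort((a, b) => b[1] - a[1]); // newest first

  if (byTime.length === 0) {
    // Legacy fallback: no usable time data, accept dist-tags.latest.
    return latestTag && versions[latestTag]
      ? { version: latestTag, latestTag, burst: [] }
      : null;
  }

  // Burst extras: other versions published within windowMs of the newest one.
  const cutoff = byTime[0][1] - windowMs;
  const burst = byTime.slice(1)
    .filter(([, ts]) => ts >= cutoff)
    .slice(0, maxRecent)
    .map(([v]) => v);
  return { version: byTime[0][0], latestTag, burst };
}

// Made-up packument: two versions published minutes apart, latest tag not moved.
const samplePackument = {
  'dist-tags': { latest: '1.0.0' },
  versions: { '1.0.0': {}, '1.0.1': {}, '1.0.2': {} },
  time: {
    created: '2024-01-01T00:00:00.000Z',
    modified: '2026-01-10T10:05:00.000Z',
    '0.9.0': '2023-12-01T00:00:00.000Z', // unpublished: in time, not in versions
    '1.0.0': '2024-01-01T00:00:00.000Z',
    '1.0.1': '2026-01-10T10:00:00.000Z',
    '1.0.2': '2026-01-10T10:05:00.000Z'
  }
};
console.log(pickByPublishTime(samplePackument));
```

With this sample, the selected version is `1.0.2` while `latestTag` stays `1.0.0`, and `1.0.1` is reported as a burst extra. The old `/latest` lookup would have returned only `1.0.0`.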

package/src/monitor/queue.js CHANGED

@@ -1118,13 +1118,49 @@ async function resolveTarballAndScan(item, stats, dailyAlerts, recentlyScanned,
       if (npmInfo.unpackedSize) item.unpackedSize = npmInfo.unpackedSize;
       if (npmInfo.scripts) item.registryScripts = npmInfo.scripts;
+      // ATO signature: most-recently-published version differs from current
+      // dist-tags.latest. Pattern observed in TeamPCP / @antv 2026-05-19:
+      // attacker publishes 1-2 versions per package but does NOT bump the latest
+      // tag. semver resolution on `npm install <pkg>@^x.y` still pulls the
+      // malicious version. The mismatch is a strong ATO signal — legitimate
+      // maintainers almost always move latest when publishing.
+      if (npmInfo.latestTagVersion && npmInfo.version && npmInfo.version !== npmInfo.latestTagVersion) {
+        item.atoSignal = true;
+        console.log(`[MONITOR] ATO SIGNAL: ${item.name}@${item.version} published but dist-tags.latest=${npmInfo.latestTagVersion}`);
+      }
+      // Burst-publish coverage: enqueue extra versions published in the same
+      // recent window. Single change event in the CouchDB feed can correspond
+      // to multiple version publishes when the attacker fires several in a
+      // burst (TeamPCP averaged ~2 versions per package). Without this we'd
+      // only scan whichever version happened to be the most recent at resolution
+      // time, racing the publish stream.
+      const recents = Array.isArray(npmInfo.recentVersions) ? npmInfo.recentVersions : [];
+      for (const recent of recents) {
+        if (!recent || !recent.tarball || !recent.version) continue;
+        const dedupeKey = `${item.name}@${recent.version}`;
+        if (recentlyScanned.has(dedupeKey)) continue;
+        scanQueue.push({
+          name: item.name,
+          version: recent.version,
+          ecosystem: 'npm',
+          tarballUrl: recent.tarball,
+          unpackedSize: recent.unpackedSize || 0,
+          registryScripts: recent.scripts || null,
+          atoSignal: item.atoSignal === true,
+          isATOBurstExtra: true,
+        });
+      }
       // Fast-track decision: large packages (>15MB) with no lifecycle scripts and no IOC match.
       // Computed HERE (after metadata resolution), not at ingestion time — post-May 2025
       // CouchDB changes feed has no docs, so metadata is only available after lazy fetch.
       // Fast-track packages get: quick static scan (package.json + shell only), no AST,
       // no sandbox, no LLM, no archiving. Exits in ~2-3s instead of 30-300s.
+      // ATO-signalled packages bypass fast-track regardless of size — we want
+      // the full pipeline (AST + sandbox) on anything that smells like an ATO.
       const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;
-      if (!item.isIOCMatch && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
+      if (!item.isIOCMatch && !item.atoSignal && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
         const scripts = item.registryScripts || {};
         if (!scripts.preinstall && !scripts.postinstall && !scripts.install) {
           item.fastTrack = true;
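
Note on the fast-track gate: the two decisions in this hunk (raising the ATO signal, then excluding ATO-signalled items from fast-track) can be written as pure predicates. This is a sketch of the visible logic only. The function names `hasAtoSignal` and `shouldFastTrack` are hypothetical, and whatever follows `item.fastTrack = true` in the real file is not shown in the diff and is not modeled here.

```javascript
// Sketch of the ATO signal and fast-track gate from the hunk above.
// ASSUMPTION: both function names are illustrative; field names follow the diff.
const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;

// True when the newest published version is not the one dist-tags.latest points at.
function hasAtoSignal(npmInfo) {
  return Boolean(
    npmInfo.latestTagVersion &&
    npmInfo.version &&
    npmInfo.version !== npmInfo.latestTagVersion
  );
}

// Fast-track applies only to large packages with no IOC match, no ATO signal,
// and no install-time lifecycle script.
function shouldFastTrack(item) {
  if (item.isIOCMatch || item.atoSignal) return false;
  if ((item.unpackedSize || 0) <= FAST_TRACK_SIZE_BYTES) return false;
  const scripts = item.registryScripts || {};
  return !scripts.preinstall && !scripts.postinstall && !scripts.install;
}

const bigItem = { unpackedSize: 20 * 1024 * 1024, registryScripts: {} };
console.log(shouldFastTrack(bigItem));
console.log(shouldFastTrack({ ...bigItem, atoSignal: true }));
console.log(hasAtoSignal({ version: '1.0.2', latestTagVersion: '1.0.0' }));
```

The point of the added `!item.atoSignal` term is that size alone no longer buys a package out of the full pipeline once the version/tag mismatch has been observed.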

package/src/pipeline/processor.js CHANGED

@@ -166,6 +166,9 @@ async function process(threats, targetPath, options, pythonDeps, warnings, scann
         homepage: pkgData.homepage || (typeof pkgData.repository === 'string' ? pkgData.repository : (pkgData.repository && pkgData.repository.url) || ''),
         dependencies: pkgData.dependencies,
         devDependencies: pkgData.devDependencies,
+        // v2.11.22 — used by F9 (mcp_server_env_access) identity check.
+        keywords: Array.isArray(pkgData.keywords) ? pkgData.keywords : undefined,
+        bin: pkgData.bin,
       };
     }
   } catch { /* graceful fallback */ }
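
Note on the new `keywords` and `bin` fields: they feed the identity check in `_f9HasMcpIdentity`. The sketch below isolates only the keyword and `bin` branches of that check and leaves out the name and description regexes. The function name `mcpIdentityFromKeywordsAndBin` is hypothetical. In package.json, `bin` may be either a string path or an object mapping command names to paths, which is why both shapes are handled.

```javascript
// Sketch of the keyword and bin branches of the F9 identity check.
// ASSUMPTION: mcpIdentityFromKeywordsAndBin is an illustrative name; the name
// and description regex branches of _f9HasMcpIdentity are omitted here.
function mcpIdentityFromKeywordsAndBin(registryMeta) {
  const m = registryMeta || {};
  const keywords = Array.isArray(m.keywords)
    ? m.keywords.map(k => String(k).toLowerCase())
    : [];
  const keywordHit = keywords.some(k =>
    k === 'mcp' || k === 'model-context-protocol' || k === 'model context protocol' ||
    k.startsWith('mcp-') || k.startsWith('mcp_'));
  if (keywordHit) return true;

  const bin = m.bin;
  // Object form: test the command names. String form: test the path itself.
  if (bin && typeof bin === 'object') {
    return Object.keys(bin).some(b => /mcp/i.test(b));
  }
  return typeof bin === 'string' && /mcp/i.test(bin);
}

console.log(mcpIdentityFromKeywordsAndBin({ keywords: ['MCP', 'cli'] }));
console.log(mcpIdentityFromKeywordsAndBin({ bin: { 'acme-mcp': './cli.js' } }));
console.log(mcpIdentityFromKeywordsAndBin({ keywords: ['cli'], bin: { acme: './cli.js' } }));
```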

package/src/scoring.js CHANGED

@@ -1483,6 +1483,8 @@ const {
   typosquatScopedPackage,
   obfuscationWithoutVector,
   placeholderAntiDepConfusion,
+  mcpServerEnvAccess,
+  vendorCliSdk,
 } = require('./ml/feature-extractor.js');
 /**
@@ -1501,6 +1503,9 @@ function applyContextualFPCaps(result, pkgMeta) {
       homepage: (pkgMeta && pkgMeta.homepage) || '',
       dependencies: (pkgMeta && pkgMeta.dependencies),
       devDependencies: (pkgMeta && pkgMeta.devDependencies),
+      // v2.11.22 — used by F9 (mcp_server_env_access) identity check.
+      keywords: (pkgMeta && pkgMeta.keywords),
+      bin: (pkgMeta && pkgMeta.bin),
     },
   };
@@ -1518,6 +1523,10 @@ function applyContextualFPCaps(result, pkgMeta) {
   if (networkDestinationFirstParty(result, meta)) {
     applied.push({ feature: 'network_destination_first_party', cap: 30 });
   }
+  // F9: legit MCP installer/server with env_access on provider keys → MAX 30
+  if (mcpServerEnvAccess(result, meta)) {
+    applied.push({ feature: 'mcp_server_env_access', cap: 30 });
+  }
   // F2: binary installer from GitHub Releases → MAX 35
   if (installUrlGithubReleases(result)) {
     applied.push({ feature: 'install_url_github_releases', cap: 35 });
@@ -1530,6 +1539,10 @@ function applyContextualFPCaps(result, pkgMeta) {
   if (obfuscationWithoutVector(result)) {
     applied.push({ feature: 'obfuscation_without_vector', cap: 35 });
   }
+  // F10: legit vendor CLI/SDK with intrinsic credential handling → MAX 35
+  if (vendorCliSdk(result, meta)) {
+    applied.push({ feature: 'vendor_cli_sdk', cap: 35 });
+  }
   // F5: typosquat on scoped package → suppress typosquat points
   if (typosquatScopedPackage(result, meta)) {
     applied.push({ feature: 'typosquat_scoped_package', cap: -1 });
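
Note on how caps might combine: the hunks above only show entries being pushed onto `applied`. They do not show how the list is consumed. The sketch below assumes, without confirmation from the diff, that the strictest non-negative cap wins and that `cap: -1` is a sentinel handled elsewhere (the F5 comment says it suppresses typosquat points). The names `effectiveCap` and `capScore` are hypothetical.

```javascript
// Sketch of one plausible way to resolve the `applied` list into a final score.
// ASSUMPTION: "lowest non-negative cap wins" is a guess, not taken from the diff;
// cap -1 is treated as a sentinel and ignored for capping purposes.
function effectiveCap(applied) {
  const caps = applied.map(a => a.cap).filter(c => c >= 0);
  return caps.length > 0 ? Math.min(...caps) : null;
}

function capScore(score, applied) {
  const cap = effectiveCap(applied);
  return cap === null ? score : Math.min(score, cap);
}

const appliedExample = [
  { feature: 'mcp_server_env_access', cap: 30 },
  { feature: 'install_url_github_releases', cap: 35 }
];
console.log(capScore(92, appliedExample));
console.log(capScore(92, [{ feature: 'typosquat_scoped_package', cap: -1 }]));
```

Independently of how caps are resolved, F9 and F10 cannot both fire for the same package: F9 requires an `mcp_config_injection` threat (its C2) and F10 rejects any package that has one (its C3).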