muaddib-scanner 2.11.20 → 2.11.22
- package/README.md +9 -8
- package/package.json +1 -1
- package/src/ml/feature-extractor.js +156 -1
- package/src/monitor/ingestion.js +130 -32
- package/src/monitor/queue.js +37 -1
- package/src/pipeline/processor.js +3 -0
- package/src/scoring.js +8 -0
package/README.md
CHANGED
@@ -30,7 +30,7 @@
 
 npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
 
-MUAD'DIB combines **
+MUAD'DIB combines **17 parallel scanners** (234 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (16 compound rules), **ML classifiers** (XGBoost), and gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages.
 
 ---
 
@@ -176,7 +176,7 @@ muaddib replay # Ground truth validation (61/65 TPR@3)
 
 ## Features
 
-###
+### 17 parallel scanners
 
 | Scanner | Detection |
 |---------|-----------|
@@ -197,10 +197,11 @@ muaddib replay # Ground truth validation (61/65 TPR@3)
 | IOC Strings (intel-triage P1.1) | YARA-style string matching (Axios 2026, TeamPCP, GlassWorm, CanisterSprawl) |
 | Anti-Forensic AST (intel-triage P1.2) | XOR loop + self-delete + decoy write compound (csec autodelete) |
 | Stub Package (intel-triage P1.3) | Tiny main file + external dep URL + lifecycle hook (ltidi chain) |
+| Monorepo Scanner | Lerna/pnpm-workspace/turbo detection (Sprint 1 audit MR-C2 fix) |
 
-###
+### 234 detection rules
 
-All rules are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
+All rules (229 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21021) for the complete rules reference.
 
 ### Detected campaigns
 
@@ -274,7 +275,7 @@ With pre-commit framework:
 ```yaml
 repos:
   - repo: https://github.com/DNSZLSK/muad-dib
-    rev: v2.11.
+    rev: v2.11.22
     hooks:
       - id: muaddib-scan
 ```
@@ -295,7 +296,7 @@ repos:
 | **FPR** (Benign random, v2.10.95 measure) | **7.0%** (14/200) | 200 random npm packages, stratified sampling |
 | **ADR** (Adversarial + Holdout) | **96.3%** (103/107) | 67 adversarial + 40 holdout (107 available on disk), global threshold=20 |
 
-**
+**3586 tests** across 93 files. **234 rules** (229 RULES + 5 PARANOID).
 
 > **ML retrain methodology (v2.10.51):**
 > - Ground truth: 377 confirmed_malicious via auto-labeler (OSSF malicious-packages, GitHub Advisory Database, npm registry takedown correlation)
@@ -343,7 +344,7 @@ npm test
 
 ### Testing
 
-- **
+- **3586 tests** across 93 modular test files
 - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
 - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
 - **Ground truth validation** - 67 real-world attacks (93.85% TPR@3, 86.2% TPR@20 — v2.10.95 measure)
@@ -364,7 +365,7 @@ npm test
 - [Documentation Index](docs/INDEX.md) - All documentation in one place
 - [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) - Experimental protocol, holdout scores
 - [Threat Model](docs/threat-model.md) - What MUAD'DIB detects and doesn't detect
-- [Security Policy](SECURITY.md) - Detection rules reference (
+- [Security Policy](SECURITY.md) - Detection rules reference (234 rules)
 - [Security Audit](docs/SECURITY_AUDIT.md) - Bypass validation report
 - [FP Analysis](docs/EVALUATION.md) - Historical false positive analysis
 
package/package.json
CHANGED
package/src/ml/feature-extractor.js
CHANGED
@@ -552,6 +552,157 @@ function placeholderAntiDepConfusion(result, meta) {
   return true;
 }
 
+// ============================================================================
+// Feature 9 — mcp_server_env_access (v2.11.22, audit week3 cluster, 25 FP)
+// ============================================================================
+//
+// Targets legitimate MCP installers / servers (Cachly, Roadmapfy, Llama
+// Ventures, Flomenco, Supericons, cf-memory-mcp, mcp-memory-service, etc.)
+// that currently score 75-99 from `mcp_config_injection` CRITICAL +
+// env_access + credential_regex_harvest triple-stacking on legitimate
+// provider-key reads. The conjunction below discriminates them from
+// SANDWORM_MODE droppers (which also emit mcp_config_injection) by requiring
+// the package to (a) self-identify as MCP, (b) be opt-in (no lifecycle
+// hook), (c) read ONLY known provider API keys (not .npmrc / .aws / SSH),
+// and (d) show no third-party exfil capability.
+
+// Provider API key env vars that legitimate MCP installers read to populate
+// the .mcp.json server config they write to the user's tool dirs.
+// Case-sensitive exact-name match; pattern match for *_API_KEY / *_TOKEN /
+// MCP_* / .*_KEY is allowed via PROVIDER_KEY_SUFFIX_RE.
+const KNOWN_PROVIDER_KEYS_LITERAL = new Set([
+  'ANTHROPIC_API_KEY', 'OPENAI_API_KEY', 'GEMINI_API_KEY', 'GOOGLE_API_KEY',
+  'GOOGLE_GENERATIVE_AI_API_KEY',
+  'STRIPE_SECRET_KEY', 'STRIPE_PUBLISHABLE_KEY', 'STRIPE_API_KEY', 'STRIPE_KEY',
+  'BRAVE_API_KEY', 'FIGMA_TOKEN', 'FIGMA_ACCESS_TOKEN', 'POSTHOG_KEY',
+  'PERPLEXITY_API_KEY', 'GROQ_API_KEY', 'COHERE_API_KEY', 'MISTRAL_API_KEY',
+  'OPENROUTER_API_KEY', 'TOGETHER_API_KEY', 'DEEPSEEK_API_KEY', 'XAI_API_KEY',
+  'SUPABASE_ANON_KEY', 'SUPABASE_URL', 'CLAUDE_API_KEY', 'CLAUDE_KEY',
+  'ANTHROPIC_AUTH_TOKEN', 'ANTHROPIC_BASE_URL', 'OPENAI_BASE_URL'
+]);
+const PROVIDER_KEY_SUFFIX_RE = /^(?:MCP_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY|[A-Z][A-Z0-9_]*_TOKEN|[A-Z][A-Z0-9_]*_API_TOKEN)$/;
+
+// Infra / build env vars that any well-behaved package can read without
+// disqualifying F9 (their presence doesn't indicate credential harvest).
+const F9_INFRA_KEYS = new Set([
+  'HOME', 'USERPROFILE', 'XDG_CONFIG_HOME', 'XDG_DATA_HOME', 'XDG_CACHE_HOME',
+  'PATH', 'NODE_ENV', 'NODE_PATH', 'DEBUG', 'CI', 'CWD', 'PWD',
+  'APPDATA', 'LOCALAPPDATA', 'TEMP', 'TMP', 'TMPDIR', 'SHELL',
+  'LANG', 'LC_ALL', 'TERM', 'COLORTERM'
+]);
+
+// Credential file paths that a malicious MCP dropper would harvest.
+// Appearance in any threat message disqualifies F9.
+const F9_CREDENTIAL_FILE_RE = /\.npmrc\b|\.aws[\/\\](?:credentials|config)\b|\bid_rsa\b|\bid_ed25519\b|\.ssh[\/\\]|\.kube[\/\\]config\b|\.docker[\/\\]config\b|\.netrc\b|\.git-credentials\b|wallet\.dat\b|\bsecret_token\b/i;
+
+// Threat types that signal third-party network egress. F9 disqualifies on
+// any of these — a legit MCP installer writes .mcp.json and reads env, it
+// does NOT download payloads or call back to attacker hosts.
+const F9_EXFIL_TYPES = new Set([
+  'suspicious_domain',
+  'suspicious_dataflow',
+  'remote_code_load',
+  'intent_credential_exfil',
+  'intent_command_exfil',
+  'fetch_decrypt_exec',
+  'reverse_shell',
+  'binary_dropper',
+  'download_exec_binary',
+  'curl_env_exfil',
+  'curl_exec',
+  'external_tarball_dep',
+  'dependency_url_suspicious',
+  'blockchain_c2_resolution',
+  'dns_exfil'
+]);
+
+// MCP identity signals — package SELF-identifies as an MCP installer/server.
+const MCP_NAME_RE = /(?:^|[/_-])mcp(?:[_-]|$)|claude[_-]plugin[_-]mcp|mcp[_-](?:server|init|bridge|installer|memory|plugin|core|router|host|client|gateway|relay|stdio|transport|orchestrator)/i;
+const MCP_DESC_RE = /\bmodel context protocol\b|\bmcp[ -](?:server|installer|bridge|plugin|memory|core|gateway|relay|orchestrator|transport)\b|\b(?:claude|cursor|windsurf)[ -]mcp\b/i;
+
+function _f9Keywords(meta) {
+  const m = (meta && meta.registryMeta) || {};
+  return Array.isArray(m.keywords) ? m.keywords.map(k => String(k).toLowerCase()) : [];
+}
+
+function _f9HasMcpIdentity(meta) {
+  if (!meta) return false;
+  const name = String(meta.name || '').toLowerCase();
+  if (MCP_NAME_RE.test(name)) return true;
+  const desc = (meta.registryMeta && meta.registryMeta.description) || meta.description || '';
+  if (MCP_DESC_RE.test(desc)) return true;
+  const kw = _f9Keywords(meta);
+  for (const k of kw) {
+    if (k === 'mcp' || k === 'model-context-protocol' || k === 'model context protocol' ||
+        k.startsWith('mcp-') || k.startsWith('mcp_')) return true;
+  }
+  const bin = meta.registryMeta && meta.registryMeta.bin;
+  if (bin && typeof bin === 'object') {
+    for (const b of Object.keys(bin)) {
+      if (/mcp/i.test(b)) return true;
+    }
+  } else if (typeof bin === 'string' && /mcp/i.test(bin)) {
+    return true;
+  }
+  return false;
+}
+
+/**
+ * Feature 9 — TRUE iff the package self-identifies as an MCP installer/server
+ * AND emits `mcp_config_injection` (legit scaffolding signal) AND has no
+ * install lifecycle script AND its env_access / credential_regex_harvest
+ * threats cite ONLY known provider API keys (Anthropic/OpenAI/Stripe/etc.)
+ * — never credential files like .npmrc, .aws/credentials, or SSH keys —
+ * AND shows no third-party exfil capability.
+ *
+ * Targets the v2.11 audit week3 cluster of 25 legitimate MCP plugin
+ * installers that currently score 75-99 from mcp_config_injection +
+ * env_access + credential_regex_harvest triple-stacking. Cap to 30 (MEDIUM).
+ *
+ * Mutually exclusive with SANDWORM_MODE MCP droppers: condition C3 blocks
+ * preinstall/postinstall droppers; C4 blocks .npmrc/SSH/AWS harvests; C5
+ * blocks downloaders. None of the 15 MALWARE + 29 PENTEST samples in the
+ * week3 audit satisfy all five conditions simultaneously.
+ *
+ * Covers 25 FP (8.7% of audit week3 FP corpus).
+ */
+function mcpServerEnvAccess(result, meta) {
+  // C1 — MCP identity
+  if (!_f9HasMcpIdentity(meta)) return false;
+  const threats = (result && result.threats) || [];
+  if (threats.length === 0) return false;
+  // C2 — mcp_config_injection present (the positive signal that the package
+  // actually does MCP work, not just claims to)
+  if (!threats.some(t => t.type === 'mcp_config_injection')) return false;
+  // C3 — no install lifecycle hook
+  if (hasLifecycleScripts(meta)) return false;
+  // C4 — env_access / credential_regex_harvest must cite only known provider
+  // keys (literal whitelist + suffix pattern) or infra vars; never credential
+  // file paths
+  for (const t of threats) {
+    if (t.type !== 'env_access' && t.type !== 'credential_regex_harvest' &&
+        t.type !== 'env_charcode_reconstruction') continue;
+    const msg = String(t.message || '');
+    if (F9_CREDENTIAL_FILE_RE.test(msg)) return false;
+    // Extract candidate env var names from the message
+    const candidates = msg.match(/\b[A-Z][A-Z0-9_]{2,}\b/g);
+    if (!candidates) continue;
+    for (const v of candidates) {
+      if (KNOWN_PROVIDER_KEYS_LITERAL.has(v)) continue;
+      if (PROVIDER_KEY_SUFFIX_RE.test(v)) continue;
+      if (F9_INFRA_KEYS.has(v)) continue;
+      // Unknown all-caps token in a credential threat message — could be an
+      // attacker-specific var. Don't vouch for legitimacy.
+      return false;
+    }
+  }
+  // C5 — no third-party exfil capability
+  for (const t of threats) {
+    if (F9_EXFIL_TYPES.has(t.type)) return false;
+  }
+  return true;
+}
+
 /**
  * Feature 8 — TRUE iff the package declares at least one install
  * lifecycle script AND the scan shows no network egress capability
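As a rough illustration of the name-based part of the C1 identity check, the sketch below copies `MCP_NAME_RE` from the hunk above and runs it against a few package names. The helper `looksLikeMcpName` and the sample names `@acme/mcp`, `mcpx` and `left-pad` are assumptions made up for this example; they are not part of the package.

```javascript
// Copied verbatim from the diff; everything else in this block is illustrative.
const MCP_NAME_RE = /(?:^|[/_-])mcp(?:[_-]|$)|claude[_-]plugin[_-]mcp|mcp[_-](?:server|init|bridge|installer|memory|plugin|core|router|host|client|gateway|relay|stdio|transport|orchestrator)/i;

// Hypothetical helper mirroring only the first step of _f9HasMcpIdentity.
function looksLikeMcpName(name) {
  return MCP_NAME_RE.test(String(name || '').toLowerCase());
}

// "mcp" must appear as its own segment, delimited by start/end, "/", "_" or "-".
const sampleNames = ['cf-memory-mcp', 'mcp-memory-service', '@acme/mcp', 'mcpx', 'left-pad'];
for (const n of sampleNames) {
  console.log(n, looksLikeMcpName(n));
}
```

A matching name only satisfies C1. Conditions C2 to C5 in `mcpServerEnvAccess` still have to hold before the cap applies.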
@@ -702,6 +853,9 @@ function extractFeatures(result, meta) {
   // See ml-retrain/ml-auc-v2.10.96.md for details.
   features.install_script_no_network_egress = 0; // installScriptNoNetworkEgress(result, meta) ? 1 : 0;
 
+  // --- v2.11.22 Feature 9 (audit week3 cluster — 25 FP) ---
+  features.mcp_server_env_access = mcpServerEnvAccess(result, meta) ? 1 : 0;
+
   return features;
 }
 
@@ -779,5 +933,6 @@ module.exports = {
   typosquatScopedPackage,
   obfuscationWithoutVector,
   placeholderAntiDepConfusion,
-  installScriptNoNetworkEgress
+  installScriptNoNetworkEgress,
+  mcpServerEnvAccess
 };
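The C4 vetting loop in `mcpServerEnvAccess` can be tried in isolation. The sketch below is a minimal restatement, assuming abbreviated copies of the three lookup tables (only a few entries each) and a hypothetical helper `firstUnvouchedToken`. The threat messages are invented, since the diff does not show the real message format.

```javascript
// Abbreviated copies of the constants from the diff (subset of entries only).
const KNOWN_PROVIDER_KEYS_LITERAL = new Set(['ANTHROPIC_API_KEY', 'POSTHOG_KEY', 'SUPABASE_URL']);
const PROVIDER_KEY_SUFFIX_RE = /^(?:MCP_[A-Z0-9_]+|[A-Z][A-Z0-9_]*_API_KEY|[A-Z][A-Z0-9_]*_TOKEN|[A-Z][A-Z0-9_]*_API_TOKEN)$/;
const F9_INFRA_KEYS = new Set(['HOME', 'PATH', 'NODE_ENV']);

// Hypothetical helper: returns the first all-caps token that C4 would refuse
// to vouch for, or null when every token is a known provider key or infra var.
function firstUnvouchedToken(message) {
  const candidates = String(message || '').match(/\b[A-Z][A-Z0-9_]{2,}\b/g) || [];
  for (const v of candidates) {
    if (KNOWN_PROVIDER_KEYS_LITERAL.has(v)) continue;
    if (PROVIDER_KEY_SUFFIX_RE.test(v)) continue;
    if (F9_INFRA_KEYS.has(v)) continue;
    return v;
  }
  return null;
}

// Invented messages, for illustration only.
console.log(firstUnvouchedToken('reads process.env.ANTHROPIC_API_KEY and process.env.HOME'));
console.log(firstUnvouchedToken('reads process.env.GITHUB_TOKEN'));
console.log(firstUnvouchedToken('reads process.env.MY_SERVICE_KEY'));
```

As the regex is written in the diff, it has alternatives for `MCP_*`, `*_API_KEY`, `*_TOKEN` and `*_API_TOKEN` but none for a bare `*_KEY` suffix. A name such as the made-up `MY_SERVICE_KEY` is therefore accepted only if it appears in the literal set, which is how `POSTHOG_KEY` and `STRIPE_KEY` pass.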
package/src/monitor/ingestion.js
CHANGED
@@ -317,43 +317,136 @@ function parsePyPIRss(xml) {
 
 // --- CouchDB doc extraction ---
 
+// Burst-publish window: extra versions published within this window before the
+// most-recent one are also enqueued for scanning. Covers the case where an
+// account-takeover attacker publishes several versions in a short burst.
+const RECENT_PUBLISH_WINDOW_MS = 24 * 60 * 60 * 1000;
+const RECENT_PUBLISH_MAX = 5;
+
 /**
- *
- *
- *
+ * Pure function: pick the most-recently-published version from a packument and
+ * return its metadata, plus context useful for ATO detection.
+ *
+ * Critical: we sort by `time[version]` publish timestamp, NOT `dist-tags.latest`.
+ * Account-takeover attacks (TeamPCP / @antv 2026-05-19, SAP, every Shai-Hulud
+ * derivative) publish malicious versions without moving the latest tag — semver
+ * resolution on `npm install` will still pull them. Selecting by latest tag
+ * scans the wrong (clean) version and lets the malicious tarball ship.
  *
- *
- *
+ * Falls back to `dist-tags.latest` only when `time` is missing or yields no
+ * usable entries (very old legacy packages).
+ *
+ * @param {Object} packument - npm packument (full /<pkg> response or CouchDB doc)
+ * @param {Object} [options]
+ * @param {number} [options.recentWindowMs=86400000] - window for collecting extra recent versions
+ * @param {number} [options.maxRecent=5] - hard cap on extras returned
+ * @returns {Object|null} - {
+ *   version, tarball, unpackedSize, scripts, homepage, description,
+ *   latestTagVersion, // dist-tags.latest (may differ from `version` under ATO)
+ *   recentVersions: [{ version, tarball, unpackedSize, scripts }, ...]
+ * } or null if no usable version found
  */
-function
-
-
-
-
-
+function selectMostRecentVersion(packument, options = {}) {
+  const recentWindowMs = options.recentWindowMs != null ? options.recentWindowMs : RECENT_PUBLISH_WINDOW_MS;
+  const maxRecent = options.maxRecent != null ? options.maxRecent : RECENT_PUBLISH_MAX;
+
+  if (!packument || typeof packument !== 'object') return null;
+  const versions = packument.versions || {};
+  const time = packument.time || {};
+  const distTags = packument['dist-tags'] || {};
+  const latestTagVersion = (typeof distTags.latest === 'string') ? distTags.latest : null;
+
+  // Build [version, timestamp] pairs from `time`, skipping non-version keys
+  // (created/modified) and entries for unpublished versions (present in `time`
+  // but absent from `versions` — npm leaves the tombstone after `npm unpublish`).
+  const versionTimes = [];
+  for (const [v, tsStr] of Object.entries(time)) {
+    if (v === 'created' || v === 'modified') continue;
+    if (!versions[v]) continue;
+    const ts = Date.parse(tsStr);
+    if (!Number.isFinite(ts)) continue;
+    versionTimes.push([v, ts]);
+  }
 
-
-
+  let mostRecentVersion = null;
+  if (versionTimes.length > 0) {
+    versionTimes.sort((a, b) => b[1] - a[1]);
+    mostRecentVersion = versionTimes[0][0];
+  } else if (latestTagVersion && versions[latestTagVersion]) {
+    // Legacy fallback: no usable time data, accept dist-tag latest
+    mostRecentVersion = latestTagVersion;
+  }
+  if (!mostRecentVersion) return null;
+
+  const versionData = versions[mostRecentVersion];
+  if (!versionData) return null;
+
+  const result = {
+    version: versionData.version || mostRecentVersion,
+    tarball: (versionData.dist && versionData.dist.tarball) || null,
+    unpackedSize: (versionData.dist && versionData.dist.unpackedSize) || 0,
+    scripts: versionData.scripts || {},
+    homepage: (typeof versionData.homepage === 'string') ? versionData.homepage : '',
+    description: (typeof versionData.description === 'string') ? versionData.description : '',
+    latestTagVersion,
+    recentVersions: [],
+  };
+
+  // Burst extras: other versions published within the recent window, excluding
+  // the most-recent one. Bounded by maxRecent. Each extra carries enough
+  // metadata for the queue to enqueue it directly without re-fetching the packument.
+  if (versionTimes.length > 1) {
+    const cutoff = versionTimes[0][1] - recentWindowMs;
+    for (let i = 1; i < versionTimes.length && result.recentVersions.length < maxRecent; i++) {
+      const [v, ts] = versionTimes[i];
+      if (ts < cutoff) break; // sorted desc, so once we cross the cutoff we're done
+      const vData = versions[v];
+      if (!vData) continue;
+      result.recentVersions.push({
+        version: vData.version || v,
+        tarball: (vData.dist && vData.dist.tarball) || null,
+        unpackedSize: (vData.dist && vData.dist.unpackedSize) || 0,
+        scripts: vData.scripts || {},
+      });
+    }
+  }
 
-
-
-    const version = versionData.version || latestTag;
-    const scripts = versionData.scripts || {};
-    const homepage = (typeof versionData.homepage === 'string') ? versionData.homepage : '';
-    const description = (typeof versionData.description === 'string') ? versionData.description : '';
+  return result;
+}
 
-
+/**
+ * Layer 2: Extract metadata for the most-recently-published version from a
+ * CouchDB changes document (when using include_docs=true). Eliminates the
+ * separate registry roundtrip that can 404 if the package is unpublished
+ * between detection and scan.
+ *
+ * Currently dead code post-May 2025 CouchDB migration (include_docs deprecated,
+ * change.doc is always null). Kept defensive in case the registry restores it
+ * or a different upstream mirror provides docs again.
+ *
+ * @param {Object} doc - CouchDB document (change.doc), structurally a packument
+ */
+function extractTarballFromDoc(doc) {
+  try {
+    if (!doc || !doc.versions || !doc['dist-tags']) return null;
+    return selectMostRecentVersion(doc);
   } catch {
     return null; // Parse failure -> fallback to lazy resolution
   }
 }
 
 /**
- * Fetch
- *
+ * Fetch most-recently-published version metadata for an npm package.
+ *
+ * Uses the full packument (`registry.npmjs.org/<pkg>`) rather than the `/latest`
+ * endpoint so we can detect ATO attacks that publish without moving the latest
+ * dist-tag (see selectMostRecentVersion for full threat model).
+ *
+ * Returned object includes `latestTagVersion` and `recentVersions` so callers
+ * can flag the ATO signature and enqueue burst extras for scanning.
  */
 async function getNpmLatestTarball(packageName) {
-  const url = `https://registry.npmjs.org/${encodeURIComponent(packageName)}
+  const url = `https://registry.npmjs.org/${encodeURIComponent(packageName)}`;
   await acquireRegistrySlot();
   let body;
   try {
@@ -361,19 +454,21 @@ async function getNpmLatestTarball(packageName) {
   } finally {
     releaseRegistrySlot();
   }
-  let
+  let packument;
   try {
-
+    packument = JSON.parse(body);
   } catch (e) {
     throw new Error(`Invalid JSON from npm registry for ${packageName}: ${e.message}`);
   }
-  const
-
-
-
-
-
-
+  const result = selectMostRecentVersion(packument);
+  if (!result) {
+    return {
+      version: '', tarball: null, unpackedSize: 0, scripts: {},
+      homepage: '', description: '',
+      latestTagVersion: null, recentVersions: [],
+    };
+  }
+  return result;
 }
 
 // --- npm polling ---
@@ -1075,6 +1170,9 @@ module.exports = {
 
   // CouchDB doc extraction
   extractTarballFromDoc,
+  selectMostRecentVersion,
+  RECENT_PUBLISH_WINDOW_MS,
+  RECENT_PUBLISH_MAX,
 
   // Polling functions
   pollNpmChanges,
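The difference between selecting by publish time and selecting by `dist-tags.latest` is easiest to see on a small packument. The sketch below is a condensed, hypothetical restatement of the core of `selectMostRecentVersion` (named `pickByPublishTime` here); it omits the `dist-tags.latest` fallback and the per-version metadata. The packument, including all version numbers and timestamps, is fabricated sample data.

```javascript
// Hypothetical, condensed version of the selection logic shown in the diff.
// Unlike the original, it returns null instead of falling back to dist-tags.latest.
function pickByPublishTime(packument, windowMs = 24 * 60 * 60 * 1000, maxRecent = 5) {
  const versions = packument.versions || {};
  const pairs = Object.entries(packument.time || {})
    .filter(([v]) => v !== 'created' && v !== 'modified' && versions[v]) // skip meta keys and tombstones
    .map(([v, ts]) => [v, Date.parse(ts)])
    .filter(([, ts]) => Number.isFinite(ts))
    .sort((a, b) => b[1] - a[1]); // newest first
  if (pairs.length === 0) return null;
  const cutoff = pairs[0][1] - windowMs;
  const recentVersions = pairs.slice(1)
    .filter(([, ts]) => ts >= cutoff)
    .slice(0, maxRecent)
    .map(([v]) => v);
  const latestTagVersion = (packument['dist-tags'] || {}).latest || null;
  return { version: pairs[0][0], latestTagVersion, recentVersions };
}

// Fabricated packument: two versions published in a burst, latest tag left untouched.
const samplePackument = {
  'dist-tags': { latest: '1.0.0' },
  versions: { '1.0.0': {}, '1.0.1': {}, '1.0.2': {} },
  time: {
    created: '2024-01-01T00:00:00.000Z',
    modified: '2025-01-10T10:05:00.000Z',
    '0.9.0': '2023-12-01T00:00:00.000Z', // unpublished: present in time, absent from versions
    '1.0.0': '2024-01-01T00:00:00.000Z',
    '1.0.1': '2025-01-10T10:00:00.000Z',
    '1.0.2': '2025-01-10T10:05:00.000Z',
  },
};

const picked = pickByPublishTime(samplePackument);
console.log(picked);
// { version: '1.0.2', latestTagVersion: '1.0.0', recentVersions: [ '1.0.1' ] }
```

Here the tag-based lookup would have resolved `1.0.0`, while the time-based lookup selects `1.0.2` and also reports `1.0.1` as a burst extra.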
package/src/monitor/queue.js
CHANGED
@@ -1118,13 +1118,49 @@ async function resolveTarballAndScan(item, stats, dailyAlerts, recentlyScanned,
     if (npmInfo.unpackedSize) item.unpackedSize = npmInfo.unpackedSize;
     if (npmInfo.scripts) item.registryScripts = npmInfo.scripts;
 
+    // ATO signature: most-recently-published version differs from current
+    // dist-tags.latest. Pattern observed in TeamPCP / @antv 2026-05-19:
+    // attacker publishes 1-2 versions per package but does NOT bump the latest
+    // tag. semver resolution on `npm install <pkg>@^x.y` still pulls the
+    // malicious version. The mismatch is a strong ATO signal — legitimate
+    // maintainers almost always move latest when publishing.
+    if (npmInfo.latestTagVersion && npmInfo.version && npmInfo.version !== npmInfo.latestTagVersion) {
+      item.atoSignal = true;
+      console.log(`[MONITOR] ATO SIGNAL: ${item.name}@${item.version} published but dist-tags.latest=${npmInfo.latestTagVersion}`);
+    }
+
+    // Burst-publish coverage: enqueue extra versions published in the same
+    // recent window. Single change event in the CouchDB feed can correspond
+    // to multiple version publishes when the attacker fires several in a
+    // burst (TeamPCP averaged ~2 versions per package). Without this we'd
+    // only scan whichever version happened to be the most recent at resolution
+    // time, racing the publish stream.
+    const recents = Array.isArray(npmInfo.recentVersions) ? npmInfo.recentVersions : [];
+    for (const recent of recents) {
+      if (!recent || !recent.tarball || !recent.version) continue;
+      const dedupeKey = `${item.name}@${recent.version}`;
+      if (recentlyScanned.has(dedupeKey)) continue;
+      scanQueue.push({
+        name: item.name,
+        version: recent.version,
+        ecosystem: 'npm',
+        tarballUrl: recent.tarball,
+        unpackedSize: recent.unpackedSize || 0,
+        registryScripts: recent.scripts || null,
+        atoSignal: item.atoSignal === true,
+        isATOBurstExtra: true,
+      });
+    }
+
     // Fast-track decision: large packages (>15MB) with no lifecycle scripts and no IOC match.
     // Computed HERE (after metadata resolution), not at ingestion time — post-May 2025
     // CouchDB changes feed has no docs, so metadata is only available after lazy fetch.
     // Fast-track packages get: quick static scan (package.json + shell only), no AST,
     // no sandbox, no LLM, no archiving. Exits in ~2-3s instead of 30-300s.
+    // ATO-signalled packages bypass fast-track regardless of size — we want
+    // the full pipeline (AST + sandbox) on anything that smells like an ATO.
     const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;
-    if (!item.isIOCMatch && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
+    if (!item.isIOCMatch && !item.atoSignal && (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES) {
       const scripts = item.registryScripts || {};
       if (!scripts.preinstall && !scripts.postinstall && !scripts.install) {
         item.fastTrack = true;
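The interaction between the ATO signal and the fast-track shortcut in the hunk above can be condensed into one function. This is a sketch only: `classify` is a hypothetical helper that returns the two flags instead of mutating `item`, and the inputs are invented.

```javascript
const FAST_TRACK_SIZE_BYTES = 15 * 1024 * 1024;

// Hypothetical helper condensing the two decisions made in the hunk above.
function classify(item, npmInfo) {
  // ATO signal: the newest published version is not the one tagged "latest".
  const atoSignal = Boolean(
    npmInfo.latestTagVersion && npmInfo.version &&
    npmInfo.version !== npmInfo.latestTagVersion
  );
  const scripts = item.registryScripts || {};
  const hasInstallHook = Boolean(scripts.preinstall || scripts.postinstall || scripts.install);
  // Fast-track only for large, hook-free packages with no IOC match and no ATO signal.
  const fastTrack = !item.isIOCMatch && !atoSignal &&
    (item.unpackedSize || 0) > FAST_TRACK_SIZE_BYTES && !hasInstallHook;
  return { atoSignal, fastTrack };
}

const largeItem = { unpackedSize: 20 * 1024 * 1024, registryScripts: {} };
console.log(classify(largeItem, { version: '2.0.0', latestTagVersion: '2.0.0' }));
// { atoSignal: false, fastTrack: true }
console.log(classify(largeItem, { version: '2.0.1', latestTagVersion: '2.0.0' }));
// { atoSignal: true, fastTrack: false }
```

The second call shows the behaviour the new condition adds: the same large package loses its fast-track eligibility once the published version and the `latest` tag disagree.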
package/src/pipeline/processor.js
CHANGED
@@ -166,6 +166,9 @@ async function process(threats, targetPath, options, pythonDeps, warnings, scann
         homepage: pkgData.homepage || (typeof pkgData.repository === 'string' ? pkgData.repository : (pkgData.repository && pkgData.repository.url) || ''),
         dependencies: pkgData.dependencies,
         devDependencies: pkgData.devDependencies,
+        // v2.11.22 — used by F9 (mcp_server_env_access) identity check.
+        keywords: Array.isArray(pkgData.keywords) ? pkgData.keywords : undefined,
+        bin: pkgData.bin,
       };
     }
   } catch { /* graceful fallback */ }
package/src/scoring.js
CHANGED
@@ -1483,6 +1483,7 @@ const {
   typosquatScopedPackage,
   obfuscationWithoutVector,
   placeholderAntiDepConfusion,
+  mcpServerEnvAccess,
 } = require('./ml/feature-extractor.js');
 
 /**
@@ -1501,6 +1502,9 @@ function applyContextualFPCaps(result, pkgMeta) {
       homepage: (pkgMeta && pkgMeta.homepage) || '',
       dependencies: (pkgMeta && pkgMeta.dependencies),
       devDependencies: (pkgMeta && pkgMeta.devDependencies),
+      // v2.11.22 — used by F9 (mcp_server_env_access) identity check.
+      keywords: (pkgMeta && pkgMeta.keywords),
+      bin: (pkgMeta && pkgMeta.bin),
     },
   };
 
@@ -1518,6 +1522,10 @@ function applyContextualFPCaps(result, pkgMeta) {
   if (networkDestinationFirstParty(result, meta)) {
     applied.push({ feature: 'network_destination_first_party', cap: 30 });
   }
+  // F9: legit MCP installer/server with env_access on provider keys → MAX 30
+  if (mcpServerEnvAccess(result, meta)) {
+    applied.push({ feature: 'mcp_server_env_access', cap: 30 });
+  }
   // F2: binary installer from GitHub Releases → MAX 35
   if (installUrlGithubReleases(result)) {
     applied.push({ feature: 'install_url_github_releases', cap: 35 });
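The hunks above show caps being collected into `applied`, but not how the list is consumed. One plausible reading of the "MAX 30" and "MAX 35" comments is that the lowest applicable cap bounds the final score. The sketch below illustrates only that assumption; `applyLowestCap` is a hypothetical name, and the actual reduction in `applyContextualFPCaps` may differ.

```javascript
// ASSUMPTION: the lowest applicable cap bounds the final score.
// The diff only shows caps being pushed into `applied`, not how they are used.
function applyLowestCap(score, applied) {
  if (!Array.isArray(applied) || applied.length === 0) return score;
  const lowest = Math.min(...applied.map(a => a.cap));
  return Math.min(score, lowest);
}

// Entries shaped like the ones pushed in the diff.
const appliedCaps = [
  { feature: 'mcp_server_env_access', cap: 30 },
  { feature: 'install_url_github_releases', cap: 35 },
];

console.log(applyLowestCap(92, appliedCaps)); // 30
console.log(applyLowestCap(12, appliedCaps)); // 12
console.log(applyLowestCap(92, []));          // 92
```

Under this reading, a legitimate MCP installer that previously scored in the 75-99 range would be reported at 30, while packages already below the cap are unaffected.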