muaddib-scanner 2.5.15 → 2.5.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

package/README.md +32 -20
package/package.json +1 -1
package/src/index.js +5 -1
package/src/response/playbooks.js +25 -0
package/src/rules/index.js +74 -2
package/src/scanner/ast-detectors.js +177 -11
package/src/scanner/ast.js +32 -1
package/src/scanner/dataflow.js +25 -14
package/src/scanner/entropy.js +1 -1
package/src/scanner/module-graph.js +291 -1
package/src/scanner/obfuscation.js +4 -1
package/src/scoring.js +37 -8

package/README.md

@@ -30,7 +30,7 @@
 npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
-MUAD'DIB combines static analysis + **deobfuscation engine** (v2.2.5) + **inter-module dataflow** (v2.2.6) + **per-file max scoring** (v2.2.11) + dynamic analysis (Docker sandbox with **monkey-patching preload** for time-bomb detection, v2.4.9) + **behavioral anomaly detection** (v2.0) + **ground truth validation** (v2.1) + **security audit** (41 issues remediated, v2.5.0–v2.5.6) to detect threats AND guide your response — even before they appear in any IOC database.
+MUAD'DIB combines static analysis + **deobfuscation engine** (v2.2.5) + **inter-module dataflow** (v2.2.6) + **per-file max scoring** (v2.2.11) + dynamic analysis (Docker sandbox with **monkey-patching preload** for time-bomb detection, v2.4.9) + **behavioral anomaly detection** (v2.0) + **ground truth validation** (v2.1) + **security audit** (41 issues remediated, v2.5.0–v2.5.6) + **audit hardening** (v2.5.13–v2.5.14) + **FP reduction P5/P6** (v2.5.15–v2.5.16) to detect threats AND guide your response — even before they appear in any IOC database.
 ---
@@ -286,7 +286,7 @@ Add to `.pre-commit-config.yaml`:
 ```yaml
 repos:
   - repo: https://github.com/DNSZLSK/muad-dib
-    rev: v2.5.8
+    rev: v2.5.17
     hooks:
       - id: muaddib-scan        # Scan all threats
       # - id: muaddib-diff      # Or: only new threats
@@ -335,7 +335,7 @@ muaddib replay
 muaddib ground-truth
 ```
-Replay real-world supply-chain attacks against the scanner to validate detection coverage. Current results: **45/49 detected (91.8% TPR)** from 51 samples (49 active).
+Replay real-world supply-chain attacks against the scanner to validate detection coverage. Current results: **46/49 detected (93.9% TPR)** from 51 samples (49 active).
 4 out-of-scope misses: lottie-player, polyfill-io, trojanized-jquery (browser-only DOM attacks), websocket-rat (FP-risky pattern).
@@ -642,7 +642,7 @@ Alerts appear in Security > Code scanning alerts.
 ## Architecture
 ```
-MUAD'DIB 2.5.8 Scanner
+MUAD'DIB 2.5.17 Scanner
 |
 +-- IOC Match (225,000+ packages, JSON DB)
 |   +-- OSV.dev npm dump (200K+ MAL-* entries)
@@ -664,7 +664,7 @@ MUAD'DIB 2.5.8 Scanner
 |   +-- 3-hop re-export chains, class method analysis
 |   +-- Cross-file credential read -> network sink detection
 |
-+-- 14 Parallel Scanners (113 rules)
++-- 14 Parallel Scanners (121 rules)
 |   +-- AST Parse (acorn) — eval/Function, credential CLI theft, binary droppers, prototype hooks
 |   +-- Pattern Matching (shell, scripts)
 |   +-- Obfuscation Detection (skip .min.js, ignore hex/unicode alone)
@@ -685,20 +685,22 @@ MUAD'DIB 2.5.8 Scanner
 |
 +-- Validation & Observability (v2.1)
 |   +-- Datadog 17K Benchmark (88.2% raw, ~100% JS/Node.js adjusted)
-|   +-- Ground Truth Dataset (51 real-world attacks, 91.8% TPR)
+|   +-- Ground Truth Dataset (51 real-world attacks, 93.9% TPR)
 |   +-- Detection Time Logging (first_seen tracking, lead time metrics)
 |   +-- FP Rate Tracking (daily stats, false positive rate)
 |   +-- Score Breakdown (explainable per-rule scoring)
 |   +-- Threat Feed API (HTTP server, JSON feed for SIEM)
 |
-+-- FP Reduction Post-processing (v2.2.8-v2.2.9, v2.3.0-v2.3.1, v2.5.7-v2.5.8)
++-- FP Reduction Post-processing (v2.2.8-v2.3.1, v2.5.7-v2.5.8, v2.5.15-v2.5.16)
 |   +-- Count-based severity downgrade (dynamic_require, dataflow, module_compile, etc.)
 |   +-- Framework prototype scoring cap + HTTP client whitelist
-|   +-- Obfuscation in dist/build/.cjs/.mjs → LOW
-|   +-- Safe env var + prefix filtering
+|   +-- Obfuscation in dist/build/.cjs/.mjs/.js >100KB → LOW
+|   +-- Safe env var + prefix filtering + DATAFLOW_SAFE_ENV_VARS
 |   +-- Dataflow telemetry source categorization (os.platform/arch → telemetry_read)
 |   +-- DEP whitelist (es5-ext, bootstrap-sass) + npm alias skip
 |   +-- IOC wildcard audit (v2.5.8): FPR 10.8% → 6.0%
+|   +-- P5 heuristic precision (v2.5.15): 7 fixes
+|   +-- P6 compound detection precision (v2.5.16): 6 fixes
 |
 +-- Per-File Max Scoring (v2.2.11)
 |   +-- Score = max(file_scores) + package_level_score
@@ -714,6 +716,14 @@ MUAD'DIB 2.5.8 Scanner
 |   +-- 41 issues remediated (14 CRITICAL, 18 HIGH, 9 MEDIUM)
 |   +-- Native addon path traversal, atomic writes, AST bypasses
 |
++-- Audit Hardening (v2.5.13-v2.5.14)
+|   +-- Scoring: plugin loader threshold, lifecycle CRITICAL floor, percentage guard 40%
+|   +-- AST: eval alias, globalThis indirect, require(obj.prop), variable reassignment
+|   +-- Dataflow: Promise .then() tainting, JSON taint propagation
+|   +-- Shell: mkfifo+nc, base64|bash, wget+base64 (3 new patterns)
+|   +-- Entropy: fragment cluster, windowed analysis
+|   +-- 8 new rules (SHELL-013 to 015, ENTROPY-004, +4 audit fixes)
+|
 +-- Paranoid Mode (ultra-strict)
 +-- Docker Sandbox (behavioral analysis, network capture, canary tokens, CI-aware, preload)
 +-- Zero-Day Monitor (internal: npm + PyPI RSS polling, Discord alerts, daily report)
@@ -735,9 +745,9 @@ Output (CLI, JSON, HTML, SARIF, Webhook, Threat Feed)
 | Metric | Result | Details |
 |--------|--------|---------|
 | **Wild TPR** (Datadog 17K) | **88.2%** raw · **~100%** adjusted | 17,922 real malware samples. 2,077 misses are all out-of-scope (see below) |
-| **TPR** (Ground Truth) | **91.8%** (45/49) | 51 real-world attacks (49 active). 4 out-of-scope: browser-only (3) + FP-risky (1) |
-| **FPR** (Benign, global) | **6.0%** (32/529) | 529 npm packages, real source code via `npm pack`, threshold > 20 |
-| **ADR** (Adversarial + Holdout) | **98.8%** (82/83) | 43 adversarial + 40 holdout evasive samples. 1 documented miss: `require-cache-poison` (accepted trade-off) |
+| **TPR** (Ground Truth) | **93.9%** (46/49) | 51 real-world attacks (49 active). 3 out-of-scope: browser-only (3) |
+| **FPR** (Benign, global) | **12.3%** (65/529) | 529 npm packages, real source code via `npm pack`, threshold > 20 |
+| **ADR** (Adversarial + Holdout) | **94.0%** (63/67) | 62 adversarial + 40 holdout evasive samples. 4 misses: `require-cache-poison` (P3 trade-off), `getter-defineProperty-exfil`, `setTimeout-eval-chain`, `setter-trap-exfil` |
 **Datadog 17K benchmark** — [DataDog Malicious Software Packages Dataset](https://github.com/DataDog/malicious-software-packages-dataset), 17,922 real malware samples (npm). Raw TPR: 88.2% (15,810/17,922). The 2,077 misses (score=0) were manually categorized:
@@ -758,7 +768,9 @@ All 2,077 misses lack Node.js malware patterns. MUAD'DIB performs AST-based Node
 | Large (50-100 JS files) | 40 | 10 | 25.0% |
 | Very large (100+ JS files) | 62 | 25 | 40.3% |
-**FPR progression**: 0% (invalid, empty dirs, v2.2.0-v2.2.6) → 38% (first real measurement, v2.2.7) → 19.4% (v2.2.8) → 17.5% (v2.2.9) → ~13% (v2.2.11, per-file max scoring) → 8.9% (v2.3.0, P2) → 7.4% (v2.3.1, P3) → **6.0%** (v2.5.8, P4 + IOC wildcard audit)
+**FPR progression**: 0% (invalid, empty dirs, v2.2.0-v2.2.6) → 38% (first real measurement, v2.2.7) → 19.4% (v2.2.8) → 17.5% (v2.2.9) → ~13% (v2.2.11, per-file max scoring) → 8.9% (v2.3.0, P2) → 7.4% (v2.3.1, P3) → 6.0% (v2.5.8, P4 + IOC wildcard audit) → ~13.6% (v2.5.14, audit hardening added stricter detection) → **12.3%** (v2.5.16, P5 + P6)
+> **Note on FPR evolution:** The historic 6.0% FPR (v2.5.8) relied on a `BENIGN_PACKAGE_WHITELIST` that excluded certain known packages from scoring — a data leakage bias removed in v2.5.10. The current 12.3% FPR is an honest measurement without whitelisting, against 529 real benign packages. The P5/P6 reductions (setTimeout precision, dist/ two-notch downgrade, credential_regex count-based, env segment matching, etc.) are detector precision improvements, not whitelisting.
 **Holdout progression** (pre-tuning scores, rules frozen):
@@ -771,12 +783,12 @@ All 2,077 misses lack Node.js malware patterns. MUAD'DIB performs AST-based Node
 | v5 | 50% (5/10) | Inter-module dataflow (new scanner) |
 - **Wild TPR** (Datadog Benchmark): detection rate on 17,922 real malware packages from the [DataDog Malicious Software Packages Dataset](https://github.com/DataDog/malicious-software-packages-dataset). Raw 88.2% (15,810/17,922). Adjusted ~100% on JS/Node.js malware when excluding out-of-scope samples (1,233 phishing HTML pages, 824 native binaries, 20 corrected libraries). See [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md#14-datadog-17k-benchmark).
-- **TPR** (True Positive Rate): detection rate on 49 real-world supply-chain attacks (event-stream, ua-parser-js, coa, flatmap-stream, eslint-scope, solana-web3js, and 43 more). 4 misses are browser-only (lottie-player, polyfill-io, trojanized-jquery) or risky to fix (websocket-rat) — see [Threat Model](docs/threat-model.md).
+- **TPR** (True Positive Rate): detection rate on 49 real-world supply-chain attacks (event-stream, ua-parser-js, coa, flatmap-stream, eslint-scope, solana-web3js, and 43 more). 3 misses are browser-only (lottie-player, polyfill-io, trojanized-jquery) — see [Threat Model](docs/threat-model.md).
 - **FPR** (False Positive Rate): packages scoring > 20 out of 529 real npm packages (source code scanned, not empty dirs).
-- **ADR** (Adversarial Detection Rate): detection rate on 83 evasive malicious samples — 43 adversarial + 40 holdout (5 batches of 10, testing obfuscation, inter-module dataflow, etc.). 1 documented miss: `require-cache-poison` (score 10 < threshold 20, accepted trade-off from FP reduction P3).
+- **ADR** (Adversarial Detection Rate): detection rate on 102 evasive malicious samples — 62 adversarial + 40 holdout (5 adversarial waves + 4 holdout batches). 4 misses on available samples: `require-cache-poison` (P3 trade-off), `getter-defineProperty-exfil`, `setTimeout-eval-chain`, `setter-trap-exfil`.
 - **Holdout** (pre-tuning): detection rate on 10 unseen samples with rules frozen (measures generalization)
-Datasets: 17,922 Datadog malware samples, 529 npm + 132 PyPI benign packages, 83 adversarial/holdout samples, 51 ground-truth attacks (65 documented malware packages). **1656 tests**, 86% code coverage.
+Datasets: 17,922 Datadog malware samples, 529 npm + 132 PyPI benign packages, 102 adversarial/holdout samples, 51 ground-truth attacks (65 documented malware packages). **1869 tests**, 86% code coverage.
 See [Evaluation Methodology](docs/EVALUATION_METHODOLOGY.md) for the full experimental protocol.
@@ -812,12 +824,12 @@ npm test
 ### Testing
-- **1656 unit/integration tests** across 42 modular test files - 86% code coverage via [Codecov](https://codecov.io/gh/DNSZLSK/muad-dib)
+- **1869 unit/integration tests** across 43 modular test files - 86% code coverage via [Codecov](https://codecov.io/gh/DNSZLSK/muad-dib)
 - **56 fuzz tests** - Malformed YAML, invalid JSON, binary files, ReDoS, unicode, 10MB inputs
 - **Datadog 17K benchmark** - 17,922 real malware samples, 88.2% raw TPR, ~100% on JS/Node.js malware (2,077 out-of-scope misses: phishing, binaries, corrected libs)
-- **83 adversarial/holdout samples** - 43 adversarial + 40 holdout, 82/83 detection rate (98.8% ADR). 1 documented miss: `require-cache-poison` (accepted trade-off)
-- **Ground truth validation** - 51 real-world attacks (45/49 detected = 91.8% TPR). 4 out-of-scope: browser-only (3) + FP-risky (1)
-- **False positive validation** - 6.0% FPR global (32/529) on real npm source code via `npm pack`
+- **102 adversarial/holdout samples** - 62 adversarial + 40 holdout, 63/67 detection rate on available samples (94.0% ADR). 4 misses: `require-cache-poison` (P3 trade-off), `getter-defineProperty-exfil`, `setTimeout-eval-chain`, `setter-trap-exfil`
+- **Ground truth validation** - 51 real-world attacks (46/49 detected = 93.9% TPR). 3 out-of-scope: browser-only (lottie-player, polyfill-io, trojanized-jquery)
+- **False positive validation** - 12.3% FPR global (65/529) on real npm source code via `npm pack`
 - **ESLint security audit** - `eslint-plugin-security` with 14 rules enabled
 ---
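
The updated percentages in the README hunks above follow from the counts quoted next to them. A minimal sketch that recomputes them; the `rate` helper is hypothetical and not part of muaddib-scanner, and the counts are copied from the diff:

```javascript
// Hypothetical helper (not part of muaddib-scanner): percentage with one decimal.
function rate(hits, total) {
  return ((hits / total) * 100).toFixed(1);
}

// Counts copied from the README diff above.
const metrics = {
  groundTruthTpr: rate(46, 49),      // ground truth attacks detected
  benignFpr: rate(65, 529),          // benign npm packages scoring > 20
  adr: rate(63, 67),                 // adversarial + holdout, available samples
  datadogRawTpr: rate(15810, 17922), // Datadog 17K benchmark, raw
};

console.log(metrics);
```

The README quotes 2,077 Datadog misses, which matches the category breakdown (1,233 + 824 + 20), while 17,922 − 15,810 gives 2,112; the diff does not explain the gap.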

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "muaddib-scanner",
-  "version": "2.5.15",
+  "version": "2.5.17",
   "description": "Supply-chain threat detection & response for npm & PyPI/Python",
   "main": "src/index.js",
   "bin": {

package/src/index.js

@@ -23,7 +23,7 @@ const { ensureIOCs } = require('./ioc/bootstrap.js');
 const { scanEntropy } = require('./scanner/entropy.js');
 const { scanAIConfig } = require('./scanner/ai-config.js');
 const { deobfuscate } = require('./scanner/deobfuscate.js');
-const { buildModuleGraph, annotateTaintedExports, detectCrossFileFlows } = require('./scanner/module-graph.js');
+const { buildModuleGraph, annotateTaintedExports, detectCrossFileFlows, annotateSinkExports, detectCallbackCrossFileFlows } = require('./scanner/module-graph.js');
 const { computeReachableFiles } = require('./scanner/reachability.js');
 const { runTemporalAnalyses } = require('./temporal-runner.js');
 const { formatOutput } = require('./output-formatter.js');
@@ -362,6 +362,10 @@ async function run(targetPath, options = {}) {
       const graph = await yieldThen(() => buildModuleGraph(targetPath));
       const tainted = await yieldThen(() => annotateTaintedExports(graph, targetPath));
       crossFileFlows = await yieldThen(() => detectCrossFileFlows(graph, tainted, targetPath));
+      // Callback-based cross-file flow detection
+      const sinkAnnotations = await yieldThen(() => annotateSinkExports(graph, targetPath));
+      const callbackFlows = await yieldThen(() => detectCallbackCrossFileFlows(graph, tainted, sinkAnnotations, targetPath));
+      crossFileFlows = crossFileFlows.concat(callbackFlows);
     } catch (e) {
       // Graceful fallback — module graph is best-effort
       debugLog('[MODULE-GRAPH] Error:', e && e.message);
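
Because the callback-flow steps run inside the same `try` block after `crossFileFlows` has already been assigned, a failure in the new steps appears to leave the direct flows intact. A minimal sketch of that control flow, assuming hypothetical stub functions in place of the real module-graph API and omitting `yieldThen`, whose behavior is not shown in this diff:

```javascript
// Hypothetical stand-in for the try/catch in run(); `steps` holds stub
// functions, not the real buildModuleGraph/annotate*/detect* implementations.
async function collectFlows(steps) {
  let crossFileFlows = [];
  try {
    crossFileFlows = await steps.detectDirect();
    // New in this release: callback-based flows are appended to the direct ones.
    const callbackFlows = await steps.detectCallback();
    crossFileFlows = crossFileFlows.concat(callbackFlows);
  } catch (e) {
    // Best-effort fallback, as in the diff: keep whatever was collected so far.
  }
  return crossFileFlows;
}

// Example: the callback step fails, the direct flow survives.
collectFlows({
  detectDirect: async () => [{ type: 'direct' }],
  detectCallback: async () => { throw new Error('sink annotation failed'); },
}).then(flows => console.log(flows.length));
```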

package/src/response/playbooks.js

@@ -461,6 +461,31 @@ const PLAYBOOKS = {
   fragmented_high_entropy_cluster:
     'Cluster de chaines courtes a haute entropie detecte. Possible fragmentation de payload pour eviter la detection. ' +
     'Reconstituer les fragments et analyser le contenu combine. Verifier si les chaines sont concatenees ou reassemblees a l\'execution.',
+  wasm_host_sink:
+    'CRITIQUE: Module WebAssembly charge avec des imports host contenant des sinks reseau. Le flux de controle est cache dans le binaire WASM, ' +
+    'rendant l\'analyse statique impossible. Le WASM peut lire des fichiers sensibles et exfiltrer via les callbacks host. ' +
+    'Supprimer le package immediatement. Analyser le fichier WASM avec wasm2wat pour comprendre le flux. Regenerer tous les secrets.',
+  credential_regex_harvest:
+    'Code contient des regex de detection de credentials (Bearer, password, token, API key) combine avec un appel reseau. ' +
+    'Technique de harvesting: scanne les donnees en transit (streams HTTP, fichiers) pour extraire des secrets et les exfiltrer. ' +
+    'Supprimer le package. Auditer le trafic reseau sortant.',
+  builtin_override_exfil:
+    'Code remplace une methode built-in (console.log/warn/error, Object.defineProperty) et contient un appel reseau. ' +
+    'Technique de monkey-patching: intercepte les donnees passant par les APIs natives pour les exfiltrer. ' +
+    'Supprimer le package. Verifier si d\'autres methodes natives ont ete modifiees.',
+  stream_credential_intercept:
+    'Classe stream (Transform/Duplex/Writable) avec regex de credentials et appel reseau. ' +
+    'Technique de wiretap: le stream intercepte les donnees en transit, scanne pour des secrets (Bearer, password, token) ' +
+    'et les exfiltre via un appel reseau. Supprimer le package.',
+  remote_code_load:
+    'CRITIQUE: Fetch reseau + eval/new Function() dans le meme fichier. ' +
+    'Technique multi-stage: le package telecharge un payload depuis un serveur distant (SVG, HTML, JSON) puis l\'execute. ' +
+    'Supprimer le package. Bloquer le domaine C2 au niveau firewall.',
+  proxy_data_intercept:
+    'CRITIQUE: Un Proxy JavaScript avec trap set/get/apply est combine avec un appel reseau. ' +
+    'Technique d\'interception: le Proxy capture toutes les ecritures de proprietes (credentials, tokens, config) ' +
+    'et les exfiltre via HTTPS/fetch/dgram. Supprimer le package. Auditer tous les modules qui importent ce package.',
 };
 function getPlaybook(threatType) {

package/src/rules/index.js

@@ -703,7 +703,7 @@ const RULES = {
   module_compile: {
     id: 'MUADDIB-AST-023',
     name: 'Module Compile Execution',
-    severity: 'CRITICAL',
+    severity: 'HIGH',
     confidence: 'high',
     description: 'module._compile() detecte. Execution de code arbitraire a partir d\'une chaine dans le contexte module. Technique cle de flatmap-stream.',
     references: [
@@ -729,7 +729,7 @@ const RULES = {
   module_compile_dynamic: {
     id: 'MUADDIB-AST-025',
     name: 'Dynamic Module Compile Execution',
-    severity: 'CRITICAL',
+    severity: 'HIGH',
     confidence: 'high',
     description: 'Module._compile() avec argument dynamique (non-literal). Execution de code en memoire sans ecriture sur disque. Technique d\'evasion malware courante.',
     references: [
@@ -1285,6 +1285,78 @@ const RULES = {
     ],
     mitre: 'T1059'
   },
+  wasm_host_sink: {
+    id: 'MUADDIB-AST-042',
+    name: 'WASM Host Import Sink',
+    severity: 'CRITICAL',
+    confidence: 'high',
+    description: 'Module WebAssembly charge avec des callbacks host contenant des sinks reseau (fetch/http.request). Le WASM peut invoquer ces callbacks pour exfiltrer des donnees tout en cachant le flux de controle. Aucun package npm legitime ne combine WASM + callbacks reseau host.',
+    references: [
+      'https://attack.mitre.org/techniques/T1059/',
+      'https://attack.mitre.org/techniques/T1027/'
+    ],
+    mitre: 'T1059'
+  },
+  credential_regex_harvest: {
+    id: 'MUADDIB-AST-041',
+    name: 'Credential Regex Harvesting',
+    severity: 'HIGH',
+    confidence: 'high',
+    description: 'Regex de detection de credentials (token/password/secret/Bearer) combine avec un appel reseau. Technique de harvesting: le code scanne les donnees de flux (streams, requetes) a la recherche de credentials et les exfiltre.',
+    references: [
+      'https://attack.mitre.org/techniques/T1552/',
+      'https://attack.mitre.org/techniques/T1041/'
+    ],
+    mitre: 'T1552'
+  },
+  builtin_override_exfil: {
+    id: 'MUADDIB-AST-044',
+    name: 'Built-in Method Override Exfiltration',
+    severity: 'HIGH',
+    confidence: 'high',
+    description: 'Override de methode built-in (console.log/warn/error, Object.defineProperty) combine avec un appel reseau. Technique de monkey-patching: le code remplace une API native pour intercepter les donnees en transit et les exfiltrer.',
+    references: [
+      'https://attack.mitre.org/techniques/T1557/',
+      'https://attack.mitre.org/techniques/T1041/'
+    ],
+    mitre: 'T1557'
+  },
+  stream_credential_intercept: {
+    id: 'MUADDIB-AST-045',
+    name: 'Stream Credential Interception',
+    severity: 'HIGH',
+    confidence: 'high',
+    description: 'Classe stream (Transform/Duplex/Writable) avec regex de credentials et appel reseau. Technique de wiretap: le stream intercepte les donnees en transit, scanne pour des credentials (Bearer, password, token) et les exfiltre.',
+    references: [
+      'https://attack.mitre.org/techniques/T1557/',
+      'https://attack.mitre.org/techniques/T1552/'
+    ],
+    mitre: 'T1557'
+  },
+  remote_code_load: {
+    id: 'MUADDIB-AST-040',
+    name: 'Remote Code Loading',
+    severity: 'CRITICAL',
+    confidence: 'high',
+    description: 'Fetch reseau + eval/Function dans le meme fichier. Technique multi-stage: le code telecharge un payload distant (SVG, HTML, JSON) et l\'execute dynamiquement. Aucun package npm legitime ne combine fetch + eval/Function.',
+    references: [
+      'https://attack.mitre.org/techniques/T1105/',
+      'https://attack.mitre.org/techniques/T1059/'
+    ],
+    mitre: 'T1105'
+  },
+  proxy_data_intercept: {
+    id: 'MUADDIB-AST-043',
+    name: 'Proxy Data Interception',
+    severity: 'CRITICAL',
+    confidence: 'high',
+    description: 'Proxy trap (set/get/apply) combine avec un appel reseau dans le meme fichier. Technique d\'interception de donnees: le Proxy capture toutes les ecritures/lectures de proprietes et les exfiltre via le reseau. Utilise pour voler des credentials passees via module.exports.',
+    references: [
+      'https://attack.mitre.org/techniques/T1557/',
+      'https://attack.mitre.org/techniques/T1041/'
+    ],
+    mitre: 'T1557'
+  },
 };
 function getRule(type) {

package/src/scanner/ast-detectors.js

@@ -26,17 +26,56 @@ const SAFE_ENV_VARS = [
   'LANG', 'TERM', 'CI', 'DEBUG', 'VERBOSE', 'LOG_LEVEL',
   'SHELL', 'USER', 'LOGNAME', 'EDITOR', 'TZ',
   'NODE_DEBUG', 'NODE_PATH', 'NODE_OPTIONS',
-  'DISPLAY', 'COLORTERM', 'FORCE_COLOR', 'NO_COLOR', 'TERM_PROGRAM'
+  'DISPLAY', 'COLORTERM', 'FORCE_COLOR', 'NO_COLOR', 'TERM_PROGRAM',
+  // CI environment metadata (non-sensitive)
+  'GITHUB_REPOSITORY', 'GITHUB_SHA', 'GITHUB_REF', 'GITHUB_WORKSPACE',
+  'GITHUB_RUN_ID', 'GITHUB_RUN_NUMBER', 'GITHUB_ACTOR', 'GITHUB_EVENT_NAME',
+  'GITHUB_WORKFLOW', 'GITHUB_ACTION', 'GITHUB_JOB', 'GITHUB_SERVER_URL',
+  'GITLAB_CI', 'TRAVIS', 'CIRCLECI', 'JENKINS_URL',
+  // Build tool config
+  'NODE_TLS_REJECT_UNAUTHORIZED', 'BABEL_ENV', 'WEBPACK_MODE'
 ];
-// Env var prefixes that are safe (npm metadata, locale settings)
-const SAFE_ENV_PREFIXES = ['npm_config_', 'npm_lifecycle_', 'npm_package_', 'lc_', 'muaddib_'];
+// Env var prefixes that are safe (npm metadata, locale settings, framework public vars)
+const SAFE_ENV_PREFIXES = [
+  'npm_config_', 'npm_lifecycle_', 'npm_package_', 'lc_', 'muaddib_',
+  'next_public_', 'vite_', 'react_app_'
+];
 // Env var keywords to detect sensitive environment access (separate from SENSITIVE_STRINGS)
 const ENV_SENSITIVE_KEYWORDS = [
   'TOKEN', 'SECRET', 'KEY', 'PASSWORD', 'CREDENTIAL', 'AUTH'
 ];
+// Non-sensitive qualifiers: when a keyword is preceded by one of these in the env var name,
+// it is config metadata, not a real secret (e.g., PUBLIC_KEY, CACHE_KEY, SORT_KEY)
+const ENV_NON_SENSITIVE_QUALIFIERS = new Set([
+  'PUBLIC', 'CACHE', 'PRIMARY', 'FOREIGN', 'SORT', 'PARTITION', 'INDEX', 'ENCRYPTION'
+]);
+/**
+ * Check if an env var name contains a sensitive keyword as a full _-delimited segment,
+ * not preceded by a non-sensitive qualifier.
+ * e.g., NPM_TOKEN → TOKEN is full segment → true
+ *       PUBLIC_KEY → KEY preceded by PUBLIC → false
+ *       CACHE_KEY → KEY preceded by CACHE → false
+ *       GITHUB_TOKEN → TOKEN is full segment, preceded by GITHUB (not a qualifier) → true
+ */
+function isEnvSensitive(envVar) {
+  const upper = envVar.toUpperCase();
+  const segments = upper.split('_');
+  for (let i = 0; i < segments.length; i++) {
+    if (ENV_SENSITIVE_KEYWORDS.includes(segments[i])) {
+      // Check if preceded by a non-sensitive qualifier
+      if (i > 0 && ENV_NON_SENSITIVE_QUALIFIERS.has(segments[i - 1])) {
+        continue;
+      }
+      return true;
+    }
+  }
+  return false;
+}
 // AI agent dangerous flags — disable security controls (s1ngularity/Nx, Aug 2025)
 const AI_AGENT_DANGEROUS_FLAGS = [
   '--dangerously-skip-permissions',
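
To see what segment matching changes in practice, the sketch below copies `isEnvSensitive` and its two constants from the hunk above and compares it with the substring check it replaces. `oldSubstringCheck` and the sample names are illustrative additions, not part of the package:

```javascript
// Constants and function copied from the hunk above.
const ENV_SENSITIVE_KEYWORDS = [
  'TOKEN', 'SECRET', 'KEY', 'PASSWORD', 'CREDENTIAL', 'AUTH'
];
const ENV_NON_SENSITIVE_QUALIFIERS = new Set([
  'PUBLIC', 'CACHE', 'PRIMARY', 'FOREIGN', 'SORT', 'PARTITION', 'INDEX', 'ENCRYPTION'
]);

function isEnvSensitive(envVar) {
  const segments = envVar.toUpperCase().split('_');
  for (let i = 0; i < segments.length; i++) {
    if (ENV_SENSITIVE_KEYWORDS.includes(segments[i])) {
      // Skip keywords preceded by a non-sensitive qualifier (e.g. PUBLIC_KEY).
      if (i > 0 && ENV_NON_SENSITIVE_QUALIFIERS.has(segments[i - 1])) continue;
      return true;
    }
  }
  return false;
}

// Illustrative name for the previous behavior: plain substring match.
function oldSubstringCheck(envVar) {
  return ENV_SENSITIVE_KEYWORDS.some(s => envVar.toUpperCase().includes(s));
}

const samples = ['NPM_TOKEN', 'GITHUB_TOKEN', 'PUBLIC_KEY', 'CACHE_KEY', 'KEYBOARD_LAYOUT', 'APIKEY'];
for (const name of samples) {
  console.log(name, { old: oldSubstringCheck(name), current: isEnvSensitive(name) });
}
```

Names such as `KEYBOARD_LAYOUT` stop matching, which is the intended precision gain. A name with no underscore before the keyword, such as `APIKEY`, also stops matching, since the keyword is no longer a full segment.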
@@ -85,10 +124,11 @@ const HOOKABLE_NATIVES = [
 ];
 // Node.js core module classes targeted for prototype hooking
-const NODE_HOOKABLE_MODULES = ['http', 'https', 'net', 'tls', 'stream'];
+const NODE_HOOKABLE_MODULES = ['http', 'https', 'net', 'tls', 'stream', 'events', 'dgram'];
 const NODE_HOOKABLE_CLASSES = [
   'IncomingMessage', 'ServerResponse', 'ClientRequest',
-  'OutgoingMessage', 'Socket', 'Server', 'Agent'
+  'OutgoingMessage', 'Socket', 'Server', 'Agent',
+  'EventEmitter'
 ];
 // AI/MCP config paths targeted for config injection (SANDWORM_MODE)
@@ -423,7 +463,7 @@ function handleVariableDeclarator(node, ctx) {
         if (SAFE_ENV_VARS.includes(envVar)) continue;
         const envLower = envVar.toLowerCase();
         if (SAFE_ENV_PREFIXES.some(p => envLower.startsWith(p))) continue;
-        if (ENV_SENSITIVE_KEYWORDS.some(s => envVar.toUpperCase().includes(s))) {
+        if (isEnvSensitive(envVar)) {
           ctx.threats.push({
             type: 'env_access',
             severity: 'HIGH',
@@ -538,7 +578,7 @@ function handleCallExpression(node, ctx) {
         }
       }
       if (!resolved) {
-        ctx.threats.push({ type: 'dynamic_require', severity: 'HIGH',
+        ctx.threats.push({ type: 'dynamic_require', severity: 'MEDIUM',
           message: 'Dynamic require() with member expression argument (object property obfuscation).',
           file: ctx.relFile });
       }
@@ -985,9 +1025,9 @@ function handleCallExpression(node, ctx) {
   if (callName === 'eval') {
     ctx.hasEvalInFile = true;
-    ctx.hasDynamicExec = true;
     // Detect staged eval decode
     if (node.arguments.length === 1 && hasDecodeArg(node.arguments[0])) {
+      ctx.hasDynamicExec = true;
       ctx.threats.push({
         type: 'staged_eval_decode',
         severity: 'CRITICAL',
@@ -1007,9 +1047,15 @@ function handleCallExpression(node, ctx) {
         if (/\b(require|import|exec|execSync|spawn|child_process|\.readFile|\.writeFile|process\.env|\.homedir)\b/.test(val)) {
           severity = 'HIGH';
           message = `eval() with dangerous API in string literal: "${val.substring(0, 100)}"`;
+          ctx.hasDynamicExec = true;
         }
       }
+      // Only set hasDynamicExec for non-constant (dynamic) eval
+      if (!isConstant) {
+        ctx.hasDynamicExec = true;
+      }
       ctx.threats.push({
         type: 'dangerous_call_eval',
         severity,
@@ -1039,6 +1085,25 @@ function handleCallExpression(node, ctx) {
     }
   }
+  // setTimeout/setInterval with string argument = eval equivalent
+  // setTimeout("require('child_process').exec('whoami')", 100) executes the string as code
+  // Only string Literal and TemplateLiteral are eval-equivalent; Identifier/MemberExpression
+  // are function references (callbacks), not code strings.
+  if ((callName === 'setTimeout' || callName === 'setInterval') && node.arguments.length >= 1) {
+    const firstArg = node.arguments[0];
+    if ((firstArg.type === 'Literal' && typeof firstArg.value === 'string') ||
+        firstArg.type === 'TemplateLiteral') {
+      ctx.hasEvalInFile = true;
+      ctx.hasDynamicExec = true;
+      ctx.threats.push({
+        type: 'dangerous_call_eval',
+        severity: 'HIGH',
+        message: `${callName}() with string argument — eval equivalent, executes the string as code.`,
+        file: ctx.relFile
+      });
+    }
+  }
   // Detect eval.call(null, code) / eval.apply(null, [code]) / Function.call/apply
   if (node.callee.type === 'MemberExpression' && !node.callee.computed &&
       node.callee.property?.type === 'Identifier' &&
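
The string-argument timer check above can be exercised without a parser. A minimal sketch, assuming hand-built ESTree-shaped nodes of the kind acorn produces; `isStringTimerCall` is a hypothetical standalone predicate that mirrors the condition in the hunk, not a function from the package:

```javascript
// Hypothetical predicate mirroring the condition added in handleCallExpression.
function isStringTimerCall(callName, node) {
  if (callName !== 'setTimeout' && callName !== 'setInterval') return false;
  if (node.arguments.length < 1) return false;
  const firstArg = node.arguments[0];
  // Only string literals and template literals are treated as code strings.
  return (firstArg.type === 'Literal' && typeof firstArg.value === 'string') ||
         firstArg.type === 'TemplateLiteral';
}

// Hand-built nodes; a real scan would obtain these from the parsed AST.
const stringArg = {
  arguments: [
    { type: 'Literal', value: "require('child_process').exec('whoami')" },
    { type: 'Literal', value: 100 }
  ]
};
const callbackArg = {
  arguments: [{ type: 'Identifier', name: 'tick' }, { type: 'Literal', value: 100 }]
};
const templateArg = {
  arguments: [{ type: 'TemplateLiteral', quasis: [], expressions: [] }]
};

console.log(isStringTimerCall('setTimeout', stringArg));
console.log(isStringTimerCall('setTimeout', callbackArg));
console.log(isStringTimerCall('setInterval', templateArg));
```

A string held in a variable and passed by name reaches the check as an `Identifier`, so it is treated as a callback reference and not flagged by this condition.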
@@ -1229,7 +1294,9 @@ function handleCallExpression(node, ctx) {
         ctx.hasDynamicExec = true;
         ctx.threats.push({
           type: 'module_compile',
-          severity: 'CRITICAL',
+          // P6: Baseline HIGH — single module._compile() in build tools (@babel/core, art-template)
+          // is framework behavior. Compound detections (zlib_inflate_eval, fetch_decrypt_exec) stay CRITICAL.
+          severity: 'HIGH',
           message: 'module._compile() detected — executes arbitrary code from string in module context (flatmap-stream pattern).',
           file: ctx.relFile
         });
@@ -1237,7 +1304,7 @@ function handleCallExpression(node, ctx) {
         if (node.arguments.length >= 1 && !hasOnlyStringLiteralArgs(node)) {
           ctx.threats.push({
             type: 'module_compile_dynamic',
-            severity: 'CRITICAL',
+            severity: 'HIGH',
             message: 'In-memory code execution via Module._compile(). Common malware evasion technique.',
             file: ctx.relFile
           });
@@ -1447,6 +1514,20 @@ function handleNewExpression(node, ctx) {
         file: ctx.relFile
       });
     }
+    // Detect new Proxy(obj, handler) where handler has set/get traps — data interception
+    // Real-world technique: export a Proxy that intercepts all property sets/gets to exfiltrate
+    // data flowing through the module. Combined with network (hasNetworkInFile) → credential theft.
+    if (!target.type?.includes('MemberExpression') || target.property?.name !== 'env') {
+      const handler = node.arguments[1];
+      if (handler?.type === 'ObjectExpression') {
+        const hasTrap = handler.properties?.some(p =>
+          p.key?.type === 'Identifier' && ['set', 'get', 'apply', 'construct'].includes(p.key.name)
+        );
+        if (hasTrap) {
+          ctx.hasProxyTrap = true;
+        }
+      }
+    }
   }
   // Batch 2: new Worker(code, { eval: true }) — worker_threads code execution
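
The trap check in the Proxy hunk above depends on how the handler object's keys are written. A minimal sketch on hand-built ESTree-shaped nodes; `handlerHasTrap` is a hypothetical extraction of the inner condition, and the outer `target` guard from the diff is left out:

```javascript
const TRAP_NAMES = ['set', 'get', 'apply', 'construct'];

// Hypothetical extraction of the inner check from handleNewExpression.
function handlerHasTrap(handler) {
  if (handler?.type !== 'ObjectExpression') return false;
  return Boolean(handler.properties?.some(p =>
    p.key?.type === 'Identifier' && TRAP_NAMES.includes(p.key.name)
  ));
}

// { set(target, prop, value) { ... } }: identifier key
const identifierKey = {
  type: 'ObjectExpression',
  properties: [{ type: 'Property', key: { type: 'Identifier', name: 'set' } }]
};
// { 'set': function () { ... } }: string-literal key
const literalKey = {
  type: 'ObjectExpression',
  properties: [{ type: 'Property', key: { type: 'Literal', value: 'set' } }]
};
// new Proxy(obj, handlerVar): handler passed by reference
const referencedHandler = { type: 'Identifier', name: 'handlerVar' };

console.log(handlerHasTrap(identifierKey));
console.log(handlerHasTrap(literalKey));
console.log(handlerHasTrap(referencedHandler));
```

As written in the diff, only inline object literals with identifier keys set `ctx.hasProxyTrap`; handlers with quoted keys or handlers passed through a variable fall outside this particular condition.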
@@ -1630,6 +1711,19 @@ function handleAssignmentExpression(node, ctx) {
       }
     }
+    // JSON.stringify = ... or JSON.parse = ... — global API hooking
+    // Real-world technique: override JSON.stringify to intercept all serialization and exfiltrate data
+    if (left.object?.type === 'Identifier' && left.object.name === 'JSON' &&
+        left.property?.type === 'Identifier' &&
+        ['stringify', 'parse'].includes(left.property.name)) {
+      ctx.threats.push({
+        type: 'prototype_hook',
+        severity: 'HIGH',
+        message: `JSON.${left.property.name} overridden — global API hooking to intercept all JSON serialization/deserialization.`,
+        file: ctx.relFile
+      });
+    }
     // XMLHttpRequest.prototype.send = ... or Response.prototype.json = ...
     if (left.object?.type === 'MemberExpression' &&
         left.object.property?.type === 'Identifier' &&
@@ -1723,7 +1817,7 @@ function handleMemberExpression(node, ctx) {
       if (SAFE_ENV_PREFIXES.some(p => envLower.startsWith(p))) {
         return;
       }
-      if (ENV_SENSITIVE_KEYWORDS.some(s => envVar.toUpperCase().includes(s))) {
+      if (isEnvSensitive(envVar)) {
         ctx.threats.push({
           type: 'env_access',
           severity: 'HIGH',
@@ -1828,6 +1922,17 @@ function handlePostWalk(ctx) {
     });
   }
+  // Remote code loading: fetch + eval/Function in same file = multi-stage payload
+  // Distinct from fetch_decrypt_exec which also requires crypto. This catches SVG/HTML payload extraction.
+  if (ctx.hasRemoteFetch && ctx.hasDynamicExec && !ctx.hasCryptoDecipher) {
+    ctx.threats.push({
+      type: 'remote_code_load',
+      severity: 'CRITICAL',
+      message: 'Remote code loading: network fetch + dynamic eval/Function in same file — multi-stage payload execution.',
+      file: ctx.relFile
+    });
+  }
   // Wave 4: Remote fetch + crypto decrypt + dynamic eval = steganographic payload chain
   if (ctx.hasRemoteFetch && ctx.hasCryptoDecipher && ctx.hasDynamicExec) {
     ctx.threats.push({
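
The two post-walk conditions above are mutually exclusive on `hasCryptoDecipher`, so a file should trigger at most one of them. A minimal sketch of that split; `classifyFetchExec` is hypothetical, and the `fetch_decrypt_exec` type name is assumed from the code comments since the corresponding `push` is cut off in this hunk:

```javascript
// Hypothetical helper reproducing the two flag combinations from handlePostWalk.
function classifyFetchExec(ctx) {
  const types = [];
  if (ctx.hasRemoteFetch && ctx.hasDynamicExec && !ctx.hasCryptoDecipher) {
    types.push('remote_code_load');
  }
  if (ctx.hasRemoteFetch && ctx.hasCryptoDecipher && ctx.hasDynamicExec) {
    types.push('fetch_decrypt_exec'); // assumed type name, see lead-in
  }
  return types;
}

console.log(classifyFetchExec({ hasRemoteFetch: true, hasDynamicExec: true, hasCryptoDecipher: false }));
console.log(classifyFetchExec({ hasRemoteFetch: true, hasDynamicExec: true, hasCryptoDecipher: true }));
console.log(classifyFetchExec({ hasRemoteFetch: true, hasDynamicExec: false, hasCryptoDecipher: false }));
```

This interacts with the `eval` hunk earlier in the file: since `hasDynamicExec` is now set only for dynamic, decoded, or dangerous-API `eval` calls, a file that fetches remotely and evaluates a harmless constant string would not reach `remote_code_load`.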
@@ -1861,6 +1966,67 @@ function handlePostWalk(ctx) {
     });
   }
+  // WASM payload detection: WebAssembly.compile/instantiate + readFileSync/https in same file
+  // WASM host import objects can contain callback functions that read credentials and exfiltrate.
+  // This pattern is never legitimate in npm packages — WASM should use pure computation, not host I/O.
+  if (ctx.hasWasmLoad && ctx.hasNetworkCallInFile) {
+    ctx.threats.push({
+      type: 'wasm_host_sink',
+      severity: 'CRITICAL',
+      message: 'WebAssembly module with network-capable host imports. WASM can invoke host callbacks to exfiltrate data while hiding control flow.',
+      file: ctx.relFile
+    });
+  }
+  // Credential regex harvesting: credential-matching regex + network call in same file
+  // Real-world pattern: Transform/stream that scans data for tokens/passwords and exfiltrates
+  if (ctx.hasCredentialRegex && ctx.hasNetworkCallInFile) {
+    ctx.threats.push({
+      type: 'credential_regex_harvest',
+      severity: 'HIGH',
+      message: 'Credential regex patterns (token/password/secret/Bearer) + network call in same file — stream data credential harvesting.',
+      file: ctx.relFile
+    });
+  }
+  // Built-in method override + network: console.X = function or Object.defineProperty = function
+  // combined with network calls. Monkey-patching built-in APIs for data interception.
+  if (ctx.hasBuiltinOverride && ctx.hasNetworkCallInFile) {
+    ctx.threats.push({
+      type: 'builtin_override_exfil',
+      severity: 'HIGH',
+      message: 'Built-in method override (console/Object.defineProperty) + network call — runtime API hijacking for data interception and exfiltration.',
+      file: ctx.relFile
+    });
+  }
+  // Stream credential interception: Transform/Duplex/Writable stream + credential regex + network
+  // Wiretap pattern: intercepts data in transit, scans for credentials, exfiltrates matches.
+  if (ctx.hasStreamInterceptor && ctx.hasCredentialRegex && ctx.hasNetworkCallInFile) {
+    ctx.threats.push({
+      type: 'stream_credential_intercept',
+      severity: 'HIGH',
+      message: 'Stream class (Transform/Duplex/Writable) with credential regex scanning + network call — data-in-transit credential wiretap.',
+      file: ctx.relFile
+    });
+  }
+  // Proxy data interception: new Proxy(obj, { set/get }) + network in same file
+  // Real-world pattern: export a Proxy that exfiltrates all property assignments via network
+  // CRITICAL only when credential signals co-occur (env_access, suspicious_dataflow),
+  // otherwise HIGH — bare Proxy + fetch is insufficient evidence.
+  if (ctx.hasProxyTrap && ctx.hasNetworkCallInFile) {
+    const hasCredentialSignal = ctx.threats.some(t =>
+      t.type === 'env_access' || t.type === 'suspicious_dataflow'
+    );
+    ctx.threats.push({
+      type: 'proxy_data_intercept',
+      severity: hasCredentialSignal ? 'CRITICAL' : 'HIGH',
+      message: 'Proxy trap (set/get/apply) with network call in same file — data interception and exfiltration via Proxy handler.',
+      file: ctx.relFile
+    });
+  }
   // Wave 4: MCP content keywords in file with writeFileSync = MCP injection signal
   if (ctx.hasMcpContentKeywords && !ctx.threats.some(t => t.type === 'mcp_config_injection')) {
     ctx.threats.push({
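
The post-walk checks in this file combine per-file boolean flags into compound findings. As a minimal sketch (not the package's code), the two fetch + eval rules and the Proxy severity rule can be restated as pure functions. The helper names classifyFetchExec and proxySeverity are hypothetical; the flag names and threat types come from the hunks above, and the type name fetch_decrypt_exec is taken from the comment in the remote_code_load hunk.

```javascript
// Hypothetical helper: restates the two fetch + eval rules from handlePostWalk.
// The crypto flag decides which of the two mutually exclusive types applies.
function classifyFetchExec({ hasRemoteFetch, hasDynamicExec, hasCryptoDecipher }) {
  if (!hasRemoteFetch || !hasDynamicExec) return null;
  return hasCryptoDecipher ? 'fetch_decrypt_exec' : 'remote_code_load';
}

// Hypothetical helper: severity of proxy_data_intercept, given the threats
// already recorded for the file when the post-walk check runs.
function proxySeverity(threats) {
  const hasCredentialSignal = threats.some(
    (t) => t.type === 'env_access' || t.type === 'suspicious_dataflow'
  );
  return hasCredentialSignal ? 'CRITICAL' : 'HIGH';
}

console.log(classifyFetchExec({ hasRemoteFetch: true, hasDynamicExec: true, hasCryptoDecipher: false }));
console.log(proxySeverity([{ type: 'env_access' }]));
console.log(proxySeverity([]));
```

Because the escalation reads ctx.threats, it only sees env_access or suspicious_dataflow findings that were pushed before the post-walk step runs.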

package/src/scanner/ast.js CHANGED

@@ -15,6 +15,24 @@ const {
   handlePostWalk
 } = require('./ast-detectors.js');
+// Check if credential keywords appear INSIDE regex literals or new RegExp() patterns.
+// Only true when the keyword is part of the regex pattern itself, not just a string elsewhere in the file.
+const CREDENTIAL_REGEX_KEYWORDS = /bearer|password|secret|token|credential|api.?key/i;
+function hasCredentialInsideRegex(content) {
+  // Check regex literals: /...pattern.../flags
+  const regexLiteralRe = /\/(?!\*)(?:[^/\\]|\\.)+\/[gimsuy]*/g;
+  let m;
+  while ((m = regexLiteralRe.exec(content)) !== null) {
+    if (CREDENTIAL_REGEX_KEYWORDS.test(m[0])) return true;
+  }
+  // Check new RegExp('pattern') — keyword must be in the string argument
+  const newRegExpRe = /new\s+RegExp\s*\(\s*(['"`])((?:[^\\]|\\.)*?)\1/g;
+  while ((m = newRegExpRe.exec(content)) !== null) {
+    if (CREDENTIAL_REGEX_KEYWORDS.test(m[2])) return true;
+  }
+  return false;
+}
 const EXCLUDED_FILES = [
   'src/scanner/ast.js',
   'src/scanner/shell.js',
@@ -93,6 +111,15 @@ function analyzeFile(content, filePath, basePath) {
     hasEnvEnumeration: false,  // Object.entries/keys/values(process.env)
     hasEnvHarvestPattern: /\b(KEY|SECRET|TOKEN|PASSWORD|CREDENTIAL|NPM|AWS|SSH|WEBHOOK)\b/.test(content),
     hasNetworkCallInFile: /\b(fetch|https?\.request|https?\.get|dns\.resolve)\b/.test(content),
+    // Credential regex harvesting: regex literals or new RegExp() whose PATTERN contains credential keywords
+    // Must check that the keyword is inside the regex, not just anywhere in the file
+    hasCredentialRegex: hasCredentialInsideRegex(content),
+    // Built-in method override: console.X = function or Object.defineProperty = function
+    hasBuiltinOverride: /\bconsole\s*\.\s*\w+\s*=\s*function/.test(content) ||
+                        /\bconsole\s*\[\s*\w+\s*\]\s*=\s*function/.test(content) ||
+                        /\bObject\s*\.\s*defineProperty\s*=\s*function/.test(content),
+    // Stream interceptor: class extending Transform/Duplex/Writable (data wiretap pattern)
+    hasStreamInterceptor: /\bextends\s+(Transform|Duplex|Writable)\b/.test(content),
     // SANDWORM_MODE P2: DNS exfiltration co-occurrence
     hasDnsRequire: /\brequire\s*\(\s*['"]dns['"]\s*\)/.test(content) || /\bdns\s*\.\s*resolve/.test(content),
     hasBase64Encode: /\.toString\s*\(\s*['"]base64(url)?['"]\s*\)/.test(content),
@@ -123,7 +150,11 @@ function analyzeFile(content, filePath, basePath) {
     hasModuleImport: /require\s*\(\s*['"]module['"]\s*\)/.test(content) || /module\.constructor/.test(content),
     hasMcpContentKeywords: (/\bmcpServers\b/.test(content) || /\bmcp\.json\b/.test(content) || /\bclaude_desktop_config\b/.test(content)) &&
       /\bwriteFileSync\b|\bwriteFile\s*\(/.test(content) &&
-      (/\.claude[/\\]/.test(content) || /\.cursor[/\\]/.test(content) || /\.vscode[/\\]/.test(content) || /\.windsurf[/\\]/.test(content) || /\.codeium[/\\]/.test(content) || /\.continue[/\\]/.test(content) || /claude_desktop_config/.test(content) || /\bmcp\.json\b/.test(content))
+      (/\.claude[/\\]/.test(content) || /\.cursor[/\\]/.test(content) || /\.vscode[/\\]/.test(content) || /\.windsurf[/\\]/.test(content) || /\.codeium[/\\]/.test(content) || /\.continue[/\\]/.test(content) || /claude_desktop_config/.test(content) || /\bmcp\.json\b/.test(content)),
+    // WASM payload detection: WebAssembly.compile/instantiate with host import sinks
+    hasWasmLoad: /\bWebAssembly\s*\.\s*(compile|instantiate|compileStreaming|instantiateStreaming)\b/.test(content),
+    hasWasmHostSink: false,  // set in handleCallExpression when WASM import object contains network/fs sinks
+    hasProxyTrap: false  // set in handleNewExpression when Proxy has set/get/apply trap
   };
   // Compute fetchOnlySafeDomains: check if ALL URLs in file point to known registries
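
To see how the hasCredentialRegex flag behaves, the helper added in this file can be run on a few one-line inputs. The function body below is copied from the hunk above; the samples object and its contents are made up for illustration. The last sample suggests that, because the scan is textual rather than AST-based, a slash-delimited path segment inside a string can also satisfy the regex-literal pattern.

```javascript
// Copied from the ast.js hunk in this diff.
const CREDENTIAL_REGEX_KEYWORDS = /bearer|password|secret|token|credential|api.?key/i;
function hasCredentialInsideRegex(content) {
  // Regex literals: /...pattern.../flags
  const regexLiteralRe = /\/(?!\*)(?:[^/\\]|\\.)+\/[gimsuy]*/g;
  let m;
  while ((m = regexLiteralRe.exec(content)) !== null) {
    if (CREDENTIAL_REGEX_KEYWORDS.test(m[0])) return true;
  }
  // new RegExp('pattern'): the keyword must be in the string argument
  const newRegExpRe = /new\s+RegExp\s*\(\s*(['"`])((?:[^\\]|\\.)*?)\1/g;
  while ((m = newRegExpRe.exec(content)) !== null) {
    if (CREDENTIAL_REGEX_KEYWORDS.test(m[2])) return true;
  }
  return false;
}

// Illustrative inputs (not from the package or its tests).
const samples = {
  regexLiteral: 'const re = /Bearer [A-Za-z0-9]+/g;',
  regExpCtor: "const r = new RegExp('secret_[a-z]+');",
  plainString: 'const msg = "invalid token";',
  pathString: 'const p = "a/token/b";'
};

for (const [label, src] of Object.entries(samples)) {
  console.log(label, hasCredentialInsideRegex(src));
}
// regexLiteral true, regExpCtor true, plainString false, pathString true
```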

package/src/scanner/dataflow.js CHANGED

@@ -9,9 +9,11 @@ const { analyzeWithDeobfuscation } = require('../shared/analyze-helper.js');
 // Module classification maps for intra-file taint tracking
 const MODULE_SOURCE_METHODS = {
   os: {
-    homedir: 'fingerprint_read', hostname: 'fingerprint_read',
+    homedir: 'fingerprint_read',
     networkInterfaces: 'fingerprint_read', userInfo: 'fingerprint_read',
-    platform: 'telemetry_read', arch: 'telemetry_read'
+    hostname: 'telemetry_read', platform: 'telemetry_read', arch: 'telemetry_read',
+    type: 'telemetry_read', release: 'telemetry_read',
+    cpus: 'telemetry_read', totalmem: 'telemetry_read', freemem: 'telemetry_read'
   },
   fs: {
     readFileSync: 'credential_read', readFile: 'credential_read',
@@ -356,21 +358,17 @@ function analyzeFile(content, filePath, basePath) {
         }
       }
-      // os.hostname(), os.networkInterfaces(), os.userInfo(), os.homedir() as fingerprint sources
-      // os.platform(), os.arch() as telemetry sources (lower severity)
+      // os.* methods classified via MODULE_SOURCE_METHODS for consistent categorization
+      // fingerprint_read: homedir, networkInterfaces, userInfo (real exfil targets)
+      // telemetry_read: hostname, platform, arch, type, release, cpus, totalmem, freemem
       if (node.callee.type === 'MemberExpression') {
         const obj = node.callee.object;
         const prop = node.callee.property;
         if (obj?.type === 'Identifier' && obj.name === 'os' && prop?.type === 'Identifier') {
-          if (['hostname', 'networkInterfaces', 'userInfo', 'homedir'].includes(prop.name)) {
+          const osClassification = MODULE_SOURCE_METHODS.os?.[prop.name];
+          if (osClassification) {
             sources.push({
-              type: 'fingerprint_read',
-              name: `os.${prop.name}`,
-              line: node.loc?.start?.line
-            });
-          } else if (['platform', 'arch'].includes(prop.name)) {
-            sources.push({
-              type: 'telemetry_read',
+              type: osClassification,
               name: `os.${prop.name}`,
               line: node.loc?.start?.line
             });
@@ -742,8 +740,9 @@ const SENSITIVE_PATH_PATTERNS = [
   '.ethereum', '.electrum', '.config/solana', '.exodus',
   '.atomic', '.metamask', '.ledger-live', '.trezor',
   '.bitcoin', '.monero', '.gnupg',
-  '_cacache', '.cache/yarn', '.cache/pip',
-  'discord', 'leveldb'
+  '_cacache', '.cache/yarn', '.cache/pip'
+  // P6: Removed discord, leveldb — data directories, not credential paths.
+  // _cacache/.cache kept — real cache poisoning vectors (T1195.002).
 ];
 function isSensitivePath(val) {
@@ -818,8 +817,20 @@ const SYSTEM_IDENTITY_ENVS = new Set([
 // Env var prefixes for tool-internal configuration (not external credentials)
 const SAFE_ENV_PREFIXES = ['MUADDIB_', 'npm_config_', 'npm_lifecycle_', 'npm_package_'];
+// P6: Node.js runtime config env vars that are not credentials.
+// NODE_TLS_REJECT_UNAUTHORIZED matches "AUTH" in "UNAUTHORIZED" → false positive.
+// Real credential exfiltration targets API_KEY, TOKEN, SECRET, PASSWORD.
+const DATAFLOW_SAFE_ENV_VARS = new Set([
+  'NODE_TLS_REJECT_UNAUTHORIZED', 'NODE_OPTIONS', 'NODE_EXTRA_CA_CERTS',
+  'NODE_ENV', 'NODE_PATH', 'NODE_DEBUG',
+  'DEBUG', 'CI', 'HTTPS_PROXY', 'HTTP_PROXY', 'NO_PROXY',
+  'LANG', 'TZ', 'PORT', 'HOST'
+  // Note: HOME, USER, HOSTNAME stay sensitive — fingerprint exfiltration detection.
+]);
 function isSensitiveEnv(name) {
   const upper = name.toUpperCase();
+  if (DATAFLOW_SAFE_ENV_VARS.has(upper)) return false;
   if (SYSTEM_IDENTITY_ENVS.has(upper)) return true;
   if (SAFE_ENV_PREFIXES.some(p => upper.startsWith(p))) return false;
   const sensitive = ['TOKEN', 'SECRET', 'KEY', 'PASSWORD', 'CREDENTIAL', 'AUTH', 'NPM', 'AWS', 'AZURE', 'GCP'];
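
The effect of the new allowlist can be sketched in isolation. The set and keyword list below are copied from the hunk above. The tail of isSensitiveEnv is not shown in this diff, so the substring match in keywordMatch is an assumption (it mirrors the includes() check removed from ast-detectors.js), and the SYSTEM_IDENTITY_ENVS and SAFE_ENV_PREFIXES steps are left out. The names keywordMatch and isSensitiveEnvSketch are hypothetical.

```javascript
// Copied from the dataflow.js hunk in this diff.
const DATAFLOW_SAFE_ENV_VARS = new Set([
  'NODE_TLS_REJECT_UNAUTHORIZED', 'NODE_OPTIONS', 'NODE_EXTRA_CA_CERTS',
  'NODE_ENV', 'NODE_PATH', 'NODE_DEBUG',
  'DEBUG', 'CI', 'HTTPS_PROXY', 'HTTP_PROXY', 'NO_PROXY',
  'LANG', 'TZ', 'PORT', 'HOST'
]);
const SENSITIVE = ['TOKEN', 'SECRET', 'KEY', 'PASSWORD', 'CREDENTIAL', 'AUTH', 'NPM', 'AWS', 'AZURE', 'GCP'];

// Assumption: the final step is a plain substring match on the upper-cased name.
function keywordMatch(upper) {
  return SENSITIVE.some((s) => upper.includes(s));
}

// Simplified sketch: allowlist first, then keyword match.
function isSensitiveEnvSketch(name) {
  const upper = name.toUpperCase();
  if (DATAFLOW_SAFE_ENV_VARS.has(upper)) return false;
  return keywordMatch(upper);
}

// "UNAUTHORIZED" contains "AUTH", so the keyword step alone would flag it.
console.log(keywordMatch('NODE_TLS_REJECT_UNAUTHORIZED'));          // true
console.log(isSensitiveEnvSketch('NODE_TLS_REJECT_UNAUTHORIZED'));  // false
console.log(isSensitiveEnvSketch('GITHUB_TOKEN'));                  // true
```

Placing the allowlist check first means an exact-name match short-circuits every later rule, which is why the set holds full variable names rather than prefixes.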

package/src/scanner/entropy.js CHANGED

@@ -266,7 +266,7 @@ function scanEntropy(targetPath, options = {}) {
     }
     // B11: Fragment cluster — many short high-entropy strings = payload fragmentation
-    const FRAG_MIN = 8, FRAG_MAX = 49, FRAG_COUNT = 5, FRAG_ENTROPY = 4.5;
+    const FRAG_MIN = 8, FRAG_MAX = 49, FRAG_COUNT = 10, FRAG_ENTROPY = 5.0;
     const frags = strings.filter(s =>
       s.length >= FRAG_MIN && s.length <= FRAG_MAX &&
       !SOURCE_MAP_REGEX.test(s) && !SHA256_HEX_REGEX.test(s) && !MD5_HEX_REGEX.test(s) &&
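
A rough way to read the threshold change from 4.5 to 5.0: the sketch below assumes that FRAG_ENTROPY is compared against Shannon entropy in bits per character, which this diff does not show. Under that assumption a string with k distinct characters cannot exceed log2(k) bits, so hex fragments (16 symbols, at most 4 bits) fail both thresholds, and a fragment needs at least 32 distinct characters to reach 5.0. The names shannonEntropy and passes are hypothetical, and passes covers only the length and entropy conditions of the filter.

```javascript
// Assumption: "entropy" means Shannon entropy in bits per character.
function shannonEntropy(s) {
  const chars = [...s];
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let h = 0;
  for (const c of counts.values()) {
    const p = c / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

const FRAG_MIN = 8, FRAG_MAX = 49; // from the hunk above

// Hypothetical predicate: length window plus entropy threshold only.
function passes(s, threshold) {
  return s.length >= FRAG_MIN && s.length <= FRAG_MAX && shannonEntropy(s) >= threshold;
}

const hex32 = '9f86d081884c7d659a2feaa0c55ad015';                  // at most 16 distinct symbols
const distinct24 = 'ABCDEFGHIJKLMNOPQRSTUVWX';                      // log2(24) is about 4.58
const distinct40 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmn';      // log2(40) is about 5.32

for (const s of [hex32, distinct24, distinct40]) {
  console.log(s.length, shannonEntropy(s).toFixed(2), passes(s, 4.5), passes(s, 5.0));
}
```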

package/src/scanner/module-graph.js CHANGED

@@ -5,7 +5,7 @@ const { findFiles, EXCLUDED_DIRS } = require('../utils');
 const { ACORN_OPTIONS: BASE_ACORN_OPTIONS, safeParse } = require('../shared/constants.js');
 // --- Sensitive source patterns ---
-const SENSITIVE_MODULES = new Set(['fs', 'child_process', 'dns', 'os']);
+const SENSITIVE_MODULES = new Set(['fs', 'child_process', 'dns', 'os', 'dgram']);
 const ACORN_OPTIONS = {
   ...BASE_ACORN_OPTIONS,
@@ -151,10 +151,15 @@ function analyzeExports(filePath) {
   // Track class declarations: class Foo { ... }
   const classDefs = {};
+  // Track function declarations: function foo() { ... }
+  const funcDefs = {};
   walkAST(ast, (node) => {
     if (node.type === 'ClassDeclaration' && node.id && node.id.name) {
       classDefs[node.id.name] = node;
     }
+    if (node.type === 'FunctionDeclaration' && node.id && node.id.name) {
+      funcDefs[node.id.name] = node;
+    }
   });
   // First pass: collect require assignments, ES imports, and tainted variable assignments
@@ -309,6 +314,16 @@ function analyzeExports(filePath) {
             } else if (prop.value.type === 'Identifier' && taintedVars[prop.value.name]) {
               const t = taintedVars[prop.value.name];
               exports[propName] = { tainted: true, source: t.source, detail: t.detail };
+            } else if (prop.value.type === 'Identifier' && funcDefs[prop.value.name]) {
+              // Shorthand property referencing a FunctionDeclaration: { readConfig }
+              const fnNode = funcDefs[prop.value.name];
+              const fnBody = fnNode.body && fnNode.body.type === 'BlockStatement' ? fnNode.body.body : null;
+              if (fnBody) {
+                const bodyTaint = scanBodyForTaint(fnBody, moduleVars, taintedVars);
+                if (bodyTaint) {
+                  exports[propName] = { tainted: true, source: bodyTaint.source, detail: bodyTaint.detail };
+                }
+              }
             }
           }
         }
@@ -1081,8 +1096,283 @@ function toRel(abs, packagePath) {
   return path.relative(packagePath, abs).replace(/\\/g, '/');
 }
+// =============================================================================
+// STEP 4 — Sink export annotation (for callback-based cross-file detection)
+// =============================================================================
+/**
+ * Annotate exports that contain network/exec sinks in their function body.
+ * This is the inverse of annotateTaintedExports — finds "where data goes out".
+ * Used to detect callback-based cross-file exfiltration:
+ *   reader.js exports readConfig() (tainted source)
+ *   sender.js exports sendData() (sink export)
+ *   index.js connects them via callback: readConfig((data) => sendData(data))
+ */
+function annotateSinkExports(graph, packagePath) {
+  const result = {};
+  for (const relFile of Object.keys(graph)) {
+    const absFile = path.resolve(packagePath, relFile);
+    result[relFile] = analyzeSinkExports(absFile);
+  }
+  return result;
+}
+function analyzeSinkExports(filePath) {
+  const ast = parseFile(filePath);
+  if (!ast) return {};
+  const sinkExports = {};
+  // Track function declarations for shorthand property resolution
+  const localFuncDefs = {};
+  walkAST(ast, (node) => {
+    if (node.type === 'FunctionDeclaration' && node.id && node.id.name) {
+      localFuncDefs[node.id.name] = node;
+    }
+  });
+  // Collect require assignments for sink module detection
+  const sinkModuleVars = {};
+  walkAST(ast, (node) => {
+    if (node.type === 'VariableDeclaration') {
+      for (const decl of node.declarations) {
+        if (!decl.init || !decl.id || decl.id.type !== 'Identifier') continue;
+        if (isRequireCall(decl.init)) {
+          const mod = decl.init.arguments[0].value;
+          if (mod === 'http' || mod === 'https' || mod === 'net' || mod === 'dgram') {
+            sinkModuleVars[decl.id.name] = mod;
+          }
+        }
+      }
+    }
+    if (node.type === 'ImportDeclaration' && node.source && typeof node.source.value === 'string') {
+      const mod = node.source.value;
+      if (mod === 'http' || mod === 'https' || mod === 'net' || mod === 'dgram') {
+        for (const spec of node.specifiers) {
+          sinkModuleVars[spec.local.name] = mod;
+        }
+      }
+    }
+  });
+  function bodyHasSink(body) {
+    let found = null;
+    walkAST({ type: 'Program', body }, (node) => {
+      if (found) return;
+      if (node.type === 'CallExpression') {
+        // fetch(), eval()
+        if (node.callee.type === 'Identifier' && SINK_CALLEE_NAMES.has(node.callee.name)) {
+          found = node.callee.name + '()';
+          return;
+        }
+        // https.request(), http.get()
+        if (node.callee.type === 'MemberExpression') {
+          const chain = getMemberChain(node.callee);
+          if (SINK_MEMBER_METHODS.has(chain)) {
+            found = chain + '()';
+            return;
+          }
+          // Variable-based: const h = require('https'); h.request()
+          if (node.callee.object.type === 'Identifier' && sinkModuleVars[node.callee.object.name]) {
+            const method = node.callee.property.name || node.callee.property.value;
+            if (method === 'request' || method === 'get') {
+              found = sinkModuleVars[node.callee.object.name] + '.' + method + '()';
+              return;
+            }
+          }
+          // .write(), .send(), .connect()
+          const method = node.callee.property.name || node.callee.property.value;
+          if (SINK_INSTANCE_METHODS.has(method)) {
+            found = method + '()';
+            return;
+          }
+        }
+      }
+    });
+    return found;
+  }
+  // Check module.exports = { fn: function() { ... sink ... } }
+  walkAST(ast, (node) => {
+    if (isModuleExportsAssign(node)) {
+      const value = node.expression.right;
+      const exportName = getExportName(node.expression.left);
+      if (value.type === 'ObjectExpression' && exportName === 'default') {
+        for (const prop of value.properties) {
+          if (!prop.key) continue;
+          const propName = prop.key.name || prop.key.value || 'unknown';
+          let funcBody = getFunctionBody(prop.value);
+          // Shorthand property referencing a FunctionDeclaration: { reportData }
+          if (!funcBody && prop.value.type === 'Identifier' && localFuncDefs[prop.value.name]) {
+            const fnNode = localFuncDefs[prop.value.name];
+            funcBody = fnNode.body && fnNode.body.type === 'BlockStatement' ? fnNode.body.body : null;
+          }
+          if (funcBody) {
+            const sink = bodyHasSink(funcBody);
+            if (sink) {
+              sinkExports[propName] = { hasSink: true, sink };
+            }
+          }
+        }
+      } else {
+        const funcBody = getFunctionBody(value);
+        if (funcBody) {
+          const sink = bodyHasSink(funcBody);
+          if (sink) {
+            sinkExports[exportName] = { hasSink: true, sink };
+          }
+        }
+      }
+    }
+    // export function foo() { ... sink ... }
+    if (node.type === 'ExportNamedDeclaration' && node.declaration) {
+      const decl = node.declaration;
+      if (decl.type === 'FunctionDeclaration' && decl.id) {
+        const funcBody = decl.body && decl.body.type === 'BlockStatement' ? decl.body.body : null;
+        if (funcBody) {
+          const sink = bodyHasSink(funcBody);
+          if (sink) {
+            sinkExports[decl.id.name] = { hasSink: true, sink };
+          }
+        }
+      }
+    }
+  });
+  return sinkExports;
+}
+/**
+ * Detect callback-based cross-file flows.
+ * Pattern: file imports tainted source fn + sink fn, connects them via callback.
+ * Example: readConfig((err, data) => { sendData(data); })
+ * Also: const data = readConfig(); sendData(data);
+ */
+function detectCallbackCrossFileFlows(graph, taintedExports, sinkExports, packagePath) {
+  const expandedTaint = expandTaintThroughReexports(graph, taintedExports, packagePath);
+  const flows = [];
+  for (const relFile of Object.keys(graph)) {
+    const absFile = path.resolve(packagePath, relFile);
+    const ast = parseFile(absFile);
+    if (!ast) continue;
+    const fileDir = path.dirname(absFile);
+    // Collect imported tainted source functions and imported sink functions
+    const importedSources = {}; // varName → { sourceFile, source, detail }
+    const importedSinks = {};   // varName → { sinkFile, sink }
+    walkAST(ast, (node) => {
+      if (node.type !== 'VariableDeclaration') return;
+      for (const decl of node.declarations) {
+        if (!decl.init || !decl.id) continue;
+        // const { readConfig } = require('./reader')
+        if (isRequireCall(decl.init) && isLocalImport(decl.init.arguments[0].value)) {
+          const spec = decl.init.arguments[0].value;
+          const resolved = resolveLocal(fileDir, spec, packagePath);
+          if (!resolved) continue;
+          if (decl.id.type === 'ObjectPattern') {
+            for (const prop of decl.id.properties) {
+              const key = prop.key && (prop.key.name || prop.key.value);
+              const localName = prop.value && prop.value.name;
+              if (!key || !localName) continue;
+              // Check if this is a tainted source export
+              if (expandedTaint[resolved] && expandedTaint[resolved][key] && expandedTaint[resolved][key].tainted) {
+                const t = expandedTaint[resolved][key];
+                importedSources[localName] = {
+                  sourceFile: t.sourceFile || resolved,
+                  source: t.source,
+                  detail: t.detail || ''
+                };
+              }
+              // Check if this is a sink export
+              if (sinkExports[resolved] && sinkExports[resolved][key] && sinkExports[resolved][key].hasSink) {
+                importedSinks[localName] = {
+                  sinkFile: resolved,
+                  sink: sinkExports[resolved][key].sink
+                };
+              }
+            }
+          }
+          if (decl.id.type === 'Identifier') {
+            // Whole module import: const reader = require('./reader')
+            // Check default taint
+            if (expandedTaint[resolved] && expandedTaint[resolved]['default'] && expandedTaint[resolved]['default'].tainted) {
+              const t = expandedTaint[resolved]['default'];
+              importedSources[decl.id.name] = {
+                sourceFile: t.sourceFile || resolved,
+                source: t.source,
+                detail: t.detail || ''
+              };
+            }
+            // Check default sink
+            if (sinkExports[resolved] && sinkExports[resolved]['default'] && sinkExports[resolved]['default'].hasSink) {
+              importedSinks[decl.id.name] = {
+                sinkFile: resolved,
+                sink: sinkExports[resolved]['default'].sink
+              };
+            }
+          }
+        }
+      }
+    });
+    // If we have both imported sources and sinks, check for callback connections
+    if (Object.keys(importedSources).length === 0 || Object.keys(importedSinks).length === 0) continue;
+    // Pattern 1: sourceFn(function(err, data) { sinkFn(data); })
+    // Pattern 2: const result = sourceFn(); sinkFn(result);
+    walkAST(ast, (node) => {
+      if (node.type !== 'CallExpression') return;
+      // Check if the call is to an imported source
+      const calleeName = node.callee.type === 'Identifier' ? node.callee.name : null;
+      if (!calleeName || !importedSources[calleeName]) return;
+      // Check if any argument is a callback that calls an imported sink
+      for (const arg of node.arguments) {
+        if (arg.type === 'FunctionExpression' || arg.type === 'ArrowFunctionExpression') {
+          const body = arg.body.type === 'BlockStatement' ? arg.body.body : [arg.body];
+          walkAST({ type: 'Program', body }, (inner) => {
+            if (inner.type !== 'CallExpression') return;
+            const innerCallee = inner.callee.type === 'Identifier' ? inner.callee.name : null;
+            if (innerCallee && importedSinks[innerCallee]) {
+              const src = importedSources[calleeName];
+              const snk = importedSinks[innerCallee];
+              // Avoid duplicates
+              const key = `${src.sourceFile}→${relFile}→${snk.sinkFile}`;
+              if (!flows.some(f => `${f.sourceFile}→${f.sinkFile}→${snk.sinkFile}` === key)) {
+                flows.push({
+                  severity: 'CRITICAL',
+                  type: 'cross_file_dataflow',
+                  sourceFile: src.sourceFile,
+                  source: `${src.source}${src.detail ? '(' + src.detail + ')' : ''}`,
+                  sinkFile: relFile,
+                  sink: snk.sink,
+                  description: `Credential read in ${src.sourceFile} passed via callback to network sink (${snk.sink}) imported from ${snk.sinkFile} in ${relFile}`,
+                });
+              }
+            }
+          });
+        }
+      }
+    });
+  }
+  return flows;
+}
 module.exports = {
   buildModuleGraph, annotateTaintedExports, detectCrossFileFlows,
+  annotateSinkExports, detectCallbackCrossFileFlows,
   resolveLocal, extractLocalImports, parseFile, isLocalImport, toRel, isFileExists,
   tryResolveConcatRequire
 };
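
The callback matching in detectCallbackCrossFileFlows can be illustrated on a hand-built AST without a parser. This is a simplified sketch, not the package's code: walk stands in for walkAST, findCallbackFlows is a hypothetical name, and the source and sink names are passed as plain sets instead of being resolved from imports. The tree encodes readConfig((err, data) => { sendData(data); }) from the docstring above.

```javascript
// Hypothetical stand-in for walkAST: visits every object that has a string `type`.
function walk(node, visit) {
  if (!node || typeof node !== 'object') return;
  if (Array.isArray(node)) { node.forEach((n) => walk(n, visit)); return; }
  if (typeof node.type === 'string') visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

// Finds calls to a source function whose callback argument calls a sink function.
function findCallbackFlows(ast, sourceNames, sinkNames) {
  const hits = [];
  walk(ast, (node) => {
    if (node.type !== 'CallExpression') return;
    if (node.callee.type !== 'Identifier' || !sourceNames.has(node.callee.name)) return;
    for (const arg of node.arguments) {
      if (arg.type !== 'FunctionExpression' && arg.type !== 'ArrowFunctionExpression') continue;
      walk(arg.body, (inner) => {
        if (inner.type === 'CallExpression' && inner.callee.type === 'Identifier' &&
            sinkNames.has(inner.callee.name)) {
          hits.push({ source: node.callee.name, sink: inner.callee.name });
        }
      });
    }
  });
  return hits;
}

// Hand-built ESTree-shaped nodes for: readConfig((err, data) => { sendData(data); })
const id = (name) => ({ type: 'Identifier', name });
const call = (callee, args) => ({ type: 'CallExpression', callee: id(callee), arguments: args });
const ast = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: call('readConfig', [{
      type: 'ArrowFunctionExpression',
      params: [id('err'), id('data')],
      body: {
        type: 'BlockStatement',
        body: [{ type: 'ExpressionStatement', expression: call('sendData', [id('data')]) }]
      }
    }])
  }]
};

const flows = findCallbackFlows(ast, new Set(['readConfig']), new Set(['sendData']));
console.log(flows);
```

As in the hunk above, the match only requires that the callback calls an imported sink; neither version checks that the callback's data parameter is what reaches the sink.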

package/src/scanner/obfuscation.js CHANGED

@@ -20,7 +20,10 @@ function detectObfuscation(targetPath) {
     const pathParts = relativePath.split(path.sep);
     const isInDistOrBuild = pathParts.some(p => p === 'dist' || p === 'build');
     const isLargeCjsMjs = (basename.endsWith('.cjs') || basename.endsWith('.mjs')) && content.length > 100 * 1024;
-    const isPackageOutput = isMinified || isBundled || isInDistOrBuild || isLargeCjsMjs;
+    // P6: Any JS file > 100KB is overwhelmingly bundled output regardless of directory name.
+    // Real obfuscated malware is typically small (<50KB). Catches prettier plugins/, svelte compiler/, etc.
+    const isLargeJs = basename.endsWith('.js') && content.length > 100 * 1024;
+    const isPackageOutput = isMinified || isBundled || isInDistOrBuild || isLargeCjsMjs || isLargeJs;
     // 1. Ratio of code on a single line (skip .min.js: minification, not obfuscation)
     if (!isMinified) {

package/src/scoring.js CHANGED

@@ -108,8 +108,8 @@ const FP_COUNT_THRESHOLDS = {
   require_cache_poison: { maxCount: 3, from: 'CRITICAL', to: 'LOW' },
   suspicious_dataflow: { maxCount: 3, to: 'LOW' },
   obfuscation_detected: { maxCount: 3, to: 'LOW' },
-  module_compile_dynamic: { maxCount: 3, from: 'CRITICAL', to: 'LOW' },
-  module_compile: { maxCount: 3, from: 'CRITICAL', to: 'LOW' },
+  module_compile_dynamic: { maxCount: 3, from: 'HIGH', to: 'LOW' },
+  module_compile: { maxCount: 3, from: 'HIGH', to: 'LOW' },
   zlib_inflate_eval: { maxCount: 2, from: 'CRITICAL', to: 'LOW' },
   // Build tools (webpack, jest) legitimately use vm.runInThisContext for module evaluation
   vm_code_execution: { maxCount: 3, from: 'HIGH', to: 'LOW' },
@@ -118,7 +118,12 @@ const FP_COUNT_THRESHOLDS = {
   // P4: hash algorithms contain bit manipulation that triggers obfuscation heuristics
   js_obfuscation_pattern: { maxCount: 1, from: 'HIGH', to: 'LOW' },
   // P4: bundled credential_tampering from minified alias resolution (jspdf, lerna)
-  credential_tampering: { maxCount: 5, to: 'LOW' }
+  credential_tampering: { maxCount: 5, to: 'LOW' },
+  // B1 FP reduction: bundled code aliases eval/Function (sinon, storybook, vitest)
+  dangerous_call_eval: { maxCount: 3, from: 'MEDIUM', to: 'LOW' },
+  // P6: HTTP client libraries (undici, aws-sdk, nodemailer, jsdom) parse Authorization/Bearer headers
+  // with 5+ credential regexes. Real harvesters use 1-2 targeted regexes.
+  credential_regex_harvest: { maxCount: 4, from: 'HIGH', to: 'LOW' }
 };
 // Types exempt from dist/ downgrade — IOC matches, lifecycle scripts, and
@@ -134,11 +139,25 @@ const DIST_EXEMPT_TYPES = new Set([
   'cross_file_dataflow',      // credential read → network exfil across files
   'staged_eval_decode',       // eval(atob(...)) (explicit payload staging)
   'reverse_shell'             // net.Socket + connect + pipe (always malicious)
+  // P6: remote_code_load and proxy_data_intercept removed — in bundled dist/ files,
+  // fetch + eval co-occurrence is coincidental (bundler combines HTTP client + template compilation).
+  // fetch_decrypt_exec (fetch+decrypt+eval triple) remains exempt — never coincidental.
 ]);
 // Regex matching dist/build/minified/bundled file paths
 const DIST_FILE_RE = /(?:^|[/\\])(?:dist|build)[/\\]|\.min\.js$|\.bundle\.js$/i;
+// Bundler artifact types: get two-notch downgrade in dist/ files (CRITICAL→MEDIUM, HIGH→LOW).
+// These are individual pattern signals that bundlers routinely produce (eval for globalThis,
+// dynamic require for code-splitting, minification obfuscation, etc.)
+const DIST_BUNDLER_ARTIFACT_TYPES = new Set([
+  'dangerous_call_eval', 'dangerous_call_function',
+  'dynamic_require', 'dynamic_import',
+  'obfuscation_detected', 'high_entropy_string', 'possible_obfuscation',
+  'js_obfuscation_pattern', 'vm_code_execution',
+  'module_compile', 'module_compile_dynamic'
+]);
 // Types exempt from reachability downgrade — IOC matches, lifecycle, and package-level types.
 // NOTE: Uses the base IOC/lifecycle exempt set, NOT full DIST_EXEMPT_TYPES.
 // Compound detections (zlib_inflate_eval, staged_eval_decode, etc.) should still be
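
The dist/ downgrade introduced in this release can be sketched as two lookup tables. DIST_FILE_RE and DIST_BUNDLER_ARTIFACT_TYPES are copied from this diff; the exempt set is only the subset of DIST_EXEMPT_TYPES visible here, and the names ONE_NOTCH, TWO_NOTCH and distSeverity are hypothetical. The mapping follows the applyFPReductions hunk in this file: two notches for bundler artifact types, one notch for other non-exempt types.

```javascript
const DIST_FILE_RE = /(?:^|[/\\])(?:dist|build)[/\\]|\.min\.js$|\.bundle\.js$/i;
const DIST_BUNDLER_ARTIFACT_TYPES = new Set([
  'dangerous_call_eval', 'dangerous_call_function',
  'dynamic_require', 'dynamic_import',
  'obfuscation_detected', 'high_entropy_string', 'possible_obfuscation',
  'js_obfuscation_pattern', 'vm_code_execution',
  'module_compile', 'module_compile_dynamic'
]);
// Only the entries of DIST_EXEMPT_TYPES visible in this diff; the real set is longer.
const DIST_EXEMPT_SUBSET = new Set(['cross_file_dataflow', 'staged_eval_decode', 'reverse_shell']);

const ONE_NOTCH = { CRITICAL: 'HIGH', HIGH: 'MEDIUM', MEDIUM: 'LOW' };
const TWO_NOTCH = { CRITICAL: 'MEDIUM', HIGH: 'LOW', MEDIUM: 'LOW' };

// Returns the severity after the dist/ downgrade, without mutating the threat.
function distSeverity(threat) {
  const inDist = Boolean(threat.file) && DIST_FILE_RE.test(threat.file);
  if (!inDist || DIST_EXEMPT_SUBSET.has(threat.type)) return threat.severity;
  const table = DIST_BUNDLER_ARTIFACT_TYPES.has(threat.type) ? TWO_NOTCH : ONE_NOTCH;
  return table[threat.severity] || threat.severity;
}

console.log(distSeverity({ type: 'dangerous_call_eval', severity: 'CRITICAL', file: 'dist/index.js' })); // MEDIUM
console.log(distSeverity({ type: 'env_access', severity: 'HIGH', file: 'dist/index.js' }));               // MEDIUM
console.log(distSeverity({ type: 'reverse_shell', severity: 'CRITICAL', file: 'dist/index.js' }));        // CRITICAL
```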
@@ -244,13 +263,23 @@ function applyFPReductions(threats, reachableFiles, packageName) {
       }
     }
-    // Dist/build/minified files: bundler artifacts get severity downgraded one notch.
-    // Reduced from two-notch (audit fix): 2-notch made dist/ attacks invisible (CRITICAL→MEDIUM=3pts).
+    // Dist/build/minified files: severity downgrade for bundler output.
     // Compound detections are exempt (DIST_EXEMPT_TYPES).
+    // Bundler artifact types (eval, dynamic_require, obfuscation) get two-notch downgrade
+    // (CRITICAL→MEDIUM, HIGH→LOW) since bundlers routinely produce these patterns.
+    // Other non-exempt types keep one-notch downgrade.
     if (t.file && !DIST_EXEMPT_TYPES.has(t.type) && DIST_FILE_RE.test(t.file)) {
-      if (t.severity === 'CRITICAL') t.severity = 'HIGH';
-      else if (t.severity === 'HIGH') t.severity = 'MEDIUM';
-      else if (t.severity === 'MEDIUM') t.severity = 'LOW';
+      if (DIST_BUNDLER_ARTIFACT_TYPES.has(t.type)) {
+        // Two-notch downgrade for bundler artifacts
+        if (t.severity === 'CRITICAL') t.severity = 'MEDIUM';
+        else if (t.severity === 'HIGH') t.severity = 'LOW';
+        else if (t.severity === 'MEDIUM') t.severity = 'LOW';
+      } else {
+        // One-notch downgrade for other non-exempt types
+        if (t.severity === 'CRITICAL') t.severity = 'HIGH';
+        else if (t.severity === 'HIGH') t.severity = 'MEDIUM';
+        else if (t.severity === 'MEDIUM') t.severity = 'LOW';
+      }
     }
     // Reachability: findings in files not reachable from entry points → LOW