npm - muaddib-scanner - Versions diffs - 2.11.76 → 2.11.78 - Mend

muaddib-scanner 2.11.76 → 2.11.78

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/.githooks/pre-commit +18 -0
package/README.md +15 -6
package/bin/muaddib.js +18 -4
package/package.json +1 -2
package/{self-scan-v2.11.76.json → self-scan-v2.11.78.json} +1 -1
package/src/commands/interactive.js +5 -6
package/src/commands/safe-install.js +19 -19
package/src/ioc/scraper.js +46 -10
package/src/monitor/daemon.js +39 -28
package/src/monitor/ingestion.js +32 -2
package/src/monitor/queue.js +84 -21
package/src/monitor/scan-queue.js +68 -1
package/src/monitor/state.js +24 -1
package/src/monitor/webhook.js +32 -11
package/src/output/formatter.js +3 -4
package/src/pipeline/executor.js +9 -1
package/src/runtime/daemon.js +27 -28
package/src/runtime/watch.js +7 -7
package/src/sandbox/index.js +11 -9
package/src/scanner/temporal-analysis.js +8 -0
package/src/scanner/temporal-ast-diff.js +5 -0
package/src/utils.js +60 -1
package/.dockerignore +0 -7
package/.env.example +0 -43
package/ml-retrain/auto-labeler/auto_labeler.py +0 -312
package/ml-retrain/auto-labeler/ghsa_checker.py +0 -169
package/ml-retrain/auto-labeler/labeler.py +0 -256
package/ml-retrain/auto-labeler/npm_checker.py +0 -228
package/ml-retrain/auto-labeler/ossf_index.py +0 -178
package/ml-retrain/auto-labeler/requirements.txt +0 -1
package/ml-retrain/confusion-matrix.png +0 -0
package/ml-retrain/model-trees-retrained.js +0 -12
package/ml-retrain/retrain-report.json +0 -225
package/ml-retrain/retrain.py +0 -974
package/sbom.json +0 -0
package/src/ml/train-bundler-detector.py +0 -725
package/src/ml/train-xgboost.py +0 -957
package/tools/export-model-js.py +0 -160
package/tools/requirements-ml.txt +0 -5
package/tools/train-classifier.py +0 -333

package/.githooks/pre-commit ADDED

@@ -0,0 +1,18 @@
+#!/bin/sh
+# Pre-commit guard — block committing a typosquat/forbidden dependency
+# (loadash, lodash, lodahs, …). Local mirror of the CI gate
+# (scripts/check-deps-typosquats.js, run in the `test` job of .github/workflows/scan.yml)
+# and defense-in-depth with the package.json `preinstall` denylist.
+#
+# Enable once per clone:  git config core.hooksPath .githooks
+#
+# This repo uses NEITHER lodash NOR loadash (CLAUDE.md interdiction). The loadash
+# typosquat has re-entered package.json repeatedly; this stops it at commit time.
+node scripts/check-deps-typosquats.js
+status=$?
+if [ "$status" -ne 0 ]; then
+  echo "pre-commit: ABORTED — remove the typosquat dependency from package.json (see above)." >&2
+  exit 1
+fi
+exit 0
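
The hook above delegates to `scripts/check-deps-typosquats.js`, which is not included in this diff. The sketch below is one way such a check could look, assuming a static denylist built from the names in the hook comment. `FORBIDDEN` and `findForbiddenDeps` are hypothetical names, not the package's actual implementation.

```javascript
// Hypothetical sketch: the real scripts/check-deps-typosquats.js is not shown in this diff.
// FORBIDDEN is an assumed denylist taken from the names listed in the hook comment.
const FORBIDDEN = new Set(['loadash', 'lodash', 'lodahs']);

// Return every forbidden dependency found in a parsed package.json object.
function findForbiddenDeps(pkg) {
  const sections = ['dependencies', 'devDependencies', 'optionalDependencies', 'peerDependencies'];
  const hits = [];
  for (const section of sections) {
    for (const name of Object.keys(pkg[section] || {})) {
      if (FORBIDDEN.has(name)) hits.push({ section, name });
    }
  }
  return hits;
}
```

A command-line wrapper would read `package.json`, call the function, and exit non-zero when the result is non-empty. That exit status is what the hook inspects through `$?`.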

package/README.md CHANGED

@@ -30,7 +30,7 @@
 npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
-MUAD'DIB combines **20 parallel scanners** (262 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is **currently inactive** (see [Evaluation Metrics](#evaluation-metrics) → ML Classifier section).
+MUAD'DIB combines **20 parallel scanners** (264 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is **currently inactive** (see [Evaluation Metrics](#evaluation-metrics) → ML Classifier section).
 ---
@@ -202,9 +202,9 @@ muaddib replay                     # Ground truth validation (90/94 TPR@3, v2.11
 | Python Source (PYSRC) | Import-time / install-time RCE patterns in `__init__.py` / `setup.py` (v2.11.41 — closes TrapDoor PyPI gap) |
 | Python AST (PYAST) | Tree-sitter-Python AST with taint-aware detectors (v2.11.42+) |
-### 259 detection rules
+### 264 detection rules
-All rules (254 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21147) for the complete rules reference.
+All rules (259 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21176) for the complete rules reference.
 ### Detected campaigns
@@ -278,7 +278,7 @@ With pre-commit framework:
 ```yaml
 repos:
   - repo: https://github.com/DNSZLSK/muad-dib
-    rev: v2.11.48
+    rev: v2.11.76
     hooks:
       - id: muaddib-scan
 ```
@@ -303,11 +303,20 @@ These are the numbers a user gets when running `muaddib scan` against npm or PyP
 | **FPR PyPI** (v2.11.48, first honest measurement) | **9.68%** (12/124 scanned, 132 total) | **Track D fixed the PyPI downloader** — removed `pip --no-binary :all:` flag (forced compile of wheel-only packages, timed out 38% of the time) + added `.whl` extraction via `extractArchive()`. Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s `PACK_TIMEOUT_MS`. |
 | **ADR** (Adversarial + Holdout, v2.11.48) | **96.26%** (103/107) | 67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95. |
-**3969 tests** across 109 files. **262 rules** (257 RULES + 5 PARANOID — Track D added 3: AST-093, AST-094, COMPOUND-016).
+**4132 tests** across 115 files. **264 rules** (259 RULES + 5 PARANOID; v2.11.67/70 Phantom Gyp added PKG-023 + COMPOUND-017).
 **Known issues (v2.11.48):**
 - *Cap PyPI à 35/100*: Python samples plafonnent à `riskScore=35` even when `globalRiskScore=100`. Confirmed empirically — all 12 PyPI FPs at score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI MALWARE detection at higher thresholds. Track E target.
+### Operational coverage (v2.11.67-76)
+The static ground-truth TPR above is measured offline. Since v2.11.67 the monitor also tracks **operational** coverage on live npm/PyPI ingestion:
+- A per-scan **ledger** (`data/scan-ledger.jsonl`) records every scanned package's outcome; `computeLedgerRollup()` produces a 24h rollup (`alertRate`, per-ecosystem). Note: `alertRate` is a throughput signal, **not** detection TPR.
+- An active **GHSA poller** (~15 min; npm, pypi, crates) builds an authoritative "what should we have caught" denominator (`data/ghsa-malware.jsonl`), plus a **feed-health** alarm that fires when an IOC feed silently goes dark.
+- The Phase 5 **coverage-audit** (`scripts/coverage-audit.js`, daily 05:00 UTC) joins that denominator against ledger outcomes + the tarball archive to compute an honest GHSA-denominated **operational TPR** (`alerted / total`), and surfaces `scannedClean` misses as human-gated ground-truth candidates.
+This operational TPR is the real production detection rate, distinct from the static GT TPR (which has not been re-measured since v2.11.48).
 ### ML Classifier (offline only)
 `src/ml/classifier.js` is **not wired into `muaddib scan`**. The XGBoost model is currently exercised only by `muaddib evaluate` (offline metric replay) and `muaddib monitor` (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see `src/monitor/queue.js:628`). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.
@@ -371,7 +380,7 @@ npm test
 ### Testing
-- **3913 tests** across 109 modular test files
+- **4132 tests** across 115 modular test files
 - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
 - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
 - **Ground truth validation** - 96 real-world attacks (95.74% TPR@3, 88.30% TPR@20 — v2.11.48 full measure on 94 in-scope)

package/bin/muaddib.js CHANGED

@@ -31,6 +31,23 @@ const { diff, showRefs } = require('../src/diff.js');
 const { initHooks, removeHooks } = require('../src/hooks-init.js');
 const { showHelp, commandHelp } = require('../src/commands/help.js');
 const { interactiveMenu } = require('../src/commands/interactive.js');
+const { isPromptCancellation } = require('../src/utils.js');
+// Global safety net: turn an unhandled async error into a clean one-line message
+// instead of a raw stack trace. Ctrl-C inside an interactive prompt exits 130
+// (POSIX SIGINT convention); any other error exits 1. Set MUADDIB_DEBUG=1 to see
+// the full stack.
+function handleFatal(err) {
+  if (isPromptCancellation(err)) {
+    console.log('\nCancelled.');
+    process.exit(130);
+  }
+  console.error('[ERROR]', err && err.message ? err.message : String(err));
+  if (process.env.MUADDIB_DEBUG && err && err.stack) console.error(err.stack);
+  process.exit(1);
+}
+process.on('unhandledRejection', handleFatal);
+process.on('uncaughtException', handleFatal);
 const args = process.argv.slice(2);
 const command = args[0];
@@ -274,10 +291,7 @@ if (command === 'version' || command === '--version' || command === '-v') {
   if (command === '--help' || command === '-h') {
     showHelp();
   }
-  interactiveMenu().catch(err => {
-    console.error('[ERROR]', err.message);
-    process.exit(1);
-  });
+  interactiveMenu().catch(handleFatal);
 } else if (command === 'scan') {
   if (wantHelp) showHelp('scan');
   run(target, {
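
The decision made by `handleFatal` can be isolated as a pure function, which makes the exit-code mapping easy to test without terminating the process. The sketch below assumes that prompt cancellation is recognisable by the error name `ExitPromptError` (the name `@inquirer/prompts` uses for a Ctrl-C abort). The real `isPromptCancellation` lives in `src/utils.js` and is not shown here, so both functions are illustrative.

```javascript
// Assumption: a cancelled prompt surfaces as an error named 'ExitPromptError'.
// The actual isPromptCancellation() in src/utils.js may check more than this.
function isPromptCancellation(err) {
  return Boolean(err) && err.name === 'ExitPromptError';
}

// Map an error to the exit code and one-line message described in the diff:
// 130 for a cancelled prompt (POSIX SIGINT convention), 1 for anything else.
function classifyFatal(err) {
  if (isPromptCancellation(err)) {
    return { exitCode: 130, message: 'Cancelled.' };
  }
  const message = err && err.message ? err.message : String(err);
  return { exitCode: 1, message: '[ERROR] ' + message };
}
```

A handler registered on `unhandledRejection` and `uncaughtException` would then print `message` and call `process.exit(exitCode)`.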

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "muaddib-scanner",
-  "version": "2.11.76",
+  "version": "2.11.78",
   "description": "Supply-chain threat detection & response for npm & PyPI/Python",
   "main": "src/index.js",
   "bin": {
@@ -52,7 +52,6 @@
     "acorn-walk": "8.3.5",
     "adm-zip": "0.5.17",
     "js-yaml": "4.2.0",
-    "loadash": "^1.0.0",
     "web-tree-sitter": "^0.26.9"
   },
   "devDependencies": {

package/{self-scan-v2.11.76.json → self-scan-v2.11.78.json} RENAMED

@@ -1,6 +1,6 @@
 {
   "target": "node_modules",
-  "timestamp": "2026-06-07T19:47:48.330Z",
+  "timestamp": "2026-06-08T18:21:33.065Z",
   "threats": [
     {
       "type": "string_mutation_obfuscation",

package/src/commands/interactive.js CHANGED

@@ -8,16 +8,15 @@ const { safeInstall } = require('../safe-install.js');
 const { buildSandboxImage, runSandbox, generateNetworkReport } = require('../sandbox/index.js');
 const { diff } = require('../diff.js');
 const { initHooks } = require('../hooks-init.js');
+const { banner } = require('../utils.js');
 async function interactiveMenu() {
   const { select, input, confirm } = await import('@inquirer/prompts');
-  console.log(`
-  ╔═══════════════════════════════════════════════╗
-  ║   MUAD'DIB - npm & PyPI Supply Chain Hunter  ║
-  ║   "The worms must die."                      ║
-  ╚═══════════════════════════════════════════════╝
-  `);
+  console.log('\n' + banner([
+    "MUAD'DIB - npm & PyPI Supply Chain Hunter",
+    '"The worms must die."'
+  ]) + '\n');
   const action = await select({
     message: 'What do you want to do?',

package/src/commands/safe-install.js CHANGED

@@ -1,8 +1,14 @@
 const fs = require('fs');
 const path = require('path');
-const { spawnSync } = require('child_process');
+// NB: keep the module object (cp.spawnSync), NOT a destructured `spawnSync`. Destructuring
+// captures the original reference at load time, which makes the function impossible to mock
+// from tests (`cp.spawnSync = ...` wouldn't be seen here) — that's exactly how the
+// safe-install test's mock silently failed and a real `npm install loadash` ran every test
+// run, re-contaminating package.json. Property access keeps it interceptable.
+const cp = require('child_process');
 const { loadCachedIOCs } = require('../ioc/updater.js');
 const { REHABILITATED_PACKAGES, NPM_PACKAGE_REGEX } = require('../shared/constants.js');
+const { banner } = require('../utils.js');
 /**
  * Validates that a package name is safe (no command injection)
@@ -172,7 +178,7 @@ async function scanPackageRecursive(pkg, depth = 0, maxDepth = 3) {
   // Get the package info (uses spawnSync to avoid injection)
   let pkgInfo;
   try {
-    const result = spawnSync('npm', ['view', pkgName, '--json'], { encoding: 'utf8', shell: false });
+    const result = cp.spawnSync('npm', ['view', pkgName, '--json'], { encoding: 'utf8', shell: false });
     if (result.status !== 0 || !result.stdout) {
       if (depth === 0) console.log(`[!] Package ${pkgName} not found on npm`);
       return { safe: false, package: pkgName, reason: 'npm_unreachable', source: 'npm-registry', description: 'Package not found on npm registry', depth };
@@ -207,12 +213,10 @@ async function scanPackageRecursive(pkg, depth = 0, maxDepth = 3) {
 async function safeInstall(packages, options = {}) {
   const { isDev, isGlobal, force } = options;
-  console.log(`
-╔══════════════════════════════════════════╗
-║   MUAD'DIB Safe Install                  ║
-║   Scanning packages + dependencies...    ║
-╚══════════════════════════════════════════╝
-`);
+  console.log('\n' + banner([
+    "MUAD'DIB Safe Install",
+    'Scanning packages + dependencies...'
+  ]) + '\n');
   // Reset the cache for each install
   scannedPackages.clear();
@@ -221,11 +225,7 @@ async function safeInstall(packages, options = {}) {
     const result = await scanPackageRecursive(pkg);
     if (!result.safe) {
-      console.log(`
-╔══════════════════════════════════════════╗
-║   [!] MALICIOUS PACKAGE DETECTED         ║
-╚══════════════════════════════════════════╝
-`);
+      console.log('\n' + banner(['[!] MALICIOUS PACKAGE DETECTED']) + '\n');
       if (result.depth > 0) {
         console.log(`Requested package: ${pkg}`);
         console.log(`Malicious dependency: ${result.package} (depth: ${result.depth})`);
@@ -240,11 +240,11 @@ async function safeInstall(packages, options = {}) {
         console.log('[!] Installation BLOCKED.');
         return { blocked: true, package: result.package, threats: [{ type: 'known_malicious', severity: 'CRITICAL', message: result.description }] };
       } else {
-        console.log('╔══════════════════════════════════════════╗');
-        console.log('║   [!!!] WARNING: FORCE INSTALL ACTIVE    ║');
-        console.log('║   Known malicious package detected!       ║');
-        console.log('║   Installing despite security threats.    ║');
-        console.log('╚══════════════════════════════════════════╝');
+        console.log(banner([
+          '[!!!] WARNING: FORCE INSTALL ACTIVE',
+          'Known malicious package detected!',
+          'Installing despite security threats.'
+        ]));
         console.log('[AUDIT] Force-install override for malicious package: ' + result.package);
         // SFI-004: Write audit log for force-install overrides
@@ -276,7 +276,7 @@ async function safeInstall(packages, options = {}) {
   if (isDev) npmArgs.push('--save-dev');
   if (isGlobal) npmArgs.push('-g');
-  const result = spawnSync('npm', npmArgs, { stdio: 'inherit', shell: false });
+  const result = cp.spawnSync('npm', npmArgs, { stdio: 'inherit', shell: false });
   if (result.status !== 0) {
     console.log('');

package/src/ioc/scraper.js CHANGED

@@ -7,9 +7,10 @@ const AdmZip = require('adm-zip');
 const IOC_FILE = path.join(__dirname, 'data/iocs.json');
 const COMPACT_IOC_FILE = path.join(__dirname, 'data/iocs-compact.json');
 const HOME_IOC_FILE = path.join(os.homedir(), '.muaddib', 'data', 'iocs.json');
-const { generateCompactIOCs, NEVER_WILDCARD } = require('./updater.js');
+const { generateCompactIOCs, NEVER_WILDCARD, expandCompactIOCs } = require('./updater.js');
 const { Spinner } = require('../utils.js');
 const { NPM_PACKAGE_REGEX } = require('../shared/constants.js');
+const { version: PKG_VERSION } = require('../../package.json');
 // Version format validation (semver-like + wildcard)
 // Permissive version validator — accepts:
@@ -43,6 +44,8 @@ const VERSION_RE = { test: isValidVersion };
 let _noVersionSkipCount = 0;
 let _invalidVersionSkipCount = 0;
 let _invalidVersionSamples = [];  // first 3 samples for context
+let _invalidNameSkipCount = 0;
+let _invalidNameSamples = [];  // first 3 samples for context
 /**
  * Validate an IOC package entry before insertion.
@@ -53,14 +56,18 @@ function validateIOCEntry(pkgName, version, ecosystem) {
   // npm: validate with NPM_PACKAGE_REGEX
   if (ecosystem === 'npm' || !ecosystem) {
     if (!NPM_PACKAGE_REGEX.test(pkgName)) {
-      console.warn(`[WARN] Invalid ${ecosystem || 'npm'} package name skipped: ${pkgName}`);
+      // Aggregated counter (summary emitted by runScraper) — was a per-line
+      // console.warn that spammed 100+ lines on feeds carrying non-spec names.
+      _invalidNameSkipCount++;
+      if (_invalidNameSamples.length < 3) _invalidNameSamples.push(pkgName);
       return false;
     }
   }
   // PyPI: basic check — no path traversal, no slashes
   if (ecosystem === 'pypi') {
     if (/[/\\]|\.\./.test(pkgName)) {
-      console.warn(`[WARN] Invalid PyPI package name skipped: ${pkgName}`);
+      _invalidNameSkipCount++;
+      if (_invalidNameSamples.length < 3) _invalidNameSamples.push(pkgName);
       return false;
     }
   }
@@ -955,14 +962,16 @@ async function scrapeGitHubAdvisory() {
 // ============================================
 async function runScraper() {
   console.log('\n' + '='.repeat(60));
-  console.log('  MUAD\'DIB IOC Scraper v4.1');
-  console.log('  OSV + OSSF + GenSecAI + DataDog + Aikido + OSM');
+  console.log('  MUAD\'DIB IOC Scraper v' + PKG_VERSION);
+  console.log('  OSV + OSSF + GitHub Advisory + GenSecAI + DataDog + Aikido + OSM');
   console.log('='.repeat(60) + '\n');
   // Reset aggregated warning counters
   _noVersionSkipCount = 0;
   _invalidVersionSkipCount = 0;
   _invalidVersionSamples = [];
+  _invalidNameSkipCount = 0;
+  _invalidNameSamples = [];
   // Create data directory if needed
   const dataDir = path.dirname(IOC_FILE);
@@ -997,11 +1006,28 @@ async function runScraper() {
     }
   }
+  // Fresh-install fallback: the full iocs.json is gitignored / not shipped in the
+  // npm package, but the compact baseline IS. Seed from it (the same path that
+  // `muaddib update` uses) so a first scrape augments the shipped baseline instead
+  // of appearing to start from zero and re-downloading everything.
+  let seededFromCompact = false;
+  if (existingIOCs.packages.length === 0 && fs.existsSync(COMPACT_IOC_FILE)) {
+    try {
+      const compactData = JSON.parse(fs.readFileSync(COMPACT_IOC_FILE, 'utf8'));
+      existingIOCs = expandCompactIOCs(compactData);
+      if (!existingIOCs.pypi_packages) existingIOCs.pypi_packages = [];
+      seededFromCompact = true;
+    } catch {
+      console.log('[WARN] Compact IOC baseline unreadable, starting fresh');
+    }
+  }
   const initialCount = existingIOCs.packages.length;
   const initialPyPICount = existingIOCs.pypi_packages.length;
   const initialHashCount = existingIOCs.hashes ? existingIOCs.hashes.length : 0;
-  console.log('[INFO] Existing IOCs: ' + initialCount + ' packages, ' + initialHashCount + ' hashes\n');
+  const baselineLabel = seededFromCompact ? 'Baseline IOCs loaded (shipped compact)' : 'Existing IOCs';
+  console.log('[INFO] ' + baselineLabel + ': ' + initialCount + ' packages, ' + initialHashCount + ' hashes\n');
   // Phase 1: OSV data dump first (bulk, primary source)
   // This returns knownIds so OSSF can skip already-known entries
@@ -1037,6 +1063,12 @@ async function runScraper() {
       : '';
     console.log('[SCRAPER] WARN: ' + _invalidVersionSkipCount + ' entries skipped (malformed version)' + samples);
   }
+  if (_invalidNameSkipCount > 0) {
+    const nameSamples = _invalidNameSamples.length > 0
+      ? ' (samples: ' + _invalidNameSamples.join(', ') + ')'
+      : '';
+    console.log('[SCRAPER] WARN: ' + _invalidNameSkipCount + ' invalid package names skipped' + nameSamples);
+  }
   // Merge all scraped packages
   const allPackages = [
@@ -1297,12 +1329,13 @@ async function runScraper() {
     console.log('     - ' + source + ': ' + count);
   }
-  // Target check
+  // Sanity check: a drop vs the previously-loaded baseline is a real signal of a
+  // feed outage or a corrupted merge — surface that instead of a meaningless target.
   const total = existingIOCs.packages.length;
-  if (total >= 5000) {
-    console.log('\n  [OK] Target reached: ' + total + ' IOCs (>= 5000)');
+  if (total < initialCount) {
+    console.log('\n  [WARN] IOC count decreased: ' + total + ' (was ' + initialCount + ') — possible source outage');
   } else {
-    console.log('\n  [WARN] Target NOT reached: ' + total + ' IOCs (< 5000)');
+    console.log('\n  [OK] IOC database: ' + total + ' npm IOCs');
   }
   console.log('\n');
@@ -1606,6 +1639,8 @@ async function queryOSVBatch(packageNames) {
 // Test helpers for aggregated warning counters
 function getNoVersionSkipCount() { return _noVersionSkipCount; }
 function resetNoVersionSkipCount() { _noVersionSkipCount = 0; }
+function getInvalidNameSkipCount() { return _invalidNameSkipCount; }
+function resetInvalidNameSkipCount() { _invalidNameSkipCount = 0; _invalidNameSamples = []; }
 /**
  * Source-aware confidence: a package reported by N distinct feeds is more
@@ -1643,6 +1678,7 @@ module.exports = {
   createFreshness, isAllowedRedirect,
   validateIOCEntry,
   getNoVersionSkipCount, resetNoVersionSkipCount,
+  getInvalidNameSkipCount, resetInvalidNameSkipCount,
   CONFIDENCE_ORDER, ALLOWED_REDIRECT_DOMAINS,
   MAX_ENTRY_UNCOMPRESSED, MAX_TOTAL_UNCOMPRESSED
 };
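
The scraper change swaps a per-entry `console.warn` for a counter plus the first three samples, reported once at the end of the run. The same pattern can be written as a small reusable helper. `createSkipAggregator` is a hypothetical name; the package itself uses module-level variables (`_invalidNameSkipCount`, `_invalidNameSamples`) instead.

```javascript
// Hypothetical helper illustrating the count-plus-samples pattern from the diff.
function createSkipAggregator(label, maxSamples = 3) {
  let count = 0;
  let samples = [];
  return {
    // Count every skipped value, but keep only the first few as context.
    record(value) {
      count++;
      if (samples.length < maxSamples) samples.push(value);
    },
    // One summary line for the whole run, or null when nothing was skipped.
    summary() {
      if (count === 0) return null;
      const tail = samples.length > 0 ? ' (samples: ' + samples.join(', ') + ')' : '';
      return '[SCRAPER] WARN: ' + count + ' ' + label + tail;
    },
    reset() {
      count = 0;
      samples = [];
    }
  };
}
```

Calling `reset()` at the start of each run mirrors the counter reset in `runScraper()`.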

package/src/monitor/daemon.js CHANGED

@@ -4,15 +4,17 @@ const path = require('path');
 const os = require('os');
 const v8 = require('v8');
 const { isDockerAvailable, SANDBOX_CONCURRENCY_MAX, killAllSandboxContainers } = require('../sandbox/index.js');
+const { banner } = require('../utils.js');
 const { setVerboseMode, isSandboxEnabled, isCanaryEnabled, isLlmDetectiveEnabled, getLlmDetectiveMode, DOWNLOADS_CACHE_TTL } = require('./classify.js');
-const { loadState, saveState, loadDailyStats, saveDailyStats, purgeTarballCache, getParisHour, atomicWriteFileSync, saveNpmSeq, ALERTS_FILE, runStateMigrations, loadRecentlyScanned, saveRecentlyScanned } = require('./state.js');
+const { loadState, saveState, loadDailyStats, saveDailyStats, purgeTarballCache, isDailyReportDue, atomicWriteFileSync, saveNpmSeq, ALERTS_FILE, runStateMigrations, loadRecentlyScanned, saveRecentlyScanned } = require('./state.js');
 const { isTemporalEnabled, isTemporalAstEnabled, isTemporalPublishEnabled, isTemporalMaintainerEnabled } = require('./temporal.js');
-const { pendingGrouped, flushScopeGroup, sendDailyReport, DAILY_REPORT_HOUR, alertedPackageRules, ALERTED_PACKAGES_MAX: MAX_ALERTED_PACKAGES } = require('./webhook.js');
+const { pendingGrouped, flushScopeGroup, sendDailyReport, alertedPackageRules, ALERTED_PACKAGES_MAX: MAX_ALERTED_PACKAGES } = require('./webhook.js');
 const { poll, getPollBackoffMs } = require('./ingestion.js');
 const { ensureWorkers, drainWorkers, getTargetConcurrency, setTargetConcurrency, getActiveWorkers, terminateAllWorkers } = require('./queue.js');
 const { computeTarget, ADJUST_INTERVAL_MS, BASE_CONCURRENCY } = require('./adaptive-concurrency.js');
 const { startHealthcheck } = require('./healthcheck.js');
 const { startDeferredWorker, stopDeferredWorker, persistDeferredQueue, restoreDeferredQueue, clearDeferredQueue } = require('./deferred-sandbox.js');
+const { evictFromScanQueueBulk } = require('./scan-queue.js');
 const { startGhsaPoller, stopGhsaPoller } = require('../ioc/ghsa-poller.js');
 const { cleanupOldArchives, getRetentionDays, startPeriodicCleanup } = require('./tarball-archive.js');
 const { clearMetadataCache } = require('../scanner/temporal-analysis.js');
@@ -532,8 +534,13 @@ function pruneMemoryCaches(recentlyScanned, downloadsCache, alertedPackageRules)
  * Worker spawning is gated separately in the main loop (ensureWorkers skipped at HIGH+).
  * Ingestion is gated in ingestion.js via getMemoryPressureLevel() (skipped at CRITICAL+).
  */
-function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, scanQueue) {
+function handleMemoryPressure(level, ratio, rssRatio, recentlyScanned, downloadsCache, scanQueue, stats) {
   const pct = (ratio * 100).toFixed(0);
+  // Show BOTH arms: an EMERGENCY almost always fires on RSS (off-heap — gVisor containers,
+  // tarball buffers) while the heap sits low (~15%). Logging only heap made every breaker
+  // line read "heap at 15%" and hid the real cause; memPctLabel surfaces which arm tripped.
+  const rssPct = (rssRatio != null && isFinite(rssRatio)) ? (rssRatio * 100).toFixed(0) : '?';
+  const memPctLabel = `heap ${pct}% / rss ${rssPct}%`;
   // Structured summary of what the breaker actually did this tick. Returned (the poll loop
   // at the call site ignores it) so the reclaim is observable to callers and tests without
   // scraping console output — CLAUDE.md §3 "Toujours logger un resume". The two kill fields
@@ -543,7 +550,7 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
   // HIGH (85%+): clear auxiliary caches — same as old emergency prune
   if (level >= MEMORY_PRESSURE_LEVELS.HIGH) {
-    console.error(`[MONITOR] MEMORY PRESSURE HIGH: heap at ${pct}% — pruning caches, stopping new workers`);
+    console.error(`[MONITOR] MEMORY PRESSURE HIGH: ${memPctLabel} — pruning caches, stopping new workers`);
     recentlyScanned.clear();
     downloadsCache.clear();
     alertedPackageRules.clear();
@@ -552,7 +559,7 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
   // CRITICAL (90%+): clear scanner caches, force GC
   if (level >= MEMORY_PRESSURE_LEVELS.CRITICAL) {
-    console.error(`[MONITOR] MEMORY PRESSURE CRITICAL: heap at ${pct}% — stopping ingestion, clearing scanner caches`);
+    console.error(`[MONITOR] MEMORY PRESSURE CRITICAL: ${memPctLabel} — stopping ingestion, clearing scanner caches`);
     // temporal-analysis._metadataCache (200 entries × full npm registry metadata)
     try { clearMetadataCache(); } catch {}
     // typosquat metadataCache (500 entries × npm registry metadata for typosquat scoring)
@@ -578,21 +585,30 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
   if (level >= MEMORY_PRESSURE_LEVELS.EMERGENCY) {
     const queueBefore = scanQueue.length;
     if (queueBefore > EMERGENCY_QUEUE_KEEP) {
-      // Keep the LAST N items (most recently added = newest packages).
-      // These are the packages most likely to still exist on npm for re-scan later.
-      // Dropped items are public packages — they'll appear again on republish or
-      // can be re-fetched from the registry if needed.
-      const dropped = queueBefore - EMERGENCY_QUEUE_KEEP;
-      // splice from the front: older items were pushed first
-      scanQueue.splice(0, dropped);
+      // Protected-aware bulk eviction — SINGLE SOURCE OF TRUTH with the queue-cap path
+      // (scan-queue.js evictFromScanQueueBulk / enqueueScan share _isProtected). Keeps
+      // IOC-match / burst / first-publish / ATO scans, drops the oldest UNPROTECTED items
+      // first (newest survive — most likely to still exist for re-scan), protected only as
+      // a last resort, and LEDGERS every drop. Closes the v2.10.88 gap where the raw
+      // splice(0,n) silently dropped protected scans (CLAUDE.md "ne jamais perdre de scan").
+      const { dropped, droppedProtected } = evictFromScanQueueBulk(scanQueue, EMERGENCY_QUEUE_KEEP, 'mem_emergency');
       summary.queueDropped = dropped;
-      console.error(`[MONITOR] MEMORY EMERGENCY: heap at ${pct}% — truncated queue ${queueBefore} → ${scanQueue.length} (dropped ${dropped} oldest items)`);
+      summary.queueDroppedProtected = droppedProtected;
+      if (stats) {
+        stats.queueEmergencyDrops = (stats.queueEmergencyDrops || 0) + dropped;
+        if (droppedProtected) stats.queueEmergencyProtectedDrops = (stats.queueEmergencyProtectedDrops || 0) + droppedProtected;
+      }
+      console.error(`[MONITOR] MEMORY EMERGENCY: ${memPctLabel} — truncated queue ${queueBefore} → ${scanQueue.length} (dropped ${dropped} oldest UNPROTECTED${droppedProtected ? ` + ${droppedProtected} protected as last resort` : ''}, all ledgered)`);
     }
     // Clear deferred sandbox queue (holds full staticResult objects)
     const deferredDropped = clearDeferredQueue();
     summary.deferredDropped = deferredDropped;
     if (deferredDropped > 0) {
-      console.error(`[MONITOR] MEMORY EMERGENCY: cleared ${deferredDropped} deferred sandbox items`);
+      // Observability only (counter, NOT a ledger 'dropped' entry): the deferred queue holds
+      // post-scan sandbox ENRICHMENT for packages already statically scanned + alerted, so
+      // clearing it is not a coverage loss — ledgering them as 'dropped' would mislabel them.
+      if (stats) stats.deferredDroppedEmergency = (stats.deferredDroppedEmergency || 0) + deferredDropped;
+      console.error(`[MONITOR] MEMORY EMERGENCY: cleared ${deferredDropped} deferred sandbox items (post-scan enrichment only — primary alerts already sent)`);
     }
     // Free the off-heap leak that queue truncation can't touch: orphaned sandbox
     // containers (gVisor runsc survives `docker kill`) and wedged scan workers.
@@ -642,13 +658,10 @@ function reportStats(stats) {
   stats.lastReportTime = Date.now();
 }
-function isDailyReportDue(stats) {
-  const hour = getParisHour();
-  if (hour !== DAILY_REPORT_HOUR) return false;
-  // Check if already sent today
-  const { hasReportBeenSentToday } = require('./state.js');
-  return !hasReportBeenSentToday(stats);
-}
+// isDailyReportDue is the canonical gate in state.js (imported above) — re-exported below
+// so monitor.js (daemonModule.isDailyReportDue) keeps resolving. The old local copy used a
+// `hour !== 8` gate that lost a whole day whenever the daemon missed the single 08:00 minute
+// (OOM crash-loop); state.js uses the catch-up `hour >= 8` gate instead.
 // ─── P1.0 — memory-trend instrumentation ───
 // Append one sample per memory-watchdog tick so the off-heap leak can be localised
@@ -775,12 +788,10 @@ async function startMonitor(options, stats, dailyAlerts, recentlyScanned, downlo
     console.warn(`[Archive] Failed to start periodic cleanup: ${err.message}`);
   }
-  console.log(`
-╔════════════════════════════════════════════╗
-║     MUAD'DIB - Registry Monitor           ║
-║     Scanning npm + PyPI new packages      ║
-╚════════════════════════════════════════════╝
-  `);
+  console.log('\n' + banner([
+    "MUAD'DIB - Registry Monitor",
+    'Scanning npm + PyPI new packages'
+  ]) + '\n');
   // Note: alerts file migrated from .json to .jsonl in v2.10.89
   const oldAlertsJson = ALERTS_FILE.replace('.jsonl', '.json');
@@ -1087,7 +1098,7 @@ async function startMonitor(options, stats, dailyAlerts, recentlyScanned, downlo
       // Graduated response at HIGH+
       if (pressureLevel >= MEMORY_PRESSURE_LEVELS.HIGH) {
-        handleMemoryPressure(pressureLevel, heapRatio, recentlyScanned, downloadsCache, scanQueue);
+        handleMemoryPressure(pressureLevel, heapRatio, rssRatio, recentlyScanned, downloadsCache, scanQueue, stats);
       }
       lastMemoryLogTime = Date.now();
     }
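
The emergency path now calls `evictFromScanQueueBulk` instead of `splice(0, n)`. Its body is in `scan-queue.js`, outside this section, so the sketch below only illustrates the policy the comment describes: drop the oldest unprotected items first and touch protected ones only as a last resort. The `protected` flag, the `onDrop` callback (standing in for the ledger write), and the choice to count protected drops inside `dropped` are all assumptions.

```javascript
// Hypothetical sketch of protected-aware bulk eviction; not the package's implementation.
// queue is ordered oldest-first and is trimmed in place down to `keep` items.
function evictBulk(queue, keep, onDrop = () => {}) {
  const toDrop = queue.length - keep;
  if (toDrop <= 0) return { dropped: 0, droppedProtected: 0 };

  const dropIdx = new Set();
  // Pass 1: oldest unprotected items.
  for (let i = 0; i < queue.length && dropIdx.size < toDrop; i++) {
    if (!queue[i].protected) dropIdx.add(i);
  }
  const unprotectedDropped = dropIdx.size;
  // Pass 2 (last resort): oldest remaining items, which are all protected by now.
  for (let i = 0; i < queue.length && dropIdx.size < toDrop; i++) {
    if (!dropIdx.has(i)) dropIdx.add(i);
  }

  // Compact in place, reporting every dropped item so none is lost silently.
  let write = 0;
  for (let i = 0; i < queue.length; i++) {
    if (dropIdx.has(i)) onDrop(queue[i]);
    else queue[write++] = queue[i];
  }
  queue.length = write;
  return { dropped: dropIdx.size, droppedProtected: dropIdx.size - unprotectedDropped };
}
```

Unlike a front splice, this keeps a protected item even when it is the oldest entry in the queue, provided enough unprotected items exist to reach the target size.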

package/src/monitor/ingestion.js CHANGED

@@ -13,7 +13,7 @@ const { loadCachedIOCs } = require('../ioc/updater.js');
 const { enqueueScan } = require('./scan-queue.js');
 const {
   saveNpmSeq, CHANGES_STREAM_URL, CHANGES_LIMIT, CHANGES_CATCHUP_MAX,
-  savePypiSerial, PYPI_XMLRPC_URL, PYPI_CATCHUP_MAX
+  savePypiSerial, PYPI_XMLRPC_URL, PYPI_CATCHUP_MAX, appendScanLedger
 } = require('./state.js');
 const { sendIOCPreAlert, sendCampaignPreAlert } = require('./webhook.js');
@@ -109,6 +109,14 @@ function httpsGet(url, timeoutMs = 30_000, deadlineMs = Math.max(timeoutMs * 2,
         clearTimeout(deadline);
         return httpsGet(location, timeoutMs, deadlineMs).then(resolve, reject);
       }
+      if (res.statusCode === 429) {
+        res.resume();
+        // Coordinated backoff: drain the SHARED token bucket so every in-flight registry fetch
+        // slows together. This high-volume packument/changes path must signal 429 like the
+        // metadata path (npm-registry.js) does — not just acquire a slot (CLAUDE.md 429 storm).
+        try { require('../shared/http-limiter.js').signal429(); } catch { /* limiter best-effort */ }
+        return done(new Error(`HTTP 429 rate limited for ${url}`));
+      }
       if (res.statusCode < 200 || res.statusCode >= 300) {
         res.resume();
         return done(new Error(`HTTP ${res.statusCode} for ${url}`));
@@ -166,6 +174,11 @@ function httpsPost(url, body, headers = {}, timeoutMs = 30_000, deadlineMs = Mat
       if (err) reject(err); else resolve(value);
     };
     req = _deps.https.request(options, (res) => {
+      if (res.statusCode === 429) {
+        res.resume();
+        try { require('../shared/http-limiter.js').signal429(); } catch { /* limiter best-effort */ }
+        return done(new Error(`HTTP 429 rate limited for POST ${url}`));
+      }
       if (res.statusCode < 200 || res.statusCode >= 300) {
         res.resume();
         return done(new Error(`HTTP ${res.statusCode} for POST ${url}`));
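
Review note on the two 429 hunks above: both call `signal429()` on a shared limiter so that one rate-limit response slows every in-flight registry fetch. The implementation of `../shared/http-limiter.js` is not part of this diff, so the following is only a minimal sketch of the idea, assuming a plain token bucket. The `TokenBucket` class, its constructor parameters, and `tryAcquire` are hypothetical; only the name `signal429` comes from the diff.

```javascript
// Hypothetical shared limiter; not the package's http-limiter.js.
class TokenBucket {
  // capacity: max tokens; refillPerSec: tokens added per second;
  // now: injectable clock (milliseconds) so the behaviour is testable.
  constructor(capacity, refillPerSec, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Add the tokens earned since the last call, capped at capacity.
  refill() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = t;
  }

  // Take one token if available; callers that get false should wait and retry.
  tryAcquire() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Called on HTTP 429: empty the bucket so all callers sharing it back off
  // together until the refill rate restores capacity.
  signal429() {
    this.tokens = 0;
    this.last = this.now();
  }
}
```

Because every fetch path shares one bucket in this sketch, a single 429 on the packument or changes path delays the metadata path as well, which is the coordination the hunk comment describes.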
@@ -418,6 +431,7 @@ function selectMostRecentVersion(packument, options = {}) {
     description: (typeof versionData.description === 'string') ? versionData.description : '',
     latestTagVersion,
     recentVersions: [],
+    droppedBurstVersions: [],
   };
   // Burst extras: other versions published within the recent window, excluding the
@@ -432,7 +446,13 @@ function selectMostRecentVersion(packument, options = {}) {
       const [v, ts] = versionTimes[i];
       if (ts < cutoff) break; // sorted desc, so once we cross the cutoff we're done
       result.recentWindowCount++;
-      if (result.recentVersions.length >= maxRecent) continue; // enqueue list capped; count continues
+      if (result.recentVersions.length >= maxRecent) {
+        // Burst beyond the enqueue cap: collect the version so the caller ledgers it as a
+        // coverage loss (it is never enqueued/scanned). Keeps a Miasma-style burst that
+        // outruns maxRecent visible instead of vanishing silently (CLAUDE.md "no silent caps").
+        result.droppedBurstVersions.push(v);
+        continue; // enqueue list capped; count continues
+      }
       const vData = versions[v];
       if (!vData) continue;
       result.recentVersions.push({
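
Review note on the `selectMostRecentVersion` hunk above: versions inside the recent window are still counted once the enqueue cap is reached, but they are now collected in `droppedBurstVersions` rather than skipped silently. A simplified sketch of that split follows. `splitBurst` and `toLedgerEntries` are hypothetical helper names, and the sketch ignores the real function's exclusion of the primary version; the ledger record fields mirror the `appendScanLedger` call in the `getNpmLatestTarball` hunk.

```javascript
// Hypothetical helpers illustrating the cap-and-collect pattern.
// versionTimes: array of [version, publishedAtMs], sorted newest first.
function splitBurst(versionTimes, cutoffMs, maxRecent) {
  const kept = [];
  const dropped = [];
  let windowCount = 0;
  for (const [version, ts] of versionTimes) {
    if (ts < cutoffMs) break; // sorted descending, so nothing older qualifies
    windowCount++;
    if (kept.length >= maxRecent) {
      dropped.push(version); // over the cap: remember it instead of losing it
      continue;
    }
    kept.push(version);
  }
  return { kept, dropped, windowCount };
}

// Build one ledger record per dropped version, using the field values
// that appear in the diff's appendScanLedger call.
function toLedgerEntries(name, dropped) {
  return dropped.map((version) => ({
    name,
    version,
    ecosystem: 'npm',
    outcome: 'dropped',
    source: 'burst_extras_cap',
  }));
}
```

Keeping the split free of side effects and building ledger records separately matches the diff's note that `selectMostRecentVersion` stays pure and the caller does the ledger writes.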
@@ -502,6 +522,16 @@ async function getNpmLatestTarball(packageName) {
       age_days: null, version_count: 0,
     };
   }
+  // A3: ledger burst versions dropped by the maxRecent enqueue cap — they are never scanned,
+  // so record each as a 'dropped' coverage loss (source burst_extras_cap) for the coverage
+  // audit. Best-effort; never throws. selectMostRecentVersion stays pure (it only collects).
+  if (result.droppedBurstVersions && result.droppedBurstVersions.length) {
+    for (const v of result.droppedBurstVersions) {
+      try {
+        appendScanLedger({ name: packageName, version: v, ecosystem: 'npm', outcome: 'dropped', source: 'burst_extras_cap' });
+      } catch { /* ledger is best-effort */ }
+    }
+  }
   // Stage 2.1 — extract reputation signals from the packument we already have,
   // so triageRisk in queue.js doesn't have to refetch metadata via
   // getPackageMetadata. Two fields are derivable from the packument alone: