muaddib-scanner 2.11.76 → 2.11.78
- package/.githooks/pre-commit +18 -0
- package/README.md +15 -6
- package/bin/muaddib.js +18 -4
- package/package.json +1 -2
- package/{self-scan-v2.11.76.json → self-scan-v2.11.78.json} +1 -1
- package/src/commands/interactive.js +5 -6
- package/src/commands/safe-install.js +19 -19
- package/src/ioc/scraper.js +46 -10
- package/src/monitor/daemon.js +39 -28
- package/src/monitor/ingestion.js +32 -2
- package/src/monitor/queue.js +84 -21
- package/src/monitor/scan-queue.js +68 -1
- package/src/monitor/state.js +24 -1
- package/src/monitor/webhook.js +32 -11
- package/src/output/formatter.js +3 -4
- package/src/pipeline/executor.js +9 -1
- package/src/runtime/daemon.js +27 -28
- package/src/runtime/watch.js +7 -7
- package/src/sandbox/index.js +11 -9
- package/src/scanner/temporal-analysis.js +8 -0
- package/src/scanner/temporal-ast-diff.js +5 -0
- package/src/utils.js +60 -1
- package/.dockerignore +0 -7
- package/.env.example +0 -43
- package/ml-retrain/auto-labeler/auto_labeler.py +0 -312
- package/ml-retrain/auto-labeler/ghsa_checker.py +0 -169
- package/ml-retrain/auto-labeler/labeler.py +0 -256
- package/ml-retrain/auto-labeler/npm_checker.py +0 -228
- package/ml-retrain/auto-labeler/ossf_index.py +0 -178
- package/ml-retrain/auto-labeler/requirements.txt +0 -1
- package/ml-retrain/confusion-matrix.png +0 -0
- package/ml-retrain/model-trees-retrained.js +0 -12
- package/ml-retrain/retrain-report.json +0 -225
- package/ml-retrain/retrain.py +0 -974
- package/sbom.json +0 -0
- package/src/ml/train-bundler-detector.py +0 -725
- package/src/ml/train-xgboost.py +0 -957
- package/tools/export-model-js.py +0 -160
- package/tools/requirements-ml.txt +0 -5
- package/tools/train-classifier.py +0 -333
package/.githooks/pre-commit
ADDED
@@ -0,0 +1,18 @@
+#!/bin/sh
+# Pre-commit guard — block committing a typosquat/forbidden dependency
+# (loadash, lodash, lodahs, …). Local mirror of the CI gate
+# (scripts/check-deps-typosquats.js, run in the `test` job of .github/workflows/scan.yml)
+# and defense-in-depth with the package.json `preinstall` denylist.
+#
+# Enable once per clone: git config core.hooksPath .githooks
+#
+# This repo uses NEITHER lodash NOR loadash (CLAUDE.md interdiction). The loadash
+# typosquat has re-entered package.json repeatedly; this stops it at commit time.
+
+node scripts/check-deps-typosquats.js
+status=$?
+if [ "$status" -ne 0 ]; then
+  echo "pre-commit: ABORTED — remove the typosquat dependency from package.json (see above)." >&2
+  exit 1
+fi
+exit 0
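Note on the hook above: `scripts/check-deps-typosquats.js` is not part of this diff, so its logic is not visible here. As a rough sketch of what a denylist check of this kind might look like (the function name `findForbiddenDeps` and the `FORBIDDEN` set are assumptions for illustration, not the project's code):

```javascript
// Hypothetical sketch: flag forbidden dependency names in a parsed package.json.
// FORBIDDEN mirrors the names listed in the hook comment; the real list may differ.
const FORBIDDEN = new Set(['loadash', 'lodash', 'lodahs']);

function findForbiddenDeps(pkgJson, forbidden = FORBIDDEN) {
  const sections = ['dependencies', 'devDependencies', 'optionalDependencies', 'peerDependencies'];
  const hits = [];
  for (const section of sections) {
    // A missing section is treated as empty.
    for (const name of Object.keys(pkgJson[section] || {})) {
      if (forbidden.has(name)) hits.push({ section, name });
    }
  }
  return hits;
}

// Example input modelled on the dependency removed from package.json in this release.
const hits = findForbiddenDeps({ dependencies: { 'js-yaml': '4.2.0', loadash: '^1.0.0' } });
console.log(hits);
```

A script built this way would exit non-zero when `hits` is non-empty, which is the condition the hook's `status` check reacts to.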
package/README.md
CHANGED
@@ -30,7 +30,7 @@
 
 npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
 
-MUAD'DIB combines **20 parallel scanners** (
+MUAD'DIB combines **20 parallel scanners** (264 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is **currently inactive** (see [Evaluation Metrics](#evaluation-metrics) → ML Classifier section).
 
 ---
 
@@ -202,9 +202,9 @@ muaddib replay # Ground truth validation (90/94 TPR@3, v2.11
 | Python Source (PYSRC) | Import-time / install-time RCE patterns in `__init__.py` / `setup.py` (v2.11.41 — closes TrapDoor PyPI gap) |
 | Python AST (PYAST) | Tree-sitter-Python AST with taint-aware detectors (v2.11.42+) |
 
-###
+### 264 detection rules
 
-All rules (
+All rules (259 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21176) for the complete rules reference.
 
 ### Detected campaigns
 
@@ -278,7 +278,7 @@ With pre-commit framework:
 ```yaml
 repos:
   - repo: https://github.com/DNSZLSK/muad-dib
-    rev: v2.11.
+    rev: v2.11.76
     hooks:
       - id: muaddib-scan
 ```
@@ -303,11 +303,20 @@ These are the numbers a user gets when running `muaddib scan` against npm or PyP
 | **FPR PyPI** (v2.11.48, first honest measurement) | **9.68%** (12/124 scanned, 132 total) | **Track D fixed the PyPI downloader** — removed `pip --no-binary :all:` flag (forced compile of wheel-only packages, timed out 38% of the time) + added `.whl` extraction via `extractArchive()`. Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s `PACK_TIMEOUT_MS`. |
 | **ADR** (Adversarial + Holdout, v2.11.48) | **96.26%** (103/107) | 67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95. |
 
-**
+**4132 tests** across 115 files. **264 rules** (259 RULES + 5 PARANOID; v2.11.67/70 Phantom Gyp added PKG-023 + COMPOUND-017).
 
 **Known issues (v2.11.48):**
 - *PyPI cap at 35/100*: Python samples cap out at `riskScore=35` even when `globalRiskScore=100`. Confirmed empirically — all 12 PyPI FPs at score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI MALWARE detection at higher thresholds. Track E target.
 
+### Operational coverage (v2.11.67-76)
+
+The static ground-truth TPR above is measured offline. Since v2.11.67 the monitor also tracks **operational** coverage on live npm/PyPI ingestion:
+- A per-scan **ledger** (`data/scan-ledger.jsonl`) records every scanned package's outcome; `computeLedgerRollup()` produces a 24h rollup (`alertRate`, per-ecosystem). Note: `alertRate` is a throughput signal, **not** detection TPR.
+- An active **GHSA poller** (~15 min; npm, pypi, crates) builds an authoritative "what should we have caught" denominator (`data/ghsa-malware.jsonl`), plus a **feed-health** alarm that fires when an IOC feed silently goes dark.
+- The Phase 5 **coverage-audit** (`scripts/coverage-audit.js`, daily 05:00 UTC) joins that denominator against ledger outcomes + the tarball archive to compute an honest GHSA-denominated **operational TPR** (`alerted / total`), and surfaces `scannedClean` misses as human-gated ground-truth candidates.
+
+This operational TPR is the real production detection rate, distinct from the static GT TPR (which has not been re-measured since v2.11.48).
+
 ### ML Classifier (offline only)
 
 `src/ml/classifier.js` is **not wired into `muaddib scan`**. The XGBoost model is currently exercised only by `muaddib evaluate` (offline metric replay) and `muaddib monitor` (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see `src/monitor/queue.js:628`). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.
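The GHSA-denominated operational TPR that the README hunk above describes (`alerted / total`) can be illustrated with a small sketch. The record shapes (`name`, `version`, `outcome`) and the function name `operationalTpr` are assumptions made for this example; the actual `scripts/coverage-audit.js` is not shown in this diff and may work differently.

```javascript
// Hypothetical sketch: join a GHSA "should have caught" list against ledger outcomes.
function operationalTpr(ghsaEntries, ledgerEntries) {
  // Index ledger outcomes by "name@version" (assumed key shape).
  const outcomes = new Map();
  for (const e of ledgerEntries) outcomes.set(`${e.name}@${e.version}`, e.outcome);

  let alerted = 0;
  const scannedClean = []; // scanned but not alerted: candidate misses for human review
  for (const g of ghsaEntries) {
    const key = `${g.name}@${g.version}`;
    const outcome = outcomes.get(key);
    if (outcome === 'alerted') alerted++;
    else if (outcome === 'clean') scannedClean.push(key);
    // Entries with no ledger record were never scanned; they still count in `total`.
  }
  const total = ghsaEntries.length;
  return { alerted, total, tpr: total === 0 ? null : alerted / total, scannedClean };
}

const report = operationalTpr(
  [
    { name: 'evil-a', version: '1.0.0' },
    { name: 'evil-b', version: '2.0.0' },
    { name: 'evil-c', version: '0.1.0' },
    { name: 'evil-d', version: '3.1.4' }
  ],
  [
    { name: 'evil-a', version: '1.0.0', outcome: 'alerted' },
    { name: 'evil-b', version: '2.0.0', outcome: 'alerted' },
    { name: 'evil-c', version: '0.1.0', outcome: 'clean' }
  ]
);
console.log(report);
```

Keeping never-scanned advisories in the denominator is what separates this figure from `alertRate`, which only looks at packages that were actually scanned.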
@@ -371,7 +380,7 @@ npm test
 
 ### Testing
 
-- **
+- **4132 tests** across 115 modular test files
 - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
 - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
 - **Ground truth validation** - 96 real-world attacks (95.74% TPR@3, 88.30% TPR@20 — v2.11.48 full measure on 94 in-scope)
package/bin/muaddib.js
CHANGED
@@ -31,6 +31,23 @@ const { diff, showRefs } = require('../src/diff.js');
 const { initHooks, removeHooks } = require('../src/hooks-init.js');
 const { showHelp, commandHelp } = require('../src/commands/help.js');
 const { interactiveMenu } = require('../src/commands/interactive.js');
+const { isPromptCancellation } = require('../src/utils.js');
+
+// Global safety net: turn an unhandled async error into a clean one-line message
+// instead of a raw stack trace. Ctrl-C inside an interactive prompt exits 130
+// (POSIX SIGINT convention); any other error exits 1. Set MUADDIB_DEBUG=1 to see
+// the full stack.
+function handleFatal(err) {
+  if (isPromptCancellation(err)) {
+    console.log('\nCancelled.');
+    process.exit(130);
+  }
+  console.error('[ERROR]', err && err.message ? err.message : String(err));
+  if (process.env.MUADDIB_DEBUG && err && err.stack) console.error(err.stack);
+  process.exit(1);
+}
+process.on('unhandledRejection', handleFatal);
+process.on('uncaughtException', handleFatal);
 
 const args = process.argv.slice(2);
 const command = args[0];
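`isPromptCancellation` is imported from `src/utils.js`, and its body does not appear in this hunk. As far as I know, `@inquirer/prompts` rejects with an error whose `name` is `ExitPromptError` when the user presses Ctrl-C, so a predicate of roughly the following shape would fit the handler above. Both function names below are hypothetical, and the real helper may check more than this.

```javascript
// Hypothetical predicate; the real isPromptCancellation in src/utils.js may differ.
function isPromptCancellationSketch(err) {
  return Boolean(err) && err.name === 'ExitPromptError';
}

// Pure variant of the handler's decision, so it can be exercised without exiting.
function exitCodeFor(err) {
  return isPromptCancellationSketch(err) ? 130 : 1; // 130 = 128 + SIGINT (2)
}

// Simulate the two cases handleFatal distinguishes.
const cancel = new Error('User force closed the prompt');
cancel.name = 'ExitPromptError';
console.log(exitCodeFor(cancel), exitCodeFor(new Error('boom'))); // prints "130 1"
```

Separating the exit-code decision from `process.exit` is only a convenience for this sketch; it makes the branch testable without terminating the process.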
@@ -274,10 +291,7 @@ if (command === 'version' || command === '--version' || command === '-v') {
   if (command === '--help' || command === '-h') {
     showHelp();
   }
-  interactiveMenu().catch(
-    console.error('[ERROR]', err.message);
-    process.exit(1);
-  });
+  interactiveMenu().catch(handleFatal);
 } else if (command === 'scan') {
   if (wantHelp) showHelp('scan');
   run(target, {
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "muaddib-scanner",
-  "version": "2.11.
+  "version": "2.11.78",
   "description": "Supply-chain threat detection & response for npm & PyPI/Python",
   "main": "src/index.js",
   "bin": {
@@ -52,7 +52,6 @@
     "acorn-walk": "8.3.5",
     "adm-zip": "0.5.17",
     "js-yaml": "4.2.0",
-    "loadash": "^1.0.0",
     "web-tree-sitter": "^0.26.9"
   },
   "devDependencies": {
package/src/commands/interactive.js
CHANGED
@@ -8,16 +8,15 @@ const { safeInstall } = require('../safe-install.js');
 const { buildSandboxImage, runSandbox, generateNetworkReport } = require('../sandbox/index.js');
 const { diff } = require('../diff.js');
 const { initHooks } = require('../hooks-init.js');
+const { banner } = require('../utils.js');
 
 async function interactiveMenu() {
   const { select, input, confirm } = await import('@inquirer/prompts');
 
-  console.log(
-
-
-
-╚═══════════════════════════════════════════════╝
-  `);
+  console.log('\n' + banner([
+    "MUAD'DIB - npm & PyPI Supply Chain Hunter",
+    '"The worms must die."'
+  ]) + '\n');
 
   const action = await select({
     message: 'What do you want to do?',
package/src/commands/safe-install.js
CHANGED
@@ -1,8 +1,14 @@
 const fs = require('fs');
 const path = require('path');
-
+// NB: keep the module object (cp.spawnSync), NOT a destructured `spawnSync`. Destructuring
+// captures the original reference at load time, which makes the function impossible to mock
+// from tests (`cp.spawnSync = ...` wouldn't be seen here) — that's exactly how the
+// safe-install test's mock silently failed and a real `npm install loadash` ran every test
+// run, re-contaminating package.json. Property access keeps it interceptable.
+const cp = require('child_process');
 const { loadCachedIOCs } = require('../ioc/updater.js');
 const { REHABILITATED_PACKAGES, NPM_PACKAGE_REGEX } = require('../shared/constants.js');
+const { banner } = require('../utils.js');
 
 /**
  * Validates that a package name is safe (no command injection)
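The comment in the hunk above explains why a destructured `spawnSync` cannot be mocked. A small self-contained demonstration of that behaviour follows; the variable names are illustrative, and nothing is actually spawned because the only call goes through the mock.

```javascript
const cp = require('child_process');

// Destructuring copies the function reference once, at load time.
const { spawnSync } = cp;
const original = cp.spawnSync;

// A test-style mock: replace the property on the shared module object.
const calls = [];
cp.spawnSync = (cmd, args) => {
  calls.push([cmd, args]);
  return { status: 0, stdout: '{}' };
};

const destructuredSeesMock = spawnSync !== original; // false: still the real function
const propertySeesMock = cp.spawnSync !== original;  // true: lookup happens at call time

cp.spawnSync('npm', ['view', 'left-pad', '--json']); // hits the mock, nothing is spawned
cp.spawnSync = original; // restore the real implementation

console.log({ destructuredSeesMock, propertySeesMock, mockedCalls: calls.length });
```

Code that calls the destructured binding would have bypassed the mock and run the real command, which matches the failure mode the comment describes.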
@@ -172,7 +178,7 @@ async function scanPackageRecursive(pkg, depth = 0, maxDepth = 3) {
   // Get the package info (uses spawnSync to avoid injection)
   let pkgInfo;
   try {
-    const result = spawnSync('npm', ['view', pkgName, '--json'], { encoding: 'utf8', shell: false });
+    const result = cp.spawnSync('npm', ['view', pkgName, '--json'], { encoding: 'utf8', shell: false });
     if (result.status !== 0 || !result.stdout) {
       if (depth === 0) console.log(`[!] Package ${pkgName} not found on npm`);
       return { safe: false, package: pkgName, reason: 'npm_unreachable', source: 'npm-registry', description: 'Package not found on npm registry', depth };
@@ -207,12 +213,10 @@ async function scanPackageRecursive(pkg, depth = 0, maxDepth = 3) {
 async function safeInstall(packages, options = {}) {
   const { isDev, isGlobal, force } = options;
 
-  console.log(
-
-
-
-╚══════════════════════════════════════════╝
-  `);
+  console.log('\n' + banner([
+    "MUAD'DIB Safe Install",
+    'Scanning packages + dependencies...'
+  ]) + '\n');
 
   // Reset the cache for each install
   scannedPackages.clear();
@@ -221,11 +225,7 @@ async function safeInstall(packages, options = {}) {
     const result = await scanPackageRecursive(pkg);
 
     if (!result.safe) {
-      console.log(
-╔══════════════════════════════════════════╗
-║ [!] MALICIOUS PACKAGE DETECTED ║
-╚══════════════════════════════════════════╝
-      `);
+      console.log('\n' + banner(['[!] MALICIOUS PACKAGE DETECTED']) + '\n');
       if (result.depth > 0) {
         console.log(`Requested package: ${pkg}`);
         console.log(`Malicious dependency: ${result.package} (depth: ${result.depth})`);
@@ -240,11 +240,11 @@ async function safeInstall(packages, options = {}) {
         console.log('[!] Installation BLOCKED.');
         return { blocked: true, package: result.package, threats: [{ type: 'known_malicious', severity: 'CRITICAL', message: result.description }] };
       } else {
-        console.log(
-
-
-
-
+        console.log(banner([
+          '[!!!] WARNING: FORCE INSTALL ACTIVE',
+          'Known malicious package detected!',
+          'Installing despite security threats.'
+        ]));
         console.log('[AUDIT] Force-install override for malicious package: ' + result.package);
 
         // SFI-004: Write audit log for force-install overrides
@@ -276,7 +276,7 @@ async function safeInstall(packages, options = {}) {
   if (isDev) npmArgs.push('--save-dev');
   if (isGlobal) npmArgs.push('-g');
 
-  const result = spawnSync('npm', npmArgs, { stdio: 'inherit', shell: false });
+  const result = cp.spawnSync('npm', npmArgs, { stdio: 'inherit', shell: false });
 
   if (result.status !== 0) {
     console.log('');
package/src/ioc/scraper.js
CHANGED
@@ -7,9 +7,10 @@ const AdmZip = require('adm-zip');
 const IOC_FILE = path.join(__dirname, 'data/iocs.json');
 const COMPACT_IOC_FILE = path.join(__dirname, 'data/iocs-compact.json');
 const HOME_IOC_FILE = path.join(os.homedir(), '.muaddib', 'data', 'iocs.json');
-const { generateCompactIOCs, NEVER_WILDCARD } = require('./updater.js');
+const { generateCompactIOCs, NEVER_WILDCARD, expandCompactIOCs } = require('./updater.js');
 const { Spinner } = require('../utils.js');
 const { NPM_PACKAGE_REGEX } = require('../shared/constants.js');
+const { version: PKG_VERSION } = require('../../package.json');
 
 // Version format validation (semver-like + wildcard)
 // Permissive version validator — accepts:
@@ -43,6 +44,8 @@ const VERSION_RE = { test: isValidVersion };
 let _noVersionSkipCount = 0;
 let _invalidVersionSkipCount = 0;
 let _invalidVersionSamples = []; // first 3 samples for context
+let _invalidNameSkipCount = 0;
+let _invalidNameSamples = []; // first 3 samples for context
 
 /**
  * Validate an IOC package entry before insertion.
@@ -53,14 +56,18 @@ function validateIOCEntry(pkgName, version, ecosystem) {
   // npm: validate with NPM_PACKAGE_REGEX
   if (ecosystem === 'npm' || !ecosystem) {
     if (!NPM_PACKAGE_REGEX.test(pkgName)) {
-
+      // Aggregated counter (summary emitted by runScraper) — was a per-line
+      // console.warn that spammed 100+ lines on feeds carrying non-spec names.
+      _invalidNameSkipCount++;
+      if (_invalidNameSamples.length < 3) _invalidNameSamples.push(pkgName);
       return false;
     }
   }
   // PyPI: basic check — no path traversal, no slashes
   if (ecosystem === 'pypi') {
     if (/[/\\]|\.\./.test(pkgName)) {
-
+      _invalidNameSkipCount++;
+      if (_invalidNameSamples.length < 3) _invalidNameSamples.push(pkgName);
       return false;
     }
   }
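The hunk above swaps a per-entry warning for an aggregated counter that keeps the first three offending names and is summarised once by `runScraper`. A stand-alone sketch of that pattern is shown below; `createSkipCounter` and its methods are hypothetical names for illustration, whereas the scraper itself uses module-level variables.

```javascript
// Hypothetical stand-alone version of the aggregated-warning pattern.
function createSkipCounter(maxSamples = 3) {
  let count = 0;
  let samples = [];
  return {
    // Count every skip, but keep only the first few values for context.
    record(value) {
      count++;
      if (samples.length < maxSamples) samples.push(value);
    },
    // One summary line per run instead of one line per skipped entry.
    summary(label) {
      if (count === 0) return null;
      const tail = samples.length > 0 ? ' (samples: ' + samples.join(', ') + ')' : '';
      return '[SCRAPER] WARN: ' + count + ' ' + label + tail;
    },
    reset() {
      count = 0;
      samples = [];
    }
  };
}

const invalidNames = createSkipCounter();
for (const name of ['../etc', 'a/b/c', 'UPPER CASE', 'x\\y']) invalidNames.record(name);
const line = invalidNames.summary('invalid package names skipped');
console.log(line);
```

Capping the sample list keeps memory and log size bounded no matter how many malformed names a feed carries.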
@@ -955,14 +962,16 @@ async function scrapeGitHubAdvisory() {
 // ============================================
 async function runScraper() {
   console.log('\n' + '='.repeat(60));
-  console.log(' MUAD\'DIB IOC Scraper
-  console.log(' OSV + OSSF + GenSecAI + DataDog + Aikido + OSM');
+  console.log(' MUAD\'DIB IOC Scraper v' + PKG_VERSION);
+  console.log(' OSV + OSSF + GitHub Advisory + GenSecAI + DataDog + Aikido + OSM');
   console.log('='.repeat(60) + '\n');
 
   // Reset aggregated warning counters
   _noVersionSkipCount = 0;
   _invalidVersionSkipCount = 0;
   _invalidVersionSamples = [];
+  _invalidNameSkipCount = 0;
+  _invalidNameSamples = [];
 
   // Create data directory if needed
   const dataDir = path.dirname(IOC_FILE);
@@ -997,11 +1006,28 @@ async function runScraper() {
     }
   }
 
+  // Fresh-install fallback: the full iocs.json is gitignored / not shipped in the
+  // npm package, but the compact baseline IS. Seed from it (the same path that
+  // `muaddib update` uses) so a first scrape augments the shipped baseline instead
+  // of appearing to start from zero and re-downloading everything.
+  let seededFromCompact = false;
+  if (existingIOCs.packages.length === 0 && fs.existsSync(COMPACT_IOC_FILE)) {
+    try {
+      const compactData = JSON.parse(fs.readFileSync(COMPACT_IOC_FILE, 'utf8'));
+      existingIOCs = expandCompactIOCs(compactData);
+      if (!existingIOCs.pypi_packages) existingIOCs.pypi_packages = [];
+      seededFromCompact = true;
+    } catch {
+      console.log('[WARN] Compact IOC baseline unreadable, starting fresh');
+    }
+  }
+
   const initialCount = existingIOCs.packages.length;
   const initialPyPICount = existingIOCs.pypi_packages.length;
   const initialHashCount = existingIOCs.hashes ? existingIOCs.hashes.length : 0;
 
-
+  const baselineLabel = seededFromCompact ? 'Baseline IOCs loaded (shipped compact)' : 'Existing IOCs';
+  console.log('[INFO] ' + baselineLabel + ': ' + initialCount + ' packages, ' + initialHashCount + ' hashes\n');
 
   // Phase 1: OSV data dump first (bulk, primary source)
   // This returns knownIds so OSSF can skip already-known entries
@@ -1037,6 +1063,12 @@ async function runScraper() {
       : '';
     console.log('[SCRAPER] WARN: ' + _invalidVersionSkipCount + ' entries skipped (malformed version)' + samples);
   }
+  if (_invalidNameSkipCount > 0) {
+    const nameSamples = _invalidNameSamples.length > 0
+      ? ' (samples: ' + _invalidNameSamples.join(', ') + ')'
+      : '';
+    console.log('[SCRAPER] WARN: ' + _invalidNameSkipCount + ' invalid package names skipped' + nameSamples);
+  }
 
   // Merge all scraped packages
   const allPackages = [
@@ -1297,12 +1329,13 @@ async function runScraper() {
     console.log(' - ' + source + ': ' + count);
   }
 
-  //
+  // Sanity check: a drop vs the previously-loaded baseline is a real signal of a
+  // feed outage or a corrupted merge — surface that instead of a meaningless target.
   const total = existingIOCs.packages.length;
-  if (total
-    console.log('\n [
+  if (total < initialCount) {
+    console.log('\n [WARN] IOC count decreased: ' + total + ' (was ' + initialCount + ') — possible source outage');
   } else {
-    console.log('\n [
+    console.log('\n [OK] IOC database: ' + total + ' npm IOCs');
   }
 
   console.log('\n');
@@ -1606,6 +1639,8 @@ async function queryOSVBatch(packageNames) {
 // Test helpers for aggregated warning counters
 function getNoVersionSkipCount() { return _noVersionSkipCount; }
 function resetNoVersionSkipCount() { _noVersionSkipCount = 0; }
+function getInvalidNameSkipCount() { return _invalidNameSkipCount; }
+function resetInvalidNameSkipCount() { _invalidNameSkipCount = 0; _invalidNameSamples = []; }
 
 /**
  * Source-aware confidence: a package reported by N distinct feeds is more
@@ -1643,6 +1678,7 @@ module.exports = {
   createFreshness, isAllowedRedirect,
   validateIOCEntry,
   getNoVersionSkipCount, resetNoVersionSkipCount,
+  getInvalidNameSkipCount, resetInvalidNameSkipCount,
   CONFIDENCE_ORDER, ALLOWED_REDIRECT_DOMAINS,
   MAX_ENTRY_UNCOMPRESSED, MAX_TOTAL_UNCOMPRESSED
 };
package/src/monitor/daemon.js
CHANGED
@@ -4,15 +4,17 @@ const path = require('path');
 const os = require('os');
 const v8 = require('v8');
 const { isDockerAvailable, SANDBOX_CONCURRENCY_MAX, killAllSandboxContainers } = require('../sandbox/index.js');
+const { banner } = require('../utils.js');
 const { setVerboseMode, isSandboxEnabled, isCanaryEnabled, isLlmDetectiveEnabled, getLlmDetectiveMode, DOWNLOADS_CACHE_TTL } = require('./classify.js');
-const { loadState, saveState, loadDailyStats, saveDailyStats, purgeTarballCache,
+const { loadState, saveState, loadDailyStats, saveDailyStats, purgeTarballCache, isDailyReportDue, atomicWriteFileSync, saveNpmSeq, ALERTS_FILE, runStateMigrations, loadRecentlyScanned, saveRecentlyScanned } = require('./state.js');
 const { isTemporalEnabled, isTemporalAstEnabled, isTemporalPublishEnabled, isTemporalMaintainerEnabled } = require('./temporal.js');
-const { pendingGrouped, flushScopeGroup, sendDailyReport,
+const { pendingGrouped, flushScopeGroup, sendDailyReport, alertedPackageRules, ALERTED_PACKAGES_MAX: MAX_ALERTED_PACKAGES } = require('./webhook.js');
 const { poll, getPollBackoffMs } = require('./ingestion.js');
 const { ensureWorkers, drainWorkers, getTargetConcurrency, setTargetConcurrency, getActiveWorkers, terminateAllWorkers } = require('./queue.js');
 const { computeTarget, ADJUST_INTERVAL_MS, BASE_CONCURRENCY } = require('./adaptive-concurrency.js');
 const { startHealthcheck } = require('./healthcheck.js');
 const { startDeferredWorker, stopDeferredWorker, persistDeferredQueue, restoreDeferredQueue, clearDeferredQueue } = require('./deferred-sandbox.js');
+const { evictFromScanQueueBulk } = require('./scan-queue.js');
 const { startGhsaPoller, stopGhsaPoller } = require('../ioc/ghsa-poller.js');
 const { cleanupOldArchives, getRetentionDays, startPeriodicCleanup } = require('./tarball-archive.js');
 const { clearMetadataCache } = require('../scanner/temporal-analysis.js');
@@ -532,8 +534,13 @@ function pruneMemoryCaches(recentlyScanned, downloadsCache, alertedPackageRules)
  * Worker spawning is gated separately in the main loop (ensureWorkers skipped at HIGH+).
  * Ingestion is gated in ingestion.js via getMemoryPressureLevel() (skipped at CRITICAL+).
  */
-function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, scanQueue) {
+function handleMemoryPressure(level, ratio, rssRatio, recentlyScanned, downloadsCache, scanQueue, stats) {
   const pct = (ratio * 100).toFixed(0);
+  // Show BOTH arms: an EMERGENCY almost always fires on RSS (off-heap — gVisor containers,
+  // tarball buffers) while the heap sits low (~15%). Logging only heap made every breaker
+  // line read "heap at 15%" and hid the real cause; memPctLabel surfaces which arm tripped.
+  const rssPct = (rssRatio != null && isFinite(rssRatio)) ? (rssRatio * 100).toFixed(0) : '?';
+  const memPctLabel = `heap ${pct}% / rss ${rssPct}%`;
   // Structured summary of what the breaker actually did this tick. Returned (the poll loop
   // at the call site ignores it) so the reclaim is observable to callers and tests without
   // scraping console output — CLAUDE.md §3 "Toujours logger un resume". The two kill fields
@@ -543,7 +550,7 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
 
   // HIGH (85%+): clear auxiliary caches — same as old emergency prune
   if (level >= MEMORY_PRESSURE_LEVELS.HIGH) {
-    console.error(`[MONITOR] MEMORY PRESSURE HIGH:
+    console.error(`[MONITOR] MEMORY PRESSURE HIGH: ${memPctLabel} — pruning caches, stopping new workers`);
     recentlyScanned.clear();
     downloadsCache.clear();
     alertedPackageRules.clear();
@@ -552,7 +559,7 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
 
   // CRITICAL (90%+): clear scanner caches, force GC
   if (level >= MEMORY_PRESSURE_LEVELS.CRITICAL) {
-    console.error(`[MONITOR] MEMORY PRESSURE CRITICAL:
+    console.error(`[MONITOR] MEMORY PRESSURE CRITICAL: ${memPctLabel} — stopping ingestion, clearing scanner caches`);
     // temporal-analysis._metadataCache (200 entries × full npm registry metadata)
     try { clearMetadataCache(); } catch {}
     // typosquat metadataCache (500 entries × npm registry metadata for typosquat scoring)
@@ -578,21 +585,30 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
   if (level >= MEMORY_PRESSURE_LEVELS.EMERGENCY) {
     const queueBefore = scanQueue.length;
     if (queueBefore > EMERGENCY_QUEUE_KEEP) {
-      //
-      //
-      //
-      //
-
-      // splice
-      scanQueue
+      // Protected-aware bulk eviction — SINGLE SOURCE OF TRUTH with the queue-cap path
+      // (scan-queue.js evictFromScanQueueBulk / enqueueScan share _isProtected). Keeps
+      // IOC-match / burst / first-publish / ATO scans, drops the oldest UNPROTECTED items
+      // first (newest survive — most likely to still exist for re-scan), protected only as
+      // a last resort, and LEDGERS every drop. Closes the v2.10.88 gap where the raw
+      // splice(0,n) silently dropped protected scans (CLAUDE.md "ne jamais perdre de scan").
+      const { dropped, droppedProtected } = evictFromScanQueueBulk(scanQueue, EMERGENCY_QUEUE_KEEP, 'mem_emergency');
       summary.queueDropped = dropped;
-
+      summary.queueDroppedProtected = droppedProtected;
+      if (stats) {
+        stats.queueEmergencyDrops = (stats.queueEmergencyDrops || 0) + dropped;
+        if (droppedProtected) stats.queueEmergencyProtectedDrops = (stats.queueEmergencyProtectedDrops || 0) + droppedProtected;
+      }
+      console.error(`[MONITOR] MEMORY EMERGENCY: ${memPctLabel} — truncated queue ${queueBefore} → ${scanQueue.length} (dropped ${dropped} oldest UNPROTECTED${droppedProtected ? ` + ${droppedProtected} protected as last resort` : ''}, all ledgered)`);
     }
     // Clear deferred sandbox queue (holds full staticResult objects)
     const deferredDropped = clearDeferredQueue();
     summary.deferredDropped = deferredDropped;
     if (deferredDropped > 0) {
-
+      // Observability only (counter, NOT a ledger 'dropped' entry): the deferred queue holds
+      // post-scan sandbox ENRICHMENT for packages already statically scanned + alerted, so
+      // clearing it is not a coverage loss — ledgering them as 'dropped' would mislabel them.
+      if (stats) stats.deferredDroppedEmergency = (stats.deferredDroppedEmergency || 0) + deferredDropped;
+      console.error(`[MONITOR] MEMORY EMERGENCY: cleared ${deferredDropped} deferred sandbox items (post-scan enrichment only — primary alerts already sent)`);
     }
     // Free the off-heap leak that queue truncation can't touch: orphaned sandbox
     // containers (gVisor runsc survives `docker kill`) and wedged scan workers.
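The eviction policy described in the hunk above (drop the oldest unprotected items first, touch protected items only as a last resort) can be sketched as follows. This is an illustration under assumptions, not the project's `evictFromScanQueueBulk`: the `isProtected` callback stands in for the shared `_isProtected` predicate, the ledger write is omitted, and in this sketch `dropped` is the total with `droppedProtected` as a subset, a convention the real function may not share.

```javascript
// Hypothetical sketch of protected-aware bulk eviction; index 0 is the oldest item.
function evictBulkSketch(queue, keep, isProtected) {
  const toDrop = queue.length - keep;
  if (toDrop <= 0) return { dropped: 0, droppedProtected: 0 };

  const dropIdx = new Set();
  // Pass 1: oldest unprotected items first.
  for (let i = 0; i < queue.length && dropIdx.size < toDrop; i++) {
    if (!isProtected(queue[i])) dropIdx.add(i);
  }
  const droppedUnprotected = dropIdx.size;
  // Pass 2 (last resort): oldest remaining items, which are all protected by now.
  for (let i = 0; i < queue.length && dropIdx.size < toDrop; i++) {
    if (!dropIdx.has(i)) dropIdx.add(i);
  }

  // Compact the array in place so callers holding a reference see the result.
  let write = 0;
  for (let i = 0; i < queue.length; i++) {
    if (!dropIdx.has(i)) queue[write++] = queue[i];
  }
  queue.length = write;
  return { dropped: dropIdx.size, droppedProtected: dropIdx.size - droppedUnprotected };
}

const demoQueue = [
  { name: 'a' }, { name: 'b', iocMatch: true }, { name: 'c' }, { name: 'd' }, { name: 'e', iocMatch: true }
];
const first = evictBulkSketch(demoQueue, 2, (item) => item.iocMatch === true);
const afterFirst = demoQueue.map((item) => item.name);
const second = evictBulkSketch(demoQueue, 1, (item) => item.iocMatch === true);
console.log(first, afterFirst, second, demoQueue.map((item) => item.name));
```

In the first call only unprotected items are removed and both protected items survive. The second call has nothing unprotected left to drop, so the oldest protected item goes.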
@@ -642,13 +658,10 @@ function reportStats(stats) {
   stats.lastReportTime = Date.now();
 }
 
-
-
-
-
-  const { hasReportBeenSentToday } = require('./state.js');
-  return !hasReportBeenSentToday(stats);
-}
+// isDailyReportDue is the canonical gate in state.js (imported above) — re-exported below
+// so monitor.js (daemonModule.isDailyReportDue) keeps resolving. The old local copy used a
+// `hour !== 8` gate that lost a whole day whenever the daemon missed the single 08:00 minute
+// (OOM crash-loop); state.js uses the catch-up `hour >= 8` gate instead.
 
 // ─── P1.0 — memory-trend instrumentation ───
 // Append one sample per memory-watchdog tick so the off-heap leak can be localised
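The comment in the hunk above contrasts the removed `hour !== 8` gate with the catch-up `hour >= 8` gate in `state.js`. The difference is easiest to see with the two predicates side by side. Both functions below are simplified stand-ins that take the hour and a "sent today" flag directly; the real `isDailyReportDue` receives `stats` and its exact signature is not shown in this chunk.

```javascript
// Hypothetical stand-ins for the two gates; names and parameters are illustrative.
function isDueExactHour(hour, sentToday) {
  if (hour !== 8) return false; // old gate: only fires during the 08:00 hour
  return !sentToday;
}

function isDueCatchUp(hour, sentToday) {
  if (hour < 8) return false; // new gate: any time from 08:00 onward
  return !sentToday;
}

// Daemon was crash-looping at 08:00 and recovers at 11:00 with no report sent yet.
console.log(isDueExactHour(11, false), isDueCatchUp(11, false)); // prints "false true"
```

The `sentToday` check is what keeps the catch-up gate from firing repeatedly once the report has gone out.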
@@ -775,12 +788,10 @@ async function startMonitor(options, stats, dailyAlerts, recentlyScanned, downlo
  console.warn(`[Archive] Failed to start periodic cleanup: ${err.message}`);
  }

- console.log(
-
-
-
- ╚════════════════════════════════════════════╝
- `);
+ console.log('\n' + banner([
+ "MUAD'DIB - Registry Monitor",
+ 'Scanning npm + PyPI new packages'
+ ]) + '\n');

  // Note: alerts file migrated from .json to .jsonl in v2.10.89
  const oldAlertsJson = ALERTS_FILE.replace('.jsonl', '.json');
@@ -1087,7 +1098,7 @@ async function startMonitor(options, stats, dailyAlerts, recentlyScanned, downlo

  // Graduated response at HIGH+
  if (pressureLevel >= MEMORY_PRESSURE_LEVELS.HIGH) {
- handleMemoryPressure(pressureLevel, heapRatio, recentlyScanned, downloadsCache, scanQueue);
+ handleMemoryPressure(pressureLevel, heapRatio, rssRatio, recentlyScanned, downloadsCache, scanQueue, stats);
  }
  lastMemoryLogTime = Date.now();
  }
package/src/monitor/ingestion.js
CHANGED
@@ -13,7 +13,7 @@ const { loadCachedIOCs } = require('../ioc/updater.js');
  const { enqueueScan } = require('./scan-queue.js');
  const {
  saveNpmSeq, CHANGES_STREAM_URL, CHANGES_LIMIT, CHANGES_CATCHUP_MAX,
- savePypiSerial, PYPI_XMLRPC_URL, PYPI_CATCHUP_MAX
+ savePypiSerial, PYPI_XMLRPC_URL, PYPI_CATCHUP_MAX, appendScanLedger
  } = require('./state.js');
  const { sendIOCPreAlert, sendCampaignPreAlert } = require('./webhook.js');

@@ -109,6 +109,14 @@ function httpsGet(url, timeoutMs = 30_000, deadlineMs = Math.max(timeoutMs * 2,
  clearTimeout(deadline);
  return httpsGet(location, timeoutMs, deadlineMs).then(resolve, reject);
  }
+ if (res.statusCode === 429) {
+ res.resume();
+ // Coordinated backoff: drain the SHARED token bucket so every in-flight registry fetch
+ // slows together. This high-volume packument/changes path must signal 429 like the
+ // metadata path (npm-registry.js) does — not just acquire a slot (CLAUDE.md 429 storm).
+ try { require('../shared/http-limiter.js').signal429(); } catch { /* limiter best-effort */ }
+ return done(new Error(`HTTP 429 rate limited for ${url}`));
+ }
  if (res.statusCode < 200 || res.statusCode >= 300) {
  res.resume();
  return done(new Error(`HTTP ${res.statusCode} for ${url}`));
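The hunk above relies on a shared token bucket that a 429 response drains for every caller. The diff only shows the call `signal429()`; the internals of `../shared/http-limiter.js` are not visible here. The following is a rough sketch of the idea under that assumption. `SharedTokenBucket`, `tryAcquire`, `refill` and the constructor options are hypothetical and may not match the real module:

```javascript
// Hypothetical shared limiter; the real http-limiter.js may be structured differently.
class SharedTokenBucket {
  constructor({ capacity, refillPerSec, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now; // injectable clock, so the behaviour can be exercised without waiting
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, up to capacity.
  refill() {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = t;
  }

  // Take one token if available; callers that get false should wait and retry.
  tryAcquire() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // A 429 from any request empties the bucket, so all callers slow down together.
  signal429() {
    this.tokens = 0;
    this.lastRefill = this.now();
  }
}

let clockMs = 0;
const bucket = new SharedTokenBucket({ capacity: 5, refillPerSec: 1, now: () => clockMs });
console.log(bucket.tryAcquire()); // true: bucket starts full
bucket.signal429();               // one request saw HTTP 429
console.log(bucket.tryAcquire()); // false: every caller is throttled
clockMs += 2000;                  // two seconds of refill
console.log(bucket.tryAcquire()); // true
```

Draining on 429, rather than only acquiring a slot per request, is what makes the backoff coordinated: the request that hit the limit slows down the requests that have not hit it yet.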
@@ -166,6 +174,11 @@ function httpsPost(url, body, headers = {}, timeoutMs = 30_000, deadlineMs = Mat
  if (err) reject(err); else resolve(value);
  };
  req = _deps.https.request(options, (res) => {
+ if (res.statusCode === 429) {
+ res.resume();
+ try { require('../shared/http-limiter.js').signal429(); } catch { /* limiter best-effort */ }
+ return done(new Error(`HTTP 429 rate limited for POST ${url}`));
+ }
  if (res.statusCode < 200 || res.statusCode >= 300) {
  res.resume();
  return done(new Error(`HTTP ${res.statusCode} for POST ${url}`));
@@ -418,6 +431,7 @@ function selectMostRecentVersion(packument, options = {}) {
  description: (typeof versionData.description === 'string') ? versionData.description : '',
  latestTagVersion,
  recentVersions: [],
+ droppedBurstVersions: [],
  };

  // Burst extras: other versions published within the recent window, excluding the
@@ -432,7 +446,13 @@ function selectMostRecentVersion(packument, options = {}) {
  const [v, ts] = versionTimes[i];
  if (ts < cutoff) break; // sorted desc, so once we cross the cutoff we're done
  result.recentWindowCount++;
- if (result.recentVersions.length >= maxRecent)
+ if (result.recentVersions.length >= maxRecent) {
+ // Burst beyond the enqueue cap: collect the version so the caller ledgers it as a
+ // coverage loss (it is never enqueued/scanned). Keeps a Miasma-style burst that
+ // outruns maxRecent visible instead of vanishing silently (CLAUDE.md "no silent caps").
+ result.droppedBurstVersions.push(v);
+ continue; // enqueue list capped; count continues
+ }
  const vData = versions[v];
  if (!vData) continue;
  result.recentVersions.push({
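The loop in the hunk above caps the enqueue list but now remembers what the cap cut off. A simplified sketch of that pattern, assuming `versionTimes` is a list of `[version, timestamp]` pairs sorted newest first; `splitByCap` is a hypothetical stand-in and leaves out the per-version metadata lookup that `selectMostRecentVersion` performs:

```javascript
// Hypothetical helper illustrating "cap the list, keep the overflow visible".
function splitByCap(versionTimes, cutoff, maxRecent) {
  const kept = [];
  const dropped = [];
  let windowCount = 0;
  for (const [version, ts] of versionTimes) {
    if (ts < cutoff) break;          // sorted newest first: everything after is older
    windowCount++;                   // the count keeps going even past the cap
    if (kept.length >= maxRecent) {
      dropped.push(version);         // over the cap: record it instead of losing it
      continue;
    }
    kept.push(version);
  }
  return { kept, dropped, windowCount };
}

const out = splitByCap(
  [['1.0.4', 500], ['1.0.3', 400], ['1.0.2', 300], ['1.0.1', 50]],
  100, // cutoff timestamp
  2    // maxRecent
);
console.log(out.kept);        // [ '1.0.4', '1.0.3' ]
console.log(out.dropped);     // [ '1.0.2' ]
console.log(out.windowCount); // 3
```

Returning the overflow instead of writing it to a ledger inside the loop keeps the selection function free of side effects; the caller decides how to record it.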
@@ -502,6 +522,16 @@ async function getNpmLatestTarball(packageName) {
  age_days: null, version_count: 0,
  };
  }
+ // A3: ledger burst versions dropped by the maxRecent enqueue cap — they are never scanned,
+ // so record each as a 'dropped' coverage loss (source burst_extras_cap) for the coverage
+ // audit. Best-effort; never throws. selectMostRecentVersion stays pure (it only collects).
+ if (result.droppedBurstVersions && result.droppedBurstVersions.length) {
+ for (const v of result.droppedBurstVersions) {
+ try {
+ appendScanLedger({ name: packageName, version: v, ecosystem: 'npm', outcome: 'dropped', source: 'burst_extras_cap' });
+ } catch { /* ledger is best-effort */ }
+ }
+ }
  // Stage 2.1 — extract reputation signals from the packument we already have,
  // so triageRisk in queue.js doesn't have to refetch metadata via
  // getPackageMetadata. Two fields are derivable from the packument alone:
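The ledger loop in the hunk above wraps each write in its own `try`, so one failed write does not stop the remaining versions from being recorded. A small sketch of that behaviour with an injected writer; `ledgerDroppedVersions` and `appendEntry` are hypothetical names, and the entry shape is copied from the diff:

```javascript
// Hypothetical wrapper around a ledger writer; never throws to its caller.
function ledgerDroppedVersions(appendEntry, packageName, versions) {
  let recorded = 0;
  for (const version of versions) {
    try {
      appendEntry({
        name: packageName,
        version,
        ecosystem: 'npm',
        outcome: 'dropped',
        source: 'burst_extras_cap',
      });
      recorded++;
    } catch {
      // Best-effort: a failed write is skipped and the loop moves on.
    }
  }
  return recorded;
}

// In-memory stand-in for the ledger that fails on one specific version.
const ledger = [];
const flakyAppend = (entry) => {
  if (entry.version === '1.0.2') throw new Error('disk full');
  ledger.push(entry);
};

console.log(ledgerDroppedVersions(flakyAppend, 'example-pkg', ['1.0.1', '1.0.2', '1.0.3'])); // 2
console.log(ledger.map((e) => e.version)); // [ '1.0.1', '1.0.3' ]
```

Placing the `try` inside the loop rather than around it is the relevant design choice: with a single outer `try`, the first failure would silently drop every later version from the audit trail.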