muaddib-scanner 2.11.76 → 2.11.77
- package/.githooks/pre-commit +18 -0
- package/README.md +15 -6
- package/package.json +1 -2
- package/{self-scan-v2.11.76.json → self-scan-v2.11.77.json} +1 -1
- package/src/commands/safe-install.js +8 -3
- package/src/monitor/daemon.js +34 -22
- package/src/monitor/ingestion.js +32 -2
- package/src/monitor/queue.js +84 -21
- package/src/monitor/scan-queue.js +68 -1
- package/src/monitor/state.js +24 -1
- package/src/monitor/webhook.js +32 -11
- package/src/scanner/temporal-analysis.js +8 -0
- package/src/scanner/temporal-ast-diff.js +5 -0
- package/.dockerignore +0 -7
- package/.env.example +0 -43
- package/ml-retrain/auto-labeler/auto_labeler.py +0 -312
- package/ml-retrain/auto-labeler/ghsa_checker.py +0 -169
- package/ml-retrain/auto-labeler/labeler.py +0 -256
- package/ml-retrain/auto-labeler/npm_checker.py +0 -228
- package/ml-retrain/auto-labeler/ossf_index.py +0 -178
- package/ml-retrain/auto-labeler/requirements.txt +0 -1
- package/ml-retrain/confusion-matrix.png +0 -0
- package/ml-retrain/model-trees-retrained.js +0 -12
- package/ml-retrain/retrain-report.json +0 -225
- package/ml-retrain/retrain.py +0 -974
- package/sbom.json +0 -0
- package/src/ml/train-bundler-detector.py +0 -725
- package/src/ml/train-xgboost.py +0 -957
- package/tools/export-model-js.py +0 -160
- package/tools/requirements-ml.txt +0 -5
- package/tools/train-classifier.py +0 -333
package/.githooks/pre-commit
@@ -0,0 +1,18 @@
+#!/bin/sh
+# Pre-commit guard — block committing a typosquat/forbidden dependency
+# (loadash, lodash, lodahs, …). Local mirror of the CI gate
+# (scripts/check-deps-typosquats.js, run in the `test` job of .github/workflows/scan.yml)
+# and defense-in-depth with the package.json `preinstall` denylist.
+#
+# Enable once per clone: git config core.hooksPath .githooks
+#
+# This repo uses NEITHER lodash NOR loadash (CLAUDE.md interdiction). The loadash
+# typosquat has re-entered package.json repeatedly; this stops it at commit time.
+
+node scripts/check-deps-typosquats.js
+status=$?
+if [ "$status" -ne 0 ]; then
+  echo "pre-commit: ABORTED — remove the typosquat dependency from package.json (see above)." >&2
+  exit 1
+fi
+exit 0
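The hook delegates to scripts/check-deps-typosquats.js, whose contents are not part of this diff. As a rough illustration of what such a gate could look like, here is a minimal sketch. The denylist contents, the function name `findForbiddenDeps`, and the set of dependency sections checked are all assumptions, not the project's actual code.

```javascript
// Hypothetical denylist; the real script's list is not shown in this diff.
const FORBIDDEN_DEPS = new Set(['loadash', 'lodash', 'lodahs']);

// Return "section.name" for every forbidden package found in a parsed package.json.
function findForbiddenDeps(pkg, denylist = FORBIDDEN_DEPS) {
  const sections = ['dependencies', 'devDependencies', 'optionalDependencies', 'peerDependencies'];
  const hits = [];
  for (const section of sections) {
    for (const name of Object.keys(pkg[section] || {})) {
      if (denylist.has(name)) hits.push(`${section}.${name}`);
    }
  }
  return hits;
}

// Manifest shaped like the pre-2.11.77 package.json, where `loadash` was present.
const contaminatedManifest = {
  dependencies: { 'js-yaml': '4.2.0', loadash: '^1.0.0', 'web-tree-sitter': '^0.26.9' },
};
const cleanManifest = {
  dependencies: { 'js-yaml': '4.2.0', 'web-tree-sitter': '^0.26.9' },
};

const contaminatedHits = findForbiddenDeps(contaminatedManifest);
const cleanHits = findForbiddenDeps(cleanManifest);
console.log(contaminatedHits, cleanHits);
```

A command-line wrapper around this would presumably read package.json from disk and exit non-zero when the list is non-empty, which is the signal the hook's `status=$?` check relies on.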
package/README.md
CHANGED
@@ -30,7 +30,7 @@
 
 npm and PyPI supply-chain attacks are exploding. Shai-Hulud compromised 25K+ repos in 2025. Existing tools detect threats but don't help you respond.
 
-MUAD'DIB combines **20 parallel scanners** (
+MUAD'DIB combines **20 parallel scanners** (264 detection rules), a **deobfuscation engine**, **inter-module dataflow analysis**, **compound scoring** (17 compound rules), and a gVisor/Docker sandbox to detect known threats and suspicious behavioral patterns in npm and PyPI packages. An XGBoost classifier exists in the codebase but is **currently inactive** (see [Evaluation Metrics](#evaluation-metrics) → ML Classifier section).
 
 ---
 
@@ -202,9 +202,9 @@ muaddib replay # Ground truth validation (90/94 TPR@3, v2.11
 | Python Source (PYSRC) | Import-time / install-time RCE patterns in `__init__.py` / `setup.py` (v2.11.41 — closes TrapDoor PyPI gap) |
 | Python AST (PYAST) | Tree-sitter-Python AST with taint-aware detectors (v2.11.42+) |
 
-###
+### 264 detection rules
 
-All rules (
+All rules (259 RULES + 5 PARANOID) are mapped to MITRE ATT&CK techniques. See [SECURITY.md](SECURITY.md#detection-rules-v21176) for the complete rules reference.
 
 ### Detected campaigns
 
@@ -278,7 +278,7 @@ With pre-commit framework:
 ```yaml
 repos:
   - repo: https://github.com/DNSZLSK/muad-dib
-    rev: v2.11.
+    rev: v2.11.76
     hooks:
       - id: muaddib-scan
 ```
@@ -303,11 +303,20 @@ These are the numbers a user gets when running `muaddib scan` against npm or PyP
 | **FPR PyPI** (v2.11.48, first honest measurement) | **9.68%** (12/124 scanned, 132 total) | **Track D fixed the PyPI downloader** — removed `pip --no-binary :all:` flag (forced compile of wheel-only packages, timed out 38% of the time) + added `.whl` extraction via `extractArchive()`. Brought 42 previously-skipped giants (numpy/pandas/django/matplotlib/scikit-learn/...) into scope. All 12 FPs cluster at score 25-35: this is the cap-PyPI-35 artifact, not new rule misfires. Lifting the cap (Track E) would drop FPR PyPI to ≈0%. 8 residual fails are >500MB packages (torch, tensorflow, scipy, opencv-python, ansible…) hitting the 30s `PACK_TIMEOUT_MS`. |
 | **ADR** (Adversarial + Holdout, v2.11.48) | **96.26%** (103/107) | 67 adversarial + 40 holdout, global threshold=20. Stable vs v2.10.95. |
 
-**
+**4132 tests** across 115 files. **264 rules** (259 RULES + 5 PARANOID; v2.11.67/70 Phantom Gyp added PKG-023 + COMPOUND-017).
 
 **Known issues (v2.11.48):**
 - *Cap PyPI à 35/100*: Python samples plafonnent à `riskScore=35` even when `globalRiskScore=100`. Confirmed empirically — all 12 PyPI FPs at score 25-35 (flask 32, django 35, tornado 35, bottle 30, pandas 25, matplotlib 25, plotly 25, bokeh 25, pymongo 35, coverage 32, fabric 35, websockets 35). Lifting the cap will simultaneously drop FPR PyPI to ≈0% and unblock PyPI MALWARE detection at higher thresholds. Track E target.
 
+### Operational coverage (v2.11.67-76)
+
+The static ground-truth TPR above is measured offline. Since v2.11.67 the monitor also tracks **operational** coverage on live npm/PyPI ingestion:
+- A per-scan **ledger** (`data/scan-ledger.jsonl`) records every scanned package's outcome; `computeLedgerRollup()` produces a 24h rollup (`alertRate`, per-ecosystem). Note: `alertRate` is a throughput signal, **not** detection TPR.
+- An active **GHSA poller** (~15 min; npm, pypi, crates) builds an authoritative "what should we have caught" denominator (`data/ghsa-malware.jsonl`), plus a **feed-health** alarm that fires when an IOC feed silently goes dark.
+- The Phase 5 **coverage-audit** (`scripts/coverage-audit.js`, daily 05:00 UTC) joins that denominator against ledger outcomes + the tarball archive to compute an honest GHSA-denominated **operational TPR** (`alerted / total`), and surfaces `scannedClean` misses as human-gated ground-truth candidates.
+
+This operational TPR is the real production detection rate, distinct from the static GT TPR (which has not been re-measured since v2.11.48).
+
 ### ML Classifier (offline only)
 
 `src/ml/classifier.js` is **not wired into `muaddib scan`**. The XGBoost model is currently exercised only by `muaddib evaluate` (offline metric replay) and `muaddib monitor` (LOG-ONLY since 2026-04-08, model collapsed pending retrain — see `src/monitor/queue.js:628`). The v2.11.48 evaluate-time replay shows the same 1.10% FPR (no additional FPs filtered) — kept as a reference for retrain validation, but the published operational FPR is the rules-only number above.
@@ -371,7 +380,7 @@ npm test
 
 ### Testing
 
-- **
+- **4132 tests** across 115 modular test files
 - **56 fuzz tests** - Malformed inputs, ReDoS, unicode, binary
 - **Datadog 17K benchmark** - 14,587 confirmed malware samples (in-scope)
 - **Ground truth validation** - 96 real-world attacks (95.74% TPR@3, 88.30% TPR@20 — v2.11.48 full measure on 94 in-scope)
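The README's new "Operational coverage" section describes the operational TPR as `alerted / total`, computed by joining the GHSA denominator against ledger outcomes. The join can be pictured with the sketch below. The record shapes, the `'alerted'` and `'clean'` outcome strings, the package names, and the `operationalTpr` helper are illustrative assumptions; the real schemas of data/ghsa-malware.jsonl and data/scan-ledger.jsonl and the logic of scripts/coverage-audit.js are not shown in this diff.

```javascript
// Made-up sample data standing in for the two JSONL files (schemas are assumed).
const ghsaDenominator = [
  { ecosystem: 'npm', name: 'evil-a', version: '1.0.0' },
  { ecosystem: 'npm', name: 'evil-b', version: '2.1.0' },
  { ecosystem: 'pypi', name: 'evil-c', version: '0.3.0' },
  { ecosystem: 'npm', name: 'evil-d', version: '1.0.1' },
];
const ledgerEntries = [
  { ecosystem: 'npm', name: 'evil-a', version: '1.0.0', outcome: 'alerted' },
  { ecosystem: 'npm', name: 'evil-b', version: '2.1.0', outcome: 'clean' },
  { ecosystem: 'pypi', name: 'evil-c', version: '0.3.0', outcome: 'dropped' },
];

const keyOf = (r) => `${r.ecosystem}:${r.name}@${r.version}`;

// Join every advisory against the ledger and count how many were alerted on.
function operationalTpr(denominator, ledger) {
  const outcomeByKey = new Map(ledger.map((e) => [keyOf(e), e.outcome]));
  const report = { total: denominator.length, alerted: 0, scannedClean: [], notAlerted: [] };
  for (const advisory of denominator) {
    const outcome = outcomeByKey.get(keyOf(advisory));
    if (outcome === 'alerted') report.alerted++;
    else if (outcome === 'clean') report.scannedClean.push(keyOf(advisory)); // scanned but missed
    else report.notAlerted.push(keyOf(advisory)); // dropped, errored, or never seen
  }
  report.tpr = report.total > 0 ? report.alerted / report.total : null;
  return report;
}

const coverageReport = operationalTpr(ghsaDenominator, ledgerEntries);
console.log(coverageReport);
```

In this sample one of four advisories was alerted on, so the sketch reports a TPR of 0.25. The denominator is the full advisory list rather than only the scanned packages, which is why dropped or never-seen packages still count against the rate.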
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "muaddib-scanner",
-  "version": "2.11.
+  "version": "2.11.77",
   "description": "Supply-chain threat detection & response for npm & PyPI/Python",
   "main": "src/index.js",
   "bin": {
@@ -52,7 +52,6 @@
     "acorn-walk": "8.3.5",
     "adm-zip": "0.5.17",
     "js-yaml": "4.2.0",
-    "loadash": "^1.0.0",
     "web-tree-sitter": "^0.26.9"
   },
   "devDependencies": {
package/src/commands/safe-install.js
@@ -1,6 +1,11 @@
 const fs = require('fs');
 const path = require('path');
-
+// NB: keep the module object (cp.spawnSync), NOT a destructured `spawnSync`. Destructuring
+// captures the original reference at load time, which makes the function impossible to mock
+// from tests (`cp.spawnSync = ...` wouldn't be seen here) — that's exactly how the
+// safe-install test's mock silently failed and a real `npm install loadash` ran every test
+// run, re-contaminating package.json. Property access keeps it interceptable.
+const cp = require('child_process');
 const { loadCachedIOCs } = require('../ioc/updater.js');
 const { REHABILITATED_PACKAGES, NPM_PACKAGE_REGEX } = require('../shared/constants.js');
 
@@ -172,7 +177,7 @@ async function scanPackageRecursive(pkg, depth = 0, maxDepth = 3) {
 // Get the package info (uses spawnSync to avoid injection)
 let pkgInfo;
 try {
-const result = spawnSync('npm', ['view', pkgName, '--json'], { encoding: 'utf8', shell: false });
+const result = cp.spawnSync('npm', ['view', pkgName, '--json'], { encoding: 'utf8', shell: false });
 if (result.status !== 0 || !result.stdout) {
 if (depth === 0) console.log(`[!] Package ${pkgName} not found on npm`);
 return { safe: false, package: pkgName, reason: 'npm_unreachable', source: 'npm-registry', description: 'Package not found on npm registry', depth };
@@ -276,7 +281,7 @@ async function safeInstall(packages, options = {}) {
 if (isDev) npmArgs.push('--save-dev');
 if (isGlobal) npmArgs.push('-g');
 
-const result = spawnSync('npm', npmArgs, { stdio: 'inherit', shell: false });
+const result = cp.spawnSync('npm', npmArgs, { stdio: 'inherit', shell: false });
 
 if (result.status !== 0) {
 console.log('');
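The comment added at the top of safe-install.js explains why a destructured `spawnSync` cannot be mocked after load while `cp.spawnSync` can. The effect is easy to reproduce in isolation. The sketch below uses a plain object, `fakeCp`, as a stand-in for the `child_process` module so that nothing is actually spawned; the names are illustrative only.

```javascript
// Stand-in for a module object such as require('child_process').
const fakeCp = { spawnSync: () => 'real' };

// Destructured at load time: this binding keeps the original function forever.
const { spawnSync } = fakeCp;
const callDestructured = () => spawnSync();

// Property access at call time: the function is looked up on the object on every call.
const callViaProperty = () => fakeCp.spawnSync();

// A test swaps the export after the code under test has already been loaded.
fakeCp.spawnSync = () => 'mocked';

console.log(callDestructured()); // prints "real": the mock is never seen
console.log(callViaProperty());  // prints "mocked": the mock is intercepted
```

This is the failure mode the comment describes: with the destructured form, the test's replacement is silently ignored and the real command runs.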
package/src/monitor/daemon.js
CHANGED
@@ -5,14 +5,15 @@ const os = require('os');
 const v8 = require('v8');
 const { isDockerAvailable, SANDBOX_CONCURRENCY_MAX, killAllSandboxContainers } = require('../sandbox/index.js');
 const { setVerboseMode, isSandboxEnabled, isCanaryEnabled, isLlmDetectiveEnabled, getLlmDetectiveMode, DOWNLOADS_CACHE_TTL } = require('./classify.js');
-const { loadState, saveState, loadDailyStats, saveDailyStats, purgeTarballCache,
+const { loadState, saveState, loadDailyStats, saveDailyStats, purgeTarballCache, isDailyReportDue, atomicWriteFileSync, saveNpmSeq, ALERTS_FILE, runStateMigrations, loadRecentlyScanned, saveRecentlyScanned } = require('./state.js');
 const { isTemporalEnabled, isTemporalAstEnabled, isTemporalPublishEnabled, isTemporalMaintainerEnabled } = require('./temporal.js');
-const { pendingGrouped, flushScopeGroup, sendDailyReport,
+const { pendingGrouped, flushScopeGroup, sendDailyReport, alertedPackageRules, ALERTED_PACKAGES_MAX: MAX_ALERTED_PACKAGES } = require('./webhook.js');
 const { poll, getPollBackoffMs } = require('./ingestion.js');
 const { ensureWorkers, drainWorkers, getTargetConcurrency, setTargetConcurrency, getActiveWorkers, terminateAllWorkers } = require('./queue.js');
 const { computeTarget, ADJUST_INTERVAL_MS, BASE_CONCURRENCY } = require('./adaptive-concurrency.js');
 const { startHealthcheck } = require('./healthcheck.js');
 const { startDeferredWorker, stopDeferredWorker, persistDeferredQueue, restoreDeferredQueue, clearDeferredQueue } = require('./deferred-sandbox.js');
+const { evictFromScanQueueBulk } = require('./scan-queue.js');
 const { startGhsaPoller, stopGhsaPoller } = require('../ioc/ghsa-poller.js');
 const { cleanupOldArchives, getRetentionDays, startPeriodicCleanup } = require('./tarball-archive.js');
 const { clearMetadataCache } = require('../scanner/temporal-analysis.js');
@@ -532,8 +533,13 @@ function pruneMemoryCaches(recentlyScanned, downloadsCache, alertedPackageRules)
 * Worker spawning is gated separately in the main loop (ensureWorkers skipped at HIGH+).
 * Ingestion is gated in ingestion.js via getMemoryPressureLevel() (skipped at CRITICAL+).
 */
-function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, scanQueue) {
+function handleMemoryPressure(level, ratio, rssRatio, recentlyScanned, downloadsCache, scanQueue, stats) {
 const pct = (ratio * 100).toFixed(0);
+// Show BOTH arms: an EMERGENCY almost always fires on RSS (off-heap — gVisor containers,
+// tarball buffers) while the heap sits low (~15%). Logging only heap made every breaker
+// line read "heap at 15%" and hid the real cause; memPctLabel surfaces which arm tripped.
+const rssPct = (rssRatio != null && isFinite(rssRatio)) ? (rssRatio * 100).toFixed(0) : '?';
+const memPctLabel = `heap ${pct}% / rss ${rssPct}%`;
 // Structured summary of what the breaker actually did this tick. Returned (the poll loop
 // at the call site ignores it) so the reclaim is observable to callers and tests without
 // scraping console output — CLAUDE.md §3 "Toujours logger un resume". The two kill fields
@@ -543,7 +549,7 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
 
 // HIGH (85%+): clear auxiliary caches — same as old emergency prune
 if (level >= MEMORY_PRESSURE_LEVELS.HIGH) {
-console.error(`[MONITOR] MEMORY PRESSURE HIGH:
+console.error(`[MONITOR] MEMORY PRESSURE HIGH: ${memPctLabel} — pruning caches, stopping new workers`);
 recentlyScanned.clear();
 downloadsCache.clear();
 alertedPackageRules.clear();
@@ -552,7 +558,7 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
 
 // CRITICAL (90%+): clear scanner caches, force GC
 if (level >= MEMORY_PRESSURE_LEVELS.CRITICAL) {
-console.error(`[MONITOR] MEMORY PRESSURE CRITICAL:
+console.error(`[MONITOR] MEMORY PRESSURE CRITICAL: ${memPctLabel} — stopping ingestion, clearing scanner caches`);
 // temporal-analysis._metadataCache (200 entries × full npm registry metadata)
 try { clearMetadataCache(); } catch {}
 // typosquat metadataCache (500 entries × npm registry metadata for typosquat scoring)
@@ -578,21 +584,30 @@ function handleMemoryPressure(level, ratio, recentlyScanned, downloadsCache, sca
 if (level >= MEMORY_PRESSURE_LEVELS.EMERGENCY) {
 const queueBefore = scanQueue.length;
 if (queueBefore > EMERGENCY_QUEUE_KEEP) {
-//
-//
-//
-//
-
-// splice
-scanQueue
+// Protected-aware bulk eviction — SINGLE SOURCE OF TRUTH with the queue-cap path
+// (scan-queue.js evictFromScanQueueBulk / enqueueScan share _isProtected). Keeps
+// IOC-match / burst / first-publish / ATO scans, drops the oldest UNPROTECTED items
+// first (newest survive — most likely to still exist for re-scan), protected only as
+// a last resort, and LEDGERS every drop. Closes the v2.10.88 gap where the raw
+// splice(0,n) silently dropped protected scans (CLAUDE.md "ne jamais perdre de scan").
+const { dropped, droppedProtected } = evictFromScanQueueBulk(scanQueue, EMERGENCY_QUEUE_KEEP, 'mem_emergency');
 summary.queueDropped = dropped;
-
+summary.queueDroppedProtected = droppedProtected;
+if (stats) {
+stats.queueEmergencyDrops = (stats.queueEmergencyDrops || 0) + dropped;
+if (droppedProtected) stats.queueEmergencyProtectedDrops = (stats.queueEmergencyProtectedDrops || 0) + droppedProtected;
+}
+console.error(`[MONITOR] MEMORY EMERGENCY: ${memPctLabel} — truncated queue ${queueBefore} → ${scanQueue.length} (dropped ${dropped} oldest UNPROTECTED${droppedProtected ? ` + ${droppedProtected} protected as last resort` : ''}, all ledgered)`);
 }
 // Clear deferred sandbox queue (holds full staticResult objects)
 const deferredDropped = clearDeferredQueue();
 summary.deferredDropped = deferredDropped;
 if (deferredDropped > 0) {
-
+// Observability only (counter, NOT a ledger 'dropped' entry): the deferred queue holds
+// post-scan sandbox ENRICHMENT for packages already statically scanned + alerted, so
+// clearing it is not a coverage loss — ledgering them as 'dropped' would mislabel them.
+if (stats) stats.deferredDroppedEmergency = (stats.deferredDroppedEmergency || 0) + deferredDropped;
+console.error(`[MONITOR] MEMORY EMERGENCY: cleared ${deferredDropped} deferred sandbox items (post-scan enrichment only — primary alerts already sent)`);
 }
 // Free the off-heap leak that queue truncation can't touch: orphaned sandbox
 // containers (gVisor runsc survives `docker kill`) and wedged scan workers.
@@ -642,13 +657,10 @@ function reportStats(stats) {
 stats.lastReportTime = Date.now();
 }
 
-
-
-
-
-const { hasReportBeenSentToday } = require('./state.js');
-return !hasReportBeenSentToday(stats);
-}
+// isDailyReportDue is the canonical gate in state.js (imported above) — re-exported below
+// so monitor.js (daemonModule.isDailyReportDue) keeps resolving. The old local copy used a
+// `hour !== 8` gate that lost a whole day whenever the daemon missed the single 08:00 minute
+// (OOM crash-loop); state.js uses the catch-up `hour >= 8` gate instead.
 
 // ─── P1.0 — memory-trend instrumentation ───
 // Append one sample per memory-watchdog tick so the off-heap leak can be localised
@@ -1087,7 +1099,7 @@ async function startMonitor(options, stats, dailyAlerts, recentlyScanned, downlo
 
 // Graduated response at HIGH+
 if (pressureLevel >= MEMORY_PRESSURE_LEVELS.HIGH) {
-handleMemoryPressure(pressureLevel, heapRatio, recentlyScanned, downloadsCache, scanQueue);
+handleMemoryPressure(pressureLevel, heapRatio, rssRatio, recentlyScanned, downloadsCache, scanQueue, stats);
 }
 lastMemoryLogTime = Date.now();
 }
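The EMERGENCY branch now calls `evictFromScanQueueBulk`, which lives in scan-queue.js and is not visible in this part of the diff. Going only by the comment (oldest unprotected items first, protected items as a last resort, every drop ledgered), one plausible shape is sketched below. The `isProtected` predicate, the item fields, the `onDrop` callback, and the convention that `dropped` counts all drops with `droppedProtected` as a subset are assumptions made for illustration.

```javascript
// Hypothetical protection test; the real _isProtected in scan-queue.js is not shown here.
const isProtected = (item) => Boolean(item.iocMatch || item.burst || item.firstPublish);

// Trim `queue` in place to at most `keep` items. The queue is ordered oldest first.
function evictBulk(queue, keep, onDrop = () => {}) {
  let toDrop = Math.max(0, queue.length - keep);
  const result = { dropped: 0, droppedProtected: 0 };
  if (toDrop === 0) return result;

  const survivors = [];
  // Pass 1: walk oldest to newest, dropping unprotected items until the quota is met.
  for (const item of queue) {
    if (toDrop > 0 && !isProtected(item)) {
      onDrop(item); // e.g. write a 'dropped' ledger entry
      toDrop--;
      result.dropped++;
    } else {
      survivors.push(item);
    }
  }
  // Pass 2 (last resort): only protected items remain, so drop the oldest of those.
  while (toDrop > 0 && survivors.length > 0) {
    onDrop(survivors.shift());
    toDrop--;
    result.dropped++;
    result.droppedProtected++;
  }
  queue.splice(0, queue.length, ...survivors);
  return result;
}

const mixedQueue = [
  { name: 'a' }, { name: 'b', iocMatch: true }, { name: 'c' }, { name: 'd' }, { name: 'e', burst: true },
];
const mixedResult = evictBulk(mixedQueue, 2);
console.log(mixedResult, mixedQueue.map((i) => i.name));
```

With a mixed queue and room for two, the three unprotected items are dropped and both protected items survive, which is the behaviour the raw `splice(0, n)` could not guarantee.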
package/src/monitor/ingestion.js
CHANGED
@@ -13,7 +13,7 @@ const { loadCachedIOCs } = require('../ioc/updater.js');
 const { enqueueScan } = require('./scan-queue.js');
 const {
 saveNpmSeq, CHANGES_STREAM_URL, CHANGES_LIMIT, CHANGES_CATCHUP_MAX,
-savePypiSerial, PYPI_XMLRPC_URL, PYPI_CATCHUP_MAX
+savePypiSerial, PYPI_XMLRPC_URL, PYPI_CATCHUP_MAX, appendScanLedger
 } = require('./state.js');
 const { sendIOCPreAlert, sendCampaignPreAlert } = require('./webhook.js');
 
@@ -109,6 +109,14 @@ function httpsGet(url, timeoutMs = 30_000, deadlineMs = Math.max(timeoutMs * 2,
 clearTimeout(deadline);
 return httpsGet(location, timeoutMs, deadlineMs).then(resolve, reject);
 }
+if (res.statusCode === 429) {
+res.resume();
+// Coordinated backoff: drain the SHARED token bucket so every in-flight registry fetch
+// slows together. This high-volume packument/changes path must signal 429 like the
+// metadata path (npm-registry.js) does — not just acquire a slot (CLAUDE.md 429 storm).
+try { require('../shared/http-limiter.js').signal429(); } catch { /* limiter best-effort */ }
+return done(new Error(`HTTP 429 rate limited for ${url}`));
+}
 if (res.statusCode < 200 || res.statusCode >= 300) {
 res.resume();
 return done(new Error(`HTTP ${res.statusCode} for ${url}`));
@@ -166,6 +174,11 @@ function httpsPost(url, body, headers = {}, timeoutMs = 30_000, deadlineMs = Mat
 if (err) reject(err); else resolve(value);
 };
 req = _deps.https.request(options, (res) => {
+if (res.statusCode === 429) {
+res.resume();
+try { require('../shared/http-limiter.js').signal429(); } catch { /* limiter best-effort */ }
+return done(new Error(`HTTP 429 rate limited for POST ${url}`));
+}
 if (res.statusCode < 200 || res.statusCode >= 300) {
 res.resume();
 return done(new Error(`HTTP ${res.statusCode} for POST ${url}`));
@@ -418,6 +431,7 @@ function selectMostRecentVersion(packument, options = {}) {
 description: (typeof versionData.description === 'string') ? versionData.description : '',
 latestTagVersion,
 recentVersions: [],
+droppedBurstVersions: [],
 };
 
 // Burst extras: other versions published within the recent window, excluding the
@@ -432,7 +446,13 @@ function selectMostRecentVersion(packument, options = {}) {
 const [v, ts] = versionTimes[i];
 if (ts < cutoff) break; // sorted desc, so once we cross the cutoff we're done
 result.recentWindowCount++;
-if (result.recentVersions.length >= maxRecent)
+if (result.recentVersions.length >= maxRecent) {
+// Burst beyond the enqueue cap: collect the version so the caller ledgers it as a
+// coverage loss (it is never enqueued/scanned). Keeps a Miasma-style burst that
+// outruns maxRecent visible instead of vanishing silently (CLAUDE.md "no silent caps").
+result.droppedBurstVersions.push(v);
+continue; // enqueue list capped; count continues
+}
 const vData = versions[v];
 if (!vData) continue;
 result.recentVersions.push({
@@ -502,6 +522,16 @@ async function getNpmLatestTarball(packageName) {
 age_days: null, version_count: 0,
 };
 }
+// A3: ledger burst versions dropped by the maxRecent enqueue cap — they are never scanned,
+// so record each as a 'dropped' coverage loss (source burst_extras_cap) for the coverage
+// audit. Best-effort; never throws. selectMostRecentVersion stays pure (it only collects).
+if (result.droppedBurstVersions && result.droppedBurstVersions.length) {
+for (const v of result.droppedBurstVersions) {
+try {
+appendScanLedger({ name: packageName, version: v, ecosystem: 'npm', outcome: 'dropped', source: 'burst_extras_cap' });
+} catch { /* ledger is best-effort */ }
+}
+}
 // Stage 2.1 — extract reputation signals from the packument we already have,
 // so triageRisk in queue.js doesn't have to refetch metadata via
 // getPackageMetadata. Two fields are derivable from the packument alone:
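The two ingestion.js changes above split the burst-cap handling into a pure collection step and a side-effecting ledger step. The following sketch isolates that pattern. It is a simplification: `splitBurst` and `ledgerDropped` are hypothetical names, versions are plain strings rather than the richer objects the real code pushes, the real function's exclusion of the already-selected version is omitted, and the ledger writer is injected so the example has no file I/O.

```javascript
// Pure selection: newest first, stop at the cutoff, cap the enqueue list at maxRecent.
function splitBurst(versionTimes, cutoff, maxRecent) {
  const result = { recentWindowCount: 0, recentVersions: [], droppedBurstVersions: [] };
  for (const [version, ts] of versionTimes) { // assumed sorted by timestamp, descending
    if (ts < cutoff) break;
    result.recentWindowCount++;
    if (result.recentVersions.length >= maxRecent) {
      result.droppedBurstVersions.push(version); // never scanned, so keep it visible
      continue; // the enqueue list is capped; counting continues
    }
    result.recentVersions.push(version);
  }
  return result;
}

// Caller side: record each capped version through a best-effort ledger writer.
function ledgerDropped(packageName, droppedVersions, appendLedger) {
  for (const version of droppedVersions) {
    try {
      appendLedger({ name: packageName, version, ecosystem: 'npm', outcome: 'dropped', source: 'burst_extras_cap' });
    } catch { /* ledger is best-effort */ }
  }
}

const sampleTimes = [['1.0.4', 500], ['1.0.3', 400], ['1.0.2', 300], ['1.0.1', 200], ['1.0.0', 50]];
const burst = splitBurst(sampleTimes, 100, 2);
const ledgerLines = [];
ledgerDropped('example-pkg', burst.droppedBurstVersions, (entry) => ledgerLines.push(entry));
console.log(burst, ledgerLines.length);
```

Keeping the selection pure means it can be unit-tested without touching the ledger, while the caller decides how and where the coverage loss is recorded.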
package/src/monitor/queue.js
CHANGED
@@ -32,8 +32,7 @@ const {
 tarballCacheKey,
 tarballCachePath,
 appendAlert,
-
-hasReportBeenSentToday,
+isDailyReportDue,
 MAX_DAILY_ALERTS,
 loadScanMemory,
 shouldSuppressByMemory,
@@ -64,8 +63,7 @@ const {
 computeReputationFactor,
 triageRisk,
 sendDailyReport,
-alertedPackageRules
-DAILY_REPORT_HOUR
+alertedPackageRules
 } = require('./webhook.js');
 
 // From ./temporal.js
@@ -99,10 +97,11 @@ let _targetConcurrency = BASE_CONCURRENCY;
 const SCAN_CONCURRENCY = BASE_CONCURRENCY; // legacy export — tests check this value
 let _activeWorkers = 0;
 const _workerPromises = new Set();
-// Live static-scan Worker threads
-//
-// ASTs)
-
+// Live static-scan Worker threads, mapped to the {name,version,ecosystem} of the scan they
+// run — tracked so the daemon's EMERGENCY memory handler can terminate orphaned workers
+// (each retains its isolate heap + parsed ASTs) AND name the in-flight scans it kills.
+// Bounded by concurrency, so it stays tiny.
+const _liveWorkers = new Map();
 
 function getTargetConcurrency() { return _targetConcurrency; }
 function setTargetConcurrency(n) { _targetConcurrency = Math.max(MIN_CONCURRENCY, Math.min(MAX_CONCURRENCY, n)); }
@@ -115,10 +114,20 @@ function getActiveWorkers() { return _activeWorkers; }
 */
 function terminateAllWorkers() {
 let n = 0;
-
-
+const dropped = [];
+for (const [w, item] of Array.from(_liveWorkers.entries())) {
+try {
+w.terminate(); n++;
+if (item && item.name) dropped.push(`${item.name}@${item.version || '?'}`);
+} catch { /* already gone */ }
 _liveWorkers.delete(w);
 }
+if (dropped.length) {
+// The terminate rejects each scan's worker promise; that reject propagates to
+// scanPackage's catch, which ledgers it (outcome:'error', source scan_error) — so these
+// in-flight scans are NOT lost from the scan-ledger. This line names them for the operator.
+console.error(`[MONITOR] EMERGENCY worker-terminate killed ${dropped.length} in-flight scan(s): ${dropped.slice(0, 20).join(', ')}${dropped.length > 20 ? ` (+${dropped.length - 20} more)` : ''}`);
+}
 return n;
 }
 const SCAN_TIMEOUT_MS = 300_000; // 5 minutes per package (3 sandbox runs × 90s + static scan headroom)
@@ -388,7 +397,8 @@ function runScanInWorker(extractedDir, timeoutMs, scanContext = null, signal = n
 const worker = new Worker(SCAN_WORKER_PATH, {
 workerData: { extractedDir, scanContext: scanContext || {} }
 });
-
+const _sc = scanContext || {};
+_liveWorkers.set(worker, { name: _sc.name, version: _sc.version, ecosystem: _sc.ecosystem });
 
 let settled = false;
 let timer = null;
@@ -639,6 +649,11 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
 // deliberately hangs the parser to evade analysis would otherwise be relabelled
 // benign. Count as inconclusive (excluded from the FP/TP denominator).
 updateScanStats('sandbox_inconclusive');
+// Ledger the inconclusive timeout — the 'static_timeout' outcome existed but was
+// emitted nowhere, so a parser-hang evasion vanished from coverage. Best-effort.
+try {
+appendScanLedger({ name, version, ecosystem, outcome: 'static_timeout', source: 'static_timeout' });
+} catch { /* ledger is best-effort */ }
 return { sandboxResult: null, staticClean: false };
 }
 throw staticErr;
@@ -1215,6 +1230,12 @@ async function scanPackage(name, version, ecosystem, tarballUrl, registryMeta, s
 stats.scanned++;
 stats.totalTimeMs += Date.now() - startTime;
 console.error(`[MONITOR] ERROR scanning ${name}@${version}: ${err.message}`);
+// Ledger the terminal failure so the scan-ledger never over-states coverage (an errored
+// package is NOT clean). Also captures EMERGENCY worker-terminate losses, whose reject
+// propagates here (CLAUDE.md "no silent caps"). Best-effort; never throws.
+try {
+appendScanLedger({ name, version, ecosystem, outcome: 'error', source: 'scan_error' });
+} catch { /* ledger is best-effort */ }
 return { sandboxResult: null, staticClean: false };
 } finally {
 // Cleanup temp dir
@@ -1256,15 +1277,9 @@ function timeoutPromise(ms) {
 });
 }
 
-
-
-
-*/
-function isDailyReportDue(stats) {
-const parisHour = getParisHour();
-if (parisHour < DAILY_REPORT_HOUR) return false;
-return !hasReportBeenSentToday(stats);
-}
+// isDailyReportDue is the canonical gate in state.js (imported above), called per scan in
+// processQueueItem below. Previously a local `parisHour < 8` copy here diverged from the
+// daemon's `!== 8` copy; unifying in state.js removes the divergence. Still re-exported below.
 
 /**
 * Process a single item from the scan queue.
@@ -1358,6 +1373,37 @@ function computeWorkersToSpawn(targetConcurrency, activeWorkers, queueLength) {
 return Math.max(0, Math.min(targetConcurrency - activeWorkers, queueLength));
 }
 
+// ── RSS-aware worker admission (P1 OOM durable fix) ──
+// The pressure breaker is reactive: it stops spawning at HIGH, but the workers already in
+// flight overshoot RSS by ~2GB (each isolate + gVisor sandbox ~0.55GB, draining up to
+// SCAN_TIMEOUT) before EMERGENCY truncates the queue + kills them. This caps the OVERSHOOT at
+// the source — refuse a new spawn when current RSS + one worker's footprint would breach a
+// soft ceiling (default 80% of the EMERGENCY RSS limit), leaving headroom for in-flight drain.
+const RSS_SOFT_LIMIT_MB = (() => {
+const parsed = parseInt(process.env.MUADDIB_RSS_SOFT_LIMIT_MB, 10);
+if (Number.isFinite(parsed) && parsed > 0) return parsed;
+const hard = parseInt(process.env.MUADDIB_RSS_LIMIT_MB, 10);
+const base = (Number.isFinite(hard) && hard > 0) ? hard : 8500;
+return Math.round(base * 0.80);
+})();
+const EST_WORKER_RSS_MB = (() => {
+const parsed = parseInt(process.env.MUADDIB_EST_WORKER_RSS_MB, 10);
+return (Number.isFinite(parsed) && parsed > 0) ? parsed : 600;
+})();
+
+/**
+* Pure: how many NEW scan workers the current RSS headroom allows under the soft ceiling.
+* `currentRssBytes` already includes the active workers, so this answers "how many MORE fit".
+* Returns 0 (never negative) once RSS reaches the soft limit — existing workers are NOT killed
+* here, they drain and free memory; ensureWorkers keeps the queue alive with 1 worker if
+* nothing is running. softLimitMb / estWorkerMb are injectable for tests.
+*/
+function rssAdmissionCap(currentRssBytes, softLimitMb = RSS_SOFT_LIMIT_MB, estWorkerMb = EST_WORKER_RSS_MB) {
+const headroomMb = softLimitMb - (currentRssBytes / 1024 / 1024);
+if (headroomMb <= 0) return 0;
+return Math.max(0, Math.floor(headroomMb / estWorkerMb));
+}
+
 /**
 * Ensure the target number of workers are running. Non-blocking: spawns
 * missing workers as background promises. Called from the daemon main loop
|
@@ -1365,7 +1411,23 @@ function computeWorkersToSpawn(targetConcurrency, activeWorkers, queueLength) {
  */
 function ensureWorkers(scanQueue, stats, dailyAlerts, recentlyScanned, downloadsCache, sandboxAvailable) {
   if (scanQueue.length === 0) return;
-
+  let toSpawn = computeWorkersToSpawn(_targetConcurrency, _activeWorkers, scanQueue.length);
+  if (toSpawn <= 0) return;
+
+  // RSS-aware admission (P1 OOM durable fix): cap NEW spawns by memory headroom so the
+  // in-flight worker set can't overshoot the soft RSS ceiling. Never fully deadlock: if
+  // headroom is gone AND nothing is running, allow exactly one so the queue still makes
+  // forward progress (its completion frees memory). Bounds peak RSS BEFORE the reactive breaker.
+  const rssNow = process.memoryUsage().rss;
+  const rssCap = rssAdmissionCap(rssNow);
+  if (toSpawn > rssCap) {
+    if (rssCap === 0 && _activeWorkers === 0) {
+      toSpawn = 1;
+    } else {
+      console.log(`[MONITOR] RSS admission: capping spawn ${toSpawn}->${rssCap} (rss=${Math.round(rssNow / 1024 / 1024)}MB soft=${RSS_SOFT_LIMIT_MB}MB active=${_activeWorkers})`);
+      toSpawn = rssCap;
+    }
+  }
   if (toSpawn <= 0) return;
 
   console.log(`[MONITOR] Spawning ${toSpawn} worker(s) (active: ${_activeWorkers}, target: ${_targetConcurrency}, queue: ${scanQueue.length})`);
@@ -1757,6 +1819,7 @@ module.exports = {
   getActiveWorkers,
   terminateAllWorkers,
   computeWorkersToSpawn,
+  rssAdmissionCap,
   ensureWorkers,
   drainWorkers,
 
package/src/monitor/scan-queue.js
CHANGED
@@ -82,4 +82,71 @@ function enqueueScan(scanQueue, item, stats, max = MAX_SCAN_QUEUE) {
   return dropped;
 }
 
-
+/**
+ * Bulk-evict the scan queue down to `targetKeep`, honoring the SAME protection predicate
+ * as enqueueScan and ledgering EVERY dropped item — the single-source-of-truth eviction
+ * the daemon's EMERGENCY memory breaker must use instead of a raw `splice(0, n)`.
+ *
+ * Selection: drop the oldest UNPROTECTED items first; only dip into protected items
+ * (oldest-first) if there aren't enough unprotected ones to reach the target. This keeps
+ * IOC-match / burst / first-publish / ATO scans alive through a memory emergency, exactly
+ * like the per-item cap path — closing the gap where the v2.10.88 circuit breaker silently
+ * dropped protected scans (CLAUDE.md "ne jamais perdre de scan" / "no silent caps").
+ *
+ * In-place compaction (write-pointer, O(n), preserves insertion order, no giant spread) so
+ * the daemon (which holds the same array reference) sees the mutation. Best-effort ledger;
+ * never throws. `ledgerFn` is injectable for tests; defaults to state.appendScanLedger.
+ *
+ * @returns {{dropped:number, droppedProtected:number}}
+ */
+function evictFromScanQueueBulk(scanQueue, targetKeep, source = 'bulk_evict', ledgerFn = null) {
+  const before = scanQueue.length;
+  const keep = Math.max(0, targetKeep | 0);
+  if (before <= keep) return { dropped: 0, droppedProtected: 0 };
+  const toDrop = before - keep;
+
+  // Victim set: oldest unprotected first, then (only if short) oldest protected.
+  const dropSet = new Set();
+  for (let i = 0; i < before && dropSet.size < toDrop; i++) {
+    if (!_isProtected(scanQueue[i])) dropSet.add(i);
+  }
+  let droppedProtected = 0;
+  if (dropSet.size < toDrop) {
+    // Not enough unprotected items: every unprotected one is already marked, so the
+    // remaining oldest-first items are protected — drop them as a last resort.
+    for (let i = 0; i < before && dropSet.size < toDrop; i++) {
+      if (!dropSet.has(i)) { dropSet.add(i); droppedProtected++; }
+    }
+  }
+
+  // Resolve the ledger sink once (per-call require would be 500+ lookups under emergency).
+  let appendLedger = ledgerFn;
+  if (!appendLedger) {
+    try { appendLedger = require('./state.js').appendScanLedger; } catch { appendLedger = null; }
+  }
+
+  // Compact survivors in place, ledgering each evicted item with an identity-preserving
+  // source (protected drops get a distinct suffix so the rare case stays visible in the rollup).
+  let w = 0;
+  for (let r = 0; r < before; r++) {
+    if (dropSet.has(r)) {
+      const item = scanQueue[r];
+      if (appendLedger && item && item.name) {
+        try {
+          appendLedger({
+            name: item.name, version: item.version, ecosystem: item.ecosystem,
+            outcome: 'dropped',
+            source: _isProtected(item) ? `${source}_protected` : source
+          });
+        } catch { /* ledger is best-effort — must never break the breaker */ }
+      }
+    } else {
+      scanQueue[w++] = scanQueue[r];
+    }
+  }
+  scanQueue.length = w;
+
+  return { dropped: toDrop, droppedProtected };
+}
+
+module.exports = { enqueueScan, evictFromScanQueueBulk, isProtected: _isProtected, MAX_SCAN_QUEUE };
package/src/monitor/state.js
CHANGED
@@ -972,7 +972,7 @@ let _scanLedgerAppendedSinceCompact = 0;
 const SCAN_LEDGER_OUTCOMES = new Set([
   'clean', 'clean_low_signal', 'clean_tooling', 'suspect', 'ml_clean', 'llm_benign',
   'sandbox_inconclusive', 'sandbox_unconfirmed', 'confirmed',
-  'static_timeout', 'size_skip', 'dropped'
+  'static_timeout', 'size_skip', 'dropped', 'error'
 ]);
 
 /**
@@ -1453,6 +1453,27 @@ function getParisDateString() {
   return formatter.format(new Date());
 }
 
+// Hour (Europe/Paris) at/after which the once-daily report may fire. Single source of
+// truth — imported by webhook.js, daemon.js and queue.js (each previously redefined it,
+// and webhook.js still re-exports it for back-compat).
+const DAILY_REPORT_HOUR = 8; // 08:00 Paris time (Europe/Paris)
+
+/**
+ * Canonical "is the daily report due?" predicate — the ONE gate, defined here in state.js
+ * (a leaf module that daemon.js and queue.js already import, so no require cycle).
+ *
+ * Catch-up semantics: fire at OR AFTER 08:00 Paris, so a missed 08:00 (e.g. the daemon was
+ * down/OOM-restarting at that minute) still fires later the SAME day — losing a whole day
+ * was the old daemon.js `hour === 8` behaviour. But NEVER fire during the 00:00–07:59 Paris
+ * "dead zone": a fire then stamps the NEW day's date before its 08:00 window and, because
+ * hasReportBeenSentToday() keys off the Paris CALENDAR date, permanently suppresses that
+ * day's real report. Replaces the two divergent copies (daemon.js `!== 8`, queue.js `< 8`).
+ */
+function isDailyReportDue(stats) {
+  if (getParisHour() < DAILY_REPORT_HOUR) return false;
+  return !hasReportBeenSentToday(stats);
+}
+
 // --- recentlyScanned dedup-set persistence (survives restarts → no re-scan storm) ---
 //
 // The dedup Set is in-memory only, so every restart starts it empty and re-scans the
@@ -1703,5 +1724,7 @@ module.exports = {
   loadRecentlyScanned,
   getParisHour,
   getParisDateString,
+  DAILY_REPORT_HOUR,
+  isDailyReportDue,
   loadStateRaw
 };