npm - @fanboynz/network-scanner - Versions diffs - 3.2.0 → 3.3.0 - Mend

@fanboynz/network-scanner 3.2.0 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md CHANGED

@@ -2,6 +2,34 @@
 All notable changes to the Network Scanner (nwss.js) project.
+## [3.3.0] - 2026-06-06
+### Added
+- **DNS dead-domain skip + corroborated persistence** — within a scan, once a host resolves NXDOMAIN/ENODATA it is remembered and repeat URLs on that host are skipped without re-resolving. With `--dns-cache`, a host that *also* fails navigation (`ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE`) is corroborated and persisted to the negative cache (`.dnsnegcache`, 12h TTL) so it is skipped on the next run too. Only definitive non-existence is cached — resolver errors fail open and never poison a live host.
+- **`acceptInsecureCerts` on browser launch** — TLS/cert errors (expired, self-signed, name-mismatch) no longer abort navigation, so streaming/pirate domains with broken certs are still scanned.
+- **`--disable-popup-blocking` when a site uses `capture_popups`** — Chrome's pop-up blocker (`chrome://settings/content/popups`) is turned off only for popup-capture scans, so non-gesture popunders (document-level `onclick` / timer SDKs) fire and get captured too. Non-popup scans keep the blocker on (stealthier — a real browser blocks non-gesture `window.open()`); gesture-triggered popups already worked via the synthetic-click path.
+### Changed
+- **The main-frame document is never blocked** — the scanned page (and any main-frame redirect target) is exempt from adblock / `blocked` / `blockDomainsByUrl` aborts. Aborting it made the navigation never commit (`about:blank` → timeout), silently breaking scanned URLs that matched our own filter lists (common on adult/pirate/stream domains). The request still flows through the matcher, so a main-frame redirect destination (e.g. a filecrypt → ad-domain hop) is still captured; sub-frame / ad iframes stay blockable.
+- **Navigation timeouts are recovered, not discarded** — on a nav timeout the scanner retries leniently and proceeds with the partially-loaded page instead of dropping the URL (a page still at `about:blank` is still treated as a failure).
+- **whois disk-cache TTL raised to 36h** (dig stays 20h) — registrar data is stable and whois servers rate-limit aggressively, so a longer TTL cuts repeat queries; dig keeps its 20h TTL.
+- **VPN is Linux-only with a clear guard** — `vpn` / `openvpn` on macOS/Windows now returns an explicit "Linux-only" error instead of cryptic `ip` / `/proc` failures.
+### Performance
+- **`psl.parse` memoized by hostname** in the request hot path — both per-request handlers (main page + popup capture) parsed the root domain of *every* request, while a page hammers the same handful of hosts (CDN, analytics, ad domains). A hostname-keyed memo turns almost all of those into `Map` hits, replacing the URL-keyed cache (fewer + shorter keys, far higher hit rate).
+- **Lower per-request overhead** — the iframe-loop guard's `frame().url()` lookup is now gated behind a cheap URL string test instead of running on every request.
+- **Removed redundant disk I/O** — a leaked adblock combined-list temp file in `tmpdir` is now cleaned up, and a redundant `existsSync` before each forced screenshot's recursive `mkdir` was dropped.
+### Fixed
+- **Periodic debug/`--dumpurls` log flush is now synchronous** — the 2s timer used async `fs.writeFile({flag:'a'})` with no in-flight guard, so two ticks could append to the same file concurrently and interleave lines, and it cleared the buffer *before* the write confirmed (silently dropping entries on a failed write). It now uses `appendFileSync`, clears only after a successful write (transient failures retry next tick), and is bounded so a permanently-unwritable path can't grow memory.
+- **Dead-domain skip works without `--show-dead-domains`** — the in-scan skip recorded into the dead set only when the report flag was on, which made the skip dead code; recording is now unconditional and the flag gates only the end-of-scan report. Transient DNS errors were also dropped from the dead-domain match so only `ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE` mark a host dead.
+### Removed
+- **Hardcoded `dmzjmp` iframe-loop guard** — the domain-specific abort for a `creative.dmzjmp.com` frame requesting `go.dmzjmp.com/api/models` (added mid-2025 to stop a runaway request loop) has not recurred and was removed from the request hot path; the per-URL timeout remains the backstop. Recoverable from git history — prefer a config-driven `iframe_loop_guards` entry if it ever returns.
+### Documentation
+- **README + man page now document `--block-ads` and `--adblock-engine`** — blocking ads/trackers *during* the scan with EasyList-format list(s) (comma-separated), and the `js` (default, native parser) vs `rust` (Brave `adblock-rs`) matcher backends.
 ## [3.2.0] - 2026-06-04
 ### Added
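
Reviewer note on the dead-domain entry above: the rule "only definitive non-existence is cached, resolver errors fail open, and persistence needs navigation corroboration" can be sketched as follows. This is an illustration under assumptions, not the package's code. The helper names `isDefinitiveDnsFailure` and `shouldPersistDeadHost` are hypothetical. `ENOTFOUND` and `ENODATA` are the Node.js `dns` error codes that correspond to NXDOMAIN and no-data answers.

```javascript
// Hypothetical helpers illustrating the changelog's caching rule.
// Node's dns module reports NXDOMAIN as 'ENOTFOUND' and an empty answer as 'ENODATA'.
const DEFINITIVE_DNS_CODES = new Set(['ENOTFOUND', 'ENODATA']);

// Navigation errors that the changelog says corroborate a dead host.
const CORROBORATING_NAV_ERRORS = ['ERR_NAME_NOT_RESOLVED', 'ERR_ADDRESS_UNREACHABLE'];

// True only for definitive non-existence. Resolver trouble (ESERVFAIL,
// ETIMEOUT, EREFUSED, ...) returns false, so the host "fails open".
function isDefinitiveDnsFailure(dnsErrorCode) {
  return DEFINITIVE_DNS_CODES.has(dnsErrorCode);
}

// Persist to the negative cache only when the DNS pre-check was definitive
// AND the browser navigation failed in a matching way.
function shouldPersistDeadHost(dnsErrorCode, navErrorMessage) {
  if (!isDefinitiveDnsFailure(dnsErrorCode)) return false;
  const message = String(navErrorMessage || '');
  return CORROBORATING_NAV_ERRORS.some((code) => message.includes(code));
}
```

Under this sketch a host with a definitive DNS failure is skipped for the rest of the run, but it only reaches the on-disk cache when both signals agree.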

package/README.md CHANGED

@@ -66,9 +66,10 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--use-puppeteer-core`      | Use `puppeteer-core` with system Chrome instead of bundled Chromium |
 | `--use-obscura`             | Connect to running Obscura CDP server (`ws://127.0.0.1:9222` or `OBSCURA_WS` env). Skips fingerprint injection — Obscura provides built-in stealth |
 | `--load-extension <path>`   | Load unpacked Chrome extension from directory (can be used multiple times) |
-| `--dns-cache`               | Persist dig/whois results to disk between runs (20hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
-| `--no-dns-precheck`         | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the 20hr cache TTL) skip their c-ares pre-check via a positive-resolution index. |
-| `--block-ads=<files>`       | Block ads using EasyList format rules (comma-separated: `easylist.txt,easyprivacy.txt`) |
+| `--dns-cache`               | Persist dig/whois results to disk between runs (dig 20hr / whois 36hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
+| `--no-dns-precheck`         | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the dig/whois cache TTL) skip their c-ares pre-check via a positive-resolution index. |
+| `--block-ads=<files>`       | Block ads/trackers **during the scan** using EasyList-format filter list(s) (`\|\|domain^`, `/ads/*`, etc.). Comma-separated for multiple: `--block-ads=easylist.txt,easyprivacy.txt`. See [Blocking ads during the scan](#blocking-ads-during-the-scan). |
+| `--adblock-engine=<js\|rust>` | Matcher backend for `--block-ads` (default: `js`). `rust` uses Brave's `adblock-rs` (much faster on large lists) and requires `npm i adblock-rs`. |
 | `--cdp`                     | Enable Chrome DevTools Protocol logging (now per-page if enabled) |
 | `--remove-dupes`            | Remove duplicate domains from output (only with `-o`) |
 | `--dry-run`                 | Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules |
@@ -92,6 +93,37 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--clear-cache`             | Clear persistent cache before scanning (improves fresh start performance) |
 | `--ignore-cache`            | Bypass all smart caching functionality during scanning |
+### Blocking ads during the scan
+`--block-ads` makes the scanner **block** matching requests *during* the scan (separate from capturing rules) — to keep ad/tracker noise out of the page, speed up loads, or test that a list catches what it should.
+**Adding lists.** Pass one or more EasyList-format filter lists (same syntax as uBlock Origin / EasyList):
+```bash
+# Single list
+node nwss.js --block-ads=easylist.txt
+# Multiple lists — comma-separated, no spaces
+node nwss.js --block-ads=easylist.txt,easyprivacy.txt,mylist.txt
+```
+Lists are plain-text **network** rules — `||doubleclick.net^`, `/ads/*`, `||example.com^$script`, etc. Element-hiding/cosmetic rules (`##…`) don't apply to request blocking and are ignored. The scanned page's own top-level document is never blocked (only sub-resources), so a site whose own domain is in a list still loads.
+**Engine — `js` vs `rust`** (`--adblock-engine`, default `js`):
+| Engine | Flag | Backend | When |
+|---|---|---|---|
+| **js** (default) | `--adblock-engine=js` | `lib/adblock.js` — pure-JS, no extra deps | Default; fine for small/medium lists, works everywhere |
+| **rust** | `--adblock-engine=rust` | `lib/adblock-rust.js` — Brave's [`adblock-rs`](https://github.com/brave/adblock-rust) | Large lists (full EasyList + EasyPrivacy + …); much faster matching. Drop-in (same rules, same results). Requires `npm install adblock-rs` (needs a Rust toolchain) |
+The two engines are interchangeable — same rule format, same blocking result; `rust` is purely a speed option for big lists. If you pass `--adblock-engine=rust` without `adblock-rs` installed, install it (`npm i adblock-rs`) or drop the flag to use `js`.
+```bash
+# Fast matching over big lists with the Rust engine
+npm install adblock-rs
+node nwss.js --block-ads=easylist.txt,easyprivacy.txt --adblock-engine=rust
+```
 ---
 ## config.json Format
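
Reviewer note on the `--block-ads` README text above: the difference between a network rule such as `||doubleclick.net^` and a cosmetic `##` rule can be shown with a deliberately simplified sketch. It handles only plain `||domain^` rules and is not the matcher in `lib/adblock.js` or `adblock-rs`. The function names `parseDomainAnchorRule` and `hostMatchesDomain` are hypothetical.

```javascript
// Simplified illustration: recognises plain `||domain^` rules and nothing else.
// Rules with options ($script), path patterns (/ads/*) and exceptions are out of scope.
function parseDomainAnchorRule(line) {
  const rule = line.trim();
  // Skip blanks, comments, and element-hiding (cosmetic) rules.
  if (rule === '' || rule.startsWith('!') || rule.includes('##')) return null;
  const match = /^\|\|([a-z0-9.-]+)\^$/i.exec(rule);
  return match ? match[1].toLowerCase() : null;
}

// `||domain^` anchors at a domain boundary: the domain itself or any subdomain.
function hostMatchesDomain(hostname, domain) {
  const host = hostname.toLowerCase();
  return host === domain || host.endsWith('.' + domain);
}
```

For example, `hostMatchesDomain('ad.doubleclick.net', 'doubleclick.net')` is true, while `notdoubleclick.net` does not match because the anchor requires a label boundary.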

package/lib/nettools.js CHANGED

@@ -30,7 +30,7 @@ const GLOBAL_DIG_CACHE_MAX = 2000;
 // Global whois result cache — shared across ALL handler instances and processUrl calls
 // Whois data is per root domain and doesn't change based on search terms
 const globalWhoisResultCache = new Map();
-const GLOBAL_WHOIS_CACHE_TTL = 72000000; // 20 hours (persisted to disk between runs)
+const GLOBAL_WHOIS_CACHE_TTL = 129600000; // 36 hours (persisted to disk between runs). Longer than dig's 20h: registrar data is very stable and whois servers rate-limit aggressively, so caching longer cuts repeat queries.
 const GLOBAL_WHOIS_CACHE_MAX = 2000;
 // Persistent disk cache file paths
@@ -40,8 +40,8 @@ const WHOIS_CACHE_FILE = path.join(__dirname, '..', '.whoiscache');
 // Index of hostnames known to resolve, populated as a side effect of
 // positive dig/whois cache writes AND cache hits. nwss.js's DNS pre-check
 // reads this via domainKnownToResolve() so it can skip its own resolve4
-// call on hosts that dig or whois have already proven live within the
-// 20-hour TTL window. Populating on cache HITS (not just writes) handles
+// call on hosts that dig or whois have already proven live within their
+// cache TTL window (dig 20h / whois 36h). Populating on cache HITS (not just writes) handles
 // the --dns-cache disk-load case where entries arrive without going
 // through the in-process write path. Stale entries -- hostname in Set but
 // the dig/whois entry has since been evicted -- are harmless: worst case
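
Reviewer note on the TTL change above: the millisecond constants can be checked by arithmetic, and a freshness test against them might look like the sketch below. The `isFresh` helper and the `timestamp` field are hypothetical; the cache entry layout in `lib/nettools.js` is not shown in this hunk.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// 20 h and 36 h expressed in milliseconds, matching the literals in the diff.
const DIG_TTL_MS = 20 * HOUR_MS;   // 72000000
const WHOIS_TTL_MS = 36 * HOUR_MS; // 129600000

// Hypothetical freshness check: an entry is usable while its age is below the TTL.
function isFresh(entry, ttlMs, now = Date.now()) {
  return now - entry.timestamp < ttlMs;
}
```

An entry written 30 hours ago would still be fresh under the whois TTL but stale under the dig TTL, which is the practical effect of the change.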

package/lib/openvpn_vpn.js CHANGED

@@ -778,6 +778,14 @@ function validateOvpnConfig(ovpnConfig) {
  * @returns {Promise<Object>} { success, connection, tunDevice, error }
  */
 async function connectForSite(siteConfig, forceDebug = false) {
+  // Platform guard: OpenVPN routing here reads /proc and uses the iproute2 `ip`
+  // command, both Linux-only. Fail clearly instead of a cryptic /proc or `ip`
+  // error on macOS/Windows. WSL2 reports 'linux' and passes (TUN is checked
+  // separately below via isWSL/checkTunDevice).
+  if (process.platform !== 'linux') {
+    return { success: false, error: `OpenVPN routing is currently Linux-only (needs /proc + the iproute2 'ip' command; not available on ${process.platform}). Run on Linux/WSL2, or remove the 'openvpn' option from the site config.` };
+  }
   const ovpnConfig = normalizeOvpnConfig(siteConfig.openvpn);
   if (!ovpnConfig) {
     return { success: false, error: 'Invalid OpenVPN configuration' };

package/lib/wireguard_vpn.js CHANGED

@@ -388,6 +388,14 @@ function validateVpnConfig(vpnConfig) {
  * @returns {Promise<Object>} { success, interface, error }
  */
 async function connectForSite(siteConfig, forceDebug = false) {
+  // Platform guard: WireGuard routing here relies on the iproute2 `ip` command
+  // and wg-quick conventions, which are Linux-only. Fail with a clear message
+  // instead of a cryptic `ip: command not found` on macOS/Windows. WSL2 reports
+  // 'linux' and passes.
+  if (process.platform !== 'linux') {
+    return { success: false, error: `WireGuard routing is currently Linux-only (needs the iproute2 'ip' command + wg-quick; not available on ${process.platform}). Run on Linux/WSL2, or remove the 'vpn' option from the site config.` };
+  }
   const vpnConfig = normalizeVpnConfig(siteConfig.vpn);
   if (!vpnConfig) {
     return { success: false, error: 'Invalid VPN configuration' };

package/nwss.1 CHANGED

@@ -153,6 +153,14 @@ Browser restart interval in URLs processed (1-1000, overrides config/default).
 .B \--show-dead-domains
 At end of scan, list hostnames that did not resolve or were unreachable (\fBNXDOMAIN\fR/\fBENODATA\fR plus \fBERR_NAME_NOT_RESOLVED\fR/\fBERR_ADDRESS_UNREACHABLE\fR). Excludes blocks and timeouts, since those mean the domain is alive. Useful for pruning dead URLs.
+.TP
+.BI \--block-ads= FILE\fR[,\fIFILE\fR...]
+Block ads/trackers during the scan using EasyList-format filter list(s) \(em network rules like \fB||domain^\fR, \fB/ads/*\fR, \fB||domain^$script\fR. Comma-separated for multiple lists. Cosmetic (\fB##\fR) rules are ignored; the scanned page's own top-level document is never blocked (only sub-resources).
+.TP
+.BI \--adblock-engine= js|rust
+Matcher backend for \fB\-\-block-ads\fR (default: \fBjs\fR). \fBjs\fR is the built-in pure-JS matcher (no extra dependencies). \fBrust\fR uses Brave's \fBadblock-rs\fR \(em much faster on large lists, same rules and results, but requires \fBnpm install adblock-rs\fR (needs a Rust toolchain).
 .TP
 .BR \-h ", " \--help
 Show help message and exit.

package/nwss.js CHANGED

@@ -55,6 +55,7 @@ const CSS_BLOCKED_TAG = messageColors.processing('[css_blocked]');
 const EVAL_ON_DOC_TAG = messageColors.processing('[evalOnDoc]');
 const REALTIME_CLEANUP_TAG = messageColors.processing('[realtime_cleanup]');
 const VPN_TAG = messageColors.processing('[vpn]');
+const POPUP_TAG = messageColors.processing('[popup]');
 // Precomputed colored '[SmartCache]' subsystem prefix — paired with the
 // same constant in lib/smart-cache.js so debug lines from both files
 // produce consistently colored output. formatLogMessage only colors the
@@ -387,7 +388,11 @@ const dnsPrecheckTimeoutMs = 2000;
 const showDeadDomains = args.includes('--show-dead-domains');
 const _deadDomains = new Map();
 function recordDeadDomain(urlOrHost, reason) {
-  if (!showDeadDomains || !urlOrHost) return;
+  // Populate unconditionally — the pre-check skip reads _deadDomains to drop
+  // repeat URLs on a host already proven dead this run, which must work whether
+  // or not --show-dead-domains is set. The end-of-scan REPORT is separately
+  // gated on showDeadDomains, so the flag still controls output, not recording.
+  if (!urlOrHost) return;
   let host = urlOrHost;
   try { host = new URL(urlOrHost).hostname; } catch { /* already a bare host */ }
   if (host && !_deadDomains.has(host)) _deadDomains.set(host, reason);
@@ -407,7 +412,7 @@ const DNS_NEGATIVE_CACHE_MAX = 1000;
 // persisting it can't silently drop a live host. Opt-in via --dns-cache: dead
 // hosts are remembered for DNS_NEGATIVE_PERSIST_TTL_MS and reloaded next run;
 // otherwise it's a 5-min in-memory-only cache. The persist TTL is deliberately
-// shorter than the dig/whois positive cache (20h): a domain that doesn't exist
+// shorter than the dig/whois positive cache (dig 20h / whois 36h): a domain that doesn't exist
 // now MAY get registered, and this is a domain-hunting scanner, so the dead
 // ones are re-checked twice a day rather than trusted for ~a day.
 const DNS_NEGATIVE_PERSIST_TTL_MS = 12 * 60 * 60 * 1000; // 12 hours
@@ -715,6 +720,9 @@ if (blockAdsIndex !== -1) {
   adblockEnabled = true;
   const engine = adblockEngineName === 'rust' ? adblockRust : adblockJs;
+  // Only ever assigned the os.tmpdir() path below — never a user file — so the
+  // unlink in finally can never touch the caller's own lists.
+  let combinedTmpFile = null;
   try {
     if (engine === adblockRust) {
       // Rust wrapper accepts an array directly — no temp file needed.
@@ -723,15 +731,22 @@ if (blockAdsIndex !== -1) {
       // JS engine takes a single path; concat to a temp file when multiple lists.
       let rulesFile = rulesFiles[0];
       if (rulesFiles.length > 1) {
-        rulesFile = path.join(os.tmpdir(), `nwss-adblock-combined-${Date.now()}.txt`);
+        combinedTmpFile = path.join(os.tmpdir(), `nwss-adblock-combined-${Date.now()}.txt`);
+        rulesFile = combinedTmpFile;
         const combined = rulesFiles.map(f => fs.readFileSync(f, 'utf-8')).join('\n');
         fs.writeFileSync(rulesFile, combined);
       }
+      // parseAdblockRules reads the file synchronously and in full before
+      // returning, so the temp copy is safe to remove immediately afterwards.
       adblockMatcher = engine.parseAdblockRules(rulesFile, { enableLogging: forceDebug });
     }
   } catch (err) {
     console.log(`Error: Failed to load adblock engine '${adblockEngineName}': ${err.message}`);
     process.exit(1);
+  } finally {
+    if (combinedTmpFile) {
+      try { fs.unlinkSync(combinedTmpFile); } catch { /* best effort — OS reaps tmpdir */ }
+    }
   }
   const stats = adblockMatcher.getStats();
   const ruleDesc = stats.total != null
@@ -805,7 +820,7 @@ Validation Options:
   --cache-requests               Cache HTTP requests to avoid re-requesting same URLs within scan
   --dns <ip[,ip,...]>            Resolver(s) for the DNS pre-check AND nettools' dig (not Chrome nav / whois).
                                  One pins all queries to it; several rotate per query. Overrides /etc/resolv.conf.
-  --dns-cache                    Persist dig/whois results to disk between runs (20h TTL, 2000-entry cap each),
+  --dns-cache                    Persist dig/whois results to disk between runs (dig 20h / whois 36h TTL, 2000-entry cap each),
                                  plus the DNS pre-check negative cache (NXDOMAIN only, 12h TTL, .dnsnegcache)
   --no-dns-precheck              Disable per-URL DNS resolution check before page navigation.
                                  By default, URLs whose hostname doesn't resolve are skipped
@@ -933,7 +948,7 @@ Advanced Options:
   whois_delay: <milliseconds>                Delay between whois requests for this site (default: global whois_delay)
   dig: ["term1", "term2"]                     Check dig output for ALL specified terms (AND logic)
   dig-or: ["term1", "term2"]                  Check dig output for ANY specified term (OR logic)
-  goto_options: {"waitUntil": "domcontentloaded"} Custom page.goto() options (default: {"waitUntil": "load"})
+  goto_options: {"waitUntil": "domcontentloaded"} Custom page.goto() options (default: {"waitUntil": "domcontentloaded"})
   dig_subdomain: true/false                    Use subdomain for dig lookup instead of root domain (default: false)
   digRecordType: "A"                          DNS record type for dig (default: A)
@@ -1423,6 +1438,7 @@ if (dumpUrls) {
 // Avoids blocking I/O on every intercepted request in debug/dumpurls mode
 const _logBuffers = new Map();  // filePath -> string[]
 const LOG_FLUSH_INTERVAL = 2000; // Flush every 2 seconds
+const LOG_BUFFER_MAX_RETAINED = 10000; // Cap a file's retry backlog (lines) so a permanently unwritable path can't grow memory unboundedly
 let _logFlushTimer = null;
 function bufferedLogWrite(filePath, entry) {
@@ -1435,18 +1451,20 @@ function bufferedLogWrite(filePath, entry) {
 function flushLogBuffers() {
   for (const [filePath, entries] of _logBuffers) {
-    if (entries.length > 0) {
-      try {
-        const data = entries.join('');
-        entries.length = 0; // Clear buffer immediately
-        fs.writeFile(filePath, data, { flag: 'a' }, (err) => {
-          if (err) {
-            console.warn(formatLogMessage('warn', `Failed to flush log buffer to ${filePath}: ${err.message}`));
-          }
-        });
-      } catch (err) {
-        console.warn(formatLogMessage('warn', `Failed to flush log buffer to ${filePath}: ${err.message}`));
-      }
+    if (entries.length === 0) continue;
+    try {
+      // Synchronous append on purpose: the batched 2s flush is small, and a
+      // blocking append cannot overlap the next timer tick (it holds the event
+      // loop for its duration) — eliminating the interleaved concurrent-append
+      // hazard of the old async fs.writeFile({flag:'a'}). Clear ONLY after the
+      // write succeeds, so a transient failure retries next tick instead of
+      // being silently dropped (the old code cleared before the async write
+      // confirmed). Bounded so a permanently unwritable path can't grow memory.
+      fs.appendFileSync(filePath, entries.join(''));
+      entries.length = 0;
+    } catch (err) {
+      console.warn(formatLogMessage('warn', `Failed to flush log buffer to ${filePath}: ${err.message}`));
+      if (entries.length > LOG_BUFFER_MAX_RETAINED) entries.length = 0;
     }
   }
 }
@@ -1490,21 +1508,29 @@ if (forceDebug && globalComments) {
  * @param {string} url - The URL string to parse.
  * @returns {string} The root domain, or the original hostname if parsing fails (e.g., for IP addresses or invalid URLs), or an empty string on error.
  */
-const _rootDomainCache = new Map();
-function getRootDomain(url) {
-  const cached = _rootDomainCache.get(url);
+// psl.parse memoized by hostname. The request handlers parse the root domain
+// of EVERY request, and a page hits the same few hosts repeatedly (CDN,
+// analytics, ad domains) — so a hostname-keyed memo turns almost all of those
+// into Map hits instead of repeated public-suffix-list lookups. Keyed by
+// hostname (not full URL) so distinct paths/queries on one host share one
+// entry: higher hit rate, fewer + shorter keys than a URL-keyed cache.
+// psl.parse is pure and never throws (malformed input → {domain: null}), so
+// the catch is defensive only.
+const _hostRootCache = new Map();
+function rootDomainForHost(hostname) {
+  if (!hostname) return '';
+  const cached = _hostRootCache.get(hostname);
   if (cached !== undefined) return cached;
-  try {
-    const { hostname } = new URL(url);
-    const parsed = psl.parse(hostname);
-    const result = parsed.domain || hostname;
-    if (_rootDomainCache.size > 5000) _rootDomainCache.clear();
-    _rootDomainCache.set(url, result);
-    return result;
-  } catch {
-    _rootDomainCache.set(url, '');
-    return '';
-  }
+  let result;
+  try { const parsed = psl.parse(hostname); result = parsed.domain || hostname; }
+  catch { result = hostname; }
+  if (_hostRootCache.size > 5000) _hostRootCache.clear();
+  _hostRootCache.set(hostname, result);
+  return result;
+}
+function getRootDomain(url) {
+  try { return rootDomainForHost(new URL(url).hostname); }
+  catch { return ''; }
 }
 /**
@@ -1839,7 +1865,19 @@ function setupFrameHandling(page, forceDebug) {
   // Declare userDataDir in outer scope for cleanup access
   let userDataDir = null;
+  // Browser-level decision (the browser launches once per batch, so this can't
+  // be per-site): only disable Chrome's pop-up blocker when at least one site
+  // actually wants popups captured. A real browser blocks non-gesture
+  // window.open(), so non-popup scans keep the blocker on for stealth.
+  // capture_popups scans turn it off so non-gesture popunders (document-level
+  // onclick / timer SDKs) fire and get captured too — gesture-triggered
+  // popups already work via the synthetic-click path regardless of this flag.
+  const wantPopups = Array.isArray(sites) && sites.some(s => s && s.capture_popups === true);
+  if (wantPopups && forceDebug) {
+    console.log(formatLogMessage('debug', `${POPUP_TAG} capture_popups set — launching with --disable-popup-blocking (non-gesture popunders allowed)`));
+  }
   /**
    * Creates a new browser instance with consistent configuration
    * Uses system Chrome and temporary directories to minimize disk usage
@@ -1930,6 +1968,12 @@ function setupFrameHandling(page, forceDebug) {
       // Puppeteer 22.x headless mode optimization
       // Auto-detect best headless mode based on Puppeteer version
       headless: headlessMode,
+      // Bypass TLS cert errors at the browser level (drives CDP
+      // Security.setIgnoreCertificateErrors). Robust on new-headless Chrome,
+      // where the --ignore-certificate-errors *flag* is increasingly ignored.
+      // An ad/tracker scanner must reach self-signed / mismatched-cert ad and
+      // embed domains; we observe traffic, we don't transmit secrets.
+      acceptInsecureCerts: true,
       args: [
         // CRITICAL: Remove automation detection markers
         '--disable-blink-features=AutomationControlled',
@@ -2018,6 +2062,10 @@ function setupFrameHandling(page, forceDebug) {
         '--memory-pressure-off',
         '--max_old_space_size=2048',   // V8 heap limit
         '--disable-prompt-on-repost',  // Fixes form popup on page reload
+        // Disable Chrome's pop-up blocker (chrome://settings/content/popups)
+        // ONLY when a site wants popups captured — lets non-gesture popunders
+        // fire. Gated so non-popup scans keep the blocker on for stealth.
+        ...(wantPopups ? ['--disable-popup-blocking'] : []),
         ...(keepBrowserOpen ? [] : ['--disable-background-networking']),
         '--no-sandbox',
         '--disable-setuid-sandbox',
@@ -3362,8 +3410,7 @@ function setupFrameHandling(page, forceDebug) {
             try {
               const parsedUrl = new URL(checkedUrl);
               fullSubdomain = parsedUrl.hostname;
-              const pslResult = psl.parse(fullSubdomain);
-              checkedRootDomain = pslResult.domain || fullSubdomain;
+              checkedRootDomain = rootDomainForHost(fullSubdomain);
             } catch (_) { return; }
             if (!checkedRootDomain) return;
@@ -3638,30 +3685,24 @@ function setupFrameHandling(page, forceDebug) {
         try {
           const parsedUrl = new URL(checkedUrl);
           fullSubdomain = parsedUrl.hostname;
-          const pslResult = psl.parse(fullSubdomain);
-          checkedRootDomain = pslResult.domain || fullSubdomain;
+          checkedRootDomain = rootDomainForHost(fullSubdomain);
         } catch (e) {}
+        // Never BLOCK the top-level document (the scanned page OR a main-frame
+        // redirect target). Aborting it makes the navigation never commit (page
+        // stays at about:blank → navigation timeout), silently breaking any
+        // scanned URL that matches our own filter lists (adblock / blocked /
+        // blockDomainsByUrl) — common on adult/pirate/stream domains. This flag
+        // ONLY guards the abort paths below; the request still flows through the
+        // match logic, so a main-frame redirect destination (e.g. a
+        // filecrypt → ad-domain hop) is still captured via filterRegex/dig/whois.
+        // isNavigationRequest is true for sub-frame docs too, so the mainFrame()
+        // check keeps ad iframes blockable.
+        let isMainFrameDoc = false;
+        try { isMainFrameDoc = request.isNavigationRequest() && request.frame() === page.mainFrame(); } catch (_) {}
         // Check against ALL first-party domains (original + all redirects)
         const isFirstParty = checkedRootDomain && firstPartyDomains.has(checkedRootDomain);
-        // Block infinite iframe loops - safely access frame URL
-        const frameUrl = (() => {
-          try {
-            const frame = request.frame();
-            return frame ? frame.url() : '';
-          } catch (err) {
-            return '';
-          }
-        })();
-        if (frameUrl && frameUrl.includes('creative.dmzjmp.com') &&
-            checkedUrl.includes('go.dmzjmp.com/api/models')) {
-          if (forceDebug) {
-            console.log(formatLogMessage('debug', `Blocking potential infinite iframe loop: ${checkedUrl}`));
-          }
-          request.abort();
-          return;
-        }
         // Enhanced debug logging to show which frame the request came from
         if (forceDebug) {
@@ -3691,7 +3732,7 @@ function setupFrameHandling(page, forceDebug) {
               request.resourceType()
             );
-            if (result.blocked) {
+            if (result.blocked && !isMainFrameDoc) {
               adblockStats.blocked++;
               if (forceDebug) {
                 console.log(formatLogMessage('debug', `${messageColors.blocked('[adblock]')} ${checkedUrl} (${result.reason})`));
@@ -3699,6 +3740,12 @@ function setupFrameHandling(page, forceDebug) {
               request.abort('blockedbyclient');
               return;
             }
+            if (result.blocked && isMainFrameDoc && forceDebug) {
+              // Matched a filter rule but it's the page we're scanning (or a
+              // main-frame redirect target) — allow it (blocking the top-level
+              // document aborts navigation). It still flows through the matcher.
+              console.log(formatLogMessage('debug', `${messageColors.highlight('[adblock]')} top-level document ${checkedUrl} matched (${result.reason}) — allowed (never block the scanned page)`));
+            }
             adblockStats.allowed++;
           } catch (err) { /* Silently continue on adblock errors */ }
         }
@@ -3752,7 +3799,7 @@ function setupFrameHandling(page, forceDebug) {
         // check so domain-based blocks short-circuit without paying the
         // per-URL regex scan. Same abort reason as the static path so
         // request.failure() observers see consistent metadata.
-        if (reqDomain && _dynamicallyBlockedDomains.size > 0 && matchesDynamicBlock(reqDomain)) {
+        if (reqDomain && _dynamicallyBlockedDomains.size > 0 && matchesDynamicBlock(reqDomain) && !isMainFrameDoc) {
           if (forceDebug) {
             console.log(formatLogMessage('debug', `${BLOCK_DOMAINS_BY_URL_TAG} aborting ${reqUrl} (domain ${reqDomain} dynamically blocked)`));
           }
@@ -3767,7 +3814,7 @@ function setupFrameHandling(page, forceDebug) {
             break;
           }
         }
-        if (blockedMatchIndex !== -1) {
+        if (blockedMatchIndex !== -1 && !isMainFrameDoc) {
           // Always track the hit (zero-cost on the un-debug path) so the
           // scan-end summary can show which patterns are doing work vs.
           // which are stale and ready to prune. Keyed by pattern.source --
@@ -4349,15 +4396,43 @@ function setupFrameHandling(page, forceDebug) {
         try {
           navigationResult = await navigateWithRedirectHandling(page, currentUrl, siteConfig, gotoOptions, forceDebug, formatLogMessage);
         } catch (navErr) {
-          // Only retry on genuine timeouts, not chrome-error:// redirects
+          // Only handle genuine timeouts here, not chrome-error:// redirects.
+          // pageUrl === 'about:blank' means the navigation never committed
+          // (server never responded) — treat as a real failure, not a partial
+          // page; only a page that actually reached a URL is worth observing.
           let pageUrl = '';
           try { if (!page.isClosed()) pageUrl = page.url(); } catch {}
           const isPopupFailure = navErr.message.includes('chrome-error://') || navErr.message.includes('invalid URL') ||
             pageUrl.startsWith('chrome-error://') || pageUrl === 'about:blank';
           if ((navErr.message.includes('timeout') || navErr.message.includes('Timeout')) && !isPopupFailure) {
-            if (forceDebug) console.log(formatLogMessage('debug', `Navigation timeout, retrying with waitUntil:networkidle2 for ${currentUrl}`));
-            const fallbackOptions = { ...gotoOptions, waitUntil: 'networkidle2', timeout: Math.min(timeout, 10000) };
-            navigationResult = await navigateWithRedirectHandling(page, currentUrl, siteConfig, fallbackOptions, forceDebug, formatLogMessage);
+            // The OLD fallback retried with networkidle2 — STRICTER than the
+            // domcontentloaded default, so it could never rescue a
+            // domcontentloaded timeout (and Puppeteer 25 has no 'commit', i.e.
+            // nothing more lenient). Two-tier recovery instead:
+            //   1. If the site used a wait STRICTER than domcontentloaded, do one
+            //      lenient retry with domcontentloaded (it fires earlier).
+            //   2. Otherwise proceed with the partially-loaded page rather than
+            //      discarding the URL — it exists and requests already fired
+            //      (captured by page.on('request')); the delay/interact phase
+            //      below keeps capturing. Streaming/embed/media pages routinely
+            //      never reach DOM-ready (a connection stays open) but their
+            //      ad/tracker calls fired early.
+            const primaryWait = gotoOptions.waitUntil || defaultWaitUntil;
+            let recovered = false;
+            if (primaryWait !== 'domcontentloaded') {
+              try {
+                if (forceDebug) console.log(formatLogMessage('debug', `Navigation timeout (${primaryWait}), retrying with waitUntil:domcontentloaded for ${currentUrl}`));
+                const fallbackOptions = { ...gotoOptions, waitUntil: 'domcontentloaded', timeout: Math.min(timeout, 15000) };
+                navigationResult = await navigateWithRedirectHandling(page, currentUrl, siteConfig, fallbackOptions, forceDebug, formatLogMessage);
+                recovered = true;
+              } catch (_) { /* fall through to proceed-with-partial */ }
+            }
+            if (!recovered) {
+              let partialUrl = currentUrl;
+              try { if (!page.isClosed()) partialUrl = page.url() || currentUrl; } catch {}
+              if (forceDebug) console.log(formatLogMessage('debug', `Navigation timeout — proceeding with partially-loaded page for ${currentUrl}`));
+              navigationResult = { finalUrl: partialUrl, redirected: false, redirectChain: [currentUrl], originalUrl: currentUrl, redirectDomains: [], httpStatus: null, cfRay: null };
+            }
           } else {
             throw navErr;
           }
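
The two-tier recovery described in the comments above can be sketched in isolation. This is a minimal illustration, not the package's code: `recoverFromTimeout`, `navigate`, and `getPageUrl` are hypothetical stand-ins for the real `navigateWithRedirectHandling` call and `page.url()`, and the result shape is trimmed to a few fields.

```javascript
// Hypothetical sketch of the two-tier timeout recovery.
// `navigate(waitUntil)` is an assumed callback that resolves with a result
// object or rejects on timeout; `getPageUrl()` returns the URL the page reached.
async function recoverFromTimeout({ primaryWait, navigate, getPageUrl, currentUrl }) {
  // Tier 1: only worth retrying if the first attempt used a wait condition
  // stricter than domcontentloaded (e.g. load, networkidle2).
  if (primaryWait !== 'domcontentloaded') {
    try {
      const result = await navigate('domcontentloaded');
      return { ...result, tier: 'lenient-retry' };
    } catch (_) {
      // Fall through to tier 2.
    }
  }
  // Tier 2: keep the partially loaded page instead of discarding the URL.
  let partialUrl = currentUrl;
  try { partialUrl = getPageUrl() || currentUrl; } catch (_) { /* page closed */ }
  return { finalUrl: partialUrl, redirected: false, tier: 'partial' };
}
```

With `primaryWait` already set to `domcontentloaded`, the sketch never calls `navigate` again and goes straight to the partial-page result, which mirrors the reasoning that there is nothing more lenient to retry with.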
@@ -4630,8 +4705,41 @@ function setupFrameHandling(page, forceDebug) {
           // Capture hard "dead domain" navigation errors for --show-dead-domains
           // (DNS doesn't resolve / host unreachable). Blocks, timeouts and CF
           // challenges are NOT dead — they're excluded by this match.
-          const deadNav = /ERR_NAME_NOT_RESOLVED|ERR_ADDRESS_UNREACHABLE|ERR_DNS/.exec(err.message || '');
-          if (deadNav) recordDeadDomain(currentUrl, deadNav[0]);
+          // Only DEFINITIVE non-existence / unreachable signals — these now drive
+          // the in-scan dead-domain SKIP (not just --show-dead-domains reporting),
+          // so transient DNS errors must NOT match. The bare `ERR_DNS` used to
+          // catch ERR_DNS_TIMED_OUT / ERR_DNS_MALFORMED_RESPONSE / ERR_DNS_SERVER_FAILED
+          // (all transient) — dropped so a slow-DNS blip can't false-skip the
+          // rest of a live host's URLs.
+          const deadNav = /ERR_NAME_NOT_RESOLVED|ERR_ADDRESS_UNREACHABLE/.exec(err.message || '');
+          if (deadNav) {
+            recordDeadDomain(currentUrl, deadNav[0]);
+            // Corroborate-then-persist to the negative cache (.dnsnegcache with
+            // --dns-cache → cross-scan skip; else in-memory). Chrome resolves via
+            // the possibly-flaky SYSTEM resolver, so its ERR_NAME_NOT_RESOLVED may
+            // be a glitch on a LIVE host. Re-confirm via the reliable --dns
+            // resolver and cache ONLY if it ALSO returns a definitive NXDOMAIN.
+            // ERR_ADDRESS_UNREACHABLE is routing (the host resolves), so the
+            // resolve succeeds and it's correctly not cached. Fire-and-forget:
+            // off the critical path; saveDiskCache flushes on exit.
+            if (dnsPrecheckEnabled && deadNav[0] === 'ERR_NAME_NOT_RESOLVED') {
+              let navHost = '';
+              try { navHost = new URL(currentUrl).hostname; } catch {}
+              if (navHost && !/^[\d.:]+$|^\[/.test(navHost) && !dnsNegativeCache.has(navHost)) {
+                dnsResolver.resolveHost(navHost, dnsPrecheckTimeoutMs).then(
+                  () => { /* reliable resolver resolves it — system-resolver glitch, do NOT cache */ },
+                  (e) => {
+                    const code = (e && (e.code || e.message)) || '';
+                    if (isNonExistenceError(code)) {
+                      dnsNegativeCacheSet(navHost, code);
+                      recordDeadDomain(navHost, code);
+                      if (forceDebug) console.log(formatLogMessage('debug', `Dead domain confirmed by --dns resolver (${code}) — caching ${navHost} (skips next run with --dns-cache)`));
+                    }
+                  }
+                ).catch(() => {});
+              }
+            }
+          }
           throw err;
         }
       }
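
To see why dropping the bare `ERR_DNS` alternative matters, the old and new patterns from the hunk above can be compared on a few error strings. The sample messages and the `classifyNavError` helper are assumptions made for illustration; only the two regular expressions are taken from the diff.

```javascript
// The two patterns as they appear in the diff (removed vs. added line).
const OLD_DEAD_NAV_RE = /ERR_NAME_NOT_RESOLVED|ERR_ADDRESS_UNREACHABLE|ERR_DNS/;
const NEW_DEAD_NAV_RE = /ERR_NAME_NOT_RESOLVED|ERR_ADDRESS_UNREACHABLE/;

// Hypothetical helper: returns the matched error token, or null when the
// message is not treated as a dead-domain signal.
function classifyNavError(message, pattern = NEW_DEAD_NAV_RE) {
  const match = pattern.exec(message || '');
  return match ? match[0] : null;
}

// Sample messages (assumed shape, for illustration only).
const transient = 'net::ERR_DNS_TIMED_OUT at https://example.test/';
const definitive = 'net::ERR_NAME_NOT_RESOLVED at https://example.test/';

console.log(classifyNavError(transient, OLD_DEAD_NAV_RE)); // old pattern matches the ERR_DNS prefix
console.log(classifyNavError(transient));                  // new pattern does not match
console.log(classifyNavError(definitive));                 // definitive failure still matches
```

Because a regex alternative without anchors matches any substring, the old `ERR_DNS` branch also caught every longer `ERR_DNS_*` code; the narrowed pattern only fires on the two definitive tokens.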
@@ -5263,7 +5371,7 @@ function setupFrameHandling(page, forceDebug) {
           const safeUrl = currentUrl.replace(/https?:\/\//, '').replace(/[^a-zA-Z0-9]/g, '_').substring(0, 80);
           const filename = `screenshots/${safeUrl}-${timestamp}.png`;
           try {
-            if (!fs.existsSync('screenshots')) fs.mkdirSync('screenshots', { recursive: true });
+            fs.mkdirSync('screenshots', { recursive: true }); // recursive:true is a no-op if it already exists
             await page.screenshot({ path: filename, type: 'png', fullPage: true });
             console.log(formatLogMessage('info', `Screenshot saved: ${filename}`));
           } catch (screenshotErr) {
@@ -5759,6 +5867,19 @@ function setupFrameHandling(page, forceDebug) {
        // actually starting — wrongly skipping live domains. c-ares isn't
        // threadpool-bound so it's immune to that contention.
        if (dnsPrecheckEnabled && taskDomain && !/^[\d.:]+$|^\[/.test(taskDomain)) {
+         // Already proven dead earlier THIS run — either a pre-check NXDOMAIN or
+         // a prior URL's navigation hit ERR_NAME_NOT_RESOLVED / ERR_ADDRESS_UNREACHABLE
+         // (recordDeadDomain populates _deadDomains for both). Skip the repeat
+         // instead of paying another fail-open navigation on a multi-URL dead
+         // host (e.g. dlstreams.top?id=39/54/347). In-scan only (NOT persisted):
+         // Chrome resolves via the system resolver, so a nav-level failure could
+         // be a system-resolver glitch on a live host — a false "dead" must not
+         // carry across runs. Cheap: a Map lookup, no DNS resolve.
+         if (_deadDomains.has(taskDomain)) {
+           dnsPrecheckSkips++;
+           if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check: ${taskDomain} already dead this run (${_deadDomains.get(taskDomain)}) — skipping`));
+           return { url: task.url, rules: [], success: false, error: `DNS: ${_deadDomains.get(taskDomain)}`, skipped: true };
+         }
          const cached = dnsNegativeCache.get(taskDomain);
          if (cached && Date.now() - cached.timestamp < DNS_NEGATIVE_CACHE_TTL_MS) {
            dnsPrecheckSkips++;
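
The in-scan skip added in this hunk amounts to a per-run `Map` lookup keyed by hostname. The sketch below is a simplified illustration under assumed names (`deadDomains`, `recordDead`, `precheckTask`); it is not the package's `recordDeadDomain` or `_deadDomains` implementation, and it omits the negative-cache and resolver steps.

```javascript
// Hypothetical in-memory registry: hostname -> first failure reason seen this run.
const deadDomains = new Map();

function recordDead(hostname, reason) {
  // Keep the first reason; later failures on the same host add nothing new.
  if (!deadDomains.has(hostname)) deadDomains.set(hostname, reason);
}

// Returns a "skipped" result when the host already failed this run,
// or null when the task should proceed to the normal DNS pre-check.
function precheckTask(url) {
  const hostname = new URL(url).hostname;
  if (deadDomains.has(hostname)) {
    return {
      url,
      rules: [],
      success: false,
      error: `DNS: ${deadDomains.get(hostname)}`,
      skipped: true,
    };
  }
  return null;
}
```

A first URL on a host passes through (`precheckTask` returns `null`); once `recordDead` has been called for that host, every later URL on it is skipped without another resolve or navigation. Nothing here is written to disk, which matches the comment's point that a navigation-level failure alone should not carry across runs.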

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@fanboynz/network-scanner",
-  "version": "3.2.0",
+  "version": "3.3.0",
   "description": "A Puppeteer-based network scanner for analyzing web traffic, generating adblock filter rules, and identifying third-party requests. Features include fingerprint spoofing, Cloudflare bypass, content analysis with curl/grep, and multiple output formats.",
   "main": "nwss.js",
   "scripts": {