@fanboynz/network-scanner 3.1.2 → 3.3.0

package/CHANGELOG.md CHANGED
@@ -2,7 +2,58 @@
2
2
 
3
3
  All notable changes to the Network Scanner (nwss.js) project.
4
4
 
5
- ## [3.1.1] - 2026-05-30
5
+ ## [3.3.0] - 2026-06-06
6
+
7
+ ### Added
8
+ - **DNS dead-domain skip + corroborated persistence** — within a scan, once a host resolves NXDOMAIN/ENODATA it is remembered and repeat URLs on that host are skipped without re-resolving. With `--dns-cache`, a host that *also* fails navigation (`ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE`) is corroborated and persisted to the negative cache (`.dnsnegcache`, 12h TTL) so it is skipped on the next run too. Only definitive non-existence is cached — resolver errors fail open and never poison a live host.
9
+ - **`acceptInsecureCerts` on browser launch** — TLS/cert errors (expired, self-signed, name-mismatch) no longer abort navigation, so streaming/pirate domains with broken certs are still scanned.
10
+ - **`--disable-popup-blocking` when a site uses `capture_popups`** — Chrome's pop-up blocker (`chrome://settings/content/popups`) is turned off only for popup-capture scans, so non-gesture popunders (document-level `onclick` / timer SDKs) fire and get captured too. Non-popup scans keep the blocker on (stealthier — a real browser blocks non-gesture `window.open()`); gesture-triggered popups already worked via the synthetic-click path.
11
+
12
+ ### Changed
13
+ - **The main-frame document is never blocked** — the scanned page (and any main-frame redirect target) is exempt from adblock / `blocked` / `blockDomainsByUrl` aborts. Aborting it made the navigation never commit (`about:blank` → timeout), silently breaking scanned URLs that matched our own filter lists (common on adult/pirate/stream domains). The request still flows through the matcher, so a main-frame redirect destination (e.g. a filecrypt → ad-domain hop) is still captured; sub-frame / ad iframes stay blockable.
14
+ - **Navigation timeouts are recovered, not discarded** — on a nav timeout the scanner retries leniently and proceeds with the partially-loaded page instead of dropping the URL (a page still at `about:blank` is still treated as a failure).
15
+ - **whois disk-cache TTL raised to 36h** (dig stays 20h) — registrar data is stable and whois servers rate-limit aggressively, so a longer TTL cuts repeat queries; dig keeps its 20h TTL.
16
+ - **VPN is Linux-only with a clear guard** — `vpn` / `openvpn` on macOS/Windows now returns an explicit "Linux-only" error instead of cryptic `ip` / `/proc` failures.
17
+
18
+ ### Performance
19
+ - **`psl.parse` memoized by hostname** in the request hot path — both per-request handlers (main page + popup capture) parsed the root domain of *every* request, while a page hammers the same handful of hosts (CDN, analytics, ad domains). A hostname-keyed memo turns almost all of those into `Map` hits, replacing the URL-keyed cache (fewer + shorter keys, far higher hit rate).
20
+ - **Lower per-request overhead** — the iframe-loop guard's `frame().url()` lookup is now gated behind a cheap URL string test instead of running on every request.
21
+ - **Removed redundant disk I/O** — a leaked adblock combined-list temp file in `tmpdir` is now cleaned up, and a redundant `existsSync` before each forced screenshot's recursive `mkdir` was dropped.
22
+
23
+ ### Fixed
24
+ - **Periodic debug/`--dumpurls` log flush is now synchronous** — the 2s timer used async `fs.writeFile({flag:'a'})` with no in-flight guard, so two ticks could append to the same file concurrently and interleave lines, and it cleared the buffer *before* the write confirmed (silently dropping entries on a failed write). It now uses `appendFileSync`, clears only after a successful write (transient failures retry next tick), and is bounded so a permanently-unwritable path can't grow memory.
25
+ - **Dead-domain skip works without `--show-dead-domains`** — the in-scan skip recorded into the dead set only when the report flag was on, which made the skip dead code; recording is now unconditional and the flag gates only the end-of-scan report. Transient DNS errors were also dropped from the dead-domain match so only `ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE` mark a host dead.
26
+
27
+ ### Removed
28
+ - **Hardcoded `dmzjmp` iframe-loop guard** — the domain-specific abort for a `creative.dmzjmp.com` frame requesting `go.dmzjmp.com/api/models` (added mid-2025 to stop a runaway request loop) has not recurred and was removed from the request hot path; the per-URL timeout remains the backstop. Recoverable from git history — prefer a config-driven `iframe_loop_guards` entry if it ever returns.
29
+
30
+ ### Documentation
31
+ - **README + man page now document `--block-ads` and `--adblock-engine`** — blocking ads/trackers *during* the scan with EasyList-format list(s) (comma-separated), and the `js` (default, native parser) vs `rust` (Brave `adblock-rs`) matcher backends.
32
+
33
+ ## [3.2.0] - 2026-06-04
34
+
35
+ ### Added
36
+ - **`output_regex`** site option — a per-site regex whose capture group 1 (or whole match) becomes the rule body, so output can be a path-prefix rule like `||host/script/` instead of `||host^`. Collapses randomized filenames under a stable path into one rule and lets you block a folder on a host that also serves legit content; falls back to `||host^` when the regex doesn't match. Adblock-only — domain-based formats (dnsmasq/unbound/pi-hole/hosts/plain) emit the bare host. Compiled once per pattern (memoized) and validated at config load.
37
+ - **dig resolver failover** — `digLookup` now fails over through the `--dns` resolvers on timeout / no-reply / `REFUSED` / `SERVFAIL` (up to 3 attempts, `+time=2 +tries=1` each), matching the resilience the whois retry and DNS pre-check rotation already had. With no `--dns`, the system-resolver path keeps dig's native `resolv.conf` rotation unchanged.
38
+
39
+ ### Changed
40
+ - **Ghost-cursor coordinate clicks now use the same realistic press as the built-in content clicks** (`humanClick`): hover dwell + mousedown/hold/mouseup, plus hand-tremor during the hold and a mouseup drift (so mousedown ≠ mouseup coordinates) when `realistic_click` is set — replacing a 0ms `page.mouse.click`.
41
+ - **Ghost-cursor clicks honor `interact_click_count`** (default 3, cap 20) instead of firing a single click — ad SDKs often swallow the 1st/2nd click as warmup. The bezier movement loop reserves part of `ghost_cursor_duration` for the clicks (raise the duration to fit more; the default 2000ms fits ~1 realistic click).
42
+ - **`dig` success is judged by RCODE, not stderr** — a dig that prints a transient `communications error` warning but still returns a valid `ANSWER SECTION` is no longer discarded.
43
+ - **dig-only configs skip the whois root-domain parse** per request (small per-request saving when no `whois`/`whois-or` is configured).
44
+
45
+ ### Fixed
46
+ - **`max_redirects: 0`** now means "follow none" instead of silently becoming 10 (the `|| 10` falsy-zero bug in `nwss.js` and `lib/redirect.js`).
47
+ - **A `REFUSED`/`SERVFAIL` dig that exhausts all resolvers returns failure** so it isn't cached — a transient resolver-side error no longer poisons a domain for the cache TTL.
48
+ - **Ghost-cursor coordinate click no longer reports false success** — it returned `true` (and logged "Clicked") even when the click was silently skipped for lack of a page; it now returns `false` and logs the skip.
49
+
50
+ ### Removed
51
+ - **`follow_redirects`** site option — documented in `--help`, the man page, the README, and example configs but never wired to any runtime behavior; removed from the docs. Use `max_redirects` instead (`0` = follow none).
52
+
53
+ ### Security
54
+ - **dig argv-injection guard** — `digLookup` rejects non-hostname-shaped input before shelling out. `dig` has no `--` end-of-options marker (unlike whois) and parses `@`/`-`/`+`-leading argv tokens as options, so a crafted "domain" like `@evil-resolver` (redirects the query to an arbitrary server) or `-f /path` (reads a file as a query batch) is now rejected — out-of-charset or dash-leading values fall back to no-match.
55
+
56
+ ## [3.1.2] - 2026-05-30
6
57
 
7
58
  ### Changed
8
59
  - **Fingerprint identity pinned to Stable Chrome 148**, not whatever Chrome-for-Testing puppeteer bundles (currently 149, ahead of Stable). The spoof must blend with the real-world population; claiming an unreleased build is itself a tell. The Chrome major + build (`CHROME_BUILD`) + GREASE brand (`CHROME_GREASE_BRAND`) are now single constants — see `lib/fingerprint.md`.
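Note on the 3.2.0 Security entry above: the dig argv-injection guard can be illustrated with a minimal sketch. `isHostnameShaped` is a hypothetical helper name, and the character set and length limits are assumptions, not the package's actual `digLookup` check. It only shows the idea of rejecting `@`/`-`/`+`-leading or out-of-charset tokens before they reach `dig`'s argv.

```javascript
// Hypothetical helper (not the package's real code): accept only
// hostname-shaped strings before passing them to dig as an argv token.
function isHostnameShaped(input) {
  if (typeof input !== 'string' || input.length === 0 || input.length > 253) return false;
  // Each label: letters, digits, hyphen or underscore; it must not start or
  // end with a hyphen, and it must not be empty.
  const label = /^[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?$/;
  return input.split('.').every(part => label.test(part));
}

console.log(isHostnameShaped('example.com'));    // true
console.log(isHostnameShaped('@evil-resolver')); // false
console.log(isHostnameShaped('-f /path'));       // false
console.log(isHostnameShaped('+short'));         // false
```

A caller would treat a `false` result as "no match" and skip the `dig` invocation entirely, which matches the fall-back behavior the changelog describes.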
package/CLAUDE.md CHANGED
@@ -6,7 +6,7 @@ Puppeteer-based network scanner for analyzing web traffic, generating adblock fi
6
6
 
7
7
  - `nwss.js` — Main entry point (~5,800 lines). CLI args, URL processing, orchestration.
8
8
  - `config.json` — Default scan configuration (sites, filters, options).
9
- - `lib/` — 32 focused, single-purpose modules:
9
+ - `lib/` — 33 focused, single-purpose modules:
10
10
  - `fingerprint.js` — Bot detection evasion (device/GPU/timezone spoofing)
11
11
  - `cloudflare.js` — Cloudflare challenge detection and solving
12
12
  - `browserhealth.js` — Memory management and browser lifecycle
@@ -14,6 +14,7 @@ Puppeteer-based network scanner for analyzing web traffic, generating adblock fi
14
14
  - `ghost-cursor.js` — Bezier-curve cursor pathing for human-like mouse movement
15
15
  - `smart-cache.js` — Multi-layer caching with persistence
16
16
  - `nettools.js` — WHOIS/dig integration
17
+ - `dns.js` — DNS pre-check resolver: multi-nameserver rotation + `--dns` override (pre-check only; not Chrome/dig)
17
18
  - `output.js` — Multi-format rule output (adblock, dnsmasq, unbound, pihole, etc.)
18
19
  - `proxy.js` — SOCKS5/HTTP proxy support
19
20
  - `socks-relay.js` — Local SOCKS proxy relay/chain helper
package/README.md CHANGED
@@ -17,6 +17,7 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
17
17
  - Subdomain handling (collapse to root or full subdomain)
18
18
  - Optionally match only first-party, third-party, or both
19
19
  - Enhanced redirect handling with JavaScript and meta refresh detection
20
+ - Capture and drive popup/popunder chains (`capture_popups` + `interact_popups`) so domains reachable only via a clicked popup still match
20
21
  - Per-site proxy routing (SOCKS5, SOCKS4, HTTP, HTTPS) with pre-flight health checks
21
22
 
22
23
  ---
@@ -50,7 +51,6 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
50
51
 
51
52
  | Argument | Description |
52
53
  |:---------------------------|:------------|
53
- | `--verbose` | Force verbose mode globally |
54
54
  | `--debug` | Force debug mode globally |
55
55
  | `--silent` | Suppress normal console logs |
56
56
  | `--titles` | Add `! <url>` title before each site's group |
@@ -66,9 +66,10 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
66
66
  | `--use-puppeteer-core` | Use `puppeteer-core` with system Chrome instead of bundled Chromium |
67
67
  | `--use-obscura` | Connect to running Obscura CDP server (`ws://127.0.0.1:9222` or `OBSCURA_WS` env). Skips fingerprint injection — Obscura provides built-in stealth |
68
68
  | `--load-extension <path>` | Load unpacked Chrome extension from directory (can be used multiple times) |
69
- | `--dns-cache` | Persist dig/whois results to disk between runs (20hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`). Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
70
- | `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the 20hr cache TTL) skip their c-ares pre-check via a positive-resolution index. |
71
- | `--block-ads=<files>` | Block ads using EasyList format rules (comma-separated: `easylist.txt,easyprivacy.txt`) |
69
+ | `--dns-cache` | Persist dig/whois results to disk between runs (dig 20hr / whois 36hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
70
+ | `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the dig/whois cache TTL) skip their c-ares pre-check via a positive-resolution index. |
71
+ | `--block-ads=<files>` | Block ads/trackers **during the scan** using EasyList-format filter list(s) (`\|\|domain^`, `/ads/*`, etc.). Comma-separated for multiple: `--block-ads=easylist.txt,easyprivacy.txt`. See [Blocking ads during the scan](#blocking-ads-during-the-scan). |
72
+ | `--adblock-engine=<js\|rust>` | Matcher backend for `--block-ads` (default: `js`). `rust` uses Brave's `adblock-rs` (much faster on large lists) and requires `npm i adblock-rs`. |
72
73
  | `--cdp` | Enable Chrome DevTools Protocol logging (now per-page if enabled) |
73
74
  | `--remove-dupes` | Remove duplicate domains from output (only with `-o`) |
74
75
  | `--dry-run` | Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules |
@@ -76,6 +77,8 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
76
77
  | `--help`, `-h` | Show this help menu |
77
78
  | `--version` | Show script version |
78
79
  | `--max-concurrent <number>` | Maximum concurrent site processing (1-50, overrides config/default) |
80
+ | `--dns <ip[,ip,...]>` | Resolver(s) for the DNS pre-check **and** nettools' `dig` (one pins, several rotate per query; overrides `/etc/resolv.conf`). Does not affect Chrome navigation or `whois`. Useful when the system resolver is flaky and `dig`-gated domains time out |
81
+ | `--show-dead-domains` | At end of scan, list hostnames that did not resolve / were unreachable (`NXDOMAIN`/`ENODATA` + `ERR_NAME_NOT_RESOLVED`/`ERR_ADDRESS_UNREACHABLE`). Excludes blocks/timeouts (those mean the domain is alive). For pruning dead URLs. |
79
82
  | `--cleanup-interval <number>` | Browser restart interval in URLs processed (1-1000, overrides config/default) |
80
83
 
81
84
  ### Validation Options
@@ -90,6 +93,37 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
90
93
  | `--clear-cache` | Clear persistent cache before scanning (improves fresh start performance) |
91
94
  | `--ignore-cache` | Bypass all smart caching functionality during scanning |
92
95
 
96
+ ### Blocking ads during the scan
97
+
98
+ `--block-ads` makes the scanner **block** matching requests *during* the scan (separate from capturing rules) — to keep ad/tracker noise out of the page, speed up loads, or test that a list catches what it should.
99
+
100
+ **Adding lists.** Pass one or more EasyList-format filter lists (same syntax as uBlock Origin / EasyList):
101
+
102
+ ```bash
103
+ # Single list
104
+ node nwss.js --block-ads=easylist.txt
105
+
106
+ # Multiple lists — comma-separated, no spaces
107
+ node nwss.js --block-ads=easylist.txt,easyprivacy.txt,mylist.txt
108
+ ```
109
+
110
+ Lists are plain-text **network** rules — `||doubleclick.net^`, `/ads/*`, `||example.com^$script`, etc. Element-hiding/cosmetic rules (`##…`) don't apply to request blocking and are ignored. The scanned page's own top-level document is never blocked (only sub-resources), so a site whose own domain is in a list still loads.
111
+
112
+ **Engine — `js` vs `rust`** (`--adblock-engine`, default `js`):
113
+
114
+ | Engine | Flag | Backend | When |
115
+ |---|---|---|---|
116
+ | **js** (default) | `--adblock-engine=js` | `lib/adblock.js` — pure-JS, no extra deps | Default; fine for small/medium lists, works everywhere |
117
+ | **rust** | `--adblock-engine=rust` | `lib/adblock-rust.js` — Brave's [`adblock-rs`](https://github.com/brave/adblock-rust) | Large lists (full EasyList + EasyPrivacy + …); much faster matching. Drop-in (same rules, same results). Requires `npm install adblock-rs` (needs a Rust toolchain) |
118
+
119
+ The two engines are interchangeable — same rule format, same blocking result; `rust` is purely a speed option for big lists. If you pass `--adblock-engine=rust` without `adblock-rs` installed, install it (`npm i adblock-rs`) or drop the flag to use `js`.
120
+
121
+ ```bash
122
+ # Fast matching over big lists with the Rust engine
123
+ npm install adblock-rs
124
+ node nwss.js --block-ads=easylist.txt,easyprivacy.txt --adblock-engine=rust
125
+ ```
126
+
93
127
  ---
94
128
 
95
129
  ## config.json Format
@@ -152,6 +186,7 @@ Example:
152
186
  | `userAgent` | `chrome`, `chrome_mac`, `chrome_linux`, `firefox`, `firefox_mac`, `firefox_linux`, `safari` | - | User agent for page |
153
187
  | `filterRegex` | String or Array | `.*` | Regex or list of regexes to match requests |
154
188
  | `regex_and` | Boolean | `false` | Use AND logic for multiple filterRegex patterns - ALL patterns must match the same URL |
189
+ | `output_regex` | String | — | Regex applied to each matched URL to build the rule body: capture group 1 (or whole match) becomes `\|\|<capture>` instead of `\|\|host^`. E.g. `^https?:\/\/([^\/]+\/[^\/]+\/)` turns `https://host.com/script/abc.js` into `\|\|host.com/script/`. The capture must include the host. No match → falls back to `\|\|host^`. Adblock-only; domain formats (dnsmasq/pihole/hosts/plain) emit the bare host |
155
190
  | `comments` | String or Array | - | String of comments or references |
156
191
  | `resourceTypes` | Array | `["script", "xhr", "image", "stylesheet"]` | What resource types to monitor |
157
192
  | `reload` | Integer | `1` | Number of times to reload page |
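Note on the `output_regex` row in the hunk above: the documented behavior (capture group 1 becomes the rule body, no match falls back to `||host^`) can be sketched as follows. `buildRule` is a hypothetical name used for illustration only; the package's real implementation may differ, for example in how it memoizes and validates the compiled pattern.

```javascript
// Hypothetical helper: build an adblock rule from a matched URL.
function buildRule(url, outputRegex) {
  const host = new URL(url).hostname;
  const m = outputRegex ? outputRegex.exec(url) : null;
  if (!m) return `||${host}^`; // no match: fall back to the bare-host rule
  // Capture group 1 if the pattern has one, otherwise the whole match.
  const body = m[1] !== undefined ? m[1] : m[0];
  return `||${body}`;
}

// The pattern from the README row (no /g flag, so exec() is stateless).
const re = /^https?:\/\/([^\/]+\/[^\/]+\/)/;

console.log(buildRule('https://host.com/script/abc.js', re)); // ||host.com/script/
console.log(buildRule('https://host.com/abc.js', re));        // ||host.com^
```

The second call falls back because the URL has no path segment followed by a slash, so the pattern does not match.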
@@ -176,8 +211,7 @@ Example:
176
211
 
177
212
  | Field | Values | Default | Description |
178
213
  |:---------------------|:-------|:-------:|:------------|
179
- | `follow_redirects` | Boolean | `true` | Follow redirects to new domains |
180
- | `max_redirects` | Integer | `10` | Maximum number of redirects to follow |
214
+ | `max_redirects` | Integer | `10` | Maximum number of redirects to follow (`0` = follow none) |
181
215
  | `js_redirect_timeout` | Milliseconds | `5000` | Time to wait for JavaScript redirects |
182
216
  | `detect_js_patterns` | Boolean | `true` | Analyze page source for redirect patterns |
183
217
  | `redirect_timeout_multiplier` | Number | `1.5` | Increase timeout for redirected URLs |
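Note on the `max_redirects` row in the hunk above: the changelog attributes the old behavior to a `|| 10` default, which treats `0` as "unset". The sketch below contrasts that with a nullish default. The function names are hypothetical, and using `??` is an assumption about one way to fix it, not necessarily what `nwss.js` and `lib/redirect.js` do.

```javascript
// Hypothetical option readers illustrating the falsy-zero pitfall.
const buggyMaxRedirects = cfg => cfg.max_redirects || 10; // 0 is falsy, so 0 becomes 10
const fixedMaxRedirects = cfg => cfg.max_redirects ?? 10; // only null/undefined fall back

console.log(buggyMaxRedirects({ max_redirects: 0 })); // 10
console.log(fixedMaxRedirects({ max_redirects: 0 })); // 0
console.log(fixedMaxRedirects({}));                   // 10
```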
@@ -279,6 +313,8 @@ When a page redirects to a new domain, first-party/third-party detection is base
279
313
  | `interact_duration` | Milliseconds | `2000` | Duration of interaction simulation |
280
314
  | `interact_scrolling` | Boolean | `true` | Enable scrolling simulation |
281
315
  | `interact_clicks` | Boolean | `false` | Enable element clicking simulation |
316
+ | `interact_click_count` | Integer | `3` | Number of random content-zone clicks per load (capped at 20). Default 3 = primary + 2 backups, since ad SDKs sometimes suppress the 1st/2nd click as warmup |
317
+ | `realistic_click` | Boolean | `false` | Higher click fidelity: denser mouse approach (15 steps), ±1px hand-tremor micro-moves during the press, and ±1.5px mouseup drift (so mousedown≠mouseup coords) — for sites that score click realism. Costs ~80–120ms/click |
282
318
  | `interact_typing` | Boolean | `false` | Enable typing simulation |
283
319
  | `interact_intensity` | String | `"medium"` | Interaction simulation intensity: "low", "medium", "high" |
284
320
  | `cursor_mode` | `"ghost"` | - | Use ghost-cursor Bezier mouse movements (requires `npm i ghost-cursor`) |
@@ -295,6 +331,21 @@ When a page redirects to a new domain, first-party/third-party detection is base
295
331
  | `ignore_similar_threshold` | Integer | - | Override global similarity threshold for this site |
296
332
  | `ignore_similar_ignored_domains` | Boolean | - | Override global `ignore_similar_ignored_domains` for this site |
297
333
 
334
+ ### Popup Capture Options
335
+
336
+ Capture (and optionally drive) the popup/popunder windows that ad and redirect
337
+ scripts open, so domains reachable only via a popup chain still match `filterRegex`.
338
+ The same `filterRegex` applies to the whole chain — it must contain every pattern
339
+ you expect along it. Popup capture only fires when the main page is actually
340
+ clicking, so set `interact: true` **and** `interact_clicks: true` as well.
341
+
342
+ | Field | Values | Default | Description |
343
+ |:---------------------|:-------|:-------:|:------------|
344
+ | `capture_popups` | Boolean | `false` | Capture popup windows opened during the scan and evaluate their landing URL + in-popup requests against `filterRegex`/`dig`/`whois` (requires `interact` + `interact_clicks` to fire user-gesture clicks) |
345
+ | `interact_popups` | Boolean | `false` | Mouse-click inside captured popups (3 content-zone clicks) so the chain cascades to its next redirect/ad. Requires `capture_popups`. Clicks popups up to `capture_popups_max_depth − 1` (the deepest captured popup is observed, not clicked) |
346
+ | `capture_popups_max_depth` | Integer | `4` | Max popup-chain depth to capture (`site → p1 → p2 → p3 → destination`). Each extra level multiplies popups + time |
347
+ | `capture_popups_window_ms` | Integer | `5000` | Per-popup capture window (ms) before the popup is auto-closed |
348
+
298
349
  ### VPN Options
299
350
 
300
351
  Route traffic through a VPN for specific sites. Requires `sudo` privileges. The VPN connection is established before scanning and torn down after the site completes.
@@ -596,8 +647,11 @@ node nwss.js --max-concurrent 12 --cleanup-interval 300 -o rules.txt
596
647
  {
597
648
  "url": "https://anti-bot-site.com",
598
649
  "interact": true,
650
+ "interact_clicks": true,
599
651
  "cursor_mode": "ghost",
600
- "ghost_cursor_duration": 3000,
652
+ "realistic_click": true,
653
+ "interact_click_count": 3,
654
+ "ghost_cursor_duration": 5000,
601
655
  "ghost_cursor_speed": 1.2,
602
656
  "fingerprint_protection": "random",
603
657
  "filterRegex": "tracking|analytics",
@@ -610,6 +664,12 @@ Or enable globally via CLI:
610
664
  node nwss.js --ghost-cursor --debug -o rules.txt
611
665
  ```
612
666
 
667
+ **Ghost-cursor clicks.** The cursor moves with `cursor_mode: "ghost"`, but it only *clicks* when both `interact: true` **and** `interact_clicks: true` are set (same rule as the built-in path). Click behavior:
668
+
669
+ - `realistic_click: true` — each press adds hand-tremor during the hold and a mouseup drift, so `mousedown` ≠ `mouseup` coordinates (the press is routed through the same `humanClick` the built-in content clicks use).
670
+ - `interact_click_count` — number of clicks per load (default `3`, capped at `20`). The default of 3 matters because some ad SDKs swallow the 1st/2nd click as warmup.
671
+ - **Duration vs. clicks:** realistic clicks take ~600–700ms each, and the bezier movement loop reserves up to **half** of `ghost_cursor_duration` for them. So the default `ghost_cursor_duration: 2000` only fits **~1 click** — raise it to roughly `interact_click_count × 700 + movement` (e.g. `5000`–`8000`) to fit all of them.
672
+
613
673
  > **Note:** ghost-cursor is an optional dependency. Install with `npm install ghost-cursor`. If not installed, the scanner falls back to the built-in mouse simulation automatically.
614
674
 
615
675
  #### E-commerce Site Scanning
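Note on the "Duration vs. clicks" bullet in the README hunk above: the sizing rule can be worked through with a rough estimate. This is a sketch built only from the README's figures (about 700 ms per realistic click, up to half of `ghost_cursor_duration` reserved for clicks). `clicksThatFit`, `suggestedDuration`, and the `movementMs` parameter are hypothetical names, and the scanner's real scheduling may differ.

```javascript
// Upper end of the README's per-click estimate for realistic clicks.
const CLICK_MS = 700;

// How many realistic clicks fit if at most half the duration is reserved for them.
function clicksThatFit(durationMs) {
  return Math.floor((durationMs / 2) / CLICK_MS);
}

// README rule of thumb: interact_click_count x 700 + movement.
function suggestedDuration(clickCount, movementMs) {
  return clickCount * CLICK_MS + movementMs;
}

console.log(clicksThatFit(2000));        // 1
console.log(clicksThatFit(5000));        // 3
console.log(suggestedDuration(3, 2900)); // 5000
```

Under these assumptions the default `ghost_cursor_duration` of 2000 fits one click, and the 5000 used in the example config fits the default `interact_click_count` of 3.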
package/eslint.config.mjs CHANGED
@@ -2,5 +2,17 @@ import globals from "globals";
2
2
  import { defineConfig } from "eslint/config";
3
3
 
4
4
  export default defineConfig([
5
- { files: ["**/*.{js,mjs,cjs}"], languageOptions: { globals: globals.browser } },
5
+ {
6
+ files: ["**/*.{js,mjs,cjs}"],
7
+ // Node globals (require/module/process/Buffer/...) plus browser globals
8
+ // (document/window/navigator) — the latter are referenced inside
9
+ // page.evaluate() callbacks that eslint parses as part of the file.
10
+ languageOptions: { globals: { ...globals.node, ...globals.browser } },
11
+ // Catch undefined-variable references statically. node --check only
12
+ // validates syntax, so an orphaned identifier (e.g. a const that was
13
+ // removed while a usage remained) passes parsing but throws
14
+ // ReferenceError at runtime only when that branch executes. no-undef
15
+ // turns that whole class into a build-time failure.
16
+ rules: { "no-undef": "error" },
17
+ },
6
18
  ]);
@@ -7,12 +7,11 @@ const { formatLogMessage, messageColors } = require('./colorize');
7
7
  const IS_PAGE_FROM_PREVIOUS_SCAN_TAG = messageColors.processing('[isPageFromPreviousScan]');
8
8
  const REALTIME_CLEANUP_TAG = messageColors.processing('[realtime_cleanup]');
9
9
  const GROUP_WINDOW_CLEANUP_TAG = messageColors.processing('[group_window_cleanup]');
10
- const { execSync, execFile } = require('child_process');
10
+ const { execFile } = require('child_process');
11
11
 
12
12
  // Window cleanup delay constant
13
13
  const WINDOW_CLEANUP_DELAY_MS = 15000;
14
14
  // window_clean REALTIME
15
- const REALTIME_CLEANUP_BUFFER_MS = 25000; // Additional buffer time after site delay (increased for Cloudflare)
16
15
  const REALTIME_CLEANUP_THRESHOLD = 12; // Default number of pages to keep
17
16
  const REALTIME_CLEANUP_MIN_PAGES = 6; // Minimum pages before cleanup kicks in
18
17
 
@@ -380,7 +379,30 @@ async function performRealtimeWindowCleanup(browserInstance, threshold = REALTIM
380
379
 
381
380
  // Use the provided total delay (already includes appropriate buffer)
382
381
  const cleanupDelay = totalDelay;
383
-
382
+
383
+ // Pre-wait short-circuit. The only pages this pass can ever close are popups
384
+ // (untracked) and idle pages — active main pages are protected by
385
+ // isPageSafeToClose. When concurrency exceeds the threshold the page count is
386
+ // dominated by active main pages, so without this we'd wait the full
387
+ // cleanupDelay and then close nothing (e.g. max_concurrent 30 vs threshold 8
388
+ // = a ~36s no-op on every task). If nothing is even a candidate, skip the
389
+ // wait. A main task that finishes during the skipped wait closes its OWN page,
390
+ // so realtime cleanup never needed to wait for it.
391
+ const hasCloseCandidate = quickPages.some(p => {
392
+ if (p.isClosed()) return false;
393
+ const usage = pageUsageTracker.get(p);
394
+ return !usage || !usage.isProcessing; // untracked popup, or a tracked-idle page
395
+ });
396
+ if (!hasCloseCandidate) {
397
+ if (forceDebug) {
398
+ console.log(formatLogMessage('debug', `${REALTIME_CLEANUP_TAG} ${quickPages.length} pages but all actively processing — skipping ${cleanupDelay}ms wait (nothing closeable)`));
399
+ }
400
+ result.success = true;
401
+ result.totalPages = quickPages.length;
402
+ result.reason = 'all_active';
403
+ return result;
404
+ }
405
+
384
406
  if (forceDebug) {
385
407
  console.log(formatLogMessage('debug', `${REALTIME_CLEANUP_TAG} Waiting ${cleanupDelay}ms before cleanup (threshold: ${threshold})`));
386
408
  }
package/lib/dns.js ADDED
@@ -0,0 +1,238 @@
1
+ /**
2
+ * DNS pre-check resolver with multi-nameserver rotation.
3
+ *
4
+ * Owns nameserver selection and robust resolution for the scan's DNS
5
+ * pre-check. The default global resolver leads EVERY query with the FIRST
6
+ * nameserver in /etc/resolv.conf, so under scan concurrency one server
7
+ * (typically the ISP resolver) takes the whole c-ares burst and starts
8
+ * answering REFUSED while the other configured servers (e.g. 8.8.8.8/8.8.4.4)
9
+ * sit idle. This module builds one Resolver per nameserver — each leading with
10
+ * a different server, the rest kept as failover order — and round-robins them
11
+ * per resolve attempt so the lead spreads across all servers (and across the
12
+ * retry). A `--dns` override pins/rotates an explicit list instead of
13
+ * resolv.conf.
14
+ *
15
+ * Scope: this affects the pre-check resolver only. Chrome's navigation DNS
16
+ * (OS resolver) and nettools' dig/whois are separate paths and unaffected.
17
+ */
18
+ const net = require('node:net');
19
+ const dnsPromises = require('node:dns/promises');
20
+ const { getServers: getSystemDnsServers } = require('node:dns');
21
+ const { Resolver: DnsPromiseResolver } = require('node:dns/promises');
22
+ const { formatLogMessage } = require('./colorize');
23
+
24
+ // c-ares codes that mean "resolver problem" (retry-worthy / fail-open), not
25
+ // "the host does not exist".
26
+ const DNS_TRANSIENT_ERRORS = new Set(['ETIMEOUT', 'ESERVFAIL', 'EREFUSED', 'ECONNREFUSED']);
27
+
28
+ /**
29
+ * True only for a definitive "host does not exist / has no address" answer —
30
+ * the only case that justifies skipping a URL in the pre-check. Everything
31
+ * else (EREFUSED, ESERVFAIL, ETIMEOUT, ECONNREFUSED, timeout) is a resolver
32
+ * problem the caller should fail open on.
33
+ * @param {string} code
34
+ * @returns {boolean}
35
+ */
36
+ function isNonExistenceError(code) {
37
+ return code === 'ENOTFOUND' || code === 'ENODATA';
38
+ }
39
+
40
+ // Accept a bare IPv4/IPv6 address, or an address with a port in the exact form
41
+ // Resolver.setServers() understands: `ipv4:port` or `[ipv6]:port`.
42
+ function isResolverSpec(s) {
43
+ if (net.isIP(s)) return true;
44
+ const bracketed = s.match(/^\[([0-9a-fA-F:]+)\](?::\d{1,5})?$/);
45
+ if (bracketed) return net.isIP(bracketed[1]) === 6;
46
+ const v4port = s.match(/^(\d{1,3}(?:\.\d{1,3}){3}):\d{1,5}$/);
47
+ if (v4port) return net.isIP(v4port[1]) === 4;
48
+ return false;
49
+ }
50
+
51
+ /**
52
+ * Parse + validate a `--dns` / config value into a clean, de-duplicated server
53
+ * list. Accepts a comma-separated string or an array. Each entry may be a bare
54
+ * IPv4/IPv6 address or an address with a port (`8.8.8.8:5353`,
55
+ * `[2001:db8::1]:5353`) — the form setServers() accepts. Invalid entries are
56
+ * warned and dropped; duplicates are collapsed so the rotation stays even.
57
+ * @param {string|string[]|undefined} raw
58
+ * @returns {string[]} validated server specs (possibly empty)
59
+ */
60
+ function parseDnsServers(raw) {
61
+ if (!raw) return [];
62
+ const parts = (Array.isArray(raw) ? raw : String(raw).split(','))
63
+ .map(s => String(s).trim())
64
+ .filter(Boolean);
65
+ const valid = [];
66
+ const seen = new Set();
67
+ for (const p of parts) {
68
+ if (!isResolverSpec(p)) {
69
+ console.warn(`⚠ --dns: ignoring invalid server "${p}" (expected IPv4/IPv6, optionally with :port)`);
70
+ continue;
71
+ }
72
+ if (!seen.has(p)) { seen.add(p); valid.push(p); }
73
+ }
74
+ return valid;
75
+ }
76
+
+ /**
+  * Build a rotating pre-check resolver.
+  * @param {object} [opts]
+  * @param {string[]} [opts.servers] - explicit servers (from --dns). When empty,
+  *   the system resolv.conf servers are used.
+  * @param {boolean} [opts.forceDebug] - emit a debug line on the retry path.
+  * @returns {{ resolveHost: (hostname:string, timeoutMs:number)=>Promise<void>,
+  *   servers: string[], rotates: boolean, pinned: boolean }}
+  *   resolveHost resolves on success and rejects with the final error
+  *   (err.code intact) on failure.
+  */
+ function createRotatingResolver(opts = {}) {
+   const forceDebug = !!opts.forceDebug;
+   const override = Array.isArray(opts.servers) && opts.servers.length > 0 ? opts.servers : null;
+
+   let systemServers = [];
+   try { systemServers = getSystemDnsServers(); } catch { systemServers = []; }
+   const servers = override || systemServers;
+
+   // Pin/rotate an explicit --dns list (even a single server — never fall back
+   // to the OS resolver in that case). For resolv.conf, only build a pool when
+   // there is more than one server to rotate; otherwise use the global API
+   // (which already reads resolv.conf).
+   const shouldPool = override ? servers.length >= 1 : servers.length > 1;
+   let pool = null;
+   if (shouldPool) {
+     pool = servers.map((_, i) => {
+       const r = new DnsPromiseResolver();
+       // setServers accepts exactly what we hold here: getServers()'s own output
+       // (system path) or net-validated specs incl. ip:port (override path).
+       // Keep the resolver's default servers if an entry is somehow rejected.
+       try { r.setServers([...servers.slice(i), ...servers.slice(0, i)]); } catch { /* keep default */ }
+       return r;
+     });
+   }
+
+   let cursor = 0;
+   // Resolver for the next attempt: rotated when a pool exists, else the global
+   // promises API. `cursor++` is a synchronous single-threaded increment, so even
+   // under heavy concurrency every caller gets a distinct slot and the lead
+   // distribution stays exactly even (no locking needed).
+   const nextResolver = () => (pool ? pool[cursor++ % pool.length] : dnsPromises);
+
+   // One resolution attempt: rotate the lead server, resolve4 first, and on
+   // no-IPv4 (ENODATA/ENOTFOUND) fall back to resolve6 so IPv6-only hosts aren't
+   // wrongly skipped. Any OTHER code propagates unchanged so the caller sees the
+   // real resolver error. A timeout is kept as a safety net — with c-ares off
+   // the libuv threadpool it should rarely fire.
+   async function attempt(hostname, timeoutMs) {
+     const resolver = nextResolver();
+     let timer;
+     try {
+       const timeoutP = new Promise((_, reject) => {
+         timer = setTimeout(() => reject(new Error('DNS timeout')), timeoutMs);
+       });
+       const chain = resolver.resolve4(hostname).catch(err => {
+         if (err && (err.code === 'ENODATA' || err.code === 'ENOTFOUND')) {
+           return resolver.resolve6(hostname);
+         }
+         throw err;
+       });
+       await Promise.race([chain, timeoutP]);
+     } finally {
+       if (timer) clearTimeout(timer);
+     }
+   }
+
+   /**
+    * Resolve a hostname, rotating the lead server per attempt and retrying once
+    * on a transient/resolver error (so the retry leads with the next server —
+    * if one REFUSES, the retry hits another).
+    */
+   async function resolveHost(hostname, timeoutMs) {
+     try {
+       await attempt(hostname, timeoutMs);
+     } catch (firstErr) {
+       const code = firstErr && firstErr.code;
+       if (DNS_TRANSIENT_ERRORS.has(code) || (firstErr && firstErr.message === 'DNS timeout')) {
+         if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check transient (${code || 'timeout'}) for ${hostname}, retrying once`));
+         await attempt(hostname, timeoutMs);
+       } else {
+         throw firstErr;
+       }
+     }
+   }
+
+   return { resolveHost, servers, rotates: !!pool, pinned: !!override };
+ }
+
+ /**
+  * Circuit breaker for the DNS pre-check. During a resolver-refusal storm the
+  * pre-check is worthless (every host fails open and proceeds anyway) and
+  * actively harmful (it piles ~2× the queries — with the retry — onto an
+  * already-refusing resolver). This trips when resolver errors dominate a recent
+  * window of attempts and suspends pre-checking for a cooldown so the resolver
+  * gets breathing room; sites still load (a suspended pre-check just proceeds to
+  * navigation, exactly like a single fail-open). NXDOMAIN and success count as
+  * HEALTHY (the resolver answered) — only resolver errors (EREFUSED / ESERVFAIL
+  * / ETIMEOUT / ECONNREFUSED / timeout) count against it.
+  *
+  * @param {object} [opts]
+  * @param {number} [opts.window=20] attempts kept in the rolling window
+  * @param {number} [opts.threshold=10] resolver-errors in the window to trip
+  * @param {number} [opts.cooldownMs=30000] how long to stay suspended once tripped
+  * @param {boolean} [opts.forceDebug]
+  * @param {function} [opts.now] clock injection (tests); defaults to Date.now
+  * @returns {{ record:(isResolverError:boolean)=>void, isTripped:()=>boolean,
+  *   stats:()=>{tripped:boolean,errorCount:number,windowFill:number,trips:number} }}
+  */
+ function createDnsCircuitBreaker(opts = {}) {
+   const windowSize = opts.window || 20;
+   const threshold = opts.threshold || 10;
+   const cooldownMs = opts.cooldownMs != null ? opts.cooldownMs : 30000;
+   const forceDebug = !!opts.forceDebug;
+   const now = opts.now || Date.now;
+
+   const recent = []; // booleans, true = resolver error
+   let errorCount = 0;
+   let openUntil = 0; // suspended while now() < openUntil
+   let trips = 0;
+
+   // Feed one resolve outcome. Only ever called while closed (a suspended
+   // pre-check skips the resolve, so no outcome is produced).
+   function record(isResolverError) {
+     recent.push(!!isResolverError);
+     if (isResolverError) errorCount++;
+     if (recent.length > windowSize && recent.shift()) errorCount--;
+
+     if (now() >= openUntil && errorCount >= threshold) {
+       openUntil = now() + cooldownMs;
+       trips++;
+       console.log(formatLogMessage('warn', `[dns-precheck] resolver errors ${errorCount}/${recent.length} — suspending DNS pre-check ${Math.round(cooldownMs / 1000)}s (sites still load; backing off the resolver)`));
+     }
+   }
+
+   // True while suspended. On the first call after the cooldown elapses, resume
+   // with a clean window so the storm is re-measured fresh rather than re-tripping
+   // on stale errors.
+   function isTripped() {
+     if (now() < openUntil) return true;
+     if (openUntil !== 0) {
+       openUntil = 0;
+       recent.length = 0;
+       errorCount = 0;
+       if (forceDebug) console.log(formatLogMessage('debug', '[dns-precheck] cooldown elapsed — resuming DNS pre-check'));
+     }
+     return false;
+   }
+
+   return {
+     record,
+     isTripped,
+     stats: () => ({ tripped: now() < openUntil, errorCount, windowFill: recent.length, trips }),
+   };
+ }
+
+ module.exports = {
+   createRotatingResolver,
+   createDnsCircuitBreaker,
+   parseDnsServers,
+   isNonExistenceError,
+ };
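
The trip and resume behaviour of the circuit breaker added above can be exercised without any DNS traffic by injecting a clock through `opts.now`. The sketch below is not part of the package: `createBreaker` is a hypothetical, condensed stand-in for `createDnsCircuitBreaker` (logging dropped, defaults expressed as destructuring defaults rather than `||`), and the small window, threshold, and `fakeNow` values are assumptions chosen to keep the trace short.

```javascript
// Condensed stand-in for createDnsCircuitBreaker (assumption: logging removed,
// otherwise the same window / threshold / cooldown bookkeeping as the diff).
function createBreaker({ window = 20, threshold = 10, cooldownMs = 30000, now = Date.now } = {}) {
  const recent = [];   // booleans, true = resolver error
  let errorCount = 0;
  let openUntil = 0;   // suspended while now() < openUntil
  let trips = 0;
  return {
    record(isResolverError) {
      recent.push(!!isResolverError);
      if (isResolverError) errorCount++;
      // Drop the oldest outcome once the window overflows.
      if (recent.length > window && recent.shift()) errorCount--;
      if (now() >= openUntil && errorCount >= threshold) {
        openUntil = now() + cooldownMs;
        trips++;
      }
    },
    isTripped() {
      if (now() < openUntil) return true;
      // First call after the cooldown: resume with a clean window.
      if (openUntil !== 0) { openUntil = 0; recent.length = 0; errorCount = 0; }
      return false;
    },
    stats: () => ({ tripped: now() < openUntil, errorCount, windowFill: recent.length, trips }),
  };
}

// Hypothetical fake clock so the cooldown can be stepped without waiting.
let fakeNow = 1000;
const breaker = createBreaker({ window: 4, threshold: 2, cooldownMs: 5000, now: () => fakeNow });

breaker.record(false);                      // NXDOMAIN or success: the resolver answered
breaker.record(true);                       // e.g. EREFUSED: one resolver error
const beforeTrip = breaker.isTripped();     // below threshold, still closed
breaker.record(true);                       // second resolver error reaches the threshold
const afterTrip = breaker.isTripped();      // suspended for cooldownMs
fakeNow += 5000;                            // cooldown elapses
const afterCooldown = breaker.isTripped();  // resumes and clears the window
const finalStats = breaker.stats();

console.log({ beforeTrip, afterTrip, afterCooldown, finalStats });
```

A caller would presumably check `isTripped()` before each pre-check and call `record()` only when a resolve was actually attempted, which matches the "only ever called while closed" comment in the diff.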