@fanboynz/network-scanner 3.1.2 → 3.2.0

package/CHANGELOG.md CHANGED
@@ -2,7 +2,30 @@

  All notable changes to the Network Scanner (nwss.js) project.

- ## [3.1.1] - 2026-05-30
+ ## [3.2.0] - 2026-06-04
+
+ ### Added
+ - **`output_regex`** site option — a per-site regex whose capture group 1 (or whole match) becomes the rule body, so output can be a path-prefix rule like `||host/script/` instead of `||host^`. Collapses randomized filenames under a stable path into one rule and lets you block a folder on a host that also serves legit content; falls back to `||host^` when the regex doesn't match. Adblock-only — domain-based formats (dnsmasq/unbound/pi-hole/hosts/plain) emit the bare host. Compiled once per pattern (memoized) and validated at config load.
+ - **dig resolver failover** — `digLookup` now fails over through the `--dns` resolvers on timeout / no-reply / `REFUSED` / `SERVFAIL` (up to 3 attempts, `+time=2 +tries=1` each), matching the resilience the whois retry and DNS pre-check rotation already had. With no `--dns`, the system-resolver path keeps dig's native `resolv.conf` rotation unchanged.
+
+ ### Changed
+ - **Ghost-cursor coordinate clicks now use the same realistic press as the built-in content clicks** (`humanClick`): hover dwell + mousedown/hold/mouseup, plus hand-tremor during the hold and a mouseup drift (so mousedown ≠ mouseup coordinates) when `realistic_click` is set — replacing a 0ms `page.mouse.click`.
+ - **Ghost-cursor clicks honor `interact_click_count`** (default 3, cap 20) instead of firing a single click — ad SDKs often swallow the 1st/2nd click as warmup. The bezier movement loop reserves part of `ghost_cursor_duration` for the clicks (raise the duration to fit more; the default 2000ms fits ~1 realistic click).
+ - **`dig` success is judged by RCODE, not stderr** — a dig that prints a transient `communications error` warning but still returns a valid `ANSWER SECTION` is no longer discarded.
+ - **dig-only configs skip the whois root-domain parse** per request (small per-request saving when no `whois`/`whois-or` is configured).
+
+ ### Fixed
+ - **`max_redirects: 0`** now means "follow none" instead of silently becoming 10 (the `|| 10` falsy-zero bug in `nwss.js` and `lib/redirect.js`).
+ - **A `REFUSED`/`SERVFAIL` dig that exhausts all resolvers returns failure** so it isn't cached — a transient resolver-side error no longer poisons a domain for the cache TTL.
+ - **Ghost-cursor coordinate click no longer reports false success** — it returned `true` (and logged "Clicked") even when the click was silently skipped for lack of a page; it now returns `false` and logs the skip.
+
+ ### Removed
+ - **`follow_redirects`** site option — documented in `--help`, the man page, the README, and example configs but never wired to any runtime behavior; removed from the docs. Use `max_redirects` instead (`0` = follow none).
+
+ ### Security
+ - **dig argv-injection guard** — `digLookup` rejects non-hostname-shaped input before shelling out. `dig` has no `--` end-of-options marker (unlike whois) and parses `@`/`-`/`+`-leading argv tokens as options, so a crafted "domain" like `@evil-resolver` (redirects the query to an arbitrary server) or `-f /path` (reads a file as a query batch) is now rejected — out-of-charset or dash-leading values fall back to no-match.
+
+ ## [3.1.2] - 2026-05-30

  ### Changed
  - **Fingerprint identity pinned to Stable Chrome 148**, not whatever Chrome-for-Testing puppeteer bundles (currently 149, ahead of Stable). The spoof must blend with the real-world population; claiming an unreleased build is itself a tell. The Chrome major + build (`CHROME_BUILD`) + GREASE brand (`CHROME_GREASE_BRAND`) are now single constants — see `lib/fingerprint.md`.
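Reviewer note on the Security entry: the changelog says `digLookup` rejects non-hostname-shaped input before invoking `dig`. The sketch below shows one way such a guard could look. The name `looksLikeHostname` and the exact character set are assumptions made for illustration; the package's real check is not shown in this diff.

```javascript
// Hypothetical guard, not the package's code. A value must consist only of
// dot-separated labels that start with a letter, digit, or underscore, so it
// can never begin with '@', '-', or '+' and be parsed by dig as an option.
const HOSTNAME_SHAPE = /^[a-z0-9_][a-z0-9_-]{0,62}(?:\.[a-z0-9_][a-z0-9_-]{0,62})*$/i;

function looksLikeHostname(value) {
  return typeof value === 'string'
    && value.length <= 253
    && HOSTNAME_SHAPE.test(value);
}

for (const candidate of ['example.com', '@evil-resolver', '-f /path', '+short']) {
  console.log(candidate, looksLikeHostname(candidate));
}
```

Only the first candidate passes; the other three are the option-shaped tokens the changelog entry describes.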
package/CLAUDE.md CHANGED
@@ -6,7 +6,7 @@ Puppeteer-based network scanner for analyzing web traffic, generating adblock fi

  - `nwss.js` — Main entry point (~5,800 lines). CLI args, URL processing, orchestration.
  - `config.json` — Default scan configuration (sites, filters, options).
- - `lib/` — 32 focused, single-purpose modules:
+ - `lib/` — 33 focused, single-purpose modules:
    - `fingerprint.js` — Bot detection evasion (device/GPU/timezone spoofing)
    - `cloudflare.js` — Cloudflare challenge detection and solving
    - `browserhealth.js` — Memory management and browser lifecycle
@@ -14,6 +14,7 @@ Puppeteer-based network scanner for analyzing web traffic, generating adblock fi
    - `ghost-cursor.js` — Bezier-curve cursor pathing for human-like mouse movement
    - `smart-cache.js` — Multi-layer caching with persistence
    - `nettools.js` — WHOIS/dig integration
+   - `dns.js` — DNS pre-check resolver: multi-nameserver rotation + `--dns` override (pre-check only; not Chrome/dig)
    - `output.js` — Multi-format rule output (adblock, dnsmasq, unbound, pihole, etc.)
    - `proxy.js` — SOCKS5/HTTP proxy support
    - `socks-relay.js` — Local SOCKS proxy relay/chain helper
package/README.md CHANGED
@@ -17,6 +17,7 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
  - Subdomain handling (collapse to root or full subdomain)
  - Optionally match only first-party, third-party, or both
  - Enhanced redirect handling with JavaScript and meta refresh detection
+ - Capture and drive popup/popunder chains (`capture_popups` + `interact_popups`) so domains reachable only via a clicked popup still match
  - Per-site proxy routing (SOCKS5, SOCKS4, HTTP, HTTPS) with pre-flight health checks

  ---
@@ -50,7 +51,6 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally

  | Argument | Description |
  |:---------------------------|:------------|
- | `--verbose` | Force verbose mode globally |
  | `--debug` | Force debug mode globally |
  | `--silent` | Suppress normal console logs |
  | `--titles` | Add `! <url>` title before each site's group |
@@ -66,7 +66,7 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
  | `--use-puppeteer-core` | Use `puppeteer-core` with system Chrome instead of bundled Chromium |
  | `--use-obscura` | Connect to running Obscura CDP server (`ws://127.0.0.1:9222` or `OBSCURA_WS` env). Skips fingerprint injection — Obscura provides built-in stealth |
  | `--load-extension <path>` | Load unpacked Chrome extension from directory (can be used multiple times) |
- | `--dns-cache` | Persist dig/whois results to disk between runs (20hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`). Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
+ | `--dns-cache` | Persist dig/whois results to disk between runs (20hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
  | `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the 20hr cache TTL) skip their c-ares pre-check via a positive-resolution index. |
  | `--block-ads=<files>` | Block ads using EasyList format rules (comma-separated: `easylist.txt,easyprivacy.txt`) |
  | `--cdp` | Enable Chrome DevTools Protocol logging (now per-page if enabled) |
@@ -76,6 +76,8 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
  | `--help`, `-h` | Show this help menu |
  | `--version` | Show script version |
  | `--max-concurrent <number>` | Maximum concurrent site processing (1-50, overrides config/default) |
+ | `--dns <ip[,ip,...]>` | Resolver(s) for the DNS pre-check **and** nettools' `dig` (one pins, several rotate per query; overrides `/etc/resolv.conf`). Does not affect Chrome navigation or `whois`. Useful when the system resolver is flaky and `dig`-gated domains time out |
+ | `--show-dead-domains` | At end of scan, list hostnames that did not resolve / were unreachable (`NXDOMAIN`/`ENODATA` + `ERR_NAME_NOT_RESOLVED`/`ERR_ADDRESS_UNREACHABLE`). Excludes blocks/timeouts (those mean the domain is alive). For pruning dead URLs. |
  | `--cleanup-interval <number>` | Browser restart interval in URLs processed (1-1000, overrides config/default) |

  ### Validation Options
@@ -152,6 +154,7 @@ Example:
  | `userAgent` | `chrome`, `chrome_mac`, `chrome_linux`, `firefox`, `firefox_mac`, `firefox_linux`, `safari` | - | User agent for page |
  | `filterRegex` | String or Array | `.*` | Regex or list of regexes to match requests |
  | `regex_and` | Boolean | `false` | Use AND logic for multiple filterRegex patterns - ALL patterns must match the same URL |
+ | `output_regex` | String | — | Regex applied to each matched URL to build the rule body: capture group 1 (or whole match) becomes `\|\|<capture>` instead of `\|\|host^`. E.g. `^https?:\/\/([^\/]+\/[^\/]+\/)` turns `https://host.com/script/abc.js` into `\|\|host.com/script/`. The capture must include the host. No match → falls back to `\|\|host^`. Adblock-only; domain formats (dnsmasq/pihole/hosts/plain) emit the bare host |
  | `comments` | String or Array | - | String of comments or references |
  | `resourceTypes` | Array | `["script", "xhr", "image", "stylesheet"]` | What resource types to monitor |
  | `reload` | Integer | `1` | Number of times to reload page |
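Reviewer note on `output_regex`: the documented behaviour (capture group 1, else the whole match, else fall back to `||host^`) can be sketched as below. `buildAdblockRule` is a hypothetical helper written for this note and is not the package's implementation; the regex and URL are the ones from the README row.

```javascript
// Hypothetical helper illustrating the documented output_regex behaviour.
function buildAdblockRule(url, outputRegex) {
  const host = new URL(url).hostname;
  if (outputRegex) {
    const match = outputRegex.exec(url);
    if (match) {
      // Prefer capture group 1; use the whole match when there is no group.
      const body = match[1] !== undefined ? match[1] : match[0];
      return `||${body}`;
    }
  }
  // No regex configured, or no match: plain host rule.
  return `||${host}^`;
}

const pathPrefix = /^https?:\/\/([^\/]+\/[^\/]+\/)/;
console.log(buildAdblockRule('https://host.com/script/abc.js', pathPrefix)); // ||host.com/script/
console.log(buildAdblockRule('https://host.com/', pathPrefix));              // ||host.com^
```

The pattern is used without the `g` flag here, since a global regex keeps `lastIndex` state between `exec` calls.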
@@ -176,8 +179,7 @@ Example:

  | Field | Values | Default | Description |
  |:---------------------|:-------|:-------:|:------------|
- | `follow_redirects` | Boolean | `true` | Follow redirects to new domains |
- | `max_redirects` | Integer | `10` | Maximum number of redirects to follow |
+ | `max_redirects` | Integer | `10` | Maximum number of redirects to follow (`0` = follow none) |
  | `js_redirect_timeout` | Milliseconds | `5000` | Time to wait for JavaScript redirects |
  | `detect_js_patterns` | Boolean | `true` | Analyze page source for redirect patterns |
  | `redirect_timeout_multiplier` | Number | `1.5` | Increase timeout for redirected URLs |
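Reviewer note on `max_redirects: 0`: the changelog attributes the old behaviour to an `|| 10` default. The sketch below reproduces that class of bug and one common fix. The function names are hypothetical, and the use of `??` is an assumption; the diff does not show how the package actually fixed it.

```javascript
// Hypothetical option readers for illustration only.
// `||` treats 0 as "missing", so an explicit 0 silently becomes 10.
const buggyMaxRedirects = (site) => site.max_redirects || 10;
// `??` only falls back when the value is null or undefined, so 0 survives.
const fixedMaxRedirects = (site) => site.max_redirects ?? 10;

console.log(buggyMaxRedirects({ max_redirects: 0 })); // 10
console.log(fixedMaxRedirects({ max_redirects: 0 })); // 0
console.log(fixedMaxRedirects({}));                   // 10
```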
@@ -279,6 +281,8 @@ When a page redirects to a new domain, first-party/third-party detection is base
  | `interact_duration` | Milliseconds | `2000` | Duration of interaction simulation |
  | `interact_scrolling` | Boolean | `true` | Enable scrolling simulation |
  | `interact_clicks` | Boolean | `false` | Enable element clicking simulation |
+ | `interact_click_count` | Integer | `3` | Number of random content-zone clicks per load (capped at 20). Default 3 = primary + 2 backups, since ad SDKs sometimes suppress the 1st/2nd click as warmup |
+ | `realistic_click` | Boolean | `false` | Higher click fidelity: denser mouse approach (15 steps), ±1px hand-tremor micro-moves during the press, and ±1.5px mouseup drift (so mousedown≠mouseup coords) — for sites that score click realism. Costs ~80–120ms/click |
  | `interact_typing` | Boolean | `false` | Enable typing simulation |
  | `interact_intensity` | String | `"medium"` | Interaction simulation intensity: "low", "medium", "high" |
  | `cursor_mode` | `"ghost"` | - | Use ghost-cursor Bezier mouse movements (requires `npm i ghost-cursor`) |
@@ -295,6 +299,21 @@ When a page redirects to a new domain, first-party/third-party detection is base
  | `ignore_similar_threshold` | Integer | - | Override global similarity threshold for this site |
  | `ignore_similar_ignored_domains` | Boolean | - | Override global `ignore_similar_ignored_domains` for this site |

+ ### Popup Capture Options
+
+ Capture (and optionally drive) the popup/popunder windows that ad and redirect
+ scripts open, so domains reachable only via a popup chain still match `filterRegex`.
+ The same `filterRegex` applies to the whole chain — it must contain every pattern
+ you expect along it. Popup capture only fires when the main page is actually
+ clicking, so set `interact: true` **and** `interact_clicks: true` as well.
+
+ | Field | Values | Default | Description |
+ |:---------------------|:-------|:-------:|:------------|
+ | `capture_popups` | Boolean | `false` | Capture popup windows opened during the scan and evaluate their landing URL + in-popup requests against `filterRegex`/`dig`/`whois` (requires `interact` + `interact_clicks` to fire user-gesture clicks) |
+ | `interact_popups` | Boolean | `false` | Mouse-click inside captured popups (3 content-zone clicks) so the chain cascades to its next redirect/ad. Requires `capture_popups`. Clicks popups up to `capture_popups_max_depth − 1` (the deepest captured popup is observed, not clicked) |
+ | `capture_popups_max_depth` | Integer | `4` | Max popup-chain depth to capture (`site → p1 → p2 → p3 → destination`). Each extra level multiplies popups + time |
+ | `capture_popups_window_ms` | Integer | `5000` | Per-popup capture window (ms) before the popup is auto-closed |
+
  ### VPN Options

  Route traffic through a VPN for specific sites. Requires `sudo` privileges. The VPN connection is established before scanning and torn down after the site completes.
@@ -596,8 +615,11 @@ node nwss.js --max-concurrent 12 --cleanup-interval 300 -o rules.txt
  {
    "url": "https://anti-bot-site.com",
    "interact": true,
+   "interact_clicks": true,
    "cursor_mode": "ghost",
-   "ghost_cursor_duration": 3000,
+   "realistic_click": true,
+   "interact_click_count": 3,
+   "ghost_cursor_duration": 5000,
    "ghost_cursor_speed": 1.2,
    "fingerprint_protection": "random",
    "filterRegex": "tracking|analytics",
@@ -610,6 +632,12 @@ Or enable globally via CLI:
  node nwss.js --ghost-cursor --debug -o rules.txt
  ```

+ **Ghost-cursor clicks.** The cursor moves with `cursor_mode: "ghost"`, but it only *clicks* when both `interact: true` **and** `interact_clicks: true` are set (same rule as the built-in path). Click behavior:
+
+ - `realistic_click: true` — each press adds hand-tremor during the hold and a mouseup drift, so `mousedown` ≠ `mouseup` coordinates (the press is routed through the same `humanClick` the built-in content clicks use).
+ - `interact_click_count` — number of clicks per load (default `3`, capped at `20`). The default of 3 matters because some ad SDKs swallow the 1st/2nd click as warmup.
+ - **Duration vs. clicks:** realistic clicks take ~600–700ms each, and the bezier movement loop reserves up to **half** of `ghost_cursor_duration` for them. So the default `ghost_cursor_duration: 2000` only fits **~1 click** — raise it to roughly `interact_click_count × 700 + movement` (e.g. `5000`–`8000`) to fit all of them.
+
  > **Note:** ghost-cursor is an optional dependency. Install with `npm install ghost-cursor`. If not installed, the scanner falls back to the built-in mouse simulation automatically.

  #### E-commerce Site Scanning
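Reviewer note on the "Duration vs. clicks" bullet in the README hunk above: the arithmetic can be checked with a rough estimate. This assumes the worst-case ~700ms per realistic click and that exactly half of `ghost_cursor_duration` is available for clicks, both taken from the README text. `estimateClicksThatFit` is a hypothetical helper, not something the package exposes, and the real scheduler may budget differently.

```javascript
// Back-of-envelope estimate only; not the package's scheduling logic.
function estimateClicksThatFit(ghostCursorDurationMs, msPerClick = 700) {
  const clickBudgetMs = ghostCursorDurationMs / 2; // "up to half" per the README
  return Math.floor(clickBudgetMs / msPerClick);
}

console.log(estimateClicksThatFit(2000)); // 1
console.log(estimateClicksThatFit(5000)); // 3
console.log(estimateClicksThatFit(8000)); // 5
```

Under these assumptions the default 2000ms fits one click, and the 5000ms used in the updated example config fits the default `interact_click_count` of 3.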
package/eslint.config.mjs CHANGED
@@ -2,5 +2,17 @@ import globals from "globals";
  import { defineConfig } from "eslint/config";

  export default defineConfig([
-   { files: ["**/*.{js,mjs,cjs}"], languageOptions: { globals: globals.browser } },
+   {
+     files: ["**/*.{js,mjs,cjs}"],
+     // Node globals (require/module/process/Buffer/...) plus browser globals
+     // (document/window/navigator) — the latter are referenced inside
+     // page.evaluate() callbacks that eslint parses as part of the file.
+     languageOptions: { globals: { ...globals.node, ...globals.browser } },
+     // Catch undefined-variable references statically. node --check only
+     // validates syntax, so an orphaned identifier (e.g. a const that was
+     // removed while a usage remained) passes parsing but throws
+     // ReferenceError at runtime only when that branch executes. no-undef
+     // turns that whole class into a build-time failure.
+     rules: { "no-undef": "error" },
+   },
  ]);
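Reviewer note on the `no-undef` rationale: the failure mode described in the config comment is easy to reproduce. In the sketch below, `labelFor` and `LEGACY_LABEL` are made-up names; the point is only that the file parses cleanly while one branch references an identifier that no longer exists.

```javascript
// The file is syntactically valid, so a syntax-only check accepts it.
function labelFor(useLegacyPath) {
  if (useLegacyPath) {
    // LEGACY_LABEL is not declared anywhere (imagine its const was deleted).
    return LEGACY_LABEL;
  }
  return 'current';
}

console.log(labelFor(false)); // current
try {
  labelFor(true);
} catch (err) {
  console.log(err.name); // ReferenceError
}
```

A static rule such as `no-undef` reports the orphaned identifier without needing the legacy branch to run.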
@@ -7,12 +7,11 @@ const { formatLogMessage, messageColors } = require('./colorize');
  const IS_PAGE_FROM_PREVIOUS_SCAN_TAG = messageColors.processing('[isPageFromPreviousScan]');
  const REALTIME_CLEANUP_TAG = messageColors.processing('[realtime_cleanup]');
  const GROUP_WINDOW_CLEANUP_TAG = messageColors.processing('[group_window_cleanup]');
- const { execSync, execFile } = require('child_process');
+ const { execFile } = require('child_process');

  // Window cleanup delay constant
  const WINDOW_CLEANUP_DELAY_MS = 15000;
  // window_clean REALTIME
- const REALTIME_CLEANUP_BUFFER_MS = 25000; // Additional buffer time after site delay (increased for Cloudflare)
  const REALTIME_CLEANUP_THRESHOLD = 12; // Default number of pages to keep
  const REALTIME_CLEANUP_MIN_PAGES = 6; // Minimum pages before cleanup kicks in

@@ -380,7 +379,30 @@ async function performRealtimeWindowCleanup(browserInstance, threshold = REALTIM

    // Use the provided total delay (already includes appropriate buffer)
    const cleanupDelay = totalDelay;
-
+
+   // Pre-wait short-circuit. The only pages this pass can ever close are popups
+   // (untracked) and idle pages — active main pages are protected by
+   // isPageSafeToClose. When concurrency exceeds the threshold the page count is
+   // dominated by active main pages, so without this we'd wait the full
+   // cleanupDelay and then close nothing (e.g. max_concurrent 30 vs threshold 8
+   // = a ~36s no-op on every task). If nothing is even a candidate, skip the
+   // wait. A main task that finishes during the skipped wait closes its OWN page,
+   // so realtime cleanup never needed to wait for it.
+   const hasCloseCandidate = quickPages.some(p => {
+     if (p.isClosed()) return false;
+     const usage = pageUsageTracker.get(p);
+     return !usage || !usage.isProcessing; // untracked popup, or a tracked-idle page
+   });
+   if (!hasCloseCandidate) {
+     if (forceDebug) {
+       console.log(formatLogMessage('debug', `${REALTIME_CLEANUP_TAG} ${quickPages.length} pages but all actively processing — skipping ${cleanupDelay}ms wait (nothing closeable)`));
+     }
+     result.success = true;
+     result.totalPages = quickPages.length;
+     result.reason = 'all_active';
+     return result;
+   }
+
    if (forceDebug) {
      console.log(formatLogMessage('debug', `${REALTIME_CLEANUP_TAG} Waiting ${cleanupDelay}ms before cleanup (threshold: ${threshold})`));
    }
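Reviewer note on the pre-wait short-circuit: the candidate test in the hunk above can be exercised in isolation with stand-in objects. In this sketch `makePage` and the `Map`-based tracker are assumptions made so the snippet runs on its own; the real code operates on Puppeteer pages and the module's own `pageUsageTracker`.

```javascript
// Stand-ins for Puppeteer pages and the usage tracker (hypothetical here).
const pageUsageTracker = new Map();
const makePage = (closed = false) => ({ isClosed: () => closed });

// Same predicate as in the diff: a page is a close candidate when it is open
// and either untracked (a popup) or tracked but not currently processing.
function hasCloseCandidate(pages) {
  return pages.some((p) => {
    if (p.isClosed()) return false;
    const usage = pageUsageTracker.get(p);
    return !usage || !usage.isProcessing;
  });
}

const busyA = makePage();
const busyB = makePage();
pageUsageTracker.set(busyA, { isProcessing: true });
pageUsageTracker.set(busyB, { isProcessing: true });
const popup = makePage(); // never registered in the tracker

console.log(hasCloseCandidate([busyA, busyB]));        // false, so the wait is skipped
console.log(hasCloseCandidate([busyA, busyB, popup])); // true, so the wait proceeds
```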
package/lib/dns.js ADDED
@@ -0,0 +1,238 @@
+ /**
+  * DNS pre-check resolver with multi-nameserver rotation.
+  *
+  * Owns nameserver selection and robust resolution for the scan's DNS
+  * pre-check. The default global resolver leads EVERY query with the FIRST
+  * nameserver in /etc/resolv.conf, so under scan concurrency one server
+  * (typically the ISP resolver) takes the whole c-ares burst and starts
+  * answering REFUSED while the other configured servers (e.g. 8.8.8.8/8.8.4.4)
+  * sit idle. This module builds one Resolver per nameserver — each leading with
+  * a different server, the rest kept as failover order — and round-robins them
+  * per resolve attempt so the lead spreads across all servers (and across the
+  * retry). A `--dns` override pins/rotates an explicit list instead of
+  * resolv.conf.
+  *
+  * Scope: this affects the pre-check resolver only. Chrome's navigation DNS
+  * (OS resolver) and nettools' dig/whois are separate paths and unaffected.
+  */
+ const net = require('node:net');
+ const dnsPromises = require('node:dns/promises');
+ const { getServers: getSystemDnsServers } = require('node:dns');
+ const { Resolver: DnsPromiseResolver } = require('node:dns/promises');
+ const { formatLogMessage } = require('./colorize');
+
+ // c-ares codes that mean "resolver problem" (retry-worthy / fail-open), not
+ // "the host does not exist".
+ const DNS_TRANSIENT_ERRORS = new Set(['ETIMEOUT', 'ESERVFAIL', 'EREFUSED', 'ECONNREFUSED']);
+
+ /**
+  * True only for a definitive "host does not exist / has no address" answer —
+  * the only case that justifies skipping a URL in the pre-check. Everything
+  * else (EREFUSED, ESERVFAIL, ETIMEOUT, ECONNREFUSED, timeout) is a resolver
+  * problem the caller should fail open on.
+  * @param {string} code
+  * @returns {boolean}
+  */
+ function isNonExistenceError(code) {
+   return code === 'ENOTFOUND' || code === 'ENODATA';
+ }
+
+ // Accept a bare IPv4/IPv6 address, or an address with a port in the exact form
+ // Resolver.setServers() understands: `ipv4:port` or `[ipv6]:port`.
+ function isResolverSpec(s) {
+   if (net.isIP(s)) return true;
+   const bracketed = s.match(/^\[([0-9a-fA-F:]+)\](?::\d{1,5})?$/);
+   if (bracketed) return net.isIP(bracketed[1]) === 6;
+   const v4port = s.match(/^(\d{1,3}(?:\.\d{1,3}){3}):\d{1,5}$/);
+   if (v4port) return net.isIP(v4port[1]) === 4;
+   return false;
+ }
+
+ /**
+  * Parse + validate a `--dns` / config value into a clean, de-duplicated server
+  * list. Accepts a comma-separated string or an array. Each entry may be a bare
+  * IPv4/IPv6 address or an address with a port (`8.8.8.8:5353`,
+  * `[2001:db8::1]:5353`) — the form setServers() accepts. Invalid entries are
+  * warned and dropped; duplicates are collapsed so the rotation stays even.
+  * @param {string|string[]|undefined} raw
+  * @returns {string[]} validated server specs (possibly empty)
+  */
+ function parseDnsServers(raw) {
+   if (!raw) return [];
+   const parts = (Array.isArray(raw) ? raw : String(raw).split(','))
+     .map(s => String(s).trim())
+     .filter(Boolean);
+   const valid = [];
+   const seen = new Set();
+   for (const p of parts) {
+     if (!isResolverSpec(p)) {
+       console.warn(`⚠ --dns: ignoring invalid server "${p}" (expected IPv4/IPv6, optionally with :port)`);
+       continue;
+     }
+     if (!seen.has(p)) { seen.add(p); valid.push(p); }
+   }
+   return valid;
+ }
+
+ /**
+  * Build a rotating pre-check resolver.
+  * @param {object} [opts]
+  * @param {string[]} [opts.servers] - explicit servers (from --dns). When empty,
+  *   the system resolv.conf servers are used.
+  * @param {boolean} [opts.forceDebug] - emit a debug line on the retry path.
+  * @returns {{ resolveHost: (hostname:string, timeoutMs:number)=>Promise<void>,
+  *   servers: string[], rotates: boolean, pinned: boolean }}
+  *   resolveHost resolves on success and rejects with the final error
+  *   (err.code intact) on failure.
+  */
+ function createRotatingResolver(opts = {}) {
+   const forceDebug = !!opts.forceDebug;
+   const override = Array.isArray(opts.servers) && opts.servers.length > 0 ? opts.servers : null;
+
+   let systemServers = [];
+   try { systemServers = getSystemDnsServers(); } catch { systemServers = []; }
+   const servers = override || systemServers;
+
+   // Pin/rotate an explicit --dns list (even a single server — never fall back
+   // to the OS resolver in that case). For resolv.conf, only build a pool when
+   // there is more than one server to rotate; otherwise use the global API
+   // (which already reads resolv.conf).
+   const shouldPool = override ? servers.length >= 1 : servers.length > 1;
+   let pool = null;
+   if (shouldPool) {
+     pool = servers.map((_, i) => {
+       const r = new DnsPromiseResolver();
+       // setServers accepts exactly what we hold here: getServers()'s own output
+       // (system path) or net-validated specs incl. ip:port (override path).
+       // Keep the resolver's default servers if an entry is somehow rejected.
+       try { r.setServers([...servers.slice(i), ...servers.slice(0, i)]); } catch { /* keep default */ }
+       return r;
+     });
+   }
+
+   let cursor = 0;
+   // Resolver for the next attempt: rotated when a pool exists, else the global
+   // promises API. `cursor++` is a synchronous single-threaded increment, so even
+   // under heavy concurrency every caller gets a distinct slot and the lead
+   // distribution stays exactly even (no locking needed).
+   const nextResolver = () => (pool ? pool[cursor++ % pool.length] : dnsPromises);
+
+   // One resolution attempt: rotate the lead server, resolve4 first, and on
+   // no-IPv4 (ENODATA/ENOTFOUND) fall back to resolve6 so IPv6-only hosts aren't
+   // wrongly skipped. Any OTHER code propagates unchanged so the caller sees the
+   // real resolver error. A timeout is kept as a safety net — with c-ares off
+   // the libuv threadpool it should rarely fire.
+   async function attempt(hostname, timeoutMs) {
+     const resolver = nextResolver();
+     let timer;
+     try {
+       const timeoutP = new Promise((_, reject) => {
+         timer = setTimeout(() => reject(new Error('DNS timeout')), timeoutMs);
+       });
+       const chain = resolver.resolve4(hostname).catch(err => {
+         if (err && (err.code === 'ENODATA' || err.code === 'ENOTFOUND')) {
+           return resolver.resolve6(hostname);
+         }
+         throw err;
+       });
+       await Promise.race([chain, timeoutP]);
+     } finally {
+       if (timer) clearTimeout(timer);
+     }
+   }
+
+   /**
+    * Resolve a hostname, rotating the lead server per attempt and retrying once
+    * on a transient/resolver error (so the retry leads with the next server —
+    * if one REFUSES, the retry hits another).
+    */
+   async function resolveHost(hostname, timeoutMs) {
+     try {
+       await attempt(hostname, timeoutMs);
+     } catch (firstErr) {
+       const code = firstErr && firstErr.code;
+       if (DNS_TRANSIENT_ERRORS.has(code) || (firstErr && firstErr.message === 'DNS timeout')) {
+         if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check transient (${code || 'timeout'}) for ${hostname}, retrying once`));
+         await attempt(hostname, timeoutMs);
+       } else {
+         throw firstErr;
+       }
+     }
+   }
+
+   return { resolveHost, servers, rotates: !!pool, pinned: !!override };
+ }
+
+ /**
+  * Circuit breaker for the DNS pre-check. During a resolver-refusal storm the
+  * pre-check is worthless (every host fails open and proceeds anyway) and
+  * actively harmful (it piles ~2× the queries — with the retry — onto an
+  * already-refusing resolver). This trips when resolver errors dominate a recent
+  * window of attempts and suspends pre-checking for a cooldown so the resolver
+  * gets breathing room; sites still load (a suspended pre-check just proceeds to
+  * navigation, exactly like a single fail-open). NXDOMAIN and success count as
+  * HEALTHY (the resolver answered) — only resolver errors (EREFUSED / ESERVFAIL
+  * / ETIMEOUT / ECONNREFUSED / timeout) count against it.
+  *
+  * @param {object} [opts]
+  * @param {number} [opts.window=20] attempts kept in the rolling window
+  * @param {number} [opts.threshold=10] resolver-errors in the window to trip
+  * @param {number} [opts.cooldownMs=30000] how long to stay suspended once tripped
+  * @param {boolean} [opts.forceDebug]
+  * @param {function} [opts.now] clock injection (tests); defaults to Date.now
+  * @returns {{ record:(isResolverError:boolean)=>void, isTripped:()=>boolean,
+  *   stats:()=>{tripped:boolean,errorCount:number,windowFill:number,trips:number} }}
+  */
+ function createDnsCircuitBreaker(opts = {}) {
+   const windowSize = opts.window || 20;
+   const threshold = opts.threshold || 10;
+   const cooldownMs = opts.cooldownMs != null ? opts.cooldownMs : 30000;
+   const forceDebug = !!opts.forceDebug;
+   const now = opts.now || Date.now;
+
+   const recent = []; // booleans, true = resolver error
+   let errorCount = 0;
+   let openUntil = 0; // suspended while now() < openUntil
+   let trips = 0;
+
+   // Feed one resolve outcome. Only ever called while closed (a suspended
+   // pre-check skips the resolve, so no outcome is produced).
+   function record(isResolverError) {
+     recent.push(!!isResolverError);
+     if (isResolverError) errorCount++;
+     if (recent.length > windowSize && recent.shift()) errorCount--;
+
+     if (now() >= openUntil && errorCount >= threshold) {
+       openUntil = now() + cooldownMs;
+       trips++;
+       console.log(formatLogMessage('warn', `[dns-precheck] resolver errors ${errorCount}/${recent.length} — suspending DNS pre-check ${Math.round(cooldownMs / 1000)}s (sites still load; backing off the resolver)`));
+     }
+   }
+
+   // True while suspended. On the first call after the cooldown elapses, resume
+   // with a clean window so the storm is re-measured fresh rather than re-tripping
+   // on stale errors.
+   function isTripped() {
+     if (now() < openUntil) return true;
+     if (openUntil !== 0) {
+       openUntil = 0;
+       recent.length = 0;
+       errorCount = 0;
+       if (forceDebug) console.log(formatLogMessage('debug', '[dns-precheck] cooldown elapsed — resuming DNS pre-check'));
+     }
+     return false;
+   }
+
+   return {
+     record,
+     isTripped,
+     stats: () => ({ tripped: now() < openUntil, errorCount, windowFill: recent.length, trips }),
+   };
+ }
+
+ module.exports = {
+   createRotatingResolver,
+   createDnsCircuitBreaker,
+   parseDnsServers,
+   isNonExistenceError,
+ };
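Reviewer note on the rotation in `createRotatingResolver`: the lead-server spreading comes from two small pieces, a rotated server order per pool slot and a round-robin cursor. The sketch below isolates just that arithmetic with plain arrays instead of `Resolver` instances, so it makes no DNS queries. The addresses are examples only (`192.0.2.1` is a documentation address standing in for an ISP resolver), and `leadOrders` and `nextOrder` are names invented for this note.

```javascript
// Example server list; stands in for resolv.conf or a --dns value.
const servers = ['192.0.2.1', '8.8.8.8', '8.8.4.4'];

// Slot i leads with servers[i] and keeps the rest as failover order,
// mirroring [...servers.slice(i), ...servers.slice(0, i)] in the diff.
const leadOrders = servers.map((_, i) => [...servers.slice(i), ...servers.slice(0, i)]);

// Round-robin selection, one slot per attempt.
let cursor = 0;
const nextOrder = () => leadOrders[cursor++ % leadOrders.length];

// Lead server for four consecutive attempts.
const leads = Array.from({ length: 4 }, () => nextOrder()[0]);
console.log(leadOrders);
console.log(leads); // [ '192.0.2.1', '8.8.8.8', '8.8.4.4', '192.0.2.1' ]
```

Because a retry is just another attempt, it draws the next slot and therefore leads with a different server than the attempt that failed, as long as the pool has more than one entry.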
@@ -103,54 +103,6 @@ class DomainCache {
      return wasNew;
    }

-   /**
-    * Combined check-and-mark in one pass. Functionally equivalent to
-    * isDomainAlreadyDetected() followed by markDomainAsDetected(), but with
-    * one Set.has() call instead of two. (JS is single-threaded so all three
-    * variants are individually atomic; this one is just cheaper.)
-    * @param {string} domain - Domain to check and potentially mark
-    * @returns {boolean} True if domain was ALREADY detected (should skip), false if NEW (should process)
-    */
-   checkAndMark(domain) {
-     if (!domain || typeof domain !== 'string') {
-       return false;
-     }
-
-     const wasAlreadyDetected = this.cache.has(domain);
-
-     if (wasAlreadyDetected) {
-       // Domain already exists - update skip stats and return true (should skip)
-       this.stats.totalSkipped++;
-       this.stats.cacheHits++;
-
-       if (this.enableLogging) {
-         console.log(formatLogMessage('debug', `${this.logPrefix} Cache HIT: ${domain} (skipped)`));
-       }
-       return true; // Already detected, should skip
-     }
-
-     // Domain is NEW - mark it as detected
-     this.stats.cacheMisses++;
-
-     this.cache.add(domain);
-     this.stats.totalDetected++;
-
-     if (this.enableLogging) {
-       console.log(formatLogMessage('debug', `${this.logPrefix} Cache MISS: ${domain} (processing and marked, cache size: ${this.cache.size})`));
-     }
-
-     // Check size after the add so an overflow only fires eviction once per
-     // overflowing call (using targetCacheSize precomputed in the constructor).
-     if (this.cache.size > this.maxCacheSize) {
-       const toRemove = this.cache.size - this.targetCacheSize;
-       if (toRemove > 0) {
-         this.clearOldestEntries(toRemove);
-       }
-     }
-
-     return false; // New domain, should process
-   }
-
    /**
     * Clear oldest entries from cache (FIFO eviction). Set iteration order is
     * guaranteed insertion order per ES2015, so this genuinely evicts oldest-
@@ -208,45 +160,6 @@ class DomainCache {
208
160
  return this.cache.has(domain);
209
161
  }
210
162
 
211
- /**
212
- * Add multiple domains to cache at once. Uses a single .size delta to
213
- * count actually-new entries (skipping per-domain .has() calls), and
214
- * runs the size-overflow eviction check once after the batch instead of
215
- * per-domain. For a batch of N domains this is N .has() calls saved and
216
- * up to N redundant cap checks collapsed to one.
217
- * @param {Array<string>} domains - Array of domains to add
218
- * @returns {number} Number of domains actually added (excludes duplicates)
219
- */
220
- markMultipleDomainsAsDetected(domains) {
221
- if (!Array.isArray(domains) || domains.length === 0) {
222
- return 0;
223
- }
224
-
225
- const startSize = this.cache.size;
226
- for (let i = 0; i < domains.length; i++) {
227
- const d = domains[i];
228
- if (d && typeof d === 'string') {
229
- this.cache.add(d);
230
- }
231
- }
232
- const addedCount = this.cache.size - startSize;
233
- this.stats.totalDetected += addedCount;
234
-
235
- if (this.enableLogging && addedCount > 0) {
236
- console.log(formatLogMessage('debug', `${this.logPrefix} Batch added ${addedCount} new domains (cache size: ${this.cache.size})`));
237
- }
238
-
239
- // One eviction sweep at the end, mirroring the single-add overflow check.
240
- if (this.cache.size > this.maxCacheSize) {
241
- const toRemove = this.cache.size - this.targetCacheSize;
242
- if (toRemove > 0) {
243
- this.clearOldestEntries(toRemove);
244
- }
245
- }
246
-
247
- return addedCount;
248
- }
249
-
250
163
  /**
251
164
  * Create bound helper functions for easy integration with existing code
252
165
  * @returns {object} Object with bound helper functions
@@ -255,7 +168,6 @@ class DomainCache {
255
168
  return {
256
169
  isDomainAlreadyDetected: this.isDomainAlreadyDetected.bind(this),
257
170
  markDomainAsDetected: this.markDomainAsDetected.bind(this),
258
- checkAndMark: this.checkAndMark.bind(this),
259
171
  getSkippedCount: () => this.stats.totalSkipped,
260
172
  getCacheSize: () => this.cache.size,
261
173
  getStats: this.getStats.bind(this)
@@ -273,8 +185,7 @@ let globalDomainCache = null;
273
185
  *
274
186
  * NOTE: `options` is honored ONLY on the first call (the call that actually
275
187
  * constructs the singleton). Subsequent calls return the existing instance
276
- * regardless of what's passed. If you need different settings, call
277
- * resetGlobalCache() first or use `new DomainCache(options)` directly.
188
+ * regardless of what's passed; options are fixed at first construction.
278
189
  *
279
190
  * Under debug logging, a warning fires if a later caller passes options
280
191
  * that don't match the live instance — silent drift is a recurring source
@@ -295,7 +206,7 @@ function getGlobalDomainCache(options = {}) {
295
206
  (options.enableLogging !== undefined && options.enableLogging !== globalDomainCache.enableLogging) ||
296
207
  (options.logPrefix !== undefined && options.logPrefix !== globalDomainCache.logPrefix);
297
208
  if (drifted) {
298
- console.log(formatLogMessage('debug', `${globalDomainCache.logPrefix} getGlobalDomainCache called with options that differ from the live singleton; ignored (call resetGlobalCache() first to apply new options)`));
209
+ console.log(formatLogMessage('debug', `${globalDomainCache.logPrefix} getGlobalDomainCache called with options that differ from the live singleton; ignored (options are fixed at first construction)`));
299
210
  }
300
211
  }
301
212
  return globalDomainCache;
@@ -312,36 +223,17 @@ function createGlobalHelpers(options = {}) {
312
223
  }
313
224
 
314
225
  /**
315
- * Reset the global cache (useful for testing or manual resets)
316
- */
317
- function resetGlobalCache() {
318
- if (globalDomainCache) {
319
- globalDomainCache.clear();
320
- }
321
- globalDomainCache = null;
322
- }
323
-
324
- /**
325
- * Legacy wrapper functions for backward compatibility
326
- * These match the original function signatures from nwss.js
226
+ * Legacy wrapper for backward compatibility.
327
227
  *
328
- * NOTE: getTotalDomainsSkipped and getDetectedDomainsCount are the only
329
- * ones kept; they're used directly by nwss.js for end-of-scan stats.
330
- * Previously-defined isDomainAlreadyDetected / markDomainAsDetected /
331
- * checkAndMark wrappers were removed: nwss.js calls those via
332
- * createGlobalHelpers() now and repo-wide grep confirmed zero remaining
333
- * external callers of the legacy wrappers.
228
+ * getDetectedDomainsCount is the only one kept — nwss.js reads it for the
229
+ * end-of-scan "unique domains cached" stat. getTotalDomainsSkipped was
230
+ * removed: its value was always 0 because the global cache's skip-check
231
+ * (isDomainAlreadyDetected) is never called; cross-URL dedup is handled by
232
+ * nettools' processed-domain sets / smart-cache / the per-URL set — so the
233
+ * stat was misleading. The isDomainAlreadyDetected / markDomainAsDetected /
234
+ * checkAndMark wrappers were likewise removed; nwss.js uses createGlobalHelpers().
334
235
  */
335
236
 
336
- /**
337
- * Get total domains skipped (legacy wrapper)
338
- * @returns {number} Number of domains skipped
339
- */
340
- function getTotalDomainsSkipped() {
341
- const cache = getGlobalDomainCache();
342
- return cache.stats.totalSkipped;
343
- }
344
-
345
237
  /**
346
238
  * Get detected domains cache size (legacy wrapper)
347
239
  * @returns {number} Size of the detected domains cache
@@ -352,15 +244,10 @@ function getDetectedDomainsCount() {
352
244
  }
353
245
 
354
246
  module.exports = {
355
- // Main class
356
- DomainCache,
357
-
358
- // Global cache functions
359
- getGlobalDomainCache,
247
+ // Global cache helpers — createGlobalHelpers feeds nwss.js's per-domain
248
+ // marking; getDetectedDomainsCount feeds the end-of-scan "unique domains
249
+ // cached" stat. (DomainCache / getGlobalDomainCache stay internal — no
250
+ // external consumer; construct via createGlobalHelpers.)
360
251
  createGlobalHelpers,
361
- resetGlobalCache,
362
-
363
- // Legacy wrappers still used by nwss.js for end-of-scan stats
364
- getTotalDomainsSkipped,
365
252
  getDetectedDomainsCount
366
253
  };
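Reviewer note: this release removes `checkAndMark` from both the class and the object returned by `createGlobalHelpers()`, while `isDomainAlreadyDetected` and `markDomainAsDetected` remain. For any out-of-tree code that relied on the combined call, a caller-side shim along these lines may be enough. `makeStandInHelpers` is a hypothetical stand-in so the example runs on its own, and `checkAndMarkCompat` is likewise an illustrative name; neither is part of the package.

```javascript
// Hypothetical stand-in for the object returned by createGlobalHelpers().
// In real use the helpers would come from the package's domain cache module.
function makeStandInHelpers() {
  const seen = new Set();
  return {
    isDomainAlreadyDetected: (domain) => seen.has(domain),
    markDomainAsDetected: (domain) => {
      seen.add(domain);
    },
  };
}

// Caller-side replacement for the removed checkAndMark().
// Returns true if the domain was ALREADY detected (skip it),
// false if it is new (process it). Mirrors the removed method's contract,
// but does not update the cache's skip/hit statistics.
function checkAndMarkCompat(helpers, domain) {
  if (!domain || typeof domain !== 'string') {
    return false;
  }
  if (helpers.isDomainAlreadyDetected(domain)) {
    return true;
  }
  helpers.markDomainAsDetected(domain);
  return false;
}
```

This costs one extra `Set.has()` compared with the removed method, which the removed doc comment described as a cost difference only, not a correctness one.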