@fanboynz/network-scanner 3.2.0 → 3.3.0
- package/CHANGELOG.md +28 -0
- package/README.md +35 -3
- package/lib/nettools.js +3 -3
- package/lib/openvpn_vpn.js +8 -0
- package/lib/wireguard_vpn.js +8 -0
- package/nwss.1 +8 -0
- package/nwss.js +185 -64
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,34 @@

 All notable changes to the Network Scanner (nwss.js) project.

+## [3.3.0] - 2026-06-06
+
+### Added
+- **DNS dead-domain skip + corroborated persistence** — within a scan, once a host resolves NXDOMAIN/ENODATA it is remembered and repeat URLs on that host are skipped without re-resolving. With `--dns-cache`, a host that *also* fails navigation (`ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE`) is corroborated and persisted to the negative cache (`.dnsnegcache`, 12h TTL) so it is skipped on the next run too. Only definitive non-existence is cached — resolver errors fail open and never poison a live host.
+- **`acceptInsecureCerts` on browser launch** — TLS/cert errors (expired, self-signed, name-mismatch) no longer abort navigation, so streaming/pirate domains with broken certs are still scanned.
+- **`--disable-popup-blocking` when a site uses `capture_popups`** — Chrome's pop-up blocker (`chrome://settings/content/popups`) is turned off only for popup-capture scans, so non-gesture popunders (document-level `onclick` / timer SDKs) fire and get captured too. Non-popup scans keep the blocker on (stealthier — a real browser blocks non-gesture `window.open()`); gesture-triggered popups already worked via the synthetic-click path.
+
+### Changed
+- **The main-frame document is never blocked** — the scanned page (and any main-frame redirect target) is exempt from adblock / `blocked` / `blockDomainsByUrl` aborts. Aborting it made the navigation never commit (`about:blank` → timeout), silently breaking scanned URLs that matched our own filter lists (common on adult/pirate/stream domains). The request still flows through the matcher, so a main-frame redirect destination (e.g. a filecrypt → ad-domain hop) is still captured; sub-frame / ad iframes stay blockable.
+- **Navigation timeouts are recovered, not discarded** — on a nav timeout the scanner retries leniently and proceeds with the partially-loaded page instead of dropping the URL (a page still at `about:blank` is still treated as a failure).
+- **whois disk-cache TTL raised to 36h** (dig stays 20h) — registrar data is stable and whois servers rate-limit aggressively, so a longer TTL cuts repeat queries; dig keeps its 20h TTL.
+- **VPN is Linux-only with a clear guard** — `vpn` / `openvpn` on macOS/Windows now returns an explicit "Linux-only" error instead of cryptic `ip` / `/proc` failures.
+
+### Performance
+- **`psl.parse` memoized by hostname** in the request hot path — both per-request handlers (main page + popup capture) parsed the root domain of *every* request, while a page hammers the same handful of hosts (CDN, analytics, ad domains). A hostname-keyed memo turns almost all of those into `Map` hits, replacing the URL-keyed cache (fewer + shorter keys, far higher hit rate).
+- **Lower per-request overhead** — the iframe-loop guard's `frame().url()` lookup is now gated behind a cheap URL string test instead of running on every request.
+- **Removed redundant disk I/O** — a leaked adblock combined-list temp file in `tmpdir` is now cleaned up, and a redundant `existsSync` before each forced screenshot's recursive `mkdir` was dropped.
+
+### Fixed
+- **Periodic debug/`--dumpurls` log flush is now synchronous** — the 2s timer used async `fs.writeFile({flag:'a'})` with no in-flight guard, so two ticks could append to the same file concurrently and interleave lines, and it cleared the buffer *before* the write confirmed (silently dropping entries on a failed write). It now uses `appendFileSync`, clears only after a successful write (transient failures retry next tick), and is bounded so a permanently-unwritable path can't grow memory.
+- **Dead-domain skip works without `--show-dead-domains`** — the in-scan skip recorded into the dead set only when the report flag was on, which made the skip dead code; recording is now unconditional and the flag gates only the end-of-scan report. Transient DNS errors were also dropped from the dead-domain match so only `ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE` mark a host dead.
+
+### Removed
+- **Hardcoded `dmzjmp` iframe-loop guard** — the domain-specific abort for a `creative.dmzjmp.com` frame requesting `go.dmzjmp.com/api/models` (added mid-2025 to stop a runaway request loop) has not recurred and was removed from the request hot path; the per-URL timeout remains the backstop. Recoverable from git history — prefer a config-driven `iframe_loop_guards` entry if it ever returns.
+
+### Documentation
+- **README + man page now document `--block-ads` and `--adblock-engine`** — blocking ads/trackers *during* the scan with EasyList-format list(s) (comma-separated), and the `js` (default, native parser) vs `rust` (Brave `adblock-rs`) matcher backends.
+
 ## [3.2.0] - 2026-06-04

 ### Added
|
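The dead-domain rule described in the 3.3.0 changelog entry (cache only definitive NXDOMAIN/ENODATA answers, fail open on resolver errors, expire after 12h) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in `nwss.js`: `NegativeDnsCache`, `recordFailure`, and `isDead` are hypothetical names, and treating Node's `ENOTFOUND` error code as the NXDOMAIN signal is an assumption about how the resolver reports it. The navigation-corroboration step and disk persistence are left out.

```javascript
// Hypothetical sketch; none of these names come from nwss.js.
// Assumption: Node's dns module reports NXDOMAIN as 'ENOTFOUND' and an
// empty answer as 'ENODATA'. Anything else (timeouts, SERVFAIL, refused)
// is treated as a resolver problem, not proof that the host is dead.
const DEFINITIVE_CODES = new Set(['ENOTFOUND', 'ENODATA']);
const TTL_MS = 12 * 60 * 60 * 1000; // 12 hours, matching the documented TTL

class NegativeDnsCache {
  constructor(ttlMs = TTL_MS, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;          // injectable clock, handy for tests
    this.expiry = new Map(); // hostname -> expiry timestamp in ms
  }

  // Returns true if the failure was definitive and was cached.
  recordFailure(host, errorCode) {
    if (!DEFINITIVE_CODES.has(errorCode)) return false; // fail open
    this.expiry.set(host, this.now() + this.ttlMs);
    return true;
  }

  // True only while a definitive failure is still within its TTL.
  isDead(host) {
    const until = this.expiry.get(host);
    if (until === undefined) return false;
    if (this.now() >= until) {
      this.expiry.delete(host); // expired: the host gets re-checked
      return false;
    }
    return true;
  }
}
```

Injecting the clock keeps the TTL logic testable without waiting; a caller would consult `isDead(host)` before resolving and call `recordFailure` only from the resolver's error path.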
package/README.md
CHANGED
@@ -66,9 +66,10 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--use-puppeteer-core` | Use `puppeteer-core` with system Chrome instead of bundled Chromium |
 | `--use-obscura` | Connect to running Obscura CDP server (`ws://127.0.0.1:9222` or `OBSCURA_WS` env). Skips fingerprint injection — Obscura provides built-in stealth |
 | `--load-extension <path>` | Load unpacked Chrome extension from directory (can be used multiple times) |
-| `--dns-cache` | Persist dig/whois results to disk between runs (20hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
-| `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the
-| `--block-ads=<files>` | Block ads using EasyList
+| `--dns-cache` | Persist dig/whois results to disk between runs (dig 20hr / whois 36hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
+| `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the dig/whois cache TTL) skip their c-ares pre-check via a positive-resolution index. |
+| `--block-ads=<files>` | Block ads/trackers **during the scan** using EasyList-format filter list(s) (`\|\|domain^`, `/ads/*`, etc.). Comma-separated for multiple: `--block-ads=easylist.txt,easyprivacy.txt`. See [Blocking ads during the scan](#blocking-ads-during-the-scan). |
+| `--adblock-engine=<js\|rust>` | Matcher backend for `--block-ads` (default: `js`). `rust` uses Brave's `adblock-rs` (much faster on large lists) and requires `npm i adblock-rs`. |
 | `--cdp` | Enable Chrome DevTools Protocol logging (now per-page if enabled) |
 | `--remove-dupes` | Remove duplicate domains from output (only with `-o`) |
 | `--dry-run` | Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules |
|
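The `--dns-cache` row above mentions atomic disk writes (tmp + rename) and a clean reset when a cache file is corrupt. A minimal sketch of that pattern follows, assuming a JSON cache file; the on-disk format, the helper names `writeJsonAtomic` and `loadJsonOrReset`, and the wording of the warning are all assumptions for illustration and are not taken from the package.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write to a sibling temp file, then rename over the
// target. On POSIX filesystems a same-directory rename replaces the file in
// one step, so a reader sees either the old or the new content.
function writeJsonAtomic(file, data) {
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data));
  fs.renameSync(tmp, file);
}

// Hypothetical loader: a missing file is simply an empty cache; an
// unparseable file is reported (illustrative message) and reset to empty.
function loadJsonOrReset(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf-8'));
  } catch (err) {
    if (err.code !== 'ENOENT') {
      console.warn(`[dns-cache] unreadable cache ${file}, starting empty: ${err.message}`);
    }
    return {};
  }
}

// Usage: round-trip through a throwaway file in the OS temp directory.
const demoFile = path.join(os.tmpdir(), `nwss-sketch-${process.pid}.json`);
writeJsonAtomic(demoFile, { 'example.com': { resolved: true } });
const loaded = loadJsonOrReset(demoFile);
fs.unlinkSync(demoFile);
```

Keeping the temp file next to the target matters: a rename across filesystems is not atomic and may fail outright.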
@@ -92,6 +93,37 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--clear-cache` | Clear persistent cache before scanning (improves fresh start performance) |
 | `--ignore-cache` | Bypass all smart caching functionality during scanning |

+### Blocking ads during the scan
+
+`--block-ads` makes the scanner **block** matching requests *during* the scan (separate from capturing rules) — to keep ad/tracker noise out of the page, speed up loads, or test that a list catches what it should.
+
+**Adding lists.** Pass one or more EasyList-format filter lists (same syntax as uBlock Origin / EasyList):
+
+```bash
+# Single list
+node nwss.js --block-ads=easylist.txt
+
+# Multiple lists — comma-separated, no spaces
+node nwss.js --block-ads=easylist.txt,easyprivacy.txt,mylist.txt
+```
+
+Lists are plain-text **network** rules — `||doubleclick.net^`, `/ads/*`, `||example.com^$script`, etc. Element-hiding/cosmetic rules (`##…`) don't apply to request blocking and are ignored. The scanned page's own top-level document is never blocked (only sub-resources), so a site whose own domain is in a list still loads.
+
+**Engine — `js` vs `rust`** (`--adblock-engine`, default `js`):
+
+| Engine | Flag | Backend | When |
+|---|---|---|---|
+| **js** (default) | `--adblock-engine=js` | `lib/adblock.js` — pure-JS, no extra deps | Default; fine for small/medium lists, works everywhere |
+| **rust** | `--adblock-engine=rust` | `lib/adblock-rust.js` — Brave's [`adblock-rs`](https://github.com/brave/adblock-rust) | Large lists (full EasyList + EasyPrivacy + …); much faster matching. Drop-in (same rules, same results). Requires `npm install adblock-rs` (needs a Rust toolchain) |
+
+The two engines are interchangeable — same rule format, same blocking result; `rust` is purely a speed option for big lists. If you pass `--adblock-engine=rust` without `adblock-rs` installed, install it (`npm i adblock-rs`) or drop the flag to use `js`.
+
+```bash
+# Fast matching over big lists with the Rust engine
+npm install adblock-rs
+node nwss.js --block-ads=easylist.txt,easyprivacy.txt --adblock-engine=rust
+```
+
 ---

 ## config.json Format
|
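The README text in this hunk says only network rules take part in request blocking, while cosmetic (`##…`) rules are ignored. A rough sketch of that split is shown below. It assumes common EasyList conventions (`!` comment lines, a bracketed header line, and `##` / `#@#` / `#?#` as element-hiding separators); `classifyFilterLine` and `networkRules` are hypothetical helpers, not functions from `lib/adblock.js`, and the real parsers may draw the line differently.

```javascript
// Hypothetical sketch: sort one filter-list line into a coarse category.
function classifyFilterLine(rawLine) {
  const line = rawLine.trim();
  if (line === '') return 'blank';
  // '!' starts a comment; '[Adblock Plus 2.0]'-style headers start with '['.
  if (line.startsWith('!') || line.startsWith('[')) return 'comment';
  // '##', '#@#' and '#?#' separate domains from a CSS selector.
  // Simplification: a network rule that happens to contain '##' in its
  // URL pattern would be misclassified by this test.
  if (/#@?#|#\?#/.test(line)) return 'cosmetic';
  return 'network';
}

// Keep only the lines a request-blocking matcher would care about.
function networkRules(listText) {
  return listText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => classifyFilterLine(line) === 'network');
}

const sample = [
  '[Adblock Plus 2.0]',
  '! Title: demo list',
  '||doubleclick.net^',
  'example.com##.ad-banner',
  '/ads/*',
  '||example.com^$script',
  '',
].join('\n');

console.log(networkRules(sample));
```

For the sample list this keeps the three network rules and drops the header, the comment, the cosmetic rule, and the blank line.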
package/lib/nettools.js
CHANGED
@@ -30,7 +30,7 @@ const GLOBAL_DIG_CACHE_MAX = 2000;
 // Global whois result cache — shared across ALL handler instances and processUrl calls
 // Whois data is per root domain and doesn't change based on search terms
 const globalWhoisResultCache = new Map();
-const GLOBAL_WHOIS_CACHE_TTL =
+const GLOBAL_WHOIS_CACHE_TTL = 129600000; // 36 hours (persisted to disk between runs). Longer than dig's 20h: registrar data is very stable and whois servers rate-limit aggressively, so caching longer cuts repeat queries.
 const GLOBAL_WHOIS_CACHE_MAX = 2000;

 // Persistent disk cache file paths
@@ -40,8 +40,8 @@ const WHOIS_CACHE_FILE = path.join(__dirname, '..', '.whoiscache');
 // Index of hostnames known to resolve, populated as a side effect of
 // positive dig/whois cache writes AND cache hits. nwss.js's DNS pre-check
 // reads this via domainKnownToResolve() so it can skip its own resolve4
-// call on hosts that dig or whois have already proven live within
-//
+// call on hosts that dig or whois have already proven live within their
+// cache TTL window (dig 20h / whois 36h). Populating on cache HITS (not just writes) handles
 // the --dns-cache disk-load case where entries arrive without going
 // through the in-process write path. Stale entries -- hostname in Set but
 // the dig/whois entry has since been evicted -- are harmless: worst case
package/lib/openvpn_vpn.js
CHANGED
@@ -778,6 +778,14 @@ function validateOvpnConfig(ovpnConfig) {
 * @returns {Promise<Object>} { success, connection, tunDevice, error }
 */
 async function connectForSite(siteConfig, forceDebug = false) {
+// Platform guard: OpenVPN routing here reads /proc and uses the iproute2 `ip`
+// command, both Linux-only. Fail clearly instead of a cryptic /proc or `ip`
+// error on macOS/Windows. WSL2 reports 'linux' and passes (TUN is checked
+// separately below via isWSL/checkTunDevice).
+if (process.platform !== 'linux') {
+return { success: false, error: `OpenVPN routing is currently Linux-only (needs /proc + the iproute2 'ip' command; not available on ${process.platform}). Run on Linux/WSL2, or remove the 'openvpn' option from the site config.` };
+}
+
 const ovpnConfig = normalizeOvpnConfig(siteConfig.openvpn);
 if (!ovpnConfig) {
 return { success: false, error: 'Invalid OpenVPN configuration' };
package/lib/wireguard_vpn.js
CHANGED
@@ -388,6 +388,14 @@ function validateVpnConfig(vpnConfig) {
 * @returns {Promise<Object>} { success, interface, error }
 */
 async function connectForSite(siteConfig, forceDebug = false) {
+// Platform guard: WireGuard routing here relies on the iproute2 `ip` command
+// and wg-quick conventions, which are Linux-only. Fail with a clear message
+// instead of a cryptic `ip: command not found` on macOS/Windows. WSL2 reports
+// 'linux' and passes.
+if (process.platform !== 'linux') {
+return { success: false, error: `WireGuard routing is currently Linux-only (needs the iproute2 'ip' command + wg-quick; not available on ${process.platform}). Run on Linux/WSL2, or remove the 'vpn' option from the site config.` };
+}
+
 const vpnConfig = normalizeVpnConfig(siteConfig.vpn);
 if (!vpnConfig) {
 return { success: false, error: 'Invalid VPN configuration' };
package/nwss.1
CHANGED
@@ -153,6 +153,14 @@ Browser restart interval in URLs processed (1-1000, overrides config/default).
 .B \--show-dead-domains
 At end of scan, list hostnames that did not resolve or were unreachable (\fBNXDOMAIN\fR/\fBENODATA\fR plus \fBERR_NAME_NOT_RESOLVED\fR/\fBERR_ADDRESS_UNREACHABLE\fR). Excludes blocks and timeouts, since those mean the domain is alive. Useful for pruning dead URLs.

+.TP
+.BI \--block-ads= FILE\fR[,\fIFILE\fR...]
+Block ads/trackers during the scan using EasyList-format filter list(s) \(em network rules like \fB||domain^\fR, \fB/ads/*\fR, \fB||domain^$script\fR. Comma-separated for multiple lists. Cosmetic (\fB##\fR) rules are ignored; the scanned page's own top-level document is never blocked (only sub-resources).
+
+.TP
+.BI \--adblock-engine= js|rust
+Matcher backend for \fB\-\-block-ads\fR (default: \fBjs\fR). \fBjs\fR is the built-in pure-JS matcher (no extra dependencies). \fBrust\fR uses Brave's \fBadblock-rs\fR \(em much faster on large lists, same rules and results, but requires \fBnpm install adblock-rs\fR (needs a Rust toolchain).
+
 .TP
 .BR \-h ", " \--help
 Show help message and exit.
package/nwss.js
CHANGED
@@ -55,6 +55,7 @@ const CSS_BLOCKED_TAG = messageColors.processing('[css_blocked]');
 const EVAL_ON_DOC_TAG = messageColors.processing('[evalOnDoc]');
 const REALTIME_CLEANUP_TAG = messageColors.processing('[realtime_cleanup]');
 const VPN_TAG = messageColors.processing('[vpn]');
+const POPUP_TAG = messageColors.processing('[popup]');
 // Precomputed colored '[SmartCache]' subsystem prefix — paired with the
 // same constant in lib/smart-cache.js so debug lines from both files
 // produce consistently colored output. formatLogMessage only colors the
@@ -387,7 +388,11 @@ const dnsPrecheckTimeoutMs = 2000;
 const showDeadDomains = args.includes('--show-dead-domains');
 const _deadDomains = new Map();
 function recordDeadDomain(urlOrHost, reason) {
-
+// Populate unconditionally — the pre-check skip reads _deadDomains to drop
+// repeat URLs on a host already proven dead this run, which must work whether
+// or not --show-dead-domains is set. The end-of-scan REPORT is separately
+// gated on showDeadDomains, so the flag still controls output, not recording.
+if (!urlOrHost) return;
 let host = urlOrHost;
 try { host = new URL(urlOrHost).hostname; } catch { /* already a bare host */ }
 if (host && !_deadDomains.has(host)) _deadDomains.set(host, reason);
@@ -407,7 +412,7 @@ const DNS_NEGATIVE_CACHE_MAX = 1000;
 // persisting it can't silently drop a live host. Opt-in via --dns-cache: dead
 // hosts are remembered for DNS_NEGATIVE_PERSIST_TTL_MS and reloaded next run;
 // otherwise it's a 5-min in-memory-only cache. The persist TTL is deliberately
-// shorter than the dig/whois positive cache (20h): a domain that doesn't exist
+// shorter than the dig/whois positive cache (dig 20h / whois 36h): a domain that doesn't exist
 // now MAY get registered, and this is a domain-hunting scanner, so the dead
 // ones are re-checked twice a day rather than trusted for ~a day.
 const DNS_NEGATIVE_PERSIST_TTL_MS = 12 * 60 * 60 * 1000; // 12 hours
@@ -715,6 +720,9 @@ if (blockAdsIndex !== -1) {

 adblockEnabled = true;
 const engine = adblockEngineName === 'rust' ? adblockRust : adblockJs;
+// Only ever assigned the os.tmpdir() path below — never a user file — so the
+// unlink in finally can never touch the caller's own lists.
+let combinedTmpFile = null;
 try {
 if (engine === adblockRust) {
 // Rust wrapper accepts an array directly — no temp file needed.
@@ -723,15 +731,22 @@ if (blockAdsIndex !== -1) {
 // JS engine takes a single path; concat to a temp file when multiple lists.
 let rulesFile = rulesFiles[0];
 if (rulesFiles.length > 1) {
-
+combinedTmpFile = path.join(os.tmpdir(), `nwss-adblock-combined-${Date.now()}.txt`);
+rulesFile = combinedTmpFile;
 const combined = rulesFiles.map(f => fs.readFileSync(f, 'utf-8')).join('\n');
 fs.writeFileSync(rulesFile, combined);
 }
+// parseAdblockRules reads the file synchronously and in full before
+// returning, so the temp copy is safe to remove immediately afterwards.
 adblockMatcher = engine.parseAdblockRules(rulesFile, { enableLogging: forceDebug });
 }
 } catch (err) {
 console.log(`Error: Failed to load adblock engine '${adblockEngineName}': ${err.message}`);
 process.exit(1);
+} finally {
+if (combinedTmpFile) {
+try { fs.unlinkSync(combinedTmpFile); } catch { /* best effort — OS reaps tmpdir */ }
+}
 }
 const stats = adblockMatcher.getStats();
 const ruleDesc = stats.total != null
@@ -805,7 +820,7 @@ Validation Options:
 --cache-requests Cache HTTP requests to avoid re-requesting same URLs within scan
 --dns <ip[,ip,...]> Resolver(s) for the DNS pre-check AND nettools' dig (not Chrome nav / whois).
 One pins all queries to it; several rotate per query. Overrides /etc/resolv.conf.
---dns-cache Persist dig/whois results to disk between runs (20h TTL, 2000-entry cap each),
+--dns-cache Persist dig/whois results to disk between runs (dig 20h / whois 36h TTL, 2000-entry cap each),
 plus the DNS pre-check negative cache (NXDOMAIN only, 12h TTL, .dnsnegcache)
 --no-dns-precheck Disable per-URL DNS resolution check before page navigation.
 By default, URLs whose hostname doesn't resolve are skipped
@@ -933,7 +948,7 @@ Advanced Options:
 whois_delay: <milliseconds> Delay between whois requests for this site (default: global whois_delay)
 dig: ["term1", "term2"] Check dig output for ALL specified terms (AND logic)
 dig-or: ["term1", "term2"] Check dig output for ANY specified term (OR logic)
-goto_options: {"waitUntil": "domcontentloaded"} Custom page.goto() options (default: {"waitUntil": "
+goto_options: {"waitUntil": "domcontentloaded"} Custom page.goto() options (default: {"waitUntil": "domcontentloaded"})
 dig_subdomain: true/false Use subdomain for dig lookup instead of root domain (default: false)
 digRecordType: "A" DNS record type for dig (default: A)

@@ -1423,6 +1438,7 @@ if (dumpUrls) {
 // Avoids blocking I/O on every intercepted request in debug/dumpurls mode
 const _logBuffers = new Map(); // filePath -> string[]
 const LOG_FLUSH_INTERVAL = 2000; // Flush every 2 seconds
+const LOG_BUFFER_MAX_RETAINED = 10000; // Cap a file's retry backlog (lines) so a permanently unwritable path can't grow memory unboundedly
 let _logFlushTimer = null;

 function bufferedLogWrite(filePath, entry) {
@@ -1435,18 +1451,20 @@ function bufferedLogWrite(filePath, entry) {

 function flushLogBuffers() {
 for (const [filePath, entries] of _logBuffers) {
-if (entries.length
-
-
-
-
-
-
-
-
-
-
-
+if (entries.length === 0) continue;
+try {
+// Synchronous append on purpose: the batched 2s flush is small, and a
+// blocking append cannot overlap the next timer tick (it holds the event
+// loop for its duration) — eliminating the interleaved concurrent-append
+// hazard of the old async fs.writeFile({flag:'a'}). Clear ONLY after the
+// write succeeds, so a transient failure retries next tick instead of
+// being silently dropped (the old code cleared before the async write
+// confirmed). Bounded so a permanently unwritable path can't grow memory.
+fs.appendFileSync(filePath, entries.join(''));
+entries.length = 0;
+} catch (err) {
+console.warn(formatLogMessage('warn', `Failed to flush log buffer to ${filePath}: ${err.message}`));
+if (entries.length > LOG_BUFFER_MAX_RETAINED) entries.length = 0;
 }
 }
 }
@@ -1490,21 +1508,29 @@ if (forceDebug && globalComments) {
 * @param {string} url - The URL string to parse.
 * @returns {string} The root domain, or the original hostname if parsing fails (e.g., for IP addresses or invalid URLs), or an empty string on error.
 */
-
-
-
+// psl.parse memoized by hostname. The request handlers parse the root domain
+// of EVERY request, and a page hits the same few hosts repeatedly (CDN,
+// analytics, ad domains) — so a hostname-keyed memo turns almost all of those
+// into Map hits instead of repeated public-suffix-list lookups. Keyed by
+// hostname (not full URL) so distinct paths/queries on one host share one
+// entry: higher hit rate, fewer + shorter keys than a URL-keyed cache.
+// psl.parse is pure and never throws (malformed input → {domain: null}), so
+// the catch is defensive only.
+const _hostRootCache = new Map();
+function rootDomainForHost(hostname) {
+if (!hostname) return '';
+const cached = _hostRootCache.get(hostname);
 if (cached !== undefined) return cached;
-
-
-
-
-
-
-
-
-
-
-}
+let result;
+try { const parsed = psl.parse(hostname); result = parsed.domain || hostname; }
+catch { result = hostname; }
+if (_hostRootCache.size > 5000) _hostRootCache.clear();
+_hostRootCache.set(hostname, result);
+return result;
+}
+function getRootDomain(url) {
+try { return rootDomainForHost(new URL(url).hostname); }
+catch { return ''; }
 }

 /**
|
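To make the hit-rate argument in the `psl.parse` memo comment concrete, the sketch below counts cache hits for the same request stream under a URL-keyed memo and a hostname-keyed memo. It is only an illustration: `naiveRoot` stands in for `psl.parse` (it keeps the last two labels, which is wrong for suffixes such as `co.uk`), and `countHits` and the sample URLs are hypothetical.

```javascript
// Stand-in for psl.parse: NOT public-suffix aware, for demonstration only.
function naiveRoot(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// Replay a request stream through a memo keyed by keyOf(url) and count hits.
function countHits(urls, keyOf) {
  const memo = new Map();
  let hits = 0;
  for (const url of urls) {
    const key = keyOf(url);
    if (memo.has(key)) { hits++; continue; }
    memo.set(key, naiveRoot(new URL(url).hostname));
  }
  return { hits, entries: memo.size };
}

const requests = [
  'https://cdn.example.com/a.js',
  'https://cdn.example.com/b.js?v=2',
  'https://cdn.example.com/c.css',
  'https://ads.tracker.net/pixel?id=1',
  'https://ads.tracker.net/pixel?id=2',
];

const byUrl = countHits(requests, (u) => u);
const byHost = countHits(requests, (u) => new URL(u).hostname);

console.log(byUrl);  // { hits: 0, entries: 5 }  every URL is distinct
console.log(byHost); // { hits: 3, entries: 2 }  two hosts serve all five requests
```

The hostname-keyed memo still pays for `new URL(...)` on every request, so the saving is the public-suffix lookup itself rather than URL parsing.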
@@ -1839,7 +1865,19 @@ function setupFrameHandling(page, forceDebug) {

 // Declare userDataDir in outer scope for cleanup access
 let userDataDir = null;
-
+
+// Browser-level decision (the browser launches once per batch, so this can't
+// be per-site): only disable Chrome's pop-up blocker when at least one site
+// actually wants popups captured. A real browser blocks non-gesture
+// window.open(), so non-popup scans keep the blocker on for stealth.
+// capture_popups scans turn it off so non-gesture popunders (document-level
+// onclick / timer SDKs) fire and get captured too — gesture-triggered
+// popups already work via the synthetic-click path regardless of this flag.
+const wantPopups = Array.isArray(sites) && sites.some(s => s && s.capture_popups === true);
+if (wantPopups && forceDebug) {
+console.log(formatLogMessage('debug', `${POPUP_TAG} capture_popups set — launching with --disable-popup-blocking (non-gesture popunders allowed)`));
+}
+
 /**
 * Creates a new browser instance with consistent configuration
 * Uses system Chrome and temporary directories to minimize disk usage
@@ -1930,6 +1968,12 @@ function setupFrameHandling(page, forceDebug) {
 // Puppeteer 22.x headless mode optimization
 // Auto-detect best headless mode based on Puppeteer version
 headless: headlessMode,
+// Bypass TLS cert errors at the browser level (drives CDP
+// Security.setIgnoreCertificateErrors). Robust on new-headless Chrome,
+// where the --ignore-certificate-errors *flag* is increasingly ignored.
+// An ad/tracker scanner must reach self-signed / mismatched-cert ad and
+// embed domains; we observe traffic, we don't transmit secrets.
+acceptInsecureCerts: true,
 args: [
 // CRITICAL: Remove automation detection markers
 '--disable-blink-features=AutomationControlled',
@@ -2018,6 +2062,10 @@ function setupFrameHandling(page, forceDebug) {
 '--memory-pressure-off',
 '--max_old_space_size=2048', // V8 heap limit
 '--disable-prompt-on-repost', // Fixes form popup on page reload
+// Disable Chrome's pop-up blocker (chrome://settings/content/popups)
+// ONLY when a site wants popups captured — lets non-gesture popunders
+// fire. Gated so non-popup scans keep the blocker on for stealth.
+...(wantPopups ? ['--disable-popup-blocking'] : []),
 ...(keepBrowserOpen ? [] : ['--disable-background-networking']),
 '--no-sandbox',
 '--disable-setuid-sandbox',
@@ -3362,8 +3410,7 @@ function setupFrameHandling(page, forceDebug) {
 try {
 const parsedUrl = new URL(checkedUrl);
 fullSubdomain = parsedUrl.hostname;
-
-checkedRootDomain = pslResult.domain || fullSubdomain;
+checkedRootDomain = rootDomainForHost(fullSubdomain);
 } catch (_) { return; }
 if (!checkedRootDomain) return;

@@ -3638,30 +3685,24 @@ function setupFrameHandling(page, forceDebug) {
 try {
 const parsedUrl = new URL(checkedUrl);
 fullSubdomain = parsedUrl.hostname;
-
-checkedRootDomain = pslResult.domain || fullSubdomain;
+checkedRootDomain = rootDomainForHost(fullSubdomain);
 } catch (e) {}

+// Never BLOCK the top-level document (the scanned page OR a main-frame
+// redirect target). Aborting it makes the navigation never commit (page
+// stays at about:blank → navigation timeout), silently breaking any
+// scanned URL that matches our own filter lists (adblock / blocked /
+// blockDomainsByUrl) — common on adult/pirate/stream domains. This flag
+// ONLY guards the abort paths below; the request still flows through the
+// match logic, so a main-frame redirect destination (e.g. a
+// filecrypt → ad-domain hop) is still captured via filterRegex/dig/whois.
+// isNavigationRequest is true for sub-frame docs too, so the mainFrame()
+// check keeps ad iframes blockable.
+let isMainFrameDoc = false;
+try { isMainFrameDoc = request.isNavigationRequest() && request.frame() === page.mainFrame(); } catch (_) {}
+
 // Check against ALL first-party domains (original + all redirects)
 const isFirstParty = checkedRootDomain && firstPartyDomains.has(checkedRootDomain);
-
-// Block infinite iframe loops - safely access frame URL
-const frameUrl = (() => {
-try {
-const frame = request.frame();
-return frame ? frame.url() : '';
-} catch (err) {
-return '';
-}
-})();
-if (frameUrl && frameUrl.includes('creative.dmzjmp.com') &&
-checkedUrl.includes('go.dmzjmp.com/api/models')) {
-if (forceDebug) {
-console.log(formatLogMessage('debug', `Blocking potential infinite iframe loop: ${checkedUrl}`));
-}
-request.abort();
-return;
-}

 // Enhanced debug logging to show which frame the request came from
 if (forceDebug) {
@@ -3691,7 +3732,7 @@ function setupFrameHandling(page, forceDebug) {
 request.resourceType()
 );

-if (result.blocked) {
+if (result.blocked && !isMainFrameDoc) {
 adblockStats.blocked++;
 if (forceDebug) {
 console.log(formatLogMessage('debug', `${messageColors.blocked('[adblock]')} ${checkedUrl} (${result.reason})`));
@@ -3699,6 +3740,12 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
3699
3740
|
request.abort('blockedbyclient');
|
|
3700
3741
|
return;
|
|
3701
3742
|
}
|
|
3743
|
+
if (result.blocked && isMainFrameDoc && forceDebug) {
|
|
3744
|
+
// Matched a filter rule but it's the page we're scanning (or a
|
|
3745
|
+
// main-frame redirect target) — allow it (blocking the top-level
|
|
3746
|
+
// document aborts navigation). It still flows through the matcher.
|
|
3747
|
+
console.log(formatLogMessage('debug', `${messageColors.highlight('[adblock]')} top-level document ${checkedUrl} matched (${result.reason}) — allowed (never block the scanned page)`));
|
|
3748
|
+
}
|
|
3702
3749
|
adblockStats.allowed++;
|
|
3703
3750
|
} catch (err) { /* Silently continue on adblock errors */ }
|
|
3704
3751
|
}
|
|
@@ -3752,7 +3799,7 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
3752
3799
|
// check so domain-based blocks short-circuit without paying the
|
|
3753
3800
|
// per-URL regex scan. Same abort reason as the static path so
|
|
3754
3801
|
// request.failure() observers see consistent metadata.
|
|
3755
|
-
if (reqDomain && _dynamicallyBlockedDomains.size > 0 && matchesDynamicBlock(reqDomain)) {
|
|
3802
|
+
if (reqDomain && _dynamicallyBlockedDomains.size > 0 && matchesDynamicBlock(reqDomain) && !isMainFrameDoc) {
|
|
3756
3803
|
if (forceDebug) {
|
|
3757
3804
|
console.log(formatLogMessage('debug', `${BLOCK_DOMAINS_BY_URL_TAG} aborting ${reqUrl} (domain ${reqDomain} dynamically blocked)`));
|
|
3758
3805
|
}
|
|
@@ -3767,7 +3814,7 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
3767
3814
|
break;
|
|
3768
3815
|
}
|
|
3769
3816
|
}
|
|
3770
|
-
if (blockedMatchIndex !== -1) {
|
|
3817
|
+
if (blockedMatchIndex !== -1 && !isMainFrameDoc) {
|
|
3771
3818
|
// Always track the hit (zero-cost on the un-debug path) so the
|
|
3772
3819
|
// scan-end summary can show which patterns are doing work vs.
|
|
3773
3820
|
// which are stale and ready to prune. Keyed by pattern.source --
|
|
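Note on the hunks above: the three abort paths (adblock rule, dynamic domain block, blocked-pattern match) all gain the same `!isMainFrameDoc` guard. A minimal sketch of that guard follows. The helper name `shouldAbort` and the stub request/frame objects are hypothetical and not part of nwss.js; only `isNavigationRequest()` and `frame()` mirror the calls used in the diff.

```javascript
// Hypothetical helper, not part of nwss.js: decides whether a request that
// matched a blocking rule should actually be aborted.
function shouldAbort(request, mainFrame, matchedBlockRule) {
  let isMainFrameDoc = false;
  try {
    // Navigation requests also occur in sub-frames, so compare the frame too.
    isMainFrameDoc = request.isNavigationRequest() && request.frame() === mainFrame;
  } catch (_) { /* frame lookup failed: treat as not the main-frame document */ }
  return matchedBlockRule && !isMainFrameDoc;
}

// Stub objects standing in for Puppeteer's request and frame (assumption).
const mainFrame = { name: 'main' };
const adFrame = { name: 'ad-iframe' };
const makeRequest = (isNav, frame) => ({
  isNavigationRequest: () => isNav,
  frame: () => frame,
});

console.log(shouldAbort(makeRequest(true, mainFrame), mainFrame, true));  // false
console.log(shouldAbort(makeRequest(true, adFrame), mainFrame, true));    // true
console.log(shouldAbort(makeRequest(false, mainFrame), mainFrame, true)); // true
```

The top-level document is never aborted, while a sub-frame document and an ordinary subresource that match a rule still are.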
@@ -4349,15 +4396,43 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
4349
4396
|
try {
|
|
4350
4397
|
navigationResult = await navigateWithRedirectHandling(page, currentUrl, siteConfig, gotoOptions, forceDebug, formatLogMessage);
|
|
4351
4398
|
} catch (navErr) {
|
|
4352
|
-
// Only
|
|
4399
|
+
// Only handle genuine timeouts here, not chrome-error:// redirects.
|
|
4400
|
+
// pageUrl === 'about:blank' means the navigation never committed
|
|
4401
|
+
// (server never responded) — treat as a real failure, not a partial
|
|
4402
|
+
// page; only a page that actually reached a URL is worth observing.
|
|
4353
4403
|
let pageUrl = '';
|
|
4354
4404
|
try { if (!page.isClosed()) pageUrl = page.url(); } catch {}
|
|
4355
4405
|
const isPopupFailure = navErr.message.includes('chrome-error://') || navErr.message.includes('invalid URL') ||
|
|
4356
4406
|
pageUrl.startsWith('chrome-error://') || pageUrl === 'about:blank';
|
|
4357
4407
|
if ((navErr.message.includes('timeout') || navErr.message.includes('Timeout')) && !isPopupFailure) {
|
|
4358
|
-
|
|
4359
|
-
|
|
4360
|
-
|
|
4408
|
+
// The OLD fallback retried with networkidle2 — STRICTER than the
|
|
4409
|
+
// domcontentloaded default, so it could never rescue a
|
|
4410
|
+
// domcontentloaded timeout (and Puppeteer 25 has no 'commit', i.e.
|
|
4411
|
+
// nothing more lenient). Two-tier recovery instead:
|
|
4412
|
+
// 1. If the site used a wait STRICTER than domcontentloaded, do one
|
|
4413
|
+
// lenient retry with domcontentloaded (it fires earlier).
|
|
4414
|
+
// 2. Otherwise proceed with the partially-loaded page rather than
|
|
4415
|
+
// discarding the URL — it exists and requests already fired
|
|
4416
|
+
// (captured by page.on('request')); the delay/interact phase
|
|
4417
|
+
// below keeps capturing. Streaming/embed/media pages routinely
|
|
4418
|
+
// never reach DOM-ready (a connection stays open) but their
|
|
4419
|
+
// ad/tracker calls fired early.
|
|
4420
|
+
const primaryWait = gotoOptions.waitUntil || defaultWaitUntil;
|
|
4421
|
+
let recovered = false;
|
|
4422
|
+
if (primaryWait !== 'domcontentloaded') {
|
|
4423
|
+
try {
|
|
4424
|
+
if (forceDebug) console.log(formatLogMessage('debug', `Navigation timeout (${primaryWait}), retrying with waitUntil:domcontentloaded for ${currentUrl}`));
|
|
4425
|
+
const fallbackOptions = { ...gotoOptions, waitUntil: 'domcontentloaded', timeout: Math.min(timeout, 15000) };
|
|
4426
|
+
navigationResult = await navigateWithRedirectHandling(page, currentUrl, siteConfig, fallbackOptions, forceDebug, formatLogMessage);
|
|
4427
|
+
recovered = true;
|
|
4428
|
+
} catch (_) { /* fall through to proceed-with-partial */ }
|
|
4429
|
+
}
|
|
4430
|
+
if (!recovered) {
|
|
4431
|
+
let partialUrl = currentUrl;
|
|
4432
|
+
try { if (!page.isClosed()) partialUrl = page.url() || currentUrl; } catch {}
|
|
4433
|
+
if (forceDebug) console.log(formatLogMessage('debug', `Navigation timeout — proceeding with partially-loaded page for ${currentUrl}`));
|
|
4434
|
+
navigationResult = { finalUrl: partialUrl, redirected: false, redirectChain: [currentUrl], originalUrl: currentUrl, redirectDomains: [], httpStatus: null, cfRay: null };
|
|
4435
|
+
}
|
|
4361
4436
|
} else {
|
|
4362
4437
|
throw navErr;
|
|
4363
4438
|
}
|
|
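Note on the hunk above: the two-tier timeout recovery can be read as an ordered plan. The sketch below only models the decision, not the navigation itself, and it leaves out the `isPopupFailure` check. The function name `planTimeoutRecovery` and the step objects are hypothetical; the `domcontentloaded` comparison and the 15000 ms cap are taken from the diff.

```javascript
// Hypothetical planner, not part of nwss.js: returns the recovery steps to
// try, in order, after a navigation timeout.
function planTimeoutRecovery(gotoOptions, defaultWaitUntil, timeout) {
  const primaryWait = gotoOptions.waitUntil || defaultWaitUntil;
  const steps = [];
  if (primaryWait !== 'domcontentloaded') {
    // Tier 1: one lenient retry, since domcontentloaded fires earlier.
    steps.push({
      action: 'retry',
      options: { ...gotoOptions, waitUntil: 'domcontentloaded', timeout: Math.min(timeout, 15000) },
    });
  }
  // Tier 2: keep the partially loaded page; requests already fired on it.
  steps.push({ action: 'proceed-partial' });
  return steps;
}

const strict = planTimeoutRecovery({ waitUntil: 'networkidle2' }, 'domcontentloaded', 30000);
console.log(strict.map((s) => s.action)); // [ 'retry', 'proceed-partial' ]
console.log(strict[0].options.timeout);   // 15000

const lenient = planTimeoutRecovery({}, 'domcontentloaded', 30000);
console.log(lenient.map((s) => s.action)); // [ 'proceed-partial' ]
```

A site already using `domcontentloaded` skips the retry, because retrying with the same wait condition would only time out again.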
@@ -4630,8 +4705,41 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
4630
4705
|
// Capture hard "dead domain" navigation errors for --show-dead-domains
|
|
4631
4706
|
// (DNS doesn't resolve / host unreachable). Blocks, timeouts and CF
|
|
4632
4707
|
// challenges are NOT dead — they're excluded by this match.
|
|
4633
|
-
|
|
4634
|
-
|
|
4708
|
+
// Only DEFINITIVE non-existence / unreachable signals — these now drive
|
|
4709
|
+
// the in-scan dead-domain SKIP (not just --show-dead-domains reporting),
|
|
4710
|
+
// so transient DNS errors must NOT match. The bare `ERR_DNS` used to
|
|
4711
|
+
// catch ERR_DNS_TIMED_OUT / ERR_DNS_MALFORMED_RESPONSE / ERR_DNS_SERVER_FAILED
|
|
4712
|
+
// (all transient) — dropped so a slow-DNS blip can't false-skip the
|
|
4713
|
+
// rest of a live host's URLs.
|
|
4714
|
+
const deadNav = /ERR_NAME_NOT_RESOLVED|ERR_ADDRESS_UNREACHABLE/.exec(err.message || '');
|
|
4715
|
+
if (deadNav) {
|
|
4716
|
+
recordDeadDomain(currentUrl, deadNav[0]);
|
|
4717
|
+
// Corroborate-then-persist to the negative cache (.dnsnegcache with
|
|
4718
|
+
// --dns-cache → cross-scan skip; else in-memory). Chrome resolves via
|
|
4719
|
+
// the possibly-flaky SYSTEM resolver, so its ERR_NAME_NOT_RESOLVED may
|
|
4720
|
+
// be a glitch on a LIVE host. Re-confirm via the reliable --dns
|
|
4721
|
+
// resolver and cache ONLY if it ALSO returns a definitive NXDOMAIN.
|
|
4722
|
+
// ERR_ADDRESS_UNREACHABLE is routing (the host resolves), so the
|
|
4723
|
+
// resolve succeeds and it's correctly not cached. Fire-and-forget:
|
|
4724
|
+
// off the critical path; saveDiskCache flushes on exit.
|
|
4725
|
+
if (dnsPrecheckEnabled && deadNav[0] === 'ERR_NAME_NOT_RESOLVED') {
|
|
4726
|
+
let navHost = '';
|
|
4727
|
+
try { navHost = new URL(currentUrl).hostname; } catch {}
|
|
4728
|
+
if (navHost && !/^[\d.:]+$|^\[/.test(navHost) && !dnsNegativeCache.has(navHost)) {
|
|
4729
|
+
dnsResolver.resolveHost(navHost, dnsPrecheckTimeoutMs).then(
|
|
4730
|
+
() => { /* reliable resolver resolves it — system-resolver glitch, do NOT cache */ },
|
|
4731
|
+
(e) => {
|
|
4732
|
+
const code = (e && (e.code || e.message)) || '';
|
|
4733
|
+
if (isNonExistenceError(code)) {
|
|
4734
|
+
dnsNegativeCacheSet(navHost, code);
|
|
4735
|
+
recordDeadDomain(navHost, code);
|
|
4736
|
+
if (forceDebug) console.log(formatLogMessage('debug', `Dead domain confirmed by --dns resolver (${code}) — caching ${navHost} (skips next run with --dns-cache)`));
|
|
4737
|
+
}
|
|
4738
|
+
}
|
|
4739
|
+
).catch(() => {});
|
|
4740
|
+
}
|
|
4741
|
+
}
|
|
4742
|
+
}
|
|
4635
4743
|
throw err;
|
|
4636
4744
|
}
|
|
4637
4745
|
}
|
|
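Note on the hunk above: the narrowed regex and the host filter can be exercised in isolation. In this sketch the two regular expressions are copied from the diff, while the helper names `classifyNavError` and `shouldReconfirm` are hypothetical. `shouldReconfirm` only models the gate in front of the `--dns` re-resolve (without the `dnsPrecheckEnabled` and negative-cache checks), not the resolve itself.

```javascript
// Regexes copied from the diff; helper names are illustrative only.
const DEAD_NAV_RE = /ERR_NAME_NOT_RESOLVED|ERR_ADDRESS_UNREACHABLE/;
const IP_LITERAL_RE = /^[\d.:]+$|^\[/;

// Returns the matched error code, or null for transient / unrelated errors.
function classifyNavError(message) {
  const m = DEAD_NAV_RE.exec(message || '');
  return m ? m[0] : null;
}

// Only ERR_NAME_NOT_RESOLVED on a real hostname (not an IP literal) is worth
// re-confirming with a second resolver before it is cached as dead.
function shouldReconfirm(url, code) {
  if (code !== 'ERR_NAME_NOT_RESOLVED') return false;
  let host = '';
  try { host = new URL(url).hostname; } catch { return false; }
  return host !== '' && !IP_LITERAL_RE.test(host);
}

console.log(classifyNavError('net::ERR_NAME_NOT_RESOLVED at https://example.invalid/')); // ERR_NAME_NOT_RESOLVED
console.log(classifyNavError('net::ERR_DNS_TIMED_OUT at https://example.com/'));         // null
console.log(shouldReconfirm('https://example.invalid/x', 'ERR_NAME_NOT_RESOLVED'));      // true
console.log(shouldReconfirm('http://192.0.2.1/', 'ERR_NAME_NOT_RESOLVED'));              // false
```

With the bare `ERR_DNS` alternative removed, a DNS timeout no longer classifies as dead, so a slow resolver cannot cause the rest of a live host's URLs to be skipped.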
@@ -5263,7 +5371,7 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
5263
5371
|
const safeUrl = currentUrl.replace(/https?:\/\//, '').replace(/[^a-zA-Z0-9]/g, '_').substring(0, 80);
|
|
5264
5372
|
const filename = `screenshots/${safeUrl}-${timestamp}.png`;
|
|
5265
5373
|
try {
|
|
5266
|
-
|
|
5374
|
+
fs.mkdirSync('screenshots', { recursive: true }); // recursive:true is a no-op if it already exists
|
|
5267
5375
|
await page.screenshot({ path: filename, type: 'png', fullPage: true });
|
|
5268
5376
|
console.log(formatLogMessage('info', `Screenshot saved: ${filename}`));
|
|
5269
5377
|
} catch (screenshotErr) {
|
|
@@ -5759,6 +5867,19 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
5759
5867
|
// actually starting — wrongly skipping live domains. c-ares isn't
|
|
5760
5868
|
// threadpool-bound so it's immune to that contention.
|
|
5761
5869
|
if (dnsPrecheckEnabled && taskDomain && !/^[\d.:]+$|^\[/.test(taskDomain)) {
|
|
5870
|
+
// Already proven dead earlier THIS run — either a pre-check NXDOMAIN or
|
|
5871
|
+
// a prior URL's navigation hit ERR_NAME_NOT_RESOLVED / ERR_ADDRESS_UNREACHABLE
|
|
5872
|
+
// (recordDeadDomain populates _deadDomains for both). Skip the repeat
|
|
5873
|
+
// instead of paying another fail-open navigation on a multi-URL dead
|
|
5874
|
+
// host (e.g. dlstreams.top?id=39/54/347). In-scan only (NOT persisted):
|
|
5875
|
+
// Chrome resolves via the system resolver, so a nav-level failure could
|
|
5876
|
+
// be a system-resolver glitch on a live host — a false "dead" must not
|
|
5877
|
+
// carry across runs. Cheap: a Map lookup, no DNS resolve.
|
|
5878
|
+
if (_deadDomains.has(taskDomain)) {
|
|
5879
|
+
dnsPrecheckSkips++;
|
|
5880
|
+
if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check: ${taskDomain} already dead this run (${_deadDomains.get(taskDomain)}) — skipping`));
|
|
5881
|
+
return { url: task.url, rules: [], success: false, error: `DNS: ${_deadDomains.get(taskDomain)}`, skipped: true };
|
|
5882
|
+
}
|
|
5762
5883
|
const cached = dnsNegativeCache.get(taskDomain);
|
|
5763
5884
|
if (cached && Date.now() - cached.timestamp < DNS_NEGATIVE_CACHE_TTL_MS) {
|
|
5764
5885
|
dnsPrecheckSkips++;
|
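Note on the hunk above: the in-scan skip is a plain `Map` lookup that runs before any DNS work. The sketch below assumes a per-run map keyed by hostname. The names `deadDomains`, `markDead` and `precheck` are hypothetical stand-ins for `_deadDomains`, `recordDeadDomain` and the surrounding task code, whose full definitions are not shown in this diff; the returned object copies the shape used in the hunk.

```javascript
// Per-run memory of hosts already proven dead (host -> reason). Hypothetical
// stand-in for _deadDomains; nothing here is persisted across runs.
const deadDomains = new Map();

function markDead(host, reason) {
  if (!deadDomains.has(host)) deadDomains.set(host, reason);
}

// Returns a "skipped" result for a known-dead host, or null so the caller
// continues with the normal DNS pre-check and navigation.
function precheck(task) {
  const host = new URL(task.url).hostname;
  if (deadDomains.has(host)) {
    return { url: task.url, rules: [], success: false, error: `DNS: ${deadDomains.get(host)}`, skipped: true };
  }
  return null;
}

markDead('dead.example', 'ENOTFOUND');
console.log(precheck({ url: 'https://dead.example/?id=39' }).error); // DNS: ENOTFOUND
console.log(precheck({ url: 'https://dead.example/?id=54' }).skipped); // true
console.log(precheck({ url: 'https://live.example/' })); // null
```

Every later URL on the same dead host is answered from the map, so a multi-URL dead host costs one failed navigation instead of one per URL.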
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@fanboynz/network-scanner",
|
|
3
|
-
"version": "3.
|
|
3
|
+
"version": "3.3.0",
|
|
4
4
|
"description": "A Puppeteer-based network scanner for analyzing web traffic, generating adblock filter rules, and identifying third-party requests. Features include fingerprint spoofing, Cloudflare bypass, content analysis with curl/grep, and multiple output formats.",
|
|
5
5
|
"main": "nwss.js",
|
|
6
6
|
"scripts": {
|