@fanboynz/network-scanner 3.3.0 → 3.4.0
- package/CHANGELOG.md +16 -0
- package/README.md +5 -1
- package/lib/dns.js +117 -7
- package/lib/fingerprint.js +39 -36
- package/lib/interaction.js +151 -0
- package/lib/nettools.js +4 -1
- package/lib/validate_rules.js +3 -3
- package/nwss.1 +38 -6
- package/nwss.js +264 -25
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -2,6 +2,22 @@
|
|
|
2
2
|
|
|
3
3
|
All notable changes to the Network Scanner (nwss.js) project.
|
|
4
4
|
|
|
5
|
+
## [3.4.0] - 2026-06-13
|
|
6
|
+
|
|
7
|
+
### Added
|
|
8
|
+
- **`redirect_first_party` site option** (default `true`) — by default a redirect's destination domains (and chain hops) are registered first-party so the landed site's own resources aren't captured as third-party. Set `false` to keep redirect targets **third-party**, so `filterRegex`/`dig` apply to them under `thirdParty: true` — e.g. capturing the end domain of an ad/cloak redirect chain (which Chrome reaches via the `ERR_TOO_MANY_REDIRECTS` curl-resolve recovery). The originally-scanned domain stays first-party either way.
|
|
9
|
+
- **`ERR_TOO_MANY_REDIRECTS` is recovered, not hard-failed** — a redirect-cloaking chain (rotating throwaway domains) can exceed Chrome's ~20-hop ceiling. The scanner now recovers via two complementary paths: **(1)** it first waits briefly for the browser to *ride through* on its own — a JS/meta hop on a committed page resets Chrome's hop counter and often carries the page to the end site for free; **(2)** if the page parked on `chrome-error://` instead **and the site has `curl: true`**, it resolves the chain endpoint with `curl` (which, unlike headless Chrome, isn't served the endless-loop variant) and navigates there directly — a short hop that lands on the real end site. Either way it captures the end site's ad/tracker requests; falls back to the captured chain requests when neither path lands. The curl step is opt-in via the existing `curl` site option (so `curl: false`/unset never shells out to curl) and is also **skipped under a proxy/VPN** (curl runs direct and would leak the real IP / resolve from the wrong network); the free ride-through always applies.
|
|
10
|
+
- **`click_elements` site option**: after a page loads, click a list of CSS selectors **in order**, each searched across the main frame and any iframe (e.g. `["a[href*='/movie/']", ".play"]` to click a movie link, then a play button). This reaches content via organic navigation/gesture instead of a direct deep-load (which some sites JS-redirect away from) and triggers click-only content like video players. Each selector is polled for a visible match (a `waitForSelector`-style wait) for up to `click_wait` before clicking, so JS-rendered targets like video players aren't missed by racing ahead of them. The request interceptor stays attached, so the post-click page's requests run through the same `filterRegex`/`dig` matching; a click that navigates is followed, and later selectors query the resulting page. Honors `realistic_click` (genuine trusted gesture) and `cursor_mode: "ghost"` (Bezier travel to the element); missing elements are skipped and never fail the scan. The settle/navigation wait after each click is also set by `click_wait` (default 5000ms, capped at half the per-URL timeout).
|
|
11
|
+
- **`--dns` now also pins Chrome's page-navigation resolver via DoH.** Chrome ignores `--dns` for navigation and reads `/etc/resolv.conf` directly, so a broken or filtering system resolver could `ERR_NAME_NOT_RESOLVED` a domain the pre-check had already resolved. When the `--dns` servers map to a known public DoH provider — **Google, Cloudflare, Quad9, OpenDNS, AdGuard, CleanBrowsing, DNS.SB, Mullvad** (incl. malware/family/unfiltered variants) — Chrome is launched with secure-DNS `automatic` mode pointed at that provider, so page navigation resolves through the same resolver as the pre-check. `automatic` (not `secure`) keeps a system-DNS fallback if DoH is unreachable rather than failing the batch. **Applied to direct connections only** — skipped when a proxy (`--proxy-server`) or VPN is active, since the exit/tunnel does the resolution and local DoH would be redundant or resolve geo-split domains to the wrong region. Unmapped resolvers (custom/ISP, per-account providers like NextDNS, IPv6) fall back to system DNS with a warning naming the supported providers.
|
|
12
|
+
- **`--doh-disable`** site/CLI option (`doh_disable` in `.nwssconfig`), default off — opt out of the Chrome-navigation DoH pinning entirely. Chrome then resolves page navigation via the system `resolv.conf` even when `--dns` maps to a known provider, while the pre-check and `dig` still honor `--dns`. For networks where DoH adds latency or is blocked, or when system-path resolution is specifically wanted.
|
|
13
|
+
|
|
14
|
+
### Changed
|
|
15
|
+
- **A clamped `delay` is now logged (`--debug`)** — when `delay` exceeds its ceiling (the default 2s cap, or `timeout/2` under `delay_uncapped: true`) it was silently reduced, so `delay: 48000` quietly running as 29000ms looked like the flag was ignored. A debug line now reports the clamp and which ceiling applied (raise `timeout`, or set `delay_uncapped: true`, to lift it). The per-URL budget already reserves the full configured `delay`; this only surfaces the post-load dwell clamp.
|
|
16
|
+
- **DNS pre-check is paced and more tolerant under concurrency** — a concurrent scan fired up to `max_concurrent` simultaneous c-ares UDP queries at the pinned `--dns` servers; the burst (rough on WSL2's UDP-through-NAT path, and rate-limited by public resolvers) produced timeouts / `EREFUSED` that tripped the circuit breaker (`resolver errors N/M — suspending DNS pre-check`) and lost the dead-host-skip optimization. The pre-check timeout is raised 2s → 4s (a clean NXDOMAIN still returns fast, so the higher ceiling only costs time when the resolver is genuinely slow), and `createRotatingResolver` now caps in-flight queries with a counting semaphore (default 6) so the burst is paced and excess callers queue and drain quickly. The circuit breaker itself is unchanged — these reduce the error rate so it stops tripping on healthy resolvers.
|
|
17
|
+
|
|
18
|
+
### Fixed
|
|
19
|
+
- **`whois` availability probe is now platform-aware** — the fallback used `which whois` (Unix-only), which on native Windows would false-negative an installed `whois.exe` whose `whois --version` errors (e.g. Sysinternals whois). Uses `where` on Windows, `which` elsewhere. No change on Linux/macOS/WSL.
|
|
20
|
+
|
|
5
21
|
## [3.3.0] - 2026-06-06
|
|
6
22
|
|
|
7
23
|
### Added
|
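The `delay` clamp described in the 3.4.0 "Changed" entry can be worked through numerically. The sketch below is only an illustration of the rule as the changelog states it: the helper name `clampDelay`, its return shape, and the 58000 ms timeout (chosen so that `timeout/2` matches the changelog's 29000 ms figure) are assumptions, not the scanner's own code.

```javascript
// Hypothetical helper illustrating the clamp rule from the changelog entry;
// the name and return shape are assumptions, not taken from nwss.js.
const DEFAULT_DELAY_CAP_MS = 2000; // "the default 2s cap"

function clampDelay(delayMs, timeoutMs, delayUncapped = false) {
  // Under delay_uncapped the ceiling is half the per-URL timeout;
  // otherwise the fixed default cap applies.
  const ceiling = delayUncapped ? Math.floor(timeoutMs / 2) : DEFAULT_DELAY_CAP_MS;
  const effective = Math.min(delayMs, ceiling);
  return {
    effective,
    clamped: effective < delayMs,
    ceiling: delayUncapped ? 'timeout/2' : 'default cap',
  };
}

// Assumed timeout of 58000 ms, so timeout/2 is 29000 ms.
console.log(clampDelay(48000, 58000, true)); // clamped by timeout/2
console.log(clampDelay(48000, 58000));       // clamped by the default cap
console.log(clampDelay(1500, 58000));        // under the cap, not clamped
```

With these assumed inputs, `delay: 48000` under `delay_uncapped: true` comes out as 29000 ms, which is the situation the new debug line is meant to surface.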
package/README.md
CHANGED
|
@@ -77,7 +77,8 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
|
|
|
77
77
|
| `--help`, `-h` | Show this help menu |
|
|
78
78
|
| `--version` | Show script version |
|
|
79
79
|
| `--max-concurrent <number>` | Maximum concurrent site processing (1-50, overrides config/default) |
|
|
80
|
-
| `--dns <ip[,ip,...]>` | Resolver(s) for the DNS pre-check **and**
|
|
80
|
+
| `--dns <ip[,ip,...]>` | Resolver(s) for the DNS pre-check, nettools' `dig`, **and** — when they map to a known public DoH provider — Chrome's page navigation via DNS-over-HTTPS on direct connections (one pins, several rotate per query; overrides `/etc/resolv.conf`; does not affect `whois`). Chrome normally ignores `--dns` and reads `resolv.conf` directly, so a broken system resolver could fail a domain the pre-check already resolved; pinning Chrome's secure-DNS (`automatic` mode) to the matching provider closes that gap. Mapped providers: Google, Cloudflare, Quad9, OpenDNS, AdGuard, CleanBrowsing, DNS.SB, Mullvad (incl. malware/family/unfiltered variants). **Skipped under a proxy/VPN** (the exit/tunnel resolves); unmapped resolvers (custom/ISP, per-account, IPv6) fall back to system DNS with a warning |
|
|
81
|
+
| `--doh-disable` | Opt out of the Chrome-navigation DoH pinning (default: off). Chrome resolves page navigation via system `resolv.conf` even when `--dns` maps to a known provider; the pre-check and `dig` still honor `--dns`. Use when DoH adds latency, is blocked on the network, or you want system-path resolution |
|
|
81
82
|
| `--show-dead-domains` | At end of scan, list hostnames that did not resolve / were unreachable (`NXDOMAIN`/`ENODATA` + `ERR_NAME_NOT_RESOLVED`/`ERR_ADDRESS_UNREACHABLE`). Excludes blocks/timeouts (those mean the domain is alive). For pruning dead URLs. |
|
|
82
83
|
| `--cleanup-interval <number>` | Browser restart interval in URLs processed (1-1000, overrides config/default) |
|
|
83
84
|
|
|
@@ -197,6 +198,7 @@ Example:
|
|
|
197
198
|
| `interact` | `true` or `false` | `false` | Simulate user interaction (hover, click) |
|
|
198
199
|
| `firstParty` | `0` or `1` | `0` | Match first-party requests |
|
|
199
200
|
| `thirdParty` | `0` or `1` | `1` | Match third-party requests |
|
|
201
|
+
| `redirect_first_party` | Boolean | `true` | Whether redirect-destination domains count as first-party. Set `false` to keep redirect targets (and chain hops) **third-party**, so `filterRegex`/`dig` can match them under `thirdParty: true`, e.g. to capture the end domain of an ad/cloak redirect. The originally-scanned domain stays first-party either way. Note: with `false`, the redirect target's own same-domain resources are no longer excluded either, so more requests are captured |
|
|
200
202
|
| `subDomains` | `0` or `1` | `0` | 1 = preserve subdomains in output |
|
|
201
203
|
| `blocked` | Array | - | Domains or regexes to block during scanning |
|
|
202
204
|
| `even_blocked` | Boolean | `false` | Add matching rules even if requests are blocked |
|
|
@@ -316,6 +318,8 @@ When a page redirects to a new domain, first-party/third-party detection is base
|
|
|
316
318
|
| `interact_click_count` | Integer | `3` | Number of random content-zone clicks per load (capped at 20). Default 3 = primary + 2 backups, since ad SDKs sometimes suppress the 1st/2nd click as warmup |
|
|
317
319
|
| `realistic_click` | Boolean | `false` | Higher click fidelity: denser mouse approach (15 steps), ±1px hand-tremor micro-moves during the press, and ±1.5px mouseup drift (so mousedown≠mouseup coords) — for sites that score click realism. Costs ~80–120ms/click |
|
|
318
320
|
| `interact_typing` | Boolean | `false` | Enable typing simulation |
|
|
321
|
+
| `click_elements` | String[] | - | After load, click these CSS selectors **in order**, searched across the **main frame and any iframe** — reach content via organic navigation/gesture instead of a direct load (e.g. `["a[href*='/movie/']", ".play"]` to click a link then a play button). The request interceptor stays attached, so the post-click page's requests are matched against `filterRegex`/`dig` as usual. A click that navigates is followed; later selectors query the resulting page. Honors `realistic_click` and `cursor_mode: "ghost"` (Bezier travel to the element); missing elements are skipped (never fails the scan) |
|
|
322
|
+
| `click_wait` | Integer | `5000` | Per click: max time (ms) to wait for the element to appear/be visible (`waitForSelector`) **and** the settle/navigation wait after it; capped at half the per-URL timeout |
|
|
319
323
|
| `interact_intensity` | String | `"medium"` | Interaction simulation intensity: "low", "medium", "high" |
|
|
320
324
|
| `cursor_mode` | `"ghost"` | - | Use ghost-cursor Bezier mouse movements (requires `npm i ghost-cursor`) |
|
|
321
325
|
| `ghost_cursor_speed` | Number | auto | Ghost-cursor movement speed multiplier |
|
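As a rough illustration of how the new README options might be combined, here is a sketch of a single site entry written as a JavaScript object, plus the `click_wait` cap as documented ("capped at half the per-URL timeout"). The `url` and `filterRegex` values are placeholders, and `effectiveClickWait` is a hypothetical helper that only restates the documented rule; it is not the package's implementation.

```javascript
// Illustrative site entry using options from the README tables.
// The url and filterRegex values are placeholders, not from the package.
const site = {
  url: 'https://example.com/',
  filterRegex: 'ads|track',
  thirdParty: 1,
  redirect_first_party: false, // keep redirect targets third-party
  click_elements: ["a[href*='/movie/']", '.play'], // clicked in order
  click_wait: 8000, // ms, per click
  realistic_click: true,
};

// Hypothetical helper: assumed reading of "default 5000, capped at half
// the per-URL timeout".
function effectiveClickWait(siteConfig, perUrlTimeoutMs) {
  const requested = Number.isFinite(siteConfig.click_wait) ? siteConfig.click_wait : 5000;
  return Math.min(requested, Math.floor(perUrlTimeoutMs / 2));
}

console.log(effectiveClickWait(site, 30000)); // requested value fits under the cap
console.log(effectiveClickWait(site, 10000)); // capped at half the timeout
```

The point of the cap, as the README describes it, is that one slow click cannot consume the whole per-URL budget.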
package/lib/dns.js
CHANGED
|
@@ -117,6 +117,29 @@ function createRotatingResolver(opts = {}) {
|
|
|
117
117
|
// distribution stays exactly even (no locking needed).
|
|
118
118
|
const nextResolver = () => (pool ? pool[cursor++ % pool.length] : dnsPromises);
|
|
119
119
|
|
|
120
|
+
// Concurrency cap (counting semaphore). The scanner runs up to max_concurrent
|
|
121
|
+
// navigations, each firing a pre-check; bursting that many simultaneous c-ares
|
|
122
|
+
// UDP queries at one resolver provokes timeouts / EREFUSED rate-limiting (and
|
|
123
|
+
// is rough on WSL2's UDP-through-NAT path), which then trips the pre-check
|
|
124
|
+
// circuit breaker. Capping in-flight queries paces the burst so the resolver
|
|
125
|
+
// can keep up. Excess callers queue and drain quickly (resolutions are short).
|
|
126
|
+
const maxInFlight = Math.max(1, opts.maxConcurrent || 6);
|
|
127
|
+
let inFlight = 0;
|
|
128
|
+
const waiters = [];
|
|
129
|
+
function acquire() {
|
|
130
|
+
return new Promise(resolve => {
|
|
131
|
+
if (inFlight < maxInFlight) { inFlight++; resolve(); }
|
|
132
|
+
else waiters.push(resolve);
|
|
133
|
+
});
|
|
134
|
+
}
|
|
135
|
+
function release() {
|
|
136
|
+
inFlight--;
|
|
137
|
+
if (waiters.length > 0 && inFlight < maxInFlight) {
|
|
138
|
+
inFlight++;
|
|
139
|
+
waiters.shift()();
|
|
140
|
+
}
|
|
141
|
+
}
|
|
142
|
+
|
|
120
143
|
// One resolution attempt: rotate the lead server, resolve4 first, and on
|
|
121
144
|
// no-IPv4 (ENODATA/ENOTFOUND) fall back to resolve6 so IPv6-only hosts aren't
|
|
122
145
|
// wrongly skipped. Any OTHER code propagates unchanged so the caller sees the
|
|
@@ -147,16 +170,21 @@ function createRotatingResolver(opts = {}) {
|
|
|
147
170
|
* if one REFUSES, the retry hits another).
|
|
148
171
|
*/
|
|
149
172
|
async function resolveHost(hostname, timeoutMs) {
|
|
173
|
+
await acquire();
|
|
150
174
|
try {
|
|
151
|
-
|
|
152
|
-
} catch (firstErr) {
|
|
153
|
-
const code = firstErr && firstErr.code;
|
|
154
|
-
if (DNS_TRANSIENT_ERRORS.has(code) || (firstErr && firstErr.message === 'DNS timeout')) {
|
|
155
|
-
if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check transient (${code || 'timeout'}) for ${hostname}, retrying once`));
|
|
175
|
+
try {
|
|
156
176
|
await attempt(hostname, timeoutMs);
|
|
157
|
-
}
|
|
158
|
-
|
|
177
|
+
} catch (firstErr) {
|
|
178
|
+
const code = firstErr && firstErr.code;
|
|
179
|
+
if (DNS_TRANSIENT_ERRORS.has(code) || (firstErr && firstErr.message === 'DNS timeout')) {
|
|
180
|
+
if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check transient (${code || 'timeout'}) for ${hostname}, retrying once`));
|
|
181
|
+
await attempt(hostname, timeoutMs);
|
|
182
|
+
} else {
|
|
183
|
+
throw firstErr;
|
|
184
|
+
}
|
|
159
185
|
}
|
|
186
|
+
} finally {
|
|
187
|
+
release();
|
|
160
188
|
}
|
|
161
189
|
}
|
|
162
190
|
|
|
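The counting semaphore added to `createRotatingResolver` can be tried in isolation. The following is a minimal standalone sketch of the same acquire/release pattern; `createLimiter`, `run`, and the timing demo are names introduced here for illustration and do not appear in `lib/dns.js`.

```javascript
// Standalone sketch of the acquire/release pattern shown in the hunks above.
// createLimiter and demo are illustrative names, not part of lib/dns.js.
function createLimiter(maxInFlight = 6) {
  let inFlight = 0;
  const waiters = [];

  const acquire = () => new Promise(resolve => {
    if (inFlight < maxInFlight) { inFlight++; resolve(); }
    else waiters.push(resolve); // queue until a slot frees up
  });

  const release = () => {
    inFlight--;
    if (waiters.length > 0 && inFlight < maxInFlight) {
      inFlight++;          // hand the freed slot straight to the next waiter
      waiters.shift()();
    }
  };

  // Run a task under the cap; the slot is released even if the task throws.
  return async function run(task) {
    await acquire();
    try { return await task(); }
    finally { release(); }
  };
}

async function demo() {
  const run = createLimiter(2);
  let active = 0;
  let peak = 0;
  const job = async (id) => {
    active++;
    peak = Math.max(peak, active);
    await new Promise(r => setTimeout(r, 10)); // stand-in for a DNS query
    active--;
    return id;
  };
  const results = await Promise.all([1, 2, 3, 4, 5].map(id => run(() => job(id))));
  return { results, peak };
}

demo().then(({ results, peak }) => console.log(results, 'peak in flight:', peak));
```

Wrapping the work in `try`/`finally`, as the diff does in `resolveHost`, matters here: a rejected query that skipped `release()` would permanently shrink the pool.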
@@ -230,9 +258,91 @@ function createDnsCircuitBreaker(opts = {}) {
|
|
|
230
258
|
};
|
|
231
259
|
}
|
|
232
260
|
|
|
261
|
+
// Map well-known public resolver IPs (IPv4) to their DNS-over-HTTPS (DoH)
|
|
262
|
+
// endpoint templates. Chrome's page-navigation resolver ignores --dns and reads
|
|
263
|
+
// /etc/resolv.conf directly; pointing Chrome's DoH at the same provider the
|
|
264
|
+
// --dns pre-check uses closes that gap, so a broken or filtering system
|
|
265
|
+
// resolv.conf can't fail navigations the pre-check already passed.
|
|
266
|
+
//
|
|
267
|
+
// Only providers whose DoH endpoint is derivable from a fixed anycast IP are
|
|
268
|
+
// listed. Resolvers with per-account templates (NextDNS, ControlD, AdGuard
|
|
269
|
+
// personal) can't be mapped from an IP and fall through to `unmapped` — Chrome
|
|
270
|
+
// keeps system DNS for those (a warning is logged). IPv6 is intentionally not
|
|
271
|
+
// mapped here; an IPv6 --dns server simply falls back to system DNS.
|
|
272
|
+
const DOH_PROVIDER_TEMPLATES = {
|
|
273
|
+
// Google
|
|
274
|
+
'8.8.8.8': 'https://dns.google/dns-query',
|
|
275
|
+
'8.8.4.4': 'https://dns.google/dns-query',
|
|
276
|
+
// Cloudflare — standard / malware-blocking / malware+adult
|
|
277
|
+
'1.1.1.1': 'https://cloudflare-dns.com/dns-query',
|
|
278
|
+
'1.0.0.1': 'https://cloudflare-dns.com/dns-query',
|
|
279
|
+
'1.1.1.2': 'https://security.cloudflare-dns.com/dns-query',
|
|
280
|
+
'1.0.0.2': 'https://security.cloudflare-dns.com/dns-query',
|
|
281
|
+
'1.1.1.3': 'https://family.cloudflare-dns.com/dns-query',
|
|
282
|
+
'1.0.0.3': 'https://family.cloudflare-dns.com/dns-query',
|
|
283
|
+
// Quad9 — secured (default) / unsecured / secured+ECS
|
|
284
|
+
'9.9.9.9': 'https://dns.quad9.net/dns-query',
|
|
285
|
+
'149.112.112.112': 'https://dns.quad9.net/dns-query',
|
|
286
|
+
'9.9.9.10': 'https://dns10.quad9.net/dns-query',
|
|
287
|
+
'149.112.112.10': 'https://dns10.quad9.net/dns-query',
|
|
288
|
+
'9.9.9.11': 'https://dns11.quad9.net/dns-query',
|
|
289
|
+
'149.112.112.11': 'https://dns11.quad9.net/dns-query',
|
|
290
|
+
// OpenDNS — standard / FamilyShield
|
|
291
|
+
'208.67.222.222': 'https://doh.opendns.com/dns-query',
|
|
292
|
+
'208.67.220.220': 'https://doh.opendns.com/dns-query',
|
|
293
|
+
'208.67.222.123': 'https://doh.familyshield.opendns.com/dns-query',
|
|
294
|
+
'208.67.220.123': 'https://doh.familyshield.opendns.com/dns-query',
|
|
295
|
+
// AdGuard DNS — default (ad/tracker block) / non-filtering / family
|
|
296
|
+
'94.140.14.14': 'https://dns.adguard-dns.com/dns-query',
|
|
297
|
+
'94.140.15.15': 'https://dns.adguard-dns.com/dns-query',
|
|
298
|
+
'94.140.14.140': 'https://unfiltered.adguard-dns.com/dns-query',
|
|
299
|
+
'94.140.14.141': 'https://unfiltered.adguard-dns.com/dns-query',
|
|
300
|
+
'94.140.14.15': 'https://family.adguard-dns.com/dns-query',
|
|
301
|
+
'94.140.15.16': 'https://family.adguard-dns.com/dns-query',
|
|
302
|
+
// CleanBrowsing — security / family / adult
|
|
303
|
+
'185.228.168.9': 'https://doh.cleanbrowsing.org/doh/security-filter/',
|
|
304
|
+
'185.228.169.9': 'https://doh.cleanbrowsing.org/doh/security-filter/',
|
|
305
|
+
'185.228.168.168': 'https://doh.cleanbrowsing.org/doh/family-filter/',
|
|
306
|
+
'185.228.169.168': 'https://doh.cleanbrowsing.org/doh/family-filter/',
|
|
307
|
+
'185.228.168.10': 'https://doh.cleanbrowsing.org/doh/adult-filter/',
|
|
308
|
+
'185.228.169.11': 'https://doh.cleanbrowsing.org/doh/adult-filter/',
|
|
309
|
+
// DNS.SB
|
|
310
|
+
'185.222.222.222': 'https://doh.dns.sb/dns-query',
|
|
311
|
+
'45.11.45.11': 'https://doh.dns.sb/dns-query',
|
|
312
|
+
// Mullvad (non-filtering)
|
|
313
|
+
'194.242.2.2': 'https://dns.mullvad.net/dns-query',
|
|
314
|
+
};
|
|
315
|
+
|
|
316
|
+
/**
|
|
317
|
+
* Resolve a list of --dns resolver specs to Chrome DoH templates.
|
|
318
|
+
* Strips any :port (DoH is always 443) and dedupes. Returns the
|
|
319
|
+
* space-joined template string Chrome's --dns-over-https-templates wants,
|
|
320
|
+
* plus which inputs were mapped vs had no known DoH endpoint.
|
|
321
|
+
* @param {string[]} servers - resolver IPs (optionally ip:port) from --dns
|
|
322
|
+
* @returns {{ templates: string, mapped: string[], unmapped: string[] }}
|
|
323
|
+
*/
|
|
324
|
+
function dohTemplatesForResolvers(servers) {
|
|
325
|
+
const templates = [];
|
|
326
|
+
const mapped = [];
|
|
327
|
+
const unmapped = [];
|
|
328
|
+
for (const raw of (servers || [])) {
|
|
329
|
+
const ip = String(raw).trim().replace(/:\d+$/, ''); // drop :port — DoH is 443
|
|
330
|
+
if (!ip) continue;
|
|
331
|
+
const tpl = DOH_PROVIDER_TEMPLATES[ip];
|
|
332
|
+
if (tpl) {
|
|
333
|
+
if (!templates.includes(tpl)) templates.push(tpl);
|
|
334
|
+
mapped.push(ip);
|
|
335
|
+
} else {
|
|
336
|
+
unmapped.push(ip);
|
|
337
|
+
}
|
|
338
|
+
}
|
|
339
|
+
return { templates: templates.join(' '), mapped, unmapped };
|
|
340
|
+
}
|
|
341
|
+
|
|
233
342
|
module.exports = {
|
|
234
343
|
createRotatingResolver,
|
|
235
344
|
createDnsCircuitBreaker,
|
|
236
345
|
parseDnsServers,
|
|
237
346
|
isNonExistenceError,
|
|
347
|
+
dohTemplatesForResolvers,
|
|
238
348
|
};
|
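To make the return shape of `dohTemplatesForResolvers` concrete, here is a small sketch that reuses the function body from the diff with a two-entry stand-in table, then applies the skip conditions the changelog describes (proxy/VPN active, `doh_disable` set). `planDohPinning` and its option names are hypothetical, and pinning whenever at least one server maps is an assumption; the diff does not show how `nwss.js` treats a mixed mapped/unmapped list.

```javascript
// Two entries copied from DOH_PROVIDER_TEMPLATES in the diff; the real table is longer.
const DOH_TEMPLATES_SAMPLE = {
  '1.1.1.1': 'https://cloudflare-dns.com/dns-query',
  '9.9.9.9': 'https://dns.quad9.net/dns-query',
};

// Same logic as the diff's dohTemplatesForResolvers, against the sample table.
function dohTemplatesForResolvers(servers) {
  const templates = [];
  const mapped = [];
  const unmapped = [];
  for (const raw of (servers || [])) {
    const ip = String(raw).trim().replace(/:\d+$/, ''); // drop :port
    if (!ip) continue;
    const tpl = DOH_TEMPLATES_SAMPLE[ip];
    if (tpl) {
      if (!templates.includes(tpl)) templates.push(tpl);
      mapped.push(ip);
    } else {
      unmapped.push(ip);
    }
  }
  return { templates: templates.join(' '), mapped, unmapped };
}

// Hypothetical decision helper: an assumed reading of the changelog's rules.
function planDohPinning({ dnsServers, proxyOrVpn = false, dohDisable = false }) {
  if (dohDisable) return { pin: false, reason: 'doh_disable set' };
  if (proxyOrVpn) return { pin: false, reason: 'proxy/VPN active' };
  const { templates, mapped, unmapped } = dohTemplatesForResolvers(dnsServers);
  if (!templates) return { pin: false, reason: 'no mapped provider', unmapped };
  return { pin: true, templates, mapped, unmapped };
}

console.log(planDohPinning({ dnsServers: ['1.1.1.1:53', '9.9.9.9', '10.0.0.1'] }));
console.log(planDohPinning({ dnsServers: ['1.1.1.1'], proxyOrVpn: true }));
```

The space-joined `templates` string is the form the diff's doc comment says Chrome's DoH template option expects; how it is passed to the browser is not shown in this part of the diff.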
package/lib/fingerprint.js
CHANGED
|
@@ -2489,44 +2489,47 @@ async function applyUserAgentSpoofing(page, siteConfig, forceDebug, currentUrl)
|
|
|
2489
2489
|
}, 'enhanced mouse/pointer spoofing');
|
|
2490
2490
|
|
|
2491
2491
|
safeExecute(() => {
|
|
2492
|
-
// Neutralize CDP
|
|
2493
|
-
// CDP's Runtime.enable
|
|
2494
|
-
//
|
|
2495
|
-
//
|
|
2496
|
-
|
|
2497
|
-
|
|
2498
|
-
console.
|
|
2499
|
-
|
|
2500
|
-
|
|
2501
|
-
|
|
2502
|
-
|
|
2503
|
-
|
|
2504
|
-
|
|
2505
|
-
|
|
2506
|
-
|
|
2492
|
+
// Neutralize CDP "inspector reads logged object" traps across the
|
|
2493
|
+
// common console methods. CDP's Runtime.enable serializes console
|
|
2494
|
+
// arguments, reading getters / walking prototypes on logged objects —
|
|
2495
|
+
// disable-devtool-style scripts exploit this (log an object with a
|
|
2496
|
+
// getter, then check if it fired) to detect the inspector and redirect
|
|
2497
|
+
// away. The previous version guarded ONLY console.debug; detectors
|
|
2498
|
+
// overwhelmingly use console.log (and dir/table), so the trap still
|
|
2499
|
+
// fired. Sanitize args so getter/proxy traps can't trigger, and drop
|
|
2500
|
+
// DevTools-protocol noise. Covers log/info/debug/dir/table — the
|
|
2501
|
+
// methods detection uses; warn/error stay native (legit usage, and
|
|
2502
|
+
// not the common vector).
|
|
2503
|
+
const sanitizeArg = (arg) => {
|
|
2504
|
+
// Strip Error objects with custom .stack getters (inspector reads .stack)
|
|
2505
|
+
if (arg instanceof Error) {
|
|
2506
|
+
const desc = Object.getOwnPropertyDescriptor(arg, 'stack');
|
|
2507
|
+
if (desc && desc.get) return `${arg.name}: ${arg.message}`;
|
|
2507
2508
|
}
|
|
2508
|
-
//
|
|
2509
|
-
|
|
2510
|
-
|
|
2511
|
-
|
|
2512
|
-
|
|
2513
|
-
|
|
2514
|
-
|
|
2515
|
-
|
|
2516
|
-
|
|
2517
|
-
|
|
2518
|
-
const proto = Object.getPrototypeOf(arg);
|
|
2519
|
-
if (proto && proto !== Object.prototype && proto !== Array.prototype) {
|
|
2520
|
-
try { Object.keys(proto); } catch { return '[object Object]'; }
|
|
2521
|
-
}
|
|
2522
|
-
} catch { return '[object Object]'; }
|
|
2523
|
-
}
|
|
2524
|
-
return arg;
|
|
2525
|
-
});
|
|
2526
|
-
return originalConsoleDebug.apply(this, sanitized);
|
|
2509
|
+
// Neutralize Proxy/getter prototype traps (inspector walks the chain)
|
|
2510
|
+
if (arg !== null && typeof arg === 'object') {
|
|
2511
|
+
try {
|
|
2512
|
+
const proto = Object.getPrototypeOf(arg);
|
|
2513
|
+
if (proto && proto !== Object.prototype && proto !== Array.prototype) {
|
|
2514
|
+
try { Object.keys(proto); } catch { return '[object Object]'; }
|
|
2515
|
+
}
|
|
2516
|
+
} catch { return '[object Object]'; }
|
|
2517
|
+
}
|
|
2518
|
+
return arg;
|
|
2527
2519
|
};
|
|
2528
|
-
|
|
2529
|
-
|
|
2520
|
+
const DEVTOOLS_NOISE = ['DevTools', 'Runtime.evaluate', 'Page.addScriptToEvaluateOnNewDocument', 'Protocol error'];
|
|
2521
|
+
for (const method of ['log', 'info', 'debug', 'dir', 'table']) {
|
|
2522
|
+
const original = console[method];
|
|
2523
|
+
if (typeof original !== 'function') continue;
|
|
2524
|
+
console[method] = function(...args) {
|
|
2525
|
+
const message = args.join(' ');
|
|
2526
|
+
if (typeof message === 'string' && DEVTOOLS_NOISE.some(n => message.includes(n))) {
|
|
2527
|
+
return;
|
|
2528
|
+
}
|
|
2529
|
+
return original.apply(this, args.map(sanitizeArg));
|
|
2530
|
+
};
|
|
2531
|
+
}
|
|
2532
|
+
}, 'console trap neutralization');
|
|
2530
2533
|
|
|
2531
2534
|
// NOTE: The previous `location URL masking` Proxy was removed.
|
|
2532
2535
|
// It wrapped window.location in a Proxy to return 'about:blank' when
|
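The getter trap that the new comment describes can be reproduced outside the browser. This sketch builds a "bait" `Error` whose `stack` getter records when it is read, then shows that the `Error` branch of `sanitizeArg` (copied from the diff) replaces it with a plain string without firing the getter. The bait object and the `trapFired` flag are illustrative; real detectors vary in how they check for the inspector.

```javascript
// Error branch of sanitizeArg, as it appears in the diff.
function sanitizeArg(arg) {
  if (arg instanceof Error) {
    const desc = Object.getOwnPropertyDescriptor(arg, 'stack');
    // Reading the descriptor does not invoke the getter itself.
    if (desc && desc.get) return `${arg.name}: ${arg.message}`;
  }
  return arg;
}

// Illustrative bait: a detector-style Error whose stack getter sets a flag.
let trapFired = false;
const bait = new Error('boom');
Object.defineProperty(bait, 'stack', {
  configurable: true,
  get() { trapFired = true; return ''; },
});

const safe = sanitizeArg(bait);
console.log(safe, 'trap fired after sanitize:', trapFired);

// For contrast: anything that reads .stack directly (as an inspector
// serializing the argument would) does trip the flag.
void bait.stack;
console.log('trap fired after direct read:', trapFired);
```

One design point worth noting when adapting this: any step that stringifies the raw arguments before sanitizing (for example joining them into one message) can itself invoke `toString`-style hooks, so a cautious variant would inspect only arguments that are already strings.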
package/lib/interaction.js
CHANGED
|
@@ -1322,6 +1322,154 @@ function createInteractionConfig(url, siteConfig = {}) {
|
|
|
1322
1322
|
* const { generateRandomCoordinates } = require('./lib/interaction');
|
|
1323
1323
|
* const pos = generateRandomCoordinates(1920, 1080, { preferEdges: true });
|
|
1324
1324
|
*/
|
|
1325
|
+
|
|
1326
|
+
/**
|
|
1327
|
+
* Find the first VISIBLE match for a selector across the main frame AND every
|
|
1328
|
+
* child frame (iframes), polling until found or timeoutMs. page.waitForSelector
|
|
1329
|
+
* only searches the main frame; players/ads commonly live in an iframe, so we
|
|
1330
|
+
* walk page.frames() (main frame first, so a main-frame match wins). "Visible"
|
|
1331
|
+
* is approximated by a non-null boundingBox (rules out display:none/zero-size).
|
|
1332
|
+
* Returns an ElementHandle bound to its frame, or null on timeout.
|
|
1333
|
+
*/
|
|
1334
|
+
async function findVisibleInAnyFrame(page, selector, timeoutMs) {
|
|
1335
|
+
const deadline = Date.now() + Math.max(0, timeoutMs);
|
|
1336
|
+
for (;;) {
|
|
1337
|
+
if (page.isClosed()) return null;
|
|
1338
|
+
const mf = page.mainFrame();
|
|
1339
|
+
const frames = [mf, ...page.frames().filter(f => f !== mf)]; // main frame first
|
|
1340
|
+
for (const frame of frames) {
|
|
1341
|
+
let el = null;
|
|
1342
|
+
try {
|
|
1343
|
+
el = await frame.$(selector);
|
|
1344
|
+
} catch (e) {
|
|
1345
|
+
// An invalid CSS selector (config typo) throws "... is not a valid
|
|
1346
|
+
// selector" — a permanent error, so surface it instead of polling to a
|
|
1347
|
+
// confusing not-found. Frame-detached / cross-origin hiccups have other
|
|
1348
|
+
// messages and fall through to keep polling.
|
|
1349
|
+
if (/is not a valid selector|not a valid or unsupported selector/i.test((e && e.message) || '')) {
|
|
1350
|
+
throw new Error(`invalid CSS selector: ${(e && e.message) || e}`);
|
|
1351
|
+
}
|
|
1352
|
+
el = null;
|
|
1353
|
+
}
|
|
1354
|
+
if (el) {
|
|
1355
|
+
let box = null;
|
|
1356
|
+
try { box = await el.boundingBox(); } catch (_) { box = null; }
|
|
1357
|
+
if (box) return el; // present AND rendered
|
|
1358
|
+
try { await el.dispose(); } catch (_) { /* not rendered yet — keep polling */ }
|
|
1359
|
+
}
|
|
1360
|
+
}
|
|
1361
|
+
if (Date.now() >= deadline) return null;
|
|
1362
|
+
await new Promise(r => setTimeout(r, 250));
|
|
1363
|
+
}
|
|
1364
|
+
}
|
|
1365
|
+
|
|
1366
|
+
/**
|
|
1367
|
+
* Click a list of CSS selectors in order, reaching content via organic
|
|
1368
|
+
* gesture/navigation instead of a direct page load. Each selector's first
|
|
1369
|
+
* match is clicked; if the click navigates (an <a href> / form submit), we wait
|
|
1370
|
+
* for it to commit, otherwise we wait a settle window for in-page actions
|
|
1371
|
+
* (e.g. a player starting). The page's request interceptor stays attached
|
|
1372
|
+
* throughout, so the post-click requests flow into the caller's normal
|
|
1373
|
+
* filterRegex/dig matching — this function only performs the clicks.
|
|
1374
|
+
*
|
|
1375
|
+
* Missing elements are skipped (sites change markup); a click error never
|
|
1376
|
+
* throws out of here. After a navigation, later selectors are queried against
|
|
1377
|
+
* the NEW page (so e.g. "movie link" then "play button" works).
|
|
1378
|
+
*
|
|
1379
|
+
* @param {import('puppeteer').Page} page
|
|
1380
|
+
* @param {string[]} selectors - CSS selectors, clicked in order
|
|
1381
|
+
* @param {object} [options]
|
|
1382
|
+
* @param {boolean} [options.realistic=false] - use humanClick (hover/tremor) vs elementHandle.click
|
|
1383
|
+
* @param {number} [options.waitMs=5000] - per click: max wait for the element to
|
|
1384
|
+
* appear+be visible (waitForSelector), AND the settle/nav window after the click
|
|
1385
|
+
* @param {function} [options.ghostClick] - optional (x,y)=>Promise that performs a
|
|
1386
|
+
* ghost-cursor click (Bezier travel + press) at the element centre. Injected by the
|
|
1387
|
+
* caller so this module needn't depend on ghost-cursor.js (which depends on this one).
|
|
1388
|
+
* When provided it takes precedence over the humanClick/el.click paths.
|
|
1389
|
+
* @param {boolean} [options.forceDebug=false]
|
|
1390
|
+
* @returns {Promise<Array<{selector:string, clicked:boolean, reason?:string}>>}
|
|
1391
|
+
*/
|
|
1392
|
+
async function performTargetedClicks(page, selectors, options = {}) {
|
|
1393
|
+
const { realistic = false, waitMs = 5000, forceDebug = false, ghostClick = null } = options;
|
|
1394
|
+
const results = [];
|
|
1395
|
+
if (!Array.isArray(selectors)) return results;
|
|
1396
|
+
|
|
1397
|
+
for (const raw of selectors) {
|
|
1398
|
+
const selector = typeof raw === 'string' ? raw.trim() : '';
|
|
1399
|
+
if (!selector) continue;
|
|
1400
|
+
if (page.isClosed()) break;
|
|
1401
|
+
|
|
1402
|
+
// Wait for the element to appear AND be visible (up to waitMs), searching
|
|
1403
|
+
// the main frame and every iframe — many targets (video players, lazy
|
|
1404
|
+
// menus, post-consent buttons) are injected by JS after DOMContentLoaded
|
|
1405
|
+
// and/or live inside an iframe, so a single main-frame query would miss
|
|
1406
|
+
// them. Timeout → treat as not-found, skip.
|
|
1407
|
+
let el;
|
|
1408
|
+
try {
|
|
1409
|
+
el = await findVisibleInAnyFrame(page, selector, waitMs);
|
|
1410
|
+
} catch (selErr) {
|
|
1411
|
+
const msg = (selErr && selErr.message) || '';
|
|
1412
|
+
if (/invalid CSS selector|not a valid selector/i.test(msg)) {
|
|
1413
|
+
// Config typo, not a transient miss — warn (visible without --debug).
|
|
1414
|
+
console.warn(formatLogMessage('warn', `${INTERACTION_TAG} click_elements: invalid selector "${selector}" — skipping (${msg})`));
|
|
1415
|
+
results.push({ selector, clicked: false, reason: 'invalid-selector' });
|
|
1416
|
+
} else {
|
|
1417
|
+
// Defensive: findVisibleInAnyFrame swallows transient/detached-frame
|
|
1418
|
+
// errors internally, so this shouldn't fire — but never let an
|
|
1419
|
+
// unexpected find error abort the remaining selectors.
|
|
1420
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: find failed for "${selector}": ${msg} — skipping`));
|
|
1421
|
+
results.push({ selector, clicked: false, reason: msg || 'find-error' });
|
|
1422
|
+
}
|
|
1423
|
+
continue;
|
|
1424
|
+
}
|
|
1425
|
+
if (!el) {
|
|
1426
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: "${selector}" not visible (any frame) within ${waitMs}ms — skipping`));
|
|
1427
|
+
results.push({ selector, clicked: false, reason: 'not-found' });
|
|
1428
|
+
continue;
|
|
1429
|
+
}
|
|
1430
|
+
|
|
1431
|
+
try {
|
|
1432
|
+
// Bring it into view so coordinate clicks land (elementHandle.click also
|
|
1433
|
+
// auto-scrolls, but humanClick clicks raw coordinates).
|
|
1434
|
+
try { await el.evaluate(e => e.scrollIntoView({ block: 'center', inline: 'center' })); } catch (_) { /* detached/odd element */ }
|
|
1435
|
+
|
|
1436
|
+
// Arm a navigation wait BEFORE clicking so a link/submit is caught.
|
|
1437
|
+
const navP = page.waitForNavigation({ waitUntil: 'domcontentloaded', timeout: waitMs + 3000 }).catch(() => {});
|
|
1438
|
+
|
|
1439
|
+
let box = null;
|
|
1440
|
+
try { box = await el.boundingBox(); } catch (_) { box = null; }
|
|
1441
|
+
if (typeof ghostClick === 'function' && box) {
|
|
1442
|
+
// Ghost-cursor path: Bezier travel to the element centre + realistic
|
|
1443
|
+
// press, matching the interact phase (caller-injected).
|
|
1444
|
+
await ghostClick(box.x + box.width / 2, box.y + box.height / 2);
|
|
1445
|
+
} else if (realistic && box) {
|
|
1446
|
+
await humanClick(page, box.x + box.width / 2, box.y + box.height / 2, { realistic: true, forceDebug });
|
|
1447
|
+
} else {
|
|
1448
|
+
await el.click({ delay: 30 }); // trusted gesture; auto-scrolls + handles non-visible coords
|
|
1449
|
+
}
|
|
1450
|
+
|
|
1451
|
+
// Resolve on whichever comes first: a committed navigation, or the settle
|
|
1452
|
+
// window (in-page actions). Either way, requests fired in between are
|
|
1453
|
+
// already captured by the caller's interceptor. Capture-and-clear the
|
|
1454
|
+
// settle timer (cdp.js / interact pattern): when navP wins, an uncleared
|
|
1455
|
+
// setTimeout would keep the event loop + closure alive for the full waitMs.
|
|
1456
|
+
let settleTimer;
|
|
1457
|
+
await Promise.race([
|
|
1458
|
+
navP,
|
|
1459
|
+
new Promise(r => { settleTimer = setTimeout(r, waitMs); })
|
|
1460
|
+
]).finally(() => clearTimeout(settleTimer));
|
|
1461
|
+
results.push({ selector, clicked: true });
|
|
1462
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: clicked "${selector}"`));
|
|
1463
|
+
} catch (err) {
|
|
1464
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: click failed for "${selector}": ${err.message}`));
|
|
1465
|
+
results.push({ selector, clicked: false, reason: err.message });
|
|
1466
|
+
} finally {
|
|
1467
|
+
try { await el.dispose(); } catch (_) { /* detached after navigation — fine */ }
|
|
1468
|
+
}
|
|
1469
|
+
}
|
|
1470
|
+
return results;
|
|
1471
|
+
}
|
|
1472
|
+
|
|
1325
1473
|
module.exports = {
|
|
1326
1474
|
// Main interaction functions
|
|
1327
1475
|
performPageInteraction,
|
|
@@ -1337,5 +1485,8 @@ module.exports = {
|
|
|
1337
1485
|
// hand-tremor + mouseup drift). Reused by lib/ghost-cursor.js so the ghost
|
|
1338
1486
|
// coordinate click gets the same press realism as built-in content clicks.
|
|
1339
1487
|
humanClick,
|
|
1488
|
+
// Click specific CSS selectors in order (organic navigation / play-button /
|
|
1489
|
+
// link clicking) — site config `click_elements`.
|
|
1490
|
+
performTargetedClicks,
|
|
1340
1491
|
generateRandomCoordinates
|
|
1341
1492
|
};
|
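The capture-and-clear settle timer at the end of `performTargetedClicks` above can be isolated as a small helper. This is a minimal sketch, not code from the package: `raceWithSettle` and the string labels it returns are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper (not part of the package): resolve on whichever comes
// first, the supplied promise or a settle window, and always clear the timer
// so a won race does not keep the event loop alive for the full window.
function raceWithSettle(promise, settleMs) {
  let settleTimer;
  const settle = new Promise((resolve) => {
    settleTimer = setTimeout(() => resolve('settled'), settleMs);
  });
  // Swallow rejections, mirroring the `.catch(() => {})` on the armed wait.
  const guarded = promise.then(() => 'navigated', () => 'failed');
  return Promise.race([guarded, settle]).finally(() => clearTimeout(settleTimer));
}
```

In the diff the first argument would be the `page.waitForNavigation(...)` promise armed before the click, and `settleMs` would be `waitMs`.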
package/lib/nettools.js
CHANGED
|
@@ -366,7 +366,10 @@ function validateWhoisAvailability() {
|
|
|
366
366
|
};
|
|
367
367
|
} catch (error) {
|
|
368
368
|
try {
|
|
369
|
-
|
|
369
|
+
// `which` is Unix-only; Windows uses `where`. Without this, an installed
|
|
370
|
+
// whois.exe whose `whois --version` errors (e.g. Sysinternals whois)
|
|
371
|
+
// would be false-negatived as unavailable on native Windows.
|
|
372
|
+
execSync(process.platform === 'win32' ? 'where whois' : 'which whois', { encoding: 'utf8' });
|
|
370
373
|
validateWhoisAvailability._cached = {
|
|
371
374
|
isAvailable: true,
|
|
372
375
|
version: 'whois (version unknown)'
|
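The platform check added in this hunk can be sketched as a standalone lookup. `locatorCommand` and `isOnPath` are hypothetical names; only `execSync`, `process.platform`, and the `which`/`where` commands come from the change itself. The binary name is interpolated into a shell command, so the sketch assumes it is a trusted constant such as `whois`.

```javascript
const { execSync } = require('child_process');

// Pick the PATH-lookup command for the platform: `where` on native Windows,
// `which` everywhere else.
function locatorCommand(binary, platform = process.platform) {
  return `${platform === 'win32' ? 'where' : 'which'} ${binary}`;
}

// Hypothetical wrapper: true when the lookup command exits with status 0.
function isOnPath(binary) {
  try {
    execSync(locatorCommand(binary), { encoding: 'utf8', stdio: 'pipe' });
    return true;
  } catch (_) {
    return false; // non-zero exit (binary not found) or lookup tool missing
  }
}
```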
package/lib/validate_rules.js
CHANGED
|
@@ -1102,7 +1102,7 @@ function testDomainValidation() {
|
|
|
1102
1102
|
const KNOWN_SITE_CONFIG_KEYS = new Set([
|
|
1103
1103
|
'adblock_rules', 'blocked', 'bypass_cache', 'capture_popups',
|
|
1104
1104
|
'capture_popups_max_depth', 'capture_popups_window_ms', 'cdp', 'cdp_specific',
|
|
1105
|
-
'clear_sitedata', 'clear_sitedata_full_on_reload',
|
|
1105
|
+
'clear_sitedata', 'clear_sitedata_full_on_reload', 'click_elements', 'click_wait',
|
|
1106
1106
|
'cloudflare_bypass', 'cloudflare_max_retries', 'comments',
|
|
1107
1107
|
'cloudflare_parallel_detection', 'cloudflare_phish', 'cloudflare_retry_on_error',
|
|
1108
1108
|
'css_blocked', 'curl', 'cursor_mode', 'custom_headers', 'delay',
|
|
@@ -1119,7 +1119,7 @@ const KNOWN_SITE_CONFIG_KEYS = new Set([
|
|
|
1119
1119
|
'js_redirect_timeout', 'localhost', 'max_redirects', 'openvpn', 'pihole',
|
|
1120
1120
|
'output_regex',
|
|
1121
1121
|
'plain', 'privoxy', 'proxy', 'proxy_bypass', 'proxy_debug', 'proxy_remote_dns',
|
|
1122
|
-
'realistic_click', 'referrer_disable', 'referrer_headers', 'regex_and',
|
|
1122
|
+
'realistic_click', 'redirect_first_party', 'referrer_disable', 'referrer_headers', 'regex_and',
|
|
1123
1123
|
'reload', 'resourceTypes', 'screenshot', 'searchstring', 'searchstring_and',
|
|
1124
1124
|
'socks5_bypass', 'socks5_debug', 'socks5_proxy', 'socks5_remote_dns',
|
|
1125
1125
|
'subDomains',
|
|
@@ -1151,7 +1151,7 @@ const BOOLEAN_SITE_CONFIG_FIELDS = new Set([
|
|
|
1151
1151
|
'grep', 'headful', 'ignore_similar', 'ignore_similar_ignored_domains',
|
|
1152
1152
|
'interact', 'interact_clicks', 'interact_scrolling', 'isBrave', 'localhost',
|
|
1153
1153
|
'pihole', 'plain', 'privoxy', 'proxy_debug', 'proxy_remote_dns',
|
|
1154
|
-
'realistic_click', 'referrer_disable', 'regex_and', 'screenshot',
|
|
1154
|
+
'realistic_click', 'redirect_first_party', 'referrer_disable', 'regex_and', 'screenshot',
|
|
1155
1155
|
'searchstring_and', 'socks5_debug', 'socks5_remote_dns', 'thirdParty',
|
|
1156
1156
|
'unbound', 'whois_retry_on_error', 'whois_retry_on_timeout', 'whois_use_fallback',
|
|
1157
1157
|
]);
|
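Registering the new keys in both sets matters because a validator of this shape would otherwise flag them. The following is a rough sketch of that kind of check, assuming a validator that reports unknown keys and non-boolean values; `checkSiteConfig` is a hypothetical function, and the two sets are deliberately tiny subsets (with `url` assumed to be a known key), not the package's real lists.

```javascript
// Illustrative subsets only; the package's real sets are much longer.
const KNOWN_KEYS = new Set([
  'url', 'click_elements', 'click_wait', 'redirect_first_party', 'thirdParty'
]);
const BOOLEAN_FIELDS = new Set(['redirect_first_party', 'thirdParty']);

// Hypothetical validator: returns a list of human-readable problems.
function checkSiteConfig(site) {
  const problems = [];
  for (const [key, value] of Object.entries(site)) {
    if (!KNOWN_KEYS.has(key)) {
      problems.push(`unknown key: ${key}`);
    } else if (BOOLEAN_FIELDS.has(key) && typeof value !== 'boolean') {
      problems.push(`${key} must be a boolean`);
    }
  }
  return problems;
}
```

With this shape, a misspelled `click_element` or a string value such as `redirect_first_party: "false"` would each produce one problem entry.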
package/nwss.1
CHANGED
|
@@ -138,12 +138,32 @@ Maximum concurrent site processing (1-50, overrides config/default).
|
|
|
138
138
|
|
|
139
139
|
.TP
|
|
140
140
|
.BR \--dns " \fIIP\fR[,\fIIP\fR...]"
|
|
141
|
-
Nameserver(s) for the DNS pre-check
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
|
|
141
|
+
Nameserver(s) for the DNS pre-check, nettools' dig, and \(em when they map to a
|
|
142
|
+
known public DoH provider \(em Chrome's page navigation via DNS-over-HTTPS on
|
|
143
|
+
direct connections (does not affect whois). A single address pins all queries
|
|
144
|
+
to it; several are rotated per query (each leading once, the rest as failover)
|
|
145
|
+
to spread the load. Routing dig through these avoids dig timeouts on a flaky
|
|
146
|
+
system resolver silently dropping dig-gated domains. Overrides
|
|
147
|
+
/etc/resolv.conf. Invalid entries are warned and dropped.
|
|
148
|
+
.IP
|
|
149
|
+
Chrome ignores \fB\--dns\fR for navigation and reads /etc/resolv.conf directly,
|
|
150
|
+
so a broken system resolver could fail a domain the pre-check already resolved.
|
|
151
|
+
When the \fB\--dns\fR servers match a known DoH provider (Google, Cloudflare,
|
|
152
|
+
Quad9, OpenDNS, AdGuard, CleanBrowsing, DNS.SB, Mullvad \(em including
|
|
153
|
+
malware/family/unfiltered variants), Chrome is launched with secure-DNS
|
|
154
|
+
\fIautomatic\fR mode pointed at that provider, so navigation resolves through
|
|
155
|
+
the same resolver. \fIautomatic\fR (not \fIsecure\fR) keeps a system-DNS
|
|
156
|
+
fallback if DoH is unreachable. Skipped under a proxy or VPN (the exit/tunnel
|
|
157
|
+
resolves); unmapped resolvers (custom/ISP, per-account providers, IPv6) fall
|
|
158
|
+
back to system DNS with a warning.
|
|
159
|
+
|
|
160
|
+
.TP
|
|
161
|
+
.B \--doh-disable
|
|
162
|
+
Opt out of the Chrome-navigation DoH pinning (default: off). Chrome then
|
|
163
|
+
resolves page navigation via the system /etc/resolv.conf even when \fB\--dns\fR
|
|
164
|
+
maps to a known provider; the pre-check and dig still honor \fB\--dns\fR. Use
|
|
165
|
+
when DoH adds latency, is blocked on the network, or system-path resolution is
|
|
166
|
+
specifically wanted.
|
|
147
167
|
|
|
148
168
|
.TP
|
|
149
169
|
.BR \--cleanup-interval " \fINUMBER\fR"
|
|
@@ -326,6 +346,14 @@ Integer. Number of random content-zone clicks per load, capped at 20 (default: 3
|
|
|
326
346
|
.B realistic_click
|
|
327
347
|
Boolean. Higher click fidelity: denser mouse approach (15 steps), sub-pixel hand-tremor micro-moves during the press, and a small mouseup drift so the mousedown and mouseup coordinates differ. For sites that score click realism. Costs roughly 80-120ms per click (default: false).
|
|
328
348
|
|
|
349
|
+
.TP
|
|
350
|
+
.B click_elements
|
|
351
|
+
Array of CSS selectors. After the page loads, click each selector's first match \fBin order\fR (searched across the main frame and any iframe) \(em reaching content via organic navigation/gesture instead of a direct deep-load (which some sites JS-redirect away). Example: \fB["a[href*='/movie/']", ".play"]\fR clicks a movie link then a play button. The request interceptor stays attached, so the post-click page's requests are matched against \fBfilterRegex\fR/\fBdig\fR as usual; a click that navigates is followed and later selectors query the resulting page. Honors \fBrealistic_click\fR and \fBcursor_mode: "ghost"\fR (Bezier travel to the element); missing elements are skipped and never fail the scan.
|
|
352
|
+
|
|
353
|
+
.TP
|
|
354
|
+
.B click_wait
|
|
355
|
+
Per click: maximum time in milliseconds to wait for the element to appear and be visible (\fBwaitForSelector\fR) AND the navigation/settle wait after the click (default: 5000; capped at half the per-URL timeout).
|
|
356
|
+
|
|
329
357
|
.TP
|
|
330
358
|
.B delay
|
|
331
359
|
Milliseconds to wait after page load (default: 4000).
|
|
@@ -350,6 +378,10 @@ Boolean. Allow first-party request matching (default: false).
|
|
|
350
378
|
.B thirdParty
|
|
351
379
|
Boolean. Allow third-party request matching (default: true).
|
|
352
380
|
|
|
381
|
+
.TP
|
|
382
|
+
.B redirect_first_party
|
|
383
|
+
Boolean (default: true). Whether redirect-destination domains (and chain hops) are treated as first-party. Set to \fBfalse\fR to keep redirect targets \fBthird-party\fR so \fBfilterRegex\fR/\fBdig\fR can match them under \fBthirdParty: true\fR \(em e.g. capturing the end domain of an ad/cloak redirect. The originally-scanned domain stays first-party either way; note this also un-excludes the redirect target's own same-domain resources.
|
|
384
|
+
|
|
353
385
|
.TP
|
|
354
386
|
.B fingerprint_protection
|
|
355
387
|
Boolean or \fB"random"\fR. Enable browser fingerprint spoofing.
|
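The site options documented in this man page hunk might be combined as in the sketch below. The `url` key and the selector values are assumptions for illustration, and `effectiveClickWait` is a hypothetical helper that only restates the documented default (5000 ms) and cap (half the per-URL timeout); the package may compute the value differently.

```javascript
// Illustrative site entry; `url` and the selector values are assumptions.
const site = {
  url: 'https://example.com/',
  thirdParty: true,
  redirect_first_party: false, // keep redirect targets third-party
  click_elements: ["a[href*='/movie/']", '.play'], // clicked in this order
  click_wait: 5000,
  realistic_click: true
};

// Hypothetical helper: apply the documented default and cap to click_wait.
function effectiveClickWait(siteConfig, perUrlTimeoutMs) {
  const requested = Number.isFinite(siteConfig.click_wait) ? siteConfig.click_wait : 5000;
  return Math.min(requested, Math.floor(perUrlTimeoutMs / 2));
}
```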
package/nwss.js
CHANGED
|
@@ -9,7 +9,7 @@ const fs = require('fs');
|
|
|
9
9
|
const os = require('os');
|
|
10
10
|
const psl = require('psl');
|
|
11
11
|
const path = require('path');
|
|
12
|
-
const { createRotatingResolver, createDnsCircuitBreaker, parseDnsServers, isNonExistenceError } = require('./lib/dns');
|
|
12
|
+
const { createRotatingResolver, createDnsCircuitBreaker, parseDnsServers, isNonExistenceError, dohTemplatesForResolvers } = require('./lib/dns');
|
|
13
13
|
const { createGrepHandler, validateGrepAvailability } = require('./lib/grep');
|
|
14
14
|
const { compressMultipleFiles } = require('./lib/compress');
|
|
15
15
|
const { parseSearchStrings, createResponseHandler } = require('./lib/searchstring');
|
|
@@ -17,6 +17,7 @@ const { applyAllFingerprintSpoofing, USER_AGENT_COLLECTIONS, CHROME_BUILD, CHROM
|
|
|
17
17
|
const { formatRules, handleOutput, getFormatDescription } = require('./lib/output');
|
|
18
18
|
// Curl functionality (replace searchstring curl handler)
|
|
19
19
|
const { validateCurlAvailability, createCurlHandler: createCurlModuleHandler } = require('./lib/curl');
|
|
20
|
+
const { runProcess } = require('./lib/spawn-async');
|
|
20
21
|
// Rule validation
|
|
21
22
|
const { validateRulesetFile, validateFullConfig, testDomainValidation, cleanRulesetFile, normalizeSiteConfig } = require('./lib/validate_rules');
|
|
22
23
|
// CF Bypass
|
|
@@ -65,7 +66,7 @@ const SMART_CACHE_TAG = messageColors.processing('[SmartCache]');
|
|
|
65
66
|
// log lines (start/completed). Same cyan as the other monitoring tags.
|
|
66
67
|
const CONCURRENCY_TAG = messageColors.processing('[CONCURRENCY]');
|
|
67
68
|
// Enhanced mouse interaction and page simulation
|
|
68
|
-
const { performPageInteraction, createInteractionConfig, computeInteractionCeilingMs, performContentClicks, humanLikeMouseMove } = require('./lib/interaction');
|
|
69
|
+
const { performPageInteraction, createInteractionConfig, computeInteractionCeilingMs, performContentClicks, humanLikeMouseMove, performTargetedClicks } = require('./lib/interaction');
|
|
69
70
|
// Optional ghost-cursor support for advanced Bezier-based mouse movements
|
|
70
71
|
const { createGhostCursor, ghostMove, ghostClick, ghostRandomMove, resolveGhostCursorConfig } = require('./lib/ghost-cursor');
|
|
71
72
|
// Domain detection cache for performance optimization
|
|
@@ -241,6 +242,7 @@ if (fs.existsSync(NWSSCONFIG_PATH)) {
|
|
|
241
242
|
resource_cleanup_interval: ['--cleanup-interval'],
|
|
242
243
|
dns: ['--dns'],
|
|
243
244
|
dns_cache: ['--dns-cache'],
|
|
245
|
+
doh_disable: ['--doh-disable'],
|
|
244
246
|
cache_requests: ['--cache-requests'],
|
|
245
247
|
dumpurls: ['--dumpurls'],
|
|
246
248
|
remove_tempfiles: ['--remove-tempfiles'],
|
|
@@ -377,7 +379,13 @@ if (dnsCacheMode) enableDiskCache();
|
|
|
377
379
|
// Filters NXDOMAIN / unresolvable hostnames in <100ms before paying the
|
|
378
380
|
// ~5-15s Puppeteer + Cloudflare detection round-trip on each.
|
|
379
381
|
const dnsPrecheckEnabled = !args.includes('--no-dns-precheck');
|
|
380
|
-
|
|
382
|
+
// 4s (was 2s): under a concurrent scan the c-ares UDP burst against the pinned
|
|
383
|
+
// resolvers can take >2s to answer — a tight timeout false-counted those as
|
|
384
|
+
// resolver errors and tripped the circuit breaker. A clean NXDOMAIN still
|
|
385
|
+
// returns fast (the resolver answers immediately), so the higher ceiling only
|
|
386
|
+
// costs time when the resolver is genuinely slow — exactly when we want to wait
|
|
387
|
+
// rather than false-fail. Paired with the resolver's concurrency cap below.
|
|
388
|
+
const dnsPrecheckTimeoutMs = 4000;
|
|
381
389
|
|
|
382
390
|
// --show-dead-domains: collect hostnames that are definitively DEAD (do not
|
|
383
391
|
// exist / unreachable) and print them at the end of the scan so they can be
|
|
@@ -442,6 +450,31 @@ const dnsResolver = createRotatingResolver({ servers: dnsServersOverride, forceD
|
|
|
442
450
|
// system /etc/resolv.conf, which on a flaky setup times out and silently drops
|
|
443
451
|
// dig-gated domains). Only when --dns is explicitly set.
|
|
444
452
|
if (dnsServersOverride.length > 0) setDigResolvers(dnsServersOverride);
|
|
453
|
+
// Pin Chrome's NAVIGATION resolver to the same providers via DoH. Chrome
|
|
454
|
+
// ignores --dns for page loads and reads /etc/resolv.conf directly, so a broken
|
|
455
|
+
// system resolver (e.g. one returning REFUSED) can ERR_NAME_NOT_RESOLVED a
|
|
456
|
+
// domain the pre-check already resolved. Mapping --dns to the matching DoH
|
|
457
|
+
// endpoint makes navigation use the pinned provider instead of resolv.conf.
|
|
458
|
+
// 'automatic' mode (not 'secure') so Chrome still falls back to system DNS if
|
|
459
|
+
// DoH is unreachable rather than failing the whole batch. Empty templates when
|
|
460
|
+
// --dns is absent or maps to no known DoH provider — Chrome keeps system DNS.
|
|
461
|
+
//
|
|
462
|
+
// Applied ONLY to direct connections (see createBrowser): when a proxy or VPN
|
|
463
|
+
// is active, the exit/tunnel does the resolution (remote DNS / pushed DNS), so
|
|
464
|
+
// pinning local DoH would be redundant and could resolve geo-split domains to
|
|
465
|
+
// the wrong region. In those modes Chrome defers to the proxy/VPN as before.
|
|
466
|
+
// --doh-disable (default false) opts out of the Chrome DoH pinning entirely —
|
|
467
|
+
// navigation falls back to system resolv.conf even when --dns maps to a known
|
|
468
|
+
// provider. The pre-check and dig still honor --dns. Use it if DoH adds
|
|
469
|
+
// unwanted latency, is blocked on the network, or you specifically want Chrome
|
|
470
|
+
// to resolve via the system path.
|
|
471
|
+
const dohDisabled = args.includes('--doh-disable');
|
|
472
|
+
const chromeDoh = dnsServersOverride.length > 0
|
|
473
|
+
? dohTemplatesForResolvers(dnsServersOverride)
|
|
474
|
+
: { templates: '', mapped: [], unmapped: [] };
|
|
475
|
+
// anyVpnConfigured and the DoH startup log live inside the main IIFE below:
|
|
476
|
+
// `sites` is destructured from the config later in module load, so referencing
|
|
477
|
+
// it at this point in top-level evaluation would TDZ-throw.
|
|
445
478
|
// Circuit breaker: if resolver errors dominate, suspend the pre-check for a
|
|
446
479
|
// cooldown so a refusal storm doesn't keep hammering a broken resolver (sites
|
|
447
480
|
// still load — a suspended pre-check just proceeds to navigation).
|
|
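The diff shows only the return shape of `dohTemplatesForResolvers` (`{ templates, mapped, unmapped }`), not its body. A minimal sketch of what such a mapping could look like follows; `dohTemplatesSketch` and the lookup table are hypothetical, the table covers just three well-known public resolvers, and joining templates with a space is an assumption about how the result is passed to Chrome.

```javascript
// Hypothetical lookup table: public resolver IP -> DoH endpoint.
const DOH_BY_IP = new Map([
  ['8.8.8.8', 'https://dns.google/dns-query'],
  ['8.8.4.4', 'https://dns.google/dns-query'],
  ['1.1.1.1', 'https://cloudflare-dns.com/dns-query'],
  ['1.0.0.1', 'https://cloudflare-dns.com/dns-query'],
  ['9.9.9.9', 'https://dns.quad9.net/dns-query']
]);

// Sketch only: split resolvers into mapped/unmapped and de-duplicate the
// templates so two IPs of one provider yield a single endpoint.
function dohTemplatesSketch(servers) {
  const mapped = [];
  const unmapped = [];
  const templates = [];
  for (const ip of servers) {
    const template = DOH_BY_IP.get(ip);
    if (!template) { unmapped.push(ip); continue; }
    mapped.push(ip);
    if (!templates.includes(template)) templates.push(template);
  }
  return { templates: templates.join(' '), mapped, unmapped };
}
```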
@@ -818,8 +851,12 @@ General Options:
|
|
|
818
851
|
|
|
819
852
|
Validation Options:
|
|
820
853
|
--cache-requests Cache HTTP requests to avoid re-requesting same URLs within scan
|
|
821
|
-
--dns <ip[,ip,...]> Resolver(s) for the DNS pre-check
|
|
822
|
-
|
|
854
|
+
--dns <ip[,ip,...]> Resolver(s) for the DNS pre-check, nettools' dig, and — when they map to a
|
|
855
|
+
known DoH provider — Chrome's page navigation via DoH on direct connections
|
|
856
|
+
(skipped under proxy/VPN; not whois). Overrides /etc/resolv.conf.
|
|
857
|
+
One pins all queries to it; several rotate per query.
|
|
858
|
+
--doh-disable Opt out of the Chrome-navigation DoH pinning (default: off). Chrome then
|
|
859
|
+
resolves via system resolv.conf; --dns still pins the pre-check and dig.
|
|
823
860
|
--dns-cache Persist dig/whois results to disk between runs (dig 20h / whois 36h TTL, 2000-entry cap each),
|
|
824
861
|
plus the DNS pre-check negative cache (NXDOMAIN only, 12h TTL, .dnsnegcache)
|
|
825
862
|
--no-dns-precheck Disable per-URL DNS resolution check before page navigation.
|
|
@@ -894,6 +931,9 @@ Redirect Handling Options:
|
|
|
894
931
|
source: true/false Save page source HTML after load
|
|
895
932
|
firstParty: true/false Allow first-party matches (default: false)
|
|
896
933
|
thirdParty: true/false Allow third-party matches (default: true)
|
|
934
|
+
redirect_first_party: true/false Treat redirect-destination domains as first-party (default: true).
|
|
935
|
+
false keeps redirect targets third-party so filterRegex/dig can match
|
|
936
|
+
them (e.g. capturing an ad/cloak redirect's end domain)
|
|
897
937
|
screenshot: true/false/\"force\" Capture screenshot (true=on failure, \"force\"=always)
|
|
898
938
|
headful: true/false Launch browser with GUI for this site
|
|
899
939
|
fingerprint_protection: true/false/"random" Enable fingerprint spoofing: true/false/"random"
|
|
@@ -931,6 +971,9 @@ Advanced Options:
|
|
|
931
971
|
interact_scrolling: true/false Enable scrolling simulation (default: true)
|
|
932
972
|
interact_clicks: true/false Enable element clicking simulation (default: false)
|
|
933
973
|
interact_typing: true/false Enable typing simulation (default: false)
|
|
974
|
+
click_elements: ["sel1","sel2"] After load, click these CSS selectors in order, main frame + iframes
|
|
975
|
+
(organic nav / play button). Honors realistic_click + cursor_mode "ghost"; missing skipped
|
|
976
|
+
click_wait: <milliseconds> Per-click: max wait for the element to appear + settle/nav after (default: 5000)
|
|
934
977
|
cursor_mode: "ghost" Use ghost-cursor Bezier mouse (requires: npm i ghost-cursor)
|
|
935
978
|
ghost_cursor_speed: <number> Ghost-cursor speed multiplier (default: auto)
|
|
936
979
|
ghost_cursor_hesitate: <milliseconds> Delay before ghost-cursor clicks (default: 50)
|
|
@@ -1878,6 +1921,20 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
1878
1921
|
console.log(formatLogMessage('debug', `${POPUP_TAG} capture_popups set — launching with --disable-popup-blocking (non-gesture popunders allowed)`));
|
|
1879
1922
|
}
|
|
1880
1923
|
|
|
1924
|
+
// DoH gate: any VPN site disables Chrome DoH (the tunnel resolves). Computed
|
|
1925
|
+
// here (not at module top) because `sites` is only initialized by this point.
|
|
1926
|
+
// Read by createBrowser's launch args; the startup log reports the decision.
|
|
1927
|
+
const anyVpnConfigured = Array.isArray(sites) && sites.some(s => s && (s.vpn || s.openvpn));
|
|
1928
|
+
if (dnsServersOverride.length > 0 && !silentMode) {
|
|
1929
|
+
if (dohDisabled) {
|
|
1930
|
+
console.log(formatLogMessage('info', `Chrome DoH disabled via --doh-disable — navigation uses system resolv.conf; --dns still pins the pre-check and dig.`));
|
|
1931
|
+
} else if (chromeDoh.templates) {
|
|
1932
|
+
console.log(formatLogMessage('info', `Chrome navigation will use DoH (automatic) on direct connections: ${chromeDoh.templates}${anyVpnConfigured ? ' — VPN configured, so it defers to VPN resolution' : ' — deferred to proxy resolution on proxied sites'}`));
|
|
1933
|
+
} else {
|
|
1934
|
+
console.warn(formatLogMessage('warn', `--dns servers (${chromeDoh.unmapped.join(', ')}) have no known DoH endpoint — Chrome navigation stays on system resolv.conf; only the pre-check and dig are pinned. Known providers: Google, Cloudflare, Quad9, OpenDNS, AdGuard, CleanBrowsing, DNS.SB, Mullvad.`));
|
|
1935
|
+
}
|
|
1936
|
+
}
|
|
1937
|
+
|
|
1881
1938
|
/**
|
|
1882
1939
|
* Creates a new browser instance with consistent configuration
|
|
1883
1940
|
* Uses system Chrome and temporary directories to minimize disk usage
|
|
@@ -1985,6 +2042,19 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
1985
2042
|
'--use-mock-keychain',
|
|
1986
2043
|
'--disable-client-side-phishing-detection',
|
|
1987
2044
|
'--enable-features=NetworkService',
|
|
2045
|
+
// DoH for Chrome's navigation resolver when --dns maps to a known
|
|
2046
|
+
// provider — but ONLY on direct connections. A proxied launch carries
|
|
2047
|
+
// a --proxy-server in extraArgs and does its own (remote) DNS; a VPN
|
|
2048
|
+
// tunnels resolution. In both cases local DoH is redundant and could
|
|
2049
|
+
// resolve geo-split domains to the wrong region, so it's skipped and
|
|
2050
|
+
// Chrome defers to the proxy/VPN. 'automatic' keeps a system-DNS
|
|
2051
|
+
// fallback if DoH is unreachable. Flags omitted when not applicable.
|
|
2052
|
+
...((chromeDoh.templates
|
|
2053
|
+
&& !dohDisabled
|
|
2054
|
+
&& !anyVpnConfigured
|
|
2055
|
+
&& !extraArgs.some(a => typeof a === 'string' && a.startsWith('--proxy-server')))
|
|
2056
|
+
? ['--dns-over-https-mode=automatic', `--dns-over-https-templates=${chromeDoh.templates}`]
|
|
2057
|
+
: []),
|
|
1988
2058
|
// Disk space controls - minimal cache for scanning workloads
|
|
1989
2059
|
`--disk-cache-size=${CACHE_LIMITS.DISK_CACHE_SIZE}`,
|
|
1990
2060
|
`--media-cache-size=${CACHE_LIMITS.MEDIA_CACHE_SIZE}`,
|
|
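The four-way gate in the launch args above (templates present, not disabled, no VPN, no proxy) can be read more easily as a standalone function. This is a sketch under the same conditions the hunk shows; `dohLaunchArgs` and its options object are hypothetical names, while the two Chrome flags are the ones in the diff.

```javascript
// Hypothetical extraction of the DoH gate: returns the Chrome flags to add,
// or an empty array when DoH pinning should be skipped.
function dohLaunchArgs({ templates, dohDisabled, anyVpnConfigured, extraArgs = [] }) {
  const proxied = extraArgs.some(
    (a) => typeof a === 'string' && a.startsWith('--proxy-server')
  );
  if (!templates || dohDisabled || anyVpnConfigured || proxied) return [];
  return [
    '--dns-over-https-mode=automatic',
    `--dns-over-https-templates=${templates}`
  ];
}
```

Returning an array keeps the call site a single spread (`...dohLaunchArgs(opts)`), which is the same effect as the conditional spread in the hunk.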
@@ -2468,10 +2538,18 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
2468
2538
|
page.setDefaultNavigationTimeout(Math.min(timeout, TIMEOUTS.DEFAULT_NAVIGATION));
|
|
2469
2539
|
// Aggressive timeouts prevent hanging in Puppeteer 23.x while maintaining speed
|
|
2470
2540
|
|
|
2471
|
-
|
|
2472
|
-
|
|
2473
|
-
|
|
2474
|
-
|
|
2541
|
+
// Only attach a console listener under --debug. Registering ANY 'console'
|
|
2542
|
+
// listener makes Puppeteer enable the CDP Runtime domain, which arms
|
|
2543
|
+
// console-based automation/DevTools traps (e.g. disable-devtool logs an
|
|
2544
|
+
// object with a getter and detects the inspector reading it → redirects
|
|
2545
|
+
// away). The body is a no-op without forceDebug, so attaching it
|
|
2546
|
+
// unconditionally armed that trap for zero benefit.
|
|
2547
|
+
if (forceDebug) {
|
|
2548
|
+
page.on('console', (msg) => {
|
|
2549
|
+
if (msg.type() === 'error') console.log(formatLogMessage('debug', `Console error: ${msg.text()}`));
|
|
2550
|
+
});
|
|
2551
|
+
}
|
|
2552
|
+
|
|
2475
2553
|
// Add page crash handler
|
|
2476
2554
|
page.on('error', (err) => {
|
|
2477
2555
|
if (forceDebug) console.log(formatLogMessage('debug', `Page crashed: ${err.message}`));
|
|
@@ -3356,12 +3434,18 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
3356
3434
|
// (normalizeSiteConfig now coerces interact: 1 → true with a warning,
|
|
3357
3435
|
// so by the time we get here both should be booleans — but keep the
|
|
3358
3436
|
// diagnostic accurate for the truly-missing case.)
|
|
3437
|
+
const hasClickElements = Array.isArray(siteConfig.click_elements) && siteConfig.click_elements.length > 0;
|
|
3359
3438
|
const interactOn = siteConfig.interact === true;
|
|
3360
3439
|
const clicksOn = siteConfig.interact_clicks === true;
|
|
3361
|
-
if (!interactOn
|
|
3362
|
-
|
|
3440
|
+
if (hasClickElements && (!interactOn || !clicksOn)) {
|
|
3441
|
+
// click_elements fires its own trusted gesture clicks, so popups it
|
|
3442
|
+
// triggers capture regardless of interact/interact_clicks. Don't warn
|
|
3443
|
+
// "no clicks fire" — surface the random-click coverage gap instead.
|
|
3444
|
+
console.log(formatLogMessage('debug', `[popup] capture_popups: click_elements supplies targeted gesture clicks (popups they trigger WILL capture). interact=${interactOn}, interact_clicks=${clicksOn} — enable both for random content-zone click coverage of overlay popunders too`));
|
|
3445
|
+
} else if (!interactOn && !clicksOn) {
|
|
3446
|
+
console.log(formatLogMessage('debug', `[popup] capture_popups is enabled but neither 'interact' nor 'interact_clicks' is — set BOTH to true to fire user-gesture clicks; without them, only popups opened via in-page redirects (or click_elements) will capture`));
|
|
3363
3447
|
} else if (!interactOn) {
|
|
3364
|
-
console.log(formatLogMessage('debug', `[popup] capture_popups is enabled but 'interact' is not — set interact: true to enable the interaction loop (interact_clicks is already set); without it, no fake clicks fire`));
|
|
3448
|
+
console.log(formatLogMessage('debug', `[popup] capture_popups is enabled but 'interact' is not — set interact: true to enable the interaction loop (interact_clicks is already set); without it, no random fake clicks fire`));
|
|
3365
3449
|
} else if (!clicksOn) {
|
|
3366
3450
|
console.log(formatLogMessage('debug', `[popup] capture_popups is enabled but 'interact_clicks' is not — set interact_clicks: true to enable element-targeted clicks; without it, only random content-zone clicks fire and may miss overlay-based popunders`));
|
|
3367
3451
|
}
|
|
@@ -4433,6 +4517,77 @@ function setupFrameHandling(page, forceDebug) {
|
|
|
4433
4517
|
if (forceDebug) console.log(formatLogMessage('debug', `Navigation timeout — proceeding with partially-loaded page for ${currentUrl}`));
|
|
4434
4518
|
navigationResult = { finalUrl: partialUrl, redirected: false, redirectChain: [currentUrl], originalUrl: currentUrl, redirectDomains: [], httpStatus: null, cfRay: null };
|
|
4435
4519
|
}
|
|
4520
|
+
} else if (navErr.message.includes('ERR_TOO_MANY_REDIRECTS')) {
|
|
4521
|
+
// Redirect-cloaking chain exceeded Chrome's ~20-hop per-navigation
|
|
4522
|
+
// ceiling, so goto() rejected. Two recovery paths — they cover
|
|
4523
|
+
// opposite cases run-to-run, so try both:
|
|
4524
|
+
// 1. Browser ride-through (free): a JS/meta hop on a committed
|
|
4525
|
+
// intermediate page resets Chrome's counter and carries the page
|
|
4526
|
+
// to the end site on its own. Check if it already happened, else
|
|
4527
|
+
// wait briefly for it.
|
|
4528
|
+
// 2. curl-resolve (fallback, only if the page parked on
|
|
4529
|
+
// chrome-error): curl follows the chain (it gets the real chain,
|
|
4530
|
+
// not headless Chrome's endless loop) to the JS-handoff page;
|
|
4531
|
+
// navigating there directly is a SHORT hop that reaches the end
|
|
4532
|
+
// site. Skipped under proxy/VPN — curl runs DIRECT from the host
|
|
4533
|
+
// and would leak the real IP / resolve from the wrong network.
|
|
4534
|
+
// If neither reaches a real page, keep the chain requests already
|
|
4535
|
+
// captured (grouped under the original URL, never chrome-error).
|
|
4536
|
+
let landedUrl = '';
|
|
4537
|
+
const isRealPage = (u) => !!u && /^https?:\/\//.test(u) && !u.startsWith('chrome-error://') && u !== currentUrl;
|
|
4538
|
+
|
|
4539
|
+
// 1) Browser ride-through — may have completed during goto(); if not,
|
|
4540
|
+
// wait for the next navigation(s) to carry it through.
|
|
4541
|
+
try { if (!page.isClosed() && isRealPage(page.url())) landedUrl = page.url(); } catch {}
|
|
4542
|
+
for (let r = 0; r < 3 && !landedUrl; r++) {
|
|
4543
|
+
try {
|
|
4544
|
+
await page.waitForNavigation({ waitUntil: 'domcontentloaded', timeout: 8000 });
|
|
4545
|
+
if (!page.isClosed() && isRealPage(page.url())) landedUrl = page.url();
|
|
4546
|
+
} catch { break; } // no further navigation — stop waiting
|
|
4547
|
+
}
|
|
4548
|
+
if (landedUrl && forceDebug) console.log(formatLogMessage('debug', `Too many redirects — browser rode through to ${landedUrl} for ${currentUrl}`));
|
|
4549
|
+
|
|
4550
|
+
// 2) curl-resolve fallback — only if still parked (no ride-through).
|
|
4551
|
+
// Opt-in via the site's `curl` option: if you didn't enable curl
|
|
4552
|
+
// in the config, the scanner won't shell out to it here either
|
|
4553
|
+
// (consistent with the content-analysis `curl` gate).
|
|
4554
|
+
if (!landedUrl) {
|
|
4555
|
+
const curlResolveOk = siteConfig.curl === true && !needsProxy(siteConfig) && !anyVpnConfigured && validateCurlAvailability().isAvailable;
|
|
4556
|
+
if (curlResolveOk) {
|
|
4557
|
+
let resolvedUrl = '';
|
|
4558
|
+
try {
|
|
4559
|
+
const curlUa = USER_AGENT_COLLECTIONS.get((siteConfig.userAgent || 'chrome').toLowerCase())
|
|
4560
|
+
|| 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36';
|
|
4561
|
+
const cr = await runProcess('curl', ['-sL', '--max-redirs', '50', '--max-time', '20', '-o', '/dev/null', '-A', curlUa, '-w', '%{url_effective}', currentUrl], { timeout: 22000, maxStdout: 4096 });
|
|
4562
|
+
const u = (cr.stdout || '').trim();
|
|
4563
|
+
if (cr.code === 0 && /^https?:\/\//.test(u) && u !== currentUrl) resolvedUrl = u;
|
|
4564
|
+
} catch (_) { /* curl failed */ }
|
|
4565
|
+
if (resolvedUrl) {
|
|
4566
|
+
if (forceDebug) console.log(formatLogMessage('debug', `Too many redirects — curl resolved the chain to ${resolvedUrl}; navigating there directly for ${currentUrl}`));
|
|
4567
|
+
// Navigate to the resolved endpoint; the streaming/embed end page
|
|
4568
|
+
// often never reaches DOM-ready, so the goto may throw — either
|
|
4569
|
+
// way it navigated, so adopt page.url().
|
|
4570
|
+
try { navigationResult = await navigateWithRedirectHandling(page, resolvedUrl, siteConfig, gotoOptions, forceDebug, formatLogMessage); } catch (_) { /* timed out — use page.url() below */ }
|
|
4571
|
+
try { if (!page.isClosed() && page.url() && !page.url().startsWith('chrome-error://')) landedUrl = page.url(); } catch {}
|
|
4572
|
+
} else if (forceDebug) {
|
|
4573
|
+
console.log(formatLogMessage('debug', `Too many redirects — no ride-through and curl could not resolve; keeping chain captures for ${currentUrl}`));
|
|
4574
|
+
}
|
|
4575
|
+
} else if (forceDebug) {
|
|
4576
|
+
const why = siteConfig.curl !== true ? 'curl not enabled (curl:false)'
|
|
4577
|
+
: (needsProxy(siteConfig) || anyVpnConfigured) ? 'proxy/VPN active'
|
|
4578
|
+
: 'curl unavailable';
|
|
4579
|
+
console.log(formatLogMessage('debug', `Too many redirects — no ride-through and curl-resolve skipped (${why}); keeping chain captures for ${currentUrl}`));
|
|
4580
|
+
}
|
|
4581
|
+
}
|
|
4582
|
+
|
|
4583
|
+
// navigateWithRedirectHandling may already have set navigationResult
|
|
4584
|
+
// (clean curl path). Otherwise build a partial from where we landed —
|
|
4585
|
+
// the end site if we rode through / curl'd, else the original URL with
|
|
4586
|
+
// the chain requests already captured.
|
|
4587
|
+
if (!navigationResult) {
|
|
4588
|
+
const fu = landedUrl || currentUrl;
|
|
4589
|
+
navigationResult = { finalUrl: fu, redirected: fu !== currentUrl, redirectChain: [currentUrl, fu], originalUrl: currentUrl, redirectDomains: [], httpStatus: null, cfRay: null };
|
|
4590
|
+
}
|
|
4436
4591
|
} else {
|
|
4437
4592
|
throw navErr;
|
|
4438
4593
|
}
|
|
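The curl-resolve fallback in this hunk can be sketched outside the scanner. The package calls its own `runProcess` helper, whose implementation is not shown here, so this sketch substitutes Node's `child_process.execFile`; `curlResolveArgs` and `resolveChainEndpoint` are hypothetical names. The curl arguments are the ones in the diff, and `-o /dev/null` assumes a Unix-like host.

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');
const execFileAsync = promisify(execFile);

// A landed URL counts only if it is http(s) and differs from the start URL
// (the scheme test also rejects chrome-error:// pages).
function isRealPage(u, originalUrl) {
  return !!u && /^https?:\/\//.test(u) && u !== originalUrl;
}

// Follow up to 50 redirects, discard the body, print only the final URL.
function curlResolveArgs(url, userAgent) {
  return ['-sL', '--max-redirs', '50', '--max-time', '20', '-o', '/dev/null',
    '-A', userAgent, '-w', '%{url_effective}', url];
}

// Sketch: returns the chain endpoint, or '' when curl fails or lands nowhere new.
async function resolveChainEndpoint(url, userAgent) {
  try {
    const { stdout } = await execFileAsync('curl', curlResolveArgs(url, userAgent), { timeout: 22000 });
    const effective = stdout.trim();
    return isRealPage(effective, url) ? effective : '';
  } catch (_) {
    return ''; // non-zero exit, timeout, or curl not installed
  }
}
```

As in the diff, a caller would run this only when the site sets `curl: true` and no proxy or VPN is active, since curl connects directly from the host.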
@@ -4478,17 +4633,26 @@ function setupFrameHandling(page, forceDebug) {
   redirectHistory.add(currentUrl);
   redirectHistory.add(finalUrl);
 
-  // Add redirect destination to first-party domains
-
-
-
-
-  //
-
-
-
-
-  }
+  // Add redirect destination (and intermediates) to first-party domains
+  // so the landed site's own resources aren't captured as third-party.
+  // Opt out with redirect_first_party:false — then redirect targets stay
+  // THIRD-PARTY and become eligible for filterRegex/dig under
+  // thirdParty:true (e.g. capturing an ad/cloak redirect's end domain).
+  // The originally-scanned domain (added earlier) stays first-party.
+  const redirectsAreFirstParty = siteConfig.redirect_first_party !== false;
+  if (redirectsAreFirstParty) {
+    if (finalDomain) {
+      firstPartyDomains.add(finalDomain);
+    }
+    // Also add any intermediate redirect domains as first-party
+    if (redirectDomains && redirectDomains.length > 0) {
+      redirectDomains.forEach(domain => {
+        const rootDomain = safeGetDomain(`http://${domain}`, false);
+        if (rootDomain) firstPartyDomains.add(rootDomain);
+      });
+    }
+  } else if (forceDebug) {
+    console.log(formatLogMessage('debug', `redirect_first_party:false — keeping redirect target ${finalDomain} third-party for ${currentUrl}`));
   }
 
   if (originalDomain !== finalDomain) {
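A minimal sketch of the opt-out behaviour this hunk introduces, under stated assumptions: `registerRedirectDomains` and `hostOf` are hypothetical names, and `hostOf` only extracts the URL hostname, whereas the package reduces hosts to a root domain through its own `safeGetDomain` helper.

```javascript
// Hypothetical simplification of safeGetDomain: returns the hostname only,
// without any root-domain reduction.
function hostOf(url) {
  try {
    return new URL(url).hostname;
  } catch {
    return '';
  }
}

// Hypothetical helper: adds the redirect destination and chain hops to the
// first-party set unless the site explicitly sets redirect_first_party:false.
function registerRedirectDomains(firstPartyDomains, siteConfig, finalDomain, redirectDomains) {
  // Anything other than an explicit false keeps the default behaviour.
  if (siteConfig.redirect_first_party === false) return firstPartyDomains;
  if (finalDomain) firstPartyDomains.add(finalDomain);
  for (const domain of redirectDomains || []) {
    const host = hostOf(`http://${domain}`);
    if (host) firstPartyDomains.add(host);
  }
  return firstPartyDomains;
}
```

With `redirect_first_party: false` the set keeps only the originally scanned domain, so requests to the redirect's end domain are still evaluated as third-party.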
@@ -4745,6 +4909,45 @@ function setupFrameHandling(page, forceDebug) {
     }
   }
 
+  // Targeted clicks: after load, click configured CSS selectors in order
+  // (e.g. a movie link, then a play button) to reach content via organic
+  // navigation/gesture instead of a direct deep-load (which some sites
+  // JS-redirect away). The request interceptor stays attached, so the
+  // post-click page's requests flow into the same filterRegex/dig matching.
+  // Reuses realistic_click for a genuine trusted gesture. Runs before the
+  // delay/interact phase so those operate on the resulting page.
+  if (Array.isArray(siteConfig.click_elements) && siteConfig.click_elements.length > 0 && page && !page.isClosed()) {
+    // If ghost-cursor is enabled for this site (cursor_mode:"ghost" or
+    // --ghost-cursor), route the targeted clicks through it — Bezier travel
+    // to the element + realistic press — matching the interact phase.
+    // Injected so interaction.js needn't require ghost-cursor.js (circular).
+    // Falls back to performTargetedClicks' humanClick/el.click when ghost is
+    // off or the package isn't installed (resolveGhostCursorConfig → null).
+    let ghostClicker = null;
+    const tcGhostCfg = resolveGhostCursorConfig(siteConfig, globalGhostCursor, forceDebug);
+    if (tcGhostCfg) {
+      const tcCursor = createGhostCursor(page, { forceDebug });
+      if (tcCursor) {
+        ghostClicker = (x, y) => ghostClick(tcCursor, { x, y }, {
+          hesitate: tcGhostCfg.hesitate,
+          page,
+          realistic: siteConfig.realistic_click === true,
+          forceDebug
+        });
+      }
+    }
+    try {
+      await performTargetedClicks(page, siteConfig.click_elements, {
+        realistic: siteConfig.realistic_click === true,
+        waitMs: Math.min(Number(siteConfig.click_wait) || 5000, Math.floor(timeout / 2)),
+        ghostClick: ghostClicker,
+        forceDebug
+      });
+    } catch (clickErr) {
+      if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements phase failed for ${currentUrl}: ${clickErr.message}`));
+    }
+  }
+
   const delayMs = siteConfig.delay || TIMEOUTS.DEFAULT_DELAY;
 
   // Optimized delays for Puppeteer 23.x performance
@@ -4761,6 +4964,13 @@ function setupFrameHandling(page, forceDebug) {
   const actualDelay = siteConfig.delay_uncapped === true
     ? Math.min(delayMs, Math.floor(timeout / 2))
     : Math.min(delayMs, TIMEOUTS.NETWORK_IDLE);
+  // Surface the clamp — otherwise `delay: 48000` silently running as 29000
+  // (timeout/2) looks like the flag was ignored. The per-URL budget already
+  // reserves the full `delay`, so the lever to honor it is a larger timeout.
+  if (forceDebug && actualDelay < delayMs) {
+    const ceiling = siteConfig.delay_uncapped === true ? 'timeout/2; raise timeout to lift' : 'default 2s cap; set delay_uncapped:true to lift';
+    console.log(formatLogMessage('debug', `delay ${delayMs}ms clamped to ${actualDelay}ms (${ceiling}) for ${currentUrl}`));
+  }
 
   // Build delay promise (networkIdle + delay + optional flowProxy delay)
   const delayPromise = (async () => {
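The clamp arithmetic in this hunk can be worked through with a small sketch. Assumptions, labelled as such: `clampDelay` is a hypothetical name, `NETWORK_IDLE_CAP_MS` stands in for `TIMEOUTS.NETWORK_IDLE` with a value of 2000 inferred from the "default 2s cap" debug text rather than read from the source, and the 58000 ms timeout is the value implied by the comment's 48000 to 29000 example.

```javascript
// Assumption: stand-in for TIMEOUTS.NETWORK_IDLE; 2000 is inferred from the
// "default 2s cap" debug message, not taken from the package source.
const NETWORK_IDLE_CAP_MS = 2000;

// Hypothetical helper: returns the delay that would actually run and whether
// the configured delay was clamped.
function clampDelay(delayMs, timeoutMs, delayUncapped) {
  const ceiling = delayUncapped === true
    ? Math.floor(timeoutMs / 2)
    : NETWORK_IDLE_CAP_MS;
  const actualDelay = Math.min(delayMs, ceiling);
  return { actualDelay, clamped: actualDelay < delayMs };
}

// delay: 48000 with delay_uncapped:true and a 58000 ms timeout.
console.log(clampDelay(48000, 58000, true)); // { actualDelay: 29000, clamped: true }
```

Under these assumptions, honouring the full 48000 ms delay would need a timeout of at least 96000 ms, which is the "raise timeout to lift" hint in the debug message.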
@@ -5033,6 +5243,21 @@ function setupFrameHandling(page, forceDebug) {
 
   let reloadSuccess = false;
 
+  // page.reload() can't carry a referer; when referrer_headers is set,
+  // re-navigate to the current URL with it so referer-gated embeds keep
+  // serving across the reload:N loop (the initial goto carries the referer,
+  // but reload() drops it). Nav-only scope — subresources keep their normal
+  // page-origin referer (unlike setExtraHTTPHeaders, which would force the
+  // referer onto every request and can break embeds whose subresources
+  // expect own-origin). A static referrer_headers string is identical each
+  // reload; random/mixed modes pick a fresh value per reload.
+  const reloadReferer = siteConfig.referrer_headers
+    ? getReferrerForUrl(currentUrl, siteConfig.referrer_headers, siteConfig.referrer_disable, forceDebug)
+    : '';
+  const reloadOrReferredGoto = (opts) => reloadReferer
+    ? page.goto(page.url(), { ...opts, referer: reloadReferer })
+    : page.reload(opts);
+
   // Skip force reload if browser seems unhealthy
   const skipForceReload = i > 2; // After 2 attempts, skip force reload
 
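To make the reload-versus-goto choice concrete, here is a hedged sketch that exercises the same branching against a stub. `makeReloader` and `makeStubPage` are hypothetical names introduced for illustration; the stub only records calls and does not navigate, and the URLs are placeholders.

```javascript
// Hypothetical wrapper around the same branching as reloadOrReferredGoto:
// with a referer, re-navigate to the current URL; without one, plain reload.
function makeReloader(page, reloadReferer) {
  return (opts) => reloadReferer
    ? page.goto(page.url(), { ...opts, referer: reloadReferer })
    : page.reload(opts);
}

// Stub page for illustration only: records which method was called and with
// which options instead of driving a browser.
function makeStubPage(url) {
  const calls = [];
  return {
    calls,
    url: () => url,
    goto: (target, opts) => {
      calls.push(['goto', target, opts]);
      return Promise.resolve(null);
    },
    reload: (opts) => {
      calls.push(['reload', opts]);
      return Promise.resolve(null);
    }
  };
}

const stub = makeStubPage('https://embed.example/player');
makeReloader(stub, 'https://referrer.example/')({ waitUntil: 'networkidle2' });
console.log(stub.calls[0][0]); // goto
```

Because the referer travels as a navigation option, only the top-level request carries it, which matches the "nav-only scope" described in the hunk's comment.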
@@ -5055,7 +5280,7 @@ function setupFrameHandling(page, forceDebug) {
   await raceWithTimer(page.setCacheEnabled(false), 'Cache disable timeout', 8000);
 
   // Use networkidle2 for force reload to better detect when page is actually loaded
-  await
+  await reloadOrReferredGoto({ waitUntil: 'networkidle2', timeout: Math.min(timeout, 15000) });
 
   // Timeout-protected cache enable
   await raceWithTimer(page.setCacheEnabled(true), 'Cache enable timeout', 8000);
@@ -5094,7 +5319,7 @@ function setupFrameHandling(page, forceDebug) {
     ? { waitUntil: 'domcontentloaded', timeout: 10000 } // Simpler after failures
     : { waitUntil: 'networkidle2', timeout: 15000 }; // Full wait first time
 
-  await
+  await reloadOrReferredGoto(reloadOptions);
 
   if (forceDebug) console.log(formatLogMessage('debug', `Standard reload #${i} completed for ${currentUrl}`));
 } catch (standardReloadErr) {
@@ -5954,10 +6179,24 @@ function setupFrameHandling(page, forceDebug) {
   const INTERACTION_OVERHEAD_MS = interactionOnForTask
     ? computeInteractionCeilingMs(createInteractionConfig(task.url, task.config))
     : 0;
+  // click_elements runs ONCE after load (before the delay/interact/reload
+  // phases): N selectors, each a settle/nav wait (click_wait, capped at
+  // timeout/2 — mirror the call site) plus ~2s for scroll + the click action
+  // (ghost Bezier travel is the slowest). Budget it so a heavy click chain
+  // can't trip the per-URL ceiling before the work that follows it. Not
+  // multiplied by reloadCount — the click phase is one-time.
+  const clickEls = Array.isArray(task.config.click_elements)
+    ? task.config.click_elements.filter(s => typeof s === 'string' && s.trim())
+    : [];
+  const clickWaitMs = clickEls.length
+    ? Math.min(Number(task.config.click_wait) || 5000, Math.floor((task.config.timeout || 35000) / 2))
+    : 0;
+  const CLICK_ELEMENTS_OVERHEAD_MS = clickEls.length * (clickWaitMs + 2000);
   const PER_URL_TIMEOUT_MS = Math.max(
     75000,
     (task.config.timeout || 35000)
       + ((task.config.delay || 0) + INTERACTION_OVERHEAD_MS) * (1 + reloadCount)
+      + CLICK_ELEMENTS_OVERHEAD_MS
       + 30000
   );
   // Feed the hang-check restart so it never escalates before this URL's own
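The per-URL budget formula in this hunk is easier to check with numbers. The sketch below restates it as a standalone function; `perUrlTimeoutMs` is a hypothetical name, the interaction overhead is passed in directly rather than computed by `computeInteractionCeilingMs`, and the selectors in the example are placeholders.

```javascript
// Hypothetical restatement of the PER_URL_TIMEOUT_MS computation.
function perUrlTimeoutMs(config, interactionOverheadMs, reloadCount) {
  const timeout = config.timeout || 35000;
  // Only non-empty string selectors count toward the click budget.
  const clickEls = Array.isArray(config.click_elements)
    ? config.click_elements.filter(s => typeof s === 'string' && s.trim())
    : [];
  // Per-selector wait: click_wait (default 5000), capped at timeout/2.
  const clickWaitMs = clickEls.length
    ? Math.min(Number(config.click_wait) || 5000, Math.floor(timeout / 2))
    : 0;
  // One-time cost: not multiplied by reloadCount.
  const clickOverheadMs = clickEls.length * (clickWaitMs + 2000);
  return Math.max(
    75000,
    timeout
      + ((config.delay || 0) + interactionOverheadMs) * (1 + reloadCount)
      + clickOverheadMs
      + 30000
  );
}

// Three selectors, default click_wait and timeout, no delay or reloads:
// 35000 + 0 + 3 * (5000 + 2000) + 30000 = 86000
console.log(perUrlTimeoutMs({ click_elements: ['a.movie', 'button.play', '#start'] }, 0, 0)); // 86000
```

Without any `click_elements` the same inputs give 35000 + 30000 = 65000, so the 75000 ms floor applies; the click chain is what pushes the budget above it.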
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
 {
   "name": "@fanboynz/network-scanner",
-  "version": "3.
+  "version": "3.4.0",
   "description": "A Puppeteer-based network scanner for analyzing web traffic, generating adblock filter rules, and identifying third-party requests. Features include fingerprint spoofing, Cloudflare bypass, content analysis with curl/grep, and multiple output formats.",
   "main": "nwss.js",
   "scripts": {