@fanboynz/network-scanner 3.2.0 → 3.4.0
- package/CHANGELOG.md +44 -0
- package/README.md +40 -4
- package/lib/dns.js +117 -7
- package/lib/fingerprint.js +39 -36
- package/lib/interaction.js +151 -0
- package/lib/nettools.js +7 -4
- package/lib/openvpn_vpn.js +8 -0
- package/lib/validate_rules.js +3 -3
- package/lib/wireguard_vpn.js +8 -0
- package/nwss.1 +46 -6
- package/nwss.js +449 -89
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,50 @@

 All notable changes to the Network Scanner (nwss.js) project.

+## [3.4.0] - 2026-06-13
+
+### Added
+- **`redirect_first_party` site option** (default `true`) — by default a redirect's destination domains (and chain hops) are registered first-party so the landed site's own resources aren't captured as third-party. Set `false` to keep redirect targets **third-party**, so `filterRegex`/`dig` apply to them under `thirdParty: true` — e.g. capturing the end domain of an ad/cloak redirect chain (which Chrome reaches via the `ERR_TOO_MANY_REDIRECTS` curl-resolve recovery). The originally-scanned domain stays first-party either way.
+- **`ERR_TOO_MANY_REDIRECTS` is recovered, not hard-failed** — a redirect-cloaking chain (rotating throwaway domains) can exceed Chrome's ~20-hop ceiling. The scanner now recovers via two complementary paths: **(1)** it first waits briefly for the browser to *ride through* on its own — a JS/meta hop on a committed page resets Chrome's hop counter and often carries the page to the end site for free; **(2)** if the page parked on `chrome-error://` instead **and the site has `curl: true`**, it resolves the chain endpoint with `curl` (which, unlike headless Chrome, isn't served the endless-loop variant) and navigates there directly — a short hop that lands on the real end site. Either way it captures the end site's ad/tracker requests; falls back to the captured chain requests when neither path lands. The curl step is opt-in via the existing `curl` site option (so `curl: false`/unset never shells out to curl) and is also **skipped under a proxy/VPN** (curl runs direct and would leak the real IP / resolve from the wrong network); the free ride-through always applies.
+- **`click_elements` site option** — after a page loads, click a list of CSS selectors **in order** (searched across the main frame and any iframe) (e.g. `["a[href*='/movie/']", ".play"]` to click a movie link then a play button). Reaches content via organic navigation/gesture instead of a direct deep-load, which some sites JS-redirect away, and triggers click-only content like video players. Each selector is `waitForSelector`-ed (visible) up to `click_wait` before clicking, so JS-rendered targets like video players aren't missed by racing ahead of them. The request interceptor stays attached, so the post-click page's requests run through the same `filterRegex`/`dig` matching; a click that navigates is followed and later selectors query the resulting page. Honors `realistic_click` (genuine trusted gesture) and `cursor_mode: "ghost"` (Bezier travel to the element); missing elements are skipped and never fail the scan. Settle/nav wait per click via `click_wait` (default 5000ms, capped at half the per-URL timeout).
+- **`--dns` now also pins Chrome's page-navigation resolver via DoH.** Chrome ignores `--dns` for navigation and reads `/etc/resolv.conf` directly, so a broken or filtering system resolver could `ERR_NAME_NOT_RESOLVED` a domain the pre-check had already resolved. When the `--dns` servers map to a known public DoH provider — **Google, Cloudflare, Quad9, OpenDNS, AdGuard, CleanBrowsing, DNS.SB, Mullvad** (incl. malware/family/unfiltered variants) — Chrome is launched with secure-DNS `automatic` mode pointed at that provider, so page navigation resolves through the same resolver as the pre-check. `automatic` (not `secure`) keeps a system-DNS fallback if DoH is unreachable rather than failing the batch. **Applied to direct connections only** — skipped when a proxy (`--proxy-server`) or VPN is active, since the exit/tunnel does the resolution and local DoH would be redundant or resolve geo-split domains to the wrong region. Unmapped resolvers (custom/ISP, per-account providers like NextDNS, IPv6) fall back to system DNS with a warning naming the supported providers.
+- **`--doh-disable`** site/CLI option (`doh_disable` in `.nwssconfig`), default off — opt out of the Chrome-navigation DoH pinning entirely. Chrome then resolves page navigation via the system `resolv.conf` even when `--dns` maps to a known provider, while the pre-check and `dig` still honor `--dns`. For networks where DoH adds latency or is blocked, or when system-path resolution is specifically wanted.
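As a rough illustration of how the new site options listed above might sit together, the sketch below writes one site entry as a JavaScript object literal and applies the `click_wait` cap the changelog describes (default 5000ms, capped at half the per-URL timeout). Only the option names come from the changelog and README; the `url` key, the `filterRegex` value, the `timeout` figure, and the `effectiveClickWait` helper are assumptions made for illustration and are not taken from the package.

```javascript
// Hypothetical site entry. Option names are from the changelog; values are placeholders.
const site = {
  url: 'https://example.com/',            // assumed key name for the scanned URL
  filterRegex: 'ads?\\.',                 // placeholder pattern
  thirdParty: 1,
  redirect_first_party: false,            // keep redirect targets third-party
  curl: true,                             // opts in to the curl endpoint recovery
  click_elements: ["a[href*='/movie/']", '.play'],
  click_wait: 5000,
  timeout: 8000,                          // assumed per-URL timeout in ms
};

// Assumed helper mirroring "default 5000ms, capped at half the per-URL timeout".
function effectiveClickWait(siteConfig) {
  const requested = Number.isFinite(siteConfig.click_wait) ? siteConfig.click_wait : 5000;
  return Math.min(requested, Math.floor(siteConfig.timeout / 2));
}

console.log(effectiveClickWait(site));
```

With the assumed 8000ms timeout the requested 5000ms wait is reduced to half the timeout; with a longer timeout the configured value would pass through unchanged.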
+
+### Changed
+- **A clamped `delay` is now logged (`--debug`)** — when `delay` exceeds its ceiling (the default 2s cap, or `timeout/2` under `delay_uncapped: true`) it was silently reduced, so `delay: 48000` quietly running as 29000ms looked like the flag was ignored. A debug line now reports the clamp and which ceiling applied (raise `timeout`, or set `delay_uncapped: true`, to lift it). The per-URL budget already reserves the full configured `delay`; this only surfaces the post-load dwell clamp.
+- **DNS pre-check is paced and more tolerant under concurrency** — a concurrent scan fired up to `max_concurrent` simultaneous c-ares UDP queries at the pinned `--dns` servers; the burst (rough on WSL2's UDP-through-NAT path, and rate-limited by public resolvers) produced timeouts / `EREFUSED` that tripped the circuit breaker (`resolver errors N/M — suspending DNS pre-check`) and lost the dead-host-skip optimization. The pre-check timeout is raised 2s → 4s (a clean NXDOMAIN still returns fast, so the higher ceiling only costs time when the resolver is genuinely slow), and `createRotatingResolver` now caps in-flight queries with a counting semaphore (default 6) so the burst is paced and excess callers queue and drain quickly. The circuit breaker itself is unchanged — these reduce the error rate so it stops tripping on healthy resolvers.
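The `delay` clamp described in the first entry above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the package's code: the `clampDelay` name, the returned shape, and the log wording are hypothetical, and the 58000ms timeout is chosen only so that the numbers match the changelog's own example (48000 reduced to 29000).

```javascript
const DEFAULT_DELAY_CAP_MS = 2000; // "the default 2s cap" from the changelog

// Returns the dwell actually used and which ceiling applied (null when not clamped).
function clampDelay({ delay, timeout, delay_uncapped = false }) {
  const ceiling = delay_uncapped ? Math.floor(timeout / 2) : DEFAULT_DELAY_CAP_MS;
  const label = delay_uncapped ? 'timeout/2' : 'default 2s cap';
  if (delay > ceiling) {
    return { effective: ceiling, clampedBy: label };
  }
  return { effective: delay, clampedBy: null };
}

// Assumed timeout of 58000ms so that timeout/2 equals the 29000ms in the changelog.
const clampResult = clampDelay({ delay: 48000, timeout: 58000, delay_uncapped: true });
if (clampResult.clampedBy) {
  console.log(`[debug] delay 48000ms clamped to ${clampResult.effective}ms (${clampResult.clampedBy})`);
}
```

Reporting which ceiling applied is the useful part: it tells the user whether raising `timeout` or setting `delay_uncapped: true` is the relevant fix.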
+
+### Fixed
+- **`whois` availability probe is now platform-aware** — the fallback used `which whois` (Unix-only), which on native Windows would false-negative an installed `whois.exe` whose `whois --version` errors (e.g. Sysinternals whois). Uses `where` on Windows, `which` elsewhere. No change on Linux/macOS/WSL.
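A platform-aware lookup like the one described above might look roughly like the following. The function names are hypothetical and the package's actual probe may be structured differently; the sketch only shows the `where` versus `which` choice and treats a zero exit status as "found".

```javascript
const { spawnSync } = require('node:child_process');

// Pick the PATH-lookup tool for the platform: `where` on Windows, `which` elsewhere.
function lookupCommand(platform = process.platform) {
  return platform === 'win32' ? 'where' : 'which';
}

// Hypothetical probe: true when the lookup tool exits 0 for the binary name.
// If the lookup tool itself is missing, spawnSync reports a null status, so this returns false.
function isOnPath(binary, platform = process.platform) {
  const result = spawnSync(lookupCommand(platform), [binary], { stdio: 'ignore' });
  return result.status === 0;
}

console.log(`whois available: ${isOnPath('whois')}`);
```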
+
+## [3.3.0] - 2026-06-06
+
+### Added
+- **DNS dead-domain skip + corroborated persistence** — within a scan, once a host resolves NXDOMAIN/ENODATA it is remembered and repeat URLs on that host are skipped without re-resolving. With `--dns-cache`, a host that *also* fails navigation (`ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE`) is corroborated and persisted to the negative cache (`.dnsnegcache`, 12h TTL) so it is skipped on the next run too. Only definitive non-existence is cached — resolver errors fail open and never poison a live host.
+- **`acceptInsecureCerts` on browser launch** — TLS/cert errors (expired, self-signed, name-mismatch) no longer abort navigation, so streaming/pirate domains with broken certs are still scanned.
+- **`--disable-popup-blocking` when a site uses `capture_popups`** — Chrome's pop-up blocker (`chrome://settings/content/popups`) is turned off only for popup-capture scans, so non-gesture popunders (document-level `onclick` / timer SDKs) fire and get captured too. Non-popup scans keep the blocker on (stealthier — a real browser blocks non-gesture `window.open()`); gesture-triggered popups already worked via the synthetic-click path.
+
+### Changed
+- **The main-frame document is never blocked** — the scanned page (and any main-frame redirect target) is exempt from adblock / `blocked` / `blockDomainsByUrl` aborts. Aborting it made the navigation never commit (`about:blank` → timeout), silently breaking scanned URLs that matched our own filter lists (common on adult/pirate/stream domains). The request still flows through the matcher, so a main-frame redirect destination (e.g. a filecrypt → ad-domain hop) is still captured; sub-frame / ad iframes stay blockable.
+- **Navigation timeouts are recovered, not discarded** — on a nav timeout the scanner retries leniently and proceeds with the partially-loaded page instead of dropping the URL (a page still at `about:blank` is still treated as a failure).
+- **whois disk-cache TTL raised to 36h** (dig stays 20h) — registrar data is stable and whois servers rate-limit aggressively, so a longer TTL cuts repeat queries; dig keeps its 20h TTL.
+- **VPN is Linux-only with a clear guard** — `vpn` / `openvpn` on macOS/Windows now returns an explicit "Linux-only" error instead of cryptic `ip` / `/proc` failures.
+
+### Performance
+- **`psl.parse` memoized by hostname** in the request hot path — both per-request handlers (main page + popup capture) parsed the root domain of *every* request, while a page hammers the same handful of hosts (CDN, analytics, ad domains). A hostname-keyed memo turns almost all of those into `Map` hits, replacing the URL-keyed cache (fewer + shorter keys, far higher hit rate).
+- **Lower per-request overhead** — the iframe-loop guard's `frame().url()` lookup is now gated behind a cheap URL string test instead of running on every request.
+- **Removed redundant disk I/O** — a leaked adblock combined-list temp file in `tmpdir` is now cleaned up, and a redundant `existsSync` before each forced screenshot's recursive `mkdir` was dropped.
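The hostname-keyed memo from the first Performance entry above can be illustrated without the `psl` dependency. In this sketch the parser is injected, and `naiveParse` is a stand-in that simply keeps the last two labels; it is not a public-suffix parser and is used only to count how often the underlying parse runs. The names `memoizeByHostname` and `rootDomainFor` are assumptions for illustration.

```javascript
// Wrap any hostname -> root-domain parser with a hostname-keyed Map memo.
function memoizeByHostname(parseRootDomain) {
  const cache = new Map();
  return function rootDomainFor(hostname) {
    if (cache.has(hostname)) return cache.get(hostname);
    const root = parseRootDomain(hostname);
    cache.set(hostname, root);
    return root;
  };
}

// Stand-in parser (NOT psl): keeps the last two labels and counts calls.
let parseCalls = 0;
const naiveParse = (hostname) => {
  parseCalls++;
  return hostname.split('.').slice(-2).join('.');
};

const rootDomainFor = memoizeByHostname(naiveParse);
const requestUrls = [
  'https://cdn.example.com/a.js',
  'https://cdn.example.com/b.js?x=1',
  'https://cdn.example.com/c.css',
  'https://ads.tracker.net/pixel',
];
const roots = requestUrls.map((u) => rootDomainFor(new URL(u).hostname));
console.log(roots, parseCalls);
```

Four distinct URLs collapse to two hostnames, so the parser runs twice. A URL-keyed cache would have missed on all four, which is the hit-rate difference the changelog points at. A long-running scanner would probably also want to bound the map's size.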
+
+### Fixed
+- **Periodic debug/`--dumpurls` log flush is now synchronous** — the 2s timer used async `fs.writeFile({flag:'a'})` with no in-flight guard, so two ticks could append to the same file concurrently and interleave lines, and it cleared the buffer *before* the write confirmed (silently dropping entries on a failed write). It now uses `appendFileSync`, clears only after a successful write (transient failures retry next tick), and is bounded so a permanently-unwritable path can't grow memory.
+- **Dead-domain skip works without `--show-dead-domains`** — the in-scan skip recorded into the dead set only when the report flag was on, which made the skip dead code; recording is now unconditional and the flag gates only the end-of-scan report. Transient DNS errors were also dropped from the dead-domain match so only `ERR_NAME_NOT_RESOLVED` / `ERR_ADDRESS_UNREACHABLE` mark a host dead.
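The three properties of the log-flush fix above (synchronous append, clear only after success, bounded buffer) can be sketched as follows. This is an illustration under assumptions, not the package's implementation: `createLogBuffer`, `MAX_BUFFERED_LINES` and its value, and the temp-file name are all hypothetical.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const MAX_BUFFERED_LINES = 10000; // assumed bound, not the package's value

function createLogBuffer(filePath) {
  let buffer = [];
  return {
    push(line) {
      buffer.push(line);
      // Bound memory: drop the oldest lines if the path stays unwritable.
      if (buffer.length > MAX_BUFFERED_LINES) {
        buffer = buffer.slice(buffer.length - MAX_BUFFERED_LINES);
      }
    },
    // Returns true when the pending lines were written (or nothing was pending).
    flush() {
      if (buffer.length === 0) return true;
      try {
        fs.appendFileSync(filePath, buffer.join('\n') + '\n');
        buffer = []; // clear only after the write succeeded
        return true;
      } catch (err) {
        return false; // keep the lines so the next tick can retry
      }
    },
    pending() {
      return buffer.length;
    },
  };
}

const logFile = path.join(os.tmpdir(), `nwss-flush-demo-${process.pid}-${Date.now()}.log`);
const log = createLogBuffer(logFile);
log.push('first');
log.push('second');
log.flush();
```

In a scanner this would typically be driven by a timer, for example `setInterval(() => log.flush(), 2000)`. Because the append is synchronous, two ticks cannot overlap and interleave their lines.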
+
+### Removed
+- **Hardcoded `dmzjmp` iframe-loop guard** — the domain-specific abort for a `creative.dmzjmp.com` frame requesting `go.dmzjmp.com/api/models` (added mid-2025 to stop a runaway request loop) has not recurred and was removed from the request hot path; the per-URL timeout remains the backstop. Recoverable from git history — prefer a config-driven `iframe_loop_guards` entry if it ever returns.
+
+### Documentation
+- **README + man page now document `--block-ads` and `--adblock-engine`** — blocking ads/trackers *during* the scan with EasyList-format list(s) (comma-separated), and the `js` (default, native parser) vs `rust` (Brave `adblock-rs`) matcher backends.
+
 ## [3.2.0] - 2026-06-04

 ### Added
package/README.md
CHANGED
@@ -66,9 +66,10 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--use-puppeteer-core` | Use `puppeteer-core` with system Chrome instead of bundled Chromium |
 | `--use-obscura` | Connect to running Obscura CDP server (`ws://127.0.0.1:9222` or `OBSCURA_WS` env). Skips fingerprint injection — Obscura provides built-in stealth |
 | `--load-extension <path>` | Load unpacked Chrome extension from directory (can be used multiple times) |
-| `--dns-cache` | Persist dig/whois results to disk between runs (20hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
-| `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the
-| `--block-ads=<files>` | Block ads using EasyList
+| `--dns-cache` | Persist dig/whois results to disk between runs (dig 20hr / whois 36hr TTL, 2000-entry cap each, `.digcache`/`.whoiscache`), **plus** the DNS pre-check negative cache (NXDOMAIN/ENODATA only — never resolver errors — 12h TTL, `.dnsnegcache`) so known-dead hosts aren't re-resolved next run. Disk writes are atomic (tmp + rename); corrupt cache files are detected on load with a `[dns-cache]` warn line and reset cleanly. |
+| `--no-dns-precheck` | Disable per-URL DNS resolution check before page navigation. By default, hosts that dig/whois have already proven live (within the dig/whois cache TTL) skip their c-ares pre-check via a positive-resolution index. |
+| `--block-ads=<files>` | Block ads/trackers **during the scan** using EasyList-format filter list(s) (`\|\|domain^`, `/ads/*`, etc.). Comma-separated for multiple: `--block-ads=easylist.txt,easyprivacy.txt`. See [Blocking ads during the scan](#blocking-ads-during-the-scan). |
+| `--adblock-engine=<js\|rust>` | Matcher backend for `--block-ads` (default: `js`). `rust` uses Brave's `adblock-rs` (much faster on large lists) and requires `npm i adblock-rs`. |
 | `--cdp` | Enable Chrome DevTools Protocol logging (now per-page if enabled) |
 | `--remove-dupes` | Remove duplicate domains from output (only with `-o`) |
 | `--dry-run` | Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules |
@@ -76,7 +77,8 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--help`, `-h` | Show this help menu |
 | `--version` | Show script version |
 | `--max-concurrent <number>` | Maximum concurrent site processing (1-50, overrides config/default) |
-| `--dns <ip[,ip,...]>` | Resolver(s) for the DNS pre-check **and**
+| `--dns <ip[,ip,...]>` | Resolver(s) for the DNS pre-check, nettools' `dig`, **and** — when they map to a known public DoH provider — Chrome's page navigation via DNS-over-HTTPS on direct connections (one pins, several rotate per query; overrides `/etc/resolv.conf`; does not affect `whois`). Chrome normally ignores `--dns` and reads `resolv.conf` directly, so a broken system resolver could fail a domain the pre-check already resolved; pinning Chrome's secure-DNS (`automatic` mode) to the matching provider closes that gap. Mapped providers: Google, Cloudflare, Quad9, OpenDNS, AdGuard, CleanBrowsing, DNS.SB, Mullvad (incl. malware/family/unfiltered variants). **Skipped under a proxy/VPN** (the exit/tunnel resolves); unmapped resolvers (custom/ISP, per-account, IPv6) fall back to system DNS with a warning |
+| `--doh-disable` | Opt out of the Chrome-navigation DoH pinning (default: off). Chrome resolves page navigation via system `resolv.conf` even when `--dns` maps to a known provider; the pre-check and `dig` still honor `--dns`. Use when DoH adds latency, is blocked on the network, or you want system-path resolution |
 | `--show-dead-domains` | At end of scan, list hostnames that did not resolve / were unreachable (`NXDOMAIN`/`ENODATA` + `ERR_NAME_NOT_RESOLVED`/`ERR_ADDRESS_UNREACHABLE`). Excludes blocks/timeouts (those mean the domain is alive). For pruning dead URLs. |
 | `--cleanup-interval <number>` | Browser restart interval in URLs processed (1-1000, overrides config/default) |

@@ -92,6 +94,37 @@ A Puppeteer-based tool for scanning websites to find third-party (or optionally
 | `--clear-cache` | Clear persistent cache before scanning (improves fresh start performance) |
 | `--ignore-cache` | Bypass all smart caching functionality during scanning |

+### Blocking ads during the scan
+
+`--block-ads` makes the scanner **block** matching requests *during* the scan (separate from capturing rules) — to keep ad/tracker noise out of the page, speed up loads, or test that a list catches what it should.
+
+**Adding lists.** Pass one or more EasyList-format filter lists (same syntax as uBlock Origin / EasyList):
+
+```bash
+# Single list
+node nwss.js --block-ads=easylist.txt
+
+# Multiple lists — comma-separated, no spaces
+node nwss.js --block-ads=easylist.txt,easyprivacy.txt,mylist.txt
+```
+
+Lists are plain-text **network** rules — `||doubleclick.net^`, `/ads/*`, `||example.com^$script`, etc. Element-hiding/cosmetic rules (`##…`) don't apply to request blocking and are ignored. The scanned page's own top-level document is never blocked (only sub-resources), so a site whose own domain is in a list still loads.
+
+**Engine — `js` vs `rust`** (`--adblock-engine`, default `js`):
+
+| Engine | Flag | Backend | When |
+|---|---|---|---|
+| **js** (default) | `--adblock-engine=js` | `lib/adblock.js` — pure-JS, no extra deps | Default; fine for small/medium lists, works everywhere |
+| **rust** | `--adblock-engine=rust` | `lib/adblock-rust.js` — Brave's [`adblock-rs`](https://github.com/brave/adblock-rust) | Large lists (full EasyList + EasyPrivacy + …); much faster matching. Drop-in (same rules, same results). Requires `npm install adblock-rs` (needs a Rust toolchain) |
+
+The two engines are interchangeable — same rule format, same blocking result; `rust` is purely a speed option for big lists. If you pass `--adblock-engine=rust` without `adblock-rs` installed, install it (`npm i adblock-rs`) or drop the flag to use `js`.
+
+```bash
+# Fast matching over big lists with the Rust engine
+npm install adblock-rs
+node nwss.js --block-ads=easylist.txt,easyprivacy.txt --adblock-engine=rust
+```
+
 ---

 ## config.json Format
@@ -165,6 +198,7 @@ Example:
 | `interact` | `true` or `false` | `false` | Simulate user interaction (hover, click) |
 | `firstParty` | `0` or `1` | `0` | Match first-party requests |
 | `thirdParty` | `0` or `1` | `1` | Match third-party requests |
+| `redirect_first_party` | Boolean | `true` | Whether redirect-destination domains count as first-party. Set `false` to keep redirect targets (and chain hops) **third-party**, so `filterRegex`/`dig` can match them under `thirdParty:true` — e.g. capturing the end domain of an ad/cloak redirect. The originally-scanned domain stays first-party either way. Note: this also un-excludes the redirect target's own same-domain resources (more captured) |
 | `subDomains` | `0` or `1` | `0` | 1 = preserve subdomains in output |
 | `blocked` | Array | - | Domains or regexes to block during scanning |
 | `even_blocked` | Boolean | `false` | Add matching rules even if requests are blocked |
@@ -284,6 +318,8 @@ When a page redirects to a new domain, first-party/third-party detection is base
 | `interact_click_count` | Integer | `3` | Number of random content-zone clicks per load (capped at 20). Default 3 = primary + 2 backups, since ad SDKs sometimes suppress the 1st/2nd click as warmup |
 | `realistic_click` | Boolean | `false` | Higher click fidelity: denser mouse approach (15 steps), ±1px hand-tremor micro-moves during the press, and ±1.5px mouseup drift (so mousedown≠mouseup coords) — for sites that score click realism. Costs ~80–120ms/click |
 | `interact_typing` | Boolean | `false` | Enable typing simulation |
+| `click_elements` | String[] | - | After load, click these CSS selectors **in order**, searched across the **main frame and any iframe** — reach content via organic navigation/gesture instead of a direct load (e.g. `["a[href*='/movie/']", ".play"]` to click a link then a play button). The request interceptor stays attached, so the post-click page's requests are matched against `filterRegex`/`dig` as usual. A click that navigates is followed; later selectors query the resulting page. Honors `realistic_click` and `cursor_mode: "ghost"` (Bezier travel to the element); missing elements are skipped (never fails the scan) |
+| `click_wait` | Integer | `5000` | Per click: max time (ms) to wait for the element to appear/be visible (`waitForSelector`) **and** the settle/navigation wait after it; capped at half the per-URL timeout |
 | `interact_intensity` | String | `"medium"` | Interaction simulation intensity: "low", "medium", "high" |
 | `cursor_mode` | `"ghost"` | - | Use ghost-cursor Bezier mouse movements (requires `npm i ghost-cursor`) |
 | `ghost_cursor_speed` | Number | auto | Ghost-cursor movement speed multiplier |
package/lib/dns.js
CHANGED
@@ -117,6 +117,29 @@ function createRotatingResolver(opts = {}) {
   // distribution stays exactly even (no locking needed).
   const nextResolver = () => (pool ? pool[cursor++ % pool.length] : dnsPromises);

+  // Concurrency cap (counting semaphore). The scanner runs up to max_concurrent
+  // navigations, each firing a pre-check; bursting that many simultaneous c-ares
+  // UDP queries at one resolver provokes timeouts / EREFUSED rate-limiting (and
+  // is rough on WSL2's UDP-through-NAT path), which then trips the pre-check
+  // circuit breaker. Capping in-flight queries paces the burst so the resolver
+  // can keep up. Excess callers queue and drain quickly (resolutions are short).
+  const maxInFlight = Math.max(1, opts.maxConcurrent || 6);
+  let inFlight = 0;
+  const waiters = [];
+  function acquire() {
+    return new Promise(resolve => {
+      if (inFlight < maxInFlight) { inFlight++; resolve(); }
+      else waiters.push(resolve);
+    });
+  }
+  function release() {
+    inFlight--;
+    if (waiters.length > 0 && inFlight < maxInFlight) {
+      inFlight++;
+      waiters.shift()();
+    }
+  }
+
   // One resolution attempt: rotate the lead server, resolve4 first, and on
   // no-IPv4 (ENODATA/ENOTFOUND) fall back to resolve6 so IPv6-only hosts aren't
   // wrongly skipped. Any OTHER code propagates unchanged so the caller sees the
@@ -147,16 +170,21 @@ function createRotatingResolver(opts = {}) {
    * if one REFUSES, the retry hits another).
    */
   async function resolveHost(hostname, timeoutMs) {
+    await acquire();
     try {
-
-    } catch (firstErr) {
-      const code = firstErr && firstErr.code;
-      if (DNS_TRANSIENT_ERRORS.has(code) || (firstErr && firstErr.message === 'DNS timeout')) {
-        if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check transient (${code || 'timeout'}) for ${hostname}, retrying once`));
+      try {
         await attempt(hostname, timeoutMs);
-      }
-
+      } catch (firstErr) {
+        const code = firstErr && firstErr.code;
+        if (DNS_TRANSIENT_ERRORS.has(code) || (firstErr && firstErr.message === 'DNS timeout')) {
+          if (forceDebug) console.log(formatLogMessage('debug', `DNS pre-check transient (${code || 'timeout'}) for ${hostname}, retrying once`));
+          await attempt(hostname, timeoutMs);
+        } else {
+          throw firstErr;
+        }
       }
+    } finally {
+      release();
     }
   }

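The `acquire()`/`release()` pair added to `createRotatingResolver` above is a counting semaphore, and its pacing effect can be seen in isolation. The sketch below lifts the same logic into a standalone factory and runs ten simulated lookups with a cap of three. `createSemaphore`, `fakeLookup`, and the 5ms timer are assumptions for illustration; no DNS query is made.

```javascript
// Counting semaphore in the same shape as the acquire()/release() pair in the hunk.
function createSemaphore(maxInFlight) {
  let inFlight = 0;
  const waiters = [];
  const acquire = () => new Promise((resolve) => {
    if (inFlight < maxInFlight) { inFlight++; resolve(); }
    else waiters.push(resolve);
  });
  const release = () => {
    inFlight--;
    if (waiters.length > 0 && inFlight < maxInFlight) {
      inFlight++;
      waiters.shift()(); // hand the freed slot straight to the oldest waiter
    }
  };
  return { acquire, release };
}

// Demo: 10 simulated lookups, at most 3 running at once.
async function demo() {
  const sem = createSemaphore(3);
  let running = 0;
  let peak = 0;
  const fakeLookup = async (i) => {
    await sem.acquire();
    try {
      running++;
      peak = Math.max(peak, running);
      await new Promise((r) => setTimeout(r, 5)); // stand-in for a DNS query
      return i;
    } finally {
      running--;
      sem.release(); // always release, mirroring the try/finally in resolveHost
    }
  };
  const results = await Promise.all(Array.from({ length: 10 }, (_, i) => fakeLookup(i)));
  return { peak, results };
}

const demoResult = demo();
demoResult.then(({ peak }) => console.log(`peak in-flight: ${peak}`));
```

The `try`/`finally` around the work matters as much as the semaphore itself: a lookup that throws without releasing would permanently shrink the pool, which is presumably why `resolveHost` wraps both the first attempt and the retry in one `finally`.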
@@ -230,9 +258,91 @@ function createDnsCircuitBreaker(opts = {}) {
   };
 }

+// Map well-known public resolver IPs (IPv4) to their DNS-over-HTTPS (DoH)
+// endpoint templates. Chrome's page-navigation resolver ignores --dns and reads
+// /etc/resolv.conf directly; pointing Chrome's DoH at the same provider the
+// --dns pre-check uses closes that gap, so a broken or filtering system
+// resolv.conf can't fail navigations the pre-check already passed.
+//
+// Only providers whose DoH endpoint is derivable from a fixed anycast IP are
+// listed. Resolvers with per-account templates (NextDNS, ControlD, AdGuard
+// personal) can't be mapped from an IP and fall through to `unmapped` — Chrome
+// keeps system DNS for those (a warning is logged). IPv6 is intentionally not
+// mapped here; an IPv6 --dns server simply falls back to system DNS.
+const DOH_PROVIDER_TEMPLATES = {
+  // Google
+  '8.8.8.8': 'https://dns.google/dns-query',
+  '8.8.4.4': 'https://dns.google/dns-query',
+  // Cloudflare — standard / malware-blocking / malware+adult
+  '1.1.1.1': 'https://cloudflare-dns.com/dns-query',
+  '1.0.0.1': 'https://cloudflare-dns.com/dns-query',
+  '1.1.1.2': 'https://security.cloudflare-dns.com/dns-query',
+  '1.0.0.2': 'https://security.cloudflare-dns.com/dns-query',
+  '1.1.1.3': 'https://family.cloudflare-dns.com/dns-query',
+  '1.0.0.3': 'https://family.cloudflare-dns.com/dns-query',
+  // Quad9 — secured (default) / unsecured / secured+ECS
+  '9.9.9.9': 'https://dns.quad9.net/dns-query',
+  '149.112.112.112': 'https://dns.quad9.net/dns-query',
+  '9.9.9.10': 'https://dns10.quad9.net/dns-query',
+  '149.112.112.10': 'https://dns10.quad9.net/dns-query',
+  '9.9.9.11': 'https://dns11.quad9.net/dns-query',
+  '149.112.112.11': 'https://dns11.quad9.net/dns-query',
+  // OpenDNS — standard / FamilyShield
+  '208.67.222.222': 'https://doh.opendns.com/dns-query',
+  '208.67.220.220': 'https://doh.opendns.com/dns-query',
+  '208.67.222.123': 'https://doh.familyshield.opendns.com/dns-query',
+  '208.67.220.123': 'https://doh.familyshield.opendns.com/dns-query',
+  // AdGuard DNS — default (ad/tracker block) / non-filtering / family
+  '94.140.14.14': 'https://dns.adguard-dns.com/dns-query',
+  '94.140.15.15': 'https://dns.adguard-dns.com/dns-query',
+  '94.140.14.140': 'https://unfiltered.adguard-dns.com/dns-query',
+  '94.140.14.141': 'https://unfiltered.adguard-dns.com/dns-query',
+  '94.140.14.15': 'https://family.adguard-dns.com/dns-query',
+  '94.140.15.16': 'https://family.adguard-dns.com/dns-query',
+  // CleanBrowsing — security / family / adult
+  '185.228.168.9': 'https://doh.cleanbrowsing.org/doh/security-filter/',
+  '185.228.169.9': 'https://doh.cleanbrowsing.org/doh/security-filter/',
+  '185.228.168.168': 'https://doh.cleanbrowsing.org/doh/family-filter/',
+  '185.228.169.168': 'https://doh.cleanbrowsing.org/doh/family-filter/',
+  '185.228.168.10': 'https://doh.cleanbrowsing.org/doh/adult-filter/',
+  '185.228.169.11': 'https://doh.cleanbrowsing.org/doh/adult-filter/',
+  // DNS.SB
+  '185.222.222.222': 'https://doh.dns.sb/dns-query',
+  '45.11.45.11': 'https://doh.dns.sb/dns-query',
+  // Mullvad (non-filtering)
+  '194.242.2.2': 'https://dns.mullvad.net/dns-query',
+};
+
+/**
+ * Resolve a list of --dns resolver specs to Chrome DoH templates.
+ * Strips any :port (DoH is always 443) and dedupes. Returns the
+ * space-joined template string Chrome's --dns-over-https-templates wants,
+ * plus which inputs were mapped vs had no known DoH endpoint.
+ * @param {string[]} servers - resolver IPs (optionally ip:port) from --dns
+ * @returns {{ templates: string, mapped: string[], unmapped: string[] }}
+ */
+function dohTemplatesForResolvers(servers) {
+  const templates = [];
+  const mapped = [];
+  const unmapped = [];
+  for (const raw of (servers || [])) {
+    const ip = String(raw).trim().replace(/:\d+$/, ''); // drop :port — DoH is 443
+    if (!ip) continue;
+    const tpl = DOH_PROVIDER_TEMPLATES[ip];
+    if (tpl) {
+      if (!templates.includes(tpl)) templates.push(tpl);
+      mapped.push(ip);
+    } else {
+      unmapped.push(ip);
+    }
+  }
+  return { templates: templates.join(' '), mapped, unmapped };
+}
+
 module.exports = {
   createRotatingResolver,
   createDnsCircuitBreaker,
   parseDnsServers,
   isNonExistenceError,
+  dohTemplatesForResolvers,
 };
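To show what `dohTemplatesForResolvers` returns for a mixed `--dns` list, the sketch below reruns the same logic against three entries copied from `DOH_PROVIDER_TEMPLATES`. The names `DOH_SUBSET` and `dohTemplatesFor` are hypothetical, and `192.0.2.53` is a documentation-range address standing in for a custom resolver. How the resulting template string is passed to Chrome is not shown in this hunk and is left out here.

```javascript
// Three entries copied from DOH_PROVIDER_TEMPLATES; the real table is longer.
const DOH_SUBSET = {
  '8.8.8.8': 'https://dns.google/dns-query',
  '8.8.4.4': 'https://dns.google/dns-query',
  '1.1.1.1': 'https://cloudflare-dns.com/dns-query',
};

// Same logic as dohTemplatesForResolvers, run against the subset above.
function dohTemplatesFor(servers, table = DOH_SUBSET) {
  const templates = [];
  const mapped = [];
  const unmapped = [];
  for (const raw of servers || []) {
    const ip = String(raw).trim().replace(/:\d+$/, ''); // strip a trailing :port
    if (!ip) continue;
    const tpl = table[ip];
    if (tpl) {
      if (!templates.includes(tpl)) templates.push(tpl); // dedupe shared endpoints
      mapped.push(ip);
    } else {
      unmapped.push(ip);
    }
  }
  return { templates: templates.join(' '), mapped, unmapped };
}

const doh = dohTemplatesFor(['8.8.8.8:53', '8.8.4.4', '1.1.1.1', '192.0.2.53']);
console.log(doh);

// An IPv6 input is unmapped, but note what the port-stripping regex does to it.
const v6 = dohTemplatesFor(['2001:4860:4860::8888']);
console.log(v6.unmapped);
```

Both Google addresses share one endpoint, so the template string carries two URLs for three mapped IPs. One detail worth checking against the real caller: the `:port` regex also matches the last group of a bare IPv6 address, so the value reported as unmapped is a shortened string rather than the address the user typed.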
|
package/lib/fingerprint.js
CHANGED
|
@@ -2489,44 +2489,47 @@ async function applyUserAgentSpoofing(page, siteConfig, forceDebug, currentUrl)
|
|
|
2489
2489
|
}, 'enhanced mouse/pointer spoofing');
|
|
2490
2490
|
|
|
2491
2491
|
safeExecute(() => {
|
|
2492
|
-
// Neutralize CDP
|
|
2493
|
-
// CDP's Runtime.enable
|
|
2494
|
-
//
|
|
2495
|
-
//
|
|
2496
|
-
|
|
2497
|
-
|
|
2498
|
-
console.
|
|
2499
|
-
|
|
2500
|
-
|
|
2501
|
-
|
|
2502
|
-
|
|
2503
|
-
|
|
2504
|
-
|
|
2505
|
-
|
|
2506
|
-
|
|
2492
|
+
// Neutralize CDP "inspector reads logged object" traps across the
|
|
2493
|
+
// common console methods. CDP's Runtime.enable serializes console
|
|
2494
|
+
// arguments, reading getters / walking prototypes on logged objects —
|
|
2495
|
+
// disable-devtool-style scripts exploit this (log an object with a
|
|
2496
|
+
// getter, then check if it fired) to detect the inspector and redirect
|
|
2497
|
+
// away. The previous version guarded ONLY console.debug; detectors
|
|
2498
|
+
// overwhelmingly use console.log (and dir/table), so the trap still
|
|
2499
|
+
// fired. Sanitize args so getter/proxy traps can't trigger, and drop
|
|
2500
|
+
// DevTools-protocol noise. Covers log/info/debug/dir/table — the
|
|
2501
|
+
// methods detection uses; warn/error stay native (legit usage, and
|
|
2502
|
+
// not the common vector).
|
|
2503
|
+
const sanitizeArg = (arg) => {
|
|
2504
|
+
// Strip Error objects with custom .stack getters (inspector reads .stack)
|
|
2505
|
+
if (arg instanceof Error) {
|
|
2506
|
+
const desc = Object.getOwnPropertyDescriptor(arg, 'stack');
|
|
2507
|
+
if (desc && desc.get) return `${arg.name}: ${arg.message}`;
|
|
2507
2508
|
}
|
|
2508
|
-
//
|
|
2509
|
-
|
|
2510
|
-
|
|
2511
|
-
|
|
2512
|
-
|
|
2513
|
-
|
|
2514
|
-
|
|
2515
|
-
|
|
2516
|
-
|
|
2517
|
-
|
|
2518
|
-
const proto = Object.getPrototypeOf(arg);
|
|
2519
|
-
if (proto && proto !== Object.prototype && proto !== Array.prototype) {
|
|
2520
|
-
try { Object.keys(proto); } catch { return '[object Object]'; }
|
|
2521
|
-
}
|
|
2522
|
-
} catch { return '[object Object]'; }
|
|
2523
|
-
}
|
|
2524
|
-
return arg;
|
|
2525
|
-
});
|
|
2526
|
-
return originalConsoleDebug.apply(this, sanitized);
|
|
2509
|
+
// Neutralize Proxy/getter prototype traps (inspector walks the chain)
|
|
2510
|
+
if (arg !== null && typeof arg === 'object') {
|
|
2511
|
+
try {
|
|
2512
|
+
const proto = Object.getPrototypeOf(arg);
|
|
2513
|
+
if (proto && proto !== Object.prototype && proto !== Array.prototype) {
|
|
2514
|
+
try { Object.keys(proto); } catch { return '[object Object]'; }
|
|
2515
|
+
}
|
|
2516
|
+
} catch { return '[object Object]'; }
|
|
2517
|
+
}
|
|
2518
|
+
return arg;
|
|
2527
2519
|
};
|
|
2528
|
-
|
|
2529
|
-
|
|
2520
|
+
const DEVTOOLS_NOISE = ['DevTools', 'Runtime.evaluate', 'Page.addScriptToEvaluateOnNewDocument', 'Protocol error'];
|
|
2521
|
+
for (const method of ['log', 'info', 'debug', 'dir', 'table']) {
|
|
2522
|
+
const original = console[method];
|
|
2523
|
+
if (typeof original !== 'function') continue;
|
|
2524
|
+
console[method] = function(...args) {
|
|
2525
|
+
const message = args.join(' ');
|
|
2526
|
+
if (typeof message === 'string' && DEVTOOLS_NOISE.some(n => message.includes(n))) {
|
|
2527
|
+
return;
|
|
2528
|
+
}
|
|
2529
|
+
return original.apply(this, args.map(sanitizeArg));
|
|
2530
|
+
};
|
|
2531
|
+
}
|
|
2532
|
+
}, 'console trap neutralization');
|
|
2530
2533
|
|
|
2531
2534
|
// NOTE: The previous `location URL masking` Proxy was removed.
|
|
2532
2535
|
// It wrapped window.location in a Proxy to return 'about:blank' when
|
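Reviewer sketch for the console-trap hunk above (not part of the package): a minimal, standalone illustration of how the `sanitizeArg` step keeps a getter-based probe from firing. The `sanitizeArg` body mirrors the added lines; `makeBait`, `wrapMethods`, and `fakeConsole` are hypothetical names introduced only for this example, and the `DEVTOOLS_NOISE` filter is left out for brevity.

```javascript
// Copied in spirit from the hunk: replace trap-carrying arguments with inert values.
function sanitizeArg(arg) {
  if (arg instanceof Error) {
    const desc = Object.getOwnPropertyDescriptor(arg, 'stack');
    // A custom .stack getter is the classic "did the inspector read me?" probe.
    if (desc && desc.get) return `${arg.name}: ${arg.message}`;
  }
  if (arg !== null && typeof arg === 'object') {
    try {
      const proto = Object.getPrototypeOf(arg);
      if (proto && proto !== Object.prototype && proto !== Array.prototype) {
        try { Object.keys(proto); } catch { return '[object Object]'; }
      }
    } catch { return '[object Object]'; }
  }
  return arg;
}

// Hypothetical detector bait: an Error whose .stack getter records being read.
function makeBait() {
  const state = { fired: false };
  const bait = new Error('probe');
  Object.defineProperty(bait, 'stack', {
    get() { state.fired = true; return 'fake stack'; },
  });
  return { bait, state };
}

// Hypothetical wrapper: same loop shape as the hunk, applied to a stand-in
// object so the real console is left alone in this sketch.
function wrapMethods(target, methods) {
  for (const method of methods) {
    const original = target[method];
    if (typeof original !== 'function') continue; // e.g. 'info' is absent below
    target[method] = function (...args) {
      return original.apply(this, args.map(sanitizeArg));
    };
  }
}

const seen = [];
const fakeConsole = { log: (...args) => { seen.push(...args); } };
wrapMethods(fakeConsole, ['log', 'info']);

const { bait, state } = makeBait();
fakeConsole.log(bait, { plain: true });
```

In this sketch the underlying `log` only ever receives the string form of the bait, so anything that serializes the logged arguments afterwards has no getter left to trip. Plain objects pass through untouched.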
package/lib/interaction.js
CHANGED
|
@@ -1322,6 +1322,154 @@ function createInteractionConfig(url, siteConfig = {}) {
|
|
|
1322
1322
|
* const { generateRandomCoordinates } = require('./lib/interaction');
|
|
1323
1323
|
* const pos = generateRandomCoordinates(1920, 1080, { preferEdges: true });
|
|
1324
1324
|
*/
|
|
1325
|
+
|
|
1326
|
+
/**
|
|
1327
|
+
* Find the first VISIBLE match for a selector across the main frame AND every
|
|
1328
|
+
* child frame (iframes), polling until found or timeoutMs. page.waitForSelector
|
|
1329
|
+
* only searches the main frame; players/ads commonly live in an iframe, so we
|
|
1330
|
+
* walk page.frames() (main frame first, so a main-frame match wins). "Visible"
|
|
1331
|
+
* is approximated by a non-null boundingBox (rules out display:none/zero-size).
|
|
1332
|
+
* Returns an ElementHandle bound to its frame, or null on timeout.
|
|
1333
|
+
*/
|
|
1334
|
+
async function findVisibleInAnyFrame(page, selector, timeoutMs) {
|
|
1335
|
+
const deadline = Date.now() + Math.max(0, timeoutMs);
|
|
1336
|
+
for (;;) {
|
|
1337
|
+
if (page.isClosed()) return null;
|
|
1338
|
+
const mf = page.mainFrame();
|
|
1339
|
+
const frames = [mf, ...page.frames().filter(f => f !== mf)]; // main frame first
|
|
1340
|
+
for (const frame of frames) {
|
|
1341
|
+
let el = null;
|
|
1342
|
+
try {
|
|
1343
|
+
el = await frame.$(selector);
|
|
1344
|
+
} catch (e) {
|
|
1345
|
+
// An invalid CSS selector (config typo) throws "... is not a valid
|
|
1346
|
+
// selector" — a permanent error, so surface it instead of polling to a
|
|
1347
|
+
// confusing not-found. Frame-detached / cross-origin hiccups have other
|
|
1348
|
+
// messages and fall through to keep polling.
|
|
1349
|
+
if (/is not a valid selector|not a valid or unsupported selector/i.test((e && e.message) || '')) {
|
|
1350
|
+
throw new Error(`invalid CSS selector: ${(e && e.message) || e}`);
|
|
1351
|
+
}
|
|
1352
|
+
el = null;
|
|
1353
|
+
}
|
|
1354
|
+
if (el) {
|
|
1355
|
+
let box = null;
|
|
1356
|
+
try { box = await el.boundingBox(); } catch (_) { box = null; }
|
|
1357
|
+
if (box) return el; // present AND rendered
|
|
1358
|
+
try { await el.dispose(); } catch (_) { /* not rendered yet — keep polling */ }
|
|
1359
|
+
}
|
|
1360
|
+
}
|
|
1361
|
+
if (Date.now() >= deadline) return null;
|
|
1362
|
+
await new Promise(r => setTimeout(r, 250));
|
|
1363
|
+
}
|
|
1364
|
+
}
|
|
1365
|
+
|
|
1366
|
+
/**
|
|
1367
|
+
* Click a list of CSS selectors in order, reaching content via organic
|
|
1368
|
+
* gesture/navigation instead of a direct page load. Each selector's first
|
|
1369
|
+
* match is clicked; if the click navigates (an <a href> / form submit), we wait
|
|
1370
|
+
* for it to commit, otherwise we wait a settle window for in-page actions
|
|
1371
|
+
* (e.g. a player starting). The page's request interceptor stays attached
|
|
1372
|
+
* throughout, so the post-click requests flow into the caller's normal
|
|
1373
|
+
* filterRegex/dig matching — this function only performs the clicks.
|
|
1374
|
+
*
|
|
1375
|
+
* Missing elements are skipped (sites change markup); a click error never
|
|
1376
|
+
* throws out of here. After a navigation, later selectors are queried against
|
|
1377
|
+
* the NEW page (so e.g. "movie link" then "play button" works).
|
|
1378
|
+
*
|
|
1379
|
+
* @param {import('puppeteer').Page} page
|
|
1380
|
+
* @param {string[]} selectors - CSS selectors, clicked in order
|
|
1381
|
+
* @param {object} [options]
|
|
1382
|
+
* @param {boolean} [options.realistic=false] - use humanClick (hover/tremor) vs elementHandle.click
|
|
1383
|
+
* @param {number} [options.waitMs=5000] - per click: max wait for the element to
|
|
1384
|
+
* appear+be visible (waitForSelector), AND the settle/nav window after the click
|
|
1385
|
+
* @param {function} [options.ghostClick] - optional (x,y)=>Promise that performs a
|
|
1386
|
+
* ghost-cursor click (Bezier travel + press) at the element centre. Injected by the
|
|
1387
|
+
* caller so this module needn't depend on ghost-cursor.js (which depends on this one).
|
|
1388
|
+
* When provided it takes precedence over the humanClick/el.click paths.
|
|
1389
|
+
* @param {boolean} [options.forceDebug=false]
|
|
1390
|
+
* @returns {Promise<Array<{selector:string, clicked:boolean, reason?:string}>>}
|
|
1391
|
+
*/
|
|
1392
|
+
async function performTargetedClicks(page, selectors, options = {}) {
|
|
1393
|
+
const { realistic = false, waitMs = 5000, forceDebug = false, ghostClick = null } = options;
|
|
1394
|
+
const results = [];
|
|
1395
|
+
if (!Array.isArray(selectors)) return results;
|
|
1396
|
+
|
|
1397
|
+
for (const raw of selectors) {
|
|
1398
|
+
const selector = typeof raw === 'string' ? raw.trim() : '';
|
|
1399
|
+
if (!selector) continue;
|
|
1400
|
+
if (page.isClosed()) break;
|
|
1401
|
+
|
|
1402
|
+
// Wait for the element to appear AND be visible (up to waitMs), searching
|
|
1403
|
+
// the main frame and every iframe — many targets (video players, lazy
|
|
1404
|
+
// menus, post-consent buttons) are injected by JS after DOMContentLoaded
|
|
1405
|
+
// and/or live inside an iframe, so a single main-frame query would miss
|
|
1406
|
+
// them. Timeout → treat as not-found, skip.
|
|
1407
|
+
let el;
|
|
1408
|
+
try {
|
|
1409
|
+
el = await findVisibleInAnyFrame(page, selector, waitMs);
|
|
1410
|
+
} catch (selErr) {
|
|
1411
|
+
const msg = (selErr && selErr.message) || '';
|
|
1412
|
+
if (/invalid CSS selector|not a valid selector/i.test(msg)) {
|
|
1413
|
+
// Config typo, not a transient miss — warn (visible without --debug).
|
|
1414
|
+
console.warn(formatLogMessage('warn', `${INTERACTION_TAG} click_elements: invalid selector "${selector}" — skipping (${msg})`));
|
|
1415
|
+
results.push({ selector, clicked: false, reason: 'invalid-selector' });
|
|
1416
|
+
} else {
|
|
1417
|
+
// Defensive: findVisibleInAnyFrame swallows transient/detached-frame
|
|
1418
|
+
// errors internally, so this shouldn't fire — but never let an
|
|
1419
|
+
// unexpected find error abort the remaining selectors.
|
|
1420
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: find failed for "${selector}": ${msg} — skipping`));
|
|
1421
|
+
results.push({ selector, clicked: false, reason: msg || 'find-error' });
|
|
1422
|
+
}
|
|
1423
|
+
continue;
|
|
1424
|
+
}
|
|
1425
|
+
if (!el) {
|
|
1426
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: "${selector}" not visible (any frame) within ${waitMs}ms — skipping`));
|
|
1427
|
+
results.push({ selector, clicked: false, reason: 'not-found' });
|
|
1428
|
+
continue;
|
|
1429
|
+
}
|
|
1430
|
+
|
|
1431
|
+
try {
|
|
1432
|
+
// Bring it into view so coordinate clicks land (elementHandle.click also
|
|
1433
|
+
// auto-scrolls, but humanClick clicks raw coordinates).
|
|
1434
|
+
try { await el.evaluate(e => e.scrollIntoView({ block: 'center', inline: 'center' })); } catch (_) { /* detached/odd element */ }
|
|
1435
|
+
|
|
1436
|
+
// Arm a navigation wait BEFORE clicking so a link/submit is caught.
|
|
1437
|
+
const navP = page.waitForNavigation({ waitUntil: 'domcontentloaded', timeout: waitMs + 3000 }).catch(() => {});
|
|
1438
|
+
|
|
1439
|
+
let box = null;
|
|
1440
|
+
try { box = await el.boundingBox(); } catch (_) { box = null; }
|
|
1441
|
+
if (typeof ghostClick === 'function' && box) {
|
|
1442
|
+
// Ghost-cursor path: Bezier travel to the element centre + realistic
|
|
1443
|
+
// press, matching the interact phase (caller-injected).
|
|
1444
|
+
await ghostClick(box.x + box.width / 2, box.y + box.height / 2);
|
|
1445
|
+
} else if (realistic && box) {
|
|
1446
|
+
await humanClick(page, box.x + box.width / 2, box.y + box.height / 2, { realistic: true, forceDebug });
|
|
1447
|
+
} else {
|
|
1448
|
+
await el.click({ delay: 30 }); // trusted gesture; auto-scrolls + handles non-visible coords
|
|
1449
|
+
}
|
|
1450
|
+
|
|
1451
|
+
// Resolve on whichever comes first: a committed navigation, or the settle
|
|
1452
|
+
// window (in-page actions). Either way, requests fired in between are
|
|
1453
|
+
// already captured by the caller's interceptor. Capture-and-clear the
|
|
1454
|
+
// settle timer (cdp.js / interact pattern): when navP wins, an uncleared
|
|
1455
|
+
// setTimeout would keep the event loop + closure alive for the full waitMs.
|
|
1456
|
+
let settleTimer;
|
|
1457
|
+
await Promise.race([
|
|
1458
|
+
navP,
|
|
1459
|
+
new Promise(r => { settleTimer = setTimeout(r, waitMs); })
|
|
1460
|
+
]).finally(() => clearTimeout(settleTimer));
|
|
1461
|
+
results.push({ selector, clicked: true });
|
|
1462
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: clicked "${selector}"`));
|
|
1463
|
+
} catch (err) {
|
|
1464
|
+
if (forceDebug) console.log(formatLogMessage('debug', `${INTERACTION_TAG} click_elements: click failed for "${selector}": ${err.message}`));
|
|
1465
|
+
results.push({ selector, clicked: false, reason: err.message });
|
|
1466
|
+
} finally {
|
|
1467
|
+
try { await el.dispose(); } catch (_) { /* detached after navigation — fine */ }
|
|
1468
|
+
}
|
|
1469
|
+
}
|
|
1470
|
+
return results;
|
|
1471
|
+
}
|
|
1472
|
+
|
|
1325
1473
|
module.exports = {
|
|
1326
1474
|
// Main interaction functions
|
|
1327
1475
|
performPageInteraction,
|
|
@@ -1337,5 +1485,8 @@ module.exports = {
|
|
|
1337
1485
|
// hand-tremor + mouseup drift). Reused by lib/ghost-cursor.js so the ghost
|
|
1338
1486
|
// coordinate click gets the same press realism as built-in content clicks.
|
|
1339
1487
|
humanClick,
|
|
1488
|
+
// Click specific CSS selectors in order (organic navigation / play-button /
|
|
1489
|
+
// link clicking) — site config `click_elements`.
|
|
1490
|
+
performTargetedClicks,
|
|
1340
1491
|
generateRandomCoordinates
|
|
1341
1492
|
};
|
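Reviewer sketch for the interaction.js hunk above (not part of the package): the capture-and-clear settle timer used after each click, isolated from Puppeteer so it can be run on its own. `raceWithSettle` and the string results are hypothetical; in the real code the first promise is the armed `page.waitForNavigation(...)`.

```javascript
// Resolve on whichever comes first: the navigation promise or the settle
// window. The timer handle is captured so it can be cleared when navigation
// wins; otherwise the pending setTimeout would keep the event loop alive for
// the full waitMs.
function raceWithSettle(navPromise, waitMs) {
  let settleTimer;
  const settle = new Promise((resolve) => {
    settleTimer = setTimeout(() => resolve('settled'), waitMs);
  });
  return Promise.race([navPromise, settle]).finally(() => clearTimeout(settleTimer));
}

// Navigation wins: resolves immediately, and the 10 s timer is cleared.
const navWins = raceWithSettle(Promise.resolve('navigated'), 10000);

// No navigation (in-page action such as a player starting): the settle
// window elapses instead.
const settleWins = raceWithSettle(new Promise(() => {}), 20);
```

Without the `finally(() => clearTimeout(...))`, the first call would still resolve at once, but the process could not exit until the 10-second timer fired.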
package/lib/nettools.js
CHANGED
|
@@ -30,7 +30,7 @@ const GLOBAL_DIG_CACHE_MAX = 2000;
|
|
|
30
30
|
// Global whois result cache — shared across ALL handler instances and processUrl calls
|
|
31
31
|
// Whois data is per root domain and doesn't change based on search terms
|
|
32
32
|
const globalWhoisResultCache = new Map();
|
|
33
|
-
const GLOBAL_WHOIS_CACHE_TTL =
|
|
33
|
+
const GLOBAL_WHOIS_CACHE_TTL = 129600000; // 36 hours (persisted to disk between runs). Longer than dig's 20h: registrar data is very stable and whois servers rate-limit aggressively, so caching longer cuts repeat queries.
|
|
34
34
|
const GLOBAL_WHOIS_CACHE_MAX = 2000;
|
|
35
35
|
|
|
36
36
|
// Persistent disk cache file paths
|
|
@@ -40,8 +40,8 @@ const WHOIS_CACHE_FILE = path.join(__dirname, '..', '.whoiscache');
|
|
|
40
40
|
// Index of hostnames known to resolve, populated as a side effect of
|
|
41
41
|
// positive dig/whois cache writes AND cache hits. nwss.js's DNS pre-check
|
|
42
42
|
// reads this via domainKnownToResolve() so it can skip its own resolve4
|
|
43
|
-
// call on hosts that dig or whois have already proven live within
|
|
44
|
-
//
|
|
43
|
+
// call on hosts that dig or whois have already proven live within their
|
|
44
|
+
// cache TTL window (dig 20h / whois 36h). Populating on cache HITS (not just writes) handles
|
|
45
45
|
// the --dns-cache disk-load case where entries arrive without going
|
|
46
46
|
// through the in-process write path. Stale entries -- hostname in Set but
|
|
47
47
|
// the dig/whois entry has since been evicted -- are harmless: worst case
|
|
@@ -366,7 +366,10 @@ function validateWhoisAvailability() {
|
|
|
366
366
|
};
|
|
367
367
|
} catch (error) {
|
|
368
368
|
try {
|
|
369
|
-
|
|
369
|
+
// `which` is Unix-only; Windows uses `where`. Without this, an installed
|
|
370
|
+
// whois.exe whose `whois --version` errors (e.g. Sysinternals whois)
|
|
371
|
+
// would be false-negatived as unavailable on native Windows.
|
|
372
|
+
execSync(process.platform === 'win32' ? 'where whois' : 'which whois', { encoding: 'utf8' });
|
|
370
373
|
validateWhoisAvailability._cached = {
|
|
371
374
|
isAvailable: true,
|
|
372
375
|
version: 'whois (version unknown)'
|
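Reviewer sketch for the nettools.js hunks above (not part of the package): the TTL arithmetic behind `129600000` and the platform-dependent lookup command. `HOUR_MS`, `DIG_TTL_MS`, `isFresh`, `lookupCommand`, and the `{ timestamp }` entry shape are assumptions for illustration; only the 36-hour and 20-hour figures and the `where`/`which` choice come from the diff.

```javascript
const HOUR_MS = 60 * 60 * 1000;      // 3,600,000 ms
const DIG_TTL_MS = 20 * HOUR_MS;     // 72,000,000 ms (the "dig 20h" in the comment)
const WHOIS_TTL_MS = 36 * HOUR_MS;   // 129,600,000 ms, matching GLOBAL_WHOIS_CACHE_TTL

// Hypothetical freshness check: an entry is usable while its age is below the TTL.
function isFresh(entry, ttlMs, now = Date.now()) {
  return now - entry.timestamp < ttlMs;
}

// `which` exists only on Unix-like systems; native Windows ships `where`.
function lookupCommand(tool, platform = process.platform) {
  return `${platform === 'win32' ? 'where' : 'which'} ${tool}`;
}
```

Under these assumptions a whois entry written at time 0 is still fresh at 129,599,999 ms and stale at 129,600,000 ms, while a dig entry of the same age would already have expired 16 hours earlier.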
package/lib/openvpn_vpn.js
CHANGED
|
@@ -778,6 +778,14 @@ function validateOvpnConfig(ovpnConfig) {
|
|
|
778
778
|
* @returns {Promise<Object>} { success, connection, tunDevice, error }
|
|
779
779
|
*/
|
|
780
780
|
async function connectForSite(siteConfig, forceDebug = false) {
|
|
781
|
+
// Platform guard: OpenVPN routing here reads /proc and uses the iproute2 `ip`
|
|
782
|
+
// command, both Linux-only. Fail clearly instead of a cryptic /proc or `ip`
|
|
783
|
+
// error on macOS/Windows. WSL2 reports 'linux' and passes (TUN is checked
|
|
784
|
+
// separately below via isWSL/checkTunDevice).
|
|
785
|
+
if (process.platform !== 'linux') {
|
|
786
|
+
return { success: false, error: `OpenVPN routing is currently Linux-only (needs /proc + the iproute2 'ip' command; not available on ${process.platform}). Run on Linux/WSL2, or remove the 'openvpn' option from the site config.` };
|
|
787
|
+
}
|
|
788
|
+
|
|
781
789
|
const ovpnConfig = normalizeOvpnConfig(siteConfig.openvpn);
|
|
782
790
|
if (!ovpnConfig) {
|
|
783
791
|
return { success: false, error: 'Invalid OpenVPN configuration' };
|
package/lib/validate_rules.js
CHANGED
|
@@ -1102,7 +1102,7 @@ function testDomainValidation() {
|
|
|
1102
1102
|
const KNOWN_SITE_CONFIG_KEYS = new Set([
|
|
1103
1103
|
'adblock_rules', 'blocked', 'bypass_cache', 'capture_popups',
|
|
1104
1104
|
'capture_popups_max_depth', 'capture_popups_window_ms', 'cdp', 'cdp_specific',
|
|
1105
|
-
'clear_sitedata', 'clear_sitedata_full_on_reload',
|
|
1105
|
+
'clear_sitedata', 'clear_sitedata_full_on_reload', 'click_elements', 'click_wait',
|
|
1106
1106
|
'cloudflare_bypass', 'cloudflare_max_retries', 'comments',
|
|
1107
1107
|
'cloudflare_parallel_detection', 'cloudflare_phish', 'cloudflare_retry_on_error',
|
|
1108
1108
|
'css_blocked', 'curl', 'cursor_mode', 'custom_headers', 'delay',
|
|
@@ -1119,7 +1119,7 @@ const KNOWN_SITE_CONFIG_KEYS = new Set([
|
|
|
1119
1119
|
'js_redirect_timeout', 'localhost', 'max_redirects', 'openvpn', 'pihole',
|
|
1120
1120
|
'output_regex',
|
|
1121
1121
|
'plain', 'privoxy', 'proxy', 'proxy_bypass', 'proxy_debug', 'proxy_remote_dns',
|
|
1122
|
-
'realistic_click', 'referrer_disable', 'referrer_headers', 'regex_and',
|
|
1122
|
+
'realistic_click', 'redirect_first_party', 'referrer_disable', 'referrer_headers', 'regex_and',
|
|
1123
1123
|
'reload', 'resourceTypes', 'screenshot', 'searchstring', 'searchstring_and',
|
|
1124
1124
|
'socks5_bypass', 'socks5_debug', 'socks5_proxy', 'socks5_remote_dns',
|
|
1125
1125
|
'subDomains',
|
|
@@ -1151,7 +1151,7 @@ const BOOLEAN_SITE_CONFIG_FIELDS = new Set([
|
|
|
1151
1151
|
'grep', 'headful', 'ignore_similar', 'ignore_similar_ignored_domains',
|
|
1152
1152
|
'interact', 'interact_clicks', 'interact_scrolling', 'isBrave', 'localhost',
|
|
1153
1153
|
'pihole', 'plain', 'privoxy', 'proxy_debug', 'proxy_remote_dns',
|
|
1154
|
-
'realistic_click', 'referrer_disable', 'regex_and', 'screenshot',
|
|
1154
|
+
'realistic_click', 'redirect_first_party', 'referrer_disable', 'regex_and', 'screenshot',
|
|
1155
1155
|
'searchstring_and', 'socks5_debug', 'socks5_remote_dns', 'thirdParty',
|
|
1156
1156
|
'unbound', 'whois_retry_on_error', 'whois_retry_on_timeout', 'whois_use_fallback',
|
|
1157
1157
|
]);
|
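Reviewer sketch for the validate_rules.js hunks above (not part of the package): why the new options had to be added to both sets. The two sets below are small excerpts of the real ones, and `checkSiteConfig` with its message strings is a hypothetical stand-in; how the actual validator reports problems is not shown in this diff.

```javascript
// Excerpts only: the real sets in validate_rules.js are much larger.
const KNOWN_KEYS = new Set([
  'click_elements', 'click_wait', 'curl', 'redirect_first_party', 'thirdParty',
]);
const BOOLEAN_FIELDS = new Set(['redirect_first_party', 'thirdParty']);

// Hypothetical checker: flag keys missing from the known set, and
// boolean-typed options that carry a non-boolean value.
function checkSiteConfig(site) {
  const problems = [];
  for (const [key, value] of Object.entries(site)) {
    if (!KNOWN_KEYS.has(key)) {
      problems.push(`unknown key: ${key}`);
    } else if (BOOLEAN_FIELDS.has(key) && typeof value !== 'boolean') {
      problems.push(`${key} must be a boolean`);
    }
  }
  return problems;
}

const ok = checkSiteConfig({ redirect_first_party: false, click_elements: ['a.play'] });
const wrongType = checkSiteConfig({ redirect_first_party: 'no' });
const typo = checkSiteConfig({ redirct_first_party: true });
```

Had `redirect_first_party` been registered only as a known key, a string value such as `'no'` would pass unnoticed; had it been left out of the known set, a valid config would be reported as containing an unknown option.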