@fanboynz/network-scanner 3.0.3 → 3.1.2
- package/CHANGELOG.md +53 -0
- package/lib/adblock-rust.js +17 -4
- package/lib/adblock.js +92 -15
- package/lib/browserhealth.js +41 -100
- package/lib/cdp.js +68 -34
- package/lib/clear_sitedata.js +68 -20
- package/lib/compress.js +26 -58
- package/lib/curl.js +44 -22
- package/lib/domain-cache.js +8 -57
- package/lib/dry-run.js +9 -4
- package/lib/fingerprint.js +599 -129
- package/lib/fingerprint.md +94 -0
- package/lib/interaction.js +262 -26
- package/lib/nettools.js +47 -76
- package/lib/openvpn_vpn.js +116 -35
- package/lib/proxy.js +6 -2
- package/lib/searchstring.js +15 -237
- package/lib/smart-cache.js +9 -1
- package/lib/socks-relay.js +14 -9
- package/lib/validate_rules.js +285 -3
- package/lib/wireguard_vpn.js +64 -12
- package/nwss.js +557 -220
- package/package.json +1 -1
- package/regex-tool/index.html +321 -628
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,59 @@
 
 All notable changes to the Network Scanner (nwss.js) project.
 
+## [3.1.1] - 2026-05-30
+
+### Changed
+- **Fingerprint identity pinned to Stable Chrome 148**, not whatever Chrome-for-Testing puppeteer bundles (currently 149, ahead of Stable). The spoof must blend with the real-world population; claiming an unreleased build is itself a tell. The Chrome major + build (`CHROME_BUILD`) + GREASE brand (`CHROME_GREASE_BRAND`) are now single constants — see `lib/fingerprint.md`.
+- **UA Client Hints made fully consistent and matched to real Chrome 148** (verified field-for-field against a live desktop): brand-list order + GREASE string (`Not/A)Brand`), and the full-version build (`148.0.7778.217`) sourced from one place so JS `getHighEntropyValues` and the HTTP `Sec-CH-UA-Full-Version*` headers can't drift. Added `wow64`, `model`, `formFactors`, `uaFullVersion`, and `Sec-CH-UA-WoW64`/`-Model`/`-Form-Factors` headers; Windows `platformVersion` → `19.0.0`.
+- **`navigator.deviceMemory` and `Sec-CH-Device-Memory` both pinned to `8`** (consistent JS↔HTTP), hiding the host's real RAM; `hardwareConcurrency` reports 4–8 (hides datacenter core count).
+- **Dependencies**: puppeteer / puppeteer-core 25.1.0, lru-cache 11.5.1.
+
+### Fixed
+- **Timezone is now spoofed via CDP `emulateTimezone`** instead of JS overrides, so `Date`, `Intl`, and `getTimezoneOffset` are all consistent and DST-correct. The old JS patching left the real `Date` in the host zone — an 8-hour `Date`-vs-`Intl` contradiction and a leaked host timezone.
+- **Closed several headless tells**: Battery now reports the plugged-in default (`charging:true, level:1`); `navigator.bluetooth`, `navigator.share`/`canShare` stubs added (present in real Chrome, absent in headless); `speechSynthesis.getVoices()` returns the claimed-OS voice set (`instanceof`-correct).
+- **proxy**: a string `proxy_bypass`/`socks5_bypass` (instead of an array) no longer throws `bypass.join is not a function` in the browser-launch path.
+- **socks-relay**: a client that disconnects during the upstream-connect await is now handled, so a tunnel isn't opened for a gone client and the watchdog clears immediately.
+- **smart-cache**: the memory-check and auto-save `setInterval`s are now `unref`'d, so an error path that skips `destroy()` can no longer hang the process.
+
+### Removed
+- Dead code: `browserhealth` `testNetworkCapability` + `purgeStaleTrackers` (zero callers), and a redundant 2-voice `speechSynthesis` block superseded by the full voice set.
+
+### Added
+- **`lib/fingerprint.md`** — fingerprint spoofing coverage tables (surfaces, mitigations, gating flags) and known limitations.
+
+## [3.1.0] - 2026-05-29
+
+### Added
+- **`realistic_click`** site flag — denser mouse approach, hold tremor, and mouseup drift for sites that score click realism.
+- **`interact_click_count`** site override for popunder-discovery click volume (default content-click count also raised 2 → 3).
+- **`clear_sitedata_full_on_reload`** site flag — full storage clear between reloads; quick mode now also clears localStorage/sessionStorage.
+- **regex-tool rewritten** as a real `filterRegex` builder/tester: literal↔standard↔JSON conversion, multi-pattern + `regex_and`, and testing against real request URLs (matching mirrors the scanner exactly).
+- **Fingerprint coverage**: per-domain-seeded Battery / `navigator.connection` values, `AudioBuffer` fingerprint defeat, `PerformanceNavigationTiming` jitter, `userActivation`; UA strings bumped to Chrome 148 / Firefox 151 / Safari 19.5.
+
+### Changed
+- **`userAgent` now defaults to `"chrome"`** when a site doesn't set one — previously sites without it leaked the bundled `HeadlessChrome` UA.
+- **`Sec-CH-UA` headers and the curl content-fetch UA derive from the single UA source**, so Client Hints can't drift from `navigator.userAgent`.
+- **VPN configs force scan concurrency to 1** — the shared system routing table isn't concurrency-safe.
+- **Interaction time ceiling scales with the work envelope** (click count / `realistic_click`) instead of a flat 15s.
+
+### Fixed
+- **Per-URL timeout scales** with site timeout/delay/reload (+8s recovery grace) instead of a flat 75s that discarded partial-match recovery on multi-URL scans.
+- **Interaction hard cap is now actually enforced** (was cooperative, overshooting to 20s+ under concurrency).
+- **WireGuard** inline temp-config leaked the private key on failed connect and broke retries; temp dir is now per-PID so concurrent processes can't wipe each other's config.
+- **nettools**: fixed a dig dedup race (concurrent same-domain double lookups); whois no longer discards valid records over non-fatal stderr.
+- **Orphan resource leaks** on `Promise.race` timeout (cdp.js, clear_sitedata.js, browserhealth.js) and several un-`unref`'d `setTimeout` handles.
+- **Config keys validated at startup** with boolean-like coercion, preventing silent misconfiguration.
+
+### Security
+- **OpenVPN** `pkill`/`ping`/`curl` calls moved from shell-interpolated `execSync` to `spawnSync` arg arrays (command-injection).
+- **WireGuard/OpenVPN interface & connection names validated** against a strict charset before use in paths/commands.
+
+### Performance
+- **adblock**: O(1) exact-domain lookup for `$third-party` / `$first-party` rules.
+- Parallelized site-data clearing and window-cleanup checks.
+- Removed dead code across cdp, domain-cache, searchstring, compress, adblock-rust, and nettools.
+
 ## [3.0.3] - 2026-05-26
 
 ### Improved
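The 3.1.1 `proxy` entry describes a crash when a bypass setting arrives as a string rather than an array. A minimal sketch of one way to normalize such a value before joining it follows. `normalizeBypass` and `toBypassList` are hypothetical names, and the accepted separators and the default `;` join character are assumptions, not the package's actual code.

```javascript
// Hypothetical helper (not the package's code): accept a string or an array
// for a bypass setting and always return an array of non-empty entries.
function normalizeBypass(value) {
  if (Array.isArray(value)) {
    return value.map(v => String(v).trim()).filter(Boolean);
  }
  if (typeof value === 'string') {
    // Assumption: a string holds one entry, or several separated by ',' or ';'.
    return value.split(/[;,]/).map(s => s.trim()).filter(Boolean);
  }
  return []; // undefined, null, or any other type: no bypass entries
}

// Joining is now safe for every input shape, so `.join` can no longer be
// called on a string by mistake.
function toBypassList(value, separator = ';') {
  return normalizeBypass(value).join(separator);
}

console.log(toBypassList('localhost'));
console.log(toBypassList(['localhost', '127.0.0.1']));
```

Normalizing once at the configuration boundary keeps every later consumer free of type checks.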
package/lib/adblock-rust.js
CHANGED
@@ -219,10 +219,20 @@ function parseAdblockRules(filePathOrArray, options = {}) {
   const buf = buffers[i];
   buffers[i] = null;
   const lines = buf.toString('utf-8').split('\n');
+  // Count actual rules for the startup banner. Skip:
+  // - empty lines
+  // - whitespace-only lines (trim then re-check length)
+  // - '!'-prefixed comments (standard adblock)
+  // - '['-prefixed filter list headers (e.g. '[Adblock Plus 2.0]')
+  // Previously only the first two skip conditions ran on the raw line,
+  // so whitespace lines + headers inflated the displayed count.
   for (let j = 0; j < lines.length; j++) {
     const line = lines[j];
     if (line.length === 0) continue;
-
+    const trimmed = line.trim();
+    if (trimmed.length === 0) continue;
+    const c = trimmed.charCodeAt(0);
+    if (c === 0x21 || c === 0x5B) continue; // '!' or '['
     ruleCount++;
   }
   filterSet.addFilters(lines);
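The skip conditions in the hunk above can be exercised in isolation. The sketch below applies the same four checks to a small in-memory list. `countRules` and the sample list are illustrative names and data, not part of the package.

```javascript
// Count lines that are real filter rules, skipping blank lines,
// whitespace-only lines, '!' comments and '[' list headers.
function countRules(text) {
  let count = 0;
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;        // empty or whitespace-only
    const c = trimmed.charCodeAt(0);
    if (c === 0x21 || c === 0x5B) continue;    // '!' comment or '[' header
    count++;
  }
  return count;
}

// Illustrative sample: one header, one comment, two blank-ish lines, two rules.
const sample = [
  '[Adblock Plus 2.0]',
  '! Title: sample list',
  '',
  '   ',
  '||ads.example.com^',
  '  ||tracker.example.net^$third-party',
].join('\n');

console.log(countRules(sample)); // prints 2
```

Trimming before the prefix check also covers indented comments and lines that end in `\r`.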
@@ -238,7 +248,12 @@ function parseAdblockRules(filePathOrArray, options = {}) {
   // up by the TTL prune on a future run) but the final cachePath is
   // either complete or absent — never half-written.
   const tmpPath = cachePath + '.' + process.pid + '.tmp';
-
+  // Buffer.from(buffer) ALWAYS copies — wasteful when adblock-rs's
+  // serialize() already returns a Buffer (binding-version dependent).
+  // For a ~10MB compiled engine that's a pointless 5-10ms allocate+
+  // memcpy on the cold-cache-write path.
+  const out = Buffer.isBuffer(serialized) ? serialized : Buffer.from(serialized);
+  fs.writeFileSync(tmpPath, out);
   fs.renameSync(tmpPath, cachePath);
   // Best-effort prune of stale cache files. Done after our own write so
   // we never delete the entry we just created.
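The hunk above combines two ideas: reuse the serialized `Buffer` instead of copying it, and publish the cache file with a write to a temporary path followed by a rename. A self-contained sketch of that pattern follows. `writeAtomic`, the temp directory and the payload are illustrative assumptions, not the package's code.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write `data` so that readers see either no file or the complete file.
// `data` may be a Buffer or another Uint8Array.
function writeAtomic(finalPath, data) {
  // Reuse an existing Buffer; copy only when given some other typed array.
  const out = Buffer.isBuffer(data) ? data : Buffer.from(data);
  const tmpPath = finalPath + '.' + process.pid + '.tmp';
  fs.writeFileSync(tmpPath, out);
  fs.renameSync(tmpPath, finalPath); // same-directory rename replaces in one step
  return out;
}

// Illustrative usage against a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const target = path.join(dir, 'engine.bin');
const payload = Buffer.from('compiled-engine-bytes');
const written = writeAtomic(target, payload);

console.log(written === payload);            // true: no copy was made
console.log(fs.readFileSync(target, 'utf8'));
```

Including the process ID in the temporary name keeps two concurrent writers from clobbering each other's partial file.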
@@ -287,8 +302,6 @@ function parseAdblockRules(filePathOrArray, options = {}) {
   }
 
   return {
-    rules: { stats },
-
     shouldBlock(url, sourceUrl, resourceType) {
       // Avoid default-parameter syntax in the hot path — explicit null/undefined
       // checks are slightly cheaper for V8's argument adaptor.
package/lib/adblock.js
CHANGED
@@ -85,22 +85,26 @@ function parseAdblockRules(filePath, options = {}) {
   const lines = fileContent.split('\n');
 
   const rules = {
-    domainMap: new Map(),
-    domainRules: [],
-
-
-
-
-
-
-
-
+    domainMap: new Map(), // ||domain.com^ - Exact domains for O(1) lookup
+    domainRules: [], // ||*.domain.com^ - Wildcard domains (fallback)
+    thirdPartyDomainMap: new Map(), // ||domain.com^$third-party (exact) — O(1)
+    thirdPartyRules: [], // wildcard / non-domain $third-party (fallback)
+    firstPartyDomainMap: new Map(), // ||domain.com^$first-party (exact) — O(1)
+    firstPartyRules: [], // wildcard / non-domain $first-party (fallback)
+    pathRules: [], // /ads/*
+    scriptRules: [], // .js$script
+    regexRules: [], // /regex/
+    whitelist: [], // @@||domain.com^ - Wildcard whitelist
+    whitelistMap: new Map(), // Exact whitelist domains for O(1) lookup
+    elementHiding: [], // ##.ad-class (not used for network blocking)
     stats: {
       total: 0,
       domain: 0,
-      domainMapEntries: 0,
+      domainMapEntries: 0, // Exact domain matches in Map
       thirdParty: 0,
+      thirdPartyMapEntries: 0, // Exact-domain $third-party rules in Map
       firstParty: 0,
+      firstPartyMapEntries: 0, // Exact-domain $first-party rules in Map
       path: 0,
       script: 0,
       regex: 0,
@@ -161,12 +165,28 @@ function parseAdblockRules(filePath, options = {}) {
   // Regular blocking rules
   const parsedRule = parseRule(line, false, enableLogging);
 
-  // Categorize based on rule type
+  // Categorize based on rule type. For $third-party and $first-party
+  // rules we additionally split out the exact-domain variants into a
+  // hash map keyed by hostname, mirroring the domainMap pattern. This
+  // turns the common `||example.com^$third-party` lookup from O(N) over
+  // thousands of array entries into O(1) by hostname (+ small parent
+  // walk). Wildcard / non-domain party rules still fall back to the
+  // linear array.
   if (parsedRule.isThirdParty) {
-
+    if (parsedRule.isDomain && parsedRule.domain && !parsedRule.domain.includes('*')) {
+      rules.thirdPartyDomainMap.set(parsedRule.domain.toLowerCase(), parsedRule);
+      rules.stats.thirdPartyMapEntries++;
+    } else {
+      rules.thirdPartyRules.push(parsedRule);
+    }
     rules.stats.thirdParty++;
   } else if (parsedRule.isFirstParty) {
-
+    if (parsedRule.isDomain && parsedRule.domain && !parsedRule.domain.includes('*')) {
+      rules.firstPartyDomainMap.set(parsedRule.domain.toLowerCase(), parsedRule);
+      rules.stats.firstPartyMapEntries++;
+    } else {
+      rules.firstPartyRules.push(parsedRule);
+    }
     rules.stats.firstParty++;
   } else if (parsedRule.isDomain) {
     // Store exact domains in Map for O(1) lookup, wildcards in array
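The categorization above splits party rules into an exact-hostname `Map` and a fallback array. A reduced sketch of that split follows. The rule shape (`{ domain, isDomain }`) is a simplified stand-in for whatever `parseRule` really returns, and `indexPartyRules` is a hypothetical name.

```javascript
// Partition rules: exact hostnames go into a Map for O(1) lookup, everything
// else (wildcards, non-domain patterns) stays in an array scanned linearly.
// Assumption: a rule looks like { domain: string|null, isDomain: boolean }.
function indexPartyRules(parsedRules) {
  const exact = new Map();
  const fallback = [];
  for (const rule of parsedRules) {
    if (rule.isDomain && rule.domain && !rule.domain.includes('*')) {
      exact.set(rule.domain.toLowerCase(), rule);
    } else {
      fallback.push(rule);
    }
  }
  return { exact, fallback };
}

// Illustrative input: two exact domains, one wildcard, one path-style rule.
const index = indexPartyRules([
  { domain: 'Ads.Example.com', isDomain: true },
  { domain: 'tracker.example.net', isDomain: true },
  { domain: '*.cdn.example.org', isDomain: true },
  { domain: null, isDomain: false },
]);

console.log(index.exact.size, index.fallback.length); // prints 2 2
console.log(index.exact.has('ads.example.com'));      // prints true
```

In this sketch a later rule for the same hostname replaces the earlier one, because that is how `Map.set` behaves. Whether that matters depends on whether two rules for one hostname can carry different options.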
@@ -201,7 +221,11 @@ function parseAdblockRules(filePath, options = {}) {
   console.log(formatLogMessage('debug', ` • Exact matches (Map): ${rules.stats.domainMapEntries}`));
   console.log(formatLogMessage('debug', ` • Wildcard patterns (Array): ${rules.domainRules.length}`));
   console.log(formatLogMessage('debug', ` - Third-party rules: ${rules.stats.thirdParty}`));
+  console.log(formatLogMessage('debug', ` • Exact matches (Map): ${rules.stats.thirdPartyMapEntries}`));
+  console.log(formatLogMessage('debug', ` • Wildcard/path (Array): ${rules.thirdPartyRules.length}`));
   console.log(formatLogMessage('debug', ` - First-party rules: ${rules.stats.firstParty}`));
+  console.log(formatLogMessage('debug', ` • Exact matches (Map): ${rules.stats.firstPartyMapEntries}`));
+  console.log(formatLogMessage('debug', ` • Wildcard/path (Array): ${rules.firstPartyRules.length}`));
   console.log(formatLogMessage('debug', ` - Path rules: ${rules.stats.path}`));
   console.log(formatLogMessage('debug', ` - Script rules: ${rules.stats.script}`));
   console.log(formatLogMessage('debug', ` - Regex rules: ${rules.stats.regex}`));
@@ -445,7 +469,14 @@ function createMatcher(rules, options = {}) {
   let resultCacheHits = 0, resultCacheMisses = 0;
   let urlCacheHits = 0, urlCacheMisses = 0;
   let sourceCacheHits = 0, sourceCacheMisses = 0;
-
+  // Include the new domain-maps in the party-rules presence check — without
+  // this, a filter list whose $third-party rules ALL went into the Map (empty
+  // array) would never trigger third-party detection, silently disabling the
+  // entire third-party path.
+  const hasPartyRules = rules.thirdPartyRules.length > 0 ||
+    rules.firstPartyRules.length > 0 ||
+    rules.thirdPartyDomainMap.size > 0 ||
+    rules.firstPartyDomainMap.size > 0;
   // Result cache uses FIFO eviction (see FIFOCache class comment) —
   // evicts oldest entries one at a time instead of clearing everything.
   const resultCache = new FIFOCache(32000);
@@ -634,6 +665,29 @@ function createMatcher(rules, options = {}) {
 
   // Check third-party rules
   if (isThirdParty) {
+    // Fast path: exact-domain $third-party rules (O(1) by hostname)
+    let rule = rules.thirdPartyDomainMap.get(lowerHostname);
+    if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
+      if (enableLogging) {
+        console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked third-party: ${url} (${rule.raw || rule.pattern})`));
+      }
+      const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'third_party_rule' };
+      resultCacheSet(url, sourceUrl, resourceType, r);
+      return r;
+    }
+    // Parent-domain $third-party rules — same walk as domainMap
+    for (let i = 0; i < parents.length; i++) {
+      rule = rules.thirdPartyDomainMap.get(parents[i]);
+      if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
+        if (enableLogging) {
+          console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked third-party: ${url} (${rule.raw || rule.pattern})`));
+        }
+        const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'third_party_rule' };
+        resultCacheSet(url, sourceUrl, resourceType, r);
+        return r;
+      }
+    }
+    // Slow path: wildcard / non-domain $third-party rules
     const thirdPartyLen = rules.thirdPartyRules.length; // V8: Cache length
     for (let i = 0; i < thirdPartyLen; i++) {
       const rule = rules.thirdPartyRules[i];
@@ -650,6 +704,29 @@ function createMatcher(rules, options = {}) {
 
   // Check first-party rules
   if (!isThirdParty) {
+    // Fast path: exact-domain $first-party rules (O(1) by hostname)
+    let rule = rules.firstPartyDomainMap.get(lowerHostname);
+    if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
+      if (enableLogging) {
+        console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked first-party: ${url} (${rule.raw || rule.pattern})`));
+      }
+      const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'first_party_rule' };
+      resultCacheSet(url, sourceUrl, resourceType, r);
+      return r;
+    }
+    // Parent-domain $first-party rules
+    for (let i = 0; i < parents.length; i++) {
+      rule = rules.firstPartyDomainMap.get(parents[i]);
+      if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
+        if (enableLogging) {
+          console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked first-party: ${url} (${rule.raw || rule.pattern})`));
+        }
+        const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'first_party_rule' };
+        resultCacheSet(url, sourceUrl, resourceType, r);
+        return r;
+      }
+    }
+    // Slow path: wildcard / non-domain $first-party rules
     const firstPartyLen = rules.firstPartyRules.length;
     for (let i = 0; i < firstPartyLen; i++) {
       const rule = rules.firstPartyRules[i];
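Both matcher hunks follow the same lookup order: the exact hostname first, then each parent domain, then the linear fallback. The sketch below isolates the first two steps. How the package builds its `parents` array is not visible in this diff, so `parentDomains` is an assumed implementation, and the `matchesRule` confirmation step is left out for brevity.

```javascript
// Assumed helper: list parent domains from most to least specific.
// 'a.b.example.com' -> ['b.example.com', 'example.com', 'com']
function parentDomains(hostname) {
  const parents = [];
  let idx = hostname.indexOf('.');
  while (idx !== -1) {
    parents.push(hostname.slice(idx + 1));
    idx = hostname.indexOf('.', idx + 1);
  }
  return parents;
}

// Look up a rule by exact hostname, then walk up the parent domains.
// Cost is one Map.get per label, independent of how many rules exist.
function findPartyRule(domainMap, hostname) {
  const lower = hostname.toLowerCase();
  const direct = domainMap.get(lower);
  if (direct) return direct;
  for (const parent of parentDomains(lower)) {
    const rule = domainMap.get(parent);
    if (rule) return rule;
  }
  return null; // caller would fall through to the wildcard array here
}

// Illustrative data.
const partyMap = new Map([['example.com', { raw: '||example.com^$third-party' }]]);
console.log(findPartyRule(partyMap, 'cdn.ads.Example.com').raw);
console.log(findPartyRule(partyMap, 'unrelated.test'));
```

A hostname with k labels costs at most k map lookups, which is why the rule count drops out of the fast path.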
package/lib/browserhealth.js
CHANGED
@@ -107,10 +107,13 @@ async function performGroupWindowCleanup(browserInstance, groupDescription, forc
     // Identify the main Puppeteer window (should be about:blank or the initial page)
     let mainPuppeteerPage = null;
     let pagesToClose = [];
-
-    //
+
+    // First pass: synchronous categorization. Separate blank pages from
+    // content pages so the conservative-mode isPageFromPreviousScan() checks
+    // can run in parallel via Promise.all below, instead of N sequential
+    // awaits (each potentially a CDP roundtrip for page.title()).
+    const contentPages = [];
     for (const page of allPages) {
-      // Cache page.url() call to avoid repeated DOM/browser communication
       const pageUrl = page.url();
       if (pageUrl === 'about:blank' || pageUrl === '' || pageUrl.startsWith('chrome://')) {
         if (!mainPuppeteerPage) {
@@ -119,18 +122,21 @@ async function performGroupWindowCleanup(browserInstance, groupDescription, forc
           pagesToClose.push(page); // Additional blank pages can be closed
         }
       } else {
-
-
-
-
-
-
-
-
-
-
-
-
+        contentPages.push(page);
+      }
+    }
+
+    if (cleanupMode === "all") {
+      // Aggressive mode: close all content pages — no per-page async check
+      for (const page of contentPages) pagesToClose.push(page);
+    } else {
+      // Conservative mode: run the isPageFromPreviousScan checks in parallel
+      // and collect the leftovers in original order.
+      const checks = await Promise.all(
+        contentPages.map(page => isPageFromPreviousScan(page, forceDebug))
+      );
+      for (let i = 0; i < contentPages.length; i++) {
+        if (checks[i]) pagesToClose.push(contentPages[i]);
       }
     }
 
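The conservative-mode branch above relies on `Promise.all` returning results in input order, regardless of which check finishes first. A small sketch of that order-preserving parallel filter follows. `filterAsync`, `isStale` and the fake page objects are illustrative stand-ins, not the package's `isPageFromPreviousScan`.

```javascript
// Run an async predicate over all items concurrently and keep the items
// whose check resolved truthy, in their original order.
async function filterAsync(items, predicate) {
  const checks = await Promise.all(items.map(item => predicate(item)));
  return items.filter((_, i) => checks[i]);
}

// Stand-in for a per-page check that costs one async round trip.
const isStale = async page => {
  await new Promise(resolve => setTimeout(resolve, page.delay));
  return page.stale;
};

// The slowest check belongs to the first item; order must still be a, c.
const demoPages = [
  { id: 'a', delay: 30, stale: true },
  { id: 'b', delay: 5, stale: false },
  { id: 'c', delay: 10, stale: true },
];

filterAsync(demoPages, isStale).then(result => {
  console.log(result.map(p => p.id)); // prints [ 'a', 'c' ]
});
```

Total wall time is roughly the slowest single check rather than the sum. If any check rejects, `Promise.all` rejects as a whole, so the predicate should handle its own errors.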
@@ -391,12 +397,13 @@ async function performRealtimeWindowCleanup(browserInstance, threshold = REALTIM
           if (forceDebug) {
             console.log(formatLogMessage('debug', `${REALTIME_CLEANUP_TAG} Found ${contextPages.length} pages in popup context`));
           }
-          // Close popup context pages
-
-
-
-
-
+          // Close popup context pages in parallel — each close is
+          // independent and the sequential await was both slow AND would
+          // abort the whole loop on the first close failure, leaking the
+          // remaining pages. .catch() per page ensures we attempt all.
+          await Promise.all(contextPages.map(page =>
+            page.isClosed() ? undefined : page.close().catch(() => {})
+          ));
         }
       }
     } catch (contextErr) {
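The per-page `.catch()` in the hunk above is what keeps one failed close from stranding the rest. The sketch below reproduces that behavior with fake page objects. `closeAll` and `fakePage` are hypothetical and implement only the two methods the pattern touches.

```javascript
// Attempt to close every page concurrently; a failure on one page never
// prevents the attempts on the others.
async function closeAll(pages) {
  await Promise.all(pages.map(page =>
    page.isClosed() ? undefined : page.close().catch(() => {})
  ));
}

// Minimal fake exposing isClosed() and an async close().
function fakePage(failOnClose) {
  let closed = false;
  return {
    isClosed: () => closed,
    close: async () => {
      if (failOnClose) throw new Error('Target closed');
      closed = true;
    },
  };
}

// The middle page fails to close; the first and third still get closed.
const popupPages = [fakePage(false), fakePage(true), fakePage(false)];
closeAll(popupPages).then(() => {
  console.log(popupPages.map(p => p.isClosed())); // prints [ true, false, true ]
});
```

`Promise.allSettled` would give the same all-attempted guarantee and also report which closes failed, at the cost of inspecting each result.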
@@ -600,16 +607,6 @@ function untrackPage(page) {
   pageUsageTracker.delete(page);
 }
 
-/**
- * No-op since the trackers were migrated to WeakMap — GC reclaims dead-page
- * entries automatically when Puppeteer drops its internal references. Kept
- * exported so the ~7 callers in nwss.js continue to compile; safe to delete
- * entirely once those callsites are scrubbed.
- */
-function purgeStaleTrackers() {
-  // intentionally empty
-}
-
 /**
  * Quick browser responsiveness test for use during page setup
  * Designed to catch browser degradation between operations
@@ -630,71 +627,6 @@ async function isQuicklyResponsive(browserInstance, timeout = 3000) {
   }
 }
 
-/**
- * Tests if browser can handle network operations (like Network.enable)
- * Creates a test page and attempts basic network setup
- * @param {import('puppeteer').Browser} browserInstance - Puppeteer browser instance
- * @param {number} timeout - Timeout in milliseconds (default: 10000)
- * @returns {Promise<object>} Network capability test result
- */
-async function testNetworkCapability(browserInstance, timeout = 10000) {
-  const result = {
-    capable: false,
-    error: null,
-    responseTime: 0
-  };
-
-  const startTime = Date.now();
-  let testPage = null;
-
-  try {
-    // Create test page
-    testPage = await raceWithTimeout(
-      browserInstance.newPage(),
-      timeout,
-      'Test page creation timeout'
-    );
-
-    // Test network operations (the critical operation that's failing)
-    await raceWithTimeout(
-      testPage.setRequestInterception(true),
-      timeout,
-      'Network.enable test timeout'
-    );
-
-    // Turn off interception. Symmetric to the enable above — Network.disable
-    // can hang for the same CDP reasons, so it needs the same watchdog.
-    await raceWithTimeout(
-      testPage.setRequestInterception(false),
-      timeout,
-      'Network.disable test timeout'
-    );
-    result.capable = true;
-    result.responseTime = Date.now() - startTime;
-
-  } catch (error) {
-    result.error = error.message;
-    result.responseTime = Date.now() - startTime;
-
-    // Classify the error type
-    if (error.message.includes('Network.enable') ||
-        error.message.includes('timed out') ||
-        error.message.includes('Protocol error')) {
-      result.error = `Network capability test failed: ${error.message}`;
-    }
-  } finally {
-    if (testPage && !testPage.isClosed()) {
-      try {
-        await testPage.close();
-      } catch (closeErr) {
-        /* ignore cleanup errors */
-      }
-    }
-  }
-
-  return result;
-}
-
 /**
  * Checks if browser instance is still responsive
  * @param {import('puppeteer').Browser} browserInstance - Puppeteer browser instance
@@ -740,9 +672,15 @@ async function checkBrowserHealth(browserInstance, timeout = 8000) {
 
     // Test 4: Create a single test page to verify both browser functionality AND network capability
     let testPage = null;
+    // Same orphan-cleanup pattern as cdp.js + clear_sitedata.js.
+    // Promise.race can't cancel newPage() — if the race
+    // times out the underlying call may still produce a Page tab nothing
+    // references → leaked tab.
+    let testPagePromise = null;
     try {
+      testPagePromise = browserInstance.newPage();
       testPage = await raceWithTimeout(
-
+        testPagePromise,
         timeout,
         'Page creation timeout'
       );
@@ -780,6 +718,11 @@ async function checkBrowserHealth(browserInstance, timeout = 8000) {
       await testPage.close();
 
     } catch (pageTestError) {
+      // Orphan cleanup: if testPage is null but newPage was started, the
+      // race timed out before assignment. Close the orphan when it arrives.
+      if (!testPage && testPagePromise) {
+        testPagePromise.then(p => p.close().catch(() => {})).catch(() => {});
+      }
       if (testPage && !testPage.isClosed()) {
         try { await testPage.close(); } catch (e) { /* ignore */ }
       }
@@ -1253,7 +1196,6 @@ module.exports = {
   performGroupWindowCleanup,
   performRealtimeWindowCleanup,
   trackPageForRealtime,
-  testNetworkCapability,
   isQuicklyResponsive,
   performHealthAssessment,
   monitorBrowserHealth,
@@ -1261,6 +1203,5 @@ module.exports = {
   isCriticalProtocolError,
   updatePageUsage,
   untrackPage,
-  cleanupPageBeforeReload
-  purgeStaleTrackers
+  cleanupPageBeforeReload
 };
package/lib/cdp.js
CHANGED
@@ -48,7 +48,8 @@ function raceWithTimeout(promise, ms, message) {
 }
 
 // Shared no-op cleanup used by every no-CDP / CDP-failed return path. Hoisted
-// so
+// so the success path doesn't allocate a fresh `async () => {}` per call
+// when cleanup logic isn't needed, and so NOOP_SESSION_RESULT can reuse it.
 const NOOP_CLEANUP = async () => {};
 
 /**
@@ -74,27 +75,39 @@ function isCriticalCDPError(message) {
     message.includes('Browser has been closed');
 }
 
-
-
-
-
-
-
-
-const createSessionResult = (session = null, cleanup = NOOP_CLEANUP, isEnhanced = false) => ({
-  session,
-  cleanup,
-  isEnhanced
+// Pre-allocated singleton for both the early-exit case (CDP not enabled OR
+// not in debug mode) AND the non-critical-error path. Frozen so callers can't
+// mutate the shared instance. Result shape is {session, cleanup}; previously
+// also carried an `isEnhanced: false` field that had zero consumers anywhere.
+const NOOP_SESSION_RESULT = Object.freeze({
+  session: null,
+  cleanup: NOOP_CLEANUP
 });
 
 /**
- * Creates a new page with timeout protection to prevent CDP hangs
+ * Creates a new page with timeout protection to prevent CDP hangs.
+ *
+ * Orphan-page handling: Promise.race cannot cancel browser.newPage(). If the
+ * timer wins, the underlying call keeps running and eventually resolves to a
+ * real Page tab nothing references → leaked tab in the browser. We capture
+ * the original promise and attach a close-on-resolve cleanup so the orphan
+ * is reaped if it arrives after the race lost.
+ *
  * @param {import('puppeteer').Browser} browser - Browser instance
  * @param {number} timeout - Timeout in milliseconds (default: 30000)
  * @returns {Promise<import('puppeteer').Page>} Page instance
  */
 async function createPageWithTimeout(browser, timeout = 30000) {
-
+  const pagePromise = browser.newPage();
+  try {
+    return await raceWithTimeout(pagePromise, timeout, 'Page creation timeout - browser may be unresponsive');
+  } catch (err) {
+    // If pagePromise eventually resolves after the race gave up, close the
+    // orphan tab. .catch(() => {}) handles the case where pagePromise also
+    // rejected (no resource to clean up).
+    pagePromise.then(p => p.close().catch(() => {})).catch(() => {});
+    throw err;
+  }
 }
 
 /**
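The orphan-page handling above generalizes to any resource acquired under a `Promise.race` deadline. The sketch below shows the pattern without Puppeteer. The `raceWithTimeout` here is a stand-in written for this example (the package's own implementation is not shown in this diff), and `acquireWithTimeout`, `slowAcquire` and `released` are hypothetical names.

```javascript
// Stand-in timeout race: rejects after `ms` unless `promise` settles first.
// The timer is cleared either way so it cannot keep the process alive.
function raceWithTimeout(promise, ms, message) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Acquire a resource with a deadline. If the deadline wins, the resource may
// still arrive later, so attach a release-on-resolve handler to reap it.
async function acquireWithTimeout(acquire, release, ms) {
  const pending = acquire();
  try {
    return await raceWithTimeout(pending, ms, 'acquire timeout');
  } catch (err) {
    // The trailing catch covers both a rejected `pending` and a failing release.
    pending.then(resource => release(resource)).catch(() => {});
    throw err;
  }
}

// Illustrative run: the resource arrives after 50 ms, the deadline is 10 ms.
const released = [];
const slowAcquire = () => new Promise(resolve => setTimeout(() => resolve('tab-1'), 50));

acquireWithTimeout(slowAcquire, r => released.push(r), 10)
  .catch(err => console.log(err.message))
  .then(() => new Promise(resolve => setTimeout(resolve, 100)))
  .then(() => console.log(released));
```

The caller still sees the timeout error immediately. The late arrival is released in the background instead of being leaked.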
@@ -171,7 +184,7 @@ async function createCDPSession(page, currentUrl, options = {}) {
   const cdpLoggingNeeded = (enableCDP || siteSpecificCDP === true) && forceDebug;
 
   if (!cdpLoggingNeeded) {
-    return
+    return NOOP_SESSION_RESULT;
   }
 
   // Parse the current URL hostname once and reuse it for the mode-log line,
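Returning one frozen object from every "no session" path avoids a per-call allocation and protects the shared instance from accidental mutation. A minimal sketch of that idea follows. `openSession` is a hypothetical caller written for this example, not the package's `createCDPSession`.

```javascript
'use strict';

const NOOP_CLEANUP = async () => {};

// One shared, immutable result for every path that has no real session.
const NOOP_SESSION_RESULT = Object.freeze({
  session: null,
  cleanup: NOOP_CLEANUP,
});

// Hypothetical caller: returns the singleton when the feature is off,
// otherwise a fresh result object (the session here is a placeholder).
function openSession(enabled) {
  if (!enabled) return NOOP_SESSION_RESULT;
  return { session: { id: 1 }, cleanup: async () => {} };
}

const first = openSession(false);
const second = openSession(false);
console.log(first === second);        // true: same object, no new allocation
console.log(Object.isFrozen(first));  // true
```

`Object.freeze` is shallow, which is enough here because both fields are either `null` or a function. In strict mode a write to the frozen object throws a `TypeError`. In sloppy mode it is silently ignored.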
@@ -187,11 +200,16 @@ async function createCDPSession(page, currentUrl, options = {}) {
   }
 
   let cdpSession = null;
+  let cdpSessionPromise = null;
 
   try {
-    // Create CDP session using modern Puppeteer 20+ API
-    //
-
+    // Create CDP session using modern Puppeteer 20+ API.
+    // Capture the promise BEFORE racing so the catch block can attach an
+    // orphan-cleanup chain — if our race times out but the underlying
+    // createCDPSession() later resolves, we'd otherwise leak a CDP session
+    // on the browser side that nothing references.
+    cdpSessionPromise = page.createCDPSession();
+    cdpSession = await raceWithTimeout(cdpSessionPromise, 20000, 'CDP session creation timeout');
 
     // Enable network domain — required for network event monitoring. This is
     // the operation the rest of the codebase has learned can hang under
@@ -221,10 +239,13 @@ async function createCDPSession(page, currentUrl, options = {}) {
 
     console.log(formatLogMessage('debug', `${CDP_TAG} CDP session created successfully for ${currentUrl}`));
 
-    return
-    cdpSession,
-    async () => {
-    // Safe cleanup that never throws errors
+    return {
+      session: cdpSession,
+      cleanup: async () => {
+        // Safe cleanup that never throws errors. Idempotent — null out the
+        // captured reference after the first successful detach so a
+        // double-cleanup is a true no-op instead of generating a misleading
+        // "Failed to detach: Session closed" debug log on the second call.
         if (cdpSession) {
           try {
             await cdpSession.detach();
@@ -232,28 +253,41 @@ async function createCDPSession(page, currentUrl, options = {}) {
           } catch (cdpCleanupErr) {
             // Log cleanup errors but don't throw - cleanup should never fail the calling code
             console.log(formatLogMessage('debug', `${CDP_TAG} Failed to detach CDP session for ${currentUrl}: ${cdpCleanupErr.message}`));
+          } finally {
+            cdpSession = null;
           }
         }
-      }
-
-    );
+      }
+    };
 
   } catch (cdpErr) {
-    //
-    //
-    //
-    //
-    //
+    // Two distinct cleanup paths depending on where the failure was:
+    //
+    // a) cdpSession IS set → failure was AFTER createCDPSession() resolved
+    // (e.g. Network.enable timed out). We have a real handle — detach
+    // directly. Previously the code just nulled the local and orphaned
+    // the session; now we detach and log any failure.
+    //
+    // b) cdpSession is null but cdpSessionPromise was started → the race
+    // timed out before assignment. The underlying createCDPSession()
+    // may still resolve later, producing an orphan session on the
+    // browser side. Attach a detach-on-resolve chain; .catch(()=>{})
+    // swallows the case where the underlying promise also rejected.
     if (cdpSession) {
       try { await cdpSession.detach(); }
       catch (partialDetachErr) {
         console.log(formatLogMessage('debug', `${CDP_TAG} Partial-session detach failed for ${currentUrl}: ${partialDetachErr.message}`));
       }
-
+    } else if (cdpSessionPromise) {
+      cdpSessionPromise.then(s => s.detach().catch(() => {})).catch(() => {});
     }
 
-    // Enhanced error context for CDP domain-specific debugging
-
+    // Enhanced error context for CDP domain-specific debugging. Reuse the
+    // currentHostname computed at function entry (one URL parse vs two);
+    // only fall back to the truncated raw URL when that parse failed too.
+    const urlContext = currentHostname !== 'unknown'
+      ? currentHostname
+      : `${currentUrl.substring(0, 50)}...`;
 
     // Critical errors: browser is broken, propagate so the caller can restart.
     if (isCriticalCDPError(cdpErr.message)) {
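The `finally { cdpSession = null; }` added above is what makes a second cleanup call a no-op. A reduced sketch of that idempotent-cleanup closure follows. `makeCleanup` and the fake session are illustrative, and the only assumption about a session is that it exposes an async `detach()`.

```javascript
// Wrap a session in a cleanup function that never throws and that detaches
// at most once across sequential calls.
function makeCleanup(session, log = () => {}) {
  let current = session;
  return async function cleanup() {
    if (!current) return;            // already cleaned up: true no-op
    try {
      await current.detach();
    } catch (err) {
      log(`Failed to detach: ${err.message}`);
    } finally {
      current = null;                // drop the reference on success or failure
    }
  };
}

// Fake session that counts detach() calls.
let detachCalls = 0;
const fakeSession = { detach: async () => { detachCalls++; } };
const cleanup = makeCleanup(fakeSession);

cleanup()
  .then(() => cleanup())
  .then(() => console.log(detachCalls)); // prints 1
```

As in the diff, the reference is cleared only after `detach()` settles, so a second call made while the first is still pending would still reach `detach()`. Clearing the reference before the `await` would close that window if it ever mattered.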
@@ -265,7 +299,7 @@ async function createCDPSession(page, currentUrl, options = {}) {
     console.warn(formatLogMessage('warn', `${CDP_TAG} Failed to attach CDP session for ${urlContext}: ${cdpErr.message}`));
 
     // Return null session with no-op cleanup for consistent API
-    return
+    return NOOP_SESSION_RESULT;
   }
 }
 