@fanboynz/network-scanner 3.0.2 → 3.1.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,40 @@
2
2
 
3
3
  All notable changes to the Network Scanner (nwss.js) project.
4
4
 
5
+ ## [3.0.3] - 2026-05-26
6
+
7
+ ### Improved
8
+ - **3 DataDome-targeted gaps closed in `lib/fingerprint.js`** (inside `applyFingerprintProtection`, so gated on `siteConfig.fingerprint_protection` like every other spoof in that function):
9
+ - **`Notification.permission` static property** now returns `'default'` (real Chrome's no-granted-permission state). Previously only `Notification.requestPermission()` (the method) was patched; the static property still returned the headless default `'denied'` — a live tell for DataDome and similar detectors that read it directly.
10
+ - **`screen.orientation` interface** is now provided as a stable `{type: 'landscape-primary', angle: 0, addEventListener, lock, unlock, ...}` object when missing. Modern browsers always expose ScreenOrientation; absence is a "real browser?" check signal.
11
+ - **`<html>` `webdriver` DOM attribute** stripped if present. Defensive — modern Puppeteer with `ignoreDefaultArgs: ['--enable-automation']` doesn't emit this, but older driver setups do, and detectors check both `navigator.webdriver` AND `documentElement.getAttribute('webdriver')`. Appended to the existing `'webdriver removal'` safeExecute block so all webdriver cleanup lives together.
12
+
13
+ Targeted at sites running DataDome's `ct.captcha-delivery.com/i.js` (and similar fingerprint suites: PerimeterX, Akamai Bot Manager). Most other surfaces these detectors probe were already covered (chrome.app/csi/loadTimes, userAgentData, maxTouchPoints, permissions.query, WebGL UNMASKED_VENDOR/RENDERER, etc.). `scripts/test-stealth.js sannysoft` regression smoke holds at 29 passed / 1 warn / 0 failed (the warn is `CHR_DEBUG_TOOLS`, a CDP-attached signal that's fundamental to Puppeteer and unrelated to these additions). JS-only spoofing can't address TLS fingerprint, HTTP/2 fingerprint, IP reputation, or behavioural analysis — those still depend on proxy choice and `interact` / `ghost-cursor` config.
14
+
15
+ ### Added
16
+ - **`scripts/test-stealth.js` now reports warn-row labels** for sannysoft, not just failure-row labels. Previously a cell moving from `passed` → `warn` between runs was invisible (only the count changed), making soft-regression debugging require `--headful`. Now the warn-row table contents print inline so you can see e.g. `warn rows: CHR_DEBUG_TOOLS` directly. Schema additive: result object gains a `warnings: string[]` array alongside the existing `failures: string[]`.
17
+ - **`scripts/test-stealth.js` extracts CreepJS's actual current metrics** instead of stale `Trust Score` regex that returned `n/a` for every field. New extracted fields: `fpId` (CreepJS's stable fingerprint hash, lets you A/B before/after a spoof change), `isChromium` (engine identification), `headlessPct` (HARD headless detection score, lower = better), `likeHeadlessPct` (SOFT headless signals), `stealthPct` (spoof-detection probes score, HIGHER = better since it means our spoofs LOOK convincing). Formatter prints all five with directionality hints inline. Excerpt now 40 lines / 2KB (was 15 / 400 bytes) so future UI rotations are debuggable from the output without `--headful`.
18
+ - **Additional headless-mode spoofs in `lib/fingerprint.js`** (all inside `applyFingerprintProtection`, gated on `siteConfig.fingerprint_protection`):
19
+ - **`matchMedia` hover/pointer queries**: `(any-hover: hover)`, `(any-hover: none)`, `(any-pointer: fine)`, `(any-pointer: none)`, `(any-pointer: coarse)` plus the legacy non-`any-` aliases. Headless Chrome reports no hover device and no fine pointer (no mouse hardware); detectors probe these as a binary 'real desktop hardware?' signal. Pass-through for all other queries (responsive, color-scheme, reduced-motion, etc.).
20
+ - **`screenLeft` / `screenTop` mirror `screenX` / `screenY`**. Real Chrome exposes these as identical-value legacy aliases; spoofers often leave them undefined or 0, which is inconsistent with the non-zero `screenX/Y` our existing patch produces.
21
+ - **Modern Chrome API stubs**: `document.hasStorageAccess()` → `Promise<true>`, `navigator.userActivation` → `{hasBeenActive: true, isActive: true}`, `navigator.getInstalledRelatedApps()` → `Promise<[]>`. Each gated on absence check so real-Chrome paths skip the override.
22
+
23
+ Honest measurement: CreepJS's specific `headless score` did NOT move after these additions (stayed at 67%). My prior estimate of '~-10 to -15 percentage points' was over-optimistic — CreepJS apparently doesn't weight matchMedia hover/pointer heavily in its headless calculation. The additions are still correct spoofs that close real fingerprint gaps and likely help against DataDome / PerimeterX which use different scoring; they're net-positive but score-neutral against CreepJS specifically. The remaining ~67% headless detection is architectural (CDP attachment, software-rasterizer GPU, no real mouse cursor) and can't be lowered without `--headful`.
24
+
25
+ ### Security
26
+ - **WebRTC public-IP leak closed** in `lib/fingerprint.js` (`applyFingerprintProtection`). The previous local-IP filter only stripped RFC1918 private ranges (`10.x / 172.16-31.x / 192.168.x`), missing `srflx` (STUN-discovered PUBLIC IP), `prflx`, `relay`, and host candidates with non-RFC1918 addresses (CGNAT 100.64.0.0/10, link-local IPv6, real public IPs on bare-metal hosts). STUN traffic is UDP and **bypasses the SOCKS5 proxy entirely**, so the leaked IP was the real host IP regardless of proxy config — visible to any page that listened on `icecandidate` events. Caught by `test-stealth.js creepjs` which surfaced the candidate string `122.252.155.250 typ srflx` and the corresponding `ip:` field in its WebRTC panel. Fix: strip EVERY ICE candidate; deliver only the null-candidate sentinel (end-of-gathering signal). Side note: the property-based `pc.onicecandidate = fn` setter was also broken (stored handler but never wired it up); now mirrors the same filter as the addEventListener path. Side effect: any site that REQUIRES functional WebRTC peer connections sees ICE gathering produce zero candidates. For nwss.js's scanning use case this is correct.
27
+
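The "strip every ICE candidate, deliver only the null-candidate sentinel" rule from the Security entry can be sketched as a handler wrapper. This is a minimal illustration, not the code in `lib/fingerprint.js`: `wrapIceCandidateHandler` is a hypothetical name, and the events are plain objects standing in for `RTCPeerConnectionIceEvent`.

```javascript
// Hypothetical wrapper (not the package's actual code): forwards only the
// end-of-gathering event, whose `candidate` is null, and drops every real
// candidate regardless of type (host, srflx, prflx, relay).
function wrapIceCandidateHandler(handler) {
  return function filteredHandler(event) {
    if (event && event.candidate === null) {
      return handler.call(this, event);
    }
    return undefined; // candidate carried an address: never delivered
  };
}

// Plain objects stand in for browser ICE events; addresses are documentation
// or private ranges chosen for illustration.
const delivered = [];
const handler = wrapIceCandidateHandler((e) => delivered.push(e.candidate));

handler({ candidate: { candidate: 'candidate:1 1 udp 1 203.0.113.7 50000 typ srflx' } });
handler({ candidate: { candidate: 'candidate:2 1 udp 1 192.168.1.5 50001 typ host' } });
handler({ candidate: null });

console.log(delivered); // [ null ]
```

Applying the same wrapper to both the `addEventListener` path and the `onicecandidate` setter keeps the two delivery routes consistent, which is the behaviour the entry describes.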
28
+ ### Stealth hardening (toString masking)
29
+ - **Added 8 session-introduced spoofs to `Function.prototype.toString` bulk masking** (`matchMedia`, `hasStorageAccess`, `getInstalledRelatedApps`, `userActivation` getter, `Notification.permission` getter, `screen.orientation` getter, `screenLeft`/`screenTop` getters). Without this, each new spoof was detectable via `.toString()` returning the override source instead of `[native code]`.
30
+ - **Masked per-instance WebRTC `onicecandidate` getter/setter + `addEventListener` wrap.** The bulk-mask block only runs once at injection; per-RTCPeerConnection closures created inside the factory weren't covered. A site doing `Object.getOwnPropertyDescriptor(pc, 'onicecandidate').get.toString()` could see the spoof.
31
+ - **Spoofed `navigator.productSub` + `vendorSub`** (UA-aware: `'20030107'` for Chrome/Safari/etc., `'20100101'` for Firefox; `vendorSub` always `''`). Companion legacy properties to the already-spoofed `vendor`/`product`. Common bot-detection signal since anti-detection libraries often spoof UA but forget these. `vendor`/`product` getters also added to the maskAsNative list (pre-existing oversight folded in).
32
+
33
+ ### Fixed
34
+ - **`validatePageForInjection`'s 1.5s race timer is now `unref`'d.** Last remaining Node-side `setTimeout` that wasn't unref'd; could hold the event loop alive for up to 1.5s past scan completion. All Node-side timers in `lib/fingerprint.js`, `lib/nettools.js`, and `lib/socks-relay.js` are now unref'd.
35
+
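A rough sketch of what an unref'd race timer looks like in Node, for readers unfamiliar with `timeout.unref()`. The helper name `raceWithUnrefTimeout` is an assumption made for illustration; the actual timer inside `validatePageForInjection` is not shown in this diff.

```javascript
// Hypothetical helper: races `promise` against a timer that does not keep
// the Node event loop alive on its own.
function raceWithUnrefTimeout(promise, ms, message) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
    timer.unref(); // an unref'd timer no longer holds the process open
  });
  // Clear the timer whichever side wins so it never fires late.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// The work finishes first, so the 1.5s timer is cleared and the process can
// exit immediately instead of waiting for the timer to expire.
raceWithUnrefTimeout(Promise.resolve('page ok'), 1500, 'validation timeout')
  .then((value) => console.log(value)); // page ok
```

Without the `unref()` call (and without the `clearTimeout`), a pending 1.5s timer would be enough to delay process exit by up to that long, which matches the symptom described in the entry.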
36
+ ### Performance
37
+ - **Canvas noise application now cached per `HTMLCanvasElement`** via WeakMap. `toDataURL` and `toBlob` previously did a `getImageData` + `putImageData` round-trip on every call (~500k iterations for size-capped canvases) to bake noise into the export. Now the round-trip runs once per canvas; subsequent exports skip it (the canvas backing store still has the noised pixels from the first call). Trade-off: animated canvases that redraw between exports won't have new content re-noised — acceptable for the common fingerprinter pattern (single probe → single toDataURL).
38
+
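The once-per-canvas caching in the Performance entry can be illustrated with a small sketch. Everything here is hypothetical: `exportWithNoise`, `applyNoise` and `exportFn` are stand-in names, and a plain object plays the role of an `HTMLCanvasElement` so the example runs outside a browser.

```javascript
// Tracks which canvases already had noise baked into their backing store.
// A WeakMap lets entries be garbage-collected together with the canvas.
const noiseApplied = new WeakMap();

// Hypothetical export wrapper: runs the expensive noise pass at most once
// per canvas, then delegates to the real export function.
function exportWithNoise(canvas, applyNoise, exportFn) {
  if (!noiseApplied.has(canvas)) {
    applyNoise(canvas);
    noiseApplied.set(canvas, true);
  }
  return exportFn(canvas);
}

// Plain object standing in for a canvas; `pixels` stands in for image data.
const fakeCanvas = { pixels: [10, 20, 30] };
let noisePasses = 0;
const applyNoise = (c) => { noisePasses++; c.pixels = c.pixels.map((p) => p ^ 1); };
const exportFn = (c) => c.pixels.join(',');

const first = exportWithNoise(fakeCanvas, applyNoise, exportFn);
const second = exportWithNoise(fakeCanvas, applyNoise, exportFn);

console.log(noisePasses, first === second); // 1 true
```

The sketch also shows the stated trade-off: if `pixels` were redrawn between the two exports, the second export would carry the new content without a fresh noise pass.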
5
39
  ## [3.0.2] - 2026-05-25
6
40
 
7
41
  ### Security
@@ -219,10 +219,20 @@ function parseAdblockRules(filePathOrArray, options = {}) {
219
219
  const buf = buffers[i];
220
220
  buffers[i] = null;
221
221
  const lines = buf.toString('utf-8').split('\n');
222
+ // Count actual rules for the startup banner. Skip:
223
+ // - empty lines
224
+ // - whitespace-only lines (trim then re-check length)
225
+ // - '!'-prefixed comments (standard adblock)
226
+ // - '['-prefixed filter list headers (e.g. '[Adblock Plus 2.0]')
227
+ // Previously only the first two skip conditions ran on the raw line,
228
+ // so whitespace lines + headers inflated the displayed count.
222
229
  for (let j = 0; j < lines.length; j++) {
223
230
  const line = lines[j];
224
231
  if (line.length === 0) continue;
225
- if (line.charCodeAt(0) === 0x21) continue;
232
+ const trimmed = line.trim();
233
+ if (trimmed.length === 0) continue;
234
+ const c = trimmed.charCodeAt(0);
235
+ if (c === 0x21 || c === 0x5B) continue; // '!' or '['
226
236
  ruleCount++;
227
237
  }
228
238
  filterSet.addFilters(lines);
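The revised skip conditions in this hunk can be exercised in isolation. The following is a minimal sketch that lifts the loop into a standalone function; `countRules` and the sample list are hypothetical and only mirror the conditions shown in the added lines.

```javascript
// Hypothetical standalone version of the counting loop above.
function countRules(text) {
  let count = 0;
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed.length === 0) continue;       // empty or whitespace-only
    const c = trimmed.charCodeAt(0);
    if (c === 0x21 || c === 0x5B) continue;   // '!' comment or '[' header
    count++;
  }
  return count;
}

const sample = [
  '[Adblock Plus 2.0]',
  '! Title: example list',
  '',
  '   ',
  '||ads.example.com^',
  '/banner/*',
].join('\n');

console.log(countRules(sample)); // 2
```

Under the old conditions (raw-line empty check plus a `'!'` check only), the same sample would have counted the header and the whitespace-only line as well.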
@@ -238,7 +248,12 @@ function parseAdblockRules(filePathOrArray, options = {}) {
238
248
  // up by the TTL prune on a future run) but the final cachePath is
239
249
  // either complete or absent — never half-written.
240
250
  const tmpPath = cachePath + '.' + process.pid + '.tmp';
241
- fs.writeFileSync(tmpPath, Buffer.from(serialized));
251
+ // Buffer.from(buffer) ALWAYS copies — wasteful when adblock-rs's
252
+ // serialize() already returns a Buffer (binding-version dependent).
253
+ // For a ~10MB compiled engine that's a pointless 5-10ms allocate+
254
+ // memcpy on the cold-cache-write path.
255
+ const out = Buffer.isBuffer(serialized) ? serialized : Buffer.from(serialized);
256
+ fs.writeFileSync(tmpPath, out);
242
257
  fs.renameSync(tmpPath, cachePath);
243
258
  // Best-effort prune of stale cache files. Done after our own write so
244
259
  // we never delete the entry we just created.
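A small sketch of the copy-avoidance check in this hunk, assuming only what the comment states: `Buffer.from(buffer)` copies, so an input that is already a `Buffer` can be written as-is. The helper name `asBuffer` is hypothetical.

```javascript
// Hypothetical helper mirroring the check above: reuse an existing Buffer,
// convert anything else (for example a plain Uint8Array) exactly once.
function asBuffer(serialized) {
  return Buffer.isBuffer(serialized) ? serialized : Buffer.from(serialized);
}

const alreadyBuffer = Buffer.from([1, 2, 3]);
const typedArray = new Uint8Array([1, 2, 3]);

console.log(asBuffer(alreadyBuffer) === alreadyBuffer); // true
console.log(Buffer.isBuffer(asBuffer(typedArray)));     // true
```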
@@ -287,8 +302,6 @@ function parseAdblockRules(filePathOrArray, options = {}) {
287
302
  }
288
303
 
289
304
  return {
290
- rules: { stats },
291
-
292
305
  shouldBlock(url, sourceUrl, resourceType) {
293
306
  // Avoid default-parameter syntax in the hot path — explicit null/undefined
294
307
  // checks are slightly cheaper for V8's argument adaptor.
package/lib/adblock.js CHANGED
@@ -85,22 +85,26 @@ function parseAdblockRules(filePath, options = {}) {
85
85
  const lines = fileContent.split('\n');
86
86
 
87
87
  const rules = {
88
- domainMap: new Map(), // ||domain.com^ - Exact domains for O(1) lookup
89
- domainRules: [], // ||*.domain.com^ - Wildcard domains (fallback)
90
- thirdPartyRules: [], // ||domain.com^$third-party
91
- firstPartyRules: [],
92
- pathRules: [], // /ads/*
93
- scriptRules: [], // .js$script
94
- regexRules: [], // /regex/
95
- whitelist: [], // @@||domain.com^ - Wildcard whitelist
96
- whitelistMap: new Map(), // Exact whitelist domains for O(1) lookup
97
- elementHiding: [], // ##.ad-class (not used for network blocking)
88
+ domainMap: new Map(), // ||domain.com^ - Exact domains for O(1) lookup
89
+ domainRules: [], // ||*.domain.com^ - Wildcard domains (fallback)
90
+ thirdPartyDomainMap: new Map(), // ||domain.com^$third-party (exact) — O(1)
91
+ thirdPartyRules: [], // wildcard / non-domain $third-party (fallback)
92
+ firstPartyDomainMap: new Map(), // ||domain.com^$first-party (exact) — O(1)
93
+ firstPartyRules: [], // wildcard / non-domain $first-party (fallback)
94
+ pathRules: [], // /ads/*
95
+ scriptRules: [], // .js$script
96
+ regexRules: [], // /regex/
97
+ whitelist: [], // @@||domain.com^ - Wildcard whitelist
98
+ whitelistMap: new Map(), // Exact whitelist domains for O(1) lookup
99
+ elementHiding: [], // ##.ad-class (not used for network blocking)
98
100
  stats: {
99
101
  total: 0,
100
102
  domain: 0,
101
- domainMapEntries: 0, // Exact domain matches in Map
103
+ domainMapEntries: 0, // Exact domain matches in Map
102
104
  thirdParty: 0,
105
+ thirdPartyMapEntries: 0, // Exact-domain $third-party rules in Map
103
106
  firstParty: 0,
107
+ firstPartyMapEntries: 0, // Exact-domain $first-party rules in Map
104
108
  path: 0,
105
109
  script: 0,
106
110
  regex: 0,
@@ -161,12 +165,28 @@ function parseAdblockRules(filePath, options = {}) {
161
165
  // Regular blocking rules
162
166
  const parsedRule = parseRule(line, false, enableLogging);
163
167
 
164
- // Categorize based on rule type
168
+ // Categorize based on rule type. For $third-party and $first-party
169
+ // rules we additionally split out the exact-domain variants into a
170
+ // hash map keyed by hostname, mirroring the domainMap pattern. This
171
+ // turns the common `||example.com^$third-party` lookup from O(N) over
172
+ // thousands of array entries into O(1) by hostname (+ small parent
173
+ // walk). Wildcard / non-domain party rules still fall back to the
174
+ // linear array.
165
175
  if (parsedRule.isThirdParty) {
166
- rules.thirdPartyRules.push(parsedRule);
176
+ if (parsedRule.isDomain && parsedRule.domain && !parsedRule.domain.includes('*')) {
177
+ rules.thirdPartyDomainMap.set(parsedRule.domain.toLowerCase(), parsedRule);
178
+ rules.stats.thirdPartyMapEntries++;
179
+ } else {
180
+ rules.thirdPartyRules.push(parsedRule);
181
+ }
167
182
  rules.stats.thirdParty++;
168
183
  } else if (parsedRule.isFirstParty) {
169
- rules.firstPartyRules.push(parsedRule);
184
+ if (parsedRule.isDomain && parsedRule.domain && !parsedRule.domain.includes('*')) {
185
+ rules.firstPartyDomainMap.set(parsedRule.domain.toLowerCase(), parsedRule);
186
+ rules.stats.firstPartyMapEntries++;
187
+ } else {
188
+ rules.firstPartyRules.push(parsedRule);
189
+ }
170
190
  rules.stats.firstParty++;
171
191
  } else if (parsedRule.isDomain) {
172
192
  // Store exact domains in Map for O(1) lookup, wildcards in array
@@ -201,7 +221,11 @@ function parseAdblockRules(filePath, options = {}) {
201
221
  console.log(formatLogMessage('debug', ` • Exact matches (Map): ${rules.stats.domainMapEntries}`));
202
222
  console.log(formatLogMessage('debug', ` • Wildcard patterns (Array): ${rules.domainRules.length}`));
203
223
  console.log(formatLogMessage('debug', ` - Third-party rules: ${rules.stats.thirdParty}`));
224
+ console.log(formatLogMessage('debug', ` • Exact matches (Map): ${rules.stats.thirdPartyMapEntries}`));
225
+ console.log(formatLogMessage('debug', ` • Wildcard/path (Array): ${rules.thirdPartyRules.length}`));
204
226
  console.log(formatLogMessage('debug', ` - First-party rules: ${rules.stats.firstParty}`));
227
+ console.log(formatLogMessage('debug', ` • Exact matches (Map): ${rules.stats.firstPartyMapEntries}`));
228
+ console.log(formatLogMessage('debug', ` • Wildcard/path (Array): ${rules.firstPartyRules.length}`));
205
229
  console.log(formatLogMessage('debug', ` - Path rules: ${rules.stats.path}`));
206
230
  console.log(formatLogMessage('debug', ` - Script rules: ${rules.stats.script}`));
207
231
  console.log(formatLogMessage('debug', ` - Regex rules: ${rules.stats.regex}`));
@@ -445,7 +469,14 @@ function createMatcher(rules, options = {}) {
445
469
  let resultCacheHits = 0, resultCacheMisses = 0;
446
470
  let urlCacheHits = 0, urlCacheMisses = 0;
447
471
  let sourceCacheHits = 0, sourceCacheMisses = 0;
448
- const hasPartyRules = rules.thirdPartyRules.length > 0 || rules.firstPartyRules.length > 0;
472
+ // Include the new domain-maps in the party-rules presence check — without
473
+ // this, a filter list whose $third-party rules ALL went into the Map (empty
474
+ // array) would never trigger third-party detection, silently disabling the
475
+ // entire third-party path.
476
+ const hasPartyRules = rules.thirdPartyRules.length > 0 ||
477
+ rules.firstPartyRules.length > 0 ||
478
+ rules.thirdPartyDomainMap.size > 0 ||
479
+ rules.firstPartyDomainMap.size > 0;
449
480
  // Result cache uses FIFO eviction (see FIFOCache class comment) —
450
481
  // evicts oldest entries one at a time instead of clearing everything.
451
482
  const resultCache = new FIFOCache(32000);
@@ -634,6 +665,29 @@ function createMatcher(rules, options = {}) {
634
665
 
635
666
  // Check third-party rules
636
667
  if (isThirdParty) {
668
+ // Fast path: exact-domain $third-party rules (O(1) by hostname)
669
+ let rule = rules.thirdPartyDomainMap.get(lowerHostname);
670
+ if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
671
+ if (enableLogging) {
672
+ console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked third-party: ${url} (${rule.raw || rule.pattern})`));
673
+ }
674
+ const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'third_party_rule' };
675
+ resultCacheSet(url, sourceUrl, resourceType, r);
676
+ return r;
677
+ }
678
+ // Parent-domain $third-party rules — same walk as domainMap
679
+ for (let i = 0; i < parents.length; i++) {
680
+ rule = rules.thirdPartyDomainMap.get(parents[i]);
681
+ if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
682
+ if (enableLogging) {
683
+ console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked third-party: ${url} (${rule.raw || rule.pattern})`));
684
+ }
685
+ const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'third_party_rule' };
686
+ resultCacheSet(url, sourceUrl, resourceType, r);
687
+ return r;
688
+ }
689
+ }
690
+ // Slow path: wildcard / non-domain $third-party rules
637
691
  const thirdPartyLen = rules.thirdPartyRules.length; // V8: Cache length
638
692
  for (let i = 0; i < thirdPartyLen; i++) {
639
693
  const rule = rules.thirdPartyRules[i];
@@ -650,6 +704,29 @@ function createMatcher(rules, options = {}) {
650
704
 
651
705
  // Check first-party rules
652
706
  if (!isThirdParty) {
707
+ // Fast path: exact-domain $first-party rules (O(1) by hostname)
708
+ let rule = rules.firstPartyDomainMap.get(lowerHostname);
709
+ if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
710
+ if (enableLogging) {
711
+ console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked first-party: ${url} (${rule.raw || rule.pattern})`));
712
+ }
713
+ const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'first_party_rule' };
714
+ resultCacheSet(url, sourceUrl, resourceType, r);
715
+ return r;
716
+ }
717
+ // Parent-domain $first-party rules
718
+ for (let i = 0; i < parents.length; i++) {
719
+ rule = rules.firstPartyDomainMap.get(parents[i]);
720
+ if (rule && matchesRule(rule, url, hostname, isThirdParty, resourceType, sourceDomain)) {
721
+ if (enableLogging) {
722
+ console.log(formatLogMessage('debug', `${ADBLOCK_TAG} Blocked first-party: ${url} (${rule.raw || rule.pattern})`));
723
+ }
724
+ const r = { blocked: true, rule: rule.raw || rule.pattern, reason: 'first_party_rule' };
725
+ resultCacheSet(url, sourceUrl, resourceType, r);
726
+ return r;
727
+ }
728
+ }
729
+ // Slow path: wildcard / non-domain $first-party rules
653
730
  const firstPartyLen = rules.firstPartyRules.length;
654
731
  for (let i = 0; i < firstPartyLen; i++) {
655
732
  const rule = rules.firstPartyRules[i];
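The exact-domain fast path plus parent walk in these hunks can be sketched on its own. This is a simplified illustration under stated assumptions: `parentDomains` and `findPartyRule` are hypothetical names, the way the real code builds its `parents` array is not visible in this diff, and the real lookup additionally calls `matchesRule` before accepting a hit.

```javascript
// Assumed parent derivation: 'cdn.ads.example.com' yields
// ['ads.example.com', 'example.com'] (the bare TLD is excluded).
function parentDomains(hostname) {
  const parts = hostname.split('.');
  const parents = [];
  for (let i = 1; i < parts.length - 1; i++) {
    parents.push(parts.slice(i).join('.'));
  }
  return parents;
}

// Hypothetical lookup: one Map probe for the exact hostname, then one probe
// per parent domain, instead of a linear scan over every party rule.
function findPartyRule(domainMap, hostname) {
  const lower = hostname.toLowerCase();
  const exact = domainMap.get(lower);
  if (exact) return exact;
  for (const parent of parentDomains(lower)) {
    const rule = domainMap.get(parent);
    if (rule) return rule;
  }
  return null;
}

const thirdPartyDomainMap = new Map([
  ['example.com', { raw: '||example.com^$third-party' }],
]);

console.log(findPartyRule(thirdPartyDomainMap, 'CDN.Ads.Example.com').raw);
// ||example.com^$third-party
console.log(findPartyRule(thirdPartyDomainMap, 'example.org')); // null
```

The cost per request is bounded by the number of labels in the hostname rather than by the number of `$third-party` or `$first-party` rules in the list.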
@@ -107,10 +107,13 @@ async function performGroupWindowCleanup(browserInstance, groupDescription, forc
107
107
  // Identify the main Puppeteer window (should be about:blank or the initial page)
108
108
  let mainPuppeteerPage = null;
109
109
  let pagesToClose = [];
110
-
111
- // Find the main page - typically the first page that's about:blank or has been there longest
110
+
111
+ // First pass: synchronous categorization. Separate blank pages from
112
+ // content pages so the conservative-mode isPageFromPreviousScan() checks
113
+ // can run in parallel via Promise.all below, instead of N sequential
114
+ // awaits (each potentially a CDP roundtrip for page.title()).
115
+ const contentPages = [];
112
116
  for (const page of allPages) {
113
- // Cache page.url() call to avoid repeated DOM/browser communication
114
117
  const pageUrl = page.url();
115
118
  if (pageUrl === 'about:blank' || pageUrl === '' || pageUrl.startsWith('chrome://')) {
116
119
  if (!mainPuppeteerPage) {
@@ -119,18 +122,21 @@ async function performGroupWindowCleanup(browserInstance, groupDescription, forc
119
122
  pagesToClose.push(page); // Additional blank pages can be closed
120
123
  }
121
124
  } else {
122
- // Any page with actual content should be evaluated for closure
123
- if (cleanupMode === "all") {
124
- // Aggressive mode: close all content pages
125
- pagesToClose.push(page);
126
- } else {
127
- // Conservative mode: only close pages that look like leftovers from previous scans
128
- // Keep pages that might still be actively used
129
- const isOldPage = await isPageFromPreviousScan(page, forceDebug);
130
- if (isOldPage) {
131
- pagesToClose.push(page);
132
- }
133
- }
125
+ contentPages.push(page);
126
+ }
127
+ }
128
+
129
+ if (cleanupMode === "all") {
130
+ // Aggressive mode: close all content pages; no per-page async check
131
+ for (const page of contentPages) pagesToClose.push(page);
132
+ } else {
133
+ // Conservative mode: run the isPageFromPreviousScan checks in parallel
134
+ // and collect the leftovers in original order.
135
+ const checks = await Promise.all(
136
+ contentPages.map(page => isPageFromPreviousScan(page, forceDebug))
137
+ );
138
+ for (let i = 0; i < contentPages.length; i++) {
139
+ if (checks[i]) pagesToClose.push(contentPages[i]);
134
140
  }
135
141
  }
136
142
 
@@ -391,12 +397,13 @@ async function performRealtimeWindowCleanup(browserInstance, threshold = REALTIM
391
397
  if (forceDebug) {
392
398
  console.log(formatLogMessage('debug', `${REALTIME_CLEANUP_TAG} Found ${contextPages.length} pages in popup context`));
393
399
  }
394
- // Close popup context pages
395
- for (const page of contextPages) {
396
- if (!page.isClosed()) {
397
- await page.close();
398
- }
399
- }
400
+ // Close popup context pages in parallel — each close is
401
+ // independent and the sequential await was both slow AND would
402
+ // abort the whole loop on the first close failure, leaking the
403
+ // remaining pages. .catch() per page ensures we attempt all.
404
+ await Promise.all(contextPages.map(page =>
405
+ page.isClosed() ? undefined : page.close().catch(() => {})
406
+ ));
400
407
  }
401
408
  }
402
409
  } catch (contextErr) {
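The parallel close with a per-page `.catch()` can be tried without a browser. A minimal sketch follows, in which `closeAll` and `makeFakePage` are hypothetical and the fake pages only imitate the `isClosed()` / `close()` surface used in the hunk.

```javascript
// Same shape as the added code: attempt every close, never let one failure
// prevent the others from being attempted.
async function closeAll(pages) {
  await Promise.all(pages.map((page) =>
    page.isClosed() ? undefined : page.close().catch(() => {})
  ));
}

// Hypothetical stand-in for a Puppeteer Page.
function makeFakePage(shouldFail) {
  let closed = false;
  return {
    isClosed: () => closed,
    close: async () => {
      if (shouldFail) throw new Error('close failed');
      closed = true;
    },
  };
}

const pages = [makeFakePage(false), makeFakePage(true), makeFakePage(false)];

closeAll(pages).then(() => {
  console.log(pages.map((p) => p.isClosed())); // [ true, false, true ]
});
```

With the previous sequential `await page.close()` loop, the failure on the second page would have thrown out of the loop and left the third page open.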
@@ -646,11 +653,17 @@ async function testNetworkCapability(browserInstance, timeout = 10000) {
646
653
 
647
654
  const startTime = Date.now();
648
655
  let testPage = null;
656
+ // Hoisted so the catch can attach an orphan-close chain. Promise.race
657
+ // cannot cancel browser.newPage() — if the race times out, the underlying
658
+ // call may still resolve to a real Page tab nothing references. Same
659
+ // pattern as cdp.js (commit 0772ccd) and clear_sitedata.js (commit 780b443).
660
+ let testPagePromise = null;
649
661
 
650
662
  try {
651
663
  // Create test page
664
+ testPagePromise = browserInstance.newPage();
652
665
  testPage = await raceWithTimeout(
653
- browserInstance.newPage(),
666
+ testPagePromise,
654
667
  timeout,
655
668
  'Test page creation timeout'
656
669
  );
@@ -673,21 +686,26 @@ async function testNetworkCapability(browserInstance, timeout = 10000) {
673
686
  result.responseTime = Date.now() - startTime;
674
687
 
675
688
  } catch (error) {
689
+ // Orphan cleanup: if testPage is null but newPage() was started, the
690
+ // race timed out before assignment. Close the orphan when it arrives.
691
+ if (!testPage && testPagePromise) {
692
+ testPagePromise.then(p => p.close().catch(() => {})).catch(() => {});
693
+ }
676
694
  result.error = error.message;
677
695
  result.responseTime = Date.now() - startTime;
678
696
 
679
697
  // Classify the error type
680
- if (error.message.includes('Network.enable') ||
698
+ if (error.message.includes('Network.enable') ||
681
699
  error.message.includes('timed out') ||
682
700
  error.message.includes('Protocol error')) {
683
701
  result.error = `Network capability test failed: ${error.message}`;
684
702
  }
685
703
  } finally {
686
704
  if (testPage && !testPage.isClosed()) {
687
- try {
688
- await testPage.close();
689
- } catch (closeErr) {
690
- /* ignore cleanup errors */
705
+ try {
706
+ await testPage.close();
707
+ } catch (closeErr) {
708
+ /* ignore cleanup errors */
691
709
  }
692
710
  }
693
711
  }
@@ -740,9 +758,15 @@ async function checkBrowserHealth(browserInstance, timeout = 8000) {
740
758
 
741
759
  // Test 4: Create a single test page to verify both browser functionality AND network capability
742
760
  let testPage = null;
761
+ // Same orphan-cleanup pattern as testNetworkCapability above + cdp.js +
762
+ // clear_sitedata.js. Promise.race can't cancel newPage() — if the race
763
+ // times out the underlying call may still produce a Page tab nothing
764
+ // references → leaked tab.
765
+ let testPagePromise = null;
743
766
  try {
767
+ testPagePromise = browserInstance.newPage();
744
768
  testPage = await raceWithTimeout(
745
- browserInstance.newPage(),
769
+ testPagePromise,
746
770
  timeout,
747
771
  'Page creation timeout'
748
772
  );
@@ -780,6 +804,11 @@ async function checkBrowserHealth(browserInstance, timeout = 8000) {
780
804
  await testPage.close();
781
805
 
782
806
  } catch (pageTestError) {
807
+ // Orphan cleanup: if testPage is null but newPage was started, the
808
+ // race timed out before assignment. Close the orphan when it arrives.
809
+ if (!testPage && testPagePromise) {
810
+ testPagePromise.then(p => p.close().catch(() => {})).catch(() => {});
811
+ }
783
812
  if (testPage && !testPage.isClosed()) {
784
813
  try { await testPage.close(); } catch (e) { /* ignore */ }
785
814
  }
package/lib/cdp.js CHANGED
@@ -48,7 +48,8 @@ function raceWithTimeout(promise, ms, message) {
48
48
  }
49
49
 
50
50
  // Shared no-op cleanup used by every no-CDP / CDP-failed return path. Hoisted
51
- // so createSessionResult() doesn't allocate a fresh `async () => {}` per call.
51
+ // so the success path doesn't allocate a fresh `async () => {}` per call
52
+ // when cleanup logic isn't needed, and so NOOP_SESSION_RESULT can reuse it.
52
53
  const NOOP_CLEANUP = async () => {};
53
54
 
54
55
  /**
@@ -74,27 +75,39 @@ function isCriticalCDPError(message) {
74
75
  message.includes('Browser has been closed');
75
76
  }
76
77
 
77
- /**
78
- * Creates a standardized session result object for consistent V8 optimization
79
- * @param {object|null} session - CDP session or null
80
- * @param {Function} cleanup - Cleanup function
81
- * @param {boolean} isEnhanced - Whether enhanced features are active
82
- * @returns {object} Standardized session object
83
- */
84
- const createSessionResult = (session = null, cleanup = NOOP_CLEANUP, isEnhanced = false) => ({
85
- session,
86
- cleanup,
87
- isEnhanced
78
+ // Pre-allocated singleton for both the early-exit case (CDP not enabled OR
79
+ // not in debug mode) AND the non-critical-error path. Frozen so callers can't
80
+ // mutate the shared instance. Result shape is {session, cleanup}; previously
81
+ // also carried an `isEnhanced: false` field that had zero consumers anywhere.
82
+ const NOOP_SESSION_RESULT = Object.freeze({
83
+ session: null,
84
+ cleanup: NOOP_CLEANUP
88
85
  });
89
86
 
90
87
  /**
91
- * Creates a new page with timeout protection to prevent CDP hangs
88
+ * Creates a new page with timeout protection to prevent CDP hangs.
89
+ *
90
+ * Orphan-page handling: Promise.race cannot cancel browser.newPage(). If the
91
+ * timer wins, the underlying call keeps running and eventually resolves to a
92
+ * real Page tab nothing references → leaked tab in the browser. We capture
93
+ * the original promise and attach a close-on-resolve cleanup so the orphan
94
+ * is reaped if it arrives after the race lost.
95
+ *
92
96
  * @param {import('puppeteer').Browser} browser - Browser instance
93
97
  * @param {number} timeout - Timeout in milliseconds (default: 30000)
94
98
  * @returns {Promise<import('puppeteer').Page>} Page instance
95
99
  */
96
100
  async function createPageWithTimeout(browser, timeout = 30000) {
97
- return raceWithTimeout(browser.newPage(), timeout, 'Page creation timeout - browser may be unresponsive');
101
+ const pagePromise = browser.newPage();
102
+ try {
103
+ return await raceWithTimeout(pagePromise, timeout, 'Page creation timeout - browser may be unresponsive');
104
+ } catch (err) {
105
+ // If pagePromise eventually resolves after the race gave up, close the
106
+ // orphan tab. .catch(() => {}) handles the case where pagePromise also
107
+ // rejected (no resource to clean up).
108
+ pagePromise.then(p => p.close().catch(() => {})).catch(() => {});
109
+ throw err;
110
+ }
98
111
  }
99
112
 
100
113
  /**
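The orphan-cleanup pattern described above generalizes to any resource acquired behind a timeout race. The sketch below is an illustration only: `acquireWithTimeout` is a hypothetical name, and the `raceWithTimeout` shown here is a stand-in written for this example, since the package's own implementation is not part of this diff.

```javascript
// Stand-in timeout race (assumed behaviour: rejects with `message` after `ms`).
function raceWithTimeout(promise, ms, message) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical generic form of the pattern: keep a handle on the original
// promise so a late result can still be released after the race was lost.
async function acquireWithTimeout(acquire, release, ms) {
  const pending = acquire();
  try {
    return await raceWithTimeout(pending, ms, 'acquire timeout');
  } catch (err) {
    // Late resolve: release the orphan. Late reject: nothing to clean up.
    pending.then((resource) => release(resource)).catch(() => {});
    throw err;
  }
}

const released = [];
const slowAcquire = () =>
  new Promise((resolve) => setTimeout(() => resolve('tab-1'), 50));
const release = async (resource) => { released.push(resource); };

acquireWithTimeout(slowAcquire, release, 10)
  .catch((err) => console.log(err.message))             // acquire timeout
  .then(() => new Promise((r) => setTimeout(r, 100)))   // let the orphan arrive
  .then(() => console.log(released));                   // [ 'tab-1' ]
```

The caller still sees the timeout error immediately; the release runs in the background whenever the slow acquisition eventually completes.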
@@ -171,7 +184,7 @@ async function createCDPSession(page, currentUrl, options = {}) {
171
184
  const cdpLoggingNeeded = (enableCDP || siteSpecificCDP === true) && forceDebug;
172
185
 
173
186
  if (!cdpLoggingNeeded) {
174
- return createSessionResult();
187
+ return NOOP_SESSION_RESULT;
175
188
  }
176
189
 
177
190
  // Parse the current URL hostname once and reuse it for the mode-log line,
@@ -187,11 +200,16 @@ async function createCDPSession(page, currentUrl, options = {}) {
187
200
  }
188
201
 
189
202
  let cdpSession = null;
203
+ let cdpSessionPromise = null;
190
204
 
191
205
  try {
192
- // Create CDP session using modern Puppeteer 20+ API
193
- // Add timeout protection for CDP session creation
194
- cdpSession = await raceWithTimeout(page.createCDPSession(), 20000, 'CDP session creation timeout');
206
+ // Create CDP session using modern Puppeteer 20+ API.
207
+ // Capture the promise BEFORE racing so the catch block can attach an
208
+ // orphan-cleanup chain: if our race times out but the underlying
209
+ // createCDPSession() later resolves, we'd otherwise leak a CDP session
210
+ // on the browser side that nothing references.
211
+ cdpSessionPromise = page.createCDPSession();
212
+ cdpSession = await raceWithTimeout(cdpSessionPromise, 20000, 'CDP session creation timeout');
195
213
 
196
214
  // Enable network domain — required for network event monitoring. This is
197
215
  // the operation the rest of the codebase has learned can hang under
@@ -221,10 +239,13 @@ async function createCDPSession(page, currentUrl, options = {}) {
221
239
 
222
240
  console.log(formatLogMessage('debug', `${CDP_TAG} CDP session created successfully for ${currentUrl}`));
223
241
 
224
- return createSessionResult(
225
- cdpSession,
226
- async () => {
227
- // Safe cleanup that never throws errors
242
+ return {
243
+ session: cdpSession,
244
+ cleanup: async () => {
245
+ // Safe cleanup that never throws errors. Idempotent — null out the
246
+ // captured reference after the first successful detach so a
247
+ // double-cleanup is a true no-op instead of generating a misleading
248
+ // "Failed to detach: Session closed" debug log on the second call.
228
249
  if (cdpSession) {
229
250
  try {
230
251
  await cdpSession.detach();
@@ -232,28 +253,41 @@ async function createCDPSession(page, currentUrl, options = {}) {
232
253
  } catch (cdpCleanupErr) {
233
254
  // Log cleanup errors but don't throw - cleanup should never fail the calling code
234
255
  console.log(formatLogMessage('debug', `${CDP_TAG} Failed to detach CDP session for ${currentUrl}: ${cdpCleanupErr.message}`));
256
+ } finally {
257
+ cdpSession = null;
235
258
  }
236
259
  }
237
- },
238
- false
239
- );
260
+ }
261
+ };
240
262
 
241
263
  } catch (cdpErr) {
242
- // If the session was created but a subsequent send/wire-up failed, detach
243
- // it so we don't leak a half-attached session. Previously the code just
244
- // nulled the local and orphaned the session. We're already past the
245
- // cdpLoggingNeeded gate here so forceDebug is true; log a failed detach
246
- // instead of swallowing it, so partial-cleanup failures aren't invisible.
264
+ // Two distinct cleanup paths depending on where the failure was:
265
+ //
266
+ // a) cdpSession IS set → failure was AFTER createCDPSession() resolved
267
+ // (e.g. Network.enable timed out). We have a real handle — detach
268
+ // directly. Previously the code just nulled the local and orphaned
269
+ // the session; now we detach and log any failure.
270
+ //
271
+ // b) cdpSession is null but cdpSessionPromise was started → the race
272
+ // timed out before assignment. The underlying createCDPSession()
273
+ // may still resolve later, producing an orphan session on the
274
+ // browser side. Attach a detach-on-resolve chain; .catch(()=>{})
275
+ // swallows the case where the underlying promise also rejected.
247
276
  if (cdpSession) {
248
277
  try { await cdpSession.detach(); }
249
278
  catch (partialDetachErr) {
250
279
  console.log(formatLogMessage('debug', `${CDP_TAG} Partial-session detach failed for ${currentUrl}: ${partialDetachErr.message}`));
251
280
  }
252
- cdpSession = null;
281
+ } else if (cdpSessionPromise) {
282
+ cdpSessionPromise.then(s => s.detach().catch(() => {})).catch(() => {});
253
283
  }
254
284
 
255
- // Enhanced error context for CDP domain-specific debugging
256
- const urlContext = safeHostname(currentUrl, `${currentUrl.substring(0, 50)}...`);
285
+ // Enhanced error context for CDP domain-specific debugging. Reuse the
286
+ // currentHostname computed at function entry (one URL parse vs two);
287
+ // only fall back to the truncated raw URL when that parse failed too.
288
+ const urlContext = currentHostname !== 'unknown'
289
+ ? currentHostname
290
+ : `${currentUrl.substring(0, 50)}...`;
257
291
 
258
292
  // Critical errors: browser is broken, propagate so the caller can restart.
259
293
  if (isCriticalCDPError(cdpErr.message)) {
@@ -265,7 +299,7 @@ async function createCDPSession(page, currentUrl, options = {}) {
265
299
  console.warn(formatLogMessage('warn', `${CDP_TAG} Failed to attach CDP session for ${urlContext}: ${cdpErr.message}`));
266
300
 
267
301
  // Return null session with no-op cleanup for consistent API
268
- return createSessionResult();
302
+ return NOOP_SESSION_RESULT;
269
303
  }
270
304
  }
271
305
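The idempotent cleanup introduced in `createCDPSession` (null out the captured reference in `finally`) can be checked in isolation. This is a sketch under assumptions: `makeSessionResult` is a hypothetical factory, the `log` callback replaces the package's `formatLogMessage` logging, and `fakeSession` imitates a session whose second `detach()` would fail.

```javascript
// Hypothetical factory returning the same {session, cleanup} shape as above.
function makeSessionResult(session, log) {
  let current = session;
  return {
    session,
    cleanup: async () => {
      if (!current) return; // already cleaned up: true no-op
      try {
        await current.detach();
      } catch (err) {
        log(`Failed to detach: ${err.message}`);
      } finally {
        current = null; // guarantees at most one detach attempt
      }
    },
  };
}

let detachCalls = 0;
const fakeSession = {
  detach: async () => {
    detachCalls++;
    if (detachCalls > 1) throw new Error('Session closed');
  },
};

const logs = [];
const result = makeSessionResult(fakeSession, (m) => logs.push(m));

result.cleanup()
  .then(() => result.cleanup())
  .then(() => console.log(detachCalls, logs.length)); // 1 0
```

Without the `finally` reset, the second call would reach `detach()` again and produce the misleading "Session closed" log line the comment mentions.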