barebrowse 0.9.1 → 0.10.0
- package/CHANGELOG.md +76 -0
- package/barebrowse.context.md +6 -1
- package/cli.js +22 -0
- package/package.json +1 -1
- package/src/blocklist.js +190 -0
- package/src/chromium.js +39 -9
- package/src/daemon.js +12 -3
- package/src/index.js +56 -8
- package/src/stealth.js +69 -0
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,81 @@
 # Changelog
 
+## 0.10.0
+
+### Ad/tracker URL blocking + canvas-noise stealth + Chromium pgid reap fix
+
+Scrapling-inspired additions to make every snapshot quieter and every
+headless session less fingerprintable, plus a flake fix surfaced by the
+new work.
+
+- **Ad/tracker URL blocking via CDP `Network.setBlockedURLs`.** New
+  `src/blocklist.js` ships ~120 hand-curated glob patterns covering the
+  high-frequency tracker families: Google ads + analytics, Facebook
+  Pixel, Amazon ads, MS Clarity/Bing, Adobe Marketing Cloud, the
+  consumer-pixel cluster (LinkedIn/Twitter/TikTok/Snap/Pinterest), the
+  SaaS analytics stacks (Segment/Amplitude/Mixpanel/Heap/PostHog),
+  session-replay (Hotjar/FullStory/LogRocket/Crazy Egg/Mouseflow),
+  content recommendation (Criteo/Taboola/Outbrain), supply-side ad
+  networks (AppNexus/Rubicon/PubMatic/OpenX/Trade Desk), and marketing
+  automation (HubSpot/Marketo/Pardot/Intercom/Drift). Curated by traffic
+  frequency rather than pulled wholesale from Peter Lowe — CDP does
+  linear pattern matching per request, so the long tail of regional
+  networks was measurable cost (~10ms cumulative on a 100-request page)
+  for ~5% extra coverage we'd rarely hit in agent traffic. Net effect:
+  smaller ARIA snapshots and faster page loads.
+- **`opts.blockAds` and `opts.blockUrls` on `connect()` and `browse()`.**
+  `blockAds` defaults to `true` for launched browsers and `false` in
+  attach mode (would otherwise affect any tab in the user's running
+  browser). Explicit `blockAds: true` in attach mode is honored and
+  follows the session across `switchTab()`. `blockUrls` accepts extra
+  glob patterns merged with the default unless `blockAds: false`.
+- **CLI flags on `bb open`: `--no-block-ads` and `--block-urls=PATTERN`**
+  (the latter repeatable). Plumbed through `cli.js`, `src/daemon.js`
+  startDaemon args, and `runDaemon` → `connect()`. Not exposed via MCP
+  or bareagent on purpose — agents inside a session shouldn't be
+  reconfiguring infra per tool call; the decision belongs at session
+  start.
+- **Canvas fingerprint noise** in `src/stealth.js`. After WebGL
+  (already spoofed in v0.9.0), canvas `toDataURL` / `getImageData` is
+  the second-most-checked fingerprint vector — the pixel output of
+  rendered text/shapes depends on GPU, driver, and font rasterizer in
+  ways that are stable per machine but unique across machines, which
+  makes it a tracking signal that survives cookie clearing. The patch
+  XORs ~1 bit per 64-byte stride into the read pixels, with the bit
+  derived from a position-mixed hash of a per-session
+  `crypto.getRandomValues`-seeded value. Output is stable within a
+  session (so legitimate canvas use doesn't flicker) and different
+  across sessions (so fingerprinters see a fresh hash on every visit).
+  The canvas bitmap is snapshotted and restored around encoding so any
+  downstream legitimate read sees the original pixels.
+- **Pre-existing Chromium subprocess reap flake fixed.** Chromium
+  spawns renderer/GPU/network/utility subprocesses that, under
+  `--site-per-process` (v0.9.0 H2), can outlive SIGTERM on the
+  Chromium parent by seconds while still holding profile-dir file
+  handles. Without `detached: true`, all of them shared Node's process
+  group — there was no way to signal the whole Chromium tree without
+  enumerating PIDs. `src/chromium.js` now spawns with `detached: true`
+  so each Chromium becomes its own process-group leader, and
+  `cleanupBrowser` / `reapAllSync` send SIGKILL to the negative PID
+  (the whole group) before `rmSync`. Latent in `main`, but the new
+  blocklist's added CDP setup overlapped the cleanup window enough to
+  hit ~1-in-3 under parallel test load. Side effect: terminal SIGINT
+  now goes to Node's pgid only — `registerExitHandlers`' SIGINT
+  reaper is what kills Chromium under Ctrl-C and must not be removed.
+- **`startDaemon` poll deadline 15s → 30s** for cold-boot margin on
+  slower hardware (CI / older boxes) now that the blocklist adds a
+  small amount of CDP setup time to the session-startup path.
+- **Tests:** 138 total (10 new). New: 5-test unit suite for
+  `DEFAULT_BLOCKLIST` (shape/coverage drift guards, must-cover
+  tracker families, no dups); 2-test integration suite that proves
+  `Network.setBlockedURLs` actually drops the matching subresource
+  and that `blockAds:false` lets it through; 2 new canvas-noise
+  subtests (patch installed, stable within session, different across
+  sessions); 1 end-to-end `bb open --block-urls=PATTERN URL` test
+  that proves the flag survives every hop through `cli.js` →
+  `startDaemon` → daemon-internal → `connect()` → `setBlockedURLs`
+  and that the tracker server sees zero hits.
+
 ## 0.9.1
 
 ### Pruning — `pruneMode` reaches MCP / bareagent and `read` finally works
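The canvas-noise entry above describes the technique but `src/stealth.js` is not shown in this part of the diff. The following is a minimal sketch of the idea only, not the package's code: `mixHash` and `addCanvasNoise` are hypothetical names, the integer mixer is an illustrative choice, and the seed is passed in as a parameter where the changelog says the real patch seeds it once per session from `crypto.getRandomValues`.

```javascript
// Sketch only: hypothetical helpers, not the contents of src/stealth.js.

// Mix a per-session seed with a byte position into a 32-bit unsigned hash.
// The constants are a common integer finalizer; any well-mixing hash would do.
function mixHash(seed, index) {
  let h = (seed ^ Math.imul(index, 0x9e3779b1)) >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b) >>> 0;
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35) >>> 0;
  h ^= h >>> 16;
  return h >>> 0;
}

// Return a noised COPY of the pixel bytes. For each 64-byte stride, one
// hash-selected byte has its lowest bit XORed with a hash-derived bit.
// The input array is never modified, mirroring the "original pixels stay
// intact for legitimate reads" property the changelog describes.
function addCanvasNoise(pixels, seed, stride = 64) {
  const out = new Uint8ClampedArray(pixels);
  for (let base = 0; base < out.length; base += stride) {
    const h = mixHash(seed, base);
    const offset = base + (h % stride);
    if (offset < out.length) out[offset] ^= (h >>> 8) & 1;
  }
  return out;
}
```

Because the hash depends only on the seed and the position, the same seed always produces the same noise (stable within a session), while a fresh seed shifts which bits flip (different across sessions).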
package/barebrowse.context.md
CHANGED
@@ -45,6 +45,8 @@ const snapshot = await browse('https://example.com', {
   prune: true, // apply ARIA pruning (47-95% token reduction)
   pruneMode: 'act', // 'act' (interactive elements) | 'read' (all content)
   consent: true, // auto-dismiss cookie consent dialogs
+  blockAds: true, // block ~120 ad/tracker URL patterns (default on for owned browsers)
+  blockUrls: [], // extra URL globs to block (merged with the default)
   timeout: 30000, // navigation timeout in ms
 });
 ```
@@ -91,6 +93,8 @@ const snapshot = await browse('https://example.com', {
 - `viewport: '1280x720'` — Set viewport dimensions
 - `storageState: 'file.json'` — Load cookies/localStorage from saved state
 - `downloadPath: '/abs/dir'` — Where downloads land. Default: per-session `mkdtemp` under `/tmp/barebrowse-dl-*` that gets removed on `close()`. Caller-supplied paths are not cleaned up — caller owns the lifecycle.
+- `blockAds: true|false` — CDP-level URL blocking of ~120 common ad/tracker patterns (Google ads/analytics, FB/Amazon/MS/Adobe ad+analytics, Segment/Amplitude/Mixpanel/Heap, Hotjar/FullStory/LogRocket, Criteo/Taboola/Outbrain, the consumer-pixel cluster, AppNexus/Rubicon/PubMatic supply, marketing automation). Default `true` for launched browsers, `false` in attach mode (would affect any tab in the user's running browser). Explicit `true` in attach mode is honored and follows the session across `switchTab()`. Shrinks ARIA snapshots and speeds page loads.
+- `blockUrls: ['*://foo.com/*', ...]` — Extra glob patterns (CDP `Network.setBlockedURLs` format) to block in addition to the default. Merged with the default unless `blockAds: false`.
 
 ## Snapshot format
 
@@ -161,7 +165,8 @@ barebrowse can inject cookies from the user's real browser sessions, bypassing l
 | Form submission | `press('Enter')` triggers onsubmit | Both |
 | SPA navigation | `waitForNavigation()` uses loadEventFired + frameNavigated | Both |
 | Bot detection | v0.9.0 (H9): Cloudflare-strong phrases ("Just a moment", "Attention Required", "verify you are human") fire alone; generic phrases ("access denied", "unknown error") only fire on near-empty pages — no more false-positive headed-launches on legitimate 4xx/5xx pages. `botBlocked` flag set after every `goto()`. Hybrid fallback switches to headed. Snapshot shows `[BOT CHALLENGE DETECTED]` warning. | Hybrid |
-| Stealth (headless tells) | v0.9.0 (H4): `Network.setUserAgentOverride` strips "HeadlessChrome" from UA in HTTP headers AND `navigator.userAgent`; JS patches for webdriver, plugins, languages, full `chrome.runtime` enum shape, `Notification` constructor + `permission: 'default'`, `hardwareConcurrency: 8`, `deviceMemory: 8`, WebGL `UNMASKED_VENDOR_WEBGL`/`UNMASKED_RENDERER_WEBGL` spoofed to Intel | Headless |
+| Stealth (headless tells) | v0.9.0 (H4): `Network.setUserAgentOverride` strips "HeadlessChrome" from UA in HTTP headers AND `navigator.userAgent`; JS patches for webdriver, plugins, languages, full `chrome.runtime` enum shape, `Notification` constructor + `permission: 'default'`, `hardwareConcurrency: 8`, `deviceMemory: 8`, WebGL `UNMASKED_VENDOR_WEBGL`/`UNMASKED_RENDERER_WEBGL` spoofed to Intel. v0.10.0: canvas fingerprint noise — `toDataURL`/`getImageData` XOR a per-session `crypto.getRandomValues`-seeded mask into ~1 byte per 64-byte stride (stable within a session, different across sessions; bitmap is restored after encoding so legitimate canvas use is unaffected). | Headless |
+| Ad / tracker URL blocking | v0.10.0: CDP `Network.setBlockedURLs` with ~120 curated patterns (Google/FB/Amazon/MS/Adobe ad+analytics, the major SaaS analytics + session-replay stacks, content-rec, supply-side ad networks, marketing automation). Default on for launched browsers, off in attach mode. `opts.blockUrls` extends; `opts.blockAds: false` disables. Shrinks ARIA snapshots and speeds loads. | Launched |
 | iframe / OOPIF content (Stripe, reCAPTCHA, embedded forms) | v0.9.0 (H2): `Target.setAutoAttach({flatten:true})` registers a CDP session per iframe; `ariaTree()` walks `Page.getFrameTree`, fetches each frame's AX tree on the right session, splices children under iframe placeholders via `DOM.getFrameOwner`. Refs route via `{session, backendNodeId}` so clicks dispatch in the iframe's Input domain. `--site-per-process` launch flag forces every iframe — including same-origin — into OOPIF so coords work. | Both |
 | Downloads | v0.9.0 (H7): `Browser.setDownloadBehavior({behavior:'allowAndName', downloadPath, eventsEnabled:true})` + listeners populate `page.downloads`. Files land at `savedPath` (under `--download-path` if supplied, else per-session `/tmp/barebrowse-dl-*`). | Headless + Headed (skipped in attach mode) |
 | Profile locking | Unique temp dir per headless instance | Headless |
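The option semantics documented above (default on for launched browsers, off in attach mode, extras merged with the default) can be sketched as a small resolver. This is an illustration under stated assumptions, not the package's code: `resolveBlockedUrls` and `SAMPLE_DEFAULTS` are hypothetical names, and the branch for `blockAds: false` assumes that caller-supplied `blockUrls` still apply on their own, which is one reading of "merged with the default unless `blockAds: false`".

```javascript
// Sketch only: hypothetical names, two sample patterns instead of ~120.
const SAMPLE_DEFAULTS = ['*://*.doubleclick.net/*', '*://*.hotjar.com/*'];

function resolveBlockedUrls(opts = {}, attachMode = false, defaults = SAMPLE_DEFAULTS) {
  // Same default rule as the documented behaviour: an explicit value wins,
  // otherwise on for launched browsers and off in attach mode.
  const blockAds = opts.blockAds !== undefined ? opts.blockAds : !attachMode;
  const extra = Array.isArray(opts.blockUrls) ? opts.blockUrls : [];
  // ASSUMPTION: with blockAds false, extras are kept without the defaults.
  const merged = blockAds ? [...defaults, ...extra] : [...extra];
  // Drop duplicates so a caller repeating a default pattern costs nothing.
  return [...new Set(merged)];
}
```

For example, `resolveBlockedUrls({}, true)` yields an empty list in attach mode, while `resolveBlockedUrls({ blockAds: true }, true)` opts back in to the defaults.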
package/cli.js
CHANGED
@@ -117,6 +117,8 @@ async function cmdOpen() {
     viewport: parseFlag('--viewport'),
     storageState: parseFlag('--storage-state'),
     downloadPath: parseFlag('--download-path'),
+    blockAds: hasFlag('--no-block-ads') ? false : undefined,
+    blockUrls: parseFlagAll('--block-urls'),
   };
 
   try {
@@ -218,6 +220,8 @@ async function runDaemonInternal() {
     viewport: parseFlag('--viewport'),
     storageState: parseFlag('--storage-state'),
     downloadPath: parseFlag('--download-path'),
+    blockAds: hasFlag('--no-block-ads') ? false : undefined,
+    blockUrls: parseFlagAll('--block-urls'),
   };
   const outputDir = parseFlag('--output-dir') || resolve('.barebrowse');
   const url = parseFlag('--url');
@@ -240,6 +244,20 @@ function hasFlag(name) {
   return args.includes(name);
 }
 
+// Collects every occurrence of a repeatable flag (--name=val or --name val).
+// Returns undefined when absent so the opts object stays sparse and callers
+// can distinguish "not provided" from "provided but empty".
+function parseFlagAll(name) {
+  const out = [];
+  for (let i = 0; i < args.length; i++) {
+    if (args[i].startsWith(name + '=')) out.push(args[i].slice(name.length + 1));
+    else if (args[i] === name && args[i + 1] && !args[i + 1].startsWith('--')) {
+      out.push(args[i + 1]); i++;
+    }
+  }
+  return out.length ? out : undefined;
+}
+
 
 // --- MCP auto-installer ---
 
@@ -467,6 +485,10 @@ Session:
   --viewport=WxH          Viewport size (e.g. 1280x720)
   --storage-state=FILE    Load cookies/localStorage from JSON file
   --download-path=DIR     Directory for downloaded files (default: per-session temp dir)
+  --no-block-ads          Disable the built-in ad/tracker blocklist (~120 patterns).
+                          Default: enabled in owned-browser modes, disabled in attach mode.
+  --block-urls=PATTERN    Extra URL glob to block (repeatable, e.g. --block-urls='*://*.foo.com/*').
+                          Use the =VALUE form when the pattern could be mistaken for a flag.
 
 Navigation:
   barebrowse goto <url>   Navigate to URL
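To see how the repeatable-flag parsing in the hunk above behaves, here is a sketch of the same logic with the argument list passed in explicitly so it can be exercised on its own. `parseFlagAllFrom` is a hypothetical name; the `cli.js` version reads a module-level `args` array instead.

```javascript
// Sketch only: same logic as parseFlagAll, but argv is a parameter.
function parseFlagAllFrom(args, name) {
  const out = [];
  for (let i = 0; i < args.length; i++) {
    if (args[i].startsWith(name + '=')) {
      // --name=value form: everything after the first '=' is the value.
      out.push(args[i].slice(name.length + 1));
    } else if (args[i] === name && args[i + 1] && !args[i + 1].startsWith('--')) {
      // --name value form: consume the next token unless it looks like a flag.
      out.push(args[i + 1]);
      i++;
    }
  }
  // undefined (not []) when the flag never appeared.
  return out.length ? out : undefined;
}

const argv = ['open', '--block-urls=*://a.example/*', '--block-urls', '*://b.example/*', '--no-block-ads'];
const patterns = parseFlagAllFrom(argv, '--block-urls');
console.log(patterns);
```

Note the asymmetry between the two forms: a value that starts with `--` is accepted in the `=VALUE` form but skipped in the space-separated form, which is why the help text recommends `=VALUE` for ambiguous patterns.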
package/package.json
CHANGED
package/src/blocklist.js
ADDED
@@ -0,0 +1,190 @@
+/**
+ * blocklist.js — Ad/tracker URL patterns for CDP Network.setBlockedURLs.
+ *
+ * Curated by real-world frequency, not pulled wholesale from Peter Lowe /
+ * EasyList. CDP does linear pattern matching per request, so 3,000-entry
+ * lists add ~150ms cumulative cost on a typical page for ~5% extra coverage
+ * (long-tail regional networks the agent rarely encounters). The set below
+ * is ~120 patterns covering the trackers that actually show up in agent
+ * traffic: Google/FB/Amazon/MS/Adobe ad+analytics, the major SaaS analytics
+ * stacks (Segment/Amplitude/Mixpanel/HubSpot/Hotjar/FullStory/Heap/Mouseflow),
+ * session-replay (LogRocket/Crazy Egg/Optimizely/VWO), content-recommendation
+ * (Taboola/Outbrain/Criteo), and the consumer-pixel cluster (LinkedIn/Twitter/
+ * TikTok/Snap/Pinterest/Reddit).
+ *
+ * Patterns are CDP-format globs: '*' matches any character run.
+ *
+ * To extend at runtime, pass connect({ blockUrls: [...] }) — your patterns
+ * are merged with this default. To turn the default off entirely, pass
+ * { blockAds: false }.
+ */
+
+export const DEFAULT_BLOCKLIST = [
+  // --- Google ads + analytics (the single biggest cluster) ---
+  '*://*.doubleclick.net/*',
+  '*://*.googlesyndication.com/*',
+  '*://*.googleadservices.com/*',
+  '*://*.googletagservices.com/*',
+  '*://*.googletagmanager.com/*',
+  '*://*.google-analytics.com/*',
+  '*://*.adservice.google.com/*',
+  '*://pagead2.googlesyndication.com/*',
+  '*://www.googleadservices.com/pagead/*',
+  '*://ssl.google-analytics.com/*',
+  '*://stats.g.doubleclick.net/*',
+
+  // --- Facebook / Meta ---
+  '*://connect.facebook.net/*',
+  '*://*.facebook.com/tr*', // Pixel (matches both /tr/... and /tr?...)
+  '*://*.fbcdn.net/signals/*',
+
+  // --- Amazon ads ---
+  '*://*.amazon-adsystem.com/*',
+  '*://aax.amazon-adsystem.com/*',
+  '*://s.amazon-adsystem.com/*',
+
+  // --- Microsoft (Bing ads + Clarity) ---
+  '*://bat.bing.com/*',
+  '*://*.clarity.ms/*',
+
+  // --- Yandex ---
+  '*://mc.yandex.ru/*',
+  '*://an.yandex.ru/*',
+  '*://yandex.ru/ads/*',
+
+  // --- Adobe Marketing Cloud ---
+  '*://*.omtrdc.net/*',
+  '*://*.demdex.net/*',
+  '*://*.everesttech.net/*',
+  '*://*.2o7.net/*',
+  '*://*.adobedtm.com/*',
+
+  // --- LinkedIn ---
+  '*://px.ads.linkedin.com/*',
+  '*://snap.licdn.com/li.lms-analytics/*',
+
+  // --- Twitter/X ---
+  '*://analytics.twitter.com/*',
+  '*://static.ads-twitter.com/*',
+  '*://*.t.co/i/adsct*',
+
+  // --- TikTok ---
+  '*://analytics.tiktok.com/*',
+  '*://business-api.tiktok.com/*',
+  '*://*.tiktokcdn.com/tiktok/*',
+
+  // --- Snap ---
+  '*://tr.snapchat.com/*',
+  '*://sc-static.net/scevent.min.js*',
+
+  // --- Pinterest ---
+  '*://ct.pinterest.com/*',
+  '*://*.pinimg.com/ct/*',
+
+  // --- Reddit ---
+  '*://events.redditmedia.com/*',
+  '*://www.redditstatic.com/ads/*',
+
+  // --- Quantcast / ComScore / Chartbeat ---
+  '*://pixel.quantserve.com/*',
+  '*://*.quantcount.com/*',
+  '*://*.scorecardresearch.com/*',
+  '*://ping.chartbeat.net/*',
+  '*://static.chartbeat.com/*',
+
+  // --- Criteo / Taboola / Outbrain (content + retargeting) ---
+  '*://*.criteo.com/*',
+  '*://*.criteo.net/*',
+  '*://cdn.taboola.com/*',
+  '*://trc.taboola.com/*',
+  '*://widgets.outbrain.com/*',
+  '*://*.outbrain.com/utils/*',
+
+  // --- Tealium / Marketo / Pardot / Salesforce marketing ---
+  '*://tags.tiqcdn.com/*',
+  '*://*.tealiumiq.com/*',
+  '*://munchkin.marketo.net/*',
+  '*://*.marketo.com/munchkin*',
+  '*://pi.pardot.com/*',
+  '*://*.exacttarget.com/cdn/*',
+
+  // --- Yahoo / Verizon Media ---
+  '*://*.yahoo.com/p.gif*',
+  '*://ad.yieldmanager.com/*',
+  '*://sp.analytics.yahoo.com/*',
+
+  // --- RUM / front-end perf (debatable, but commonly noisy) ---
+  '*://rum.pingdom.net/*',
+  '*://bam.nr-data.net/*',
+  '*://bam-cell.nr-data.net/*',
+  '*://js-agent.newrelic.com/*',
+  '*://*.browser-intake-datadoghq.com/*',
+  '*://*.browser-intake-datadoghq.eu/*',
+
+  // --- Session replay + heatmaps ---
+  '*://*.hotjar.com/*',
+  '*://*.hotjar.io/*',
+  '*://*.fullstory.com/s/*',
+  '*://*.fullstory.com/rec/*',
+  '*://r.lr-ingest.io/*',
+  '*://*.logrocket.io/*',
+  '*://cdn.lr-ingest.com/*',
+  '*://script.crazyegg.com/*',
+  '*://cdn.mouseflow.com/*',
+  '*://*.mouseflow.com/projects/*',
+
+  // --- A/B testing ---
+  '*://cdn.optimizely.com/*',
+  '*://*.optimizely.com/event*',
+  '*://dev.visualwebsiteoptimizer.com/*',
+  '*://*.vwo.com/*',
+
+  // --- Product analytics ---
+  '*://api.segment.io/*',
+  '*://cdn.segment.com/*',
+  '*://*.segment.io/v1/*',
+  '*://api.amplitude.com/*',
+  '*://api2.amplitude.com/*',
+  '*://cdn.amplitude.com/*',
+  '*://api.mixpanel.com/*',
+  '*://cdn.mxpnl.com/*',
+  '*://*.heapanalytics.com/*',
+  '*://heapanalytics.com/h*',
+  '*://*.posthog.com/e/*',
+  '*://*.posthog.com/decide/*',
+
+  // --- Marketing automation ---
+  '*://track.hubspot.com/*',
+  '*://js.hs-scripts.com/*',
+  '*://js.hs-analytics.net/*',
+  '*://js.hsforms.net/*',
+
+  // --- Customer messaging (these load chat widgets that bloat ARIA) ---
+  '*://widget.intercom.io/*',
+  '*://api-iam.intercom.io/messenger/*',
+  '*://js.intercomcdn.com/*',
+  '*://js.driftt.com/*',
+  '*://event.api.drift.com/*',
+
+  // --- Error reporters (Sentry kept off — agents may want to see errors) ---
+  '*://sessions.bugsnag.com/*',
+  '*://notify.bugsnag.com/*',
+
+  // --- Misc widely-deployed ad networks ---
+  '*://*.adnxs.com/*', // AppNexus / Xandr
+  '*://*.rubiconproject.com/*',
+  '*://*.pubmatic.com/*',
+  '*://*.openx.net/*',
+  '*://*.casalemedia.com/*',
+  '*://*.bidswitch.net/*',
+  '*://*.adsrvr.org/*', // The Trade Desk
+  '*://*.media.net/*',
+  '*://*.mediavoice.com/*',
+  '*://*.serving-sys.com/*', // Sizmek
+  '*://*.smartadserver.com/*',
+  '*://*.indexww.com/*',
+  '*://*.mathtag.com/*',
+  '*://*.tapad.com/*',
+  '*://*.bluekai.com/*', // Oracle Data Cloud
+  '*://*.krxd.net/*', // Salesforce / Krux
+];
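The header comment says the patterns are globs in which `*` matches any character run. A rough local check of which URLs a pattern would cover can be sketched as below. This is an approximation for reasoning about the list, not how Chromium implements `Network.setBlockedURLs`; `globToRegExp`, `isBlocked`, and `SAMPLE_PATTERNS` are hypothetical names.

```javascript
// Sketch only: approximates "'*' matches any character run" with a RegExp.
const SAMPLE_PATTERNS = ['*://*.doubleclick.net/*', '*://*.facebook.com/tr*'];

function globToRegExp(glob) {
  // Escape every RegExp metacharacter except '*', then turn '*' into '.*'.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function isBlocked(url, patterns) {
  // Linear scan, one pattern at a time, in the spirit of the per-request
  // matching cost the header comment describes.
  return patterns.some((p) => globToRegExp(p).test(url));
}

console.log(isBlocked('https://stats.g.doubleclick.net/r/collect', SAMPLE_PATTERNS));
console.log(isBlocked('https://example.com/index.html', SAMPLE_PATTERNS));
```

Under this literal reading, a pattern such as `*://*.doubleclick.net/*` requires a dot before the domain, so a request to the bare apex host would not match; whether that holds in the browser is not something this sketch establishes.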
package/src/chromium.js
CHANGED
@@ -16,9 +16,12 @@ let exitHandlersRegistered = false;
 function reapAllSync() {
   const toReap = [...activeBrowsers];
   activeBrowsers.clear();
-  // Send SIGKILL to
+  // Send SIGKILL to the parent AND the whole process group (detached:true
+  // gives each Chromium its own pgid, so -pid targets every renderer/GPU/
+  // network child without touching Node or its other children).
   for (const b of toReap) {
     try { if (!b.process.killed) b.process.kill('SIGKILL'); } catch {}
+    try { process.kill(-b.process.pid, 'SIGKILL'); } catch {}
   }
   // Then poll each for actual death before removing its profile dir —
   // Chromium can hold file handles briefly even after SIGKILL, which would
@@ -151,8 +154,22 @@ export async function launch(opts = {}) {
   // about:blank as initial page
   args.push('about:blank');
 
+  // detached:true makes Node call setsid() so Chromium becomes its own
+  // process-group leader. Without this, the renderer/GPU/network children
+  // it forks share the Node parent's process group — SIGTERM on the
+  // Chromium PID only signals Chromium itself and the children linger,
+  // holding profile-dir files for seconds after the parent exits. Under
+  // parallel test load that races our rmSync cleanup. With a separate
+  // pgid, cleanupBrowser can signal the whole group with process.kill(-pid).
+  //
+  // Trade-off: a terminal SIGINT (Ctrl-C) is delivered to the foreground
+  // process group, which is now Node's — Chromium will NOT receive it
+  // directly. The SIGINT handler in registerExitHandlers() that calls
+  // reapAllSync() is what actually kills Chromium under Ctrl-C now. Do not
+  // remove that handler without restoring some other path to reap children.
   const child = spawn(binary, args, {
     stdio: ['ignore', 'pipe', 'pipe'],
+    detached: true,
   });
 
   // Parse the WebSocket URL from stderr
@@ -216,20 +233,33 @@ export async function cleanupBrowser(browser) {
     });
     try { browser.process.kill(); } catch {}
     await exited;
+    // SIGKILL the whole Chromium process group. The parent may have exited
+    // already (above) but renderer/GPU/network children — separate processes
+    // under --site-per-process — can outlive it by seconds, and they hold
+    // profile-dir file handles. Because launch() spawned with detached:true,
+    // the children share Chromium's pgid (not Node's), so process.kill on a
+    // negative PID reaps the whole group without touching anything else.
+    try { process.kill(-browser.process.pid, 'SIGKILL'); } catch {
+      // ESRCH = group already gone; anything else is best-effort here.
+    }
   }
   if (browser.ownedProfileDir) {
-    // Chromium
-    // --site-per-process
-    //
-    //
-    //
-    //
-
+    // Chromium spawns renderer + GPU + network + utility subprocesses (one
+    // per site under --site-per-process from H2), and SIGTERM on the parent
+    // doesn't guarantee the children have closed their profile-file handles
+    // by the time the parent's exit event fires. Under parallel test load
+    // we've seen handle-release take >2.5s. Retry budget here is 60×100ms
+    // jittered (~6s+ worst case). Retry on ANY error short of ENOENT —
+    // earlier code only retried ENOTEMPTY/EBUSY but Linux also reports
+    // EPERM/EACCES transiently when an open-deleted file is still being
+    // written to. force:true already swallows ENOENT, so the catch only
+    // sees real failures.
+    for (let i = 0; i < 60; i++) {
       try {
         rmSync(browser.ownedProfileDir, { recursive: true, force: true });
         break;
       } catch (err) {
-        if (err.code
+        if (err.code === 'ENOENT') break; // already gone
         await new Promise((r) => setTimeout(r, 100 + Math.floor(Math.random() * 50)));
       }
     }
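The group-kill step in the hunks above relies on POSIX semantics: passing a negative PID to `process.kill` signals every process in that process group. A small wrapper makes the error handling explicit. This is a sketch, not the package's code: `killProcessGroup` is a hypothetical name, the injectable `kill` parameter exists only so the logic can be tested without spawning anything, and unlike the diff (which swallows every error as best-effort) this variant rethrows anything other than `ESRCH`.

```javascript
// Sketch only: hypothetical helper around process.kill(-pid, 'SIGKILL').
// Returns true if the signal was sent, false if the group was already gone.
function killProcessGroup(pid, kill = (target, sig) => process.kill(target, sig)) {
  // Guard: -0 and -1 have special meanings to kill(2) (caller's own group,
  // or every process the caller may signal), so refuse anything below 2.
  if (!Number.isInteger(pid) || pid <= 1) {
    throw new RangeError('pid must be an integer greater than 1');
  }
  try {
    kill(-pid, 'SIGKILL'); // negative PID = the whole process group
    return true;
  } catch (err) {
    if (err.code === 'ESRCH') return false; // no such group: nothing left to reap
    throw err; // e.g. EPERM: surface it instead of hiding it
  }
}
```

This only reaches the right processes when the child was spawned with `detached: true`, as in the `launch()` hunk; otherwise the child shares Node's group and there is no separate group with that ID to signal.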
package/src/daemon.js
CHANGED
@@ -40,6 +40,10 @@ export async function startDaemon(opts, outputDir, initialUrl) {
   if (opts.viewport) args.push('--viewport', opts.viewport);
   if (opts.storageState) args.push('--storage-state', opts.storageState);
   if (opts.downloadPath) args.push('--download-path', opts.downloadPath);
+  if (opts.blockAds === false) args.push('--no-block-ads');
+  if (Array.isArray(opts.blockUrls)) {
+    for (const p of opts.blockUrls) args.push('--block-urls', p);
+  }
 
   const child = spawn(process.execPath, args, {
     detached: true,
@@ -48,8 +52,11 @@ export async function startDaemon(opts, outputDir, initialUrl) {
   });
   child.unref();
 
-  // Poll for session.json (50ms interval,
-
+  // Poll for session.json (50ms interval, 30s timeout). 30s covers cold
+  // Chromium boot plus initial-URL navigation on slower CI/older hardware;
+  // the previous 15s was tight enough that the ad-blocklist's added
+  // CDP setup time pushed real boots past it on stress runs.
+  const deadline = Date.now() + 30000;
   while (Date.now() < deadline) {
     if (existsSync(sessionPath)) {
       try {
@@ -59,7 +66,7 @@ export async function startDaemon(opts, outputDir, initialUrl) {
     }
     await new Promise((r) => setTimeout(r, 50));
   }
-  throw new Error('Daemon failed to start within
+  throw new Error('Daemon failed to start within 30s');
 }
 
 /**
@@ -79,6 +86,8 @@ export async function runDaemon(opts, outputDir, initialUrl) {
     viewport: opts.viewport,
     storageState: opts.storageState,
     downloadPath: opts.downloadPath,
+    blockAds: opts.blockAds,
+    blockUrls: opts.blockUrls,
   });
 
   // Console log capture
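The first hunk above forwards the two options to the daemon process as CLI arguments. A sketch of that translation follows, with one deliberate variation: it emits each pattern in the `--block-urls=VALUE` form rather than as two separate tokens. That is an assumption-driven alternative, not what `daemon.js` does; it is shown because the `cli.js` help text recommends the `=VALUE` form for patterns that could be mistaken for a flag. `buildBlockArgs` is a hypothetical name.

```javascript
// Sketch only: hypothetical helper; daemon.js pushes '--block-urls', p instead.
function buildBlockArgs(opts = {}) {
  const args = [];
  // Only an explicit false is forwarded. undefined and true both mean
  // "let the daemon apply its own default", so no flag is emitted.
  if (opts.blockAds === false) args.push('--no-block-ads');
  if (Array.isArray(opts.blockUrls)) {
    for (const p of opts.blockUrls) args.push(`--block-urls=${p}`);
  }
  return args;
}

console.log(buildBlockArgs({ blockAds: false, blockUrls: ['*://a.example/*'] }));
```

Keeping "not provided" distinct from `true` matters here because the effective default depends on whether the session launches or attaches to a browser, a decision made later in `connect()`.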
package/src/index.js
CHANGED
@@ -16,6 +16,7 @@ import { prune as pruneTree } from './prune.js';
 import { click as cdpClick, type as cdpType, scroll as cdpScroll, press as cdpPress, hover as cdpHover, select as cdpSelect, drag as cdpDrag, upload as cdpUpload } from './interact.js';
 import { dismissConsent } from './consent.js';
 import { applyStealth } from './stealth.js';
+import { DEFAULT_BLOCKLIST } from './blocklist.js';
 import { waitForNetworkIdle } from './network-idle.js';
 import { join as pathJoin } from 'node:path';
 
@@ -29,6 +30,11 @@ import { join as pathJoin } from 'node:path';
  * @param {boolean} [opts.cookies=true] - Inject user's cookies (Phase 2)
  * @param {boolean} [opts.prune=true] - Apply ARIA pruning (Phase 2)
  * @param {number} [opts.timeout=30000] - Navigation timeout in ms
+ * @param {boolean} [opts.blockAds=true] - Block ~120 common ad/tracker
+ *   URL patterns via CDP. Shrinks ARIA snapshots and speeds page loads.
+ *   See src/blocklist.js for the default set. Set false to disable.
+ * @param {string[]} [opts.blockUrls] - Extra URL glob patterns to block,
+ *   merged with the default unless blockAds:false.
  * @returns {Promise<string>} ARIA snapshot text
  */
 export async function browse(url, opts = {}) {
@@ -53,7 +59,8 @@ export async function browse(url, opts = {}) {
   }
 
   // Step 2: Create a new page target and attach
-
+  const pageOpts = { viewport: opts.viewport, blockAds: opts.blockAds, blockUrls: opts.blockUrls };
+  let page = await createPage(cdp, mode !== 'headed', pageOpts);
 
   // Step 2.5: Suppress permission prompts
   await suppressPermissions(cdp);
@@ -87,7 +94,7 @@ export async function browse(url, opts = {}) {
       try {
         browser = await launch({ ...launchOpts, headed: true });
         cdp = await createCDP(browser.wsUrl);
-        page = await createPage(cdp, false,
+        page = await createPage(cdp, false, pageOpts);
         await suppressPermissions(cdp);
         if (opts.cookies !== false) {
           try { await authenticate(page.session, url, { browser: opts.browser }); } catch {}
@@ -139,6 +146,14 @@ export async function browse(url, opts = {}) {
  * Default: a per-session subdirectory under the OS temp dir. Downloads
  * land here as <guid>; check `page.downloads` for { url, suggestedFilename,
  * savedPath, state, totalBytes, receivedBytes } per file.
+ * @param {boolean} [opts.blockAds] - Block ~120 common ad/tracker URL
+ *   patterns via CDP. Defaults to true for launched browsers, false in
+ *   attach mode (would affect any tab attached to the user's running
+ *   session). Setting blockAds:true explicitly in attach mode honors the
+ *   request — blocking applies to whichever tab the session is currently
+ *   attached to and follows the session across switchTab() until close.
+ * @param {string[]} [opts.blockUrls] - Extra URL glob patterns to block,
+ *   merged with the default unless blockAds is false.
  * @returns {Promise<object>} Page handle with goto, snapshot, close
  */
 export async function connect(opts = {}) {
@@ -169,7 +184,15 @@ export async function connect(opts = {}) {
   // (they'd persist in the user's session via addScriptToEvaluateOnNewDocument)
   // and the headed→headless rewind in goto() is gated off below.
   let currentlyHeaded = attachMode || (mode === 'headed');
-
+  // Default blockAds on for owned browsers, off in attach mode (would affect
+  // any tab we attach to in the user's running session). Caller can flip with
+  // explicit blockAds:true/false.
+  const pageOpts = {
+    viewport: opts.viewport,
+    blockAds: opts.blockAds !== undefined ? opts.blockAds : !attachMode,
+    blockUrls: opts.blockUrls,
+  };
+  let page = await createPage(cdp, !currentlyHeaded, pageOpts);
   let refMap = new Map();
   let botBlocked = false;
 
@@ -304,7 +327,7 @@ export async function connect(opts = {}) {
 
         browser = await launch(launchOpts);
         cdp = await createCDP(browser.wsUrl);
-        page = await createPage(cdp, true,
+        page = await createPage(cdp, true, pageOpts);
         setupDialogHandler(page.session);
         await suppressPermissions(cdp);
         currentlyHeaded = false;
@@ -330,7 +353,7 @@ export async function connect(opts = {}) {
         try {
           browser = await launch({ ...launchOpts, headed: true });
           cdp = await createCDP(browser.wsUrl);
-          page = await createPage(cdp, false,
+          page = await createPage(cdp, false, pageOpts);
           setupDialogHandler(page.session);
           await suppressPermissions(cdp);
           await navigate(page, url, timeout);
@@ -473,7 +496,7 @@ export async function connect(opts = {}) {
       // closure handle used by every method below, so swapping it makes
       // snapshot/click/type/etc. operate on the new tab.
       const oldSessionId = page.sessionId;
-      page = await attachToExistingTarget(cdp, target.targetId);
+      page = await attachToExistingTarget(cdp, target.targetId, pageOpts);
       refMap = new Map(); // refs from the previous tab are no longer valid
       setupDialogHandler(page.session);
       try { await cdp.send('Target.detachFromTarget', { sessionId: oldSessionId }); } catch {}
@@ -561,7 +584,7 @@ export async function connect(opts = {}) {
|
|
|
561
584
|
get cdp() { return page.session; },
|
|
562
585
|
|
|
563
586
|
async createTab() {
|
|
564
|
-
const tab = await createPage(cdp, !currentlyHeaded,
|
|
587
|
+
const tab = await createPage(cdp, !currentlyHeaded, pageOpts);
|
|
565
588
|
await suppressPermissions(cdp);
|
|
566
589
|
setupDialogHandler(tab.session);
|
|
567
590
|
let tabBotBlocked = false;
|
|
@@ -653,6 +676,12 @@ async function createPage(cdp, stealth = false, pageOpts = {}) {
|
|
|
653
676
|
await applyStealth(session);
|
|
654
677
|
}
|
|
655
678
|
|
|
679
|
+
// Ad/tracker URL blocking via CDP. Default on for owned browsers — shrinks
|
|
680
|
+
// the ARIA snapshot and speeds loads. Skipped in attach mode (would affect the user's
|
|
681
|
+
// running browser globally) and skippable per-call via blockAds:false.
|
|
682
|
+
// Custom patterns in blockUrls extend the default unless blockAds is false.
|
|
683
|
+
await applyBlocklist(session, pageOpts);
|
|
684
|
+
|
|
656
685
|
// Set viewport size if specified (e.g. "1280x720")
|
|
657
686
|
if (pageOpts.viewport) {
|
|
658
687
|
const [w, h] = pageOpts.viewport.split('x').map(Number);
|
|
@@ -718,16 +747,35 @@ async function attachFrameTracking(cdp, mainSession) {
|
|
|
718
747
|
* Attach a CDP session to an existing target (e.g. a tab opened by window.open).
|
|
719
748
|
* Enables the same domains as createPage so snapshot/click/type work uniformly.
|
|
720
749
|
*/
|
|
721
|
-
async function attachToExistingTarget(cdp, targetId) {
|
|
750
|
+
async function attachToExistingTarget(cdp, targetId, pageOpts = {}) {
|
|
722
751
|
const { sessionId } = await cdp.send('Target.attachToTarget', { targetId, flatten: true });
|
|
723
752
|
const session = cdp.session(sessionId);
|
|
724
753
|
await session.send('Page.enable');
|
|
725
754
|
await session.send('Network.enable');
|
|
726
755
|
await session.send('DOM.enable');
|
|
756
|
+
await applyBlocklist(session, pageOpts);
|
|
727
757
|
const framesByFrameId = await attachFrameTracking(cdp, session);
|
|
728
758
|
return { session, targetId, sessionId, framesByFrameId };
|
|
729
759
|
}
|
|
730
760
|
|
|
761
|
+
/**
|
|
762
|
+
* Apply Network.setBlockedURLs for ad/tracker blocking on a session.
|
|
763
|
+
* Default list is on; pass blockAds:false to skip, blockUrls:[] to extend.
|
|
764
|
+
* Silent on failure — older Chrome / unusual modes shouldn't break the page.
|
|
765
|
+
*/
|
|
766
|
+
async function applyBlocklist(session, pageOpts) {
|
|
767
|
+
if (pageOpts.blockAds === false && !pageOpts.blockUrls) return;
|
|
768
|
+
const patterns = pageOpts.blockAds === false
|
|
769
|
+
? (pageOpts.blockUrls || [])
|
|
770
|
+
: [...DEFAULT_BLOCKLIST, ...(pageOpts.blockUrls || [])];
|
|
771
|
+
if (!patterns.length) return;
|
|
772
|
+
try {
|
|
773
|
+
await session.send('Network.setBlockedURLs', { urls: patterns });
|
|
774
|
+
} catch {
|
|
775
|
+
// Network.setBlockedURLs unsupported on this Chrome — skip silently.
|
|
776
|
+
}
|
|
777
|
+
}
|
|
778
|
+
|
|
731
779
|
/**
|
|
732
780
|
* Navigate to a URL and wait for the page to load.
|
|
733
781
|
*/
|
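The option handling in the index.js hunks above can be read as two small decisions: what `blockAds` defaults to, and which patterns end up in the `Network.setBlockedURLs` call. The following is a minimal sketch of that logic with the CDP call left out. The helper names `resolveBlockAds` and `resolvePatterns` are hypothetical (the package inlines this logic in `connect()` and `applyBlocklist()`), and the two patterns in `DEFAULT_BLOCKLIST` are placeholders for illustration, not the real list from `src/blocklist.js`.

```javascript
// Placeholder patterns for illustration only; the real list in
// src/blocklist.js is much longer and is not reproduced here.
const DEFAULT_BLOCKLIST = [
  '*://*.doubleclick.net/*',
  '*://*.google-analytics.com/*',
];

// Hypothetical helper mirroring the default in connect(): blocking is on for
// browsers barebrowse launched itself and off in attach mode, unless the
// caller passes blockAds explicitly.
function resolveBlockAds(opts, attachMode) {
  return opts.blockAds !== undefined ? opts.blockAds : !attachMode;
}

// Hypothetical helper mirroring the pattern selection in applyBlocklist():
// blockAds:false drops the defaults but keeps any caller-supplied patterns.
function resolvePatterns(pageOpts = {}) {
  const extra = pageOpts.blockUrls || [];
  return pageOpts.blockAds === false
    ? [...extra]
    : [...DEFAULT_BLOCKLIST, ...extra];
}

// Attach mode with no explicit option: nothing is blocked.
const attached = resolvePatterns({ blockAds: resolveBlockAds({}, true) });
console.log(attached.length); // 0

// Owned browser with one extra pattern: defaults plus the extra.
const owned = resolvePatterns({
  blockAds: resolveBlockAds({}, false),
  blockUrls: ['*://example.test/ads/*'],
});
console.log(owned.length); // 3
```

An empty result corresponds to the early returns in `applyBlocklist()`, where no CDP command is sent at all.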
package/src/stealth.js
CHANGED
|
@@ -92,6 +92,75 @@ const STEALTH_SCRIPT = `
|
|
|
92
92
|
return origGetParam2.apply(this, arguments);
|
|
93
93
|
};
|
|
94
94
|
}
|
|
95
|
+
|
|
96
|
+
// Canvas fingerprinting — sites render standard text/shapes, then read
|
|
97
|
+
// pixels via toDataURL or getImageData. The output is stable per machine
|
|
98
|
+
// (GPU, font rasterizer, anti-aliasing) but unique across machines, which
|
|
99
|
+
// makes it the second-most-common fingerprint after WebGL. Defense: nudge
|
|
100
|
+
// a few RGB channels by ±1 per session so the hash changes each visit
|
|
101
|
+
// while the canvas still looks identical to the human eye. The per-tab
|
|
102
|
+
// seed keeps reads stable within a session so legitimate canvas use
|
|
103
|
+
// (image processing, games) doesn't flicker.
|
|
104
|
+
// crypto.getRandomValues draws an independent seed per browsing context; using
|
|
105
|
+
// Math.random alone can collide when two fresh V8 contexts start within
|
|
106
|
+
// microseconds of each other (we observed this in practice with parallel
|
|
107
|
+
// tests). performance.now adds a timing anchor as a belt-and-braces
|
|
108
|
+
// guard against contexts where crypto is somehow stubbed.
|
|
109
|
+
const _seedBuf = new Uint32Array(1);
|
|
110
|
+
crypto.getRandomValues(_seedBuf);
|
|
111
|
+
const CANVAS_SEED = (_seedBuf[0] ^ ((performance.now() * 1e6) | 0)) >>> 0;
|
|
112
|
+
function shiftPixels(data) {
|
|
113
|
+
// Touch ~1 byte per 64-byte stride. The bit we XOR with is taken from a
|
|
114
|
+
// position-dependent SLICE of a seed-mixed hash, not its low bit — a
|
|
115
|
+
// naive 'mix & 1' collapses to only two possible outputs per seed
|
|
116
|
+
// parity because every stride index is even (the position multiplier
|
|
117
|
+
// is odd, so the low bit only depends on seed parity). Indexing the
|
|
118
|
+
// hash by (i/64) mod 32 makes every stride position sample a different
|
|
119
|
+
// bit, so two distinct seeds produce different mask patterns.
|
|
120
|
+
for (let i = 0; i < data.length; i += 64) {
|
|
121
|
+
const h = ((CANVAS_SEED * 2654435761) ^ (i * 1597334677)) >>> 0;
|
|
122
|
+
const bit = (h >>> ((i >>> 6) & 31)) & 1;
|
|
123
|
+
data[i] = (data[i] ^ bit) & 0xff;
|
|
124
|
+
}
|
|
125
|
+
return data;
|
|
126
|
+
}
|
|
127
|
+
// Capture originals BEFORE replacing — toDataURL must read pixels via the
|
|
128
|
+
// native getImageData (not the patched one), otherwise the patch double-
|
|
129
|
+
// applies and the second XOR cancels the first, leaving output unchanged.
|
|
130
|
+
const origGetImageData = CanvasRenderingContext2D.prototype.getImageData;
|
|
131
|
+
const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
|
|
132
|
+
HTMLCanvasElement.prototype.toDataURL = function() {
|
|
133
|
+
const ctx = this.getContext('2d');
|
|
134
|
+
if (ctx && this.width > 0 && this.height > 0) {
|
|
135
|
+
try {
|
|
136
|
+
const img = origGetImageData.call(ctx, 0, 0, this.width, this.height);
|
|
137
|
+
// Snapshot the original bytes so we can restore them after encoding.
|
|
138
|
+
// Without this, repeated toDataURL() alternates noisy/clean: call 1
|
|
139
|
+
// XORs the canvas in place, call 2 reads the noisy canvas and XORs
|
|
140
|
+
// again (self-inverse), call 3 again, etc. Same XOR-cancellation
|
|
141
|
+
// class as the earlier double-apply bug, just through canvas state
|
|
142
|
+
// rather than method composition. The restore also keeps the
|
|
143
|
+
// bitmap idempotent for any downstream legitimate canvas reads.
|
|
144
|
+
const original = new Uint8ClampedArray(img.data);
|
|
145
|
+
shiftPixels(img.data);
|
|
146
|
+
ctx.putImageData(img, 0, 0);
|
|
147
|
+
const result = origToDataURL.apply(this, arguments);
|
|
148
|
+
img.data.set(original);
|
|
149
|
+
ctx.putImageData(img, 0, 0);
|
|
150
|
+
return result;
|
|
151
|
+
} catch {
|
|
152
|
+
// Tainted canvas (cross-origin image) — can't read; skip the nudge
|
|
153
|
+
// and fall through to the original call so the page sees the
|
|
154
|
+
// expected SecurityError instead of silent corruption.
|
|
155
|
+
}
|
|
156
|
+
}
|
|
157
|
+
return origToDataURL.apply(this, arguments);
|
|
158
|
+
};
|
|
159
|
+
CanvasRenderingContext2D.prototype.getImageData = function() {
|
|
160
|
+
const img = origGetImageData.apply(this, arguments);
|
|
161
|
+
shiftPixels(img.data);
|
|
162
|
+
return img;
|
|
163
|
+
};
|
|
95
164
|
`;
|
|
96
165
|
|
|
97
166
|
/**
|