barebrowse 0.10.0 → 0.11.0
- package/.github/workflows/publish.yml +26 -0
- package/CHANGELOG.md +107 -0
- package/README.md +11 -0
- package/barebrowse.context.md +12 -5
- package/cli.js +8 -0
- package/commands/barebrowse/SKILL.md +4 -0
- package/commands/barebrowse.md +4 -0
- package/package.json +8 -2
- package/src/auth.js +28 -6
- package/src/blocklist.js +12 -0
- package/src/chromium.js +12 -4
- package/src/daemon.js +46 -10
- package/src/index.js +51 -5
- package/src/session-client.js +5 -1
- package/src/url-guard.js +138 -0
package/.github/workflows/publish.yml
ADDED
|
@@ -0,0 +1,26 @@
|
|
|
1
|
+
name: Publish to npm
|
|
2
|
+
|
|
3
|
+
# Trusted publishing via OIDC — no NPM_TOKEN needed.
|
|
4
|
+
# Configure the trusted publisher at npmjs.com first (see repo notes).
|
|
5
|
+
on:
|
|
6
|
+
workflow_dispatch: # manual "Run workflow" button
|
|
7
|
+
release:
|
|
8
|
+
types: [published] # also publishes when you cut a GitHub release
|
|
9
|
+
|
|
10
|
+
permissions:
|
|
11
|
+
contents: read
|
|
12
|
+
id-token: write # required: lets npm mint OIDC credentials
|
|
13
|
+
|
|
14
|
+
jobs:
|
|
15
|
+
publish:
|
|
16
|
+
runs-on: ubuntu-latest
|
|
17
|
+
steps:
|
|
18
|
+
- uses: actions/checkout@v4
|
|
19
|
+
- uses: actions/setup-node@v4
|
|
20
|
+
with:
|
|
21
|
+
node-version: 22
|
|
22
|
+
registry-url: 'https://registry.npmjs.org'
|
|
23
|
+
- name: Upgrade npm (trusted publishing needs >= 11.5.1)
|
|
24
|
+
run: npm install -g npm@latest
|
|
25
|
+
- name: Publish
|
|
26
|
+
run: npm publish
|
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,112 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.11.0
|
|
4
|
+
|
|
5
|
+
### Security hardening — audit findings fixed, safe-by-default
|
|
6
|
+
|
|
7
|
+
A full security audit of the library + CLI daemon + MCP server. Eight
|
|
8
|
+
findings were reproduced with live PoCs, fixed, and locked in with 14 new
|
|
9
|
+
regression tests (143 → 157 passing). Two new opt-in controls; two new
|
|
10
|
+
defaults that change behavior (see **Breaking** below).
|
|
11
|
+
|
|
12
|
+
- **Daemon authentication (was: unauthenticated `eval` over loopback).**
|
|
13
|
+
The CLI daemon's HTTP server bound to `127.0.0.1` but had no auth — and
|
|
14
|
+
loopback is shared across local users, so any local process could POST
|
|
15
|
+
`/command` (including `eval` = arbitrary JS in the authenticated browser).
|
|
16
|
+
Now every daemon mints a 32-byte random token at startup, written into
|
|
17
|
+
`session.json` (mode `0600`) and required on `/command` via the
|
|
18
|
+
`x-barebrowse-token` header (constant-time compare). `session-client.js`
|
|
19
|
+
reads and sends it transparently — no caller change. `GET /status` stays
|
|
20
|
+
open as a liveness ping returning only `{ ok, pid }`.
|
|
21
|
+
- **Artifact permissions.** The session dir is now created `0700` and all
|
|
22
|
+
daemon artifacts (`session.json`, snapshots, screenshots, PDFs, console /
|
|
23
|
+
network / dialog logs) plus `page.saveState()` output are written `0600`.
|
|
24
|
+
`saveState` holds cookies + localStorage (session tokens), so this stops a
|
|
25
|
+
multi-user host from reading another user's credentials off disk.
|
|
26
|
+
- **Navigation scheme guard (new module `src/url-guard.js`).** `goto()` /
|
|
27
|
+
`browse()` now reject local-resource and browser-internal schemes
|
|
28
|
+
(`file:`, `view-source:`, `chrome:`, `chrome-extension:`, `filesystem:`,
|
|
29
|
+
`devtools:`, …) by default — closing a confirmed local-file-read /
|
|
30
|
+
directory-listing vector for a prompt-injected agent. `http`/`https`/
|
|
31
|
+
`data`/`blob`/`about` stay allowed (`data:` is opaque-origin and the
|
|
32
|
+
test-fixture mechanism — not a read/SSRF vector). Override with
|
|
33
|
+
`{ allowLocalUrls: true }`.
|
|
34
|
+
- **SSRF guard (opt-in `blockPrivateNetwork`).** When set, `goto()`/
|
|
35
|
+
`browse()` refuse loopback / RFC-1918 / link-local / cloud-metadata
|
|
36
|
+
(`169.254.169.254`) / `*.internal` hosts. Off by default so localhost
|
|
37
|
+
dev-server browsing keeps working. Exposed as `--block-private-network`.
|
|
38
|
+
- **Upload sandbox (opt-in `uploadDir`).** `upload()` confirmed it would
|
|
39
|
+
attach any absolute path to a file input (exfil vector under prompt
|
|
40
|
+
injection). When `uploadDir` is set, every path must resolve (symlinks
|
|
41
|
+
included, via `realpath`) inside it. Default unrestricted — nothing breaks
|
|
42
|
+
unless you opt in. Exposed as `--upload-dir=DIR`. Both new opts pass
|
|
43
|
+
through `connect()` → MCP / bareagent / CLI daemon uniformly.
|
|
44
|
+
- **Cookie injection scoped precisely (was: over-broad substring match).**
|
|
45
|
+
`authenticate()` matched `host_key LIKE '%domain%'`, so browsing
|
|
46
|
+
`apple.com` injected cookies for `apple.com.evil.org` / `notapple.com`,
|
|
47
|
+
and `mybank.co.uk` (→ `co.uk`) pulled every `*.co.uk` cookie. The LIKE
|
|
48
|
+
query is now only a coarse pre-filter; a precise RFC-6265
|
|
49
|
+
`cookieDomainMatch()` decides what actually gets injected (parent-domain
|
|
50
|
+
cookies like `.google.com` still apply to `mail.google.com`).
|
|
51
|
+
- **Hardening:** browser discovery uses `execFileSync('which', [name])`
|
|
52
|
+
(no shell) instead of an interpolated `execSync` string; the cleanup
|
|
53
|
+
busy-wait drops a `sleep` subprocess for `Atomics.wait`. Added
|
|
54
|
+
`.gitignore` (was missing — `.barebrowse/` state/snapshots could be
|
|
55
|
+
accidentally committed). Pinned `wearehere` to exact `1.0.0`.
|
|
56
|
+
- **Tests:** 157 total (14 new) — `test/unit/url-guard.test.js` (19
|
|
57
|
+
assertions over scheme/private-host policy), `cookieDomainMatch` cases in
|
|
58
|
+
`test/unit/auth.test.js`, daemon token + `0600` perms in
|
|
59
|
+
`test/integration/cli.test.js`.
|
|
60
|
+
|
|
61
|
+
**Breaking:** (1) `file:`/`chrome:`/etc. navigation now throws by default —
|
|
62
|
+
pass `allowLocalUrls: true` to restore. (2) The CLI daemon now requires the
|
|
63
|
+
token; this is transparent via the bundled `session-client`, but any
|
|
64
|
+
third-party client hitting the daemon's HTTP API directly must send
|
|
65
|
+
`x-barebrowse-token` from `session.json`.
|
|
66
|
+
|
|
67
|
+
## 0.10.1
|
|
68
|
+
|
|
69
|
+
### Blocklist long-tail additions + legacy-Chrome warn + switchTab attach-mode test
|
|
70
|
+
|
|
71
|
+
Carry-forward items from the v0.10.0 backlog. All additive, no behavior
|
|
72
|
+
change on supported Chrome.
|
|
73
|
+
|
|
74
|
+
- **8 new patterns in `src/blocklist.js`** (120 → 128, still in the
|
|
75
|
+
curated 80–200 band):
|
|
76
|
+
- Mobile-measurement-on-web cluster (increasingly served from web
|
|
77
|
+
pages, not just SDKs): `*.appsflyer.com`, `*.branch.io`,
|
|
78
|
+
`*.adjust.com`.
|
|
79
|
+
- Privacy-friendly analytics that still tracks from an agent POV:
|
|
80
|
+
`static.cloudflareinsights.com` (Cloudflare Web Analytics),
|
|
81
|
+
`*.matomo.cloud` (Matomo Cloud's hosted tier).
|
|
82
|
+
- Broader Outbrain coverage: `amplify.outbrain.com`,
|
|
83
|
+
`log.outbrain.com` (in addition to the existing
|
|
84
|
+
`widgets.outbrain.com` and `*.outbrain.com/utils/*`).
|
|
85
|
+
- Broader PostHog: `*.posthog.com/static/array.js*` (the snippet
|
|
86
|
+
loader, in addition to the existing `/e/` and `/decide/` endpoints).
|
|
87
|
+
- **One-time `console.warn` when `Network.setBlockedURLs` rejects.**
|
|
88
|
+
Legacy Chromium builds lacking the method previously failed silently
|
|
89
|
+
inside `applyBlocklist`; now a single warn per process surfaces the
|
|
90
|
+
reason so callers don't wonder why blocking isn't engaging. Stays
|
|
91
|
+
silent on supported Chrome (success path), stays silent when
|
|
92
|
+
`blockAds: false` opts out entirely. Module-scoped flag —
|
|
93
|
+
intentionally not per-session, since the failure mode is the
|
|
94
|
+
browser, not the session.
|
|
95
|
+
- **`switchTab()` + `blockAds:true` attach-mode integration test.**
|
|
96
|
+
The v0.10.0 JSDoc claimed blocklist follows `switchTab()` in attach
|
|
97
|
+
mode but had no automated guard. New test in
|
|
98
|
+
`test/integration/blocklist.test.js` launches a real browser, opens
|
|
99
|
+
a second tab via raw CDP (bypassing barebrowse so the tab simulates
|
|
100
|
+
one the user already had open), attaches with explicit
|
|
101
|
+
`blockAds: true` + `blockUrls: [pattern]`, switches into that tab,
|
|
102
|
+
and asserts the tracker server gets zero hits and the tracker script
|
|
103
|
+
never executed. Locks in the post-switch `applyBlocklist` call site
|
|
104
|
+
that was added in v0.10.0.
|
|
105
|
+
- **Tests:** 143 total (5 new). 4 new unit tests in
|
|
106
|
+
`test/unit/blocklist.test.js` (long-tail coverage drift guard +
|
|
107
|
+
3-subtest warn-once suite covering rejection, success path, and
|
|
108
|
+
opted-out paths); 1 new integration test as above.
|
|
109
|
+
|
|
3
110
|
## 0.10.0
|
|
4
111
|
|
|
5
112
|
### Ad/tracker URL blocking + canvas-noise stealth + Chromium pgid reap fix
|
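The daemon-authentication entry in the 0.11.0 notes above describes a 32-byte random token that is checked in constant time against the `x-barebrowse-token` header. The following is a minimal sketch of that flow, assuming nothing about the daemon beyond what the notes state. `mintToken` and `commandHeaders` are hypothetical helper names, and the field under which the token is stored in `session.json` is not shown in this diff, so the sketch takes the token as a plain argument.

```javascript
const { randomBytes, timingSafeEqual } = require('node:crypto');

// Mint a 32-byte random token, hex-encoded (64 characters).
function mintToken() {
  return randomBytes(32).toString('hex');
}

// Constant-time comparison; returns false on any type or length mismatch.
function tokenMatches(expected, got) {
  if (typeof got !== 'string' || got.length !== expected.length) return false;
  try {
    return timingSafeEqual(Buffer.from(got), Buffer.from(expected));
  } catch {
    return false; // byte lengths differed (e.g. multi-byte characters)
  }
}

// Hypothetical helper: the header a third-party client would add to POST /command.
function commandHeaders(token) {
  return { 'x-barebrowse-token': token };
}

const token = mintToken();
const headers = commandHeaders(token);
console.log(tokenMatches(token, headers['x-barebrowse-token'])); // true
console.log(tokenMatches(token, 'wrong'));                        // false
```

The length check runs before `timingSafeEqual` because that function throws when its two buffers differ in byte length.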
package/README.md
CHANGED
|
@@ -134,6 +134,17 @@ No clone profile, no fresh cookies — the agent sees what you see.
|
|
|
134
134
|
|
|
135
135
|
Cookie consent walls (29 languages, with real mouse click fallback for stubborn CMPs), login walls (cookie extraction from your browsers), bot detection (ARIA node count heuristic + stealth patches + automatic headed fallback — snapshot shows `[BOT CHALLENGE DETECTED]` warning when blocked), permission prompts, SPA navigation, JS dialogs, off-screen elements, pre-filled inputs, ARIA noise, and profile locking. The agent doesn't think about any of it.
|
|
136
136
|
|
|
137
|
+
## Safe by default (v0.11.0)
|
|
138
|
+
|
|
139
|
+
barebrowse hands an autonomous — and therefore prompt-injectable — agent an *authenticated* browser, so the defaults are calibrated for that threat:
|
|
140
|
+
|
|
141
|
+
- **Local-resource schemes blocked.** `file:`, `view-source:`, `chrome:`, etc. are rejected by default (a confirmed local-file-read vector); `http`/`https`/`data` stay allowed. Override with `allowLocalUrls: true`.
|
|
142
|
+
- **Cookie injection scoped** to a precise RFC-6265 domain match — browsing one site can't pull look-alike or unrelated cookies into the session.
|
|
143
|
+
- **CLI daemon authenticated** with a per-session token (loopback alone isn't an authorization boundary); snapshots and saved state are written owner-only (`0600`).
|
|
144
|
+
- **Opt-in hardening** for stricter deployments: `blockPrivateNetwork` (SSRF guard for loopback/RFC-1918/cloud-metadata) and `uploadDir` (confine `upload()` to one directory). Both available on the library, MCP, bareagent, and CLI (`--block-private-network`, `--upload-dir`).
|
|
145
|
+
|
|
146
|
+
See `barebrowse.context.md` and the PRD's "Security Model & Safe Defaults" for the full rationale.
|
|
147
|
+
|
|
137
148
|
## What the agent sees
|
|
138
149
|
|
|
139
150
|
Raw ARIA output from a page is noisy -- decorative wrappers, hidden elements, structural junk. The pruning pipeline (ported from [mcprune](https://github.com/hamr0/mcprune)) strips it down to what matters.
|
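The "Local-resource schemes blocked" bullet in the README hunk above can be pictured with a small scheme check. This is an illustrative sketch only, not the contents of `src/url-guard.js` (which is not shown in this part of the diff). The function name `assertNavigable` is hypothetical, and the scheme list contains only the schemes the release notes name explicitly, so the real list may well be longer.

```javascript
// Schemes the 0.11.0 notes name as blocked by default (the notes end the list with "…").
const LOCAL_SCHEMES = new Set([
  'file:', 'view-source:', 'chrome:', 'chrome-extension:', 'filesystem:', 'devtools:',
]);

// Hypothetical guard: throws for local/browser-internal schemes unless opted in.
function assertNavigable(url, { allowLocalUrls = false } = {}) {
  const { protocol } = new URL(url); // throws TypeError on unparseable input
  if (!allowLocalUrls && LOCAL_SCHEMES.has(protocol)) {
    throw new Error(`Blocked scheme "${protocol}" (pass allowLocalUrls: true to permit)`);
  }
  return url;
}

for (const url of ['https://example.com/', 'about:blank', 'file:///etc/passwd']) {
  try {
    assertNavigable(url);
    console.log('allowed:', url);
  } catch (err) {
    console.log('blocked:', url);
  }
}
```

With `{ allowLocalUrls: true }` the same `file:` URL passes, which corresponds to the override described in the README.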
package/barebrowse.context.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
# barebrowse -- Integration Guide
|
|
2
2
|
|
|
3
3
|
> For AI assistants and developers wiring barebrowse into a project.
|
|
4
|
-
> v0.
|
|
4
|
+
> v0.11.0 | Node.js >= 22 | 0 required deps | Apache-2.0
|
|
5
5
|
|
|
6
6
|
## What this is
|
|
7
7
|
|
|
@@ -45,7 +45,7 @@ const snapshot = await browse('https://example.com', {
|
|
|
45
45
|
prune: true, // apply ARIA pruning (47-95% token reduction)
|
|
46
46
|
pruneMode: 'act', // 'act' (interactive elements) | 'read' (all content)
|
|
47
47
|
consent: true, // auto-dismiss cookie consent dialogs
|
|
48
|
-
blockAds: true, // block
|
|
48
|
+
blockAds: true, // block 128 ad/tracker URL patterns (default on for owned browsers)
|
|
49
49
|
blockUrls: [], // extra URL globs to block (merged with the default)
|
|
50
50
|
timeout: 30000, // navigation timeout in ms
|
|
51
51
|
});
|
|
@@ -93,8 +93,11 @@ const snapshot = await browse('https://example.com', {
|
|
|
93
93
|
- `viewport: '1280x720'` — Set viewport dimensions
|
|
94
94
|
- `storageState: 'file.json'` — Load cookies/localStorage from saved state
|
|
95
95
|
- `downloadPath: '/abs/dir'` — Where downloads land. Default: per-session `mkdtemp` under `/tmp/barebrowse-dl-*` that gets removed on `close()`. Caller-supplied paths are not cleaned up — caller owns the lifecycle.
|
|
96
|
-
- `blockAds: true|false` — CDP-level URL blocking of
|
|
96
|
+
- `blockAds: true|false` — CDP-level URL blocking of 128 common ad/tracker patterns (Google ads/analytics, FB/Amazon/MS/Adobe ad+analytics, Segment/Amplitude/Mixpanel/Heap/PostHog, Hotjar/FullStory/LogRocket, Criteo/Taboola/Outbrain, the consumer-pixel cluster, AppNexus/Rubicon/PubMatic supply, marketing automation; v0.10.1 added AppsFlyer/Branch/Adjust, Cloudflare Web Analytics, Matomo Cloud). Default `true` for launched browsers, `false` in attach mode (would affect any tab in the user's running browser). Explicit `true` in attach mode is honored and follows the session across `switchTab()` (regression-tested). Shrinks ARIA snapshots and speeds page loads. On legacy Chromium lacking `Network.setBlockedURLs` a one-time `console.warn` surfaces the fallback.
|
|
97
97
|
- `blockUrls: ['*://foo.com/*', ...]` — Extra glob patterns (CDP `Network.setBlockedURLs` format) to block in addition to the default. Merged with the default unless `blockAds: false`.
|
|
98
|
+
- `allowLocalUrls: true|false` — (v0.11.0) Default `false`: navigation to local-resource / browser-internal schemes (`file:`, `view-source:`, `chrome:`, `filesystem:`, `devtools:`, …) is **blocked** to stop a prompt-injected agent reading local files. `http`/`https`/`data`/`blob`/`about` are always allowed. Set `true` to permit local schemes.
|
|
99
|
+
- `blockPrivateNetwork: true|false` — (v0.11.0) Default `false`. When `true`, `goto()`/`browse()` refuse loopback / RFC-1918 / link-local / cloud-metadata (`169.254.169.254`) / `*.internal` hosts (SSRF guard). Off by default so localhost dev-server browsing works. Hostname-based — does not catch DNS names that resolve to private IPs.
|
|
100
|
+
- `uploadDir: '/abs/dir'` — (v0.11.0) Default unset (no restriction). When set, `upload()` rejects any file that does not resolve (symlinks included, via `realpath`) inside this directory — sandboxes the agent's file-upload capability.
|
|
98
101
|
|
|
99
102
|
## Snapshot format
|
|
100
103
|
|
|
@@ -166,7 +169,7 @@ barebrowse can inject cookies from the user's real browser sessions, bypassing l
|
|
|
166
169
|
| SPA navigation | `waitForNavigation()` uses loadEventFired + frameNavigated | Both |
|
|
167
170
|
| Bot detection | v0.9.0 (H9): Cloudflare-strong phrases ("Just a moment", "Attention Required", "verify you are human") fire alone; generic phrases ("access denied", "unknown error") only fire on near-empty pages — no more false-positive headed-launches on legitimate 4xx/5xx pages. `botBlocked` flag set after every `goto()`. Hybrid fallback switches to headed. Snapshot shows `[BOT CHALLENGE DETECTED]` warning. | Hybrid |
|
|
168
171
|
| Stealth (headless tells) | v0.9.0 (H4): `Network.setUserAgentOverride` strips "HeadlessChrome" from UA in HTTP headers AND `navigator.userAgent`; JS patches for webdriver, plugins, languages, full `chrome.runtime` enum shape, `Notification` constructor + `permission: 'default'`, `hardwareConcurrency: 8`, `deviceMemory: 8`, WebGL `UNMASKED_VENDOR_WEBGL`/`UNMASKED_RENDERER_WEBGL` spoofed to Intel. v0.10.0: canvas fingerprint noise — `toDataURL`/`getImageData` XOR a per-session `crypto.getRandomValues`-seeded mask into ~1 byte per 64-byte stride (stable within a session, different across sessions; bitmap is restored after encoding so legitimate canvas use is unaffected). | Headless |
|
|
169
|
-
| Ad / tracker URL blocking | v0.10.0: CDP `Network.setBlockedURLs` with
|
|
172
|
+
| Ad / tracker URL blocking | v0.10.0: CDP `Network.setBlockedURLs` with 128 curated patterns (Google/FB/Amazon/MS/Adobe ad+analytics, the major SaaS analytics + session-replay stacks, content-rec, supply-side ad networks, marketing automation). v0.10.1 added long-tail: AppsFlyer/Branch/Adjust, Cloudflare Web Analytics, Matomo Cloud, broader Outbrain (`amplify`/`log`) and PostHog (`/static/array.js`). Default on for launched browsers, off in attach mode. `opts.blockUrls` extends; `opts.blockAds: false` disables. Shrinks ARIA snapshots and speeds loads. v0.10.1: regression-tested across `switchTab()` in attach mode; one-time `console.warn` if Chromium lacks the CDP method. | Launched |
|
|
170
173
|
| iframe / OOPIF content (Stripe, reCAPTCHA, embedded forms) | v0.9.0 (H2): `Target.setAutoAttach({flatten:true})` registers a CDP session per iframe; `ariaTree()` walks `Page.getFrameTree`, fetches each frame's AX tree on the right session, splices children under iframe placeholders via `DOM.getFrameOwner`. Refs route via `{session, backendNodeId}` so clicks dispatch in the iframe's Input domain. `--site-per-process` launch flag forces every iframe — including same-origin — into OOPIF so coords work. | Both |
|
|
171
174
|
| Downloads | v0.9.0 (H7): `Browser.setDownloadBehavior({behavior:'allowAndName', downloadPath, eventsEnabled:true})` + listeners populate `page.downloads`. Files land at `savedPath` (under `--download-path` if supplied, else per-session `/tmp/barebrowse-dl-*`). | Headless + Headed (skipped in attach mode) |
|
|
172
175
|
| Profile locking | Unique temp dir per headless instance | Headless |
|
|
@@ -226,7 +229,7 @@ barebrowse save-state # → .barebrowse/state-<timestamp>.json
|
|
|
226
229
|
barebrowse close # Kill daemon + browser
|
|
227
230
|
```
|
|
228
231
|
|
|
229
|
-
**Open flags:** `--mode=headless|headed|hybrid`, `--port=N` (attach to running browser), `--proxy=URL`, `--viewport=WxH`, `--storage-state=FILE`, `--download-path=DIR` (v0.9.0), `--no-cookies`, `--browser=firefox|chromium`, `--timeout=N`
|
|
232
|
+
**Open flags:** `--mode=headless|headed|hybrid`, `--port=N` (attach to running browser), `--proxy=URL`, `--viewport=WxH`, `--storage-state=FILE`, `--download-path=DIR` (v0.9.0), `--no-cookies`, `--browser=firefox|chromium`, `--timeout=N`, `--block-private-network` (SSRF guard, v0.11.0), `--upload-dir=DIR` (upload sandbox, v0.11.0)
|
|
230
233
|
|
|
231
234
|
Session lifecycle: `open` spawns a background daemon holding a `connect()` session. Subsequent commands POST to the daemon over HTTP (localhost). `close` shuts everything down. JS dialogs (alert/confirm/prompt) are auto-dismissed and logged.
|
|
232
235
|
|
|
@@ -355,6 +358,10 @@ Useful for agent threshold decisions: "skip sites above score 40", "warn if term
|
|
|
355
358
|
|
|
356
359
|
14. **`eval` MCP tool is opt-in.** Set `BAREBROWSE_MCP_EVAL=1` to register it. Default off because `Runtime.evaluate` in an authenticated session can read cookies/localStorage, post on the user's behalf, hit any same-origin endpoint. CLI/connect()/daemon all keep `eval` because the developer is the caller; MCP gates it because the agent acts with less judgment.
|
|
357
360
|
|
|
361
|
+
15. **The CLI daemon requires a per-session token (v0.11.0).** `open` mints a 32-byte random token, writes it into `.barebrowse/session.json` (mode `0600`) and requires it on `POST /command` via the `x-barebrowse-token` header (loopback is shared across local users, so binding to `127.0.0.1` alone isn't an authorization boundary). The bundled `session-client` sends it automatically — no change for CLI users. A third-party client hitting the daemon HTTP API directly must read the token from `session.json` and send it. `GET /status` stays open (liveness only). The session dir is `0700`; snapshots, `saveState`, and logs are written `0600`.
|
|
362
|
+
|
|
363
|
+
16. **Navigation is scheme-guarded by default (v0.11.0).** `file:`/`chrome:`/etc. throw unless `allowLocalUrls: true`; `blockPrivateNetwork` and `uploadDir` add opt-in SSRF and upload-sandbox controls. All four are exposed identically on the library, MCP/bareagent (via `connect` opts), and the CLI (`--block-private-network`, `--upload-dir=DIR`; the scheme guard and token are always on).
|
|
364
|
+
|
|
358
365
|
## Constraints
|
|
359
366
|
|
|
360
367
|
- **Node >= 22** -- built-in WebSocket, built-in SQLite
|
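The `blockPrivateNetwork` option documented in the hunk above is described as hostname-based. A rough sketch of what such a check might look like follows, assuming only the categories the guide lists (loopback, RFC-1918, link-local, cloud metadata, `*.internal`). `isPrivateHost` is a hypothetical name, and the package's actual rules may differ, in particular for IPv6.

```javascript
const net = require('node:net');

// Hypothetical hostname-only check; it never performs a DNS lookup.
function isPrivateHost(hostname) {
  // URL.hostname wraps IPv6 literals in brackets, so strip them first.
  const h = String(hostname).toLowerCase().replace(/^\[|\]$/g, '');
  if (h === 'localhost' || h.endsWith('.localhost') || h.endsWith('.internal')) return true;
  if (h === '::1') return true; // IPv6 loopback; other IPv6 ranges are not covered here
  if (!net.isIPv4(h)) return false;
  const [a, b] = h.split('.').map(Number);
  return (
    a === 127 ||                          // loopback 127.0.0.0/8
    a === 10 ||                           // RFC 1918 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) ||  // RFC 1918 172.16.0.0/12
    (a === 192 && b === 168) ||           // RFC 1918 192.168.0.0/16
    (a === 169 && b === 254)              // link-local, includes 169.254.169.254
  );
}

console.log(isPrivateHost(new URL('http://169.254.169.254/latest/').hostname)); // true
console.log(isPrivateHost(new URL('http://[::1]:3000/').hostname));             // true
console.log(isPrivateHost(new URL('https://example.com/').hostname));           // false
```

Because the check looks only at the literal hostname, a public DNS name that resolves to `10.0.0.5` would pass, which is the limitation the guide calls out.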
package/cli.js
CHANGED
|
@@ -119,6 +119,8 @@ async function cmdOpen() {
|
|
|
119
119
|
downloadPath: parseFlag('--download-path'),
|
|
120
120
|
blockAds: hasFlag('--no-block-ads') ? false : undefined,
|
|
121
121
|
blockUrls: parseFlagAll('--block-urls'),
|
|
122
|
+
blockPrivateNetwork: hasFlag('--block-private-network') || undefined,
|
|
123
|
+
uploadDir: parseFlag('--upload-dir') ? resolve(parseFlag('--upload-dir')) : undefined,
|
|
122
124
|
};
|
|
123
125
|
|
|
124
126
|
try {
|
|
@@ -222,6 +224,8 @@ async function runDaemonInternal() {
|
|
|
222
224
|
downloadPath: parseFlag('--download-path'),
|
|
223
225
|
blockAds: hasFlag('--no-block-ads') ? false : undefined,
|
|
224
226
|
blockUrls: parseFlagAll('--block-urls'),
|
|
227
|
+
blockPrivateNetwork: hasFlag('--block-private-network') || undefined,
|
|
228
|
+
uploadDir: parseFlag('--upload-dir'),
|
|
225
229
|
};
|
|
226
230
|
const outputDir = parseFlag('--output-dir') || resolve('.barebrowse');
|
|
227
231
|
const url = parseFlag('--url');
|
|
@@ -489,6 +493,10 @@ Session:
|
|
|
489
493
|
Default: enabled in owned-browser modes, disabled in attach mode.
|
|
490
494
|
--block-urls=PATTERN Extra URL glob to block (repeatable, e.g. --block-urls='*://*.foo.com/*').
|
|
491
495
|
Use the =VALUE form when the pattern could be mistaken for a flag.
|
|
496
|
+
--block-private-network SSRF guard: refuse to navigate to loopback / RFC-1918 / link-local /
|
|
497
|
+
cloud-metadata hosts. Off by default so localhost browsing works.
|
|
498
|
+
--upload-dir=DIR Sandbox uploads: reject files outside DIR (symlinks resolved).
|
|
499
|
+
Default: no restriction. (file:/chrome: schemes are always blocked.)
|
|
492
500
|
|
|
493
501
|
Navigation:
|
|
494
502
|
barebrowse goto <url> Navigate to URL
|
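The `--upload-dir` help text above says files outside DIR are rejected with symlinks resolved. One way to express that containment test is sketched below, under the assumption that both paths are canonicalised with `realpath` before comparison. `isInsideUploadDir` is a hypothetical helper, not the package's implementation, and the demo assumes a platform where unprivileged symlink creation works.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical check: true only if `file` resolves (symlinks followed) inside `uploadDir`.
function isInsideUploadDir(file, uploadDir) {
  const root = fs.realpathSync(uploadDir);
  const real = fs.realpathSync(file); // throws if the file does not exist
  const rel = path.relative(root, real);
  return rel !== '' && rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel);
}

// Demo in a throwaway temp directory.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'upload-demo-'));
const sandbox = path.join(tmp, 'uploads');
fs.mkdirSync(sandbox);
const inside = path.join(sandbox, 'ok.txt');
const outside = path.join(tmp, 'secret.txt');
const link = path.join(sandbox, 'link.txt');
fs.writeFileSync(inside, 'ok');
fs.writeFileSync(outside, 'secret');
fs.symlinkSync(outside, link); // lives in the sandbox but points outside it

const insideOk = isInsideUploadDir(inside, sandbox);
const outsideOk = isInsideUploadDir(outside, sandbox);
const linkOk = isInsideUploadDir(link, sandbox);
fs.rmSync(tmp, { recursive: true, force: true });

console.log(insideOk, outsideOk, linkOk); // true false false
```

Resolving the symlink first is what makes the third case fail: a plain string-prefix test on `link` would have accepted it.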
|
package/commands/barebrowse/SKILL.md
CHANGED
|
@@ -39,6 +39,10 @@ All output files go to `.barebrowse/` in the current directory. Read them with t
|
|
|
39
39
|
- `--proxy=URL` — HTTP/SOCKS proxy server
|
|
40
40
|
- `--viewport=WxH` — Viewport size (e.g. 1280x720)
|
|
41
41
|
- `--storage-state=FILE` — Load cookies/localStorage from JSON file
|
|
42
|
+
- `--block-private-network` — SSRF guard: refuse loopback / RFC-1918 / link-local / cloud-metadata hosts (v0.11.0)
|
|
43
|
+
- `--upload-dir=DIR` — Sandbox uploads to DIR; reject files outside it (v0.11.0)
|
|
44
|
+
|
|
45
|
+
> Security (v0.11.0): `file:`/`chrome:`/etc. navigation is blocked by default, and the daemon requires a per-session token (handled transparently by the CLI). Snapshots and saved state are written owner-only (`0600`).
|
|
42
46
|
|
|
43
47
|
### Navigation
|
|
44
48
|
|
package/commands/barebrowse.md
CHANGED
|
@@ -38,6 +38,10 @@ All output files go to `.barebrowse/` in the current directory. Read them with t
|
|
|
38
38
|
- `--proxy=URL` — HTTP/SOCKS proxy server
|
|
39
39
|
- `--viewport=WxH` — Viewport size (e.g. 1280x720)
|
|
40
40
|
- `--storage-state=FILE` — Load cookies/localStorage from JSON file
|
|
41
|
+
- `--block-private-network` — SSRF guard: refuse loopback / RFC-1918 / link-local / cloud-metadata hosts (v0.11.0)
|
|
42
|
+
- `--upload-dir=DIR` — Sandbox uploads to DIR; reject files outside it (v0.11.0)
|
|
43
|
+
|
|
44
|
+
> Security (v0.11.0): `file:`/`chrome:`/etc. navigation is blocked by default, and the daemon requires a per-session token (handled transparently by the CLI). Snapshots and saved state are written owner-only (`0600`).
|
|
41
45
|
|
|
42
46
|
### Navigation
|
|
43
47
|
|
package/package.json
CHANGED
|
@@ -1,7 +1,13 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "barebrowse",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.11.0",
|
|
4
4
|
"description": "Authenticated web browsing for autonomous agents via CDP. URL in, pruned ARIA snapshot out.",
|
|
5
|
+
"repository": {
|
|
6
|
+
"type": "git",
|
|
7
|
+
"url": "git+https://github.com/hamr0/barebrowse.git"
|
|
8
|
+
},
|
|
9
|
+
"homepage": "https://github.com/hamr0/barebrowse#readme",
|
|
10
|
+
"bugs": "https://github.com/hamr0/barebrowse/issues",
|
|
5
11
|
"type": "module",
|
|
6
12
|
"main": "src/index.js",
|
|
7
13
|
"exports": {
|
|
@@ -29,7 +35,7 @@
|
|
|
29
35
|
"headless"
|
|
30
36
|
],
|
|
31
37
|
"optionalDependencies": {
|
|
32
|
-
"wearehere": "
|
|
38
|
+
"wearehere": "1.0.0"
|
|
33
39
|
},
|
|
34
40
|
"license": "Apache-2.0"
|
|
35
41
|
}
|
package/src/auth.js
CHANGED
|
@@ -268,6 +268,22 @@ export async function injectCookies(session, cookies) {
|
|
|
268
268
|
}
|
|
269
269
|
}
|
|
270
270
|
|
|
271
|
+
/**
|
|
272
|
+
* RFC 6265 domain-match: does `host` belong to a cookie declared for
|
|
273
|
+
* `cookieDomain`? Leading dot on the cookie domain is ignored (host-only
|
|
274
|
+
* vs domain cookies are matched the same here, intentionally — we want
|
|
275
|
+
* parent-domain cookies like .google.com to apply to mail.google.com).
|
|
276
|
+
* @param {string} host - target hostname (e.g. 'mail.google.com')
|
|
277
|
+
* @param {string} cookieDomain - cookie's host_key (e.g. '.google.com')
|
|
278
|
+
* @returns {boolean}
|
|
279
|
+
*/
|
|
280
|
+
export function cookieDomainMatch(host, cookieDomain) {
|
|
281
|
+
const h = String(host).toLowerCase();
|
|
282
|
+
const d = String(cookieDomain).toLowerCase().replace(/^\./, '');
|
|
283
|
+
if (!d) return false;
|
|
284
|
+
return h === d || h.endsWith('.' + d);
|
|
285
|
+
}
|
|
286
|
+
|
|
271
287
|
/**
|
|
272
288
|
* Extract cookies for a URL and inject them into a CDP session.
|
|
273
289
|
* Convenience function combining extractCookies + injectCookies.
|
|
@@ -276,12 +292,18 @@ export async function injectCookies(session, cookies) {
|
|
|
276
292
|
* @param {object} [opts] - Options passed to extractCookies
|
|
277
293
|
*/
|
|
278
294
|
export async function authenticate(session, url, opts = {}) {
|
|
279
|
-
|
|
280
|
-
//
|
|
281
|
-
|
|
282
|
-
|
|
283
|
-
|
|
284
|
-
|
|
295
|
+
const fullHost = new URL(url).hostname.toLowerCase();
|
|
296
|
+
// Coarse SQL pre-filter: strip to a registrable-ish domain so the LIKE query
|
|
297
|
+
// returns a superset (incl. parent-domain cookies). slice(-2) is a cheap
|
|
298
|
+
// heuristic — it over-selects for multi-part eTLDs (co.uk) and as a substring
|
|
299
|
+
// match, so the precise RFC-6265 domain-match below is what actually decides
|
|
300
|
+
// which cookies get injected. Without it, browsing apple.com would inject
|
|
301
|
+
// cookies for apple.com.evil.org and every *.co.uk site (verified).
|
|
302
|
+
const noWww = fullHost.replace(/^www\./, '');
|
|
303
|
+
const parts = noWww.split('.');
|
|
304
|
+
const coarseDomain = parts.length > 2 ? parts.slice(-2).join('.') : noWww;
|
|
305
|
+
const candidates = extractCookies({ ...opts, domain: coarseDomain });
|
|
306
|
+
const cookies = candidates.filter((c) => cookieDomainMatch(fullHost, c.domain));
|
|
285
307
|
if (cookies.length > 0) {
|
|
286
308
|
await injectCookies(session, cookies);
|
|
287
309
|
}
|
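The `cookieDomainMatch()` function added in the hunk above can be run on its own against the cases the changelog mentions. The function body below is copied from the diff; the test hosts are the examples from the 0.11.0 notes, and `other.co.uk` is an illustrative stand-in for an unrelated `*.co.uk` site.

```javascript
// Copied from the src/auth.js hunk: domain match with the leading dot ignored.
function cookieDomainMatch(host, cookieDomain) {
  const h = String(host).toLowerCase();
  const d = String(cookieDomain).toLowerCase().replace(/^\./, '');
  if (!d) return false;
  return h === d || h.endsWith('.' + d);
}

// Arguments are (host being browsed, cookie's host_key).
console.log(cookieDomainMatch('mail.google.com', '.google.com'));  // true
console.log(cookieDomainMatch('apple.com', 'apple.com'));          // true
console.log(cookieDomainMatch('apple.com', 'apple.com.evil.org')); // false
console.log(cookieDomainMatch('apple.com', 'notapple.com'));       // false
console.log(cookieDomainMatch('mybank.co.uk', 'other.co.uk'));     // false
```

The `'.' + d` suffix test is what separates `notapple.com` from `apple.com`: a bare `endsWith(d)` would have matched both.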
package/src/blocklist.js
CHANGED
|
@@ -99,6 +99,8 @@ export const DEFAULT_BLOCKLIST = [
|
|
|
99
99
|
'*://trc.taboola.com/*',
|
|
100
100
|
'*://widgets.outbrain.com/*',
|
|
101
101
|
'*://*.outbrain.com/utils/*',
|
|
102
|
+
'*://amplify.outbrain.com/*',
|
|
103
|
+
'*://log.outbrain.com/*',
|
|
102
104
|
|
|
103
105
|
// --- Tealium / Marketo / Pardot / Salesforce marketing ---
|
|
104
106
|
'*://tags.tiqcdn.com/*',
|
|
@@ -152,6 +154,7 @@ export const DEFAULT_BLOCKLIST = [
|
|
|
152
154
|
'*://heapanalytics.com/h*',
|
|
153
155
|
'*://*.posthog.com/e/*',
|
|
154
156
|
'*://*.posthog.com/decide/*',
|
|
157
|
+
'*://*.posthog.com/static/array.js*',
|
|
155
158
|
|
|
156
159
|
// --- Marketing automation ---
|
|
157
160
|
'*://track.hubspot.com/*',
|
|
@@ -170,6 +173,15 @@ export const DEFAULT_BLOCKLIST = [
|
|
|
170
173
|
'*://sessions.bugsnag.com/*',
|
|
171
174
|
'*://notify.bugsnag.com/*',
|
|
172
175
|
|
|
176
|
+
// --- Mobile-measurement (increasingly served on web too) ---
|
|
177
|
+
'*://*.appsflyer.com/*',
|
|
178
|
+
'*://*.branch.io/*',
|
|
179
|
+
'*://*.adjust.com/*',
|
|
180
|
+
|
|
181
|
+
// --- Privacy-friendly analytics (still trackers from an agent POV) ---
|
|
182
|
+
'*://static.cloudflareinsights.com/*',
|
|
183
|
+
'*://*.matomo.cloud/*',
|
|
184
|
+
|
|
173
185
|
// --- Misc widely-deployed ad networks ---
|
|
174
186
|
'*://*.adnxs.com/*', // AppNexus / Xandr
|
|
175
187
|
'*://*.rubiconproject.com/*',
|
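To get a feel for which URLs the new blocklist entries above would cover, the patterns can be approximated as simple globs in which `*` matches any run of characters. This is an approximation for local experimentation only. The browser applies the patterns itself through `Network.setBlockedURLs`, and its matching rules may differ in detail. `globToRegExp` is a hypothetical helper.

```javascript
// Hypothetical helper: treat `*` as "any characters" and everything else literally.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Three of the patterns added in this diff.
const appsflyer = globToRegExp('*://*.appsflyer.com/*');
const posthogLoader = globToRegExp('*://*.posthog.com/static/array.js*');
const outbrainLog = globToRegExp('*://log.outbrain.com/*');

console.log(appsflyer.test('https://cdn.appsflyer.com/sdk.js'));                 // true
console.log(appsflyer.test('https://appsflyer.com/'));                           // false
console.log(posthogLoader.test('https://us.posthog.com/static/array.js?v=1'));   // true
console.log(outbrainLog.test('https://www.outbrain.com/'));                      // false
```

Under this reading, `*.appsflyer.com` requires a subdomain, so the bare apex domain is not matched by that entry.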
package/src/chromium.js
CHANGED
|
@@ -5,9 +5,14 @@
|
|
|
5
5
|
* Modes: headless (launch new, no UI), headed (launch new, visible window).
|
|
6
6
|
*/
|
|
7
7
|
|
|
8
|
-
import {
|
|
8
|
+
import { execFileSync, spawn } from 'node:child_process';
|
|
9
9
|
import { existsSync, rmSync } from 'node:fs';
|
|
10
10
|
|
|
11
|
+
/** Block the current thread for `ms` without spawning a process. */
|
|
12
|
+
function sleepSync(ms) {
|
|
13
|
+
Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
|
|
14
|
+
}
|
|
15
|
+
|
|
11
16
|
// Track launched browsers so we can clean them up if the parent crashes.
|
|
12
17
|
// Registered exit handlers (one-time) iterate this set on shutdown.
|
|
13
18
|
const activeBrowsers = new Set();
|
|
@@ -29,7 +34,7 @@ function reapAllSync() {
|
|
|
29
34
|
for (const b of toReap) {
|
|
30
35
|
for (let i = 0; i < 20; i++) {
|
|
31
36
|
try { process.kill(b.process.pid, 0); } catch { break; }
|
|
32
|
-
|
|
37
|
+
sleepSync(50);
|
|
33
38
|
}
|
|
34
39
|
if (b.ownedProfileDir) {
|
|
35
40
|
try { rmSync(b.ownedProfileDir, { recursive: true, force: true }); } catch {}
|
|
@@ -84,8 +89,11 @@ export function findBrowser() {
|
|
|
84
89
|
if (existsSync(candidate)) return candidate;
|
|
85
90
|
continue;
|
|
86
91
|
}
|
|
87
|
-
// Relative name —
|
|
88
|
-
const path =
|
|
92
|
+
// Relative name — resolve via `which` (execFile: no shell, no injection)
|
|
93
|
+
const path = execFileSync('which', [candidate], {
|
|
94
|
+
encoding: 'utf8',
|
|
95
|
+
stdio: ['ignore', 'pipe', 'ignore'],
|
|
96
|
+
}).trim();
|
|
89
97
|
if (path) return path;
|
|
90
98
|
} catch {
|
|
91
99
|
// Not found, try next
|
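The two hardening changes in the `chromium.js` hunk above (a subprocess-free sleep and a shell-free lookup) can each be tried in isolation. In the sketch below, `sleepSync` is copied from the diff. The `execFileSync` demo is an illustration that uses the current Node binary as a stand-in for `which`, so that it does not depend on any particular tool being installed.

```javascript
const { execFileSync } = require('node:child_process');

// Same body as sleepSync() in the hunk above: block this thread for `ms` milliseconds.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

const start = Date.now();
sleepSync(50);
const elapsed = Date.now() - start;
console.log(`slept for about ${elapsed} ms`);

// execFileSync hands each argument to the program as-is; no shell parses the string.
// Here the child Node process simply prints its first argument back.
const hostile = 'chromium; echo injected';
const echoed = execFileSync(process.execPath, ['-p', 'process.argv[1]', hostile], {
  encoding: 'utf8',
}).trim();
console.log(echoed === hostile); // true
```

Had the same string been interpolated into an `execSync` command line, the `;` would have been interpreted by the shell as a command separator.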
package/src/daemon.js
CHANGED
|
@@ -8,9 +8,25 @@
|
|
|
8
8
|
import { createServer } from 'node:http';
|
|
9
9
|
import { spawn } from 'node:child_process';
|
|
10
10
|
import { writeFileSync, mkdirSync, existsSync, readFileSync, unlinkSync } from 'node:fs';
|
|
11
|
+
import { randomBytes, timingSafeEqual } from 'node:crypto';
|
|
11
12
|
import { join, resolve } from 'node:path';
|
|
12
13
|
import { connect } from './index.js';
|
|
13
14
|
|
|
15
|
+
/** Owner-only file write helper — daemon artifacts can hold authenticated content. */
|
|
16
|
+
function writeFilePrivate(path, data) {
|
|
17
|
+
writeFileSync(path, data, { mode: 0o600 });
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
/** Constant-time token compare; false on any length/format mismatch. */
|
|
21
|
+
function tokenMatches(expected, got) {
|
|
22
|
+
if (typeof got !== 'string' || got.length !== expected.length) return false;
|
|
23
|
+
try {
|
|
24
|
+
return timingSafeEqual(Buffer.from(got), Buffer.from(expected));
|
|
25
|
+
} catch {
|
|
26
|
+
return false;
|
|
27
|
+
}
|
|
28
|
+
}
|
|
29
|
+
|
|
14
30
|
const SESSION_FILE = 'session.json';
|
|
15
31
|
|
|
16
32
|
/**
|
|
@@ -19,7 +35,7 @@ const SESSION_FILE = 'session.json';
  */
 export async function startDaemon(opts, outputDir, initialUrl) {
   const absDir = resolve(outputDir);
-  mkdirSync(absDir, { recursive: true });
+  mkdirSync(absDir, { recursive: true, mode: 0o700 });
 
   // Clean stale session
   const sessionPath = join(absDir, SESSION_FILE);
@@ -44,6 +60,8 @@ export async function startDaemon(opts, outputDir, initialUrl) {
   if (Array.isArray(opts.blockUrls)) {
     for (const p of opts.blockUrls) args.push('--block-urls', p);
   }
+  if (opts.blockPrivateNetwork) args.push('--block-private-network');
+  if (opts.uploadDir) args.push('--upload-dir', opts.uploadDir);
 
   const child = spawn(process.execPath, args, {
     detached: true,
@@ -75,7 +93,13 @@ export async function startDaemon(opts, outputDir, initialUrl) {
  */
 export async function runDaemon(opts, outputDir, initialUrl) {
   const absDir = resolve(outputDir);
-  mkdirSync(absDir, { recursive: true });
+  mkdirSync(absDir, { recursive: true, mode: 0o700 });
+
+  // Per-session auth token. The daemon binds to loopback, but loopback is
+  // shared across local users — without a token any local user/process could
+  // POST /command and drive the authenticated browser (incl. `eval`). The
+  // token is written into session.json (mode 0600) so only the owner reads it.
+  const authToken = randomBytes(32).toString('hex');
 
   // Connect to browser
   const page = await connect({
@@ -88,6 +112,8 @@ export async function runDaemon(opts, outputDir, initialUrl) {
     downloadPath: opts.downloadPath,
     blockAds: opts.blockAds,
     blockUrls: opts.blockUrls,
+    blockPrivateNetwork: opts.blockPrivateNetwork,
+    uploadDir: opts.uploadDir,
   });
 
   // Console log capture
@@ -161,7 +187,7 @@ export async function runDaemon(opts, outputDir, initialUrl) {
       const text = await page.snapshot({ mode: pruneMode });
       const ts = new Date().toISOString().replace(/[:.]/g, '-');
       const file = join(absDir, `page-${ts}.yml`);
-
+      writeFilePrivate(file, text);
       return { ok: true, file };
     },
 
@@ -170,7 +196,7 @@ export async function runDaemon(opts, outputDir, initialUrl) {
       const ts = new Date().toISOString().replace(/[:.]/g, '-');
       const ext = format || 'png';
       const file = join(absDir, `screenshot-${ts}.${ext}`);
-
+      writeFilePrivate(file, Buffer.from(data, 'base64'));
       return { ok: true, file };
     },
 
@@ -244,7 +270,7 @@ export async function runDaemon(opts, outputDir, initialUrl) {
       const data = await page.pdf({ landscape });
       const ts = new Date().toISOString().replace(/[:.]/g, '-');
       const file = join(absDir, `page-${ts}.pdf`);
-
+      writeFilePrivate(file, Buffer.from(data, 'base64'));
       return { ok: true, file };
     },
 
@@ -273,7 +299,7 @@ export async function runDaemon(opts, outputDir, initialUrl) {
     async 'dialog-log'() {
       const ts = new Date().toISOString().replace(/[:.]/g, '-');
       const file = join(absDir, `dialogs-${ts}.json`);
-
+      writeFilePrivate(file, JSON.stringify(page.dialogLog, null, 2));
       return { ok: true, file, count: page.dialogLog.length };
     },
 
@@ -304,7 +330,7 @@ export async function runDaemon(opts, outputDir, initialUrl) {
       if (level) logs = logs.filter((l) => l.type === level);
       const ts = new Date().toISOString().replace(/[:.]/g, '-');
       const file = join(absDir, `console-${ts}.json`);
-
+      writeFilePrivate(file, JSON.stringify(logs, null, 2));
       if (clear) consoleLogs.length = 0;
       return { ok: true, file, count: logs.length };
     },
@@ -314,7 +340,7 @@ export async function runDaemon(opts, outputDir, initialUrl) {
       if (failed) logs = logs.filter((l) => l.status === 0 || l.status >= 400);
       const ts = new Date().toISOString().replace(/[:.]/g, '-');
       const file = join(absDir, `network-${ts}.json`);
-
+      writeFilePrivate(file, JSON.stringify(logs, null, 2));
       return { ok: true, file, count: logs.length };
     },
 
@@ -346,6 +372,14 @@ export async function runDaemon(opts, outputDir, initialUrl) {
       return;
     }
 
+    // Require the per-session token. Rejects any local process that hasn't
+    // read session.json (which is owner-only). Constant-time compare.
+    if (!tokenMatches(authToken, req.headers['x-barebrowse-token'])) {
+      res.writeHead(401, { 'Content-Type': 'application/json' });
+      res.end(JSON.stringify({ ok: false, error: 'Unauthorized: missing or invalid token' }));
+      return;
+    }
+
     let body = '';
     for await (const chunk of req) body += chunk;
 
@@ -388,11 +422,13 @@ export async function runDaemon(opts, outputDir, initialUrl) {
 
   const port = server.address().port;
 
-  // Write session.json so parent/clients can find us
+  // Write session.json so parent/clients can find us. Owner-only: it carries
+  // the auth token that gates /command.
   const sessionPath = join(absDir, SESSION_FILE);
-
+  writeFilePrivate(sessionPath, JSON.stringify({
     port,
     pid: process.pid,
+    token: authToken,
     startedAt: new Date().toISOString(),
   }));
 
package/src/index.js
CHANGED
@@ -18,7 +18,9 @@ import { dismissConsent } from './consent.js';
 import { applyStealth } from './stealth.js';
 import { DEFAULT_BLOCKLIST } from './blocklist.js';
 import { waitForNetworkIdle } from './network-idle.js';
+import { assertNavigable, assertUploadAllowed } from './url-guard.js';
 import { join as pathJoin } from 'node:path';
+import { chmodSync } from 'node:fs';
 
 /**
  * Browse a URL and return an ARIA snapshot.
@@ -41,6 +43,10 @@ export async function browse(url, opts = {}) {
   const mode = opts.mode || 'headless';
   const timeout = opts.timeout || 30000;
 
+  // Reject local-resource schemes (and optionally private hosts) before we
+  // spend a browser launch on a URL we won't navigate to.
+  assertNavigable(url, { allowLocalUrls: opts.allowLocalUrls, blockPrivateNetwork: opts.blockPrivateNetwork });
+
   let browser = null;
   let cdp = null;
   // Forward caller-supplied launch knobs (binary, userDataDir, proxy) into
@@ -154,6 +160,15 @@ export async function browse(url, opts = {}) {
  * attached to and follows the session across switchTab() until close.
  * @param {string[]} [opts.blockUrls] - Extra URL glob patterns to block,
  * merged with the default unless blockAds is false.
+ * @param {boolean} [opts.allowLocalUrls=false] - Permit navigation to local-
+ * resource schemes (file:, view-source:, chrome:, …). Blocked by default
+ * because a prompt-injected agent could use them to read local files.
+ * @param {boolean} [opts.blockPrivateNetwork=false] - Reject navigation to
+ * loopback / RFC-1918 / link-local / cloud-metadata hosts (SSRF guard).
+ * Off by default so localhost dev-server browsing keeps working.
+ * @param {string} [opts.uploadDir] - When set, upload() rejects any file that
+ * does not resolve (symlinks included) inside this directory. Sandboxes the
+ * agent's file-upload capability. Default: no restriction.
  * @returns {Promise<object>} Page handle with goto, snapshot, close
  */
 export async function connect(opts = {}) {
@@ -164,6 +179,11 @@ export async function connect(opts = {}) {
   // Forward caller-supplied launch knobs into every launch() below,
   // including hybrid-fallback re-launches inside goto().
   const launchOpts = { proxy: opts.proxy, binary: opts.binary, userDataDir: opts.userDataDir };
+  // Navigation safety policy, applied on every goto()/createTab().goto().
+  const urlGuard = { allowLocalUrls: opts.allowLocalUrls, blockPrivateNetwork: opts.blockPrivateNetwork };
+  // Optional upload sandbox: when set, upload() rejects files outside this dir.
+  // assertUploadAllowed resolves it (realpath) at check time.
+  const uploadDir = opts.uploadDir || null;
 
   if (attachMode) {
     // Reuse the user's running browser — do not launch, do not own the
@@ -312,6 +332,7 @@ export async function connect(opts = {}) {
 
   return {
     async goto(url, timeout = 30000) {
+      assertNavigable(url, urlGuard);
       // Refs from the previous page are about to become invalid — clear
       // before navigating so a stale click(ref) errors clearly instead of
       // silently resolving to whatever backendNodeId happens to still be in
@@ -467,6 +488,10 @@ export async function connect(opts = {}) {
     async upload(ref, files) {
       const entry = refMap.get(ref);
       if (!entry) throw new Error(`No element found for ref "${ref}"`);
+      // Upload sandbox: when uploadDir is set, every path must resolve
+      // (symlinks included, via realpath) inside it. Stops a prompt-injected
+      // agent from attaching ~/.ssh/id_rsa or other arbitrary local files.
+      assertUploadAllowed(files, uploadDir);
       await cdpUpload(entry.session, entry.backendNodeId, files);
     },
 
|
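The upload sandbox described in this hunk depends on comparing real paths rather than the strings the caller supplied. The sketch below illustrates that containment check on a throwaway directory tree, assuming a POSIX system where symlinks can be created; `isInside` and the file names are hypothetical and stand in for the library's `assertUploadAllowed`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: true when `file` resolves (symlinks followed) to a
// location inside `dir`. Both sides go through realpath before comparing.
function isInside(dir, file) {
  const base = fs.realpathSync(path.resolve(dir));
  const real = fs.realpathSync(path.resolve(file));
  // Appending the separator keeps a sibling like "uploads-evil" from matching.
  return real === base || real.startsWith(base + path.sep);
}

// Build a scratch layout: root/uploads is the sandbox, root/secret.txt is not.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'upload-demo-'));
const sandbox = path.join(root, 'uploads');
fs.mkdirSync(sandbox);
fs.writeFileSync(path.join(sandbox, 'ok.txt'), 'fine');
fs.writeFileSync(path.join(root, 'secret.txt'), 'outside');
// A symlink that lives inside the sandbox but points outside it.
fs.symlinkSync(path.join(root, 'secret.txt'), path.join(sandbox, 'link.txt'));

const results = {
  ok: isInside(sandbox, path.join(sandbox, 'ok.txt')),               // true
  link: isInside(sandbox, path.join(sandbox, 'link.txt')),           // false
  dotdot: isInside(sandbox, path.join(sandbox, '..', 'secret.txt')), // false
};
console.log(results);

fs.rmSync(root, { recursive: true, force: true });
```

A plain string-prefix test on the unresolved paths would accept `link.txt`, which is why the hunk's comment calls out realpath explicitly.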
@@ -535,7 +560,10 @@ export async function connect(opts = {}) {
       });
       const state = { cookies, localStorage: JSON.parse(result.value || '{}') };
       const { writeFileSync } = await import('node:fs');
-
+      // State holds cookies + localStorage (session tokens) — write owner-only
+      // so a multi-user host can't read another user's credentials off disk.
+      writeFileSync(filePath, JSON.stringify(state, null, 2), { mode: 0o600 });
+      try { chmodSync(filePath, 0o600); } catch { /* best effort if pre-existing */ }
     },
 
     get botBlocked() { return botBlocked; },
|
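The hunk above follows `writeFileSync(..., { mode: 0o600 })` with an explicit `chmodSync`. A plausible reason is that the `mode` option only takes effect when the file is created, so a state file left over from an earlier run would keep its old permissions. The sketch below demonstrates that behavior on a POSIX system; the directory and file names are made up for the demo.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mode-demo-'));
const file = path.join(dir, 'state.json');

// Simulate a file written earlier with looser permissions.
fs.writeFileSync(file, '{}');
fs.chmodSync(file, 0o644);

// Overwriting an existing file: `mode` is ignored, permissions stay 0o644.
fs.writeFileSync(file, '{"cookies":[]}', { mode: 0o600 });
const afterWrite = fs.statSync(file).mode & 0o777;

// The explicit chmod is what tightens the pre-existing file.
fs.chmodSync(file, 0o600);
const afterChmod = fs.statSync(file).mode & 0o777;

console.log(afterWrite.toString(8), afterChmod.toString(8)); // 644 600

fs.rmSync(dir, { recursive: true, force: true });
```

Note that `writeFilePrivate` in daemon.js passes only the `mode` option; its outputs use timestamped names and live in a directory created with mode 0o700, so the pre-existing-file case is less likely there.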
@@ -590,6 +618,7 @@ export async function connect(opts = {}) {
       let tabBotBlocked = false;
       return {
         async goto(url, timeout = 30000) {
+          assertNavigable(url, urlGuard);
           await navigate(tab, url, timeout);
           if (opts.consent !== false) {
             await dismissConsent(tab.session);
@@ -758,12 +787,21 @@ async function attachToExistingTarget(cdp, targetId, pageOpts = {}) {
   return { session, targetId, sessionId, framesByFrameId };
 }
 
+// One-time warn flag for Network.setBlockedURLs reject. Module-scoped so the
+// warn fires once per process across every session — legacy Chrome will keep
+// rejecting and we don't want to spam.
+let blocklistWarned = false;
+
 /**
  * Apply Network.setBlockedURLs for ad/tracker blocking on a session.
  * Default list is on; pass blockAds:false to skip, blockUrls:[] to extend.
- *
+ * On failure (legacy Chrome lacking the method) warns once and continues —
+ * blocking is an enhancement, not a hard requirement.
+ *
+ * Exported for unit testing of the warn-once behavior; not part of the public
+ * API surface.
  */
-async function applyBlocklist(session, pageOpts) {
+export async function applyBlocklist(session, pageOpts) {
   if (pageOpts.blockAds === false && !pageOpts.blockUrls) return;
   const patterns = pageOpts.blockAds === false
     ? (pageOpts.blockUrls || [])
@@ -771,11 +809,19 @@ async function applyBlocklist(session, pageOpts) {
   if (!patterns.length) return;
   try {
     await session.send('Network.setBlockedURLs', { urls: patterns });
-  } catch {
-
+  } catch (err) {
+    if (!blocklistWarned) {
+      blocklistWarned = true;
+      console.warn(`barebrowse: Network.setBlockedURLs unsupported — ad/tracker blocking disabled (${err.message})`);
+    }
   }
 }
 
+/** Test-only: reset the warn-once flag. Not part of the public API. */
+export function _resetBlocklistWarning() {
+  blocklistWarned = false;
+}
+
 /**
  * Navigate to a URL and wait for the page to load.
  */
|
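The warn-once behavior added to `applyBlocklist` can be illustrated without a browser. The sketch below uses a stand-in session object whose `send` always rejects; the session, the error text, and the `warnings` array are all hypothetical and exist only to make the effect observable.

```javascript
// Hypothetical stand-in for a CDP session on a browser that rejects the call.
const legacySession = {
  async send() {
    throw new Error('method not supported (simulated)');
  },
};

let warned = false;   // module-scoped: shared by every call in this process
const warnings = [];  // collected instead of console.warn so it can be counted

async function applyBlocklistSketch(session, patterns) {
  if (!patterns.length) return;
  try {
    await session.send('Network.setBlockedURLs', { urls: patterns });
  } catch (err) {
    // Swallow the failure; record a warning only the first time.
    if (!warned) {
      warned = true;
      warnings.push(`ad/tracker blocking disabled (${err.message})`);
    }
  }
}

async function demo() {
  await applyBlocklistSketch(legacySession, ['*://ads.example/*']);
  await applyBlocklistSketch(legacySession, ['*://ads.example/*']);
  return warnings.length;
}

const done = demo();
done.then((count) => console.log(count)); // 1
```

Because the flag lives at module scope, a test that wants to observe the warning more than once needs a way to clear it, which is the role `_resetBlocklistWarning` plays in the hunk.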
package/src/session-client.js
CHANGED
@@ -53,7 +53,11 @@ export async function sendCommand(command, args, outputDir) {
   try {
     res = await fetch(`http://127.0.0.1:${session.port}/command`, {
       method: 'POST',
-      headers: {
+      headers: {
+        'Content-Type': 'application/json',
+        // Authenticate to the daemon with the per-session token from session.json.
+        ...(session.token ? { 'x-barebrowse-token': session.token } : {}),
+      },
       body: JSON.stringify({ command, args }),
       signal: AbortSignal.timeout(60000),
     });
|
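The conditional spread in the new `headers` object adds the token header only when `session.token` is present. The sketch below isolates that expression in a hypothetical `buildHeaders` helper; the port and token values are placeholders.

```javascript
// Hypothetical helper isolating the header logic shown in the hunk above.
function buildHeaders(session) {
  return {
    'Content-Type': 'application/json',
    // Spreading an empty object adds no keys, so a session without a token
    // produces a request with no auth header rather than "undefined".
    ...(session.token ? { 'x-barebrowse-token': session.token } : {}),
  };
}

const withToken = buildHeaders({ port: 9222, token: 'abc123' });
const withoutToken = buildHeaders({ port: 9222 });

console.log(Object.keys(withToken));    // [ 'Content-Type', 'x-barebrowse-token' ]
console.log(Object.keys(withoutToken)); // [ 'Content-Type' ]
```

Per the daemon.js changes in this release, a request that arrives without the header is answered with 401, so the tokenless branch keeps the request well-formed but does not bypass the check.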
package/src/url-guard.js
ADDED
@@ -0,0 +1,138 @@
+/**
+ * url-guard.js — Navigation safety checks for goto()/browse().
+ *
+ * Closes two confirmed vectors for an autonomous (and therefore
+ * prompt-injectable) agent:
+ *   1. Local-resource schemes (file:, view-source:, chrome:, …) that let a
+ *      page-sourced instruction read local files or browser internals.
+ *   2. Optional private-network blocking (loopback, RFC-1918, link-local,
+ *      cloud-metadata) to stop SSRF to internal services.
+ *
+ * Scheme blocking is on by default; private-network blocking is opt-in
+ * (blockPrivateNetwork) so localhost dev-server browsing keeps working.
+ *
+ * Limitation: private-network checks match the URL hostname only. A public
+ * DNS name that resolves to a private IP (DNS rebinding) is NOT caught here —
+ * that needs connection-time IP inspection. Documented, not silently assumed.
+ */
+
+import { realpathSync } from 'node:fs';
+import { resolve, sep } from 'node:path';
+
+// Schemes safe to navigate to. Everything else is treated as a local-resource
+// or browser-internal scheme and blocked unless allowLocalUrls is set.
+// data:/blob:/about: stay allowed: opaque origins, no file:// or cross-origin
+// read, and data: is the library's test-fixture mechanism.
+const ALLOWED_SCHEMES = new Set(['http:', 'https:', 'data:', 'blob:', 'about:']);
+
+/**
+ * @param {string} host - hostname (no brackets for IPv6)
+ * @returns {boolean} true if it names a private/loopback/link-local/internal host
+ */
+function isPrivateHost(host) {
+  const h = host.toLowerCase().replace(/^\[|\]$/g, ''); // strip IPv6 brackets
+
+  // Internal hostnames
+  if (h === 'localhost' || h.endsWith('.localhost')) return true;
+  if (h.endsWith('.local') || h.endsWith('.internal')) return true;
+  if (h === 'metadata.google.internal') return true;
+
+  // IPv4 (incl. ranges)
+  const v4 = h.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
+  if (v4) {
+    const [a, b] = [Number(v4[1]), Number(v4[2])];
+    if (a === 127) return true; // loopback 127.0.0.0/8
+    if (a === 10) return true; // 10.0.0.0/8
+    if (a === 0) return true; // 0.0.0.0/8
+    if (a === 169 && b === 254) return true; // link-local / cloud metadata
+    if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
+    if (a === 192 && b === 168) return true; // 192.168.0.0/16
+    return false;
+  }
+
+  // IPv6 — gated on the host actually being an IPv6 literal (contains a
+  // colon). Without this gate, ordinary hostnames like "fcbarcelona.com" or
+  // "fdic.gov" would match the fc00::/7 ULA prefix check and be wrongly blocked.
+  if (h.includes(':')) {
+    if (h === '::1' || h === '::') return true; // loopback / unspecified
+    if (h.startsWith('fe80:')) return true; // link-local fe80::/10
+    if (h.startsWith('fc') || h.startsWith('fd')) return true; // fc00::/7 ULA
+    // IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1)
+    const mapped = h.match(/::ffff:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$/);
+    if (mapped) return isPrivateHost(mapped[1]);
+    return false;
+  }
+
+  return false;
+}
+
+/**
+ * Throw if `url` is unsafe to navigate to under the given policy.
+ * @param {string} url
+ * @param {object} [opts]
+ * @param {boolean} [opts.allowLocalUrls=false] - permit file:/chrome:/etc.
+ * @param {boolean} [opts.blockPrivateNetwork=false] - reject loopback/RFC-1918/metadata.
+ */
+export function assertNavigable(url, opts = {}) {
+  let parsed;
+  try {
+    parsed = new URL(url);
+  } catch {
+    throw new Error(`Refusing to navigate: not a valid URL (${String(url).slice(0, 80)})`);
+  }
+
+  if (!opts.allowLocalUrls && !ALLOWED_SCHEMES.has(parsed.protocol)) {
+    throw new Error(
+      `Refusing to navigate to "${parsed.protocol}" URL — local-resource and ` +
+      `browser-internal schemes are blocked (reads local files / browser state). ` +
+      `Pass { allowLocalUrls: true } to override.`
+    );
+  }
+
+  if (
+    opts.blockPrivateNetwork &&
+    (parsed.protocol === 'http:' || parsed.protocol === 'https:') &&
+    parsed.hostname &&
+    isPrivateHost(parsed.hostname)
+  ) {
+    throw new Error(
+      `Refusing to navigate to private/internal host "${parsed.hostname}" — ` +
+      `blockPrivateNetwork is enabled (SSRF guard). ` +
+      `Unset it to allow localhost / internal browsing.`
+    );
+  }
+}
+
+/**
+ * Throw if any file in `files` resolves outside `uploadDir`. Both the base
+ * dir and each file are resolved through realpath, so symlinks (in either the
+ * base path — e.g. macOS /tmp → /private/tmp — or the file) can't be used to
+ * escape the sandbox or to false-reject a legitimate file.
+ * No-op when `uploadDir` is falsy (no restriction configured).
+ * @param {string|string[]} files
+ * @param {string|null} uploadDir
+ */
+export function assertUploadAllowed(files, uploadDir) {
+  if (!uploadDir) return;
+  let baseReal;
+  try {
+    baseReal = realpathSync(resolve(uploadDir));
+  } catch {
+    throw new Error(`upload: uploadDir does not exist or is unreadable (${uploadDir})`);
+  }
+  const list = Array.isArray(files) ? files : [files];
+  for (const f of list) {
+    let real;
+    try {
+      real = realpathSync(resolve(String(f)));
+    } catch {
+      throw new Error(`upload: cannot resolve "${f}" (must exist inside uploadDir)`);
+    }
+    if (real !== baseReal && !real.startsWith(baseReal + sep)) {
+      throw new Error(`upload: "${f}" is outside the allowed uploadDir (${uploadDir})`);
+    }
+  }
+}
+
+// Exported for unit tests.
+export { isPrivateHost };
|
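Since `assertNavigable` hands `parsed.hostname` to `isPrivateHost`, the string produced by the URL parser decides which branch of the guard matches. The sketch below prints that normalized hostname for a few spellings of local addresses, using only the standard `URL` class; the sample URLs are arbitrary.

```javascript
// Show the hostname string a hostname-based guard would actually receive.
const samples = [
  'http://LOCALHOST:3000/',      // mixed case
  'http://2130706433/',          // decimal spelling of an IPv4 address
  'http://[::1]:8080/',          // IPv6 loopback literal
  'http://[::ffff:127.0.0.1]/',  // IPv4-mapped IPv6 literal
];

// The parser lowercases names and canonicalizes numeric hosts before the
// guard ever sees them, and keeps the brackets on IPv6 literals.
const seen = samples.map((u) => new URL(u).hostname);

for (let i = 0; i < samples.length; i++) {
  console.log(samples[i].padEnd(30), '->', seen[i]);
}
```

Any spelling whose normalized form matches none of the patterns in `isPrivateHost` falls through to `return false`, so it may be worth running the forms that matter in a given deployment through the exported function in a unit test.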