pi-webveil 0.2.1 → 0.2.2

Files changed (2)
  1. package/README.md +118 -101
  2. package/package.json +2 -2
package/README.md CHANGED
@@ -13,113 +13,46 @@ works perfectly well non-anonymously (direct egress).
13
13
  webveil is a pnpm workspace monorepo. The **core** (`search()` / `fetch()`) is plain,
14
14
  framework-agnostic. Two thin frontends wrap that same core:
15
15
 
16
- - **[`webveil`](packages/webveil)**, an [incur](https://github.com/wevm/incur)-based
16
+ - **[`webveil`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/packages/webveil)**, an [incur](https://github.com/wevm/incur)-based
17
17
  **CLI + MCP server** (`--mcp`, skills, `--llms`, TOON output). Pi-agnostic; usable by any
18
18
  agent (pi via pi-mcp-adapter, Claude Code, Cursor, Codex, bash). Has a `webveil` bin.
19
- - **[`pi-webveil`](packages/pi-webveil)**, a **pi extension** registering `web_search` and
19
+ - **[`pi-webveil`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/packages/pi-webveil)**, a **pi extension** registering `web_search` and
20
20
  `web_fetch` tools that call the core in-process. A drop-in replacement for Ollama's tools
21
21
  (same names), which is the original motivation. Depends on `webveil` via `workspace:*`.
22
22
 
23
23
  ## Quick start
24
24
 
25
- webveil needs a **backend** to get results from. The zero-config default is a local
26
- **SearXNG** at `http://127.0.0.1:8080` on `direct` egress (non-anonymous). There is
27
- **no** zero-setup + anonymous + real-web-results option in the ecosystem, see
28
- [`work/notes/ideas/default-backend-policy-account-vs-origin.md`](work/notes/ideas/default-backend-policy-account-vs-origin.md);
29
- SearXNG (you run it) is the closest, `tavily-compat` (needs an account/key) is the other.
30
-
31
- ### Run SearXNG (matches the default with no config)
25
+ webveil needs a **backend** for results. The zero-config default is a local **SearXNG** at
26
+ `http://127.0.0.1:8080` on `direct` egress (non-anonymous). Run one with Docker:
32
27
 
33
28
  ```sh
34
- # Docker: the container binds 8080 internally; map host 8080 -> container 8080
35
- # so it matches webveil's default baseUrl exactly.
29
+ # The container binds 8080 internally; map host 8080 -> 8080 to match the default.
36
30
  docker run -d --name searxng -p 8080:8080 searxng/searxng
37
31
  ```
38
32
 
39
- Then `webveil search "…"` / `web_fetch` work with no config.
40
-
41
- > **Port gotcha (you WILL hit this):** SearXNG's default port depends on how you install
42
- > it. A bare-metal / pip / source install defaults to **8888** (`settings.yml`
43
- > `server.port: 8888`). The Docker image binds **8080** internally regardless (its
44
- > entrypoint forces `0.0.0.0:8080`). SearXNG's own docs suggest `docker run … -p 8888:8080`
45
- > (host 8888 → container 8080). webveil's default expects **8080**. If your instance is on
46
- > any other port, point webveil at it:
47
- >
48
- > ```sh
49
- > export WEBVEIL_BASE_URL=http://127.0.0.1:8888 # or wherever your instance listens
50
- > ```
51
- >
52
- > or set `baseUrl` in `webveil.json` (see config seam below).
53
-
54
- ### Other SearXNG install options
55
-
56
- webveil needs something to point `baseUrl` at: an **HTTP `host:port`**, or (script install)
57
- the **Unix socket** itself. How you get one:
58
-
59
- - **Docker (above)**, binds a real TCP port directly; simplest if you only need webveil.
60
- - **Install script as a background service** (`sudo -H ./utils/searxng.sh install all`,
61
- see <https://docs.searxng.org/admin/installation-scripts.html>), sets SearXNG up as a
62
- systemd/uWSGI service. **Gotcha:** by default this listens on a **Unix socket**
63
- (`socket = /usr/local/searxng/run/socket`), NOT a TCP port. And, crucially, that default
64
- socket speaks the **native uwsgi protocol, NOT HTTP** (`socket = …`, not `http-socket =
65
- …`), so even a `curl --unix-socket … http://localhost/` returns HTTP 000. webveil's
66
- `unix:` baseUrl speaks **HTTP over a unix socket** via undici, so it CANNOT reach that
67
- default uwsgi socket directly. Three ways to reach the install-script instance:
68
- - **Point webveil straight at an HTTP unix socket** (no proxy, no extra process), once the
69
- socket actually speaks HTTP. The install-script default does NOT, so first make uWSGI
70
- serve HTTP on the socket: in the generated `.ini`, replace
71
- `socket = /usr/local/searxng/run/socket` with
72
- `http-socket = /usr/local/searxng/run/socket` (HTTP over the socket instead of the
73
- uwsgi protocol). THEN point webveil at it with a `unix:` URL naming the socket file:
74
- ```sh
75
- export WEBVEIL_BASE_URL=unix:/usr/local/searxng/run/socket
76
- ```
77
- webveil dials the socket directly over undici (`Agent({connect:{socketPath}})`, no
78
- extra dependency) and issues its normal `/search?...&format=json` request. The grammar
79
- is `unix:<socketPath>[:<httpPath>]`: the socket file path, then an OPTIONAL `:` +
80
- base path (mount point) the SearXNG app lives under (defaults to `/`, so the example
81
- above requests `/search`; a non-root mount is `unix:/usr/local/searxng/run/socket:/searxng`).
82
- (`unix:` works against ANY HTTP-on-a-unix-socket server, e.g. a Caddy/nginx upstream
83
- bound to a socket; the uwsgi-vs-`http-socket` distinction above is the SearXNG-specific
84
- catch.)
85
- **Egress must be `direct`** for this: a Unix socket is inherently local, so combining a
86
- `unix:` baseUrl with `egress=http`/`socks5` fails loud (proxying a local hop is fake
87
- anonymity, see "Where does anonymity live?" below; proxy SearXNG's `outgoing.proxies`
88
- instead and keep webveil `direct`).
89
- - **Front it with a reverse proxy** (this is what the SearXNG docs' nginx/apache step is
90
- for, it bridges HTTP-on-a-port to the uWSGI socket, serving BOTH the browser UI and
91
- webveil). **Any HTTP server works**, the docs say so explicitly; **Caddy is fine** and
92
- a good pick if you already run it. Plain Caddy `reverse_proxy` speaks **HTTP** to its
93
- upstream, so point it at an `http-socket` (see below) or a TCP `http-socket`:
94
- ```caddy
95
- searxng.example.com {
96
- reverse_proxy unix//usr/local/searxng/run/socket # plain reverse_proxy = HTTP, so the socket must be http-socket = (not the uwsgi socket =)
97
- }
98
- ```
99
- Then point webveil at the Caddy address. (Set SearXNG's `server.base_url` in
100
- `settings.yml` to match, and keep the limiter in mind, see below.) If you want a Caddy
101
- frontend AND webveil-direct, the simplest path is ONE `http-socket` that both consume
102
- (Caddy's HTTP `reverse_proxy` and webveil's `unix:` both speak HTTP to it); you only
103
- need the uwsgi `socket = ` form if Caddy uses an explicit uwsgi transport.
104
- - **Or make uWSGI listen on a TCP port** instead of the socket: in the generated
105
- `.ini`, replace `socket = …/run/socket` with `http-socket = 127.0.0.1:8888`, then point
106
- webveil at `http://127.0.0.1:8888`. Good when you want ONLY webveil (no public web UI /
107
- TLS).
108
-
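For illustration, the `unix:<socketPath>[:<httpPath>]` grammar described in the removed text above can be sketched as a small parser. This is a hypothetical helper, not webveil's actual code: the name `parseUnixBaseUrl`, the return shape, and the assumption that the socket path itself contains no `:` are all illustrative.

```ts
// Hypothetical sketch of the `unix:<socketPath>[:<httpPath>]` grammar.
// Assumption: the socket path contains no ':' of its own, so the first ':'
// after the `unix:` prefix starts the optional base path.
interface UnixBaseUrl {
  socketPath: string; // socket file to dial
  httpPath: string; // mount point of the SearXNG app, "/" when omitted
}

function parseUnixBaseUrl(baseUrl: string): UnixBaseUrl {
  const prefix = "unix:";
  if (!baseUrl.startsWith(prefix)) {
    throw new Error(`not a unix: baseUrl: ${baseUrl}`);
  }
  const rest = baseUrl.slice(prefix.length);
  const sep = rest.indexOf(":");
  const socketPath = sep === -1 ? rest : rest.slice(0, sep);
  // A missing or empty base path falls back to the root mount.
  const httpPath = sep === -1 ? "/" : (rest.slice(sep + 1) || "/");
  if (socketPath === "") {
    throw new Error("unix: baseUrl is missing a socket path");
  }
  return { socketPath, httpPath };
}
```

Under these assumptions, `unix:/usr/local/searxng/run/socket` yields the root mount, and `unix:/usr/local/searxng/run/socket:/searxng` yields the `/searxng` mount.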
109
- > **You will also need to enable the JSON API and (for a local instance) disable the
110
- > limiter.** A fresh script install ships with `server.limiter: true` and often no `json`
111
- > output format, so webveil gets `429 TOO MANY REQUESTS` or an HTML page. In SearXNG's
112
- > `settings.yml` set `server.limiter: false` + `server.public_instance: false` (safe for a
113
- > LOCAL, socket-only instance, NOT internet-exposed) and add `json` under `search.formats:`
114
- > (`[html, json]`), then restart uWSGI. This applies to EVERY option above, it is a
115
- > SearXNG-side requirement, not a webveil one.
116
-
117
- Full SearXNG install options (Docker, Compose, script, bare-metal): the official docs at
118
- <https://docs.searxng.org/admin/installation.html>. Install topology + the
119
- uwsgi-vs-`http-socket`, limiter, and reverse-proxy details captured in
120
- [`work/notes/findings/searxng-install-topology.md`](work/notes/findings/searxng-install-topology.md)
121
- and
122
- [`work/notes/findings/searxng-script-socket-is-uwsgi-not-http.md`](work/notes/findings/searxng-script-socket-is-uwsgi-not-http.md).
33
+ Then searches and fetches work with no config:
34
+
35
+ ```sh
36
+ webveil search "hello world"
37
+ ```
38
+
39
+ That's the whole happy path. Two things to know:
40
+
41
+ - **Enable SearXNG's JSON API.** A fresh install may serve only HTML and ship with the rate
42
+ limiter on, giving webveil `429` or an HTML page. The stock Docker image above works out
43
+ of the box; other installs need `json` in `search.formats` and `server.limiter: false`
44
+ for local use. See [SearXNG setup](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/docs/searxng-setup.md).
45
+ - **Different port?** Point webveil at it (a non-Docker install often defaults to 8888):
46
+ ```sh
47
+ export WEBVEIL_BASE_URL=http://127.0.0.1:8888 # or wherever your instance listens
48
+ ```
49
+ or set `baseUrl` in `webveil.json`.
50
+
51
+ For any non-Docker topology (install script, Unix sockets, reverse proxy, the
52
+ uwsgi-vs-`http-socket` catch), see **[SearXNG setup (detailed)](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/docs/searxng-setup.md)**.
53
+ There is **no** zero-setup + anonymous + real-web-results option in the ecosystem (see
54
+ [`work/notes/ideas/default-backend-policy-account-vs-origin.md`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/work/notes/ideas/default-backend-policy-account-vs-origin.md));
55
+ SearXNG (you run it) is the closest, `tavily-compat` (needs an account/key) is the other.
123
56
 
124
57
  ### Where does anonymity live? (read before turning on egress)
125
58
 
@@ -151,7 +84,7 @@ Rule of thumb: **proxy the hop that actually reaches the public internet.** For
151
84
  self-hosted SearXNG that hop is SearXNG's, so the proxy goes on SearXNG
152
85
  (`outgoing.proxies`), and webveil stays `direct`. webveil's `socks5` mode is for *remote*
153
86
  backends and for `web_fetch`. See
154
- [`work/notes/findings/webveil-anonymity-boundary.md`](work/notes/findings/webveil-anonymity-boundary.md).
87
+ [`work/notes/findings/webveil-anonymity-boundary.md`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/work/notes/findings/webveil-anonymity-boundary.md).
155
88
 
156
89
  ## How it works (seams)
157
90
 
@@ -169,16 +102,16 @@ backends and for `web_fetch`. See
169
102
  and nothing else: your shell, `git push`, the browser, and the OS are untouched. So
170
103
  webveil on `socks5` does NOT route your `git push` through the proxy. See
171
104
  [Anonymous egress](#anonymous-egress-mullvad--tor) and
172
- [`work/notes/findings/mullvad-socks5-egress-mechanics.md`](work/notes/findings/mullvad-socks5-egress-mechanics.md).
105
+ [`work/notes/findings/mullvad-socks5-egress-mechanics.md`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/work/notes/findings/mullvad-socks5-egress-mechanics.md).
173
106
  - **config seam**, per-folder resolution: env > nearest `webveil.json` walking up from
174
107
  cwd > global `$XDG_CONFIG_HOME/webveil/config.json` (default
175
108
  `~/.config/webveil/config.json`) > defaults. Per folder = per account/egress. The
176
109
  project file is a frontend-neutral `webveil.json` read identically by the CLI and the
177
- pi extension. See [`docs/adr/0002`](docs/adr/0002-config-file-location-neutral-webveil-json.md).
110
+ pi extension. See [`docs/adr/0002`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/docs/adr/0002-config-file-location-neutral-webveil-json.md).
178
111
  - **extractor seam**, `urlToMarkdown` via `distilly/fetch` by default, injected with
179
112
  webveil's egress-bound `fetch`; a backend's own `/extract` (Tavily-compat) may override
180
113
  it. Owns the context-friendly markdown + size presets (`s`/`m`/`l`/`f`). See
181
- [`docs/adr/0001`](docs/adr/0001-extractor-uses-distilly-fetch-with-injected-egress.md).
114
+ [`docs/adr/0001`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/docs/adr/0001-extractor-uses-distilly-fetch-with-injected-egress.md).
182
115
  - **security**, an SSRF guard lives in the egress fetch, so it covers distilly's
183
116
  rule-rewritten requests too.
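A minimal sketch of the config seam's precedence (env > nearest `webveil.json` walking up from cwd > global config > defaults), shown for `baseUrl` only. This is not webveil's implementation: the function names, the injectable `read` parameter, and the per-key fall-through are assumptions made for illustration; only `WEBVEIL_BASE_URL`, `XDG_CONFIG_HOME`, the file names, and the default URL come from the README.

```ts
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

type Env = Record<string, string | undefined>;
// Returns the file's text, or undefined when the file does not exist.
type ReadFile = (path: string) => string | undefined;

const readFromDisk: ReadFile = (path) =>
  existsSync(path) ? readFileSync(path, "utf8") : undefined;

// Walk up from `cwd` and return the text of the nearest webveil.json, if any.
function nearestProjectConfig(cwd: string, read: ReadFile): string | undefined {
  let dir = cwd;
  for (;;) {
    const text = read(join(dir, "webveil.json"));
    if (text !== undefined) return text;
    const parent = dirname(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}

function baseUrlFrom(text: string | undefined): string | undefined {
  if (text === undefined) return undefined;
  return (JSON.parse(text) as { baseUrl?: string }).baseUrl;
}

// Assumption: each source is consulted only when the previous one has no baseUrl.
function resolveBaseUrl(env: Env, cwd: string, read: ReadFile = readFromDisk): string {
  const configHome = env.XDG_CONFIG_HOME ?? join(homedir(), ".config");
  return (
    env.WEBVEIL_BASE_URL ??
    baseUrlFrom(nearestProjectConfig(cwd, read)) ??
    baseUrlFrom(read(join(configHome, "webveil", "config.json"))) ??
    "http://127.0.0.1:8080"
  );
}
```

Passing the file reader as a parameter keeps the lookup order checkable without touching the real filesystem; calling `resolveBaseUrl(process.env, process.cwd())` uses the disk-backed default.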
184
117
 
@@ -202,6 +135,16 @@ or per folder in `webveil.json`:
202
135
  { "egress": { "mode": "socks5", "url": "socks5://10.64.0.1:1080" } }
203
136
  ```
204
137
 
138
+ > **`socks5` is for a REMOTE backend or `web_fetch`, NOT a local SearXNG.** If your
139
+ > `baseUrl` is a local SearXNG (`unix:` or `127.0.0.1`), `WEBVEIL_EGRESS=socks5` is
140
+ > **rejected** (fail-loud), because webveil → local-SearXNG is a local hop; proxying it
141
+ > would give fake anonymity while SearXNG still crawls the web from your real IP. The hop
142
+ > that needs proxying is SearXNG's own, so you put the proxy on **SearXNG**
143
+ > (`outgoing.proxies`) and keep webveil `direct`. See
144
+ > [Where does anonymity live?](#where-does-anonymity-live-read-before-turning-on-egress)
145
+ > for the full table; the same SOCKS5 listener (Mullvad/Tor/wireproxy below) plugs into
146
+ > either side.
147
+
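The fail-loud rule in the warning above can be sketched as a guard that runs before any request is made. This is an illustrative assumption, not webveil's code: `isLocalBaseUrl` and `assertEgressAllowed` are hypothetical names, and treating `localhost` and `::1` as local (and rejecting `http` egress for a TCP loopback address) goes beyond what the README states, which names only `unix:` and `127.0.0.1`.

```ts
type EgressMode = "direct" | "http" | "socks5";

// Assumption: "local" means a unix: socket or a loopback host.
function isLocalBaseUrl(baseUrl: string): boolean {
  if (baseUrl.startsWith("unix:")) return true;
  const host = new URL(baseUrl).hostname;
  return host === "127.0.0.1" || host === "localhost" || host === "[::1]";
}

// Throws instead of silently proxying a local hop, which would give fake anonymity.
function assertEgressAllowed(baseUrl: string, mode: EgressMode): void {
  if (mode !== "direct" && isLocalBaseUrl(baseUrl)) {
    throw new Error(
      `egress "${mode}" rejected for local baseUrl ${baseUrl}: ` +
        "put the proxy on SearXNG (outgoing.proxies) and keep webveil direct",
    );
  }
}
```

With this shape, a remote backend combined with `socks5` passes, while the same mode against `http://127.0.0.1:8080` or a `unix:` URL throws.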
205
148
  ### Two layers keep your `git push` (and everything else) off the proxy
206
149
 
207
150
  A common worry: "if I route through Mullvad, will my `git push` to GitHub leak under the
@@ -241,7 +184,7 @@ real IP (proving only the proxy is tunnelled).
241
184
 
242
185
  If you want webveil to exit somewhere different from your system, you have options, but be
243
186
  clear on what is and isn't possible (see
244
- [`work/notes/findings/mullvad-socks5-egress-mechanics.md`](work/notes/findings/mullvad-socks5-egress-mechanics.md)):
187
+ [`work/notes/findings/mullvad-socks5-egress-mechanics.md`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/work/notes/findings/mullvad-socks5-egress-mechanics.md)):
245
188
 
246
189
  - **Different exit LOCATION, same account (easy).** Point webveil at a specific multihop
247
190
  SOCKS5 host so it exits elsewhere than your tunnel's entry:
@@ -263,6 +206,80 @@ clear on what is and isn't possible (see
263
206
  `WEBVEIL_EGRESS_URL=socks5://127.0.0.1:9050` with the Tor daemon running. Same per-request,
264
207
  webveil-only scoping applies.
265
208
 
209
+ ### Other SOCKS5 providers
210
+
211
+ webveil's `socks5` egress is generic: it builds a SOCKS5 dispatcher from any
212
+ `socks5://host:port` URL. Mullvad and Tor are just the documented examples. Anything that
213
+ exposes a SOCKS5 endpoint works, e.g. an SSH dynamic forward (`ssh -D 1080 user@host`, then
214
+ `socks5://127.0.0.1:1080`), or a local shadowsocks/sing-box listener. Use the `socks5://`
215
+ scheme (remote DNS, no leak; webveil does not resolve hostnames locally under proxy
216
+ egress). Verify any provider with
217
+ `curl https://ipv4.am.i.mullvad.net --socks5-hostname <host>:<port>`.
218
+
219
+ ### ProtonVPN (via wireproxy)
220
+
221
+ ProtonVPN **does not offer a native SOCKS5 proxy** ([and says it never
222
+ will](https://protonvpn.com/support/socks5)), unlike Mullvad's built-in `10.64.0.1:1080`.
223
+ There is no `socks5://` endpoint Proton hands you. But you can wrap a Proton **WireGuard**
224
+ tunnel in a local SOCKS5 listener and point webveil at that, exactly like the Tor case.
225
+
226
+ [wireproxy](https://github.com/pufferffish/wireproxy) is a userspace WireGuard client that
227
+ exposes a SOCKS5 port, which suits webveil's design (a local `127.0.0.1` listener,
228
+ webveil-only scope, no system-wide tunnel). **You do not need the Proton app or CLI
229
+ running:** wireproxy speaks the WireGuard protocol itself in userspace, so the `.conf`'s
230
+ keys + endpoint are all it needs to establish the tunnel (no `wg`/`wg-quick`, no network
231
+ interface, no root). Proton's dashboard is just where you generate the config once. (One
232
+ limit: wireproxy proxies TCP via SOCKS5 CONNECT, which is all webveil needs; it is not a
233
+ UDP path.)
234
+
235
+ 1. Download a **WireGuard config** from Proton's account dashboard (Downloads → WireGuard
236
+ configuration).
237
+ 2. Add a `[Socks5]` block and run wireproxy:
238
+ ```ini
239
+ # proton.conf (from Proton's WireGuard download, plus the [Socks5] block)
240
+ [Interface]
241
+ PrivateKey = <from Proton>
242
+ Address = 10.2.0.2/32
243
+ DNS = 10.2.0.1
244
+
245
+ [Peer]
246
+ PublicKey = <from Proton>
247
+ Endpoint = <proton-server>:51820
248
+ AllowedIPs = 0.0.0.0/0
249
+
250
+ [Socks5]
251
+ BindAddress = 127.0.0.1:1080
252
+ ```
253
+ ```sh
254
+ wireproxy -c proton.conf
255
+ ```
256
+ 3. Point the SOCKS5 endpoint (`socks5://127.0.0.1:1080`) at the **hop that reaches the
257
+ public web** (see the warning above):
258
+ - **Remote backend, or `web_fetch`** → webveil's egress:
259
+ ```sh
260
+ export WEBVEIL_EGRESS=socks5
261
+ export WEBVEIL_EGRESS_URL=socks5://127.0.0.1:1080
262
+ ```
263
+ - **Local SearXNG** → SearXNG's own outbound, in its `settings.yml` (keep webveil
264
+ `direct`):
265
+ ```yaml
266
+ outgoing:
267
+ proxies:
268
+ all://:
269
+ - socks5://127.0.0.1:1080
270
+ ```
271
+ This routes SearXNG's engine requests (→ Google/Bing/…) through Proton; webveil's
272
+ local hop to SearXNG stays direct. (`WEBVEIL_EGRESS=socks5` with a local `baseUrl` is
273
+ rejected, so this is the only correct shape for local SearXNG.)
274
+
275
+ Because wireproxy is userspace, only the traffic you point at it exits via Proton; your
276
+ `git push`, shell, and OS stay on your real IP, with no system tunnel. A ready-made Docker
277
+ wrapper that does the same (Proton WireGuard creds in, SOCKS5 on `1080` out) is
278
+ [`SamuelMoraesF/protonvpn-proxy`](https://github.com/SamuelMoraesF/protonvpn-proxy). Verify
279
+ the proxy with `curl https://ipv4.am.i.mullvad.net --socks5-hostname 127.0.0.1:1080` (a
280
+ Proton exit IP means traffic through it exits via Proton); webveil **fails loud** if a
281
+ configured proxy is unbuildable, so it never silently falls back to your real IP.
282
+
266
283
  > **Caveat:** webveil's `socks5` mode is NOT a whole-machine VPN. Do not assume enabling it
267
284
  > anonymizes anything other than webveil. Conversely, a system-wide full-tunnel VPN under
268
285
  > your logged-in identity is the thing that CAN deanonymize a `git push`; webveil's scoped
@@ -274,7 +291,7 @@ AGPL-3.0-or-later. webveil depends on `distilly` (MIT, the local HTML-to-markdow
274
291
  extractor; webveil uses its networked `distilly/fetch` entrypoint with an injected egress
275
292
  fetch) and `incur` (MIT). MIT code may be used by AGPL software; `distilly` stays
276
293
  GPL/AGPL-free so it remains cleanly reusable under MIT. See [`LICENSE`](LICENSE) and
277
- [`COPYRIGHT`](COPYRIGHT).
294
+ [`COPYRIGHT`](https://github.com/wighawag/webveil/blob/pi-webveil@0.2.2/COPYRIGHT).
278
295
 
279
296
  ## Size discipline (per-module LOC)
280
297
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-webveil",
3
- "version": "0.2.1",
3
+ "version": "0.2.2",
4
4
  "description": "Pi extension: web_search and web_fetch tools backed by webveil. A drop-in, anonymity-capable replacement for Ollama's web_search/web_fetch.",
5
5
  "license": "AGPL-3.0-or-later",
6
6
  "keywords": [
@@ -41,7 +41,7 @@
41
41
  ]
42
42
  },
43
43
  "dependencies": {
44
- "webveil": "0.2.1"
44
+ "webveil": "0.2.2"
45
45
  },
46
46
  "devDependencies": {
47
47
  "@types/node": "^25.2.0",