pi-web-toolkit 0.2.0 → 0.2.2

package/CHANGELOG.md ADDED
@@ -0,0 +1,125 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [0.2.2] - 2026-06-11
11
+
12
+ ### Added
13
+
14
+ - Self-contained README prompt that Pi users can copy to install and verify the package, SearXNG, Scrapling, and agent-browser.
15
+ - `CHANGELOG.md` to the files included in the published npm package.
16
+
17
+ ### Changed
18
+
19
+ - Corrected README, tool reference, usage guide, agent guidance, project context, and test documentation to match current repository behavior.
20
+ - Clarified when to use each tool, Scrapling fallback behavior, external dependency requirements, and SearXNG JSON API setup.
21
+ - Test scripts now use the locally installed `tsx` development dependency.
22
+ - User-visible tool descriptions now distinguish pages that need interaction from those that do not.
23
+
24
+ ### Fixed
25
+
26
+ - Corrected historical changelog dates and inaccurate claims about robots.txt enforcement, tool limits, runtime configuration, and test coverage.
27
+ - Corrected GitHub issue-reading commands in agent guidance.
28
+
29
+ ## [0.2.1] - 2026-06-10
30
+
31
+ ### Added
32
+
33
+ - README tools preview grid with screenshots for `web_search`, `web_fetch`, `web_batch_fetch`, and `web_browse`.
34
+ - Agent-browser parser regression tests covering array, wrapped, single-item, and invalid JSON output shapes.
35
+
36
+ ### Fixed
37
+
38
+ - `web_browse` now accepts multiple agent-browser batch JSON output shapes instead of assuming a top-level array.
39
+
40
+ ### Changed
41
+
42
+ - `npm test` now also runs the agent-browser parser regression suite.
43
+
44
+ ## [0.2.0] - 2026-06-09
45
+
46
+ ### Added
47
+
48
+ - `extensions/utils/cli-runner.ts` — centralized CLI process spawning with timeout and AbortSignal support.
49
+ - `extensions/utils/content-preview.ts` — intelligent content extraction from scraped pages.
50
+ - `extensions/utils/output-sink.ts` — truncation and temp-file fallback, replacing `truncateHead` + manual `writeFile`/`mkdtemp` in every tool.
51
+ - `extensions/utils/render-helpers.ts` — URL abbreviations, text normalization, and error formatting for TUI.
52
+ - `extensions/utils/tool-factory.ts` — common tool registration patterns.
53
+ - `CLAUDE.md` — symlink to `AGENTS.md` for IDE/agent integration.
54
+ - `CONTEXT.md` — project domain summary for pi runtime context.
55
+ - `test/` directory — automated test suite under `test/content-preview/` with fixtures, baselines, snapshots, and summary report.
56
+
57
+ ### Changed
58
+
59
+ - All 4 tools (`web_search`, `web_fetch`, `web_browse`, `web_batch_fetch`) refactored to use new shared utils, eliminating ~200 lines of duplicated truncate/output logic per tool.
60
+ - `scrapling.ts` and `agent-browser.ts` now use `cli-runner`, eliminating duplicate `spawn` logic.
61
+ - `web_search` — `language` default changed from `"auto"` to `""` (omits param when unset to use SearXNG default).
62
+ - `web_search` — `promptGuidelines` now recommends `web_batch_fetch` for parallel reading of 2–5 results.
63
+ - `web_batch_fetch` — added live progress tracking with per-URL status (fetching / done / error).
64
+ - `web_browse` — added step formatting and tracking (`formatBrowseStep` + `steps` in details).
65
+ - Unified TUI redesign across all 4 tools:
66
+ - Consistent `isError` rendering with `✗` status, error text, and context details.
67
+ - Enhanced `isPartial` rendering with domain/URL context and live progress indicators.
68
+ - `fullOutputPath` rendered in accent color.
69
+ - `renderCall` tags: `[stealthy]`, `[selector=...]`, `[headed]`, `concurrency`.
70
+ - `web_fetch` — content preview (500-char extract) shown in collapsed and expanded views.
71
+ - `web_browse` — expanded view shows complete step list + preview.
72
+ - `web_batch_fetch` — collapsed shows top 3 successes with previews; expanded shows full success list + failure list.
73
+
74
+ ### Meta
75
+
76
+ - Stop tracking `package-lock.json` (library project; reproducible by downstream consumers).
77
+ - Add `typecheck`, `test`, and `test:approve` scripts to `package.json`.
78
+
79
+ ## [0.1.2] - 2026-06-08
80
+
81
+ ### Added
82
+
83
+ - `utils/agent-browser.ts` — extracted from `web_browse.ts` to encapsulate all agent-browser CLI interaction (command building, process spawning, JSON parsing, session cleanup).
84
+ - `tsconfig.json` — TypeScript project configuration for CI type-checking.
85
+ - GitHub Actions CI workflow (`ci.yml`) — runs `tsc --noEmit` on every push and PR.
86
+
87
+ ### Changed
88
+
89
+ - `web_search` — `SEARXNG_URL` is now read at execute time instead of module load time, so in-process environment changes take effect without reloading the extension.
90
+ - `utils/scrapling.ts` — introduced `runScraplingWithFallback()` with configurable `noGetFallback` option, eliminating duplicate fallback logic in `web_fetch` and `web_batch_fetch`.
91
+ - `web_browse.ts` — reduced from ~400 lines to ~194 lines by moving CLI logic to `utils/agent-browser.ts`.
92
+ - README — added `## Configuration` section, `## Contributing` section, CI badge, and updated project structure with design principles.
93
+
94
+ ### Fixed
95
+
96
+ - Preserved GET fallback for `web_batch_fetch` when `stealthy: true` fails, maintaining backward compatibility with the previous batch implementation.
97
+
98
+ ## [0.1.1] - 2026-06-04
99
+
100
+ ### Added
101
+
102
+ - `web_batch_fetch` — parallel multi-page fetching via scrapling.
103
+ - Built-in output truncation with temp-file fallback for all tools.
104
+ - TUI renderers for tool calls and results.
105
+
106
+ ### Changed
107
+
108
+ - Unified extension entry point at `extensions/index.ts`.
109
+
110
+ ## [0.1.0] - 2026-06-03
111
+
112
+ ### Added
113
+
114
+ - `web_search` — SearXNG web search.
115
+ - `web_fetch` — single-page extraction via scrapling.
116
+ - `web_browse` — interactive browser automation via agent-browser.
117
+ - LLM-optimized `promptGuidelines` and `promptSnippet` for every tool.
118
+
119
+ [Unreleased]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.2...HEAD
120
+ [0.2.2]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.1...v0.2.2
121
+ [0.2.1]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.0...v0.2.1
122
+ [0.2.0]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.1.2...v0.2.0
123
+ [0.1.2]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.1.1...v0.1.2
124
+ [0.1.1]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.1.0...v0.1.1
125
+ [0.1.0]: https://github.com/Wade11s/pi-web-toolkit/releases/tag/v0.1.0
package/README.md CHANGED
@@ -1,31 +1,138 @@
1
1
  # pi-web-toolkit
2
2
 
3
3
  [![npm version](https://badge.fury.io/js/pi-web-toolkit.svg)](https://www.npmjs.com/package/pi-web-toolkit)
4
+ [![Pi package](https://img.shields.io/badge/Pi-package-111111.svg)](https://pi.dev/packages/pi-web-toolkit)
4
5
  [![CI](https://github.com/Wade11s/pi-web-toolkit/actions/workflows/ci.yml/badge.svg)](https://github.com/Wade11s/pi-web-toolkit/actions)
5
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
6
7
  ![Node.js](https://img.shields.io/badge/node-%3E%3D22-339933)
7
8
 
8
- **100% open-source. Zero API keys. Zero fees.**
9
+ **100% open-source. No required API keys or paid services.**
9
10
 
10
- Web research toolkit for [pi](https://pi.dev) agents. Search via SearXNG, fetch static pages with scrapling, browse interactively via agent-browser, and batch-read sources in parallel. All self-hosted, all local, all free with built-in truncation safety and LLM-optimized prompt guidelines.
11
+ Web research toolkit for [pi](https://pi.dev) agents. Search via SearXNG, fetch pages with scrapling, browse interactively via agent-browser, and batch-read sources in parallel. All backends run locally or are self-hosted, with built-in truncation safety and LLM-optimized prompt guidelines.
11
12
 
12
13
  ## Features
13
14
 
14
15
  | Tool | Backend | Purpose | Current Limit |
15
16
  |------|---------|---------|---------------|
16
- | **`web_search`** | [SearXNG](https://github.com/searxng/searxng) | Search the web with scored, ranked results from multiple engines — always the first step in web research | 20 results (max 60, auto-pages up to 3 pages) |
17
- | **`web_fetch`** | [scrapling](https://github.com/D4Vinci/Scrapling) | Fetch a single static page as clean markdown | — |
18
- | **`web_batch_fetch`** | [scrapling](https://github.com/D4Vinci/Scrapling) | Fetch 2–15 pages in parallel for research synthesis | 3 concurrent (max 5) |
17
+ | **`web_search`** | [SearXNG](https://github.com/searxng/searxng) | Discover scored, ranked results from multiple engines | 20 results (max 60, auto-pages up to 3 pages) |
18
+ | **`web_fetch`** | [scrapling](https://github.com/D4Vinci/Scrapling) | Fetch a single page as clean markdown | — |
19
+ | **`web_batch_fetch`** | [scrapling](https://github.com/D4Vinci/Scrapling) | Fetch 1–15 pages in parallel for research synthesis (2–5 recommended) | 3 concurrent (max 5) |
19
20
  | **`web_browse`** | [agent-browser](https://github.com/vercel-labs/agent-browser) | Interact with a page (click, scroll, fill) then extract content | 25 actions |
20
21
 
22
+ ## Tools Preview
23
+
24
+ A quick look at how pi renders toolkit calls while an agent searches, fetches, batches, and browses the web.
25
+
26
+ <table>
27
+ <tr>
28
+ <td width="50%"><strong>Multi-tool research flow</strong><br><img src="docs/assets/screenshots/tools-workflow-preview.png" alt="pi-web-toolkit multi-tool research preview"></td>
29
+ <td width="50%"><strong><code>web_search</code> expanded results</strong><br><img src="docs/assets/screenshots/web-search-results-expanded.png" alt="web_search expanded results"></td>
30
+ </tr>
31
+ <tr>
32
+ <td width="50%"><strong><code>web_batch_fetch</code> progress</strong><br><img src="docs/assets/screenshots/web-batch-fetch-progress.png" alt="web_batch_fetch progress"></td>
33
+ <td width="50%"><strong><code>web_batch_fetch</code> results</strong><br><img src="docs/assets/screenshots/web-batch-fetch-results.png" alt="web_batch_fetch results"></td>
34
+ </tr>
35
+ <tr>
36
+ <td width="50%"><strong><code>web_fetch</code> result preview</strong><br><img src="docs/assets/screenshots/web-fetch-summary.png" alt="web_fetch result preview"></td>
37
+ <td width="50%"><strong><code>web_browse</code> headless browser flow</strong><br><img src="docs/assets/screenshots/web-browse-headless.png" alt="web_browse headless browser flow"></td>
38
+ </tr>
39
+ <tr>
40
+ <td colspan="2"><strong>End-to-end research summary</strong><br><img src="docs/assets/screenshots/web-research-workflow.png" alt="end-to-end web research workflow"></td>
41
+ </tr>
42
+ </table>
43
+
44
+ ## Install with Pi Agent
45
+
46
+ Copy and send the prompt below to Pi. It will install this package and its external dependencies for you.
47
+
48
+ ```text
49
+ Install pi-web-toolkit and its external dependencies. Complete and verify every
50
+ step yourself; do not rely on web browsing or external documentation. Inspect
51
+ the machine first and reuse working installations. Ask before using sudo,
52
+ changing shell profiles, overwriting configuration, or modifying existing
53
+ services or containers.
54
+
55
+ 1. Ensure Node.js 22+, npm, Docker, OpenSSL, curl, uv, and Pi are installed, and
56
+ that Docker is running. Install only missing or incompatible prerequisites.
57
+ 2. Configure SearXNG:
58
+ - Test SEARXNG_URL when set, then http://localhost:8080.
59
+ - Verify /search?q=test&format=json returns JSON with a results array.
60
+ - If neither endpoint works, first ensure no existing container or config
61
+ would be overwritten, then create a local-only instance by running:
62
+
63
+ mkdir -p "$HOME/.config/searxng"
64
+ cat > "$HOME/.config/searxng/settings.yml" <<'YAML'
65
+ use_default_settings: true
66
+
67
+ search:
68
+   formats:
69
+     - html
70
+     - json
71
+ YAML
72
+
73
+ docker run -d \
74
+ --name searxng \
75
+ --restart unless-stopped \
76
+ -p 127.0.0.1:8080:8080 \
77
+ -e FORCE_OWNERSHIP=false \
78
+ -e SEARXNG_SECRET="$(openssl rand -hex 32)" \
79
+ -v "$HOME/.config/searxng/settings.yml:/etc/searxng/settings.yml:ro" \
80
+ docker.io/searxng/searxng:latest
81
+
82
+ - Verify the selected endpoint by running:
83
+
84
+ SEARXNG_ENDPOINT="${SEARXNG_URL:-http://localhost:8080}"
85
+ curl -fsS --get "${SEARXNG_ENDPOINT%/}/search" \
86
+ --data-urlencode "q=test" \
87
+ --data "format=json" |
88
+ grep -q '"results"' && echo "SearXNG JSON API ready"
89
+
90
+ - Pi uses http://localhost:8080 by default. Set SEARXNG_URL before starting
91
+ Pi only when using another endpoint.
92
+ 3. Install and verify Scrapling:
93
+ uv tool install "scrapling[all]"
94
+ scrapling install
95
+ scrapling --help
96
+ 4. Install and verify agent-browser:
97
+ npm install -g agent-browser
98
+ agent-browser install
99
+ agent-browser doctor
100
+ On Linux, use agent-browser install --with-deps if required.
101
+ 5. After all dependencies pass verification, install the package:
102
+ pi install npm:pi-web-toolkit
103
+
104
+ Report what was installed or reused, all verification results, the SearXNG
105
+ endpoint Pi will use, and whether Pi must be restarted. Do not report success
106
+ until every check passes.
107
+ ```
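For scripted checks outside Pi, the prompt's SearXNG verification step can be mirrored in TypeScript on Node 22+, which provides a global `fetch`. This is a sketch and is not part of pi-web-toolkit. `searxngSearchUrl`, `hasResultsArray`, and `checkSearxng` are hypothetical helpers. Like `${SEARXNG_ENDPOINT%/}`, the URL builder drops one trailing slash before it appends `/search`. The check also requires an actual `results` array, which is stricter than grepping for the string `"results"`.

```typescript
// Illustrative sketch (Node 22+, global fetch); helper names are hypothetical.
const DEFAULT_SEARXNG_URL = "http://localhost:8080";

// Build <endpoint>/search?q=<query>&format=json, dropping one trailing slash
// from the endpoint the way ${SEARXNG_ENDPOINT%/} does in the shell check.
function searxngSearchUrl(endpoint: string, query: string): string {
  const base = endpoint.endsWith("/") ? endpoint.slice(0, -1) : endpoint;
  const url = new URL(`${base}/search`);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  return url.toString();
}

// True when the parsed body is an object with a `results` array.
function hasResultsArray(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    Array.isArray((body as { results?: unknown }).results)
  );
}

// Resolves to true only if the endpoint answers with JSON containing a results array.
async function checkSearxng(
  endpoint: string = process.env.SEARXNG_URL || DEFAULT_SEARXNG_URL,
): Promise<boolean> {
  try {
    const res = await fetch(searxngSearchUrl(endpoint, "test"), {
      signal: AbortSignal.timeout(10_000),
    });
    // SearXNG typically rejects format=json (HTTP 403) unless `json` is listed under search.formats.
    if (!res.ok) return false;
    return hasResultsArray(await res.json());
  } catch {
    // Connection refused, timeout, or a body that is not valid JSON.
    return false;
  }
}
```

For example, `checkSearxng().then((ok) => console.log(ok ? "SearXNG JSON API ready" : "SearXNG check failed"))`. A `false` result usually means the container is not running or `json` is missing from `search.formats`.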
108
+
21
109
  ## Quick Start
22
110
 
23
111
  ### 1. Install external dependencies
24
112
 
113
+ The commands below assume a POSIX shell with Docker, OpenSSL, curl, uv, and Node.js 22+ with npm.
114
+
25
115
  ```bash
26
- # SearXNG (for search)
27
- docker run -d --name searxng -p 8080:8080 -v searxng:/etc/searxng searxng/searxng
28
- export SEARXNG_URL="http://localhost:8080"
116
+ # SearXNG (for search; local-only instance with the required JSON API)
117
+ mkdir -p "$HOME/.config/searxng"
118
+ cat > "$HOME/.config/searxng/settings.yml" <<'YAML'
119
+ use_default_settings: true
120
+
121
+ search:
122
+ formats:
123
+ - html
124
+ - json
125
+ YAML
126
+
127
+ docker run -d \
128
+ --name searxng \
129
+ --restart unless-stopped \
130
+ -p 127.0.0.1:8080:8080 \
131
+ -e FORCE_OWNERSHIP=false \
132
+ -e SEARXNG_SECRET="$(openssl rand -hex 32)" \
133
+ -v "$HOME/.config/searxng/settings.yml:/etc/searxng/settings.yml:ro" \
134
+ docker.io/searxng/searxng:latest
135
+ export SEARXNG_URL="http://127.0.0.1:8080"
29
136
 
30
137
  # scrapling (for fetch & batch fetch)
31
138
  uv tool install "scrapling[all]"
@@ -33,12 +140,16 @@ scrapling install
33
140
 
34
141
  # agent-browser (for browse)
35
142
  npm i -g agent-browser && agent-browser install
143
+ # On Linux hosts missing browser system libraries: agent-browser install --with-deps
36
144
  ```
37
145
 
38
146
  **Verify dependencies:**
39
147
  ```bash
40
148
  # SearXNG
41
- curl -s "$SEARXNG_URL" | head
149
+ curl -fsS --get "$SEARXNG_URL/search" \
150
+ --data-urlencode "q=searxng" \
151
+ --data "format=json" |
152
+ grep -q '"results"' && echo "SearXNG JSON API ready"
42
153
 
43
154
  # scrapling
44
155
  scrapling --help
@@ -59,7 +170,7 @@ pi install git:github.com/Wade11s/pi-web-toolkit
59
170
 
60
171
  ## Configuration
61
172
 
62
- All tools are configured via **environment variables** at runtime; no rebuild or restart required.
173
+ `web_search` reads its SearXNG endpoint from an environment variable. Set it before starting pi; no build step is required.
63
174
 
64
175
  | Variable | Default | Used By | Description |
65
176
  |----------|---------|---------|-------------|
@@ -78,24 +189,37 @@ pi-web-toolkit/
78
189
  ├── extensions/
79
190
  │ ├── index.ts # Unified entry point — registers all 4 tools
80
191
  │ ├── utils/
192
+ │ │ ├── cli-runner.ts # Unified CLI process spawning with timeout/AbortSignal
193
+ │ │ ├── content-preview.ts # Intelligent content extraction from scraped pages
194
+ │ │ ├── output-sink.ts # Truncation + temp-file fallback
195
+ │ │ ├── render-helpers.ts # URL abbreviations, text normalization, error formatting for TUI
81
196
  │ │ ├── scrapling.ts # Reusable scrapling CLI wrapper (shared by fetch + batch)
197
+ │ │ ├── tool-factory.ts # Common tool registration patterns
82
198
  │ │ └── agent-browser.ts # agent-browser CLI wrapper (shared by web_browse)
83
199
  │ ├── web_search.ts # SearXNG search tool
84
200
  │ ├── web_fetch.ts # Single-page scrapling fetcher
85
201
  │ ├── web_batch_fetch.ts # Parallel scrapling fetcher
86
202
  │ └── web_browse.ts # Interactive browser automation (agent-browser)
203
+ ├── test/
204
+ │ ├── agent-browser/ # agent-browser output parser regression tests
205
+ │ ├── content-preview/ # Content preview fixtures, baselines & snapshots
206
+ │ └── README.md # Test suite structure and conventions
87
207
  ├── docs/
88
208
  │ ├── tools.md # Full parameter specs
89
- └── guide.md # Decision tree & tool comparison
209
+ │ ├── guide.md # Decision tree & tool comparison
210
+ │ └── agents/ # Issue tracker, triage and domain guidance
211
+ ├── AGENTS.md
212
+ ├── CONTEXT.md
90
213
  ├── CHANGELOG.md
91
214
  ├── package.json
92
215
  ├── README.md
216
+ ├── tsconfig.json
93
217
  └── LICENSE
94
218
  ```
95
219
 
96
220
  **Design principles:**
97
221
  - **Unified registration** — `index.ts` is the single source of truth for what pi loads.
98
- - **Shared utilities** — `utils/scrapling.ts` and `utils/agent-browser.ts` encapsulate the CLI wrappers and fallback logic; tool files import only from `utils/`, never from each other.
222
+ - **Shared utilities** — `utils/` modules encapsulate CLI spawning, content extraction, output truncation, TUI formatting, and common registration patterns; tool files import only from `utils/`, never from each other.
99
223
  - **Per-tool isolation** — each tool owns its own schema, execute logic, and TUI renderer; no cross-imports except via `utils/`.
100
224
  - **Runtime config** — environment variables are read at execute time, not build time.
101
225
 
@@ -112,7 +236,10 @@ pi-web-toolkit/
112
236
  pi install ./
113
237
 
114
238
  # Type-check (no build step; pi loads TypeScript directly)
115
- npx tsc --noEmit
239
+ npm run typecheck
240
+
241
+ # Run tests
242
+ npm run test
116
243
 
117
244
  # Verify external CLI dependencies
118
245
  scrapling --help
@@ -5,7 +5,7 @@ Issues and PRDs for this repo live as GitHub issues. Use the `gh` CLI for all op
5
5
  ## Conventions
6
6
 
7
7
  - **Create an issue**: `gh issue create --title "..." --body "..."`. Use a heredoc for multi-line bodies.
8
- - **Read an issue**: `gh issue view <number> --comments`, filtering comments by `jq` and also fetching labels.
8
+ - **Read an issue**: `gh issue view <number> --json number,title,body,labels,comments --jq '{number, title, body, labels: [.labels[].name], comments: [.comments[].body]}'`.
9
9
  - **List issues**: `gh issue list --state open --json number,title,body,labels,comments --jq '[.[] | {number, title, body, labels: [.labels[].name], comments: [.comments[].body]}]'` with appropriate `--label` and `--state` filters.
10
10
  - **Comment on an issue**: `gh issue comment <number> --body "..."`
11
11
  - **Apply / remove labels**: `gh issue edit <number> --add-label "..."` / `--remove-label "..."`
@@ -19,4 +19,4 @@ Create a GitHub issue.
19
19
 
20
20
  ## When a skill says "fetch the relevant ticket"
21
21
 
22
- Run `gh issue view <number> --comments`.
22
+ Run `gh issue view <number> --json number,title,body,labels,comments --jq '{number, title, body, labels: [.labels[].name], comments: [.comments[].body]}'`.
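The `--jq` filter keeps only the issue number, title, body, label names, and comment bodies. A script that post-processes raw `gh issue view <number> --json number,title,body,labels,comments` output can apply the same projection in TypeScript, as sketched below. `GhIssue` declares only the fields the filter reads, which is an assumption about `gh`'s JSON and not its full schema. `summarizeIssue` and `summarizeGhOutput` are hypothetical names.

```typescript
// Only the fields the jq filter reads; gh returns more (ids, colors, authors, ...).
interface GhIssue {
  number: number;
  title: string;
  body: string;
  labels: { name: string }[];
  comments: { body: string }[];
}

interface IssueSummary {
  number: number;
  title: string;
  body: string;
  labels: string[];
  comments: string[];
}

// Same shape as: {number, title, body, labels: [.labels[].name], comments: [.comments[].body]}
function summarizeIssue(issue: GhIssue): IssueSummary {
  return {
    number: issue.number,
    title: issue.title,
    body: issue.body,
    labels: issue.labels.map((label) => label.name),
    comments: issue.comments.map((comment) => comment.body),
  };
}

// Parse raw stdout from `gh issue view <number> --json number,title,body,labels,comments`.
function summarizeGhOutput(stdout: string): IssueSummary {
  return summarizeIssue(JSON.parse(stdout) as GhIssue);
}
```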
package/docs/guide.md CHANGED
@@ -8,19 +8,19 @@ User asks about something external / current
8
8
  ├─→ web_search("...")
9
9
  │ │
10
10
  │ ├─→ 1 relevant result?
11
- │ │ └─→ web_fetch(url) ← static page
11
+ │ │ └─→ web_fetch(url) ← no interaction needed
12
12
  │ │ OR
13
13
  │ │ └─→ web_browse(url, actions) ← needs interaction
14
14
  │ │
15
15
  │ └─→ 2–5 relevant results?
16
- │ ├─→ All static pages?
16
+ │ ├─→ All need no interaction?
17
17
  │ │ └─→ web_batch_fetch(urls[]) ← parallel fetch
18
18
  │ └─→ Some need interaction?
19
- │ └─→ web_fetch (static ones)
19
+ │ └─→ web_fetch (no-interaction ones)
20
20
  │ web_browse (interactive ones) ← sequential
21
21
 
22
22
  └─→ User provides a URL directly
23
- ├─→ Static / loads on first request?
23
+ ├─→ No interaction needed / loads on first request?
24
24
  │ └─→ web_fetch(url)
25
25
  └─→ Needs clicking / scrolling / waiting?
26
26
  └─→ web_browse(url, actions)
@@ -32,10 +32,12 @@ User asks about something external / current
32
32
 
33
33
  | | `web_fetch` | `web_browse` | `web_batch_fetch` |
34
34
  |--|-------------|--------------|-------------------|
35
- | **Pages** | 1 | 1 | 2–15 |
36
- | **Browser** | Yes (scrapling) | Yes (agent-browser) | Yes (scrapling) |
35
+ | **Pages** | 1 | 1 | 1–15 (2–5 recommended) |
36
+ | **Browser** | Yes (Scrapling) | Yes (agent-browser) | Yes (Scrapling) |
37
37
  | **Interaction** | ❌ No | ✅ Click, fill, scroll, wait | ❌ No |
38
38
  | **Selector** | ✅ Per-URL | ✅ Final state | ✅ Applied to all |
39
- | **Stealthy** | ✅ Yes | ❌ No (planned) | ✅ Yes |
39
+ | **Stealthy** | ✅ Yes | ❌ No | ✅ Yes |
40
40
  | **Speed** | Fast | Slower (browser ops) | Medium (parallel) |
41
41
  | **Best for** | Articles, docs, blogs | SPAs, forms, pagination | Research synthesis |
42
+
43
+ `web_fetch` falls back to HTTP GET after a normal browser fetch fails, but not in stealthy mode. `web_batch_fetch` falls back to GET after failed browser fetches in all modes.
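As an illustration, the decision tree and the fallback note can be encoded as two small functions. `planReads`, `Candidate`, `PlannedCall`, and `fallsBackToGet` are hypothetical names. The agent still has to decide whether each page `needsInteraction`, and the sketch does not enforce the 15-URL batch limit.

```typescript
// Hypothetical planner that encodes the decision tree above; not part of the package.
type ToolName = "web_fetch" | "web_browse" | "web_batch_fetch";

interface Candidate {
  url: string;
  needsInteraction: boolean; // clicking, scrolling, filling, or waiting before content appears
}

interface PlannedCall {
  tool: ToolName;
  urls: string[];
}

function planReads(candidates: Candidate[]): PlannedCall[] {
  const plain = candidates.filter((c) => !c.needsInteraction);
  const interactive = candidates.filter((c) => c.needsInteraction);

  // Several relevant pages, none needing interaction: one parallel batch
  // (the schema allows up to 15 URLs; 2–5 is the recommended range).
  if (plain.length >= 2 && interactive.length === 0) {
    return [{ tool: "web_batch_fetch", urls: plain.map((c) => c.url) }];
  }

  // A single page, or a mix: web_fetch for the no-interaction pages,
  // then web_browse for each interactive page, one at a time.
  return [
    ...plain.map((c): PlannedCall => ({ tool: "web_fetch", urls: [c.url] })),
    ...interactive.map((c): PlannedCall => ({ tool: "web_browse", urls: [c.url] })),
  ];
}

// Fallback note: web_fetch skips the HTTP GET fallback in stealthy mode;
// web_batch_fetch falls back to GET in every mode.
function fallsBackToGet(tool: "web_fetch" | "web_batch_fetch", stealthy: boolean): boolean {
  return tool === "web_batch_fetch" || !stealthy;
}
```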
package/docs/tools.md CHANGED
@@ -7,22 +7,24 @@ Search the web via SearXNG. Returns ranked results with title, URL, and snippet.
7
7
  ```typescript
8
8
  {
9
9
  query: string, // Search query
10
- language?: string, // Language code (en, de, fr...). Default: "auto"
10
+ language?: string, // Language code (en, en-US, de...). Omit to use the SearXNG default.
11
11
  results?: number, // Max results (1–60). Default: 20. Automatically pages through SearXNG (up to 3 pages) if needed.
12
12
  }
13
13
  ```
14
14
 
15
- **When to use:** The user asks about current events, facts, or anything requiring up-to-date information. This is always the **first step** of web research.
15
+ **When to use:** The user asks about current events, facts, or anything requiring up-to-date information and has not already provided the source URLs.
16
16
 
17
- **Empty results behavior:** When no results are found, `web_search` returns a list of **suggestions**: alternative queries that SearXNG believes may yield better results. The agent can use these suggestions to automatically refine and retry the search.
17
+ **Empty results behavior:** When no results are found, `web_search` includes any query **suggestions** provided by SearXNG. The agent can use them to refine and retry the search.
18
18
 
19
19
  **Pagination:** `web_search` automatically fetches up to 3 pages from SearXNG and deduplicates by URL. You do not need to call it multiple times for deeper results.
20
20
 
21
+ **Full output:** For non-empty searches, the formatted result output is always written to a temporary file. Returned text is also truncated to pi's default line/byte limits when necessary.
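The sketch below illustrates the pagination behavior (up to 3 SearXNG pages, deduplicated by URL) and the `language` default (the parameter is omitted when unset). `q`, `format`, `pageno`, and `language` are SearXNG search API parameters. `SearxResult`, `buildSearchParams`, and `mergePages` are hypothetical, and the tool's actual request and merge logic may differ.

```typescript
// Sketch only: the result shape and helper names are assumptions.
interface SearxResult {
  url: string;
  title: string;
  content?: string;
}

const MAX_PAGES = 3;

// Query string for one SearXNG page. An empty language means "use the
// SearXNG default", so the parameter is left out entirely.
function buildSearchParams(query: string, page: number, language = ""): URLSearchParams {
  const params = new URLSearchParams({ q: query, format: "json", pageno: String(page) });
  if (language) params.set("language", language);
  return params;
}

// Merge up to MAX_PAGES pages in rank order, keeping the first occurrence
// of each URL and stopping once `limit` unique results are collected.
function mergePages(pages: SearxResult[][], limit: number): SearxResult[] {
  const seen = new Set<string>();
  const merged: SearxResult[] = [];
  for (const page of pages.slice(0, MAX_PAGES)) {
    for (const result of page) {
      if (seen.has(result.url)) continue;
      seen.add(result.url);
      merged.push(result);
      if (merged.length >= limit) return merged;
    }
  }
  return merged;
}
```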
22
+
21
23
  ---
22
24
 
23
25
  ## `web_fetch`
24
26
 
25
- Fetch a single page and convert it to clean markdown. Uses scrapling's browser automation for JS-heavy sites.
27
+ Fetch a single page and convert it to clean markdown. Uses Scrapling's browser-based `fetch` command first, then falls back to an HTTP GET when allowed. Stealthy mode uses `stealthy-fetch` and does not fall back to GET.
26
28
 
27
29
  ```typescript
28
30
  {
@@ -39,9 +41,9 @@ Fetch a single page and convert it to clean markdown. Uses scrapling's browser a
39
41
 
40
42
  **Example flow:**
41
43
  ```
42
- User: "What's the latest Rust release?"
43
- → web_search("latest Rust programming language release")
44
- → web_fetch("https://blog.rust-lang.org/2026/06/02/maintainers-fund/")
44
+ User: "How do I install Rust?"
45
+ → web_search("official Rust installation guide")
46
+ → web_fetch("https://www.rust-lang.org/tools/install")
45
47
  → Agent answers with full context
46
48
  ```
47
49
 
@@ -51,7 +53,7 @@ User: "What's the latest Rust release?"
51
53
 
52
54
  Open a real browser, perform a chain of actions (click, fill, scroll, wait), then extract content.
53
55
 
54
- Uses the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI for native browser automation via Chrome CDP.
56
+ Uses the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with batched JSON commands.
55
57
 
56
58
  ```typescript
57
59
  {
@@ -65,12 +67,15 @@ Uses the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI for n
65
67
  | { type: "wait_selector", selector: string, state?: "attached" | "visible" | "hidden" }
66
68
  | { type: "scroll", direction: "down" | "up" | "bottom" | "top", amount?: number }
67
69
  >,
70
+ // Maximum: 25 actions
68
71
  selector?: string, // Extract content from final page state
69
72
  headless?: boolean, // Default: true
70
73
  timeout?: number, // Overall browser batch timeout (ms). Default: 30000
71
74
  }
72
75
  ```
73
76
 
77
+ When `selector` is omitted, the tool returns agent-browser's interactive accessibility snapshot rather than full page text.
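The following sketch is a hypothetical pre-flight check for `web_browse` arguments. It enforces the 25-action maximum and rejects empty `wait_selector` selectors, which mirrors the non-empty check in the agent-browser wrapper's `requireString`. It also flags a missing `selector`, because the result would then be the accessibility snapshot. Only the `wait_selector` and `scroll` variants visible above are modeled. The `url` and `actions` field names are assumptions taken from the guide's `web_browse(url, actions)` notation.

```typescript
// Only the two action variants visible in the schema excerpt; the real union has more.
type BrowseAction =
  | { type: "wait_selector"; selector: string; state?: "attached" | "visible" | "hidden" }
  | { type: "scroll"; direction: "down" | "up" | "bottom" | "top"; amount?: number };

// Assumed parameter shape; `url` and `actions` are inferred, not copied from the schema.
interface BrowseParams {
  url: string;
  actions: BrowseAction[];
  selector?: string;
  headless?: boolean; // default: true
  timeout?: number; // overall batch timeout in ms, default: 30000
}

const MAX_BROWSE_ACTIONS = 25;

// Returns human-readable problems; an empty array means the call looks well-formed.
function lintBrowseParams(params: BrowseParams): string[] {
  const problems: string[] = [];
  if (params.actions.length > MAX_BROWSE_ACTIONS) {
    problems.push(`too many actions: ${params.actions.length} (maximum ${MAX_BROWSE_ACTIONS})`);
  }
  params.actions.forEach((action, i) => {
    if (action.type === "wait_selector" && action.selector.length === 0) {
      problems.push(`action ${i}: wait_selector needs a non-empty selector`);
    }
  });
  if (!params.selector) {
    // Without a selector the result is an accessibility snapshot, not page text.
    problems.push("no selector: web_browse will return the accessibility snapshot");
  }
  return problems;
}
```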
78
+
74
79
  **When to use:**
75
80
  - The page requires **clicking** before showing target content (e.g. "Load more", pagination, tab switching)
76
81
  - The page requires **filling a form** (e.g. search box, login)
@@ -123,7 +128,7 @@ Fetch multiple pages in parallel and return aggregated content.
123
128
 
124
129
  ```typescript
125
130
  {
126
- urls: string[], // 1–10 URLs
131
+ urls: string[], // 1–15 URLs; 2–5 recommended
127
132
  selector?: string, // CSS selector applied to ALL pages
128
133
  stealthy?: boolean, // Default: false
129
134
  max_concurrency?: number // Parallel fetches (1–5). Default: 3
@@ -136,7 +141,9 @@ Fetch multiple pages in parallel and return aggregated content.
136
141
  - Comparing implementations across different docs/pages
137
142
  - Research synthesis requiring multiple sources
138
143
 
139
- **NOT for:** Single pages (use `web_fetch`, which is simpler and supports per-URL stealthy mode).
144
+ **NOT recommended for:** Single pages. The schema accepts one URL, but `web_fetch` is the simpler tool for reading one page.
145
+
146
+ Each page starts with the selected Scrapling browser fetcher. Failed attempts fall back to HTTP GET, including when batch stealthy mode is enabled.
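The concurrency cap (`max_concurrency`, 1–5, default 3) and the GET fallback described above combine into a familiar worker-pool pattern. The sketch injects both fetchers so it runs without Scrapling. `batchFetch`, `fetchWithFallback`, `Fetcher`, and `PageResult` are hypothetical names. Clamping out-of-range concurrency is also an assumption; the real tool may reject such values instead.

```typescript
// Hypothetical fetcher signature; real ones would wrap Scrapling CLI calls.
type Fetcher = (url: string) => Promise<string>;

interface PageResult {
  url: string;
  ok: boolean;
  content?: string;
  error?: string;
  via?: "browser" | "get";
}

// Browser fetch first; on failure retry once with HTTP GET (batch mode does
// this even when the browser fetch was stealthy).
async function fetchWithFallback(url: string, browserFetch: Fetcher, getFetch: Fetcher): Promise<PageResult> {
  try {
    return { url, ok: true, content: await browserFetch(url), via: "browser" };
  } catch {
    try {
      return { url, ok: true, content: await getFetch(url), via: "get" };
    } catch (err) {
      return { url, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}

async function batchFetch(
  urls: string[],
  browserFetch: Fetcher,
  getFetch: Fetcher,
  maxConcurrency = 3,
): Promise<PageResult[]> {
  const limit = Math.min(Math.max(maxConcurrency, 1), 5); // clamp to the documented 1–5 range
  const results = new Array<PageResult>(urls.length);
  let next = 0;
  // Each worker claims the next unprocessed index, so at most `limit` pages are in flight.
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++;
      results[i] = await fetchWithFallback(urls[i], browserFetch, getFetch);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, urls.length) }, () => worker()));
  return results;
}
```

Results keep input order because each worker writes to the index it claimed, even though pages finish out of order.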
140
147
 
141
148
  **Example flow:**
142
149
  ```
@@ -3,7 +3,7 @@
3
3
  *
4
4
  * Registers all web research tools as a single extension:
5
5
  * - web_search: Search via SearXNG
6
- * - web_fetch: Fetch static pages with scrapling
6
+ * - web_fetch: Fetch a single page with scrapling
7
7
  * - web_browse: Interactive browser automation via agent-browser
8
8
  * - web_batch_fetch: Concurrent multi-page fetching
9
9
  */
@@ -25,6 +25,48 @@ export interface AgentBrowserBatchItem {
25
25
  error?: string | null;
26
26
  }
27
27
 
28
+ function isRecord(value: unknown): value is Record<string, unknown> {
29
+ return typeof value === "object" && value !== null && !Array.isArray(value);
30
+ }
31
+
32
+ function isBatchItem(value: unknown): value is AgentBrowserBatchItem {
33
+ return isRecord(value)
34
+ && typeof value.success === "boolean"
35
+ && Array.isArray(value.command)
36
+ && value.command.every((part) => typeof part === "string");
37
+ }
38
+
39
+ function describeBatchOutput(value: unknown): string {
40
+ if (Array.isArray(value)) return `array with ${value.length} item(s)`;
41
+ if (isRecord(value)) return `object with keys: ${Object.keys(value).join(", ") || "(none)"}`;
42
+ return typeof value;
43
+ }
44
+
45
+ export function parseAgentBrowserBatchOutput(stdout: string): AgentBrowserBatchItem[] {
46
+ const parsed = JSON.parse(stdout) as unknown;
47
+
48
+ if (Array.isArray(parsed)) {
49
+ if (parsed.every(isBatchItem)) return parsed;
50
+ throw new Error(`Expected every batch result item to contain { success, command }; got ${describeBatchOutput(parsed)}`);
51
+ }
52
+
53
+ if (isBatchItem(parsed)) {
54
+ return [parsed];
55
+ }
56
+
57
+ if (isRecord(parsed)) {
58
+ for (const key of ["results", "items", "data", "commands"]) {
59
+ const candidate = parsed[key];
60
+ if (Array.isArray(candidate)) {
61
+ if (candidate.every(isBatchItem)) return candidate;
62
+ throw new Error(`Expected ${key} to contain batch result items; got ${describeBatchOutput(candidate)}`);
63
+ }
64
+ }
65
+ }
66
+
67
+ throw new Error(`Expected JSON array of batch results; got ${describeBatchOutput(parsed)}`);
68
+ }
69
+
28
70
  function requireString(action: BrowseAction, field: "selector" | "value" | "key"): string {
29
71
  const value = action[field] as string | undefined;
30
72
  if (typeof value !== "string" || value.length === 0) {
@@ -150,7 +192,7 @@ export async function runAgentBrowserBatch(
150
192
  }
151
193
 
152
194
  try {
153
- return JSON.parse(result.stdout) as AgentBrowserBatchItem[];
195
+ return parseAgentBrowserBatchOutput(result.stdout);
154
196
  } catch (err: any) {
155
197
  throw new Error(
156
198
  `Failed to parse agent-browser output: ${err.message}\nstdout: ${result.stdout}\nstderr: ${result.stderr}`
@@ -3,7 +3,7 @@
3
3
  *
4
4
  * Provides a single interface for running external CLI commands
5
5
  * with consistent signal handling, timeout support, and stdout/stderr
6
- * collection. Enables testability by allowing the runner to be swapped.
6
+ * collection.
7
7
  */
8
8
 
9
9
  import { spawn, type ChildProcess } from "node:child_process";
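The header describes a single runner for external CLIs with timeout support, AbortSignal handling, and stdout/stderr collection. The sketch below shows that pattern. It is not this module's code, and `runCli`, `CliOptions`, and `CliResult` are hypothetical. It relies on `spawn`'s real `signal` option, which kills the child and emits an `AbortError`. It uses its own timer instead of `spawn`'s `timeout` option so that the result can report `timedOut`.

```typescript
import { spawn } from "node:child_process";

interface CliOptions {
  timeoutMs?: number;
  signal?: AbortSignal;
}

interface CliResult {
  stdout: string;
  stderr: string;
  exitCode: number | null; // null when the process was killed by a signal
  timedOut: boolean;
}

function runCli(command: string, args: string[], options: CliOptions = {}): Promise<CliResult> {
  return new Promise((resolve, reject) => {
    // `signal` lets the caller cancel: Node kills the child and emits an AbortError.
    const child = spawn(command, args, { signal: options.signal });
    child.stdin.end(); // this sketch sends no input
    let stdout = "";
    let stderr = "";
    let timedOut = false;
    const timer =
      options.timeoutMs === undefined
        ? undefined
        : setTimeout(() => {
            timedOut = true;
            child.kill("SIGTERM");
          }, options.timeoutMs);

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err); // spawn failure (e.g. ENOENT) or abort
    });
    child.on("close", (exitCode: number | null) => {
      clearTimeout(timer);
      resolve({ stdout, stderr, exitCode, timedOut });
    });
  });
}
```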
@@ -109,8 +109,7 @@ const webBatchFetchTool = defineTool({
109
109
  label: "Web Batch Fetch",
110
110
  description: [
111
111
  "Fetch multiple web pages in parallel and return their content aggregated.",
112
- "Use web_batch_fetch AFTER web_search when there are 2–5 relevant results",
113
- "that the agent wants to read simultaneously for comparison or synthesis.",
112
+ "Use web_batch_fetch for 2–5 relevant URLs, whether discovered by search or provided by the user.",
114
113
  "For a single page, use web_fetch instead.",
115
114
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
116
115
  ].join(" "),
@@ -8,7 +8,7 @@
8
8
  * filling forms, waiting for dynamic content) BEFORE its target content
9
9
  * becomes available.
10
10
  *
11
- * For static pages that need no interaction, use `web_fetch` instead.
11
+ * For pages that need no interaction, use `web_fetch` instead.
12
12
  */
13
13
 
14
14
  import {
@@ -88,10 +88,10 @@ const webBrowseTool = defineTool({
88
88
  description: [
89
89
  "Interact with a web page through a browser: navigate, click, fill forms, scroll,",
90
90
  "wait for content, and then extract text.",
91
- "Uses the agent-browser CLI for fast, native browser automation via Chrome CDP.",
91
+ "Uses the agent-browser CLI with batched JSON commands.",
92
92
  "Use web_browse when the target content requires interaction (clicking buttons,",
93
93
  "scrolling, filling search boxes, waiting for JS to load) before it becomes available.",
94
- "For static pages that need no interaction, use web_fetch instead.",
94
+ "For pages that need no interaction, use web_fetch instead.",
95
95
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
96
96
  ].join(" "),
97
97
  promptSnippet: "Interact with a web page (click, scroll, fill) and extract content",
@@ -100,7 +100,7 @@ const webBrowseTool = defineTool({
100
100
  "Use web_browse for SPAs, pagination (click 'Load more'), search forms, tab switching, and modal dialogs.",
101
101
  "For static articles, docs, or blogs that load everything on first request, prefer web_fetch.",
102
102
  "After web_search returns results, prefer web_fetch for reading individual articles.",
103
- "Only use web_browse if web_fetch fails to get the needed content.",
103
+ "Use web_browse directly when interaction is required; otherwise try web_fetch first.",
104
104
  "Always provide a selector to extract only the relevant content area — avoid dumping full page text.",
105
105
  ],
106
106
  parameters: WebBrowseParamsSchema,
@@ -41,13 +41,13 @@ const webFetchTool = defineTool({
41
41
  description: [
42
42
  "Fetch and extract readable content from a web page URL.",
43
43
  "Uses scrapling to download the page and convert it to clean markdown.",
44
- "Use web_fetch AFTER web_search to read the full content of a result page.",
45
- "Respects robots.txt and site ToS.",
44
+ "Use web_fetch to read the full content of a specific result or user-provided URL.",
45
+ "Callers remain responsible for robots.txt and site terms; Scrapling extract commands do not enforce them automatically.",
46
46
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
47
47
  ].join(" "),
48
48
  promptSnippet: "Fetch full page content from a URL as markdown",
49
49
  promptGuidelines: [
50
- "Use web_fetch to read a single static page (article, doc, or blog) when given a specific URL.",
50
+ "Use web_fetch to read a single page (article, doc, or blog) that needs no interaction.",
51
51
  "For a single URL, always use web_fetch instead of web_batch_fetch.",
52
52
  "If the page is dynamic/JavaScript-heavy, the tool automatically uses browser automation.",
53
53
  "When reading multiple (2–5) pages at once (e.g., after web_search), prefer web_batch_fetch over repeated web_fetch calls.",
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "pi-web-toolkit",
3
- "version": "0.2.0",
4
- "description": "Web research toolkit for the pi coding agent. Search via SearXNG, fetch static pages with scrapling, browse interactively via agent-browser, and batch-read sources in parallel.",
3
+ "version": "0.2.2",
4
+ "description": "Web research toolkit for the pi coding agent. Search via SearXNG, fetch pages with scrapling, browse interactively via agent-browser, and batch-read sources in parallel.",
5
5
  "author": "Wade Huang <fastwade11@gmail.com>",
6
6
  "license": "MIT",
7
7
  "repository": {
@@ -13,16 +13,18 @@
13
13
  },
14
14
  "homepage": "https://github.com/Wade11s/pi-web-toolkit#readme",
15
15
  "keywords": ["pi-package", "pi-extension", "web-search", "scrapling", "agent-browser"],
16
- "files": ["extensions", "docs", "README.md", "package.json", "LICENSE"],
16
+ "files": ["extensions", "docs", "README.md", "CHANGELOG.md", "package.json", "LICENSE"],
17
17
  "engines": {
18
18
  "node": ">=22.0.0"
19
19
  },
20
20
  "scripts": {
21
21
  "typecheck": "tsc --noEmit",
22
- "test": "npx tsx test/content-preview/test.ts",
23
- "test:approve": "npx tsx test/content-preview/test.ts --approve"
22
+ "test": "tsx test/content-preview/test.ts && tsx test/agent-browser/test.ts",
23
+ "test:agent-browser": "tsx test/agent-browser/test.ts",
24
+ "test:approve": "tsx test/content-preview/test.ts --approve"
24
25
  },
25
26
  "devDependencies": {
27
+ "tsx": "^4.22.4",
26
28
  "typescript": "^5.7.0"
27
29
  },
28
30
  "peerDependencies": {