pi-web-toolkit 0.3.1 → 0.3.2
- package/CHANGELOG.md +16 -1
- package/README.md +2 -2
- package/docs/guide.md +2 -2
- package/docs/tools.md +10 -7
- package/extensions/firecrawl_interact.ts +13 -14
- package/extensions/firecrawl_scrape.ts +13 -14
- package/extensions/firecrawl_search.ts +6 -6
- package/extensions/index.ts +25 -0
- package/extensions/web_batch_fetch.ts +3 -7
- package/extensions/web_browse.ts +5 -9
- package/extensions/web_fetch.ts +5 -9
- package/extensions/web_search.ts +5 -6
- package/package.json +2 -2
package/CHANGELOG.md
CHANGED
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.3.2] - 2026-06-25
+
+### Fixed
+
+- Kept the agent's web-tool selection local-first: ordinary URL reads now prefer `web_fetch`, discovery prefers `web_search`, and interaction prefers `web_browse`; `firecrawl_*` tools are documented and prompted as fallback-only unless explicitly requested.
+- Fixed `firecrawl_scrape` and `firecrawl_interact` partial-result rendering type-check errors caused by reading `details` before declaration.
+
+### Changed
+
+- Reduced web-tool prompt metadata overhead by consolidating shared routing rules and shortening per-tool `promptSnippet`/`promptGuidelines` text.
+- Added a tool-routing prompt regression test and included it in `npm test`.
+
 ## [0.3.1] - 2026-06-23

 ### Changed
@@ -145,7 +157,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `web_browse` — interactive browser automation via agent-browser.
 - LLM-optimized `promptGuidelines` and `promptSnippet` for every tool.

-[Unreleased]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.
+[Unreleased]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.3.2...HEAD
+[0.3.2]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.3.1...v0.3.2
+[0.3.1]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.3.0...v0.3.1
+[0.3.0]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.2...v0.3.0
 [0.2.2]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.1...v0.2.2
 [0.2.1]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.0...v0.2.1
 [0.2.0]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.1.2...v0.2.0
package/README.md
CHANGED
@@ -22,7 +22,7 @@ Web research toolkit for [pi](https://pi.dev) agents. Search via SearXNG, fetch
 | **`firecrawl_scrape`** | [firecrawl-cli](https://github.com/firecrawl/cli) (keyless) | Cloud single-page fetch (anti-bot / JS / PDF) | — |
 | **`firecrawl_interact`** | [firecrawl-cli](https://github.com/firecrawl/cli) (keyless) | Cloud natural-language page interaction | — |

-> **Firecrawl fallback.** `web_search`, `web_fetch`, and `web_browse` automatically retry through Firecrawl Keyless (1,000 free credits/month, no API key) when their local backend errors out or search returns nothing. The three `firecrawl_*` tools are
+> **Firecrawl fallback.** `web_search`, `web_fetch`, and `web_browse` are the local-first primary tools and automatically retry through Firecrawl Keyless (1,000 free credits/month, no API key) only when their local backend errors out or search returns nothing. The three `firecrawl_*` tools are fallback-only escape hatches; agents are instructed not to call them first unless you explicitly ask for Firecrawl/cloud behavior or a local-first tool already failed. Disable fallback use with `PI_WEB_FIRECRAWL_FALLBACK=0`. Install the optional CLI: `npm install -g firecrawl-cli`.

 ## Tools Preview

@@ -198,7 +198,7 @@ export PI_WEB_FIRECRAWL_FALLBACK=0

 ### Optional: Firecrawl keyless fallback

-When a local backend (`web_search`/`web_fetch`/`web_browse`) fails or returns nothing, the tools automatically retry through [Firecrawl Keyless](https://www.firecrawl.dev/blog/firecrawl-keyless-launch) — 1,000 free credits/month, **no API key, no signup**. The `firecrawl_*` tools are explicit escape hatches for capabilities the local backends lack (search categories, cloud rendering, natural-language interaction).
+When a local backend (`web_search`/`web_fetch`/`web_browse`) fails or returns nothing, the tools automatically retry through [Firecrawl Keyless](https://www.firecrawl.dev/blog/firecrawl-keyless-launch) — 1,000 free credits/month, **no API key, no signup**. The `firecrawl_*` tools are fallback-only explicit escape hatches for capabilities the local backends lack (search categories, cloud rendering, natural-language interaction). Agents should use `web_fetch`/`web_search`/`web_browse` first unless you explicitly request Firecrawl/cloud behavior.

 Install the optional CLI (the fallback degrades gracefully if it is absent):

package/docs/guide.md
CHANGED
@@ -46,7 +46,7 @@ User asks about something external / current

 ## Firecrawl Keyless fallback

-When a local backend cannot do the job, the tools automatically retry through **Firecrawl Keyless** (1,000 free credits/month, no API key, no signup) before giving up. It is **fallback-only** — never the primary path — and is **opt-out-able** with `PI_WEB_FIRECRAWL_FALLBACK=0`. Requires the optional `firecrawl-cli` (`npm install -g firecrawl-cli`); if it is absent the tools simply surface the original local error.
+When a local backend cannot do the job, the tools automatically retry through **Firecrawl Keyless** (1,000 free credits/month, no API key, no signup) before giving up. It is **fallback-only** — never the primary path — and is **opt-out-able** with `PI_WEB_FIRECRAWL_FALLBACK=0`. Requires the optional `firecrawl-cli` (`npm install -g firecrawl-cli`); if it is absent the tools simply surface the original local error. Agents should call `web_search`/`web_fetch`/`web_browse` first and call `firecrawl_*` directly only after the corresponding local-first tool failed, or when the user explicitly asks for Firecrawl/cloud behavior.

 | Tool | Falls back to Firecrawl when… |
 |------|-------------------------------|
@@ -55,7 +55,7 @@ When a local backend cannot do the job, the tools automatically retry through **
 | `web_browse` | agent-browser is missing or its batch fails (not on caller validation errors) |
 | `web_batch_fetch` | (no fallback — Firecrawl batch scrape is not keyless) |

-The three `firecrawl_*` tools are
+The three `firecrawl_*` tools are fallback-only explicit escape hatches for capabilities the local backends lack (`github`/`research`/`pdf` search categories, cloud rendering, natural-language interaction). They are not the first step for ordinary URL reading; `web_fetch` already performs Firecrawl fallback internally when local fetching fails.

 **Graceful skip.** If the fallback itself cannot help — the CLI is missing, the IP is flagged as suspicious, the keyless quota is exhausted, or the fallback is disabled — the tool falls through to the original local-tool error so the user is never left worse off.

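The fallback and graceful-skip behavior described in the guide.md hunks above can be sketched as follows. This is a minimal illustration, not the toolkit's implementation: `Fetcher`, `fallbackEnabled`, and `fetchLocalFirst` are hypothetical names, the fetchers are synchronous only to keep the example short, and treating the literal value `0` as the sole "disabled" setting is an assumption drawn from the documented `PI_WEB_FIRECRAWL_FALLBACK=0`.

```typescript
// Hypothetical fetcher type; the real tools are not shown in this diff.
type Fetcher = (url: string) => string;

// Assumption: only the literal value "0" disables the fallback.
function fallbackEnabled(env: Record<string, string | undefined>): boolean {
  return env["PI_WEB_FIRECRAWL_FALLBACK"] !== "0";
}

// Try the local backend first; use the cloud fallback only after it fails.
function fetchLocalFirst(
  url: string,
  local: Fetcher,
  fallback: Fetcher | undefined,
  env: Record<string, string | undefined>,
): string {
  try {
    return local(url);
  } catch (localError) {
    // Fallback missing (e.g. CLI not installed) or disabled: surface the local error.
    if (fallback === undefined || !fallbackEnabled(env)) throw localError;
    try {
      return fallback(url);
    } catch (_fallbackError) {
      // Graceful skip: the fallback could not help, so report the original error.
      throw localError;
    }
  }
}
```

In this sketch the caller always sees either a successful result or the local backend's own error, which matches the guide's statement that the user is never left worse off.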
package/docs/tools.md
CHANGED
@@ -12,7 +12,7 @@ Search the web via SearXNG. Returns ranked results with title, URL, and snippet.
 }
 ```

-**When to use:** The user asks about current events, facts, or anything requiring up-to-date information and has not already provided the source URLs.
+**When to use:** The user asks about current events, facts, or anything requiring up-to-date information and has not already provided the source URLs. Use `web_search` before `firecrawl_search`; `web_search` already performs Firecrawl fallback internally when SearXNG fails or returns nothing.

 **Empty results behavior:** When no results are found, `web_search` includes any query **suggestions** provided by SearXNG. The agent can use them to refine and retry the search.

@@ -35,9 +35,10 @@ Fetch a single page and convert it to clean markdown. Uses Scrapling's browser-b
 ```

 **When to use:**
--
+- As the first attempt for a user-provided URL or after `web_search` finds a relevant result
 - The page is static or loads its content on first request
 - You need to read **one** article, doc, or blog post
+- Before `firecrawl_scrape`; `web_fetch` already performs Firecrawl fallback internally when the local fetcher fails

 **Example flow:**
 ```
@@ -77,10 +78,12 @@ Uses the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with
 When `selector` is omitted, the tool returns agent-browser's interactive accessibility snapshot rather than full page text.

 **When to use:**
+- As the first attempt when the page requires interaction
 - The page requires **clicking** before showing target content (e.g. "Load more", pagination, tab switching)
 - The page requires **filling a form** (e.g. search box, login)
 - The page requires **scrolling** to load lazy content (infinite scroll)
 - The page requires **waiting** for JS to render content (SPA)
+- Before `firecrawl_interact`; `web_browse` already performs Firecrawl fallback internally when local browser automation fails

 **Example flows:**

@@ -163,11 +166,11 @@ User: "Compare Python asyncio, Trio, and curio"

 ---

-## Firecrawl keyless tools (optional cloud escape hatches)
+## Firecrawl keyless tools (optional fallback-only cloud escape hatches)

 These three tools talk to [Firecrawl](https://www.firecrawl.dev) in **keyless** mode: 1,000 free credits/month, **no API key and no signup**. They require the optional `firecrawl-cli` (`npm install -g firecrawl-cli`). **Privacy:** the URL/query/page content is sent to Firecrawl's cloud.

-They double as the implementation of the automatic fallback: `web_search`/`web_fetch`/`web_browse` retry through Firecrawl keyless when their local backend fails (or search returns nothing). Disable all Firecrawl usage with `PI_WEB_FIRECRAWL_FALLBACK=0`.
+They double as the implementation of the automatic fallback: `web_search`/`web_fetch`/`web_browse` retry through Firecrawl keyless when their local backend fails (or search returns nothing). Do not use `firecrawl_*` as the first attempt for ordinary search, URL reading, or page interaction; use the corresponding local-first tool first unless the user explicitly asks for Firecrawl/cloud behavior. Disable all Firecrawl usage with `PI_WEB_FIRECRAWL_FALLBACK=0`.

 ### `firecrawl_search`

@@ -187,7 +190,7 @@ Cloud web search via Firecrawl keyless, with capabilities the local SearXNG tool
 }
 ```

-**When to use:** `web_search` failed or returned nothing;
+**When to use:** `web_search` failed or returned nothing; you need `github`/`research`/`pdf` categories, images/news sources, or domain scoping that SearXNG does not provide; or the user explicitly asked for Firecrawl/cloud search. Do not use it before `web_search` for ordinary discovery.

 ### `firecrawl_scrape`

@@ -203,7 +206,7 @@ Cloud single-page fetch via Firecrawl keyless (anti-bot bypass, JS rendering, PD
 }
 ```

-**When to use:** `web_fetch` failed on an anti-bot-protected, JavaScript-heavy, or PDF page.
+**When to use:** `web_fetch` failed on an anti-bot-protected, JavaScript-heavy, or PDF page, or the user explicitly asked for Firecrawl/cloud scraping. Do not use it before `web_fetch` for ordinary URL reading.

 ### `firecrawl_interact`

@@ -219,6 +222,6 @@ Open a URL in a live Firecrawl browser session and drive it with a natural-langu
 }
 ```

-**When to use:** `web_browse` cannot run (agent-browser missing / OS deps missing),
+**When to use:** `web_browse` cannot run (agent-browser missing / OS deps missing), you need natural-language page interaction without hand-written CSS selectors, or the user explicitly asked for Firecrawl/cloud interaction. Do not use it before `web_browse` for ordinary page interaction. Write each prompt as a single, focused task.

 ---
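The "When to use" rules in the tools.md hunks above amount to a small routing table. The sketch below expresses it as code purely for illustration: the toolkit steers the agent through prompt text, and the `Task` model and the `firstTool` and `fallbackFor` functions are hypothetical. The tool names and the absence of a Firecrawl counterpart for `web_batch_fetch` come from the documentation in this diff.

```typescript
// Hypothetical task model, used only for this illustration.
type Task =
  | { kind: "discover" }
  | { kind: "read"; urls: readonly string[] }
  | { kind: "interact" };

type LocalTool = "web_search" | "web_fetch" | "web_batch_fetch" | "web_browse";
type FirecrawlTool = "firecrawl_search" | "firecrawl_scrape" | "firecrawl_interact";

// Local-first choice for each kind of task.
function firstTool(task: Task): LocalTool {
  switch (task.kind) {
    case "discover":
      return "web_search";
    case "read":
      return task.urls.length > 1 ? "web_batch_fetch" : "web_fetch";
    case "interact":
      return "web_browse";
  }
}

// Firecrawl tool to call directly once the local tool has failed.
// web_batch_fetch has none, per the guide's fallback table.
function fallbackFor(tool: LocalTool): FirecrawlTool | undefined {
  switch (tool) {
    case "web_search":
      return "firecrawl_search";
    case "web_fetch":
      return "firecrawl_scrape";
    case "web_browse":
      return "firecrawl_interact";
    case "web_batch_fetch":
      return undefined;
  }
}
```

An explicit user request for Firecrawl or cloud behavior bypasses `firstTool` in the documented policy; that case is left out of the sketch to keep the table small.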
package/extensions/firecrawl_interact.ts
CHANGED
@@ -39,18 +39,18 @@ const firecrawlInteractTool = defineTool({
 name: "firecrawl_interact",
 label: "Firecrawl Interact",
 description: [
+"Fallback-only cloud browser interaction via Firecrawl keyless.",
+"Do not use firecrawl_interact as the first attempt for ordinary page interaction; use web_browse first.",
 "Open a URL in a live Firecrawl browser session and drive it with a natural-language",
-"prompt (or code), returning the result.
-"
-"natural-language page interaction without CSS selectors.",
+"prompt (or code), returning the result. Use only when web_browse cannot run,",
+"when the user explicitly asks for Firecrawl/cloud interaction, or when you need natural-language page interaction without CSS selectors.",
 "Privacy: the URL, page content, and prompt are sent to Firecrawl's cloud.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Fallback-only Firecrawl interaction",
 promptGuidelines: [
-"
-"
-"Always pass the full URL including https://.",
+"Use firecrawl_interact only after web_browse fails, for needed NL interaction, or explicit cloud interaction.",
+"Keep firecrawl_interact prompt/code focused.",
 ],
 parameters: FirecrawlInteractParamsSchema,

@@ -95,13 +95,6 @@ const firecrawlInteractTool = defineTool({

 renderResult(result, { expanded, isPartial }, theme, context) {
 const isError = context?.isError ?? false;
-
-if (isPartial) {
-const domain = details?.url ? getDomain(details.url) : "";
-const label = domain ? `Interacting with ${domain} via Firecrawl...` : "Interacting via Firecrawl...";
-return new Text(theme.fg("warning", label), 0, 0);
-}
-
 const details = result.details as {
 url?: string;
 output?: string;
@@ -111,6 +104,12 @@ const firecrawlInteractTool = defineTool({
 creditsUsed?: number;
 } | undefined;

+if (isPartial) {
+const domain = details?.url ? getDomain(details.url) : "";
+const label = domain ? `Interacting with ${domain} via Firecrawl...` : "Interacting via Firecrawl...";
+return new Text(theme.fg("warning", label), 0, 0);
+}
+
 if (isError) {
 const errText = getErrorText(result);
 let text = theme.fg("error", "✗ Firecrawl interact failed");
package/extensions/firecrawl_scrape.ts
CHANGED
@@ -41,17 +41,17 @@ const firecrawlScrapeTool = defineTool({
 name: "firecrawl_scrape",
 label: "Firecrawl Scrape",
 description: [
-"
-"
-"
+"Fallback-only cloud fetch via Firecrawl (keyless — no API key, no signup).",
+"Do not use firecrawl_scrape as the first attempt for ordinary URL reading; use web_fetch first.",
+"Use firecrawl_scrape only when web_fetch already failed on a hard target (anti-bot,",
+"JavaScript-heavy pages, PDFs), or when the user explicitly asks for Firecrawl/cloud rendering.",
 "Privacy: the URL and page content are sent to Firecrawl's cloud.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Fallback-only Firecrawl scrape",
 promptGuidelines: [
-"
-"firecrawl_scrape
-"Always pass the full URL including https://.",
+"Use firecrawl_scrape only after web_fetch fails or explicit cloud scraping/rendering.",
+"Use firecrawl_scrape for anti-bot pages, heavy JS, and PDFs.",
 ],
 parameters: FirecrawlScrapeParamsSchema,

@@ -97,13 +97,6 @@ const firecrawlScrapeTool = defineTool({

 renderResult(result, { expanded, isPartial }, theme, context) {
 const isError = context?.isError ?? false;
-
-if (isPartial) {
-const domain = details?.url ? getDomain(details.url) : "";
-const label = domain ? `Scraping ${domain} via Firecrawl...` : "Scraping via Firecrawl...";
-return new Text(theme.fg("warning", label), 0, 0);
-}
-
 const details = result.details as {
 url?: string;
 bytes?: number;
@@ -113,6 +106,12 @@ const firecrawlScrapeTool = defineTool({
 creditsUsed?: number;
 } | undefined;

+if (isPartial) {
+const domain = details?.url ? getDomain(details.url) : "";
+const label = domain ? `Scraping ${domain} via Firecrawl...` : "Scraping via Firecrawl...";
+return new Text(theme.fg("warning", label), 0, 0);
+}
+
 if (isError) {
 const errText = getErrorText(result);
 let text = theme.fg("error", "✗ Firecrawl scrape failed");
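The `renderResult` hunks for `firecrawl_interact` and `firecrawl_scrape` make the same change: the `isPartial` branch moves below the `details` declaration. A `const` is in scope for its whole block but cannot be read before its declaration runs, so the old order is rejected by the type checker and would throw a `ReferenceError` at run time. The sketch below shows the corrected ordering in isolation. `ToolResult`, `partialLabel`, and this regex-based `getDomain` are simplified, hypothetical stand-ins, not the extension's real helpers.

```typescript
// Hypothetical, simplified stand-in for the tool result type.
interface ToolResult {
  details?: { url?: string };
}

// Assumed helper: returns the host part of a URL string, or "" if none is found.
function getDomain(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]+)/i.exec(url);
  return match?.[1] ?? "";
}

function partialLabel(result: ToolResult, isPartial: boolean): string | undefined {
  // Declare `details` before any branch that reads it.
  const details = result.details;

  if (isPartial) {
    const domain = details?.url ? getDomain(details.url) : "";
    return domain ? `Scraping ${domain} via Firecrawl...` : "Scraping via Firecrawl...";
  }
  return undefined;
}
```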
package/extensions/firecrawl_search.ts
CHANGED
@@ -42,17 +42,17 @@ const firecrawlSearchTool = defineTool({
 name: "firecrawl_search",
 label: "Firecrawl Search",
 description: [
-"
+"Fallback-only cloud search via Firecrawl (keyless — no API key, no signup).",
+"Do not use firecrawl_search as the first attempt for ordinary web discovery; use web_search first.",
 "Supports sources (web/images/news) and categories (github/research/pdf) that",
-"SearXNG does not. Use as an escape hatch
+"SearXNG does not. Use only as an escape hatch when web_search fails/returns nothing, or when the user explicitly asks for Firecrawl/cloud search.",
 "Privacy: the query is sent to Firecrawl's cloud.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Fallback-only Firecrawl search",
 promptGuidelines: [
-"
-"Use categories=[\"github\"
-"Use includeDomains/excludeDomains to scope results to specific sites.",
+"Use firecrawl_search only after web_search fails/returns nothing, for Firecrawl-only categories, or explicit cloud search.",
+"Use categories=[\"github\"|\"research\"|\"pdf\"] and includeDomains/excludeDomains when needed.",
 ],
 parameters: FirecrawlSearchParamsSchema,

package/extensions/index.ts
CHANGED
@@ -17,6 +17,26 @@ import registerFirecrawlScrape from "./firecrawl_scrape";
 import registerFirecrawlSearch from "./firecrawl_search";
 import registerFirecrawlInteract from "./firecrawl_interact";

+const WEB_TOOL_ROUTING_POLICY = [
+"Web tools are local-first: web_search=discover, web_fetch=one static URL, web_batch_fetch=2–5 static URLs, web_browse=interaction.",
+"Use firecrawl_* only after the matching local tool failed in this conversation, or when the user explicitly asks for Firecrawl/cloud.",
+"web_search/web_fetch/web_browse already auto-fallback to Firecrawl; pass full URLs with scheme and selectors when useful.",
+].join("\n");
+
+const WEB_TOOL_NAMES = new Set([
+"web_search",
+"web_fetch",
+"web_browse",
+"web_batch_fetch",
+"firecrawl_search",
+"firecrawl_scrape",
+"firecrawl_interact",
+]);
+
+function shouldInjectWebToolRoutingPolicy(selectedTools: readonly string[] | undefined): boolean {
+return selectedTools?.some((tool) => WEB_TOOL_NAMES.has(tool)) ?? false;
+}
+
 export default function (pi: ExtensionAPI) {
 registerWebSearch(pi);
 registerWebFetch(pi);
@@ -25,4 +45,9 @@ export default function (pi: ExtensionAPI) {
 registerFirecrawlScrape(pi);
 registerFirecrawlSearch(pi);
 registerFirecrawlInteract(pi);
+
+pi.on("before_agent_start", (event) => {
+if (!shouldInjectWebToolRoutingPolicy(event.systemPromptOptions.selectedTools)) return;
+return { systemPrompt: `${event.systemPrompt}\n\n${WEB_TOOL_ROUTING_POLICY}` };
+});
 }
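The `before_agent_start` handler added to index.ts appends the shared routing policy only when at least one web tool is selected. The standalone sketch below exercises that logic without the pi runtime. `BeforeAgentStartEvent` is a hypothetical shape inferred from the two fields the handler reads, the real `ExtensionAPI` types are not part of this diff, and the policy text and tool list are shortened.

```typescript
// Hypothetical event shape, inferred from the fields the handler reads.
interface BeforeAgentStartEvent {
  systemPrompt: string;
  systemPromptOptions: { selectedTools?: readonly string[] };
}

// Shortened stand-ins for the constants in index.ts.
const ROUTING_POLICY = "Web tools are local-first; use firecrawl_* only as a fallback.";
const WEB_TOOL_NAMES = new Set(["web_search", "web_fetch", "web_browse", "firecrawl_scrape"]);

function shouldInject(selectedTools: readonly string[] | undefined): boolean {
  return selectedTools?.some((tool) => WEB_TOOL_NAMES.has(tool)) ?? false;
}

// Returns a prompt override, or undefined to leave the prompt untouched.
function onBeforeAgentStart(event: BeforeAgentStartEvent): { systemPrompt: string } | undefined {
  if (!shouldInject(event.systemPromptOptions.selectedTools)) return undefined;
  return { systemPrompt: `${event.systemPrompt}\n\n${ROUTING_POLICY}` };
}
```

Because the policy is appended once per agent start rather than repeated in every tool's `promptGuidelines`, agents that select no web tools pay no prompt cost for it.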
package/extensions/web_batch_fetch.ts
CHANGED
@@ -113,14 +113,10 @@ const webBatchFetchTool = defineTool({
 "For a single page, use web_fetch instead.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Parallel fetch for 2–5 URLs",
 promptGuidelines: [
-"Use web_batch_fetch
-"
-"Use web_batch_fetch for cross-referencing sources, comparing implementations, or synthesizing research from multiple sites.",
-"For a single URL, always use web_fetch — it supports per-URL selectors and stealthy mode.",
-"If a page in the batch fails, the tool reports the error but continues with the others.",
-"Keep batch sizes reasonable (≤8) to avoid overwhelming the browser and token budget.",
+"Use web_batch_fetch for 2–5 pages to compare/cross-reference/synthesize; single URL → web_fetch.",
+"Keep batches small (≤8; schema max 15); failed pages are reported without stopping the batch.",
 ],
 parameters: WebBatchFetchParamsSchema,

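The shortened `web_batch_fetch` guideline packs two behaviors into one line: a size limit (8 recommended, 15 as the schema maximum) and per-page error isolation. A rough sketch of both follows. The function and constant names are hypothetical, the fetcher is synchronous for brevity while the real tool is described as fetching in parallel, and rejecting oversized batches with a thrown `Error` is an assumption about how the schema limit surfaces.

```typescript
// Limits quoted in the prompt guideline; the constant names are hypothetical.
const RECOMMENDED_MAX = 8;
const SCHEMA_MAX = 15;

type PageResult =
  | { url: string; ok: true; content: string }
  | { url: string; ok: false; error: string };

// True when a batch is allowed by the schema but larger than recommended.
function exceedsRecommendedSize(urls: readonly string[]): boolean {
  return urls.length > RECOMMENDED_MAX;
}

function fetchBatch(urls: readonly string[], fetchOne: (url: string) => string): PageResult[] {
  if (urls.length > SCHEMA_MAX) {
    throw new Error(`Batch of ${urls.length} URLs exceeds the schema maximum of ${SCHEMA_MAX}`);
  }
  // A failure on one page is recorded and the remaining pages are still fetched.
  return urls.map((url): PageResult => {
    try {
      return { url, ok: true, content: fetchOne(url) };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return { url, ok: false, error: message };
    }
  });
}
```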
package/extensions/web_browse.ts
CHANGED
@@ -106,22 +106,18 @@ const webBrowseTool = defineTool({
 name: "web_browse",
 label: "Web Browse",
 description: [
-"
+"Primary local-first tool for interactive web pages: navigate, click, fill forms, scroll,",
 "wait for content, and then extract text.",
-"Uses the agent-browser CLI with batched JSON commands.",
+"Uses the agent-browser CLI with batched JSON commands, then automatically tries Firecrawl keyless only if local browser automation fails.",
 "Use web_browse when the target content requires interaction (clicking buttons,",
 "scrolling, filling search boxes, waiting for JS to load) before it becomes available.",
 "For pages that need no interaction, use web_fetch instead.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Local browser interaction and extraction",
 promptGuidelines: [
-"Use web_browse when
-"
-"For static articles, docs, or blogs that load everything on first request, prefer web_fetch.",
-"After web_search returns results, prefer web_fetch for reading individual articles.",
-"Use web_browse directly when interaction is required; otherwise try web_fetch first.",
-"Always provide a selector to extract only the relevant content area — avoid dumping full page text.",
+"Use web_browse only when clicks/forms/scroll/wait are needed; otherwise use web_fetch.",
+"Provide a selector to narrow extracted content when possible.",
 ],
 parameters: WebBrowseParamsSchema,

package/extensions/web_fetch.ts
CHANGED
@@ -40,19 +40,15 @@ const webFetchTool = defineTool({
 name: "web_fetch",
 label: "Web Fetch",
 description: [
-"
-"
-"Use web_fetch to read the full content of a specific result or user-provided URL.",
+"Primary local-first tool for reading a single web page URL.",
+"Fetches and extracts readable content via scrapling, then automatically tries Firecrawl keyless only if the local fetcher fails.",
+"Use web_fetch as the first attempt to read the full content of a specific result or user-provided URL.",
 "Callers remain responsible for robots.txt and site terms; Scrapling extract commands do not enforce them automatically.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Local-first fetch of one URL as markdown",
 promptGuidelines: [
-"Use web_fetch
-"For a single URL, always use web_fetch instead of web_batch_fetch.",
-"If the page is dynamic/JavaScript-heavy, the tool automatically uses browser automation.",
-"When reading multiple (2–5) pages at once (e.g., after web_search), prefer web_batch_fetch over repeated web_fetch calls.",
-"Always pass the full URL including https://.",
+"Use web_fetch for one non-interactive URL; use web_batch_fetch for 2–5 URLs.",
 ],
 parameters: WebFetchParamsSchema,

package/extensions/web_search.ts
CHANGED
@@ -51,19 +51,18 @@ const webSearchTool = defineTool({
 name: "web_search",
 label: "Web Search",
 description: [
-"
+"Primary local-first tool for web discovery via a SearXNG instance.",
 "Returns a list of results with title, URL, and snippet.",
 "Automatically aggregates up to 3 pages of SearXNG results when more than ~20 are needed.",
+"Use web_search as the first attempt for web search; it automatically tries Firecrawl keyless only if SearXNG fails or returns nothing.",
 "Use web_search when the user asks about current events, facts, or anything",
 "that requires up-to-date information beyond the model's training data.",
 `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
 ].join(" "),
-promptSnippet: "
+promptSnippet: "Local web search via SearXNG",
 promptGuidelines: [
-"Use web_search
-"
-"If web_search returns no results but includes suggestions, consider using a suggested query to refine your search.",
-"If web_search returns multiple (2–5) relevant results that all need to be read, prefer web_batch_fetch to fetch them in parallel instead of calling web_fetch repeatedly.",
+"Use web_search for current/external facts, verification, docs, and discovery.",
+"If 2–5 results need reading, use web_batch_fetch; retry suggested queries when results are empty.",
 ],
 parameters: WebSearchParamsSchema,

package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "pi-web-toolkit",
-"version": "0.3.
+"version": "0.3.2",
 "description": "Web research toolkit for the pi coding agent. Search via SearXNG, fetch pages with scrapling, browse interactively via agent-browser, batch-read sources in parallel, and optionally fall back to Firecrawl Keyless (no API key) when a local backend fails.",
 "author": "Wade Huang <fastwade11@gmail.com>",
 "license": "MIT",
@@ -19,7 +19,7 @@
 },
 "scripts": {
 "typecheck": "tsc --noEmit",
-"test": "tsx test/content-preview/test.ts && tsx test/agent-browser/test.ts && tsx test/firecrawl/test.ts",
+"test": "tsx test/content-preview/test.ts && tsx test/agent-browser/test.ts && tsx test/firecrawl/test.ts && tsx test/tool-routing/test.ts",
 "test:agent-browser": "tsx test/agent-browser/test.ts",
 "test:firecrawl": "tsx test/firecrawl/test.ts",
 "test:approve": "tsx test/content-preview/test.ts --approve"