pi-web-toolkit 0.3.1 → 0.3.2

package/CHANGELOG.md CHANGED
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  ## [Unreleased]
9
9
 
10
+ ## [0.3.2] - 2026-06-25
11
+
12
+ ### Fixed
13
+
14
+ - Kept the agent's web-tool selection local-first: ordinary URL reads now prefer `web_fetch`, discovery prefers `web_search`, and interaction prefers `web_browse`; `firecrawl_*` tools are documented and prompted as fallback-only unless explicitly requested.
15
+ - Fixed `firecrawl_scrape` and `firecrawl_interact` partial-result rendering type-check errors caused by reading `details` before declaration.
16
+
17
+ ### Changed
18
+
19
+ - Reduced web-tool prompt metadata overhead by consolidating shared routing rules and shortening per-tool `promptSnippet`/`promptGuidelines` text.
20
+ - Added a tool-routing prompt regression test and included it in `npm test`.
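A regression check of this kind might look roughly like the sketch below. The actual `test/tool-routing/test.ts` is not part of this diff, so `PROMPT_SNIPPETS` and `findRoutingViolations` are hypothetical names; only the snippet strings are copied from the `promptSnippet` changes in this diff.

```typescript
// Hypothetical check; the real test/tool-routing/test.ts is not shown in this diff.
// Snippet strings are copied from the promptSnippet values introduced in 0.3.2.
const PROMPT_SNIPPETS: Record<string, string> = {
  web_search: "Local web search via SearXNG",
  web_fetch: "Local-first fetch of one URL as markdown",
  web_browse: "Local browser interaction and extraction",
  firecrawl_search: "Fallback-only Firecrawl search",
  firecrawl_scrape: "Fallback-only Firecrawl scrape",
  firecrawl_interact: "Fallback-only Firecrawl interaction",
};

// Returns the tool names whose snippet contradicts the local-first policy:
// firecrawl_* snippets must start with "Fallback-only", all others must not.
function findRoutingViolations(snippets: Record<string, string>): string[] {
  return Object.entries(snippets)
    .filter(([name, snippet]) => name.startsWith("firecrawl_") !== snippet.startsWith("Fallback-only"))
    .map(([name]) => name);
}

const violations = findRoutingViolations(PROMPT_SNIPPETS);
if (violations.length > 0) {
  throw new Error(`Routing prompt regression in: ${violations.join(", ")}`);
}
```

Checking a prefix keeps the test cheap and makes it fail loudly if a later edit rewords a `firecrawl_*` snippet so that it no longer signals fallback-only use.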
21
+
10
22
  ## [0.3.1] - 2026-06-23
11
23
 
12
24
  ### Changed
@@ -145,7 +157,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
145
157
  - `web_browse` — interactive browser automation via agent-browser.
146
158
  - LLM-optimized `promptGuidelines` and `promptSnippet` for every tool.
147
159
 
148
- [Unreleased]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.2...HEAD
160
+ [Unreleased]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.3.2...HEAD
161
+ [0.3.2]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.3.1...v0.3.2
162
+ [0.3.1]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.3.0...v0.3.1
163
+ [0.3.0]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.2...v0.3.0
149
164
  [0.2.2]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.1...v0.2.2
150
165
  [0.2.1]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.2.0...v0.2.1
151
166
  [0.2.0]: https://github.com/Wade11s/pi-web-toolkit/compare/v0.1.2...v0.2.0
package/README.md CHANGED
@@ -22,7 +22,7 @@ Web research toolkit for [pi](https://pi.dev) agents. Search via SearXNG, fetch
22
22
  | **`firecrawl_scrape`** | [firecrawl-cli](https://github.com/firecrawl/cli) (keyless) | Cloud single-page fetch (anti-bot / JS / PDF) | — |
23
23
  | **`firecrawl_interact`** | [firecrawl-cli](https://github.com/firecrawl/cli) (keyless) | Cloud natural-language page interaction | — |
24
24
 
25
- > **Firecrawl fallback.** `web_search`, `web_fetch`, and `web_browse` automatically retry through Firecrawl Keyless (1,000 free credits/month, no API key) when their local backend errors out or search returns nothing. The three `firecrawl_*` tools are explicit escape hatches. Disable it with `PI_WEB_FIRECRAWL_FALLBACK=0`. Install the optional CLI: `npm install -g firecrawl-cli`.
25
+ > **Firecrawl fallback.** `web_search`, `web_fetch`, and `web_browse` are the local-first primary tools and automatically retry through Firecrawl Keyless (1,000 free credits/month, no API key) only when their local backend errors out or search returns nothing. The three `firecrawl_*` tools are fallback-only escape hatches; agents are instructed not to call them first unless you explicitly ask for Firecrawl/cloud behavior or a local-first tool already failed. Disable fallback use with `PI_WEB_FIRECRAWL_FALLBACK=0`. Install the optional CLI: `npm install -g firecrawl-cli`.
26
26
 
27
27
  ## Tools Preview
28
28
 
@@ -198,7 +198,7 @@ export PI_WEB_FIRECRAWL_FALLBACK=0
198
198
 
199
199
  ### Optional: Firecrawl keyless fallback
200
200
 
201
- When a local backend (`web_search`/`web_fetch`/`web_browse`) fails or returns nothing, the tools automatically retry through [Firecrawl Keyless](https://www.firecrawl.dev/blog/firecrawl-keyless-launch) — 1,000 free credits/month, **no API key, no signup**. The `firecrawl_*` tools are explicit escape hatches for capabilities the local backends lack (search categories, cloud rendering, natural-language interaction).
201
+ When a local backend (`web_search`/`web_fetch`/`web_browse`) fails or returns nothing, the tools automatically retry through [Firecrawl Keyless](https://www.firecrawl.dev/blog/firecrawl-keyless-launch) — 1,000 free credits/month, **no API key, no signup**. The `firecrawl_*` tools are fallback-only explicit escape hatches for capabilities the local backends lack (search categories, cloud rendering, natural-language interaction). Agents should use `web_fetch`/`web_search`/`web_browse` first unless you explicitly request Firecrawl/cloud behavior.
202
202
 
203
203
  Install the optional CLI (the fallback degrades gracefully if it is absent):
204
204
 
package/docs/guide.md CHANGED
@@ -46,7 +46,7 @@ User asks about something external / current
46
46
 
47
47
  ## Firecrawl Keyless fallback
48
48
 
49
- When a local backend cannot do the job, the tools automatically retry through **Firecrawl Keyless** (1,000 free credits/month, no API key, no signup) before giving up. It is **fallback-only** — never the primary path — and is **opt-out-able** with `PI_WEB_FIRECRAWL_FALLBACK=0`. Requires the optional `firecrawl-cli` (`npm install -g firecrawl-cli`); if it is absent the tools simply surface the original local error.
49
+ When a local backend cannot do the job, the tools automatically retry through **Firecrawl Keyless** (1,000 free credits/month, no API key, no signup) before giving up. It is **fallback-only** — never the primary path — and is **opt-out-able** with `PI_WEB_FIRECRAWL_FALLBACK=0`. Requires the optional `firecrawl-cli` (`npm install -g firecrawl-cli`); if it is absent the tools simply surface the original local error. Agents should call `web_search`/`web_fetch`/`web_browse` first and call `firecrawl_*` directly only after the corresponding local-first tool failed, or when the user explicitly asks for Firecrawl/cloud behavior.
50
50
 
51
51
  | Tool | Falls back to Firecrawl when… |
52
52
  |------|-------------------------------|
@@ -55,7 +55,7 @@ When a local backend cannot do the job, the tools automatically retry through **
55
55
  | `web_browse` | agent-browser is missing or its batch fails (not on caller validation errors) |
56
56
  | `web_batch_fetch` | (no fallback — Firecrawl batch scrape is not keyless) |
57
57
 
58
- The three `firecrawl_*` tools are the explicit escape hatches for capabilities the local backends lack (`github`/`research`/`pdf` search categories, cloud rendering, natural-language interaction).
58
+ The three `firecrawl_*` tools are fallback-only explicit escape hatches for capabilities the local backends lack (`github`/`research`/`pdf` search categories, cloud rendering, natural-language interaction). They are not the first step for ordinary URL reading; `web_fetch` already performs Firecrawl fallback internally when local fetching fails.
59
59
 
60
60
  **Graceful skip.** If the fallback itself cannot help — the CLI is missing, the IP is flagged as suspicious, the keyless quota is exhausted, or the fallback is disabled — the tool falls through to the original local-tool error so the user is never left worse off.
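The fallback and graceful-skip behavior described in this section can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `withFirecrawlFallback`, `isFallbackEnabled`, `Run`, and `Env` are hypothetical names, and the sketch assumes that only the documented value `0` of `PI_WEB_FIRECRAWL_FALLBACK` disables the fallback. It covers the "local backend failed" case only, not the "search returned nothing" case.

```typescript
// Hypothetical sketch of local-first execution with a graceful-skip fallback.
// None of these names come from pi-web-toolkit; they only illustrate the documented behavior.
type Env = Record<string, string | undefined>;
type Run<T> = () => Promise<T>;

// Assumption: only the documented value "0" opts out; an unset variable means enabled.
function isFallbackEnabled(env: Env): boolean {
  return env.PI_WEB_FIRECRAWL_FALLBACK !== "0";
}

async function withFirecrawlFallback<T>(local: Run<T>, fallback: Run<T>, env: Env): Promise<T> {
  try {
    return await local(); // the local backend is always tried first
  } catch (localError) {
    if (!isFallbackEnabled(env)) throw localError; // opted out: surface the local error
    try {
      return await fallback(); // for example, a Firecrawl keyless call
    } catch {
      // Graceful skip: the fallback could not help (CLI missing, quota exhausted, ...),
      // so the caller sees the original local error rather than the fallback's error.
      throw localError;
    }
  }
}
```

Rethrowing `localError` in both failure paths is what keeps the user "never left worse off": the error they see is always the one from the tool they actually called.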
61
61
 
package/docs/tools.md CHANGED
@@ -12,7 +12,7 @@ Search the web via SearXNG. Returns ranked results with title, URL, and snippet.
12
12
  }
13
13
  ```
14
14
 
15
- **When to use:** The user asks about current events, facts, or anything requiring up-to-date information and has not already provided the source URLs.
15
+ **When to use:** The user asks about current events, facts, or anything requiring up-to-date information and has not already provided the source URLs. Use `web_search` before `firecrawl_search`; `web_search` already performs Firecrawl fallback internally when SearXNG fails or returns nothing.
16
16
 
17
17
  **Empty results behavior:** When no results are found, `web_search` includes any query **suggestions** provided by SearXNG. The agent can use them to refine and retry the search.
18
18
 
@@ -35,9 +35,10 @@ Fetch a single page and convert it to clean markdown. Uses Scrapling's browser-b
35
35
  ```
36
36
 
37
37
  **When to use:**
38
- - After `web_search` finds a relevant result
38
+ - As the first attempt for a user-provided URL or after `web_search` finds a relevant result
39
39
  - The page is static or loads its content on first request
40
40
  - You need to read **one** article, doc, or blog post
41
+ - Before `firecrawl_scrape`; `web_fetch` already performs Firecrawl fallback internally when the local fetcher fails
41
42
 
42
43
  **Example flow:**
43
44
  ```
@@ -77,10 +78,12 @@ Uses the [agent-browser](https://github.com/vercel-labs/agent-browser) CLI with
77
78
  When `selector` is omitted, the tool returns agent-browser's interactive accessibility snapshot rather than full page text.
78
79
 
79
80
  **When to use:**
81
+ - As the first attempt when the page requires interaction
80
82
  - The page requires **clicking** before showing target content (e.g. "Load more", pagination, tab switching)
81
83
  - The page requires **filling a form** (e.g. search box, login)
82
84
  - The page requires **scrolling** to load lazy content (infinite scroll)
83
85
  - The page requires **waiting** for JS to render content (SPA)
86
+ - Before `firecrawl_interact`; `web_browse` already performs Firecrawl fallback internally when local browser automation fails
84
87
 
85
88
  **Example flows:**
86
89
 
@@ -163,11 +166,11 @@ User: "Compare Python asyncio, Trio, and curio"
163
166
 
164
167
  ---
165
168
 
166
- ## Firecrawl keyless tools (optional cloud escape hatches)
169
+ ## Firecrawl keyless tools (optional fallback-only cloud escape hatches)
167
170
 
168
171
  These three tools talk to [Firecrawl](https://www.firecrawl.dev) in **keyless** mode: 1,000 free credits/month, **no API key and no signup**. They require the optional `firecrawl-cli` (`npm install -g firecrawl-cli`). **Privacy:** the URL/query/page content is sent to Firecrawl's cloud.
169
172
 
170
- They double as the implementation of the automatic fallback: `web_search`/`web_fetch`/`web_browse` retry through Firecrawl keyless when their local backend fails (or search returns nothing). Disable all Firecrawl usage with `PI_WEB_FIRECRAWL_FALLBACK=0`.
173
+ They double as the implementation of the automatic fallback: `web_search`/`web_fetch`/`web_browse` retry through Firecrawl keyless when their local backend fails (or search returns nothing). Do not use `firecrawl_*` as the first attempt for ordinary search, URL reading, or page interaction; use the corresponding local-first tool first unless the user explicitly asks for Firecrawl/cloud behavior. Disable all Firecrawl usage with `PI_WEB_FIRECRAWL_FALLBACK=0`.
171
174
 
172
175
  ### `firecrawl_search`
173
176
 
@@ -187,7 +190,7 @@ Cloud web search via Firecrawl keyless, with capabilities the local SearXNG tool
187
190
  }
188
191
  ```
189
192
 
190
- **When to use:** `web_search` failed or returned nothing; or you need `github`/`research`/`pdf` categories, images/news sources, or domain scoping that SearXNG does not provide.
193
+ **When to use:** `web_search` failed or returned nothing; you need `github`/`research`/`pdf` categories, images/news sources, or domain scoping that SearXNG does not provide; or the user explicitly asked for Firecrawl/cloud search. Do not use it before `web_search` for ordinary discovery.
191
194
 
192
195
  ### `firecrawl_scrape`
193
196
 
@@ -203,7 +206,7 @@ Cloud single-page fetch via Firecrawl keyless (anti-bot bypass, JS rendering, PD
203
206
  }
204
207
  ```
205
208
 
206
- **When to use:** `web_fetch` failed on an anti-bot-protected, JavaScript-heavy, or PDF page.
209
+ **When to use:** `web_fetch` failed on an anti-bot-protected, JavaScript-heavy, or PDF page, or the user explicitly asked for Firecrawl/cloud scraping. Do not use it before `web_fetch` for ordinary URL reading.
207
210
 
208
211
  ### `firecrawl_interact`
209
212
 
@@ -219,6 +222,6 @@ Open a URL in a live Firecrawl browser session and drive it with a natural-langu
219
222
  }
220
223
  ```
221
224
 
222
- **When to use:** `web_browse` cannot run (agent-browser missing / OS deps missing), or you want natural-language page interaction without hand-written CSS selectors. Write each prompt as a single, focused task.
225
+ **When to use:** `web_browse` cannot run (agent-browser missing / OS deps missing), you need natural-language page interaction without hand-written CSS selectors, or the user explicitly asked for Firecrawl/cloud interaction. Do not use it before `web_browse` for ordinary page interaction. Write each prompt as a single, focused task.
223
226
 
224
227
  ---
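The "When to use" rules for the local-first and `firecrawl_*` tools can be summarized as a small decision function. In the package the routing is done by the agent from prompt text, not by code, so the sketch below is purely illustrative: `pickWebTool`, `RoutingContext`, and `Intent` are hypothetical names. It also leaves out the capability-based exceptions (Firecrawl-only search categories, natural-language interaction without selectors).

```typescript
// Hypothetical sketch of the documented routing rules; not code from the package.
type Intent = "discover" | "read" | "interact";

interface RoutingContext {
  intent: Intent;
  localToolFailed: boolean; // the matching local-first tool already failed
  userAskedForFirecrawl: boolean; // explicit request for Firecrawl/cloud behavior
}

const LOCAL_TOOL: Record<Intent, string> = {
  discover: "web_search",
  read: "web_fetch",
  interact: "web_browse",
};

const FIRECRAWL_TOOL: Record<Intent, string> = {
  discover: "firecrawl_search",
  read: "firecrawl_scrape",
  interact: "firecrawl_interact",
};

function pickWebTool(ctx: RoutingContext): string {
  // Cloud tools are chosen only after a local failure or on explicit request.
  const useCloud = ctx.localToolFailed || ctx.userAskedForFirecrawl;
  return useCloud ? FIRECRAWL_TOOL[ctx.intent] : LOCAL_TOOL[ctx.intent];
}
```

For an ordinary URL read with no prior failure, `pickWebTool` returns `web_fetch`; the same request after a failed local fetch returns `firecrawl_scrape`.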
@@ -39,18 +39,18 @@ const firecrawlInteractTool = defineTool({
39
39
  name: "firecrawl_interact",
40
40
  label: "Firecrawl Interact",
41
41
  description: [
42
+ "Fallback-only cloud browser interaction via Firecrawl keyless.",
43
+ "Do not use firecrawl_interact as the first attempt for ordinary page interaction; use web_browse first.",
42
44
  "Open a URL in a live Firecrawl browser session and drive it with a natural-language",
43
- "prompt (or code), returning the result. Keyless no API key, no signup.",
44
- "Use firecrawl_interact when the local web_browse cannot run, or when you want",
45
- "natural-language page interaction without CSS selectors.",
45
+ "prompt (or code), returning the result. Use only when web_browse cannot run,",
46
+ "when the user explicitly asks for Firecrawl/cloud interaction, or when you need natural-language page interaction without CSS selectors.",
46
47
  "Privacy: the URL, page content, and prompt are sent to Firecrawl's cloud.",
47
48
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
48
49
  ].join(" "),
49
- promptSnippet: "Drive a page via Firecrawl keyless (natural-language interaction)",
50
+ promptSnippet: "Fallback-only Firecrawl interaction",
50
51
  promptGuidelines: [
51
- "Prefer web_browse first; reach for firecrawl_interact when web_browse can't run or you want NL interaction.",
52
- "Write each prompt as a single, focused task; the session can be reused across calls.",
53
- "Always pass the full URL including https://.",
52
+ "Use firecrawl_interact only after web_browse fails, for needed NL interaction, or explicit cloud interaction.",
53
+ "Keep firecrawl_interact prompt/code focused.",
54
54
  ],
55
55
  parameters: FirecrawlInteractParamsSchema,
56
56
 
@@ -95,13 +95,6 @@ const firecrawlInteractTool = defineTool({
95
95
 
96
96
  renderResult(result, { expanded, isPartial }, theme, context) {
97
97
  const isError = context?.isError ?? false;
98
-
99
- if (isPartial) {
100
- const domain = details?.url ? getDomain(details.url) : "";
101
- const label = domain ? `Interacting with ${domain} via Firecrawl...` : "Interacting via Firecrawl...";
102
- return new Text(theme.fg("warning", label), 0, 0);
103
- }
104
-
105
98
  const details = result.details as {
106
99
  url?: string;
107
100
  output?: string;
@@ -111,6 +104,12 @@ const firecrawlInteractTool = defineTool({
111
104
  creditsUsed?: number;
112
105
  } | undefined;
113
106
 
107
+ if (isPartial) {
108
+ const domain = details?.url ? getDomain(details.url) : "";
109
+ const label = domain ? `Interacting with ${domain} via Firecrawl...` : "Interacting via Firecrawl...";
110
+ return new Text(theme.fg("warning", label), 0, 0);
111
+ }
112
+
114
113
  if (isError) {
115
114
  const errText = getErrorText(result);
116
115
  let text = theme.fg("error", "✗ Firecrawl interact failed");
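The reordering in this hunk moves the `isPartial` branch below the `details` declaration. A `const` cannot be read above the line that declares it: the TypeScript compiler reports the use as an error, and at runtime the access would throw a `ReferenceError`. The sketch below shows the corrected ordering in isolation. `partialLabel` and `PartialResult` are hypothetical names, and this `getDomain` is a stand-in, since the package's own helper is not shown in the diff.

```typescript
// Simplified, hypothetical stand-in for the partial-result branch of renderResult.
interface PartialResult {
  details?: unknown;
}

// Stand-in helper: the package's real getDomain is not shown in this diff.
function getDomain(url: string): string {
  try {
    return new URL(url).hostname;
  } catch {
    return "";
  }
}

function partialLabel(result: PartialResult, isPartial: boolean): string | undefined {
  // Declare `details` before any branch reads it.
  const details = result.details as { url?: string } | undefined;

  if (isPartial) {
    const domain = details?.url ? getDomain(details.url) : "";
    return domain ? `Interacting with ${domain} via Firecrawl...` : "Interacting via Firecrawl...";
  }
  return undefined; // non-partial results are rendered elsewhere
}
```

The same reordering is applied to `firecrawl_scrape`, with "Scraping" in place of "Interacting".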
@@ -41,17 +41,17 @@ const firecrawlScrapeTool = defineTool({
41
41
  name: "firecrawl_scrape",
42
42
  label: "Firecrawl Scrape",
43
43
  description: [
44
- "Fetch a single page as clean markdown via Firecrawl (keyless — no API key, no signup).",
45
- "Use firecrawl_scrape when the local web_fetch fails on a hard target (anti-bot,",
46
- "JavaScript-heavy pages, PDFs) or when you need Firecrawl's cloud rendering directly.",
44
+ "Fallback-only cloud fetch via Firecrawl (keyless — no API key, no signup).",
45
+ "Do not use firecrawl_scrape as the first attempt for ordinary URL reading; use web_fetch first.",
46
+ "Use firecrawl_scrape only when web_fetch already failed on a hard target (anti-bot,",
47
+ "JavaScript-heavy pages, PDFs), or when the user explicitly asks for Firecrawl/cloud rendering.",
47
48
  "Privacy: the URL and page content are sent to Firecrawl's cloud.",
48
49
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
49
50
  ].join(" "),
50
- promptSnippet: "Fetch a single page via Firecrawl keyless (anti-bot / JS / PDF fallback)",
51
+ promptSnippet: "Fallback-only Firecrawl scrape",
51
52
  promptGuidelines: [
52
- "Prefer web_fetch first; reach for firecrawl_scrape when web_fetch fails or you need cloud rendering.",
53
- "firecrawl_scrape handles anti-bot protection, JS-heavy SPAs, and PDFs that scrapling may miss.",
54
- "Always pass the full URL including https://.",
53
+ "Use firecrawl_scrape only after web_fetch fails or explicit cloud scraping/rendering.",
54
+ "Use firecrawl_scrape for anti-bot pages, heavy JS, and PDFs.",
55
55
  ],
56
56
  parameters: FirecrawlScrapeParamsSchema,
57
57
 
@@ -97,13 +97,6 @@ const firecrawlScrapeTool = defineTool({
97
97
 
98
98
  renderResult(result, { expanded, isPartial }, theme, context) {
99
99
  const isError = context?.isError ?? false;
100
-
101
- if (isPartial) {
102
- const domain = details?.url ? getDomain(details.url) : "";
103
- const label = domain ? `Scraping ${domain} via Firecrawl...` : "Scraping via Firecrawl...";
104
- return new Text(theme.fg("warning", label), 0, 0);
105
- }
106
-
107
100
  const details = result.details as {
108
101
  url?: string;
109
102
  bytes?: number;
@@ -113,6 +106,12 @@ const firecrawlScrapeTool = defineTool({
113
106
  creditsUsed?: number;
114
107
  } | undefined;
115
108
 
109
+ if (isPartial) {
110
+ const domain = details?.url ? getDomain(details.url) : "";
111
+ const label = domain ? `Scraping ${domain} via Firecrawl...` : "Scraping via Firecrawl...";
112
+ return new Text(theme.fg("warning", label), 0, 0);
113
+ }
114
+
116
115
  if (isError) {
117
116
  const errText = getErrorText(result);
118
117
  let text = theme.fg("error", "✗ Firecrawl scrape failed");
@@ -42,17 +42,17 @@ const firecrawlSearchTool = defineTool({
42
42
  name: "firecrawl_search",
43
43
  label: "Firecrawl Search",
44
44
  description: [
45
- "Search the web via Firecrawl (keyless — no API key, no signup).",
45
+ "Fallback-only cloud search via Firecrawl (keyless — no API key, no signup).",
46
+ "Do not use firecrawl_search as the first attempt for ordinary web discovery; use web_search first.",
46
47
  "Supports sources (web/images/news) and categories (github/research/pdf) that",
47
- "SearXNG does not. Use as an escape hatch or when web_search returns nothing.",
48
+ "SearXNG does not. Use only as an escape hatch when web_search fails/returns nothing, or when the user explicitly asks for Firecrawl/cloud search.",
48
49
  "Privacy: the query is sent to Firecrawl's cloud.",
49
50
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
50
51
  ].join(" "),
51
- promptSnippet: "Search the web via Firecrawl keyless (categories, sources, domain filters)",
52
+ promptSnippet: "Fallback-only Firecrawl search",
52
53
  promptGuidelines: [
53
- "Prefer web_search first; reach for firecrawl_search when web_search fails or returns nothing.",
54
- "Use categories=[\"github\"], [\"research\"], or [\"pdf\"] for source-type-specific discovery.",
55
- "Use includeDomains/excludeDomains to scope results to specific sites.",
54
+ "Use firecrawl_search only after web_search fails/returns nothing, for Firecrawl-only categories, or explicit cloud search.",
55
+ "Use categories=[\"github\"|\"research\"|\"pdf\"] and includeDomains/excludeDomains when needed.",
56
56
  ],
57
57
  parameters: FirecrawlSearchParamsSchema,
58
58
 
@@ -17,6 +17,26 @@ import registerFirecrawlScrape from "./firecrawl_scrape";
17
17
  import registerFirecrawlSearch from "./firecrawl_search";
18
18
  import registerFirecrawlInteract from "./firecrawl_interact";
19
19
 
20
+ const WEB_TOOL_ROUTING_POLICY = [
21
+ "Web tools are local-first: web_search=discover, web_fetch=one static URL, web_batch_fetch=2–5 static URLs, web_browse=interaction.",
22
+ "Use firecrawl_* only after the matching local tool failed in this conversation, or when the user explicitly asks for Firecrawl/cloud.",
23
+ "web_search/web_fetch/web_browse already auto-fallback to Firecrawl; pass full URLs with scheme and selectors when useful.",
24
+ ].join("\n");
25
+
26
+ const WEB_TOOL_NAMES = new Set([
27
+ "web_search",
28
+ "web_fetch",
29
+ "web_browse",
30
+ "web_batch_fetch",
31
+ "firecrawl_search",
32
+ "firecrawl_scrape",
33
+ "firecrawl_interact",
34
+ ]);
35
+
36
+ function shouldInjectWebToolRoutingPolicy(selectedTools: readonly string[] | undefined): boolean {
37
+ return selectedTools?.some((tool) => WEB_TOOL_NAMES.has(tool)) ?? false;
38
+ }
39
+
20
40
  export default function (pi: ExtensionAPI) {
21
41
  registerWebSearch(pi);
22
42
  registerWebFetch(pi);
@@ -25,4 +45,9 @@ export default function (pi: ExtensionAPI) {
25
45
  registerFirecrawlScrape(pi);
26
46
  registerFirecrawlSearch(pi);
27
47
  registerFirecrawlInteract(pi);
48
+
49
+ pi.on("before_agent_start", (event) => {
50
+ if (!shouldInjectWebToolRoutingPolicy(event.systemPromptOptions.selectedTools)) return;
51
+ return { systemPrompt: `${event.systemPrompt}\n\n${WEB_TOOL_ROUTING_POLICY}` };
52
+ });
28
53
  }
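To make the gating in this hunk concrete, the sketch below reproduces `WEB_TOOL_NAMES` and `shouldInjectWebToolRoutingPolicy` from the diff and adds one hypothetical helper, `appendPolicy`, that mimics what the `before_agent_start` handler does to the prompt. How pi treats a handler that returns nothing is not shown here; the helper assumes the prompt is left unchanged in that case.

```typescript
// Reproduced from the hunk above so the example is self-contained.
const WEB_TOOL_NAMES = new Set([
  "web_search",
  "web_fetch",
  "web_browse",
  "web_batch_fetch",
  "firecrawl_search",
  "firecrawl_scrape",
  "firecrawl_interact",
]);

function shouldInjectWebToolRoutingPolicy(selectedTools: readonly string[] | undefined): boolean {
  // `?.` yields undefined when no tool list is given; `?? false` turns that into "do not inject".
  return selectedTools?.some((tool) => WEB_TOOL_NAMES.has(tool)) ?? false;
}

// Hypothetical stand-in for the handler body. Assumption: when the handler
// returns nothing, the system prompt stays as it was.
function appendPolicy(systemPrompt: string, policy: string, selectedTools?: readonly string[]): string {
  return shouldInjectWebToolRoutingPolicy(selectedTools) ? `${systemPrompt}\n\n${policy}` : systemPrompt;
}
```

The policy text is therefore added once per agent start, and only when at least one of the seven web tools is selected, which is how the release moves shared routing rules out of the per-tool `promptGuidelines`.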
@@ -113,14 +113,10 @@ const webBatchFetchTool = defineTool({
113
113
  "For a single page, use web_fetch instead.",
114
114
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
115
115
  ].join(" "),
116
- promptSnippet: "Fetch multiple URLs in parallel for research",
116
+ promptSnippet: "Parallel fetch for 2–5 URLs",
117
117
  promptGuidelines: [
118
- "Use web_batch_fetch when web_search returns multiple (2–5) relevant pages and the agent needs to read them all at once.",
119
- "Prefer web_batch_fetch over repeated web_fetch calls when reading multiple pages for comparison or synthesis.",
120
- "Use web_batch_fetch for cross-referencing sources, comparing implementations, or synthesizing research from multiple sites.",
121
- "For a single URL, always use web_fetch — it supports per-URL selectors and stealthy mode.",
122
- "If a page in the batch fails, the tool reports the error but continues with the others.",
123
- "Keep batch sizes reasonable (≤8) to avoid overwhelming the browser and token budget.",
118
+ "Use web_batch_fetch for 2–5 pages to compare/cross-reference/synthesize; single URL web_fetch.",
119
+ "Keep batches small (≤8; schema max 15); failed pages are reported without stopping the batch.",
124
120
  ],
125
121
  parameters: WebBatchFetchParamsSchema,
126
122
 
@@ -106,22 +106,18 @@ const webBrowseTool = defineTool({
106
106
  name: "web_browse",
107
107
  label: "Web Browse",
108
108
  description: [
109
- "Interact with a web page through a browser: navigate, click, fill forms, scroll,",
109
+ "Primary local-first tool for interactive web pages: navigate, click, fill forms, scroll,",
110
110
  "wait for content, and then extract text.",
111
- "Uses the agent-browser CLI with batched JSON commands.",
111
+ "Uses the agent-browser CLI with batched JSON commands, then automatically tries Firecrawl keyless only if local browser automation fails.",
112
112
  "Use web_browse when the target content requires interaction (clicking buttons,",
113
113
  "scrolling, filling search boxes, waiting for JS to load) before it becomes available.",
114
114
  "For pages that need no interaction, use web_fetch instead.",
115
115
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
116
116
  ].join(" "),
117
- promptSnippet: "Interact with a web page (click, scroll, fill) and extract content",
117
+ promptSnippet: "Local browser interaction and extraction",
118
118
  promptGuidelines: [
119
- "Use web_browse when a page requires clicking, scrolling, or form submission before showing target content.",
120
- "Use web_browse for SPAs, pagination (click 'Load more'), search forms, tab switching, and modal dialogs.",
121
- "For static articles, docs, or blogs that load everything on first request, prefer web_fetch.",
122
- "After web_search returns results, prefer web_fetch for reading individual articles.",
123
- "Use web_browse directly when interaction is required; otherwise try web_fetch first.",
124
- "Always provide a selector to extract only the relevant content area — avoid dumping full page text.",
119
+ "Use web_browse only when clicks/forms/scroll/wait are needed; otherwise use web_fetch.",
120
+ "Provide a selector to narrow extracted content when possible.",
125
121
  ],
126
122
  parameters: WebBrowseParamsSchema,
127
123
 
@@ -40,19 +40,15 @@ const webFetchTool = defineTool({
40
40
  name: "web_fetch",
41
41
  label: "Web Fetch",
42
42
  description: [
43
- "Fetch and extract readable content from a web page URL.",
44
- "Uses scrapling to download the page and convert it to clean markdown.",
45
- "Use web_fetch to read the full content of a specific result or user-provided URL.",
43
+ "Primary local-first tool for reading a single web page URL.",
44
+ "Fetches and extracts readable content via scrapling, then automatically tries Firecrawl keyless only if the local fetcher fails.",
45
+ "Use web_fetch as the first attempt to read the full content of a specific result or user-provided URL.",
46
46
  "Callers remain responsible for robots.txt and site terms; Scrapling extract commands do not enforce them automatically.",
47
47
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
48
48
  ].join(" "),
49
- promptSnippet: "Fetch full page content from a URL as markdown",
49
+ promptSnippet: "Local-first fetch of one URL as markdown",
50
50
  promptGuidelines: [
51
- "Use web_fetch to read a single page (article, doc, or blog) that needs no interaction.",
52
- "For a single URL, always use web_fetch instead of web_batch_fetch.",
53
- "If the page is dynamic/JavaScript-heavy, the tool automatically uses browser automation.",
54
- "When reading multiple (2–5) pages at once (e.g., after web_search), prefer web_batch_fetch over repeated web_fetch calls.",
55
- "Always pass the full URL including https://.",
51
+ "Use web_fetch for one non-interactive URL; use web_batch_fetch for 2–5 URLs.",
56
52
  ],
57
53
  parameters: WebFetchParamsSchema,
58
54
 
@@ -51,19 +51,18 @@ const webSearchTool = defineTool({
51
51
  name: "web_search",
52
52
  label: "Web Search",
53
53
  description: [
54
- "Search the web using a SearXNG instance.",
54
+ "Primary local-first tool for web discovery via a SearXNG instance.",
55
55
  "Returns a list of results with title, URL, and snippet.",
56
56
  "Automatically aggregates up to 3 pages of SearXNG results when more than ~20 are needed.",
57
+ "Use web_search as the first attempt for web search; it automatically tries Firecrawl keyless only if SearXNG fails or returns nothing.",
57
58
  "Use web_search when the user asks about current events, facts, or anything",
58
59
  "that requires up-to-date information beyond the model's training data.",
59
60
  `Output is truncated to ${DEFAULT_MAX_LINES} lines or ${formatSize(DEFAULT_MAX_BYTES)}; if truncated, full output is saved to a temp file.`,
60
61
  ].join(" "),
61
- promptSnippet: "Search the web for current information",
62
+ promptSnippet: "Local web search via SearXNG",
62
63
  promptGuidelines: [
63
- "Use web_search when the user asks about recent events, current data, or external facts.",
64
- "Use web_search to verify claims, find documentation, or discover resources online.",
65
- "If web_search returns no results but includes suggestions, consider using a suggested query to refine your search.",
66
- "If web_search returns multiple (2–5) relevant results that all need to be read, prefer web_batch_fetch to fetch them in parallel instead of calling web_fetch repeatedly.",
64
+ "Use web_search for current/external facts, verification, docs, and discovery.",
65
+ "If 2–5 results need reading, use web_batch_fetch; retry suggested queries when results are empty.",
67
66
  ],
68
67
  parameters: WebSearchParamsSchema,
69
68
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-web-toolkit",
3
- "version": "0.3.1",
3
+ "version": "0.3.2",
4
4
  "description": "Web research toolkit for the pi coding agent. Search via SearXNG, fetch pages with scrapling, browse interactively via agent-browser, batch-read sources in parallel, and optionally fall back to Firecrawl Keyless (no API key) when a local backend fails.",
5
5
  "author": "Wade Huang <fastwade11@gmail.com>",
6
6
  "license": "MIT",
@@ -19,7 +19,7 @@
19
19
  },
20
20
  "scripts": {
21
21
  "typecheck": "tsc --noEmit",
22
- "test": "tsx test/content-preview/test.ts && tsx test/agent-browser/test.ts && tsx test/firecrawl/test.ts",
22
+ "test": "tsx test/content-preview/test.ts && tsx test/agent-browser/test.ts && tsx test/firecrawl/test.ts && tsx test/tool-routing/test.ts",
23
23
  "test:agent-browser": "tsx test/agent-browser/test.ts",
24
24
  "test:firecrawl": "tsx test/firecrawl/test.ts",
25
25
  "test:approve": "tsx test/content-preview/test.ts --approve"