@staticn0va/wigolo 0.5.1 → 0.6.0
- package/SKILL.md +253 -89
- package/dist/fetch/router.d.ts +1 -0
- package/dist/fetch/router.d.ts.map +1 -1
- package/dist/fetch/router.js.map +1 -1
- package/dist/instructions.d.ts +3 -3
- package/dist/instructions.d.ts.map +1 -1
- package/dist/instructions.js +19 -4
- package/dist/instructions.js.map +1 -1
- package/dist/search/find-similar.d.ts.map +1 -1
- package/dist/search/find-similar.js +136 -29
- package/dist/search/find-similar.js.map +1 -1
- package/dist/server.d.ts.map +1 -1
- package/dist/server.js +47 -3
- package/dist/server.js.map +1 -1
- package/dist/tools/fetch.d.ts.map +1 -1
- package/dist/tools/fetch.js +6 -4
- package/dist/tools/fetch.js.map +1 -1
- package/dist/tools/search.d.ts +2 -2
- package/dist/tools/search.d.ts.map +1 -1
- package/dist/tools/search.js +55 -8
- package/dist/tools/search.js.map +1 -1
- package/dist/types.d.ts +8 -0
- package/dist/types.d.ts.map +1 -1
- package/package.json +2 -1
package/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: wigolo
|
|
3
|
-
description: Local-first web
|
|
3
|
+
description: Local-first web access MCP server for AI coding agents. Eight tools for search, fetch, crawl, cache, extract, find similar, research, and agent-driven data gathering. No API keys. Results cached in local SQLite.
|
|
4
4
|
author: KnockOutEZ
|
|
5
5
|
license: BUSL-1.1
|
|
6
6
|
repository: https://github.com/KnockOutEZ/wigolo
|
|
@@ -9,29 +9,29 @@ install: npx @staticn0va/wigolo
|
|
|
9
9
|
runtime: node
|
|
10
10
|
min_runtime_version: "20"
|
|
11
11
|
tools:
|
|
12
|
-
- name: search
|
|
13
|
-
description: Search the web and return results with optional full content extraction. Supports domain filtering, date ranges, categories, and ML reranking.
|
|
14
12
|
- name: fetch
|
|
15
|
-
description: Fetch
|
|
13
|
+
description: Fetch one URL, return clean markdown. Auto-routes between HTTP and Playwright. Supports sections, auth, screenshots, browser actions.
|
|
14
|
+
- name: search
|
|
15
|
+
description: Search the web, return extracted markdown per result. Single query or array of query variants. Domain, category, date filters. Optional synthesized answer via MCP sampling.
|
|
16
16
|
- name: crawl
|
|
17
|
-
description: Crawl a
|
|
17
|
+
description: Crawl a site from a seed URL. BFS, DFS, sitemap, or map (URL-only) strategies with regex include/exclude filters.
|
|
18
18
|
- name: cache
|
|
19
|
-
description:
|
|
19
|
+
description: FTS5 search over previously fetched content. URL glob, date filters, stats, clear, and change detection via re-fetch.
|
|
20
20
|
- name: extract
|
|
21
|
-
description:
|
|
21
|
+
description: Structured extraction from URL or raw HTML. Modes: selector (CSS), tables, metadata (meta + JSON-LD), schema (heuristic field matching).
|
|
22
22
|
- name: find_similar
|
|
23
|
-
description: Find pages
|
|
23
|
+
description: Find pages similar to a URL or concept. Hybrid cache (FTS5 + embeddings) + optional web supplement.
|
|
24
24
|
- name: research
|
|
25
|
-
description:
|
|
25
|
+
description: Multi-step research pipeline. Question decomposition, parallel sub-search, source synthesis with citations. Quick, standard, or comprehensive depth.
|
|
26
26
|
- name: agent
|
|
27
|
-
description:
|
|
27
|
+
description: Natural-language data gathering. Plans searches/URLs, fetches in parallel within page and time budgets, optionally applies a JSON Schema to each page.
|
|
28
28
|
---
|
|
29
29
|
|
|
30
30
|
# wigolo
|
|
31
31
|
|
|
32
|
-
Local-first web search MCP server for AI coding agents.
|
|
32
|
+
Local-first web search MCP server for AI coding agents. Ships eight tools over stdio. All network results land in a local SQLite cache.
|
|
33
33
|
|
|
34
|
-
##
|
|
34
|
+
## Quick Setup
|
|
35
35
|
|
|
36
36
|
**Claude Code:**
|
|
37
37
|
```bash
|
|
@@ -50,147 +50,311 @@ claude mcp add wigolo -- npx @staticn0va/wigolo
|
|
|
50
50
|
}
|
|
51
51
|
```
|
|
52
52
|
|
|
53
|
-
**
|
|
53
|
+
**Warmup (recommended, one-time):**
|
|
54
54
|
```bash
|
|
55
|
-
npx @staticn0va/wigolo warmup
|
|
55
|
+
npx @staticn0va/wigolo warmup # installs Playwright Chromium + bootstraps SearXNG
|
|
56
|
+
npx @staticn0va/wigolo warmup --all # also installs Firefox, WebKit, reranker, embeddings, trafilatura
|
|
57
|
+
npx @staticn0va/wigolo warmup --force # wipe SearXNG state and rebuild
|
|
56
58
|
```
|
|
57
59
|
|
|
60
|
+
Warmup flags: `--force`, `--all`, `--trafilatura`, `--reranker`, `--firefox`, `--webkit`, `--embeddings`, `--lightpanda`.
|
|
61
|
+
|
|
58
62
|
## Tools
|
|
59
63
|
|
|
60
|
-
###
|
|
61
|
-
|
|
64
|
+
### fetch
|
|
65
|
+
|
|
66
|
+
Fetch a single URL and return clean markdown. Use when you already have a specific URL.
|
|
67
|
+
|
|
68
|
+
Parameters:
|
|
69
|
+
- `url` (string, required)
|
|
70
|
+
- `render_js`: `"auto"` (default) | `"always"` | `"never"`
|
|
71
|
+
- `use_auth`: boolean (default `false`) — reuses the user's browser session
|
|
72
|
+
- `max_chars`: number
|
|
73
|
+
- `section`: string — return only the content under a heading
|
|
74
|
+
- `section_index`: number (default `0`) — which heading match when multiple hit
|
|
75
|
+
- `screenshot`: boolean (default `false`)
|
|
76
|
+
- `headers`: object
|
|
77
|
+
- `force_refresh`: boolean — bypass cache
|
|
78
|
+
- `actions`: array of `{type, selector, text, ms, timeout, direction, amount}` — `click`, `type`, `wait`, `wait_for`, `scroll`, `screenshot`. Forces Playwright when present.
|
|
79
|
+
|
|
80
|
+
Example:
|
|
62
81
|
```json
|
|
63
|
-
{ "
|
|
82
|
+
{ "url": "https://react.dev/reference/react/useState", "section": "Parameters" }
|
|
64
83
|
```
|
|
65
84
|
|
|
66
|
-
|
|
67
|
-
|
|
85
|
+
Tip: `section` is much cheaper than reading the full page. Repeat fetches of the same URL are free from cache unless `force_refresh: true`.
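
The parameter list above implies a choice between plain HTTP and a Playwright browser. The following is a minimal TypeScript sketch of that choice, not the package's router. The names `FetchParams`, `FetchAction`, `Route`, and `pickRoute` are hypothetical, and the handling of `use_auth` and of `"auto"` is an assumption; only the rule that `actions` forces Playwright comes from the text.

```typescript
// Hypothetical subset of the documented `fetch` parameters.
type RenderJs = "auto" | "always" | "never";

interface FetchAction {
  type: "click" | "type" | "wait" | "wait_for" | "scroll" | "screenshot";
  selector?: string;
  text?: string;
  ms?: number;
}

interface FetchParams {
  url: string;
  render_js?: RenderJs;
  use_auth?: boolean;
  section?: string;
  actions?: FetchAction[];
}

type Route = "browser" | "http" | "http-then-browser";

// Decide which transport a request would need.
function pickRoute(params: FetchParams): Route {
  // Documented: `actions` forces Playwright (assumed here: only when non-empty).
  if (params.actions && params.actions.length > 0) return "browser";
  // Assumption: an authenticated session also runs in the browser.
  if (params.render_js === "always" || params.use_auth === true) return "browser";
  if (params.render_js === "never") return "http";
  // Assumption: "auto" tries HTTP first and escalates only if needed.
  return "http-then-browser";
}
```

Checking `actions` before `render_js` means that a request carrying both `render_js: "never"` and browser actions still goes to the browser, which matches the documented behavior of `actions`.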
|
|
86
|
+
|
|
87
|
+
### search
|
|
88
|
+
|
|
89
|
+
Search the web and return extracted markdown per result. Use when you don't have a URL yet.
|
|
90
|
+
|
|
91
|
+
Parameters:
|
|
92
|
+
- `query` (string OR `string[]`, required) — array runs variants in parallel and dedupes
|
|
93
|
+
- `max_results`: number (default `5`, cap `20`)
|
|
94
|
+
- `include_content`: boolean (default `true`)
|
|
95
|
+
- `content_max_chars`: number (default `30000`)
|
|
96
|
+
- `max_total_chars`: number (default `50000`)
|
|
97
|
+
- `time_range`: `"day"` | `"week"` | `"month"` | `"year"`
|
|
98
|
+
- `include_domains` / `exclude_domains`: `string[]`
|
|
99
|
+
- `from_date` / `to_date`: ISO `YYYY-MM-DD`
|
|
100
|
+
- `category`: `"general"` | `"news"` | `"code"` | `"docs"` | `"papers"` | `"images"`
|
|
101
|
+
- `language`: string
|
|
102
|
+
- `search_engines`: `string[]` — override engine selection
|
|
103
|
+
- `format`: `"full"` (default) | `"context"` (token-budgeted string) | `"answer"` (synthesized via MCP sampling) | `"stream_answer"` (answer + phase progress notifications)
|
|
104
|
+
- `force_refresh`: boolean
|
|
105
|
+
|
|
106
|
+
Example:
|
|
68
107
|
```json
|
|
69
|
-
{ "
|
|
108
|
+
{ "query": ["react server components patterns", "RSC data fetching", "react server components streaming"], "category": "docs", "include_domains": ["react.dev"], "max_results": 5 }
|
|
70
109
|
```
|
|
71
110
|
|
|
111
|
+
Tip: keyword queries beat natural-language questions. A 3–5 item `query` array usually finds more unique sources than one longer query.
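
To illustrate what "runs variants in parallel and dedupes" could mean for the merged result list, here is a hedged TypeScript sketch of the dedupe step alone. `SearchHit`, `normalizeUrl`, and `mergeVariants` are hypothetical names, and both the normalization rule and the round-robin ordering are assumptions, not the package's algorithm.

```typescript
interface SearchHit {
  url: string;
  title: string;
}

// Assumed rule: URLs that differ only by fragment or trailing slash are the same page.
function normalizeUrl(raw: string): string {
  return raw.replace(/#.*$/, "").replace(/\/+$/, "");
}

// Merge per-variant result lists into one list without duplicate URLs.
function mergeVariants(resultSets: SearchHit[][], maxResults: number): SearchHit[] {
  const seen: Record<string, boolean> = {};
  const merged: SearchHit[] = [];
  const longest = Math.max(0, ...resultSets.map((set) => set.length));
  // Round-robin by rank so each variant contributes its best hits first.
  for (let rank = 0; rank < longest; rank++) {
    for (const set of resultSets) {
      const hit = set[rank];
      if (hit === undefined) continue;
      const key = normalizeUrl(hit.url);
      if (seen[key] === true) continue;
      seen[key] = true;
      merged.push(hit);
      if (merged.length >= maxResults) return merged;
    }
  }
  return merged;
}
```

Interleaving by rank keeps one long result list from crowding out the top hits of the other variants before `max_results` is reached.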
|
|
112
|
+
|
|
72
113
|
### crawl
|
|
73
|
-
|
|
114
|
+
|
|
115
|
+
Crawl a site starting from a seed URL.
|
|
116
|
+
|
|
117
|
+
Parameters:
|
|
118
|
+
- `url` (string, required)
|
|
119
|
+
- `strategy`: `"bfs"` (default) | `"dfs"` | `"sitemap"` | `"map"` (URL-only discovery, no content)
|
|
120
|
+
- `max_depth`: number (default `2`)
|
|
121
|
+
- `max_pages`: number (default `20`)
|
|
122
|
+
- `include_patterns` / `exclude_patterns`: regex `string[]`
|
|
123
|
+
- `use_auth`: boolean (default `false`)
|
|
124
|
+
- `extract_links`: boolean (default `false`) — returns inter-page link graph
|
|
125
|
+
- `max_total_chars`: number (default `100000`)
|
|
126
|
+
|
|
127
|
+
Example:
|
|
74
128
|
```json
|
|
75
|
-
{ "url": "https://docs.
|
|
129
|
+
{ "url": "https://docs.python.org/3/library/", "strategy": "sitemap", "max_pages": 30, "include_patterns": ["^https://docs\\.python\\.org/3/library/asyncio"] }
|
|
76
130
|
```
|
|
77
131
|
|
|
132
|
+
Tip: `strategy: "sitemap"` is faster and more complete than BFS on doc sites. `strategy: "map"` returns URLs only — cheap way to scope before targeted fetches.
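
The `include_patterns` and `exclude_patterns` filters can be pictured as a per-URL scope check. The sketch below is an assumption-laden illustration: `inScope` is a hypothetical helper, patterns are treated as JavaScript regular expressions, exclusion is assumed to win over inclusion, and an empty include list is assumed to allow everything.

```typescript
// Return true when a discovered URL should be crawled.
function inScope(url: string, include: string[], exclude: string[]): boolean {
  const matches = (pattern: string): boolean => new RegExp(pattern).test(url);
  // Assumption: any exclude match rejects the URL, even if an include pattern also matches.
  if (exclude.some(matches)) return false;
  // Assumption: no include patterns means the whole site is in scope.
  return include.length === 0 || include.some(matches);
}
```

With the pattern from the example above, `https://docs.python.org/3/library/asyncio-task.html` is in scope and `https://docs.python.org/3/library/json.html` is not.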
|
|
133
|
+
|
|
78
134
|
### cache
|
|
79
|
-
|
|
135
|
+
|
|
136
|
+
Search previously fetched content without hitting the network.
|
|
137
|
+
|
|
138
|
+
Parameters:
|
|
139
|
+
- `query`: FTS5 syntax — supports `AND`, `OR`, `NOT`, `"exact phrase"`
|
|
140
|
+
- `url_pattern`: glob (e.g. `"*react.dev*"`)
|
|
141
|
+
- `since`: ISO date
|
|
142
|
+
- `stats`: boolean — returns total URLs, size, date range
|
|
143
|
+
- `clear`: boolean — deletes matching entries (requires one of `query`, `url_pattern`, `since`)
|
|
144
|
+
- `check_changes`: boolean — re-fetches matching URLs, reports changed/unchanged with diff summaries
|
|
145
|
+
|
|
146
|
+
Example:
|
|
80
147
|
```json
|
|
81
|
-
{ "query": "
|
|
148
|
+
{ "query": "useState OR useReducer", "url_pattern": "*react.dev*" }
|
|
82
149
|
```
|
|
83
150
|
|
|
151
|
+
Tip: cache hits are instant and cross-session. Run this before `search` or `fetch` when you suspect the content is already on disk.
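
For readers unsure how a `url_pattern` glob such as `"*react.dev*"` selects entries, the following TypeScript sketch converts a glob to a regular expression. It assumes that `*` is the only wildcard and that matching is case-sensitive; `globToRegExp` is a hypothetical helper, and the actual cache may rely on a different matcher.

```typescript
// Convert a glob where `*` means "any run of characters" into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    // Escape regex metacharacters in the literal parts between wildcards.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$");
}
```

Escaping the literal parts matters: without it, the `.` in `react.dev` would match any character and the pattern would be looser than intended.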
|
|
152
|
+
|
|
84
153
|
### extract
|
|
85
|
-
|
|
154
|
+
|
|
155
|
+
Structured extraction from URL or raw HTML.
|
|
156
|
+
|
|
157
|
+
Parameters:
|
|
158
|
+
- `url` OR `html` (one required; `url` wins if both provided)
|
|
159
|
+
- `mode`: `"metadata"` (default) | `"selector"` | `"tables"` | `"schema"`
|
|
160
|
+
- `css_selector`: string — required for `mode: "selector"`
|
|
161
|
+
- `multiple`: boolean (default `false`) — return all matches, selector mode only
|
|
162
|
+
- `schema`: JSON Schema object with `properties` — required for `mode: "schema"`
|
|
163
|
+
|
|
164
|
+
Example:
|
|
86
165
|
```json
|
|
87
|
-
{ "url": "https://example.com/product", "mode": "schema", "schema": { "type": "object", "properties": { "price": { "type": "string" }, "name": { "type": "string" } } } }
|
|
166
|
+
{ "url": "https://example.com/product", "mode": "schema", "schema": { "type": "object", "properties": { "price": { "type": "string" }, "name": { "type": "string" }, "sku": { "type": "string" } } } }
|
|
88
167
|
```
|
|
89
168
|
|
|
169
|
+
Tip: `mode: "schema"` does heuristic matching over CSS classes, ARIA labels, microdata, and JSON-LD — no LLM call required.
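
The parameter rules above (one of `url` or `html`, `css_selector` for selector mode, a `schema` with `properties` for schema mode) can be checked before a call is made. This is a minimal client-side sketch; `ExtractParams` and `validateExtract` are hypothetical names, and the error messages are illustrative rather than the handler's own.

```typescript
type ExtractMode = "metadata" | "selector" | "tables" | "schema";

interface ExtractParams {
  url?: string;
  html?: string;
  mode?: ExtractMode;
  css_selector?: string;
  multiple?: boolean;
  schema?: { type?: string; properties?: Record<string, unknown> };
}

// Return a list of problems; an empty list means the call looks well-formed.
function validateExtract(params: ExtractParams): string[] {
  const errors: string[] = [];
  const mode: ExtractMode = params.mode ?? "metadata"; // documented default
  if (!params.url && !params.html) {
    errors.push("one of `url` or `html` is required");
  }
  if (mode === "selector" && !params.css_selector) {
    errors.push("`css_selector` is required for mode \"selector\"");
  }
  if (mode === "schema" && (!params.schema || !params.schema.properties)) {
    errors.push("`schema` with `properties` is required for mode \"schema\"");
  }
  return errors;
}
```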
|
|
170
|
+
|
|
90
171
|
### find_similar
|
|
91
|
-
|
|
172
|
+
|
|
173
|
+
Find pages related to a URL or a free-text concept.
|
|
174
|
+
|
|
175
|
+
Parameters:
|
|
176
|
+
- `url` OR `concept` (one required)
|
|
177
|
+
- `max_results`: number (default `10`, cap `50`)
|
|
178
|
+
- `include_domains` / `exclude_domains`: `string[]`
|
|
179
|
+
- `include_cache`: boolean (default `true`)
|
|
180
|
+
- `include_web`: boolean (default `true`)
|
|
181
|
+
|
|
182
|
+
Example:
|
|
92
183
|
```json
|
|
93
|
-
{ "url": "https://react.dev/reference/react/useState", "max_results":
|
|
184
|
+
{ "url": "https://react.dev/reference/react/useState", "max_results": 8, "include_domains": ["react.dev", "developer.mozilla.org"] }
|
|
94
185
|
```
|
|
95
186
|
|
|
187
|
+
Tip: uses hybrid 3-way search — FTS5 over titles, FTS5 over body, plus embeddings when available. Cache path is near-instant; web supplement runs only if cache yields too few results.
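
The tip does not say how the three result lists are combined. One common technique for merging ranked lists is reciprocal rank fusion, sketched below in TypeScript purely as an illustration; it is not confirmed to be what `find_similar` does. `fuseRankings` and the constant `k = 60` are assumptions.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item, ranks starting at 1.
function fuseRankings(rankings: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((url, index) => {
      scores[url] = (scores[url] ?? 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => (scores[b] ?? 0) - (scores[a] ?? 0));
}
```

A page that appears in all three lists outranks one that tops a single list, which suits a hybrid search where no single signal is fully trusted.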
|
|
188
|
+
|
|
96
189
|
### research
|
|
97
|
-
Deep multi-step research that plans queries, fetches, and synthesizes.
|
|
98
|
-
```json
|
|
99
|
-
{ "question": "How do modern bundlers handle tree-shaking of ESM vs CJS", "depth": "standard", "max_sources": 10 }
|
|
100
|
-
```
|
|
101
190
|
|
|
102
|
-
|
|
103
|
-
|
|
191
|
+
Multi-step research pipeline with decomposition, parallel search, and cited synthesis.
|
|
192
|
+
|
|
193
|
+
Parameters:
|
|
194
|
+
- `question` (string, required)
|
|
195
|
+
- `depth`: `"quick"` (~15s, 2 sub-queries, 5–8 sources) | `"standard"` (~40s, default) | `"comprehensive"` (~80s, 7 sub-queries, 20–25 sources)
|
|
196
|
+
- `max_sources`: number (cap `50`) — overrides depth default
|
|
197
|
+
- `include_domains` / `exclude_domains`: `string[]`
|
|
198
|
+
- `schema`: JSON Schema — if present, report is structured to fill these fields
|
|
199
|
+
- `stream`: boolean — emit progress notifications per phase
|
|
200
|
+
|
|
201
|
+
Example:
|
|
104
202
|
```json
|
|
105
|
-
{ "
|
|
203
|
+
{ "question": "How do modern JS bundlers tree-shake ESM vs CJS?", "depth": "standard", "include_domains": ["webpack.js.org", "rollupjs.org", "esbuild.github.io", "vitejs.dev"] }
|
|
106
204
|
```
|
|
107
205
|
|
|
108
|
-
|
|
206
|
+
Tip: `research` checks the cache internally, so there is no need to pre-probe. Synthesis requires an MCP client that supports sampling; without sampling, the tool returns raw sources in context format.
|
|
109
207
|
|
|
110
|
-
|
|
208
|
+
### agent
|
|
111
209
|
|
|
112
|
-
|
|
210
|
+
Natural-language data gathering. Plans queries and URLs from a prompt, runs them in parallel within budget, optionally applies a schema.
|
|
113
211
|
|
|
114
|
-
|
|
212
|
+
Parameters:
|
|
213
|
+
- `prompt` (string, required)
|
|
214
|
+
- `urls`: `string[]` — seed URLs to include
|
|
215
|
+
- `schema`: JSON Schema — extract structured fields per page and merge
|
|
216
|
+
- `max_pages`: number (default `10`, cap `100`)
|
|
217
|
+
- `max_time_ms`: number (default `60000`, cap `600000`)
|
|
218
|
+
- `stream`: boolean
|
|
115
219
|
|
|
116
|
-
|
|
220
|
+
Example:
|
|
221
|
+
```json
|
|
222
|
+
{ "prompt": "Compare pricing tiers for Supabase, Firebase, and Clerk", "schema": { "type": "object", "properties": { "provider": { "type": "string" }, "free_tier": { "type": "string" }, "paid_start": { "type": "string" } } }, "max_pages": 12 }
|
|
223
|
+
```
|
|
117
224
|
|
|
118
|
-
|
|
225
|
+
Tip: output includes a `steps` array showing every action (plan, search, fetch, extract, synthesize) with timings. Use this to debug why an agent run produced a weak result.
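
The defaults and caps listed for `max_pages` and `max_time_ms` can be applied on the caller's side before invoking `agent`. The sketch below assumes that out-of-range values are clamped rather than rejected, which the text does not state; `resolveAgentBudget` and `clampBudget` are hypothetical helpers.

```typescript
interface AgentBudget {
  max_pages: number;
  max_time_ms: number;
}

// Use the fallback for missing or invalid values, otherwise cap the value.
function clampBudget(value: number | undefined, fallback: number, cap: number): number {
  if (value === undefined || !isFinite(value) || value < 1) return fallback;
  return Math.min(Math.floor(value), cap);
}

// Documented defaults and caps: 10 / 100 pages, 60000 / 600000 ms.
function resolveAgentBudget(input: { max_pages?: number; max_time_ms?: number }): AgentBudget {
  return {
    max_pages: clampBudget(input.max_pages, 10, 100),
    max_time_ms: clampBudget(input.max_time_ms, 60000, 600000),
  };
}
```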
|
|
119
226
|
|
|
120
|
-
|
|
227
|
+
## Workflow Patterns
|
|
121
228
|
|
|
122
|
-
|
|
229
|
+
Quick routing:
|
|
230
|
+
- Use `search` when you need information but don't have a URL.
|
|
231
|
+
- Use `fetch` when you already have the URL.
|
|
232
|
+
- Use `crawl` when you need multiple pages from one site.
|
|
233
|
+
- Use `cache` when you want to check whether something is already on disk.
|
|
234
|
+
- Use `extract` when you need specific fields, tables, or metadata, not the whole page.
|
|
235
|
+
- Use `find_similar` when you have a good page/concept and want related content.
|
|
236
|
+
- Use `research` when a question needs decomposition and multi-source synthesis.
|
|
237
|
+
- Use `agent` when a natural-language task needs multi-step data gathering.
|
|
238
|
+
|
|
239
|
+
**Cache-first lookup.** Before any `fetch` or `search`, probe the cache.
|
|
240
|
+
```js
|
|
241
|
+
cache({ "query": "oauth2 pkce", "url_pattern": "*auth0.com*" })
|
|
242
|
+
// empty? fall through to search
|
|
243
|
+
search({ "query": "oauth2 pkce flow", "include_domains": ["auth0.com"] })
|
|
244
|
+
```
|
|
123
245
|
|
|
124
|
-
**
|
|
246
|
+
**Fresh content (news, dashboards, changelogs).** Bypass cache explicitly.
|
|
247
|
+
```js
|
|
248
|
+
search({ "query": "node.js 22 release notes", "force_refresh": true, "time_range": "week" })
|
|
249
|
+
fetch({ "url": "https://nodejs.org/en/blog", "force_refresh": true })
|
|
250
|
+
```
|
|
125
251
|
|
|
126
|
-
**
|
|
252
|
+
**Scoped documentation research.** Crawl the relevant slice, then query cache.
|
|
253
|
+
```js
|
|
254
|
+
crawl({ "url": "https://docs.astro.build", "strategy": "sitemap", "max_pages": 40 })
|
|
255
|
+
cache({ "query": "server islands hydration", "url_pattern": "*docs.astro.build*" })
|
|
256
|
+
```
|
|
127
257
|
|
|
128
|
-
|
|
258
|
+
**Broad exploration.** Pass a query array; dedup is automatic.
|
|
259
|
+
```js
|
|
260
|
+
search({ "query": ["rust async runtimes comparison", "tokio vs async-std vs smol", "rust executor benchmarks"], "max_results": 8 })
|
|
261
|
+
```
|
|
129
262
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
- `from_date` / `to_date` for time-sensitive queries
|
|
135
|
-
- `format: "context"` returns a single token-budgeted string for LLM injection
|
|
263
|
+
**More like this.** Start with a known-good URL, widen via `find_similar`.
|
|
264
|
+
```js
|
|
265
|
+
find_similar({ "url": "https://react.dev/reference/react/useMemo", "max_results": 6, "include_domains": ["react.dev"] })
|
|
266
|
+
```
|
|
136
267
|
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
268
|
+
**Complex synthesis.** One `research` call replaces 5+ manual search/fetch cycles.
|
|
269
|
+
```js
|
|
270
|
+
research({ "question": "Tradeoffs of vector DBs for RAG at 100M+ embeddings", "depth": "comprehensive" })
|
|
271
|
+
```
|
|
141
272
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
273
|
+
**Structured data from multiple sources.** Use `agent` with a schema.
|
|
274
|
+
```js
|
|
275
|
+
agent({ "prompt": "Find latency and pricing for top 5 edge compute providers", "schema": { "type": "object", "properties": { "provider": {"type":"string"}, "cold_start_ms": {"type":"string"}, "price_per_million": {"type":"string"} } } })
|
|
276
|
+
```
|
|
146
277
|
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
278
|
+
**Table extraction.** Skip markdown entirely.
|
|
279
|
+
```js
|
|
280
|
+
extract({ "url": "https://en.wikipedia.org/wiki/List_of_programming_languages", "mode": "tables" })
|
|
281
|
+
```
|
|
150
282
|
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
283
|
+
## Parameter Cheat Sheet
|
|
284
|
+
|
|
285
|
+
| Situation | Tool + parameters |
|
|
286
|
+
|---|---|
|
|
287
|
+
| Focused lookup, known site | `search` + `max_results: 3` + `include_domains` |
|
|
288
|
+
| Broad topic survey | `search` + `query: [...3-5 variants]` + `max_results: 8` |
|
|
289
|
+
| Fresh content required | any tool + `force_refresh: true` |
|
|
290
|
+
| Doc site indexing | `crawl` + `strategy: "sitemap"` |
|
|
291
|
+
| Site URL inventory only | `crawl` + `strategy: "map"` |
|
|
292
|
+
| Single heading from long page | `fetch` + `section: "..."` |
|
|
293
|
+
| Behind login | `fetch` / `crawl` + `use_auth: true` |
|
|
294
|
+
| Direct answer (sampling client) | `search` + `format: "answer"` |
|
|
295
|
+
| LLM-ready context blob | `search` + `format: "context"` |
|
|
296
|
+
| Complex question, multi-source | `research` + `depth: "standard"` |
|
|
297
|
+
| Structured multi-page extraction | `agent` + `schema` |
|
|
298
|
+
| One-page structured data | `extract` + `mode: "schema"` or `"tables"` |
|
|
299
|
+
| Change tracking | `cache` + `check_changes: true` |
|
|
155
300
|
|
|
156
301
|
## Anti-Patterns
|
|
157
302
|
|
|
158
|
-
|
|
303
|
+
**Do not skip the cache.** Running `search` or `fetch` without probing `cache` wastes time on content already on disk. `research` and `agent` check cache internally; manual `search`/`fetch` do not.
|
|
159
304
|
|
|
160
|
-
**Do not
|
|
305
|
+
**Do not send natural-language questions to `search`.** Use keywords. `"how do I debounce in React hooks"` loses to `"react useDebounce hook custom"`.
|
|
161
306
|
|
|
162
|
-
**Do not
|
|
307
|
+
**Do not retry an identical failing query.** Reformulate keywords, swap `category`, or add `include_domains`. Same query → same empty result.
|
|
163
308
|
|
|
164
|
-
**Do not
|
|
309
|
+
**Do not use `agent` or `research` for one-URL lookups.** Use `fetch`. `agent` is for multi-source gathering; `research` is for decomposable questions.
|
|
165
310
|
|
|
166
|
-
**Do not
|
|
311
|
+
**Do not crawl with `max_pages: 100` and no filters.** Always add `include_patterns` to stay in scope. Unfiltered crawls fetch nav, footer, and sitemap garbage.
|
|
167
312
|
|
|
168
|
-
**Do not
|
|
313
|
+
**Do not fetch whole pages when you need one section.** `fetch` + `section` reads under one heading only.
|
|
169
314
|
|
|
170
|
-
**Do not
|
|
315
|
+
**Do not set `force_refresh: true` by default.** It defeats the cache. Use it for news, status, changelogs — content that actually churns.
|
|
171
316
|
|
|
172
|
-
**Do not
|
|
317
|
+
**Do not pass a JSON Schema to `extract` without `properties`.** The handler rejects schemas that lack a `properties` key.
|
|
173
318
|
|
|
174
|
-
|
|
319
|
+
## CLI Commands
|
|
320
|
+
|
|
321
|
+
```bash
|
|
322
|
+
wigolo # default: start MCP server on stdio
|
|
323
|
+
wigolo mcp # explicit: start MCP server
|
|
324
|
+
wigolo warmup [flags] # install Playwright, bootstrap SearXNG, optional extras
|
|
325
|
+
wigolo serve # start HTTP daemon on WIGOLO_DAEMON_PORT (default 3333)
|
|
326
|
+
wigolo health # health probe, exits 0 if ok
|
|
327
|
+
wigolo doctor # environment diagnostics (Python, Docker, Playwright, SearXNG)
|
|
328
|
+
wigolo auth discover # list CDP sessions (needs WIGOLO_CDP_URL)
|
|
329
|
+
wigolo auth status # show configured auth paths
|
|
330
|
+
wigolo plugin add <git-url> # clone plugin into ~/.wigolo/plugins/
|
|
331
|
+
wigolo plugin list # list installed plugins
|
|
332
|
+
wigolo plugin remove <name> # remove a plugin
|
|
333
|
+
wigolo shell [--json] # interactive REPL against subsystems
|
|
334
|
+
```
|
|
175
335
|
|
|
176
|
-
##
|
|
336
|
+
## Configuration
|
|
177
337
|
|
|
178
|
-
|
|
179
|
-
- Zero cloud dependency -- runs entirely local
|
|
180
|
-
- Authenticated browsing (Chrome profiles, session state)
|
|
181
|
-
- Localhost access (develop against local servers)
|
|
182
|
-
- SQLite FTS5 cache with full-text search
|
|
183
|
-
- ML reranking (optional, via FlashRank)
|
|
184
|
-
- Extraction ensemble: site-specific, Defuddle, Trafilatura, Readability, Turndown
|
|
338
|
+
The most commonly used environment variables. All are optional; the defaults are safe.
|
|
185
339
|
|
|
186
|
-
|
|
340
|
+
| Variable | Default | Purpose |
|
|
341
|
+
|---|---|---|
|
|
342
|
+
| `WIGOLO_DATA_DIR` | `~/.wigolo` | Cache DB, SearXNG state, plugins, embeddings |
|
|
343
|
+
| `SEARXNG_URL` | unset | Point at an existing SearXNG (skips native bootstrap) |
|
|
344
|
+
| `SEARXNG_MODE` | `native` | `native` runs local Python SearXNG; `docker` runs container |
|
|
345
|
+
| `WIGOLO_CHROME_PROFILE_PATH` | unset | Chrome profile for `use_auth: true` |
|
|
346
|
+
| `WIGOLO_CDP_URL` | unset | Chrome DevTools endpoint (e.g. `http://localhost:9222`) |
|
|
347
|
+
| `MAX_BROWSERS` | `3` | Playwright pool size |
|
|
348
|
+
| `WIGOLO_BROWSER_TYPES` | `chromium` | Comma list: `chromium,firefox,webkit` |
|
|
349
|
+
| `WIGOLO_RERANKER` | `none` | `flashrank` for ML reranking |
|
|
350
|
+
| `WIGOLO_EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | Used by `find_similar` |
|
|
351
|
+
| `CACHE_TTL_CONTENT` | `604800` (7d) | Seconds before cached pages expire |
|
|
352
|
+
| `LOG_LEVEL` | `info` | `debug` \| `info` \| `warn` \| `error` |
|
|
187
353
|
|
|
188
|
-
|
|
189
|
-
- Python 3.8+ (recommended, for embedded SearXNG search)
|
|
190
|
-
- Docker (optional, alternative to Python for SearXNG)
|
|
354
|
+
Full list: see `src/config.ts`.
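
As a rough illustration of how the defaults in the table could be resolved, here is a TypeScript sketch that reads a subset of the variables from an environment-like object. It is not the contents of `src/config.ts`; `WigoloConfig`, `Env`, and `loadConfig` are hypothetical names, and the fallback behavior for unparsable values is an assumption.

```typescript
interface WigoloConfig {
  dataDir: string;
  searxngMode: "native" | "docker";
  maxBrowsers: number;
  browserTypes: string[];
  cacheTtlContentSeconds: number;
  logLevel: "debug" | "info" | "warn" | "error";
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): WigoloConfig {
  // Assumption: unset or unparsable numbers fall back to the documented default.
  const int = (raw: string | undefined, fallback: number): number => {
    const parsed = raw === undefined ? NaN : parseInt(raw, 10);
    return isNaN(parsed) ? fallback : parsed;
  };
  const level = env["LOG_LEVEL"];
  return {
    dataDir: env["WIGOLO_DATA_DIR"] ?? "~/.wigolo", // "~" is not expanded in this sketch
    searxngMode: env["SEARXNG_MODE"] === "docker" ? "docker" : "native",
    maxBrowsers: int(env["MAX_BROWSERS"], 3),
    browserTypes: (env["WIGOLO_BROWSER_TYPES"] ?? "chromium")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    cacheTtlContentSeconds: int(env["CACHE_TTL_CONTENT"], 604800),
    logLevel: level === "debug" || level === "warn" || level === "error" ? level : "info",
  };
}
```

In Node, the function would typically be called as `loadConfig(process.env)`; taking the environment as an argument keeps the sketch testable without touching real process state.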
|
|
191
355
|
|
|
192
356
|
## Links
|
|
193
357
|
|
|
194
358
|
- Repository: https://github.com/KnockOutEZ/wigolo
|
|
195
359
|
- npm: https://www.npmjs.com/package/@staticn0va/wigolo
|
|
196
|
-
- License:
|
|
360
|
+
- License: BUSL-1.1 (converts to open source on 2029-04-12)
|
package/dist/fetch/router.d.ts
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"router.d.ts","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAIA,OAAO,KAAK,EAAE,cAAc,EAAE,aAAa,EAAE,MAAM,aAAa,CAAC;AAEjE,MAAM,WAAW,kBAAkB;IACjC,QAAQ,CAAC,EAAE,MAAM,GAAG,QAAQ,GAAG,OAAO,CAAC;IACvC,OAAO,CAAC,EAAE,OAAO,CAAC;IAClB,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IACjC,UAAU,CAAC,EAAE,OAAO,CAAC;IACrB,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;
|
|
1
|
+
{"version":3,"file":"router.d.ts","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAIA,OAAO,KAAK,EAAE,cAAc,EAAE,aAAa,EAAE,MAAM,aAAa,CAAC;AAEjE,MAAM,WAAW,kBAAkB;IACjC,QAAQ,CAAC,EAAE,MAAM,GAAG,QAAQ,GAAG,OAAO,CAAC;IACvC,OAAO,CAAC,EAAE,OAAO,CAAC;IAClB,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IACjC,UAAU,CAAC,EAAE,OAAO,CAAC;IACrB,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;IAC1B,aAAa,CAAC,EAAE,OAAO,CAAC;CACzB;AAED,MAAM,WAAW,UAAU;IACzB,KAAK,CACH,GAAG,EAAE,MAAM,EACX,OAAO,CAAC,EAAE;QAAE,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAAC,SAAS,CAAC,EAAE,MAAM,CAAA;KAAE,GACjE,OAAO,CAAC;QACT,GAAG,EAAE,MAAM,CAAC;QACZ,QAAQ,EAAE,MAAM,CAAC;QACjB,IAAI,EAAE,MAAM,CAAC;QACb,WAAW,EAAE,MAAM,CAAC;QACpB,UAAU,EAAE,MAAM,CAAC;QACnB,OAAO,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAChC,SAAS,CAAC,EAAE,MAAM,CAAC;KACpB,CAAC,CAAC;CACJ;AAED,MAAM,WAAW,oBAAoB;IACnC,gBAAgB,CACd,GAAG,EAAE,MAAM,EACX,OAAO,CAAC,EAAE;QAAE,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAAC,gBAAgB,CAAC,EAAE,MAAM,CAAC;QAAC,WAAW,CAAC,EAAE,MAAM,CAAC;QAAC,UAAU,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;QAAC,MAAM,CAAC,EAAE,MAAM,CAAA;KAAE,GAChK,OAAO,CAAC,cAAc,CAAC,CAAC;CAC5B;AAED,UAAU,WAAW;IACnB,YAAY,EAAE,MAAM,CAAC;IACrB,gBAAgB,EAAE,OAAO,CAAC;CAC3B;AAED,qBAAa,WAAW;IAIpB,OAAO,CAAC,QAAQ,CAAC,UAAU;IAC3B,OAAO,CAAC,QAAQ,CAAC,WAAW;IAJ9B,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAkC;gBAGzC,UAAU,EAAE,UAAU,EACtB,WAAW,EAAE,oBAAoB;IAG9C,KAAK,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,kBAAuB,GAAG,OAAO,CAAC,cAAc,CAAC;IAoEnF,cAAc,CAAC,MAAM,EAAE,MAAM,GAAG,WAAW,GAAG,SAAS;IAIvD,OAAO,CAAC,WAAW;IASnB,OAAO,CAAC,gBAAgB;CAczB"}
|
package/dist/fetch/router.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"router.js","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,MAAM,cAAc,CAAC;AACzC,OAAO,EAAE,YAAY,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,EAAE,mBAAmB,EAAE,MAAM,oBAAoB,CAAC;AACzD,OAAO,EAAE,cAAc,EAAE,MAAM,WAAW,CAAC;
|
|
1
|
+
{"version":3,"file":"router.js","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,MAAM,cAAc,CAAC;AACzC,OAAO,EAAE,YAAY,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,EAAE,mBAAmB,EAAE,MAAM,oBAAoB,CAAC;AACzD,OAAO,EAAE,cAAc,EAAE,MAAM,WAAW,CAAC;AAuC3C,MAAM,OAAO,WAAW;IAIH;IACA;IAJF,SAAS,GAAG,IAAI,GAAG,EAAuB,CAAC;IAE5D,YACmB,UAAsB,EACtB,WAAiC;QADjC,eAAU,GAAV,UAAU,CAAY;QACtB,gBAAW,GAAX,WAAW,CAAsB;IACjD,CAAC;IAEJ,KAAK,CAAC,KAAK,CAAC,GAAW,EAAE,UAA8B,EAAE;QACvD,MAAM,EAAE,QAAQ,GAAG,MAAM,EAAE,OAAO,GAAG,KAAK,EAAE,OAAO,EAAE,UAAU,EAAE,OAAO,EAAE,GAAG,OAAO,CAAC;QACrF,MAAM,MAAM,GAAG,SAAS,EAAE,CAAC;QAC3B,MAAM,MAAM,GAAG,YAAY,CAAC,OAAO,CAAC,CAAC;QACrC,MAAM,SAAS,GAAG,MAAM,CAAC,wBAAwB,CAAC;QAClD,MAAM,MAAM,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC,QAAQ,CAAC;QAErC,uEAAuE;QACvE,IAAI,OAAO,IAAI,OAAO,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClC,MAAM,WAAW,GAAG,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,cAAc,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,MAAM,CAAC,KAAK,CAAC,uBAAuB,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,iBAAiB,EAAE,CAAC,CAAC;YAC1E,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,OAAO,EAAE,GAAG,WAAW,EAAE,CAAC,CAAC;QAClG,CAAC;QAED,kDAAkD;QAClD,IAAI,QAAQ,KAAK,QAAQ,IAAI,OAAO,EAAE,CAAC;YACrC,MAAM,WAAW,GAAG,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,cAAc,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,MAAM,CAAC,KAAK,CAAC,uBAAuB,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,kBAAkB,EAAE,CAAC,CAAC;YAC9F,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,GAAG,WAAW,EAAE,CAAC,CAAC;QACzF,CAAC;QAED,yBAAyB;QACzB,IAAI,QAAQ,KAAK,OAAO,EAAE,CAAC;YACzB,MAAM,CAAC,KAAK,CAAC,yBAAyB,EAAE,EAAE,GAAG,EAAE,CAAC,CAAC;YACjD,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC;YAC7D,IAAI,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC;YACzB,OAAO,IAAI,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACvC,CAAC;QAED,yDAAyD;QACzD,MAAM,KAAK,GAAG,IAAI,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC;QAEvC,IAAI,KAAK,CAAC,gBAAgB,EAAE,CAAC;YAC3B,MAAM,CAAC,KAAK,CAAC,uCAAuC,EAAE,EAA
E,GAAG,EAAE,MAAM,EAAE,CAAC,CAAC;YACvE,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;QACzE,CAAC;QAED,iBAAiB;QACjB,IAAI,CAAC;YACH,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC;YAE7D,sCAAsC;YACtC,IAAI,mBAAmB,CAAC,MAAM,CAAC,IAAI,CAAC,EAAE,CAAC;gBACrC,MAAM,CAAC,IAAI,CAAC,mDAAmD,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,CAAC,CAAC;gBAClF,KAAK,CAAC,gBAAgB,GAAG,IAAI,CAAC;gBAC9B,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;YACzE,CAAC;YAED,OAAO,IAAI,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACvC,CAAC;QAAC,OAAO,GAAG,EAAE,CAAC;YACb,KAAK,CAAC,YAAY,EAAE,CAAC;YACrB,MAAM,CAAC,IAAI,CAAC,mBAAmB,EAAE;gBAC/B,GAAG;gBACH,MAAM;gBACN,YAAY,EAAE,KAAK,CAAC,YAAY;gBAChC,KAAK,EAAE,GAAG,YAAY,KAAK,CAAC,CAAC,CAAC,GAAG,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,GAAG,CAAC;aACxD,CAAC,CAAC;YAEH,IAAI,KAAK,CAAC,YAAY,IAAI,SAAS,EAAE,CAAC;gBACpC,MAAM,CAAC,IAAI,CAAC,0DAA0D,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,SAAS,EAAE,CAAC,CAAC;gBACpG,KAAK,CAAC,gBAAgB,GAAG,IAAI,CAAC;gBAC9B,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;YACzE,CAAC;YAED,MAAM,GAAG,CAAC;QACZ,CAAC;IACH,CAAC;IAED,cAAc,CAAC,MAAc;QAC3B,OAAO,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC;IACpC,CAAC;IAEO,WAAW,CAAC,MAAc;QAChC,IAAI,KAAK,GAAG,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC;QACvC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,KAAK,GAAG,EAAE,YAAY,EAAE,CAAC,EAAE,gBAAgB,EAAE,KAAK,EAAE,CAAC;YACrD,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;QACpC,CAAC;QACD,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,gBAAgB,CACtB,MAAgD;QAEhD,OAAO;YACL,GAAG,EAAE,MAAM,CAAC,GAAG;YACf,QAAQ,EAAE,MAAM,CAAC,QAAQ;YACzB,IAAI,EAAE,MAAM,CAAC,IAAI;YACjB,WAAW,EAAE,MAAM,CAAC,WAAW;YAC/B,UAAU,EAAE,MAAM,CAAC,UAAU;YAC7B,MAAM,EAAE,MAAM;YACd,OAAO,EAAE,MAAM,CAAC,OAAO;YACvB,SAAS,EAAE,MAAM,CAAC,SAAS;SAC5B,CAAC;IACJ,CAAC;CACF"}
|
package/dist/instructions.d.ts
CHANGED
|
@@ -14,10 +14,10 @@
|
|
|
14
14
|
* Parameter schemas (types, enums, required/optional) belong on the JSON
|
|
15
15
|
* Schema, not here. Installation/configuration is for humans, not LLMs.
|
|
16
16
|
*/
|
|
17
|
-
export declare const WIGOLO_INSTRUCTIONS = "Wigolo is a local-first web access layer: search the open web, fetch pages, crawl sites, extract structured data, find related content, run multi-step research, and execute agent-driven data gathering. All results land in a local SQLite cache that persists across sessions.\n\n## When to use which tool\n\n- `search` -- you need information on a topic but do not have a URL yet. Pass a query string or an array of 3-5 semantically varied keyword forms for broader coverage.\n- `fetch` -- you already have a specific URL to read.\n- `crawl` -- you need multiple pages from the same site (docs, wikis, references).\n- `cache` -- you want to know if the content is already on disk from an earlier read.\n- `extract` -- you need specific data points (tables, metadata, schema-shaped fields) rather than a whole page as markdown.\n- `find_similar` -- you have a URL or concept and want related content from the cache or web. Useful for \"more like this\" discovery.\n- `research` -- you have a complex question that needs multi-step investigation: question decomposition, parallel search, source synthesis into a report. Set `depth` to control thoroughness.\n- `agent` -- you need to gather structured or unstructured data from multiple sources based on a natural-language prompt. 
Provides full step transparency.\n\n## Routing by intent\n\n| Intent | Tool | Key parameters |\n|--------|------|----------------|\n| Documentation lookup | `search` | `include_domains: [\"react.dev\", \"docs.python.org\"]`, `category: \"docs\"` |\n| Error debugging | `search` | exact error string as query, `category: \"code\"` |\n| Library research | `crawl` | seed URL of docs site, `strategy: \"sitemap\"`, then `cache` for later queries |\n| Related content | `find_similar` | `url` of a known good page, or `concept` as free text |\n| Direct answer | `search` | `format: \"answer\"` for a synthesized direct response |\n| Comprehensive research | `research` | `depth: \"comprehensive\"`, optional `include_domains` to scope |\n| Data gathering | `agent` | natural-language `prompt`, optional `schema` for structured output |\n| Structured extraction | `extract` | `mode: \"schema\"` with a JSON Schema, or `mode: \"tables\"` |\n| Site inventory | `crawl` | `strategy: \"map\"` for URL-only discovery, no content fetched |\n\n## Check the cache before going to the network\n\nBefore every `search` or `fetch`, consider a `cache` call with the query text or URL pattern. Pages read in this or a prior session return instantly with their full markdown -- no network, no rate limits. The `research` and `agent` tools check the cache internally, so you do not need a separate call for those.\n\n## Multi-query search strategy\n\nFor broad or exploratory queries, pass an array of 3-5 semantically varied keyword forms rather than a single natural-language question. Example: instead of \"how does React handle state management\", pass `[\"react state management\", \"useState useReducer patterns\", \"react hooks state\", \"react context vs redux\"]`. 
The search tool deduplicates across sub-queries automatically.\n\n## Pick the right strategy\n\n- For documentation sites, prefer `crawl` with `strategy: \"sitemap\"` -- it is faster and more complete than BFS because it reads sitemap.xml directly.\n- When you only need to discover what pages exist on a site, use `crawl` with `strategy: \"map\"`. It returns URLs only, no content, and is far cheaper than a full crawl. Follow up with targeted `fetch` calls.\n- For structured data (prices, specs, listings, table rows), use `extract` with `mode: \"schema\"` or `mode: \"tables\"`. Reach for `fetch` only when you want the whole page as markdown.\n- For complex questions requiring synthesis from multiple sources, use `research` instead of manually chaining `search` + `fetch` calls.\n- For natural-language data gathering tasks (\"find the pricing for the top 5 CRM tools\"), use `agent` with an optional `schema` to structure the output.\n\n## Scope searches, do not just broaden queries\n\n`search` accepts `include_domains` (e.g. `[\"react.dev\", \"developer.mozilla.org\"]`) and a `category` such as `\"docs\"`, `\"code\"`, `\"news\"`, or `\"papers\"`. A scoped query usually beats a broader query with post-filtering.\n\n## Performance\n\n- `max_results: 3` for focused lookups; `5` is the default; `10+` only for broad research.\n- `fetch` with `section: \"Heading Name\"` returns just the content under that heading. Cheaper and more relevant than the whole page.\n- Repeated fetches of the same URL are free -- served directly from the SQLite cache.\n- `research` with `depth: \"quick\"` is fast (~15s) and sufficient for most factual questions. Reserve `\"comprehensive\"` for topics requiring deep investigation.\n- `agent` respects `max_pages` (default 10) and `max_time_ms` (default 60s) to bound resource usage.\n\n## Capabilities worth knowing\n\n- Localhost URLs work: `http://localhost:3000`, `http://127.0.0.1:8080`, and similar. 
Useful for reading local dev servers and internal docs.\n- `use_auth: true` on `fetch` and `crawl` reuses the user's configured browser session for pages behind a login.\n- `cache` accepts FTS5 query syntax (`AND`, `OR`, `NOT`, `\"exact phrase\"`) for precise lookups.\n- `crawl` accepts regex `include_patterns` and `exclude_patterns` to stay inside a section of a large site.\n- `find_similar` uses cached embeddings when available -- no network call needed if the content has been seen before.\n- `research` and `agent` use MCP requestSampling for intelligent decomposition and synthesis when the client supports it. Without sampling support, they return raw sources in context format.";
|
|
17
|
+
export declare const WIGOLO_INSTRUCTIONS = "Wigolo is a local-first web access layer: search the open web, fetch pages, crawl sites, extract structured data, find related content, run multi-step research, and execute agent-driven data gathering. All results land in a local SQLite cache that persists across sessions.\n\n## When to use which tool\n\n- `search` -- you need information on a topic but do not have a URL yet. Pass a query string or an array of 3-5 semantically varied keyword forms for broader coverage.\n- `fetch` -- you already have a specific URL to read.\n- `crawl` -- you need multiple pages from the same site (docs, wikis, references).\n- `cache` -- you want to know if the content is already on disk from an earlier read.\n- `extract` -- you need specific data points (tables, metadata, schema-shaped fields) rather than a whole page as markdown.\n- `find_similar` -- you have a URL or concept and want related content from the cache or web. Useful for \"more like this\" discovery.\n- `research` -- you have a complex question that needs multi-step investigation: question decomposition, parallel search, source synthesis into a report. Set `depth` to control thoroughness.\n- `agent` -- you need to gather structured or unstructured data from multiple sources based on a natural-language prompt. 
Provides full step transparency.\n\n## Routing by intent\n\n| Intent | Tool | Key parameters |\n|--------|------|----------------|\n| Documentation lookup | `search` | `include_domains: [\"react.dev\", \"docs.python.org\"]`, `category: \"docs\"` |\n| Error debugging | `search` | exact error string as query, `category: \"code\"` |\n| Library research | `crawl` | seed URL of docs site, `strategy: \"sitemap\"`, then `cache` for later queries |\n| Related content | `find_similar` | `url` of a known good page, or `concept` as free text |\n| Direct answer | `search` | `format: \"answer\"` for a synthesized direct response |\n| Comprehensive research | `research` | `depth: \"comprehensive\"`, optional `include_domains` to scope |\n| Data gathering | `agent` | natural-language `prompt`, optional `schema` for structured output |\n| Structured extraction | `extract` | `mode: \"schema\"` with a JSON Schema, or `mode: \"tables\"` |\n| Site inventory | `crawl` | `strategy: \"map\"` for URL-only discovery, no content fetched |\n\n## Rapidly changing content\n\nFor news, prices, status pages, release notes, or any content that changes frequently, bypass the cache with `force_refresh: true`:\n\n search({ query: \"...\", force_refresh: true })\n fetch({ url: \"...\", force_refresh: true })\n\nWhen freshness matters more than speed, use `force_refresh`. When speed matters more than freshness (documentation, tutorials, reference pages), let the cache work -- it is much faster.\n\n## Check the cache before going to the network\n\nBefore every `search` or `fetch`, consider a `cache` call with the query text or URL pattern. Pages read in this or a prior session return instantly with their full markdown -- no network, no rate limits. 
The `research` and `agent` tools check the cache internally, so you do not need a separate call for those.\n\n## Multi-query search strategy\n\nFor broad or exploratory queries, pass an array of 3-5 semantically varied keyword forms rather than a single natural-language question. Example: instead of \"how does React handle state management\", pass `[\"react state management\", \"useState useReducer patterns\", \"react hooks state\", \"react context vs redux\"]`. The search tool deduplicates across sub-queries automatically.\n\n## Pick the right strategy\n\n- For documentation sites, prefer `crawl` with `strategy: \"sitemap\"` -- it is faster and more complete than BFS because it reads sitemap.xml directly.\n- When you only need to discover what pages exist on a site, use `crawl` with `strategy: \"map\"`. It returns URLs only, no content, and is far cheaper than a full crawl. Follow up with targeted `fetch` calls.\n- For structured data (prices, specs, listings, table rows), use `extract` with `mode: \"schema\"` or `mode: \"tables\"`. Reach for `fetch` only when you want the whole page as markdown.\n- For complex questions requiring synthesis from multiple sources, use `research` instead of manually chaining `search` + `fetch` calls.\n- For natural-language data gathering tasks (\"find the pricing for the top 5 CRM tools\"), use `agent` with an optional `schema` to structure the output.\n\n## Scope searches, do not just broaden queries\n\n`search` accepts `include_domains` (e.g. `[\"react.dev\", \"developer.mozilla.org\"]`) and a `category` such as `\"docs\"`, `\"code\"`, `\"news\"`, or `\"papers\"`. A scoped query usually beats a broader query with post-filtering.\n\n## Performance\n\n- `max_results: 3` for focused lookups; `5` is the default; `10+` only for broad research.\n- `fetch` with `section: \"Heading Name\"` returns just the content under that heading. 
Cheaper and more relevant than the whole page.\n- Repeated fetches of the same URL are free -- served directly from the SQLite cache.\n- `research` with `depth: \"quick\"` is fast (~15s) and sufficient for most factual questions. Reserve `\"comprehensive\"` for topics requiring deep investigation.\n- `agent` respects `max_pages` (default 10) and `max_time_ms` (default 60s) to bound resource usage.\n\n## Capabilities worth knowing\n\n- Localhost URLs work: `http://localhost:3000`, `http://127.0.0.1:8080`, and similar. Useful for reading local dev servers and internal docs.\n- `use_auth: true` on `fetch` and `crawl` reuses the user's configured browser session for pages behind a login.\n- `cache` accepts FTS5 query syntax (`AND`, `OR`, `NOT`, `\"exact phrase\"`) for precise lookups.\n- `crawl` accepts regex `include_patterns` and `exclude_patterns` to stay inside a section of a large site.\n- `find_similar` uses cached embeddings when available -- no network call needed if the content has been seen before.\n- `research` and `agent` use MCP requestSampling for intelligent decomposition and synthesis when the client supports it. Without sampling support, they return raw sources in context format.";
|
|
18
18
|
export declare const TOOL_DESCRIPTIONS: {
|
|
19
|
-
readonly fetch: "Fetch a single URL and return clean markdown. Use when you have a specific URL to read. Automatically detects if JavaScript rendering is needed.\n\nKey parameters:\n- section: extract content under a specific heading (e.g., section: \"API Reference\") -- faster than reading the whole page\n- use_auth: true to use stored browser session for authenticated/private pages\n- render_js: \"auto\" (default, detects JS need), \"always\" (force browser), \"never\" (HTTP only, fastest)\n- headers: custom HTTP headers if needed\n\nReturns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.";
|
|
20
|
-
readonly search: "Search the web and return full markdown content from top results. Use for finding information on any topic -- returns extracted page content, not just snippets.\n\nKey parameters:\n- query: a search string, or an array of 3-5 semantically varied keyword forms for broader coverage. Arrays are deduplicated and merged automatically.\n- include_domains/exclude_domains: scope results to specific sites (e.g., include_domains: [\"react.dev\"])\n- category: \"general\", \"news\", \"code\", \"docs\", \"papers\" -- filters by content type\n- from_date/to_date: ISO dates for time-bounded queries\n- max_results: default 5. Use 3 for focused queries, 10+ for research.\n- format: \"full\" (default, structured JSON), \"context\" (single token-budgeted string for LLM injection), \"answer\" (synthesized direct answer via requestSampling), \"stream_answer\" (
|
|
19
|
+
readonly fetch: "Fetch a single URL and return clean markdown. Use when you have a specific URL to read. Automatically detects if JavaScript rendering is needed.\n\nKey parameters:\n- section: extract content under a specific heading (e.g., section: \"API Reference\") -- faster than reading the whole page\n- use_auth: true to use stored browser session for authenticated/private pages\n- render_js: \"auto\" (default, detects JS need), \"always\" (force browser), \"never\" (HTTP only, fastest)\n- headers: custom HTTP headers if needed\n- force_refresh: true to bypass cache and fetch fresh content from the network\n\nReturns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.\n\nUse force_refresh: true for pages that change frequently (news sites, changelogs, dashboards, API status pages). By default, previously fetched pages are served from local cache for speed.";
|
|
20
|
+
readonly search: "Search the web and return full markdown content from top results. Use for finding information on any topic -- returns extracted page content, not just snippets.\n\nKey parameters:\n- query: a search string, or an array of 3-5 semantically varied keyword forms for broader coverage. Arrays are deduplicated and merged automatically.\n- include_domains/exclude_domains: scope results to specific sites (e.g., include_domains: [\"react.dev\"])\n- category: \"general\", \"news\", \"code\", \"docs\", \"papers\" -- filters by content type\n- from_date/to_date: ISO dates for time-bounded queries\n- max_results: default 5. Use 3 for focused queries, 10+ for research.\n- format: \"full\" (default, structured JSON), \"context\" (single token-budgeted string for LLM injection), \"answer\" (synthesized direct answer via requestSampling), \"stream_answer\" (same as answer, with MCP progress notifications emitted between pipeline phases)\n- force_refresh: true to bypass all caches (search results and page content)\n\nThe \"answer\" format uses the MCP client's sampling capability to synthesize a direct response from search results. If sampling is not supported, falls back to \"context\" format. \"stream_answer\" emits notifications/progress messages at each pipeline phase (search, fetch, synthesize) when the client provides a progressToken via request._meta — token-level streaming of the LLM response is not supported by MCP sampling, so the answer itself still arrives as one block.\n\nResults include title, URL, relevance_score, and full markdown_content per result. Previously fetched pages are served from local cache.\n\nUse force_refresh: true when you need current information that may have changed since the last search. Default behavior uses cached results when available.";
|
|
21
21
|
readonly crawl: "Crawl a website starting from a URL and return content from multiple pages. Use for indexing documentation sites, wikis, or any multi-page resource.\n\nKey parameters:\n- strategy: \"bfs\" (breadth-first, default), \"dfs\" (depth-first), \"sitemap\" (use sitemap.xml -- fastest for doc sites), \"map\" (URL discovery only, no content -- fastest for scoping a site)\n- max_depth: how many links deep to follow (default 2)\n- max_pages: maximum pages to fetch (default 20)\n- include_patterns/exclude_patterns: regex filters on URLs\n\nReturns an array of pages with title, markdown, and depth. Content is deduplicated across pages (repeated nav/headers/footers stripped). All pages are cached for later cache queries.";
|
|
22
22
|
readonly cache: "Search previously fetched content without hitting the network. Use before searching the web -- if relevant content was already fetched or crawled, this returns it instantly.\n\nKey parameters:\n- query: full-text search over cached markdown and titles (supports FTS5 syntax: AND, OR, NOT, \"phrase match\")\n- url_pattern: glob filter on URLs (e.g., \"*example.com*\")\n- since: ISO date -- only results cached after this date\n- stats: true to get cache size, entry count, oldest/newest dates\n- clear: true to delete matching entries\n\nReturns matching cached pages with full markdown content. Cache persists across sessions in local SQLite.";
|
|
23
23
|
readonly extract: "Extract structured data from a URL or raw HTML. Use when you need specific data points, tables, or metadata rather than full page markdown.\n\nKey parameters:\n- mode: \"selector\" (CSS selector -> text), \"tables\" (HTML tables -> JSON rows), \"metadata\" (title/author/date/description), \"schema\" (JSON Schema -> heuristic field extraction)\n- css_selector: required for mode=\"selector\" -- any valid CSS selector\n- schema: for mode=\"schema\", a JSON Schema object describing the fields to extract\n- multiple: true to return array of all matches (mode=\"selector\" only)\n\nFor mode=\"tables\", returns array of table objects with headers and row data. For mode=\"schema\", pass { price: \"string\", name: \"string\" } and get structured fields extracted from the page.";
|
|
package/dist/instructions.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"instructions.d.ts","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,eAAO,MAAM,mBAAmB,
|
|
1
|
+
{"version":3,"file":"instructions.d.ts","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,eAAO,MAAM,mBAAmB,i/LAuEmK,CAAC;AAEpM,eAAO,MAAM,iBAAiB;;;;;;;;;CA4GpB,CAAC;AAEX,MAAM,MAAM,QAAQ,GAAG,MAAM,OAAO,iBAAiB,CAAC"}
|
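Note on the `force_refresh` parameter added in this hunk: the tool descriptions say it bypasses the cache and goes to the network. A minimal sketch of that read path follows, under stated assumptions. `RefreshableCache`, `get`, and `load` are hypothetical names, not taken from the wigolo source. The in-memory `Map` stands in for the local SQLite cache, and writing the refreshed value back into the cache is an assumption the diff does not confirm.

```typescript
// Hypothetical sketch of a cache read path with a force-refresh switch.
// None of these names come from the wigolo source.
class RefreshableCache<T> {
  // In-memory stand-in for the local SQLite cache.
  private readonly entries = new Map<string, T>();

  // Returns the cached value unless forceRefresh is set or the key is missing.
  // `load` stands in for the network fetch.
  get(key: string, load: () => T, forceRefresh = false): T {
    if (!forceRefresh) {
      const cached = this.entries.get(key);
      if (cached !== undefined) {
        return cached; // cache hit: the loader is never called
      }
    }
    const fresh = load();
    // Assumption: a refreshed value replaces the stale cache entry.
    this.entries.set(key, fresh);
    return fresh;
  }
}

// Usage: count loader calls to see when the "network" is actually hit.
const demoCache = new RefreshableCache<string>();
let loaderCalls = 0;
const loadPage = (): string => `content v${++loaderCalls}`;

demoCache.get("https://example.com/status", loadPage);       // loads
demoCache.get("https://example.com/status", loadPage);       // served from cache
demoCache.get("https://example.com/status", loadPage, true); // bypasses the cache
```

The default of `false` mirrors the documented behavior: cached pages are served unless the caller asks for fresh content.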
package/dist/instructions.js
CHANGED
|
@@ -41,6 +41,15 @@ export const WIGOLO_INSTRUCTIONS = `Wigolo is a local-first web access layer: se
|
|
|
41
41
|
| Structured extraction | \`extract\` | \`mode: "schema"\` with a JSON Schema, or \`mode: "tables"\` |
|
|
42
42
|
| Site inventory | \`crawl\` | \`strategy: "map"\` for URL-only discovery, no content fetched |
|
|
43
43
|
|
|
44
|
+
## Rapidly changing content
|
|
45
|
+
|
|
46
|
+
For news, prices, status pages, release notes, or any content that changes frequently, bypass the cache with \`force_refresh: true\`:
|
|
47
|
+
|
|
48
|
+
search({ query: "...", force_refresh: true })
|
|
49
|
+
fetch({ url: "...", force_refresh: true })
|
|
50
|
+
|
|
51
|
+
When freshness matters more than speed, use \`force_refresh\`. When speed matters more than freshness (documentation, tutorials, reference pages), let the cache work -- it is much faster.
|
|
52
|
+
|
|
44
53
|
## Check the cache before going to the network
|
|
45
54
|
|
|
46
55
|
Before every \`search\` or \`fetch\`, consider a \`cache\` call with the query text or URL pattern. Pages read in this or a prior session return instantly with their full markdown -- no network, no rate limits. The \`research\` and \`agent\` tools check the cache internally, so you do not need a separate call for those.
|
|
@@ -85,8 +94,11 @@ Key parameters:
|
|
|
85
94
|
- use_auth: true to use stored browser session for authenticated/private pages
|
|
86
95
|
- render_js: "auto" (default, detects JS need), "always" (force browser), "never" (HTTP only, fastest)
|
|
87
96
|
- headers: custom HTTP headers if needed
|
|
97
|
+
- force_refresh: true to bypass cache and fetch fresh content from the network
|
|
98
|
+
|
|
99
|
+
Returns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.
|
|
88
100
|
|
|
89
|
-
|
|
101
|
+
Use force_refresh: true for pages that change frequently (news sites, changelogs, dashboards, API status pages). By default, previously fetched pages are served from local cache for speed.`,
|
|
90
102
|
search: `Search the web and return full markdown content from top results. Use for finding information on any topic -- returns extracted page content, not just snippets.
|
|
91
103
|
|
|
92
104
|
Key parameters:
|
|
@@ -95,11 +107,14 @@ Key parameters:
|
|
|
95
107
|
- category: "general", "news", "code", "docs", "papers" -- filters by content type
|
|
96
108
|
- from_date/to_date: ISO dates for time-bounded queries
|
|
97
109
|
- max_results: default 5. Use 3 for focused queries, 10+ for research.
|
|
98
|
-
- format: "full" (default, structured JSON), "context" (single token-budgeted string for LLM injection), "answer" (synthesized direct answer via requestSampling), "stream_answer" (
|
|
110
|
+
- format: "full" (default, structured JSON), "context" (single token-budgeted string for LLM injection), "answer" (synthesized direct answer via requestSampling), "stream_answer" (same as answer, with MCP progress notifications emitted between pipeline phases)
|
|
111
|
+
- force_refresh: true to bypass all caches (search results and page content)
|
|
112
|
+
|
|
113
|
+
The "answer" format uses the MCP client's sampling capability to synthesize a direct response from search results. If sampling is not supported, falls back to "context" format. "stream_answer" emits notifications/progress messages at each pipeline phase (search, fetch, synthesize) when the client provides a progressToken via request._meta — token-level streaming of the LLM response is not supported by MCP sampling, so the answer itself still arrives as one block.
|
|
99
114
|
|
|
100
|
-
|
|
115
|
+
Results include title, URL, relevance_score, and full markdown_content per result. Previously fetched pages are served from local cache.
|
|
101
116
|
|
|
102
|
-
|
|
117
|
+
Use force_refresh: true when you need current information that may have changed since the last search. Default behavior uses cached results when available.`,
|
|
103
118
|
crawl: `Crawl a website starting from a URL and return content from multiple pages. Use for indexing documentation sites, wikis, or any multi-page resource.
|
|
104
119
|
|
|
105
120
|
Key parameters:
|
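Note on the `stream_answer` text added in this hunk: progress messages are emitted per pipeline phase (search, fetch, synthesize) only when the client supplies a `progressToken`, and the answer itself still arrives as one block. A rough sketch of that gating follows, under stated assumptions. `runPhases` and `send` are hypothetical names, the phase work is omitted, and emitting after each phase rather than before it is a guess. The notification fields follow the MCP `notifications/progress` message shape.

```typescript
// Hypothetical sketch of phase-level progress reporting; not wigolo's code.
type ProgressNotification = {
  method: "notifications/progress";
  params: {
    progressToken: string | number;
    progress: number;
    total: number;
    message: string;
  };
};

// Phase names as listed in the tool description.
const PHASES = ["search", "fetch", "synthesize"] as const;

function runPhases(
  progressToken: string | number | undefined,
  send: (notification: ProgressNotification) => void,
): string {
  for (let i = 0; i < PHASES.length; i++) {
    // The real work for PHASES[i] would happen here (omitted in this sketch).
    if (progressToken !== undefined) {
      // Only clients that passed a token receive progress messages.
      send({
        method: "notifications/progress",
        params: {
          progressToken,
          progress: i + 1,
          total: PHASES.length,
          message: PHASES[i],
        },
      });
    }
  }
  // The synthesized answer is returned whole, not token by token.
  return "answer delivered as one block";
}
```

A caller that passes `undefined` as the token gets the same return value with no notifications sent.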
package/dist/instructions.js.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"instructions.js","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,MAAM,CAAC,MAAM,mBAAmB,GAAG
|
|
1
|
+
{"version":3,"file":"instructions.js","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,MAAM,CAAC,MAAM,mBAAmB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;mMAuEgK,CAAC;AAEpM,MAAM,CAAC,MAAM,iBAAiB,GAAG;IAC/B,KAAK,EAAE;;;;;;;;;;;6LAWoL;IAE3L,MAAM,EAAE;;;;;;;;;;;;;;;4JAekJ;IAE1J,KAAK,EAAE;;;;;;;;uLAQ8K;IAErL,KAAK,EAAE;;;;;;;;;0GASiG;IAExG,OAAO,EAAE;;;;;;;;4LAQiL;IAE1L,YAAY,EAAE;;;;;;;;;;;uEAWuD;IAErE,QAAQ,EAAE;;;;;;;;;;;;;;yHAc6G;IAEvH,KAAK,EAAE;;;;;;;;;;;;;;;;sKAgB6J;CAC5J,CAAC"}
|
|
package/dist/search/find-similar.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"find-similar.d.ts","sourceRoot":"","sources":["../../src/search/find-similar.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,gBAAgB,EAChB,iBAAiB,EAEjB,YAAY,EAEb,MAAM,aAAa,CAAC;AACrB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,oBAAoB,CAAC;AACtD,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,6BAA6B,CAAC;
|
|
1
|
+
{"version":3,"file":"find-similar.d.ts","sourceRoot":"","sources":["../../src/search/find-similar.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,gBAAgB,EAChB,iBAAiB,EAEjB,YAAY,EAEb,MAAM,aAAa,CAAC;AACrB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,oBAAoB,CAAC;AACtD,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,6BAA6B,CAAC;AAyBjE,wBAAsB,WAAW,CAC/B,KAAK,EAAE,gBAAgB,EACvB,OAAO,EAAE,YAAY,EAAE,EACvB,MAAM,EAAE,WAAW,EACnB,aAAa,CAAC,EAAE,aAAa,GAC5B,OAAO,CAAC,iBAAiB,CAAC,CAqJ5B"}
|