@staticn0va/wigolo 0.5.1 → 0.6.1
- package/README.md +46 -34
- package/SKILL.md +253 -89
- package/dist/cli/doctor.d.ts.map +1 -1
- package/dist/cli/doctor.js +17 -15
- package/dist/cli/doctor.js.map +1 -1
- package/dist/cli/warmup.d.ts.map +1 -1
- package/dist/cli/warmup.js +93 -13
- package/dist/cli/warmup.js.map +1 -1
- package/dist/embedding/subprocess.d.ts.map +1 -1
- package/dist/embedding/subprocess.js +2 -1
- package/dist/embedding/subprocess.js.map +1 -1
- package/dist/extraction/trafilatura.d.ts.map +1 -1
- package/dist/extraction/trafilatura.js +3 -2
- package/dist/extraction/trafilatura.js.map +1 -1
- package/dist/fetch/router.d.ts +1 -0
- package/dist/fetch/router.d.ts.map +1 -1
- package/dist/fetch/router.js.map +1 -1
- package/dist/instructions.d.ts +3 -3
- package/dist/instructions.d.ts.map +1 -1
- package/dist/instructions.js +19 -4
- package/dist/instructions.js.map +1 -1
- package/dist/python-env.d.ts +9 -0
- package/dist/python-env.d.ts.map +1 -0
- package/dist/python-env.js +18 -0
- package/dist/python-env.js.map +1 -0
- package/dist/search/find-similar.d.ts.map +1 -1
- package/dist/search/find-similar.js +136 -29
- package/dist/search/find-similar.js.map +1 -1
- package/dist/search/flashrank.d.ts.map +1 -1
- package/dist/search/flashrank.js +2 -1
- package/dist/search/flashrank.js.map +1 -1
- package/dist/server/backend-status.d.ts +2 -0
- package/dist/server/backend-status.d.ts.map +1 -1
- package/dist/server/backend-status.js +13 -0
- package/dist/server/backend-status.js.map +1 -1
- package/dist/server.d.ts.map +1 -1
- package/dist/server.js +51 -3
- package/dist/server.js.map +1 -1
- package/dist/tools/fetch.d.ts.map +1 -1
- package/dist/tools/fetch.js +6 -4
- package/dist/tools/fetch.js.map +1 -1
- package/dist/tools/search.d.ts +2 -2
- package/dist/tools/search.d.ts.map +1 -1
- package/dist/tools/search.js +55 -8
- package/dist/tools/search.js.map +1 -1
- package/dist/types.d.ts +8 -0
- package/dist/types.d.ts.map +1 -1
- package/package.json +2 -1
package/README.md
CHANGED
|
@@ -10,11 +10,12 @@ Search, fetch, crawl, cache, and extract — zero API keys, zero cloud, zero cos
|
|
|
10
10
|
[](https://nodejs.org)
|
|
11
11
|
[](https://www.typescriptlang.org/)
|
|
12
12
|
|
|
13
|
-
[Quick Start](#quick-start) · [Features](#features) · [Why wigolo?](#why-wigolo)
|
|
13
|
+
[Quick Start](#quick-start) · [Features](#features) · [Why wigolo?](#why-wigolo)
|
|
14
14
|
|
|
15
15
|
</div>
|
|
16
16
|
|
|
17
17
|
```
|
|
18
|
+
$ npx @staticn0va/wigolo warmup --all
|
|
18
19
|
$ claude mcp add wigolo -- npx @staticn0va/wigolo
|
|
19
20
|
Added MCP server wigolo
|
|
20
21
|
|
|
@@ -27,6 +28,30 @@ wigolo gives AI coding agents (Claude Code, Cursor, Gemini CLI, Codex, Windsurf)
|
|
|
27
28
|
|
|
28
29
|
## Quick Start
|
|
29
30
|
|
|
31
|
+
### 1. Warm up (required)
|
|
32
|
+
|
|
33
|
+
Install Playwright, bootstrap SearXNG, install Python extras (FlashRank, Trafilatura, sentence-transformers), then verify the setup end-to-end:
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
npx @staticn0va/wigolo warmup --all
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
`--all` runs verification automatically: it starts SearXNG, runs a test search, checks every Python package, then shuts SearXNG down. You see proof everything works before connecting an agent. Re-run any time with `warmup --verify`.
|
|
40
|
+
|
|
41
|
+
Flag menu:
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
npx @staticn0va/wigolo warmup # Playwright + SearXNG only
|
|
45
|
+
npx @staticn0va/wigolo warmup --all # + reranker + trafilatura + embeddings + lightpanda + verify
|
|
46
|
+
npx @staticn0va/wigolo warmup --reranker # Install FlashRank (ML reranking)
|
|
47
|
+
npx @staticn0va/wigolo warmup --trafilatura # Install Trafilatura (content extraction)
|
|
48
|
+
npx @staticn0va/wigolo warmup --embeddings # Install sentence-transformers
|
|
49
|
+
npx @staticn0va/wigolo warmup --verify # Start SearXNG, test search, test Python packages
|
|
50
|
+
npx @staticn0va/wigolo warmup --force # Wipe SearXNG state/install/locks and re-bootstrap
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
### 2. Connect your agent
|
|
54
|
+
|
|
30
55
|
**Claude Code:**
|
|
31
56
|
```bash
|
|
32
57
|
claude mcp add wigolo -- npx @staticn0va/wigolo
|
|
@@ -44,12 +69,7 @@ claude mcp add wigolo -- npx @staticn0va/wigolo
|
|
|
44
69
|
}
|
|
45
70
|
```
|
|
46
71
|
|
|
47
|
-
|
|
48
|
-
```bash
|
|
49
|
-
npx @staticn0va/wigolo warmup # Downloads Playwright + SearXNG
|
|
50
|
-
npx @staticn0va/wigolo warmup --all # + ML reranking + Trafilatura extraction
|
|
51
|
-
npx @staticn0va/wigolo warmup --force # Wipe SearXNG state/install/locks and re-bootstrap
|
|
52
|
-
```
|
|
72
|
+
> Skipping warmup still works — wigolo will bootstrap in the background on first tool call — but early searches will be lower quality until the install finishes. Running `warmup --all` up front is strongly recommended.
|
|
53
73
|
|
|
54
74
|
## Diagnostics
|
|
55
75
|
|
|
@@ -268,31 +288,6 @@ SearXNG bootstrap failures are self-healing: wigolo retries after 30 seconds, 1
|
|
|
268
288
|
4. Readability.js — battle-tested Mozilla algorithm
|
|
269
289
|
5. Raw Turndown — last resort HTML-to-markdown
|
|
270
290
|
|
|
271
|
-
## Roadmap
|
|
272
|
-
|
|
273
|
-
### v2.1 — Next
|
|
274
|
-
- [x] Daemon mode — persistent HTTP server, zero startup latency
|
|
275
|
-
- [ ] Browser interaction — click, type, scroll before extraction
|
|
276
|
-
- [ ] Content change detection — diff monitoring for cached pages
|
|
277
|
-
- [ ] CDP session discovery — attach to running Chrome for seamless auth
|
|
278
|
-
- [ ] Plugin system — community extractors and search engines
|
|
279
|
-
|
|
280
|
-
### v2.2
|
|
281
|
-
- [ ] Multi-browser pool — Chromium + Firefox for fingerprint diversity
|
|
282
|
-
- [ ] Interactive REPL (`wigolo shell`)
|
|
283
|
-
- [x] Agent skill distribution — MCP registry listings, `SKILL.md`
|
|
284
|
-
|
|
285
|
-
### v3 — The Knowledge Engine
|
|
286
|
-
- [ ] Answer synthesis — search + LLM = direct answers with citations (bring your own key)
|
|
287
|
-
- [ ] Semantic search — local vector embeddings over cached content (`findSimilar`)
|
|
288
|
-
- [ ] Agent endpoint — describe what you need, no URLs required
|
|
289
|
-
- [ ] Streaming answers — real-time generation as results come in
|
|
290
|
-
- [ ] Knowledge graph — entity and relationship extraction from crawled content
|
|
291
|
-
- [ ] Auto re-crawl scheduler — keep documentation fresh automatically
|
|
292
|
-
- [ ] Lightpanda browser — optional ultra-lightweight headless browser (11x less RAM than Chrome)
|
|
293
|
-
- [ ] Cloud sync — share cache across machines via rclone (S3, Drive, Dropbox)
|
|
294
|
-
- [ ] Team knowledge base — shared indexed content across team members
|
|
295
|
-
|
|
296
291
|
## Discovery
|
|
297
292
|
|
|
298
293
|
wigolo is listed on MCP server registries for agent discovery:
|
|
@@ -309,8 +304,19 @@ See `SKILL.md` for the full tool schema in agent-discovery format.
|
|
|
309
304
|
|
|
310
305
|
## Troubleshooting
|
|
311
306
|
|
|
307
|
+
Start with `npx @staticn0va/wigolo doctor` — it reports the state of every component and is the fastest way to find the cause.
|
|
308
|
+
|
|
309
|
+
**First search is slow or returns odd results**
|
|
310
|
+
SearXNG is still bootstrapping in the background. Either wait a minute, or (recommended) run `npx @staticn0va/wigolo warmup --all` before connecting your agent.
|
|
311
|
+
|
|
312
|
+
**FlashRank / Trafilatura / sentence-transformers "not installed"**
|
|
313
|
+
These are optional Python extras. Install them with `npx @staticn0va/wigolo warmup --all` (or per-package: `--reranker`, `--trafilatura`, `--embeddings`). wigolo uses a private venv under `~/.wigolo/searxng/venv` so your system Python stays untouched.
|
|
314
|
+
|
|
312
315
|
**SearXNG won't start**
|
|
313
|
-
Make sure `python3` is on your PATH and version 3.8+. Check with `python3 --version`. Alternatively, set `SEARXNG_MODE=docker` if Docker is available.
|
|
316
|
+
Make sure `python3` is on your PATH and version 3.8+. Check with `python3 --version`. If bootstrap got interrupted, `npx @staticn0va/wigolo warmup --force` wipes the state and reinstalls. Alternatively, set `SEARXNG_MODE=docker` if Docker is available.
|
|
317
|
+
|
|
318
|
+
**Doctor reports SearXNG "not running"**
|
|
319
|
+
That's expected when you haven't made a search yet — the process starts on-demand when the MCP server needs it. Doctor only marks it degraded if the install is broken.
|
|
314
320
|
|
|
315
321
|
**Playwright browser not found**
|
|
316
322
|
Run `npx @staticn0va/wigolo warmup` to download Chromium. This is done automatically on first use but can fail behind corporate proxies.
|
|
@@ -321,6 +327,12 @@ If SearXNG and all fallback engines fail, check your network connection. Behind
|
|
|
321
327
|
**Permission errors on `~/.wigolo/`**
|
|
322
328
|
wigolo stores its cache and SearXNG installation in `~/.wigolo/`. Ensure your user has write access. Override with `WIGOLO_DATA_DIR=/your/path`.
|
|
323
329
|
|
|
330
|
+
**Start fresh**
|
|
331
|
+
```bash
|
|
332
|
+
rm -rf ~/.wigolo
|
|
333
|
+
npx @staticn0va/wigolo warmup --all
|
|
334
|
+
```
|
|
335
|
+
|
|
324
336
|
## Contributing
|
|
325
337
|
|
|
326
338
|
PRs welcome. Open an issue first to discuss what you'd like to change.
|
|
@@ -353,4 +365,4 @@ Requires the `NPM_TOKEN` repository secret (npm automation token with publish sc
|
|
|
353
365
|
|
|
354
366
|
## License
|
|
355
367
|
|
|
356
|
-
[BSL 1.1](LICENSE) — free for individuals, small teams (under $1M revenue), education, and open source. Converts to
|
|
368
|
+
[BSL 1.1](LICENSE) — free for individuals, small teams (under $1M revenue), education, and open source. Converts to AGPL-3.0 on 2029-04-12.
|
package/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: wigolo
|
|
3
|
-
description: Local-first web
|
|
3
|
+
description: Local-first web access MCP server for AI coding agents. Eight tools for search, fetch, crawl, cache, extract, find similar, research, and agent-driven data gathering. No API keys. Results cached in local SQLite.
|
|
4
4
|
author: KnockOutEZ
|
|
5
5
|
license: BUSL-1.1
|
|
6
6
|
repository: https://github.com/KnockOutEZ/wigolo
|
|
@@ -9,29 +9,29 @@ install: npx @staticn0va/wigolo
|
|
|
9
9
|
runtime: node
|
|
10
10
|
min_runtime_version: "20"
|
|
11
11
|
tools:
|
|
12
|
-
- name: search
|
|
13
|
-
description: Search the web and return results with optional full content extraction. Supports domain filtering, date ranges, categories, and ML reranking.
|
|
14
12
|
- name: fetch
|
|
15
|
-
description: Fetch
|
|
13
|
+
description: Fetch one URL, return clean markdown. Auto-routes between HTTP and Playwright. Supports sections, auth, screenshots, browser actions.
|
|
14
|
+
- name: search
|
|
15
|
+
description: Search the web, return extracted markdown per result. Single query or array of query variants. Domain, category, date filters. Optional synthesized answer via MCP sampling.
|
|
16
16
|
- name: crawl
|
|
17
|
-
description: Crawl a
|
|
17
|
+
description: Crawl a site from a seed URL. BFS, DFS, sitemap, or map (URL-only) strategies with regex include/exclude filters.
|
|
18
18
|
- name: cache
|
|
19
|
-
description:
|
|
19
|
+
description: FTS5 search over previously fetched content. URL glob, date filters, stats, clear, and change detection via re-fetch.
|
|
20
20
|
- name: extract
|
|
21
|
-
description:
|
|
21
|
+
description: Structured extraction from URL or raw HTML. Modes: selector (CSS), tables, metadata (meta + JSON-LD), schema (heuristic field matching).
|
|
22
22
|
- name: find_similar
|
|
23
|
-
description: Find pages
|
|
23
|
+
description: Find pages similar to a URL or concept. Hybrid cache (FTS5 + embeddings) + optional web supplement.
|
|
24
24
|
- name: research
|
|
25
|
-
description:
|
|
25
|
+
description: Multi-step research pipeline. Question decomposition, parallel sub-search, source synthesis with citations. Quick, standard, or comprehensive depth.
|
|
26
26
|
- name: agent
|
|
27
|
-
description:
|
|
27
|
+
description: Natural-language data gathering. Plans searches/URLs, fetches in parallel within page and time budgets, optionally applies a JSON Schema to each page.
|
|
28
28
|
---
|
|
29
29
|
|
|
30
30
|
# wigolo
|
|
31
31
|
|
|
32
|
-
Local-first web search MCP server for AI coding agents.
|
|
32
|
+
Local-first web search MCP server for AI coding agents. Ships eight tools over stdio. All network results land in a local SQLite cache.
|
|
33
33
|
|
|
34
|
-
##
|
|
34
|
+
## Quick Setup
|
|
35
35
|
|
|
36
36
|
**Claude Code:**
|
|
37
37
|
```bash
|
|
@@ -50,147 +50,311 @@ claude mcp add wigolo -- npx @staticn0va/wigolo
|
|
|
50
50
|
}
|
|
51
51
|
```
|
|
52
52
|
|
|
53
|
-
**
|
|
53
|
+
**Warmup (recommended, one-time):**
|
|
54
54
|
```bash
|
|
55
|
-
npx @staticn0va/wigolo warmup
|
|
55
|
+
npx @staticn0va/wigolo warmup # installs Playwright Chromium + bootstraps SearXNG
|
|
56
|
+
npx @staticn0va/wigolo warmup --all # also installs Firefox, WebKit, reranker, embeddings, trafilatura
|
|
57
|
+
npx @staticn0va/wigolo warmup --force # wipe SearXNG state and rebuild
|
|
56
58
|
```
|
|
57
59
|
|
|
60
|
+
Warmup flags: `--force`, `--all`, `--trafilatura`, `--reranker`, `--firefox`, `--webkit`, `--embeddings`, `--lightpanda`.
|
|
61
|
+
|
|
58
62
|
## Tools
|
|
59
63
|
|
|
60
|
-
###
|
|
61
|
-
|
|
64
|
+
### fetch
|
|
65
|
+
|
|
66
|
+
Fetch a single URL and return clean markdown. Use when you already have a specific URL.
|
|
67
|
+
|
|
68
|
+
Parameters:
|
|
69
|
+
- `url` (string, required)
|
|
70
|
+
- `render_js`: `"auto"` (default) | `"always"` | `"never"`
|
|
71
|
+
- `use_auth`: boolean (default `false`) — reuses the user's browser session
|
|
72
|
+
- `max_chars`: number
|
|
73
|
+
- `section`: string — return only the content under a heading
|
|
74
|
+
- `section_index`: number (default `0`) — which heading match when multiple hit
|
|
75
|
+
- `screenshot`: boolean (default `false`)
|
|
76
|
+
- `headers`: object
|
|
77
|
+
- `force_refresh`: boolean — bypass cache
|
|
78
|
+
- `actions`: array of `{type, selector, text, ms, timeout, direction, amount}` — `click`, `type`, `wait`, `wait_for`, `scroll`, `screenshot`. Forces Playwright when present.
|
|
79
|
+
|
|
80
|
+
Example:
|
|
62
81
|
```json
|
|
63
|
-
{ "
|
|
82
|
+
{ "url": "https://react.dev/reference/react/useState", "section": "Parameters" }
|
|
64
83
|
```
|
|
65
84
|
|
|
66
|
-
|
|
67
|
-
|
|
85
|
+
Tip: `section` is much cheaper than reading the full page. Repeat fetches of the same URL are free from cache unless `force_refresh: true`.
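Reviewer note: the added `section` / `section_index` parameters can be pictured with a small sketch. This is not wigolo's implementation, which the diff does not show. `extractSection` is a hypothetical helper that assumes ATX-style (`#`) headings, case-insensitive substring matching on the heading text, and a section that ends at the next heading of the same or a higher level. It also ignores fenced code, so a `#` comment inside a code block would be misread as a heading.

```typescript
// Hypothetical helper (assumption): not part of the wigolo package.
// Returns the markdown under the Nth heading whose text contains `section`,
// or null when no heading matches.
function extractSection(
  markdown: string,
  section: string,
  sectionIndex: number = 0
): string | null {
  const lines = markdown.split("\n");
  const needle = section.toLowerCase();
  let matchCount = 0;

  for (let i = 0; i < lines.length; i++) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(lines[i]);
    if (heading === null || heading[2].toLowerCase().indexOf(needle) === -1) {
      continue; // not a heading, or heading text does not match
    }
    if (matchCount++ !== sectionIndex) {
      continue; // a match, but not the occurrence the caller asked for
    }
    // Collect lines until the next heading of the same or a higher level.
    const level = heading[1].length;
    const body: string[] = [];
    for (let j = i + 1; j < lines.length; j++) {
      const next = /^(#{1,6})\s+/.exec(lines[j]);
      if (next !== null && next[1].length <= level) {
        break;
      }
      body.push(lines[j]);
    }
    return body.join("\n").trim();
  }
  return null;
}
```

Under these assumptions, nested subsections stay inside the returned text, and `sectionIndex` picks between repeated headings such as two "Parameters" blocks on one page.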
|
|
86
|
+
|
|
87
|
+
### search
|
|
88
|
+
|
|
89
|
+
Search the web and return extracted markdown per result. Use when you don't have a URL yet.
|
|
90
|
+
|
|
91
|
+
Parameters:
|
|
92
|
+
- `query` (string OR `string[]`, required) — array runs variants in parallel and dedupes
|
|
93
|
+
- `max_results`: number (default `5`, cap `20`)
|
|
94
|
+
- `include_content`: boolean (default `true`)
|
|
95
|
+
- `content_max_chars`: number (default `30000`)
|
|
96
|
+
- `max_total_chars`: number (default `50000`)
|
|
97
|
+
- `time_range`: `"day"` | `"week"` | `"month"` | `"year"`
|
|
98
|
+
- `include_domains` / `exclude_domains`: `string[]`
|
|
99
|
+
- `from_date` / `to_date`: ISO `YYYY-MM-DD`
|
|
100
|
+
- `category`: `"general"` | `"news"` | `"code"` | `"docs"` | `"papers"` | `"images"`
|
|
101
|
+
- `language`: string
|
|
102
|
+
- `search_engines`: `string[]` — override engine selection
|
|
103
|
+
- `format`: `"full"` (default) | `"context"` (token-budgeted string) | `"answer"` (synthesized via MCP sampling) | `"stream_answer"` (answer + phase progress notifications)
|
|
104
|
+
- `force_refresh`: boolean
|
|
105
|
+
|
|
106
|
+
Example:
|
|
68
107
|
```json
|
|
69
|
-
{ "
|
|
108
|
+
{ "query": ["react server components patterns", "RSC data fetching", "react server components streaming"], "category": "docs", "include_domains": ["react.dev"], "max_results": 5 }
|
|
70
109
|
```
|
|
71
110
|
|
|
111
|
+
Tip: keyword queries beat natural-language questions. A 3–5 item `query` array usually finds more unique sources than one longer query.
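Reviewer note: the new `query` array is described as running variants in parallel and deduping. The diff does not show how duplicates are detected, so the sketch below is only one plausible reading. `SearchHit`, `normalizeUrl`, and `mergeVariants` are hypothetical names, and the normalization rule (drop the fragment and trailing slashes) is an assumption.

```typescript
// Hypothetical result shape (assumption): wigolo's real output may differ.
interface SearchHit {
  url: string;
  title: string;
}

// Normalize a URL so trivially different forms compare equal.
function normalizeUrl(url: string): string {
  return url
    .replace(/#.*$/, "") // drop the fragment
    .replace(/\/+$/, ""); // drop trailing slashes
}

// Merge per-variant result lists, keeping the first hit seen for each URL
// and stopping once `maxResults` unique hits have been collected.
function mergeVariants(
  resultsPerVariant: SearchHit[][],
  maxResults: number
): SearchHit[] {
  const seen: { [key: string]: boolean } = {};
  const merged: SearchHit[] = [];
  for (const hits of resultsPerVariant) {
    for (const hit of hits) {
      const key = normalizeUrl(hit.url);
      if (Object.prototype.hasOwnProperty.call(seen, key)) {
        continue; // already returned by an earlier variant
      }
      if (merged.length >= maxResults) {
        return merged;
      }
      seen[key] = true;
      merged.push(hit);
    }
  }
  return merged;
}
```

Keeping the first occurrence means the order of the variants in the array decides which copy of a duplicate survives.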
|
|
112
|
+
|
|
72
113
|
### crawl
|
|
73
|
-
|
|
114
|
+
|
|
115
|
+
Crawl a site starting from a seed URL.
|
|
116
|
+
|
|
117
|
+
Parameters:
|
|
118
|
+
- `url` (string, required)
|
|
119
|
+
- `strategy`: `"bfs"` (default) | `"dfs"` | `"sitemap"` | `"map"` (URL-only discovery, no content)
|
|
120
|
+
- `max_depth`: number (default `2`)
|
|
121
|
+
- `max_pages`: number (default `20`)
|
|
122
|
+
- `include_patterns` / `exclude_patterns`: regex `string[]`
|
|
123
|
+
- `use_auth`: boolean (default `false`)
|
|
124
|
+
- `extract_links`: boolean (default `false`) — returns inter-page link graph
|
|
125
|
+
- `max_total_chars`: number (default `100000`)
|
|
126
|
+
|
|
127
|
+
Example:
|
|
74
128
|
```json
|
|
75
|
-
{ "url": "https://docs.
|
|
129
|
+
{ "url": "https://docs.python.org/3/library/", "strategy": "sitemap", "max_pages": 30, "include_patterns": ["^https://docs\\.python\\.org/3/library/asyncio"] }
|
|
76
130
|
```
|
|
77
131
|
|
|
132
|
+
Tip: `strategy: "sitemap"` is faster and more complete than BFS on doc sites. `strategy: "map"` returns URLs only — cheap way to scope before targeted fetches.
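Reviewer note: `include_patterns` and `exclude_patterns` are documented as regex string arrays, but the diff does not state how the two lists combine. The sketch below assumes the common convention: a URL is kept when it matches at least one include pattern (or no include patterns are given) and matches no exclude pattern. `inScope` is a hypothetical name.

```typescript
// Hypothetical scope filter (assumption): the precedence between include and
// exclude lists is not confirmed by the diff.
function inScope(
  url: string,
  includePatterns: string[] = [],
  excludePatterns: string[] = []
): boolean {
  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((pattern) => new RegExp(pattern).test(url));

  // With include patterns present, the URL must match at least one of them.
  if (includePatterns.length > 0 && !matchesAny(includePatterns)) {
    return false;
  }
  // Any exclude match removes the URL, even if it was included above.
  return !matchesAny(excludePatterns);
}
```

With the pattern from the example above, only URLs under the asyncio prefix of the Python library docs would pass.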
|
|
133
|
+
|
|
78
134
|
### cache
|
|
79
|
-
|
|
135
|
+
|
|
136
|
+
Search previously fetched content without hitting the network.
|
|
137
|
+
|
|
138
|
+
Parameters:
|
|
139
|
+
- `query`: FTS5 syntax — supports `AND`, `OR`, `NOT`, `"exact phrase"`
|
|
140
|
+
- `url_pattern`: glob (e.g. `"*react.dev*"`)
|
|
141
|
+
- `since`: ISO date
|
|
142
|
+
- `stats`: boolean — returns total URLs, size, date range
|
|
143
|
+
- `clear`: boolean — deletes matching entries (requires one of `query`, `url_pattern`, `since`)
|
|
144
|
+
- `check_changes`: boolean — re-fetches matching URLs, reports changed/unchanged with diff summaries
|
|
145
|
+
|
|
146
|
+
Example:
|
|
80
147
|
```json
|
|
81
|
-
{ "query": "
|
|
148
|
+
{ "query": "useState OR useReducer", "url_pattern": "*react.dev*" }
|
|
82
149
|
```
|
|
83
150
|
|
|
151
|
+
Tip: cache hits are instant and cross-session. Run this before `search` or `fetch` when you suspect the content is already on disk.
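Reviewer note: `url_pattern` is documented as a glob, for example `"*react.dev*"`. The exact glob dialect is not shown in the diff. The sketch below assumes the simplest one, where `*` matches any run of characters and everything else is literal. `globToRegExp` is a hypothetical helper.

```typescript
// Hypothetical glob matcher (assumption): only `*` is treated as a wildcard;
// wigolo's real matching rules may support more syntax.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    // Escape regex metacharacters so dots and slashes match literally.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$");
}
```

Escaping matters here: without it, the `.` in `react.dev` would match any character.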
|
|
152
|
+
|
|
84
153
|
### extract
|
|
85
|
-
|
|
154
|
+
|
|
155
|
+
Structured extraction from URL or raw HTML.
|
|
156
|
+
|
|
157
|
+
Parameters:
|
|
158
|
+
- `url` OR `html` (one required; `url` wins if both provided)
|
|
159
|
+
- `mode`: `"metadata"` (default) | `"selector"` | `"tables"` | `"schema"`
|
|
160
|
+
- `css_selector`: string — required for `mode: "selector"`
|
|
161
|
+
- `multiple`: boolean (default `false`) — return all matches, selector mode only
|
|
162
|
+
- `schema`: JSON Schema object with `properties` — required for `mode: "schema"`
|
|
163
|
+
|
|
164
|
+
Example:
|
|
86
165
|
```json
|
|
87
|
-
{ "url": "https://example.com/product", "mode": "schema", "schema": { "type": "object", "properties": { "price": { "type": "string" }, "name": { "type": "string" } } } }
|
|
166
|
+
{ "url": "https://example.com/product", "mode": "schema", "schema": { "type": "object", "properties": { "price": { "type": "string" }, "name": { "type": "string" }, "sku": { "type": "string" } } } }
|
|
88
167
|
```
|
|
89
168
|
|
|
169
|
+
Tip: `mode: "schema"` does heuristic matching over CSS classes, ARIA labels, microdata, and JSON-LD — no LLM call required.
|
|
170
|
+
|
|
90
171
|
### find_similar
|
|
91
|
-
|
|
172
|
+
|
|
173
|
+
Find pages related to a URL or a free-text concept.
|
|
174
|
+
|
|
175
|
+
Parameters:
|
|
176
|
+
- `url` OR `concept` (one required)
|
|
177
|
+
- `max_results`: number (default `10`, cap `50`)
|
|
178
|
+
- `include_domains` / `exclude_domains`: `string[]`
|
|
179
|
+
- `include_cache`: boolean (default `true`)
|
|
180
|
+
- `include_web`: boolean (default `true`)
|
|
181
|
+
|
|
182
|
+
Example:
|
|
92
183
|
```json
|
|
93
|
-
{ "url": "https://react.dev/reference/react/useState", "max_results":
|
|
184
|
+
{ "url": "https://react.dev/reference/react/useState", "max_results": 8, "include_domains": ["react.dev", "developer.mozilla.org"] }
|
|
94
185
|
```
|
|
95
186
|
|
|
187
|
+
Tip: uses hybrid 3-way search — FTS5 over titles, FTS5 over body, plus embeddings when available. Cache path is near-instant; web supplement runs only if cache yields too few results.
|
|
188
|
+
|
|
96
189
|
### research
|
|
97
|
-
Deep multi-step research that plans queries, fetches, and synthesizes.
|
|
98
|
-
```json
|
|
99
|
-
{ "question": "How do modern bundlers handle tree-shaking of ESM vs CJS", "depth": "standard", "max_sources": 10 }
|
|
100
|
-
```
|
|
101
190
|
|
|
102
|
-
|
|
103
|
-
|
|
191
|
+
Multi-step research pipeline with decomposition, parallel search, and cited synthesis.
|
|
192
|
+
|
|
193
|
+
Parameters:
|
|
194
|
+
- `question` (string, required)
|
|
195
|
+
- `depth`: `"quick"` (~15s, 2 sub-queries, 5–8 sources) | `"standard"` (~40s, default) | `"comprehensive"` (~80s, 7 sub-queries, 20–25 sources)
|
|
196
|
+
- `max_sources`: number (cap `50`) — overrides depth default
|
|
197
|
+
- `include_domains` / `exclude_domains`: `string[]`
|
|
198
|
+
- `schema`: JSON Schema — if present, report is structured to fill these fields
|
|
199
|
+
- `stream`: boolean — emit progress notifications per phase
|
|
200
|
+
|
|
201
|
+
Example:
|
|
104
202
|
```json
|
|
105
|
-
{ "
|
|
203
|
+
{ "question": "How do modern JS bundlers tree-shake ESM vs CJS?", "depth": "standard", "include_domains": ["webpack.js.org", "rollupjs.org", "esbuild.github.io", "vitejs.dev"] }
|
|
106
204
|
```
|
|
107
205
|
|
|
108
|
-
|
|
206
|
+
Tip: `research` checks cache internally — no need to pre-probe. Requires MCP sampling-capable client for synthesis; without sampling, returns raw sources in context format.
|
|
109
207
|
|
|
110
|
-
|
|
208
|
+
### agent
|
|
111
209
|
|
|
112
|
-
|
|
210
|
+
Natural-language data gathering. Plans queries and URLs from a prompt, runs them in parallel within budget, optionally applies a schema.
|
|
113
211
|
|
|
114
|
-
|
|
212
|
+
Parameters:
|
|
213
|
+
- `prompt` (string, required)
|
|
214
|
+
- `urls`: `string[]` — seed URLs to include
|
|
215
|
+
- `schema`: JSON Schema — extract structured fields per page and merge
|
|
216
|
+
- `max_pages`: number (default `10`, cap `100`)
|
|
217
|
+
- `max_time_ms`: number (default `60000`, cap `600000`)
|
|
218
|
+
- `stream`: boolean
|
|
115
219
|
|
|
116
|
-
|
|
220
|
+
Example:
|
|
221
|
+
```json
|
|
222
|
+
{ "prompt": "Compare pricing tiers for Supabase, Firebase, and Clerk", "schema": { "type": "object", "properties": { "provider": { "type": "string" }, "free_tier": { "type": "string" }, "paid_start": { "type": "string" } } }, "max_pages": 12 }
|
|
223
|
+
```
|
|
117
224
|
|
|
118
|
-
|
|
225
|
+
Tip: output includes a `steps` array showing every action (plan, search, fetch, extract, synthesize) with timings. Use this to debug why an agent run produced a weak result.
|
|
119
226
|
|
|
120
|
-
|
|
227
|
+
## Workflow Patterns
|
|
121
228
|
|
|
122
|
-
|
|
229
|
+
Quick routing:
|
|
230
|
+
- Use when `search` — you need information but don't have a URL.
|
|
231
|
+
- Use when `fetch` — you already have the URL.
|
|
232
|
+
- Use when `crawl` — you need multiple pages from one site.
|
|
233
|
+
- Use when `cache` — you want to check whether something is already on disk.
|
|
234
|
+
- Use when `extract` — you need specific fields, tables, or metadata, not the whole page.
|
|
235
|
+
- Use when `find_similar` — you have a good page/concept and want related content.
|
|
236
|
+
- Use when `research` — a question needs decomposition and multi-source synthesis.
|
|
237
|
+
- Use when `agent` — a natural-language task needs multi-step data gathering.
|
|
238
|
+
|
|
239
|
+
**Cache-first lookup.** Before any `fetch` or `search`, probe the cache.
|
|
240
|
+
```json
|
|
241
|
+
cache({ "query": "oauth2 pkce", "url_pattern": "*auth0.com*" })
|
|
242
|
+
// empty? fall through to search
|
|
243
|
+
search({ "query": "oauth2 pkce flow", "include_domains": ["auth0.com"] })
|
|
244
|
+
```
|
|
123
245
|
|
|
124
|
-
**
|
|
246
|
+
**Fresh content (news, dashboards, changelogs).** Bypass cache explicitly.
|
|
247
|
+
```json
|
|
248
|
+
search({ "query": "node.js 22 release notes", "force_refresh": true, "time_range": "week" })
|
|
249
|
+
fetch({ "url": "https://nodejs.org/en/blog", "force_refresh": true })
|
|
250
|
+
```
|
|
125
251
|
|
|
126
|
-
**
|
|
252
|
+
**Scoped documentation research.** Crawl the relevant slice, then query cache.
|
|
253
|
+
```json
|
|
254
|
+
crawl({ "url": "https://docs.astro.build", "strategy": "sitemap", "max_pages": 40 })
|
|
255
|
+
cache({ "query": "server islands hydration", "url_pattern": "*docs.astro.build*" })
|
|
256
|
+
```
|
|
127
257
|
|
|
128
|
-
|
|
258
|
+
**Broad exploration.** Pass a query array; dedup is automatic.
|
|
259
|
+
```json
|
|
260
|
+
search({ "query": ["rust async runtimes comparison", "tokio vs async-std vs smol", "rust executor benchmarks"], "max_results": 8 })
|
|
261
|
+
```
|
|
129
262
|
|
|
130
|
-
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
- `from_date` / `to_date` for time-sensitive queries
|
|
135
|
-
- `format: "context"` returns a single token-budgeted string for LLM injection
|
|
263
|
+
**More like this.** Start with a known-good URL, widen via `find_similar`.
|
|
264
|
+
```json
|
|
265
|
+
find_similar({ "url": "https://react.dev/reference/react/useMemo", "max_results": 6, "include_domains": ["react.dev"] })
|
|
266
|
+
```
|
|
136
267
|
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
268
|
+
**Complex synthesis.** One `research` call replaces 5+ manual search/fetch cycles.
|
|
269
|
+
```json
|
|
270
|
+
research({ "question": "Tradeoffs of vector DBs for RAG at 100M+ embeddings", "depth": "comprehensive" })
|
|
271
|
+
```
|
|
141
272
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
273
|
+
**Structured data from multiple sources.** Use `agent` with a schema.
|
|
274
|
+
```json
|
|
275
|
+
agent({ "prompt": "Find latency and pricing for top 5 edge compute providers", "schema": { "type": "object", "properties": { "provider": {"type":"string"}, "cold_start_ms": {"type":"string"}, "price_per_million": {"type":"string"} } } })
|
|
276
|
+
```
|
|
146
277
|
|
|
147
|
-
|
|
148
|
-
|
|
149
|
-
|
|
278
|
+
**Table extraction.** Skip markdown entirely.
|
|
279
|
+
```json
|
|
280
|
+
extract({ "url": "https://en.wikipedia.org/wiki/List_of_programming_languages", "mode": "tables" })
|
|
281
|
+
```
|
|
150
282
|
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
|
|
283
|
+
## Parameter Cheat Sheet
|
|
284
|
+
|
|
285
|
+
| Situation | Tool + parameters |
|
|
286
|
+
|---|---|
|
|
287
|
+
| Focused lookup, known site | `search` + `max_results: 3` + `include_domains` |
|
|
288
|
+
| Broad topic survey | `search` + `query: [...3-5 variants]` + `max_results: 8` |
|
|
289
|
+
| Fresh content required | any tool + `force_refresh: true` |
|
|
290
|
+
| Doc site indexing | `crawl` + `strategy: "sitemap"` |
|
|
291
|
+
| Site URL inventory only | `crawl` + `strategy: "map"` |
|
|
292
|
+
| Single heading from long page | `fetch` + `section: "..."` |
|
|
293
|
+
| Behind login | `fetch` / `crawl` + `use_auth: true` |
|
|
294
|
+
| Direct answer (sampling client) | `search` + `format: "answer"` |
|
|
295
|
+
| LLM-ready context blob | `search` + `format: "context"` |
|
|
296
|
+
| Complex question, multi-source | `research` + `depth: "standard"` |
|
|
297
|
+
| Structured multi-page extraction | `agent` + `schema` |
|
|
298
|
+
| One-page structured data | `extract` + `mode: "schema"` or `"tables"` |
|
|
299
|
+
| Change tracking | `cache` + `check_changes: true` |
|
|
155
300
|
|
|
156
301
|
## Anti-Patterns
|
|
157
302
|
|
|
158
|
-
|
|
303
|
+
**Do not skip the cache.** Running `search` or `fetch` without probing `cache` wastes time on content already on disk. `research` and `agent` check cache internally; manual `search`/`fetch` do not.
|
|
159
304
|
|
|
160
|
-
**Do not
|
|
305
|
+
**Do not send natural-language questions to `search`.** Use keywords. `"how do I debounce in React hooks"` loses to `"react useDebounce hook custom"`.
|
|
161
306
|
|
|
162
|
-
**Do not
|
|
307
|
+
**Do not retry an identical failing query.** Reformulate keywords, swap `category`, or add `include_domains`. Same query → same empty result.
|
|
163
308
|
|
|
164
|
-
**Do not
|
|
309
|
+
**Do not use `agent` or `research` for one-URL lookups.** Use `fetch`. `agent` is for multi-source gathering; `research` is for decomposable questions.
|
|
165
310
|
|
|
166
|
-
**Do not
|
|
311
|
+
**Do not crawl `max_pages: 100` without filters.** Always add `include_patterns` to stay in-scope. Unfiltered crawls fetch nav, footer, and sitemap garbage.
|
|
167
312
|
|
|
168
|
-
**Do not
|
|
313
|
+
**Do not fetch whole pages when you need one section.** `fetch` + `section` reads under one heading only.
|
|
169
314
|
|
|
170
|
-
**Do not
|
|
315
|
+
**Do not set `force_refresh: true` by default.** It defeats the cache. Use it for news, status, changelogs — content that actually churns.
|
|
171
316
|
|
|
172
|
-
**Do not
|
|
317
|
+
**Do not pass a JSON Schema to `extract` without `properties`.** The handler rejects schemas that lack a `properties` key.
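Reviewer note: a caller can guard against this rejection before invoking `extract`. The check below mirrors only the documented rule (a schema needs a `properties` key). `hasProperties` is a hypothetical helper, and the real handler's validation and error text are not shown in this diff.

```typescript
// Hypothetical pre-flight check (assumption): mirrors the documented rule only.
// Returns true when `schema` is an object carrying an object-valued `properties` key.
function hasProperties(schema: unknown): boolean {
  if (typeof schema !== "object" || schema === null) {
    return false;
  }
  const props = (schema as { properties?: unknown }).properties;
  return typeof props === "object" && props !== null && !Array.isArray(props);
}
```

Running this on the schema before a `mode: "schema"` call would avoid a round trip that is expected to fail.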
|
|
173
318
|
|
|
174
|
-
|
|
319
|
+
## CLI Commands
|
|
320
|
+
|
|
321
|
+
```bash
|
|
322
|
+
wigolo # default: start MCP server on stdio
|
|
323
|
+
wigolo mcp # explicit: start MCP server
|
|
324
|
+
wigolo warmup [flags] # install Playwright, bootstrap SearXNG, optional extras
|
|
325
|
+
wigolo serve # start HTTP daemon on WIGOLO_DAEMON_PORT (default 3333)
|
|
326
|
+
wigolo health # health probe, exits 0 if ok
|
|
327
|
+
wigolo doctor # environment diagnostics (Python, Docker, Playwright, SearXNG)
|
|
328
|
+
wigolo auth discover # list CDP sessions (needs WIGOLO_CDP_URL)
|
|
329
|
+
wigolo auth status # show configured auth paths
|
|
330
|
+
wigolo plugin add <git-url> # clone plugin into ~/.wigolo/plugins/
|
|
331
|
+
wigolo plugin list # list installed plugins
|
|
332
|
+
wigolo plugin remove <name> # remove a plugin
|
|
333
|
+
wigolo shell [--json] # interactive REPL against subsystems
|
|
334
|
+
```
|
|
175
335
|
|
|
176
|
-
##
|
|
336
|
+
## Configuration
|
|
177
337
|
|
|
178
|
-
|
|
179
|
-
- Zero cloud dependency -- runs entirely local
|
|
180
|
-
- Authenticated browsing (Chrome profiles, session state)
|
|
181
|
-
- Localhost access (develop against local servers)
|
|
182
|
-
- SQLite FTS5 cache with full-text search
|
|
183
|
-
- ML reranking (optional, via FlashRank)
|
|
184
|
-
- Extraction ensemble: site-specific, Defuddle, Trafilatura, Readability, Turndown
|
|
338
|
+
Top environment variables. All optional — defaults are safe.
|
|
185
339
|
|
|
186
|
-
|
|
340
|
+
| Variable | Default | Purpose |
|
|
341
|
+
|---|---|---|
|
|
342
|
+
| `WIGOLO_DATA_DIR` | `~/.wigolo` | Cache DB, SearXNG state, plugins, embeddings |
|
|
343
|
+
| `SEARXNG_URL` | unset | Point at an existing SearXNG (skips native bootstrap) |
|
|
344
|
+
| `SEARXNG_MODE` | `native` | `native` runs local Python SearXNG; `docker` runs container |
|
|
345
|
+
| `WIGOLO_CHROME_PROFILE_PATH` | unset | Chrome profile for `use_auth: true` |
|
|
346
|
+
| `WIGOLO_CDP_URL` | unset | Chrome DevTools endpoint (e.g. `http://localhost:9222`) |
|
|
347
|
+
| `MAX_BROWSERS` | `3` | Playwright pool size |
|
|
348
|
+
| `WIGOLO_BROWSER_TYPES` | `chromium` | Comma list: `chromium,firefox,webkit` |
|
|
349
|
+
| `WIGOLO_RERANKER` | `none` | `flashrank` for ML reranking |
|
|
350
|
+
| `WIGOLO_EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | Used by `find_similar` |
|
|
351
|
+
| `CACHE_TTL_CONTENT` | `604800` (7d) | Seconds before cached pages expire |
|
|
352
|
+
| `LOG_LEVEL` | `info` | `debug` \| `info` \| `warn` \| `error` |
|
|
187
353
|
|
|
188
|
-
|
|
189
|
-
- Python 3.8+ (recommended, for embedded SearXNG search)
|
|
190
|
-
- Docker (optional, alternative to Python for SearXNG)
|
|
354
|
+
Full list: see `src/config.ts`.
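Reviewer note: `src/config.ts` is not part of this diff, so the following is only a sketch of how the documented defaults could be resolved from the environment. `WigoloConfig`, `toInt`, and `loadConfig` are hypothetical names. The variable names and default values are taken from the table above.

```typescript
// Hypothetical config loader (assumption): the real logic lives in src/config.ts.
interface WigoloConfig {
  dataDir: string;
  searxngMode: string;
  maxBrowsers: number;
  browserTypes: string[];
  cacheTtlContentSeconds: number;
  logLevel: string;
}

type Env = { [name: string]: string | undefined };

// Parse an integer variable, falling back when it is unset or not a number.
function toInt(raw: string | undefined, fallback: number): number {
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  return isNaN(parsed) ? fallback : parsed;
}

function loadConfig(env: Env): WigoloConfig {
  return {
    dataDir: env["WIGOLO_DATA_DIR"] || "~/.wigolo",
    searxngMode: env["SEARXNG_MODE"] || "native",
    maxBrowsers: toInt(env["MAX_BROWSERS"], 3),
    browserTypes: (env["WIGOLO_BROWSER_TYPES"] || "chromium")
      .split(",")
      .map((name) => name.trim()),
    cacheTtlContentSeconds: toInt(env["CACHE_TTL_CONTENT"], 604800), // 7 days
    logLevel: env["LOG_LEVEL"] || "info"
  };
}
```

In Node this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.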
|
|
191
355
|
|
|
192
356
|
## Links
|
|
193
357
|
|
|
194
358
|
- Repository: https://github.com/KnockOutEZ/wigolo
|
|
195
359
|
- npm: https://www.npmjs.com/package/@staticn0va/wigolo
|
|
196
|
-
- License:
|
|
360
|
+
- License: BUSL-1.1 (converts to open source on 2029-04-12)
|
package/dist/cli/doctor.d.ts.map
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
{"version":3,"file":"doctor.d.ts","sourceRoot":"","sources":["../../src/cli/doctor.ts"],"names":[],"mappings":"
|
|
1
|
+
{"version":3,"file":"doctor.d.ts","sourceRoot":"","sources":["../../src/cli/doctor.ts"],"names":[],"mappings":"AA4DA;;;;;GAKG;AACH,wBAAsB,SAAS,CAAC,OAAO,EAAE,MAAM,GAAG,OAAO,CAAC,MAAM,CAAC,CA+EhE"}
|