@staticn0va/wigolo 0.5.1 → 0.6.0

package/SKILL.md CHANGED
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: wigolo
3
- description: Local-first web search MCP server for AI coding agents. Search, fetch, crawl, cache, extract, find similar pages, deep research, and autonomous agent mode with zero API keys.
3
+ description: Local-first web access MCP server for AI coding agents. Eight tools for search, fetch, crawl, cache, extract, find similar, research, and agent-driven data gathering. No API keys. Results cached in local SQLite.
4
4
  author: KnockOutEZ
5
5
  license: BUSL-1.1
6
6
  repository: https://github.com/KnockOutEZ/wigolo
@@ -9,29 +9,29 @@ install: npx @staticn0va/wigolo
9
9
  runtime: node
10
10
  min_runtime_version: "20"
11
11
  tools:
12
- - name: search
13
- description: Search the web and return results with optional full content extraction. Supports domain filtering, date ranges, categories, and ML reranking.
14
12
  - name: fetch
15
- description: Fetch a web page and return its content as clean markdown. Supports JavaScript rendering, authenticated browsing, section extraction, and caching.
13
+ description: Fetch one URL, return clean markdown. Auto-routes between HTTP and Playwright. Supports sections, auth, screenshots, browser actions.
14
+ - name: search
15
+ description: Search the web, return extracted markdown per result. Single query or array of query variants. Domain, category, date filters. Optional synthesized answer via MCP sampling.
16
16
  - name: crawl
17
- description: Crawl a website starting from a seed URL. Supports BFS, DFS, sitemap, and map (URL-only) strategies with depth/page limits and URL filtering.
17
+ description: Crawl a site from a seed URL. BFS, DFS, sitemap, or map (URL-only) strategies with regex include/exclude filters.
18
18
  - name: cache
19
- description: Query the local knowledge base of previously fetched content. Full-text search over cached pages by query, URL pattern, or date. Cache stats and clearing.
19
+ description: FTS5 search over previously fetched content. URL glob, date filters, stats, clear, and change detection via re-fetch.
20
20
  - name: extract
21
- description: Extract structured data from a web page. CSS selector extraction, HTML table parsing, metadata extraction (title, author, JSON-LD), and JSON Schema heuristic matching.
21
+ description: Structured extraction from URL or raw HTML. Modes: selector (CSS), tables, metadata (meta + JSON-LD), schema (heuristic field matching).
22
22
  - name: find_similar
23
- description: Find pages semantically similar to a given URL or concept. Uses cached embeddings and web search to discover related content.
23
+ description: Find pages similar to a URL or concept. Hybrid cache (FTS5 + embeddings) + optional web supplement.
24
24
  - name: research
25
- description: Deep multi-step research on a question. Decomposes into sub-queries, searches in parallel, fetches sources, and synthesizes a report with citations.
25
+ description: Multi-step research pipeline. Question decomposition, parallel sub-search, source synthesis with citations. Quick, standard, or comprehensive depth.
26
26
  - name: agent
27
- description: Autonomous data gathering agent. Plans search queries from a prompt, fetches pages within budget, optionally extracts structured data via JSON Schema, and synthesizes results.
27
+ description: Natural-language data gathering. Plans searches/URLs, fetches in parallel within page and time budgets, optionally applies a JSON Schema to each page.
28
28
  ---
29
29
 
30
30
  # wigolo
31
31
 
32
- Local-first web search MCP server for AI coding agents.
32
+ Local-first web search MCP server for AI coding agents. Ships eight tools over stdio. All network results land in a local SQLite cache.
33
33
 
34
- ## Installation
34
+ ## Quick Setup
35
35
 
36
36
  **Claude Code:**
37
37
  ```bash
@@ -50,147 +50,311 @@ claude mcp add wigolo -- npx @staticn0va/wigolo
50
50
  }
51
51
  ```
52
52
 
53
- **Optional warmup (improves search quality):**
53
+ **Warmup (recommended, one-time):**
54
54
  ```bash
55
- npx @staticn0va/wigolo warmup
55
+ npx @staticn0va/wigolo warmup # installs Playwright Chromium + bootstraps SearXNG
56
+ npx @staticn0va/wigolo warmup --all # also installs Firefox, WebKit, reranker, embeddings, trafilatura
57
+ npx @staticn0va/wigolo warmup --force # wipe SearXNG state and rebuild
56
58
  ```
57
59
 
60
+ Warmup flags: `--force`, `--all`, `--trafilatura`, `--reranker`, `--firefox`, `--webkit`, `--embeddings`, `--lightpanda`.
61
+
58
62
  ## Tools
59
63
 
60
- ### search
61
- Search the web and get full markdown content in one call.
64
+ ### fetch
65
+
66
+ Fetch a single URL and return clean markdown. Use when you already have a specific URL.
67
+
68
+ Parameters:
69
+ - `url` (string, required)
70
+ - `render_js`: `"auto"` (default) | `"always"` | `"never"`
71
+ - `use_auth`: boolean (default `false`) — reuses the user's browser session
72
+ - `max_chars`: number
73
+ - `section`: string — return only the content under a heading
74
+ - `section_index`: number (default `0`) — which heading match when multiple hit
75
+ - `screenshot`: boolean (default `false`)
76
+ - `headers`: object
77
+ - `force_refresh`: boolean — bypass cache
78
+ - `actions`: array of `{type, selector, text, ms, timeout, direction, amount}` — `click`, `type`, `wait`, `wait_for`, `scroll`, `screenshot`. Forces Playwright when present.
79
+
80
+ Example:
62
81
  ```json
63
- { "query": "React Server Components best practices", "max_results": 5, "include_domains": ["react.dev"] }
82
+ { "url": "https://react.dev/reference/react/useState", "section": "Parameters" }
64
83
  ```
65
84
 
66
- ### fetch
67
- Fetch any URL and get clean markdown.
85
+ Tip: `section` is much cheaper than reading the full page. Repeat fetches of the same URL are free from cache unless `force_refresh: true`.
86
+
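The routing rules described for `fetch` can be sketched as a small decision function. This is a minimal illustration and not wigolo's implementation: the `FetchArgs` subset, the `Route` type, and `pickRoute` are hypothetical names, and the ordering of the checks is an assumption. Only "actions force Playwright" and the three `render_js` values come from the parameter list above.

```typescript
// Hypothetical subset of the `fetch` tool arguments listed above.
type RenderJs = "auto" | "always" | "never";

interface FetchArgs {
  url: string;
  render_js?: RenderJs;
  use_auth?: boolean;
  actions?: { type: string }[];
}

// "auto" means: let the server decide between HTTP and Playwright.
type Route = "http" | "browser" | "auto";

// Assumed decision order; not taken from wigolo's source.
function pickRoute(args: FetchArgs): Route {
  // Documented: a non-empty `actions` array forces Playwright.
  if (args.actions && args.actions.length > 0) return "browser";
  if (args.render_js === "always") return "browser";
  // Assumption: reusing a browser session implies a real browser.
  if (args.use_auth) return "browser";
  if (args.render_js === "never") return "http";
  return "auto";
}
```

Under these assumptions, a call with both `actions` and `render_js: "never"` still goes to the browser, which matches the note that `actions` forces Playwright when present.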
87
+ ### search
88
+
89
+ Search the web and return extracted markdown per result. Use when you don't have a URL yet.
90
+
91
+ Parameters:
92
+ - `query` (string OR `string[]`, required) — array runs variants in parallel and dedupes
93
+ - `max_results`: number (default `5`, cap `20`)
94
+ - `include_content`: boolean (default `true`)
95
+ - `content_max_chars`: number (default `30000`)
96
+ - `max_total_chars`: number (default `50000`)
97
+ - `time_range`: `"day"` | `"week"` | `"month"` | `"year"`
98
+ - `include_domains` / `exclude_domains`: `string[]`
99
+ - `from_date` / `to_date`: ISO `YYYY-MM-DD`
100
+ - `category`: `"general"` | `"news"` | `"code"` | `"docs"` | `"papers"` | `"images"`
101
+ - `language`: string
102
+ - `search_engines`: `string[]` — override engine selection
103
+ - `format`: `"full"` (default) | `"context"` (token-budgeted string) | `"answer"` (synthesized via MCP sampling) | `"stream_answer"` (answer + phase progress notifications)
104
+ - `force_refresh`: boolean
105
+
106
+ Example:
68
107
  ```json
69
- { "url": "https://docs.react.dev/reference/react/useState", "section": "Parameters" }
108
+ { "query": ["react server components patterns", "RSC data fetching", "react server components streaming"], "category": "docs", "include_domains": ["react.dev"], "max_results": 5 }
70
109
  ```
71
110
 
111
+ Tip: keyword queries beat natural-language questions. A 3–5 item `query` array usually finds more unique sources than one longer query.
112
+
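The "array runs variants in parallel and dedupes" behaviour of `query` can be pictured as merging per-variant result lists by URL. The sketch below is an assumption about what deduplication might look like, not the server's actual logic; `SearchHit`, `normalizeUrl`, and `mergeHits` are hypothetical names.

```typescript
// Hypothetical shape of one search result.
interface SearchHit {
  url: string;
  title: string;
}

// Treat URLs that differ only by fragment or trailing slash as the same page.
function normalizeUrl(raw: string): string {
  return raw.replace(/#.*$/, "").replace(/\/+$/, "");
}

// Merge result batches (one per query variant), keeping the first hit per URL.
function mergeHits(batches: SearchHit[][]): SearchHit[] {
  const seen = new Set<string>();
  const merged: SearchHit[] = [];
  for (const batch of batches) {
    for (const hit of batch) {
      const key = normalizeUrl(hit.url);
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(hit);
      }
    }
  }
  return merged;
}
```

Keeping the first occurrence preserves the ranking of the earliest variant, which is one reasonable choice among several.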
72
113
  ### crawl
73
- Crawl a site from a seed URL.
114
+
115
+ Crawl a site starting from a seed URL.
116
+
117
+ Parameters:
118
+ - `url` (string, required)
119
+ - `strategy`: `"bfs"` (default) | `"dfs"` | `"sitemap"` | `"map"` (URL-only discovery, no content)
120
+ - `max_depth`: number (default `2`)
121
+ - `max_pages`: number (default `20`)
122
+ - `include_patterns` / `exclude_patterns`: regex `string[]`
123
+ - `use_auth`: boolean (default `false`)
124
+ - `extract_links`: boolean (default `false`) — returns inter-page link graph
125
+ - `max_total_chars`: number (default `100000`)
126
+
127
+ Example:
74
128
  ```json
75
- { "url": "https://docs.example.com", "strategy": "sitemap", "max_pages": 50 }
129
+ { "url": "https://docs.python.org/3/library/", "strategy": "sitemap", "max_pages": 30, "include_patterns": ["^https://docs\\.python\\.org/3/library/asyncio"] }
76
130
  ```
77
131
 
132
+ Tip: `strategy: "sitemap"` is faster and more complete than BFS on doc sites. `strategy: "map"` returns URLs only — cheap way to scope before targeted fetches.
133
+
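To see how `include_patterns` and `exclude_patterns` keep a crawl in scope, here is a minimal sketch of a URL filter. The function name `inScope` is hypothetical, and two behaviours are assumptions rather than documented facts: an empty include list admits every URL, and an exclude match overrides an include match.

```typescript
// Decide whether a discovered URL should be crawled.
// Assumptions: empty `include` means "allow all"; `exclude` takes precedence.
function inScope(url: string, include: string[], exclude: string[]): boolean {
  const included =
    include.length === 0 || include.some((p) => new RegExp(p).test(url));
  const excluded = exclude.some((p) => new RegExp(p).test(url));
  return included && !excluded;
}
```

With the pattern from the example above, `^https://docs\.python\.org/3/library/asyncio`, only URLs under the asyncio pages pass the filter, so a `max_pages` budget is not spent on unrelated library modules.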
78
134
  ### cache
79
- Query previously fetched content without hitting the network.
135
+
136
+ Search previously fetched content without hitting the network.
137
+
138
+ Parameters:
139
+ - `query`: FTS5 syntax — supports `AND`, `OR`, `NOT`, `"exact phrase"`
140
+ - `url_pattern`: glob (e.g. `"*react.dev*"`)
141
+ - `since`: ISO date
142
+ - `stats`: boolean — returns total URLs, size, date range
143
+ - `clear`: boolean — deletes matching entries (requires one of `query`, `url_pattern`, `since`)
144
+ - `check_changes`: boolean — re-fetches matching URLs, reports changed/unchanged with diff summaries
145
+
146
+ Example:
80
147
  ```json
81
- { "query": "React hooks", "url_pattern": "*react.dev*" }
148
+ { "query": "useState OR useReducer", "url_pattern": "*react.dev*" }
82
149
  ```
83
150
 
151
+ Tip: cache hits are instant and cross-session. Run this before `search` or `fetch` when you suspect the content is already on disk.
152
+
84
153
  ### extract
85
- Structured data extraction from any URL or HTML.
154
+
155
+ Structured extraction from URL or raw HTML.
156
+
157
+ Parameters:
158
+ - `url` OR `html` (one required; `url` wins if both provided)
159
+ - `mode`: `"metadata"` (default) | `"selector"` | `"tables"` | `"schema"`
160
+ - `css_selector`: string — required for `mode: "selector"`
161
+ - `multiple`: boolean (default `false`) — return all matches, selector mode only
162
+ - `schema`: JSON Schema object with `properties` — required for `mode: "schema"`
163
+
164
+ Example:
86
165
  ```json
87
- { "url": "https://example.com/product", "mode": "schema", "schema": { "type": "object", "properties": { "price": { "type": "string" }, "name": { "type": "string" } } } }
166
+ { "url": "https://example.com/product", "mode": "schema", "schema": { "type": "object", "properties": { "price": { "type": "string" }, "name": { "type": "string" }, "sku": { "type": "string" } } } }
88
167
  ```
89
168
 
169
+ Tip: `mode: "schema"` does heuristic matching over CSS classes, ARIA labels, microdata, and JSON-LD — no LLM call required.
170
+
90
171
  ### find_similar
91
- Find pages related to a URL or concept.
172
+
173
+ Find pages related to a URL or a free-text concept.
174
+
175
+ Parameters:
176
+ - `url` OR `concept` (one required)
177
+ - `max_results`: number (default `10`, cap `50`)
178
+ - `include_domains` / `exclude_domains`: `string[]`
179
+ - `include_cache`: boolean (default `true`)
180
+ - `include_web`: boolean (default `true`)
181
+
182
+ Example:
92
183
  ```json
93
- { "url": "https://react.dev/reference/react/useState", "max_results": 5 }
184
+ { "url": "https://react.dev/reference/react/useState", "max_results": 8, "include_domains": ["react.dev", "developer.mozilla.org"] }
94
185
  ```
95
186
 
187
+ Tip: uses hybrid 3-way search — FTS5 over titles, FTS5 over body, plus embeddings when available. Cache path is near-instant; web supplement runs only if cache yields too few results.
188
+
96
189
  ### research
97
- Deep multi-step research that plans queries, fetches, and synthesizes.
98
- ```json
99
- { "question": "How do modern bundlers handle tree-shaking of ESM vs CJS", "depth": "standard", "max_sources": 10 }
100
- ```
101
190
 
102
- ### agent
103
- Autonomous data gathering from a natural-language prompt.
191
+ Multi-step research pipeline with decomposition, parallel search, and cited synthesis.
192
+
193
+ Parameters:
194
+ - `question` (string, required)
195
+ - `depth`: `"quick"` (~15s, 2 sub-queries, 5–8 sources) | `"standard"` (~40s, default) | `"comprehensive"` (~80s, 7 sub-queries, 20–25 sources)
196
+ - `max_sources`: number (cap `50`) — overrides depth default
197
+ - `include_domains` / `exclude_domains`: `string[]`
198
+ - `schema`: JSON Schema — if present, report is structured to fill these fields
199
+ - `stream`: boolean — emit progress notifications per phase
200
+
201
+ Example:
104
202
  ```json
105
- { "prompt": "Compare authentication strategies of Supabase, Firebase, and Clerk", "max_pages": 15, "max_time_ms": 90000 }
203
+ { "question": "How do modern JS bundlers tree-shake ESM vs CJS?", "depth": "standard", "include_domains": ["webpack.js.org", "rollupjs.org", "esbuild.github.io", "vitejs.dev"] }
106
204
  ```
107
205
 
108
- ## Workflow Patterns
206
+ Tip: `research` checks the cache internally, so there is no need to pre-probe. Synthesis requires an MCP sampling-capable client; without sampling, it returns raw sources in context format.
109
207
 
110
- Use the right tool for the right situation.
208
+ ### agent
111
209
 
112
- **When you know the URL** -- use `fetch`. One URL, clean markdown. Add `section` to read only the heading you need.
210
+ Natural-language data gathering. Plans queries and URLs from a prompt, runs them in parallel within budget, optionally applies a schema.
113
211
 
114
- **When you need to find information** -- use `search`. Formulate a keyword query (not a natural language question). Scope with `include_domains` and `category` when you know where the answer lives.
212
+ Parameters:
213
+ - `prompt` (string, required)
214
+ - `urls`: `string[]` — seed URLs to include
215
+ - `schema`: JSON Schema — extract structured fields per page and merge
216
+ - `max_pages`: number (default `10`, cap `100`)
217
+ - `max_time_ms`: number (default `60000`, cap `600000`)
218
+ - `stream`: boolean
115
219
 
116
- **When you need multiple pages from one site** -- use `crawl`. For documentation sites, use `strategy: "sitemap"`. When you just want to discover what pages exist, use `strategy: "map"` (URL list only) then follow up with targeted `fetch` calls.
220
+ Example:
221
+ ```json
222
+ { "prompt": "Compare pricing tiers for Supabase, Firebase, and Clerk", "schema": { "type": "object", "properties": { "provider": { "type": "string" }, "free_tier": { "type": "string" }, "paid_start": { "type": "string" } } }, "max_pages": 12 }
223
+ ```
117
224
 
118
- **When you need structured data** -- use `extract` with `mode: "tables"` or `mode: "schema"`. Do not use `fetch` when you need prices, specs, or table rows.
225
+ Tip: output includes a `steps` array showing every action (plan, search, fetch, extract, synthesize) with timings. Use this to debug why an agent run produced a weak result.
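The `max_pages` and `max_time_ms` defaults and caps listed for `agent` can be sketched as a budget resolver. This is illustrative only: `AgentBudget` and `resolveBudget` are hypothetical names, and it is an assumption that out-of-range values are clamped rather than rejected, and that the lower bound is 1.

```typescript
// Hypothetical budget shape mirroring the two documented parameters.
interface AgentBudget {
  max_pages: number;
  max_time_ms: number;
}

// Apply documented defaults (10 pages, 60000 ms) and caps (100 pages, 600000 ms).
// Assumption: values outside the range are clamped, not rejected.
function resolveBudget(input: Partial<AgentBudget>): AgentBudget {
  const clamp = (value: number | undefined, def: number, cap: number): number =>
    Math.min(Math.max(1, Math.floor(value ?? def)), cap);
  return {
    max_pages: clamp(input.max_pages, 10, 100),
    max_time_ms: clamp(input.max_time_ms, 60000, 600000),
  };
}
```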
119
226
 
120
- **When you already have content and want related pages** -- use `find_similar`. It searches the local cache by semantic similarity. No network calls needed.
227
+ ## Workflow Patterns
121
228
 
122
- **When you need a thorough answer on a complex topic** -- use `research`. It plans multiple search queries, fetches sources, and produces a cited synthesis. Prefer this over running 5+ manual search/fetch cycles.
229
+ Quick routing:
230
+ - Use `search` when you need information but don't have a URL.
231
+ - Use `fetch` when you already have the URL.
232
+ - Use `crawl` when you need multiple pages from one site.
233
+ - Use `cache` when you want to check whether something is already on disk.
234
+ - Use `extract` when you need specific fields, tables, or metadata, not the whole page.
235
+ - Use `find_similar` when you have a good page or concept and want related content.
236
+ - Use `research` when a question needs decomposition and multi-source synthesis.
237
+ - Use `agent` when a natural-language task needs multi-step data gathering.
238
+
239
+ **Cache-first lookup.** Before any `fetch` or `search`, probe the cache.
240
+ ```json
241
+ cache({ "query": "oauth2 pkce", "url_pattern": "*auth0.com*" })
242
+ // empty? fall through to search
243
+ search({ "query": "oauth2 pkce flow", "include_domains": ["auth0.com"] })
244
+ ```
123
245
 
124
- **When the task requires multi-step data gathering** -- use `agent`. It breaks prompts into search queries and URL fetches, respects page and time budgets, and can extract structured data via JSON Schema.
246
+ **Fresh content (news, dashboards, changelogs).** Bypass cache explicitly.
247
+ ```json
248
+ search({ "query": "node.js 22 release notes", "force_refresh": true, "time_range": "week" })
249
+ fetch({ "url": "https://nodejs.org/en/blog", "force_refresh": true })
250
+ ```
125
251
 
126
- **Before any network call** -- check `cache` first. Pages from prior sessions are still there. A cache hit is instant and free.
252
+ **Scoped documentation research.** Crawl the relevant slice, then query cache.
253
+ ```json
254
+ crawl({ "url": "https://docs.astro.build", "strategy": "sitemap", "max_pages": 40 })
255
+ cache({ "query": "server islands hydration", "url_pattern": "*docs.astro.build*" })
256
+ ```
127
257
 
128
- ## Parameter Optimization
258
+ **Broad exploration.** Pass a query array; dedup is automatic.
259
+ ```json
260
+ search({ "query": ["rust async runtimes comparison", "tokio vs async-std vs smol", "rust executor benchmarks"], "max_results": 8 })
261
+ ```
129
262
 
130
- ### search
131
- - `max_results: 3` for focused lookups, `5` for exploration (default), `10+` for broad research
132
- - `include_domains` narrows to trusted sources -- always use when you know the domain
133
- - `category: "code"` for programming, `"docs"` for library docs, `"news"` for recent events
134
- - `from_date` / `to_date` for time-sensitive queries
135
- - `format: "context"` returns a single token-budgeted string for LLM injection
263
+ **More like this.** Start with a known-good URL, widen via `find_similar`.
264
+ ```json
265
+ find_similar({ "url": "https://react.dev/reference/react/useMemo", "max_results": 6, "include_domains": ["react.dev"] })
266
+ ```
136
267
 
137
- ### fetch
138
- - `section: "heading text"` extracts only content under that heading -- much cheaper than the full page
139
- - `render_js: "never"` is fastest for static sites; `"always"` for SPAs
140
- - `use_auth: true` to access pages behind login using the user's browser session
268
+ **Complex synthesis.** One `research` call replaces 5+ manual search/fetch cycles.
269
+ ```json
270
+ research({ "question": "Tradeoffs of vector DBs for RAG at 100M+ embeddings", "depth": "comprehensive" })
271
+ ```
141
272
 
142
- ### crawl
143
- - `strategy: "sitemap"` is 5-10x faster than BFS for doc sites
144
- - `strategy: "map"` returns URLs only -- use to scope a site before targeted fetches
145
- - `include_patterns` / `exclude_patterns` accept regex to stay in one section
273
+ **Structured data from multiple sources.** Use `agent` with a schema.
274
+ ```json
275
+ agent({ "prompt": "Find latency and pricing for top 5 edge compute providers", "schema": { "type": "object", "properties": { "provider": {"type":"string"}, "cold_start_ms": {"type":"string"}, "price_per_million": {"type":"string"} } } })
276
+ ```
146
277
 
147
- ### research
148
- - `depth: "quick"` (~15s, 2 sub-queries), `"standard"` (~40s, default), `"comprehensive"` (~80s, 7 sub-queries)
149
- - `max_sources` overrides the default source count for the chosen depth
278
+ **Table extraction.** Skip markdown entirely.
279
+ ```json
280
+ extract({ "url": "https://en.wikipedia.org/wiki/List_of_programming_languages", "mode": "tables" })
281
+ ```
150
282
 
151
- ### agent
152
- - `max_pages` caps total page fetches (default 10, max 100)
153
- - `max_time_ms` caps total execution time (default 60000)
154
- - `schema` enables structured extraction from each page -- results are merged across sources
283
+ ## Parameter Cheat Sheet
284
+
285
+ | Situation | Tool + parameters |
286
+ |---|---|
287
+ | Focused lookup, known site | `search` + `max_results: 3` + `include_domains` |
288
+ | Broad topic survey | `search` + `query: [...3-5 variants]` + `max_results: 8` |
289
+ | Fresh content required | any tool + `force_refresh: true` |
290
+ | Doc site indexing | `crawl` + `strategy: "sitemap"` |
291
+ | Site URL inventory only | `crawl` + `strategy: "map"` |
292
+ | Single heading from long page | `fetch` + `section: "..."` |
293
+ | Behind login | `fetch` / `crawl` + `use_auth: true` |
294
+ | Direct answer (sampling client) | `search` + `format: "answer"` |
295
+ | LLM-ready context blob | `search` + `format: "context"` |
296
+ | Complex question, multi-source | `research` + `depth: "standard"` |
297
+ | Structured multi-page extraction | `agent` + `schema` |
298
+ | One-page structured data | `extract` + `mode: "schema"` or `"tables"` |
299
+ | Change tracking | `cache` + `check_changes: true` |
155
300
 
156
301
  ## Anti-Patterns
157
302
 
158
- These waste tokens, time, and rate limits. Avoid them.
303
+ **Do not skip the cache.** Running `search` or `fetch` without probing `cache` wastes time on content already on disk. `research` and `agent` check cache internally; manual `search`/`fetch` do not.
159
304
 
160
- **Do not retry the same query.** If `search` returns no results, reformulate with different keywords. Repeating an identical query returns the same empty results.
305
+ **Do not send natural-language questions to `search`.** Use keywords. `"how do I debounce in React hooks"` loses to `"react useDebounce hook custom"`.
161
306
 
162
- **Do not skip the cache.** Every `fetch`, `search`, and `crawl` result is cached locally. Before any network call, run `cache` with the URL pattern or query text. Cached results return instantly.
307
+ **Do not retry an identical failing query.** Reformulate keywords, swap `category`, or add `include_domains`. The same query returns the same empty result.
163
308
 
164
- **Do not send natural language questions as search queries.** Search engines work best with keywords. Instead of `"What is the best way to handle authentication in Next.js?"`, use `"Next.js authentication best practices 2025"`.
309
+ **Do not use `agent` or `research` for one-URL lookups.** Use `fetch`. `agent` is for multi-source gathering; `research` is for decomposable questions.
165
310
 
166
- **Do not use `agent` for simple lookups.** One fact from one URL = `fetch`. Quick search result = `search`. Reserve `agent` for tasks requiring multiple search/fetch cycles.
311
+ **Do not crawl `max_pages: 100` without filters.** Always add `include_patterns` to stay in-scope. Unfiltered crawls fetch nav, footer, and sitemap garbage.
167
312
 
168
- **Do not use `research` when you already know the URLs.** If you have URLs to read, use `fetch` or `crawl`. `research` is for when you need the tool to discover sources autonomously.
313
+ **Do not fetch whole pages when you need one section.** `fetch` + `section` reads under one heading only.
169
314
 
170
- **Do not fetch entire pages when you need one section.** Use `fetch` with `section` to extract just the relevant part.
315
+ **Do not set `force_refresh: true` by default.** It defeats the cache. Use it for news, status, and changelogs, that is, content that actually churns.
171
316
 
172
- **Do not crawl with high max_pages without filtering.** A `max_pages: 100` crawl without `include_patterns` fetches navigation pages, footers, and irrelevant content.
317
+ **Do not pass a JSON Schema to `extract` without `properties`.** The handler rejects schemas that lack a `properties` key.
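A client can catch the missing-`properties` case before calling `extract`. The check below is a minimal sketch under the assumption that "has a `properties` key" means a non-null, non-array object value; `hasProperties` is a hypothetical helper, and the handler's real validation may differ.

```typescript
// Pre-flight check for a schema passed to `extract` with mode "schema".
// Assumption: `properties` must be a plain object (not null, not an array).
function hasProperties(schema: unknown): boolean {
  if (typeof schema !== "object" || schema === null) return false;
  const props = (schema as Record<string, unknown>)["properties"];
  return typeof props === "object" && props !== null && !Array.isArray(props);
}
```

For example, `{ "type": "object" }` fails this check, while the product schema shown in the `extract` example passes it.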
173
318
 
174
- **Do not ignore `format: "context"` for search.** When injecting search results into a prompt, use `format: "context"` instead of manually concatenating results.
319
+ ## CLI Commands
320
+
321
+ ```bash
322
+ wigolo # default: start MCP server on stdio
323
+ wigolo mcp # explicit: start MCP server
324
+ wigolo warmup [flags] # install Playwright, bootstrap SearXNG, optional extras
325
+ wigolo serve # start HTTP daemon on WIGOLO_DAEMON_PORT (default 3333)
326
+ wigolo health # health probe, exits 0 if ok
327
+ wigolo doctor # environment diagnostics (Python, Docker, Playwright, SearXNG)
328
+ wigolo auth discover # list CDP sessions (needs WIGOLO_CDP_URL)
329
+ wigolo auth status # show configured auth paths
330
+ wigolo plugin add <git-url> # clone plugin into ~/.wigolo/plugins/
331
+ wigolo plugin list # list installed plugins
332
+ wigolo plugin remove <name> # remove a plugin
333
+ wigolo shell [--json] # interactive REPL against subsystems
334
+ ```
175
335
 
176
- ## Key Features
336
+ ## Configuration
177
337
 
178
- - Zero API keys required
179
- - Zero cloud dependency -- runs entirely local
180
- - Authenticated browsing (Chrome profiles, session state)
181
- - Localhost access (develop against local servers)
182
- - SQLite FTS5 cache with full-text search
183
- - ML reranking (optional, via FlashRank)
184
- - Extraction ensemble: site-specific, Defuddle, Trafilatura, Readability, Turndown
338
+ Top environment variables. All optional — defaults are safe.
185
339
 
186
- ## Requirements
340
+ | Variable | Default | Purpose |
341
+ |---|---|---|
342
+ | `WIGOLO_DATA_DIR` | `~/.wigolo` | Cache DB, SearXNG state, plugins, embeddings |
343
+ | `SEARXNG_URL` | unset | Point at an existing SearXNG (skips native bootstrap) |
344
+ | `SEARXNG_MODE` | `native` | `native` runs local Python SearXNG; `docker` runs container |
345
+ | `WIGOLO_CHROME_PROFILE_PATH` | unset | Chrome profile for `use_auth: true` |
346
+ | `WIGOLO_CDP_URL` | unset | Chrome DevTools endpoint (e.g. `http://localhost:9222`) |
347
+ | `MAX_BROWSERS` | `3` | Playwright pool size |
348
+ | `WIGOLO_BROWSER_TYPES` | `chromium` | Comma list: `chromium,firefox,webkit` |
349
+ | `WIGOLO_RERANKER` | `none` | `flashrank` for ML reranking |
350
+ | `WIGOLO_EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | Used by `find_similar` |
351
+ | `CACHE_TTL_CONTENT` | `604800` (7d) | Seconds before cached pages expire |
352
+ | `LOG_LEVEL` | `info` | `debug` \| `info` \| `warn` \| `error` |
187
353
 
188
- - Node.js 20+
189
- - Python 3.8+ (recommended, for embedded SearXNG search)
190
- - Docker (optional, alternative to Python for SearXNG)
354
+ Full list: see `src/config.ts`.
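The `CACHE_TTL_CONTENT` default is easier to read as arithmetic: 7 days × 24 hours × 60 minutes × 60 seconds = 604800 seconds. The expiry check below is a sketch of how such a TTL is typically applied; `isExpired` and its millisecond timestamps are assumptions, not wigolo's code.

```typescript
// Documented default: 604800 seconds, i.e. 7 days.
const DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60;

// Hypothetical expiry check: timestamps are in milliseconds since the epoch.
function isExpired(
  fetchedAtMs: number,
  nowMs: number,
  ttlSeconds: number = DEFAULT_TTL_SECONDS,
): boolean {
  return nowMs - fetchedAtMs > ttlSeconds * 1000;
}
```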
191
355
 
192
356
  ## Links
193
357
 
194
358
  - Repository: https://github.com/KnockOutEZ/wigolo
195
359
  - npm: https://www.npmjs.com/package/@staticn0va/wigolo
196
- - License: BSL 1.1 (converts to MIT on 2029-04-12)
360
+ - License: BUSL-1.1 (converts to open source on 2029-04-12)
@@ -5,6 +5,7 @@ export interface RouterFetchOptions {
5
5
  headers?: Record<string, string>;
6
6
  screenshot?: boolean;
7
7
  actions?: BrowserAction[];
8
+ force_refresh?: boolean;
8
9
  }
9
10
  export interface HttpClient {
10
11
  fetch(url: string, options?: {
@@ -1 +1 @@
1
- {"version":3,"file":"router.d.ts","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAIA,OAAO,KAAK,EAAE,cAAc,EAAE,aAAa,EAAE,MAAM,aAAa,CAAC;AAEjE,MAAM,WAAW,kBAAkB;IACjC,QAAQ,CAAC,EAAE,MAAM,GAAG,QAAQ,GAAG,OAAO,CAAC;IACvC,OAAO,CAAC,EAAE,OAAO,CAAC;IAClB,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IACjC,UAAU,CAAC,EAAE,OAAO,CAAC;IACrB,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;CAC3B;AAED,MAAM,WAAW,UAAU;IACzB,KAAK,CACH,GAAG,EAAE,MAAM,EACX,OAAO,CAAC,EAAE;QAAE,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAAC,SAAS,CAAC,EAAE,MAAM,CAAA;KAAE,GACjE,OAAO,CAAC;QACT,GAAG,EAAE,MAAM,CAAC;QACZ,QAAQ,EAAE,MAAM,CAAC;QACjB,IAAI,EAAE,MAAM,CAAC;QACb,WAAW,EAAE,MAAM,CAAC;QACpB,UAAU,EAAE,MAAM,CAAC;QACnB,OAAO,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAChC,SAAS,CAAC,EAAE,MAAM,CAAC;KACpB,CAAC,CAAC;CACJ;AAED,MAAM,WAAW,oBAAoB;IACnC,gBAAgB,CACd,GAAG,EAAE,MAAM,EACX,OAAO,CAAC,EAAE;QAAE,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAAC,gBAAgB,CAAC,EAAE,MAAM,CAAC;QAAC,WAAW,CAAC,EAAE,MAAM,CAAC;QAAC,UAAU,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;QAAC,MAAM,CAAC,EAAE,MAAM,CAAA;KAAE,GAChK,OAAO,CAAC,cAAc,CAAC,CAAC;CAC5B;AAED,UAAU,WAAW;IACnB,YAAY,EAAE,MAAM,CAAC;IACrB,gBAAgB,EAAE,OAAO,CAAC;CAC3B;AAED,qBAAa,WAAW;IAIpB,OAAO,CAAC,QAAQ,CAAC,UAAU;IAC3B,OAAO,CAAC,QAAQ,CAAC,WAAW;IAJ9B,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAkC;gBAGzC,UAAU,EAAE,UAAU,EACtB,WAAW,EAAE,oBAAoB;IAG9C,KAAK,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,kBAAuB,GAAG,OAAO,CAAC,cAAc,CAAC;IAoEnF,cAAc,CAAC,MAAM,EAAE,MAAM,GAAG,WAAW,GAAG,SAAS;IAIvD,OAAO,CAAC,WAAW;IASnB,OAAO,CAAC,gBAAgB;CAczB"}
1
+ {"version":3,"file":"router.d.ts","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAIA,OAAO,KAAK,EAAE,cAAc,EAAE,aAAa,EAAE,MAAM,aAAa,CAAC;AAEjE,MAAM,WAAW,kBAAkB;IACjC,QAAQ,CAAC,EAAE,MAAM,GAAG,QAAQ,GAAG,OAAO,CAAC;IACvC,OAAO,CAAC,EAAE,OAAO,CAAC;IAClB,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IACjC,UAAU,CAAC,EAAE,OAAO,CAAC;IACrB,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;IAC1B,aAAa,CAAC,EAAE,OAAO,CAAC;CACzB;AAED,MAAM,WAAW,UAAU;IACzB,KAAK,CACH,GAAG,EAAE,MAAM,EACX,OAAO,CAAC,EAAE;QAAE,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAAC,SAAS,CAAC,EAAE,MAAM,CAAA;KAAE,GACjE,OAAO,CAAC;QACT,GAAG,EAAE,MAAM,CAAC;QACZ,QAAQ,EAAE,MAAM,CAAC;QACjB,IAAI,EAAE,MAAM,CAAC;QACb,WAAW,EAAE,MAAM,CAAC;QACpB,UAAU,EAAE,MAAM,CAAC;QACnB,OAAO,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAChC,SAAS,CAAC,EAAE,MAAM,CAAC;KACpB,CAAC,CAAC;CACJ;AAED,MAAM,WAAW,oBAAoB;IACnC,gBAAgB,CACd,GAAG,EAAE,MAAM,EACX,OAAO,CAAC,EAAE;QAAE,OAAO,CAAC,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;QAAC,gBAAgB,CAAC,EAAE,MAAM,CAAC;QAAC,WAAW,CAAC,EAAE,MAAM,CAAC;QAAC,UAAU,CAAC,EAAE,OAAO,CAAC;QAAC,OAAO,CAAC,EAAE,aAAa,EAAE,CAAC;QAAC,MAAM,CAAC,EAAE,MAAM,CAAA;KAAE,GAChK,OAAO,CAAC,cAAc,CAAC,CAAC;CAC5B;AAED,UAAU,WAAW;IACnB,YAAY,EAAE,MAAM,CAAC;IACrB,gBAAgB,EAAE,OAAO,CAAC;CAC3B;AAED,qBAAa,WAAW;IAIpB,OAAO,CAAC,QAAQ,CAAC,UAAU;IAC3B,OAAO,CAAC,QAAQ,CAAC,WAAW;IAJ9B,OAAO,CAAC,QAAQ,CAAC,SAAS,CAAkC;gBAGzC,UAAU,EAAE,UAAU,EACtB,WAAW,EAAE,oBAAoB;IAG9C,KAAK,CAAC,GAAG,EAAE,MAAM,EAAE,OAAO,GAAE,kBAAuB,GAAG,OAAO,CAAC,cAAc,CAAC;IAoEnF,cAAc,CAAC,MAAM,EAAE,MAAM,GAAG,WAAW,GAAG,SAAS;IAIvD,OAAO,CAAC,WAAW;IASnB,OAAO,CAAC,gBAAgB;CAczB"}
@@ -1 +1 @@
1
- {"version":3,"file":"router.js","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,MAAM,cAAc,CAAC;AACzC,OAAO,EAAE,YAAY,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,EAAE,mBAAmB,EAAE,MAAM,oBAAoB,CAAC;AACzD,OAAO,EAAE,cAAc,EAAE,MAAM,WAAW,CAAC;AAsC3C,MAAM,OAAO,WAAW;IAIH;IACA;IAJF,SAAS,GAAG,IAAI,GAAG,EAAuB,CAAC;IAE5D,YACmB,UAAsB,EACtB,WAAiC;QADjC,eAAU,GAAV,UAAU,CAAY;QACtB,gBAAW,GAAX,WAAW,CAAsB;IACjD,CAAC;IAEJ,KAAK,CAAC,KAAK,CAAC,GAAW,EAAE,UAA8B,EAAE;QACvD,MAAM,EAAE,QAAQ,GAAG,MAAM,EAAE,OAAO,GAAG,KAAK,EAAE,OAAO,EAAE,UAAU,EAAE,OAAO,EAAE,GAAG,OAAO,CAAC;QACrF,MAAM,MAAM,GAAG,SAAS,EAAE,CAAC;QAC3B,MAAM,MAAM,GAAG,YAAY,CAAC,OAAO,CAAC,CAAC;QACrC,MAAM,SAAS,GAAG,MAAM,CAAC,wBAAwB,CAAC;QAClD,MAAM,MAAM,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC,QAAQ,CAAC;QAErC,uEAAuE;QACvE,IAAI,OAAO,IAAI,OAAO,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClC,MAAM,WAAW,GAAG,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,cAAc,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,MAAM,CAAC,KAAK,CAAC,uBAAuB,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,iBAAiB,EAAE,CAAC,CAAC;YAC1E,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,OAAO,EAAE,GAAG,WAAW,EAAE,CAAC,CAAC;QAClG,CAAC;QAED,kDAAkD;QAClD,IAAI,QAAQ,KAAK,QAAQ,IAAI,OAAO,EAAE,CAAC;YACrC,MAAM,WAAW,GAAG,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,cAAc,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,MAAM,CAAC,KAAK,CAAC,uBAAuB,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,kBAAkB,EAAE,CAAC,CAAC;YAC9F,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,GAAG,WAAW,EAAE,CAAC,CAAC;QACzF,CAAC;QAED,yBAAyB;QACzB,IAAI,QAAQ,KAAK,OAAO,EAAE,CAAC;YACzB,MAAM,CAAC,KAAK,CAAC,yBAAyB,EAAE,EAAE,GAAG,EAAE,CAAC,CAAC;YACjD,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC;YAC7D,IAAI,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC;YACzB,OAAO,IAAI,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACvC,CAAC;QAED,yDAAyD;QACzD,MAAM,KAAK,GAAG,IAAI,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC;QAEvC,IAAI,KAAK,CAAC,gBAAgB,EAAE,CAAC;YAC3B,MAAM,CAAC,KAAK,CAAC,uCAAuC,EAAE,E
AAE,GAAG,EAAE,MAAM,EAAE,CAAC,CAAC;YACvE,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;QACzE,CAAC;QAED,iBAAiB;QACjB,IAAI,CAAC;YACH,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC;YAE7D,sCAAsC;YACtC,IAAI,mBAAmB,CAAC,MAAM,CAAC,IAAI,CAAC,EAAE,CAAC;gBACrC,MAAM,CAAC,IAAI,CAAC,mDAAmD,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,CAAC,CAAC;gBAClF,KAAK,CAAC,gBAAgB,GAAG,IAAI,CAAC;gBAC9B,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;YACzE,CAAC;YAED,OAAO,IAAI,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACvC,CAAC;QAAC,OAAO,GAAG,EAAE,CAAC;YACb,KAAK,CAAC,YAAY,EAAE,CAAC;YACrB,MAAM,CAAC,IAAI,CAAC,mBAAmB,EAAE;gBAC/B,GAAG;gBACH,MAAM;gBACN,YAAY,EAAE,KAAK,CAAC,YAAY;gBAChC,KAAK,EAAE,GAAG,YAAY,KAAK,CAAC,CAAC,CAAC,GAAG,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,GAAG,CAAC;aACxD,CAAC,CAAC;YAEH,IAAI,KAAK,CAAC,YAAY,IAAI,SAAS,EAAE,CAAC;gBACpC,MAAM,CAAC,IAAI,CAAC,0DAA0D,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,SAAS,EAAE,CAAC,CAAC;gBACpG,KAAK,CAAC,gBAAgB,GAAG,IAAI,CAAC;gBAC9B,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;YACzE,CAAC;YAED,MAAM,GAAG,CAAC;QACZ,CAAC;IACH,CAAC;IAED,cAAc,CAAC,MAAc;QAC3B,OAAO,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC;IACpC,CAAC;IAEO,WAAW,CAAC,MAAc;QAChC,IAAI,KAAK,GAAG,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC;QACvC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,KAAK,GAAG,EAAE,YAAY,EAAE,CAAC,EAAE,gBAAgB,EAAE,KAAK,EAAE,CAAC;YACrD,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;QACpC,CAAC;QACD,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,gBAAgB,CACtB,MAAgD;QAEhD,OAAO;YACL,GAAG,EAAE,MAAM,CAAC,GAAG;YACf,QAAQ,EAAE,MAAM,CAAC,QAAQ;YACzB,IAAI,EAAE,MAAM,CAAC,IAAI;YACjB,WAAW,EAAE,MAAM,CAAC,WAAW;YAC/B,UAAU,EAAE,MAAM,CAAC,UAAU;YAC7B,MAAM,EAAE,MAAM;YACd,OAAO,EAAE,MAAM,CAAC,OAAO;YACvB,SAAS,EAAE,MAAM,CAAC,SAAS;SAC5B,CAAC;IACJ,CAAC;CACF"}
+ {"version":3,"file":"router.js","sourceRoot":"","sources":["../../src/fetch/router.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,SAAS,EAAE,MAAM,cAAc,CAAC;AACzC,OAAO,EAAE,YAAY,EAAE,MAAM,cAAc,CAAC;AAC5C,OAAO,EAAE,mBAAmB,EAAE,MAAM,oBAAoB,CAAC;AACzD,OAAO,EAAE,cAAc,EAAE,MAAM,WAAW,CAAC;AAuC3C,MAAM,OAAO,WAAW;IAIH;IACA;IAJF,SAAS,GAAG,IAAI,GAAG,EAAuB,CAAC;IAE5D,YACmB,UAAsB,EACtB,WAAiC;QADjC,eAAU,GAAV,UAAU,CAAY;QACtB,gBAAW,GAAX,WAAW,CAAsB;IACjD,CAAC;IAEJ,KAAK,CAAC,KAAK,CAAC,GAAW,EAAE,UAA8B,EAAE;QACvD,MAAM,EAAE,QAAQ,GAAG,MAAM,EAAE,OAAO,GAAG,KAAK,EAAE,OAAO,EAAE,UAAU,EAAE,OAAO,EAAE,GAAG,OAAO,CAAC;QACrF,MAAM,MAAM,GAAG,SAAS,EAAE,CAAC;QAC3B,MAAM,MAAM,GAAG,YAAY,CAAC,OAAO,CAAC,CAAC;QACrC,MAAM,SAAS,GAAG,MAAM,CAAC,wBAAwB,CAAC;QAClD,MAAM,MAAM,GAAG,IAAI,GAAG,CAAC,GAAG,CAAC,CAAC,QAAQ,CAAC;QAErC,uEAAuE;QACvE,IAAI,OAAO,IAAI,OAAO,CAAC,MAAM,GAAG,CAAC,EAAE,CAAC;YAClC,MAAM,WAAW,GAAG,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,cAAc,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,MAAM,CAAC,KAAK,CAAC,uBAAuB,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,iBAAiB,EAAE,CAAC,CAAC;YAC1E,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,OAAO,EAAE,GAAG,WAAW,EAAE,CAAC,CAAC;QAClG,CAAC;QAED,kDAAkD;QAClD,IAAI,QAAQ,KAAK,QAAQ,IAAI,OAAO,EAAE,CAAC;YACrC,MAAM,WAAW,GAAG,OAAO,CAAC,CAAC,CAAC,CAAC,MAAM,cAAc,EAAE,IAAI,EAAE,CAAC,CAAC,CAAC,CAAC,EAAE,CAAC;YAClE,MAAM,CAAC,KAAK,CAAC,uBAAuB,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,CAAC,CAAC,kBAAkB,EAAE,CAAC,CAAC;YAC9F,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,GAAG,WAAW,EAAE,CAAC,CAAC;QACzF,CAAC;QAED,yBAAyB;QACzB,IAAI,QAAQ,KAAK,OAAO,EAAE,CAAC;YACzB,MAAM,CAAC,KAAK,CAAC,yBAAyB,EAAE,EAAE,GAAG,EAAE,CAAC,CAAC;YACjD,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC;YAC7D,IAAI,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC;YACzB,OAAO,IAAI,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACvC,CAAC;QAED,yDAAyD;QACzD,MAAM,KAAK,GAAG,IAAI,CAAC,WAAW,CAAC,MAAM,CAAC,CAAC;QAEvC,IAAI,KAAK,CAAC,gBAAgB,EAAE,CAAC;YAC3B,MAAM,CAAC,KAAK,CAAC,uCAAuC,EAAE,E
AAE,GAAG,EAAE,MAAM,EAAE,CAAC,CAAC;YACvE,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;QACzE,CAAC;QAED,iBAAiB;QACjB,IAAI,CAAC;YACH,MAAM,MAAM,GAAG,MAAM,IAAI,CAAC,UAAU,CAAC,KAAK,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,CAAC,CAAC;YAE7D,sCAAsC;YACtC,IAAI,mBAAmB,CAAC,MAAM,CAAC,IAAI,CAAC,EAAE,CAAC;gBACrC,MAAM,CAAC,IAAI,CAAC,mDAAmD,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,CAAC,CAAC;gBAClF,KAAK,CAAC,gBAAgB,GAAG,IAAI,CAAC;gBAC9B,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;YACzE,CAAC;YAED,OAAO,IAAI,CAAC,gBAAgB,CAAC,MAAM,CAAC,CAAC;QACvC,CAAC;QAAC,OAAO,GAAG,EAAE,CAAC;YACb,KAAK,CAAC,YAAY,EAAE,CAAC;YACrB,MAAM,CAAC,IAAI,CAAC,mBAAmB,EAAE;gBAC/B,GAAG;gBACH,MAAM;gBACN,YAAY,EAAE,KAAK,CAAC,YAAY;gBAChC,KAAK,EAAE,GAAG,YAAY,KAAK,CAAC,CAAC,CAAC,GAAG,CAAC,OAAO,CAAC,CAAC,CAAC,MAAM,CAAC,GAAG,CAAC;aACxD,CAAC,CAAC;YAEH,IAAI,KAAK,CAAC,YAAY,IAAI,SAAS,EAAE,CAAC;gBACpC,MAAM,CAAC,IAAI,CAAC,0DAA0D,EAAE,EAAE,GAAG,EAAE,MAAM,EAAE,SAAS,EAAE,CAAC,CAAC;gBACpG,KAAK,CAAC,gBAAgB,GAAG,IAAI,CAAC;gBAC9B,OAAO,IAAI,CAAC,WAAW,CAAC,gBAAgB,CAAC,GAAG,EAAE,EAAE,OAAO,EAAE,UAAU,EAAE,CAAC,CAAC;YACzE,CAAC;YAED,MAAM,GAAG,CAAC;QACZ,CAAC;IACH,CAAC;IAED,cAAc,CAAC,MAAc;QAC3B,OAAO,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC;IACpC,CAAC;IAEO,WAAW,CAAC,MAAc;QAChC,IAAI,KAAK,GAAG,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,CAAC,CAAC;QACvC,IAAI,CAAC,KAAK,EAAE,CAAC;YACX,KAAK,GAAG,EAAE,YAAY,EAAE,CAAC,EAAE,gBAAgB,EAAE,KAAK,EAAE,CAAC;YACrD,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,MAAM,EAAE,KAAK,CAAC,CAAC;QACpC,CAAC;QACD,OAAO,KAAK,CAAC;IACf,CAAC;IAEO,gBAAgB,CACtB,MAAgD;QAEhD,OAAO;YACL,GAAG,EAAE,MAAM,CAAC,GAAG;YACf,QAAQ,EAAE,MAAM,CAAC,QAAQ;YACzB,IAAI,EAAE,MAAM,CAAC,IAAI;YACjB,WAAW,EAAE,MAAM,CAAC,WAAW;YAC/B,UAAU,EAAE,MAAM,CAAC,UAAU;YAC7B,MAAM,EAAE,MAAM;YACd,OAAO,EAAE,MAAM,CAAC,OAAO;YACvB,SAAS,EAAE,MAAM,CAAC,SAAS;SAC5B,CAAC;IACJ,CAAC;CACF"}
@@ -14,10 +14,10 @@
  * Parameter schemas (types, enums, required/optional) belong on the JSON
  * Schema, not here. Installation/configuration is for humans, not LLMs.
  */
- export declare const WIGOLO_INSTRUCTIONS = "Wigolo is a local-first web access layer: search the open web, fetch pages, crawl sites, extract structured data, find related content, run multi-step research, and execute agent-driven data gathering. All results land in a local SQLite cache that persists across sessions.\n\n## When to use which tool\n\n- `search` -- you need information on a topic but do not have a URL yet. Pass a query string or an array of 3-5 semantically varied keyword forms for broader coverage.\n- `fetch` -- you already have a specific URL to read.\n- `crawl` -- you need multiple pages from the same site (docs, wikis, references).\n- `cache` -- you want to know if the content is already on disk from an earlier read.\n- `extract` -- you need specific data points (tables, metadata, schema-shaped fields) rather than a whole page as markdown.\n- `find_similar` -- you have a URL or concept and want related content from the cache or web. Useful for \"more like this\" discovery.\n- `research` -- you have a complex question that needs multi-step investigation: question decomposition, parallel search, source synthesis into a report. Set `depth` to control thoroughness.\n- `agent` -- you need to gather structured or unstructured data from multiple sources based on a natural-language prompt. 
Provides full step transparency.\n\n## Routing by intent\n\n| Intent | Tool | Key parameters |\n|--------|------|----------------|\n| Documentation lookup | `search` | `include_domains: [\"react.dev\", \"docs.python.org\"]`, `category: \"docs\"` |\n| Error debugging | `search` | exact error string as query, `category: \"code\"` |\n| Library research | `crawl` | seed URL of docs site, `strategy: \"sitemap\"`, then `cache` for later queries |\n| Related content | `find_similar` | `url` of a known good page, or `concept` as free text |\n| Direct answer | `search` | `format: \"answer\"` for a synthesized direct response |\n| Comprehensive research | `research` | `depth: \"comprehensive\"`, optional `include_domains` to scope |\n| Data gathering | `agent` | natural-language `prompt`, optional `schema` for structured output |\n| Structured extraction | `extract` | `mode: \"schema\"` with a JSON Schema, or `mode: \"tables\"` |\n| Site inventory | `crawl` | `strategy: \"map\"` for URL-only discovery, no content fetched |\n\n## Check the cache before going to the network\n\nBefore every `search` or `fetch`, consider a `cache` call with the query text or URL pattern. Pages read in this or a prior session return instantly with their full markdown -- no network, no rate limits. The `research` and `agent` tools check the cache internally, so you do not need a separate call for those.\n\n## Multi-query search strategy\n\nFor broad or exploratory queries, pass an array of 3-5 semantically varied keyword forms rather than a single natural-language question. Example: instead of \"how does React handle state management\", pass `[\"react state management\", \"useState useReducer patterns\", \"react hooks state\", \"react context vs redux\"]`. 
The search tool deduplicates across sub-queries automatically.\n\n## Pick the right strategy\n\n- For documentation sites, prefer `crawl` with `strategy: \"sitemap\"` -- it is faster and more complete than BFS because it reads sitemap.xml directly.\n- When you only need to discover what pages exist on a site, use `crawl` with `strategy: \"map\"`. It returns URLs only, no content, and is far cheaper than a full crawl. Follow up with targeted `fetch` calls.\n- For structured data (prices, specs, listings, table rows), use `extract` with `mode: \"schema\"` or `mode: \"tables\"`. Reach for `fetch` only when you want the whole page as markdown.\n- For complex questions requiring synthesis from multiple sources, use `research` instead of manually chaining `search` + `fetch` calls.\n- For natural-language data gathering tasks (\"find the pricing for the top 5 CRM tools\"), use `agent` with an optional `schema` to structure the output.\n\n## Scope searches, do not just broaden queries\n\n`search` accepts `include_domains` (e.g. `[\"react.dev\", \"developer.mozilla.org\"]`) and a `category` such as `\"docs\"`, `\"code\"`, `\"news\"`, or `\"papers\"`. A scoped query usually beats a broader query with post-filtering.\n\n## Performance\n\n- `max_results: 3` for focused lookups; `5` is the default; `10+` only for broad research.\n- `fetch` with `section: \"Heading Name\"` returns just the content under that heading. Cheaper and more relevant than the whole page.\n- Repeated fetches of the same URL are free -- served directly from the SQLite cache.\n- `research` with `depth: \"quick\"` is fast (~15s) and sufficient for most factual questions. Reserve `\"comprehensive\"` for topics requiring deep investigation.\n- `agent` respects `max_pages` (default 10) and `max_time_ms` (default 60s) to bound resource usage.\n\n## Capabilities worth knowing\n\n- Localhost URLs work: `http://localhost:3000`, `http://127.0.0.1:8080`, and similar. 
Useful for reading local dev servers and internal docs.\n- `use_auth: true` on `fetch` and `crawl` reuses the user's configured browser session for pages behind a login.\n- `cache` accepts FTS5 query syntax (`AND`, `OR`, `NOT`, `\"exact phrase\"`) for precise lookups.\n- `crawl` accepts regex `include_patterns` and `exclude_patterns` to stay inside a section of a large site.\n- `find_similar` uses cached embeddings when available -- no network call needed if the content has been seen before.\n- `research` and `agent` use MCP requestSampling for intelligent decomposition and synthesis when the client supports it. Without sampling support, they return raw sources in context format.";
+ export declare const WIGOLO_INSTRUCTIONS = "Wigolo is a local-first web access layer: search the open web, fetch pages, crawl sites, extract structured data, find related content, run multi-step research, and execute agent-driven data gathering. All results land in a local SQLite cache that persists across sessions.\n\n## When to use which tool\n\n- `search` -- you need information on a topic but do not have a URL yet. Pass a query string or an array of 3-5 semantically varied keyword forms for broader coverage.\n- `fetch` -- you already have a specific URL to read.\n- `crawl` -- you need multiple pages from the same site (docs, wikis, references).\n- `cache` -- you want to know if the content is already on disk from an earlier read.\n- `extract` -- you need specific data points (tables, metadata, schema-shaped fields) rather than a whole page as markdown.\n- `find_similar` -- you have a URL or concept and want related content from the cache or web. Useful for \"more like this\" discovery.\n- `research` -- you have a complex question that needs multi-step investigation: question decomposition, parallel search, source synthesis into a report. Set `depth` to control thoroughness.\n- `agent` -- you need to gather structured or unstructured data from multiple sources based on a natural-language prompt. 
Provides full step transparency.\n\n## Routing by intent\n\n| Intent | Tool | Key parameters |\n|--------|------|----------------|\n| Documentation lookup | `search` | `include_domains: [\"react.dev\", \"docs.python.org\"]`, `category: \"docs\"` |\n| Error debugging | `search` | exact error string as query, `category: \"code\"` |\n| Library research | `crawl` | seed URL of docs site, `strategy: \"sitemap\"`, then `cache` for later queries |\n| Related content | `find_similar` | `url` of a known good page, or `concept` as free text |\n| Direct answer | `search` | `format: \"answer\"` for a synthesized direct response |\n| Comprehensive research | `research` | `depth: \"comprehensive\"`, optional `include_domains` to scope |\n| Data gathering | `agent` | natural-language `prompt`, optional `schema` for structured output |\n| Structured extraction | `extract` | `mode: \"schema\"` with a JSON Schema, or `mode: \"tables\"` |\n| Site inventory | `crawl` | `strategy: \"map\"` for URL-only discovery, no content fetched |\n\n## Rapidly changing content\n\nFor news, prices, status pages, release notes, or any content that changes frequently, bypass the cache with `force_refresh: true`:\n\n search({ query: \"...\", force_refresh: true })\n fetch({ url: \"...\", force_refresh: true })\n\nWhen freshness matters more than speed, use `force_refresh`. When speed matters more than freshness (documentation, tutorials, reference pages), let the cache work -- it is much faster.\n\n## Check the cache before going to the network\n\nBefore every `search` or `fetch`, consider a `cache` call with the query text or URL pattern. Pages read in this or a prior session return instantly with their full markdown -- no network, no rate limits. 
The `research` and `agent` tools check the cache internally, so you do not need a separate call for those.\n\n## Multi-query search strategy\n\nFor broad or exploratory queries, pass an array of 3-5 semantically varied keyword forms rather than a single natural-language question. Example: instead of \"how does React handle state management\", pass `[\"react state management\", \"useState useReducer patterns\", \"react hooks state\", \"react context vs redux\"]`. The search tool deduplicates across sub-queries automatically.\n\n## Pick the right strategy\n\n- For documentation sites, prefer `crawl` with `strategy: \"sitemap\"` -- it is faster and more complete than BFS because it reads sitemap.xml directly.\n- When you only need to discover what pages exist on a site, use `crawl` with `strategy: \"map\"`. It returns URLs only, no content, and is far cheaper than a full crawl. Follow up with targeted `fetch` calls.\n- For structured data (prices, specs, listings, table rows), use `extract` with `mode: \"schema\"` or `mode: \"tables\"`. Reach for `fetch` only when you want the whole page as markdown.\n- For complex questions requiring synthesis from multiple sources, use `research` instead of manually chaining `search` + `fetch` calls.\n- For natural-language data gathering tasks (\"find the pricing for the top 5 CRM tools\"), use `agent` with an optional `schema` to structure the output.\n\n## Scope searches, do not just broaden queries\n\n`search` accepts `include_domains` (e.g. `[\"react.dev\", \"developer.mozilla.org\"]`) and a `category` such as `\"docs\"`, `\"code\"`, `\"news\"`, or `\"papers\"`. A scoped query usually beats a broader query with post-filtering.\n\n## Performance\n\n- `max_results: 3` for focused lookups; `5` is the default; `10+` only for broad research.\n- `fetch` with `section: \"Heading Name\"` returns just the content under that heading. 
Cheaper and more relevant than the whole page.\n- Repeated fetches of the same URL are free -- served directly from the SQLite cache.\n- `research` with `depth: \"quick\"` is fast (~15s) and sufficient for most factual questions. Reserve `\"comprehensive\"` for topics requiring deep investigation.\n- `agent` respects `max_pages` (default 10) and `max_time_ms` (default 60s) to bound resource usage.\n\n## Capabilities worth knowing\n\n- Localhost URLs work: `http://localhost:3000`, `http://127.0.0.1:8080`, and similar. Useful for reading local dev servers and internal docs.\n- `use_auth: true` on `fetch` and `crawl` reuses the user's configured browser session for pages behind a login.\n- `cache` accepts FTS5 query syntax (`AND`, `OR`, `NOT`, `\"exact phrase\"`) for precise lookups.\n- `crawl` accepts regex `include_patterns` and `exclude_patterns` to stay inside a section of a large site.\n- `find_similar` uses cached embeddings when available -- no network call needed if the content has been seen before.\n- `research` and `agent` use MCP requestSampling for intelligent decomposition and synthesis when the client supports it. Without sampling support, they return raw sources in context format.";
  export declare const TOOL_DESCRIPTIONS: {
- readonly fetch: "Fetch a single URL and return clean markdown. Use when you have a specific URL to read. Automatically detects if JavaScript rendering is needed.\n\nKey parameters:\n- section: extract content under a specific heading (e.g., section: \"API Reference\") -- faster than reading the whole page\n- use_auth: true to use stored browser session for authenticated/private pages\n- render_js: \"auto\" (default, detects JS need), \"always\" (force browser), \"never\" (HTTP only, fastest)\n- headers: custom HTTP headers if needed\n\nReturns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.";
- readonly search: "Search the web and return full markdown content from top results. Use for finding information on any topic -- returns extracted page content, not just snippets.\n\nKey parameters:\n- query: a search string, or an array of 3-5 semantically varied keyword forms for broader coverage. Arrays are deduplicated and merged automatically.\n- include_domains/exclude_domains: scope results to specific sites (e.g., include_domains: [\"react.dev\"])\n- category: \"general\", \"news\", \"code\", \"docs\", \"papers\" -- filters by content type\n- from_date/to_date: ISO dates for time-bounded queries\n- max_results: default 5. Use 3 for focused queries, 10+ for research.\n- format: \"full\" (default, structured JSON), \"context\" (single token-budgeted string for LLM injection), \"answer\" (synthesized direct answer via requestSampling), \"stream_answer\" (streaming answer chunks)\n\nThe \"answer\" format uses the MCP client's sampling capability to synthesize a direct response from search results. If sampling is not supported, falls back to \"context\" format. \"stream_answer\" sends incremental progress notifications.\n\nResults include title, URL, relevance_score, and full markdown_content per result. Previously fetched pages are served from local cache.";
+ readonly fetch: "Fetch a single URL and return clean markdown. Use when you have a specific URL to read. Automatically detects if JavaScript rendering is needed.\n\nKey parameters:\n- section: extract content under a specific heading (e.g., section: \"API Reference\") -- faster than reading the whole page\n- use_auth: true to use stored browser session for authenticated/private pages\n- render_js: \"auto\" (default, detects JS need), \"always\" (force browser), \"never\" (HTTP only, fastest)\n- headers: custom HTTP headers if needed\n- force_refresh: true to bypass cache and fetch fresh content from the network\n\nReturns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.\n\nUse force_refresh: true for pages that change frequently (news sites, changelogs, dashboards, API status pages). By default, previously fetched pages are served from local cache for speed.";
+ readonly search: "Search the web and return full markdown content from top results. Use for finding information on any topic -- returns extracted page content, not just snippets.\n\nKey parameters:\n- query: a search string, or an array of 3-5 semantically varied keyword forms for broader coverage. Arrays are deduplicated and merged automatically.\n- include_domains/exclude_domains: scope results to specific sites (e.g., include_domains: [\"react.dev\"])\n- category: \"general\", \"news\", \"code\", \"docs\", \"papers\" -- filters by content type\n- from_date/to_date: ISO dates for time-bounded queries\n- max_results: default 5. Use 3 for focused queries, 10+ for research.\n- format: \"full\" (default, structured JSON), \"context\" (single token-budgeted string for LLM injection), \"answer\" (synthesized direct answer via requestSampling), \"stream_answer\" (same as answer, with MCP progress notifications emitted between pipeline phases)\n- force_refresh: true to bypass all caches (search results and page content)\n\nThe \"answer\" format uses the MCP client's sampling capability to synthesize a direct response from search results. If sampling is not supported, falls back to \"context\" format. \"stream_answer\" emits notifications/progress messages at each pipeline phase (search, fetch, synthesize) when the client provides a progressToken via request._meta — token-level streaming of the LLM response is not supported by MCP sampling, so the answer itself still arrives as one block.\n\nResults include title, URL, relevance_score, and full markdown_content per result. Previously fetched pages are served from local cache.\n\nUse force_refresh: true when you need current information that may have changed since the last search. Default behavior uses cached results when available.";
  readonly crawl: "Crawl a website starting from a URL and return content from multiple pages. Use for indexing documentation sites, wikis, or any multi-page resource.\n\nKey parameters:\n- strategy: \"bfs\" (breadth-first, default), \"dfs\" (depth-first), \"sitemap\" (use sitemap.xml -- fastest for doc sites), \"map\" (URL discovery only, no content -- fastest for scoping a site)\n- max_depth: how many links deep to follow (default 2)\n- max_pages: maximum pages to fetch (default 20)\n- include_patterns/exclude_patterns: regex filters on URLs\n\nReturns an array of pages with title, markdown, and depth. Content is deduplicated across pages (repeated nav/headers/footers stripped). All pages are cached for later cache queries.";
  readonly cache: "Search previously fetched content without hitting the network. Use before searching the web -- if relevant content was already fetched or crawled, this returns it instantly.\n\nKey parameters:\n- query: full-text search over cached markdown and titles (supports FTS5 syntax: AND, OR, NOT, \"phrase match\")\n- url_pattern: glob filter on URLs (e.g., \"*example.com*\")\n- since: ISO date -- only results cached after this date\n- stats: true to get cache size, entry count, oldest/newest dates\n- clear: true to delete matching entries\n\nReturns matching cached pages with full markdown content. Cache persists across sessions in local SQLite.";
  readonly extract: "Extract structured data from a URL or raw HTML. Use when you need specific data points, tables, or metadata rather than full page markdown.\n\nKey parameters:\n- mode: \"selector\" (CSS selector -> text), \"tables\" (HTML tables -> JSON rows), \"metadata\" (title/author/date/description), \"schema\" (JSON Schema -> heuristic field extraction)\n- css_selector: required for mode=\"selector\" -- any valid CSS selector\n- schema: for mode=\"schema\", a JSON Schema object describing the fields to extract\n- multiple: true to return array of all matches (mode=\"selector\" only)\n\nFor mode=\"tables\", returns array of table objects with headers and row data. For mode=\"schema\", pass { price: \"string\", name: \"string\" } and get structured fields extracted from the page.";
@@ -1 +1 @@
- {"version":3,"file":"instructions.d.ts","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,eAAO,MAAM,mBAAmB,yiLA8DmK,CAAC;AAEpM,eAAO,MAAM,iBAAiB;;;;;;;;;CAsGpB,CAAC;AAEX,MAAM,MAAM,QAAQ,GAAG,MAAM,OAAO,iBAAiB,CAAC"}
+ {"version":3,"file":"instructions.d.ts","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,eAAO,MAAM,mBAAmB,i/LAuEmK,CAAC;AAEpM,eAAO,MAAM,iBAAiB;;;;;;;;;CA4GpB,CAAC;AAEX,MAAM,MAAM,QAAQ,GAAG,MAAM,OAAO,iBAAiB,CAAC"}
@@ -41,6 +41,15 @@ export const WIGOLO_INSTRUCTIONS = `Wigolo is a local-first web access layer: se
  | Structured extraction | \`extract\` | \`mode: "schema"\` with a JSON Schema, or \`mode: "tables"\` |
  | Site inventory | \`crawl\` | \`strategy: "map"\` for URL-only discovery, no content fetched |

+ ## Rapidly changing content
+
+ For news, prices, status pages, release notes, or any content that changes frequently, bypass the cache with \`force_refresh: true\`:
+
+ search({ query: "...", force_refresh: true })
+ fetch({ url: "...", force_refresh: true })
+
+ When freshness matters more than speed, use \`force_refresh\`. When speed matters more than freshness (documentation, tutorials, reference pages), let the cache work -- it is much faster.
+
  ## Check the cache before going to the network

  Before every \`search\` or \`fetch\`, consider a \`cache\` call with the query text or URL pattern. Pages read in this or a prior session return instantly with their full markdown -- no network, no rate limits. The \`research\` and \`agent\` tools check the cache internally, so you do not need a separate call for those.
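Reviewer note: the new "Rapidly changing content" guidance amounts to a per-request choice between freshness and speed. A minimal sketch of that choice follows, assuming a hypothetical `ContentKind` classification and a hypothetical `buildFetchArgs` helper; neither exists in wigolo. Only the `url` and `force_refresh` argument names come from the diff above.

```typescript
// Hypothetical illustration only: wigolo does not ship this helper.
// It shows one way a caller might decide when to set force_refresh.

// Assumed classification of what the caller is about to read.
type ContentKind =
  | "news"
  | "prices"
  | "status"
  | "release-notes"
  | "docs"
  | "tutorial"
  | "reference";

// Kinds the added instructions describe as changing frequently.
const VOLATILE_KINDS: ReadonlySet<ContentKind> = new Set<ContentKind>([
  "news",
  "prices",
  "status",
  "release-notes",
]);

// Shape of the arguments passed to the fetch tool (subset, per the diff).
interface FetchArgs {
  url: string;
  force_refresh?: boolean;
}

// Volatile content bypasses the cache; everything else leaves the flag
// unset so the default cached path is used.
function buildFetchArgs(url: string, kind: ContentKind): FetchArgs {
  return VOLATILE_KINDS.has(kind) ? { url, force_refresh: true } : { url };
}
```

Leaving the flag unset for stable content, rather than sending `force_refresh: false`, keeps the request identical to what a 0.5.1-era caller would have sent.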
@@ -85,8 +94,11 @@ Key parameters:
  - use_auth: true to use stored browser session for authenticated/private pages
  - render_js: "auto" (default, detects JS need), "always" (force browser), "never" (HTTP only, fastest)
  - headers: custom HTTP headers if needed
+ - force_refresh: true to bypass cache and fetch fresh content from the network
+
+ Returns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.

- Returns title, markdown content, links, images, and metadata. Result is cached locally -- subsequent fetches of the same URL return instantly. Works with localhost URLs (localhost:3000, etc.) for reading local dev servers.`,
+ Use force_refresh: true for pages that change frequently (news sites, changelogs, dashboards, API status pages). By default, previously fetched pages are served from local cache for speed.`,
  search: `Search the web and return full markdown content from top results. Use for finding information on any topic -- returns extracted page content, not just snippets.

  Key parameters:
@@ -95,11 +107,14 @@ Key parameters:
  - category: "general", "news", "code", "docs", "papers" -- filters by content type
  - from_date/to_date: ISO dates for time-bounded queries
  - max_results: default 5. Use 3 for focused queries, 10+ for research.
- - format: "full" (default, structured JSON), "context" (single token-budgeted string for LLM injection), "answer" (synthesized direct answer via requestSampling), "stream_answer" (streaming answer chunks)
+ - format: "full" (default, structured JSON), "context" (single token-budgeted string for LLM injection), "answer" (synthesized direct answer via requestSampling), "stream_answer" (same as answer, with MCP progress notifications emitted between pipeline phases)
+ - force_refresh: true to bypass all caches (search results and page content)
+
+ The "answer" format uses the MCP client's sampling capability to synthesize a direct response from search results. If sampling is not supported, falls back to "context" format. "stream_answer" emits notifications/progress messages at each pipeline phase (search, fetch, synthesize) when the client provides a progressToken via request._meta — token-level streaming of the LLM response is not supported by MCP sampling, so the answer itself still arrives as one block.

- The "answer" format uses the MCP client's sampling capability to synthesize a direct response from search results. If sampling is not supported, falls back to "context" format. "stream_answer" sends incremental progress notifications.
+ Results include title, URL, relevance_score, and full markdown_content per result. Previously fetched pages are served from local cache.

- Results include title, URL, relevance_score, and full markdown_content per result. Previously fetched pages are served from local cache.`,
+ Use force_refresh: true when you need current information that may have changed since the last search. Default behavior uses cached results when available.`,
  crawl: `Crawl a website starting from a URL and return content from multiple pages. Use for indexing documentation sites, wikis, or any multi-page resource.

  Key parameters:
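Reviewer note: the reworded `stream_answer` description says progress is reported per pipeline phase (search, fetch, synthesize) and only when the client supplied a `progressToken`, while the answer still arrives as one block. The sketch below illustrates that contract under stated assumptions: `runAnswerPipeline` and its `emit` callback are hypothetical names, the placeholder return value stands in for real synthesis, and emitting one notification per phase in order is a guess at the timing. The `notifications/progress` method and its `progressToken`/`progress`/`total`/`message` params follow the MCP specification, not wigolo's source.

```typescript
// Hypothetical sketch of phase-level progress reporting; this is not
// wigolo's implementation.

type ProgressToken = string | number;

// Message shape of an MCP progress notification.
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: {
    progressToken: ProgressToken;
    progress: number;
    total: number;
    message: string;
  };
}

// The three phases named in the updated tool description.
const PHASES = ["search", "fetch", "synthesize"] as const;

// Emits one notification per phase when a token was provided, then
// returns the whole answer at once (no token-level streaming).
function runAnswerPipeline(
  progressToken: ProgressToken | undefined,
  emit: (notification: ProgressNotification) => void,
): string {
  for (let i = 0; i < PHASES.length; i++) {
    if (progressToken !== undefined) {
      emit({
        jsonrpc: "2.0",
        method: "notifications/progress",
        params: {
          progressToken,
          progress: i + 1,
          total: PHASES.length,
          message: PHASES[i],
        },
      });
    }
    // Real work for each phase would happen here.
  }
  return "synthesized answer (placeholder)";
}
```

A client that omits the token gets the same final answer with no notifications, which matches the description's statement that progress reporting is conditional on `request._meta`.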
@@ -1 +1 @@
- {"version":3,"file":"instructions.js","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,MAAM,CAAC,MAAM,mBAAmB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;mMA8DgK,CAAC;AAEpM,MAAM,CAAC,MAAM,iBAAiB,GAAG;IAC/B,KAAK,EAAE;;;;;;;;+NAQsN;IAE7N,MAAM,EAAE;;;;;;;;;;;;yIAY+H;IAEvI,KAAK,EAAE;;;;;;;;uLAQ8K;IAErL,KAAK,EAAE;;;;;;;;;0GASiG;IAExG,OAAO,EAAE;;;;;;;;4LAQiL;IAE1L,YAAY,EAAE;;;;;;;;;;;uEAWuD;IAErE,QAAQ,EAAE;;;;;;;;;;;;;;yHAc6G;IAEvH,KAAK,EAAE;;;;;;;;;;;;;;;;sKAgB6J;CAC5J,CAAC"}
+ {"version":3,"file":"instructions.js","sourceRoot":"","sources":["../src/instructions.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;GAeG;AAEH,MAAM,CAAC,MAAM,mBAAmB,GAAG;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;mMAuEgK,CAAC;AAEpM,MAAM,CAAC,MAAM,iBAAiB,GAAG;IAC/B,KAAK,EAAE;;;;;;;;;;;6LAWoL;IAE3L,MAAM,EAAE;;;;;;;;;;;;;;;4JAekJ;IAE1J,KAAK,EAAE;;;;;;;;uLAQ8K;IAErL,KAAK,EAAE;;;;;;;;;0GASiG;IAExG,OAAO,EAAE;;;;;;;;4LAQiL;IAE1L,YAAY,EAAE;;;;;;;;;;;uEAWuD;IAErE,QAAQ,EAAE;;;;;;;;;;;;;;yHAc6G;IAEvH,KAAK,EAAE;;;;;;;;;;;;;;;;sKAgB6J;CAC5J,CAAC"}
@@ -1 +1 @@
- {"version":3,"file":"find-similar.d.ts","sourceRoot":"","sources":["../../src/search/find-similar.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,gBAAgB,EAChB,iBAAiB,EAEjB,YAAY,EAEb,MAAM,aAAa,CAAC;AACrB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,oBAAoB,CAAC;AACtD,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,6BAA6B,CAAC;AAuBjE,wBAAsB,WAAW,CAC/B,KAAK,EAAE,gBAAgB,EACvB,OAAO,EAAE,YAAY,EAAE,EACvB,MAAM,EAAE,WAAW,EACnB,aAAa,CAAC,EAAE,aAAa,GAC5B,OAAO,CAAC,iBAAiB,CAAC,CAoH5B"}
+ {"version":3,"file":"find-similar.d.ts","sourceRoot":"","sources":["../../src/search/find-similar.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,gBAAgB,EAChB,iBAAiB,EAEjB,YAAY,EAEb,MAAM,aAAa,CAAC;AACrB,OAAO,KAAK,EAAE,WAAW,EAAE,MAAM,oBAAoB,CAAC;AACtD,OAAO,KAAK,EAAE,aAAa,EAAE,MAAM,6BAA6B,CAAC;AAyBjE,wBAAsB,WAAW,CAC/B,KAAK,EAAE,gBAAgB,EACvB,OAAO,EAAE,YAAY,EAAE,EACvB,MAAM,EAAE,WAAW,EACnB,aAAa,CAAC,EAAE,aAAa,GAC5B,OAAO,CAAC,iBAAiB,CAAC,CAqJ5B"}