smart-web-mcp 0.7.6 → 0.7.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +6 -0
- package/README.md +78 -107
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -6,6 +6,12 @@ The format is based on Keep a Changelog[^1] and this project uses SemVer.
|
|
|
6
6
|
|
|
7
7
|
## [Unreleased]
|
|
8
8
|
|
|
9
|
+
## [0.7.7] - 2026-03-29
|
|
10
|
+
|
|
11
|
+
### Changed
|
|
12
|
+
|
|
13
|
+
- the README now leads with agent and MCP-host routing concerns instead of first-time human onboarding flow, which makes the package positioning and tool contract clearer for the way this repo is actually used
|
|
14
|
+
|
|
9
15
|
## [0.7.6] - 2026-03-29
|
|
10
16
|
|
|
11
17
|
### Added
|
package/README.md
CHANGED
|
@@ -2,74 +2,67 @@
|
|
|
2
2
|
|
|
3
3
|
# smart-web
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
Local MCP for agentic web retrieval.
|
|
6
6
|
|
|
7
|
-
|
|
7
|
+
`smart-web` bundles three tools in one stdio server:
|
|
8
8
|
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
|
|
9
|
+
- `smartfetch` for direct URLs
|
|
10
|
+
- `smartsearch` for discovery
|
|
11
|
+
- `smartcrawl` for same-site multi-page traversal
|
|
12
12
|
|
|
13
|
-
|
|
13
|
+
It is designed for **agents and MCP hosts**, not for manual browser automation. The job is to return structured retrieval output first, then signal when a browser-task runtime would be a better next step.
|
|
14
14
|
|
|
15
|
-
`smart-web`
|
|
16
|
-
|
|
17
|
-
- `
|
|
18
|
-
- `smartfetch`: browser-aware direct-URL fetch with target-specific normalization
|
|
19
|
-
- `smartcrawl`: site exploration for docs, boards, pagination, scroll-heavy pages, and optional local Markdown export via `output_dir`
|
|
20
|
-
|
|
21
|
-
All three tools now return a structured `assessment` block so hosts can tell when deterministic retrieval was strong enough versus when a browser-task runtime would likely do better.
|
|
22
|
-
|
|
23
|
-
Migration note: if you previously called `smartdocscrawl`, switch to `smartcrawl` with the same crawl parameters plus `output_dir`.
|
|
24
|
-
|
|
25
|
-
## Why people use it
|
|
26
|
-
|
|
27
|
-
- one local MCP instead of separate search/fetch/crawl servers
|
|
28
|
-
- better normalized outputs for common web surfaces instead of raw HTML dumps
|
|
29
|
-
- explicit confidence/blocking/handoff signals so agents can make better decisions
|
|
30
|
-
- local-first and privacy-first by default
|
|
31
|
-
|
|
32
|
-
Current high-signal targets include:
|
|
33
|
-
|
|
34
|
-
- communities: Reddit, DCInside, LinkedIn public posts, Algumon
|
|
35
|
-
- reference/article pages: NamuWiki, Wikipedia, Naver Blog, Tistory, Velog
|
|
36
|
-
- commerce pages: Coupang, Danawa, Aladin, AliExpress, Amazon
|
|
37
|
-
- media/social: YouTube, X, Threads, Instagram, Telegram
|
|
38
|
-
|
|
39
|
-
## What it is not
|
|
15
|
+
- npm: `https://www.npmjs.com/package/smart-web-mcp`
|
|
16
|
+
- MCP Registry: `io.github.rich-jojo/smart-web`
|
|
17
|
+
- GitHub: `https://github.com/rich-jojo/smart-web`
|
|
40
18
|
|
|
41
|
-
|
|
42
|
-
- not an enterprise anti-bot proxy network
|
|
43
|
-
- not a cloud crawler that tries to out-scale Bright Data / Firecrawl / Apify
|
|
19
|
+
## Highlights
|
|
44
20
|
|
|
45
|
-
|
|
21
|
+
- one local MCP instead of separate search, fetch, and crawl servers
|
|
22
|
+
- explicit routing: URL -> `smartfetch`, query -> `smartsearch`, known site -> `smartcrawl`
|
|
23
|
+
- structured output with `assessment` and `pipeline` fields for host-side decision making
|
|
24
|
+
- local-first defaults with privacy-aware search/fetch behavior
|
|
25
|
+
- useful normalization for common public surfaces instead of raw HTML dumps
|
|
46
26
|
|
|
47
27
|
## Tool routing
|
|
48
28
|
|
|
49
29
|
Use the tools like this:
|
|
50
30
|
|
|
51
31
|
- `smartfetch` first when the user already gave a URL or shortlink
|
|
52
|
-
- `smartsearch` when the user gave a topic, keywords, or a `site:` query
|
|
53
|
-
- `smartcrawl` when you already know the site and need multiple relevant pages
|
|
32
|
+
- `smartsearch` when the user gave a topic, keywords, or a `site:` query
|
|
33
|
+
- `smartcrawl` when you already know the site and need multiple relevant pages
|
|
54
34
|
|
|
55
|
-
|
|
35
|
+
| Situation | Tool |
|
|
36
|
+
| --- | --- |
|
|
37
|
+
| User already gave a URL or shortlink | `smartfetch` |
|
|
38
|
+
| User gave a topic, keywords, or `site:` query | `smartsearch` |
|
|
39
|
+
| You already know the site and need multiple pages | `smartcrawl` |
|
|
56
40
|
|
|
57
|
-
|
|
41
|
+
`smart-web` is a retrieval layer. It is not a general-purpose click/type/form-fill browser agent.[^1]
|
|
58
42
|
|
|
59
|
-
|
|
43
|
+
## What the tools return
|
|
60
44
|
|
|
61
|
-
|
|
62
|
-
- LinkedIn profile URLs return a short authwall-aware summary instead of dumping the whole signup wall
|
|
63
|
-
- LinkedIn profile `smartcrawl` is now a best-effort public-post discovery path instead of crawling irrelevant LinkedIn navigation pages
|
|
64
|
-
- an optional Python `undetected-chromedriver` helper is enabled by default when the helper is actually installed and reachable, and can be pointed at a virtualenv python for tougher browser fetches when Playwright alone is not enough
|
|
45
|
+
All three tools are shaped for agent consumption.
|
|
65
46
|
|
|
66
|
-
|
|
47
|
+
Common high-signal fields include:
|
|
67
48
|
|
|
68
|
-
|
|
49
|
+
- `assessment`: confidence, block/auth hints, recommended handoff
|
|
50
|
+
- `pipeline`: resolve/acquire/normalize/assess stages
|
|
51
|
+
- `errors`: structured failure reasons
|
|
52
|
+
- `post`, `thread`, `comments`: normalized content surfaces
|
|
53
|
+
- `assets` and `download`: file-like outputs when present
|
|
69
54
|
|
|
70
|
-
-
|
|
71
|
-
|
|
72
|
-
|
|
55
|
+
This lets hosts decide whether deterministic retrieval was good enough or whether they should escalate to a browser-task runtime.[^2]
|
|
56
|
+
|
|
57
|
+
## Supported high-signal surfaces
|
|
58
|
+
|
|
59
|
+
- communities: Reddit, DCInside, LinkedIn public posts, Algumon
|
|
60
|
+
- reference/article pages: NamuWiki, Wikipedia, Naver Blog, Tistory, Velog
|
|
61
|
+
- media/social: YouTube, X, Threads, Instagram, Telegram
|
|
62
|
+
- commerce pages: Amazon, Coupang, Danawa, Aladin, AliExpress
|
|
63
|
+
- archives/reference wrappers: Wayback snapshots, archive.md lookup pages, arXiv abstract pages with direct paper links
|
|
64
|
+
|
|
65
|
+
When a source stays blocked or under-specified, `smart-web` prefers partial but honest output over pretending to have verified content.
|
|
73
66
|
|
|
74
67
|
## Install
|
|
75
68
|
|
|
@@ -79,19 +72,17 @@ npx -y smart-web-mcp
|
|
|
79
72
|
|
|
80
73
|
`npm` is the canonical package-manager path for this repo and package.
|
|
81
74
|
|
|
82
|
-
##
|
|
83
|
-
|
|
84
|
-
### 1) Add the MCP to your host
|
|
75
|
+
## Host setup
|
|
85
76
|
|
|
86
|
-
Claude Code
|
|
77
|
+
### Claude Code
|
|
87
78
|
|
|
88
79
|
```bash
|
|
89
80
|
claude mcp add smart-web -- npx -y smart-web-mcp
|
|
90
81
|
```
|
|
91
82
|
|
|
92
|
-
|
|
83
|
+
The repo also ships a small Claude plugin wrapper for marketplace-style distribution.
|
|
93
84
|
|
|
94
|
-
OpenCode
|
|
85
|
+
### OpenCode
|
|
95
86
|
|
|
96
87
|
```json
|
|
97
88
|
{
|
|
@@ -106,42 +97,27 @@ OpenCode:
|
|
|
106
97
|
}
|
|
107
98
|
```
|
|
108
99
|
|
|
109
|
-
### 2) Verify it with a few real calls
|
|
110
|
-
|
|
111
|
-
- `smartfetch` on a normal article URL
|
|
112
|
-
- `smartfetch` on a known Reddit, YouTube, or product URL
|
|
113
|
-
- `smartsearch` on a `site:` query
|
|
114
|
-
- `smartcrawl` on a docs site or board you actually use
|
|
115
|
-
|
|
116
|
-
### 3) Start using the routing rule daily
|
|
117
|
-
|
|
118
|
-
- direct URL already provided -> `smartfetch`
|
|
119
|
-
- topic/keywords only -> `smartsearch`
|
|
120
|
-
- same site, many pages needed -> `smartcrawl`
|
|
121
|
-
|
|
122
100
|
## Configuration
|
|
123
101
|
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
Default path:
|
|
102
|
+
Default settings path:
|
|
127
103
|
|
|
128
104
|
```text
|
|
129
105
|
~/.config/smart-web/settings.json
|
|
130
106
|
```
|
|
131
107
|
|
|
132
|
-
|
|
108
|
+
Print a template:
|
|
133
109
|
|
|
134
110
|
```bash
|
|
135
111
|
npx -y smart-web-mcp --print-settings-example
|
|
136
112
|
```
|
|
137
113
|
|
|
138
|
-
|
|
114
|
+
Initialize the default file:
|
|
139
115
|
|
|
140
116
|
```bash
|
|
141
117
|
npx -y smart-web-mcp --init-settings
|
|
142
118
|
```
|
|
143
119
|
|
|
144
|
-
|
|
120
|
+
Use a non-default path:
|
|
145
121
|
|
|
146
122
|
```json
|
|
147
123
|
{
|
|
@@ -151,27 +127,34 @@ If you need a non-default path, set only one variable in the host config:
|
|
|
151
127
|
}
|
|
152
128
|
```
|
|
153
129
|
|
|
154
|
-
|
|
130
|
+
Recommended steady state: keep most runtime configuration in one settings file instead of a long MCP `env` block.
|
|
131
|
+
|
|
132
|
+
Most installations only need:
|
|
155
133
|
|
|
156
|
-
|
|
134
|
+
- `profile: "balanced"`
|
|
135
|
+
- `profile: "private"`
|
|
136
|
+
- `search.searxngBaseUrl`
|
|
137
|
+
- `search.exaApiKey`
|
|
138
|
+
- `search.braveSearchApiKey`
|
|
157
139
|
|
|
158
|
-
|
|
159
|
-
- `profile: "private"`: disables relay-style providers and public search helpers by default
|
|
160
|
-
- `search.searxngBaseUrl`: self-hosted private fallback search
|
|
161
|
-
- `search.exaApiKey` and `search.braveSearchApiKey`: optional paid search tiers
|
|
140
|
+
## Quick verification
|
|
162
141
|
|
|
163
|
-
|
|
142
|
+
Sanity-check the server with a few real calls:
|
|
164
143
|
|
|
165
|
-
|
|
144
|
+
- `smartfetch` on a normal article URL
|
|
145
|
+
- `smartfetch` on an arXiv abstract URL
|
|
146
|
+
- `smartfetch` on a known product URL
|
|
147
|
+
- `smartsearch` on a `site:` query
|
|
148
|
+
- `smartcrawl` on a docs site or board you actually use
|
|
166
149
|
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
-
|
|
170
|
-
-
|
|
171
|
-
-
|
|
172
|
-
-
|
|
173
|
-
|
|
174
|
-
|
|
150
|
+
## Examples
|
|
151
|
+
|
|
152
|
+
- [Claude Code](examples/claude.mcp.json)
|
|
153
|
+
- [OpenCode](examples/opencode.json)
|
|
154
|
+
- [OpenCode dev hot-reload](examples/opencode.dev.json)
|
|
155
|
+
- [Settings template](examples/smart-web.settings.json)
|
|
156
|
+
|
|
157
|
+
These examples use `/absolute/path/to/...` placeholders for local checkouts.
|
|
175
158
|
|
|
176
159
|
## Docs
|
|
177
160
|
|
|
@@ -185,15 +168,6 @@ Everything else lives in the same JSON file as advanced overrides rather than ex
|
|
|
185
168
|
- [Contributing](CONTRIBUTING.md)
|
|
186
169
|
- [CHANGELOG](CHANGELOG.md)
|
|
187
170
|
|
|
188
|
-
## Examples
|
|
189
|
-
|
|
190
|
-
- [Claude Code](examples/claude.mcp.json)
|
|
191
|
-
- [OpenCode](examples/opencode.json)
|
|
192
|
-
- [OpenCode dev hot-reload](examples/opencode.dev.json)
|
|
193
|
-
- [Settings template](examples/smart-web.settings.json)
|
|
194
|
-
|
|
195
|
-
These examples use `/absolute/path/to/...` placeholders for local checkouts.
|
|
196
|
-
|
|
197
171
|
## Development
|
|
198
172
|
|
|
199
173
|
```bash
|
|
@@ -203,18 +177,16 @@ npm run check
|
|
|
203
177
|
|
|
204
178
|
`npm install` also installs local git hooks for fast TypeScript checks on commit and the full verification suite on push.
|
|
205
179
|
|
|
206
|
-
### Dev MCP without restarting
|
|
207
|
-
|
|
208
|
-
If you run `smart-web` from a local repo checkout, you can use the dev entrypoint so normal implementation changes are picked up after a rebuild without restarting the MCP host:
|
|
180
|
+
### Dev MCP without restarting the host
|
|
209
181
|
|
|
210
182
|
```bash
|
|
211
183
|
npm run dev:watch
|
|
212
184
|
smart-web-dev
|
|
213
185
|
```
|
|
214
186
|
|
|
215
|
-
`smart-web-dev` hot-swaps
|
|
187
|
+
`smart-web-dev` hot-swaps rebuilt `dist/*.js` modules on each tool call. New implementations update after rebuilds, but brand-new MCP tools or schema changes still require a client reconnect because MCP registration happens at process start.
|
|
216
188
|
|
|
217
|
-
By default
|
|
189
|
+
By default, staging snapshots live under the platform cache root (for example `~/.cache/smart-web/tmp` on Linux). Set `SMART_WEB_TMP_DIR` if you need an explicit override.
|
|
218
190
|
|
|
219
191
|
## License
|
|
220
192
|
|
|
@@ -222,6 +194,5 @@ MIT
|
|
|
222
194
|
|
|
223
195
|
## References
|
|
224
196
|
|
|
225
|
-
[^1]: Model Context Protocol, "Tools" specification —
|
|
226
|
-
[^2]: Model Context Protocol Blog, "Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do" (2026-03-16) —
|
|
227
|
-
[^3]: VS Code MCP configuration reference documents `env` and `inputs` on the client side, which is why `smart-web` supports both env-based overrides and a single settings file while keeping the main runtime behavior centralized.
|
|
197
|
+
[^1]: Model Context Protocol, "Tools" specification — MCP tools are a retrieval and integration surface, not a requirement to emulate full browser-automation flows inside one tool.
|
|
198
|
+
[^2]: Model Context Protocol Blog, "Tool Annotations as Risk Vocabulary: What Hints Can and Can't Do" (2026-03-16) — structured tool output and accurate hints improve host-side routing and escalation decisions.
|
package/package.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "smart-web-mcp",
|
|
3
3
|
"mcpName": "io.github.rich-jojo/smart-web",
|
|
4
|
-
"version": "0.7.
|
|
4
|
+
"version": "0.7.7",
|
|
5
5
|
"packageManager": "npm@10.8.2",
|
|
6
6
|
"description": "Portable MCP server for web search, direct-URL fetch, site crawling, and docs export.",
|
|
7
7
|
"homepage": "https://github.com/rich-jojo/smart-web",
|